I got a workbook from Dick Kusleika that he thought would be a good candidate for Chart Busters. Basically it’s his sales staff, with their sales and commissions figures. He uses a couple pie charts to compare the salesmen, in terms of sales, commissions, and relative commissions (normalized by sales). Dick describes the process:
1. I look at the Sales chart and look for anomalies in sales by guy. For example, I know Syd has good territory so I’m not surprised that he has a larger slice. David, however, has a crappy territory but I know he had a big order, so again no surprise. Bob also has a bad territory and he didn’t have any large sales, so that would induce me to look into the details and see if there is an error.
2. I look at the Sales chart compared to the Commission chart. They will never be perfectly proportional because guys have different commission rates. Also, some of the sales are non-commissionable. However, I want a chart that points out anomalies so I can investigate for errors (mistyped percentages, misapplied commission, etc).
Here is Dick’s data, and the two pie charts he uses.
Well, aside from the tiny labels (which may just be that way in this hastily hacked together example), the charts are pies, which as we know are generally not a particularly effective way to display data. I’ve decided to try this data in a panel chart.
The first thing I did was convert Dick’s data to a List (or a Table in Excel 2007), to allow easy sorting by any of the relevant columns. I added a column for Ratio, which was simply Commission divided by Sales. I formatted the Ratio column as a percentage and the Sales and Commissions as thousands. I added five more columns, one for each of the columns so far, and two spacer columns. The additional Sales column has this formula in cell E2:
=B2/MAX($B$2:$B$7)*0.95
This gives the largest Sales number a value of 0.95, and the rest are proportionally smaller. The additional Commission and Ratio columns have similar formulas. The spacer columns simply have the amount to bring the cumulative total to 1 or 2. For example, the formula in F2 is:
=1-E2
Start building the chart by selecting the data. Select A1:A7, then hold CTRL while selecting A1:I7, so both areas are highlighted. Insert a stacked horizontal bar chart.
Let’s do a little clean-up. Change the horizontal axis scale maximum to 3 and spacing to 1, remove the gray plot area fill, hide the two spacer series (no fill, no border). Format the vertical axis so the categories are plotted in reverse order (this puts the list in the same order as in the wheet), and so the axis crosses at the maximum category.
We can add a dummy series to add labels above each panel of the chart. Put this helper data into a range someplace:
Copy the data, select the chart, and use Paste Special with the following settings to add the data to the chart as a new series in the proper orientation.
The result is another bar chart series stacked on the rest. it’s not visible, since its values are all zero. Select the series by selecting one of the other series, and press the up or down arrow until this series is selected. Change this series to a Column chart type. Excel automatically put this series onto the secondary axis, and uses the labels on the secondary category axiss along the top of the chart.
Hide the column series by formatting it with no borders and no fill. Hide the primary value axis tick labels (lower horizontal axis).
Finally, if desired you can add value labels to the bars. Since the bars actually plot fractions, we will use Rob Bovey’s Chart Labeler, a free add-in which interfaces smoothly with Excel. Select the chart, click on the XY Chart Labels item on the Tools menu, then Add labels, and make the appropriate selections for the Sales series, then repeat for the Commision and Ratio series.
Now it’s an easy task to check the relative sales of each employee. You can sort by Sales, and see whether the Commission data is similarly sorted, or note whether there may be an error in the figures. The Ratio data is a further check that commissions are appropriately calculated.
So how would you analyze this data?
Naomi B. Robbins says
Note that Jon’s chart is not only a huge improvement for this data but his method can be used with many more sales people. Dick’s pies could not handle many more.
One suggestion that I keep harping on:
If you do label the bars, I’d rather see k and % in the column headers than cluttering the numbers. The values stand out better without the k and %.
Jon Peltier says
Naomi –
Thanks for the observation about how well this chart would scale with a larger staff to monitor.
Your suggestion about the labels is easy to address. The labels aren’t really necessary, since the bars speak for themselves (unlike the pie slices), but some managers like to see them. Here’s the improved version:
Colin Banfield says
Jon, I think that the labels help to provide a sense of scale, in the absence of an axis. However, in many cases, the relative bar lengths would be sufficient.
Out of curiosity, how do your clients share and update charts that use an XY labeler add-in? Is distributing the add-in to every potential user acceptable?
Joe Mako says
My initial reaction is that the data provided does not cover the conditions he is concerned with. One possibility is that the data is not captured, and this is a case of someone who has the data in their head.
I believe that in order to see conditions like:
– Sales vs. Territory Condition and Large Sales
– Anomalies in Commission (mistyped percentages, misapplied commission, etc)
Additional data like this would be useful to create a visual analysis to answer the questions presented:
– Information on the Territory each Sales Guy is assigned (bad, good, etc)
– Raw data that has historical per sale information with details on:
— Type of Sale
— Commission Applied
— Sale Amount
— Expected Commission for types of sales and sales guy
With this level of detail (likely more than one table), he would be able to see and investigate anomalies and errors. Another benefit is that all sorts of other questions could be asked and answered.
I believe he has valid questions, and agree that it would be valuable to have the answers visualized, but the data set is lacking the data to display answers to the questions asked.
I like the final chart you created, but I don’t believe the simple ratio can be fully trusted because of other conditions described that are not in the provided data set. Also I prefer graphs with scales. I think it may be interesting to see the change in percent of total from Sales to commission.
(Click on image to view full size.)
This may be another method to consider for displaying the outliers, but I believe that in order to get a true picture of what is going on, more data would need to be captured.
Jon Peltier says
Colin –
Fortunately, once the labels are in place, they update even as the data is sorted. If my clients need the Labeler for general use, I tell them about it and rely on them to obtain it. If they need labeling as an integral part of something I build for them, I incorporate some of the Labeler’s functionality as part of the project.
Jon Peltier says
Joe –
Good point about the missing information. As Dick points out, he knows some of this anyway. If he needed a robust utility that anyone could use, he’d have to include information about each sales agent’s regions, which of course would complicate the display. Your percentage difference might be more informative than the simple commissions as a percent of sales that I used.
I’ve stated in the past that sometimes the bars which are shaded according to value provide confirmation of the values encoded by bar length, even though the color isn’t critical for this. I’m not really sure that the graduated bar beneath each of your charts adds much to my understanding of the distributions.
Colin Banfield says
Jon, thanks.
Joe, you make some good points. Note that your charts are unnecessarily busy. You already have the bar length to encode the data (plus data labels). What additional information does the color coding provide?
Joe Mako says
Jon –
I agree, removing the legends does not remove any ability to understand the viz, nor does keeping them it add any real value.
Here it is without the legends:
(Click on image to view full size.)
Anonymous says
Jon – my small suggestion regarding your formatting would be to have all the different graphs the same colour, as making them different colours doesn’t encode any more information, and i believe different colours make it just a little harder to compare trends between the graphs.
Perhaps the boxes could go too, as could the decimal points in the series.
A reworked example at http://cid-f380a394764ef31f.skydrive.live.com/self.aspx/.Public/Chartdoctors%20-%20Compare%20Employee%20Sales.xlsx
(Being lazy, I just put 3 bar charts side by side).
Jeff Weir says
Jon – A few minor suggestions re formatting.
1. Perhaps you could lose the colour, as it doesn’t encode any info, and perhaps makes it slightly harder to compare trends between graphs.
2. Could use less decimal points for certain labels
3. Could lose the box around each graph, to give the data (and the eye) a bit more space
My reworked example at http://cid-f380a394764ef31f.skydrive.live.com/self.aspx/.Public/Chartdoctors%20-%20Compare%20Employee%20Sales.xlsx
Jon Peltier says
Anonymous/Jeff –
It would be nice if you posted image files as well as workbooks. The PC I’m on now has Office 2003 without the compatibility pack, so I can’t open the Excel 2007 workbook.
It’s not necessary to use different colors for the different series, but I don’t see how different colors make comparisons more difficult. Could you explain?
The number formatting on the labels is probably a matter of personal preference. Showing 2-3 significant digits seems reasonable.
Jeff Weir says
Sorry Jon…I forgot to fill out the name box when I initially posted my comment.
I’ll saved the workbook in 2003 format.
Re the colours, my point is that they make some series pop out more than others – so at best the more vibrant colors draw the eye away from the less vibrant colors. At best, they are not neutral and so if serve no purpose are probably best avoided.
My point is perhaps a little pedantic…but hey, I’m the kind of guy that uses border lines around series that are just a tad darker (but the same color) as the series themselves, to help them stand out more :-)
And I think that if you redid the graphs all in the same color, then the uniform one would be ‘easier on the eye’.
In his new book Now You See It, Stephen Few describes color as having’pre-attentive attributes’ – basically meaning some processing of different colors happens unconsiously, and (to paraphrase) not evenly. For instance, for me the red series draw my eyes away from the other series. This certainly doesn’t help me concentrate on each series equally. For sure, it’s just a small bias, but there should ideally be no bias unless intended.
Mr Few also worth noting that the distinction between red and green (of the same hue) can’t be seen by 10% of males, or 1% of females. I don’t know whether the red and green above are the same hue, but if so, some people might be looking at it and thinking “Why is the graph on the left a different color than the ones on the right?”. Whereas we want them wondering about the data itself.
Your thoughts? How are you finding Stephen’s book, by the way?
Jeff Weir says
Updated file can be downloaded at http://cid-f380a394764ef31f.skydrive.live.com/self.aspx/.Public/Chartdoctors%20-%20Compare%20Employee%20Sales%20|_%20Excel%202003.xls
Jon Peltier says
Jeff –
Thanks for accommodating an old Luddite with your 2003 workbook. And we’ve probably all forgotten to enter something into a field on a web form, it’s just that I never bothered enforcing names and emails on comments.
I see what you’re saying about the colors, but I don’t imagine it’s too critical, since the series are in separate panes. I also used to use matching borders around bars and columns, but I don’t bother any more.
I’ve suspended reading of Now You See It because I’ve come across a few books on cognition and the human brain. A related topic, actually, and rather fascinating. In fact, my daughter is entering college in the fall, and plans to study BCS (Brain and Cognitive Science). Few has covered the color-blindedness in his earlier works, and I’ve seen the pre-cognitive discussion either in his books or in some related work elsewhere.
Jon Peltier says
The color coding makes use of pre-attentive perception (brought up by Jeff a couple comments ago) to help convey the values.
Joe Mako says
Colin –
You asked “What additional information does the color coding provide?”
It does not add any additional information. I feel it makes the underlying data easier to see.
In this example, I feel the coloring helps bring out Bob’s Commission value as something to be looked into.
Jeff Weir says
@Jon I like the way you turn my critique of color into a design feature, :-)
@Jo – Thanks for posting an example. It’s great to see how different people treat the same data. Some thoughts on your approach:
I agree that your color bars don’t add any additional information. But I disagree that they make the underlying data easier to see. And if you don’t use them, then you should remove the different color gradients from the graphs altogether.
To my eyes:
a) with a continuous range it’s hard in some cases to even find the series colors on the color legend. For instance, the color in Syd’s series in the graph on the extreme right doesn’t look like it exists in the color bar. Perhaps if just 3 discrete colors were used to highlight bad, average, and good ranges, it would be less distracting. But then, a bullet graph would be better at this.
b) it’s distracting, because my eyes and mind have to leave the data in order to decode the meaning of the color in the legend, but this doesn’t tell me anything more than the data itself does.
c) the length of the relative lines are very precise ways to encode quantitative values. The length of Bob’s line is more than enough to draw attention to it. Color is good at drawing your attention to something if used sparingly, but is one of the ‘pre-attentive attributes’ that is not quantitatively percieved in and of themselves (quoting Few here again), and if overused tends to lose it’s ‘pre-attentive attribute’. Whereas lines and 2D precision are very precise ways to encode quantitative values.
I like the extra percentage information you are conveying in your axis. How about if you remove the axis and include the extra percentage information in the data labels? I do that a lot with the graphs I do at work, and believe it’s a good approach as it saves a users eyes from travelling from the end of the graph to the axis and also provides accurate readings that can more easily be intergrated with the other information already in the data lables. Just a thought…
Colin Banfield says
“It does not add any additional information. I feel it makes the underlying data easier to see.”
Don’t see how. For example, in the “Difference in Total Percent” chart, it isn’t obvious to me which number (or numbers) you want to draw the reader’s attention to. Precisely because of the grading, my eyes don’t immediately focus on anything in particular. The point of the grading is even less clear in the other charts.
“In this example, I feel the coloring helps bring out Bob’s Commission value as something to be looked into.”
Even after staring at this chart for a while, that wasn’t obvious to me. On the other hand, if Bob’s commission bar was in a color different from the others, I would have immediately been drawn to it.
Colin Banfield says
Jeff, I think that different colors in Jon’s chart work because we’re looking at different measures (although I won’t argue against the bars being all the same color). At first glance, it was obvious to me that the color differences exist because sales, commissions and ratio are different entities. Hence, I wasn’t drawn to any particular color or chart (at least not consciously). However, if these columns represented the same thing e.g. sales of three different products, I would definitely argue that the bars should all be the same color.
As for the “red” and “green” colors on the chart, the colors used aren’t primary colors, so it’s unclear how a color blind person would perceive them.
Jon Peltier says
Chandoo –
1. I think you sorted the sales data and not the commissions.
2. You remove a visual analysis from the chart by using labels only to encode the percentages.
Colin Banfield says
Chandoo, the chart doesn’t work. I think that it would be OK if we were showing only Sales and Commissions (and the numbers were formatted in $K). However, incorporating two measures on a bar results in a whole pile of confusion. It isn’t obvious at all what (7%, 2000), (4%, 1500) etc. mean as combined measures. Even less obvious is why you would merge the commission and ratio into one thing – especially when there’s no direct correlation between the two measures.
Chandoo says
Hey Jon… very good treatment of the chart. I am sucker for panel charts.
I was looking at the data and thinking may be we can use the good old bar. Here is a little something I have made
Since the sales commission will have to come out of sales revenue, overlapping series works without confusion. As the % is just a calculation of commission / sales, I have included it in the labels.
What do you think?
Jon Peltier says
Chandoo –
The information is numerically redundant, but not visually redundant. And the point is, by keeping the series there, we can let the user put his huge visual cortex to work analyzing it.
Chandoo says
@Jon..
“1. I think you sorted the sales data and not the commissions.”
Oops, my mistake. I have corrected the version.
2. You remove a visual analysis from the chart by using labels only to encode the percentages.
Hmm, While I agree that I have removed a series, you can easily calculate % given the two values. The information is redundant.
@Colin
“Chandoo, the chart doesn’t work. I think that it would be OK if we were showing only Sales and Commissions (and the numbers were formatted in $K). However, incorporating two measures on a bar results in a whole pile of confusion. It isn’t obvious at all what (7%, 2000), (4%, 1500) etc. mean as combined measures. Even less obvious is why you would merge the commission and ratio into one thing – especially when there’s no direct correlation between the two measures.”
If the confusion is due to the fact that I have sorted sales but not commissions, then It is my mistake. I have corrected it now.
There is direct correlation between % and commission.
Chandoo says
@Jon .. debatable. But I have developed contempt for charts and tables where both % and values are mentioned during my MBA days. I still think showing them together is of little use. May be because I secretly derive pleasure in computing percentages myself :D
Jon Peltier says
Chandoo – That’s a perverse pleasure indeed. I still think you’re shortchanging your viewers by leaving out the visual display of the percentages.
jeff weir says
@Chandoo: My point above that “the length of the relative lines are very precise ways to encode quantitative values” is basically restating Jon’s point that:
“the information is numerically redundant, but not visually redundant. And the point is, by keeping the series there, we can let the user put his huge visual cortex to work analyzing it.”
If we were to take your point that “the information is redundant” to the extreme, then we’d just end up with a table of numbers.
For sure, it could be considered redundant to show numerical values (as data labels) as well as numerical values (encoded in line lengths). So if it comes down to a choice between the two, which should we keep…the numbers or the images.
Quoting from Few’s “Now you see it” again:
* “Vision is not just one of the 5 channels through which we sense the world; it is by far the dominant and most powerful sense.
* “When we represent quantitative information in visual form, our ability to think about it is dramatically enhanced”
* “The eye and the visual cortex of the brain form a massively parallel processor that provides the highest-bandwidth channel into human cognitive centers” (Attributed to Colin Ware”
So if it comes down to a choice of one or the other, keep the graphic and lose the numbers. Fortunately we dont have to choose…we can have both.