Peltier Tech Blog

Excel Chart Add-Ins | Training | Custom Solutions | Charts and Tutorials | PTS Blog

 

Main menu:

 
Peltier Tech
Chart Add-Ins

Peltier Tech Waterfall Chart Utility
Peltier Tech Cluster-Stack Chart Utility
Peltier Tech Box and Whisker Chart Utility
Peltier Tech Marimekko Chart Utility
Peltier Tech Dot Plot Utility
Peltier Tech Cascade Chart Utility

 
Excel Dashboards

Subscribe

Site search

Subscribe

Site search


Recent Posts

Recently Commented

Popular Posts

Archive


 

Categories


 

Privacy Policy

Creative Commons License
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Smooth Talking Lies

by Jon Peltier
Thursday, May 28th, 2009
Peltier Technical Services, Inc., Copyright © 2010.
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

When plotting data in a line or XY chart, it can be very deceptive to use the Smoothed Lines option. Here is a sample XY chart with dummy data, that has smoothed lines, and no markers to help decode the shape of the data.

Smoothed line without markers

It’s hardly a better infographic than a kid using chalk on the driveway. You can’t tell which loops and curves are defined by data, and which are mere embellishments.

If markers are added, you can get a feel for the patterns of the data, though the curvee can be a distraction. There is no reason to draw the retrograde loop around x=6. The data also does not merit the local maximum at x=10.7 and y=10.3, nor the local minimum at x=12 and y=5.5.

Smoothed line with markers

If the lines are straight (below), you can guess where there are data points anchoring the lines, though you don’t know whether there might be points located along a straight segment.

Straight line segments without markers

The best way to represent data, and the only way to be honest about the data, is to use markers for each known point, connected by straight line segments. This is particularly true for measured data which may have uncertainty in the actual values.

Straight line segments with markers

This chart shows markers where there are actual points, connects adjacent points with straight lines, and does not make any implicit statements about values between data points.

The following 12-month rolling chart of product sales presents a more practical example.

Sales - smoothed line without markers

What were maximum and minimum sales in the previous year’s section of the chart? Did sales really dip below zero this winter? What weree sales in April? I don’t think you can even define negative sales, unless you had a lot of returns. The curvy lines give uncertainty to the sales values and falsely indicate trends in the data. The lack of markers even means you can’t tell with any certainty where each month’s data point is even located.

When plotted properly, we can see that the maximum far the seven months of 2008 was 25, in December, and the minimum was 10 in September and October. February and March of this year had no sales, but in April, sales jumped to 25. And we can be as certain of these observations as we are of the original data.

Sales - straight line with markers

Related Posts:

Bookmark and share this entry:
  • Twitter
  • Digg
  • Facebook
  • LinkedIn
  • del.icio.us
  • Technorati
  • StumbleUpon
  • Google Bookmarks
  • Reddit
  • MySpace
  • Slashdot
  • Yahoo! Buzz
  • Tumblr

Learn how to create Excel dashboards.

Comments


Comment from Bob
Time: Thursday, May 28, 2009, 5:09 am

Hi Jon,

Since the start of the new year, I’ve been incorporating small multiples of charts in reports from my department. I have found that the size of the marker needs to increase in relation to line and other elements as the overall size of chart is reduced.

The layout I use is 2 x 2 in a word document.

Any thoughts?

Cheers,

Bob


Comment from Jon Peltier
Time: Thursday, May 28, 2009, 6:06 am

If I use a size 5 marker in a regular chart, I tend to keep it at the same size until it gets very small. Otherwise with the screen resolution its shape is not distinct. If it’s printed at a fine resolution, then the size can be decreased more uniformly.


Comment from Michael
Time: Thursday, May 28, 2009, 7:24 am

Mr. Peltier:

Regarding your post on Smoothed Lines, it is incorrect to state that straight lines connecting adjacent points do not make any implicit statements about values between data points. In fact, straight lines imply a linear relationship (y = mx + b) governing whatever process is represented by the two adjacent, discrete samples. If the data were, for example, distance from (x) and intensity of (y) a point-source light, a straight line would of course be wrong.

The only honest way to represent data (observations) is to use markers for each known point, with error bounds (usually bars) indicated. Connecting the data (or indicating a best-fit trend among the data), by whatever shape of line, immediately implies an inferred relationship governing the independent and dependent variables.

All this aside, thanks for yet another thought-provoking post; your Excel blog, the local newspaper online and a decent cup of coffee are all that I require to get my day underway!


Comment from Jon Peltier
Time: Thursday, May 28, 2009, 8:12 am

Michael -

Thanks for commenting. Your point about straight connecting lines is well taken. If there is no particular form of the connecting line between points (i.e., the curve was not coerced into some kind of arbitrary curve), the implication is simply that point B follows point A (or measurement B was taken right after measurement A). It need not follow that any given point between A and B must fall on the line. Straight line segments that zig and zag from point to point do not indicate that any particular trend fluctuates that much, while curves that are carefully crafted to include each point imply more strongly that the behavior is known and follows thee curve.

In addition to your markers and error bars, perhaps you could include a line of best fit. Or if you know the relationship is, for example, an inverse-squared law, then a 1/r² relationship of best fit. Or make a suitable number of appropriately spaced measurements and connect them with straight lines, which taken together suggest such a nonlinear relationship.


Comment from Dick Kusleika
Time: Thursday, May 28, 2009, 9:09 am

I’m not a charting guy, so ignore everything I say! I think the first question is, what are you trying to convey. If you want to know what sales in April were, then perhaps a chart, smoothed or otherwise, isn’t the right vehicle. If, however, you want to demonstrate that August and Christmas are big selling months and March stinks, then a chart, smoothed or otherwise, seems fine.

I guess my question is: If I want to display imprecise information, like general trends, is it OK to use a chart with a low level of precision?


Comment from Jon Peltier
Time: Thursday, May 28, 2009, 10:00 am

Dick -

A chart with a low level of precision conveys information better than a table with a low level of precision. Of course, if there are three numbers, then it’s not worth dusting off your seldom-used charting apparatus to make a chart. Michael’s error bars comprise one way to display imprecision, but often the degree of precision isn’t known as well as a standard deviation or range of values. Especially if the topic is financial numbers, or predictions, or predictions of financial numbers.


Comment from Thom Mitchell
Time: Thursday, May 28, 2009, 12:59 pm

Great title for the post, Jon!
(with apologies to the Chiffons?)


Comment from Jon Peltier
Time: Thursday, May 28, 2009, 2:05 pm

Thom – Thanks. I don’t know where I got it from, it just came out. The Chiffons were sweet talkin’, though, weren’t they?


Comment from derek
Time: Thursday, May 28, 2009, 2:07 pm

Regarding the shape of markers with low screen resolution: that’s why I mostly stick to the square marker. Then the shape is less vulnerable to Excel 2003 or lower’s poor symbol handling. I’m hoping 2010 will be an improvement (psst.. Microsoft! there’s this technology called Truetype, you may have heard of it?)

Speaking of which, I sometimes abandon crude markers altogether, in favour of Wingdings or other text symbols. But that has its own hazards, because they don’t always match up nicely with the lines. You can use error bars as crosshairs to check whether any drift is small enough to tolerate.

Or you can use the error bars themselves. A zero-length error bar with no crossbar and a thick line width makes a surprisingly neat small circular dot, just as precisely placed as, but less pixellated than, the circle marker.


Comment from Mike Woodhouse
Time: Thursday, May 28, 2009, 2:49 pm

Plotting data like this (these?) as line charts just begs for someone to add smoothing (almost always a guarantee of lost information) or some inappropriate trend or curve-fit. In pre-computer days we’d join the dots on our paper charts to save ourselves the tedium of hand-colouring the bars; these days we can just change to a column chart and be done with it. Discrete points should be plotted discretely. Something like that.


Comment from Lori Miller
Time: Monday, June 1, 2009, 7:10 pm

Joining points with straight lines makes more sense than smoothed lines for financial series as the relationships are often not inherently smooth, instead they are modeled on random walks or fractals.

If the underlying relationship is smooth however, smoothed lines will generally be closer to the truth than straight lines (i.e. the error term is smaller) and these kinds of curve are used widely in situations when no particular functional form is assumed. Applications include engineering design, computer animation, typography, non-parametric statistics, fixed-income finance, etc.

The Excel curve is a type of spline with tension applied in parts of the curve where points are relatively close to each other, details can be found in a few threads in excel.charting. For a non-technical description, see:
http://msdn.microsoft.com/en-us/library/ms536358(VS.85).aspx


Comment from Jon Peltier
Time: Monday, June 1, 2009, 7:50 pm

Hi Lori -

Thanks for joining the discussion. Regarding smooth lines for connecting points, I almost presented an example I made some time ago showing wind chill factors. Smoothed lines made sense in that case, as in your examples of engineering design.

When I wrote this post, I was thinking of measured data, though, not analytical output. As with financial (stochastic or random-walk exercises), measured data already has error built into the values in the chart, and there is no reason to assume the curves that Excel draws through the points will model the data any better than straight lines. If you know the shape of the relationship, go ahead and draw a best fit curve through the measured data using the known relationship.

According to Brian Murphy of Rotating Machinery Analysis, Inc. (who has analyzed the living snot out of it), Excel connects data points using beziers for the smoothed lines. Brian presents an analysis in a downloadable file, Smooth_curve_bezier_example_file.xls. I’m not sure the result is much different than for the cardinal spline you cited.


Comment from Lori Miller
Time: Monday, June 1, 2009, 9:43 pm

Jon – i agree that measured data should generally be handled using regression techniques particularly for making forecasts. But the simple trends often inserted into Excel charts can be quite misleading particularly for financial or time series data where other factors are in play such as serial correlation and hidden variable bias.

When mesurement errors are small compared to the chart scale or data is quoted at a fixed level it can make sense to use splines for estimation and there is a large literature on the subject. For example they are used to construct yield curves from bond yields of varying maturities and for modeling empirical distribution functions. Coming from a stats/finance background, i don’t know much about how it works in engineering or other areas.

The conclusions about Excel curves actually followed from Brian’s analysis and i’ve been in contact with him to discuss this. Cardinal splines can be viewed as just a particular type of Bezier curve, restricted to one parameter and using a different set of reference points.


Comment from Jon Peltier
Time: Monday, June 1, 2009, 10:58 pm

“…how it works in engineering…”

In design or in theoretical areas (more R&D than E) you can often use some kind of smoothed lines. I mentioned wind chill; stress concentration factors is another use. As I pointed out before, if you know the shape of the phenomenon being modeled, and as you said, if the measurement errors are relatively small, then go ahead.

I still prefer to see unsmoothed lines, though, so I know clearly where the data is “known”, and I will do the smoothing in my own mind. This is for transferring information, not for making predictions.

“Cardinal splines can be viewed as just a particular type of Bezier curve”

I suspected as much.

Write a comment

I welcome comments from my readers. If you have an opinion on this post, if you have a question or if there is anything to add, I want to hear from you. Whether you agree or disagree, please join the discussion.

If you want to include an image in your comment, post it on your own site or on one of the many free image sharing sites, and include a link in your comment. I'll download your image and insert the necessary html to display the image inline.

Read the PTS Blog Comment Policy.





Subscribe without commenting

Peltier Tech Waterfall Chart Utility Peltier Tech Cluster-Stack Chart Utility Peltier Tech Box and Whisker Chart Utility Peltier Tech Marimekko Chart Utility Peltier Tech Dot Plot Utility Peltier Tech Cascade Chart Utility

Create Excel dashboards quickly with Plug-N-Play reports.