When plotting data in a line or XY chart, it can be very deceptive to use the Smoothed Lines option. Here is a sample XY chart with dummy data, that has smoothed lines, and no markers to help decode the shape of the data.
It’s hardly a better infographic than a kid using chalk on the driveway. You can’t tell which loops and curves are defined by data, and which are mere embellishments.
If markers are added, you can get a feel for the patterns of the data, though the curvee can be a distraction. There is no reason to draw the retrograde loop around x=6. The data also does not merit the local maximum at x=10.7 and y=10.3, nor the local minimum at x=12 and y=5.5.
If the lines are straight (below), you can guess where there are data points anchoring the lines, though you don’t know whether there might be points located along a straight segment.
The best way to represent data, and the only way to be honest about the data, is to use markers for each known point, connected by straight line segments. This is particularly true for measured data which may have uncertainty in the actual values.
This chart shows markers where there are actual points, connects adjacent points with straight lines, and does not make any implicit statements about values between data points.
The following 12-month rolling chart of product sales presents a more practical example.
What were maximum and minimum sales in the previous year’s section of the chart? Did sales really dip below zero this winter? What were sales in April? I don’t think you can even define negative sales, unless you had a lot of returns. The curvy lines give uncertainty to the sales values and falsely indicate trends in the data. The lack of markers even means you can’t tell with any certainty where each month’s data point is even located.
When plotted properly, we can see that the maximum far the seven months of 2008 was 25, in December, and the minimum was 10 in September and October. February and March of this year had no sales, but in April, sales jumped to 25. And we can be as certain of these observations as we are of the original data.
Bob says
Hi Jon,
Since the start of the new year, I’ve been incorporating small multiples of charts in reports from my department. I have found that the size of the marker needs to increase in relation to line and other elements as the overall size of chart is reduced.
The layout I use is 2 x 2 in a word document.
Any thoughts?
Cheers,
Bob
Jon Peltier says
If I use a size 5 marker in a regular chart, I tend to keep it at the same size until it gets very small. Otherwise with the screen resolution its shape is not distinct. If it’s printed at a fine resolution, then the size can be decreased more uniformly.
Michael says
Mr. Peltier:
Regarding your post on Smoothed Lines, it is incorrect to state that straight lines connecting adjacent points do not make any implicit statements about values between data points. In fact, straight lines imply a linear relationship (y = mx + b) governing whatever process is represented by the two adjacent, discrete samples. If the data were, for example, distance from (x) and intensity of (y) a point-source light, a straight line would of course be wrong.
The only honest way to represent data (observations) is to use markers for each known point, with error bounds (usually bars) indicated. Connecting the data (or indicating a best-fit trend among the data), by whatever shape of line, immediately implies an inferred relationship governing the independent and dependent variables.
All this aside, thanks for yet another thought-provoking post; your Excel blog, the local newspaper online and a decent cup of coffee are all that I require to get my day underway!
Jon Peltier says
Michael –
Thanks for commenting. Your point about straight connecting lines is well taken. If there is no particular form of the connecting line between points (i.e., the curve was not coerced into some kind of arbitrary curve), the implication is simply that point B follows point A (or measurement B was taken right after measurement A). It need not follow that any given point between A and B must fall on the line. Straight line segments that zig and zag from point to point do not indicate that any particular trend fluctuates that much, while curves that are carefully crafted to include each point imply more strongly that the behavior is known and follows thee curve.
In addition to your markers and error bars, perhaps you could include a line of best fit. Or if you know the relationship is, for example, an inverse-squared law, then a 1/r² relationship of best fit. Or make a suitable number of appropriately spaced measurements and connect them with straight lines, which taken together suggest such a nonlinear relationship.
Dick Kusleika says
I’m not a charting guy, so ignore everything I say! I think the first question is, what are you trying to convey. If you want to know what sales in April were, then perhaps a chart, smoothed or otherwise, isn’t the right vehicle. If, however, you want to demonstrate that August and Christmas are big selling months and March stinks, then a chart, smoothed or otherwise, seems fine.
I guess my question is: If I want to display imprecise information, like general trends, is it OK to use a chart with a low level of precision?
Jon Peltier says
Dick –
A chart with a low level of precision conveys information better than a table with a low level of precision. Of course, if there are three numbers, then it’s not worth dusting off your seldom-used charting apparatus to make a chart. Michael’s error bars comprise one way to display imprecision, but often the degree of precision isn’t known as well as a standard deviation or range of values. Especially if the topic is financial numbers, or predictions, or predictions of financial numbers.
Thom Mitchell says
Great title for the post, Jon!
(with apologies to the Chiffons?)
Jon Peltier says
Thom – Thanks. I don’t know where I got it from, it just came out. The Chiffons were sweet talkin’, though, weren’t they?
derek says
Regarding the shape of markers with low screen resolution: that’s why I mostly stick to the square marker. Then the shape is less vulnerable to Excel 2003 or lower’s poor symbol handling. I’m hoping 2010 will be an improvement (psst.. Microsoft! there’s this technology called Truetype, you may have heard of it?)
Speaking of which, I sometimes abandon crude markers altogether, in favour of Wingdings or other text symbols. But that has its own hazards, because they don’t always match up nicely with the lines. You can use error bars as crosshairs to check whether any drift is small enough to tolerate.
Or you can use the error bars themselves. A zero-length error bar with no crossbar and a thick line width makes a surprisingly neat small circular dot, just as precisely placed as, but less pixellated than, the circle marker.
Mike Woodhouse says
Plotting data like this (these?) as line charts just begs for someone to add smoothing (almost always a guarantee of lost information) or some inappropriate trend or curve-fit. In pre-computer days we’d join the dots on our paper charts to save ourselves the tedium of hand-colouring the bars; these days we can just change to a column chart and be done with it. Discrete points should be plotted discretely. Something like that.
Lori Miller says
Joining points with straight lines makes more sense than smoothed lines for financial series as the relationships are often not inherently smooth, instead they are modeled on random walks or fractals.
If the underlying relationship is smooth however, smoothed lines will generally be closer to the truth than straight lines (i.e. the error term is smaller) and these kinds of curve are used widely in situations when no particular functional form is assumed. Applications include engineering design, computer animation, typography, non-parametric statistics, fixed-income finance, etc.
The Excel curve is a type of spline with tension applied in parts of the curve where points are relatively close to each other, details can be found in a few threads in excel.charting. For a non-technical description, see:
http://msdn.microsoft.com/en-us/library/ms536358(VS.85).aspx
Jon Peltier says
Hi Lori –
Thanks for joining the discussion. Regarding smooth lines for connecting points, I almost presented an example I made some time ago showing wind chill factors. Smoothed lines made sense in that case, as in your examples of engineering design.
When I wrote this post, I was thinking of measured data, though, not analytical output. As with financial (stochastic or random-walk exercises), measured data already has error built into the values in the chart, and there is no reason to assume the curves that Excel draws through the points will model the data any better than straight lines. If you know the shape of the relationship, go ahead and draw a best fit curve through the measured data using the known relationship.
According to Brian Murphy of Rotating Machinery Analysis, Inc. (who has analyzed the living snot out of it), Excel connects data points using beziers for the smoothed lines. Brian presents an analysis in a downloadable file, Smooth_curve_bezier_example_file.xls. I’m not sure the result is much different than for the cardinal spline you cited.
Lori Miller says
Jon – i agree that measured data should generally be handled using regression techniques particularly for making forecasts. But the simple trends often inserted into Excel charts can be quite misleading particularly for financial or time series data where other factors are in play such as serial correlation and hidden variable bias.
When mesurement errors are small compared to the chart scale or data is quoted at a fixed level it can make sense to use splines for estimation and there is a large literature on the subject. For example they are used to construct yield curves from bond yields of varying maturities and for modeling empirical distribution functions. Coming from a stats/finance background, i don’t know much about how it works in engineering or other areas.
The conclusions about Excel curves actually followed from Brian’s analysis and i’ve been in contact with him to discuss this. Cardinal splines can be viewed as just a particular type of Bezier curve, restricted to one parameter and using a different set of reference points.
Jon Peltier says
“…how it works in engineering…”
In design or in theoretical areas (more R&D than E) you can often use some kind of smoothed lines. I mentioned wind chill; stress concentration factors is another use. As I pointed out before, if you know the shape of the phenomenon being modeled, and as you said, if the measurement errors are relatively small, then go ahead.
I still prefer to see unsmoothed lines, though, so I know clearly where the data is “known”, and I will do the smoothing in my own mind. This is for transferring information, not for making predictions.
“Cardinal splines can be viewed as just a particular type of Bezier curve”
I suspected as much.