I have a few rules for charting measured data:
- Measured data should be plotted on an XY chart.
  A line chart is appropriate if the X values are dates and a date-scale X axis is used.
- Measured data should be plotted with markers.
- Measured data may have lines connecting points if it makes sense.
  It makes sense to use connecting lines if the X variable is increasing monotonically, for example, if you are taking measurements at regular time intervals or at regular increments of the X value.
- Lines connecting measured data points should be straight, never smoothed.
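The monotonic-X condition in the rules above is easy to check before deciding whether to connect points. Here is a minimal Python sketch; the helper name `x_is_increasing` and the sample readings are made up for illustration.

```python
def x_is_increasing(xs, strict=True):
    """Return True if the X values never go backwards (hypothetical helper).

    With strict=False, repeated X values are allowed, which is what a
    step function drawn with straight lines needs.
    """
    pairs = list(zip(xs, xs[1:]))
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)

# Made-up sample data: readings at regular increments vs. unordered readings.
regular = [0.0, 0.5, 1.0, 1.5, 2.0]
scattered = [1.2, 0.3, 2.5, 0.9]

print(x_is_increasing(regular))    # connecting lines are reasonable
print(x_is_increasing(scattered))  # better to show markers only
```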
In a recent post, I wrote:
You should avoid charts with smoothed lines, especially without markers, because the smoothed lines may misrepresent the actual data being plotted.
In response, Tim Mayes sent me this pair of charts (which he called “Smoothed-XY-Charts-Are-Evil.png”) and his commentary:
This picture shows two XY Scatter charts of the same data. This is supposed to be a step function as shown on the left side. If you choose a smoothed XY scatter, instead of Scatter with Straight Lines, then you get the chart on the right. All too often my students will turn in an example that looks like the one on the right, even though they have seen numerous examples of step functions and I’ve lectured extensively on them.
I’ve seen far more bad charts with smoothed lines than good ones. One case I recall involved measurements of voltage that we knew ranged between 0 and 10. No measurements fell outside this range, yet the smoothed line on the chart extended beyond it.
When the markers are omitted, it looks like the data ranges from about -1 to about 11. I’ve had people ask me how to determine the height of the peaks, which makes little sense:
- There is no evidence in the data that the curve should extend to where it is drawn.
- The smooth curve changes when axis scale values change and when the size of the chart itself changes.
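The overshoot is easy to reproduce numerically. The sketch below uses a uniform Catmull-Rom spline as a stand-in for a chart's smoothing; that choice is an assumption, not a claim about how any particular charting program draws its smoothed lines, and the voltage readings are invented. It works on the Y values only, so it does not model the dependence on axis scale or chart size.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom interpolation between p1 and p2 for t in [0, 1]."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )

def smoothed_values(ys, steps=20):
    """Sample the spline through ys, repeating the end points as padding."""
    padded = [ys[0]] + list(ys) + [ys[-1]]
    out = []
    for i in range(1, len(padded) - 2):
        for k in range(steps):
            out.append(
                catmull_rom(padded[i - 1], padded[i], padded[i + 1],
                            padded[i + 2], k / steps)
            )
    out.append(ys[-1])  # close the curve on the last measured point
    return out

# Invented readings, all between 0 and 10 volts.
voltages = [0, 10, 10, 0, 0, 10, 10, 0]
curve = smoothed_values(voltages)

print("measured range:", min(voltages), "to", max(voltages))
print("smoothed range:", min(curve), "to", max(curve))
```

With these readings, every measurement lies between 0 and 10, but the interpolated curve bulges past both limits wherever two equal readings sit next to a jump. The peaks it draws are a product of the interpolation, not of the data.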
One case where smoothed lines may be acceptable is when the chart shows known (not measured) behavior over a range, generally when values are calculated from a theoretically or empirically derived function. One example is this chart I did years ago (back in the 1990s) showing wind chill, that is, perceived temperature based on actual temperature and wind velocity.
When the plot fills a sheet of paper, the smoothed lines work well. When it is shrunk to fit a column on a web page (as above), the lines look irregular. A better approach would be to calculate data at closer intervals, so that many shorter straight lines approximate the desired smooth curve.
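Here is a rough Python sketch of the closer-intervals idea. It uses the US National Weather Service wind chill formula adopted in 2001 (temperature in °F, wind speed in mph) purely as a convenient smooth function; that is an assumption, and not necessarily the formula behind the chart above. The helper names and the 5 mph and 0.5 mph spacings are also just illustrative.

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS (2001) wind chill in deg F; assumed here, not taken from the chart."""
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v

def sample(func, start, stop, step):
    """Evaluate func at evenly spaced X values from start to stop inclusive."""
    n = round((stop - start) / step)
    return [(start + i * step, func(start + i * step)) for i in range(n + 1)]

def max_midpoint_error(func, points):
    """Largest gap between the true curve and the straight segment,
    measured at the midpoint of each segment."""
    worst = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        x_mid = (x0 + x1) / 2
        worst = max(worst, abs(func(x_mid) - (y0 + y1) / 2))
    return worst

def curve_at_zero_f(wind_mph):
    """Wind chill versus wind speed at an air temperature of 0 deg F."""
    return wind_chill_f(0.0, wind_mph)

coarse = sample(curve_at_zero_f, 5, 60, 5)    # one point every 5 mph
fine = sample(curve_at_zero_f, 5, 60, 0.5)    # one point every 0.5 mph

print(len(coarse), "points, midpoint error:", max_midpoint_error(curve_at_zero_f, coarse))
print(len(fine), "points, midpoint error:", max_midpoint_error(curve_at_zero_f, fine))
```

Plotting the finely spaced points with straight connecting lines and no markers gives a curve that looks smooth at any chart size, and every point on it comes from the function rather than from the chart's interpolation.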