In Can You Improve this Graph Showing Suicide Rates in Japan?, Nathan of FlowingData shows a chart of long-term unemployment rates and suicide rates in Japan. The chart comes from Suicide Epidemic in Japan.
What’s wrong with this chart (which I’ve reproduced below)?
- Axis label is faulty. “Japan and Suicide Rate” should be “Japan Long Term Unemployment and Suicide Rates”.
- Rates are not defined. Suicide Rate is number of suicides per 100,000 population. Long Term Unemployment Rate is percentage of total unemployed who have been unemployed for over twelve months.
- Frequency of data is different. Using markers as in my version, it is clear that suicide rates are reported with much less frequency than long term unemployment. At least they didn’t use smoothed lines*.
- Such unrelated rates should not be plotted on the same axis. This chart makes it look like the rates were about the same in 1980, then unemployment dropped below suicides, then it rose above suicides, then both remained the same from about 1992 through 2000.
*Formatting the chart with smoothed lines and no markers gives the reader no indication that the data is reported on a different time scale, and may make it look as though the correlation is even closer.
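The two rate definitions in the list above measure different things in different units, which a small sketch makes concrete. The function and parameter names here are hypothetical, and the numbers in the usage lines are round illustrative values, not Japan's actual figures.

```python
def suicide_rate_per_100k(suicides: int, population: int) -> float:
    """Number of suicides per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return suicides * 100_000 / population


def long_term_unemployment_rate(unemployed_over_12_months: int,
                                total_unemployed: int) -> float:
    """Percentage of all unemployed who have been unemployed for over twelve months."""
    if total_unemployed <= 0:
        raise ValueError("total_unemployed must be positive")
    return unemployed_over_12_months * 100 / total_unemployed


# Illustrative inputs only (assumed values, not real data).
print(suicide_rate_per_100k(20, 100_000))    # a count per 100,000 people
print(long_term_unemployment_rate(25, 100))  # a percentage of the unemployed
```

One result is a count per 100,000 people and the other is a percentage of a subgroup, so the fact that the two numbers fall in a similar range says nothing about how they relate.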
The first step is to address the overlapping scales. The chart above, using one scale, is misleading. I could add a secondary axis, but then I could adjust the relative scales of the two series to mislead the reader in any way I want.
The only way to make the scales free from confusion is to plot the series on completely different, non-overlapping scales. I’ve split the chart into two panels to show the series separately on their own scales. It actually looks like the rise in suicides is leading the rise in long term unemployment. If I thought higher unemployment led to increased suicides, I’d expect unemployment to lead.
If we’re looking for a relationship between the two variables, we should plot them on an XY chart. I’ve labeled the points with year, so that one can trace the evolution of the relationship. If I do the math, I get an R² (the square of the correlation coefficient) of 0.60, not a very strong relationship. By eye, I can see two regions in the chart.
- Up to 1995, when both suicides and unemployment were low, there is a very good negative relationship between suicides and unemployment. Is the number of suicides great enough to reduce the number of long term unemployed?
- From 2000 onward, there seems to be a constant suicide rate despite the large increase in long-term unemployment.
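For readers who want to "do the math" themselves, here is a minimal sketch of the calculation, assuming the R² quoted above is the square of Pearson's correlation coefficient between the paired points (which is what a straight-line fit of one variable on the other reports). The function names are hypothetical, and the lists in the usage line are synthetic values chosen to make the arithmetic easy to check, not the Japanese data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Square of Pearson's r; equals R² for a one-variable linear fit."""
    return pearson_r(xs, ys) ** 2


# Synthetic example (assumed values): r is 0.8, so R² is 0.64.
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

Squaring discards the sign, so a single R² over all years cannot distinguish the negative relationship in the early region from the flat one in the later region. Running `pearson_r` separately on each region would keep that distinction, though with so few points per region the result should be read cautiously.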
Six data points aren’t much to base a hypothesis on, so I made a (probably invalid) attempt to rectify this by interpolating suicide rates for the in-between years with no data. The chart looks very much like the one above; I’ve used filled markers to denote actual data, and unfilled for interpolated points.
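The interpolation method isn't important to the argument, but a straight-line fill between neighboring reported years is the simplest choice, and the sketch below assumes it. The function name and the values in the usage lines are hypothetical. Each result carries a flag marking whether it is a reported or an interpolated value, mirroring the filled and unfilled markers.

```python
def interpolate_years(known):
    """Linearly fill in every year between the first and last reported year.

    known: dict mapping year -> reported value.
    Returns a dict mapping year -> (value, is_actual).
    """
    years = sorted(known)
    if len(years) < 2:
        raise ValueError("need at least two reported years")
    result = {}
    # Walk each pair of consecutive reported years and fill the gap between them.
    for y0, y1 in zip(years, years[1:]):
        v0, v1 = known[y0], known[y1]
        for year in range(y0, y1):
            frac = (year - y0) / (y1 - y0)
            result[year] = (v0 + frac * (v1 - v0), year == y0)
    # The last reported year is not covered by the loop above.
    result[years[-1]] = (known[years[-1]], True)
    return result


# Assumed example values, not real data.
filled = interpolate_years({2000: 10.0, 2004: 18.0})
for year in sorted(filled):
    value, is_actual = filled[year]
    print(year, value, "actual" if is_actual else "interpolated")
```

Interpolated points lie on straight lines between the reported ones by construction, so they add no new information about the relationship; they only make the existing points easier to follow across years.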
The correlation plot looks pretty much the same, if a bit more convoluted. I did not bother calculating correlation coefficients, as that would stretch the validity of this exercise beyond my comfort zone.