PTS Blog

Creative Commons License
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Suicide Rates in Japan

by Jon Peltier
Peltier Technical Services, Inc., Copyright © 2008.

In Can You Improve this Graph Showing Suicide Rates in Japan?, Nathan of FlowingData shows a chart of long-term unemployment rates and suicide rates in Japan. The chart comes from Suicide Epidemic in Japan.

What’s wrong with this chart (which I’ve reproduced below)?

  • Axis label is faulty. “Japan and Suicide Rate” should be “Japan Long Term Unemployment and Suicide Rates”.
  • Rates are not defined. Suicide Rate is number of suicides per 100,000 population. Long Term Unemployment Rate is percentage of total unemployed who have been unemployed for over twelve months.
  • Frequency of data is different. Using markers as in my version, it is clear that suicide rates are reported with much less frequency than long term unemployment. At least they didn’t use smoothed lines*.
  • Such unrelated rates should not be plotted on the same axis. This chart makes it look like the rates were about the same in 1980, then unemployment dropped below suicides, then it rose above suicides, then both remained the same from about 1992 through 2000.

*Formatting the chart with smoothed lines and no markers gives the reader no indication that the data is reported on a different time scale, and may make it look as though the correlation is even closer.

The first step is to address the overlapping scales. The chart above using one scale is misleading. I can add a secondary axis, and adjust the relative scales of the two series to mislead the reader in any way I want.

The only way to make the scales free from confusion is to plot the series on completely different, non-overlapping scales. I’ve split the chart into two panels to show the series separately on their own scales. It actually looks like the rise in suicides is leading the rise in long term unemployment. If I thought higher unemployment led to increased suicides, I’d expect unemployment to lead.

If we’re looking for a relationship between the two variables, we should plot them on an XY chart. I’ve labeled the points with year, so that one can trace the evolution of the relationship. If I do the math, I get a correlation (R²) of 0.60, not a very strong relationship. By eye, I can see two regions in the chart.

  • Up to 1995, when both suicides and unemployment were low, there is a very good negative relationship between suicides and unemployment. Is the number of suicides great enough to reduce the number of long term unemployed?
  • From 2000 onward, there seems to be a constant suicide rate despite the large increase in long-term unemployment.
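For readers who want to check a figure like the R² above outside of Excel, here is a minimal sketch of the calculation in Python. The function name and the sample numbers are placeholders for illustration only; they are not the Japanese data, which is not reproduced in this post.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy * sxy / (sxx * syy)


# PLACEHOLDER values for illustration, not the actual rates
unemployment = [1.0, 2.0, 3.0, 4.0, 5.0]
suicides = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(r_squared(unemployment, suicides), 2))
```

Calling the same function on a subset of the points (for example, only the years up to 1995) would give the R² for one region of the chart.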

Six data points isn’t much to base a hypothesis on, so I made a (probably invalid) attempt to rectify this, by interpolating suicide rates for the in-between years with no data. The chart looks very much like the one above; I’ve used filled markers to denote actual data, and unfilled for interpolated points.
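The interpolation can be sketched as follows, assuming straight-line interpolation between the reported years (the post does not depend on any particular method, so treat this as one plausible choice). The function name and the sample values are hypothetical, not the real suicide rates. Each row carries a flag that says whether the point is actual or interpolated, which maps onto the filled and unfilled markers.

```python
def interpolate_years(known):
    """Fill in every year between the first and last reported year.

    known: dict mapping year -> reported value.
    Returns a list of (year, value, is_actual) tuples in year order.
    """
    years = sorted(known)
    if len(years) < 2:
        raise ValueError("need at least two reported years")
    rows = []
    for y0, y1 in zip(years, years[1:]):
        v0, v1 = known[y0], known[y1]
        for year in range(y0, y1):
            frac = (year - y0) / (y1 - y0)
            # Only the first year of each segment is a reported value
            rows.append((year, v0 + frac * (v1 - v0), year == y0))
    last = years[-1]
    rows.append((last, known[last], True))
    return rows


# PLACEHOLDER values for illustration, not the actual suicide rates
sample = {1980: 10.0, 1985: 20.0, 1987: 18.0}
for year, value, is_actual in interpolate_years(sample):
    marker = "filled" if is_actual else "unfilled"
    print(year, round(value, 2), marker)
```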

The correlation plot looks pretty much the same, though a bit more convoluted. I did not bother calculating correlation coefficients, as that would stretch the validity of this exercise beyond my comfort zone.

Comments



Comment from derek
Time: Saturday, August 2, 2008, 4:43 pm

I can add a secondary axis, and adjust the relative scales of the two series to mislead the reader in any way I want.

That is the problem with secondary axes. If this data set had a better correlation, you could use the linear least-squares fit to assign a non-arbitrary set of values to the primary and secondary axes, such that the means coincided and the scales were in proportion to the slope of the trend line.

As well as being non-arbitrary (so you can’t be accused of cherry picking), this has the extra advantage of being the scaling that produces the strongest visual impression of correlation. It is, after all, the fit with the least square deviation. So it’s a deceiver’s dream! Fortunately, the scalings also look really suspicious with their odd decimal fractions.

Kaiser Fung of Junk Charts describes a similar technique in “The eyeball test”.
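One reading of derek's suggestion, sketched in Python: fit the secondary series against the primary series by least squares, then run the primary axis limits through the fitted line to get the secondary axis limits. Because the fitted line passes through both means, the means land at the same height. The function names and sample values are hypothetical, and this is an interpretation of the comment rather than code from derek.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("primary series is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def secondary_axis_limits(primary_limits, xs, ys):
    """Map the primary axis limits through the fitted line.

    xs is the series on the primary axis and ys the series on the
    secondary axis, paired by year. A negative slope returns the
    limits in reversed order, i.e. an inverted secondary axis.
    """
    slope, intercept = fit_line(xs, ys)
    low, high = primary_limits
    return intercept + slope * low, intercept + slope * high


# PLACEHOLDER values for illustration, not the actual rates
primary = [1.0, 2.0, 3.0, 4.0, 5.0]
secondary = [2.0, 1.0, 4.0, 3.0, 5.0]
low, high = secondary_axis_limits((0.0, 6.0), primary, secondary)
print(round(low, 2), round(high, 2))
```

Even with round numbers on the primary axis, the secondary limits come out as awkward fractions, which is the suspicious look derek mentions.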


Comment from Juan Orozco
Time: Sunday, August 3, 2008, 1:49 am

Good analysis on the Japan unemployment vs. suicide rate. One test I normally run is the F test. The R² just says what % of the error (sum of squared errors) are explained by the regression. But we may be biased to reject lower R²’s (or the underlying regression) in large datasets where, in fact, they should be considered. Or accept R²’s in small datasets, when they should be rejected. A better test is the F-test. It answers “what are the chances of this relationship being a random behavior”. As we know, in statistics, we can reject or accept a relationship, depending on our level of confidence. So if the F test returns 99%, and your confidence level is 95%, you can accept there is a linear relationship.
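As a rough illustration of the F test for a simple regression, the F statistic can be computed directly from R² and the number of points. The sketch below assumes the R² of 0.60 quoted in the post was calculated from the six years that have suicide data; the post does not state this explicitly. The function name is hypothetical, and the critical value is the standard tabulated 5% value for F with 1 and 4 degrees of freedom.

```python
def f_statistic(r2, n, predictors=1):
    """F statistic for a linear regression, from R² and sample size.

    Returns (F, model degrees of freedom, residual degrees of freedom).
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    df_model = predictors
    df_resid = n - predictors - 1
    if df_resid <= 0:
        raise ValueError("not enough points for this many predictors")
    f_value = (r2 / df_model) / ((1 - r2) / df_resid)
    return f_value, df_model, df_resid


# ASSUMPTION: R² = 0.60 from the post, computed on n = 6 points
f_value, df1, df2 = f_statistic(0.60, 6)

# Tabulated 5% critical value for F(1, 4), about 7.71
F_CRIT_1_4 = 7.71

print(round(f_value, 2), df1, df2)
print("significant at 95%:", f_value > F_CRIT_1_4)
```

Under that assumption F comes out at about 6.0 on 1 and 4 degrees of freedom, which falls short of the 5% critical value, so the linear relationship would not pass at the 95% confidence level.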
