In the Calculating Easter post on John Walkenbach’s new Spreadsheet Page blog, John presented a bar chart showing the occurrences of the dates of Easter between 1900 and 2199. He sorted his dates in decreasing order of occurrence (below left). In Gradients, Fills, and Shadows, Oh My I proposed sorting in date order (below right).
There are advantages to both sorting techniques. The sorting by occurrence (left), as in a Pareto chart, makes it easy to see which item has the largest or smallest incidence, and if you need to take a particular action, you can simply start at one end of the list and work your way to the other.
Often when people plot this kind of data, instead of sorting by incidence, they sort in alphabetical order. This is generally not so useful, because alphabetical sorting is rather arbitrary: only the incidence has a quantitative meaning.
Dates, however, have a quantitative meaning, an order all their own. You can use a Pareto sort, as John has, or a date-order sort, as I have. If you are concerned only with the absolute incidences, then the Pareto sort makes sense. If you are interested in the distribution of incidences over your range of dates, then the date-order sort is more useful. It might help you see that certain days of the week or months of the year tend to have higher incidences.
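To make the two orderings concrete, here is a minimal Python sketch. It assumes the widely published anonymous Gregorian algorithm (Meeus/Jones/Butcher) for the date of Easter, which may not be the formula John used in his post; the names gregorian_easter and tally_easter_dates are made up for this illustration.

```python
from collections import Counter


def gregorian_easter(year):
    """Return (month, day) of Western Easter (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return month, day + 1


def tally_easter_dates(first_year, last_year):
    """Count how often each (month, day) is Easter, both end years included."""
    return Counter(gregorian_easter(y) for y in range(first_year, last_year + 1))


counts = tally_easter_dates(1900, 2199)

# Pareto sort: most frequent date first; ties broken by date so the order is repeatable.
pareto_order = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Date-order sort: (month, day) tuples already sort chronologically.
date_order = sorted(counts.items())

# Crude text bar chart in date order, one '#' per occurrence.
for (month, day), n in date_order:
    print(f"{day:2d}-{'Mar' if month == 3 else 'Apr'}  {'#' * n}")
```

Swapping date_order for pareto_order in the final loop gives the other chart from the same counts, which makes it easy to compare what each ordering reveals.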
Another advantage of the date-order sorting (or another numerical sorting in a histogram) is that, if you are pressed for space and compress the chart, you can leave out category labels without deleting information from the chart. In the chart below right, I know that 17-Apr fits right between 16-Apr and 18-Apr. In the incidence sorting below left, I can only guess where 17-Apr fits (between 15-Apr and 30-Mar), and if the incidences changed, I would have to guess again where to find 17-Apr.
I couldn’t think of a good way to plot how the dates move around as the sample size of Easter dates changes. Instead, I made this table showing the incidences of the date of Easter sorted by occurrence, for 100, 200, 300, 400, and 500 years. I highlighted some arbitrary dates, showing how they move around. Some dates stay relatively close to the same position, but some move up and down many rankings. This rearrangement of the scale makes any comparisons meaningless.
In contrast, the dates in the date-order sort stay in the same order, by definition. I can plot the curves for 100 through 500 sample points together, and see that they follow roughly the same distribution, and I can plot the curves separately and see the same thing. Even though there may be some movement from one curve to the other, it is easier to understand the charted behavior.
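The difference between the two scales can be checked with a short Python sketch. It follows one date through the Pareto ranking as the sample grows, next to its position on a fixed date axis. Several things here are assumptions: the samples start in 1900 (the starting year of the table is not stated), Easter is computed with the anonymous Gregorian algorithm, and the helper names are hypothetical.

```python
from collections import Counter


def gregorian_easter(year):
    """Return (month, day) of Western Easter (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return month, day + 1


# Fixed date axis: the 35 possible Western Easter dates, 22-Mar through 25-Apr.
DATE_AXIS = [(3, d) for d in range(22, 32)] + [(4, d) for d in range(1, 26)]


def pareto_rank(counts):
    """Map each date on DATE_AXIS to its 1-based position in a Pareto sort."""
    ordered = sorted(DATE_AXIS, key=lambda d: (-counts[d], d))
    return {d: pos for pos, d in enumerate(ordered, start=1)}


START_YEAR = 1900   # assumption: not stated in the post
WATCHED = (4, 17)   # 17-Apr, an arbitrary date to follow

date_position = DATE_AXIS.index(WATCHED) + 1  # never changes with sample size
rank_by_span = {}
for span in (100, 200, 300, 400, 500):
    years = range(START_YEAR, START_YEAR + span)
    counts = Counter(gregorian_easter(y) for y in years)
    rank_by_span[span] = pareto_rank(counts)[WATCHED]
    print(f"{span} years: Pareto rank {rank_by_span[span]}, "
          f"date-axis position {date_position}")
```

The date-axis position is constant by construction, while the Pareto rank is free to change with every new sample, which is the instability the table above illustrates.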
Another reason why I was interested in the distribution by date was for a comparison of the dates of Easter calculated by Western Christians (i.e., the Roman dates) and those calculated by Orthodox Christians (i.e., the Greek dates). This is part of a topic for an upcoming post, but I’ll illustrate the utility of the date-order scale here.
I can plot the two Easter date distributions either together (first chart below) or in separate panels (second chart below). I can see that the distributions are similar in shape, but offset by about two weeks (the Greeks place Easter about two weeks later on average than do the Romans). I’ve displayed only every third date, but I don’t need the missing ones to understand the distributions. A comparison of two Pareto charts would yield no meaningful observations.
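For readers who want to reproduce this kind of comparison, here is a hedged Python sketch that tallies both sets of dates on one shared date axis. It is not necessarily how the charts above were built. It assumes the anonymous Gregorian algorithm for the Roman dates, Meeus's Julian algorithm for the Greek dates (converted to the Gregorian calendar by adding the gap between the two calendars), and the range 1900 to 2199; the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def western_easter(year):
    """Western (Roman) Easter as a Gregorian date (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return date(year, month, day + 1)


def orthodox_easter(year):
    """Orthodox (Greek) Easter as a Gregorian date (Meeus's Julian algorithm)."""
    a, b, c = year % 4, year % 7, year % 19
    d = (19 * c + 15) % 30
    e = (2 * a + 4 * b - d + 34) % 7
    month, day = divmod(d + e + 114, 31)
    # Days by which the Julian calendar lags the Gregorian one (13 for 1900-2099).
    julian_gap = year // 100 - year // 400 - 2
    return date(year, month, day + 1) + timedelta(days=julian_gap)


years = range(1900, 2200)  # assumption: same span as the first chart
roman = Counter((d.month, d.day) for d in map(western_easter, years))
greek = Counter((d.month, d.day) for d in map(orthodox_easter, years))

# One shared, chronologically sorted axis makes the two tallies directly comparable.
shared_axis = sorted(set(roman) | set(greek))
for month, day in shared_axis:
    print(f"{month:02d}-{day:02d}  Roman {roman[(month, day)]:3d}  "
          f"Greek {greek[(month, day)]:3d}")
```

Because both tallies are indexed by the same sorted (month, day) keys, any shift between the two distributions appears as a horizontal offset, which a pair of independently sorted Pareto lists could not show.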