In my last post, Moving Averages, I made a bold comment that seasonally adjusted economic quantities used to intrigue me until I learned that they are merely moving averages. I said this only semi-seriously, and Rick Williams commented that seasonally adjusted averages really are seasonally adjusted, taking into account day of the week, month of the years, and other factors. This makes sense; otherwise month-to-month variability due to fluctuating numbers of weekends in each month would disrupt a year-over-year analysis by month. Month-by-month analyses would suffer because some months (e.g., November and especially December) show strong effects of holidays.
Rick has provided a link to FAQs on Seasonal Adjustment from the US Census Bureau. I will follow up with this and other sources.
I’ve been using my blog stats, courtesy WPStats, to discuss various charting techniques. I’ll use this same database to explore approaches to seasonal adjustment. The raw stats with a 7-day moving average are shown here.
I’ve used a pivot table to deconstruct the stats two ways. In this first view, the data is very much like the time series above, with an accentuated division between weeks. I find this helps me distinguish nonconforming points, such as the dip on Thursday, 3 April, the day I switched hosting companies, and another dip on Monday, 26 May, the Memorial Day holiday. I have no explanation for the dip on Friday, 25 April. I believe the spike on Thursday, 8 May, is due to a controversial post, Changes to Charting in Excel 2007, which received many comments.
Using formulas based on the pivot results, I’ve added a line for the weekly average across each week’s data.
I’ve plotted the orthogonal view of the pivot table results below. Rather than grouping by week, I have grouped by day of the week, so the trend for each day is shown distinct of the surrounding days. I’ve overlaid the average for each day of the week on the chart, as well as a simple linear regression fit for each day. The fits are not particularly tight (R² ranges from 0.44 to 0.87), due to fluctuation from week to week, but they all show strong growth over time.
Alert readers may recognize this chart as an example of a Cycle Plot, which Excel MVP Charley Kyd of ExcelUser.com wrote about in Create Cycle Plots in Excel to Chart Seasonal Sales Data and Naomi Robbins wrote about in Introduction to Cycle Plots.
The purpose of seasonally adjusting the data would be to show a better estimation of the traffic on the blog, after removal of cyclical variation (day of the week, month of the year) and other predictable variation (Memorial Day and other holidays). The growth shown by the horizontal lines in the weekly chart above would probably look a bit smoother, the dip for 5/11-5/17 would still be evident, and the spike on 5/8 and the dips on 4/3, 4/25, and 5/26 would also be apparent, and we could look at these dates for contributing causes to their deviations.
I decided to take a cut at seasonally adjusting my rather small data set. Warning: I’m making this up as I go along. It may be completely wrong, but I’m on a roll.
I first decided to smooth out the variation from week to week, by using the linear regressions for each day. These gave me the predicted values shown with the raw values below.
When these are shown by week, they do smooth out the variations well.
When the original (raw) data is removed, we see that the smoothed data shows variation in the shape of each week’s data.
To smooth this out, I decided to adjust the slopes. I decided to try normalizing each day’s slope to each day’s average value. The new charts are shown below.
The new smoothed day to day values show the same shape across each week.
Very nice, this is all completely normalized, and we see no sense of the non-cyclical variation. To show this, I’ll look at the residuals, that is, the difference between the actual and the processed data.
Because the magnitudes of weekdays are greater than weekends, the weekday residuals may be proportionally too large. An alternative view shows the ratio of the actual to the smoothed data.
Both plots show similar deviations. We can see the dips when the hosting company was switched and when people stayed off the internet on Memorial Day; we can see the spike following the Excel 2007 Charting post. However, this was just an exploratory look at the data, and when I revisit it with a rigorous approach based on the census bureau’s article, my analysis is sure to be completely different.
derek says
Michael Middleton’s Data Analysis Using Microsoft Excel also contains a chapter with several good methods of time series forecasting. Naturally they do the opposite of seasonally adjust, but you can get the same effect by removing the seasonal component of the forecast.
Jon Peltier says
Derek – Thanks for reminding me to look in my library. I own a few books on data analysis in Excel (not Mike’s, though I respect what work he’s shared on his web site and in the Excel community). My next step, after the unguided (misguided?) analysis above, should be to review the approaches presented in these books, and in the links from the Census Bureau and other authorities. Then hopefully a bit wiser, try again.
len says
Maybe it would be useful to drop local outliers, in your case probably by distance from the trend line in orthogonal view (third chart). There is no reason for May 8th and the likes to be included in this analysis..
Jon Peltier says
Len –
When I return to this, I will have a mechanism for deciding whether some points are outliers and should be excluded. Special causes (switching hosts) can be excluded, maybe so can a couple other days if they are beyond the confidence intervals computed for the rough fits of the cycle plot.