Dangers of data mining: The case of calendar effects in stock returns

Sullivan, Ryan, Allan Timmermann, and Halbert White, “Dangers of data mining: The case of calendar effects in stock returns,” Journal of Econometrics 105 (2001), 249-286.

Using the same data set both to formulate and to test a hypothesis introduces data-mining bias.  Calendar effects in stock returns are a prominent example of such data-driven findings.  Once the data-mining bias is properly accounted for, however, these calendar effects are not statistically significant.

Researchers have documented day-of-the-week, week-of-the-month, and month-of-the-year effects, as well as turn-of-the-month, turn-of-the-year, and holiday effects, none of which was predicted ex ante by theory.  When enough rules are tested on the same set of U.S. publicly traded common stock returns, some of them are bound to outperform a benchmark by pure statistical chance, whatever criterion is used to compare performance.
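To make the "pure statistical chance" point concrete, here is a minimal simulation sketch; it is not the paper's method. It assumes a single return series with no true effect and a set of hypothetical day-selection rules (each flags a random ~20% of days, standing in for something like "Mondays" or "turn of the month"). Each rule is tested on its own with a normal-approximation test of "mean return on flagged days equals mean return on other days". The function names and parameters (`snoop_pure_noise`, `n_days`, `n_rules`) are illustrative assumptions.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def snoop_pure_noise(n_days=1000, n_rules=200, seed=0):
    """Test many hypothetical rules on ONE return series with no real effect.

    Each rule flags a random subset of days and compares the mean return on
    flagged days with the mean return on all other days. Returns the nominal
    (unadjusted) p-value of every rule.
    """
    rng = random.Random(seed)
    # Simulated daily returns with zero true calendar effect.
    returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    p_values = []
    for _ in range(n_rules):
        flags = [rng.random() < 0.2 for _ in range(n_days)]
        inside = [r for r, f in zip(returns, flags) if f]
        outside = [r for r, f in zip(returns, flags) if not f]
        m_in, v_in = mean_var(inside)
        m_out, v_out = mean_var(outside)
        # Difference in means divided by its (Welch-style) standard error.
        z = (m_in - m_out) / math.sqrt(v_in / len(inside) + v_out / len(outside))
        p_values.append(two_sided_p(z))
    return p_values


if __name__ == "__main__":
    ps = snoop_pure_noise()
    hits = sum(p < 0.05 for p in ps)
    print(f"{hits} of {len(ps)} rules 'significant' at 5%; best p = {min(ps):.4f}")
```

Even though no rule has any real edge, roughly 5% of them clear the nominal 5% threshold, and the best one looks impressive when reported in isolation.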

This paper uses 100 years of data to examine a “full universe” of 9453 calendar-based investment rules and a “reduced universe” of 244 rules.  Each strategy is tested jointly with the many other similar strategies in its universe.  For each null hypothesis of no effect, the authors report both the nominal p-value and White’s reality check p-value; only the latter adjusts for the data-mining bias.
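The adjustment can be sketched as follows. This is a simplified illustration of the reality check idea (White's bootstrap test of whether the best rule in a universe beats the benchmark), not the authors' code. It assumes the caller supplies `perf[k][t]`, the performance of rule k relative to the benchmark in period t (for example, an excess return), and it uses the Politis-Romano stationary bootstrap with a hypothetical smoothing parameter `q=0.1`. The statistic is the best rule's scaled mean, and the bootstrap recentres every rule at its own sample mean. The same resampled indices are applied to all rules, which preserves the dependence between rules evaluated on the same data.

```python
import math
import random


def stationary_bootstrap_indices(n, q, rng):
    """Stationary bootstrap: blocks with geometric lengths, mean length 1/q."""
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        if rng.random() < q:
            idx.append(rng.randrange(n))   # start a new block at a random point
        else:
            idx.append((idx[-1] + 1) % n)  # continue the block, wrapping around
    return idx


def reality_check_pvalue(perf, n_boot=500, q=0.1, seed=0):
    """Simplified reality-check sketch.

    perf[k][t] is the (assumed) performance of rule k over the benchmark at
    time t. Returns (V, p) where V = max_k sqrt(n) * mean_k and p is the share
    of bootstrap draws whose recentred maximum exceeds V.
    """
    n = len(perf[0])
    rng = random.Random(seed)
    means = [sum(f) / n for f in perf]
    v = max(math.sqrt(n) * m for m in means)
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, q, rng)  # shared across rules
        v_star = max(
            math.sqrt(n) * (sum(f[i] for i in idx) / n - m)
            for f, m in zip(perf, means)
        )
        if v_star > v:
            exceed += 1
    return v, exceed / n_boot


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical universe of 20 rules with no true edge over the benchmark.
    noise_rules = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(20)]
    v, p = reality_check_pvalue(noise_rules, n_boot=200)
    print(f"best-rule statistic {v:.2f}, reality check p-value {p:.3f}")
```

The key design choice is that the p-value is computed for the maximum over the whole universe, so a rule that merely looks best among many is judged against the distribution of the best rule under the null, not against the distribution of a single pre-specified rule.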

Conclusions:  Nominal p-values are highly significant for many strategies, but White’s reality check p-values are not significant for any calendar-based strategy.