Sullivan, Ryan, Allan Timmermann, and Halbert White, “Dangers of data mining: The case of calendar effects in stock returns,” Journal of Econometrics 105 (2001), 249-286.
Using the same set of data to both formulate and test a hypothesis introduces data-mining biases. Calendar effects in stock returns are an outstanding instance of data-driven findings. Evaluated correctly, however, these calendar effects are not statistically significant.
Researchers have documented day of the week effects, week of the month effects, month of the year effects, and effects for turn of the month, turn of the year, and holidays, none of which was predicted ex ante by theory. By pure statistical chance, when enough theories are tested on the same set of U.S. publicly-traded common stock returns, some of them are bound to outperform a benchmark, no matter which criteria are used to compare performance.
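The "bound to outperform" point can be made concrete with a little arithmetic. The sketch below assumes the tests are independent and that every null hypothesis is true. Real calendar rules are correlated, so the independence assumption is a simplification, and the function name is only illustrative.

```python
# Chance of at least one spurious "significant" rule when m independent
# tests are each run at level alpha and every null hypothesis is true.
# Independence is an assumption made for illustration only.
def prob_any_false_positive(m: int, alpha: float = 0.05) -> float:
    return 1.0 - (1.0 - alpha) ** m


# 244 is the size of the paper's "reduced universe" of rules.
for m in (1, 10, 50, 244):
    print(m, round(prob_any_false_positive(m), 4))
```

Under these assumptions the probability rises quickly with the number of rules tried, which is why a single nominal p-value says little once a search has taken place.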
This paper uses 100 years of data to examine a “full universe” of 9453 calendar-based investment rules and a “reduced universe” of 244 rules. Each investment strategy is tested jointly with the many other similar strategies. For each null hypothesis of no effect, the paper reports the nominal p-value and White’s reality check p-value; White’s p-value adjusts for the data-mining bias.
Conclusions: Nominal p-values are highly significant for many strategies, but White’s reality check p-values are not significant for any calendar-based strategy.
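The contrast between nominal and reality check p-values can be illustrated with a simplified sketch of a reality-check-style bootstrap. This is not the authors' code. The simulated excess returns, the function names, the smoothing parameter `q`, and the number of bootstrap draws are all assumptions chosen for illustration. The statistic is the best rule's scaled mean excess return, compared with the bootstrap distribution of the best re-centered mean across all rules.

```python
import math
import numpy as np


def stationary_bootstrap_indices(n, q, rng):
    """Resampled time indices built from blocks with geometric lengths (mean 1/q)."""
    restart = rng.random(n) < q
    starts = rng.integers(0, n, size=n)
    idx = np.empty(n, dtype=int)
    idx[0] = starts[0]
    for t in range(1, n):
        # Start a new block with probability q, otherwise continue (wrapping around).
        idx[t] = starts[t] if restart[t] else (idx[t - 1] + 1) % n
    return idx


def reality_check_pvalue(f, n_boot=500, q=0.1, seed=0):
    """f: (n_periods, n_rules) array of rule returns minus benchmark returns."""
    rng = np.random.default_rng(seed)
    n = f.shape[0]
    f_bar = f.mean(axis=0)
    v_obs = math.sqrt(n) * f_bar.max()  # performance of the best rule
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, q, rng)
        # Re-center each rule at its sample mean, then take the best rule.
        v_star = math.sqrt(n) * (f[idx].mean(axis=0) - f_bar).max()
        exceed += v_star > v_obs
    return exceed / n_boot


def best_nominal_pvalue(f):
    """One-sided normal-approximation p-value of the best rule, ignoring the search."""
    n = f.shape[0]
    t = math.sqrt(n) * f.mean(axis=0) / f.std(axis=0, ddof=1)
    return 0.5 * math.erfc(t.max() / math.sqrt(2.0))


rng = np.random.default_rng(42)
excess = rng.normal(0.0, 0.01, size=(1000, 244))  # pure noise: no rule has a true edge
print("best nominal p-value:", best_nominal_pvalue(excess))
print("reality check p-value:", reality_check_pvalue(excess))
```

Because the simulated rules have no true edge, a small nominal p-value for the best rule reflects only the search across 244 candidates. The bootstrap p-value accounts for that search by asking how large the best rule's performance tends to be by chance.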
Mayer, Thomas, “Data mining: a reconsideration,” Journal of Economic Methodology 7:2 (2000), 183-194.
- Data Mining
- In the good sense, “data mining” means fitting multiple econometric specifications (in the simple case, multiple OLS regressions) to the data. This is both reasonable and scientific.
- In the bad sense, many economists implicitly equate data mining with running many regressions and then only reporting the one(s) that “work.”
- It is important to report any results that are contrary to the hypotheses, even if they seem very unlikely.
- Unbiased data mining means omitting results only for the following reasons:
  - The results fail statistical diagnostic tests
  - Their statistical test results are inferior to those of the reported results
  - They support the reported results
  - They are obviously wrong (such as a significantly negative coefficient that collective experience says should be positive)
- The only case where biased data mining (purposefully omitting all contrary results) is acceptable is when the author is trying to show only that a hypothesis might be correct. In this case, the author’s intent should be clearly stated.
- Contrary evidence can usually be found against any hypothesis or theory, so sometimes all we can do is show that we might be right.
- Even unbiased data mining may be unacceptable if the researcher chooses diagnostic tests and/or significance cutoff levels with which his readers may not agree.
- One possibility is for researchers to simply report more specifications, and for readers and referees to require them.
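
The difference between reporting only the regression that "works" and reporting more specifications can be sketched with simulated data. Everything here is hypothetical: the outcome and the 40 candidate regressors are independent noise, and the helper `slope_t_stat` is an illustrative name, not something taken from the paper.

```python
import numpy as np


def slope_t_stat(y, x):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)  # residual variance, 2 estimated coefficients
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(7)
n_obs, n_specs = 200, 40
y = rng.normal(size=n_obs)                      # outcome unrelated to every regressor
candidates = rng.normal(size=(n_obs, n_specs))  # hypothetical alternative regressors
t_stats = np.array([slope_t_stat(y, candidates[:, j]) for j in range(n_specs)])

# Biased report: only the specification that "works" best.
best = int(np.argmax(np.abs(t_stats)))
print(f"reported alone: spec {best}, t = {t_stats[best]:.2f}")

# Fuller report: a summary covering every specification that was run.
n_sig = int(np.sum(np.abs(t_stats) > 1.96))
print(f"{n_sig} of {n_specs} specifications have |t| > 1.96")
```

Reporting the count alongside the best specification lets a reader judge how much of the headline result could come from the search itself.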