Dangers of data mining: The case of calendar effects in stock returns

Sullivan, Ryan, Allan Timmermann, and Halbert White, “Dangers of data mining:  The case of calendar effects in stock returns,” Journal of Econometrics 105 (2001), 249-286.

Using the same set of data to both formulate and test a hypothesis introduces data-mining biases.  Calendar effects in stock returns are an outstanding instance of data-driven findings.  Evaluated correctly, however, these calendar effects are not statistically significant.

Researchers have documented day of the week effects, week of the month effects, month of the year effects, and effects for turn of the month, turn of the year, and holidays, none of which was predicted ex ante by theory.  By pure statistical chance, when enough theories are tested on the same set of U.S. publicly-traded common stock returns, some of them are bound to outperform a benchmark, no matter which criteria are used to compare performance.

This paper uses 100 years of data to examine a “full universe” of 9453 calendar-based investment rules, and a “reduced universe” of 244 rules.  Each investment strategy is tested jointly with many other similar strategies.  The authors report nominal p-values and White’s reality check p-value for each null hypothesis of no effect.  White’s p-value adjusts for the data-mining bias.

Conclusions:  Nominal p-values are highly significant for many strategies, but White’s reality check p-values are not significant for any calendar-based strategy.
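
The gap between nominal and reality check p-values can be illustrated with a small simulation.  The sketch below is a simplified version of the reality check idea, not the paper's procedure: it resamples time periods independently (an assumption made here for brevity, which ignores serial dependence in returns), and the function names and the simulated rule universe are hypothetical.

```python
from math import erfc, sqrt

import numpy as np


def best_nominal_pvalue(perf):
    """One-sided p-value of the best rule, ignoring that it was searched for."""
    T = perf.shape[0]
    t_stats = np.sqrt(T) * perf.mean(axis=0) / perf.std(axis=0, ddof=1)
    return 0.5 * erfc(t_stats.max() / sqrt(2))  # normal approximation


def reality_check_pvalue(perf, n_boot=500, seed=0):
    """Bootstrap p-value for the best of K rules tested jointly.

    perf[t, k] is the period-t performance of rule k in excess of the
    benchmark. Simplification: periods are resampled independently.
    """
    rng = np.random.default_rng(seed)
    T = perf.shape[0]
    mean_perf = perf.mean(axis=0)
    best_stat = np.sqrt(T) * mean_perf.max()
    exceed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, T, size=T)  # resample time periods
        boot_mean = perf[idx].mean(axis=0)
        # Re-centre each rule so the bootstrap world has no true effect.
        boot_stat = np.sqrt(T) * (boot_mean - mean_perf).max()
        exceed += boot_stat > best_stat
    return exceed / n_boot


# Hypothetical universe: 200 rules that are pure noise.
rng = np.random.default_rng(42)
noise_rules = 0.01 * rng.standard_normal((1000, 200))
print("best nominal p-value:", best_nominal_pvalue(noise_rules))
print("reality check p-value:", reality_check_pvalue(noise_rules))
```

With many noise rules, the best rule's nominal p-value will usually look small, while the joint p-value accounts for the search across the whole universe.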

Stock Market Prices do not Follow Random Walks: Evidence from a Simple Specification Test

Lo, Andrew W., and A. Craig MacKinlay, “Stock Market Prices do not Follow Random Walks:  Evidence from a Simple Specification Test,” The Review of Financial Studies, Vol 1 No 1 (1988), 41-66.

Most tests of the efficient market hypothesis have assumed that common stock returns follow a random walk.  However, some papers, including this one, have presented evidence against the random walk hypothesis.

Methods & Data:  Lo & MacKinlay use weekly return data from September 6, 1962 to December 26, 1985.  The test relies on the property of a random walk that the variance of its increments is linear in the sampling interval.  In other words, the variance of monthly returns should be about four times the variance of weekly returns.
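
A minimal sketch of the variance comparison described above, assuming a plain array of one-period returns.  The function name is hypothetical, and the sketch omits the bias corrections and standard errors that a formal test would need.

```python
import numpy as np


def variance_ratio(returns, q):
    """Variance of overlapping q-period returns over q times the
    variance of one-period returns. Close to 1 under a random walk."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    # Overlapping q-period returns as rolling sums of one-period returns.
    csum = np.concatenate(([0.0], np.cumsum(r)))
    rq = csum[q:] - csum[:-q]
    return rq.var() / (q * r.var())


# Simulated weekly returns with no serial correlation (illustrative only).
rng = np.random.default_rng(0)
weekly = 0.02 * rng.standard_normal(5000)
print("VR(4):", variance_ratio(weekly, 4))
```

A ratio well above 1 points to positive autocorrelation, and a ratio well below 1 to negative autocorrelation.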


  • The random walk hypothesis is rejected for weekly stock market returns.
  • The rejection is especially strong for small stocks, but is not entirely explained by infrequent trading or time-variation in volatility.
  • Weekly stock returns do not appear to be mean-reverting.
  • This does not mean the market is not efficient, but it does indicate that the random-walk model is not correct.

Are Seasonal Anomalies Real?

Lakonishok, Josef, and Seymour Smidt, “Are Seasonal Anomalies Real?” The Review of Financial Studies, Vol 1, No 4 (1988), 403-425.

If researchers analyze data using 100 different hypotheses, then formulate a theory based on results and then test the theory using the same data, they are very likely to get significant results for the theory.  This problem frequently arises due to the limited scope of stock return data (only a few standard sources).  Phenomena that are actually just noise get reported as asset-pricing anomalies.
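
As a rough illustration, assume the 100 hypotheses are independent and each is tested at the 5% level (both are assumptions made here for the arithmetic, not figures from the paper).  The probability that at least one hypothesis appears significant when none is real is then

1 - (1 - 0.05)^{100} \approx 0.994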

In addition, rational efficient-market economists don’t want to publish or read papers that claim the market is efficient and investors are rational.  Therefore, a type of selection bias can occur when the majority of publications show anomalies, even if the majority of evidence argues against them.

This paper studies anomalies using “new” data to avoid the first problem.  The data are the daily Dow Jones Industrial Average returns, from January 1, 1897 to June 11, 1986.  Recent anomalies studies were done with post-1962 or post-1927 data; thus, using the DJIA since its inception adds 30-65 years of new data.

The 30 firms in the DJIA compose almost 25% of the entire NYSE.  The stocks of these very large firms are highly liquid, and so are unlikely to suffer from issues of nonsynchronous trading, which makes the DJIA a good measure of short-term market activity.  However, using the DJIA means that this study cannot test the January effect, which is observed in small stocks.


  • Monday returns are significantly negative (-0.14%).
  • Turn-of-month price increases are greater than the price increase for the entire month.
  • Prices increase 1.5% between Christmas and New Year’s.
  • The rate of return before holidays is about 20 times the normal rate of return.
  • Most anomalies are quite small in magnitude.
  • There is no consistent monthly pattern in stock returns.
  • There is no significant evidence that returns in the first part of month are different from returns in the last part.

Macroeconomic Seasonality and the January Effect

Kramer, Charles, “Macroeconomic Seasonality and the January Effect,” The Journal of Finance, Vol 49, No 5 (1994), 1883-1891.

This paper seeks to explain the “January effect,” the phenomenon of small and low-priced stocks outperforming in January by a much wider and more significant margin than in the rest of the year.

Kramer explains this effect by defining separate betas for January.
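
One way to picture “separate betas for January” is a regression in which the factor loading is allowed to shift in January.  The sketch below is an illustration only: the single factor, the variable names, and the OLS setup are assumptions made here and are not taken from the paper.

```python
import numpy as np


def january_betas(asset_ret, factor, month):
    """OLS of r_t = a + a_J*Jan_t + b*f_t + b_J*Jan_t*f_t + e_t.

    Returns (a, a_J, b, b_J); the January beta is b + b_J.
    """
    r = np.asarray(asset_ret, dtype=float)
    f = np.asarray(factor, dtype=float)
    jan = (np.asarray(month) == 1).astype(float)  # 1.0 in January, else 0.0
    X = np.column_stack([np.ones_like(f), jan, f, jan * f])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return tuple(coef)
```

A nonzero b_J would mean the asset's exposure to the factor differs in January from the rest of the year.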

Models of Stock Returns–A Comparison

Kon, Stanley J., “Models of Stock Returns–A Comparison,” The Journal of Finance, Vol 39, No 1 (1984), 147-165.

Purpose:  To explain the observed kurtosis (fat tails) and positive skewness in the distribution of stock returns.

Findings:  The discrete mixture of normal distributions proposed in this paper explains these moments better than existing models, including the Student-t.

Motivation:  Mean-variance portfolio theory, option pricing models, and empirical tests of capital asset pricing models and efficient markets make assumptions about the distribution of stock returns.  It has been shown that stock returns cannot be approximated using a single normal distribution, but the normal is just what the theoretical models assume.

Methods:  Returns may be driven by multiple normal distributions–one for random shocks, one for firm-specific information, one for macroeconomic information, etc.  In other words, it is not necessary that each observation of stock return be drawn from the same distribution as all others.  This paper uses a mix of up to five normal distributions and likelihood ratio tests to model the returns of the 30 Dow Jones stocks.

  • Let  r_t = \alpha_i + u_{it}, where r_t is the observed return for period t, and u_{it} is normally distributed with mean zero and variance \sigma_i^2.
  • Let \underline{\gamma}_i be a normal distribution with mean \alpha_i and variance \sigma_i^2.
  • Let \lambda_i = T_i/T be the proportion of the T total observations that are associated with information set I_i, where T_i is the number of such observations.
  • Let \underline{\theta} = \{\alpha_1, ..., \alpha_N, \sigma_1^2, ..., \sigma_N^2, \lambda_1, ..., \lambda_{N-1}\} be a vector of parameters.
  • Let \underline{r} be the vector of returns.
  • The vector of parameters, given a vector of returns, can be found by solving

\max_{\underline{\theta}} \ell(\underline{\theta}|\underline{r}) = \prod_{t=1}^T \left[\sum_{i=1}^N \lambda_i p(r_t|\underline{\gamma}_i)\right].
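
The objective above can be evaluated numerically.  The sketch below computes the logarithm of the mixture likelihood for given parameters; it does not perform the maximization, and the function and argument names are hypothetical.

```python
import numpy as np


def mixture_loglik(r, alphas, sigma2s, lambdas):
    """Log of prod_t sum_i lambda_i * p(r_t | alpha_i, sigma_i^2),
    where p is the normal density and the lambdas sum to one."""
    r = np.asarray(r, dtype=float)[:, None]              # shape (T, 1)
    alphas = np.asarray(alphas, dtype=float)[None, :]    # shape (1, N)
    sigma2s = np.asarray(sigma2s, dtype=float)[None, :]
    lambdas = np.asarray(lambdas, dtype=float)[None, :]
    log_pdf = -0.5 * np.log(2 * np.pi * sigma2s) - 0.5 * (r - alphas) ** 2 / sigma2s
    weighted = np.log(lambdas) + log_pdf                 # shape (T, N)
    # Log-sum-exp over components for numerical stability.
    m = weighted.max(axis=1, keepdims=True)
    per_obs = m[:, 0] + np.log(np.exp(weighted - m).sum(axis=1))
    return float(per_obs.sum())
```

Passing the negative of this function to a numerical optimizer, with the variances constrained to be positive and the weights to sum to one, would be one way to estimate the parameter vector.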


  • All 30 Dow Jones stocks can be explained by a mixture of 2, 3, or 4 normals.
  • Returns of the S&P500, value-weighted, and equal-weighted market indices were also explained by mixtures of normals.
  • The mixture of normals describes 27 of the 30 stocks, and all three indices, better than does the Student-t distribution, which also has fat tails and has been used to model stock returns.
  • Differences in the means of the normal distributions can explain the skewness of observed stock returns.
  • Differences in the variance of the normals can explain the kurtosis of stock returns.
  • Stock returns do appear to be normally distributed, but the distribution parameters are time-varying, and the timing of parameter shifts also varies across stocks.
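
To see how differing variances produce fat tails, consider a mixture whose components all have mean zero.  Its kurtosis is

\frac{E(r^4)}{[E(r^2)]^2} = \frac{3\sum_{i=1}^N \lambda_i \sigma_i^4}{\left(\sum_{i=1}^N \lambda_i \sigma_i^2\right)^2}

With illustrative values chosen here (not taken from the paper), \lambda_1 = \lambda_2 = 0.5, \sigma_1^2 = 1 and \sigma_2^2 = 4, this gives

\frac{3(0.5 \cdot 1 + 0.5 \cdot 16)}{(0.5 \cdot 1 + 0.5 \cdot 4)^2} = \frac{25.5}{6.25} = 4.08

which exceeds the value of 3 for a single normal distribution.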

Mean Reversion in Stock Prices?

Kim, Myung Jig, Charles R. Nelson, and Richard Startz, “Mean Reversion in Stock Prices?  A Reappraisal of the Empirical Evidence,” The Review of Economic Studies, Vol 58, No 3 (1991), 515-528.

Background:  In the 1970s and 1980s, stock returns were thought to follow a random walk.  Researchers in the late 1980s began to question this view, and used a variance ratio method to show that autocorrelation did exist in stock returns.  Define the “variance ratio” as the variance of the return over K periods divided by K times the variance of the one-period return.  If returns follow a random walk, this ratio must equal 1.

However, this assumption is not borne out by the data.  The variance ratio is higher than 1 for periods shorter than a year (positive autocorrelation) and is less than one for periods longer than a year (negative autocorrelation).  A common interpretation of this negative autocorrelation over longer periods is to say that returns are mean-reverting.

Fama & French’s approach is to regress the K-period return from t to t+K on the K-period return from t-K to t:

r_{K,t+K} = \alpha_K + \beta_K r_{K,t} + \varepsilon_{K,t+K}

In this model, a negative beta indicates mean-reversion, and a zero beta, a random walk.  This model is also better suited to predicting future returns.
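
A minimal sketch of this regression, assuming a plain array of one-period returns and ordinary least squares on overlapping K-period returns.  The function name is hypothetical, and the sketch ignores the small-sample bias and overlapping-observation issues that matter for inference.

```python
import numpy as np


def multi_period_beta(returns, K):
    """Regress the K-period return after date t on the K-period
    return up to date t. Returns (alpha_K, beta_K)."""
    r = np.asarray(returns, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    rK = csum[K:] - csum[:-K]   # rK[j] = r[j] + ... + r[j+K-1]
    y = rK[K:]                  # return over the K periods after date t
    x = rK[:-K]                 # return over the K periods up to date t
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha, beta
```

On simulated returns with no serial correlation the slope should be close to zero, while returns generated from a stationary (mean-reverting) price series should give a negative slope.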

Purpose:  This paper re-examines the data and finds no evidence of mean reversion after WWII.  Stock returns in the post-war period are actually mean-averting, meaning that disturbances are too persistent to support a mean-reversion theory. Furthermore, indicators of post-WWII mean-aversion are as statistically significant as indicators of mean-reversion for the whole 1926-1986 period.  The comparison of pre- and post-war returns does not support the random-walk hypothesis, but points to a fundamental change occurring at the end of the war.

Method:  Use statistical methods that do not assume returns are normally distributed.


  • Returns are only mean-reverting pre-WWII.
  • Post-war returns are, if anything, mean-averting.
  • The change may have accompanied the resolution of uncertainties surrounding the duration of the Great Depression, the outcome of WWII, and fears of another post-war depression.

Risk, Return, and Equilibrium: Empirical Tests

Fama, Eugene F., and James D. MacBeth, “Risk, Return, and Equilibrium:  Empirical Tests,” The Journal of Political Economy, Vol 81, No 3 (1973), 607-636.

Purpose:  To test market efficiency, the two-parameter portfolio model, and the tradeoff between risk and expected return.


  • In the two-parameter portfolio model, the risk of an individual asset is proportional to its contribution to the total portfolio’s ratio of expected value to dispersion (typically, standard deviation).
    • Each asset’s risk depends upon its weight in the portfolio, its covariance with other portfolio assets, and its standard deviation.  The same asset, therefore, can have different risk levels in different portfolios.
    • Risk-averse investors choose assets and weights to form an “efficient portfolio,” or one that maximizes expected return for any level of dispersion.

Theoretical Background:

  • The expected return-dispersion model, E(R_i) = E(R_0) + \beta_i[E(R_m) - E(R_0)], makes three testable predictions:
    •  In an efficient portfolio, an asset’s relationship between expected return and risk should be linear.
    • \beta_i = \frac{cov(R_i, R_m)}{\sigma^2(R_m)} should be a complete measure of the risk of security i in the portfolio m.
    • Higher risk should be associated with higher expected return.
  • In an efficient capital market, investors should form efficient portfolios that fit the model given above.


  • Data are NYSE monthly stock returns for 1926-1968
  • Market return is estimated by the equal-weighted NYSE return
  • Form portfolios of stocks
    • estimated portfolio betas exhibit less error than the sum of individual security betas if the individual betas are not perfectly positively correlated
    • to avoid bunching positive and negative errors in the portfolios, stocks are sorted into portfolios by their betas in one period, then data from a different period are used to calculate portfolio betas.
      • Use 1926-1929 data to sort NYSE stocks into 20 portfolios
      • Use 1930-1934 data to calculate portfolio betas in 1935; use 1930-1935 data to calculate portfolio betas in 1936, etc.
      • portfolios are reformed every four years
      • e.g., regressions in 1950 are on portfolios formed using 1935-1941 data but with portfolio returns calculated using 1942-1949 data.
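
The two-step logic above (estimate portfolio betas in one period, then relate returns to those betas in a later period) can be sketched as follows.  This is a simplified illustration, not the paper's full procedure: it uses a single estimation window and a single testing window, includes only beta as a regressor, and the function names are hypothetical.

```python
import numpy as np


def estimate_betas(port_ret, mkt_ret):
    """beta_p = cov(R_p, R_m) / var(R_m) for each portfolio (column)."""
    rm = mkt_ret - mkt_ret.mean()
    rp = port_ret - port_ret.mean(axis=0)
    return rm @ rp / (rm @ rm)


def cross_sectional_premia(port_ret, betas):
    """Regress portfolio returns on betas period by period, then
    average the coefficients over time.

    Returns (mean intercept, mean slope, t-statistic of the mean slope).
    """
    X = np.column_stack([np.ones_like(betas), betas])
    # One cross-sectional regression per period; coefs has shape (2, T).
    coefs, *_ = np.linalg.lstsq(X, port_ret.T, rcond=None)
    gamma0, gamma1 = coefs
    t_slope = gamma1.mean() / (gamma1.std(ddof=1) / np.sqrt(gamma1.size))
    return gamma0.mean(), gamma1.mean(), t_slope
```

Here port_ret is a (periods × portfolios) array and mkt_ret the matching market return series; betas come from an earlier window than the returns passed to cross_sectional_premia.  A positive and significant mean slope would be consistent with higher risk earning higher expected return.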


  • The beta-return relationship is linear for all periods except the five-year post-war period 1951-1955
  • Beta appears to be a very complete measure of risk in all periods.
  • The expected return-beta relationship is positive for all periods but the five years 1956-1960, where it is slightly negative.
  • Findings are consistent with an efficient capital market, where risk-averse investors assemble efficient portfolios.

Good Morning Sunshine: Stock Returns and the Weather

Hirshleifer, David, and Tyler Shumway, “Good Morning Sunshine:  Stock Returns and the Weather,” The Journal of Finance, Vol 58, No 3 (2003), 1009-1032.

  • Weather (days of sunshine) is a truly exogenous variable
  • Sunshine in the cities that host the largest stock exchanges of 26 countries strongly and statistically significantly predicts stock returns between 1982 and 1997.
  • After controlling for days of sunshine, other weather patterns such as rain and snow have no effect.
  • Trading based on the weather would be optimal for a trader facing very low transaction costs, but even moderate costs would generally prohibit such a strategy.
  • These results are consistent with a theory of mood affecting behavior, but are not consistent with the theory of rational actors.


  • Table III:  The joint betas are significant.  This might be driven by the large but insignificant betas for Buenos Aires and Rio de Janeiro (São Paulo; see below).
  • Hirshleifer & Shumway mistakenly collected weather data for Rio de Janeiro, while their return data come from Brazil’s largest stock exchange (São Paulo).

Do stock market liberalizations cause investment booms?

Henry, Peter Blair, “Do stock market liberalizations cause investment booms?” Journal of Financial Economics 58 (2000), 301-334.

Purpose:  To show that liberalizing a country’s stock market leads to increased private investment.

Motivation:  International asset pricing theory predicts that a stock market liberalization will be accompanied by a rise in the liberalizing country’s equity prices and by increased investment in physical capital.  Prior research has empirically confirmed the first prediction.  This paper investigates the second.

Findings:  In countries that liberalize their equity markets, where the marginal product of capital is high and domestic cost of capital exceeds the world average, private investment significantly and meaningfully rises.

Data/Methods:  This is an event study of liberalization in a sample of 11 emerging-market countries.

  • Determine dates of liberalization by using date of government mandate, date of first country mutual fund, or date of a jump in the IFC’s Investability Index.
  • Obtain private investment data from the World Bank’s STARS database (Socioeconomic Time Series Access and Retrieval).
  • Find stock returns in local currencies (including dividends) in the IFC Global Index, from the IFC’s Emerging Markets Database (EMDB).
  • Regress changes in log investment on dummies for the year of liberalization and the two following.
    • Include calendar year dummies to control for global macroeconomic trends.
  • Regress changes in log investment on stock returns and lagged stock returns.
    • Again include calendar year dummies.
  • Also use real U.S. interest rates and OECD output growth rates to control for world business cycles.
  • Use dummies to control for other simultaneous reforms: macroeconomic stabilization programs, trade liberalizations, privatization programs, and reductions of exchange controls.
  • Also control for domestic fundamentals, such as GDP growth.
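
The first regression in the list above can be sketched as a pooled OLS with event-time dummies and calendar-year dummies.  The function name, array layout, and the absence of the other control variables are simplifications assumed here for illustration.

```python
import numpy as np


def liberalization_effect(dlog_inv, lib_year, years):
    """Regress changes in log investment on dummies for the
    liberalization year and the two following years, plus
    calendar-year dummies.

    dlog_inv: (countries, years) array of changes in log investment
    lib_year: liberalization year for each country
    years:    calendar year for each column of dlog_inv
    Returns the three event-dummy coefficients (years 0, +1, +2).
    """
    dlog_inv = np.asarray(dlog_inv, dtype=float)
    years = np.asarray(years)
    lib_year = np.asarray(lib_year)
    C, Y = dlog_inv.shape
    offset = years[None, :] - lib_year[:, None]          # event time, (C, Y)
    event = np.stack([offset == k for k in range(3)], axis=-1)
    event = event.astype(float).reshape(C * Y, 3)
    year_dummies = np.tile(np.eye(Y), (C, 1))            # global year effects
    X = np.column_stack([event, year_dummies])
    coef, *_ = np.linalg.lstsq(X, dlog_inv.ravel(), rcond=None)
    return coef[:3]
```

Because liberalization dates differ across countries, the event dummies are not absorbed by the calendar-year dummies, which is what lets the regression separate a liberalization effect from global trends.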


  • Market liberalization leads to increased stock prices.
  • Growth in private investment is strongly correlated with changes in stock prices.
  • The correlation is stronger for valuation changes related to liberalization.
  • Private investment increases after liberalization, even after controlling for global cycles.