A recent MarketWatch piece cited a talk in Hong Kong by Professor Robert Merton, winner of the Nobel Prize in economics, in which he discussed the challenges of evaluating investment managers. This article assumes that the piece summarized Professor Merton’s talk accurately. The piece, and presumably the talk, argued that, given typical nominal portfolio returns and volatilities, it takes impractically long to detect evidence of investment skill. The argument claimed to prove that all manager selection is futile. Instead, it proved only that naïve nominal performance metrics are of little use.

Any test of the effectiveness of manager selection is also a test of the analytical process that distills skill. It is well known that nominal investment performance is primarily due to factor (systematic, market) noise and thus reverts. It is therefore unsurprising to find flaws in an approach to manager selection that is as antiquated as Ptolemaic astronomy.

In this article, we will illustrate the difference between a naïve attempt to detect evidence of investment skill using nominal returns and a more productive effort relying on alphas (residual, security selection, stock picking returns) isolated using a capable modern multi-factor equity risk model. Whereas the former approach is futile at best, the latter approach is successful. In fact, rather than taking decades, a capable modern system can identify skill with high confidence in months.

Detecting Evidence of Investment Skill Using Nominal Returns

Consider the nominal returns of a Portfolio and a Benchmark. The Portfolio is a live long-only fund implementing a Smart Beta active investment strategy:

Chart of the absolute cumulative returns for the Portfolio and the Benchmark as well as Portfolio’s cumulative return relative to the Benchmark

Portfolio’s and Benchmark’s Cumulative Returns

                           Portfolio Benchmark
 Annualized Return            0.1336    0.1433
 Annualized Std Dev           0.0879    0.1093
 Annualized Sharpe (Rf=0%)    1.5194    1.3115

Under the heroic assumption that log returns follow a normal distribution, a t-test appears to confirm Professor Merton’s argument. Even with over six years of data, the Portfolio’s returns relative to the Benchmark are too noisy for statistical inference:

Distribution of Portfolio’s Returns Relative to the Benchmark

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -6.1441 -1.2186 -0.0201 -0.1149  1.2481  5.4068 
 
      One Sample t-test
 t = -0.4607, df = 78, p-value = 0.6768
 alternative hypothesis: true mean is greater than 0
 95 percent confidence interval:
  -0.5300        Inf
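The mechanics of the test above can be sketched in a few lines of Python. This is a minimal illustration and not the code that produced the output above: the function names and the sample data are hypothetical, and the p-value uses a normal approximation to the t-distribution, which is only reasonable for large samples such as the 79 observations here.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_t(returns):
    """Return (t statistic, degrees of freedom) for H0: true mean is 0."""
    n = len(returns)
    standard_error = stdev(returns) / math.sqrt(n)  # sample std dev (n - 1)
    return mean(returns) / standard_error, n - 1


def one_sided_p_normal_approx(t):
    """Approximate P(T > t) with a standard normal; assumes large df."""
    return 1.0 - NormalDist().cdf(t)


# Hypothetical monthly relative returns, in percent (not the article's data).
sample = [1.2, -0.8, 0.3, -2.1, 0.9, 1.5, -1.1, 0.4]
t_stat, df = one_sample_t(sample)
print(f"t = {t_stat:.4f}, df = {df}")

# Applied to the reported statistic for the nominal relative returns.
print(f"approximate p-value: {one_sided_p_normal_approx(-0.4607):.3f}")
```

The approximation lands close to the reported p-value of 0.6768. An exact figure requires the t-distribution with 78 degrees of freedom.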

Detecting Evidence of Investment Skill Using Alphas/Residuals

By comparison, consider the same Portfolio’s residual returns, or alphas, for the same period, isolated with the standard AlphaBetaWorks Long-Horizon Statistical U.S. Equity Risk Model. These are also the returns the Portfolio would have generated if its factor exposures had been fully hedged (its returns factor-neutralized, or residualized) using the Model:
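To make the idea of residualization concrete, here is a toy single-factor sketch in Python. It is not the Model: the Model is multi-factor, and how it estimates exposures is not described in this article. The sketch assumes one factor, a beta estimated by ordinary least squares, and hypothetical data and function names.

```python
from statistics import mean


def ols_beta(portfolio, factor):
    """Slope of an ordinary least squares fit of portfolio on factor returns."""
    p_bar, f_bar = mean(portfolio), mean(factor)
    covariance = sum((p - p_bar) * (f - f_bar) for p, f in zip(portfolio, factor))
    variance = sum((f - f_bar) ** 2 for f in factor)
    return covariance / variance


def residual_returns(portfolio, factor):
    """Returns left after hedging out the factor exposure (alpha is kept)."""
    beta = ols_beta(portfolio, factor)
    return [p - beta * f for p, f in zip(portfolio, factor)]


# Hypothetical monthly returns in percent (not the article's data).
factor = [2.0, -1.5, 0.7, 3.1, -2.4, 1.2]
portfolio = [1.9, -0.9, 0.8, 2.6, -1.7, 1.3]

residuals = residual_returns(portfolio, factor)
print(f"beta = {ols_beta(portfolio, factor):.3f}")
print(f"mean residual = {mean(residuals):.3f}")
```

Only the factor exposure is removed. The average residual is deliberately left in, since it is the alpha being tested.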

Chart of the absolute cumulative residual (alpha, security selection, stock picking returns) for the Portfolio

Portfolio’s Cumulative Residual/Alpha

Under the equally questionable assumption that log residuals follow a normal distribution, a t-test now yields a highly statistically significant result:

Chart of the distribution of Portfolio’s residual returns used to detect evidence of investment skill

Distribution of the Portfolio’s Residuals/Alphas

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.5300 -0.2064  0.2643  0.2620  0.7289  2.3663 
 
      One Sample t-test
 t = 3.3126, df = 78, p-value = 0.0007
 alternative hypothesis: true mean is greater than 0
 95 percent confidence interval:
  0.1303         Inf

Whereas Professor Merton’s argument does indeed apply to nominal returns, it does not apply to their residuals. A critical difference is the lower dispersion of residual returns. Over 90% of the variance of a typical active equity portfolio is due to factor exposures rather than to stock picking. Therefore, using nominal returns to measure skill is like trying to take a baby’s temperature by examining her bath water, rather than the baby herself.

Whereas at least 67 out of 100 monkeys picking stocks at random are expected to outperform the Portfolio, fewer than 1 out of 1,000 is expected to generate higher residuals, a highly statistically significant result. Thus, with the help of a capable equity risk model, strong evidence of skill can be identified in months rather than in decades.
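A back-of-the-envelope calculation shows how much the lower dispersion shortens the wait. The Python sketch below backs out the sample standard deviations from the two t-tests above and asks how many months the expected test statistic needs to reach the one-sided 95% critical value. It rests on strong assumptions: i.i.d. normal monthly returns, a true mean and standard deviation equal to the sample values, and a normal approximation to the t-test. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def months_to_significance(mean_return, std_dev, confidence=0.95):
    """Months until the expected one-sided z-statistic reaches the critical value.

    Assumes i.i.d. normal monthly returns whose true mean and standard
    deviation equal the inputs, and a normal approximation to the t-test.
    """
    if mean_return <= 0:
        raise ValueError("a non-positive mean never becomes significant")
    z_critical = NormalDist().inv_cdf(confidence)
    return (z_critical * std_dev / mean_return) ** 2


def implied_std_dev(sample_mean, t_stat, n):
    """Back out the sample standard deviation from t = mean / (sd / sqrt(n))."""
    return sample_mean / t_stat * math.sqrt(n)


n = 79  # df = 78 in both tests above
residual_sd = implied_std_dev(0.2620, 3.3126, n)    # residual returns
relative_sd = implied_std_dev(-0.1149, -0.4607, n)  # nominal relative returns

print(f"residual noise: {months_to_significance(0.2620, residual_sd):.0f} months")
# Hypothetical: the same 0.2620 mean buried in the noise of nominal relative returns.
print(f"nominal noise:  {months_to_significance(0.2620, relative_sd):.0f} months")
```

Under these assumptions, the Portfolio’s residuals would need roughly 20 months. The second figure is purely illustrative, since the Portfolio’s actual mean relative return was negative: the same mean residual amid the noise of nominal relative returns would need about 16 years.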

Converting Residuals into Nominal Outperformance

Assuming the equity risk model uses investable factors, as AlphaBetaWorks’ models do, the residual return stream above is investable. In fact, in the idealized case of costless leverage, positive residual returns can be turned into outperformance relative to any benchmark. Below is the performance of the Portfolio after it is hedged to match the factor exposures of the Benchmark. The evidence of skill is now plainly visible in the naïve absolute and relative nominal return metrics:

Chart of the absolute cumulative returns for the Portfolio hedged to match the factor exposures of the Benchmark, the Benchmark, as well as Portfolio’s cumulative return relative to the Benchmark

Cumulative Returns for the Portfolio Hedged to Match the Benchmark and the Benchmark

                          Portfolio with Benchmark Risk  Benchmark
 Annualized Return                                0.1784    0.1433
 Annualized Std Dev                               0.1168    0.1093
 Annualized Sharpe (Rf=0%)                        1.5276    1.3115
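The hedge behind these figures can be illustrated with a toy single-factor example in Python. This is a sketch under stated assumptions and not the procedure used for the table above: it assumes one investable factor, known betas, costless leverage and shorting, and hypothetical data and function names.

```python
def hedge_to_benchmark(portfolio, factor, beta_portfolio, beta_benchmark):
    """Shift factor exposure from beta_portfolio to beta_benchmark.

    Assumes a single investable factor and costless leverage and shorting.
    """
    shift = beta_benchmark - beta_portfolio
    return [p + shift * f for p, f in zip(portfolio, factor)]


# Hypothetical monthly returns in percent (not the article's data).
factor = [2.0, -1.5, 0.7, 3.1, -2.4, 1.2]
benchmark = list(factor)                      # beta of 1.0, no alpha
portfolio = [0.3 + 0.8 * f for f in factor]   # beta of 0.8, alpha of 0.3

hedged = hedge_to_benchmark(portfolio, factor, beta_portfolio=0.8, beta_benchmark=1.0)
excess = [h - b for h, b in zip(hedged, benchmark)]
print([round(x, 6) for x in excess])
```

Once the exposures match, the factor returns cancel, and the hedged portfolio’s return over the benchmark is just the residual: 0.3 in every month of this toy example.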

Conclusions

Since factor noise dominates nominal returns, the use of nominal returns to detect evidence of investment skill takes far too long to be practical.
After distilling stock picking performance (alphas, residual returns) from factor noise, statistically significant evidence of investment skill can become evident in months, rather than in decades.
Hedging makes it possible to turn positive stock picking returns into nominal outperformance with respect to any benchmark.