Size Matters, If You Control Your Junk

The size premium has been challenged along many fronts: it has a weak historical record, varies significantly over time, in particular weakening after its discovery in the early 1980s, is concentrated among microcap stocks, predominantly resides in January, is not present for measures of size that do not rely on market prices, is weak internationally, and is subsumed by proxies for illiquidity. We find, however, that these challenges are dismantled when controlling for the quality, or the inverse “junk”, of a firm. A significant size premium emerges, which is stable through time, robust to the specification, more consistent across seasons and markets, not concentrated in microcaps, robust to non-price based measures of size, and not captured by an illiquidity premium. Controlling for quality/junk also explains interactions between size and other return characteristics such as value and momentum.


The finding that size is related to expected returns dates back at least to Banz (1981), who found that small stocks in the U.S. (those with lower market capitalizations) have higher average returns than large stocks, a relation which is not accounted for by market beta. The relation between size and returns is important for several reasons. First, the size anomaly has become one of the focal points for discussions of market efficiency. Second, the size factor has become one of the staples of current asset pricing models used in the literature (e.g., Fama and French (1993, 2014)). Third, the size premium implies that small firms face larger costs of capital than large firms, with important implications for corporate finance, incentives to merge and form conglomerates, and broader industry dynamics. Fourth, the size effect has had a large impact on investment practice, including spawning an entire category of investment funds, giving rise to indices, and serving as a cornerstone for mutual fund classification.
Given the importance of the size effect, it has naturally come under heavy scrutiny. Considering a long sample of U.S. stocks and a broad sample of global stocks, we confirm the common criticisms of the standard size factor: a weak historical record in the U.S. and an even weaker record internationally, which makes the size effect marginally significant at best; long periods of poor performance; concentration in extreme, difficult-to-invest-in microcap stocks; concentration of returns in January; absence for measures of size that do not rely on market prices; and subsumption by proxies for illiquidity.¹ However, we find that measures of size studied by the literature load strongly and consistently negatively on a large variety of "quality" factors. At a broad level, quality is a characteristic or set of characteristics of a security that investors are willing to pay a high price for, all else equal. Asness, Frazzini, and Pedersen (2014), using the Gordon growth model, illustrate various dimensions of quality that can be measured in a number of ways: profitability, profit growth, low risk in terms of return-based measures and stability of earnings, and high payout and/or conservative investment policy. We find a strong and robust size effect when controlling for a firm's quality, or its inverse, "junk", and we find that the results are very consistent across a variety of measures.
Controlling for quality/junk reconciles many of the empirical irregularities associated with the size premium that have been documented in the literature and resurrects a larger and more robust size effect in the data. To understand this, note that large firms tend to be high quality firms on each of the above dimensions, while small firms tend to be "junky" (i.e., have the opposite characteristics).
Given that high quality stocks tend to outperform junk stocks in general, including when comparing stocks of similar size (Asness, Frazzini, and Pedersen (2014), Fama and French (2014)), this means that the size effect is fighting a headwind due to the low quality of small stocks. Said differently, small quality stocks outperform large quality stocks and small junk stocks outperform large junk stocks, but the standard size effect suffers from a size-quality composition effect.
We begin by outlining the challenges to the size effect in more detail. First, many papers find that the size effect is simply not very significant, producing only a small abnormal return and Sharpe ratio, with marginal statistical significance. Second, others have argued that the size effect has disappeared since the early 1980s when it was originally discovered and published (partly contributing to its overall weak effect). Dichev (1998), Chan, Karceski, and Lakonishok (2000), Horowitz, Loughran, and Savin (2000), Amihud (2002), and Van Dijk (2011) find that small firms do not outperform big firms during the 1980s and 1990s, rendering the small firm premium obsolete. Schwert (2003) suggests that the small-firm anomaly disappeared shortly after the initial publication of the papers that discovered it, coinciding with an explosion of small cap-based funds and indices. Gompers and Metrick (2001) argue that institutional investors' demand for large stocks in the 1980s and 1990s increased the prices of large companies relative to small companies, which accounts for a large part of the size premium's disappearance over this period. More recently, Israel and Moskowitz (2013), McLean and Pontiff (2013), and Chordia, Subrahmanyam, and Tong (2014) examine the attenuation of a host of anomalies, including size, following original publication, declines in trading costs, and increases in active money management. Collectively, the results indicate a decrease in the returns to size, though the evidence of reduction is statistically weak.
Third, the size effect appears to be concentrated among only the smallest, microcap stocks. Horowitz, Loughran, and Savin (2000) find that removing stocks with less than $5 million in market cap causes the small firm effect to vanish. Crain (2011) and Bryan (2014) find that the small stock effect is concentrated among the smallest 5% of firms. Since microcap stocks of this size are typically highly illiquid, researchers have questioned the efficacy of size-based strategies net of trading costs.
Fourth, most of the returns related to size seem to occur in January, particularly the first few trading days of the year, and are largely absent the rest of the year (Reinganum (1981), Roll (1981), and Keim (1983)). Gu (2003) and Easterday, Sen, and Stephan (2009) also find that the January effect has declined over time, coinciding with the decline in the small firm premium, as does Van Dijk (2011) in a review of the size literature. Returns coming mostly from January are not damning, but are puzzling, as most of our asset pricing theory would imply a more even monthly distribution
Electronic copy available at: https://ssrn.com/abstract=2553889
[...] discovers that non-price-based size measures perform just as well as market-capitalization-based portfolios contrary to Berk's (1995b) finding, revives the returns to size outside of January while simultaneously diminishing the returns to size in January, recovers a more robust size effect in almost two dozen other international equity markets, and reduces size's exposure to both liquidity levels and liquidity risk across several measures.² Stocks with very poor quality (i.e., "junk") are typically very small, have low average returns, and are typically distressed and illiquid securities. These characteristics drive the strong negative relation between size and quality, and the returns of these junk stocks chiefly explain the sporadic performance of the size premium and the challenges that have been hurled at it.
In summary, controlling for junk produces a robust size premium that is present in all time periods, with no reliably detectable differences across time from July 1957 to December 2012, in all months of the year, across all industries, across nearly two dozen international equity markets, and across five different measures of size not based on market prices.
After reviving the size premium, we turn our attention to the interactions between size and other anomalies found in the literature, shedding new light on the relation between size and other cross-sectional predictors of returns such as value and momentum. We find that accounting for junk explains why small growth stocks underperform and small value stocks outperform the Fama and French (1993, 2014) models.
The relation between size and quality/junk also has import for theory, presenting another challenge for asset pricing models. For example, the returns to size are much stronger and more stable after controlling for junk. This makes risk-based explanations for the size effect more challenging not only because of its very high Sharpe ratio (e.g., Hansen and Jagannathan (1997)), but also because the riskiest small stocks (the small junk stocks) are not the securities that drive a significant positive size premium, as a risk story implies. Rather, it is the low-volatility, high-quality stocks that seem to drive the high expected returns. These results are difficult to reconcile in a risk-based framework and suggest that high quality small stocks may be underpriced, though, as always, there remains the possibility of new risk-based explanations we have not yet considered. In addition, the fact that non-price based measures of size work at least as well as market-based measures suggests that size is not picking up an omitted risk factor as suggested by Berk (1995a).
² Our results may be related to Hou and Van Dijk (2013), who examine what they call "profitability shocks" to firms and find that in the 1980s and 1990s small firms experience negative profitability shocks that help explain ex post their dismal performance during this period. However, while Hou and Van Dijk (2013) seek to explain the ex post performance of size during this time period, our study seeks to find ex ante measures of quality, across multiple measures of quality in addition to profitability, that have power to explain expected returns. We find that ex ante measures of a variety of quality metrics can explain variation in the expected returns to size across time, seasons, and markets.
Finally, while small firms are certainly less liquid on average, we find that various liquidity proxies offered in the literature do not fully explain the size effects we find when controlling for quality/junk. Controlling for junk, which seems to be related to illiquidity, we find that the substantial remaining size premium is less sensitive to liquidity or liquidity risk and yet delivers an even bigger return premium not explained by other factors. This implies either that the size premium controlling for junk is not as sensitive to liquidity premia, or that better and different liquidity proxies are needed to capture the added returns we find for size once controlling for junk. It also implies that a small-quality portfolio is likely lower cost and higher capacity to implement than a small portfolio that ignores quality and therefore loads on illiquid junk, reducing any micro-structure and practical objections to the size results. Again the task of theory is made more difficult in this regard. These results renew the size anomaly, putting it on more equal footing with other anomalies such as value and momentum in terms of its efficacy and robustness. Moreover, the interaction between size, quality/junk, and other cross-sectional predictors of returns may shed light on other anomalies. Asset pricing theory and subsequent empirical work may consider why size and junk are related and, in particular, why they co-vary so strongly with each other.
The paper proceeds as follows. Section I briefly describes the data and reviews the evidence on the size effect, highlighting the seven challenges to the size premium identified in the literature. Section II shows that nearly all of these challenges are resolved after controlling for a firm's quality/junk. Section III analyzes interactions between size and growth and value and momentum after controlling for quality/junk. Section IV concludes.

I. Data and Preliminary Analysis: Reexamining the Size Anomaly
We detail the data used in this study and reexamine the evidence of the size effect by replicating some of the challenges identified in the literature using an updated sample.

A. Data
We examine long-short equity style portfolios commonly used in the literature pertaining to size.
For U.S. equities, we obtain stock returns and accounting data from the union of the CRSP tapes and the XpressFeed Global database. Our U.S. equity data include all available common stocks on the merged CRSP/Compustat data between July 1926 and December 2012. We include delisting returns when available in CRSP.
Size. For size portfolios, we primarily use Fama and French's SMB (Small minus Big) factor and a set of value-weighted decile portfolios based on market capitalization sorts, obtained from Ken French's webpage. The decile portfolios are formed by ranking stocks every June by their market capitalization (price times shares outstanding) and forming deciles based on NYSE breakpoints, where the value-weighted average return of each decile is computed monthly from July to June of the following year. The size factor, SMB, is the average return on three small portfolios minus the average return on three big portfolios formed by ranking stocks independently by their market cap and their book-to-market equity ratio (BE/ME) every June and forming two size portfolios using the NYSE median size and three book-to-market portfolios using 30, 40, and 30 percent breakpoints, respectively. The intersection of these groups forms six size and BE/ME portfolios split by small and large (i.e., small value, small middle, small growth and large value, large middle, and large growth) whose value-weighted monthly returns are computed from July to June of the following year. The SMB factor is then simply the equal-weighted average of the three small portfolios minus the equal-weighted average of the three large portfolios.
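The two-by-three independent sort described above can be illustrated with a minimal sketch. The dictionary keys (`id`, `me`, `beme`, `nyse`, `ret`), the function names, the tie-handling rule, and the use of the default interpolation in `statistics.quantiles` for the 30th and 70th percentiles are all assumptions made for illustration; they are not taken from the paper or from Ken French's data library.

```python
from statistics import median, quantiles

def assign_size_value_portfolios(stocks):
    """Assign each stock to one of six size/BE-ME portfolios.

    `stocks` is a list of dicts with hypothetical keys:
    'id', 'me' (market cap), 'beme' (book-to-market), 'nyse' (bool).
    Breakpoints are computed from NYSE stocks only, then applied to all stocks.
    """
    nyse = [s for s in stocks if s["nyse"]]
    size_break = median(s["me"] for s in nyse)
    # Decile cut points of NYSE BE/ME; the 3rd and 7th are the 30/70 breakpoints.
    cuts = quantiles([s["beme"] for s in nyse], n=10)
    low, high = cuts[2], cuts[6]

    labels = {}
    for s in stocks:
        size = "small" if s["me"] <= size_break else "big"
        if s["beme"] <= low:
            style = "growth"
        elif s["beme"] > high:
            style = "value"
        else:
            style = "middle"
        labels[s["id"]] = (size, style)
    return labels

def value_weighted_return(members):
    """Value-weighted return: sum of w_i * r_i with w_i = me_i / total me."""
    total = sum(m["me"] for m in members)
    return sum(m["me"] / total * m["ret"] for m in members)
```

Because the size and BE/ME sorts are independent, the six portfolios need not contain equal numbers of stocks; only the breakpoints are fixed by the NYSE universe.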
Fama and French factors. In addition to SMB, the value factor, HML or High minus Low, is formed from the equal-weighted average return of the two value portfolios minus the two growth portfolios, HML = ½ (Small Value + Big Value) - ½ (Small Growth + Big Growth). Fama and French (1993) also add the market factor, RMRF, which is the value-weighted index of all CRSP-listed securities minus the one-month Treasury bill rate.
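As a small illustration of the SMB and HML definitions above, the following sketch computes both factors for one month from the six portfolio returns. The dictionary keys and the return values are made up for the example and are not data from the paper.

```python
def smb(p):
    """Equal-weighted average of the three small portfolios minus the three big ones."""
    small = (p["small_value"] + p["small_middle"] + p["small_growth"]) / 3
    big = (p["big_value"] + p["big_middle"] + p["big_growth"]) / 3
    return small - big

def hml(p):
    """HML = 1/2 (Small Value + Big Value) - 1/2 (Small Growth + Big Growth)."""
    value = 0.5 * (p["small_value"] + p["big_value"])
    growth = 0.5 * (p["small_growth"] + p["big_growth"])
    return value - growth

# Hypothetical monthly returns in percent, for illustration only.
month = {
    "small_value": 1.5, "small_middle": 1.2, "small_growth": 0.6,
    "big_value": 1.1, "big_middle": 0.9, "big_growth": 0.7,
}
print(smb(month), hml(month))
```

Note that the middle BE/ME portfolios enter SMB but not HML, which is why SMB averages three portfolios per leg and HML only two.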
Momentum factor. Ken French's website also provides a momentum factor, which is a long-short portfolio constructed in a similar manner, where six value-weighted portfolios formed on size and prior returns (the cumulative return in local currency from months t-12 to t-2) are used. The portfolios are the intersections of two portfolios formed on size and three portfolios formed on prior returns. The momentum factor, UMD or Up minus Down, is constructed as UMD = ½ (Small Up + Big Up) - ½ (Small Down + Big Down).
Short-term reversal factor. Ken French's website also provides a short-term reversal factor, STREV, which is formed similarly to the momentum factor except that it uses past returns from just the most recent month, t-1, instead of t-12 to t-2.
Non-price based size portfolios. We also form SMB and value-weighted size decile portfolios using non-price based measures of size, as suggested by Berk (1995b, 1997), in lieu of a firm's market capitalization to rank stocks. Specifically, using the same methodology as for SMB above, we form five sets of non-price size portfolios based on book value of assets, book value of equity, sales, property, plant, and equipment (PP&E), and number of employees.
Quality minus junk. We form a quality minus junk factor, QMJ, following Asness, Frazzini, and Pedersen (2014), which is formed by ranking stocks on measures of quality/junk based on their profitability, growth, safety, and payout. The motivation for their measures comes from the Gordon growth model, where dividing both sides of P = D/(r-g) by the book value and rearranging terms yields profitability and payout in the numerator and required return and growth in the denominator. Hence, the components profitability and payout mentioned above approximate the numerator, while safety (measured by return-based measures) proxies for the required return, r, and growth is designed to capture g. The details of each of these measures are provided in Asness, Frazzini, and Pedersen (2014), and we use several variations of their quality and junk measures, as well as related measures of investment and profitability from Fama and French (2014), and some measures not used by either Asness, Frazzini, and Pedersen (2014) or Fama and French (2014), for robustness. Quality or junk is measured from a combination of these measures and QMJ is formed in a manner similar to the methodology used by Fama and French (1993) where stocks are ranked by size and quality/junk measures independently into two size and three quality/junk groups and the intersection of the groups forms six portfolios where QMJ is equally long the two quality portfolios and short the two junk portfolios.³
Intra-industry portfolios. We also form SMB portfolios within each of 30 industries used by Fama and French (1997) and available on Ken French's website, where we construct SMB in a similar fashion within each industry so that we obtain 30 SMB industry-neutral portfolios.
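The Gordon growth rearrangement mentioned in the quality minus junk description can be written out as a short derivation. The symbol B (book value) and the label "profit" are introduced here for illustration; the notation in Asness, Frazzini, and Pedersen (2014) may differ.

```latex
\begin{align*}
P &= \frac{D}{r - g} \\
\frac{P}{B} &= \frac{D/B}{r - g}
  = \frac{(\text{profit}/B) \times (D/\text{profit})}{r - g}
  = \frac{\text{profitability} \times \text{payout ratio}}{r - g}
\end{align*}
```

Holding the other terms fixed, a higher price-to-book ratio goes with higher profitability, higher payout, higher growth g, or a lower required return r, which is the sense in which these four characteristics proxy for quality.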

Liquidity. We form decile portfolios based on liquidity levels using monthly turnover (number of shares traded divided by shares outstanding) following Ibbotson, Chen, Kim, and Hu (2013) and bid-ask spread as a percentage of share price following Amihud and Mendelson (1986), and we use the Pastor and Stambaugh (2003) liquidity risk factor-mimicking portfolio, available from Robert Stambaugh's webpage.
International data. We form many of the above portfolios and factors in each of 23 other developed equity markets following the same methodology. Our international equity data include all available common stocks on the XpressFeed Global database for 23 developed markets from January 1983 to December 2012. We assign individual stocks to the corresponding market based on the location of the primary exchange. For international companies with securities traded in multiple markets, we use the primary trading vehicle identified by XpressFeed.
Our global portfolio construction closely follows Fama and French (1996 and 2012) and Asness and Frazzini (2012). The portfolios are country neutral in the sense that we form long-short portfolios within each country and then compute a global factor by weighting each country's long-short portfolio by the country's total (lagged) market capitalization. The market factor, RMRF, is the value-weighted return on all available stocks across all markets minus the one-month U.S. Treasury bill rate. The size and value factors are constructed using six value-weighted portfolios formed on size and book-to-market. At the end of June of year t, stocks are assigned to two size-sorted portfolios based on their market capitalization. While for the U.S., the size breakpoint is the median NYSE market equity, for the international sample the size breakpoint is the 80th percentile by country in order to roughly match the U.S. size portfolios. Since some countries have a small cross section of stocks in the early years of our sample, we also use conditional sorts that first sort on size, then on book-to-price, in order to ensure we have enough securities in each portfolio (whereas the U.S. sorts are independent).
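The country-neutral aggregation described above can be sketched as follows. The function name, the country codes, and the numbers are hypothetical; the sketch only shows the weighting of each country's long-short return by that country's lagged total market capitalization.

```python
def global_factor_return(country_returns, lagged_market_caps):
    """Weight each country's long-short return by its lagged total market cap.

    country_returns: {country: long-short factor return this month}
    lagged_market_caps: {country: total market cap at the end of last month}
    """
    total = sum(lagged_market_caps[c] for c in country_returns)
    return sum(lagged_market_caps[c] / total * r
               for c, r in country_returns.items())

# Hypothetical inputs for two countries, for illustration only.
returns = {"AAA": 0.01, "BBB": -0.02}
caps = {"AAA": 300.0, "BBB": 100.0}
print(global_factor_return(returns, caps))
```

Using lagged rather than contemporaneous market capitalization keeps the weights known at the start of the month, so the global factor is an implementable portfolio.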
Portfolios are value-weighted and reconstructed every month and rebalanced every calendar month to maintain value weights.⁴ In order to be included in any of our tests we require a firm to have a non-negative book value and non-missing price at fiscal yearend as well as in June of calendar year t. All portfolio returns are in $US and excess returns are relative to the one-month U.S. Treasury bill rate.
⁴ To obtain shareholders' equity we use Stockholders' Equity (SEQ), but if not available, we use the sum of Common Equity (CEQ) and Preferred Stock (PSTK). If both SEQ and CEQ are unavailable, we will proxy shareholders' equity by Total Assets (TA) minus the sum of Total Liabilities (LT) and Minority Interest (MIB). To obtain book equity, we subtract from shareholders' equity the preferred stock value (PSTKRV, PSTKL, or PSTK depending on availability). Finally, to compute book value per share (B) we divide by common shares outstanding (CSHPRI). If CSHPRI is missing, we compute company-level total shares outstanding by summing issue-level shares (CSHOI) at fiscal yearend for securities with an earnings participation flag in the security pricing file.

B. Reexamining the evidence on size alone
Table 1 replicates the evidence on the size effect from the literature using the full sample including recent data. The first three columns report results for SMB and the second three columns for the difference in returns between deciles 1 and 10 (a more extreme difference in size than SMB and also unadjusted through bivariate sorts for book-to-price like SMB). The first row reports the mean, standard deviation, and t-statistic of the size premium over the full sample period from July 1926 to December 2012. SMB yields a 23 basis point premium per month that is statistically significant at the 5% level (t-statistic = 2.27). The decile spread returns also yield a positive return of 55 basis points per month, which is also significant. This first result highlights that the size effect is relatively weak compared to other anomalies such as value and momentum that each exhibit much stronger and more reliable return premia.⁵ The next two rows report the returns to size in the months of January and February through December, separately. The returns to SMB are enormous in January at 2.3% per month and the 1-10 spread in size decile returns is even larger at 6.8% in January. However, February through December SMB delivers only 4 basis points and the 1-10 portfolio spread -1 basis point, both of which are statistically and economically no different from zero. Hence, what reliable positive premium exists for size appears to solely reside in January and is absent the rest of the year.
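The summary statistics reported in Table 1 (mean, standard deviation, and t-statistic, overall and split by January versus the rest of the year) can be illustrated with a minimal sketch. It assumes the t-statistic is the mean divided by its standard error under independent monthly returns; the paper does not state the exact standard-error method, and the sample data below are made up.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(returns):
    """Return (mean, standard deviation, t-statistic of the mean).

    Assumption: returns are treated as independent draws, so the
    standard error of the mean is stdev / sqrt(n).
    """
    n = len(returns)
    avg = mean(returns)
    sd = stdev(returns)  # sample standard deviation (n - 1 denominator)
    return avg, sd, avg / (sd / sqrt(n))

def split_by_january(monthly):
    """Split (month, return) pairs into January and February-December returns."""
    january = [r for month, r in monthly if month == 1]
    rest = [r for month, r in monthly if month != 1]
    return january, rest

# Made-up monthly returns in percent, for illustration only.
sample = [(1, 2.0), (2, 0.1), (3, -0.2), (1, 3.0), (2, 0.0), (3, 0.3)]
jan, rest = split_by_january(sample)
print(summarize(jan))
print(summarize(rest))
```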
The next two rows report results over the original sample period studied by Banz (1981) from 1936 to 1975 and the out-of-sample period from Banz (1981), pertaining to 1926 to 1935 and 1976 to 2012. As Table 1 indicates, SMB is insignificant over Banz's original sample period and the 1-10 decile spread is marginally significant (t-statistic of 1.82), though the mean returns are similar to the full period results. The results from Banz (1981) over the same time period for similar decile portfolios are stronger than what we find here, which is likely due to data errors being fixed by CRSP after his paper was published.⁶ The out-of-sample evidence from Banz (1981) is actually a bit stronger for SMB, but weaker for the decile spread returns. Overall, the original size effect studied by Banz (1981) is weaker than originally found, consistent with the findings of Israel and Moskowitz (2013).
However, the size effect has experienced significant variation over time, including over relatively long-term periods. The next four rows of Table 1 report results over four sample periods: 1) the full period over which quality/junk measures are available, the "QMJ period" from July 1957 to December 2012; 2) the period from July 1957 to December 1979 shortly before the discovery and publication of the size effect, which we term the "Golden age" because the late 1970s was when most researchers were looking at the size effect, when its performance was highest; 3) the period from January 1980 to December 1999, which we call the "Embarrassment" period because this is when the size effect appears to have vanished promptly after being discovered and published; and 4) the period from January 2000 to December 2012, which we term the "Resurrection" period as the size effect appears to be revitalized during this period. The summary statistics in Table 1 highlight these results.
Indeed, consistent with the literature, the size effect seems to have disappeared in the 1980s and 1990s following its discovery, but also appears to have made a comeback in the last thirteen years.
Since our primary sample, which contains quality measures, is from July 1957 to December 2012, we also report the returns in January only and in the months February to December over this period. Consistent with the longer sample results, the entire size premium seems to be born in the month of January only and is conspicuously absent the rest of the year, and like before, the more extreme size bet from the 1-10 portfolio spreads exaggerates the January size effect. These last results illustrate perhaps the two biggest challenges to the robustness and interpretation of the size effect, where all of the returns to size seem to be coming from the most extreme small stocks in January. Excluding the very smallest stocks in January, there is little evidence of a size premium.
The last three rows of Table 1 report summary statistics for three other sample periods we will examine that pertain to data availability on other factors. The results over these subsample periods, which are partially covered by the other sample periods above, are consistent with our previous findings and not unusual over any of these subsamples.
Overall, there is a weak size effect, whose variation over time and across seasons is substantial, as documented in the literature. We turn our attention to these empirical challenges, as well as four others, through the lens of quality/junk in the next section.

II. The Size Effect, Controlling for Junk: Addressing Seven Challenges
In this section we analyze the seven challenges that have been leveled at the size premium, after accounting for the quality/junk of the stock.

The Size effect is not very significant
Table 2 reports time series regression results of SMB on a variety of factors. The first row of the first four-row stanza of Panel A of Table 2 reports results of SMB regressed on the market portfolio, RMRF, over the July 1957 to December 2012 time period, which is the full sample period over which quality/junk measures are available. The intercept or alpha from the regression is 12 basis points (bps) per month with a t-statistic of 1.12, which is insignificantly different from zero, suggesting that the CAPM explains the returns to SMB pretty well. The next row adds the lagged return on the market from the previous month in order to capture delayed price responses of stocks, particularly small stocks, to market-wide news (following the results and implications of Lo and MacKinlay (1988), Hou and Moskowitz (2005), and in the spirit of Asness, Krail, and Liew (2001) to account for non-synchronous price responses due to liquidity differences and lead-lag effects among stocks). SMB has a significantly positive coefficient on the lagged market return, which further pushes down the alpha to 7 bps. The third row reports results that add HML and UMD to capture value and momentum exposure. The alpha now is 14 bps with a t-statistic of 1.23. In the presence of the market and the other Fama and French factors (excluding SMB of course), there appears to be no reliable size premium.
Finally, the fourth row adds the QMJ factor to the regression. Recall, QMJ is a composite long-short portfolio giving equal weight to long profitable, growing, safe, and high payout companies and short unprofitable, stagnant, risky, and low payout firms. SMB loads very significantly and negatively on QMJ, raising SMB's alpha from 14 to 49 bps per month, which is almost five standard errors from zero (t-stat = 4.89). The addition of QMJ not only significantly raises the average return to size, but also increases the precision of the SMB premium, since QMJ explains a substantial fraction of the variation in SMB's returns. The R-square rises from 15 to 37 percent with the inclusion of this one additional factor. Figure 1 shows the impact of controlling for quality/junk on the size effect by plotting the cumulative sum of returns over time of SMB hedged with the market, its lagged value, HML, UMD, and QMJ, alongside SMB unhedged. The plot uses the full sample estimates of the betas from July 1957 to December 2012 to estimate the hedged returns to SMB.⁷ As Figure 1 shows, hedging SMB for exposure to junk significantly improves returns.⁸
For robustness, Figure 2 reports results across 30 different industries. We form SMB portfolios (long the smallest half of firms and short the largest half of firms) within each of 30 industries available from Ken French's data library. We then examine whether the improvement in SMB after controlling for quality/junk is similar within each industry. Though not 30 completely independent tests, this provides 30 different samples of firms from which we can test the robustness of the results.
Specifically, we compute the alpha of SMB within each industry relative to the market, its lagged value, HML and UMD. We then repeat this computation using the same factors plus QMJ and compare the difference within each industry. The first plot in Figure 2 shows the improvement in SMB alpha after controlling for QMJ for each of the 30 industries. The results are remarkably consistent. For every single industry, there is positive improvement in SMB's returns after controlling for quality/junk, and for most industries the improvement is significant (with significance, of course, harder to achieve in a much smaller sample of firms within a single industry).
⁷ We have also used the past rolling 120-months of returns to estimate the regression models and betas in order to calculate the hedged return, representing an implementable out-of-sample hedge portfolio, and found similar though slightly weaker results (presumably due to the noise in estimating the hedge).
⁸ Comparing the hedged returns to SMB using QMJ versus those just using the market, its lagged value, HML and UMD factors yields very similar results, too, in that the key hedge variable is QMJ that resurrects size.
The second figure plots the betas of each SMB portfolio on QMJ, which are all negative and are the mirror image of the improvement in alphas in the plot above it. These results indicate that the relation between size and quality/junk is very robust. Not a single industry fails to exhibit a strong negative relation between size and quality, and as a result, the size premium is consistently alive and well within every single industry.
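The mechanics behind these regressions, and behind the hedged series plotted in Figure 1, can be sketched on synthetic data. The code below is a minimal illustration, not the paper's estimation code: the `ols` and `hedged` helpers are hypothetical, the data are constructed so that SMB has a true alpha of 0.5 and a negative loading on a QMJ series with a positive mean, and the lagged market, HML, and UMD regressors are omitted for brevity.

```python
from itertools import accumulate

def ols(y, X):
    """Least-squares fit of y on a constant and the columns of X.

    Returns [alpha, beta_1, ..., beta_k], solving the normal equations
    (Z'Z) b = Z'y by Gauss-Jordan elimination with partial pivoting.
    """
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * yt for z, yt in zip(Z, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def hedged(y, X, betas):
    """y_t minus its factor exposures; the alpha is left in the series."""
    return [yt - sum(b * x for b, x in zip(betas, row))
            for yt, row in zip(y, X)]

# Synthetic data (not from the paper).
rmrf = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
qmj = [0.5, 1.0, -0.5, 1.5, 0.0, 0.5]
smb = [0.5 + 0.2 * m - 0.6 * q for m, q in zip(rmrf, qmj)]

without_qmj = ols(smb, [[m] for m in rmrf])
with_qmj = ols(smb, [[m, q] for m, q in zip(rmrf, qmj)])
print("alpha without QMJ:", without_qmj[0])
print("alpha with QMJ:   ", with_qmj[0])

hedged_smb = hedged(smb, [[m, q] for m, q in zip(rmrf, qmj)], with_qmj[1:])
print(list(accumulate(hedged_smb)))  # cumulative sum of hedged returns
```

In this construction the alpha is lower when QMJ is left out, because the omitted factor earns a positive average return and SMB loads negatively on it; that is the same direction as the headwind described in the text.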
QMJ makes short work of this first, and perhaps most important, challenge to the size effect, as it simultaneously resurrects the return premium to size as well as explains much of its variation, transforming it from a small and insignificant effect to a large and statistically strong one, doing so consistently across every industry.

Variation in the size premium over time
Figure 1 anticipates the results in this section, as casual perusal shows a far more consistent size premium when hedged for QMJ exposure. More formally, the remaining stanzas of rows of Panel A of Table 2 repeat the regressions above over the three subsample periods we defined earlier (golden age, embarrassment, and resurrection), corresponding to the periods over which the size premium varies substantially. During the "golden age" from July 1957 to December 1979 there is a more positive size premium of about 25 bps when adjusting for the market, its lagged value, HML and UMD (though the t-statistic is only 1.52). This is not surprising since we defined the golden age based on SMB's higher positive returns ex post. Adding QMJ, however, makes the age "more golden" as it more than doubles the alpha to 57 bps with a t-stat of 4.00.
Looking at the embarrassment period, from 1980 to 1999, where we know SMB did not do well, we see consistently negative alphas, until we add QMJ. Adding QMJ restores SMB's positive alpha over this period to a robust and sizeable 50 bps (t-stat of 3.06), which is the same magnitude as SMB's alpha over the golden age period. Hence, controlling for quality/junk fully explains the very different performance of the size premium over these two seemingly very different periods. Despite SMB performing reasonably well over the golden age and performing very poorly over the embarrassment period, once we control for QMJ, the performance of SMB over both periods is exactly the same. In other words, it's the performance of QMJ (and, as we will see shortly, the performance of junk in particular) that drives the apparent variation over time in SMB's performance.
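For reference, the two regressions being compared in each subsample can be written compactly as below. The coefficient symbols are labels chosen here for illustration and are not notation taken from the paper; all coefficients are re-estimated in each regression and in each subsample.

```latex
\begin{align*}
\text{SMB}_t &= \alpha + \beta\,\text{RMRF}_t + \beta_{-1}\,\text{RMRF}_{t-1}
              + h\,\text{HML}_t + m\,\text{UMD}_t + \varepsilon_t, \\
\text{SMB}_t &= \alpha^{Q} + \beta\,\text{RMRF}_t + \beta_{-1}\,\text{RMRF}_{t-1}
              + h\,\text{HML}_t + m\,\text{UMD}_t + q\,\text{QMJ}_t + \varepsilon_t .
\end{align*}
```

In this notation, the subsample results above amount to an estimate of $\alpha^{Q}$ that is similar in the golden age (57 bps) and the embarrassment period (50 bps), even though $\alpha$ differs sharply between the two.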
Electronic copy available at: https://ssrn.com/abstract=2553889 Finally, looking at the resurrection period, we see again positive SMB alphas with respect to the market, its lagged value, HML and UMD factors, but even larger alphas once we control for QMJ.
Like the other two sub periods, the alpha of SMB in the presence of QMJ is of similar magnitude and highly significant. Hence, accounting for junk, the premium for size is robust, positive, and stable, exhibiting far less variation through time. 9 The QMJ factor constructed by Asness, Frazzini, and Pedersen (2014) is a composite of many factors and measures designed to capture quality/junk by looking at variables that proxy for a variety of attributes, including profitability, safety, payout, and growth. In their paper, Asness, Frazzini, and Pedersen (2014) show that various combinations of their measures as well as individual measures yield very similar results. We, too, show that various measures of quality/junk give similar results on the efficacy of the size effect. Panel B of Table 2 repeats the full period regressions for SMB using each of the various four subcomponents of QMJ in place of the full QMJ factor. Despite the vastly different measures, in each case the loading on quality is significantly negative and SMB's alpha is significantly positive and more stable. For example, controlling for profitability instead of QMJ, SMB's alpha is 42 bps, a 30 bps improvement from the base case of controlling for the market, its lagged value, HML and UMD and almost four standard errors from zero. Controlling for safety or payout as measures of quality yields very similar numbers. The weakest quality measure is growth (consistent with Asness, Frazzini, and Pedersen's (2014) findings as well), yet even here there is a marginally significant 20 bps size premium when controlling for this relatively weak measure of quality, and again SMB loads significantly negatively on the growth component. 10 These results are very consistent and indicate the relation between size and quality/junk is quite robust across different measures.
One concern with QMJ is that it is constructed using a variety of measures, some of which individually have been shown to predict returns, and hence may overfit the historical return data (a form of collective data mining from the literature). Using the QMJ subcomponents separately partly addresses this concern, but as a further robustness test, we also employ some related factors from other work and by other authors. We start by looking at a single measure of safety from Frazzini and Pedersen's (2013) study of "betting against beta" (BAB). A version of BAB is one part of the safety composite employed in constructing QMJ, but here we break it out separately because unlike the other measures in QMJ, BAB is available going back much further to January 1931, providing an out of sample test.
9 Other factors do exhibit variation over time in their relation to SMB. One notable example is the lagged return on the market, which is significant in the first two sub periods but insignificant in the most recent sub period. This is consistent with markets becoming much more liquid over time, resulting in less of a lead-lag effect for small stocks and hence less of a delayed reaction to the market for small firms.
10 Growth provides the weakest results, where the SMB alpha is not statistically significant. However, growth is the poorest measure of quality/junk according to Asness, Frazzini, and Pedersen (2014), and yet it still increases SMB's alpha relative to omitting it as a factor and the coefficient on quality as measured by growth is still significantly negative. Given that three out of four subcomponents deliver significantly positive alphas and growth produces alpha improvement with the same sign and direction, the overall results across different measures are quite robust.
The first row of Panel C of Table 2 employs the betting against beta or BAB zero-cost factor, which is a dollar-neutral strategy of going long low beta and short high beta stocks from Frazzini and Pedersen (2013), in place of QMJ over the same sample period as QMJ from July 1957 to December 2012. As the table indicates, this measure of safety is also able to capture some of the quality/junk spectrum as the alpha of SMB is pushed upward to 25 basis points (t-statistic of 2.42), and there is a strong negative loading on BAB of -0.43 with a t-statistic of -12.30. Comparing these results to only adjusting for the Fama-French factors (row 3 at the top of Table 2 Panel A, where the alpha is only 14 bps with a t-stat of 1.23) the SMB alpha nearly doubles and is now reliably different from zero.
The next two rows of Panel C of Table 2 report the same regression results over the out-of-sample period from January 1931 to June 1957. The SMB alpha on just the market, its lagged value, HML and UMD factors is only 6 bps over this period, but increases to 16 bps with the inclusion of BAB as a quality metric in the regression. Although the t-stat on the alpha is only 0.90 over this shorter sample period, there is still an improvement from adding a measure of quality to the model, even a simple one such as BAB. The negative loading of SMB on BAB is -0.35 with a t-stat of -4.99, indicating that even this very simple measure of quality is strongly and reliably negatively related to size. The next two rows of Panel C report the regression results over the full period for which BAB is available, January 1931 to December 2012, where SMB's alpha is an insignificant 7 bps relative to the market, its lagged value, HML and UMD factors, but rises to a significant 23 bps per month (t-stat of 2.50) with the inclusion of BAB as a measure of quality.
The last two rows of Table 2 Panel C use a measure of quality or junk that is not used among any of the subcomponents of QMJ, due to limited data availability. We use debt ratings on firms to create a credit spread that should be related to other measures of quality, but credit ratings are only available for enough firms beginning in July 1987. Specifically, we use the equity return difference between firms with A-rated or higher debt minus the equity returns of firms with C-rated or below debt, where the market capitalization-weighted average of returns is computed for each group. We call this factor Cred, which captures the equity return difference between firms with high creditworthy debt minus low rated debt. As the last two rows of Panel C of Table 2 show, even over this very short time period, there is a robust negative loading of SMB on this credit spread factor (-0.08 coefficient with a t-statistic of -5.73), but the SMB alpha is only pushed up to 35 bps per month (t-statistic of 2.12) given the small average returns to the credit factor. Nevertheless, the consistent negative relationship between size and another, totally different measure of quality, provides a nice robustness test.
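The construction of the Cred factor for a single month can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: the rating buckets below (AAA/AA/A as "A-rated or higher", CCC/CC/C/D as "C-rated or below") are one plausible reading of the text, the function names are hypothetical, and the sample numbers are invented.

```python
import numpy as np

HIGH_RATED = {"AAA", "AA", "A"}        # assumption: reading of "A-rated or higher"
LOW_RATED = {"CCC", "CC", "C", "D"}    # assumption: reading of "C-rated or below"

def value_weighted_return(returns, market_caps):
    """Market capitalization-weighted average return of a group of firms."""
    returns = np.asarray(returns, dtype=float)
    market_caps = np.asarray(market_caps, dtype=float)
    return float(np.sum(market_caps * returns) / np.sum(market_caps))

def cred_return(returns, market_caps, ratings):
    """Equity return of high-rated firms minus that of low-rated firms."""
    returns = np.asarray(returns, dtype=float)
    market_caps = np.asarray(market_caps, dtype=float)
    high = np.array([r in HIGH_RATED for r in ratings])
    low = np.array([r in LOW_RATED for r in ratings])
    return (value_weighted_return(returns[high], market_caps[high])
            - value_weighted_return(returns[low], market_caps[low]))

# Invented one-month cross-section of five firms; the BBB firm is in neither leg.
cred = cred_return(
    returns=[0.01, 0.03, -0.02, 0.05, 0.00],
    market_caps=[100, 300, 50, 150, 200],
    ratings=["AAA", "A", "CCC", "D", "BBB"],
)
```

Repeating this calculation month by month would give a return series that can be used as a regressor in the same way as QMJ or BAB.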
Finally, Panel D of Table 2 examines the Fama and French (2014) five-factor model, which contains the RMW ("robust minus weak") profitability factor and CMA ("conservative minus aggressive") investment factor, which may pick up elements of quality/junk as well. Indeed, profitability is one measure used in QMJ's construction though in a different form, so this can be thought of as a robustness check by specification of the profit factor. Fama and French (2014) Table 2, but note that the results are nearly identical for their "2x2" and "2x2x2x2" factor specifications. As the first row of Panel D of Table 2 reports, SMB has an insignificant 16 bps alpha relative to the market, its lagged value, and HML and UMD. As the second row indicates, adding the two new Fama and French (2014) profitability and investment factors, RMW and CMA, respectively, SMB loads significantly negatively on RMW (the profitability factor) and marginally negatively on CMA (the investment factor), which doubles its alpha to 33 bps per month (t-statistic of 2.81). These results are consistent with both of the new Fama and French (2014) factors being related to quality/junk, though there is a much stronger relationship for profitability than investment. Intuitively, both profitability and investment are characteristics that should differ widely among high versus low quality firms. In essence, the new Fama and French (2014) factors pick some of this up.
The third row of Panel D then adds QMJ to the regression. Two interesting things happen: 1) the negative coefficients on RMW and CMA disappear, being soaked up by the very strong negative loading on QMJ and 2) SMB's alpha rises even higher to 54 bps per month. Hence, QMJ seems to capture the explanatory power of Fama and French's (2014) profitability and investment factors on the size effect. The fourth row of Panel D repeats this last regression using simple BAB in place of QMJ as a quality measure. In this case, BAB, which SMB loads significantly negatively on, only partially captures the negative exposure to Fama and French's (2014) profitability factor RMW, consistent with BAB being a related, but noisy measure of quality.
The next two rows of Panel D of Table 2 repeat the regressions adding the credit factor, Cred, to the regression over the shorter sample period July 1987 to December 2012 when the credit data is available, as another robustness test. Over this shorter sample period, the Fama and French (2014) profitability and investment factors still exhibit a negative relation with SMB, though the loading on investment is not reliably different from zero. Adding QMJ, BAB, and Cred to the regression eliminates the negative exposure to RMW, where each of QMJ, BAB, and Cred all have reliably negative loadings with respect to SMB.
Overall, the results indicate that other forms of capturing the quality of firms, including Fama Hence, the second challenge to the size effect, that it varies significantly through time, has been met. The variation in the size premium over long stretches of time is almost completely explained by the performance of quality and junk. Thus, it is the returns to quality and junk, and not size, that have confounded previous results.

Is the size premium concentrated in extreme stocks?
Figure 3 examines the returns to size more finely by looking across size-sorted decile portfolios.
From this analysis we can address another criticism of the size factor: whether the size premium is concentrated in the extremes or whether there is monotonicity in the relationship between size and average returns.
The top graph of Figure 3 plots the alphas of each size decile with respect to three factor models: 1) the market model (RMRF), 2) the Fama and French factors RMRF, RMRF lagged a month, HML, and UMD and 3) these same factors augmented with the QMJ factor. All regressions are run over the full sample period from July 1957 to December 2012. As the figure shows, the market-adjusted alphas and alphas adjusted for Fama and French factors are higher for the smallest decile of stocks (decile 1) compared to the largest (decile 10), but are otherwise pretty flat across deciles 2 through 9 and exhibit no reliable pattern (e.g., decile 3 appears to be highest, and decile 1 is lower than deciles 4 through 9 when adjusting for the Fama and French factors). In short, there is no consistent relation between size and average returns across the deciles in terms of market or Fama and French-adjusted alphas. This is consistent with claims in the literature that the size effect is concentrated in the extremely small, microcap stocks and not nearly monotonic. However, when adding QMJ as a factor, not only is a very large difference in average returns between the smallest and largest size deciles observed, but, perhaps more interestingly, there is an almost perfect monotonic relationship between the size deciles and the alphas. As we move from small to big stocks, the alphas steadily decline and eventually become negative for the largest stocks. Hence, controlling for quality/junk restores a monotonic relation between size and average returns that is absent otherwise.
The second graph in Figure 3 repeats the plot over the "golden age" period for size from July 1957 to December 1979. Over this sub period we know there is a significant size premium, even in the presence of the market and the Fama and French factors. However, while a significant difference in alphas does indeed exist between deciles 1 and 10 over this period, the relation between size and average returns is closer to but still not nearly monotonic, even over the "golden age" period. The market-adjusted and Fama and French alphas are larger for smaller stocks, but are essentially the same across the first five size deciles with no reliable decline in alpha as size gets bigger. Likewise, larger stocks exhibit lower alphas on average, but there is also no reliable pattern from deciles 6 through 9. Controlling for QMJ, however, we see a strong monotonic relationship between size and alpha. Hence, even over the period where size "worked," there is little evidence of a tight monotonic relation between size and average returns, unless we control for quality/junk. The fact that, as we have already shown, QMJ resurrects the size premium may in part be related to restoring monotonicity as well, since a larger absolute premium may reduce the influence of noise on each portfolio. But, it does not have to work out this way. QMJ could have just as easily raised the returns on all size deciles equally and not improved monotonicity, or it could have added more to the larger deciles or to random deciles and actually reduced monotonicity. The fact that as size increases we see proportionately more alpha when including QMJ, suggests that quality/junk exposure is indeed related to size in a monotonic way and controlling for quality/junk restores a tight linear relation between size and average returns.
The relation between size and QMJ is also quite stable through time. The top graph in Figure 4 plots 10-year rolling beta estimates of SMB on QMJ over the sample period (July 1967 to December 2012) and shows that the betas are always negative and range from -0.40 to -1.25 approximately. The second graph in Figure 4 plots the time-varying betas of each size decile on QMJ. Again, the time-series variation in the betas is relatively small, but more interestingly the monotonic relation between size and quality/junk is extremely stable through time, as smaller size deciles consistently have more negative QMJ betas and the effect is almost completely monotonic throughout the rolling sample.
There are few months where betas with respect to QMJ are not ordered almost perfectly by size, a remarkable feat considering the estimation error inherent in beta estimates. Repeating the same exercise for other measures of quality/junk using either the subcomponents of QMJ, Fama and French's (2014) profitability and investment factors, or Frazzini and Pedersen's (2013) BAB factor, we find similar patterns.
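A 10-year rolling beta of the kind plotted in Figure 4 can be sketched as follows. This is a simplified illustration, not the authors' estimation: it uses a univariate regression of SMB on QMJ over trailing 120-month windows (the text does not say here whether other factors are included), the function name `rolling_beta` is hypothetical, and the data are synthetic.

```python
import numpy as np

def rolling_beta(y, x, window=120):
    """Slope of y on x over trailing windows; NaN until a full window exists."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    betas = np.full(len(y), np.nan)
    for end in range(window, len(y) + 1):
        ys = y[end - window:end]
        xs = x[end - window:end]
        xc = xs - xs.mean()
        # OLS slope: covariance of x and y divided by variance of x.
        betas[end - 1] = np.dot(xc, ys - ys.mean()) / np.dot(xc, xc)
    return betas

# Synthetic example: SMB is an exact linear function of QMJ with slope -0.8,
# so every full window should recover that slope.
rng = np.random.default_rng(1)
qmj = rng.normal(0.004, 0.02, size=300)
smb = 0.003 - 0.8 * qmj
betas = rolling_beta(smb, qmj)
```

Because the first estimate needs 120 months of data, a sample that starts in July 1957 yields rolling betas from mid-1967 onward, which matches the start of the plotted period.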
The challenge that size is concentrated in the extremes or not monotonically related to average returns is cleared up by controlling for quality/junk.
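One simple way to put a number on "monotonicity" across deciles is a rank correlation between decile number and alpha. This is offered only as an illustrative diagnostic; the paper itself relies on the plots in Figure 3, and the alphas below are invented values, not read from the figure. The function name is hypothetical and the implementation assumes there are no ties.

```python
import numpy as np

def rank_correlation(deciles, alphas):
    """Spearman rank correlation, assuming no tied values in either input."""
    d = np.argsort(np.argsort(np.asarray(deciles, dtype=float))).astype(float)
    a = np.argsort(np.argsort(np.asarray(alphas, dtype=float))).astype(float)
    return float(np.corrcoef(d, a)[0, 1])

deciles = np.arange(1, 11)  # 1 = smallest stocks, 10 = largest

# Invented alphas (percent per month) for illustration only.
flat_alphas = np.array([0.10, 0.02, 0.12, 0.05, 0.06, 0.04, 0.07, 0.03, 0.055, -0.01])
monotone_alphas = np.array([0.45, 0.38, 0.33, 0.27, 0.21, 0.15, 0.10, 0.04, -0.01, -0.06])

rho_flat = rank_correlation(deciles, flat_alphas)
rho_monotone = rank_correlation(deciles, monotone_alphas)
```

A value of -1 corresponds to alphas that fall strictly from the smallest to the largest decile; a pattern that is high only in decile 1 and noisy elsewhere gives a value noticeably closer to zero.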

Non-price based size measures
Berk (1995a) argues that because size is typically measured by market capitalization, which contains market prices, any misspecification of the pricing model will lead to a negative relation between size and average returns. He suggests that using non-price based measures of size is therefore a better way to test the true relation between size and average returns. Berk (1995b, 1997) finds that using such measures results in no reliable size premium. Table 3 reexamines the relation between non-price based measures of size and average returns in light of controlling for quality/junk. Panel A of Table 3 first shows results without controlling for QMJ and Panel B reports the results when QMJ is included in the regression. We rank stocks based on the non-price size measures suggested by Berk (1997) plus two others, which are: book assets, book equity, sales, PP&E, and number of employees. For each non-price size measure, stocks are ranked into deciles every June and the value-weighted returns of each decile are computed over the following year, the exact same procedure we use to form the market cap size deciles. Panel A of Table 3 reports the alphas of the return difference between the smallest and largest decile portfolios regressed on the Fama and French factors RMRF, RMRF lagged, HML, and UMD. 11 The alphas and their t-stats are reported over the full, golden age, embarrassment, and resurrection periods. As Panel A of Table 3 shows, confirming Berk's (1995b, 1997) results, there is no reliable size premium for any of the non-price size measures over any subsample period, including the golden age for size, where we know market cap size measures perform quite well. Panel B of Table 3 reexamines these results by simply adding QMJ to the regression. Doing so systematically resurrects a size premium among every non-price based size measure and over every subsample period.
The contrast in results going from Panel A to B is striking: every estimated alpha from Panel A is insignificant (and many point estimates negative), while every alpha in Panel B is positive and significant. Comparing the magnitude of these alphas to those based on market capitalization (from Table 1 and Figure 3), we reject Berk's (1995a) conjecture that the non-price based size deciles deliver a smaller or insignificant size premium. Book assets, sales, book equity, PP&E, and employees produce size decile premia of 83, 67, 66, 58, and 68 bps per month, respectively, while the market cap size decile premium is 49 bps over the QMJ sample period, after controlling for QMJ.
Hence, controlling for quality/junk, the non-price size measures produce large return premia that are in fact larger than those based on market cap sorts.
In addition, the non-price based size premia are also very stable across the different subsamples once we control for QMJ, as evidenced by the results over the golden age, embarrassment, and resurrection periods. The stability of size's performance over time in the presence of QMJ is consistent with our earlier results using market cap to measure size.
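The decile construction behind Table 3 (rank stocks on a non-price size measure, form value-weighted decile portfolios, and take the smallest-minus-largest spread) can be sketched for one period as follows. This is a simplified illustration with invented data: it uses equal-count deciles over all firms, which is an assumption since the breakpoint convention is not stated in this section, and the function names are hypothetical.

```python
import numpy as np

def assign_deciles(size_measure):
    """Map each firm to a decile from 1 (smallest) to 10 (largest)."""
    x = np.asarray(size_measure, dtype=float)
    ranks = np.argsort(np.argsort(x))        # 0 for the smallest firm, n-1 for the largest
    return ranks * 10 // len(x) + 1

def small_minus_big(size_measure, market_caps, returns):
    """Value-weighted return of decile 1 minus value-weighted return of decile 10."""
    deciles = assign_deciles(size_measure)
    caps = np.asarray(market_caps, dtype=float)
    rets = np.asarray(returns, dtype=float)

    def vw(mask):
        return np.sum(caps[mask] * rets[mask]) / np.sum(caps[mask])

    return float(vw(deciles == 1) - vw(deciles == 10))

# Invented cross-section of 20 firms: the sorting variable stands in for book assets.
book_assets = np.arange(1, 21, dtype=float)
market_caps = 10.0 * book_assets
returns = 0.02 - 0.001 * book_assets

spread = small_minus_big(book_assets, market_caps, returns)
```

Note that the sort uses the non-price measure while the portfolio weights still use market capitalization, since the text specifies value-weighted decile returns.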
For robustness, Figure 5 reports results from the intra-industry exercise we conducted earlier, but using SMB portfolios formed from the non-price based size measures instead of market capitalization. Within each of the 30 industries, we form SMB portfolios based on book assets, sales, book equity, PP&E, and number of employees. We then regress the SMB returns on the market, its lagged value, HML and UMD factors and regress the SMB returns on these same factors plus QMJ.
The difference between the alphas is then plotted industry-by-industry in Figure 5, representing the improvement in SMB performance from controlling for quality/junk.
As Figure 5 shows, in nearly every case, for every industry and non-price size measure, there is a more robust SMB premium once we control for QMJ. Of the 150 non-price × industry combinations, only two (book equity in the coal and oil industries) fail to yield increased SMB alphas when adding QMJ as an additional explanatory factor. The second graph of Figure 5 reports the beta on QMJ for each intra-industry non-price based SMB portfolio, which similarly shows that every non-price based size portfolio (except the two noted above) yields a significantly negative beta on QMJ.
The challenge that the size premium only shows up for market price based measures of size is met by controlling for quality/junk. Doing so, we find a healthy, robust, and equally large size premium associated with portfolios sorted on non-price based measures of size that is robust in different time periods and within 30 different industries.

Seasonality in Size: the January effect
One of the biggest challenges researchers pose to any interpretation of the size effect is that it mostly resides in January. Table 1 showed that all of the returns to SMB and the decile size spread are concentrated in January, with no evidence of any size effect outside of January. Table 4 reexamines the seasonality in the size premium after controlling for QMJ. The first row reports results from regressions of the returns to SMB on a January dummy, a non-January dummy The second row of Table 4 adds QMJ to the regression. QMJ has two effects on the results.
First, it delivers a positive and significant size premium outside of January of 38 bps (t-stat = 3.62), and second, it mitigates the very large premium in January, dropping it from 2.09% to 1.57%. While a large January premium still remains, the premium for size is now present throughout the year and the difference between the January and non-January alpha, which still exists, is now approximately half that before adding QMJ.
The remaining rows of Table 4 repeat this exercise over the various subsample periods: golden age, embarrassment, and resurrection. In every sub period, QMJ rescues the size effect outside of January, delivering a consistent premium of at least approximately 30 bps (golden age), and as much as approximately 90 bps (resurrection), over the sub periods. In fact, outside of January, the returns to size controlling for QMJ are actually larger (almost twice as large) during the embarrassment period than they are during the supposed golden age for size. Hence, regarding February to December, the notion of the golden age period for size and the embarrassment period for size actually have it backwards! As Table 4 shows, this is due to QMJ confounding the performance of size over this period as well as the extreme returns in January. Put differently, a major reason the golden age for size exists is because of an enormous January return and failure to control for quality/junk. QMJ also diminishes the size premium in January, where it is actually insignificant in the last two sub periods and insignificantly different from the returns in February to December over these sub periods.
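The seasonal regression underlying Table 4 can be sketched as a regression of SMB on a January dummy, a non-January dummy, and the factors, with no separate intercept, so that the two dummy coefficients are the January and February-to-December alphas. This is a minimal sketch under that assumption; the function name is hypothetical and the data are synthetic and noiseless so the known coefficients are recovered exactly.

```python
import numpy as np

def seasonal_alphas(y, months, factors):
    """Return (January alpha, non-January alpha, factor loadings) from OLS."""
    months = np.asarray(months)
    jan = (months == 1).astype(float)
    # Two dummies replace the usual constant, so no intercept column is added.
    X = np.column_stack([jan, 1.0 - jan, factors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1], coef[2:]

# Synthetic 20-year monthly sample (invented numbers): a 2% January alpha,
# a 0.3% alpha in other months, and a loading of -0.5 on the quality factor.
rng = np.random.default_rng(2)
months = np.tile(np.arange(1, 13), 20)
qmj = rng.normal(0.004, 0.02, size=months.size)
smb = np.where(months == 1, 0.02, 0.003) - 0.5 * qmj

jan_alpha, rest_alpha, loadings = seasonal_alphas(smb, months, qmj)
```

Dropping the factor column from `X` gives the raw seasonal means, so comparing the two sets of dummy coefficients shows how much of the January and non-January premia is attributable to the added factor.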
These results suggest that quality/junk also helps explain the strong seasonality associated with size-based strategies. In particular, the strong performance of junk stocks in January drives a significant fraction of the apparently high returns to size in January, while depressing the returns to size outside of January. Controlling for quality/junk reduces this seasonal component substantially and shows a strong size premium throughout the year, addressing another one of the major challenges to explaining the size effect. Figure 6 reports results that combine all four of the previous challenges to size: time-variation, concentration in the extremes, non-price size measures, and seasonal patterns. The first set of four graphs from Figure 6 plots the alphas outside of January (from February to December) of various size portfolios with respect to 1) the Fama and French factors RMRF (and its lagged value), HML, and UMD and 2) those same factors plus QMJ. The second set of four graphs plots both sets of alphas in January only. The size portfolios we examine include SMB, the spread in P1 -P10, P2 -P9, and P3 -P8 decile portfolios based on market cap sorts, to gauge whether the relationship between size and average return is driven by the very small and very big stocks. We also examine P1 -P10 spreads in decile portfolios based on sorts of non-price based measures of size: book assets, sales, PP&E, and number of employees. (We drop book equity here for brevity and since it is so highly correlated to book assets.) Finally, all results are additionally reported over the four sample periods we study: the QMJ, golden age, embarrassment, and resurrection periods.
The first graph in Figure 6 plots the alphas of the size portfolios with and without QMJ over the QMJ sample period from July 1957 to December 2012 over all months outside of January (February to December). As the figure shows, there is no size premium for any of the size portfolios from February to December when we do not control for QMJ. Whether using market based or non-price based measures of size, or more extreme size differences across size deciles, outside of January there is no evidence of any positive size effect. If anything, there is a slightly negative size effect February through December.
However, controlling for QMJ, these results change dramatically. First, every single size portfolio exhibits a significantly positive size premium February through December once we control for QMJ. Second, the size premium is monotonically related to size exposure, as the returns to P1 -P10 are larger than P2 -P9, which in turn are larger than P3 -P8, with each being significantly positive. Third, non-price based size portfolios yield equally large size premia as market cap based sorted size portfolios.
The next three graphs repeat these plots over the golden age, embarrassment, and resurrection subsample periods, yielding analogous results. In places where the size premium was conspicuously absent (February to December, for less extreme size sorts, for non-price based measures of size, and for periods such as the 1980s and 1990s, the "embarrassment" subsample), controlling for quality/junk completely resurrects the size effect. The QMJ factor seems to explain the substantial variation in the size effect over subsamples, measures of size, and seasons that led researchers to question the robustness of the size effect. Controlling for the quality/junk of a stock reestablishes a very robust and persistent size anomaly immune from these previous challenges.
The next four graphs of Figure 6 repeat the exercise for the month of January only. Here, of course, there is a sizeable premium to all size portfolios before controlling for QMJ, including the non-price based size portfolios. But, in an interesting twist, once we control for QMJ, the premium to these size portfolios in January is actually reduced. In fact, January is the only place where controlling for QMJ actually lowers the size premium, and in the last two subsample periods we study, actually eliminates the January size premium almost completely. This suggests that quality versus junk stocks perform very differently in January versus the rest of the year and are confounding the size effect. Controlling for these characteristics reveals a stable size effect that is not greatly affected by seasonal patterns, time periods, or different measures of size.
Combining all of these results, controlling for quality/junk has the effect of smoothing the returns to size, establishing a clear and robust size premium that is no longer concentrated in January (most importantly, it is now quite significant excluding January entirely) and no longer concentrated in certain time periods or in certain measures. The behavior of small, junk firms varies substantially and is chiefly responsible for diluting the size effect at certain times, for certain months, and for certain measures, and exaggerating it at other times and months. Controlling for these firms through exposure to the QMJ factor, the size premium emerges in every time period, month, and for every reasonable measure of size, even those not based on market prices.

Illiquidity and the size effect
Many researchers claim that size is a proxy for and subsumed by an illiquidity premium. This story for the size effect seems to help rationalize some of its variation over time and some of the seasonal variation, since liquidity may also vary over these times (e.g., if markets become less liquid in January, or, perhaps, became more liquid in the 1980s and 1990s).
In light of our results controlling for quality/junk, we reexamine the relation between size and various proxies for illiquidity used in the literature and whether there is any interaction with quality/junk. Table 5 reports regression results for the size premium, SMB, on the factors RMRF, its lagged value, HML, UMD, and various proxies for liquidity and liquidity risk. For measures of liquidity, we use the decile spread in portfolios sorted on turnover (LIQ) following Ibbotson et al. (2013) and bid-ask spread (taking an equal-weighted average of the two), as well as the short-term reversal factor (STREV) from Ken French's website. Nagel (2012) argues and shows that short-term reversal profits globally vary with liquidity proxies and capture a liquidity premium. We also use the liquidity risk factor-mimicking portfolio of Pastor and Stambaugh (2003) (LIQRISK). One might also argue that the lagged return on the market is related to liquidity as well. These liquidity factor returns are not that correlated to each other so there is not a significant multicollinearity problem. On the other hand, the fact that they are not very correlated to each other indicates the difficulty and noise in measuring illiquidity. Hence, although these measures represent some of the "state of the art" with regard to liquidity factors, we interpret the following results with caution.
The first row of Table 5  The third row of Table 5 adds QMJ to the regression, in addition to the liquidity and liquidity risk factors. Two things happen. First, the alpha of SMB becomes large and significant at 47 bps per month with a t-stat of 3.90. Hence, QMJ rescues the size effect again, even in the presence of these liquidity and liquidity risk factors. Second, the loadings on the liquidity and liquidity risk factors decline because they are partially soaked up by the presence of QMJ. This result is intuitive, as the junky stocks, which are also highly illiquid and more sensitive to liquidity risk, get picked up by the QMJ factor, which is a more direct control for junk. Thus, controlling for junk removes some of what otherwise appears to be illiquidity exposures for SMB. However, again, since liquidity is measured with noise and along with that there is a lot of debate on how to measure it, we interpret these results with caution.
In addition, we argue that although QMJ may itself be related to liquidity, it is not a proxy for a liquidity premium. In fact, because it is long quality (more liquid) stocks and short junk (less liquid) stocks, if QMJ were just a liquidity factor it should deliver a negative risk premium, the opposite of what we find in the data. So, QMJ is more than just a liquidity effect, but certainly may contain some exposure to liquidity, which tends to reduce SMB's exposure to illiquidity when we add it to the regression.
The next two sets of results of Table 5 report the same regressions for the months of January only and the months February to December only. The seasonal results are striking. First, as we showed earlier, just controlling for the market, HML and UMD, there is a larger size premium in January and no evidence of any size premium the rest of the year (alpha of -3 bps with a t-stat of -0.22). Adding the liquidity factors, the January alpha drops from 64 bps to 39 bps with a t-stat of 0.68, suggesting that liquidity helps in explaining the apparently strong size effect in January. The liquidity variables seem to have little impact on SMB the remainder of the year, with the exception of the illiquidity factor, but the alpha for SMB still remains insignificant at -5 bps. Adding QMJ further reduces SMB's alpha in January to a measly 12 bps. However, for the rest of the year from February to December, QMJ raises SMB's alpha from an insignificant -5 bps to a significant 43 bps (t-stat of 3.48), with no additional exposure to illiquidity or to liquidity risk. Hence, the SMB premium does not appear to be very sensitive to these liquidity proxies, except perhaps in January, and produces a robust and stable return premium across all months outside of January when controlling for QMJ, even in the presence of these illiquidity and liquidity risk factors. It is an open question as to whether other liquidity factors would generate the same results and whether they, too, would be related to QMJ, where quality firms tend to be more liquid and junk firms less liquid.
The remaining rows of Table 5 report results for the golden age, embarrassment, and resurrection sub periods, separately. The results confirm our earlier findings and show that QMJ explains time variation in the size premium, delivering a large and stable premium over each sub period that is also not captured by these illiquidity or liquidity risk factors, and any variation in the exposure of SMB to these liquidity factors appears to be partially explained by QMJ as well.
In this paper we do not directly tackle the potential impact of trading and implementation costs on the returns to size once you control for quality. Since quality firms tend to be more liquid and junk firms less liquid, one might assume that controlling for junk would lead to a portfolio that is cheaper to trade, especially in a long only framework where an investor would be long small, quality stocks instead of long both small quality and junk stocks. Frazzini, Israel and Moskowitz (2013) examine the trading costs of the traditional, SMB factor returns and find based on actual trade data that the returns survive net of trading costs. Together these points should give researchers comfort that improvements we show in the returns to size controlled for junk should survive net of costs.
However, a counter-argument notes that a strategy that sorts on two features, here size and quality, will almost certainly have higher turnover than one sorting on just size. Thus, this is an area of ongoing research. The goal of this paper is to focus on factor model improvements that can be useful for understanding the cross section of returns and to explain the long-term disappointing returns to the size effect. Questions regarding implementation are a separate, and important, topic to be addressed by future work.

International evidence
Finally, to address the last challenge that size is not very robust internationally, we examine the size effect in 24 other countries. This analysis serves an overlapping purpose, which is to perform out of sample tests on the role of QMJ in reviving the size effect.
We form SMB portfolios within each international equity market following the same procedure as above. Similarly, we form QMJ in each of these markets following the same procedure as Asness, Frazzini, and Pedersen (2014). Figure 7 reports the change in SMB alpha for each country from regressing SMB on the local stock market index, its lag, and HML and UMD factors constructed within that market, versus the same set of regressors plus QMJ for that market. As the top graph of Figure 7 shows, there is a positive increase in SMB alpha for 23 out of 24 countries once we control for QMJ (the exception being Ireland where the point estimate is very close to zero and statistically no different from zero). The bottom graph of Figure 7 shows that the betas of SMB on QMJ are, again ex-Ireland, uniformly negative. These results are remarkably consistent across countries, providing evidence of both a robust size premium internationally once we control for QMJ and a wealth of out of sample evidence for our earlier findings.
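One way to see why a negative QMJ beta goes hand in hand with a higher SMB alpha is the standard omitted-variable algebra for OLS. The notation below is ours, introduced only for illustration: $F_t$ collects the other regressors (the local market, its lag, HML, and UMD), $q$ is SMB's loading on QMJ, and $\alpha_Q$ is QMJ's own alpha with respect to $F_t$.

```latex
\begin{aligned}
\text{SMB}_t &= \alpha + \beta^\top F_t + q\,\text{QMJ}_t + \varepsilon_t, \\
\text{QMJ}_t &= \alpha_Q + \gamma^\top F_t + u_t, \\
\Rightarrow\quad \text{SMB}_t &= \underbrace{(\alpha + q\,\alpha_Q)}_{\alpha_0}
  + (\beta + q\,\gamma)^\top F_t + (\varepsilon_t + q\,u_t).
\end{aligned}
```

Here $\alpha_0$ is the alpha from the regression that omits QMJ, so the change in alpha from adding QMJ is $\alpha - \alpha_0 = -q\,\alpha_Q$. This is positive whenever $q < 0$ and QMJ earns a positive alpha relative to the other factors.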
Finally, and perhaps a bit of overkill, Figure 8 plots the same set of statistics by country for non-price based measures of size using book assets, sales, book equity, PP&E, and employees. For the vast majority of countries, there is a significant size premium even for non-price based measures of size once we control for QMJ. These results provide even more evidence of a robust size effect internationally as well as a large number of out of sample tests for non-price based size measures that alleviate any data mining concerns.
Taken together, the role of quality/junk in resurrecting and stabilizing the size effect across time, seasons, non-price measures, and nearly two dozen international markets yields an overwhelming set of independent results that show a consistent and substantial size effect on the cross-section of returns once we control for quality/junk.

III. Cross-Sectional Interactions with Value and Momentum
Given the interaction between size and quality/junk, we also reexamine the interactions between size and other cross-sectional characteristics in the literature once we control for junk. We start with the interaction between size and quality/junk itself and then examine the interactions between size and value and momentum.

A. Size and junk
As another way to control for quality/junk in looking at the size effect, we form 25 portfolios based on independent size and quality/junk sorts. Specifically, we sort stocks independently into quintiles on size and on quality/junk, and the intersections of the five size groups with the five quality/junk groups define the 25 portfolios. Because these are independent sorts on strongly correlated sorting variables, the number of firms in each of the 25 portfolios will be quite different. The value-weighted average monthly returns in excess of the monthly T-bill rate and their t-statistics are then computed over the sample period from July 1957 to December 2012.
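The independent double sort can be sketched for a single cross-section as below. This is an illustration under assumptions that are ours rather than the paper's: breakpoints are plain rank quintiles of the full cross-section, market capitalization serves as both the size variable and the portfolio weight, and the ten stocks and the names `quintiles` and `double_sort` are hypothetical.

```python
def quintiles(values):
    """Map each value to a quintile 0..4 by rank (0 = lowest fifth)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 5 // len(values)
    return q

def double_sort(size, quality, ret):
    """Value-weighted return of each (size quintile, quality quintile) cell."""
    sq, qq = quintiles(size), quintiles(quality)   # sorted independently of each other
    cells = {}
    for s, q, cap, r in zip(sq, qq, size, ret):
        w, wr = cells.get((s, q), (0.0, 0.0))
        cells[(s, q)] = (w + cap, wr + cap * r)    # accumulate weight and weight * return
    return {k: wr / w for k, (w, wr) in cells.items()}, sq, qq

# Hypothetical cross-section: market cap, quality score, monthly excess return (%).
size = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quality = [-2.0, -1.0, 0.5, -1.5, 1.0, 0.0, 2.0, 1.5, 2.5, 3.0]
ret = [1.0, 2.0, 1.5, -0.5, 1.2, 0.3, 0.8, 1.6, 0.5, 1.0]

vw, size_q, qual_q = double_sort(size, quality, ret)
for cell in sorted(vw):
    print(cell, round(vw[cell], 3))
```

Because size and quality are positively related in this toy data, only some of the 25 cells are populated and the cell counts differ, which is the unevenness the text warns about.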
To get a sense of the intersection between size and quality/junk, Figure 9 examines the size distribution of stocks within the lowest and highest 20% of quality/junk stocks. The first graph in Figure 9 plots, only among the 20 percent of stocks with the lowest quality or highest junk ranking ("junk"), the fraction of the number of stocks over time within each of the five independent size quintiles. The second graph does the same for the universe of high quality (non-junk) stocks. As the top graph shows, junk stocks consist mostly of small stocks. As the bottom graph shows, among quality stocks, the average size is larger, but still there are plenty of small stocks represented among the quality group (and the distribution among the various sizes is considerably more even among high quality stocks than among junk stocks). While junk is more correlated with small and quality is more associated with big stocks, there are plenty of large, junky stocks and plenty of small, quality stocks, so we can examine the interactions between size and quality. We also note that our comments largely refer to the bulk of the sample occurring after the initial period (after about 1960-1965). The very early part of the sample shows a somewhat more even distribution of size amongst both high quality and junk stocks.
Figure 10 shows results from the reverse exercise of looking at the distribution of quality/junk among the smallest and largest stocks, separately. The top graph plots the distribution of junk and high quality stocks among the smallest quintile of stocks, which shows fairly evenly distributed quality/junk characteristics among the smallest stocks, though with a slight tilt toward more junk and less quality. The bottom graph reports the quality/junk distribution among the largest quintile of stocks and shows that there is a stronger tilt toward high quality and away from junk stocks.
Table 6 reports summary statistics of the 25 size-junk portfolios created above. The average monthly returns in excess of the Treasury bill rate are reported for each portfolio (along with their t-stats below). Moving across the columns of Table 6 there is a significant size effect, as the smallest stocks outperform on average the largest stocks and the performance is monotonic across the size quintiles. This represents another way to control for quality/junk in looking at the size effect. The only exception to this pattern is among the junkiest, lowest quality stocks, where there is not a monotonic relation between size and average returns, and although the difference between the smallest and largest quintiles is 39 bps per month among the junk stocks, the t-stat on that difference is insignificant. The rest of the quality/junk quintiles exhibit a very strong size effect and a clearly monotonic relation between size and average returns. The equal-weighted average of the five small-minus-big spreads, one within each quality quintile, yields a return spread of 50 bps per month and a t-statistic of 3.18.
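The averaged small-minus-big spread and its t-statistic can be computed along the following lines. This is a sketch with hypothetical numbers: the four months of returns are invented, the t-statistic is the simple mean divided by its standard error (the paper may use a different standard-error treatment), and the names `mean_tstat` and `avg_small_minus_big` are ours.

```python
import math

def mean_tstat(x):
    """Sample mean and t-statistic of the mean (mean / standard error)."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / (n - 1)   # sample variance
    return m, m / math.sqrt(var / n)

def avg_small_minus_big(small, big):
    """Per-month equal-weighted average, across quality groups, of small minus big."""
    n_groups, n_months = len(small), len(small[0])
    return [sum(small[q][t] - big[q][t] for q in range(n_groups)) / n_groups
            for t in range(n_months)]

# Hypothetical monthly excess returns (%): rows = quality quintiles, columns = months.
small = [[1.0, 3.0, 0.0, 2.0],
         [1.5, 3.5, 0.5, 2.5],
         [2.0, 4.0, 1.0, 3.0],
         [2.5, 4.5, 1.5, 3.5],
         [3.0, 5.0, 2.0, 4.0]]
big = [[0.5, 0.5, 0.5, 0.5],
       [1.0, 1.0, 1.0, 1.0],
       [1.5, 1.5, 1.5, 1.5],
       [2.0, 2.0, 2.0, 2.0],
       [2.5, 2.5, 2.5, 2.5]]

spread = avg_small_minus_big(small, big)
mean_spread, t_stat = mean_tstat(spread)
print(spread, mean_spread, round(t_stat, 2))
```

Averaging the spreads across quality groups month by month, before taking the time-series mean, is what makes the resulting t-statistic apply to a single combined strategy.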
The reverse is true as well: controlling for size, there is a clear quality minus junk premium. In every size quintile, quality outperforms junk and the relation is fairly monotonic. Hence, quality/junk and the size effect are not the same thing, though they are (negatively) related.
The results in Table 6 provide further insight into our earlier findings. Controlling for QMJ resurrects size in many places where it was previously and seemingly absent. As Table 6 shows, the key set of stocks that need to be controlled for are the junk stocks, where the relation between returns and size breaks down. It is these junk stocks that have on average poor and very volatile returns that vary substantially over time, have very high returns in January, and are illiquid, and hence explain many of the challenges that have been thrown at and confounded with the size effect.
Finally, at the bottom of Table 6 we report results from time-series regression tests of the 25 size-quality/junk portfolios on three different factor models: 1) the Fama and French (1993) [...] Fama and French (2014). It is also worth noting that all of the models contain the SMB size factor, yet none of the models could explain the variation in average returns across the size-quality spectrum without adding a quality factor as well. This indicates that quality is not subsumed by size and is necessary in explaining these portfolio returns.

B. Size premium among value and momentum stocks
Table 7 examines the size premium among value and momentum stocks, separately. Fama and French (1996, 2012) find that there is a negative return to size (small underperforms) among growth stocks, where the poor returns of small growth stocks often lead to rejection of their three factor model and other models as well. Among value stocks, however, there appears to be a large positive size premium, which the Fama and French (1993) model also has difficulty capturing. We reexamine these results adding more recent data and in light of the interaction between size and quality/junk. We also examine the interaction between the size premium and winners and losers as defined by momentum.
We first construct an SMB portfolio among growth/expensive stocks only, by using the small-growth minus large-growth portfolios from the Fama and French (1993) portfolios used to construct SMB and HML more generally (recall that regular SMB is an average of these and the same amongst value stocks). We refer to this portfolio as "SMBExp" to denote it is small minus big among the growth or expensive stocks. We also construct an SMB portfolio among value or cheap stocks similarly, which we denote "SMBChp." Panel A of Table 7 regresses SMBChp, SMBExp, and the difference between SMBChp and SMBExp on the Fama and French (1993) factors RMRF, RMRF lagged, SMB, HML, and the momentum factor UMD. We include SMB as a factor here so that we control for exposure to size in case SMBChp is just a more extreme loading on the size factor than SMBExp (since BE/ME is inversely related to size). By controlling for SMB exposure directly, we therefore focus just on the interaction between size and value. Confirming the evidence in the literature, there is a significant positive alpha for SMBChp and a significant negative alpha for SMBExp (the first row of the SMBChp section and the first row of the SMBExp section, respectively). Taking the difference (the first row of the SMBChp-SMBExp section), there is an alpha of 40 bps between small-big cheap and small-big expensive, indicating that SMB among cheap stocks significantly outperforms SMB among expensive stocks, and the performance difference is not captured by the Fama and French factors that include SMB.
The second rows of Panel A of Table 7 repeat the regression with the addition of QMJ. Adding QMJ eliminates the positive SMBChp alpha, the negative SMBExp alpha, and removes any significant difference between them. Hence, the difference in the small firm premium among value and growth stocks, which is not explained by the Fama and French factors, is fully captured by QMJ.
SMB provides the same premium across the value-growth spectrum when controlling for junk, providing another piece of evidence that quality/junk cleans up and helps identify a robust and stable size effect, this time explaining differences across cheap and expensive stocks.
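To illustrate how an alpha can vanish once a correlated factor is added, the sketch below regresses a synthetic spread portfolio on a market factor alone and then on the market plus a QMJ-like factor. Everything here is hypothetical: the series are built by hand with a true alpha of zero, the loadings (0.1 and 0.8) are arbitrary, and `ols` is a small helper of ours rather than anything from the paper.

```python
def ols(y, X):
    """Least-squares coefficients for y = X b, solved via the normal equations."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

cycles = 60                                   # 60 four-month cycles = 240 months
rmrf = [4.0, -3.0, 1.0, -2.0] * cycles        # synthetic market excess return (%), mean 0
qmj = [1.5, 1.5, -0.5, -0.5] * cycles         # synthetic QMJ-like return (%), mean 0.5
# Hypothetical spread (think SMBChp - SMBExp): zero true alpha, loads on both factors.
spread = [0.1 * f + 0.8 * q for f, q in zip(rmrf, qmj)]

alpha_without = ols(spread, [[1.0, f] for f in rmrf])[0]
alpha_with = ols(spread, [[1.0, f, q] for f, q in zip(rmrf, qmj)])[0]
print(f"alpha without QMJ: {alpha_without:.4f}")
print(f"alpha with QMJ:    {alpha_with:.4f}")
```

In this construction the alpha measured without the QMJ-like factor is exactly the spread's loading on that factor times the factor's own alpha against the market, so it disappears once the factor is included.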

The last row of Panel A reports regression results of the difference between SMBChp and SMBExp on the new Fama and French (2014) [...]
Panel B of Table 7 examines the size premium among momentum stocks by looking at SMBUp (small minus big among winners) and SMBDown (small minus big among losers), and their difference (SMBUp - SMBDown) in an analogous fashion. SMB among losers appears to underperform SMB among winners by about 57 bps per month (t-stat = 5.22). Controlling for QMJ mitigates this alpha but does not explain it: at 49 bps per month with a t-stat of 4.25, there is a lot left unexplained. Hence, quality/junk can only explain a small part of the variation in size premia among momentum stocks. This is perhaps not that surprising since momentum is a much higher frequency strategy than value and quality/junk measures tend to move around at frequencies closer to value and growth measures. Looking at the last row, which contains the Fama and French [...]
QMJ helps capture all of the return differences between small cheap and small expensive stocks that other models fail to capture. It also partly explains the return differences between small winners and small losers, and does so better than other models such as Fama and French (2014). Clearly, Fama and French's (2014) five factors do not span the information in QMJ and cannot explain as well the variation in size premia across value and momentum stocks.

C. Value and momentum premia among small and big stocks
Table 8 flips the analysis around and examines value and momentum premia among small and large stocks, respectively. Evidence from Fama and French (1996, 2012) and Israel and Moskowitz (2013) shows a much stronger value premium among small stocks than large stocks. We investigate whether QMJ can explain some of the variation in value premia. Panel A of Table 8 examines HML among small stocks minus HML among large stocks (HMLSmall - HMLBig) by regressing their return differences on the market, its lagged value, SMB, HML, and UMD factors without, and then with, QMJ.
The first row of Panel A reports the QMJ period results, confirming the evidence from the literature that HML premia are indeed stronger among small stocks (40 bps difference with a t-stat of 4.24). However, adding QMJ to the regression eliminates the significant difference between value premia among small and large cap stocks, leaving an alpha of only 13 bps with an insignificant t-stat of 1.44. Hence, exposure to QMJ also explains why small stock value premia are larger than big stock value premia.
The remaining rows of Panel A of Table 8 repeat the regressions over the golden age, embarrassment, and resurrection periods defined earlier, where in every case small cap value outperforms large cap value relative to the Fama and French factors, but controlling for QMJ completely explains this difference. Hence, the relation between value premia and size appears to be fully captured by quality/junk and is robust over time. In other words, it seems that value is performing better among small stocks not because they are small, but because they are more exposed to junk.
Panel B of [...] (1980 to 1999). Over this period, QMJ reduces the alpha for small cap momentum over large cap momentum slightly, but there is still a lot of alpha unexplained.
Overall, controlling for quality/junk helps explain fully the difference in size premia among value and growth stocks and the difference in value premia among small and large cap stocks.
However, variation in the size premium for momentum winners and losers is only partially explained, and variation in the momentum premium across small and large cap stocks is barely explained by QMJ. Hence, quality versus junk explains a lot of the interactions between size and other return premia, but does not capture everything.

IV. Conclusion
Size matters, and in a much bigger way than previously thought, but only when controlling for junk. We examine seven empirical challenges that have been hurled at the size effect (that it is weak overall, has not worked out of sample and varies significantly through time, only works for extremes, only works in January, only works for market-price based measures of size, is subsumed by illiquidity, and is weak internationally) and systematically dismantle each one by controlling for a firm's quality. The previous evidence on the variability of the size effect is largely due to the volatile performance of small, low quality "junky" firms. Controlling for junk, a much stronger and more stable size premium emerges that is robust across time, including those periods where the size effect seems to fail; monotonic in size and not concentrated in the extremes; robust across months of the year; robust across non-market price based measures of size; not subsumed by illiquidity premia; and robust internationally. These results are robust across a variety of quality measures as well.
We further find that interactions between size and other firm characteristics, such as value and momentum, can also be fully or partially explained by quality versus junk. Hence, the quality of a firm helps clean up the relation between size and the cross-section of expected returns.
This then raises the question: why do average returns vary by the quality of a firm? Both risk-based and behavioral asset pricing theories should continue to try to explain this result. In particular, controlling for quality/junk significantly increases the Sharpe ratio of a size-based strategy, which poses a greater challenge for rational models (e.g., Hansen and Jagannathan (1997)). Note we find that after controlling for the riskiest small, junk firms, the size premium gets stronger rather than [...]

The table reports summary statistics on the size premium over time. Two zero-cost portfolios are used to capture the returns to size: the "small minus big" (SMB) stock factor of Fama and French (1993), obtained from Ken French's website, and the return spread between size-sorted value-weighted decile portfolios P1-P10. The annualized mean and standard deviation (stdev) of the returns are reported on these spread portfolios, as well as the t-statistic of the mean, over the full sample period (from July 1926 to December 2012), for January and February-December separately over the same sample period, for the same sample period as Banz's (1981) study (January 1936 to December 1975), over the periods before and after Banz's (1981) study, and over four periods we use throughout the paper: the period over which the QMJ factor is available (July 1957 to December 2012), for January and February-December separately over the same QMJ sample period, as well as three sub periods pertaining to the time when the size effect is strongest (July 1957 to December 1979, "golden age"), weakest (January 1980 to December 1999, "embarrassment"), and the most recent resurgence in the size premium (January 2000 to December 2012, "resurrection"). Finally, we report summary statistics on three other sample periods pertaining to when the "betting against beta" strategy (BAB) of Frazzini and Pedersen (2013) [...]
Column headers: Pre- & Post-Banz (1981), 1926:07-1935 and 1976:01-2012.

[...] Asness, Frazzini, and Pedersen (2014). Results are reported over four sample periods: the full QMJ period (July 1957 to December 2012) and the sub-periods for the golden age (July 1957 to December 1979), embarrassment (January 1980 to December 1999), and resurrection (January 2000 to December 2012) periods. Panel B repeats the same regression but replaces QMJ with each of its four subcomponents: Profitability, Growth, Safety, and Payout, as described in Asness, Frazzini, and Pedersen (2014). Panel C reports regressions that include other measures of quality/junk and out of sample evidence. Specifically, we use Frazzini and Pedersen's (2013) betting against beta (BAB) strategy, which is long low beta stocks and short high beta stocks, as a proxy for safety, and the spread in equity returns between firms with A-rated debt and higher minus firms with C-rated debt and lower (Cred). Panel D reports regression results from the Fama and French (2014) five factor model (adding a lagged market factor) that includes the factors RMW and CMA, representing profitability and investment, respectively. We also create a "quality" index (QIndex), which is a simple equal-weighted average of all available other quality measures including the Fama and French (2014) profitability and investment factors, BAB, and Cred, when available. Date ranges for the various regressions vary, depending on the factors used, and are reported below in the table.

Table 3: Size Premium for Portfolios Sorted on Non-Price Based Measures of Size
The table reports regression results for the P1-P10 value-weighted spread portfolios sorted using non-price based measures of size that include: book assets, sales, book equity, property, plant, and equipment (PP&E), and number of employees. The P1-P10 spread portfolio for each of the non-price based size measures is constructed in the same manner used for market capitalization-sorted portfolios. We form decile portfolios by sorting stocks each July, based on their June measure of size using each of the non-price based size measures, and then compute returns to each decile portfolio, where securities are weighted by their market values, over the following year to the next June. Panel A reports the results from regressions of the non-price based size premia on the market, the lagged market, HML, and UMD, and Panel B reports results for the same regressions that add QMJ as a regressor. For brevity, we only report the estimated alphas and t-statistics of the alphas. Results are reported over the four sample periods: QMJ (1957:07-2012:12), golden age (1957:07-1979:12), embarrassment (1980:01-1999:12), and resurrection (2000:01-2012:12).
[...] (2014), where the alphas are estimated for the months of January and non-January separately using dummy variables for those months. Also reported is the difference between January and other months, along with a t-statistic on that difference in the last column. Results are reported over four sample periods: the full QMJ period (July 1957 to December 2012) and the sub periods for the golden age (July 1957 to December 1979), embarrassment (January 1980 to December 1999), and resurrection (January 2000 to December 2012) periods for the size premium.

The table reports regression results for the size premium, SMB, on the factors RMRF, its lagged value, HML, UMD, and various proxies for liquidity and liquidity risk. Specifically, we use the decile spread in portfolios sorted on turnover following Ibbotson et al. (2013) as a proxy for liquidity (LIQ), the short-term reversal factor (STREV) from Ken French's website following Nagel (2012) as another proxy for liquidity, and the factor-mimicking portfolio of liquidity risk (LIQRISK) provided by Pastor and Stambaugh (2003) and available from Robert Stambaugh's webpage. We also add the QMJ factor from Asness, Frazzini, and Pedersen (2014). Results are reported for the full sample for which we have data (January 1968 to December 2012), for the months of January only, for months February-December only, and for the golden age, embarrassment, and resurrection sub periods separately (dates provided in the table).

[...] (2014). The 25 portfolios are formed from independent sorts of stocks into five quintiles using size and quality/junk. The average returns in excess of the monthly T-bill rate and their t-statistics are reported over the sample period from July 1957 to December 2012.
Also reported are summary statistics from time-series regressions of the 25 portfolios on each of the following factor models: (i) the Fama and French (1993) [...] Fama and French (1993) factors plus UMD and QMJ are reported over the full sample period from July 1957 to December 2012. Regressions are repeated using the Fama and French (2014) five-factor model, including a lagged return on the market and the momentum factor, UMD, over the period July 1963 to December 2012.

Figure 3: Size Decile Alphas
Plotted are the alphas of each size decile with respect to three factor models: 1) the market model (RMRF), 2) the Fama and French factors RMRF, RMRF lagged a month, HML, and UMD and 3) the Fama and French factors augmented with the QMJ factor. The first graph covers the QMJ sample period from July 1957 to December 2012 and the second graph covers the "golden age" period for size from July 1957 to December 1979.

QMJ Betas of SMB Portfolios by Industry, Non-Price Based Size Measures (panels: Book assets, Sales, Book equity, PP&E, Employees)

Figure 6: Seasonal Patterns in the Size Premium Over Time, with and without QMJ
The first set of four figures plots the alphas outside of January from February to December of various size portfolios with respect to the factors RMRF (and its lagged value), HML, and UMD both with and without the QMJ factor from Asness, Frazzini, and Pedersen (2014). The second set of four figures plots the alphas in January only. The size portfolios include SMB, the spread in P1-P10, P2-P9, and P3-P8 decile portfolios based on market cap sorts, as well as P1-P10 spreads in decile portfolios based on sorts of book assets, sales, PP&E, and number of employees. Results are reported over four sample periods: the full QMJ period (July 1957 to December 2012) and the sub periods for the golden age (July 1957 to December 1979), embarrassment (January 1980 to December 1999), and resurrection (January 2000 to December 2012) periods.

Figure 7: International Evidence of SMB Premia Controlling for QMJ
The first figure plots the improvement in SMB alphas (relative to the Fama and French factors RMRF, RMRF lagged a month, HML, and UMD) after controlling for QMJ across 24 countries, as well as four regions: global, global excluding U.S., Europe, and North America. Plotted is the difference in SMB alphas between the Fama and French factors versus the Fama and French factors augmented with the QMJ factor, by country and region. The second figure plots the betas of each SMB portfolio on QMJ. The regressions are estimated using rolling five years of data for each country.

Figure 8: International Evidence of Non-Price Size Premia, Controlling for QMJ
The first figure plots the difference in alphas between SMB regressed on the Fama and French factors and SMB regressed on the Fama and French factors plus QMJ for each country and region, where SMB portfolios are formed using non-price based measures of size (book assets, book equity, PP&E, sales, and number of employees). The second figure plots the betas of each SMB portfolio on QMJ by country and region. The regressions are estimated using rolling five years of data for each country or region.

Change in SMB Alpha After Controlling for QMJ, Non-Price Size Measures (panels: Book Assets, Book Equity, PP&E, Sales, Employees)
Electronic copy available at: https://ssrn.com/abstract=2553889

Figure 9: Distribution of Size Among Junk and Quality Stocks
The first figure plots the fraction of the number of stocks over time across five size categories that make up the 20 percent of stocks with the lowest quality/highest junk ranking ("junk"). The second figure plots the fraction of the number of stocks over time across five size categories that make up the 20 percent of stocks with the highest quality/lowest junk ranking ("quality").