Empirica Technologies

Categorical Spectralism: Spectral Decomposition of Large-Cap Equity Return Spaces

Question

Does the eigenvalue spectrum of large-cap equity returns exhibit a spectral gap—a statistically significant separation between random-matrix noise and true common factors—that persists across different market regimes, and does the number of significant factors change over time when the portfolio universe is constrained to highly liquid, marginable equities?

Method

We computed the eigenvalue spectrum of the return correlation matrix for a universe of 12 large-cap U.S. equities (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, KO, MSFT, NVDA, PFE, XOM) using daily adjusted-close returns from yfinance over the window 2010-01-01 through 2024-12-31, yielding 3772 observations. The universe is restricted to names with deep liquidity and standard margin eligibility, representing realistic portfolio boundaries for institutional equity strategies.
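A minimal sketch of the return construction follows. It assumes the adjusted closes have already been loaded into a pandas DataFrame with one column per ticker. The source does not say whether simple or log returns were used, or how missing dates were handled. Simple returns and dropping incomplete dates are assumptions here, and `simple_returns` is a hypothetical helper name.

```python
import pandas as pd


def simple_returns(adj_close: pd.DataFrame) -> pd.DataFrame:
    """Daily simple returns from adjusted closes (rows = dates, columns = tickers).

    No forward filling is applied: any date on which some ticker's return is
    missing is dropped, so all assets share one common set of dates, as needed
    for a single correlation matrix.
    """
    prices = adj_close.sort_index()
    returns = prices / prices.shift(1) - 1.0
    return returns.dropna(how="any")


# Usage with synthetic prices standing in for the downloaded data:
dates = pd.bdate_range("2024-01-01", periods=5)
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.01, 101.0, 103.0],
     "BBB": [50.0, 50.5, 50.0, 51.0, 51.0]},
    index=dates,
)
print(simple_returns(prices).shape)  # (4, 2): the first date has no prior close
```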

The inference method is principal component analysis (PCA) of the return correlation matrix, with statistical significance assessed against the Marchenko-Pastur (MP) null distribution. Under the MP null, the correlation matrix of purely random returns (no common structure) has, in the limit of many assets and observations, eigenvalues confined between a lower edge λ− = (1 − √q)² and an upper edge λ+ = (1 + √q)², where q = n_assets / n_obs. Eigenvalues exceeding the upper edge are statistically distinguishable from random-matrix noise and indicate genuine common factors. For this dataset q ≈ 0.003 (12 assets, 3772 observations), giving an MP upper bound of 1.116 and a lower bound of 0.8904.
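Both bounds can be reproduced from q alone. The sketch below assumes the standard large-sample MP edges for a correlation matrix of unit-variance noise with q ≤ 1. `mp_bounds` is a hypothetical helper name.

```python
import math


def mp_bounds(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Marchenko-Pastur edges for the correlation matrix of i.i.d. noise.

    With q = n_assets / n_obs (0 < q <= 1), the limiting eigenvalue density is
    supported on [(1 - sqrt(q))**2, (1 + sqrt(q))**2].
    """
    if not 0 < n_assets <= n_obs:
        raise ValueError("requires 0 < n_assets <= n_obs")
    root_q = math.sqrt(n_assets / n_obs)
    return (1.0 - root_q) ** 2, (1.0 + root_q) ** 2


lower, upper = mp_bounds(12, 3772)
print(f"q = {12 / 3772:.4f}, lower = {lower:.4f}, upper = {upper:.3f}")
# q = 0.0032, lower = 0.8904, upper = 1.116
```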

To assess time variation, we recomputed the eigenvalue spectrum and factor count separately within each calendar year (2010 through 2024) on the same universe and method, treating each year as an independent in-sample window. These windows are non-overlapping rather than rolling, but for brevity this year-by-year analysis is referred to below as the rolling-window analysis. It shows how the number of significant factors evolves across different liquidity and volatility regimes.
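A sketch of the year-by-year count follows. It assumes returns sit in a pandas DataFrame indexed by trading date. The source does not state which MP bound the yearly counts used. Here, as an assumption, each year's bound is recomputed from that year's own q: about 12/252 ≈ 0.048, which gives an upper edge near 1.48 rather than the full-sample 1.116. `count_factors` and `factor_count_by_year` are hypothetical names.

```python
import numpy as np
import pandas as pd


def count_factors(returns: np.ndarray) -> int:
    """Number of correlation eigenvalues above the MP upper edge.

    Assumption: the edge is computed from this window's own q = N / T,
    not from the full-sample q.
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    return int(np.sum(eigvals > upper))


def factor_count_by_year(returns: pd.DataFrame) -> pd.Series:
    """Apply count_factors separately to each calendar year of a dated frame."""
    counts = {
        year: count_factors(block.to_numpy())
        for year, block in returns.groupby(returns.index.year)
    }
    return pd.Series(counts, name="n_factors")
```

With a one-factor synthetic panel (a common market shock plus independent noise), each year should report exactly one factor, which is a quick sanity check before running the function on real data.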

Result

The full-sample eigenvalue spectrum exhibits a clear spectral gap. The top 10 eigenvalues are 5.4969, 1.6282, 1.0457, 0.7347, 0.5802, 0.5578, 0.4874, 0.4311, 0.3892, and 0.3411. Comparing these to the MP upper bound of 1.116, we find that 2 eigenvalues exceed the threshold: the first (5.4969) and the second (1.6282). The third eigenvalue (1.0457) falls below the bound and is statistically indistinguishable from random noise. The number of significant factors is therefore 2.

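The first factor explains 45.81% of total variance; the two significant factors together explain 59.38%. None of the remaining eigenvalues exceeds the MP upper bound, so the test identifies no common structure beyond these two factors. Strictly, the remaining eigenvalues are not distributed as the MP null predicts. All of them except the third (1.0457) lie below the MP lower bound of 0.8904, which is expected when a dominant market eigenvalue absorbs a large share of the fixed total variance.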
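The count, the ratios to the MP upper edge, and the variance shares can be reproduced from the reported spectrum alone, since the trace of a 12 × 12 correlation matrix is 12. The short sketch below uses only the numbers quoted above:

```python
import numpy as np

# Top-10 eigenvalues as reported above; the two smallest of the 12 are omitted,
# which does not affect the count or the shares of the leading factors.
eigvals = np.array([5.4969, 1.6282, 1.0457, 0.7347, 0.5802,
                    0.5578, 0.4874, 0.4311, 0.3892, 0.3411])
n_assets, n_obs = 12, 3772
upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2   # MP upper edge, about 1.116

n_significant = int(np.sum(eigvals > upper))
# trace(correlation matrix) = n_assets, so an eigenvalue's variance share
# is eigenvalue / n_assets.
shares = eigvals / n_assets
ratios = eigvals[:3] / upper                     # multiples of the MP edge

print(n_significant)                                              # 2
print(f"{100 * shares[0]:.1f}% {100 * shares[:2].sum():.1f}%")    # 45.8% 59.4%
print(np.round(ratios, 2))
```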

The largest loadings on the first factor are JPM (−0.331), MSFT (−0.321), and BAC (−0.318), and all major names load negatively with similar magnitude (perfectly uniform loadings across 12 names would each be 1/√12 ≈ 0.289 in magnitude). This pattern is consistent with a broad market factor in which all assets move together. The overall sign of an eigenvector is arbitrary, so the negative signs carry no economic meaning. The second factor has a different structure: AMZN (−0.413) and NVDA (−0.357) load negatively, while XOM loads positively (0.359). This factor separates growth/technology exposure (negative loadings) from energy/value exposure (positive loading), consistent with a growth-versus-value or sector-rotation factor.
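Because an eigenvector is defined only up to sign, comparing loadings across runs or windows requires a sign convention. The minimal sketch below assumes a returns array with one column per asset. Flipping each factor so that its largest-magnitude loading is positive is one common convention, chosen here for illustration. Under it, the first factor above would be reported with JPM at +0.331. `leading_loadings` is a hypothetical name.

```python
import numpy as np


def leading_loadings(returns: np.ndarray, k: int = 2) -> np.ndarray:
    """Top-k eigenvectors of the correlation matrix, one column per factor.

    Each column is flipped so that its largest-magnitude entry is positive,
    which removes the arbitrary eigenvector sign.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]            # largest first
    vecs = eigvecs[:, order]
    # Entry of largest magnitude in each column, one per factor.
    pivot = vecs[np.argmax(np.abs(vecs), axis=0), np.arange(k)]
    return vecs * np.sign(pivot)
```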

The rolling-window analysis reveals substantial time variation in the number of significant factors. From 2010 through 2015, the factor count was consistently 1 in each calendar year. In 2016, the count increased to 2 and remained at 2 through 2018. In 2019, the count dropped back to 1. From 2020 through 2023, the count was 2 in each year. In 2024, the count increased to 3, the highest observed in the sample.

This time variation admits an economic interpretation, although the attributions below are post hoc and have not been tested formally. The single-factor regime in the early 2010s corresponds to the post-crisis period of coordinated monetary policy and low dispersion, when a single market factor dominated. The emergence of a second factor in 2016–2018 coincides with the divergence of growth and value performance and rising sector dispersion. The return to a single factor in 2019 aligns with a late-cycle compression of cross-sectional variance. The two-factor regime in 2020–2023 is consistent with the COVID-era bifurcation between pandemic winners (technology, e-commerce) and losers (energy, financials), followed by the inflation and rate-hike regime that sustained sector rotation. The three-factor regime in 2024 suggests further fragmentation, potentially reflecting AI-related dispersion within technology (NVDA versus broader tech) or renewed energy and commodity dynamics.

Interpretation

The results provide strong evidence for a persistent spectral gap in large-cap equity returns, even under realistic portfolio constraints. The full-sample spectrum separates two significant factors from random noise, with the third eigenvalue falling 6.3% below the MP upper bound. This gap is not marginal: the first eigenvalue is 4.9 times the MP threshold and the second is 1.46 times the threshold, indicating common structure well above the noise floor.

The two-factor structure is economically interpretable. The first factor is a broad market factor, capturing the common movement of all assets. The second factor is a growth-versus-value or sector-rotation factor, separating technology/growth names (AMZN, NVDA) from energy/value names (XOM). The absence of a third significant factor in the full sample indicates that, over the 15-year window, no additional persistent common structure (e.g., a separate financial-sector factor or a momentum factor) rises above the noise threshold. The variance not explained by the two factors (40.62%) is consistent with idiosyncratic risk and transient correlations that do not persist long enough to form a stable eigenvalue above the MP bound.

The rolling-window results demonstrate that the spectral gap is not static. The number of significant factors ranges from 1 to 3 across calendar years, with clear regime shifts. The single-factor regime in the early 2010s indicates that, during periods of low dispersion and coordinated policy, a single market factor dominates and secondary factors (growth/value, sector rotation) compress into the noise. The two-factor regime in 2016–2018 and 2020–2023 indicates that, during periods of higher dispersion and divergent sector performance, a second factor emerges above the noise threshold. The three-factor regime in 2024 indicates that, in the most recent period, a third source of common variation (potentially AI-driven technology dispersion or renewed commodity dynamics) has become statistically significant.

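Critically, these results are in-sample within each rolling window: the factor count in a given year is computed on that year's data alone, not on out-of-sample data. Because nothing estimated in one window is used in another, the year-by-year counts are free of look-ahead bias. They are not free of sampling noise, however. With roughly 250 observations per year, an eigenvalue lying close to the threshold can cross it from one year to the next without any change in the underlying structure, so part of the measured time variation may be estimation error rather than a genuine regime change. Nor do the results establish that the factor structure identified in one period will persist into the next. A factor that is significant in-sample in 2024 may or may not remain significant in 2025.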

The constraint to a highly liquid, marginable universe (12 large-cap names) is a feature, not a limitation, for the research question. The spectral gap persists even in a small universe that is homogeneous in size and liquidity, where one might expect high correlation and limited factor diversity. This suggests that the spectral gap is a robust property of equity return spaces rather than an artifact of including many small, illiquid, or sector-concentrated names. The full-sample q-ratio of 0.003 (12 assets, 3772 observations) is small, and the long sample supports stable full-sample eigenvalue estimates. Because the MP law is an asymptotic result and the universe has only 12 assets, the bound is best read as an approximate threshold.

The results do not support the hypothesis that realistic margin and liquidity constraints eliminate the spectral gap. On the contrary, the gap is clear and persistent even under these constraints. The results also do not support the hypothesis that the number of significant factors is constant across regimes. The factor count varies from 1 to 3, with economically interpretable regime shifts.

Limitations

The sample is limited to 12 large-cap U.S. equities over a 15-year window. The spectral gap and factor count may differ in other universes (e.g., small-cap, international, sector-specific) or other time periods (e.g., pre-2008, non-U.S. markets). The constraint to highly liquid, marginable names ensures realistic portfolio boundaries but also limits cross-sectional diversity. A larger universe (e.g., 50 or 100 names) might reveal additional factors or a different spectral structure.

The rolling-window analysis treats each calendar year as an independent in-sample window. This choice gives each window roughly 250 trading days while keeping the windows non-overlapping, but it does not test out-of-sample stability. A factor that is significant in-sample in one year may not predict returns or remain significant in the next year. The results establish that the factor structure changes over time, but they do not establish that the changes are predictable or that the factors are stable enough for out-of-sample portfolio construction.
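Item (2) of the strengthening steps listed at the end of this section (estimating factors on year t and testing them on year t+1) can be sketched as follows. The sketch assumes two return arrays, one for the training window and one for the test window, with the same asset columns. The share of test-window variance captured by the training-window eigenvectors V is trace(Vᵀ C_test V) / N. By the Ky Fan maximum principle, this share can never exceed the test window's own top-k eigenvalue share, so the difference between the two measures how far the structure drifted. All function names are hypothetical.

```python
import numpy as np


def top_eigvecs(returns: np.ndarray, k: int) -> np.ndarray:
    """Columns are the k leading eigenvectors of the correlation matrix."""
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]


def oos_variance_share(train: np.ndarray, test: np.ndarray, k: int) -> float:
    """Share of test-window variance captured by train-window factors."""
    vecs = top_eigvecs(train, k)
    corr_test = np.corrcoef(test, rowvar=False)
    return float(np.trace(vecs.T @ corr_test @ vecs) / corr_test.shape[0])


def in_sample_variance_share(test: np.ndarray, k: int) -> float:
    """Upper bound: share captured by the test window's own top-k factors."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(test, rowvar=False))
    return float(np.sort(eigvals)[::-1][:k].sum() / eigvals.size)
```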

The MP null assumes that returns are independent and identically distributed (i.i.d.) with finite variance and no common structure; Gaussian returns are the standard special case. Real equity returns exhibit time-varying volatility, fat tails, and autocorrelation. Violations of the i.i.d. assumption (e.g., GARCH effects, regime-switching volatility) could therefore shift the effective threshold away from 1.116. The first two eigenvalues (5.4969 and 1.6282) are far enough above the bound that moderate shifts would not change the conclusion that they are significant. The third eigenvalue (1.0457) sits only about 6% below the bound and is more sensitive: a modest downward shift in the effective threshold could reclassify it as significant.
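A shuffle null offers one way to probe the threshold without the Gaussian assumption. It is swapped in here for illustration and is not the method used above. Each asset's returns are permuted independently in time, which preserves every asset's own fat-tailed distribution but destroys cross-asset correlation. The 95th percentile of the largest shuffled eigenvalue then serves as an empirical threshold. Permutation also destroys volatility clustering and autocorrelation, so this addresses only the fat-tail concern. The function name and its defaults are hypothetical.

```python
import numpy as np


def shuffled_max_eigenvalue_threshold(
    returns: np.ndarray, n_perm: int = 200, quantile: float = 0.95, seed: int = 0
) -> float:
    """Empirical null threshold for the largest correlation eigenvalue.

    Each column (asset) is permuted independently in time, keeping its
    marginal distribution but removing cross-asset correlation.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_assets = returns.shape
    maxima = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.column_stack(
            [rng.permutation(returns[:, j]) for j in range(n_assets)]
        )
        corr = np.corrcoef(shuffled, rowvar=False)
        maxima[i] = np.linalg.eigvalsh(corr)[-1]   # largest eigenvalue
    return float(np.quantile(maxima, quantile))
```

For real returns, `returns` would be the T × 12 array of daily returns. Eigenvalues of the unshuffled correlation matrix above the returned value would then be treated as significant.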

The interpretation of the second factor as growth-versus-value is based on the loadings (technology/e-commerce negative, energy positive) and the timing of its emergence (2016–2018, 2020–2024). This interpretation is plausible but not definitive. The factor could also reflect other sources of variation (e.g., interest-rate sensitivity, commodity exposure, or idiosyncratic NVDA/AMZN dynamics). A more definitive interpretation would require regressing the factor returns on observable characteristics (book-to-market, earnings growth, sector dummies) or comparing the factor to established benchmarks (HML, momentum).
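The time-series version of the benchmark comparison can be sketched as below. It assumes that benchmark return series (for example HML or momentum) aligned to the same dates have been obtained separately; no particular data source is implied, and the function names are hypothetical. Factor returns are formed by projecting standardized asset returns onto the eigenvector loadings and are then regressed on the benchmarks by ordinary least squares.

```python
import numpy as np


def factor_returns(returns: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Project standardized returns (T x N) onto loadings (N x k) -> (T x k)."""
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
    return z @ loadings


def regress_on_benchmarks(y: np.ndarray, benchmarks: np.ndarray):
    """OLS of one factor return series on benchmark returns plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = np.column_stack([np.ones(len(y)), benchmarks])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r_squared)
```

A high R² against a value benchmark would support the growth-versus-value reading. A low one would point toward the alternative explanations listed above.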

The three-factor regime in 2024 is based on a single calendar year of data (approximately 250 trading days). That is enough to compute the eigenvalue spectrum, though with noticeably more sampling noise than the full 3772-day sample. It is not sufficient to establish that the third factor is a persistent feature of the return space rather than a transient correlation driven by a specific event (e.g., the AI boom, a commodity shock, or a policy shift). Replication on 2025 data would strengthen the evidence for a sustained three-factor regime.

The results do not address the economic content or investability of the factors. A statistically significant factor (eigenvalue above the MP bound) is not necessarily an economically significant or tradable factor. The factor may have a low Sharpe ratio, high turnover, or exposure to uncompensated risks. The results establish that the factors exist in the correlation structure, but they do not establish that the factors are useful for portfolio construction or risk management.

Strengthening the evidence would require: (1) expanding the universe to 50–100 names to test whether the spectral gap and factor count scale with cross-sectional diversity; (2) computing out-of-sample factor stability (e.g., estimating factors on year t data and testing whether they explain variance in year t+1); (3) regressing factor returns on observable characteristics to identify the economic content of the second and third factors; (4) replicating the analysis on international or sector-specific universes to test whether the spectral gap is a universal property of equity return spaces or specific to large-cap U.S. equities; and (5) extending the rolling-window analysis to 2025 and beyond to confirm that the three-factor regime in 2024 is persistent rather than transient.

Research evidence, not investment advice.

Categorical Spectralism — spectral decomposition of portfolio return spaces