Empirica Technologies

Categorical Spectralism: Spectral Decomposition of Large-Cap US Equity Return Spaces

Question

Does the eigenvalue spectrum of large-cap US equity returns exhibit clustering and spectral gaps consistent with regime transitions, and how many statistically significant common factors emerge above the Marchenko-Pastur null distribution for random correlation matrices?

Method

We computed the eigenvalue spectrum of the return correlation matrix for large-cap US equities using daily adjusted-close returns from yfinance over the window 2010-01-01 through 2024-12-31 (3772 observations). The ticker list names twelve symbols (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, META, MSFT, NVDA, PFE, XOM), but the correlation matrix analyzed is 11×11, so one listed ticker did not enter the computation. META is the likely exclusion, since it began trading in May 2012 and has no prices for the start of the window, but which ticker was dropped is not stated here.
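The data step can be sketched as follows. This is a minimal illustration, not the code used for the reported numbers: the function names fetch_adjusted_close and daily_returns are hypothetical, the use of simple (rather than log) returns is an assumption, and so is the rule of dropping any ticker without a complete price history. The download helper needs network access and the yfinance package, so the demonstration runs on a tiny synthetic price table instead.

```python
import numpy as np
import pandas as pd

TICKERS = ["AAPL", "AMZN", "BAC", "CVX", "GOOGL", "JNJ",
           "JPM", "META", "MSFT", "NVDA", "PFE", "XOM"]


def fetch_adjusted_close(tickers, start="2010-01-01", end="2024-12-31"):
    """Download daily adjusted-close prices (hypothetical helper, needs network)."""
    import yfinance as yf  # imported lazily so the rest of the sketch runs without it
    data = yf.download(tickers, start=start, end=end,
                       auto_adjust=False, progress=False)
    return data["Adj Close"]


def daily_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Simple daily returns, keeping only tickers with a complete price history.

    Assumption: tickers with any missing price are dropped entirely, so the
    number of observations is not shortened by a late-starting ticker.
    """
    complete = prices.dropna(axis=1, how="any")
    returns = complete / complete.shift(1) - 1.0
    return returns.iloc[1:]  # the first row has no previous price


# Synthetic example: ticker "C" starts one day late and is therefore dropped.
demo_prices = pd.DataFrame(
    {"A": [100.0, 101.0, 102.01],
     "B": [50.0, 49.0, 49.98],
     "C": [np.nan, 10.0, 11.0]},
    index=pd.date_range("2010-01-04", periods=3, freq="B"),
)
demo_returns = daily_returns(demo_prices)
print(demo_returns)
```

Under this dropping rule, a ticker listed after the start of the window would be removed rather than truncating the sample, which would be one way to arrive at 11 assets from a 12-ticker list.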

Principal component analysis was applied to the 11×11 correlation matrix, yielding 11 eigenvalues. Statistical significance was established by comparison to the Marchenko-Pastur (MP) distribution, which characterizes the eigenvalue spectrum of a purely random correlation matrix with the same dimensions. For a matrix with q = n_assets / n_obs = 11 / 3772 ≈ 0.003, the MP distribution predicts eigenvalues in the range [λ_min, λ_max] where λ_min,max = (1 ± √q)². Eigenvalues exceeding the upper MP bound are statistically distinguishable from random-matrix noise and indicate genuine common factors.
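The bound formula above can be checked directly. The sketch below computes the MP edges for n_assets = 11 and n_obs = 3772 and shows one way to obtain a descending correlation spectrum with NumPy; the function names mp_bounds and correlation_eigenvalues are illustrative, not taken from the original computation.

```python
import math
import numpy as np


def mp_bounds(n_assets: int, n_obs: int) -> tuple:
    """Marchenko-Pastur edges (1 -/+ sqrt(q))^2 for a unit-variance null."""
    q = n_assets / n_obs
    root = math.sqrt(q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def correlation_eigenvalues(returns: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample correlation matrix, in descending order."""
    corr = np.corrcoef(returns, rowvar=False)  # columns are assets
    return np.linalg.eigvalsh(corr)[::-1]      # eigvalsh returns ascending order


lower, upper = mp_bounds(11, 3772)
print(round(lower, 4), round(upper, 4))  # 0.8949 1.1109
```

Both edges agree with the bounds reported in the Result section to four decimal places.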

To assess temporal stability, the same computation was repeated on a per-calendar-year basis (in-sample within each year from 2010 through 2024), yielding a time series of significant factor counts.
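A per-year count can be sketched as below. It assumes, which the text does not confirm, that the MP bound is recomputed from each year's own observation count. For an illustrative year of 252 observations and 11 assets the upper bound is about 1.46, noticeably higher than the full-period 1.1109, so a second factor must be stronger to register in a single year. The function names are hypothetical.

```python
import math
import numpy as np
import pandas as pd


def significant_factor_count(returns: np.ndarray) -> int:
    """Number of correlation eigenvalues above the MP upper edge for this block."""
    n_obs, n_assets = returns.shape
    upper = (1.0 + math.sqrt(n_assets / n_obs)) ** 2
    eig = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    return int((eig > upper).sum())


def yearly_factor_counts(returns: pd.DataFrame) -> pd.Series:
    """Apply the count separately to each calendar year of a date-indexed frame."""
    counts = {
        int(year): significant_factor_count(block.to_numpy())
        for year, block in returns.groupby(returns.index.year)
    }
    return pd.Series(counts, name="n_factors")


# MP upper edge for an illustrative 252-day year with 11 assets (about 1.46).
upper_252 = (1.0 + math.sqrt(11 / 252)) ** 2
print(round(upper_252, 4))
```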

Result

The full-period eigenvalue spectrum (descending order) is:

Top 10 of the 11 eigenvalues: 5.1473, 1.5929, 0.9728, 0.7274, 0.5580, 0.5024, 0.4561, 0.3919, 0.3428, 0.1652.

Marchenko-Pastur bounds: The MP upper bound is 1.1109 and the lower bound is 0.8949.

Number of significant factors: 2 eigenvalues exceed the MP upper bound (5.1473 and 1.5929), establishing two statistically significant common factors above the random-matrix null.

Variance explained: The top factor accounts for 46.79% of total variance; the two significant factors together explain 61.28%.
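The reported figures can be cross-checked with simple arithmetic, using only the numbers stated above. Because the eigenvalues of an 11×11 correlation matrix sum to 11, the unlisted eleventh eigenvalue is implied by the ten that are reported. The sketch also recomputes the variance shares and counts eigenvalues above, inside, and below the MP band.

```python
reported = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
            0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
n_assets = 11
mp_lower, mp_upper = 0.8949, 1.1109

# The trace of a correlation matrix equals the number of assets.
implied_last = n_assets - sum(reported)
spectrum = reported + [implied_last]

share_top = 100 * reported[0] / n_assets
share_top_two = 100 * (reported[0] + reported[1]) / n_assets

above = sum(lam > mp_upper for lam in spectrum)
inside = sum(mp_lower <= lam <= mp_upper for lam in spectrum)
below = sum(lam < mp_lower for lam in spectrum)

print(f"implied 11th eigenvalue: {implied_last:.4f}")
print(f"top factor: {share_top:.2f}%, top two: {share_top_two:.2f}%")
print(f"above / inside / below MP band: {above} / {inside} / {below}")
```

From the rounded eigenvalues, the implied eleventh value is about 0.143 and the two-factor share comes out at 61.27%, which agrees with the reported 61.28% to within rounding of the inputs. Two eigenvalues lie above the band, one inside it, and eight below it.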

Factor structure (loadings):

Factor 1 (λ = 5.1473): The three largest-magnitude loadings are JPM (−0.343), MSFT (−0.335), and BAC (−0.331). This factor loads broadly and nearly uniformly across the universe, consistent with a market-wide or systematic risk component.
Factor 2 (λ = 1.5929): The three largest-magnitude loadings are AMZN (−0.399), XOM (+0.395), and CVX (+0.368). The sign pattern reveals a sector-specific contrast: technology (AMZN negative) versus energy (XOM, CVX positive), consistent with a growth-versus-value or tech-versus-energy rotation factor.
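When reading loadings such as these, note that an eigenvector is defined only up to sign, so the negative signs on the reported Factor 1 loadings carry no meaning on their own; only relative signs within a factor do. The sketch below extracts the largest-magnitude loadings from a correlation matrix and applies one possible sign convention (make the loadings sum positive). The function name top_loadings, the convention, and the toy matrix are illustrative assumptions.

```python
import numpy as np


def top_loadings(corr: np.ndarray, names: list, factor: int = 0, k: int = 3) -> list:
    """Return the k largest-magnitude loadings of one principal component."""
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # indices for descending order
    vec = eigvecs[:, order[factor]]
    if vec.sum() < 0:                         # sign is arbitrary; pick a convention
        vec = -vec
    idx = np.argsort(np.abs(vec))[::-1][:k]
    return [(names[i], round(float(vec[i]), 3)) for i in idx]


# Toy 3-asset matrix: A and B move together, C moves weakly against them.
toy = np.array([[1.0, 0.8, -0.1],
                [0.8, 1.0, -0.1],
                [-0.1, -0.1, 1.0]])
print(top_loadings(toy, ["A", "B", "C"], factor=0, k=2))
```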

Temporal dynamics (per-year significant factor count):

2010–2015: 1 significant factor per year.
2016: 2 factors.
2017: 2 factors.
2018: 1 factor.
2019: 1 factor.
2020: 2 factors.
2021–2024: 2 factors per year.

The time series shows a shift in structure: the second factor first becomes significant in 2016–2017, reverts to a single-factor regime in 2018–2019, and is significant in every year from 2020 through 2024.
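The regime runs in the per-year counts can be made explicit by grouping consecutive years with the same count. The sketch uses only the counts listed above.

```python
from itertools import groupby

# Per-year significant factor counts as reported above.
counts = {
    **{year: 1 for year in range(2010, 2016)},
    2016: 2, 2017: 2,
    2018: 1, 2019: 1,
    **{year: 2 for year in range(2020, 2025)},
}

runs = []
for n_factors, group in groupby(sorted(counts.items()), key=lambda item: item[1]):
    years = [year for year, _ in group]
    runs.append((years[0], years[-1], n_factors))

print(runs)
# [(2010, 2015, 1), (2016, 2017, 2), (2018, 2019, 1), (2020, 2024, 2)]
```

The longest uninterrupted two-factor run is 2020–2024 (five years); the 2016–2017 run lasts two years.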

Interpretation

What the numbers support

The eigenvalue spectrum exhibits a clear spectral gap: the top two eigenvalues (5.1473, 1.5929) lie above the Marchenko-Pastur upper bound (1.1109), while all remaining eigenvalues fall below it. Of those, only the third (0.9728) lies inside the MP band [0.8949, 1.1109]; the rest fall below the lower bound, which is expected when the leading eigenvalues absorb a large part of a fixed trace. This gap is the signature of genuine low-dimensional structure in the return correlation matrix, as opposed to high-dimensional noise. The result supports the hypothesis that returns in this large-cap US equity universe are governed by a small number of common factors, specifically two, over the full 2010–2024 period.

The factor loadings suggest an economic interpretation. Factor 1, with near-uniform loadings across all 11 names, is consistent with a market factor capturing systematic risk common to all large-cap equities. Factor 2, with opposing signs on technology (AMZN) and energy (XOM, CVX), captures a sector rotation or style tilt orthogonal to the market. The second factor is significant in 2016–2017 and again in every year from 2020 onward, which suggests a regime transition: the return space became structurally two-dimensional, reflecting increased differentiation between growth and value/energy exposures.

The temporal dynamics show that the factor count is not constant. The single-factor regime of 2010–2015 indicates that during the post-crisis recovery, a single market-wide factor dominated. The appearance of a second factor in 2016–2017, its temporary disappearance in 2018–2019, and its persistent re-emergence in 2020–2024 are consistent with known market episodes: the 2016 energy recovery, the 2018 volatility spike and subsequent calm, and the 2020–2024 period of pronounced growth-value divergence driven by pandemic policy, inflation, and rate cycles. These links are suggestive only, since the computation does not test them against event dates or macroeconomic data. The per-year result indicates that spectral structure is regime-dependent, not a static property.

What the numbers do NOT support

The result does not support the presence of more than two significant factors over the full period. The third eigenvalue (0.9728) lies below the MP upper bound, indicating it is statistically indistinguishable from noise. Claims of higher-dimensional factor structure (e.g., separate factors for each sector or for idiosyncratic firm-level dynamics) are not supported by this data.

The result does not establish that the two-factor structure is stable across all subperiods. The per-year counts show clear variation: a single factor sufficed in 2010–2015 and briefly in 2018–2019. The two-factor regime is dominant in recent years but not universal.

The result does not identify the economic drivers of the factors beyond what the loadings suggest. Factor 1 is consistent with a market factor, and Factor 2 with a tech-energy rotation, but the computation does not test whether these factors correspond to specific macroeconomic variables (e.g., oil prices, interest rates, earnings growth differentials). Such identification would require external regressors.

The result does not provide out-of-sample validation. The eigenvalue decomposition and MP comparison are in-sample statistics. Whether the two-factor structure persists in future data, or whether it generalizes to other universes (e.g., small-cap, international, or sector-specific portfolios), is not addressed.

Relation to the Literature

No closely related papers were retrieved for this computation, so the result stands on the computed eigenvalue spectrum and its comparison to the Marchenko-Pastur null. The Marchenko-Pastur distribution itself originates in random matrix theory (Marčenko & Pastur, 1967, Mathematics of the USSR-Sbornik) and has been applied to financial correlation matrices to distinguish signal from noise (e.g., Laloux et al., 1999, Physical Review Letters; Plerou et al., 2002, Physical Review E); no direct comparison with those studies is made here. The present result applies the same random-matrix methodology to a specific large-cap US equity universe over a 15-year window and documents time-varying factor counts, a feature not typically emphasized in static eigenvalue studies.

The two-factor structure (market plus sector rotation) is consistent with classical factor models (e.g., Fama-French, APT), but the present computation does not impose a pre-specified factor structure. Instead, it recovers the dimensionality and loadings empirically from the correlation matrix. The finding that a second factor is significant in 2016–2017 and in every year from 2020 onward, but not in 2010–2015 or 2018–2019, suggests that factor structure is regime-dependent, a point underexplored in static factor model literature.

Limitations

Sample size and universe: The computation uses 11 large-cap US equities, a small and highly liquid subset of the market. The q-ratio (0.003) is extremely low, which tightens the MP bounds and increases statistical power to detect factors, but the small cross-section limits the generality of the result. A broader universe (e.g., the S&P 500 or Russell 3000) would provide a more comprehensive picture of factor structure and might reveal additional significant factors.

In-sample only: The eigenvalue spectrum and MP comparison are computed on the same data used to estimate the correlation matrix. There is no out-of-sample test of whether the two-factor structure persists in future periods or whether the identified factors have predictive power for returns. The per-year counts are computed on calendar-year blocks, not rolling windows, and are in-sample within each year, not true out-of-sample forecasts.

Stationarity assumption: The full-period computation assumes that the correlation structure is stationary over 2010–2024. The per-year results show this is not the case: the factor count varies from 1 to 2. A more refined analysis would use rolling windows (e.g., 252-day or 504-day) to track the evolution of the spectrum continuously, rather than discretizing by calendar year.
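A rolling-window version of the factor count could look like the sketch below. It was not part of the reported computation; the function name, the 21-day step, and the choice to recompute the MP bound from the window length are assumptions made for illustration.

```python
import math
import numpy as np
import pandas as pd


def rolling_factor_counts(returns: pd.DataFrame, window: int = 252,
                          step: int = 21) -> pd.Series:
    """Count eigenvalues above the MP upper edge in overlapping windows.

    Each result is indexed by the last date of its window.
    """
    n_assets = returns.shape[1]
    upper = (1.0 + math.sqrt(n_assets / window)) ** 2
    values = returns.to_numpy()
    out = {}
    for end in range(window, len(values) + 1, step):
        block = values[end - window:end]
        eig = np.linalg.eigvalsh(np.corrcoef(block, rowvar=False))
        out[returns.index[end - 1]] = int((eig > upper).sum())
    return pd.Series(out, name="n_factors")
```

Overlapping windows share most of their observations, so consecutive counts are strongly dependent and a change in the count locates a transition only to within roughly one window length.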

Economic interpretation: The loadings suggest a market factor and a tech-energy rotation, but the computation does not test these interpretations against external data (e.g., regressing the factor scores on oil prices, the VIX, or the HML factor). The economic labels are plausible but not formally validated.

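Marchenko-Pastur assumptions: The MP law is derived for independent, identically distributed entries with finite variance in the limit where n and T grow large at fixed q; Gaussianity is not required, but independence under the null is. Daily equity returns exhibit fat tails and serial dependence (for example volatility clustering), and with only 11 assets the asymptotic edge is an approximation, so the effective significance threshold may differ from the nominal MP bound. The full-period eigenvalues 5.1473 and 1.5929 sit well above 1.1109, so the full-period count is unlikely to hinge on small shifts in the bound, but this robustness was not tested here. The per-year counts, computed from far fewer observations, are likely more exposed.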
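One way to probe the sensitivity of the threshold to fat tails is an empirical null built by shuffling each asset's return series independently. This keeps every marginal distribution intact and destroys cross-asset co-movement. It also destroys serial dependence, so it addresses fat tails only. The sketch below was not part of the reported computation; the function name and default parameters are illustrative.

```python
import numpy as np


def permutation_upper_bound(returns: np.ndarray, n_perm: int = 200,
                            quantile: float = 0.99, seed: int = 0) -> float:
    """Empirical quantile of the largest eigenvalue under no cross-correlation."""
    rng = np.random.default_rng(seed)
    top = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle each column separately: marginals kept, co-movement removed.
        shuffled = np.column_stack([rng.permutation(col) for col in returns.T])
        corr = np.corrcoef(shuffled, rowvar=False)
        top[i] = np.linalg.eigvalsh(corr)[-1]  # largest eigenvalue
    return float(np.quantile(top, quantile))
```

An observed eigenvalue would then be compared against this empirical bound instead of, or alongside, the analytic MP edge.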

Strengthening the result: To strengthen the finding, one would: (1) expand the universe to hundreds of assets to test whether the two-factor structure generalizes; (2) compute rolling-window spectra updated at higher frequency (e.g., in monthly steps) to track regime transitions more precisely; (3) perform out-of-sample tests by splitting the data and checking whether factors identified in one period explain variance in the next; (4) regress the factor scores on macroeconomic variables to validate the economic interpretation; (5) apply robust covariance estimators (e.g., shrinkage, robust M-estimators) to mitigate the impact of outliers and non-Gaussianity.
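Step (3) could be sketched as follows: estimate eigenvectors on a training period, then measure how much of the test-period correlation structure those fixed directions capture. This is one possible formulation under stated assumptions, not a procedure used in this note; the function name out_of_sample_share is hypothetical.

```python
import numpy as np


def out_of_sample_share(train: np.ndarray, test: np.ndarray, k: int = 2) -> float:
    """Share of test-period total variance captured by the top-k training eigenvectors.

    Variance is measured on the test-period correlation matrix, whose trace
    equals the number of assets.
    """
    _, vecs = np.linalg.eigh(np.corrcoef(train, rowvar=False))
    basis = vecs[:, -k:]                      # eigh is ascending: last k are the top k
    corr_test = np.corrcoef(test, rowvar=False)
    captured = np.trace(basis.T @ corr_test @ basis)
    return float(captured / np.trace(corr_test))
```

If the factor structure is stable, the out-of-sample share should be close to the in-sample share for the same k; a marked drop would point to the directions themselves having changed between periods.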

Research evidence, not investment advice.
