Empirica Technologies

Spectral Structure and Regime Detection in Large-Cap Equity Correlation Matrices

Question

Does the eigenvalue spectrum of a large-cap equity correlation matrix show a stable shape, with coherent structure above the bulk predicted by the Marchenko-Pastur random-matrix null, and can the largest eigenvalue, as a measure of market-wide co-movement, serve as a quantitative regime identifier?

Method

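We computed the eigenvalue decomposition of the 11×11 return correlation matrix for a universe of large-cap U.S. equities over 3,772 daily observations spanning 2010-01-01 through 2024-12-31. The ticker list for the universe names 12 symbols (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, META, MSFT, NVDA, PFE, XOM), whereas the matrix dimension, the MP bounds, and the variance shares reported in this note all correspond to 11 assets; which ticker is absent from the final matrix is not recorded here. Data source: daily returns computed from yfinance adjusted-close prices.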

The Marchenko-Pastur (MP) distribution provides the null hypothesis: if returns are independent, identically distributed noise with finite variance (Gaussian noise being the standard special case), the eigenvalue spectrum of the sample correlation matrix converges, as n_assets and n_obs grow with their ratio fixed, to a deterministic density with support [λ₋, λ₊], where λ± = (1 ± √q)² and q = n_assets / n_obs. For our configuration (q = 11/3,772 ≈ 0.0029), the MP bounds are λ₋ = 0.8949 and λ₊ = 1.1109. Eigenvalues exceeding λ₊ are treated as distinguishable from random-matrix noise and as candidates for genuine co-movement structure.
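As a check on the bounds quoted above, here is a minimal sketch of the standard MP edge formula λ± = σ²(1 ± √q)². The function name `mp_bounds` and the `sigma2` parameter are illustrative and not taken from the original analysis; σ² = 1 is assumed because a correlation matrix has unit-variance entries.

```python
import math

def mp_bounds(n_assets, n_obs, sigma2=1.0):
    """Return (lambda_minus, lambda_plus), the edges of the Marchenko-Pastur support."""
    q = n_assets / n_obs          # aspect ratio of the data matrix
    root_q = math.sqrt(q)
    lam_minus = sigma2 * (1.0 - root_q) ** 2
    lam_plus = sigma2 * (1.0 + root_q) ** 2
    return lam_minus, lam_plus

# Configuration reported in the text: 11 assets, 3,772 daily observations.
lam_minus, lam_plus = mp_bounds(11, 3772)
print(f"q = {11 / 3772:.4f}, lambda- = {lam_minus:.4f}, lambda+ = {lam_plus:.4f}")
```

With 11 assets the formula reproduces the stated bounds of 0.8949 and 1.1109 to four decimals; with 12 assets it would not.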

We identified significant factors as those with eigenvalues above the MP upper bound, extracted the top factor loadings, and computed the variance explained by the leading eigenvalue and by all significant factors. To assess temporal stability, we recomputed the eigenvalue spectrum and significant factor count within each calendar year (2010–2024) using the same method, yielding a 15-point time series of factor counts.
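The counting step can be sketched as follows. This is an illustration on synthetic one-factor data, not the original pipeline: the function name `count_significant`, the loading of 0.7, and the random seed are all assumptions made for the example.

```python
import numpy as np

def count_significant(returns):
    """returns: array of shape (n_obs, n_assets).

    Gives back (eigenvalues in descending order, MP upper bound,
    number of eigenvalues above that bound).
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)        # columns are assets
    eigvals = np.linalg.eigvalsh(corr)[::-1]         # eigvalsh sorts ascending
    lam_plus = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    return eigvals, lam_plus, int((eigvals > lam_plus).sum())

# Synthetic stand-in for real returns: one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
n_obs, n_assets = 3772, 11
market = rng.standard_normal((n_obs, 1))
synthetic = 0.7 * market + rng.standard_normal((n_obs, n_assets))

eigvals, lam_plus, n_sig = count_significant(synthetic)
print(f"lambda+ = {lam_plus:.4f}, significant factors = {n_sig}")
print("variance share of leading eigenvalue:", round(eigvals[0] / n_assets, 3))
```

Because the synthetic data contain exactly one common factor, one eigenvalue clears the bound; applying the same function to real returns would require a dates-by-tickers return matrix with no missing values.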

Result

Full-sample spectral structure

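The top 10 of the 11 eigenvalues in descending order are: 5.1473, 1.5929, 0.9728, 0.7274, 0.5580, 0.5024, 0.4561, 0.3919, 0.3428, 0.1652. The MP upper bound is 1.1109. Two eigenvalues exceed this threshold: λ₁ = 5.1473 and λ₂ = 1.5929. The remaining nine eigenvalues lie below λ₊. Of these, only λ₃ = 0.9728 falls inside the MP support [0.8949, 1.1109]; the other eight fall below λ₋ = 0.8949. That downward shift is the expected consequence of a dominant market mode absorbing variance under a fixed trace (the eigenvalues sum to 11), so these eigenvalues are not flagged as signal by the upper-bound test, but they do not match the unit-variance MP density either.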

The largest eigenvalue (λ₁ = 5.1473) is 4.63 times the MP upper bound, indicating a dominant market-wide mode. This factor alone explains 46.79% of total variance. The two significant factors together explain 61.28% of variance.
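The arithmetic behind these figures can be reproduced directly from the reported eigenvalues, using the fact that the eigenvalues of an N×N correlation matrix sum to N. The variable names below are illustrative, and the inputs are the four-decimal values quoted in the text, so the last digit of a result can differ from the report by rounding.

```python
# Eigenvalues as reported in the text (top 10 of 11, rounded to four decimals).
top10 = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
         0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
n_assets = 11
lam_plus = 1.1109

# The trace of a correlation matrix equals the number of assets,
# so the unlisted eleventh eigenvalue is implied by the other ten.
implied_last = n_assets - sum(top10)

share_first = top10[0] / n_assets                  # variance share of lambda_1
share_two = (top10[0] + top10[1]) / n_assets       # share of both significant factors
separation = top10[0] / lam_plus                   # lambda_1 relative to the MP edge

print(f"implied 11th eigenvalue: {implied_last:.4f}")
print(f"lambda_1 share: {share_first:.2%}, top-two share: {share_two:.2%}")
print(f"lambda_1 / lambda+: {separation:.2f}")
```

The implied eleventh eigenvalue comes out near 0.14, below the smallest listed value, which is consistent with the list being the top 10 of an 11-asset spectrum.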

Factor loadings

Factor 1 (market mode, λ = 5.1473): The top three loadings are JPM (−0.343), MSFT (−0.335), and BAC (−0.331). All assets load with the same sign (negative here; the overall sign of an eigenvector is arbitrary), consistent with a common market factor. The top loadings sit close to the equal-weight value 1/√11 ≈ 0.302, and the near-uniform magnitude across financials (JPM, BAC), technology (MSFT, AAPL, GOOGL, META, NVDA, AMZN), energy (XOM, CVX), healthcare (JNJ), and pharmaceuticals (PFE) indicates that this factor captures broad market co-movement rather than sector-specific dynamics.

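Factor 2 (sector contrast, λ = 1.5929): The top three loadings are AMZN (−0.399), XOM (+0.395), and CVX (+0.368). This factor separates technology/consumer (negative loadings: AMZN, NVDA, META) from energy (positive loadings: XOM, CVX). Only the relative signs are meaningful, since the eigenvector as a whole can be flipped. The opposing signs are consistent with a sector rotation or risk-on/risk-off dynamic; orthogonality to the market mode holds by construction, because eigenvectors of a symmetric matrix are mutually orthogonal.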
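Loadings of this kind are the components of the eigenvectors of the correlation matrix. The sketch below shows one way to extract and rank them; the function `top_loadings`, the sign convention (flip so the loadings sum to a non-negative number), and the 4×4 toy matrix with hypothetical asset names A to D are assumptions for illustration, not the report's data.

```python
import numpy as np

def top_loadings(corr, names, factor=0, k=3):
    """Return the k largest-magnitude loadings of the chosen eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    vec = eigvecs[:, ::-1][:, factor]            # reorder to descending, pick one
    if vec.sum() < 0:                            # overall sign is arbitrary; fix one
        vec = -vec
    order = np.argsort(-np.abs(vec))[:k]
    return [(names[i], round(float(vec[i]), 3)) for i in order]

# Toy two-sector correlation matrix: (A, B) and (C, D) are each tightly
# correlated, with a weaker correlation across the two pairs.
names = ["A", "B", "C", "D"]
corr = np.array([
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
])

print("factor 1:", top_loadings(corr, names, factor=0, k=4))
print("factor 2:", top_loadings(corr, names, factor=1, k=4))
```

In the toy matrix the first eigenvector weights all four names equally (a market mode) and the second assigns opposite signs to the two pairs (a sector contrast), mirroring the pattern described for Factors 1 and 2.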

Temporal dynamics

The per-year significant factor count (eigenvalues above the MP bound within each calendar year) exhibits clear regime structure:

2010–2015: Stable single-factor regime. One significant eigenvalue in each year.
2016: Transition to two-factor regime.
2017: Two factors.
2018: Reversion to one factor (market stress year).
2019: One factor.
2020–2024: Persistent two-factor regime. Two significant eigenvalues in each year from 2020 onward.

The shift from one to two factors in 2016 and the sustained two-factor structure from 2020 forward suggest increased differentiation in sector-level risk premia or a structural change in cross-asset correlation patterns. The 2018 reversion to one factor coincides with elevated market volatility and a compression of sector dispersion during the Q4 2018 drawdown, consistent with a flight-to-correlation during stress. That reading is tentative: 2019 also shows a single factor without comparable stress, and 2020 shows two factors despite the COVID-19 sell-off.
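A per-year count of this kind can be sketched with a pandas groupby. The sketch assumes the MP bound is recomputed from each year's own observation count, which the text does not state explicitly. Under that assumption a year of roughly 252 trading days gives q ≈ 0.044 and λ₊ ≈ 1.46, well above the full-sample 1.1109, so yearly and full-sample counts use different thresholds. The function name `yearly_factor_counts` and the synthetic data are illustrative.

```python
import numpy as np
import pandas as pd

def yearly_factor_counts(returns):
    """returns: DataFrame of daily returns with a DatetimeIndex, one column per asset."""
    rows = []
    for year, block in returns.groupby(returns.index.year):
        n_obs, n_assets = block.shape
        lam_plus = (1.0 + np.sqrt(n_assets / n_obs)) ** 2   # bound for this year only
        eigvals = np.linalg.eigvalsh(block.corr().to_numpy())
        rows.append({
            "year": year,
            "n_obs": n_obs,
            "lambda_plus": lam_plus,
            "lambda_1": eigvals[-1],                          # largest eigenvalue
            "n_significant": int((eigvals > lam_plus).sum()),
        })
    return pd.DataFrame(rows).set_index("year")

# Synthetic stand-in: two years of business days, one common factor plus noise.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2010-01-01", "2011-12-31")
market = rng.standard_normal((len(dates), 1))
data = 0.7 * market + rng.standard_normal((len(dates), 11))
returns = pd.DataFrame(data, index=dates, columns=[f"asset_{i}" for i in range(11)])

counts = yearly_factor_counts(returns)
print(counts)
```

Reporting `lambda_plus` and `lambda_1` alongside the count makes it possible to see how far each year's eigenvalues sit from the threshold, which the integer count alone hides.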

Interpretation

What the result supports

Stable market mode dominance: The largest eigenvalue (5.1473) is far above the MP bound and explains nearly half of total variance, indicating that large-cap equities share a dominant common factor. At 4.63 times the upper bound, this factor is not an artifact of finite-sample noise; it is a genuine structural feature of the correlation matrix.
Sparse significant structure: Only two eigenvalues exceed the MP upper bound. The remaining nine (82% of the spectrum, 9 of 11 factors) lie below λ₊ and are not flagged as signal by this test, although eight of them also fall below λ₋ and therefore sit outside the MP bulk on the low side rather than inside it. This sparsity is consistent with low-dimensional factor structure in equity returns: a market factor and one orthogonal sector/style contrast account for the majority of explainable co-movement.
Regime detection via factor count: The time series of significant factor counts changes value in 2016, 2018, and 2020. Because the count is integer-valued, it moves in steps by construction; whether the underlying shift is abrupt or gradual would have to be judged from the path of λ₂ relative to λ₊ in each year. Subject to that caveat, the number of eigenvalues above the MP bound can serve as a regime indicator: a single-factor regime corresponds to market-dominated dynamics (high correlation, low dispersion), while a two-factor regime indicates the emergence of a secondary orthogonal source of risk (sector rotation, style divergence).
Economic interpretation of factors: Factor 1 is the market mode (uniform loadings). Factor 2 is a sector contrast (technology vs. energy), consistent with known risk-on/risk-off or growth/value dynamics. The fact that this second factor only persistently exceeds the MP bound from 2020 onward suggests that sector differentiation strengthened during and after the COVID-19 shock, possibly due to divergent policy impacts (tech benefiting from remote work, energy suffering from demand collapse and subsequent recovery).

What the result does NOT support

High-dimensional factor structure: The result does not support the hypothesis that large-cap equity returns are driven by many independent factors. Only two factors are statistically significant; the remaining variance is noise or idiosyncratic risk.
Stable multi-factor regime: The factor count is not constant over time. The single-factor regime of 2010–2015 and the two-factor regime of 2020–2024 are distinct. Any model assuming a fixed number of factors across the full sample would misspecify the dynamics.
Predictive power of the spectrum: This is an in-sample decomposition. The result quantifies the correlation structure within the observed window but does not establish that eigenvalue magnitudes or factor counts predict future returns, volatility, or regime transitions. The time variation in factor count is descriptive, not predictive.
Generalization beyond large-cap U.S. equities: The universe is 11 large-cap U.S. stocks. The result does not extend to small-cap, international, or cross-asset universes without recomputation. The MP bounds and factor structure are specific to this sample size, observation count, and asset selection.

Relation to the Literature

No closely related papers were retrieved for this computation. The result stands on the computed eigenvalue spectrum and its comparison to the Marchenko-Pastur null. The methodological framework, which uses random matrix theory to distinguish signal from noise in correlation matrices, comes from physics and mathematics and has been applied to financial correlation matrices since the late 1990s. The specific empirical question (regime detection via factor count dynamics in a large-cap equity universe over 2010–2024) is addressed here through direct computation rather than literature synthesis.

The finding that the largest eigenvalue is far above the MP bound and explains ~47% of variance is consistent with the well-known empirical regularity that equity returns share a dominant market factor. The novel contribution is the quantification of the MP separation (λ₁ / λ₊ = 4.63) and the documentation of discrete regime shifts in the number of significant factors, particularly the transition to a persistent two-factor regime from 2020 onward.

Limitations

Small universe: The sample contains only 11 assets. The MP bounds are sensitive to the ratio q = n_assets / n_obs; with q = 0.003, the bounds are tight, but the small n_assets limits the resolution of the spectral density. A larger universe (50–100 assets) would provide finer structure in the bulk and potentially reveal additional significant factors.
In-sample decomposition: The eigenvalue spectrum is computed on the full sample (or within each year for the rolling analysis). This is a descriptive decomposition, not an out-of-sample test. The result does not establish that the factor structure is stable in held-out data or that it has predictive power for future returns.
Asset selection bias: The universe is hand-selected large-cap names, not a random sample or a market-cap-weighted index. The inclusion of two energy stocks (XOM, CVX) and multiple tech names (AAPL, MSFT, GOOGL, META, NVDA, AMZN) may amplify the sector contrast captured by Factor 2. A more balanced sector representation or a broader index (e.g., S&P 500 constituents) would test whether the two-factor structure is robust.
Daily frequency and stationarity: The correlation matrix is computed on daily returns over a 15-year window. Correlation structure may vary at different frequencies (weekly, monthly) or exhibit non-stationarity within the sample. The per-year recomputation partially addresses this by showing time variation, but a more granular rolling-window analysis (e.g., 252-day windows) would reveal intra-year dynamics.
MP assumptions: The Marchenko-Pastur null assumes independent, identically distributed returns with finite variance (Gaussian noise is the usual reference case), and its bounds are exact only in the limit of large matrices. Equity returns exhibit fat tails, autocorrelation, and heteroskedasticity, and this universe has only 11 assets. Deviations from the MP null may reflect these features rather than genuine factor structure. A more conservative test would use a bootstrapped null or a cleaned correlation matrix (e.g., removing the market mode before testing the residual spectrum).
Economic interpretation of regime shifts: The result documents that the factor count increased in 2016 and 2020, but it does not identify the economic drivers. Linking the regime shifts to specific events (e.g., the 2016 election, the 2020 pandemic, monetary policy changes) would require external data and causal analysis beyond the eigenvalue decomposition.
No confidence intervals on eigenvalues: The eigenvalues are point estimates from the sample correlation matrix. Finite-sample uncertainty in the eigenvalues is not quantified. A bootstrap or jackknife procedure would provide confidence intervals and test whether λ₂ is robustly above the MP bound or whether the two-factor classification is sensitive to sampling variation.
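One commonly used variant of the "remove the market mode" idea in the MP assumptions item above rescales the MP edges by σ² = 1 − λ₁/N, the share of total variance left once the leading eigenvalue is set aside. The sketch below applies that rescaling to the eigenvalues reported in this note. It is an illustration of how the threshold moves under a different null, under the assumption that this rescaling is an acceptable stand-in for cleaning the matrix; it is not part of the original analysis.

```python
import math

# Eigenvalues as reported in the text (top 10 of 11).
eigvals = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
           0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
n_assets, n_obs = 11, 3772
q = n_assets / n_obs

# Variance share not carried by the market mode.
sigma2 = 1.0 - eigvals[0] / n_assets

# MP edges for noise with variance sigma2 instead of 1.
lam_plus_rescaled = sigma2 * (1.0 + math.sqrt(q)) ** 2
lam_minus_rescaled = sigma2 * (1.0 - math.sqrt(q)) ** 2

# Non-market eigenvalues that clear the rescaled upper edge.
above = [lam for lam in eigvals[1:] if lam > lam_plus_rescaled]

print(f"sigma^2 = {sigma2:.4f}")
print(f"rescaled support = [{lam_minus_rescaled:.4f}, {lam_plus_rescaled:.4f}]")
print("non-market eigenvalues above rescaled edge:", above)
```

On the reported values the rescaled upper edge falls near 0.59, which shows how sensitive the significant-factor count is to the choice of null. The output should be read as a sensitivity check rather than as a finding of the study.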

Strengthening the result: A replication on a larger, more representative universe (e.g., S&P 500 constituents), an out-of-sample test of factor stability (e.g., estimating the correlation matrix on 2010–2019 and testing the factor structure on 2020–2024), and a bootstrap-based confidence interval on the eigenvalues would increase confidence in the regime-detection interpretation. Linking the factor count dynamics to macroeconomic or market microstructure variables (volatility, dispersion, policy shocks) would provide economic grounding for the observed regime shifts.
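A bootstrap interval of the kind suggested above could look like the following sketch. It resamples days independently with replacement, which ignores autocorrelation and volatility clustering; a block bootstrap would be a more defensible choice for real return data. The function name `bootstrap_eigenvalue_ci`, its parameters, and the synthetic input are assumptions made for the example.

```python
import numpy as np

def bootstrap_eigenvalue_ci(returns, k=2, n_boot=200, alpha=0.05, seed=0):
    """Percentile intervals for the k largest correlation-matrix eigenvalues.

    returns: array of shape (n_obs, n_assets). Rows (days) are resampled
    with replacement, keeping each day's cross-section intact.
    """
    rng = np.random.default_rng(seed)
    n_obs = returns.shape[0]
    draws = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n_obs, size=n_obs)
        corr = np.corrcoef(returns[idx], rowvar=False)
        draws[b] = np.linalg.eigvalsh(corr)[::-1][:k]    # k largest, descending
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return lower, upper

# Synthetic stand-in: 1,000 days, 11 assets, one common factor plus noise.
rng = np.random.default_rng(2)
market = rng.standard_normal((1000, 1))
synthetic = 0.7 * market + rng.standard_normal((1000, 11))

lower, upper = bootstrap_eigenvalue_ci(synthetic)
for i in range(2):
    print(f"lambda_{i + 1}: [{lower[i]:.3f}, {upper[i]:.3f}]")
```

Comparing the lower end of the λ₂ interval with λ₊ gives a direct read on whether a two-factor classification survives sampling variation.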

Research evidence, not investment advice.
