Empirica Technologies

Spectral Regime Detection in Large-Cap Equity Correlation Matrices: Eigenvalue Structure Beyond Random-Matrix Null

Question

Does the eigenvalue spectrum of a large-cap equity return correlation matrix exhibit statistically significant deviations from the Marchenko–Pastur random-matrix null, and if so, does the number of significant factors (eigenvalues exceeding the MP upper bound) vary systematically across calendar years in ways that encode structural regime shifts in market coherence?

Method

We computed the eigenvalue decomposition of the return correlation matrix for 11 large-cap U.S. equities using daily adjusted-close returns from yfinance over the window 2010-01-01 to 2024-12-31 (n = 3772 observations). The ticker list (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, META, MSFT, NVDA, PFE, XOM) contains twelve symbols, while the reported spectrum is 11×11 (the variance shares below equal eigenvalue/11), so one listed ticker did not enter the final matrix. The eigenvalue spectrum is compared with the Marchenko–Pastur (MP) null: eigenvalues above the MP upper bound are statistically distinguishable from random-matrix noise.

The Marchenko–Pastur distribution provides the null eigenvalue density for a correlation matrix of N independent random time series of length T, parameterized by the ratio q = N/T. For our full-sample analysis, N = 11 and T = 3772, yielding q ≈ 0.003. The MP bounds are λ₊ = (1 + √q)² and λ₋ = (1 − √q)², which evaluate to an upper bound of 1.1109 and a lower bound of 0.8949. Eigenvalues exceeding λ₊ are treated as statistically significant factors: they indicate structured covariance beyond what random noise would produce.
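The bound arithmetic can be checked directly. The sketch below assumes unit-variance (correlation) normalization; the function name `mp_bounds` is illustrative, and the 252-day year is an assumed typical length rather than a figure taken from the study.

```python
import math


def mp_bounds(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Marchenko-Pastur edges (lambda_minus, lambda_plus) for a correlation matrix.

    Assumes unit-variance series, so the bulk is centered near 1 and q = N / T.
    """
    q = n_assets / n_obs
    root = math.sqrt(q)
    return (1 - root) ** 2, (1 + root) ** 2


# Full sample as described in the text: N = 11 assets, T = 3772 daily returns.
lam_minus, lam_plus = mp_bounds(11, 3772)
print(f"q = {11 / 3772:.4f}, MP interval = [{lam_minus:.4f}, {lam_plus:.4f}]")

# One calendar year, assuming roughly 252 trading days (hypothetical length).
year_minus, year_plus = mp_bounds(11, 252)
print(f"one-year MP interval = [{year_minus:.4f}, {year_plus:.4f}]")
```

For the full sample this reproduces the bounds quoted above. For a 252-day year the same formula gives λ₊ ≈ 1.46, which is roughly the threshold each per-year count is tested against.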

To assess time variation, we recomputed the eigenvalue spectrum separately for each calendar year 2010–2024 on the same tickers and method (in-sample within each year), counting the number of eigenvalues exceeding the year-specific MP upper bound. This non-overlapping, per-year design shows whether the spectral structure is stable or shifts in the number of significant factors.
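The per-year procedure can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the study's code: returns are taken as a (T, N) array with a parallel array of year labels, the function names are hypothetical, and the demonstration uses synthetic data from a hypothetical factor model (a market factor in both synthetic years, plus a sector contrast switched on only in the second) rather than the yfinance sample.

```python
import numpy as np


def count_significant_factors(returns: np.ndarray) -> int:
    """Count correlation eigenvalues above the window's Marchenko-Pastur upper edge.

    `returns` is a (T, N) array: rows are trading days, columns are assets.
    """
    t, n = returns.shape
    corr = np.corrcoef(returns, rowvar=False)      # N x N sample correlation
    eigvals = np.linalg.eigvalsh(corr)             # real eigenvalues, ascending
    lam_plus = (1 + np.sqrt(n / t)) ** 2           # bound uses this window's T
    return int(np.sum(eigvals > lam_plus))


def per_year_counts(returns: np.ndarray, years: np.ndarray) -> dict[int, int]:
    """Recompute the count on each non-overlapping calendar-year slice."""
    return {int(y): count_significant_factors(returns[years == y])
            for y in np.unique(years)}


# Synthetic illustration only (hypothetical factor model, not the study's data).
rng = np.random.default_rng(0)
t, n = 252, 11
sector = np.array([1, 1, 1, -1, -1, -1, 0, 0, 0, 0, 0], dtype=float)
returns = rng.standard_normal((2 * t, 1)) + rng.standard_normal((2 * t, n))  # market + noise
returns[t:] += rng.standard_normal((t, 1)) * sector   # sector contrast in year 2 only
years = np.repeat([1, 2], t)
print(per_year_counts(returns, years))
```

On this synthetic input the sketch should report one significant factor for the first year and two for the second, because the bound is recomputed from each slice's own length.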

Result

Full-Sample Spectrum (2010–2024)

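The ten largest of the eleven eigenvalues of the 11×11 correlation matrix are:

5.1473, 1.5929, 0.9728, 0.7274, 0.5580, 0.5024, 0.4561, 0.3919, 0.3428, 0.1652.

The eigenvalues of an N×N correlation matrix sum to N, so the unlisted eleventh eigenvalue is approximately 11 − 10.8568 = 0.1432.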

The Marchenko–Pastur upper bound is 1.1109. Two eigenvalues exceed this threshold: λ₁ = 5.1473 and λ₂ = 1.5929. The number of significant factors is therefore 2.

The top factor (λ₁ = 5.1473) explains 46.79% of total variance. The top two factors jointly explain 61.28% of variance. The third eigenvalue (0.9728) lies within the MP bulk [0.8949, 1.1109], indicating it is statistically indistinguishable from random noise.
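The variance shares and factor count follow from the reported eigenvalues alone. In this small sketch the hard-coded values are the rounded figures listed above; the eleventh eigenvalue is inferred from the trace identity rather than reported.

```python
import numpy as np

# The ten eigenvalues reported above (rounded to four decimals).
reported = np.array([5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
                     0.5024, 0.4561, 0.3919, 0.3428, 0.1652])
n_assets = 11            # the trace of an N x N correlation matrix equals N

implied_last = n_assets - reported.sum()   # eleventh eigenvalue via the trace identity
shares = reported / n_assets               # fraction of total variance per mode
n_significant = int(np.sum(reported > 1.1109))   # count above the full-sample MP edge

print(f"implied lambda_11 ~ {implied_last:.4f}")
print(f"top-1 share {shares[0]:.2%}, top-2 share {shares[:2].sum():.2%}")
print("significant factors:", n_significant)
```

Computed from the four-decimal eigenvalues, the top-two share comes out at 61.27% rather than 61.28%; the difference is within the rounding of the listed values.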

Factor Loadings

Factor 1 (market factor): The three largest loadings are JPM (−0.343), MSFT (−0.335), and BAC (−0.331). All loadings are negative and of comparable magnitude, consistent with a broad market mode to which all assets contribute roughly equally (sign is arbitrary in PCA).

Factor 2 (sector contrast): The three largest absolute loadings are AMZN (−0.399), XOM (+0.395), and CVX (+0.368). This factor contrasts technology (AMZN negative) against energy (XOM, CVX positive), capturing a sector-rotation or commodity-vs-growth dimension orthogonal to the market factor.
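Because PCA fixes each eigenvector only up to sign, reported loadings need a sign convention before they can be compared across runs. The sketch below assumes one such convention (flip each vector so its largest-magnitude loading is positive; under it the all-negative Factor 1 above would appear all-positive). The function name, tickers, and correlation matrix in the usage lines are hypothetical.

```python
import numpy as np


def top_loadings(corr: np.ndarray, tickers: list[str], k: int = 2, n_top: int = 3):
    """Largest-|loading| tickers for each of the first k eigenvectors of `corr`.

    Sign convention (an assumption): flip each eigenvector so that its
    largest-magnitude entry is positive, making output reproducible across runs.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # descending eigenvalue order
    result = []
    for j in range(k):
        v = eigvecs[:, j]
        if v[np.argmax(np.abs(v))] < 0:
            v = -v
        idx = np.argsort(-np.abs(v))[:n_top]
        result.append([(tickers[i], round(float(v[i]), 3)) for i in idx])
    return result


# Hypothetical 6-asset population correlation: a market factor plus a contrast
# between "E1", "E2" (+) and "T1", "T2" (-); "X1", "X2" have no sector exposure.
tickers = ["E1", "E2", "T1", "T2", "X1", "X2"]
sector = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])
cov = np.ones((6, 6)) + np.outer(sector, sector) + np.eye(6)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
for j, row in enumerate(top_loadings(corr, tickers), start=1):
    print(f"factor {j}:", row)
```

In this toy matrix the first factor loads positively on every name and the second loads only on the four sector names, mirroring the market-plus-contrast pattern described above.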

Time Variation in Factor Count (2010–2024)

The per-year significant factor count (eigenvalues above the year-specific MP upper bound) is:

Year	Significant Factors
2010	1
2011	1
2012	1
2013	1
2014	1
2015	1
2016	2
2017	2
2018	1
2019	1
2020	2
2021	2
2022	2
2023	2
2024	2

The factor count is not constant: 2010–2015 consistently show 1 significant factor; 2016–2017 show 2; 2018–2019 revert to 1; and 2020–2024 show 2 in every year. A second significant factor first appears in 2016, persists through 2017, falls back below the threshold in 2018–2019, then re-emerges and holds from 2020 onward.
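The regime description is a run-length summary of the table. This minimal sketch collapses the tabulated counts into (start, end, count) runs; the function name `runs` is illustrative, and it assumes the years are consecutive with no gaps.

```python
from itertools import groupby

# Per-year significant-factor counts from the table above.
counts = {2010: 1, 2011: 1, 2012: 1, 2013: 1, 2014: 1, 2015: 1, 2016: 2, 2017: 2,
          2018: 1, 2019: 1, 2020: 2, 2021: 2, 2022: 2, 2023: 2, 2024: 2}


def runs(series: dict[int, int]) -> list[tuple[int, int, int]]:
    """Collapse consecutive years sharing a count into (start, end, count) runs."""
    out = []
    for value, group in groupby(sorted(series.items()), key=lambda kv: kv[1]):
        years = [year for year, _ in group]
        out.append((years[0], years[-1], value))
    return out


print(runs(counts))
```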

Interpretation

What the Spectrum Encodes

The eigenvalue spectrum of a return correlation matrix decomposes total variance into orthogonal modes. Under the random-matrix null (Marchenko–Pastur), all eigenvalues lie within [λ₋, λ₊] if returns are uncorrelated noise, exactly so only in the limit of large N and T; at finite size the extreme null eigenvalues fluctuate slightly around these edges. Eigenvalues well above λ₊ signal structured covariance: persistent, non-random co-movement among assets.
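A quick Monte Carlo check makes the null concrete. This sketch assumes i.i.d. Gaussian returns with the full-sample shape (T = 3772, N = 11); it illustrates the MP null and is not part of the reported analysis. At this size the extreme eigenvalues should land close to, but not exactly on, the MP edges, and the eigenvalues always sum to N.

```python
import numpy as np

# Pure i.i.d. Gaussian "returns" with the full-sample shape (an assumption).
rng = np.random.default_rng(42)
t, n = 3772, 11
noise = rng.standard_normal((t, n))
eigvals = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))

q = n / t
lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
print(f"null eigenvalues span [{eigvals.min():.4f}, {eigvals.max():.4f}]")
print(f"MP interval           [{lam_minus:.4f}, {lam_plus:.4f}]")
print(f"trace = {eigvals.sum():.4f}")   # equals N for any correlation matrix
```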

The full-sample result (2 significant factors, explaining 61.28% of variance) indicates that large-cap equity returns are far from cross-sectionally independent: a dominant market factor (λ₁ = 5.1473) drives nearly half of all variance, and a secondary sector-contrast factor (λ₂ = 1.5929) captures an additional 14.5%. Of the remaining nine eigenvalues, only λ₃ lies inside the MP interval; the other eight lie below λ₋, consistent with idiosyncratic noise and weak, transient correlations whose share of variance has been squeezed by the outlier factors.

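The factor loadings support this economic interpretation. Factor 1 is a broad market mode: all assets load with similar magnitude, reflecting common exposure to aggregate risk (business cycle, monetary policy, sentiment). Factor 2 is a sector rotation: technology (AMZN) loads negatively and energy (XOM, CVX) positively, so that, net of the market mode, these groups tend to move in opposite directions along this factor (opposite-signed loadings do not by themselves imply negative raw correlation). These loadings are estimated once on the pooled 15-year sample; because the second factor is significant in only 7 of the 15 calendar years, the pooled structure should not be read as constant across the window.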

Regime Dynamics: Single-Factor vs. Two-Factor Periods

The time-varying factor count points to distinct regimes. From 2010 to 2015, only one eigenvalue exceeds the MP bound in each year: markets are dominated by a single coherent mode (the market factor), with sector differentiation subsumed into noise. This is consistent with a low-dispersion, high-correlation environment where macro risk dominates and cross-sectional structure is weak.

In 2016, a second factor emerges above the MP threshold and persists through 2017. This coincides with the aftermath of the 2014–2016 oil price collapse (crude prices bottomed in early 2016) and the divergence of energy and technology performance, consistent with sector rotation becoming a statistically significant source of variance, orthogonal to the market factor. In spectral terms, the correlation matrix now has two significant dimensions.

The reversion to a single factor in 2018–2019 suggests a temporary re-compression of cross-sectional structure, possibly reflecting the 2018 volatility spike and subsequent recovery, during which macro risk again dominated and sector differentiation weakened.

From 2020 onward, the two-factor regime holds in every year. The COVID-19 shock (2020), the subsequent tech-driven recovery (2021), the inflation/rate regime shift (2022), and the AI-driven dispersion (2023–2024) all coincide with a spectral structure of two significant factors. This five-year persistence suggests that sector differentiation (growth vs. value, tech vs. energy) has become a durable feature of the correlation matrix rather than a transient fluctuation.

Deviation from Random-Matrix Null

The Marchenko–Pastur distribution assumes returns are i.i.d. random variables with no cross-sectional structure. The empirical spectrum violates this null in two ways:

Outlier eigenvalues: λ₁ = 5.1473 is 4.6 times the MP upper bound (1.1109). This is a massive deviation—random noise cannot produce such concentration of variance in a single mode. The market factor is a robust, non-random feature.
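Bulk depletion: The non-outlier eigenvalues (λ₃ through λ₁₁) account for only 38.72% of variance, and only λ₃ lies inside the MP interval; λ₄ through λ₁₁ fall below λ₋ = 0.8949, spreading from 0.7274 down to roughly 0.14, a range far wider than the MP interval itself (width 0.216). Under the MP null, the bulk should account for essentially all of the variance, since all eigenvalues lie within [λ₋, λ₊]. Because the eigenvalues of a correlation matrix must sum to N, the variance absorbed by the outlier factors forces the remaining eigenvalues down, which is why most of them sit below the null interval.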
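The position of each eigenvalue relative to the MP interval can be tallied from the reported values. In this sketch the eleventh eigenvalue is inferred from the trace identity (an inference, since only ten values are listed), and the bounds are the full-sample figures from the text.

```python
import numpy as np

lam_minus, lam_plus = 0.8949, 1.1109               # full-sample MP bounds from the text
reported = np.array([5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
                     0.5024, 0.4561, 0.3919, 0.3428, 0.1652])
eigvals = np.append(reported, 11 - reported.sum())  # inferred eleventh eigenvalue

above = int(np.sum(eigvals > lam_plus))
inside = int(np.sum((eigvals >= lam_minus) & (eigvals <= lam_plus)))
below = int(np.sum(eigvals < lam_minus))
bulk_share = eigvals[2:].sum() / eigvals.sum()      # variance outside the top two modes
print(f"above: {above}, inside: {inside}, below: {below}, non-outlier share: {bulk_share:.1%}")
```

It should report two eigenvalues above λ₊, one inside the interval, and eight below λ₋.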

These deviations are consistent with [P5]'s findings on intra-day stock prices: the major part of the correlation matrix is random (the MP bulk), but a small number of principal components (the outliers) encode genuine economic structure. The spectral gap between λ₂ = 1.5929 and λ₃ = 0.9728 is a clean separation: by the MP upper-edge criterion, the top two factors are signal and none of the others qualifies.

Structural Coherence vs. Breakdown

The time variation in factor count does not encode "breakdown" in the sense of a loss of structure—rather, it encodes a shift in the dimensionality of structure. A single-factor regime (2010–2015) is a state of high coherence: all assets move together, cross-sectional differentiation is weak, and the correlation matrix is effectively rank-1 plus noise. A two-factor regime (2016–2017, 2020–2024) is a state of richer structure: the market factor persists, but a second orthogonal mode (sector rotation) emerges as a statistically significant source of variance.

The 2018–2019 reversion to a single factor is not a "breakdown" but a temporary re-compression—sector differentiation weakened, and the correlation matrix returned to a simpler, more coherent state. The 2020 re-emergence of the two-factor regime suggests that the post-COVID macro environment (inflation, rates, AI) has durably increased cross-sectional dispersion, making sector rotation a persistent feature.

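This interpretation can be mapped, by analogy, onto [P7]'s sandpile economics framework, in which structure is read as the "geometric fragility" of a production network (here transposed to the equity market): a single-factor regime would be geometrically simple (low curvature, high substitutability) and a two-factor regime geometrically richer (higher curvature, lower substitutability). This mapping is untested here, since no curvature is computed. Likewise, the factor count is an integer obtained by thresholding, so a jump from 1 to 2 is also consistent with λ₂ drifting continuously across λ₊; distinguishing a phase shift in the matrix's effective dimensionality from continuous drift would require tracking λ₂ itself, not only the count.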

Relation to the Literature

Random Matrix Theory and Spectral Universality

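[P9] and [P10] establish the local semicircle law and universality of eigenvalue statistics for large random matrices under weak assumptions on entry distributions. Our result is broadly consistent with this framework, with two caveats. First, the semicircle law describes Wigner-type matrices; the relevant null for a sample correlation matrix is MP, whose density approaches a semicircle centered near 1 only in the small-q limit, and with nine non-outlier eigenvalues the shape of the empirical bulk cannot be meaningfully assessed. Second, only λ₃ lies inside the MP interval, while λ₄ through λ₁₁ fall below λ₋. The outlier eigenvalues (λ₁, λ₂) are clearly distinguishable from the null. The MP upper bound (1.1109) is an asymptotic edge: at finite N and T the largest null eigenvalue fluctuates around it, so it serves as an approximate rather than sharp threshold between signal and noise.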

[P5] applies this framework to intra-day stock prices and finds that the major part of the correlation matrix is random, with a small number of principal components reflecting genuine economic structure. Our result extends this to daily returns over a 15-year window: in the full sample the gap between λ₂ and λ₃ is wide, but the second factor clears the year-specific MP bound in only 7 of 15 calendar years, so the two-factor structure is not stable across time. The time variation in factor count (1 vs. 2) is a new finding; [P5] does not report rolling-window analysis, so the regime-shift dynamics are not addressed in that work.

Cross-Domain Spectral Methods

[P1] uses constrained Hankelized DMD to extract "urban heartbeats" (shared Koopman eigenmodes) from multi-city traffic data and transfer them to data-scarce cities. The analogy to our work is suggestive: the top two eigenvectors of the equity correlation matrix act as "market heartbeats", interpretable modes that could, in principle, be transferred to other asset classes or geographies. One factor, presumably the market mode, clears the MP bound in every year, whereas the sector-rotation factor (λ₂) is significant only in some years, which makes it a more tentative candidate for cross-market transfer: if energy vs. technology is a universal dimension of equity variance, it should appear in other developed markets (Europe, Asia) with similar spectral weight.

[P2] studies isospectral twirling and the separation of chaotic vs. integrable dynamics via spectral statistics. Our result does not directly address chaos, but the time variation in factor count suggests a related question: does the transition from a single-factor to a two-factor regime correspond to a change in the "integrability" of the market, i.e., a shift from a simple, predictable macro-driven regime to a more complex, multi-dimensional regime with richer cross-sectional dynamics? The full-sample spectral gap (λ₂ − λ₃ = 0.62) is large relative to the width of the MP interval (λ₊ − λ₋ = 0.216), which points to structure rather than noise in the pooled sample; whether the gap is stable within the two-factor years is not measured here.

Network and Spatial Correlation Methods

[P3] provides a unified random-matrix framework for machine learning, emphasizing concentration and universality in high-dimensional data. Our result is a direct application: the eigenvalue spectrum of the correlation matrix is a low-dimensional summary of high-dimensional return data, and the MP bounds provide a principled threshold for separating signal from noise. The variance explained by the top two factors (61.28%) is consistent with [P3]'s finding that a small number of principal components often suffice to capture the bulk of variance in real-world data.

[P6] and [P8] study spatial correlation networks in geographic and economic contexts, using network topology (degree, betweenness, clustering) to characterize structure. Our spectral approach is complementary: instead of analyzing the graph topology of the correlation matrix (which would require thresholding correlations to define edges), we analyze its eigendecomposition. The eigenvalues together with the eigenvectors retain all the information in the correlation matrix without thresholding individual correlations, although the eigenvalues alone do not, and the factor count itself relies on the MP threshold. The top two eigenvectors (factor loadings) define a low-dimensional embedding of the 11 assets, analogous to a network layout.

[P7]'s sandpile economics framework interprets macroeconomic instability as an emergent property of production networks, with Forman–Ricci curvature as the key state variable. Our spectral result does not compute curvature, but the time variation in factor count is a candidate proxy: a single-factor regime (high coherence) may correspond to low curvature (high substitutability), while a two-factor regime (richer structure) may correspond to higher curvature (lower substitutability). Testing this hypothesis would require computing the curvature of the correlation network and correlating it with the factor count.

Tension with Literature

[P4] reviews graph-theoretic methods for biological networks, emphasizing motifs, clustering, and alignment. Our spectral approach does not identify motifs or clusters; it decomposes variance into orthogonal modes. The tension is one of scope rather than of conflicting findings: spectral decomposition is global (it summarizes the entire correlation matrix), while motif analysis is local (it identifies recurring subgraphs). A future extension could combine both: use spectral decomposition to identify the top factors, then analyze the correlation subnetwork of assets with high loadings on each factor to identify motifs.

Limitations

Sample Size and Universe

The analysis uses 11 assets over 15 years (3772 observations). The ratio q = N/T ≈ 0.003 is very small, which tightens the MP bounds (λ₊ ≈ 1.11) and makes the test sensitive rather than conservative: even modest structured eigenvalues clear the threshold. The binding constraint is the universe: with 11 names from a handful of sectors, the matrix can express only a few sector contrasts. A larger universe (e.g., 50 or 100 assets) would increase q and widen the MP bounds, but by spanning more sectors it could also reveal additional significant factors. The choice of 11 large-cap names is a convenience sample, not a representative cross-section of the market.

The per-year analysis uses shorter windows (about 250–252 trading days per year), which raises q to roughly 0.044 and λ₊ to roughly 1.46 within each year. This makes the year-specific factor counts less directly comparable to the full-sample result. A more rigorous design would use fixed-length, overlapping windows (e.g., 252 days) advanced daily or monthly, rather than calendar-year partitions.
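The fixed-length alternative can be sketched as follows. Assumptions: a (T, N) return array, a 252-day window, and a 21-day step as a stand-in for monthly advancing; the function name is hypothetical, and the demonstration data are synthetic (a toy model whose sector contrast appears only in the second half), not the study's returns.

```python
import numpy as np


def rolling_factor_counts(returns: np.ndarray, window: int = 252, step: int = 21):
    """Significant-factor count on overlapping fixed-length windows.

    Returns a list of (start_index, count). Every window has the same length,
    so every window is tested against the same Marchenko-Pastur upper edge.
    """
    t, n = returns.shape
    lam_plus = (1 + np.sqrt(n / window)) ** 2
    out = []
    for start in range(0, t - window + 1, step):
        block = returns[start:start + window]
        eigvals = np.linalg.eigvalsh(np.corrcoef(block, rowvar=False))
        out.append((start, int(np.sum(eigvals > lam_plus))))
    return out


# Synthetic toy data: market factor throughout, sector contrast in the second half.
rng = np.random.default_rng(0)
t, n = 252, 11
sector = np.array([1, 1, 1, -1, -1, -1, 0, 0, 0, 0, 0], dtype=float)
returns = rng.standard_normal((2 * t, 1)) + rng.standard_normal((2 * t, n))
returns[t:] += rng.standard_normal((t, 1)) * sector
for start, count in rolling_factor_counts(returns):
    print(start, count)
```

Windows that straddle the switch mix the two regimes, which is exactly the intermediate behavior that calendar-year partitions hide.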

Stationarity and Structural Breaks

The method assumes the correlation matrix is stationary within each window (full sample or per year). The time variation in factor count suggests this assumption is violated: the correlation structure shifts between single-factor and two-factor regimes. A more sophisticated approach would test for structural breaks in the eigenvalue spectrum (e.g., using a change-point detection algorithm on the rolling factor count) and estimate separate MP bounds for each regime.
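As a minimal illustration of change-point detection, the sketch below finds the single split that minimizes within-segment squared error (the least-squares mean-shift criterion; applying it recursively to each segment gives binary segmentation). It is applied to the tabulated per-year counts only as a toy input: it reports where the best split falls, with no significance test, and a fuller analysis would apply it to a continuous series such as λ₂ from rolling windows. The function name is hypothetical.

```python
import numpy as np


def single_change_point(x) -> int:
    """Index k minimizing within-segment squared error of x[:k] vs x[k:]."""
    x = np.asarray(x, dtype=float)
    best_k, best_cost = 1, np.inf
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Per-year counts from the table in the Result section, 2010 through 2024.
years = list(range(2010, 2025))
counts = [1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2]
k = single_change_point(counts)
print(f"best single split: {years[k - 1]} | {years[k]}")
```

On the table's counts, the best single split falls between 2015 and 2016.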

The MP distribution assumes i.i.d. returns, which is violated by autocorrelation, heteroskedasticity, and fat tails. These violations bias the MP bounds, and they typically push in the same direction: serial dependence lowers the effective number of independent observations, while volatility clustering and fat tails spread the null eigenvalues, so the null spectrum tends to be wider than the i.i.d. MP law and the nominal λ₊ understates it (making the test liberal). A robust extension would use a bootstrapped or permutation-based null that preserves the empirical return distribution.
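One way to build such a null is sketched below. The technique (independent random circular shifts of each return series) is our choice of illustration, not the study's method, and the parameters (200 draws, the 95th percentile, the Student-t toy data) are assumptions. Shifting each column by its own random offset breaks cross-asset alignment while keeping each series' marginal distribution and, apart from the wrap-around point, its serial dependence.

```python
import numpy as np


def shifted_null_threshold(returns: np.ndarray, n_draws: int = 200,
                           quantile: float = 0.95, seed: int = 0) -> float:
    """Quantile of the largest correlation eigenvalue under circular-shift surrogates."""
    rng = np.random.default_rng(seed)
    t, n = returns.shape
    top = np.empty(n_draws)
    for d in range(n_draws):
        # Roll each asset's series by an independent random offset.
        shifted = np.column_stack(
            [np.roll(returns[:, j], rng.integers(t)) for j in range(n)]
        )
        top[d] = np.linalg.eigvalsh(np.corrcoef(shifted, rowvar=False))[-1]
    return float(np.quantile(top, quantile))


# Toy usage: heavy-tailed, independent columns with one year's worth of rows.
rng = np.random.default_rng(1)
toy = rng.standard_t(df=4, size=(252, 11))
threshold = shifted_null_threshold(toy)
lam_plus = (1 + np.sqrt(11 / 252)) ** 2
print(f"surrogate 95% threshold {threshold:.3f} vs MP edge {lam_plus:.3f}")
```

Eigenvalues of the real correlation matrix that exceed the surrogate threshold would then be counted in place of the MP-based count.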

Economic Interpretation of Factors

The PCA eigenvectors are identified only up to sign (and, if the retained two-factor subspace is rotated, only up to that rotation). We interpret Factor 1 as a market factor and Factor 2 as a sector-rotation factor based on the loading patterns, but this interpretation is not unique. A different rotation (e.g., varimax) could produce factors with clearer economic meaning. The loadings are also sensitive to the choice of assets: adding or removing a single ticker could shift the factor structure.

The time variation in factor count does not identify the economic drivers of the regime shift. We conjecture that the 2016 emergence of a second factor reflects the oil price collapse and tech/energy divergence, but this is not tested. A causal analysis would require external data (e.g., oil prices, sector returns, macro variables) and a formal attribution framework (e.g., regression of eigenvalues on macro state variables).

Out-of-Sample Validation

The result is entirely in-sample: we compute the eigenvalue spectrum on the full data and report the factor count. There is no out-of-sample test of whether the top two factors predict future returns or correlations. A rigorous validation would split the data into training and test sets, estimate the factors on the training set, and test whether they explain variance in the test set. The per-year design amounts to subsample replication rather than out-of-sample validation (each year's factor count is computed on that year's data only), and it does not test predictive power.
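The suggested train/test check can be sketched as the share of test-period variance captured by eigenvectors estimated on the training period. Assumptions: a chronological split, k = 2, a hypothetical function name, and synthetic two-factor data (a hypothetical factor model, not the study's returns).

```python
import numpy as np


def out_of_sample_share(train: np.ndarray, test: np.ndarray, k: int = 2) -> float:
    """Fraction of test-period variance captured by the top-k training eigenvectors."""
    corr_train = np.corrcoef(train, rowvar=False)
    corr_test = np.corrcoef(test, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr_train)
    v = eigvecs[:, np.argsort(eigvals)[::-1][:k]]     # (N, k) training loadings
    captured = np.trace(v.T @ corr_test @ v)           # test variance along those directions
    return float(captured / np.trace(corr_test))


# Synthetic two-factor returns; first 1000 rows train, last 500 rows test.
rng = np.random.default_rng(7)
t, n = 1500, 11
sector = np.array([1, 1, 1, -1, -1, -1, 0, 0, 0, 0, 0], dtype=float)
returns = (rng.standard_normal((t, 1))               # market factor
           + rng.standard_normal((t, 1)) * sector    # sector contrast
           + rng.standard_normal((t, n)))            # idiosyncratic noise
train, test = returns[:1000], returns[1000:]
share = out_of_sample_share(train, test)
print(f"out-of-sample top-2 share: {share:.1%}")
```

Comparing this figure with the in-sample share gives a rough measure of how much of the estimated structure carries forward; a large drop would signal that the factors are period-specific.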

Strengthening the Result

The result would be strengthened by:

Larger universe: Expand to 50–100 assets to test whether the two-factor structure is robust or an artifact of the small universe.
Formal regime detection: Use a change-point algorithm to identify the dates of regime shifts and test whether they coincide with known macro events (oil collapse, COVID, rate hikes).
Economic attribution: Regress the rolling factor count on macro state variables (VIX, oil prices, yield curve slope) to identify the drivers of regime shifts.
Out-of-sample prediction: Test whether the top two factors predict future correlations or returns in a hold-out sample.
Cross-market replication: Compute the eigenvalue spectrum for European or Asian equities and test whether the two-factor structure (market + sector rotation) is universal.
Robustness to MP assumptions: Bootstrap the MP bounds under the empirical return distribution (preserving autocorrelation and fat tails) and recompute the factor count.

Research evidence, not investment advice.

Spectral theory of correlation matrices — eigenvalue decomposition as regime detection