Empirica Technologies

Spectral Regime Detection in Cross-Sector Equity Correlation Matrices: Evidence from Marchenko-Pastur Decomposition

Question

Does the eigenvalue spectrum of a cross-sector equity correlation matrix exhibit statistically significant separation from the Marchenko-Pastur random-matrix null, and does the number of spiked eigenvalues (factors distinguishable from noise) vary systematically over time in a manner consistent with regime transitions, rather than remaining constant or changing only through single-eigenvalue threshold crossings?

Method

We computed the eigenvalue spectrum of the daily return correlation matrix for a 12-asset cross-sector equity universe (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, KO, MSFT, NVDA, PFE, XOM) over the window 2010-01-01 to 2024-12-31 (3,772 daily return observations). Data source: yfinance daily adjusted-close returns. The analysis applies principal component analysis (PCA) to the correlation matrix and compares the resulting eigenvalue spectrum to the Marchenko-Pastur (MP) distribution. This is the theoretical null for eigenvalues of a random correlation matrix with aspect ratio q = n_assets / n_obs = 12 / 3772 ≈ 0.0032.

Under the MP null, eigenvalues should lie within the interval [λ_min, λ_max], where λ_max = (1 + √q)² and λ_min = (1 - √q)². For our data, the MP upper bound is 1.116 and the lower bound is 0.8904. Eigenvalues above the upper bound are statistically distinguishable from random-matrix noise and represent genuine covariance structure (spiked eigenvalues, or significant factors). We counted the eigenvalues above 1.116 in the full-sample spectrum. We then repeated the computation separately for each calendar year (non-overlapping windows, estimated in-sample within each year) to track temporal variation in spectral shape.
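The full-sample test can be sketched in Python. The code computes simple daily returns from a price array, takes the correlation eigenvalues, and counts how many lie above the MP upper edge. It assumes prices arrive as a NumPy array of adjusted closes with one column per ticker; the yfinance download step is omitted. The function names are illustrative and are not taken from the original analysis.

```python
import numpy as np


def mp_bounds(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Marchenko-Pastur support [lambda_min, lambda_max] for a correlation
    matrix with aspect ratio q = n_assets / n_obs (unit-variance null)."""
    q = n_assets / n_obs
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


def spectrum_and_spike_count(prices: np.ndarray) -> tuple[np.ndarray, int, float]:
    """Return the correlation eigenvalues (descending), the number above the
    MP upper edge, and that edge.

    prices: array of shape (n_days, n_assets) holding adjusted closes.
    """
    returns = prices[1:] / prices[:-1] - 1.0       # simple daily returns
    corr = np.corrcoef(returns, rowvar=False)      # n_assets x n_assets
    eigvals = np.linalg.eigvalsh(corr)[::-1]       # eigvalsh is ascending; reverse it
    n_obs, n_assets = returns.shape
    _, lam_max = mp_bounds(n_assets, n_obs)
    return eigvals, int(np.sum(eigvals > lam_max)), lam_max


# The bounds quoted in the text: 12 assets, 3,772 return observations.
lo, hi = mp_bounds(12, 3772)
print(f"{lo:.4f} {hi:.3f}")  # 0.8904 1.116
```

The edge depends on n_obs, so it is not a fixed constant. A single calendar year of about 252 trading days gives q ≈ 0.048 and λ_max ≈ 1.48. Calling this function on one year of prices therefore applies that window's own bound rather than the full-sample 1.116.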

Factor loadings for the top two significant factors were extracted to interpret the economic structure captured by each spiked eigenvalue. Variance explained by the top eigenvalue and by all significant factors was computed as the fraction of total variance (sum of all eigenvalues = n_assets = 12) attributable to those components.
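The loading and variance-share computation can be sketched as follows, assuming the correlation matrix is already available. `np.linalg.eigh` returns eigenvalues in ascending order, so the sketch reorders them before it takes the top k. The helper name `top_factors` and the toy 3-asset matrix are hypothetical.

```python
import numpy as np


def top_factors(corr: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, float]:
    """Return the k largest eigenvalues, their loading vectors (as columns), and
    the fraction of total variance they explain. For a correlation matrix the
    total variance is the trace, which equals n_assets."""
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigvals)[::-1]          # indices from largest to smallest
    vals = eigvals[order][:k]
    vecs = eigvecs[:, order][:, :k]
    share = vals.sum() / np.trace(corr)
    return vals, vecs, float(share)


# Toy 3-asset correlation matrix (hypothetical numbers, not from the study).
corr = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.2],
                 [0.2, 0.2, 1.0]])
vals, vecs, share = top_factors(corr, k=1)
print(vals, share)
```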

Result

The full-sample eigenvalue spectrum exhibits clear bulk-edge separation. The top 10 eigenvalues are: 5.4969, 1.6282, 1.0457, 0.7347, 0.5802, 0.5578, 0.4874, 0.4311, 0.3892, 0.3411. Two eigenvalues exceed the MP upper bound of 1.116: λ₁ = 5.4969 and λ₂ = 1.6282. The third eigenvalue (1.0457) lies within the MP bulk, indicating that only two factors are statistically distinguishable from random-matrix noise at the full-sample level.

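The top factor (λ₁ = 5.4969) explains 45.81% of total variance (5.4969 / 12). The two significant factors together explain 59.38% of variance ((5.4969 + 1.6282) / 12). The remaining 10 eigenvalues account for 40.62% of variance, and the upper-edge test does not distinguish them from noise. They do not form a clean MP bulk, however. Only λ₃ = 1.0457 lies inside [0.8904, 1.116]; the other nine fall below the lower bound. This is expected when two spikes absorb most of the trace, because the remaining ten eigenvalues must sum to 12 − 7.1251 = 4.8749.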
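The reported figures can be checked with simple arithmetic. The check uses only the ten eigenvalues listed above and the fact that a 12 x 12 correlation matrix has trace 12. The sketch also recovers the spike-to-edge ratios (about 4.9 and 1.5) and the combined size of the two unreported eigenvalues.

```python
import numpy as np

# Top 10 of the 12 eigenvalues reported in the Result section.
top10 = np.array([5.4969, 1.6282, 1.0457, 0.7347, 0.5802,
                  0.5578, 0.4874, 0.4311, 0.3892, 0.3411])
n_assets, n_obs = 12, 3772
q = n_assets / n_obs
lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

spikes = top10[top10 > lam_max]            # eigenvalues above the MP upper edge
share_top = top10[0] / n_assets            # trace of a correlation matrix = n_assets
share_spikes = spikes.sum() / n_assets
ratio_to_edge = spikes / lam_max           # how far each spike sits above the edge
unreported_sum = n_assets - top10.sum()    # lambda_11 + lambda_12

print(len(spikes))                         # 2
print(share_top, share_spikes)
print(np.round(ratio_to_edge, 2), round(unreported_sum, 4))
```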

Factor loadings reveal interpretable economic structure:

Factor 1 (λ₁ = 5.4969): A broad market factor. Its largest-magnitude loadings are JPM (-0.331), MSFT (-0.321), and BAC (-0.318). These are only modestly above the uniform loading of 1/√12 ≈ 0.289, so financials and technology lead the factor without dominating it. The common negative sign indicates that the assets move together in the same direction. (The sign convention is arbitrary in PCA; what matters is the magnitude and relative ordering.)
Factor 2 (λ₂ = 1.6282): Exhibits sector contrast, with top loadings AMZN (-0.413), XOM (+0.359), NVDA (-0.357). The opposing signs on XOM (energy) versus AMZN and NVDA (technology/consumer discretionary) suggest this factor captures a growth-versus-value or technology-versus-commodity rotation dynamic.
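An eigenvector and its negation describe the same factor, so reports often fix a sign convention before presenting loadings. The sketch below uses one hypothetical convention: it flips each column so that its loadings sum to a non-negative number. Another common choice makes the largest-magnitude loading positive. For a contrast factor like Factor 2, whose loadings partly cancel, the sum-based rule can flip under small estimation changes. The largest-magnitude rule may be the steadier choice there.

```python
import numpy as np


def fix_sign(loadings: np.ndarray) -> np.ndarray:
    """Flip each loading vector (column) so that its loadings sum to a
    non-negative number. Eigenvectors are defined only up to sign, so this
    changes the presentation, not the factor itself."""
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return loadings * signs


# Market-factor-like column with all-negative loadings becomes all-positive.
market = np.array([[-0.331], [-0.321], [-0.318]])
print(fix_sign(market).ravel())
```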

Rolling-window analysis reveals substantial temporal variation in spectral shape. The number of significant factors (eigenvalues above the MP upper bound, recomputed per calendar year on the same data and method) evolved as follows:

2010–2015: 1 significant factor per year (stable single-factor regime)
2016–2018: 2 significant factors per year (transition to two-factor regime)
2019: 1 significant factor (brief reversion)
2020–2023: 2 significant factors per year (return to two-factor regime)
2024: 3 significant factors (emergence of three-factor regime)
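The per-year counts can be reproduced in outline. The sketch below splits returns into non-overlapping calendar-year windows and recomputes the MP edge from each window's own aspect ratio. The helper names and the `years` label array are assumptions for illustration. With a pandas DatetimeIndex, the labels could come from `index.year`.

```python
import numpy as np


def spike_count(returns: np.ndarray) -> tuple[int, float]:
    """Number of correlation eigenvalues above the MP upper edge for one window,
    with the edge recomputed from that window's own aspect ratio."""
    n_obs, n_assets = returns.shape
    lam_max = (1 + np.sqrt(n_assets / n_obs)) ** 2
    eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    return int(np.sum(eigvals > lam_max)), float(lam_max)


def per_year_counts(returns: np.ndarray, years: np.ndarray) -> dict[int, int]:
    """Apply spike_count separately to each calendar year (non-overlapping,
    in-sample windows). `years` holds the calendar year of each return row."""
    return {int(y): spike_count(returns[years == y])[0] for y in np.unique(years)}
```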

The transition from one to two significant factors in 2016 and the emergence of a third factor in 2024 are discrete changes in spectral shape, not merely continuous drift in a single eigenvalue. These transitions coincide with widely cited market events. 2016 brought post-financial-crisis normalization and the policy uncertainty surrounding the U.S. presidential election; the Trump administration took office in January 2017. 2020 saw the COVID-19 shock and the monetary expansion that followed. 2024 reflects the high-interest-rate, AI-driven market bifurcation.

Interpretation

The results provide strong evidence for Marchenko-Pastur bulk-edge separation in cross-sector equity correlation matrices and demonstrate that spectral shape (the count of spiked eigenvalues) varies systematically over time. The full-sample spectrum cleanly separates two significant factors from the MP bulk, with the third eigenvalue falling just below the threshold. This separation is not marginal: the top eigenvalue is 4.9 times the MP upper bound, and the second is 1.5 times the bound, indicating robust signal-to-noise distinction.

The rolling-window analysis reveals that spectral shape is not constant. The number of significant factors increased from one (2010–2015) to two (2016–2018, 2020–2023) to three (2024), with a brief reversion in 2019. These transitions appear as discrete regime changes in the covariance structure rather than smooth evolution. An integer count is discrete by construction, however, so the underlying eigenvalue paths would be needed to rule out slow drift across the MP edge. A single-eigenvalue threshold-crossing rule (e.g., "flag a regime change when λ₁ crosses some fixed value") would miss these transitions. The reason is that they manifest as the emergence or disappearance of additional spiked eigenvalues, not as growth in the top eigenvalue.
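The sketch below makes the contrast concrete. It compares a hypothetical λ₁-threshold rule with a count-based rule on three made-up spectra. In these spectra λ₁ stays near 5 while further eigenvalues cross the edge. Neither function comes from the original analysis, and the threshold values are arbitrary.

```python
import numpy as np


def lambda1_flags(spectra: list[np.ndarray], threshold: float) -> list[bool]:
    """Single-eigenvalue rule: flag a window when lambda_1 sits on the other
    side of a fixed threshold than it did in the previous window."""
    above = [bool(s.max() > threshold) for s in spectra]
    return [False] + [a != b for a, b in zip(above[1:], above[:-1])]


def count_flags(spectra: list[np.ndarray], edge: float) -> list[bool]:
    """Spectral-shape rule: flag a window when the number of eigenvalues above
    the MP edge differs from the previous window."""
    counts = [int(np.sum(s > edge)) for s in spectra]
    return [False] + [a != b for a, b in zip(counts[1:], counts[:-1])]


# Hypothetical spectra (not from the study): lambda_1 barely moves,
# but lambda_2 and then lambda_3 cross an edge of 1.48.
spectra = [np.array([5.1, 1.2, 0.9]),
           np.array([5.0, 1.9, 0.8]),
           np.array([5.2, 2.0, 1.6])]
print(lambda1_flags(spectra, threshold=6.0))  # [False, False, False]
print(count_flags(spectra, edge=1.48))        # [False, True, True]
```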

The factor loadings provide economic interpretation. Factor 1 is a market factor: all assets load negatively (or equivalently, all positively under sign reversal), capturing broad equity-market comovement. Factor 2 is a sector-rotation factor. Energy (XOM) loads positively while technology/consumer (AMZN, NVDA) load negatively, consistent with a growth-value or risk-on-risk-off dynamic. Loadings were not computed for the third factor that appears in 2024, so its economic content is not identified. One plausible candidate is the AI-driven divergence between mega-cap technology and the rest of the market, a widely discussed 2024 phenomenon.

The variance decomposition (45.81% in the top factor, 59.38% in both significant factors) indicates that most cross-sectional variance is captured by these two factors. The remaining 40.62% is attributable to idiosyncratic or noise-like fluctuations. This pattern rejects the random-matrix null. In a purely random correlation matrix, all eigenvalues would lie within the MP bulk and none would carry a disproportionate share of variance. The presence of spiked eigenvalues therefore indicates genuine covariance structure.

The temporal variation in factor count suggests that correlation structure is regime-dependent. Years associated with market stress or structural change (2016 policy uncertainty, 2020 pandemic, 2024 AI bifurcation) show higher-dimensional covariance structure, with more significant factors. The 2010–2015 post-crisis recovery shows lower-dimensional structure, with one dominant market factor. This mapping is post hoc and imperfect: the single-factor period includes the 2011 euro-area debt crisis. It is nonetheless consistent with the hypothesis that regime transitions are detectable via changes in spectral shape, not just in eigenvalue magnitude.

The result does NOT support the claim that correlation structure is static, or that a single factor suffices to describe cross-sector equity comovement at all times. The rolling-window evidence directly contradicts this: the number of significant factors ranges from one to three over the sample period. Nor does the result support the claim that regime detection requires monitoring only the top eigenvalue. The 2016 and 2024 transitions involved the emergence of new spiked eigenvalues, not just growth in λ₁.

Relation to the Literature

The result extends the random-matrix-theory literature on financial correlation matrices by demonstrating temporal variation in spectral shape, not just eigenvalue magnitude. [P9] applies spectral methods to community detection in networks, proposing that functional communities can be identified via eigenvalue analysis of adjacency matrices. Our result parallels this: the spiked eigenvalues identify "functional communities" of assets (sectors that comove), and the number of such communities varies over time. The rolling-window factor-count series is analogous to [P9]'s dynamic community structure, though our application is to correlation matrices rather than adjacency matrices.

[P2] develops a Grassmann-manifold framework for tracking communities in evolving networks, framing the problem as subspace tracking. Our rolling-window spectral analysis is a discrete-time analogue. Each year's significant eigenspace is a point on a Grassmann manifold Gr(k, 12), where k is that year's factor count. Years with the same k can be compared by geodesic distance on a common manifold. A change in factor count, however, moves the eigenspace to a Grassmannian of different dimension, and no geodesic connects the two. The 2016 and 2024 transitions are therefore a variant of the subspace-tracking problem in [P2], one in which the dimension of the signal subspace itself changes.
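Eigenspaces of equal dimension are usually compared with principal angles. The sketch below computes the Grassmann geodesic distance as the root-sum-square of those angles. It refuses inputs of different dimension, which is exactly the situation a factor-count change creates. The function name is hypothetical. If SciPy is available, `scipy.linalg.subspace_angles` provides the angles directly.

```python
import numpy as np


def grassmann_distance(U: np.ndarray, V: np.ndarray) -> float:
    """Geodesic (arc-length) distance between the column spans of U and V on
    the Grassmann manifold Gr(k, n). Both inputs need the same shape (n, k);
    subspaces of different dimension lie on different Grassmannians."""
    if U.shape != V.shape:
        raise ValueError("subspaces must have the same dimension k")
    Qu, _ = np.linalg.qr(U)                       # orthonormal basis for span(U)
    Qv, _ = np.linalg.qr(V)                       # orthonormal basis for span(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))      # principal angles
    return float(np.sqrt(np.sum(theta ** 2)))
```

Because the distance depends only on the spans, the arbitrary PCA sign of each eigenvector has no effect on it.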

[P8] analyzes spatial hierarchies in European regional innovation via spatial principal components analysis, finding that regions cluster into groups with distinct growth dynamics. Our factor loadings (JPM/MSFT/BAC on Factor 1, AMZN/XOM/NVDA on Factor 2) similarly reveal layered structure in the equity universe: a broad market component and a sector-rotation component. The temporal variation in factor count parallels [P8]'s finding that spatial hierarchies are not static but evolve with economic development.

The result is in tension with models that assume constant factor structure (e.g., static multi-factor models with a fixed number of factors). The rolling-window evidence shows that the number of significant factors is not constant, implying that factor models must be time-varying or regime-dependent. This aligns with [P5]'s finding of multiple abrupt phase transitions in urban congestion: both results identify discrete regime changes (in spectral shape or congestion location) rather than smooth evolution.

[P1] estimates spatial weights matrices consistent with observed spatial autocovariance, a problem analogous to inferring factor structure from observed correlation. Our MP-based approach provides a statistical null for distinguishing signal (spiked eigenvalues) from noise (MP bulk). [P1] lacks such a null and must rely on model selection criteria. The MP framework offers a principled threshold for factor significance with no tuning parameters. It depends only on the aspect ratio q, although it inherits the asymptotic assumptions noted under Limitations.

The result does not directly engage with [P3], [P4], [P6], [P7], or [P10], which address urban planning, mobility, and regional development. These papers are contextually distant from financial correlation matrices, though [P4]'s vector-field analysis of mobility flows shares a methodological affinity with spectral decomposition (both seek low-dimensional structure in high-dimensional data).

Limitations

Small universe: The 12-asset universe is deliberately cross-sector but small. A larger universe (e.g., 100+ assets) would provide finer-grained sector resolution and might reveal additional factors. The q-ratio (≈ 0.0032) is very small, which makes the MP bounds tight. A larger n_assets would increase q and widen the bulk, which could change the factor count. For example, 100 assets over the same 3,772 days would give q ≈ 0.027 and λ_max ≈ 1.35.
In-sample rolling windows: The per-year factor counts are computed in-sample within each year, not out-of-sample. This means the factor count for year t uses only data from year t, which is appropriate for regime detection (we want to know the contemporaneous spectral shape) but does not test predictive power. An out-of-sample test would ask whether the factor structure estimated in year t predicts returns or volatility in year t+1.
Daily frequency: Daily returns emphasize short-term comovement and may be sensitive to microstructure noise. Weekly or monthly returns would reduce noise but also reduce the sample size within each rolling window. That would widen the MP bounds and could reduce the number of significant factors. For example, about 52 weekly observations per year give q ≈ 0.23 and λ_max ≈ 2.19.
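Equal weighting: Because the analysis uses a correlation matrix, every asset is standardized to unit variance and contributes equally to the total variance of 12. Two alternatives might reveal different factor structure, particularly if large-cap assets dominate comovement. One is a market-cap-weighted decomposition. The other is a decomposition of the covariance matrix, which retains each asset's volatility.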
No macroeconomic covariates: The analysis identifies when spectral shape changes but does not formally test which macroeconomic or market variables drive those changes. Regressing the rolling factor count on VIX, interest rates, or policy uncertainty indices would strengthen the regime-transition interpretation.
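MP threshold sensitivity: The MP upper bound (1.116) is an asymptotic result. It holds for i.i.d. entries with finite variance, in the limit where both n_assets and n_obs grow at a fixed ratio q. With only 12 assets, finite-dimension effects could shift the effective threshold, as could the fat tails and volatility clustering of daily returns. A bootstrap-based threshold would provide a finite-sample correction. It would resample each asset's returns independently, which destroys cross-asset correlation, and then recompute the null distribution of the top eigenvalue.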
Factor interpretation: The factor loadings are reported for the full sample only. Rolling-window loadings (how the economic interpretation of Factor 1 and Factor 2 changes over time) would provide richer insight into regime dynamics but are not included in the computed result.
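The finite-sample correction suggested under MP threshold sensitivity can be sketched as a permutation null. The procedure shuffles each asset's returns independently, recomputes the top correlation eigenvalue, and takes a high quantile of the results. This is a permutation variant of the bootstrap; drawing with replacement instead would give the bootstrap proper. The function name, the 200 draws, and the 95% quantile are illustrative assumptions.

```python
import numpy as np


def permutation_edge(returns: np.ndarray, n_perm: int = 200,
                     quantile: float = 0.95, seed: int = 0) -> float:
    """Finite-sample threshold for the top correlation eigenvalue under a
    no-cross-correlation null. Shuffling each asset's series independently
    destroys cross-asset dependence but keeps each asset's marginal
    distribution, fat tails included. It also destroys serial dependence such
    as volatility clustering, so this null is not exact for GARCH-like data."""
    rng = np.random.default_rng(seed)
    n_assets = returns.shape[1]
    top = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.column_stack([rng.permutation(returns[:, j])
                                    for j in range(n_assets)])
        top[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[-1]
    return float(np.quantile(top, quantile))
```

Counting observed eigenvalues above this threshold replaces the count above the asymptotic λ_max.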

Strengthening the result would require four steps: (i) expanding the universe to 50–100 assets for finer sector granularity, (ii) computing rolling-window loadings to track factor interpretation over time, (iii) testing the out-of-sample predictive power of the factor structure, and (iv) regressing the factor count on macroeconomic variables to identify regime drivers. The current result establishes that spectral shape varies over time and that this variation coincides with widely cited market events. It does not yet provide a predictive or causal model of those transitions.

Research evidence, not investment advice. This analysis quantifies correlation structure and its temporal variation; it does not recommend any security or trading strategy.

Spectral theory of correlation matrices — eigenvalue decomposition as regime detection