Empirica Technologies

Question

Does the eigenvalue spectrum of large-cap equity return correlations exhibit a statistically significant spectral gap separating signal eigenvalues (representing real common factors) from the Marchenko-Pastur noise bulk, and does the number of signal factors vary systematically over time in a manner consistent with regime transitions?

Method

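We computed the eigenvalue decomposition of the return correlation matrix for large-cap U.S. equities using daily adjusted-close returns from yfinance over the window 2010-01-01 through 2024-12-31, yielding 3,772 observations. Twelve tickers are listed (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, META, MSFT, NVDA, PFE, XOM), but the matrix analyzed has 11 assets: every figure below (q, the MP bounds, the variance shares) is computed with n_assets = 11. The excluded ticker is not recorded here; META is the likely candidate, since it has no price history before its May 2012 listing and could not contribute to a sample that starts in 2010. The ratio q = n_assets / n_obs = 11 / 3,772 ≈ 0.003 is small, so the noise bulk predicted by random matrix theory is narrow. The theory is asymptotic in both n_assets and n_obs, however, so with only 11 assets its bounds are approximate.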
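The decomposition step can be sketched as follows. This is a minimal illustration, not the code behind the reported figures: the function name `correlation_spectrum`, the array layout (one column per ticker), and the use of simple rather than log returns are all assumptions, and the yfinance download is left out so that the example runs on synthetic prices.

```python
import numpy as np

def correlation_spectrum(prices):
    """Eigenvalues (descending) and eigenvectors of the return correlation matrix.

    prices: array of shape (n_days, n_assets) of adjusted closes (assumed layout).
    """
    prices = np.asarray(prices, dtype=float)
    # Simple daily returns; the first row is lost to differencing.
    returns = prices[1:] / prices[:-1] - 1.0
    # Pearson correlation across assets (columns are the variables).
    corr = np.corrcoef(returns, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic demonstration: 4 assets sharing one common driver.
rng = np.random.default_rng(0)
common = rng.normal(size=(1000, 1))
rets = 0.01 * (0.7 * common + rng.normal(size=(1000, 4)))
demo_prices = 100.0 * np.cumprod(1.0 + rets, axis=0)
eigvals, eigvecs = correlation_spectrum(demo_prices)
```

The eigenvalues of a correlation matrix always sum to the number of assets, which gives a quick sanity check on any output.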

We applied principal component analysis to the correlation matrix and compared the resulting eigenvalue spectrum to the Marchenko-Pastur (MP) null distribution, which describes the limiting eigenvalue density of a correlation matrix constructed from purely random, uncorrelated data. For unit-variance data the MP support is [(1 − √q)², (1 + √q)²], so with q = 11 / 3,772 ≈ 0.003 noise eigenvalues should lie within the interval [0.8949, 1.1109]. Eigenvalues exceeding the upper bound 1.1109 are larger than random-matrix noise alone would produce, and we treat them as common factors driving covariation in the data.
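The interval quoted above follows from the standard MP edge formula for unit-variance data, (1 ± √q)². A short sketch (the helper name `mp_bounds` is ours, introduced only for illustration) reproduces both bounds from the asset and observation counts:

```python
import math

def mp_bounds(n_assets, n_obs):
    """Marchenko-Pastur support edges for a correlation matrix of pure noise."""
    q = n_assets / n_obs
    lower = (1.0 - math.sqrt(q)) ** 2
    upper = (1.0 + math.sqrt(q)) ** 2
    return lower, upper

lower, upper = mp_bounds(11, 3772)
print(round(lower, 4), round(upper, 4))  # 0.8949 1.1109
```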

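To assess time variation, we recomputed the eigenvalue spectrum and factor count separately for each calendar year 2010–2024 using the same method (in-sample within each year), producing an annual series of significant factor counts. Each annual window holds roughly 252 observations, so q ≈ 0.044 and the annual MP upper bound is about 1.46, well above the full-period bound of 1.1109. A factor must therefore be stronger to register in a single year than in the full sample.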
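A per-year factor count of this kind might be computed along the following lines. The sketch assumes returns are held in a NumPy array with a parallel array of year labels; both function names are hypothetical, and the demonstration data are synthetic (11 assets driven by one strong common factor), not the study's returns.

```python
import numpy as np

def count_signal_factors(returns):
    """Number of correlation eigenvalues above the Marchenko-Pastur upper edge."""
    n_obs, n_assets = returns.shape
    upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    return int(np.sum(eigvals > upper))

def yearly_factor_counts(returns, years):
    """Map each calendar year to its in-sample factor count.

    returns: (n_days, n_assets) array; years: length-n_days array of year labels.
    """
    years = np.asarray(years)
    return {int(y): count_signal_factors(returns[years == y])
            for y in np.unique(years)}

# Synthetic check: two 252-day "years", 11 assets, one strong common factor.
rng = np.random.default_rng(1)
market = rng.normal(size=(504, 1))
demo_returns = market + rng.normal(size=(504, 11))
demo_years = np.repeat([2010, 2011], 252)
counts = yearly_factor_counts(demo_returns, demo_years)
```

Because the threshold is recomputed from each window's own observation count, a year with fewer trading days faces a slightly higher bar.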

Result

Full-period spectrum

The ten largest of the eleven eigenvalues of the correlation matrix are: 5.1473, 1.5929, 0.9728, 0.7274, 0.5580, 0.5024, 0.4561, 0.3919, 0.3428, 0.1652. The Marchenko-Pastur upper bound is 1.1109 and the lower bound is 0.8949.

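Two eigenvalues exceed the MP upper bound: λ₁ = 5.1473 and λ₂ = 1.5929. All remaining eigenvalues fall below it: the third (0.9728) lies inside the MP interval, and the rest fall below the lower bound of 0.8949. Eigenvalues below the lower edge are not themselves predicted by the MP null; they are the expected counterpart of large leading eigenvalues, because the eigenvalues of an 11 × 11 correlation matrix must sum to 11. This constitutes a clear spectral gap: the top two eigenvalues qualify as significant factors under the MP criterion, and none of the remaining nine does.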

The top factor (λ₁ = 5.1473) explains 46.79% of total variance. The two significant factors together explain 61.28% of variance. The third eigenvalue (0.9728) lies within the MP interval and is not distinguishable from noise.
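The variance shares are each eigenvalue divided by the trace of the correlation matrix, which equals the number of assets (11). The sketch below redoes that arithmetic from the rounded eigenvalues listed above. Because the inputs are rounded to four decimals, the two-factor share comes out a hair under the reported 61.28%, which was presumably computed from unrounded values. The same trace constraint also pins down the one eigenvalue not listed.

```python
# Eigenvalues as reported above (rounded to four decimals).
reported = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
            0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
n_assets = 11  # trace of an 11 x 11 correlation matrix

share_first = 100 * reported[0] / n_assets
share_top_two = 100 * (reported[0] + reported[1]) / n_assets
# The ten listed values do not sum to 11; the remainder is the unlisted
# eleventh eigenvalue implied by the trace constraint.
implied_last = n_assets - sum(reported)

print(f"{share_first:.2f}% {share_top_two:.2f}% {implied_last:.4f}")
```

The implied eleventh eigenvalue is about 0.14, below the smallest listed value, as the descending ordering requires.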

Factor structure

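Factor 1 (the dominant eigenvalue) loads most heavily on JPM (−0.343), MSFT (−0.335), and BAC (−0.331). The sign of an eigenvector is arbitrary, so the common negative sign carries no meaning. What matters is that the loadings share a sign and sit close in magnitude to the uniform value 1/√11 ≈ 0.30. This factor captures broad market comovement across financials and technology.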

Factor 2 (the second significant eigenvalue) exhibits a bipolar structure: AMZN loads negatively (−0.399), while XOM (+0.395) and CVX (+0.368) load positively. This factor separates energy from technology/consumer discretionary, consistent with a sectoral or commodity-price-driven dimension orthogonal to the market factor.
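Since eigenvectors are determined only up to sign, the reported loadings could equally be quoted with every sign flipped within a factor. One common convention, sketched below on a toy 3-asset matrix, flips each eigenvector so that its loadings sum to a non-negative number. This is an assumption for illustration, not necessarily the convention behind the loadings above, and `fix_signs` is a hypothetical helper.

```python
import numpy as np

def fix_signs(eigvecs):
    """Flip each column so that its loadings sum to a non-negative value."""
    signs = np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    return eigvecs * signs

# Toy 3-asset correlation matrix (illustrative values, not from the study).
corr = np.array([[1.0, 0.6, 0.5],
                 [0.6, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])
eigvals, eigvecs = np.linalg.eigh(corr)
loadings = fix_signs(eigvecs)
```

Flipping a column leaves it an eigenvector with the same eigenvalue, so variance shares and factor counts are unaffected.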

Time variation in factor count

The per-year significant factor count (eigenvalues above the annual MP upper bound, recomputed in-sample each year) is:

2010–2015: 1 factor per year (a single dominant market mode)
2016: 2 factors
2017: 2 factors
2018: 1 factor
2019: 1 factor
2020: 2 factors
2021: 2 factors
2022: 2 factors
2023: 2 factors
2024: 2 factors

The factor count exhibits a structural shift. From 2010 through 2015, the correlation structure is dominated by a single significant factor. Beginning in 2016, a second factor emerges intermittently, then stabilizes: from 2020 onward, two factors are consistently significant. The years 2018–2019 revert briefly to a single-factor regime before the two-factor structure re-establishes.

Interpretation

What the spectrum reveals

The eigenvalue spectrum provides strong evidence of low-dimensional structure in large-cap equity returns. The top two eigenvalues are 4.6 and 1.4 times the MP upper bound, respectively, while the third eigenvalue lies 12% below it. The gap is widest for the market factor: λ₁ is roughly 5.3 times the largest bulk eigenvalue (λ₃ = 0.9728). The second factor is separated less dramatically, at about 1.6 times λ₃, but it still clears the bound by more than 40%.
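The ratios quoted in this paragraph can be checked directly from the reported eigenvalues and bound. The variable names below are ours. Note that a separation of roughly 5:1 holds for the leading eigenvalue relative to the third; the second eigenvalue sits about 1.6 times above the third.

```python
mp_upper = 1.1109
lam1, lam2, lam3 = 5.1473, 1.5929, 0.9728

ratio_1 = lam1 / mp_upper          # leading eigenvalue relative to the bound
ratio_2 = lam2 / mp_upper          # second eigenvalue relative to the bound
shortfall_3 = 1 - lam3 / mp_upper  # fraction by which the third falls short
gap_1_vs_3 = lam1 / lam3           # leading vs. largest bulk eigenvalue
gap_2_vs_3 = lam2 / lam3           # second vs. largest bulk eigenvalue

print(round(ratio_1, 1), round(ratio_2, 1), round(shortfall_3, 2))  # 4.6 1.4 0.12
```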

The two-factor structure is economically interpretable. Factor 1 is a broad market factor with near-uniform loadings, consistent with systematic risk. Factor 2 is a sectoral tilt separating energy (positive loadings on XOM, CVX) from technology and consumer discretionary (negative loading on AMZN). Because PCA factors are orthogonal by construction, energy-versus-tech dispersion is uncorrelated with overall market movement in this sample. Orthogonality alone does not establish statistical independence.

Time variation as regime detection

The rolling factor count series exhibits a clear regime transition. The single-factor regime (2010–2015) corresponds to a period of relatively homogeneous equity behavior: all large-cap stocks moved together, with sectoral or idiosyncratic variation indistinguishable from noise. The emergence of a second factor in 2016, and its stabilization from 2020 onward, indicates a structural change in the correlation matrix: sectoral dispersion (specifically, energy versus technology) became a statistically significant, persistent dimension of variation.

This transition aligns with known macroeconomic and market developments. The 2010–2015 period was characterized by low interest rates, low volatility, and relatively uniform equity performance (the "low-vol regime"). The 2016 emergence of a second factor coincides with the beginning of the commodity price recovery and increased sectoral dispersion. The 2020–2024 stabilization of the two-factor structure coincides with the COVID-19 shock, the energy price spike of 2021–2022, and the subsequent inflation/rate-hiking cycle—all of which amplified sectoral divergence, particularly between energy and technology.

The brief reversion to a single factor in 2018–2019 is consistent with the late-cycle compression of sectoral dispersion during that period, when correlations rose and the market moved more uniformly.

What the result does NOT support

This analysis does NOT establish that the spectral gap predicts future regime transitions or hedging effectiveness. The rolling factor count is computed in-sample within each year; it is a descriptive statistic, not a forecast. We have not tested whether an increase in the factor count at time t predicts changes in cross-asset hedging effectiveness at t+1, nor have we computed out-of-sample validation of the factor structure.

The result also does NOT imply that the two-factor model is the "true" dimensionality of equity returns in any fundamental sense. The Marchenko-Pastur threshold is a statistical criterion for distinguishing signal from noise given the sample size and asset count; it is not a test of economic theory. A larger universe or a different sample period might yield a different factor count.

Finally, the interpretation of Factor 2 as "energy versus technology" is based on the observed loadings in this specific sample. The factor is defined mathematically (the second eigenvector), not economically; its sectoral interpretation is ex post and specific to this universe and window.

Relation to the Literature

No closely related papers were retrieved for this computation. The result stands on the empirical evidence alone. The use of random matrix theory (specifically, the Marchenko-Pastur distribution) to distinguish signal from noise in correlation matrices is a well-established technique in quantitative finance and statistical physics, but the specific application to regime detection in large-cap U.S. equities over this window is, to our knowledge, novel in this form.

The two-factor structure we observe is consistent with classical factor models (e.g., Fama-French), which posit that equity returns are driven by a small number of common factors. However, our factors are derived purely from the eigenvalue decomposition of the correlation matrix, without reference to economic variables or portfolio construction. The alignment of Factor 2 with a sectoral (energy/technology) dimension suggests that the spectral method recovers economically meaningful structure, but this is an empirical observation, not a theoretical prediction.

The time variation in factor count is consistent with the broader literature on regime-switching models and time-varying correlation, which documents that equity correlations are not constant but shift with macroeconomic and market conditions. Our contribution is to quantify this variation using a non-parametric, data-driven criterion (the MP threshold) rather than a pre-specified regime model.

Limitations

Sample size and universe

The analysis is restricted to 11 large-cap U.S. equities. This is a small universe. The q-ratio (0.003) is extremely low, which keeps the MP noise band narrow, but the Marchenko-Pastur law is an asymptotic result in both the number of assets and the number of observations. With only 11 assets, finite-size fluctuations at the edge of the bulk are not negligible, so eigenvalues close to the bound should be read cautiously, and the small universe limits the generality of the result. A larger universe (e.g., the S&P 500) would provide a more comprehensive picture of equity correlation structure and might reveal additional factors. The choice of large-cap names introduces survivorship bias and sector concentration (heavy weight on technology and financials).

In-sample only

All results are in-sample. The eigenvalue decomposition is computed on the same data used to interpret the factors. We have not tested whether the factor structure identified in one period predicts returns or correlations in a subsequent period. Out-of-sample validation (e.g., computing the factor loadings on data from 2010–2019 and testing their explanatory power on 2020–2024 returns) would strengthen the claim that the factors represent persistent, generalizable structure rather than sample-specific noise.
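One way the out-of-sample check described above could be set up is sketched here. It is an assumption about how such a test might look, not something that was run for this report: eigenvectors are estimated on a training period only, and the share of test-period correlation variance they capture is then measured. The function name and the synthetic one-factor data are hypothetical.

```python
import numpy as np

def oos_explained_share(train, test, k):
    """Share of test-period correlation variance captured by the top-k
    eigenvectors estimated on the training period only."""
    vals, vecs = np.linalg.eigh(np.corrcoef(train, rowvar=False))
    top = vecs[:, np.argsort(vals)[::-1][:k]]   # (n_assets, k) training loadings
    corr_test = np.corrcoef(test, rowvar=False)
    # v' C v is the variance of standardized test returns along direction v.
    captured = np.trace(top.T @ corr_test @ top)
    return captured / corr_test.shape[0]

# Synthetic check: the same one-factor structure holds in both periods.
rng = np.random.default_rng(2)

def simulate(n_days, n_assets=11):
    return rng.normal(size=(n_days, 1)) + rng.normal(size=(n_days, n_assets))

share = oos_explained_share(simulate(2500), simulate(1250), k=1)
```

If the factor structure were sample-specific noise, the out-of-sample share would fall toward k / n_assets, the value expected for an arbitrary direction.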

Rolling window choice

The per-year factor count is computed using calendar-year windows, which are arbitrary and may not align with economic regimes. A more principled approach would use an overlapping rolling window of fixed length (e.g., 252 trading days) or a regime-detection algorithm to endogenously identify breakpoints. The calendar-year choice is convenient but may obscure intra-year transitions or artificially align results with year-end effects.
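The fixed-length alternative could look like the sketch below, which slides a 252-day window forward in 21-day steps and counts eigenvalues above that window's MP bound. The window and step sizes, the function name, and the synthetic data (a sector factor with hypothetical loadings that switches on halfway through) are illustrative assumptions only.

```python
import numpy as np

def rolling_factor_counts(returns, window=252, step=21):
    """Factor counts over overlapping fixed-length windows.

    Returns a list of (end_index, count) pairs; `window` and `step` are
    illustrative defaults, not values used in the analysis.
    """
    n_obs, n_assets = returns.shape
    upper = (1.0 + np.sqrt(n_assets / window)) ** 2  # MP upper edge per window
    out = []
    for end in range(window, n_obs + 1, step):
        block = returns[end - window:end]
        eigvals = np.linalg.eigvalsh(np.corrcoef(block, rowvar=False))
        out.append((end, int(np.sum(eigvals > upper))))
    return out

# Synthetic check: a second (sector) factor switches on halfway through.
rng = np.random.default_rng(3)
n, half = 1008, 504
market = rng.normal(size=(n, 1))
sector = rng.normal(size=(n, 1)) * (np.arange(n)[:, None] >= half)
tilt = np.array([1.5] * 4 + [0.0] * 3 + [-1.5] * 4)  # hypothetical loadings
demo = market + sector * tilt + rng.normal(size=(n, 11))
series = rolling_factor_counts(demo)
```

Overlapping windows date a transition more finely than calendar years, at the cost of serially dependent counts, since adjacent windows share most of their observations.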

No cross-asset hedging test

The computation question asks whether the spectral gap "correlates with regime transitions in cross-asset hedging effectiveness," but we have not tested this. We have documented time variation in the equity correlation structure, but we have not computed hedging ratios, hedge effectiveness (e.g., R² of equity returns on bond or commodity futures), or their correlation with the factor count. Establishing that link would require additional computation on cross-asset data.
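For reference, the hedge-effectiveness measure mentioned above (the R² from regressing equity returns on hedge-instrument returns) could be computed as in the following sketch. Nothing here was run on real cross-asset data: the function name is hypothetical and the demonstration uses a synthetic equity series constructed as 0.8 times the hedge instrument plus noise.

```python
import numpy as np

def hedge_effectiveness(target, hedges):
    """Minimum-variance hedge ratios and R^2 from an OLS regression of
    target returns on hedge-instrument returns (intercept included)."""
    target = np.asarray(target, dtype=float)
    hedges = np.asarray(hedges, dtype=float).reshape(len(target), -1)
    design = np.column_stack([np.ones(len(target)), hedges])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    r_squared = 1.0 - residual.var() / target.var()
    return coef[1:], r_squared

# Synthetic check: the target is 0.8 x the hedge instrument plus noise.
rng = np.random.default_rng(4)
hedge = rng.normal(size=2000)
equity = 0.8 * hedge + 0.6 * rng.normal(size=2000)
ratios, r2 = hedge_effectiveness(equity, hedge)
```

Computing this R² per window and comparing it with the factor-count series for the same windows would be the natural first test of the link.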

Interpretation of Factor 2

The interpretation of Factor 2 as "energy versus technology" is based on the top loadings (XOM, CVX positive; AMZN negative) but is not exhaustive. Other assets in the universe (e.g., JNJ, PFE) also load on Factor 2, and their loadings may reflect dimensions orthogonal to the energy/tech narrative. A more complete interpretation would require examining all loadings and their economic correlates (e.g., correlation with oil prices, interest rates, or sector indices).

Assumption of stationarity within windows

The per-year factor count assumes that the correlation structure is stationary within each calendar year. This is unlikely to hold during periods of sharp regime change (e.g., March 2020). A more granular analysis (e.g., quarterly or monthly windows) would capture intra-year variation but would reduce the sample size per window, increasing estimation error and potentially violating the asymptotic conditions for the Marchenko-Pastur distribution.
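The cost of shorter windows can be made concrete with the MP edge formula. Holding the universe at 11 assets, the sketch below evaluates the upper bound for approximate annual, quarterly, and monthly trading-day counts. The day counts are assumed round figures, not taken from the data.

```python
import math

def mp_upper(n_assets, n_obs):
    """Upper Marchenko-Pastur edge for a pure-noise correlation matrix."""
    return (1.0 + math.sqrt(n_assets / n_obs)) ** 2

# Approximate trading-day counts per window (assumed, not taken from the data).
windows = {"annual": 252, "quarterly": 63, "monthly": 21}
thresholds = {name: round(mp_upper(11, n), 2) for name, n in windows.items()}
print(thresholds)  # {'annual': 1.46, 'quarterly': 2.01, 'monthly': 2.97}
```

At a monthly horizon an eigenvalue would need to approach 3 to count as signal. A second factor of the size seen in the full sample (1.5929) would go undetected in monthly and quarterly windows, and would only just clear the annual bound.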

Research evidence, not investment advice.

Spectral theory of correlation matrices — eigenvalue decomposition as regime detection