Empirica Technologies

Question

Does the eigenvalue spectrum of large-cap US equity returns reveal a stable spectral density structure—specifically, a consistent number and magnitude of statistically significant factors—that persists across market regimes, or does spectral coherence collapse under volatility and liquidity boundary conditions?

Method

We computed the eigenvalue spectrum of the return correlation matrix for 10 large-cap US equities (AAPL, AMZN, GOOGL, JNJ, JPM, KO, MSFT, NVDA, PFE, XOM) using daily adjusted-close returns from yfinance over the window 2010-01-01 to 2024-12-31 (3,772 observations). We applied principal component analysis (PCA) to the correlation matrix and compared the resulting eigenvalues against the Marchenko-Pastur (MP) null distribution, which characterizes the eigenvalue spectrum expected when the returns are uncorrelated noise. With q = n_assets / n_obs = 10 / 3,772 ≈ 0.0027 (0.003 after rounding), the MP upper bound is (1 + √q)² = 1.1056 and the MP lower bound is (1 − √q)² = 0.8997. Eigenvalues exceeding the upper bound are treated as statistically distinguishable from random-matrix noise and as representing genuine covariance structure.
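The two bounds quoted above can be reproduced from the sample dimensions alone. The following is a minimal sketch, assuming the standard MP edges for a correlation matrix (unit variance) and q defined as assets over observations; the function name is illustrative and not taken from the original analysis.

```python
import math

def marchenko_pastur_bounds(n_assets: int, n_obs: int, sigma2: float = 1.0):
    """Return (lower, upper) edges of the Marchenko-Pastur bulk.

    q is taken as n_assets / n_obs; sigma2 = 1 is the variance that
    applies to a correlation matrix of standardized returns.
    """
    q = n_assets / n_obs
    root_q = math.sqrt(q)
    lower = sigma2 * (1.0 - root_q) ** 2
    upper = sigma2 * (1.0 + root_q) ** 2
    return lower, upper

lower, upper = marchenko_pastur_bounds(n_assets=10, n_obs=3772)
print(f"q = {10 / 3772:.4f}, lower = {lower:.4f}, upper = {upper:.4f}")
```

With 10 assets and 3,772 observations this prints q = 0.0027, lower = 0.8997, upper = 1.1056, in agreement with the bounds reported above.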

To assess temporal stability, we recomputed the significant factor count separately for each calendar year from 2010 through 2024, applying the same method in-sample to that year's returns. This year-by-year approach (non-overlapping windows rather than a rolling window) shows how the spectral structure evolves across market regimes, including the 2020 COVID-19 volatility shock, the 2022 inflation-driven drawdown, and the 2023-2024 AI-driven rally.
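A per-year count of this kind might be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: it assumes the MP upper bound is recomputed from each year's own observation count (the text does not say whether it was), and it runs on synthetic one-factor data rather than the report's returns. With roughly 252 trading days and 10 assets, that per-year bound is about 1.44, noticeably higher than the full-period 1.1056, so counts from windows of different length are not directly comparable.

```python
import numpy as np

def significant_factor_count(returns: np.ndarray) -> int:
    """Count correlation-matrix eigenvalues above the MP upper edge.

    returns: array of shape (n_obs, n_assets) for a single window.
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    return int(np.sum(eigvals > upper))

def factor_count_by_year(returns: np.ndarray, years: np.ndarray) -> dict:
    """Apply the count separately to the rows belonging to each calendar year."""
    return {int(y): significant_factor_count(returns[years == y])
            for y in np.unique(years)}

# Synthetic check (not the report's data): one common factor plus noise.
rng = np.random.default_rng(0)
years = np.repeat([2019, 2020], 252)          # 252 trading days is an assumption
common = rng.standard_normal((years.size, 1))
synthetic = common + rng.standard_normal((years.size, 10))
counts = factor_count_by_year(synthetic, years)
print(counts)
```

On the synthetic data, which contains exactly one common driver, the function should report a single significant factor in each year.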

Result

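The full-period eigenvalue spectrum exhibits a clear hierarchical structure. The 10 eigenvalues, in descending order, are: 4.5605, 1.4288, 0.7964, 0.5887, 0.5541, 0.4887, 0.4302, 0.4251, 0.3887, 0.3388. Only the first two eigenvalues (4.5605 and 1.4288) exceed the Marchenko-Pastur upper bound of 1.1056, yielding n_significant_factors = 2. The remaining eight eigenvalues all lie below the MP lower bound of 0.8997, so none is counted as a significant factor. We treat them as noise in what follows, with the caveat that their position below the bulk partly reflects the variance absorbed by the two large eigenvalues (the eigenvalues of a correlation matrix must sum to 10) rather than a clean match to the MP null.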

The dominant factor (eigenvalue 4.5605) explains 45.61% of total variance. The two significant factors together explain 59.89% of variance. This concentration implies that nearly 60% of the covariance structure in this large-cap universe is captured by two orthogonal directions, while the remaining 40% is consistent with idiosyncratic or noise-driven variation.
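The variance shares follow directly from the reported eigenvalues, since the eigenvalues of a 10-asset correlation matrix sum to 10. The short check below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Eigenvalues and MP bounds as reported in the text.
eigenvalues = [4.5605, 1.4288, 0.7964, 0.5887, 0.5541,
               0.4887, 0.4302, 0.4251, 0.3887, 0.3388]
mp_lower, mp_upper = 0.8997, 1.1056

total = sum(eigenvalues)  # trace of a 10x10 correlation matrix, about 10
shares = [100.0 * ev / total for ev in eigenvalues]

# Classify each eigenvalue relative to the MP band.
above = [ev for ev in eigenvalues if ev > mp_upper]
inside = [ev for ev in eigenvalues if mp_lower <= ev <= mp_upper]
below = [ev for ev in eigenvalues if ev < mp_lower]

print(f"first factor share: {shares[0]:.1f}%")
print(f"first two factors:  {shares[0] + shares[1]:.1f}%")
print(f"above / inside / below MP band: {len(above)} / {len(inside)} / {len(below)}")
```

The shares come out at roughly 45.6% and 59.9%, and the classification shows two eigenvalues above the band, none inside it, and eight below it.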

Factor loadings reveal economically interpretable structure:

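Factor 1 (eigenvalue 4.5605): The top loadings are MSFT (−0.375), GOOGL (−0.353), and AAPL (−0.337). This factor captures a broad technology-sector exposure, with the three largest tech names loading most heavily. These magnitudes are only modestly above the equal-weight value 1/√10 ≈ 0.316, so the factor is also consistent with a market-wide mode tilted toward technology. The negative sign is arbitrary, because an eigenvector is defined only up to sign; the magnitudes indicate that these stocks move together along this principal direction.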
Factor 2 (eigenvalue 1.4288): The top loadings are JNJ (+0.389), AMZN (−0.373), and NVDA (−0.358). This factor appears to separate defensive (healthcare) exposure from high-growth technology. JNJ's positive loading contrasts with the negative loadings on AMZN and NVDA, suggesting a growth-versus-stability axis.
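Because the overall sign of each loading vector carries no information, it can help to fix a convention before comparing loadings across windows. The sketch below uses a toy 3-asset correlation matrix (not the report's data) and one possible convention, flipping the vector so that its loadings sum to a non-negative value; the helper name is hypothetical.

```python
import numpy as np

def orient_loadings(vec: np.ndarray) -> np.ndarray:
    """Flip an eigenvector so its loadings sum to a non-negative value.

    This is one common convention (an assumption here, not the report's);
    it changes no eigenvalue and no explained-variance figure.
    """
    return -vec if vec.sum() < 0 else vec

# Toy 3-asset correlation matrix, purely illustrative.
corr = np.array([[1.0, 0.6, 0.5],
                 [0.6, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
top_val, top_vec = eigvals[-1], eigvecs[:, -1]

# Both v and -v satisfy C v = lambda v, so the sign is a free choice.
assert np.allclose(corr @ top_vec, top_val * top_vec)
assert np.allclose(corr @ (-top_vec), top_val * (-top_vec))

oriented = orient_loadings(top_vec)
print(oriented)
```

Applying the same convention in every window keeps the reported signs comparable from year to year without altering the analysis.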

Temporal dynamics show regime-dependent spectral coherence. The per-year significant factor count was:

2010–2016: Consistently 1 significant factor per year.
2017: Transition to 2 factors.
2018: 2 factors (volatility spike in Q4 2018).
2019: Reversion to 1 factor.
2020: 1 factor (despite extreme COVID-19 volatility).
2021–2024: Consistently 2 factors.

The spectral structure is not stable across regimes. The number of significant factors doubled from 1 to 2 beginning in 2017, reverted briefly in 2019–2020, and then stabilized at 2 from 2021 onward. Notably, the 2020 COVID-19 volatility shock—a canonical liquidity and volatility boundary condition—did not collapse spectral coherence; the factor count remained at 1, consistent with a single dominant market-wide risk factor during the crisis. The transition to a persistent two-factor regime in 2021–2024 coincides with the post-pandemic recovery, inflation regime shift, and the emergence of AI-driven differentiation within technology stocks.

Interpretation

The results provide a nuanced answer to the research question. The eigenvalue spectrum does reveal a low-dimensional factor structure (2 significant factors over the full period), but this structure is not stable across market regimes. The factor count exhibits discrete regime shifts: a one-factor regime dominated the 2010–2016 period and briefly re-emerged in 2019–2020, while a two-factor regime has persisted since 2021.

What the data support:

Spectral parsimony: Only 2 of 10 eigenvalues exceed the Marchenko-Pastur upper bound, so 8 of the 10 eigen-directions carry no statistically significant structure by this test, even though the two significant directions account for about 60% of total variance. This aligns with the random matrix theory view that much of the apparent structure in an empirical correlation matrix is sampling noise.
Economic interpretability: The two significant factors map onto recognizable economic dimensions—a broad technology-sector factor and a growth-versus-defensive factor. The loadings are consistent with sector and style exposures, not arbitrary rotations.
Regime dependence: The factor count is not a fixed property of the asset universe but varies with market conditions. The transition from 1 to 2 factors in 2017 and the stabilization at 2 factors post-2021 suggest that spectral structure responds to macroeconomic and sector-specific dynamics.
Resilience under stress: Contrary to the hypothesis that spectral coherence collapses under volatility/liquidity boundary conditions, the 2020 COVID-19 shock did not fragment the factor structure. Instead, it compressed covariance into a single dominant factor, consistent with a flight-to-quality or market-wide risk-off dynamic. This is a contraction of dimensionality, not a collapse into noise.

What the data do not support:

Stability across regimes: The hypothesis of a stable spectral density structure is rejected. The factor count varies over time, and the transition points (2017, 2021) are not explained by volatility or liquidity shocks alone.
High-dimensional factor models: The data do not support the presence of more than 2 significant factors. Models positing 5+ independent risk factors in large-cap US equities would be over-parameterized relative to this spectral evidence.
Noise-driven bulks as artifacts: The 8 eigenvalues below the MP upper bound are treated here as noise, but [P2] suggests that such bulks may contain finer cluster structure not resolved by the MP test. Our 10-asset universe is too small to detect such sub-structures; a larger universe might reveal additional weak factors.

The in-sample nature of the per-year factor counts is a limitation: each year's count is computed on that year's data, so the regime shifts could reflect overfitting to in-sample noise rather than genuine structural changes. An out-of-sample test—computing factors on year t and validating on year t+1—would strengthen the claim of regime persistence.
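One way such an out-of-sample check could be set up is sketched below: estimate the leading eigenvectors on year t, then measure how much of year t+1's correlation structure they capture. This is a hypothetical design run on synthetic data, not a result from the report; the function name and the variance-share metric are assumptions.

```python
import numpy as np

def out_of_sample_share(train: np.ndarray, test: np.ndarray, k: int) -> float:
    """Share of test-window variance captured by the top-k train eigenvectors.

    train, test: arrays of shape (n_obs, n_assets) for consecutive windows.
    """
    corr_train = np.corrcoef(train, rowvar=False)
    corr_test = np.corrcoef(test, rowvar=False)
    _, vecs = np.linalg.eigh(corr_train)      # columns in ascending eigenvalue order
    top = vecs[:, -k:]                        # top-k directions estimated on year t
    captured = np.trace(top.T @ corr_test @ top)
    return float(captured / np.trace(corr_test))

# Synthetic check (not the report's data): the same one-factor structure in both years.
rng = np.random.default_rng(1)

def simulate(n_obs: int, n_assets: int = 10) -> np.ndarray:
    common = rng.standard_normal((n_obs, 1))
    return common + rng.standard_normal((n_obs, n_assets))

year_t, year_t1 = simulate(252), simulate(252)
share = out_of_sample_share(year_t, year_t1, k=1)
print(f"out-of-sample share of the year-t factor: {share:.2f}")
```

If the out-of-sample share stays close to the in-sample share from year to year, the factor structure is persistent; a sharp drop would point to in-sample overfitting or a genuine regime change.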

Relation to the Literature

The result extends and refines findings in the spectral analysis of financial correlation matrices. [P2] documents that eigenvalue bulks in empirical correlation matrices emerge from superpositions of cluster structures, not purely from noise. Our finding of exactly 2 significant factors in a 10-asset large-cap universe is consistent with [P2]'s interpretation: the dominant factor captures broad market exposure (the "market" cluster), while the second factor captures sector or style differentiation (the "growth vs. defensive" cluster). The 8 sub-MP eigenvalues likely contain finer cross-correlations that are statistically indistinguishable from noise at this sample size.

[P1] applies random matrix theory to cryptocurrency portfolios and finds that filtering noise-dominated eigenvalues improves risk-return profiles. Our result provides a complementary perspective: in large-cap equities, the MP bound cleanly separates 2 signal factors from 8 noise factors, suggesting that portfolio construction should focus on the 2-dimensional subspace spanned by the significant eigenvectors. Weighting schemes that exploit all 10 dimensions risk overfitting to noise.
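A common way to restrict attention to the significant subspace is eigenvalue clipping: keep the leading eigenpairs and replace the remaining eigenvalues with their average. The sketch below illustrates that recipe on synthetic data. It is an assumed, generic cleaning scheme, not a method taken from this report or from [P1], and the function name is hypothetical.

```python
import numpy as np

def clip_correlation(corr: np.ndarray, n_keep: int) -> np.ndarray:
    """Keep the top n_keep eigenpairs; flatten the rest to their mean eigenvalue.

    Eigenvalue clipping is an illustrative choice here, not the report's method.
    """
    vals, vecs = np.linalg.eigh(corr)               # ascending order
    cleaned_vals = vals.copy()
    n_noise = len(vals) - n_keep
    if n_noise > 0:
        cleaned_vals[:n_noise] = vals[:n_noise].mean()  # keeps the trace unchanged
    cleaned = (vecs * cleaned_vals) @ vecs.T
    # Rescale so the diagonal is exactly 1 again.
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)

# Synthetic check (not the report's data): one common factor plus noise.
rng = np.random.default_rng(2)
common = rng.standard_normal((500, 1))
returns = common + rng.standard_normal((500, 10))
cleaned = clip_correlation(np.corrcoef(returns, rowvar=False), n_keep=2)
print(np.round(np.diag(cleaned), 6))
```

The cleaned matrix remains a valid correlation matrix (symmetric, unit diagonal, positive definite) and can be passed to a portfolio optimizer in place of the raw estimate.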

[P4] develops rigorous algorithms for computing spectral properties of Koopman operators in dynamical systems, emphasizing the challenge of continuous spectra and spectral pollution. While our setting is finite-dimensional (10 assets), the temporal variation in factor count (1 vs. 2) suggests that the "true" spectral structure of equity returns may be regime-dependent and non-stationary, analogous to the time-varying spectral measures in [P4]. A future extension could apply resolvent-based methods to detect regime transitions in the spectral density.

[P3] generalizes persistence theory to non-topological settings, including graph theory. The eigenvalue spectrum can be viewed as a persistence diagram in the categorical sense: eigenvalues "persist" above the MP threshold, and their lifetimes (magnitudes above the threshold) encode the strength of the corresponding factors. The regime shifts in factor count (1 → 2 → 1 → 2) could be analyzed as changes in the persistence diagram's topology, providing a formal framework for detecting structural breaks.

The cited works do not directly address the question of spectral coherence under volatility/liquidity shocks. Our finding that the 2020 COVID-19 shock reduced dimensionality to 1 factor, rather than fragmenting the spectrum, is to our knowledge not reported there. It suggests that extreme volatility simplifies the covariance structure (a single dominant risk factor overwhelms sector and style differentiation) rather than collapsing it into noise. This is consistent with the empirical observation that correlations spike toward 1 during crises, and it adds a spectral quantification: the second eigenvalue falls below the MP bound, leaving only the market factor as statistically significant.
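The mechanism can be illustrated with a stylized two-block correlation model, which is an assumption introduced here and not part of the original analysis. For two groups of m assets with within-group correlation ρ_w and cross-group correlation ρ_c, the two leading eigenvalues are 1 + (m − 1)ρ_w + mρ_c (market mode) and 1 + (m − 1)ρ_w − mρ_c (group contrast). As ρ_c rises toward ρ_w in a crisis, the second eigenvalue shrinks and can drop below the MP upper bound.

```python
import numpy as np

def two_block_corr(m: int, rho_within: float, rho_cross: float) -> np.ndarray:
    """Correlation matrix for two groups of m assets each (stylized, hypothetical)."""
    n = 2 * m
    corr = np.full((n, n), rho_cross)
    corr[:m, :m] = rho_within
    corr[m:, m:] = rho_within
    np.fill_diagonal(corr, 1.0)
    return corr

m, n_obs = 5, 252                                  # 252 trading days is an assumption
mp_upper = (1.0 + np.sqrt(2 * m / n_obs)) ** 2     # about 1.44 for 10 assets

for rho_cross in (0.2, 0.4):                       # calm vs. crisis-like cross-correlation
    eig = np.linalg.eigvalsh(two_block_corr(m, 0.5, rho_cross))[::-1]
    # Closed forms for the two leading eigenvalues of this block structure.
    assert np.isclose(eig[0], 1 + (m - 1) * 0.5 + m * rho_cross)
    assert np.isclose(eig[1], 1 + (m - 1) * 0.5 - m * rho_cross)
    n_sig = int(np.sum(eig > mp_upper))
    print(f"rho_cross={rho_cross}: top two = {eig[0]:.2f}, {eig[1]:.2f}; "
          f"significant = {n_sig}")
```

In this toy setting, raising the cross-group correlation from 0.2 to 0.4 moves the second eigenvalue from 2.0 to 1.0, so the count of significant factors falls from 2 to 1 while the leading eigenvalue grows from 4.0 to 5.0.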

Limitations

Small universe: 10 assets is a minimal test case. The Marchenko-Pastur bound is sensitive to the q-ratio (n_assets / n_obs ≈ 0.003 here, implying a very large sample relative to dimensionality), which yields a narrow null band of 0.8997 to 1.1056. A larger universe (50–100 assets) would provide a more realistic q-ratio and potentially reveal additional weak factors or finer cluster structure as in [P2].
In-sample regime detection: The per-year factor counts are computed in-sample within each year. This does not test whether a factor structure estimated in year t predicts covariance in year t+1. An out-of-sample rolling-window test (e.g., estimate on years t−2 to t, validate on year t+1) would distinguish genuine regime persistence from in-sample overfitting.
Single asset class: The universe is restricted to large-cap US equities. The spectral structure of a multi-asset portfolio (equities, bonds, commodities) or a global equity universe (US, Europe, Asia) may exhibit different dimensionality and regime dependence.
No macroeconomic covariates: The analysis does not condition on observable regime variables (VIX, credit spreads, liquidity measures). Regressing the factor count or top eigenvalue on such covariates would test whether spectral transitions are predictable from macro conditions.
Eigenvalue spacing and universality: Random matrix theory predicts not only the bulk distribution (MP) but also the spacing distribution of eigenvalues (e.g., Wigner surmise). Testing whether the 8 sub-MP eigenvalues follow the expected spacing distribution would provide additional evidence that they are noise-driven.
Factor rotation and interpretation: PCA eigenvectors are orthogonal by construction, but economically meaningful factors (e.g., Fama-French factors) need not be orthogonal. A rotation to maximize interpretability (e.g., varimax) or a comparison to known factor models would strengthen the economic interpretation of Factor 1 and Factor 2.

Strengthening the result would require: (i) expanding the universe to 50+ assets to test robustness of the 2-factor structure; (ii) implementing an out-of-sample validation of regime persistence; (iii) conditioning the spectral analysis on observable macro/liquidity variables; and (iv) comparing the PCA factors to established factor models (market, size, value, momentum) to assess whether the 2 significant factors are rotations of known priced risks.

Research evidence, not investment advice.

Categorical Spectralism — spectral decomposition of portfolio return spaces