Question
Does the eigenvalue spectrum of a broad US equity universe exhibit bulk-edge separation consistent with random matrix theory, and do the largest eigenvalues (signal) exceed Marchenko-Pastur thresholds in a way that reveals distinct correlation regimes over the 2010–2024 period?
Method
We computed the eigenvalue spectrum of the return correlation matrix for a 14-asset universe of large-cap US equities using daily adjusted-close returns from yfinance over the window 2010-01-01 to 2024-12-31 (3,772 observations). The ticker list as recorded is AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, KO, META, MSFT, NVDA, PEP, PFE, PG, XOM, which names 15 symbols; the ratio q, the eigenvalue count, and the variance shares reported below all correspond to 14 assets, and the excluded ticker is not identified here. The method is principal component analysis (PCA), that is, eigenvalue decomposition of the correlation matrix, benchmarked against the Marchenko-Pastur (MP) null distribution for random correlation matrices.
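The decomposition step can be sketched as follows. This is a minimal illustration, not the study's code: the function name `correlation_spectrum` is hypothetical, simple (not log) returns are an assumption, and the price panel is synthetic so the block runs without a download.

```python
import numpy as np
import pandas as pd


def correlation_spectrum(prices: pd.DataFrame) -> np.ndarray:
    """Eigenvalues (descending) of the correlation matrix of daily simple returns."""
    # Assumption: simple returns; rows with any missing return are dropped.
    returns = prices.pct_change(fill_method=None).dropna(how="any")
    corr = returns.corr().to_numpy()          # Pearson correlation, N x N
    eigenvalues = np.linalg.eigvalsh(corr)    # symmetric solver, ascending order
    return eigenvalues[::-1]


# Synthetic stand-in for the price panel (hypothetical data, not the study's).
rng = np.random.default_rng(0)
n_obs, n_assets = 500, 4
market = rng.normal(0.0, 0.01, size=(n_obs, 1))        # shock shared by all assets
noise = rng.normal(0.0, 0.01, size=(n_obs, n_assets))  # asset-specific shocks
prices = pd.DataFrame(100.0 * np.exp(np.cumsum(market + noise, axis=0)),
                      columns=["A", "B", "C", "D"])
spectrum = correlation_spectrum(prices)
print(spectrum)
```

The eigenvalues of a correlation matrix sum to the number of assets, which is a quick sanity check on any spectrum produced this way.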
The Marchenko-Pastur distribution provides theoretical bounds for the eigenvalues of a correlation matrix constructed from purely random data, parameterized by the ratio q = n_assets / n_obs = 14 / 3772 ≈ 0.0037. For unit-variance data the bounds are λ± = (1 ± √q)², which gives an upper bound of 1.1256 and a lower bound of 0.8819. Eigenvalues above the upper bound are treated as distinguishable from noise and as evidence of genuine correlation structure (signal factors); those within the bulk [0.8819, 1.1256] are consistent with random correlation noise.
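As a check on the two bounds, the standard MP edge formula (1 ± √q)² can be evaluated directly. The helper name `marchenko_pastur_bounds` and the `sigma2` parameter are illustrative assumptions; unit variance matches the correlation-matrix setting.

```python
import math


def marchenko_pastur_bounds(n_assets: int, n_obs: int, sigma2: float = 1.0):
    """Lower and upper edges of the MP bulk for q = n_assets / n_obs."""
    root_q = math.sqrt(n_assets / n_obs)
    lower = sigma2 * (1.0 - root_q) ** 2
    upper = sigma2 * (1.0 + root_q) ** 2
    return lower, upper


lower, upper = marchenko_pastur_bounds(n_assets=14, n_obs=3772)
print(f"q = {14 / 3772:.4f}, lower = {lower:.4f}, upper = {upper:.4f}")
# q = 0.0037, lower = 0.8819, upper = 1.1256
```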
To assess time variation in the correlation regime, we recomputed the eigenvalue spectrum and the significant factor count separately for each calendar year (in-sample within each year, same universe and method), yielding a series of 15 annual factor counts. The windows are non-overlapping calendar years, not rolling windows.
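A per-year recomputation might look like the sketch below. The text does not say whether the MP bound was recomputed for each year or held at the full-sample value, so using each year's own q is an assumption of this sketch; with 14 assets and roughly 252 trading days that bound would be near 1.53 rather than 1.1256. The function name and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def yearly_factor_counts(returns: pd.DataFrame) -> pd.Series:
    """Count eigenvalues above a per-year MP upper bound, one calendar year at a time."""
    counts = {}
    for year, block in returns.groupby(returns.index.year):
        n_obs, n_assets = block.shape
        upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2   # assumption: bound uses that year's q
        eigenvalues = np.linalg.eigvalsh(block.corr().to_numpy())
        counts[int(year)] = int((eigenvalues > upper).sum())
    return pd.Series(counts, name="n_significant")


# Synthetic returns with one shared shock (hypothetical data, not the study's).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2010-01-01", "2011-12-31")
shared = rng.normal(size=(len(dates), 1))
returns = pd.DataFrame(shared + rng.normal(size=(len(dates), 6)), index=dates)
counts = yearly_factor_counts(returns)
print(counts)
```

Because a single year has far fewer observations than the full sample, the choice of bound matters for the annual counts more than it does for the full-sample count.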
Result
Full-sample spectrum
The top ten eigenvalues of the full-sample correlation matrix are:
- λ₁ = 6.2461
- λ₂ = 1.6970
- λ₃ = 1.4173
- λ₄ = 0.7601
- λ₅ = 0.7010
- λ₆ = 0.5597
- λ₇ = 0.4915
- λ₈ = 0.4401
- λ₉ = 0.3945
- λ₁₀ = 0.3707
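Against the Marchenko-Pastur upper bound of 1.1256, three eigenvalues exceed the threshold: λ₁ = 6.2461, λ₂ = 1.6970, and λ₃ = 1.4173. The remaining eleven eigenvalues (λ₄ through λ₁₄) fall not only below the upper bound but also below the lower bound of 0.8819, so none of them lies inside the MP bulk [0.8819, 1.1256]. This is expected: the eigenvalues of a 14 × 14 correlation matrix sum to 14, and the three largest absorb 9.36 of that total, leaving an average of about 0.42 for the other eleven. These eigenvalues represent idiosyncratic or residual variance rather than additional common structure.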
The number of significant factors is 3. These three factors collectively explain 66.86% of total variance, while the dominant first factor alone accounts for 44.62% of variance.
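The factor count and variance shares follow from the reported eigenvalues by simple arithmetic, sketched below. Only the ten listed eigenvalues are used; the total of the four unlisted ones is inferred from the trace of 14, which is an implication of the correlation-matrix setup rather than a reported figure.

```python
# Top ten eigenvalues as reported above; the remaining four are not listed.
reported = [6.2461, 1.6970, 1.4173, 0.7601, 0.7010,
            0.5597, 0.4915, 0.4401, 0.3945, 0.3707]
n_assets = 14        # trace of a 14 x 14 correlation matrix
mp_upper = 1.1256

significant = [lam for lam in reported if lam > mp_upper]
share_first = 100.0 * reported[0] / n_assets        # variance share of factor 1, in percent
share_signal = 100.0 * sum(significant) / n_assets  # share of all significant factors
unlisted_sum = n_assets - sum(reported)             # implied total of eigenvalues 11 to 14

print(len(significant), f"{share_signal:.2f}", f"{unlisted_sum:.3f}")
```

The small discrepancy between 44.615 (from the rounded eigenvalue) and the reported 44.62% is a rounding effect.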
Factor structure
The loadings on the top two factors reveal economically interpretable structure:
Factor 1 (λ₁ = 6.2461, 44.62% variance):
- JPM: 0.301
- MSFT: 0.297
- BAC: 0.286
This factor loads most heavily on large financials (JPM, BAC) and the dominant technology name (MSFT), consistent with a broad market or systematic risk factor that captures co-movement across sectors.
Factor 2 (λ₂ = 1.6970):
- AMZN: 0.427
- NVDA: 0.401
- GOOGL: 0.352
The second factor isolates high-growth technology names (AMZN, NVDA, GOOGL), representing a distinct technology/growth style dimension orthogonal to the broad market factor.
The third significant factor (λ₃ = 1.4173) is not detailed in the loadings output but, by elimination and the structure of the universe, likely captures either a defensive/consumer-staples dimension (JNJ, KO, PEP, PFE, PG) or an energy/value tilt (CVX, XOM).
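Loadings of this kind can be read off the eigenvectors of the correlation matrix. The sketch below assumes the reported loadings are unit-norm eigenvector entries (which the magnitudes suggest) and uses a toy matrix with hypothetical values; `top_loadings` is an illustrative name. For reference, a perfectly uniform eigenvector over 14 assets would have every entry equal to 1/√14 ≈ 0.267, so Factor 1 loadings of 0.29 to 0.30 are close to an equal-weighted market direction.

```python
import numpy as np


def top_loadings(corr: np.ndarray, names: list, factor: int, k: int = 3):
    """Return the k largest-magnitude loadings of a factor (0 = largest eigenvalue)."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending eigenvalue order
    order = np.argsort(eigenvalues)[::-1]              # re-sort descending
    vector = eigenvectors[:, order[factor]]
    if vector.sum() < 0:                               # eigenvector sign is arbitrary
        vector = -vector
    ranked = np.argsort(np.abs(vector))[::-1][:k]
    return [(names[i], round(float(vector[i]), 3)) for i in ranked]


# Toy 3 x 3 correlation matrix (hypothetical values).
corr = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.2],
                 [0.2, 0.2, 1.0]])
loadings = top_loadings(corr, ["X", "Y", "Z"], factor=0)
print(loadings)
```

Applying the same function with `factor=2` to the study's correlation matrix would supply the third-factor loadings that the output above does not detail.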
Time variation in correlation regime
The per-calendar-year significant factor count series reveals clear regime dynamics:
- 2010–2012: 1 significant factor per year (single dominant market factor)
- 2013–2014: 2 factors (emergence of a second dimension)
- 2015: 1 factor (regime compression)
- 2016: 2 factors
- 2017: 3 factors (first appearance of three-factor regime)
- 2018–2020: 2 factors (reversion during volatility and pandemic)
- 2021–2024: 3 factors consistently (stable three-factor regime)
The transition from a one-factor regime (2010–2012) to a stable three-factor regime (2021–2024) indicates increasing differentiation in equity return drivers. The 2017 emergence of the third factor, its temporary collapse in 2018–2020, and its re-establishment in 2021 onward suggest that correlation structure is not static but responds to macroeconomic and market microstructure shifts.
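The regime runs can be recovered mechanically from the annual counts listed above. The sketch below only restates those counts and collapses consecutive equal years; it adds no new data.

```python
from itertools import groupby

# Annual significant-factor counts as listed above.
factor_counts = {
    2010: 1, 2011: 1, 2012: 1, 2013: 2, 2014: 2, 2015: 1, 2016: 2, 2017: 3,
    2018: 2, 2019: 2, 2020: 2, 2021: 3, 2022: 3, 2023: 3, 2024: 3,
}

# Collapse consecutive years with the same count into (first_year, last_year, count) runs.
runs = []
for count, group in groupby(sorted(factor_counts.items()), key=lambda item: item[1]):
    years = [year for year, _ in group]
    runs.append((years[0], years[-1], count))

for first, last, count in runs:
    print(f"{first}-{last}: {count} factor(s)")
```

Seen this way, the series has seven runs, of which only the first, sixth, and last span more than two years.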
Interpretation
Bulk-edge separation
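The eigenvalue spectrum shows a clear gap between the three largest eigenvalues and the rest, in line with the signal-versus-noise picture of random matrix theory. The three largest eigenvalues (6.2461, 1.6970, 1.4173) lie above the Marchenko-Pastur upper bound of 1.1256, the first by a wide margin, while the remaining eleven fall well below it (the fourth eigenvalue, 0.7601, is 32% below the threshold). This separation indicates that the correlation matrix contains low-dimensional structure (three common factors) on top of residual variance. Because the residual eigenvalues sit below the MP lower bound rather than inside the bulk, the unit-variance MP law is best read here as a threshold for the upper edge, not as a description of the residual spectrum.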
The q-ratio of 0.004 (14 assets, 3,772 observations) places this analysis in the regime where the MP bounds are tight and the signal-to-noise distinction is sharp. The dominant eigenvalue of 6.2461 is 5.5 times the MP upper bound, a ratio that would occur with negligible probability under the random-matrix null. The second and third eigenvalues exceed the threshold by factors of 1.5 and 1.3, respectively, indicating robust signal content.
Economic interpretation of factors
The factor loadings align with known equity market structure:
Factor 1 (market/systematic risk): The broad loading across financials and mega-cap technology (JPM, MSFT, BAC) is consistent with a market beta or systematic risk factor. The 44.62% variance share is typical for a diversified equity universe where idiosyncratic risk remains substantial.
Factor 2 (technology/growth): The concentration on AMZN, NVDA, and GOOGL isolates the high-growth, high-volatility technology sector. This factor's emergence as a distinct dimension (rather than being subsumed into Factor 1) reflects the increasing divergence between technology and traditional equity performance over the sample period.
Factor 3 (defensive/value or energy): While loadings are not detailed, the eigenvalue magnitude (1.4173) and the composition of the universe suggest this factor captures either a defensive consumer-staples dimension or an energy/value tilt. The 2021–2024 persistence of this third factor coincides with the post-pandemic regime of elevated inflation and energy volatility, supporting an energy/value interpretation.
Regime dynamics
The time variation in the significant factor count is the central empirical finding. The one-factor regime (2010–2012) corresponds to the post-financial-crisis period of coordinated monetary easing and low dispersion, where a single systematic risk factor dominated. The move to two factors in 2013–2014 coincides with the Fed taper and the beginning of technology sector outperformance; the count falls back to one in 2015 and returns to two in 2016. The emergence of three factors in 2017 aligns with the start of the late-cycle expansion and rising sector differentiation.
The return to two factors in 2018–2020 admits an economic reading: 2018 saw a sharp volatility spike and sector rotation, and 2020 brought the pandemic, which temporarily compressed cross-sectional dispersion as risk assets moved together. The two-factor count in 2019 is not tied to a specific event here. The re-establishment of three factors in 2021 and its stability through 2024 suggest a new equilibrium with persistent sector differentiation, driven by divergent monetary policy transmission, inflation dynamics, and technology adoption.
Critically, this is an in-sample, within-year recomputation, not an out-of-sample forecast. The factor count series describes the realized correlation structure in each calendar year, not a predictive model. The stability of the three-factor regime from 2021 onward is a statement about the data, not a claim about future persistence.
What the result does NOT support
The result does not imply that three factors are sufficient to price all assets or that the identified factors are the "true" economic drivers. The 66.86% variance explained by three factors leaves 33.14% unexplained, a substantial residual. The factor structure is descriptive (PCA) rather than structural (e.g., Fama-French factors with economic priors), so the loadings reflect statistical co-movement, not causal mechanisms.
The result does not provide a forward-looking signal for portfolio construction or risk management. The eigenvalue spectrum is computed on the full sample (or within-year samples), so it describes realized correlation, not expected correlation. Using this structure to forecast future regimes would require an out-of-sample validation framework, which is not part of this computation.
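One simple form such an out-of-sample check could take is to fix the eigenvectors on an early window and measure how much correlation variance they capture in a later one. This is a sketch of that idea under stated assumptions, not part of the reported computation; the function name, the split point, and the synthetic data are all hypothetical.

```python
import numpy as np


def out_of_sample_share(train: np.ndarray, test: np.ndarray, k: int) -> float:
    """Share of test-period correlation variance captured by the top-k training eigenvectors."""
    corr_train = np.corrcoef(train, rowvar=False)
    corr_test = np.corrcoef(test, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr_train)
    top = eigenvectors[:, np.argsort(eigenvalues)[::-1][:k]]   # N x k, fixed after training
    captured = np.trace(top.T @ corr_test @ top)               # variance along those directions
    return float(captured / corr_test.shape[0])


# Synthetic one-factor returns (hypothetical data, not the study's).
rng = np.random.default_rng(2)
returns = rng.normal(size=(1000, 1)) + rng.normal(size=(1000, 5))
share = out_of_sample_share(returns[:600], returns[600:], k=1)
print(round(share, 3))
```

A large drop between the in-sample and out-of-sample shares would indicate that the estimated factor directions are unstable across periods.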
The result does not establish that the Marchenko-Pastur threshold is the "correct" criterion for factor significance in all contexts. The MP bound is a null hypothesis for random correlation; it does not account for economic structure, non-stationarity, or fat tails. A more conservative threshold (e.g., a bootstrap or permutation-based bound) might yield fewer significant factors.
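A permutation-based bound of the kind mentioned above can be sketched as follows: shuffling each asset's return series independently preserves its marginal distribution (including fat tails) while destroying cross-sectional correlation, and the largest eigenvalue of the shuffled data gives an empirical null. The function name, the number of permutations, and the 95% quantile are assumptions of this sketch.

```python
import numpy as np


def permutation_threshold(returns: np.ndarray, n_perm: int = 200,
                          quantile: float = 0.95, seed: int = 0) -> float:
    """Quantile of the largest eigenvalue after shuffling each column independently."""
    rng = np.random.default_rng(seed)
    top = np.empty(n_perm)
    for i in range(n_perm):
        # Independent shuffles keep each marginal distribution but break cross-correlation.
        shuffled = np.column_stack([rng.permutation(col) for col in returns.T])
        top[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[-1]
    return float(np.quantile(top, quantile))


# Synthetic one-factor returns (hypothetical data, not the study's).
rng = np.random.default_rng(3)
returns = rng.normal(size=(500, 1)) + rng.normal(size=(500, 5))
threshold = permutation_threshold(returns)
eigenvalues = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
n_significant = int((eigenvalues > threshold).sum())
print(round(threshold, 3), n_significant)
```

Comparing every eigenvalue to the null for the largest one is conservative for the second and later factors, so this variant can only lower the factor count relative to a rank-by-rank comparison.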
Relation to the Literature
No closely related papers were retrieved for this computation, so the result stands on its own empirical grounding. The Marchenko-Pastur distribution is a standard tool in random matrix theory, widely applied in physics and finance to distinguish signal from noise in high-dimensional covariance estimation. The application here, using the MP upper bound as a threshold for eigenvalue significance in equity correlation matrices, is methodologically conventional but not directly tied to a specific prior study in the retrieved literature.
The finding of a low-dimensional factor structure (three significant factors in a 14-asset universe) is consistent with the broader empirical finance literature on factor models, which typically finds that a small number of common factors (market, size, value, momentum, quality) explain the majority of cross-sectional return variation. The time variation in factor count (one to three factors over 2010–2024) aligns qualitatively with studies documenting regime shifts in equity correlation, though the specific dynamics here are novel to this dataset and window.
The economic interpretation of the factors (market, technology/growth, defensive/value or energy) is informed by standard sector and style classifications but is not a formal test of any named factor model (e.g., Fama-French, Carhart). The loadings are purely data-driven (PCA), so any alignment with known factors is suggestive rather than definitive.
Limitations
Small universe: The 14-asset universe is a convenience sample of large-cap US equities, not a representative cross-section of the market. The factor structure may not generalize to mid-caps, small-caps, international equities, or other asset classes. A larger universe (e.g., S&P 500 constituents) would provide a more robust test of bulk-edge separation and factor count.
In-sample only: All results are in-sample. The eigenvalue spectrum is computed on the same data used to estimate the correlation matrix, so there is no out-of-sample validation of the factor structure or regime dynamics. An out-of-sample test would require splitting the sample (e.g., estimate factors on 2010–2019, test on 2020–2024) or using a rolling-window forecast framework.
Stationarity assumption: The Marchenko-Pastur bound is derived for independent, identically distributed observations with finite variance, which implies a stationary correlation structure. Equity returns are non-stationary (regime shifts, structural breaks, time-varying volatility), so the MP threshold may be too permissive or too conservative in different subperiods. A time-varying MP bound (e.g., computed on rolling windows) would better account for non-stationarity.
No economic priors: PCA is an atheoretical decomposition; it finds the directions of maximum variance without regard to economic interpretability. The factor loadings are suggestive (market, technology, defensive/value), but they are not identified by economic theory or tested against known factor models. A structural factor model (e.g., instrumented by macroeconomic variables) would provide stronger economic grounding.
Daily frequency: Daily returns may carry microstructure noise (bid-ask bounce, non-synchronous trading) that inflates idiosyncratic variance, attenuates measured correlations, and so biases the leading eigenvalues downward. Weekly or monthly returns would reduce this noise, but with fewer observations q rises and the MP bulk widens, making the threshold less discriminating.
No uncertainty quantification: The eigenvalues and factor counts are point estimates with no reported confidence intervals or standard errors. A bootstrap or jackknife procedure would quantify sampling uncertainty in the eigenvalue spectrum and the number of significant factors.
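A bootstrap of the kind described could be sketched as below. Resampling whole days with replacement keeps the cross-section intact but ignores serial dependence and volatility clustering, so a block bootstrap may be more appropriate for real return data; that simplification, the function name, and the synthetic data are assumptions of this sketch.

```python
import numpy as np


def bootstrap_eigenvalue_ci(returns: np.ndarray, n_boot: int = 500,
                            alpha: float = 0.05, seed: int = 0) -> np.ndarray:
    """Percentile intervals for the sorted eigenvalues; each row is (lower, upper)."""
    rng = np.random.default_rng(seed)
    n_obs, n_assets = returns.shape
    draws = np.empty((n_boot, n_assets))
    for b in range(n_boot):
        rows = rng.integers(0, n_obs, size=n_obs)       # resample days with replacement
        corr = np.corrcoef(returns[rows], rowvar=False)
        draws[b] = np.linalg.eigvalsh(corr)[::-1]       # descending order
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0).T


# Synthetic one-factor returns (hypothetical data, not the study's).
rng = np.random.default_rng(4)
returns = rng.normal(size=(800, 1)) + rng.normal(size=(800, 4))
intervals = bootstrap_eigenvalue_ci(returns)
print(np.round(intervals, 3))
```

Counting the eigenvalues above the threshold within each bootstrap draw would give a sampling distribution for the factor count itself.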
Strengthening the result would require: (i) expanding the universe to hundreds of assets, (ii) implementing an out-of-sample validation framework, (iii) computing time-varying MP bounds on rolling windows, (iv) comparing the PCA factors to economically motivated factor models, (v) using lower-frequency returns to reduce microstructure noise, and (vi) bootstrapping confidence intervals for eigenvalues and factor counts.
Research evidence, not investment advice.