Eigenvalue Spectrum of Large-Cap Equity Returns: Evidence for Multi-Factor Structure and Regime Transition
Question
Does the eigenvalue spectrum of the return correlation matrix among large-cap equities exhibit a spectral gap and regime structure consistent with a gravity-field decomposition, where the largest eigenvalues correspond to market-cap-weighted 'mass clusters' and smaller eigenvalues reveal sector/liquidity-constrained substructure?
Method
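We computed the eigenvalue spectrum of the daily return correlation matrix for 15 large-cap U.S. equities over the window 2010-01-01 through 2024-12-31, yielding 3772 daily observations. The recorded ticker list names 16 symbols (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, KO, META, MSFT, NVDA, PEP, PFE, PG, WMT, XOM), whereas every downstream quantity (the 15×15 matrix, q, and the variance shares) uses N = 15, so one listed ticker did not enter the final matrix; the record does not say which. Daily returns are computed from yfinance adjusted-close prices.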
The analysis applies principal component analysis (PCA) to the 15×15 correlation matrix and tests the resulting eigenvalues against the Marchenko-Pastur (MP) null distribution. Under the MP null, a correlation matrix constructed from purely random returns with q = N_assets / N_obs = 15 / 3772 ≈ 0.004 has eigenvalues bounded asymptotically between a lower edge λ_min = (1 − √q)² = 0.8779 and an upper edge λ_max = (1 + √q)² = 1.1301. Eigenvalues exceeding the upper edge are larger than random-matrix noise alone would produce and are treated as evidence of genuine common factors. We count the number of eigenvalues above 1.1301 as the number of significant factors.
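The two edges quoted above can be reproduced from N and T alone. The following is a minimal sketch, assuming unit variance (a correlation matrix) and the standard edge formula λ± = (1 ± √q)²; the function name `marchenko_pastur_edges` is illustrative and not taken from the original computation.

```python
import math

def marchenko_pastur_edges(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Return (lower, upper) Marchenko-Pastur edges for q = n_assets / n_obs.

    Assumes unit-variance entries, as is the case for a correlation matrix.
    """
    root_q = math.sqrt(n_assets / n_obs)
    return (1 - root_q) ** 2, (1 + root_q) ** 2

lam_min, lam_max = marchenko_pastur_edges(n_assets=15, n_obs=3772)
print(f"q = {15 / 3772:.4f}, lambda_min = {lam_min:.4f}, lambda_max = {lam_max:.4f}")
```

Because q is so small here, the band is narrow: any eigenvalue more than about 13% above 1 is flagged.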
To assess time variation, we recomputed the eigenvalue spectrum and factor count separately within each calendar year from 2010 through 2024, applying the same PCA and MP threshold procedure in-sample within each year. The windows are non-overlapping calendar years rather than rolling windows, and the resulting annual series shows how the factor structure evolved over the 15-year period.
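The per-year procedure can be sketched as follows. This is an illustration on synthetic data, not the original pipeline: it assumes the MP upper edge is recomputed from each year's own q (with roughly 252 trading days and 15 assets, q ≈ 0.06 and the edge is about 1.55 rather than 1.1301). The source does not state whether the threshold was recomputed or the full-sample value reused, and the function names are hypothetical.

```python
import numpy as np

def count_significant_factors(returns: np.ndarray) -> int:
    """Count correlation-matrix eigenvalues above the MP upper edge.

    returns: array of shape (n_obs, n_assets) holding daily returns.
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)                  # real, ascending
    mp_upper = (1 + np.sqrt(n_assets / n_obs)) ** 2     # assumption: q from this window
    return int(np.sum(eigvals > mp_upper))

def factors_by_year(returns: np.ndarray, years: np.ndarray) -> dict[int, int]:
    """Apply the count separately to the rows that belong to each calendar year."""
    return {int(y): count_significant_factors(returns[years == y])
            for y in np.unique(years)}

# Synthetic stand-in for the real data: one common factor plus independent noise.
rng = np.random.default_rng(0)
n_assets, days_per_year = 15, 252
years = np.repeat([2010, 2011], days_per_year)
market = rng.standard_normal((years.size, 1))
synthetic = market + rng.standard_normal((years.size, n_assets))

counts = factors_by_year(synthetic, years)
print(counts)
```

With real data, `returns` would hold the daily returns and `years` the calendar year of each row. Note that a higher per-year threshold makes the annual counts not directly comparable to a count taken against the full-sample edge.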
Result
Full-period spectrum and factor count
The top ten eigenvalues of the full-period correlation matrix are:
- λ₁ = 6.4878
- λ₂ = 1.7134
- λ₃ = 1.4928
- λ₄ = 0.7632
- λ₅ = 0.7174
- λ₆ = 0.6574
- λ₇ = 0.5596
- λ₈ = 0.4912
- λ₉ = 0.4377
- λ₁₀ = 0.3942
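The Marchenko-Pastur upper bound is 1.1301. Three eigenvalues exceed this threshold (6.4878, 1.7134, 1.4928), yielding n_significant_factors = 3. The remaining twelve eigenvalues lie below the upper bound and are not counted as factors. They also lie below the MP lower edge of 0.8779 (the largest of them is λ₄ = 0.7632), so they sit outside the null band on the low side rather than inside it. This is expected: the eigenvalues of a 15×15 correlation matrix sum to 15, and the three large eigenvalues absorb 9.69 of that total.
A spectral gap separates the third eigenvalue (1.4928) from the fourth (0.7632): the ratio λ₃ / λ₄ ≈ 1.96 is the second-largest break in the reported spectrum, after λ₁ / λ₂ ≈ 3.79. Below the gap, the reported eigenvalues decay smoothly, with consecutive ratios between about 1.06 and 1.17 and no further break. Above the gap, three eigenvalues capture genuine covariance structure.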
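The location of the gap can be checked directly from the ten reported eigenvalues by taking consecutive ratios. A short sketch, using only the values listed above (the variable names are illustrative):

```python
# Top ten eigenvalues as reported in the text.
top10 = [6.4878, 1.7134, 1.4928, 0.7632, 0.7174,
         0.6574, 0.5596, 0.4912, 0.4377, 0.3942]

# Ratio of each eigenvalue to the next one down.
ratios = [a / b for a, b in zip(top10, top10[1:])]
for k, r in enumerate(ratios, start=1):
    print(f"lambda_{k} / lambda_{k + 1} = {r:.3f}")

# 1-based positions of the breaks, ordered from largest ratio to smallest.
ranked_breaks = sorted(range(1, len(ratios) + 1),
                       key=lambda k: ratios[k - 1], reverse=True)
print("largest breaks after eigenvalue:", ranked_breaks[:2])
```

The ratio test is a heuristic companion to the MP threshold, not a substitute for it: it shows where the spectrum is discontinuous but carries no significance level of its own.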
Variance explained
The largest eigenvalue alone explains 43.25% of total variance (variance_explained_top1 = 0.4325). The three significant factors together explain 64.63% (variance_explained_significant = 0.6463). The remaining 35.37% of variance is distributed across the twelve eigenvalues that fall below the threshold.
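These shares follow from dividing each eigenvalue by N = 15, the trace of a 15×15 correlation matrix. The sketch below reproduces them from the reported values and also uses the trace identity to bound the five eigenvalues that are not listed; nothing here goes beyond the numbers already in the text.

```python
n_assets = 15
top10 = [6.4878, 1.7134, 1.4928, 0.7632, 0.7174,
         0.6574, 0.5596, 0.4912, 0.4377, 0.3942]

# Each eigenvalue's share of total variance (the trace equals n_assets).
explained = [lam / n_assets for lam in top10]
top1_share = explained[0]
top3_share = sum(explained[:3])

# The five unreported eigenvalues must account for the rest of the trace.
residual_trace = n_assets - sum(top10)
mean_unreported = residual_trace / (n_assets - len(top10))

print(f"top-1 share = {top1_share:.4f}, top-3 share = {top3_share:.4f}")
print(f"unreported eigenvalues sum to {residual_trace:.4f} "
      f"(mean {mean_unreported:.4f})")
```

The implied mean of the unreported eigenvalues is consistent with the smooth decay seen in λ₄ through λ₁₀.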
Factor loadings and economic interpretation
Factor 1 (λ₁ = 6.4878, 43.25% variance):
The top three loadings are JPM (0.291), MSFT (0.290), and PEP (0.278). All 15 assets load positively on this factor with similar magnitudes (loadings range 0.24–0.29, not shown in full). This is the market factor: a broad, nearly equal-weighted common mode that accounts for about 43% of total return variance. The near-uniform loadings indicate that this factor does not discriminate by sector or market cap within this large-cap universe; it reflects the market-wide co-movement shared by all of the names.
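A useful benchmark for "near-uniform" is the loading every asset would carry in a perfectly equal-weighted mode. Assuming the eigenvectors are normalized to unit length (the usual convention, though the source does not state it), that value is 1/√N. A small sketch:

```python
import math

n_assets = 15
# A unit-norm vector with identical entries has every entry equal to 1/sqrt(N).
equal_loading = 1 / math.sqrt(n_assets)

# Top Factor 1 loadings as reported in the text.
reported = {"JPM": 0.291, "MSFT": 0.290, "PEP": 0.278}
relative = {ticker: value / equal_loading for ticker, value in reported.items()}

print(f"equal-weight loading = {equal_loading:.4f}")
for ticker, ratio in relative.items():
    print(f"{ticker}: {ratio:.2f}x the equal-weight value")
```

The benchmark falls inside the reported 0.24–0.29 range, which supports reading Factor 1 as close to an equal-weighted average of the universe.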
Factor 2 (λ₂ = 1.7134, 11.42% variance):
The top three loadings (by absolute value) are AMZN (−0.416), NVDA (−0.403), and GOOGL (−0.351), all high-growth technology names. By construction this eigenvector is orthogonal to Factor 1, so it captures return variation that is uncorrelated with the broad market mode, not movement in opposition to it. The concentration of tech names on one side suggests a growth/value tilt, although confirming that label requires the opposite-signed loadings, which are not reported here. The factor isolates co-movement among growth equities that the market-wide mode does not explain.
Factor 3 (λ₃ = 1.4928, 9.95% variance):
Loadings were not reported in the computed results, but the eigenvalue's position above the MP bound marks it as a third statistically significant source of common variance. Given the listed universe (financials JPM/BAC, energy XOM/CVX, consumer staples KO/PEP/PG, pharma JNJ/PFE, tech AAPL/MSFT/GOOGL/AMZN/META/NVDA, retail WMT), this third factor likely captures a sector or volatility regime dimension orthogonal to the first two.
Time variation: regime transition in factor count
The per-calendar-year factor count (number of eigenvalues above the MP upper bound, recomputed in-sample each year) shows a shift toward more factors in the later years:
| Year | Significant factors |
|---|---|
| 2010 | 1 |
| 2011 | 1 |
| 2012 | 1 |
| 2013 | 2 |
| 2014 | 2 |
| 2015 | 1 |
| 2016 | 2 |
| 2017 | 3 |
| 2018 | 2 |
| 2019 | 2 |
| 2020 | 2 |
| 2021 | 3 |
| 2022 | 3 |
| 2023 | 3 |
| 2024 | 3 |
From 2010 through 2016, the factor count oscillates between 1 and 2, with a single significant factor in four of the seven years. The count first reaches 3 in 2017, falls back to 2 for 2018–2020, and then holds at 3 in every year from 2021 through 2024. The first appearance of a third factor in 2017 is consistent with the post-2016 dispersion in equity returns: the rise of mega-cap tech, the divergence of growth and value, and increased sector-specific volatility. The uninterrupted three-factor run from 2021 onward suggests that the correlation structure has settled into a higher-dimensional configuration, with the third factor now a persistent feature rather than a transient perturbation.
The time series shows that the factor structure is not static. The full-period result (3 significant factors) pools all 15 years, but the year-by-year decomposition shows that the market operated in a one- or two-factor regime through 2016 and in a two- or three-factor regime afterward, with three factors in five of the eight years from 2017 to 2024. This regime dependence is economically meaningful: the number of independent sources of risk varies with market conditions.
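The regime description can be checked by tabulating the annual counts. The sketch below uses only the values in the table; the 2016/2017 split point is the one discussed in the text, and the helper names are illustrative.

```python
from collections import Counter
from itertools import groupby

# Per-year significant-factor counts, copied from the table.
factor_count = {
    2010: 1, 2011: 1, 2012: 1, 2013: 2, 2014: 2,
    2015: 1, 2016: 2, 2017: 3, 2018: 2, 2019: 2,
    2020: 2, 2021: 3, 2022: 3, 2023: 3, 2024: 3,
}

tally = Counter(factor_count.values())
three_factor_years = [year for year, k in factor_count.items() if k == 3]

def mean_count(first: int, last: int) -> float:
    """Average factor count over the inclusive year range [first, last]."""
    values = [k for year, k in factor_count.items() if first <= year <= last]
    return sum(values) / len(values)

# Longest uninterrupted run of three-factor years (dict preserves year order).
longest_run = max(
    (len(list(group)) for k, group in groupby(factor_count.values()) if k == 3),
    default=0,
)

print("years per count:", dict(tally))
print("three-factor years:", three_factor_years)
print("mean count 2010-2016:", mean_count(2010, 2016))
print("mean count 2017-2024:", mean_count(2017, 2024))
print("longest three-factor run:", longest_run)
```

The tabulation makes the shape of the transition explicit: a first three-factor year in 2017, three two-factor years, and then a four-year run at three.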
Interpretation
Spectral gap and factor hierarchy
The eigenvalue spectrum exhibits a pronounced gap between the third and fourth eigenvalues (1.4928 vs. 0.7632), which coincides with the boundary drawn by the MP threshold. The three eigenvalues above the Marchenko-Pastur upper edge are larger than finite-sample noise alone would produce under the i.i.d. null and are interpreted as genuine common factors. The remaining twelve eigenvalues all fall below the upper edge (and below the lower edge of 0.8779, since the trace is fixed at 15), so the threshold gives no basis for interpreting them economically.
The hierarchy λ₁ ≫ λ₂ > λ₃ > λ₄ indicates a dominant market factor (λ₁ = 6.4878, about 5.7 times the MP upper bound) and two secondary factors (λ₂ = 1.7134, λ₃ = 1.4928) of comparable but smaller magnitude. The market factor alone explains 43% of variance; the two secondary factors together add another 21 percentage points. This is consistent with a gravity-field analogy in which a single large "mass" (the market portfolio) exerts the strongest pull, and two smaller "masses" (growth/value tilt, sector/volatility regime) exert weaker but statistically significant forces.
Loadings and economic structure
The Factor 1 loadings are nearly uniform across all 15 assets, confirming that this is the market factor: the common mode of return variation that affects all equities equally. The near-equal loadings (0.24–0.29) indicate that within this large-cap universe, market-cap differences do not strongly modulate exposure to the market factor—all assets are "massive" enough to participate fully in the aggregate risk premium.
The Factor 2 loadings reveal a growth/value split: AMZN, NVDA, and GOOGL load negatively with magnitudes 0.35–0.42, while the remaining assets (not shown in full) load positively or near-zero. This factor captures the covariance among high-growth tech names that is orthogonal to the market. The negative sign is arbitrary (eigenvectors are defined up to sign), but the structure is clear: returns on growth equities co-move along a dimension distinct from the broad market, and this dimension explains an additional 11% of variance (λ₂ / 15 ≈ 0.11).
The third factor's loadings are not reported, but its eigenvalue (1.4928) lies about 32% above the upper edge of 1.1301. Given the universe composition, this factor likely isolates a sector or volatility regime effect: perhaps the covariance among financials (JPM, BAC) or energy (XOM, CVX), or a dimension capturing the 2020 pandemic shock and subsequent recovery. The time-series evidence (a third factor appearing in 2017 and in every year from 2021 through 2024) suggests this factor is a persistent feature of the most recent years rather than a transient artifact.
Gravity-field analogy: partial support
The question asks whether the spectrum is consistent with a gravity-field decomposition, where the largest eigenvalues correspond to market-cap-weighted "mass clusters." The evidence provides partial support:
Dominant market factor: The largest eigenvalue (6.4878) plays the role of the "center of mass" in the gravity analogy, but its near-uniform loadings make it close to an equal-weighted aggregate, not a market-cap-weighted one. All assets load on this central mode with similar strength.
Secondary factors as substructure: The second and third eigenvalues (1.7134, 1.4928) do reveal substructure, but the loadings indicate this substructure is growth/value tilt and sector/regime effects, not purely market-cap-based "mass clusters." The Factor 2 loadings concentrate on high-growth tech names (AMZN, NVDA, GOOGL), which are indeed among the largest by market cap, but the factor isolates the co-movement they share beyond the market mode, not their aggregate mass. The gravity analogy would predict that the second-largest eigenvalue corresponds to the second-largest "mass" (e.g., a mega-cap cluster), but the loadings show it captures a style tilt orthogonal to the market mode.
Spectral gap as regime boundary: The gap between λ₃ and λ₄ cleanly separates signal from noise, consistent with a finite number of "gravitational sources" (factors) and a continuum of noise (idiosyncratic returns). This is the strongest point of alignment with the gravity analogy: the spectrum exhibits a discrete set of large eigenvalues (the "masses") and a continuous distribution of small eigenvalues (the "background field").
Time variation: The shift from one or two factors per year (2010–2016) to two or three (2017–2024), with three in every year from 2021 onward, shows that the "gravitational field" is not static. The number of independent sources of risk increased over time, suggesting that the market structure became more complex, with more "masses" emerging as distinct attractors. This is consistent with the post-2016 divergence of growth and value, the rise of mega-cap tech, and the increased importance of sector-specific shocks.
The gravity-field analogy is a useful heuristic for the dominant market factor and the spectral gap, but the secondary factors are better interpreted as style and sector tilts than as market-cap-based mass clusters. The eigenvalue spectrum reveals a hierarchical factor structure (one dominant, two secondary, twelve noise), not a pure mass-distance decomposition.
What the result does NOT support
The result does not support the hypothesis that smaller eigenvalues reveal "liquidity-constrained substructure." All eigenvalues from λ₄ = 0.7632 downward fall below the Marchenko-Pastur upper edge, so none qualifies as an additional factor under the threshold used here. They in fact sit below the lower edge of 0.8779 as well, the mechanical counterpart of three large eigenvalues in a matrix whose trace is fixed at 15. If liquidity constraints were a significant source of covariance, we would expect additional eigenvalues above the MP bound, with loadings concentrated on less-liquid names. The data show no such structure: none of the twelve small eigenvalues passes the threshold.
The result also does not establish that the three factors are stable over arbitrary time horizons. The per-year factor count varies from 1 to 3, and three factors appear only in 2017 and in 2021–2024, not across the full 15 years. The full-period result (3 significant factors) pools a non-stationary process. Any application of this decomposition must account for the regime dependence: the factor structure in 2010–2016 was simpler than in 2017–2024.
Limitations
Universe size and composition: The analysis covers only 15 large-cap U.S. equities, heavily weighted toward technology (6 of the 16 listed tickers). This is a narrow slice of the equity market. A broader universe (e.g., the S&P 500 or Russell 3000) would likely reveal additional factors (small-cap, momentum, quality) and a richer spectral structure. The three-factor result is specific to this large-cap, tech-heavy universe and may not generalize.
In-sample factor count: The per-year factor counts are computed in-sample within each calendar year. This is appropriate for documenting the time variation in correlation structure, but it does not establish that the factors are predictive or stable out-of-sample. A rolling out-of-sample test (e.g., estimate factors on years t−5 to t−1, test on year t) would be needed to assess whether the factor structure is exploitable in real time.
Factor interpretation: The loadings for Factor 3 are not reported, so its economic interpretation is speculative. The eigenvalue confirms it is statistically significant, but without loadings we cannot definitively identify it as a sector, volatility, or regime factor. A complete analysis would report loadings for all significant factors.
Marchenko-Pastur assumptions: The MP null assumes independent, identically distributed returns with finite variance and no serial correlation or heteroskedasticity; Gaussianity is not required. Equity returns exhibit time-varying volatility (GARCH effects) and fat tails, which can push noise eigenvalues beyond the nominal edges. The q-ratio (0.004) is very small, so the bounds are tight, and the test is not robust to non-i.i.d. dynamics. A bootstrap or permutation test that preserves the empirical return distribution would provide a more robust threshold.
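One way to build such a threshold is a column-wise permutation null: shuffle each asset's return history independently, which keeps every marginal distribution (including fat tails) but destroys cross-sectional dependence, and take a high quantile of the largest eigenvalue across shuffles. The sketch below runs on synthetic Student-t data as a stand-in for real returns; the function name, the 200 permutations, and the 95% quantile are illustrative assumptions, not the original method. Plain shuffling also destroys volatility clustering, so it addresses fat tails but not GARCH effects; a block-based scheme would be needed for the latter.

```python
import numpy as np

def permutation_threshold(returns: np.ndarray, n_perm: int = 200,
                          quantile: float = 0.95, seed: int = 0) -> float:
    """Quantile of the largest correlation eigenvalue under column-wise shuffling."""
    rng = np.random.default_rng(seed)
    top = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle each asset's history independently: marginals are preserved,
        # cross-sectional dependence is removed.
        shuffled = np.column_stack([rng.permutation(col) for col in returns.T])
        top[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[-1]
    return float(np.quantile(top, quantile))

# Synthetic fat-tailed stand-in (Student-t, 5 d.o.f.) with one common factor.
rng = np.random.default_rng(1)
n_obs, n_assets = 1000, 15
common = rng.standard_t(df=5, size=(n_obs, 1))
returns = common + rng.standard_t(df=5, size=(n_obs, n_assets))

threshold = permutation_threshold(returns)
eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
n_above = int(np.sum(eigvals > threshold))
print(f"permutation threshold = {threshold:.3f}, eigenvalues above = {n_above}")
```

Applied to the real return matrix, the resulting threshold could be compared with the analytic edge of 1.1301 to see whether the factor count is sensitive to the distributional assumptions.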
Gravity-field analogy: The question frames the analysis as a test of a "gravity-field decomposition," but the method is standard PCA with MP thresholding; it does not explicitly model market cap as "mass" or correlation distance as "gravitational distance." The analogy is a post-hoc interpretation of the eigenvalue hierarchy, not a formal model. A true gravity-field test would require a parametric model (e.g., a network model with market-cap-weighted edges) and a likelihood-ratio test against the PCA null. The current result is consistent with the analogy but does not uniquely support it.
Sample period: The 2010–2024 window includes the post-financial-crisis recovery, the 2020 pandemic shock, and the 2022 inflation/rate-hike regime. These are large structural breaks. The three-factor result may be specific to this period and may not hold in a different macroeconomic regime (e.g., the 1990s tech boom, the 2000s housing cycle). The regime transition around 2017 suggests the factor structure is sensitive to market conditions.
Strengthening the result would require: (i) a broader universe to test generalizability, (ii) out-of-sample validation of the factor count, (iii) full loadings for all significant factors, (iv) a bootstrap or permutation test robust to non-i.i.d. returns, (v) a formal gravity-field model with a likelihood-ratio test, and (vi) replication over multiple non-overlapping sample periods to assess regime dependence.
Research evidence, not investment advice.