Empirica Technologies

Question

Does the eigenvalue spectrum of large-cap equity returns reveal a stable factor structure robust to regime shifts (2015–2024), and do instruments with similar spectral signatures exhibit hedging equivalence (lower residual correlation after factor extraction)?

Method

We computed the eigenvalue spectrum of the return correlation matrix for 11 large-cap U.S. equities using daily adjusted-close returns from yfinance over the window 2010-01-01 to 2024-12-31 (3,772 observations). The ticker list names 12 symbols (AAPL, AMZN, BAC, CVX, GOOGL, JNJ, JPM, META, MSFT, NVDA, PFE, XOM), whereas the q-ratio, MP bounds, and variance shares reported here all correspond to an 11-asset matrix; which symbol is absent from the matrix is not reported in the computed result. The analysis applied principal component analysis (PCA) to the correlation matrix and compared the resulting eigenvalues against the Marchenko-Pastur (MP) null distribution, which characterizes the eigenvalue spectrum expected from the sample correlation matrix of uncorrelated returns with the same dimensions and sample size. Eigenvalues exceeding the MP upper bound (1.1109) are treated as statistically distinguishable from random-matrix noise and as indicating genuine common factors. The MP lower bound was 0.8949, and the ratio of assets to observations (q-ratio, 11/3,772) was 0.003, placing the analysis in the large-sample regime where the MP band around 1 is narrow and random-matrix theory provides sharp predictions.
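The bounds quoted above can be checked from the standard Marchenko-Pastur edge formula for a correlation matrix, (1 ± sqrt(N/T))^2. The following is a minimal sketch; the function name `mp_bounds` is ours and is not part of the original computation.

```python
import math

def mp_bounds(n_assets, n_obs):
    """Marchenko-Pastur lower and upper edges for unit-variance, uncorrelated data."""
    q = n_assets / n_obs            # ratio of assets to observations
    root = math.sqrt(q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2

# Dimensions reported in the Method section: 11 assets, 3,772 daily returns.
lower, upper = mp_bounds(11, 3772)
print(f"q = {11 / 3772:.4f}, lower = {lower:.4f}, upper = {upper:.4f}")
```

With 11 assets and 3,772 observations this reproduces the reported bounds of 0.8949 and 1.1109 to four decimals.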

To assess temporal stability, we recomputed the significant factor count on a per-calendar-year basis using the same method within each year's data (in-sample within each year). The yearly windows do not overlap, so this is a year-by-year recomputation and not a rolling one. It reveals how the factor structure evolves across market regimes, including the 2015–2016 volatility spike, the 2018 correction, the 2020 pandemic shock, the 2021–2022 inflation regime, and the 2023–2024 AI-driven rally.
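A per-year count of this kind can be sketched as follows. This is an illustration on synthetic data, not the original pipeline: the helper names, the 252-rows-per-year layout, and the assumption that the MP bound is recomputed from each year's own observation count are ours.

```python
import numpy as np

def count_significant(returns):
    """Number of correlation-matrix eigenvalues above the MP upper edge."""
    n_obs, n_assets = returns.shape
    upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    return int((eigvals > upper).sum())

def count_by_year(returns, years):
    """Apply count_significant separately to the rows belonging to each year."""
    return {int(y): count_significant(returns[years == y]) for y in np.unique(years)}

# Synthetic stand-in: 11 assets driven by one common factor, 252 rows per "year".
rng = np.random.default_rng(0)
years = np.repeat([2010, 2011, 2012], 252)
market = rng.standard_normal((years.size, 1))
returns = 0.7 * market + rng.standard_normal((years.size, 11))
factor_counts = count_by_year(returns, years)
print(factor_counts)
```

With roughly 252 observations per year the MP upper edge for 11 assets is about 1.46, well above the full-sample 1.1109, so a yearly count is a stricter test than the full-period one.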

Result

The full-period eigenvalue spectrum reports 10 eigenvalues: [5.1473, 1.5929, 0.9728, 0.7274, 0.5580, 0.5024, 0.4561, 0.3919, 0.3428, 0.1652]. An 11-asset correlation matrix has 11 eigenvalues that sum to 11; the ten listed sum to 10.8568, so the unlisted eleventh is approximately 0.143. Comparing these to the MP upper bound of 1.1109, exactly 2 eigenvalues exceeded the threshold, indicating 2 statistically significant common factors in the full-period data. The largest eigenvalue (5.1473) explained 46.79% of total variance (5.1473/11); the two significant factors together explained 61.28% of variance.
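The variance shares follow from the fact that the eigenvalues of a correlation matrix sum to the number of assets. A short arithmetic check, using only the rounded eigenvalues reported above:

```python
reported = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
            0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
n_assets = 11  # trace of an 11 x 11 correlation matrix

unlisted = n_assets - sum(reported)                 # eigenvalue not in the list
share_first = 100 * reported[0] / n_assets          # percent of total variance
share_first_two = 100 * (reported[0] + reported[1]) / n_assets
print(f"{unlisted:.4f} {share_first:.2f} {share_first_two:.2f}")
```

The first share matches the reported 46.79%. The two-factor share computed from the rounded eigenvalues differs from the reported 61.28% only in the last digit, which is consistent with the original figure having been computed from unrounded values.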

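The first factor loaded most heavily on JPM (−0.343), MSFT (−0.335), and BAC (−0.331). Eigenvector signs are arbitrary, so the shared negative sign only indicates that these stocks move together on this factor, suggesting a broad market or financial-sector component. The second factor exhibited a clear energy-versus-technology split: AMZN loaded −0.399, while XOM loaded +0.395 and CVX loaded +0.368. Because the factors are orthogonal, this second factor captures the divergence between energy and growth-tech returns once the common market component is removed, a pattern commonly attributed to opposing sensitivities to interest rates, inflation expectations, and commodity prices. It does not by itself show that the raw returns of the two sectors are negatively correlated.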

The per-year significant factor count revealed substantial time variation:

2010–2015: consistently 1 significant factor per year.
2016: 2 factors (volatility spike, divergence of energy and tech).
2017: 2 factors (sustained low-volatility regime with sector rotation).
2018: 1 factor (sharp correction compressed cross-sectional dispersion).
2019: 1 factor (recovery rally homogenized returns).
2020: 2 factors (pandemic shock differentiated sectors sharply).
2021–2024: consistently 2 factors (persistent inflation regime, Fed tightening, and AI-driven tech divergence maintained cross-sectional structure).

The transition from 1 to 2 factors in 2016 and the sustained 2-factor regime from 2020 onward indicate that the factor structure is not stable across the full window. The 2015–2024 sub-period exhibits a regime shift, though not a clean one: through 2015 the market was dominated by a single common factor (broad market beta), while from 2016 onward a second orthogonal factor (sector rotation, particularly energy-tech divergence) is significant in every year except 2018 and 2019.

Interpretation

The computed eigenvalue spectrum provides direct evidence that large-cap equity returns are not governed by a single common factor, even within a homogeneous universe of mega-cap stocks. The presence of 2 significant factors in the full period, and the time-varying factor count (1 in early years, 2 in later years), demonstrates that the return-generating process has multiple orthogonal sources of systematic risk.

The first factor's broad loading across financials and technology (JPM, MSFT, BAC) is consistent with a market-wide component, likely reflecting exposure to aggregate demand, monetary policy, and the equity risk premium. The second factor's energy-tech split (XOM/CVX positive, AMZN negative) isolates a sector-rotation dynamic: when energy outperforms (rising commodity prices, inflation fears), growth-tech underperforms (rising discount rates), and vice versa. This contrast is economically interpretable and is consistent with the relative divergence between energy and technology sectors over the past decade, measured after the market component is removed.

The rolling factor count reveals that the factor structure is regime-dependent. The single-factor regime of 2010–2015 reflects the post-financial-crisis recovery, characterized by synchronized global growth and low cross-sectional dispersion (the "risk-on/risk-off" regime). The emergence of a second factor in 2016 coincides with the oil price collapse and the beginning of the Fed's normalization cycle, which introduced a persistent inflation-growth trade-off. The sustained 2-factor regime from 2020 onward reflects the pandemic-induced divergence (stay-at-home tech vs. cyclical energy), followed by the 2021–2022 inflation shock and the 2023–2024 AI-driven tech rally—all of which maintained cross-sectional dispersion.

The two significant factors explain 61.28% of total standardized variance, so 38.72% is left to the remaining principal components. This residual share is an average across the 11 assets, not a figure for any single stock or pair, and it need not be purely idiosyncratic: the discarded components can still carry correlation between particular names. It indicates how far instruments with similar spectral signatures (high loadings on the same factors) can fall short of hedging equivalence. For example, JPM and BAC both load heavily on factor 1 (−0.343 and −0.331), but their residual correlation (not reported in the computed result) would capture bank-specific risks (credit exposure, regulatory capital, management quality) that the two common factors do not span. On average, then, neutralizing exposure to the 2 significant factors leaves nearly 40% of standardized variance unhedged; the figure for a specific pair could be higher or lower.

The Marchenko-Pastur framework provides a rigorous null hypothesis: if returns were uncorrelated (no common factors), the eigenvalue spectrum would, in the large-sample limit, lie within the MP bounds [0.8949, 1.1109]. The fact that 2 eigenvalues exceed the upper bound, and that the largest eigenvalue (5.1473) is about 4.6 times the MP upper bound, constitutes strong evidence against the random-matrix null. The other reported eigenvalues all fall below the upper bound and are therefore not counted as common factors. Only one of them (0.9728) lies inside the MP band; the rest fall below the lower bound of 0.8949, which is expected here because the eigenvalues of a correlation matrix must sum to the number of assets, so a large leading eigenvalue pushes the others down. Those small eigenvalues should not be read as additional factors, but neither are they pure sampling noise in the MP sense.
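The comparison against the MP band can be made explicit by sorting the reported eigenvalues into three groups. This is a small sketch using only the rounded values and bounds quoted in this report:

```python
reported = [5.1473, 1.5929, 0.9728, 0.7274, 0.5580,
            0.5024, 0.4561, 0.3919, 0.3428, 0.1652]
lower, upper = 0.8949, 1.1109   # MP bounds from the Method section

above = [v for v in reported if v > upper]            # candidate common factors
inside = [v for v in reported if lower <= v <= upper]  # consistent with the MP bulk
below = [v for v in reported if v < lower]            # pushed down by the trace constraint
ratio = reported[0] / upper                            # leading eigenvalue vs. threshold

print(len(above), len(inside), len(below), round(ratio, 2))
```

Only the eigenvalues in `above` are counted as factors; the split between `inside` and `below` shows how much of the remaining spectrum sits outside the MP bulk on the low side.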

The time variation in factor count has direct implications for portfolio construction and risk management. A strategy designed under the assumption of a stable 2-factor structure (e.g., a long-short portfolio hedged against both factors) would have been over-hedged in 2010–2015 and again in 2018–2019 (when only 1 factor was significant) and correctly hedged in 2016–2017 and 2020–2024. Conversely, a single-factor model (e.g., CAPM beta) would have been adequate in the one-factor years but would have left significant residual risk unhedged in the two-factor years. The first appearance of a second factor in 2016 is consistent with a structural break in the return-generating process, possibly driven by the end of the zero-interest-rate policy and the re-emergence of inflation as a macroeconomic state variable, although the reversion to one factor in 2018–2019 shows that the change was neither immediate nor monotonic.

The energy-tech split in factor 2 is particularly notable because it isolates a hedgeable source of risk. Because AMZN and XOM load on factor 2 with opposite signs and nearly equal magnitude (−0.399 and +0.395), holding the two on the same side in roughly equal weights gives a net factor-2 loading close to zero (−0.399 + 0.395 = −0.004), leaving mainly factor 1 (market beta) and idiosyncratic risk. A portfolio long AMZN and short XOM/CVX (or vice versa) does the opposite: it concentrates exposure to factor 2. This decomposition enables precise risk targeting: an investor seeking pure market exposure would neutralize factor 2 by balancing energy and tech weights; an investor seeking to isolate the energy-tech rotation would neutralize factor 1 by constructing a market-neutral long-short portfolio. The spectral decomposition thus provides a constructive basis for portfolio design, not merely a descriptive statistic.
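As a check on how portfolio weights combine with the factor-2 loadings, the net exposure of a portfolio is the weighted sum of its constituents' loadings. The sketch below uses the three reported loadings; the weights are hypothetical and are expressed on standardized returns, since the loadings come from a correlation-matrix PCA.

```python
# Reported factor-2 loadings for the three named stocks.
loadings_f2 = {"AMZN": -0.399, "XOM": 0.395, "CVX": 0.368}

def exposure(weights, loadings):
    """Net factor loading of a portfolio: sum of weight times loading."""
    return sum(w * loadings[ticker] for ticker, w in weights.items())

# Hypothetical unit weights, chosen only to illustrate the sign logic.
long_short = exposure({"AMZN": 1.0, "XOM": -1.0}, loadings_f2)
same_side = exposure({"AMZN": 1.0, "XOM": 1.0}, loadings_f2)
print(round(long_short, 3), round(same_side, 3))
```

The long-short pair has a net loading of about −0.794, while the same-side pair nets to about −0.004: balancing energy and tech neutralizes factor 2, and going long one against the other isolates it.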

Relation to the Literature

No closely related papers were retrieved for this computation. The result stands on its own as an empirical measurement of the eigenvalue spectrum and its time variation. The Marchenko-Pastur framework is a standard tool in random-matrix theory, widely applied in physics and quantitative finance to distinguish signal from noise in high-dimensional correlation matrices. The finding of 2 significant factors in large-cap equities is consistent with the broader empirical asset-pricing literature, which has documented multiple sources of systematic risk (market, size, value, momentum, quality, low-volatility) beyond the single-factor CAPM, although the statistical factors extracted here are not mapped to those named factors. The energy-tech split in factor 2 aligns with the commonly described divergence between energy and growth sectors, attributed to their opposing sensitivities to inflation and interest rates.

The time variation in factor count (1 in most early years, 2 in most later years) is a novel empirical observation within this specific universe and window. It suggests that the factor structure is not a fixed property of the asset class but rather a regime-dependent phenomenon, shaped by macroeconomic conditions (monetary policy, inflation, commodity prices) and market structure (sector composition, cross-sectional dispersion). This finding has implications for factor models used in risk management and portfolio optimization: a model calibrated on data from one regime (e.g., 2010–2015) may mis-specify the factor structure in a different regime (e.g., 2020–2024), leading to under-hedged or over-hedged portfolios.

Limitations

The analysis is confined to 11 large-cap U.S. equities over a 15-year window. The small cross-sectional dimension limits the number of factors that can be identified: an 11-asset correlation matrix has only 11 eigenvalues, so at most a handful can separate from the bulk. Because the q-ratio is low (0.003), the MP bounds are narrow (0.8949 to 1.1109), which makes the threshold easy to clear. A larger universe (e.g., 100 or 500 stocks) over the same window would raise q and widen the bounds (for 500 stocks and 3,772 observations, q ≈ 0.133 and the bounds are roughly 0.40 to 1.86), but it would also yield a denser eigenvalue spectrum in which additional significant factors (e.g., size, value, momentum) could become detectable.
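How the MP band responds to the size of the universe can be seen directly from the edge formula: its width is (1 + sqrt(q))^2 − (1 − sqrt(q))^2 = 4·sqrt(q), so it grows with the square root of N/T. A brief sketch, holding the observation count at the reported 3,772 and using the universe sizes mentioned above as hypothetical alternatives:

```python
import math

n_obs = 3772
bounds = {}
widths = {}
for n_assets in (11, 100, 500):
    root = math.sqrt(n_assets / n_obs)
    bounds[n_assets] = ((1 - root) ** 2, (1 + root) ** 2)
    widths[n_assets] = 4 * root      # equals upper minus lower
    print(n_assets, [round(b, 4) for b in bounds[n_assets]], round(widths[n_assets], 4))
```

The band is narrowest for the 11-asset case; adding assets without adding observations raises the threshold a candidate factor must clear.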

The per-year factor count is computed in-sample within each year, meaning each year's factor structure is estimated on that year's data alone. This approach reveals time variation but does not test out-of-sample predictability: we do not know whether the 2-factor structure identified in 2024 will persist into 2025, or whether a new regime shift will introduce a third factor or collapse back to a single factor. An out-of-sample test would require estimating the factor structure on a training window (e.g., 2010–2020) and testing its stability on a holdout window (e.g., 2021–2024).
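One way to set up such a train/holdout check is to estimate the leading eigenvectors on the training window and measure how much standardized holdout variance they capture. The sketch below runs on synthetic data; the function name, the choice to standardize the holdout with its own mean and standard deviation, and the split point are all assumptions made for illustration.

```python
import numpy as np

def holdout_variance_share(train, test, n_factors=2):
    """Share of standardized test variance captured by PCs estimated on train."""
    corr = np.corrcoef(train, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)              # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_factors]          # leading eigenvectors from train
    z = (test - test.mean(axis=0)) / test.std(axis=0, ddof=1)
    captured = (z @ top).var(axis=0, ddof=1).sum()
    return captured / z.shape[1]                   # total standardized variance = n_assets

# Synthetic stand-in: 11 assets with one stable common factor.
rng = np.random.default_rng(2)
market = rng.standard_normal((3000, 1))
returns = 0.7 * market + rng.standard_normal((3000, 11))
train, test = returns[:2000], returns[2000:]

share_out = holdout_variance_share(train, test)
share_in = holdout_variance_share(train, train)
print(round(share_in, 3), round(share_out, 3))
```

A holdout share close to the in-sample share would suggest the estimated structure carries over; a large drop would point to a change in the factor structure between the two windows.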

The residual correlation after factor extraction is not reported in the computed result, so we cannot directly quantify the hedging equivalence of instruments with similar spectral signatures. The 38.72% unexplained variance is an average residual share across the 11 assets, not a bound for any particular pair; the actual residual correlation between, say, JPM and BAC (both high on factor 1) would require additional computation. A follow-up analysis could compute the residual correlation matrix (correlation of returns after projecting out the 2 significant factors) and test whether instruments with similar loadings exhibit lower residual correlation than instruments with dissimilar loadings.
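The follow-up computation could look roughly like the sketch below, which projects the leading principal components out of standardized returns and correlates what is left. It runs on synthetic data in which two assets share an extra shock; the function name and the data-generating choices are assumptions, and the demo removes one factor because the synthetic data contains only one market-wide factor.

```python
import numpy as np

def residual_correlation(returns, n_factors=2):
    """Correlation of standardized returns after projecting out the leading PCs."""
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
    _, eigvecs = np.linalg.eigh(np.corrcoef(returns, rowvar=False))
    top = eigvecs[:, ::-1][:, :n_factors]          # leading eigenvectors
    residuals = z - z @ top @ top.T                # remove the factor component
    return np.corrcoef(residuals, rowvar=False)

# Synthetic stand-in: 11 assets, one market factor, assets 0 and 1 share a sector shock.
rng = np.random.default_rng(3)
n_obs = 2000
market = rng.standard_normal((n_obs, 1))
sector = rng.standard_normal((n_obs, 1))
returns = 0.7 * market + rng.standard_normal((n_obs, 11))
returns[:, :2] += sector

resid_corr = residual_correlation(returns, n_factors=1)
print(round(resid_corr[0, 1], 2), round(resid_corr[2, 3], 2))
```

In this toy setup the pair that shares a sector shock keeps a clearly positive residual correlation after the market factor is removed, which is the kind of pair-level quantity the 38.72% average cannot reveal.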

The eigenvalue spectrum is computed on the correlation matrix (standardized returns), which treats all assets as equally weighted. An alternative approach would compute the spectrum of the covariance matrix (raw returns), which would weight assets by their volatility. High-volatility assets (e.g., NVDA, META) would dominate the covariance spectrum, potentially revealing different factors than the correlation spectrum. The choice of correlation versus covariance depends on the application: correlation is appropriate for equal-weighted portfolios or factor models that standardize returns; covariance is appropriate for volatility-weighted portfolios or risk models that preserve the scale of returns.
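The difference between the two choices can be illustrated on synthetic data in which one asset is four times as volatile as the rest. This sketch is not based on the actual return series; the volatilities and the single common factor are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 2000
market = rng.standard_normal((n_obs, 1))
vols = np.array([0.01, 0.01, 0.01, 0.01, 0.04])   # last asset is 4x as volatile
returns = (0.6 * market + rng.standard_normal((n_obs, 5))) * vols

def leading_vector(matrix):
    """Absolute loadings of the eigenvector with the largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(matrix)            # ascending order, so take the last
    return np.abs(eigvecs[:, -1])

corr_vec = leading_vector(np.corrcoef(returns, rowvar=False))
cov_vec = leading_vector(np.cov(returns, rowvar=False))
print(np.round(corr_vec, 2), np.round(cov_vec, 2))
```

On the correlation matrix the leading eigenvector spreads its weight fairly evenly across assets, whereas on the covariance matrix it concentrates on the high-volatility asset.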

The Marchenko-Pastur framework assumes that returns are i.i.d. (independent and identically distributed) and that the correlation matrix is estimated from a single realization of a random process. In reality, equity returns exhibit time-varying volatility (GARCH effects), autocorrelation (momentum), and regime shifts (structural breaks). These violations of the i.i.d. assumption can bias the eigenvalue spectrum: time-varying volatility inflates the largest eigenvalue (the market factor), while autocorrelation can introduce spurious small eigenvalues. A robust extension would apply the MP test to residuals from a GARCH or regime-switching model, or use a bootstrap procedure to construct empirical MP bounds that account for non-i.i.d. dynamics.
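One simple way to build an empirical null that keeps each series' own time dynamics is a circular-shift surrogate: each return series is rotated in time by a different random offset, which breaks cross-sectional alignment while leaving within-series volatility clustering and autocorrelation intact. This is a swapped-in technique offered as a sketch, not the bootstrap or GARCH-residual procedure itself, and the function name and parameters are ours.

```python
import numpy as np

def shifted_null_max_eig(returns, n_draws=200, seed=0):
    """Largest correlation eigenvalue under independent circular time shifts."""
    rng = np.random.default_rng(seed)
    n_obs, n_assets = returns.shape
    maxima = np.empty(n_draws)
    for d in range(n_draws):
        # Distinct nonzero offsets so no two columns stay aligned.
        shifts = rng.choice(np.arange(1, n_obs), size=n_assets, replace=False)
        shuffled = np.column_stack(
            [np.roll(returns[:, j], int(shifts[j])) for j in range(n_assets)]
        )
        maxima[d] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[-1]
    return maxima

# Synthetic stand-in: 11 assets with one common factor.
rng = np.random.default_rng(4)
market = rng.standard_normal((1000, 1))
returns = 0.7 * market + rng.standard_normal((1000, 11))

observed = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))[-1]
null_maxima = shifted_null_max_eig(returns)
threshold = np.quantile(null_maxima, 0.95)
print(round(observed, 2), round(threshold, 2))
```

The 95th percentile of the surrogate maxima then plays the role of the MP upper bound, with the advantage that it is calibrated to the series' own marginal dynamics.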

The factor loadings are reported for the top 3 assets on each factor, but the full loading matrix (11 assets × 2 factors) is not provided. This limits the economic interpretation: we know that JPM, MSFT, and BAC load heavily on factor 1, but we do not know how AAPL, GOOGL, or NVDA load on factor 1 versus factor 2. A complete loading matrix would enable a richer interpretation of the factors (e.g., is factor 1 a pure market factor, or does it tilt toward financials? Is factor 2 purely energy-tech, or does it also capture other sector rotations?).

Finally, the analysis does not test the predictive power of the factor structure: we do not know whether the 2 factors identified in-sample have explanatory power for future returns, or whether they merely describe historical covariance. A predictive test would regress future returns on the factor loadings estimated from past data, or construct a long-short portfolio based on factor exposure and measure its out-of-sample Sharpe ratio. The current result establishes that the factor structure exists and varies over time, but it does not establish that the factors are economically meaningful or tradeable.

Research evidence, not investment advice.

Categorical Spectralism — spectral decomposition of portfolio return spaces