Empirica Technologies

Question

Does the eigenvalue spectrum of the return correlation matrix across a diversified universe exhibit a factor structure that is distinguishable from random-matrix noise and stable across market regimes, as required by the categorical claim that spectral density yields a codomain-complete basis for portfolio returns?

Method

We computed the eigenvalue spectrum of the daily return correlation matrix for a 10-asset universe (AAPL, AMZN, CVX, GOOGL, JNJ, JPM, KO, MSFT, PG, XOM) over the window 2010-01-01 to 2024-12-31 (3,772 observations). The q-ratio (n_assets / n_obs) is 10 / 3,772 ≈ 0.0027 (0.003 to one significant figure), placing the analysis in the large-sample regime where random-matrix theory provides a sharp null hypothesis.

We applied principal component analysis to the correlation matrix and tested the eigenvalue spectrum against the Marchenko-Pastur (MP) distribution, the limiting eigenvalue density for a correlation matrix of purely random, uncorrelated returns. Under the MP null, eigenvalues lie within the interval [λ_min, λ_max], where λ_min = (1 − √q)² and λ_max = (1 + √q)² and q is the q-ratio. For our data, the MP upper bound is 1.1056 and the lower bound is 0.8997. Eigenvalues exceeding the upper bound are statistically distinguishable from random-matrix noise and indicate the presence of genuine common factors driving covariation beyond what random fluctuation would produce.
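The two bounds quoted above can be reproduced from the asset and observation counts alone. The following is a minimal sketch, assuming the standard MP edges for unit-variance returns; the function name marchenko_pastur_bounds is illustrative and not taken from the original analysis.

```python
import math

def marchenko_pastur_bounds(n_assets, n_obs):
    """Return (lower, upper) MP edges for a correlation matrix of uncorrelated returns."""
    q = n_assets / n_obs                      # q-ratio, here 10 / 3772
    lower = (1.0 - math.sqrt(q)) ** 2
    upper = (1.0 + math.sqrt(q)) ** 2
    return lower, upper

mp_lower, mp_upper = marchenko_pastur_bounds(n_assets=10, n_obs=3772)
print(round(mp_lower, 4), round(mp_upper, 4))  # 0.8997 1.1056
```

Note that the bounds are computed from the unrounded q-ratio; plugging in the rounded value 0.003 would give a slightly wider interval.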

To assess regime stability, we recomputed the eigenvalue spectrum and factor count separately for each calendar year (2010–2024), applying the same method in-sample within each year. Because the yearly windows do not overlap, this is a year-by-year subsample analysis rather than a rolling-window one. It reveals whether the number of significant factors and their explanatory power vary systematically with market conditions.
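A year-by-year factor count along these lines might look like the sketch below. It assumes the MP upper bound is recomputed from each year's own observation count, which the text does not state explicitly; with roughly 252 trading days and 10 assets that bound is about 1.44 rather than 1.1056. The function names and the synthetic one-factor data are illustrative only.

```python
import numpy as np

def count_significant_factors(returns):
    """Count correlation-matrix eigenvalues above the MP upper edge for one window."""
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)          # columns are assets
    eigenvalues = np.linalg.eigvalsh(corr)
    mp_upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2  # bound for this window's q-ratio
    return int(np.sum(eigenvalues > mp_upper))

def factor_count_by_year(returns, years):
    """Map each calendar year to its significant-factor count."""
    return {int(y): count_significant_factors(returns[years == y])
            for y in np.unique(years)}

# Synthetic illustration (not the report's data): one common factor plus noise.
rng = np.random.default_rng(0)
years = np.repeat([2010, 2011, 2012], 252)
common = rng.standard_normal((years.size, 1))
returns = common + rng.standard_normal((years.size, 10))
counts = factor_count_by_year(returns, years)
print(counts)
```

If the full-sample bound were reused in every year instead, the short windows would tend to flag more eigenvalues as significant, so the choice of bound matters when comparing counts across window lengths.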

Result

The full-sample eigenvalue spectrum is: [4.6999, 1.4355, 1.0925, 0.522, 0.4955, 0.4741, 0.3945, 0.3756, 0.3462, 0.1641]. Two eigenvalues exceed the MP upper bound of 1.1056: the first eigenvalue (4.6999) and the second eigenvalue (1.4355). The third eigenvalue (1.0925) falls just below the threshold. Thus, n_significant_factors = 2 over the full sample.

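The top factor explains 47% of total variance; the two significant factors together explain 61.35% of variance. Of the remaining eight eigenvalues, the third (1.0925) lies inside the MP interval and the other seven fall below the lower bound of 0.8997. None exceeds the upper bound, so none is counted as a significant factor. Eigenvalues below the lower bound are expected here: the eigenvalues of a correlation matrix sum to the number of assets (10), so two large eigenvalues push the rest down.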
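The factor count and variance shares follow directly from the reported spectrum. As a quick arithmetic check, using only the eigenvalues and bound given above:

```python
eigenvalues = [4.6999, 1.4355, 1.0925, 0.522, 0.4955,
               0.4741, 0.3945, 0.3756, 0.3462, 0.1641]
mp_upper = 1.1056

total = sum(eigenvalues)                                  # close to 10, the number of assets
significant = [ev for ev in eigenvalues if ev > mp_upper]
top_share = eigenvalues[0] / total
significant_share = sum(significant) / total

print(len(significant))                  # number of significant factors
print(round(100 * top_share, 1))         # percent of variance, top factor
print(round(100 * significant_share, 2)) # percent of variance, significant factors
```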

Factor loadings reveal economic structure:

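Factor 1 loads most heavily on MSFT (−0.349), JPM (−0.334), and CVX (−0.330). The shared negative sign is immaterial, because an eigenvector is defined only up to an overall sign. What matters is that the largest loadings have the same sign and magnitudes close to the 0.316 (1/√10) of an equal-weighted vector over 10 assets. This factor captures broad market comovement, with balanced representation across technology, financials, and energy.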
Factor 2 loads most heavily on AMZN (−0.48), XOM (+0.395), and GOOGL (−0.382). The sign contrast between XOM and the technology names suggests a sector-rotation or energy-versus-growth dimension.

Time variation in the factor count is substantial. The per-year significant factor count is:

2010–2016: 1 factor every year.
2017–2018: 2 factors.
2019: 1 factor.
2020: 2 factors.
2021–2024: oscillates between 2 and 3 factors (3 in 2021, 2 in 2022, 3 in 2023, 3 in 2024).

The factor count increases over time, rising from a stable single-factor regime in the early 2010s to a two-to-three-factor regime in the 2020s. This is not regime-invariant stability; it is regime-dependent evolution.
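The upward drift can be summarized by averaging the reported counts over sub-periods. The sub-period boundaries below are an illustrative choice, not part of the original analysis.

```python
# Per-year significant factor counts as reported above.
factor_counts = {
    2010: 1, 2011: 1, 2012: 1, 2013: 1, 2014: 1, 2015: 1, 2016: 1,
    2017: 2, 2018: 2, 2019: 1, 2020: 2,
    2021: 3, 2022: 2, 2023: 3, 2024: 3,
}

def mean_count(first_year, last_year):
    """Average factor count over an inclusive range of years."""
    values = [c for y, c in factor_counts.items() if first_year <= y <= last_year]
    return sum(values) / len(values)

early = mean_count(2010, 2016)
middle = mean_count(2017, 2020)
late = mean_count(2021, 2024)
print(early, middle, late)  # 1.0 1.75 2.75
```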

Interpretation

The eigenvalue spectrum provides clear evidence of non-random factor structure: two eigenvalues exceed the Marchenko-Pastur upper bound, and together they explain 61.35% of variance. This is far more than the 20% (2/10) that two eigenvalues would account for under a flat spectrum in which every eigenvalue equals 1. The random-matrix null is therefore rejected for these two factors.

However, the regime-invariance claim is not supported. The number of significant factors varies from 1 to 3 across calendar years, with a clear upward trend. The early 2010s exhibit a single dominant factor (broad market beta), while the 2020s exhibit two to three factors, suggesting increased differentiation—possibly reflecting the rise of distinct technology/growth versus value/energy dynamics, or the impact of monetary policy regime shifts (zero rates → tightening). The factor structure is not a fixed, codomain-complete basis; it is time-varying and regime-dependent.

The loadings are economically interpretable. Factor 1 is a market factor with broad cross-sector exposure. Factor 2 captures a sector-rotation or style dimension (energy positive, technology negative), consistent with known energy-versus-growth divergences in the 2020s. These are not arbitrary linear combinations; they correspond to economically meaningful sources of covariation.

The codomain-completeness claim requires careful interpretation. The two significant factors account for 61.35% of total variance, not 100%. The remaining 38.65% is distributed across eight eigenvalues, none of which exceeds the MP upper bound. If "codomain-complete" means the significant factors form a sufficient basis for the non-random component of returns, the claim is supported: the significant subspace captures the structured covariation, and the residual is not distinguishable from noise by the MP test. If "codomain-complete" means the factors span the entire return space (including noise), the claim is false: the spectrum is not degenerate, and the noise subspace is non-trivial.
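The two readings can be made concrete by splitting a correlation matrix into the part spanned by its top eigenvectors and the residual. The sketch below uses a small hypothetical 4-asset matrix rather than the report's data, and the helper name split_signal_noise is an assumption for illustration.

```python
import numpy as np

def split_signal_noise(corr, n_signal):
    """Split a correlation matrix into a top-k (signal) part and a residual (noise) part."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]              # re-sort to descending
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    top_vecs = eigenvectors[:, :n_signal]
    signal = (top_vecs * eigenvalues[:n_signal]) @ top_vecs.T
    noise = corr - signal
    signal_share = eigenvalues[:n_signal].sum() / eigenvalues.sum()
    return signal, noise, signal_share

# Hypothetical correlation matrix, for illustration only.
corr = np.array([[1.0, 0.6, 0.5, 0.1],
                 [0.6, 1.0, 0.4, 0.2],
                 [0.5, 0.4, 1.0, 0.0],
                 [0.1, 0.2, 0.0, 1.0]])
signal, noise, share = split_signal_noise(corr, n_signal=2)
print(np.linalg.matrix_rank(signal), round(float(np.trace(noise)), 4), round(float(share), 4))
```

Under the first reading, the question is whether the residual matrix is indistinguishable from noise. Under the second, the claim fails whenever the residual has non-zero trace, as it does whenever fewer factors are kept than there are assets and the spectrum is not degenerate.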

The persistence of a leading factor is notable. Every calendar year has at least one significant eigenvalue, and over the full sample the first eigenvalue explains about 47% of variance, suggesting a persistent market-wide factor. Per-year values of the first eigenvalue are not reported here, so its year-to-year stability is inferred rather than measured. The instability is in the second and third eigenvalues, which move in and out of significance. This is consistent with a stable core (market beta) and a time-varying periphery (sector/style factors).

Relation to the Literature

No closely related papers were retrieved, so this result stands on its own computational evidence. The Marchenko-Pastur framework is standard in random-matrix theory applied to finance (originating in physics and applied to financial correlation matrices from the late 1990s onward), but the specific question of whether the factor structure is regime-invariant is an empirical claim tested here directly.

The finding of a small number of significant factors in a 10-asset universe (two over the full sample, one to three in individual years) is consistent with the general empirical literature on equity factor models, which typically identifies a small number of common factors (market, size, value, momentum, quality) driving the bulk of covariation. The time variation in factor count aligns with evidence that factor premia and correlations are regime-dependent, varying with monetary policy, volatility regimes, and macroeconomic cycles.

The sector-rotation interpretation of Factor 2 (energy versus technology) is consistent with observed divergences in the 2020s, when energy outperformed during inflation/rate-hike periods while technology underperformed, and vice versa during easing. This is not a novel finding, but the spectral decomposition provides a clean, model-free quantification of the phenomenon.

Limitations

Sample size and universe choice: The analysis uses 10 assets over 15 years. While the q-ratio (0.003) is favorable for random-matrix inference, the universe is small and hand-selected (large-cap U.S. equities across sectors). A larger, more representative universe (e.g., the S&P 500 or a global equity index) would provide a more comprehensive test of the codomain-completeness claim. The current universe is diversified by sector but not by market cap, geography, or asset class.

In-sample yearly windows: The per-year factor counts are computed in-sample within each year. This is not an out-of-sample test of regime stability. A true regime-invariance test would require estimating the factor structure in one period and validating it in a subsequent, non-overlapping period. The current analysis shows that the factor count varies when recomputed on different subsamples, but it does not test whether a factor model estimated in one regime predicts covariation in another.

Eigenvalue threshold sensitivity: The Marchenko-Pastur bound is a sharp threshold only in the asymptotic limit; in a finite sample, the eigenvalues estimated from the data carry sampling error. The third eigenvalue (1.0925) is just below the upper bound (1.1056), a gap of 0.0131. Small perturbations in the data or the q-ratio could shift this eigenvalue across the threshold, changing the factor count from 2 to 3. A bootstrap confidence interval for the third eigenvalue would quantify this sensitivity.
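One way to gauge this sensitivity is a percentile bootstrap over trading days. The sketch below is a rough illustration on synthetic data, not the original procedure. It assumes days can be resampled independently, which ignores serial dependence and volatility clustering in real returns, so a block bootstrap may be more appropriate in practice. The function name and parameters are hypothetical.

```python
import numpy as np

def bootstrap_eigenvalue_ci(returns, rank, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for the rank-th largest correlation eigenvalue (rank=1 is largest)."""
    rng = np.random.default_rng(seed)
    n_obs = returns.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n_obs, size=n_obs)       # resample days with replacement
        eig = np.linalg.eigvalsh(np.corrcoef(returns[rows], rowvar=False))
        draws[b] = eig[-rank]                           # eigvalsh sorts ascending
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

# Synthetic returns (one common factor plus noise), for illustration only.
rng = np.random.default_rng(1)
returns = rng.standard_normal((1000, 10)) + rng.standard_normal((1000, 1))
low, high = bootstrap_eigenvalue_ci(returns, rank=3, n_boot=200)
print(low, high)
```

If the resulting interval for the third eigenvalue straddles the MP upper bound, the factor count of 2 versus 3 should be treated as undetermined at that confidence level.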

Economic interpretation of factors: The loadings are reported for the top three assets per factor, but the full loading vectors are not analyzed. A complete interpretation would examine all loadings, assess their stability over time, and test whether they align with known economic factors (e.g., regressing the factor returns on Fama-French factors or macroeconomic variables). The current interpretation is suggestive, not definitive.

Categorical claim: The phrase "codomain-complete basis" is not standard in the empirical finance literature, and its precise meaning is ambiguous. If it means the significant factors span the non-random subspace, the claim is supported. If it means the factors span the entire return space (including noise), the claim is false: the two significant eigenvectors span only a two-dimensional subspace of the ten-dimensional return space, and the remaining eight directions carry 38.65% of the variance. Clarifying the categorical claim would sharpen the test.

Regime definition: The per-year factor counts treat calendar years as regimes, but this is an arbitrary partition. Market regimes (bull/bear, low/high volatility, easing/tightening) do not align with calendar boundaries. A more principled regime definition (e.g., NBER recessions, VIX quantiles, or a hidden Markov model) would provide a stronger test of regime invariance.

Strengthening the result: To validate regime invariance, one would need to: (1) define regimes ex-ante using economic or statistical criteria, (2) estimate the factor structure in one regime, (3) test whether the same factor count and loadings hold out-of-sample in a different regime, and (4) repeat across multiple regime pairs. The current analysis shows time variation, which is evidence against regime invariance, but it does not test the stronger claim that a single factor model generalizes across regimes.
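Steps (2) and (3) could be sketched as follows: estimate the top eigenvectors in one regime, then measure how well they align with, and how much variance they capture in, another regime. This is an illustrative outline on synthetic data. The helper names are hypothetical, and defining the regimes themselves (step 1) is left out.

```python
import numpy as np

def top_eigenvectors(returns, k):
    """Top-k eigenvectors (as columns) of the sample correlation matrix."""
    _, vecs = np.linalg.eigh(np.corrcoef(returns, rowvar=False))
    return vecs[:, ::-1][:, :k]                        # reverse to descending order

def subspace_overlap(basis_a, basis_b):
    """Cosines of the principal angles between two orthonormal bases (1 means aligned)."""
    return np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)

def out_of_sample_share(returns_b, basis_a):
    """Share of regime-B variance captured by factors estimated in regime A."""
    corr_b = np.corrcoef(returns_b, rowvar=False)
    return np.trace(basis_a.T @ corr_b @ basis_a) / np.trace(corr_b)

# Two synthetic "regimes" sharing the same one-factor structure.
rng = np.random.default_rng(2)
regime_a = rng.standard_normal((500, 10)) + rng.standard_normal((500, 1))
regime_b = rng.standard_normal((500, 10)) + rng.standard_normal((500, 1))

basis_a = top_eigenvectors(regime_a, k=1)
basis_b = top_eigenvectors(regime_b, k=1)
overlap = subspace_overlap(basis_a, basis_b)
share = out_of_sample_share(regime_b, basis_a)
print(overlap, share)
```

Overlap cosines near 1 and an out-of-sample share close to the in-sample share would support invariance for that regime pair; low values on either measure would count against it.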

Research evidence, not investment advice.

Categorical Spectralism — spectral decomposition of portfolio return spaces