Empirica Technologies

Question

Does the eigenvalue spectrum of a 10-asset cross-sector portfolio exhibit a statistically significant spectral gap separating structured common-factor dynamics from random-matrix noise, and how does the number of significant factors evolve over time in relation to market regime changes?

Method

We computed the eigenvalue spectrum of the return correlation matrix for a 10-asset portfolio spanning technology (AAPL, AMZN, GOOGL, MSFT), finance (JPM), energy (CVX, XOM), pharmaceuticals (JNJ, PFE), and consumer goods (KO). The data source is yfinance daily adjusted-close returns over the window 2010-01-01 through 2024-12-31, yielding n=3,772 observations and a q-ratio (assets/observations) of 10/3,772 ≈ 0.00265, or 0.003 to three decimal places.

The method applies principal component analysis to the correlation matrix and tests eigenvalues against the Marchenko-Pastur (MP) null distribution, which characterizes the eigenvalue spectrum of a purely random correlation matrix. Under the MP null, eigenvalues should fall within the bounds [(1 − √q)², (1 + √q)²]; with the unrounded q = 10/3,772 this gives [0.8997, 1.1056]. Eigenvalues exceeding the upper bound of 1.1056 are treated as distinguishable from noise and interpreted as genuine common factors driving cross-asset comovement.
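The bounds quoted above can be reproduced from the asset and observation counts alone. The following is a minimal sketch, assuming the standard unit-variance Marchenko-Pastur edges (1 ± √q)²; the function name `marchenko_pastur_bounds` is hypothetical and not taken from the original computation.

```python
import math


def marchenko_pastur_bounds(n_assets: int, n_obs: int) -> tuple[float, float]:
    """Return the (lower, upper) Marchenko-Pastur edges for q = n_assets / n_obs.

    Assumes a correlation matrix, i.e. unit variance on every series.
    """
    q = n_assets / n_obs
    root_q = math.sqrt(q)
    return (1.0 - root_q) ** 2, (1.0 + root_q) ** 2


# Counts taken from the Method section: 10 assets, 3,772 daily observations.
lower, upper = marchenko_pastur_bounds(n_assets=10, n_obs=3772)
print(f"q = {10 / 3772:.5f}, bounds = [{lower:.4f}, {upper:.4f}]")
# q = 0.00265, bounds = [0.8997, 1.1056]
```

The reported bounds match the unrounded ratio; plugging in the rounded q = 0.003 would give a slightly wider interval.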

To assess time variation, we recomputed the spectrum annually on in-sample data within each calendar year from 2010 through 2024, counting the number of significant factors (eigenvalues above the MP upper bound) in each year.
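The annual recount can be sketched as follows. This is an illustration on synthetic one-factor data, not the original computation: the text does not state which MP bound was applied to the annual windows, so the sketch assumes the bound is recomputed from each window's own length. The names `count_significant` and `annual_factor_counts` are hypothetical.

```python
import numpy as np


def mp_upper(n_assets: int, n_obs: int) -> float:
    """Upper Marchenko-Pastur edge for a correlation matrix."""
    return (1.0 + np.sqrt(n_assets / n_obs)) ** 2


def count_significant(returns: np.ndarray) -> int:
    """Count correlation eigenvalues above the MP upper edge.

    `returns` has shape (n_obs, n_assets). Assumption: the edge uses
    the q-ratio of this window, not the full-sample q.
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    return int(np.sum(eigvals > mp_upper(n_assets, n_obs)))


def annual_factor_counts(returns: np.ndarray, years: np.ndarray) -> dict:
    """Apply count_significant separately to the rows of each calendar year."""
    return {int(y): count_significant(returns[years == y]) for y in np.unique(years)}


# Synthetic stand-in for the real data: one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
n_years, days_per_year, n_assets = 3, 252, 10
years = np.repeat(np.arange(2010, 2010 + n_years), days_per_year)
market = rng.standard_normal((n_years * days_per_year, 1))
noise = rng.standard_normal((n_years * days_per_year, n_assets))
returns = 0.7 * market + noise

counts = annual_factor_counts(returns, years)
print(counts)
```

Under that assumption, a window of roughly 252 daily observations on 10 assets has an upper edge near 1.44 rather than 1.1056, so annual counts depend on which bound is used.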

Result

The full-sample eigenvalue spectrum exhibits a clear spectral gap. The top two eigenvalues are 4.6758 and 1.4407, both exceeding the MP upper bound of 1.1056. The third eigenvalue is 0.9756, falling within the MP null range [0.8997, 1.1056]. The remaining seven eigenvalues (0.5835, 0.511, 0.4751, 0.4302, 0.391, 0.3526, 0.1644) fall below the lower bound of 0.8997, so none of them exceeds the upper bound. This yields n_significant_factors = 2.

The first factor explains 46.76% of total variance (variance_explained_top1 = 0.4676), and the two significant factors together explain 61.16% (variance_explained_significant = 0.6116). The remaining 38.84% of variance belongs to the eight eigenvalues at or below the MP upper bound and is treated as noise.
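The classification and variance shares above follow directly from the reported eigenvalues. A minimal sketch of that arithmetic, using only the numbers quoted in this section (the variable names are illustrative):

```python
# Full-sample eigenvalues and MP bounds as reported in the text.
eigenvalues = [4.6758, 1.4407, 0.9756, 0.5835, 0.511,
               0.4751, 0.4302, 0.391, 0.3526, 0.1644]
mp_lower, mp_upper = 0.8997, 1.1056

above = [ev for ev in eigenvalues if ev > mp_upper]
within = [ev for ev in eigenvalues if mp_lower <= ev <= mp_upper]
below = [ev for ev in eigenvalues if ev < mp_lower]

# The eigenvalues of a 10x10 correlation matrix sum to its trace, 10;
# the rounded values reported here sum to just under 10.
total = sum(eigenvalues)
share_top1 = eigenvalues[0] / total
share_significant = sum(above) / total

print(len(above), len(within), len(below))
# 2 1 7
print(f"{share_top1:.3f} {share_significant:.3f}")
# 0.468 0.612
```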

Factor loadings reveal economic structure:

Factor 1 loads most heavily on MSFT (-0.347), JPM (-0.339), and CVX (-0.336), with similar magnitudes across tech, finance, and energy. Because the overall sign of an eigenvector is arbitrary, the shared negative sign carries no economic meaning. This factor captures broad market comovement: a common systematic risk component affecting all sectors.
Factor 2 exhibits sector differentiation: AMZN (-0.472) and GOOGL (-0.38) load negatively (tech growth), while XOM loads positively (0.402, energy value). This factor separates growth-oriented technology from value-oriented energy, consistent with a growth-versus-value rotation dynamic.
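Loadings of this kind are the eigenvectors of the correlation matrix, and each eigenvector is defined only up to an overall sign. The sketch below shows one way to extract the top-k loadings and fix a sign convention. It uses a toy 3-asset equicorrelation matrix, not the portfolio data, and the helper name `factor_loadings` and the chosen convention are assumptions for illustration.

```python
import numpy as np


def factor_loadings(corr: np.ndarray, k: int):
    """Return the top-k eigenvalues and eigenvectors of a correlation matrix.

    Sign convention (an assumption of this sketch): each eigenvector is
    flipped so that its largest-magnitude entry is positive.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    vals, vecs = eigvals[order], eigvecs[:, order]
    pivot = np.abs(vecs).argmax(axis=0)          # row of the largest entry per column
    signs = np.sign(vecs[pivot, np.arange(k)])
    return vals, vecs * signs


# Toy example: three assets with pairwise correlation 0.5.
corr = np.full((3, 3), 0.5)
np.fill_diagonal(corr, 1.0)
vals, loadings = factor_loadings(corr, k=1)
print(vals[0], loadings[:, 0])
```

For this toy matrix the top eigenvalue is 2 and the loadings are equal across assets, the pattern expected of a pure market factor.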

Time variation in factor count:

The annual recomputation shows the number of significant factors rising from 1 (2010–2016) to 2 (2017), reverting to 1 (2018–2019), rising again to 2 (2020–2023), and reaching 3 in 2024:

2010–2016: 1 factor (stable low-volatility regime)
2017: 2 factors (emergence of sector differentiation)
2018: 1 factor (market-wide stress, convergence to single risk factor)
2019: 1 factor (post-stress stabilization)
2020: 2 factors (COVID-19 shock, divergence between sectors)
2021–2023: 2 factors (persistent inflation/rate regime, sustained sector rotation)
2024: 3 factors (increased complexity, possible emergence of third structural dimension)

The increase from 1 to 2 factors in 2017 coincides with the end of the post-crisis low-volatility regime and the beginning of sector-differentiated performance. The spike to 3 factors in 2024 suggests an additional structural dimension beyond market beta and growth-value rotation, potentially related to AI-driven tech differentiation or energy transition dynamics.

Interpretation

The spectral gap is clear: two eigenvalues (4.6758 and 1.4407) exceed the Marchenko-Pastur upper bound of 1.1056, while the remaining eight fall within or below the noise range (one within it, seven below its lower bound). This is consistent with the existence of a "structural morphism regime", a low-dimensional subspace of common payoff-exchange patterns, distinct from the high-dimensional noise regime.

What the result supports:

Dimensionality reduction is statistically justified. The portfolio's return dynamics are well-approximated by a two-factor model (full sample) or one-to-three factors (time-varying), not the full 10-dimensional space. The 61.16% variance explained by two factors represents genuine cross-asset structure, not spurious correlation.
Sector diversification does not eliminate common factors. Despite spanning five sectors, the portfolio exhibits strong comovement: the first factor alone captures 46.76% of variance, indicating that sector labels do not fully insulate assets from shared systematic risk.
Factor structure is time-varying and regime-dependent. The increase from one to two factors in 2017 and the jump to three in 2024 indicate that the dimensionality of the structural regime is not constant. Periods of market stress (2018) compress the spectrum toward a single dominant factor, while periods of sector rotation (2020–2024) expand it.

What the result does NOT support:

The result does not identify the economic drivers of the factors. While the loadings suggest a market factor and a growth-value factor, the PCA is purely statistical. The factors are linear combinations of returns, not exogenous economic variables. Causal interpretation requires additional modeling.
The result does not predict future factor counts or loadings. The time variation is in-sample within each year. The increase to three factors in 2024 does not imply persistence into 2025.
The result does not address liquidity constraints or margin-binding events. The computation question asked whether the spectral gap correlates with liquidity constraints and margin events, but the computed result contains no liquidity or margin data. The eigenvalue spectrum characterizes return comovement, not trading frictions or leverage constraints. Answering the liquidity question requires additional data (bid-ask spreads, margin call frequencies, funding costs) not present in the yfinance return series.

Relation to the Literature

The result aligns with [P2]'s finding that empirically observed eigenvalue bulks in financial correlation matrices emerge as superpositions of smaller cluster structures, not pure noise. Our two-factor result (4.6758, 1.4407) above the MP bound, with the remaining eight eigenvalues at or below the MP upper bound, mirrors [P2]'s interpretation: the "bulk" of small eigenvalues is broadly consistent with random-matrix theory, while the large eigenvalues reflect genuine cross-correlations between stocks. [P2] attributes these large eigenvalues to factor models, which our loadings corroborate, with Factor 1 as a market factor and Factor 2 as a sector rotation factor.

[P1] applies random matrix theory to cryptocurrency portfolios, filtering correlation matrices to enhance risk-return profiles. Our result extends this logic to equities: the spectral gap identifies which dimensions of the correlation matrix contain signal (the two factors above the MP bound) versus noise (the eight below). A portfolio construction strategy could overweight the two significant eigenvectors and underweight the noise dimensions, analogous to [P1]'s filtering.
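One common way to filter a correlation matrix along these lines is eigenvalue clipping. The sketch below illustrates that general technique; it is not necessarily the filter used in [P1]. The function name `clip_correlation` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def clip_correlation(corr: np.ndarray, n_obs: int) -> np.ndarray:
    """Eigenvalue clipping: flatten eigenvalues at or below the MP upper edge.

    Eigenvalues above the edge are kept. The others are replaced by their
    mean, which preserves the trace before the diagonal is rescaled to 1.
    """
    n_assets = corr.shape[0]
    upper = (1.0 + np.sqrt(n_assets / n_obs)) ** 2
    eigvals, eigvecs = np.linalg.eigh(corr)
    noise = eigvals <= upper
    clipped = eigvals.copy()
    if noise.any():
        clipped[noise] = eigvals[noise].mean()
    filtered = (eigvecs * clipped) @ eigvecs.T
    # Rescale so the result is again a correlation matrix (unit diagonal).
    scale = np.sqrt(np.diag(filtered))
    return filtered / np.outer(scale, scale)


# Synthetic one-factor returns standing in for real data.
rng = np.random.default_rng(1)
n_obs, n_assets = 500, 10
returns = 0.7 * rng.standard_normal((n_obs, 1)) + rng.standard_normal((n_obs, n_assets))
corr = np.corrcoef(returns, rowvar=False)
filtered = clip_correlation(corr, n_obs)
print(np.round(np.linalg.eigvalsh(filtered)[::-1], 3))
```

The filtered matrix keeps the directions associated with the large eigenvalues and treats the rest as a single flat noise level, which is one way to downweight the noise dimensions in portfolio construction.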

[P3] studies sparse non-Hermitian random matrices, relevant to directed network models of financial systems. While our correlation matrix is symmetric (Hermitian), the time variation in factor count (1 → 2 → 3) suggests that the effective interaction structure among assets is not static. A directed network representation (e.g., Granger causality among returns) might reveal asymmetric lead-lag relationships not captured by the symmetric correlation spectrum, a direction [P3]'s methods could address.

[P7]'s categorical representation of games offers a conceptual bridge to "categorical spectralism." If we interpret each asset as a player and each factor as a strategic equilibrium (a Nash equilibrium in the game of portfolio allocation), the spectral gap separates equilibrium strategies (the two significant factors) from non-equilibrium noise. The time variation in factor count then represents regime shifts in the equilibrium structure—a change in the "game" being played. This is speculative but suggests a formal connection between spectral methods and categorical game theory.

[P5] and [P6] address spectral methods for dynamical systems and density deconvolution, respectively. While not directly applicable to static correlation matrices, [P5]'s emphasis on rigorous convergence guarantees for spectral computations from data resonates with our use of the Marchenko-Pastur bound as a statistical test. The MP bound provides a rigorous null hypothesis (random matrix) against which to test the observed spectrum, avoiding the "spectral pollution" [P5] warns against in Koopman operator approximations.

[P4]'s non-topological persistence theory is less directly related but hints at a broader categorical framework. If we view the eigenvalue sequence as a filtration (ordered by magnitude), the "persistence" of the top two eigenvalues above the MP bound across the full sample—and the varying persistence of the third eigenvalue in 2024—could be formalized as a persistence diagram in a category of correlation matrices. This would generalize the spectral gap from a binary (signal/noise) classification to a graded notion of structural significance.

Limitations

Sample composition. The 10-asset universe is a convenience sample, not a representative cross-section of the market. Sector coverage is uneven, with four technology names but only one or two assets in each of the other sectors, which introduces selection bias: AAPL and MSFT may not represent "technology" as a whole, and JPM alone does not capture "finance." A larger, sector-balanced universe (e.g., 100 assets, 10 per sector) would provide a more robust test of cross-sector factor structure.
In-sample annual recomputation. The time variation in factor count is computed in-sample within each calendar year, not out-of-sample. The increase to three factors in 2024 is based on 2024 data only; it does not predict 2025. An out-of-sample test (e.g., estimate factors on 2010–2023, test on 2024) would assess whether the factor structure is stable enough to forecast.
No liquidity or margin data. The computation question asked whether the spectral gap correlates with liquidity constraints and margin-binding events, but the yfinance return data contains no information on bid-ask spreads, trading volumes, margin requirements, or funding costs. The eigenvalue spectrum characterizes return comovement, not market microstructure or leverage dynamics. Answering the liquidity question requires merging the return data with high-frequency trading data or broker margin call records.
Linear factor model assumption. PCA assumes that factors are linear combinations of returns and that the correlation matrix is the sufficient statistic for dependence. Nonlinear dependencies (e.g., tail dependence, volatility clustering) are not captured. A nonlinear dimensionality reduction method (e.g., kernel PCA, autoencoders) might reveal additional structure.
Stationarity assumption. The full-sample spectrum (n=3,772 observations) assumes that the correlation structure is stationary over 2010–2024. The time variation in annual factor counts contradicts this assumption, indicating regime shifts. A formal regime-switching model (e.g., hidden Markov model with regime-dependent correlation matrices) would better characterize the non-stationarity.
Interpretation of the third factor in 2024. The jump to three significant factors in 2024 is intriguing but unexplained. Without examining the third factor's loadings (not provided in the computed result), we cannot identify its economic meaning. It could represent AI-driven tech differentiation, energy transition, or a statistical artifact of 2024's specific return realizations. Recomputing the 2024 spectrum with factor loadings would clarify.

Strengthening the result:

Expand the universe to 50–100 assets with balanced sector representation.
Compute out-of-sample factor stability: estimate on a rolling 5-year window, test on the subsequent year.
Merge return data with liquidity proxies (bid-ask spreads, Amihud illiquidity) and margin data (if available from broker APIs) to test the correlation between spectral gap and liquidity/margin events.
Apply nonlinear dimensionality reduction to test whether the two-factor linear model is sufficient or whether nonlinear factors exist.
Fit a regime-switching model to the annual factor counts to formalize the regime transitions (1 → 2 → 3 factors).
Extract and interpret the third factor's loadings in 2024 to identify its economic driver.
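As an illustration of the liquidity proxy named in the list above, the Amihud illiquidity measure is the average ratio of absolute daily return to daily dollar volume. The sketch below is a minimal version under stated assumptions: the dollar-volume series is an assumed input that the return series used in this study does not contain, and the sample numbers are invented for the example.

```python
def amihud_illiquidity(returns, dollar_volumes):
    """Mean of |return| / dollar volume over days with positive volume."""
    ratios = [abs(r) / v for r, v in zip(returns, dollar_volumes) if v > 0]
    if not ratios:
        raise ValueError("no days with positive dollar volume")
    return sum(ratios) / len(ratios)


# Invented three-day example, not real market data.
daily_returns = [0.01, -0.02, 0.005]
daily_dollar_volume = [1e6, 2e6, 5e5]
illiq = amihud_illiquidity(daily_returns, daily_dollar_volume)
print(illiq)
```

A per-asset series of this measure, aggregated to the same annual windows, could then be compared against the annual factor counts.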

Research evidence, not investment advice.

Categorical Spectralism — spectral decomposition of portfolio return spaces