Gravity Model Portfolio Optimization: Empirical Validation of Market-Cap-Weighted Co-Movement

Question

Do pairs of stocks with larger market-cap products and smaller correlation distance—as predicted by a gravity-model force law F_ij = G·M_i·M_j / d_ij²—exhibit significantly stronger return co-movement than pairs with weaker gravity-model predictions? This tests whether a physics-inspired structural equivalence principle quantitatively predicts empirical correlation strength in equity markets.

Method

We computed the gravity-model force F_ij for all 15 unique pairs among six large-cap U.S. equities (AAPL, AMZN, GOOGL, JPM, MSFT, XOM) over the window 2010-01-01 to 2024-12-31 using daily adjusted-close returns from yfinance. For each pair (i,j), the "mass" M_i and M_j were proxied by average market capitalization over the window, and the "distance" d_ij was defined as 1 minus the Pearson correlation of daily returns (so that highly correlated pairs have small distance). The gravity force F_ij = G·M_i·M_j / d_ij² was then computed for a fixed constant G (the constant cancels in correlation analysis). We then measured the empirical realized correlation ρ_ij of daily returns for each pair and tested whether F_ij predicts ρ_ij.
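As a minimal sketch of the pairwise computation described above, the snippet below builds F_ij and ρ_ij for every unordered pair. It assumes the return series are already date-aligned; the function name `gravity_forces`, the toy tickers, and all numbers are hypothetical and are not the study's data or code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def gravity_forces(returns, mass, G=1.0):
    """Return {(i, j): (F_ij, rho_ij)} for every unordered ticker pair.

    returns: dict ticker -> list of daily returns (assumed date-aligned)
    mass:    dict ticker -> average market cap over the window
    """
    out = {}
    for i, j in combinations(sorted(returns), 2):
        rho = pearson(returns[i], returns[j])
        d = 1.0 - rho  # correlation distance as defined in the text
        out[(i, j)] = (G * mass[i] * mass[j] / d ** 2, rho)
    return out


# Toy inputs, made up for illustration only.
toy_returns = {
    "A": [0.010, -0.020, 0.015, 0.000, -0.005],
    "B": [0.012, -0.018, 0.010, 0.002, -0.004],
    "C": [-0.003, 0.004, -0.010, 0.008, 0.001],
}
toy_mass = {"A": 2.0, "B": 1.5, "C": 0.4}  # hypothetical units
forces = gravity_forces(toy_returns, toy_mass)
```

Changing `G` rescales every F_ij by the same positive factor, which leaves Pearson and Spearman correlations unchanged. Note that the sketch would divide by zero for a perfectly correlated pair (d_ij = 0); real data would need a guard for that case.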

The primary test statistic was the Pearson correlation r between the 15 gravity-force values {F_ij} and the 15 realized return correlations {ρ_ij}. Statistical significance was established via a 2000-iteration permutation test (shuffling the pairing between F and ρ to generate a null distribution) and a 2000-iteration bootstrap to construct a distribution-free 95% confidence interval for r. We also computed Spearman's rank correlation ρ_s to assess monotonic association robust to outliers. Finally, we recomputed the Pearson r separately within each calendar year (2010–2024) on the same six tickers and 15 pairs to examine time variation in the gravity-model's predictive power.
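The two resampling procedures can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the permutation p-value is taken as one-sided with an add-one correction, the confidence interval uses the percentile method, and the 15 toy values are invented.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_pvalue(f, rho, n_iter=2000, seed=0):
    """One-sided p-value: share of shuffles with r >= the observed r."""
    rng = random.Random(seed)
    observed = pearson(f, rho)
    shuffled = list(rho)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break the pairing between F and rho
        if pearson(f, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)  # add-one correction avoids p = 0


def bootstrap_ci(f, rho, n_iter=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for r, resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(f)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [f[k] for k in idx]
        ys = [rho[k] for k in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip degenerate resamples
            stats.append(pearson(xs, ys))
    stats.sort()
    lo = stats[round((1 - level) / 2 * n_iter)]
    hi = stats[round((1 + level) / 2 * n_iter) - 1]
    return lo, hi


# Invented values for 15 pairs, for illustration only.
toy_force = [float(k) for k in range(1, 16)]
toy_rho = [0.12, 0.10, 0.18, 0.21, 0.19, 0.27, 0.30, 0.26,
           0.35, 0.37, 0.41, 0.39, 0.46, 0.50, 0.48]
p = permutation_pvalue(toy_force, toy_rho)
lo, hi = bootstrap_ci(toy_force, toy_rho)
```

Under the add-one convention assumed here, the smallest attainable p-value with 2000 shuffles is 1/2001, or about 0.0005.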

This is an in-sample correlation analysis: the same return data that defines d_ij (via correlation) also defines the realized ρ_ij being predicted. The per-year rolling recomputation is likewise in-sample within each year, showing how the relationship's strength varies over time but not constituting an out-of-sample forecast test.
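A small illustration of why the in-sample design matters: if all masses were equal, F_ij = G·M² / (1 − ρ_ij)² would be a strictly increasing function of ρ_ij (for ρ_ij < 1), so the ranking of pairs by force would reproduce the ranking by correlation exactly. Only the mass products can break that ordering. The values below are hypothetical.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for position, k in enumerate(order):
        r[k] = position
    return r


rho = [-0.2, 0.05, 0.31, 0.48, 0.62, 0.77]  # hypothetical correlations

# Equal masses (G = M_i = M_j = 1): force is a monotone transform of rho.
equal_mass_force = [1.0 / (1.0 - r) ** 2 for r in rho]
same_ordering = ranks(equal_mass_force) == ranks(rho)  # True

# Unequal mass products can reorder the pairs.
mass_products = [50.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical
weighted_force = [m / (1.0 - r) ** 2 for m, r in zip(mass_products, rho)]
reordered = ranks(weighted_force) != ranks(rho)  # True
```

In other words, any rank agreement between F_ij and ρ_ij in the equal-mass case is built in by construction; the informative part of the test is how the market-cap weighting shifts the ordering.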

Result

The Pearson correlation between gravity-model force F_ij and realized return correlation ρ_ij was r = 0.4785 (95% CI: [0.4388, 0.5181], permutation p = 0.0005, n = 3772 pair-day observations aggregated into 15 pair-level statistics). The Spearman rank correlation was ρ_s = 0.4907, indicating a monotonic relationship of similar strength. A p-value of 0.0005 is the smallest nonzero value a 2000-iteration permutation test can resolve (about 1 in 2000): at most one of the random pairings produced a correlation as strong as the observed one, which is strong evidence against the null hypothesis of no association.

The per-calendar-year Pearson r values reveal substantial time variation in the gravity model's explanatory power:

  • 2010: r = 0.522
  • 2011: r = 0.534
  • 2012: r = 0.260
  • 2013: r = −0.008 (near-zero, effectively no relationship)
  • 2014: r = 0.224
  • 2015: r = 0.365
  • 2016: r = 0.353
  • 2017: r = 0.509
  • 2018: r = 0.661
  • 2019: r = 0.589
  • 2020: r = 0.698 (strongest year)
  • 2021: r = 0.594
  • 2022: r = 0.697
  • 2023: r = 0.421
  • 2024: r = 0.340

The per-year series shows that the gravity-model correlation was moderate in 2010–2011 (r ≈ 0.52–0.53), weak or absent in 2012–2014, moderate in 2015–2017, and consistently strong (r > 0.58) during 2018–2022, before falling to 0.42 in 2023 and 0.34 in 2024. The highest values (r ≈ 0.70) occurred in 2020 and 2022, within the period encompassing the COVID-19 market shock and subsequent recovery (2021 was lower, at r = 0.594), suggesting that market-cap-weighted co-movement intensified during systemic stress.
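The per-calendar-year recomputation can be sketched as below. This is a hedged illustration on synthetic data: it assumes dates are ISO strings aligned with every series and holds the masses fixed across years, which the text does not specify. The names `gravity_r` and `per_year_r` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def gravity_r(returns, mass):
    """Pearson r between F_ij and rho_ij across all ticker pairs (G = 1)."""
    forces, rhos = [], []
    for i, j in combinations(sorted(returns), 2):
        rho = pearson(returns[i], returns[j])
        forces.append(mass[i] * mass[j] / (1.0 - rho) ** 2)
        rhos.append(rho)
    return pearson(forces, rhos)


def per_year_r(dates, returns, mass):
    """Recompute gravity_r within each calendar year.

    dates: ISO 'YYYY-MM-DD' strings aligned with every return series.
    """
    rows_by_year = defaultdict(list)
    for row, date in enumerate(dates):
        rows_by_year[int(date[:4])].append(row)
    result = {}
    for year, rows in sorted(rows_by_year.items()):
        yearly = {t: [s[k] for k in rows] for t, s in returns.items()}
        result[year] = gravity_r(yearly, mass)
    return result


# Synthetic data: a shared factor plus noise (illustrative only).
rng = random.Random(42)
dates = [f"{y}-01-{d:02d}" for y in (2019, 2020) for d in range(1, 29)]
factor = [rng.gauss(0, 0.01) for _ in dates]
loading = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2}  # hypothetical
returns = {t: [b * f + rng.gauss(0, 0.005) for f in factor]
           for t, b in loading.items()}
mass = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.5}  # hypothetical
yearly_r = per_year_r(dates, returns, mass)
```

Each yearly value is still in-sample: both F_ij and ρ_ij come from the same year's returns.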

Interpretation

The positive and statistically significant correlation (r ≈ 0.48 overall, p = 0.0005) indicates that the gravity-model force F_ij = G·M_i·M_j / d_ij² does capture a meaningful structural regularity in equity return co-movement. Pairs with larger market-cap products (higher "mass") and smaller correlation distance (already more correlated, hence smaller d_ij and larger F_ij) exhibit stronger realized correlations than pairs with lower mass-products or larger distances. This is a quantified empirical bound on the gravity analogy: the model explains roughly 23% of the variance in pairwise correlations (r² ≈ 0.23), leaving 77% unexplained by this simple two-parameter force law.

The time-series dynamics are economically informative. The near-zero correlation in 2013 (r = −0.008) shows that the gravity model's predictive power is not a mechanical artifact—it can and does break down. The subsequent strengthening through 2018–2022 (peaking at r = 0.698 in 2020) coincides with periods of heightened market-wide volatility and correlation: the COVID-19 crash, fiscal stimulus, and inflation regime shift. During such episodes, large-cap stocks' co-movements may be increasingly dominated by common macro factors (captured implicitly by the market-cap weighting in M_i·M_j), making the gravity analogy more descriptive. The post-2022 decline (r = 0.34 in 2024) suggests a return to more idiosyncratic, sector-specific dynamics as macro uncertainty receded.

Importantly, this is an in-sample result. The correlation distance d_ij is computed from the same return series used to measure realized ρ_ij, so the test does not demonstrate out-of-sample predictive power. The gravity model as specified here is a descriptive structural equivalence: it quantifies how well a mass-distance force law summarizes observed co-movement patterns, not whether it forecasts future correlations. The per-year variation further underscores that the relationship is regime-dependent, not a universal constant.

The result does not support using the gravity model as a forward-looking portfolio optimization input without additional validation. A correlation of r = 0.48 corresponds to r² ≈ 0.23, so roughly three-quarters of the variation in pairwise correlations is not explained by the gravity force and is likely driven by sector exposures, idiosyncratic shocks, and time-varying risk premia not captured by market cap and static correlation distance. The model's breakdown in 2013 and its surge in 2020 illustrate that the force law's parameters (or functional form) would need dynamic recalibration to remain useful.

Relation to the Literature

The gravity-model framework originates in migration and spatial economics [P1], where F_ij = G·M_i·M_j / d_ij² describes flows between cities based on population (mass) and geographic distance. [P2] extends this to innovation networks, showing that organizational collaboration follows a gravity principle in abstract social space, with distance defined by technological, social, and geographic separation. [P3] applies gravity modeling to neobank adoption and cyber-fraud risk across 90 countries, using digital infrastructure as "mass" and regulatory distance as d_ij. Our result is consistent with these cross-domain applications: a simple inverse-square force law with domain-appropriate mass and distance definitions can explain a statistically significant (though far from complete) fraction of observed network structure.

[P7] analyzes portfolio correlations in the Japanese bank-firm credit network using Random Matrix Theory, finding that a majority of correlations are noise but that the largest eigenvalues (capturing common factors) deviate significantly from random. Our gravity-model correlation of r = 0.48 similarly suggests a mixture: a non-random common structure (the mass-distance force) coexists with substantial idiosyncratic variation. The time-varying strength (r ranging from −0.01 to 0.70) aligns with [P7]'s finding that correlation dynamics are driven by a global common factor whose intensity fluctuates over time.

[P4] and [P8] discuss portfolio optimization via clustering and decomposition, with [P8] using spectral clustering on correlation matrices to reduce problem size by ~80%. Our gravity model could serve as a physics-inspired prior for such clustering: pairs with high F_ij (large mass-product, small distance) are candidates for the same cluster. However, [P8]'s preprocessing via Random Matrix Theory (filtering noise eigenvalues) and [P4]'s explorative data mining both emphasize that correlation structure is high-dimensional and non-stationary. Our result's 23% explained variance and regime-dependence confirm that a two-parameter gravity law is insufficient alone—it would need to be one component in a multi-factor or adaptive clustering scheme.

[P9] and [P10] review meta-heuristic and transaction-cost-aware portfolio optimization. [P10] shows that for small investors, transaction costs dominate risk costs, forcing concentrated portfolios. A gravity-model prior (favoring high-F_ij pairs) could reduce the combinatorial search space in cardinality-constrained problems, but only if the model's predictive power is stable. The 2013 breakdown (r = −0.008) and 2020 surge (r = 0.698) imply that a static gravity-based heuristic would perform inconsistently across market regimes, requiring dynamic recalibration or regime-switching logic.

None of the cited papers test a gravity model on equity return correlations specifically, so our result is a novel empirical quantification in this domain. The moderate effect size (r ≈ 0.48) and strong time variation are new findings that both validate the cross-domain analogy (gravity principles do transfer) and delimit it (the analogy is partial and regime-dependent).

Limitations

Sample size and universe: The analysis uses only six large-cap U.S. stocks, yielding 15 unique pairs. This is a proof-of-concept scale, not a comprehensive market test. The tickers span technology (AAPL, AMZN, GOOGL, MSFT), finance (JPM), and energy (XOM), but omit mid-caps, small-caps, international equities, and other sectors. The gravity model's performance may differ in a broader, more heterogeneous universe. A larger sample (e.g., S&P 500 constituents, yielding ~125,000 pairs) would provide tighter confidence intervals and test whether the r ≈ 0.48 relationship holds across size and sector dimensions.

In-sample circularity: The correlation distance d_ij = 1 − ρ_ij is computed from the same return data used to measure the realized correlation ρ_ij being predicted. This creates a mechanical positive relationship: if ρ_ij is high, d_ij is small, F_ij is large, and we "predict" high ρ_ij. The substantive question is whether the magnitude of F_ij (incorporating the market-cap masses M_i·M_j) adds explanatory power beyond this tautology. The significant p-value and moderate r are consistent with that, but they cannot by themselves separate the mass contribution from the mechanical link, so an out-of-sample test is essential: compute F_ij using correlations from period t, then test whether F_ij predicts correlations in period t+1. The per-year results hint at instability (2013 vs. 2020), so out-of-sample performance may be weak.
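The period-t versus period-t+1 test proposed here could look like the following sketch. It is run on synthetic data with a single split point and fixed masses; `lagged_gravity_r` and every input are hypothetical, and a real test would likely use several rolling splits.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_gravity_r(returns, mass, split):
    """r between F_ij from rows [0, split) and rho_ij from rows [split, end)."""
    forces, future_rhos = [], []
    for i, j in combinations(sorted(returns), 2):
        rho_past = pearson(returns[i][:split], returns[j][:split])
        rho_next = pearson(returns[i][split:], returns[j][split:])
        forces.append(mass[i] * mass[j] / (1.0 - rho_past) ** 2)  # G = 1
        future_rhos.append(rho_next)
    return pearson(forces, future_rhos)


# Synthetic data: a shared factor plus noise (illustrative only).
rng = random.Random(7)
n_days = 120
factor = [rng.gauss(0, 0.01) for _ in range(n_days)]
loading = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2}  # hypothetical
returns = {t: [b * f + rng.gauss(0, 0.005) for f in factor]
           for t, b in loading.items()}
mass = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.5}  # hypothetical
oos_r = lagged_gravity_r(returns, mass, split=n_days // 2)
```

Because the distance and the target correlation now come from disjoint windows, the mechanical link between d_ij and ρ_ij is removed; only persistence in the correlation structure can carry over.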

Functional form and parameter choice: The inverse-square law F ∝ M_i·M_j / d_ij² is borrowed from Newtonian gravity, but there is no theoretical reason equity correlations must follow this exact exponent. Alternative forms (e.g., F ∝ M_i·M_j / d_ij, or F ∝ (M_i·M_j)^α / d_ij^β with fitted α, β) might fit better. The constant G cancels in correlation analysis, but the choice of d_ij = 1 − ρ_ij is one of many possible distance metrics (alternatives: 1 − |ρ_ij|, sqrt(2(1 − ρ_ij)), information-theoretic divergences). The result is specific to this parameterization.
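The alternative forms listed above can be compared with a generalized force function, sketched here with hypothetical names and invented inputs. The exponents are scanned over a small grid rather than fitted, and the comparison remains in-sample, so it inherits the circularity discussed elsewhere in this section.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Distance metrics named in the text, keyed by hypothetical labels.
DISTANCES = {
    "one_minus_rho": lambda r: 1.0 - r,
    "one_minus_abs_rho": lambda r: 1.0 - abs(r),
    "sqrt_2_one_minus_rho": lambda r: sqrt(2.0 * (1.0 - r)),
}


def generalized_force(mass_product, rho, alpha=1.0, beta=2.0,
                      distance="one_minus_rho"):
    """F = (M_i * M_j)^alpha / d^beta; the defaults give the inverse-square law."""
    d = DISTANCES[distance](rho)
    return mass_product ** alpha / d ** beta


# Invented mass products and correlations for six pairs.
mass_products = [6.0, 3.0, 1.5, 2.0, 1.0, 0.5]
rhos = [0.62, 0.55, 0.30, 0.41, 0.25, 0.18]

# Pearson r between force and correlation for several distance exponents.
fit_by_beta = {
    beta: pearson(
        [generalized_force(m, r, beta=beta) for m, r in zip(mass_products, rhos)],
        rhos,
    )
    for beta in (0.5, 1.0, 2.0)
}
```

Fitting α and β directly (for example by regressing log F-implied quantities on log mass product and log distance) would be a natural next step, but it is outside what this sketch shows.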

Market-cap as "mass": Using average market cap as M_i assumes that firm size drives co-movement, but market cap conflates fundamentals (earnings, assets) with valuation multiples (P/E, sentiment). Two firms with identical market caps but different sectors or balance-sheet structures may have very different correlation profiles. A more granular mass proxy (e.g., revenue, total assets, or a factor-model loading) might improve explanatory power.

Time variation and regime shifts: The per-year r ranges from −0.01 to 0.70, indicating that the gravity model's validity is regime-dependent. The 2020–2022 spike coincides with extraordinary monetary and fiscal policy, which may have amplified common-factor dominance (making mass-weighted co-movement more salient). A static gravity model cannot adapt to such shifts. Strengthening the approach would require either (a) a regime-switching framework that activates/deactivates the gravity prior based on macro indicators, or (b) a time-varying parameter model where G, α, β evolve.

No causal interpretation: The correlation r = 0.48 is an association, not a causal mechanism. The gravity model does not explain why large-cap pairs co-move more strongly—it merely quantifies the pattern. Possible underlying mechanisms include: (1) common factor exposure (large-caps load more heavily on market/macro factors), (2) index inclusion and passive flows (large-caps are overweighted in cap-weighted indices), (3) analyst coverage and information diffusion (large-caps are more synchronously informed). Disentangling these would require a structural model or instrumental-variable approach.

Publication and selection bias: This is a single computation on a specific six-ticker set. If the tickers or window were chosen to maximize the gravity-model fit, the result would overstate general validity. The computation question was pre-specified, and the tickers are standard large-caps, mitigating this concern, but replication on independent samples is necessary.

Practical utility for portfolio optimization: Even if the gravity model's r = 0.48 is robust out-of-sample, it explains only 23% of correlation variance. Portfolio optimization (mean-variance, risk parity, etc.) is sensitive to correlation matrix estimation error, and a model that leaves 77% of variance unexplained may not improve out-of-sample Sharpe ratios relative to simpler approaches (e.g., shrinkage estimators, factor models). The result establishes that a gravity prior is informative, not that it is sufficient for optimization.

Strengthening the result would require: (1) expanding to hundreds of stocks and testing out-of-sample predictive power (period t correlations predict period t+1), (2) comparing alternative functional forms and distance metrics, (3) conditioning on market regimes (volatility, dispersion, macro uncertainty) to model time variation, (4) benchmarking portfolio performance (Sharpe ratio, turnover, drawdown) when using gravity-based correlation estimates versus standard estimators, and (5) testing on international and multi-asset-class data to assess cross-domain generality.


Research evidence, not investment advice: This is an empirical research finding quantifying a structural pattern in historical return correlations. It is not a trading signal, a recommendation to buy or sell any security, or a forecast of future returns or correlations. The gravity model's time-varying explanatory power (per-year r ranging from near zero to about 0.70) and in-sample construction mean it cannot be directly applied to forward-looking portfolio decisions without further out-of-sample validation and risk management.