Question

Do equity (SPY) and long-duration Treasury (TLT) returns exhibit a structurally stable negative correlation suitable for categorical hedging, and does that correlation strength persist across market regimes?

Method

We computed the Pearson correlation coefficient between daily returns for SPY (SPDR S&P 500 ETF Trust) and TLT (iShares 20+ Year Treasury Bond ETF) over the window 2010-01-01 through 2024-12-31, yielding 3,772 paired daily observations. Returns were computed from daily adjusted-close prices obtained via yfinance. Statistical significance was assessed with a 2,000-permutation test under the null hypothesis of zero correlation, and uncertainty was quantified with a distribution-free 95% confidence interval from 2,000 bootstrap resamples. To assess regime stability, we recomputed the Pearson correlation within each calendar year (2010–2024) using the same method, producing a per-year time series of correlation estimates.
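
The permutation test and bootstrap interval described above can be sketched as follows. This is a minimal, dependency-free illustration and not the code that produced the reported figures: the helper names (pearson, permutation_pvalue, bootstrap_ci), the two-sided test, the add-one p-value convention, the percentile interval, and the seeds are all assumptions, and the inputs are synthetic returns rather than SPY/TLT data.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of zero correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing, keep both marginals
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Assumed add-one convention: smallest attainable value is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Synthetic, negatively related "returns" (NOT market data).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [-0.5 * a + rng.gauss(0, 1) for a in x]

r = pearson(x, y)
# Smaller resample counts than the report's 2,000, only to keep the demo quick.
p = permutation_pvalue(x, y, n_perm=500)
lo, hi = bootstrap_ci(x, y, n_boot=500)
print(f"r={r:.3f}  p={p:.4f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

Resampling individual days treats the observations as exchangeable. If serial dependence such as volatility clustering is a concern, a block bootstrap would be one alternative; the source does not say which variant was used.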

This design directly tests whether the negative correlation is (a) statistically distinguishable from zero over the full sample, (b) economically meaningful in magnitude, and (c) stable or time-varying across the fifteen-year window. The per-year recomputation is an in-sample diagnostic within each year; it does not constitute an out-of-sample validation but reveals how the correlation structure evolves with market conditions.
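
The per-year recomputation amounts to grouping the paired returns by calendar year and applying the same estimator within each group. A minimal sketch, assuming the data arrive as parallel lists of dates and returns; the function name correlation_by_year and the toy inputs are illustrative and do not come from the original analysis.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_by_year(dates, x, y):
    """Map each calendar year to the Pearson r of the returns dated in it."""
    buckets = defaultdict(lambda: ([], []))
    for d, a, b in zip(dates, x, y):
        xs, ys = buckets[d.year]
        xs.append(a)
        ys.append(b)
    return {year: pearson(xs, ys) for year, (xs, ys) in sorted(buckets.items())}


# Toy input: perfectly opposed moves in 2021, perfectly aligned moves in 2022.
dates = [date(2021, 3, d) for d in (1, 2, 3, 4)] + [date(2022, 3, d) for d in (1, 2, 3, 4)]
eq = [0.010, -0.020, 0.005, 0.030, 0.010, -0.020, 0.005, 0.030]
bond = [-v for v in eq[:4]] + eq[4:]

by_year = correlation_by_year(dates, eq, bond)
print(by_year)
```

Each yearly estimate rests on roughly 250 trading days, so it is considerably noisier than the full-sample figure.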

Result

Over the full 2010–2024 sample, the Pearson correlation between SPY and TLT daily returns is r = –0.3224 (Spearman ρ = –0.2774), with a permutation-test p-value of 0.0005 and a 95% bootstrap confidence interval of [–0.3749, –0.2683]. The correlation is statistically significant and robustly negative: the confidence interval excludes zero by a wide margin, and a p-value of 0.0005 is essentially the smallest value a 2,000-permutation test can report (about 1/2,001 under the usual add-one convention), meaning that essentially none of the permuted samples produced a correlation as extreme as the observed one.

The per-year Pearson correlations reveal substantial time variation:

Year    Pearson r
2010    –0.555
2011    –0.709
2012    –0.652
2013    –0.214
2014    –0.460
2015    –0.372
2016    –0.373
2017    –0.334
2018    –0.282
2019    –0.457
2020    –0.477
2021    –0.139
2022    +0.085
2023    +0.126
2024    +0.061

From 2010 through 2020, the correlation was negative in every year, ranging from –0.214 (2013) to –0.709 (2011), with a median near –0.46. The strongest negative correlations occurred during the immediate post-financial-crisis period (2010) and the European sovereign debt crisis (2011–2012), when flight-to-quality dynamics were pronounced. The correlation was weaker in 2013 and again in 2017–2018 (–0.334 and –0.282), then strengthened to roughly –0.46 to –0.48 in 2019 and during the COVID-19 shock (2020).

A regime shift is evident beginning in 2021: the correlation compressed to –0.139 in 2021, turned slightly positive in 2022 (+0.085), rose to +0.126 in 2023, and remained slightly positive in 2024 (+0.061). The 2022–2023 period coincides with the Federal Reserve's rapid tightening cycle and simultaneous drawdowns in both equities and long-duration bonds; a joint sell-off of this kind pushes the measured correlation toward zero or above. The 2024 estimate, close to zero, indicates little co-movement in either direction in that year.
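
The summary figures for the per-year table can be recomputed directly from the tabulated values. The sketch below uses only the numbers in the table; the variable names and the 2010–2020 versus 2021–2024 split are illustrative choices.

```python
from statistics import median

# Per-year Pearson r, copied from the table above.
yearly_r = {
    2010: -0.555, 2011: -0.709, 2012: -0.652, 2013: -0.214, 2014: -0.460,
    2015: -0.372, 2016: -0.373, 2017: -0.334, 2018: -0.282, 2019: -0.457,
    2020: -0.477, 2021: -0.139, 2022: 0.085, 2023: 0.126, 2024: 0.061,
}

early = {y: r for y, r in yearly_r.items() if y <= 2020}

strongest = min(early, key=early.get)  # most negative year: 2011 (-0.709)
weakest = max(early, key=early.get)    # least negative year: 2013 (-0.214)
med = median(early.values())           # -0.457
ratio = early[strongest] / early[weakest]  # roughly 3.3

positive_years = [y for y, r in yearly_r.items() if r > 0]  # [2022, 2023, 2024]

print(strongest, weakest, med, round(ratio, 2), positive_years)
```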

Interpretation

The full-sample correlation of –0.32 quantifies a moderate negative relationship: on average, days when SPY returns are one standard deviation above their mean correspond to TLT returns approximately 0.32 standard deviations below their mean, and vice versa. This is consistent with the canonical flight-to-quality mechanism: equity risk-off episodes drive capital into long-duration Treasuries, lowering yields and raising TLT prices, while equity rallies coincide with Treasury sell-offs as investors rotate out of safe assets. The confidence interval [–0.37, –0.27] indicates that the population correlation, if stable, lies in a range that is economically meaningful for portfolio construction: a correlation between –0.27 and –0.37 is sufficient to reduce combined volatility relative to an equity-only position.
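
The "0.32 standard deviations" reading follows from the fact that the least-squares slope between two z-scored series equals their Pearson correlation. A small sketch on synthetic data illustrates the identity; the function names and inputs are hypothetical and unrelated to the SPY/TLT sample.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def zscores(values):
    """Centre to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def standardized_slope(x, y):
    """OLS slope of z-scored y on z-scored x (no intercept needed)."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [-0.3 * a + rng.gauss(0, 1) for a in x]

slope = standardized_slope(x, y)
r = pearson(x, y)
# Predicted z-score of y when x sits one standard deviation above its mean.
print(f"slope={slope:.4f}  r={r:.4f}")
```

The identity describes the best linear prediction on average; it says nothing about how tightly individual days cluster around that line.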

However, the per-year series demonstrates that the correlation is not structurally stable. The 2010–2020 period exhibits persistently negative correlation, but its magnitude varies by a factor of more than three (from –0.71 in 2011 to –0.21 in 2013). The 2021–2024 period marks a qualitative break: the correlation weakens to –0.14 in 2021 and is slightly positive in each of 2022–2024, invalidating the hedging premise during those years. A categorical hedging strategy predicated on a stable –0.32 correlation would have experienced severe basis risk in 2022–2023, when both legs of the hedge declined simultaneously.

The dynamics suggest that the SPY–TLT correlation is regime-dependent rather than a time-invariant structural feature. In low-inflation, low-rate environments with episodic risk-off shocks (2010–2020), the negative correlation is strong and reliable. In a high-inflation, rising-rate regime with central bank tightening (2022–2023), the correlation collapses or reverses because both equities and long-duration bonds are negatively exposed to the same macro factor (real rates / discount rates). The 2024 near-zero estimate may reflect a transition or mixed regime.

The result does not support the hypothesis of a structurally stable negative correlation suitable for categorical hedging across all market regimes. It supports a weaker claim: the negative correlation is statistically significant and economically meaningful on average over 2010–2024, but it is time-varying and regime-sensitive, with periods (2022–2024) where the hedging relationship breaks down entirely. A hedging strategy relying on this correlation must incorporate regime detection or dynamic rebalancing; a static 60/40 or risk-parity allocation assuming constant –0.32 correlation would have been mis-specified during the recent tightening cycle.
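
The cost of assuming a constant correlation can be illustrated with the standard two-asset volatility formula, sqrt(w1^2 s1^2 + w2^2 s2^2 + 2 w1 w2 rho s1 s2). In the sketch below the 60/40 weights come from the text, the correlations are the full-sample and 2023 estimates reported above, and the two annualised volatilities are hypothetical placeholders, not measured values.

```python
from math import sqrt


def two_asset_vol(w_eq, vol_eq, vol_bond, rho):
    """Volatility of a two-asset portfolio with equity weight w_eq."""
    w_bond = 1.0 - w_eq
    variance = (
        (w_eq * vol_eq) ** 2
        + (w_bond * vol_bond) ** 2
        + 2.0 * w_eq * w_bond * rho * vol_eq * vol_bond
    )
    return sqrt(variance)


# Hypothetical annualised volatilities, for illustration only.
VOL_EQ, VOL_BOND = 0.17, 0.15
W_EQ = 0.60

scenarios = {
    "full-sample r (-0.3224)": -0.3224,
    "zero correlation": 0.0,
    "2023 estimate (+0.126)": 0.126,
}
vols = {name: two_asset_vol(W_EQ, VOL_EQ, VOL_BOND, rho) for name, rho in scenarios.items()}
for name, vol in vols.items():
    print(f"{name:26s} portfolio vol = {vol:.4f}")
```

Under these placeholder inputs the portfolio's volatility is lowest at the full-sample correlation and rises as the correlation moves toward and above zero, which is the sense in which an allocation calibrated to –0.32 understates risk in a regime like 2022–2023.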

The Spearman rank correlation (–0.28) is only slightly weaker than the Pearson correlation (–0.32). Because the rank-based measure is insensitive to the size of extreme moves, its similar magnitude indicates that the negative relationship is not driven purely by a few outlying days; the modest gap is consistent with large-move days contributing somewhat more than proportionally to the linear estimate. This suggests the relationship is approximately linear in the central mass of the return distribution but may exhibit nonlinearity or asymmetry in the tails, a question the current analysis does not resolve.
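
Spearman's ρ is the Pearson correlation of the ranks, which is why it responds less to the size of extreme observations. The sketch below shows the mechanics on a toy series with one outlier; the helper names and the data are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Four unremarkable points plus one extreme, opposite-signed pair.
x = [1, 2, 3, 4, 100]
y = [2, 1, 4, 3, -50]

r_pearson = pearson(x, y)    # dominated by the single outlier
r_spearman = spearman(x, y)  # the outlier counts as just one rank
print(f"Pearson={r_pearson:.3f}  Spearman={r_spearman:.3f}")
```

In this toy case the two measures diverge sharply; in the reported SPY–TLT sample they are close, which is what argues against an outlier-driven result.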

Relation to the Literature

No closely related papers were retrieved for this computation, so the result stands on its own empirical evidence. The finding of a time-varying, regime-dependent equity–Treasury correlation is consistent with the broader empirical asset pricing literature on conditional correlations and flight-to-quality, though we do not cite specific studies here. The 2022–2024 correlation breakdown aligns with market commentary on the "death of the 60/40 portfolio" during the 2022 drawdown, when both stocks and bonds fell together—a phenomenon our per-year estimates quantify directly.

The absence of retrieved papers means we cannot position this result as confirming, extending, or contradicting prior work. Future research could compare these estimates to rolling-window correlations in earlier decades (e.g., the 1970s stagflation, when equity–bond correlations were also positive) or to conditional correlation models (DCC-GARCH, regime-switching) that explicitly model time variation. The current result is a descriptive, reduced-form measurement; it does not identify the structural drivers of the regime shift (monetary policy, inflation expectations, term premium dynamics) but documents the shift's existence and timing.

Limitations

  1. Sample period and regime coverage: The 2010–2024 window spans the post-financial-crisis recovery, the zero-lower-bound era, and the 2022–2023 tightening, but it does not include the 1970s–1980s high-inflation period, the 1990s–2000s tech bubble and housing cycle, or the 2008 crisis itself. The correlation structure may differ in those regimes. The result is specific to the past fifteen years and cannot be extrapolated to future regimes without additional evidence.

  2. In-sample per-year estimates: The per-year correlations are computed in-sample within each calendar year; they are not out-of-sample forecasts. A hedging strategy implemented in real time would not have known the 2022 correlation in advance. The per-year series is a diagnostic of time variation, not a validation of predictive power. An out-of-sample test would require a rolling-window or expanding-window design with a holdout period, which was not performed here.

  3. Universe and instrument choice: SPY and TLT are specific ETFs; the result may not generalize to other equity indices (Russell 2000, MSCI World) or other duration exposures (intermediate Treasuries, TIPS, corporate bonds). TLT's 20+ year duration makes it highly sensitive to rate changes; a shorter-duration Treasury ETF (e.g., IEF, 7–10 year) would likely exhibit a different correlation profile. The result is specific to the long-duration Treasury–large-cap equity pair.

  4. Daily frequency and return definition: The analysis uses daily returns computed from yfinance adjusted-close prices; whether these are log or simple returns is not specified, although at daily frequency the two are nearly identical and the choice is unlikely to change the correlation materially. Intraday correlations, weekly correlations, or correlations of drawdowns (rather than returns) could differ. The daily frequency captures short-term co-movement but may miss lower-frequency regime shifts that unfold over quarters or years.

  5. No structural model: The correlation is a reduced-form summary statistic; it does not decompose the co-movement into structural drivers (discount rate shocks, cash flow shocks, inflation surprises, monetary policy shocks). A structural VAR or factor model could identify which shocks drive the correlation and whether those shocks' variance shares change across regimes. The current result documents the correlation's time variation but does not explain it.

  6. Assumption of stationarity within years: The per-year recomputation assumes the correlation is stable within each calendar year, which may not hold—2020, for example, saw a sharp COVID drawdown in March followed by a rally, and the correlation within that year may have varied substantially across quarters. A finer-grained rolling-window analysis (e.g., 60-day or 120-day windows) would reveal intra-year variation.

  7. No tail-risk or asymmetry analysis: The Pearson and Spearman correlations measure central co-movement but do not characterize tail dependence (copula structure) or asymmetry (whether the correlation is stronger in down markets than up markets). A hedging strategy is most valuable in tail events; the current result does not test whether the negative correlation holds or strengthens in the left tail of the equity return distribution.
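
The rolling-window analysis mentioned in items 2 and 6 could look roughly like the sketch below. It is not part of the original analysis: the function names, the window length, and the one-day lag used to approximate what would have been known in real time are all assumptions, and the input is a toy series.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rolling_corr(x, y, window):
    """Trailing-window r; entry t uses observations t-window+1 through t."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        start = t - window + 1
        out[t] = pearson(x[start:t + 1], y[start:t + 1])
    return out


def lagged(series, k=1):
    """Shift forward by k (k >= 1) so entry t only reflects data through t-k."""
    return [None] * k + series[:-k]


# Toy series: opposed moves for four days, then aligned moves for four days.
eq = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01, 0.04]
bond = [-v for v in eq[:4]] + eq[4:]

rc = rolling_corr(eq, bond, window=4)
known_in_advance = lagged(rc, k=1)
print(rc)
print(known_in_advance)
```

With real data a 60-day or 120-day window, as suggested in item 6, would replace the toy window of 4. The lagged series is the estimate a hedger could actually have used on each day.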

Strengthening the result would require: (a) extending the sample to earlier decades with different monetary regimes, (b) conducting an out-of-sample rolling-window test to assess real-time hedging efficacy, (c) comparing across equity indices and duration buckets, (d) decomposing the correlation into structural shocks, and (e) analyzing tail dependence and asymmetry. The current result establishes that the SPY–TLT correlation is significantly negative on average over 2010–2024 but time-varying and regime-dependent, with a breakdown in 2022–2024—a finding that is robust within its stated scope but not a universal law of equity–Treasury co-movement.
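
As a starting point for item (e), one simple diagnostic is to recompute the correlation on down-equity days only, or on the worst equity days only. The sketch below is an illustration under stated assumptions: the function names, the order-statistic quantile rule, and the toy data are hypothetical. Correlations computed on a subset selected by one variable's value are not directly comparable to the unconditional figure, so this is a rough screen and not a substitute for a copula or tail-dependence model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def conditional_corr(x, y, keep):
    """Pearson r over the pairs whose x value satisfies keep(x); also returns n."""
    pairs = [(a, b) for a, b in zip(x, y) if keep(a)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


def lower_quantile(values, q):
    """Simple order-statistic approximation to the q-quantile (0 <= q <= 1)."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


# Toy data: bonds offset equity losses but move with equity gains.
eq = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
bond = [3.0, 2.0, 1.0, 1.0, 2.0, 3.0]

down_r, down_n = conditional_corr(eq, bond, lambda v: v < 0)
up_r, up_n = conditional_corr(eq, bond, lambda v: v > 0)
cutoff = lower_quantile(eq, 0.5)
print(down_r, down_n, up_r, up_n, cutoff)
```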


Research evidence, not investment advice.