Reproducibility package

The package behind the 95.8%

The Empirica Score's Phase A claim — 95.8% and 94.9% top-100 overlap with a third-party scoring benchmark on two held-out snapshots, R² 0.859 across 800 paired observations — is real, computed by us, from our own validation run. It is also, today, self-reported: no one outside the firm has re-run it. This page is the package that closes that gap, artifact by artifact, with the status of each shown honestly.

The claim this package backs

Top-100 overlap: 95.8% (2026-05-22 snapshot) and 94.9% (2026-05-12 snapshot) against a publicly visible third-party scoring benchmark, both held out of the fit, both within the same two-week window. Pass bar pre-set at 60%.
Fit quality: R² 0.859 reproducing that benchmark's scores across 800 paired observations over 8 daily snapshots.
Scope: a replication test on free data — how closely an open model reproduces an existing paid screen. Not a forward-return or alpha claim.
Epistemic status: self-reported, pending external reproduction. The label comes off when an independent party has run the protocol below — not before.
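
For readers who want to see what the two headline metrics measure, here is a minimal sketch in Python. It assumes the simplest reading of "top-100 overlap" (the share of names common to both top-k sets) and the textbook definition of R². The function names, the tie-breaking rule, and the toy tickers are all hypothetical. The exact overlap definition lives in the pending protocol. A plain intersection over 100 names can only produce whole percentages, so the published figures likely involve a tie rule or denominator not modelled here.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def top_k(scores: Mapping[str, float], k: int) -> set[str]:
    """Return the k highest-scoring tickers; ties broken by ticker name (assumed rule)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def top_k_overlap(model: Mapping[str, float],
                  benchmark: Mapping[str, float],
                  k: int = 100) -> float:
    """Share of names that appear in both top-k sets (assumed definition)."""
    return len(top_k(model, k) & top_k(benchmark, k)) / k


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


PASS_BAR = 0.60  # the pre-set pass bar stated in the claim

# Toy scores, invented for illustration only.
benchmark = {"AAA": 9.0, "BBB": 8.0, "CCC": 7.0, "DDD": 6.0}
model = {"AAA": 8.5, "BBB": 6.5, "CCC": 7.5, "DDD": 7.0}

overlap = top_k_overlap(model, benchmark, k=2)
passed = overlap >= PASS_BAR
```

In the toy data the two top-2 sets share one name, so the overlap is one half and the run falls short of the 60% bar. The real test applies the same comparison with k = 100 on each held-out snapshot.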

The package

Five artifacts, status per artifact

Methodology description

Published

The full scoring formula — three factor proxies (RS, TA, FA), OLS blend, index-membership and sector terms, fitted weights — with the fit window and refit date.

Live on /empirica-score (formula at a glance + methodology section).
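
To make "OLS blend" concrete, the sketch below fits ordinary least squares on synthetic data with NumPy. It is not the published formula: the proxy values, the "true" weights, and the single index-membership flag are invented for illustration, and sector terms are left out for brevity. Only the general shape (three proxies plus indicator terms, combined by least squares) comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 800  # matches the count of paired observations; the values are synthetic

# Hypothetical inputs: three factor proxies and a 0/1 index-membership flag.
rs, ta, fa = rng.normal(size=(3, n))
in_index = rng.integers(0, 2, size=n).astype(float)

# Design matrix with an intercept column. Sector terms would enter as
# additional one-hot columns in the same way (omitted in this sketch).
X = np.column_stack([np.ones(n), rs, ta, fa, in_index])

# Made-up weights, used only to generate a synthetic target to fit against.
true_w = np.array([50.0, 10.0, 5.0, 3.0, 2.0])
y = X @ true_w + rng.normal(scale=1.0, size=n)

# Ordinary least squares: choose w to minimise ||X w - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# In-sample fit quality, using the same R^2 definition as the claim.
pred = X @ w
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Scoring a held-out snapshot then amounts to building the same design matrix for that day and computing `X_holdout @ w` with the weights frozen.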

Source code

Pending

The three proxy implementations and the regression that combines them, runnable end-to-end against any 8 daily snapshots.

Exists in the private repository (scripts/empirica_score/). Public release requires a licensing and security pass — until it ships, "independently testable" is a design property, not a public fact.

Input data snapshots

Pending

The 8 daily snapshots (800 paired observations) behind the regression, plus the two held-out snapshots behind the overlap numbers — or, where the third-party benchmark's licensing forbids redistribution, our derived per-snapshot artifacts and an exact description of how to rebuild the input from public sources.

Licensing review decides redistribution vs rebuild-recipe. Either way the gap will be documented, not papered over.
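
Whichever route the review picks, a reproducer needs a way to confirm that the files they hold are the files the run used. One common approach, sketched here as an assumption rather than a description of the shipped package, is a SHA-256 manifest published alongside the data or the rebuild recipe. The directory layout, the `*.csv` naming, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large snapshots need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(snapshot_dir: Path) -> dict[str, str]:
    """Map each snapshot file name to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(snapshot_dir.glob("*.csv"))}


def verify_manifest(snapshot_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose digest differs."""
    mismatched = []
    for name, expected in manifest.items():
        path = snapshot_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            mismatched.append(name)
    return mismatched
```

An empty list from `verify_manifest` means the rebuilt or redistributed inputs are byte-identical to the ones the manifest was built from. Any other result names the files where the gap sits.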

Validation-run protocol

Pending

A step-by-step protocol an outside party can follow to reproduce Phase A: environment, data acquisition, fit, hold-out scoring, and the exact overlap computation — pre-committed so the test can't quietly move after the fact.

Drafted from the internal run; needs an external dry-run before we call it reproducible by someone who isn't us.
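
One way to make "pre-committed" checkable is to freeze the test parameters, publish a digest of them, and have the scoring step refuse to run against anything else. The sketch below illustrates that idea only. The field names and the overlap-definition string are hypothetical, while the pass bar, k, the snapshot count, and the held-out dates are taken from the claim above.

```python
import hashlib
import json

# Frozen test parameters. Values come from the claim; field names are illustrative.
protocol = {
    "k": 100,
    "pass_bar": 0.60,
    "fit_snapshots": 8,
    "held_out_snapshots": ["2026-05-12", "2026-05-22"],
    "overlap_definition": "len(top_k(model) & top_k(benchmark)) / k",  # assumed
}


def commitment(params: dict) -> str:
    """Digest of a canonical JSON encoding; any parameter change alters it."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verdict(overlap: float, params: dict, published_digest: str) -> bool:
    """Apply the pass bar, but only if the parameters match the published digest."""
    if commitment(params) != published_digest:
        raise ValueError("protocol parameters differ from the pre-committed version")
    return overlap >= params["pass_bar"]


# Published once, before any external run.
published_digest = commitment(protocol)
```

With the digest published ahead of time, a reproducer can confirm that the bar they are judged against is the one that was set before the numbers came in.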

External reproduction report

Pending

At least one independent party runs the protocol and publishes their numbers, whatever they are. This is the artifact that retires the "self-reported" label — nothing else does.

Open invitation: if you want to be the reproducer, contact us. We'll support the run and link the result here, whether it agrees with ours or not.

Why we publish the gap

A claim you can't re-run is a weaker claim

We could have left the overlap numbers on the landing page and said nothing about who has verified them. Most firms do. But the entire premise of Empirica is that research claims should be checkable: we hard-fail submissions for citations that can't be verified, so our own headline number doesn't get a softer standard. Until the external reproduction report exists, the honest description of Phase A is: a real, internally reproducible result, published with enough detail to audit the method, not yet independently confirmed.

Each pending artifact above names its blocker. As they ship, the badges flip — and the page history is the record that we shipped them rather than quietly redefining the claim.

