Empirica
See our work

Anyone can show a good number.

What is hard, and what we sell, is the rigor that makes a number trustworthy: tested out-of-sample, checked against chance, deflated for the number of attempts, pre-registered, forward-tested, and reported honestly even when it kills the idea. Below, we explain how we do that, then walk through one or two worked examples for each service.

We have no public client case studies yet, so nothing here is a dressed-up client result. Each example is tagged either Reproducible (a real, public result you can re-run, linked to its source) or Illustrative (a representative walk-through of the method, no invented figures).

How we make a result trustworthy

The same checks, on every result we hand you

Out-of-sample, always

We never grade a method on the data it was built on. We hold data back — different periods, different cases — and only believe a result if it survives there.
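A minimal sketch of an out-of-time holdout, assuming each row carries a timestamp. The function name `out_of_time_split` and the default 20% holdout are illustrative choices, not a documented Empirica tool. The latest rows are held back, so the model is never scored on a period it was built on.

```python
import numpy as np

def out_of_time_split(timestamps, holdout_frac=0.2):
    """Return (train_idx, test_idx) with the latest rows held out for testing.

    Unlike a random split, every test row comes after every training row,
    so the score reflects a period the model could not have seen.
    """
    timestamps = np.asarray(timestamps)
    order = np.argsort(timestamps, kind="stable")  # row indices, oldest first
    n_test = int(round(len(order) * holdout_frac))
    if n_test == 0 or n_test == len(order):
        raise ValueError("holdout_frac leaves an empty train or test set")
    return order[:-n_test], order[-n_test:]

# Ten rows recorded out of order; hold back the latest 30%.
ts = np.array([5, 1, 4, 2, 3, 9, 7, 8, 6, 10])
train_idx, test_idx = out_of_time_split(ts, holdout_frac=0.3)
print(sorted(ts[train_idx].tolist()), sorted(ts[test_idx].tolist()))
# [1, 2, 3, 4, 5, 6, 7] [8, 9, 10]
```

Holding back different cases (for example, whole customers) works the same way, with a group identifier taking the place of the timestamp.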

Tested against chance

Could a coin flip have done this? We shuffle the labels and re-run the analysis (a permutation test). If a result does not clearly beat the shuffled runs, it does not ship.
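A simplified sketch of a one-sided permutation test, assuming a higher score is better. To keep it short, it shuffles the labels against fixed predictions instead of refitting the model on every shuffle, which a full version would do. The names `permutation_p_value` and `correlation` are hypothetical. The p-value counts the real run as one of the permutations, so with 1,000 shuffles and none matching the real score it comes out at 1/1001, roughly 0.001.

```python
import numpy as np

def permutation_p_value(score_fn, y_true, y_pred, n_perm=1000, seed=0):
    """One-sided permutation test: how often does a shuffled-label run score at least as well?

    Simplification: predictions stay fixed and only the labels are shuffled.
    A full version would refit the whole pipeline on each shuffled copy.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    observed = score_fn(y_true, y_pred)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(y_true)  # destroys any real label/prediction link
        if score_fn(shuffled, y_pred) >= observed:
            hits += 1
    # Counting the observed run as a permutation keeps p above zero.
    return (hits + 1) / (n_perm + 1)

def correlation(a, b):
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(42)
x = rng.normal(size=200)                 # stand-in predictions
y = x + rng.normal(scale=0.5, size=200)  # outcomes with a real link to x
print(round(permutation_p_value(correlation, y, x), 4))  # 0.001
```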

Deflated for how many we tried

Try enough ideas and one looks great by luck. We deflate every result for the number of attempts behind it, so we are not rewarding a lucky search.
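The page does not say which correction is applied. As a sketch, assuming the attempts are scored with p-values, here are the two textbook corrections for a search over many ideas: Bonferroni (conservative) and Šidák (exact when the attempts are independent). Both function names are illustrative.

```python
def bonferroni(p_best, n_tries):
    """Conservative: multiply the best p-value by the number of tries, capped at 1."""
    return min(1.0, p_best * n_tries)

def sidak(p_best, n_tries):
    """Chance that at least one of n independent no-signal tries reaches p_best."""
    return 1.0 - (1.0 - p_best) ** n_tries

# The best of 20 ideas has p = 0.01 when viewed on its own.
print(round(bonferroni(0.01, 20), 4))  # 0.2
print(round(sidak(0.01, 20), 4))       # 0.1821
```

A result that looks like a 1-in-100 fluke on its own is closer to a 1-in-5 event once the 20 attempts behind it are counted.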

Pre-registered and hash-frozen

Before we see an outcome, we write down the decision, the hypotheses, and the exact test — then freeze it with a dated SHA-256 hash. The answer cannot be quietly moved after the fact.
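A minimal sketch of hash-freezing with Python's standard `hashlib` and `json` modules. The record fields and the `freeze`/`verify` names are hypothetical. Sorting keys and fixing separators makes the encoding deterministic, so the same content always produces the same SHA-256 digest, and any later edit produces a different one.

```python
import hashlib
import json

def freeze(record):
    """Return the SHA-256 hex digest of a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify(record, published_digest):
    """True only if the record matches exactly what was frozen."""
    return freeze(record) == published_digest

# Hypothetical pre-registration; every field here is illustrative.
prereg = {
    "frozen_on": "2026-01-15",
    "decision": "Roll out feature X to all users?",
    "hypothesis": "X raises weekly retention",
    "test": "one-sided permutation test, alpha = 0.05, 1000 shuffles",
}
digest = freeze(prereg)

# Quietly loosening the threshold after the fact breaks the match.
moved = dict(prereg, test="one-sided permutation test, alpha = 0.10, 1000 shuffles")
print(verify(prereg, digest), verify(moved, digest))  # True False
```

The digest proves that the content has not changed. The date is only as trustworthy as the place the digest was published when it was frozen, such as a sent email or a public commit.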

Forward-tested

Some claims only the calendar can settle. We pre-register the prediction and let it read out on a future date — not in hindsight.
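A small sketch of a forward-tested claim, assuming a numeric prediction with an agreed tolerance. The `ForwardPrediction` class and its fields are hypothetical. The record is immutable once created, and it refuses to be scored before its read-out date. In practice you would also hash-freeze it so the prediction cannot change outside the program.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ForwardPrediction:
    """A claim written down now and scored only once its read-out date arrives."""
    claim: str
    predicted: float
    tolerance: float
    reads_out_on: date

    def score(self, actual, today):
        # Refuse to grade early: the calendar, not hindsight, settles the claim.
        if today < self.reads_out_on:
            raise ValueError(f"not scoreable until {self.reads_out_on.isoformat()}")
        return abs(actual - self.predicted) <= self.tolerance

# Hypothetical claim; the figures are placeholders, not a real forecast.
pred = ForwardPrediction("Q3 weekly demand", predicted=1200.0, tolerance=100.0,
                         reads_out_on=date(2026, 10, 1))
print(pred.score(actual=1150.0, today=date(2026, 10, 2)))  # True
```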

Reported even when it fails

Our own files are full of “we tried this, it did not work.” That is the point — a firm that only ever shows wins is hiding its losses.

Data-bottleneck engagements

Point us at the data-heavy step that is slowing you down; we rebuild it and prove the gain.

Reproducible

Reproducing an expensive benchmark from public data

We took a publicly visible third-party screening benchmark and, using public data alone, matched 95.8% of its top 100. We then published the full reproducibility package, including every factor's correlation. The point is not the number; it is that you can re-run it yourself.

See the method + reproducibility →
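The page does not define how the 95.8% is computed. A single 100-item list can only give whole percentages, so the figure is presumably an aggregate of some kind. This sketch assumes the simplest reading: the share of the benchmark's top-k that also appears in the reproduced top-k, averaged over several snapshots. The function names and the placeholder IDs are hypothetical.

```python
def top_k_overlap(benchmark_ranking, our_ranking, k=100):
    """Fraction of the benchmark's top-k items that also appear in our top-k."""
    bench_top = set(benchmark_ranking[:k])
    ours_top = set(our_ranking[:k])
    return len(bench_top & ours_top) / len(bench_top)

def mean_top_k_overlap(snapshots, k=100):
    """Average overlap across several (benchmark, ours) ranking pairs, e.g. dates."""
    return sum(top_k_overlap(b, o, k) for b, o in snapshots) / len(snapshots)

# Toy example with k = 5 and placeholder IDs.
snapshots = [
    (["A", "B", "C", "D", "E"], ["A", "B", "C", "X", "D"]),  # 4 of 5 match
    (["F", "G", "H", "I", "J"], ["J", "I", "H", "G", "F"]),  # same set, different order
]
print(mean_top_k_overlap(snapshots, k=5))  # 0.9
```

This measure ignores order within the top-k. A rank-sensitive comparison would need a different metric.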

Reproducible

Forecasting demand — and proving the number is real

On a public dataset of 17,379 hourly demand records, we trained a forecast on earlier hours and scored it on a later stretch the model never saw. Scored the easy, flattering way instead (a random split), it reaches R² 0.67, where 1.0 is perfect. Scored the honest way, out-of-time, it reaches 0.62, and it cuts the error of a naive hour-of-day baseline by about 6%. The evidence that this is signal, not luck: across 1,000 shuffled-label runs, not one did better (p ≈ 0.001). Same public data, one script, the same numbers every time.

The public dataset (UCI) →
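A minimal sketch of the scoring logic on synthetic data (not the UCI file). It assumes mean absolute error as the error measure, since the page does not say which one the 6% refers to. The last 20% of hours are held out, the hour-of-day baseline is built from training hours only, and the "model" (hourly profile plus a linear trend) is a placeholder, not Empirica's forecaster.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1.0 is perfect, 0.0 matches predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def hour_of_day_means(hours, y):
    """Naive baseline: average demand seen at each hour of the day (0-23)."""
    return np.array([y[hours == h].mean() for h in range(24)])

# Synthetic stand-in for an hourly demand series: daily cycle + slow trend + noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 400)  # 400 days of hourly rows
hours = t % 24
y = 100 + 50 * np.sin(2 * np.pi * hours / 24) + 0.01 * t + rng.normal(0, 10, t.size)

# Out-of-time split: train on the first 80% of hours, score on the last 20%.
cut = int(0.8 * t.size)
tr, te = slice(None, cut), slice(cut, None)

# Baseline uses the training period only.
base_means = hour_of_day_means(hours[tr], y[tr])
baseline_pred = base_means[hours[te]]

# Placeholder model: the same hourly profile plus a trend fitted on training residuals.
resid = y[tr] - base_means[hours[tr]]
slope, intercept = np.polyfit(t[tr], resid, 1)
model_pred = base_means[hours[te]] + slope * t[te] + intercept

print("out-of-time R2, model:   ", round(r2(y[te], model_pred), 3))
print("out-of-time R2, baseline:", round(r2(y[te], baseline_pred), 3))
cut_pct = 100 * (1 - mae(y[te], model_pred) / mae(y[te], baseline_pred))
print(f"MAE cut vs hour-of-day baseline: {cut_pct:.1f}%")
```

Scoring the same model on a random split usually looks better, as in the 0.67 versus 0.62 above, because the test hours then sit between training hours and the model effectively interpolates.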

More worked examples are being added for this service.

Decision-Grade Data Design

Design the dataset you should be building now for the decision you are about to make.

Reproducible

Pre-registering a decision so the answer cannot move

Before any data is touched, we freeze the decision, the hypotheses, the experiment and causal design, and the success test as a signed, dated record sealed with a SHA-256 hash. When the result lands, there is no room to quietly re-cut it to taste; the freeze is the accountability.

See Decision-Grade Data →

Illustrative

Designing what to measure, before measuring it

A representative case: a team about to run an experiment. We map the decision to its hypotheses, the causal design, the statistical power needed to detect a real effect, and the pre-registration, so the data they collect is decision-grade rather than vanity metrics. This illustrates the design process.
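As a sketch of the power step, assuming the outcome is a conversion rate compared across two equal arms, here is the standard normal-approximation sample-size formula for a two-sided, two-proportion z-test. The page does not say which power method is used. The function name and the 10% to 12% rates are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided, two-proportion z-test (normal approximation)."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to have a detectable effect")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z(power)           # quantile matching the desired power
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# Hypothetical: detect a lift from 10% to 12% conversion at 80% power.
print(n_per_arm(0.10, 0.12))
```

The required sample grows roughly with the inverse square of the effect size, so halving the lift you want to detect roughly quadruples the data you need. That is why this step belongs in the design, before anything is collected.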

More worked examples are being added for this service.

Want this rigor on your problem?

Bring us the data-heavy step that is slowing you down, or the decision you need better data to make. We will scope it — and you will get the result with its working, not just the headline.

© 2026 Empirica Technologies Pty Ltd · ABN 76 698 226 247 · All rights reserved.
empiricaai.org