Rating system

Empirica's

Every research output we publish — and every external submission we score — earns a tier between zero and three Empirica's. The Empirica's is the brand-facing summary of our underlying 0–100 validation score.

For the per-check rubric that produces the underlying number, see the scoring page →

The ladder

Four tiers, tight thresholds

Exceptional

Score 90–100

Rare. Logic, empirical, and depth checks all passed at the highest band. Stands as a reference for the field.

Distinguished

Score 80–89

Strong work across all three checks. Publishable, citeable, and good enough that we automatically generate a course lesson from it.

Notable

Score 70–79

Validator-confident. The work clears the bar in every dimension, with minor improvements separating it from a higher tier.

Validated

Score 50–69

Passed our publication threshold for industry research, but falls below the 70-point threshold for an Empirica's. The per-rubric breakdown shows what would lift it.

Scores below 50 don't publish at all — the validator rejects them and the submitter gets per-rubric feedback explaining why.
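As a rough illustration, the ladder above can be written as a small lookup. This is a minimal sketch, not the validator's actual code: the names `Tier` and `tier_for_score` are hypothetical, scores are assumed to be whole numbers, and the number of Empirica's per tier is taken from the 70+ / 80+ / 90+ thresholds.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    empiricas: int  # number of Empirica's shown for this tier


# Lower bound of each band, highest first, as listed in the ladder.
_BANDS = [
    (90, Tier("Exceptional", 3)),
    (80, Tier("Distinguished", 2)),
    (70, Tier("Notable", 1)),
    (50, Tier("Validated", 0)),
]


def tier_for_score(score: int) -> Optional[Tier]:
    """Return the tier for a 0-100 validation score, or None if rejected."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, tier in _BANDS:
        if score >= lower_bound:
            return tier
    return None  # below 50: rejected, not published
```

Under these assumptions, `tier_for_score(87)` returns the Distinguished tier with two Empirica's, and `tier_for_score(49)` returns `None` because the submission would be rejected rather than published.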

Why a tier system

Numbers under-perform symbols at a glance

A reader scrolling Rankings or Notes doesn't want to do arithmetic to decide whether a paper is worth their time. A score of 87 means almost nothing without the rubric loaded in your head; two Empirica's tell you instantly that the validator was very confident and you should expect strong work.

The underlying 0–100 score is still shown everywhere, and the per-rubric breakdown is always one click away. The Empirica's is the shorthand, not a replacement.

Trustworthiness

The hard questions, answered honestly

Do authors pay to be graded?

No. The Empirica's brand depends on its independence. We don't accept payment from authors, institutions, or publishers in exchange for a grade. Submission is free; the score is the score. Future revenue comes from bulk API access for institutions and recommendation referrals, not from grading itself.

Who or what is the 'panel'?

An autonomous validation pipeline running Claude Sonnet and Haiku against a fixed, published rubric. Three independent checks — logic, empirical, depth — then a final decision. The rubric and its thresholds are public and the same bar applies to our own internal research as to external submissions. There's no human reviewer in the publish loop, by design: the bar is reproducible and the model is the same one anyone can run.

How can I trust the rating in a field you don't know well?

We won't pretend to coverage we don't have. Where we've scored hundreds of papers — agent economy, applied AI, quantitative strategy — the rating is calibrated and the thresholds are stable. Where coverage is thinner, the per-rubric breakdown is your best signal: a two-Empirica's paper with a perfect empirical score and a weaker depth score tells you exactly what the validator caught and didn't. See the breakdown on any submission's status page.

What stops grade inflation?

The thresholds are high on purpose: 70+ for one Empirica's, 80+ for two, 90+ for three. Most published work earns one or none; two is meaningful; three is rare. We recalibrate against our own internal output distribution quarterly and publish the recalibration when it happens.

How do I know my submission was graded fairly?

Every submission gets a per-rubric breakdown (Logic / Empirical / Depth scores plus specific issues flagged), emailed to you and shown on a live status page. If you disagree, revise and resubmit — the new attempt is scored fresh, with no penalty for prior tries. Full rubric here.

Will the criteria change?

The rubric is intentionally stable, but it isn't frozen. Major changes (a new check, a threshold shift, a new content type) are announced and applied retroactively to existing scores, so the historical ranking stays interpretable. Small drift, such as validator prompt tightening or new failure modes added to the rejection list, is documented in the public changelog.

Submit your research → | Read the scoring rubric | See live submissions →