Radical transparency

The Empirica pipeline, fully open

Empirica is a new, founder-led firm. We don't have decades of institutional track record to point at — so instead of implying one, we're doing the thing an established firm won't: publishing the pipeline itself. The architecture, the agent prompts, the validator configuration, the failure rates, the costs. In stages, verified as we go.

How to read this page

This is a disclosure in progress, and we'd rather show the seams than pretend it's finished. Each section below is in one of two states:

Published — the artifact is on this page (or linked), and any number in it is computed from data we hold or cited to a checkable source.
Coming — clearly marked amber blocks naming exactly what we still have to extract, verify, and publish. We publish nothing in those slots until it's real.

What will never appear here: API keys and credentials, live trading state, or the private impact-scoring work — the line between “open methodology” and “leaked operations” is deliberate and documented.

Architecture — what the machine is

Partially published

The shape of the system, in one paragraph: a fleet of always-on research agents rotates across domains (agent economy, mathematics, quantitative methods, strategy), generating research topics, searching arXiv and OpenAlex, and producing cited syntheses. Every output — ours or an external submission — then passes through a separate validation pipeline before anything goes public. A deterministic content gate runs first (template leaks, duplicate headings, banned advice-shaped language, non-research output), then the three-check LLM validator (logic, empirical, depth) documented on the scoring page. Validated work is published to Notes, Publications, or Courses; everything else is rejected with per-check feedback.
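As an illustration only (a sketch, not the production code), the two-stage ordering described above could look like this in Python. Every name here is hypothetical: `review`, `Decision`, and the toy `duplicate_headings_ok` check are stand-ins, the assumption that headings start with `#` is ours for the example, and the validator is passed in as a plain function rather than a real LLM call. The publish floor is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; not the production interfaces.
GateCheck = Callable[[str], bool]             # True when the text passes the check
Validator = Callable[[str], Dict[str, bool]]  # pass/fail per check: logic, empirical, depth


@dataclass
class Decision:
    published: bool
    feedback: List[str] = field(default_factory=list)


def duplicate_headings_ok(text: str) -> bool:
    """Toy gate check: fail if any heading line (assumed to start with '#') repeats."""
    headings = [ln.strip() for ln in text.splitlines() if ln.startswith("#")]
    return len(headings) == len(set(headings))


def review(text: str, gate: Dict[str, GateCheck], validator: Validator) -> Decision:
    # Stage 1: the deterministic gate runs first. If any gate check fails,
    # the validator is never called.
    failed = [name for name, check in gate.items() if not check(text)]
    if failed:
        return Decision(False, [f"gate: {name}" for name in failed])

    # Stage 2: the three-check validator. Any failed check rejects the output,
    # and the failing checks are returned as per-check feedback.
    results = validator(text)
    failed = [name for name, ok in results.items() if not ok]
    return Decision(not failed, [f"validator: {name}" for name in failed])
```

Running the gate before the validator means cheap, deterministic rejections never spend a model call, and the rejection reason always names the stage that produced it.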

Coming — not yet published

Next in this slot: the full architecture diagram — every agent, queue, store, and gate, drawn from the live infrastructure rather than from memory, with a plain-English walkthrough.

Agent prompts — the actual instructions

Coming

The research agents are LLM pipelines, and an LLM pipeline is its prompts. Publishing them shows you what the agents are told to do — and just as importantly, what they are forbidden from doing: inventing citations, stating numbers that don't come from a real computation, framing research findings as investment advice. We intend to publish the synthesis prompts for each research domain, with redactions limited to security-relevant internals (and each redaction marked, not silent).

Coming — not yet published

Next in this slot: the verbatim system prompts for each research domain's synthesis step, with marked-and-explained redactions only where a string is security-relevant.

Validator configuration — the bar, in numbers

Partially published

The parts already public: three independent checks (logic, empirical, depth) with per-check pass floors, documented with thresholds on the scoring page. On top of that sits a publish floor: a validation score of at least 80 is required for publication under the Empirica brand, work scoring 65–79 can appear only in a clearly-labelled working-notes band, and below 65 nothing goes public at all. Passing the three checks is not sufficient by itself: an output that passes them but scores below the floor still does not publish.
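The publish floor can be written down as a small function. This is a minimal sketch of the three bands stated above, not the validator's decision-combination logic: the function name and band labels are hypothetical, the per-check pass floors are deliberately left out, and treating a fractional score such as 79.5 as working-notes is an assumption of the sketch.

```python
BRAND_FLOOR = 80          # at least 80: publication under the Empirica brand
WORKING_NOTES_FLOOR = 65  # 65 to 79: clearly-labelled working-notes band only


def publication_band(score: float) -> str:
    """Map a validation score to its band (labels are illustrative)."""
    if score >= BRAND_FLOOR:
        return "empirica-brand"
    if score >= WORKING_NOTES_FLOOR:
        return "working-notes"
    return "not-published"
```

For example, `publication_band(80)` falls in the brand band, `publication_band(79)` and `publication_band(65)` fall in working notes, and `publication_band(64)` is not published.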

What we deliberately do not publish: the exact trigger strings of the deterministic gates. Publishing those would tell a low-effort submitter precisely which phrases to strip to dodge the gate — it would weaken the bar rather than evidence it. We say so here instead of quietly omitting it.

Coming — not yet published

Next in this slot: the validator's full configuration surface — model, check prompts, threshold table, and the decision-combination logic — extracted from the running pipeline and kept in sync with it.

Failure rates — what the bar actually rejects

Coming

A validation pipeline that never rejects anything is theatre. Ours rejects a substantial share of what our own agents produce — and the honest version of that claim is a table, not an adjective: outputs generated vs published, rejection reasons by category, and how the rates move over time. Every number in that table will be computed from our own output index and reproducible, per our verified-statistics rule. Until the table is built and checked, we give you no number here at all.

Coming — not yet published

Next in this slot: the generated-vs-published table, with counts and a rejection-reason breakdown computed directly from the output index, and the query published alongside the numbers.

Costs — what it takes to run

Coming

Autonomous research has a real unit cost: model tokens, compute, storage. Every LLM call in the fleet is cost-tracked, so we can publish what a research cycle actually costs — and what that implies about the economics of validated machine-generated research. This matters to anyone evaluating whether our pricing is honest, and to anyone building a similar pipeline.

Coming — not yet published

Next in this slot: the operating-cost disclosure, covering daily fleet burn and approximate cost per published output, computed from the live cost-tracking metrics, with the founder deciding the publication cadence.

Why open the pipeline

Because we can't ask for trust on track record yet

An established research firm earns trust on years of output. A new one has two options: imply a maturity it doesn't have, or show its working. We've chosen the second. The pipeline is the firm — if the architecture is sound, the prompts are disciplined, the validator genuinely rejects weak work, and the costs are sustainable, you can judge that directly instead of taking our word for it.

This page will fill in section by section. Where a section is still amber, that's the honest state of the disclosure — not a teaser for content that secretly exists.

Read the scoring rubric →
The eight trust standards
Empirica Score reproducibility package →
