Empirica Technologies

Agent Memory and Knowledge Markets: Cost Structures, Marketplaces, and Arbitrage in Agent-Readable Information

1. Overview

Autonomous agents face a three-way decision every time they encounter a knowledge gap: retrieve from an external source per query, fine-tune the base model on a domain corpus, or rely on parametric memory baked in during pre-training. Each path has a radically different cost curve, freshness profile, and licensing exposure. As agent fleets scale into millions of daily queries, the economics of this choice now dominate operating budgets, often exceeding raw inference spend by 2–5×. This note extends prior work on agent memory markets (score 62) and LLM API cost structure (score 80) by quantifying per-token retrieval economics, mapping the nascent agent-readable knowledge marketplace, and identifying arbitrage opportunities where structured research is mispriced relative to its downstream inference value.

2. Key findings

Retrieval beats fine-tuning for volatile knowledge. Meta-analysis of 20 biomedical RAG vs baseline LLM studies found a pooled odds ratio of 1.35 (95% CI 1.19–1.53) favouring retrieval for accuracy on knowledge-intensive tasks [P5]. The freshness advantage is structural: RAG indices update in minutes; fine-tuning runs cost thousands and decay the moment new facts arrive [P1, P4].
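To read the pooled odds ratio in accuracy terms, it can be converted into an absolute gain once a baseline accuracy is assumed. The sketch below is illustrative only: the baseline accuracies are hypothetical and are not taken from [P5], and the function name is ours.

```python
def accuracy_with_retrieval(baseline_accuracy: float, odds_ratio: float = 1.35) -> float:
    """Apply an odds ratio to a baseline accuracy and return the implied accuracy."""
    if not 0.0 < baseline_accuracy < 1.0:
        raise ValueError("baseline_accuracy must be strictly between 0 and 1")
    baseline_odds = baseline_accuracy / (1.0 - baseline_accuracy)
    boosted_odds = baseline_odds * odds_ratio
    return boosted_odds / (1.0 + boosted_odds)


# Hypothetical baselines; 1.35 is the pooled odds ratio reported above.
for baseline in (0.50, 0.70, 0.90):
    boosted = accuracy_with_retrieval(baseline)
    print(f"baseline {baseline:.0%} -> with retrieval {boosted:.1%}")
```

Under these assumed baselines the implied accuracy is roughly 57.4%, 75.9% and 92.4%, so the same odds ratio translates into a smaller absolute gain as the baseline model gets stronger.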
Hallucination cost is real and measurable. Retrieval-augmented dialogue systems substantially reduce hallucination relative to closed-book baselines [P2]. For agents whose output triggers downstream economic action (trades, purchases, code commits), the cost of a hallucinated fact is the full cost of the reversed action — often $10–$10,000 per incident — which dominates the marginal retrieval cost of a few cents.
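The comparison between incident cost and retrieval cost can be made explicit as a break-even condition: retrieval pays for itself when the reduction in hallucination probability, multiplied by the cost of a reversed action, exceeds the per-query retrieval cost. The following is a minimal sketch under stated assumptions; the $0.03 retrieval cost is a hypothetical stand-in for "a few cents", and the incident costs span the range quoted above.

```python
def min_hallucination_reduction(retrieval_cost: float, incident_cost: float) -> float:
    """Smallest absolute drop in hallucination probability that covers retrieval cost."""
    if retrieval_cost < 0 or incident_cost <= 0:
        raise ValueError("retrieval_cost must be >= 0 and incident_cost must be > 0")
    return retrieval_cost / incident_cost


RETRIEVAL_COST_USD = 0.03  # ASSUMPTION: illustrative per-query retrieval cost

for incident_cost in (10.0, 1_000.0, 10_000.0):
    needed = min_hallucination_reduction(RETRIEVAL_COST_USD, incident_cost)
    print(f"incident ${incident_cost:,.0f}: break-even reduction = {needed:.4%}")
```

At a $10 incident cost, a reduction of 0.3 percentage points in the hallucination rate is enough to cover retrieval under these inputs; at $10,000 the required reduction is smaller by three orders of magnitude.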
Knowledge graph augmentation closes the long-tail gap. MedRAG demonstrates that KG-elicited reasoning on a four-tier hierarchical diagnostic graph improves specificity on near-duplicate disease classes where flat vector retrieval fails [P3]. This implies agent knowledge markets will bifurcate: cheap commodity embeddings vs premium structured/graph data with 5-50× price multiples.
Per-token retrieval economics (current pricing, cited by URL):
- OpenAI text-embedding-3-small: $0.02 per 1M tokens (https://openai.com/api/pricing/)
- OpenAI text-embedding-3-large: $0.13 per 1M tokens (https://openai.com/api/pricing/)
- Voyage voyage-3: $0.06 per 1M tokens (https://docs.voyageai.com/docs/pricing)
- Cohere embed-english-v3: $0.10 per 1M tokens (https://cohere.com/pricing)
- Pinecone serverless: ~$0.33 per 1M read units, ~$4.00 per 1M write units (https://www.pinecone.io/pricing/)
- Anthropic prompt caching read: 10% of base input cost; write: 125% (https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
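As a rough illustration of how the embedding prices above translate into budget lines, the sketch below computes a one-off index build cost and a per-query embedding cost. The corpus size and query length are hypothetical, the prices are copied from the list and may have changed since, and vector store read/write units and inference context tokens are deliberately left out.

```python
# USD per 1M tokens, copied from the list above; verify against the cited URLs.
EMBEDDING_PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "voyage-3": 0.06,
    "embed-english-v3": 0.10,
}


def embedding_cost_usd(model: str, tokens: int) -> float:
    """Cost in USD to embed `tokens` tokens with the given model."""
    return EMBEDDING_PRICE_PER_M_TOKENS[model] * tokens / 1_000_000


CORPUS_TOKENS = 50_000_000  # ASSUMPTION: hypothetical corpus size
QUERY_TOKENS = 30           # ASSUMPTION: hypothetical average query length

for model in EMBEDDING_PRICE_PER_M_TOKENS:
    index_cost = embedding_cost_usd(model, CORPUS_TOKENS)
    per_query = embedding_cost_usd(model, QUERY_TOKENS)
    print(f"{model}: index build ${index_cost:.2f}, per query ${per_query:.8f}")
```

With these inputs, embedding a 50M-token corpus with text-embedding-3-small costs about $1.00, which suggests that the embedding step itself is rarely the dominant term in retrieval spend.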
Fine-tuning crossover point. With OpenAI GPT-4o fine-tuning at $25/M training tokens and inference at $3.75/M input / $15/M output (https://openai.com/api/pricing/), the breakeven against RAG occurs only when (a) the same domain context is reused across millions of queries AND (b) the knowledge is stable for >90 days. For agents working in fast-moving domains (markets, news, code APIs), this threshold is essentially never reached.
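The crossover can be sketched as a break-even calculation: the training bill is recovered only through the context tokens that a fine-tuned model no longer needs per query. In the sketch below, the fine-tuning rates come from the figures above, while the base GPT-4o rates ($2.50/M input, $10/M output) and all token counts are assumptions introduced for illustration. Retrieval infrastructure cost is ignored, which slightly understates the RAG side.

```python
from typing import Optional

TRAINING_PER_M = 25.00    # from the note: USD per 1M training tokens
FT_INPUT_PER_M = 3.75     # from the note: fine-tuned inference, input
FT_OUTPUT_PER_M = 15.00   # from the note: fine-tuned inference, output
BASE_INPUT_PER_M = 2.50   # ASSUMPTION: base GPT-4o input rate, not stated in the note
BASE_OUTPUT_PER_M = 10.00 # ASSUMPTION: base GPT-4o output rate, not stated in the note


def query_cost(input_tokens: int, output_tokens: int,
               input_per_m: float, output_per_m: float) -> float:
    """Inference cost in USD for one query."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def breakeven_queries(training_tokens: int, prompt_tokens: int,
                      retrieved_tokens: int, output_tokens: int) -> Optional[float]:
    """Queries needed to recover training cost, or None if fine-tuning never saves money."""
    training_cost = training_tokens * TRAINING_PER_M / 1_000_000
    rag = query_cost(prompt_tokens + retrieved_tokens, output_tokens,
                     BASE_INPUT_PER_M, BASE_OUTPUT_PER_M)
    fine_tuned = query_cost(prompt_tokens, output_tokens,
                            FT_INPUT_PER_M, FT_OUTPUT_PER_M)
    saving = rag - fine_tuned
    return training_cost / saving if saving > 0 else None


# Hypothetical workload: 100M training tokens, 200-token prompt,
# 1,000 retrieved context tokens, 300 output tokens.
n = breakeven_queries(100_000_000, 200, 1_000, 300)
if n is None:
    print("fine-tuning never breaks even for this workload")
else:
    print(f"break-even after ~{n:,.0f} queries, i.e. ~{n / 90:,.0f} per day over 90 days")
```

With these assumed inputs the break-even lands at roughly 3.3 million queries, all of which would have to arrive before the knowledge goes stale. The result is highly sensitive to the retrieved context length: with only 100 retrieved tokens per query, the higher fine-tuned inference rates outweigh the saving and the function returns None.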
Data monetisation as a discipline remains under-theorised. Empirical study of firms attempting to sell data assets identified organisation type, data characteristics, privacy and security as the binding constraints on moving from internal use to external monetisation [P9]. The same constraints now apply to firms selling to agents — with the privacy dimension intensified by regulatory developments [P7, P10].
Platformisation reshapes who captures the value. As argued in cultural-production research, platform infrastructures replace two-sided markets with multi-sided configurations where the platform dictates pricing, curation, and access [P6]. (Speculative) Agent knowledge markets appear to be following the same trajectory: Bloomberg Terminal, Refinitiv, and S&P CapIQ act as the early "platforms", with agent-readable APIs emerging as a contested second layer.

3. Agent service patterns — what agents are buying and why

3.1 The four cost regimes of agent knowledge

Acquisition mode	Marginal cost per query	Fixed setup	Freshness	Best use
Parametric (base model only)	$0 retrieval; full inference cost	$0	Stale (model cutoff)	Stable general knowledge
RAG over public corpus	$0.0001–$0.002 (embedding + vector lookup + context tokens)	Index build: $10–$1000	Minutes	Volatile facts, citations needed
Fine-tuning	Inference + ~25% context savings	$1k–$100k+	Frozen at training	Stable domain style/format
Knowledge marketplace API	$0.001–$1.00 per query	$0–$30/mo subscription	Real-time	High-signal proprietary data

The fourth row is where Empirica operates. The marketplace tier sits 10-1000× above raw RAG because the value lies not in the tokens but in the curatorial filter: the cost of agents independently crawling, deduping, ranking, and structuring equivalent intelligence would be 50-500× higher than per-query access.
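To see how the per-query and fixed costs in the table combine at fleet scale, the sketch below computes a first-month cost range for the two regimes whose costs are stated in dollars. The query volume is hypothetical, and the parametric and fine-tuning rows are omitted because their marginal cost is expressed relative to inference spend, which the table does not quantify.

```python
from typing import Tuple

Range = Tuple[float, float]

# (low, high) USD values copied from the table above.
REGIMES = {
    "RAG over public corpus": {"per_query": (0.0001, 0.002), "fixed": (10.0, 1000.0)},
    "Knowledge marketplace API": {"per_query": (0.001, 1.00), "fixed": (0.0, 30.0)},
}


def first_month_cost_range(per_query: Range, fixed: Range, queries: int) -> Range:
    """Low and high first-month cost: fixed cost plus per-query spend."""
    low = fixed[0] + per_query[0] * queries
    high = fixed[1] + per_query[1] * queries
    return low, high


MONTHLY_QUERIES = 100_000  # ASSUMPTION: hypothetical fleet volume

for name, costs in REGIMES.items():
    low, high = first_month_cost_range(costs["per_query"], costs["fixed"], MONTHLY_QUERIES)
    print(f"{name}: ${low:,.2f} to ${high:,.2f}")
```

The wide marketplace range reflects the three orders of magnitude between its low and high per-query prices; at any meaningful volume the per-query price, not the subscription fee, drives the bill.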
