Agent Memory and Knowledge Markets: How Agents Acquire, Store, and Monetise Information

1. Overview

The agent economy is rapidly developing a tiered market for memory and knowledge services, in which autonomous systems purchase external context (retrieval, freshness, structured facts) rather than rely solely on parametric weights. Retrieval-augmented generation has shifted from a research technique to commercial infrastructure: vector databases, embedding APIs, web-search APIs, and structured research feeds now compete on freshness, latency, cost-per-query, and machine-readability [P3][P4]. The result is an emerging four-layer stack — embedding/storage, retrieval/search, curated knowledge feeds, and agent-to-agent licensing — each with distinct pricing dynamics and staleness economics.

2. Key findings

  • RAG is now the dominant external-memory pattern for production agents. The canonical RAG survey distinguishes Naive, Advanced, and Modular RAG architectures [P3]. Modular RAG is the natural fit for agent deployments because it lets specialised retrievers, rerankers, and memory stores be priced and swapped independently. This modularity is what makes a market possible: each module is a buyable service.
  • Memory in agentic systems is multi-typed and not satisfied by raw vector search. Conversational-agent memory research identifies at least four memory classes — semantic, episodic, procedural, and emotional — and shows vector embeddings alone fail to capture extended context, particularly for agentic workflows requiring temporal reasoning [P4]. [EMPIRICA ANALYSIS] This implies the knowledge market will fragment by memory type, not just by domain.
  • Retrieved-context composition materially changes output quality. Cuconasu et al. demonstrate that the position and number of retrieved passages, and the kind of noise among them, alter RAG performance in counter-intuitive ways: adding random, unrelated passages can sometimes improve generation, while related passages that do not contain the answer (distractors) degrade it [P7]. Knowledge sellers therefore compete not only on raw recall but on how retrieved context is selected, ordered, and delivered, a service dimension Empirica's pre-structured notes naturally exploit.
  • Internet-augmented dialogue beats static retrieval for freshness-sensitive tasks. Komeili et al. showed that a model which learns to generate search queries on demand outperforms retrieval from a fixed FAISS-indexed corpus on knowledge-driven conversation [P6]. This provides early empirical support for paid live-search APIs (Brave, Tavily, Exa, Perplexity) over purely cached vector stores.
  • Vector storage is commoditising; retrieval quality and freshness are not. Public pricing pages (Pinecone: https://www.pinecone.io/pricing/, Weaviate Cloud: https://weaviate.io/pricing, Turbopuffer: https://turbopuffer.com/pricing, Qdrant Cloud: https://qdrant.tech/pricing/) show storage trending toward sub-$0.10 per GB-month on serverless tiers at the time of writing, with the dominant cost shifting to queries ($0.40–$4.00 per million reads) and writes ($2–$4 per million). [EMPIRICA ANALYSIS] Storage as a margin business is dying; retrieval-as-judgement (rerankers, hybrid search, freshness SLAs) is where pricing power concentrates.
  • Embedding costs have fallen by roughly an order of magnitude in 18 months. At the time of writing, OpenAI text-embedding-3-small lists at $0.02/1M tokens (https://openai.com/api/pricing/), and Voyage and Cohere rerankers at roughly $0.05–$0.10 per 1K searches (https://cohere.com/pricing). The marginal cost of indexing a corpus has dropped to the point where agents can routinely rebuild memory on each session rather than persist it, which re-prices the make-vs-buy decision for agent memory.
  • Live web-search APIs price largely by query rather than by token. Examples include Tavily ($0.008/search), Exa (https://exa.ai/pricing; tiered from ~$0.005/search to $0.025 for neural retrieval), Brave Search API ($3–$9/1K queries; https://brave.com/search/api/), and Perplexity Sonar (https://docs.perplexity.ai/guides/pricing), which combines a per-request fee with token charges. Per-query pricing fixes the cost of each knowledge event in advance, which makes freshness arbitrage tractable: an agent can decide whether a question warrants a fresh fetch or a cached lookup.
  • Knowledge staleness carries a real economic cost. The RAG survey lists outdated knowledge alongside hallucination as core LLM failure modes that retrieval addresses [P3], and broader LLM surveys note that retrieval is the principal mechanism for continuous knowledge update [P2][P5]. [SPECULATIVE] A plausible working assumption is that, in time-sensitive verticals (markets, regulation, technology releases), knowledge older than 30 days carries an error-rate premium of 2–5× compared with sources less than 7 days old, a premium large enough to justify subscription fees for freshness-guaranteed feeds.
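
The Modular RAG finding rests on each module sitting behind its own interface. A minimal Python sketch, assuming hypothetical `Retriever` and `Reranker` protocols, with toy keyword-overlap and shortest-first implementations standing in for paid retrieval and reranking APIs:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Passage:
    text: str
    score: float = 0.0


class Retriever(Protocol):
    """Any module that returns the top-k candidate passages for a query."""
    def retrieve(self, query: str, k: int) -> list[Passage]: ...


class Reranker(Protocol):
    """Any module that reorders candidate passages for a query."""
    def rerank(self, query: str, passages: list[Passage]) -> list[Passage]: ...


class KeywordRetriever:
    """Toy retriever scoring documents by query-term overlap.
    Hypothetical stand-in for a paid vector-search or web-search API."""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str, k: int) -> list[Passage]:
        terms = set(query.lower().split())
        scored = [Passage(doc, float(len(terms & set(doc.lower().split()))))
                  for doc in self.corpus]
        return sorted(scored, key=lambda p: p.score, reverse=True)[:k]


class ShortestFirstReranker:
    """Toy reranker: keeps score order, breaking ties in favour of shorter passages.
    Hypothetical stand-in for a hosted reranking API."""

    def rerank(self, query: str, passages: list[Passage]) -> list[Passage]:
        return sorted(passages, key=lambda p: (-p.score, len(p.text)))


@dataclass
class ModularRAG:
    """Pipeline whose modules can be swapped (and bought) independently."""
    retriever: Retriever
    reranker: Reranker
    k: int = 3

    def context(self, query: str) -> list[str]:
        candidates = self.retriever.retrieve(query, self.k)
        return [p.text for p in self.reranker.rerank(query, candidates)]


if __name__ == "__main__":
    corpus = ["vector storage pricing per GB",
              "search API pricing per query",
              "embedding pricing per token"]
    rag = ModularRAG(KeywordRetriever(corpus), ShortestFirstReranker(), k=2)
    print(rag.context("search pricing per query"))
```

Swapping in a different retriever or reranker only requires another class with the same method, which is the property that lets each module be priced and purchased separately.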
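
The four memory classes identified in [P4] can be kept in separate stores so that each supports its own access pattern. The sketch below is illustrative rather than taken from [P4]: the `TypedMemory` class and its `episodes_between` method are hypothetical, and the time-window query shows the kind of temporal lookup that similarity search alone does not provide. A production system would likely put a vector index behind the semantic store.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class MemoryType(Enum):
    SEMANTIC = "semantic"      # facts about the world
    EPISODIC = "episodic"      # time-stamped events the agent took part in
    PROCEDURAL = "procedural"  # how-to knowledge, e.g. tool-use recipes
    EMOTIONAL = "emotional"    # affective signals, e.g. user sentiment


@dataclass
class MemoryItem:
    kind: MemoryType
    content: str
    timestamp: datetime


class TypedMemory:
    """Hypothetical store that keeps one list per memory type."""

    def __init__(self) -> None:
        self._stores: dict[MemoryType, list[MemoryItem]] = defaultdict(list)

    def write(self, item: MemoryItem) -> None:
        # Route each item to the store for its memory type.
        self._stores[item.kind].append(item)

    def by_type(self, kind: MemoryType) -> list[MemoryItem]:
        return list(self._stores[kind])

    def episodes_between(self, start: datetime, end: datetime) -> list[MemoryItem]:
        """Temporal query: episodic items in [start, end), oldest first."""
        hits = [m for m in self._stores[MemoryType.EPISODIC] if start <= m.timestamp < end]
        return sorted(hits, key=lambda m: m.timestamp)


if __name__ == "__main__":
    mem = TypedMemory()
    t0 = datetime(2024, 1, 1)
    mem.write(MemoryItem(MemoryType.EPISODIC, "user asked about pricing", t0 + timedelta(hours=1)))
    mem.write(MemoryItem(MemoryType.EPISODIC, "agent ran a search", t0))
    mem.write(MemoryItem(MemoryType.SEMANTIC, "Tavily prices per search", t0))
    print([m.content for m in mem.episodes_between(t0, t0 + timedelta(hours=2))])
```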
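
The composition effects reported by Cuconasu et al. [P7] suggest treating passage count, relevance cutoff, and ordering as explicit, tunable parameters. A minimal sketch; the function name, the defaults, and the heuristic of placing the most relevant passage immediately before the question are assumptions for illustration, not the method of [P7], and the defaults would need tuning per model given how counter-intuitive the reported effects are:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float  # e.g. a reranker score in [0, 1]; hypothetical scale


def assemble_prompt(question: str, candidates: list[Candidate],
                    max_passages: int = 3, min_relevance: float = 0.5) -> str:
    """Build a RAG prompt: drop weak matches, cap the passage count, and order
    passages so the most relevant one sits immediately before the question."""
    kept = [c for c in candidates if c.relevance >= min_relevance]
    kept = sorted(kept, key=lambda c: c.relevance, reverse=True)[:max_passages]
    kept.reverse()  # ascending relevance: best passage ends up next to the question
    blocks = [f"[{i + 1}] {c.text}" for i, c in enumerate(kept)]
    return "\n".join(blocks + [f"Question: {question}"])


if __name__ == "__main__":
    print(assemble_prompt("What does Tavily charge?", [
        Candidate("Exa tiers neural search", 0.6),
        Candidate("Tavily charges per search", 0.9),
        Candidate("Unrelated weather report", 0.1),
    ]))
```

The relevance cutoff only removes weak matches; it does not detect related distractors, which would need a separate check on whether a passage actually contains the answer.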
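
The storage and embedding pricing findings can be checked with a back-of-envelope monthly cost model. The unit prices below are illustrative values taken from the upper ends of the ranges quoted in this section, not any vendor's current list price, and the workload figures are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class MemoryPricing:
    # Illustrative unit prices (USD) drawn from the ranges quoted above;
    # real vendor pricing varies by tier and changes often.
    storage_per_gb_month: float = 0.10
    per_million_reads: float = 4.00
    per_million_writes: float = 4.00
    embedding_per_million_tokens: float = 0.02


def monthly_cost(p: MemoryPricing, stored_gb: float, reads: int, writes: int,
                 tokens_embedded: int) -> dict[str, float]:
    """Break one month of agent-memory spend into four line items (USD)."""
    return {
        "storage": stored_gb * p.storage_per_gb_month,
        "reads": reads / 1e6 * p.per_million_reads,
        "writes": writes / 1e6 * p.per_million_writes,
        "embedding": tokens_embedded / 1e6 * p.embedding_per_million_tokens,
    }


if __name__ == "__main__":
    costs = monthly_cost(MemoryPricing(), stored_gb=10, reads=50_000_000,
                         writes=2_000_000, tokens_embedded=100_000_000)
    for item, usd in costs.items():
        print(f"{item:>9}: ${usd:,.2f}")
```

With 10 GB stored, 50M reads, 2M writes, and 100M tokens embedded, the model gives $1 for storage, $200 for reads, $8 for writes, and $2 for embedding: reads dominate, and re-embedding the entire 100M-token corpus costs only $2, consistent with the rebuild-rather-than-persist point.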
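
The per-query pricing and staleness findings combine into a simple expected-cost rule for freshness arbitrage: fetch fresh results when the expected error cost avoided exceeds the search fee. A sketch under stated assumptions; the error-rate multiplier encodes the speculative 2–5× premium above (defaulting to 3×) with a hypothetical linear ramp between 7 and 30 days, and the base error rate and cost per error are inputs the caller must estimate:

```python
def staleness_multiplier(age_days: float, premium: float = 3.0) -> float:
    """Error-rate multiplier relative to a source under 7 days old.
    ASSUMPTION: 1x below 7 days, `premium` above 30 days (the speculative
    2-5x range), and a linear ramp in between."""
    if age_days < 7:
        return 1.0
    if age_days > 30:
        return premium
    return 1.0 + (premium - 1.0) * (age_days - 7) / (30 - 7)


def should_fetch(age_days: float, base_error_rate: float, cost_of_error: float,
                 fee_per_search: float, premium: float = 3.0) -> bool:
    """Fetch fresh results when the expected error cost saved exceeds the fee."""
    cached_loss = base_error_rate * staleness_multiplier(age_days, premium) * cost_of_error
    fresh_loss = base_error_rate * cost_of_error  # a fresh fetch counts as <7 days old
    return cached_loss - fresh_loss > fee_per_search


if __name__ == "__main__":
    # Hypothetical inputs: 2% base error rate, $1 per error, $0.008 per search.
    for age in (3, 10, 45):
        print(age, should_fetch(age, base_error_rate=0.02, cost_of_error=1.0,
                                fee_per_search=0.008))
```

With the $0.008 Tavily fee, a hypothetical 2% base error rate, and $1 cost per error, a 3-day-old cache entry is served as-is, while a 45-day-old entry triggers a fetch (expected saving $0.04 per query against an $0.008 fee).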

3. Agent service patterns — what agents buy and why

Layer 1 — Embedding + storage (commodity). Agents buy embeddings (OpenAI, Voyage, Cohere, open-weights via Together) and vector storage (Pinecone, Weaviate, Qdrant, pgvector, Turbopuffer). The buy decision is straightforward: in-house embedding infrastructure rarely beats $0.02/1M tokens once amortised. Storage is bought when the corpus exceeds ~1M vectors or queries exceed ~100/sec; below that, SQLite + sqlite-vec or DuckDB is increasingly preferred. [EMPIRICA ANALYSIS] This layer has near-zero gross margin for new entrants.
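
For the below-threshold case, a self-hosted store can be very small. The sketch below uses only Python's standard-library `sqlite3` with a brute-force cosine scan; it deliberately does not use sqlite-vec or DuckDB, whose native vector functions would replace the Python-side scan. The table layout, the JSON encoding of vectors, and the `LocalVectorStore` class are assumptions for illustration, and in practice the embeddings would come from a hosted embedding API:

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LocalVectorStore:
    """Hypothetical brute-force vector store on stdlib sqlite3 (no extension).
    Every query scans all rows, so this only suits small corpora."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS vectors "
            "(id TEXT PRIMARY KEY, body TEXT, embedding TEXT)")

    def add(self, doc_id: str, body: str, embedding: list[float]) -> None:
        # Vectors are stored as JSON text for simplicity, not compactness.
        self.db.execute("INSERT OR REPLACE INTO vectors VALUES (?, ?, ?)",
                        (doc_id, body, json.dumps(embedding)))
        self.db.commit()

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        rows = self.db.execute("SELECT id, embedding FROM vectors").fetchall()
        scored = [(doc_id, cosine(query, json.loads(emb))) for doc_id, emb in rows]
        return sorted(scored, key=lambda r: r[1], reverse=True)[:k]


if __name__ == "__main__":
    store = LocalVectorStore()
    store.add("a", "pricing note", [1.0, 0.0])  # toy 2-d embeddings
    store.add("b", "memory note", [0.0, 1.0])
    print(store.search([0.9, 0.1], k=1))
```

Because every query scans all rows in Python, this only illustrates the small-corpus regime; native vector extensions move that scan out of Python.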