1. Overview
Autonomous agents are emerging as a distinct buyer class for structured knowledge products, with consumption patterns materially different from those of human analysts. Where humans browse, scan, and synthesise across heterogeneous formats, agents prefer machine-addressable JSON, semantic tags, deterministic schemas, and predictable update cadences, and they pay for retrieval efficiency per token rather than editorial prestige. This note characterises that emerging market: what agents purchase, how they consume it, and what pricing signals reveal about their reasoning architectures.
2. Key Findings
- Structured-first consumption dominates agent retrieval. The FAIR Principles [P6] — Findable, Accessible, Interoperable, Reusable — were authored with machine consumers explicitly in mind, and the principles map almost one-to-one onto the design requirements of agent-readable research APIs (stable identifiers, formal vocabularies, standardised access protocols). Agent buyers consistently prefer endpoints that satisfy FAIR over PDF or HTML equivalents because each non-structured byte consumes context budget without contributing to downstream reasoning [P6].
- Federated data-sharing precedents (GA4GH) demonstrate willingness-to-pay for governance and schema, not raw bits. The GA4GH framework [P8] shows that institutional buyers (for which agents act as proxies) value standardised schemas, provenance tracking, and access frameworks more than the underlying data volume. By extension, agent buyers of research are likely to value governance metadata (timestamp of last update, confidence tags, source URLs) at least as much as the prose itself, and to pay a premium for it.
- Context window economics define agent willingness-to-pay. At GPT-4-class pricing of approximately $2.50–$10 per million input tokens and $10–$30 per million output tokens (OpenAI pricing: https://openai.com/api/pricing/; Anthropic pricing: https://www.anthropic.com/pricing), an agent processing a 50-page PDF (roughly 20,000–50,000 input tokens) spends $0.05–$0.50 in token cost per read. A pre-structured 2 KB JSON note (roughly 500 tokens, assuming about four characters per token) delivers the same decision-relevant information for roughly $0.001–$0.005. At any given price point the saving is therefore the token ratio, on the order of 40–100×, and this gap is the dominant economic driver of structured-research demand.
- Multi-agent resource allocation theory predicts emergent knowledge auctions. Chevaleyre's survey [P1] formalises how autonomous agents with preferences over allocations bid for shared resources. Applied to research, this implies that when subscription seats are scarce, agent fleets will develop internal bidding mechanisms to decide which agent reads what. Empirica's API-keyed model bypasses that scarcity entirely by metering per query rather than per seat.
- Platform theory suggests research APIs become two-sided over time. Gawer's integrative framework [P2] argues platforms federate constitutive agents around modular cores. A research API that begins as a one-sided publisher-to-agent feed naturally evolves into a marketplace where third-party agents contribute structured findings — analogous to how arXiv evolved into a substrate for downstream services. (speculative)
- Big-data architecture lessons apply directly. The big-data tutorial in [P7] highlights that real-time and semi-structured handling is the dominant cost driver in modern analytics stacks. Agent research consumers face exactly this profile (semi-structured, freshness-sensitive, high query rate), which supports API-first delivery over batch publication.
- Materials-science data infrastructure provides a useful analogue. [P9] documents how data-driven materials science only matured once standardised, queryable databases (e.g., Materials Project, NOMAD) replaced PDF-locked supplementary tables. The same transition is now happening for general research consumed by agents — PDFs become a legacy format; queryable notes become primary.
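The context-window arithmetic in the findings above can be sketched in a few lines. This is a rough illustration only: the four-characters-per-token heuristic, the 600-tokens-per-page figure for the PDF, and every field name in the sample note are assumptions introduced here, not values taken from any provider or from Empirica's schema. Only the input price range comes from the pricing finding.

```python
import json

# Input-token price range (USD per million tokens) quoted in the findings.
PRICES_PER_MILLION = (2.50, 10.00)

# Assumption: about 4 characters per token. Real tokenizers differ by model
# and content, so this is an order-of-magnitude heuristic only.
CHARS_PER_TOKEN = 4

# Assumption: a 50-page PDF extracts to about 30,000 tokens (~600 per page).
PDF_TOKENS = 50 * 600


def estimate_tokens(text: str) -> int:
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """USD cost of sending `tokens` input tokens at the given price."""
    return tokens * price_per_million / 1_000_000


def compare(pdf_tokens: int, note_tokens: int) -> list[tuple[float, float, float]]:
    """Return (price, pdf_cost, note_cost) for each price in the range."""
    return [
        (p, input_cost_usd(pdf_tokens, p), input_cost_usd(note_tokens, p))
        for p in PRICES_PER_MILLION
    ]


# Hypothetical structured note carrying the governance metadata named in the
# findings (last-update timestamp, confidence tag, source URLs).
note = {
    "id": "note-0001",
    "tags": ["example-tag"],
    "last_updated": "2025-01-01T00:00:00Z",
    "confidence": "medium",
    "source_urls": ["https://example.org/source"],
    "finding": "One-sentence, decision-relevant summary.",
}

# Token estimates for the small sample note and for a stand-in 2 KB payload.
sample_note_tokens = estimate_tokens(json.dumps(note))
two_kb_note_tokens = estimate_tokens("x" * 2048)

for price, pdf_cost, note_cost in compare(PDF_TOKENS, two_kb_note_tokens):
    print(f"${price:.2f}/M tokens: PDF ${pdf_cost:.4f}, 2 KB note ${note_cost:.5f}")
print(f"token ratio (PDF / 2 KB note): {PDF_TOKENS / two_kb_note_tokens:.0f}x")
```

Because both formats are billed at the same per-token rate, the cost ratio at a fixed price reduces to the token ratio, so the result is only as good as the assumed token counts. Swapping in a real tokenizer count for a specific model would tighten the estimate.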
3. Agent Service Patterns: What Agents Actually Buy
Structured note retrieval (highest volume). Agents pull short, semantically tagged research units (the JSON equivalent of 200–2,000 words) to inject into a reasoning context. The decisive features: