Empirica Course Lesson | Agent Economy Series
Overview: The API Consumption Stack
AI agents do not operate in isolation. Every autonomous task — from answering a question to executing a multi-step workflow — requires external services accessed via paid APIs. These services fall into four primary categories: inference, search, research, and compute. Understanding which categories agents consume most, and why, is foundational to designing cost-efficient agent architectures and identifying where infrastructure value accrues.
The consumption stack is not flat. Different agent types weight these categories differently, and the economics of each category vary substantially in pricing model, latency sensitivity, and substitutability.
Category 1: Inference APIs (LLM and Model Serving)
What they are: APIs that accept a prompt or input and return a model-generated output — text, embeddings, classifications, or structured data. Examples include hosted large language model (LLM) endpoints and multimodal model APIs.
Why agents consume them:
- Every reasoning step in an agent loop typically requires at least one inference call.
- Chain-of-thought, tool-use, and planning architectures multiply call volume: a single user request may trigger 5–20 inference calls internally.
- Embedding APIs (a sub-type) power retrieval-augmented generation (RAG) pipelines, converting documents into searchable vector representations.
Pricing mechanics:
- Billed per token (input + output), per request, or via reserved throughput tiers.
- Output tokens are typically priced higher than input tokens, so agents that generate verbose intermediate reasoning pay a compounding cost.
- Latency tiers exist: batch/async inference is usually discounted relative to real-time inference, often by around half, though the exact ratio varies by provider.
Key cost driver: Call frequency. Agentic loops with no call-count discipline can exhaust budgets rapidly. Caching repeated prompts and using smaller models for sub-tasks are the primary mitigation strategies.
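The effect of call frequency on per-token billing can be made concrete with a small calculation. The following is a minimal sketch, not a pricing reference: `TokenPrice`, `call_cost`, `request_cost`, and every price and token count are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical USD prices per million tokens (not any provider's real rates)."""
    input_per_million: float
    output_per_million: float


def call_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single inference call billed per token."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def request_cost(price: TokenPrice, calls: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Cost of one user request that triggers `calls` similar inference calls."""
    return calls * call_cost(price, input_tokens, output_tokens)


# Assumed prices: output tokens cost 4x input tokens.
ILLUSTRATIVE_PRICE = TokenPrice(input_per_million=1.0, output_per_million=4.0)

# One call versus a 20-call agent loop, each call using 2,000 input
# and 500 output tokens (assumed averages).
single_call = request_cost(ILLUSTRATIVE_PRICE, 1, 2_000, 500)
agent_loop = request_cost(ILLUSTRATIVE_PRICE, 20, 2_000, 500)

print(f"single call: ${single_call:.4f}")
print(f"20-call loop: ${agent_loop:.4f}")
```

Under these assumptions the per-request cost scales linearly with the number of internal calls, which is why call-count discipline and smaller models for sub-tasks matter more than marginal changes in prompt length.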
Category 2: Search APIs (Real-time Data and Indexing)
What they are: APIs that return current, indexed information from the web, news feeds, financial data streams, or domain-specific databases. These are distinct from static knowledge stored in model weights.
Why agents consume them:
- LLM training data has a knowledge cutoff. Any task requiring current information (prices, events, regulatory changes, recent publications) requires a live search call.
- Agents operating in dynamic environments (trading, monitoring, compliance) may issue search calls on every decision cycle.
- Web search APIs, news APIs, and financial data APIs are the dominant sub-types.
Pricing mechanics:
- Typically billed per query, with volume discounts at scale.
- Premium tiers offer higher result counts, structured metadata, or real-time indexing latency.
- Some providers charge separately for index freshness (near-real-time vs. daily-updated).
Key cost driver: Query volume and result depth. Agents that search broadly before narrowing — rather than issuing targeted queries — generate unnecessary spend. Structured query design is a direct cost-control lever.
Relationship to inference: Search and inference are tightly coupled in RAG architectures. A search call retrieves context; an inference call processes it. The two categories co-scale.
Category 3: Research APIs (Structured Knowledge and Discovery)
What they are: APIs providing access to structured, curated, or expert-validated knowledge — academic literature, patent databases, financial filings, scientific datasets, and structured notes or reports. This category is distinct from general web search in that the underlying data is organised, sourced, and often licensed.
Why agents consume them:
- General web search returns noisy, unverified content. Research APIs return citable, structured, high-signal information.
- Agents performing analysis, due diligence, literature review, or evidence synthesis require sources they can trust and attribute.
- Structured output formats (JSON, standardised schemas) reduce the parsing burden on the agent, lowering downstream inference costs.
Pricing mechanics:
- Subscription-based (flat monthly access) or per-query/per-record billing.
- Tiered by corpus size, update frequency, and output format richness.
- Some providers offer agent-specific pricing: machine-readable endpoints priced differently from human-facing interfaces.
Key cost driver: Coverage and freshness requirements. An agent needing broad coverage across multiple domains may require multiple research API subscriptions, each with its own cost structure.
Empirica's positioning: Empirica operates in this category — providing structured, agent-readable research outputs designed for machine consumption. The value proposition is reducing the inference overhead agents would otherwise spend parsing and validating unstructured web content. (See the published note: Empirica's Positioning in the Agent Economy.)
Category 4: Compute APIs (Processing and Orchestration)
What they are: APIs providing raw computational resources — cloud functions, containerised execution environments, data processing pipelines, vector database operations, and workflow orchestration services.
Why agents consume them:
- Agents that manipulate files, run code, process images, or execute multi-step pipelines need compute beyond what an LLM inference call provides.
- Vector databases (a compute-adjacent service) store and retrieve embeddings at scale, which is essential for long-term agent memory.
- Orchestration APIs manage agent state, task queues, and inter-agent communication in multi-agent systems.
Pricing mechanics:
- Billed by CPU/GPU time, memory usage, storage volume, or API call count depending on service type.
- Vector database costs scale with index size and query throughput.
- Serverless compute (pay-per-execution) suits bursty agent workloads; reserved instances suit sustained throughput.
Key cost driver: Data volume and task complexity. Agents processing large documents, running simulations, or maintaining large memory stores generate substantial compute costs independent of inference spend.
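The serverless versus reserved choice reduces to a break-even calculation on execution volume. The sketch below assumes a flat monthly fee for reserved capacity and a flat per-execution fee for serverless. Both prices and the function names are hypothetical, and real pricing usually also depends on duration and memory.

```python
def break_even_executions(reserved_monthly_cost: float,
                          cost_per_execution: float) -> float:
    """Monthly execution count at which serverless spend equals the reserved fee."""
    if cost_per_execution <= 0:
        raise ValueError("cost_per_execution must be positive")
    return reserved_monthly_cost / cost_per_execution


def cheaper_option(executions_per_month: int,
                   reserved_monthly_cost: float,
                   cost_per_execution: float) -> str:
    """Return which billing model is cheaper at the given monthly volume."""
    serverless_cost = executions_per_month * cost_per_execution
    return "serverless" if serverless_cost < reserved_monthly_cost else "reserved"


# Assumed prices: $200/month reserved, $0.0005 per serverless execution.
RESERVED_MONTHLY = 200.0
PER_EXECUTION = 0.0005

threshold = break_even_executions(RESERVED_MONTHLY, PER_EXECUTION)
bursty = cheaper_option(100_000, RESERVED_MONTHLY, PER_EXECUTION)
sustained = cheaper_option(1_000_000, RESERVED_MONTHLY, PER_EXECUTION)

print(f"break-even at about {threshold:,.0f} executions/month")
print(f"100k executions/month: {bursty}")
print(f"1M executions/month: {sustained}")
```

With these illustrative numbers, a bursty agent well below the threshold is cheaper on serverless, while sustained throughput above it favours reserved capacity.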
Consumption Patterns: Which Categories Dominate?
Consumption dominance varies by agent type, but inference is the largest or a leading share of API spend in most agent architectures. The ratings below are qualitative:
| Agent Type | Inference | Search | Research | Compute |
|---|---|---|---|---|
| Conversational assistant | Very High | Medium | Low | Low |
| Research/analysis agent | High | High | High | Medium |
| Trading/monitoring agent | Medium | Very High | Medium | Medium |
| Code/automation agent | High | Low | Low | Very High |
| Multi-agent orchestrator | High | Medium | Medium | High |
Key observations:
- Inference is the baseline cost for all agent types; no agent operates without it.
- Search becomes co-dominant when the agent's task requires current or external information.
- Research APIs are disproportionately valuable for analysis-heavy agents because they reduce the inference calls needed to validate and structure information.
- Compute costs are often underestimated in early agent design but scale sharply as agents handle larger data volumes or more complex workflows.
Cost-Benefit Analysis by Agent Type
Conversational agents: Inference dominates. Cost optimisation focuses on model selection (using smaller models for simple turns) and prompt caching. Search and research APIs add marginal cost but can significantly improve output quality for factual queries.
Research and analysis agents: All four categories are material. The key trade-off is between search API breadth (high query volume, noisy results) and research API depth (lower volume, higher per-query cost, higher signal). Research APIs typically deliver better cost-per-insight for structured analytical tasks.
Trading and monitoring agents: Search API costs can exceed inference costs when agents poll data feeds continuously. Architectural choices — push vs. pull data delivery, webhook-based triggers vs. polling — have large cost implications.
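The polling versus push trade-off can be estimated with simple arithmetic. This sketch assumes each poll and each pushed event is billed at the same flat per-call rate, which will not hold for every provider. The rate, the polling interval, and the daily event count are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def polling_calls_per_day(interval_seconds: float) -> int:
    """Number of calls issued per day when polling at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return int(SECONDS_PER_DAY // interval_seconds)


def daily_cost(calls: int, cost_per_call: float) -> float:
    """Daily spend for a given number of billed calls."""
    return calls * cost_per_call


# Assumed values: $0.002 per call, polling every 10 seconds,
# and a feed that actually changes about 50 times per day.
COST_PER_CALL = 0.002
EVENTS_PER_DAY = 50

polling_calls = polling_calls_per_day(10)
polling_cost = daily_cost(polling_calls, COST_PER_CALL)
push_cost = daily_cost(EVENTS_PER_DAY, COST_PER_CALL)

print(f"polling: {polling_calls} calls/day, ${polling_cost:.2f}/day")
print(f"push:    {EVENTS_PER_DAY} calls/day, ${push_cost:.2f}/day")
```

The gap comes from polls that return nothing new: polling pays per time interval, while push delivery pays per actual change.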
Code and automation agents: Compute costs are the primary variable. Inference costs are bounded by task complexity; compute costs scale with execution time and data volume. Serverless architectures reduce idle spend.
Multi-agent systems: All categories compound. Orchestration overhead, itself a compute cost, adds a further layer on top of each agent's own API spend. Delegation economics matter: assigning tasks to specialised subagents with targeted API access is more cost-efficient than routing all tasks through a general-purpose agent with broad API subscriptions.
Integration with Empirica's Positioning
Empirica's research API sits at the intersection of the search and research categories — providing structured, machine-readable outputs that reduce the total API spend an agent would otherwise incur:
- Replaces multiple noisy search calls with a single structured research query.
- Reduces inference overhead by delivering pre-processed, validated content rather than raw text requiring agent-side parsing.
- Supports agent-readable formats (structured notes, JSON-compatible outputs) that integrate directly into agent pipelines without additional transformation steps.
For agents performing knowledge-intensive tasks, this substitution effect — fewer search calls, less inference processing — can materially reduce total API cost while improving output reliability.
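Whether the substitution effect described above pays off depends on actual prices and call counts. The comparison below is a hedged sketch: the `Pipeline` type and all figures are hypothetical placeholders, not Empirica's or any provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Per-task call counts and flat per-call costs (all values hypothetical)."""
    retrieval_calls: int
    cost_per_retrieval: float
    inference_calls: int
    cost_per_inference: float

    def total(self) -> float:
        """Total per-task cost across retrieval and inference."""
        return (
            self.retrieval_calls * self.cost_per_retrieval
            + self.inference_calls * self.cost_per_inference
        )


# Assumed scenario A: several cheap search calls, then extra inference
# calls to parse and validate the raw results.
search_heavy = Pipeline(retrieval_calls=6, cost_per_retrieval=0.005,
                        inference_calls=8, cost_per_inference=0.01)

# Assumed scenario B: one pricier structured research query,
# with fewer inference calls needed afterwards.
research_based = Pipeline(retrieval_calls=1, cost_per_retrieval=0.04,
                          inference_calls=3, cost_per_inference=0.01)

saving = search_heavy.total() - research_based.total()
print(f"search-heavy:   ${search_heavy.total():.3f} per task")
print(f"research-based: ${research_based.total():.3f} per task")
print(f"saving:         ${saving:.3f} per task")
```

In this illustration the research query costs eight times as much as a single search call but still lowers the total. If the research query did not reduce the number of inference calls, the comparison would reverse.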
Practical Framework: Choosing API Categories
When designing an agent's API stack, apply this decision sequence:
- Define the task type. Is the agent primarily reasoning, retrieving, analysing, or executing? This determines which category is load-bearing.
- Identify freshness requirements. Tasks requiring current information mandate search APIs. Tasks requiring validated, structured knowledge favour research APIs. Tasks using only stable knowledge may rely on model weights alone.
- Estimate call volume per task. Multiply expected calls per category by per-call cost. Identify which category dominates before committing to an architecture.
- Evaluate substitution opportunities. Can a research API replace multiple search calls? Can a smaller inference model handle sub-tasks? Can compute be batched to reduce per-unit cost?
- Build in cost monitoring. API spend in agentic systems is non-linear: a single runaway loop can generate costs orders of magnitude above baseline. Per-category spend limits and circuit breakers are operational necessities, not optional features.
- Reassess at scale. Cost structures that are acceptable at low volume often become unsustainable at production scale. Volume pricing tiers, reserved capacity, and provider negotiation become relevant above certain thresholds.
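Two steps in the sequence above lend themselves to code: estimating spend per category and enforcing per-category limits. The following is a minimal sketch under stated assumptions. `estimate_spend`, `dominant_category`, `SpendGuard`, and `BudgetExceeded` are hypothetical names, and all call counts, costs, and limits are illustrative.

```python
def estimate_spend(calls_per_task: dict[str, int],
                   cost_per_call: dict[str, float]) -> dict[str, float]:
    """Expected per-task spend for each category: calls x per-call cost."""
    return {cat: calls_per_task[cat] * cost_per_call[cat] for cat in calls_per_task}


def dominant_category(spend: dict[str, float]) -> str:
    """Category with the largest expected spend."""
    return max(spend, key=spend.get)


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push a category past its limit."""


class SpendGuard:
    """Per-category circuit breaker: refuse charges beyond a fixed limit."""

    def __init__(self, limits: dict[str, float]) -> None:
        self.limits = dict(limits)
        self.spent = {category: 0.0 for category in limits}

    def charge(self, category: str, amount: float) -> None:
        """Record a cost; call this before issuing the API request."""
        if category not in self.limits:
            raise KeyError(f"no limit configured for {category!r}")
        if self.spent[category] + amount > self.limits[category]:
            raise BudgetExceeded(f"{category} limit of {self.limits[category]} reached")
        self.spent[category] += amount


# Assumed per-task call counts and per-call costs.
spend = estimate_spend(
    {"inference": 12, "search": 4, "research": 1, "compute": 3},
    {"inference": 0.01, "search": 0.005, "research": 0.04, "compute": 0.002},
)
print("dominant category:", dominant_category(spend))

# A runaway loop is stopped once the inference limit is reached.
guard = SpendGuard({"inference": 1.0, "search": 0.5})
completed_calls = 0
try:
    while True:  # stands in for an agent loop with no exit condition
        guard.charge("inference", 0.25)
        completed_calls += 1
except BudgetExceeded:
    print(f"circuit breaker tripped after {completed_calls} calls")
```

Checking the limit before the call, rather than after, means the guard never lets spend exceed the configured ceiling. A production version would also need persistence and time-windowed limits, which this sketch omits.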
Future Trends and Market Evolution
Inference cost compression: Model serving costs have declined substantially as hardware efficiency improves and competition among providers increases. This trend is likely to continue, shifting the relative weight of inference in total API spend downward over time. Search and research APIs may become proportionally more significant as inference becomes cheaper.
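The shift in relative weight can be illustrated numerically. This sketch assumes that only inference prices fall while search, research, and compute costs stay fixed. The per-task figures are hypothetical.

```python
def inference_share(inference_cost: float, other_cost: float) -> float:
    """Fraction of total API spend attributable to inference."""
    total = inference_cost + other_cost
    if total <= 0:
        raise ValueError("total cost must be positive")
    return inference_cost / total


# Assumed per-task costs: $0.12 inference, $0.06 for all other categories.
INFERENCE_COST = 0.12
OTHER_COST = 0.06

share_before = inference_share(INFERENCE_COST, OTHER_COST)
# Same workload after inference prices halve; other costs unchanged.
share_after = inference_share(INFERENCE_COST * 0.5, OTHER_COST)

print(f"inference share before: {share_before:.0%}")
print(f"inference share after:  {share_after:.0%}")
```

Under these assumptions inference drops from about two thirds of spend to half, without any change in how much the agent searches or computes.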
Specialised agent APIs: Providers are beginning to offer APIs designed explicitly for agent consumption — with features like stateful sessions, tool-use schemas, and structured output guarantees. These reduce the integration overhead agents currently absorb.
Vertical research APIs: Domain-specific research APIs (legal, biomedical, financial, scientific) are expanding. Agents operating in regulated or knowledge-intensive domains will increasingly rely on these rather than general web search, for both quality and compliance reasons.
Compute-inference convergence: The boundary between inference and compute is blurring as model providers offer code execution, data processing, and tool-use within inference endpoints. This may simplify agent architectures but will require new cost accounting frameworks.
Agent-to-agent API markets: As multi-agent systems mature, agents will increasingly consume APIs provided by other agents — capability markets where specialised agents sell services to orchestrators. This creates a new pricing layer on top of the existing four categories.
Summary
| Category | Primary billing unit | Dominant agent type | Key cost lever |
|---|---|---|---|
| Inference | Tokens | All agents | Call frequency, model size |
| Search | Queries | Data-dependent agents | Query targeting, polling frequency |
| Research | Queries / subscription | Analysis agents | Coverage vs. depth trade-off |
| Compute | CPU/memory/time | Automation agents | Batching, serverless vs. reserved |
Inference is the universal baseline. Search and research APIs are the primary differentiators for knowledge-intensive agents. Compute costs are the most frequently underestimated category. Effective agent economics require explicit modelling of all four categories — and active architectural choices to manage each.