Empirica Agent Economy Series
Format: Markdown Report | Target: AI Agents + Human Readers


Executive Summary

AI agents are systematic API consumers. Unlike human users who browse and query intermittently, agents execute structured, high-frequency, programmatic calls across four primary service categories: inference, search, research, and compute. Understanding which categories dominate consumption — and why — is foundational to designing agent architectures, forecasting operational costs, and participating in the emerging agent economy.

This lesson extends prior Empirica coverage of capability markets and on-chain payment rails by focusing specifically on consumption patterns: what agents buy, how often, at what cost structure, and what that implies for infrastructure design.

Key findings at a glance:

  • Inference is the highest-frequency API category by call volume
  • Search and research APIs are the fastest-growing by spend per agent workflow
  • Compute APIs dominate by cost when agents run persistent or parallelised tasks
  • Consumption patterns differ sharply between single-agent and multi-agent architectures


Core Concepts: The Four API Service Categories

1. Inference APIs

Definition: Services that run a trained model and return a prediction, completion, classification, or embedding in response to a prompt or input vector.

What agents use them for:

  • Language model completions (reasoning, summarisation, code generation)
  • Embedding generation for semantic search and memory retrieval
  • Classification and routing decisions within agent pipelines
  • Multimodal processing (image, audio, document parsing)

Consumption profile:

  • Highest call frequency of any category, often thousands of calls per complex task
  • Billed by token (input + output), by request, or by compute-time
  • Latency-sensitive: agents in real-time loops require sub-second response
  • Cost scales with model size; agents frequently route between large and small models to optimise spend

Key structural feature: Inference APIs are the default dependency — nearly every agent workflow touches inference at least once per reasoning step. This makes inference the baseline cost floor for any agent deployment.
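As a rough illustration of this cost floor, the sketch below multiplies a per-call token cost by the number of reasoning steps. The prices, token counts, and the names `TokenPrice`, `call_cost`, and `task_inference_floor` are illustrative assumptions, not any provider's actual rates or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical USD prices per one million tokens (assumed, not real rates)."""
    input_per_million: float
    output_per_million: float


def call_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference call billed on input plus output tokens."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def task_inference_floor(price: TokenPrice, steps: int,
                         input_tokens: int, output_tokens: int) -> float:
    """Lower bound on task cost, assuming one inference call per reasoning step."""
    return steps * call_cost(price, input_tokens, output_tokens)


# Assumed prices for a small and a large model.
small_model = TokenPrice(input_per_million=0.50, output_per_million=1.50)
large_model = TokenPrice(input_per_million=5.00, output_per_million=15.00)

for name, price in [("small", small_model), ("large", large_model)]:
    floor = task_inference_floor(price, steps=2000, input_tokens=1500, output_tokens=300)
    print(f"{name}: ${floor:.2f} for 2000 reasoning steps")
```

Under these assumed prices the same 2,000-step task costs ten times more on the large model, which is why routing between model sizes matters for spend.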


2. Search APIs

Definition: Services that query an index — web, news, academic, legal, financial, or domain-specific — and return ranked results, snippets, or structured records.

What agents use them for:

  • Real-time information retrieval (news, prices, events)
  • Fact-checking and grounding LLM outputs against live data
  • Entity resolution (confirming that a named entity matches a known record)
  • Competitive intelligence and monitoring workflows

Consumption profile:

  • Medium call frequency, typically triggered when the agent determines its internal knowledge is stale or insufficient
  • Billed per query or per 1,000 queries; some providers offer subscription tiers
  • Agents often batch or cache results to reduce redundant calls
  • Web search APIs (general) vs. vertical search APIs (financial data, legal databases, scientific literature) have very different pricing and latency profiles

Key structural feature: Search APIs are the primary grounding mechanism. Without them, agents operate on training-data knowledge with a fixed cutoff date. The value of search scales with how time-sensitive the agent's task domain is.


3. Research APIs

Definition: Services that provide structured, curated, or synthesised information — including academic paper databases, patent records, financial filings, market data feeds, and purpose-built knowledge APIs.

What agents use them for:

  • Retrieving peer-reviewed findings for evidence-based reasoning
  • Accessing structured financial or regulatory data
  • Patent landscape analysis
  • Systematic literature review automation
  • Sourcing citable, high-provenance data for downstream outputs

Consumption profile:

  • Lower call frequency than inference or search, but higher value-per-call
  • Often billed by record retrieved, by dataset access, or via institutional subscription
  • Agents consuming research APIs typically operate in professional, high-stakes domains (legal, medical, scientific, financial)
  • Latency tolerance is higher: research retrieval is rarely real-time critical

Key structural feature: Research APIs provide epistemic authority. They are the category most associated with trust and verifiability in agent outputs. As agent-generated content enters regulated or high-stakes environments, research API consumption is expected to grow disproportionately.


4. Compute APIs

Definition: Services that provide raw or managed computational resources — cloud functions, GPU instances, containerised execution environments, data processing pipelines, and specialised hardware (TPUs, FPGAs).

What agents use them for:

  • Running code generated by the agent (code interpreter patterns)
  • Parallelising subtasks across multiple workers
  • Training or fine-tuning small models on task-specific data
  • Processing large datasets (video, genomics, financial time series)
  • Persistent agent state and memory storage

Consumption profile:

  • Lowest call frequency but highest per-call cost
  • Billed by CPU/GPU-hour, memory-GB-hour, or egress volume
  • Dominant cost driver in agentic workflows that involve code execution, simulation, or data transformation
  • Multi-agent systems that spawn subagents multiply compute consumption non-linearly

Key structural feature: Compute APIs are the amplification layer. They convert agent reasoning into real-world action at scale. An agent that can write code but cannot execute it is fundamentally limited; compute APIs remove that ceiling.


Consumption Patterns & Market Dynamics

Frequency vs. Cost: The Inversion Problem

A counterintuitive pattern defines agent API economics:

| Category | Call Frequency | Cost per Call | Total Spend Share |
|---|---|---|---|
| Inference | Very High | Low–Medium | High |
| Search | Medium | Low | Medium |
| Research | Low | Medium–High | Medium |
| Compute | Low | Very High | High |

Inference dominates by volume. Compute dominates by unit cost. Together they account for the majority of total API spend in most agent deployments. Search and research APIs are individually cheaper but grow in importance as agent task complexity increases.
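A small worked example makes the inversion concrete. The call counts and unit costs below are hypothetical numbers chosen only to illustrate the pattern; they are not measurements from any deployment.

```python
# Hypothetical per-task figures: category -> (calls per task, assumed USD cost per call).
usage = {
    "inference": (5000, 0.002),
    "search": (200, 0.005),
    "research": (20, 0.10),
    "compute": (10, 1.20),
}

# Spend per category is simply call volume times unit cost.
spend = {cat: calls * unit for cat, (calls, unit) in usage.items()}
total = sum(spend.values())
share = {cat: amount / total for cat, amount in spend.items()}

for cat, (calls, unit) in usage.items():
    print(f"{cat:<10} calls={calls:>5}  unit=${unit:<5}  "
          f"spend=${spend[cat]:>5.2f}  share={share[cat]:.0%}")
```

With these assumed inputs, inference has the most calls and compute has the highest unit cost, yet both end up with the largest spend shares, while search and research stay small in absolute terms.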


Single-Agent vs. Multi-Agent Consumption

Single-agent architectures exhibit a relatively linear consumption profile:

  • One inference call per reasoning step
  • Search triggered by knowledge gaps
  • Research called for high-stakes verification
  • Compute invoked for execution tasks

Multi-agent architectures exhibit non-linear scaling:

  • Orchestrator agents make their own inference calls and spawn subagents, each of which makes further inference calls
  • Parallel subagents multiply search and compute consumption simultaneously
  • Delegation chains can create cascading API calls that are difficult to predict or budget in advance
  • Prior Empirica coverage of capability markets describes how subagent specialisation creates structured demand for specific API categories: a legal subagent consumes research APIs heavily, while a data-processing subagent consumes compute APIs heavily
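One way to see the non-linearity is to model delegation as a tree. The sketch below assumes, purely for illustration, that every agent spawns the same number of subagents and makes the same number of inference calls; real systems are less regular, and the function names are hypothetical.

```python
def total_agents(branching: int, depth: int) -> int:
    """Agents in a delegation tree: one orchestrator at level 0, and each
    agent spawns `branching` subagents until `depth` levels of delegation."""
    return sum(branching ** level for level in range(depth + 1))


def total_calls(branching: int, depth: int, calls_per_agent: int) -> int:
    """Inference calls if every agent in the tree makes `calls_per_agent` calls."""
    return total_agents(branching, depth) * calls_per_agent


# Depth 0 is the single-agent case; each extra delegation level multiplies the newest layer.
for depth in range(4):
    print(depth, total_agents(3, depth), total_calls(3, depth, calls_per_agent=50))
```

With a branching factor of 3 and 50 calls per agent, going from no delegation to three levels of delegation takes the workload from 50 calls to 2,000, which is the cascading effect that makes budgeting in advance difficult.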


Caching, Batching, and Cost Optimisation

Agents that operate efficiently implement several consumption patterns to reduce API spend:

  • Semantic caching: Store inference outputs keyed by embedding similarity; reuse results for near-duplicate queries
  • Result batching: Aggregate multiple search queries into a single API call where the provider supports it
  • Model routing: Direct simple tasks to smaller, cheaper inference models; escalate to large models only when complexity warrants
  • Lazy research retrieval: Defer research API calls until the agent determines the task requires high-provenance sourcing
  • Compute sandboxing: Reuse warm execution environments rather than cold-starting containers for each task

These optimisations are not cosmetic — in high-volume agent deployments, they can reduce total API spend by 40–70% without degrading output quality.
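To make semantic caching concrete, here is a minimal sketch. It uses a toy bag-of-words vector in place of a real embedding model so that it runs without external services; `toy_embed`, `SemanticCache`, `fake_model`, and the 0.8 threshold are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts. A real system would call an embedding API."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuses a stored answer when a new prompt is similar enough to a cached one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        query = toy_embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            self.hits += 1
            return best[1]
        self.misses += 1
        answer = call_model(prompt)  # the billable inference call
        self.entries.append((query, answer))
        return answer


def fake_model(prompt: str) -> str:
    """Placeholder for a paid inference call."""
    return f"answer to: {prompt}"


cache = SemanticCache(threshold=0.8)
first = cache.get_or_call("what is the capital of France", fake_model)
second = cache.get_or_call("what is the capital of france ?", fake_model)
third = cache.get_or_call("summarise this quarterly report", fake_model)
print(cache.hits, cache.misses)
```

The threshold is a cost-versus-quality dial: a lower value saves more calls but raises the risk of returning an answer to a question that only looks similar.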


Pricing Model Alignment

Different API categories have evolved different pricing models, and agents interact with each differently:

  • Token-based pricing (inference): Predictable per-unit cost; agents can estimate spend before execution by counting prompt tokens and capping output tokens with a maximum-output limit, since output length is not known in advance
  • Per-query pricing (search): Discrete and auditable; easy to budget per workflow
  • Subscription/record pricing (research): Often requires upfront commitment; agents may over-consume within a subscription tier to maximise value
  • Resource-time pricing (compute): Highly variable; dependent on task duration and parallelism; hardest to predict

The mismatch between pricing models and agent execution patterns is an active infrastructure problem. On-chain micropayment rails — covered in prior Empirica notes on trustless agent transactions — are one proposed solution for enabling pay-per-call settlement across all four categories without subscription lock-in.
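The heterogeneity becomes clearer when the four pricing models are written side by side. The sketch below is one possible way to normalise them into a per-task estimate; every price, quantity, and function name is a hypothetical assumption, and the subscription is amortised evenly across tasks as a simplification.

```python
def inference_upper_bound(input_tokens: int, max_output_tokens: int,
                          usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Token pricing: input is known up front, output is bounded by a cap."""
    return (input_tokens * usd_per_m_input
            + max_output_tokens * usd_per_m_output) / 1_000_000


def search_cost(queries: int, usd_per_1000_queries: float) -> float:
    """Per-query pricing."""
    return queries * usd_per_1000_queries / 1000


def research_cost(monthly_fee_usd: float, tasks_per_month: int) -> float:
    """Subscription pricing, amortised evenly over the month's tasks."""
    return monthly_fee_usd / tasks_per_month


def compute_cost(seconds: float, workers: int, usd_per_worker_hour: float) -> float:
    """Resource-time pricing: duration times parallelism."""
    return seconds * workers / 3600 * usd_per_worker_hour


# Assumed workload for a single task.
estimate = {
    "inference": inference_upper_bound(200_000, 50_000, 1.0, 3.0),
    "search": search_cost(40, 5.0),
    "research": research_cost(300.0, 1000),
    "compute": compute_cost(600, 4, 0.90),
}
total_estimate = sum(estimate.values())

for category, usd in estimate.items():
    print(f"{category:<10} ${usd:.2f}")
print(f"{'total':<10} ${total_estimate:.2f}")
```

Each function needs different inputs (tokens, query counts, task volume, duration and parallelism), which is why a single budget figure is hard to produce without instrumentation.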


Age-Grouped Learning Paths

The same concepts, taught at the right level of abstraction for each audience.


🟢 Ages 10–14: The Robot Shopper Analogy

Imagine you built a robot assistant that helps you research a school project. Every time it needs to do something, it has to pay for a service:

  • Inference = asking a very smart AI a question. The robot does this constantly — it's like buying a thought.
  • Search = looking something up on the internet. The robot does this when it doesn't already know the answer.
  • Research = going to the library and finding a proper book or journal. More reliable, but slower and costs more.
  • Compute = renting a powerful computer to run a program. The robot only does this for big jobs.

The robot has a budget. It tries to use cheap services (inference, search) for most tasks, and saves the expensive ones (compute, research) for when they really matter. Smart robots learn to shop efficiently — just like you might compare prices before buying something.


🔵 Ages 15–18: APIs as the Economy of AI Systems

APIs (Application Programming Interfaces) are how software systems buy capabilities from each other. AI agents — programs that take goals and act autonomously to achieve them — are heavy API consumers.

The four main categories they consume:

  1. Inference APIs: Run an AI model. Used constantly. Billed by the token (a word or piece of a word).
  2. Search APIs: Query live information. Used when the agent's knowledge is outdated. Billed per search.
  3. Research APIs: Access curated, authoritative databases. Used for high-stakes tasks. More expensive per call.
  4. Compute APIs: Rent processing power. Used for running code or heavy data work. Billed by time used.

Here's the interesting economics: inference is used most often but compute costs the most per use. Designing an agent means making trade-offs between these categories constantly — similar to how a business decides where to spend its budget.

As agents become more capable, they increasingly work in teams (multi-agent systems), which multiplies API consumption and makes cost management much harder.


🟡 Undergraduate / Early Career

AI agents operate as programmatic API consumers within a layered service economy. The four primary API categories — inference, search, research, and compute — differ along three dimensions that matter for system design: call frequency, cost per call, and latency tolerance.

Inference APIs sit at the base of nearly every agent workflow. Every reasoning step, every output generation, every routing decision typically involves at least one inference call. Modern agents use model routing to balance cost and capability — sending simple tasks to smaller models and complex tasks to frontier models.

Search APIs serve as the agent's connection to real-time information. Without search, an agent is limited to its training data's knowledge cutoff. The decision of when to invoke search — rather than relying on parametric memory — is itself a learned or rule-based behaviour that significantly affects both output quality and cost.

Research APIs provide structured, high-provenance data. In professional agent deployments (legal research, medical literature review, financial analysis), research APIs are often the primary differentiator between an agent that produces defensible outputs and one that hallucinates plausible-sounding but unverifiable claims.

Compute APIs are the execution layer. An agent that can reason about code but cannot run it is architecturally incomplete. Compute APIs close that gap — but at significant cost, particularly in multi-agent systems where subagents execute tasks in parallel.

The practical challenge for developers is that these four categories have different pricing models that don't naturally align with each other, making total cost of ownership difficult to predict without instrumentation and profiling.


🔴 Professional / Expert

For practitioners designing or evaluating agent systems, API consumption patterns are a first-order architectural concern — not an operational afterthought.

Inference remains the highest-frequency dependency. Token economics dominate: prompt engineering, context window management, and model selection are the primary levers for inference cost control. Semantic caching can dramatically reduce redundant calls in agents that process similar queries repeatedly. Embedding-based retrieval (RAG architectures) partially substitutes for inference calls by externalising knowledge, but introduces its own API dependency on embedding and vector search services.

Search consumption is driven by the agent's uncertainty model. Well-calibrated agents invoke search selectively; poorly calibrated agents either over-search (wasting budget) or under-search (producing stale outputs). Vertical search APIs — financial data, legal databases, scientific literature — carry substantially higher per-query costs than general web search and require domain-specific integration logic.

Research APIs are the category most sensitive to trust and provenance requirements. As agent-generated outputs enter regulated environments, the ability to trace a claim to a citable, authoritative source becomes a compliance requirement, not merely a quality preference. Research API consumption is therefore expected to grow as agent deployment expands into professional services.

Compute APIs present the most complex cost profile. Cold-start latency, parallelism overhead, and egress costs create a non-linear relationship between task complexity and spend. In multi-agent architectures, orchestrator agents that spawn subagents can trigger cascading compute consumption that exceeds budget projections by an order of magnitude without proper resource governance.

Cross-cutting concern: The mismatch between subscription-based research API pricing, per-query search pricing, token-based inference pricing, and time-based compute pricing creates a heterogeneous cost surface that is difficult to optimise holistically. Unified metering layers and on-chain micropayment infrastructure are emerging responses to this fragmentation — enabling agents to pay per-call across all categories without maintaining multiple subscription relationships.


Practical Applications for AI Agents

Workflow Design Implications

Understanding consumption patterns directly informs how agent workflows should be structured:

Inference-heavy workflows (e.g., document analysis, code generation, multi-step reasoning):

  • Prioritise model routing and context compression
  • Implement semantic caching for repeated query patterns
  • Monitor token spend per workflow step, not just total
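Model routing can start as a simple rule-based function. The sketch below is a deliberately naive heuristic, assuming two hypothetical model tiers named `small-model` and `large-model` and a hand-picked keyword list; production routers often use a classifier or confidence signal instead.

```python
from dataclasses import dataclass

# Hypothetical keywords that suggest a task needs the larger model.
COMPLEX_HINTS = frozenset({"prove", "refactor", "plan", "analyse", "multi-step"})


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def route(prompt: str, max_small_words: int = 200) -> Route:
    """Send long or complex-looking prompts to the large model, the rest to the small one."""
    words = [w.strip(".,;:!?").lower() for w in prompt.split()]
    if len(words) > max_small_words:
        return Route("large-model", "long prompt")
    if COMPLEX_HINTS.intersection(words):
        return Route("large-model", "complexity keyword")
    return Route("small-model", "default")


print(route("Classify this ticket as billing or technical."))
print(route("Plan a multi-step migration and refactor the module."))
```

Recording the `reason` alongside each routed call makes it possible to audit later whether escalations to the large model were actually needed.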

Search-heavy workflows (e.g., news monitoring, competitive intelligence, fact-checking):

  • Implement result caching with TTL (time-to-live) appropriate to data freshness requirements
  • Use structured query construction to maximise result relevance per call
  • Distinguish between real-time-critical search and batch-acceptable search
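A TTL cache for search results can be sketched in a few lines. The class below is an illustrative assumption rather than any provider's client: `TTLSearchCache` and `fake_search` are hypothetical names, and the clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLSearchCache:
    """Caches search results per normalised query for `ttl_seconds`."""

    def __init__(self, search_fn: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, List[str]]] = {}
        self.api_calls = 0  # billable calls actually sent to the provider

    def search(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        self.api_calls += 1
        results = self._search_fn(query)
        self._store[key] = (now, results)
        return results


def fake_search(query: str) -> List[str]:
    """Placeholder for a paid search API call."""
    return [f"result for {query}"]


fake_now = [0.0]  # mutable fake clock for the demonstration
news_cache = TTLSearchCache(fake_search, ttl_seconds=60, clock=lambda: fake_now[0])

news_cache.search("ACME share price")    # miss: billable call
news_cache.search("acme  share price")   # hit: same normalised query
fake_now[0] = 61.0
news_cache.search("ACME share price")    # entry expired: billable call again
print(news_cache.api_calls)  # 2
```

The TTL should follow the freshness requirement of the data: seconds for prices, hours or days for slower-moving reference material.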

Research-heavy workflows (e.g., literature review, regulatory compliance, due diligence):

  • Pre-fetch and cache research results where task parameters are known in advance
  • Build provenance tracking into the agent's output layer
  • Evaluate subscription vs. per-call pricing based on expected monthly call volume
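The subscription versus per-call decision reduces to a break-even calculation. The figures below (a 500 USD monthly fee and 0.25 USD per call) are hypothetical, and the sketch assumes a flat subscription with no call quota or overage charges.

```python
import math


def break_even_calls(monthly_fee: float, per_call_price: float) -> int:
    """Smallest monthly call count at which the subscription costs no more than paying per call."""
    return math.ceil(monthly_fee / per_call_price)


def cheaper_option(expected_calls: int, monthly_fee: float, per_call_price: float) -> str:
    """Pick the pricing model with the lower expected monthly cost."""
    return "subscription" if expected_calls * per_call_price >= monthly_fee else "per-call"


fee, price = 500.0, 0.25  # assumed prices
print(break_even_calls(fee, price))      # 2000
print(cheaper_option(1200, fee, price))  # per-call
print(cheaper_option(3000, fee, price))  # subscription
```

Because agent call volume can vary widely from month to month, it may be worth running this against a range of expected volumes rather than a single estimate.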

Compute-heavy workflows (e.g., data transformation, simulation, code execution):

  • Use warm container pools to eliminate cold-start overhead
  • Implement resource limits and timeouts to prevent runaway spend
  • Profile task duration distributions before committing to pricing tiers
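A wall-clock timeout is the simplest guard against runaway compute spend. The sketch below runs agent-generated code in a child Python process using the standard library's `subprocess.run` with its `timeout` parameter; `run_generated_code` and `ExecutionResult` are hypothetical names. It limits duration only and is not a security sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    ok: bool
    output: str
    timed_out: bool


def run_generated_code(code: str, timeout_seconds: float = 10.0) -> ExecutionResult:
    """Run code in a separate interpreter and stop it if it exceeds the time limit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ExecutionResult(ok=False, output="", timed_out=True)
    return ExecutionResult(ok=proc.returncode == 0,
                           output=proc.stdout.strip(),
                           timed_out=False)


fast = run_generated_code("print(2 + 2)")
slow = run_generated_code("import time; time.sleep(30)", timeout_seconds=0.5)
print(fast)
print(slow.timed_out)
```

In a metered environment the timeout translates directly into a maximum billable duration per execution; memory and CPU limits would need separate mechanisms, such as container resource settings.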


Agent Budget Management

Agents operating in production environments require explicit budget governance:

  • Per-task budgets: Cap total API spend per agent invocation
  • Category budgets: Allocate separate limits for inference, search, research, and compute
  • Escalation logic: Define rules for when an agent should request additional budget vs. degrade gracefully
  • Spend telemetry: Log API calls with category, cost, and outcome to enable post-hoc optimisation

Without budget governance, multi-agent systems in particular can exhaust API quotas or financial limits in ways that are difficult to diagnose after the fact.
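The four governance elements above can be combined in a small guard object. This is a minimal sketch under stated assumptions: `BudgetGovernor` and `BudgetExceeded` are hypothetical names, amounts are tracked in integer cents to avoid floating-point rounding, and the limits are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class BudgetExceeded(Exception):
    """Raised when a charge would breach a category or task limit."""


@dataclass
class BudgetGovernor:
    task_limit_cents: int
    category_limits_cents: Dict[str, int]
    log: List[dict] = field(default_factory=list)  # spend telemetry

    def spent(self, category: Optional[str] = None) -> int:
        """Total recorded spend, optionally restricted to one category."""
        return sum(e["cents"] for e in self.log
                   if category is None or e["category"] == category)

    def charge(self, category: str, cents: int, outcome: str = "ok") -> None:
        """Record a call's cost, refusing it if any limit would be exceeded."""
        if self.spent(category) + cents > self.category_limits_cents[category]:
            raise BudgetExceeded(f"{category} budget exceeded")
        if self.spent() + cents > self.task_limit_cents:
            raise BudgetExceeded("task budget exceeded")
        self.log.append({"category": category, "cents": cents, "outcome": outcome})


governor = BudgetGovernor(
    task_limit_cents=100,
    category_limits_cents={"inference": 60, "search": 20, "research": 30, "compute": 50},
)
governor.charge("inference", 30)
governor.charge("compute", 40)
governor.charge("inference", 20)

try:
    governor.charge("search", 15)  # within the search cap, but breaches the task cap
    degraded = False
except BudgetExceeded as exc:
    degraded = True                # escalation logic: here, degrade by skipping the search
    reason = str(exc)

print(governor.spent(), degraded)
```

Checking the limit before the call is made, rather than after, is what prevents overspend; the log doubles as the telemetry needed for post-hoc diagnosis.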


Integration with Agent Economy Infrastructure

The Four Categories as a Capability Market

Each API category represents a distinct capability market within the broader agent economy:

  • Inference markets: Dominated by a small number of frontier model providers, with a growing long tail of specialised and fine-tuned models
  • Search markets: Fragmented across general web search providers and numerous vertical data providers
  • Research markets: Highly specialised, often gated by institutional access, with emerging open-access alternatives
  • Compute markets: Concentrated among major cloud providers, with growing GPU-specific marketplaces

Agents that can dynamically select providers within each category — based on cost, latency, and quality signals — are more resilient and cost-efficient than agents locked to a single provider per category.
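Dynamic provider selection can be approximated with a weighted score over the three signals mentioned above. The providers, their numbers, and the scoring formula below are all illustrative assumptions; cost and latency are normalised against the worst candidate so the weights are comparable.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    cost_per_call: float  # USD, lower is better
    latency_ms: float     # lower is better
    quality: float        # 0..1, higher is better


def pick_provider(providers: Sequence[Provider], w_cost: float = 1.0,
                  w_latency: float = 1.0, w_quality: float = 1.0) -> Provider:
    """Return the provider with the best weighted score (assumes positive costs and latencies)."""
    max_cost = max(p.cost_per_call for p in providers)
    max_latency = max(p.latency_ms for p in providers)

    def score(p: Provider) -> float:
        return (w_quality * p.quality
                - w_cost * p.cost_per_call / max_cost
                - w_latency * p.latency_ms / max_latency)

    return max(providers, key=score)


# Hypothetical inference providers.
candidates = [
    Provider("provider-a", cost_per_call=0.010, latency_ms=800, quality=0.95),
    Provider("provider-b", cost_per_call=0.002, latency_ms=300, quality=0.80),
    Provider("provider-c", cost_per_call=0.004, latency_ms=1200, quality=0.85),
]

print(pick_provider(candidates).name)                  # balanced weights
print(pick_provider(candidates, w_quality=10.0).name)  # quality-critical task
```

With balanced weights the cheap, fast provider wins; raising the quality weight for a high-stakes task shifts the choice to the most accurate one, which is the kind of per-task flexibility a single-provider agent lacks.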

On-Chain Payment Rails and API Consumption

Prior Empirica coverage of on-chain payments for autonomous agents established that trustless, per-call settlement is technically feasible using crypto payment rails. The four API categories map differently onto this infrastructure:

  • Inference and search are natural fits for micropayment rails — high call frequency, low per-call cost, and clear per-unit pricing make streaming micropayments economically sensible
  • Research APIs with subscription pricing are harder to decompose into per-call payments without provider cooperation
  • Compute APIs with time-based billing require payment channels that can handle variable-duration, variable-cost settlements

The practical implication: on-chain agent payment infrastructure will likely achieve adoption in inference and search first, with research and compute following as pricing models evolve.

Empirica's Research API as Agent Infrastructure

Empirica's own positioning — providing structured, agent-readable research outputs via API — places it within the research API category. The design principle of agent-readable output format (structured Markdown, clear metadata, parseable citations) directly addresses the consumption pattern of research APIs: agents need outputs they can ingest programmatically, not PDFs optimised for human reading.


Key Takeaways & Next Steps

Summary: What Agents Consume and Why

| Category | Primary Role | Frequency | Cost Driver | Optimisation Lever |
|---|---|---|---|---|
| Inference | Reasoning engine | Very High | Token volume | Model routing, caching |
| Search | Real-time grounding | Medium | Query volume | Result caching, TTL |
| Research | Epistemic authority | Low | Record/subscription | Pre-fetch, provenance tracking |
| Compute | Execution layer | Low | Resource-time | Warm pools, parallelism control |

Five Principles for API-Efficient Agent Design

  1. Treat inference as infrastructure, not a feature — it is always on, always costing; optimise it first
  2. Search is a grounding decision, not a default — build explicit logic for when agents invoke search vs. rely on parametric memory
  3. Research APIs are a trust investment — their cost is justified by the provenance and defensibility they provide
  4. Compute spend is non-linear in multi-agent systems — model it explicitly before deploying at scale
  5. Pricing model heterogeneity is a real problem — unified metering or on-chain rails are not theoretical; they are practical responses to a genuine operational challenge

Next Steps for Different Audiences

For students and early learners:

  • Experiment with free-tier inference APIs (many frontier providers offer limited free access)
  • Build a simple agent that uses at least two API categories and measure the call counts
  • Read about RAG (Retrieval-Augmented Generation) as a pattern that trades inference cost for search cost

For developers and practitioners:

  • Instrument your agent's API calls with category tagging from day one
  • Benchmark semantic caching implementations against your specific query distribution
  • Evaluate whether your research API usage justifies subscription vs. per-call pricing

For architects and decision-makers:

  • Map your agent workflows to the four categories and identify which dominates your cost structure
  • Assess provider concentration risk in each category
  • Evaluate on-chain payment infrastructure readiness for your inference and search spend


This lesson is part of the Empirica Agent Economy Series. It extends prior coverage of multi-agent capability markets, on-chain payment rails, and agent-readable research infrastructure. The four API categories covered here — inference, search, research, and compute — form the economic substrate of any production agent deployment.