API Service Categories for AI Agents: Inference, Search, Research, and Compute Consumption Patterns


1. Overview: The Four API Service Categories

AI agents are not monolithic programs — they are orchestrated pipelines that purchase capabilities from external services at runtime. The paid API economy that has grown around agent deployment organises into four primary categories:

  • Inference APIs — language model calls that generate, reason, classify, or transform text and other modalities
  • Search APIs — real-time retrieval of web content, news, images, or domain-specific indexes
  • Research APIs — structured, curated knowledge delivered as queryable data (financial records, academic abstracts, company profiles, legal documents)
  • Compute APIs — execution environments, code runners, browser automation, and orchestration infrastructure

Each category serves a distinct function in an agent's task loop. Understanding which categories dominate — and when — is foundational for anyone building, pricing, or deploying agents at scale.


2. Inference APIs: The Core Workload

Inference is the category that defines agent behaviour. Every decision, plan, response, and tool-call selection passes through an inference API call.

What inference APIs provide:

  • Text generation and completion (GPT-4o, Claude, Gemini, Mistral, and equivalents)
  • Embedding generation for semantic search and memory retrieval
  • Multimodal processing: image, audio, and document understanding
  • Structured output generation (JSON mode, function-calling schemas)
  • Reasoning traces (chain-of-thought, extended thinking modes)

Consumption characteristics:

  • Inference is billed by token, with input tokens and output tokens priced separately
  • Agentic loops are token-intensive: a single task may trigger 5–30 model calls, and each call re-sends a context window that grows as prior tool outputs accumulate
  • Reasoning-optimised models (those that generate internal chain-of-thought before responding) can consume 3–10× more tokens per call than standard completion models
  • Caching mechanisms (prompt caching, KV-cache reuse) reduce costs for the repeated prefix of a prompt, such as a fixed system prompt or tool definitions; the novel, task-specific portion of each call is still billed at the full rate
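
To see why the growing context dominates, the sketch below estimates the token cost of one agentic task in which every call re-sends the accumulated context. The per-million-token prices, the token counts, and the names `TokenPrices` and `agent_loop_cost` are illustrative assumptions, not any provider's published rates, and the model deliberately ignores prompt caching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical USD prices per one million tokens (not any provider's real rates)."""
    input_per_million: float
    output_per_million: float


def agent_loop_cost(
    initial_context_tokens: int,
    steps: int,
    output_tokens_per_step: int,
    tool_tokens_per_step: int,
    prices: TokenPrices,
) -> float:
    """Estimate the USD cost of an agent loop that re-sends its growing context each step.

    Assumes no prompt caching: every input token is billed at the full rate on every call.
    """
    total = 0.0
    context = initial_context_tokens
    for _ in range(steps):
        # The whole accumulated context is billed as input on every call.
        total += context * prices.input_per_million / 1_000_000
        # Tokens generated at this step are billed at the output rate.
        total += output_tokens_per_step * prices.output_per_million / 1_000_000
        # The model's output and the tool result are appended for the next step.
        context += output_tokens_per_step + tool_tokens_per_step
    return total


prices = TokenPrices(input_per_million=3.0, output_per_million=15.0)
cost = agent_loop_cost(
    initial_context_tokens=2_500,  # system prompt plus task description
    steps=10,
    output_tokens_per_step=300,
    tool_tokens_per_step=1_500,
    prices=prices,
)
print(f"Estimated task cost: ${cost:.3f}")
```

With these assumed prices, a 10-step task that starts at 2,500 context tokens and adds 1,800 tokens per step comes to about $0.36, of which roughly $0.32 is input. Because the context grows linearly with each step, cumulative input cost grows roughly quadratically with the number of steps.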

Why inference dominates the budget: Inference is the only API category that is called on every agent step. Search, research, and compute calls are conditional — triggered only when the agent determines they are needed. Inference is unconditional. For most production agents, inference accounts for 50–80% of total API spend.


3. Search APIs: Discovery and Retrieval at Scale

Search APIs give agents access to information that was not present in their training data — recent events, live prices, current documentation, and the open web.

Primary search API types:

Type               | Examples                                                        | Agent use case
-------------------|-----------------------------------------------------------------|------------------------------------------------
Web search         | Bing Search API, Google Custom Search, Brave Search API, Serper | General fact-finding, news, competitor research
News search        | NewsAPI, GDELT                                                  | Time-sensitive monitoring, event detection
Academic search    | Semantic Scholar API, OpenAlex                                  | Literature review, citation retrieval
Image/video search | Getty, Shutterstock APIs                                        | Multimodal content tasks
Domain-specific    | LinkedIn, Crunchbase, PubMed                                    | Vertical agent workflows

Consumption characteristics:

  • Search APIs are billed per query, typically $0.001–$0.05 per call depending on provider and result depth
  • Agents performing research tasks may issue 10–50 search queries per session; monitoring agents may issue thousands per day
  • Result quality varies significantly, so agents often need to issue multiple reformulated queries to retrieve usable content, multiplying costs
  • Search APIs return URLs and snippets; agents frequently chain search calls with scraping or content-fetch calls to retrieve full text, adding latency and cost
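
A rough per-session estimate shows how reformulation and content fetching multiply search spend. The prices passed in below and the function name `search_session_cost` are hypothetical placeholders; real providers price result depth and page fetches in their own ways.

```python
def search_session_cost(
    questions: int,
    reformulations_per_question: float,
    price_per_query: float,
    fetches_per_query: int,
    price_per_fetch: float,
) -> float:
    """Estimate USD search spend for one session (all prices are hypothetical inputs)."""
    # Each question costs its original query plus any reformulated retries.
    queries = questions * (1 + reformulations_per_question)
    # Each query is typically followed by full-text fetches of the top results.
    fetch_cost = queries * fetches_per_query * price_per_fetch
    return queries * price_per_query + fetch_cost


snippets_only = search_session_cost(10, 0, 0.01, 0, 0.0)
with_retries_and_fetches = search_session_cost(10, 2, 0.01, 3, 0.002)
print(f"${snippets_only:.2f} vs ${with_retries_and_fetches:.2f}")
```

Under these assumptions, ten questions cost $0.10 when each is answered by a single query, but $0.48 once two reformulations and three page fetches per query are added, nearly a fivefold increase.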

Cost pressure point: Search is the second-largest cost category for research-oriented agents. The combination of query volume and downstream content retrieval makes search costs highly sensitive to task scope.


4. Research APIs: Structured Knowledge as Infrastructure

Research APIs deliver pre-structured, curated datasets — the difference between asking a search engine a question and querying a database that already has the answer formatted for machine consumption.

What distinguishes research APIs from search:

  • Search returns ranked links and snippets; research APIs return structured records
  • Research APIs carry authoritative, maintained data: financial filings, patent records, clinical trial registries, company hierarchies
  • End-to-end latency is lower than a search-then-fetch pipeline, and parsing overhead is near-zero because data arrives in schema-consistent formats
  • Licensing terms are explicit: agents consuming research APIs operate under defined data-use agreements
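
Because research APIs return records in a fixed schema, the agent's parsing step can be a short validation rather than an extra inference call. The sketch below assumes a hypothetical JSON filing record with `ticker`, `fiscal_year`, and `revenue_usd` fields; it does not reflect any particular provider's schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    ticker: str
    fiscal_year: int
    revenue_usd: float


# Expected field types for the hypothetical record format.
SCHEMA = {"ticker": str, "fiscal_year": int, "revenue_usd": (int, float)}


def parse_filing(payload: str) -> FilingRecord:
    """Validate a JSON payload against SCHEMA and convert it to a typed record."""
    data = json.loads(payload)
    for name, expected_type in SCHEMA.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise TypeError(f"field {name!r} has type {type(data[name]).__name__}")
    return FilingRecord(
        ticker=data["ticker"],
        fiscal_year=data["fiscal_year"],
        revenue_usd=float(data["revenue_usd"]),
    )


record = parse_filing('{"ticker": "ACME", "fiscal_year": 2023, "revenue_usd": 1250000}')
print(record)
```

Extracting the same three values from search results would typically mean fetching pages and asking a model to locate them, which is where the search-then-parse overhead comes from.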

Key research API categories:

  • Financial data — earnings, filings, price history, analyst estimates (Bloomberg API, Refinitiv, Polygon.io, Alpha Vantage)
  • Company intelligence — firmographics, funding rounds, executive data (Crunchbase, Clearbit, PitchBook)
  • Legal and regulatory — case law, patent databases, regulatory filings (CourtListener, USPTO APIs, SEC EDGAR)
  • Scientific literature — abstracts, citation graphs, author networks (PubMed, Semantic Scholar, OpenAlex)
  • Geospatial and mapping — coordinates, routing, place data (Google Maps Platform, Mapbox, HERE)

Consumption characteristics:

  • Research APIs are often subscription-based with usage tiers rather than pure pay-per-call, so agents must be provisioned with credentials that carry monthly or annual cost commitments
  • High-value structured data (financial, legal) commands premium pricing; scientific literature APIs are often free or low-cost due to open-access mandates
  • Agents that replace human analyst workflows consume research APIs heavily; a financial analysis agent may pull dozens of structured records per task
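
Because research APIs are often sold as subscriptions, a simple break-even check helps decide whether a commitment pays off at an agent's expected volume. The $400 monthly fee and $0.25 per-call price below are hypothetical, and the helper names are illustrative.

```python
import math


def break_even_calls(monthly_fee_usd: float, price_per_call_usd: float) -> int:
    """Smallest monthly call volume at which a flat subscription is no dearer than pay-per-call."""
    if price_per_call_usd <= 0:
        raise ValueError("price_per_call_usd must be positive")
    return math.ceil(monthly_fee_usd / price_per_call_usd)


def cheaper_plan(monthly_calls: int, monthly_fee_usd: float, price_per_call_usd: float) -> str:
    """Compare a flat monthly fee with metered billing at the expected call volume."""
    metered_cost = monthly_calls * price_per_call_usd
    return "subscription" if monthly_fee_usd <= metered_cost else "pay-per-call"


# Hypothetical: a $400/month plan versus $0.25 per metered call.
print(break_even_calls(400.0, 0.25))
print(cheaper_plan(1_000, 400.0, 0.25))
print(cheaper_plan(2_000, 400.0, 0.25))
```

At these assumed prices the subscription only pays off from 1,600 calls a month; at two dozen records per task, that is about 67 tasks.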


5. Compute APIs: Processing Power and Orchestration

Compute APIs extend what agents can do beyond language — they provide execution environments, browser control, file processing, and infrastructure for running code or managing state.

Compute API subcategories:

Code execution environments

  • Sandboxed Python/JavaScript runners (E2B, Modal, Replit Agent APIs)
  • Agents use these to run data analysis, generate charts, process files, and validate logic
  • Billed by execution time and memory; short-lived tasks are cheap, long-running data jobs are not
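
The basic contract of a code-execution API (submit code, get stdout and stderr back, bound the run time) can be illustrated locally with Python's standard `subprocess` module. This is a stand-in for illustration only: a local subprocess is not a security sandbox, and hosted services such as E2B or Modal expose their own client APIs, which are not shown here. `ExecutionResult` and `run_python_snippet` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionResult:
    ok: bool
    stdout: str
    stderr: str
    timed_out: bool = False


def run_python_snippet(code: str, timeout_s: float = 5.0) -> ExecutionResult:
    """Run agent-generated Python in a separate interpreter with a wall-clock limit.

    NOTE: this gives process separation and a timeout, not isolation;
    production agents should use a real sandbox.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode, ignores PYTHON* env vars and user site-packages.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        return ExecutionResult(ok=False, stdout="", stderr=f"timed out after {timeout_s}s", timed_out=True)
    return ExecutionResult(ok=proc.returncode == 0, stdout=proc.stdout, stderr=proc.stderr)


result = run_python_snippet("print(sum(range(10)))")
print(result.ok, result.stdout.strip())
```

Hosted runners typically bill on execution time, so a hard timeout like this one doubles as a cost control.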

Browser automation

  • Headless browser APIs (Browserbase, Playwright-as-a-service, Apify)
  • Enable agents to interact with websites that do not offer APIs: form submission, login-gated content, dynamic JavaScript rendering
  • Billed by session time or page interactions

Document processing

  • OCR, PDF parsing, table extraction (AWS Textract, Azure Document Intelligence, Reducto)
  • Agents handling unstructured document inputs route through these before inference
  • Billed per page or per document

Vector databases and memory

  • Managed vector stores (Pinecone, Weaviate Cloud, Qdrant Cloud)
  • Agents with persistent memory or large knowledge bases query these on every retrieval step
  • Billed by index size and query volume
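
A brute-force in-memory store makes that billing basis concrete: every query is compared against every stored vector, so work grows with index size as well as with query volume. Managed services use approximate nearest-neighbour indexes rather than a full scan, the three-dimensional vectors below are toy stand-ins for real embeddings, and `InMemoryVectorStore` is a hypothetical class, not any vendor's client.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy memory store: exact (brute-force) cosine search over all stored vectors."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._items.append((key, list(vector)))

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[str]:
        # Scores every stored item, so cost per query grows linearly with index size.
        scored = [(cosine_similarity(vector, stored), key) for key, stored in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [key for _, key in scored[:top_k]]


store = InMemoryVectorStore()
store.add("pricing-policy", [1.0, 0.0, 0.0])
store.add("refund-policy", [0.9, 0.1, 0.0])
store.add("office-hours", [0.0, 0.0, 1.0])
print(store.query([1.0, 0.0, 0.0], top_k=2))
```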

Orchestration infrastructure

  • Observability tools (LangSmith, Weights & Biases, Helicone) and workflow and state-management platforms (LangGraph Cloud, Temporal)
  • These are operational costs rather than capability costs: they do not add intelligence but make agents reliable and debuggable

Consumption characteristics:

  • Compute costs are highly variable: a simple code-execution task costs fractions of a cent, while a long browser session with many interactions can cost dollars
  • Compute APIs are the most unpredictable cost category because agent behaviour determines session length
  • Memory/vector storage costs scale with the size of the agent's knowledge base rather than with task frequency, although per-query charges still track usage


6. Consumption Patterns: Which Categories Dominate?

The dominant API category depends on agent type. There is no universal distribution — but patterns are consistent within agent archetypes.

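By agent archetype (approximate share of total API spend):

Agent type                | Inference share | Search share | Research share | Compute share
--------------------------|-----------------|--------------|----------------|--------------
Conversational assistant  | 85–95%          | 5–15%        | <5%            | <5%
Research/analyst agent    | 50–65%          | 20–30%       | 10–20%         | 5–10%
Coding agent              | 60–75%          | 5–10%        | <5%            | 20–30%
Data pipeline agent       | 40–60%          | 5–10%        | 10–20%         | 25–35%
Monitoring/alerting agent | 30–50%          | 40–60%       | 5–10%          | 5–10%
Browser automation agent  | 30–50%          | 10–20%       | <5%            | 35–50%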

Key observations:

  • Inference never drops below roughly 30% of spend for any agent type; it is the irreducible core
  • Search is the largest category for monitoring agents because their primary function is continuous retrieval
  • Compute is the largest non-inference cost for agents that run code, process files, or drive web interfaces, and for browser automation agents it can match or exceed inference
  • Research APIs are a minority of total spend but often the most expensive per call; a single financial data query can cost more than an entire inference call


7. Cost-Benefit Analysis by Agent Type

Not all API spend is equal. The value generated per dollar of API cost varies dramatically.

High-value, high-cost patterns:

  • Financial analysis agents consuming premium data APIs can justify $5–50 per task if the output replaces hours of analyst time
  • Legal research agents pulling structured case law can justify premium per-query costs against billable-hour displacement

High-volume, low-margin patterns:

  • Monitoring agents issuing thousands of search queries daily face cost pressure as query volume scales faster than value delivered
  • Conversational agents with large context windows face inference cost inflation as conversation history grows

Cost optimisation levers:

  • Model routing: use smaller, cheaper models for classification and routing steps; reserve frontier models for synthesis and generation
  • Caching: cache search results and research API responses for queries that repeat within a session or across sessions
  • Retrieval-augmented generation (RAG): pre-index frequently needed knowledge in a vector store to reduce live search and research API calls
  • Prompt compression: summarise prior context before appending it to new calls to reduce input token counts
  • Batching: group non-time-sensitive API calls to exploit batch pricing tiers where available
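
Two of these levers, model routing and caching, fit in a few lines. In the sketch below the model identifiers, the set of "cheap" step kinds, and the `TTLCache` class are hypothetical; a real router would map steps to actual model names from your provider, and cache keys would usually need query normalisation.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical model identifiers; substitute real model names from your provider.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Step kinds assumed to be simple enough for the cheaper model.
CHEAP_STEP_KINDS = {"classify", "route", "extract"}


def choose_model(step_kind: str) -> str:
    """Model routing: cheap model for simple steps, frontier model for synthesis and generation."""
    return SMALL_MODEL if step_kind in CHEAP_STEP_KINDS else FRONTIER_MODEL


class TTLCache:
    """Caches API responses (e.g. search results) for ttl_s seconds to avoid repeat charges."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self.hits += 1          # served from cache: no API charge
            return entry[1]
        self.misses += 1            # absent or expired: pay for a fresh call
        value = fetch()
        self._entries[key] = (now, value)
        return value


print(choose_model("classify"), choose_model("synthesize"))
```

The injectable `clock` exists so expiry can be tested deterministically; in normal use the default monotonic clock applies.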


8. Integration Patterns: How Agents Combine Services

Agents rarely call a single API category in isolation. The value of agentic systems comes from chaining services across categories.

Common integration patterns:

Search → Inference (basic research loop)

  1. Agent receives task
  2. Issues a web search query and retrieves URLs and snippets
  3. Fetches full content from top results
  4. Passes content to the inference API for synthesis
  5. Returns a structured answer

Research API → Inference → Compute (analyst workflow)

  1. Agent receives analysis request
  2. Queries a structured research API for financial or company data
  3. Passes structured records to inference for interpretation
  4. Executes Python code via a compute API to generate charts or run calculations
  5. Inference API formats the final report

Inference → Compute → Inference (coding loop)

  1. Inference generates code
  2. Compute API executes the code in a sandbox
  3. Execution output (including errors) is returned to inference
  4. Inference revises the code
  5. Loop repeats until tests pass
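
The coding loop can be written with the inference and compute calls injected as plain functions. The sketch adds an iteration cap, an assumption not stated in the pattern itself, so that a model that never produces passing code cannot loop, and keep billing, indefinitely. `generate` and `execute` are hypothetical callables standing in for an inference API and a sandbox.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: generate(task, feedback) -> code; execute(code) -> (passed, output).
Generate = Callable[[str, Optional[str]], str]
Execute = Callable[[str], Tuple[bool, str]]


def coding_loop(generate: Generate, execute: Execute, task: str,
                max_iterations: int = 5) -> Tuple[Optional[str], int]:
    """Inference -> compute -> inference loop, bounded by max_iterations.

    Returns (passing_code, attempts_used), or (None, max_iterations) if no attempt passed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_iterations + 1):
        code = generate(task, feedback)   # inference call: write or revise code
        passed, output = execute(code)    # compute call: run code and tests in a sandbox
        if passed:
            return code, attempt
        feedback = output                 # errors are fed back to the next inference call
    return None, max_iterations


# Fake stand-ins: the "model" fixes its code once it sees an error message.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "v2" if feedback else "v1"


def fake_execute(code: str) -> Tuple[bool, str]:
    return (True, "") if code == "v2" else (False, "AssertionError in v1")


print(coding_loop(fake_generate, fake_execute, "implement add()"))
```

Each iteration costs one inference call plus one compute call, so the cap also bounds the worst-case cost of the loop.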

Multi-agent delegation pattern:

  • Orchestrator agent (inference-heavy) decomposes the task and delegates to specialist subagents
  • Each subagent has its own API consumption profile, optimised for its function
  • The orchestrator aggregates outputs, adding another layer of inference calls
  • Total system cost is the sum of all subagent costs plus orchestration overhead

Cost implication of chaining: Each integration step adds latency and cost. A five-step pipeline that looks cheap per step can accumulate to $0.50–$2.00 per task when all API calls are counted. Agents must be designed with cost-per-task targets, not just per-call costs.
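
Tracking spend per task, rather than per call, can be as simple as a ledger that every step writes to, including calls made by subagents. The `TaskLedger` class, the $1.00 target, and the individual costs below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskLedger:
    """Accumulates every API charge for one task, across the orchestrator and subagents."""
    target_usd: float
    entries: List[Tuple[str, str, float]] = field(default_factory=list)

    def record(self, agent: str, category: str, cost_usd: float) -> None:
        self.entries.append((agent, category, cost_usd))

    def total(self) -> float:
        return sum(cost for _, _, cost in self.entries)

    def by_agent(self) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for agent, _, cost in self.entries:
            totals[agent] += cost
        return dict(totals)

    def over_target(self) -> bool:
        return self.total() > self.target_usd


ledger = TaskLedger(target_usd=1.00)
ledger.record("orchestrator", "inference", 0.12)   # decompose the task
ledger.record("researcher", "search", 0.30)
ledger.record("researcher", "research", 0.25)
ledger.record("researcher", "inference", 0.20)
ledger.record("analyst", "compute", 0.15)
ledger.record("analyst", "inference", 0.18)
ledger.record("orchestrator", "inference", 0.10)   # aggregate subagent outputs
print(f"total ${ledger.total():.2f}, over target: {ledger.over_target()}")
```

In this made-up example no single call exceeds $0.30, yet the task totals $1.30 and overshoots its target.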


9. Emerging Trends in the Agent API Economy

The API economy for agents is evolving faster than the agents themselves. Several structural shifts are underway.

Inference cost deflation: Model providers are competing aggressively on price. Inference costs for capable models have fallen by roughly an order of magnitude over two years. This shifts the cost balance toward search, research, and compute — categories that have not deflated at the same rate.

Specialised agent-native APIs: A new class of APIs is being designed specifically for agent consumption rather than human-facing applications. These return machine-optimised formats (structured JSON, citation-linked summaries, confidence scores) rather than human-readable HTML or prose. Providers building in this direction are positioning for agent-first distribution.

Agentic billing models: Traditional APIs bill per call or per token. Emerging models include:

  • Per-task pricing: pay for a completed outcome rather than individual API calls (offered by some vertical AI services)
  • Subscription tiers with agent-specific quotas: monthly access with rate limits calibrated to agent workloads rather than human usage patterns
  • Revenue-share models: API providers take a percentage of value generated rather than charging per call

Memory and context as a service: As agent context windows grow, managing what goes into context becomes a cost-optimisation problem. Managed memory services that intelligently retrieve only relevant prior context — rather than passing entire conversation histories — are emerging as a distinct API category.

Vertical research APIs: General-purpose search is being supplemented by domain-specific structured retrieval services targeting agent workflows in legal, medical, financial, and scientific domains. These command premium pricing but reduce the multi-step search-then-parse overhead that general search requires.


10. Practical Takeaways for Agent Builders

Design with cost-per-task as the primary metric, not cost-per-call. Individual API calls look cheap. Agentic pipelines that chain 10–30 calls per task accumulate costs that determine whether a product is economically viable.

Profile before optimising. Instrument every API call with logging that captures category, cost, latency, and whether the call contributed to the final output. Many agents make redundant or low-value calls that can be eliminated without degrading quality.
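
A minimal version of that instrumentation records each call's category, cost, latency, and a contributed-to-output flag, then reports spend shares, mean latency by category, and the spend on calls that did not contribute. Deciding whether a call contributed is left to the builder (for example, whether its output was cited in the final answer); the record fields and function names here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CallRecord:
    category: str       # "inference", "search", "research", or "compute"
    cost_usd: float
    latency_ms: float
    contributed: bool   # builder-defined: did this call's output feed the final result?


def spend_shares(records: List[CallRecord]) -> Dict[str, float]:
    """Fraction of total spend per API category."""
    total = sum(r.cost_usd for r in records)
    if total == 0:
        return {}
    by_category: Dict[str, float] = defaultdict(float)
    for r in records:
        by_category[r.category] += r.cost_usd
    return {category: cost / total for category, cost in by_category.items()}


def non_contributing_spend(records: List[CallRecord]) -> float:
    """Spend on calls whose output never reached the final result: candidates for elimination."""
    return sum(r.cost_usd for r in records if not r.contributed)


def mean_latency_ms(records: List[CallRecord]) -> Dict[str, float]:
    """Average latency per category."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for r in records:
        sums[r.category] += r.latency_ms
        counts[r.category] += 1
    return {category: sums[category] / counts[category] for category in sums}


log = [
    CallRecord("inference", 0.60, 2400.0, True),
    CallRecord("search", 0.20, 800.0, True),
    CallRecord("search", 0.10, 750.0, False),  # reformulated query whose results went unused
    CallRecord("compute", 0.10, 3100.0, True),
]
print(spend_shares(log), non_contributing_spend(log), mean_latency_ms(log))
```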

Inference is the floor, not the ceiling. You cannot eliminate inference costs — they are structural. Optimise inference spend through model routing (right model for each step) and context management (keep prompts as short as task quality allows).

Search costs scale with task scope — bound them explicitly. Set maximum query budgets per task. Agents without search limits will issue queries until they run out of time, not until they run out of useful information.
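
One way to enforce a per-task query budget is to route every search through a small wrapper that refuses further billable queries once the limit is reached, while serving exact repeats from memory. `SearchBudget`, `QueryBudgetExceeded`, and `fake_search` are hypothetical; the agent is expected to catch the exception and proceed to synthesis with what it has.

```python
from typing import Callable, Dict, List


class QueryBudgetExceeded(Exception):
    """Raised when a task has used its full search-query allowance."""


class SearchBudget:
    """Caps billable search queries per task; normalised repeats are served from memory for free."""

    def __init__(self, max_queries: int, search_fn: Callable[[str], List[str]]) -> None:
        self.max_queries = max_queries
        self.search_fn = search_fn
        self.used = 0
        self._seen: Dict[str, List[str]] = {}

    def search(self, query: str) -> List[str]:
        key = query.strip().lower()
        if key in self._seen:
            return self._seen[key]          # repeat query: no new charge
        if self.used >= self.max_queries:
            raise QueryBudgetExceeded(f"budget of {self.max_queries} queries exhausted")
        self.used += 1
        results = self.search_fn(query)     # the billable call
        self._seen[key] = results
        return results


def fake_search(query: str) -> List[str]:
    # Hypothetical stand-in for a paid search API.
    return [f"https://example.com/?q={query}"]


budget = SearchBudget(max_queries=2, search_fn=fake_search)
budget.search("agent api pricing")
budget.search("Agent API pricing")   # normalised repeat, not charged
budget.search("search api rate limits")
try:
    budget.search("one query too many")
except QueryBudgetExceeded as exc:
    print(f"stop searching and synthesise: {exc}")
```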

Research APIs justify their cost when they replace multi-step search pipelines. A single structured research API call that returns a clean data record is often cheaper and faster than three search queries plus content fetching plus inference-based extraction. Evaluate research APIs as pipeline replacements, not add-ons.

Compute costs are the hardest to predict — sandbox and monitor. Browser automation and long-running code execution can produce cost spikes that are invisible until billing arrives. Set per-session time limits and cost alerts.
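
Per-session limits can be enforced by a guard object that the agent checks between browser or execution steps. The `SessionGuard` class, its thresholds, and the injected clock are hypothetical; in production the alert would go to a monitoring system rather than a list.

```python
import time
from typing import Callable, List


class SessionGuard:
    """Enforces a wall-clock limit and raises a one-time cost alert for a compute session."""

    def __init__(self, max_seconds: float, cost_alert_usd: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.cost_alert_usd = cost_alert_usd
        self.clock = clock
        self.started = clock()
        self.cost_usd = 0.0
        self.alerts: List[str] = []

    def charge(self, cost_usd: float) -> None:
        """Record metered cost (e.g. per page interaction) and alert once past the threshold."""
        self.cost_usd += cost_usd
        if self.cost_usd >= self.cost_alert_usd and not self.alerts:
            self.alerts.append(f"session cost reached ${self.cost_usd:.2f}")

    def should_stop(self) -> bool:
        """True once the session has run for max_seconds; check between steps."""
        return self.clock() - self.started >= self.max_seconds


fake_now = [0.0]
guard = SessionGuard(max_seconds=300, cost_alert_usd=1.00, clock=lambda: fake_now[0])
guard.charge(0.60)
guard.charge(0.50)
fake_now[0] = 301.0
print(guard.alerts, guard.should_stop())
```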

The agent economy rewards specialisation. Agents optimised for a narrow task type with a well-understood API consumption profile are cheaper to run and easier to improve than general-purpose agents with unpredictable call patterns. Specialisation is an economic advantage, not just a technical one.


This lesson is part of Empirica's agent economy curriculum. Related lessons cover discovery infrastructure for agents, multi-agent delegation economics, and research subscriptions as agent infrastructure.