Empirica Agent Economy Series | Course Lesson | Format: Markdown Report | Target: All Audiences

Executive Summary

AI agents face a fundamental architectural decision at every capability boundary: develop the capability internally through fine-tuning or training, or acquire it externally through API calls to third-party services. This decision is not binary — it is a continuous, context-dependent optimisation across cost, latency, control, and strategic positioning.

The build-vs-buy question for agents differs materially from the same question for human software teams. Agents operate at machine speed, can switch providers mid-task, and face per-call pricing rather than seat licensing. The economics therefore reward different choices than traditional enterprise software procurement.

Key findings this lesson covers:

The decision framework is multi-dimensional: cost per call, latency tolerance, data sensitivity, update frequency, and capability uniqueness all interact
External APIs dominate for commodity capabilities; internal fine-tuning wins for domain-specific, high-frequency, or sensitive workloads
Hybrid architectures — routing decisions made dynamically at runtime — are increasingly the practical optimum
The agent economy is creating new market structures where the build-vs-buy boundary is itself a tradeable, programmable decision

Core Decision Framework

The central question is not "which is cheaper?" but "which produces the best capability-per-unit-cost at the required quality threshold, given operational constraints?"

Five Primary Decision Axes

1. Frequency of Use: High call volume shifts economics toward build. A capability invoked 10,000 times per day at $0.002 per API call costs $7,300 per year. Fine-tuning a small model to handle the same task may cost $500–$2,000 once, with near-zero marginal cost thereafter.

2. Specificity of the Task: Commodity tasks (translation, OCR, sentiment classification, entity extraction) are well-served by external APIs. Providers have trained on vastly more data than any single organisation can assemble. Highly domain-specific tasks (classifying proprietary financial instruments, interpreting internal codebases, applying firm-specific compliance rules) benefit from internal development because no external provider has the relevant training signal.

3. Latency Requirements: External API calls introduce network round-trips, rate limits, and provider-side queuing. For agent pipelines where sub-100ms responses are required, or where the agent is operating in a tight feedback loop, internal inference is often the only viable path. External APIs are acceptable where latency tolerance exceeds ~200–500ms.

4. Data Sensitivity: Any data transmitted to an external API leaves the agent's trust boundary. For regulated industries (healthcare, finance, legal), personally identifiable information, or proprietary intellectual property, external APIs may be contractually or legally prohibited. Internal capabilities avoid this exposure, provided inference runs on infrastructure inside that boundary.

5. Capability Stability: If the required capability is well-defined and unlikely to change (for example, converting units or parsing structured formats), external APIs are low-risk. If the capability must evolve with a proprietary dataset or shifting domain knowledge, internal development preserves update control.

The Decision Threshold Model

Build Score = (frequency × cost_per_call) + sensitivity_penalty + latency_penalty
Buy Score   = fine_tune_cost_amortised + maintenance_burden + data_pipeline_cost

If Build Score > Buy Score → Build
If Buy Score > Build Score → Buy
If scores within 20% → Hybrid

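This is a simplified heuristic. Note the naming: Build Score totals the costs of buying (API spend plus sensitivity and latency penalties), so a high value is an argument for building, while Buy Score totals the costs of building. Every term must use the same units and time horizon (for example, dollars per year). Real deployments require empirical measurement of each variable.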
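The heuristic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function and parameter names are assumptions, all inputs are taken to be dollars per year, and "within 20%" is interpreted as relative to the larger of the two scores, which the model itself does not specify.

```python
def build_score(calls_per_year: float, cost_per_call: float,
                sensitivity_penalty: float = 0.0,
                latency_penalty: float = 0.0) -> float:
    """Pressure to build: what buying costs per year, penalties included."""
    return calls_per_year * cost_per_call + sensitivity_penalty + latency_penalty


def buy_score(fine_tune_cost_amortised: float, maintenance_burden: float,
              data_pipeline_cost: float) -> float:
    """Pressure to buy: what building costs per year."""
    return fine_tune_cost_amortised + maintenance_burden + data_pipeline_cost


def recommend_path(build: float, buy: float, hybrid_band: float = 0.20) -> str:
    """Return 'build', 'buy', or 'hybrid'.

    Assumption: scores count as "within 20%" when their difference is at
    most 20% of the larger score.
    """
    larger = max(build, buy)
    if larger == 0 or abs(build - buy) / larger <= hybrid_band:
        return "hybrid"
    return "build" if build > buy else "buy"


# Illustrative inputs: 10,000 calls/day at $0.002, versus a hypothetical
# $3,000/year total for amortised fine-tuning, maintenance, and data pipeline.
api_side = build_score(calls_per_year=10_000 * 365, cost_per_call=0.002)
build_side = buy_score(2_000, 500, 500)
decision = recommend_path(api_side, build_side)
```

The penalties are the hardest inputs to quantify; one option is to express them as the estimated annual cost of a compliance breach or a missed latency target, but that choice is left open here.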

Build Path: Fine-Tuning & Internal Capabilities

What "Build" Means for an Agent

Building internal capability means the agent executes a task using weights, rules, or retrieval systems it owns and controls. This includes:

Fine-tuned language models trained on domain-specific corpora
Retrieval-Augmented Generation (RAG) over proprietary knowledge bases
Deterministic rule engines for structured logic (compliance checks, routing rules)
Locally hosted open-weight models (Llama, Mistral, Falcon variants) running on owned or leased compute
Custom classifiers trained on labelled internal data

When Fine-Tuning Wins

Fine-tuning is most effective when:

The task requires consistent application of proprietary style, terminology, or logic
The base model's general capability is sufficient but needs calibration to a specific distribution
Cumulative API spend at the expected call volume would exceed the total build cost within 6–18 months
The organisation has sufficient labelled data (typically 1,000–100,000 examples depending on task complexity)

Costs and Risks of Building

Cost Category	Typical Range	Notes
Fine-tuning compute	$200–$50,000+	Depends on model size and dataset volume
Data preparation	$500–$20,000	Labelling, cleaning, formatting
Evaluation & iteration	$1,000–$10,000	Red-teaming, benchmark construction
Ongoing maintenance	10–30% of initial cost/year	Drift correction, retraining cycles
Infrastructure	Variable	Self-hosted vs. managed inference

Key risks: Model drift as the domain evolves, internal expertise dependency, and the opportunity cost of engineering time that could be directed elsewhere.

Buy Path: External APIs & Service Integration

What "Buy" Means for an Agent

Buying capability means the agent makes authenticated HTTP calls to third-party services and consumes the response. The agent does not own, train, or maintain the underlying model or system.

Categories of external API capabilities relevant to agents (extending the taxonomy covered in the API Service Categories lesson):

Foundation model inference — GPT-4, Claude, Gemini, and equivalents for general reasoning
Specialised ML APIs — vision models, speech-to-text, document parsing, code execution sandboxes
Data and knowledge APIs — financial data feeds, scientific literature search, geospatial services
Tool execution APIs — web search, browser automation, calendar/email integration
Verification and compliance APIs — identity verification, sanctions screening, content moderation

When External APIs Win

External APIs are the correct choice when:

The capability is a commodity with multiple competing providers (reducing lock-in risk)
The provider's training data and model scale cannot be replicated internally at reasonable cost
The task is infrequent enough that per-call pricing is cheaper than amortised build cost
Speed-to-deployment matters more than long-term cost optimisation
The capability is rapidly evolving and the provider's update cadence exceeds internal iteration speed

Costs and Risks of Buying

Cost Category	Typical Range	Notes
Usage-based inference	$0.0001–$0.06 per 1K tokens	Varies widely by model tier; per-call cost depends on tokens per call
Rate limit management	Engineering overhead	Retry logic, queue management
Provider dependency	Strategic risk	Pricing changes, deprecation, outages
Data egress	Compliance and legal cost	Varies by jurisdiction and data type
Integration maintenance	Ongoing	API versioning, schema changes

Key risks: Provider pricing changes (several major providers have repriced by 50–80% in both directions within 12-month windows), API deprecation, and latency variability under load.

Economic Trade-offs & Cost Analysis

The Break-Even Calculation

The core economic question is: at what call volume does the amortised cost of building equal the cumulative cost of buying?

Break-even formula:

Break-even volume = Build_total_cost / (Buy_cost_per_call - Build_marginal_cost_per_call)

For a fine-tuned small model with $5,000 total build cost, $0.00005 marginal inference cost, versus an external API at $0.002 per call:

Break-even = $5,000 / ($0.002 - $0.00005) = ~2,564,000 calls

At 10,000 calls/day, break-even occurs in ~256 days. At 1,000 calls/day, break-even takes ~7 years — making buy the clear winner.
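The worked numbers above can be reproduced with a short Python sketch. The function name is an illustrative assumption; the inputs are the figures from the example.

```python
import math


def break_even_calls(build_total_cost: float,
                     buy_cost_per_call: float,
                     build_marginal_cost_per_call: float) -> float:
    """Calls needed before building becomes cheaper than buying.

    Returns math.inf when building saves nothing per call.
    """
    saving_per_call = buy_cost_per_call - build_marginal_cost_per_call
    if saving_per_call <= 0:
        return math.inf
    return build_total_cost / saving_per_call


calls = break_even_calls(5_000, 0.002, 0.00005)
days_at_10k_per_day = calls / 10_000       # roughly 256 days
years_at_1k_per_day = calls / 1_000 / 365  # roughly 7 years
```

The guard for a non-positive saving matters in practice: if the self-hosted marginal cost is at or above the API price, no call volume ever repays the build.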

Hidden Costs Often Omitted

Evaluation infrastructure: Testing that a fine-tuned model actually performs better than the API baseline requires benchmark construction, an often-overlooked cost of roughly 20–40% of fine-tuning cost
Latency cost: In agentic pipelines, 200ms of additional latency per step compounds across multi-step tasks. A 10-step agent with 200ms API overhead per step adds 2 seconds per run — significant at scale
Reliability cost: External APIs introduce failure modes (rate limits, outages) that require engineering investment in fallback logic, circuit breakers, and retry queues
Opportunity cost: Engineering hours spent on fine-tuning are not spent on agent capability expansion
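Two of the hidden costs above reduce to simple arithmetic, sketched here in Python. The function names and the runs-per-day figure are illustrative assumptions, and the 20–40% range is read as a share of fine-tuning cost.

```python
def added_latency_seconds(steps: int, overhead_ms_per_step: float) -> float:
    """Extra wall-clock time per agent run caused by per-step API overhead."""
    return steps * overhead_ms_per_step / 1000


def evaluation_cost_range(fine_tuning_cost: float,
                          low_share: float = 0.20,
                          high_share: float = 0.40) -> tuple:
    """Low and high estimates for evaluation infrastructure cost."""
    return fine_tuning_cost * low_share, fine_tuning_cost * high_share


# The 10-step, 200ms example from the list above.
per_run = added_latency_seconds(steps=10, overhead_ms_per_step=200)

# Hypothetical volume of 10,000 runs/day, to show how the delay accumulates.
hours_per_day = per_run * 10_000 / 3600

eval_low, eval_high = evaluation_cost_range(5_000)
```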

Price Trajectory Considerations

External API costs for foundation model inference have declined roughly 10–20× over 2–3 year windows as providers scale and optimise. This trajectory favours buy for capabilities that are currently expensive to build, as the external option becomes cheaper over time. However, open-weight models are improving over the same period, so build economics improve in parallel and the capability gap between internal and external options narrows.
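A falling API price pushes the break-even point later, and at low volume it can remove it altogether. The sketch below illustrates this under stated assumptions: the price decays smoothly and exponentially (a 10× drop over 3 years is one point in the range quoted above), the build path's marginal cost is ignored for simplicity, and the function name and ten-year horizon are hypothetical.

```python
from typing import Optional


def days_until_buy_exceeds_build(build_total_cost: float,
                                 calls_per_day: float,
                                 price_today: float,
                                 decline_factor: float = 1.0,
                                 window_years: float = 3.0,
                                 max_days: int = 3650) -> Optional[int]:
    """First day on which cumulative API spend reaches the build cost.

    The per-call price falls by `decline_factor` every `window_years`.
    Returns None if the spend never reaches the build cost within max_days.
    """
    spent = 0.0
    for day in range(max_days):
        price = price_today * decline_factor ** (-(day / 365) / window_years)
        spent += calls_per_day * price
        if spent >= build_total_cost:
            return day + 1
    return None


flat = days_until_buy_exceeds_build(5_000, 10_000, 0.002)
declining = days_until_buy_exceeds_build(5_000, 10_000, 0.002, decline_factor=10)
low_volume = days_until_buy_exceeds_build(5_000, 1_000, 0.002, decline_factor=10)
```

With these inputs the declining-price case breaks even later than the flat-price case, and the 1,000 calls/day case never breaks even within the horizon, which is the quantitative form of "this trajectory favours buy".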

Decision Matrix by Use Case

Use Case	Recommended Path	Primary Reason
General text summarisation	Buy	Commodity; provider scale advantage
Domain-specific document classification	Build	Proprietary labels; high frequency
Web search integration	Buy	Real-time data; provider infrastructure
Internal code review	Build	Proprietary codebase; data sensitivity
Image captioning (generic)	Buy	Provider training data advantage
Medical record extraction	Build	Regulatory; data sensitivity
Language translation	Buy	Provider scale; low specificity
Financial instrument classification	Build	Proprietary taxonomy; high frequency
Speech-to-text (standard)	Buy	Commodity; provider accuracy advantage
Compliance rule application	Build	Firm-specific rules; auditability
Sentiment analysis (social media)	Buy	Commodity; provider training data
Customer-specific recommendation	Build	Proprietary behavioural data
Geospatial data lookup	Buy	Real-time; infrastructure cost
Internal knowledge Q&A	Build (RAG)	Proprietary corpus; data sensitivity

Age-Grouped Learning Paths

The same concepts, calibrated for different starting points.

🟢 Ages 10–14: The Robot Helper Analogy

Imagine you have a robot assistant that needs to do lots of different jobs.

Should your robot learn to cook, or call a restaurant?

If your robot needs to make lunch once a week, it's easier to just order from a restaurant (that's like using an API — paying someone else who already knows how). But if your robot makes lunch for 500 people every single day, it's cheaper to teach the robot to cook itself (that's like building the capability internally).

The key questions are:
- How often does the robot need to do this job?
- Is the job something lots of robots need to do, or is it special to your robot?
- Is the information private (you wouldn't want a restaurant to see your secret recipe)?

AI agents face exactly this choice. They can call external services (like asking another AI for help), or they can be trained to do the job themselves. The right answer depends on how often, how special, and how secret the task is.

🔵 Ages 15–18: The Economics of Capability

AI agents are software systems that complete tasks autonomously. When an agent needs a capability it doesn't have, it has two options: call an external API (pay per use) or develop the capability internally (pay upfront, then pay very little per use).

Think of it like streaming vs. buying:
- Streaming a movie costs $0.50 each time. Buying it costs $15 once.
- If you watch it more than 30 times, buying is cheaper.
- If you watch it twice a year, streaming wins.

For AI agents, the maths works the same way. External APIs charge per call. Fine-tuning a model costs money upfront but has near-zero cost per use after that.

But there are other factors beyond cost:
- Privacy: If the data is sensitive (medical records, private messages), you may not be allowed to send it to an external service
- Speed: External API calls take time to travel over the network. Internal processing is usually faster
- Control: If you build it, you control when it updates. If you buy it, the provider controls that

The interesting part: AI agents can make this decision automatically at runtime — routing easy tasks to cheap external APIs and sensitive or frequent tasks to internal models. This is called dynamic capability routing.

🟡 Ages 19–25: The Strategic and Technical Dimensions

The build-vs-buy decision for AI agents sits at the intersection of software economics, ML engineering, and strategic positioning.

The technical landscape:

Fine-tuning has become significantly more accessible. Parameter-efficient methods (LoRA, QLoRA, prefix tuning) allow adaptation of large models using a fraction of the compute previously required. A capable domain-specific classifier can be fine-tuned on consumer-grade hardware in hours. This has shifted the break-even point toward build for many use cases that previously defaulted to buy.

Open-weight models (Llama 3, Mistral, Phi-3 and their derivatives) have narrowed the capability gap with proprietary APIs for many tasks. For tasks where a 7B–13B parameter model is sufficient, self-hosting is now economically competitive with API calls at moderate volume.

The strategic dimension:

Capabilities built internally become proprietary assets. An agent fine-tuned on a firm's historical decisions, customer interactions, or domain knowledge encodes institutional knowledge that competitors cannot replicate by calling the same API. This creates durable competitive advantage — but only if the capability is genuinely differentiated.

Conversely, building commodity capabilities internally is a strategic error: it consumes engineering resources, creates maintenance burden, and produces a capability that will always lag behind providers who specialise in it.

The emerging pattern: Agents are increasingly designed with a capability router — a lightweight decision model that classifies each incoming task and routes it to the optimal execution path (internal model, external API, or hybrid RAG). This meta-layer is itself a build-vs-buy decision.

🔴 Adults / Professionals: Organisational and Economic Implications

For organisations deploying AI agents at scale, the build-vs-buy decision is a recurring governance question with compounding financial and strategic consequences.

Organisational decision-making structure:

Most organisations lack a systematic framework for this decision, defaulting to either "always API" (minimising upfront cost, maximising vendor dependency) or "always build" (maximising control, underestimating maintenance burden). Neither extreme is optimal.

A structured approach requires:

Capability inventory: Cataloguing every capability an agent uses, with call frequency, data sensitivity classification, and current cost
Break-even modelling: Computing the volume threshold at which build becomes cheaper than buy for each capability
Strategic classification: Identifying which capabilities, if built internally, would constitute proprietary assets versus commodity replications
Governance cadence: Reviewing the build-vs-buy decision for each capability annually, as both API pricing and open-weight model quality shift the break-even point
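The four steps above can be tied together in a small capability inventory. The following Python sketch is one possible shape, not a prescribed schema: the field names, the sensitivity labels, the 540-day payback window (roughly 18 months), and the "review" outcome are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capability:
    name: str
    calls_per_day: float
    data_sensitivity: str          # assumed labels: "public", "internal", "regulated"
    api_cost_per_call: float
    build_total_cost: float
    build_marginal_cost_per_call: float
    differentiating: bool          # would it be a proprietary asset if built?

    def break_even_days(self) -> Optional[float]:
        """Days until building repays itself, or None if it never does."""
        saving = self.api_cost_per_call - self.build_marginal_cost_per_call
        if saving <= 0 or self.calls_per_day <= 0:
            return None
        return self.build_total_cost / saving / self.calls_per_day


def review(cap: Capability, max_payback_days: float = 540) -> str:
    """Return 'build', 'buy', or 'review' for one inventory entry."""
    if cap.data_sensitivity == "regulated":
        return "build"             # external transmission may be prohibited
    days = cap.break_even_days()
    if days is not None and days <= max_payback_days:
        # Cheap to build, but commodity replications need a human decision.
        return "build" if cap.differentiating else "review"
    return "buy"


inventory = [
    Capability("document classification", 10_000, "internal", 0.002, 5_000, 0.00005, True),
    Capability("translation", 1_000, "public", 0.002, 5_000, 0.00005, False),
    Capability("medical record extraction", 200, "regulated", 0.002, 5_000, 0.00005, True),
]
decisions = {cap.name: review(cap) for cap in inventory}
```

Re-running `review` over the inventory at each governance checkpoint, with refreshed prices and volumes, is one way to make the annual review mechanical.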

The vendor dependency risk:

Organisations that have built agent pipelines entirely on external APIs face concentration risk. A single provider repricing, deprecating an endpoint, or imposing new data-use terms can require emergency re-architecture. Mitigation strategies include: maintaining abstraction layers that allow provider substitution, monitoring open-weight model quality as a fallback option, and avoiding deep integration with proprietary API features that have no open equivalent.
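An abstraction layer for provider substitution can be as small as a shared interface plus an ordered fallback list. The Python sketch below assumes a single `complete` method; the class names, the `ProviderError` exception, and the stub providers are hypothetical stand-ins and do not correspond to any real vendor SDK.

```python
from typing import Protocol, Sequence, Tuple


class ProviderError(Exception):
    """Raised when a provider cannot serve a request (outage, rate limit, deprecation)."""


class CompletionProvider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


class UnavailableAPI:
    """Stub external provider that always fails, simulating an outage."""
    name = "external-api"

    def complete(self, prompt: str) -> str:
        raise ProviderError("simulated outage")


class LocalOpenWeightModel:
    """Stub for a self-hosted fallback; a real one would run inference."""
    name = "local-open-weight"

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def complete_with_fallback(providers: Sequence[CompletionProvider],
                           prompt: str) -> Tuple[str, str]:
    """Try providers in order; return (provider name, response) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


used, reply = complete_with_fallback([UnavailableAPI(), LocalOpenWeightModel()], "hi")
```

Because agent code depends only on the interface, swapping or reordering providers becomes a configuration change rather than a re-architecture.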

The talent implication:

Building internal capabilities requires ML engineering talent that is scarce and expensive. For many organisations, the true cost of build is not compute; it is the fully loaded cost of the engineers required to build, evaluate, and maintain the capability. This often makes buy the correct decision even when the raw compute economics favour build.

Integration with Agent Economy

This lesson extends the capability markets framework introduced in the Multi-Agent Systems lesson. In multi-agent architectures, the build-vs-buy decision operates at two levels:

Level 1: Individual agent capability. Each agent decides whether to execute a task internally or call an external service. This is the primary focus of this lesson.

Level 2: Inter-agent delegation. An orchestrating agent may delegate a subtask to a specialised subagent rather than calling an external API directly. The subagent may itself use internal capabilities or external APIs. This creates a capability supply chain where build-vs-buy decisions nest recursively.

Economic implications for the agent economy:

Agents with proprietary internal capabilities can offer those capabilities as services to other agents, creating revenue streams from built assets
The market price for a capability API is bounded above by the customer's per-call cost of building it internally (build cost amortised over expected volume, plus marginal inference cost): providers must price below that threshold to attract customers
As open-weight models improve, the build threshold falls, compressing margins for commodity API providers
Specialised, high-quality capabilities with genuine data moats (trained on proprietary datasets unavailable to competitors) maintain pricing power

This dynamic mirrors classical make-or-buy theory in industrial economics, but operates at machine speed with programmable switching costs.

Practical Implementation Examples

Example 1: Legal Document Agent

Task: Review contracts for non-standard clauses.

Decision: Build (fine-tuned model + RAG over firm's clause library)

Rationale: Clause definitions are firm-specific. Data is highly sensitive (attorney-client privilege concerns). Volume is high (hundreds of contracts per week). External APIs would require transmitting privileged documents to third parties.

Implementation: Fine-tune a 7B parameter model on labelled examples of standard vs. non-standard clauses. Augment with RAG over the firm's approved clause library. Host on private cloud infrastructure.

Example 2: Customer Support Agent

Task: Answer general product questions; escalate complex issues.

Decision: Hybrid — buy for general Q&A, build for product-specific knowledge

Rationale: General language understanding and response generation are commodity capabilities. Product-specific knowledge (pricing, SKUs, policies) changes frequently and is proprietary.

Implementation: External foundation model API for response generation. Internal RAG system over product documentation for grounding. Routing logic sends factual product queries through RAG before API call; general conversational queries go directly to API.
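The routing logic in this example might look like the Python sketch below. Everything in it is a hypothetical stand-in: the in-memory documents are placeholders, keyword matching substitutes for a real retrieval system, and the external model is passed in as a plain callable rather than a specific vendor client.

```python
from typing import Callable, Dict, List

# Placeholder stand-in for the internal product documentation index.
PRODUCT_DOCS: Dict[str, str] = {
    "pricing": "<current pricing policy text>",
    "returns": "<current returns policy text>",
}


def retrieve(query: str) -> List[str]:
    """Naive keyword retrieval: return docs whose topic word appears in the query."""
    lowered = query.lower()
    return [text for topic, text in PRODUCT_DOCS.items() if topic in lowered]


def answer(query: str, generate: Callable[[str], str]) -> str:
    """Ground product queries in retrieved docs, then call the external model."""
    passages = retrieve(query)
    if passages:
        # Factual product query: RAG first, then the API call.
        context = "\n".join(passages)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    else:
        # General conversational query: straight to the API.
        prompt = query
    return generate(prompt)


def fake_generate(prompt: str) -> str:
    """Test double for the external API: returns the prompt it was given."""
    return prompt


grounded = answer("What is your pricing?", fake_generate)
direct = answer("Hello there", fake_generate)
```

Injecting `generate` keeps the external provider replaceable and makes the routing decision testable without network access.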

Example 3: Financial Data Agent

Task: Monitor market data and flag anomalies.

Decision: Buy for data feeds, build for anomaly detection logic

Rationale: Real-time market data requires provider infrastructure that cannot be replicated internally. Anomaly detection logic encodes proprietary trading signals that must not be exposed to external providers.

Implementation: External API for market data ingestion. Internal model for anomaly scoring. No proprietary signal data leaves the trust boundary.

Future Considerations & Hybrid Approaches

The Convergence Trend

The build-vs-buy boundary is becoming more fluid. Several trends are compressing the decision:

Model distillation: Large external models can be used to generate synthetic training data for smaller internal models — a "buy to build" strategy that uses API access to bootstrap internal capability
Federated fine-tuning: Emerging infrastructure allows fine-tuning on sensitive data without centralising it, potentially enabling build paths that were previously blocked by data governance constraints
Capability marketplaces: As covered in the capability markets lesson, agents are beginning to trade capabilities peer-to-peer, creating a third option beyond build-or-buy: rent from another agent
Automated routing optimisation: Meta-learning systems that observe agent performance and cost across execution paths can automatically shift routing decisions as the economic landscape changes

The Hybrid Architecture Pattern

The practical optimum for most production agent deployments is a tiered capability architecture:

Tier 1 — Internal (always build):
  Proprietary knowledge, sensitive data processing, high-frequency commodity tasks

Tier 2 — Hybrid (build + buy):
  RAG over internal knowledge + external model for generation
  Internal classifier + external API for execution

Tier 3 — External (always buy):
  Real-time data, rapidly evolving capabilities, low-frequency specialised tasks

The routing layer that assigns tasks to tiers is itself a lightweight ML component — typically a fast classifier that adds <10ms overhead while potentially saving orders of magnitude in cost and latency for high-volume pipelines.
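The tier assignment can be sketched as follows. The lesson describes the router as a fast classifier; this Python example substitutes hand-written rules as a stand-in, and the task fields, their precedence, and the 10,000 calls/day threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    INTERNAL = 1   # always build
    HYBRID = 2     # build + buy
    EXTERNAL = 3   # always buy


@dataclass(frozen=True)
class Task:
    sensitive: bool
    uses_internal_knowledge: bool
    needs_realtime_data: bool
    calls_per_day: int


def assign_tier(task: Task, high_frequency_threshold: int = 10_000) -> Tier:
    """Rule-based stand-in for the routing classifier."""
    # Sensitive data and high-frequency work stay inside the trust boundary.
    if task.sensitive or task.calls_per_day >= high_frequency_threshold:
        return Tier.INTERNAL
    # Real-time data depends on provider infrastructure.
    if task.needs_realtime_data:
        return Tier.EXTERNAL
    # Non-sensitive internal knowledge: internal retrieval, external generation.
    if task.uses_internal_knowledge:
        return Tier.HYBRID
    return Tier.EXTERNAL


contract_review = assign_tier(Task(True, True, False, 100))
product_faq = assign_tier(Task(False, True, False, 500))
market_feed = assign_tier(Task(False, False, True, 2_000))
```

The order of the checks encodes a policy choice: sensitivity overrides every other consideration, so a sensitive task that also needs real-time data would still be kept internal under these rules.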

Strategic Forecast

The economics of this decision will continue shifting as:

Open-weight model quality approaches proprietary API quality for an expanding range of tasks
Fine-tuning infrastructure becomes cheaper and more automated
Agent-to-agent capability markets mature, adding a third procurement channel
Regulatory pressure on data sovereignty increases the penalty for external API use in sensitive domains

Organisations that treat build-vs-buy as a static architectural decision will be systematically outcompeted by those that implement dynamic, data-driven routing that continuously re-optimises the boundary.

Summary: Key Principles

Frequency drives economics. High call volume favours build; low volume favours buy.
Specificity drives quality. Domain-specific tasks favour build; commodity tasks favour buy.
Sensitivity drives compliance. Regulated or proprietary data often mandates build.
Latency drives architecture. Tight feedback loops require internal inference.
The boundary is programmable. Dynamic routing between build and buy paths is the production-grade solution.
Built capabilities are assets. Proprietary internal capabilities compound in value; commodity replications do not.
The decision recurs. API pricing, open-weight model quality, and call volume all change — the build-vs-buy decision requires periodic re-evaluation, not a one-time answer.

Empirica Agent Economy Series | This lesson extends: Multi-Agent Systems with Specialised Subagents; API Service Categories for AI Agents. Next: Agent Orchestration Patterns and Runtime Decision Architecture.

Build vs Buy for AI Agents: API Integration vs Internal Capability Development