Empirica Agent Economy Series — Course Lesson Format: Markdown Report | Target: All Audiences
Executive Summary
AI agents face a fundamental architectural decision at every capability boundary: develop the capability internally through fine-tuning or training, or acquire it externally through API calls to third-party services. This decision is not binary — it is a continuous, context-dependent optimisation across cost, latency, control, and strategic positioning.
The build-vs-buy question for agents differs materially from the same question for human software teams. Agents operate at machine speed, can switch providers mid-task, and face per-call pricing rather than seat licensing. The economics therefore reward different choices than traditional enterprise software procurement.
Key findings this lesson covers:
- The decision framework is multi-dimensional: cost per call, latency tolerance, data sensitivity, update frequency, and capability uniqueness all interact
- External APIs dominate for commodity capabilities; internal fine-tuning wins for domain-specific, high-frequency, or sensitive workloads
- Hybrid architectures — routing decisions made dynamically at runtime — are increasingly the practical optimum
- The agent economy is creating new market structures where the build-vs-buy boundary is itself a tradeable, programmable decision
Core Decision Framework
The central question is not "which is cheaper?" but "which produces the best capability-per-unit-cost at the required quality threshold, given operational constraints?"
Five Primary Decision Axes
1. Frequency of Use: High call volume shifts economics toward build. A capability invoked 10,000 times per day at $0.002 per API call costs $7,300 per year. Fine-tuning a small model to handle the same task may cost $500–$2,000 once, with near-zero marginal cost thereafter.
2. Specificity of the Task: Commodity tasks (translation, OCR, sentiment classification, entity extraction) are well served by external APIs. Providers have trained on vastly more data than any single organisation can assemble. Highly domain-specific tasks (classifying proprietary financial instruments, interpreting internal codebases, applying firm-specific compliance rules) benefit from internal development because no external provider has the relevant training signal.
3. Latency Requirements: External API calls introduce network round-trips, rate limits, and provider-side queuing. For agent pipelines where sub-100ms responses are required, or where the agent is operating in a tight feedback loop, internal inference is often the only viable path. External APIs are acceptable where latency tolerance exceeds ~200–500ms.
4. Data Sensitivity: Any data transmitted to an external API leaves the agent's trust boundary. For regulated industries (healthcare, finance, legal), personally identifiable information, or proprietary intellectual property, external APIs may be contractually or legally prohibited. Internal capabilities eliminate this exposure.
5. Capability Stability: If the required capability is well-defined and unlikely to change (e.g., converting units, parsing structured formats), external APIs are low-risk. If the capability must evolve with a proprietary dataset or shifting domain knowledge, internal development preserves update control.
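The arithmetic behind the frequency axis can be checked with a few lines of Python. This is a minimal sketch: the function name `annual_api_spend` and the 365-day year with constant daily volume are assumptions made for illustration, while the call volume, price, and fine-tuning range are the figures quoted above.

```python
def annual_api_spend(calls_per_day: float, cost_per_call: float, days: int = 365) -> float:
    """Yearly spend on a per-call API, assuming constant daily volume."""
    return calls_per_day * cost_per_call * days


spend = annual_api_spend(calls_per_day=10_000, cost_per_call=0.002)
print(f"Annual API spend: ${spend:,.0f}")  # the lesson quotes $7,300

# One-off fine-tuning range quoted in the lesson, for comparison.
fine_tune_low, fine_tune_high = 500, 2_000
print(f"Spend exceeds the high fine-tuning estimate: {spend > fine_tune_high}")
```

Even at the top of the quoted fine-tuning range, one year of API spend at this volume is several times larger, which is why frequency is listed first among the axes.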
The Decision Threshold Model
Build Score = (frequency × cost_per_call) + sensitivity_penalty + latency_penalty
Buy Score = fine_tune_cost_amortised + maintenance_burden + data_pipeline_cost
If Build Score > Buy Score → Build
If Buy Score > Build Score → Buy
If scores within 20% → Hybrid
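This is a simplified heuristic. Note the naming: Build Score sums the costs incurred by buying (API spend plus sensitivity and latency penalties), so a high value is an argument for building, while Buy Score sums the costs incurred by building. Both scores must be expressed in the same units over the same time horizon, and real deployments require empirical measurement of each variable.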
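The threshold model above could be expressed in Python roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and "within 20%" is interpreted here as a gap of at most 20% of the larger score, which the lesson does not specify. The input figures in the usage lines are illustrative only.

```python
def build_score(calls_per_period: float, cost_per_call: float,
                sensitivity_penalty: float = 0.0, latency_penalty: float = 0.0) -> float:
    """Pressure to build: API spend over the period plus risk penalties."""
    return calls_per_period * cost_per_call + sensitivity_penalty + latency_penalty


def buy_score(fine_tune_cost_amortised: float, maintenance_burden: float,
              data_pipeline_cost: float) -> float:
    """Pressure to buy: what building would cost over the same period."""
    return fine_tune_cost_amortised + maintenance_burden + data_pipeline_cost


def decide(build: float, buy: float, hybrid_band: float = 0.20) -> str:
    """Return 'build', 'buy', or 'hybrid' following the threshold model."""
    larger = max(build, buy)
    # Assumption: "within 20%" means the gap is at most 20% of the larger score.
    if larger == 0 or abs(build - buy) <= hybrid_band * larger:
        return "hybrid"
    return "build" if build > buy else "buy"


# Illustrative (hypothetical) yearly figures in dollars.
b = build_score(calls_per_period=3_650_000, cost_per_call=0.002)
y = buy_score(fine_tune_cost_amortised=2_000, maintenance_burden=600, data_pipeline_cost=1_000)
print(decide(b, y))        # scores far apart, build score larger
print(decide(1_000, 900))  # scores close together
```

Checking the hybrid band first avoids declaring a winner when the two scores differ by less than the measurement noise in their inputs.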
Build Path: Fine-Tuning & Internal Capabilities
What "Build" Means for an Agent
Building internal capability means the agent executes a task using weights, rules, or retrieval systems it owns and controls. This includes:
- Fine-tuned language models trained on domain-specific corpora
- Retrieval-Augmented Generation (RAG) over proprietary knowledge bases
- Deterministic rule engines for structured logic (compliance checks, routing rules)
- Locally hosted open-weight models (Llama, Mistral, Falcon variants) running on owned or leased compute
- Custom classifiers trained on labelled internal data
When Fine-Tuning Wins
Fine-tuning is most effective when:
- The task requires consistent application of proprietary style, terminology, or logic
- The base model's general capability is sufficient but needs calibration to a specific distribution
- Cumulative external inference cost at scale would exceed the total cost of building within 6–18 months
- The organisation has sufficient labelled data (typically 1,000–100,000 examples depending on task complexity)
Costs and Risks of Building
| Cost Category | Typical Range | Notes |
|---|---|---|
| Fine-tuning compute | $200–$50,000+ | Depends on model size and dataset volume |
| Data preparation | $500–$20,000 | Labelling, cleaning, formatting |
| Evaluation & iteration | $1,000–$10,000 | Red-teaming, benchmark construction |
| Ongoing maintenance | 10–30% of initial cost/year | Drift correction, retraining cycles |
| Infrastructure | Variable | Self-hosted vs. managed inference |
Key risks: Model drift as the domain evolves, internal expertise dependency, and the opportunity cost of engineering time that could be directed elsewhere.
Buy Path: External APIs & Service Integration
What "Buy" Means for an Agent
Buying capability means the agent makes authenticated HTTP calls to third-party services and consumes the response. The agent does not own, train, or maintain the underlying model or system.
Categories of external API capabilities relevant to agents (extending the taxonomy covered in the API Service Categories lesson):
- Foundation model inference — GPT-4, Claude, Gemini, and equivalents for general reasoning
- Specialised ML APIs — vision models, speech-to-text, document parsing, code execution sandboxes
- Data and knowledge APIs — financial data feeds, scientific literature search, geospatial services
- Tool execution APIs — web search, browser automation, calendar/email integration
- Verification and compliance APIs — identity verification, sanctions screening, content moderation
When External APIs Win
External APIs are the correct choice when:
- The capability is a commodity with multiple competing providers (reducing lock-in risk)
- The provider's training data and model scale cannot be replicated internally at reasonable cost
- The task is infrequent enough that per-call pricing is cheaper than amortised build cost
- Speed-to-deployment matters more than long-term cost optimisation
- The capability is rapidly evolving and the provider's update cadence exceeds internal iteration speed
Costs and Risks of Buying
| Cost Category | Typical Range | Notes |
|---|---|---|
| Per-call inference | $0.0001–$0.06 per 1K tokens | Varies widely by model tier |
| Rate limit management | Engineering overhead | Retry logic, queue management |
| Provider dependency | Strategic risk | Pricing changes, deprecation, outages |
| Data egress | Compliance and legal cost | Varies by jurisdiction and data type |
| Integration maintenance | Ongoing | API versioning, schema changes |
Key risks: Provider pricing changes (several major providers have repriced by 50–80% in both directions within 12-month windows), API deprecation, and latency variability under load.
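Because the table quotes prices per 1K tokens while the rest of the lesson reasons in cost per call, a small conversion helps when comparing the two. The sketch below assumes a single blended per-token price and a hypothetical 1,500 tokens per call; real providers often price input and output tokens differently.

```python
def cost_per_call(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of one call, assuming a single blended per-token price."""
    return tokens / 1000 * price_per_1k_tokens


TOKENS_PER_CALL = 1_500  # hypothetical request + response size

# The two ends of the price range quoted in the table.
for label, price in [("low tier", 0.0001), ("high tier", 0.06)]:
    per_call = cost_per_call(TOKENS_PER_CALL, price)
    per_day = per_call * 10_000  # 10,000 calls/day, as in the frequency example
    print(f"{label}: ${per_call:.5f} per call, ${per_day:,.2f} per day")
```

The several-hundred-fold spread between tiers means the choice of model tier can matter as much as the build-vs-buy choice itself.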
Economic Trade-offs & Cost Analysis
The Break-Even Calculation
The core economic question is: at what call volume does the amortised cost of building equal the cumulative cost of buying?
Break-even formula:
Break-even volume = Build_total_cost / (Buy_cost_per_call - Build_marginal_cost_per_call)
For a fine-tuned small model with $5,000 total build cost, $0.00005 marginal inference cost, versus an external API at $0.002 per call:
Break-even = $5,000 / ($0.002 - $0.00005) = ~2,564,000 calls
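At 10,000 calls/day, break-even occurs in ~256 days. At 1,000 calls/day, break-even takes ~7 years, making buy the clear winner. Both figures exclude ongoing maintenance (estimated earlier at 10–30% of initial cost per year), which pushes the real break-even later.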
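The break-even formula and the worked example above can be reproduced with a short script. This is a minimal sketch: the function name `break_even_calls` is hypothetical, and the guard for a non-positive per-call saving is an added assumption covering the case where the formula's denominator is zero or negative.

```python
import math


def break_even_calls(build_total_cost: float, buy_cost_per_call: float,
                     build_marginal_cost_per_call: float) -> float:
    """Call volume at which cumulative buy cost equals total build cost."""
    saving_per_call = buy_cost_per_call - build_marginal_cost_per_call
    if saving_per_call <= 0:
        # Building never pays back if an internal call costs as much as the API.
        return math.inf
    return build_total_cost / saving_per_call


calls = break_even_calls(5_000, 0.002, 0.00005)
print(f"Break-even volume: {calls:,.0f} calls")

# Convert the volume into elapsed time at the two daily volumes from the lesson.
for calls_per_day in (10_000, 1_000):
    days = calls / calls_per_day
    print(f"{calls_per_day:>6,} calls/day -> {days:,.0f} days ({days / 365:.1f} years)")
```

Expressing the result in days as well as calls makes it easier to compare against how long the capability is expected to remain stable.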
Hidden Costs Often Omitted
- Evaluation infrastructure: Testing that a fine-tuned model actually performs better than the API baseline requires benchmark construction — often underestimated at 20–40% of fine-tuning cost
- Latency cost: In agentic pipelines, 200ms of additional latency per step compounds across multi-step tasks. A 10-step agent with 200ms API overhead per step adds 2 seconds per run — significant at scale
- Reliability cost: External APIs introduce failure modes (rate limits, outages) that require engineering investment in fallback logic, circuit breakers, and retry queues
- Opportunity cost: Engineering hours spent on fine-tuning are not spent on agent capability expansion
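The reliability cost listed above usually shows up as code like the following. This is a sketch of one common pattern (retry with exponential backoff, then fall back), not a complete circuit breaker. All names are hypothetical, the two callables are stand-ins rather than real provider clients, and catching every exception is a simplification that a production system would narrow.

```python
import time
from typing import Callable


def call_with_fallback(primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       payload: str,
                       max_retries: int = 3,
                       base_delay_s: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try the external call with exponential backoff, then fall back."""
    for attempt in range(max_retries):
        try:
            return primary(payload)
        except Exception:
            # Wait 0.1s, 0.2s, ... between attempts; no wait after the last one.
            if attempt < max_retries - 1:
                sleep(base_delay_s * 2 ** attempt)
    # All retries failed: degrade to the fallback path (e.g. an internal model).
    return fallback(payload)


def flaky_api(text: str) -> str:
    """Stand-in for a provider that is rate-limited or down."""
    raise TimeoutError("simulated outage")


def local_model(text: str) -> str:
    """Stand-in for an internal model."""
    return f"local:{text}"


# The sleep function is injected so the demo runs without real delays.
print(call_with_fallback(flaky_api, local_model, "hello", sleep=lambda s: None))
```

Injecting the `sleep` function keeps the retry policy testable and lets the caller cap total added latency, which matters given the compounding effect across multi-step tasks.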
Price Trajectory Considerations
External API costs for foundation model inference have declined roughly 10–20× over 2–3 year windows as providers scale and optimise. This trajectory favours buy for capabilities that are currently expensive to build, as the external option becomes cheaper over time. However, build economics are improving in parallel: as open-weight models improve, the capability gap between internal and external options narrows.
Decision Matrix by Use Case
| Use Case | Recommended Path | Primary Reason |
|---|---|---|
| General text summarisation | Buy | Commodity; provider scale advantage |
| Domain-specific document classification | Build | Proprietary labels; high frequency |
| Web search integration | Buy | Real-time data; provider infrastructure |
| Internal code review | Build | Proprietary codebase; data sensitivity |
| Image captioning (generic) | Buy | Provider training data advantage |
| Medical record extraction | Build | Regulatory; data sensitivity |
| Language translation | Buy | Provider scale; low specificity |
| Financial instrument classification | Build | Proprietary taxonomy; high frequency |
| Speech-to-text (standard) | Buy | Commodity; provider accuracy advantage |
| Compliance rule application | Build | Firm-specific rules; auditability |
| Sentiment analysis (social media) | Buy | Commodity; provider training data |
| Customer-specific recommendation | Build | Proprietary behavioural data |
| Geospatial data lookup | Buy | Real-time; infrastructure cost |
| Internal knowledge Q&A | Build (RAG) | Proprietary corpus; data sensitivity |
Age-Grouped Learning Paths
The same concepts, calibrated for different starting points.
🟢 Ages 10–14: The Robot Helper Analogy
Imagine you have a robot assistant that needs to do lots of different jobs.
Should your robot learn to cook, or call a restaurant?
If your robot needs to make lunch once a week, it's easier to just order from a restaurant (that's like using an API — paying someone else who already knows how). But if your robot makes lunch for 500 people every single day, it's cheaper to teach the robot to cook itself (that's like building the capability internally).
The key questions are:
- How often does the robot need to do this job?
- Is the job something lots of robots need to do, or is it special to your robot?
- Is the information private (you wouldn't want a restaurant to see your secret recipe)?
AI agents face exactly this choice. They can call external services (like asking another AI for help), or they can be trained to do the job themselves. The right answer depends on how often, how special, and how secret the task is.
🔵 Ages 15–18: The Economics of Capability
AI agents are software systems that complete tasks autonomously. When an agent needs a capability it doesn't have, it has two options: call an external API (pay per use) or develop the capability internally (pay upfront, then pay almost nothing per use).
Think of it like streaming vs. buying:
- Streaming a movie costs $0.50 each time. Buying it costs $15 once.
- If you watch it more than 30 times, buying is cheaper.
- If you watch it twice a year, streaming wins.
For AI agents, the maths works the same way. External APIs charge per call. Fine-tuning a model costs money upfront but has near-zero cost per use after that.
But there are other factors beyond cost:
- Privacy: If the data is sensitive (medical records, private messages), you may not be allowed to send it to an external service
- Speed: External API calls add network round-trip time. Internal processing is usually faster
- Control: If you build it, you control when it updates. If you buy it, the provider controls that
The interesting part: AI agents can make this decision automatically at runtime — routing easy tasks to cheap external APIs and sensitive or frequent tasks to internal models. This is called dynamic capability routing.
🟡 Ages 19–25: The Strategic and Technical Dimensions
The build-vs-buy decision for AI agents sits at the intersection of software economics, ML engineering, and strategic positioning.
The technical landscape:
Fine-tuning has become significantly more accessible. Parameter-efficient methods (LoRA, QLoRA, prefix tuning) allow adaptation of large models using a fraction of the compute previously required. A capable domain-specific classifier can be fine-tuned on consumer-grade hardware in hours. This has shifted the break-even point toward build for many use cases that previously defaulted to buy.
Open-weight models (Llama 3, Mistral, Phi-3 and their derivatives) have narrowed the capability gap with proprietary APIs for many tasks. For tasks where a 7B–13B parameter model is sufficient, self-hosting is now economically competitive with API calls at moderate volume.
The strategic dimension:
Capabilities built internally become proprietary assets. An agent fine-tuned on a firm's historical decisions, customer interactions, or domain knowledge encodes institutional knowledge that competitors cannot replicate by calling the same API. This creates durable competitive advantage — but only if the capability is genuinely differentiated.
Conversely, building commodity capabilities internally is a strategic error: it consumes engineering resources, creates maintenance burden, and produces a capability that will always lag behind providers who specialise in it.
The emerging pattern: Agents are increasingly designed with a capability router — a lightweight decision model that classifies each incoming task and routes it to the optimal execution path (internal model, external API, or hybrid RAG). This meta-layer is itself a build-vs-buy decision.
🔴 Adults / Professionals: Organisational and Economic Implications
For organisations deploying AI agents at scale, the build-vs-buy decision is a recurring governance question with compounding financial and strategic consequences.
Organisational decision-making structure:
Most organisations lack a systematic framework for this decision, defaulting to either "always API" (minimising upfront cost, maximising vendor dependency) or "always build" (maximising control, underestimating maintenance burden). Neither extreme is optimal.
A structured approach requires:
- Capability inventory: Cataloguing every capability an agent uses, with call frequency, data sensitivity classification, and current cost
- Break-even modelling: Computing the volume threshold at which build becomes cheaper than buy for each capability
- Strategic classification: Identifying which capabilities, if built internally, would constitute proprietary assets versus commodity replications
- Governance cadence: Reviewing the build-vs-buy decision for each capability annually, as both API pricing and open-weight model quality shift the break-even point
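The capability inventory and break-even modelling steps above could start from a data structure like this one. It is a sketch only: the class, field names, review rule, and both inventory entries are hypothetical, and the one-year payback horizon is an assumption chosen to match an annual review cadence.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    calls_per_day: int
    cost_per_call: float          # current external API price
    sensitive: bool               # data sensitivity classification
    build_total_cost: float       # estimated one-off cost to build
    build_marginal_cost: float = 0.0

    def annual_buy_cost(self) -> float:
        return self.calls_per_day * self.cost_per_call * 365

    def break_even_days(self) -> float:
        saving = self.cost_per_call - self.build_marginal_cost
        if saving <= 0 or self.calls_per_day == 0:
            return float("inf")  # building never pays back
        return self.build_total_cost / saving / self.calls_per_day


def review(cap: Capability, horizon_days: int = 365) -> str:
    """Flag a capability if it is sensitive or pays back within the horizon."""
    if cap.sensitive or cap.break_even_days() <= horizon_days:
        return "review for build"
    return "keep buying"


# Hypothetical inventory entries.
inventory = [
    Capability("translation", 200, 0.002, False, 5_000),
    Capability("contract clause check", 10_000, 0.002, True, 5_000, 0.00005),
]
for cap in inventory:
    print(f"{cap.name}: ${cap.annual_buy_cost():,.0f}/yr -> {review(cap)}")
```

Re-running the same review with updated prices and volumes at each governance checkpoint gives a consistent basis for changing a decision.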
The vendor dependency risk:
Organisations that have built agent pipelines entirely on external APIs face concentration risk. A single provider repricing, deprecating an endpoint, or imposing new data-use terms can require emergency re-architecture. Mitigation strategies include: maintaining abstraction layers that allow provider substitution, monitoring open-weight model quality as a fallback option, and avoiding deep integration with proprietary API features that have no open equivalent.
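An abstraction layer that allows provider substitution might look like the sketch below. Every class and method name here is hypothetical, and both providers are stand-ins that return canned strings rather than calling a real service. The agent depends only on a small interface, so swapping providers does not touch agent logic.

```python
from typing import Protocol


class CompletionProvider(Protocol):
    """Minimal interface the agent depends on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class HostedApiProvider:
    """Stand-in for a third-party API client; no network call is made."""

    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


class OpenWeightProvider:
    """Stand-in for a self-hosted open-weight model."""

    def complete(self, prompt: str) -> str:
        return f"[self-hosted] {prompt}"


class Agent:
    def __init__(self, provider: CompletionProvider) -> None:
        self.provider = provider

    def summarise(self, text: str) -> str:
        # The agent only knows the interface, never the vendor.
        return self.provider.complete(f"Summarise: {text}")


agent = Agent(HostedApiProvider())
print(agent.summarise("Q3 report"))
agent.provider = OpenWeightProvider()  # swap providers without changing agent code
print(agent.summarise("Q3 report"))
```

Keeping the interface narrow is what limits lock-in: the more provider-specific features leak into it, the harder substitution becomes.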
The talent implication:
Building internal capabilities requires ML engineering talent that is scarce and expensive. For many organisations, the true cost of build is not compute — it is the fully-loaded cost of the engineers required to build, evaluate, and maintain the capability. This often makes buy the correct decision even when the raw compute economics favour build.
Integration with Agent Economy
This lesson extends the capability markets framework introduced in the Multi-Agent Systems lesson. In multi-agent architectures, the build-vs-buy decision operates at two levels:
Level 1: Individual agent capability. Each agent decides whether to execute a task internally or call an external service. This is the primary focus of this lesson.
Level 2: Inter-agent delegation. An orchestrating agent may delegate a subtask to a specialised subagent rather than calling an external API directly. The subagent may itself use internal capabilities or external APIs. This creates a capability supply chain where build-vs-buy decisions nest recursively.
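The recursive nesting described in Level 2 can be pictured as a tree in which each node either does the work itself or delegates. The sketch below uses a hypothetical `Step` structure and made-up per-task costs to show how the cost of an orchestrated task rolls up from its sub-decisions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One node in a capability supply chain (hypothetical structure)."""
    name: str
    path: str                 # "internal", "external", or "delegate"
    own_cost: float = 0.0     # cost incurred directly at this node
    substeps: List["Step"] = field(default_factory=list)

    def total_cost(self) -> float:
        # A delegating node pays its own overhead plus everything beneath it.
        return self.own_cost + sum(s.total_cost() for s in self.substeps)


# Illustrative costs per task in dollars; none of these come from the lesson.
chain = Step("orchestrator", "delegate", 0.0001, [
    Step("research subagent", "delegate", 0.0002, [
        Step("web search", "external", 0.002),
        Step("summarise", "internal", 0.00005),
    ]),
    Step("compliance check", "internal", 0.00005),
])
print(f"Cost per orchestrated task: ${chain.total_cost():.5f}")
```

In this made-up example a single external call dominates the total, which illustrates why a build-vs-buy choice deep in the chain can matter to the orchestrator at the top.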
Economic implications for the agent economy:
- Agents with proprietary internal capabilities can offer those capabilities as services to other agents, creating revenue streams from built assets
- The market price for a capability API is bounded above by the cost to build it internally — providers must price below the build threshold to attract customers
- As open-weight models improve, the build threshold falls, compressing margins for commodity API providers
- Specialised, high-quality capabilities with genuine data moats (trained on proprietary datasets unavailable to competitors) maintain pricing power
This dynamic mirrors classical make-or-buy theory in industrial economics, but operates at machine speed with programmable switching costs.
Practical Implementation Examples
Example 1: Legal Document Agent
Task: Review contracts for non-standard clauses.
Decision: Build (fine-tuned model + RAG over firm's clause library)
Rationale: Clause definitions are firm-specific. Data is highly sensitive (attorney-client privilege concerns). Volume is high (hundreds of contracts per week). External APIs would require transmitting privileged documents to third parties.
Implementation: Fine-tune a 7B parameter model on labelled examples of standard vs. non-standard clauses. Augment with RAG over the firm's approved clause library. Host on private cloud infrastructure.
Example 2: Customer Support Agent
Task: Answer general product questions; escalate complex issues.
Decision: Hybrid — buy for general Q&A, build for product-specific knowledge
Rationale: General language understanding and response generation are commodity capabilities. Product-specific knowledge (pricing, SKUs, policies) changes frequently and is proprietary.
Implementation: External foundation model API for response generation. Internal RAG system over product documentation for grounding. Routing logic sends factual product queries through RAG before the API call; general conversational queries go directly to the API.
Example 3: Financial Data Agent
Task: Monitor market data and flag anomalies.
Decision: Buy for data feeds, build for anomaly detection logic
Rationale: Real-time market data requires provider infrastructure that cannot be replicated internally. Anomaly detection logic encodes proprietary trading signals that must not be exposed to external providers.
Implementation: External API for market data ingestion. Internal model for anomaly scoring. No proprietary signal data leaves the trust boundary.
Future Considerations & Hybrid Approaches
The Convergence Trend
The build-vs-buy boundary is becoming more fluid. Several trends are compressing the decision:
- Model distillation: Large external models can be used to generate synthetic training data for smaller internal models — a "buy to build" strategy that uses API access to bootstrap internal capability
- Federated fine-tuning: Emerging infrastructure allows fine-tuning on sensitive data without centralising it, potentially enabling build paths that were previously blocked by data governance constraints
- Capability marketplaces: As covered in the capability markets lesson, agents are beginning to trade capabilities peer-to-peer, creating a third option beyond build-or-buy: rent from another agent
- Automated routing optimisation: Meta-learning systems that observe agent performance and cost across execution paths can automatically shift routing decisions as the economic landscape changes
The Hybrid Architecture Pattern
The practical optimum for most production agent deployments is a tiered capability architecture:
Tier 1 — Internal (always build):
Proprietary knowledge, sensitive data processing, high-frequency commodity tasks
Tier 2 — Hybrid (build + buy):
RAG over internal knowledge + external model for generation
Internal classifier + external API for execution
Tier 3 — External (always buy):
Real-time data, rapidly evolving capabilities, low-frequency specialised tasks
The routing layer that assigns tasks to tiers is itself a lightweight ML component — typically a fast classifier that adds <10ms overhead while potentially saving orders of magnitude in cost and latency for high-volume pipelines.
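A routing layer for the three tiers could be sketched as below. The lesson describes the router as a fast classifier; this example substitutes plain hand-written rules as a stand-in so that it stays self-contained. The `Task` fields, tier labels, and the 10,000 calls/day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str                          # e.g. "translation", "knowledge_qa"
    sensitive: bool = False            # touches regulated or proprietary data
    needs_realtime_data: bool = False
    calls_per_day: int = 0


HIGH_FREQUENCY = 10_000  # hypothetical threshold, calls per day


def route(task: Task) -> str:
    """Assign a task to a tier using simple rules in place of a trained classifier."""
    if task.sensitive:
        return "tier1_internal"   # sensitive data stays inside the trust boundary
    if task.needs_realtime_data:
        return "tier3_external"   # provider infrastructure required
    if task.calls_per_day >= HIGH_FREQUENCY:
        return "tier1_internal"   # volume makes internal inference cheaper
    if task.kind == "knowledge_qa":
        return "tier2_hybrid"     # internal retrieval + external generation
    return "tier3_external"


for t in [
    Task("clause_review", sensitive=True),
    Task("market_feed", needs_realtime_data=True),
    Task("ticket_tagging", calls_per_day=50_000),
    Task("knowledge_qa"),
    Task("translation", calls_per_day=20),
]:
    print(f"{t.kind}: {route(t)}")
```

The sensitivity check comes first so that no cost or latency consideration can route regulated data outside the trust boundary.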
Strategic Forecast
The economics of this decision will continue shifting as:
- Open-weight model quality approaches proprietary API quality for an expanding range of tasks
- Fine-tuning infrastructure becomes cheaper and more automated
- Agent-to-agent capability markets mature, adding a third procurement channel
- Regulatory pressure on data sovereignty increases the penalty for external API use in sensitive domains
Organisations that treat build-vs-buy as a static architectural decision will be systematically outcompeted by those that implement dynamic, data-driven routing that continuously re-optimises the boundary.
Summary: Key Principles
- Frequency drives economics. High call volume favours build; low volume favours buy.
- Specificity drives quality. Domain-specific tasks favour build; commodity tasks favour buy.
- Sensitivity drives compliance. Regulated or proprietary data often mandates build.
- Latency drives architecture. Tight feedback loops require internal inference.
- The boundary is programmable. Dynamic routing between build and buy paths is the production-grade solution.
- Built capabilities are assets. Proprietary internal capabilities compound in value; commodity replications do not.
- The decision recurs. API pricing, open-weight model quality, and call volume all change — the build-vs-buy decision requires periodic re-evaluation, not a one-time answer.
Empirica Agent Economy Series | This lesson extends: Multi-Agent Systems with Specialised Subagents; API Service Categories for AI Agents. Next: Agent Orchestration Patterns and Runtime Decision Architecture.