Multi-Agent Systems with Specialised Subagents: Capability Markets and Delegation Economics
A course lesson for practitioners and technical decision-makers
Executive Summary
Multi-agent systems increasingly operate not as monolithic pipelines but as economic networks: orchestrator agents delegate tasks to specialised subagents, which price and deliver discrete capabilities — inference, search, research synthesis, code execution, and compute. Understanding this architecture requires two complementary lenses. The engineering lens asks how to decompose tasks and route them efficiently. The economic lens asks when delegation creates value, what pricing mechanisms align incentives, and how to avoid the coordination failures that make markets break down. This lesson covers both.
1. Core Concepts: Specialisation vs. Generalisation in Agent Networks
The Fundamental Trade-off
A generalised agent can handle many task types but handles none of them optimally. A specialised subagent handles a narrow task class with higher accuracy, lower latency, or lower cost — but requires coordination overhead to deploy.
The decision to specialise is not purely technical. It mirrors the economic logic of the division of labour: specialisation raises output per unit of effort, but only if the gains exceed the transaction costs of coordination.
Key variables that favour specialisation:
- Task volume is high enough to amortise the fixed cost of building or procuring a specialist
- The task is well-defined with stable inputs and outputs (low ambiguity)
- Quality differences between specialist and generalist are measurable and material
- The orchestrator can reliably route tasks to the correct specialist
Key variables that favour generalisation:
- Tasks are novel, ambiguous, or cross-domain
- Coordination latency would dominate task latency
- The agent fleet is small and overhead costs are significant
Architectural Patterns
| Pattern | Description | Best For |
|---|---|---|
| Hub-and-spoke | One orchestrator, many specialists | Well-defined task taxonomies |
| Peer delegation | Agents delegate laterally to each other | Dynamic, emergent workflows |
| Hierarchical | Orchestrators delegate to sub-orchestrators | Complex, multi-stage pipelines |
| Marketplace | Agents bid for tasks or post capabilities | Variable load, heterogeneous fleet |
The marketplace pattern is the most economically interesting and the focus of the sections that follow.
2. Capability Markets: How Subagents Price and Trade Services
What Is a Capability Market?
A capability market is a coordination mechanism in which:
1. Sellers (subagents or external APIs) advertise discrete capabilities with associated costs
2. Buyers (orchestrators or peer agents) select and purchase capabilities to complete tasks
3. Prices (token costs, latency, quality scores) signal scarcity and quality
This is not a metaphor. Modern agent fleets already operate this way: an orchestrator might call a search API at $0.01 per query, a reasoning model at $0.06 per 1K output tokens, and a specialised research tool at a monthly subscription rate (the prices used throughout this lesson are illustrative). Each call is a market transaction.
Pricing Mechanisms in Practice
Per-unit pricing: the dominant model for inference and search APIs. Costs are proportional to consumption. Predictable for budgeting; creates an incentive to minimise token use.
Subscription pricing: common for structured knowledge services (academic databases, financial data feeds, legal research tools). The cost is fixed regardless of query volume, so it is economically rational once query volume exceeds the break-even point against the per-query alternative.
Auction/bidding: used in compute markets (cloud spot instances, GPU brokers). Subagents or their orchestrators bid for resources, and the price clears at the market level. Efficient under variable demand but introduces latency and complexity.
Flat-rate bundling: multiple capabilities sold as a package. Reduces transaction overhead but may force agents to pay for capabilities they do not use.
Price Signals and Information Problems
Markets work when prices carry information. In capability markets, three information problems are common:
- Quality uncertainty: The orchestrator cannot observe output quality before purchase. Mitigated by reputation systems, benchmarks, or trial sampling.
- Latency opacity: Advertised latency may not reflect real-world performance under load. Mitigated by monitoring and SLA enforcement.
- Cost drift: Per-token prices change as providers update models. Agents with hardcoded cost assumptions make suboptimal routing decisions over time.
3. Delegation Economics: Cost-Benefit Analysis of Outsourcing Tasks
The Delegation Decision Framework
Delegation is worth it when:
Value(specialist output) - Cost(specialist) - Cost(coordination) > Value(generalist output) - Cost(generalist)
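In practice, this means estimating four quantities. The first three correspond to the terms of the inequality; the risk premium is an additional expected cost on the delegation side and should be subtracted alongside the coordination cost: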
| Quantity | How to Measure |
|---|---|
| Quality delta | Benchmark specialist vs. generalist on representative task sample |
| Cost delta | Direct API cost comparison at expected query volume |
| Coordination cost | Latency added by routing, context serialisation, error handling |
| Risk premium | Cost of failure modes unique to delegation (hallucinated tool calls, API downtime) |
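The delegation inequality can be sketched as a small function. This is an illustrative sketch rather than a prescribed method: the names `DelegationEstimate`, `net_gain_from_delegation`, and `should_delegate` are hypothetical, all quantities are assumed to be expressed in the same unit (for example dollars per task), and subtracting the risk premium on the specialist side is an assumption about how the fourth quantity enters the comparison.

```python
from dataclasses import dataclass


@dataclass
class DelegationEstimate:
    """Hypothetical container; every field uses the same unit (e.g. dollars per task)."""
    specialist_value: float
    specialist_cost: float
    coordination_cost: float
    generalist_value: float
    generalist_cost: float
    risk_premium: float = 0.0  # assumption: expected cost of delegation-only failures


def net_gain_from_delegation(e: DelegationEstimate) -> float:
    """Net value of delegating minus net value of keeping the task with the generalist."""
    delegated = (e.specialist_value - e.specialist_cost
                 - e.coordination_cost - e.risk_premium)
    in_house = e.generalist_value - e.generalist_cost
    return delegated - in_house


def should_delegate(e: DelegationEstimate) -> bool:
    """Delegate only when the specialist path is strictly better."""
    return net_gain_from_delegation(e) > 0


# Made-up numbers, chosen only to exercise the function.
estimate = DelegationEstimate(
    specialist_value=1.00, specialist_cost=0.25, coordination_cost=0.125,
    generalist_value=0.75, generalist_cost=0.25,
)
print(should_delegate(estimate))
```

Keeping the net gain as a separate function makes it easy to log how close each decision was to the threshold, which is useful when the estimates themselves are uncertain.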
Fixed vs. Variable Costs in Agent Procurement
Subscription-based capabilities introduce fixed costs that change the economics at scale:
- At low query volume, per-unit pricing dominates (no fixed cost, pay only for what you use)
- At high query volume, subscriptions become cheaper per query — the break-even volume is the crossover point
- Orchestrators should calculate break-even volume before committing to subscriptions
Example logic (illustrative):
- Per-query price: $0.05/query
- Monthly subscription: $200/month
- Break-even: $200 / $0.05 = 4,000 queries/month
- If your fleet runs 10,000 queries/month, per-query pricing costs $500/month, so the subscription saves $300/month
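The break-even arithmetic above can be written as two small helpers. This is a minimal sketch using the illustrative figures from the example; the function names are hypothetical, and it assumes a flat subscription fee with no usage cap.

```python
def break_even_queries(subscription_per_month: float, price_per_query: float) -> float:
    """Monthly query volume at which the subscription and per-query costs are equal."""
    if price_per_query <= 0:
        raise ValueError("price_per_query must be positive")
    return subscription_per_month / price_per_query


def monthly_saving(queries: int, subscription_per_month: float,
                   price_per_query: float) -> float:
    """Positive when the subscription is cheaper than paying per query."""
    return queries * price_per_query - subscription_per_month


# Figures from the illustrative example: $0.05/query vs. $200/month.
print(break_even_queries(200.0, 0.05))
print(monthly_saving(10_000, 200.0, 0.05))
print(monthly_saving(1_000, 200.0, 0.05))  # negative: below break-even, per-query wins
```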
The Make-vs-Buy Decision
Agents (or their operators) face the same make-vs-buy decision as firms:
- Build in-house: Higher upfront cost, full control, no vendor dependency
- Buy via API: Lower upfront cost, faster deployment, vendor lock-in risk, ongoing variable cost
- Hybrid: Use external APIs for burst capacity, internal capability for baseline load
The hybrid model is increasingly common in production agent fleets handling variable workloads.
4. Real-World Patterns: What Services Do Agent Fleets Actually Buy?
The Four Primary Capability Categories
Empirical observation of production agent deployments reveals four dominant categories of purchased capability:
1. Inference
The largest cost category for most agent fleets. Agents purchase language model inference to generate text, reason over context, and produce structured outputs. Key sub-categories:
- Frontier reasoning models: high cost, high capability, used for complex multi-step tasks
- Fast/cheap models: lower cost, used for classification, routing, and simple generation
- Specialised models: domain-specific fine-tunes (code, legal, medical)
Inference costs scale directly with token volume. Orchestrators that fail to manage context window size pay disproportionately high inference costs.
2. Search
Real-time web search and retrieval APIs are the second most common purchased capability. Agents buy search to ground responses in current information, verify claims, and retrieve documents not in their training data.
Key economic feature: search is typically priced per query, making it easy to budget but creating incentives to batch or cache results where possible.
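Because each query is billed, a session-level cache is one simple way to avoid paying twice for the same question. The sketch below is illustrative only: `SessionSearchCache` and `fake_search` are hypothetical names, `fake_search` stands in for a real billed search API, and normalising queries by case and whitespace is an assumption about what counts as "the same" query.

```python
from typing import Callable, Dict, List


class SessionSearchCache:
    """Caches results for one session so repeated queries are not billed twice."""

    def __init__(self, search_fn: Callable[[str], List[str]]) -> None:
        self._search_fn = search_fn
        self._cache: Dict[str, List[str]] = {}
        self.paid_calls = 0  # number of calls that actually reached the billed API

    def search(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        if key not in self._cache:
            self._cache[key] = self._search_fn(query)
            self.paid_calls += 1
        return self._cache[key]


def fake_search(query: str) -> List[str]:
    """Stand-in for a real, billed search API (hypothetical)."""
    return [f"result for {query}"]


cached = SessionSearchCache(fake_search)
for q in ["Acme Corp filings", "acme corp  filings", "Acme Corp news"]:
    cached.search(q)
print(cached.paid_calls)  # the second query is served from the cache
```

A cache like this trades freshness for cost, so it suits queries whose answers are stable within a session.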
3. Research / Structured Knowledge
Structured knowledge services (academic paper databases, financial data APIs, legal research tools, patent databases) are purchased as subscriptions. These services provide:
- Pre-processed, high-reliability information
- Structured schemas that reduce agent parsing overhead
- Coverage of domains where web search quality is poor
The subscription model means these services function as infrastructure rather than variable inputs. Agents treat them like utilities.
4. Compute
Specialised compute — GPU instances for local model inference, code execution sandboxes, data processing pipelines — is purchased when tasks exceed what API-based services can provide. Compute procurement is the most complex category: it involves resource scheduling, cost optimisation across spot/on-demand pricing, and capacity planning.
Consumption Hierarchy
Research in this area suggests a consistent pattern across agent fleet types:
Inference > Search > Research/Knowledge > Compute
Inference dominates by cost. Search dominates by transaction frequency. Research services dominate by strategic value per dollar. Compute is purchased selectively for specific high-intensity tasks.
5. Market Design: Incentive Structures for Efficient Subagent Allocation
Why Market Design Matters
A capability market with poor incentive design produces predictable failures:
- Subagents over-report quality to win tasks
- Orchestrators under-invest in monitoring because it costs tokens
- Cheap but low-quality services crowd out better alternatives
- Coordination mechanisms become bottlenecks under load
Good market design aligns the incentives of all participants with the system's overall objective.
Core Design Principles
1. Transparent pricing with real-time updates. Agents making routing decisions need accurate cost information. Stale price data leads to suboptimal allocation. Well-designed systems expose current pricing via API rather than requiring agents to maintain internal price tables.
2. Quality signals that are hard to game. Reputation systems based on self-reported metrics are easily manipulated. Robust quality signals come from:
   - Outcome-based evaluation (did the downstream task succeed?)
   - Third-party benchmarks on standardised task sets
   - Statistical sampling with human or automated review
3. Separation of routing and execution. The agent that decides which subagent to use should not be the same agent that benefits from a particular subagent winning. Conflicts of interest in routing produce systematic misallocation.
4. Fallback and redundancy mechanisms. Single-source dependencies create fragility. Well-designed markets maintain fallback providers for critical capabilities, with automatic failover when primary providers degrade.
5. Cost attribution and accountability. Every capability purchase should be attributed to the task that triggered it. Without cost attribution, it is impossible to identify which tasks are economically viable and which are subsidised by the overall system budget.
Mechanism Types
| Mechanism | Properties | Failure Mode |
|---|---|---|
| Fixed-price routing | Simple, predictable | Cannot adapt to quality/load variation |
| Reputation-weighted routing | Adapts to quality signals | Slow to update; new entrants disadvantaged |
| Auction-based allocation | Efficient under variable demand | Latency; complexity; gaming risk |
| Contract-based | Stable, predictable costs | Inflexible; may overpay at low volume |
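As one concrete illustration of auction-based allocation, the sketch below selects the cheapest bid among those that clear a quality floor, which is one way to stop cheap but low-quality services from crowding out better ones. It is a simplified sketch under stated assumptions: `Bid` and `select_winner` are hypothetical names, bids are sealed and single-round, and `quality_score` is assumed to come from an independent benchmark rather than self-reporting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    provider: str
    price: float          # asking price for the task
    quality_score: float  # assumed to come from an independent benchmark


def select_winner(bids: List[Bid], min_quality: float) -> Optional[Bid]:
    """Return the cheapest bid that meets the quality floor, or None if none qualify."""
    eligible = [b for b in bids if b.quality_score >= min_quality]
    if not eligible:
        return None
    # Ties on price are broken in favour of the higher quality score.
    return min(eligible, key=lambda b: (b.price, -b.quality_score))


bids = [
    Bid("cheap-but-weak", price=0.01, quality_score=0.40),
    Bid("balanced", price=0.03, quality_score=0.85),
    Bid("premium", price=0.08, quality_score=0.95),
]
winner = select_winner(bids, min_quality=0.80)
print(winner.provider if winner else "no eligible bid")
```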
6. Case Study: Inference, Search, Research, and Compute as Tradeable Capabilities
Framing the Case
Consider an autonomous research agent tasked with producing a competitive intelligence report on a given company. The agent must:
1. Retrieve current news and filings
2. Access structured financial data
3. Synthesise findings into a coherent report
4. Verify key claims
Each step maps to a different capability category. The agent's economic decisions determine both the cost and quality of the output.
Step-by-Step Capability Procurement
Step 1: Retrieve current news (Search). The agent queries a web search API. At $0.01/query, five targeted queries cost $0.05. Caching results for reuse within the session eliminates redundant spend.
Step 2: Access financial data (Research/Structured Knowledge). The agent calls a financial data API. If the operator has a monthly subscription, this query has near-zero marginal cost. If not, per-query pricing applies, potentially $0.50–$2.00 for a structured data pull. The subscription break-even calculation matters here.
Step 3: Synthesise findings (Inference). The agent sends a large context window to a frontier reasoning model. At $0.06/1K output tokens, a 2,000-token synthesis costs $0.12 in output tokens (input-token charges for the context are additional and are not counted here). Using a cheaper model for initial drafting and a frontier model only for final synthesis can reduce cost, often with little loss in output quality.
Step 4: Verify claims (Search + Inference). Targeted verification queries (search) plus a smaller inference call for fact-checking. Combined cost: ~$0.03.
Total task cost: ~$0.20 with a subscription (excluding the subscription fee itself), or ~$0.70–$2.20 at per-query data rates
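The totals for this case can be reproduced with a few lines of arithmetic. The sketch below uses only the illustrative figures from the steps above; the constant and function names are hypothetical. Like the case itself, it ignores input-token charges and the subscription fee. `Decimal` is used so that the cents add up exactly.

```python
from decimal import Decimal

SEARCH_PRICE_PER_QUERY = Decimal("0.01")
OUTPUT_PRICE_PER_1K_TOKENS = Decimal("0.06")
VERIFICATION_COST = Decimal("0.03")  # combined search + inference for step 4


def task_cost(data_pull_cost: Decimal, search_queries: int = 5,
              output_tokens: int = 2000) -> Decimal:
    """Total marginal cost of one report, given the cost of the financial data pull."""
    search = search_queries * SEARCH_PRICE_PER_QUERY
    synthesis = Decimal(output_tokens) / Decimal(1000) * OUTPUT_PRICE_PER_1K_TOKENS
    return search + data_pull_cost + synthesis + VERIFICATION_COST


print(task_cost(Decimal("0")))     # subscription: near-zero marginal data cost
print(task_cost(Decimal("0.50")))  # per-query data pull, low end
print(task_cost(Decimal("2.00")))  # per-query data pull, high end
```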
Economic Lessons from the Case
- Subscription status is a first-order cost driver for research-heavy tasks
- Model selection at each step (not just overall) is the primary lever for inference cost management
- Caching and batching search results can reduce query costs by roughly 20–60% on repeated or similar tasks
- Task decomposition quality determines whether the right capability is purchased at each step — poor decomposition leads to expensive generalist inference where cheap specialist tools would suffice
7. Practical Implementation: Building Your First Capability Market
Minimum Viable Architecture
A functional capability market for an agent fleet requires five components:
1. Capability Registry — catalogue of available subagents/APIs with metadata
2. Pricing Oracle — real-time or near-real-time cost data per capability
3. Routing Logic — rules or learned policy for capability selection
4. Cost Tracker — per-task attribution of all capability spend
5. Quality Monitor — outcome tracking to update routing decisions over time
Step-by-Step Build Guide
Step 1: Catalogue your capabilities. List every external API and internal tool your agents currently use. For each, record: capability type, pricing model, current price, latency profile, quality tier, and failure rate.
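A catalogue entry with the fields listed in Step 1 could be modelled as follows. This is a minimal in-memory sketch, not a recommended schema: the class names, field names, and example values are all hypothetical, and a single p95 latency figure is an assumed simplification of a "latency profile".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class PricingModel(Enum):
    PER_UNIT = "per_unit"
    SUBSCRIPTION = "subscription"
    AUCTION = "auction"
    BUNDLE = "bundle"


@dataclass(frozen=True)
class Capability:
    name: str
    capability_type: str     # e.g. "inference", "search", "research", "compute"
    pricing_model: PricingModel
    current_price: float     # unit depends on pricing_model
    p95_latency_ms: float    # assumption: one number summarises the latency profile
    quality_tier: str
    failure_rate: float      # fraction of calls that fail, between 0 and 1


class CapabilityRegistry:
    def __init__(self) -> None:
        self._by_name: Dict[str, Capability] = {}

    def register(self, cap: Capability) -> None:
        if not 0.0 <= cap.failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self._by_name[cap.name] = cap

    def find(self, capability_type: str) -> List[Capability]:
        """All registered capabilities of one type, in registration order."""
        return [c for c in self._by_name.values() if c.capability_type == capability_type]


registry = CapabilityRegistry()
registry.register(Capability("web-search", "search", PricingModel.PER_UNIT,
                             0.01, 400.0, "standard", 0.01))
registry.register(Capability("frontier-model", "inference", PricingModel.PER_UNIT,
                             0.06, 2500.0, "high", 0.005))
print([c.name for c in registry.find("search")])
```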
Step 2: Implement cost attribution. Before optimising, you need visibility. Instrument every API call to record: which task triggered it, which agent made it, what it cost, and what the output was used for.
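One way to capture the four attributes named in Step 2 is a small in-memory tracker. The sketch below is illustrative: `SpendRecord` and `CostTracker` are hypothetical names, and a production system would more likely emit structured logs than hold records in a list.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import DefaultDict, Dict, List


@dataclass(frozen=True)
class SpendRecord:
    task_id: str     # which task triggered the call
    agent_id: str    # which agent made it
    capability: str  # what was purchased
    cost: Decimal    # what it cost
    purpose: str     # what the output was used for


class CostTracker:
    def __init__(self) -> None:
        self._records: List[SpendRecord] = []

    def record(self, task_id: str, agent_id: str, capability: str,
               cost: str, purpose: str) -> None:
        # Costs arrive as strings and are stored as Decimal to avoid float rounding.
        self._records.append(
            SpendRecord(task_id, agent_id, capability, Decimal(cost), purpose))

    def cost_by_task(self) -> Dict[str, Decimal]:
        totals: DefaultDict[str, Decimal] = defaultdict(Decimal)
        for r in self._records:
            totals[r.task_id] += r.cost
        return dict(totals)


tracker = CostTracker()
tracker.record("report-42", "orchestrator", "search", "0.01", "news retrieval")
tracker.record("report-42", "writer", "inference", "0.12", "final synthesis")
tracker.record("report-43", "orchestrator", "search", "0.01", "claim verification")
print(tracker.cost_by_task())
```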
Step 3: Build a routing layer. Start with rule-based routing (e.g., "use cheap model for classification, frontier model for synthesis"). Measure outcomes. Iterate toward learned routing policies as you accumulate data.
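A first rule-based router can be as small as a lookup table. In this sketch the target names `cheap-model` and `frontier-model` are placeholders rather than real model identifiers, and sending unknown task types to the most capable tier is an assumed default.

```python
from typing import Dict

# Hypothetical rule table: task type -> routing target.
ROUTING_RULES: Dict[str, str] = {
    "classification": "cheap-model",
    "routing": "cheap-model",
    "synthesis": "frontier-model",
}
DEFAULT_TARGET = "frontier-model"  # assumption: unknown task types get the strongest tier


def route(task_type: str) -> str:
    """Return the routing target for a task type, falling back to the default."""
    return ROUTING_RULES.get(task_type.strip().lower(), DEFAULT_TARGET)


for task in ["classification", "Synthesis", "contract review"]:
    print(task, "->", route(task))
```

Keeping the rules in data rather than in branching code makes it easier to replace the table later with a learned policy.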
Step 4: Add a pricing oracle. Hardcoded prices go stale. Build a lightweight service that fetches current pricing from provider APIs or a maintained price table, and expose it to your routing layer.
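A pricing oracle can start as a thin cache with a maximum age in front of whatever price source is available. The sketch below is illustrative: `PricingOracle` is a hypothetical name, `fetch_price` stands in for a provider API call or a price-table lookup, and the one-hour default maximum age is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Tuple


class PricingOracle:
    """Serves prices from a cache and refetches them once they exceed max_age_seconds."""

    def __init__(self, fetch_price: Callable[[str], float],
                 max_age_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_price = fetch_price
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so staleness can be tested without sleeping
        self._cache: Dict[str, Tuple[float, float]] = {}  # name -> (fetched_at, price)
        self.fetch_count = 0

    def price(self, capability: str) -> float:
        now = self._clock()
        cached = self._cache.get(capability)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        price = self._fetch_price(capability)
        self.fetch_count += 1
        self._cache[capability] = (now, price)
        return price


# Hypothetical maintained price table standing in for provider APIs.
PRICE_TABLE = {"search": 0.01, "frontier-model-output-1k": 0.06}
oracle = PricingOracle(lambda name: PRICE_TABLE[name])
print(oracle.price("search"), oracle.price("search"), oracle.fetch_count)
```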
Step 5: Implement quality feedback loops. Define what "success" means for each task type. Feed success/failure signals back to the routing layer. Over time, this creates a reputation system for your internal capability market.
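One simple way to feed success and failure signals back into routing is an epsilon-greedy policy over observed success rates, a basic member of the bandit family. This is a sketch under stated assumptions, not the lesson's prescribed method: `FeedbackRouter` is a hypothetical name, outcomes are binary, and the Laplace-smoothed score (which starts untried providers at 0.5) is an illustrative choice.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


class FeedbackRouter:
    def __init__(self, providers: List[str], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)
        self._epsilon = epsilon  # probability of exploring a random provider
        self._rng = rng if rng is not None else random.Random()
        self._successes: Dict[str, int] = defaultdict(int)
        self._trials: Dict[str, int] = defaultdict(int)

    def record_outcome(self, provider: str, success: bool) -> None:
        """Feed back whether the downstream task succeeded with this provider."""
        self._trials[provider] += 1
        if success:
            self._successes[provider] += 1

    def score(self, provider: str) -> float:
        # Laplace-smoothed success rate, so untried providers start at 0.5.
        return (self._successes[provider] + 1) / (self._trials[provider] + 2)

    def choose(self) -> str:
        if self._rng.random() < self._epsilon:
            return self._rng.choice(self._providers)  # explore
        return max(self._providers, key=self.score)   # exploit the best score so far


router = FeedbackRouter(["provider-a", "provider-b"], epsilon=0.0)
for _ in range(3):
    router.record_outcome("provider-a", success=True)
    router.record_outcome("provider-b", success=False)
print(router.choose(), router.score("provider-a"), router.score("provider-b"))
```

A non-zero `epsilon` keeps some traffic flowing to lower-ranked providers, which is one way to soften the new-entrant disadvantage of reputation-weighted routing.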
Step 6: Add redundancy. Identify your highest-criticality capabilities. Procure at least one fallback provider for each. Test failover regularly.
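Failover can be sketched as trying an ordered list of providers until one succeeds. Everything here is hypothetical: `call_with_failover`, `primary`, and `fallback` are illustrative names, the primary's outage is simulated, and treating any exception as a trigger for failover is a deliberately blunt assumption.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def call_with_failover(providers: Sequence[Tuple[str, Callable[[str], str]]],
                       request: str) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary(request: str) -> str:
    raise TimeoutError("primary search API timed out")  # simulated outage


def fallback(request: str) -> str:
    return f"fallback results for {request!r}"


used, result = call_with_failover(
    [("primary", primary), ("fallback", fallback)], "Acme filings")
print(used)
```

Running a test like this on a schedule, with the primary deliberately disabled, is one way to exercise failover before a real outage does.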
Technology Choices
| Component | Lightweight Option | Production Option |
|---|---|---|
| Capability Registry | YAML config file | Service mesh with discovery |
| Pricing Oracle | Scheduled scraper | Provider webhook + cache |
| Routing Logic | Rule engine | Learned policy (bandit/RL) |
| Cost Tracker | Structured logging | Observability platform |
| Quality Monitor | Manual review | Automated eval pipeline |
8. Common Pitfalls and Optimisation Strategies
Pitfall 1: Context Window Bloat
Problem: Agents pass entire conversation histories to every subagent call, inflating token counts and inference costs.
Fix: Implement context compression before delegation. Pass only the information the subagent needs for its specific task. A well-scoped subagent call should have a context window 60–80% smaller than the orchestrator's full context.
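A very simple form of context scoping is to pass a subagent only the messages tagged as relevant to its task. The sketch below is illustrative only: the `topic` tags, the sample history, and the function names are hypothetical, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
from typing import Dict, List, Set


def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def scope_context(messages: List[Dict[str, str]],
                  needed_topics: Set[str]) -> List[Dict[str, str]]:
    """Keep only the messages whose topic the subagent needs."""
    return [m for m in messages if m["topic"] in needed_topics]


history = [
    {"topic": "financials", "text": "Revenue grew in the last quarter according to the filing"},
    {"topic": "smalltalk", "text": "Thanks, that is helpful, let us continue with the next step"},
    {"topic": "news", "text": "The company announced a new product line this week"},
    {"topic": "planning", "text": "First gather filings then gather news then write the report"},
]

scoped = scope_context(history, {"financials"})
full_size = sum(approx_tokens(m["text"]) for m in history)
scoped_size = sum(approx_tokens(m["text"]) for m in scoped)
reduction = 1 - scoped_size / full_size
print(f"context reduced by {reduction:.0%}")
```

Real compression is usually more involved (summarisation, retrieval, deduplication), but measuring the reduction per call, as done here, applies to any of those approaches.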
Pitfall 2: Ignoring Subscription Break-Even
Problem: Operators pay per-query rates for high-frequency capabilities because they never calculated the break-even volume for available subscriptions.
Fix: For every capability used more than ~500 times/month, evaluate whether a subscription tier exists and calculate the break-even. Automate this check quarterly as usage patterns change.
Pitfall 3: Single-Source Dependency
Problem: The entire fleet depends on one provider for a critical capability (e.g., one search API, one inference provider). Provider outage halts all work.
Fix: Maintain at least one fallback for every capability rated "critical." Test failover monthly. Accept slightly higher average cost in exchange for resilience.
Pitfall 4: Routing Without Feedback
Problem: Routing rules are set at deployment and never updated. As provider quality and pricing shift, routing decisions become increasingly suboptimal.
Fix: Instrument every routed call with outcome data. Review routing performance monthly. Treat routing logic as a living system, not a configuration file.
Pitfall 5: Misaligned Quality Metrics
Problem: The quality metric used to evaluate subagents (e.g., response length, format compliance) does not correlate with actual task success.
Fix: Define quality metrics from the downstream task outcome, not the subagent output. A search result is high-quality if it helps the agent complete its task, not if it returns many results.
Pitfall 6: Over-Specialisation
Problem: The agent fleet is decomposed into too many narrow specialists, creating coordination overhead that exceeds the quality gains from specialisation.
Fix: Benchmark coordination cost explicitly. If routing, context serialisation, and error handling for a specialist call cost more in latency and spend than the quality gain is worth, merge the capability back into a generalist agent.
Optimisation Strategies Summary
| Strategy | Expected Impact | Implementation Effort |
|---|---|---|
| Context compression before delegation | 20–40% inference cost reduction | Medium |
| Subscription vs. per-query optimisation | 10–50% cost reduction on high-volume services | Low |
| Caching repeated search queries | 20–60% search cost reduction | Low |
| Tiered model selection by task complexity | 15–35% inference cost reduction | Medium |
| Learned routing policies | 10–25% quality improvement over time | High |
| Redundancy and failover | Resilience, not cost savings | Medium |
Key Takeaways
- Multi-agent systems are economic systems. Every delegation decision is a make-vs-buy decision with measurable costs and benefits.
- The four primary tradeable capabilities are inference, search, structured research, and compute. Inference dominates by cost; search by frequency; research by strategic value per dollar.
- Subscription vs. per-query pricing is a first-order economic decision. Calculate break-even volumes before committing to either model.
- Good market design requires transparent pricing, hard-to-game quality signals, and cost attribution. Without these, capability markets produce systematic misallocation.
- Context window management is often the highest-leverage inference cost optimisation for agent fleets. Compress context before every delegation call.
- Routing logic must be treated as a living system. Static routing rules degrade as provider quality and pricing evolve.
- Redundancy is not optional for production systems. Single-source dependencies on critical capabilities create unacceptable fragility.
- Over-specialisation is a real failure mode. Coordination costs must be measured and weighed against quality gains before decomposing tasks into specialist subagents.
Further Reading
To deepen your understanding of the concepts in this lesson, explore the following areas:
- Mechanism design and auction theory — foundational economic theory for designing incentive-compatible markets
- Multi-armed bandit algorithms — the statistical framework underlying learned routing policies
- Microservices architecture patterns — engineering patterns for service decomposition that parallel capability market design
- Cloud cost optimisation — practical techniques for compute procurement that transfer directly to agent fleet management
- Principal-agent theory — economic framework for understanding delegation, incentive alignment, and monitoring costs
This lesson is part of Empirica's curriculum on agent economics and autonomous system design.