Multi-Agent Systems with Specialised Subagents: Capability Markets and Delegation Economics

A course lesson for practitioners and technical decision-makers

Executive Summary

Multi-agent systems increasingly operate not as monolithic pipelines but as economic networks: orchestrator agents delegate tasks to specialised subagents, which price and deliver discrete capabilities — inference, search, research synthesis, code execution, and compute. Understanding this architecture requires two complementary lenses. The engineering lens asks how to decompose tasks and route them efficiently. The economic lens asks when delegation creates value, what pricing mechanisms align incentives, and how to avoid the coordination failures that make markets break down. This lesson covers both.

1. Core Concepts: Specialisation vs. Generalisation in Agent Networks

The Fundamental Trade-off

A generalised agent can handle many task types but handles none of them optimally. A specialised subagent handles a narrow task class with higher accuracy, lower latency, or lower cost — but requires coordination overhead to deploy.

The decision to specialise is not purely technical. It mirrors the economic logic of the division of labour: specialisation raises output per unit of effort, but only if the gains exceed the transaction costs of coordination.

Key variables that favour specialisation:
- Task volume is high enough to amortise the fixed cost of building or procuring a specialist
- The task is well-defined with stable inputs and outputs (low ambiguity)
- Quality differences between specialist and generalist are measurable and material
- The orchestrator can reliably route tasks to the correct specialist

Key variables that favour generalisation:
- Tasks are novel, ambiguous, or cross-domain
- Coordination latency would dominate task latency
- The agent fleet is small and overhead costs are significant

Architectural Patterns

Pattern	Description	Best For
Hub-and-spoke	One orchestrator, many specialists	Well-defined task taxonomies
Peer delegation	Agents delegate laterally to each other	Dynamic, emergent workflows
Hierarchical	Orchestrators delegate to sub-orchestrators	Complex, multi-stage pipelines
Marketplace	Agents bid for tasks or post capabilities	Variable load, heterogeneous fleet

The marketplace pattern is the most economically interesting and the focus of the sections that follow.

2. Capability Markets: How Subagents Price and Trade Services

What Is a Capability Market?

A capability market is a coordination mechanism in which:
1. Sellers (subagents or external APIs) advertise discrete capabilities with associated costs
2. Buyers (orchestrators or peer agents) select and purchase capabilities to complete tasks
3. Prices (token costs, latency, quality scores) signal scarcity and quality

This is not a metaphor. Modern agent fleets already operate this way: an orchestrator calls a search API at $0.01 per query, a reasoning model at $0.06 per 1K output tokens, and a specialised research tool at a monthly subscription rate. Each call is a market transaction.

Pricing Mechanisms in Practice

Per-unit pricing — the dominant model for inference and search APIs. Costs are proportional to consumption. Predictable for budgeting; creates incentive to minimise token use.

Subscription pricing — common for structured knowledge services (academic databases, financial data feeds, legal research tools). Fixed cost regardless of query volume above a threshold. Economically rational when query frequency is high relative to the per-query alternative.

Auction/bidding — used in compute markets (cloud spot instances, GPU brokers). Subagents or their orchestrators bid for resources; price clears at market. Efficient under variable demand but introduces latency and complexity.

Flat-rate bundling — multiple capabilities sold as a package. Reduces transaction overhead but may force agents to pay for capabilities they do not use.
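
To make the auction mechanism concrete, here is a minimal sketch of how a price might clear when several agents bid for a limited number of identical compute slots. It assumes a uniform-price rule in which every winner pays the lowest accepted bid; real spot markets use their own clearing rules, and the names `Bid` and `clear_uniform_price` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    price: float  # maximum the bidder will pay per slot, in dollars


def clear_uniform_price(bids: list[Bid], units: int) -> tuple[list[str], Optional[float]]:
    """Give `units` identical slots to the highest bidders.

    Assumption: all winners pay the lowest accepted bid.
    """
    if units <= 0 or not bids:
        return [], None
    ranked = sorted(bids, key=lambda b: b.price, reverse=True)
    winners = ranked[:units]
    clearing_price = winners[-1].price
    return [b.bidder for b in winners], clearing_price


bids = [Bid("agent-a", 0.40), Bid("agent-b", 0.25), Bid("agent-c", 0.55)]
winners, price = clear_uniform_price(bids, units=2)
print(winners, price)
```

Even this toy version shows where the latency comes from: nothing can be allocated until all bids for the round have been collected and ranked.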

Price Signals and Information Problems

Markets work when prices carry information. In capability markets, three information problems are common:

Quality uncertainty: The orchestrator cannot observe output quality before purchase. Mitigated by reputation systems, benchmarks, or trial sampling.
Latency opacity: Advertised latency may not reflect real-world performance under load. Mitigated by monitoring and SLA enforcement.
Cost drift: Per-token prices change as providers update models. Agents with hardcoded cost assumptions make suboptimal routing decisions over time.
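
One way to guard against cost drift is to attach a timestamp to every price an agent holds and refuse to route on prices that are too old. The sketch below is illustrative only: `PriceQuote`, `is_stale`, and the one-hour threshold are assumptions, not part of any provider's API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceQuote:
    capability: str
    unit_price_usd: float
    fetched_at: float  # Unix timestamp when the price was last confirmed


def is_stale(quote: PriceQuote, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True when the quote is older than `max_age_s` seconds."""
    now = time.time() if now is None else now
    return (now - quote.fetched_at) > max_age_s


quote = PriceQuote("search", 0.01, fetched_at=1_000.0)
# Two hours after the fetch, with a one-hour freshness limit (hypothetical values).
print(is_stale(quote, max_age_s=3600, now=1_000.0 + 7200))
```

A router that sees a stale quote can refresh it before deciding, rather than silently optimising against an outdated price.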

3. Delegation Economics: Cost-Benefit Analysis of Outsourcing Tasks

The Delegation Decision Framework

Delegation is worth it when:

Value(specialist output) - Cost(specialist) - Cost(coordination) > Value(generalist output) - Cost(generalist)

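In practice, this means estimating four quantities. The first three correspond to terms in the inequality above; the risk premium is an extra cost on the delegation side that the inequality leaves implicit: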

Quantity	How to Measure
Quality delta	Benchmark specialist vs. generalist on representative task sample
Cost delta	Direct API cost comparison at expected query volume
Coordination cost	Latency added by routing, context serialisation, error handling
Risk premium	Cost of failure modes unique to delegation (hallucinated tool calls, API downtime)
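
The inequality can be turned into a small decision function. This is a sketch under stated assumptions: all quantities are expressed in dollars, the risk premium from the table is subtracted alongside the coordination cost, and the `Option` class and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    value: float                     # estimated value of the output, in dollars
    direct_cost: float               # API or compute spend
    coordination_cost: float = 0.0   # routing, serialisation, error handling
    risk_premium: float = 0.0        # expected cost of delegation-specific failures

    def net_value(self) -> float:
        return self.value - self.direct_cost - self.coordination_cost - self.risk_premium


def should_delegate(specialist: Option, generalist: Option) -> bool:
    """Delegate only when the specialist's net value exceeds the generalist's."""
    return specialist.net_value() > generalist.net_value()


specialist = Option(value=1.00, direct_cost=0.20, coordination_cost=0.05, risk_premium=0.05)
generalist = Option(value=0.80, direct_cost=0.15)
print(should_delegate(specialist, generalist))
```

The hard part in practice is not the comparison but putting a credible dollar figure on `value`, which is why the quality delta has to be benchmarked on representative tasks.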

Fixed vs. Variable Costs in Agent Procurement

Subscription-based capabilities introduce fixed costs that change the economics at scale:

At low query volume, per-unit pricing dominates (no fixed cost, pay only for what you use)
At high query volume, subscriptions become cheaper per query — the break-even volume is the crossover point
Orchestrators should calculate break-even volume before committing to subscriptions

Example logic (illustrative):
- Per-query price: $0.05/query
- Monthly subscription: $200/month
- Break-even: $200 / $0.05 = 4,000 queries/month
- If your fleet runs 10,000 queries/month, per-query pricing would cost $500, so the subscription saves $300/month
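
The same arithmetic as a short sketch. Prices are held as integer cents to avoid floating-point rounding at the break-even boundary; the function names are hypothetical.

```python
def break_even_queries(subscription_cents: int, per_query_cents: int) -> int:
    """Smallest monthly query count at which the subscription costs no more than per-query pricing."""
    return -(-subscription_cents // per_query_cents)  # ceiling division


def monthly_saving_cents(queries: int, subscription_cents: int, per_query_cents: int) -> int:
    """Positive when the subscription is cheaper than paying per query."""
    return queries * per_query_cents - subscription_cents


# Figures from the example above: $0.05 per query, $200 per month.
print(break_even_queries(20_000, 5))              # 4000
print(monthly_saving_cents(10_000, 20_000, 5))    # 30000 cents, i.e. $300
```

A negative result from `monthly_saving_cents` means the fleet is below break-even and should stay on per-query pricing.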

The Make-vs-Buy Decision

Agents (or their operators) face the same make-vs-buy decision as firms:

Build in-house: Higher upfront cost, full control, no vendor dependency
Buy via API: Lower upfront cost, faster deployment, vendor lock-in risk, ongoing variable cost
Hybrid: Use external APIs for burst capacity, internal capability for baseline load

The hybrid model is increasingly common in production agent fleets handling variable workloads.

4. Real-World Patterns: What Services Do Agent Fleets Actually Buy?

The Four Primary Capability Categories

Empirical observation of production agent deployments reveals four dominant categories of purchased capability:

1. Inference

The largest cost category for most agent fleets. Agents purchase language model inference to generate text, reason over context, and produce structured outputs. Key sub-categories:
- Frontier reasoning models: high cost, high capability, used for complex multi-step tasks
- Fast/cheap models: lower cost, used for classification, routing, and simple generation
- Specialised models: domain-specific fine-tunes (code, legal, medical)

Inference costs scale directly with token volume. Orchestrators that fail to manage context window size pay disproportionately high inference costs.
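
A rough cost estimator shows why context size matters. The output price of $0.06 per 1K tokens is the figure used elsewhere in this lesson; the input price of $0.01 per 1K tokens and the token counts are hypothetical, chosen only to illustrate the effect.

```python
def inference_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Cost of one call when input and output tokens are priced separately."""
    return (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k


# Same 2,000-token answer, with and without trimming the context (hypothetical sizes).
full = inference_cost_usd(20_000, 2_000, input_price_per_1k=0.01, output_price_per_1k=0.06)
trimmed = inference_cost_usd(5_000, 2_000, input_price_per_1k=0.01, output_price_per_1k=0.06)
print(f"full context: ${full:.2f}, trimmed context: ${trimmed:.2f}")
```

Under these assumed prices the output cost is identical in both calls; the whole difference comes from the input tokens the orchestrator chose to send.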

2. Search

Real-time web search and retrieval APIs are the second most common purchased capability. Agents buy search to ground responses in current information, verify claims, and retrieve documents not in their training data.

Key economic feature: search is typically priced per query, making it easy to budget but creating incentives to batch or cache results where possible.
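
A minimal caching wrapper illustrates the incentive. It is a sketch, not a production cache: `CachedSearch` and `fake_search` are hypothetical, entries never expire, and queries are matched only after normalising case and whitespace.

```python
from typing import Callable


class CachedSearch:
    """Wrap a per-query-priced search function and pay only for cache misses."""

    def __init__(self, search_fn: Callable[[str], list], price_per_query_usd: float):
        self._search_fn = search_fn
        self._price = price_per_query_usd
        self._cache: dict = {}
        self.paid_queries = 0

    def search(self, query: str) -> list:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        if key not in self._cache:
            self._cache[key] = self._search_fn(query)
            self.paid_queries += 1
        return self._cache[key]

    @property
    def spend_usd(self) -> float:
        return self.paid_queries * self._price


def fake_search(query: str) -> list:
    # Stand-in for a real search API call.
    return [f"result for {query}"]


client = CachedSearch(fake_search, price_per_query_usd=0.01)
client.search("Acme Corp earnings")
client.search("acme corp  earnings")  # same query after normalisation: served from cache
client.search("Acme Corp lawsuits")
print(client.paid_queries, client.spend_usd)
```

For search that must reflect current information, a real cache would also need an expiry time so that stale results are not reused indefinitely.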

3. Research / Structured Knowledge

Structured knowledge services (academic paper databases, financial data APIs, legal research tools, patent databases) are purchased as subscriptions. These services provide:
- Pre-processed, high-reliability information
- Structured schemas that reduce agent parsing overhead
- Coverage of domains where web search quality is poor

The subscription model means these services function as infrastructure rather than variable inputs. Agents treat them like utilities.

4. Compute

Specialised compute — GPU instances for local model inference, code execution sandboxes, data processing pipelines — is purchased when tasks exceed what API-based services can provide. Compute procurement is the most complex category: it involves resource scheduling, cost optimisation across spot/on-demand pricing, and capacity planning.

Consumption Hierarchy

Research in this area suggests a consistent pattern across agent fleet types:

Inference > Search > Research/Knowledge > Compute

Inference dominates by cost. Search dominates by transaction frequency. Research services dominate by strategic value per dollar. Compute is purchased selectively for specific high-intensity tasks.

5. Market Design: Incentive Structures for Efficient Subagent Allocation

Why Market Design Matters

A capability market with poor incentive design produces predictable failures:
- Subagents over-report quality to win tasks
- Orchestrators under-invest in monitoring because it costs tokens
- Cheap but low-quality services crowd out better alternatives
- Coordination mechanisms become bottlenecks under load

Good market design aligns the incentives of all participants with the system's overall objective.

Core Design Principles

1. Transparent pricing with real-time updates
Agents making routing decisions need accurate cost information. Stale price data leads to suboptimal allocation. Well-designed systems expose current pricing via API rather than requiring agents to maintain internal price tables.

2. Quality signals that are hard to game
Reputation systems based on self-reported metrics are easily manipulated. Robust quality signals come from:
- Outcome-based evaluation (did the downstream task succeed?)
- Third-party benchmarks on standardised task sets
- Statistical sampling with human or automated review

3. Separation of routing and execution
The agent that decides which subagent to use should not be the same agent that benefits from a particular subagent winning. Conflicts of interest in routing produce systematic misallocation.

4. Fallback and redundancy mechanisms
Single-source dependencies create fragility. Well-designed markets maintain fallback providers for critical capabilities, with automatic failover when primary providers degrade.

5. Cost attribution and accountability
Every capability purchase should be attributed to the task that triggered it. Without cost attribution, it is impossible to identify which tasks are economically viable and which are subsidised by the overall system budget.

Mechanism Types

Mechanism	Properties	Failure Mode
Fixed-price routing	Simple, predictable	Cannot adapt to quality/load variation
Reputation-weighted routing	Adapts to quality signals	Slow to update; new entrants disadvantaged
Auction-based allocation	Efficient under variable demand	Latency; complexity; gaming risk
Contract-based	Stable, predictable costs	Inflexible; may overpay at low volume
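
Reputation-weighted routing can be sketched with a smoothed success rate. The smoothing is an assumption introduced here to soften the "new entrants disadvantaged" failure mode: a provider with no history scores a neutral 0.5 rather than zero. The `Reputation` class and the provider names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    successes: int = 0
    failures: int = 0

    def score(self, prior_successes: float = 1.0, prior_failures: float = 1.0) -> float:
        """Smoothed success rate; the prior gives unseen providers a neutral 0.5."""
        total = self.successes + self.failures + prior_successes + prior_failures
        return (self.successes + prior_successes) / total


def pick_provider(reputations: dict[str, Reputation]) -> str:
    """Route to the provider with the highest smoothed score."""
    return max(reputations, key=lambda name: reputations[name].score())


reputations = {
    "established": Reputation(successes=80, failures=20),
    "newcomer": Reputation(),
    "shaky": Reputation(successes=3, failures=7),
}
print(pick_provider(reputations))
```

With these hypothetical counts the newcomer still ranks above the provider with a poor record, so it is not shut out purely for lack of history.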

6. Case Study: Inference, Search, Research, and Compute as Tradeable Capabilities

Framing the Case

Consider an autonomous research agent tasked with producing a competitive intelligence report on a given company. The agent must:
1. Retrieve current news and filings
2. Access structured financial data
3. Synthesise findings into a coherent report
4. Verify key claims

Each step maps to a different capability category. The agent's economic decisions determine both the cost and quality of the output.

Step-by-Step Capability Procurement

Step 1: Retrieve current news (Search)
The agent queries a web search API. At $0.01/query, five targeted queries cost $0.05. Caching results for reuse within the session eliminates redundant spend.

Step 2: Access financial data (Research/Structured Knowledge)
The agent calls a financial data API. If the operator has a monthly subscription, this query has near-zero marginal cost. If not, per-query pricing applies, potentially $0.50–$2.00 for a structured data pull. The subscription break-even calculation matters here.

Step 3: Synthesise findings (Inference)
The agent sends a large context to a frontier reasoning model. At $0.06/1K output tokens, a 2,000-token synthesis costs $0.12 in output tokens. Using a cheaper model for initial drafting and a frontier model only for the final synthesis can reduce cost with little loss of output quality.

Step 4: Verify claims (Search + Inference)
Targeted verification queries (search) plus a smaller inference call for fact-checking. Combined cost: ~$0.03.

Total task cost: about $0.20 with a subscription ($0.05 + $0.12 + $0.03), or about $0.70–$2.20 with per-query data pricing
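
The case-study total can be checked with a few lines of arithmetic. This sketch uses only the illustrative prices from the four steps and treats a subscribed data pull as zero marginal cost, which is a simplifying assumption; the function name is hypothetical.

```python
SEARCH_PRICE_USD = 0.01          # per query (Step 1)
OUTPUT_PRICE_PER_1K_USD = 0.06   # per 1K output tokens (Step 3)
VERIFICATION_USD = 0.03          # combined estimate (Step 4)


def task_cost_usd(has_subscription: bool, data_pull_usd: float = 0.0) -> float:
    """Sum the four steps of the competitive intelligence task."""
    search = 5 * SEARCH_PRICE_USD
    # Assumption: with a subscription, the data pull adds no marginal cost.
    data = 0.0 if has_subscription else data_pull_usd
    synthesis = (2000 / 1000) * OUTPUT_PRICE_PER_1K_USD
    return search + data + synthesis + VERIFICATION_USD


for label, cost in [
    ("subscription", task_cost_usd(True)),
    ("per-query, low", task_cost_usd(False, 0.50)),
    ("per-query, high", task_cost_usd(False, 2.00)),
]:
    print(f"{label}: ${cost:.2f}")
```

The data pull is the only term that changes between scenarios, which is why subscription status dominates the cost range.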

Economic Lessons from the Case

Subscription status is a first-order cost driver for research-heavy tasks
Model selection at each step (not just overall) is the primary lever for inference cost management
Caching and batching search results can reduce query costs by 30–60% on repeated or similar tasks
Task decomposition quality determines whether the right capability is purchased at each step — poor decomposition leads to expensive generalist inference where cheap specialist tools would suffice

7. Practical Implementation: Building Your First Capability Market

Minimum Viable Architecture

A functional capability market for an agent fleet requires five components:

1. Capability Registry     — catalogue of available subagents/APIs with metadata
2. Pricing Oracle          — real-time or near-real-time cost data per capability
3. Routing Logic           — rules or learned policy for capability selection
4. Cost Tracker            — per-task attribution of all capability spend
5. Quality Monitor         — outcome tracking to update routing decisions over time
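
The first and third components can be sketched together: a registry as a list of records, and routing logic that picks the cheapest capability meeting a quality floor. Every name, price, and quality tier below is hypothetical, and the rule shown is only one possible routing policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    name: str
    kind: str              # e.g. "inference", "search", "research", "compute"
    unit_price_usd: float
    quality_tier: int      # higher is better; the scale is hypothetical


REGISTRY = [
    Capability("fast-model", "inference", 0.002, quality_tier=1),
    Capability("frontier-model", "inference", 0.06, quality_tier=3),
    Capability("web-search", "search", 0.01, quality_tier=2),
]


def route(kind: str, min_quality: int, registry: list = REGISTRY) -> Optional[Capability]:
    """Return the cheapest capability of `kind` that meets the quality floor, if any."""
    candidates = [c for c in registry if c.kind == kind and c.quality_tier >= min_quality]
    return min(candidates, key=lambda c: c.unit_price_usd, default=None)


print(route("inference", min_quality=1))
print(route("inference", min_quality=2))
print(route("compute", min_quality=1))
```

Returning `None` when nothing qualifies forces the caller to handle the missing-capability case explicitly instead of silently falling back to an unsuitable provider.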

Step-by-Step Build Guide

Step 1: Catalogue your capabilities
List every external API and internal tool your agents currently use. For each, record: capability type, pricing model, current price, latency profile, quality tier, and failure rate.

Step 2: Implement cost attribution
Before optimising, you need visibility. Instrument every API call to record: which task triggered it, which agent made it, what it cost, and what the output was used for.

Step 3: Build a routing layer
Start with rule-based routing (e.g., "use cheap model for classification, frontier model for synthesis"). Measure outcomes. Iterate toward learned routing policies as you accumulate data.

Step 4: Add a pricing oracle
Hardcoded prices go stale. Build a lightweight service that fetches current pricing from provider APIs or a maintained price table, and expose it to your routing layer.

Step 5: Implement quality feedback loops
Define what "success" means for each task type. Feed success/failure signals back to the routing layer. Over time, this creates a reputation system for your internal capability market.

Step 6: Add redundancy
Identify your highest-criticality capabilities. Procure at least one fallback provider for each. Test failover regularly.
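
Cost attribution (Step 2) is the piece most worth prototyping early. The sketch below keeps charges in memory and aggregates them per task; in practice the same records would go to structured logs or an observability platform. The `Charge` and `CostTracker` names and the example figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Charge:
    task_id: str
    agent_id: str
    capability: str
    cost_usd: float


class CostTracker:
    """Record every capability purchase against the task that triggered it."""

    def __init__(self) -> None:
        self._charges: list = []

    def record(self, charge: Charge) -> None:
        self._charges.append(charge)

    def cost_by_task(self) -> dict:
        totals = defaultdict(float)
        for charge in self._charges:
            totals[charge.task_id] += charge.cost_usd
        return dict(totals)


tracker = CostTracker()
tracker.record(Charge("report-1", "orchestrator", "web-search", 0.05))
tracker.record(Charge("report-1", "orchestrator", "frontier-model", 0.12))
tracker.record(Charge("report-2", "orchestrator", "web-search", 0.01))
print(tracker.cost_by_task())
```

Grouping by `capability` or `agent_id` instead of `task_id` gives the other views needed to find where spend concentrates.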

Technology Choices

Component	Lightweight Option	Production Option
Capability Registry	YAML config file	Service mesh with discovery
Pricing Oracle	Scheduled scraper	Provider webhook + cache
Routing Logic	Rule engine	Learned policy (bandit/RL)
Cost Tracker	Structured logging	Observability platform
Quality Monitor	Manual review	Automated eval pipeline

8. Common Pitfalls and Optimisation Strategies

Pitfall 1: Context Window Bloat

Problem: Agents pass entire conversation histories to every subagent call, inflating token counts and inference costs.

Fix: Implement context compression before delegation. Pass only the information the subagent needs for its specific task. As a rule of thumb, a well-scoped subagent call should carry a context 60–80% smaller than the orchestrator's full context.
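
One simple form of context scoping is to pass the task plus only as much recent history as fits a token budget. This is a sketch of that one strategy, not of compression in general (summarisation and field selection are alternatives). Token counts are approximated by whitespace-separated words, which is a crude assumption; a real system would use the model's tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def scope_context(task: str, history: list[str], budget: int) -> list[str]:
    """Keep the task plus the most recent history items that fit within `budget` tokens."""
    remaining = budget - approx_tokens(task)
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = approx_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return [task] + kept[::-1]  # restore chronological order


history = ["a b c d", "e f g", "h i"]
print(scope_context("summarise filings", history, budget=8))
```

The task itself is always kept, even if it alone exceeds the budget, so the subagent never receives history without an instruction.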

Pitfall 2: Ignoring Subscription Break-Even

Problem: Operators pay per-query rates for high-frequency capabilities because they never calculated the break-even volume for available subscriptions.

Fix: For every capability used more than ~500 times/month, evaluate whether a subscription tier exists and calculate the break-even. Automate this check quarterly as usage patterns change.

Pitfall 3: Single-Source Dependency

Problem: The entire fleet depends on one provider for a critical capability (e.g., one search API, one inference provider). Provider outage halts all work.

Fix: Maintain at least one fallback for every capability rated "critical." Test failover monthly. Accept slightly higher average cost in exchange for resilience.
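
A minimal failover wrapper tries providers in priority order and moves on when one fails. The provider functions below are hypothetical stand-ins; a production version would add timeouts, retries with backoff, and narrower exception handling.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    request: str,
) -> tuple[str, str]:
    """Return (provider_name, response) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # deliberately broad in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary(_request: str) -> str:
    raise TimeoutError("primary search API timed out")


def fallback(request: str) -> str:
    return f"fallback results for {request}"


used, result = call_with_failover([("primary", primary), ("fallback", fallback)], "acme filings")
print(used, result)
```

Returning the provider name alongside the response lets the cost tracker and quality monitor record which provider actually served the call.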

Pitfall 4: Routing Without Feedback

Problem: Routing rules are set at deployment and never updated. As provider quality and pricing shift, routing decisions become increasingly suboptimal.

Fix: Instrument every routed call with outcome data. Review routing performance monthly. Treat routing logic as a living system, not a configuration file.
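
A small step beyond static rules is an epsilon-greedy router: mostly pick the provider with the best observed success rate, occasionally try another so that the estimates stay current. This is a sketch of one standard bandit heuristic, offered as an illustration rather than a recommended policy; the class name, the epsilon value, and the provider names are hypothetical.

```python
import random
from typing import Optional


class EpsilonGreedyRouter:
    def __init__(self, providers: list[str], epsilon: float = 0.1, seed: Optional[int] = None):
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        self._stats = {p: [0, 0] for p in providers}  # provider -> [successes, calls]

    def _rate(self, provider: str) -> float:
        successes, calls = self._stats[provider]
        # Optimistic default so providers with no data get tried at least once.
        return successes / calls if calls else 1.0

    def choose(self) -> str:
        if self._rng.random() < self.epsilon:
            return self._rng.choice(list(self._stats))  # explore
        return max(self._stats, key=self._rate)          # exploit

    def record(self, provider: str, success: bool) -> None:
        self._stats[provider][0] += int(success)
        self._stats[provider][1] += 1


router = EpsilonGreedyRouter(["search-a", "search-b"], epsilon=0.1, seed=0)
for _ in range(20):
    router.record("search-a", success=False)
    router.record("search-b", success=True)
print(router.choose())
```

The exploration step is what lets the router notice when a previously weak provider improves, which a fixed rule set never would.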

Pitfall 5: Misaligned Quality Metrics

Problem: The quality metric used to evaluate subagents (e.g., response length, format compliance) does not correlate with actual task success.

Fix: Define quality metrics from the downstream task outcome, not the subagent output. A search result is high-quality if it helps the agent complete its task, not if it returns many results.

Pitfall 6: Over-Specialisation

Problem: The agent fleet is decomposed into too many narrow specialists, creating coordination overhead that exceeds the quality gains from specialisation.

Fix: Benchmark coordination cost explicitly. If routing, context serialisation, and error handling for a specialist call cost more in latency and spend than the quality gain is worth, merge the capability back into a generalist agent.

Optimisation Strategies Summary

Strategy	Expected Impact	Implementation Effort
Context compression before delegation	20–40% inference cost reduction	Medium
Subscription vs. per-query optimisation	10–50% cost reduction on high-volume services	Low
Caching repeated search queries	20–60% search cost reduction	Low
Tiered model selection by task complexity	15–35% inference cost reduction	Medium
Learned routing policies	10–25% quality improvement over time	High
Redundancy and failover	Resilience, not cost savings	Medium

Key Takeaways

Multi-agent systems are economic systems. Every delegation decision is a make-vs-buy decision with measurable costs and benefits.
The four primary tradeable capabilities are inference, search, structured research, and compute. Inference dominates by cost; search by frequency; research by strategic value per dollar.
Subscription vs. per-query pricing is a first-order economic decision. Calculate break-even volumes before committing to either model.
Good market design requires transparent pricing, hard-to-game quality signals, and cost attribution. Without these, capability markets produce systematic misallocation.
Context window management is the single highest-leverage cost optimisation for most agent fleets. Compress context before every delegation call.
Routing logic must be treated as a living system. Static routing rules degrade as provider quality and pricing evolve.
Redundancy is not optional for production systems. Single-source dependencies on critical capabilities create unacceptable fragility.
Over-specialisation is a real failure mode. Coordination costs must be measured and weighed against quality gains before decomposing tasks into specialist subagents.
