Sieve-Based Enumeration and Pruning for Agent Service Discovery

1. Overview

Portfolio screening of 13F filings demands systematic enumeration of a vast combinatorial space (thousands of holdings, hundreds of managers, exponential subsets), followed by aggressive pruning via cheap-to-compute filters before any expensive valuation. This computational pattern, the sieve, maps closely onto how autonomous agents must discover services in a fragmenting agent economy. As the registry of agent-callable endpoints (MCP servers, OpenAPI catalogues, A2A directories) grows from hundreds toward the projected tens of thousands by 2027, naive enumeration becomes economically infeasible and capability matching becomes the dominant cost in agent workflows. This note formalises the sieve analogy, derives concrete discovery architectures, and identifies where Empirica's structured-note API fits as a pre-pruned, high-signal layer in that sieve.

2. Key findings

Discovery is becoming the rate-limiting step in agent workloads. Anecdotal data from MCP server registries (modelcontextprotocol.io, smithery.ai) shows more than 5,000 registered MCP servers as of mid-2025, up from under 500 in early 2025, roughly 10× growth in six months. At that scale, the cost of enumerating every candidate on each call can already exceed the value of an individual task. [SPECULATIVE] If growth continues, raw enumeration cost will exceed median task value within 12 months.
The sieve of Eratosthenes pattern — generate candidates cheaply, eliminate via cheap predicates, evaluate survivors expensively — is the canonical efficient strategy when the candidate set is large and the evaluation function is costly. This pattern is well-established in automated planning competitions [P5], where heuristic pruning (relaxed-plan heuristics, landmarks) reduces state-space search by orders of magnitude before full evaluation.
Transfer of prior screening experience compounds returns. Optimisation research on transfer learning across related problems [P1] shows that solvers accumulating problem-instance memory outperform cold-start search by 30–60% in iterations-to-solution on benchmarks. Applied to agent discovery, an agent that caches "which MCP servers actually delivered" across tasks will dominate cold-discovery agents in latency and cost.
Autonomous research agents already exhibit sieve behaviour internally. Coscientist [P7] uses cheap web/document search to enumerate candidate reagents and protocols, applies LLM-based plausibility filtering, then commits expensive lab automation only to survivors. The same three-tier architecture (enumerate → filter → execute) is the natural template for agent service discovery.
Platform-mediated discovery dominates open-web discovery economically. Research on digital platforms and infrastructures [P8] documents that two-sided markets converge on a small number of high-trust intermediaries because aggregation reduces buyer search costs more than it raises supplier fees. Agent service marketplaces (Anthropic's MCP directory, OpenAI's GPT Store, emerging A2A registries) will follow the same logic.
Pricing-page evidence on cost asymmetry. Per OpenAI pricing (https://openai.com/api/pricing/), GPT-4o input is $2.50/M tokens; Anthropic pricing (https://www.anthropic.com/pricing) lists Claude Sonnet 4.5 at $3/M input. Per-tool-call enumeration that requires reasoning about each candidate (~500 tokens of description per server) across 1,000 servers consumes about 500,000 input tokens, or roughly $1.25–$1.50 per discovery attempt, which exceeds the value of most $0.001–$0.10 micro-task completions. Sieve pruning is not an optimisation; it is a precondition for economic viability.
Structured representations enable graph-based pruning. Graph neural network methods for materials discovery [P3] demonstrate that representing candidates (molecules) as graphs and learning embeddings makes near-neighbour pruning tractable at scales (10⁶–10⁸ candidates) impossible for enumerative search. The analogous move for agent discovery: represent services as nodes in a capability graph, embed by capability vector, prune by cosine distance before any LLM-mediated evaluation.
Battery informatics offers a data-scarcity analogue [P4]. Just as battery ML suffers from sparse, heterogeneous datasets that require informed priors, agent discovery suffers from sparse calibration data on which services actually deliver against their declared capability. Active-learning sieves, which preferentially probe under-sampled service categories, are likely necessary.
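
The cost asymmetry and the enumerate, filter, execute pattern described in the findings above can be made concrete with a short Python sketch. It is illustrative only and is not the four-stage architecture of section 3.1. The `Service` record, its `tags` and `embedding` fields, the `keep` cutoff, and the `evaluate` callback are all hypothetical names introduced for this example; only the token counts and per-million-token prices come from the text.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


def enumeration_cost_usd(n_servers: int, tokens_per_server: int,
                         usd_per_million_tokens: float) -> float:
    """Input-token cost of having an LLM read every candidate description once."""
    return n_servers * tokens_per_server * usd_per_million_tokens / 1_000_000


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass(frozen=True)
class Service:
    """Hypothetical registry entry; the field names are assumptions for this sketch."""
    name: str
    tags: frozenset
    embedding: tuple


def sieve(candidates: Sequence[Service], required_tag: str,
          query_embedding: Sequence[float], keep: int,
          evaluate: Callable[[Service], bool]) -> List[Service]:
    """Three-tier sieve: cheap predicate, embedding prune, expensive evaluation."""
    # Tier 1: near-free metadata predicate over the full candidate set.
    tagged = [s for s in candidates if required_tag in s.tags]
    # Tier 2: rank by cosine similarity to the query and keep the top `keep`.
    ranked = sorted(
        tagged,
        key=lambda s: cosine_similarity(s.embedding, query_embedding),
        reverse=True,
    )
    survivors = ranked[:keep]
    # Tier 3: only survivors reach the expensive step (e.g. an LLM or live call).
    return [s for s in survivors if evaluate(s)]
```

With the figures quoted above, `enumeration_cost_usd(1000, 500, 2.50)` gives 1.25 and `enumeration_cost_usd(1000, 500, 3.00)` gives 1.5, matching the $1.25–$1.50 range. If the first two tiers cut 1,000 candidates to an assumed 20 survivors, the same function puts the LLM-evaluation cost at $0.025 to $0.03 per attempt, which is the kind of reduction that brings discovery within reach of the upper end of the micro-task value range.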

3. Agent service patterns — the discovery sieve formalised

3.1 The four-stage sieve for agent service discovery

Translating quantitative 13F screening directly to agent discovery yields a four-stage architecture:
