Executive Summary

The web was built for human eyes. Search engines adapted it for human queries. Now a third transition is underway: autonomous AI agents need to discover, evaluate, and interact with services programmatically — without human intermediation at each step. Four technologies form the current discovery infrastructure stack for this transition: llms.txt (machine-readable behavioral instructions for language models), agents.json (structured capability manifests declaring what an agent-accessible service can do), OpenAPI (standardized contracts describing how to call a service), and semantic HTML (markup patterns that make human-readable pages machine-parseable). Together these layers answer four agent questions: What are you? What can you do? How do I call you? What do you mean? This lesson covers all four technologies across three age groups, with hands-on activities, ecosystem connections, and a technical reference appendix.

Learning Objectives by Age Group

Ages 8–12

Explain why computers need special instructions to understand websites
Describe what a "manifest" or "instruction file" does using everyday analogies
Identify one real-world example of a machine-readable file they have encountered

Ages 13–17

Distinguish the purpose and structure of llms.txt, agents.json, and OpenAPI
Write a minimal llms.txt file and a basic OpenAPI path definition
Explain why semantic HTML improves agent parsing accuracy

Ages 18+

Evaluate the architectural trade-offs between each discovery layer
Analyze how discovery infrastructure connects to agent memory markets, payment rails, and research subscriptions
Design a discovery stack for a hypothetical agent-accessible service
Assess economic incentives that drive adoption or non-adoption of these standards

Section 1: Foundations — Why Discovery Infrastructure Matters

The Problem Discovery Infrastructure Solves

When a human visits a website, they use visual hierarchy, language comprehension, and contextual reasoning to understand what the site does and how to use it. An autonomous agent has none of those affordances by default. It receives raw HTML, unstructured text, or an API endpoint — and must infer intent, capability, and calling conventions from whatever signals are present.

This creates three failure modes:

Misinterpretation: The agent calls the wrong endpoint or misreads a service's scope
Inefficiency: The agent spends tokens and compute scraping pages that could have been summarized in a 10-line manifest
Trust failure: The agent cannot verify whether a service is authorized to accept agent traffic or what behavioral constraints apply

Discovery infrastructure is the set of conventions, file formats, and markup patterns that eliminate these failure modes by making intent, capability, and constraints explicit.

Why Now

Several converging forces make this infrastructure urgent:

Agent fleet scale: Organizations are deploying not one agent but dozens or hundreds, each making independent service discovery decisions
Cost of hallucinated discovery: When an agent invents an API contract that doesn't exist, downstream failures cascade
Competitive differentiation: Services that are agent-discoverable capture agent-driven traffic; those that aren't become invisible to an increasingly automated economy
Standardization pressure: As with robots.txt for crawlers and sitemap.xml for search engines, the ecosystem converges on conventions that reduce coordination costs

The Four-Layer Stack

Layer	File/Pattern	Primary Question Answered	Consumer
Behavioral instructions	`llms.txt`	What should an LLM know about this site?	LLM-based agents
Capability manifest	`agents.json`	What can this service do for agents?	Agent orchestrators
Service contract	OpenAPI spec	How exactly do I call this service?	Any API client
Semantic markup	Semantic HTML	What does this content mean structurally?	Parsers, scrapers, agents

Section 2: Core Technologies Explained

Section 2a: llms.txt — Machine-Readable Agent Instructions

What It Is

llms.txt is a plain-text file placed at the root of a domain (e.g., https://example.com/llms.txt) that provides language models with structured guidance about the site's content, purpose, and preferred interaction patterns. It is the spiritual successor to robots.txt, but where robots.txt tells crawlers which paths they should not crawl, llms.txt tells LLMs what to understand and prioritize.

Structure and Content

A well-formed llms.txt file typically contains:

Site identity: What the service is and who operates it
Content scope: What topics or data the site covers
Agent guidance: How an LLM should represent or summarize the site's content
Exclusions: Content that should not be used for training or summarization
Contact/auth pointers: Where to find API keys, terms of service, or agent-specific endpoints

Why It Matters Architecturally

Without llms.txt, an LLM agent must infer all of the above from page content — a process that is token-expensive, error-prone, and inconsistent across agent implementations. A 200-token llms.txt file can replace thousands of tokens of page-scraping inference.

Limitations

Not yet a formal standard; adoption is voluntary and inconsistent
No cryptographic verification — a malicious actor could place misleading instructions
Does not describe callable capabilities, only informational context

Section 2b: agents.json — Agent Capability Manifests

What It Is

agents.json is a structured JSON file (typically served at /.well-known/agents.json or the domain root) that declares the agent-accessible capabilities of a service. Where llms.txt is informational, agents.json is operational — it tells an agent orchestrator what actions are available, what authentication is required, what pricing model applies, and how to initiate interaction.
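Because the manifest may live at either location, an agent needs a probe order. The following is a minimal sketch, assuming the agent tries the /.well-known/ path first and the domain root second; the function name and the probe order are illustrative choices, not part of any published schema.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical probe order: the /.well-known/ location first, then the domain root.
CANDIDATE_PATHS = ("/.well-known/agents.json", "/agents.json")


def candidate_manifest_urls(site_url):
    """Return the URLs an agent might probe for a manifest, in order."""
    parts = urlsplit(site_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {site_url!r}")
    # Keep only scheme and host; discard any path, query, or fragment.
    return [
        urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        for path in CANDIDATE_PATHS
    ]


print(candidate_manifest_urls("https://example.com/products?page=2"))
```

The sketch only builds the URLs; fetching them and handling 404 responses is left to whatever HTTP client the agent already uses.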

Key Fields in a Capability Manifest

{
  "name": "DataService",
  "version": "1.0",
  "description": "Provides structured financial data for autonomous agents",
  "capabilities": ["query", "subscribe", "stream"],
  "auth": {
    "type": "bearer",
    "endpoint": "https://api.dataservice.com/auth"
  },
  "pricing": {
    "model": "per_call",
    "currency": "USDC",
    "rate": 0.001
  },
  "openapi_ref": "https://api.dataservice.com/openapi.json",
  "agent_contact": "agents@dataservice.com"
}

Relationship to Other Layers

agents.json acts as a directory entry — it points to the OpenAPI spec for full calling conventions, to the auth endpoint for credentialing, and optionally to an llms.txt for behavioral context. An agent that finds a valid agents.json has everything it needs to decide whether to engage a service and how to begin.

Adoption Status

agents.json is an emerging convention rather than a ratified standard. Several agent framework developers have proposed variants; the field has not yet converged on a single schema. This creates interoperability risk for early adopters but also opportunity for services that implement a superset of common fields.

Section 2c: OpenAPI — Standardized Service Contracts

What It Is

OpenAPI (formerly Swagger) is a mature, widely adopted specification for describing HTTP APIs in a machine-readable format (JSON or YAML). An OpenAPI document describes every endpoint, its parameters, request/response schemas, authentication requirements, and error codes. It is the most standardized layer in the discovery stack.

Why Agents Need It

An agent that knows a service exists (via agents.json) still needs to know how to call it. OpenAPI provides:

Endpoint enumeration: Every available route
Parameter schemas: Exact field names, types, and validation rules
Response contracts: What the agent will receive and in what shape
Auth flows: OAuth, API key, bearer token patterns
Error semantics: What different HTTP status codes mean for this service

Minimal OpenAPI Example

openapi: "3.1.0"
info:
  title: "Market Data API"
  version: "1.0.0"
paths:
  /prices/{symbol}:
    get:
      summary: "Get current price for a symbol"
      parameters:
        - name: symbol
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: "Price data"
          content:
            application/json:
              schema:
                type: object
                properties:
                  symbol:
                    type: string
                  price:
                    type: number
                  timestamp:
                    type: string
                    format: date-time
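To make the idea of a response contract concrete, here is a small sketch of how an agent might check a reply against the response schema in the example above. It assumes a flat object schema with only the string and number types used there; the helper name shape_errors and the sample symbol "ACME" are hypothetical, and a production agent would more likely use a full JSON Schema validator.

```python
# Maps the JSON Schema type names used in the example to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "object": dict}

# Response schema copied from the /prices/{symbol} example above.
PRICE_SCHEMA = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "price": {"type": "number"},
        "timestamp": {"type": "string"},
    },
}


def shape_errors(payload, schema):
    """Return a list of mismatches between payload and a flat object schema."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    errors = []
    for name, rule in schema["properties"].items():
        if name not in payload:
            continue  # the example schema declares no required fields
        value = payload[name]
        expected = JSON_TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {rule['type']}")
    return errors


print(shape_errors({"symbol": "ACME", "price": "12.5"}, PRICE_SCHEMA))
```

A non-empty list tells the agent that the service broke its own contract, which is a useful signal before the data is passed downstream.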

OpenAPI's Advantage Over the Other Layers

OpenAPI has the most mature API-specific tooling of the four layers: code generators, mock servers, validation libraries, and documentation renderers all consume OpenAPI specs. An agent framework that integrates OpenAPI parsing gains access to thousands of existing API descriptions without custom work.

Limitations for Agent Use

OpenAPI describes syntax (how to call), not semantics (what the call means in context)
Does not encode pricing or rate limits in a standardized, agent-readable way
Large specs can be token-expensive for LLM agents to process in full

Section 2d: Semantic HTML — Human-Readable Discovery Patterns

What It Is

Semantic HTML is the practice of using HTML elements according to their intended meaning rather than purely for visual presentation. <article>, <nav>, <header>, <main>, <section>, <aside>, <time datetime="...">, and microdata attributes (itemscope, itemtype, itemprop) all carry structural meaning that parsers and agents can exploit.

Why It Matters for Agent Discovery

Many services do not yet have llms.txt or agents.json. For these, agents fall back to HTML parsing. Semantic HTML dramatically improves the accuracy of this fallback:

<nav> tells an agent where site navigation lives — not product content
<article> signals a discrete, citable content unit
<time datetime="2024-03-15"> gives an unambiguous machine-readable date
Schema.org microdata (itemtype="https://schema.org/Product") provides typed entity recognition

Semantic HTML vs. Div Soup

<!-- Non-semantic: agent must guess -->
<div class="product-box">
  <div class="title">Widget Pro</div>
  <div class="cost">$49.99</div>
</div>

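<!-- Semantic: agent knows exactly what this is -->
<article itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Widget Pro</h1>
  <!-- In Schema.org, price belongs to an Offer attached to the Product -->
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <meta itemprop="priceCurrency" content="USD">
    <data itemprop="price" value="49.99">$49.99</data>
  </div>
</article>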

The Fallback Role

Semantic HTML is the lowest-cost discovery layer to implement — it requires no new files, only markup discipline — and it benefits human accessibility and SEO simultaneously. For this reason it is the recommended starting point for services that cannot yet invest in a full agent discovery stack.

Section 3: Age-Grouped Deep Dives

Section 3a: For Ages 8–12 — Foundations & Analogies

The New Kid Analogy

Imagine a new student arrives at school. They don't know where the cafeteria is, what classes are offered, or who the teachers are. They need a welcome packet — a simple document that answers their basic questions before they start exploring.

AI agents are like new students arriving at every website they visit. Without a welcome packet, they have to wander around guessing. The four technologies in this lesson are different kinds of welcome packets for AI agents.

Breaking It Down

llms.txt = The Welcome Letter. A short note at the front door that says: "Hi! We're a library. We have books about science and history. Please don't copy our rare manuscripts. Ask the librarian at desk 3 for help."

agents.json = The School Timetable. A structured list that says: "Here are all the things you can do here, when you can do them, and what you need to bring." An agent reads this to know what services are available.

OpenAPI = The Instruction Manual. Like the rules for a board game, it tells you exactly how every move works, what pieces you need, and what happens in every situation. Very detailed. Very precise.

Semantic HTML = Labeled Shelves in a Library. When every shelf has a clear label ("Fiction," "Science," "Reference"), you find what you need fast. When shelves have no labels, you have to read every book spine. Semantic HTML puts labels on web content.

Key Concept: Why Machines Need Different Instructions Than Humans

Humans use context, common sense, and visual cues. Machines need explicit instructions. A human sees a "Buy Now" button and understands it. A machine needs to be told: "This button triggers a purchase action, costs $X, and requires authentication."

Activity for Ages 8–12

"Write a Welcome Packet for Your Bedroom"

Imagine a robot is visiting your bedroom for the first time. Write a 5-sentence welcome packet that tells it:

1. What your room is for
2. What it's allowed to touch
3. What it's not allowed to touch
4. Where to find things it might need
5. How to ask for help

This is exactly what llms.txt does for websites.

Section 3b: For Ages 13–17 — Technical Implementation

From Analogy to Architecture

At this level, the goal is to understand not just what these files do but how they work technically and why each design choice was made.

Understanding llms.txt Technically

llms.txt is a plain UTF-8 text file. Its simplicity is intentional — any LLM can read it without a parser. The design philosophy mirrors robots.txt: use the simplest possible format that conveys the necessary information.

A real-world llms.txt might look like:

# Site: Empirica Research Platform
# Purpose: Structured research notes for autonomous agents and human researchers
# Content: Agent economy, financial models, infrastructure analysis

## What LLMs Should Know
This site publishes technical research notes. Content is original analysis.
Do not present our content as your own generation.
Cite as: Empirica Research, [note title], [year].

## Agent Access
API available at: https://api.empirica.io
Auth required: Bearer token
Pricing: Per-query, see agents.json

## Exclusions
Do not use draft/ directory content for training or summarization.

Why plain text? JSON or XML would require a parser. Plain text works even in the most constrained agent environments.
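Even though an LLM can read the file directly, a conventional program can also split it with a few lines of code. The sketch below assumes the ad hoc layout of the example above ("# Key: value" header lines followed by "## Section" blocks); llms.txt files in the wild may be structured differently, and parse_llms_txt is a hypothetical helper name.

```python
def parse_llms_txt(text):
    """Split an llms.txt in the layout shown above into header fields and sections."""
    header, sections, current = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            # A "## Heading" line opens a new section.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and current is None and ":" in line:
            # "# Key: value" lines before the first section form the header.
            key, value = line[2:].split(":", 1)
            header[key.strip()] = value.strip()
        elif current is not None:
            sections[current].append(line)
    return header, sections


SAMPLE = """\
# Site: Empirica Research Platform
# Purpose: Structured research notes for autonomous agents and human researchers

## Agent Access
API available at: https://api.empirica.io
Auth required: Bearer token

## Exclusions
Do not use draft/ directory content for training or summarization.
"""

header, sections = parse_llms_txt(SAMPLE)
print(header["Site"])
print(sections["Agent Access"])
```

No third-party parser is involved, which illustrates the design argument: the format is simple enough that string splitting suffices.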

Understanding agents.json Technically

agents.json uses JSON because it needs to be machine-parsed and acted upon — not just read. The structured format allows an agent orchestrator to:

Check capabilities array to see if the service does what the agent needs
Read auth.type to know what credential flow to initiate
Read pricing to decide if the service is within budget
Follow openapi_ref to get the full calling contract

Key design principle: agents.json is a decision document. An agent should be able to read it and answer "yes/no: should I engage this service?" without reading anything else.
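The four checks listed above can be sketched as a single function. This is one possible reading of the "decision document" idea, assuming the field names from the DataService example in Section 2b; the function name, the budget parameter, and the set of supported auth types are illustrative assumptions.

```python
def should_engage(manifest, needed, max_rate, supported_auth):
    """Return (True, openapi_ref) or (False, reason) using only manifest fields."""
    if needed not in manifest.get("capabilities", []):
        return False, f"capability {needed!r} not offered"
    auth_type = manifest.get("auth", {}).get("type")
    if auth_type not in supported_auth:
        return False, f"unsupported auth type {auth_type!r}"
    rate = manifest.get("pricing", {}).get("rate")
    if rate is None or rate > max_rate:
        return False, "price missing or over budget"
    return True, manifest.get("openapi_ref", "")


# Subset of the DataService manifest shown in Section 2b.
MANIFEST = {
    "name": "DataService",
    "capabilities": ["query", "subscribe", "stream"],
    "auth": {"type": "bearer"},
    "pricing": {"model": "per_call", "currency": "USDC", "rate": 0.001},
    "openapi_ref": "https://api.dataservice.com/openapi.json",
}

print(should_engage(MANIFEST, "query", max_rate=0.01, supported_auth={"bearer"}))
print(should_engage(MANIFEST, "write", max_rate=0.01, supported_auth={"bearer"}))
```

Note that the function never fetches the OpenAPI spec; it only returns the reference, so the yes/no decision stays cheap.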

Understanding OpenAPI Technically

OpenAPI uses either JSON or YAML. YAML is more human-readable; JSON is more universally parseable. The spec has three main sections:

info: Metadata about the API (name, version, contact)
paths: Every endpoint, with methods (GET, POST, etc.), parameters, and response schemas
components: Reusable schemas, security definitions, and response objects

For agents, the most important section is paths — it's the complete map of what the service can do and how to ask it.
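As a rough illustration of reading paths as a map, the sketch below walks an OpenAPI document that has already been loaded into a Python dict and lists each operation. The SPEC dict mirrors the Market Data API example from Section 2c; list_operations is a hypothetical helper, and loading YAML or JSON from disk is left out.

```python
# HTTP methods that may appear as operation keys under a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Yield (METHOD, path, summary) for every operation under `paths`."""
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as `parameters`.
            if method in HTTP_METHODS:
                yield method.upper(), path, operation.get("summary", "")


SPEC = {
    "openapi": "3.1.0",
    "info": {"title": "Market Data API", "version": "1.0.0"},
    "paths": {
        "/prices/{symbol}": {
            "get": {"summary": "Get current price for a symbol"},
        }
    },
}

for method, path, summary in list_operations(SPEC):
    print(method, path, summary)
```

Listing only the method, path, and summary is also one way to keep token usage low when the full spec is large.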

Understanding Semantic HTML Technically

HTML elements have semantic meaning defined by the HTML specification. When developers use <div> for everything, that meaning is lost. When they use the correct elements, parsers can build an accurate document outline.

Schema.org microdata goes further — it maps HTML content to a shared vocabulary of types (Person, Product, Event, Organization) that any agent can recognize regardless of the site's visual design.

<div itemscope itemtype="https://schema.org/Event">
  <h2 itemprop="name">Agent Economy Summit</h2>
  <time itemprop="startDate" datetime="2025-09-01">September 1, 2025</time>
  <span itemprop="location">London, UK</span>
</div>

An agent parsing this knows it has found an Event entity with a name, date, and location — without any custom parsing logic.
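To show how little logic this takes, here is a sketch that reads the Event snippet above with Python's standard-library HTML parser. It assumes a single, non-nested microdata item and is not a complete microdata implementation; the class name ItempropExtractor is hypothetical.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect itemprop values from a single, non-nested microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = {}
        self._open_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag == "time" and "datetime" in attrs:
            # <time> carries its machine-readable value in the datetime attribute.
            self.props[name] = attrs["datetime"]
            self._open_prop = None
        else:
            self.props[name] = ""
            self._open_prop = name

    def handle_data(self, data):
        if self._open_prop is not None:
            self.props[self._open_prop] += data

    def handle_endtag(self, tag):
        self._open_prop = None


EVENT_HTML = """
<div itemscope itemtype="https://schema.org/Event">
  <h2 itemprop="name">Agent Economy Summit</h2>
  <time itemprop="startDate" datetime="2025-09-01">September 1, 2025</time>
  <span itemprop="location">London, UK</span>
</div>
"""

extractor = ItempropExtractor()
extractor.feed(EVENT_HTML)
print(extractor.item_type)
print(extractor.props)
```

The same class works on any page that uses these attributes, whatever its CSS classes or layout, which is the point of a shared vocabulary.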

Activity for Ages 13–17

"Build a Discovery Stack for a Fictional Service"

Design a fictional service called "WeatherAgent Pro" that sells weather data to autonomous agents. Create:

A 10-line llms.txt describing the service
An agents.json with at least: name, capabilities (["current", "forecast", "historical"]), auth type, and a pricing field
One OpenAPI path definition for GET /weather/{city} with a response schema
One semantic HTML snippet for a "Featured City Report" using Schema.org markup

Compare your designs with a partner. Where did you make different choices? What are the trade-offs?

Section 3c: For Ages 18+ — Architecture & Economics

The Discovery Stack as Infrastructure

Discovery infrastructure is not merely a technical convenience — it is the foundation of a functioning agent economy. Without it, agents cannot reliably find, evaluate, or transact with services. The economic parallel is clear: just as physical infrastructure (roads, ports, power grids) enables commerce by reducing transaction costs, discovery infrastructure reduces the search and verification costs that would otherwise make agent-to-service transactions prohibitively expensive.

Architectural Trade-offs

Centralization vs. Decentralization: A centralized registry (one authoritative index of all agent-accessible services) would reduce discovery costs but create a single point of failure and control. The current approach, distributed files at domain roots, is more resilient but requires agents to know where to look. The /.well-known/ convention (defined in RFC 8615, which obsoleted RFC 5785) provides a standardized location without centralization.

Expressiveness vs. Parsability: More expressive formats (rich JSON-LD, full ontologies) can describe service capabilities with greater precision but require more sophisticated parsers and consume more tokens. Simpler formats (plain-text llms.txt) are universally parseable but sacrifice precision. The current stack uses different formats at different layers precisely to balance this trade-off.

Static vs. Dynamic Discovery: Static files (llms.txt, agents.json) are cheap to serve and cache but go stale. Dynamic discovery (agents querying a live endpoint to get current capabilities and pricing) is accurate but adds latency and cost. Production systems will likely use static files for initial discovery and dynamic endpoints for real-time capability confirmation.
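The static-first, dynamic-when-stale pattern might look like the sketch below. It assumes the agent records when it fetched the static manifest and has some callable that queries a live endpoint; fetch_live, the one-hour default, and both function names are hypothetical.

```python
import time


def needs_refresh(fetched_at, max_age_seconds, now=None):
    """True when a cached static manifest is older than max_age_seconds."""
    now = time.time() if now is None else now
    return (now - fetched_at) > max_age_seconds


def current_rate(cached_manifest, fetched_at, fetch_live, max_age_seconds=3600.0, now=None):
    """Use the cached price while it is fresh; otherwise ask the live endpoint."""
    if needs_refresh(fetched_at, max_age_seconds, now):
        # fetch_live stands in for a call to a dynamic discovery endpoint.
        return fetch_live()["pricing"]["rate"]
    return cached_manifest["pricing"]["rate"]


cached = {"pricing": {"rate": 0.001}}
live = lambda: {"pricing": {"rate": 0.002}}

print(current_rate(cached, fetched_at=0.0, fetch_live=live, now=100.0))
print(current_rate(cached, fetched_at=0.0, fetch_live=live, now=7200.0))
```

The trade-off is visible in the two calls: the first costs nothing extra but may be out of date, the second pays for a live lookup to get the current price.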

Economic Incentives for Adoption

Services adopt discovery infrastructure when the benefit (agent-driven traffic and revenue) exceeds the cost (implementation and maintenance). Several factors shape this calculation:

Agent fleet growth: As more agents operate autonomously, the addressable market for agent-discoverable services grows
Network effects: If major agent frameworks preferentially route to services with valid agents.json, non-adopters lose traffic — creating adoption pressure analogous to HTTPS adoption pressure from search engine ranking signals
Pricing transparency: agents.json pricing fields allow agents to make cost-aware routing decisions, which benefits services with competitive pricing and penalizes opaque pricing
Trust signals: A well-formed discovery stack signals operational maturity, which may influence agent trust scoring in frameworks that implement reputation systems
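Cost-aware routing on published prices can be sketched in a few lines. The example assumes each manifest exposes pricing.rate as in the Section 2b example; the service names and the pick_service helper are made up for illustration.

```python
def pick_service(manifests, capability):
    """Return the cheapest manifest that offers `capability` and states a price."""
    candidates = [
        m for m in manifests
        if capability in m.get("capabilities", [])
        and isinstance(m.get("pricing", {}).get("rate"), (int, float))
    ]
    # Services with no published rate are excluded: opaque pricing cannot be compared.
    return min(candidates, key=lambda m: m["pricing"]["rate"], default=None)


MANIFESTS = [
    {"name": "AlphaData", "capabilities": ["query"], "pricing": {"rate": 0.004}},
    {"name": "BetaData", "capabilities": ["query", "stream"], "pricing": {"rate": 0.001}},
    {"name": "OpaqueData", "capabilities": ["query"]},  # no pricing block
]

best = pick_service(MANIFESTS, "query")
print(best["name"] if best else None)
```

Under this rule a service without a pricing block never wins, which is the penalty for opaque pricing described above.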

The Verification Gap

The current discovery stack has a critical weakness: beyond the transport-level server authentication that HTTPS provides, none of these files is cryptographically signed. An agent reading agents.json cannot confirm that the file was placed by the legitimate service operator rather than by someone who compromised the host, that the capabilities listed are accurate, or that the pricing is current. This creates attack surfaces:

Capability inflation: A service claims capabilities it doesn't have to attract agent traffic
Pricing manipulation: A service advertises low prices in agents.json but charges more at execution
Impersonation: A malicious actor places a fraudulent agents.json on a compromised domain

Proposed mitigations include DNS-based signing, on-chain capability attestations, and agent-side reputation tracking — all active areas of development in the agent infrastructure space.
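As a simplified stand-in for these mitigations, the sketch below compares a manifest against a digest that the operator is assumed to publish through a second channel (for example a DNS TXT record). This is not a real signature scheme and not a description of any deployed protocol: it only detects a modified file when the second channel is intact, and it says nothing about whether the listed capabilities are truthful.

```python
import hashlib
import hmac
import json


def manifest_digest(manifest):
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def matches_published_digest(manifest, published_hex):
    """Compare the local digest with one obtained out of band (assumed channel)."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(manifest_digest(manifest), published_hex.lower())


original = {"name": "DataService", "pricing": {"model": "per_call", "rate": 0.001}}
published = manifest_digest(original)  # what the operator would publish

tampered = {"name": "DataService", "pricing": {"model": "per_call", "rate": 0.0001}}

print(matches_published_digest(original, published))
print(matches_published_digest(tampered, published))
```

Canonicalizing with sorted keys means two manifests that differ only in key order produce the same digest, so harmless re-serialization does not trigger a false alarm.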

Activity for Ages 18+

"Threat Model a Discovery Stack"

For a hypothetical financial data service serving autonomous trading agents:

Map the full discovery stack (llms.txt → agents.json → OpenAPI → semantic HTML)
Identify three attack vectors specific to agent consumers (not human consumers)
Propose a mitigation for each that does not require a centralized authority
Estimate the economic cost of each mitigation in terms of implementation complexity and ongoing maintenance
Write a one-paragraph recommendation for which mitigation to prioritize first and why

Section 4: Practical Hands-On Activities by Level

Level 1 (Ages 8–12): The Robot Welcome Kit

Materials: Paper, pencil, or any text editor

Task: You are the manager of a toy store. A robot assistant is going to help customers find products. Write a "Robot Welcome Kit" with:

- What the store sells (3 categories)
- What the robot is allowed to do (help customers, check inventory)
- What the robot is NOT allowed to do (open the safe, change prices)
- Where to find the price list
- Who to ask if confused

Discussion: How is this similar to what websites need to tell AI agents?

Level 2 (Ages 13–17): Build and Test a Minimal Discovery Stack

Tools: Any text editor, a free OpenAPI validator (e.g., Swagger Editor online)

Task:

1. Create llms.txt for a fictional recipe API
2. Create agents.json with capabilities: ["search", "get_recipe", "get_nutrition"]
3. Write an OpenAPI YAML with two paths: GET /recipes/search and GET /recipes/{id}
4. Validate your OpenAPI in Swagger Editor and fix any errors
5. Write one semantic HTML snippet for a recipe page using Schema.org Recipe type

Stretch goal: Add a pricing field to your agents.json and write a paragraph explaining how an agent would use that information to decide whether to call your API.

Level 3 (Ages 18+): Full Stack Design + Economic Analysis

Task: Design the complete discovery infrastructure for "ResearchAgent Hub" — a service that sells structured research summaries to autonomous agents.

Deliverables:

1. Complete llms.txt (minimum 15 lines, covering scope, exclusions, agent guidance, and auth pointers)
2. Complete agents.json (minimum 8 fields, including capabilities, auth, pricing model, rate limits, and OpenAPI reference)
3. OpenAPI spec with minimum 3 paths, including one that requires authentication
4. Semantic HTML template for a research summary page using appropriate Schema.org types
5. A 500-word architectural memo addressing: (a) how you balanced expressiveness vs. parsability, (b) what verification mechanisms you would add if you had 3 months of engineering time, (c) how your pricing model in agents.json reflects the economics of agent-to-service transactions

Section 5: Connection to Agent Economy Ecosystem

Section 5a: Link to Agent Memory & Knowledge Markets

Discovery infrastructure is the entry point to agent memory and knowledge markets. An agent that cannot discover a knowledge service cannot purchase from it. The agents.json capability manifest is particularly critical here: it must accurately describe not just what data is available but in what format, at what freshness, and at what cost — because agents in knowledge markets make buy-vs-build decisions based on exactly these parameters.

A knowledge market where every vendor has a well-formed agents.json pointing to a validated OpenAPI spec is a market with low search costs and high price transparency. This drives competition on quality and price rather than on discoverability — a more efficient market structure.

Conversely, knowledge vendors who invest in rich semantic HTML for their content pages create a secondary discovery channel: agents that encounter their content through web search or link traversal can extract structured knowledge even without a formal API, lowering the barrier to first contact.

Section 5b: Link to Research Subscriptions as Infrastructure

Structured research subscriptions — where agents autonomously purchase access to curated knowledge feeds — depend entirely on discovery infrastructure to function at scale. A research subscription service must answer, in machine-readable form:

What topics does this subscription cover? → llms.txt content scope
What query types are supported? → agents.json capabilities
How do I retrieve a specific report? → OpenAPI path definition
What does each report's HTML page contain? → Semantic HTML with Schema.org ScholarlyArticle or Report types

Without this stack, an agent cannot autonomously evaluate whether a subscription is worth purchasing, cannot integrate the subscription into its workflow, and cannot extract structured data from delivered content. Discovery infrastructure is therefore not ancillary to research subscription services — it is the product interface for agent consumers.

Section 5c: Link to On-Chain Payments for Agents

The pricing field in agents.json is where discovery infrastructure intersects with on-chain payment rails. For micropayment-based agent transactions — where an agent pays fractions of a cent per API call — the payment flow must be:

Discoverable: The agent finds the pricing model in agents.json
Initiatable: The agent knows which payment rail to use (specified in the auth/pricing block)
Verifiable: The agent can confirm payment was received before data is delivered

On-chain payment systems (stablecoin micropayments, payment channels, token-gated API access) require the service to expose payment endpoint information in a machine-readable format. agents.json is the natural home for this information. A payments block in agents.json might specify:

"payments": {
  "rail": "ethereum",
  "token": "USDC",
  "contract": "0x...",
  "model": "per_call",
  "rate_per_call": 0.0005
}

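This creates a direct link between the discovery layer and the transaction layer: an agent that reads agents.json has the information it needs to initiate an automated payment without human intervention. Because of the verification gap described in Section 3c, however, the agent still cannot confirm from the file alone that the manifest is authentic, so the payment is only as trustworthy as the manifest it was read from.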
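A short worked example shows how an agent might turn the rate_per_call field into a budgeting decision. It assumes a flat per-call rate and ignores any on-chain transaction fees, which a real deployment would have to account for; affordable_calls is a hypothetical helper. Decimal is used instead of float so that fractions of a cent are not distorted by binary rounding.

```python
import json
from decimal import Decimal


def affordable_calls(budget, rate_per_call):
    """Whole number of calls a budget covers at a flat per-call rate."""
    rate = Decimal(rate_per_call)
    if rate <= 0:
        raise ValueError("rate_per_call must be positive")
    return int(Decimal(budget) // rate)


# parse_float=Decimal keeps 0.0005 exact instead of converting it to a float.
payments = json.loads(
    '{"model": "per_call", "rate_per_call": 0.0005}',
    parse_float=Decimal,
)

print(affordable_calls("2.00", payments["rate_per_call"]))  # 4000
```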

Section 6: Real-World Use Cases & Future Implications

Current Real-World Deployments

AI Assistant Integrations: Several major AI assistant platforms now consume OpenAPI specs to enable tool use, so an agent can call any service described by a valid OpenAPI document. This is the most mature real-world deployment of discovery infrastructure, with thousands of existing API specs already consumable by agent frameworks.

LLM Context Optimization: Services that publish llms.txt allow LLM-based agents to load a compact site summary rather than scraping multiple pages. For high-traffic agent interactions, this reduces inference costs measurably.

Agent Marketplaces: Emerging agent marketplaces use agents.json-style manifests to populate service directories. An agent orchestrator queries the marketplace, receives a list of capability manifests, and selects services based on capability match and price, which amounts to a fully automated procurement flow.

Semantic Web Revival: The agent economy is driving renewed interest in semantic HTML and Schema.org markup, which had plateaued in adoption after initial SEO-driven growth. Agent-driven traffic creates new economic incentives for semantic markup that SEO alone did not fully provide.

Near-Term Implications (1–3 Years)

Standardization: Expect formal standards bodies (W3C, IETF, or an agent-specific consortium) to ratify agents.json and llms.txt schemas, reducing the current fragmentation
Verification layers: Cryptographic signing of capability manifests will emerge as agent-to-service transaction volumes grow and fraud becomes economically significant
Dynamic discovery: Static files will be supplemented by live discovery endpoints that return real-time capability and pricing data
Agent-native SEO: A new discipline of "agent optimization" will emerge, analogous to SEO, focused on making services maximally discoverable and trustworthy to autonomous agents

Long-Term Implications (3–10 Years)

Discovery as a market: Intermediaries may emerge that aggregate, verify, and index capability manifests — functioning as agent-economy search engines
Capability composability: Standardized capability descriptions will enable agents to automatically compose multi-service workflows, discovering and chaining services without human design
Regulatory surface: As agents make consequential decisions based on capability manifests, regulators may require accuracy guarantees and audit trails for discovery infrastructure
Infrastructure commoditization: Just as SSL certificates became commoditized infrastructure, discovery stack generation may become automated — a service registers, and its llms.txt, agents.json, and OpenAPI spec are auto-generated from its codebase

Section 7: Assessment & Knowledge Checks

Level 1 Assessment (Ages 8–12)

Multiple Choice

1. What does llms.txt tell an AI agent?
a) How to draw pictures
b) What a website is about and how to use it ✓
c) The website's password
d) How fast the internet is

2. Which technology is like a detailed instruction manual for calling a service?
a) llms.txt
b) Semantic HTML
c) OpenAPI ✓
d) robots.txt

3. Why do machines need special files that humans don't need?
a) Machines are smarter than humans
b) Machines can't use visual cues and common sense the way humans do ✓
c) Machines prefer reading files to looking at screens
d) Files are faster than websites

Short Answer

Describe in two sentences what a "capability manifest" is, using an analogy from everyday life.

Level 2 Assessment (Ages 13–17)

Technical Questions

What is the key difference between llms.txt and agents.json in terms of their purpose?
Why is OpenAPI described as a "service contract"? What does it guarantee to an agent consumer?
Write a semantic HTML snippet for a product called "DataFeed Pro" priced at $9.99/month, using Schema.org markup. Include at least three itemprop attributes.
An agent discovers a service via agents.json but the OpenAPI spec linked in openapi_ref returns a 404 error. What are two possible causes and what should the agent do?
Explain why the /.well-known/ path convention is used for agents.json rather than placing it at the domain root.

Level 3 Assessment (Ages 18+)

Essay Questions (choose two)

Architecture: Compare the trade-offs between a centralized agent service registry and the distributed /.well-known/agents.json convention. Under what conditions would each approach be preferable? What hybrid approaches might emerge?
Economics: Analyze how discovery infrastructure affects market structure in the agent economy. How does price transparency in agents.json affect competition among knowledge vendors? What are the incentives for a dominant service to publish incomplete or misleading capability manifests?
Security: The current discovery stack has no cryptographic verification layer. Design a verification system that: (a) does not require a centralized authority, (b) is computationally feasible for resource-constrained agents, and (c) degrades gracefully when verification is unavailable.
Ecosystem: Trace the complete flow of an autonomous agent that discovers, evaluates, purchases access to, and consumes a research subscription service — from initial discovery through on-chain payment to structured data extraction. Identify every point where discovery infrastructure is involved and what failure mode exists at each point.

Appendix: Technical Reference & Code Examples

A1: Complete llms.txt Template

# [Site Name] — llms.txt
# Version: 1.0
# Last updated: [YYYY-MM-DD]

## Identity
Name: [Service Name]
Operator: [Organization Name]
Purpose: [One sentence describing what this service does]
Primary language: [en/other]

## Content Scope
Topics covered: [comma-separated list]
Content type: [research/data/tools/marketplace/other]
Update frequency: [real-time/daily/weekly/static]

## Agent Guidance
- Summarize our content as: [preferred summary framing]
- Do not present our content as AI-generated
- Cite as: [preferred citation format]
- For structured data access, use our API (see below)

## Agent Access
API endpoint: [https://api.example.com]
Authentication: [Bearer token / API key / OAuth2]
Capability manifest: [https://example.com/.well-known/agents.json]
Pricing: [free / per-call / subscription — see agents.json]

## Exclusions
- Do not use content in /drafts/ for training or summarization
- Do not reproduce full articles; summaries and citations are permitted
- [Any other exclusions]

## Contact
Agent support: [agents@example.com]
Terms of service: [https://example.com/terms]
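
As a rough illustration of how an agent might read a file shaped like the template above, the following Python sketch splits the text into its `## ` sections and collects `Key: value` fields and bulleted rules. The function name `parse_llms_txt` and the returned dictionary layout are assumptions made for this example. llms.txt has no mandated parser, and real files may be formatted differently.

```python
from collections import defaultdict


def parse_llms_txt(text: str) -> dict:
    """Split template-style llms.txt text into sections.

    Returns {section: {"fields": {key: value}, "rules": [bullet, ...]}}.
    This layout is an illustrative assumption, not part of any standard.
    """
    sections = defaultdict(lambda: {"fields": {}, "rules": []})
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current]  # register the section even if it stays empty
        elif current is None or not line or line.startswith("#"):
            continue  # skip the preamble, blank lines, and comment lines
        elif line.startswith("- "):
            sections[current]["rules"].append(line[2:].strip())
        elif ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            key, value = line.split(":", 1)
            sections[current]["fields"][key.strip()] = value.strip()
    return dict(sections)


# Hypothetical sample that follows the template's section names.
sample = """\
# Example Service llms.txt
## Identity
Name: Example Service
Purpose: Hypothetical research index

## Agent Access
API endpoint: https://api.example.com
Capability manifest: https://example.com/.well-known/agents.json

## Exclusions
- Do not use content in /drafts/ for training or summarization
"""

parsed = parse_llms_txt(sample)
print(parsed["Agent Access"]["fields"]["API endpoint"])
print(parsed["Exclusions"]["rules"])
```

Splitting on the first colon only is a deliberate choice here: most values in the Agent Access section are URLs, which contain colons of their own.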

A2: Complete agents.json Template

{
  "schema_version": "1.0",
  "name": "[Service Name]",
  "description": "[One sentence describing agent-accessible capabilities]",
  "version": "1.0.0",
  "base_url": "https://api.example.com",
  "capabilities": [
    "search",
    "retrieve",
    "subscribe",
    "stream"
  ],
  "auth": {
    "type": "bearer",
    "token_endpoint": "https://api.example.com/auth/token",
    "scopes": ["read", "write", "subscribe"]
  },
  "pricing": {
    "model": "per_call",
    "currency": "USD",
    "rates": {
      "search": 0.001,
      "retrieve": 0.005,
      "subscribe_monthly": 9.99
    },
    "free_tier": {
      "calls_per_day": 100
    }
  },
  "payments": {
    "rail": "ethereum",
    "token": "USDC",
    "contract_address": "0x[contract]",
    "payment_model": "prepay"
  },
  "rate_limits": {
    "requests_per_minute": 60,
    "requests_per_day": 10000
  },
  "openapi_ref": "https://api.example.com/openapi.json",
  "llms_txt": "https://example.com/llms.txt",
  "contact": {
    "agent_support": "agents@example.com",
    "docs": "https://docs.example.com",
    "status": "https://status.example.com"
  },
  "last_updated": "[YYYY-MM-DD]"
}
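
To show how an agent might consume a manifest shaped like the template above, here is a minimal Python sketch that checks for a handful of fields and estimates the cost of a batch of calls. The choice of "required" keys, the helper names `missing_keys` and `estimate_cost`, and the reading of `free_tier.calls_per_day` as a single daily allowance deducted before billing are all assumptions for illustration. The template does not define these semantics.

```python
import json

# Assumed minimum set of keys an agent needs before it can call the service.
REQUIRED_KEYS = ("schema_version", "name", "base_url",
                 "capabilities", "auth", "openapi_ref")


def missing_keys(manifest: dict) -> list:
    """Return the assumed-required keys that the manifest does not define."""
    return [key for key in REQUIRED_KEYS if key not in manifest]


def estimate_cost(manifest: dict, capability: str, calls: int) -> float:
    """Estimate one day's cost for `calls` invocations of one capability.

    Assumption: the free tier is subtracted from the call count first.
    """
    pricing = manifest.get("pricing", {})
    rate = pricing.get("rates", {}).get(capability)
    if rate is None:
        raise KeyError(f"no published rate for {capability!r}")
    free_calls = pricing.get("free_tier", {}).get("calls_per_day", 0)
    billable = max(0, calls - free_calls)
    return round(billable * rate, 6)


# A trimmed copy of the template with placeholder values filled in.
manifest = json.loads("""
{
  "schema_version": "1.0",
  "name": "Example Service",
  "base_url": "https://api.example.com",
  "capabilities": ["search", "retrieve"],
  "auth": {"type": "bearer"},
  "pricing": {
    "model": "per_call",
    "currency": "USD",
    "rates": {"search": 0.001, "retrieve": 0.005},
    "free_tier": {"calls_per_day": 100}
  },
  "openapi_ref": "https://api.example.com/openapi.json"
}
""")

print(missing_keys(manifest))
print(estimate_cost(manifest, "search", 1100), manifest["pricing"]["currency"])
```

An agent comparing vendors could run the same estimate against several manifests before committing to one, which is only meaningful if the published rates are accurate.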

A3: OpenAPI Starter Template (YAML)

openapi: "3.1.0"
info:
  title: "[Service Name] Agent API"
  version: "1.0.0"
  description: "[Service description for agent consumers]"
  contact:
    email: "agents@example.com"

servers:
  - url: "https://api.example.com/v1"
    description: "Production"

security:
  - bearerAuth: []

paths:
  /search:
    get:
      operationId: "searchContent"
      summary: "Search available content"
      parameters:
        - name: q
          in: query
          required: true
          schema:
            type: string
          description: "Search query"
        - name: limit
          in: query
          schema:
            type: integer
            default: 10
            minimum: 1
            maximum: 100
          description: "Maximum number of results to return"
      responses:
        "200":
          description: "Search results"
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/SearchResults"
        "401":
          description: "Authentication required"
        "429":
          description: "Rate limit exceeded"

  /items/{id}:
    get:
      operationId: "getItem"
      summary: "Retrieve a specific item by ID"
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: "Item data"
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Item"
        "404":
          description: "Item not found"

components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer

  schemas:
    SearchResults:
      type: object
      properties:
        total:
          type: integer
        items:
          type: array
          items:
            $ref: "#/components/schemas/Item"

    Item:
      type: object
      properties:
        id:
          type: string
        title:
          type: string
        summary:
          type: string
        published:
          type: string
          format: date-time
        url:
          type: string
          format: uri
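
The value of a contract like the one above is that an agent can construct valid requests mechanically. The Python sketch below illustrates this under some assumptions: the spec has already been parsed into a dictionary (for example with a YAML library), only the fields needed here are reproduced, and the helper names `find_operation` and `build_request` are hypothetical. It looks up an operation by `operationId`, applies defaults, enforces `required` and `maximum`, and assembles the URL.

```python
from urllib.parse import quote, urlencode

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

# Subset of the starter template, as it might look after parsing.
spec = {
    "servers": [{"url": "https://api.example.com/v1"}],
    "paths": {
        "/search": {"get": {"operationId": "searchContent", "parameters": [
            {"name": "q", "in": "query", "required": True,
             "schema": {"type": "string"}},
            {"name": "limit", "in": "query",
             "schema": {"type": "integer", "default": 10, "maximum": 100}},
        ]}},
        "/items/{id}": {"get": {"operationId": "getItem", "parameters": [
            {"name": "id", "in": "path", "required": True,
             "schema": {"type": "string"}},
        ]}},
    },
}


def find_operation(spec, operation_id):
    """Return (METHOD, path, operation) for the given operationId."""
    for path, item in spec["paths"].items():
        for method, op in item.items():
            if method in HTTP_METHODS and op.get("operationId") == operation_id:
                return method.upper(), path, op
    raise LookupError(f"unknown operationId: {operation_id}")


def build_request(spec, operation_id, args):
    """Validate args against the operation's parameters and build the URL."""
    method, path, op = find_operation(spec, operation_id)
    query = {}
    for param in op.get("parameters", []):
        name, schema = param["name"], param.get("schema", {})
        value = args.get(name, schema.get("default"))
        if value is None:
            if param.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            continue  # optional parameter with no default: leave it out
        if "maximum" in schema and value > schema["maximum"]:
            raise ValueError(f"{name} exceeds maximum of {schema['maximum']}")
        if param["in"] == "path":
            path = path.replace("{" + name + "}", quote(str(value), safe=""))
        elif param["in"] == "query":
            query[name] = value
    url = spec["servers"][0]["url"] + path
    if query:
        url += "?" + urlencode(query)
    return method, url


print(build_request(spec, "searchContent", {"q": "agent memory"}))
print(build_request(spec, "getItem", {"id": "a/b"}))
```

A production client would also resolve `$ref` pointers, attach the bearer token declared under `securitySchemes`, and validate responses against the referenced schemas. Those steps are omitted to keep the sketch short.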

A4: Semantic HTML Reference — Common Schema.org Types for Agent Discovery

Content Type	Schema.org Type	Key Properties
Article / Research	`ScholarlyArticle`	name, author, datePublished, abstract, url
Product / Service	`Product`	name, description, offers (price and priceCurrency are set on the nested `Offer`)
Organization	`Organization`	name, url, contactPoint, description
Event	`Event`	name, startDate, location, organizer
Dataset	`Dataset`	name, description, distribution, license
API / Software	`SoftwareApplication`	name, applicationCategory, offers
Person	`Person`	name, jobTitle, affiliation, email

Implementation pattern:

<div itemscope itemtype="https://schema.org/[Type]">
  <span itemprop="[property]">[value]</span>
  <!-- For dates: -->
  <time itemprop="datePublished" datetime="2025-01-15">January 15, 2025</time>
  <!-- For nested entities: -->
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Author Name</span>
  </div>
</div>
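
To make concrete what an agent gains from this pattern, the following Python sketch extracts microdata items using only the standard library's `html.parser`. The class name `MicrodataParser` and the output shape are assumptions for this example. It handles well-formed markup like the pattern above and is not a complete implementation of the microdata extraction rules.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Collect microdata items as {"type": ..., "properties": {...}} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items, in document order
        self._scopes = []  # items whose elements are still open
        self._open = []    # one record per open non-void element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if tag in VOID_TAGS:
            # Void elements such as <meta> carry their value in "content".
            if prop and self._scopes and "content" in a:
                self._scopes[-1]["properties"][prop] = a["content"]
            return
        if "itemscope" in a:
            item = {"type": a.get("itemtype", ""), "properties": {}}
            if prop and self._scopes:
                self._scopes[-1]["properties"][prop] = item  # nested entity
            else:
                self.items.append(item)
            self._scopes.append(item)
            self._open.append({"kind": "scope"})
        elif prop and self._scopes:
            # Prefer the machine-readable datetime attribute over visible text.
            self._open.append({"kind": "prop", "prop": prop,
                               "fixed": a.get("datetime"), "text": []})
        else:
            self._open.append({"kind": "other"})

    def handle_data(self, data):
        for record in reversed(self._open):
            if record["kind"] == "scope":
                return  # text directly inside an item, not inside a property
            if record["kind"] == "prop":
                record["text"].append(data)
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        record = self._open.pop()
        if record["kind"] == "scope":
            self._scopes.pop()
        elif record["kind"] == "prop":
            value = record["fixed"] or "".join(record["text"]).strip()
            self._scopes[-1]["properties"][record["prop"]] = value


page = """
<div itemscope itemtype="https://schema.org/ScholarlyArticle">
  <span itemprop="name">Example Title</span>
  <time itemprop="datePublished" datetime="2025-01-15">January 15, 2025</time>
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Author Name</span>
  </div>
</div>
"""

parser = MicrodataParser()
parser.feed(page)
article = parser.items[0]
print(article["type"])
print(article["properties"]["datePublished"])
print(article["properties"]["author"]["properties"]["name"])
```

Note that the parser returns the ISO date from the `datetime` attribute rather than the display text, which is exactly the ambiguity the `<time>` element exists to remove.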

A5: Discovery Stack Checklist for Service Operators

Minimum viable discovery stack (implement in this order):
- [ ] Semantic HTML on all public pages (use correct element types, add Schema.org markup to key content)
- [ ] llms.txt at domain root (identity, scope, exclusions, API pointer)
- [ ] OpenAPI spec for any existing API (validate with Swagger Editor or equivalent)
- [ ] agents.json at /.well-known/agents.json (link to OpenAPI, add pricing and auth)

Enhanced discovery stack:
- [ ] JSON-LD blocks in <head> for rich entity markup
- [ ] Dynamic capability endpoint for real-time pricing/availability
- [ ] Payment block in agents.json for on-chain micropayment support
- [ ] Agent-specific documentation separate from human documentation
- [ ] Rate limit headers in API responses (X-RateLimit-Remaining, Retry-After)
- [ ] Capability versioning in agents.json to signal breaking changes

Verification (emerging best practices):
- [ ] DNS TXT record linking to agents.json for domain ownership verification
- [ ] Signed capability manifest (experimental — no standard yet)
- [ ] On-chain capability attestation for high-value services
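
The rate limit headers item in the enhanced stack matters because it lets an agent pace itself instead of guessing. The Python sketch below shows one way an agent might decide how long to pause after a response. The function name `seconds_to_wait` and the one-second fallback are assumptions. `Retry-After` is a standard HTTP header that carries either a number of seconds or an HTTP date, while `X-RateLimit-Remaining` is a widely used convention rather than a formal standard, so services may name or interpret it differently.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_to_wait(status, headers, now=None, default_backoff=1.0):
    """Return how long to pause before the next call (0.0 means proceed)."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    now = now or datetime.now(timezone.utc)

    retry_after = h.get("retry-after")
    if retry_after is not None:
        if retry_after.isdigit():
            return float(retry_after)  # delay given in seconds
        try:
            when = parsedate_to_datetime(retry_after)  # delay given as a date
        except (TypeError, ValueError):
            return default_backoff  # unparseable value: fall back
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return max(0.0, (when - now).total_seconds())

    if status == 429:
        return default_backoff  # throttled, but no hint about how long

    remaining = h.get("x-ratelimit-remaining")
    if remaining is not None and remaining.isdigit() and int(remaining) == 0:
        return default_backoff  # quota exhausted: pause before the next call

    return 0.0


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(seconds_to_wait(429, {"Retry-After": "30"}))
print(seconds_to_wait(429, {"Retry-After": "Wed, 21 Oct 2015 07:28:00 GMT"},
                      now=fixed_now))
print(seconds_to_wait(200, {"X-RateLimit-Remaining": "12"}))
```

Passing `now` explicitly keeps the function testable. In normal use the argument is omitted and the current UTC time is used.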

This lesson is part of Empirica's Agent Economy curriculum. Connected lessons cover agent memory and knowledge markets, research subscriptions as agent infrastructure, and on-chain payment rails for autonomous agents.

Discovery Infrastructure for AI Agents: A Multi-Age Course Lesson on llms.txt, agents.json, OpenAPI, and Semantic HTML
