
Your site is invisible to AI agents. Here's how to fix it.

10 min read · prodlint

AI agents don't use Google

They read your site directly. Or they don't read it at all. When ChatGPT, Claude, Perplexity, or any coding assistant tries to learn about your product, it doesn't search Google and click through results. It hits your domain, looks for specific files, reads specific headers, and either gets what it needs or moves on. If you don't have the right files in place, your site is a blank wall. Traditional SEO won't help here. You can rank #1 on Google and still be invisible to every AI agent on the internet. This is a different set of standards. Almost nobody implements them.

What AI discoverability actually means

Traditional SEO answers one question: where does your site rank on Google? AI discoverability answers four different ones. Can an AI agent find your site? Can it understand what you do? Does it know what content it's allowed to use? Can it interact with your services programmatically? Fourteen standards cover this. Some are well-established things you probably have already: sitemap.xml, meta tags, structured data, page speed. Others are so new that fewer than 1% of sites implement them: llms.txt, ai.txt, TDMRep, Content-Usage, A2A AgentCard, WebMCP, HTTP Signatures, AI-Disclosure, and AI-specific robots.txt directives. Most sites pass 2 or 3 of the 14. The established ones are table stakes. The emerging ones are where sites go from invisible to discoverable.

robots.txt AI directives

You have a robots.txt. Good. But if it only mentions Googlebot, it tells AI agents nothing. GPTBot, ClaudeBot, PerplexityBot, and a dozen others each have their own user-agent string. Without explicit rules for these bots, they're left guessing whether they're allowed to crawl your content. This is the single highest-impact change you can make. Takes 30 seconds, covers two of the 14 checks at once, and it's the first thing every AI crawler looks for.

Bad: no AI bot directives
# robots.txt
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /admin
prodlint output
yoursite.com  CRIT  robots-ai-directives  No AI-specific user-agent directives found
Good: explicit rules for AI crawlers
# robots.txt
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /admin

llms.txt

The llms.txt spec (proposed by Jeremy Howard) puts a machine-readable summary of your site at the root. Name, what you do, key pages. Without it, an LLM has to scrape your homepage and guess. With it, the LLM gets your pitch in 30 lines. Think of it like a README for AI. A human reads your About page. An LLM reads your llms.txt.

Bad: no llms.txt
GET /llms.txt
404 Not Found

# LLMs have to scrape your homepage and guess
# what your product does. Most get it wrong.
Good: structured site summary for LLMs
# Prodlint

> Open-source linter for AI-generated code. Catches
> hallucinated imports, leaked secrets, missing auth
> checks, and 49 other patterns that compile but
> aren't production-ready.

## Topics
- Code quality
- AI code generation
- Production readiness

## Links
- [Rules](https://prodlint.com/rules): All 52 lint rules
- [Site Score](https://prodlint.com/score): AI readiness scanner
- [Tools](https://prodlint.com/tools): Free file generators
- [MCP Server](https://prodlint.com/mcp): IDE integration

ai.txt

ai.txt tells AI companies whether they can train on your content. Without it, you have no say. With it, you can allow training but block commercial use, or allow summarization but block synthesis. The spec comes from Spawning, and it works like robots.txt but for training data. You define what's allowed, what's blocked, and under what conditions. It's the consent form for AI training. If you care about how your content gets used in model training, this is how you state your terms.

Bad: no ai.txt
GET /ai.txt
404 Not Found

# Your content is being used in training data.
# You've expressed no preference about it.
Good: explicit training permissions
# ai.txt - AI Training Permissions
# Training is allowed; commercial use is blocked.
# (Note the negated directives: "Disallowed-Training: No"
# means training is NOT disallowed, i.e. permitted.)

User-Agent: *
Allowed: Yes
Disallowed-Training: No
Disallowed-Commercial: Yes
Contact: ai@yoursite.com

TDMRep and Content-Usage

The W3C's TDMRep specification and the IETF's Content-Usage draft give you per-path control over text and data mining. These aren't aspirational specs; they're tied to real legal machinery. TDMRep uses a JSON file at /.well-known/tdmrep.json where you define mining policies per URL pattern. Content-Usage is an HTTP header (or a robots.txt directive) that signals whether AI agents can train on, summarize, or index specific content. If you're in the EU, TDMRep is how you express your opt-out under Article 4 of the DSM Directive. Not optional for publishers. If you're anywhere else, these standards still give you a clear, machine-readable way to state your terms before someone scrapes your site and asks forgiveness later. The two overlap in purpose but work at different layers: TDMRep is file-based, good for broad policies; Content-Usage is header-based, good for per-response control. Implementing both gives you coverage across agents that support either one.
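Under the TDMRep specification, the reservation file is a JSON array of location rules. A minimal sketch, with placeholder paths and a placeholder policy URL:

```json
[
  {
    "location": "/",
    "tdm-reservation": 1
  },
  {
    "location": "/press/*",
    "tdm-reservation": 1,
    "tdm-policy": "https://yoursite.com/tdm-policy.json"
  }
]
```

Here `tdm-reservation: 1` reserves mining rights for the matched paths, and the optional `tdm-policy` points to your licensing terms. On the header side, a response might carry something like `Content-Usage: train-ai=n` — the exact vocabulary is still being settled in the IETF draft, so treat that syntax as illustrative.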

A2A AgentCard

Google's Agent-to-Agent protocol uses a JSON file at /.well-known/agent-card.json to describe your agent's identity, capabilities, and auth requirements. Think of it like DNS for the agent web. Without an AgentCard, other agents can't discover or interact with your services. This matters most for sites that expose any kind of API. If you have an API, a chatbot, a tool, or anything that another agent might want to call, the AgentCard tells them how.

Bad: no AgentCard
GET /.well-known/agent-card.json
404 Not Found

# Other agents can't discover your service.
# You're invisible in the agent-to-agent layer.
Good: discoverable agent identity
{
  "name": "YourApp Agent",
  "description": "Product search and recommendations",
  "url": "https://yoursite.com",
  "version": "1.0.0",
  "skills": [
    {
      "id": "product-search",
      "name": "Product Search",
      "description": "Search by keyword or category",
      "inputModes": ["text/plain"],
      "outputModes": ["application/json"]
    }
  ],
  "authentication": {
    "schemes": ["Bearer"]
  }
}
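To make the discovery flow concrete, here's a minimal sketch of the consuming side: a client agent fetching a card from the well-known path and listing the skills it advertises. The function names are illustrative, not part of the A2A spec.

```python
import json
import urllib.request


def fetch_agent_card(origin: str) -> dict:
    """Fetch and parse an A2A AgentCard from a site's well-known path."""
    url = f"{origin}/.well-known/agent-card.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def skill_ids(card: dict) -> list[str]:
    """List the skill identifiers an agent advertises in its card."""
    return [skill["id"] for skill in card.get("skills", [])]
```

Against the card above, `skill_ids` would return `["product-search"]` — that one field is what lets another agent decide whether your service is worth calling.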

HTTP Message Signatures and AI-Disclosure

RFC 9421 HTTP Message Signatures let agents cryptographically prove who they are. When an AI agent signs its requests, your server can verify that the request actually came from OpenAI, Anthropic, or whoever claims to be asking. Without signatures, you're trusting a User-Agent string that anyone can fake. The AI-Disclosure header is simpler. It's a response header that tells consumers whether content was generated by AI, co-authored with AI, or written entirely by humans. The transparency layer. Neither standard is widely adopted yet. That's exactly why implementing them now matters. Early adoption signals to crawlers that your site takes AI interaction seriously. When these standards become widespread (give it a year), you'll already be compliant.
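Concretely, a signed request and a disclosed response could look like the sketch below. The Signature-Input and Signature fields follow RFC 9421; the key id is a placeholder, and the AI-Disclosure syntax has no finalized form yet, so its parameters are illustrative.

```http
# Incoming agent request, signed per RFC 9421
GET /pricing HTTP/1.1
Host: yoursite.com
User-Agent: GPTBot/1.0
Signature-Input: sig1=("@method" "@authority" "@path");created=1735689600;keyid="openai-key-1"
Signature: sig1=:MEUCIQDTbD3X...:

# Your response, declaring how the content was produced
# (AI-Disclosure parameters are illustrative, not standardized)
HTTP/1.1 200 OK
AI-Disclosure: mode=human-authored
```

Your server verifies the signature against the crawler's published public key (looked up via the keyid), which is what turns "claims to be OpenAI" into "provably is OpenAI."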

How bad is it? Most sites score under 20.

We ran prodlint's Site Score against 500+ sites. The average score: 18 out of 100. Most sites pass only 2 or 3 checks. They have a robots.txt, some meta tags, maybe a sitemap. That's it. The standards that almost nobody implements: llms.txt, ai.txt, TDMRep, AgentCard, Content-Usage, HTTP Signatures, AI-Disclosure, WebMCP. Here's what a typical scan looks like.

prodlint output
example-saas.com — Score: 18/100

CRIT  robots-ai-directives  No AI-specific user-agent directives found
CRIT  llms-txt              No llms.txt found
CRIT  ai-txt                No ai.txt found at site root
CRIT  content-usage         No Content-Usage directives found
CRIT  tdmrep                No TDMRep configuration found
CRIT  ai-disclosure         No AI-Disclosure header found
CRIT  agent-card            No A2A AgentCard found
CRIT  webmcp                No WebMCP tools detected
CRIT  http-signatures       No HTTP signature support detected
CRIT  structured-data       No structured data found
WARN  opengraph             Found 2/4 key meta tags

Check your site in 10 seconds

Run npx prodlint --web yoursite.com in your terminal. Or use the web scanner at prodlint.com/score. Paste your URL, get your score. If you're missing files, prodlint.com/tools has 7 free generators that create them for you: robots.txt AI directives, llms.txt, ai.txt, Content-Usage headers, TDMRep policies, A2A AgentCards, and AI-Disclosure headers. Fill in the form, copy the output, deploy. These 14 standards aren't going away. They're going to become the baseline. The question is whether you implement them now while your competitors are still scoring 18, or later when everyone's already caught up.

Catch all of these automatically.

52 production readiness checks. Zero config. Under 100ms.