llms.txt
The plain-text file at the root of a domain that tells AI crawlers what the site is, who runs it, what it does, and which pages matter most. The robots.txt of the AI search era.
Section 02 · Quick definition
llms.txt is a plain-text file served at the root of a domain (example.com/llms.txt) that gives AI crawlers and retrieval agents a curated map of what the site is and which pages matter most. The format is markdown-flavored: a heading with the site name, a short summary, optional metadata, and a list of important URLs grouped by section. The spec is maintained at llmstxt.org as a community standard; compliance is voluntary, and adoption is growing. The file is read at retrieval time by AI surfaces that choose to honor it.
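The whole format fits on one screen. A minimal sketch of the shape with a hypothetical brand; every name and URL below is a placeholder, not a template to copy verbatim:

```
# Example Co

> Example Co makes inventory software for independent grocers.
> The product, ExampleStock, is sold as a monthly subscription.
> Founded 2019, based in Austin.

## Docs

- [Quickstart](https://example.com/docs/quickstart): setup in under an hour
- [API reference](https://example.com/docs/api): full endpoint list

## Blog

- [Why grocers lose 4% to shrink](https://example.com/blog/shrink): our most-cited research
```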
Section 03 · Why it matters
llms.txt is the cheapest single move an operator can make for AI search visibility. The file takes one afternoon to write. It tells the model in plain language what the brand is, how to refer to it, what the site sells, and which pages are the canonical sources for the questions the brand wants to be cited on. The model still decides whether to cite. The file makes it easier to cite confidently.
The file matters because most operator domains have hundreds of pages, dozens of which are draft, stale, or thin. An AI crawler arriving without guidance retrieves a random sample and scores confidence against the worst pages it finds. An AI crawler arriving at a clean llms.txt retrieves the pages the brand chose to feature and scores against those.
The practical stake is editorial framing, not technical compliance. The file is a chance to tell the model what the brand wants to be known for. Most brands that have written one treated it like robots.txt: they got the technical part right and missed the editorial point.
Section 04 · How it works
An AI retrieval agent fetching content from a domain checks whether /llms.txt exists, parses the file as markdown-flavored text, and uses the contents as a navigational and editorial signal during retrieval. The file does not replace robots.txt or sitemap.xml; it sits alongside them with a different audience. Crawlers that honor the spec use the file to prioritize which URLs to ingest and to read the brand's self-description before scoring.
The retrieval agent issues a GET request to https://example.com/llms.txt. The file should return 200 OK with content-type text/plain or text/markdown. Some agents also check /llms-full.txt, which the spec defines as a longer companion file that carries the rendered content of the listed pages. Both files are optional but increasingly common.
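A sketch of that first check in Python, using only the standard library; example.com is a placeholder, and real retrieval agents will differ in headers, redirect handling, and error handling:

```python
import urllib.error
import urllib.request

def fetch_llms_txt(domain: str) -> str | None:
    """Fetch /llms.txt and return the body if status and content-type look right."""
    url = f"https://{domain}/llms.txt"
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type", "")
            # The expectation described above: 200 OK, text/plain or text/markdown.
            if resp.status == 200 and ("text/plain" in ctype or "text/markdown" in ctype):
                return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        pass  # unreachable host, non-2xx response, or timeout
    return None

print("found" if fetch_llms_txt("example.com") else "no valid llms.txt")
```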
The agent reads the H1 (the site name) and the blockquote summary that follows. The summary is the brand's self-description in two to four sentences. This is the editorial frame the model uses when it needs to refer to the site without retrieving deeper content.
The agent parses the H2-grouped URL list. Each section name (Docs, Blog, Reference, Services) tells the model how to think about the URLs underneath. The URL list is the brand's explicit choice of which pages to feature. Pages not on the list are still discoverable through sitemap.xml; they just are not the brand's pick for what to read first.
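A sketch of that parse, assuming the markdown shape described above; real agents are likely more tolerant of format drift:

```python
import re

def parse_llms_txt(body: str) -> dict:
    """Split an llms.txt body into site name, summary, and H2-grouped URL lists."""
    name, summary, sections, current = "", [], {}, None
    for line in body.splitlines():
        if line.startswith("# ") and not name:
            name = line[2:].strip()               # H1: the site name
        elif line.startswith("> "):
            summary.append(line[2:].strip())      # blockquote: the self-description
        elif line.startswith("## "):
            current = line[3:].strip()            # H2: section name (Docs, Blog, ...)
            sections[current] = []
        elif current and (m := re.search(r"\((https?://[^)]+)\)", line)):
            sections[current].append(m.group(1))  # markdown link: a featured URL
    return {"name": name, "summary": " ".join(summary), "sections": sections}
```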
When the model needs to answer a question about the brand or its category, it can use the llms.txt as a starting map. Pages featured in the file are more likely to be retrieved as candidates. The summary text is more likely to be quoted back when the model needs a one-line description of the brand.
These four steps run only on agents that honor the spec. Anthropic, OpenAI, and others have signaled support; coverage is growing but not universal. A site without llms.txt is not penalized; a site with a good llms.txt gets a small but compounding edge on every retrieval.
Section 05 · Common misunderstandings
“llms.txt is robots.txt for AI. Same job.”
robots.txt is a technical directive: allow or disallow specific crawlers from specific paths. llms.txt is editorial framing: here is what we are, here are the pages that matter. The two files coexist with different audiences. Treating llms.txt as a permission file misses the point and produces a file that says nothing about the brand.
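The contrast is easiest to see side by side. Both snippets below are illustrative, with a placeholder path and a hypothetical brand:

```
robots.txt, a permission file: directives aimed at named crawlers.

  User-agent: GPTBot
  Disallow: /drafts/

llms.txt, an editorial file: framing aimed at whatever retrieves.

  # Example Co
  > Inventory software for independent grocers, sold as ExampleStock.
```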
“If the file is voluntary, it doesn't help.”
Voluntary in the sense that the spec is community-maintained, not imposed by a regulator. The agents that matter (Anthropic's ClaudeBot, OpenAI's OAI-SearchBot, Perplexity-User) are reading and using the file. Coverage is partial and growing. Acting on it now is acting before competitors.
“We have a sitemap, so we don't need llms.txt.”
A sitemap is a complete list of URLs for traditional crawlers. An llms.txt is a curated short-list with an editorial summary. The sitemap helps Google index. The llms.txt helps an AI surface decide what to cite. Different files for different audiences. The work to produce them is also different: one is generated, the other is written.
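The same page in both files, with illustrative entries:

```
sitemap.xml: generated, exhaustive, one entry per URL, no editorial weight.

  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2025-01-15</lastmod>
  </url>

llms.txt: written, curated, a link plus the reason it matters.

  - [Quickstart](https://example.com/docs/quickstart): start here for setup questions
```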
“Listing every page in llms.txt makes us more visible.”
The opposite. The file is most useful when it is curated. An llms.txt with 400 URLs and no editorial structure tells the model the site has 400 pages of equally weighted content. An llms.txt with 12 carefully chosen URLs tells the model these 12 are the canonical answers. The curation is the value.
“The summary should match our home-page hero copy.”
The summary should match how the brand wants to be described in an answer the model writes. Hero copy is written for a buyer skimming. The llms.txt summary is written for a model that may quote it back verbatim. Plain, concrete, factual. No adjectives the brand could not defend in a deposition.
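A before-and-after for the same hypothetical brand:

```
Hero copy, written for a buyer skimming:

  Example Co is the revolutionary, AI-powered future of grocery operations.

llms.txt summary, written for a model that may quote it verbatim:

  > Example Co makes inventory software for independent grocers. The product,
  > ExampleStock, tracks stock levels and flags shrink. Founded 2019, Austin.
```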
Section 06 · Diagnostic questions
Does the domain serve a valid /llms.txt at the root, returning 200 OK with a text/plain or text/markdown content type? (This and the other mechanical checks here are scriptable; see the sketch after this list.)
Do the H1 and summary match the editorial framing the brand wants the model to use, or do they read like home-page hero copy with adjectives?
Are the listed URLs the canonical answers for the questions the brand wants to be cited on, or is the list a dump of every page on the site?
Are the section headings (Docs, Reference, Blog, Services) accurate to how the brand wants the model to think about the URLs underneath?
Does an /llms-full.txt exist with the same URLs and rendered content, or is there a reason to keep the longer file out of scope?
Has the file been updated in the last 90 days to reflect current canonical pages, or is it pointing at a 2024 site map?
Does the AI-Generated-Content section (if used) accurately label which pages on the site are AI-assisted versus human-written?
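The mechanical half of this list can be scripted. A sketch that reuses fetch_llms_txt and parse_llms_txt from the earlier sketches (both are hypothetical helper names, not a published tool); the editorial questions stay manual:

```python
def diagnose(domain: str) -> list[str]:
    """Run the mechanical checks; an empty list means those checks pass."""
    body = fetch_llms_txt(domain)    # hypothetical helper, sketched earlier
    if body is None:
        return ["no valid /llms.txt (status or content-type check failed)"]
    parsed = parse_llms_txt(body)    # hypothetical helper, sketched earlier
    issues = []
    if not parsed["name"]:
        issues.append("missing H1 site name")
    if not parsed["summary"]:
        issues.append("missing blockquote summary")
    n = sum(len(urls) for urls in parsed["sections"].values())
    if n == 0:
        issues.append("no featured URLs")
    elif n > 50:  # arbitrary curation threshold, not from the spec
        issues.append(f"{n} URLs listed; reads as a dump, not a curation")
    return issues
```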
Section 07 · Related Atlas entries
AI Search Optimization
The set of practices that determine whether AI search engines cite a brand. llms.txt is one input; entity, schema, and brand mentions are others.
Section 08 · Five Cents
Most operators who write llms.txt write it as if it were robots.txt for SEO and miss the point. The file is editorial framing for an LLM, not a technical directive. The job is to tell a model that may quote you verbatim what you actually do, in plain words, without the adjectives a board deck would forgive. I have read llms.txt files that sound like a hero banner and produce exactly the kind of generic summary the model could already have guessed. The good ones read like a tired founder explaining the company to a new hire on Monday morning. That is the voice the model will quote back.