Stan Consulting LLC · Marketing Atlas Reference · llms.txt


llms.txt.

The plain-text file at the root of a domain that tells AI crawlers what the site is, who runs it, what it does, and which pages matter most. The robots.txt of the AI search era.

Section 02 · Quick definition

Definition.

llms.txt is a plain-text file served at the root of a domain (example.com/llms.txt) that gives AI crawlers and retrieval agents a curated map of what the site is and which pages matter most. The format is markdown-flavored: a heading with the site name, a short summary, optional metadata, and a list of important URLs grouped by section. The spec is maintained at llmstxt.org as a community standard. Adoption is voluntary and growing. The file is read at retrieval time by AI surfaces that choose to honor it.
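A minimal file in the llmstxt.org format described above might look like the following; the company name, summary, and URLs are all illustrative:

```markdown
# Example Co

> Example Co is a widget manufacturer in Portland, Oregon. We sell
> industrial widgets to mid-market operators. The pages below are the
> canonical sources on how our widgets work and what they cost.

## Docs

- [Widget guide](https://example.com/docs/widgets): how the widgets work
- [Pricing](https://example.com/pricing): current list prices

## Blog

- [Why we built the widget](https://example.com/blog/why-widgets)
```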

Section 03 · Why it matters

Why it matters.

llms.txt is the cheapest single move an operator can make for AI search visibility. The file takes one afternoon to write. It tells the model in plain language what the brand is, how to refer to it, what the site sells, and which pages are the canonical sources for the questions the brand wants to be cited on. The model still decides whether to cite. The file makes it easier to cite confidently.

The file matters because most operator domains have hundreds of pages, dozens of which are draft, stale, or thin. An AI crawler arriving without guidance retrieves a random sample and scores confidence against the worst pages it finds. An AI crawler arriving at a clean llms.txt retrieves the pages the brand chose to feature and scores against those.

The practical stake is editorial framing, not technical compliance. The file is a chance to tell the model what the brand wants to be known for. Most brands that have written one treated it like robots.txt: they got the technical part right and missed the editorial point.

Section 04 · How it works

How llms.txt is read and used.

An AI retrieval agent fetching content from a domain checks whether /llms.txt exists, parses the file as markdown-flavored text, and uses the contents as a navigational and editorial signal during retrieval. The file does not replace robots.txt or sitemap.xml; it sits alongside them with a different audience. Crawlers that honor the spec use the file to prioritize which URLs to ingest and to read the brand's self-description before scoring.

  1. Step one · root fetch

    The retrieval agent issues a GET request to https://example.com/llms.txt. The file should return 200 OK with content-type text/plain or text/markdown. Some agents also check /llms-full.txt, which the spec defines as a longer, fuller version with rendered page content. Both are optional but increasingly common.

  2. Step two · header and summary

    The agent reads the H1 (the site name) and the blockquote summary that follows. The summary is the brand's self-description in two to four sentences. This is the editorial frame the model uses when it needs to refer to the site without retrieving deeper content.

  3. Step three · sectioned URL list

    The agent parses the H2-grouped URL list. Each section name (Docs, Blog, Reference, Services) tells the model how to think about the URLs underneath. The URL list is the brand's explicit choice of which pages to feature. Pages not on the list are still discoverable through sitemap.xml; they just are not the brand's pick for what to read first.

  4. Step four · retrieval and citation

    When the model needs to answer a question about the brand or its category, it can use the llms.txt as a starting map. Pages featured in the file are more likely to be retrieved as candidates. The summary text is more likely to be quoted back when the model needs a one-line description of the brand.

The four steps run on agents that honor the spec. Anthropic, OpenAI, and others have signaled support; coverage is growing but not universal. A site without llms.txt is not penalized; a site with a good llms.txt gets a small but compounding edge per retrieval.
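Steps two and three above can be sketched in code. This is a minimal parser for the markdown-flavored structure the spec describes (H1 name, blockquote summary, H2-grouped link list); the sample content is invented for illustration, and the GET in step one is omitted so the sketch runs on a plain string:

```python
import re


def parse_llms_txt(text):
    """Parse an llms.txt body into name, summary, and H2-grouped URL lists."""
    name = None
    summary = []
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and name is None:
            name = line[2:].strip()          # H1: the site name
        elif line.startswith("> "):
            summary.append(line[2:].strip()) # blockquote: self-description
        elif line.startswith("## "):
            current = line[3:].strip()       # H2: a new section
            sections[current] = []
        elif current and line.strip().startswith("- "):
            m = re.search(r"\((https?://[^)]+)\)", line)
            if m:
                sections[current].append(m.group(1))
    return {"name": name, "summary": " ".join(summary), "sections": sections}


sample = """# Example Co
> Example Co sells widgets. The docs below are canonical.

## Docs
- [Widget guide](https://example.com/docs/widgets): how widgets work

## Blog
- [Launch post](https://example.com/blog/launch)
"""
parsed = parse_llms_txt(sample)
```

An agent that honors the spec would apply something like this to the fetched body, then feed the section names and URLs into its retrieval queue.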

Section 05 · Common misunderstandings

What people get wrong.

  1. “llms.txt is robots.txt for AI. Same job.”

    robots.txt is a technical directive: allow or disallow specific crawlers from specific paths. llms.txt is editorial framing: here is what we are, here are the pages that matter. The two files coexist with different audiences. Treating llms.txt as a permission file misses the point and produces a file that says nothing about the brand.

  2. “If the file is voluntary, it doesn't help.”

Voluntary in the sense that the spec is community-maintained, not imposed by a regulator. The agents that matter (Anthropic's ClaudeBot, OpenAI's OAI-SearchBot, Perplexity-User) are reading and using the file. Coverage is partial and growing. Acting on it now means acting before competitors do.

  3. “We have a sitemap, so we don't need llms.txt.”

    A sitemap is a complete list of URLs for traditional crawlers. llms.txt is a curated short-list with editorial summary. The sitemap helps Google index. The llms.txt helps an AI surface decide what to cite. Different files for different audiences. The work to produce them is also different: one is generated, the other is written.

  4. “Listing every page in llms.txt makes us more visible.”

The opposite. The file is most useful when it is curated. An llms.txt with 400 URLs and no editorial structure tells the model the site has 400 pages of equally weighted content. An llms.txt with 12 carefully chosen URLs tells the model these 12 are the canonical answers. The curation is the value.

  5. “The summary should match our home-page hero copy.”

    The summary should match how the brand wants to be described in an answer the model writes. Hero copy is written for a buyer skimming. The llms.txt summary is written for a model that may quote it back verbatim. Plain, concrete, factual. No adjectives the brand could not defend in a deposition.

Section 06 · Diagnostic questions

Questions a Stan Consulting diagnostic asks.

  1. Does the domain serve a valid /llms.txt at the root, returning 200 OK with text/plain or text/markdown content type?

  2. Do the H1 and summary match the editorial framing the brand wants the model to use, or do they read like home-page hero copy with adjectives?

  3. Are the listed URLs the canonical answers for the questions the brand wants to be cited on, or is the list a dump of every page on the site?

  4. Are the section headings (Docs, Reference, Blog, Services) accurate to how the brand wants the model to think about the URLs underneath?

  5. Does an /llms-full.txt exist with the same URLs and rendered content, or is there a reason to keep the longer file out of scope?

  6. Has the file been updated in the last 90 days to reflect current canonical pages, or is it pointing at a 2024 site map?

  7. Does the AI-Generated-Content section (if used) accurately label which pages on the site are AI-assisted versus human-written?

Section 07 · Related Atlas entries

Section 08 · Five Cents

Most operators who write llms.txt write it as if it were robots.txt for SEO and miss the point. The file is editorial framing for an LLM, not a technical directive. The job is to tell a model that may quote you verbatim what you actually do, in plain words, without the adjectives a board deck would forgive. I have read llms.txt files that read like a hero banner and produced exactly the kind of generic summary the model already could have guessed. The good ones read like a tired founder explaining the company to a new hire on Monday morning. That is the voice the model will quote back.

Stan · Marketing Atlas

Section 09 · Sources

Sources.

  1. llmstxt.org · The /llms.txt specification · The community standard for the plain-text llms.txt file. Defines the format, the H1 and summary structure, and the URL listing convention used by Anthropic, OpenAI, and others.
  2. Anthropic · llms.txt support and conventions · Anthropic's reference on how Claude and ClaudeBot read llms.txt during retrieval, and how the file affects what Claude cites in answers.
  3. OpenAI · OAI-SearchBot and crawler conventions · OpenAI's reference on how its crawlers identify themselves, which files they honor, and how operators can communicate site structure to ChatGPT's retrieval layer.
  4. Search Engine Land · GEO and llms.txt coverage · Practitioner reference on how llms.txt fits into a broader generative engine optimization program, and operator playbooks for writing the file well.
  5. Search Engine Journal · llms.txt explained · Practitioner reference covering the spec, common implementation mistakes, and the editorial framing distinction between llms.txt and robots.txt.