llms.txt
The plain-text file at the root of a domain that tells AI crawlers what the site is, who runs it, what it does, and which pages matter most. The robots.txt of the AI search era.
Section 02 · Quick definition
llms.txt is a plain-text file served at the root of a domain (example.com/llms.txt) that gives AI crawlers and retrieval agents a curated map of what the site is and which pages matter most. The format is markdown-flavored: a heading with the site name, a short summary, optional metadata, and a list of important URLs grouped by section. The spec is maintained at llmstxt.org as a community standard; compliance is voluntary, and adoption is growing. The file is read at retrieval time by AI surfaces that choose to honor it.
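The whole format fits on one screen. A minimal sketch of the shape with a hypothetical brand; every name and URL below is a placeholder, not a template to copy verbatim:

```
# Example Co

> Example Co makes inventory software for independent grocers.
> The product, ExampleStock, is sold as a monthly subscription.
> Founded 2019, based in Austin.

## Docs

- [Quickstart](https://example.com/docs/quickstart): setup in under an hour
- [API reference](https://example.com/docs/api): full endpoint list

## Blog

- [Why grocers lose 4% to shrink](https://example.com/blog/shrink): our most-cited research
```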
Section 03 · Why it matters
llms.txt is the cheapest single move an operator can make for AI search visibility. The file takes one afternoon to write. It tells the model in plain language what the brand is, how to refer to it, what the site sells, and which pages are the canonical sources for the questions the brand wants to be cited on. The model still decides whether to cite. The file makes it easier to cite confidently.
The file matters because most operator domains have hundreds of pages, dozens of which are draft, stale, or thin. An AI crawler arriving without guidance retrieves a random sample and scores confidence against the worst pages it finds. An AI crawler arriving at a clean llms.txt retrieves the pages the brand chose to feature and scores against those.
The practical stake is editorial framing, not technical compliance. The file is a chance to tell the model what the brand wants to be known for. Most brands that have written one treated it like robots.txt: they got the technical part right and missed the editorial point.
Section 04 · How it works
An AI retrieval agent fetching content from a domain checks whether /llms.txt exists, parses the file as markdown-flavored text, and uses the contents as a navigational and editorial signal during retrieval. The file does not replace robots.txt or sitemap.xml; it sits alongside them with a different audience. Crawlers that honor the spec use the file to prioritize which URLs to ingest and to read the brand's self-description before scoring.
The retrieval agent issues a GET request to https://example.com/llms.txt. The file should return 200 OK with content-type text/plain or text/markdown. Some agents also check /llms-full.txt, which the spec defines as a longer companion file that carries the rendered content of the listed pages. Both files are optional but increasingly common.
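A sketch of that first check in Python, using only the standard library; example.com is a placeholder, and real retrieval agents will differ in headers, redirect handling, and error handling:

```python
import urllib.error
import urllib.request

def fetch_llms_txt(domain: str) -> str | None:
    """Fetch /llms.txt and return the body if status and content-type look right."""
    url = f"https://{domain}/llms.txt"
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type", "")
            # The expectation described above: 200 OK, text/plain or text/markdown.
            if resp.status == 200 and ("text/plain" in ctype or "text/markdown" in ctype):
                return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        pass  # unreachable host, non-2xx response, or timeout
    return None

print("found" if fetch_llms_txt("example.com") else "no valid llms.txt")
```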
The agent reads the H1 (the site name) and the blockquote summary that follows. The summary is the brand's self-description in two to four sentences. This is the editorial frame the model uses when it needs to refer to the site without retrieving deeper content.
The agent parses the H2-grouped URL list. Each section name (Docs, Blog, Reference, Services) tells the model how to think about the URLs underneath. The URL list is the brand's explicit choice of which pages to feature. Pages not on the list are still discoverable through sitemap.xml; they just are not the brand's pick for what to read first.
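A sketch of that parse, assuming the markdown shape described above; real agents are likely more tolerant of format drift:

```python
import re

def parse_llms_txt(body: str) -> dict:
    """Split an llms.txt body into site name, summary, and H2-grouped URL lists."""
    name, summary, sections, current = "", [], {}, None
    for line in body.splitlines():
        if line.startswith("# ") and not name:
            name = line[2:].strip()               # H1: the site name
        elif line.startswith("> "):
            summary.append(line[2:].strip())      # blockquote: the self-description
        elif line.startswith("## "):
            current = line[3:].strip()            # H2: section name (Docs, Blog, ...)
            sections[current] = []
        elif current and (m := re.search(r"\((https?://[^)]+)\)", line)):
            sections[current].append(m.group(1))  # markdown link: a featured URL
    return {"name": name, "summary": " ".join(summary), "sections": sections}
```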
When the model needs to answer a question about the brand or its category, it can use the llms.txt as a starting map. Pages featured in the file are more likely to be retrieved as candidates. The summary text is more likely to be quoted back when the model needs a one-line description of the brand.
These four steps run only on agents that honor the spec. Anthropic, OpenAI, and others have signaled support; coverage is growing but not universal. A site without llms.txt is not penalized; a site with a good llms.txt gets a small but compounding edge on every retrieval.
Section 05 · Common misunderstandings
“llms.txt is robots.txt for AI. Same job.”
robots.txt is a technical directive: allow or disallow specific crawlers from specific paths. llms.txt is editorial framing: here is what we are, here are the pages that matter. The two files coexist with different audiences. Treating llms.txt as a permission file misses the point and produces a file that says nothing about the brand.
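The contrast is easiest to see side by side. Both snippets below are illustrative, with a placeholder path and a hypothetical brand:

```
robots.txt, a permission file: directives aimed at named crawlers.

  User-agent: GPTBot
  Disallow: /drafts/

llms.txt, an editorial file: framing aimed at whatever retrieves.

  # Example Co
  > Inventory software for independent grocers, sold as ExampleStock.
```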
“If the file is voluntary, it doesn't help.”
Voluntary in the sense that the spec is community-maintained, not imposed by a regulator. The agents that matter (Anthropic's ClaudeBot, OpenAI's OAI-SearchBot, Perplexity-User) are reading and using the file. Coverage is partial and growing. Acting on it now is acting before competitors.
“We have a sitemap, so we don't need llms.txt.”
A sitemap is a complete list of URLs for traditional crawlers. An llms.txt is a curated short-list with an editorial summary. The sitemap helps Google index. The llms.txt helps an AI surface decide what to cite. Different files for different audiences. The work to produce them is also different: one is generated, the other is written.
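The same page in both files, with illustrative entries:

```
sitemap.xml: generated, exhaustive, one entry per URL, no editorial weight.

  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2025-01-15</lastmod>
  </url>

llms.txt: written, curated, a link plus the reason it matters.

  - [Quickstart](https://example.com/docs/quickstart): start here for setup questions
```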
“Listing every page in llms.txt makes us more visible.”
The opposite. The file is most useful when it is curated. An llms.txt with 400 URLs and no editorial structure tells the model the site has 400 pages of equally weighted content. An llms.txt with 12 carefully chosen URLs tells the model these 12 are the canonical answers. The curation is the value.
“The summary should match our home-page hero copy.”
The summary should match how the brand wants to be described in an answer the model writes. Hero copy is written for a buyer skimming. The llms.txt summary is written for a model that may quote it back verbatim. Plain, concrete, factual. No adjectives the brand could not defend in a deposition.
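A before-and-after for the same hypothetical brand:

```
Hero copy, written for a buyer skimming:

  Example Co is the revolutionary, AI-powered future of grocery operations.

llms.txt summary, written for a model that may quote it verbatim:

  > Example Co makes inventory software for independent grocers. The product,
  > ExampleStock, tracks stock levels and flags shrink. Founded 2019, Austin.
```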
Section 06 · Diagnostic questions
Does the domain serve a valid /llms.txt at the root, returning 200 OK with a text/plain or text/markdown content type? (This and the other mechanical checks here are scriptable; see the sketch after this list.)
Do the H1 and summary match the editorial framing the brand wants the model to use, or do they read like home-page hero copy with adjectives?
Are the listed URLs the canonical answers for the questions the brand wants to be cited on, or is the list a dump of every page on the site?
Are the section headings (Docs, Reference, Blog, Services) accurate to how the brand wants the model to think about the URLs underneath?
Does an /llms-full.txt exist with the same URLs and rendered content, or is there a reason to keep the longer file out of scope?
Has the file been updated in the last 90 days to reflect current canonical pages, or is it pointing at a 2024 site map?
Does the AI-Generated-Content section (if used) accurately label which pages on the site are AI-assisted versus human-written?
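The mechanical half of this list can be scripted. A sketch that reuses fetch_llms_txt and parse_llms_txt from the earlier sketches (both are hypothetical helper names, not a published tool); the editorial questions stay manual:

```python
def diagnose(domain: str) -> list[str]:
    """Run the mechanical checks; an empty list means those checks pass."""
    body = fetch_llms_txt(domain)    # hypothetical helper, sketched earlier
    if body is None:
        return ["no valid /llms.txt (status or content-type check failed)"]
    parsed = parse_llms_txt(body)    # hypothetical helper, sketched earlier
    issues = []
    if not parsed["name"]:
        issues.append("missing H1 site name")
    if not parsed["summary"]:
        issues.append("missing blockquote summary")
    n = sum(len(urls) for urls in parsed["sections"].values())
    if n == 0:
        issues.append("no featured URLs")
    elif n > 50:  # arbitrary curation threshold, not from the spec
        issues.append(f"{n} URLs listed; reads as a dump, not a curation")
    return issues
```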
Section 07 · Related Atlas entries
AI Search Optimization
The set of practices that determine whether AI search engines cite a brand. llms.txt is one input; entity, schema, and brand mentions are others.
Section 08 · Five Cents
Most operators who write llms.txt write it as if it were robots.txt for SEO and miss the point. The file is editorial framing for an LLM, not a technical directive. The job is to tell a model that may quote you verbatim what you actually do, in plain words, without the adjectives a board deck would forgive. I have read llms.txt files that sound like a hero banner and produce exactly the kind of generic summary the model could already have guessed. The good ones read like a tired founder explaining the company to a new hire on Monday morning. That is the voice the model will quote back.