The disambiguation work that lets an AI search engine confidently identify a brand as the same entity across mentions, pages, and sources. The structural input AI search optimization depends on.
Section 02 · Quick definition
Entity Clarity is the cumulative work that lets an AI search engine confirm a brand, person, product, or place is the same entity wherever it appears. The mechanics are schema @id cross-references, Wikidata and Wikipedia entries, consistent name and address signals across the open web, author bylines tied to a stable Person entity, and llms.txt summaries that match the rest of the site. The output is not better branding. The output is a higher confidence score when an AI retrieval layer asks: can I cite this entity without hedging? Brands with clarity are cited. Brands without it are skipped in favor of the safer option.
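The mechanics in that definition can be pictured as a single JSON-LD Organization node. A minimal illustration only; every name, URL, and Q-ID below is a hypothetical placeholder, not a real entity:

```python
import json

# Minimal Organization node carrying the identity signals listed above.
# All names, URLs, and the Q-ID are hypothetical placeholders.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/#organization",  # canonical fragment @id
    "name": "Example Consulting",
    "url": "https://example.com/",
    "sameAs": [  # cross-references to external registries
        "https://www.wikidata.org/wiki/Q00000000",
        "https://www.linkedin.com/company/example-consulting",
        "https://www.crunchbase.com/organization/example-consulting",
    ],
}

print(json.dumps(organization, indent=2))
```

The same `@id` fragment would then be referenced from every other schema block on the domain, which is what makes it an identity rather than a label.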
Section 03 · Why it matters
An AI search engine cites confidently when it can confirm what entity it is talking about. Two brands with the same name in the same category force the model to disambiguate. If the disambiguation work is missing or inconsistent across sources, the model either picks the wrong entity, hedges with a vague answer, or skips the citation entirely and uses the safer alternative. The cost is invisibility under a name the brand actually owns.
Entity clarity matters because most operator brands have at least one entity collision they have never noticed: a competitor with a similar name in an adjacent geography, a former product of the same name, an unrelated business with the same trade name. The collision is invisible until an AI search query forces the model to pick.
The practical stake is that entity clarity is the single most-overlooked structural surface in pre-2024 SEO work. Pages were optimized for ranking. Entities were left to figure themselves out. AI search punishes that gap.
Section 04 · How it works
An AI retrieval layer encountering a candidate page tries to confirm which entity the page is about. The confirmation runs against a knowledge graph the model assembled from training data and against signals on the page itself. High-clarity pages confirm the entity within seconds. Low-clarity pages produce a confidence score below the citation threshold and get dropped from the answer.
The model checks the page's JSON-LD schema for @id values that resolve to a single canonical entity. A page with Organization @id matching the home-page Organization @id resolves cleanly. A page with no @id, or with @id values that do not match elsewhere on the domain, resolves to nothing the model can fix on its own.
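A rough sketch of that resolution check, assuming pages have already been parsed into lists of JSON-LD blocks. The function names and data shapes are illustrative, not any engine's actual pipeline:

```python
# Hypothetical resolution check. `pages` is a list of pages, each already
# parsed into a list of JSON-LD blocks; names and URLs are placeholders.
def organization_ids(pages):
    """Yield every Organization @id found across the parsed pages."""
    for blocks in pages:
        for block in blocks:
            if block.get("@type") == "Organization" and "@id" in block:
                yield block["@id"]

def resolves_cleanly(pages, canonical_id):
    """True only when every Organization @id matches the home-page entity."""
    return set(organization_ids(pages)) == {canonical_id}

pages = [
    [{"@type": "Organization", "@id": "https://example.com/#organization"}],
    [{"@type": "Article", "headline": "Post"},
     {"@type": "Organization", "@id": "https://example.com/#organization"}],
]
print(resolves_cleanly(pages, "https://example.com/#organization"))  # True
```

Swap one page's `@id` for a different value and the set comparison fails, which is the "resolves to nothing the model can fix on its own" case.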
The model looks for cross-references to known entity registries: Wikidata Q-IDs, Wikipedia URLs, sameAs links to Crunchbase, LinkedIn, Bloomberg, Open Corporates. Each cross-reference adds a signal. Entities with three or more strong cross-references are confidently disambiguated. Entities with none rely entirely on the model's training-data memory.
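The cross-reference count can be shown with a toy tally. The three-signal threshold comes from the passage; the hostname list is an assumption about which domains count as registries:

```python
# Toy tally of registry cross-references. The three-signal threshold is the
# one the passage names; the hostname list is an assumption.
KNOWN_REGISTRIES = (
    "wikidata.org", "wikipedia.org", "crunchbase.com",
    "linkedin.com", "bloomberg.com", "opencorporates.com",
)

def registry_signals(same_as_links):
    """Count sameAs links that point at a known entity registry."""
    return sum(
        any(host in link for host in KNOWN_REGISTRIES)
        for link in same_as_links
    )

links = [
    "https://www.wikidata.org/wiki/Q00000000",
    "https://www.linkedin.com/company/example-consulting",
    "https://www.crunchbase.com/organization/example-consulting",
    "https://example.com/about",  # own domain, not a registry
]
print(registry_signals(links))       # 3
print(registry_signals(links) >= 3)  # confidently disambiguated
```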
The model checks whether the brand's name, address, founder, and category are consistent across reputable third-party mentions. Inconsistencies (a different city on Crunchbase, a misspelled founder name on a directory listing) lower the confidence score even if the page itself is clean. Consistency is the cheapest and most-ignored signal.
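A consistency check of this kind reduces to diffing the same fields across listings. A hedged sketch with made-up listing data:

```python
# Hedged sketch of a name-address-style consistency diff; listing data is made up.
def inconsistencies(listings):
    """Return fields whose (normalized) values disagree across listings."""
    fields = {}
    for source, record in listings.items():
        for field, value in record.items():
            fields.setdefault(field, {})[source] = value.strip().lower()
    return {
        field: by_source
        for field, by_source in fields.items()
        if len(set(by_source.values())) > 1
    }

listings = {
    "site":       {"name": "Example Consulting", "city": "Austin"},
    "crunchbase": {"name": "Example Consulting", "city": "Dallas"},  # stale city
    "directory":  {"name": "Example Consulting", "city": "Austin"},
}
print(inconsistencies(listings))  # only 'city' disagrees
```

The fix is equally mechanical: correct the stale listing at the source, then the diff comes back empty.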
The model checks whether the page's author is a known entity with their own @id, and whether the source (the publisher) is known. A page with a Person @id author tied to a stable bio across the site, plus an Organization @id publisher tied to the home-page entity, scores higher than an anonymous page with no author.
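What that looks like in markup: an Article whose author and publisher both resolve by @id. Hypothetical names and URLs throughout:

```python
import json

# Hypothetical Article markup: author and publisher both resolve by @id.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {
        "@type": "Person",
        "@id": "https://example.com/team/jane-doe#person",  # stable Person @id
        "name": "Jane Doe",
        "url": "https://example.com/team/jane-doe",         # single bio page
    },
    "publisher": {
        "@id": "https://example.com/#organization",  # home-page Organization
    },
}

print(json.dumps(article, indent=2))
```

Note that `publisher` carries only an `@id` reference: the full Organization node lives once, on the home page, and everything else points at it.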
The four steps run on every retrieval. Improvements compound: a brand that fixes @id resolution and adds Wikidata in the same quarter sees the citation rate move on subsequent queries, not on the queries that already ran.
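One way to picture the compounding is a toy additive score over the four checks. The integer weights and the threshold below are illustrative inventions, not a documented ranking formula:

```python
# Toy additive score over the four checks. Weights out of 100 and the
# threshold are illustrative inventions, not a documented ranking formula.
WEIGHTS = {
    "id_resolves": 35,          # step 1: @id resolution
    "registry_refs": 30,        # step 2: Wikidata and sameAs cross-references
    "consistent_listings": 20,  # step 3: third-party consistency
    "known_author": 15,         # step 4: author and publisher entities
}
THRESHOLD = 60

def citation_confidence(signals):
    """Sum the weights of every check the brand currently passes."""
    return sum(weight for key, weight in WEIGHTS.items() if signals[key])

before = citation_confidence({"id_resolves": False, "registry_refs": False,
                              "consistent_listings": True, "known_author": True})
after = citation_confidence({"id_resolves": True, "registry_refs": True,
                             "consistent_listings": True, "known_author": True})
print(before, before >= THRESHOLD)  # 35 False -- dropped from the answer
print(after, after >= THRESHOLD)    # 100 True -- cited on subsequent queries
```

The point of the sketch is only the shape: fixing two checks in the same quarter moves one score across one threshold, which is why the improvements compound instead of adding up linearly in visible citations.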
Section 05 · Common misunderstandings
“Entity clarity is just better branding.”
Branding is for humans. Entity clarity is for retrieval layers. A brand can have great branding (recognizable logo, consistent tone, strong recall) and zero entity clarity (no schema @id, no Wikidata, contradictory addresses across listings). The two surfaces do not overlap. Branding work does not produce citations on its own.
“If we're a real business, the AI knows we're a real business.”
Real businesses with bad entity hygiene look identical to fake businesses to a retrieval layer. The model cannot independently verify that an LLC filing exists. The model checks the signals it has access to, which is mostly schema, cross-references, and consistency. A real business that has not invested in those signals scores like an unknown one.
“We have schema. That's entity clarity.”
Schema without @id cross-references is data without identity. A page can have Organization, Article, and BreadcrumbList schema and still produce no entity clarity if the @id values are inconsistent or missing entirely. The work is in the cross-references, not the presence of schema blocks.
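The difference between "has schema" and "has identity" can be made concrete with a walker that asks whether every block declares or references one canonical @id. A sketch under hypothetical data, not a validator:

```python
# Sketch, not a validator: does every schema block on a page declare or
# reference one canonical @id? Block data and URLs are hypothetical.
def ids_in(node):
    """Yield every @id found anywhere inside a JSON-LD node."""
    if isinstance(node, dict):
        if "@id" in node:
            yield node["@id"]
        for value in node.values():
            yield from ids_in(value)
    elif isinstance(node, list):
        for item in node:
            yield from ids_in(item)

def entity_connected(blocks, canonical_id):
    """True when each block carries or points at the canonical entity."""
    return all(canonical_id in set(ids_in(block)) for block in blocks)

# Schema present, identity absent: no block mentions the canonical @id.
disconnected = [
    {"@type": "Organization", "name": "Example Consulting"},
    {"@type": "Article", "headline": "Post"},
]
# Same types, now stitched together through @id cross-references.
connected = [
    {"@type": "Organization", "@id": "https://example.com/#organization"},
    {"@type": "Article", "headline": "Post",
     "publisher": {"@id": "https://example.com/#organization"}},
]
print(entity_connected(disconnected, "https://example.com/#organization"))  # False
print(entity_connected(connected, "https://example.com/#organization"))     # True
```

Both pages would pass a schema-presence audit; only the second produces an entity.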
“Wikipedia and Wikidata are for big brands.”
Wikidata accepts entries for any verifiable entity with reliable sources. Most operators running B2B services or e-commerce at scale qualify. The work to create and maintain a Wikidata entry takes hours, not weeks, and the cross-reference value compounds across every AI surface for years. The cost-to-impact ratio is the highest of any AI search work.
“Sharing a name with three other companies is fine.”
It is fine for direct search where the buyer types the URL. It is not fine for AI search where the model has to choose. When two entities with the same name exist, the model cites the one with stronger disambiguation signals. The other one becomes a footnote. Operators sharing a name without doing the disambiguation work are subsidizing the competitor that did.
Section 06 · Diagnostic questions
Does the brand have a Wikidata entry with sameAs cross-references to its own domain, Crunchbase, LinkedIn, and any other authoritative directory?
Are Organization @id values consistent across every page that uses Organization schema, and do they all resolve to the same canonical fragment URL?
Are author bylines tied to Person @id values that resolve to a single bio page with stable URL, photo, and credentials?
Does the brand share a name (or a near-identical spelling) with another company in any related category, and which entity currently wins disambiguation in AI answers?
Are name, address, phone, and founder consistent across all third-party listings (Crunchbase, LinkedIn, Bloomberg, Open Corporates, Google Business)?
Do the llms.txt summary, the home-page hero copy, and the Organization schema description tell the model the same story about what the brand is?
Are products and services tagged with DefinedTerm or Product schema where appropriate, with @id cross-references that survive across the catalog?
Section 07 · Related Atlas entries
Section 08 · Five Cents
There is a cost to having a name shared with three other companies in the same category, and the cost is now showing up in AI answers. The model cannot tell you apart, so it cites the safer one. Safer means stronger disambiguation, more cross-references, a Wikidata entry that confirms what the brand is, an author byline tied to a real person with a real bio. I have looked at AI answers where the brand we worked on was nowhere in the response and a smaller competitor with cleaner entity hygiene was named twice. The brand was real. The signals were not. The fix is not louder marketing. The fix is structural identity the retrieval layer can confirm without guessing.
Section 09 · Sources