How to Get Cited by ChatGPT, Perplexity & Claude in 2026: AEO from the Trenches

A practitioner's guide to getting cited by ChatGPT, Perplexity, and Claude — the exact AEO/GEO changes we shipped on BenchLM: quotable lines, Dataset schema, llms.txt, AI-crawler access, and the tooling we use to find what to answer.

Glevd · Published June 12, 2026 · 12 min read

We run a benchmark site, which means our entire business is being the source that gets cited. By May, referrers from ChatGPT, Claude, and Perplexity were showing up in our analytics right next to Google and Bing. This post documents exactly what we shipped on BenchLM to make that happen — and every item is checkable on the live site, because we'd rather show the receipts than theorize.

This is the practitioner's version, not the thought-leadership version. If you want the short version of the tooling: we use outrank.so to find which questions assistants get asked in our niche and to track whether our pages are the ones being lifted into the answers — readers get 10% off the first month with code BENCHLM. (Partner link — it never affects our rankings.) The rest of this post is the structural work that makes those pages citable in the first place.

First, let the bots in (the step everyone skips)

Before any clever content work, check the boring thing: can the AI crawlers actually reach your site? The single most common reason for zero AI citations is that they can't. You can write the most liftable page on the internet and get cited zero times because a line in robots.txt quietly blocks the crawler that would have quoted it.

There are two kinds of AI bot, and they do different jobs:

  • Training crawlers: GPTBot, ClaudeBot, Google-Extended (strictly a robots.txt control token rather than a separate crawler). These feed the model's background knowledge of you. Block them and the model learns about you only secondhand, from whatever other sites say, if at all.
  • Live search / RAG crawlers: OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot. These fetch pages at query time to answer a live prompt. Block these and you get no citations even if the model knows you, because it can't pull a current page to quote.

Allow both, for every provider you care about. The asymmetry that trips people up: a default "block AI scrapers" rule that felt prudent in 2024 is, in 2026, a rule that makes you invisible to the fastest-growing referral channel on the web. Audit robots.txt first; everything below is wasted effort behind a closed door.
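
One way to run that audit is a few lines of Python using the standard library's robots.txt parser. This is a minimal sketch, not our production check: the agent list is the one named in this post, the sample robots.txt and the example.com URL are hypothetical, and the standard-library matcher may not mirror every crawler's own matching rules exactly.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this post; confirm them against each provider's docs.
AI_AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended",          # training
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",  # live search / RAG
    "Claude-SearchBot",
]


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt with a leftover "block AI scrapers" rule.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # prints ['GPTBot']
```

An empty list means every agent in the list can reach the URL. Run it against the paths you actually want cited, not just the homepage.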

The unit of AEO is the quotable sentence

Assistants don't cite pages; they lift sentences. The single highest-leverage content change we made was making every ranking page open with one self-contained, dated, numeric sentence:

As of June 2026, the top coding model on the BenchLM leaderboard is [model] with a weighted coding score of [score].

That sentence is generated from the live leaderboard at render time, so it can't drift stale. When an assistant retrieves the page, the answer to "what's the best coding LLM" is sitting in the first 200 bytes of content, with a date that signals freshness and a number that signals specificity. You can see the live version on /coding, /agentic, and every category page.

The pattern, distilled: claim + number + date + entity, in one sentence, no pronouns — and put it in the first 100 words, because that's where extractors look first. If your key fact needs the surrounding paragraph to parse — if it says "it leads" instead of naming the model — it won't get lifted, because the assistant can't quote it without dragging in context it doesn't trust. Write the sentence you want to see quoted back to you.
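
The render-time generation can be sketched in a few lines. This is an illustration under assumptions, not the BenchLM build code: the row structure, the function name, and the placeholder models and scores are all hypothetical.

```python
from datetime import date


def quotable_sentence(category: str, model: str, score: float, as_of: date) -> str:
    """Build one self-contained sentence: claim + number + date + entity."""
    return (
        f"As of {as_of:%B %Y}, the top {category} model on the BenchLM leaderboard "
        f"is {model} with a weighted {category} score of {score:.1f}."
    )


# Hypothetical leaderboard rows; model names and scores are placeholders.
rows = [
    {"model": "Model A", "score": 71.3},
    {"model": "Model B", "score": 68.9},
]
leader = max(rows, key=lambda row: row["score"])
sentence = quotable_sentence("coding", leader["model"], leader["score"], date(2026, 6, 12))
print(sentence)
```

Because the sentence is derived from the same rows that render the table, the claim and the data cannot disagree.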

AEO vs GEO vs SEO: same goal, three layers

You'll see three acronyms thrown around, and the turf war between them is mostly noise. Here's the honest mapping:

  • SEO gets you into the index. It still gates everything — roughly 87% of ChatGPT's citations overlap with Bing's top 10 results, so if you don't rank, you don't get cited. Classic search optimization is the floor, not the ceiling.
  • AEO (Answer Engine Optimization) engineers the page: the quotable passage, the schema, the FAQ phrasing, the freshness signals. It gives the engine something clean to quote.
  • GEO (Generative Engine Optimization) engineers the ecosystem: domain authority, links, entity consistency across the web. It gives the engine a reason to trust the source. Domain traffic is the strongest single predictor of citations (high-traffic sites earn roughly 3x more citations than low-traffic ones), which is really a GEO signal wearing an AEO hat.

In practice we don't run three programs. We run one: rank the page (SEO), make it liftable (AEO), and build the authority that makes the lift trustworthy (GEO). Treat anyone selling them as rival disciplines with suspicion.

Make freshness visible and machine-readable

Assistants are biased toward sources that look maintained, and they can read dates. The most-cited pages are overwhelmingly ones updated recently. Three things we ship on every ranking page:

  • Visible "Last verified: [date]" tied to the actual data sync, not a vanity timestamp that updates on every deploy. If the date is real, it stays honest; if it's theater, it eventually contradicts the content.
  • dateModified in structured data, matching the visible date exactly. When the human-readable date and the machine-readable date agree, you look maintained. When they disagree, you look broken.
  • An updated field on evergreen posts, shown next to the publish date. In our experience a March post visibly updated in June out-retrieves a thin June post with no history — answer engines reward the maintained source over the merely recent one.
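
The agreement between the visible date and dateModified is easy to test mechanically. The sketch below assumes the label is rendered as "Last verified: June 12, 2026" and that dateModified is an ISO 8601 string; both formats and the function names are assumptions for illustration.

```python
import json
import re
from datetime import date, datetime
from typing import Optional


def visible_last_verified(page_text: str) -> Optional[date]:
    """Parse a 'Last verified: June 12, 2026' label out of rendered page text."""
    match = re.search(r"Last verified:\s*([A-Z][a-z]+ \d{1,2}, \d{4})", page_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%B %d, %Y").date()


def dates_agree(page_text: str, json_ld: str) -> bool:
    """True when the visible date equals dateModified in the structured data."""
    modified = json.loads(json_ld).get("dateModified")
    visible = visible_last_verified(page_text)
    if modified is None or visible is None:
        return False
    # Compare calendar dates only; a time component in dateModified is ignored.
    return date.fromisoformat(modified[:10]) == visible


print(dates_agree("Last verified: June 12, 2026", '{"dateModified": "2026-06-12"}'))  # prints True
```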

Structured data: Dataset and FAQ, not just Article

Most sites stop at Article markup. Two schema types do the heavy lifting for citations:

Dataset. The thing that distinguishes a data site is schema.org/Dataset: name, description, license, update cadence, and a distribution block pointing at machine-readable files. We ship it on the homepage and on a dedicated /data page. That page exists for crawlers — humans almost never visit it, yet it's one of the most-fetched URLs by bot user-agents on the whole site, because it's the clean front door that tells a machine "here is the structured, licensed, dated thing you can quote with confidence."
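
For reference, a Dataset block of that shape can be assembled like this. It is a sketch using schema.org property names we are confident exist (name, description, license, dateModified, distribution with DataDownload); the URLs and values are placeholders, and the update-cadence field is left out because the property used for it is not specified here.

```python
import json


def dataset_jsonld(name, description, license_url, date_modified, files):
    """Build a schema.org/Dataset object; files is a list of (MIME type, URL) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "dateModified": date_modified,
        "distribution": [
            {"@type": "DataDownload", "encodingFormat": fmt, "contentUrl": url}
            for fmt, url in files
        ],
    }


# Placeholder values; example.com stands in for the real site.
block = dataset_jsonld(
    name="Example LLM leaderboard",
    description="Weighted benchmark scores per model and category.",
    license_url="https://example.com/license",
    date_modified="2026-06-12",
    files=[("application/json", "https://example.com/data/leaderboard.json")],
)
print(json.dumps(block, indent=2))
```

The printed JSON goes inside a script tag of type application/ld+json in the page head.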

FAQPage. FAQ markup carries one of the highest individual signal weights for citation, but only if you do it right: phrase each question the way people actually ask assistants, match the question text character-for-character with a visible heading, and keep each answer to roughly 40–60 words. That length is not arbitrary — it's about the size of passage an assistant lifts whole. We write the questions ("What is the best LLM for coding?", "How much does GPT-5.4 cost?") as the exact retrieval unit, then answer each in one self-contained paragraph.
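
Those two constraints (exact heading match, 40 to 60 words per answer) can be linted before the markup is emitted. A minimal sketch, with hypothetical function names and a placeholder answer standing in for real copy:

```python
def faq_problems(pairs, visible_headings, min_words=40, max_words=60):
    """List the ways each (question, answer) pair breaks the two rules above."""
    problems = []
    for question, answer in pairs:
        if question not in visible_headings:
            problems.append(f"no visible heading matches exactly: {question!r}")
        words = len(answer.split())
        if not min_words <= words <= max_words:
            problems.append(f"answer to {question!r} has {words} words")
    return problems


def faq_jsonld(pairs):
    """Build a schema.org/FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# The answer text is a placeholder, so the linter flags its length.
pairs = [("What is the best LLM for coding?", "Placeholder answer text.")]
print(faq_problems(pairs, visible_headings={"What is the best LLM for coding?"}))
```

Counting words by whitespace is crude but matches how the 40 to 60 range is meant: a passage short enough to be lifted whole.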

llms.txt and machine-readable mirrors

Everything rendered on BenchLM is also available as JSON under /data, described in llms.txt and llms-full.txt. Sites that ship llms.txt tend to get cited more often, for three concrete reasons:

  1. Assistants with browsing tools parse JSON more reliably than HTML tables. A leaderboard that renders beautifully for humans is a parsing gamble for a crawler; the same data as JSON is unambiguous.
  2. Researchers and aggregators who use the files link back — and links remain the trust signal answer engines inherit from classic search. Machine-readable data is link bait for exactly the audience whose links count.
  3. Verifiable sources get cited over plausible ones. When your numbers are downloadable and dated, an assistant can treat them as fact rather than as one more opinion to hedge around.
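
If you have not written an llms.txt before, the file is plain Markdown. The generator below follows the layout of the public llms.txt proposal as we understand it (a title, a one-line summary in a blockquote, then sections of annotated links); the section names, URLs, and notes are placeholders, not the contents of our actual file.

```python
def render_llms_txt(site, summary, sections):
    """Render llms.txt text; sections maps a heading to (title, url, note) triples."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; example.com stands in for the real site.
text = render_llms_txt(
    site="Example Benchmarks",
    summary="Benchmark scores for large language models, updated from live data.",
    sections={
        "Data": [
            ("Leaderboard JSON", "https://example.com/data/leaderboard.json",
             "current scores per model and category"),
        ],
    },
)
print(text)
```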

Which engine, and what to measure

The three big answer engines don't behave the same, and it changes where you put effort. ChatGPT has the most users but pulls heavily from Bing's index, so classic ranking matters most there. Perplexity drives the most clickable citations — actual referral traffic — so it's the one to watch in analytics. Claude has the highest mention rate but cites more conservatively. Measure all three: watch for chatgpt.com, perplexity.ai, and claude.ai referrers, and track whether you're the cited source for your target questions, not just whether traffic is up.
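
Bucketing referrers by engine is a small job if your analytics tool exports raw referrer URLs. A sketch, assuming a plain list of referrer strings; the three hostnames are the ones named above, everything else is illustrative.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

ENGINE_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
}


def engine_for(referrer: str) -> Optional[str]:
    """Map a referrer URL to an answer engine, or None for everything else."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in ENGINE_HOSTS.items():
        # Match the domain itself or any subdomain, but not lookalike hosts.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_engines(referrers) -> Counter:
    """Count visits per answer engine, ignoring all other referrers."""
    return Counter(e for e in map(engine_for, referrers) if e is not None)


# Hypothetical log sample.
sample = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=best+coding+llm",
    "https://www.google.com/",
    "https://perplexity.ai/",
]
print(count_engines(sample))
```

This only measures clicks. Whether you are the cited source for a target question still has to be checked against the answers themselves.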

The tools, compared

Everything above is plumbing you build once and maintain. The ongoing work — the part that never finishes — is knowing which questions to be the answer to, and watching whether you actually get cited. A few categories of tool help:

| Tool | What it does best | Note |
|---|---|---|
| outrank.so | Finds target questions + tracks whether you're cited | What we use; 10% off with code BENCHLM |
| Profound | Enterprise AI-visibility monitoring across engines | Strong for big brand tracking |
| Writesonic / Jasper | AI content drafting with some AEO features | Drafting-first, not measurement-first |
We use outrank.so because it's built around the AEO loop specifically — it surfaces the queries assistants are getting asked in your niche, flags where you're not yet the cited source, and tracks whether your published pages are the ones being lifted over time. Several posts on this blog get their target-question lists from it. (Partner link — it never affects our scores, rankings, or coverage.)

The rest of our stack is deliberately unglamorous: a build step that regenerates every data-driven claim from the live dataset so nothing drifts, and an audit script that diffs published claims against current data and fails the build on any mismatch. AEO content that contradicts your own live numbers is worse than no content — it teaches the assistant your source is unreliable.
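
An audit of that kind can be quite small. The sketch below is not our actual script: it assumes published claims follow the sentence pattern shown earlier in this post, that live data is a dict keyed by category, and that scores are published to one decimal place. All names are hypothetical.

```python
import re

# Matches "... is <model> with a weighted <category> score of <score>."
CLAIM = re.compile(
    r"is (?P<model>.+?) with a weighted (?P<category>\w+) "
    r"score of (?P<score>\d+(?:\.\d+)?)"
)


def audit(published: dict, live: dict) -> list:
    """Compare each published sentence with live data; return mismatch messages."""
    mismatches = []
    for path, sentence in published.items():
        match = CLAIM.search(sentence)
        if match is None:
            mismatches.append(f"{path}: no parseable claim")
            continue
        current = live.get(match["category"])
        if current is None:
            mismatches.append(f"{path}: unknown category {match['category']}")
            continue
        claimed = (match["model"], float(match["score"]))
        actual = (current["model"], round(current["score"], 1))
        if claimed != actual:
            mismatches.append(f"{path}: published {claimed}, live {actual}")
    return mismatches


# Placeholder data: the published page is stale against the live leaderboard.
published = {
    "/coding": "As of June 2026, the top coding model on the BenchLM "
               "leaderboard is Model A with a weighted coding score of 71.3.",
}
live = {"coding": {"model": "Model B", "score": 72.0}}
print(audit(published, live))
```

In a build pipeline, a non-empty result would translate into a non-zero exit code so the deploy stops.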

The checklist

  • Let the AI crawlers in — allow GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, PerplexityBot, Google-Extended in robots.txt
  • Rank in classic search first — ~87% of ChatGPT citations overlap Bing's top 10
  • One quotable, dated, numeric sentence in the first 100 words of every key page
  • Visible "Last verified" dates tied to real syncs, matching dateModified
  • Dataset schema with license and distribution, plus FAQPage with 40–60-word answers
  • A stable /data page and llms.txt describing your machine-readable facts
  • Freshness automation — facts regenerate from source, audits catch drift
  • A way to find which questions to answer next, and to check whether you're cited

The bottom line

Getting cited by ChatGPT, Perplexity, and Claude is mostly being reachable, liftable, fresh, and verifiable — in that order. Let the crawlers in, rank in classic search, put one quotable dated sentence where the extractor reads first, ship Dataset and FAQ schema, and run a feedback loop that tells you which question to own next. Most of that is a weekend of plumbing. The feedback loop is the part you keep running.

Some links are affiliate links; they never affect scores, rankings, or coverage. See our affiliate disclosure.
