A practitioner's guide to getting cited by ChatGPT, Perplexity, and Claude — the exact AEO/GEO changes we shipped on BenchLM: quotable lines, Dataset schema, llms.txt, AI-crawler access, and the tooling we use to find what to answer.
We run a benchmark site, which means our entire business is being the source that gets cited. By May, referrers from ChatGPT, Claude, and Perplexity were showing up in our analytics right next to Google and Bing. This post documents exactly what we shipped on BenchLM to make that happen — and every item is checkable on the live site, because we'd rather show the receipts than theorize.
This is the practitioner's version, not the thought-leadership version. If you want the short version of the tooling: we use outrank.so to find which questions assistants get asked in our niche and to track whether our pages are the ones being lifted into the answers — readers get 10% off the first month with code BENCHLM. (Partner link — it never affects our rankings.) The rest of this post is the structural work that makes those pages citable in the first place.
Before any clever content work, check the boring thing: can the AI crawlers actually reach your site? The single most common reason for zero AI citations is that they can't. You can write the most liftable page on the internet and get cited zero times because a line in robots.txt quietly blocks the crawler that would have quoted it.
There are two kinds of AI bot, and they do different jobs:
Allow both, for every provider you care about. The asymmetry that trips people up: a default "block AI scrapers" rule that felt prudent in 2024 is, in 2026, a rule that makes you invisible to the fastest-growing referral channel on the web. Audit robots.txt first; everything below is wasted effort behind a closed door.
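One way to run that audit is with Python's standard `urllib.robotparser`. This is a minimal sketch, not our production check. The user-agent tokens listed are assumptions based on what providers have published, so confirm them against each provider's current documentation. `SAMPLE` and the `example.com` URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI user-agent tokens (training crawlers and retrieval fetchers).
# Verify this list against each provider's current documentation.
AI_USER_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
]


def blocked_agents(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Placeholder robots.txt: blocks one AI crawler, allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE, "https://example.com/coding"))  # ['GPTBot']
```

To check a live site, fetch its `/robots.txt` and pass the text in. An empty list means none of the listed agents is blocked for that URL.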
Assistants don't cite pages; they lift sentences. The single highest-leverage content change we made was making every ranking page open with one self-contained, dated, numeric sentence:
As of June 2026, the top coding model on the BenchLM leaderboard is [model] with a weighted coding score of [score].
That sentence is generated from the live leaderboard at render time, so it can't go stale. When an assistant retrieves the page, the answer to "what's the best coding LLM" is sitting in the first 200 bytes of content, with a date that signals freshness and a number that signals specificity. You can see the live version on /coding, /agentic, and every category page.
The pattern, distilled: claim + number + date + entity, in one sentence, no pronouns — and put it in the first 100 words, because that's where extractors look first. If your key fact needs the surrounding paragraph to parse — if it says "it leads" instead of naming the model — it won't get lifted, because the assistant can't quote it without dragging in context it doesn't trust. Write the sentence you want to see quoted back to you.
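The render-time generation can be sketched as follows. This is an illustration under assumptions, not the BenchLM implementation: the row shape (`model`, `score`), the function name `quotable_line`, and the model names and scores are all hypothetical.

```python
from datetime import date


def quotable_line(category: str, rows: list[dict], today: date) -> str:
    """Build one self-contained sentence: claim + number + date + entity."""
    top = max(rows, key=lambda row: row["score"])  # highest score wins
    as_of = today.strftime("%B %Y")                # e.g. "June 2026"
    return (
        f"As of {as_of}, the top {category} model on the BenchLM leaderboard "
        f"is {top['model']} with a weighted {category} score of {top['score']}."
    )


# Hypothetical leaderboard rows; real data would come from the live dataset.
rows = [
    {"model": "model-a", "score": 81.2},
    {"model": "model-b", "score": 79.5},
]
print(quotable_line("coding", rows, date(2026, 6, 1)))
```

Because the sentence is computed from the same rows that render the table, the named model and the number cannot disagree with the rest of the page.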
You'll see three acronyms thrown around, and the turf war between them is mostly noise. Here's the honest mapping:
In practice we don't run three programs. We run one: rank the page (SEO), make it liftable (AEO), and build the authority that makes the lift trustworthy (GEO). Treat anyone selling them as rival disciplines with suspicion.
Assistants are biased toward sources that look maintained, and they can read dates. The most-cited pages are overwhelmingly ones updated recently. Three things we ship on every ranking page:
dateModified in structured data, matching the visible date exactly. When the human-readable date and the machine-readable date agree, you look maintained. When they disagree, you look broken.
updated field on evergreen posts, shown next to the publish date. In our experience a March post visibly updated in June out-retrieves a thin June post with no history: answer engines reward the maintained source over the merely recent one.
Most sites stop at Article markup. Two schema types do the heavy lifting for citations:
Dataset. The thing that distinguishes a data site is schema.org/Dataset: name, description, license, update cadence, and a distribution block pointing at machine-readable files. We ship it on the homepage and on a dedicated /data page. That page exists for crawlers — humans almost never visit it, yet it's one of the most-fetched URLs by bot user-agents on the whole site, because it's the clean front door that tells a machine "here is the structured, licensed, dated thing you can quote with confidence."
FAQPage. FAQ markup carries one of the highest individual signal weights for citation, but only if you do it right: phrase each question the way people actually ask assistants, match the question text character-for-character with a visible heading, and keep each answer to roughly 40–60 words. That length is not arbitrary — it's about the size of passage an assistant lifts whole. We write the questions ("What is the best LLM for coding?", "How much does GPT-5.4 cost?") as the exact retrieval unit, then answer each in one self-contained paragraph.
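Both schema types can be emitted as JSON-LD from a few lines of code. The sketch below assumes placeholder names, URLs, and a placeholder license. It uses standard schema.org properties (`Dataset`, `DataDownload`, `FAQPage`, `Question`, `Answer`). Rejecting answers outside 40 to 60 words is our illustrative choice for enforcing the guideline above, not a schema requirement.

```python
import json


def dataset_jsonld(name, description, license_url, date_modified, files):
    """Build a schema.org/Dataset object; files is a list of (format, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "dateModified": date_modified,
        "distribution": [
            {"@type": "DataDownload", "encodingFormat": fmt, "contentUrl": url}
            for fmt, url in files
        ],
    }


def faq_jsonld(pairs, min_words=40, max_words=60):
    """Build a schema.org/FAQPage object from (question, answer) pairs."""
    entities = []
    for question, answer in pairs:
        words = len(answer.split())
        if not min_words <= words <= max_words:
            raise ValueError(f"{question!r}: answer has {words} words")
        entities.append({
            "@type": "Question",
            "name": question,  # should match the visible heading exactly
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    return {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities}


# Placeholder values; swap in the real name, license, and file URLs.
dataset = dataset_jsonld(
    name="Example LLM leaderboard",
    description="Weighted benchmark scores per model.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    date_modified="2026-06-01",
    files=[("application/json", "https://example.com/data/leaderboard.json")],
)
print(json.dumps(dataset, indent=2))
```

The output goes inside a `<script type="application/ld+json">` tag on the page. Generating it from the same data that renders the page keeps `dateModified` and the visible date in agreement.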
Everything rendered on BenchLM is also available as JSON under /data, described in llms.txt and llms-full.txt. Sites that ship llms.txt tend to get cited more often, for three concrete reasons:
The three big answer engines don't behave the same, and it changes where you put effort. ChatGPT has the most users but pulls heavily from Bing's index, so classic ranking matters most there. Perplexity drives the most clickable citations — actual referral traffic — so it's the one to watch in analytics. Claude has the highest mention rate but cites more conservatively. Measure all three: watch for chatgpt.com, perplexity.ai, and claude.ai referrers, and track whether you're the cited source for your target questions, not just whether traffic is up.
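Counting those referrers from raw logs takes little code. A minimal sketch, assuming you already have a list of referrer URLs: the three domains come from the paragraph above, while the function names and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Referrer domains named above, mapped to a display name.
AI_REFERRER_DOMAINS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
}


def classify_referrer(referrer: str):
    """Return the answer engine for a referrer URL, or None if it is not one."""
    host = (urlsplit(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRER_DOMAINS.items():
        # Match the bare domain and its subdomains, but not lookalike hosts.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_ai_referrals(referrers) -> Counter:
    """Tally referrals per answer engine, ignoring everything else."""
    return Counter(e for e in map(classify_referrer, referrers) if e)


sample = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=best+coding+llm",
    "https://www.google.com/",
    "https://claude.ai/chat",
]
print(count_ai_referrals(sample))
```

This only measures clicks. Whether you are the cited source for a target question still has to be checked against the answers themselves.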
Everything above is plumbing you build once and maintain. The ongoing work — the part that never finishes — is knowing which questions to be the answer to, and watching whether you actually get cited. A few categories of tool help:
| Tool | What it does best | Note |
|---|---|---|
| outrank.so | Finds target questions + tracks whether you're cited | What we use; 10% off with code BENCHLM |
| Profound | Enterprise AI-visibility monitoring across engines | Strong for big brand tracking |
| Writesonic / Jasper | AI content drafting with some AEO features | Drafting-first, not measurement-first |
We use outrank.so because it's built around the AEO loop specifically — it surfaces the queries assistants are getting asked in your niche, flags where you're not yet the cited source, and tracks whether your published pages are the ones being lifted over time. Several posts on this blog get their target-question lists from it. (Partner link — it never affects our scores, rankings, or coverage.)
The rest of our stack is deliberately unglamorous: a build step that regenerates every data-driven claim from the live dataset so nothing drifts, and an audit script that diffs published claims against current data and fails the build on any mismatch. AEO content that contradicts your own live numbers is worse than no content — it teaches the assistant your source is unreliable.
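The audit step can be as small as a dictionary diff. This is a sketch of the idea rather than our actual script: the claim keys, the values, and the `audit` function are hypothetical, and a real version would load published claims from rendered pages and live values from the dataset.

```python
import sys


def find_mismatches(published: dict, live: dict) -> list[str]:
    """Compare each published claim with the live value for the same key."""
    problems = []
    for key, claimed in published.items():
        if key not in live:
            problems.append(f"{key}: claim has no live value")
        elif live[key] != claimed:
            problems.append(f"{key}: published {claimed!r}, live {live[key]!r}")
    return problems


def audit(published: dict, live: dict) -> int:
    """Print mismatches to stderr and return a process exit code (0 = clean)."""
    problems = find_mismatches(published, live)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# Hypothetical claims and data; the score changed after the post was written.
published = {"coding.top_model": "model-a", "coding.top_score": 81.2}
live = {"coding.top_model": "model-a", "coding.top_score": 82.0}
exit_code = audit(published, live)
print("exit code:", exit_code)
```

In a build pipeline the return value would be passed to `sys.exit`, so any nonzero result stops the deploy.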
Getting cited by ChatGPT, Perplexity, and Claude is mostly being reachable, liftable, fresh, and verifiable, in that order. Let the crawlers in, rank in classic search, put one quotable dated sentence where the extractor reads first, ship Dataset and FAQ schema, and run a feedback loop that tells you which question to own next. Most of that is a weekend of plumbing. The feedback loop is the part you keep running.
Some links are affiliate links; they never affect scores, rankings, or coverage. See our affiliate disclosure.