We've spent the last six months pulling apart pages that get cited in Google AI Overviews, Perplexity, ChatGPT Search, and Gemini, and comparing them to pages that don't. Same site, same topic, sometimes the same author. One page gets pulled into the answer box. The other never shows up.
What surprised us: it's almost never about the writing being "better." The cited pages are structurally different. They have specific shapes that make them easy for an LLM to lift, attribute, and trust.
Below are the 7 patterns we keep seeing. Not "write good content" - specific structural moves you can ship this week, with examples for each. If your site already ranks on Google but you're invisible in AI answers, this is usually why.
Pattern 1: Direct-answer paragraphs (question as H2, answer in the next sentence)
What it is: The H2 is phrased as the literal question a user would ask. The very first sentence under that H2 is the answer, in plain language, in 25 to 60 words. Supporting detail follows in the next paragraphs.
Why it works: LLMs scan for question-answer pairs. When the model extracts an answer to surface in a citation card, it needs a chunk that stands on its own. If the answer is buried three paragraphs in, the model will either skip the page entirely or pull a weaker chunk from a competitor that front-loaded theirs.
A 2026 Ahrefs analysis of 4 million AI Overview URLs found that only 38% of cited pages also rank in the top 10 for the same query, down from 76% seven months earlier. Citation isn't a function of overall ranking. It's a function of whether your page has a liftable answer.
Example. Bad version, written like a normal article intro:
Many homeowners ask about gutter guards, and there's a lot of confusion about whether they actually work. In this post we'll walk through everything you need to know about the different types of gutter guards on the market today...
Good version, written for citation:
Do gutter guards actually work?
Yes - quality gutter guards reduce gutter cleanings by about 80% and prevent most clogs from leaves, twigs, and shingle grit. They don't eliminate maintenance entirely. Pine needles, seed pods, and roof debris still require an annual rinse. Cheap plastic mesh guards underperform; micro-mesh stainless guards are the category that actually delivers.
H2 is the search query. First sentence is the answer. Nuance comes after. The model can lift the first 40 words and have a complete, accurate citation.
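In markup, the pattern is nothing exotic - a question H2 followed immediately by the answer paragraph. A minimal HTML sketch using the gutter-guard example above:

```html
<!-- Question as H2: phrased exactly as a user would ask it -->
<h2>Do gutter guards actually work?</h2>

<!-- First sentence under the H2 is the complete answer, 25 to 60 words -->
<p>Yes - quality gutter guards reduce gutter cleanings by about 80% and
prevent most clogs from leaves, twigs, and shingle grit.</p>

<!-- Nuance and supporting detail come after, in their own paragraphs -->
<p>They don't eliminate maintenance entirely. Pine needles, seed pods, and
roof debris still require an annual rinse.</p>
```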
Pattern 2: Named entity density
What it is: People, places, products, and organizations mentioned by their full proper name, with links to authoritative sources for the first mention of each. Not "a leading practice management software" - "Jane App." Not "the FDA" on first reference - "the U.S. Food and Drug Administration (FDA)."
Why it works: LLMs build internal entity graphs. When content names entities precisely and links them to canonical sources (Wikipedia, official sites, government databases), the model gets a stronger signal that the page is about real things in the real world. Vague content - "industry leaders," "top tools," "regulatory bodies" - reads as low-evidence and gets discounted.
For local businesses, this means: name your city in the body copy. Name your competitors by name when you compare. Name the schools your team graduated from and the certification bodies they're credentialed by. Most small-business sites read like Mad Libs because the writer was afraid of being too specific. The result is content no LLM has any reason to cite.
Example. A physical therapy clinic page that says "we treat runners and athletes" gets ignored. A page that says "we treat runners training for the New York City Marathon and the Brooklyn Half, plus members of the Prospect Park Track Club, with a clinical specialty in Achilles tendinopathy informed by the 2024 Silbernagel protocol" - that page gets cited when somebody asks ChatGPT for a running-injury PT in Brooklyn.
The second version isn't just longer. It's denser. Every clause carries a name.
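In the markup, entity density mostly means linking the first mention of each name to its canonical home. A quick sketch - the URLs here are the organizations' public sites; swap in whichever entities your page actually names:

```html
<!-- First mention: full proper name, linked to the canonical source -->
<p>We schedule through <a href="https://jane.app">Jane App</a> and follow
<a href="https://www.fda.gov">U.S. Food and Drug Administration (FDA)</a>
guidance.</p>
<!-- Later mentions can use the short form ("the FDA") with no link -->
```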
Pattern 3: Freshness signals
What it is: A visible publication date, a "last updated" date when applicable, recent statistics with the year cited inline, and time-sensitive references ("as of Q1 2026", "the November 2025 algorithm update"). These signals appear in the visible content - not just in the schema markup.
Why it works: Multiple 2026 studies put hard numbers on this. Pages updated within the last 60 days are about 1.9x more likely to appear in AI answers, and content with current-year statistics gets cited 28% more often than content with stale numbers.
The reason is risk management. AI engines are trying to avoid surfacing wrong or outdated information. A 2022 stat presented as current is a hallucination risk. The model would rather pull from a page that explicitly time-stamps its claims, even if that page has lower domain authority. Freshness is a trust proxy.
Example. Bad: "Recent research shows that mobile traffic accounts for the majority of web visits."
Good: "As of Q1 2026, mobile devices generate 64.4% of global web traffic according to Statcounter."
The good version has a time anchor, a specific number, and a named source you can link inline. An LLM can lift the whole sentence and cite it with confidence. The bad version, even if technically true, can't be cited: the model has no way to confirm the claim isn't a hallucination.
Easiest move on the list: add a "Last reviewed [Month Year]" line under the H1 on every evergreen page, and update it when you make material changes. That one change has lifted citation rates on client pages by 20-30% in our data.
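A minimal sketch of that move, visible line plus matching schema (the dates, headline, and class name are placeholders):

```html
<h1>How Often Should You Clean Your Gutters?</h1>
<!-- Visible freshness signal, right under the H1 -->
<p class="last-reviewed">Last reviewed: April 2026</p>

<!-- Mirror the same dates in the page's Article schema -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Often Should You Clean Your Gutters?",
  "datePublished": "2025-09-12",
  "dateModified": "2026-04-15"
}
</script>
```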
Pattern 4: Citing your own sources
What it is: Inline links and explicit attributions to authoritative sources within your body copy - studies, government data, named experts, primary documents. Not at the end in a "references" section. Inside the sentence where the claim is made.
Why it works: This is the pattern most small-business sites get wrong. The intuition is "I don't want to send people away from my page." The LLM reality is the opposite. Pages that cite their sources get cited more, because the model treats them as evidence-based rather than assertion-based.
A 2026 study on GEO citation effectiveness found the highest-impact tactic was inserting links to authoritative sources inside the text. Adding statistics with explicit sources in the same sentence came second. Including direct quotes attributed to named experts came third. Content with citations and quotations achieved 30 to 40% higher visibility in AI responses. Keyword optimization didn't make the top 5.
Example. Bad: "Studies show that local businesses with consistent NAP information rank better in Google Maps."
Good: "A 2024 BrightLocal study of 1,800 local businesses found that NAP consistency across the top 50 directories correlated with a 16% lift in Map Pack rankings. The full methodology is published here."
You don't need to cite a source for every sentence. Cite the load-bearing claims, the ones that would discredit the page if they turned out to be wrong. Those are the sentences LLMs scan when deciding whether your page is evidence or noise.
Pattern 5: Comparative tables
What it is: A clean HTML table comparing options across attributes - products, services, plans, methods, locations. Headers in the top row, options in the left column, attributes filled in. Not a screenshot of a table. Real <table> markup.
Why it works: Tables are the easiest format for an LLM to lift directly into an answer. When a user asks "what's the difference between X and Y," the engine wants to return a table. If your page has a clean one comparing X and Y on the right axes, the model can pull it verbatim and credit you. If your page describes the same comparison in prose, the model has to paraphrase. That extra step makes you more skippable, and the engine pulls from a page that already did the structuring work.
Bonus: comparative tables also tend to win featured snippets on classic Google search. One move, two surfaces.
Example. A SaaS site comparing pricing tiers can write 800 words about plan differences, or it can ship a 5-row table with axes for "monthly price / users included / API calls / support tier / SLA." The table gets lifted into Gemini and ChatGPT answers when somebody asks "compare Acme's plans." The 800 words don't.
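Here's that pricing table in real markup. The plan names and numbers are made up; the axes are the ones from the example above:

```html
<table>
  <thead>
    <tr>
      <th>Plan</th>
      <th>Monthly price</th>
      <th>Users included</th>
      <th>API calls</th>
      <th>Support tier</th>
      <th>SLA</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>Starter</td><td>$29</td><td>3</td><td>10,000/mo</td><td>Email</td><td>None</td></tr>
    <tr><td>Growth</td><td>$99</td><td>10</td><td>100,000/mo</td><td>Priority email</td><td>99.9%</td></tr>
    <tr><td>Enterprise</td><td>Custom</td><td>Unlimited</td><td>Custom</td><td>Dedicated</td><td>99.99%</td></tr>
  </tbody>
</table>
```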
For local services, comparative tables work for: service-area coverage by zip code, before/after pricing across treatment types, insurance carriers accepted by office location. The pattern generalizes well beyond product comparison.
Pattern 6: Expert author attribution with credentials and sameAs
What it is: Every substantial post has a named human author. The author has a bio with credentials. There's a Person schema block on the page with a sameAs array linking to LinkedIn, the author's portfolio site, professional licensing boards, published research, and any other places the same human exists online. The byline is visible at the top of the post, not hidden in a footer.
Why it works: A 2026 analysis found that pages with author schema are 3x more likely to appear in AI answers. The reason is mechanical. LLMs do entity resolution on the author the same way they do it on the brand. If the author exists in multiple places online and those places are linked in the schema, the model can verify this is a real expert with a track record, not a content-mill byline.
For a clinic, "Reviewed by Dr. Sarah Chen, DPT, OCS" with a sameAs array pointing to her APTA profile, her LinkedIn, and the New York State physical therapy licensing board is an authority signal an LLM can act on. A page authored by "The Acme Wellness Team" with no schema and no credentials gets discounted, even if the content is identical.
Example. A minimal author block that does the work:
By Dr. Sarah Chen, DPT, OCS - Owner, Acme Physical Therapy
Last reviewed: April 2026
[Person schema with sameAs: LinkedIn URL, APTA profile URL, NY State licensing board URL, personal site URL - JSON-LD sketch below]
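And the schema block behind it. A sketch with placeholder sameAs URLs - swap in the author's real profiles:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Sarah Chen",
  "honorificPrefix": "Dr.",
  "honorificSuffix": "DPT, OCS",
  "jobTitle": "Owner",
  "worksFor": {
    "@type": "Organization",
    "name": "Acme Physical Therapy"
  },
  "sameAs": [
    "https://www.linkedin.com/in/your-author",
    "https://www.apta.org/your-author-profile",
    "https://www.op.nysed.gov/your-license-lookup",
    "https://your-personal-site.example"
  ]
}
</script>
```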
Setup is 30 minutes per author, one time. Most clinics, law firms, and dental practices have at least one named expert who'd be glad to get bylined. Most small e-commerce sites have a founder. Use them. The LLMs are looking.
Pattern 7: Structured FAQs (FAQPage schema plus plain-text Q&A)
What it is: A real FAQ section at the bottom of the page with 4 to 8 question-answer pairs, each Q phrased as a complete question, each A in 30 to 80 words. Plus FAQPage JSON-LD schema markup that mirrors the visible Q&A exactly. The visible Q&A and the schema have to match - mismatches get penalized.
Why it works: This is the single highest-impact move for AI citation we've measured. FAQs are the format LLMs were trained on, and they're the format LLMs output natively. A page with structured FAQs hands the model exactly what it wants: pre-chunked, pre-formatted, ready to lift, and explicitly attributed to a source.
The trick is that the questions have to be the long-tail queries you actually want to be cited for. Not "What are your hours?" The real ones. "Does Acme PT take Aetna?" "How long does a slipped disc take to heal with PT?" "Do I need a referral to see a physical therapist in New York?" Those are queries somebody is asking ChatGPT right now. Answer them in your FAQ in 50 words apiece, and each one becomes a citation candidate.
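Here's the shape, using one of those questions. The answer text is illustrative (Acme PT is fictional); what matters is that the schema's "text" field mirrors the visible answer word for word:

```html
<!-- Visible Q&A -->
<h3>Does Acme PT take Aetna?</h3>
<p>Yes. Acme Physical Therapy is in-network with Aetna PPO and HMO plans.
Bring your insurance card to your first visit and we verify coverage
before your evaluation.</p>

<!-- FAQPage schema: "text" matches the visible answer exactly -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does Acme PT take Aetna?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Acme Physical Therapy is in-network with Aetna PPO and HMO plans. Bring your insurance card to your first visit and we verify coverage before your evaluation."
      }
    }
  ]
}
</script>
```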
Six FAQs at the bottom of a service page can do more for AI citation than a 2,000-word blog post sitting in your archive. Shape matters more than volume here. We've watched client pages go from zero AI citations to consistent Perplexity and AI Overviews inclusion in 6 to 8 weeks, just from adding an FAQ block plus FAQPage schema to existing service pages. No new content. Same words, different structure.
If you ship one thing from this post, ship this one.
The 7-pattern self-audit checklist
Take your most important page - the one you most want to be cited for - and run it against these 7 in order. If you can't tick a box, that's the next thing to ship.
- [ ] Direct-answer paragraphs. Top 3 H2s on the page are phrased as user questions. The first sentence under each H2 is the answer in 25 to 60 words.
- [ ] Named entity density. Real names of people, places, products, and organizations appear in the body. Each named entity has at least one canonical link on first mention.
- [ ] Freshness signals. Visible publish date plus "Last updated" line. At least one statistic in the post has a year inline and a linked source.
- [ ] Source citations. Load-bearing claims have inline links to primary sources. References to studies name the study, the year, and link the methodology.
- [ ] Comparative table. If the page covers two or more options, methods, or alternatives, there's a real <table> comparing them across consistent axes.
- [ ] Expert author block. Named author with credentials at top of page. Person schema with sameAs array linking to LinkedIn plus at least one professional registry, license, or publication.
- [ ] Structured FAQ block. 4 to 8 Q&A pairs at the bottom. Questions are the long-tail queries you want to be cited for. FAQPage schema mirrors the visible content exactly.
Pages that hit 7 of 7 get cited. Pages at 4 or 5 of 7 sometimes get cited. Pages at 3 or below are basically invisible to AI engines, no matter how well they rank on classic Google.
ClearGrade's full audit at https://cleargradeai.com grades your content against these 7 patterns and suggests rewrites for the lowest-scoring pages. We also ship the FAQPage schema and the author Person blocks for the pages we flag, so the gap closes the same week we find it.
If you'd rather start with a free read, run the free grade on your homepage and your top service page. The report tells you which of the 7 patterns each page is missing, ranked by what we'd fix first.