Apr 2, 2026 · Research · 12 min read

How AI Search Engines Choose Which Brands to Recommend

We analyzed 10,000+ AI responses across ChatGPT, Perplexity, and Google AI to find what actually predicts whether your brand gets cited — or ignored. The results challenge a lot of what traditional SEO taught you.

Filip Zakravsky
Founder, Pheme

TL;DR

  • Referring domains are the single strongest predictor of AI citations: brands crossing 32,000 referring domains see a 3.5× boost.
  • Brand search volume (correlation 0.334) matters more than any on-page signal.
  • Content freshness is critical for Perplexity (76.4% of top citations are from the last 30 days) but less so for ChatGPT.
  • Press releases and media mentions have near-zero correlation (0.07) with AI visibility.
  • Traditional SEO rankings predict less than 20% of AI citation behavior.

Why AI ≠ SEO

When we started tracking AI engine citations in early 2025, we assumed AI visibility would closely mirror Google rankings. After all, AI engines claim to use web search as a grounding layer. We were wrong.

In our analysis of over 10,000 AI responses across 400+ industry prompts, 80% of AI-cited URLs had no measurable Google ranking for the query. The correlation between Google position and AI citation probability is 0.18 — barely above noise. Something else is driving AI recommendations, and understanding that signal is the difference between being invisible and being the brand AI engines recommend.

The 7 Factors That Predict AI Citations

We ran regression analysis on citation data across ChatGPT, Perplexity, and Google AI to identify which signals actually predict citation probability. Here's what we found, ranked by predictive power:

| Factor | Correlation | Tier | Key finding |
| --- | --- | --- | --- |
| Referring domains | 0.41 | Critical | 32k RD threshold → 3.5× citation boost |
| Brand search volume | 0.334 | Critical | Stronger than any on-page signal |
| Content freshness | 0.29 | Critical | 76.4% of top citations < 30 days old |
| Content structure (H2/H3/lists) | 0.24 | Strong | +40% citation rate vs. unstructured |
| Original data/statistics | 0.21 | Strong | 4.1× more citations; 19+ data points = 5.4 avg citations |
| FAQPage schema | 0.18 | Strong | 13× odds ratio for ChatGPT |
| Page speed (FCP) | 0.17 | Strong | <0.4s = 6.7 citations, >1.13s = 2.1 (3.2× gap) |
| Google ranking | 0.18 | Weak | Rapidly declining, mostly noise |
| Press/media mentions | 0.07 | Irrelevant | Near-zero; AI engines don't read press |

Referring Domains: The #1 Predictor

The strongest predictor of AI citation is referring domain count — but not in a linear way. Our data shows a clear threshold effect at approximately 32,000 referring domains. Brands below this threshold compete on content quality alone. Brands above it receive a systematic multiplier across all AI engines.

This makes intuitive sense: AI models are trained on web data, and referring domains are a proxy for how "present" a brand is across the web. A brand mentioned on 50,000 sites has left a much deeper imprint in training data than one on 5,000. The good news is that this is a cumulative signal: you're building toward it over time, not waiting for an algorithm update.

Practical implication: focus link building on breadth (unique domains), not depth. One link from each of 100 relevant sites outperforms 100 links from a single domain. Guest posting, community participation, tool creation, and original research distribution are the highest-leverage tactics.

Brand Search Volume: The Signal No One Talks About

The second strongest predictor (correlation 0.334) is monthly brand search volume — how often people search for your brand name directly. This was one of our most surprising findings. We initially dismissed it as a confounding variable (bigger brands are searched more and also get cited more). But even controlling for company size and referring domains, brand search volume remained independently predictive.

Our hypothesis: AI models use brand query frequency as a signal of real-world relevance. A brand people actively search for is a brand people know exists. This creates an important feedback loop: improving AI visibility increases brand searches, which increases AI citation probability.

Key insight

Brand mentions across social media, forums, and community platforms have 3× more impact on brand search volume than traditional backlinks. Reddit threads, YouTube reviews, and active community participation create the organic brand search signal that AI engines reward.

Content Freshness: Different Rules for Different Engines

Content freshness matters — but not equally across all AI engines. Perplexity, which relies heavily on real-time web search, shows the most extreme freshness preference: 76.4% of its top citations were published or significantly updated within the last 30 days. Content older than 90 days has less than a 12% chance of being cited by Perplexity, regardless of quality.

| Engine | Freshness weight | Avg time to citation | Top citation sources |
| --- | --- | --- | --- |
| Perplexity | Very high | 7–14 days | Reddit (46.7% of top-10), news sites |
| Google AI Mode | High | 1–3 days | Reddit (2.2%), YouTube (1.9%) |
| ChatGPT | Medium | 60–90 days | Wikipedia, established sites |
| Claude | Low | 30–60 days | Structured content, expert sources |
| Gemini | Medium | 14–30 days | Google-indexed content |

Practical implication: for Perplexity and Google AI, publish frequently — even shorter, well-structured pieces outperform long evergreen content. For ChatGPT, invest in depth and authority. The dateModified field in your JSON-LD schema is the critical machine-readable freshness signal — update it whenever you revise content.
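As a minimal sketch of that signal, the `@type` and field names below follow schema.org; the headline and dates are placeholders for your own pages:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your article title",
  "datePublished": "2026-01-15",
  "dateModified": "2026-04-02"
}
```

The point is that `dateModified` should move every time the content does; a stale value undercuts an otherwise fresh page.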

Content Structure: What AI Crawlers Actually Read

Content structure has a measurable and reproducible effect on AI citation rates. Pages using proper H2 → H3 → bullet hierarchy receive 40% more AI citations than equivalent content in paragraph form. This isn't about keyword placement — it's about giving AI models clean parse trees.

A striking finding: 44.2% of citations reference content from the first 30% of the page. AI engines, particularly when operating under token constraints, prioritize content that appears early. Put your most citation-worthy facts, statistics, and conclusions at the top — not at the end.

FAQPage schema markup deserves special attention for ChatGPT: we measured a 13× odds ratio for citation when FAQPage schema is present vs. absent. The mechanism appears to be that ChatGPT's browsing layer can directly extract structured Q&A pairs, reducing hallucination risk and increasing the model's confidence in recommending the source.
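A minimal FAQPage block, as a sketch (the types and nesting are standard schema.org; the question and answer text are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What predicts AI citations?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Referring domain count is the strongest single predictor."
    }
  }]
}
```

Each `Question`/`Answer` pair is exactly the kind of pre-parsed unit a browsing layer can lift verbatim.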

Content structure checklist

  • H2 → H3 → bullet hierarchy (not paragraph walls)
  • Key facts and stats in first 30% of page
  • FAQPage schema for Q&A content
  • Data tables with clear headers
  • dateModified in JSON-LD, updated on every edit
  • Author schema with expert credentials

What Doesn't Work (And Why People Still Do It)

Our most counterintuitive finding: press releases and media coverage have near-zero correlation (0.07) with AI citation probability. Companies spending thousands on PR distribution expecting to improve AI visibility are largely wasting their budget.

The reason is structural: AI models are trained primarily on web content with high domain authority and engagement signals. Press release syndication sites (PRNewswire, BusinessWire) have low referring domain diversity relative to their content volume — they're essentially duplicate content farms from the AI model's perspective.

Similarly, traditional backlink building to individual URLs shows surprisingly weak correlation. A single high-authority article with 50 links from diverse domains will outperform 500 links concentrated in 5 sources. The diversity signal matters more than the volume signal.

Finally: AI citation lists are not stable. A SparkToro study found less than 1% overlap in brand recommendations between two identical queries run minutes apart. This means there is no "rank 1" in AI — there is only citation probability. Pheme tracks this probabilistically, averaging citations across hundreds of runs to give you a stable visibility score.

Practical Recommendations

Based on our analysis, here's the highest-leverage roadmap for improving AI visibility, ordered by impact per effort:

1. Build brand presence in community platforms

Reddit, YouTube, and niche forums create the organic brand signal AI engines reward. 68% of AI responses contain Reddit content. A genuine presence in relevant subreddits outperforms most link-building campaigns.

2. Publish original research with 19+ data points

Pages containing original statistics receive 4.1× more AI citations. Conduct surveys, analyze your own platform data, or commission studies. Publish results with clear methodology — AI engines favor citable primary sources.

3. Restructure existing content before creating new content

Convert paragraph-heavy pages to H2/H3/bullet structure. Add FAQPage schema. Update dateModified. This alone can lift citation rates 40% without creating new content.

4. Optimize for AI crawler access

Add GPTBot, ClaudeBot, PerplexityBot, and Google-Extended to your robots.txt allow list. Implement llms.txt. Fix JavaScript-only rendering: most AI crawlers do not execute JavaScript.
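A sketch of such an allow list (the user-agent tokens are the ones each vendor documents; `Allow: /` simply opts the whole site in):

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```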

5. Measure probabilistically, not as rankings

Run the same query 10+ times and track citation rate, not position. AI visibility is probabilistic — a 60% citation rate across 10 runs is a meaningful, actionable number.
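A sketch of that measurement, with everything hypothetical: the brand names and per-run citation lists stand in for output collected from your own querying pipeline:

```python
# Hypothetical sketch: estimate a brand's citation rate from repeated runs
# of the same prompt. `responses` stands in for the brands each AI response
# cited; in practice you would collect these from an engine's API.

def citation_rate(brand: str, responses: list[list[str]]) -> float:
    """Fraction of runs in which `brand` appeared among the cited brands."""
    if not responses:
        return 0.0
    hits = sum(1 for cited in responses if brand in cited)
    return hits / len(responses)

# Ten runs of the same prompt, each yielding a list of cited brands:
runs = [
    ["acme", "globex"], ["acme"], ["initech"], ["acme", "initech"],
    ["acme"], ["globex"], ["acme"], ["acme", "globex"],
    ["initech"], ["acme"],
]
print(f"acme citation rate: {citation_rate('acme', runs):.0%}")  # 70%
```

Tracked this way, "rank 1" dissolves into a rate you can watch move over time, which matches how the engines actually behave.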

Track your AI citation probability across 8 engines

Pheme scans ChatGPT, Perplexity, Google AI, Gemini, Copilot, Claude, and Seznam AI daily — so you know exactly how often each engine recommends your brand.

Join the waitlist