Mar 28, 2026·Research·10 min read

The AI Consistency Problem: Why Your Brand Appears in Less Than 1% of Identical Queries

Ask ChatGPT "what CRM should I use?" right now, then ask it again in 30 seconds. You will likely get a completely different set of recommended brands. A SparkToro study found less than 1% overlap in brand recommendations between identical queries. This changes everything about how AI visibility should be measured — and built.

Filip Zakravsky
Founder, Pheme

TL;DR

AI engines are fundamentally non-deterministic — the same query returns different brands on different runs. SparkToro measured less than 1% overlap in brand recommendations between identical queries run minutes apart. This means there is no "AI ranking" — only citation probability. Brands optimizing for a stable #1 position are chasing something that doesn't exist. The right metric is citation rate across many runs. The right strategy is raising that probability through consistent signal building.

The experiment that changed how we think about AI visibility

In 2025, SparkToro ran a large-scale study on AI brand recommendation consistency. The methodology was straightforward: take hundreds of commercial queries ("best project management software," "top running shoe brands," "which CRM for small business"), run each query multiple times across ChatGPT, Perplexity, and Google AI, and measure brand list overlap between runs.

The finding was striking: less than 1% of brand recommendations were consistent across multiple runs of identical queries. Run the same query twice and you get almost completely different results. This isn't a bug — it's a fundamental property of how large language models work.

For brands that had been celebrating their "AI rank #1" for a specific query, this was a wake-up call. The position they measured was one sample from a distribution — it would be different the next time a potential customer ran the same search.

Why AI engines are non-deterministic

Unlike Google's search index — which produces deterministic results for any given query at a given moment — AI language models introduce randomness at multiple levels. The most significant is temperature: a parameter that controls how much randomness the model injects into its token selection. Most production AI engines run at a temperature above zero, meaning the model deliberately varies its outputs.

This randomness is intentional and valuable for conversational AI — it prevents robotic, repetitive responses and allows for creative variation. But for brand recommendations, it means that even if your brand is the "best" answer statistically, it won't appear in every response. It will appear with some probability, and that probability is what you should be measuring and optimizing.
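This sampling behaviour is easy to see in miniature. The sketch below implements temperature-scaled softmax sampling over toy "brand" logits; the logit values, seed, and run count are illustrative assumptions, not taken from any real engine.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample one index from logits after temperature scaling (softmax)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Three "brands" competing for the next token; brand 0 is favoured.
logits = [2.0, 1.0, 0.5]
rng = random.Random(42)
picks = [sample_with_temperature(logits, 1.0, rng) for _ in range(1000)]

# At temperature 1.0 the favoured brand wins often, but not every time.
share = picks.count(0) / len(picks)
```

At a temperature near zero the favoured option wins essentially every run; at 1.0 it wins only in proportion to its probability mass, which is exactly the behaviour the article describes for brand mentions.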

For AI engines with web search grounding (Perplexity, Google AI Mode, ChatGPT with web search), there's an additional source of variation: the retrieved web results change over time as new content is published and indexed. Two runs of the same query minutes apart may pull from different web sources, leading to different brand mentions.

Variation by engine type

| Engine | Primary source of variation | Consistency level |
| --- | --- | --- |
| ChatGPT (no web search) | Model temperature: pure LLM sampling | Low: varies with each token generation |
| ChatGPT (web search) | Temperature plus real-time web retrieval variation | Very low: both model and sources vary |
| Perplexity | Real-time web retrieval plus model sampling | Very low: heavily influenced by what's freshly indexed |
| Google AI Mode | Search index plus model sampling | Moderate: search index is more stable than web crawl |
| Claude | Model temperature: training data patterns | Low to moderate: less real-time retrieval dependency |

What this means for measurement

The practical implication is that any single measurement of AI visibility is statistically meaningless. If you run one query, get one result, and conclude "we appear on ChatGPT" — you've measured one sample from a distribution. That sample tells you almost nothing about your actual citation probability.

The right approach is to treat AI citation like a probability problem. For any given query, you have a citation probability — the fraction of the time your brand appears when that query is run. A brand with a 70% citation probability will appear in roughly 70 out of 100 runs of the same query. That's the number you want to measure and improve.
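The probabilistic framing above can be simulated directly. This sketch assumes a brand with an underlying 70% citation probability (an illustrative number): a single run is one coin flip, while averaging many runs recovers the rate.

```python
import random

def simulated_run(citation_probability, rng):
    """One simulated query run: True if the brand is cited this time."""
    return rng.random() < citation_probability

TRUE_PROBABILITY = 0.7  # assumed underlying citation probability
rng = random.Random(7)

# A single run is one coin flip; it says almost nothing on its own.
single = simulated_run(TRUE_PROBABILITY, rng)

# Averaging many runs converges toward the underlying probability.
runs = [simulated_run(TRUE_PROBABILITY, rng) for _ in range(100)]
citation_rate = sum(runs) / len(runs)
```

Over 100 runs the measured rate lands close to 0.7, while any individual run is simply True or False, which is why single-sample checks mislead.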

How to measure citation probability correctly

1. Define 30–100 queries relevant to your business and target customers
2. Run each query 10+ times across each AI engine (not once)
3. Record whether your brand appears in each run (yes/no)
4. Calculate citation rate: appearances ÷ total runs = citation probability
5. Track this weekly; directional movement matters more than absolute numbers
6. Segment by engine: your Perplexity probability will differ from your ChatGPT probability
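The steps above can be sketched as a small measurement loop. Here `ask_engine` is a hypothetical stand-in for a real API client, returning canned responses so the example is self-contained; swap in your own calls to each engine.

```python
from collections import defaultdict

def ask_engine(engine, query, run_index):
    """Hypothetical stand-in for a real engine API call (illustrative data)."""
    canned = {
        ("chatgpt", 0): "Try Acme CRM or Globex.",
        ("chatgpt", 1): "Globex and Initech are popular.",
        ("perplexity", 0): "Acme CRM is widely cited.",
        ("perplexity", 1): "Consider Initech.",
    }
    return canned.get((engine, run_index % 2), "")

def citation_rates(brand, queries, engines, runs_per_query):
    """Per-engine citation rate: appearances divided by total runs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for engine in engines:
        for query in queries:
            for run in range(runs_per_query):
                response = ask_engine(engine, query, run)
                totals[engine] += 1
                if brand.lower() in response.lower():
                    hits[engine] += 1
    return {e: hits[e] / totals[e] for e in engines}

rates = citation_rates(
    brand="Acme CRM",
    queries=["best CRM for small business"],
    engines=["chatgpt", "perplexity"],
    runs_per_query=10,
)
```

Keeping per-engine tallies separate matches step 6: each engine gets its own probability estimate rather than one blended number.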

What this means for strategy

If AI citations are probabilistic, then the strategic goal is clear: increase the probability that your brand appears. You're not trying to "rank #1" — you're trying to raise your citation rate from 15% to 50% to 80%.

This reframes the entire optimization problem. Instead of optimizing a single page for a single query (the SEO mental model), you're building signals that increase your brand's overall probability of appearing across a broad query space. The signals that work are cumulative and compounding:

Referring domain breadth raises your floor

Each unique domain that mentions your brand increases the probability that your brand appears in AI training patterns and real-time retrieval. A brand present on 50,000 sites has a statistically higher floor citation rate than one present on 5,000.

Content structure reduces extraction failure

When AI engines retrieve your content, poorly structured pages fail to yield clean brand + expertise signals. Well-structured content (H2/H3/tables/FAQ schema) reliably produces extractable signals on every retrieval — reducing variance.
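As one concrete instance of the structured formats mentioned above, here is a minimal FAQPage JSON-LD sketch built with Python's json module; the question and answer text are illustrative placeholders, not content from any real page.

```python
import json

# Minimal schema.org FAQPage markup: one structured format AI retrievers
# can extract reliably. Question/answer strings here are placeholders.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is citation probability?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "The fraction of runs of a query in which a brand appears.",
            },
        }
    ],
}

# This string would be embedded in a <script type="application/ld+json"> tag.
markup = json.dumps(faq_schema, indent=2)
```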

Content freshness keeps you in the retrieval pool

For engines with recency weighting (Perplexity, Google AI Mode), outdated content is filtered out of retrieval. Publishing regularly keeps your content eligible for citation — a prerequisite for any citation probability above zero.

Brand search volume is a consistency signal

Brands that people actively search for are consistently recognized by AI models across different sampling runs. Brand awareness campaigns that increase brand search volume directly contribute to AI citation consistency.

The competitive opportunity

The AI consistency problem creates an asymmetric opportunity for brands willing to measure probabilistically. Most companies — if they track AI visibility at all — run a few manual queries, celebrate when they appear, and have no systematic understanding of their actual citation probability.

A brand that methodically tracks citation probability across 50 queries and 5 engines has a fundamentally different understanding of where it stands and what to improve. That brand can identify specific queries where probability is low and allocate effort precisely — rather than guessing.

The brands that will dominate AI search in the next 2–3 years are those that recognize today that the measurement model matters as much as the optimization model. If you're measuring wrong, you can't optimize right.

Key takeaways

There is no stable AI ranking — only citation probability
Single-run measurements are statistically unreliable
Measure citation rate across 10+ runs per query
Track per-engine — probabilities differ significantly
Optimize for probability, not position
Cumulative signals (referring domains, brand volume) raise the floor

Measure your actual AI citation probability

Pheme runs each query multiple times across all engines and gives you a statistically reliable citation rate — not a one-off snapshot.

Join the waitlist