Obsurfable

Top Domains Cited by LLMs: July 2026

If you want to understand where AI answers actually come from, you have to stop guessing and start looking at citations. Over the last year, a number of independent studies have parsed millions of real responses from ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews, extracted the URLs those systems referenced, and counted which domains show up most. The picture that emerges is remarkably consistent, and it is very different from the one traditional SEO would predict.

This is a snapshot of that picture as of July 2026: the domains large language models cite most, how the rankings shift by platform, and what the underlying distribution tells you about being cited yourself.
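
The counting method those studies describe (collect responses, extract the cited URLs, tally the domains) can be sketched in a few lines of Python. This is a minimal illustration, not any study's actual pipeline: the regular expression, the helper names `cited_domain` and `count_cited_domains`, and the sample responses are all assumptions made for the example. Real analyses would also collapse subdomains such as en.wikipedia.org into one registrable domain, which this sketch does not attempt.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified URL matcher (an assumption): stops at whitespace, quotes and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

def cited_domain(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def count_cited_domains(responses):
    """Count how many citations each domain receives across all responses."""
    counts = Counter()
    for text in responses:
        for url in URL_PATTERN.findall(text):
            # Drop sentence punctuation that the regex may have swallowed.
            domain = cited_domain(url.rstrip(".,;"))
            if domain:
                counts[domain] += 1
    return counts

# Made-up responses, for illustration only.
responses = [
    "See https://www.reddit.com/r/example/ and https://en.wikipedia.org/wiki/Example.",
    "Source: https://www.reddit.com/r/other/ (discussion).",
]
domain_counts = count_cited_domains(responses)
for domain, n in domain_counts.most_common():
    print(domain, n)
```

Differences in steps like URL normalisation and subdomain handling are one reason the percentages reported by different studies do not line up exactly.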


The top cited domains, at a glance

No two studies use exactly the same methodology, prompt set, or time window, so the exact percentages differ. But the ordering at the top barely moves. Pulling together several of the larger 2026 analyses—Peec AI's review of 30 million sources, BotRank's study of 1.2 million responses, Geonimo's 2.1 million citations, KIME's cross-platform audit, and Everything-PR's synthesis of 680 million citations—the consensus leaderboard looks like this:

| Rank | Domain | What it is | Why it gets cited |
|---|---|---|---|
| 1 | Reddit | Community discussion | Authentic, threaded, opinionated first-hand experience |
| 2 | YouTube | Video platform | Transcripts and descriptions for how-to and review intent |
| 3 | Wikipedia | Reference encyclopedia | The trust anchor for facts, entities, and definitions |
| 4 | LinkedIn | Professional network | Company profiles and B2B thought leadership |
| 5 | Trustpilot / G2 / Yelp | Review aggregators | Structured signal for commercial and product queries |
| 6 | Forbes / Business Insider / TechRadar | Editorial media | Authority for news, comparisons, and buying guides |
| 7 | NIH / .gov / arXiv | Primary research | Credible sourcing for health, science, and technical queries |
| 8 | Medium / Quora | User publishing / Q&A | Long-tail explanations and opinion |

The exact figures vary by study. BotRank put Reddit at 11.47%, YouTube at 9.87%, and Wikipedia at 9.43% of citations among the top 100 domains. Geonimo found Reddit cited roughly three times more often than the runner-up. The 5W Q1 2026 audit, which focused on ChatGPT, put Wikipedia (13.15%) narrowly ahead of Reddit (11.97%), with OpenAI.com third. The through-line is the same: a very small number of platforms account for a disproportionately large share of what AI models point to.


Reddit: the single most-cited domain

Across almost every study, Reddit is either the #1 or #2 most-cited domain. KIME found it appears in roughly one in four AI answers. On Perplexity specifically, estimates range from about one in five of all citations to roughly 46.7% of top-10 source share. The two figures use different denominators, so they are not directly comparable, but the upper figure is among the highest concentrations reported for any domain on any platform.

The reasons are both structural and behavioural. OpenAI and Google both signed content agreements with Reddit, so the data is licensed and available. Just as importantly, Reddit's threaded, contradictory, first-person content extracts cleanly into an answer and reads as authentic in a way that marketing copy does not. When a model is trying to answer "which of these tools is actually good," a subreddit thread of real users arguing is exactly the kind of evidence it wants.

One caveat worth knowing: Reddit's share is volatile. Some trackers observed ChatGPT's Reddit citation share swing dramatically over a matter of weeks in late 2025. Citation rankings in general move on the timescale of weeks, not years—a theme we return to below.


Wikipedia: the trust anchor

Wikipedia is the most-cited single domain on ChatGPT in several datasets. Depending on the study and the denominator it uses, Wikipedia accounts for roughly 7.8% of all ChatGPT citations or nearly half of ChatGPT's top-10 source share. It also makes up a meaningful slice of the training data behind these models.

Wikipedia plays a specific role: it is the reference the model reaches for when it needs to establish a fact, define a term, or ground an entity. If your company, product, or founder qualifies for a well-sourced Wikipedia entry under its notability rules, that entry becomes a foundational reference the models lean on across many downstream answers.


The rankings change by platform

Averages hide the most useful detail. Each engine has a distinct personality, and where you want to be cited depends heavily on which one your audience uses. Combining the per-platform breakdowns from Contently, CiteMetrix, and the 5W audit, the top sources by engine look roughly like this:

| Platform | Operator | Citation lean (top sources) |
|---|---|---|
| ChatGPT | OpenAI | Wikipedia, Reddit, LinkedIn, Forbes, Medium; the most encyclopedia- and editorial-heavy |
| Google AI Overviews / AI Mode | Google | YouTube, Reddit, LinkedIn, Forbes, Wikipedia; strong bias toward Google-owned properties |
| Gemini | Google | Reddit, YouTube, Wikipedia, Medium, Forbes; more distributed, no single dominant source |
| Perplexity | Perplexity AI | Reddit, LinkedIn, NIH, G2, Quora; heavy on community and primary/review sources |
| Claude | Anthropic | The New York Times, The Atlantic, The Economist, FT, Washington Post; long-form editorial weight |

A few patterns stand out. Google's surfaces reliably favour Google-owned properties, YouTube chief among them—YouTube has been observed in nearly 30% of AI Overviews, the highest video weight of any engine. Perplexity skews toward research-credible and review-aggregator sources, which is why G2 and NIH punch above their weight there for B2B and health queries. ChatGPT is the most Wikipedia-heavy. Claude, notably, leans on established long-form journalism more than any other engine.

The practical implication: "getting cited by AI" is not one game. Being visible on Reddit helps you almost everywhere, but if your buyers live in Perplexity, review sites and primary research matter far more than they would for a Claude-heavy audience.
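
A per-platform breakdown of this kind comes from grouping citations by engine before ranking them. The sketch below shows the idea under stated assumptions: the `(platform, domain)` record shape, the function name `top_sources_by_platform`, and the toy records are hypothetical and do not come from any of the studies cited here.

```python
from collections import Counter, defaultdict

def top_sources_by_platform(citations, n=3):
    """Group (platform, domain) pairs; return each platform's top-n domains with their share."""
    per_platform = defaultdict(Counter)
    for platform, domain in citations:
        per_platform[platform][domain] += 1
    result = {}
    for platform, counts in per_platform.items():
        total = sum(counts.values())
        # Share is relative to that platform's citations only, not the whole dataset.
        result[platform] = [(d, c / total) for d, c in counts.most_common(n)]
    return result

# Made-up records, for illustration only.
toy_citations = [
    ("perplexity", "reddit.com"),
    ("perplexity", "reddit.com"),
    ("perplexity", "g2.com"),
    ("chatgpt", "wikipedia.org"),
    ("chatgpt", "wikipedia.org"),
    ("chatgpt", "wikipedia.org"),
    ("chatgpt", "reddit.com"),
]
by_platform = top_sources_by_platform(toy_citations)
for platform, ranking in by_platform.items():
    print(platform, ranking)
```

Because each share is computed within a single platform, a domain can rank first on one engine and barely register on another, which is the pattern the averages hide.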


The most important number: the long tail

The headline domains are worth understanding, but the most consequential finding in the 2026 data is about everything below them.

Geonimo's analysis of 2.1 million citations found that 73.5% of all citations came from domains outside the top 100, and 63% of the unique domains in its dataset appeared only once. The top 10 domains accounted for only about 17% of citations. Everything-PR's larger synthesis reached a similar structural conclusion: extreme concentration at the very top, but a very long tail underneath.

This distribution is fundamentally different from Google's page-one economy, where a handful of authority domains capture most clicks. AI citation is simultaneously more concentrated (Reddit, Wikipedia, and YouTube dominate the top) and more democratic (three-quarters of citations go to the long tail). That combination is good news if you are not a household name. You do not need to outrank Wikipedia. You need to be the clearest, most specific answer to a narrow question, so that when a model assembles a response on that topic, your page is the one it reaches into the long tail to grab.
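
The concentration figures quoted above (top-10 share, share outside the top 100, share of domains cited only once) are simple to compute from a table of per-domain citation counts. The following is a minimal sketch with toy data; the function name `long_tail_stats` and the counts are assumptions for illustration, and the head sizes are shrunk to 2 and 3 so the tiny example has a visible tail.

```python
from collections import Counter

def long_tail_stats(counts, head_sizes=(10, 100)):
    """Summarise how concentrated a domain -> citation-count mapping is."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no citations to summarise")
    # Share of all citations captured by the k most-cited domains.
    stats = {f"top_{k}_share": sum(ranked[:k]) / total for k in head_sizes}
    largest = max(head_sizes)
    # Share of all citations going to domains below the largest head size.
    stats[f"outside_top_{largest}_share"] = sum(ranked[largest:]) / total
    # Share of unique domains (not of citations) that were cited exactly once.
    stats["single_citation_domain_share"] = sum(1 for n in ranked if n == 1) / len(ranked)
    return stats

# Made-up counts, for illustration only.
toy_counts = Counter({"a.example": 5, "b.example": 3, "c.example": 1, "d.example": 1})
toy_stats = long_tail_stats(toy_counts, head_sizes=(2, 3))
print(toy_stats)
```

The same arithmetic applies to the Geonimo figures, assuming both are measured against the same total. If 73.5% of citations fall outside the top 100, the top 100 hold 26.5%. With about 17% going to the top 10, roughly 9.5% is left for ranks 11 to 100.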


What this means if you want to be cited

The data points to a strategy, not a trick:

  • Show up where models already trust the crowd. Reddit, YouTube, and relevant review sites (G2, Trustpilot, Capterra) are cited constantly. Genuine, non-astroturfed participation in the communities your buyers use is one of the highest-leverage things you can do—especially for Perplexity and Google AI Overviews visibility.
  • Earn a Wikipedia entry if you qualify. For entity-level and factual queries, an accurate, well-sourced Wikipedia page becomes a reference the models reuse across many answers.
  • Publish for the long tail. Since most citations go to domains outside the top 100, the winning move is depth on narrow, awkward, specific questions rather than competing head-on for broad terms. Be the single clearest answer to a question the big platforms don't cover well.
  • Optimise per platform, not in the abstract. Decide which engines your audience actually uses, then work backwards from what those engines cite. B2B in Perplexity is a different playbook from consumer discovery in AI Overviews.
  • Measure share of voice, not rank. Because citation share shifts week to week, the only reliable signal is tracking how often you appear across a defined set of prompts over time.

This last point is where a tool like Obsurfable fits. Rather than guessing whether the shift in Reddit's share this month helped or hurt you, you define the Prompts you care about, run retrieval to see how ChatGPT and other models actually answer them, and check whether you're mentioned or cited. Insights then turn those results into concrete recommendations, so you can act on the citation landscape described here instead of just reading about it.
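
Share of voice over a fixed prompt set can be sketched as follows. This is a generic illustration, not Obsurfable's implementation or API: the record shape `(period, prompt, cited_domains)`, the function name `share_of_voice`, the week labels, and the domain `example.com` are all hypothetical.

```python
from collections import defaultdict

def share_of_voice(runs, brand_domain):
    """Return {period: fraction of prompt runs whose citations include brand_domain}."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for period, _prompt, cited_domains in runs:
        total[period] += 1
        if brand_domain in cited_domains:
            cited[period] += 1
    # Sort periods so the result reads as a time series.
    return {period: cited[period] / total[period] for period in sorted(total)}

# Made-up runs of the same two prompts in two consecutive weeks.
toy_runs = [
    ("2026-W27", "best crm for startups", {"reddit.com", "g2.com"}),
    ("2026-W27", "crm pricing comparison", {"example.com", "g2.com"}),
    ("2026-W28", "best crm for startups", {"example.com", "reddit.com"}),
    ("2026-W28", "crm pricing comparison", {"example.com"}),
]
weekly = share_of_voice(toy_runs, "example.com")
print(weekly)
```

Keeping the prompt set fixed between periods is what makes the series comparable. If the prompts change, a shift in the number may reflect the new questions rather than a change in how often you are cited.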


The bottom line

As of July 2026, the domains LLMs cite most are Reddit, YouTube, Wikipedia, LinkedIn, and a rotating cast of review sites, editorial media, and primary research, with the exact order depending heavily on the platform. Reddit and Wikipedia function as structural infrastructure across the whole ecosystem. But the more actionable truth sits in the long tail: in Geonimo's data, nearly three-quarters of citations go to domains outside the top 100. Being cited by AI is less about beating the giants and more about being the clearest, most specific, most trusted answer to the question a model is trying to resolve.

Figures in this article are drawn from 2026 citation studies by Peec AI, BotRank, Geonimo, KIME, Contently, CiteMetrix, 5W, and Everything-PR. Because methodologies, prompt sets, and time windows differ between studies, treat the exact percentages as directional rather than definitive.