What Is AI Reading? Inside the Hidden Mechanics of Generative Citations

As generative AI reshapes the digital landscape, a new question is emerging at the center of content creation and discovery: what exactly is AI reading? A study titled "What Is AI Reading?", from Generative Pulse by Muck Rack, analyzed over 1 million citations from major AI systems—including OpenAI’s ChatGPT (4o and 4o-mini), Google’s Gemini (Flash and Pro), and Anthropic’s Claude (Sonnet and Haiku)—to uncover the hidden dynamics behind the links these models use when generating responses.
The findings are not only revealing but transformative for anyone in journalism, corporate communications, SEO, or brand strategy.
Citations Aren’t Just Add-Ons—They Reshape AI Behavior
One of the report's more striking findings is that simply enabling or disabling citation functionality changes the answers themselves. When citations are off, the models rely more heavily on static training data. When citations are on, they generate materially different outputs, shaped directly by the real-time sources they pull from.
Key Example: Asked about the worst Major League Baseball team, a citation-disabled AI mentioned the 1962 Mets. But with citations on, it updated the answer to include the 2024 Chicago White Sox with a record-breaking 41–121 season—explicitly citing CBS Sports.
The Dominance of Earned Media
Over 95% of all cited sources come from non-paid media. This includes:
- 27% journalistic content (e.g., Reuters, AP, Financial Times)
- 18% government/NGO sites
- 13% academic or research sources
- 10% aggregator/encyclopedic platforms like Wikipedia or Visual Capitalist
By contrast, paid or advertorial content accounts for less than 5% of citations, a strong signal that AI models systematically favor earned content over marketing-driven material.
Recency Bias: Why New Content Wins
Freshness matters—particularly for OpenAI’s models. In journalistic content, 56% of ChatGPT’s citations were published within the last 12 months, compared to 36% for Claude. This tendency, known as recency bias, is the preference for newly published sources over older ones, even when the older material remains accurate and relevant.
In the context of generative AI, recency bias means that models connected to real-time data, ChatGPT in particular, are more likely to reference and trust recently published material, especially for queries about current events, emerging technologies, or policy changes. For time-sensitive prompts like “latest advancements in outpatient treatment” or “recent sound recording innovations,” the model heavily weights content published in the last few months, on the assumption that it carries more current insight.
This is a critical insight for content creators and brand strategists: if your material is outdated—even by a year—it is significantly less likely to surface in AI-generated answers. Keeping your content fresh isn’t just good SEO—it’s essential for visibility in the age of AI.
Different Prompts Trigger Different Sources
AI models don’t cite sources randomly—they choose based on the type of question being asked. Different prompt styles lead to different types of sources being referenced:
- Fact lookups and encyclopedic queries tend to draw from static reference sites like Wikipedia and Britannica, relying on well-established but often older information.
- Recent event questions typically trigger citations from major newsrooms such as AP, Reuters, or Axios, where speed and recency are key.
- Advice or opinion-seeking prompts shift the model toward more dynamic and conversational sources like blogs, forums, or platforms such as Reddit or Medium.
- Academic or research-oriented tasks lead AI to cite from journals, preprint servers like arXiv, or government-backed repositories such as PubMed or NCBI.
- Creative requests or step-by-step instructions frequently surface user-generated content, informal how-tos, or community discussion threads from platforms like Quora or niche tech forums.
This variation means the way a question is phrased can have a direct impact on which domains are elevated—and which are left behind.
Claude, for instance, is far less likely to cite major outlets than ChatGPT or Gemini: it references Reuters roughly 50x less frequently than ChatGPT.
Authority and Domain Matter—But Not Uniformly
While high-authority outlets dominate, they’re not the only players. Only 15% of top-cited sources appear in the top 10 across multiple industries, which means niche-specific content is rewarded. For instance:
- In Finance, sources like Bankrate and NerdWallet are favored.
- In Healthcare, government sources like CDC.gov and NIH.gov dominate.
- In Technology, learning platforms such as Udemy, Coursera, and Medium rise to the top.
On page 15, a visual heatmap shows that Claude exhibits the most domain-specific diversity, frequently selecting industry-unique sources, whereas ChatGPT and Gemini tend to rely more heavily on generalist media.
Industry-Specific Insights: What AI Cites by Sector
Finance & Insurance
- Journalism accounts for 37% of citations, more than any other industry.
- Claude’s top 10 sources are 90% unique, indicating deeper niche exploration.
Healthcare
- Government and NGO sites are cited 18% of the time, more than double the cross-industry average.
- Gemini leads in source diversity for this sector.
Travel/Airline
- Surprisingly, academic citations are nearly absent (just 0.7%).
- Sources like FAA.gov and IATA.org dominate, with less reliance on news outlets.
Retail & E-Commerce
- Aggregators like Wikipedia are cited less here than in other industries (28% vs. 36%).
- Claude cites the most niche content.
Media/Entertainment
- Journalism leads again at 37%, with niche platforms like TVTechnology and Radioking cited frequently by Claude.
Technology
- Virtually no encyclopedic or academic sources are used.
- Platforms like Medium, Coursera, and SproutSocial appear prominently, reflecting a lean toward practitioner-based knowledge.
Implications for Communications and SEO Teams
The findings of this report reveal that Generative Engine Optimization (GEO) is becoming as important as traditional SEO. AI isn’t just summarizing static databases—it’s actively linking to sources in real time. And those links are influenced by:
- Recency: Update your content regularly.
- Domain Authority: Build backlinks and trust.
- Niche Relevance: Create content tailored to your industry, not just general topics.
- Content Type: Focus on earned media and informative content rather than pure marketing pages.
This changes the calculus for content marketers, PR professionals, and publishers. If your goal is to show up in AI-generated results, you must create content that AI finds valuable—not just users or Google.
Conclusion: The Consequences of Being Read (or Ignored) by AI
This report highlights a fundamental shift in how information is surfaced online: AI models don’t just retrieve content—they selectively curate it. And that curation is redefining visibility in the digital age.
For publishers, researchers, and brands, being cited by AI means being part of the next generation of search. It puts your content in front of users who may never visit your site but trust the model referencing it. The sources that get cited are amplified. Those that don’t—regardless of quality—risk being excluded from the conversation entirely.
This shift creates new winners and losers. High-authority outlets and timely, earned media are favored. Meanwhile, paid content, lightly updated blogs, or less-established voices often go unread—not just by people, but by the systems shaping what people see.
As generative AI continues to play a central role in how knowledge is delivered, the key question becomes less how to rank in search and more how to become part of what AI considers worth citing.