AI Sources: Where AI Gets Its Information
Get clear on where AI actually learns about your brand. This lesson breaks AI sources into five categories (pretraining knowledge, real-time retrieval, search indexes, independent crawling, and knowledge graphs) and shows that AI assembles brand perception from multiple sources rather than crawling only your official site.
- Track: GEO Foundations
- Module: Core GEO Concepts
- Duration: 15 min
- Format: Video
- Views: 501
Lesson Overview
GEO (generative engine optimization) teams often ask: “Where does the AI actually learn about us?” The answer is one of the most important foundational concepts in GEO: without understanding how AI’s sources are structured, it is hard to design a content and channel strategy.
This lesson breaks down the AI information-source system clearly, helping learners understand why problems like “the official site is correct, yet the AI still answers wrong” occur.
Core Concepts
The Five-Category Breakdown of AI Information Sources
Based on published overviews of AI answer engines and Google’s public Knowledge Graph documentation, AI information sources can be broken into five categories (Sources: Search Engine Land, support.google.com).
- Pretraining knowledge: the model’s training data forms a base of world knowledge, but it has a cutoff date and cannot cover the latest facts (Source: Search Engine Land).
- Real-time search and retrieval: for new, dynamic, and time-sensitive questions, the AI needs to retrieve web pages and indexes in real time (Source: Search Engine Land).
- Search engine indexes: many AI search products do not build their view of the world from scratch; they rely heavily on existing search indexes. Google AI Overviews pulls primarily from Google search results, and ChatGPT Search also depends on third-party search providers and partner content (Source: Search Engine Land).
- Independent crawling and platform-built indexes: independent search products such as Perplexity crawl web pages themselves and build their own retrieval layer, which means brands can’t focus on Google alone (Source: Search Engine Land).
- Knowledge graphs and authoritative fact bases: Google states explicitly that the facts in the Knowledge Graph come from public sources, licensed data, and information provided directly by content owners. In other words, the brand’s official site is not the only source—third-party authoritative material, structured information, and the consistency of external sources also affect entity understanding (Source: support.google.com).
Core Methodology
We recommend giving learners a model to work with:
AI Sources = owned content + search indexes + third-party authority + knowledge graph + real-time web signals
From this, an important conclusion follows: AI doesn’t just “crawl your official site”—it “assembles your brand perception from multiple sources.” This also explains why “the official site is correct, yet the AI still answers wrong” happens—the error often comes from gaps or inconsistencies in the external evidence layer.
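To make the idea of "gaps or inconsistencies in the external evidence layer" concrete, here is a minimal sketch that compares one brand fact across the five parts of the model. Everything in it is an assumption made for illustration: the fact being checked (a founding year), the recorded values, and the `audit_fact` helper are hypothetical and do not come from any tool or dataset.

```python
# Hypothetical observations of one brand fact (founding year), one per part of the model.
# All values are invented for illustration; None means no evidence was found there.
observations = {
    "owned content": "2015",
    "search indexes": "2015",
    "third-party authority": "2013",
    "knowledge graph": None,
    "real-time web signals": "2015",
}


def audit_fact(observations, official_layer="owned content"):
    """Compare every layer against the official value and report gaps and conflicts."""
    official = observations[official_layer]
    # A gap is a layer with no evidence at all.
    gaps = [layer for layer, value in observations.items() if value is None]
    # A conflict is a layer that states something different from the official value.
    conflicts = [
        layer
        for layer, value in observations.items()
        if value is not None and value != official
    ]
    return {"official": official, "gaps": gaps, "conflicts": conflicts}


report = audit_fact(observations)
print(report)
```

In this made-up case the official site is correct, yet one external layer disagrees and another is empty, which is the kind of situation in which an AI answer can still come out wrong.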
In-Class Exercise
Pick a brand and perform an AI Sources tracing analysis:
- Run 5 prompts and note which sources the AI’s answers cite
- Distinguish between owned domains, media sites, forums / communities, directory sites, and documentation sites
- Record whether any incorrect sources / low-quality sources / competitor sources appeared
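The exercise above can be tallied by hand, but a short script helps once the list of cited URLs grows. The following is a rough sketch under stated assumptions: the domains in `DOMAIN_TYPES`, the sample citations, and the helper names `classify` and `audit` are all hypothetical, and the mapping from domain to source type has to be maintained manually for the brand being audited.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from domain to source type; replace with the brand's real list.
DOMAIN_TYPES = {
    "example-brand.com": "owned domain",
    "docs.example-brand.com": "documentation site",
    "example-news.com": "media site",
    "example-forum.com": "forum / community",
    "example-directory.com": "directory site",
}


def classify(url):
    """Return the source type for a cited URL, or 'unclassified' if the domain is unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Try the full host first, then progressively shorter parent domains.
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_TYPES:
            return DOMAIN_TYPES[candidate]
    return "unclassified"


def audit(citations):
    """Count cited sources by type. `citations` maps each prompt to its cited URLs."""
    counts = Counter()
    for urls in citations.values():
        for url in urls:
            counts[classify(url)] += 1
    return counts


# Invented sample data: two prompts and the URLs the AI answers cited.
citations = {
    "prompt 1": [
        "https://www.example-brand.com/pricing",
        "https://example-news.com/review",
    ],
    "prompt 2": [
        "https://docs.example-brand.com/setup",
        "https://unknown-blog.net/post",
    ],
}

counts = audit(citations)
for source_type, n in counts.most_common():
    print(f"{source_type}: {n}")
```

Anything that lands in "unclassified" is a good candidate for manual review, since incorrect, low-quality, or competitor sources tend to be domains nobody has catalogued yet.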
Learning Outcomes
- An “AI Sources Audit” table
- A “Brand External Evidence Source” checklist
- A “High-Risk Erroneous Source Investigation” table
- The ability to distinguish AI’s categories of sources, explain why “the official site is correct yet the AI still answers wrong,” and identify gaps in the brand’s external evidence layer