[Image: Geologists extracting a core sample]

Written by Jonathan Taylor

Published on May 7, 2025


Running Large-Scale LLM Analysis for Generative Engine Optimization

The Dawn of a New Marketing Channel

Generative Engine Optimization (GEO) is emerging as one of the hottest topics in digital marketing today. It attracts both excitement and skepticism in a space already prone to hype. Traditional SEOs debate whether it's truly revolutionary or just another branch of existing strategies. Regardless of your stance, there's no denying the impact: brands are seeing dramatic increases in referral traffic from platforms like ChatGPT and Perplexity as their adoption continues to grow.

However, we face a significant challenge: despite all the excitement, we lack substance. We're applying playbooks that worked for SEO without the data points and measurement frameworks to determine if strategies need adaptation for GEO. There are plenty of screenshots demonstrating effective strategies, but can we scale this and turn LLM search into a powerful marketing channel?

This question drove my decision to run a large-scale LLM analysis for generative engine optimization.

The May 2025 Core Sample Report

I named it the "Core Sample" report because understanding these platforms requires drilling deep beneath the surface. Consider the countless variables: context variations, memory usage, and differences in how Perplexity and ChatGPT process queries. SEO already involves optimizing for countless keywords, but now query structure, context, and user role all play crucial roles in what LLMs return.

Despite these complexities, I believe we can gain data-driven insights. This post explains exactly how we conducted our large-scale analysis.

Our Methodology: Going Beyond Manual Testing


The simplest way to check your brand presence on ChatGPT is to open it and ask directly. There you go: you've completed one query. However, repeating this at the scale brands need to understand their performance across many different queries would be prohibitively time-consuming.

For our initial report, we built a custom application that took keywords and used OpenAI's ChatGPT to transform them into natural language queries. We then submitted these queries to both Perplexity and ChatGPT and collected the responses. From these responses, we extracted citation URLs, using a web crawler to review the pages for on-page and content analysis metrics. We integrated with platforms like PageSpeed Insights for technical performance measurements and connected with Moz for domain authority analysis.
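
To make the flow concrete, here is a minimal sketch of the first two stages (keyword to natural-language query, then query to response) using the official openai Node.js client. The model name and prompt wording are placeholders of ours, not the exact code behind the report.

```typescript
// Sketch: turn a keyword into a conversational query, then submit it to ChatGPT.
// Assumes the official "openai" npm package and OPENAI_API_KEY in the environment.
import OpenAI from "openai";

const openai = new OpenAI();

async function keywordToQuery(keyword: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o", // placeholder model name
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      {
        role: "user",
        content: `Rewrite the keyword "${keyword}" as one natural, conversational question a person might ask an AI assistant. Return only the question.`,
      },
    ],
  });
  return res.choices[0].message.content?.trim() ?? keyword;
}

async function askChatGPT(query: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: query },
    ],
  });
  return res.choices[0].message.content ?? "";
}

// Example: keywordToQuery("generative engine optimization").then(askChatGPT).then(console.log);
```

The Perplexity leg of the pipeline would work the same way against its own API, and the citations returned from each platform feed the crawler and the PageSpeed Insights and Moz lookups.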

In total, we processed 651 queries and analyzed 3,165 cited pages across both LLM platforms.

Translating Keyword Research into LLM Optimization


When you think about it, the range of possible query phrasings is effectively infinite. We don't get data from these LLMs about what people are asking (the way Google Search Console informs SEO), so we needed to translate our keyword research into natural language queries.

We focused on gathering a core set of keywords that represented particular industries or targets in B2B, then used a script to generate both informational and commercial queries for each keyword. For example, with "generative engine optimization," we'd ask "What is generative engine optimization and why is it important?" for informational intent and "What are the best tools and services for generative engine optimization?" for commercial intent.

Our query generation process maintained the original keyword while creating conversational, human-like questions. This approach allowed us to systematically analyze how LLMs respond to different query intents while using the same core keyword.
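
In its simplest form, that intent split can be expressed as templates wrapped around the keyword. The sketch below captures the idea; the actual script leaned on ChatGPT to phrase the questions more conversationally.

```typescript
// Sketch: one informational and one commercial query per keyword, keeping the
// original keyword intact. Template form only; the real pipeline used ChatGPT
// to phrase the questions more naturally.
type Intent = "informational" | "commercial";

interface GeneratedQuery {
  keyword: string;
  intent: Intent;
  query: string;
}

function generateQueries(keyword: string): GeneratedQuery[] {
  return [
    {
      keyword,
      intent: "informational",
      query: `What is ${keyword} and why is it important?`,
    },
    {
      keyword,
      intent: "commercial",
      query: `What are the best tools and services for ${keyword}?`,
    },
  ];
}

// generateQueries("generative engine optimization") yields two intent-tagged
// queries built around the same core keyword.
```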

Technical Architecture Behind Our Analysis System

Behind the scenes, we built a Node.js-based application with modular components for different aspects of the analysis. We implemented a rate limiting system through a custom ServiceRateLimiter class to enforce appropriate delays between API calls, preventing throttling while maximizing throughput. Where possible, we processed multiple requests simultaneously to improve efficiency, and our system included robust error handling with exponential backoff for failed requests.
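
As a rough illustration of the pattern, here is a simplified limiter in the spirit of that class: it spaces out calls to a service and retries failures with exponential backoff. It is a sketch, not the report's actual implementation.

```typescript
// Sketch: enforce a minimum delay between calls to one service and retry failed
// requests with exponential backoff.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

class ServiceRateLimiter {
  private lastCall = 0;

  constructor(private minDelayMs: number, private maxRetries = 3) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      // Space calls out so we never hit the service faster than minDelayMs.
      const wait = this.lastCall + this.minDelayMs - Date.now();
      if (wait > 0) await sleep(wait);
      this.lastCall = Date.now();

      try {
        return await task();
      } catch (err) {
        if (attempt >= this.maxRetries) throw err;
        // Exponential backoff: 1s, 2s, 4s, ... before the next attempt.
        await sleep(1000 * 2 ** attempt);
      }
    }
  }
}

// const perplexityLimiter = new ServiceRateLimiter(1500);
// const answer = await perplexityLimiter.run(() => askPerplexity(query)); // askPerplexity is a placeholder
```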

We faced significant technical challenges. Each LLM platform has different response structures and citation formats. We created dedicated extraction functions for each platform that could clean URLs, remove query parameters, and validate accessibility before proceeding with deeper analysis. Our data normalization process standardized data formats across different platforms for consistent analysis.
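
A minimal version of that clean-up step, assuming Node 18+ for the built-in fetch, might look like the following; the specific checks are illustrative.

```typescript
// Sketch: normalize a citation URL (strip query parameters and fragments) and
// confirm it still responds before deeper analysis.
async function cleanAndValidateUrl(raw: string): Promise<string | null> {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable URL
  }

  if (!/^https?:$/.test(url.protocol)) return null;

  // Normalize: drop tracking parameters and fragments.
  url.search = "";
  url.hash = "";
  const cleaned = url.toString();

  // Validate accessibility with a lightweight HEAD request.
  try {
    const res = await fetch(cleaned, { method: "HEAD", redirect: "follow" });
    return res.ok ? cleaned : null;
  } catch {
    return null; // unreachable page
  }
}
```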

Data Collection: Beyond Surface Metrics

A key challenge with large-scale LLM analysis is the comprehensive nature of the data collection. Each query returns content from the LLM plus citation URLs that we needed to analyze. These citation URLs were particularly interesting because they revealed patterns we could use to determine important factors when optimizing for LLMs.

For each cited page, we collected approximately 150 data points across several categories, including technical SEO factors, domain authority metrics, and content structure and quality indicators. Technical SEO analysis included schema markup presence (found in 69% of cited pages versus an industry average of 20%), HTML structure scores, mobile-friendliness (present in 95% of citations), SSL implementation (present in 98% of citations), Core Web Vitals metrics, semantic HTML usage, and CDN implementation.

We also gathered domain authority metrics such as Domain Authority scores (median: 45, not the 62.8 we originally thought), Page Authority scores, backlink counts, referring domains, spam metrics, and link propensity scores. Content analysis covered content type (blogs dominated at 59% of citations, up from our initial estimate of 50%), word count, and URL folder depth (average: 2.9, significantly higher than our initial 1.7). It also included content freshness (only 11% of cited content was less than three months old, rather than 41%), readability scores (average: 6.9/10), content depth scores (average: 6.9/10), citation match quality (average: 7.7/10), and content uniqueness (average: 6.4/10).
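
To give a sense of the shape of that dataset, here is an illustrative (and heavily trimmed) record for a single cited page. The field names are placeholders of ours, not the report's actual schema.

```typescript
// Illustrative, heavily trimmed record for one cited page (field names are
// placeholders; the full dataset tracked roughly 150 data points per page).
interface CitedPage {
  url: string;
  platform: "chatgpt" | "perplexity";
  query: string;

  // Technical SEO factors
  hasSchemaMarkup: boolean;
  mobileFriendly: boolean;
  hasSSL: boolean;
  firstContentfulPaintMs: number;
  largestContentfulPaintMs: number;

  // Domain authority metrics (e.g., from Moz)
  domainAuthority: number;
  pageAuthority: number;
  referringDomains: number;

  // Content structure and quality
  contentType: "blog" | "product" | "documentation" | "other";
  wordCount: number;
  urlFolderDepth: number;
  publishedAt?: string;       // often unknown
  readabilityScore: number;   // 0-10
  contentDepthScore: number;  // 0-10
  citationMatchScore: number; // 0-10
  uniquenessScore: number;    // 0-10
}
```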

AI-Powered Content Quality Analysis

One of the most innovative aspects of our methodology was using AI itself to evaluate content quality at scale. We implemented an analyzeContent function that leveraged GPT-4 to evaluate cited pages across multiple dimensions: content depth (how thoroughly the topic is covered), citation match quality (how well the content addresses the original query), content uniqueness (how original the content is compared to similar resources), and readability (how accessible the content is to the average reader).
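
A stripped-down sketch of that evaluator pattern, assuming the openai client and JSON-formatted responses, could look like this; the rubric wording is ours, not the report's prompt.

```typescript
// Sketch: LLM-as-evaluator scoring in the spirit of the analyzeContent function
// described above. The prompt, model, and field names are illustrative.
import OpenAI from "openai";

const openai = new OpenAI();

interface ContentScores {
  contentDepth: number;   // 0-10: how thoroughly the topic is covered
  citationMatch: number;  // 0-10: how well it addresses the original query
  uniqueness: number;     // 0-10: how original the content is
  readability: number;    // 0-10: how accessible it is to the average reader
}

async function analyzeContent(query: string, pageText: string): Promise<ContentScores> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You rate web content. Reply with JSON containing contentDepth, " +
          "citationMatch, uniqueness, and readability, each a number from 0 to 10.",
      },
      {
        role: "user",
        content: `Original query: ${query}\n\nPage content:\n${pageText.slice(0, 8000)}`,
      },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}") as ContentScores;
}
```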

We also had fun with content classification by creating a "Rock Paper Scissors Lizard Spock" framework. "Rock" content (75%) represents foundational pillar content with comprehensive information. "Paper" content (2%) is thinner content attempting to cover a lot of ground. "Scissors" content (11%) refers to opinion-based content like reviews or editorial pieces. "Lizard" content (12%) is time-based content (e.g., "best shoes in 2025"). Finally, "Spock" content (0%) represents highly imaginative or speculative content.

This classification wasn't just entertaining—it helped us understand what content types LLMs favor for particular queries. The overwhelming preference for "Rock" content (75%) suggests LLMs strongly favor comprehensive, authoritative resources.
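
For illustration, the same LLM-as-judge pattern can drive the classification itself; the category definitions below paraphrase the framework above, and the prompt is our own sketch.

```typescript
// Sketch: classify a cited page into one of the five categories by prompting an
// LLM with short definitions. The definitions paraphrase the framework above.
import OpenAI from "openai";

const openai = new OpenAI();

type ContentClass = "rock" | "paper" | "scissors" | "lizard" | "spock";

const DEFINITIONS = `
rock: comprehensive, foundational pillar content
paper: thinner content attempting to cover a lot of ground
scissors: opinion-based content such as reviews or editorials
lizard: time-based content (e.g. "best shoes in 2025")
spock: highly imaginative or speculative content`;

async function classifyContent(pageText: string): Promise<ContentClass> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `Classify the following web content as exactly one of: rock, paper, scissors, lizard, spock.${DEFINITIONS}\nReply with the single category word only.`,
      },
      { role: "user", content: pageText.slice(0, 8000) },
    ],
  });
  return (res.choices[0].message.content?.trim().toLowerCase() ?? "rock") as ContentClass;
}
```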

Platform Differences: ChatGPT vs. Perplexity

Our analysis revealed interesting differences between the two platforms we studied. We processed different volumes per platform (ChatGPT: 325 queries, Perplexity: 326 queries) and our code tracked platform-specific citation metrics for comparative analysis. While both platforms showed preferences for high-authority domains, the distribution patterns varied.

One particularly interesting finding was that Perplexity cited more recently published content on average than ChatGPT, suggesting potential differences in how the platforms prioritize content freshness. This insight alone could influence how content strategies might be tailored to each platform.

Key Findings and Surprises

Several findings surprised us during our analysis. Schema implementation is dramatically overrepresented—69% of cited pages had schema markup, significantly exceeding the industry average of 20%. This suggests structured data may be far more important for LLM visibility than for traditional SEO.

Page speed seems less critical than expected. Despite poor Core Web Vitals scores (FCP: 4.7s, LCP: 10.4s), pages were still frequently cited, suggesting speed may be less important for LLM citations than for traditional search rankings.

URL structure impacts visibility significantly, but not as we initially thought. URLs with Depth 3 (e.g., /products/category/item) were most frequently cited (49%), with an average depth of 2.9. This suggests that LLMs prefer more specific content paths with greater context, contrary to our initial theory about first-level content accessibility.

Content freshness appears less critical than we initially believed, with only 11% of cited content being less than 3 months old. This is a dramatic difference from our preliminary findings and suggests that evergreen content may perform better than we thought. It's worth noting that 64% of content had an unknown date, which could be skewing these results.

Regarding domain authority, we found that the medium-authority bracket (26-50 DA) dominated citations at 39%, with a median DA of 45 and only 42% of citations coming from high-authority domains (DA above 50). This suggests a more democratic distribution than our initial findings indicated.

Finally, blog content led the way even more definitively than we first thought, dominating LLM citations at 59% (up from our initial 50%), with product pages at 27%, documentation at 8%, and other content types at 6%. This strongly reinforces that narrative formats work exceptionally well for LLM understanding and citation.

Methodology Limitations and Controls

I'm not claiming this report is definitive. Anyone running large-scale LLM analysis faces challenges we must acknowledge:

  1. The sheer variety of possible word combinations when crafting queries
  2. Factors like context and user location affecting results
  3. Lack of a foundational data source like Google Search Console

We attempted to control for these variables by creating new instances of ChatGPT or Perplexity for each query, minimizing prompts (using vanilla versions of the platforms), keeping system prompts simple (e.g., "You are a helpful assistant"), and implementing randomness control measures (dice roll distribution analysis). Our goal was not to create the final word on GEO but to establish baseline indicators for experimentation and optimization in this new digital marketing channel.
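
As one example of that kind of control, a simple way to gauge run-to-run randomness is to repeat a query several times and count how often each domain gets cited. The sketch below illustrates the general idea; it is not necessarily the report's exact dice-roll analysis.

```typescript
// Sketch: rerun the same query several times and count how often each domain is
// cited, as a rough read on run-to-run randomness. The ask function and citation
// extractor are passed in, so the check is platform-agnostic.
async function citationStability(
  query: string,
  ask: (q: string) => Promise<string>,
  extractCitations: (answer: string) => string[],
  runs = 5
): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  for (let i = 0; i < runs; i++) {
    const answer = await ask(query); // each call is a fresh, single-turn request
    for (const url of extractCitations(answer)) {
      const domain = new URL(url).hostname;
      counts.set(domain, (counts.get(domain) ?? 0) + 1);
    }
  }
  return counts; // domains cited in every run are stable; one-offs are noise
}
```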

The Evolution of Our Analysis Approach


Our data collection and analysis approach continues to evolve as LLMs themselves evolve. We've already migrated from storing data in Airtable to using JSON objects for more flexible data storage. Our future plans include integration with database systems for better data access and visualization, expanded query classification with more query types, and a larger query set (targeting 1,500-2,000 queries), which should yield 7,000-10,000 cited pages. We're also improving our technical infrastructure to handle the larger scale and adding more platforms (Claude with web search, Gemini, and Google's AI Overviews).
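
For what the JSON-based storage amounts to in practice, a minimal sketch is appending one object per cited page as JSON Lines; the file name and fields below are placeholders.

```typescript
// Sketch: append one JSON object per cited page as JSON Lines, which is roughly
// what "JSON objects instead of Airtable" looks like in practice.
import { appendFile } from "node:fs/promises";

async function saveRecord(path: string, record: object): Promise<void> {
  await appendFile(path, JSON.stringify(record) + "\n", "utf8");
}

// saveRecord("core-sample.jsonl", { url: "https://example.com/post", platform: "chatgpt", domainAuthority: 45 });
```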

Practical Applications for Marketers

How can you apply these findings to your GEO strategy? While we're still in the exploration phase, our data suggests several practical tactics.

Implement schema markup right away. With 69% of cited pages using schema (vs. 20% industry average), this appears highly beneficial for LLM visibility. Consider creating deeper content hierarchies in your site structure, as URLs with Depth 3 were most frequently cited (49%), contradicting traditional SEO wisdom about flatter architectures.
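
If schema is the place to start, a minimal Article payload goes a long way. The values below are placeholders; adjust the schema type and fields to the page in question.

```typescript
// Sketch: a minimal Article JSON-LD payload. All values are placeholders; adapt
// the schema type and fields to the page in question.
const articleSchema = {
  "@context": "https://schema.org",
  "@type": "Article",
  headline: "What Is Generative Engine Optimization?",
  datePublished: "2025-05-07",
  author: { "@type": "Person", name: "Jonathan Taylor" },
  publisher: { "@type": "Organization", name: "Knowbots" },
};

// Rendered into the page head as a <script type="application/ld+json"> tag.
const schemaTag = `<script type="application/ld+json">${JSON.stringify(articleSchema)}</script>`;
```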

Prioritize blog-style formats: they dominated citations (59%) even more definitively than we first thought, suggesting narrative formats work exceptionally well for LLMs. Don't be overly concerned about content freshness. Our updated data shows only 11% of cited content was less than 3 months old, suggesting that evergreen, high-quality content may perform better than constantly updated material.

Create content with high citation match quality—content that directly answers user queries (scoring 7.7/10 on average) performed best in our analysis. Don't forget to maintain baseline technical factors. While page speed seems less critical, mobile-friendliness (95%) and SSL (98%) appear to be baseline requirements for getting cited by LLMs.

The Next Chapter: June 2025 Core Sample Report

We're already planning our next report, which will expand on our original methodologies while addressing several areas for improvement. We'll gather better information about on-page elements like headings, tables, images, and videos. We're expanding our query classification with more query types and increasing scale with more queries and platforms. Technical infrastructure improvements are in development, along with advanced content analysis methodologies.

Conclusion: Exploration and Experimentation


We're in the exploration age of generative engine optimization. Much overlaps with SEO—if you're working on GEO initiatives using current strategies, they should benefit your SEO efforts too.

However, I see unique opportunities to be more personalized, include more context in our content, and be much more specific to target users or customers. I'm using this data with my clients to help them develop competitive advantages, experimenting to win citations and increase traffic from this new channel.

The May 2025 Core Sample Report is just the beginning of our journey into understanding how to optimize for LLMs. The patterns we've uncovered should spark experimentation, not rigid conclusions. I invite you to join us in exploring this exciting new frontier in digital marketing.

Want first access to our June 2025 Core Sample Report? Subscribe to our updates and be among the first to receive our expanded analysis across more platforms and query types.
