The peer-reviewed science behind Generative Engine Optimization.
Most GEO/AEO content on the internet is opinion, anecdote, or vendor speculation. This page is different. It's the canonical research foundation for best-aeo-skill, focused on the one peer-reviewed paper that established GEO as a research field — and how every scoring weight in our skill traces back to it.
Generative Engine Optimization (GEO) is a young field. The term was formally introduced in November 2023 with the arXiv preprint of "GEO: Generative Engine Optimization" by a Princeton-led team, and presented at KDD 2024 — the Association for Computing Machinery's premier data science conference. Before that paper, the entire literature on optimizing for AI-generated answers was practitioner blog posts and vendor whitepapers.
The Princeton paper changed that. It formalized GEO as a measurable discipline by:
- Building GEO-bench, a 10,000-query benchmark spanning 9 domains (legal, history, science, business, etc.)
- Defining Position-Adjusted Word Count (PAWC) and Subjective Impression as standardized citation-quality metrics
- Empirically testing 9 distinct optimization tactics against this benchmark
- Measuring per-tactic and per-domain effects with statistical rigor
Two years later (2026), the paper has accumulated hundreds of citations and remains the only widely-cited peer-reviewed work that quantifies which tactics actually move the needle on AI citation rates. Every other "GEO study" you'll see on the internet — from agencies, vendors, or commenters — either cites this paper or makes uncalibrated claims.
That's why best-aeo-skill operationalizes the Princeton paper specifically. When a user asks "will this work?", we can point to peer review, not anecdote.
The setup
The team built GEO-bench: 10,000 user queries across 9 domains. For each query, they generated a baseline response using a generative engine (a synthesized answer with cited sources). Then they applied each of 9 candidate optimization tactics to the source content and re-ran the query — measuring whether the modified source got more visibility in the new synthesized response.
"Visibility" was operationalized two ways:
- Position-Adjusted Word Count (PAWC) — how much of the synthesized answer is sourced from this page, weighted by where in the answer it appears (top-of-answer = higher weight)
- Subjective Impression — judges' rating of how prominently the source is featured in the response
Both metrics moved together for most tactics. The paper reports composite "visibility uplift" percentages, which we use throughout this site.
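To make the position weighting concrete, here is a minimal sketch of a PAWC-style calculation. It assumes a simple exponential position-decay weight; the paper's exact weighting function may differ.

```python
# Illustrative PAWC-style metric: words contributed by a source, weighted by
# how early in the synthesized answer they appear. The exponential decay is
# an assumption for illustration, not the paper's exact formula.
import math

def pawc(answer_sentences, source_id):
    """answer_sentences: list of (cited_source, sentence) in answer order."""
    total = len(answer_sentences)
    score = 0.0
    for position, (cited_source, sentence) in enumerate(answer_sentences):
        if cited_source == source_id:
            weight = math.exp(-position / total)  # earlier sentences weigh more
            score += weight * len(sentence.split())
    return score

answer = [
    ("page_a", "Generative engines synthesize answers from cited sources."),
    ("page_b", "Source emphasis raises citation likelihood."),
    ("page_a", "Formatting alone can change visibility."),
]
print(pawc(answer, "page_a"))
```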
The headline result
The single most surprising finding from the paper:
Source emphasis — the simple act of bolding citations or framing them prominently — increased citation likelihood by +115%, the strongest effect of any tactic tested (Aggarwal et al., 2024, Section 5.2).
Two implications:
- You can more than double your AI citation rate with formatting alone, no new content needed
- Most existing content on the internet under-emphasizes its sources, which is why so many site owners are dissatisfied with their AI search performance
The paper also identified two negative findings: tactics that reduce visibility. Keyword-stuffing was the most prominent — confirming that the same tactic that hurts in modern Google also hurts in generative engines, possibly more aggressively.
| # | Tactic | Description | Visibility impact |
|---|---|---|---|
| 1 | Source emphasis | Bold or otherwise emphasize cited sources, references, attribution. | +115% |
| 2 | Expert quotes | Add 2-4 attributed quotations per ~1000 words. Use quotation marks with speaker name. | +41% |
| 3 | Statistics | Add numeric claims with sources. Target ~1 stat per 200 words. | +40% |
| 4 | Inline citations | Reference primary sources at the point of claim, not only at the bottom. | +30% |
| 5 | Authority signaling | Credential markup, named contributors, institutional affiliation. | +25% |
| 6 | Improved fluency | Natural language; reduced formulaic phrasing; varied sentence length. | +15% |
| 7 | Easy-to-read | Flesch-Kincaid grade 8-10. More academic prose loses general-purpose engines; oversimplified prose loses authority. | +12% |
| 8 | Topic relevance | One primary topic per page. Avoid multi-topic mash-up content. | +10% |
| 9 | Keyword stuffing | Stuffing the page with target keywords. | -22% |
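The quote and statistic targets in rows 2-3 of the table can be checked mechanically before publishing. A rough sketch, using simple regex heuristics that are our own illustration rather than the skill's actual collectors:

```python
# Rough density check for the per-1,000-word quote and per-200-word statistic
# targets in the table above. The regexes are illustrative assumptions, not
# the skill's real extraction logic.
import re

def density_report(text):
    words = max(len(text.split()), 1)
    quotes = len(re.findall(r'"[^"]{20,}"', text))       # quoted passages of 20+ characters
    stats = len(re.findall(r'\b\d+(?:\.\d+)?%?', text))  # integers, decimals, percentages
    return {
        "quotes_per_1000_words": quotes / words * 1000,
        "stats_per_200_words": stats / words * 200,
    }
```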
A research paper is just text until someone implements it. We built best-aeo-skill as a one-to-one operationalization. Each Princeton tactic maps to a specific evidence collector in our scoring engine, and each collector maps to a numbered Rule in SKILL.md.
When you run bestaeo audit, each finding the skill returns is grounded in this map. If a finding says "Add expert quotes — projected +12 GEO score," you can trace it to quote_extractor → Rule 13 → Aggarwal et al., 2024, Section 5.2, Tactic 2. No invented metrics.
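For illustration, that traceability can be represented as a small lookup table. Only the quote_extractor → Rule 13 chain comes from the example above; the other names below are hypothetical placeholders, not the skill's actual registry.

```python
# Hypothetical sketch of the tactic -> collector -> rule -> source map.
# Only the expert-quotes chain is taken from the example in the text;
# the statistics entry uses placeholder names for illustration.
TACTIC_MAP = {
    "expert_quotes": {
        "collector": "quote_extractor",
        "rule": "Rule 13",
        "source": "Aggarwal et al., 2024, Section 5.2, Tactic 2",
    },
    "statistics": {
        "collector": "stat_density",  # placeholder collector name
        "rule": "Rule 14",            # placeholder rule number
        "source": "Aggarwal et al., 2024, Section 5.2, Tactic 3",
    },
}

def trace(tactic):
    """Return the provenance chain for a finding, or None if unmapped."""
    entry = TACTIC_MAP.get(tactic)
    if entry is None:
        return None
    return f"{entry['collector']} -> {entry['rule']} -> {entry['source']}"

print(trace("expert_quotes"))
```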
Beyond the Princeton paper, the field generates ongoing empirical data from industry sources. We track the most useful figures and update our scoring weights when reliable measurements appear:
Sources we cite
- SE Ranking — audited 300,000 domains for llms.txt presence (Q1 2026); reports 10.13% adoption.
- Superlines — quarterly tracking of Google AI Overview trigger rates; up from 13.14% in March 2025 to 25.11% in Q1 2026.
- Position.digital — analysis of AI referral traffic distribution across engines; ChatGPT dominates at 87%.
- HubSpot — case studies showing 6× AI-referred trial uplift within 7 weeks of consistent optimization.
- OpenAI usage reports — ChatGPT WAU 900M, monthly visits 5.72B (2026).
- SimilarWeb — zero-click search rate tracking; 43% in standard mode, 93% with AI Mode active.
None of these are peer-reviewed in the academic sense, but they are traceable empirical figures from organizations whose business depends on the data being accurate. We treat them as Tier-2 citations: useful, but explicitly marked as industry data, not peer-reviewed research.
The 4-vector composite
The Princeton tactics cluster into four orthogonal vectors. We weight them based on what's most actionable for the typical site:
- Technical Accessibility (20%) — robots.txt, AI bot allowance, JS rendering. If crawlers can't reach you, prose doesn't matter.
- Content Citability (35%) — statistic density, expert quotes, citations, freshness. The single biggest weight, because Princeton's strongest tactics live here.
- Structured Data (20%) — FAQPage, Article, Organization, HowTo, Speakable. Beyond Princeton, but empirically high-leverage for AI Overviews and Perplexity.
- Entity & Brand Signals (25%) — author credentials, Knowledge Graph linking, NAP consistency. Sustained citation requires entity presence, not just one-off content quality.
Weights adapt to your business profile (SaaS, e-commerce, publisher, local, agency, devtools, academic, default). A SaaS landing page isn't audited like a news article: the Structured Data vector matters more for SaaS, while Citability matters more for publishers.
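A minimal sketch of the composite, assuming a plain weighted average of 0-100 vector scores. The default weights come from the list above; the SaaS override is an illustrative assumption, not the skill's published profile.

```python
# Weighted 4-vector composite. Default weights come from the list above;
# the SaaS override below is an illustrative assumption.
DEFAULT_WEIGHTS = {
    "technical": 0.20,
    "citability": 0.35,
    "schema": 0.20,
    "entity": 0.25,
}

PROFILE_OVERRIDES = {
    # Hypothetical example: shift weight toward structured data for SaaS.
    "saas": {"technical": 0.20, "citability": 0.25, "schema": 0.30, "entity": 0.25},
}

def composite_score(vector_scores, profile="default"):
    """vector_scores: dict of 0-100 scores per vector."""
    weights = PROFILE_OVERRIDES.get(profile, DEFAULT_WEIGHTS)
    return sum(weights[v] * vector_scores[v] for v in weights)

print(composite_score({"technical": 90, "citability": 60, "schema": 70, "entity": 80}))
```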
Confidence labels
Every finding output by the skill carries one of three labels:
- Confirmed — directly observed by an evidence collector. Example: `parse_html.py` returned no `<title>` tag.
- Likely — inferred from ≥2 collectors that agree. Example: `schema_validate` found no FAQPage AND `quote_extractor` detected Q&A patterns.
- Hypothesis — LLM judgment or single weak signal. Always flagged for human review.
This is the anti-hallucination guarantee: no recommendation is ever presented without a label. If a tool tells you "fix this" without saying how confident it is — be skeptical.
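As a sketch, the label assignment reduces to a small decision rule over collected evidence. The thresholds mirror the definitions above; the data shape and function name are our own illustration.

```python
# Illustrative label assignment. The direct-observation and collector-count
# rules follow the definitions above; the evidence data shape is an assumption.
def confidence_label(evidence):
    """evidence: list of dicts like {"collector": str, "direct": bool}."""
    if any(e["direct"] for e in evidence):
        return "Confirmed"   # directly observed by an evidence collector
    if len({e["collector"] for e in evidence}) >= 2:
        return "Likely"      # >=2 independent collectors agree
    return "Hypothesis"      # single weak signal or LLM judgment: human review
```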
Score bands
- 86-100 Excellent · cited frequently · maintain freshness
- 68-85 Good · regular citation, gaps to fix · apply top-3 fixes
- 36-67 Foundation · indexed but rarely cited · run full audit, fix everything
- 0-35 Critical · effectively invisible · fix Technical and Schema first, then content
A score below 36 almost always indicates a technical or schema problem, not a content problem. The audit's recommended action ordering reflects this.
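For reference, the bands translate directly into a lookup; a minimal sketch whose action strings paraphrase the list above.

```python
# Score-band lookup mirroring the bands listed above.
def score_band(score):
    if score >= 86:
        return "Excellent", "cited frequently; maintain freshness"
    if score >= 68:
        return "Good", "regular citation with gaps; apply top-3 fixes"
    if score >= 36:
        return "Foundation", "indexed but rarely cited; run full audit"
    return "Critical", "effectively invisible; fix Technical and Schema first"

print(score_band(42))  # ('Foundation', 'indexed but rarely cited; run full audit')
```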