Glossary

The language of AI search

Short, citation-ready, and current: 22 terms covering AISO Score, schema markup, AI crawlers, and citability metrics.

Measurement & Metrics

AI citation
An AI citation is a direct reference to a specific URL in an AI-generated answer from ChatGPT, Perplexity, Google AI Overviews, Claude, or Copilot. Unlike a traditional SERP link, an AI citation requires the platform to choose your page as authoritative enough to name and attribute in its response.

Related: AI Overview, AI share of voice

AI Overview
AI Overview is Google's AI-generated summary that appears at the top of search results, synthesizing information from multiple sources and citing specific URLs. Launched in 2024, it replaces or sits above the traditional organic 10-blue-link results for many informational and how-to queries.

Related: AI citation, Fan-out query

AI share of voice
AI share of voice is the percentage of AI-generated answers on a given topic that cite your brand or website. Measured across ChatGPT, Perplexity, Gemini, Claude, and Copilot, it is the closest equivalent to traditional SERP ranking share in the AI search era — who gets named, not who ranks.

Related: AI citation, AISO Score
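
As a rough illustration of the percentage described above, the sketch below computes share of voice over a hand-built sample of answers. The Answer record, the share_of_voice helper, and the sample data are hypothetical; collecting real answers from each platform is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """One sampled AI-generated answer (hypothetical record shape)."""
    platform: str            # e.g. "ChatGPT", "Perplexity"
    cited_domains: set[str]  # domains cited in that answer


def share_of_voice(answers: list[Answer], domain: str) -> float:
    """Percentage of sampled answers that cite `domain` at least once."""
    if not answers:
        return 0.0
    hits = sum(1 for a in answers if domain in a.cited_domains)
    return 100.0 * hits / len(answers)


# Illustrative sample only; real data would come from logged answers.
sample = [
    Answer("ChatGPT", {"example.com", "other.org"}),
    Answer("Perplexity", {"other.org"}),
    Answer("Gemini", {"example.com"}),
    Answer("Claude", set()),
]
print(share_of_voice(sample, "example.com"))  # 2 of 4 answers cite it: 50.0
```

The result is only as representative as the prompts sampled, so the same prompt set should be reused when comparing periods or competitors.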

AISO Score
The AISO Score (AI Search Optimisation Score) is a 0-100 diagnostic that measures a website's readiness for AI search platforms. It evaluates six dimensions — Crawlability, Structure, Authority, Citability, Freshness, and Measurability — and is developed by Datanalytico, the AI search intelligence platform.

Related: AI citation, LLMO (LLM Optimization), GEO (Generative Engine Optimization)

Fan-out query
A fan-out query is an AI-generated sub-query that a system like Google AI Mode creates internally from a single user question. One input question may fan out into 8-15 synthetic sub-queries, each targeting a specific aspect of the answer. Planning content against roughly 12 likely sub-queries gives a page the best chance of covering that range.

Related: AI Overview, Capsule format

Markup & Structured Data

FAQPage schema
FAQPage schema is the JSON-LD structured-data type that wraps a list of questions and answers on a page, each as a Question with an acceptedAnswer. It makes the Q&A content extractable by Google AI Overview, ChatGPT, and Perplexity, which often cite FAQ answers directly as their response snippet.

Related: Schema markup (JSON-LD), Capsule format
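
A minimal sketch of how such a block could be generated, assuming the questions and answers are already available as plain-text pairs. The faq_page_jsonld helper and the sample pair are hypothetical; the @type and property names are the schema.org ones named above.

```python
import json


def faq_page_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Return a <script> tag holding FAQPage JSON-LD for the given Q&A pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    # Escape "</" so answer text cannot close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(faq_page_jsonld([("What is llms.txt?", "A curated summary file for AI assistants.")]))
```

The visible page should carry the same questions and answers as the markup, since the structured data is meant to describe content a reader can actually see.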

llms.txt
llms.txt is a plain-text file, conventionally written in Markdown, served at the root of a domain (e.g. example.com/llms.txt) that describes the site's key pages, products, and company facts for AI assistants. It functions as a curated summary that helps LLMs cite the right resources, complementing robots.txt (crawl rules) and sitemap.xml (the URL inventory).

Related: robots.txt, Schema markup (JSON-LD)
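
The sketch below assembles a small llms.txt, assuming the Markdown layout described in the public llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of annotated links. The build_llms_txt helper, the company name, and the URLs are all hypothetical.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Build llms.txt text from section name -> [(title, url, note), ...]."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = build_llms_txt(
    "Example Co",
    "Example Co makes analytics software for small teams.",
    {
        "Products": [
            ("Pricing", "https://example.com/pricing", "Plans and prices"),
        ],
        "Company": [
            ("About", "https://example.com/about", "Team and contact details"),
        ],
    },
)
print(text)
```

Keeping the list short and curated matters more than completeness, because the file is a summary rather than a second sitemap.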

Schema markup (JSON-LD)
Schema markup is structured data embedded in a web page using the schema.org vocabulary, most commonly as JSON-LD inside a script tag. It gives search engines and AI platforms a machine-readable description of the page's content — Organization, Product, FAQPage, Article, Person — so they can extract and cite it accurately.

Related: FAQPage schema, SoftwareApplication schema, BreadcrumbList

SoftwareApplication schema
SoftwareApplication schema is the schema.org type for SaaS and software products. Key fields include name, applicationCategory, operatingSystem, offers (with price and priceCurrency, plus billingDuration on a nested UnitPriceSpecification for subscriptions), and featureList. It is the correct schema for product or tool pages, as distinct from Organization, which describes the company offering the software.

Related: Schema markup (JSON-LD)
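
For illustration, the sketch below builds a SoftwareApplication object with the key fields listed above and checks that none is missing. The product name, price, and feature values are invented, and billingDuration is placed on a nested UnitPriceSpecification, which is where schema.org defines that property; the "P1M" value assumes an ISO 8601 duration for monthly billing.

```python
import json

# Hypothetical product data; every value here is illustrative only.
software_app = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "Example Analytics",
    "applicationCategory": "BusinessApplication",
    "operatingSystem": "Web",
    "featureList": ["Site scan", "Citation tracking"],
    "offers": {
        "@type": "Offer",
        "price": "49.00",
        "priceCurrency": "USD",
        "priceSpecification": {
            "@type": "UnitPriceSpecification",
            "price": "49.00",
            "priceCurrency": "USD",
            "billingDuration": "P1M",  # assumed: one-month billing period
        },
    },
}

# The key fields named in this glossary entry, not a schema.org requirement list.
KEY_FIELDS = ("name", "applicationCategory", "operatingSystem", "offers", "featureList")
missing = [field for field in KEY_FIELDS if field not in software_app]

print("missing fields:", missing)
print(json.dumps(software_app, indent=2))
```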

AI Crawlers

ClaudeBot
ClaudeBot is Anthropic's web crawler, used to collect public web content that may contribute to training Claude models. It is controlled via the ClaudeBot User-agent in robots.txt. Anthropic documents separate agents, Claude-SearchBot and Claude-User, for search indexing and user-initiated page fetches, so eligibility for citation in Claude's answers depends on those agents rather than on ClaudeBot alone.

Related: robots.txt, GPTBot

Google-Extended
Google-Extended is a robots.txt User-agent token that controls whether a site's content may be used to train Google's Gemini models and Vertex AI APIs. Blocking Google-Extended does not affect Google Search indexing or AI Overview citations — it only opts out of training-data inclusion.

Related: robots.txt, AI Overview

GPTBot
GPTBot is OpenAI's web crawler, used to fetch public web content that may be used to train its models. Site owners can allow or disallow GPTBot via a User-agent block in robots.txt. Blocking GPTBot opts the site out of future training crawls but does not block ChatGPT's live web-search citations, which OpenAI handles through the separate OAI-SearchBot and ChatGPT-User agents.

Related: robots.txt, ClaudeBot, PerplexityBot

PerplexityBot
PerplexityBot is Perplexity AI's web crawler, used to build the index that powers Perplexity's search and Pro Research features. It is controlled via the PerplexityBot User-agent in robots.txt. Perplexity is one of the few AI platforms that shows source citations prominently alongside every answer, making PerplexityBot access a direct citation-eligibility gate.

Related: robots.txt, GPTBot

robots.txt
robots.txt is a plain-text file at the root of a domain (example.com/robots.txt) that tells web crawlers which paths they may or may not access. It groups Allow and Disallow rules under per-bot User-agent lines, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and Applebot-Extended, to control crawling for AI training and search. Compliance is voluntary: the file is a request that well-behaved crawlers honor, not an access control.

Related: llms.txt, GPTBot, ClaudeBot, PerplexityBot, Google-Extended
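
One way to sanity-check per-bot rules before deploying them is Python's standard urllib.robotparser, as in the sketch below. The robots.txt content and the example.com URL are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of two training crawlers, allow the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bots without their own group (ClaudeBot here) fall back to the "*" rules.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"):
    allowed = parser.can_fetch(bot, "https://example.com/pricing")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With this policy, GPTBot and Google-Extended are blocked from /pricing, while PerplexityBot and ClaudeBot are allowed.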

Content Quality

Capsule format
The capsule format is a 40-60 word answer-first content block placed directly under a question-phrased heading. It is the structural unit AI platforms extract as a citation: the question restates the user query, the answer stands alone without surrounding context, and the target keyword appears in the first 20 words.

Related: Information Gain, Fan-out query
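
The three structural rules above (question-phrased heading, 40-60 words, keyword in the first 20 words) can be checked mechanically. The check_capsule helper below is a hypothetical sketch that counts whitespace-separated words and treats a heading ending in "?" as question-phrased; both are simplifying assumptions.

```python
def check_capsule(heading: str, answer: str, keyword: str) -> dict[str, bool]:
    """Check a heading/answer pair against the capsule-format rules."""
    words = answer.split()
    first_20 = " ".join(words[:20]).lower()
    return {
        "question_heading": heading.strip().endswith("?"),
        "length_40_to_60": 40 <= len(words) <= 60,
        "keyword_in_first_20_words": keyword.lower() in first_20,
    }


heading = "What is a fan-out query?"
answer = (
    "A fan-out query is an AI-generated sub-query that a search system "
    "creates internally from a single user question. One question may fan "
    "out into several sub-queries, each targeting a specific aspect of the "
    "answer, so content that covers those aspects has more chances to be "
    "retrieved and cited."
)
result = check_capsule(heading, answer, "fan-out query")
print(result)
```

The check says nothing about whether the answer stands alone without surrounding context; that part still needs a human read.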

E-E-A-T
E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness — Google's framework for assessing content quality and a central signal in AI citation decisions. It is signalled through author bios with credentials, Person and Organization schema, transparent contact information, and demonstrated first-party experience in the topic.

Related: Schema markup (JSON-LD), AI citation

GEO (Generative Engine Optimization)
GEO (Generative Engine Optimization) is a widely used industry term for optimizing content to be cited by generative AI engines such as ChatGPT, Perplexity, and Google AI Overviews. GEO overlaps substantially with LLMO and the AISO Score methodology: they are different labels for the same shift from ranking-for-keywords to earning-AI-citations.

Related: LLMO (LLM Optimization), AISO Score

Information Gain
Information Gain is a 0-3 score that measures how much unique value a page adds beyond what competitor pages already cover. 0 = redundant, 1 = reframed, 2 = enhanced, 3 = unique. AI platforms prioritize citing pages with higher Information Gain, making it the single most predictive quality signal for AI citability.

Related: AISO Score, Capsule format

LLMO (LLM Optimization)
LLMO (LLM Optimization) is the practice of structuring website content so that Large Language Models can extract, understand, and cite it correctly. It covers passage-level scoring across five dimensions — clarity, completeness, authority, structure, and specificity — often scored per page on a 0-5 scale alongside the broader AISO Score.

Related: AISO Score, GEO (Generative Engine Optimization), Capsule format

Analytics & Compliance

Core Web Vitals
Core Web Vitals are Google's user-experience metrics: Largest Contentful Paint (LCP ≤2.5s), Interaction to Next Paint (INP ≤200ms), and Cumulative Layout Shift (CLS ≤0.1). They are a direct Google Search ranking factor and also correlate with AI citation rates — fast, stable pages are preferred sources.

Related: AISO Score
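
The thresholds above translate directly into a pass/fail check. The sketch below is a minimal example; the passes_core_web_vitals helper and the sample measurements are hypothetical, and it assumes each input is already the representative value for the page (gathering field data is not shown).

```python
# "Good" thresholds as listed in this entry: LCP in seconds, INP in ms, CLS unitless.
THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}


def passes_core_web_vitals(lcp_s: float, inp_ms: float, cls: float) -> dict[str, bool]:
    """Return, per metric, whether the measured value is within its threshold."""
    measured = {"lcp_s": lcp_s, "inp_ms": inp_ms, "cls": cls}
    return {name: measured[name] <= limit for name, limit in THRESHOLDS.items()}


report = passes_core_web_vitals(lcp_s=2.1, inp_ms=180, cls=0.05)
print(report)                # every metric within its threshold
print(all(report.values()))  # True only if all three pass
```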
