Glossary
The language of AI search
Short, citation-ready, and current: 22 terms covering AISO Score, schema markup, AI crawlers, and citability metrics.
All terms
- AI citation
- AI Overview
- AI share of voice
- AISO Score
- BreadcrumbList
- Capsule format
- ClaudeBot
- Consent Mode v2
- Core Web Vitals
- E-E-A-T
- Fan-out query
- FAQPage schema
- GEO (Generative Engine Optimization)
- Google-Extended
- GPTBot
- Information Gain
- LLMO (LLM Optimization)
- llms.txt
- PerplexityBot
- robots.txt
- Schema markup (JSON-LD)
- SoftwareApplication schema
Measurement & Metrics
- AI citation
- An AI citation is a direct reference to a specific URL in an AI-generated answer from ChatGPT, Perplexity, Google AI Overviews, Claude, or Copilot. Unlike a traditional SERP link, an AI citation requires the platform to choose your page as authoritative enough to name and attribute in its response.
- AI Overview
- AI Overview is Google's AI-generated summary that appears at the top of search results, synthesizing information from multiple sources and citing specific URLs. Launched in 2024, it replaces or sits above the traditional organic 10-blue-link results for many informational and how-to queries.
- AISO Score
- The AISO Score (AI Search Optimisation Score) is a 0-100 diagnostic that measures a website's readiness for AI search platforms. It evaluates six dimensions — Crawlability, Structure, Authority, Citability, Freshness, and Measurability — and is developed by Datanalytico, the AI search intelligence platform.
- Fan-out query
- A fan-out query is an AI-generated sub-query that a system like Google AI Mode creates internally from a single user question. One input question may fan out into 8-15 synthetic sub-queries, each targeting a specific aspect of the answer. Planning content against roughly 12 of these sub-queries is a reasonable target for covering the full answer.
Related (AI citation): AI Overview, AI share of voice
Related (AI Overview): AI citation, Fan-out query
Related (AISO Score): AI citation, LLMO (LLM Optimization), GEO (Generative Engine Optimization)
Related (Fan-out query): AI Overview, Capsule format
Markup & Structured Data
- FAQPage schema
- FAQPage schema is the JSON-LD structured-data type that wraps a list of questions and answers on a page, each as a Question with an acceptedAnswer. It makes the Q&A content extractable by Google AI Overview, ChatGPT, and Perplexity, which often cite FAQ answers directly as their response snippet.
- llms.txt
- llms.txt is a plain-text file, conventionally written in Markdown, served at the root of a domain (e.g. example.com/llms.txt) that describes the site's key pages, products, and company facts for AI assistants. It functions as a curated summary, in the way robots.txt carries crawl rules and sitemap.xml lists URLs, to help LLMs cite the right resources.
- Schema markup (JSON-LD)
- Schema markup is structured data embedded in a web page using the schema.org vocabulary, most commonly as JSON-LD inside a script tag. It gives search engines and AI platforms a machine-readable description of the page's content — Organization, Product, FAQPage, Article, Person — so they can extract and cite it accurately.
- SoftwareApplication schema
- SoftwareApplication schema is the schema.org type for SaaS and software products. Key fields include name, applicationCategory, operatingSystem, offers (with price and, through a price specification, billingDuration), and featureList. It is the correct schema for product or tool pages, as distinct from Organization, which describes the company offering the software.
Related (FAQPage schema): Schema markup (JSON-LD), Capsule format
Related (llms.txt): robots.txt, Schema markup (JSON-LD)
Related (Schema markup (JSON-LD)): FAQPage schema, SoftwareApplication schema, BreadcrumbList
Related (SoftwareApplication schema): Schema markup (JSON-LD)
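As a rough illustration of the FAQPage structure described in this section, the following Python sketch builds the JSON-LD for a list of question/answer pairs. The helper name build_faq_jsonld and the sample question are hypothetical and exist only for this example; the @type, mainEntity, name, acceptedAnswer, and text keys follow the schema.org vocabulary.

```python
import json


def build_faq_jsonld(pairs):
    """Return a FAQPage JSON-LD dict for (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical sample content, for illustration only.
faq = build_faq_jsonld([
    (
        "What is llms.txt?",
        "A file at the domain root that summarizes key pages for AI assistants.",
    ),
])

# This string is what would sit inside a <script type="application/ld+json"> tag.
faq_json = json.dumps(faq, indent=2)
print(faq_json)
```

Generating the markup from the same data that renders the visible Q&A is one way to keep the two from drifting apart.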
AI Crawlers
- ClaudeBot
- ClaudeBot is Anthropic's web crawler, used to collect public web content that may contribute to model training. It is controlled via the ClaudeBot User-agent in robots.txt. Anthropic documents separate user agents, Claude-SearchBot and Claude-User, for search indexing and user-initiated fetches, so eligibility for citation in Claude's answers depends on those agents as well as on ClaudeBot.
- Google-Extended
- Google-Extended is a robots.txt User-agent token that controls whether a site's content may be used to train Google's Gemini models and Vertex AI APIs. Blocking Google-Extended does not affect Google Search indexing or AI Overview citations — it only opts out of training-data inclusion.
- GPTBot
- GPTBot is OpenAI's web crawler, used to fetch public web content for training its models. Site owners can allow or disallow GPTBot via a User-agent block in robots.txt. Blocking GPTBot opts the site out of OpenAI's training crawl but does not block ChatGPT's live web-search citations, which rely on OpenAI's separate OAI-SearchBot and ChatGPT-User agents.
- PerplexityBot
- PerplexityBot is Perplexity AI's web crawler, used to build the index that powers Perplexity's search and Pro Research features. It is controlled via the PerplexityBot User-agent in robots.txt. Perplexity is one of the few AI platforms that shows source citations prominently alongside every answer, making PerplexityBot access a direct citation-eligibility gate.
- robots.txt
- robots.txt is a plain-text file at the root of a domain (example.com/robots.txt) that tells web crawlers which paths they may or may not access. It uses User-agent groups per bot, including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and Applebot-Extended, to control crawling for AI training and for search.
Related (ClaudeBot): robots.txt, GPTBot
Related (Google-Extended): robots.txt, AI Overview
Related (GPTBot): robots.txt, ClaudeBot, PerplexityBot
Related (PerplexityBot): robots.txt, GPTBot
Related (robots.txt): llms.txt, GPTBot, ClaudeBot, PerplexityBot, Google-Extended
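To make the per-bot User-agent rules concrete, here is a minimal sketch that parses a sample robots.txt with Python's standard urllib.robotparser and reports which tokens may fetch a page. The policy itself (blocking GPTBot and Google-Extended, allowing PerplexityBot) and the example.com URLs are assumptions chosen for illustration, not a recommendation. Google-Extended is a control token rather than a separate crawler, so the check below only models how the rule would be read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration: opt out of two training-related
# tokens, explicitly allow one AI search crawler, and keep /admin/ private.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def crawl_access(url, agents):
    """Map each user-agent token to whether the sample rules let it fetch url."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawl_access(
    "https://example.com/glossary",
    ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
)
print(access)
```

ClaudeBot has no group of its own in this sample, so it falls through to the wildcard group and is only kept out of /admin/.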
Content Quality
- Capsule format
- The capsule format is a 40-60 word answer-first content block placed directly under a question-phrased heading. It is the structural unit AI platforms extract as a citation: the question restates the user query, the answer stands alone without surrounding context, and the target keyword appears in the first 20 words.
- E-E-A-T
- E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness — Google's framework for assessing content quality and a central signal in AI citation decisions. It is signalled through author bios with credentials, Person and Organization schema, transparent contact information, and demonstrated first-party experience in the topic.
- GEO (Generative Engine Optimization)
- GEO (Generative Engine Optimization) is a widely used term for optimizing content to be cited by generative AI engines such as ChatGPT, Perplexity, and Google AI Overviews. GEO overlaps substantially with LLMO and the AISO Score methodology: they are different labels for the same shift from ranking for keywords to earning AI citations.
- Information Gain
- Information Gain is a 0-3 score that measures how much unique value a page adds beyond what competitor pages already cover. 0 = redundant, 1 = reframed, 2 = enhanced, 3 = unique. AI platforms prioritize citing pages with higher Information Gain, making it the single most predictive quality signal for AI citability.
- LLMO (LLM Optimization)
- LLMO (LLM Optimization) is the practice of structuring website content so that Large Language Models can extract, understand, and cite it correctly. It covers passage-level scoring across five dimensions — clarity, completeness, authority, structure, and specificity — often scored per page on a 0-5 scale alongside the broader AISO Score.
Related (Capsule format): Information Gain, Fan-out query
Related (E-E-A-T): Schema markup (JSON-LD), AI citation
Related (GEO): LLMO (LLM Optimization), AISO Score
Related (Information Gain): AISO Score, Capsule format
Related (LLMO): AISO Score, GEO (Generative Engine Optimization), Capsule format
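The capsule-format thresholds in this section (40-60 words, target keyword within the first 20 words) are easy to check mechanically. The sketch below is one possible checker, assuming that words are whitespace-separated tokens and that the keyword match is a case-insensitive substring test. The function name check_capsule and the sample text are hypothetical.

```python
import re


def check_capsule(answer, keyword, min_words=40, max_words=60, keyword_window=20):
    """Check an answer block against the capsule-format thresholds."""
    words = re.findall(r"\S+", answer)  # whitespace-separated tokens
    opening = " ".join(words[:keyword_window]).lower()
    return {
        "word_count": len(words),
        "length_ok": min_words <= len(words) <= max_words,
        "keyword_early": keyword.lower() in opening,
    }


# Hypothetical capsule text, for illustration only.
sample = (
    "The capsule format is a short answer-first block placed directly under a "
    "question-phrased heading. It states the answer in the opening sentence, "
    "stands alone without surrounding context, and stays between forty and "
    "sixty words so that an AI platform can lift it whole as a citation."
)

report = check_capsule(sample, "capsule format")
print(report)
```

A check like this covers only the measurable parts of the format; whether the answer truly stands alone still needs a human read.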
Analytics & Compliance
- Consent Mode v2
- Consent Mode v2 is Google's framework for adjusting analytics and advertising tags based on user consent choices. It uses four parameters (ad_storage, analytics_storage, ad_user_data, ad_personalization) that sites typically default to denied for visitors in EU/CH jurisdictions and update to granted after opt-in. It is a common way to implement the consent obligations that arise under GDPR and the Swiss nDSG.
- Core Web Vitals
- Core Web Vitals are Google's user-experience metrics: Largest Contentful Paint (LCP ≤2.5s), Interaction to Next Paint (INP ≤200ms), and Cumulative Layout Shift (CLS ≤0.1). They are a direct Google Search ranking factor and also correlate with AI citation rates — fast, stable pages are preferred sources.
Related (Consent Mode v2): AI citation
Related (Core Web Vitals): AISO Score
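The Core Web Vitals thresholds listed in this section can be applied as a simple pass/fail check. This is a minimal sketch, assuming the measurements are already available as numbers; the metric keys (lcp_s, inp_ms, cls), the function name, and the sample values are hypothetical.

```python
# Upper bounds for a "good" rating, taken from the Core Web Vitals entry:
# LCP in seconds, INP in milliseconds, CLS as a unitless score.
GOOD_THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}


def passes_core_web_vitals(metrics):
    """Return a pass/fail flag per metric plus an overall verdict."""
    results = {
        name: metrics[name] <= limit for name, limit in GOOD_THRESHOLDS.items()
    }
    results["all_good"] = all(results.values())
    return results


# Hypothetical measurements for one page.
report = passes_core_web_vitals({"lcp_s": 2.1, "inp_ms": 240, "cls": 0.05})
print(report)
```

In this sample the page meets the LCP and CLS limits but misses the INP limit, so the overall verdict is a fail.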
Ready to apply these concepts to your site?
A free AISO Score scan shows you in 30 seconds how citable your website is across AI platforms.