
llms.txt: Complete Guide 2026 (Make Your Site Readable by ChatGPT, Claude & Perplexity)

llms.txt is a Markdown file at your site root that tells LLMs what to read first. Complete guide: exact format, step-by-step tutorial, SaaS examples, and the honest state of adoption in 2026.

Anas R.


Definition

llms.txt is a plain-text file in Markdown format, placed at the root of a website (modeled after robots.txt), that gives large language models (LLMs) a structured, curated overview of a site's content. It lists key pages with their URLs and a short description so that ChatGPT, Claude, or Perplexity β€” when they access your site β€” immediately understand what to read first. Proposed by Jeremy Howard (fast.ai / Answer.AI) on September 3, 2024, it is not yet an official W3C standard but is gaining steady traction across the AI tooling ecosystem.

When a user asks Perplexity or Claude a question in web-search mode, the engine sometimes crawls your site in real time. If it hits 200 HTML pages with no machine-readable hierarchy, it picks at random β€” or moves on to a competitor. The llms.txt file is your answer to that problem: a clean, Markdown-formatted index that any LLM can ingest in seconds.

This guide covers everything: the origin of the standard, its exact format, a step-by-step tutorial with a complete real-world SaaS example, 2026 best practices, and β€” most importantly β€” an honest assessment of adoption today (spoiler: it's more nuanced than most articles will tell you). If you're already working on your GEO strategy to get cited by AI engines, llms.txt is a complementary lever worth understanding.

This article is for technical SEOs, SaaS founders, developers, and B2B growth practitioners who want to understand the topic without the hype β€” and with enough detail to decide whether implementation is worth their time.

Origin of the standard: Jeremy Howard, September 2024

The llms.txt file was proposed by Jeremy Howard, researcher and entrepreneur best known for co-founding fast.ai β€” the organization that democratized deep learning education β€” and Answer.AI, an applied AI research lab. The proposal was published on September 3, 2024 at llmstxt.org.

The starting observation is straightforward: LLMs have a limited context window. When a model browses a website to answer a question, it cannot read thousands of HTML pages bloated with navigation menus, JavaScript, ads, and boilerplate markup. It needs a stripped-down, hierarchically organized, readable version β€” exactly what llms.txt provides.

The first reference implementation came from the FastHTML project (also from Answer.AI), which served as the canonical example for the specification. Since then, the convention has been adopted by nbdev, fast.ai projects, Vercel's documentation team, and a growing number of developer-focused technical documentation sites.

One important clarification: llms.txt is not endorsed by the W3C, the IETF, or any other standards body. It is a convention proposed by a respected expert, gaining traction through organic community adoption β€” much the same way robots.txt spread through de facto adoption for decades before being formally standardized (RFC 9309, 2022).

llms.txt vs robots.txt vs sitemap.xml: what's the difference?

These three files coexist at your site root and serve complementary purposes. Conflating them is the most common mistake in articles on this topic.

File        | Logic       | Target audience                           | Message sent
robots.txt  | Restrictive | All bots (Googlebot, GPTBot, ClaudeBot…)  | "Do not go here"
sitemap.xml | Inventory   | Classic search-engine crawlers            | "Here are all my pages"
llms.txt    | Editorial   | Large language models (LLMs)              | "Here's what actually matters"

The fundamental distinction of llms.txt is its positive, editorial logic: you are not telling bots what to avoid, you are signaling what you consider most relevant. It is curation, not restriction.

An analogy: if your website were a library, robots.txt would be the "Staff Only" sign, sitemap.xml would be the exhaustive catalog of every book in the stacks, and llms.txt would be the handwritten card on the front desk: "If you read one thing, read these."

All three files are complementary and non-substitutable. llms.txt does not replace robots.txt for controlling AI crawler access. If you want to block GPTBot, ClaudeBot, or PerplexityBot from your site, robots.txt is the correct place to do that β€” all major AI crawlers honor its directives. llms.txt carries no access-control authority.
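For example, blocking AI crawlers is done with standard robots.txt directives, using the user-agent tokens the vendors themselves document (GPTBot, ClaudeBot, PerplexityBot):

```text
# robots.txt β€” access control lives here, not in llms.txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Nothing you write in llms.txt can override or substitute for these directives.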

Exact format of the llms.txt file

The specification defines a precise Markdown structure with sections in a required order. Here is the anatomy of a compliant file.

Required structure

  • H1 β€” The project or site name (required, exactly one occurrence)
  • Blockquote β€” A short summary (a few lines) describing the project, its use cases, and its target audience (optional but strongly recommended)
  • Detail paragraphs β€” Additional contextual information, without subheadings
  • H2 sections with link lists β€” Each H2 defines a category of resources. Each list item is a Markdown link [title](url) with an optional note after a : separator
  • "Optional" section β€” An H2 literally named "Optional", for secondary resources that LLMs can skip when context is short

Minimal example compliant with the specification

# My SaaS Project

> B2B SaaS platform for [domain] management. Built for teams of 10 to 500.
> Founded in 2021, headquartered in San Francisco.

This site documents product features, integration guides, and customer case studies.
Pricing information and terms of service are also available.

## Docs

- [Quick Start Guide](https://example.com/docs/getting-started): Set up and configure the platform in under 10 minutes.
- [API Reference](https://example.com/docs/api): Complete documentation for all REST endpoints.
- [Webhooks](https://example.com/docs/webhooks): Trigger external actions from platform events.

## Blog

- [How to automate X with our tool](https://example.com/blog/automate-x): Practical guide with code examples.
- [Case study: Customer Y](https://example.com/blog/case-study-y): Measurable ROI over 6 months of usage.

## Optional

- [Terms of Service](https://example.com/terms)
- [Privacy Policy](https://example.com/privacy)
- [Cookie Policy](https://example.com/cookies)

Syntax rules to follow

  • The file must be accessible at https://yourdomain.com/llms.txt
  • UTF-8 encoding, no BOM
  • Standard Markdown links: [text](url)
  • Descriptions after URLs are separated by : (not by a new line)
  • No HTML in the file β€” pure Markdown only
  • No officially defined size limit, but the spec recommends a concise file (exhaustiveness is reserved for llms-full.txt)

llms-full.txt: the long-form variant

The specification includes an optional variant: llms-full.txt, served at /llms-full.txt. Its role is different from llms.txt: where the first is a curated index with links, the second contains the full text of important pages, already extracted and cleaned.

The logic: when an LLM can access llms-full.txt, it no longer needs to crawl each URL individually to read the content. Everything is already there in a single text file stripped of HTML, navigation, and noise. This is especially valuable for technical documentation sites where developers want AI coding tools (Cursor, Continue, Aider) to understand the entire project without making dozens of HTTP requests.

Anthropic practices what it preaches here: its documentation site publishes an llms-full.txt that is a prime example of the format done right. Vercel's developer documentation is another frequently cited real-world example.

llms.txt vs llms-full.txt: which one should you create?

  • llms.txt: create this first. Lightweight, easy to maintain, readable by any AI client.
  • llms-full.txt: relevant if you publish dense technical documentation, a public knowledge base, or if you explicitly target developers using AI-powered IDEs. It can grow large and become difficult to maintain manually β€” tools like llms_txt2ctx (Python) or the llms-txt npm package can auto-generate it from your content.

For a standard SaaS marketing site, llms.txt alone is sufficient given the current state of adoption.
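As a rough sketch of the assembly step, here is how an llms-full.txt can be composed once page texts have been extracted. `build_llms_full` is a hypothetical helper, not the llms_txt2ctx tool; fetching the URLs and stripping their HTML down to Markdown is assumed to happen upstream and is not shown:

```python
def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate already-extracted page texts into one llms-full.txt string.

    Each entry in `pages` is (title, url, markdown_text). The result is a
    single plain-text document an LLM can ingest in one request.
    """
    parts = [f"# {site_name}\n"]
    for title, url, text in pages:
        # One H2 section per page, with its source URL kept as plain text
        parts.append(f"\n## {title}\n\nSource: {url}\n\n{text.strip()}\n")
    return "".join(parts)
```

Regenerating the file from your content pipeline on each deploy avoids the manual-maintenance problem mentioned above.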

Tutorial: creating your llms.txt for a SaaS (complete example)

Here is a step-by-step tutorial for building an llms.txt file suited to a B2B SaaS. The example is modeled on what an AI chatbot platform like Heeya would produce.

Step 1 β€” Inventory your high-value pages

Before opening an editor, take stock. For a SaaS, priority pages typically include:

  • The homepage (with a precise description of what the product does)
  • Solution / feature pages (one per main offering)
  • Documentation or API reference pages (if public)
  • 3 to 5 cornerstone blog articles in your core topic cluster
  • The pricing page
  • Case studies or customer pages (if public)

Do not list everything. llms.txt is curation, not a duplicate of your sitemap. Cap yourself at 20–40 URLs to keep the file usable within a short context window.

Step 2 β€” Write each link description

For each URL, write a 1–2 sentence description that answers: "If an LLM reads only this link and its description, does it understand what it will find on this page?"

Weak description: Our blog
Strong description: Complete guide to RAG (Retrieval-Augmented Generation): definition, architecture, and enterprise use cases for B2B SaaS.

Step 3 β€” Write the file

Here is a complete, realistic example for an AI chatbot SaaS β€” the kind of file Heeya uses as reference for its English-language content:

# Heeya β€” AI Chatbot Platform with RAG

> Heeya is a SaaS platform that lets businesses build custom AI chatbots
> powered by their own documents (RAG β€” Retrieval-Augmented Generation).
> Target audience: SMBs and B2B SaaS teams who want to automate customer support,
> streamline employee onboarding, or build an internal knowledge base.
> Built and supported in English and French.

The platform supports PDF, Word, PPTX, and TXT file ingestion, URL crawling,
and deploying a chatbot on any website via a JavaScript widget.
It includes an analytics dashboard, conversation history, and Stripe-based
subscription management.

## Solutions

- [AI Chatbot for Websites](https://heeya.fr/en/solutions/chatbot): Deploy a custom chatbot on your site. 5-minute setup, JS widget embeddable on any CMS or framework.
- [RAG Expertise](https://heeya.fr/en/solutions/ai-rag-expertise): End-to-end RAG architecture: document ingestion, embedding, vector search, augmented generation. Enterprise use cases included.

## Pricing

- [Heeya Plans](https://heeya.fr/en/pricing): Free, Pro, and Enterprise tiers. Feature comparison and usage limits. Monthly and annual billing available.

## Guides & Blog

- [What is RAG? Business Guide for Non-Technical Teams](https://heeya.fr/en/blog/what-is-rag-business-guide): Plain-English explanation of Retrieval-Augmented Generation with SaaS architecture diagrams and ROI examples.
- [llms.txt: Complete Guide 2026](https://heeya.fr/en/blog/llms-txt-complete-guide-2026): This file β€” its format, real adoption state, and how to implement it for a SaaS.
- [Generative Engine Optimization (GEO) 2026](https://heeya.fr/en/blog/generative-engine-optimization-geo-2026): How to get cited by ChatGPT, Perplexity, and Claude. GEO strategy for SaaS content teams.
- [AEO vs SEO 2026](https://heeya.fr/en/blog/answer-engine-optimization-aeo-vs-seo-2026): Differences between Answer Engine Optimization and traditional SEO. Practical framework for both.
- [How to Get Cited by ChatGPT Search](https://heeya.fr/en/blog/get-cited-by-chatgpt-search-checklist-2026): Actionable checklist for increasing the odds your content is surfaced in AI-generated answers.
- [Schema.org FAQ & HowTo for Google AI Overviews](https://heeya.fr/en/blog/schema-org-faq-howto-google-ai-overviews): Structured data implementation guide for AI Overviews and featured snippets.

## Optional

- [Terms of Service](https://heeya.fr/en/terms)
- [Privacy Policy](https://heeya.fr/en/privacy)
- [Cookie Policy](https://heeya.fr/en/cookies)

Step 4 β€” Deploy the file

The file must be served as plain text (Content-Type: text/plain or text/markdown) at the URL /llms.txt. Depending on your stack:

  • Static sites (Astro, Next.js, Hugo, Eleventy): place the file in your public/ or static/ folder. It will be served automatically at the root path.
  • FastAPI / Python: add a GET /llms.txt route that returns the content with the correct Content-Type, or mount the file as a static asset using StaticFiles.
  • WordPress: the "Website LLMs.txt" plugin (30,000+ active installs as of 2026) auto-generates the file from your content structure.
  • Vercel / Netlify: place the file in your public/ directory; both platforms serve root-level files with correct MIME types out of the box.
  • Nginx: place the file in your root directory and verify the MIME type is correctly mapped in mime.types.

Then verify the URL is publicly accessible: curl -I https://yourdomain.com/llms.txt should return 200 OK with a text/plain or text/markdown content type.

Step 5 β€” Reference llms.txt from your site

The specification recommends adding a <link> tag in the <head> of your site to signal the file's existence:

<link rel="llms-txt" href="/llms.txt" type="text/plain" />

This is not yet standardized at the browser or engine level, but it is a forward-looking best practice that anticipates how the standard may evolve β€” similar to how early adopters of rel="canonical" benefited before it was universally recognized.

2026 best practices: what to include, what to leave out

What to include

  • Your high-value pages: solutions, core features, cornerstone guides, case studies.
  • Precise descriptions: the summary should answer "what will I find on this page?" in one sentence.
  • Your pricing page: often the first thing an LLM looks for when a user asks "how much does [your product] cost?"
  • Your public API documentation: developers increasingly use AI coding assistants (Cursor, GitHub Copilot, Aider) that will read this file.
  • A clear blockquote description covering what you do, who you serve, and what makes you different.

What to leave out

  • Technical dead-end pages: login, payment, redirect, admin, and dashboard pages carry no informational value for an LLM.
  • Generic pages: a bare contact page tells an LLM nothing meaningful about your business.
  • Thin-content pages: if a page would not help a reader understand your product, it has no place in llms.txt.
  • Confidential or sensitive information: llms.txt is public and indexable. Do not include anything you would not want surfaced in an AI-generated answer.
  • Hundreds of URLs: the entire value of this file is in the curation. A 300-link file is no more useful than a sitemap.xml and far less readable.

Ongoing maintenance

Schedule a quarterly review. Every major new piece of cornerstone content or significant product feature deserves a slot in the file. URLs that now redirect (301) or return 404 should be updated or removed. Treat llms.txt with the same discipline you give your sitemap: it falls out of date without active attention.

[Diagram: the llms.txt file structure in Markdown, its differences from robots.txt and sitemap.xml, and how an LLM reads it]

Who actually reads llms.txt? Honest adoption state in 2026

This is the question everyone asks, and the one most articles answer too optimistically. Here are the documented facts as of May 2026.

The major AI engines do not read it systematically

As of May 2026, none of the major players have confirmed systematic integration of llms.txt into their crawl or inference pipelines:

  • OpenAI (ChatGPT / GPTBot): no official announcement of llms.txt support.
  • Google (Gemini / Googlebot): Google Search Advocate John Mueller stated publicly that "no AI system currently uses llms.txt." Google AI Overviews relies on its standard indexing pipeline.
  • Anthropic (Claude / ClaudeBot): Anthropic publishes its own llms-full.txt on its documentation site, which validates the format by example. However, Anthropic has not confirmed that Claude automatically reads the llms.txt files of sites it accesses during web search β€” at least not in any documented, systematic way.
  • Perplexity (PerplexityBot): no official confirmation of llms.txt reading in its pipeline.
  • Microsoft (Bing / Copilot): no public statement on llms.txt integration.

What server logs actually show is primarily on-demand, real-time fetches β€” a user explicitly asking Claude or Perplexity to "read the llms.txt from this site," not automatic integration into these engines' standard crawl pipelines.

Developer tools actively read it

This is where adoption is real and documented:

  • Cursor (AI-native IDE): explicitly reads llms.txt and llms-full.txt to contextualize code assistance.
  • Continue (VS Code / JetBrains extension): same behavior, with explicit llms.txt support in its context management.
  • Aider (CLI AI coding tool): native llms.txt support documented in its configuration.
  • RAG frameworks (LangChain, LlamaIndex, Haystack): connectors allow ingesting llms.txt as an index source for a RAG pipeline β€” making it a structured entry point for AI-powered knowledge bases.

The practical consequence: if your customers or prospects are developers using these tools, your llms.txt has immediate, concrete value. If your audience is exclusively end-users of ChatGPT or Gemini, the current impact is more limited and uncertain.

A reasonable projection

The standard is gaining legitimacy. Anthropic publishes one for their own docs. Popular frameworks integrate it. A contributor community maintains the specification at llmstxt.org. The robots.txt analogy is instructive: that file was not universally respected by crawlers immediately after its creation either. Implementing llms.txt today is anticipating β€” not capitalizing on a lever that already works at scale.

The right posture: implement it if your audience includes developers, if you publish technical documentation, or if the creation cost is low for your team. Keep it current so you are ready when adoption accelerates β€” and it likely will.

B2B SaaS use cases: why it matters right now

Beyond the question of adoption by consumer-facing AI engines, llms.txt has concrete B2B SaaS use cases that apply today.

1. Prospects doing AI-assisted due diligence

A growing share of B2B buyers use Claude or ChatGPT in research mode to evaluate solutions before a purchase. They ask: "Explain what Heeya does and how it compares to [competitor]." If the model accesses your site in real time, a well-structured llms.txt allows it to build a response faithful to your positioning β€” rather than a rough synthesis of disorganized HTML pages. This is your narrative, curated and served to the model that will represent you.

2. Developers integrating your API

If you expose a public API, your developer users work with Cursor, Copilot, or Aider. These tools read llms.txt to understand your API surface. A file that points cleanly to your reference documentation, integration guides, and code examples directly reduces technical onboarding time. Think of it as self-service developer support that scales without your team doing anything extra.

3. Consistency of your AI "knowledge graph"

Large language models build a representation of your brand from everything they have ingested. A well-written llms.txt is a structured source that feeds that representation. It does not guarantee what a model will say about you, but it increases the probability that key facts β€” what you do, for whom, at what price point β€” are available and correctly articulated when a model forms its answer. This connects directly to broader GEO (Generative Engine Optimization) strategy.

4. Internal RAG pipelines

If your own team uses an internal RAG tool to work with your knowledge base β€” product documentation, sales playbooks, technical specs β€” an llms.txt on your public site gives that pipeline a clean entry point for ingestion. Your internal RAG agents can access your public content in a structured way, without decoding HTML. This is especially relevant if you're exploring RAG architecture for enterprise use.

llms.txt + RAG on the publisher side: the link to your chatbot

There is a direct connection between the logic of llms.txt and what a RAG system does on the publisher side β€” and it's an angle few guides explore.

llms.txt as an ingestion source for your own chatbot

If you deploy an AI chatbot on your site, you need a list of your important pages with their descriptions β€” exactly what llms.txt contains. In practice, your llms.txt can serve as the source file for your ingestion pipeline: the system parses the URLs and descriptions, crawls the corresponding pages, chunks them, and indexes them into your vector database. A single update to llms.txt triggers a refresh of the chatbot's knowledge base.
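A sketch of that first parsing step β€” `parse_llms_txt` is an illustrative helper that assumes the link-list syntax shown earlier in this guide; the downstream crawl, chunk, and index stages are not shown:

```python
import re

# Matches "- [Title](url)" with an optional ": description" suffix
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)

def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Turn an llms.txt file into {section: [{title, url, description}, ...]}."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                sections[current].append({
                    "title": m["title"],
                    "url": m["url"],
                    "description": m["desc"] or "",
                })
    return sections
```

The resulting records (URL plus description) are exactly what an ingestion pipeline needs to decide what to crawl and how to label each chunk.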

This is an economy of design: you maintain one canonical list of your important resources, which serves both external LLMs and your own internal chatbot. Two use cases, one file to maintain.

Your RAG knowledge base as the best llms-full.txt

Conversely, if you publish your knowledge base in public-facing form (as some SaaS products do for their help centers and documentation), its content is structured, factual, and expertise-dense β€” exactly what an llms-full.txt should contain. The two artifacts converge: a well-built RAG knowledge base is excellent source material for an llms-full.txt.

This dual use β€” powering your chatbot and signaling your expertise to external LLMs β€” is one of the strongest arguments for investing in a structured knowledge base. Our guide to RAG for business teams explains how to build that architecture, and our article on Answer Engine Optimization vs SEO puts llms.txt in its broader strategic context.

Build your knowledge base and RAG chatbot in minutes.

Heeya automatically indexes your documents and web pages to power a custom AI chatbot β€” and the same content can feed your llms.txt.

Start for free View plans

FAQ β€” llms.txt

What is llms.txt?

llms.txt is a Markdown file placed at the root of a website (at the URL /llms.txt) that gives large language models (LLMs) a structured, prioritized overview of a site's content. It lists key pages with their URLs and short descriptions, on the same model as robots.txt for search engine crawlers β€” but with a positive, editorial logic: it tells AI systems what to read, not what to avoid. Proposed by Jeremy Howard (fast.ai) on September 3, 2024, it is not yet an official standard but is gaining adoption across the AI tooling ecosystem.

What is the difference between llms.txt and robots.txt?

robots.txt is restrictive: it tells bots what not to crawl. llms.txt is editorial and positive: it tells LLMs what is worth reading first. robots.txt addresses all bots (Googlebot, GPTBot, ClaudeBot). llms.txt targets language models specifically. The two files are complementary: if you want to block an AI crawler, modify robots.txt β€” crawlers honor robots.txt directives, not llms.txt, for access control.

Do ChatGPT and Google Gemini read llms.txt?

Not systematically or in any documented way, as of May 2026. None of the major AI engines (OpenAI, Google, Anthropic, Perplexity) have officially announced integrating llms.txt into their automatic crawl pipeline. Google's John Mueller stated publicly that no AI system currently uses llms.txt. Developer tools like Cursor, Continue, and Aider do read it actively. On-demand access can occur when a user explicitly asks Claude or Perplexity to fetch a specific llms.txt file.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a compact index: it lists your important pages with their URLs and short descriptions. It's an annotated table of contents. llms-full.txt is the long-form version: it contains the full text of your important pages, already extracted and cleaned, with no HTML. llms-full.txt is particularly useful for technical documentation sites and AI tools that want to ingest your content in a single request. For a standard SaaS marketing site, llms.txt alone is sufficient.

Should you create an llms.txt file in 2026?

Yes, if your audience includes developers using AI IDEs (Cursor, Continue, Aider), if you publish public technical documentation, or if the creation cost is low (a few hours). No, if you expect immediate impact on your visibility in ChatGPT or Google Gemini β€” systematic adoption by those platforms is not yet confirmed. It is an anticipatory investment, not an immediate SEO lever. Prioritize robots.txt, sitemap.xml, JSON-LD structured data, and a GEO content strategy first.

How many URLs should be in llms.txt?

The specification sets no limit. In practice, 20 to 40 URLs is a good target for a SaaS site. The goal is curation, not exhaustiveness: only your highest-value pages belong in llms.txt. A 200-link file is no more useful than a sitemap.xml and loses the editorial prioritization logic that makes the format valuable. Technical pages with no informational value (login, T&Cs, redirects) belong in the "Optional" section or not at all.

Does Anthropic support the llms.txt standard for Claude?

Anthropic publishes its own llms-full.txt on its documentation site (docs.anthropic.com/llms-full.txt), which validates the format by practice. However, Anthropic has not confirmed that Claude automatically reads llms.txt files from sites it accesses during web search. ClaudeBot honors robots.txt directives, but there is no documented evidence that it parses llms.txt systematically in its inference pipeline.

How does llms.txt interact with an enterprise RAG strategy?

llms.txt can serve as an ingestion source for a RAG pipeline: the system parses the URLs and descriptions in the file, crawls the corresponding pages, chunks them, and indexes them into a vector database. This is especially useful for keeping a business chatbot's knowledge base in sync with your public content. A single llms.txt update can trigger an automatic RAG index refresh β€” one file, two uses: signal your content to external LLMs and power your own internal chatbot.


Ready to make your site readable by AI engines?

Start with your llms.txt. Then build a RAG chatbot that turns your content into intelligent answers β€” for your visitors and for the LLMs that represent you in AI-generated search results.

Start building for free View plans
Published on May 5, 2026 by Anas R.

Ready to build your AI assistant?

Join Heeya and transform your customer service with conversational AI.