llms.txt

llms.txt is an open convention for a plain-text file placed at a website's root path (/llms.txt) that declares a structured content map for AI language model crawlers, providing titles, descriptions, and URLs of the site's key pages in a format optimised for LLM context windows.

Definition

llms.txt is an informal open standard for a plain-text file hosted at the root of a website at the path /llms.txt, designed to provide a structured, human- and machine-readable content map targeted at large language model (LLM) crawlers and AI-powered discovery systems. The convention was proposed in 2024 by Jeremy Howard, co-founder of fast.ai and a prominent figure in applied deep learning, as an analogue to robots.txt (which governs crawler access permissions) and XML sitemaps (which enumerate page URLs for search engine indexing), but adapted to the specific constraints and requirements of LLM inference workflows.

The core premise is that LLMs processing web content for retrieval-augmented generation (RAG) or direct ingestion face a different information need than traditional web crawlers. They benefit from concise, structured descriptions of what a site contains (including context about the site’s purpose, the nature of each section, and links to key content) in a format that fits efficiently within the token-limited context windows used during crawling or summarisation.

How It Works

The llms.txt file uses a Markdown-based format rather than XML or JSON. A typical file opens with an H1 heading naming the site or project, followed by a blockquote summary paragraph, and then organised sections of Markdown links pointing to the site’s most important pages. Each link can include a brief inline description.
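The layout described above can be illustrated with a short Python sketch that renders such a file from a list of pages. This is a minimal illustration, not a reference implementation: the `Link` class, the `render_llms_txt` function, the `## ` section headings, and the example.com URLs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One linked page (hypothetical helper type for this sketch)."""
    title: str
    url: str
    note: str = ""  # optional brief inline description


def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt-style Markdown document as a string."""
    # H1 title, then a blockquote summary paragraph.
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    # One headed section per group of links (heading level is an assumption).
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            item = f"- [{link.title}]({link.url})"
            if link.note:
                item += f": {link.note}"
            lines.append(item)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder site data, invented for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Reference documentation for a hypothetical product.",
    {
        "Guides": [
            Link("Quickstart", "https://example.com/quickstart",
                 "Install and run in five minutes"),
            Link("API reference", "https://example.com/api"),
        ],
    },
)
print(example)
```

Writing the returned string to a file served at /llms.txt would be the remaining step; how that is done depends on the site's hosting setup.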

The file may also be accompanied by an extended variant at /llms-full.txt, which includes the full text content of key pages rather than just links. This is useful for AI systems that can ingest longer documents in a single pass.

The convention is intentionally simple: no special syntax beyond standard Markdown, no mandatory elements beyond the H1 title, and no required registration or validation step. The specification is maintained at llmstxt.org and is designed to be implementable in minutes by any web publisher. Platforms and frameworks including WordPress (via plugin), Astro, and Next.js have seen community-developed integrations that auto-generate llms.txt from existing site structure.

Unlike robots.txt, which instructs crawlers on access permissions (what they may or may not fetch), llms.txt is purely declarative and informational: it does not grant or restrict access but signals which content the site owner considers most important for AI systems to understand. There is no governing standards body behind it (unlike robots.txt, whose Robots Exclusion Protocol was standardised by the IETF as RFC 9309), and LLM crawler compliance with llms.txt is voluntary and varies by operator.

AI systems and products that have been reported to respect or consider llms.txt include Perplexity AI, various RAG-based research assistants, and some implementations of the OpenAI web browsing tool, though no major LLM provider has formally committed to treating it as a required standard as of 2025.

Where You Encounter It

The llms.txt convention is most commonly discussed at the intersection of SEO, AEO (Answer Engine Optimisation), and technical web publishing communities. It gained significant traction after Jeremy Howard’s initial proposal post in late 2024 was widely shared among developers, web publishers, and AI researchers.

For content-rich websites targeting visibility in AI-powered answer surfaces (including Google AI Overviews, Perplexity AI, ChatGPT’s web-browsing mode, Microsoft Copilot’s cited responses, and similar features), llms.txt represents a low-cost signal of content intent. It supplements rather than replaces existing discoverability mechanisms: structured data via Schema.org (particularly DefinedTerm, FAQPage, and HowTo types), XML sitemaps, and the semantic signals used by the E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) framework all remain the primary mechanisms by which both traditional search engines and AI systems evaluate and rank content.

Documentation and hosting platforms, API providers, and developer tool vendors have been among the earliest adopters, as their audience (developers building AI applications) is particularly receptive to the convention. SaaS product documentation sites, glossary collections, and knowledge bases are also well-suited to the format.

Practical Examples

A contest voting platform with an extensive glossary creates an llms.txt file at https://buyvotescontest.com/llms.txt. The file lists the site’s key glossary entries (SPF record, DKIM, DMARC, email-verified voting, AI Overviews) with brief descriptions and direct URLs. An AI research assistant crawling the site as part of a RAG pipeline for a query about “email authentication for contest platforms” retrieves the llms.txt file, identifies the relevant glossary entries, and fetches their content pages directly rather than attempting to parse the site’s full HTML structure. The result is that the glossary entries are more accurately represented in the AI system’s responses than they would have been if the assistant had attempted to infer site structure from a general crawl.
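The retrieval step in this scenario might look roughly like the following Python sketch, which parses an already-downloaded llms.txt body and selects entries whose title or description shares a keyword with the query. The `Entry` type, `parse_llms_txt`, `find_relevant`, the sample file, and the example.com URLs are all hypothetical; real assistants are likely to use embedding-based matching rather than plain keyword overlap, and fetching the file over HTTP is left out.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    """One link parsed from an llms.txt file (hypothetical type)."""
    section: str
    title: str
    url: str
    note: str


# Matches a Markdown list item of the form "- [title](url): optional note".
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*:\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> list[Entry]:
    """Collect every link item, remembering the section heading it sits under."""
    entries = []
    section = ""
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_RE.match(line)
        if match:
            note = (match["note"] or "").strip()
            entries.append(Entry(section, match["title"], match["url"], note))
    return entries


def find_relevant(entries: list[Entry], query: str) -> list[Entry]:
    """Naive keyword match: keep entries sharing a word (3+ letters) with the query."""
    terms = {t for t in re.findall(r"\w+", query.lower()) if len(t) > 2}
    return [
        e for e in entries
        if any(t in f"{e.title} {e.note}".lower() for t in terms)
    ]


# Invented sample content, standing in for a downloaded llms.txt body.
SAMPLE = """# Example Glossary

> Definitions of email authentication terms.

## Glossary

- [SPF record](https://example.com/glossary/spf): Lists the servers allowed to send email for a domain
- [DKIM](https://example.com/glossary/dkim): Cryptographic signing of outgoing email
- [AI Overviews](https://example.com/glossary/ai-overviews)
"""

entries = parse_llms_txt(SAMPLE)
hits = find_relevant(entries, "email authentication")
for hit in hits:
    print(hit.title, hit.url)
```

The URLs of the selected entries would then be fetched individually, which is the "fetch content pages directly" step in the scenario above.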

A developer building an internal knowledge assistant for a marketing agency implements llms.txt parsing in their RAG pipeline, prioritising pages listed in llms.txt files when multiple pages from the same domain are retrieved for a given query. This gives content-rich publishers that maintain llms.txt files a small but consistent advantage in citation frequency within the assistant’s outputs.
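One way such prioritisation could be implemented is a small score boost applied during re-ranking, sketched below in Python. The `rerank` and `normalise` functions, the boost value of 0.1, and the sample scores are assumptions for illustration; as a simplification, the sketch boosts every listed URL rather than only breaking ties between pages from the same domain.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host plus path so trivial variants compare equal."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def rerank(results: list[tuple[str, float]],
           listed_urls: list[str],
           boost: float = 0.1) -> list[tuple[str, float]]:
    """Sort (url, score) pairs, adding a fixed boost to URLs found in llms.txt.

    The returned tuples keep their original scores; the boost only
    affects ordering. The boost size is an arbitrary assumption.
    """
    listed = {normalise(u) for u in listed_urls}

    def adjusted(item: tuple[str, float]) -> float:
        url, score = item
        return score + (boost if normalise(url) in listed else 0.0)

    return sorted(results, key=adjusted, reverse=True)


# Invented retrieval scores for three pages on one domain.
results = [
    ("https://example.com/blog/post", 0.80),
    ("https://example.com/glossary/dkim/", 0.75),
    ("https://example.com/about", 0.60),
]
listed = ["https://example.com/glossary/dkim"]

ranked = rerank(results, listed)
for url, score in ranked:
    print(url, score)
```

Keeping the boost small means a listed page overtakes only near-equal competitors, which matches the "small but consistent advantage" described above.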

llms.txt operates at the layer of AI crawler communication, complementing the structured semantic vocabulary provided by Schema.org (which signals content type and entity relationships to both search engines and AI systems via JSON-LD) and the content quality signals evaluated by Google under the [E-E-A-T](/glossary/ประสบการณ์ ความเชี่ยวชาญ การมีอำนาจ และความน่าเชื่อถือ) framework and the Helpful Content Update classifier. For maximum AI discoverability, publishers are advised to maintain all three: a valid llms.txt content map, comprehensive Schema.org structured data, and content that meets the E-E-A-T and Helpful Content standards that govern citation eligibility in AI Overviews and similar answer-engine features.