As large language models (LLMs) like ChatGPT, Claude, Gemini, and others become increasingly central to how users search for and consume content, website owners face a new question:
How can I control how AI models use my website’s content?
Meet `llms.txt`, a proposed standard that aims to give website owners fine-grained control over how their content is accessed, indexed, cited, or even trained on by LLMs.
## What is llms.txt?

`llms.txt` is a Markdown-based file (not to be confused with `robots.txt`) proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024. It is designed specifically to:
- Help LLMs understand what parts of your site are most relevant
- Guide how AI models cite, use, or train on your content
- Create a machine-readable index for documentation-heavy sites (e.g., API docs, developer portals, legal info)
Unlike `robots.txt`, it doesn't just say "don't crawl"; it says "here's what matters" and "here's how you may or may not use it."
## Why Was It Created?

Search engines like Google rely on structured crawling governed by `robots.txt`. But LLMs operate differently: they draw on multiple data sources, may use third-party scrapers, and lack a consistent standard for respecting content boundaries. `llms.txt` is a response to that gap: a signal of intent from publishers.
## Example llms.txt Format

Here's an example format:
```
# My Website

> A concise description of what this site is about

## Core Documentation

- [Getting Started Guide](https://example.com/start)
- [API Reference](https://example.com/api)

## Optional Extras

- [Tutorials](https://example.com/tutorials)
```
Or you can simply take a look at ours.
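Because the file is plain Markdown, it is easy for a crawler or agent to consume. As a minimal sketch (the URL is a placeholder, and real consumers may parse far more carefully), fetching a site's `llms.txt` and extracting its curated links needs only the Python standard library:

```python
import re
import urllib.request

# Placeholder URL -- point this at any site that publishes the file.
URL = "https://example.com/llms.txt"

# Fetch the raw Markdown.
with urllib.request.urlopen(URL, timeout=10) as resp:
    text = resp.read().decode("utf-8")

# Pull out the [title](url) links that make up the curated index.
links = re.findall(r"\[([^\]]+)\]\((https?://[^)\s]+)\)", text)

for title, url in links:
    print(f"{title}: {url}")
```

Each match is a (title, URL) pair: the curated index the file exists to expose.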
## Who's Using It?

While still in its early stages, `llms.txt` is already gaining traction:

- Mintlify has adopted it for clean technical documentation.
- Tools like llmstxt.directory are indexing early adopters.
- Yoast SEO and similar platforms are exploring native support.
- Reddit's r/LLMDevs community is actively discussing implementation.
## Does It Actually Work?

As of mid-2025, the major LLM providers (OpenAI, Google, Anthropic, Meta) do not yet enforce `llms.txt`, nor do they explicitly support it. That said, its presence:

- Sends a clear legal and ethical signal
- Can serve as documentation in future disputes
- Might be respected by scrapers or open-source AI tools earlier than by the big players

It's not enforcement; it's preparation.
## llms.txt vs robots.txt

| Feature | robots.txt | llms.txt |
|---|---|---|
| Target audience | Crawlers/search engines | LLMs (ChatGPT, Claude, etc.) |
| Format | Plaintext | Markdown with links + metadata |
| Purpose | Control indexing | Guide AI citation, summarization, training |
| Current adoption | Universal | Experimental |
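For a concrete contrast: a `robots.txt` entry can only allow or disallow crawling of paths. A minimal example addressing an AI crawler (GPTBot is the user agent OpenAI's crawler identifies itself with; the path is illustrative):

```
User-agent: GPTBot
Disallow: /private/
```

It says nothing about which pages matter or how content may be cited, which is exactly the gap `llms.txt` tries to fill.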
## Should You Use It?

Yes, if you:

- Own a website with valuable content (docs, articles, datasets)
- Want to control how your site is used in AI responses
- Aim to future-proof your rights as AI usage grows

You can place the file at https://yourdomain.com/llms.txt and begin shaping your visibility in the age of generative AI.
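If you would rather generate the file than hand-write it, here is a minimal sketch in Python; the site name, sections, and URLs are placeholders carried over from the example above, not a required schema:

```python
from pathlib import Path

# Placeholder inventory -- swap in your own site name, summary, and pages.
SITE_NAME = "My Website"
SUMMARY = "A concise description of what this site is about"
SECTIONS = {
    "Core Documentation": [
        ("Getting Started Guide", "https://example.com/start"),
        ("API Reference", "https://example.com/api"),
    ],
    "Optional Extras": [
        ("Tutorials", "https://example.com/tutorials"),
    ],
}

# Assemble the Markdown: H1 title, blockquote summary, then H2 sections
# whose bullet items link to the pages an LLM should care about.
lines = [f"# {SITE_NAME}", "", f"> {SUMMARY}", ""]
for heading, pages in SECTIONS.items():
    lines.append(f"## {heading}")
    lines.extend(f"- [{title}]({url})" for title, url in pages)
    lines.append("")

# Write it into your web server's document root so it is served
# at https://yourdomain.com/llms.txt.
Path("llms.txt").write_text("\n".join(lines), encoding="utf-8")
```

Any static-site build step can do the equivalent; the only hard requirement is that the output is plain Markdown reachable at the root path.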
## Further Reading and Tools

- Official proposal by Answer.AI
- Adoption analysis on Medium
- Live index of participating sites: llmstxt.directory
- Reddit developer thread: r/LLMDevs
## Final Thoughts

`llms.txt` may not change how AI interacts with your site overnight, but like `robots.txt` in its early days, it sets the foundation for future rules of engagement.
If you’re building with content, you should be protecting it too.
Need help writing one for your website? Feel free to contact me. I’m happy to help.