Robots.txt Generator

Generate a robots.txt file with your sitemap URL, blocked directories and selective bot blocking. Includes options to block or allow AI crawlers.


What Is robots.txt

Robots.txt is a plain-text file at the root of your website (example.com/robots.txt) that tells web crawlers which pages or directories they may or may not access. Compliance is voluntary: well-behaved bots respect it, but malicious scrapers may ignore it. Crawlers such as Googlebot fetch it before requesting other URLs on your site, so its rules apply from the first page they crawl.
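As a minimal sketch of how a compliant crawler reads the file, the snippet below feeds a small robots.txt to Python's standard urllib.robotparser module. The /private/ path, the sitemap URL, and the ExampleBot name are placeholders chosen for illustration, not output from this tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and URLs are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL against the rules before fetching it.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")
print(blocked)  # False
print(allowed)  # True
```

Nothing here enforces the rules; the check only happens if the crawler chooses to run it.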

AI Crawler Considerations

Since 2023, several AI companies have deployed crawlers that collect web content for training large language models. OpenAI's GPTBot and Anthropic's ClaudeBot are among the best known, along with Common Crawl's CCBot, whose public dataset is widely used as training data. You can block each one individually by adding Disallow: / under its own User-agent line. Because rules apply only to the user agent they name, this does not affect Google or Bing indexing.
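The per-bot blocking described above could be generated along these lines. This is a sketch, not this tool's actual implementation: block_bots is a hypothetical helper, and the bot list contains only the three user-agent tokens named in the text.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]


def block_bots(bots):
    """Return robots.txt text with one 'Disallow: /' group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in bots]
    return "\n".join(groups)


robots_txt = block_bots(AI_BOTS)
print(robots_txt)

# Check how the generated rules are interpreted.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://example.com/article.html"
print(parser.can_fetch("GPTBot", page))     # False: its group disallows everything
print(parser.can_fetch("Googlebot", page))  # True: no group names Googlebot
```

Each bot gets its own group, so removing one entry from the list re-allows that crawler without touching the others.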

Frequently Asked Questions

Does robots.txt stop Google from indexing a page?
Not entirely. Disallowing Googlebot from a URL prevents crawling, but Google can still index a page it has never crawled if other sites link to it; it just won't have read the content. To prevent indexing reliably, use a noindex meta tag or an X-Robots-Tag header on the pages themselves, and leave those pages crawlable, because a crawler that is blocked by robots.txt never sees the noindex. Robots.txt is for crawl budget management, not indexing prevention.
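One way to send the X-Robots-Tag header is from the application itself. The sketch below assumes a Python WSGI application; noindex_app and the page body are hypothetical, and on other stacks the same header would normally be set in the web server or framework configuration.

```python
from wsgiref.util import setup_testing_defaults


def noindex_app(environ, start_response):
    """Tiny WSGI app that serves one page marked as noindex."""
    body = b"<html><body>Internal report</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Header equivalent of <meta name="robots" content="noindex">.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly (no server needed) and inspect the response headers.
captured = {}


def record_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)  # fills in a minimal fake request
response_body = b"".join(noindex_app(environ, record_response))
print(captured["headers"]["X-Robots-Tag"])  # noindex
```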
Should I block AI crawlers?
That depends on your goals. Blocking AI crawlers asks compliant bots not to collect your content for training language models; it does not remove content that has already been collected, and bots that ignore robots.txt are unaffected. Some publishers block them to protect the commercial value of their content. Others allow them because they feel it extends their content's indirect reach. Neither choice affects your SEO with traditional search engines such as Google or Bing.
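Which directories should I block?
At minimum: /admin/, /login/, /wp-admin/ (if you run WordPress), /private/, any staging or development directories, and /cart/ and /checkout/ (on e-commerce sites). These pages have no SEO value, and letting bots crawl them wastes your crawl budget. Robots.txt is publicly readable and is not access control, so protect sensitive areas with authentication as well.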
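Turning that directory list into a file might look like the sketch below. build_robots is a hypothetical helper and the sitemap URL is a placeholder; site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Directories from the list above; adjust to match your own site.
BLOCKED_DIRS = ["/admin/", "/login/", "/wp-admin/", "/private/", "/cart/", "/checkout/"]


def build_robots(blocked_dirs, sitemap_url=None):
    """Return robots.txt text that blocks the given paths for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked_dirs]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots(BLOCKED_DIRS, "https://example.com/sitemap.xml")
print(robots_txt)

# Sanity-check the result the way a compliant crawler would read it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("ExampleBot", "https://example.com/checkout/step-1"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/products/widget"))  # True
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

The trailing slash matters: Disallow: /cart/ blocks everything under that directory but leaves a path such as /cartoons untouched.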
What does Crawl-delay do?
Crawl-delay asks a bot to wait a specified number of seconds between requests. Crawl-delay: 1 means wait 1 second between page fetches. This reduces server load from aggressive crawlers. Crawl-delay is a nonstandard directive and support varies: Bing honors it, but Google ignores it. Googlebot adjusts its crawl rate automatically based on how your server responds, and the crawl rate limiter that Search Console once offered was retired in early 2024.
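From the crawler's side, honoring Crawl-delay can be sketched as follows. ExampleBot, the crawl helper, and the stand-in fetch and sleep functions are assumptions made for illustration; a real crawler would pass a function that actually downloads each URL.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a crawler called ExampleBot.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 1
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("ExampleBot")  # 1
other = parser.crawl_delay("OtherBot")    # None: no group applies to this bot
print(delay, other)


def crawl(urls, fetch, delay_seconds, sleep=time.sleep):
    """Fetch URLs in order, pausing delay_seconds between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0 and delay_seconds:
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages


# Demo with stand-ins so nothing is downloaded and nothing actually sleeps.
pauses = []
pages = crawl(["/a", "/b", "/c"], fetch=str.upper, delay_seconds=delay, sleep=pauses.append)
print(pages)   # ['/A', '/B', '/C']
print(pauses)  # [1, 1]
```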
Where should robots.txt be placed?
Always at the root of the host: example.com/robots.txt. Each subdomain needs its own file, for example blog.example.com/robots.txt; a robots.txt on the root domain does not apply to subdomains. Place the file in your web server's document root so that it is served at /robots.txt.
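To see which robots.txt governs a given page, the location rule can be expressed as a small function. robots_url is a hypothetical helper name, and the URLs are examples only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/shop/item?id=7"))
# https://example.com/robots.txt
print(robots_url("https://blog.example.com/2024/post.html"))
# https://blog.example.com/robots.txt
```

The two results differ because each host is treated separately, which is why a subdomain needs its own file.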