Tell Crawlers Where They Can and Can’t Go
Your staging site just got indexed by Google. Your /admin panel is showing up in search results. GPTBot is scraping your entire content library for AI training. Each of these problems can be addressed (or prevented) with one small text file at your domain root: robots.txt.
This tool gives you a visual builder with presets instead of making you write the syntax from memory. Block all bots, allow all bots, or block AI crawlers specifically: pick a preset and customize from there.
The AI Bot Blocking Preset
This is the one everyone’s looking for right now. GPTBot, ChatGPT-User, Google-Extended, and other AI-related user agents are collecting web content at scale. The “Block AI Bots” preset generates the user-agent blocks to stop them while keeping Googlebot and Bingbot active for normal search indexing.
Whether blocking AI crawlers is the right move for your site is debatable. But if you want to do it, the syntax needs to be exact, and this tool generates it correctly.
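If you want to sanity-check rules like these before uploading them, here is a minimal sketch using Python’s standard urllib.robotparser module. The rule text is an assumption about what such a preset might produce, not the tool’s verbatim output, and the is_allowed helper and the yourdomain.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical preset output: the bot list and layout are assumptions,
# not the tool's verbatim text.
AI_BLOCK_RULES = """\
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://yourdomain.com/blog/post"  # placeholder URL
    for agent in ("GPTBot", "ChatGPT-User", "Google-Extended", "Googlebot", "Bingbot"):
        print(agent, is_allowed(AI_BLOCK_RULES, agent, page))
```

Under these rules the three AI user agents are refused everywhere, while Googlebot and Bingbot fall through to the catch-all group and stay allowed. Python’s parser is only an approximation of how each real crawler reads the file, so treat it as a quick check rather than a guarantee.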
Common Configurations
New site launch. Start with Allow All for Googlebot, add your sitemap URL, and block /admin, /api, and any staging or test paths.
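WordPress sites. Block /wp-admin/ to save crawl budget (Google doesn’t need to crawl your dashboard). Consider blocking tag and author archive pages that create thin content. Be careful with /wp-includes/: it holds scripts and styles that Google may need to render your pages, so block it only if you are sure nothing public loads from there.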
Staging environments. Block everything with Disallow: / for all user-agents. This helps prevent accidental indexing of unfinished pages. Just remember to remove it before launch: leaving a Disallow: / on production is one of the most common and devastating SEO mistakes.
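The configurations above might look roughly like the rule sets in this sketch, which again uses Python’s standard urllib.robotparser to check a few paths. The CONFIGS names, the can_crawl helper, the /staging path, and the yourdomain.com domain are all placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rule sets. The domain and the /staging path are placeholders.
CONFIGS = {
    "launch": """\
User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /staging

Sitemap: https://yourdomain.com/sitemap.xml
""",
    "wordpress": """\
User-agent: *
Disallow: /wp-admin/
""",
    "staging": """\
User-agent: *
Disallow: /
""",
}


def can_crawl(config: str, path: str, agent: str = "Googlebot") -> bool:
    """Check one path against one of the named rule sets above."""
    parser = RobotFileParser()
    parser.parse(CONFIGS[config].splitlines())
    return parser.can_fetch(agent, "https://yourdomain.com" + path)


if __name__ == "__main__":
    checks = [("launch", "/"), ("launch", "/admin/login"),
              ("wordpress", "/wp-admin/"), ("staging", "/")]
    for config, path in checks:
        print(config, path, can_crawl(config, path))
```

Disallow rules match by path prefix, so Disallow: /api also covers paths such as /api-docs. Add a trailing slash (Disallow: /api/) if you only mean the directory.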
What Robots.txt Can’t Do
It prevents crawling, not indexing. If other sites link to a page you’ve blocked in robots.txt, Google can still show that URL in search results (without a snippet). To prevent indexing, use a noindex meta robots tag, which the Meta Tag Generator can create, and leave the page crawlable: Google has to fetch the page to see the tag.
Also: robots.txt is a polite request, not a firewall. Well-behaved bots (Google, Bing) honor it. Malicious scrapers ignore it entirely. For actual access control, use server-side authentication.
The file goes at yourdomain.com/robots.txt: exactly that location, not a subdirectory. Add your sitemap URL (Sitemap: https://yourdomain.com/sitemap.xml) so search engines that read the file can discover your sitemap.
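As a rough illustration of both points, the sketch below derives the root-level robots.txt address from any page URL and reads the Sitemap lines back out of a rules string. The robots_url and listed_sitemaps helpers are hypothetical names, and site_maps() assumes Python 3.8 or newer.

```python
from typing import List
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves `page_url`."""
    # A leading slash makes urljoin discard the page's own path.
    return urljoin(page_url, "/robots.txt")


def listed_sitemaps(rules: str) -> List[str]:
    """Return the Sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when nothing is listed


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /admin\n\nSitemap: https://yourdomain.com/sitemap.xml\n"
    print(robots_url("https://yourdomain.com/blog/post"))
    print(listed_sitemaps(rules))
```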
Create the sitemap itself with the Sitemap Generator. Everything builds in your browser.