Robots.txt Generator
Build a syntactically correct robots.txt without memorizing the spec. Add user-agent groups (all bots, Googlebot, Bingbot, or AI crawlers like GPTBot and ClaudeBot), stack Allow and Disallow rules per group, set an optional Crawl-delay, and append your sitemap URLs — the file updates live as you type. One-click presets cover the four most requested configurations, including blocking AI training bots, and the validator flags paths that don't start with / before they silently fail in production. Copy the result or download it as robots.txt, ready to upload to your site root.
Your robots.txt
User-agent: *
Disallow:
robots.txt is advisory, not security: well-behaved crawlers follow it, but it does not block access or hide private content. Use authentication for anything sensitive.
How to use the robots.txt generator
- Start from a quick preset (Allow everything, Block everything, Block AI training bots, or WordPress standard) or build from scratch.
- Add a group per crawler you want to address — * matches every bot that doesn't have a more specific group.
- Add Disallow and Allow rows. Every path must start with / — the validator warns you if one doesn't.
- Optionally add Crawl-delay (Bing honors it; Google ignores it) and your sitemap URL(s).
- Copy the output or download robots.txt and upload it to the root of your domain.
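The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the generator's actual implementation: the function names `build_robots_txt` and `validate_path` and the dictionary layout for a group are assumptions made for the example.

```python
def validate_path(path):
    """A rule path is usable only if it starts with '/'."""
    return path.startswith("/")


def build_robots_txt(groups, sitemaps=()):
    """Render groups as robots.txt text and collect validation warnings.

    groups: list of dicts with 'agents' (list of str), 'rules'
    (list of (directive, path) tuples) and an optional 'crawl_delay'.
    This data layout is hypothetical, chosen only for the sketch.
    """
    lines, warnings = [], []
    for group in groups:
        for agent in group["agents"]:
            lines.append(f"User-agent: {agent}")
        for directive, path in group["rules"]:
            # An empty Disallow is valid and means "nothing is disallowed".
            if path and not validate_path(path):
                warnings.append(f"{directive}: {path!r} does not start with '/'")
            lines.append(f"{directive}: {path}".rstrip())
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line between groups
    for url in sitemaps:
        # Sitemap lines sit outside any group and take absolute URLs.
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n", warnings


text, warnings = build_robots_txt(
    [{"agents": ["*"], "rules": [("Disallow", "/admin/"), ("Allow", "/admin/help/")]}],
    sitemaps=["https://www.example.com/sitemap.xml"],
)
print(text)
```

The warnings list plays the role of the validator: a path such as `admin/` is still written out, but it is reported so you can fix it before uploading.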
How robots.txt matching actually works
Each crawler picks exactly one group: the one whose User-agent line matches it most specifically. Googlebot obeys a User-agent: Googlebot group and ignores your User-agent: * group entirely — rules do not cascade between groups, which is the single most common robots.txt mistake. If you block something for everyone, repeat the rule inside every specific group too.
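A rough sketch of that group-selection step, in Python. It simplifies real crawler behavior by assuming that a group token matches when the crawler's name starts with it (case-insensitively) and that the longest matching token wins; `select_group` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (directive, path)


def select_group(groups: Dict[str, List[Rule]], crawler: str) -> Optional[str]:
    """Return the single User-agent token whose group the crawler obeys."""
    name = crawler.lower()
    # Named groups that match this crawler (simplified prefix match).
    matches = [t for t in groups if t != "*" and name.startswith(t.lower())]
    if matches:
        return max(matches, key=len)  # most specific token wins
    # Fall back to the wildcard group only when no named group matches.
    return "*" if "*" in groups else None


groups = {
    "*": [("Disallow", "/private/")],
    "Googlebot": [("Disallow", "/drafts/")],
}
print(select_group(groups, "Googlebot"))  # Googlebot
print(select_group(groups, "Bingbot"))    # *
```

In this example Googlebot never sees the `/private/` rule, because the function returns exactly one group and the `*` group is not merged in.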
Within a group, when multiple rules match a URL, the longest path wins (most specific match), not the first one listed. Given Disallow: /shop/ and Allow: /shop/sale/, the URL /shop/sale/boots is crawlable because the Allow rule is 11 characters to the Disallow's 6. Two wildcards are supported by Google and Bing: * matches any sequence of characters (Disallow: /*?sessionid= blocks any URL containing that parameter), and $ anchors the end of the URL (Disallow: /*.pdf$ blocks PDFs but not /whitepaper.pdf?download=1). An empty Disallow: means “nothing is disallowed” — that's how the Allow-everything preset works.
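The longest-match rule and the two wildcards can be sketched as follows. This is an illustrative matcher written for this page, not Google's or Bing's parser; `pattern_to_regex` and `is_allowed` are hypothetical names, and the tie-break (Allow wins when two matching patterns have the same length) follows the least-restrictive convention and should be treated as an assumption for other crawlers.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Apply one group's rules to a URL path using longest-match."""
    best_len, allowed = -1, True  # no matching rule means crawlable
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


shop = [("Disallow", "/shop/"), ("Allow", "/shop/sale/")]
print(is_allowed(shop, "/shop/sale/boots"))  # True: 11 characters beat 6
print(is_allowed(shop, "/shop/cart"))        # False

pdfs = [("Disallow", "/*.pdf$")]
print(is_allowed(pdfs, "/whitepaper.pdf"))             # False
print(is_allowed(pdfs, "/whitepaper.pdf?download=1"))  # True
```

Note that rule order never enters the decision: swapping the two `shop` rules gives the same answers.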
robots.txt is not noindex — blocked pages can still rank
Disallow stops crawling, not indexing. If other sites link to a blocked URL, Google can index it anyway and show it with the snippet “No information is available for this page”: a real SERP result pointing at a page you thought you hid. To keep a page out of the index, you need a noindex robots meta tag or X-Robots-Tag header, and the page must not be blocked in robots.txt; otherwise Google never crawls it and never sees the noindex. In that scenario the two mechanisms work against each other, and combining them wrong is how “hidden” admin pages end up in search results. Likewise, robots.txt is a convention, not a lock: it does nothing against scrapers or malicious bots that simply ignore it, and the file itself is public, so listing /secret-admin/ in it advertises the path to anyone who looks.
Where the file must live
Crawlers request exactly /robots.txt at the root of each host and protocol: https://www.example.com/robots.txt. A file at /pages/robots.txt is never read, and blog.example.com needs its own file — subdomains don't inherit. The filename is case-sensitive (Robots.TXT fails), the file should be served as UTF-8 text/plain with a 200 status, and Google reads at most 500 KiB. Sitemap lines are the exception to scoping rules: they take a full absolute URL and can live anywhere in the file, outside any group.
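To make the scoping concrete, here is a small Python sketch that derives the robots.txt location governing any given page URL. The helper name `robots_txt_url` is hypothetical; it only uses the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to page_url.

    Only the scheme and host (with port, if any) are kept; the path,
    query string, and fragment of the page are irrelevant.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/pages/about?ref=nav"))
# https://www.example.com/robots.txt
print(robots_txt_url("https://blog.example.com/2024/post"))
# https://blog.example.com/robots.txt
```

The two results differ, which is the point: the blog subdomain is a separate host and is governed by its own file.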
Frequently asked questions
How do I block AI training bots without affecting Google rankings?
Use the “Block AI training bots” preset: it disallows / for GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), PerplexityBot, Google-Extended, Applebot-Extended, Bytespider (TikTok/ByteDance), and meta-externalagent, while leaving a separate User-agent: * group that allows everything. Google-Extended controls AI training use only — blocking it does not affect Googlebot or your search rankings.
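As an illustration, the Python sketch below builds a file of that shape from the bot list named above. It assumes the eight bots share one group with stacked User-agent lines; the preset itself may emit one group per bot instead, which is equivalent. `block_ai_preset` is a hypothetical helper.

```python
AI_TRAINING_BOTS = [
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "PerplexityBot",
    "Google-Extended",
    "Applebot-Extended",
    "Bytespider",
    "meta-externalagent",
]


def block_ai_preset(bots=AI_TRAINING_BOTS):
    """Disallow everything for the listed bots, allow everything else."""
    lines = [f"User-agent: {bot}" for bot in bots]
    lines.append("Disallow: /")
    # Separate wildcard group: the empty Disallow allows all paths.
    lines += ["", "User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


print(block_ai_preset())
```

Googlebot is not in the list, so it falls through to the `*` group and keeps crawling normally.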
Does Disallow remove a page from Google?
No. Disallow only stops crawling. A page blocked in robots.txt can still be indexed from external links and appear in results without a description. To deindex a page, allow it to be crawled and add a noindex meta tag or X-Robots-Tag header — then optionally block it after it drops out.
What does Crawl-delay do, and should I set it?
Crawl-delay asks a bot to wait N seconds between requests. Bing and Yandex honor it; Google has never supported it, and the crawl rate limiter that Search Console used to offer was retired in early 2024 (Google's documented way to slow Googlebot temporarily is to return 500, 503, or 429 responses). Only set it if a specific crawler is genuinely hammering your server: a value like 10 can slow Bing's discovery of new pages dramatically.
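A quick back-of-the-envelope calculation shows why a large value is costly. The sketch assumes the crawler treats the value as a minimum gap between consecutive requests, which is a simplification; individual crawlers may interpret it differently. `max_requests_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_requests_per_day(crawl_delay_seconds):
    """Upper bound on fetches per day at one request per delay interval."""
    return int(SECONDS_PER_DAY // crawl_delay_seconds)


print(max_requests_per_day(10))  # 8640
print(max_requests_per_day(1))   # 86400
```

On a site with tens of thousands of URLs, a ceiling of 8,640 fetches per day means a full recrawl takes several days at best.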
Why isn't my User-agent: * rule applying to Googlebot?
Because a crawler obeys only its most specific matching group. If your file contains a User-agent: Googlebot group anywhere, Googlebot reads that group exclusively and ignores * entirely. Duplicate any universal rules into every named group.
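One way to apply that advice mechanically is to copy every rule from the `*` group into each named group before writing the file. The Python sketch below assumes groups are held as a dictionary from User-agent token to a list of (directive, path) rules; that layout and the name `copy_universal_rules` are illustrative only.

```python
def copy_universal_rules(groups):
    """Return new groups where each named group also carries the '*' rules."""
    universal = groups.get("*", [])
    merged = {}
    for agent, rules in groups.items():
        if agent == "*":
            merged[agent] = list(rules)
        else:
            # Append universal rules the named group does not already have.
            extra = [rule for rule in universal if rule not in rules]
            merged[agent] = list(rules) + extra
    return merged


groups = {
    "*": [("Disallow", "/admin/")],
    "Googlebot": [("Disallow", "/drafts/")],
}
print(copy_universal_rules(groups)["Googlebot"])
# [('Disallow', '/drafts/'), ('Disallow', '/admin/')]
```

The original dictionary is left unchanged, so you can compare the before and after versions of the file.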
Can I use robots.txt to hide private or sensitive pages?
No — treat it as a courtesy sign, not a lock. Compliant crawlers respect it, but anyone (including bad bots) can fetch your robots.txt and see exactly which paths you tried to hide. Protect sensitive content with authentication or IP restrictions; use noindex for pages that are public but shouldn't rank.
Where exactly do I upload the file?
At the root of the exact host you want to control: https://www.example.com/robots.txt. Each subdomain (and each protocol) needs its own file; a robots.txt in a subdirectory is ignored. On WordPress, SEO plugins like Yoast can write it for you, or upload via FTP to the web root.
Related tools
- Meta Tag Generator: Generate every meta tag your page needs: title, description, canonical, robots, Open Graph, and Twitter cards. Copy a clean, valid HTML head block.
- URL Slug Generator: Convert any title to a clean URL slug: lowercase, hyphens, accents transliterated, stop words removed. Bulk mode turns a whole list into slugs.
- Hreflang Tag Generator: Generate hreflang tags for every language-region pair: paste URLs, pick locales, get HTML link tags or XML sitemap entries with x-default.
- SERP Snippet Preview & Meta Length Checker: Preview your Google snippet pixel-accurately and check title tag and meta description lengths before you publish, with desktop and mobile views.
- Article Schema Generator: Generate Article, NewsArticle, or BlogPosting JSON-LD schema with author, publisher, and dates. Copy valid markup for Google article rich results.
- Conversion Rate Calculator: Calculate conversion rate from conversions and visitors, solve for the conversions or traffic you need, and get revenue per visitor from your AOV.