Question 1

How do I block AI training bots without affecting Google rankings?

Accepted Answer

Use the “Block AI training bots” preset. It disallows / for GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), PerplexityBot (Perplexity), Google-Extended (Google), Applebot-Extended (Apple), Bytespider (TikTok/ByteDance), and meta-externalagent (Meta), and leaves a separate User-agent: * group that allows everything. Google-Extended is a control token for AI training use only, so blocking it does not affect Googlebot or your search rankings.
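
The preset's exact output is not reproduced in this answer, so the file below is an assumed equivalent built from the user agents listed above. As a rough sanity check it is parsed with Python's standard urllib.robotparser. That parser is not Google's and differs from it in some edge cases, and www.example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed equivalent of the preset's output (not copied from the generator).
AI_BLOCK_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


ai_rules = parse_robots(AI_BLOCK_ROBOTS_TXT)
page = "https://www.example.com/blog/post"  # placeholder URL

for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "Googlebot", "bingbot"):
    verdict = "allowed" if ai_rules.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")
```

The listed AI agents share one group with a single Disallow: / rule, and every other crawler falls through to the * group.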

Question 2

Does Disallow remove a page from Google?

Accepted Answer

No. Disallow only stops crawling. A page blocked in robots.txt can still be indexed from external links and appear in results without a description. To deindex a page, allow it to be crawled and add a noindex meta tag or X-Robots-Tag header; the crawler has to fetch the page to see the noindex, so a Disallow rule would hide it. Once the page has dropped out of the index, you can optionally block it.
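
As a minimal sketch of the header approach, the WSGI app below adds X-Robots-Tag: noindex for a few paths. The paths and the NOINDEX_PATHS name are hypothetical, and a real site would more likely set the header in its web server or framework configuration.

```python
# Hypothetical set of public paths that should drop out of search results.
NOINDEX_PATHS = {"/old-landing-page", "/internal-search"}


def app(environ, start_response):
    """Minimal WSGI app that adds X-Robots-Tag: noindex for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # Crawlers only see this header if robots.txt lets them fetch the page.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]


def response_headers(path):
    """Call the app directly and return the headers it would send."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured


print(response_headers("/old-landing-page").get("X-Robots-Tag"))  # noindex
print(response_headers("/").get("X-Robots-Tag"))                  # None
```

The in-page equivalent is a <meta name="robots" content="noindex"> tag in the document head. To try the app locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library is enough.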

Question 3

What does Crawl-delay do, and should I set it?

Accepted Answer

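Crawl-delay asks a bot to wait N seconds between requests. Bing honors it. Yandex did historically but now points site owners to the crawl-rate setting in Yandex Webmaster. Google has never supported it: Googlebot sets its own rate from how your server responds, and the Search Console crawl-rate limiter was retired in January 2024. Only set Crawl-delay if a specific crawler is genuinely hammering your server; a value like 10 can slow Bing's discovery of new pages dramatically.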
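
A small sketch of scoping the delay to one crawler, again using Python's urllib.robotparser as a stand-in parser (an assumption; real crawlers may read the value differently). The arithmetic at the end takes "one request every N seconds" literally to show why 10 is a large value.

```python
from urllib.robotparser import RobotFileParser

# The delay lives in the bingbot group only; everyone else gets no delay.
CRAWL_DELAY_ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

delay_rules = RobotFileParser()
delay_rules.parse(CRAWL_DELAY_ROBOTS_TXT.splitlines())

bing_delay = delay_rules.crawl_delay("bingbot")     # 10
other_delay = delay_rules.crawl_delay("Googlebot")  # None: its group sets no delay

# Upper bound on fetches per day if the crawler waits bing_delay seconds each time.
SECONDS_PER_DAY = 24 * 60 * 60
max_fetches_per_day = SECONDS_PER_DAY // bing_delay  # 8640

print(bing_delay, other_delay, max_fetches_per_day)
```

On a site with tens of thousands of URLs, a ceiling of 8,640 fetches a day means a full recrawl takes several days at best.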

Question 4

Why isn't my User-agent: * rule applying to Googlebot?

Accepted Answer

Because a crawler obeys only its most specific matching group. If your file contains a User-agent: Googlebot group anywhere, Googlebot reads that group exclusively and ignores * entirely. Duplicate any universal rules into every named group.
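
The effect is easy to reproduce. In the sketch below, the paths /admin/ and /drafts/ are made up for illustration, and Python's urllib.robotparser stands in for a real crawler; it is not Google's parser, but it agrees with the behavior described here.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://www.example.com" + path)


# The * rule does not reach Googlebot, because Googlebot has its own group.
BROKEN = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""

# The universal rule is duplicated into the named group.
FIXED = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /admin/
Disallow: /drafts/
"""

print(allowed(BROKEN, "Googlebot", "/admin/"))  # True: the * group is ignored
print(allowed(BROKEN, "OtherBot", "/admin/"))   # False: falls back to *
print(allowed(FIXED, "Googlebot", "/admin/"))   # False: rule now in its own group
```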

Question 5

Can I use robots.txt to hide private or sensitive pages?

Accepted Answer

No — treat it as a courtesy sign, not a lock. Compliant crawlers respect it, but anyone (including bad bots) can fetch your robots.txt and see exactly which paths you tried to hide. Protect sensitive content with authentication or IP restrictions; use noindex for pages that are public but shouldn't rank.

Question 6

Where exactly do I upload the file?

Accepted Answer

At the root of the exact host you want to control: https://www.example.com/robots.txt. Each subdomain (and each protocol) needs its own file; a robots.txt in a subdirectory is ignored. On WordPress, SEO plugins like Yoast can write it for you, or upload via FTP to the web root.
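
To see which file governs a given page, keep the scheme and host and replace everything after them with /robots.txt. The helper name robots_txt_url and the example URLs below are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    # Drop the path, query, and fragment; robots.txt always sits at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "https://www.example.com/blog/post?id=7",
    "https://shop.example.com/cart",
    "http://www.example.com/",
):
    print(url, "->", robots_txt_url(url))
```

The three inputs map to three different files, which is why the www host, the shop subdomain, and the plain-http origin each need their own robots.txt.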
