meta robots Tag
Build <meta name="robots"> with directives.
The meta robots tag: complete control over how search engines index your pages
The <meta name="robots"> tag lives inside the <head> of an HTML document and tells crawlers from Google, Bing, Yandex, AI assistants and other bots how they may use the page. It is the most surgical instrument in technical SEO because it works per-URL and per-bot, unlike robots.txt, which matches URL path patterns and only controls crawling, not indexing. Getting it right means promo pages disappear from the SERP at the right moment, staging never leaks, and AI scrapers receive exactly the policy you decided. Getting it wrong silently drops entire sections of your site from Google.
Syntax, defaults and common directive combinations
The basic syntax is <meta name="robots" content="...">, where the content attribute holds a comma-separated list of directives. The two implicit defaults are index (the page may appear in the SERP) and follow (crawlers may follow the page's links, so link equity passes through them). You only need to declare a directive when you want to deviate from these defaults. Frequent combinations include noindex, follow (typical for paginated archives that should not rank but must still distribute equity to children), index, nofollow (for indexable pages full of untrusted user links), noindex, nofollow (the strongest opt-out), noarchive (the page may be indexed but no cached copy may be shown; Google has retired its cached-page link, while other engines such as Bing still honor the directive), and nosnippet (no preview text in the SERP).
<meta name="robots" content="index, follow, max-image-preview:large">
<meta name="robots" content="noindex, follow">
<meta name="robots" content="noindex, nofollow, noarchive">
<meta name="robots" content="max-snippet:160, max-image-preview:large, max-video-preview:30">
<meta name="robots" content="unavailable_after: 2026-12-31T23:59:59-03:00">
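Since the content attribute is just a comma-separated list, assembling a tag can be automated. The following is a minimal sketch in Python, not the generator's actual implementation; the function name build_robots_meta and its parameters are hypothetical.

```python
from html import escape


def build_robots_meta(directives, bot="robots"):
    """Return a robots <meta> tag for the given directives.

    directives: strings such as "noindex" or "max-snippet:160".
    bot: value of the name attribute ("robots" addresses every crawler).
    """
    # Drop empty entries and surrounding whitespace.
    cleaned = [d.strip() for d in directives if d.strip()]
    if not cleaned:
        raise ValueError("at least one directive is required")
    # Reject the most obvious contradiction before emitting anything.
    if "index" in cleaned and "noindex" in cleaned:
        raise ValueError("index and noindex cannot be combined")
    content = ", ".join(cleaned)
    return (
        f'<meta name="{escape(bot, quote=True)}" '
        f'content="{escape(content, quote=True)}">'
    )


print(build_robots_meta(["noindex", "follow"]))
```

The contradiction check is deliberately narrow; a real generator would probably validate every directive against a known list.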
Bot-specific overrides: googlebot, bingbot and AI crawlers
You can target one specific crawler by replacing robots with the bot's user-agent token. A <meta name="googlebot" content="..."> directive overrides the generic robots tag for Google only; the same trick works for bingbot, applebot and legacy tokens such as slurp (Yahoo) and msnbot. AI crawlers such as GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), PerplexityBot and Bytespider now account for a noticeable share of crawler traffic, and some publishers want to stay indexed by traditional search engines while opting out of training. Meta robots is one of the two control points for that, the other being robots.txt. Support for bot-specific meta names varies by vendor, so check each crawler's documentation before relying on it. Google-Extended (training data for Gemini) is a special case: it is a robots.txt control token rather than a separate crawler, so it has no effect as a meta name.
<meta name="googlebot" content="noindex, nofollow">
<meta name="bingbot" content="noarchive">
<meta name="GPTBot" content="noindex">
<meta name="ClaudeBot" content="noindex">
<!-- Google-Extended is a robots.txt token only; set it in robots.txt, not in a meta tag -->
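When a page carries several bot-specific tags, it helps to see them side by side. The sketch below uses Python's standard html.parser to collect the directives declared for each token. The BOT_TOKENS watch list and the collect_robots_meta helper are assumptions made for illustration, and the code only lists what is declared; it does not model how any crawler resolves conflicts between tags.

```python
from html.parser import HTMLParser

# Hypothetical watch list; extend it with whichever tokens you target.
BOT_TOKENS = {"robots", "googlebot", "bingbot", "gptbot", "claudebot"}


class RobotsMetaCollector(HTMLParser):
    """Collect directives from <meta> tags whose name is a known bot token."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased token -> list of lowercased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name not in BOT_TOKENS:
            return
        content = attr.get("content") or ""
        parts = [p.strip().lower() for p in content.split(",") if p.strip()]
        self.directives.setdefault(name, []).extend(parts)


def collect_robots_meta(html):
    """Return a dict mapping each bot token found to its directives."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    return collector.directives


page = (
    '<head><meta name="robots" content="index, follow">'
    '<meta name="GPTBot" content="noindex"></head>'
)
print(collect_robots_meta(page))
```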
X-Robots-Tag HTTP header for non-HTML files
Meta robots only works in HTML, so PDFs, images, videos, JSON feeds and any other non-HTML resource need the equivalent HTTP header, X-Robots-Tag. Configure it at the server level (Nginx, Apache, Cloudflare Workers) and you can blanket-noindex an entire /downloads/ folder without touching individual files. The header accepts every directive a meta tag accepts and also supports bot-specific syntax like X-Robots-Tag: googlebot: noindex.
# Nginx
location ~* \.(pdf|docx)$ {
    add_header X-Robots-Tag "noindex, nofollow" always;
}
# Apache .htaccess
<FilesMatch "\.(pdf|docx)$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
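When auditing what a server actually sends, the header value has to be split into an optional bot prefix and a directive list. The following Python sketch shows one way to do that. The names parse_x_robots_tag and VALUED_DIRECTIVES are hypothetical, and the parser assumes date values without commas, so treat it as a starting point rather than a complete implementation.

```python
# Directives that legitimately contain a colon, so they are not bot prefixes.
VALUED_DIRECTIVES = {
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
    "unavailable_after",
}


def parse_x_robots_tag(value):
    """Split one X-Robots-Tag header value into (bot, directives).

    bot is None when the value applies to every crawler.
    """
    bot = None
    head, sep, rest = value.partition(":")
    # A leading "token:" is a bot prefix unless it is a valued directive
    # or the colon belongs to a later directive in the list.
    if sep and "," not in head and head.strip().lower() not in VALUED_DIRECTIVES:
        bot, value = head.strip().lower(), rest
    directives = [d.strip() for d in value.split(",") if d.strip()]
    return bot, directives


print(parse_x_robots_tag("googlebot: noindex, nofollow"))
print(parse_x_robots_tag("noindex, max-snippet:160"))
```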
Robots.txt vs meta robots: the classic trap
A page that is both Disallow'd in robots.txt AND tagged with noindex in HTML can, paradoxically, still appear in Google search results, usually with a placeholder such as the famous "A description for this result is not available because of this site's robots.txt" snippet. The reason is mechanical: if robots.txt forbids crawling, Google never downloads the HTML, never sees the noindex meta and therefore never removes the URL from the index when it was discovered through backlinks. The correct workflow is: first serve the page with noindex while it is still crawlable, wait until it has been recrawled and dropped from the index, and only then add the Disallow line if you also want to save crawl budget.
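The trap can be detected mechanically by checking which noindex URLs are also blocked from crawling. The sketch below uses urllib.robotparser from the Python standard library; the helper name find_hidden_noindex, the sample rules and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def find_hidden_noindex(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return the noindex URLs that the crawler is not allowed to fetch.

    A crawler that cannot fetch these URLs never sees their noindex tag.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


robots_txt = """\
User-agent: *
Disallow: /promo/
"""
urls = [
    "https://example.com/promo/black-friday",
    "https://example.com/blog/post",
]
print(find_hidden_noindex(robots_txt, urls))
```

Any URL this returns should have its Disallow rule lifted until the page has been recrawled and removed from the index.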
Common mistakes that destroy organic traffic
- Leaving noindex on after deployment: the single most common cause of a sudden traffic drop. Audit staging templates before merging to production.
- Using nofollow on internal navigation links: this is link equity sculpting, and Google has explicitly discouraged it since 2009.
- Adding noindex to pages whose canonical points elsewhere: Google sometimes treats this as a contradictory signal and may ignore the canonical entirely.
- Combining robots.txt disallow with meta noindex, as discussed above.
- Forgetting max-image-preview:large: without it, image-rich Discover and SERP cards default to small thumbnails.
FAQ
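How long until a noindexed page disappears from Google? There is no fixed timeline. The URL drops out only after Google recrawls it and sees the directive, which can take a few days for frequently crawled, high-authority sites and several weeks or longer for orphan URLs. You can speed it up by requesting a recrawl of the URL in Google Search Console.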
Is the X-Robots-Tag HTTP header equivalent to the meta tag? Yes, fully equivalent in directives and bot-targeting. Use the header for non-HTML files or when you cannot edit the HTML.
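How do I block AI bots from training on my content? Two layers: add User-agent: GPTBot followed by Disallow: / to robots.txt (and repeat for ClaudeBot, CCBot, Google-Extended, PerplexityBot), and add meta robots tags with the bot tokens on HTML pages for the crawlers that document support for them. Google-Extended is a robots.txt token only, so it belongs in the first layer.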
Does noindex also stop link equity flow? Not immediately, but eventually. Google has stated that long-term noindex, follow behaves like noindex, nofollow because the URL is dropped from the index and its links stop being followed. For paginated archives, prefer a canonical tag plus indexable child pages.
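The robots.txt layer of the AI-bot answer above is repetitive enough to generate. The following is a minimal Python sketch; the function name ai_block_rules and the AI_BOTS tuple are hypothetical, and the token list simply mirrors the bots named in this FAQ.

```python
# Tokens taken from the FAQ answer above; adjust the list to your own policy.
AI_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot")


def ai_block_rules(bots=AI_BOTS):
    """Return robots.txt text that disallows the whole site for each bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    # A blank line separates groups; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


print(ai_block_rules())
```

Append the output to an existing robots.txt rather than replacing the file, so rules for other crawlers stay in place.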