Robot Meta Tags and X-Robots-Tag HTTP Headers

Mandelbot is Fractle's web crawler. It lets you control which files are indexed by using Robot Meta Tags and X-Robots-Tag HTTP Headers. This page describes Mandelbot's support for Robot Tags, which is part of its support for the Robots Exclusion Protocol.

Robot Meta Tags

You can control whether Mandelbot indexes a particular HTML page by adding a meta tag inside the head element of the page. This works only for HTML files that specify a Content-Type of text/html in their HTTP response headers.

The robot meta tag is a meta element with two attributes: name and content. The name attribute either names the user agent to which the tag applies or contains the value robots, in which case the tag applies to all crawlers. The content attribute is a comma-separated list of directives. Both attributes are case-insensitive.

Mandelbot uses the user agent Mandelbot and understands the noindex directive. By adding one of the following meta tags to a page you can stop Mandelbot from indexing that page:

<meta name="robots" content="noindex" />

<meta name="mandelbot" content="noindex" />
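As an illustration of how such tags can be read, here is a minimal Python sketch built on the standard html.parser module. It is not Mandelbot's actual implementation. The names RobotMetaParser and meta_blocks_indexing are hypothetical, and the sketch assumes that meta tags outside the head element are ignored and that only the noindex directive matters.

```python
from html.parser import HTMLParser


class RobotMetaParser(HTMLParser):
    """Collects directives from robot meta tags found inside <head>."""

    def __init__(self, user_agent="mandelbot"):
        super().__init__()
        self.user_agent = user_agent.lower()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "head":
            self.in_head = True
            return
        if tag != "meta" or not self.in_head:
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").strip().lower()
        # The tag applies if it names all crawlers or this user agent.
        if name not in ("robots", self.user_agent):
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def meta_blocks_indexing(html, user_agent="mandelbot"):
    """Return True if a robot meta tag in <head> carries noindex."""
    parser = RobotMetaParser(user_agent)
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

Calling meta_blocks_indexing with the HTML of a page that contains either tag above inside its head element returns True. A tag addressed to a different user agent leaves the result False.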

X-Robots-Tag HTTP Headers

You can control whether Mandelbot indexes a particular file by adding an X-Robots-Tag element to the HTTP response headers for the file. This works for all file types.

The X-Robots-Tag element uses the HTTP header field X-Robots-Tag combined with a value that is a comma-separated list of directives, which may optionally be preceded by a user agent name followed by a colon. The value is case-insensitive.
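The value format described above can be split apart with a few lines of Python. This is only a sketch: the function name parse_x_robots_tag is hypothetical, and it assumes that any text before the first colon is a user agent name.

```python
def parse_x_robots_tag(value):
    """Split an X-Robots-Tag value into (user_agent, directives).

    user_agent is None when the value has no "name:" prefix.
    """
    # The value is case-insensitive, so normalize it first.
    value = value.strip().lower()
    user_agent = None
    prefix, separator, rest = value.partition(":")
    if separator:
        # Assumption: text before the first colon names a user agent.
        user_agent = prefix.strip()
        value = rest
    directives = [item.strip() for item in value.split(",") if item.strip()]
    return user_agent, directives
```

A value without a prefix yields None for the user agent, which a caller can treat as "applies to all crawlers".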

Mandelbot uses the user agent Mandelbot and understands the noindex directive. By adding one of the following HTTP response headers to a file you can stop Mandelbot from indexing that file:

X-Robots-Tag: noindex

X-Robots-Tag: mandelbot: noindex
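How you add the header depends on your web server. As one possible approach, assuming your files are served by a Python WSGI application, a small wrapper can attach the header to every response. The names add_noindex_header and demo_app are hypothetical and used for illustration only.

```python
def add_noindex_header(app, value="noindex"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the header is sent once.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    # Hypothetical application that serves a plain-text file.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"example file contents\n"]
```

To try it locally, the wrapped application add_noindex_header(demo_app) can be passed to make_server from the standard wsgiref.simple_server module. Passing value="mandelbot: noindex" would address the header to Mandelbot only.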

Robot Tag Changes

If you change your Robot Meta Tags or X-Robots-Tag HTTP Headers, Mandelbot will pick up the change when it next crawls the affected files.

Compatibility with Other Crawlers

Mandelbot is compatible with Robot Tags written for other crawlers. In both robot meta tags and X-Robots-Tag HTTP headers, Mandelbot ignores unknown directives. It also stops indexing files when it encounters the none directive, which is equivalent to using both the noindex directive and the nofollow directive.

For example, Mandelbot will not index pages containing any of these meta tags:

<meta name="robots" content="nofollow,noindex,noarchive" />

<meta name="robots" content="none" />

Nor will it index files whose HTTP response includes any of these headers:

X-Robots-Tag: nofollow,noindex,noarchive

X-Robots-Tag: none
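The handling of the none directive and of unrecognized directives can be sketched in Python as follows. This is an illustration under stated assumptions, not Mandelbot's actual code: the function names expand_directives and blocks_indexing are hypothetical, and the input is assumed to be a directive list with any user agent prefix already removed.

```python
def expand_directives(content):
    """Return the set of directives in a comma-separated list.

    The none directive is replaced by its equivalent pair,
    noindex and nofollow.
    """
    directives = {item.strip().lower() for item in content.split(",") if item.strip()}
    if "none" in directives:
        directives.discard("none")
        directives.update({"noindex", "nofollow"})
    return directives


def blocks_indexing(content):
    """Return True if the directive list stops indexing.

    Directives other than noindex (for example noarchive) do not
    affect the result, so they are effectively ignored.
    """
    return "noindex" in expand_directives(content)
```

With this sketch, both "nofollow,noindex,noarchive" and "none" stop indexing, while a list such as "nofollow,noarchive" does not.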

Robots Exclusion Protocol

Mandelbot's support for Robot Tags is just one part of its support for the Robots Exclusion Protocol. Mandelbot also supports robots.txt files, which control which files are crawled.

For an overview of the protocol, additional information on Mandelbot's use of Robot Tags, and details on interactions and conflicts between Robot Tags and robots.txt, read about the Robots Exclusion Protocol.