Mandelbot is Fractle's web crawler. It lets you control which files are indexed by using Robot Meta Tags and X-Robots-Tag HTTP Headers. This page describes Mandelbot's support for Robot Tags, which is part of Mandelbot's support for the Robots Exclusion Protocol.
You can control whether Mandelbot indexes a particular HTML page by adding a meta tag inside the head element of the page. This only works for HTML files that specify a Content-Type of text/html in their HTTP response headers.
The robot meta tag is a meta element with two attributes: name and content. The name attribute specifies the user agent to which the tag applies, or contains the value robots, in which case the tag applies to all crawlers. The content attribute is a comma-separated list of directives. Both attributes are case-insensitive.
Mandelbot uses the user agent Mandelbot and understands the noindex directive. By adding either of the following meta tags to a page, you can stop Mandelbot from indexing that page:
<meta name="robots" content="noindex" />
<meta name="mandelbot" content="noindex" />
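To make these rules concrete, here is a minimal Python sketch of how a crawler might collect the robots directives that apply to Mandelbot from a page. It assumes the standard library's html.parser is adequate for the markup and only considers meta tags that appear between explicit head tags. The names RobotsMetaParser and robots_meta_directives are hypothetical and do not describe Mandelbot's actual implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects robots directives from <meta> tags in <head>."""

    def __init__(self, user_agent="mandelbot"):
        super().__init__()
        self.user_agent = user_agent.lower()
        self.in_head = False
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = (attributes.get("name") or "").strip().lower()
            content = attributes.get("content") or ""
            # "robots" applies to all crawlers; otherwise the name must match our agent.
            if name in ("robots", self.user_agent):
                self.directives.extend(
                    d.strip().lower() for d in content.split(",") if d.strip()
                )

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_meta_directives(html, content_type):
    """Return the directives that apply to Mandelbot, or [] if not served as text/html."""
    # Ignore parameters such as "; charset=utf-8" and compare the media type case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/html":
        return []
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

For example, robots_meta_directives(page, "text/html; charset=utf-8") returns ["noindex"] for a page whose head contains <meta name="Mandelbot" content="NoIndex" />, while the same page served as text/plain yields no directives.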
You can control whether Mandelbot indexes a particular file by adding an X-Robots-Tag header to the HTTP response for the file. This works for all file types.
The X-Robots-Tag header uses the HTTP header field X-Robots-Tag with a value that is a comma-separated list of directives, optionally preceded by a user agent name followed by a colon. The value is case-insensitive.
Mandelbot uses the user agent Mandelbot and understands the noindex directive. By adding either of the following HTTP response headers to a file, you can stop Mandelbot from indexing that file:
X-Robots-Tag: noindex
X-Robots-Tag: mandelbot: noindex
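The header syntax described above can be illustrated with a short Python sketch that extracts the directives applying to Mandelbot from one or more X-Robots-Tag values. The function name mandelbot_header_directives is hypothetical, and the sketch assumes a simplified rule: text before the first colon is treated as a user agent prefix only when it contains no comma. Directives that themselves contain a colon are not handled.

```python
def mandelbot_header_directives(header_values, user_agent="mandelbot"):
    """Hypothetical helper: directives from X-Robots-Tag values that apply to user_agent."""
    user_agent = user_agent.lower()
    directives = []
    for raw in header_values:
        # The whole value is case-insensitive, so normalise it once.
        value = raw.strip().lower()
        prefix, sep, rest = value.partition(":")
        if sep and "," not in prefix:
            # "agent: directives" form: skip values addressed to a different crawler.
            if prefix.strip() != user_agent:
                continue
            value = rest
        directives.extend(d.strip() for d in value.split(",") if d.strip())
    return directives
```

A value without a prefix, such as noindex, applies to every crawler, so it is kept; a value such as otherbot: noindex is skipped, and Mandelbot: NoIndex yields ["noindex"].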
If you change your Robot Meta Tags or X-Robots-Tag HTTP Headers, Mandelbot will pick up the change when it next crawls the affected files.
Mandelbot's handling of these tags is compatible with other crawlers: in both robot meta tags and X-Robots-Tag HTTP headers, it ignores unknown directives and does not index files that carry the none directive, which is equivalent to using both the noindex and nofollow directives.
For example, Mandelbot will not index pages containing any of these meta tags:
<meta name="robots" content="nofollow,noindex,noarchive" />
<meta name="robots" content="none" />
Nor will it index files whose HTTP response includes any of these headers:
X-Robots-Tag: nofollow,noindex,noarchive
X-Robots-Tag: none
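The decision rule above can be sketched in Python: none expands to noindex plus nofollow, unrecognized directives such as noarchive are dropped, and a file is skipped when noindex remains. The input is a plain comma-separated directive list with any user agent prefix already removed. The names effective_directives and may_index are hypothetical, and the sketch assumes noindex, nofollow and none are the only recognized directives, since they are the only ones this page gives a meaning to.

```python
# Assumption: the only directives given a meaning on this page besides "none".
RECOGNIZED = {"noindex", "nofollow"}


def effective_directives(value: str) -> set:
    """Normalise a comma-separated directive list into the recognized directives."""
    result = set()
    for token in value.split(","):
        directive = token.strip().lower()
        if directive == "none":
            # "none" is shorthand for "noindex, nofollow".
            result.update(("noindex", "nofollow"))
        elif directive in RECOGNIZED:
            result.add(directive)
        # Any other directive (for example "noarchive") is ignored.
    return result


def may_index(value: str) -> bool:
    """True unless the directives (after expansion) include noindex."""
    return "noindex" not in effective_directives(value)
```

Under these rules every example above blocks indexing, while a value such as noarchive or nofollow on its own does not.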
Mandelbot's support for Robot Tags is just part of its support for the Robots Exclusion Protocol. Mandelbot also supports robots.txt files, which provide control over what files are crawled.
For an overview of the protocol, additional information on Mandelbot's use of Robot Tags, and details on interactions and conflicts between Robot Tags and robots.txt, read about the Robots Exclusion Protocol.
Fractle © 2024