Updating the answer.
There is another way of blocking: directly through the HTTP response header, using X-Robots-Tag.
Instead of a meta tag or robots.txt, you can also return an X-Robots-Tag: noindex header in the response to a page request. Here is an example of an HTTP response with X-Robots-Tag headers that tell googlebot not to follow the links on the page, and tell otherbot neither to index the page nor to follow its links:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…)
You can find more information about X-Robots-Tag here: link
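If you control the server code, attaching the header takes only a few lines. Below is a minimal sketch in Python using the standard library's http.server. The helper x_robots_headers, the handler name NoIndexHandler and the RULES mapping are made up for this illustration, and a real site would more likely set the header in its web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler

HEADER_NAME = "X-Robots-Tag"

# Hypothetical rule set mirroring the response shown above:
# a key is a crawler name, or None for "all crawlers".
RULES = {
    "googlebot": ["nofollow"],
    "otherbot": ["noindex", "nofollow"],
}


def x_robots_headers(rules):
    """Turn {crawler_or_None: [directives]} into (name, value) header pairs."""
    headers = []
    for crawler, directives in rules.items():
        value = ", ".join(directives)
        if crawler is not None:
            # Prefix the directives with the crawler they are meant for.
            value = f"{crawler}: {value}"
        headers.append((HEADER_NAME, value))
    return headers


class NoIndexHandler(BaseHTTPRequestHandler):
    """Serves one tiny page and attaches the X-Robots-Tag headers to it."""

    def do_GET(self):
        body = b"<html><body>Not for search engines</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # One X-Robots-Tag header line per rule.
        for name, value in x_robots_headers(RULES):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like HTTPServer(("127.0.0.1", 8000), NoIndexHandler).serve_forever(), with HTTPServer imported from http.server, should serve the page, and curl -i http://127.0.0.1:8000/ should show the headers. Passing {None: ["noindex"]} to x_robots_headers gives the plain X-Robots-Tag: noindex form that applies to every crawler.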
Another tip about the <meta> tag: you can block specific bots, for example:
<meta name="googlebot" content="NOINDEX, NOFOLLOW">
<meta name="MSNBot" content="NOINDEX, NOFOLLOW">
Blocking through the robots.txt of the site that you do not want indexed:
User-agent: *
Disallow: /
In User-agent: *, the * means that this section applies to all robots.
And Disallow: / tells the robot not to visit any page on the site.
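To check how a well-behaved crawler would read these two lines, you can feed them to Python's standard urllib.robotparser. This is only a sketch: the helper is_allowed and the bot names are invented for the example, and it says nothing about robots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content shown above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Every user agent falls under "*", and every path starts with "/".
print(is_allowed(BLOCK_ALL, "Googlebot", "/"))                 # False
print(is_allowed(BLOCK_ALL, "SomeOtherBot", "/any/page.html"))  # False
```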
One more thing: noindex and nofollow must go in a <meta> tag, not in robots.txt! The correct form is:
<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
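To see what a crawler could extract from such a tag, here is a rough sketch using Python's standard html.parser. The class RobotsMetaParser and the function directives_for are hypothetical names, and the rule "use the bot-specific tag if present, otherwise fall back to robots" is a simplification for the example; real search engines may combine the two differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name=... content=...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip().lower()
        content = attr.get("content") or ""
        if name:
            self.directives[name] = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def directives_for(html, bot_name):
    """Directives for bot_name, falling back to the generic "robots" tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return found.get(bot_name.lower(), found.get("robots", []))


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
print(directives_for(page, "googlebot"))  # ['noindex', 'nofollow']
```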
In any case, some search engines and robots, mainly malware robots, can ignore the <meta> tag as well as robots.txt. Also, nofollow only applies to links on the page that carries it: if some other page that does not also use nofollow links to your site, a bot can still find your site through that link, whether or not you have a robots.txt.