Preventing a site from being listed on Google

6

I'm building a website and an application that will access the same database. While I'm still developing the site and its modules, I'll put it online, partly so I can test it more efficiently.

So I'd like to know how to make sure my site does not show up in Google searches. That is, I want it to be accessible only to me (or to anyone who knows the exact URL).

And anyone searching Google for the name of the site should not see it listed in the results.

    
asked by anonymous 07.08.2015 / 22:44

2 answers

3

Search engines use web crawlers to find and index websites on the web. To prevent your site from being indexed, one option is the robots meta tag:

<meta name="robots" content="nofollow" />

Put it in the <head> of your HTML to tell crawlers that the page should not be made available to the public.

Note, though, that this meta tag only tells search engines that you do not want to be "found". It does not block access to your site once it is published on an external server. Anyone with the direct link can still access it.

    
07.08.2015 / 22:47
5

There are several techniques for this, each with its advantages and disadvantages. From your description, it is enough to use robots.txt, a file that tells search engines you do not want that content to be indexed by them.
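As a minimal sketch, a robots.txt that asks every crawler to stay away from the whole site needs only two lines. The Python snippet below embeds that content and uses the standard library's urllib.robotparser to show how a compliant crawler would interpret it. The example.com URLs and the helper name is_allowed are placeholders for illustration, not something from the question.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt file served from the site root (/robots.txt).
# "User-agent: *" addresses every crawler; "Disallow: /" covers every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both URLs are hypothetical; any path on the site is covered by "Disallow: /".
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/login"))
```

To hide only part of a site, replace "Disallow: /" with a narrower path such as "Disallow: /private/" (again, a made-up path).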

The presence of this file does not guarantee anything, but the best-known search engines respect it. If you need guarantees, you'll need a protection mechanism that requires at least basic authentication before anyone can access the site.
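To illustrate the authentication option, here is a rough sketch of HTTP Basic authentication using only Python's standard library. The credentials, realm name, port, and handler name are all invented for the example. On a real site you would more likely enable this in the web server or framework you already use, and serve it over HTTPS, since Basic credentials are only base64-encoded.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USERNAME = "dev"
PASSWORD = "change-me"


def is_authorized(auth_header, username=USERNAME, password=PASSWORD):
    """Check an Authorization header value against the expected credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    expected = f"{username}:{password}"
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # 401 plus WWW-Authenticate makes the browser prompt for a login.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="dev site"')
            self.end_headers()
            return
        body = b"Work in progress"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), ProtectedHandler).serve_forever()
```

Unlike robots.txt, this also keeps out crawlers that ignore the rules, because they never receive the page content without the password.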

It is possible to use the meta tag technique from the other answer, but it is more work. The tag has to be placed on every page, and if you ever need to change it across the site, you have to change it in every file. In robots.txt you can specify which pages are affected in one central place. That's much better.

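In some cases the meta tag may be better, or even the only way to do it, but that is rarely the case. It should be a secondary solution. If you do use it, it's better to write it like this, since noindex is the directive that asks engines not to index the page, while nofollow only asks them not to follow its links: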

<meta name="robots" content="noindex, nofollow">

You can also control this on the HTTP server, although this is rarely the more attractive option. You can send this HTTP response header:

X-Robots-Tag: noindex
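How you send that header depends on your server. As one hedged illustration, the sketch below uses Python's built-in http.server to attach it to every response. The class name NoIndexHandler and the port are arbitrary choices for the example, and a production site would normally set the header in its real web server's configuration instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, marking every response noindex."""

    def end_headers(self):
        # Added just before the header section is finalized, so it is sent
        # with every response this handler produces, not only HTML pages.
        self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), NoIndexHandler).serve_forever()
```

The advantage over the meta tag is that the header also works for non-HTML files such as images or PDFs, which have no <head> to put a tag in.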
    
07.08.2015 / 22:47