How to block search engines (with pictures)


Search engines use robots (web spiders, or bots) that crawl and index web pages. If your site or page is under construction or contains content that should stay out of search results, you can prevent robots from crawling and indexing it. Learn how to block entire sites, directories, and pages with robots.txt, or individual pages and links with HTML meta tags. Read on to find out how to keep specific bots away from your content.

Steps

Method 1 of 2: Block search engines with robots.txt

Step 1. Review your robots.txt file

A robots.txt file is a simple text or ASCII file that tells search engine spiders what parts of a site they can access. Files and folders listed in the robots.txt file cannot be crawled and indexed by search robots. Use a robots.txt file if:

  • you want to hide certain content from search engines;
  • you are developing a website and are not ready for it to be crawled and indexed by search engine spiders;
  • you want to restrict access for reputable bots (note that malicious crawlers may simply ignore robots.txt).

Step 2. Create and save a robots.txt file

To create a file, open a regular text or code editor. Save the file as robots.txt. The file name must be written in lowercase letters.

  • Don't forget the "s" at the end of "robots".
  • Select the extension “.txt” when saving the file. If you are using Word, select the "Plain Text" option.

Step 3. Create a robots.txt file with an unconditional disallow directive

The unconditional disallow directive blocks the robots of all major search engines, preventing them from crawling and indexing the site. Add the following lines to the text file:

    User-agent: *
    Disallow: /

  • Use the unconditional disallow directive with caution. When a bot such as Bingbot reads this file, it will not index your site, and the search engine will not display your pages in its results.
  • User-agent is another name for a web spider, or search robot.
  • *: the asterisk means the rule applies to all user agents.
  • Disallow: /: the forward slash indicates that the entire site is closed to bots.

Step 4. Create a robots.txt file with a conditional allow directive

Instead of blocking all bots, consider blocking specific spiders from accessing certain parts of the site. The main commands of the conditional allow directive include:

  • Blocking a specific bot: replace the asterisk next to User-agent with googlebot, googlebot-news, googlebot-image, bingbot, or teoma.

  • Blocking a directory and its contents:

    User-agent: *
    Disallow: /sample-directory/

  • Blocking a web page:

    User-agent: *
    Disallow: /private_file.html

  • Blocking an image:

    User-agent: googlebot-image
    Disallow: /images/mypicture.jpg

  • Blocking all images:

    User-agent: googlebot-image
    Disallow: /

  • Blocking a specific file format:

    User-agent: *
    Disallow: /p*.gif$
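
Putting several of these rules together, a single robots.txt file might look like this (the paths here are illustrative):

```text
# Block all bots from the admin area
User-agent: *
Disallow: /admin/

# Keep Google's image bot away from one picture
User-agent: googlebot-image
Disallow: /images/mypicture.jpg
```

Rules are grouped under User-agent lines; a bot follows the group whose User-agent value best matches its own name.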


Step 5. Encourage bots to crawl and index your site

Many people do not want to block search engine spiders at all; on the contrary, they welcome them so that the site is fully indexed. This can be achieved in three ways. First, you can simply not create a robots.txt file: if a robot does not find one, it will crawl and index your entire site. Second, you can create an empty robots.txt file: the robot will find it, see that it is empty, and continue crawling and indexing the site. Finally, you can create a robots.txt file with an unconditional allow directive:

    User-agent: *
    Disallow:

  • When a bot such as Googlebot reads this file, it can freely visit your entire site.
  • User-agent is another name for a web spider, or search robot.
  • *: the asterisk means the rule applies to all user agents.
  • Disallow: an empty disallow directive means that all files and folders are accessible.

Step 6. Save the text file in the root directory of the domain

After editing your robots.txt file, save your changes and upload the file to the root directory of your site. For example, if your domain is www.yourdomain.com, place the file at www.yourdomain.com/robots.txt.
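
Before uploading, you can sanity-check your rules locally. A minimal sketch using Python's standard urllib.robotparser (the rules, domain, and paths here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block every bot from /private/, allow everything else
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any bot may fetch the home page, but not the private area
print(parser.can_fetch("*", "http://www.yourdomain.com/"))           # True
print(parser.can_fetch("*", "http://www.yourdomain.com/private/x"))  # False
```

This is the same logic well-behaved crawlers apply when they read your file, so it is a quick way to catch a typo before it hides the wrong part of your site.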

Method 2 of 2: Blocking search engines with meta tags


Step 1. Check out the HTML meta robots

The robots meta tag lets webmasters set parameters for bots, or search engine spiders. These tags can prevent bots from indexing and crawling an entire site or parts of it, and can also block a specific search engine spider from indexing content. The tags are placed in the <head> section of the HTML file.

This method is usually used by programmers who do not have access to the site's root directory.
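
For example, here is a minimal page with a robots meta tag in place (the title and the tag's content value are illustrative; the specific values are covered in the steps below):

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Page under construction</title>
    <!-- Robots directives go here, inside <head> -->
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>
    ...
  </body>
</html>
```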


Step 2. Deny bots access to one page

Indexing of the page and/or following of its links can be disabled for all bots. This tag is usually used while a site is under construction, and it is highly recommended that you remove it once the site goes live; otherwise, the page will never be indexed or discoverable through search engines.

  • Prevent bots from indexing the page and following any of its links:

    <meta name="robots" content="noindex, nofollow">

  • Prevent all bots from indexing the page:

    <meta name="robots" content="noindex">

  • Prevent all bots from following links on the page:

    <meta name="robots" content="nofollow">
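If you manage many pages, you can check which robots directives a page carries without opening each one by hand. A small sketch using Python's standard html.parser (the sample page string is illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)  # ['noindex, nofollow']
```

An empty list means the page carries no robots meta tag, so bots fall back to their default behavior (index and follow).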

Step 3. Allow bots to index the page, but not follow its links

If you allow bots to index the page, it will appear in the search index; if you prevent spiders from following its links, the link paths from this page to others will be blocked. Insert the following line of code into the header:

    <meta name="robots" content="index, nofollow">

Step 4. Allow the search engine spiders to follow the links, but not index the page

If you allow bots to follow links, the path of links from this page to others will remain open. If you prevent bots from indexing a page, it will not appear in the index. Insert the following line of code into the header:

    <meta name="robots" content="noindex, follow">

Step 5. Block the outgoing link

To hide a single link on a page, add the rel="nofollow" attribute to its anchor tag. Use this attribute on other pages to block links that lead to the specific page you want hidden (replace the href below with the page's actual address):

    <a href="/blocked-page.html" rel="nofollow">Link to the blocked page</a>


Step 6. Block a specific search spider

Instead of blocking access to the page for all bots, set a ban on page crawling and indexing for only one bot. To do this, replace the word "robots" in the meta tag with the name of a specific bot. Examples: googlebot, googlebot-news, googlebot-image, bingbot and teoma.

    <meta name="googlebot" content="noindex, nofollow">

Step 7. Spur bots to crawl and index the page

If you want to make sure your page is indexed and its links are followed, add the "robots" meta tag to the <head> section of the page. Use the following code:

    <meta name="robots" content="index, follow">
