Robots.txt Best Practices

Even though SEO specialists put most of their effort into improving the visibility of pages for their corresponding keywords, in some cases it’s required to hide certain pages from search engines. Let’s find out a bit more about this topic.

What is a robots.txt file?

Robots.txt is a plain-text file that tells search engine robots which areas of a website they are not allowed to crawl. It lists the URL paths that the webmaster doesn’t want Google or any other search engine to visit, keeping crawlers away from the selected pages. Note that disallowing a URL in robots.txt prevents crawling, not indexing: a blocked page can still appear in search results if other sites link to it.
When a bot finds a website on the Internet, the first thing it does is check the robots.txt file to learn what it is allowed to explore and what it has to ignore during the crawl.

To give you a robots.txt example, this is its syntax:

# All bots - Old URLs
User-agent: *
Allow: /
Disallow: /admin/*


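As a rough illustration of how a crawler interprets such rules, here is a sketch using Python’s standard-library robots.txt parser. Note that this parser matches rules by plain path prefix, applying the first rule that matches, and does not understand Google-style * wildcards, so the sketch uses a simple /admin/ prefix and lists the Disallow rule first; the example.com URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Simplified rules for the stdlib parser: it matches by path
# prefix and uses the first rule that applies to the URL.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A regular page is allowed...
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
# ...but anything under /admin/ is blocked.
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
```
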
What is robots.txt in SEO?

These directives guide search engine bots when they discover a new page. They are useful because:

–       They help optimize the crawl budget, as the spider will only visit what’s truly relevant and will make better use of its time on the site. An example of a page you wouldn’t want Google to crawl is a “thank you” page.

–       The robots.txt file is a good place to point crawlers to your XML sitemap (via the Sitemap directive), helping them discover the pages you do want crawled. It cannot, however, force a page to be indexed.

–       Robots.txt files control crawler access to certain areas of your site.

–       They can keep entire sections of a website out of the crawl, and you can create a separate robots.txt file for each subdomain. Bear in mind, though, that robots.txt is not a security mechanism: truly sensitive pages, such as payment details, should be protected by authentication rather than merely disallowed.

–       You can also stop crawlers from wasting time on internal search results pages, which rarely belong in the SERPs.

–       Robots.txt can keep crawlers away from files that aren’t meant to be found, such as PDFs or certain images. (To reliably keep a file out of the index itself, use a noindex X-Robots-Tag HTTP header instead.)

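The last two points can be sketched as a hypothetical robots.txt fragment. The /search/ path is a made-up example for your own internal search URLs, and note that Google honors the * and $ wildcards used in the PDF rule, but not every crawler does:

```
User-agent: *
# Keep internal search results pages out of the crawl
Disallow: /search/
# Block all PDF files ($ anchors the match to the end of the URL)
Disallow: /*.pdf$
```
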

Where do you find robots.txt?

Robots.txt files are public. You can simply type in a root domain and add /robots.txt to the end of the URL and you’ll see the file…if there is one!

Warning: avoid listing private information in this file.

You can find and edit the file in the root directory of your hosting, using your host’s file manager or the website’s FTP access.

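Because the location is always the same, the lookup is mechanical: take the scheme and host of any page URL and append /robots.txt. A minimal Python sketch (the helper name and the example.com URL are illustrative):

```python
from urllib.parse import urlparse

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_txt_url("https://www.example.com/blog/post?id=1"))
# https://www.example.com/robots.txt
```
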

How to edit robots.txt

You can do it yourself:

–       Create or edit the file with a plain text editor

–       Name the file “robots.txt” exactly, without any variation such as capital letters; the name is case-sensitive.

It should look like this if you want to have the site crawled:

User-agent: *
Disallow:

–       Notice that we left “Disallow” empty, which indicates that nothing on the site is off-limits to crawlers.

If you want to block a page, add this instead (using the “thank you” page example):

User-agent: *
Disallow: /thank-you/

–       Use a separate robots.txt file for each subdomain.

–       Place the file on the website’s top-level directory.

–       You can test your robots.txt rules with Google Search Console (formerly Google Webmaster Tools) before uploading the file to your root directory.

–       Take note that FandangoSEO is the ultimate robots.txt checker. Use it to monitor them!

See? It isn’t so difficult to configure your robots.txt file, and you can edit it anytime. Just keep in mind that the goal is to make the most of the bots’ visits: by keeping them away from irrelevant pages, you ensure that the time they spend on the website is far more profitable.

Finally, remember that the SEO best practice for robots.txt is to ensure that all the relevant content is indexable and ready to be crawled! With FandangoSEO’s crawl, you can see the percentage of indexable and non-indexable pages among a site’s total pages, as well as the pages blocked by the robots.txt file.


Have you added a robots.txt file yet?

Check Robots now