Sunday, 20 November 2016

What is a robots.txt file and How to create one?

Many of my friends and fans have contacted me on my Facebook page, My Heart, with different queries. The question I found asked most often was about the robots.txt file. So, friends, I decided to explain the concept here, because answering each of you separately wouldn’t be convenient for you or for me. I will try to make it as simple as possible. So, welcome here! 😊😊

You may be familiar with the Google webmaster guidelines (if you have ever read them 😛), which recommend using a robots.txt file on your web server. Now, here comes the basic question: ‘What is a robots.txt file?’

Robots.txt is a simple text file used to tell web crawlers and bots whether or not they may access a web page, and it is a very familiar term for SEO experts whenever the talk turns to on-page search engine optimization. You can create a robots.txt file with any text editor, such as Notepad, and save it with the name ‘robots’ and the ‘txt’ extension. The structure of the file looks like this:


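A basic robots.txt is just a few plain-text directives, one per line. Here is an illustrative example (the paths are made up for the sake of the sketch):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

The `User-agent` line names which crawler the rules apply to (`*` means all of them), and each `Disallow` line names a path the crawler should stay out of.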

As the owner of a website, you use /robots.txt to give instructions about your site to web robots or crawlers, telling them whether or not to access a specific page. When a robot wants to visit a website, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow:


The first line states that the rules apply to all bots or crawlers. The next lists the files and directories to be excluded from indexing on that website. The location of robots.txt is very important: it must sit in the site’s root directory, because search engine crawlers do not search the whole site for it. If a bot does not find the file there, it will assume the file does not exist and will index everything it finds along the way. The file controls how search engine spiders see and interact with your web pages, so it is considered a fundamental part of how search engines work. Improper usage can hurt the presence of your website on the internet.
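If you want to see these rules in action, Python’s standard library ships a robots.txt parser. Here is a small sketch (the rules and URLs are illustrative, not from any real site):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: all bots are kept out of /private/ only.
rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The homepage is allowed; anything under /private/ is not.
print(parser.can_fetch("*", "http://www.example.com/"))           # True
print(parser.can_fetch("*", "http://www.example.com/private/x"))  # False
```

This is exactly the check a well-behaved crawler performs before fetching a page.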


Here are some common robots.txt setups:

Allow full access:

User-agent: *
Disallow:

Block all access:

User-agent: *
Disallow: /

Block one folder:

User-agent: *
Disallow: /folder/

Block one file:

User-agent: *
Disallow: /file.html
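These rules can also be combined, and you can address a specific crawler by its user-agent name. A sketch (the bot name and paths here are only examples):

```
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /tmp/
Disallow: /old-file.html
```

Each `User-agent` line starts a new group, and the `Disallow` lines below it apply only to that group.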

You can add any directories you want to keep search engines from indexing, but first make sure you really need to. Look for the content you don’t want search engines to display. For example, you may be developing a website on a live server and not want crawlers to index it in its current state. In general, the robots.txt file is a good central place to control robots and keep them away from the parts of your website they don’t need.
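For that development-site situation, a common approach is to temporarily block everything until the site is ready. A sketch (remember to remove it at launch):

```
User-agent: *
Disallow: /
```

The single `Disallow: /` asks every well-behaved crawler to stay out of the whole site. Keep in mind this is a request, not access control; anything truly private should be protected on the server side.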


As mentioned above, you can manually write a robots.txt file according to your needs, but that can be a real pain, and syntax errors are a nuisance, especially when you are not a master of the skill. But don’t worry; there are many online tools that will generate the file for you. Go and Google it. After that, it is all up to you to type in the list of directories you don’t want crawled. Enjoy!! 🙌
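If you’d rather not depend on an online tool, the generation itself is trivial to script. A minimal sketch in Python (the blocked paths are illustrative, and the output file name is the standard one):

```python
# Directories we don't want crawled; replace with your own paths.
blocked = ["/admin/", "/tmp/", "/drafts/"]

# One rule group for all bots, one Disallow line per directory.
lines = ["User-agent: *"] + [f"Disallow: {path}" for path in blocked]
robots_txt = "\n".join(lines) + "\n"

print(robots_txt)
# Save it to the root of your site:
# with open("robots.txt", "w") as f:
#     f.write(robots_txt)
```

Running this prints a ready-to-use file you can drop into your site’s root directory.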

If this helped you and you found it interesting, or if you have any suggestions, let us know in the comments below. You can also contact us with any of your queries. Your words inspire us to do more of this.