There is a lot to learn in blogging. You can never be perfect in any field, because there is always more to discover along the way. Even very small files on your website can matter a lot for Google rankings and for SEO as a whole. One such file is “robots.txt”. When I started blogging, I did not know what this file was or why it mattered. So I did a lot of research from various sources to understand exactly what it does and how important it is for SEO. Many new bloggers don’t know what robots.txt is or how to use it, so I decided to write a detailed article on it.
What is a robots.txt file?
Robots.txt is a very small text file placed at the root of your site. As most of you know, search engines use web crawlers (also called spiders) to discover and index pages across the web. By default, these crawlers can crawl any page or URL they can reach, including ones you would rather keep out of search results.
It does not restrict people from accessing your content.
To control which files crawlers may access and which they should skip, you can give them directions in the robots.txt file. Robots.txt is a plain text file, not an HTML file, but well-behaved crawlers follow the rules it states. The file does not protect your site from external threats; it only asks crawler bots not to enter particular areas of your site.
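Because robots.txt is only a request, it is the crawler that decides to honour it. The sketch below uses Python's standard-library `urllib.robotparser` to mimic a polite crawler filtering its to-do list. The robots.txt content, the URLs, the crawler name "MyCrawler" and the helper `urls_to_crawl` are all hypothetical, chosen just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def urls_to_crawl(robots_txt, user_agent, candidate_urls):
    """Return only the URLs a well-behaved crawler would request.

    Nothing here prevents a client from fetching the other URLs;
    the crawler simply chooses to skip them after reading robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.com/blog/post-1.html",
        "https://example.com/private/drafts.html",
    ]
    print(urls_to_crawl(ROBOTS_TXT, "MyCrawler", candidates))
```

A crawler that ignores robots.txt would just fetch both URLs, which is why the file should never be used to hide sensitive content.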
Where do you find the robots.txt file?
The location of this file is very important, because crawlers only look for it in one place. It must be in the root (main) directory of your website, for example https://example.com/robots.txt.
This is where bots, and you too, can find the file for any website. If crawlers don’t find the file in the root directory, they assume the site has no robots.txt file and therefore crawl all of its pages.
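The lookup rule can be sketched in a few lines of Python: a crawler takes any page URL, keeps only the scheme and host, and appends `/robots.txt`. The second helper models a site without the file as an empty rule set, which lets every URL through. Both function names are hypothetical; this is a simplified model, not how any particular search engine is implemented.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Build the robots.txt URL for the site that hosts page_url.

    Only the root of the host is checked, so the page's path,
    query string and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_without_robots_file(user_agent, url):
    """Model a site with no robots.txt: with no rules, everything is allowed."""
    parser = RobotFileParser()
    parser.parse([])  # an empty rule set stands in for the missing file
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/2020/my-post.html?ref=home"))
    print(allowed_without_robots_file("MyCrawler", "https://example.com/any/page.html"))
```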
Basic Structure of a Robots.txt File
The structure of the file is very simple and anyone can understand it easily. It mainly consists of two directives: User-agent and Disallow.
Syntax:
User-agent:
Disallow:
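To make the layout concrete, here is a small Python sketch that writes one record as text: a User-agent line first, then one Disallow line per path. The helper `build_record` is hypothetical and exists only to show the line-by-line shape of the file.

```python
def build_record(user_agent, disallowed_paths):
    """Return one robots.txt record as text.

    The User-agent line names the crawler the record targets; each
    Disallow line that follows lists one path that crawler should skip.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_record("*", ["/test/"]), end="")
```

Each directive sits on its own line, which is why the examples later in this article are written one directive per line.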
Complete Understanding of Exclusion with Examples
First, you should know what each component means and what it does. “User-agent” identifies the search engine crawler a rule applies to, whether it is Google, Yahoo or any other search engine. “Disallow” lists the files or directories that the crawler should not crawl.
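The effect of the User-agent line can be checked with Python's standard-library `urllib.robotparser`. In this sketch the hypothetical file only addresses Googlebot, so a different crawler such as Bingbot is not affected. Note that Python's parser uses its own simple name matching; real search engines document their own matching rules, which may differ in details. The helper `can_crawl` and the URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with one record aimed only at Google's crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def can_crawl(user_agent, url):
    """Check whether the named crawler may fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/private/notes.html"
    print(can_crawl("Googlebot", url))  # the record names Googlebot
    print(can_crawl("Bingbot", url))    # no record applies to Bingbot
```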
Directory or Folder Exclusion:
The most basic exclusion, used by many sites, is:
User-agent: *
Disallow: /test/
Here, * means all search engine crawlers. Disallowing /test/ means that the folder named ‘test’, and everything inside it, should not be crawled.
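A quick way to see which URLs this folder rule covers is to feed it to Python's `urllib.robotparser`. Disallow values are matched as path prefixes, so everything under /test/ is blocked while a similarly named folder like /testing/ is not. The crawler name "AnyBot", the helper `check_paths` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /test/\n"


def check_paths(robots_txt, paths, user_agent="AnyBot"):
    """Map each path on a hypothetical example.com to whether it may be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, "https://example.com" + path)
            for path in paths}


if __name__ == "__main__":
    paths = ["/test/", "/test/page.html", "/testing/page.html", "/about.html"]
    for path, allowed in check_paths(ROBOTS_TXT, paths).items():
        print(path, "allowed" if allowed else "blocked")
```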
User-agent: *
Disallow: /test.html
This tells all search engine crawlers not to crawl the file named ‘test.html’ in the root directory.
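The same check works for a single-file rule. This Python sketch (again using the standard-library `urllib.robotparser`, with hypothetical paths and crawler name) shows that /test.html is blocked, and because matching is by prefix, so is the same file with a query string, while the /test/ folder and other pages remain crawlable.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /test.html\n"


def is_allowed(robots_txt, path, user_agent="AnyBot"):
    """Return True if the crawler may fetch path on a hypothetical example.com."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ["/test.html", "/test.html?page=2", "/test/", "/other.html"]:
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```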