Thursday, November 12, 2009

Robots.txt and SEO - How to Use Robots.txt

Tonight I ran into the same friend who had previously asked me about hiding text by matching the text and background colors, and this time he had another very pertinent question: how should robots.txt be used for SEO?

Honestly, I have always thought of robots.txt as a way to tell search engines which pages on a website not to index, but I hadn't thought about it more broadly than that. So, I figured I'd do some quick web research and let you all know what I learned about using robots.txt for SEO.

First of all, the robots.txt file is used to give cooperating web robots instructions on how to crawl a site. If the file is not present, robots assume no specific instructions are being given. Also, each subdomain needs its own robots.txt file at its root (e.g., subdomain.domain.com/robots.txt).
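For illustration, suppose a site has a main domain and a blog subdomain (example.com here is just a placeholder); each would need its own file at its root:

    http://www.example.com/robots.txt
    http://blog.example.com/robots.txt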

The main instruction is the "Disallow: /directory/page" directive, which tells a robot not to crawl any URL beginning with that path. You can apply an instruction to all robots (with a *) or to specific robots, identified by their user-agent names (check out this example from CNN).
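Here's a minimal sketch of what those directives look like in practice (the directory names are made up for illustration):

    # All robots: don't crawl anything under /private/
    User-agent: *
    Disallow: /private/

    # Googlebot matches its own group, so it follows only this rule
    # instead of the * group above
    User-agent: Googlebot
    Disallow: /drafts/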

Another common instruction points the robot to your sitemap file. This is done with the "Sitemap: <full URL of sitemap>" directive. It helps ensure robots find all the pages on your site (more on sitemaps in a later post), but like everything else in robots.txt, it's optional.
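The sitemap line looks like this (again, example.com is a placeholder; the value must be the sitemap's full URL, and the line applies regardless of any user-agent groups):

    Sitemap: http://www.example.com/sitemap.xml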
