Stay On Search



Robots.txt Guide for Popular CMS and Shopping Carts

Posted by Mark Thompson in SEO | January 21, 2010 | Comments (9)

It can be difficult to know which directories should be blocked from search engines and which should be allowed.  I put together this robots.txt reference guide for popular content management systems and shopping carts.  Depending on which backend you are using, all you have to do is copy the matching text into a robots.txt file and upload it to the root directory of your server.  This should help reduce duplicate content pages on your site.
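Before uploading a file, it can help to check that the rules block what you expect. Here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the crawler name `ExampleBot` are made up for illustration, and the three rules are copied from the WordPress list in this guide. As far as I know, the standard-library parser does plain prefix matching and does not interpret `*` or `$` inside paths, so only the simple rules are exercised here.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the WordPress list in this guide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /trackback
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given rules.

    `ExampleBot` is a hypothetical crawler name used only for this check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/wp-admin", "/wp-admin/options.php", "/2010/01/my-post/"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Because the match is a prefix match, the single rule `Disallow: /wp-admin` also covers `/wp-admin/` and everything beneath it in this parser.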

WordPress Robots.txt File

**If your blog is in a sub-directory, prefix each path below with the blog directory name (ex: /blog/wp-admin).

[plain]
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: /cgi-bin
Disallow: /images/
Disallow: /search
Disallow: /feed
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
[/plain]
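For a blog that lives in a sub-directory, the prefixing described in the note above can be done by script rather than by hand. This is only a rough sketch: the function name `prefix_rules` and the `blog` directory are assumptions for illustration, and it only touches `Allow` and `Disallow` lines whose value starts with `/`.

```python
def prefix_rules(robots_txt: str, blog_dir: str) -> str:
    """Prepend `blog_dir` to the path of every Allow/Disallow line."""
    cleaned = blog_dir.strip("/")
    if not cleaned:
        # Blog is at the site root, so nothing needs to change.
        return robots_txt
    prefix = "/" + cleaned

    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("allow", "disallow")
        if sep and is_rule and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix}{path}")
        else:
            # User-agent lines, comments and blank lines pass through as-is.
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /wp-admin\nDisallow: /*/feed/$"
    print(prefix_rules(rules, "blog"))
```

The robots.txt file itself still belongs at the root of the domain. Only the paths inside it change.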

Magento Robots.txt File

[plain]
User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /app/
Disallow: /downloader/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/product_compare/
Disallow: /catalog/
Disallow: /customer/
Disallow: /catalogsearch/advanced/
Disallow: /wishlist/
Disallow: /404/
Disallow: /admin/
Disallow: /api/
Disallow: /install/
Disallow: /catalog/product/view/id/
Disallow: /customer/
[/plain]
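Several of the Magento rules use the `*` and `$` wildcards (for example `/*?` and `/*.js$`), which the major search engines support as an extension to the original robots.txt format. To see which URLs such a pattern would catch, here is a small sketch that translates a pattern into a regular expression. The helper names `pattern_to_regex` and `matches` and the sample URLs are hypothetical, and the translation assumes that `*` means "any run of characters" and that a trailing `$` means "end of URL".

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with `*` and `$` into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if `path` is covered by `pattern` (matched from the start)."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    checks = [
        ("/*?", "/shoes.html?color=red"),
        ("/*?", "/shoes.html"),
        ("/*.js$", "/js/app.js"),
        ("/*.js$", "/js/app.js?v=2"),
        ("/checkout/", "/checkout/cart/"),
    ]
    for pattern, path in checks:
        print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```

Note that `Disallow: /*?` covers every URL with a query string, so it is worth running your own important URLs through a check like this before relying on it.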

Drupal Robots.txt File

[plain]

User-agent: *
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /contact
Disallow: /logout
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/password/
Disallow: /print/
Disallow: /forward/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Block user tracker pages
Allow: /project/track
Disallow: /*/track$
Disallow: /*/track?page=

# Paths (if you are not using clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /user/login/
[/plain]

Joomla Robots.txt File

[plain]
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /libraries/
Disallow: /tmp/
Disallow: /xmlrpc/
Disallow: /admin
Disallow: /administrator
Disallow: /admin/
Disallow: /admin.html
Disallow: /admin.php
[/plain]

About Mark Thompson

Mark is the creator of StayOnSearch and president of Search Creatively, a full-service Internet marketing company located in Raleigh, North Carolina. He also contributes to many industry-related blogs, including Search Engine Journal, and is active on Facebook and Twitter.



9 Comments »

  3. Regarding WordPress, what is the reason for adding the closing pre tag after “/images/”?

    Comment by Matthew — February 16, 2010 @ 5:36 pm
  4. Oops, sorry about that, that is an error. Thanks for asking about it or I would have missed it. I will fix that. Just disregard that pre tag.

    Comment by Mark Thompson — February 16, 2010 @ 7:10 pm
  5. Cool, thanks for clarifying. Is there any reason not to have a “/” after folders like “/wp-admin” or does it make no difference if it is written “/wp-admin/”?

    Comment by Matthew — February 16, 2010 @ 10:00 pm
  6. It is best to block both the version without the trailing slash and the version with it. This will ensure that the search engines do not index the directories, given that they treat the slash and non-slash versions as two separate URLs.

    Comment by Mark Thompson — February 16, 2010 @ 10:32 pm