Robots.txt Guide for Popular CMS and Shopping Carts

It can be difficult to know which directories should be blocked from search engines and which should be allowed. I thought it would be a good idea to create a robots.txt reference guide for popular content management systems and shopping carts. Whichever backend you are using, copy the matching rules into a robots.txt file and upload it to the root of your server. This should help manage duplicate content pages on your site.

WordPress Robots.txt File

If your blog is in a sub-directory, prefix each path below with the blog directory name (for example, /blog/wp-admin for a blog installed in /blog).

 User-agent: *
 Disallow: /wp-admin
 Disallow: /wp-includes
 Disallow: /wp-content/plugins
 Disallow: /wp-content/cache
 Disallow: /wp-content/themes
 Disallow: /trackback
 Disallow: /tag
 Disallow: /author
 Disallow: /wget/
 Disallow: /httpd/
 Disallow: /cgi-bin
 Disallow: /images/
 Disallow: /search
 Disallow: /feed
 Disallow: /feed/
 Disallow: /trackback/
 Disallow: /rss
 Disallow: /comments/feed
 Disallow: /feed/$
 Disallow: /*/feed/$
 Disallow: /*/feed/rss/$
 Disallow: /*/trackback/$
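
For a blog in a sub-directory, prefixing every path by hand is easy to get wrong. Here is a minimal Python sketch that does it for you. The function name `prefix_rules` and the `/blog` directory are only illustrative, and it assumes every rule path starts with "/".

```python
def prefix_rules(robots_txt: str, blog_dir: str) -> str:
    """Prefix every Allow/Disallow path with blog_dir (e.g. "blog" or "/blog/")."""
    cleaned = blog_dir.strip("/")
    prefix = "/" + cleaned if cleaned else ""
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        name = field.strip()
        path = value.strip()
        # Only rewrite path rules; User-agent lines and comments pass through.
        if sep and name.lower() in ("allow", "disallow") and path.startswith("/"):
            out.append(f"{name}: {prefix}{path}")
        else:
            out.append(line.strip())
    return "\n".join(out)


rules = "User-agent: *\nDisallow: /wp-admin\nDisallow: /*/feed/$"
print(prefix_rules(rules, "/blog/"))
```

The robots.txt file itself still has to sit at the root of the domain, not inside the blog directory.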

Magento Robots.txt File

 User-agent: *
 Disallow: /*?
 Disallow: /*.js$
 Disallow: /*.css$
 Disallow: /checkout/
 Disallow: /catalogsearch/
 Disallow: /app/
 Disallow: /downloader/
 Disallow: /images/
 Disallow: /js/
 Disallow: /lib/
 Disallow: /media/
 Disallow: /*.php$
 Disallow: /pkginfo/
 Disallow: /report/
 Disallow: /skin/
 Disallow: /var/
 Disallow: /catalog/product_compare/
 Disallow: /catalog/
 Disallow: /customer/
 Disallow: /catalogsearch/advanced/
 Disallow: /wishlist/
 Disallow: /404/
 Disallow: /admin/
 Disallow: /api/
 Disallow: /install/
 Disallow: /catalog/product/view/id/
 Disallow: /customer/
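
Several of the Magento rules use the "*" and "$" pattern characters: "*" stands for any run of characters and a trailing "$" anchors the rule to the end of the URL. The major search engines support these, but not every crawler does. Here is a rough Python sketch of how such a rule can be matched against a URL path. The helper names `rule_to_regex` and `is_blocked` are my own, and it only looks at Disallow patterns, ignoring Allow rules and per-agent groups.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """True if any Disallow pattern matches the path (query string included)."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


rules = ["/*?", "/*.js$", "/checkout/"]
print(is_blocked("/shoes.html?color=red", rules))  # True, matched by /*?
print(is_blocked("/js/app.js", rules))             # True, matched by /*.js$
print(is_blocked("/shoes.html", rules))            # False
```

Note how "/*?" blocks every URL with a query string, which is what keeps the filtered and sorted versions of a category page out of the crawl.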

Drupal Robots.txt File

 User-agent: *
 # Directories
 Disallow: /database/
 Disallow: /includes/
 Disallow: /misc/
 Disallow: /modules/
 Disallow: /sites/
 Disallow: /themes/
 Disallow: /scripts/
 Disallow: /updates/
 Disallow: /profiles/
 # Paths (clean URLs)
 Disallow: /admin/
 Disallow: /aggregator/
 Disallow: /comment/reply/
 Disallow: /contact/
 Disallow: /logout/
 Disallow: /node/add/
 Disallow: /search/
 Disallow: /user/register/
 Disallow: /contact
 Disallow: /logout
 Disallow: /user/register
 Disallow: /user/password
 Disallow: /user/login
 Disallow: /user/password/
 Disallow: /print/
 Disallow: /forward/
 # Files
 Disallow: /xmlrpc.php
 Disallow: /cron.php
 Disallow: /update.php
 Disallow: /install.php
 Disallow: /INSTALL.txt
 Disallow: /INSTALL.mysql.txt
 Disallow: /INSTALL.pgsql.txt
 Disallow: /CHANGELOG.txt
 Disallow: /MAINTAINERS.txt
 Disallow: /LICENSE.txt
 Disallow: /UPGRADE.txt
 # Block user tracker pages
 Allow: /project/track
 Disallow: /*/track$
 Disallow: /*/track?page=

If you are not using clean URLs:

 Disallow: /?q=admin/
 Disallow: /?q=aggregator/
 Disallow: /?q=comment/reply/
 Disallow: /?q=contact/
 Disallow: /?q=logout/
 Disallow: /?q=node/add/
 Disallow: /?q=search/
 Disallow: /?q=user/password/
 Disallow: /?q=user/register/
 Disallow: /?q=user/login/
 Disallow: /user/login/
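
The "?q=" rules above are the clean-URL paths rewritten into Drupal's query-string form, so they can be generated instead of typed. A small Python sketch, where the function name `to_query_form` is just for illustration:

```python
def to_query_form(clean_path: str) -> str:
    """Map a clean-URL path such as "/admin/" to its "?q=" equivalent."""
    return "/?q=" + clean_path.lstrip("/")


clean_paths = ["/admin/", "/comment/reply/", "/user/register/"]
for path in clean_paths:
    print(f"Disallow: {to_query_form(path)}")
# Disallow: /?q=admin/
# Disallow: /?q=comment/reply/
# Disallow: /?q=user/register/
```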

Joomla Robots.txt File

 User-agent: *
 Disallow: /administrator/
 Disallow: /cache/
 Disallow: /components/
 Disallow: /editor/
 Disallow: /help/
 Disallow: /includes/
 Disallow: /language/
 Disallow: /mambots/
 Disallow: /media/
 Disallow: /modules/
 Disallow: /templates/
 Disallow: /installation/
 Disallow: /libraries/
 Disallow: /tmp/
 Disallow: /xmlrpc/
 Disallow: /admin
 Disallow: /administrator
 Disallow: /admin/
 Disallow: /admin.html
 Disallow: /admin.php

  • Cool, thanks for clarifying. Is there any reason not to have a "/" after folders like "/wp-admin" or does it make no difference if it is written "/wp-admin/"?
  • It is best to block both the non-slash and the slash versions. Search engines treat them as two separate URLs, and a rule ending in "/" does not match the bare directory name, so listing both ensures the directory is not crawled.
  • Regarding WordPress, what is the reason for adding the closing pre tag after "/images/"?
  • Oops, sorry about that, that is an error. Thanks for asking about it or I would have missed it. I will fix that. Just disregard that pre tag.
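
If you want to check the trailing-slash question from the comments for yourself, Python's standard library ships a robots.txt parser. This is only a quick sketch: the helper name `blocked` is mine, and as far as I know this parser does plain prefix matching and does not interpret the "*" and "$" patterns, so it is only suitable for testing simple paths.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """True if the given robots.txt text disallows the path for all agents."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("*", path)


no_slash = "User-agent: *\nDisallow: /wp-admin"
with_slash = "User-agent: *\nDisallow: /wp-admin/"

print(blocked(no_slash, "/wp-admin"))     # True
print(blocked(no_slash, "/wp-admin/"))    # True, the rule is a prefix match
print(blocked(with_slash, "/wp-admin"))   # False, the bare name is not covered
print(blocked(with_slash, "/wp-admin/"))  # True
```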