Robots.txt Guide for Popular CMS and Shopping Carts

Posted by Mark Thompson - January 21, 2010 - SEO - 15 Comments

It can be difficult to know which directories should be blocked from search engines and which should be allowed. I thought it would be a good idea to create a robots.txt reference guide for popular content management systems and shopping carts. Depending on which backend you are using, all you have to do is copy the matching text into a robots.txt file and upload it to the root of your server. This should help reduce duplicate-content issues on your site.
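Before uploading a file, it can help to check which URLs it would actually block. Here is a minimal sketch using Python's standard `urllib.robotparser` module. The `RULES` text, the `is_blocked` helper, and the sample paths are made up for illustration. As far as I know, this module only does simple prefix matching and does not understand the `*` and `$` wildcards, so use it for plain directory rules only.

```python
from urllib import robotparser

# Hypothetical rules for illustration; paste the text of your own file here.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /search
"""


def is_blocked(rules_text, path, agent="*"):
    """Return True if `path` is disallowed for `agent` under `rules_text`."""
    parser = robotparser.RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions, not URLs from any real site.
    for path in ["/wp-admin/options.php", "/search", "/about/"]:
        print(path, "blocked" if is_blocked(RULES, path) else "allowed")
```

Running the script prints one line per path, which makes it easy to spot a rule that blocks more (or less) than you intended.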

WordPress Robots.txt File

Note: If your blog is in a sub-directory, prefix each path below with the blog directory name (e.g., /blog/directory).

[plain]
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: /cgi-bin
Disallow: /images/
Disallow: /search
Disallow: /feed
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
[/plain]
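If your blog lives in a sub-directory, the prefixing described in the note above can be done by hand or with a few lines of Python. This is only a sketch: `prefix_rules` is a hypothetical helper, and "blog" is an assumed directory name.

```python
def prefix_rules(rules_text, blog_dir):
    """Prefix every Allow/Disallow path in `rules_text` with `blog_dir`."""
    directory = blog_dir.strip("/")
    if not directory:
        return rules_text  # nothing to prefix
    prefix = "/" + directory
    out = []
    for line in rules_text.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("allow", "disallow")
        if sep and is_rule and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix}{path}")
        else:
            out.append(line)  # User-agent lines, comments, blanks stay as-is
    return "\n".join(out)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /wp-admin\nDisallow: /*/feed/$"
    print(prefix_rules(sample, "blog"))
```

Keep in mind that crawlers only look for robots.txt at the root of the domain, so the prefixed file still goes in the site root, not inside the blog directory.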

Magento Robots.txt File

[plain]
User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /app/
Disallow: /downloader/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/product_compare/
Disallow: /catalog/
Disallow: /customer/
Disallow: /catalogsearch/advanced/
Disallow: /wishlist/
Disallow: /404/
Disallow: /admin/
Disallow: /api/
Disallow: /install/
Disallow: /catalog/product/view/id/
Disallow: /customer/
[/plain]
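Rules such as `/*?` and `/*.php$` rely on the `*` and `$` wildcards, which major search engines such as Google and Bing support as an extension to the original robots.txt format. The sketch below shows how such patterns are commonly interpreted: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names `rule_to_regex` and `rule_matches` are my own, and this is a simplification, not any search engine's actual code.

```python
import re


def rule_to_regex(rule_path):
    """Compile a robots.txt path pattern into a regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is literal and is matched from the start of the path.
    """
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def rule_matches(rule_path, url_path):
    """Return True if `url_path` is covered by the pattern `rule_path`."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    # Sample URLs are assumptions chosen to exercise each pattern.
    for rule, url in [
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?id=1"),
        ("/*?", "/shoes.html?color=red"),
        ("/checkout/", "/checkout/cart/"),
    ]:
        print(rule, url, rule_matches(rule, url))
```

Note that `/*?` covers every URL with a query string, so it is worth testing a few of your own category and product URLs before relying on it.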

Drupal Robots.txt File

[plain]

User-agent: *
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /contact
Disallow: /logout
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/password/
Disallow: /print/
Disallow: /forward/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Block user tracker pages
Allow: /project/track
Disallow: /*/track$
Disallow: /*/track?page=
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /user/login/
[/plain]
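The second group of Drupal paths mirrors the first one, just written in the `?q=` form that Drupal uses when clean URLs are turned off. If you add your own paths, a small helper can produce the matching `?q=` rule. This is a sketch; `to_query_rule` and the sample list are assumptions for illustration.

```python
def to_query_rule(clean_path):
    """Turn a clean-URL path such as '/admin/' into its '?q=' form."""
    return "/?q=" + clean_path.lstrip("/")


if __name__ == "__main__":
    # A few clean-URL paths taken from the listing above.
    clean_paths = ["/admin/", "/comment/reply/", "/user/login/"]
    for path in clean_paths:
        print(f"Disallow: {to_query_rule(path)}")
```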

Joomla Robots.txt File

[plain]
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /libraries/
Disallow: /tmp/
Disallow: /xmlrpc/
Disallow: /admin
Disallow: /administrator
Disallow: /admin/
Disallow: /admin.html
Disallow: /admin.php
[/plain]
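Several of the lists in this guide include the same directory with and without a trailing slash. Under plain prefix matching, which is how the original robots.txt format works, a shorter rule already covers every longer path that starts with it. The sketch below finds such overlaps. `shadowed_rules` is a hypothetical helper, and it assumes plain Disallow paths with no wildcards.

```python
def shadowed_rules(paths):
    """Map each path to a shorter listed path that already prefix-matches it."""
    shadowed = {}
    for path in paths:
        for other in paths:
            if other != path and path.startswith(other):
                shadowed[path] = other
                break
    return shadowed


if __name__ == "__main__":
    # The admin-related paths from the Joomla listing above.
    joomla_admin_paths = [
        "/administrator/",
        "/admin",
        "/administrator",
        "/admin/",
        "/admin.html",
        "/admin.php",
    ]
    for path, cover in shadowed_rules(joomla_admin_paths).items():
        print(f"{path} is already covered by {cover}")
```

Keeping the extra lines does no harm, and a crawler that does not follow strict prefix matching may still benefit from them, so treat the output as a way to understand the file rather than a list of lines to delete.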

About the Author

Mark Thompson

Mark is the creator of StayOnSearch and president of Search Creatively, a full-service Internet marketing company located in Raleigh, North Carolina. He also contributes to many industry-related blogs, including Search Engine Journal, and is active on Facebook and Twitter.
10 comments
Jay K

WordPress questions: 1) What is the [plain] on the first line for? 2) Do you need to add Allow: /? 3) Do you need to duplicate the list for both the no-slash and the trailing-slash "/" versions?

Brock

Thank you. Magento has blocked out a lot of their forums from the public. What is [plain][/plain]?

Mark Thompson

It is best to block both the non-slash and the trailing-slash versions. This will ensure that search engines do not index the directories, given that they treat the two versions as separate URLs.

Matthew

Cool, thanks for clarifying. Is there any reason not to have a "/" after folders like "/wp-admin" or does it make no difference if it is written "/wp-admin/"?

Mark Thompson

Oops, sorry about that, that is an error. Thanks for asking about it or I would have missed it. I will fix that. Just disregard that pre tag.

Matthew

Regarding WordPress, what is the reason for adding the closing pre tag after "/images/"?

Mark Thompson

Good question. Sorry, the [plain] is just a formatting issue with the page; please disregard it. No, you don't need to add Allow: /. I would add both the no-slash and slash versions, as search engines treat each of those differently. Thanks for the questions :-) Mark
