Robots.txt Guide for Popular CMS and Shopping Carts
Posted by Mark Thompson - January 21, 2010 - SEO - 15 Comments
It can be difficult to know which directories should be blocked from search engines and which should be allowed. I thought it would be a good idea to create a robots.txt reference guide for popular content management systems and shopping carts. Depending on which backend you are using, all you have to do is copy the matching rules into a robots.txt file and upload it to the root of your server. This should help keep search engines from crawling duplicate-content pages on your site.
WordPress Robots.txt File
**If your blog is in a sub-directory, prefix each path below with the blog directory name (ex: /blog/wp-admin instead of /wp-admin).
[plain]
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /tag
Disallow: /author
Disallow: /wget/
Disallow: /httpd/
Disallow: /cgi-bin
Disallow: /images/
Disallow: /search
Disallow: /feed
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
[/plain]
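For a blog installed in a sub-directory, the prefixing mentioned in the note above can be scripted rather than done by hand. What follows is only a minimal Python sketch and not part of the original guide: the function name `prefix_rules` and the `blog` directory are hypothetical, and it assumes every Allow/Disallow value is a path that starts with `/`.

```python
def prefix_rules(robots_txt: str, blog_dir: str) -> str:
    """Prefix every Allow/Disallow path with blog_dir (hypothetical helper)."""
    directory = blog_dir.strip("/")
    if not directory:
        # Nothing to prefix: the blog lives at the site root.
        return robots_txt
    prefix = "/" + directory
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("allow", "disallow")
        # Only rewrite rule lines whose value is an absolute path.
        if sep and is_rule and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix}{path}")
        else:
            out.append(line)
    return "\n".join(out)


rules = "User-agent: *\nDisallow: /wp-admin\nDisallow: /*/feed/$"
print(prefix_rules(rules, "blog"))
```

Lines that are not Allow or Disallow rules, such as `User-agent: *`, pass through unchanged.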
Magento Robots.txt File
[plain]
User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /app/
Disallow: /downloader/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/product_compare/
Disallow: /catalog/
Disallow: /customer/
Disallow: /catalogsearch/advanced/
Disallow: /wishlist/
Disallow: /404/
Disallow: /admin/
Disallow: /api/
Disallow: /install/
Disallow: /catalog/product/view/id/
[/plain]
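Several of the Magento rules above (`/*?`, `/*.js$`, `/*.php$`) rely on the `*` and `$` pattern extensions rather than plain path prefixes. To get a feel for what such a rule catches before uploading it, here is a rough Python sketch. It assumes the common interpretation where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function name `rule_matches` and the sample paths are made up for illustration, and individual crawlers may differ in the details.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumes '*' matches any run of characters and a trailing '$'
    anchors the end of the URL (hypothetical helper).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix semantics.
    return re.match(regex, path) is not None


for pattern, path in [
    ("/*?", "/shoes.html?color=red"),
    ("/*.js$", "/js/prototype.js"),
    ("/*.js$", "/js/prototype.js?v=2"),
    ("/checkout/", "/checkout/cart/"),
]:
    print(pattern, path, rule_matches(pattern, path))
```

Note that in this sketch a `$`-anchored rule such as `/*.js$` no longer matches once a query string follows the file name, which is why the list also carries the separate `/*?` rule.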
Drupal Robots.txt File
[plain]
User-agent: *
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /contact
Disallow: /logout
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/password/
Disallow: /print/
Disallow: /forward/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Block user tracker pages
Allow: /project/track
Disallow: /*/track$
Disallow: /*/track?page=
# Paths (if you are not using clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /user/login/
[/plain]
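The non-clean-URL rules in the Drupal list are the clean paths rewritten in `/?q=` form. If you maintain your own list of clean paths, the second set can be generated from the first. This is just a small Python sketch, and the function name `to_query_rules` is hypothetical:

```python
def to_query_rules(clean_paths):
    """Rewrite clean-URL paths like '/admin/' as '/?q=admin/' rules."""
    rules = []
    for path in clean_paths:
        # Drop the leading slash; Drupal's q parameter holds a relative path.
        rules.append("Disallow: /?q=" + path.lstrip("/"))
    return rules


for rule in to_query_rules(["/admin/", "/node/add/", "/user/login/"]):
    print(rule)
```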
Joomla Robots.txt File
[plain]
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /libraries/
Disallow: /tmp/
Disallow: /xmlrpc/
Disallow: /admin
Disallow: /administrator
Disallow: /admin/
Disallow: /admin.html
Disallow: /admin.php
[/plain]
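The Joomla list includes both `/administrator` and `/administrator/`. One way to see how a parser treats the two forms is to feed a few rules to Python's standard `urllib.robotparser` and ask it about sample paths. The snippet below is only a sketch: the rule subset and the sample paths are chosen for illustration, it sticks to plain paths without wildcards, and it shows how this one parser behaves, not how every search engine does.

```python
from urllib.robotparser import RobotFileParser

# A trimmed, illustrative subset of the Joomla rules above,
# with only the no-slash form of /administrator.
rules = """\
User-agent: *
Disallow: /administrator
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

sample_paths = [
    "/administrator",
    "/administrator/index.php",
    "/tmp/cache.txt",
    "/index.php",
]
for path in sample_paths:
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)
```

This parser matches rules by prefix, so here the no-slash rule also covers paths under `/administrator/`. Listing both forms does no harm, but check your target search engine's documentation before relying on either behavior.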
15 comments
Regarding WordPress, what is the reason for adding the closing pre tag after "/images/"?
Oops, sorry about that, that is an error. Thanks for asking about it or I would have missed it. I will fix that. Just disregard that pre tag.
Cool, thanks for clarifying. Is there any reason not to have a "/" after folders like "/wp-admin" or does it make no difference if it is written "/wp-admin/"?
It is best to block both the non-slash and the slash versions. This will ensure that the search engines do not index the directories, given that they treat the non-slash and slash versions as two separate URLs.
Thank you. Magento has blocked out a lot of their forums from the public. What is [Plain][/Plain]?
WordPress questions:
1) What is the [plain] on the first line for?
2) Do you need to add Allow: /?
3) Do you need to duplicate the list so each path appears both without and with a trailing slash "/"?
Good question.
Sorry, the [plain] is just a formatting issue with the page. Please disregard that.
No, you don't need to add Allow: /
I would add both the no-slash and the slash versions, as search engines treat each of those differently.
Thanks for the questions.
Mark