Your site's index gives you an indication of health problems on your site: a sudden spike or large drop in Google-indexed pages is a warning that something is wrong and needs attention. We focus mostly on Google, as this search engine sends us the greater share of organic search traffic, i.e. it has the larger market share.
Remember that the more pages you have indexed in Google, the more keywords you can rank for (especially long-tail keywords), and the more keywords you rank for, the more traffic your site will receive.
So how do we find issues within our site's index? As an SEO, you need to run weekly health checks that let you monitor your site's health by tracking things like indexed pages, backlinks and duplicates. If you notice any significant changes in these areas, start an investigation into what has been going on.
Signs that your site's health is in danger would be a combination of a large drop in indexed pages in Google and a large increase in indexed pages in Yahoo. This normally indicates a duplicate or junk content problem.
If your site has been verified with Google Webmaster Tools, you will be able to see things such as duplicate page titles and descriptions, 404 errors, etc.
I normally use a combination of Google’s Webmaster Tools console and Yahoo SiteExplorer to find any issues that may hinder my site being indexed in Google.
How to Go About It
Here are some tips I agree with, which have been taken from this blog post.
1. META Tags
You can add a simple META tag to the top of every live page you have (between the <head> and </head> tags) and configure it to your liking. Here’s how the META tag should look:
<meta name="robots" content="noindex, follow">
NOINDEX tells search crawlers not to index the page; writing index instead tells them to index it.
FOLLOW tells crawlers that the links on the page should be followed and given credit; nofollow tells them not to follow those links.
The "nofollow" value is also commonly used on individual links, by adding rel="nofollow" to the <a> HTML element, when web publishers don't want to be penalized for linking to suspicious or low-quality sites.
By default, every page is tagged as “index, follow”, but changing this attribute can help you configure your pages in a few different ways.
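Putting these pieces together, the variants discussed above look like this in HTML (a sketch; example.com and the link text are placeholders):

```html
<!-- the default behaviour; usually safe to omit entirely -->
<meta name="robots" content="index, follow">

<!-- keep this page out of the index, but still follow and credit its links -->
<meta name="robots" content="noindex, follow">

<!-- per-link version: do not pass credit through this one link -->
<a href="http://www.example.com/" rel="nofollow">a link I do not vouch for</a>
```

The meta tags go between the <head> and </head> tags; the rel="nofollow" attribute goes on the individual link in the page body.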
WordPress Meta Robots Plug-in
For WordPress users, you can install the Meta Robots plug-in. This will allow you to configure each and every post and page from the editor, as well as configure the global settings of your site.
2. Google Webmaster Tools – Remove URL Tool
You can use Google's Remove URL tool in Google Webmaster Tools for an emergency URL removal. The tool can be found under "Crawler access", as shown in the image below:
You'll notice at the top of the page that there are three things, one of which you must do before being able to submit a removal request. For a directory/page/file to be removed from Google's index, you must do one of the following (see Google's URL removal requirements):
- Make sure the content is no longer live on the web. Requests for the page or image you want to remove must return an HTTP 404 (not found) or 410 status code.
- Block the content using a meta noindex tag.
- Block the content using a robots.txt file.
In other words, you must either delete the directory/page/file from your server, or do one of the two things I've already discussed.
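For the robots.txt option, a minimal file at your site root could look like the following sketch (the paths are placeholders for whatever you want removed):

```
User-agent: *
Disallow: /old-directory/
Disallow: /duplicate-page.html
```

Each Disallow line blocks crawlers from one path; a trailing slash blocks the whole directory beneath it.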
Once this has been done, go ahead and click the New Removal Request button. You’ll find that Google gives you options to choose from.
Choose "Remove page from search results and cache", check the checkbox and submit your request. Your request will be added to a pending list of requests and, approximately 24 hours later, if all goes well and all requirements have been satisfied, you'll find that your request has been fulfilled and your directory/page/file has been removed from Google's index.
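Before submitting a removal request, it can be worth double-checking that the URL really meets the first requirement above. Here is a minimal sketch in Python; the helper names are my own for illustration and are not part of any Google tool:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError


def removal_ready(status_code):
    """Google will only honour a removal request if the URL is truly gone,
    i.e. it returns 404 (not found) or 410 (gone)."""
    return status_code in (404, 410)


def check_url(url):
    """Fetch the URL with a HEAD request and return its HTTP status code."""
    try:
        with urlopen(Request(url, method="HEAD")) as response:
            return response.status
    except HTTPError as err:
        return err.code
```

For example, `removal_ready(check_url("http://www.example.com/old-page/"))` should be True before you submit the request for that page.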
Some Stuff to Note
- Ensure that you do not have any blocked URLs in your XML sitemap.
- Note that the robots.txt file does not work alone; you will still see your blocked URLs in the index, but without a page title or meta description.
- DO NOT add Google Analytics tracking scripts to pages you do not want Google to see.
In conclusion, by cleaning up your Google index you should see an increase in the number of indexed pages in Google, with an associated increase in SEO keywords and traffic in your analytics.
I'm quite impressed with the information you have given. I've been checking helpful sites like yours and http://www.TrafficTips101.com to widen my knowledge of improving website traffic. Thanks a lot!
One thing: traffic building takes time, whether you use SEO (search engine optimization), link building, blog carnivals, social networking, or other methods. Your post about improving your website traffic by cleaning up your Google index is a great one and quite informative. You write very well and keep it simple, easy, yet direct. Keep up the great work.
Thanks Phil and Mark for your suggested solution. I did the 301 redirect method to get rid of the indexed pages.
If those pages had PR, I would take a different approach: I would recreate the same content and the same post dates with the help of MySQL. Still, a 301 is best.
Some non-existent pages still have PageRank and keep receiving traffic that hits 404 errors. So what I need to do now is just rebuild my XML sitemap for Google and let the non-existent URLs expire from Google's index automatically? Correct me if I'm wrong.
I wouldn't recommend blocking your entire site within Webmaster Tools. If you can 301 redirect the pages that are sending you traffic to the new corresponding pages, that will help keep the value of those pages. If the pages no longer exist, Google will start to drop them from its index automatically. I would also recommend recreating your XML sitemap and submitting it to Webmaster Tools with the proper URLs.
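If your site runs on an Apache server, a 301 redirect like the one suggested here can be sketched in an .htaccess file as follows (old-page and new-page are placeholder paths, and www.example.com stands in for your own domain):

```apache
# Permanently (301) redirect a removed URL to its replacement,
# so visitors and link value are passed to the new page
Redirect 301 /old-page/ http://www.example.com/new-page/
```

Other servers have equivalent mechanisms; the key point is that the redirect returns a 301 (permanent) status rather than a 302 (temporary) one.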
If I select to remove my entire site from Google Webmaster Tools, will Google re-index or reset the index status of my blog? I have too many URLs that need to be removed, because I lost my WordPress database earlier and Google is still sending me a lot of traffic to those non-existent URLs. I'm receiving too many 404 error reports each day. What would be your suggestion?
Great. I think this is the best solution for 'not found' links. Thanks a lot. However, how much time does it take to process a request, do you have any idea? Regards, http://ranacseruet.blogspot.com/2009/12/how-to-.....
Hi, I even thought about another thing in your article and came back to re-read it! Told you it was useful!
Thanks for your comment. I'm glad that you enjoyed the post and that it was of some use to you :) Saudiqua
A good round up and some really good notes on areas so often overlooked. Much appreciated and now implemented. Thanks Saudiqua.