Your site’s index is an indicator of its health: a sudden spike or large drop in the number of pages Google has indexed is a warning that something is wrong and needs attention. We focus mostly on Google because it has the larger market share and so sends us the greater share of organic search traffic.

Remember that the more pages you have indexed in Google, the more long-tail keywords you can rank for, and the more keywords you rank for, the more traffic your site will receive.

Finding Issues

So how do we find issues within our site’s index? As an SEO, you should run weekly health checks that track things like indexed pages, backlinks and duplicates. If you notice any significant change in these areas, investigate what has been going on.

A sign that your site’s health is in danger is a large drop in indexed pages in Google combined with a large increase in indexed pages in Yahoo. This normally indicates a duplicate or junk content problem.
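The weekly check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the 20% threshold are assumptions of mine, not figures from any tool, and the counts would come from whatever you record each week.

```python
def relative_change(previous, current):
    """Return the fractional change from previous to current (0.25 means +25%)."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) / previous


def needs_investigation(previous, current, threshold=0.2):
    """Flag a week-over-week swing of at least `threshold` in either direction.

    The 0.2 default is an arbitrary assumption; tune it to your own site.
    """
    return abs(relative_change(previous, current)) >= threshold


def duplicate_content_signal(google_prev, google_now, yahoo_prev, yahoo_now,
                             threshold=0.2):
    """True when Google's count drops sharply while Yahoo's rises sharply."""
    google_drop = relative_change(google_prev, google_now) <= -threshold
    yahoo_rise = relative_change(yahoo_prev, yahoo_now) >= threshold
    return google_drop and yahoo_rise


print(needs_investigation(1000, 1050))                  # False
print(duplicate_content_signal(1000, 600, 1000, 1500))  # True
```

A small swing is ignored, while a sharp Google drop paired with a sharp Yahoo rise is flagged as the duplicate or junk content pattern.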

If your site has been verified with Google Webmaster Tools, you will be able to see things such as duplicate page titles and descriptions, 404 errors, etc.

I normally use a combination of Google’s Webmaster Tools console and Yahoo SiteExplorer to find any issues that may hinder my site being indexed in Google.

How to Go About It

Here are some tips I agree with, taken from this blog post.

1. META Tags

You can add a simple META tag to the top of any live page you want to control (between the <head> and </head> tags) and configure it to your liking. Here’s how the META tag should look:

<meta name="robots" content="noindex, follow">

NOINDEX signifies that site crawlers should not index the page. Alternatively, writing INDEX tells crawlers to index the page.

FOLLOW signifies that crawlers should follow the links on this page and give credit to them (alternatively, NOFOLLOW tells crawlers not to follow them).

The “nofollow” value is also commonly used on individual links, by adding rel="nofollow" to the <a> HTML element, when web publishers don’t want to be penalized for linking to suspicious or low-quality sites.

By default, every page is treated as “index, follow”, but changing these values lets you configure your pages in a few different ways.
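To confirm that a page carries the directives you intended, you can parse its HTML and read the robots META tag back. The sketch below uses only Python’s standard library; the class and function names are illustrative, and the fallback to “index, follow” encodes the default behaviour described above as an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the robots directives on a page, or the default if none are set."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumption: with no robots META tag, crawlers behave as "index, follow".
    return parser.directives or {"index", "follow"}


page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

Directives are lowercased before comparison, so NOINDEX and noindex are treated the same.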

WordPress Meta Robots Plug-in

WordPress users can install the Meta Robots plug-in. This lets you configure each post and page from the editor, as well as the global settings of your site.

2. Google Webmaster Tools – Remove URL Tool

You can use the Remove URL tool in Google Webmaster Tools for an emergency URL removal. The tool can be found under “Crawler access”, as shown in the image below:

At the top of the page, you’ll notice that you must do one of three things before you can submit a removal request. For a directory/page/file to be removed from Google’s index, you must do one of the following (see Google’s URL removal requirements):

  • Make sure the content is no longer live on the web. Requests for the page or image you want to remove must return an HTTP 404 (not found) or 410 status code.
  • Block the content using a noindex META tag.
  • Block the content using a robots.txt file.

In other words, you must either delete the directory/page/file from your server, or block it with a noindex META tag or a robots.txt file.
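Before submitting a request, it can help to check these requirements yourself. The following is a minimal sketch using Python’s standard urllib.robotparser. The helper names, the example.com URLs and the sample robots.txt are hypothetical, and the status code and noindex flag are assumed to have been gathered separately.

```python
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def meets_removal_requirement(status_code, has_noindex, blocked_by_robots):
    """True if at least one of the three requirements listed above is satisfied."""
    return status_code in (404, 410) or has_noindex or blocked_by_robots


# Hypothetical robots.txt content for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

url = "https://www.example.com/private/old-page.html"
blocked = is_blocked_by_robots(robots_txt, url)
print(blocked)                                         # True
print(meets_removal_requirement(200, False, blocked))  # True
```

A page that still returns 200, has no noindex tag and is not blocked would fail the check, which suggests the removal request would not be accepted.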

Once this has been done, go ahead and click the New Removal Request button. You’ll find that Google gives you options to choose from.

Choose “Remove page from search results and cache”, check the checkbox and submit your request. Your request will be added to a pending list. Approximately 24 hours later, if all requirements have been satisfied, your request should be fulfilled and your directory/page/file removed from Google’s index.

Some Stuff to Note

  • Ensure that you do not have any blocked URLs in your XML sitemap.
  • Note that a robots.txt file does not work alone; you will still see your blocked URLs in the index, but without a page title or META description.
  • DO NOT add Google Analytics tracking scripts to pages you do not want Google to see.
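The first note, keeping blocked URLs out of your XML sitemap, is easy to check automatically. Here is a rough sketch that compares the <loc> entries of a sitemap against robots.txt rules. The function name and the sample data are made up for illustration, and it assumes a standard sitemaps.org sitemap rather than a sitemap index file.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical sample data for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/private/report.html</loc></url>
</urlset>
"""

print(blocked_sitemap_urls(sitemap_xml, robots_txt))
# ['https://www.example.com/private/report.html']
```

Any URL this returns should either be removed from the sitemap or unblocked in robots.txt, depending on whether you want it indexed.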

In conclusion, by cleaning up your Google index, you should see an increase in the number of indexed pages in Google, with an associated increase in SEO keywords and traffic in your analytics.