Identifying Duplicate Content and How To Deal With It
Duplicate content and plagiarism are an easy way for a website to get penalized, or possibly banned, by the search engines, which have gotten much better at detecting duplicate content. If you are interested, here is what Google has to say about duplicate content. For website owners, bloggers, and writers, there are a number of tools you can use to identify duplicate content. This post will discuss tools used to identify plagiarism, how to deal with duplicate content, and what limits there are on having duplicate content on your site.
Tools To Help Identify Duplicate Content
One of the best and fastest ways to check for duplicate content is to take a snippet of the content and enter it in quotes in a Google search. For example, here is an article taken from CNN. I have taken the first sentence and put it in Google.
“Debbie Burk books a four-star hotel in Chicago, hoping to avoid a particular property, which is rated a half star lower.”
As you can see, Google does a great job of finding other sources that have the exact same content on their site. In this case, since CNN is a major news outlet, a lot of other sites pick up its stories and syndicate them. However, if you take a snippet from one of your own product or service pages, there should be no matches unless someone has copied it. If no results show up when you add quotes, try taking out the quotes. Sometimes you will find results that look extremely similar. Usually someone has modified the content slightly in the hope that the search engines will not pick it up as duplicate content.
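If you check snippets often, the quoted and unquoted searches described above can be built automatically. Here is a minimal Python sketch. The function name `search_url` is my own invention for illustration, and it assumes the standard Google search address with a `q` parameter. It only builds the link for you to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import quote_plus


def search_url(snippet: str, exact: bool = True) -> str:
    """Build a Google search URL for a snippet of page content.

    exact=True wraps the snippet in quotes (exact-phrase search);
    exact=False leaves the quotes off to surface near-copies.
    """
    # Collapse line breaks and repeated spaces into single spaces.
    cleaned = " ".join(snippet.split())
    # Turn curly quotes into straight ones, then strip any outer quotes
    # so the phrase is not quoted twice.
    cleaned = cleaned.replace("\u201c", '"').replace("\u201d", '"').strip('"')
    query = f'"{cleaned}"' if exact else cleaned
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    sentence = "Debbie Burk books a four-star hotel in Chicago"
    print(search_url(sentence))               # exact-phrase search
    print(search_url(sentence, exact=False))  # looser search
```

Run the exact version first, and fall back to the looser version if nothing comes back.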
Here are some of the best duplicate content tools on the web. They will not only check for other copies of your content on the web, but will also identify internal issues you may have with your site.
Copyscape is probably the most popular duplicate content tool out there. This free service detects copies of your web pages across the web. The free version returns no more than 10 results for any search, and you are limited in the number of searches you can perform. However, they do offer two other premium services for users who need more in-depth duplicate content research. Copyscape Premium offers a more comprehensive plagiarism search along with features like batch search (up to 10,000 pages), copy and paste, plagiarism case management, site exclusions, comparison of two URLs, and automatic checks using the API.
This tool will allow you to enter a keyword, phrase, or sentence into the search field and will return Google results for any other sites that contain the same words. One cool feature is that you can set up a Google Alert, so it notifies you if someone copies your content.
Plagiarism detect offers a free and premium version, similar to Copyscape. The free version of this tool will allow you to upload text and Word document files for analysis and will return any matches found. The premium version has many other features, including side-by-side comparison of two documents, a more advanced algorithm, and a Microsoft Word plugin, so you can check for plagiarism directly from Word.
This plagiarism tool will display a visually pleasing diagram of the other websites it detects that have copied your content. Plagium will also show a calendar of when each copy was discovered. You can search the entire web or strictly news sources, and you can refine the search to check for duplicates in a specific language only.
Virante offers a different type of duplicate content tool that focuses on internal duplicate content issues. The issues it checks include www vs non-www redirect issues, similar pages on your site, issues with index.html vs /, whether missing pages properly return 404 errors, and any PR issues between the www and non-www versions.
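To make the www vs non-www and index.html vs / issues more concrete, here is a rough Python sketch that groups URLs which probably point at the same page. This is not how the Virante tool works internally (I do not know its method); the function names `canonical_key` and `group_duplicates` and the example.com addresses are assumptions made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Reduce a URL to a key that ignores www and a trailing index.html."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        # Keep the trailing slash, drop the file name.
        path = path[: -len("index.html")]
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def group_duplicates(urls):
    """Return only the keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    pages = [
        "http://www.example.com/",
        "http://example.com/index.html",
        "http://example.com/about",
    ]
    for key, found in group_duplicates(pages).items():
        print(key, "is reachable as", found)
```

Any group it reports is a candidate for a redirect to a single preferred address. It only compares the addresses themselves, so it cannot tell you whether the server actually redirects or returns a proper 404.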
The WebConfs tool will take two URLs and determine the percentage of similarity between the two pages. The lower the percentage, the less similar the two pages are.
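If you want a feel for what a similarity percentage means, here is a small Python sketch that compares two pieces of text word by word using the standard library's `difflib`. I do not know which method WebConfs uses, so treat this as one possible approach rather than a copy of their tool. The function name `similarity_percent` is made up for this example, and it works on plain text you have already pulled out of the pages.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Return a 0-100 score for how similar two texts are, word by word."""
    # Lowercase and split on whitespace so spacing and case do not matter.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() is 2 * matching words / total words in both texts.
    ratio = SequenceMatcher(None, words_a, words_b).ratio()
    return round(ratio * 100, 1)


if __name__ == "__main__":
    original = "the quick brown fox"
    reworded = "the quick red fox"
    print(similarity_percent(original, reworded))  # 75.0
```

Changing one word out of four still leaves a score of 75, which shows why moving a few words around does not make a page look original.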
Dealing With Duplicate Content
Rand Fishkin from SEOmoz recently did a Whiteboard Friday on duplicate content and how to deal with it. There are a number of good points that he brings up that I wanted to expand on.
How much duplicate content is ok on my site?
Exactly how much duplicate content you can have on your site is a grey area. If Google notices that your entire site is made up of duplicate content, they will most likely remove the majority of your pages from the index and/or penalize your site in the SERPs. However, if you are using duplicate content in moderation (a quote, a section of a press release, a product description), you will not have to worry about any penalty. A rule of thumb is to use content from other sources only when it makes sense for the user and relates to the other content on the page.
What if someone else publishes my content and it gets indexed first, do they get credit?
Google has many ways of identifying the original source for a piece of content. They look at domain trust/authority, PR, inbound links, and contextual links back to the original source within the duplicate content. Say your article gets picked up by a number of mainstream news sources. Even though those sites are authoritative, they will most likely link back to the source, and that will tell Google that your site is the original. As Rand said in the video, if a site syndicates your content, your domain trust will usually determine whether Google keeps that page in its index.
Being Unique is Not Enough!
To me, this is by far the biggest point to make when talking about duplicate content. Many clients think that if you change the title or move a few words around on the page, the content is unique and that will be enough. This is not true. You need to add value and put your own spin on a topic or discussion. Content that goes viral is usually completely unique, has exceptional value, and presents the information in a unique way.
What To Do About It
If someone copies your content, there are a few things you can do to have it removed.
- Contact the Site: Email the website and politely ask them to remove it.
- Submit a Spam Report: Send a spam report to Google, notifying them about a duplicate site or page.
- File an Infringement Notification: Visit the DMCA page on Google’s site and follow the instructions needed to properly file a notice of infringement.
Watch the Entire Video: SEOmoz Whiteboard Friday – Dealing with Duplicate Content
- Search Engine Journal: How to Find your Website Duplicate Content Issues
- WebConfs: Duplicate Content Filter: What it is and How it Works