Posted by Eric Enge

Intro from Rebecca: Eric Enge is a guest blogger for SEOmoz. His posts primarily focus on link building, but he has tackled other topics as well. He has previously written about the role of outbound links, various ways to pursue links, the role of directories in link building, Google’s Ajax APIs, and how he doesn’t buy links. Today he’ll be shifting gears a bit and will be talking about duplicate content. Enjoy!


Conventional wisdom among experienced SEOs is that there is no such thing as a duplicate content penalty. As a general principle this is true, but there are exceptions to the rule. In other words, duplicate content penalties do exist in certain scenarios, and that is what we are going to discuss in this post.


The Conventional Wisdom

Once again, the conventional wisdom is almost always right. Here it is:

  1. Duplicate content can occur within a site, or across different sites.
  2. A page can be considered duplicate without being identical.
  3. Search engines want to publish only one version of a particular piece of content in their index. The fundamental reason is that if a user gets a set of search results, goes to an article, decides it is not what they want, and returns to the search engine to check out other results, giving them another copy of the same article does not help them.
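
To make point 2 concrete, here is a minimal sketch of one well-known way to score two pages as near-duplicates: word shingling combined with Jaccard similarity. This is an illustration only. Search engines do not publish their duplicate detection methods, and the function names, the shingle size `k`, and the 0.8 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.8, k=3):
    """Hypothetical cutoff: pages at or above the threshold count as dupes."""
    return jaccard_similarity(a, b, k) >= threshold
```

Under these assumptions, two pages that differ only in capitalization, punctuation, or one trailing word of a long text score at or near 1.0, even though they are not byte-for-byte identical.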

So fundamentally, what search engines implement is a filter. So far so good. Now let’s talk about the consequences:

  1. Search engine bots come to a site with a crawl budget, which is counted in the number of pages they plan to crawl in each particular session. Each time a bot crawls a page that is a dupe (which is simply going to be filtered out of search results), you have let it waste some of its crawl budget. That means fewer of your "good" pages will get crawled.
  2. Links to duplicate content pages represent a waste of link juice. Duplicated pages can gain PageRank, or link juice, and since it does not help them rank, that link juice is misspent.
  3. Lastly, no search engine has offered a clear explanation of how it picks which version of a page to show. In other words, if it discovers 3 copies of the same content, which 2 does it filter out? Which one does it still show? Does it vary based on the search query? The bottom line is that the search engine might not do what you want it to do.
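
The crawl budget point in the first consequence is simple arithmetic, and a small sketch makes it visible. This is a deliberately simplified model, not a description of how any real bot schedules its crawl: it assumes the budget is a plain page count, that pages are fetched in a fixed order, and that each page carries a known content ID. The function name and the URLs are hypothetical.

```python
def crawl_session(pages, budget):
    """Count useful vs. wasted fetches in one simplified crawl session.

    pages  -- list of (url, content_id) tuples in crawl order
    budget -- maximum number of pages fetched this session (an assumption:
              real crawl budgets are not published in this form)
    """
    seen = set()
    useful = wasted = 0
    for url, content_id in pages[:budget]:
        if content_id in seen:
            wasted += 1   # duplicate content: fetched, then filtered out
        else:
            seen.add(content_id)
            useful += 1   # first copy of this content
    return useful, wasted


# Hypothetical site where every product page also has a print version.
site_with_dupes = [
    ("/product/1", "p1"), ("/product/1?print=1", "p1"),
    ("/product/2", "p2"), ("/product/2?print=1", "p2"),
    ("/product/3", "p3"),
]
print(crawl_session(site_with_dupes, budget=4))  # (2, 2)
```

With a budget of 4 pages, half of the session goes to print versions and `/product/3` is never reached.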

While some SEOs may debate some of the specifics above, I think that the general structure will meet with agreement across most SEOs. So, now let’s talk about a couple of problems around the edge of this model.


Problem Numero Uno

It’s that last item in the list of consequences. For example, on your site you may have a bunch of product pages, and also offer print versions of those pages. The search engine might just pick the print page as the one to show in its results. This does happen at times, and it can happen even if the print page has less link juice and will rank less well than the main product page.

I saw this with a recent client. The fix was to nofollow links to the print pages and to noindex those pages as well. Once this was implemented, everything improved significantly for them.
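
For readers who have not set this up before, here is a minimal sketch of the markup involved, generated from Python. The post does not describe how the client's site was actually built, so treat this as one common way to do it: `rel="nofollow"` on the links that point to the print pages, and a robots meta tag with `noindex` in the head of each print page. The helper names and the example URL are hypothetical.

```python
from html import escape


def robots_meta(noindex=True, nofollow=False):
    """Build a robots meta tag for the <head> of a page (e.g. a print page)."""
    directives = []
    if noindex:
        directives.append("noindex")
    if nofollow:
        directives.append("nofollow")
    if not directives:
        return ""  # nothing to restrict, so emit no tag
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


def nofollow_link(href, text):
    """Build a link to a print page that asks engines not to follow it."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(href, quote=True), escape(text)
    )


# Goes in the <head> of the print page:
print(robots_meta())
# Goes on the main product page (hypothetical URL):
print(nofollow_link("/product/42?print=1", "Print version"))
```

The two pieces work on different pages: the link attribute lives on the main product page, while the meta tag lives on the print page itself.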

Strictly speaking, no penalty was in fact assessed. However, picking a lower ranking version of the page to show sure felt like a penalty.

A second version of this can occur when you syndicate content to 3rd parties. The problem is that the search engine may boot your copy of the article out of the results in favor of the version in use by the person re-publishing your article. This also does happen. The best fix I know for this, other than noindexing the copy of the article that your partner is using, is to have them implement a link back to the original source page on your site. Search engines nearly always interpret this correctly, and emphasize your version of the content when you do that.

Once again, perhaps no penalty was assessed, but it still sure feels like one.

An Actual Penalty Situation

The above examples are not actual penalties, but for all practical purposes have the same impact as a penalty – lower rankings for your pages. But there are scenarios where an actual penalty can occur.

I worked on one site that was aggregating content from many sources (from thousands of sites). More than 60% of the pages on the site contained content that could be found on those other sites. The value add of the site was in the unique categorization and organization of the content, and in the value-added information about each of the sources.

The site did very, very well for many years. But then the bottom fell out of the whole thing. Traffic dove to less than 20% of its highest levels. The great majority of pages were in the supplementals (back when these were still visible) and even ranked below pages on sites that had duplicated the content from them. The business was fundamentally in ruins.

We were able to rehabilitate the site and get it to about half its original traffic levels. The only thing we did was significantly reduce the amount of duplicate content. By getting it to these lower levels, we apparently got it below a threshold that made Google like the site again.

Summary

We do have scenarios where the way that the search engines select which version of a particular article to show is, for all intents and purposes, a penalty. While the search engine people I have spoken to would not call that a penalty, to a publisher it is. Regardless of what you call it, these are scenarios you need to avoid because they hurt your site.

In addition, real duplicate content penalties do exist. The scenario may need to be extreme, but it can, and does, happen.
