Aaron discussed content mills in his interview with Tedster yesterday.
What is a content mill?
A content mill is a site that publishes cheap content. The content is either user-contributed, paid, or a mix of the two. The term content mill is obviously pejorative, the implication being that the content is published only to pump pages into search engines, and is typically low in quality.
The problem is that some sites that publish cheap content may well provide value, but it depends on who is reading it. For example, a forum might be considered a content mill, as it contains cheap, user-generated content of little value to a casual visitor, or it might be a valuable, regularly updated resource provided by a community of enthusiasts!
Depends who you ask.
As Aaron says, content mills are all the rage in 2010. Let’s take a closer look.
Why Are SEOs Interested In Content Mills?
This idea is nothing new. It’s actually a white-hat SEO strategy, and has been used for years.
- Research keywords
- Write content about those keywords
- Publish content and attempt to rank that content in search engine results
If you can publish a page at a lower cost than your advertising return, then you simply repeat the process over and over, and you’re golden. Think Adsense, affiliate, and similar means to monetize pages. Take a look at Demand Media.
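The economics behind this loop are simple break-even arithmetic. As a minimal sketch, assuming hypothetical per-page costs and monthly ad revenue (the figures below are invented for illustration, not real Demand Media or Adsense numbers):

```python
# Hypothetical content-mill economics: a page is profitable once its
# cumulative ad revenue exceeds its one-time production cost.
# All names and numbers here are illustrative assumptions.

def breakeven_months(cost_per_page, revenue_per_month):
    """Months needed for a page's ad revenue to cover its production cost."""
    return cost_per_page / revenue_per_month

def page_profit(cost_per_page, revenue_per_month, months):
    """Net profit of one page over its lifetime in months."""
    return revenue_per_month * months - cost_per_page

# A $15 article earning $1.50/month breaks even in 10 months;
# over 24 months it nets $21 -- and the process repeats per page.
print(breakeven_months(15.0, 1.50))   # 10.0
print(page_profit(15.0, 1.50, 24))    # 21.0
```

The strategy scales as long as the left side of that inequality stays below the right: drive cost per page down, or revenue per page up, and repeat.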
The Problem With Content Mills
One of the problems with content mills is that in an attempt to drive the production cost of content below the predicted return, some site owners are producing garbage content, usually by facilitating free contributions from users.
At the low end, Q&A sites proliferate in which people ask questions and a community of people with opinions, informed or otherwise, provide their two cents’ worth. Unfortunately, many of the answers are worth somewhat less than two cents, resulting in pages of little or no value to an end reader. I’m sure you’ve seen such pages, as they often rank well in search engines if they are published on a domain with sufficient authority.
Some sites, like Mahalo, not only automate their page creation, but then use those automated pages to generate automated related-question pages as well. The rabbit hole has no bottom!
At the other end of the spectrum, we have sites that publish higher-cost, well-researched content sourced from paid writers. A traditional publishing model, in other words. Generally speaking, such pages are of higher value to the end user, but the problem is that the search engines don’t appear to be able to tell the difference between these pages and the junk opinion pages. If the content mill has sufficient authority, the junk gets promoted.
And there are many examples in between, of course.
As Tedster mentioned, “the problem here is that every provider of freelance content is NOT providing junk – though some are. As far as I know, there is no current semantic processing that can sort out the two. It’s tough to see how this could be quickly and effectively reined in, at least not by algorithm. I assume that this kind of empty filler content is not very useful for visitors — it certainly isn’t for me. So I also assume it must be on Google’s radar.”
The Future Of Content Mills
I think Tedster is right – such sites will surely appear on Google’s radar, because junk, low value content doesn’t help their end users.
It must be a difficult problem to solve, else Google would have done so by now, but I think it’s reasonable to assume Google will try to relegate the lowest of the low-value content sites at some point. If you are following a content mill strategy, or considering starting one, it’s reasonable to prepare for such an eventuality.
The future, I suspect, is not to be a content mill, in the pejorative sense of the word. Aim for quality.
Arbitrary definitions of quality are difficult enough, as we’ve discussed above. Objective measurement is impossible, because what is relevant to one person may be irrelevant to the next. The field of IQ (information quality), a research area in information systems management that deals specifically with the quality of information, may provide some clues regarding Google’s approach.
Here are some of the metrics they use:
- Authority – Authority refers to the expertise or recognized official status of a source. Consider the reputation of the author and publisher. When working with legal or government information, consider whether the source is the official provider of the information.
- Scope of coverage – Scope of coverage refers to the extent to which a source explores a topic. Consider time periods, geography or jurisdiction, and coverage of related or narrower topics.
- Composition and Organization – Composition and organization has to do with the ability of the information source to present its particular message in a coherent, logically sequential manner.
- Objectivity – Objectivity is the bias or opinion expressed when a writer interprets or analyzes facts. Consider the use of persuasive language, the source’s presentation of other viewpoints, its reason for providing the information, and advertising.
- Validity – Validity has to do with the degree of obvious truthfulness the information carries.
- Uniqueness – Uniqueness is intuitive in meaning, but it covers not only where a piece of information originated but also the manner in which it is presented, and thus the perception it conjures. The essence of any piece of information consists largely of those two elements.
- Timeliness – Timeliness refers to information that is current at the time of publication. Consider publication, creation, and revision dates.
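To make the list concrete, here is a toy sketch of how such metrics could be combined into a single page-quality score. The metric names come from the list above; the weights and per-page scores are invented for illustration, and nothing suggests any search engine actually weights them this way:

```python
# Toy page-quality scorer over the IQ metrics listed above.
# Weights and scores are illustrative assumptions only.

IQ_METRICS = [
    "authority", "scope", "composition", "objectivity",
    "validity", "uniqueness", "timeliness",
]

def quality_score(scores, weights=None):
    """Weighted average of per-metric scores, each in [0, 1]."""
    if weights is None:
        weights = {m: 1.0 for m in IQ_METRICS}  # equal weighting by default
    total_weight = sum(weights[m] for m in IQ_METRICS)
    return sum(scores[m] * weights[m] for m in IQ_METRICS) / total_weight

# A hypothetical content-mill page: strong authority, weak everything else.
page = {
    "authority": 0.9, "scope": 0.4, "composition": 0.2,
    "objectivity": 0.5, "validity": 0.6, "uniqueness": 0.1,
    "timeliness": 0.8,
}
print(quality_score(page))
```

Note how a high authority score can prop up the average despite poor composition and uniqueness, which mirrors the failure mode described earlier: junk promoted on the strength of the domain it sits on.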
Any of this sound familiar? It should, as the search landscape is rife with this terminology. This is not to say Google looks at all these aspects, but they have used similar concepts, starting with PageRank.
As conventional SEO wisdom goes, Google may have tried to solve the relevancy problem partly by focusing on authority, on the premise that a trusted authority must publish trusted content, so the pages of a high-authority domain receive a boost over those of lower-authority domains. But this situation may not last, as some trusted, authoritative sources do, at times, publish auto-generated garbage content. Google may well start looking at composition metrics, if they aren’t doing so already.
This is speculation, of course.
I think a good rule of thumb, for the time being, is “will this page pass human inspection?” If it looks like junk to a human reviewer in terms of organization, and reads like junk in terms of composition, it probably is junk, and Google will likely feed such information back into their algorithms. Check out Google’s Quality Rater Document from 2007, which should give you a feel for Google’s editorial policy.