Over the past year or two there have been lots of changes with Google pushing vertical integration, but outside of localization and verticalization, core relevancy algorithms (especially in terms of spam fighting) haven’t changed much recently. There have been a few tricky bits, but when you consider how much more powerful Google has grown, its approach to core search hasn’t been as adversarial as it was a few years back (outside of pushing more self-promotion).

There has been some speculation as to why Google has toned down its manual intervention, including:

  • antitrust concerns as Google steps up vertically driven self-promotion (and an endless well of funding for anyone with complaints, courtesy of Microsoft)
  • a desire to create more automated solutions as the web scales up
  • spending significant resources fighting site hacking (the “bigger fish to fry” theory)

Matt Cutts recently published a post on the official Google blog confirming that #3 was indeed a big issue:

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.

It sounds like Google was mainly focused on fighting hacked sites and auto-generated & copied content. And now that hacked *GOVERNMENT* websites are available for purchase for a few hundred dollars (and perhaps millions in personal risk when a government comes after you), Google’s push to fight site hacking looks like a smart move! Further, there is a wide array of startups built around leveraging the “domain authority” bias in Google’s algorithm, so looking more at page-by-page metrics was a needed step in evolving relevancy. Page-by-page metrics also let Google filter out the cruddy parts of good sites without killing off the whole site.

Having tackled many of the hard-core auto-generated spam issues, Google can now ramp up its focus on more vanilla spam. Due to a rash of complaints (typically from web publishers & SEO folks), content mills are now a front-and-center issue:

As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception.

Demand Media (DMD) is set to go public next week, and Richard Rosenblatt has a long history of timing market tops (see iMall or MySpace).

But what sort of sites are the content mills that Google is going to ramp up action on?

The tricky part with vanilla spam is its subjective nature. End users (particularly those who are not web publishers or online advertisers) might not complain much about sites like eHow, because they are aesthetically pleasing and well formatted for easy consumption. The content might be low quality, but maybe Google is willing to let a few of the bigger players slide.

Recall that after the Mayday update, Richard Rosenblatt said it increased Demand Media’s web traffic. And Google’s October 22nd algorithm change last year saw many smaller websites careen into oblivion, only to reappear on November 9th. That update did not particularly harm sites like eHow.

However, in a Hacker News thread about Matt’s recent blog post, he did state that Google has taken action against Mahalo: “Google has taken action on Mahalo before and has removed plenty of pages from Mahalo that violated our guidelines in the past. Just because we tend not to discuss specific companies doesn’t mean that we’ve given them any sort of free pass.”

My guess is that sites that took a swan dive around October 23rd might expect to fall off the cliff once more. What makes subjective search relevancy hard is that issues rise and fall like ocean waves crashing ashore. Fixing one issue eventually creates openings for other problems to fester. And once an issue has been fixed long enough, it becomes a non-issue, to the point of being promoted as a best practice, at least for a while.

Anyone who sees opportunity as permanently disappearing from search is seeing the glass as half empty, rather than noticing how opportunities that died are reborn again and again.

That said, I view Matt’s blog post as a bit of a warning shot. What types of sites do you think he is coming after? What types of sites do you see benefiting from such changes? Discuss. 🙂
