As part of their 10th birthday celebrations, Google recently released a 2001 index, to show us how much things have changed.
It is fascinating to look into the past, especially from an SEO point of view. Has the nature of spam changed since 2001? How has Google changed in order to nullify the effects of spam?
When Google filed their registration statement prior to IPO, Google identified a number of risk factors.
One of these risks was:
We are susceptible to index spammers who could harm the integrity of our web search results
“There is an ongoing and increasing effort by “index spammers” to develop ways to manipulate our web search results. For example, because our web search technology ranks a web page’s relevance based in part on the importance of the web sites that link to it, people have attempted to link a group of web sites together to manipulate web search results. We take this problem very seriously because providing relevant information to users is critical to our success. If our efforts to combat these and other types of index spamming are unsuccessful, our reputation for delivering relevant information could be diminished. This could result in a decline in user traffic, which would damage our business.”
Curious how Google conflates spamming with relevance, eh. While it could be true that manipulating rank could lead to lower relevance, that isn’t a given. The manipulation could, after all, produce relevant results. “Relevant” being a subjective judgement made by the user.
What Google are really getting at is the type of manipulation that leads to less relevant results, commonly referred to as search engine spam. In this respect, what has changed since 2001?
Has Search Spam Been Defeated?
Or, to put another way, what changes have Google made to reduce the business risk of non-relevant search results?
Compare the following examples with the results we see today:
Now try searching on those two phrases in today’s index. How many differences can you spot? How have the result sets changed? Are they less “spammy”?
Here are a few aspects I noticed:
- The search results are much tighter and much better policed. You wouldn’t find the penis-envy.com site’s link exchange page ranking in Google’s 2008 search results for Paxil search queries.
- Google used to match keyword strings a lot more than it does today. This is the reason why a lot of on-page optimization techniques have become redundant, and the reason why effective on page optimization in 2008 is more about diversity than repeating words.
- Blogs have gone from an obscure force to category leaders in many markets.
- If you happen to be searching outside the US, Google now incorporates, and boosts, regional results.
- Google now incorporates YouTube, news, and other related informational sources, thus forcing results from smaller sites further down the page.
- There used to be a lot more hyphenated domain names showing up in the top ten. Not so much these days.
- Wikipedia had only just launched in 2001 (its predecessor, Nupedia, began in 2000), so it wasn’t yet appearing in every single search result 😉
When Google first emerged, algorithmic search was in real danger of becoming unusable. Engines like AltaVista were losing the war against spammers, and result sets were becoming increasingly irrelevant. Sergey Brin once declared that it wasn’t possible to spam Google. A clever link analysis algorithm had defeated spam forever. No more spam!
Well, not really.
Spam hasn’t gone away. But it is fair to say that Google is doing a pretty good job of maintaining relevance, and in many cases, eliminating the worst forms of spam. For example, it is now uncommon to see the type of deceptive redirects that were common in 1997, whereby if you clicked on a link, you were led to a site that was unrelated to the link text.
We’ve seen the rise of the authoritative domain, and the relegation of the influence of many smaller sites. Pages hosted on authoritative domains are more likely to rank higher than pages on sites that haven’t established authority. This has, in turn, led to a different type of spam. People hack into authoritative sites in order to place their links, or entire pages, on these domains. Wikipedia has an ongoing battle to keep their pages free from “commercial imperatives”.
The target has, in many ways, shifted down a level.
Since 2001, Google has incorporated verticals.
In this article, Danny Sullivan outlined the use of “invisible tabs” in the delivery of search results.
“The solution I see coming is something I call “invisible tabs.” Quietly, behind the scenes, search engines will automatically push the correct tab for your query and retrieve specialized search results. This should ultimately prove an improvement over the situation now, where you’re handed 10 or 20 matching web pages.”
Result sets have increasingly become query dependent, as if you’d pre-selected a topic tab. For example, if your query is determined to have an informational intent, you’re unlikely to receive a commercially oriented result set. It has become a lot more difficult to get off-topic listings – which in this specific case would be commercial pages – into such result sets.
We’ve also seen the structure of search results pages change markedly. We see images, videos, news, related searches, sub pages, onebox results boxes, personalized results, desktop results, and Adwords. This leaves less and less room for other types of pages, as the search results orient more heavily around a wider variety of data types.
However, in the end, the SERP is still just a list that looks much like the old list. What will search, and search spam, look like in another ten years?
Over $10 billion is chasing paid search each year, and that figure will surely grow as media spend increasingly shifts online. There is still a strong incentive to use all means necessary to get to the top of the list.
Google will, of course, continue to try and counter this threat to their business model. The PageRank algorithm has likely changed considerably since it was first published. Google is likely to continue to incorporate usage metrics, making it more and more difficult for less relevant pages to gain a foothold.
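For readers curious what the published link analysis idea actually looks like: the original PageRank paper described a simple iterative calculation in which each page's score is a share of the scores of the pages linking to it, plus a small "random surfer" constant. Below is a minimal sketch of that published formulation (the three-page link graph and all names are purely illustrative, and this says nothing about what Google runs today):

```python
# Minimal sketch of the originally published PageRank idea:
# power iteration with a damping factor d (commonly 0.85).
# The tiny link graph below is hypothetical, for illustration only.

def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform score
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Each inbound link passes on a share of the linking page's rank,
            # split evenly across that page's outbound links.
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            new_rank[p] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank

# Hypothetical three-page web: "hub" is linked to by both other pages,
# so it ends up with the highest score.
graph = {"hub": ["a"], "a": ["hub", "b"], "b": ["hub"]}
ranks = pagerank(graph)
```

The key property, and the one spammers attacked, is visible even in this toy version: rank flows along links, so manufacturing inbound links manufactures rank.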
On the flip side, will search be as important as it is now? There appears to be a trend for more information to be pushed our way, rather than going out and finding it ourselves. RSS, recommendation engines (Amazon, YouTube, et al), community models (Facebook), and more. Will our surfing habits be (voluntarily) monitored, and answers provided before we’re even aware of the question? We’re already seeing the early stages of this with contextual Adwords in Gmail. These changes will, in turn, give rise to a new breed of spam. While the commercial incentive remains, there will always be a level of spam.
The game of cat and mouse continues…
The Google 2001 Search Index is a Great SEO Tool
Having a glimpse of the past reminds us of how things change, which helps us think about why they changed and how they may change going forward.
The 2001 index provides a great tool for demonstrating past popular SEO techniques that have since become irrelevant, which is useful when the boss uncovers an old spammy strategy that they feel you must follow to succeed. It not only helps us inform employers, but also allows us to talk about and highlight overt forms of spam without the worry of “outing” a page that is currently ranking.