Posted by Dr. Pete
Last week, when the SEO world was distracted by revelations that Google was blocking keyword referral data and nostalgic mania over MC Hammer’s search engine, Search Engine Land released a leaked Google document outlining Google’s official guidelines for quality raters. I read the 125-page document out of curiosity, and I decided to share some valuable insights it contains into the mind of Google.
Sorry, No Secrets Here
If you’re looking for SEO “secrets,” you’ll be disappointed by this post. Although this is an internal document, and Google may not be happy about it being leaked, you won’t find a smoking gun here. What you will find is a training manual on Google’s philosophy of quality. The key to proactive SEO is to understand how Google thinks. If you only chase the algorithm, you’ll always be reacting to changes after they happen. Since the document in question is proprietary, I’m not going to link directly to copies of the document or quote large chunks of it. I’m writing this post because I sincerely believe that understanding Google’s philosophy of quality is a fundamentally “white hat” proposition.
What Is A Quality Rater?
Quality raters are Google’s fact checkers – the people who work to make sure the algorithm is doing what it’s supposed to do. Data from quality raters not only serves as quality control on existing SERPs, but it helps validate potential algorithm changes. When you consider that Google tested over 13,000 algorithm changes last year, it’s a pretty important job.
This particular document focuses on rating SERP quality based on specific queries. Essentially, a rater reviews the sites returned by a given query and evaluates each result based on relevance. Raters also flag sites that they consider to be spam. One last note: Google’s philosophy is not always reflected in the algorithm. The algorithm is an attempt to code quality into rules, and that attempt will always be imperfect. The document, for example, says almost nothing about back-link count, unique linking domains, linking C-blocks, etc. Those are all metrics that attempt to quantify relevance.
Here are 16 insights into the human side of Google’s quality equation, in no particular order…
(1) Relevance Is A Continuum
I think the biggest revelation of the document, in a broad sense, is that Google’s view of relevance is fairly sophisticated and nuanced. Raters are instructed to rate relevance along a continuum with 5 options: “Vital”, “Useful”, “Relevant”, “Slightly Relevant”, and “Off-topic”. Of course, there is always a certain amount of subjectivity to ratings, but Google provides many examples and detailed guidelines.
(2) Relevance & Spam Are Independent
Relevance is a rating, but spam is a flag. So, in Google’s view, a site can be useful but spammy, or it can be irrelevant but still spam-free. I think we see some of that philosophy in the algorithm. Content is relevant or irrelevant, but spam is about tactics and intent.
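To make the rating-versus-flag distinction concrete, here is a minimal sketch of how one rater judgment could be modeled. The five scale names come from the document as described above; the `Rating` class, its field names, and the example URLs are purely hypothetical and are not taken from Google's guidelines or tools.

```python
from dataclasses import dataclass
from enum import IntEnum


class Relevance(IntEnum):
    """The five-point scale named in the post, ordered lowest to highest."""
    OFF_TOPIC = 0
    SLIGHTLY_RELEVANT = 1
    RELEVANT = 2
    USEFUL = 3
    VITAL = 4


@dataclass
class Rating:
    """Hypothetical record of one rater judgment for one result."""
    url: str
    relevance: Relevance  # a position on the continuum
    spam: bool = False    # an independent yes/no flag


# Illustrative data only: the two dimensions vary independently.
ratings = [
    Rating("https://example.com/useful-but-spammy", Relevance.USEFUL, spam=True),
    Rating("https://example.com/clean-but-off-topic", Relevance.OFF_TOPIC),
]

# A result can score well on relevance and still carry the spam flag.
useful_spam = [r for r in ratings if r.relevance >= Relevance.USEFUL and r.spam]
```

Keeping the flag out of the scale is the point: a spammy page does not simply slide down to "Off-topic", it gets marked on a separate axis.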
(3) The Most Likely Intent Rules
Some queries are ambiguous – “apple”, for example, can mean a lot of things without any context. Google instructs raters to, in most cases, use the dominant interpretation. What’s interesting is that their dominant interpretations often seem to favor big brands. In specific examples, the dominant interpretation of “apple” is Apple Computers and the dominant interpretation of “kayak” is the travel site Kayak.com.
Other interpretations (like “apple” the fruit or “kayak” the mode of transportation) automatically get lower relevance ratings if there’s a dominant interpretation. I think the notion of a dominant interpretation makes some sense, and it may be necessary for a rater to do their job, but it’s also highly subjective. In some cases, I just didn’t agree with Google’s examples, and I felt that the dominant interpretation unfairly penalized legitimate sites. Most people may want to buy an iPad when they type “apple”, but a site that specializes in online organic apple sales is still highly relevant to the ambiguous query, in my opinion.
(4) Some Results Are “Vital”
The “Vital” relevance rating is a special case. Any official entity – a company, an actor/actress, a politician, etc. – can have a vital result. In most cases, this is their official home-page. Only a dominant interpretation can be vital – Apple Vacations will never be the vital result for “apple” (sorry, Apple Vacations; I don’t make the rules). I suspect this is a safety valve for checking the algorithm – if “vital” results don’t appear for entity searches, many people would question Google’s results, even if the SEO efforts of those entities don’t measure up.
Social profiles can also be vital, if those profiles are for individuals or small groups. So, a politician, actress or rock band could have multiple “vital” pages (their home-page, their Facebook page, and their Twitter profile, for example). Interestingly, Google specifically instructs that social media profiles for companies cannot be considered vital.
(5) Generic Queries Are Never Vital
Obviously, Walmart.com is a vital result for the query “walmart”, but Couches.com is not a vital result for the query “couches”. An exact-match domain doesn’t automatically make something vital, and some queries are inherently generic.
(6) Queries Come in 3 Flavors
Query intent can be classified, according to Google, as Action (“Do”), Information (“Know”) or Navigation (“Go”). Like ice cream, queries can come in more than one flavor (although Neapolitan ice cream should never substitute banana for vanilla). This Do/Know/Go model comes up a lot in the document and is a pretty useful structure for understanding search in general. Relevance is determined by intent – if a query is clearly action-oriented (e.g. “buy computer”), then only an Action (“Do”) result can be highly relevant.
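The Do/Know/Go model, including the idea that a query can have more than one flavor, can be sketched with a flag-style enum. This is only an illustration under my own assumptions: the `Intent` type and the `can_be_highly_relevant` helper are hypothetical names, and the overlap rule is a simplified reading of the "buy computer" example, not Google's actual rating logic.

```python
from enum import Flag, auto


class Intent(Flag):
    """Do/Know/Go intents; a Flag lets one query combine several."""
    DO = auto()    # Action
    KNOW = auto()  # Information
    GO = auto()    # Navigation


def can_be_highly_relevant(query_intent: Intent, result_intent: Intent) -> bool:
    """Assumed rule: a result can only rate highly if it serves
    at least one of the intents behind the query."""
    return bool(query_intent & result_intent)


# "buy computer" is clearly action-oriented in the post's example.
buy_computer = Intent.DO

# A hypothetical query with two flavors at once.
mixed_query = Intent.DO | Intent.KNOW

store_page_ok = can_be_highly_relevant(buy_computer, Intent.DO)
article_ok = can_be_highly_relevant(buy_computer, Intent.KNOW)
article_for_mixed_ok = can_be_highly_relevant(mixed_query, Intent.KNOW)
```

Under this sketch, an informational article cannot be highly relevant for a pure "Do" query, but it can be for a query that mixes "Do" and "Know".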
(7) Useful Goes Beyond Relevance
This is wildly open to interpretation, but Google says that “useful” pages (the top rating below “vital”) should be more than just relevant – they should also be highly satisfying, authoritative, entertaining, and/or recent. This is left to the rater’s discretion, and no site has to meet all of these criteria, but it’s worth noting that relevance alone isn’t always enough to get the top ratings.
(8) Relevance Implies Language Match
If a search result clearly doesn’t match the target language of the query, then in most cases that result is low-relevance. Likewise, if a query includes or implies a specific country, and the result doesn’t match that country, the result isn’t relevant.
(9) Local Intent Can Be Automatic
Even if a query is generic, it can imply local intent. Google gives the example of “ice rink” – a query for “ice rink” should return local results, and clearly non-local results should be rated as off-topic or useless. This applies whether or not the location is in the query. Again, expect Google to infer intent more and more, and local intent is becoming increasingly important to them.
(10) Landing Page Specificity Matters
A good landing page will fit the specificity of the query. A detailed product page, for example, is a better match to a long-tail query for a specific item. On the other hand, if the query is broad, then a broader resource may be more relevant. For example, if the query is “chicken recipes”, then a page with only one recipe isn’t as relevant as a list of recipes.
(11) Misspellings Are Rated By Intent
If a query is clearly misspelled, the relevance of the results should be based on the user’s most likely intent. In the old days, targeting misspellings was a common SEO practice, but I think we’re seeing more and more that Google will automatically push searchers toward the proper spelling. It’s likely Google is only going to get more aggressive about trying to determine intent and even pushing users toward the dominant intent.
(12) Copied Content Can Be Relevant
This may come as a surprise in a Post-Panda world, but Google officially recognizes that copied content isn’t automatically low quality, as long as it’s well-organized, useful, and isn’t just designed to drive ad views. Again, this is a bit subjective, and it’s clear that you have to add value somehow. A site with nothing but copied content (whether legitimately syndicated or scraped) isn’t going to gain high marks, and a site that’s only using copied content to wrap ads around it is going to be flagged as spam.
(13) Some Queries Don’t Need Defining
Dictionary or encyclopedia pages are only useful if a query generally merits definition or more information. If most users understand the meaning of the query word(s) – Google gives the example of “bank” – then a dictionary or encyclopedia page is not considered useful. Of course, tell that to Wikipedia.
(14) Ads Without Value Are Spam
One quote stood out in the document – “If a page exists only to make money, the page is spam.” Now, some business owners will object, saying that most sites exist to make money, in some form. When Google says “only to make money”, they seem to be saying money-making without content value. It’s ok to make money and have ads on your page, as long as you have content value to back it up. If you’ve just built a portal to collect cash, then you’re a spammer.
(15) Google.com Is Low Relevance
By Google’s standards, an empty search box with no results displayed is off-topic or useless. Ironic, isn’t it? Joking aside, the document does suggest that internal search results pages can be relevant and useful in some cases.
(16) Google Raters Use Firefox
I said no secrets, but I guess this is a little bit of inside information. Google raters are instructed to use Firefox, along with the web developer add-on. Do with that as you will.
Knowing Is 53.9% of The Battle*
So, there you go – 16 insights into the mind of Google. Advanced SEO, in my opinion, really comes down to understanding how Google thinks, and how they translate their values and objectives into code. You can lose a lot of time and money only making changes when you’ve lost ranking – really understanding the mind of Google is the best way to future-proof your SEO efforts.
*I always wondered what the other half was – blowing stuff up, apparently.