February Linkscape Update: 66 Billion URLs

Posted by randfish

After some wrestling with Amazon's EC2 and the tragic loss of many hard disks therein, we've finally finished processing and have released the latest Linkscape update (previously scheduled for Feb. 14). This new index is, once again, quite large in comparison to our prior indices, and contains a mix of crawl data going back to the end of last year. In fact, this is technically our largest index ever!

Here are the latest stats:

  • 65,997,728,692 (66 billion) URLs
  • 601,062,802 (601 million) Subdomains
  • 140,281,592 (140 million) Root Domains
  • 739,867,470,316 (740 billion) Links
  • Followed vs. Nofollowed

    • 2.21% of all links found were nofollowed
    • 57.91% of nofollowed links are internal
    • 42.09% are external
  • Rel Canonical – 11.11% of all pages now employ a rel=canonical tag
  • The average page in this index has 71.88 links on it

    • 60.98 internal links on average
    • 10.90 external links on average  
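As a quick sanity check, the published percentages can be turned into approximate absolute counts. The sketch below is illustrative arithmetic only. It assumes the percentages apply uniformly to the 739,867,470,316-link total, and the constant and function names are hypothetical.

```python
import math

# Figures as published for the February index.
TOTAL_LINKS = 739_867_470_316
NOFOLLOW_SHARE = 0.0221           # 2.21% of all links are nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.5791  # 57.91% of nofollowed links are internal
NOFOLLOW_EXTERNAL_SHARE = 0.4209  # 42.09% of nofollowed links are external

AVG_LINKS_PER_PAGE = 71.88
AVG_INTERNAL_PER_PAGE = 60.98
AVG_EXTERNAL_PER_PAGE = 10.90


def nofollow_breakdown(total_links: int) -> dict[str, int]:
    """Approximate absolute nofollow counts implied by the published percentages."""
    nofollowed = round(total_links * NOFOLLOW_SHARE)
    return {
        "nofollowed": nofollowed,
        "nofollowed_internal": round(nofollowed * NOFOLLOW_INTERNAL_SHARE),
        "nofollowed_external": round(nofollowed * NOFOLLOW_EXTERNAL_SHARE),
    }


def per_page_averages_consistent() -> bool:
    """Check that the internal + external averages add up to the overall average."""
    return math.isclose(AVG_INTERNAL_PER_PAGE + AVG_EXTERNAL_PER_PAGE, AVG_LINKS_PER_PAGE)


if __name__ == "__main__":
    for name, count in nofollow_breakdown(TOTAL_LINKS).items():
        print(f"{name}: {count:,}")
    print("per-page averages consistent:", per_page_averages_consistent())
```

Against the published totals, 2.21% works out to roughly 16.35 billion nofollowed links, and the internal (60.98) and external (10.90) averages do add up to the 71.88 links-per-page figure.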

We also ran our correlation metrics against a large set of Google search results and saw data very similar to last round's. Here are the latest numbers, as mean Spearman correlation coefficients (these can range from -1 to 1; higher is better):

  • Domain Authority: 0.26
  • Page Authority: 0.37
  • MozRank of a URL: 0.19
  • # of Linking Root Domains to a URL: 0.26
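The post doesn't spell out how these correlations are computed. A common approach, and only an assumption here, is to compute Spearman's rho between a metric and ranking position separately for each SERP, then average the per-SERP values. The sketch below does that in plain Python. The function names and the sign convention (correlating against negative position, so a positive rho means higher metric values tend to rank higher) are hypothetical choices, not Moz's actual methodology.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values in ascending order (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_serp_correlation(serps: list[list[float]]) -> float:
    """Average Spearman's rho over SERPs.

    Each SERP is a list of metric values ordered by position (index 0 = the #1 result).
    The metric is correlated with -position, so rho > 0 means higher values tend to rank higher.
    """
    rhos = []
    for metric_values in serps:
        if len(set(metric_values)) < 2:
            continue  # rho is undefined when the metric doesn't vary; skip this SERP
        neg_positions = [-(p + 1) for p in range(len(metric_values))]
        rhos.append(spearman(metric_values, neg_positions))
    if not rhos:
        raise ValueError("no SERP had a varying metric")
    return mean(rhos)


if __name__ == "__main__":
    # Hypothetical metric values for three made-up SERPs.
    serps = [[60, 55, 40, 42, 30], [35, 50, 20, 20, 10], [25, 25, 25, 25, 25]]
    print(round(mean_serp_correlation(serps), 3))
```

Ties get averaged ranks, the standard way Spearman's rho handles them, and SERPs where the metric is constant are skipped because rho is undefined there.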

Our evaluation process also checks the comprehensiveness of our crawl data against a large set of Google results, and in this index, we've got link data on 82.09% of SERPs. This is slightly down from last month's 82.37%, which we suspect is a result of the late release: crawl data ages along with the web, and new URLs keep making their way into the SERPs. To help visualize our crawl, here's a histogram of when the URLs in this index were seen by Linkscape:

Crawl Histogram for Feb. 28th Index

We always "replace" any older URLs with newer content if we recrawl or see new links to a page, so while there may be some "old, crusty" stuff from December, the vast majority of this index was crawled in mid-to-late January.
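The replacement rule described here amounts to keeping only the newest crawl date per URL before counting URLs per time bucket. Here is a minimal sketch of that idea, assuming crawl observations arrive as hypothetical (url, date) pairs and using calendar-month buckets. The function names, sample dates, and bucket size are illustrative, not how Linkscape actually stores or bins its data.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import date


def latest_crawls(observations: Iterable[tuple[str, date]]) -> dict[str, date]:
    """Keep only the most recent crawl date seen for each URL (newer replaces older)."""
    newest: dict[str, date] = {}
    for url, seen in observations:
        if url not in newest or seen > newest[url]:
            newest[url] = seen
    return newest


def crawl_histogram(newest: dict[str, date]) -> Counter:
    """Count URLs by the (year, month) of their most recent crawl."""
    return Counter((seen.year, seen.month) for seen in newest.values())


if __name__ == "__main__":
    # Made-up observations: page-a was seen in December, then recrawled in January.
    observations = [
        ("http://example.com/page-a", date(2010, 12, 5)),
        ("http://example.com/page-b", date(2010, 12, 28)),
        ("http://example.com/page-a", date(2011, 1, 20)),
        ("http://example.com/page-c", date(2011, 1, 18)),
    ]
    hist = crawl_histogram(latest_crawls(observations))
    for (year, month), count in sorted(hist.items()):
        print(f"{year}-{month:02d}: {count}")
```

Because page-a's December sighting is replaced by its January recrawl, only page-b is counted in December. Finer buckets (weekly, say) would only require changing the key built in `crawl_histogram`.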

In the next few weeks, we're working on a new, experimental index that may be massively larger (2-3X) than this one, and closer to what's in Google's main index at scale. This is very exciting for us and, we hope, for all of you who use Open Site Explorer, the Mozbar, the Linkscape API and tools from our partners like Hubspot, Conductor, Brightedge and our newest API partner, Ginza Metrics (check out some cool stuff they're doing with Moz data here and in the screenshot below).

Ginza Metrics' New Backlink Analysis Tool

If you're interested in chatting about using Moz data in products, drop Andrew Dumont a line and he'll be happy to help. And, as always, feedback on this latest index, our tools, or our metrics is greatly appreciated.

