2nd November Index Update: Our Broadest Index Yet, and New PA/DA Scores are Live

Posted by randfish

Hey gang – it's that magical time again when Linkscape's web index has updated with brand new data (for the second time this month). Open Site Explorer, the Mozbar and the PRO Web App all have new links and scores to check out. This index also features the updated Page Authority and Domain Authority models covered by Matt last week on the blog.

Here's the current index's metrics:

  • 38,295,116,929 (38 billion) URLs
  • 466,742,600 (466 million) Subdomains
  • 125,007,049 (125 million) Root Domains
  • 387,379,700,299 (387 billion) Links
  • Followed vs. Nofollowed
    • 2.03% of all links found were nofollowed
    • 55.57% of nofollowed links are internal, 44.43% are external
  • Rel Canonical – 10.34% of all pages now employ a rel=canonical tag
  • The average page has 70.61 links on it (down 6.67 from last index; we're likely biasing to a different set of webpages with the broader vs. deeper focus of this release)
    • 59.02 internal links on average
    • 11.59 external links on average
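For anyone who wants to sanity-check the figures above, here is a rough sketch of the arithmetic in Python. It assumes the 2.03% nofollow share applies to the full 387 billion link count, which the list implies but doesn't state outright, and all variable names are just illustrative.

```python
# Figures copied from the index metrics above.
total_links = 387_379_700_299
nofollow_share = 0.0203            # 2.03% of all links found
nofollow_internal_share = 0.5557   # 55.57% of nofollowed links are internal
avg_links_per_page = 70.61
avg_internal = 59.02
avg_external = 11.59
drop_from_last_index = 6.67

# Assumption: the nofollow percentage is taken over the total link count.
nofollowed_links = total_links * nofollow_share
nofollowed_internal = nofollowed_links * nofollow_internal_share
nofollowed_external = nofollowed_links - nofollowed_internal

# The internal and external averages should add up to the overall average.
links_sum = avg_internal + avg_external

# Implied average links per page in the previous index.
previous_avg = avg_links_per_page + drop_from_last_index

print(f"Nofollowed links: {nofollowed_links / 1e9:.2f} billion")
print(f"  internal: {nofollowed_internal / 1e9:.2f} billion")
print(f"  external: {nofollowed_external / 1e9:.2f} billion")
print(f"Internal + external per page: {links_sum:.2f}")
print(f"Previous index average: {previous_avg:.2f}")
```

Under that assumption, this works out to roughly 7.86 billion nofollowed links, and the previous index's average comes to 77.28 links per page.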

As you can see, we're crawling a LOT more root domains – we expect to have data for an extremely high percentage of all the domains that you might find active on the web. However, because of this broader crawl, we're not reaching as deeply into some large domains (some of that is us weeding out crap, including many more millions of binary files, error-producing webpages and other web "junk"). You can see below a chart of the root domains we've crawled in the last 6 months vs. the total URLs in each index.

[Graph: November Linkscape update, root domains vs. URLs]

We work toward a few key metrics to judge our progress on the index:

  • Correlations with Google rankings (not only of PA/DA, but of link counts, linking root domains, mozRank, etc)
  • Percent of successful API requests (meaning the share of requests for link data on a URL, from any source, where we actually had link data for that URL)
  • Raw size and freshness (total # of root domains and URLs in the index, though, as Danny Sullivan has pointed out, this may not be a great metric on which to judge a web corpus)
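To make the second metric concrete, here is a minimal sketch of how a success percentage like that could be computed. The `success_rate` function and its input format (one boolean per request, true when link data existed for the requested URL) are hypothetical and only for illustration; this isn't how Linkscape actually tracks it.

```python
def success_rate(had_link_data):
    """Return the percent of requests for which link data was available.

    `had_link_data` is a hypothetical iterable of booleans, one per API
    request: True if the index had link data for the requested URL.
    """
    outcomes = list(had_link_data)
    if not outcomes:
        return 0.0  # no requests, so nothing to report
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up example: three of four requests hit a URL we had data for.
print(success_rate([True, True, False, True]))
```

A broader crawl should push this number up, since more of the URLs people ask about will have at least some link data.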

We've gotten better with most of these recently – PA/DA have better correlations, more of your requests (via Open Site Explorer, the Mozbar or any third-party application) now have link data, and we're slowly improving freshness (this index was actually completed last week, but didn't launch due to the Thanksgiving holiday). However, we are not improving as much on raw index size (root domains, yes, which we've seen correlate with other metrics, but raw URL count, no). This will continue to be a focus for us in the months to come, and we're still targeting 100 billion+ URLs as a goal (though we're not willing to sacrifice quality, accuracy or freshness to get there).

As always, if you've got feedback on the new scores, on the link data or anything related to the index, please do let us know. We love to hear from you!
