How to Find Link-Worthy Data

Posted by MarkJohnstone

You might be a little tired of hearing ‘content is king’, and it’s increasingly difficult to make content stand out online. But a few sites are leading the way with their innovative use of data: the Guardian Datablog, Information is Beautiful and the ubiquitous OkTrends, to name but a few.

Sites like these are still in the minority, though, so there’s ample opportunity to turn data into links. But first you need to know…

How To Get Your Hands On Some Tasty Data


Data is practically everywhere, and there are tonnes of different sources you can use.

APIs and Scraping

If you’ve got some developer resource available, you can pull data from a shed-load of APIs all over the web. Mining Twitter and Facebook is obviously popular, but there are lots of other opportunities.

Programmable Web has a massive list of APIs you can tap into. Speaking at a recent Distilled conference (Boston ProSEO), Dharmesh Shah suggested subscribing to the Programmable Web RSS feed, not because you need to know about everything that’s coming out, but for the ideas it will trigger along the way. It can save you bucket-loads of time if you can pluck out an idea from a while back that’ll work perfectly for a new project.
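If you do have a developer to hand, pulling data from a JSON API can be fairly simple. The sketch below uses only Python’s standard library and assumes a hypothetical endpoint (API_URL) that returns a JSON object with a list of records under an "items" key. Real APIs structure their responses differently and most need an API key, so check the documentation for the one you pick. count_by_field is a hypothetical helper that tallies one field, so you can see at a glance whether there’s a story in the data.

```python
import json
import urllib.request
from collections import Counter

# Hypothetical endpoint: swap in the real URL (plus any API key) from the
# documentation of the API you're using.
API_URL = "https://api.example.com/v1/items?format=json"


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def count_by_field(records, field):
    """Tally each value of `field` across a list of dict records.

    Records without the field are counted under "unknown".
    """
    return Counter(record.get(field, "unknown") for record in records)


# Example usage (makes a network call, so it's left commented out):
# data = fetch_json(API_URL)
# # Assumes the response is a JSON object with a list under "items".
# print(count_by_field(data["items"], "category").most_common(10))
```

Keeping the fetching and the counting in separate functions means you can test the analysis on a saved response without hitting the API every time.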

If there’s no API, scraping is always an option. And even if an API is available, scraping can be preferable for doing things on the fly, and for the less technically able (like me). There are a couple of great resources that have already been written on this; check out the following:

And if you’re really getting into scraping, you should also check out ScraperWiki.  You can find out more about ScraperWiki here and here – especially for those who don’t code.
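If you (or your developer) do code, a basic scrape of an HTML table needs nothing beyond Python’s built-in html.parser module. This is a rough sketch rather than a robust scraper: it assumes a simple table with explicit closing tags and no nested tables, and TableScraper and scrape_table are hypothetical names. You’d fetch the page’s HTML first (for example with urllib.request.urlopen), and it’s worth checking a site’s terms before scraping it.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumes explicit </td>, </th> and </tr> tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows in `html` as lists of cell strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For anything messier than a tidy table, a dedicated parsing library will save you a lot of edge-case handling.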

Surveys

This is a pretty simple one really.  You can create surveys using Mechanical Turk in the same way as Will’s Panda questionnaire.

If you’re using Mechanical Turk, there are some challenges you should be aware of with regard to sampling, i.e. are the people doing work via Mechanical Turk really representative of the intended population? But these kinds of objections can often be worked around by being upfront about where your data has come from. Don’t try to bury your sources: if people can’t find them, they won’t trust you, and if they have to seriously dig to get them, somebody will out you. Put them up front. Be very transparent.
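Sample size alone doesn’t answer the representativeness question, but it’s still worth knowing how much random sampling error your headline figures carry. Here’s a minimal sketch using the standard normal-approximation formula for a proportion, margin = z × sqrt(p(1 − p)/n). It assumes a simple random sample, which a Mechanical Turk panel isn’t, so it captures only random error and says nothing about bias from who chooses to take part.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion from a survey.

    p: observed proportion, e.g. 0.42 if 42% of respondents said "yes"
    n: number of respondents
    z: z-score for the confidence level (1.96 gives roughly 95%)

    Uses the normal approximation and assumes a simple random sample.
    It measures random sampling error only: it can't tell you whether
    your respondents represent the population you care about.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)
```

For example, with 1,000 respondents and a 50/50 split, margin_of_error(0.5, 1000) comes out at about 0.031, or roughly ±3 percentage points. Quoting a figure like this alongside your sources is another easy way to be transparent.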

The beauty of using survey data is that you can ask exactly what you want to ask. There’s nothing more frustrating than having a great idea, searching for hours for a dataset to support it, and then having to abandon the project.

Open Data

This is a huge one. Open data is a very hot topic, with more and more governments succumbing to pressure to open up their data. As an example of what you can do with it, the following graphic by 97th Floor was created using a publicly available data source, and Open Site Explorer shows 203 root domains linking to the page on which it appears (!).

Graphic: Where Does the Money Go? (97th Floor)

Rather than searching through various government datasets one by one, you can use the Guardian Datablog’s search engine, which lets you search open government data sources from around the world. And they’re continually adding to it as more and more countries open up their data.
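Many open data sources are published as CSV files, and a lot of the interesting stories (like where the money goes) come from simply totalling one column by category. The sketch below does that with Python’s csv module. The column names you pass in are assumptions you’d match to the dataset’s actual headers. Thousands separators are stripped, and any row whose amount still won’t parse (currency symbols, "n/a" and so on) is skipped and counted, so you can report how much data you left out.

```python
import csv
from collections import defaultdict


def total_by_category(csv_lines, category_col, amount_col):
    """Sum a numeric column for each category in an open-data CSV.

    csv_lines: an open file or any iterable of CSV text lines.
    category_col, amount_col: header names (hypothetical; match them to
    the dataset you're using).
    Returns (totals dict, number of rows skipped because the amount
    couldn't be parsed).
    """
    totals = defaultdict(float)
    skipped = 0
    for row in csv.DictReader(csv_lines):
        # Strip thousands separators such as "1,200" before converting.
        raw = (row.get(amount_col) or "").replace(",", "").strip()
        try:
            amount = float(raw)
        except ValueError:
            skipped += 1
            continue
        totals[row.get(category_col) or "unknown"] += amount
    return dict(totals), skipped


# Example usage with a downloaded file (file and column names are hypothetical):
# with open("spending.csv", newline="", encoding="utf-8") as f:
#     totals, skipped = total_by_category(f, "Department", "Amount")
```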

For other publicly available datasets, the following sites have some fairly extensive lists:

Academic Papers

In a similar vein to open data, academic papers and journals can be a great source of valuable information. The problem with academic papers is that they aren’t written for the public. They’re buried in the depths of the web and barely anyone outside academia reads them. They tend to be very dry and largely inaccessible to a general audience. But they often contain really valuable content; you just need to turn it into something appealing and easy to understand.

You’re not necessarily being rewarded for being the source of the information, but for digging it out and turning it into a much more consumable and enjoyable format. It might take a bit of effort, but that’s where you’re adding the value.

Another great thing about these papers and journals is they’ve been properly researched in an academic fashion.  And you’re quoting very respected sources, which will give your content added weight.  Nothing like quoting a few .edus to add some gravitas.

To find academic journals, try Google Scholar or SpringerLink.

Google

One massively overlooked data source, especially by SEOs, is our old friend Google. As well as providing lots of tools for processing data, they’re a useful source in their own right. For starters, they have this list of data sources you can explore. But there’s also the headsmackingly obvious: Google Insights and the Google Keyword Tool.

Yes, I’m serious.  Although we’re in a niche where everybody knows about them, the majority of the public still have no idea you can see behind Google and find out what everyone’s searching for and what the trends are.  When I first showed it to some of my friends, they were genuinely amazed.  

There are some really easy wins to be had without much effort at all. For examples of simple things you could do, check out these two posts by David McCandless. You could easily put together a quick and dirty press release on online trends that could get some decent coverage.

Graphic: Google Insights by David McCandless
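If you export search-interest data as a CSV, a quick first pass at a trends story is to rank terms by how much interest in them has risen. The sketch below assumes a simplified, hypothetical layout (one date column, then one column of relative interest per term); real exports may include extra header or footer lines you’d need to trim first. It compares the average of the first and last few periods rather than single data points, to smooth out week-to-week noise, and reports the change in index points rather than as a percentage, which avoids dividing by zero for terms that started at nothing.

```python
import csv


def rank_by_growth(csv_lines, date_col="Week", window=4):
    """Rank search terms by the rise in their relative interest.

    Assumes a simplified (hypothetical) CSV layout: a date column named
    `date_col`, then one numeric column per search term. Compares the mean
    of the first `window` rows with the mean of the last `window` rows and
    returns (term, change in index points) pairs, biggest rise first.
    """
    reader = csv.DictReader(csv_lines)
    rows = list(reader)
    if window < 1 or len(rows) < 2 * window:
        raise ValueError("need window >= 1 and at least 2 * window rows")
    terms = [name for name in reader.fieldnames if name != date_col]

    def mean(subset, term):
        return sum(float(row[term]) for row in subset) / len(subset)

    changes = {term: mean(rows[-window:], term) - mean(rows[:window], term)
               for term in terms}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)
```

The top and bottom of the resulting list are the obvious candidates for a "rising and falling" angle, though it’s worth eyeballing the full series before writing anything up.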

Client Data

Client data is ideal, but it can come with a few difficulties. The advantage of using client data is that you can announce something genuinely new, something that wasn’t previously in the public domain. However, there are a number of things to be aware of when using internal data:

  • Some companies will be reluctant to give you access, mostly due to concerns about competitive intelligence
  • There may be delays in getting the data to you, which can impede your ability to deliver on time
  • The data will often have missing entries and errors, and may even be completely unusable
  • The dataset may be too small to be reliable (especially when you start segmenting)

It’s worth raising these issues when you first discuss the possibility of using internal data, so you can manage expectations. If you do end up using the data, be careful not to overstate your findings. As mentioned previously, you should clearly state how you sourced your data, so as not to be misleading. As long as you do this, you can still create something worthwhile. It’s still a story, or at least it should be if you’re planning on putting it out there.
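Before committing to a client dataset, a couple of quick checks can flag missing entries and undersized segments early: how many values are missing in each column, and which segments have too few rows to say anything reliable about. Here’s a minimal sketch that works on rows as dictionaries (for example from csv.DictReader). The default threshold of 30 rows is only a common rule of thumb and is an assumption; choose one that suits what you’re trying to measure.

```python
from collections import Counter


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_by_column(records):
    """Count missing values per column across a list of dict rows.

    A column absent from some rows counts as missing in those rows.
    """
    columns = sorted({key for record in records for key in record})
    return {col: sum(is_blank(record.get(col)) for record in records)
            for col in columns}


def small_segments(records, segment_col, min_size=30):
    """Return {segment: row count} for segments with fewer than min_size rows.

    min_size=30 is an assumed rule-of-thumb default, not a hard threshold.
    """
    sizes = Counter(record.get(segment_col) for record in records)
    return {segment: n for segment, n in sizes.items() if n < min_size}
```

Running these on a sample of the data during that first conversation gives you something concrete to point to when managing expectations.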

Anything I’ve Missed?

So there you have it – you need never be short of data again.  But if there are any major sources of data you think I’ve missed, be sure to add them to the comments below.

