Get bulk social network counts for thousands of URLs: free & (almost) no programming

How to crawl a website, fetch social data for every URL, and export it to Excel. And best of all: do it yourself, do it for free, and do it with (almost) no programming.


Let's assume you want to know the social sharing data for a large number of URLs. Think of the number of Tweets, Facebook Likes, Reddit submissions, Stumbles, Pins, Delicious bookmarks, Google +1s, etc. You might, for instance, want to analyze which networks work best for your site, or even for sections of your site (or your competitors' sites). Useful intel, I'd say.

DIY: 6 steps to social analytics heaven

There are several ways to fetch this data, from paid services that do it for you to Google spreadsheets. In this article, however, I will show you an additional way to fetch all this data yourself: free, reliable, and easy to modify. It's an added bonus that you actually learn how to obtain this kind of data; in my book, that is a big deal. So let's get started.

Step 1: get a list of URLs

The first step is to actually get the list of URLs you want the social data for. There are several ways to obtain this, depending on what you want to analyze. To stay with my example: if I needed all URLs of a domain, I could use a spider like ScreamingFrog (paid), Xenu (free, Windows) or Integrity (free, Mac). In my example, I used ScreamingFrog to crawl 5000 URLs of State of Search, of which 3870 were HTML pages:

ScreamingFrog interface: spidering StateofSearch.com
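If you'd rather script this step, here is a minimal Python sketch that reads a site's sitemap.xml instead of crawling it. This is an alternative to the spider route above, and it assumes the site actually publishes a sitemap at the usual location:

import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the site publishes a sitemap at the usual location.
SITEMAP = "http://www.stateofsearch.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as response:
    tree = ET.fromstring(response.read())

# Collect every <loc> entry from the sitemap.
urls = [loc.text for loc in tree.findall("sm:url/sm:loc", NS)]
print(len(urls), "URLs found")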

Step 2: export the URLs

The next step is to export the URLs to a .csv file so I can use them for further analysis.
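In ScreamingFrog this is a single export button. If you scripted step 1 instead, writing the list to a .csv takes a few lines more (continuing the sketch above; the 'Address' header mimics a ScreamingFrog export, so treat the column name as an assumption):

import csv

# Write the URLs with a header row, so Refine can take the column
# name from the first line of the .csv (see step 3).
with open("urls.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Address"])  # assumed column name
    writer.writerows([url] for url in urls)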

Step 3: create a project in Google/Open Refine

I love this tool. Google Refine (soon to be renamed 'Open Refine') is an open source tool for working with 'messy data'. You can do all kinds of really cool stuff with datasets to make them better, for example enriching the data by fetching remote data from APIs, with your cells' contents as input. But first things first: create a project by importing the export we created earlier:

Creating a project in Refine: import a .csv

Refine imports the .csv and shows you a preview. Make sure that Refine reads the import correctly: check how the columns are separated (in this case, choose 'comma'), which character encoding to use ('UTF-8'), and how to name the columns (usually take the names from the first row of the .csv):

Set the correct options in Refine

If you set it up correctly, your preview screen will now show something like this:

Refine auto-refreshes previews before you start the import
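For reference, those same three parsing choices expressed in Python with pandas (just an illustration of what Refine is doing under the hood, not part of the click-through workflow):

import pandas as pd

# Comma as separator, UTF-8 encoding, column names from the first row.
df = pd.read_csv("urls.csv", sep=",", encoding="utf-8", header=0)
print(df.head())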

Give your project a name, and click the 'Create project' button. Forward! Your project is created and (kinda) looks like an Excel sheet. Now we have imported the URLs, but of course we still haven't enriched them with our social data. That is the next step.

Step 4: fetch social data for every URL

SharedCount.com provides a cool and free service: you can query it up to 50,000 times a day (update: register and use an API key for that) and obtain the publicly available share counts for a number of social networks, for every URL you throw at it. Cool!


The response to every request is a small JSON object. If I request this for an example URL, e.g. the homepage of stateofsearch.com, I get this result:

Sharedcount data for the homepage of StateofSearch.com
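If you want to reproduce such a single request outside Refine, here is a minimal Python sketch. It uses the endpoint and apikey parameter described below; replace YOUR_APIKEY with the key you receive after registering:

import json
import urllib.request

api = ("http://api.sharedcount.com/"
       "?url=http://www.stateofsearch.com/&apikey=YOUR_APIKEY")
with urllib.request.urlopen(api) as response:
    data = json.loads(response.read().decode("utf-8"))

print(data)  # the full JSON object, with one key per network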

But we need thousands of results. So let's fetch that data!

That's easy. Just click the button at the top of the column that contains all URLs. This opens the menu with actions for that column. Choose 'Edit column' -> 'Add column by fetching URLs …' (see the Open Refine Wiki for a good introduction to this functionality):

Fetching external API results, with the URL column as input

The URLs we'll be fetching are http://api.sharedcount.com/?url={url}&apikey={apikey}, where {url} is the value of the cell the new column is based on and {apikey} is the API key you receive when you register. The expression we use is:

'http://api.sharedcount.com/?url=' + value + '&apikey={apikey}'
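So for a cell containing http://www.stateofsearch.com/, the expression evaluates to:

http://api.sharedcount.com/?url=http://www.stateofsearch.com/&apikey={apikey}

(with {apikey} replaced by your actual key, of course).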

Also make sure to lower the throttle delay (the default is to wait 5000 milliseconds between requests, which for 3870 URLs alone adds up to more than five hours):

Settings for Fetching Remote Content
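Under the hood, this fetch is nothing more than one HTTP request per row with a pause in between. In Python terms, a rough sketch (reusing the urls list from the step-1 sketch; the 500 ms delay is an assumption, so pick whatever the API tolerates):

import json
import time
import urllib.parse
import urllib.request

results = {}
for url in urls:  # the list from step 1
    api = ("http://api.sharedcount.com/?url="
           + urllib.parse.quote(url, safe="")
           + "&apikey=YOUR_APIKEY")
    with urllib.request.urlopen(api) as response:
        results[url] = json.loads(response.read().decode("utf-8"))
    time.sleep(0.5)  # throttle delay between requests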

and now, we wait … for 3870 URLs times multiple social APIs worth of results … You'd better grab a cup of coffee (or 15), because we're now crunching a lot of requests.


After a long wait (it's worth it!) we can see the results. A new column has been added that contains a JSON object for every URL:

Results: sharedcount data for StateofSearch.com

Step 5: from a single JSON object to several columns

The next step is to extract the correct values from the JSON objects, column by column, social network by social network. For that, select 'Edit column' -> 'Add column based on this column …':

Extract the json results

and after that, use the parseJson() function to extract a value, for example for Twitter:

value.parseJson()["Twitter"]

Just end the expression with the name of the key you want the value for. In the case of nested objects, simply chain the keys, for example:

value.parseJson()["Facebook"]["total_count"]

to fetch the 'total_count' value within the Facebook object.

Anyway, rinse and repeat for every network. To help you, here are the expressions I used, all lined up:

value.parseJson()["StumbleUpon"]
value.parseJson()["Reddit"]
value.parseJson()["Facebook"]["total_count"]
value.parseJson()["Delicious"]
value.parseJson()["GooglePlusOne"]
value.parseJson()["Buzz"]
value.parseJson()["Twitter"]
value.parseJson()["LinkedIn"]

Step 6: export to Excel for further analysis

The results are in, so now we can export them for further analysis … done! Now the fun stuff starts ... :)

Export to Excel for further fun ... :)
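And if you ever outgrow the point-and-click route: steps 5 and 6 condensed into a Python sketch. The file and column names here are assumptions, and the Excel writer needs openpyxl installed:

import json
import pandas as pd

# Assumption: a Refine export with the raw API responses in a "json" column.
df = pd.read_csv("refine_export.csv")
parsed = df["json"].apply(json.loads)

# Pull one column per network, mirroring the expressions in step 5.
df["Twitter"] = parsed.apply(lambda d: d.get("Twitter"))
df["Facebook_total"] = parsed.apply(
    lambda d: d.get("Facebook", {}).get("total_count"))

df.to_excel("social_counts.xlsx", index=False)  # requires openpyxl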

Make it so!

So you see that it's actually quite easy to obtain data from these APIs, without having to code a parser, save everything to a database, validate it all, and so on. You can do this yourself, in a reliable way, with free software. Cool, huh? ;)

