Data Analysis social media

Capturing Tweets with R

Capturing tweets for analysis in R has been a labor of love. For starters, R is open source statistical computing software, and perhaps the best in its league. Its prowess in analyzing textual data, and tweets in particular, is a thing to marvel at, so let’s get on with setting up R to capture and analyze tweets. First, register an app with Twitter; it’s pretty simple. Go to https://dev.twitter.com/ and create a new application. You need a Twitter account for this, of course. Once you are signed in, you should see the page below.

Create An Application

Good, you are doing well. Key in the details; you can make up the website and the Callback URL, as they aren’t important at the moment. After the app has been created, click on My Application and choose your app. This is where it starts to get interesting: you should be able to view the details of your app, with the important ones listed under the title OAuth Settings. For my app, these are the details.

Access level Read-only

Consumer key 8mzRs9PySHKmTcvXBcy5w
Consumer secret ZKNBKniG4ADfyk3tHCWQsj0wowapFpXhqoj8O4OnQQ
Request token URL https://api.twitter.com/oauth/request_token
Authorize URL https://api.twitter.com/oauth/authorize
Access token URL https://api.twitter.com/oauth/access_token
Callback URL None
Sign in with Twitter No

Now that you have the feeling you are doing something, it’s time to fire up R. The twitteR package contains the functions for querying Twitter’s servers. Download the package with install.packages("twitteR"), or if you already have it installed, simply load it with require(twitteR). Some versions of the package miss the dependency packages ROAuth and RCurl; if that’s the case, you’ll have to install them separately with install.packages("ROAuth") and install.packages("RCurl"). Then supply the authentication details from your app. Keep in mind that the authentication function lives in the ROAuth package, so if that package isn’t working the process will halt. Use the code shown below:

code
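The listing itself is missing from this copy of the post. What follows is only a minimal sketch of what such a setup usually looks like, assuming the OAuthFactory$new() constructor from ROAuth and the three URLs listed in the app settings above. The placeholder key and secret are hypothetical and must be replaced with your own; the name twitCred is taken from the handshake step that follows.

```r
# OAuth endpoints, copied from the app's OAuth settings
reqURL    <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL   <- "https://api.twitter.com/oauth/authorize"

# Placeholders (hypothetical): substitute the values shown for your own app
consumerKey    <- "YOUR_CONSUMER_KEY"
consumerSecret <- "YOUR_CONSUMER_SECRET"

# Build the credential object only if ROAuth is installed
if (requireNamespace("ROAuth", quietly = TRUE)) {
  twitCred <- ROAuth::OAuthFactory$new(
    consumerKey    = consumerKey,
    consumerSecret = consumerSecret,
    requestURL     = reqURL,
    accessURL      = accessURL,
    authURL        = authURL
  )
}
```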

That’s pretty much standard code. Note that the consumer key and consumer secret should be the ones provided for your app (I’m using mine). The next part is the tricky section and gave me some headaches. After providing the credentials via the OAuthFactory$new() function, a handshake has to be initiated between your app and the Twitter server. A handshake in computing is a preliminary exchange between two systems that sets the rules for the communication that follows. Here the exchange runs over HTTPS, so R needs a bundle of trusted CA certificates to verify the SSL certificate presented by the Twitter server. The straightforward way to get one is to download the bundle with this R code:

download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Then proceed with the code below to initiate the handshake.

twitCred$handshake(cainfo="cacert.pem")

Success at this stage prints a request for a PIN as plain text. Copy the URL and paste it into your browser.

To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=kxzyNUke8nBprcClN4BTipXqgWKKn27Xf7We1qPJZE
When complete, record the PIN given to you and provide it here:

This is what I got. Type the PIN back at the prompt shown above.

Ingokho PIN

We are almost there. Next, register the credentials; the function returns TRUE when all is well.

registerTwitterOAuth(twitCred)
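If registerTwitterOAuth() is not available in your copy of twitteR, that is likely because later releases of the package replace it with setup_twitter_oauth(), which takes the keys directly. The sketch below assumes such a release. The helper name authenticate_twitter and the environment variable names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: authenticate through twitteR's setup_twitter_oauth().
# The environment variable names are assumptions; pick your own.
authenticate_twitter <- function() {
  keys <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                       "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))
  # Bail out quietly if anything is missing instead of failing mid-session
  if (any(!nzchar(keys)) || !requireNamespace("twitteR", quietly = TRUE)) {
    message("Credentials or the twitteR package are missing; skipping.")
    return(invisible(FALSE))
  }
  twitteR::setup_twitter_oauth(keys[[1]], keys[[2]], keys[[3]], keys[[4]])
  invisible(TRUE)
}
```

Calling authenticate_twitter() once per session would then take the place of the handshake and registration steps.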

At this point I felt home was just a stone’s throw away, only to be hit with an error while trying to use the searchTwitter() function.

[1] "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) : 
  Error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

If this happened to you too, do not worry, I have the antidote. Set the SSL certificates globally using the code below.

library(RCurl) 

# Set SSL certs globally
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
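One thing worth checking: system.file() returns an empty string when the package or the file cannot be found, in which case the option above would point at nothing. A small defensive sketch of the same setting, where the variable name ca_path is my own:

```r
# Locate the CA bundle that ships with RCurl; system.file() returns ""
# when the package or the file cannot be found
ca_path <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(ca_path)) {
  # Same global setting as above, applied only when the bundle exists
  options(RCurlOptions = list(cainfo = ca_path))
  message("Using CA bundle at: ", ca_path)
} else {
  message("No bundled cacert.pem found; point cainfo at a downloaded copy instead.")
}
```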

Hooray!!! We’ve done it. Now we have the power to search, collect and analyze tweets. How about we do something interesting? Let’s download tweets from a trending topic and graph who the highest contributors to the topic are. Download and install the ggplot2 and plyr packages. Using the searchTwitter() function in the twitteR package, I captured 1000 tweets from the trending topic #TvYa13Million and graphed them. Here is the code that did all the magic: [PS: I didn’t exit my R session, this code is a continuation of the above]

library(ggplot2)
library(plyr)

TvTweets <- searchTwitter("#TvYa13Million", n = 1000)
# One row per tweet; ldply() names the single column V1
users <- ldply(TvTweets, function(x) x$screenName)
# geom_bar() counts rows per handle (geom_histogram() expects a continuous x)
ggplot(users, aes(x = V1)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("Count of tweets using #TvYa13Million") + xlab("Twitter handle")
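To rank the contributors rather than read them off the chart, the same kind of data frame can be tabulated in base R. The sketch below uses made-up handles standing in for the users data frame, so it runs without a Twitter connection; only the column name V1 is taken from the code above.

```r
# Hypothetical screen names standing in for the result of searchTwitter()
handles <- c("alice", "bob", "alice", "carol", "alice", "bob")
users_demo <- data.frame(V1 = handles, stringsAsFactors = FALSE)

# Count tweets per handle and sort from most to least active
counts <- sort(table(users_demo$V1), decreasing = TRUE)

# Keep the two most active handles
top <- head(counts, 2)
print(top)
```

With real data, replace users_demo with the users data frame and raise the cut-off in head() to however many handles you want to show.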

And there you go: TvYa13Million

Thank you for reading. You now have the power to capture tweets and run whatever analysis you like on them.

Adios!!!

2 comments

  1. Hi Chris,
    I am facing an issue in the final handshake step
    Cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
    I get the below error, could you pls help?
    Error in function (type, msg, asError = TRUE) : couldn’t connect to host

    I have installed all the necessary packages: tm, RTextTools, topicmodels, twitteR, httr, RCurl, ROAuth. The SSL certificate was also downloaded successfully, so I am not sure what causes the error.

  2. Hi Vivek,

    I think you are missing this part of the code before running your code:

    library(RCurl)

    # Set SSL certs globally
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

    Also don’t download the cacert file while initiating system handshake. Use the code below:

    download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
    twitCred$handshake(cainfo="cacert.pem")
