It’s been a labor of love trying to capture tweets for analysis in R. For starters, R is open-source statistical computing software, and perhaps the best in its league. Its prowess in analyzing textual data, tweets in particular, is a thing to marvel at, so let’s get on with setting up R to capture and analyze tweets. First, register an app with Twitter; it’s pretty simple. Go to https://dev.twitter.com/ and create a new application. You need a Twitter account for this, and once you are signed in you should see the page below.
Good, you are doing well. Key in the details; you can make up the website and the Callback URL, as they aren’t important at the moment. Once the app is created, click on My Application and choose your app. This is where it starts to get interesting: you should be able to view the details of your app, with the important ones listed under the title OAuth Settings. For my app, these are the details.
Now that you have the feeling you are doing something, it’s time to fire up R. The twitteR package contains the functions for querying Twitter’s servers. Download the package with install.packages("twitteR"), or if you already have it installed, simply load it with require(twitteR). Some versions of the package miss the dependency packages ROAuth and RCurl; if that’s the case, install them separately with install.packages("ROAuth") and install.packages("RCurl"). Then supply the authentication details from your app. Keep in mind that the ROAuth package provides the authentication function, so the process will halt if that package isn’t working. Use the code shown below:
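The snippet itself is not reproduced in this text, so here is a minimal sketch of what this setup usually looks like. It assumes the ROAuth OAuthFactory generator and Twitter’s OAuth 1.0a endpoints; the key and secret strings are placeholders rather than real credentials, and the name twitCred is just the variable name used in the rest of this walkthrough.

```r
library(twitteR)
library(ROAuth)

# Placeholders: replace with the values listed under OAuth Settings for your app
consumerKey    <- "YOUR_CONSUMER_KEY"
consumerSecret <- "YOUR_CONSUMER_SECRET"

# Twitter's OAuth 1.0a endpoints
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL  <- "https://api.twitter.com/oauth/access_token"
authURL    <- "https://api.twitter.com/oauth/authorize"

# Build the credential object; nothing is sent to Twitter yet
twitCred <- OAuthFactory$new(consumerKey    = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL     = requestURL,
                             accessURL      = accessURL,
                             authURL        = authURL)
```

Creating the object only stores the settings; the actual exchange with Twitter happens later, during the handshake.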
That’s pretty much standard code. Note that the consumer key and consumer secret should be the ones provided for your app (I’m using mine). The next part is the tricky section and gave me some headache. After providing the credentials via the OAuthFactory$new() function, a handshake has to be initiated between your app and the Twitter server. A handshake in computing is a preliminary exchange between two systems that sets the rules of communication. In this case the exchange runs over SSL, so R has to verify the digital certificate that the Twitter server presents, and for that it needs a file of trusted certificates. The straightforward and faster way around this is to first download the certificate with the R code:
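The download step is not shown in this text. A minimal sketch, assuming the CA bundle published by the curl project is the certificate file meant here (the URL and the local file name cacert.pem are my assumptions):

```r
# Fetch the CA certificate bundle and save it in the working directory.
# Assumption: the curl project's bundle is the certificate file intended here.
download.file(url = "https://curl.se/ca/cacert.pem", destfile = "cacert.pem")
```

The file lands in the current working directory, which is where the later steps will look for it.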
Then proceed with the code below to initiate the handshake.
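The handshake call is not reproduced here either. A sketch, assuming twitCred is the object returned by OAuthFactory$new() earlier in the same session and cacert.pem is the certificate file saved in the working directory:

```r
# handshake() prints a URL and then waits for a PIN,
# so it only makes sense in an interactive R session.
# Assumes `twitCred` and "cacert.pem" already exist from the earlier steps.
if (interactive()) {
  twitCred$handshake(cainfo = "cacert.pem")
}
```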
If this succeeds, R prints a URL and prompts for a PIN in plain text. Copy the URL and paste it into your browser.
To enable the connection, please direct your web browser to:
When complete, record the PIN given to you and provide it here:
This is what I got. Type the PIN back at the prompt shown above.
We are almost there. Next, register the credentials; the function returns TRUE when all is well.
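The registration call is not shown in this text. A sketch, assuming an older twitteR release that still provides registerTwitterOAuth() (newer releases handle authentication differently) and a twitCred object that has already completed the handshake in this session:

```r
library(twitteR)

# Assumption: this twitteR version still supports registerTwitterOAuth().
# `twitCred` must come from the same session, after the PIN step.
if (exists("twitCred")) {
  registerTwitterOAuth(twitCred)
}
```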
At this point I felt home was just a stone’s throw away, only to be hit with an error while trying to use the searchTwitter() function.
 "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
Error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
If this happened to you too, do not worry, I have the antidote. Set the SSL certificate globally using the code below.
# Set SSL certs globally
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
Hooray!!! We’ve done it. Now we have the power to search, collect and analyze tweets. How about we do something interesting? Let’s download tweets from a trending topic and graph who the highest contributors to the topic are. Download and install the ggplot2 and plyr packages. Using the searchTwitter() function in the twitteR package, I captured 1000 tweets from the trending topic #TvYa13Million and graphed them. Here is the code that did all the magic: [PS: I didn’t exit my R session, so this code is a continuation of the above]
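library(plyr); library(ggplot2)  # ldply() and ggplot()
TvTweets <- searchTwitter("#TvYa13Million", n = 1000)
# One row per tweet; ldply() names the single column V1
users <- ldply(TvTweets, function(x) x$screenName)
# geom_bar() counts tweets per handle (geom_histogram() expects a continuous x in newer ggplot2)
ggplot(users, aes(x = V1)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Count of tweets using #TvYa13Million") + xlab("Twitter handle")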
And there you go:
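For a quick numeric check of the same ranking, the handles can also be tabulated in base R. This is a small sketch on made-up handles standing in for the V1 column of users; the names are hypothetical, not data from the actual search.

```r
# Hypothetical handles standing in for users$V1
handles <- c("@alice", "@bob", "@alice", "@carol", "@alice", "@bob")

# Count tweets per handle and sort from most to least active
counts <- sort(table(handles), decreasing = TRUE)

# Show the top two contributors
top <- head(counts, 2)
print(top)
```

With real data, replacing handles with users$V1 should give the same counts that the bar chart displays.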
Thank you for reading. You now have the power to capture tweets and everything you need to start analyzing them.