Geolocation on Twitter is opt-in, and most Kenyans, and by extension most users worldwide, geotag only 1% of their tweets. This is a serious problem for applications that hope to build on Twitter’s location data. I ran into it while volunteering for #whatisaroad, a project that maps potholes in Nairobi via Twitter. The Twitter users who were voluntarily sending pothole photos weren’t geotagging their tweets. That made the tweets extremely difficult to map: a manual reverse geocoding exercise had to be undertaken, and it wasn’t always accurate. So we asked ourselves how to encourage people to geotag their Twitter reports. We included an instruction to geotag each report, but almost all reports still arrived without geotags.
With this problem in mind, I decided to look into how to encourage people to geotag tweets. Nudge theory, a concept brought to prominence by Richard Thaler, holds that nudges are at least as effective as direct instruction, legislation, or enforcement, if not more so. A nudge is a piece of choice architecture that alters people’s behaviour in a predictable way without economic incentives. Can we nudge people to geotag tweets? Let’s find out. To set up the experiment, I gathered 3 million tweets from 767 Kenyans on Twitter (KOT). The tweets were divided into two groups, each a random sample, to cater for differences in account age, location, number of followers, and gender.
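As a rough illustration of the random split, here is a minimal sketch in Python. It assumes the split is done by shuffling and cutting the list in half; the `split_in_two` helper, the seed, and the `user_…` identifiers are all made up for the example and are not the actual accounts or code used.

```python
import random


def split_in_two(items, seed=0):
    """Shuffle a copy of `items` and return two halves of (almost) equal size."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Hypothetical identifiers standing in for the 767 accounts.
accounts = [f"user_{i}" for i in range(767)]
group_a, group_b = split_in_two(accounts, seed=42)
print(len(group_a), len(group_b))  # 383 384
```

Because the assignment is random, differences such as follower count or location should be spread roughly evenly across the two groups.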
Next, we create a dataset for predicting whether a tweet will be geotagged or not. We use the content of the tweet as the sole data source and treat every unique word as a variable. The algorithm of choice is J48, a decision tree classification algorithm (Weka’s implementation of C4.5) that is well suited to binary classification problems. The diagram below shows the resulting model for the first group.
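To make the "every unique word is a variable" idea concrete, here is a small sketch of how such a dataset could be built. It is not the actual preprocessing pipeline: the `tokenize` and `build_dataset` helpers, the tokenisation rule, and the three example tweets are assumptions invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping '…' as its own token."""
    return re.findall(r"[a-z0-9#@']+|…", text.lower())


def build_dataset(tweets):
    """Return (vocabulary, rows); each row is ([0/1 per vocabulary word], label)."""
    tokenized = [(set(tokenize(text)), geotagged) for text, geotagged in tweets]
    vocabulary = sorted(set().union(*(tokens for tokens, _ in tokenized)))
    rows = [
        ([1 if word in tokens else 0 for word in vocabulary], geotagged)
        for tokens, geotagged in tokenized
    ]
    return vocabulary, rows


# Made-up tweets with made-up labels (True means the tweet was geotagged).
tweets = [
    ("Huge pothole on Ngong Road", True),
    ("so this happened…", False),
    ("Stuck in traffic again…", False),
]
vocabulary, rows = build_dataset(tweets)
print(vocabulary)
print(rows[1])
```

Each row can then be handed to a decision tree learner, with the 0/1 columns as inputs and the geotagged flag as the class to predict.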
The root node of a decision tree is usually the most important variable in making a decision, as measured by information gain, a quantity based on mutual information. In our case, the root node is a weird character sequence, <e2><80><a6>. At first I thought it was a punctuation mark that had slipped through the text processing phase, so I decided to look it up. It turns out to be the UTF-8 byte sequence for the Unicode punctuation mark known as the ellipsis (three dots). Accidentally keeping this punctuation mark (…) led to a startling discovery.
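Both points can be checked with a few lines of Python. The sketch below first decodes the three bytes, then computes plain information gain on a toy dataset. The toy values are invented, and J48 itself ranks splits by gain ratio, a normalised variant of information gain, so treat this as an illustration of the idea rather than a reproduction of the model.

```python
import math
from collections import Counter

# The bytes e2 80 a6 are the UTF-8 encoding of U+2026, the one-character ellipsis.
ellipsis = bytes.fromhex("e280a6").decode("utf-8")
print(ellipsis == "\u2026")  # True


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [label for f, label in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Toy data, invented for the example: six tweets, two word variables.
has_ellipsis = [1, 1, 1, 0, 0, 0]
has_pothole = [1, 0, 1, 0, 1, 0]
geotagged = [False, False, False, True, True, False]

print(information_gain(has_ellipsis, geotagged))
print(information_gain(has_pothole, geotagged))
```

In this toy data the ellipsis variable has the larger gain, so a tree learner would place it at the root.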
An ellipsis is a punctuation mark, usually formed by three dots, that shows where words have been left out. Tweets that contained an ellipsis were almost never geotagged. Wikipedia describes five major uses of an ellipsis, namely:
- Indicate an unfinished thought
- As a leading statement
- A slight pause
- An echoing voice
- A nervous or awkward silence
Image by Kyle Simpson
When folks want to convey one of the above five sentiments, they tend not to geotag their tweets. Let’s look at some examples.
To check whether the ellipsis really does indicate that a tweet will not be geolocated, I built another prediction model on the other half of the dataset. The results are shown below.
We can observe that the ellipsis is ranked third in this decision tree and is still a strong predictor that a tweet will not be geolocated.
Herein lies a nudge experiment: a randomized controlled trial on Kenyans On Twitter. Create two groups: a treatment group that is asked to tweet without using the ellipsis, and a control group that is allowed to use it as usual. The aim of the experiment is to measure whether the treatment group has more geotagged tweets than the control group. If it does, then discouraging the use of the ellipsis can nudge a Twitter user to geolocate tweets.
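One possible way to compare the two groups afterwards is a two-proportion z-test on the share of geotagged tweets. The sketch below is only a suggestion, not an analysis plan from the experiment: the function name and the counts are hypothetical, and the test treats every tweet as independent, which ignores the fact that many tweets come from the same user.

```python
import math


def two_proportion_z_test(tagged_a, total_a, tagged_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = tagged_a / total_a
    p_b = tagged_b / total_b
    # Pooled proportion under the null hypothesis that both groups geotag equally.
    pooled = (tagged_a + tagged_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: group A avoids the ellipsis, group B tweets as usual.
z, p_value = two_proportion_z_test(tagged_a=30, total_a=1000, tagged_b=10, total_b=1000)
print(round(z, 2), round(p_value, 4))
```

A small p-value would suggest that the difference in geotagging rates between the two groups is unlikely to be due to chance alone.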
Who wants to be part of the experiment?