Almost two years ago, Safaricom Ltd extended the scratch card code from 12 digits to 16 in order to increase the computational time required to break the code, thereby making the cards more secure. However, systems theory acknowledges that systems expose their weaknesses at points of change. I set out to find out whether the move to higher dimensionality introduced a weakness in the scratch card's hidden reload number. To begin the analysis, I formulated the following assumptions to guide me in the process.
- The grouping of the hidden reload number into four digits does not reveal the mechanics of the number generator.
- Increasing the length of the hidden reload number by four digits provides more data for statistical analysis.
- The hidden reload numbers are separated into groups of four digits only for the purposes of ease of reading.
- The hidden reload number represents a 16 digit number generated by a random number generator.
With the assumptions in place, I set out to curate the data set; my collection of scratch cards (448 in number) came in handy. Since the relevance of each digit depends on its position, I created a data set with 16 variables, each holding the digit at one position, plus an additional column for the sum of the digits, as shown below.
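As a rough illustration of that layout, here is a minimal Python sketch that turns one reload number into 16 positional columns plus a sum column. The column names `d1` to `d16` and `sum`, and the helper name `code_to_row`, are my own assumptions for illustration and may not match the headers in the actual data set. The example number is made up and is not a real card.

```python
def code_to_row(code: str) -> dict:
    """Turn one 16-digit reload number into positional columns plus a digit sum."""
    # Keep only the digits, so the spaces between groups of four are ignored.
    digits = [int(ch) for ch in code if ch.isdigit()]
    if len(digits) != 16:
        raise ValueError(f"expected 16 digits, got {len(digits)}")
    # Hypothetical column names d1..d16, one per position.
    row = {f"d{i}": d for i, d in enumerate(digits, start=1)}
    row["sum"] = sum(digits)
    return row


# Made-up number for illustration only.
example = code_to_row("1240 5378 9012 3456")
print(example["d3"], example["d6"], example["sum"])  # 4 3 60
```

Applying a function like this to every card would give one row per card, which is the shape that tools such as WEKA and R expect.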
Now, here is where everything gets interesting: plotting the sum of digits produces a near-perfect normal distribution, as shown below. According to the Central Limit Theorem, the sum of n independent and identically distributed random variables tends to be normally distributed as n becomes sufficiently large. In layman's language, this is what we would expect to see if the digits are indeed randomly generated, which supports my third and fourth assumptions.
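To make the comparison concrete, the sketch below computes the exact distribution of the sum of 16 digits under the assumption that each digit is independent and uniform on 0 to 9. That assumption is the hypothesis being tested, not something known about the real generator, and the function name `digit_sum_distribution` is mine. Under it, the sum has mean 72 and variance 132 (a standard deviation of roughly 11.5), which is what an observed histogram of digit sums could be compared against.

```python
from fractions import Fraction


def digit_sum_distribution(n_digits: int = 16) -> list:
    """Exact probabilities of each possible sum of n independent uniform digits 0-9."""
    dist = [Fraction(1)]  # the sum of zero digits is 0 with probability 1
    for _ in range(n_digits):
        # Convolve the current distribution with one more uniform digit.
        new = [Fraction(0)] * (len(dist) + 9)
        for s, p in enumerate(dist):
            for d in range(10):
                new[s + d] += p / 10
        dist = new
    return dist


dist = digit_sum_distribution(16)
mean = sum(s * p for s, p in enumerate(dist))
variance = sum((s - mean) ** 2 * p for s, p in enumerate(dist))
print(float(mean), float(variance))  # 72.0 132.0
```

A bell-shaped histogram centred near 72 is consistent with random digits, but it cannot rule out dependence between individual positions, since a fixed relation between two digits barely changes the shape of the sum.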
Next, I asked the question: what if there is a pair within the digits that is linearly or otherwise dependent? So I set my favorite software, WEKA, to find any rules within the data, in a process known as association rule mining. Running the Apriori algorithm with default settings produced the results shown below:
From the results I knew I was onto something: there is a relation between the third and sixth digits with a confidence of 1 (meaning the rule always holds in this data set). To better understand the relation, I loaded the data set into the R statistical analysis software and used the plot() function to visually inspect the relation between the two variables. The diagram below made me go Bazinga! It is a linear equation.
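For readers unfamiliar with the term, the confidence of a rule "A implies B" is the fraction of rows matching A that also match B. The sketch below computes it by hand in Python. It is not WEKA's implementation, the helper name `rule_confidence` and the column names `d3` and `d6` are assumptions, and the rows are toy values invented for illustration rather than taken from the data set.

```python
def rule_confidence(rows, antecedent, consequent):
    """Confidence of 'antecedent -> consequent': rows matching both / rows matching antecedent."""
    a_col, a_val = antecedent
    c_col, c_val = consequent
    covered = [r for r in rows if r[a_col] == a_val]
    if not covered:
        return None  # the rule never applies, so confidence is undefined
    hits = sum(1 for r in covered if r[c_col] == c_val)
    return hits / len(covered)


# Toy rows invented for illustration only.
rows = [
    {"d3": 4, "d6": 3},
    {"d3": 4, "d6": 3},
    {"d3": 0, "d6": 9},
    {"d3": 7, "d6": 6},
]
print(rule_confidence(rows, ("d3", 4), ("d6", 3)))  # 1.0
```

A confidence of 1 means every row that matches the left-hand side also matches the right-hand side. It says nothing about how many rows the rule covers, which is what support measures.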
If X > 0, Y = X - 1; otherwise Y = 9. Equivalently, Y = (X - 1) mod 10. Simply put, if the third number in the scratch card is greater than 0, then the sixth number is the third number minus one, but if the third number is 0, then the sixth number is 9. Pick up any card and test the formula. In a cryptanalytic sense, I’ve broken part of the code used to generate the hidden reload number of Safaricom scratch cards.
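The rule is easy to check mechanically. Here is a minimal Python sketch, where the function names `predicted_sixth` and `rule_holds` are mine and the sample numbers are made up for illustration rather than taken from real cards:

```python
def predicted_sixth(third: int) -> int:
    """Sixth digit predicted from the third: X - 1 if X > 0, otherwise 9."""
    return (third - 1) % 10  # in Python, (-1) % 10 evaluates to 9


def rule_holds(code: str) -> bool:
    """Check whether a 16-digit reload number satisfies the third/sixth digit rule."""
    digits = [int(ch) for ch in code if ch.isdigit()]  # ignore spaces between groups
    if len(digits) != 16:
        raise ValueError(f"expected 16 digits, got {len(digits)}")
    return digits[5] == predicted_sixth(digits[2])


# Made-up numbers for illustration only.
print(rule_holds("1240 5378 9012 3456"))  # True: third digit 4, sixth digit 3
print(rule_holds("1200 5978 9012 3456"))  # True: third digit 0, sixth digit 9
print(rule_holds("1234 5678 9012 3456"))  # False: third digit 3, sixth digit 6
```

Running a check like this over every row of the data set is a quick way to confirm that the rule has no exceptions.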
Download the data set here: https://www.dropbox.com/s/dvkpoq35u9bmy2t/Hidden.csv?dl=0