Almost two years ago, Safaricom Ltd extended the scratch card code from 12 digits to 16 in order to increase the computational time required to break the code, thereby making the cards more secure. However, systems theory acknowledges that systems expose their weaknesses at points of change. I set out to find out whether the move to higher dimensionality introduced a weakness in the scratch card’s hidden reload number. To begin the analysis, I formulated the following assumptions to guide me in the process.

- The grouping of the hidden reload number into blocks of four digits does not reveal the mechanics of the number generator.
- Increasing the hidden reload number by four digits provides more data for statistical analysis.
- The hidden reload numbers are separated into groups of four digits only for the purposes of ease of reading.
- The hidden reload number represents a 16 digit number generated by a random number generator.

With the assumptions in place, I set out to curate the data set; my collection of scratch cards *(448 in number)* came in handy. Understanding that each digit’s relevance depends on its position, I created a data set with 16 variables, each holding the digit at one position, plus an additional column holding the sum of the digits, as shown below.
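As a rough illustration of that layout, here is a minimal Python sketch (not the original workflow, which used WEKA and R). The function name `to_row`, the column names and the two sample codes are made up for this example; they are not real reload numbers.

```python
def to_row(code: str) -> list[int]:
    """Split a 16-digit reload number into 16 positional values plus their sum."""
    digits = [int(ch) for ch in code if ch.isdigit()]  # ignore the spaces between groups
    if len(digits) != 16:
        raise ValueError(f"expected 16 digits, got {len(digits)}")
    return digits + [sum(digits)]

# Hypothetical codes for illustration only.
sample_codes = ["1234 5678 9012 3456", "0000 1111 2222 3333"]

header = [f"d{i}" for i in range(1, 17)] + ["digit_sum"]
rows = [to_row(code) for code in sample_codes]

print(header)
for row in rows:
    print(row)
```

Each row then has 17 columns: one per digit position and a final `digit_sum` column.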

Now, here is where everything gets interesting: mapping the sum of digits produces a near-perfect normal distribution, as shown below. According to the Central Limit Theorem, the sum of *n* independent and identically distributed random variables tends to be normally distributed as *n* becomes sufficiently large. In layman’s language, the result is consistent with the digits being randomly generated (it does not prove it on its own), which supports my third and fourth assumptions.
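To make the Central Limit Theorem argument concrete, the sketch below works out what the digit sum should look like if all 16 digits were independent and uniform on 0 to 9 (an assumption for this illustration, not something the data set confirms), and compares it with a seeded simulation. Under that assumption the sum has mean 72 and variance 132, so a standard deviation of about 11.5.

```python
import random
import statistics
from fractions import Fraction

N_DIGITS = 16

# Exact moments of one uniform digit 0..9.
digit_mean = Fraction(sum(range(10)), 10)                               # 9/2
digit_var = Fraction(sum(d * d for d in range(10)), 10) - digit_mean ** 2  # 33/4

# For independent digits, means and variances simply add up.
expected_mean = N_DIGITS * digit_mean  # 72
expected_var = N_DIGITS * digit_var    # 132

# Simulated digit sums; the seed is fixed so the run is repeatable.
rng = random.Random(0)
sums = [sum(rng.randrange(10) for _ in range(N_DIGITS)) for _ in range(20000)]
sample_mean = statistics.fmean(sums)
sample_sd = statistics.pstdev(sums)

print(expected_mean, expected_var)
print(round(sample_mean, 2), round(sample_sd, 2))
```

A histogram of `sums` would show the familiar bell shape centred near 72. The reverse does not hold, though: a bell-shaped sum can also arise when some digits depend on others.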

Next, I asked the question: what if there is a pair of digits that is linearly or otherwise dependent? So I set my favorite software, WEKA, to find any rules within the data in a process known as association rule mining. Running the Apriori algorithm with default settings produced the results shown below:

From the results I knew I was onto something: there is a relation between the third and sixth digits with a confidence of 1 *(meaning the rule always holds in the data)*. To better understand the relation, I loaded the dataset into the R statistical analysis software and used the plot() function to visually inspect the relation between the two variables. The diagram below made me go Bazinga! It is a linear equation.
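For readers unfamiliar with association rules, the confidence of a rule "A implies B" is the fraction of rows matching A that also match B. The sketch below computes it in plain Python on a toy table. It is not WEKA's implementation, and the rows, column names `d3`/`d6` and the helper `rule_confidence` are hypothetical.

```python
def rule_confidence(rows, antecedent, consequent):
    """Return count(antecedent and consequent) / count(antecedent), or None if no row matches."""
    matching = [row for row in rows if antecedent(row)]
    if not matching:
        return None
    both = sum(1 for row in matching if consequent(row))
    return both / len(matching)

# Toy rows for illustration only (third and sixth digit of made-up codes).
toy_rows = [
    {"d3": 4, "d6": 3},
    {"d3": 4, "d6": 3},
    {"d3": 0, "d6": 9},
    {"d3": 7, "d6": 6},
    {"d3": 7, "d6": 2},  # deliberately breaks the pattern
]

conf_4_to_3 = rule_confidence(toy_rows, lambda r: r["d3"] == 4, lambda r: r["d6"] == 3)
conf_7_to_6 = rule_confidence(toy_rows, lambda r: r["d3"] == 7, lambda r: r["d6"] == 6)

print(conf_4_to_3)  # every row with d3 == 4 has d6 == 3
print(conf_7_to_6)  # only half of the rows with d3 == 7 have d6 == 6
```

A confidence of 1 therefore means no counterexample exists in the sample; it says nothing about cards outside the sample.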

If X > 0, Y = X - 1; otherwise Y = 9. Simply put, if the third digit on the scratch card is greater than 0, then the sixth digit is the third digit minus one, but if the third digit is 0, then the sixth digit is 9. Pick up any card and test the formula. In a cryptanalytic sense, I’ve broken part of the code used to generate the hidden reload number of Safaricom scratch cards.
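The two cases of the formula collapse into one expression, Y = (X - 1) mod 10. A minimal Python sketch of the check follows; the function names and the sample codes are made up for illustration and are not real reload numbers.

```python
def predicted_sixth_digit(third: int) -> int:
    """Apply the rule from the post: Y = X - 1 for X > 0, and Y = 9 for X = 0."""
    if not 0 <= third <= 9:
        raise ValueError("a digit must be between 0 and 9")
    return (third - 1) % 10  # Python's % wraps -1 around to 9

def satisfies_rule(code: str) -> bool:
    """Check whether a 16-digit code obeys the third/sixth digit relation."""
    digits = [int(ch) for ch in code if ch.isdigit()]
    if len(digits) != 16:
        raise ValueError(f"expected 16 digits, got {len(digits)}")
    return digits[5] == predicted_sixth_digit(digits[2])

# Hypothetical codes: the first obeys the rule (third 4, sixth 3), the second does not.
print(satisfies_rule("1240 5378 9012 3456"))
print(satisfies_rule("1234 5678 9012 3456"))
```

Running `satisfies_rule` over every row of the downloadable dataset is a quick way to look for counterexamples.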

Download the dataset here https://www.dropbox.com/s/dvkpoq35u9bmy2t/Hidden.csv?dl=0

Interesting

I like keeping it interesting 😉

You are such a genius! What do you do for a living?

Thank you once again, I work as a data scientist for iHub Research : http://ke.linkedin.com/in/chrisorwa/

Good to see someone use GNU R a rather cool data environment

Thanks, R is my preferred statistical analysis software given its granularity in undertaking minute operations and the awesome packages available to manipulate any type of dataset. Not to mention it’s open source.

how can I get your database folder

At the end of the article there’s a link to download the data in csv format.

OMG!!! Now I know how I can apply my statistics!!!! this is so fascinating!

Thanks, welcome to the world of statistical experiments where everything has a meaning and use 😉

what if the first digits is constant for example all starting values is 6? how will your formula be?

I haven’t undertaken analysis of the first digit but I suppose Benford’s Law applies. Read here on its use in analyzing scratch card data http://www.ihub.co.ke/blog/2011/12/insights-from-safaricom-trash/

for the first time ever i’ve seen someone who went to school and made the full use of the knowledge he aquired there,

the analysis was great and and precise

Thanks for the compliment, much appreciated. Trying to do the best with the knowledge we got.

hey dude…there is something called a check digit. i hear they are related in some way, and that digit is generated in order to form the final series of a scratch card..

That’s a good idea. From my understanding, check digits are only used in digital communication to verify that the bits sent are the bits received. I think it is worth exploring whether they are in use in cryptography.

Not just in communication but check digits are used to check that a generated number is valid, e.g a credit/debit card number (see Luhn Algorithm). However, in this particular case it’s only useful if you know the algorithm used to generate the numbers.

I’ll check the Luhn Algorithm; I understand check digits are also used in authenticating bank cheques. Thanks for the info.
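For reference, the Luhn check mentioned above can be sketched in a few lines of Python. This only illustrates how a check digit validates a number such as a payment card number; nothing in this thread establishes that the scratch card codes use Luhn or any other check digit. The function name `luhn_is_valid` is chosen for this example.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_is_valid("79927398713"))  # a commonly used valid test number
print(luhn_is_valid("79927398710"))  # same number with a wrong check digit
```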

And how do I use the dataset to generate a code?

And how do I use the dataset to generate a valid code?

You have to deploy data mining skills. Load the data set into statistical analysis software such as R, WEKA, Matlab or Excel and start playing around with formulas and algorithms.

nice.

Thanks.

Can’t even remember when I signed up here, seeing as my email ad is here, but I suppose a geek will be a geek. No offense.

That said, you’re quite the hack. How about we make you rich to boot? Herewith: crack the whole nine yards of the 16 digit code then take it to safaricom-they’ll buy your loyalty with a perky job for sure. Decline and opt for a consultancy-money’s better, plus no tie-downs. Register a firm to that end now, if you haven’t already.

If they don’t listen, look to the West via the net-don’t bother with the local arms of IT multinationals, for their staff will not let you outshine them to their bosses.

You can’t beat getting paid to work your love.

Thank you for the suggestions, I suppose you registered on the blog a while ago and perhaps forgot about it.

That said, when I get to crack the 16 digits I’ll apply for Nobel prize instead of paying Safaricom a visit. The underlying cryptic code is in use by numerous multinationals and its prize is worth more than what Safaricom can offer.

Dude you will be shot dead by Bob Collymo (conman) if u continue with ua madness.. crazy genius.

Really?, shot dead for showcasing a weakness in a system?

Yeah, by all means break the whole 16 digits code but watch out, they kill pipo, the majority of very bright pipo are actually dead, good brains kills (or rather gets u killed). The fellow who wrote the 1st Mpesa software was killed soon afta it caught on

Maybe in due course I will, but as good business practice I’ll not broadcast it; rather, I’ll offer it as a system security consultation to Safaricom.

Very impressive. very meticulously done and seeing as I am allergic to statistics this is a nice read. Blog subscribed pap.

I am interested to know who manufactures the cards for safaricom. is it an inhouse thing or contracted?

Thank you, I got a hunch the cards are manufactured by a contractor since it requires heavy printing work which isn’t a core Safaricom business.

You know what, i’ll take your work a step further. I have been working on fraud detection algorithms especially for mobile banking. I think I can play devil’s advocate and see how well to apply Benford’s Law to try and catch “un-natural numbers” in the sequence. I have a python implementation now in it’s beta phase.

That would be awesome, feel free to take the work as far as you can. Share any progress you make with Benford’s Law on fraud detection.

Ty, I had a feeling that I will find you here, but this is very interesting read……

Thank you.

Good job bro. … he he, I thought I helped!

Thank you, indeed you did help.

Nothing is random, not to a keen eye and brains. The beauty of mathematics 🙂

I concur, they are pseudo-random.

Dude you rock men i like the assumption man i love brilliant mind for sure

Thank you, much appreciated.

I do not know the A from Z on what you guys are speaking here about numbers, word is my thing. Let me tell the story when it happens, and this is amazing. Keep it up, and the best of luck.

Thank you, much appreciated. Numbers tell stories too 😉

R is wickedly cool and useful, especially for us CLI(non-GUI)ers! Great read blackorwa

Thank you, the coding capabilities in R make it a very potent analysis tool.

btw, can get u more scratch cards if you need. Ive got a box of about 600 so far

Thanks for the offer, I got loads of them. I even attempted a Guinness World Record for the largest collection of scratch cards. Read about it here : http://blackorwa.wordpress.com/2012/10/24/my-guinness-world-record-attempt/

my classmates should read this. had no idea how R can be applied. very informative!!

Thank you, feel free to pass the article to your classmates.

intresting stuff..al def work on it wen free..

Thank you.

A system is no stronger than its weakest link ….poke the holes and it will crumble ..good stuff

Thank you, perhaps in due course.

what about if the charging amount on the scratch card is included … could there be anything interesting that may come up?

I suppose not, since the number generation is not pegged on the value of the card.

Interesting!

Away from internal relationships between the digits, what if there is a correlation between the value of the scratchcard and one/a set of digits?

Thank you, I tested that out and there was no correlation.

Interesting, though the correlation between the value of the scratchcard, value of the scratchcards, serial number and expiry date of the Cards should have been included for a deeper analysis.

It is work in progress, the serial number and expiry date are already broken, read about it here : http://dobanafrica.com/blogs/?p=5

interesting analysis. good job. suppose you group the data according to the package say 20s, 50s, 100s etc and analyse separately?

Thanks, I tried that approach and it didn’t yield any results.

Great work dude. I thought of doing a similar thing but when I actually got down to work I become suddenly very lazy or as I justified it ‘very busy’. So props for putting in the effort and time.

Thank you, much appreciated.

interesting. if the numbers are randomly generated then what you need to know is the algorithms the random number generator and the seed number. Random by definition implies ‘lack any pattern’ I would thus be surprised if you found in meaningful or predictive pattern. But keep at it

You are a GENIUS! I’ll re-post this blog.. Keep it Interesting. 🙂

Thank you, I’m just a curious dude trying to do what he loves most – crunching numbers.

I mean I’ll re-post this blog post. 🙂

impresive work buddy, i like it honestly, its a sincere application of statistical knowledge and modelling outside school….i am glad you like many apreciate that R gui is the best statistical package available

Thank you, after fiddling with various statistical software, R provided the best alternative given its programmable nature. I have, however, adopted a work style of using various software in one project, since each has its own strengths and weaknesses.

Nice of you to put academic theory into practise.

Thank you, much appreciated.

Amazing work ..that research is remarkable

Thank you, much appreciated.

Chris this is awesome work, fascinating

Thank you.

Good Thinking and right approach. Congrats

Fascinating! I have some exposure to Monte-Carlo simulation using MATLAB, and applying my ‘skills’ on such a project would be interesting… are you willing to share your database of hidden numbers?

Thank you, I have provided a link at the bottom of the post where you can download the dataset. If you need additional data, drop your e-mail on the Get in Touch section and I’ll send it over.

great read, always interesting what great minds can do

Thank you, much appreciated.

Hey, seen your profile. Have you done any empirical analysis on stock market data before?

Hi Cyril,

Yes, I’ve undertaken analysis of stock market data and currently analyzing forex market data.

hee buda, tell me you went to Harvard or some pristine Korean school….I always thought I understood cryptography but this one is a whole new experince…..awesome man

Thank you, however I never graduated from Harvard or any Ivy League college.

This is an interesting blog. Subscribing.

Thanks

Omera yawa… You’re really a black orwa… This is amazing… Kudos..!

Thank you, much appreciated.

That was an awesome read man. Well, i am doing my final year in my electrical & electronics engineering and i must say it’s so sad that, very few graduate engineers opt to develop their passion or even apply their skill to get the real meaning of the underlying facts. What i am getting from the above excerpt is that you chose to be an hobbyist (allow me to use this term lightly) in statistical works. This is quite exceptional and need i say i’m impressed!! I think people need to put their minds and skills to task more intensively, no offence!

Thank you Mwongera, it is always a personal choice what to pursue, either as a hobby or a career. I hope you make a great engineer and contribute to open knowledge.

Don’t think you’re right:

Frequency Distribution:

1st number: 43 45 46 38 43 41 55 39 57 40

2nd number: 44 37 48 53 38 33 56 52 46 40

3rd number: 44 54 53 47 39 36 43 39 53 39

4th number: 44 47 54 47 62 44 48 33 33 35

5th number: 60 50 38 31 42 50 44 36 45 51

and so on…

you will notice it’s approximately equal NOT unequal the way you’ve presented in pairs. You are fitting data to a hypothesis… You have no evidence that pairs works. It could be tuples or 4-groups. I’d need a reason to believe your grouping is based on anything but a hunch.

Oh, and also, the space you’re “cracking” is 10^16 = 10,000,000,000,000,000. Given the size of the space the probability of collision on a single digit is: .0000000000000001. So it’s likely they’re just generating randomly as it’s cheap to do so and the probability of error is low. You are also using the parity check codes as part of your distribution. Given it’s functionally dependent on the other numbers this is not right.
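The "approximately equal" claim for the digit counts quoted above can be checked with a chi-square goodness-of-fit statistic against a uniform distribution. The sketch below does this in plain Python for the first-digit counts only, taken verbatim from the comment; the function name `chi_square_uniform` is made up, and 16.919 is the standard 5% critical value for 9 degrees of freedom.

```python
def chi_square_uniform(counts):
    """Chi-square statistic of observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# First-digit frequencies as quoted in the comment (digits 0..9).
first_digit_counts = [43, 45, 46, 38, 43, 41, 55, 39, 57, 40]

CRITICAL_5_PERCENT_DF9 = 16.919  # chi-square critical value, 9 degrees of freedom

statistic = chi_square_uniform(first_digit_counts)
print(round(statistic, 2))
print(statistic < CRITICAL_5_PERCENT_DF9)  # True means uniformity is not rejected at 5%
```

The same function can be applied to the other rows of counts. A uniform single digit does not rule out a dependence between two positions, which is what the post tests.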

Hi Slim,

Perhaps you misunderstood the blog; the frequency distribution wasn’t done on individual digits but on the sum of the digits, which appears as the last column in the attached image.

You can test the pair with any scratch card you pick up. I formulated a hypothesis and went ahead to test it out which turned out to be true.

From the onset of the blog post I mentioned my intention is to find a weakness in the system and not breaking the whole state space.

Huh! I like your reasoning, but I bet you could have given more statistical details, like margin of error for this hypothesis…

I want to try the same thing in SAS and see the output. Though I have basics of R, I don’t prefer it for data manipulation & mamangement, but I acknowledge its power in graphical presentation over SAS and given that it is a open source as compared to SAS which is damn expensive.

Hi Ba,

I abstracted the details so as not to bog everyone down with statistical jargon. Drop your e-mail on the Get In Touch page and I’ll provide you with more details of the analysis.

Thank you.

That’s amazing data analysis dude! Think you will have trouble justifying your hypothesis when you get to the sixteenth digit. However, you are too generous with your findings though.

Thank you, a hypothesis can always be disproved; we use hypotheses to guide analysis, and the process can be iterated or changed if the desired results are not obtained.

They say when you operate from a place of abundance you never mind sharing 😉

Why can’t you employ a number theorist. I’ve heard of one who made safaricom increase their codes from 12 to 16 digits

Number theory is a vast field with rich insights. I’ll give it a thought and a try, thanks for the heads up.

My head is spinning

I’ll put a stop on the spin 😉

@Orwa , YOU MEAN THE OTHER NOS ARE RANDOMLY SELECTED WITHOUT A PATTERN ‘CAN THEY BE RETRIEVED?

They are random to the “naked eye” but predictable to machines.

Well done BlackOrwa. Could you email me a method of competence to the system. An upgrade measure of your own. In fact, I would be glad to look into this much further as the prospects of it look really lucrative. With the right connections you could be top level management in any of the top four mobile telephony systems.

Hi Keith,

Thank you for the vote of confidence. As you have read on the blog, the analysis focuses on the weakness of the system rather than its strengths. A separate analysis would be required to test that, but I can share what I think about it; drop a message on the contact section and we’ll talk.

Thanks,

Orwa, this is really interesting….the main question is are the other 14 numbers random or is there a pattern?

There’s a pattern, soon enough they’ll be deciphered.

@BlackOrwa…Congra Man keep it up…Please so what’s the secret code they use…What about other Numberz?

Thank you Mack, the code is not fully broken, still work in progress.

hey,nice work was wondering if you have used the Monte Carlo stimulation, factoring the serial numbers if the scratch card and assigning it a variable and you could say run a permutation sequence of the all the digits and in theory you could generate the next batch of scratch cards in production… anyway just a thought.

how do i use the dataset

As you wish; it is the same scratch card reload code, with each digit in its own column.

hey Orwa,

i am still in high skul but am coming up with something ,what i would like to know is the software u are using …….please e-mail me if possible

WEKA and R.

This awesome I think I can be of help too pips… Got some good hacking skills

So, how can we collaborate.

Hi Kollo, av been wondering where to learn hacking. Do u mind sharing the clues you have maybe notes or videos, anything i’ll appreciate.

hey,there is a connection btn the first and the fifth digit,

the difference is plus or minus 3

I just counter checked that with the data and doesn’t seem to be correct.

hi orwa. av also got something 4 u.

sounds lyk another sophisticated mystery book..as in your works looks deeply researched on and well presented.congrats…hope to join you soon

Thanks Charles

kudos man. I like it. I’ve always read my scratch card and said, there is a huge relationship between the numbers, they literary are not just random! I once took 7digits of a 250 card an my seed in generating a random 16-digit code using r function rnorm() and got 100 codes. piloted a scatter plot and normalized the graph. picked randomly three outliers and two were valid. anyway, that was guts leading me then. I say do not fear a thing, if were not to get in, they should build it better!! never run q(yes)

Thanks.

where has this thread been my whole lifr. Black u gewd man.

Thanks.

hey Orwa is this things facts?

Yes, it is. You can test it out with other scratch cards.

Wow this is really interesting

Thanks

Our brother, that work of yours is excellent……I praise you greatly. I am here at KU with you; can we meet so that we share a small idea here, or not????

Left KU 5 years ago.

hi Chris. am doing some statistics and could kindly request for the kind of software you used . am impressed for that commendable statistical analysis you did. I would like to collaborate

I used R statistical programing language and WEKA machine learning software.

Thank you Chris.

Welcome

Hello guys..a’ve been able to get the relationship among the 1st 8 numbers ..any1 with a clue on the last 8?

Hello, share your results.

jst thinkin if yu add the series of 4 digits and get their difference from the two scratch card i have it generates a parten thou a complex one confirm this from your data

I’ll check that out.

Someone said that he found first 8 digits did it work?

Kindly share link to their analysis.

All I know is on the blog post.

Great Orwa, update on the progress. very much interesting

Thanks Ian. Still working on some new ideas, but none are paying off at the moment.

Hi mine is abit far from dat. I wanted 2 now how u can the R squire values of a set of several values at once in spss.

hey Orwa i just figured out another one…,the second digit gives u the fourth digit.when the second digit is less than 5 add five to it but wen its greater than5 subtract five from it.wen the second digit is five definately the fourth digit is 8.test with all scratch cards n u will proof mi right.

Hi Pinchez,

Unfortunately the formula doesn’t work on all scratch cards.

Cheers

Thank God that there are guys like you Chris who have the skills and the persistence to work through complex data such as this. Weldone!

Thanks Evans.

Omg Chris this is so true just looked at my scratch card just now

am a very big fan of code breaking ,, this is kwul

Thanks.

i dont know if it means sth or its normal_find the average of each card-find the average of all those averages-that average equals that of the average of the averages of each of the digits(eg all the first digits)…i hope someone gets me..though vaguely

Dude your so cool…now i know of the relationship between two digits…what of the others???? I’m tired of scratching the cards

Thanks, still working on the rest.

This is awesome dude… i can crack phone IMEI and tweak to redeem the free data bundles… im also interested in doing the same in Scratch Cards… keep me updated on any info sir..

Nice…how about cracking bundles

Perhaps some other time.

WTF! you mean its a fact?

Lemme try it out, have gat bamba 10. lol

It’s a fact, true statistics. Still working on the other 14 digits.

sir,,could you tell me which course should i undertake and in which university …..would like to speak the same language as you…..”weka,,R,,Sas,, blablabla…otherwise it won’t be amiss saying .u r great nigga..

also give me hints on what should i expect 2b .aft the course…thanks

cool kid. there is only one question. when i look at the equation i wonder how if you reverse the equation given a random digit, the result is always fascinating. so why not split the numbers and see the magic. Safaricom wishes we assume the number is one but actually there is more than one generator whose a logarithm is quit simple.

congrats! i will like u to inbox me the real procedure that u used to crack the credit cards.

Hi…. Enjoyed this blog …… Do you still go on with the analysis?

Yes, I still do.

Criasly i dont understand all this concepts..

I really like it and have a great interest in computer security, and ethical hacking, but please guys don’t take this for granted. Do it only for educational purpose, not anything else.

Thank you, it’s all ethical.

u really challenged me and i went on to find other more relationship between the digits and guess what am almost there…i jus need a couple of hours to reveal the whole thing!

That was big mental exploration….am an undergraduate in applied statistics and by at most 2years I will make this my main case study.

Go ye and explore. Thanks.

hey orwa, I think this is interesting, have you ever recharged your sim using this idea

Not yet.

can this work by used airtime scrach cards

Yes.

hello….I think I got it….after several months of analysis.I finaly found out the alogarithm behind generating the codes.

Do share your analysis.

Hey orwa since I first read this article I have been in its hant bt not yet yeild….any proceedings in your side or any more steps ….???

Hi Hulk,

Unfortunately I haven’t had the time to perform further analysis. It’s my hope that others can pick up from my analysis and make progress.

Regards

Share your analysis I have also broken another part after collection of 50 Safaricom used recharge cards

All my analysis is shared on this blog. Do share yours!

Hello guy I am currently doing my post graduate in Applied Statistics and have recently been wondering about the whole concept behind the randomization of the numbers used in credit cards as well as scratch cards. It is very awesome to find other people such as myself who are trying to break the code. I employ you to continue analyzing the whole concept and as I join in the research may we work towards success.

Thank you Michael for the encouraging words.


Not yet got the concept…….

Which access?

can i get the app to my phone i like this teachings .nice one.

Which app?

Blackowa you think the theory is still in place and working

Yes, it is.

based on all cards or just specific cards

Based on all scratch cards.

Ok Thank you..But i tried once and i could not make so i dont know why but may be i didnt get the theory well..May be you can explain to me Abit sir if you are willing to..i mean the third and sixth theory or may be another theory if there is

I have now discovered the sequence of the first eight digits of the scratch card.

step 1

take cards that are attached and of the same denomination.

step 2

compare digits 1,3 &6

step 3

compare digits 2 , 4 & 7

step 4

compare digits 5&8

wow it is interesting how it works you really a top rank genious ,please share the ideas with me.

So if I work out the digits that will be a valid credit?

Yes

help me understand please i have understood abt 3rd & 6th digits..

bt how can i now make a credt i mean hw to deal with the remaining digits

Hey Sterno help me please understand this

Since you posted this, they have acquisitioned an algorithm from a German Company and other safeguards such as regional 16- digit codes, minimising the number of attempts and only activating the cards once it has reached the destination. Attempting to bruteforce their security network is hard man. The Chinese tried and were caught. The data set is invalid at the moment.

Interesting developments

Waaoh! That’s really #Magical. But, I really like it Men. It’s on your way to have the equation, to meet the #Solution challenge.

It’s #Awesome

Keep Up.

gud wrk

Atlast I Got There, My First Way Out Worked, Am Currently Enjoying 500 Airtime

Great

genius!!

Haha Crazy Geeks in Kenya and yet they say Africans have no knowledge people like you should stand up and show what you can do… A flying car should originate from Kenya I see we have the right people.

hello! if i may ask what is the 3rd digit is 9? what will be the 6th digit??

It would be 8 (9 minus 1).

1. Discovered that the first digit is never zero. rand()%9 gives 0 to 8, and 1+rand()%9 gives 1 to 9; therefore the first digit is 1+rand()%9, so that the answer is never zero.

2. Third digit is always = 0-8, and formula is rand()%9 , for the difference between third and sixth has to be 1 or 9 . In other words,Probability of occurrence of 9 in the formula is 0. Mathematically, in examples;

a. 360/9=40 remainder 0.

b. 361/9=40 remainder 1.

c.362/9=40 remainder 2 ……………….369/9=41 remainder 0,

…….remainder 9 does not occur in any case.

and therefore,formula for sixth digit has to be 1+rand()%9 for probability of 9 to be more than not occurring in the sixth digit……………..more info later..
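The remainder arithmetic in the comment above can be checked with a short Python sketch. It only shows which values an expression of the form `offset + n % modulus` can take; it says nothing about how the reload numbers are actually generated, and the helper name `value_range` is made up for this illustration.

```python
def value_range(offset: int, modulus: int) -> tuple[int, int]:
    """Smallest and largest value of offset + (n % modulus) over many non-negative n."""
    values = {offset + (n % modulus) for n in range(1000)}
    return min(values), max(values)

print(value_range(0, 9))   # like rand() % 9: a remainder of 9 never occurs
print(value_range(1, 9))   # like 1 + rand() % 9: zero never occurs
print(value_range(0, 10))  # like rand() % 10: every digit 0..9 is possible
```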

