Data Analysis

Building the Kenyan County Wellness Index

After the new constitution of Kenya created 47 counties in August 2010, there was a need to rank counties for purposes of resource allocation. The Commission on Revenue Allocation (CRA) was mandated to construct the formula for revenue sharing but fell short of what was expected. Before commencing resource allocation, it is prudent to first rank counties in terms of development (or wellness), and not simply use the poverty index in place of a ‘development index’ as the CRA did. As part of Doban Africa, we therefore undertook to construct a compound index that incorporates development indicators such as the poverty index, electricity connection, population, percentage of educated population, water supply and the number of families with solar power, among others. The technical details are shown below.

Raw Data

 

We took a data mining approach and collated data from the Kenyan government open data portal that could serve as development indicators. We then used WEKA (Waikato Environment for Knowledge Analysis) data mining software to sift through the data. Out of 15 input fields, 5 showed the strongest correlation with the output. The correlation coefficient measures the degree of correlation between the actual values and the values estimated by the model. The algorithm chosen to construct the index is M5 Prime (M5P), which here produces a model that is a linear function (a weighted sum) of the input variables. The first step generates a regression tree from the training data and then fits a linear model (using linear regression) at each node of the tree. The second step tries to simplify the tree by dropping attributes from the linear models, and pruning nodes, wherever doing so does not increase the estimated error.
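The tree-growing step can be illustrated with the split criterion usually described for M5-style trees, standard deviation reduction (SDR): a split is good when the target values on each side vary much less than they did together. The sketch below is not WEKA's code; the function names `sd_reduction` and `best_split` are made up for illustration, and it only looks at a single numeric attribute.

```python
from statistics import pstdev

def sd_reduction(targets, left, right):
    """Standard deviation reduction from splitting `targets` into `left` and `right`."""
    n = len(targets)
    # Weighted standard deviation of the two sides; empty sides contribute nothing.
    weighted = sum(len(part) / n * pstdev(part) for part in (left, right) if part)
    return pstdev(targets) - weighted

def best_split(values, targets):
    """Return (threshold, sdr) for the split on one attribute with the largest SDR."""
    pairs = sorted(zip(values, targets))
    all_targets = [t for _, t in pairs]
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for v, t in pairs if v <= threshold]
        right = [t for v, t in pairs if v > threshold]
        sdr = sd_reduction(all_targets, left, right)
        if sdr > best[1]:
            best = (threshold, sdr)
    return best
```

A full model tree would apply this search recursively over every attribute and then fit a linear regression at each node; when pruning collapses the tree to a single node, what remains is one linear equation.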

 

Index = -0.9459 * Elec + 0.3537 * Solar - 0.0171 * Pop Den - 0.2441 * Pri Ed - 0.2653 * Infra + 0.1172 * Ed + 76.6523
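To make the formula concrete, here is a minimal sketch that evaluates it for one county. The coefficients and intercept are copied from the equation above; the dictionary keys and the function name `county_index` are our own labels, and the units of each input are an assumption, since they must match whatever units the original training data used.

```python
# Coefficients copied from the fitted equation; key names are illustrative labels.
COEFFICIENTS = {
    "elec": -0.9459,     # Elec
    "solar": 0.3537,     # Solar
    "pop_den": -0.0171,  # Pop Den
    "pri_ed": -0.2441,   # Pri Ed
    "infra": -0.2653,    # Infra
    "ed": 0.1172,        # Ed
}
INTERCEPT = 76.6523

def county_index(indicators):
    """Evaluate the linear model for one county's indicator values."""
    return INTERCEPT + sum(
        coefficient * indicators[name] for name, coefficient in COEFFICIENTS.items()
    )
```

Calling `county_index` with a dictionary holding all six indicators returns the index value for that county; a missing indicator raises a `KeyError` rather than being silently treated as zero.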

 

Stats:

Correlation coefficient        0.8645
Mean absolute error            7.0004
Root mean squared error        9.0576
Relative absolute error        48.8543 %
Root relative squared error    50.6268 %
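For readers unfamiliar with these five figures, the sketch below shows how each is commonly defined, given a list of actual values and a list of model estimates. It is a simplification, not WEKA's implementation: the function name `regression_stats` is ours, and the two relative errors compare the model against always predicting the mean of the supplied actual values, whereas WEKA's evaluation may take that baseline mean from the training data.

```python
from math import sqrt

def regression_stats(actual, predicted):
    """Correlation, MAE, RMSE, RAE (%) and RRSE (%) for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Baseline: error made by always predicting the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": sqrt(sq_err / n),
        "rae_pct": 100 * abs_err / base_abs,
        "rrse_pct": 100 * sqrt(sq_err / base_sq),
    }
```

Read this way, a relative absolute error near 49 % means the model's absolute errors are roughly half of what the mean-only baseline would produce.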

 

Running the latest figures for the variables through the model produces the following county rankings.

  1. Nairobi
  2. Mombasa
  3. Kiambu
  4. Kajiado
  5. Nakuru
  6. Uasin Gishu
  7. Nyeri
  8. Kirinyaga
  9. Embu
  10. Kilifi
  11. Machakos
  12. Lamu
  13. Taita Taveta
  14. Laikipia
  15. Murang'a
  16. Kisumu
  17. Meru
  18. Kericho
  19. Isiolo
  20. Nyandarua
  21. Trans Nzoia
  22. Garissa
  23. Vihiga
  24. Kisii
  25. Kwale
  26. Tharaka Nithi
  27. Narok
  28. Nyamira
  29. Migori
  30. Kakamega
  31. Busia
  32. Bungoma
  33. Makueni
  34. Bomet
  35. Nandi
  36. Elgeyo Marakwet
  37. Kitui
  38. Siaya
  39. Baringo
  40. Homa Bay
  41. Wajir
  42. Tana River
  43. Marsabit
  44. West Pokot
  45. Samburu
  46. Mandera
  47. Turkana
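Turning index values into a ranking like the one above is a simple sort, sketched below. The post does not state whether a low or a high index value ranks first, so the direction is a parameter; the default of lower-is-better is only an assumption. The function name `rank_counties` and the county names and scores in the usage note are placeholders, not real data.

```python
def rank_counties(scores, lower_is_better=True):
    """Return (position, county, score) tuples ordered from best to worst.

    `scores` maps county name to index value. The ranking direction is an
    assumption and can be flipped with `lower_is_better=False`.
    """
    ordered = sorted(
        scores.items(),
        key=lambda item: item[1],
        reverse=not lower_is_better,
    )
    return [
        (position, county, score)
        for position, (county, score) in enumerate(ordered, start=1)
    ]
```

For example, `rank_counties({"County A": 60.0, "County B": 25.0})` places the hypothetical County B first under the default direction and County A first when `lower_is_better=False`.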

 

Do you feel it is a better indicator?

Addendum: 25-09-2015

 

 


4 comments

  1. This is the type of stuff I like. Not aware of M5P, but would you get the same results by running a principal component analysis in STATA or something?
