After Kenya's new constitution created 47 counties in August 2010, the counties needed to be ranked for purposes of resource allocation. The Commission on Revenue Allocation (CRA) was mandated to construct the revenue-sharing formula but fell short of what was expected. Before allocating resources, it is prudent to first rank counties by development (or wellness) rather than let the poverty index stand in for a ‘development index’, as the CRA did. As part of Doban Africa, we therefore undertook to construct a compound index that incorporates development indicators such as the poverty index, electricity connections, population, percentage of educated population, water supply and number of families with solar power, among others. The technical details are shown below.

We took a data mining approach and collated data from the Kenyan government open data portal that could serve as development indicators. We then used WEKA (Waikato Environment for Knowledge Analysis) data mining software to sift through the data. Out of 15 input fields, 5 produced the biggest correlation margin in predicting the output. The correlation coefficient measures the degree of correlation between the actual values and the values estimated by the model. The algorithm chosen to construct the index is M5 Prime (M5P), which produces a model that is a linear function (a weighted sum) of the input variables. The first step generates a regression tree from the training data and then calculates a linear model (using linear regression) for each node of the tree. The second step tries to simplify the tree by dropping attributes from the linear models, and pruning nodes, where doing so does not increase the error.
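To make the first step more concrete, here is a minimal Python sketch of the model-tree idea: split the data where the spread of the target drops the most, then fit a separate linear model on each side. It is a deliberate simplification and not WEKA's M5P implementation: it handles a single input variable, makes at most one split, and leaves out the pruning and simplification of the second step. The function names and the toy data are invented for illustration.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def best_threshold(xs, ys, min_leaf=3):
    """Threshold with the largest standard-deviation reduction (SDR)."""
    best_sdr, best_t = 0.0, None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        weighted = (len(left) * pstdev(left) + len(right) * pstdev(right)) / len(ys)
        sdr = pstdev(ys) - weighted
        if sdr > best_sdr:
            best_sdr, best_t = sdr, t
    return best_t


def fit_model_tree(xs, ys):
    """At most one split; each side of the split gets its own linear model."""
    t = best_threshold(xs, ys)
    if t is None:
        return {"threshold": None, "models": [fit_line(xs, ys)]}
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {"threshold": t, "models": [fit_line(*zip(*left)), fit_line(*zip(*right))]}


def predict(tree, x):
    """Route x to the matching side of the split and apply that linear model."""
    if tree["threshold"] is None or x <= tree["threshold"]:
        slope, intercept = tree["models"][0]
    else:
        slope, intercept = tree["models"][1]
    return slope * x + intercept


# Toy piecewise-linear data (made up, not county data).
xs = [i * 0.5 for i in range(21)]
ys = [2 * x + 1 if x <= 5 else 30 - x for x in xs]
tree = fit_model_tree(xs, ys)
print(tree["threshold"], predict(tree, 2.0), predict(tree, 8.0))
```

On this toy data the split lands where the two line segments meet, and each side recovers its own slope and intercept.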

Index = -0.9459 * Elec + 0.3537 * Solar - 0.0171 * Pop Den - 0.2441 * Pri Ed - 0.2653 * Infra + 0.1172 * Ed + 76.6523
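As a rough illustration of how the formula is applied, the Python sketch below evaluates it for one set of inputs. The coefficients are copied from the model above. The dictionary keys and the input values are hypothetical and are not real county figures, and the units of each variable are not stated here, so treat the numbers purely as arithmetic.

```python
# Coefficients copied from the model above; the key names are illustrative,
# not identifiers from the original WEKA output.
COEFFICIENTS = {
    "elec": -0.9459,
    "solar": 0.3537,
    "pop_den": -0.0171,
    "pri_ed": -0.2441,
    "infra": -0.2653,
    "ed": 0.1172,
}
INTERCEPT = 76.6523


def development_index(indicators):
    """Return the weighted sum of the indicators plus the intercept."""
    return INTERCEPT + sum(
        weight * indicators[name] for name, weight in COEFFICIENTS.items()
    )


# Hypothetical inputs, NOT real county data: they only show the arithmetic.
example_county = {
    "elec": 10.0,
    "solar": 5.0,
    "pop_den": 100.0,
    "pri_ed": 50.0,
    "infra": 20.0,
    "ed": 30.0,
}
print(round(development_index(example_county), 4))
```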

Stats:

Correlation coefficient 0.8645

Mean absolute error 7.0004

Root mean squared error 9.0576

Relative absolute error 48.8543 %

Root relative squared error 50.6268 %
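For readers unfamiliar with these measures, the sketch below computes them in Python using their usual textbook definitions. It assumes the two relative errors are measured against a baseline that always predicts the mean of the actual values. WEKA's own computation may differ in detail, for example in which mean it uses under cross-validation. The sample numbers are made up.

```python
import math


def regression_stats(actual, predicted):
    """Compute the five summary measures from paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    abs_dev = sum(abs(a - mean_a) for a in actual)  # error of the mean baseline

    return {
        "correlation": cov / math.sqrt(var_a * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100 * abs_err / abs_dev,
        "rrse_pct": 100 * math.sqrt(sq_err / var_a),
    }


# Made-up values, only to exercise the function.
stats = regression_stats([10, 20, 30, 40], [12, 18, 33, 37])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Read this way, a relative absolute error of about 49% suggests the model's absolute errors are roughly half those of always predicting the mean.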

Running the latest figures for these variables through the model produces the following county ranking.

- Nairobi
- Mombasa
- Kiambu
- Kajiado
- Nakuru
- Uasin Gishu
- Nyeri
- Kirinyaga
- Embu
- Kilifi
- Machakos
- Lamu
- Taita Taveta
- Laikipia
- Murang'a
- Kisumu
- Meru
- Kericho
- Isiolo
- Nyandarua
- Trans Nzoia
- Garissa
- Vihiga
- Kisii
- Kwale
- Tharaka Nithi
- Narok
- Nyamira
- Migori
- Kakamega
- Busia
- Bungoma
- Makueni
- Bomet
- Nandi
- Elgeyo Marakwet
- Kitui
- Siaya
- Baringo
- Homa Bay
- Wajir
- Tana River
- Marsabit
- West Pokot
- Samburu
- Mandera
- Turkana
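A ranking like this can be produced by sorting counties on their index values. The Python sketch below shows the idea with placeholder county names and invented scores. The post does not say whether a lower or a higher index value ranks first, so the `ascending` flag is an assumption you would set to match the model.

```python
def rank_counties(scores, ascending=True):
    """Return county names ordered by index value.

    ascending=True puts the lowest index first; whether that is the right
    direction for this index is an assumption, not stated in the post.
    """
    return sorted(scores, key=scores.get, reverse=not ascending)


# Placeholder names and invented scores, NOT actual results.
hypothetical_scores = {"County A": 41.2, "County B": 18.7, "County C": 63.5}
for position, county in enumerate(rank_counties(hypothetical_scores), start=1):
    print(position, county)
```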

Do you feel it is a better indicator?

Addendum: 25-09-2015

It's not such a bad thing after all.

I thought it might be useful one way or another.

This is the type of stuff I like. Not aware of M5P, but would you get the same results by running a principal component analysis in STATA or something?

M5P is a machine learning algorithm in the WEKA software that builds a linear model of the input variables. You should obtain the same results running PCA in STATA.