After the new constitution of Kenya created 47 counties in August 2010, the counties needed to be ranked for purposes of resource allocation. The Commission on Revenue Allocation (CRA) was mandated to construct the formula for revenue sharing but fell short of what was expected. Before commencing resource allocation, it is prudent to first rank counties in terms of development (or wellness) rather than simply letting the poverty index stand in for a ‘development index’, as the CRA did. As part of Doban Africa, we therefore undertook to construct a compound index that incorporates development indicators such as the poverty index, electricity connection, population, percentage of educated population, water supply and number of families with solar power, among others. The technical details are shown below.
We took a data mining approach and collated data from the Kenyan government open data portal that could serve as development indicators. We then used WEKA (Waikato Environment for Knowledge Analysis) data mining software to sift through the data. Out of 15 input fields, 5 showed the strongest correlation with the output. The correlation coefficient measures the degree of correlation between the actual values and the values estimated by the model. The algorithm chosen to construct the index is M5 Prime (M5P), which here produces a model that is a linear function, i.e. a weighted sum of the input variables. The first step generates a regression tree from the training data and then calculates a linear model (using linear regression) for each node of that tree. The second step tries to simplify the tree by deleting nodes, and attributes of the linear models, whose removal does not increase the estimated error.
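One plausible reading of the screening step is a ranking of the input fields by the absolute value of their Pearson correlation with the output. The sketch below illustrates that idea only; it is not the WEKA procedure itself. The field names, the toy numbers and the helper names `pearson` and `top_fields` are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def top_fields(columns, target, k):
    """Return the k field names most strongly correlated with the target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: three candidate fields observed in five counties.
columns = {
    "field_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "field_b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "field_c": [2.0, 2.0, 3.0, 2.0, 3.0],
}
target = [2.0, 4.1, 6.2, 7.9, 10.0]

print(top_fields(columns, target, k=2))
```

Ranking by absolute correlation keeps strongly negative predictors as well as strongly positive ones, which matters here because several coefficients in the final model are negative.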
Index = -0.9459 * Elec + 0.3537 * Solar - 0.0171 * Pop Den - 0.2441 * Pri Ed - 0.2653 * Infra + 0.1172 * Ed + 76.6523
Correlation coefficient 0.8645
Mean absolute error 7.0004
Root mean squared error 9.0576
Relative absolute error 48.8543 %
Root relative squared error 50.6268 %
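For readers unfamiliar with these statistics, the sketch below shows how each one is commonly defined. It assumes the baseline for the two relative errors is a predictor that always outputs the mean of the actual values; the exact baseline WEKA uses may differ (for example, a training-set mean under cross-validation). The function name `fit_statistics` and the toy numbers are hypothetical and are not the county data.

```python
from math import sqrt


def fit_statistics(actual, predicted):
    """Return correlation, MAE, RMSE and the two relative errors (in %)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Baseline errors: always predicting the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": sqrt(sq_err / n),
        "rae_pct": 100 * abs_err / base_abs,
        "rrse_pct": 100 * sqrt(sq_err / base_sq),
    }


# Hypothetical toy values, chosen only to exercise the formulas.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
stats = fit_statistics(actual, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Under this definition, a relative absolute error of about 49 % means the model's average error is roughly half that of the mean-only baseline.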
Running the latest figures for these variables through the model produces the following county rankings.
- Uasin Gishu
- Taita Taveta
- Trans Nzoia
- Tharaka Nithi
- Elgeyo Marakwet
- Tana River
- West Pokot
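The ranking step can be sketched as evaluating the linear model for each county and sorting by the result. The coefficients below are taken from the model above; everything else is an assumption made for illustration: the dictionary keys standing in for Elec, Solar, Pop Den, Pri Ed, Infra and Ed, the placeholder county names, their input values, and the sort direction (ascending, which the text does not confirm).

```python
# Coefficients and intercept as reported in the model above.
COEFFICIENTS = {
    "elec": -0.9459,
    "solar": 0.3537,
    "pop_den": -0.0171,
    "pri_ed": -0.2441,
    "infra": -0.2653,
    "ed": 0.1172,
}
INTERCEPT = 76.6523


def county_index(values):
    """Evaluate the linear model for one county's input values."""
    return INTERCEPT + sum(
        coef * values[name] for name, coef in COEFFICIENTS.items()
    )


def rank_counties(data, descending=False):
    """Return county names ordered by index (ascending unless told otherwise)."""
    return sorted(data, key=lambda name: county_index(data[name]),
                  reverse=descending)


# Hypothetical inputs; these are NOT real county figures.
data = {
    "County A": {"elec": 30.0, "solar": 5.0, "pop_den": 100.0,
                 "pri_ed": 60.0, "infra": 20.0, "ed": 40.0},
    "County B": {"elec": 10.0, "solar": 8.0, "pop_den": 50.0,
                 "pri_ed": 50.0, "infra": 10.0, "ed": 30.0},
}

for name in rank_counties(data):
    print(f"{name}: {county_index(data[name]):.4f}")
```

Because the electricity coefficient is negative, a county with higher electricity connection gets a lower index value, so whether "first" means lowest or highest index should be checked against the original data before reusing the `descending` default.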
Do you feel it is a better indicator?