There is a 20 km stretch of the A104 highway between Sachangwan and Salgaa in Nakuru County that has gained a reputation as a hotbed of grisly road accidents. The blame is commonly placed on a steep, winding section of the road, which currently carries black spot signage to caution drivers. Other interventions, such as the erection of speed bumps, have not reduced the carnage on the highway. This brings back the question: what is the main cause of the accidents? The National Transport and Safety Authority (NTSA) collects data on major accidents that happen in the country, so we chose to answer the question with data.
NTSA collects 19 variables related to an accident. These include:
- Light Conditions
- Road Class
- Motor Vehicles Involved (Number Plates)
- Vehicle Type
- Vehicle Make
- Vehicle Model
- Brief Details
- Name of Victim(s)
- Gender of Victim(s)
- Age of Victims
- Cause Code
- Type of Victim
- Number of Victims
In the dataset, we wanted to look at accidents just before you get to the Salgaa-Sachangwan area and after the black spot. To that end, we created a subset of the data containing fatal accidents in Nakuru County along the A104 highway, which resulted in 83 entries. The subset covered Gilgil, Naivasha, Nakuru, Salgaa, Sachangwan, and Molo. Initial feature extraction resulted in 384 variables, but with far more variables than rows we ran straight into the curse of dimensionality.
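To make the blow-up in variables concrete, here is a minimal sketch of the kind of subsetting and feature extraction described above. It assumes the extraction was a one-hot encoding of categorical fields, which the post does not state. The field names (`county`, `road`, `place`, `day`, `vehicle_type`, `fatal`) and the toy records are hypothetical, not the NTSA schema or data.

```python
# Toy records: field names and values are illustrative assumptions,
# not the actual NTSA schema or data.
records = [
    {"county": "Nakuru", "road": "A104", "place": "Salgaa",
     "day": "Monday", "vehicle_type": "Lorry", "fatal": True},
    {"county": "Nakuru", "road": "A104", "place": "Naivasha",
     "day": "Saturday", "vehicle_type": "Trailer", "fatal": True},
    {"county": "Nakuru", "road": "A104", "place": "Nakuru",
     "day": "Friday", "vehicle_type": "Saloon", "fatal": True},
    {"county": "Other", "road": "Other", "place": "Elsewhere",
     "day": "Friday", "vehicle_type": "Matatu", "fatal": True},
]


def select_subset(records, county, road):
    """Keep only fatal accidents in the given county on the given road."""
    return [
        r for r in records
        if r["county"] == county and r["road"] == road and r["fatal"]
    ]


def one_hot(rows, columns):
    """Create one 0/1 indicator feature per (column, value) pair seen in rows."""
    features = sorted({(c, r[c]) for r in rows for c in columns})
    names = [f"{c}={v}" for c, v in features]
    matrix = [[1 if r[c] == v else 0 for c, v in features] for r in rows]
    return names, matrix


rows = select_subset(records, county="Nakuru", road="A104")
names, matrix = one_hot(rows, columns=["place", "day", "vehicle_type"])

# Every distinct category value becomes its own column, so the number of
# features can quickly exceed the number of accidents.
print(len(rows), "rows,", len(names), "features")
```

Even in this toy case the features outnumber the rows, which is the same situation as 384 variables against 83 entries.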
So we opted to select only variables that help give each accident an ‘identity’. The final dataset had 82 features covering location, cause, victims, day, and description of the accident. Our hypothesis was that if there is something unique at Salgaa that’s causing many accidents, then it would show up as an anomaly in an outlier analysis. Straight into the code room, we deployed our favourite outlier-detection tool: Principal Component Analysis. The result is shown below.
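For readers who want to try the idea, the following is a rough sketch of a PCA projection used for outlier inspection, not the exact code behind our plot. It assumes a small binary feature matrix (the toy `X` below is made up), centres it, and projects it onto the first two principal components with NumPy's SVD. The distance from the centre in that 2D space is used as a simple, assumed outlier score.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project rows of X onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T        # coordinates in PC space
    explained = (S ** 2) / (S ** 2).sum()    # share of variance per component
    return scores, explained[:n_components]


# Toy indicator matrix (rows = accidents, columns = one-hot features).
# The values are invented purely for illustration.
X = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
]

scores, ratio = pca_scores(X, n_components=2)

# Simple outlier score: distance from the centre of the 2D projection.
distance = np.linalg.norm(scores, axis=1)
ranking = np.argsort(distance)[::-1]         # most unusual rows first

print("explained variance ratio:", ratio)
print("rows ranked by distance:", ranking)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and colouring the points by location or day gives the kind of picture in which tight clusters and stray points become visible.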
Fooled by Randomness
From the diagram above, some interesting patterns can be deduced. The first is that Nakuru experiences fatal accidents on Fridays. This makes sense: it can be attributed to personal cars leaving the city for an upcountry weekend stay, or for partying. Given that this is a clear pattern, the accidents can be mitigated by speed monitoring and scrutiny of personal cars by NTSA on Fridays.
The second clear signal is trailer accidents happening in Naivasha on Saturdays. We don’t have a supposition as to why trailers end up in fatal accidents on Saturdays; it is worthy of field research. However, this is also a clear pattern tied to a day of the week, which makes it easy to design an intervention programme.
When it comes to Salgaa, we have a cluster ball of everything mixed up. Accidents happen on almost all days of the week and involve trucks, lorries, matatus, motorcycles, personal cars, hit and runs, head-on collisions, et cetera. Only one word can describe the phenomenon: randomness. Salgaa is randomness disguised and perceived as non-randomness. There isn’t one strong factor that correlates with accidents happening at the location.
This tells us that accidents at Salgaa-Sachangwan are caused by multiple factors, both human and infrastructural. In NTSA’s coding scheme, the cause of an accident is always reduced to one factor; it could be expanded to include all factors that apply. A final root-cause analysis would then be performed to identify the main causal factor for an accident. An example would be a football match happening out of the city that results in more people in a hurry using a narrow road at the same time on a rainy day.
Some important variables were left out of the analysis due to the relatively low number of accidents recorded in the area. From a statistical perspective, we have to wait for more accidents to happen before we can include them.
Happy Holidays! Drive Safe!
The complete dataset can be found here: https://docs.google.com/spreadsheets/d/e/2PACX-1vS-mRRb_bnLI4-UcLHHBL9-Tg4QpmnXibavwoyzlzyV0fj7jzsBfc6h4_aWWl5OhFnf1lkXY3yeELAI/pubhtml
Data collected by Anthony Otieno