The Problem at Sachangwan-Salgaa

A 20 km stretch of the A104 highway between Sachangwan and Salgaa in Nakuru County has gained a reputation as the scene of grisly road accidents. The blame is commonly placed on a steep unwinding section of the road, which currently carries black spot signage to caution drivers. Other interventions, such as the erection of speed bumps, have not reduced the carnage on the highway. This brings back the question: what is the main cause of the accidents? The National Transport and Safety Authority (NTSA) collects data on major accidents that happen in the country, so we chose to answer the question with data.

NTSA collects 19 variables related to each accident. These are:

1. Date
2. Time
3. Light Conditions
4. Location
5. County
8. Place
9. Motor Vehicles Involved (Number Plates)
10. Vehicle Type
11. Vehicle Make
12. Vehicle Model
13. Brief Details
14. Name of Victim(s)
15. Gender of Victim(s)
16. Age of Victims
17. Cause Code
18. Type of Victim
19. Number of Victims

The Method
In the dataset, we wanted to look at accidents just before you get to the Salgaa-Sachangwan area and just after the black spot. To that end, we created a subset of the data containing fatal accidents in Nakuru County along the A104 highway, which resulted in 83 entries. The subset covered Gilgil, Naivasha, Nakuru, Salgaa, Sachangwan, and Molo. Initial feature extraction resulted in 384 variables, but we ran into the curse of dimensionality: with far more variables than rows, there is too little data to pick out reliable patterns.
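To make the subsetting and feature extraction step concrete, here is a minimal sketch in plain Python. It is not the code used for this analysis. The records, the field names (`county`, `road`, `fatal`, `place`, `day`, `vehicle_type`) and the helper functions are all hypothetical, chosen only to show how one-hot encoding categorical fields can quickly produce more columns than rows.

```python
# Hypothetical accident records. Field names and values are illustrative
# assumptions, not NTSA's actual schema or data.
records = [
    {"county": "Nakuru", "road": "A104", "fatal": True,
     "place": "Salgaa", "day": "Monday", "vehicle_type": "Lorry"},
    {"county": "Nakuru", "road": "A104", "fatal": True,
     "place": "Naivasha", "day": "Saturday", "vehicle_type": "Trailer"},
    {"county": "Nakuru", "road": "A104", "fatal": True,
     "place": "Nakuru", "day": "Friday", "vehicle_type": "Personal car"},
    {"county": "Nakuru", "road": "A104", "fatal": False,  # non-fatal: dropped
     "place": "Molo", "day": "Sunday", "vehicle_type": "Matatu"},
    {"county": "Kiambu", "road": "A104", "fatal": True,   # other county: dropped
     "place": "Limuru", "day": "Tuesday", "vehicle_type": "Motorcycle"},
]


def fatal_on_a104_in_nakuru(rows):
    """Keep only fatal accidents in Nakuru County along the A104."""
    return [
        r for r in rows
        if r["county"] == "Nakuru" and r["road"] == "A104" and r["fatal"]
    ]


def one_hot(rows, fields):
    """Expand each categorical field into one 0/1 column per distinct value."""
    columns = sorted({(f, r[f]) for r in rows for f in fields})
    matrix = [[1 if r[f] == value else 0 for f, value in columns] for r in rows]
    return columns, matrix


subset = fatal_on_a104_in_nakuru(records)
columns, matrix = one_hot(subset, ["place", "day", "vehicle_type"])
print(len(subset), "rows,", len(columns), "columns")  # 3 rows, 9 columns
```

Even in this toy case, three categorical fields turn 3 rows into 9 columns, which is the same imbalance, on a small scale, as 384 variables against 83 entries.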

So we opted to keep only variables that help give each accident an ‘identity’. The final dataset had 82 features covering location, cause, victims, day, and description of accidents. Our hypothesis was that if there is something unique at Salgaa that’s causing many accidents, then it would show up as an anomaly in an outlier analysis. Straight into the code room, we deployed our favourite outlier detection tool, Principal Component Analysis. The result is shown below.
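For readers curious about the mechanics, the following is a minimal sketch of using Principal Component Analysis to flag unusual rows. It is an illustration under stated assumptions, not the analysis behind this post: the six rows and five indicator columns are made up, and scoring each accident by its distance from the centre of the first two components is one common choice among several. It assumes NumPy is installed.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on column-centred data
    # SVD of the centred matrix: the rows of vt are the principal directions,
    # ordered from largest to smallest explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def outlier_distance(scores):
    """Euclidean distance of each row from the origin of the component space."""
    return np.sqrt((scores ** 2).sum(axis=1))


# Hypothetical one-hot rows (made-up data). Columns:
# [is_friday, is_saturday, is_trailer, is_personal_car, is_matatu]
X = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],  # the odd one out: a Saturday trailer accident
]

dist = outlier_distance(pca_scores(X))
most_unusual = int(np.argmax(dist))
print("most unusual row:", most_unusual)  # most unusual row: 5
```

Rows that share a pattern land close together in the component space, while a row that differs on several features at once sits far from the centre. Plotting the two component scores against each other gives the kind of scatter diagram from which clusters and anomalies can be read.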

Fooled by Randomness
Some interesting patterns can be deduced from the diagram above. The first is that Nakuru experiences fatal accidents on Fridays. This makes sense: it can be attributed to personal cars leaving the city for an upcountry weekend stay, or for partying. Given that this is a clear pattern, the accidents can be mitigated by speed monitoring and scrutiny of personal cars by NTSA on Fridays.

The second clear signal is trailer accidents happening in Naivasha on Saturdays. We don’t have a hypothesis for why trailers end up in fatal accidents on Saturdays; it is worthy of field research. However, this is also a clear pattern tied to a day of the week, which makes it easy to design an intervention programme.

When it comes to Salgaa, we have a cluster ball of everything mixed up. Accidents happen on almost all days of the week and involve trucks, lorries, matatus, motorcycles, personal cars, hit-and-runs, head-on collisions, et cetera. Only one word can describe the phenomenon: randomness. Salgaa is randomness disguised and perceived as non-randomness. There isn’t one strong factor that correlates with accidents happening at the location.

This tells us that accidents at Salgaa-Sachangwan are caused by multiple factors, both human and infrastructural. In NTSA’s coding scheme, the cause of an accident is always reduced to one factor; the scheme could be expanded to include all factors that apply. A final root-cause analysis would then be performed to identify the main causal factor for an accident. An example would be a football match happening out of the city that results in more people in a hurry using a narrow road at the same time on a rainy day.

Caveat Lector
Some important variables were left out of the analysis due to the relatively low number of accidents recorded in the area. From a statistical perspective, we have to wait for more accidents to happen before we can include them.

Happy Holidays! Drive Safe!

Data collected by Anthony Otieno

1. What the hell are you talking about?

1. For statisticians and number gurus, the points are clear as stated

2. Your data is till 5/11/17, if you were to include data till 17/12/17, would the conclusion change or support your theory of randomness at Salgaa?

1. That’s the extent of the data that NTSA provided. The conclusion may as well change with more data.

3. I have noted there were no subarus in the accidents statistics as opposed to common belief that they are involved in many accidents. VWs too.

1. Good catch. Time to rewrite the Subaru story; some insurance companies have dropped Subaru from their client list. Do note, however, that this data only captures accidents with fatalities. There might be more Subarus involved in non-fatal accidents.

4. I particularly love how you explain your job….as a business analyst, this is one of the toughest jobs…kudos. That being said, does NTSA offer such data sets? I think they should avail to the public, as far as i know, i haven’t seen them anywhere unless you point them for me…cheers

1. Thank you. NTSA does avail monthly stats. You have to aggregate them yourself

5. I can offer more robust statistical analysis

6. I like this guy’s reasoning

1. Thank you.
