Original post appeared on Henk Harmsen

In an agricultural society such as Kenya, much depends on the rain. Whether a rainfall pattern is “good” depends on three things: the total amount of rainfall (sufficiency), whether the rain comes at the expected time of the year (reliability), and whether the amount of rainfall per month is more or less the same from year to year (constancy).

There are concerns that the climate is changing, and the rainfall patterns with it. The complaints that I hear frequently in Kenya are about reliability (“The rain comes much later now than when I was young”) and constancy (“You just can’t count on it anymore”).

With the help of a rainfall dataset for Machakos, it is possible to examine these questions a little closer. In statistics, there is never enough data, and climate change data are notorious for their low signal-to-noise ratios. The analysis will, therefore, be ‘on the back of an envelope’, using a trick: the number of rainfall records that will be broken can be predicted.

**Weather forecast: seasonal and unpredictable**

The dataset contains 30 years of data in periods of 10 days (dekads).

A first impression of the time series is obtained by decomposing the monthly rainfall data. An observed time series consists of a trend, a seasonal pattern, and a remainder. The plot shows a clear, regular seasonal pattern, with a trend that stabilises at a slightly higher level after the year 2000. The remainder (the irregular component) is not as irregular as it should be, meaning that the removal of trend and seasonal pattern was not completely successful.
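The decomposition step can be illustrated with a minimal sketch of a classical additive decomposition in plain Python. This is an assumption about the method: the post does not say which tool or variant produced the plot. The function name `decompose_additive` and the synthetic series are hypothetical, not the Machakos data.

```python
def decompose_additive(series, period=12):
    """Split a series into trend + seasonal + remainder (classical additive)."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Centred moving average over one full cycle; for an even period the
    # two end points of the window get half weight.
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal pattern to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic monthly rainfall (mm): a slow upward trend plus a two-peak season.
season = [-30, -35, 10, 80, 20, -40, -45, -45, -40, 5, 90, 30]
monthly_mm = [60 + 0.5 * t + season[t % 12] for t in range(48)]
trend, seasonal, remainder = decompose_additive(monthly_mm)
print(trend[6], seasonal[:12], remainder[6])
```

On this noise-free example the remainder is zero wherever the trend is defined. With real data, any structure left in the remainder suggests that the trend or seasonal model did not capture everything.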

The raw rainfall pattern over the years is shown below.

**Monthly rainfall has very high year-to-year variations**

That’s a great deal of variation. How can this be summarized or reduced somewhat? Boxplots bring the solution, as they summarize the data in just 5 numbers: the lower whisker, the 25%, 50% (median) and 75% quartiles, and the upper whisker. The “box” contains 50% of the observations (from the 25% to the 75% quartile), with the median (the 50% quartile) marked inside it. Outliers are exceptional values that fall beyond the whiskers, and these are shown as dots.
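The numbers behind a single boxplot can be computed directly. Below is a minimal sketch using Python's standard library. The April totals are made up for illustration, and the 1.5 × IQR rule for flagging outliers is the common convention, which is an assumption about how the plots here were drawn.

```python
import statistics


def boxplot_summary(values):
    """Return the whisker ends, the three quartiles and the outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # anything below this is drawn as a dot
    high_fence = q3 + 1.5 * iqr  # anything above this is drawn as a dot
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical April rainfall totals (mm) for ten years, illustration only.
april_mm = [40, 55, 60, 62, 70, 75, 80, 90, 95, 300]
summary = boxplot_summary(april_mm)
print(summary)
```

In this example the single very wet April (300 mm) is reported as an outlier, while the whiskers run from 40 to 95 mm.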

From these plots, it is clear that this seasonal rainfall pattern has large uncertainty margins. The rain will come, and the dry months will be absolutely dry, but everything else has large year-to-year variations.

**Total amount of rainfall varies by a factor of two**

How about sufficiency? The total rainfall is very variable. With only this many years of data available, it is hard to see whether the total amount is changing.

At this point, the only thing you can say to a “rainfall complainer” is: “What did you expect? This climate *is* unpredictable.” But these plots showed the rainfall patterns without considering whether they are changing or not.

**Are the rainfall patterns changing?**

To find out whether rainfall patterns are changing, the data must be compared against something, for example a long-term average. In this case, the dataset does not stretch over enough years for that, so a different approach is needed.

Now, here’s the trick. When you collect rainfall data, records will be broken: the rain starts later than in any previous year, the total amount is larger (or smaller) than ever before, and so on. The expected number of records can be estimated. When you start collecting data, the first data point is automatically a record. The second data point is a record only if it is larger than the first, which happens with probability 1/2. The third must be the largest of three, which happens with probability 1/3, and so on, provided the years are independent and there is no trend. Adding these probabilities gives the series 1/1 + 1/2 + 1/3 + … + 1/n, and its sum is the expected number of records. With the 31 years of data that we have for Machakos, this amounts to 1/1 + 1/2 + … + 1/31 ≈ 4.
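The arithmetic is easy to check. The sketch below sums the series and counts records in a sequence; it assumes independent years with no trend and no ties. The function names `expected_records` and `count_records` are illustrative, not taken from the original analysis.

```python
from fractions import Fraction


def expected_records(n):
    """Expected number of records in n observations: 1/1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def count_records(series):
    """Count values that exceed every earlier value (the first always counts)."""
    records = 0
    best = None
    for value in series:
        if best is None or value > best:
            records += 1
            best = value
    return records


print(float(expected_records(31)))              # about 4.03
print(count_records([3, 1, 4, 1, 5, 9, 2, 6]))  # the records are 3, 4, 5 and 9
```

Because the series grows only logarithmically, doubling the length of the dataset adds less than one expected record.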

The graphs for both the probability of a record and the expected number of records to be broken are shown below.

The total amount of rainfall in Machakos broke 5 records during the period 1983-2013, whereas 4 records were expected. Is the climate really changing? For this we need a second “trick”: the bootstrap.

The bootstrap works like this: take 31 balls, one for each year, and write that year’s total rainfall on each. Put them in an urn and draw 31 balls from it. Count the number of records in the drawn sequence and write it down. Repeat this process about 10000 times, and plot a histogram of the counts. The histogram has the familiar bell shape of the normal distribution, and from it we can calculate the 90% confidence interval for the number of records. In the case of total rainfall, that confidence interval is from 1 to 6. Both the expected number of records and the actual number of rainfall records fall in this range.
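The urn experiment can be sketched in a few lines of Python. The annual totals below are randomly generated placeholders, not the Machakos data, and the function names are hypothetical. The description does not say whether balls are put back after each draw, so the sketch exposes this as a `replace` flag: `True` resamples with replacement (a bootstrap in the strict sense), `False` simply reshuffles the years.

```python
import random


def count_records(series):
    """Count values that exceed every earlier value (the first always counts)."""
    records = 0
    best = None
    for value in series:
        if best is None or value > best:
            records += 1
            best = value
    return records


def simulate_record_counts(values, n_sim=10_000, replace=True, seed=0):
    """Draw len(values) balls n_sim times and count the records in each draw."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        if replace:
            draw = rng.choices(values, k=len(values))  # with replacement
        else:
            draw = rng.sample(values, len(values))     # a random reordering
        counts.append(count_records(draw))
    return counts


def percentile_interval(counts, level=0.90):
    """Read the central interval straight from the sorted simulation results."""
    ordered = sorted(counts)
    tail = (1 - level) / 2
    low = ordered[int(tail * (len(ordered) - 1))]
    high = ordered[int((1 - tail) * (len(ordered) - 1))]
    return low, high


# Placeholder annual totals (mm) for 31 years, generated for illustration.
data_rng = random.Random(42)
annual_mm = [round(data_rng.uniform(400, 1000)) for _ in range(31)]

counts = simulate_record_counts(annual_mm)
low, high = percentile_interval(counts)
print("90% interval:", low, "to", high)
print("records in the original order:", count_records(annual_mm))
```

Reading the interval from the sorted counts avoids relying on the histogram being exactly normal, which matters because the number of records is a small whole number.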

What is the conclusion of all this? If there is no trend in the total rainfall, then it does not matter in what sequence the balls with rainfall are drawn from the urn. That means that the actual number of records will most likely fall within the confidence interval. If it doesn’t, then the sequence of sampling *does* matter, which is an indication of a changing pattern.

**Late arrival of the rainy season**

For every year, the first day of rain was identified. The expected and actual numbers of records were calculated, and a simulation was carried out.

On the basis of the results, there are more late arrivals of the rainy season than expected.

Cover photo by: Pamela Cahoon Clem
