In the previous posts I made something of a big deal about the number called the growth factor. I’d like to explain why it’s such a big deal. Consider the following graph:

It looks like “Mr. Toad’s Wild Ride” and is generated from the data I’ve collected from the Johns Hopkins COVID-19 website. I hope to show here that VERY small changes in the growth factor’s value can lead to major changes in the outcomes. First, let’s look at the actual numbers:

6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386

The 6 and 7 look a lot like outliers, so I ran R’s boxplot routine on the data, which gives the following plot:

The boxplot, which uses Tukey’s method, does indicate that 6 and 7 are outliers: both lie more than 1.5 times the interquartile range above the upper quartile, so we’ll eliminate them. That leaves:

0.1666667, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386
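For anyone who wants to reproduce the outlier check without R, here is a minimal Python sketch of Tukey's fences. It assumes the box is built from Tukey's hinges (the median of each half of the sorted data), which is my understanding of how R's boxplot picks its quartiles; the function name `tukey_fences` is my own and not part of any library.

```python
from statistics import median

# Raw growth factors as listed in the post.
growth_factors = [
    6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333,
    1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692,
    1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386,
]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences computed from Tukey's hinges."""
    ordered = sorted(values)
    # Each half includes the middle value when the count is odd.
    half = (len(ordered) + 1) // 2
    lower_hinge = median(ordered[:half])
    upper_hinge = median(ordered[-half:])
    spread = upper_hinge - lower_hinge
    return lower_hinge - k * spread, upper_hinge + k * spread


low, high = tukey_fences(growth_factors)
outliers = [g for g in growth_factors if g < low or g > high]
kept = [g for g in growth_factors if low <= g <= high]

print(outliers)  # [6, 7]
```

Other quartile conventions (for example, linear interpolation between data points) give slightly different fences, so a different tool may not flag exactly the same points on a small data set like this one.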

Additionally, growth factors of 2 or greater lead to extremely high infection counts that are not reflected in reality (at a sustained factor of 2, the entire world would be infected in under 33 days), so I’ll eliminate those data points as well.

0.1666667, 0.4285714, 1, 1.142857, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 1.099338, 1.096386
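The "under 33 days" figure can be checked with a little arithmetic. The sketch below assumes a deliberately crude model that is not spelled out in the post: a single case whose count is multiplied by the same growth factor every day. The world population of 7.8 billion is my own rough 2020 figure, and `days_to_reach` is a hypothetical helper.

```python
import math

WORLD_POPULATION = 7.8e9  # assumed rough 2020 figure, not taken from the post


def days_to_reach(population, growth_factor, initial_cases=1):
    """Days of constant daily compounding for initial_cases to reach population."""
    return math.log(population / initial_cases) / math.log(growth_factor)


days_at_two = days_to_reach(WORLD_POPULATION, 2)
days_at_one_point_one = days_to_reach(WORLD_POPULATION, 1.1)

print(round(days_at_two, 1))  # 32.9
print(round(days_at_one_point_one))
```

Under the same assumptions a factor of 1.1 takes several months rather than about a month, which is the sensitivity to the growth factor that this post is about.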

There are no outliers in the remaining data:

So I’m going to use the growth factors from the last seven days to predict the number of infections over the next seven days. For comparison, let’s check the projections using the raw growth factors, the growth factors with the outliers and the values of 2 or greater removed, and finally the filtered seven-day average.
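Here is a rough sketch of how a seven-day projection from an averaged growth factor could be computed. Several things in it are my assumptions rather than the exact procedure behind the projections: "last seven days" is taken to mean the last seven entries of the filtered list, the averaged factor is applied unchanged on each of the next seven days, and the starting count of 1000 is a made-up placeholder.

```python
from statistics import mean

# Filtered growth factors from the post (outliers and values >= 2 removed).
filtered = [
    0.1666667, 0.4285714, 1, 1.142857, 1.083333, 1.653846, 1.255814,
    1.555556, 0.4047619, 0.9807692, 1.107843, 1.168142, 1.787879,
    0.2923729, 1.099338, 1.096386,
]


def project(start, factor, days=7):
    """Compound start by a constant daily factor; return one value per day."""
    values = []
    current = start
    for _ in range(days):
        current *= factor
        values.append(current)
    return values


# Assumption: average the last seven entries of the filtered list.
seven_day_factor = mean(filtered[-7:])

start_count = 1000  # hypothetical starting count, not real data
projection = project(start_count, seven_day_factor)

for day, value in enumerate(projection, start=1):
    print(day, round(value))
```

Swapping `seven_day_factor` for the mean of the raw list or of the full filtered list gives the other two projections for comparison.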

I will be using the running average in future predictions. Let’s see how things look on Friday . . . film at 11:00.