Please excuse my obsession

In the previous posts I made something of a big deal about the number called the growth factor. I’d like to explain why it’s such a big deal. Consider the following graph:

It looks like “Mr. Toad’s Wild Ride” and is generated from the data I’ve collected from the Johns Hopkins COVID-19 website. I hope to show here that VERY small changes in the growth factor’s value can lead to major changes in the outcomes. First, let’s look at the actual numbers:

6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386

The 6 and 7 look a lot like outliers, so I ran R’s boxplot routine on the data, which gives the following plot:

The boxplot, which uses Tukey’s method, does indicate that 6 and 7 are too far away from the bulk of the data (more than 1.5 times the interquartile range above the upper quartile), so we’ll eliminate them. That leaves:
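For anyone who wants to check the outlier call without R, here is a minimal Python sketch of Tukey's fences. It assumes the quartiles are taken as hinges, the way I understand R's `fivenum()` and `boxplot.stats()` to do it; the function name `tukey_outliers` is my own and is not part of any library.

```python
import math

def tukey_outliers(values, coef=1.5):
    """Return the values outside Tukey's fences, using hinges as the quartiles."""
    xs = sorted(values)
    n = len(xs)
    # 1-based depth of the hinges (assumed to match the rule in R's fivenum()).
    depth = math.floor((n + 3) / 2) / 2

    def at(d):
        # Average the two order statistics around a possibly half-integer depth.
        return 0.5 * (xs[math.floor(d) - 1] + xs[math.ceil(d) - 1])

    lower_hinge = at(depth)
    upper_hinge = at(n + 1 - depth)
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - coef * spread
    high_fence = upper_hinge + coef * spread
    # Keep the original order so the flagged points are easy to find in the data.
    return [x for x in values if x < low_fence or x > high_fence]

# The growth factors listed in the post.
growth_factors = [
    6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333,
    1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692,
    1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386,
]

print(tukey_outliers(growth_factors))
```

A different quartile rule (for example, interpolated percentiles) can move the fences a little, so results from other tools may not match R's boxplot exactly.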

0.1666667, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386

Additionally, growth factors of 2 or greater lead to extremely high infection counts (the entire world would be infected in under 33 days) that are not reflected in reality, so I’ll eliminate those data points as well. That leaves:
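The 33-day figure is easy to sanity-check with a rough sketch. This one makes two simplifying assumptions that are mine rather than anything measured: the growth factor is treated as a plain daily multiplier on a count that starts at a single case, and the world population is taken as roughly 7.8 billion.

```python
def days_to_reach(target, growth_factor, start=1.0):
    """Count the days for a value multiplied by growth_factor daily to reach target."""
    if growth_factor <= 1:
        raise ValueError("the count never grows when growth_factor <= 1")
    count, days = start, 0
    while count < target:
        count *= growth_factor
        days += 1
    return days

WORLD_POPULATION = 7.8e9  # assumed round figure, not taken from the data set

# With daily doubling, the count passes the assumed population during day 33.
print(days_to_reach(WORLD_POPULATION, 2.0))
```

Trying the same call with factors such as 1.1 or 1.2 shows how strongly the timeline depends on small changes in the factor.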

0.1666667, 0.4285714, 1, 1.142857, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 1.099338, 1.096386

There are no outliers in the remaining data:

So I’m going to use the growth factors from the last seven days to predict the number of infections over the next seven days. For comparison, let’s check the projections using the raw growth factors, the factors without the outliers and the values of 2 or greater, and finally the filtered seven-day average.

Raw growth factors: obviously out of line with observed reality. The early data was very noisy.
Outliers and values of 2 or greater dropped: several very LOW values from early in the data collection appear to pull the growth factor abnormally low.
Seven-day running average: this growth factor produces results more congruent with observed reality.
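The three-way comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each set of factors is summarized by its arithmetic mean, the "last seven days" are taken as the last seven entries of the filtered list, and the starting count of 100 is a made-up placeholder, not a real figure. The names `project` and `START` are hypothetical.

```python
from statistics import mean

# The growth factors listed in the post.
raw = [
    6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333,
    1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692,
    1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386,
]

# Dropping everything at 2 or above also removes the boxplot outliers 6 and 7.
filtered = [g for g in raw if g < 2]

# Assumption: the seven-day window is the last seven filtered values.
last_seven = filtered[-7:]

def project(start, factor, days=7):
    """Apply one constant growth factor day after day; return the daily values."""
    values = []
    value = start
    for _ in range(days):
        value *= factor
        values.append(round(value))
    return values

START = 100  # hypothetical starting count, for illustration only

for label, factors in [
    ("raw", raw),
    ("filtered", filtered),
    ("filtered, last seven", last_seven),
]:
    g = mean(factors)
    print(f"{label:>20}: mean factor {g:.3f} -> {project(START, g)}")
```

Under these assumptions the raw mean is the largest of the three, the fully filtered mean is the smallest, and the seven-day mean falls in between, which lines up with the three observations above.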

I will be using the running average in future predictions. Let’s see how things look on Friday . . . film at 11:00.
