I’m a data-driven type of person. If there’s a situation where numbers are available, I’ll use them to make sense of what’s going on. The current pandemic is a perfect example. I became tired of the bloviating on radio and TV and started doing a little data analysis of my own.
A very good source of up-to-date numbers is the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. You can see their COVID-19 dashboard here. Every day between 10:00 AM and 11:00 AM I use this website to find out how many confirmed cases have been reported for North Carolina. I take this number and add it to a dataset that I process with some R code that I’ve written.
Before I get into the actual data I want to go through an easy example illustrating how the data is processed. Let’s say we have an infection that causes the number of cases to double every day:
From these numbers we can calculate a growth factor, which we can then use to predict the number of future cases:
|Cases|1|2|4|8|16|?|?|
|New Cases|1-0 = 1|2-1 = 1|4-2 = 2|8-4 = 4|16-8 = 8|?|?|
|Growth|NA|1/1 = 1|2/1 = 2|4/2 = 2|8/4 = 2|?|?|
New cases are the current day’s cases minus yesterday’s cases, and growth (G) is today’s new cases divided by yesterday’s new cases. Since the number of cases doubles every day in our example, G settles at a constant 2 once the doubling is under way.
This is borne out when the sequence is extended: with G = 2, the next two days bring 16 and 32 new cases, for totals of 32 and 64.
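For anyone who wants to check the arithmetic, here is a minimal Python sketch of the doubling example. It is not the R code I actually run; the function names (`new_cases`, `growth_factors`, `project`) are made up for illustration, and the projection step assumes that new cases keep growing by the same constant factor.

```python
def new_cases(cumulative):
    """Daily new cases: each day's cumulative count minus the previous day's."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


def growth_factors(daily):
    """Growth factor G: today's new cases divided by yesterday's new cases.

    Returns None for a day whose predecessor had zero new cases.
    """
    return [today / yesterday if yesterday else None
            for yesterday, today in zip(daily, daily[1:])]


def project(cumulative, g, days):
    """Extend the cumulative series, ASSUMING new cases grow by a constant g."""
    series = list(cumulative)
    latest_new = series[-1] - series[-2]
    for _ in range(days):
        latest_new *= g
        series.append(series[-1] + latest_new)
    return series


# Day 0 baseline of zero cases, followed by the doubling example.
cases = [0, 1, 2, 4, 8, 16]
daily = new_cases(cases)
growth = growth_factors(daily)
projected = project(cases, growth[-1], 2)

print("New cases:", daily)
print("Growth:   ", growth)
print("Projected:", projected)
```

The two projected values fill in the question marks in the table: the last known growth factor is applied to the last day's new cases, and the result is added to the running total.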
What the current NC data looks like
The early data was a little noisy, leading to wildly different (and horrific) growth factors from day to day. Here’s what the daily growth factors look like:
To smooth things out a bit, the R code filters out all growth factors greater than 2.0 and then takes the average of the most recent seven. Growth factors greater than 1.0 indicate that the infection is spreading, 1.0 is the inflection point where the number of new cases each day is constant, and growth factors less than 1.0 indicate fewer and fewer new cases per day. Eventually the growth factor gets to zero (or very close to it).
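The smoothing step can be sketched in a few lines of Python. Again, this is an illustration rather than my R code: `smoothed_growth` is a hypothetical name, the sample growth factors are invented, and I am assuming the filter is applied first and the seven most recent surviving values are then averaged.

```python
def smoothed_growth(growth, cutoff=2.0, window=7):
    """Average the most recent `window` growth factors after dropping outliers.

    Assumption: values above `cutoff` (and undefined values) are removed
    first, and the average is taken over the last `window` values that remain.
    """
    kept = [g for g in growth if g is not None and g <= cutoff]
    recent = kept[-window:]
    if not recent:
        return None
    return sum(recent) / len(recent)


# Invented daily growth factors, including two noisy early spikes.
sample_growth = [3.5, 1.2, 0.8, 2.6, 1.1, 1.4, 0.9, 1.3, 1.0, 1.2]
g_smooth = smoothed_growth(sample_growth)

print("Smoothed growth factor:", g_smooth)
print("Spreading" if g_smooth > 1.0 else "Not spreading")
```

Filtering before taking the window means a noisy spike never shrinks the number of values being averaged; filtering after would be an equally defensible reading of the same rule.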
Plotting the raw data, the daily number of cases, and a seven-day projection gives the following plot:
The number of daily cases (blue line) has been hovering around the 100 mark for several days. Ideally, this line should slope downward and go to zero. The projected line covers the next week. By 03Apr2020 the data suggests around 2100 confirmed cases. I’ll post a new graph next Friday for comparison.