Friday check-in . . .

. . . OK, I’m a day late from my promised posting time. I ran into some data problems that I had to work through – details are below.

First, last week’s prediction. The stats were pointing to around 4,033 cases by 10Apr2020. The actual case count reported on the Johns Hopkins site was 3,962 – a difference of only 71 cases. We’ll have to wait until next week, though, to see if this was a phenomenal bit of luck or whether the simple model I’m using is matching reality.

The problem I ran into with the data started on Wednesday. The day-to-day variation in the reported cases was wildly erratic. The growth factor is calculated directly from the daily cases, and I think I’ve already made my point about the sensitivity of predictions based on the growth factor.

I didn’t want to do anything too radical to the data, otherwise it would appear that I was manipulating the numbers to make my own point. After a little research I decided to apply a three-day running average to the new cases numbers. This is a common technique used to smooth financial and scientific data. If you have the following set of data:

Day     1    2    3    4    5    6    7
Cases   80   40   110  100  100  190  60

Day 2 is (80+40+110)/3, day 3 is (40+110+100)/3, etc. This gives you the following smoothed data:

Day     1    2     3     4      5    6      7
Cases   80   76.6  83.3  103.3  130  116.6  60

Day 1 will always be 80 and day 7 will become a smoothed value if and when data for day 8 is collected.
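
In code, that smoothing is one short loop. Here’s a minimal Python sketch (just illustrative – my actual processing happens in R):

# Centered three-day running average; the endpoints keep their raw
# values until a neighboring data point exists.
def smooth3(cases):
    out = list(cases)
    for i in range(1, len(cases) - 1):
        out[i] = (cases[i - 1] + cases[i] + cases[i + 1]) / 3
    return out

print(smooth3([80, 40, 110, 100, 100, 190, 60]))
# approximately [80, 76.7, 83.3, 103.3, 130.0, 116.7, 60]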

Applying this technique to the Johns Hopkins data gives the following graph:

It’s hard to do a direct comparison with the new cases projection from last week because it wasn’t using smoothed data, BUT from last week’s graph one would expect around 450-500 new cases to be reported today. The smoothed data predicts just over 300, which is much closer to the reported 287. While this looks promising, I’ll be checking this as time progresses and will report the results next Friday.

Armed now with smoothed data what do the growth factors look like? Here’s a plot of the growth factors over time from the original and smoothed data:

and here is this week’s plot of accumulated cases and a projection for Friday the 17th of April.

A significant note is that over the last three projections the growth factors have been 1.16, 1.10, and 1.04 – a definite downward trend. This is good news: it indicates that social distancing appears to be working, and every incremental decrease in the growth factor spreads the cases out over time so that our hospitals don’t become saturated.
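
To put a number on what those incremental decreases buy us, here’s a toy Python sketch (the starting values are hypothetical round numbers, purely for illustration) that compounds 300 new cases a day forward two weeks at each of the three growth factors:

# Each day's new cases are the previous day's new cases times G.
def total_after(new_today, total_today, G, days=14):
    total, new = total_today, new_today
    for _ in range(days):
        new *= G
        total += new
    return round(total)

for G in (1.16, 1.10, 1.04):
    print(G, total_after(300, 3962, G))
# G = 1.16 ends near 19,200 total cases; 1.10 near 13,200; 1.04 near 9,700

Seemingly tiny differences in G swing the two-week total by a factor of two.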

Until next Friday . . . be safe, be healthy!!

Here we are a week later . . .

. . . let’s check to see if what was conjectured last Friday was close to correct. As a recap, here’s last Friday’s graph:

predicting 2,152 cases by the 34th day from 01Mar, or 03Apr. Today’s graph with weekly projection looks like:

The last data point is 2,041 cases. This varies from 2,152 by only 5.2%. But past performance doesn’t guarantee future results, so we’ll have to see if next Friday North Carolina has a reported 4,033 ± 208 cases.

You may notice that the blue line showing the number of new cases is missing from the second graph. The y-axis scale became so large as to make that line a mere blue squiggle along the 0 grid line. I have broken that data into a separate graph showing the daily number of reported cases with a non-linear least squares fit and seven day projection. Today it looks like:

For me this is the Scary Graph. Around day 20, about 50 cases were being reported each day. Ten days later this jumps to around 150. Ten days after that (the end of next week), if the projection holds, we’ll be seeing 500 reported cases a day. This is not a good trend, and it indicates that we haven’t entered the middle part of the infectious spread where the number of new cases per day is relatively constant.

An interesting feature that jumps out at me is the “blips” in the data. It’s not obvious below day 15, but there’s a hint of it: a jump up, then a plunge the next day, followed by three days of smoothly increasing data.

Will we now get three smoothly increasing days close to the projected line, as we have in the previous two cycles? If the blips persist, what’s causing them? [My bet is that some of the reporting entities are releasing their numbers in batches and some are releasing numbers daily.]

Stay tuned (click the follow link if you’re interested). If something striking happens I’ll post it but look for the next post to drop next Friday.

Please excuse my obsession

In the previous posts I made something of a big deal about the number called the growth factor. I’d like to explain why it’s such a big deal. Consider the following graph:

It looks like “Mr. Toad’s Wild Ride” and is generated from the data I’ve collected from the Johns Hopkins COVID-19 website. I hope to show here that VERY small changes in its value can lead to major changes in the outcomes. First, let’s look at the actual numbers:

6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386

The 6 and 7 look a lot like outliers, so running R’s boxplot routine on the data gives the following plot:

Boxplot, using Tukey’s method, does indicate that 6 and 7 are too far away from the rest of the data (more than 1.5 interquartile ranges beyond the upper quartile), so we’ll eliminate them. That leaves:

0.1666667, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338, 1.096386
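
If you’d like to check the outlier call yourself, here’s a rough Python equivalent of Tukey’s fences (R’s boxplot uses hinge-based quartiles, so borderline points can differ slightly, but 6 and 7 land well outside the fences either way):

import statistics

g = [6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333,
     1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692,
     1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338,
     1.096386]
q1, _, q3 = statistics.quantiles(g, n=4, method="inclusive")
iqr = q3 - q1
# Tukey's rule: flag anything more than 1.5 IQRs beyond a quartile.
print([x for x in g if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr])  # [6, 7]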

Additionally, growth factors of 2 or greater lead to extremely high infection counts (the entire world would be infected in under 33 days) that are not reflected in reality, so we’ll eliminate those data points as well:

0.1666667, 0.4285714, 1, 1.142857, 1.083333, 1.653846, 1.255814, 1.555556, 0.4047619, 0.9807692, 1.107843, 1.168142, 1.787879, 0.2923729, 1.099338, 1.096386

There are no outliers in the remaining data:

so I’m going to use the average of the last seven days’ growth factors to predict the next seven days’ number of infections. For comparison, let’s check the projections using the raw growth rates; the rates without the outliers and the rates greater than 2; and finally the filtered seven-day average:

  • Raw growth rates: obviously out of line with observed reality – the early data was very noisy.
  • Outliers and values >2.0 dropped: several very LOW values from early in the data collection appear to pull the growth factor abnormally low.
  • Filtered seven-day running average: produces results more congruent with observed reality.
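
Putting rough numbers on those three pictures (a quick Python sketch over the growth factors listed above):

raw = [6, 0.1666667, 7, 0.4285714, 2.333333, 1, 1.142857, 3, 1.083333,
       1.653846, 1.255814, 1.555556, 0.4047619, 3.058824, 0.9807692,
       1.107843, 1.168142, 1.787879, 0.2923729, 2.188406, 1.099338,
       1.096386]
filtered = [g for g in raw if g < 2.0]   # also drops the outliers 6 and 7
print(sum(raw) / len(raw))               # ~1.81: wildly hot
print(sum(filtered) / len(filtered))     # ~1.01: dragged down by the early lows
print(sum(filtered[-7:]) / 7)            # ~1.08: the seven-day running average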

I will be using the running average in future predictions. Let’s see how things look on Friday . . . film at 11:00.

Dying for the Dow . . .

In my opinion, not sheltering is not an option.

If you read my previous post about COVID-19 you know a key number is the growth factor. What’s not obvious is HOW sensitive the outbreak is to this seemingly little number. So, with this in mind, I thought I would recast the math in terms to which some folks can relate.

If you have a savings account (not a given here in the U.S.) you know that you receive a nominal interest rate on your savings. Let’s say your bank offers a 16% return on your savings (yes, a complete fantasy – I’m earning 0.70%). You start a new account with one dollar as an opening deposit and let it sit for a year. At the end of the year you’ll have . . . yes, $1.16.

BUT, this is an annual rate whose accrued interest is spread over a year. The growth factor is a DAILY rate generated from the previous day’s measurements. So, if you open an account with one dollar and apply a 16% daily rate for one year you’ll have:

$336,640,200,000,000,000,000,000.00

Ok, you argue that’s for an entire year. We’re talking about a pandemic that just lasts a few months. Applying the new case formula from the previous post with a growth factor of 1.16 for three months, you’ll get:

$632,730.00

Certainly not chump change. Of course math is agnostic, so whether it’s dollars or infections, the number is the number. Currently there are 649,904 COVID-19 cases worldwide. Given the virus has been very active since 01Jan2020 (about three months), it looks like our growth factor of 16% is pretty good.

As I type this there are currently 115,547 confirmed cases in the U.S. In 28 days at a growth factor of 1.16 there will be:

7,371,950

cases. NOW do you see why you should take this seriously? NOW do you see why it’s important to stay home? You’ll never be able to benefit from a rebounding economy if you’re dead.
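
Don’t take my word for the arithmetic – it’s three lines of Python:

print(f"${1.16 ** 365:,.2f}")          # one dollar at 16% daily for a year
print(f"${1.16 ** 90:,.2f}")           # the same rate for three months
print(f"{115_547 * 1.16 ** 28:,.0f}")  # U.S. cases in 28 days at G = 1.16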

(By the way, I didn’t just pull 1.16 from some dark hole. It’s the growth factor for North Carolina as of 27Mar2020 at 10:30.)

COVID-19 in North Carolina: What the data shows

I’m a data-driven type of person. If there’s a situation where numbers are available I’ll use them to make sense of what’s going on. The current pandemic is a perfect example. I became tired of the bloviating on radio and TV and started doing a little data analysis of my own.

A very good source of up-to-date numbers comes from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. You can see their COVID-19 dashboard here. Every day between 10:00AM and 11:00AM I use this website to find out how many confirmed cases have been reported for North Carolina. I take this number and add it to a dataset that I process with some R code that I’ve written.

Before I get into the actual data I want to go through an easy example illustrating how the data is processed. Let’s say we have an infection that causes the number of cases to double every day:

Day 1 2 3 4 5 6 7
Cases 1 2 4 8 16 ? ?

From these numbers a growth factor can be calculated which we can use to predict the number of future cases:

Day 1 2 3 4 5 6 7
Cases 1 2 4 8 16 ? ?
New Cases 1-0 = 1 2-1 = 1 4-2 = 2 8-4 = 4 16-8 = 8 ? ?
Growth NA 1/1 = 1 2/1 = 2 4/2 = 2 8/4 = 2 ? ?

New cases are yesterday’s case count subtracted from the current day’s count, and growth (G) is the number of today’s new cases divided by the number of yesterday’s new cases. Since the number of cases doubles each day in our example, G is a constant, 2.

Cases(d+1) = Cases(d) + G × NewCases(d)

which is borne out in the sequence:

Day 1 2 3 4 5 6 7
Cases 1 2 4 8 16 32 64
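
In code, the whole bookkeeping exercise is just a few lines (a Python sketch; my own processing is in R):

cases = [1, 2, 4, 8, 16]
new = [cases[0]] + [cases[i] - cases[i - 1] for i in range(1, len(cases))]
growth = [new[i] / new[i - 1] for i in range(1, len(new))]
print(new)     # [1, 1, 2, 4, 8]
print(growth)  # [1.0, 2.0, 2.0, 2.0]

# Predict days 6 and 7 using the latest growth factor, G = 2:
G = growth[-1]
for _ in range(2):
    new.append(new[-1] * G)
    cases.append(cases[-1] + new[-1])
print(cases)   # [1, 2, 4, 8, 16, 32.0, 64.0]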


What the current NC data looks like

Early data was a little noisy, leading to wildly different (and horrific) growth factors from day to day. Here’s what the daily growth factors look like:


In order to smooth things out a bit, the R code filters out all growth factors greater than 2.0, then takes the average of the most recent seven. Growth factors greater than 1.0 indicate that the infection is spreading; 1.0 is the inflection point where there is a constant number of new cases each day; and growth factors less than 1.0 indicate fewer and fewer new cases per day. Eventually the growth factor gets to zero (or very close to it).
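
Here’s a sketch of that pipeline in Python (assumed from the description above – the real code is in R – and it assumes the case count increases every day):

def project_week(cases):
    new = [cases[i] - cases[i - 1] for i in range(1, len(cases))]
    growth = [new[i] / new[i - 1] for i in range(1, len(new))]
    kept = [g for g in growth if g <= 2.0]   # drop the crazy spikes
    G = sum(kept[-7:]) / len(kept[-7:])      # average of the last seven
    total, latest = cases[-1], new[-1]
    projection = []
    for _ in range(7):                       # compound forward one week
        latest *= G
        total += latest
        projection.append(round(total))
    return G, projection

print(project_week([1, 2, 4, 8, 16]))
# (2.0, [32, 64, 128, 256, 512, 1024, 2048])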

Plotting the raw data, the daily number of cases, and a seven-day projection gives the following plot:

The number of daily cases (blue line) has been hovering around the 100 mark for several days. Ideally, this line should slope downward and go to zero. The projected line is for the next week. By 03Apr2020 the data suggests around 2,100 confirmed cases. I’ll post a new graph next Friday for a comparison.

Three musicians are sitting in a bar . . .

. . . in Austria. At 3:00AM. The topic turns to “Funny titles for music”. We aren’t talking about the usual titles like “If You Won’t Leave Me, I’ll Find Somebody Who Will” by Warren Zevon, “Shoop Shoop Diddy Wop Cumma Cumma Wang Dang” by Monte Video and the Casettes, or “Satan Gave Me a Taco” by Beck. We are throwing out titles that could put you into a major existential crisis. Unfortunately, only one stuck with me: “The Negative Space of Speed”, which I subtitled “A Musical Null Set”.

Returning from Austria I felt that this title/subtitle needed to actually exist as a piece of music. What I soon realized is there is no possible combination of pitches, rhythms, dynamics, and tempi that could adequately describe this piece. The title implies “not music” or the absence of music. From this realization I managed to score the work for a full wind symphony (minus the piano and harp).

The beauty of the work is its flexibility. It requires NO rehearsal, can be played with ANY combination of wind and percussion instruments, and the musician’s skill level is completely irrelevant. It can also be used to pad an otherwise sparse concert repertoire.

Let’s say that you’ve programmed your concert for six works. But one of the works will have to be cut. There are any number of reasons this could happen:

  • you discovered that you don’t have performance rights to the work
  • the guest soloist turns out to be a serial killer and is arrested the day before the performance
  • the Fire Marshal rules that you can’t play your pyrophone in the concert hall
  • or, the band just can’t play the music.

No problem. Pull the work and replace it with “The Negative Space of Speed”. Your program still has six works and it appears you’re playing a cool avant-garde 21st-century work. Which you are . . . at least conceptually.

Please note that this is not an homage to (or blatant ripoff of) 4’33”, John Cage’s work that explores the idea that segments of time can be filled with structured or ambient sound (noise). Both can be considered music. Additionally, Cage was interested in the behavior of the audience and how their actions contribute to the music.

“The Negative Space of Speed” plays with the idea of zero duration. Without duration there can be no music – structured or otherwise. BUT, oddly enough, there can still be performance. The conductor turns the page to the score, cues the ensemble, and the audience eagerly anticipates the downbeat of this new piece. What they don’t realize is that they missed the piece entirely and the downbeat is for the next work on the list (probably “First Suite in E-flat for Military Band” by Gustav Holst). While this may not cause a near riot as did the premiere of Stravinsky’s “The Rite of Spring”, it hopefully will fuel a lively audience discussion of what constitutes music.

And yes, you too can add “The Negative Space of Speed” to your repertoire. The perpetrators of this work have decided to make it a free download!! Just click here.


Electronic Brain Droppings

I’m finally learning the electronics that I should have learned 40 years ago. I did a bunch of Heathkit projects (Heathkit is still around, but a shadow of its former self), including my entire stereo system, but I never delved too deeply into the theory of how the circuits worked. So now I’m back studying circuit theory, operational amplifiers, first-order filters, and an entire zoo of 21st-century electronics design. (That integrated circuit does ALL of that!?!)

Having designed a UV densitometer for my photographic work, I’m already thinking about a second, more sophisticated version. I also want to explore musical vibrato from the standpoint of what repetition rates and depths people find pleasing. Both of these will require some careful design work.

One thing I’ve already had to deal with is non-standard resistor values. When working with operational amplifiers there are sets of equations used to calculate resistor or capacitor values. Since I have a bigger collection of resistors than capacitors, it’s easier for me to fix the capacitor value and calculate the resistor value(s) I need. Of course these calculated values are generally nowhere near the standard resistor values. Figuring out which combination of resistors will get you closest to 27,658 ohms gets very tedious. So, being a retired programmer, I did what any programmer would do . . . I wrote a Python script to find the combinations for me. (View the source code here.)


A note to any programmers who may look at this code.

I don’t care that I’m not using some latest-greatest, neat-oh-bleet-oh Python 3.x feature. I don’t care that there’s a “better way” to do this or that there’s a more “acceptable” way. I’m a RETIRED programmer. This is for my amusement and if you find any part of this useful all the better.
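
To give a flavor of the search before getting into the script’s behavior, here’s a minimal Python sketch (NOT the linked r_finder.py) of a greedy series search. It assumes a parts drawer stocked with the standard E12 values from 10 ohms to 8.2M ohms; the greedy largest-value-first strategy happens to reproduce the series solutions in the sample runs below:

# Greedy series search: repeatedly take the largest standard value
# that doesn't exceed the remaining resistance.
E12 = [1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6.8, 8.2]
VALUES = sorted(v * 10 ** d for d in range(1, 7) for v in E12)

def series_pick(target, parts=3):
    picks, remaining = [], target
    for _ in range(parts):
        candidates = [v for v in VALUES if v <= remaining]
        if not candidates:
            break
        picks.append(max(candidates))
        remaining -= picks[-1]
    return picks

print(series_pick(27658))  # [27000.0, 560.0, 82.0] -> 27642 ohms, 0.06% low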

The script accepts the target resistance and a percentage tolerance, then looks for two candidate networks: three resistors in series, and two resistors in parallel plus a series resistor, each with a combined value within the percentage tolerance provided. Running the script without arguments and with the ‘-h’ option gives the following expected output:

wemrt% ./r_finder.py 
usage: r_finder.py [-h] [--record] resistance tolerance
r_finder.py: error: the following arguments are required: resistance, tolerance

wemrt% ./r_finder.py -h
usage: r_finder.py [-h] [--record] resistance tolerance
positional arguments:
      resistance  resistance in ohms e.g. 1000000 not 1M
      tolerance   percent tolerance e.g. 5 not 0.05

optional arguments:
       -h, --help  show this help message and exit
       --record    If provided write results to r_finder.dat


Running the script with the 27,658 ohm value mentioned above at a tolerance of 5% produces:

wemrt% ./r_finder.py 27658 5

Target resistance: 27658 ohms
Acceptable range: 26275.100000 to 29040.900000 ohms

                    Series Solution

Resistor 1: 27000 ohms
Resistor 2: 560 ohms
Resistor 3: 82 ohms

               Parallel/Series Solution

   First resistor: 33000 ohms
Parallel resistor: 150000 ohms
    Network value: 27049.18 ohms
  Serial resistor: 1800 ohms

  Series Solution: 27642.00 ohms at 0.06%
Parallel Solution: 28849.18 ohms at 4.31%

At very low resistance values there may not be a good serial solution, but a parallel solution exists:

wemrt% ./r_finder.py 89 5.0

Target resistance: 89 ohms
Acceptable range: 84.550000 to 93.450000 ohms

                    Series Solution

Resistor 1: 82 ohms
No resistor found less than 7 ohms.
No solution found with tolerance 5.00%

               Parallel/Series Solution

   First resistor: 100 ohms
Parallel resistor: 680 ohms
    Network value: 87.18 ohms
No serial resistor found.

  Series Solution: 82.00 ohms at 7.87%
Parallel Solution: 87.18 ohms at 2.05%

and at very large resistance values there may not be a good parallel solution:

wemrt% ./r_finder.py 1432947 5.0

Target resistance: 1432947 ohms
Acceptable range: 1361299.650000 to 1504594.350000 ohms

                    Series Solution

Resistor 1: 1200000 ohms
Resistor 2: 220000 ohms
Resistor 3: 12000 ohms

               Parallel/Series Solution

   First resistor: 1500000 ohms
Parallel resistor not found.

  Series Solution: 1432000.00 ohms at 0.07%
Parallel Solution: 1500000.00 ohms at 4.68%

In order to see the behavior of this script I wrote a driver to generate test cases that used the --record option to capture the solutions found. The driver accepted a starting resistance, ending resistance, increment, and percentage tolerance, generating a file looking like:

#!/bin/sh
./r_finder.py 2000 5.000000 --record > /dev/null 2>&1
./r_finder.py 4000 5.000000 --record > /dev/null 2>&1
./r_finder.py 6000 5.000000 --record > /dev/null 2>&1
./r_finder.py 8000 5.000000 --record > /dev/null 2>&1

The output from running these commands was fed into R and plotted:

Obviously there’s jitter in the solutions, but raising the resolution to a test every 50 ohms up to 100,000 ohms gives a more interesting plot:

The serial solution will (almost) always beat the parallel solution, but the periodicity of the solutions is interesting. Of course the final decision as to which solution to use depends on the sensitivity of the circuit, availability of parts, and possibly even the amount of space available on the PCB.


The Mahler Hammer at Eastman

The Eastman Philharmonia under Neil Varon’s baton performed Mahler’s 6th Symphony on 14Nov2018. In need of a Mahler Hammer on short notice, Neil contacted me about either renting, buying, or having one made. Since the hammer detailed on this site was available from the Duke University Wind Symphony, I worked with the Eastman School staff to have it shipped to them in time for their last rehearsals and the performance.


Good striking form!


Mahler Hammers can also be used for “percussive maintenance”!


Just in case you thought . . .

. . . my previous post was useless, here’s something to consider.

You go to a nice restaurant and order a $40 glass of Domaine de Montille Corton Clos du Roi Grand Cru. The sommelier pours you a bit; you swirl, smell, sip, and nod your approval. The sommelier pours your wine and leaves you looking at the glass wondering if you’ve been cheated.

The shape of the glass is determined by the function sin(x). Here’s the plot (x = 0.0 to 1.5 radians) from the R statistical system. It’s been rotated 90 degrees counterclockwise to make visualization easier and is marked at x = 0.75 (half the height) and x = 1.111871 (half full by volume).


Using this as a template draw a cross-section of a nice wine glass and fill it to half the height (fig. 1) and half the volume (fig. 2) like so:


Being a beer kind of guy, if I’m going to spend $40 on a glass of wine, I want a full glass of wine. But, as you can see, what appears to be a fairly full glass of wine is still only half full by volume. To prove this we can use the program from the previous post to calculate the volumes bounded by a = 0.0, b = 1.111871 and by a = 1.111871, b = 1.5. Here are the results:

[Walters-iMac:~/desktop] wemrt% python halffull.py
lower_bound = 0.000000
upper_bound = 1.500000
height      = 1.500000
volume      = 2.245359
half-volume = 1.122680 <- This should be the volume of the
                          top half of the glass.

Tolerance : 0.000001
Iterations: 20
lower_bound = 0.000000
upper_bound = 1.111871
height      = 1.111871
volume      = 1.122679

Then using the upper_bound for the lower_bound and 1.5 for the upper bound:

[Walters-iMac:~/desktop] wemrt% python halffull.py
lower_bound = 1.111871
upper_bound = 1.500000
height      = 0.388129
volume      = 1.122676 <- Off by 4 X 10^-6 rounding error.
half-volume = 0.561338

Tolerance : 0.000001
Iterations: 19
lower_bound = 1.111871
upper_bound = 1.315996
height      = 0.204125
volume      = 0.561337
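
Since the glass profile is sin(x), these numbers can also be checked in closed form: the solid-of-revolution volume is π∫sin²(x)dx = π[x/2 − sin(2x)/4] evaluated between the bounds. A few lines of Python (a cross-check, not the halffull.py script):

import math

def V(a, b):
    F = lambda x: x / 2 - math.sin(2 * x) / 4   # antiderivative of sin(x)**2
    return math.pi * (F(b) - F(a))

print(V(0.0, 1.5))       # ~2.245359, the full glass
print(V(0.0, 1.111871))  # ~1.12268, the bottom "half"
print(V(1.111871, 1.5))  # ~1.12268, the top "half"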

So, yes, you were being shorted by quite a lot of wine. Call the sommelier back, show him this post, and tell him that for $40 he can bloody well give you half a glass of wine.


Half-full? Half-empty? Neither . . .

I ran across an image on Facebook that had various answers to the classic “Is the glass half-empty or is it half-full?” question. The glass is classically rendered with the water level appearing about half-way up the height of the glass. After considering the possible answers I decided that the only one that was correct was the surrealist’s, because the two equal-height frustums (one filled with water and the other with air) have different volumes. But this was an offhanded observation that I felt needed some proof. Sooo . . . here we go.

First, let’s design a glass from an easy function:

[The glass: the line f(x) = x/2 between x = 2 and x = 4, revolved about the x-axis.]


I’m going to divide the glass in two exactly at half its height. The question is: do both halves contain the same amount of water? To answer that, use the frustum volume formula found on the Wikipedia page:

V = (π · h / 3) · (R² + R·r + r²)

With the glass profile above, the bottom half runs from radius r = 1.0 to R = 1.5 and the top half from r = 1.5 to R = 2.0, each with height h = 1:

Bottom half: V = (π/3)(1.5² + 1.5·1.0 + 1.0²) = (π/3)(4.75) ≈ 4.974
Top half:    V = (π/3)(2.0² + 2.0·1.5 + 1.5²) = (π/3)(9.25) ≈ 9.686

So no – the top half holds nearly twice as much as the bottom half.

So where along the x-axis is half full? There are a couple of ways to approach this question, but since we’re already thinking about the above diagram as a solid of revolution, with a little programming a general approach for all glasses can be devised.

The basic idea is to derive the definite integral of the function so that, by setting the limits appropriately, we can find the total volume and then search for the x value that evaluates to half the total volume. The integrations of two functions (the one above and another for a more curved glass) can be seen here. The formulas are coded into the following script:

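(What follows is a minimal reconstruction from the output below – the original was posted as an image. It assumes the glass profile f(x) = x/2 from the diagram above, so the solid-of-revolution volume between a and b has the closed form π(b³ − a³)/12; the exact iteration count depends on the convergence test used.)

import math

def volume(a, b):
    # pi * integral of (x/2)**2 dx from a to b
    return math.pi * (b ** 3 - a ** 3) / 12.0

LOWER, UPPER, TOL = 2.0, 4.0, 1e-6
total = volume(LOWER, UPPER)
target = total / 2.0
print(f"volume      = {total:.6f}")      # 14.660766
print(f"half-volume = {target:.6f}")     # 7.330383

# Bisection: home in on the x that bounds half the total volume.
lo, hi, iterations = LOWER, UPPER, 0
while abs(volume(LOWER, (lo + hi) / 2) - target) > TOL:
    mid = (lo + hi) / 2
    if volume(LOWER, mid) < target:
        lo = mid
    else:
        hi = mid
    iterations += 1
print(f"Iterations: {iterations}")
print(f"upper_bound = {(lo + hi) / 2:.6f}")  # ~3.301927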

When this script is executed the output is:

lower_bound = 2.000000
upper_bound = 4.000000
height      = 2.000000
volume      = 14.660766
half-volume = 7.330383
Tolerance : 0.000001

Iterations: 23
lower_bound = 2.000000
upper_bound = 3.301927
height      = 1.301927
volume      = 7.330384

So the x-axis value which divides the glass into two equal volumes is the lower bound plus the height of 1.301927, or 3.301927. Checking the math using this new value gives:


Well, ok, they aren’t exactly equal . . . the difference is 3.708323 × 10^-6. Close enough. The glass with the correct amount of fluid in it would look like the following.
