Trends in Covid-19 Spread
Johns Hopkins University (JHU) collects data on how the COVID-19 virus spreads around the world and makes it available on GitHub. In this analysis, we take the data and extract the numbers that describe how the growth in casualties develops. Looking at the trends in these numbers, one may deduce how fast or how slowly the virus is spreading and also compare how the situation evolves in different countries.
Feel free to use the Python implementation that produces the presented plots; it can be modified to show the results for any country in the JHU database.
This post is based on joint efforts and discussions of the CSC group at the MPI Magdeburg.
The numbers in the presented plots are updated automatically [1].
Why This Analysis
If one looks at the actual numbers generated by exponential growth, the overall growth is so strong that one barely sees changes in the dynamics. On a log scale, the data becomes easier to read: exponential growth looks like a straight line, and the slope of the line indicates how fast the exponential growth is.
To see whether the exponential growth changes its dynamics, one may check whether the line in the logarithmic plot changes its slope (hopefully it gets less steep). That’s why we plot the slope of the logarithmic line that connects the numbers of two consecutive days. Since the data is wiggly, we also plot averages of the slopes over 2 or 5 days to better spot the trends.
This post was inspired by an illustration that was discussed on Twitter in the last few days: it shows how, seemingly, the exponential growth in casualties decreased some days after lockdowns had been implemented. In this analysis, we try to automatically detect such changes in the growth for a number of countries and present them in an accessible form that allows easy comparisons.
We examine the (logarithmic) slope of the curves of the accumulated casualties per day. That is, for day d and the day d-1 before it, we plot log2(x[d]) - log2(x[d-1]), where
- x holds the number of accumulated deaths for every day,
- log2 is a function that computes the logarithm of a number with respect to the base 2 [3].
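The definition above, together with the 2-day and 5-day averages, can be sketched in a few lines of Python. This is only an illustration and not necessarily how the published implementation works: the function names, the made-up counts, and the choice of a trailing (backward-looking) average are assumptions.

```python
import math

def log2_slopes(cumulative):
    """Return log2(x[d]) - log2(x[d-1]) for every pair of consecutive days.

    Days where one of the two counts is zero yield NaN, because log2(0)
    is undefined.
    """
    slopes = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        if prev > 0 and curr > 0:
            slopes.append(math.log2(curr) - math.log2(prev))
        else:
            slopes.append(math.nan)
    return slopes

def trailing_mean(values, window):
    """Average each value with the (window - 1) values before it.

    The first (window - 1) entries are NaN because not enough data exists.
    Using a trailing window is an assumption made for this sketch.
    """
    means = []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(math.nan)
        else:
            chunk = values[i + 1 - window : i + 1]
            means.append(sum(chunk) / window)
    return means

deaths = [10, 20, 40, 60, 90]  # made-up cumulative counts, not JHU data
slopes = log2_slopes(deaths)
smoothed = trailing_mean(slopes, 2)
print([round(s, 3) for s in slopes])
print([round(m, 3) for m in smoothed])
```

In this toy series the count doubles on the first two days, so the first two slopes are close to 1; afterwards the growth slows to 50% per day and the slope drops accordingly.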
To make sense of the numbers, we start with some example scenarios. In the plot below, we have plotted the values for some fictitious growth scenarios.
- Exponential growth: every day d, the number of additional casualties is 1.1 to the power of d (think of a daily increase by 10%). This scenario leads to a constant value in the plots of around 0.14, which is log2(1.1).
- Constant growth: every day another 10 casualties are added. This is like the number of daily deaths due to traffic accidents in Germany in 2010.
- Exponentially decreasing growth: every day the number of additional casualties gets exponentially smaller.
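The three scenarios can be reproduced with a short script. The sketch below is illustrative only: the 60-day horizon and the decay factor 0.9 in the third scenario are assumptions, since the post does not state them.

```python
import math
from itertools import accumulate

def log2_slopes(cumulative):
    """log2(x[d]) - log2(x[d-1]) for consecutive days (all counts > 0 here)."""
    return [math.log2(curr / prev) for prev, curr in zip(cumulative, cumulative[1:])]

days = range(60)  # assumed horizon
scenarios = {
    # additional casualties on day d
    "exponential": [1.1 ** d for d in days],
    "constant": [10.0 for _ in days],
    "decaying": [10.0 * 0.9 ** d for d in days],  # decay factor is an assumption
}

final_slopes = {}
for name, daily in scenarios.items():
    cumulative = list(accumulate(daily))  # accumulated casualties
    slopes = log2_slopes(cumulative)
    final_slopes[name] = slopes[-1]
    print(f"{name:12s} first: {slopes[0]:.3f}  last: {slopes[-1]:.3f}")

print(f"log2(1.1) = {math.log2(1.1):.3f}")
```

In the exponential scenario the slope starts higher and settles near log2(1.1) after an initial transient, because the accumulated sum needs a few days to be dominated by the exponential term. In the other two scenarios the slope keeps falling towards zero.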
General Explanations of the Numbers
- A constant value means that the number of deaths grows exponentially.
- If this constant is 1, the number of casualties doubles every day.
- A value of about 0.5 means a daily increase of about 40%.
- A decreasing curve indicates that the exponential growth of seriously infected people is stalled or reversed.
- If the value approaches 0, this indicates that the virus is contained.
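Since the plotted value s is a base-2 logarithm of the day-to-day ratio, it translates into a daily growth factor of 2 to the power of s and, for constant s, into a doubling time of 1/s days. A minimal sketch of this conversion follows; the function names are made up for illustration.

```python
import math

def daily_growth_percent(slope):
    """Daily increase in percent that corresponds to a log2 slope."""
    return (2.0 ** slope - 1.0) * 100.0

def doubling_time_days(slope):
    """Days needed to double at a constant log2 slope (infinite if slope <= 0)."""
    if slope <= 0:
        return math.inf
    return 1.0 / slope

# 1 and 0.5 are the values from the list above; 0.18 and 0.07 are the
# slopes reported for the two simulated scenarios in this post.
for s in (1.0, 0.5, 0.18, 0.07):
    print(f"slope {s:4.2f}: +{daily_growth_percent(s):5.1f}% per day, "
          f"doubling every {doubling_time_days(s):4.1f} days")
```

A slope of 0.5 therefore corresponds to a doubling every two days, and a slope that approaches 0 pushes the doubling time towards infinity.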
What It Would Look Like and What It Should Look Like
To compare what an uncontrolled spread and a well-controlled spread would look like, we ran two simulations in a COVID simulator.
- A scenario for Germany in which no interventions are taken.
- A scenario for which we defined interventions such that the number of hospitalized patients who require intensive care (ICU) always stays below 45000.
In the uncontrolled case, exponential growth is detected with a slope of about 0.18. After some time, when most people have been infected, the growth decreases down to zero.
In the controlled scenario, the rate is brought down in the initial phase. Then, exponential growth happens at a lower rate (in the plot the value is about 0.07) though for a longer time before it fades out.
To relate the slopes to the actual cases, in a second plot, we display the corresponding numbers of people that will need intensive care in a hospital. In our model, we assumed that 1.15% of the infected people will have to be treated in an intensive care unit.
The Actual Numbers
As of the date indicated in the plots, the JHU data gives the following numbers. We plot the logarithmic slopes for several countries for the last 50 days. See the PDF file for more countries and a better resolution of the plots.
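For readers who want to compute these slopes themselves, here is a hedged sketch that does not reproduce the published implementation. It assumes the wide CSV layout that, to our understanding, the JHU time series files use: one row per region, the metadata columns Province/State, Country/Region, Lat, and Long, and then one column of accumulated counts per date. The sample data and the country names are invented; check the column names against the current repository before relying on them.

```python
import csv
import io
import math

# Assumed metadata columns of the JHU time series layout
META_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")

def country_series(csv_text, country):
    """Sum the accumulated counts of all rows that belong to one country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in META_COLUMNS]
    totals = [0] * len(date_columns)
    for row in reader:
        if row["Country/Region"] == country:
            for i, col in enumerate(date_columns):
                totals[i] += int(float(row[col]))
    return date_columns, totals

def recent_log2_slopes(totals, days=50):
    """log2 slopes for the last `days` day-to-day steps (NaN where a count is 0)."""
    pairs = list(zip(totals, totals[1:]))[-days:]
    return [math.log2(c / p) if p > 0 and c > 0 else math.nan for p, c in pairs]

# Invented sample in the assumed layout; a country may span several rows.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20,3/4/20
,Examplia,0.0,0.0,4,8,16,24
North,Federatia,1.0,1.0,1,2,3,5
South,Federatia,2.0,2.0,1,2,5,11
"""

dates, totals = country_series(SAMPLE, "Federatia")
slopes = recent_log2_slopes(totals)
print(dates[1:])
print(totals, slopes)
```

Summing over all rows of a country matters because some countries are reported per province or state, so a single row would understate the national total.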
Explanations of the Data Representation
Because of natural fluctuations and, maybe, because of varying delays in the data transmission to JHU, the data does not describe a nice, smooth curve. Accordingly, the computed slopes can vary a lot, which leads to point clouds like the one in the plot for France 🇫🇷. The 2-day and 5-day averages smooth the corresponding data points. A few points lie outside the range of the plots. However, we do not adjust the plots to fit all outliers, since the interesting parts of the curve would then be less well resolved. On the other hand, the plot range for China 🇨🇳 is adjusted to better resolve the small numbers.
Some Interpretation of the JHU Data
As of today, one may say that:
- In China 🇨🇳 the spread seems to be under control.
- Italy 🇮🇹 seems to have weakened the exponential growth.
- For Spain 🇪🇸 the numbers show a visible decrease in the growth rate, although the rate is still high. If this decrease goes on, Spain might not overtake Italy.
- A comparison of France 🇫🇷 and Spain shows the effects of low growth rates. In the last days, the rates were similar. And although the virus seems to have hit France before it hit Spain, the total numbers in France are smaller by a factor of 3.
- In Germany 🇩🇪 the rates seem to be decreasing but are still high. However, since the decrease started earlier than in Spain, France, or Italy, there is reason to assume that Germany will suffer less from the virus (provided that the trend is stable).
Other Things that can be Seen
- The curves seem to have similar phases: a phase of exponential growth followed by a decrease.
- The curves differ in the length of the phases (France seems to have a long phase of exponential growth compared to Spain).
- The starting points are different, which hints at when the outbreak became visible in the different countries.
Notes and Acknowledgements
On the data
Certainly, the casualties lag behind the actual spreading of the virus by a number of days. However, one may think that the number of casualties is a more reliable data point than the number of infected.
This analysis is purely based on empirical trends. No statistical data analysis tools, such as denoising, have been applied to (pre-)process the data (except that some outliers might not be shown because of the plot margins).
The initial work, namely making the data easily available in Python as well as the code that produces the title picture, was done by Petar Mlinarić.
[1] The plots are updated once a day, in the morning.
[3] The base is not too important here. If one takes 2, then a value of 1 means a doubling. A different base would only scale the plots but not change the qualitative outcomes.