Through our data management and analytics consulting services we help companies outline a solid data strategy and foster an insights-driven culture

We humans suffer from systematic flaws in rational judgment and in understanding the world. Such systematic errors in perception are known as cognitive biases. Wikipedia lists over 100 well-studied cognitive biases, and if you think such biases concern others but not you, well, you are not alone. Like most of us, you simply suffer from the bias blind spot, which is a cognitive bias itself!

With the current outbreak of COVID-19, a cognitive bias that puzzles millions of people in home isolation around the globe is the *exponential growth bias* [1]. Our brain is not wired to think in exponential terms. We can easily estimate how far we will get if we take 30 steps from our front door, but we cannot estimate how far we will get if we take 30 exponential steps [2], that is, if we double the length of our step each time: 1 + 2 + 4 + 8 + 16 and so on. The answer is that we would go around the globe 26 times!
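The arithmetic behind the 30 exponential steps can be checked with a few lines of Python. This is only a sketch: the one-metre first step and the rounded value for the Earth's circumference are our own assumptions, not figures taken from [2].

```python
# Assumptions (not from the source): the first step is 1 metre long and
# the Earth's circumference is roughly 40,075 km.
EARTH_CIRCUMFERENCE_M = 40_075_000
FIRST_STEP_M = 1.0


def linear_distance(steps, step_m=FIRST_STEP_M):
    """Distance covered when every step has the same length."""
    return steps * step_m


def doubling_distance(steps, first_step_m=FIRST_STEP_M):
    """Distance covered when each step is twice as long as the previous one."""
    # 1 + 2 + 4 + ... + 2**(steps - 1) equals 2**steps - 1
    return sum(first_step_m * 2 ** k for k in range(steps))


linear_m = linear_distance(30)
doubling_m = doubling_distance(30)
laps = doubling_m / EARTH_CIRCUMFERENCE_M

print(f"30 ordinary steps: {linear_m:.0f} m")
print(f"30 doubling steps: {doubling_m / 1000:,.0f} km")
print(f"Times around the globe: {laps:.1f}")
```

Under these assumptions, the 30 doubling steps cover a little over a million kilometres, which is between 26 and 27 trips around the globe.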

In our COVID-19 series of blog posts, we will investigate how interactive visualisations and predictive analytics can help make sense of the outbreak data

The coronavirus spreads exponentially rather than linearly. Exponential growth bias makes it very hard for citizens to evaluate the status of the pandemic in their countries, including the potential outcomes in the coming days. A major aim of our COVID-19 series of blog posts is to provide interactive visualisations and tools, understandable by the non-expert, that will help to lower the exponential growth bias barrier and shed some light on the trends and correlations underpinning COVID-19 data.

The questions that we will try to address are:

- Do we have reliable means to investigate how one country is doing compared to another?
- How well do exponential evolution examples (like the one with the exponential steps out of your front door) reflect the actual spread of the coronavirus?
- Is COVID-19 severity correlated with demographics, weather and socioeconomic conditions?
- Which measures did each country take, and when? How effective are they?
- Can we predict the overall outcome and the time evolution of the disease for a certain country or city?
- What is the AI and machine learning answer to the outbreak?

Before delving into ways to visually assess the COVID-19 outbreak, let us consider the daily death reports ranging from the 22nd of February to date, i.e. the 10th of April. We pick 5 countries as an example: Italy and Spain, representing countries that have suffered much but have reached the peak; the UK and the US, which were hit later and are still getting worse in terms of the number of people infected and fatalities; and, finally, Belgium, representing a country with a relatively late increase in cases.

If we plot the cumulative (overall) fatalities for each date, we get the chart above. As of today (10 April 2020), the impression we get is that the situation in Italy and Spain is similar, the UK case is milder, and the US is deteriorating at a fast pace. Note, however, that if we were back on the 4th of April and only had data up to that point, we might have concluded that the US was likely to have a profile like Spain and Italy. Belgium seems to be experiencing a much milder and relatively well-controlled outbreak. Feel free to include in the chart the countries of your choice. You can select them from the drop-down menu or just type the country you want in the country selection pane. Is this chart the best approach to visually get a sense of exponential growth and assess the situation in each individual country? The brief answer is no! We suggest relying on three complementary techniques: *time alignment*, *use of a logarithmic scale* and *exponential fit*.

Media and policy makers are making statements like “we are 2 to 3 weeks behind” country X, where X is often Italy. Such comparisons, if correct, provide a tangible assessment of the situation and bypass exponential growth bias. To generalize such a comparative analysis, we must align all countries in time, as if the outbreak had started on the same day. There are many ways to achieve this. The simplest one is to consider as day 1 the date on which the first case was recorded in each country. A somewhat better option is to let several cases accumulate and align all countries so that they reach a fixed number of total cases, e.g. 50, on a common day. We follow the latter approach in the next chart.
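To make the idea concrete, here is a minimal sketch of threshold-based time alignment in plain Python. The function name `align_to_threshold` and the numbers are hypothetical and purely illustrative; they are not real outbreak data, and this is not necessarily how our tool implements the alignment.

```python
def align_to_threshold(cumulative, threshold=50):
    """Return the part of a cumulative series starting on the first day
    the count reaches the threshold (that day becomes the common day 1)."""
    for day, value in enumerate(cumulative):
        if value >= threshold:
            return cumulative[day:]
    return []  # the country has not reached the threshold yet


# Made-up cumulative case counts, indexed by calendar day.
raw = {
    "Country A": [0, 3, 20, 62, 155, 400],
    "Country B": [0, 0, 0, 1, 9, 55, 130],
}

aligned = {name: align_to_threshold(series) for name, series in raw.items()}

for name, series in aligned.items():
    print(name, series)
```

After alignment, position 0 in every list is the first day with at least 50 cases, so the series can be plotted on a shared "days since the 50th case" axis regardless of the calendar date on which each outbreak started.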

The conclusions we can draw now are much improved [3]. Spain is apparently in a worse position compared to Italy (at least until around day 45), and the US situation is rapidly getting worse compared to Spain and Italy. The situation in the UK is as bad as in Italy. Belgium, on the other hand, seems to be in a worse situation than what we could observe without time alignment.

Time alignment allows reliable comparative analysis among outbreaks

Time alignment is most beneficial when comparing outbreaks that occur far apart in time or when some of them are in their early stages of development.

Although time alignment reveals insights regarding the pace of the outbreak in each country, there is a clear issue to tackle. The early days of each outbreak are compressed near the bottom of the chart and are hard to compare. This is inevitable because of the nature of exponential growth: it takes a while to get going but, before long, it skyrockets. Such an example is depicted next, involving the confirmed infections in the US, Spain, and Greece. Clearly, comparisons between the early stages of the epidemics in the US and Spain are hard to make, and any visual assessment of Greece is not feasible.

The solution to this is to change the vertical axis of the chart from a linear to a logarithmic scale. This means that instead of rising linearly (10, 20, 30, …, 1000, etc.), the axis escalates in an exponential fashion, e.g. in powers of 10 (10, 100, 1000, …). Unless you are well into maths, this might be hard to grasp. There are just two things to keep in mind when visualizing data on the log scale: a) the space we reserve on the vertical axis to visualize cases 1 to 10 is equal to the space we reserve for cases 11 to 100, 101 to 1,000, 1,001 to 10,000, etc. This is helpful since we can now offer more space to the small data values that correspond to the onset of each outbreak. b) If the cases of a country are growing exponentially, then in log scale they will appear on a straight line. In a later post we might also investigate the potential of log-scale representations to reveal the most impactful of the containment measures that each country took. Let us see how the previous example appears when the vertical axis is in logarithmic scale rather than linear:

The clutter of the small values is resolved. An extra merit is that we can easily compute the number of days that it took for each country to increase the reported cases by a factor of 10, say from 1,000 to 10,000. We just need to count the dots lying between the corresponding grid lines, which reveals that in the early stages of the epidemics, the confirmed cases in Spain escalated at a somewhat faster pace than in the US. Spain took 15 days to go from 100 to 10,000, whereas the US took 17 days. After that, the growth rate of Spain slows down (it takes another 15 days to go from 10,000 to 100,000, compared to 9 days for the US). It is helpful to hover over the data points to get a tooltip listing the exact date and number of cases.
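Counting the dots between two grid lines can also be done programmatically. The sketch below uses a hypothetical helper, `days_between_levels`, and a synthetic series that doubles every day; it does not use the real figures for Spain or the US.

```python
def days_between_levels(cumulative, low, high):
    """Number of days between the first day the series reaches `low`
    and the first day it reaches `high`; None if a level is never reached."""

    def first_day_at_or_above(level):
        for day, value in enumerate(cumulative):
            if value >= level:
                return day
        return None

    start = first_day_at_or_above(low)
    end = first_day_at_or_above(high)
    if start is None or end is None:
        return None
    return end - start


# Synthetic series that doubles daily: 100, 200, 400, ...
synthetic = [100 * 2 ** day for day in range(12)]

days_1k_to_10k = days_between_levels(synthetic, 1_000, 10_000)
print("Days from 1,000 to 10,000 cases:", days_1k_to_10k)
```

Because the data are daily, the result depends on which day first crosses each level, so it is a whole number of days rather than an exact crossing time.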

Logarithmic scale is beneficial when focusing on the early stages of an outbreak

Most often, when the spread of the coronavirus is described as exponential growth, exponential growth with a fixed growth rate is implied. The growth rate is the percentage increase in cases from one day to the next. A 10% growth rate means that if you have 100 cases today you will have 110 tomorrow, and when you have 1,000,000 cases you will have an extra 100,000 cases the day after. In a log-scale chart this means that it takes the same number of days to get from one grid line to the next, which is equivalent to all the points lying on a straight line. Fortunately, the real data reveal that such behaviour is a roughly accurate approximation only during the early stages of the epidemic, say, in the US, going from case 100 to case 20,000. After that, the growth rate decreases for several reasons, including the extensive measures that the states took to contain the epidemic and the fact that the ability of the virus to infect new people is gradually reduced, because potential candidates have either been ill and recovered, and are therefore immune, or have passed away.
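The link between a fixed growth rate and a straight line in log scale can be verified numerically. In the sketch below, the 10% rate and the starting value of 100 cases are illustrative assumptions only.

```python
import math

GROWTH_RATE = 0.10   # assumed fixed daily growth rate (10%)
START_CASES = 100    # assumed number of cases on day 0

# Cases under fixed-rate exponential growth for 10 consecutive days.
cases = [START_CASES * (1 + GROWTH_RATE) ** day for day in range(10)]

# Day-to-day increase on a linear axis: it keeps getting larger.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# Day-to-day increase on a log10 axis: it is the same every day,
# which is why the points fall on a straight line.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print("linear steps:", [round(step, 1) for step in linear_steps])
print("log10 steps: ", [round(step, 4) for step in log_steps])
```

The constant step on the log axis equals log10(1 + growth rate). When real data stop following a straight line in log scale, the growth rate is no longer fixed.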

What does it mean to not be able to grasp exponentially evolving phenomena? It means that when we have a series of data from date A to date B, we are not capable of imagining the data to come during the following days. Well, here comes the aid of predictive modeling! We can take a model that is constrained to obey exponential growth and tune it in a data-driven way (i.e. using the available data from date A to date B) to fit the data best. This means finding, among all possible exponential growth curves (which are infinitely many), the one that follows the available data as tightly as possible.

Assuming fixed growth, exponential fit estimates the expected future case numbers

Let us take the US example and find the exponential curve that best fits the data up to the first 1,025 confirmed infections, i.e. up to the 11th of March, indicated by the vertical dashed line (“Use data up to this day”).

The thin transparent curve is the predicted exponential curve and, as can be seen, it fits the data well up to the 27th of March, when the total number of cases is 85,991. (You can confirm the exact values in the tooltip that appears when hovering over the data points.) Moreover, if we hover over the exponential curve of the US, we get the growth rate of the predicted exponential fit, which is 32.51%. This means that the model predicts that the number of confirmed infections in the US on each day is 32.51% larger than on the day before. Fortunately, the true growth rate of the infections was not fixed, and it decreased after the 27th of March, where the estimated model diverges from the true data.
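One common way to obtain such a fit is ordinary least squares on the logarithm of the case counts. The sketch below illustrates that approach on synthetic data; it is an assumption on our side that this resembles the fitting procedure of the interactive tool, and the helper names `fit_exponential` and `predict` are hypothetical.

```python
import math


def fit_exponential(cases):
    """Fit cases[t] ~ a * (1 + r) ** t by least squares on log(cases).

    Returns (a, r): the fitted day-0 value and the daily growth rate.
    Requires at least two strictly positive values.
    """
    days = list(range(len(cases)))
    logs = [math.log(value) for value in cases]
    n = len(cases)
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    slope = (
        sum((d - mean_day) * (y - mean_log) for d, y in zip(days, logs))
        / sum((d - mean_day) ** 2 for d in days)
    )
    intercept = mean_log - slope * mean_day
    return math.exp(intercept), math.exp(slope) - 1


def predict(a, r, day):
    """Extrapolate the fitted curve to a given day index."""
    return a * (1 + r) ** day


# Synthetic data growing by exactly 30% per day (not real US figures).
observed = [100 * 1.3 ** day for day in range(10)]

a, r = fit_exponential(observed)
print(f"fitted growth rate: {r:.2%}")
print(f"predicted cases on day 20: {predict(a, r, 20):,.0f}")
```

Fitting in log space weights relative rather than absolute errors, so other fitting choices may yield a slightly different growth rate on real, noisy data.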

Assume the growth rate in country X is fixed at 10%. How many days does it take to get from 10 cases to 1000? If your answer is 100 days, do not worry, it is the exponential growth bias to blame!
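For readers who want to check the answer to this question, here is a short sketch assuming the growth compounds once per day at exactly 10%. The helper name `days_to_reach` is hypothetical.

```python
import math


def days_to_reach(start, target, daily_growth):
    """Smallest whole number of days n with start * (1 + g) ** n >= target."""
    return math.ceil(math.log(target / start) / math.log(1 + daily_growth))


days = days_to_reach(10, 1000, 0.10)
print("Days from 10 to 1000 cases at 10% daily growth:", days)
```

Under this assumption the answer is 49 days: after 48 days the count is still just below 1000.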

The exponential growth fit failed (good for us!) to capture the spreading dynamics of the coronavirus. This is because the epidemic is not growing at a fixed growth rate. In the following blog posts we will explore more reliable approaches to deal with this.

In the meantime, feel free to interact with LIBRA MLI's COVID-19 interactive tool at https://libramli.ai/tools/covid-19-visualisation-tool-A.

Stay tuned!

[1] Levy, M.R. and Tasoff, J., 2017. Exponential-growth bias and overconfidence. *Journal of Economic Psychology*, *58*, pp.1-14.

[2] This nice example has been taken from *Abundance: The Future Is Better Than You Think*, by Peter Diamandis and Steven Kotler, Simon and Schuster, 2012.

[3] The charts you see here are live, so they will be automatically updated on a daily basis. Today (13th of April), the data we plot in the chart reach 49 days in the US, 52 in Spain, 62 in Italy, 47 in the UK and 43 in Belgium. Some of the conclusions might change as the situation evolves.
