COVID-19: some epidemiological modelling

My last post was about COVID-19, essentially advising you to self-isolate - even if you don't yet (think you) have the disease. I've had a lot of responses to that, largely from people agreeing to follow my advice. However, I think it's important to give some explanation of why this matters so much. Today, therefore, I started trying to develop some models. In this post, I describe my findings (and the limitations attached to them).


First off, I'm again going to provide the caveat that I am not an infectious disease modeller. That said, "Modelling and the Dynamics of Infectious Disease" was one of the modules I took when I did the MSc Epidemiology course at the London School of Hygiene and Tropical Medicine. So I have a little bit of experience - and, still, the course handbook! Today, I dug out those notes and have been examining the first few lectures; I've also received some more recent material from a friend of mine who recently went on another infectious disease modelling course at the LSHTM (this essentially provided me with some pointers on how to code in R - which, if you've also read my Open Science post, you'll know is my preferred statistical programming language).

Now that's out of the way, we can get on with thinking about things. I'm going to base this post on the situation in the United Kingdom, as that's the country I'm most familiar with and know how to get the data for. But it should be easy to adjust for other places if you prefer.


I also want to say early on in this post that there are a lot of assumptions that are made here - by myself but also implicit in the models themselves. Therefore, please take these figures with a (large) grain of salt: it's probably wise to listen to experts over me, as they will have access to more complicated code that is able to take into account many more factors than I am able to.

Specifically, one of the biggest assumptions that is made in the following is that there is random mixing between all individuals: something which is never true. Think about it: kids go out to school and mix with lots of people, but it's generally the same people every day; similarly, adults will tend to mix with a few particular people (at work) and with their families. Most mixing is not random, even if there is an element of randomness in some of our daily lives (who's serving us when we go to the supermarket, which carriage of the train we get on and who else is in it, etc).

On the flip side, though, please think of the six degrees of separation which suggests that everyone in the world is connected to everyone else through a maximum of six steps through other contacts.

Model compartments

Now, the easiest types of model to understand are deterministic models. Luckily, these are well-suited to what we want to do. Based on the known clinical course of a disease, we can assign people to "compartments" - a typical example with three compartments may be:

  1. individuals who are susceptible to a disease;
  2. individuals who have the disease - and thus can give it to others;
  3. individuals who have recovered from the disease and are now immune.

This model is typically known as an "SIR model" - for Susceptible, Infectious and Recovered/immune.

COVID-19 is slightly more complicated than that, and we can compartmentalise it further. Most people in the literature seem to currently be considering it with four compartments - the extra compartment is individuals who are infected but not yet infectious, and is often known as 'E'. For example, a recent article in the Lancet Infectious Diseases journal (published on 11 March 2020) entitled "Early dynamics of transmission and control of COVID-19: a mathematical modelling study" used this SEIR approach.

There's one additional compartment we should also add to our model, however, and that's the number of deaths. I'm going to call this 'D' as I don't know what the main convention is but it seems logical to me. The reason I want this compartment is that I explicitly want to know how many deaths there might be in the UK for each of the model scenarios. So now the model looks like this:

              +-> R
S -> E -> I --|
              +-> D

Model parameters

Once we've decided on the compartments, we then need to figure out how many people are in each compartment, and how they move between them. This can be done using known or fictitious data or, as is more common, a combination of both. Indeed, we normally don't know what all the correct parameters are so we are trying to guess them - hence the term modelling, as we are seeing what the results will be assuming different scenarios. Anyway, let's start with building up a relatively simple model, and seeing what happens.

Our first compartment, S, has no one entering it - as no one has the disease - and the only people leaving it are those who are becoming infEcted (i.e. joining compartment E).

The second compartment, E, has those entering it from compartment S, and those leaving it to join compartment I (Infectious). Similarly, compartment I has people joining it from compartment E, and also those leaving who are (mostly) going to join compartment R as they are recovered and, we hope, now immune and thus no longer susceptible (I'll add the deaths in a little while, below). This is relatively easy to model, and we can call the parameters as follows:

  • beta: the chance of someone who has the disease giving it to someone who is susceptible to the disease.

  • delta: the rate at which people who are infected become infectious.

  • gamma: the rate at which people who are infectious recover.

Each of these parameters can be estimated from other parameters that we already know (or think we know) about the disease. Chief among these is the basic reproduction number, R_0, which represents the average number of people in a completely susceptible population who will catch the disease directly from one infectious person introduced to that population. If we know the basic reproduction number (and the recovery rate), we can work out beta. Similarly, if we know how long people are infected but not infectious for (i.e. in a waiting or "latent" period), we can work out delta, and if we know how long people are infectious for, we can work out the recovery rate gamma.
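Written out, the movements between compartments correspond to a small set of differential equations. What follows is a sketch of the standard textbook SEIR form, not necessarily the exact equations behind the figures in this post. It assumes random mixing in a fixed population N = S + E + I + R, and it treats beta as a transmission rate per day rather than a probability. The symbols T_E (latent period) and T_I (infectious period) are labels introduced here for illustration.

```latex
\[
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \delta E \\
\frac{dI}{dt} &= \delta E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
\qquad\text{with}\qquad
\delta = \frac{1}{T_E}, \quad
\gamma = \frac{1}{T_I}, \quad
\beta = R_0 \, \gamma
\]
```

Under these assumptions, a longer infectious period means a smaller gamma, and for a given R_0 a correspondingly smaller beta.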

An example using measles

Let's look at all this in an example, shown in the figure below, using measles. Measles has an incubation period of around 10-12 days before the onset of symptoms, and is then said to be infectious from 4 days prior to onset of rash to 4 days after; the rash itself appears about 3 days after the symptoms start. For the sake of argument, we will say the latent period (when infected but before becoming infectious) is 9 days and the infectious period is 8 days. Measles is also very highly infectious, with a basic reproduction number of around 14, so we will also use that information. Finally, we will say that our population is completely susceptible and consists of 100 people. Here is the graph:

Hypothetical measles infection in a susceptible population of 100 people

You should see there are six lines, although in fact we're only really using (and interested in) four of them: as there are no deaths (black line), the total population (yellow line) remains the same throughout. The other lines, however, are the interesting ones. They show:

  • in green: the number of people who are susceptible to infection. This starts off at everyone (well, 99 people, as one person is infected at the beginning) and drops pretty rapidly until basically everyone is infected by day 20.

  • in cyan (light blue): labelled as "exposed", this is the number of people who are currently infected but not yet infectious and thus are not spreading disease. This shows the highest peak at around 15 days after the outbreak has started.

  • in red: labelled as "infected", this is the number of people who are infected and infectious - i.e. can transmit measles to other people in the population.

  • in (dark) blue: this line shows the people who have recovered from measles and are no longer at risk of catching the disease because they are immune.
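The measles scenario above can be sketched in a few lines of code. This is a minimal illustration in Python, not the R code used to produce the figure. It assumes the textbook relations beta = R_0 / infectious period, delta = 1 / latent period and gamma = 1 / infectious period, and it uses a simple fixed-step Euler integration. The function name `simulate_seir` and the step size are choices made for this sketch.

```python
def simulate_seir(r0, latent_days, infectious_days, population,
                  initial_infectious, days, dt=0.01):
    """Integrate a basic SEIR model with fixed-step Euler.

    Returns a list of (day, S, E, I, R) tuples, one per time step.
    """
    beta = r0 / infectious_days    # transmission rate per day (assumes R0 = beta / gamma)
    delta = 1.0 / latent_days      # rate of moving from E (infected) to I (infectious)
    gamma = 1.0 / infectious_days  # rate of moving from I (infectious) to R (recovered)

    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    rows = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_exposed = beta * s * i / population * dt  # S -> E under random mixing
        new_infectious = delta * e * dt               # E -> I
        new_recovered = gamma * i * dt                # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        rows.append((step * dt, s, e, i, r))
    return rows


# Scenario from the text: R0 = 14, latent period 9 days, infectious period 8 days,
# 100 people, one of whom is infectious at the start.
measles = simulate_seir(r0=14, latent_days=9, infectious_days=8,
                        population=100, initial_infectious=1, days=100)

peak_exposed = max(measles, key=lambda row: row[2])
susceptible_day_20 = min(measles, key=lambda row: abs(row[0] - 20.0))[1]
print(f"Exposed peaks at {peak_exposed[2]:.0f} people on day {peak_exposed[0]:.0f}")
print(f"Susceptible people remaining on day 20: {susceptible_day_20:.1f}")
```

Because the starting conditions and integration method are assumptions, the exact timing of the peaks may differ a little from the figure; the overall shape of the curves should be similar.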

Looking at COVID-19

So what do we know about COVID-19? Actually, quite a lot already - some countries have been extremely good at collecting and sharing data, and there has been a big push for making the science as open as possible.

Here are some data:

Here's the graph:

COVID19 - very basic/naive graph, R zero = 4

There are a number of assumptions I have made here, beyond those of the basic model. First, I've assumed everyone recovers. Second, I've assumed the population is static over time - no births or deaths, and no migration - and that the number of people is equivalent to all those reported as living in the UK in the middle of 2018 which is 66,435,600. Finally, I assume that the basic reproduction number was 4, the infectious period was 14 days and that 100 people were originally infectious. The time scale along the bottom is in days, so we see that the peak of infections occurs at around 133 days - or after 19 weeks. I'd also like you to note at this point that the peak of that curve is about 20,000,000 people (indicated as 2e+07 on the y-axis).
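The scenario just described can be sketched as follows. This is a hedged illustration in Python rather than the R code behind the figure. The paragraph above does not state a latent period, so the sketch borrows the 5.1-day incubation period quoted later in the post and treats it as the latent period. It also assumes beta = R_0 / infectious period and uses a fixed-step Euler integration with a step size chosen for this sketch.

```python
POPULATION = 66_435_600    # UK mid-2018 population quoted in the text
R0 = 4.0                   # basic reproduction number assumed in the text
LATENT_DAYS = 5.1          # assumption: incubation period quoted later in the post
INFECTIOUS_DAYS = 14.0     # infectious period assumed in the text
INITIAL_INFECTIOUS = 100   # people infectious on day 0
DAYS = 365
DT = 0.05                  # Euler step in days (a choice made for this sketch)

beta = R0 / INFECTIOUS_DAYS    # transmission rate per day
delta = 1.0 / LATENT_DAYS      # E -> I rate
gamma = 1.0 / INFECTIOUS_DAYS  # I -> R rate

s = float(POPULATION - INITIAL_INFECTIOUS)
e, i, r = 0.0, float(INITIAL_INFECTIOUS), 0.0
peak_infectious, peak_day = i, 0.0

for step in range(1, int(round(DAYS / DT)) + 1):
    new_exposed = beta * s * i / POPULATION * DT
    new_infectious = delta * e * DT
    new_recovered = gamma * i * DT
    s -= new_exposed
    e += new_exposed - new_infectious
    i += new_infectious - new_recovered
    r += new_recovered
    if i > peak_infectious:  # track the highest point of the infectious curve
        peak_infectious, peak_day = i, step * DT

print(f"Peak of {peak_infectious:,.0f} infectious people on day {peak_day:.0f}")
```

Changing `R0`, `LATENT_DAYS` or `INFECTIOUS_DAYS` at the top is enough to explore other scenarios.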

COVID-19 and fatalities

Well, what we're really worried about is dying. In good health care systems, the fatality rate for COVID-19 has been reported as being as low as 0.3% to 1.0%, but where health care is not available, it's much higher. Here's a graph where I've tried to take that into consideration:

COVID19 - including 1% fatality rate, R zero = 4

I've kept the same assumptions as above - that the basic reproductive number is 4, that the incubation period is 5.1 days and people are infectious for 14 days, after which time they either recover (99% of people) or die (1% of people). The total population size is again 66,435,600 with 100 people initially infectious.

Again, we see a peak of infected people around 133 days (19 weeks) although this time slightly fewer people are affected at once - perhaps 15 million; some of them are now dying instead of recovering, so this changes the population make-up a bit as we can see from the yellow line. Also, again similar to the previous graph (to be expected because none of the basic parameters have changed), "take off" (when the number of people exposed and/or infected starts to become much above zero) really starts to occur around day 77 (11 weeks). It's also interesting to note that the black curve (representing deaths) only takes off considerably after the other curves, probably around day 98 (3 weeks after the other curves).
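One way to add the death compartment is to split the people leaving compartment I into a fixed fraction who die and a remainder who recover. The sketch below does this in Python; it is an assumption about how deaths might be modelled, not the R code behind the figure. It keeps the population used for mixing fixed at its starting value, treats the 5.1-day incubation period as the latent period, and uses fixed-step Euler integration. The function name `simulate_seird` is made up for this example.

```python
def simulate_seird(r0, latent_days, infectious_days, fatality, population,
                   initial_infectious, days, dt=0.05):
    """Euler integration of an SEIR model with an extra D (deaths) compartment.

    Returns the compartment sizes at the end of the simulation.
    """
    beta = r0 / infectious_days    # transmission rate per day
    delta = 1.0 / latent_days      # E -> I rate
    gamma = 1.0 / infectious_days  # rate of leaving I (to R or D)

    s = float(population - initial_infectious)
    e, i, r, d = 0.0, float(initial_infectious), 0.0, 0.0
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt
        new_infectious = delta * e * dt
        leaving_i = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - leaving_i
        r += (1.0 - fatality) * leaving_i  # assumption: 99% of those leaving I recover
        d += fatality * leaving_i          # assumption: 1% of those leaving I die
    return {"S": s, "E": e, "I": i, "R": r, "D": d}


final = simulate_seird(r0=4, latent_days=5.1, infectious_days=14, fatality=0.01,
                       population=66_435_600, initial_infectious=100, days=365)
print(f"Deaths after one year: {final['D']:,.0f}")
print(f"Recovered after one year: {final['R']:,.0f}")
```

Under this particular split, the fatality fraction only changes where people go after leaving I, so the infectious curve in the sketch has the same shape with or without deaths. Other ways of modelling deaths would behave differently.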

Varying R zero

In the next graph, the only thing that I've changed is the basic reproductive number. This time, R zero is set to 3 instead of 4, which is perhaps slightly more consistent with the literature. See "The reproductive number of COVID-19 is higher compared to SARS coronavirus" by Ying Liu, Albert A Gayle, Annelies Wilder-Smith and Joacim Rocklöv in the Journal of Travel Medicine, which was published in mid-February, for a summary of some estimates. There are others, too.

COVID19 - including 1% fatality rate, R zero = 3

This time, you can see that the take-off point for all the curves is shifted to the right (i.e. it occurs later) and that the height of the red curve is not so high: there are perhaps 10 million people affected at one time, rather than the 15 or 20 million we saw previously. This is what is known as "flattening the curve". You will also notice that the black curve hasn't really changed in shape or height, just position: it has moved to the right as it has been delayed, although the total number of people affected is still around the same as previously.
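The effect of R zero on the infectious curve can be checked with a small comparison. The sketch below is in Python and is not the R code behind the figures. It assumes a 5.1-day latent period, a 14-day infectious period, beta = R_0 / infectious period, and fixed-step Euler integration. Deaths are left out, on the assumption that they are a fixed fraction of those leaving compartment I and so do not alter the infectious curve. The function name `infectious_peak` is hypothetical.

```python
def infectious_peak(r0, latent_days=5.1, infectious_days=14.0,
                    population=66_435_600, initial_infectious=100,
                    days=365, dt=0.05):
    """Return (peak number infectious, day of peak) for a basic SEIR model."""
    beta = r0 / infectious_days    # transmission rate per day
    delta = 1.0 / latent_days      # E -> I rate
    gamma = 1.0 / infectious_days  # rate of leaving I

    s = float(population - initial_infectious)
    e, i = 0.0, float(initial_infectious)
    peak, peak_day = i, 0.0
    for step in range(1, int(round(days / dt)) + 1):
        new_exposed = beta * s * i / population * dt
        new_infectious = delta * e * dt
        leaving_i = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - leaving_i
        if i > peak:
            peak, peak_day = i, step * dt
    return peak, peak_day


for r0 in (4.0, 3.0):
    peak, day = infectious_peak(r0)
    print(f"R0 = {r0}: peak of {peak:,.0f} infectious people on day {day:.0f}")
```

In this sketch the lower R zero gives a peak that is both smaller and later, which is the flattening described above.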

More about assumptions - and what you can do

I've made a lot of assumptions in this post, as I've already mentioned, so please please please do not take any of the numbers I've provided as The Truth. Indeed, I've quite possibly made an error in the code, too, so I wouldn't want anyone to rely on the statistics I have provided (at least not until I can discuss things with friends and colleagues who are more knowledgeable than I in these matters). But what is important is the shape of the curves, particularly in relation to R zero.

What can be done about this? Transmission of, and outcomes from, infectious disease depend on both the infecting agent and on the organism being infected - in this case, us. The agent itself has a degree of infectiousness, but this depends on the opportunities we give it to be transmitted from one person to another. As with the human immunodeficiency virus, where barriers are used to protect humans from other humans' bodily fluids (whether that be gloves and protective clothing in medical and dentistry work or the use of condoms for safe sex), our own human behaviours are hugely important. For COVID-19, we simply do not know enough yet:

  • How is it transmitted? We think it's aerosol spread, but we're not yet completely sure.

  • When is it transmitted? We think it's from the onset of symptoms, or perhaps a few days before, until symptom resolution (or death). But we're not yet completely sure - it could be much longer.

  • Are we immune after we've had it? We think people gain immunity, but we're not yet completely sure... There are certainly stories circulating about people who've had it twice.

  • Can we predict who's going to be affected badly, or who might die? We think it primarily/only affects older people, but we're not yet completely sure... There are clearly other important risk factors we haven't identified yet.

  • How can I protect myself? ... Well, I'd like a Nobel prize too, but this is currently where much of the debate is at.

Indeed, the really important question is, "how do we protect each other?" The UK has talked about "herd immunity" - in the graphs I've shown, this is akin to the end bits of the graphs, where there is a constant (very low) rate of new infections, along with constant numbers of people susceptible and recovered. That's going to take a long time, and if that is the main method of dealing with this disease I think a lot of people will die as health care services will be overwhelmed by the high R zero. In fact, many countries have advocated social distancing; some haven't. In France, schools and educational establishments (including universities) are now closed, all meetings are being cancelled and people are being told to work from home wherever and whenever possible. The Provost of UCL in London, Professor Michael Arthur, wrote "These are extraordinary times, and as such, require an extraordinary response;" all face-to-face teaching at the university has been cancelled as of yesterday. Indeed, those places that are shutting down and allowing people to self-isolate probably realise that isolation is currently our only weapon: we need to slow down the epidemic, we need to enable more time for us to develop knowledge, to develop treatments (a vaccine!) and to try to prevent our medical systems getting overwhelmed. The next six months are going to be tough.

Is there good evidence on the number of days before any symptoms that a person can infect others? I had another comment but it took such a long time to go through the rigmarole of getting a link from you that I've forgotten what it was!

Comment by j.ginn2 Sun 15 Mar 2020 14:41:41 UTC

Hi Jay,

Thanks for the comment. The question you ask is a really important one: current estimates are that the median incubation period is 5.1 days and that 97.5% of people who develop symptoms do so within 11.5 days. So most people will be infectious before they have symptoms. More detail is provided in an article entitled "The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application" published on 10 March 2020 by Stephen Lauer et al in the Annals of Internal Medicine.

Best wishes,


Comment by asm [] Sun 15 Mar 2020 18:07:07 UTC

I will also see what I can do to make the commenting process easier over the coming days. But it's not, I will admit, my priority! Depending how much I can write and how much other time I have, I will maybe see if someone else can help me with things...

Anyway, thanks for persisting! I hope you will consider commenting further.

Comment by asm [] Sun 15 Mar 2020 21:15:53 UTC