## COVID-19: some epidemiological modelling

My last post was about COVID-19, essentially advising you to self-isolate - even if you don't yet (think you) have the disease. I've had a lot of responses to that, essentially from people agreeing to follow my advice. However, I think it's important to explain why this matters so much. Today, therefore, I started trying to develop some models. In this post, I describe my findings (and the limitations attached to them).

### Caveat

First off, I'm again going to provide the caveat that I am **not** an
infectious disease modeller. That said, "Modelling and the Dynamics of
Infectious Disease" was one of the modules I took when I did the
MSc Epidemiology course at the London School of Hygiene and Tropical
Medicine. So I have a little bit of experience - and, still, the course
handbook! Today, I dug out those notes and have been examining the first
few lectures. I've also received some more recent material from a friend
of mine who recently went on another infectious disease modelling course
at the LSHTM (this essentially gave me some pointers on how to code in
R, which, if you've also read my Open Science post, you'll know is my
preferred statistical programming language).

Now that's out of the way, we can get on with thinking about things. I'm going to base this post on the situation in the United Kingdom, as that's the country I'm most familiar with and know how to get the data for. But it should be easy to adjust for other places if you prefer.

### Assumptions

I also want to say early on in this post that there are a *lot* of
assumptions that are made here - by myself but also implicit in the
models themselves. Therefore, please take these figures with a (large)
grain of salt: it's probably wise to listen to experts over me, as they
will have access to more complicated code that is able to take into
account many more factors than I am able to.

Specifically, one of the biggest assumptions that is made in the
following is that there is *random mixing between all individuals*:
something which is **never** true. Think about it: kids go out to school
and mix with lots of people, but it's generally the same people every
day; similarly, adults will tend to mix with a few particular people (at
work) and with their families. Most mixing is not random, even if there
is an element of randomness in some of our daily lives (who's serving us
when we go to the supermarket, which carriage of the train we get on and
who else is in it, etc).

On the flip side, though, please think of "six degrees of separation": the idea that everyone in the world is connected to everyone else through a chain of at most six contacts.

### Model compartments

Now, the easiest types of model to understand are *deterministic*
models. Luckily, these are well-suited to what we want to do. Based on
the known clinical course of a disease, we can assign people to
"compartments" - a typical example with three compartments may be:

- individuals who are susceptible to a disease;
- individuals who have the disease - and thus can give it to others;
- individuals who have recovered from the disease and are now immune.

This model is typically known as an "SIR model" - for Susceptible, Infectious and Recovered/immune.

COVID-19 is slightly more complicated than that, and we can compartmentalise it further. Most people in the literature currently seem to be considering it with four compartments: the extra compartment is individuals who are infected but not yet infectious, often known as 'E' (for Exposed). For example, a recent article in the Lancet Infectious Diseases journal (published on 11 March 2020) entitled "Early dynamics of transmission and control of COVID-19: a mathematical modelling study" used this SEIR approach.

There's one additional compartment we should also add to our model, however, and that's the number of deaths. I'm going to call this 'D', as I don't know what the main convention is but it seems logical to me. The reason I want this compartment is that I explicitly want to know how many deaths there might be in the UK for each of the model scenarios. So now the model looks like this:

```
              +-> R
S -> E -> I --|
              +-> D
```

### Model parameters

Once we've decided on the compartments, we then need to figure out how many people are in each compartment, and how they move between them. This can be done using known or fictitious data or, as is more common, a combination of both. Indeed, we normally don't know what all the correct parameters are, so we are trying to guess them - hence the term modelling, as we are seeing what the results would be under different scenarios. Anyway, let's start by building up a relatively simple model and seeing what happens.

Our first compartment, S, starts with (almost) everyone in it, as no one has had the disease yet. No one enters it, and the only people leaving it are those who become infected (i.e. join compartment E).

The second compartment, E, has people entering it from compartment S and leaving it to join compartment I (Infectious). Similarly, compartment I has people joining it from compartment E, and also people leaving it who are (mostly) going to join compartment R, as they have recovered and are, we hope, now immune and thus no longer susceptible (I'll add the deaths in a little while, below). This is relatively easy to model, and we can define the parameters as follows:

- **beta**: the rate at which someone who has the disease gives it to people who are susceptible to it;
- **delta**: the rate at which people who are infected become infectious;
- **gamma**: the rate at which people who are infectious recover.
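Written as equations, one standard way to express these flows is the system below. It is a sketch that assumes random mixing and frequency-dependent transmission (new infections proportional to S × I / N). N, the total population, is my notation rather than something defined elsewhere in this post.

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \delta E \\
\frac{dI}{dt} &= \delta E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
$$

Every term that leaves one compartment enters the next, so S + E + I + R stays equal to N throughout.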

Each of these parameters can be estimated from other quantities that we
already know (or think we know) about the disease. Chief among these is
the basic reproduction number, R0 ("R zero"), which represents the
average number of people that one infectious person will infect if
introduced into a completely susceptible population. If we know the
basic reproduction number, we can work out *beta* (once we also know
*gamma*). Similarly, if we know how long people are infected but not
infectious for (i.e. in a waiting or "latent" period), we can work out
*delta*, and if we know how long people are infectious for, we can work
out the recovery rate *gamma*.
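As a sketch of that arithmetic, assume each period is exponentially distributed (so a rate is one over the mean duration) and that transmission is frequency-dependent, in which case R0 = beta / gamma when nobody dies. Under those assumptions the conversion takes only a few lines of Python. `seir_parameters` is a hypothetical helper name, not something from the post's own code:

```python
def seir_parameters(r0: float, latent_days: float, infectious_days: float) -> dict:
    """Turn R_0 and mean durations (in days) into daily SEIR rates.

    Assumptions: each period is exponentially distributed, so a rate is
    1 / (mean duration), and transmission is frequency-dependent, so that
    R_0 = beta / gamma (no deaths yet).
    """
    delta = 1.0 / latent_days      # latent ("exposed") -> infectious
    gamma = 1.0 / infectious_days  # infectious -> recovered
    beta = r0 * gamma              # rearranged from R_0 = beta / gamma
    return {"beta": beta, "delta": delta, "gamma": gamma}


if __name__ == "__main__":
    # Measles-like values used later in the post: R_0 = 14, 9 and 8 days.
    print(seir_parameters(14, 9, 8))    # beta = 1.75, delta ≈ 0.111, gamma = 0.125
    # COVID-19 values used later in the post: R_0 = 4, 5.1 and 14 days.
    print(seir_parameters(4, 5.1, 14))  # beta ≈ 0.286, delta ≈ 0.196, gamma ≈ 0.071
```

Other model formulations (for example, with deaths removed from I at a separate rate) change the relationship between R0 and beta, so this is only valid for the simple SEIR set-up.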

### An example using measles

Let's look at all this in an example, shown in the figure below, using measles. Measles has an incubation period of around 10-12 days before the onset of symptoms, and is then said to be infectious from 4 days before the onset of the rash until 4 days after; the rash itself appears about 3 days after the symptoms start. For the sake of argument, we will say the latent period (when infected but before becoming infectious) is 9 days and the infectious period is 8 days. Measles is also very highly infectious, with a basic reproduction number of around 14, so we will use that too. Finally, we will say that our population is completely susceptible and consists of 100 people. Here's the graph:

You should see there are six lines, although in fact we're only really using (and interested in) four of them: as there are no deaths (black line), the total population (yellow line) remains the same throughout. The other lines, however, are the interesting ones. They show:

- in green: the number of people who are susceptible to infection. This starts off at everyone (well, 99 people, as one person is infected at the beginning) and drops pretty rapidly until basically everyone is infected by day 20.
- in cyan (light blue): labelled as "exposed", this is the number of people who are currently infected but not yet infectious and thus are not spreading disease. This curve shows the highest peak, at around 15 days after the outbreak started.
- in red: labelled as "infected", this is the number of people who are infected *and* infectious - i.e. can transmit measles to other people in the population.
- in (dark) blue: this line shows the people who have recovered from measles and are no longer at risk of catching the disease because they are immune.
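A run like this can be reproduced, at least approximately, with a few lines of code. Below is a minimal Python sketch (the post's own code isn't shown): it steps the SEIR equations forward with a simple Euler scheme, assuming random mixing, frequency-dependent transmission and one infectious person in a population of 100. `simulate_seir` and its arguments are hypothetical names, and its numbers may differ slightly from the graph.

```python
def simulate_seir(n, i0, beta, delta, gamma, days, dt=0.1):
    """Deterministic SEIR model integrated with fixed Euler steps of dt days.

    Returns one (day, S, E, I, R) tuple per whole day, starting at day 0.
    Assumes random mixing and frequency-dependent transmission (beta * S * I / n).
    """
    s, e, i, r = float(n - i0), 0.0, float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt  # S -> E (infected, not yet infectious)
            new_infectious = delta * e * dt      # E -> I
            new_recovered = gamma * i * dt       # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((day, s, e, i, r))
    return history


if __name__ == "__main__":
    # Measles-like scenario from the text: R_0 = 14, 9-day latent period,
    # 8-day infectious period, 100 people, one of them infectious at day 0.
    gamma = 1 / 8
    history = simulate_seir(n=100, i0=1, beta=14 * gamma, delta=1 / 9, gamma=gamma, days=60)
    for day, s, e, i, r in history[::5]:
        print(f"day {day:2d}: S={s:5.1f}  E={e:5.1f}  I={i:5.1f}  R={r:5.1f}")
```

Euler steps are the crudest integration choice. They are adequate here because the step (a tenth of a day) is short compared with the latent and infectious periods, but a proper ODE solver would be the safer option for real work.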

### Looking at COVID-19

So what do we know about COVID-19? Actually, quite a lot already - some countries have been extremely good at collecting and sharing data, and there has been a big push for making the science as open as possible.

Here are some data:

- Incubation period (which I'm using as the latent period): "estimated to be 5.1 days (95% CI, 4.5 to 5.8 days)", from "The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application", published in the Annals of Internal Medicine on 10 March 2020.
- Infectious period: I don't know. I've seen varying data suggesting it's anywhere from 2-3 weeks or longer. Of course, if you die you're not contagious for as long. (I'll come back to that point.)
- R zero: again, most people seem to be estimating it at between 2.5 and 4.5.

Here's the graph:

There are a number of assumptions I have made here, beyond those of the basic model. First, I've assumed everyone recovers. Second, I've assumed the population is static over time - no births or deaths, and no migration - and that the number of people is equal to the number reported as living in the UK in the middle of 2018, which is 66,435,600. Finally, I've assumed that the basic reproduction number is 4, the infectious period is 14 days and 100 people are initially infectious. The time scale along the bottom is in days, so we see that the peak of infections occurs at around 133 days - or after 19 weeks. I'd also like you to note at this point that the peak of that curve is about 20,000,000 people (indicated as 2e+07 on the y-axis).
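To show how such a run produces a peak day and a peak height, here is a minimal Python sketch under the same assumptions (everyone recovers, static population of 66,435,600, random mixing, 100 people initially infectious, 5.1-day latent and 14-day infectious periods), using simple Euler steps of a tenth of a day. It is my illustration, not the code behind the graph, so its numbers need not match the figures quoted above exactly. `seir_peak` is a hypothetical name.

```python
def seir_peak(n, i0, r0, latent_days, infectious_days, days=365, dt=0.1):
    """Return (day, number infectious) at the daily-sampled peak of a deterministic SEIR run."""
    gamma = 1.0 / infectious_days   # I -> R (everyone recovers in this version)
    delta = 1.0 / latent_days       # E -> I
    beta = r0 * gamma               # from R_0 = beta / gamma
    s, e, i = float(n - i0), 0.0, float(i0)
    steps_per_day = round(1 / dt)
    peak_day, peak_i = 0, i
    for day in range(1, days + 1):
        for _ in range(steps_per_day):  # simple Euler steps
            new_exposed = beta * s * i / n * dt
            new_infectious = delta * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
        if i > peak_i:
            peak_day, peak_i = day, i
    return peak_day, peak_i


if __name__ == "__main__":
    uk_mid_2018 = 66_435_600
    for r0 in (4, 3):
        day, peak = seir_peak(uk_mid_2018, i0=100, r0=r0, latent_days=5.1, infectious_days=14)
        print(f"R_0 = {r0}: peak of about {peak / 1e6:.1f} million infectious on day {day}")
```

The R (recovered) compartment is not tracked here because, with a static population, it is simply whatever is left over: N - S - E - I.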

### COVID-19 and fatalities

Well, what we're really worried about is dying. In good health care systems, the fatality rate for COVID-19 has been reported as being as low as 0.3% to 1.0%, but where health care is not available, it's much higher. Here's a graph where I've tried to take that into consideration:

I've kept the same assumptions as above - that the basic reproduction number is 4, that the incubation period is 5.1 days and people are infectious for 14 days, after which time they either recover (99% of people) or die (1% of people). The total population size is again 66,435,600 with 100 people initially infectious.
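One way to add the deaths to the SEIR equations, assuming the 14-day infectious period ends in death for a fixed fraction f of people (f = 0.01 here; the symbol f is my notation, not the post's), is to split the flow out of I between R and D:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \delta E \\
\frac{dI}{dt} &= \delta E - \gamma I \\
\frac{dR}{dt} &= (1 - f)\,\gamma I \\
\frac{dD}{dt} &= f\,\gamma I
\end{aligned}
$$

Because people still leave I at a total rate gamma, the infectious period and R0 = beta / gamma are unchanged; only where people end up differs. Whether N is held at the starting population or recomputed as S + E + I + R is a modelling choice that is not specified here.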

Again, we see a peak of infected people around 133 days (19 weeks)
although this time slightly fewer people are affected at once - perhaps
15 million; some of them are now dying instead of recovering, so this
changes the population make-up a bit as we can see from the yellow line.
Also, again similar to the previous graph (to be expected because none
of the basic parameters have changed), "take off" (when the number of
people exposed and/or infected starts to become much above zero) really
starts to occur around day 77 (11 weeks). It's also interesting to note
that the black curve (representing deaths) only takes off considerably
*after* the other curves, probably around day 98 (3 weeks after the
other curves).

### Varying R zero

In the next graph, the only thing that I've changed is the basic reproduction number. This time, R zero is set to 3 instead of 4, which is perhaps slightly more consistent with the literature. See "The reproductive number of COVID-19 is higher compared to SARS coronavirus" by Ying Liu, Albert A Gayle, Annelies Wilder-Smith and Joacim Rocklöv in the Journal of Travel Medicine, published in mid-February, for a summary of some estimates. There are others, too.

This time, you can see that the take-off point for all the curves is
shifted to the right (i.e. it occurs later) and that the height of the
red curve is not so high: there are perhaps 10 million people affected
at one time, rather than the 15 or 20 million we saw previously. **This
is what is known as "flattening the curve".** You will also notice that
the black curve hasn't really changed in shape or height, just position:
it has moved to the right as it has been delayed, although the total
number of people affected is still around the same as previously.
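The point that the total number of people affected stays about the same can be checked with the classic final-size relation for a deterministic, randomly mixing SIR/SEIR-type epidemic started by a few cases in a large population: the fraction ever infected, z, satisfies z = 1 - exp(-R0 × z), and the latent period does not change it. Below is a minimal Python sketch (my addition, not the code behind the graphs; `final_size` is a hypothetical helper) that ignores the small effect of deaths on the population size:

```python
import math


def final_size(r0: float, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Fraction of the population ever infected, z, solving z = 1 - exp(-r0 * z).

    Fixed-point iteration is started at z = 1 so that it converges to the
    non-trivial root (z = 0 is always a solution too). Returns 0 for r0 <= 1,
    where no large outbreak occurs in this model.
    """
    if r0 <= 1:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


if __name__ == "__main__":
    uk_population = 66_435_600
    for r0 in (4, 3):
        z = final_size(r0)
        print(f"R_0 = {r0}: {z:.1%} ever infected, about {z * uk_population / 1e6:.1f} million people")
```

Under these assumptions it gives roughly 98% of the population for R0 = 4 and 94% for R0 = 3, a far smaller relative change than the drop in the height of the peak.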

### More about assumptions - and what *you* can do

I've made a lot of assumptions in this post, as I've already mentioned,
so please please please do not take any of the numbers I've provided as
*The Truth*. Indeed, I've quite possibly made an error in the code, too,
so I wouldn't want anyone to rely on the statistics I have provided (at
least not until I can discuss things with friends and colleagues who are
more knowledgeable than I in these matters). But what *is* important is
the shape of the curves, particularly in relation to *R zero*.

What can be done about this? The transmission and outcomes of
infectious disease depend on both the infecting agent *and* the
organism being infected - in this case, us. The agent itself has a
degree of infectiousness, but this depends on the opportunities we give
it to be transmitted from one person to another. As with the human
immunodeficiency virus (HIV), where barriers that protect people from
other people's bodily fluids (whether gloves and protective clothing in
medical and dental work, or condoms for safe sex) make a huge
difference, our own behaviours are hugely important. For COVID-19, we
simply do not know enough yet:

- How is it transmitted? We *think* it's aerosol spread, but we're not yet completely sure.
- When is it transmitted? We *think* it's from the onset of symptoms, or perhaps a few days before, until symptom resolution (or death). But we're not yet completely sure - it could be much longer.
- Are we immune after we've had it? We *think* people gain immunity, but we're not yet completely sure... there are certainly stories circulating about people who've had it twice.
- Can we predict who's going to be affected badly, or who might die? We *think* it primarily/only affects older people, but we're not yet completely sure... There are clearly other important risk factors we haven't identified yet.
- How can I protect myself? ... Well, I'd like a Nobel prize too, but this is currently where much of the debate is at.

Indeed, the really important question is, "*how do we protect each
other?*" The UK has talked about "herd immunity" - in the graphs I've
shown, this is akin to the end bits of the graphs, where there is a
constant (very low) rate of new infections, along with constant rates of
the number of people susceptible and recovered. That's going to take a
long time, and if that is the main method of dealing with this disease I
think a lot of people will die as health care services will be
overwhelmed by the high R zero. In fact, many countries have advocated
social distancing; some haven't. In France, schools and educational
establishments (including universities) are now closed, all meetings are
being cancelled and people being told to work from home wherever and
whenever possible. The Provost of UCL in
London, Professor Michael Arthur, wrote "These are extraordinary times,
and as such, require an extraordinary
response;" all
face-to-face teaching at the university has been cancelled as of
yesterday. Indeed, those places that are shutting down and allowing
people to self-isolate probably realise that isolation is currently our
only weapon: we need to slow down the epidemic, we need to enable more
time for us to develop knowledge, to develop treatments (a vaccine!)
and to try to prevent our medical systems getting overwhelmed. The next
six months is going to be tough.
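The "herd immunity" point can be put into numbers with the textbook threshold 1 - 1/R0: once that fraction of a randomly mixing population is immune, each case infects on average fewer than one other person (R0 × fraction susceptible < 1), so new infections start to fall. Here is a minimal sketch, with `herd_immunity_threshold` as a hypothetical helper name and the same mid-2018 UK population as above:

```python
def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction above which R_0 * (fraction susceptible) < 1.

    Textbook formula 1 - 1/R_0; assumes random mixing and an R_0 that
    does not change over time. Returns 0 when r0 <= 1.
    """
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    uk_population = 66_435_600
    for r0 in (4, 3, 2.5):
        h = herd_immunity_threshold(r0)
        print(f"R_0 = {r0}: {h:.0%} immune, i.e. about {h * uk_population:,.0f} people")
```

For R0 = 4 this is 75% of the population, or about 49.8 million people in the UK. Random mixing is, again, a strong assumption here.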

Is there good evidence on the number of days before any symptoms that a person can infect others? I had another comment, but it took such a long time to go through the rigmarole of getting a link from you that I've forgotten what it was!

Hi Jay,

Thanks for the comment. The question you ask is a really important one: current estimates are that the incubation period is 5.1 days and that 97.5% of people develop symptoms by 11.5 days. So most people will be infectious before they have symptoms. More detail is provided in the article "The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application", published on 10 March 2020 by Stephen Lauer et al. in the Annals of Internal Medicine.

Best wishes,

Andrei

I will also see what I can do to make the commenting process easier over the coming days. But it's not, I will admit, my priority! Depending on how much I can write and how much other time I have, I may see if someone else can help me with things...

Anyway, thanks for persisting! I hope you will consider commenting further.