# Individual identity and movement networks for disease metapopulations


Edited by Bryan Grenfell, Pennsylvania State University, Erie, PA, and accepted by the Editorial Board March 22, 2010 (received for review January 18, 2010)

## Abstract

The theory of networks has had a huge impact in both the physical and life sciences, shaping our understanding of the interaction between multiple elements in complex systems. In particular, networks have been extensively used in predicting the spread of infectious diseases where individuals, or populations of individuals, interact with a limited set of others—defining the network through which the disease can spread. Here for such disease models we consider three assumptions for capturing the network of movements between populations, and focus on two applied problems supported by detailed data from Great Britain: the commuter movement of workers between local areas (wards) and the permanent movement of cattle between farms. For such metapopulation networks, we show that the identity of individuals responsible for making network connections can have a significant impact on the infection dynamics, with clear implications for detailed public health and veterinary applications.

Networks are now a well-understood and powerful scientific tool. When the number of interactions between elements or individuals is relatively low, then networks offer an intuitive means of capturing and describing the structure of such interactions. Examples abound, from computer and Internet connections (1) to metabolic networks (2) and food webs (3), from transportation patterns (4) to actors within the same movies (5); in each of these, the theory of networks has provided valuable insights into how such interactions are structured and has hinted at deeper underlying patterns. When interactions describe connections between people, then network theory is often used to explore the implications of disease spread through the population (6). However, data on human-to-human contacts through which infections can spread are rare, generally being isolated to small populations that have been sampled with detailed questionnaires; of these, networks of sexual encounters, such as the studies in Colorado Springs (7) and Manitoba (8), are probably the best examples. Another common source of network interaction comes from the movement of individuals between populations, who could potentially carry infection with them. With concerns over novel or reemerging infections, such networks of movements often form the core of mathematical models that examine the spatiotemporal spread of infections and the associated public health implications. Examples include work movements for the spread of seasonal influenza in the United States (9, 10); aviation traffic for the spread of smallpox (11), severe acute respiratory syndrome (12), and general epidemics (13); commuting movements for deliberate release of smallpox (14); and trade for the 1918–1919 influenza pandemic (15); a more systematic review of techniques and applications is available from Riley (16). The sensitivity of the epidemic to the topological structure of the movement network has already been shown (13, 17); in contrast here, through detailed simulation, we address how such movement networks should be modeled and the biases that occur from making naïve assumptions.

Two data sets from Great Britain provide highly detailed information on the network of individual movements. From the 2001 censuses of England, Wales, and Scotland there is information regarding the populations of, and commuter movements between, the 10,000 wards in Great Britain (*SI Text*). Wards (which generally contain between 2,000 and 8,000 people) provide a good compromise between fine spatial resolution, computational efficiency, and requirements for individual-level confidentiality. From the Cattle Tracing System (18) we have individual records of the movement of cattle between the 150,000 farms, markets, and slaughterhouses in Great Britain (*SI Text*). Both of these movement datasets can be conceptualized as relatively sparse networks; the analysis of such networks yields a wide range of important information on the pattern of movements and the structure of the populations (19, 20). Here, however, we are interested in how such movements lead to the percolation of infection through the population, and we show that great care is needed if realistic rates of spread are to be predicted.

The most natural way to capture the spread of infection due to the movement of individuals between locations is to partition the population into discrete subpopulations, with the movements between subpopulations naturally creating an interaction network. Both people and cattle are spatially aggregated, and such aggregations form the natural scale for the subpopulations within the metapopulation model. Cattle are clearly aggregated into farms, and for directly transmitted infections we expect the vast majority of the transmission between cattle to occur within the farm environment and for infection to predominately transfer between farms through the movement of infected livestock. People tend to aggregate into towns, cities, or other communities (with urban areas in Britain having more than 200 times the population density of rural areas); we partition the population into wards, which provide a natural sociopolitical division of the population and correspond to one of the smallest aggregations for which reliable census data are available. For both hosts we consider two infections: a rapid infection parameterized to correspond to pandemic influenza in humans or foot-and-mouth disease in cattle, and a slower infection parameterized to match smallpox in humans or bovine tuberculosis in cattle. All four of these infectious diseases are important for either human health or for the livestock industry, and hence accurate prediction of the spread of infection is important for informing policy.

When simulating the dynamics of disease spread through a network of contacts it is often simply assumed that each node (ward or farm) is a single entity that can be either susceptible, infected, or recovered (with no internal structure) and that infection is passed along network edges (between nodes connected by movements) at a given rate. This rate can either be the same for all edges (19, 21) or can be proportional to the strength of the edge (number of movements associated with that link) (22). This model formulation has parallels with the classic Levins metapopulation ideal in ecology (23) and has been highly successful for examining the spread of infection through networks of individuals (6). However, when nodes contain many individuals it is more intuitive to use a complete metapopulation framework (24) whereby the subpopulation at each node (farm or ward) has its own internal homogeneous dynamics and the interaction between subpopulations is governed by the network (10, 13, 25–27). Here we investigate three approaches of modeling the interactions within such a metapopulation, to assess the validity of common approximations. Initially we focus on results from a simple metapopulation in which subpopulations are arranged in a linear network with nearest-neighbor movements; this simple model allows us to examine a comprehensive range of epidemiologic assumptions before focusing on the more complex movement patterns within Great Britain.

## Results

### Modeling Metapopulation Interaction.

The traditional method of modeling transmission in full stochastic metapopulation models is to allow subpopulations that are connected to share a proportion of their associated force of infection (10, 14, 26); we term such methods *kernel transmission* based, because effectively the movement network defines a spatial transmission kernel between subpopulations. (We note that for commuter movements the interaction between subpopulations is actually based on the convolution of the kernel, because infection can pass from subpopulation A to subpopulation B whenever individuals from these subpopulations meet in location C.) We can conceptualize this kernel model as moving a continuum of infection across network connections—effectively moving the pathogen. An improvement is to realize that the pathogen generally exists as a highly aggregated distribution within hosts, and it is hosts that move between populations. Our second model (*random-movers*) therefore moves randomly chosen individuals between subpopulations in accordance with the recorded patterns (either permanent movement for cattle or movement there and back for commuters) (13, 27). Finally, a more realistic assumption is to identify individuals that move (25, 28): for the cattle network, this is achieved by knowing the unique identity of each animal and the associated movements; for the movement-to-work network, this is achieved by defining a group of individuals who commute daily between each pair of home and work subpopulations.

### Conceptual Commuter Model.

For the commuting-to-work movements, we initially study the problem using a simple linear network of subpopulations in which individuals commute to the adjacent subpopulations (Fig. 1; see *Materials and Methods* for model construction). In all three model formulations (kernel transmission, random-movers, and commuters) we observe a traveling wave of infection. Although random-movers and kernel-transmission models lead to similar wave speeds (captured by the average time between epidemic peaks in adjacent subpopulations; Fig. 1*A*), there is significantly more variation observed for random movers. This is because there is greater heterogeneity in the movement of pathogen for the random-mover model, because the pathogen is aggregated within infected hosts. In the random-mover model, if an infected individual moves there is a discrete jump in the force of infection between subpopulations, whereas in the kernel model this force of infection occurs at a constant rate. More surprising is the substantially reduced rate of spread associated with regular commuters; this is because each subpopulation has been further partitioned into commuters and noncommuters, and infection must cross these partitions to travel to previously uninfected parts of the network.

These patterns are consistent across a range of epidemiologic parameters, with the commuter formulation predicting a wave speed that is consistently between 80% and 85% of the other predictions (Fig. 1*B*). In addition, we find that small amounts of random movements do not destroy this effect; in fact, the reduction in speed shows only slight nonlinear dependence as we scale from a population of pure random movers to a population of pure workers (Fig. 1*C*). One potential difficulty is the assumption of random mixing within each subpopulation because it has previously been shown that a high degree of assortativity (with commuters preferentially mixing with other commuters) can speed the spread of infection (29). Although we do observe this behavior in our model (Fig. 1*D*), its effect is far less pronounced than the difference between types of movement.
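
The wave-speed measure used above (the average time between epidemic peaks in adjacent subpopulations) can be illustrated with a minimal sketch. The function names and the three example curves are hypothetical, and the sketch assumes the wave travels in the direction of increasing subpopulation index.

```python
def peak_time(series):
    """Index of the first maximum of one subpopulation's infection curve."""
    return max(range(len(series)), key=lambda t: series[t])


def mean_peak_lag(curves):
    """Average time between epidemic peaks in adjacent subpopulations.

    curves[i][t] is the number infected in subpopulation i at time step t,
    with subpopulations ordered along the linear network.
    """
    peaks = [peak_time(c) for c in curves]
    lags = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    return sum(lags) / len(lags)


# Hypothetical curves for three adjacent subpopulations (illustrative numbers only).
curves = [
    [0, 5, 9, 4, 1, 0, 0, 0],
    [0, 0, 2, 6, 9, 3, 1, 0],
    [0, 0, 0, 1, 3, 7, 9, 2],
]
lag = mean_peak_lag(curves)     # peaks fall at t = 2, 4 and 6
relative_speed = 1.0 / lag      # subpopulations traversed per time step
```

A longer lag between adjacent peaks corresponds to a slower traveling wave, so comparing this quantity across the three model formulations gives the relative wave speeds.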

### Commuter Model for Great Britain.

We now extend these theoretical observations for the speed of spatial spread to two applied examples and consider the impact of movement patterns on aggregate epidemic dynamics (Fig. 2). We consider the metapopulation of wards in Great Britain linked by both the commuter network and nonwork travel patterns (*SI Text*; see ref. 17 for details of this combined network structure) with infection parameterized to match pandemic influenza and smallpox dynamics (Fig. 2 *A* and *B*; parameters are given in *SI Text*). For both infections we observe a far slower growth rate, with a later and lower peak, for a population of commuters compared with a population of random movers, in agreement with the approximately 20% slower speed predicted by the simpler model; this pattern also holds independent of where the infection is seeded and even when the metapopulation is simulated at a range of other larger spatial scales. In addition, from the census data we can also separate the population of each ward into those that work (and hence have a defined daily commute) and those that do not work (and therefore are free to move randomly); as expected from the results shown in Fig. 1*C*, this mixed population behaves in between the two extremes. It is important to stress that here we are comparing methods of capturing human movements, and therefore in all simulations the expected number of journeys between each pair of wards remains the same irrespective of whether these are fixed commuter movements, random movements, or a mixture of the two. We note that including different movement patterns on weekends, or potential biases in the estimation of rare long-distance journeys, does not noticeably change the predicted epidemic dynamics for either infection.

### Model for Cattle Farms in Great Britain.

It might be envisaged that the importance of individual identity is related to the rapid movement of individuals relative to the epidemiologic time scales and the partitioning of the population into distinct commuter groups. However, these ideas and results also extend to the farming industry, in which animals tend to remain on a farm for far longer periods and not to return to their point of origin after a movement. For the United Kingdom cattle industry we have a record of the movement of individual cattle (18), generating a network of contacts between farms and allowing us to simulate the dynamics with a high degree of detail. Here we compare simulations in which the identity of cattle moving between farms is retained and simulations in which cattle on each farm are chosen randomly to move regardless of their identity, thereby moving the same number of animals between the same farms on the same day but losing the individual identity of animals on the farm. This loss of identity is analogous to the difference between the commuter and random-mover formulations. We compare these two assumptions by simulating multiple epidemics with the same initial individual cattle infected and comparing the distributions of epidemic sizes that are predicted (Fig. 2 *C* and *D*). Because of the vast heterogeneity in the number of cattle per farm, the specific structure of the movement network, and the relative rarity of movements between two given farms, it is not possible to produce an average or generic epidemic in the cattle population; instead we are forced to compare the distribution of simulation epidemic sizes while accounting for the extreme sensitivity of the expected epidemic to the initial seeding of infection. In general, we predict larger epidemics when moving random cattle compared with moving cattle with a known identity, with the effects pronounced for both slow infections (such as bovine tuberculosis; Fig. 2*C*) and for short-lived highly transmissible infections (such as foot-and-mouth disease; Fig. 2*D*). We can therefore conclude that the normal duration of animal stays on farms (details given in *SI Text*) generally has a protective influence, limiting the spread of infection compared with more naïve assumptions about the network of animal movements. This impact is generally greater for the more rapidly spreading infection (Fig. 2*D*); when individual identity is lost there are more frequent short stays on a farm (*SI Text*), and hence the infection is more likely to escape to new farms before recovery. In contrast, for the true pattern of movements, farms often act as bottlenecks for infection.

## Discussion

Individuality, and the associated stochastic integer-based approach, has long been understood to be epidemiologically important, especially when dealing with small infectious populations and matters of persistence (30, 31). However, here we have extended this notion to show that we must not only recognize the individual-based nature of the population but also maintain the identity of individuals, especially when behavior (such as movements) is highly structured. In conclusion, both simple theoretical models and applied simulations of disease spread have shown that models that do not maintain individual identity overestimate (by approximately 20%) the degree of spatial spread and therefore overestimate the aggregate growth rate (and peak height) of an epidemic. This leads us to question whether simple network models can ever accurately capture the effects of movements between populations.

Given the importance of spatial spread in the containment and control of human and livestock infections, our work shows that great care is needed in interpreting and simulating patterns of movement if such models are to be used as predictive tools for public-health or veterinary advice. In light of our findings, detailed predictions from previous metapopulation disease models (especially when regular commuters are involved) must be reconsidered because the spatial spread of the epidemic, the aggregate growth of the epidemic, the peak number of cases, and the degree of spatial synchrony are all potentially overestimated if individual identity has been ignored. Given that our findings are not influenced by the scale of the subpopulations, we speculate that similar results will hold true for fully individual-based models, such that the action of regular movements will reduce epidemic spread.

## Materials and Methods

### Mathematical Models.

Here we briefly present the models for both the cattle- and human-based epidemics. Both of these models have a common core (the transmission of infection to susceptibles and the recovery of infected individuals) but differ in the way that the force of infection and movements are calculated.

### Infection Dynamics.

The basic infection dynamics are common to all of the examples given in the article. We assume SIR-type (susceptible-infected-recovered) infection dynamics, such that immunity is life-long. We simulate the dynamics stochastically (*SI Text*), such that the population is finite and integer-based and each process occurs probabilistically; however, for brevity we use the notation of ordinary differential equations to express the underlying expected rates of change and to provide a clear comparison with previous models:

This is a compartmental model for a particular subgroup *x* of the entire population, where *S*, *I*, and *R* are the number of susceptible, infected, and recovered individuals, respectively; to allow greater biologic realism the infected class has been subdivided into *m* classes leading to gamma-distributed recovery times and transmission rates that depend on the time since infection (32). (The above equations describe the mean behavior of an infinitely large system, although we model the behavior of a system of finite size.) All updating is performed in discrete time using Gillespie's τ-leap method (33), which allows us to naturally capture the day–night distinction in the commuter model and the daily pattern of movements in the cattle model.
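
The display equations referred to as Eq. **1** are not reproduced in the text above. A standard multi-stage SIR system consistent with the description would take the following form. This is a sketch rather than the authors' exact notation: the recovery-rate symbol $\gamma$ and the stage index $s$ are assumptions, while $S_x$, $I_x$, $R_x$, $m$ and $\lambda_x$ follow the text.

```latex
\begin{align}
\frac{dS_x}{dt}     &= -\lambda_x S_x, \\
\frac{dI_x^{1}}{dt} &= \lambda_x S_x - m\gamma I_x^{1}, \\
\frac{dI_x^{s}}{dt} &= m\gamma I_x^{s-1} - m\gamma I_x^{s}, \qquad s = 2, \dots, m, \\
\frac{dR_x}{dt}     &= m\gamma I_x^{m}.
\end{align}
```

Passing through $m$ stages, each left at rate $m\gamma$, gives a gamma-distributed (Erlang) infectious period with mean $1/\gamma$, which is the standard route to the gamma-distributed recovery times mentioned in the text.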

It is only the force of infection $\lambda_x$ and the meaning of the subscript that differ between the human and cattle models. These differences are described below.
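
Because only the force of infection changes between models, a single update routine can take it as an argument. The sketch below is one possible discrete-time step, not the authors' implementation: it swaps the Poisson counts of a standard τ-leap for binomial draws so that compartments cannot become negative, and the names `tau_leap_step`, `gamma`, and the placeholder force of infection are all assumptions made for illustration.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def tau_leap_step(S, I, R, lam, gamma, tau, rng):
    """Advance one subgroup by a time step tau.

    S, R : integer counts; I : list of m integer counts, one per infected stage.
    lam  : force of infection on this subgroup (supplied by the movement model).
    gamma: assumed recovery rate, so each stage is left at rate m * gamma.
    """
    m = len(I)
    p_inf = 1.0 - math.exp(-lam * tau)
    p_adv = 1.0 - math.exp(-m * gamma * tau)

    new_inf = binomial(rng, S, p_inf)
    # Individuals infected during this step do not advance until the next step.
    leaving = [binomial(rng, I[s], p_adv) for s in range(m)]

    S_next = S - new_inf
    I_next = list(I)
    I_next[0] += new_inf
    for s in range(m):
        I_next[s] -= leaving[s]
        if s + 1 < m:
            I_next[s + 1] += leaving[s]
    R_next = R + leaving[m - 1]
    return S_next, I_next, R_next


rng = random.Random(1)
S, I, R = 990, [10, 0], 0
for _ in range(50):
    lam = 0.5 * sum(I) / (S + sum(I) + R)   # placeholder force of infection
    S, I, R = tau_leap_step(S, I, R, lam, gamma=0.25, tau=1.0, rng=rng)
```

Every individual leaving one compartment enters the next, so the total population is conserved at each step.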

### Human Commuters.

In all three models (kernel-based, random-movers, and commuters), we assume that at night the force of infection is purely determined by the frequency of infection within the home location; whereas during the day the force of infection reflects the action of the differing movement assumptions.

For the kernel-based transmission model, the subscript simply denotes the home location of individuals (*x* = *i*), such that the force of infection to individuals in location *i* is:

where $N_i$ is the resident population size of subpopulation *i*. The daytime force of infection considers the potential interaction between infected individuals from subpopulation *j* meeting susceptibles from subpopulation *i* in location *k*; the denominator is the expected daytime size of the population in location *k*, therefore allowing us to model frequency-dependent transmission. The β parameters control the transmission rate as a function of the stage of infection, whereas the movement matrix $M_{i,j}$ corresponds to the proportion of individuals that live in location *i* who travel to location *j* and is obtained from census records. The nighttime force of infection simply assumes that all transmission occurs within the home subpopulation.
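
The verbal description of the kernel force of infection can be turned into a short calculation. The sketch below is a reading of that description, not the authors' equation: it assumes each row of `M` sums to 1 (with `M[i][i]` the proportion who stay at home), and all function names and numbers are hypothetical.

```python
def stage_weighted(I_row, beta):
    """Infectious pressure from one subpopulation: sum over stages of beta_s * I^s."""
    return sum(b * n for b, n in zip(beta, I_row))


def kernel_force_day(M, I, N, beta):
    """Daytime force of infection on residents of each location i.

    M[i][k]: proportion of residents of i who spend the day in k (rows sum to 1).
    I[j]   : infected counts by stage for residents of j; N[j]: resident population.
    """
    n = len(N)
    lam = [0.0] * n
    for k in range(n):
        # Expected daytime population of k and the infectious pressure present there.
        present = sum(M[j][k] * N[j] for j in range(n))
        pressure = sum(M[j][k] * stage_weighted(I[j], beta) for j in range(n))
        if present > 0:
            for i in range(n):
                lam[i] += M[i][k] * pressure / present
    return lam


def kernel_force_night(I, N, beta):
    """Nighttime force of infection: transmission only within the home subpopulation."""
    return [stage_weighted(I[i], beta) / N[i] for i in range(len(N))]


# Hypothetical three-location line; each row of M sums to 1.
M = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
N = [1000, 1000, 1000]
I = [[10, 0], [0, 0], [0, 0]]   # infection only among residents of location 0
beta = [0.5, 0.5]

lam_day = kernel_force_day(M, I, N, beta)
lam_night = kernel_force_night(I, N, beta)
```

In this example residents of location 2 never visit location 0, yet their daytime force of infection is nonzero because both groups meet in location 1. This is the convolution effect noted for commuter movements.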

For both the random-movement model and the commuter model the subscript denotes both the home and daytime location [*x* = (*h*, *d*)], and the force of infection is determined by an individual's daytime and home (nighttime) location:

The principal difference between random movements and commuter movements is that for commuters an individual's home and daytime location are fixed at the start of the simulation, whereas for individuals that move randomly, their daytime location is chosen randomly each day, proportioned according to the movement matrix $M_{h,d}$. The kernel model can be considered as a limiting case of random-movers when their daytime location is rapidly and repeatedly chosen throughout each day.
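
The difference between the two movement assumptions comes down to when the daytime location is drawn. The sketch below is a minimal illustration under assumed names (`commuter_days`, `random_mover_days`) and a hypothetical two-location movement matrix; it is not taken from the authors' code.

```python
import random


def draw_location(rng, probs):
    """Sample a daytime location index from one row of the movement matrix."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def commuter_days(rng, home, M, n_days):
    """Commuters: each individual's daytime location is fixed at the start."""
    fixed = [draw_location(rng, M[h]) for h in home]
    return [list(fixed) for _ in range(n_days)]


def random_mover_days(rng, home, M, n_days):
    """Random movers: each individual's daytime location is redrawn every day."""
    return [[draw_location(rng, M[h]) for h in home] for _ in range(n_days)]


# Hypothetical two-location example: 70% stay home, 30% travel to the other location.
M = [[0.7, 0.3],
     [0.3, 0.7]]
home = [0] * 500 + [1] * 500
rng = random.Random(0)

commuters = commuter_days(rng, home, M, n_days=5)
movers = random_mover_days(rng, home, M, n_days=5)
```

Both schemes draw from the same rows of the movement matrix, so the expected number of journeys between each pair of locations is the same; only the persistence of each individual's destination differs.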

### Cattle Movements.

For cattle infections, the model is modified to reflect the fact that movements are permanent, and we lose the distinction between day and night. Eq. **1** still holds for the infection dynamics within a farm, and the subscript *x* always refers to the farm identity. For the random-movement and fully individual-based models the force of infection is far simpler:

and infection is transferred through the displacement of individual animals according to the movement records (18); we either move the animal in question (preserving its identity) or we move a randomly selected animal from the appropriate farm. The mechanism within the fully individual-based model is relatively straightforward, moving each identified animal according to the Cattle Movement database and allowing transmission (and recovery) to occur within each farm. We conceptualize the problem by translating the individual movement records into a set of daily movements between farms, effectively moving from individual-based data to a dynamic network between subpopulations. This dynamic network is then used to move randomly chosen animals between farms, preserving between-farm interactions but losing individual identity. The mechanism for random movements is achieved within the model by moving the correctly identified animal (as in the full model) but randomly switching the infectious status of the animal due to move and any randomly chosen animal on the farm (including itself); this is equivalent to choosing a random animal but computationally more efficient.
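
The status-switching shortcut described above can be sketched as follows. The data layout (a dictionary of farms mapping animal identifiers to a status string) and all names are hypothetical and chosen only for illustration.

```python
import random


def move_with_identity(farms, animal, src, dst):
    """Full model: move the recorded animal, carrying its own infection status."""
    status = farms[src].pop(animal)
    farms[dst][animal] = status


def move_random_status(farms, animal, src, dst, rng):
    """Random-mover model: swap the recorded animal's status with that of a
    randomly chosen animal on the same farm (possibly itself), then move it."""
    partner = rng.choice(sorted(farms[src]))
    farms[src][animal], farms[src][partner] = farms[src][partner], farms[src][animal]
    move_with_identity(farms, animal, src, dst)


def fresh_farms():
    """Hypothetical herd: one infected animal out of four on farm A."""
    return {"A": {"cow1": "I", "cow2": "S", "cow3": "S", "cow4": "S"}, "B": {}}


rng = random.Random(42)
trials = 4000
infected_arrivals = 0
for _ in range(trials):
    farms = fresh_farms()
    move_random_status(farms, "cow1", "A", "B", rng)
    infected_arrivals += farms["B"]["cow1"] == "I"

fraction = infected_arrivals / trials   # close to 1/4, the prevalence on farm A
```

Although the infected animal is always the one recorded as moving, the status that arrives on farm B is infected only about one time in four, which matches what moving a uniformly chosen animal would give.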

## Acknowledgments

We thank Ian Hall, Joseph Egan, Steve Leach, Judith Legrand, and Neil Ferguson for helpful discussions during the preparation of this work; and two anonymous referees for their very helpful suggestions. This work was supported by The Wellcome Trust, the Medical Research Council and the Research and Policy for Infectious Disease Dynamics Program.

## Footnotes

¹To whom correspondence should be addressed. E-mail: m.j.keeling{at}warwick.ac.uk.

Author contributions: M.J.K. designed research; M.J.K., L.D., M.C.V., and T.A.H. performed research; L.D. and M.C.V. contributed new reagents/analytic tools; M.J.K., L.D., M.C.V., and T.A.H. analyzed data; and M.J.K., L.D., M.C.V., and T.A.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. B.G. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1000416107/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- (1) Lloyd A, May R
- (3) Williams R, Berlow E, Dunne J, Barabasi A, Martinez N
- (6) Keeling MJ, Eames KTD
- (10) Viboud C, et al.
- (12) Hufnagel L, Brockmann D, Geisel T
- (13) Colizza V, Barrat A, Barthélemy M, Vespignani A
- (16) Riley S
- (18) Lysons RE, Gibbens JC, Smith LH
- (19) Kao RR, Danon L, Green DM, Kiss IZ
- (20) Robinson SE, Everett MG, Christley RM
- (21) Vernon MC, Keeling MJ
- (22) Smith D, Lucey B, Waller L, Childs J, Real L
- (23) Levins R
- (24) Hanski I, Gaggiotti O
- (32) Lloyd AL

## Article Classifications

- Biological Sciences
- Population Biology