Networks and Simulation — Tutorial Seven

Supplementary Notes: Epidemic Spreading

These notes are intended as pre-tutorial reading and focus on the ideas most directly relevant to the tutorial exercises, in particular the connections between network structure, spectral properties, and epidemic dynamics that have been building throughout the course.

1. Why Network Structure Matters for Epidemics

Epidemiological models have a long history outside of network science. The classical SIR model — developed by Kermack and McKendrick in 1927 — treats a population as a well-mixed soup of individuals who interact equally with everyone else. This produces elegant closed-form results and captures many features of real outbreaks, but it misses something fundamental: not everyone interacts with everyone else. Infection spreads through contact, and contacts are structured by networks — social networks, transport networks, supply networks. The structure of those networks shapes who gets infected, when, and in what order.

This matters practically. In a well-mixed population, every individual is equally at risk from every other. In a real network, highly connected nodes are both more likely to be infected (many exposures) and more likely to infect others (many contacts). A disease that would struggle to sustain itself in a well-mixed population can reach endemic equilibrium in a heterogeneous network, because the hubs maintain transmission even when most nodes have recovered. Conversely, interventions that target hubs — vaccination of highly connected individuals, quarantine of high-degree nodes — can be far more effective than random vaccination, because they disproportionately remove transmission pathways.

The tutorial exercises take you through this logic computationally: from a basic SI model on a single network, to a comparison across network types, to the epidemic threshold, and finally to the question of which node to seed if you want (or want to prevent) maximal spread.

2. The SI Model

The simplest epidemic model distinguishes only two states: Susceptible (S) and Infected (I). Infected nodes transmit to susceptible neighbours at rate β per contact per time step; there is no recovery. The fraction of infected individuals can only increase — once infected, always infected. This is unrealistic for most diseases but a clean model for irreversible spreading processes: rumour propagation, adoption of innovations, or cascading failures in infrastructure networks.

In the discrete-time network version, at each time step:

Every infected node i independently attempts to infect each susceptible neighbour j, succeeding with probability β.
There is no recovery.

The SI model always results in the entire connected component of the seed node becoming infected; the question is only how fast.

Early-time growth. Near the start of the epidemic, almost all nodes are susceptible, so infected nodes find susceptible neighbours to infect with high probability. The number of infected nodes grows approximately exponentially:

\[I(t) \approx I_0 \, e^{\beta \langle k \rangle t}\]

where $\langle k \rangle$ is the mean degree of the network and $I_0$ is the initial fraction infected. This is the epidemic growth rate: networks with higher mean degree sustain faster early growth. In a semi-log plot, this early phase appears as a straight line whose slope is $\beta \langle k \rangle$.

As the epidemic progresses and susceptible nodes become scarce, growth slows and the curve bends toward full infection. The shape is sigmoidal on a linear scale: slow start, rapid growth, saturation.

3. The SIS Model and the Epidemic Threshold

In an SIS model, infected nodes recover and return to the susceptible state at rate μ per time step. This allows for an endemic equilibrium: a sustained fraction of infected nodes that neither grows to full infection nor dies out. Whether such an equilibrium exists depends on the balance between β (spreading) and μ (recovery), and critically, on the network structure.

The epidemic threshold. For an SIS model on a network, the condition for a sustained endemic state is:

\[\frac{\beta}{\mu} > \frac{1}{\lambda_1}\]

where $\lambda_1$ is the largest eigenvalue of the adjacency matrix. Below this threshold, any outbreak will eventually die out; above it, the epidemic persists.

To see why $\lambda_1$ appears here, consider the linearised dynamics near the disease-free state (everyone susceptible). Near this state, the probability that node i is infected, $\rho_i(t)$, obeys approximately:

\[\dot{\rho}_i \approx \beta \sum_j A_{ij} \rho_j - \mu \rho_i\]

In matrix form: $\dot{\boldsymbol{\rho}} \approx (\beta \mathbf{A} - \mu \mathbf{I})\boldsymbol{\rho}$. The system grows from zero if the largest eigenvalue of $\beta \mathbf{A} - \mu \mathbf{I}$ is positive, which requires $\beta \lambda_1 > \mu$, i.e., $\beta/\mu > 1/\lambda_1$.

This is the same logic as the Laplacian eigenvalues governing consensus dynamics in Tutorial 6: the spectral structure of the network operator determines the qualitative behaviour of a dynamical process on the network.

What determines λ₁? The largest adjacency eigenvalue is bounded by the degree sequence. For many networks, a good approximation is:

\[\lambda_1 \approx \sqrt{k_{\max}}\]

for sparse networks, or more precisely for heterogeneous networks:

\[\lambda_1 \approx \frac{\langle k^2 \rangle}{\langle k \rangle}\]

This means heterogeneous networks — those with heavy-tailed degree distributions — tend to have large $\lambda_1$, which lowers the epidemic threshold $1/\lambda_1$. Disease can persist at lower transmission rates in networks with hubs than in homogeneous networks of the same mean degree.

4. Heterogeneous Networks and the Role of Degree Distribution

The contrast between Erdős–Rényi (ER) graphs and power-law (scale-free) networks is one of the key empirical findings of network epidemiology.

In an ER graph, all nodes have roughly the same degree (close to the mean), so $\langle k^2 \rangle \approx \langle k \rangle^2$ and $\lambda_1 \approx \langle k \rangle$. The epidemic threshold is a finite value $\beta/\mu > 1/\langle k \rangle$. Below this threshold, outbreaks die out; above it, they persist.

In a power-law network with degree distribution $P(k) \sim k^{-\gamma}$ for $\gamma \leq 3$, the second moment $\langle k^2 \rangle$ diverges as the network grows. As $n \to \infty$, $\lambda_1 \to \infty$ and the epidemic threshold $1/\lambda_1 \to 0$. In the limit of a large network with $\gamma \leq 3$, there is no epidemic threshold: any positive transmission rate β, however small, can sustain an endemic infection. This is a dramatic structural difference from the ER case.

For finite networks — as in the exercises — the threshold does not literally vanish, but it is substantially smaller for the power-law network than for the ER network, even at the same mean degree. You should observe this in your SIS simulations: the power-law network sustains endemic infection at lower β/μ ratios.

The mechanism is the hub effect: even when most of the network has recovered, the hubs (very high-degree nodes) remain connected to many susceptible neighbours and can reignite the epidemic. The hubs act as reservoirs that maintain transmission when most nodes have cleared the infection.

The quantity $\langle k^2 \rangle / \langle k \rangle$ captures this directly. Compute it for both networks and compare: the power-law network will have a substantially larger value, corresponding to a larger $\lambda_1$ and a lower epidemic threshold.

5. The SIR Model

In an SIR model, recovery confers permanent immunity: infected nodes move to a Removed (R) state at rate μ and cannot be reinfected. Unlike the SIS model, the epidemic always eventually dies out — there is no endemic equilibrium. The question is: what fraction of the population is ultimately infected (the final epidemic size) and how fast does the epidemic run its course?

The epidemic threshold for SIR is the same condition as for SIS: $\beta/\mu > 1/\lambda_1$. Below the threshold, the epidemic dies out after infecting a vanishingly small fraction of the network. Above the threshold, it reaches a non-trivial fraction of nodes before exhausting itself.

Above the threshold, the epidemic proceeds through three phases visible in the time series:

Exponential growth (early): the infected fraction I(t) grows rapidly while susceptible nodes are plentiful.
Peak infection: I(t) reaches a maximum when the rate of new infections equals the rate of recoveries.
Decay: I(t) falls as susceptible nodes are exhausted and recovered nodes provide immunity. The Removed fraction R(t) asymptotes to the final epidemic size.

Higher β/μ above the threshold produces a faster, larger epidemic; lower β/μ (but still above threshold) produces a slower, smaller one. The SIR model on a network with heterogeneous degree will show faster early growth and a larger final size than an ER network with the same mean degree, for the same reasons as in the SIS case.

6. Seeding and Node Centrality

In all models above, we seed a single randomly chosen node. But does the choice of seed matter? This connects directly to the centrality analysis of Tutorial 3.

Intuitively, a seed node that is highly connected (high degree) will immediately expose many susceptible neighbours at time step 1, producing faster early growth than a low-degree seed. A high-PageRank seed sits in a part of the network that is globally well-connected, meaning the epidemic can quickly reach far parts of the network through short paths. A randomly chosen seed may be low-degree and weakly connected, producing slow early growth.

In practice, the advantage of targeted seeding is most pronounced in the early phase of the epidemic. In the SI and SIR models, the epidemic will eventually infect the whole component regardless of seeding (in SI) or the same final fraction (in SIR above threshold). The advantage of a central seed is primarily speed. In the SIS model, seeding at a hub may also increase the probability of establishing an endemic state when β/μ is near the threshold, since the hub immediately creates a large infected cluster with many infection opportunities.

The degree to which degree centrality and PageRank agree on which node to target connects back to the Tutorial 3 discussion of centrality correlation. On homogeneous networks (ER graphs), degree and PageRank are strongly correlated and agree on the best seed node. On heterogeneous networks, they may disagree — and the node with highest PageRank may not be the node with highest degree. Testing which produces faster or larger epidemics is a direct empirical test of which centrality measure better captures “epidemic importance” on each network type.

7. Practical Notes

Loading the Atlas networks. The data files at the Atlas exercise URLs are edge lists in plain text format. Load them with:

import networkx as nx
G = nx.read_edgelist("path/to/data.txt")

Check whether the network is connected with nx.is_connected(G) and, if not, work on the largest connected component with G = G.subgraph(max(nx.connected_components(G), key=len)).copy().

Computing λ₁. For the adjacency matrix, the largest eigenvalue can be computed efficiently using scipy.sparse.linalg.eigsh(A, k=1, which='LM'), where A is the sparse adjacency matrix. This avoids constructing the full dense matrix.

Randomness. All three models are stochastic. Results will vary across runs. For quantitative comparisons (time to 80% infection, endemic fraction, final epidemic size), average across at least 10 runs and show variability with percentile bands. For qualitative comparisons (does an endemic state exist?), a single well-chosen run is often sufficient but should be replicated to confirm.

Discrete vs. continuous time. The exercises use a discrete-time model. In continuous time, the SIS threshold condition $\beta/\mu > 1/\lambda_1$ is exact. In discrete time, there are corrections of order $\beta^2$, but for small β these are negligible and the threshold condition is still a good approximation.

References

Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A, 115(772), 700–721. https://doi.org/10.1098/rspa.1927.0118