Tutorial Two — Generative Models

Reading: Chapters 16, 17.3, and 18.1 of The Atlas for the Aspiring Network Scientist (2025 edition): Random graphs, degree distributions, and the configuration model.

1. One of the most striking theoretical results in network science is the sudden emergence of a giant connected component as edges are added to a random graph — a phase transition with a precise theoretical prediction.

Using the Erdős–Rényi model G(n, p) with n = 1000 nodes:

(a) Experimentally recreate the figure in Chapter 16 of the Atlas showing the relative size of the largest connected component as a function of p. Simulate a sufficient number of random graphs for each value of p and plot the mean together with the 25th and 75th percentile bands. (A log scale on the x-axis may help.)

(b) On a separate plot, show the mean clustering coefficient for the same ensemble of graphs across the same range of p.

(c) At approximately what value of p does the giant component appear? How does this compare to the theoretical prediction from the reading? What is happening to the clustering coefficient at the same transition point, and why?

(Optional) (d) For a fixed mean degree ⟨k⟩ = 5, simulate ER graphs of increasing size n and measure the average shortest path length between pairs of nodes in the giant component. How does the average path length scale with n? Is this what you would expect from a social network? What are the implications for how quickly information or disease can spread through a random network?

2. The ER model makes a specific, testable prediction about the shape of the degree distribution. Before building more sophisticated models it is worth checking how well this prediction holds for real networks.

(a) Load at least four networks from the repositories used in Tutorial 1 (networks.skewed.de or icon.colorado.edu). Choose networks from different domains (e.g., social, biological, infrastructure). For each, plot the degree distribution.

(b) For each network, compute the mean degree and overlay the Poisson distribution that an ER graph with the same number of nodes and mean degree would predict.

(c) How well does the ER prediction fit the empirical distributions? Describe any systematic differences you observe. What are the implications of these differences for using the ER model as a null model when analysing real networks?

3. The ER model captures the emergence of connectivity but fails to reproduce two features that are almost universal in real-world networks: high clustering and short average path lengths occurring together. The Watts–Strogatz model was designed to explain exactly this combination.

The Watts–Strogatz model starts from a regular ring lattice — each node connected to its k nearest neighbours — and then rewires each edge independently with probability β. At β = 0 the graph is a regular lattice; at β = 1 it is effectively random.

(a) Generate Watts–Strogatz graphs across a range of rewiring probabilities β ∈ [0, 1] for fixed n and k. For each value of β, compute the mean clustering coefficient C(β) and the average shortest path length L(β). Normalise both by their values at β = 0 (i.e., plot C(β)/C(0) and L(β)/L(0) on the same axes). What do you observe, and at what range of β does the “small-world” regime occur?

(Optional) (b) Compare the degree distributions of a Watts–Strogatz graph (at a β value in the small-world regime) and an ER graph with the same mean degree. How do they differ? What does this tell you about what the WS model is and is not capturing relative to real networks?

(c) Return to the networks from Exercise 2. For each, measure the clustering coefficient and average path length. Which networks, if any, are plausibly consistent with the small-world model? Which are not, and why?

(Discussion) (d) Suppose you are advising on epidemic preparedness for a city of 500,000 people. You do not have access to the full contact network, but you can estimate the mean number of daily contacts per person and the degree of social clustering in the population. Which of the three models covered so far — ER, configuration model, or Watts–Strogatz — would you choose to simulate the network, and why? What assumptions are you making, and how sensitive do you think your conclusions would be to the choice of model? This question does not have a single correct answer — bring your reasoning to the tutorial.

4. The configuration model is the workhorse null model of network science: it generates random graphs that preserve a network’s degree sequence while randomising everything else. Building it yourself makes clear both what it controls for and what it does not.

(a) Implement the stub-matching configuration model for a given degree sequence. Apply it to one of the networks from Exercise 2 and generate 10 realisations. Verify that each realisation has the correct degree sequence. How frequently do self-loops and multi-edges appear in your output, and what does their presence tell you about the graph space being sampled?

(b) For the same network, generate an ensemble of 1000 configuration model graphs and 1000 ER graphs (matching the original mean degree). Plot the resulting degree distributions of both ensembles alongside the empirical degree distribution. What does the configuration model preserve that the ER model does not?

(c) Beyond the degree sequence, what structural properties of the original network does the configuration model not preserve? Give at least two concrete examples and explain why they matter when using the model for null hypothesis testing.

(Optional) Implement the Fosdick et al. MCMC approach for sampling from (i) the vertex-labeled simple graph space and (ii) the stub-labeled loopy multigraph space (with self-loops and multi-edges removed to produce a simple graph). Compare the degree distributions and any other structural properties you find relevant across the two sampling methods. Under what circumstances does the choice between them matter?

5. Real social networks consistently exhibit far more clustering than random graphs of the same size. Whether this is explained by their degree distributions alone — or whether something more structural is at work — is an open empirical question that null models let us probe directly.

(a) Collect a set of at least six social networks of varying sizes from the Tutorial 1 repositories. For each, compute the mean clustering coefficient C.

(b) For each network, also compute C for an ensemble of ER graphs (matching the network’s density) and an ensemble of configuration model graphs (matching its degree sequence), taking the mean across the ensemble in each case.

(c) Plot C as a function of network size n on a log–log scale, showing three curves: the empirical networks, the ER null model, and the configuration model null model.

(d) What conclusions can you draw about the relative roles of edge density and degree structure in explaining the clustering observed in social networks as they grow? Is the degree sequence sufficient to account for clustering, or does something further need to be explained? How does your answer here relate to what you found in Exercise 3?

Note: igraph’s clustering coefficient implementation is significantly faster than networkx for this exercise.