Networks and Simulation — Tutorial Four

Supplementary Notes: Community Structure

These notes are intended as pre-tutorial reading to accompany the exercises. They trace the development of community detection from its origins in modularity optimisation through to principled statistical inference and spectral methods. The goal is not to survey every available algorithm — there are many — but to give you a coherent framework for understanding what different approaches are actually doing, and what assumptions they carry.

1. What Is a Community?

Before asking how to detect communities, it is worth asking what we mean by one. The intuitive picture is familiar: a community is a group of nodes that are more densely connected to each other than to the rest of the network. Social groups, protein complexes, web pages on related topics, airports serving the same geographic region — in each case, something causes nodes to cluster.

But this intuition is not a definition. “More densely connected than what?” immediately raises the question of a reference point. A group of 10 nodes all connected to each other is clearly a community in a sparse network but may be unremarkable in a dense one. Any formal notion of community structure has to specify — explicitly or implicitly — what baseline it is comparing against.

This turns out to be the fault line that separates the major approaches in the field. Different methods encode different answers to “compared to what?”, and those different answers produce different results on the same network. Recognising that a community detection method always encodes an assumption about structure is the central critical skill this tutorial develops.

Community structure intuition

2. Modularity: Optimisation With an Implicit Null Model

2.1 Arriving at the formula from first principles

You already have the tools to derive the most widely used measure of community quality. In Tutorial 2, you showed that the configuration model generates random graphs that preserve the degree sequence of a given network. Specifically, the expected number of edges between nodes $i$ and $j$ in a configuration model ensemble is:

\[\langle A_{ij} \rangle = \frac{k_i k_j}{2m}\]

where $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$ and $m$ is the total number of edges. This is the null model expectation: if you randomised all connections while keeping each node’s degree fixed, this is how many edges you would expect between any pair on average.

Modularity, introduced by Newman and Girvan (2004), is built directly on this null model. For a given partition of the nodes into communities — call the community assignment of node $i$ as $c_i$ — modularity asks: how many more edges fall within communities than the configuration model would predict? Formally:

\[Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)\]

where $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are in the same community and $0$ otherwise. The $1/2m$ factor normalises $Q$ to lie between $-1$ and $1$.

The interpretation is direct: $Q$ is the fraction of edges that fall within communities, minus the fraction you would expect if edges were distributed according to the degree sequence alone. A partition with $Q > 0$ has more within-community edges than the configuration model predicts; a partition with $Q \approx 0$ is no better than chance.

Modularity null model

Community detection via modularity maximisation then becomes an optimisation problem: find the partition that maximises $Q$. The best available partition is the one where rearranging any node to a different community would reduce $Q$. Fast approximate maximisation algorithms such as the Louvain method (Blondel et al., 2008) have made this computationally tractable for networks with millions of nodes.

2.2 Why modularity became dominant

Modularity was enormously influential for a simple reason: it gave researchers a single scalar measuring partition quality without requiring any prior knowledge of the number of communities, their sizes, or their internal structure. Combined with fast algorithms, it became a default tool in network analysis across biology, sociology, physics, and computer science.

Its appeal is also visual and intuitive. A high-modularity partition tends to produce the kind of block structure that looks like communities on a plot. When it works, it works clearly.

3. The Problems With Modularity

The history of modularity is, in part, a history of discovering its failure modes. Three deserve careful attention because each reveals something important about what any community detection method is actually doing.

3.1 Non-zero modularity in random graphs

A basic sanity check for any community quality measure is to ask: what does it return on a network with no community structure? For modularity, the answer is not zero.

Guimerà, Sales-Pardo, and Amaral (2004) and McDiarmid and Skerman (2013) showed that random graphs — networks with no planted structure — have a positive expected modularity that grows with network size. This means that modularity maximisation applied to a random graph will always find a partition with $Q > 0$, and will do so with higher $Q$ as the network gets larger. The maximum modularity of a random graph is not a fixed baseline; it is a moving target.

This observation motivates the null hypothesis test used in Exercise 3. Rather than comparing the observed modularity to zero, you should compare it to the distribution of modularity values obtained from an ensemble of random graphs with the same degree sequence — exactly the configuration model null you built in Tutorial 2. Exercise 3 implements this test for the networks under study. Note that while this test is conceptually natural, it is not as widely standardised in the literature as the use of modularity maximisation itself; the test requires running modularity maximisation many times on rewired graphs, which is computationally expensive and rarely done in practice.

3.2 The resolution limit

Fortunato and Barthélemy (2007) showed that modularity maximisation cannot reliably detect communities smaller than a scale set by the total number of edges in the network. Specifically, communities with fewer than approximately $\sqrt{m}$ internal edges tend to be merged with neighbouring communities, even when the planted structure is clear.

The intuition is that modularity is a global quantity — it balances the gain from merging small communities against the loss from splitting large ones. Once communities are small enough, the gain from splitting them is too small for modularity to detect over the noise from global optimisation. This is not a limitation of any particular algorithm for maximising modularity; it is a property of modularity itself. No algorithm can find communities that modularity cannot distinguish.

Community resolution limit

In practice this means that if your network is large and you expect communities at multiple scales, modularity will reliably find the large ones and silently merge the small ones. The partition it returns will look reasonable without being correct — which is more dangerous than an obviously wrong result.

Exercise 2 demonstrates the resolution limit concretely on a controlled network where the planted partition is known.

3.3 The degeneracy of the optimisation landscape

Good, de Montjoye, and Clauset (2010) showed that the modularity landscape — the surface of $Q$ values over all possible partitions — is extremely flat near its maximum. For most real networks, there are exponentially many partitions with $Q$ values extremely close to the maximum, and these near-optimal partitions can assign nodes very differently.

This means that “find the modularity-maximising partition” is not a well-posed problem in practice. You are not finding the optimal partition; you are finding one of many near-optimal partitions. Different runs of the same algorithm on the same network will frequently return substantially different community assignments, all with nearly identical $Q$ values. Ghasemian, Hosseinmardi, and Clauset (2019) extend this analysis and show that many popular community detection methods — not only modularity — exhibit a systematic tendency to overfit or underfit community structure depending on network properties.

3.4 Confirmatory failure: the most important limitation

The fourth problem is the one that cuts deepest, and it is the focus of Exercise 3.

Modularity maximisation is a procedure that takes a network as input and returns a partition. It will always return a partition — even if the network has no community structure at all. The question “does this network have communities?” is not one that modularity can answer. It can only tell you “given that I must partition this network, here is the best partition I found.”

This creates a subtle but serious problem. Suppose you apply modularity maximisation to a network and obtain $Q = 0.4$, and a null hypothesis test confirms this is significantly above what a random graph would produce. What have you shown? You have shown that the network has some structure beyond what the degree sequence predicts. You have not confirmed that the specific partition returned by modularity is correct, or even that the network has the kind of community structure modularity is designed to detect.

Modularity can reject a specific null hypothesis (the configuration model), but it cannot confirm a specific structural hypothesis (the network has this community structure). The distinction matters enormously for interpretation — and it is the central motivation for moving to the model-based approach in Section 4.

4. The Stochastic Block Model: Inference With Explicit Assumptions

4.1 Making the generative model explicit

The stochastic block model (SBM) is a generative model for networks with community structure. You already know the building blocks from Tutorial 2; the SBM assembles them in a specific way.

Each node $i$ is assigned to one of $K$ communities, labelled $c_i \in {1, \ldots, K}$. Edges between nodes are then drawn independently, with the probability of an edge between nodes $i$ and $j$ depending only on their community assignments:

\[P(A_{ij} = 1) = \omega_{c_i, c_j}\]

The $K \times K$ matrix $\omega$ contains all the connection probabilities: $\omega_{rs}$ is the probability of an edge between a node in community $r$ and a node in community $s$. When $\omega_{rr} > \omega_{rs}$ for $r \neq s$, the model produces the assortative structure we usually associate with communities. But the SBM is more general: it can also produce disassortative structure, bipartite-like patterns, or any other configuration of inter-group connectivity.

The SBM generative process

Notice what the SBM generates within each community: an Erdős–Rényi random graph with connection probability $\omega_{rr}$. The SBM is a collection of ER random graphs — one per pair of communities — stitched together by the community assignment of each node. This is the direct connection back to Tutorial 2.

4.2 Inference rather than optimisation

Fitting the SBM to an observed network is an inference problem, not an optimisation problem. Given the observed network, we want to find the community assignments ${c_i}$ and parameters $\omega$ that best explain what we see. Using the minimum description length (MDL) principle (Peixoto, 2013), the SBM fitting procedure asks: what is the most compact description of the network? A model with more communities can always fit the data better — but at the cost of a longer description of the model itself. MDL penalises this complexity, preferring the simplest model that describes the network without discarding genuine structure.

This is why the SBM can do something modularity cannot: it can conclude that the best model is the one with no community structure at all. If the data do not support a partition into $K > 1$ communities, the inference procedure returns a single-group solution — equivalent to the ER graph from Tutorial 2. Modularity, by contrast, always returns a multi-group partition because it is always optimising over partitions, never evaluating whether a partition is warranted.

4.3 The SBM as a hypothesis testing framework

When you fit an SBM and it returns $K = 1$, the method is saying: the best explanation of this network does not require community structure. When it returns $K > 1$, it is saying: describing the network with $K$ communities is more efficient than describing it without.

Crucially, this tests the specific structural hypothesis the SBM encodes, not just a general “something is happening” test. Peixoto (2023) argues that this is the right framing for community detection: not a search for the most visually appealing partition, but the problem of finding the most parsimonious generative model that is consistent with the data.

5. The Degree-Corrected Block Model

The standard SBM has a significant limitation that mirrors the limitation of the ER model in Tutorial 2. Because each community is internally an ER graph, all nodes in the same community have the same expected degree. Real networks almost never look like this — even within a well-defined community, degree is heterogeneous.

The degree-corrected block model (DC-SBM), introduced by Karrer and Newman (2011), addresses this by modifying the connection probabilities to allow heterogeneous degrees within communities:

\[P(A_{ij} = 1) = \frac{\theta_i \theta_j \, \omega_{c_i, c_j}}{2}\]

where $\theta_i$ is a node-specific parameter controlling its expected degree. Within each community, nodes can now have varying degrees — the DC-SBM is, in effect, a collection of configuration model graphs rather than ER graphs. Again the connection to Tutorial 2 is direct: the DC-SBM adds degree heterogeneity within communities in exactly the same way the configuration model adds degree heterogeneity to the ER model.

SBM vs DCSBM

The two models — SBM and DC-SBM — can be compared directly using MDL. Both are fitted to the same network and their description lengths compared. If the DC-SBM provides a substantially shorter description, the data support degree heterogeneity within communities. If the SBM is sufficient, the added complexity of the DC-SBM is not warranted. This is model selection in a transparent form: the data tell you which assumptions are necessary and which are superfluous.

Exercise 4 applies this comparison to the political blogs network.

6. Spectral Methods: Efficient Approximations to Ideal Inference

6.1 Why spectral methods exist

Exact Bayesian inference for the SBM is computationally expensive. Practical fitting uses Markov chain Monte Carlo sampling, which can be slow for large networks. Spectral methods offer a different route to the same community signal. Rather than running full probabilistic inference, they extract community structure from the eigenvalues and eigenvectors of a carefully chosen matrix.

This connection is not coincidental. Spectral methods can be understood as linear approximations to belief propagation on the SBM, which is the optimal inference algorithm for the model in certain parameter regimes (Krzakała et al., 2013).

6.2 The modularity matrix and spectral clustering

The modularity formula can be rewritten in matrix form. Define the modularity matrix:

\[B_{ij} = A_{ij} - \frac{k_i k_j}{2m}\]

The modularity of a partition is then $Q = \frac{1}{2m} \mathbf{s}^T B \mathbf{s}$, where $\mathbf{s}$ encodes community membership. Maximising $Q$ is approximately equivalent to finding the leading eigenvector of $B$ and partitioning nodes by the sign of their entries.

This connects directly to Tutorial 3. The power method you implemented there finds the leading eigenvector of the adjacency matrix $A$. Spectral community detection does the same, but for the modularity matrix $B$: the matrix encodes a null model comparison rather than raw connectivity, but the computational logic is identical.

6.3 The eigenvalue spectrum and detectability

The eigenvalue spectrum of the adjacency matrix carries direct information about community structure. For a sparse random graph with no community structure, the bulk of the eigenvalue spectrum forms a dense band bounded on both sides — this is the noise floor. Eigenvalues in the bulk carry no signal about community structure.

When genuine community structure is present, additional eigenvalues appear outside the bulk — one for each detectable community, in addition to the leading eigenvalue that reflects the mean degree. These outlier eigenvalues are the signal: their corresponding eigenvectors encode which nodes belong to which community.

As you weaken the community structure — reducing $p_\text{in}$ toward $p_\text{out}$ in a planted partition model — the outlier eigenvalues retract toward the bulk. At a critical threshold, they merge into the bulk and disappear. Below this threshold, no algorithm can recover the partition better than chance. This is the detectability threshold established by Decelle et al. (2011) and Mossel, Neeman, and Sly (2012): it is not a limitation of any particular method, but a fundamental information-theoretic limit on what the network structure contains. Krzakała et al. (2013) provide a detailed analysis of this spectral picture for sparse networks.

Eigenvalue spectrum

The practical implication matters: a failed community detection attempt may not mean your algorithm is poor. It may mean the communities are simply not detectable from the network structure alone.

6.4 Beyond modularity: the non-backtracking matrix and Bethe Hessian

The modularity matrix is not the only spectral operator for community detection. Krzakała et al. (2013) showed that the non-backtracking matrix — a matrix defined on directed edges rather than nodes, of dimension $2m \times 2m$ — has spectral properties precisely matched to the SBM’s community signal. Its eigenvalues that correspond to detectable communities are real and separated from the bulk; all others are complex. The number of real outlier eigenvalues tells you the number of detectable communities.

The non-backtracking matrix is large and non-symmetric, making it computationally inconvenient. The Bethe Hessian (Saade, Krzakała, and Zdeborová, 2014) is a smaller, symmetric matrix defined on nodes that shares the same leading eigenvectors. In this sense the Bethe Hessian provides the same spectral community signal in a more tractable package.

The conceptual progression is: belief propagation (exact but slow) → non-backtracking matrix (linear approximation with exact spectral signal) → Bethe Hessian (same signal, symmetric, efficient). You do not need to implement any of these from scratch. What matters is the conceptual picture: different matrices expose different structural signals, and the community signal lives in the outlier eigenvalues that escape the bulk.

7. Practical Guidance: Which Method When?

Use modularity when you want a fast exploratory partition, you have no strong prior about the number of communities, and you are aware of and can accept the resolution limit. Treat the result as a starting point for interpretation, not a confirmed finding. If you apply a null hypothesis test (comparing the observed modularity to its distribution under random rewiring), be aware that a significant result only confirms structure beyond the degree sequence — it does not confirm the specific partition modularity returned.

Use the SBM when you want to confirm — not merely detect — community structure, or to compare models, or to ask whether community structure is present at all. The MDL criterion gives a principled answer to “is a partition warranted?” that modularity cannot provide. Prefer the DC-SBM unless you have specific reason to believe degrees are homogeneous within communities; in practice, real networks almost always favour the DC-SBM.

Use spectral methods when computational speed matters and you can accept an approximate partition without a full posterior, or when you want to diagnose how many detectable communities a network contains before running full inference. The Bethe Hessian is the most principled spectral option.

Always consider what structure you are comparing against. Peel, Larremore, and Clauset (2017) show that metadata labels — political affiliations, demographic categories, known group memberships — often do not correspond to communities in the network-theoretic sense any of these methods detect. Finding that a recovered partition does not reproduce known metadata is not necessarily evidence that your method failed. It may mean that the metadata and the network structure capture different aspects of the same system, and both can be informative in different ways.

8. A Note on the Relationship to Earlier Tutorials

The connections to Tutorials 2 and 3 are not coincidental — they are the point.

From Tutorial 2: the configuration model null model is embedded in the modularity formula. The SBM within each community is an ER graph; the DC-SBM within each community is a configuration model. The closing exercises of Tutorial 2 showed that the configuration model cannot explain real network clustering — which is precisely the phenomenon that community structure models formalise.

From Tutorial 3: eigenvector centrality is the leading eigenvector of $A$. Spectral community detection finds the leading eigenvectors of $B$ or the Bethe Hessian. The power method you implemented for eigenvector centrality is the same computational tool; only the matrix changes. The spectral gap — the separation between the largest eigenvalue and the rest of the spectrum — is exactly what determines whether communities are detectable: outlier eigenvalues that escape the bulk correspond to detectable communities, and their eigenvectors encode the partition.

References

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics, P10008.

Decelle, A., Krzakała, F., Moore, C., & Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic implications. Physical Review E, 84, 066106.

Fortunato, S., & Barthélemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1), 36–41.

Ghasemian, A., Hosseinmardi, H., & Clauset, A. (2019). Evaluating overfit and underfit in models of network community structure. IEEE Transactions on Knowledge and Data Engineering, 1–1.

Good, B. H., de Montjoye, Y.-A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81, 046106.

Guimerà, R., Sales-Pardo, M., & Amaral, L. A. N. (2004). Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70(2), 025101.

Karrer, B., & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83, 016107.

Krzakała, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., & Zhang, P. (2013). Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52), 20935–20940.

McDiarmid, C., & Skerman, F. (2013). Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics, 43, 431–437.

Mossel, E., Neeman, J., & Sly, A. (2012). Stochastic block models and reconstruction. Probability Theory and Related Fields, 1–38.

Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.

Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.

Peixoto, T. P. (2013). Parsimonious module inference in large networks. Physical Review Letters, 110, 148701.

Peixoto, T. P. (2023). Bayesian inference of the number of communities in stochastic block models. Preprint.

Saade, A., Krzakała, F., & Zdeborová, L. (2014). Spectral clustering of graphs with the Bethe Hessian. Advances in Neural Information Processing Systems, 27.