Teaching material for Networks and Simulation
These notes are intended as pre-tutorial reading and focus on the ideas most directly relevant to the tutorial exercises, particularly the connections back to Tutorials 3 and 4.
Visualisation serves two distinct purposes in network analysis, and conflating them leads to poor designs for both.
Exploratory visualisation is for the analyst. The goal is to help you find patterns you did not know to look for: unexpected clusters, outlier nodes, surprising connections. Exploratory visualisations are typically dense with information, often interactive, and do not need to be self-explanatory. You are the audience.
Communicative visualisation is for an audience. The goal is to convey one specific finding to people who may have no background in network science. A communicative visualisation ruthlessly removes anything that does not serve the central claim — including most of the underlying network structure. The audience is the person reading your report or viewing your presentation.
These two purposes impose different and often conflicting requirements. A visualisation optimised for exploration (complete, multi-layered, showing everything) is usually a poor communicative tool. A visualisation optimised for communication (stripped down, single message) is usually too simplified to support discovery. A common mistake is trying to make a single figure serve both purposes at once, and succeeding at neither.
This distinction matters for the project report. The grading rubric explicitly rewards visualisation quality, and specifically calls out “unambiguous captions that clearly describe the narrative of the report.” Captions should describe what the figure is saying, not merely what it is showing. A communicative visualisation has a narrative; an exploratory one does not need one.
A visualisation encodes data through visual channels: properties of graphical elements that the eye can perceive and distinguish. For network visualisation, the most commonly used channels are position, size, colour (hue), colour (lightness), shape, and transparency.
Figure 1. The main visual channels used in network visualisation. Position is the most accurately perceived channel for quantitative data; colour hue is best for categorical distinctions; size and transparency work well for continuous quantities.
Not all channels are equally effective, and the match between channel and data type matters:
A key principle: encode the most important variable in the most accurately perceived channel. If community membership is your primary claim, use colour. If centrality is your primary claim, use size. Avoid the temptation to encode everything — each additional channel adds cognitive load.
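The channel-matching principle can be sketched in a few lines. This is a minimal illustration on a hypothetical six-node network: the edge list, community labels, colour codes and size range are all assumptions made for the example, not values from the tutorial data. The categorical variable (community) goes to hue, and the quantitative variable (degree) goes to size.

```python
# Hypothetical toy network: edges and community labels are illustrative only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Degree of each node, used here as a simple centrality measure.
degree = {node: 0 for node in community}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Categorical variable -> hue: one clearly distinct colour per community.
palette = {0: "#1f77b4", 1: "#ff7f0e"}  # assumed colours: a blue and an orange
node_colour = {node: palette[community[node]] for node in community}

# Quantitative variable -> size: scale marker size linearly with degree.
min_size, max_size = 100, 600  # assumed size range, in arbitrary marker units
lowest, highest = min(degree.values()), max(degree.values())
span = max(highest - lowest, 1)  # avoid dividing by zero if all degrees match
node_size = {
    node: min_size + (max_size - min_size) * (degree[node] - lowest) / span
    for node in community
}

for node in sorted(community):
    print(node, node_colour[node], node_size[node])
```

Only two channels are used on purpose. Adding shape or transparency for further variables would be possible, but each one competes with the two that carry the main claim.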
A layout algorithm assigns x/y coordinates to nodes. This is a choice, not a technical detail, and different choices reveal different aspects of network structure.
Figure 2. The same network drawn with three different layouts. The bridge nodes (red) are structurally identical in all three, but their visual salience and the legibility of the two communities differ substantially across layouts.
The force-directed (spring) layout is the most commonly used layout for small-to-medium networks. It treats edges as attractive springs and node pairs as mutually repulsive, then finds a low-energy configuration. The implicit claim is that proximity ≈ structural cohesion: nodes that are densely connected cluster together, and sparse bridges appear as long edges between clusters.
This claim is often, but not always, correct. The energy landscape typically has multiple local minima, so two runs of the same algorithm on the same network can produce different-looking results. This is both a feature (exploring different minima can reveal different structures) and a limitation (results are not reproducible without fixing the random seed). For the Medici network, which you have already analysed, the spring layout will typically place the Medici node near the centre of the graph, visually reflecting the family's bridging structural position.
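The role of the random seed can be seen in a toy implementation. The sketch below is a simplified force-directed scheme written for illustration (all node pairs repel, edges attract, movement is capped by a shrinking step size). It is not the exact algorithm of any particular library, and the function name, constants and test network are assumptions.

```python
import math
import random


def spring_layout_sketch(nodes, edges, seed, iterations=200):
    """Toy force-directed layout: every node pair repels, every edge attracts."""
    rng = random.Random(seed)  # the seed fixes the random starting positions
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between neighbours
    step = 0.1  # maximum movement per iteration, reduced over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push
        # Attraction along each edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull
        # Move each node along its net force, capped at the current step size.
        for n in nodes:
            fx, fy = disp[n]
            length = math.hypot(fx, fy) or 1e-9
            move = min(length, step)
            pos[n][0] += fx / length * move
            pos[n][1] += fy / length * move
        step *= 0.98
    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical network: two triangles joined by one bridge edge (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

run_1 = spring_layout_sketch(nodes, edges, seed=42)
run_2 = spring_layout_sketch(nodes, edges, seed=42)
run_3 = spring_layout_sketch(nodes, edges, seed=7)
print("same seed, same coordinates:", run_1 == run_2)
print("different seed, same coordinates:", run_1 == run_3)
```

Fixing the seed makes a figure reproducible. Trying several seeds is a cheap way to check whether an apparent pattern survives across different minima.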
Force-directed layout fails gracefully on small networks and badly on large ones. Above a few hundred nodes, the energy minimisation either fails to converge or converges to a configuration where dense regions form an undifferentiated mass. The OpenFlights network (~3,300 nodes, ~67,000 edges) is far beyond the scale where force-directed layouts produce interpretable results without additional steps.
A circular layout positions nodes equally spaced around a circle. No structural information is encoded in node positions; all structure is expressed through the edge pattern. Circular layouts are useful when the edge pattern itself is the object of interest, for example when comparing which nodes are densely cross-connected versus which connect only through specific intermediaries. The Medici network in a circular layout makes the Strozzi–Albizzi cluster legible as a set of interconnected nodes even though the Medici’s bridging position is less visually immediate.
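Because a circular layout ignores the edges entirely, it can be computed from the node list alone. The following is a minimal sketch with a hypothetical helper name and node labels. The only free choice is the order in which nodes are listed.

```python
import math


def circular_positions(nodes, radius=1.0):
    """Place nodes at equal angles on a circle, in the order given."""
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / n
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


nodes = ["a", "b", "c", "d", "e", "f"]  # hypothetical node labels
pos = circular_positions(nodes)
for node, (x, y) in pos.items():
    print(f"{node}: ({x:+.3f}, {y:+.3f})")
```

Since no edge information enters the calculation, reordering the node list (for example, grouping nodes by community) is the only way to make the edge pattern easier to read.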
A spectral layout uses the eigenvectors of the graph Laplacian to assign coordinates. Specifically, for a connected graph, the two eigenvectors corresponding to the smallest non-zero eigenvalues (the Fiedler vector and the next) define an x/y plane that captures the primary directions of variation in the network’s connectivity structure. Nodes that are structurally similar (well-connected to each other) appear close together; nodes that are on opposite sides of the primary cut appear far apart along the x-axis.
Spectral layouts are deterministic (same result every run) and have a direct mathematical interpretation: the layout visualises the algebraic structure of the graph Laplacian. They tend to work best for networks with clear cut structure — two or more communities with relatively sparse connections between them. They perform less well on dense, homogeneous networks where the Laplacian spectrum has no clear gap.
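To make the Laplacian idea concrete, the sketch below approximates the Fiedler vector (the x-coordinate of a spectral layout) using plain power iteration on a shifted Laplacian. This is a teaching sketch under stated assumptions: the graph is connected, the function name and test network are hypothetical, and in practice a numerical library's eigensolver would normally be used instead.

```python
import math


def fiedler_vector(nodes, edges, iterations=500):
    """Approximate the Laplacian eigenvector with the smallest non-zero
    eigenvalue by power iteration on (c*I - L). Assumes a connected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[index[u]].append(index[v])
        neighbours[index[v]].append(index[u])
    degree = [len(nb) for nb in neighbours]
    c = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Remove the constant component, which belongs to eigenvalue 0 of L.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (c*I - L) x, with (L x)_i = deg_i * x_i - sum of neighbour values.
        y = [c * x[i] - (degree[i] * x[i] - sum(x[j] for j in neighbours[i]))
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return {node: x[index[node]] for node in nodes}


# Hypothetical network: two triangles joined by one bridge edge (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

x_coord = fiedler_vector(nodes, edges)
for node in nodes:
    print(f"{node}: {x_coord[node]:+.3f}")
```

On this network the two triangles receive coordinates of opposite sign, and the bridge endpoints c and d sit closest to zero, which is the "primary cut" reading described above. There is no random component, so repeated runs give the same coordinates.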
For networks where nodes have real-world spatial coordinates — airports, power grids, road networks, social networks where users have listed locations — using latitude/longitude as x/y coordinates is often the most informative choice for communication purposes. The geographic layout makes the network immediately interpretable to a non-specialist and connects the analysis to the real world.
For the OpenFlights network, a geographic layout reveals continental clustering (dense edges within North America and Europe), the sparsity of connections across oceans, and the role of geographic bottleneck airports like Anchorage and Reykjavik as bridges between continental clusters. What the geographic layout hides: network-structural proximity. Two airports may be structurally very similar (same degree, same community, many shared neighbours) but geographically distant, and this similarity will be invisible in the geographic layout.
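In code, a geographic layout is only a dictionary of positions built from coordinates rather than computed from edges. The sketch below uses approximate city coordinates rounded to one decimal place; they are illustrative values typed in by hand, not read from the OpenFlights data. The one detail that is easy to get wrong is the order: x is longitude and y is latitude.

```python
# Approximate city coordinates in decimal degrees as (latitude, longitude).
# Illustrative values only, not taken from the OpenFlights data set.
lat_lon = {
    "Anchorage": (61.2, -149.9),
    "New York": (40.7, -74.0),
    "Reykjavik": (64.1, -21.9),
    "London": (51.5, -0.1),
}

# Plot position is (x, y) = (longitude, latitude): longitude runs east-west.
pos = {city: (lon, lat) for city, (lat, lon) in lat_lon.items()}

west_to_east = sorted(pos, key=lambda city: pos[city][0])
print(west_to_east)
```

Using raw longitude and latitude as x/y stretches distances at high latitudes, so for a final figure a proper map projection may be preferable. That choice is outside the scope of this sketch.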
Choosing a layout. The question is not which layout is best, but which structural feature matters for the question you are asking. If your question is “which nodes are the bridges?” the spring layout will show this. If your question is “where in the world are the communities?” the geographic layout is better. If your question is “is there a clear bipartition in this network?” the spectral layout is most appropriate. A circular layout is rarely the best choice for a final figure but can be useful for exploration.
The most persistent misconception in network visualisation is that a network must be visualised as a diagram with nodes and edges. This is often wrong — and the cases where it is wrong are precisely the cases where students most need an alternative.
Figure 3. Three representations of the same six-node network. The node-link diagram shows topology; the adjacency matrix (sorted by community) shows block structure; the bar chart shows a single node attribute (degree) precisely. Each answers a different question.
An adjacency matrix represents the network as a square matrix where rows and columns correspond to nodes and filled cells indicate edges. If nodes are sorted by community membership, within-community edges form dense blocks along the diagonal and between-community edges appear as scattered off-diagonal entries. This representation:
The adjacency matrix is particularly effective for demonstrating the results of community detection (Tutorial 4). A sorted adjacency matrix with the block-diagonal structure clearly visible is one of the strongest ways to communicate a community detection result.
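The effect of sorting can be reproduced on a small example. The sketch below uses a hypothetical six-node network whose node names are deliberately interleaved across two assumed communities, so alphabetical order hides the block structure and sorting by community reveals it.

```python
# Hypothetical network: two triangles (a-c-e and b-d-f) joined by edge e-b.
edges = [("a", "c"), ("a", "e"), ("c", "e"),
         ("b", "d"), ("b", "f"), ("d", "f"),
         ("e", "b")]
community = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}  # assumed labels


def adjacency_matrix(order, edges):
    """Build a symmetric 0/1 matrix with rows and columns in the given order."""
    index = {node: i for i, node in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


# Sort nodes by community first, then by name within each community.
order = sorted(community, key=lambda node: (community[node], node))
matrix = adjacency_matrix(order, edges)

for node, row in zip(order, matrix):
    print(node, "".join("#" if cell else "." for cell in row))
# a .##...
# c #.#...
# e ##.#..
# b ..#.##
# d ...#.#
# f ...##.
```

The two dense 3×3 blocks on the diagonal are the communities, and the single pair of off-diagonal marks is the bridge edge. Printing the same matrix in alphabetical order scatters these marks and makes the structure much harder to see.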
For spatially embedded networks, plot nodes at their real-world coordinates with edges as lines or arcs. This is not a layout algorithm choice — it is a representation choice. The difference is that a geographic layout uses position to encode real-world geography, not network structure. As discussed in Section 3, this is often the best choice for communicating network findings to non-specialists.
Perhaps the most underused option: simply do not draw the network at all. Many network findings are most clearly communicated through ordinary statistical charts:
The question to ask is not “how do I draw this network?” but “what representation most clearly conveys the finding?” The answer is often not a network visualisation at all.
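As a small illustration of the "no network at all" option, the sketch below reduces a hypothetical network to two ordinary summaries: a ranked bar chart of node degree (drawn as text) and the degree distribution. The edge list is an assumption made for the example.

```python
from collections import Counter

# Hypothetical network: two triangles joined by one bridge edge (c-d).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

# Degree of each node.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Ranked bar chart: highest degree first, ties broken alphabetically.
ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
for node, d in ranked:
    print(f"{node}  {'#' * d}  {d}")

# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(degree.values())
print(dict(degree_distribution))
```

A reader can compare bar lengths precisely, which is much harder to do with node sizes in a node-link diagram.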
The techniques that work well for a 30-node network often fail completely at 3,000 nodes. The three most common failure modes at scale:
Recognising these failure modes, and knowing which alternative representation addresses each one, is one of the most practically useful skills in network analysis.
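One simple reduction for a network that is too large to draw in full is to keep only the highest-degree nodes and the edges among them. This is a sketch of that single strategy, with a hypothetical function name and toy edge list. It is not a complete answer to the failure modes above, and the filter itself is a design choice that needs justifying.

```python
from collections import Counter


def top_k_by_degree(edges, k):
    """Return the k highest-degree nodes and the edges between them."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Highest degree first; ties broken by node name for reproducibility.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    keep = set(ranked[:k])
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges


# Hypothetical hub-and-spoke network, for illustration only.
edges = [("hub1", "hub2"), ("hub1", "x1"), ("hub1", "x2"), ("hub1", "x3"),
         ("hub2", "y1"), ("hub2", "y2"), ("hub2", "x1")]

keep, kept_edges = top_k_by_degree(edges, k=3)
print(sorted(keep))
print(kept_edges)
```

Degree is computed on the full network before filtering, so the retained nodes are those that are important in the original graph rather than in the reduced one.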
A figure caption should describe what the figure says, not what it shows. Compare:
The second form is what the rubric means by “unambiguous captions that clearly describe the narrative of the report.” Every figure in your project report should have a caption of the second type.
Every design choice in a visualisation (the layout algorithm, the channels used, what is included and excluded, whether to draw a network at all) encodes an implicit argument about what matters. Choosing a geographic layout argues that geography is the primary organising principle of the network. Choosing to show only the top 100 nodes by degree argues that the rest are not important for the finding being communicated. These arguments can be right or wrong, and they should be made consciously and be defensible.
This is why the exercises in Tutorial 5 ask you to describe and justify your design choices, not just implement them. The reasoning is the result.