Assortative mixing in networks is the tendency for nodes with the same attributes, or metadata, to link to each other. It is a property often found in social networks, manifesting as a higher tendency of links occurring between people with the same age, race, or political belief. Quantifying the level of assortativity or disassortativity (the preference of linking to nodes with different attributes) can shed light on the organisation of complex networks. It is common practice to measure the level of assortativity according to the assortativity coefficient, or modularity in the case of categorical metadata. This global value is the average level of assortativity across the network and may not be a representative statistic when mixing patterns are heterogeneous. For example, a social network spanning the globe may exhibit local differences in mixing patterns as a consequence of differences in cultural norms. Here, we introduce an approach to localise this global measure so that we can describe the assortativity, across multiple scales, at the node level. Consequently, we are able to capture and qualitatively evaluate the distribution of mixing patterns in the network. We find that for many real-world networks the distribution of assortativity is skewed, overdispersed and multimodal. Our method provides a clearer lens through which we can more closely examine mixing patterns in networks.
L. Peel, J-C. Delvenne, R. Lambiotte, Multiscale mixing patterns in networks, PNAS 2018
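As a point of reference for the global measure that the abstract above sets out to localise, the following is a minimal sketch of the standard (Newman) assortativity coefficient for categorical metadata on an undirected network. It is not the multiscale method from the paper. The function name, the edge-list input and the label dictionary are illustrative assumptions.

```python
from collections import defaultdict


def categorical_assortativity(edges, labels):
    """Global assortativity coefficient for categorical node labels.

    edges  : list of (u, v) pairs of an undirected graph (hypothetical input format)
    labels : dict mapping each node to its category
    """
    # e[(g, h)] is the fraction of edge ends that join category g to category h.
    e = defaultdict(float)
    total_ends = 2 * len(edges)  # every undirected edge is counted in both directions
    for u, v in edges:
        e[(labels[u], labels[v])] += 1 / total_ends
        e[(labels[v], labels[u])] += 1 / total_ends

    # a[g] is the fraction of edge ends attached to category g.
    a = defaultdict(float)
    for (g, _h), weight in e.items():
        a[g] += weight

    observed = sum(w for (g, h), w in e.items() if g == h)  # same-category fraction
    expected = sum(x * x for x in a.values())  # same quantity under random mixing
    # Undefined (division by zero) when every node shares one category.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    toy_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(categorical_assortativity(toy_edges, toy_labels))
```

A single number of this kind averages over the whole network, which is why it can hide the heterogeneous local mixing patterns the abstract describes.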
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because such networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. Here, we show that metadata are not the same as ground truth, and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structure.
L. Peel, D.B. Larremore, A. Clauset, The ground truth about metadata and community detection in networks, Science Advances 2017
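The evaluation practice that the abstract above critiques, scoring a detected partition against node metadata, can be sketched with normalized mutual information (NMI). The choice of NMI as the score is an assumption made here for illustration, and so are the function name and inputs; the two statistical techniques introduced in the paper are not reproduced.

```python
from collections import Counter
from math import log


def normalized_mutual_information(partition, metadata):
    """NMI between two labelings of the same nodes, listed in the same node order.

    Returns a value in [0, 1]; 1 means the labelings agree up to renaming.
    """
    n = len(partition)
    count_p = Counter(partition)
    count_m = Counter(metadata)
    joint = Counter(zip(partition, metadata))

    # Mutual information between the two labelings.
    mi = sum(
        (c / n) * log(c * n / (count_p[x] * count_m[y]))
        for (x, y), c in joint.items()
    )
    # Entropy of each labeling.
    h_p = -sum((c / n) * log(c / n) for c in count_p.values())
    h_m = -sum((c / n) * log(c / n) for c in count_m.values())

    if h_p == 0 and h_m == 0:
        return 1.0  # both labelings put every node in a single group
    return 2 * mi / (h_p + h_m)


if __name__ == "__main__":
    detected = [0, 0, 0, 1, 1, 1]
    attribute = ["x", "x", "y", "y", "y", "y"]
    print(normalized_mutual_information(detected, attribute))
```

In line with the abstract's argument, a low score from a comparison like this does not by itself show that the algorithm failed: the metadata may be unrelated to the network structure, or may reflect a different aspect of it than the detected communities.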
Interactions among people or objects are often dynamic in nature and can be represented as a sequence of networks, each providing a snapshot of the interactions over a brief period of time. An important task in analyzing such evolving networks is change-point detection, in which we both identify the times at which the large-scale pattern of interactions changes fundamentally and quantify how large and what kind of change occurred. Here, we formalize for the first time the network change-point detection problem within an online probabilistic learning framework and introduce a method that can reliably solve it. This method combines a generalized hierarchical random graph model with a Bayesian hypothesis test to quantitatively determine if, when, and precisely how a change point has occurred. We analyze the detectability of our method using synthetic data with known change points of different types and magnitudes, and show that this method is more accurate than several previously used alternatives. Applied to two high-resolution evolving social networks, this method identifies a sequence of change points that align with known external "shocks" to these networks.
Click here for a dynamic visualisation of the change points found in the MIT Reality Mining dataset.
L. Peel, A. Clauset, Detecting Change Points in the Large-scale Structure of Evolving Networks, AAAI 2015
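To give a feel for the change-point task described above, here is a deliberately simplified, offline sketch. It swaps in a much cruder technique than the one in the paper: each snapshot is summarised by its edge count, a single edge-density (Bernoulli) model stands in for the generalized hierarchical random graph, and a plain log-likelihood gain stands in for the Bayesian hypothesis test. All names and inputs are hypothetical.

```python
from math import log


def bernoulli_loglik(edges, pairs):
    """Maximised log-likelihood of `edges` present among `pairs` possible links."""
    if edges == 0 or edges == pairs:
        return 0.0  # density estimate is 0 or 1, so the likelihood is 1
    p = edges / pairs
    return edges * log(p) + (pairs - edges) * log(1 - p)


def best_split(edge_counts, n_nodes):
    """Find the split of a snapshot sequence that best separates two densities.

    edge_counts : number of edges in each snapshot, in time order
    n_nodes     : number of nodes, assumed constant across snapshots
    Returns (index of first snapshot after the change, log-likelihood gain).
    """
    pairs = n_nodes * (n_nodes - 1) // 2  # possible undirected links per snapshot
    # Null model: one density shared by every snapshot.
    null = bernoulli_loglik(sum(edge_counts), pairs * len(edge_counts))

    best_t, best_gain = None, 0.0
    for t in range(1, len(edge_counts)):
        before = bernoulli_loglik(sum(edge_counts[:t]), pairs * t)
        after = bernoulli_loglik(sum(edge_counts[t:]), pairs * (len(edge_counts) - t))
        gain = before + after - null
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    print(best_split([10, 10, 10, 30, 30, 30], n_nodes=10))
```

This sketch only locates the most likely split; deciding whether a change occurred at all would additionally need a significance threshold or a model comparison, which is the role the hypothesis test plays in the abstract.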
Professional team sports provide an excellent domain for studying the dynamics of social competitions. These games are constructed with simple, well-defined rules and payoffs that admit a high-dimensional set of possible actions and nontrivial scoring dynamics. The resulting gameplay and efforts to predict its evolution are the object of great interest to both sports professionals and enthusiasts. In this paper, we consider two online prediction problems for team sports: given a partially observed game, who will score next, and ultimately, who will win? We present novel interpretable generative models of within-game scoring that allow for dependence on lead size (restoration) and on the last team to score (anti-persistence). We then apply these models to comprehensive within-game scoring data for four sports leagues over a ten-year period. By assessing these models' relative goodness-of-fit, we shed new light on the underlying mechanisms driving the observed scoring dynamics of each sport. Furthermore, in both predictive tasks, our models consistently outperform baseline models, and they make quantitative assessments of latent team skill over time.
Click here for an interactive visualisation of the inferred team skills for NFL and NBA.
L. Peel, A. Clauset, Predicting Sports Scoring Dynamics with Restoration and Anti-persistence, ICDM 2015
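The two effects named in the abstract above, restoration (dependence on lead size) and anti-persistence (dependence on the last team to score), can be illustrated with a toy generative sketch. The logistic form, the parameter names and values, and the one-point-per-event simplification are all assumptions made for illustration; they are not the models fitted in the paper.

```python
import random
from math import exp


def prob_a_scores_next(lead, last_scorer, skill_gap=0.0,
                       restoration=0.05, anti_persistence=0.3):
    """Probability that team A wins the next scoring event (hypothetical model).

    lead        : A's current lead in points (negative if A trails)
    last_scorer : +1 if A scored last, -1 if B did, 0 at the start of the game
    skill_gap   : latent skill of A minus latent skill of B
    """
    # Restoration: a larger lead lowers the leader's chance of scoring next.
    # Anti-persistence: the team that just scored is less likely to score again.
    logit = skill_gap - restoration * lead - anti_persistence * last_scorer
    return 1.0 / (1.0 + exp(-logit))


def simulate_game(n_events, seed=0, **params):
    """Simulate a sequence of scoring events; +1 means A scored, -1 means B."""
    rng = random.Random(seed)
    lead, last_scorer, events = 0, 0, []
    for _ in range(n_events):
        p = prob_a_scores_next(lead, last_scorer, **params)
        scorer = 1 if rng.random() < p else -1
        lead += scorer  # simplification: every scoring event is worth one point
        last_scorer = scorer
        events.append(scorer)
    return events


if __name__ == "__main__":
    game = simulate_game(20, seed=1, skill_gap=0.2)
    print(game, "final lead for A:", sum(game))
```

Setting `restoration` and `anti_persistence` to zero reduces this sketch to independent coin flips biased only by `skill_gap`, which makes it easy to see what each effect contributes.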