Doubly Stochastic Scaling Unifies Community Detection

Graph partitioning, or community detection, has been widely investigated in network science. Yet, the correct community structure on a given network is essentially data-driven. Thus, instead of a formal definition, diverse measures have been conceived to capture intuitive desirable properties shared by most community structures. In this work, we propose a preprocessing based on a doubly stochastic scaling of network adjacency matrices, to highlight these desirable properties. By investigating a range of community detection measures, and carefully generalising them to doubly stochastic graphs, we show that such a scaling unifies a whole category of these measures (namely, the so-called linear criteria) onto two unique measures to set up. Finally, to help practitioners set up these measures, we provide an extensive numerical comparison of the capacity of these measures to uncover community structures within stochastic block models, using the Louvain algorithm.


Introduction
By mapping local-level elementary interactions between data, networks provide a powerful template that enables one to analyse emergent behaviours in complex systems, such as biological systems, social networks, etc. [1, Chap. 5]. Hence, over the last decades, the analysis of complex networks has been at the core of several research works [2]. One aspect has gained a lot of attention: the problem of graph partitioning, also called community detection [3, 4, 1, Chap. 21]. Defining a network as a set of entities (called nodes or vertices) connected by interactions (called links or edges), the aim of community detection is to partition the set of nodes into groups of nodes that are similar or strongly related.

[Figure 1, caption partially recovered] The Louvain algorithm cannot detect the smallest community (top matrix, the smallest community highlighted in the red square); and it is unable to detect the two communities connected in an imbalanced fashion (bottom). Right: After scaling, Louvain can detect small communities in presence of larger ones (top); and it can detect the community structure when there is an imbalance in the flows of edges (bottom).
In real-world applications, the correct community structure depends on the network. For this reason, there exists no formal definition of a community structure, since it is always possible to find a community structure that contradicts the definition. However, it is generally admitted that community structures share similar properties: a community should be a group of nodes densely connected, and sparsely connected to the rest of the graph (see Table 1.1 from [5]). Thus, a number of measures that capture these properties have been designed to assess the quality of a community structure proposed on a network, e.g. [6, 7, 8]. Optimising such measures is generally an NP-complete problem [9, 3, 5], thus approximation algorithms have been proposed that perform community detection by approximating the "best" community structure. The most famous is probably the Louvain algorithm [10], which aims to maximise the so-called Newman-Girvan modularity [6]. Because of its simplicity, its accuracy in detecting communities, and its efficiency in terms of computational cost [11], it has been one of the most widely used community detection algorithms for more than 10 years. But there are communities, very intuitive and yet poorly detected by algorithms in general, that even Louvain is unable to detect: 1) Small communities in large networks are generally missed; this is typically the so-called resolution limit [12]. 2) In directed networks, flow-based communities are usually not detected in presence of an imbalance of the edges leaving and entering these communities. Points 1) and 2) are illustrated in the middle panels of fig. 1, where the results of the Louvain algorithm applied to two toy networks exhibiting such community structures are displayed.
The aim of this study is to investigate the potential of matrix scaling as a preprocessing for community detection. Our contributions are threefold:
• We propose a preprocessing based on the so-called doubly stochastic scaling, to increase the detectability of communities, in particular those that are usually hard to detect, as illustrated in fig. 1.
• By extending several graph partitioning measures to weighted graphs, in particular doubly stochastic graphs, we show that the proposed preprocessing unifies these measures onto two unique measures to set up.
• We conduct extensive comparisons of the capacity of these measures to uncover community structures within stochastic block models (SBMs), which provides guidance for customising them.
The paper is organised as follows: Section 2 lists the definitions and notations to be used throughout the paper. Section 3 gives an overview of related work. Section 4 presents the method: we introduce the doubly stochastic scaling (section 4.1) and detail the proposed preprocessing (section 4.2), showing its potential on toy examples and a real-world network (section 4.3). In section 5, we discuss the generalisation of six graph partitioning measures to weighted graphs, in particular doubly stochastic ones. Section 6 compares these measures, first theoretically in section 6.1, then experimentally in section 6.2. We finally conclude the study and discuss future work in section 7.

Definitions and Notations
In this section, we present some definitions and notations to be used throughout the study. Basic mathematical objects are listed in table 1.
Graphs. In this study, we investigate networks (also called graphs) that can be weighted or not. Except when stated otherwise, networks are undirected. For a network $G = (V, E, \Omega)$, $V$ is the set of nodes, $E \subset V \times V$ the set of edges, and the function $\Omega : E \to \mathbb{R}_+$, $\{u, v\} \mapsto \omega(\{u, v\})$, provides the weights of edges. We limit the study to graphs that are positively weighted. To simplify notations, we assume that graphs have integer nodes, i.e. $V = \{1, ..., n\}$.
When there is no possible confusion about the network, letters $n$ and $m$ denote the number of nodes and the total weight of edges respectively, that is $n = |V|$ and $m = \sum_{\{u,v\} \in E} \omega(\{u, v\})$. The degree of a node $u$ is defined as $d_u = \sum_{v : \{u,v\} \in E} \omega(\{u, v\})$. If $\exists \delta \in \mathbb{R} : \forall u \in V, d_u = \delta$, the graph is said to be $\delta$-regular. We denote by simple graphs the unweighted undirected networks without self-loop, i.e. $\forall u \in V, \{u, u\} \notin E$.

Adjacency Matrices. A (directed) graph $G = (V, E, \Omega)$ can be represented by its adjacency matrix, that is a matrix $A \in \mathbb{R}_+^{n \times n}$ where
$$a_{i,j} = \begin{cases} \omega((i, j)) & \text{if } (i, j) \in E \\ 0 & \text{otherwise.} \end{cases}$$
Conversely, given a matrix $A \in \mathbb{R}_+^{n \times n}$, we call the adjacency graph of $A$ the graph of which $A$ is the adjacency matrix.
For undirected graphs, when the adjacency graph of $A$ has no self-loop, then $2m = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j} = e^T A e$. When the adjacency graph of $A$ is unweighted, we define the complementary of $A$ (denoted $\bar{A}$) as the matrix in $\mathbb{R}_+^{n \times n}$ such that $\bar{a}_{i,j} = 1 - a_{i,j}$, that is $\bar{A} = J - A$.
Community Structures. Given a graph $G = (V, E)$, a community structure is a partitioning of the set of nodes $V$, that is a set of subsets of $V$: $\mathcal{C} = \{C_t\}_{t=1..k}$ such that $\bigcup_{t=1}^{k} C_t = V$ and $\forall t \neq s, C_t \cap C_s = \emptyset$. This community structure can be represented as an equivalence relation $\mathcal{X}$ on $V \times V$ such that $u \mathcal{X} v \iff \exists t \in \{1, .., k\} : u, v \in C_t$.
It can also be represented as a matrix $X \in \{0, 1\}^{n \times n}$ such that
$$x_{i,j} = \begin{cases} 1 & \text{if } i \mathcal{X} j \\ 0 & \text{otherwise.} \end{cases}$$
A matrix $X \in \{0, 1\}^{n \times n}$ encodes an equivalence relation $\mathcal{X}$ (and hence a community structure) if and only if:
(a) $\forall i \in \{1, ..., n\}, \; x_{i,i} = 1$
(b) $\forall i, j \in \{1, ..., n\}, \; x_{i,j} = x_{j,i}$
(c) $\forall i, j, k \in \{1, ..., n\}, \; x_{i,k} + x_{j,k} - x_{i,j} \leq 1$
where (a), (b), (c) indicate respectively the reflexivity, the symmetry and the transitivity of the equivalence relation represented by $X$ [13]. We denote by $Eq(n)$ the set of the equivalence relations on a set $V$ such that $|V| = n$. That is, we write $X \in Eq(n)$ when a matrix $X \in \{0, 1\}^{n \times n}$ verifies (a), (b), (c), and $\mathcal{X} \in Eq(n)$ for an equivalence relation defined on the set $V$. For any $X \in Eq(n)$, its complementary is defined by $\bar{X} = J - X$.
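As an illustration of conditions (a), (b), (c), the following minimal sketch builds the matrix representation of a partition and checks the three conditions by brute force. The function names `partition_to_matrix` and `is_equivalence_matrix` are ours, and nodes are indexed from 0 rather than from 1 as in the text.

```python
from itertools import product


def partition_to_matrix(communities, n):
    """Return X in {0,1}^(n x n) with x[i][j] = 1 iff i and j share a community.

    Nodes are 0-indexed here (the text uses 1..n).
    """
    x = [[0] * n for _ in range(n)]
    for community in communities:
        for i, j in product(community, repeat=2):
            x[i][j] = 1
    return x


def is_equivalence_matrix(x):
    """Check conditions (a) reflexivity, (b) symmetry, (c) transitivity."""
    n = len(x)
    reflexive = all(x[i][i] == 1 for i in range(n))
    symmetric = all(x[i][j] == x[j][i] for i in range(n) for j in range(n))
    transitive = all(x[i][k] + x[j][k] - x[i][j] <= 1
                     for i, j, k in product(range(n), repeat=3))
    return reflexive and symmetric and transitive


# A valid community structure on 5 nodes.
X = partition_to_matrix([{0, 1, 2}, {3, 4}], 5)

# Reflexive and symmetric, but 0~1 and 1~2 without 0~2: not transitive.
not_transitive = [[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]]
```

The transitivity check is cubic in n, which is fine for small illustrations but not intended for large graphs.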

Double Stochasticity.
In the following, we specifically focus on networks that are doubly stochastic, that is such that their adjacency matrices have their row and column sums equal to 1. Formally, a (directed) network $G = (V, E, \Omega)$ is said to be doubly stochastic if its adjacency matrix $S \in \mathbb{R}_+^{n \times n}$ verifies
$$S e = e \quad \text{and} \quad S^T e = e.$$
We remark that doubly stochastic graphs are 1-regular graphs.
In this study, we preprocess graphs so that they (or equivalently their adjacency matrices) are doubly stochastic. Transforming a matrix $A \in \mathbb{R}_+^{n \times n}$ onto a doubly stochastic matrix is an operation called "scaling $A$ onto its doubly stochastic form". One achieves this by finding two vectors $r, c \in (\mathbb{R}_+^*)^n$ such that
$$D(r) A D(c) e = e \quad \text{and} \quad D(c) A^T D(r) e = e. \tag{3}$$
The matrix $S = D(r) A D(c)$ is called the doubly stochastic scaling of $A$, and vectors $r$ and $c$ are called the scaling factors. The existence of a doubly stochastic scaling is not guaranteed in general; the conditions are detailed in section 4.1.

Related Work
Doubly Stochastic Scaling for Community Detection. In this study, we design a preprocessing for community detection, based on doubly stochastic scaling. This scaling has already been used in the context of community detection. It is the first stage of the algorithm from [14] that partitions migration networks. However, the rationales for scaling in [14] (invariance of relative odds and approximation of maximum entropy) greatly differ from ours. Also in [15], doubly stochastic scaling is used as a preprocessing step for a spectral algorithm. Furthermore, in [16], the authors aim to partition a dataset by finding the doubly stochastic matrix that best approximates the dataset similarity matrix. Finally, in [17], the authors use doubly stochastic scaling to perform co-clustering, as scaling factors should approximate the joint densities between the random variables inferring data, and the random variables inferring partitions. We remark that all these studies use the doubly stochastic scaling for a very specific purpose: achieving uniform marginals in the flow table [14], obtaining staircase-like singular vectors [15] or scaling factors [17], or approximating a similarity matrix [16].
On the other hand, the method proposed here is a wider-purpose preprocessing that can be used prior to any community detection method.
Community Detection Measures. This study investigates a set of measures designed to assess the quality of a community structure on a network, based on the network structural properties. Nodes within a community are supposed to be densely connected, while being loosely connected to nodes outside their community. Different ways of defining "densely" and/or "loosely" lead to different measures. They can be subdivided into three families. Measures based on density, such as Newman-Girvan modularity [6] or coverage [18], define a community as a group of nodes with a high density of edges. Measures based on sparsity also exist, which consider that the amount of edges between two communities must be low (e.g. conductance, expansion [19], or normalised cut [7]). Some measures are a mixture of density and sparsity, such as LambdaCC [5] or Balanced modularity [13]. Given a dynamic process defined on the graph edges (e.g. a random walk), measures of the third kind consider a community as a group of nodes from which the process struggles to escape (such as the Map equation [8], the Markov stability [20], or the community distance from [21]).
We remark that some measures based on a priori hypotheses about the ground truth community structure exist, such as the likelihood from SBM-based techniques [22], the likelihood of preserving node neighbourhoods in node2vec [23], or the cross-entropy error over nodes with known label in Graph Convolutional Networks [24]. But they are beyond the scope of this study. Most of the measures investigated here are listed in [25] to be used in the Louvain algorithm. This requires them to be defined for graphs with integer weights, and the measures are thus extended to such graphs when needed. Since generalisation is not the purpose of [25], this is done straightforwardly and does not always fit the philosophy of the initial measures, as shown in section 5.

Doubly Stochastic Scaling Preprocessing
In this section we describe and discuss the preprocessing that we propose for community detection, which relies on a doubly stochastic scaling of the graph adjacency matrix. Not every square matrix can be scaled onto a doubly stochastic matrix. We thus first provide the conditions for such a scaling to exist, and discuss the relations with graph connectivity. We then present the proposed preprocessing, and discuss its impact on some community structures.

Doubly Stochastic Scaling and Graph Connectivity
The Sinkhorn-Knopp Theorem. Given a square matrix $A \in \mathbb{R}_+^{n \times n}$, it is not always possible to find two vectors $r, c \in \mathbb{R}_+^n$ such that eq. (3) is verified. In order for such a scaling to exist, the pattern of $A$ (i.e. the positions of its nonzero entries) must respect certain conditions, which are provided by the so-called Sinkhorn-Knopp theorem [26]. In order to introduce this theorem, we first provide two definitions about the pattern of a matrix on which it relies.
These definitions can be found in [27].
Definition 1. Bi-irreducibility. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be bi-irreducible if there are no permutation matrices $R, Q$ such that
$$R A Q = \begin{pmatrix} A_1 & * \\ 0 & A_2 \end{pmatrix}$$
with $A_1, A_2$ two square and non empty matrices.
This definition implies that $A$ cannot be brought to a block triangular form by independent permutations of its rows and its columns.
Definition 2. Total Support. A matrix $A \in \mathbb{R}^{n \times n}$ is said to have a total support if every nonzero entry lies on a strictly positive diagonal. One characterisation of this definition proposed in [27] is that there are two permutation matrices $R, Q$ such that
$$R A Q = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix}$$
with $A_1, ..., A_k$ bi-irreducible matrices.
Theorem 1. Sinkhorn-Knopp. Given a matrix $A \in \mathbb{R}_+^{n \times n}$, a necessary and sufficient condition for the existence of a doubly stochastic matrix $S = D(r) A D(c)$ with $r, c \in (\mathbb{R}_+^*)^n$ is that $A$ has a total support. If $S$ exists then it is unique. Vectors $r$ and $c$ are also unique up to a scalar multiple if and only if $A$ is bi-irreducible.
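Relations with the Connectivity of the Adjacency Graph. We now introduce the definition of irreducibility, which draws a link between the connectivity of a network and the pattern of its adjacency matrix.
Definition 3. Irreducibility. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be irreducible if there is no permutation matrix $P$ such that
$$P A P^T = \begin{pmatrix} A_1 & * \\ 0 & A_2 \end{pmatrix}$$
with $A_1, A_2$ square and non empty. A characterisation of irreducible matrices from [28] is that they are the adjacency matrices of strongly connected graphs.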

Algorithm 1: Preprocessing Undirected Graphs
Every bi-irreducible matrix is also irreducible. Conversely, if a matrix is irreducible with a zero-free diagonal ($a_{i,i} \neq 0, \forall i \in \{1, ..., n\}$), then this matrix is bi-irreducible (easily proven by applying the algorithm from [29] to such a matrix). Since definition 3 states that irreducible matrices are adjacency matrices of strongly connected graphs, the adjacency matrix of every strongly connected graph can be made bi-irreducible (thus scalable) by ensuring that its diagonal is strictly positive (e.g. by adding a positive diagonal matrix to the adjacency matrix, which is equivalent to adding self-loops to the graph).
Remark 1. For an undirected graph, adding a positive diagonal matrix to its adjacency matrix is sufficient to make it scalable onto a doubly stochastic graph, whatever its connectivity. Indeed, every symmetric matrix with zero-free diagonal has a total support (Lemma 3.3 from [30]).
For a directed graph, each strongly connected component must be scaled and partitioned separately. These components can be found by applying the Dulmage-Mendelsohn decomposition on the graph adjacency matrix whose diagonal has been made zero-free [29].

The Preprocessing
We propose to apply a doubly stochastic scaling on networks as a preprocessing for community detection. As discussed in section 4.1, some requirements have to be fulfilled to ensure that the network can be scaled, which depend on whether the graph is directed. The steps to follow to scale a matrix $A \in \mathbb{R}_+^{n \times n}$ are described in algo. 1 if $A$ is the adjacency matrix of an undirected graph, and in algo. 2 if the adjacency graph of $A$ is directed.
In algo. 1, symscalone is the method from [31] that can compute a doubly stochastic scaling of a general square matrix with total support. It is particularly well suited to symmetric matrices, because it preserves matrix symmetry. In algo. 2, dmperm is the Dulmage-Mendelsohn decomposition mentioned in remark 1. When applied to $A + I$, it returns the strongly connected components of the adjacency graph of $A$. The largest component is then scaled using the so-called RAS or Sinkhorn-Knopp Algorithm [26, 30]. For both directed and undirected networks, the adjacency graph of the doubly stochastic matrix $S$ returned by the algorithm is the one on which communities are then detected. For directed networks, this means that only the largest strongly connected component is partitioned. However, this is straightforward to extend to the whole graph, by scaling and partitioning each component in turn.
For both algorithms, it is necessary to add entries in the diagonal of the matrix to scale, to ensure that the conditions from theorem 1 are verified. We remark that adding diagonal elements leaves the community structure intact, as the community structure of a graph is linked to the diagonal block structure of its adjacency matrix, which is not impacted by its diagonal entries. In both versions of the preprocessing, we choose to add very small entries ($10^{-8}$ times the matrix smallest entry) to impact as little as possible the numerical values in the final scaling. This is an empirical choice which is not theoretically justified, and it would be interesting to analyse how these diagonal entries impact the final scaling. We leave this analysis to further work.
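The undirected case can be sketched as follows. This is only an illustration under stated assumptions: it does not implement symscalone from [31], but swaps in a plain Sinkhorn-Knopp iteration (alternating row and column normalisation); "smallest entry" is read here as the smallest nonzero entry; and the function names, the iteration count and the toy graph are ours.

```python
def sinkhorn_knopp(a, n_iter=500):
    """Alternately normalise the rows and columns of a nonnegative square matrix."""
    n = len(a)
    s = [row[:] for row in a]
    for _ in range(n_iter):
        for i in range(n):                      # row normalisation
            row_sum = sum(s[i])
            s[i] = [v / row_sum for v in s[i]]
        for j in range(n):                      # column normalisation
            col_sum = sum(s[i][j] for i in range(n))
            for i in range(n):
                s[i][j] /= col_sum
    return s


def preprocess_undirected(a, rel_eps=1e-8):
    """Add a tiny positive diagonal (rel_eps times the smallest nonzero entry),
    then scale. Plain Sinkhorn-Knopp is used instead of symscalone."""
    smallest = min(v for row in a for v in row if v > 0)
    shifted = [row[:] for row in a]
    for i in range(len(a)):
        shifted[i][i] += rel_eps * smallest
    return sinkhorn_knopp(shifted)


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1.0

S = preprocess_undirected(A)
row_sums = [sum(row) for row in S]
col_sums = [sum(S[i][j] for i in range(6)) for j in range(6)]
```

On this toy graph the scaled weight of the bridge (2, 3) ends up smaller than the weights inside the triangles, which is the kind of sharpening the preprocessing aims for.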

Impact of the Preprocessing on Synthetic and Real-World Data
Rationales on Toy Examples. Our intuition that the doubly stochastic scaling may improve community detection comes from the two toy examples from fig. 1, used in section 1 to illustrate the difficulty of detecting some community structures. First, doubly stochastic scaling balances the weight of the edges in small and large communities. Consider a trivial example, where a simple graph is composed of two disjoint communities of different sizes $n_1 > n_2$, such that the probability for two nodes in a same community to be linked is equal to $p_{in}$, for both communities. Then, on average, a node in the large community shares more links with nodes from its community than a node in the small community ($p_{in} \times n_1 > p_{in} \times n_2$). This is no longer true if we look at the doubly stochastic scaling of the adjacency matrix. In this case, every node in both communities shares strictly the same amount of edges with nodes from its community, that is 1 by definition of the doubly stochastic scaling.
Secondly, doubly stochastic scaling can reasonably be expected to mitigate an existing imbalance in the direction of edges, because of its so-called vanishing effect. To understand it, we explain the behaviour of doubly stochastic scaling on $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. This matrix has no total support. Thus, according to theorem 1, it cannot be scaled onto a doubly stochastic form. Nevertheless, doubly stochastic scaling algorithms provide scaling factors $r$ and $c$ that tend towards $(0, +\infty)^T$ and $(+\infty, 0)^T$ respectively, so that the doubly stochastic scaling of $A$ tends towards $S = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, in which the off-diagonal element has vanished [30, 26].
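The vanishing effect can be observed numerically with a minimal sketch of the Sinkhorn-Knopp iteration (alternating row and column normalisation). The function name and the iteration count below are our own choices.

```python
def sinkhorn_knopp(a, n_iter):
    """Alternately normalise the rows and columns of a nonnegative square matrix."""
    n = len(a)
    s = [row[:] for row in a]
    for _ in range(n_iter):
        for i in range(n):                      # row normalisation
            row_sum = sum(s[i])
            s[i] = [v / row_sum for v in s[i]]
        for j in range(n):                      # column normalisation
            col_sum = sum(s[i][j] for i in range(n))
            for i in range(n):
                s[i][j] /= col_sum
    return s


# The 2 x 2 matrix without total support discussed in the text.
A = [[1.0, 1.0],
     [0.0, 1.0]]
S = sinkhorn_knopp(A, n_iter=1000)
off_diagonal = S[0][1]
```

For this particular matrix, working through the iteration by hand gives an off-diagonal entry equal to 1/(2k + 1) after k full iterations, so the entry does vanish, but only slowly, consistent with the absence of total support.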
As a matter of fact, the proposed preprocessing does improve the detectability of the community structures of the toy examples from fig. 1: the Louvain algorithm applied directly to the graphs fails to detect their structures; on the other hand, when applied to the preprocessed graphs, it returns their ground truth community structures, as shown in the right panels of fig. 1.
Food Web of Florida Bay. Here we observe the impact of the proposed preprocessing on the network of trophic dynamics within Florida Bay. In this directed network, a node is a compartment and an edge indicates carbon exchanges: roughly, an edge from node a to node b means that species in compartment a are eaten by species in compartment b. The network contains 128 compartments, which can be divided into 9 types according to [32], namely Phytoplankton producers, Seagrass and seagrass roots, Microfauna, Macroinvertebrates, Fishes, Birds, Reptiles, Mammals, and Detritus. According to [33], this partitioning into types corresponds to the network underlying community structure. The matrix $S \in \mathbb{R}_+^{p \times p}$ returned by algo. 2 is illustrated in fig. 2. The ground truth community structure is indicated by the black lines. Because numerical values range from 1 to $10^{-82}$, only entries higher than $10^{-12}$ are plotted. Nonzero entries below this threshold are shown by black '+'s. From fig. 2 we observe that the preprocessing clearly tends to make the edges between communities vanish. This is highlighted by the high density of black '+'s in the off-diagonal blocks, meaning that numerous entries in the off-diagonal blocks of $S$ have a value that falls below $10^{-12}$. To assess the extent to which the preprocessing indeed sharpens the network community structure, we need to compare the consistency of these communities on both the raw and the scaled networks. We base this comparison on the concepts defined below. For a node $u$ and a community $C$, we consider $\sum_{v \in C} a_{u,v} / d_u$, that is the ratio between the amount of edges that node $u$ shares with nodes in $C$ and the degree of $u$. Thus, the average level to which nodes from a community $C$ belong to a community $C'$ is the mean of this ratio over the nodes of $C$, denoted $\Phi(C, C')$, and matrix $\Phi$ provides a view of the community structure consistency.
Clearly, the higher the reflexive values of $\Phi$ (that is, the values $\Phi(C, C)$), the more consistent the community structure. We compute the values of $\Phi$ for two matrices that are symmetrisations of the raw and preprocessed directed networks, namely $B + B^T$, where $B$ is the adjacency matrix of the largest strongly connected component of the raw network; and $\tilde{S} + \tilde{S}^T$, where $\tilde{S}$ is the matrix $S$ with its diagonal values set to 0. We remove the diagonal because most of the diagonal entries of $S$ are scaled close to 1 (whereas they are initially very small). Thus, keeping the diagonal provides spuriously high values for $\Phi(C, C), \forall C \in \mathcal{C}$, whatever the community structure.
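A small sketch of how such a matrix Φ could be computed is given below. It assumes that Φ(C, C′) is the mean, over the nodes u of C, of the share of the degree of u that goes to nodes of C′; the function name and the toy graph are ours, not data from the food web.

```python
def consistency_matrix(a, communities):
    """phi[s][t]: mean over u in communities[s] of
    (weight between u and communities[t]) / degree(u).
    Assumed reading of the definition of Phi."""
    degree = [sum(row) for row in a]
    phi = []
    for c_s in communities:
        row = []
        for c_t in communities:
            levels = [sum(a[u][v] for v in c_t) / degree[u] for u in c_s]
            row.append(sum(levels) / len(levels))
        phi.append(row)
    return phi


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1.0

Phi = consistency_matrix(A, [[0, 1, 2], [3, 4, 5]])
```

With this reading, each row of Φ sums to 1 when the communities cover all nodes, so high reflexive values necessarily come with low off-diagonal ones.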
These values of $\Phi$ are displayed in fig. 3. The last three communities, which contain no more than 3 nodes, are missed by both the raw and the preprocessed matrices. Looking at the structure of these communities restricted to the analysed component in fig. 2, it is indeed not possible to consider them as standalone communities without having been told so. The community corresponding to the Birds tends to be merged with Fishes by both raw and preprocessed networks. This is also in line with what can be observed from the figure. We also remark that the preprocessing has more impact on the consistency of smaller communities: reflexive $\Phi$ values are 3.74 times higher in the preprocessed network than in the raw one for Microfauna and Macroinvertebrates, 1.73 for Fishes. These observations illustrate the potential of the proposed preprocessing to increase the detectability of community structures within networks with an imbalance in edge direction between communities, as well as of small-size communities.
This list of graph partitioning measures is not exhaustive. These measures are actually the linear criteria from [34]. Formally, denoting by $F$ a criterion that assesses the quality of a community structure on a graph represented by its adjacency matrix, $F$ is a linear criterion [25] if it can be written as
$$F(A, X) = \sum_{i,j} \phi(a_{i,j}) \, x_{i,j} + K$$
where $A$ and $X$ are respectively a graph adjacency matrix and a community structure, $\phi : \mathbb{R} \to \mathbb{R}$ is a function and $K$ is some constant scalar.
For each criterion, we address three points:
• We briefly explain the measure background, that is how it works and why it assesses community structures, as well as its formulation.
• Most of these measures are initially designed for unweighted networks, and some have been generalised to weighted graphs afterwards. When such a generalisation exists, we may either use it or derive another one that we find more suitable for the community structure detection on doubly stochastic graphs. We hence discuss the measure generalisation to weighted graphs and especially to doubly stochastic ones.
• We provide a reduced form for the problem of finding the best community structure on a graph represented by its adjacency matrix $A$ using a criterion $F$. Namely, this problem is expressed as
$$\max_{X \in Eq(n)} \sum_{i,j} \left( \varphi(a_{i,j}) \, x_{i,j} + \bar{\varphi}(a_{i,j}) \, \bar{x}_{i,j} \right) \tag{6}$$
where $\varphi$ and $\bar{\varphi}$ are two functions with values in $\mathbb{R}_+$, respectively called the positive and negative agreements, as in [34]. This reduced form allows us to compare the criteria in section 6.
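To make the reduced form concrete, here is a minimal sketch that evaluates a score of the form "positive agreement on pairs placed together plus negative agreement on pairs placed apart". It assumes the reduced form sums φ(a_ij) x_ij + φ̄(a_ij)(1 − x_ij) over all pairs; the function `reduced_score`, the callables `pos` and `neg`, and the toy matrix are hypothetical names and data introduced for illustration.

```python
def reduced_score(a, x, pos, neg):
    """Sum over all pairs (i, j) of pos(i, j) * x_ij + neg(i, j) * (1 - x_ij).

    pos and neg are callables giving the positive and negative agreements
    of the pair (i, j); they may depend on a[i][j], on degrees, etc.
    """
    n = len(a)
    return sum(pos(i, j) * x[i][j] + neg(i, j) * (1 - x[i][j])
               for i in range(n) for j in range(n))


# Toy doubly stochastic graph: two disjoint edges {0, 1} and {2, 3}.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
n = len(A)


def pos(i, j):
    return A[i][j]          # positive agreement: the edge weight


def neg(i, j):
    return 1.0 / n          # negative agreement: a constant


def structure(blocks):
    """Matrix representation of a partition given as lists of nodes."""
    return [[int(any(i in b and j in b for b in blocks)) for j in range(n)]
            for i in range(n)]


score_pairs = reduced_score(A, structure([[0, 1], [2, 3]]), pos, neg)
score_single = reduced_score(A, structure([[0, 1, 2, 3]]), pos, neg)
score_isolated = reduced_score(A, structure([[0], [1], [2], [3]]), pos, neg)
```

On this toy graph the partition that follows the two edges scores highest, as one would hope from any sensible choice of agreements.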

Newman-Girvan modularity
Principle. The Newman-Girvan modularity introduced in [6] is the most famous graph partitioning measure. The idea behind this criterion is that a community structure in a network actually characterises the property of assortative mixing in this network [37]. Assortative mixing is the tendency of similar nodes to connect to each other rather than to dissimilar nodes: as an example, in a social network, people who speak the same language or have a similar sociological background are more likely to be friends. Hence, given an assortative network, a good community structure is one such that the fraction of edges that connect nodes in a same community is high.
However, this notion cannot be used on its own. Indeed, the trivial structure that brings all the nodes into a same community always maximises this fraction of edges. Thus, to derive the modularity, Newman and Girvan also assume that random graphs do not exhibit a community structure [6]. The modularity is hence designed to compare the fraction of intra-community edges in a network with the expected fraction of intra-community edges in random graphs with the same degree sequence as the initial graph (i.e. generated by the configuration model). In the configuration model with degree sequence $\{d_1, ..., d_n\}$, the probability of an edge between two nodes $i$ and $j$ can be approximated by $d_i d_j / 2m$. The modularity is thus defined as
$$Q(A, \mathcal{C}) = \frac{1}{2m} \sum_{C \in \mathcal{C}} \sum_{i,j \in C} \left( a_{i,j} - \frac{d_i d_j}{2m} \right) \tag{7}$$
with $A$ the adjacency matrix of the network, and $\mathcal{C}$ a community structure. In turn, this can be re-written (as in [38])
$$Q(A, X) = \frac{1}{2m} \sum_{i,j} \left( a_{i,j} - \frac{d_i d_j}{2m} \right) x_{i,j} \tag{8}$$
with $X \in Eq(n)$ the matrix representation of the community structure $\mathcal{C}$.
Generalisation. The initial Newman-Girvan modularity from [6] is designed for unweighted graphs only. In [38], Newman proposes two steps to generalise modularity to weighted graphs. First, he investigates multi-graphs, that are simple networks in which two vertices can share more than one simple edge, as in fig. 4. Newman generalises some basics from simple networks to multi-graphs to derive an adapted modularity. Namely, let $A \in \mathbb{N}^{n \times n}$ be a multi-graph adjacency matrix: 1) The degree $d_i$ of a vertex $i$ in the multi-graph is the number of simple edges adjacent to $i$: $d_i = \sum_k a_{i,k}$. 2) The constant $2m$ becomes the sum over the degrees, that is $2m = \sum_i d_i$. With these simple adaptations of degrees and number of edges, Newman generalises the modularity by simply applying eq. (7) to multi-graphs, with $a_{i,j}$, $d_i$ and $2m$ as defined above. Secondly, modularity is extended from multi-graphs to positively weighted graphs with the following remark: given a graph whose adjacency matrix can be written as $A = \alpha N$, with $\alpha \in \mathbb{R}_+$ and $N \in \mathbb{N}^{n \times n}$, and considering $d_i = \sum_k a_{i,k}$ and $2m = \sum_i d_i$, then for any $X \in Eq(n)$, the results of the formula from eq. (7) applied to $A$ and to $N$ are equal. Hence, the modularity as defined in eq. (7) can be extended to graphs for which there exists a unit flow (i.e. an $\alpha$) that allows one to consider them as multi-graphs.
We show in [39, Property 1] that for every square matrix whose entries are rational, a unit flow can be found, but that this is not true for any weighted matrix. However, we also provide [39, Property 2] a proof that eq. (7) can be extended to any undirected positively weighted graph. We thus apply eq. (7) directly to doubly stochastic matrices in the following.
Reduced Form. Given an adjacency matrix $A$, finding the best community structure in the sense of the Newman-Girvan modularity provided in eq. (7) is equivalent to maximising the function
$$\sum_{i,j} \left( a_{i,j} \, x_{i,j} + \frac{d_i d_j}{2m} \, \bar{x}_{i,j} \right).$$
This provides the reduced form of eq. (6), with positive and negative agreements equal to respectively $\varphi(a_{i,j}) = a_{i,j}$ and $\bar{\varphi}(a_{i,j}) = d_i d_j / 2m$. Moreover, in a doubly stochastic graph, $\forall i, d_i = 1$ and $2m = n$. Thus, for a doubly stochastic matrix $S$, we can simplify the Newman-Girvan modularity as
$$Q(S, X) = \frac{1}{n} \sum_{i,j} \left( s_{i,j} - \frac{1}{n} \right) x_{i,j}$$
and the negative agreement $\bar{\varphi}(s_{i,j}) = 1/n$ does not depend on $i, j$.
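The simplification for doubly stochastic matrices can be checked numerically. The sketch below assumes the standard normalised form of the Newman-Girvan modularity, with degrees taken as row sums and 2m as the sum of degrees; the function names and the small symmetric doubly stochastic matrix are ours.

```python
def modularity(a, x):
    """(1/2m) * sum_ij (a_ij - d_i d_j / 2m) x_ij, with d_i the row sums."""
    n = len(a)
    d = [sum(row) for row in a]
    two_m = sum(d)
    return sum((a[i][j] - d[i] * d[j] / two_m) * x[i][j]
               for i in range(n) for j in range(n)) / two_m


def modularity_doubly_stochastic(s, x):
    """(1/n) * sum_ij (s_ij - 1/n) x_ij, valid when s is doubly stochastic."""
    n = len(s)
    return sum((s[i][j] - 1.0 / n) * x[i][j]
               for i in range(n) for j in range(n)) / n


# A symmetric doubly stochastic matrix (every row and column sums to 1).
S = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.3, 0.2, 0.0],
     [0.0, 0.2, 0.3, 0.5],
     [0.0, 0.0, 0.5, 0.5]]

# Community structure {0, 1}, {2, 3}.
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

q_general = modularity(S, X)
q_simplified = modularity_doubly_stochastic(S, X)
```

Both functions return the same value on S, since all degrees equal 1 and 2m = n.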

Balanced Modularity
Principle. This criterion is proposed in [13] to complete the Newman-Girvan modularity. Recall from section 5.1 that, given a simple graph $G = (V, E)$ and a community structure, the Newman-Girvan modularity compares the ratio of edges within communities (i.e. intra-community edges) with the expected ratio of intra-community edges within a random graph with the same degree sequence as $G$. Then, the idea behind the Balanced modularity is to also take into account the ratio of inter-community edges. In other words, the Newman-Girvan modularity considers that a good community structure on $G$ should have a ratio of intra-community edges "higher than by chance", whereas the Balanced modularity considers that a good community structure should have a ratio of inter-community edges lower than by chance as well.
To take into account the ratio of inter-community edges, the Balanced modularity focuses on the complementaries of the graph and the community structure. We can state its concept as follows. Let us denote by $\Phi : \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \to \mathbb{R}$ the function such that
$$\Phi(A, B) = \sum_{i,j} \left( a_{i,j} - \frac{d_i d_j}{\sum_k d_k} \right) b_{i,j}, \quad \text{with } d_i = \sum_k a_{i,k},$$
which is equivalent to the Newman-Girvan modularity from eq. (8) when $A$ is an adjacency matrix and $B \in Eq(n)$. Thus, given $A$ the adjacency matrix of a simple graph and $X$ a community structure, the Balanced modularity is defined as
$$Q_B(A, X) = \Phi(A, X) + \Phi(\bar{A}, \bar{X}). \tag{10}$$
An explicit formula can be derived from eq. (10) by expressing the degrees and number of edges in the complementary of a simple graph through those from the graph. It can indeed be observed from fig. 5 that $\bar{d}_i = n - d_i$ and $\sum_k \bar{d}_k = n^2 - \sum_k d_k$. Hence, we can write
$$Q_B(A, X) = \sum_{i,j} \left( a_{i,j} - \frac{d_i d_j}{\sum_k d_k} \right) x_{i,j} + \sum_{i,j} \left( \bar{a}_{i,j} - \frac{(n - d_i)(n - d_j)}{n^2 - \sum_k d_k} \right) \bar{x}_{i,j} \tag{11}$$
which is the formula of the Balanced modularity provided in [13].

Generalisation. The Balanced modularity is built on the complementary of the graph, which is defined for simple graphs only. However, a generalisation of this criterion to weighted graphs is proposed in [25]. It consists in stating that $\bar{a}_{i,j} = \max_{k,l}(a_{k,l}) - a_{i,j} = a_{max} - a_{i,j}$ in eq. (11). But this generalisation does not fit with the spirit of this criterion as stated in eq. (10), because it does not update $d_i$ and $\sum_k d_k$ according to the new definition of $\bar{A} = a_{max} J - A$ in the second sum of eq. (11). That is, it does not inject the weighted generalisation of $\bar{A}$ in eq. (10). Hence, we propose another generalisation. We consider that, for a weighted graph defined by its adjacency matrix $A$, the complementary of $A$ can be expressed as $\bar{A} = \alpha J - A$, with $\alpha$ a scalar (that may depend on $A$). Thus, the degrees of nodes in the complementary graph are $\bar{d}_i = \alpha n - d_i$, and $\sum_k \bar{d}_k = \alpha n^2 - \sum_k d_k$.
By injecting $\bar{A}$ in eq. (10), the Balanced modularity becomes
$$Q_B(A, X) = \sum_{i,j} \left( a_{i,j} - \frac{d_i d_j}{\sum_k d_k} \right) x_{i,j} + \sum_{i,j} \left( \alpha - a_{i,j} - \frac{(\alpha n - d_i)(\alpha n - d_j)}{\alpha n^2 - \sum_k d_k} \right) \bar{x}_{i,j}. \tag{12}$$
It remains to discuss the value of $\alpha$. First, we remark that, for $A$ the adjacency matrix of any simple graph, the graph associated with $A + \bar{A}$ is the complete graph with self-loops: it is not possible to add any edge in this graph, that is, all edges are saturated. In the general case, given $A \in \mathbb{R}_+^{n \times n}$ the adjacency matrix of some positively weighted graph, without any other knowledge on the graph, we can assume that an edge is saturated if its value is $a_{max}$, with $a_{max}$ as defined above. In this case, we can state $\bar{A} = a_{max} J - A$ as in [25]. This generalised Balanced modularity is obtained by setting $\alpha = a_{max}$ in eq. (12), which is slightly different from replacing $\bar{a}_{i,j}$ by $a_{max} - a_{i,j}$ in eq. (11), as proposed in [25]. For doubly stochastic graphs, there is an upper bound on the weight of an edge, that is 1. Indeed, as a doubly stochastic graph is a 1-regular, positively weighted graph, no edge can have a weight above 1. Hence, 1 is the value that saturates an edge, and we can state $\alpha = 1$ in eq. (12) if the matrix is doubly stochastic.
Reduced Form. We derive the reduced form for the formula given in eq. (12), as this formula can be used for weighted and simple graphs as well (by setting $\alpha = 1$, it becomes equal to eq. (11) when $A$ represents a simple graph). Recalling that $\bar{X} = J - X$ (or equivalently, $\forall i, j, \bar{x}_{i,j} = 1 - x_{i,j}$), maximising eq. (12) is equivalent to maximising
$$\sum_{i,j} \left[ \left( a_{i,j} + \frac{(\alpha n - d_i)(\alpha n - d_j)}{\alpha n^2 - \sum_k d_k} \right) x_{i,j} + \left( \alpha - a_{i,j} + \frac{d_i d_j}{\sum_k d_k} \right) \bar{x}_{i,j} \right] \tag{13}$$
and the positive and negative agreements for the Balanced modularity in the general case can be stated as respectively $\varphi(a_{i,j}) = a_{i,j} + \frac{(\alpha n - d_i)(\alpha n - d_j)}{\alpha n^2 - \sum_k d_k}$ and $\bar{\varphi}(a_{i,j}) = \alpha - a_{i,j} + \frac{d_i d_j}{\sum_k d_k}$. However, for a doubly stochastic matrix $S$, the formula of eq. (13) can be greatly simplified. With $\alpha = 1$, by remarking that $\forall i, d_i = 1$ and $\bar{d}_i = n - 1$, eq. (13) is equal, up to an additive constant, to $2 \sum_{i,j} \left( s_{i,j} - \frac{1}{n} \right) x_{i,j}$, which allows us to simplify the positive and negative agreements as $\varphi(s_{i,j}) = s_{i,j}$ and $\bar{\varphi}(s_{i,j}) = \frac{1}{n}$, the latter not depending on $i, j$.
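The simplification on doubly stochastic matrices can be checked numerically. The sketch below is written under an assumption about the form of the criterion: the Balanced modularity is taken as Φ(A, X) + Φ(Ā, X̄), with Φ(A, B) the sum over all pairs of (a_ij − d_i d_j / Σ_k d_k) b_ij and Ā = αJ − A. The function names and the test matrix are ours.

```python
def phi(a, b):
    """Sum over (i, j) of (a_ij - d_i d_j / sum_k d_k) * b_ij, d_i being row sums."""
    n = len(a)
    d = [sum(row) for row in a]
    total = sum(d)
    return sum((a[i][j] - d[i] * d[j] / total) * b[i][j]
               for i in range(n) for j in range(n))


def balanced_modularity(a, x, alpha=1.0):
    """Assumed form: phi(A, X) + phi(alpha * J - A, J - X)."""
    n = len(a)
    a_bar = [[alpha - a[i][j] for j in range(n)] for i in range(n)]
    x_bar = [[1 - x[i][j] for j in range(n)] for i in range(n)]
    return phi(a, x) + phi(a_bar, x_bar)


def simplified(s, x):
    """2 * sum_ij (s_ij - 1/n) x_ij, the doubly stochastic simplification."""
    n = len(s)
    return 2.0 * sum((s[i][j] - 1.0 / n) * x[i][j]
                     for i in range(n) for j in range(n))


# A symmetric doubly stochastic matrix.
S = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.3, 0.2, 0.0],
     [0.0, 0.2, 0.3, 0.5],
     [0.0, 0.0, 0.5, 0.5]]

# Two community structures: {0,1},{2,3} and {0},{1,2},{3}.
X_pairs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
X_other = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
```

Under these assumptions the two terms of the Balanced modularity coincide on a doubly stochastic matrix, so the criterion ranks community structures exactly as the simplified Newman-Girvan form does.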

Deviation to Uniformity
Principle. This criterion, proposed in [34, Chap. 2.5.6], is based on a principle very similar to that of Newman-Girvan. The conceptual difference between these two criteria is that, given a graph and a community structure, the Deviation to Uniformity criterion compares the ratio of intra-community edges within the graph with the expected ratio of intra-community edges within $\delta$-regular random graphs, where $\delta$ is the average degree in the initial graph, whereas the random model in Newman-Girvan modularity has the same degree sequence as the initial graph. Such a random model corresponds to graphs where edges are uniformly distributed among nodes. Thus the probability that there is an edge between two nodes $i$ and $j$ is equal to $\frac{\sum_k d_k}{n^2}$, where the $d_k$ are the degrees of the nodes in the initial graph. Hence, given $A \in \mathbb{R}_+^{n \times n}$ the adjacency matrix of some positively weighted graph, and $X \in Eq(n)$ a community structure, the Deviation to Uniformity can be written as

$$F_{DU}(A, X) = \sum_{i,j} \left( a_{i,j} - \frac{\sum_k d_k}{n^2} \right) x_{i,j}. \qquad (14)$$

This criterion is already defined for weighted graphs such as those that fall within the scope of this study, so we do not discuss its generalisation.
Reduced Form. The reduced form is directly derived from eq. (14) by stating the positive and negative agreements as respectively $\phi(a_{i,j}) = a_{i,j}$ and $\bar{\phi}(a_{i,j}) = \frac{\sum_k d_k}{n^2}$. For a doubly stochastic matrix $S$, we have $\sum_k d_k = n$ and the criterion from eq. (14) can be simplified as

$$F^{\omega}_{DU}(S, X) = \sum_{i,j} \left( s_{i,j} - \frac{1}{n} \right) x_{i,j}.$$

The negative agreement thus becomes $\bar{\phi}(s_{i,j}) = \frac{1}{n}$.
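As an illustration of this reduced form, here is a minimal sketch assuming NumPy and assuming that the criterion sums $(\phi - \bar{\phi})\,x_{i,j}$ over all ordered pairs $(i, j)$, diagonal included. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def partition_matrix(labels):
    """Equivalence-relation matrix X: x_ij = 1 iff i and j share a label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def deviation_to_uniformity(A, X):
    """Sum over (i, j) of (a_ij - sum_k d_k / n^2) * x_ij (assumed reduced form)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    null_term = A.sum() / n**2  # sum of degrees divided by n^2
    return float(((A - null_term) * X).sum())

# Doubly stochastic toy matrix with two obvious blocks of two nodes each.
S = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5, 0.5]])

X_blocks = partition_matrix([0, 0, 1, 1])  # the two blocks
X_single = partition_matrix([0, 0, 0, 0])  # everything in one community

score_blocks = deviation_to_uniformity(S, X_blocks)
score_single = deviation_to_uniformity(S, X_single)
```

On this example the null term equals $1/n = 0.25$, so the two-block structure scores $8 \times 0.25 = 2$, while the single community scores 0.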

Deviation to Indetermination
Principle. This criterion, introduced in [13], is based on the principle of indetermination between two categorical variables, as explained below. Consider a set $S$ of $M$ objects, and $P$, $Q$ two categorical variables on $S$. A categorical variable indicates the category taken by each object of the set. For instance, the objects can be human beings, and the categories mother tongues or first names, as long as we can consider that each human being has only one mother tongue and only one first name. Formally, we state $P : S \to \{p_1, \dots, p_\pi\}$ and $Q : S \to \{q_1, \dots, q_\sigma\}$, where $\{p_1, \dots, p_\pi\}$ are the categories of variable $P$ (e.g., languages if $P(u)$ is the mother tongue of individual $u$), and $\{q_1, \dots, q_\sigma\}$ the categories of variable $Q$. We remark that, as a unique category is attributed to each object by a variable, $P$ and $Q$ also represent equivalence relations: two individuals named Morgan are in relation according to the variable $Q$ that represents first names. We remark that $P$ and $Q$ can be represented by two matrices $\mathbf{P} \in \{0, 1\}^{M \times \pi}$ and $\mathbf{Q} \in \{0, 1\}^{M \times \sigma}$, such that $\mathbf{P}_{u,i} = 1$ if $P(u) = p_i$ and $0$ otherwise (and similarly for $\mathbf{Q}$), which allows us to write the equivalence relations defined by the variables $P$ and $Q$ as $C^{(p)} = \mathbf{P}\mathbf{P}^T \in Eq(M)$ and $C^{(q)} = \mathbf{Q}\mathbf{Q}^T \in Eq(M)$ respectively. We can also derive their contingency table $N = \mathbf{P}^T\mathbf{Q}$, with $n_{i,j} = |\{u \in S : P(u) = p_i \text{ and } Q(u) = q_j\}|$ the number of objects with both category $p_i$ from $P$ and category $q_j$ from $Q$.
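The matrix notations above can be made concrete with a small example. This is a minimal sketch, assuming NumPy; the helper `indicator_matrix` and the toy data (five individuals with a mother tongue and a first name) are hypothetical.

```python
import numpy as np

def indicator_matrix(values):
    """Build the 0/1 object-by-category matrix of a categorical variable.

    Categories are ordered alphabetically; returns (matrix, categories).
    """
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    M = np.zeros((len(values), len(categories)), dtype=int)
    for u, v in enumerate(values):
        M[u, index[v]] = 1
    return M, categories

# Five individuals described by two categorical variables.
tongue = ["fr", "en", "fr", "en", "fr"]                  # variable P
name = ["Morgan", "Morgan", "Alex", "Alex", "Morgan"]    # variable Q

P, p_cats = indicator_matrix(tongue)   # columns: ['en', 'fr']
Q, q_cats = indicator_matrix(name)     # columns: ['Alex', 'Morgan']

C_p = P @ P.T   # equivalence relation "same mother tongue"
C_q = Q @ Q.T   # equivalence relation "same first name"
N = P.T @ Q     # contingency table, rows = tongues, columns = names
```

Here `N` counts, for instance, the two French speakers named Morgan in its entry for the pair (`fr`, `Morgan`).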
Given these matrix notations, we explain below the indetermination between categorical variables. Considering two categorical variables as two equivalence relations, an interesting problem is to measure their association [40]. This is done by comparing the agreements and disagreements between the two variables; these notions are illustrated in table 2. Indetermination is a special case of association. Strictly speaking, one says that two variables are indetermined if their number of agreements is equal to their number of disagreements. The notion of indetermination can be generalised to allow one to weight positive and negative cases differently. Indeed, it might be worthwhile to give more weight to objects that are related than to those that are not [41]. Recalling that $\pi$ (respectively $\sigma$) is the number of categories for variable $P$ (respectively $Q$), an interesting generalisation of indetermination is to weight positive cases with $\pi - 1$ and negative cases with 1 in $P$, and respectively with $\sigma - 1$ and 1 in $Q$. This provides the indetermination equality of eq. (16). This choice of weights is special because two categorical variables that verify eq. (16) also verify other properties; e.g., they make the so-called Janson-Vegelius criterion, one of the most famous association criteria, vanish. Besides, eq. (16) is strongly related to another special case of association, called geometrical independence; see [41] for comparisons and discussions about the different notions of independence and indetermination. From here on, we use the term indetermination to speak about the generalised indetermination weighted as in eq. (16).
It is shown in [41] that eq. (16) can be rewritten using the contingency table. Thus, for any contingency table $N \in \mathbb{N}^{p \times q}$, the deviation to indetermination can be measured, and the Deviation to Indetermination criterion is based on eq. (17). It can be understood as follows. Let $N \in \mathbb{N}^{p \times p}$ be a contingency table built on two variables with the same categories, $\mathbf{P}, \mathbf{Q} \in \{0, 1\}^{M \times p}$. Then, the equivalence relation sought by the criterion groups together categories such that $P$ and $Q$ are highly determined (or far from the indetermination) when restricted to categories from a same group.
The parallel with community structures is drawn by remarking that a multi-graph can be seen as the contingency table of two categorical variables, whose categories are nodes and which are defined on a set $S$ consisting of the end nodes of edges. An example is provided in fig. 6. In this figure, the edges of a multi-graph are named $e_1, \dots, e_5$, and we define two categorical variables on their end nodes. Namely, each edge can be written $e = (u, v)$, where $u$ and $v$ are the end nodes of $e$, with $u$ the source node ($e(s)$) and $v$ the target node ($e(t)$). As the direction of an edge is immaterial in an undirected graph, the two categorical variables are created by swapping the roles of the end nodes; e.g., in fig. 6, $P$ sees $e_1 = (a, b)$, whereas $Q$ states $e_1 = (b, a)$. Thus, considering $A \in \mathbb{N}^{n \times n}$ the adjacency matrix of some multi-graph as the contingency table of two such variables, one can look for the community structure that groups together the nodes such that these two categorical variables are highly determined when restricted to these nodes. Roughly, given such a group of nodes, it means that most of the edges have either both or none of their end nodes in this group. By adapting eq. (18) to the specific case of multi-graphs, we remark that finding such a community structure is equivalent to finding $X \in Eq(n)$ that maximises the criterion of eq. (19).

Generalisation. This criterion is naturally defined on multi-graphs, since we can write them as contingency tables. Using the same trick as for Newman-Girvan modularity, this criterion can be directly applied to undirected positively weighted graphs.
Reduced Form. The Deviation to Indetermination criterion from eq. (19) can be rewritten as in eq. (6) by an appropriate choice of the positive and negative agreements. For a doubly stochastic matrix $S$, simplifications can be made. Indeed, since $\forall i$, $d_i = 1$ and $\sum_k d_k = n$, eq. (19) can be simplified as

$$F^{\omega}_{DI}(S, X) = \sum_{i,j} \left( s_{i,j} - \frac{1}{n} \right) x_{i,j}.$$

The positive and negative agreements become respectively $\phi(s_{i,j}) = s_{i,j}$ and $\bar{\phi}(s_{i,j}) = 1/n$, the latter not depending on $i, j$.

Zahn Criterion
Principle. Strictly speaking, the Zahn criterion does not assess the consistency of a community structure on a given network. However, it can be straightforwardly extended to such a purpose. The Zahn criterion is designed to compare two relations over a set of objects [35]. More precisely, given a finite set $V$ and a symmetric relation $R$ over this set (that is, $\forall (u, v) \in V \times V$, $R$ verifies $uRv \iff vRu$), Zahn wants to find the equivalence relation $\mathcal{X}$ which is the closest to $R$. To this aim, Zahn designs a distance between two relations by considering both relations as subsets of the Cartesian product $V \times V$, and counting the number of pairs that belong to only one subset. An example is provided in the top panel of fig. 7, where the symmetric relation $R$ and the equivalence relation $\mathcal{X}$ are defined on a set $V = \{a, b, c, d, e\}$. They are represented as subsets of $V \times V$ by grids, where a coloured cell means that the two corresponding objects are related. For instance, by looking at the row of object $a$ in the grids, we see that $aRb$ and $aRc$ for $R$, and $a\mathcal{X}a$, $a\mathcal{X}b$, and $a\mathcal{X}c$ for $\mathcal{X}$. This can be rewritten $(a, b), (a, c) \in R$ and $(a, a), (a, b), (a, c) \in \mathcal{X}$. In both grids, the dark coloured cells correspond to pairs of objects that belong to both relations and the light ones are pairs that lie in only one subset. With this mapping between the relations defined on a set $V$ and the subsets of $V \times V$, the distance defined by Zahn is

$$d_Z(R, \mathcal{X}) = |R \cap \bar{\mathcal{X}}| + |\bar{R} \cap \mathcal{X}|,$$

with $R$ a symmetric relation, and $\mathcal{X}$ an equivalence relation.
In [34], it is proposed to use this criterion to assess community structures on simple graphs, by remarking that a simple graph can be characterised by a symmetric relation over the set of its nodes, and that a community structure on this graph is an equivalence relation over the graph nodes as well. The bottom panel of fig. 7 illustrates the relations $R$ and $\mathcal{X}$ as respectively a graph and a community structure. Zahn distance is also rewritten in [34] to get a matrix-oriented formulation of this criterion. Denote by $A$ the adjacency matrix of the simple graph associated with the symmetric relation $R$, and by $X$ the matrix representation of the equivalence relation $\mathcal{X}$. Hence, Zahn distance can be rewritten as

$$d_Z(A, X) = \sum_{i,j} \left( a_{i,j}\bar{x}_{i,j} + x_{i,j}\bar{a}_{i,j} \right). \qquad (22)$$
When used for community detection, the Zahn criterion is often stated as equivalent to the so-called Condorcet criterion [34]. However, in [39, Chap. 1.1.3], we show that the Condorcet criterion cannot be extended to the problem of finding the best community structure given a graph.
Finally, we observe that the criterion of eq. (22) defines a distance in the formal mathematical sense (i.e., it is positive, symmetric, separable and verifies the triangle inequality). This no longer necessarily holds for its generalisations.

Generalisation. Zahn distance, originally designed for comparing relations over a finite set, is straightforwardly extended to simple graphs and community structures. On the other hand, its generalisation to weighted graphs is not as straightforward, since there is no trivial matching between a weighted graph and a symmetric relation. However, a generalisation of Zahn criterion to weighted graphs is proposed in [25]. It is directly derived from eq. (22) by defining the complement of the real-valued matrix $A$ as $\bar{A} = a_{\max} J - A$, with $a_{\max} = \max_{i,j}(a_{i,j})$. This leads to the criterion

$$d^{\omega}_{Z,1}(A, X) = \sum_{i,j} \left( a_{i,j}\bar{x}_{i,j} + (a_{\max} - a_{i,j})\, x_{i,j} \right). \qquad (23)$$

Nevertheless, we propose other generalisations, as this one does not always seem suitable. Indeed, the purpose of generalising criteria to weighted graphs is to enable them to assess community structures on preprocessed doubly stochastic graphs. Assume that a graph before preprocessing is simple, and thus associated with a symmetric relation $R$ over its set of nodes. Call $S$ the adjacency matrix of the preprocessed graph, and $X$ some community structure. Then, with Zahn criterion as defined in eq. (23):
• There is an imbalance between the impact on the criterion of pairs in $\bar{R} \cap X$ and in $\bar{X} \cap R$. Any pair in $\bar{R} \cap X$ results in a penalisation of the criterion equal to $s_{\max}$, whereas a pair in $R \cap \bar{X}$ results in a penalisation equal to $s_{i,j} \leq s_{\max}$. Hence, each pair in $\bar{R} \cap X$ penalises the criterion as much as the highest penalisation that can be reached by a pair in $R \cap \bar{X}$.
• Except for pairs $(i, j)$ such that $s_{i,j} = s_{\max}$, every pair that lies in $X \cap R$ penalises the criterion.
We believe that these points are undesirable aspects of the previous generalisation of Zahn criterion. For this reason, we propose other generalisations. Like the authors of [25], given a positively weighted matrix $A$, we choose generalisations that simply redefine the complement of $A$ in eq. (22). That is, we define the generalised criterion as

$$d^{\omega}_{Z}(A, X) = \sum_{i,j} \left( a_{i,j}\bar{x}_{i,j} + (\alpha - a_{i,j})\, x_{i,j} \right), \qquad (24)$$

with $\alpha$ some constant to set up. We find that choosing $\alpha$ as the mean element of the matrix $A$, that is, $\alpha = a_{mean}$, is a good trade-off to mitigate the two drawbacks listed above:
• Each element in $\bar{R} \cap X$ penalises the criterion with the mean value of the adjacency matrix.
• An element in $X \cap R$ penalises the criterion only if its value is lower than the mean value of the adjacency matrix. Otherwise it even favours the criterion.
We consider two possible definitions of $\alpha = a_{mean}$, namely

$$a_{mean} = \frac{1}{n^2} \sum_{i,j} a_{i,j}, \qquad (25)$$

and

$$a_{mean} = \frac{1}{nnz} \sum_{i,j} a_{i,j}, \qquad (26)$$

with $nnz$ the number of nonzero entries of $A$. We denote by $d^{\omega}_{Z,2}$ the criterion from eq. (24) obtained with $\alpha = a_{mean}$ from eq. (25), and by $d^{\omega}_{Z,3}$ the one obtained using $\alpha = a_{mean}$ from eq. (26). Different behaviours of $d^{\omega}_{Z,1}$, $d^{\omega}_{Z,2}$ and $d^{\omega}_{Z,3}$ are illustrated on a toy example in fig. 8. This figure shows a weighted graph with two disjoint components, where each component is a clique with its own unique edge value. Two community structures are proposed, in red and in blue, which we denote respectively $X_r$ and $X_b$. The criterion from eq. (23) considers that $X_b$ is a community structure that better approximates the ground truth structure of the graph than $X_r$. On the other hand, the criterion $d^{\omega}_{Z,2}$ states that $X_r$ is better than $X_b$, which may be a more desirable situation. Finally, $d^{\omega}_{Z,3}$ considers the two structures as equivalent.
Remark 2. With both generalisations from eq. (24) using $\alpha = a_{mean}$, negative values are possible, and the symmetry is not preserved ($d_Z(A, X) \neq d_Z(X, A)$), whereas eq. (23) ensures the positivity of the results and preserves the symmetry, since $\forall X \in Eq(n)$, $x_{\max} = 1$.
Reduced Form. We now aim to find formulations of eqs. (22), (23) and (24) that fit with the reduced form from eq. (6). We first remark that, given $A$ the adjacency matrix of some graph, the Zahn criterion $d_Z(A, \cdot)$ is an objective function that one aims to minimise, whereas in eq. (6), the criterion must be a function to maximise. Hence, we express the opposite of $d_Z$ and remark that minimising the function given in eq. (22) is equivalent to maximising

$$\sum_{i,j} \left( a_{i,j} - \frac{1}{2} \right) x_{i,j}.$$

For unweighted graphs, the positive and negative agreements of the Zahn criterion are thus respectively $\phi(a_{i,j}) = a_{i,j}$ and $\bar{\phi}(a_{i,j}) = 1/2$. For weighted graphs, the opposite of $d^{\omega}_{Z}$ from eq. (24) produces the following reduced form for Zahn criterion:

$$\sum_{i,j} \left( a_{i,j} - \frac{\alpha}{2} \right) x_{i,j},$$

with $\alpha = \max_{i,j}(a_{i,j})$ for the generalisation of eq. (23) and $\alpha = a_{mean}$ for the generalisation proposed here. Moreover, this second generalisation on a doubly stochastic matrix $S$ implies that $\alpha = 1/n$ using eq. (25), and $\alpha = n/nnz$ using eq. (26). Both negative agreements, $\bar{\phi}(s_{i,j}) = 1/2n$ and $\bar{\phi}(s_{i,j}) = n/2nnz$, do not depend on $i, j$.
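The link between the Zahn distance and its reduced form can be checked numerically. The sketch below assumes NumPy and assumes that the sums run over all ordered pairs $(i, j)$, diagonal included, with complements $\bar{a}_{i,j} = \alpha - a_{i,j}$ and $\bar{x}_{i,j} = 1 - x_{i,j}$; the function names are hypothetical. Under these assumptions, $d_Z = \sum_{i,j} a_{i,j} - 2 \sum_{i,j} (a_{i,j} - \alpha/2)\, x_{i,j}$, so minimising the distance amounts to maximising the reduced form.

```python
import numpy as np

def zahn_distance(A, X, alpha=1.0):
    """Sum of a_ij * (1 - x_ij) + (alpha - a_ij) * x_ij over all (i, j)."""
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    return float((A * (1.0 - X) + (alpha - A) * X).sum())

def zahn_reduced(A, X, alpha=1.0):
    """Reduced form to maximise: sum of (a_ij - alpha / 2) * x_ij."""
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    return float(((A - alpha / 2.0) * X).sum())

# Path graph on three nodes (0 - 1 - 2) and the structure {0, 1}, {2}.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

d_simple = zahn_distance(A, X)         # alpha = 1: simple-graph case
r_simple = zahn_reduced(A, X)

alpha_mean = A.mean()                  # mean over all n^2 entries
d_mean = zahn_distance(A, X, alpha_mean)
r_mean = zahn_reduced(A, X, alpha_mean)
```

Since $x_{i,i} = 1$ for every equivalence relation, the diagonal terms only add a constant and do not change which structure is optimal.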

Correlation Clustering Criterion
Principle. Correlation Clustering was first introduced by Bansal et al. in [36]. Their problem can be stated as follows. Given a set of objects such that, for each pair of objects, one knows whether the objects are similar or dissimilar, the aim is to find a clustering that "maximises agreements", or equivalently "minimises disagreements". They model the set of objects as a complete graph such that each pair of nodes (objects), or equivalently each edge, has a label "+" if the objects are similar, and a label "-" if the objects are dissimilar (see fig. 9), and give a formal definition of maximising agreements and minimising disagreements.
• Maximising agreements means finding a clustering with both as many edges labelled "+" having end nodes in a same cluster as possible, and as many edges labelled "-" with end nodes in different clusters as possible.
• Minimising disagreements means finding a clustering with both as few edges labelled "+" with end nodes in different clusters as possible, and as few edges labelled "-" having end nodes in a same cluster as possible.
One of the authors' rationales for formalising a clustering problem as a Correlation Clustering problem is that, contrary to other clustering methods existing at the time, the Correlation Clustering problem can be solved without specifying the number of clusters in advance. For graphs in which one function indicates edge weights and another indicates edge labels, a generalised formulation of the "minimising disagreements" problem is given in eq. (29). In [34], it is proposed to separate positive and negative labels in the weight indicator $\Omega$, which can be expressed by creating two functions $\Omega^+$ and $\Omega^-$. This allows one to simplify eq. (29), and we remark that minimising $X \mapsto g_{CC}(G, \cdot)$ is equivalent to minimising the function $d_{CC}$ defined in eq. (31).

Generalisation. The case of graphs with positive and negative edges is beyond the scope of this study. However, in positively weighted networks, it is natural to assume that an edge indicates that its two end nodes are similar. In turn, one can assume that dissimilarities are indicated by an absence of edge. We use this idea to generalise the Correlation Clustering criterion to positively weighted graphs. For this purpose, we define the pattern of a matrix as the function $P : \mathbb{R}^{n \times n} \to \{0, 1\}^{n \times n}$, $A \mapsto P_A$, such that $p^A_{i,j} = 1$ if $a_{i,j} \neq 0$ and $p^A_{i,j} = 0$ otherwise. Given $G = (V, E, \Omega)$ some positively weighted graph and $A \in \mathbb{R}^{n \times n}$ its adjacency matrix, the absence of edge in $G$ is characterised by $J - P_A$. Thus, denoting by $\lambda > 0$ the penalisation for clustering together nodes that are dissimilar, the Correlation Clustering from eq. (31) becomes

$$d^{\lambda}_{CC}(A, X) = \sum_{i,j} \left( a_{i,j}\bar{x}_{i,j} + \lambda\,(1 - p^A_{i,j})\, x_{i,j} \right), \qquad (32)$$

where $A$ is the adjacency matrix of some positively weighted graph, and $X$ is a community structure on this graph. The proposed generalised Correlation Clustering hence depends on some parameter $\lambda > 0$ to set up.
Remark 3. When focusing on simple networks, this generalised Correlation Clustering criterion is close to the LambdaCC function proposed in [43].
Remark 4. Another way to generalise the Correlation Clustering criterion may be to consider that a positively weighted graph is actually a complete graph, where an edge whose weight is equal to 0 is the strongest case of dissimilarity.
In this case, one can shift the weights so that the graph has positive and negative values. Given $A$ the adjacency matrix, the most straightforward way to do so is to consider that an edge is a dissimilarity if it is below the mean value of $A$. The criterion from eq. (31) then becomes equivalent to the formula of the Deviation to Uniformity criterion developed in section 5.3.
Reduced Form. We aim to reduce the formula from eq. (32) to make it fit with eq. (6). As the Correlation Clustering criterion defined in eq. (32) is a criterion to minimise to obtain the best community structure, we look at its opposite. Minimising $d^{\lambda}_{CC}$ is equivalent to maximising

$$F^{\lambda}_{CC}(A, X) = \sum_{i,j} \left( a_{i,j} - \lambda\,(1 - p^A_{i,j}) \right) x_{i,j}.$$

The positive and negative agreements for this generalised criterion are respectively $\phi(a_{i,j}) = a_{i,j}$ and $\bar{\phi}(a_{i,j}) = \lambda \times (1 - p^A_{i,j})$.
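The effect of $\lambda$ in this reduced form can be seen on a very small graph. The following is a minimal sketch, assuming NumPy and assuming that the sum runs over all ordered pairs $(i, j)$, diagonal included; the function name and the toy graph are hypothetical.

```python
import numpy as np

def correlation_clustering_reduced(A, X, lam):
    """Sum over (i, j) of (a_ij - lam * (1 - p_ij)) * x_ij.

    p_ij is the pattern of A: 1 where a_ij is nonzero, 0 elsewhere.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    pattern = (A != 0).astype(float)
    return float(((A - lam * (1.0 - pattern)) * X).sum())

# Path graph on three nodes (0 - 1 - 2): nodes 0 and 2 are not adjacent.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

X_all = np.ones((3, 3))                       # one single community
X_split = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]], dtype=float)  # communities {0, 1} and {2}

# Small penalty: merging the non-adjacent pair (0, 2) is tolerated.
low_all = correlation_clustering_reduced(A, X_all, lam=0.5)
low_split = correlation_clustering_reduced(A, X_split, lam=0.5)

# Large penalty: grouping non-adjacent nodes becomes too costly.
high_all = correlation_clustering_reduced(A, X_all, lam=2.0)
high_split = correlation_clustering_reduced(A, X_split, lam=2.0)
```

With $\lambda = 0.5$ the single community scores higher, whereas with $\lambda = 2$ the split structure wins, which illustrates how larger values of $\lambda$ push communities towards cliques.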

Comparison of the Criteria
In this section, we compare the criteria from section 5. In table 3, we recall the reduced formulations of these criteria when applied on simple or doubly stochastic graphs.

Homogenisation on Doubly Stochastic Graphs
The first key result directly observed from table 3 is that, when applied to doubly stochastic graphs, many criteria become equivalent, as stated in theorem 2.
Theorem 2. Given $S \in \mathbb{R}^{n \times n}$ the adjacency matrix of some doubly stochastic graph, and $X \in Eq(n)$ a community structure, then

$$F^{\omega}_{NG}(S, X) = F^{\omega}_{BM}(S, X) = F^{\omega}_{DU}(S, X) = F^{\omega}_{DI}(S, X).$$

Theorem 2 extends theorem 6.1 from [34], which states that these criteria are equivalent in the case of $k$-regular simple graphs. Furthermore, the doubly stochastic Zahn modularities, while not strictly equivalent to these four criteria, have very similar formulations. Actually, one can draw a parallel between Zahn formulations and the parametrised Newman-Girvan modularity, used to mitigate the so-called resolution limit of the Newman-Girvan modularity, that is, its inability to highlight small communities [12]. This function is defined in [44] as

$$F^{\gamma}_{NG}(A, X) = \sum_{i,j} \left( a_{i,j} - \gamma\, \frac{d_i d_j}{\sum_k d_k} \right) x_{i,j},$$

with $A$ the adjacency matrix of some simple graph, $X \in Eq(n)$, and $\gamma > 0$ the parameter. In definition 5, we also define a parametrised criterion for the doubly stochastic version of Newman-Girvan modularity.

Table 3: Reduced formulations of the different criteria when applied on simple (respectively doubly stochastic) graphs.
Definition 5. Given $S \in \mathbb{R}^{n \times n}$ the adjacency matrix of some doubly stochastic graph, $X \in Eq(n)$, and $\gamma > 0$ a scalar, the parametrised doubly stochastic Newman-Girvan modularity is defined as

$$F^{\omega,\gamma}_{NG}(S, X) = \sum_{i,j} \left( s_{i,j} - \frac{\gamma}{n} \right) x_{i,j}.$$

These parametrised versions of the Newman-Girvan modularity are added to the list of criteria, as the last row of table 3, for both simple and doubly stochastic graphs. The doubly stochastic versions of the Zahn criterion can be expressed using definition 5, as stated in property 1.
Property 1. Given $S \in \mathbb{R}^{n \times n}$ the adjacency matrix of some doubly stochastic graph and $X \in Eq(n)$, the doubly stochastic Zahn modularities can be expressed as parametrised doubly stochastic Newman-Girvan modularities, for suitable values of the $\gamma$ parameter.
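One way to see where such $\gamma$ values come from is sketched below. The derivation assumes that the parametrised doubly stochastic Newman-Girvan modularity has the form $\sum_{i,j} (s_{i,j} - \gamma/n)\, x_{i,j}$, i.e. the parametrised modularity with $d_i = 1$ and $\sum_k d_k = n$, and that the reduced Zahn criterion is $\sum_{i,j} (s_{i,j} - \alpha/2)\, x_{i,j}$. Both forms are assumptions made for this illustration, and the resulting values should be checked against the original statement of the property.

```latex
% Matching the negative agreements alpha/2 (Zahn) and gamma/n (parametrised NG):
\begin{align*}
  \frac{\gamma}{n} = \frac{\alpha}{2}
  \quad &\Longleftrightarrow \quad \gamma = \frac{n\,\alpha}{2}, \\
  \alpha = \frac{1}{n} \ \text{(mean over all $n^2$ entries)}
  \quad &\Longrightarrow \quad \gamma = \frac{1}{2}, \\
  \alpha = \frac{n}{nnz} \ \text{(mean over nonzero entries)}
  \quad &\Longrightarrow \quad \gamma = \frac{n^2}{2\,nnz}, \\
  \alpha = s_{\max} \ \text{(generalisation of [25])}
  \quad &\Longrightarrow \quad \gamma = \frac{n\, s_{\max}}{2}.
\end{align*}
```

Under these assumptions, the mean-based Zahn criteria correspond to a value of $\gamma$ that is fixed ($1/2$) or driven by the sparsity of $S$ ($n^2 / 2nnz$).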
Thus, the Correlation Clustering criterion is the unique doubly stochastic criterion from table 3 that cannot be expressed as a parametrised doubly stochastic Newman-Girvan modularity.We now provide the main result of this study in result 1.
Result 1. Generalising the criteria to doubly stochastic graphs unifies those criteria. Namely, there are two families of parametrised criteria:
1. The Newman-Girvan-like ones
2. The Correlation Clustering-like ones
Each criterion is obtained from one of these parametrised criteria, using a specific parameter.

Figure 10: Two instances from the benchmark, built using the $p_{in}$ and $p_{out}$ from the first column (left) and the last column (right) of table 4.

Numerical Comparisons
In this section we compare the behaviours of the different criteria to uncover community structures, applied on modular simple graphs on the one hand, and on their doubly stochastic preprocessing on the other hand. To that purpose, we optimise those criteria using the optimisation framework proposed by the Louvain algorithm [25].

Benchmark. For these numerical experiments, we build a range of random modular networks, using eight Stochastic Block Models (SBMs). In brief, SBMs are random models for generating networks with some block structure, with prescribed probabilities of edges within and between the blocks. Models in which intra-block probabilities are higher than inter-block probabilities produce networks with community structures [45]. Each SBM is used to generate 10 graphs of 1600 nodes, with an average degree equal to 100 and 31 blocks: 16 blocks of 20 nodes, 8 blocks of 40 nodes, 4 blocks of 80 nodes, 2 blocks of 160 nodes and one block of 320 nodes. They all have one unique probability of intra-block edge and one unique probability of inter-block edge, denoted respectively $p_{in}$ and $p_{out}$. These SBMs differ in the values of parameters $p_{in}$ and $p_{out}$, which are chosen so that the community structures of the random graphs become less and less sharp. The sharpness of the community structure is assessed by the so-called network mixing parameter [4]. The nodal mixing parameter measures the strength of a node's community membership by computing the ratio between its links outside the community and its degree. The greater the mixing parameter for each node, the weaker the community structure. The network mixing parameter $\mu$ is the mean value of the nodal mixing parameters [11]. Two instances from the benchmark are illustrated in fig. 10. These are two modular networks generated by the SBMs with highest and lowest mixing parameters. Finally, 80 benchmark graphs are built using SBMs from the NetworkX library, and preprocessed using algo. 1. The pairs of intra- and inter-edge probabilities $p_{in}$ and $p_{out}$ used in the SBMs are shown in table 4, along with the corresponding theoretical mixing parameters, and the average mixing parameters observed in the simple graphs, and in the doubly stochastic scaling of these graphs, respectively. All numbers are multiplied by 10 to improve readability. We observe that the mixing parameters of the preprocessed graphs tend to be slightly below those of the simple graphs, and below the theoretical value as well.

Table 4 (recovered fragment, values multiplied by 10): $p_{in}$ = 7.32, 6.58, 5.83, 5.09, 4.34.
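The benchmark construction can be sketched with NetworkX as follows. This is a minimal illustration under stated assumptions, not the authors' script: the helper names are hypothetical, the probabilities used in the demo are placeholders rather than the values of table 4, and the demo graph is deliberately small. The only library call relied upon is `networkx.stochastic_block_model`, which stores the block index of each node in the `"block"` node attribute.

```python
import networkx as nx

def benchmark_block_sizes():
    """Block sizes described in the text: 31 blocks, 1600 nodes in total."""
    return [20] * 16 + [40] * 8 + [80] * 4 + [160] * 2 + [320]

def make_sbm(sizes, p_in, p_out, seed=None):
    """SBM with a single intra-block and a single inter-block probability."""
    k = len(sizes)
    probs = [[p_in if r == c else p_out for c in range(k)] for r in range(k)]
    return nx.stochastic_block_model(sizes, probs, seed=seed)

def network_mixing_parameter(G):
    """Mean over nodes of (links leaving the node's block) / (node degree)."""
    block = nx.get_node_attributes(G, "block")
    ratios = []
    for u in G:
        degree = G.degree(u)
        if degree == 0:
            continue  # isolated nodes have no defined mixing parameter
        outside = sum(1 for v in G[u] if block[v] != block[u])
        ratios.append(outside / degree)
    return sum(ratios) / len(ratios)

# Small demo graph; p_in and p_out are placeholder values for illustration.
demo = make_sbm([10, 10, 20], p_in=0.7, p_out=0.05, seed=0)
mu_demo = network_mixing_parameter(demo)

# A full-size benchmark graph would be obtained with, for instance:
# G = make_sbm(benchmark_block_sizes(), p_in=0.7, p_out=0.02, seed=0)
```

Lower values of `mu_demo` indicate a sharper community structure, in line with the definition of the network mixing parameter given above.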
Scores. To assess the quality of the community structures returned by Louvain, we compare them to the ground truth by adapting the definitions of Precision, Recall and F1-score to community detection. Namely, assume we have $X^* \in Eq(n)$ the ground truth, and $\hat{X} \in Eq(n)$ the community structure returned by Louvain. We define the number of true positives as the number of pairs of different elements that are put together by both community structures, that is $TP = \sum_{i<j} \hat{x}_{i,j} \times x^*_{i,j}$. The number of false positives is the number of pairs that are put together by $\hat{X}$ but not by $X^*$: $FP = \sum_{i<j} \hat{x}_{i,j} \times (1 - x^*_{i,j})$. And the number of false negatives is the number of pairs that are put together by $X^*$ but not by $\hat{X}$, namely $FN = \sum_{i<j} (1 - \hat{x}_{i,j}) \times x^*_{i,j}$. Now, we can derive Precision, Recall and F1-score of $\hat{X}$ as usual.
• Precision: $Prec(\hat{X}) = \frac{TP}{TP + FP}$
• Recall: $Rec(\hat{X}) = \frac{TP}{TP + FN}$
• F1-score: $F1(\hat{X}) = 2 \times \frac{Prec(\hat{X}) \times Rec(\hat{X})}{Prec(\hat{X}) + Rec(\hat{X})}$

Furthermore, since the Louvain algorithm is sensitive to node labelling, we apply it four times to each network in the benchmark, each time with a random node labelling. Thus, in the following figures, the points on the curves are the average scores of the 40 returned community structures (10 networks and 4 runs of Louvain). For each point, these 40 community structures are summarised by box plots that indicate the median (white circle with black point), the 25th and 75th percentiles (edges of the box), and the extreme values (end points of the vertical segments).
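The pair-counting scores defined above can be computed directly from two label vectors. The following is a minimal sketch in plain Python; the function name `pair_scores` is hypothetical, and the convention of returning 0 when a denominator vanishes is an assumption made for the illustration.

```python
from itertools import combinations

def pair_scores(labels_true, labels_pred):
    """Pair-counting Precision, Recall and F1-score of a community structure.

    A pair (i, j), i < j, is positive when both nodes share a community.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1          # together in both structures
        elif same_pred:
            fp += 1          # together in the prediction only
        elif same_true:
            fn += 1          # together in the ground truth only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Ground truth {0, 1}, {2, 3}; prediction {0, 1, 2}, {3}.
precision, recall, f1 = pair_scores([0, 0, 1, 1], [0, 0, 0, 1])
```

On this example there is one true positive, two false positives and one false negative. The double loop is quadratic in the number of nodes, which remains affordable for graphs of 1600 nodes.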
Finally, the number of communities returned by Louvain often helps to explain some observations made on the scores. Indeed, as the number of communities is not constrained, the Louvain algorithm may find either more or fewer communities than expected, with different impacts on Precision and Recall. Thus, we also compare the number of communities returned by Louvain with the expected number (31). Given $n_c$ the number of communities in $\hat{X}$, we compute a ratio between $n_c$ and the expected number of communities. This allows a fairer comparison between the criteria that over- or under-partition the graphs. These ratios are displayed in table 5. Structures with more (respectively fewer) communities than expected are highlighted by a "+" (respectively a "-") exponent. Also, it may happen that some of the 40 structures have more communities than expected, while others have fewer. Such cases are indicated by the exponent "*".
Parametrised Newman-Girvan Modularities. We first focus on the behaviours of the parametrised Newman-Girvan modularities, when varying the parameter $\gamma$. The F1-scores (y-axis) over $\gamma$ parameters (x-axis) of the Newman-Girvan modularities applied on simple and preprocessed graphs are provided in figs. 11 and 12 respectively. As stated in the legends, each curve corresponds to one mixing parameter value $\mu_{bin}$ from table 4. On these figures, we observe that both modularities are able to provide community structures close to the ground truth for some $\gamma \in [1.25, 2]$. In the right panels, one can also observe that, for very large values of $\gamma$, the F1-score tends to 0. This means that, whatever the sharpness of the ground truth community structure, there exists some $\gamma$ beyond which Louvain returns the structure with one community per node. We also observe that the fundamental difference between simple and doubly stochastic criteria is that the parametrised Newman-Girvan modularity is much more sensitive to variations of $\gamma$ when applied on simple graphs. Indeed, in fig. 11, the F1-score curves are quite sharp. For each $\mu$, there is a peak at the $\gamma$ value that maximises the F1-score. Moreover, this peak is not located at the same $\gamma$ across the values of $\mu$ (1.625 for $\mu = 0.769$, 1.75 for $\mu = 0.699$ and 1.875 for the other values of $\mu$). On the other hand, in fig. 12, the F1-score curves are much smoother and the maxima lie along a plateau, whose length depends on the mixing parameter $\mu$. Thus, one is much more likely to pick a $\gamma$ that provides a sound community structure for preprocessed graphs than for simple ones.
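A sweep over $\gamma$ of this kind can be sketched with the Louvain implementation shipped with NetworkX, whose `resolution` argument plays the role of $\gamma$ for the standard Newman-Girvan modularity. This is only an illustrative sketch: it assumes a NetworkX version that provides `louvain_communities`, it is not the implementation used for the experiments reported here, and the helper name and toy graph are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def louvain_gamma_sweep(G, gammas, seed=0):
    """Run Louvain once per resolution value and collect the partitions."""
    results = {}
    for gamma in gammas:
        # 'resolution' scales the null-model term of the modularity.
        results[gamma] = louvain_communities(G, resolution=gamma, seed=seed)
    return results

# Toy modular graph: 4 planted groups of 10 nodes (placeholder parameters).
G = nx.planted_partition_graph(4, 10, 0.9, 0.05, seed=1)

gammas = [0.5, 1.0, 1.5, 2.0]
sweep = louvain_gamma_sweep(G, gammas)

# Number of communities found for each resolution value.
n_communities = {gamma: len(parts) for gamma, parts in sweep.items()}
```

Each returned partition can then be scored against the ground truth to obtain one point of an F1-score curve.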
Correlation Clustering Criteria. We now focus on the behaviours of the Correlation Clustering criteria, when varying the parameter $\lambda$. As previously, F1-scores over $\lambda$ parameters are provided in figs. 13 and 14, for the Louvain algorithm applied on simple and preprocessed graphs, respectively. This time, criteria on both simple and doubly stochastic graphs exhibit plateaus at their maxima. However, one can observe that these plateaus do not appear for the same parameter values. Indeed, there is a factor 100 between the x-axes of the two figures (on the left panel of fig. 13, the x-axis limits are $10/n$ and $500/n$, while these are $1/10n$ and $50/10n$ for fig. 14). This observation is consistent with property 2.

Property 2. Given $A$ the adjacency matrix of a simple graph and $\lambda$ large enough, the community structure $X^* \in Eq(n)$ that maximises $F^{\lambda}_{CC}(A, \cdot)$ is such that $\forall i \neq j$, $a_{i,j} = 0 \implies x^*_{i,j} = 0$. Respectively, if $S \in \mathbb{R}_+^{n \times n}$ is doubly stochastic, and $\lambda_\omega > n/2$, then $X^* \in Eq(n)$ that maximises $F^{\omega,\lambda_\omega}_{CC}(S, \cdot)$ is such that $\forall i \neq j$, $s_{i,j} = 0 \implies x^*_{i,j} = 0$.

This property states that, for large values of the $\lambda$ parameter, in the community structure that optimises the Correlation Clustering criterion, each community must be a clique of the graph. In the benchmark analysed here, this strong constraint implies that, for such $\lambda$ values, the optimal community structure does not fit the ground truth one. Thus, in property 2, $\lambda$ and $\lambda_\omega$ provide an upper bound beyond which the Correlation Clustering criteria are not able to uncover the ground truth community structures of the graphs from the benchmark. As those graphs have an average degree equal to 100, these upper bounds are such that $\lambda \approx 100 \times \lambda_\omega$, which is consistent with the differences of x-axes between figs. 13 and 14. Finally, we remark that, contrary to the observations made on Newman-Girvan modularities, maximum plateaus are smoother for simple graphs.
All Criteria on Simple Graphs. In this paragraph, we compare the different behaviours of all the criteria designed for simple graphs. Recall and Precision are displayed in fig. 15. The parameters for the Correlation Clustering and the parametrised Newman-Girvan criteria, respectively $\lambda = 210/n$ and $\gamma = 1.625$, are chosen so that the average F1-score is maximised over all the mixing parameters. From the right panel, we observe that, except for Zahn criterion, all the measures return high Recall scores (all above 0.8 even for the largest mixing parameter). This is consistent with the fact that, except when used with the Zahn criterion, the Louvain algorithm tends to return structures with fewer communities than expected when applied on simple graphs, as can be seen from the cells highlighted by "-" in table 5. Thus, some of the ground truth communities are merged into the returned ones, and a high value of Recall means that the returned communities tend to cover the ground truth ones. On the other hand, Louvain with Zahn criterion returns almost 5 times more communities than expected when $\mu = 0.484$, and this ratio keeps increasing with $\mu$, which explains the slump in the Recall curve of this criterion. This is an expected result, since it is proven in [34] that the community structure that maximises the Zahn criterion is such that the subgraphs induced by each community must be 1/2-dense. Looking at the values of $p_{in}$ and $p_{out}$ from table 4, ground truth communities are expected to respect this property up to $\mu = 0.484$, included. However, the Louvain algorithm only approximates the best community structure for the criterion, which explains why the slump starts at $\mu = 0.484$ in the tests.
When looking at the left panel, we can roughly divide the remaining measures into two categories: the Balanced Modularity, the Deviation to Indetermination and the Newman-Girvan modularity, which exhibit low Precision scores, and the Deviation to Uniformity, the Correlation Clustering criterion and the parametrised Newman-Girvan modularity, which exhibit much better Precision values. Once again, this is consistent with the ratio of the number of communities returned by Louvain, highlighted in table 5. Indeed, low Precision values are expected when ground truth communities are merged into the returned ones. And from the cells highlighted by "-" in table 5, one can remark that the tendency of the Louvain algorithm to provide fewer communities than expected is stronger for the criteria with lower Precision scores. Finally, one can focus on the somewhat strange shape of the curve of the parametrised Newman-Girvan modularity, which achieves its minimum for the second smallest value of the mixing parameter. As observed from fig. 11, the parameter value $\gamma$ that maximises the F1-score is not consistent over all the mixing parameters. Thus the choice of $\gamma$, which is a trade-off between the mixing parameters, clearly disadvantages graphs with the lowest mixing parameters.
All Criteria on Doubly Stochastic Graphs. Here, we discuss the behaviours of the criteria applied to doubly stochastic graphs. As previously, the parameters for the Correlation Clustering criterion (λ = 20/10n) and for the parametrised
10, and standard deviations by 100. Parameters for the Correlation Clustering criteria and parametrised Newman-Girvan modularities are those that maximise the average F1-score, as explained in the previous paragraphs.
We observe that the most accurate criteria are the parametrised ones. Indeed, the criterion which provides the best F1-score overall is the Correlation Clustering criterion on simple graphs (F λ CC ), closely followed by the Correlation Clustering and parametrised Newman-Girvan criteria on doubly stochastic graphs (F ω,λ CC and F ω,γ N G ). Last from this pool is the parametrised Newman-Girvan modularity on simple graphs (F γ N G ). These four criteria exhibit average F1-scores above 0.85 for all the mixing parameters.
We now compare the four criteria unified by theorem 2, namely the Deviation to Indetermination (F DI ), Balanced Modularity (F BM ), Newman-Girvan modularity (F N G ) and Deviation to Uniformity (F DU ). First, we observe that the latter provides very high F1-scores compared to the other measures. Except for the largest mixing parameter value, where its standard deviations are high, the Deviation to Uniformity is almost competitive with the parametrised criteria. This is an artifact of the benchmark, in which network community structures are typical deviations from regular graphs. On the other hand, the other three are not competitive with the doubly stochastic Newman-Girvan modularity (F ω N G ) that generalises them all.
Our last observations concern the Zahn criteria. From table 6, it seems that none of the doubly stochastic versions of the Zahn criterion can compete with the one for simple graphs. However, it can be seen from table 5 that the numbers of communities returned by the Zahn criteria are quite different, making them hard to compare based on their F1-scores. To highlight this, in fig. 17, we plot the confusion matrices of a community structure with µ = 0.4 returned using F Z (left panel) and F ω Z,3 (right panel); for each criterion, the chosen community structure is the one that provides the maximum F1-score. We observe that their behaviours are opposite: the community structure found on a simple graph correctly detects the largest communities, but splits those of sizes 20 and 40: nodes from the ground truth 20-node communities are assigned to 36 communities by Louvain (16 are expected), and nodes from the 40-node communities are split into 22 communities (8 expected). On the other hand, Louvain used with F ω Z,3 on a preprocessed graph perfectly detects communities of size 20, 40 and 80. However, it splits the two communities of size 160 into 20 communities, and the 320-node community into 225 ones. This illustrates again the tendency of the proposed preprocessing to sharpen small-size communities, here at the expense of the larger ones. It also highlights that the most desirable partitioning remains application-dependent, and should not be chosen on the basis of maximum F1-score only.

Conclusion and Future Perspectives
Broadly speaking, the aim of this study was to investigate the utility of doubly stochastic scaling as a preprocessing for community detection. In particular, we examined its capacity to increase the detectability of small-size communities and of communities with an imbalance in edge direction, such communities being in general poorly detected by community detection algorithms. The proposed preprocessing was presented in section 4, along with illustrations of its potential to sharpen those kinds of communities on toy examples and on a real-world network. In section 5, we generalised a range of graph partitioning measures to weighted networks, with a particular focus on the case of doubly stochastic ones. Of utmost interest is the result that the doubly stochastic scaling unifies these measures, as stated in section 6.1. That is, all of the six measures defined for simple graphs can be expressed using only two parametrised measures for doubly stochastic graphs. Extensive comparisons of these measures were conducted using SBMs in section 6.2, where we observed that the measures best able to accurately uncover community structures are the parametrised ones, for both simple and preprocessed graphs. Foremost, we observed that great care should be taken in the choice of the measure to maximise, as different measures behave extremely differently.
In the future, we would like to investigate the impact of the diagonal added to ensure the convergence of the scaling in algo. 1 and 2, in terms of numerical values within the resulting preprocessed graph. This would provide us with a theoretical basis to help make the right choice. Furthermore, to keep improving community detection methods, we would like to incorporate the knowledge obtained from scaling factors into the process of discovering communities. Indeed, after scaling, all nodes have the same degree. This may be seen as an undesirable feature, as it means that some initial information about node centrality (namely, the degree) is lost. And for real applications, the more central the node, the more harmful an assignment error on this node. However, as stated in [30], another kind of information about node centrality, similar to hub and authority centralities from [46], is conveyed by the scaling factors, and should be exploited to ensure that greater care is taken over the correct assignment of nodes with high centrality. Finally, we would like to extend the proposed preprocessing to the detection of overlapping communities. Indeed, in many applications, one node can be involved in more than one community [47]. In a doubly stochastic scaling, a node belonging to many communities should produce high scaling factors (because of its high degree) and thus low numerical values in the doubly stochastic scaling, as illustrated in fig. 18. This may provide a framework to identify those nodes.
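To make the loss of degree information concrete, the sketch below scales a small graph with a plain Sinkhorn-Knopp iteration. It is an illustration under stated assumptions, not a reproduction of algo. 1 or 2: the unit diagonal, the function name `sinkhorn_knopp`, the stopping rule and the toy graph are all hypothetical choices.

```python
import numpy as np


def sinkhorn_knopp(A, n_iter=1000, tol=1e-12):
    """Alternately rescale the columns and rows of a nonnegative matrix A.

    Returns P = diag(r) @ A @ diag(c) together with the factors r and c.
    Plain Sinkhorn-Knopp iteration, used here as an assumed stand-in.
    """
    A = np.asarray(A, dtype=float)
    r = np.ones(A.shape[0])
    for _ in range(n_iter):
        c = 1.0 / (A.T @ r)              # column sums become 1 for the current r
        r = 1.0 / (A @ c)                # row sums become 1 for the current c
        P = r[:, None] * A * c[None, :]
        if np.abs(P.sum(axis=0) - 1.0).max() < tol:
            break                        # rows are exact, columns within tol
    return P, r, c


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

A_loop = A + np.eye(6)                   # assumed unit diagonal, added for convergence
P, r, c = sinkhorn_knopp(A_loop)

raw_degrees = A.sum(axis=1)              # differ between nodes
scaled_degrees = P.sum(axis=1)           # all equal to 1 after scaling
factors = np.sqrt(r * c)                 # one node-level scaling factor per node
```

After scaling, every node has the same weighted degree, so the raw degrees can no longer be read from the scaled matrix. The per-node factors still vary across nodes, which is the kind of node-level information that the paragraph above proposes to exploit.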

Figure 1: Left: Adjacency matrices of networks with community structures. Middle: The Louvain algorithm cannot detect the smallest community (top matrix, the smallest community highlighted in the red square); and it is unable to detect the two communities connected in an imbalanced fashion (bottom). Right: After scaling, Louvain can detect small communities in presence of larger ones (top); and it can detect the community structure when there is an imbalance in the flows of edges (bottom).

Figure 2: Output of algo. 2 on the Florida Bay network, and its ground truth partitioning. Black '+'s indicate nonzero entries with numerical values below 10^−12.

Definition 4. Community Structure Consistency. Assume a matrix $M \in \mathbb{R}_+^{p \times p}$ and its ground truth community structure $\mathcal{C}$. The level to which a node $u \in \{1, \dots, p\}$ belongs to a community $C \in \mathcal{C}$ is assessed by
$$\phi(u, C) = \frac{\sum_{i \in C} m(u, i)}{\sum_{j=1}^{p} m(u, j)}$$
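As a small worked example of Definition 4, the sketch below evaluates ϕ(u, C) on a hypothetical weighted matrix. The function name `phi`, the matrix and the two communities are illustrative assumptions, and nodes are indexed from 0 rather than from 1.

```python
import numpy as np


def phi(M, u, community):
    """Share of the total weight of row u that falls inside `community`."""
    M = np.asarray(M, dtype=float)
    return M[u, list(community)].sum() / M[u].sum()


# Hypothetical 4-node weighted graph with ground truth communities {0, 1} and {2, 3}.
M = np.array([
    [0, 2, 1, 0],
    [2, 0, 0, 1],
    [1, 0, 0, 3],
    [0, 1, 3, 0],
])
communities = [[0, 1], [2, 3]]

own = phi(M, 0, communities[0])      # weight of node 0 towards its own community: 2/3
other = phi(M, 0, communities[1])    # weight of node 0 towards the other community: 1/3
```

Since the communities partition the nodes, the values ϕ(u, C) of a given node sum to 1 over all communities, so a node is consistently assigned when the value for its own community dominates.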

Figure 3: Φ values for the raw network (left) and the scaled network without self-loop (right). The first three communities are much more consistent in the scaled network (diagonal Φ values are dominant) than in the raw one.

Figure 4: A weighted graph with positive integer edges (left) and the corresponding multigraph (right).

Figure 5: Top: A simple graph and its adjacency matrix. Bottom: The corresponding complementary graph and its adjacency matrix. The degree of each node is given next to the corresponding row; the sum of degrees lies below the matrices.

Figure 6: Top: A multi-graph (left) is the contingency table (right) of two categorical variables. Bottom: In these categorical variables, variables are edges and categories are end nodes. For undirected graphs, source and target end nodes can be swapped (right versus left).

Figure 7: Top: The Zahn distance defined between two relations is based on the set representation of these relations. Coloured cells indicate pairs of related objects. Bottom: A symmetric relation can be seen as a simple graph. An equivalence relation corresponds to a community structure.

Figure 9: In the graph to the left, the green edges are similarities between the nodes they link. The set of these edges is denoted by E + . The orange edges link nodes that are dissimilar. The set of these edges is denoted by E − .

Property 2. Given A the adjacency matrix of a simple graph, and λ > nnz(A)/2. Assuming that X * = argmax X∈Eq(n)

Figure 13: The F1-score (y-axis) over λ (x-axis) of the Correlation Clustering criterion on simple graphs.

Figure 14: The F1-score (y-axis) over λ (x-axis) of the Correlation Clustering criterion on doubly stochastic graphs.

Figure 15: Precision and Recall (y-axes) of all the criteria applied on simple graphs, over the mixing parameters (x-axes).

Figure 16: Precision and Recall (y-axes) of all the criteria applied on doubly stochastic graphs, over the mixing parameters (x-axes).

Figure 17: Confusion matrix of one community structure returned by Louvain used with F Z (left), respectively F ω Z,3 (right).

Figure 18: Doubly stochastic scaling of a toy example of overlapping communities. Left: Values of the scaling factor. Right: The scaling form of a simple graph exhibiting two overlapping communities.

Table 1: Typography of mathematical objects.

Table 2: All possible agreement/disagreement relations between two objects u and v according to two categorical variables P and Q.

Table 4: Edge probabilities in each SBM (p in and p out ), theoretical mixing parameters (µ theo ), and the observed average mixing parameters on the simple (µ bin ) and preprocessed (µ stoch ) graphs.

Table 5: Average ratio scores of the community structures returned by Louvain, with standard deviations in parentheses. Highlighted cells, top: more communities than expected; fewer than half the expected number of communities, on average. Bottom: fewer than 2 communities, on average; fewer than 6 nodes per community, for µ = 0.18.