BDMA - Massive Graph Management and Analytics

Jose Antonio Lorencio Abril

Fall 2023


Professor: Nacéra Seghouani

Student e-mail: jose-antonio.lorencio-abril@student-cs.fr

This is a summary of the course Massive Graph Management and Analytics taught at the Université Paris Saclay - CentraleSupélec by Professor Nacéra Seghouani in the academic year 23/24. Most of the content of this document is adapted from the course notes by Seghouani, [4], so I won’t be citing it all the time. Other references will be provided when used.

Contents

 1 Introduction
 2 Preliminaries
  2.1 Graph Theory Preliminaries
  2.2 Linear Algebra Preliminaries
 3 Random Walks on Graphs
  3.1 First Perron-Frobenius Theorem
  3.2 Random Walks on Graphs
  3.3 PageRank
 4 Centrality Measures
  4.1 Degree Centrality
  4.2 Neighbourhood centrality
  4.3 Eigenvector Centrality
  4.4 PageRank Centrality
  4.5 Katz (or alpha) centrality
  4.6 Clustering Coefficient Centrality
  4.7 Closeness Centrality
  4.8 Betweenness Centrality
 5 MapReduce Computation Model
  5.1 The Map Function
  5.2 Shuffling/Grouping Function
  5.3 Reduce Function
  5.4 MapReduce Execution Model
  5.5 Use-Case: Matrix-Vector Multiplication by MapReduce
  5.6 Relational Algebra by MapReduce
  5.7 Some Issues of MapReduce
 6 Spark Parallel Computing Framework
  6.1 Spark’s Engine Properties and Components
  6.2 Spark’s Execution Architecture
  6.3 Resilient Distributed Datasets (RDDs)
 7 Community Detection Approaches
  7.1 k-Clique
  7.2 k-Core
  7.3 Community Detection Problem
 8 Models of Influence and Diffusion
  8.1 Introduction
  8.2 Influence and Diffusion Models
  8.3 Influence Maximization Problem Formulation

1 Introduction

Graph-structured data is at the heart of complex systems and plays a major role in our daily life, science and economy. Examples of such data are the cooperation networks between billions of individuals, communication infrastructures with billions of cell phones, computers and satellites, the interactions between thousands of genes and metabolites within our cells, and so on.

Therefore, understanding their mathematical foundations, describing and predicting them, and eventually being able to control them is one of the major scientific challenges of the 21st century.

2 Preliminaries

2.1 Graph Theory Preliminaries

A graph is a pair G = (V,E), where V is the set of vertices and E ⊆ V × V is the set of edges. Usually, we denote |V| = n and |E| = m.

There are different types of graphs:

Some examples are:

[Figure]

Continuing with definitions, let G = (V,E) be a graph (directed or undirected). Let d_i^+ and d_i^- denote the number of edges leaving and entering v_i, respectively. The degree of v_i is

d_i = d_i^+ + d_i^-.

Note that, for undirected graphs, this counts each incident edge twice.

Now, let N_i^+ and N_i^- be the sets of successors and predecessors of v_i, respectively. Then, the set of neighbours of v_i is

N_i = N_i^+ ∪ N_i^-.

A path between two vertices u,v ∈ V, denoted u ⇝ v, is a sequence of vertices (u = v_0, v_1, ..., v_{k-1}, v_k = v), where (v_{i-1}, v_i) ∈ E for i = 1,...,k. The length of a path, L(u ⇝ v), is its number of edges, that is, k.

A cycle is a path from a vertex to itself, u ⇝ u.

The distance between two nodes, d(u,v), is the shortest path length between them:

d(u,v) = min_{u ⇝ v} L(u ⇝ v).

The eccentricity of a node, ecc(u), is the greatest distance between u and any other vertex in the graph:

ecc(u) = max_{v∈V} d(u,v).

Note that this could be infinity if we cannot reach some node from u. Usually, we consider only reachable nodes, because this can give us information about the graph, but a value of infinity is not very informative.

The diameter of a graph, diam(G ), is the greatest distance between two nodes in the graph:

diam(G) = max_{u,v∈V} d(u,v) = max_{u∈V} ecc(u).

The radius of a graph, rad(G ), is the minimum eccentricity of any vertex in the graph:

rad(G) = min_{u∈V} ecc(u).

The center of a graph, C(G), is the set of all vertices of minimum eccentricity, i.e., the graph radius:

C (G) = {u : ecc(u) = rad (G )}.
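
A minimal Python sketch (not from the notes; helper names are made up) that computes these quantities for an unweighted graph stored as an adjacency list, using one BFS per vertex:

from collections import deque

def eccentricities(adj):
    # adj: dict mapping each vertex to an iterable of its neighbours
    ecc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                        # plain BFS from s
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        ecc[s] = max(dist.values())     # greatest distance from s (reachable vertices only)
    return ecc

adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}    # toy path graph a - b - c
ecc = eccentricities(adj)
diam = max(ecc.values())                           # 2
rad = min(ecc.values())                            # 1
center = [v for v, e in ecc.items() if e == rad]   # ['b']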

Example 2.1. Compute the diameter, radius and center of the following graphs:

[Figure]

The solution is the following:

[Figure]

In each node, we show its eccentricity. The diameter is 6, the radius is 3 and the center is c (in blue).

[Figure]

Solution:

[Figure]

In this case, the diameter is 5, the radius is 3 and the center is {c,h}.

A partial graph of G = (V,E) is a graph G′ = (V,E′), where E′ ⊆ E.

A subgraph of G = (V,E) is a graph G′ = (V′,E′), where V′ ⊆ V and E′ ⊆ E. Note that partial graphs are also subgraphs.

A graph G = (V,E) is said to be connected if, and only if, for all u,v ∈ V there exists a path u ⇝ v.

A (strongly) connected component of G = (V,E) is a subgraph G_cc = (V_cc,E_cc) such that, for all u,v ∈ V_cc, there is a path u ⇝ v within G_cc. That is, a connected subgraph. It is called strongly connected when the paths are directed.

A graph G = (V,E) is a tree if, and only if, G is a connected graph without cycles. In this case, the graph has m = n - 1 edges.

A graph G = (V,E) is a forest if, and only if, all connected components of G are trees.

2.1.1 Breadth First Search (BFS)

BFS is a method to traverse the nodes of a graph, by starting at one node and traversing all its neighbours. Then, all neighbours of its neighbours, and so on.

For this, we use a FIFO queue. The algorithm is:

procedure BFS(G=(V,E), r)
  Q <- emptyset
  enqueue(Q,r)
  r.label = True

  while Q is not empty do
    v <- dequeue(Q)
    for neig in neighbours(v) do
      if not neig.label then
        enqueue(Q,neig)
        neig.label = True
      end if
    end for
  end while
end procedure

Example 2.2. Apply BFS in the following graph, starting at node A.

[Figure]

Q=[A]. We visit A’s neighbours:

[Figure]

Q=[F,G]. Now, by lexicographical order, we visit F’s neighbours:

[Figure]

Q=[G,B,E]. Now, we visit G’s neighbours. Since it has no new unvisited neighbours, there is no change.

Q=[B,E]. Now, we visit B’s neighbours:

[Figure]

Q=[E,C]. Now, we visit E’s neighbours:

[Figure]

Q=[C,D]. Everything is visited, so the queue will be slowly emptied!

2.1.2 Depth First Search (DFS)

In the case of DFS, the objective is also to traverse the whole graph. The difference is that in this case we try to go as deep as we can in the graph before visiting more neighbours.

It can be implemented with a stack, be it an explicit stack or an implicit one (via recursion).

The implementation with an explicit stack is the following:

procedure DFS(G=(V,E),r)
  S <- emptyset
  push(S,r)
  while S is not empty do
    v <- pop(S)
    if not v.label then
      v.label = true
      for neig in neighbours(v) do
        push(S, neig)
      end for
    end if
  end while
end procedure

The implementation with an implicit stack is recursive, and is as follows:

procedure DFS*(G=(V,E), r)
  r.label = true
  for neig in neighbours(r) do
    if not neig.label then
      DFS*(G, neig)
    end if
  end for
end procedure

Example 2.3. Let’s repeat the example, now using DFS:

[Figures]

2.1.3 Graph Representations

A graph, G = (V,E), with n vertices and m edges can be encoded using different structures, typically an adjacency matrix, an adjacency list or an incidence matrix (used in the exercises below); a small sketch follows.
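
A minimal Python sketch (not from the notes) building these encodings for a tiny undirected graph:

edges = [(0, 1), (1, 2)]     # V = {0, 1, 2}, E = {{0,1}, {1,2}}
n = 3

# adjacency matrix: n x n, A[i][j] = 1 iff {i,j} is an edge  (O(n^2) space)
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# adjacency list: each vertex maps to the list of its neighbours  (O(n + m) space)
L = {v: [] for v in range(n)}
for i, j in edges:
    L[i].append(j)
    L[j].append(i)

# incidence matrix: n x m, B[v][e] = 1 iff vertex v touches edge e
B = [[1 if v in e else 0 for e in edges] for v in range(n)]

print(A)   # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(L)   # {0: [1], 1: [0, 2], 2: [1]}
print(B)   # [[1, 0], [1, 1], [0, 1]]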

2.1.4 Exercises

  1. Using graph traversal algorithms, propose an algorithm that computes the number of edges between a given vertex and all other vertices.

    procedure n_edges(G=(V,E), r)
      Q <- emptyset
      enqueue(Q,r)
      r.n_edges = 0

      while Q is not empty do
        v <- dequeue(Q)
        for neig in neighbours(v) do
          if neig.n_edges is undefined then
            enqueue(Q,neig)
            neig.n_edges = v.n_edges + 1
          end if
        end for
      end while
    end procedure

  2. Given the following cycles with even and odd lengths (with the distances or depths from the grey vertex), what do you think about the case of graphs with an odd cycle (in number of edges)? Is this a characteristic property? State the general case.

    [Figure]

    Proposition: a graph contains a cycle C with an odd number of edges if, and only if, there exists (x,y) ∈ E such that depth(x) = depth(y).

    Proof: first, we know that all edges connect vertices of ’neighbouring’ depths. That is, for all (x,y) ∈ E, it holds that |depth(x) - depth(y)| ≤ 1.

    [ =⇒ ] By reductio ad absurdum, seeking a contradiction, suppose that for every (x,y) ∈ C we have depth(x) ≠ depth(y). This means that depth(x) = depth(y) ± 1. Therefore, along the cycle, a node of even depth is followed by a node of odd depth, and so on. When we close the cycle, the final node is the initial one, so its depth has the same parity as at the start. Therefore, the cycle needs an even number of edges to conserve the parity, contradicting that C is odd.

    [ ⇐= ] If there is an edge (x,y) ∈ E with depth(x) = depth(y), then we can consider the traversal tree that was used to annotate the depths. In this tree, x and y have a first common ancestor z, from which we can form an odd cycle of size 2(depth(x) - depth(z)) + 1 by adding the edge (x,y) to this subtree rooted at z.

  3. Propose an algorithm that determines if a graph contains an odd cycle.

    procedure hasOddCycle(G=(V,E))
      v <- a vertex from V
      depths <- n_edges(G,v) # from the first exercise

      for (u,v) in E do
        if depths[u] == depths[v] then
          return True
        end if
      end for
      return False
    end procedure

  4. In a bipartite graph, can there be a cycle with an odd number of edges? Is this a characteristic property?

    No, it is not possible!

    Proposition: A graph is bipartite if, and only if, all cycles are of even size.

    [ =⇒ ] If the graph is bipartite, any cycle alternates between vertices of the two partitions and ends at the initial vertex. Therefore, all cycles must be of even size.

    [ ⇐= ] Consider the partition of vertices with even depth V 1, and the partition of vertices with odd depth V 2.

    Since there is no odd cycle, then, from question 2, we know that for all (u,v) ∈ E it is depth(u) = depth(v) ± 1. Therefore, every edge joins V1 and V2, and the graph is bipartite.

  5. Propose an algorithm that allows to determine if a graph is bipartite. Test your algorithm in the following graph. Is it bipartite? Justify your answer.

    [Figure]

    The algorithm is the same as in exercise 3, because of exercise 4.

    The proposed graph is clearly not bipartite, because there are several odd cycles.

  6. Graph coloring is a way of coloring the vertices of a graph in such a way that no two adjacent vertices share the same color. A 2-colorable graph is a graph that can be colored with only 2 colors.

    1. What is the link with the previous exercise? Justify your answer.

      Proposition: a graph is 2-colorable if, and only if, it is bipartite.

      Proof : [ =⇒ ] If it is 2-colorable, with colors red and blue. Then we take V 1 = {u|color(u) = blue} and V 2 = {u|color(u) = red}. G is clearly bipartite with this partition.

      [ ⇐= ] If it is bipartite, with partition V 1 and V 2, then we can color all nodes in V 1 in blue, and all nodes in V 2 in red. The graph is 2-colorable.

    2. We want to write an algorithm, inspired by DFS search, which takes as input a graph, G = (V,E), and which returns a pair (result,color) where result is True if the graph is colorable, False otherwise, and color is a dictionary associating a color 0 or 1 to each vertex. This algorithm should stop as soon as possible when the graph is not 2-colorable.

      procedure coloring(G=(V,E), r)
        color <- {r: 0}
        stack <- emptyset
        push(stack, r)

        while stack is not empty do
          v <- pop(stack)
          for neig in neighbours(v) do
            if neig is not in color.keys then
              push(stack, neig)
              color[neig] = 1 - color[v]
            elif color[neig] == color[v] then
              return False, color
            end if
          end for
        end while
        return True, color
      end procedure

  7. Compute the shortest path in the following graph using Dijkstra’s algorithm, starting at A:

    procedure dijkstra(G=(V,E), r)
      dist <- {r:0}
      P <- emptyset

      for v in V-{r} do
        dist[v] = infinity
      end for

      while V-P is not empty do
        w <- select v in V-P with minimum dist[v]
        P <- P union {w}
        for neig in neighbours(w)-P do
          if dist[w] + weight(w,neig) < dist[neig] then
            dist[neig] <- dist[w] + weight(w,neig)
          end if
        end for
      end while
    end procedure

    [Figure]

    We start with dist = (A:0, B:∞, C:∞, D:∞, E:∞, F:∞, G:∞) and P = ∅.

    Now, w = A and P = {A}. We update the distances of A's neighbours (the new finite values are 3 and 1).

    Now, w = G and P = {A,G}. We update dist = (A:0, B:∞, C:∞, D:∞, E:3, F:2, G:1).

    Now, w = F and P = {A,F,G}. We update dist = (A:0, B:6, C:∞, D:∞, E:3, F:2, G:1).

    Now, w = E and P = {A,E,F,G}. We update dist = (A:0, B:6, C:4, D:8, E:3, F:2, G:1).

    Now, w = C and P = {A,C,E,F,G}. We update dist = (A:0, B:6, C:4, D:6, E:3, F:2, G:1).

    Now, w = B and P = {A,B, C,E,F,G }. dist does not change.

    Finally, w = D and P = {A,B, C,D,E, F,G}. dist does not change.

  8. Given the following graphs:

    [Figure]

    1. Give the different representations of these graphs.

      A1 =
      ( 0 1 0 0 1 1 )
      ( 1 0 1 1 0 1 )
      ( 0 1 0 1 0 1 )
      ( 0 1 1 0 0 1 )
      ( 1 0 0 0 0 1 )
      ( 1 1 1 1 1 0 ),

      L1 =
      0 : {1,4,5}
      1 : {0,2,3,5}
      2 : {1,3,5}
      3 : {1,2,5}
      4 : {0,5}
      5 : {0,1,2,3,4},

      B1 (rows indexed by the vertices 0-5, columns by the edges 01, 04, 05, 12, 13, 15, 23, 25, 35, 45) =
      ( 1 1 1 0 0 0 0 0 0 0 )
      ( 1 0 0 1 1 1 0 0 0 0 )
      ( 0 0 0 1 0 0 1 1 0 0 )
      ( 0 0 0 0 1 0 1 0 1 0 )
      ( 0 1 0 0 0 0 0 0 0 1 )
      ( 0 0 1 0 0 1 0 1 1 1 ).

      A2 =
      ( 0 1 0 0 1 0 )
      ( 0 0 0 0 0 1 )
      ( 0 0 0 1 0 1 )
      ( 0 0 0 1 0 0 )
      ( 1 1 0 0 0 0 )
      ( 0 0 0 0 1 0 ),

      L2 =
      0 : {1,4}
      1 : {5}
      2 : {3,5}
      3 : {3}
      4 : {0,1}
      5 : {4},

      B2 (rows indexed by the vertices 0-5, columns by the arcs 01, 04, 15, 23, 25, 33, 40, 41, 54) =
      ( 1 1 0 0 0 0 2 0 0 )
      ( 2 0 1 0 0 0 0 2 0 )
      ( 0 0 0 1 1 0 0 0 0 )
      ( 0 0 0 2 0 3 0 0 0 )
      ( 0 2 0 0 0 0 1 1 2 )
      ( 0 0 2 0 2 0 0 0 1 ).

    2. Compute A², A³. What does A^r_{ij} represent?

      A^r_{ij} represents the number of paths of length r from node i to node j.

    3. What is the complexity of computing A^r? Is it possible to reduce it?

      Computing A^r is O(r·n³), since it requires r products of complexity O(n³).

      However, we can reuse some results to reduce the complexity:

      • If r is even, we can do A^r = (A^{r/2})².

      • If r is odd, we can do A^r = A·(A^{(r-1)/2})².

      Therefore, we can obtain A^r in O(log r · n³).
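
      A minimal Python sketch of this exponentiation-by-squaring idea (NumPy, not from the notes):

      import numpy as np

      def matrix_power(A, r):
          # Compute A^r with O(log r) matrix products.
          result = np.eye(A.shape[0], dtype=A.dtype)
          base = A.copy()
          while r > 0:
              if r % 2 == 1:          # odd exponent: fold the current base into the result
                  result = result @ base
              base = base @ base      # square the base
              r //= 2
          return result

      A = np.array([[0, 1], [1, 0]])
      print(matrix_power(A, 5))       # entry (i,j) counts the paths of length 5 from i to j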

2.2 Linear Algebra Preliminaries

A norm is a function f that measures the size of a vector. It must satisfy the following properties:

A widely used family of norms are the p-norms:

‖x‖_p = (∑_i |x_i|^p)^{1/p},

with the most common one being the Euclidean norm, for p = 2:

‖x‖₂ = √(∑_i x_i²).

The determinant of a square matrix is equal to the hypervolume of the parallelotope defined by the vectors of the matrix. It is 0 if, and only if, the set of vectors is linearly dependent.

The determinant can be used for many things:

Some properties of the determinant are:

A square matrix, A, is invertible (non-singular, non-degenerate), with inverse denoted A⁻¹, if there exists B such that

AB = BA = I;

in this case, A⁻¹ = B.

Proposition 2.1. For a square matrix, A, the following properties are equivalent:

  • A is invertible.

  • All vectors in A are linearly independent.

  • |A| ≠ 0.

  • AT is invertible.

  • 0 is not an eigenvalue of A.

Properties of the inverse:

An eigenvector or characteristic vector of a linear transformation, T, is a non-zero vector that changes by a scalar factor, λ, when transformed by T. That is, v is an eigenvector of the linear transformation T if

T (v) = λv.

There is a direct correspondence between n×n matrices and linear transformations of the n-dimensional vector space into itself. That is, every linear transformation T can be represented as a matrix A_T (the matrix depends on the chosen basis). Therefore, we can say that A_T has an eigenvector v if

AT v = λv.

The scale factors of the eigenvectors are called eigenvalues.

We can find the eigenvalues by solving a polynomial function on λ called the characteristic polynomial of AT:

(A - λI)v = 0.

Now, this equation has non-zero solution if, and only if,

|A - λI| = 0.

Therefore, we can compute |A - λI| and find all values of λ that makes it equal to 0.

Once we have the eigenvalues, we can use them to find the corresponding eigenvectors.

Example 2.4. Compute the eigenvalues and eigenvectors of A = ( 2 1 ; 1 2 ).

|A - λI| = | 2-λ  1 ; 1  2-λ | = (2-λ)² - 1 = λ² - 4λ + 3.

This has as solutions

λ = (4 ± √(16-12))/2 = (4 ± 2)/2 = 2 ± 1.

Therefore, we have λ1 = 1 and λ2 = 3.

To find the eigenvectors, we solve

Av = λv ⟺ { 2x + y = λx ; x + 2y = λy }.

For λ1 = 1, it is

{ 2x + y = x ; x + 2y = y } ⟺ x = -y,

so the eigenvector associated to λ1 = 1 is

v_{λ1} = (t, -t).

For λ2 = 3, it is

{ 2x + y = 3x ; x + 2y = 3y } ⟺ x = y,

so the eigenvector associated to λ2 = 3 is

v_{λ2} = (t, t).

We call algebraic multiplicity, t_i, of the eigenvalue λ_i its multiplicity as a root of the characteristic polynomial:

P(λ) = |A - λI| = (λ - λ1)^{t1} (λ - λ2)^{t2} ⋅ ... ⋅ (λ - λk)^{tk}.

Note that A can have at most n distinct eigenvalues, although some of them may be complex.

Proposition 2.2. If the eigenvalues of A are all different, then the corresponding eigenvectors are linearly independent.

The eigenspace of an eigenvalue, λ, is the space generated by the eigenvectors associated to λ.

The dimension of the eigenspace of λ is the geometric multiplicity of λ. The geometric multiplicity of an eigenvalue is, at most, its algebraic multiplicity.

Example 2.5. Let’s get some eigenspaces:

A = ( -1 1 0 ; -4 3 0 ; 1 0 2 ), so

|A - λI| = (-1-λ)(3-λ)(2-λ) + 4(2-λ) = (2-λ)[(-1-λ)(3-λ) + 4] = (2-λ)(λ² - 2λ + 1) = (2-λ)(λ-1)².

This has roots λ1 = 1, with algebraic multiplicity 2, and λ2 = 2, with algebraic multiplicity 1.

Now, we get the eigenvectors associated to them:

Av = λv ⟺ { -x + y = λx ; -4x + 3y = λy ; x + 2z = λz }.

For λ1 = 1 this is { -x + y = x ; -4x + 3y = y ; x + 2z = z } ⟺ { y = 2x ; x = -z }, so v_{λ1} = (t, 2t, -t), with dimension 1 (it could have been 2).

For λ2 = 2 this is { -x + y = 2x ; -4x + 3y = 2y ; x + 2z = 2z } ⟺ { y = 3x ; -4x = -y } ⟺ { x = 0 ; y = 0 }, so v_{λ2} = (0, 0, t), with dimension 1 (it could not be otherwise).

B = ( 4 6 0 ; -3 -5 0 ; -3 -6 1 ), so

|B - λI| = (4-λ)(-5-λ)(1-λ) + 18(1-λ) = (1-λ)[(4-λ)(-5-λ) + 18] = (1-λ)(λ² + λ - 2) = (1-λ)²(-2-λ).

This has roots λ1 = 1, with algebraic multiplicity 2, and λ2 = -2, with algebraic multiplicity 1.

Now, we get the eigenvectors associated to them:

Bv = λv ⟺ { 4x + 6y = λx ; -3x - 5y = λy ; -3x - 6y + z = λz }.

For λ1 = 1, we have { 4x + 6y = x ; -3x - 5y = y ; -3x - 6y + z = z } ⟺ { x = -2y ; z free }.

Therefore, the eigenspace associated to λ1 is

E(λ1) = { (-2t, t, 0), (0, 0, u) }.

For λ2 = -2, we have { 4x + 6y = -2x ; -3x - 5y = -2y ; -3x - 6y + z = -2z } ⟺ { x = -y ; -3y + 3z = 0 } ⟺ { x = -y ; y = z }.

Thus, the eigenspace associated to λ2 is

E(λ2) = (-t, t, t).

C = ( 1 -1 0 ; -1 2 1 ; 0 1 1 ), so

|C - λI| = (1-λ)²(2-λ) - 2(1-λ) = (1-λ)[(1-λ)(2-λ) - 2] = (1-λ)(λ² - 3λ) = (1-λ)(λ-3)λ.

This has roots λ1 = 0, λ2 = 1 and λ3 = 3.

Cv = λv ⟺ { x - y = λx ; -x + 2y + z = λy ; y + z = λz }.

λ1 = 0: { x - y = 0 ; -x + 2y + z = 0 ; y + z = 0 } ⟺ { x = y ; y = -z }, so E(λ1) = (t, t, -t).

λ2 = 1: { x - y = x ; -x + 2y + z = y ; y + z = z } ⟺ { y = 0 ; x = z }, so E(λ2) = (t, 0, t).

λ3 = 3: { x - y = 3x ; -x + 2y + z = 3y ; y + z = 3z } ⟺ { y = -2x ; y = 2z }, so E(λ3) = (-t, 2t, t).

D = ( 1 -1 4 ; 3 2 -1 ; 2 1 -1 ), so

|D - λI| = (1-λ)(2-λ)(-1-λ) + 12 + 2 - 8(2-λ) + (1-λ) + 3(-1-λ) = -λ³ + 2λ² + 5λ - 6.

To obtain the roots, we can use Ruffini's rule on the coefficients (-1, 2, 5, -6): λ1 = 1 is a root, and the quotient is -λ² + λ + 6, whose roots are

λ = (-1 ± √(1 + 24)) / (-2) = (-1 ± 5) / (-2),

so λ2 = -2 and λ3 = 3.

Dv = λv ⟺ { x - y + 4z = λx ; 3x + 2y - z = λy ; 2x + y - z = λz }.

λ1 = 1: { x - y + 4z = x ; 3x + 2y - z = y ; 2x + y - z = z } ⟺ { y = 4z ; x = -z }. Then, E(λ1) = (-t, 4t, t).

λ2 = -2: { 3x - y + 4z = 0 ; 3x + 4y - z = 0 ; 2x + y + z = 0 } ⟺ { y = z ; x = -y }. Then, E(λ2) = (-t, t, t).

λ3 = 3: { -2x - y + 4z = 0 ; 3x - y - z = 0 ; 2x + y - 4z = 0 } ⟺ { y = 2z ; x = z }. Then, E(λ3) = (t, 2t, t).

E = ( 6 -2 2 ; -2 3 -1 ; 2 -1 3 ), so

|E - λI| = (6-λ)(3-λ)² + 4 + 4 - 4(3-λ) - (6-λ) - 4(3-λ) = -λ³ + 12λ² - 36λ + 32.

Again, we can use Ruffini's rule on the coefficients (-1, 12, -36, 32): λ = 2 is a root, and the quotient is -λ² + 10λ - 16, which gives

λ = (-10 ± √(100 - 64)) / (-2) = (-10 ± 6) / (-2) = 5 ∓ 3.

Therefore, λ1 = 2 is a double root and the other root is λ2 = 8.

Ev = λv ⟺ { 6x - 2y + 2z = λx ; -2x + 3y - z = λy ; 2x - y + 3z = λz }.

λ1 = 2: { 4x - 2y + 2z = 0 ; -2x + y - z = 0 ; 2x - y + z = 0 } ⟺ 2x - y + z = 0.

If x = 0: y = z. If x = t ≠ 0: -y + z = -2t, which works for y = t and z = -t. So

E(λ1) = { (0, t, t), (t, t, -t) }.

λ2 = 8: { -2x - 2y + 2z = 0 ; -2x - 5y - z = 0 ; 2x - y - 5z = 0 } ⟺ { y = -z ; x = 2z }. Therefore,

E(λ2) = (2t, -t, t).

F = ( 0 -1 -1 ; 1 2 1 ; 1 1 2 ), so

|F - λI| = -λ(2-λ)² + (2-λ) = (2-λ)[-λ(2-λ) + 1] = (2-λ)(λ² - 2λ + 1) = (2-λ)(1-λ)².

One root is λ1 = 1 with algebraic multiplicity 2, and the other is λ2 = 2 with algebraic multiplicity 1.

Fv = λv ⟺ { -y - z = λx ; x + 2y + z = λy ; x + y + 2z = λz }.

λ1 = 1: { -y - z = x ; x + 2y + z = y ; x + y + 2z = z } ⟺ x + y + z = 0.

If x = 0: y = -z. If x = t ≠ 0: y + z = -t, which works for y = t and z = -2t. Therefore,

E(λ1) = { (0, t, -t), (t, t, -2t) }.

λ2 = 2: { -y - z = 2x ; x + 2y + z = 2y ; x + y + 2z = 2z } ⟺ { x = -z ; x = -y }. So

E(λ2) = (t, -t, -t).

Another way to represent eigenvalues and eigenvectors is

AV  = VΛ,

where V = [v1,...,vn] is the matrix formed by putting each eigenvector as a column, and

Λ = diag(λ1, λ2, ..., λn)

is the diagonal matrix formed by all eigenvalues.

A matrix A is diagonalizable if there exist n linearly independent eigenvectors. That is, if the matrix V is invertible:

Λ = V -1AV.

This leads naturally to the eigen-decomposition of the matrix,

A = VΛV⁻¹.
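
A minimal NumPy check of this decomposition, using the matrix of Example 2.4:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, V = np.linalg.eig(A)       # columns of V are eigenvectors
Lam = np.diag(eigvals)

# A = V Lam V^{-1}, up to floating-point error
print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))   # True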

A real matrix, U, is orthogonal if UTU = UUT = I.

Proposition 2.3. The following statements are equivalent:

  • UT is orthogonal.

  • UT = U-1.

  • |U| = ±1.

  • U’s eigenvectors are orthonormal (the pairwise dot product is 0 and the norm is 1).

Example 2.6. Some examples of orthogonal matrices:

A matrix A is said to be positive semi-definite when it can be obtained as the product of a matrix by its transpose:

∃X such that A = XXᵀ.

Positive semi-definite matrices are always symmetric, because

AT = (XXT )T = XXT  = A.

A symmetric matrix A is positive semi-definite if all its eigenvalues are non-negative.

Proposition 2.4. Let A be a positive semi-definite matrix. Then:

  • 0 ≤ λ1 ≤ λ2 ≤ ... ≤ λn and its eigenvectors are pairwise orthogonal when their eigenvalues are different.

  • The eigenvalues are composed of real values.

  • The multiplicity of an eigenvalue is the dimension of its eigenspace.

In this case, since eigenvectors are orthogonal, it is possible to store all the eigenvectors in an orthogonal matrix.

Therefore, the eigen-decomposition of a positive semi-definite matrix, A, could be

A = UΛU T,

with U an orthogonal matrix.

As a consequence, the eigen-decomposition of a positive semi-definite matrix is often referred to as its diagonalization.

An alternative definition for positive semi-definite matrix is:

A is positive semi-definite if xᵀAx ≥ 0 for all x.

If xᵀAx > 0 for all x ≠ 0, then it is positive definite.

If xᵀAx ≤ 0 for all x, then it is negative semi-definite.

If xᵀAx < 0 for all x ≠ 0, then it is negative definite.

The rank of a matrix is the dimension of the vector space generated by its columns (or rows). This corresponds to the maximum number of linearly independent columns of A. A matrix whose rank is equal to its size is called a full rank matrix. Only full rank matrices have an inverse.

Proposition 2.5. The sum of the eigenvalues of a matrix is the sum of the elements of its main diagonal.

The product of the eigenvalues is equal to the determinant of the matrix.

We can now define the Laplacian matrix for undirected graphs, as

L_ij = { -1 if (v_i,v_j) ∈ E ; 0 if (v_i,v_j) ∉ E, i ≠ j ; d_i if i = j },

or, equivalently,

L = D - A,

where D is the degree matrix of G, and A its adjacency matrix.

2.2.1 Exercises

  1. What could you say about these matrices?

    1. A = ( -1  3/2 ; 1  -1 ): det(A) = 1 - 3/2 = -1/2 ≠ 0, so A is invertible. Its eigenvalues are λ1 = -1 + √6/2 and λ2 = -1 - √6/2, with eigenvectors v_{λ1} = ((√6/2)t, t) and v_{λ2} = ((-√6/2)t, t).

    2. B = ( -1  3/2 ; 2/3  -1 ): the second row is equal to the first row multiplied by -2/3. Therefore, it is not invertible.

    3. I: its determinant is 1. It is symmetric, orthogonal, its own inverse. Triple eigenvalue 1, with eigenspace the whole space.

  2. Show that Aⁿ = XΛⁿX⁻¹.

    First, this is only true if A is diagonalizable, A = XΛX⁻¹. If that is the case, then we can proceed by induction on n:

    n = 1: obvious.

    n = 2:

    A² = (XΛX⁻¹)² = XΛX⁻¹ XΛX⁻¹ = XΛ²X⁻¹.

    Suppose it is true for n - 1:

    A^{n-1} = XΛ^{n-1}X⁻¹.

    Then, for n, we have:

    Aⁿ = A·A^{n-1} = XΛX⁻¹·XΛ^{n-1}X⁻¹ = XΛⁿX⁻¹.

  3. Find the eigenvalues and unit eigenvectors of AᵀA and AAᵀ with A = ( 1 1 ; 1 0 ) the Fibonacci matrix.

    First of all, notice that A is symmetric, so AᵀA = AAᵀ = A² = ( 2 1 ; 1 1 ).

    |A² - λI| = | 2-λ  1 ; 1  1-λ | = (2-λ)(1-λ) - 1 = λ² - 3λ + 1. The roots of this polynomial are

    λ = (3 ± √(9-4))/2 = (3 ± √5)/2.

    Now,

    A²v = λv ⟺ { 2x + y = λx ; x + y = λy } ⟺ x = (λ-1)y.

    Therefore

    E(λ1) = ( ((1+√5)/2)t, t ),

    with unit eigenvector v1 = (1/√((5+√5)/2))·((1+√5)/2, 1).

    And

    E(λ2) = ( ((1-√5)/2)t, t ),

    with unit eigenvector v2 = (1/√((5-√5)/2))·((1-√5)/2, 1).

  4. Without multiplying

    S = ( cosθ  -sinθ ; sinθ  cosθ ) ( 2  0 ; 0  5 ) ( cosθ  sinθ ; -sinθ  cosθ ),

    find the determinant, the eigenvalues and eigenvectors. Why is S positive definite?

    We have S = UΛUᵀ with U orthogonal. Therefore, the eigenvalues of S are 2 and 5, and its determinant is 2·5 = 10. The eigenvectors are the canonical eigenvectors of Λ rotated by θ, that is, the columns of U = ( cosθ  -sinθ ; sinθ  cosθ ).

    S is positive definite because

    xᵀSx = xᵀUΛUᵀx = yᵀΛy, with y = Uᵀx,

    and yᵀΛy = 2y₁² + 5y₂² > 0 for all x ≠ 0, because Λ = ( 2 0 ; 0 5 ) is positive definite (symmetric with positive eigenvalues).

  5. For what numbers c and d are the following matrices positive definite?

    1. A = ( c 1 1 ; 1 c 1 ; 1 1 c ): all leading principal minors must be positive. That is:

      • c > 0.

      • | c 1 ; 1 c | = c² - 1 > 0. Combined with the previous one, this is c > 1.

      • | c 1 1 ; 1 c 1 ; 1 1 c | = c³ - 3c + 2 > 0. By Ruffini's rule, c = 1 is a root and the quotient is c² + c - 2, whose roots are 1 and -2. We are only interested in the interval (1,∞), in which c³ - 3c + 2 > 0.

      Therefore, it is c > 1.

    2. B = ( 1 2 3 ; 2 d 4 ; 3 4 5 ):

      • 1 > 0.

      • | 1 2 ; 2 d | = d - 4 > 0 ⟺ d > 4.

      • | 1 2 3 ; 2 d 4 ; 3 4 5 | = 5d + 24 + 24 - 9d - 16 - 20 = -4d + 12 > 0 ⟺ d < 3.

      Therefore, there is no value of d for which B is positive definite.

  6. Show that if λ1, λ2, ..., λn are the eigenvalues of a matrix A, then A^m has as eigenvalues λ1^m, λ2^m, ..., λn^m.

    Induction on m.

    m = 1: obvious.

    m = 2: let v_i be the eigenvector associated to λ_i; then

    A²v_i = A(Av_i) = A(λ_i v_i) = λ_i Av_i = λ_i² v_i,

    so λ_i² is an eigenvalue of A², with associated eigenvector v_i.

    Suppose it is true for m - 1; then, for m:

    A^m v_i = A(A^{m-1} v_i) = A(λ_i^{m-1} v_i) = λ_i^{m-1} A v_i = λ_i^m v_i,

    and we have the result.

  7. What is the determinant of any orthogonal matrix?

    If U is orthogonal, then UUᵀ = I. Then,

    1 = |I| = |UUᵀ| = |U||Uᵀ| = |U|².

    Therefore, |U| = ±1.

  8. For an undirected graph, both the adjacency matrix and the Laplacian matrix are symmetric. Show that the Laplacian matrix is positive semi-definite.

3 Random Walks on Graphs

3.1 First Perron-Frobenius Theorem

Definition 3.1. A matrix, A, is positive if A_ij > 0 for all i, j. Similarly, it is non-negative if A_ij ≥ 0 for all i, j. Similar definitions apply for negative and non-positive matrices.

Remark 3.1. Observe that it is not the same for a matrix to be positive as to be positive semi-definite.

The Perron-Frobenius theorem for non-negative matrices leads to the characterization of non-negative primary eigenvectors. This is useful in stationary distributions, such as those of Markov chains and the famous Google’s page rank algorithm.

Theorem 3.1. Perron-Frobenius Theorem for positive matrices

If A is a positive matrix, then:

  • There exist λ* > 0 and v* > 0 with ‖v*‖₂ = 1 such that Av* = λ*v* (v* is a right column eigenvector).

  • There exists w > 0 with ‖w‖₂ = 1 such that wA = λ*w (w is a left row eigenvector).

  • For any other eigenvalue λ, it holds |λ| < λ* (λ* is a dominant eigenvalue, called the Perron eigenvalue).

  • λ* is unique and v* is unique (the only vector of unit length associated to λ*).

Definition 3.2. A non-negative matrix A is:

  • Irreducible if, for all i, j, there exists k ∈ ℕ* such that (A^k)_{ij} > 0.

  • Primitive if there exists k ∈ ℕ* such that, for all i, j, (A^k)_{ij} > 0.

Theorem 3.2. Perron-Frobenius Theorem for non-negative matrices

If A is a non-negative matrix, then:

  • There exist λ* > 0 and v* ≥ 0 with ‖v*‖₂ = 1 such that Av* = λ*v* (v* is a right column eigenvector).

  • There exists w ≥ 0 with ‖w‖₂ = 1 such that wA = λ*w (w is a left row eigenvector).

  • For any other eigenvalue λ, it holds |λ| ≤ λ* (λ* is a dominant eigenvalue, called the Perron eigenvalue).

  • If A is irreducible, then the vector v* is unique and it holds v* > 0.

  • If A is primitive, then the eigenvalue λ* is unique.

Note now that, for a graph G = (V,E) with adjacency matrix A: G is connected ⟺ for all 1 ≤ i, j ≤ |V|, there exists k ∈ ℕ* such that (A^k)_{ij} > 0. This means that the adjacency matrix of a connected graph is irreducible.

Now, if a graph is k-connected, i.e., there is a path of length k between all pairs of nodes, then its adjacency matrix is primitive. One sufficient condition for a graph to be k-connected is being connected and having A_ii > 0 for some i.

3.2 Random Walks on Graphs

A random walk on a graph, G = (V,E ), is a random process that starts from some vertex vi, and repeatedly moves to a neighbour vj chosen at random (for example with uniform distribution). The random walk, ξt, is therefore a random variable describing the position of a random walk after t steps. The probability of going from node i to node j is the transition probability,

Pij = P (ξt+1 = j|ξt = i).

The sequence of nodes can be regarded as a Markov chain, i.e. a discrete-time stochastic process, where the position ξ_0 is the initial state, drawn according to the initial distribution P⁰, and from this point the next state only depends on the current state. The t-step transition probability is

P^t_{ij} = P(ξ_t = j | ξ_0 = i).

Some examples are the path traced by a molecule in a liquid or a gas (Brownian motion), the price of a fluctuating stock, the financial status of a gambler, etc. The term random walk was first introduced by Karl Pearson in 1905.

The following is a basic visual example of a random walk on a graph:

[Figure]

Note that we can express the transition probability Pij in a matrix P. This matrix is the transition probabilities matrix, and it is row-stochastic or row-Markov, meaning,

P_ij ≥ 0, ∀i,j, and ∑_j P_ij = 1, ∀i.

This implies that

P·1 = P·(1, ..., 1)ᵀ = (1, ..., 1)ᵀ.

This means that the all-ones vector 1 is an eigenvector and 1 is an eigenvalue. Moreover, 1 is the largest eigenvalue, because

‖Pv‖₁ ≤ ‖v‖₁,

so, for an eigenvalue λ with eigenvector v,

|λ|·‖v‖₁ = ‖λv‖₁ = ‖Pv‖₁ ≤ ‖v‖₁,

so |λ| ≤ 1.

From the Perron-Frobenius theorem for non-negative matrices, we know that:

3.2.1 The Stationary Distribution

Let πt be the row vector giving the probability distribution of ξt, that is, πit is the probability that the random walk is at node i at time t. Therefore, we can write

πt+1 = πtP,

which, applied recursively, leads to

π^{t+1} = π⁰ P^{t+1}.

Or, we can take limits

lim_t π^{t+1} = lim_t π^t P.

If this limit exists, limtπt = π, then

π = πP.

Convergence is ensured if P is primitive (irreducible and aperiodic).

Example 3.1. The following example does not converge:

P = ( 0 1 ; 1 0 ).

A common way to perform random walks on graphs is with the uniform probability. That is,

P_ij = P(ξ_{t+1} = j | ξ_t = i) = 1/d_i if (i,j) ∈ E, and 0 otherwise,

where d_i is the degree of node i. Equivalently,

P_ij = A_ij / ∑_j A_ij = A_ij / d_i = (D⁻¹A)_ij.

The random sequence of vertices ξ_0, ξ_1, ..., ξ_t, ξ_{t+1}, ... visited on G is a Markov chain with state space V and transition probability matrix P = D⁻¹A.

Example 3.2. Given the graph:

[Figure]

The transition matrix for the uniform distribution is:

P = D⁻¹A, whose row i contains 1/d_i in the columns of i's neighbours and 0 elsewhere (here the non-zero entries are 1/3, 1/4 and 1/2, according to the node degrees).
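
A minimal NumPy sketch of building P = D⁻¹A and iterating π^{t+1} = π^t P (on a toy triangle graph, not the one in the figure):

import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)     # adjacency matrix of a triangle

d = A.sum(axis=1)                          # degrees
P = A / d[:, None]                         # P = D^{-1} A, row-stochastic

pi = np.full(len(A), 1 / len(A))           # start from the uniform distribution
for _ in range(100):
    pi = pi @ P                            # pi^{t+1} = pi^t P

print(pi)                                  # tends to d / (2|E|) = [1/3, 1/3, 1/3] here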

3.2.2 Balance Condition

A probability distribution π satisfies the balance condition if

πiPij = πjPji,∀i,j ∈ V.

If π satisfies the balance condition, then it is the stationary distribution for the undirected graph. To see this, notice that the balance condition can be rewritten as

π_i A_ij / d_i = π_j A_ji / d_j.

Since the graph is undirected, A_ij = A_ji, and then

π_i / d_i = π_j / d_j = c,

where c is a constant, for all i, j. Now, we know that ∑_i π_i = 1, so

1 = ∑_i π_i = ∑_i c·d_i = c·∑_i d_i.

Therefore,

∑_i d_i = 1/c.

Finally, it must be

π_i = d_i·c = d_i / ∑_j d_j = d_i / (2|E|).

In this case:

(πP)_i = ∑_j π_j P_ji = ∑_j π_j (1/d_j) A_ji = ∑_j c A_ji = c ∑_j A_ji = c ∑_j A_ij = c·d_i = d_i / ∑_j d_j = π_i.

Therefore, we have seen that the stationary probabilities are proportional to the degrees of the vertices.

In particular, if G is d-regular, i.e., all nodes have degree d, then

π_i = d / (2|E|) = d / (d·n) = 1/n

is the uniform distribution. With this setup, a random walk traverses every edge with the same frequency.

The balance condition implies time-reversibility: the reversed walk is also a Markov chain.

3.2.3 Hitting Time

Definition 3.3. The expected hitting time, Hij, is the expected number of steps before node j is reached in a random walk starting at node i:

H_ij = 1 + ∑_k P_ik H_kj if i ≠ j, and H_ij = 0 otherwise.

Remark. In general, H_ij ≠ H_ji, so H is not symmetric.

Remark 3.2. H follows the triangle inequality

Hij ≤ Hik + Hkj.

Definition 3.4. The commute time, Cij, is the expected number of steps in a random walk starting at node i, reaching node j and coming back to i again:

Cij = Hij + Hji.

3.2.4 Lazy Random Walk

The lazy random walk is a variation of the random walk in which the walk stays at the current node with probability 1/2, and continues the walk with the remaining probability.

In this case, the transition matrix is

P_ij = 1/2 if i = j, 1/(2d_i) if (i,j) ∈ E, and 0 otherwise.

If Q is the transition matrix for the uniform random walk, then

π^{t+1} = π^t P = (1/2)π^t + (1/2)π^t Q.

Proposition 3.1. If the lazy random walk converges and the uniform random walk is irreducible, then it converges to the same stationary distribution as the uniform random walk.

Proof. Let Q be the transition matrix for the uniform random walk, then, the stationary distribution is

π = πQ.

For the lazy random walk, say the stationary distribution is π′. Then:

π′ = (1/2)π′ + (1/2)π′Q ⟺ (1/2)π′ = (1/2)π′Q ⟺ π′ = π′Q.

Therefore, since Q is irreducible, the uniqueness of π implies π′ = π. □

3.3 PageRank

The web is very heterogeneous by nature, and certainly huge. We cannot expect the web graph to be connected. Page and Brin proposed a way to overcome this problem, by ensuring the convergence of random walks on the web graph.

The idea is to fix a positive constant, p, between 0 and 1, called the damping factor, which represents the probability that a user leaves the current page and goes to a random web page.

Therefore, the page rank transition matrix is

Pg = (1 - p)P + pB,

where B = (1/n)·J and J is the n × n all-ones matrix.

p is usually chosen small, like 0.15, modelling a situation in which a surfer will, most of the time, follow the outgoing links and move on to one of the neighbours. A smaller percentage of the time, the surfer will abandon the current page and choose arbitrarily a different page from the web.

Proposition 3.2. Pg is stochastic.

Proof. We need to prove that, for all i, it holds that ∑_j (Pg)_ij = 1.

∑_j (Pg)_ij = ∑_j [(1-p)P_ij + pB_ij] = (1-p)∑_j P_ij + p∑_j B_ij = (1-p)·1 + p·∑_j (1/n) = 1 - p + p·n·(1/n) = 1 - p + p = 1. □
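
A minimal NumPy sketch of the construction and the resulting random walk (toy directed graph and parameters, not the course's implementation; it assumes every node has at least one outgoing link):

import numpy as np

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)     # toy web graph

p = 0.15                                   # damping factor
n = len(A)
P = A / A.sum(axis=1, keepdims=True)       # uniform transition matrix
Pg = (1 - p) * P + p * np.ones((n, n)) / n

pi = np.full(n, 1 / n)
for _ in range(100):
    pi = pi @ Pg                           # power iteration; converges since Pg > 0
print(pi)                                  # PageRank scores (they sum to 1)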

4 Centrality Measures

Centrality measures try to answer the question ’What characterizes an important vertex?’. They define a real-valued function on the vertices of the graph, m : V → ℝ, that serves to rank the vertices. However, there are many different ways to define such a function, leading to different definitions of centrality, based on cohesiveness, ability to transfer information across the network, to influence other nodes, to control information flow, etc.

There are many centrality measures that count the number of paths through a given vertex. These differ in how the relevant walks are defined and counted. For example, if we only consider paths of length one, we would be computing degree centrality, while if we allow paths of arbitrary length, we would be computing eigenvector centrality.

4.1 Degree Centrality

The more neighbours a vertex has, the higher its communication ability is, increasing its importance.

Definition 4.1. Given the graph G = (V,E), with adjacency matrix A, the degree centrality is computed as

D = Au,

where u = 1_n is the all-ones vector of size n.

Example 4.1. Consider the following graph:

[Figure]

The degree centrality is

D = Au =
( 0 1 0 0 1 0 )   ( 1 )   ( 2 )
( 1 0 1 0 1 0 )   ( 1 )   ( 3 )
( 0 1 0 1 0 0 ) · ( 1 ) = ( 2 )
( 0 0 1 0 1 1 )   ( 1 )   ( 3 )
( 1 1 0 1 0 0 )   ( 1 )   ( 3 )
( 0 0 0 1 0 0 )   ( 1 )   ( 1 ),
so the nodes with highest value are nodes (2,4,5).

One drawback of this measure is that it is very likely that several nodes present exactly the same value, making a unique ranking of vertices difficult.

4.2 Neighbourhood centrality

This measure corresponds to the average degree of each vertex's neighbours. We can understand it as measuring how much a vertex is related to influential vertices.

Definition 4.2. Given the graph G = (V,E), with adjacency matrix A, the neighbourhood centrality is computed as

N = D⁻¹A·deg,

where D is the diagonal matrix with D_ii = d_i, the degree of vertex i, and deg = Au is the degree centrality vector. Each vertex's measure is

N_v = (∑_{u∈N_v} d_u) / d_v.

Example 4.2. The neighbourhood centrality of the previous example graph is

N = D⁻¹A·deg = ( (3+3)/2, (2+2+3)/3, (3+3)/2, (2+3+1)/3, (2+3+3)/3, 3/1 )ᵀ = ( 3, 2.33, 3, 2, 2.66, 3 )ᵀ,

so the nodes with highest value are nodes (1,3,6).

4.3 Eigenvector Centrality

A natural extension of degree centrality is to consider all reachable nodes, not just neighbours. Eigenvector centrality measures a node’s importance while considering the importance of its neighbours. A high eigenvector centrality means that a node is connected to many nodes that have high scores themselves.

Definition 4.3. Given the graph G = (V,E), with adjacency matrix A, the eigenvector centrality of node v is

E_v = (1/λ) ∑_{u∈N_v} A_vu E_u,

where λ is a parameter. Note that this can be written as

E = (1/λ)AE,

or

AE = λE.

This means that E is an eigenvector of A, for the eigenvalue λ.

Bonacich suggested that the eigenvector of the largest eigenvalue of A could make a good network centrality measure.

The eigenvector E must be non-negative and according to the Perron-Frobenius theorem, the largest λ enforces this property, making it a suitable value.

Example 4.3. Let’s compute E for the previous example graph. The matrix A has as largest eigenvalue λ = 2.54, and the corresponding eigenvector is

E = (2.5, 3.1, 2.2, 2.5, 3.2, 1)ᵀ.

Note that it is usually unfeasible to compute the eigenvalues and eigenvectors. It is more usual to get the vector iteratively as

E^k = A · E^{k-1} / ‖E^{k-1}‖.
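
A minimal power-iteration sketch (NumPy) on the adjacency matrix of the example graph of this section:

import numpy as np

A = np.array([[0, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 0, 1, 0, 0]], dtype=float)

E = np.ones(len(A))
for _ in range(100):
    E = A @ (E / np.linalg.norm(E))     # E^k = A E^{k-1} / ||E^{k-1}||

print(E / np.linalg.norm(E))            # direction of the dominant eigenvector of A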

4.4 PageRank Centrality

Google’s PageRank is a variant of the eigenvector centrality, which uses in-degree to award one centrality point for every link a node receives. As we saw, the algorithm is based on a web surfer who is randomly clicking on links, with a certain probability to go to a different place of the web (the damping factor).

Therefore, we define the matrix

Pg = (1 - p)P + pB,

where P_ij = 1/d_i if j ∈ N_i and 0 otherwise, and B_ij = 1/n.

Now, we apply the eigenvector centrality to this modified matrix, as

P_g E_g = λE_g = E_g,

with λ = 1 because Pg is stochastic.

Or, iteratively as

E_g^k = P_g E_g^{k-1}.

Note that in this case it is not necessary to normalize the vector at each step, because P_g is stochastic. A good initial vector is E_g⁰ = (1/n, ..., 1/n)ᵀ.

4.5 Katz (or alpha) centrality

The main problem with eigenvector centrality is that it only works well when the graph is strongly connected (so that Perron-Frobenius is applicable in its stronger form). Real networks do not usually have this property, especially if they are directed. The vertices that are not in strongly connected components will have value 0.

A way to work around this problem was proposed by Leo Katz. The idea is to give each node a minimum, positive amount of centrality, that it can transfer to other nodes, so:

K_v = α ∑_u A_vu K_u + β,

where K_v is the Katz centrality of node v, β is a vector whose elements are all equal to a given positive constant and α ∈ (0,1) is a parameter. Equivalently, this is

K  = αAK  + β,

so

(I - αA )K = β,

and

K = (I - αA )-1β.

For this to work, I - αA must be invertible, which happens if and only if |I - αA| ≠ 0, i.e., |(1/α)I - A| ≠ 0, so 1/α must not be an eigenvalue of A. This is ensured if we take 1/α > λ_max, that is, 0 < α < 1/λ_max.

An iterative way to compute K is

K = (∑_{k=1}^∞ α^k A^k)·u.

The weight α^k decreases at each iteration, so α acts as an attenuation factor.
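
A minimal NumPy sketch, solving (I - αA)K = β directly (toy path graph and an assumed choice of α and β):

import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

lam_max = max(abs(np.linalg.eigvals(A)))
alpha = 0.5 / lam_max                   # any 0 < alpha < 1/lambda_max works
beta = np.ones(len(A))                  # constant exogenous centrality

K = np.linalg.solve(np.eye(len(A)) - alpha * A, beta)
print(K)                                # Katz centrality scores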

4.6 Clustering Coefficient Centrality

Triadic closure is the property among three nodes A, B, and C (representing people, for instance), that if the connections A-B and A-C exist, there is a tendency for the new connection B-C to be formed.

The clustering coefficient measures the proportion of a node's neighbours that are connected to each other.

Definition 4.4. Given a graph G = (V,E ), with adjacency matrix A, the clustering coefficient of node v is

CC_v = |{ {u,v,w} : (u,v), (v,w), (u,w) ∈ E }| / C(d_v, 2),

where the numerator is the number of triangles involving v and its neighbours, and the denominator is the total number of possible links between v’s neighbours.

The more densely connected the neighbourhood of v is, the higher is its clustering coefficient.
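
As a small illustration (a Python sketch, not from the notes; the adjacency list encodes the example graph of this section), the coefficient just counts connected pairs of neighbours:

from itertools import combinations

def clustering_coefficient(adj, v):
    neigh = adj[v]
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(neigh, 2) if w in adj[u])   # connected neighbour pairs
    return links / (k * (k - 1) / 2)                                   # divide by C(d_v, 2)

adj = {1: {2, 5}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3, 5, 6}, 5: {1, 2, 4}, 6: {4}}
print({v: clustering_coefficient(adj, v) for v in adj})
# {1: 1.0, 2: 0.33..., 3: 0.0, 4: 0.0, 5: 0.33..., 6: 0.0}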

Example 4.4. The clustering coefficient of the graph example that we’ve been working with is

CC = (1, 1/3, 0, 0, 1/3, 0)ᵀ.

4.7 Closeness Centrality

Closeness centrality is a measure of how close a node is, on average, to the rest of the nodes, in terms of shortest paths. It measures the average distance between a node v and all other nodes in the network. Thus, the more central a node is, the closer it is to all other nodes.

Definition 4.5. Given a graph G = (V,E ), the closeness centrality of node v is

CL_v = 1 / ∑_{r≠v} dist(v,r).

It can be normalized by the factor N - 1:

CL_v = (N - 1) / ∑_{r≠v} dist(v,r).

An alternative is the harmonic centrality, obtained as

H_v = ∑_{r≠v} 1/dist(v,r),

with 1/dist(v,r) = 0 if there is no path from v to r.
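
A small Python sketch (unweighted graph as an adjacency list; here the normalization is taken over the vertices actually reached by the BFS):

from collections import deque

def closeness(adj, v):
    dist = {v: 0}
    q = deque([v])
    while q:                                  # BFS from v
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    total = sum(d for u, d in dist.items() if u != v)
    return (len(dist) - 1) / total if total else 0.0

adj = {1: {2, 5}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3, 5, 6}, 5: {1, 2, 4}, 6: {4}}
print({v: round(closeness(adj, v), 2) for v in adj})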

4.8 Betweenness Centrality

A family of betweenness measures is defined to capture a node's importance as a conduit of information flow in the network. This has wide applications in network theory: in a telecommunications network, for example, a node with higher betweenness centrality has more control over the network, since more information will pass through that node.

The most well-known betweenness metric measures the number of times a node is on a shortest path between two nodes.

Definition 4.6. Given a graph G = (V,E ), the betweenness centrality of node v is

B_v = ∑_{s≠v≠t} σ_{s,t}(v) / σ_{s,t},

where σ_{s,t} is the number of shortest paths from source node s to target node t, and σ_{s,t}(v) is the number of shortest paths between these two nodes going through v.

This measure can be normalized by the number of ordered pairs not including v:

  • For directed graphs:

B_v = (1/((n-1)(n-2))) ∑_{s≠v≠t} σ_{s,t}(v)/σ_{s,t}.

  • For undirected graphs:

B_v = (2/((n-1)(n-2))) ∑_{s≠v≠t} σ_{s,t}(v)/σ_{s,t}.

Example 4.5. For the undirected star graph:

[Figure]

The center vertex has a betweenness of (n-1)(n-2)/2 (or 1, if we normalize it), while the leaves have a betweenness of 0.

Exercise 4.1. What about the following graphs?

5 MapReduce Computation Model

The advent of big data and increasing analysis needs have favoured the design of parallel algorithms, especially in the realm of big data processing pipelines, with a trade-off between communication costs and degree of parallelism.

MapReduce is a processing paradigm that works on top of distributed environments. More precisely, it was built on top of the Google File System (GFS) and the Hadoop Distributed File System (HDFS), which are used to manage large-scale data and to be tolerant to hardware and network faults. To do this, HDFS splits files into large blocks and distributes them across nodes in a cluster, and MapReduce is the programming model used to manage many large-scale parallel computations.

Basically, the idea is that the data is first split, then some operation is applied to it, and then the results are merged to produce the final output. For this, we only need to define the Map and Reduce functions, while the system manages the parallel execution on distributed data and the coordination between tasks, dealing with the possibility that one of the tasks may fail.

Example 5.1. Word Counter

Consider a text file split into partitions A, B, C, D across different nodes. We want to count how many times each word appears in the whole document. For this, we can use MapReduce as follows:

  1. Map: for each word, w, in each partition, generate the pair (w,1).

  2. Shuffle/sort: collects and groups the pairs by key (word), in order to guarantee that the same key will be processed by the same reduce task. Shuffling is the process of redistributing data from Map nodes to Reduce nodes.

    In our example, we would have, for each word w, the pairs (w,[1,...,1]), with as many 1s as w appearances.

  3. Reduce: for each input (w,[1,...,1]), output (w, Nw), where Nw is the amount of 1s.

The Map task will typically process many words in one or more chunks. If a word, w, appears m times among all chunks assigned to that process, there will be m key-value pairs (w,1) among its output.

To perform the grouping and distribution to the Reduce task, the master controller merges the pairs by key and produces a sequence of (w,[1,...,1]). Since it knows how many reduce tasks there will be, r, it will produce r lists, putting a list in one of r local files destined to one of the Reduce tasks. Each key is assigned as input to one, and only one, Reduce task.

The Reduce task executes one or more reducers, one per key. The outputs from all reducers are merged into a single final file.
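
The whole pipeline can be simulated in a few lines of single-process Python (toy data, just to make the three stages explicit; a real deployment distributes them across nodes):

from collections import defaultdict

chunks = [["a", "b", "a"], ["b", "c"]]            # partitions of the document

# Map: emit (word, 1) for every word of every chunk
mapped = [(w, 1) for chunk in chunks for w in chunk]

# Shuffle/sort: group the emitted values by key
groups = defaultdict(list)
for w, v in mapped:
    groups[w].append(v)

# Reduce: sum the list of 1s of each word
counts = {w: sum(vs) for w, vs in groups.items()}
print(counts)                                     # {'a': 2, 'b': 2, 'c': 1}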

5.1 The Map Function

In general, a map function can be defined as a function m_f : E_1^n → E_2^n, where E_i is the domain of the input (1) or output (2) and f : E_1 → E_2, that applies f to each coordinate. That is:

mf ([e1,...,en]) = [f (e1),...,f (en)].

For example:

m⋅2([2,3,6]) = [4,6,12].

In the MapReduce scheme, map is more restrictive, as the function f must produce a key-value pair. That is, for all i = 1,...,n, it is

f(e_i) = (k_i, v_i).

For example, in the word counter example:

mf ([”a”,”b”,”a”]) = [(”a”,1),(”b”,1),(”a”,1)].

5.2 Shuffling/Grouping Function

The shuffle function consists in grouping the outputs of the map function by key, so

s([(k1,v1),...,(kn,vn)]) = [(k1,(vj : kj = k1,∀j = 1,...,n )),...].

Following the previous example:

s([(”a”,1),(”b”,1),(”a”,1)]) = [(”a”,[1,1]),(”b”,[1])].

5.3 Reduce Function

Generally, a reduce function applies to a vector/row, and outputs a single value, applying the aggregation function f:

rf ([v1,...,vn]) = f (v1,...,vn).

In MapReduce, reduce applies to each output of the shuffle function with the same key:

\[
r_f((k,[v_1,\dots,v_n])) = (k, f(v_1,\dots,v_n)),
\]

applied separately to each key $k_1,\dots,k_m$ produced by the shuffle.

Following the previous example:

rsum ([(”a”,[1,1]),(”b”,[1])]) = [(”a”,2) ,(”b”,1)].

A MapReduce pipeline can be a composition of several such stages, $r_{f_r} \circ s \circ m_{f_m}$.

The process is illustrated below:

PIC

5.4 MapReduce Execution Model

Whenever we launch the execution of a MapReduce pipeline, the following happens:

PIC

5.4.1 Coping with Node Failures

If the master node fails, the entire MapReduce job must be restarted.

If a worker node fails, this is detected and managed by the master, which periodically pings the worker processes. All the map tasks assigned to this worker have to be redone in this case.

5.4.2 Algorithms by MapReduce

This paradigm is not a solution to every problem; in fact, it only makes sense when files are very large and rarely updated. Its original purpose was to execute very large matrix-vector multiplications.

5.5 Use-Case: Matrix-Vector Multiplication by MapReduce

Let $M$ be an $n \times n$ square matrix and $V$ a vector of size $n$. Their product,

W = M V,

is defined by

\[
w_i = \sum_{j=1}^{n} m_{ij} v_j.
\]

We can store $M$ and $V$ in a file in HDFS as triples $((i,j), m_{ij})$ for $M$ and pairs $(j, v_j)$ for $V$. Now we can perform the computation by MapReduce as:

For this to work, all the pairs from $V$ must be available in all chunks ($V$ cannot be stored in a distributed fashion).

More concretely, we can define the functions:

map(key, val):
  # key = (i, j), val = m_ij
  i = key(1)
  j = key(2)
  for (j2, v) in V:
    if j == j2:
      emit(i, val*v)

reduce(key, val):
  # key = i, val = list of partial products m_ij * v_j
  sum = 0
  for v in val:
    sum += v
  emit(key, sum)

PIC

Now, if $n$ is large, $V$ might not fit in the main memory of a worker node, and a large number of disk accesses may be required. We can improve the approach by splitting $M$ into vertical stripes and $V$ into corresponding horizontal stripes, so that each map task only needs the stripe of $V$ that matches its chunk of $M$.

5.5.1 Matrix Multiplication

This approach can be extended to matrix multiplication. Now, let M be a matrix of size n1 ×n2 and N a matrix of size n2 × n3, the product P = MN is a matrix of size n1 × n3, where

\[
p_{ik} = \sum_{j=1}^{n_2} m_{ij} n_{jk}.
\]

The matrices are stored as (M, (i,j),mij) and (N, (j,k),njk).

In addition, $M$ could be divided into $K$ vertical stripes of size $(n_1, n_k)$ and $N$ into $K$ horizontal stripes of size $(n_k, n_3)$, where $\sum_k n_k = n_2$. In this setup, we can apply the algorithm to compute each $M_k N_k$ and then sum them all.

The functions can be defined more precisely as:

map_1(T, (a,b), val):
  # M is stored as (M, (i,j), m_ij): key on the column index j
  if T == M:
    emit(b, (M, a, val))          # (j, (M, i, m_ij))
  # N is stored as (N, (j,k), n_jk): key on the row index j
  else:
    emit(a, (N, b, val))          # (j, (N, k, n_jk))

reduce_1(key, val):
  # key = j, val = the list of tuples (M, i, m_ij) and (N, k, n_jk)
  for v in val:
    for w in val:
      if v(1) == M and w(1) == N:
        i = v(2)
        M_ij = v(3)
        k = w(2)
        N_jk = w(3)
        emit((i, k), M_ij*N_jk)

map_2(key, val):
  # identity map: key = (i, k), val = a partial product
  emit(key, val)

reduce_2(key, val):
  # sum all partial products of the output cell (i, k)
  sum = 0
  for v in val:
    sum += v
  emit(key, sum)

5.6 Relational Algebra by MapReduce

5.6.1 Selection

Let R(A1,...,An) be a relation stored as a file in HDFS. The elements of this file are the tuples of R. The selection operator, σC(R) can be defined using MapReduce as:
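
A minimal Python sketch of one standard formulation (the predicate C and the helper names are illustrative, not taken from the notes):

def selection_map(t, C):
    # keep the tuple only if it satisfies the condition C; key and value are the tuple itself
    return [(t, t)] if C(t) else []

def selection_reduce(key, values):
    # identity reducer: simply re-emit the selected tuple
    return [key]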

5.6.2 Projection

For the projection, πA(R), we can do:
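
A corresponding illustrative sketch (tuples are modelled as dictionaries attribute -> value):

def projection_map(t, A):
    # emit the projected tuple as both key and value
    t_A = tuple(t[a] for a in A)
    return [(t_A, t_A)]

def projection_reduce(key, values):
    # all duplicates of the projected tuple collapse into a single output
    return [key]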

5.6.3 Join

$R(A) \bowtie_B S(C)$, with $A, B, C$ sets of attributes satisfying $B \subseteq A$ and $B \subseteq C$, can be implemented with MapReduce as:
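
An illustrative sketch of the usual reduce-side join (the relation names and helpers are assumptions):

def join_map(relation_name, t, B):
    # key on the join attributes B; tag each tuple with its relation of origin
    key = tuple(t[b] for b in B)
    return [(key, (relation_name, t))]

def join_reduce(key, values):
    # combine every R-tuple with every S-tuple that shares the same join key
    r_tuples = [t for name, t in values if name == "R"]
    s_tuples = [t for name, t in values if name == "S"]
    return [(key, (tr, ts)) for tr in r_tuples for ts in s_tuples]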

5.6.4 Aggregation

The aggregation operator, $\gamma_{A,\theta(B)}(R)$, where $A \cup B$ is the set of attributes of $R$ and $A \cap B = \emptyset$, can be defined with MapReduce as:
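
One more illustrative sketch (theta stands for the aggregation function, e.g. sum or max):

def aggregation_map(t, A, B):
    # group by the attributes A and carry the attributes B to be aggregated
    return [(tuple(t[a] for a in A), tuple(t[b] for b in B))]

def aggregation_reduce(key, values, theta):
    # apply the aggregation function theta to the values grouped under each key
    return [(key, theta(values))]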

5.7 Some Issues of MapReduce

6 Spark Parallel Computing Framework

Nowadays, data is growing faster than processing speeds, and so the only possible solution is to parallelize on large clusters.

Apache Spark is an open-source framework for large-scale data processing, providing an interface for programming clusters with implicit data parallelism and fault tolerance. It extends a programming language with a read-only data structure distributed over a cluster of machines, the Resilient Distributed Datasets (RDDs), maintained in a fault-tolerant way. RDDs were developed in 2012 in response to limitations in Hadoop's MapReduce, which forces a particular linear dataflow as a sequence of HDFS reads and writes.

Spark is up to 100 times faster than traditional Hadoop thanks to its in-memory data processing:

PIC

6.1 Spark’s Engine Properties and Components

Spark was originally written in Scala, a high-level language for the JVM. There are APIs for Java, Scala, Python, R, etc.

The Dataframe API was released as an abstraction on top of the RDD, as well as packages like MLlib or GraphX, that can be used for machine learning and graph analytics. These APIs facilitate the implementation of both iterative algorithms and interactive or exploratory data analysis.

Spark requires a cluster manager and a distributed storage system:

Spark also supports different sources of data, in different formats and from different databases. All these relationships are shown below:

PIC

6.2 Spark’s Execution Architecture

Data is split into partitions or blocks, and the driver assigns tasks to each worker, which reads an HDFS block and has a cache. Each worker processes and, if necessary, caches data, sending the results to the driver when done.

PICPICPICPIC

6.3 Resilient Distributed Datasets (RDDs)

RDDs are immutable and distributed collections of objects, spread across a cluster, and stored in RAM or on disk (when they are persisted). They are statically typed, i.e., RDD[T] has objects of type T. The elements can be any Python, Java, or Scala objects, including user-defined classes.

RDDs are built via parallel transformations and computed via parallel actions on distributed datasets, executed lazily. Internally, RDDs are split into multiple partitions, which may be computed on different nodes of a cluster.

Inside Apache Spark, the workflow is managed as a directed acyclic graph (DAG). Nodes represent RDDs while edges represent the operations executed on the RDDs. Spark keeps track of the set of dependencies between different RDDs. This is called the lineage graph.

Example 6.1. Python example: finding the first line mentioning Python.

# spark context creation
sc = pyspark.SparkContext(...)

# creating an RDD of strings with textFile()
lines = sc.textFile("README.txt")

# Transformations:
#   Construct a new RDD from a previous one; one common transformation is filtering data that matches a predicate
pythonLines = lines.filter(lambda line: "Python" in line)

# Actions:
#   Compute a result based on an RDD, and either return it to the driver program or save it to an external storage system
pythonLines.first()

Some notes:

6.3.1 RDDs Operations
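
A few common transformations, as a hedged illustration in PySpark (assuming the SparkContext sc from Example 6.1); each call returns a new RDD and is evaluated lazily:

nums = sc.parallelize([1, 2, 3, 3, 4])
squares = nums.map(lambda x: x * x)                                # [1, 4, 9, 9, 16]
evens = nums.filter(lambda x: x % 2 == 0)                          # [2, 4]
words = sc.parallelize(["a b", "c"]).flatMap(lambda s: s.split())  # ["a", "b", "c"]
unique = nums.distinct()                                           # [1, 2, 3, 4]
both = squares.union(evens)                                        # concatenation of the two RDDs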

6.3.2 RDDs Actions
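
Likewise, a few common actions (again an illustration); these actually trigger the computation and return results to the driver or write them to storage:

nums = sc.parallelize([1, 2, 3, 4])
nums.collect()                      # [1, 2, 3, 4], the whole RDD brought back to the driver
nums.count()                        # 4
nums.take(2)                        # [1, 2]
nums.reduce(lambda a, b: a + b)     # 10
nums.saveAsTextFile("out_dir")      # writes one part file per partition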

6.3.3 Caching

Spark RDDs are lazily evaluated, and sometimes we use the same RDD multiple times. Naively, Spark will recompute the RDD and all of its dependencies each time we call an action on the RDD. To avoid this, we can ask Spark to persist data using persist().

Notice that calling persist() does not force the evaluation of the RDD.

If we cache too much data, Spark will automatically delete old partitions. For the memory-only storage levels, it will recompute these partitions the next time they are accessed. This means that caching unnecessary data can lead to eviction and increased re-computation time.
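
A short illustration of persist(), reusing the pythonLines RDD from Example 6.1:

pythonLines.persist()   # mark the RDD to be kept in memory after its first computation
pythonLines.count()     # first action: computes the RDD and caches its partitions
pythonLines.first()     # reuses the cached partitions instead of re-reading the file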

7 Community Detection Approaches

7.1 k-Clique

Definition 7.1. A clique is a complete subgraph.

A k-clique is a complete subgraph with k nodes.

Example 7.1. In the following picture, we observe the brute-force process to find all 4-cliques in this 7-node graph. For this, we need to check $C_7^4 = 35$ combinations.

PIC

Most versions of clique problems are hard, and a common problem is that of finding maximal cliques, i.e., cliques that cannot be extended by adding another vertex. For this problem, we have different algorithms, like:

There are also algorithms with better theoretical complexity, but Bron-Kerbosch and some variants that improve it are more efficient in practice. The basic form of the algorithm is as follows:

Bron-Kerbosch(G=(V,E)):
  # R: current clique, P: candidate vertices, X: already-processed vertices
  # Initialization
  P = V
  X = {}
  R = {}

  def BronKerbosch(R, P, X):
    if P == {} and X == {}:
      report R as a maximal clique

    for v in P:
      P_prime = P.intersection(neighbours(v))
      X_prime = X.intersection(neighbours(v))
      R_prime = R.union({v})
      BronKerbosch(R_prime, P_prime, X_prime)
      P = P - {v}
      X = X.union({v})

For example, the following tree represents the execution of the algorithm for the shown graph:

PIC
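
For reference, networkx's find_cliques implements a Bron-Kerbosch variant with pivoting; a quick usage example (the graph below is made up):

import networkx as nx

# A 4-clique {0,1,2,3} plus a triangle {3,4,5} attached to it
G = nx.Graph([(0,1),(0,2),(0,3),(1,2),(1,3),(2,3),(3,4),(3,5),(4,5)])
for clique in nx.find_cliques(G):   # enumerates all maximal cliques
    print(sorted(clique))           # [0, 1, 2, 3] and [3, 4, 5]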

7.2 k-Core

k-Cliques are interesting as a concept, but they are very restrictive, so some relaxations have been proposed. A very well-known one is the k-core [1]. The basic idea is that networks may present a core-periphery structure, where the core contains many densely connected nodes, while the periphery has a sparser structure, with peripheral nodes called whiskers.

Definition 7.2. The k-core of a graph G is a maximal connected subgraph of G where vertices have at least degree k. Also called k-degenerate.

The approach to find k-cores is k-core decomposition. The idea is that, given a graph G = (V,E), we delete recursively all vertices, and edges connecting them, of degree less than k, to extract the k-core, such that:

The algorithm goes as follows:

kCore(G=(V,E)):
  L = {} # Maps each vertex to the highest k-core it belongs to
  d = {} # Degree of each vertex
  D = {} # Dictionary mapping each degree to the list of vertices with that degree
  d_max = max([degree(v) for v in V])

  for v_i in V:
    d[v_i] = degree(v_i)
    D[d[v_i]].append(v_i)

  for k in range(0, d_max + 1):
    while not D[k].empty():
      v_i = D[k].pop()  # Get a vertex of the k-core
      L[v_i] = k

      for u_j in neighbours(v_i):  # Update neighbours, removing the edges to the extracted vertex
        if d[u_j] > k:
          D[d[u_j]].remove(u_j)
          D[d[u_j] - 1].append(u_j)
          d[u_j] -= 1

  return L
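
As a usage note (an illustration, not part of the notes), networkx exposes both the decomposition and the extraction of a given k-core:

import networkx as nx

G = nx.karate_club_graph()
core = nx.core_number(G)          # dict: node -> highest k-core it belongs to (the L of the algorithm above)
print(max(core.values()))         # largest k such that the k-core is non-empty
print(sorted(nx.k_core(G, k=4)))  # nodes of the 4-core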

Example 7.2. Let’s go through an example:

PIC

First, we initialize d and D:

i      : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
d[v_i] : 4 6 5 7 4 6 5 5 2 2  2  3  4  4  4  3  3  5  3  2  3  1  1  1  1

degree : 7    6      5            4                3                  2             1
D[deg] : [4]  [2,6]  [3,7,8,18]   [1,5,13,14,15]   [12,16,17,19,21]   [9,10,11,20]  [22,23,24,25]

Now, we start the procedure. For k = 1, the first vertex to be taken out is 22, so we set L[22] = 1. Updating its neighbours, the degree of vertex 21 drops from 3 to 2, so 21 moves from D[3] to D[2]:

degree : 7    6      5            4                3               2                1
D[deg] : [4]  [2,6]  [3,7,8,18]   [1,5,13,14,15]   [12,16,17,19]   [9,10,11,20,21]  [23,24,25]

v_i    : 22
L[v_i] : 1

Now, with 23: we set L[23] = 1, and the degree of its neighbour 21 drops from 2 to 1, so 21 moves from D[2] to D[1]:

degree : 7    6      5            4                3               2              1
D[deg] : [4]  [2,6]  [3,7,8,18]   [1,5,13,14,15]   [12,16,17,19]   [9,10,11,20]   [21,24,25]

v_i    : 22 23
L[v_i] : 1  1

Now, with 24: we set L[24] = 1, and the degree of its neighbour 14 drops from 4 to 3, so 14 moves from D[4] to D[3]:

degree : 7    6      5            4             3                  2              1
D[deg] : [4]  [2,6]  [3,7,8,18]   [1,5,13,15]   [12,16,17,19,14]   [9,10,11,20]   [21,25]

v_i    : 22 23 24
L[v_i] : 1  1  1

And so on...

7.3 Community Detection Problem

A community in a network refers to the occurrence of clusters or groups of nodes that are more densely connected internally than with the nodes outside the community. There are mainly two cases:


PICPIC
Figure 1: Non-overlapping communities (left) and overlapping communities (right).


There are several considerations or properties of communities that can be used to obtain them:

This is a combinatorial optimisation problem, and it is NP-hard if we want an exact solution. However, there are many approaches that leverage greedy techniques or approximate heuristics, which we are going to see now.

7.3.1 k-Clique Community

Definition 7.3. A k-clique community is the union of all k-cliques that can be reached from one to the other through a sequence of adjacent k-cliques.

Two k-cliques are adjacent if they share k - 1 vertices.

To find k-clique communities, we can use the Clique Percolation Method [2], which consists in:

  1. Find all maximal cliques, sorting them based on degrees.

  2. Create clique overlap matrix, where M[i,j] = 1 if cliques i and j overlap in at least k - 1 nodes.

  3. Communities are the connected components that arise from this.
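
As a usage illustration (the notes do not prescribe a particular library), networkx implements the clique percolation method as k_clique_communities:

import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.karate_club_graph()
for community in k_clique_communities(G, 3):   # 3-clique communities
    print(sorted(community))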

7.3.2 Louvain Algorithm

The most popular community detection algorithm is the Louvain algorithm, which is based on the concept of modularity.

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for social networks, allowing the modeler to incorporate arbitrary degree distributions.

The idea is to assign a degree $d_v$ to each node $v$. Each degree is a half-link, or stub, and the sum of stubs must be even to be able to build a graph, i.e., $\sum_{v \in V} d_v = 2m$. Then, we choose two stubs uniformly at random and connect them by a link, then choose another pair from the remaining $2m - 2$ stubs, and connect them. We repeat this process until there are no more stubs. The resulting network keeps the same degrees but randomly pairs up nodes.

A realization might include cycles, self-loops or multi-links. The uniform distribution of the matching must be kept, so these are not excluded. However, their expected number goes to zero for large networks. The probability that a given stub of $v$ is matched with one of the $d_w$ stubs of $w$ is $\frac{d_w}{2m-1}$; since node $v$ has $d_v$ stubs, the probability of $v$ being connected to $w$ is $\frac{d_v d_w}{2m-1}$, which is almost the same as $\frac{d_v d_w}{2m}$ for large $m$.

Now, modularity measures the relative density of links inside communities with respect to links outside communities. There are different methods to compute the modularity, but in the most common version, the randomization of the edges is done to preserve the degree of each vertex.

The basic idea is to compare the number of links within communities with the number expected on the basis of chance. The randomly generated network has fewer links between nodes of the same community, and more between nodes of different communities.

It can be computed as

\[
Q = \frac{1}{2m} \sum_i \sum_j \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \delta(c_i, c_j)
\]

or

\[
Q = \sum_i \sum_j \left[ \frac{A_{ij}}{2m} - \frac{d_i d_j}{(2m)^2} \right] \delta(c_i, c_j),
\]

where di is the degree of vertex i, and ci is the community of vertex i. δ is the Kronecker function, so δ(ci,cj) is 1 if ci = cj, and 0 otherwise.

The modularity satisfies $-1 \leq Q \leq 1$, and it is positive when the number of links within communities exceeds the number expected in a randomly rewired network.

This is the modularity of a graph, but we can also compute the modularity of a community, c, as

\[
Q_c = \frac{\sum_{i \in c} \sum_{j \in c} A_{ij}}{2m} - \frac{\sum_{i \in c} d_i \sum_{j \in c} d_j}{(2m)^2}
\]

or

\[
Q_c = \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{in+out}}{2m} \right)^2,
\]

where $\Sigma_{in}$ is the sum of edge weights between nodes within the community $c$ (each edge is considered twice), and $\Sigma_{in+out}$ is the sum of all edge weights for nodes within the community, including incident ones from the other communities.

Note that if there is only one community, then Q = 0.
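
To make the formula concrete, here is a small sketch (assuming an undirected, unweighted graph given by its adjacency matrix and one community label per node) that computes Q directly from the first definition above:

import numpy as np

def modularity(A, labels):
    # Q = (1/2m) * sum_ij [A_ij - d_i*d_j/(2m)] * delta(c_i, c_j)
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                              # degrees
    two_m = d.sum()                                # 2m, the sum of degrees
    labels = np.asarray(labels)
    delta = labels[:, None] == labels[None, :]     # delta(c_i, c_j)
    return ((A - np.outer(d, d) / two_m) * delta).sum() / two_m

# Two triangles joined by a single edge, split into their natural communities
A = [[0,1,1,0,0,0],
     [1,0,1,0,0,0],
     [1,1,0,1,0,0],
     [0,0,1,0,1,1],
     [0,0,0,1,0,1],
     [0,0,0,1,1,0]]
print(modularity(A, [0,0,0,1,1,1]))   # 5/14, about 0.357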

Now, finding the optimal value of the modularity is impractical, because it would require going through all possible partitions of the nodes into communities. The Louvain method is therefore a greedy heuristic that maximises modularity in 2 phases:

  1. Modularity is optimized by allowing only local changes to node communities memberships.

    1. Initially, each node is assigned its own community.

  2. The identified communities are aggregated into super-nodes, to build a new network.

    1. This phase is repeated until there is only one super node, the whole graph.

This algorithm is widely used for large networks, because it is efficient and produces dense communities. It has a time complexity of $O(n \log n)$ and can be used on weighted graphs. In addition, it provides hierarchical communities, allowing us to define the level of detail of the communities, and also to obtain subcommunities within communities. A sample visualization is the following:

PIC

In more precise terms, phase 1 asks: what is $\Delta Q$ if we move a node $v$ from $C_J$ to $C_I$?

\[
\Delta Q_{C_J \to v \to C_I} = \Delta Q_{C_J \setminus \{v\}} + \Delta Q_{C_I \cup \{v\}}.
\]

But how can we derive $\Delta Q_{C_I \cup \{v\}}$? The key here is that the gain in modularity does not depend on the original community of $v$. It can easily be computed by moving an isolated node $v$ into $C_I$:

\[
\Delta Q_{C_I \cup \{v\}} = Q_{C_I \cup \{v\}} - \left[ Q_{C_I} + Q_{\{v\}} \right],
\]

and therefore

\[
\Delta Q_{C_I \cup \{v\}} = \left[ \frac{\Sigma_{in} + 2 d_{v,in}}{2m} - \left( \frac{\Sigma_{in+out} + d_v}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{in+out}}{2m} \right)^2 - \left( \frac{d_v}{2m} \right)^2 \right].
\]

On the other side,

\[
\Delta Q_{C_J \setminus \{v\}} = \left[ Q_{C_J \setminus \{v\}} + Q_{\{v\}} \right] - Q_{C_J},
\]

so

\[
\Delta Q_{C_J \setminus \{v\}} = \left[ \frac{\Sigma_{in} - 2 d_{v,in}}{2m} - \left( \frac{\Sigma_{in+out} - d_v}{2m} \right)^2 - \left( \frac{d_v}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{in+out}}{2m} \right)^2 \right].
\]

Therefore, the gain only depends on $d_{v,in}$ and on the quantities $\Sigma_{in}$ and $\Sigma_{in+out}$ of the community $C$.

The complete algorithm is:

  1. For each node v:

    1. Compute modularity gain from removing v from its community and placing it in the community of its neighbours.

    2. Place v in the community that maximizes ΔQ.

    3. One iteration is achieved for all the nodes in a sequential manner.

    4. Repeat the procedure sequentially to all nodes until no more improvement (local maximum of modularity).

    5. The output of this phase depends on the order in which nodes are considered. Research shows that this does not significantly affect the overall modularity.

  2. Nodes from communities are grouped into super nodes.

    1. Links between nodes of the same community c are represented by self-loops weighted by adding up the links between these nodes: c has a loop edge with weight

\[
w = \sum_{v_i, v_j \in c} A_{i,j}.
\]

    2. Links between communities are weighted by adding up the links between community’s nodes. Each (ci,cj) has a link with weight

\[
w = \sum_{v_i \in c_i,\, v_j \in c_j} A_{i,j}.
\]
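
As a usage illustration (not from the original notes; it requires networkx 2.8 or later), the Louvain method is available directly in networkx:

import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

G = nx.karate_club_graph()
communities = louvain_communities(G, seed=42)           # list of sets of nodes
print(len(communities), "communities found")
print("Q =", round(modularity(G, communities), 3))      # modularity of the partition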

7.3.3 Walktrap Approach

This other approach is based on random walks. The idea is to consider a random walk on the graph in which, at each time step, we move to a neighbour chosen uniformly at random:

\[
P_{ij} = \frac{A_{ij}}{d_i}, \qquad P = D^{-1} A,
\]

and $P^t_{ij}$ represents the probability of getting from $i$ to $j$ in $t$ steps.

We expect two nodes $i$ and $j$ that lie in the same community to have a high $P^t_{ij}$, as well as similar profiles $P^t_{ik} \approx P^t_{jk}$ for the other nodes $k$. Therefore, we can define a distance (or similarity) between nodes, as

\[
r_{ij}(t) = \sqrt{ \sum_{k=1}^{n} \frac{\left( P^t_{ik} - P^t_{jk} \right)^2}{d_k} } = \left\| D^{-\frac{1}{2}} P^t_{i\cdot} - D^{-\frac{1}{2}} P^t_{j\cdot} \right\|.
\]

In addition, we approximate the computation by

\[
P^t_{ik} \approx \frac{N_{ik}}{N_i},
\]

where Nik is the number of walks starting at i and passing through k, and Ni is the number of walks starting from i.

The approach goes as follows:

  1. Assign each node to its own community.

  2. Compute the distance between adjacent nodes, rij(t).

  3. Choose the two closest communities and merge them.

  4. Update the distance between communities, as

\[
r_{c_1,c_2}(t) = \sqrt{ \sum_{k=1}^{n} \frac{\left( P^t_{c_1 k} - P^t_{c_2 k} \right)^2}{d_k} } = \left\| D^{-\frac{1}{2}} P^t_{c_1} - D^{-\frac{1}{2}} P^t_{c_2} \right\|,
\]

    where

\[
P^t_{ck} = \frac{1}{|c|} \sum_{i \in c} P^t_{ik}.
\]

  5. Finish when there is only one community.
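
A minimal numpy sketch of the node distance $r_{ij}(t)$, computed exactly from $P^t$ (the example graph and the choice $t = 3$ are arbitrary):

import numpy as np

def walktrap_distance(A, i, j, t=3):
    # r_ij(t) from P^t = (D^{-1} A)^t
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # degrees
    P = A / d[:, None]                     # random-walk transition matrix
    Pt = np.linalg.matrix_power(P, t)      # t-step transition probabilities
    return np.sqrt(((Pt[i] - Pt[j]) ** 2 / d).sum())

# Two triangles joined by one edge: 0 and 1 share a community, 0 and 4 do not
A = [[0,1,1,0,0,0],
     [1,0,1,0,0,0],
     [1,1,0,1,0,0],
     [0,0,1,0,1,1],
     [0,0,0,1,0,1],
     [0,0,0,1,1,0]]
print(walktrap_distance(A, 0, 1), walktrap_distance(A, 0, 4))  # nodes in the same community are expected to be closer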

7.3.4 Girvan-Newman Approach

Edge betweenness of $e$ is the fraction of shortest paths between all pairs of nodes $s$ and $t$ going through edge $e$:

\[
ebt(e) = \sum_{s \neq t} \frac{n^e_{s,t}}{n_{s,t}}.
\]

In this case, we focus on edges that connect communities, and construct them by progressively removing edges with the highest betweenness value. It goes as:

  1. For all e E

    1. Compute ebt(e)

    2. Remove the edge e with the largest ebt(e)

  2. Repeat until all edges are gone (or stop when the graph splits into 2 components). The output is a dendrogram.
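
For illustration (assuming networkx; the karate club graph is just an example dataset), girvan_newman yields the successive partitions produced by removing the highest-betweenness edges:

import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
splits = girvan_newman(G)          # generator of partitions, one per split
first_split = next(splits)         # the first partition into 2 components
print([sorted(c) for c in first_split])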

8 Models of Influence and Diffusion

8.1 Introduction

B. Ryan and N. Gross published The Diffusion of Hybrid Seed Corn in Two Iowa Communities [3] in 1943. In this paper, they studied the information effect and the adoption of the technology between 1924 and 1940. They found out that:

This information is shown here:

PIC

This idea has been studied multiple times in different areas:

8.1.1 Compartmental Models in Epidemiology

General models for infectious diseases transmitted from human to human have been developed since the Spanish flu in 1918. Relevant applications have been to AIDS and, more recently, to COVID-19.

The epidemiological models are based on 2 concepts:

There are variants with or without the dynamics of birth and death, immunity periods, etc

SIR is the simplest predictive model; recovery confers lasting resistance and death is negligible. The three compartments are: S, the susceptible individuals; I, the infectious individuals; and R, the recovered (removed) individuals.

Now, $S(t)$, $I(t)$ and $R(t)$ are functions defined to study the dynamics over a short infectious period. Let's define the rest of the variables: $N = S + I + R$ is the (constant) total population, $\beta$ is the transmission rate and $\gamma$ is the recovery rate.

The SIR system can be expressed using the following differential equation system:

\[
\begin{cases}
\frac{dS}{dt} = -\frac{\beta}{N} S I \\
\frac{dI}{dt} = \frac{\beta}{N} S I - \gamma I \\
\frac{dR}{dt} = \gamma I
\end{cases}
\]
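
As a quick illustration (not part of the original notes), the system can be integrated numerically with a simple forward-Euler scheme; the values of beta, gamma, N and the initial conditions below are arbitrary:

def simulate_sir(beta=0.3, gamma=0.1, N=1000, I0=1, T=160, dt=0.1):
    # forward-Euler integration of the SIR equations
    S, I, R = N - I0, float(I0), 0.0
    history = []
    for _ in range(int(T / dt)):
        dS = -beta / N * S * I
        dI = beta / N * S * I - gamma * I
        dR = gamma * I
        S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
        history.append((S, I, R))
    return history

peak_infected = max(I for _, I, _ in simulate_sir())
print(f"Peak number of infectious individuals: {peak_infected:.0f}")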

The Gillespie algorithm is used to simulate chemical or biochemical systems. It is a stochastic simulation algorithm that generates a statistically correct trajectory of a stochastic equation system for which the reaction rates are known.

The SIR model has many variants, that have been developed over the years:

8.1.2 Challenges in Social Networks

Diffusion models are used to identify the way information is transmitted in a network. So:

From the computer science perspective, we need to develop fast and efficient algorithms on large networks.

8.2 Influence and Diffusion Models

These can be probabilistic models. The probability that someone becomes activated, given its $n$ activated neighbours, is

\[
P(n) = 1 - (1-p)^n,
\]

where $p$ is the activation probability of a neighbour and $n$ is the number of activated neighbours. The intuition here is that the more neighbours are activated, the higher the probability that you will become activated.

But they can also be threshold models. In this case, nothing happens until the threshold $b_0$ is reached:

\[
P(b) = \delta(b > b_0).
\]

In both cases, the main idea is to define a diffusion process on the network, originating from a seed set $S$. The expected number of activated nodes at the end is the influence of $S$, $\sigma(S)$.

The network is represented as a directed graph, $G = (V, E)$, where individual nodes are active or inactive. The process is thus defined as:

8.2.1 Independent Cascade Model

In the cascade model, when node u becomes active, it is given a single chance to activate each currently inactive neighbour v, with a probability of success of pu,v.

This probability is independent of the history. If u succeeds, then v will be active in step t + 1, but whether or not u succeeds, it cannot make any further attempts to activate v in subsequent rounds.

If v has multiple newly activated neighbours, their attempts are sequenced in an arbitrary order.

For example, see the following diffusion process:

PIC

Let now $D_t$ be the set of active nodes at step $t$. For $v \in N(D_t)$, its probability of being active at $t + 1$ is

\[
p_v(t+1) = 1 - \prod_{u \in N(v) \cap D_t} (1 - p_{u,v}).
\]

The sets It and St follow the following iterative definition:

\[
I_0 = S, \qquad S_0 = V \setminus I_0, \qquad S_{t+1} = S_t \setminus I_{t+1},
\]

where It+1 is obtained from St and It as explained before.

Then, the set of all infected nodes throughout the contagion process originating at S is

I(S) = ∪t≥0It,

and the expectation is taken over the random infection attempts from the infected nodes, as

σ (S ) = E [|I(S)|].

However, there is still something that we need to sort out: how to define pu,v.

A commonly used approach is to assign to each edge $(u,v)$ the probability $p_{u,v} = \frac{1}{d_v}$ (note that we consider the directed edges).

Some studies propose to learn influence probabilities from data, for example studying the propagation actions in social networks, such as replies, forwards, etc.
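
As an illustration (a minimal sketch, not the course's reference implementation), one run of the independent cascade process can be simulated as follows; the graph is given as a dictionary of successor lists, and $p_{u,v}$ defaults to $1/d_v$ as suggested above:

import random

def independent_cascade(succ, seed, p=None):
    # succ: dict node -> list of successors; seed: initial active set
    if p is None:
        in_deg = {}
        for u in succ:
            for v in succ[u]:
                in_deg[v] = in_deg.get(v, 0) + 1
        p = {(u, v): 1 / in_deg[v] for u in succ for v in succ[u]}
    active, frontier = set(seed), set(seed)
    while frontier:
        new = set()
        for u in frontier:
            for v in succ.get(u, []):
                if v not in active and random.random() < p[(u, v)]:
                    new.add(v)   # u succeeded in its single activation attempt on v
        active |= new
        frontier = new
    return active

# sigma(S) can then be estimated by averaging |independent_cascade(succ, S)| over many runs.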

8.2.2 Threshold Model

In this case, we consider, for each node u, the proportion p of active neighbours, and (1- p) of inactive ones. To become activated, node u needs to verify

ρ ⋅p⋅du > γ ⋅(1- p)⋅du,

so the threshold to accept is

\[
p > \frac{\gamma}{\rho + \gamma},
\]

where ρ is the reward (weight) of active nodes and γ is the reward of inactive nodes. The model applies this rule in cascade until there are no more possible activations.

For example, if $\rho = 3$ and $\gamma = 2$, the threshold is $\frac{2}{5}$. It would go as follows:

PIC

The Linear Threshold Model In this case, each node v is influenced by each incoming active neighbour u, according to a weight wu,v, so that

\[
\sum_{u \in N(v)} w_{u,v} \leq 1.
\]

Each $v$ has a random acceptance threshold $\theta_v \sim U[0,1]$, which represents the fraction of $v$'s neighbours that must become active in order for $v$ to become active.

Given random thresholds, and an initial set of active nodes, S0, with all the other nodes inactive, the diffusion process unfolds in discrete steps: in step t, all nodes that were active in step t- 1 remain active, and activate any node v such that the total weight of its active neighbours is at least θv:

\[
\sum_{u \in N(v),\, u \text{ active}} w_{u,v} \geq \theta_v.
\]

A possible way to define $w_{u,v}$ is by taking into account the degree of $v$, as $w_{u,v} = \frac{1}{d_v}$.

8.2.3 Cascades and Clusters

Homophily can often serve as a barrier to diffusion, by making it hard for innovations to arrive from outside communities. This means that we can use the structure of a network to analyze the success or failure of a cascade.

A cluster of density δ is a set of nodes such that each node in the set has at least a δ fraction of its network neighbours in the set. For the cascade process to enter the cluster, the threshold should be smaller than 1 - δ.

8.3 Influence Maximization Problem Formulation

Given $G = (V,E)$, let $\sigma$ be the function that maps a set of nodes $S \subseteq V$ to its influence value $\sigma(S)$, i.e., the expected number of activated nodes in a diffusion process seeded at $S$.

The influence maximization problem asks:

For a given budget, k, find a set of k nodes, S, with objective

\[
\max_{|S| \leq k} \sigma(S).
\]

Solving this problem is NP-hard, so we need to leverage approximate methods.

8.3.1 Greedy Framework

The approximation by greedy algorithm consists in iteratively adding nodes to S by maximizing the marginal gain in each step:

greedy_max_influence(G=(V,E), k):
  S = {}
  for i in range(k):
    # pick the node with the largest marginal gain
    u = argmax_u([sigma(S.union({u})) - sigma(S) for u in V - S])
    S = S.union({u})
  return S

8.3.2 Submodular Functions

A set function $f$ is submodular if, for any sets $R \subseteq T$ and any $v \notin T$, it holds that

f (R ∪ {v})- f (R) ≥ f (T ∪{v}) - f (T).

These functions represent diminishing returns: the marginal gain from adding an element to a set $R$ is at least as high as the marginal gain from adding the same element to a superset of $R$.

In addition, monotonicity of $f$ is assumed: $f(R \cup \{v\}) \geq f(R)$.

Theorem 8.1. For a non-negative, monotone, submodular function $f$, let $S$ be a set of size $k$ obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value.

Let S* be a set that maximizes the value of f over all k-element sets.

Then

\[
f(S) \geq \left[ 1 - \left( 1 - \frac{1}{k} \right)^k \right] f(S^*).
\]

In other words, $S$ provides a $\left( 1 - \frac{1}{e} \right) = \lim_{k \to \infty} \left[ 1 - \left( 1 - \frac{1}{k} \right)^k \right]$ approximation of the optimal value.

It can be proven that $\sigma(\cdot)$ is a submodular function, and therefore, when we use the greedy approach to obtain a set $S$, it must hold that

\[
\sigma(S) \geq \left( 1 - \frac{1}{e} \right) \sigma(S^*).
\]

References

[1]   Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters, 2008.

[2]   Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.

[3]   Bryce Ryan and Neal C. Gross. The diffusion of hybrid seed corn in two Iowa communities. Rural Sociology, 8(1):15, 1943.

[4]   Nacéra Seghouani. Massive graph management and analytics. Lecture Notes.