Networks Part II Sharad Goel Columbia University Computational Social Science: Lecture 6 March 1, 2013
Corporate E-mail Communication [ Adamic & Adar, 2004 ] via Easley & Kleinberg
Networks/Graphs Nodes/vertices people, organizations, webpages, computers Edges represent connections between pairs of nodes
Distance Length of the shortest path between two nodes
Distance Length of the shortest path between two nodes
Breadth-first Search iteratively explore nodes one layer at a time
# initialize distances dist = {} for u in G: dist[u] = NA dist [u0] = 0 d=0 periphery = { u0 } while len(periphery) > 0: # find nodes one step away from the periphery next_level = {} for u in periphery: next_level += { w for w in neighbors[u] if dist[w] == NA } # update distances d += 1 for u in next_level: dist[u] = d # update periphery periphery = next_level
BFS @ scale undirected network Input edge list, starting node u0 Output Distance to all nodes from u0
BFS @ scale undirected network Input: edge list, distances (u, d) 1. join distances with edge list 2. foreach (u, d, w) output (w, d+1) [ also output (u0, 0) ] 3. group by w, and output min d
Connected Components undirected network Input Edge list Output List of nodes for each component
Connected Graph There is a path between every pair of nodes
Connected Graph There is a path between every pair of nodes
Connected Component A connected subset of nodes that is not contained in any larger connected subset
Connected Components undirected network 1. Select a node u0 that has not yet been assigned 2. BFS starting from u0 3. Record nodes reached by BFS
Consider the global human social network, with an edge between every pair of friends Is this network connected?
Consider the global human social network, with an edge between every pair of friends Is this network connected? No – there are people with no (living) friends, who are hence isolated from the rest of the network
Consider the global human social network, with an edge between every pair of friends Is there a “giant” connected component?
Consider the global human social network, with an edge between every pair of friends Is there more than one “giant” component?
Consider the global human social network, with an edge between every pair of friends Is there more than one “giant” component? No – unlikely to have two large disconnected sets of people
Consider the global human social network, with an edge between every pair of friends Is there more than one “giant” component? No – unlikely to have two large disconnected sets of people Historically it was more likely e.g., pre-Columbian America & Eurasia
Consider the global human social network, with an edge between every pair of friends On average, how far are people from one another?
The Small-world Experiment Stanley Milgram, 1967 296 people were randomly selected in Omaha and Wichita Packages sent to the selected individuals with instructions to forward to a particular stock broker in Boston through a chain of people they knew on a first-name basis.
The Small-world Experiment Stanley Milgram, 1967 Of the 296 packages, 232 did not reach target Of the 64 that did arrive, average path length was 6 “Six degrees of separation”
Small-world phenomenon Is “six degrees” big or small?
Small-world phenomenon navigational vs. topological
The Anatomy of the Facebook Social Graph J. Ugander, B. Karrer, L. Backstrom, C. Marlow 721 million users, 69 billion edges 5 degrees of separation
Edge list  degree distribution undirected network Input Edge list Output Degree distribution
3 1 2 5 4 7 6 Degree of node u # of edges incident on u
Edge list  degree distribution undirected network Map input: (u, w) output: (u, w), key := u output: (w, u), key := w Reduce input: u, {w1, …, wk} output: u, k
Edge list  degree distribution undirected network Map input: u, k identity, key := k Reduce input: k, {u1, …, um} output: k, m
An email network of 130M users Edges indicate reciprocated communication
An email network of 130M users Edges indicate reciprocated communication (log-log plot)
Clustering
Clustering
Triadic closure 1. Opportunity 2. Incentive 3. Commonality
Counting Triangles undirected network Input adjacency list Output Number of triangles incident on each node
Counting Triangles In memory for u in nodes: triangles[u] = 0 for w in neighbors[u]: triangles[u] += len(neighbors[w] & neighbors[u]) triangles[u] = triangles[u] / 2
Counting Triangles @ scale Every node needs to know to which nodes it is connected and to which nodes its neighbors are connected
Counting Triangles @ scale Map input: u {w1, …, wk} foreach wi: output wi u {w1, …, wk} Reduce In memory triangle count
Homophily the tendency of individuals to associate with similar others “birds of a feather flock together”
Birds of a Feather: Homophily in Social Networks McPherson, Smith-Lovin, Cook race, sex, age, religion, education, occupation, social class, behaviors, attitudes, abilities, aspirations
Homophily 1. Preference 2. Influence 3. Opportunity
Fantasy Football
Computing Homophily Input Edge list, race of each individual Output Distribution of race among friends White Black Latino Asian White Black Latino Asian
Computing Homophily 1. Join edges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w
Computing Homophily 1. Join edges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w 3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
Computing Homophily 1. Join edges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w 3. Group edges (u, v, urace, vrace) by sorted([urace, vrace]) 4. Count edges in each group 5. Normalize the table
How do ideas and products spread through society?
The structure of diffusion 93% 5% 1% 0.3% 0.3%
Computing the structure of diffusion Input Twitter network Time-stamped “adoptions” of 1B URLs Output Distribution of cascade structures
Computing the structure of diffusion We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u 2 3 5 9
Computing the structure of diffusion Draw a labeled edge from v to u 3 t 5
Computing the structure of diffusion Group edges by their labels (URL)
Computing the structure of diffusion Compute the connected components for each forest corresponding to a URL
Computing the structure of diffusion Definition. Two (rooted) trees are isomorphic if they are identical under a relabeling of the vertices.
x (x) (x, x) ((x)) (x, (x)) Basis. The canonical name c(T) for the one-node tree T is x. Induction. If T has more than one node, let T1, . . . ,Tk denote the subtrees of the root indexed such that c(T1) ≤ c(T2) ≤ · · · ≤ c(Tk) under the lexicographic order. Then the canonical name for T is (c(T1), . . . ,c(Tk)). Aho et al. [1974]
Computing the structure of diffusion Compute the canonical name for each tree in the URL forests
Computing the structure of diffusion Count the number of trees of each type
Computing the structure of diffusion We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u 2 3 5 9
Computing the structure of diffusion Draw a labeled edge from v to u 3 t 5
Computing the structure of diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w)
Computing the structure of diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w) 3. Group (link, u, uts, w, wts) by (link, u)
Computing the structure of diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w) 3. Group (link, u, uts, w, wts) by (link, u) 4. Output unique “parent” edge (link, u, uts, w, wts) for each group

Computational Social Science, Lecture 06: Networks, Part II

  • 1.
    Networks Part II Sharad Goel Columbia University Computational Social Science: Lecture 6 March 1, 2013
  • 2.
    Corporate E-mail Communication [Adamic & Adar, 2004 ] via Easley & Kleinberg
  • 3.
    Networks/Graphs Nodes/vertices people, organizations, webpages, computers Edges represent connections between pairs of nodes
  • 5.
    Distance Length of theshortest path between two nodes
  • 6.
    Distance Length of theshortest path between two nodes
  • 7.
  • 8.
    # initialize distances dist= {} for u in G: dist[u] = NA dist [u0] = 0 d=0 periphery = { u0 } while len(periphery) > 0: # find nodes one step away from the periphery next_level = {} for u in periphery: next_level += { w for w in neighbors[u] if dist[w] == NA } # update distances d += 1 for u in next_level: dist[u] = d # update periphery periphery = next_level
  • 9.
    BFS @ scale undirected network Input edge list, starting node u0 Output Distance to all nodes from u0
  • 10.
    BFS @ scale undirected network Input: edge list, distances (u, d) 1. join distances with edge list 2. foreach (u, d, w) output (w, d+1) [ also output (u0, 0) ] 3. group by w, and output min d
  • 11.
    Connected Components undirected network Input Edge list Output List of nodes for each component
  • 12.
    Connected Graph There isa path between every pair of nodes
  • 13.
    Connected Graph There isa path between every pair of nodes
  • 14.
    Connected Component Aconnected subset of nodes that is not contained in any larger connected subset
  • 15.
    Connected Components undirected network 1. Select a node u0 that has not yet been assigned 2. BFS starting from u0 3. Record nodes reached by BFS
  • 16.
    Consider the globalhuman social network, with an edge between every pair of friends Is this network connected?
  • 17.
    Consider the globalhuman social network, with an edge between every pair of friends Is this network connected? No – there are people with no (living) friends, who are hence isolated from the rest of the network
  • 18.
    Consider the globalhuman social network, with an edge between every pair of friends Is there a “giant” connected component?
  • 19.
    Consider the globalhuman social network, with an edge between every pair of friends Is there more than one “giant” component?
  • 20.
    Consider the globalhuman social network, with an edge between every pair of friends Is there more than one “giant” component? No – unlikely to have two large disconnected sets of people
  • 21.
    Consider the globalhuman social network, with an edge between every pair of friends Is there more than one “giant” component? No – unlikely to have two large disconnected sets of people Historically it was more likely e.g., pre-Columbian America & Eurasia
  • 22.
    Consider the globalhuman social network, with an edge between every pair of friends On average, how far are people from one another?
  • 23.
    The Small-world Experiment Stanley Milgram, 1967 296 people were randomly selected in Omaha and Wichita Packages sent to the selected individuals with instructions to forward to a particular stock broker in Boston through a chain of people they knew on a first-name basis.
  • 24.
    The Small-world Experiment Stanley Milgram, 1967 Of the 296 packages, 232 did not reach target Of the 64 that did arrive, average path length was 6 “Six degrees of separation”
  • 25.
    Small-world phenomenon Is “sixdegrees” big or small?
  • 26.
  • 27.
    The Anatomy ofthe Facebook Social Graph J. Ugander, B. Karrer, L. Backstrom, C. Marlow 721 million users, 69 billion edges 5 degrees of separation
  • 28.
    Edge list degree distribution undirected network Input Edge list Output Degree distribution
  • 29.
    3 1 2 5 4 7 6 Degree of node u # of edges incident on u
  • 30.
    Edge list degree distribution undirected network Map input: (u, w) output: (u, w), key := u output: (w, u), key := w Reduce input: u, {w1, …, wk} output: u, k
  • 31.
    Edge list degree distribution undirected network Map input: u, k identity, key := k Reduce input: k, {u1, …, um} output: k, m
  • 32.
    An email networkof 130M users Edges indicate reciprocated communication
  • 33.
    An email networkof 130M users Edges indicate reciprocated communication (log-log plot)
  • 34.
  • 35.
  • 36.
    Triadic closure 1. Opportunity 2.Incentive 3. Commonality
  • 37.
    Counting Triangles undirected network Input adjacency list Output Number of triangles incident on each node
  • 38.
    Counting Triangles In memory for u in nodes: triangles[u] = 0 for w in neighbors[u]: triangles[u] += len(neighbors[w] & neighbors[u]) triangles[u] = triangles[u] / 2
  • 39.
    Counting Triangles @ scale Every node needs to know to which nodes it is connected and to which nodes its neighbors are connected
  • 40.
    Counting Triangles @ scale Map input: u {w1, …, wk} foreach wi: output wi u {w1, …, wk} Reduce In memory triangle count
  • 41.
    Homophily the tendency ofindividuals to associate with similar others “birds of a feather flock together”
  • 42.
    Birds of aFeather: Homophily in Social Networks McPherson, Smith-Lovin, Cook race, sex, age, religion, education, occupation, social class, behaviors, attitudes, abilities, aspirations
  • 43.
  • 44.
  • 45.
    Computing Homophily Input Edge list, race of each individual Output Distribution of race among friends White Black Latino Asian White Black Latino Asian
  • 46.
    Computing Homophily 1. Joinedges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w
  • 47.
    Computing Homophily 1. Joinedges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w 3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
  • 48.
    Computing Homophily 1. Join edges (u, v) by u, demographics (w, race) by w 2. Join edges (u, v, urace) by v, demographics (w, race) by w 3. Group edges (u, v, urace, vrace) by sorted([urace, vrace]) 4. Count edges in each group 5. Normalize the table
  • 49.
    How do ideasand products spread through society?
  • 50.
    The structure ofdiffusion 93% 5% 1% 0.3% 0.3%
  • 51.
    Computing the structureof diffusion Input Twitter network Time-stamped “adoptions” of 1B URLs Output Distribution of cascade structures
  • 52.
    Computing the structureof diffusion We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u 2 3 5 9
  • 53.
    Computing the structureof diffusion Draw a labeled edge from v to u 3 t 5
  • 54.
    Computing the structureof diffusion Group edges by their labels (URL)
  • 55.
    Computing the structureof diffusion Compute the connected components for each forest corresponding to a URL
  • 56.
    Computing the structureof diffusion Definition. Two (rooted) trees are isomorphic if they are identical under a relabeling of the vertices.
  • 57.
    x (x) (x, x) ((x)) (x, (x)) Basis. The canonical name c(T) for the one-node tree T is x. Induction. If T has more than one node, let T1, . . . ,Tk denote the subtrees of the root indexed such that c(T1) ≤ c(T2) ≤ · · · ≤ c(Tk) under the lexicographic order. Then the canonical name for T is (c(T1), . . . ,c(Tk)). Aho et al. [1974]
  • 58.
    Computing the structureof diffusion Compute the canonical name for each tree in the URL forests
  • 59.
    Computing the structureof diffusion Count the number of trees of each type
  • 60.
    Computing the structureof diffusion We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u 2 3 5 9
  • 61.
    Computing the structureof diffusion Draw a labeled edge from v to u 3 t 5
  • 62.
    Computing the structureof diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w)
  • 63.
    Computing the structureof diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w) 3. Group (link, u, uts, w, wts) by (link, u)
  • 64.
    Computing the structureof diffusion 1. Join adoptions (link, u, uts) by u, edges (u, w) by u 2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w) 3. Group (link, u, uts, w, wts) by (link, u) 4. Output unique “parent” edge (link, u, uts, w, wts) for each group