Approximate Bayesian Computation for inference in generative models for large-scale network data Antonietta Mira Data Science Group, ICS, Università della Svizzera italiana, USI, Lugano, Switzerland and Department of Science and High Technology, DISAT, Università dell’Insubria, Como, Italy joint with JP Onnela, Sixing Chen (Harvard School of Public Health, US) and Ritabrata Dutta (ICS, USI) 13.12.17, SAMSI, USA Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
Agenda Large-scale network data and network models Approximate Bayesian Computation ABC for large-scale network data
Large-scale network data: Digitized Relationships
Relationships =⇒ Adjacency matrix
Relationships =⇒ Adjacency matrix + edge weights + node covariates + time + spatial coordinates + . . . = complex/large network data
Network data social network: friendships financial network: EU overnight interbank money transfer economy: EU countries import-export biology/genetics networks: protein-protein interaction communication network: CDR information/knowledge network: patents citation/collaboration network: Wikipedia / co-authorship
Motivation Systems of scientific and societal interest have large numbers of interacting components Representation as networks: node = component, edge = interaction Different models of network data: models of network structure (e.g. Erdös-Rényi); models of dynamical processes on networks (e.g. SI model)
Communication network of 7 million nodes + 23 million ties JP Onnela et al. PNAS, 2007
Big picture of statistical inference GIVEN: Data = y = (y1, . . . , yn) Model which describes the data, py|θ(y|θ), indexed by Parameters = θ = (θ1, . . . , θd) Prior probability density function for θ, pθ WANTED: some probabilistic statement about θ and the model point estimation confidence/credible intervals hypothesis testing prediction model selection
Two types of models Statistical model py|θ(yi|θ) = (1/√(2πσ²)) exp(−(yi − µ)²/(2σ²)), θ = (µ, σ) Generative model → given θ = (µ, σ) → zi ∼ N(0, 1) → yi = µ + σzi → yi ∼ py|θ(yi|θ) In some settings it is easier to give the model in generative form Sometimes there is no 1:1 correspondence between statistical and generative models
Likelihood-based statistical inference Likelihood function: L(θ) ∝ py|θ(y|θ) Plays a central role in statistical inference Maximum likelihood estimation: θ̂MLE = argmaxθ L(θ) Bayesian inference: pθ|y(θ|y) ∝ L(θ)pθ(θ) Not available for generative models
Why generative models? Allow one to use domain knowledge on how the data were generated without having to make excessive compromises in the modeling Scale well with big data No limits on the number of unobserved/latent variables Easier to study the effect of interventions on generative models than on statistical models LFI: a procedure to obtain probabilistic statements about θ even when the likelihood is not available, as for generative models ABC is a LFI methodology
Examples of generative models Astrophysics: simulating the formation of galaxies, stars, or planets Evolutionary biology: simulating species evolution Ecology: simulating species migration over time Neuroscience: simulating neural circuits Epidemiology: simulating the spread of an infectious disease Meteorology: simulating weather prediction
Approximate Bayesian Computation (ABC) Starting point is Bayes’ theorem: p(θ|y) = p(y|θ)p(θ)/p(y) p(θ|y) = posterior p(y|θ) = likelihood p(θ) = prior p(y) = evidence Evaluation of the LHD may be computationally expensive or infeasible, ruling out both likelihood-based and posterior-based inference Approximate Bayesian Computation (ABC) avoids direct evaluation of the LHD and approximates it by generating synthetic data (synthetic observations) by simulation from the model
Approximate Bayesian Computation (ABC) The ABC rejection sampler is the simplest form of ABC ABC rejection sampler: Sample parameter θ∗ from the prior p(θ) Simulate dataset y∗ under the given model specified by θ∗: y∗ ∼ p(·|θ∗) Accept θ∗ if d(y∗, y) ≤ ε The distance measure d(y∗, y) determines the level of discrepancy between the simulated data y∗ and the observed data y The accepted θ∗ are approximately distributed according to the desired posterior and, crucially, obtained without the need of explicitly evaluating the LHD
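The rejection sampler above can be sketched in a few lines of Python. The Gaussian-mean toy model, the flat prior, and the tolerance ε = 0.1 below are illustrative assumptions, not taken from the slides:

```python
import random
import statistics

def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_draws=10000):
    """Generic ABC rejection sampler: keep each theta* whose synthetic
    dataset falls within tolerance eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()             # theta* ~ p(theta)
        y_sim = simulate(theta)            # y* ~ p(. | theta*)
        if distance(y_sim, y_obs) <= eps:  # accept if d(y*, y) <= eps
            accepted.append(theta)
    return accepted

# Toy model: infer the mean of a N(mu, 1) sample, comparing sample means.
random.seed(1)
y_obs = [random.gauss(2.0, 1.0) for _ in range(50)]
post = abc_rejection(
    y_obs,
    prior_sample=lambda: random.uniform(-5, 5),                      # flat prior
    simulate=lambda mu: [random.gauss(mu, 1.0) for _ in range(50)],  # generative model
    distance=lambda a, b: abs(statistics.mean(a) - statistics.mean(b)),
    eps=0.1,
)
```

The accepted draws in `post` concentrate around the true mean 2.0, with no likelihood evaluation anywhere.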
Approximate Bayesian Computation (ABC) [Figure: the generative model M(θ) maps parameter values into the data space, where the simulated dataset y∗ is compared with the observed dataset y0 produced by Nature.]
Approximate Bayesian Computation (ABC) It may be unfeasible to compute the distance d(y∗, y) for high-dimensional data Use a lower-dimensional summary statistic S(y) to capture the relevant information in y Comparison is done between S(y∗) and S(y): accept θ∗ if d(S(y∗), S(y)) ≤ ε
Principle behind likelihood-free inference Replace the LHD, py(y|θ), by the SUMMARY LHD pS(S(y)|θ), where S(y) = summary statistics But pS(S(y)|θ) is also unknown Use an APPROXIMATE SUMMARY LHD, p̃S(S(y)|θ), based on pseudo data y∗ generated from the model The POSTERIOR is also approximate and summarized: p̃S(θ|S(y)) ∝ p(θ)p̃S(S(y)|θ)
Approximate Bayesian Computation (ABC) If S is sufficient wrt θ, then it contains all the information in y about θ (by definition), and using S(y) in place of the full dataset does not introduce any error For most models it may be impossible to find sufficient statistics S, in which case application-relevant summary statistics are used Use of non-sufficient summary statistics introduces a further level of approximation
Network Models Distinction between two types of models of network structure: Statistical models (e.g. ERGM, Goyal-Blitzstein-DeGruttola) DATA DRIVEN Pros: inference on model parameters; hypothesis testing; model selection Cons: scalability; hard to incorporate domain knowledge Generative models (e.g. Price model) KNOWLEDGE DRIVEN assume that the microscopic mechanisms governing network formation and evolution are known, and ask what happens if we apply these mechanisms repeatedly Pros: scalability; easy to incorporate knowledge; intervention Cons: no inferential tools; no model comparison
Generative Model of Social and Contact Networks From the perspective of time expenditure of subject i: spend time with existing friends (a) become friend of a friend (b) make totally new friends (c)
Generative Model of Social and Contact Networks [figures]
ABC for Generative Network Models ABC + generative network model = generic + sound inferential framework ABC rejection sampler for generative network models: Observe an empirical graph G Set up a generative network model M Sample parameter θ∗ from the prior p(θ) Simulate graph G∗ from model M using parameter θ∗ Accept θ∗ if d(S(G∗), S(G)) ≤ ε Some simple network summaries: degree sequence, k-stars, subgraph counts, centrality measures (betweenness, eigenvector, random walk, etc.) Can use KNN to identify points in the space of summary statistics close to S(G) From Rejection-ABC to SMC-ABC by Drovandi+Pettitt, 2015
Exact inference for Erdös-Rényi (ER) For the Erdös-Rényi model the LHD is available: Aij = dyad (pair of nodes): value 1 (connected) or 0 (not connected) Yij = RV coding the state of the dyad p = P(Yij = 1) N = number of nodes n = number of dyads: N(N − 1)/2 (undirected) and N(N − 1) (directed) L = number of connected dyads (sufficient statistic) LHD = p(G|p) = ∏_{i<j} P(Yij = Aij) = ∏_{i<j} p^{Aij} (1 − p)^{1−Aij} = p^L (1 − p)^{n−L} MLE = p̂ = L/n = proportion of connected dyads to all dyads
Exact Bayesian inference for ER Beta prior p(θ) = Beta(α = 5, β = 50) Beta posterior p(θ|D) = Beta(α + L, β + N(N − 1)/2 − L) Bayesian estimator = posterior mean
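The conjugate update on this slide is easy to check numerically. The edge count L = 247 below is a hypothetical observation, roughly what p = 0.05 would produce on the 4950 dyads of a 100-node undirected graph:

```python
# Exact conjugate update for the Erdos-Renyi edge probability p:
# a Beta(alpha, beta) prior combined with a Binomial(n, p) likelihood on
# the L connected dyads gives a Beta(alpha + L, beta + n - L) posterior.
def er_posterior(alpha, beta, N, L):
    n = N * (N - 1) // 2          # number of dyads (undirected)
    a_post = alpha + L            # posterior alpha
    b_post = beta + (n - L)       # posterior beta
    post_mean = a_post / (a_post + b_post)
    return a_post, b_post, post_mean

# Slide setting: Beta(5, 50) prior, N = 100 nodes; L = 247 is hypothetical.
a, b, mean = er_posterior(alpha=5, beta=50, N=100, L=247)
print(round(mean, 3))  # -> 0.05, close to the MLE L/n = 247/4950
```

With 4950 dyads the data dominate the 55 pseudo-counts of the prior, so the posterior mean sits essentially at the MLE.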
Approximate Bayesian inference for Erdös-Rényi Simulate an observed ER graph G from a model with N = 100 and p = 0.05 Number of edges = summary statistic = S(G) = L We perform 100,000 draws pi from the prior For each pi, we generate a sequence of graphs Gi^1, . . . , Gi^K, compute the corresponding summary statistics S(Gi^1), . . . , S(Gi^K), and retain pi if S(Gi^k) = Li^k = L = S(G) We retained only 162 values of pi The mean and median of the prior are 0.091 and 0.086; the 95% prior credible interval is [0.031, 0.178] The mean and median of the posterior are 0.050 and 0.050; the 95% posterior credible interval is [0.043, 0.055]
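A scaled-down sketch of this experiment in plain Python (N = 50 nodes and 5,000 prior draws instead of the slides' N = 100 and 100,000, purely to keep the run time small):

```python
import random

def er_edge_count(N, p, rng):
    """Summary statistic S(G) = L: number of edges in an Erdos-Renyi
    G(N, p) graph, where each of the N(N-1)/2 dyads is connected
    independently with probability p."""
    n_dyads = N * (N - 1) // 2
    return sum(rng.random() < p for _ in range(n_dyads))

rng = random.Random(0)
N, p_true = 50, 0.05
L_obs = er_edge_count(N, p_true, rng)         # "observed" summary

accepted = []
for _ in range(5000):                         # prior draws
    p_i = rng.betavariate(5, 50)              # Beta(5, 50) prior, as on the slides
    if er_edge_count(N, p_i, rng) == L_obs:   # retain p_i on an exact match
        accepted.append(p_i)

post_mean = sum(accepted) / len(accepted)
```

As on the slides, exact matching on the edge count accepts only a small fraction of the prior draws, and the accepted values concentrate near the true p = 0.05 rather than the prior mean 0.09.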
Approximate Bayesian inference for Erdös-Rényi Degree distribution = summary statistic = S(G) KS (Kolmogorov–Smirnov) = distance measure We perform 100,000 draws from the prior For each pi, we generate a sequence of graphs Gi^1, . . . , Gi^K, compute the corresponding summaries S(Gi^1), . . . , S(Gi^K), and obtain a sequence of distances yi^1 = d(S(Gi^1), S(G)), . . . , yi^K = d(S(Gi^K), S(G))
Approximate Bayesian inference for Erdös-Rényi Pool the distances yi^k over all values of i and k: 100,000 · K in total Choose a cutoff value d∗ = 10th percentile Higher acceptance rate: 761 accepted values The mean and median of the prior are 0.09 and 0.09; the 95% prior CI is [0.03, 0.18] The mean and median of the posterior are 0.05 and 0.05; the 95% posterior CI is [0.04, 0.06]
Inference for Erdös-Rényi TRUE: density functions for the prior (solid green) and the posterior (solid red) Here the prior distribution is the beta distribution Beta(α = 5, β = 50) True value of the parameter is θ = p = 0.05 = black vertical line ESTIMATED: prior p(θ) (green histogram) and posterior p(θ|y) (red histogram)
Approximate Bayesian inference for Easley-Kleinberg (EK) The EK model is a simple model of directed networks Unlike ER, the EK model is a growing network model: new vertices attach themselves to existing ones using a preferential attachment rule: each new node is not equally likely to attach itself to any of the existing nodes but, instead, has a preference for some nodes over others based on their degree More specifically, linear preferential attachment specifies that an incoming node will attach to an existing vertex vi with probability proportional to its degree ki More complicated functions of ki are possible, and in general one can incorporate nodal attributes, or covariates, in the expression for the attachment function RESULTS: similar performance to the Erdös-Rényi model
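A minimal sketch of linear preferential attachment, assuming each incoming node adds a single edge (the EK/Price model allows more general attachment functions and out-degrees, as noted above):

```python
import random

def preferential_attachment(n_nodes, rng):
    """Grow a network one node at a time: each incoming node attaches a
    single edge to an existing node chosen with probability proportional
    to its current degree."""
    degree = {0: 1, 1: 1}      # start from a single edge 0 - 1
    edges = [(0, 1)]
    targets = [0, 1]           # node i appears degree[i] times here, so a
                               # uniform draw from this list is a draw
                               # proportional to degree
    for new in range(2, n_nodes):
        old = rng.choice(targets)
        edges.append((new, old))
        degree[old] += 1
        degree[new] = 1
        targets.extend([new, old])
    return degree, edges

rng = random.Random(42)
deg, edges = preferential_attachment(1000, rng)
# Hubs emerge: the maximum degree ends up far above the mean degree (~2).
```

The repeated-targets list is the standard constant-time trick for degree-proportional sampling; the resulting degree distribution is heavy-tailed, unlike the ER model's.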
Inference for social network generative model ABC + social network generative model Prior and posterior draws True parameter values: 0.3 and 0.6 (solid lines) Posterior means: 0.299 and 0.613 (dashed lines)
Summary Simulator-based models: generative models for networks Challenge: inference of the parameters of a generative model, due to the intractability of the likelihood function Solution: Approximate Bayesian Computation Application: ABC works for different mechanistic models
Spreading Process on Network We study the spread of an epidemic / fake news on a network: Can be considered as a spreading process on a network Data = a fixed and known human contact interaction network = a fixed and known social network (N) Data = at some time points over a period of observation we know which nodes in the network are infected = we know which websites report the fake news
Simple spreading process: Susceptible-Infected (SI)
Rate of diffusion (θ): How fast does a disease or a fake news item spread on a network?
Seed node (nSN): Can we infer the source of the infection or of the fake news?
Algorithm 1 SI spreading process over the network N, starting at nSN^0, for each time point t = 0, . . . , T
1: for t = 0 to T do
2:   It = infected nodes at time t
3:   for each i ∈ It do
4:     select j from the neighbors of i with equal probability
5:     if neighbor j is already infected, do nothing; if not, infect it with probability θ
6:   end for
7: end for
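Algorithm 1 translates almost line by line into Python; the ring network below is an illustrative stand-in for the BA/ER graphs used later in the slides:

```python
import random

def si_spread(adj, seed_node, theta, T, rng):
    """Algorithm 1: SI process on a network given as an adjacency list
    (dict node -> list of neighbours). Returns the infected set at each
    time point t = 0, ..., T."""
    infected = {seed_node}
    history = [set(infected)]
    for _ in range(T):
        newly = set()
        for i in infected:
            j = rng.choice(adj[i])                    # neighbour, uniformly
            if j not in infected and rng.random() < theta:
                newly.add(j)                          # infect w.p. theta
        infected |= newly
        history.append(set(infected))
    return history

# Illustrative network: a ring of 20 nodes (the slides use BA/ER graphs).
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
hist = si_spread(adj, seed_node=0, theta=0.5, T=50, rng=random.Random(7))
```

The returned `hist` is exactly the temporal data the ABC scheme below observes: a growing sequence of infected sets, one per time point.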
Inference of the SI process on networks M: SI process generating y with parameters (θ, nSN) Inference of the parameters (θ, nSN) defining the SI process A challenging problem due to the intractability of the likelihood New types of inferential schemes are required
ABC for spreading process on networks ABC for spreading process: Observe temporal data y on the network Set up a model M described by the SI process Sample parameters θ∗, nSN∗ from the prior p(θ, nSN) Simulate temporal data y∗ from model M using θ∗, nSN∗ Accept θ∗, nSN∗ if d(S(y∗), S(y)) ≤ ε, using application-relevant summaries S For inference, we use simulated annealing ABC SABC, Statistics and Computing, 2015
Simulated network topologies Network structure and its connectivity affect the disease spread Two different network topologies: 1 Barabasi-Albert network (BA): scale-free degree distribution 2 Erdös-Rényi network (ER): edges are independent and equally likely Simulation setting: BA (m = 4) and ER (p = 0.05) networks with 100 nodes First observed time slice: t0 Length of the simulated SI process: T
Details of ABC Need to define: 1 Model M: SI process as a generative model 2 Summary statistics S: y → S(y) 3 Discrepancy measure d(S(y), S(y∗))
Details of ABC
Statistics: two summary statistics
s = (s1, s2, . . . , sT), si = proportion of infected nodes at ti (if the whole network is infected at time τ, then st = 1 for t > τ)
G = (G1, G2, . . . , GT), Gi = sub-graph of infected nodes at ti
Distance: two distances combined
1 d1(s^1, s^2) = (1/T) Σ_{t=1}^{T} (s^1_t − s^2_t)²
2 d2(G^1, G^2) = (1/T) Σ_{t=1}^{T} Σ_{i∈G^1_t} Σ_{j∈G^2_t} ρ(i, j)/ρmax, where ρ(i, j) = the shortest path between nodes i and j and ρmax = maximum shortest path length on the network
3 Final distance: d((s^1, G^1), (s^2, G^2)) = [d1(s^1, s^2) + d2(G^1, G^2)]/2
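A sketch of the combined discrepancy, assuming d2 sums ρ(i, j)/ρmax over pairs of infected nodes and the final distance averages d1 and d2 (the paper may normalise d2 differently, e.g. by the sub-graph sizes); shortest paths are computed by breadth-first search:

```python
from collections import deque

def bfs_distances(adj, src):
    """Shortest-path lengths from src to every reachable node (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def abc_distance(s1, s2, G1, G2, adj, rho_max):
    """Combined discrepancy: mean squared gap between infection
    proportions (d1) plus mean normalised shortest-path distance
    between the infected sub-graphs (d2), averaged as (d1 + d2) / 2."""
    T = len(s1)
    d1 = sum((a - b) ** 2 for a, b in zip(s1, s2)) / T
    d2 = 0.0
    for g1, g2 in zip(G1, G2):
        paths = {i: bfs_distances(adj, i) for i in g1}
        d2 += sum(paths[i][j] / rho_max for i in g1 for j in g2)
    d2 /= T
    return 0.5 * (d1 + d2)

# Tiny worked example on the path graph 0-1-2-3 (rho_max = 3):
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d = abc_distance(
    s1=[0.25, 0.5], s2=[0.25, 0.75],
    G1=[{0}, {0, 1}], G2=[{0}, {0, 1, 2}],
    adj=adj, rho_max=3,
)
```

In the worked example d1 = 0.03125 and d2 = 5/6, so the combined distance is 83/192 ≈ 0.432.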
Details of ABC Prior on the seed node: uniform on the infected nodes at t1 Prior on the infectivity parameter: uniform on [0, 1] Perturbation kernel: a perturbation kernel, used to explore the parameter space, is defined as a distribution on (θ, nSN) given the present parameter values (θ∗, nSN∗): K((θ, nSN)|(θ∗, nSN∗)) = K1(θ|θ∗, σ̂) K2(nSN|nSN∗) where K1 = Gaussian, with σ̂ the optimally rescaled variance of the θ sampled in the previous step of SABC K2 = discrete distribution on the neighboring nodes of nSN∗, with each node having a probability inversely proportional to its degree
BA (top) & ER (bottom): marginal posterior (SI) First observed time slice: t0 = 20 Length of the simulated SI process: T = 70 [Figure: marginal posterior P(θ|x0) of the infectivity θ (true value θ0 marked) and posterior of the shortest-path distance ρ(nSN, nSN^0) to the true seed node, for BA (top) and ER (bottom) networks.]
BA (top) & ER (bottom): Bayes estimate (SI) Average over 100 simulated datasets for the same true parameter value [Figure: distributions of the Bayes estimates θ̂ (true value θ0 marked) and of ρ(n̂SN, nSN^0), for BA (top) and ER (bottom) networks.]
Complex contagion process (CC): Modeling fake news on a social network Rate of exposure (β): Can we infer the rate of exposure of a node to infected nodes? Threshold (γ): Can we infer the percentage of infected neighboring nodes needed to infect a new node or to make it believe the fake news? Seed node (nSN): Can we infer the source of the infection or of the fake news? For algorithmic details please consult Dutta et al., 2017
BA (top) & ER (bottom): marginal posterior (CC) First observed time slice: t0 = 20 Length of the simulated process: T = 120 [Figure: joint posterior P(β, γ|x0) (true value θ0 marked) and posterior of ρ(nSN, nSN^0), for BA (top) and ER (bottom) networks.]
BA (top) & ER (bottom): Bayes estimate (CC) Over 100 simulated datasets for the same true parameter value [Figure: distributions of the Bayes estimates (β̂, γ̂) (true value θ0 marked) and of ρ(n̂SN, nSN^0), for BA (top) and ER (bottom) networks.]
BA (left) & ER (right): dependence of inference on ΔT = T − t0 [Figure: marginal posteriors of θ (SI, top) for increasing observation windows ΔT from 10 to 50, and joint posteriors of (β, γ) (CC, bottom) for ΔT = 28, 46, 100; true values θ0 marked.]
Real Network: SI on an Indian village contact network Human-human interaction network obtained by considering a financial network between 354 villagers living in the South Indian state of Karnataka 354 nodes and 1541 edges [Figure: marginal posterior P(θ|x0) of θ and posterior of ρ(nSN, nSN^0).] The shortest path length between nSN^0 = 70 and n̂SN = 59 is 1
Real Network: CC on the Facebook social network 4039 nodes and 88234 edges [Figure: joint posterior P(β, γ|x0) and posterior of ρ(nSN, nSN^0).] The shortest path length between nSN^0 = 2000 and n̂SN = 2435 is 1
Summary Simulator-based models: a diffusion process on a fixed network Challenge: inference of the parameters of the spreading process and of the seed node, due to the intractability of the likelihood Solution: Approximate Bayesian Computation Application: ABC works for different simulated and real network topologies
Computational facility ABCPy: a Python suite for ABC, user friendly and modular Parallelized: the most popular ABC algorithms are parallelized Super-computers: work done in collaboration with the Centro Svizzero di Calcolo Scientifico - CSCS Usability: in collaboration with CSCS, we offer to infer the model/parameters of your problem using the most powerful supercomputer in Europe (CRAY)
ABC for Model Comparison The standard Bayesian approach to model comparison involves Bayes factors and posterior probabilities of the model index Posterior probabilities are poorly estimated by ABC (theory and simulation) Alternative approach: select the most likely model using a classifier and compute the ABC approximation of the posterior predictive error rate
ABC for Model Comparison ABC for model comparison (Part I): Observe an empirical graph G Identify alternative possible generative network models M1 and M2 Draw the model index from the model prior: τ1 = P(M = 1) = P(M = 2) = τ2 Draw parameter θ∗ from the prior p(θ|M) Simulate graph G∗ from the given generative network model using parameter θ∗ Accept θ∗ if ρ(S(G∗), S(G)) ≤ ε, using any summaries S
ABC for Model Comparison ABC for model comparison (Part II): Accepted draws form the ABC approximation of the joint posterior p(θ, M|y) Generate n independent pseudo-data sets for each such draw (ABC approximation of the posterior predictive distribution) Compute the posterior error rate using a random forest classifier (i.e., how frequently it returns the true model index)
Super Learner for Model Selection Select the model based on ABC training data with the Super Learner (SL) SL = ensemble algorithm for prediction Combines a library of candidate algorithms Given a loss function, SL aims for the lowest cross-validated risk Discrete SL minimizes over the library of candidate algorithms Full SL minimizes over convex combinations of the candidate algorithms
Given a bounded loss function, both versions of SL have the oracle property The user is able to select: Loss function Candidate algorithms Predictors for each candidate algorithm This flexibility is invaluable for models with intractable likelihood
Procedure for Model Selection [figure]
We use as loss function 1 − AUC AUC is the area under the receiver operating characteristic curve Choice of statistics to use for calibration and to use as predictors Sufficiency is unlikely to be attainable given the nature of the likelihood Statistics for calibration: need to characterize similarities between candidate models Ensure candidate models can plausibly generate the observed network Statistics as predictors: need to characterize the differences between candidate models
Model for Proof of Concept Model (two mechanisms) for graphs with fixed node and edge counts 1 Pick two unconnected nodes uniformly at random 2 Determine the probability to add the edge based on the number of closed triangles 3 Add the edge based on the determined probability 4 Repeat until the requisite edge count is reached
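One possible reading of this generator in Python. The mapping from the number of closed triangles to an acceptance probability (p0 = 0.3 when no triangle is closed, p1 = 0.1 otherwise) is a hypothetical choice echoing the parameter values of the simulation study, not the exact functional form used in the paper:

```python
import random

def generate_graph(n_nodes, n_edges, edge_prob, rng):
    """Fixed node/edge count generator: repeatedly pick an unconnected
    pair uniformly at random and add the edge with a probability that
    depends on the number of triangles the edge would close, until the
    requisite edge count is reached."""
    neighbors = {v: set() for v in range(n_nodes)}
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n_nodes), 2)       # step 1: random pair
        if v in neighbors[u]:
            continue                               # already connected
        closed = len(neighbors[u] & neighbors[v])  # triangles it would close
        if rng.random() < edge_prob(closed):       # steps 2-3: probabilistic add
            edges.add((min(u, v), max(u, v)))
            neighbors[u].add(v)
            neighbors[v].add(u)
    return edges

# Hypothetical attachment probabilities echoing p0 = 0.3, p1 = 0.1:
prob = lambda k: 0.3 if k == 0 else 0.1
g = generate_graph(n_nodes=100, n_edges=500, edge_prob=prob, rng=random.Random(3))
```

Running the generator under the full model and under the submodel (where `edge_prob` ignores the triangle count) produces the two training populations the Super Learner discriminates between.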
Simulation Studies Simulation for selection between: Full model with both mechanisms Submodel with only the first mechanism 10,000 networks generated from each as training data p0 = 0.3 and p1 = 0.1 for both candidates Edge count varies over 500, 1000, 2000 p2 for the full model varies over 0.05, 0.03, 0.01, 0.005
Statistic for calibration: edge count Statistics as predictors: Triangle count Average local clustering coefficient Quartiles of the degree distribution Candidate algorithms for SL: Support vector machine k-nearest neighbors Random forest Performance will be evaluated on the training data directly via cross-validation
Conclusions ABC: a very powerful methodology for sound statistical inference in likelihood-free settings Two distinct paradigms for the study of networked systems: Statistical modeling of networks (statistics) Network science (physics and applied mathematics) Statistical network science, a mixture of these two paradigms, appears promising in the era of complex data
Further research lines Fast ABC inference via a surrogate model for the network Inference on both the stochastic process (e.g. a spreading process) and the underlying dynamic network Embed the Super Learner in a UQ framework

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Approximate Bayesian Computation for Inference in Generative Models for Large-scale Network Data - Antonietta Mira, Dec 13, 2017

  • 1.
    Approximate Bayesian Computation forinference in generative models for large-scale network data Antonietta Mira Data Science, Group, ICS Università della Svizzera italiana, USI, Lugano, Switzerland and Department of Science and High Technology, DISAT Università dell’Insubria, Como, Italy joint with JP Onnela, Sixing Chen Harvard School of Public Health, US Ritabrata Dutta ICS, USI 13.12.17, SAMSI, USA Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 2.
    Agenda Large-scale network dataand Network models Approximate Bayesian Computation ABC for large-scale network data Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 3.
    Large-scale network data:Digitized Relationship Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 4.
    Relationships =⇒ Adjacencymatrix Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 5.
    Relationships =⇒ Adjacencymatrix + edge weights + node covariates + time + spatial coordinates + . . . = complex/large network data Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 6.
    Network data social network:friendships financial network: EU overnight interbank money transfer economy: EU countries import-export biology/ genetics networks: protein-protein interaction communication network: CDR information / knowledge network: patents citation / collaboration network: wikipedia / co-authorship Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 7.
    Motivation Systems of scientificand societal interest have large numbers of interacting components Representation as networks: node = component, edge = interaction Different models of network data: Models of network structure (e.g. Erdös-Rényi) Models of dynamical processes on networks (e.g. SI model) Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 8.
    Communication network of7 million nodes + 23 million ties JP Onnela et al. PNAS, 2007 Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 9.
    Big picture ofstatistical inference GIVEN: Data = y = (y1, . . . , yn) Model which describes data, py|θ(y|θ) indexed by Parameters = θ = (θ1, . . . , θd ) Prior probability density function for θ, pθ WANTED: Some probabilistic statement about θ and model point estimation confidence/credible intervals hypothesis testing prediction model selection Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 10.
    Big picture ofstatistical inference GIVEN: Data = y = (y1, . . . , yn) Model which describes data, py|θ(y|θ) indexed by Parameters = θ = (θ1, . . . , θd ) Prior probability density function for θ, pθ WANTED: Some probabilistic statement about θ and model point estimation confidence/credible intervals hypothesis testing prediction model selection Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 11.
    Two types ofmodels Statistical model py|θ(yi |θ) = 1 √ 2πσ2 exp − 1 2σ2 (yi − µ)2 , θ = (µ, σ) Generative model → given θ = (µ, σ) →zi ∼ N(0, 1) →yi = µ + σzi →yi ∼ py|θ(yi |θ) In some setting easier to give the model in the generative form Sometimes there is no 1:1 correspondence bwn statistical and generative model Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 12.
    Big picture ofstatistical inference GIVEN: Data = y = (y1, . . . , yn) Model which describes data, py|θ(y|θ) indexed by Parameters = θ = (θ1, . . . , θd ) Prior probability density function for θ, pθ WANTED: Some probabilistic statement about θ and model point estimation confidence/credible intervals hypothesis testing prediction model selection Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 13.
    Likelihood-based statistical inference Likelihoodfunction: L(θ) ∝ py|θ(y|θ) Plays a central role in statistical inference Maximum likelihood estimation: ˆθMLE = argmaxθ L(θ) Bayesian inference: pθ|y (θ|y) ∝ L(θ)pθ(θ) Not available for generative models Antonietta Mira / ICS and DISAT Approximate Bayesian Computation
  • 14.
Why generative models? They allow us to use domain knowledge on how the data were generated without having to make excessive compromises in the modeling. They scale well with big data. No limits on the number of unobserved/latent variables. It is easier to study the effect of interventions on generative models than on statistical models. LFI: a procedure to obtain probabilistic statements about θ even if the likelihood is not available, as for generative models. ABC is a LFI methodology.
Examples of generative models. Astrophysics: simulating the formation of galaxies, stars, or planets. Evolutionary biology: simulating species evolution. Ecology: simulating species migration over time. Neuroscience: simulating neural circuits. Epidemiology: simulating the spread of an infectious disease. Meteorology: simulating weather prediction.
Approximate Bayesian Computation (ABC). The starting point is Bayes' theorem: p(θ|y) = p(y|θ) p(θ) / p(y), where p(θ|y) = posterior, p(y|θ) = likelihood, p(θ) = prior, p(y) = evidence. Evaluation of the LHD may be computationally expensive or infeasible, ruling out both likelihood-based and posterior-based inference. Approximate Bayesian Computation (ABC) avoids direct evaluation of the LHD and approximates it by generating synthetic data (synthetic observations) by simulation from the model.
Approximate Bayesian Computation (ABC). The ABC rejection sampler is the simplest form of ABC. ABC rejection sampler: sample parameter θ∗ from the prior p(θ); simulate a dataset y∗ under the given model specified by θ∗: y∗ ∼ p(·|θ∗); accept θ∗ if d(y∗, y) ≤ ε. The distance measure d(y∗, y) determines the level of discrepancy between the simulated data y∗ and the observed data y. The accepted θ∗ are approximately distributed according to the desired posterior and, crucially, obtained without the need of explicitly evaluating the LHD.
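The rejection sampler above can be sketched generically; as a toy illustration (not from the slides) we infer the mean of a Gaussian with known unit variance, using the sample mean as summary and a uniform prior.

```python
import random
import statistics

def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_draws, rng):
    """Generic ABC rejection sampler: keep theta* when d(y*, y) <= eps."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)           # theta* ~ p(theta)
        y_sim = simulate(theta, rng)        # y* ~ p(. | theta*)
        if distance(y_sim, y_obs) <= eps:   # accept on small discrepancy
            accepted.append(theta)
    return accepted

rng = random.Random(1)
y_obs = [rng.gauss(3.0, 1.0) for _ in range(200)]   # "observed" data, true mean 3
post = abc_rejection(
    y_obs,
    prior_sample=lambda r: r.uniform(0.0, 6.0),
    simulate=lambda th, r: [r.gauss(th, 1.0) for _ in range(200)],
    distance=lambda a, b: abs(statistics.fmean(a) - statistics.fmean(b)),
    eps=0.1, n_draws=5000, rng=rng)
```

The accepted draws approximate the posterior of the mean; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate.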
Approximate Bayesian Computation (ABC). [Figure: the generative model M(θ) and Nature both map into the data space, producing the simulated dataset y∗ and the observed dataset y0 respectively.]
Approximate Bayesian Computation (ABC). It may be unfeasible to compute the distance d(y∗, y) for high-dimensional data. A lower-dimensional summary statistic S(y) is used to capture the relevant information in y. The comparison is done between S(y∗) and S(y): accept θ∗ if d(S(y∗), S(y)) ≤ ε.
Principle behind likelihood-free inference. Replace the LHD, py(y|θ), by the SUMMARY LHD pS(S(y)|θ), where S(y) = summary statistics. But pS(S(y)|θ) is also unknown.
Principle behind likelihood-free inference. Replace the LHD, py(y|θ), by the SUMMARY LHD pS(S(y)|θ), where S(y) = summary statistics. But pS(S(y)|θ) is also unknown. Use an APPROXIMATE SUMMARY LHD, p̃S(S(y)|θ), based on pseudo data y∗ generated from the model. The POSTERIOR is also approximate and summarized: p̃S(θ|S(y)) ∝ p(θ) p̃S(S(y)|θ).
Approximate Bayesian Computation (ABC). If S is sufficient wrt θ, then it contains all information in y about θ (by definition), and using S(y) in place of the full dataset does not introduce any error. For most models it may be impossible to find sufficient statistics S, in which case application-relevant summary statistics are used. Use of non-sufficient summary statistics introduces a further level of approximation.
Network Models. Distinction between two types of models of network structure. Statistical models (e.g. ERGM, Goyal-Blitzstein-DeGruttola), DATA DRIVEN. Pros: inference on model parameters; hypothesis testing; model selection. Cons: scalability; hard to incorporate domain knowledge. Generative models (e.g. Price model), KNOWLEDGE DRIVEN: assume that the microscopic mechanisms that govern network formation and evolution are known, and ask what happens if we apply these mechanisms repeatedly. Pros: scalability, easy to incorporate knowledge, intervention. Cons: no inferential tools; no model comparison.
Generative Model of Social and Contact Networks. From the perspective of time expenditure of subject i: spend time with existing friends (a), become friend of a friend (b), make totally new friends (c).
Generative Model of Social and Contact Networks [illustrative figures].
ABC for Generative Network Models. ABC + generative network model = a generic + sound inferential framework. ABC rejection sampler for generative network models: observe an empirical graph G; set up a generative network model M; sample parameter θ∗ from the prior p(θ); simulate graph G∗ from model M using parameter θ∗; accept θ∗ if d(S(G∗), S(G)) ≤ ε. Some simple network summaries: degree sequence, k-stars, subgraph counts, centrality measures (betweenness, eigenvector, random walk, etc.), etc. Can use KNN to identify points in the space of summary statistics close to S(G). From Rejection-ABC to SMC-ABC by Drovandi+Pettitt, 2015.
Exact inference for Erdös-Rényi (ER). For the Erdös-Rényi model the LHD is available: Aij = dyad (pair of nodes): value 1 (connected) or 0 (not connected); Yij = RV coding the state of the dyad; p = P(Yij = 1); N = number of nodes; n = number of dyads: N(N − 1)/2 (undirected) and N(N − 1) (directed); L = number of connected dyads (sufficient statistic). LHD = p(G|p) = Π_{i≠j} P(Yij = Aij) = Π_{i≠j} p^Aij (1 − p)^(1−Aij) = p^L (1 − p)^(n−L). MLE = p̂ = L/n = proportion of connected dyads among all dyads.
Exact Bayesian inference for ER. Beta prior: p(θ) = Beta(α = 5, β = 50). Beta posterior: p(θ|D) = Beta(α + L, β + [N(N − 1)/2 − L]). Bayesian estimator = posterior mean.
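The conjugate update on this slide is a one-liner; a small sketch computing the MLE and the Beta posterior mean for an undirected ER graph (the edge count L = 250 below is an arbitrary illustrative value):

```python
def er_inference(N, L, alpha=5.0, beta=50.0):
    """Exact ER inference: MLE and conjugate Beta posterior mean."""
    n = N * (N - 1) // 2                        # number of dyads (undirected)
    p_mle = L / n                               # MLE: connected dyads / all dyads
    a_post, b_post = alpha + L, beta + (n - L)  # Beta(alpha + L, beta + n - L)
    post_mean = a_post / (a_post + b_post)
    return p_mle, post_mean

p_mle, post_mean = er_inference(N=100, L=250)
```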
Approximate Bayesian inference for Erdös-Rényi. Simulate an observed ER graph G from a model with N = 100 and p = 0.05.
Approximate Bayesian inference for Erdös-Rényi. Simulate an observed ER graph G from a model with N = 100 and p = 0.05. Number of edges = summary statistic = S(G) = L.
Approximate Bayesian inference for Erdös-Rényi. Simulate an observed ER graph G from a model with N = 100 and p = 0.05. Number of edges = summary statistic = S(G) = L. We perform 100,000 draws p_i from the prior. For each p_i, we generate a sequence of graphs G_i^1, . . . , G_i^K, compute the corresponding summary statistics S(G_i^1), . . . , S(G_i^K), and retain p_i if the simulated edge count L_i equals L = S(G). We retained only 162 values of p_i.
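This scheme can be sketched directly, using the fact that for an ER graph the edge count is Binomial(n, p), so the summary can be simulated without building the graph. The observed count 247 (≈ np for N = 100, p = 0.05) is assumed for illustration, and for speed this sketch accepts within a small tolerance of L rather than on the exact match used on the slide.

```python
import random

def abc_er_edges(L_obs, N, prior, n_draws, tol, rng):
    """ABC rejection for ER: edge-count summary, L* ~ Binomial(n, p*)."""
    n = N * (N - 1) // 2
    accepted = []
    for _ in range(n_draws):
        p_star = prior(rng)                                  # p* ~ prior
        L_star = sum(rng.random() < p_star for _ in range(n))  # simulated edge count
        if abs(L_star - L_obs) <= tol:                       # near-match on summary
            accepted.append(p_star)
    return accepted

rng = random.Random(2)
post = abc_er_edges(L_obs=247, N=100,
                    prior=lambda r: r.betavariate(5, 50),    # Beta(5, 50) prior
                    n_draws=2000, tol=5, rng=rng)
```

As on the slide, the posterior draws concentrate near the true p = 0.05 even though the Beta(5, 50) prior is centered elsewhere.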
Approximate Bayesian inference for Erdös-Rényi. Simulate an observed ER graph G from a model with N = 100 and p = 0.05. Number of edges = summary statistic = S(G) = L. We perform 100,000 draws p_i from the prior. For each p_i, we generate a sequence of graphs G_i^1, . . . , G_i^K, compute the corresponding summary statistics S(G_i^1), . . . , S(G_i^K), and retain p_i if the simulated edge count L_i equals L = S(G). We retained only 162 values of p_i. The mean and median of the prior are 0.091 and 0.086; the 95% prior credible interval is [0.031, 0.178]. The mean and median of the posterior are 0.050 and 0.050; the 95% posterior credible interval is [0.043, 0.055].
Approximate Bayesian inference for Erdös-Rényi. Degree distribution = summary statistic = S(G); KS = distance measure. We perform 100,000 draws from the prior. For each p_i, we generate a sequence of graphs G_i^1, . . . , G_i^K, compute the corresponding summaries S(G_i^1), . . . , S(G_i^K), and obtain a sequence of distances y_i^1 = d(S(G_i^1), S(G)), . . . , y_i^K = d(S(G_i^K), S(G)).
Approximate Bayesian inference for Erdös-Rényi. Pool the distances y_i^k over all values of i and k: 100,000 · K in total. Choose a cutoff value d∗ = 10th percentile. Higher acceptance rate: 761.
Approximate Bayesian inference for Erdös-Rényi. Pool the distances y_i^k over all values of i and k: 100,000 · K in total. Choose a cutoff value d∗ = 10th percentile. Higher acceptance rate: 761. The mean and median of the prior are 0.09 and 0.09; the 95% prior CI is [0.03, 0.18]. The mean and median of the posterior are 0.05 and 0.05; the 95% posterior CI is [0.04, 0.06].
Inference for Erdös-Rényi. TRUE: density functions for the prior (solid green) and the posterior (solid red). Here the prior distribution is the beta distribution B(α = 5, β = 50). The true value of the parameter is θ = p = 0.05 = black vertical line. ESTIMATED: prior p(θ) (green histogram) and posterior p(θ|y) (red histogram).
Approximate Bayesian inference for Easley-Kleinberg (EK). The EK model is a simple model of directed networks. Unlike ER, the EK model is a growing network model: new vertices attach themselves to existing ones using a preferential attachment rule. Each new node is not equally likely to attach itself to any of the existing nodes but, instead, has a preference for some nodes over others based on their degree. More specifically, linear preferential attachment specifies that an incoming node will attach to an existing vertex vi with probability proportional to its degree ki. More complicated functions of ki are possible, and in general one can incorporate nodal attributes, or covariates, in the expression for the attachment function. RESULTS: similar performance to the Erdös-Rényi model.
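Linear preferential attachment can be sketched with the classic stub-list trick: keeping each node in a list once per unit of degree makes uniform sampling from the list degree-proportional. This is a minimal one-edge-per-newcomer sketch, not the EK model's exact bookkeeping.

```python
import random

def preferential_attachment(T, rng):
    """Grow a network to T nodes; each newcomer attaches to one existing
    node chosen with probability proportional to its degree."""
    degree = {0: 1, 1: 1}       # start from a single edge between nodes 0 and 1
    stubs = [0, 1]              # node i appears in this list degree[i] times
    for new in range(2, T):
        target = rng.choice(stubs)   # uniform over stubs = degree-proportional
        degree[target] += 1
        degree[new] = 1
        stubs.extend([target, new])
    return degree

deg = preferential_attachment(1000, random.Random(3))
```

The resulting degree sequence is heavy-tailed: a few early nodes accumulate far more links than the typical node, unlike in an ER graph.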
Inference for social network generative model. ABC + social network generative model. Prior and posterior draws. True parameter values: 0.3 and 0.6 (solid lines). Posterior means: 0.299 and 0.613 (dashed lines).
Summary. Simulator-based models: generative models for networks. Challenge: inference of the parameters of a generative model is challenging due to the intractability of the likelihood function. Solution: Approximate Bayesian Computation. Application: ABC works for different mechanistic models.
Spreading Process on a Network. We study the spread of an epidemic / fake news on a network: this can be considered as a spreading process on a network. Data = a fixed and known human contact interaction network = a fixed and known social network (N). Data = at some time points over a period of observation we know which nodes in the network are infected = we know which websites report the fake news.
Simple spreading process: Susceptible-Infected (SI). Rate of diffusion (θ): how fast does a disease or a piece of fake news spread on a network? Seed node (nSN): can we infer the source of the infection or of the fake news?
Algorithm 1: SI spreading process over the network N, starting at the seed node n0_SN, for each time point t = 0, . . . , T
1: for t = 0 to T do
2:   It = infected nodes at time t
3:   for each i ∈ It do
4:     select j from the neighbors of i with equal probability
5:     if neighbor j is already infected, do nothing; if not, infect it with probability θ
6:   end for
7: end for
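Algorithm 1 translates almost line for line into code. The 5-node adjacency list below is a toy contact network chosen only for illustration.

```python
import random

def si_process(network, seed, theta, T, rng):
    """SI spreading process (Algorithm 1): at each step every infected node
    picks one neighbor uniformly; a susceptible neighbor is infected with
    probability theta."""
    infected = {seed}
    history = [set(infected)]           # infected sets I_0, I_1, ..., I_T
    for _ in range(T):
        new = set()
        for i in infected:
            j = rng.choice(network[i])  # neighbor chosen with equal probability
            if j not in infected and rng.random() < theta:
                new.add(j)
        infected |= new
        history.append(set(infected))
    return history

net = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
hist = si_process(net, seed=0, theta=0.5, T=200, rng=random.Random(4))
```

The history of infected sets is exactly the kind of temporal snapshot data that the ABC scheme on the next slides conditions on.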
Inference of the SI process on networks. M: SI process → y, governed by (θ, nSN). Inference of the parameters (θ, nSN) defining the SI process is a challenging problem due to the intractability of the likelihood; new types of inferential schemes are required.
ABC for spreading processes on networks. ABC for a spreading process: observe temporal data y on the network; set up a model M described by the SI process; sample parameters θ∗, n∗_SN from the prior p(θ, nSN); simulate temporal data y∗ from model M using θ∗, n∗_SN; accept θ∗, n∗_SN if d(S(y∗), S(y)) ≤ ε, using application-relevant summaries S. For inference, we use simulated annealing ABC (SABC, Statistics and Computing, 2015).
Simulated network topologies. Network structure and connectivity affect the disease spread. Two different network topologies: (1) Barabasi-Albert network (BA): scale-free degree distribution; (2) Erdös-Rényi network (ER): edges are independent and equally likely. Setup: BA (m = 4) and ER (p = 0.05) networks with 100 nodes; first observed time slice: t0; length of the simulated SI process: T.
Details of ABC. Need to define: (1) the model M: the SI process as a generative model; (2) summary statistics S: y → S(y); (3) a discrepancy measure d(S(y), S(y∗)).
Details of ABC. Statistics: two summary statistics. s = (s1, s2, . . . , sT), with si = proportion of infected nodes at ti (if the whole network is infected at time τ, then st = 1 for t > τ). G = (G1, G2, . . . , GT), with Gi = sub-graph of infected nodes at ti. Distance: two distances combined. (1) d1(s¹, s²) = (1/T) Σ_{t=1}^T (s¹_t − s²_t)². (2) d2(G¹, G²) = (1/T) Σ_{t=1}^T Σ_{i∈G¹_t} Σ_{j∈G²_t} ρ(i, j)/ρmax, where ρ(i, j) is the shortest path between nodes i and j and ρmax is the maximum shortest path length on the network. (3) Final distance: d((s¹, G¹), (s², G²)) = [d1(s¹, s²) + d2(G¹, G²)]/2.
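These distances can be written down directly once the shortest-path lengths ρ(i, j) are precomputed; here ρ is passed as a lookup table, and the two-node, two-time-slice example at the bottom is a toy illustration.

```python
def d1(s1, s2):
    # mean squared difference between the infection-proportion curves
    T = len(s1)
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) / T

def d2(G1, G2, rho, rho_max):
    # mean over time of the summed normalized shortest paths between
    # the two infected sub-graphs; rho is a precomputed path-length lookup
    T = len(G1)
    return sum(rho[i][j] / rho_max
               for g1, g2 in zip(G1, G2)
               for i in g1 for j in g2) / T

def combined(s1, G1, s2, G2, rho, rho_max):
    # final distance, read here as the average of the two components
    return 0.5 * (d1(s1, s2) + d2(G1, G2, rho, rho_max))

# toy example: 2 nodes at unit distance, 2 time slices
rho = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
val = combined([0.5, 1.0], [{0}, {0, 1}], [0.5, 0.5], [{0}, {0}], rho, rho_max=1)
```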
Details of ABC. Prior on the seed node: uniform on the infected nodes at t1. Prior on the infectivity parameter: uniform on [0, 1]. Perturbation kernel: a perturbation kernel used to explore the parameter space, defined as a distribution on (θ, nSN) given the present parameter values (θ∗, n∗_SN): K((θ, nSN)|(θ∗, n∗_SN)) = K1(θ|θ∗, σ̂) K2(nSN|n∗_SN), where K1 is Gaussian, with σ̂ the optimally rescaled variance of the θ sampled in the previous step of SABC, and K2 is a discrete distribution on the neighboring nodes of n∗_SN, with each node having probability inversely proportional to its degree.
BA (top) & ER (bottom): marginal posterior (SI). First observed time slice: t0 = 20; length of the simulated SI process: T = 70. [Figure: marginal posterior density P(θ|x0), with the true value θ0 marked, and posterior of the distance ρ(nSN, n0_SN) to the true seed node, for the BA and ER networks.]
BA (top) & ER (bottom): Bayes estimates (SI). Averaged over 100 simulated datasets for the same true parameter value. [Figure: sampling distributions of the Bayes estimates θ̂ and of ρ(n̂SN, n0_SN), for the BA and ER networks.]
Complex contagion process (CC): modeling fake news on a social network. Rate of exposure (β): can we infer the rate of exposure of a node to infected nodes? Threshold (γ): can we infer the percentage of infected neighboring nodes needed to infect a new node, or to make it believe the fake news? Seed node (nSN): can we infer the source of the infection or of the fake news? For algorithmic details please consult Dutta et al., 2017.
BA (top) & ER (bottom): marginal posterior (CC). First observed time slice: t0 = 20; length of the simulated process: T = 120. [Figure: joint posterior density contours P(β, γ|x0), with the true value θ0 marked, and posterior of ρ(nSN, n0_SN), for the BA and ER networks.]
BA (top) & ER (bottom): Bayes estimates (CC). Over 100 simulated datasets for the same true parameter value. [Figure: sampling distributions of the Bayes estimates (β̂, γ̂) and of ρ(n̂SN, n0_SN), for the BA and ER networks.]
BA (left) & ER (right): dependence of inference on ∆T = T − t0. [Figure: posteriors of θ (SI, for ∆T = 10, 30, 50) and of (β, γ) (CC, for ∆T = 28, 46, 100), with the true values θ0 marked, for the BA and ER networks.]
Real network: SI on an Indian village contact network. A human-human interaction network obtained from a financial network between 354 villagers living in the South Indian state of Karnataka: 354 nodes and 1541 edges. [Figure: marginal posterior distributions of θ and of ρ(nSN, n0_SN).] The shortest path length between n0_SN = 70 and n̂SN = 59 is 1.
Real network: CC on the Facebook social network. 4039 nodes and 88234 edges. [Figure: marginal posterior distributions P(β, γ|x0) and of ρ(nSN, n0_SN).] The shortest path length between n0_SN = 2000 and n̂SN = 2435 is 1.
Summary. Simulator-based models: a diffusion process on a fixed network. Challenge: inference of the parameters of the spreading process and of the seed node is challenging due to the intractability of the likelihood. Solution: Approximate Bayesian Computation. Application: ABC works for different simulated and real network topologies.
Computational facility. ABCPy: a Python suite for ABC, user friendly and modular. Parallelized: the most popular ABC algorithms are parallelized. Super-computers: work done in collaboration with the Centro Svizzero di Calcolo Scientifico (CSCS). Usability: in collaboration with CSCS, we offer to infer the model/parameters of your problem using the most powerful supercomputer in Europe (CRAY).
ABC for Model Comparison. The standard Bayesian approach to model comparison involves Bayes factors and posterior probabilities of the model index. Posterior probabilities are poorly estimated by ABC (theory and simulation). Alternative approach: select the most likely model using a classifier and compute the ABC approximation of the posterior predictive error rate.
ABC for Model Comparison. ABC for model comparison (Part I): observe an empirical graph G; identify alternative possible generative network models M1 and M2; draw the model index from the model prior: τ1 = P(M = 1) = P(M = 2) = τ2; draw parameter θ∗ from the prior p(θ|M); simulate graph G∗ from the given generative network model using parameter θ∗; accept θ∗ if ρ(S(G∗), S(G)) ≤ ε, using any summaries S.
ABC for Model Comparison. ABC for model comparison (Part II): accepted draws form the ABC approximation of the joint posterior p(θ, M|y); generate n independent pseudo-datasets for each such draw (the ABC approximation of the posterior predictive distribution); compute the posterior error rate using a random forest classifier (i.e., how frequently it returns the true model index).
Super Learner for Model Selection. Select the model based on ABC training data with the Super Learner (SL). SL = ensemble algorithm for prediction; combines a library of candidate algorithms. Given a loss function, SL aims for the lowest cross-validated risk. Discrete SL minimizes over the library of candidate algorithms; full SL minimizes over convex combinations of the candidate algorithms.
Given a bounded loss function, both versions of SL have the oracle property. The user is able to select: the loss function, the candidate algorithms, and the predictors for each candidate algorithm. This flexibility is invaluable for models with intractable likelihood.
Procedure for Model Selection [figure].
We use as loss function 1 − AUC, where AUC is the area under the receiver operating characteristic curve. Choice of statistics to use for calibration and to use as predictors: sufficiency is unlikely to obtain given the nature of the likelihood. Statistics for calibration: need to characterize similarities between candidate models, and ensure candidate models can plausibly generate the observed network. Statistics as predictors: need to characterize the differences between candidate models.
Model for Proof of Concept. A model (2 mechanisms) for graphs with fixed node and edge count: (1) pick two unconnected nodes uniformly at random; (2) determine the probability to add an edge based on the number of closed triangles; (3) add the edge based on the determined probability; (4) repeat until the requisite edge count is reached.
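The four steps above can be sketched as follows. The slides do not spell out the attachment function, so the form p_base + p_tri · (common neighbors), and the names `p_base` and `p_tri`, are purely illustrative assumptions; the common-neighbor count is the number of triangles the proposed edge would close.

```python
import random

def grow_graph(N, E, p_base, p_tri, rng):
    """Grow a graph on N nodes to exactly E edges: repeatedly propose a
    uniformly random unconnected pair and add the edge with a probability
    that increases with the triangles it would close (assumed form)."""
    adj = {v: set() for v in range(N)}
    edges = 0
    while edges < E:
        u, v = rng.sample(range(N), 2)       # step 1: random unconnected pair
        if v in adj[u]:
            continue
        triangles = len(adj[u] & adj[v])     # step 2: triangles the edge closes
        if rng.random() < min(1.0, p_base + p_tri * triangles):
            adj[u].add(v)                    # step 3: add edge with that prob.
            adj[v].add(u)
            edges += 1                       # step 4: repeat until E edges
    return adj

g = grow_graph(N=100, E=500, p_base=0.3, p_tri=0.05, rng=random.Random(5))
```

Setting `p_tri = 0` recovers the submodel with only the first mechanism, which is the comparison studied on the next slide.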
Simulation Studies. Simulation for selection between: the full model with both mechanisms, and the submodel with only the first mechanism. 10,000 networks generated from each as training data; p0 = 0.3 and p1 = 0.1 for both candidates; the edge count varies over 500, 1000, 2000; p2 for the full model varies over 0.05, 0.03, 0.01, 0.005.
Statistic for calibration: edge count. Statistics as predictors: triangle count, average local clustering coefficient, quartiles of the degree distribution. Candidate algorithms for SL: support vector machine, k-nearest neighbors, random forest. Performance will be evaluated on the training data directly via cross-validation.
Conclusions. ABC: a very powerful methodology for sound statistical inference in likelihood-free settings. Two distinct paradigms for the study of networked systems: statistical modeling of networks (statistics), and network science (physics and applied mathematics). Statistical network science, a mixture of these two paradigms, appears potentially promising in the era of complex data.
Further research lines. Fast ABC inference via a surrogate model for the network. Joint inference both on the stochastic process (e.g. a spreading process) and on the dynamic network it runs on. Embed the Super Learner in a UQ framework.