Approximate Bayesian computation and machine learning Pierre Pudlo Université Montpellier 2 Institut de Mathématiques et Modélisation de Montpellier (I3M) Institut de Biologie Computationnelle Labex NUMEV BigMC: 19 juin 2014 Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 1 / 40
Table of Contents 1 Approximate Bayesian computation 2 ABC model choice 3 Population genetics example on SNP data 4 Conclusion Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 2 / 40
Joint work with Jean-Marie Cornuet, Arnaud Estoup, Mathieu Gautier and Renaud Vitalis Jean-Michel Marin, Christian Robert, Mohammed Sedki & Julien Stoehr Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 3 / 40
Table of Contents 1 Approximate Bayesian computation 2 ABC model choice 3 Population genetics example on SNP data 4 Conclusion Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 4 / 40
Motivation
Issue. When the likelihood function f(yobs | θ) is expensive or impossible to calculate, it is extremely difficult to sample from the posterior distribution π(θ | yobs). Two typical situations:
- Latent vector u of high dimension, including combinatorial structures (trees, graphs): f(y | θ) = ∫ f(y, u | θ) du
- Unknown normalizing constant: f(y | θ) = g(y, θ) / Z(θ) (e.g. Markov random fields, exponential random graphs, . . . )
Example. Population genetics: the likelihood of the current genetic data depends on a latent genealogy (a tree with random branch lengths).
ABC is a technique that only requires being able to sample from the likelihood f(· | θ). The technique stemmed from population genetics models about 15 years ago, and population geneticists still contribute significantly to the methodological development of ABC.
MCMC algorithms (Felsenstein, Hey, . . . ) get stuck on a tree of high probability when the scenario is complex (multimodal distribution on a combinatorial space).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 5 / 40
Population genetics
The typical questions of Arnaud Estoup:
- How did the Asian ladybird arrive in Europe?
- Why do they swarm right now?
- What are the routes of invasion?
- What was the size of the colony that founded a new population?
Answer using molecular data, Kingman's coalescent process and ABC inference procedures!
Under neutrality, the frequencies of the various alleles evolve according to a random process that depends on the demography.
[Figure annotation, translated from French: under the neutrality assumption the mutations are independent of the demography, i.e. the genealogy depends only on the demographic processes (e.g. N); the genealogy is built first, then mutations are added along its branches, giving the polymorphism under the demographic scenario considered.]
The coalescent puts a random distribution on the genealogy of a sample of genes, backward in time, until the most recent common ancestor. The mutation process is then added along the branches of the genealogy.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 6 / 40
Population genetics (cont')
Between populations: three types of events, backward in time
- a divergence is the fusion of two populations,
- an admixture is the split of a population into two parts,
- a migration moves some lineages of one population to another.
[Figure 2.3, caption translated from French: graphical representations of the three types of between-population events. Instantaneous events: (a) divergence, two populations merging; (b) admixture, three populations evolving in parallel, each tube carrying the Kingman coalescent of its own population. (c) Migration is more involved because of gene flow: a single evolutionary process governs the two populations, and migration events compete with coalescence events.]
The goal is to discriminate between different population scenarios from a dataset of polymorphism (DNA sample) yobs observed at the present time.
A complex scenario:
[Figure 2.1, caption translated from French: a complex evolutionary scenario involving four sampled populations Pop1, . . . , Pop4 and two unobserved populations Pop5 and Pop6. The branches are tubes, and the demographic scenario constrains the genealogy to stay inside them. Migration between Pop3 and Pop4 over [0, t3] is parameterized by rates m and m0; the two admixture events by the dates t1 and t3 and the admixture rates r and s; the three remaining events are divergences at t2, t4 and t5; the event at t04 is a population size change.]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 7 / 40
ABC recap
Likelihood-free rejection sampling
Rubin (1984) The Annals of Statistics; Tavaré et al. (1997) Genetics; Pritchard et al. (1999) Mol. Biol. Evol.
1) Set i = 1,
2) Generate θ' from the prior distribution π(·),
3) Generate z' from the likelihood f(· | θ'),
4) If ρ(η(z'), η(yobs)) ≤ ε, set (θi, zi) = (θ', z') and i = i + 1,
5) If i ≤ N, return to 2).
We keep the θ values such that the distance between the corresponding simulated dataset and the observed dataset is small enough.
Tuning parameters
- ε ≥ 0: tolerance level,
- η(z): function that summarizes datasets,
- ρ(·, ·): distance between vectors of summary statistics,
- N: size of the output.
ABC target: π_ε(θ, z | yobs) = π(θ, z | ρ(η(z), η(yobs)) ≤ ε) ∝ π(θ) f(z | θ) 1{ρ(η(z), η(yobs)) ≤ ε}
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 8 / 40
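A minimal sketch of the rejection sampler above on a toy Gaussian-mean model (NumPy assumed; the prior, likelihood and summary function below are illustrative stand-ins, not the models of the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def prior():                          # pi(.): toy N(0, 10^2) prior on a mean
    return rng.normal(0.0, 10.0)

def simulate(theta, n=50):            # f(. | theta): toy Gaussian likelihood
    return rng.normal(theta, 1.0, size=n)

def eta(z):                           # eta(z): summary statistics
    return np.array([z.mean(), z.std()])

def abc_rejection(y_obs, eps, N):
    """Keep N parameter values theta' with rho(eta(z'), eta(y_obs)) <= eps."""
    s_obs, kept = eta(y_obs), []
    while len(kept) < N:
        theta = prior()
        z = simulate(theta)
        if np.linalg.norm(eta(z) - s_obs) <= eps:   # rho: Euclidean distance
            kept.append(theta)
    return np.array(kept)

y_obs = rng.normal(2.0, 1.0, size=50)               # pretend observed dataset
post = abc_rejection(y_obs, eps=0.3, N=200)
print(post.mean(), post.std())                      # crude ABC posterior summaries
```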
Another point of view
Aim. Compute an approximation of π(θ | η(yobs)).
Likelihood-free rejection sampling
1) Set i = 1,
2) Generate θ' from the prior distribution π(·),
3) Generate z from the likelihood f(· | θ'),
4) If ρ(η(z), η(yobs)) ≤ ε, set θi = θ' and i = i + 1,
5) If i ≤ N, return to 2).
The general ABC algorithm
A) Generate a large set of (θ, z) from the Bayesian model, π(θ) f(z | θ)
B) Use nonparametric methods to infer the conditional distribution π(θ | η(yobs))
The set of (θ, z) computed in A) is often called the reference table.
If B) is a kernel density estimate of the conditional density, ε = the bandwidth. Likewise, if B) is a k-nearest neighbor estimate of π(θ | η(yobs)), ε = the largest distance among the accepted particles.
See Biau, Cérou, and Guyader (2014)
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 9 / 40
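The reference-table reading of the algorithm can be sketched as follows, on the same toy model; here ε is not fixed in advance but comes out as the k-th smallest distance, as stated above:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_pair():                       # one row of the reference table
    theta = rng.normal(0.0, 10.0)          # toy prior
    z = rng.normal(theta, 1.0, size=50)    # toy likelihood
    return theta, np.array([z.mean(), z.std()])

# Stage A): a large reference table of (theta, eta(z))
pairs = [simulate_pair() for _ in range(50_000)]
thetas = np.array([t for t, _ in pairs])
summaries = np.vstack([s for _, s in pairs])

# Stage B): k-nearest-neighbor estimate of pi(theta | eta(y_obs))
def knn_posterior(s_obs, k=500):
    dist = np.linalg.norm(summaries - s_obs, axis=1)
    idx = np.argsort(dist)[:k]
    eps = dist[idx].max()                  # implied tolerance = largest accepted distance
    return thetas[idx], eps

y_obs = rng.normal(2.0, 1.0, size=50)
sample, eps = knn_posterior(np.array([y_obs.mean(), y_obs.std()]))
print(sample.mean(), eps)
```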
Another point of view (cont')
The general ABC algorithm
A) Generate a large set of (θ, z) from the Bayesian model, π(θ) f(z | θ)
B) Use nonparametric methods to infer the conditional distribution π(θ | η(yobs))
As we can see, calibrating ε is a challenging question. It is either
- a bandwidth, or
- the number of neighbors,
and it cannot be known before stage A).
Other samplers do not send θ everywhere (hence are less time consuming):
- ABC-MCMC (Marjoram et al., 2003)
- Sequential methods: ABC-PMC (Beaumont et al. 2009), ABC-SMC (Del Moral et al. 2009)
- Adaptive sequential methods . . .
Drawbacks.
- how to calibrate ε?
- with an informative prior, the amount of time such algorithms can save is not large
⇒ I will not talk about those clever algorithms
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 10 / 40
New light on Beaumont's postprocessing
Beaumont's ABC algorithm
A) Generate a large set of (θ, z) from the Bayesian model, π(θ) f(z | θ)
B) Keep the particles (θ, z) such that ρ(η(yobs), η(z)) ≤ ε
C) Fit (in α and β) a weighted, local linear model explaining θ by η(z) on the kept particles:
θ = α + β'(η(z) - η(yobs)) + N(0, σ²)
Replace the θ's by the residuals, centered around α̂.
How to explain the algorithm from a learning point of view? Assume we want to approximate the expected value of the posterior distribution π(θ | η(yobs)), i.e.,
E[θ | η(z) = η(yobs)].
This is a regression problem. We can use some weighted, local linear model: Beaumont's postprocessing method (stage C)) is a local linear model. See also Blum and François (2010).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 11 / 40
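A rough sketch of stage C), the weighted local-linear adjustment, assuming the kept particles and their summaries come from a rejection or nearest-neighbor step as above (the Epanechnikov-type weights are one common choice; actual implementations may differ):

```python
import numpy as np

def beaumont_adjust(thetas, summaries, s_obs, eps):
    """Stage C): weighted local-linear adjustment theta* = theta - beta_hat'(eta(z) - eta(y_obs)).

    thetas:    (k,)  kept parameter values
    summaries: (k,d) summaries eta(z) of the kept particles
    s_obs:     (d,)  observed summaries eta(y_obs)
    eps:       tolerance used to keep the particles (acts as the kernel bandwidth)
    """
    diff = summaries - s_obs                               # eta(z) - eta(y_obs)
    dist = np.linalg.norm(diff, axis=1)
    w = np.clip(1.0 - (dist / eps) ** 2, 0.0, None)        # Epanechnikov-type weights
    X = np.hstack([np.ones((len(thetas), 1)), diff])       # intercept alpha + slopes beta
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], thetas * sw, rcond=None)  # weighted least squares
    beta = coef[1:]
    return thetas - diff @ beta            # residuals, recentred around alpha_hat
```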
Other issues I will skip
How to choose the set of summary statistics?
- Joyce and Marjoram (2008, SAGMB)
- Nunes and Balding (2010, SAGMB)
- Fearnhead and Prangle (2012, JRSS B)
- Ratmann et al. (2012, PLOS Comput. Biol.)
- Blum et al. (2013, Statistical Science)
In the context of ABC model choice?
- Barnes et al. (2012, Stat. and Comp.)
- Fearnhead et al. (2014, SAGMB)
- Estoup, Lombaert, Marin, Guillemaud, Pudlo, Robert, Cornuet (2012, Molecular Ecology): Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics
EP-ABC: Barthelmé and Chopin (2013, JASA), . . .
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 12 / 40
Table of Contents 1 Approximate Bayesian computation 2 ABC model choice 3 Population genetics example on SNP data 4 Conclusion Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 13 / 40
Model choice in population genetics
Included in the software DIYABC: Cornuet J-M, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin J-M, Estoup A (2014, Bioinformatics)
Example. Do pygmies share a common ancestral population?
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 14 / 40
Model choice in population genetics
Included in the software DIYABC: Cornuet J-M, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin J-M, Estoup A (2014, Bioinformatics)
Example. Do pygmies share a common ancestral population?
Several possible scenarios; scenario choice by ABC (Verdu et al. 2009).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 15 / 40
Another example: Worldwide routes of invasion of Harmonia axyridis
For each outbreak, the arrow indicates the most likely invasion pathway and the associated posterior probability value (P), with 95% credible intervals in brackets.
The harlequin ladybird was introduced to fight against another deleterious insect, the hop aphid (since 1916 in North America, since 1982 in Europe).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 16 / 40
Harmonia axyridis (cont')
From a scenario on the map to a tractable model: the evolutionary scenario becomes a model in which Kingman's coalescent is constrained to live in the scenario's "pipes".
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 17 / 40
Population genetics (cont')
For one population, the joint likelihood of the genealogy and of the polymorphism observed at the present time is available. However, the genealogical tree is a nuisance parameter, and calculating the likelihood of the parameters would require integrating out the unknown tree, which is not possible. This modelling clearly differs from the phylogenetic perspective, where the tree is the parameter of interest.
Choosing a population scenario is a model choice problem. We consider a collection of M models, for m ∈ {1, . . . , M}:
f_m(yobs | θm) and πm(θm)
Prior on the model index: π(1) = P(M = 1), . . . , π(M) = P(M = M)
Bayesian model choice is driven by
P(M = m | yobs) ∝ π(m) ∫ f_m(yobs | θm) πm(θm) dθm
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 18 / 40
ABC model choice
ABC model choice
A) Generate a large set of (m, θ, z) from the Bayesian model, π(m) πm(θ) f_m(z | θ)
B) Keep the particles (m, θ, z) such that ρ(η(yobs), η(z)) ≤ ε
C) For each m, return p̂m = proportion of m among the kept particles
If ε is tuned so that the number of kept particles is k, then p̂m is a k-nearest neighbor estimate of E[1{M = m} | η(yobs)].
Actually, approximating the posterior probabilities of each model is a regression problem where
- the response is 1{M = m},
- the covariates are the summary statistics η(z),
- the loss is L2 (conditional expectation).
The preferred method to approximate the posterior probabilities in DIYABC is a local polytomous logistic regression. It is less sensitive to k, but brings no clear gain in quadratic loss.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 19 / 40
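A minimal sketch of steps B) and C) on a reference table that already carries a model-index column, together with an unweighted multinomial logistic fit standing in for the local polytomous regression of DIYABC (scikit-learn assumed; the local weighting is dropped for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def abc_model_choice(model_idx, summaries, s_obs, k=1000):
    """k-NN and multinomial-logistic estimates of P(M = m | eta(y_obs)) from a reference table."""
    dist = np.linalg.norm(summaries - s_obs, axis=1)
    nn = np.argsort(dist)[:k]                              # step B): keep the k nearest particles
    models = np.unique(model_idx)
    # step C): proportion of each model among the kept particles (k-NN estimate)
    knn_probs = {m: float(np.mean(model_idx[nn] == m)) for m in models}
    # polytomous (multinomial) logistic regression fitted on the kept particles
    logit = LogisticRegression(max_iter=1000).fit(summaries[nn], model_idx[nn])
    logit_probs = dict(zip(logit.classes_, logit.predict_proba(s_obs[None, :])[0]))
    return knn_probs, logit_probs
```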
Machine learning to analyse machine-simulated data
ABC model choice
A) Generate a large set of (m, θ, z) from the Bayesian model, π(m) πm(θ) f_m(z | θ)
B) Use a machine learning method to estimate anything about π(m | η(yobs))
In this machine learning perspective:
- the (iid) "data set" is the reference table simulated during stage A),
- the actual observed yobs becomes a new data point.
Note that:
- predicting m is a classification problem ⟺ select the best model based on a maximum a posteriori rule,
- computing π(m | η(yobs)) is a regression problem ⟺ confidence in each model.
It is well known that classification is much simpler than regression (dimension of the object we try to learn).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 20 / 40
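On the reference table the two problems differ only in what we ask of the fitted method; a short sketch with a probabilistic classifier (scikit-learn assumed; model_idx, summaries and eta_obs are placeholder names for the reference table and the observed summaries, defined elsewhere):

```python
from sklearn.neighbors import KNeighborsClassifier

# model_idx, summaries: model indices and summaries eta(z) of the reference table (stage A)
# eta_obs: summaries eta(y_obs) of the observed dataset -- placeholder names, not defined here
clf = KNeighborsClassifier(n_neighbors=100).fit(summaries, model_idx)

m_map = clf.predict(eta_obs[None, :])[0]         # classification: which model to select (MAP-type rule)
p_hat = clf.predict_proba(eta_obs[None, :])[0]   # regression: estimate of pi(m | eta(y_obs)) for each m
```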
Machine learning (cont')
ABC model choice
A) Generate a large set of (m, θ, z) from the Bayesian model, π(m) πm(θ) f_m(z | θ)
B) Use a machine learning method to estimate anything about π(m | η(yobs))
It is well known that classification is much simpler than regression (dimension of the object we try to learn). Hence we renounce approximating the posterior probabilities.
An error in the classification problem ⇒ an error in approximating π(m | η(yobs)) which swaps the order of the posterior probabilities.
A revealing example: MA(1) vs MA(2)
Time series (yt) of length T = 100:
MA(1): yt = εt - θ1 εt-1
MA(2): yt = εt - θ1 εt-1 - θ2 εt-2
where the εt are iid Gaussian. Prior from Marin et al. (2011, Stat Comp). Auto-covariances of order 1, 2, . . . as summary statistics.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 21 / 40
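A reference table for this example can be simulated along the following lines (NumPy assumed; the uniform priors below are rough stand-ins for the priors of Marin et al. 2011):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100

def simulate_ma(m):
    """Draw one series of length T from model m in {1, 2} (toy uniform priors)."""
    noise = rng.normal(size=T + 2)                         # iid Gaussian innovations
    if m == 1:
        th1 = rng.uniform(-1.0, 1.0)
        return noise[2:] - th1 * noise[1:-1]
    th1, th2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
    return noise[2:] - th1 * noise[1:-1] - th2 * noise[:-2]

def autocov(y, lags=7):
    """Auto-covariances of order 1, ..., lags, used as summary statistics eta(y)."""
    yc = y - y.mean()
    return np.array([np.sum(yc[l:] * yc[:len(y) - l]) / len(y) for l in range(1, lags + 1)])

# reference table: model index m and summaries eta(z)
models = rng.integers(1, 3, size=10_000)                   # uniform prior on {1, 2}
summaries = np.vstack([autocov(simulate_ma(m)) for m in models])
```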
Another issue I will skip
The information we lose when using non-sufficient summary statistics: there is a fundamental discrepancy between the genuine Bayes factors/posterior probabilities and the Bayes factors based on summary statistics. See
- Didelot et al. (2011, Bayesian Analysis)
- Robert, Cornuet, Marin and Pillai (2011, PNAS)
- Marin, Pillai, Robert, Rousseau (2014, JRSS B)
- . . .
Running example: MA(1) vs. MA(2)
[Scatter plot: p(m = 2 | x) on one axis vs. p(m = 2 | S(x)) on the other, both over [0, 1]; blue = simulated from MA(1), orange = simulated from MA(2)]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 22 / 40
MA(1) vs MA(2)
The reference table of MA(1) vs MA(2)
[Scatter plot of the reference table: x-axis = lag-1 autocovariance, y-axis = lag-2 autocovariance; blue = simulated from MA(1), orange = simulated from MA(2)]
Which classification method?
- linear discriminant analysis,
- local logistic regression,
- k-nearest neighbors,
- . . .
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 23 / 40
MA(1) vs MA(2): Prior error rates
Classification technique                                    2 sum stat   7 sum stat
LDA                                                         27.43        26.57
logistic regression                                         28.34        27.40
"naïve" Bayes (with Gaussian marginals)                     19.52        24.40
"naïve" Bayes (with non-parametric marginal estimates)      18.25        21.92
k-NN with k = 100 neighbors                                 17.23        18.37
Random forest                                               17.04        16.15
k-NN with k = 50 neighbors                                  16.97        17.35
local logistic regression with k = 1000 neighbors           16.82        15.44
Kernel discriminant analysis (KDA)                          16.95        NA
Decision based on π(m | yobs), with no summary              12.36
2 sum stat: lag-1 and lag-2 autocovariances; 7 sum stat: lag-1, . . . , lag-7 autocovariances.
The error rate under a 0-1 loss is estimated via cross-validation on the reference table.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 24 / 40
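The prior error rates above are cross-validation estimates on the reference table; a sketch with scikit-learn, reusing the MA reference table (summaries, models) from the earlier sketch, for two of the classifiers in the table:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# summaries, models: the MA(1)/MA(2) reference table simulated in the sketch above
classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "random forest": RandomForestClassifier(n_estimators=500, n_jobs=-1),
}
for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, summaries, models, cv=5).mean()   # 0-1 loss via 5-fold CV
    print(f"{name}: estimated prior error rate = {1 - accuracy:.3f}")
```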
A few words on Random Forest (RF)
Breiman and Cutler suggested creating a "forest" of (CART) decision trees, each fitted on a randomised subset of the training database. The many classifiers of the forest are aggregated by a majority vote.
Assume the reference table contains
- n data points (= number of simulations in ABC),
- p covariates.
The overall performance of RF increases when the decision trees are more independent. To introduce some independence,
- each node of a decision tree in the forest is optimized over a random subset of the covariates of size about √p,
- the training set of each tree is randomized.
Subsampling or bootstrap? Breiman and Cutler proposed randomizing the training set of each tree with a bootstrap sample from the training database. When n is large, random subsampling is a better idea; see the consistency result for RF of Erwan Scornet et al. (arXiv, 2014).
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 25 / 40
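A sketch of an RF classifier on the ABC reference table with the two randomization devices just described, reusing summaries, models, autocov and simulate_ma from the MA sketch; scikit-learn's max_features covers the √p rule, while max_samples only approximates subsampling (it still resamples with replacement when bootstrap=True):

```python
from sklearn.ensemble import RandomForestClassifier

# summaries, models: reference table (covariates = summary statistics, response = model index)
rf = RandomForestClassifier(
    n_estimators=500,        # number of CART trees in the forest
    max_features="sqrt",     # ~ sqrt(p) covariates tried at each split
    bootstrap=True,
    max_samples=0.5,         # smaller training set per tree (drawn with replacement in scikit-learn)
    n_jobs=-1,
)
rf.fit(summaries, models)

eta_obs = autocov(simulate_ma(2))                # stand-in for the observed summaries
m_hat = rf.predict(eta_obs[None, :])[0]          # model selected by the forest (majority vote)
```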
Where is Bayes when we renounce approximating the posterior probabilities?
⇒ In the prior distribution used to generate the reference table.
⇒ Under a 0-1 loss, the MAP is the best classifier. Hence a classifier with small prior error rate ≈ MAP of the true posterior probabilities π(m | η(yobs)).
A distinct advantage of posterior probabilities: they evaluate the confidence in the model choice.
The prior error rate gives a first clue, but it does not depend on yobs!
⇒ How to evaluate the error on datasets that resemble yobs?
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 26 / 40
Posterior predictive expected loss
We suggest replacing an unstable approximation of π(m | η(yobs)) by an average of the selection error across all models given the observed data η(yobs), i.e.,
P( M̂(η(z)) ≠ m | yobs )
where
- the pair (m, z) is generated from the predictive ∫ f(z | θm) πm(θm | yobs) dθm,
- M̂(·) denotes the predictor of the machine learning method.
Datasets that resemble yobs ⟺ datasets simulated from the predictive.
The above predictive loss can be computed with ABC methods.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 27 / 40
Computing the posterior predictive expected loss
Posterior predictive error rate: P( M̂(η(z)) ≠ m | yobs ), where
- the pair (m, z) is generated from the predictive ∫ f(z | θm) πm(θm | yobs) dθm,
- M̂(·) denotes the predictor of the machine learning method.
ABC predictive error rate
1) Simulate the reference table according to the Bayesian model
2) Fix a machine learning predictor M̂ (tuned to minimize the prior error rate via cross-validation or a test reference table)
3) Get an approximation of the posterior (in (m, θm)) by selecting the k = 100 nearest neighbors in terms of ρ(η(yobs), η(z))
4) For each kept pair (m, θm), simulate M = 10 new datasets
5) Compute and return the average error of the predictor M̂ on the k × M new datasets
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 28 / 40
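A sketch of the five steps above, assuming a fitted classifier clf and hypothetical helpers simulate(m, theta) (the simulator of model m) and eta(z) (the summary function):

```python
import numpy as np

def abc_predictive_error(clf, model_idx, params, summaries, s_obs, simulate, eta, k=100, M=10):
    """Average 0-1 error of the classifier clf on datasets drawn from the ABC posterior predictive."""
    dist = np.linalg.norm(summaries - s_obs, axis=1)
    nearest = np.argsort(dist)[:k]                     # step 3): k nearest (m, theta) pairs
    errors = []
    for i in nearest:
        m, theta = model_idx[i], params[i]
        for _ in range(M):                             # step 4): M new datasets per kept pair
            z = simulate(m, theta)                     # hypothetical simulator for model m
            errors.append(clf.predict(eta(z)[None, :])[0] != m)
    return float(np.mean(errors))                      # step 5): posterior predictive error rate
```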
Is this posterior predictive error rate relevant?
Machine learning method: random forest
[Scatter plot of simulated datasets in the (lag-1 autocorrelation, lag-2 autocorrelation) plane; point shape indicates whether the dataset was simulated from MA(1) or MA(2), point size indicates the predictive error (0.1, 0.5, 0.9)]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 29 / 40
Is this posterior predictive error rate relevant? (cont')
Machine learning method: k-nearest neighbors
[Same scatter plot as the previous slide, now showing the predictive error of the k-nearest-neighbor classifier]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 30 / 40
Is this posterior predictive error rate relevant? (cont')
x-axis: posterior predictive expected loss of k-NN
y-axis: posterior predictive expected loss of random forest
[Scatter plot over [0, 1] × [0, 1]: predictive error with k-NN vs. predictive error with RF]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 31 / 40
    Table of Contents 1 Approximate Bayesian computation 2 ABC model choice 3 Population genetics example on SNP data 4 Conclusion Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 32 / 40
The model choice problem
Asking the question: does population 3 come from population 1, population 2, or an admixture of both?
Figure: the three population scenarios: left, model 1; center, model 2; right, model 3.
Dataset: 1,000 autosomal diploid SNP loci for 75 individuals (25 per population)
⟺ 1,000 iid replicates of the coalescent with a random mutation on each genealogy
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 33 / 40
Using good summary statistics reduces the prior error rate
[Plot: link between the ABC prior error rate and the value of k; x-axis: k (0 to 2000); y-axis: prior error rate (0.22 to 0.30); black line: using the 49 summary statistics provided by DIYABC; red line: using the two LDA axes instead of the 49 initial summary statistics]
(prior error rates are computed on a test sample of size 10,000)
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 34 / 40
Which machine learning technique?
Estimated prior error rates for some classification methods
Classification technique                                                Prior error rate
"naïve" Bayes (with Gaussian marginals)                                 33.31
Linear Discriminant Analysis                                            21.89
"Optimal" k-NN using the initial 48 summaries                           25.46
"Optimal" k-NN using only the two LDA axes                              21.65
local logistic regression on the two LDA axes with k = 5,000            21.61
random forest using the 48 initial summaries                            20.84
random forest using only the two LDA axes                               22.50
random forest using both the 48 initial summaries and the LDA axes      19.20
The best classifier (in red on the slide): random forest using both the 48 initial summaries and the LDA axes.
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 35 / 40
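The best row of the table (RF on the initial summaries plus the two LDA axes) can be sketched by appending the LDA projections to the summary matrix before growing the forest (scikit-learn assumed; X, y and s_obs are placeholder names for the reference-table summaries, the model indices and the observed summaries):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier

# X: (n, 48) initial DIYABC summaries, y: (n,) model indices -- placeholders for the reference table
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_aug = np.hstack([X, lda.transform(X)])               # 48 summaries + 2 LDA axes = 50 covariates
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1).fit(X_aug, y)

# the observed summaries s_obs (48-vector, hypothetical) are projected with the same LDA before prediction
# x_obs = np.hstack([s_obs, lda.transform(s_obs[None, :])[0]])
# m_hat = rf.predict(x_obs[None, :])[0]
```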
Plotting the reference table
[Projection of the reference table on the two LDA axes (LD1, LD2). The colors correspond to the model indexes: black = model 1, blue = model 2, orange = model 3]
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 36 / 40
Posterior predictive error rate
[Projection of the reference table on the two LDA axes (LD1, LD2)]
M̂ = random forest using both the 48 initial summaries and the LDA axes
Case 1 (red triangle): ABC posterior predictive error rate 0.089
Case 2 (green triangle): ABC posterior predictive error rate 0.0
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 37 / 40
    Table of Contents 1 Approximate Bayesian computation 2 ABC model choice 3 Population genetics example on SNP data 4 Conclusion Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 38 / 40
ABC model choice
Key ideas
- π(m | η(yobs)) ≠ π(m | yobs)
- Instead of trying to approximate π(m | η(yobs)), focus on selecting the best model (classification vs regression)
- Use the best machine learning technique to learn how to select a model from the many simulations of ABC: minimise the 0-1 loss to mimic the MAP
- Assess confidence in the model selection with the posterior predictive expected loss
Consequences for population genetics ABC
- Often, RF outperforms k-NN (because it is less sensitive to high correlation among the covariates = the summary statistics)
- RF requires many fewer simulations from the Bayesian model
- RF automatically selects the relevant summary statistics
- Hence we can handle much more complex models
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 39 / 40
Questions?
Pierre Pudlo (UM2) ABC learning BigMC 19/06/2014 40 / 40