Bayesian Inference and Uncertainty Quantification for Inverse Problems
Matt Moores (@MooresMt) https://uow.edu.au/~mmoores/
Centre for Environmental Informatics (CEI), University of Wollongong, NSW, Australia
TIDE Hub Seminar Series
joint with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom
Introduction
Physical model $F(t; \vec{\theta})$
Unknown parameters $\vec{\theta} = \{\theta_1, \ldots, \theta_k\}$
Initial conditions $F_0$ and derivatives $\partial F / \partial t$, etc.
Observed data $y_t$ at times $t = 1, \ldots, T$:
$$y_t = F(t; \vec{\theta}) + \varepsilon_t$$
where $\varepsilon_t$ is random noise.
Example
Growth curve: $\frac{dF}{dt} = -\beta \gamma^t \log \gamma$
Solution: $F(t_i; \vec{\theta}) = \alpha - \beta \gamma^{t_i}$ at time points $t_1, \ldots, t_n$, with initial condition $F_0 = \alpha - \beta$.
Noisy observations: $y_i = F(t_i; \alpha, \beta, \gamma) + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$ is additive Gaussian noise.
Unknown parameters $\vec{\theta} = \{\alpha, \beta, \gamma, \sigma^2\}$
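As a rough illustration, the R sketch below simulates noisy observations from this growth curve. The parameter values and observation times are made up for the example, not estimates from any data set.

## Simulate noisy observations from the growth curve F(t) = alpha - beta * gamma^t.
## Parameter values and observation times are illustrative only.
set.seed(42)
alpha <- 2.7; beta <- 1.0; gamma <- 0.87; sigma <- 0.1
t <- 1:30                                  # observation times
F_t <- alpha - beta * gamma^t              # solution of the ODE at times t
y <- F_t + rnorm(length(t), 0, sigma)      # additive Gaussian noise
plot(t, y); lines(t, F_t)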
Observed Data for n = 27 dugongs (figure)
Nonlinear Least Squares

nlm <- nls(y ~ alpha - beta * gamm^t, data=dat,
           start=list(alpha=1, beta=1, gamm=0.9))
summary(nlm)
##
## Formula: y ~ alpha - beta * gamm^t
##
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)
## alpha  2.65807    0.06151   43.21  < 2e-16 ***
## beta   0.96352    0.06968   13.83  6.3e-13 ***
## gamm   0.87146    0.02460   35.42  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09525 on 24 degrees of freedom
Drawbacks
No guarantee of convexity: will only find the nearest local optimum, so the results are completely dependent on the initialization.
Standard errors are underestimated: 95% profile CIs for α and β only achieve ~83% coverage.
Inverse Problem
Find parameters $\vec{\theta}$ consistent with $\vec{y}$.
The physical model $F: \Theta \to \mathcal{Y}$ doesn't need to be invertible:
although lack of identifiability for $\vec{\theta}$ can create problems
we don't need $F$ in closed form; a numeric solver for the DE can be used
The likelihood $L(\vec{y} \mid \vec{\theta})$ is based on $E[Y_i \mid \vec{\theta}] = F(t_i; \vec{\theta})$, as well as the distribution of the random noise (not necessarily Gaussian).
Bayesian Inference
We assign prior distributions to the parameters $\vec{\theta}$:
$$\pi(\log\{\alpha\}) \sim N(0, 100) \qquad \pi(\log\{\beta\}) \sim N(0, 100) \qquad \pi(\mathrm{logit}\{\gamma\}) \sim N(0, 100) \qquad \pi\!\left(\tfrac{1}{\sigma^2}\right) \sim \mathrm{Ga}\!\left(\tfrac{\nu_0}{2}, \tfrac{2}{\nu_0 s_0^2}\right)$$
Then the joint posterior distribution is:
$$\pi(\vec{\theta} \mid \vec{y}) = \frac{L(\vec{y} \mid \vec{\theta})\,\pi(\vec{\theta})}{\pi(\vec{y})}$$
where the normalising constant
$$\pi(\vec{y}) = \int_0^\infty\!\int_0^\infty\!\int_0^\infty\!\int_0^1 L(\vec{y} \mid \alpha, \beta, \gamma, \sigma)\, d\pi(\gamma)\, d\pi(\beta)\, d\pi(\alpha)\, d\pi(\sigma)$$
is intractable.
Astfalck, Cripps, Gosling, Hodkiewicz & Milne (2018) Ocean Engineering 161: 268–276.
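As a minimal sketch of how this posterior can be evaluated (up to the intractable constant), the R function below computes the unnormalised log-posterior on the transformed parameter scale. The hyperparameter values nu0 and s0sq are placeholders, not values from the talk.

## Unnormalised log-posterior for the growth-curve model, evaluated on the
## transformed scale: theta = (log alpha, log beta, logit gamma), tau = 1/sigma^2.
## The hyperparameters nu0 and s0sq are illustrative placeholders.
log_posterior <- function(theta, tau, t, y, nu0 = 1, s0sq = 1) {
  alpha <- exp(theta[1]); beta <- exp(theta[2]); gamma <- plogis(theta[3])
  mu <- alpha - beta * gamma^t                     # forward model F(t; theta)
  loglik <- sum(dnorm(y, mean = mu, sd = 1/sqrt(tau), log = TRUE))
  logprior <- sum(dnorm(theta, mean = 0, sd = sqrt(100), log = TRUE)) +
    dgamma(tau, shape = nu0/2, scale = 2/(nu0 * s0sq), log = TRUE)
  loglik + logprior
}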
Markov Chain Monte Carlo
Algorithm 1: Random-walk Metropolis sampler for $\pi(\vec{\theta} \mid \vec{y})$
1: Initialize $\alpha_0 \sim \pi(\log\{\alpha\})$, $\beta_0 \sim \pi(\log\{\beta\})$, $\gamma_0 \sim \pi(\mathrm{logit}\{\gamma\})$, $\sigma_0 \sim \pi(1/\sigma^2)$
2: Solve $\vec{\mu}_0 = \vec{F}(\vec{t}; \alpha_0, \beta_0, \gamma_0)$ and calculate $\ell_0 = \log L(\vec{y} \mid \vec{\mu}_0, \sigma_0^2)$
3: for iterations $j = 1, \ldots, J$ do
4:   Propose $\log\{\alpha^*\} \sim N(\log\{\alpha_{j-1}\}, \sigma_\alpha^2)$, $\log\{\beta^*\} \sim N(\log\{\beta_{j-1}\}, \sigma_\beta^2)$, $\mathrm{logit}\{\gamma^*\} \sim N(\mathrm{logit}\{\gamma_{j-1}\}, \sigma_\gamma^2)$
5:   Solve $\vec{\mu}^* = \vec{F}(\vec{t}; \alpha^*, \beta^*, \gamma^*)$
6:   Propose $\sigma^* \sim \pi(1/\sigma^2 \mid \vec{\mu}^*, \vec{y})$ and calculate $\ell^* = \log L(\vec{y} \mid \vec{\mu}^*, \sigma_*^2)$
7:   Calculate $\rho_t = \dfrac{\exp\{\ell^*\}\,\pi(\vec{\theta}^*)\,q(\vec{\theta}_{j-1} \mid \vec{\theta}^*)}{\exp\{\ell_{j-1}\}\,\pi(\vec{\theta}_{j-1})\,q(\vec{\theta}^* \mid \vec{\theta}_{j-1})}$
8:   Accept $\vec{\theta}_j = \vec{\theta}^*$ with probability $\min\{\rho_t, 1\}$, else $\vec{\theta}_j = \vec{\theta}_{j-1}$
9: end for
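The R sketch below implements a simplified variant of Algorithm 1: random-walk Metropolis proposals for (log α, log β, logit γ) combined with a Gibbs update of the precision 1/σ² from its conjugate full conditional, rather than the conditional proposal for σ used on the slide. It reuses the log_posterior() function sketched above; the step sizes, starting values and hyperparameters are illustrative tuning choices.

## Simplified Metropolis-within-Gibbs sampler for the growth-curve model.
## Random-walk proposals on (log alpha, log beta, logit gamma); the precision
## tau = 1/sigma^2 is drawn from its conjugate full conditional.
rw_metropolis <- function(t, y, J = 10000, step = c(0.02, 0.05, 0.1),
                          nu0 = 1, s0sq = 1) {
  n <- length(y)
  theta <- c(log(1), log(1), qlogis(0.9))        # cf. the nls() starting values
  tau <- 1 / var(y)
  draws <- matrix(NA_real_, J, 4,
                  dimnames = list(NULL, c("alpha", "beta", "gamma", "sigma2")))
  lp <- log_posterior(theta, tau, t, y, nu0, s0sq)
  for (j in 1:J) {
    ## symmetric random-walk proposal on the transformed scale, so q() cancels
    theta_star <- rnorm(3, mean = theta, sd = step)
    lp_star <- log_posterior(theta_star, tau, t, y, nu0, s0sq)
    if (log(runif(1)) < lp_star - lp) {
      theta <- theta_star
    }
    ## Gibbs update of tau = 1/sigma^2 given the current fitted curve
    mu <- exp(theta[1]) - exp(theta[2]) * plogis(theta[3])^t
    tau <- rgamma(1, shape = (nu0 + n) / 2,
                  rate = (nu0 * s0sq + sum((y - mu)^2)) / 2)
    lp <- log_posterior(theta, tau, t, y, nu0, s0sq)
    draws[j, ] <- c(exp(theta[1]), exp(theta[2]), plogis(theta[3]), 1 / tau)
  }
  draws
}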
Markov Chains (figure)
Posterior Distributions (figure)
95% Predictive Interval (figure)
Thermogravimetric Analysis
The TGA model for a single reaction involves the Arrhenius equations:
$$\frac{dM}{dt} = -M A \exp\left(-\frac{E}{RT}\right) \qquad \frac{dT}{dt} = \alpha$$
where $M$ is the mass fraction, $T$ is the temperature, and $R$ is the ideal gas constant. The initial mass $M_0$, the initial temperature $T_0$, and the heating rate $\alpha$ are experimentally controlled.
The unknown parameters $\vec{\theta}$ are:
$A$, the pre-exponential factor
$E$, the activation energy
$\sigma^2$, the variance of the additive Gaussian noise
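A minimal sketch of the forward model: solving these two equations numerically in R with the deSolve package. The parameter values, initial conditions and time grid below are illustrative placeholders, not the values used in the experiments.

## Solve the single-reaction TGA equations numerically (forward model).
## Requires the deSolve package; all numeric values are illustrative only.
library(deSolve)

tga_rhs <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dM <- -M * A * exp(-E / (R * Temp))   # Arrhenius mass-loss rate
    dTemp <- heat_rate                    # constant heating rate
    list(c(dM, dTemp))
  })
}

parms <- c(A = 3e6, E = 85000, R = 8.314, heat_rate = 10 / 60)  # illustrative
state <- c(M = 1, Temp = 300)            # initial mass fraction and temperature (K)
times <- seq(0, 3000, by = 10)           # time grid in seconds
out <- ode(y = state, times = times, func = tga_rhs, parms = parms, method = "rk4")
plot(out)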
Reparameterisation
We have good prior information for $E$ (physically interpretable) and $\sigma^2$ (measurement noise), but not for $A$. The distributions of $A$ and $E$ are also very highly correlated, which slows the mixing of the MCMC.
Instead, we reparameterise the model in terms of the temperature $T_m$ at which the rate $\frac{dM}{dt}$ is maximised:
$$A \exp\left(-\frac{E}{R T_m}\right) = \frac{E}{R T_m^2}\,\alpha$$
In our MCMC algorithm, we first propose $q(E^* \mid E_{j-1})$, then propose $q(T_m^* \mid T_{m,j-1})$. The value of $A^*$ can then be obtained from the equation above. We solve $\vec{F}(\vec{t}; E^*, A^*)$ numerically using a Runge–Kutta method.
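Rearranging that identity gives $A = \frac{E\,\alpha}{R T_m^2} \exp\{E/(R T_m)\}$, which is straightforward to code; a small R helper is sketched below, with illustrative argument values.

## Recover the pre-exponential factor A from proposed values of E and Tm,
## by rearranging A exp(-E/(R*Tm)) = (E/(R*Tm^2)) * alpha.
## heat_rate is the controlled heating rate alpha; example values are illustrative.
A_from_E_Tm <- function(E, Tm, heat_rate, R = 8.314) {
  (E * heat_rate / (R * Tm^2)) * exp(E / (R * Tm))
}
A_star <- A_from_E_Tm(E = 85000, Tm = 635, heat_rate = 10 / 60)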
Posterior Distributions (figure: marginal histograms of Tm, E and A)
Posterior for Functions of $\vec{\theta}$
A function of a random variable is also a random variable. Now that we have random samples from our posterior $\pi(A, E, T_m, \sigma^2 \mid \vec{y})$, we can use our model to obtain predictions for any measurable function of the parameters, $g(A, E, T_m)$.
For example, the critical length of a stockpile:
$$L_{cr} = g(A, E, T_m) = K \sqrt{\frac{\exp\left\{\frac{E}{R T_m^2}\right\}}{A} \cdot \frac{R T_m^2}{E}}$$
$$E[g(A, E, T_m) \mid \vec{y}] = \int_0^{1000}\!\!\int_0^\infty\!\!\int_0^\infty g(A, E, T_m)\, d\pi(A, E, T_m \mid \vec{y}) \approx \frac{1}{J}\sum_{j=1}^{J} g(A_j, E_j, T_j)$$
where $K$ is a constant and $\{A_j, E_j, T_j\}_{j=1}^{J}$ are the MCMC samples (after discarding burn-in).
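A sketch of that Monte Carlo estimate in R: here `samples` is assumed to be a data frame of post-burn-in MCMC draws with columns A, E and Tm, and the constant K is left as a placeholder.

## Posterior for the critical length L_cr, obtained by pushing the MCMC draws
## through g(). `samples` (data frame with columns A, E, Tm) and the value of K
## are assumptions for this sketch.
g_Lcr <- function(A, E, Tm, K = 1, R = 8.314) {
  K * sqrt(exp(E / (R * Tm^2)) / A * (R * Tm^2 / E))
}
Lcr <- with(samples, g_Lcr(A, E, Tm))
mean(Lcr)                        # Monte Carlo estimate of E[g(A, E, Tm) | y]
quantile(Lcr, c(0.025, 0.975))   # 95% credible interval
hist(Lcr)                        # posterior distribution of L_cr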
Posterior for $L_{cr}$ (figure)
Advanced Methods
Parallel and distributed computation
Sequential updating of $\pi(\vec{\theta} \mid \vec{y})$ and streaming inference
Spatio-temporal modelling of $F(x, y, z, t; \vec{\theta})$
Emulation of the forward map $\Theta \to \mathcal{Y}$ (e.g. using artificial neural networks)
Approximating the likelihood $L$ (e.g. ABC)
Accounting for multi-modality (e.g. ill-posed problems)
Accounting for error in the numerical solver (probabilistic numerics)
Accounting for model misspecification (discrepancy function)
Multi-level Monte Carlo methods
Hamiltonian Monte Carlo
Further Reading
R. J. Longbottom, B. J. Monaghan, D. J. Pinson, N. A. S. Webster & S. J. Chew (2020) In situ Phase Analysis during Self-sintering of BOS Filter Cake for Improved Recycling. ISIJ International 60(11): 2436–2445.
A. M. Stuart (2010) Inverse problems: A Bayesian perspective. Acta Numerica 19: 451–559.
L. C. Astfalck, E. J. Cripps, J. P. Gosling, M. R. Hodkiewicz & I. A. Milne (2018) Expert elicitation of directional metocean parameters. Ocean Engineering 161: 268–276.
B. P. Carlin & A. E. Gelfand (1991) An iterative Monte Carlo method for nonconjugate Bayesian analysis. Statistics and Computing 1(2): 119–128.
