Bayesian reconstruction of historical population in Finland 1647-1850

Jouni Helske (with Miikka Voutilainen and Harri Högmander)

Prologue: Bayesian reconstruction of Jouni Helske in 2010-2019

  • 2010-2015 PhD in Statistics at JYU
    • State space models (time series),
      prediction problems, statistical software

2016-2017: First postdoc at JYU

  • Bayesian statistics, MCMC, sequential Monte Carlo
  • Visited Oxford’s Department of Statistics for a year
  • 2017 moved to Sweden, visited Uppsala University

2017-2019: Postdoc at Linköping University

  • Postdoc in information visualization
  • First at H2020 project Safeclouds, then free hands

2019->: Second postdoc at JYU

  • Causal inference, Bayesian models, missing data, MCMC, SMC, HMC, PMCMC, HMM, SSM, (add your own acronym!), … ?

And now…

Bayesian reconstruction of historical population in Finland 1647-1850

By Jouni Helske, based on the paper of the same name by Voutilainen, Helske, and Högmander (2019), conditionally accepted to Demography

Historical population estimates

  • Interesting for history, demography, and economic research
    • Important contextual / macro-economic variables
    • Also interesting in their own right
  • Only a few countries have annual population series available before the 19th century
    • No proper censuses; estimates are based on tax or church records
  • Our aim was to estimate Finnish population from 1647 to 1850
    • Use incomplete parish records and Bayesian methods
    • Incorporate prior knowledge, report uncertainties of the estimates

Problems with current population estimates

  • Based on fiscal and/or parish records of poor quality, with ad hoc correction procedures
  • Only point estimates
    • arguing over differences of a few thousand in the estimates probably does not make much sense
  • Difficult to assess the validity of the results
    • no ground truth, only one dataset

Perfect data: Problem solved

Given the initial population estimate and full birth and death records:

$$\mu_t = \mu_{t-1} + B_t - D_t, \qquad t = 1648, \ldots, 1850,$$

where

  • $\mu_t$ is the population

  • $B_t = \sum_{i=1}^{197} b_{t,i}$ is the total number of births

  • $D_t = \sum_{i=1}^{197} d_{t,i}$ is the total number of deaths at year $t$.

  • But we don’t know $\mu_{1647}$ nor the yearly births and deaths…
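With complete records the reconstruction would be pure bookkeeping. A minimal sketch of the recursion (the yearly totals below are made up for illustration, not the real records):

```python
# Population identity mu_t = mu_{t-1} + B_t - D_t, applied year by year.
def reconstruct_population(mu_init, births, deaths):
    """Return the population path given the initial level and yearly totals."""
    mu = [mu_init]
    for b, d in zip(births, deaths):
        mu.append(mu[-1] + b - d)
    return mu

# Made-up totals for two years, starting from the prior mean of 430,000:
print(reconstruct_population(430_000, [18_000, 17_500], [15_000, 16_000]))
# [430000, 433000, 434500]
```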

Our aim

  • Build a hierarchical model using
    • Parish level data on baptisms and burials (197 parishes)
    • External estimates on military casualties (taken as given)
    • Few available censuses (22) from 1749 onward
    • Prior knowledge (population level mid-1600s, characteristics of parish records, …)

Data quality

Three types of missingness:

  • Baptisms do not equal births and burials do not equal deaths
  • Records are not available for all parishes for every year
  • Not all burials and baptisms are registered

Census data $C_t$ is likely not accurate either, but we’ll assume that

$$C_t \sim N(\mu_t, \sigma_C^2), \qquad t \in T,$$

  • $\sigma_C \sim \textrm{Gamma}(5, 0.005)$
  • $T = \{1749, 1751, 1754, 1757, \ldots, 1775, 1780, 1800, 1805, \ldots, 1850\}$
  • This implies that the relative accuracy of the censuses increases over time as the population grows
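The point about relative accuracy can be checked in a couple of lines, assuming the shape–rate parameterization of the Gamma distribution (so the prior mean of $\sigma_C$ is $5/0.005 = 1000$); the population levels below are illustrative only:

```python
# A fixed absolute census sd implies shrinking *relative* error as the
# population grows; sigma_C ~ Gamma(5, 0.005) has prior mean shape/rate = 1000
# (shape-rate parameterization assumed).
prior_mean_sigma_c = 5 / 0.005  # 1000.0
for mu in (500_000, 1_000_000, 1_600_000):  # illustrative population levels
    print(f"mu = {mu}: relative sd = {prior_mean_sigma_c / mu:.2%}")
```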

Census data

Parish data

Population level

$$\mu_{1647} \sim \textrm{Gamma}(215, 0.0005), \qquad \mu_t \sim \textrm{Gamma}(\psi_\mu(\mu_{t-1} + \beta_t - \delta_t - s_t), \psi_\mu), \quad t \neq 1697$$

  • $\beta_t$ and $\delta_t$ are the annual birth and death estimates, and $s_t$ are the military casualties at year $t$

  • The prior for the first year translates to a prior mean of 430,000 and a prior sd of about 30,000

  • Special treatment of year 1697 is needed due to the Great Famine

  • The mean of $\mu_t$ is $\mu_{t-1} + \beta_t - \delta_t - s_t$, and its sd is $\sqrt{\mu_t / \psi_\mu}$
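The stated prior mean and sd can be verified directly, assuming the shape–rate parameterization (mean = shape/rate, sd = √shape/rate):

```python
import math

# Gamma(215, 0.0005) prior for mu_1647 (shape-rate parameterization assumed):
shape, rate = 215, 0.0005
mean = shape / rate           # 430000.0
sd = math.sqrt(shape) / rate  # ~29326, i.e. roughly 30,000
print(mean, round(sd))
```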

Parish level

We consider the 177 of the 197 parishes which have at least 10 years of baptism and burial records

  • Denote the set of parishes with recorded baptisms at year $t$ as $\Omega_t^b$
  • Define the baptism process $\beta_t$

$$\beta_t = \frac{1}{\lambda_t^b} \left( \sum_{i \in \Omega_t^b} b_{t,i} + \sum_{j \notin \Omega_t^b} \hat{b}_{t,j} \right),$$

  • The first term in parentheses corresponds to the observed baptisms
  • The latter term corresponds to estimates for the completely missing records at year $t$
  • A similar construction is used for burials, with $\Omega_t^d$ and $\delta_t$
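Numerically the construction is a simple scaling; a sketch with made-up parish counts and a hypothetical λ value:

```python
# beta_t = (observed baptisms + imputed baptisms for missing parishes) / lambda_t^b
def yearly_births(observed, imputed, lam):
    """Scale recorded plus imputed baptisms up to a birth estimate."""
    return (sum(observed) + sum(imputed)) / lam

# Three observed parishes, one imputed parish, lambda = 0.8 (all values made up):
print(yearly_births([1_200, 800, 3_000], [500], 0.8))  # 6875.0
```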

Correction coefficients

  • Coefficients $\lambda_t^b$ and $\lambda_t^d$ are the proportions of the total number of baptisms out of all births (and similarly for burials and deaths)
  • They try to capture
    • partial missingness
    • the omission of 20 parishes
    • the discrepancy between baptisms and births (burials and deaths)
  • $0 < \lambda_t < 1$, increasing over time as partial missingness decreases

Assumptions for λs

  • Based on prior knowledge:
    • $\lambda_{1648}^d < \lambda_{1648}^b$
    • $\lambda_{1850}^d = \lambda_{1850}^b = \lambda_{1850}$
  • $\lambda_{1850} \sim$ Beta with 99% prior interval $(0.55, 1)$ (mean 0.9), and $\lambda_{1648}^d / \lambda_{1850} = a_1$, $\lambda_{1648}^b / \lambda_{1850} = a_1 + a_2$,
  • with $(a_1, a_2, a_3) \sim \textrm{Dirichlet}(12, 14, 14)$, leading to 99% prior intervals $(0.2, 0.8)$ and $(0.5, 1)$ for the proportions, respectively.
  • Assume a smooth increase of the $\lambda$s over time
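The ratio construction guarantees the ordering $\lambda_{1648}^d < \lambda_{1648}^b < \lambda_{1850}$ for any point in the interior of the simplex; a small sketch (the weights here are arbitrary examples, not draws from the actual prior):

```python
# Simplex weights (a1, a2, a3) map to ordered starting values:
#   lambda_d_1648 = a1 * lambda_1850,  lambda_b_1648 = (a1 + a2) * lambda_1850,
# so lambda_d_1648 < lambda_b_1648 < lambda_1850 holds by construction.
def lambda_endpoints(lam_1850, a1, a2):
    return a1 * lam_1850, (a1 + a2) * lam_1850

lam_d, lam_b = lambda_endpoints(0.9, 0.4, 0.3)  # arbitrary example weights
print(lam_d, lam_b)
assert lam_d < lam_b < 0.9
```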

λs as logistic curves

  • Define the $\lambda$s as scaled logistic curves:

$$\tilde{\lambda}_t^j = \left[1 + \exp(-r_j(t - m_j))\right]^{-1}, \qquad \lambda_t^j = \frac{\tilde{\lambda}_t^j - \tilde{\lambda}_{1648}^j}{\tilde{\lambda}_{1850}^j - \tilde{\lambda}_{1648}^j} (\lambda_{1850} - \lambda_{1648}^j) + \lambda_{1648}^j, \qquad j = b, d,$$

with midpoint $1648 < m_j < 1830$ and growth rate $r_j > 0$, $j = b, d$
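A direct implementation of the scaled curve (parameter values below are illustrative, not posterior estimates); by construction the curve equals the endpoint values exactly at 1648 and 1850:

```python
import math

def scaled_logistic(t, r, m, lam_start, lam_end, t0=1648, t1=1850):
    """Raw logistic with midpoint m and growth rate r, rescaled so that
    the curve equals lam_start at t0 and lam_end at t1."""
    raw = lambda s: 1.0 / (1.0 + math.exp(-r * (s - m)))
    return (raw(t) - raw(t0)) / (raw(t1) - raw(t0)) * (lam_end - lam_start) + lam_start

# Illustrative parameters: midpoint 1750, growth rate 0.05, endpoints 0.4 -> 0.9
print(scaled_logistic(1648, 0.05, 1750, 0.4, 0.9))  # 0.4
print(scaled_logistic(1850, 0.05, 1750, 0.4, 0.9))  # 0.9
```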

Parish processes

$$b_{t,i} \sim \textrm{Gamma}(\psi_{bd} \exp(\nu_{t,i}^b), \psi_{bd}), \qquad \hat{b}_{t,j} \sim \textrm{Gamma}(\psi_{bd} \exp(\nu_{t,j}^b), \psi_{bd}),$$
$$\nu_{t,i}^b \sim N(\nu_{t-1,i}^b + \eta_t^b, \sigma_{b,\nu}^2), \qquad \eta_t^b \sim N(\eta_{t-1}^b, \sigma_{b,\eta}^2),$$

and similarly for burials, except for year 1697
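To make the hierarchy concrete, here is a simulation sketch of a single parish series under this model (the hyperparameter values are illustrative; this is not the authors' Stan implementation):

```python
import math
import random

def simulate_parish(T, psi, sigma_nu, sigma_eta, nu0=5.0, seed=1):
    """Simulate yearly baptism counts for one parish: the log-intensity nu_t
    is a random walk with slowly drifting slope eta_t, and counts are
    Gamma(psi * exp(nu_t), psi) so that E[b_t] = exp(nu_t)."""
    rng = random.Random(seed)
    nu, eta, counts = nu0, 0.0, []
    for _ in range(T):
        eta = rng.gauss(eta, sigma_eta)     # eta_t ~ N(eta_{t-1}, sigma_eta^2)
        nu = rng.gauss(nu + eta, sigma_nu)  # nu_t ~ N(nu_{t-1} + eta_t, sigma_nu^2)
        counts.append(rng.gammavariate(psi * math.exp(nu), 1.0 / psi))
    return counts

counts = simulate_parish(T=10, psi=50.0, sigma_nu=0.05, sigma_eta=0.01)
print(len(counts), all(c > 0 for c in counts))  # 10 True
```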

Handling Great Famine

  • The particularly cold period in 1695–1697 resulted in poor yields and a subsequent famine
  • Based on the literature, it is estimated that 15–30% of the population died during this period
  • Such a drastic drop needs special attention
  • We define $\pi = \mu_{1697} / \mu_{1695} \sim$ Beta with 99% prior interval $(0.7, 0.85)$, and
$$\mu_{1697} \sim \textrm{Gamma}(\psi_\mu \pi \mu_{1695}, \psi_\mu), \qquad \hat{d}_{t,i} \sim \textrm{Gamma}(\psi_{bd} \exp(\phi_\delta) \exp(\nu_{t,i}^d), \psi_{bd}), \qquad \phi_\delta \sim N(2, 0.25^2)$$
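The prior interval for the survival ratio matches the 15–30% mortality estimate from the literature:

```python
# pi = mu_1697 / mu_1695 with 99% prior interval (0.7, 0.85):
# a survival ratio of 0.7 is a 30% drop, 0.85 is a 15% drop.
for pi in (0.7, 0.85):
    print(f"pi = {pi}: population drop = {1 - pi:.0%}")
```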

Estimation alternatives

  • Total number of unknown parameters: 72,489 (18 hyperparameters and 72,471 latent variables)
  • We need Markov chain Monte Carlo (MCMC) for fully Bayesian inference
  • Gibbs sampling or naive Metropolis-type MCMC won’t work due to the large number of (correlated) parameters
  • State-of-the-art alternatives:
    • Hamiltonian Monte Carlo using probabilistic programming language Stan
    • particle marginal Metropolis–Hastings (PMCMC) using efficient particle filter (PF)

Pros and cons of Stan

  • No need to code up your own MCMC stuff
  • Simple way to write your model (BUGS-like syntax)
  • Claimed to be very efficient for “arbitrary” problems with a large number of parameters
    • in practice, to be efficient you need to reparametrize and scale your variables
      • for efficient adaptation of the sampler, variables should be uncorrelated and on unit scale (in terms of posterior variance)
  • Many convergence diagnostics
    • Theoretical aspects of many diagnostics are unclear: when should you be worried, and when can you ignore the warnings?
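One standard example of such a reparameterization (not necessarily the exact one used for this model) is the non-centered form of a random walk, which moves the scale parameter out of the sampled variables:

```python
# Non-centered parameterization: sample unit-scale "raw" innovations and
# transform, instead of sampling the correlated walk states directly.
# A generic illustration, not the paper's Stan code.
def noncentered_walk(raw, start, sigma):
    """nu_t = nu_{t-1} + sigma * raw_t, with raw_t ~ N(0, 1) in the sampler."""
    nu = [start]
    for z in raw:
        nu.append(nu[-1] + sigma * z)
    return nu

print(noncentered_walk([1.0, -0.5], start=0.0, sigma=2.0))  # [0.0, 2.0, 1.0]
```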

Pros and cons of PMCMC

  • Stan is “dumb” in the sense that it does not differentiate between hyperparameters and latent variables
  • PMCMC handles these separately, marginalizing over the latent variables with a PF
    • using e.g. random walk Metropolis for the 18 hyperparameters is easy
  • But the large number of missing observations makes the bootstrap PF infeasible (it needs too many particles)
  • Twisted PFs with proposals based on an approximating Gaussian model?
    • initial tests showed that the Gaussian approximation of the Gamma model is not stable with respect to the hyperparameters (it needs too many particles)

Estimation

  • We started prototyping the model formulation with Stan
  • Our initial belief was that the modelling would be done quickly, so there was no need to bother implementing a custom PMCMC
  • In the end, we stuck with Stan despite some hiccups
  • Total estimation time of the final model was around 110h on Amazon AWS (16 parallel chains of length 15,000)
    • One run of the twisted PF for the parish level takes ~30 s in Julia
      • 100 particles, not really enough without some tricks
      • 300+ days for 15,000 iterations without parallelisation
      • not fully optimized code/algorithm, but not an encouraging starting point…

Results: λ-coefficients

Results: yearly differences

Results: Birth and death estimates

Results: Population estimates

Results: Posterior predictive check

  • (using posterior samples of hyperparameters, $\beta_t$, and $\delta_t$)

Thank you!

Some references:

  • Paper: Voutilainen M, Helske J, Högmander H (2019). A Bayesian reconstruction of historical population in Finland 1647–1850. Conditionally accepted to Demography.
  • Twisted PF etc.: Vihola M, Helske J, Franks J (2018). Importance sampling type estimators based on approximate marginal MCMC. arXiv preprint.
  • Stan: Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1).
  • PMCMC: Andrieu C, Doucet A, Holenstein R (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72: 269-342.