Flexible Bayesian modelling and causal inference for panel data with R package dynamite

Jouni Helske (based on a joint work with Santtu Tikka)
AoF PREDLIFE Consortium (University of Jyväskylä and University of Turku)

2022-12-02

Panel data

  • Data from N subjects
  • Measured over multiple time points 1,,T
    • Can be calendar time, age, some other variable defining the order of observations
  • Multiple measurements per subject, e.g., person’s yearly income, health status, partnership status, …
  • Common in many fields such as econometrics, psychology and social sciences in general.
  • Often used for observational causal inference

Statistical methods

  • Several modelling approaches, depending on the observed and assumed characteristics of the data
  • Traditional approaches are based on some variant of cross-lagged panel model (CLPM) (e.g., Allison, Williams, and Moral-Benito 2017; Hamaker, Kuiper, and Grasman 2015) yit=αyi+βyyyit1+βyxxit1+ϵyitxit=αxi+βxyyit1+βxxxit1+ϵxit
    where ϵyt and ϵxt follow normal distribution.
  • Estimation is typically based on maximum likelihood using structural equation models (SEM)
  • Other fixed “control variables” can also be included in both equations

Cross-lagged panel model with two variables

Directed acyclic graph (DAG) of bivariate CLPM

Limitations of CLPM

  • SEM approaches assume joint normality of the response variables
    • Traditional implementations can only handle few time points (e.g. < 50)
  • All effects are assumed to be constant in time
    • Reasonable for small T, not necessarily so for larger T
  • Focus on one-step-ahead effect based on the model parameters
    • Effects of interventions might not persists over time (not the same as previous point!)

Time-varying effects

  • It is possible that the effects of some predictors change over time
  • Effects of policy changes can slowly diminish or take some time to fully affect the behavior of all individuals
    • introduction of traffic cameras could first reduce the overall speeding until people learn where the cameras are located
  • Unmeasured changes in individuals’ common environment can also cause the parameters to vary in time (if allowed to do so)
    • Can be used to assess the effects of unobserved (time-varying) confounding

Time-varying regression

  • In state space modelling approaches the coefficients vary according to (integrated) random walks
    • Can become computationally demanding with large number of individuals, especially in non-Gaussian setting
    • (Mostly) outside of panel data setting (e.g., Helske 2022; Harvey and Phillips 1982) but proposed already in (Harvey 1978)
  • Varying coefficients models based on kernel smoothing and splines (e.g., tvReg and tvem R packages)
    • Choosing bandwidth or number of knots can be challenging

Long-term causal effects (LTCE)

  • Instead of focusing on the causal effect of xt on yt+1, it might be of interest to consider the effects of
  • atomic intervention p(yt+k|do(xt=x)), k=1, and
  • recurring intervention p(yt+k|do(xt=x,,xt+k=x)),k1.

Example of estimating LTCEs

  • Consider a situation in the following graph, from which we want to estimate causal effect of x1 on y3

Identifying functional

  • Nonparametric identifying functional for the interventional distribution p(y3|do(x1)): z1,y1p(y3|x1,z1,y1)p(z1,y1)
  • Familiar form of back-door adjustment (Pearl 2009)
  • We need to estimate distributions p(y3|x1,z1,y1) and p(z1,y1)
  • p(z1,y1) can be estimated based on the empirical distribution of our data, e.g., summing over our N individuals (Hernán and Robins 2020).

Alternative identifying functional

  • Could construct a statistical model directly for the distribution p(y3|x1,z1,y1)
  • Say we are also interested in p(y4|do(x1)). Now we should estimate a new model for p(y4|x1,z1,y1).
  • Fortunately we have an equivalent identifying functional z1,y1,x2,z2,y2p(y3|x2,z2,y2)p(x2,z2,y2|x1,z1,y1)p(z1,y1)
  • Enough to model the relationship of observations at time t given the previous time point(s).

Estimating LTCE

  • For j=1,,M (number of posterior samples)
    • For i=1,,N (number of individuals)
      • Simulate (xji,2,zji,2,yji,2)p(xi,2,zi,2,yi,2|x1,zi,1,yi,1,θj)
      • Simulate yji,3p(yi,3|xji,2,zji,2,yji,2,θj)
  • Now yji,3 are posterior samples from p(y3|do(x1)).
  • Replacing the simulation of yji,3 with E(yi,3|xji,2,zji,2,yji,2,θj) and averaging over individuals leads to samples from the posterior distribution of the average causal effect E(y3|do(x1))

Dynamite model

  • C response variables (channels), yit=(y1it,,yCit)

  • yi,t can depend on the past values yi,tk, k=1, and exogenous variables xi,tk, k=0,

  • k can be different for different channels, today k=1 for simplicity

  • We have pt(yi,t|yi,t1)=Cc=1pct(yci,t|y1i,t1,,yCi,t1),

  • The first k time points are treated as fixed data.

  • dynamite currently supports Gaussian, Categorical, Poisson, Binomial, Bernoulli, Negative Binomial, Gamma, Exponential and Beta distributions

Linear predictor

  • We define a linear predictor ηci,t for the channel c with a following general form: ηci,t=αct+νci+λciψct+Xci,tβc+Zci,tδct
  • αt is the common, possibly time-varying, intercept term
  • νci are zero-mean Gaussian intercept terms, correlated across channels
  • λciψct corresponds to latent dynamic factor and its loadings
  • βc are time-invariant coefficients and δct are time-varying coefficients

Time-varying components

  • We define δt via Bayesian penalized B-splines (Lang and Brezger 2004): δt=Btω,t=k+1,,TωdN(ωd1,τ2),d=2,,D
    where Bt are the B-spline basis function values at time t and vector ω contains spline coefficients.
  • Random walk prior on ω defines the smoothness of the spline
  • τ0 leads to a time-invariant effect
  • Prior on ω1 = prior on δ1

Latent factors

  • Instead of common time-varying intercept αt, we can estimate a common latent factor ψct and individual-specific factor loadings λci
  • Factors can be correlated across channels
  • I will skip these today

Dynamite graph

Causal effect of employment on employment

  • Data on 845 individuals from Swiss Household Panel
  • 13 time points, from ages 20 to 44 (every two years)
  • Binary variables gender (421/424 females/males) and fulltime, whether the person was employed full time or not
  • Data included in the R package march
  • Interested in how intervention on employment at the age of 30 affects the future employment status

Causal graph

Defining the model in R

model_formula <- 
  obs(fulltime ~ 
      gender + 
      varying(~ -1 + gender:lag(never) + gender:lag(fulltime)),
    family = "bernoulli") +
  aux(never ~ fulltime == 0 & lag(never) == 1 | init(1)) +
  splines(df = 10)
)
fit <- dynamite(model_formula, data = d, group = "id", time = "age",
  chains = 4, cores = 4, iter = 2000)

Time-varying effects

plot_deltas(fit) + xlab("Age") + theme_bw()

Simulation of individual trajectories based on the intervention

# No full time employment at age 30
newdata0 <- d |> filter(age >= 28)
newdata0$fulltime[newdata0$age == 30] <- 0
newdata0$fulltime[newdata0$age > 30] <- NA
# Full time employment at age 30
newdata1 <- d |> filter(age >= 28)
newdata1$fulltime[newdata1$age == 30] <- 1
newdata1$fulltime[newdata1$age > 30] <- NA

Predictions cont.

pred <- 
  bind_rows(
    no = predict(fit, newdata = newdata0,
      funs = list(fulltime = list(mean = mean)))$simulated,
    yes = predict(fit, newdata = newdata1,
      funs = list(fulltime = list(mean = mean)))$simulated,
    .id = "fulltime_30"
  ) |> filter(age > 28) |> 
  group_by(age) |>
  summarise(difference = 
      mean_fulltime[fulltime_30 == "yes"] - 
      mean_fulltime[fulltime_30 == "no"]
  ) |>
  summarise(
    mean = mean(difference),
    lwr = quantile(difference, 0.025),
    upr = quantile(difference, 0.975)
  )

Results

Future directions

  • Support for group-level regression coefficients
  • Ordinal and t-distributions
  • Case studies

The dynamite package can be installed from github via

devtools::install_github("santikka/dynamite")

The end

Allison, P D, R Williams, and E Moral-Benito. 2017. “Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects.” Socius.

Hamaker, E L, R M Kuiper, and R PPP Grasman. 2015. “A Critique of the Cross-Lagged Panel Model.” Psychological Methods.

Harvey, A C. 1978. “The Estimation of Time-Varying Parameters from Panel Data.” In Annales de l’inséé.

Harvey, A C, and G D A Phillips. 1982. “The Estimation of Regression Models with Time- Varying Parameters.” In Games, Economic Dynamics, and Time Series Analysis, edited by M. Deistler, E. Fürst, and G. Schwödiauer. Heidelberg.

Helske, J. 2022. “Efficient Bayesian Generalized Linear Models with Time-Varying Coefficients: The Walker Package in R.” SoftwareX.

Hernán, M A, and J M Robins. 2020. Causal Inference: What If.

Lang, S, and A Brezger. 2004. “Bayesian P- Splines.” Journal of Computational and Graphical Statistics.

Pearl, J. 2009. Causality: Models, Reasoning, and Inference.

Sun, Y, R J Carroll, and D Li. 2009. “Semiparametric Estimation of Fixed-Effects Panel Data Varying Coefficient Models.” In Nonparametric Econometric Methods.