Bayesian Inference

Spatio-temporal modeling of co-dynamics of smallpox, measles and pertussis in pre-healthcare Finland

Abstract Infections are known to interact, as previous infections may affect the risk of succumbing to a new infection. The co-dynamics can be mediated by immunosuppression or immunomodulation, shared environmental or climatic drivers, or competition for susceptible hosts. Research and statistical methods in epidemiology often concentrate on large pooled datasets or high-quality data from cities, leaving rural areas underrepresented in the literature. Data concerning rural populations are typically sparse and scarce, especially in the case of historical data sources, which may introduce considerable methodological challenges.

Price Optimization Combining Conjoint Data and Purchase History: A Causal Modeling Approach

Abstract Pricing decisions of companies require an understanding of the causal effect of a price change on the demand. When real-life pricing experiments are infeasible, data-driven decision-making must be based on alternative data sources such as purchase history (sales data) and conjoint studies where a group of customers is asked to make imaginary purchases in an artificial setup. We present an approach for price optimization that combines population statistics, purchase history and conjoint data in a systematic way.

Flexible Bayesian modelling and causal inference for panel data with R package dynamite

Panel data, consisting of various measurements from multiple subjects followed over several time points, are commonly studied in social sciences and other fields. Such data can naturally be analyzed in various ways, depending on the research questions and the characteristics of the data. Popular, somewhat overlapping modelling approaches include dynamic panel models, fixed effect models, and variations of cross-lagged panel models. In this talk, I extend the traditional cross-lagged panel model to handle time-varying effects and non-Gaussian response variables and show how Bayesian posterior predictive distributions can be used to evaluate long-term counterfactual predictions which take into account the dynamic structure of the assumed causal graph of the system. Finally, I give an overview of a new R package dynamite for Bayesian inference for panel data.

Estimating the causal effect of timing on the reach of social media posts

Abstract Modern companies regularly use social media to communicate with their customers. In addition to the content, the reach of a social media post may depend on the season, the day of the week, and the time of the day. We consider optimizing the timing of Facebook posts by a large Finnish consumers’ cooperative using historical data on previous posts and their reach. The content and the timing of the posts reflect the marketing strategy of the cooperative.

A Bayesian spatio-temporal analysis of markets during the Finnish 1860s famine

Abstract We develop a Bayesian spatio-temporal model to study pre-industrial grain market integration during the Finnish famine of the 1860s. Our model takes into account several problematic features often present when analysing multiple spatially interdependent time series. For example, compared with the error correction methodology commonly applied in econometrics, our approach allows simultaneous modelling of multiple interdependent time series and avoids the cumbersome statistical testing needed to predetermine the market leader as a point of reference.

Efficient Bayesian generalized linear models with time-varying coefficients: The walker package in R

Abstract The R package walker extends standard Bayesian generalized linear models to the case where the effects of the explanatory variables can vary in time. This makes it possible, for example, to model the effects of interventions, such as changes in tax policy, whose effect gradually increases over time. The Markov chain Monte Carlo algorithms powering the Bayesian inference are based on Hamiltonian Monte Carlo provided by Stan software, using a state space representation of the model to marginalise over the regression coefficients for efficient low-dimensional sampling.
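
The marginalisation idea can be illustrated in a scalar linear-Gaussian case. The following is a minimal Python sketch, not the walker implementation (which relies on Stan). It assumes a single regression coefficient that follows a Gaussian random walk, and uses a Kalman filter to compute the log-likelihood with the coefficient integrated out. The function name and its parameters are hypothetical.

```python
import math

def kalman_loglik(y, x, sigma_y, sigma_b, m0, p0):
    """Log-likelihood of y_t = x_t * beta_t + eps_t with beta_t a random walk.

    eps_t ~ N(0, sigma_y^2), beta_{t+1} = beta_t + eta_t, eta_t ~ N(0, sigma_b^2),
    beta_1 ~ N(m0, p0). The coefficients beta_t are integrated out analytically.
    Returns (log-likelihood, list of filtered means of beta_t).
    """
    m, p = m0, p0
    loglik = 0.0
    filtered = []
    for y_t, x_t in zip(y, x):
        v = y_t - x_t * m                  # one-step prediction error
        f = x_t * x_t * p + sigma_y ** 2   # prediction error variance
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p * x_t / f                    # Kalman gain
        m = m + k * v                      # filtered mean of beta_t
        p = p * (1.0 - k * x_t)            # filtered variance of beta_t
        filtered.append(m)
        p = p + sigma_b ** 2               # random-walk step to beta_{t+1}
    return loglik, filtered
```

In a sampler built on this sketch, only the two standard deviations would need to be sampled, with the function supplying the marginal likelihood at each proposed value.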

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

Abstract We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference for the latent states based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretised diffusion latent state processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias.

Estimation of causal effects with small data in the presence of trapdoor variables

Abstract We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples. This bias is related to variables that we call trapdoor variables.

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

State space models are a flexible class of latent variable models commonly used in analysing time series data. The R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package provides easy-to-use and efficient functions for fully Bayesian inference with common time series models such as the basic structural time series model with exogenous covariates, simple stochastic volatility models, and discretized diffusion models, making it straightforward and efficient to make predictions and other inference in a Bayesian setting. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional parallelizable importance sampling post-correction to eliminate any approximation bias. The bssm package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package directly supports models with linear-Gaussian state dynamics and non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models.

Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect

Abstract Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo are not overlapping, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes.

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo (MCMC) and MCMC based on parallel importance sampling type weighted estimators (Vihola, Helske, and Franks, 2020, doi:10.1111/sjos.12492). Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities and basic stochastic volatility models with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported.
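
As a rough illustration of the particle methods mentioned above, the following Python sketch estimates the log-likelihood of a Poisson observation model with a Gaussian random-walk state using a bootstrap particle filter. It is not the bssm API. The model, the function name and all parameter names are assumptions made for this example, and the multinomial resampling used here is only one of several possible schemes.

```python
import math
import random

def bootstrap_loglik(y, n_particles, sigma_state, init_mean, init_sd, rng):
    """Particle estimate of the log-likelihood of a Poisson random-walk model.

    y_t ~ Poisson(exp(alpha_t)), alpha_{t+1} = alpha_t + N(0, sigma_state^2),
    alpha_1 ~ N(init_mean, init_sd^2).
    """
    particles = [rng.gauss(init_mean, init_sd) for _ in range(n_particles)]
    loglik = 0.0
    for y_t in y:
        # Poisson log-density of y_t for each particle (log-rate = particle value).
        logw = [y_t * a - math.exp(a) - math.lgamma(y_t + 1.0) for a in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        loglik += top + math.log(sum(w) / n_particles)
        # Multinomial resampling, then propagation through the state equation.
        particles = rng.choices(particles, weights=w, k=n_particles)
        particles = [a + rng.gauss(0.0, sigma_state) for a in particles]
    return loglik

estimate = bootstrap_loglik([2, 3, 1, 4], 500, 0.3, 0.5, 1.0, random.Random(1))
```

In a particle MCMC scheme, an estimate of this kind would replace the intractable likelihood in the Metropolis-Hastings acceptance ratio for the hyperparameters (here `sigma_state`).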

walker: Bayesian Generalized Linear Models with Time-Varying Coefficients

Bayesian generalized linear models with time-varying coefficients as in Helske (2020, arXiv:2009.07063). Gaussian, Poisson, and binomial observations are supported. The Markov chain Monte Carlo (MCMC) computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling. For non-Gaussian models, the package uses the importance sampling type estimators based on approximate marginal MCMC as in Vihola, Helske, Franks (2020, doi:10.

Estimation of causal effects with small data in the presence of trapdoor variables

We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples (where parameter estimation exhibits non-negligible uncertainty). This bias is related to variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with real data from the Life Course 1971-2002 study. Using this dataset, we estimate the causal effect of education on income in the Finnish context. Using the Bayesian modelling approach allows us to take the parameter uncertainty into account and gives us the full interventional distribution instead of only average causal effect estimates.

Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo

Abstract We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelisation and additional flexibility in MCMC implementation.
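
The basic idea of reweighting an approximate chain can be sketched with a toy example in Python. The chain below targets a deliberately wrong approximation, and self-normalised importance weights correct the estimate towards the intended target. Both densities, the function names and the tuning values are assumptions chosen for illustration. The weights here are exact ratios, whereas in the latent variable setting of the abstract they would typically be estimated with IS or SMC.

```python
import math
import random

def log_target(theta):
    # Hypothetical "exact" target: standard normal, up to a constant.
    return -0.5 * theta * theta

def log_approx(theta):
    # Hypothetical cheap approximation: normal with standard deviation 1.5.
    return -0.5 * theta * theta / 2.25

def is_corrected_mean(f, n_iter, step, rng):
    """Random-walk Metropolis on the approximation, then IS reweighting."""
    theta = 0.0
    num = den = 0.0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        if math.log(1.0 - rng.random()) < log_approx(proposal) - log_approx(theta):
            theta = proposal
        w = math.exp(log_target(theta) - log_approx(theta))  # importance weight
        num += w * f(theta)
        den += w
    return num / den   # self-normalised weighted estimate of E[f(theta)]

second_moment = is_corrected_mean(lambda t: t * t, 50000, 2.0, random.Random(42))
```

The weights are computed inside the loop only for brevity. They depend on the stored chain alone, so they could equally be computed afterwards and in parallel, which is the kind of post-correction the abstract refers to.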

A Bayesian reconstruction of historical population in Finland, 1647-1850

Abstract This article provides a novel method to estimate historical population development. We review the previous literature on historical population time series estimates and propose a general outline to address the well-known methodological problems. We use a Bayesian hierarchical time series model that allows us to integrate a parish-level dataset and prior population information in a coherent manner. The procedure provides us with model-based posterior intervals for the final population estimates.

Comparison of Attention Behaviour Across User Sets through Automatic Identification of Common Areas of Interest

Abstract Eye tracking is used to analyze and compare user behaviour within numerous domains, but long duration eye tracking experiments across multiple users generate millions of eye gaze samples, making the data analysis process complex. Usually the samples are labelled into Areas of Interest (AoI) or Objects of Interest (OoI), where the AoI approach aims to understand how a user monitors different regions of a scene while OoI identification uncovers distinct objects in the scene that attract user attention.

Bayesian reconstruction of historical population in Finland 1647-1850

The scarcity of long-run historical population series is a major problem because these are vital inputs for many fields of history, demography, and economics. Perhaps the most crucial omission thus far in historical population reconstructions has been the unavailability of uncertainty estimates, leaving the door open for conflicting interpretations of the respective population developments. In this talk, I will describe a Bayesian hierarchical time series model that allows us to integrate partially observed parish-level data and prior information in a coherent manner, providing us with model-based posterior intervals for the population estimates. We demonstrate its applicability by estimating long-term Finnish population development from 1647 onwards. This puts Finland among the very few countries with an annual population series of this length available.
(Slides start with a short bio)