I am currently an Academy Research Fellow at the University of Turku, working on temporal causal inference. I am also the PI of the statistics subproject of the PREDLIFE Consortium at the University of Jyväskylä.
I did my statistics studies at the University of Jyväskylä. After a postdoc under Matti Vihola (Bayesian Markov chain and sequential Monte Carlo stuff), I did my second postdoc in Anders Ynnerman’s Infovis group at Linköping University (various visualization and statistics stuff). I then came back to Jyväskylä to work on causal inference with Juha Karvanen as part of the Decision analytics utilizing causal models and multiobjective optimization (DEMO) project, before acquiring my own funding to work on the PREDLIFE project.
My research can be broadly classified as computational statistics, mainly related to causal inference and time series methods (state space models, hidden Markov models), and the related statistical software development. Check out my publications and R packages for more details on my current and previous research interests. You can also take a look at my CV.
Ph.D. in Statistics ∙ University of Jyväskylä ∙ 2015
MSc in Statistics ∙ University of Jyväskylä ∙ 2010
I am not teaching at the moment. I have previously taught the courses Statistical Inference 1, Bayesian Inference 1, R Programming, and Generalized Linear Models 2 at the University of Jyväskylä.
Publications
A modern approach to transition analysis and process mining with Markov models: A tutorial with R
Abstract: This chapter presents an introduction to Markovian modeling for the analysis of sequence data. Contrary to the deterministic approach seen in the previous sequence analysis chapters, Markovian models are probabilistic models, focusing on the transitions between states instead of studying sequences as a whole. The chapter provides an introduction to this method and differentiates between its most common variations: first-order Markov models, hidden Markov models, mixture Markov models, and mixture hidden Markov models.
Clustering and Structural Robustness in Causal Diagrams
Abstract: Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily.
Price Optimization Combining Conjoint Data and Purchase History: A Causal Modeling Approach
Abstract: Pricing decisions of companies require an understanding of the causal effect of a price change on the demand. When real-life pricing experiments are infeasible, data-driven decision-making must be based on alternative data sources such as purchase history (sales data) and conjoint studies where a group of customers is asked to make imaginary purchases in an artificial setup. We present an approach for price optimization that combines population statistics, purchase history and conjoint data in a systematic way.
Software
bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo (MCMC) and MCMC based on parallel importance sampling type weighted estimators (Vihola, Helske, and Franks, 2020, doi:10.1111/sjos.12492). Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities and basic stochastic volatility models with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported.
walker: Bayesian Generalized Linear Models with Time-Varying Coefficients
Bayesian generalized linear models with time-varying coefficients as in Helske (2020, arXiv:2009.07063). Gaussian, Poisson, and binomial observations are supported. The Markov chain Monte Carlo (MCMC) computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling. For non-Gaussian models, the package uses the importance sampling type estimators based on approximate marginal MCMC as in Vihola, Helske, Franks (2020, doi:10.
Some of my talks
Flexible Bayesian modelling and causal inference for panel data with R package dynamite
Panel data, consisting of various measurements from multiple subjects followed over several time points, are commonly studied in social sciences and other fields. Such data can naturally be analyzed in various ways, depending on the research questions and the characteristics of the data. Popular, somewhat overlapping modelling approaches include dynamic panel models, fixed effect models, and variations of cross-lagged panel models. In this talk, I extend the traditional cross-lagged panel model to handle time-varying effects and non-Gaussian response variables and show how Bayesian posterior predictive distributions can be used to evaluate long-term counterfactual predictions which take into account the dynamic structure of the assumed causal graph of the system. Finally, I give an overview of a new R package dynamite for Bayesian inference for panel data.
bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
State space models are a flexible class of latent variable models commonly used in analysing time series data. The R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package provides easy-to-use and efficient functions for fully Bayesian inference with common time series models such as the basic structural time series model with exogenous covariates, simple stochastic volatility models, and discretized diffusion models, making it straightforward and efficient to make predictions and other inference in a Bayesian setting. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional parallelizable importance sampling post-correction to eliminate any approximation bias. The bssm package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package directly supports models with linear-Gaussian state dynamics and non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models.
Estimation of causal effects with small data in the presence of trapdoor variables
We consider the problem of estimating causal effects of interventions from observational data when the well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples (where parameter estimation exhibits non-negligible uncertainty). This bias is related to variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with real data from the Life Course 1971-2002 study. Using this dataset, we estimate the causal effect of education on income in the Finnish context. Using the Bayesian modelling approach allows us to take the parameter uncertainty into account and gives us the full interventional distribution instead of only average causal effect estimates.
Featured categories
R package (19) ∙ Bayesian Inference (16) ∙ Causal Inference (7)