Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models

Abstract Panel data are ubiquitous in scientific fields such as social sciences. Various modeling approaches have been presented for observational causal inference based on such data. Existing approaches typically impose restrictive assumptions on the data-generating process such as Gaussian responses or time-invariant effects, or they can only consider short-term causal effects. To surmount these restrictions, we present the dynamic multivariate panel model (DMPM) that supports time-varying, time-invariant, and individual-specific effects, multiple responses across a wide variety of distributions, and arbitrary dependency structures of lagged responses of any order.

dynamite: An R Package for Dynamic Multivariate Panel Models

Abstract dynamite is an R package for Bayesian inference of intensive panel (time series) data comprising multiple measurements per multiple individuals measured in time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods.

Flexible Bayesian modelling and causal inference for panel data with R package dynamite

Panel data, consisting of various measurements from multiple subjects followed over several time points, are commonly studied in social sciences and other fields. Such data can naturally be analyzed in various ways, depending on the research questions and the characteristics of the data. Popular, somewhat overlapping modelling approaches include dynamic panel models, fixed effect models, and variations of cross-lagged panel models. In this talk, I extend the traditional cross-lagged panel model to handle time-varying effects and non-Gaussian response variables and show how Bayesian posterior predictive distributions can be used to evaluate long-term counterfactual predictions which take into account the dynamic structure of the assumed causal graph of the system. Finally, I give an overview of a new R package dynamite for Bayesian inference for panel data.

Efficient Bayesian generalized linear models with time-varying coefficients: The walker package in R

Abstract The R package walker extends standard Bayesian generalized linear models to the case where the effects of the explanatory variables can vary in time. This makes it possible, for example, to model the effects of interventions, such as changes in tax policy, whose effect increases gradually over time. The Markov chain Monte Carlo algorithms powering the Bayesian inference are based on Hamiltonian Monte Carlo provided by Stan software, using a state space representation of the model to marginalise over the regression coefficients for efficient low-dimensional sampling.

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

Abstract We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference for the latent states based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretised diffusion latent state processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias.

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

State space models are a flexible class of latent variable models commonly used in analysing time series data. The R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package provides easy-to-use and efficient functions for fully Bayesian inference with common time series models such as the basic structural time series model with exogenous covariates, simple stochastic volatility models, and discretized diffusion models, making it straightforward and efficient to make predictions and other inference in a Bayesian setting. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional parallelizable importance sampling post-correction to eliminate any approximation bias. The bssm package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package directly supports models with linear-Gaussian state dynamics and non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models.

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo (MCMC) and MCMC based on parallel importance sampling type weighted estimators (Vihola, Helske, and Franks, 2020, doi:10.1111/sjos.12492). Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities and basic stochastic volatility models with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretised diffusion models are supported.

walker: Bayesian Generalized Linear Models with Time-Varying Coefficients

Bayesian generalized linear models with time-varying coefficients as in Helske (2020, arXiv:2009.07063). Gaussian, Poisson, and binomial observations are supported. The Markov chain Monte Carlo (MCMC) computations are done using Hamiltonian Monte Carlo provided by Stan, using a state space representation of the model in order to marginalise over the coefficients for efficient sampling. For non-Gaussian models, the package uses the importance sampling type estimators based on approximate marginal MCMC as in Vihola, Helske, Franks (2020, doi:10.

KFAS: Kalman Filter and Smoother for Exponential Family State Space Models

State space modelling is an efficient and flexible framework for statistical inference of a broad class of time series and other data. KFAS includes computationally efficient functions for Kalman filtering, smoothing, forecasting, and simulation of multivariate exponential family state space models, with observations from Gaussian, Poisson, binomial, negative binomial, and gamma distributions. See the paper by Helske (2017) doi:10.18637/jss.v078.i10 for details.
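To make the filtering step concrete, here is a minimal sketch of a Kalman filter for the simplest Gaussian state space model, the local level model (observation = level + noise, level follows a random walk). It is written in plain Python for illustration only and is not the KFAS interface; the function name `local_level_filter` and all of its parameter names are hypothetical.

```python
def local_level_filter(observations, obs_var, state_var, prior_mean, prior_var):
    """Kalman filter for the local level model (illustrative sketch).

    Model assumed here:
        y_t       = level_t + eps_t,   eps_t ~ N(0, obs_var)
        level_t+1 = level_t + eta_t,   eta_t ~ N(0, state_var)

    Returns a list of (filtered_mean, filtered_variance) pairs, one per
    observation.
    """
    mean, var = prior_mean, prior_var  # one-step-ahead prediction of the level
    filtered = []
    for y in observations:
        innovation = y - mean              # prediction error
        innovation_var = var + obs_var     # variance of the prediction error
        gain = var / innovation_var        # Kalman gain
        filt_mean = mean + gain * innovation
        filt_var = var * (1.0 - gain)
        filtered.append((filt_mean, filt_var))
        # Predict the next level: a random walk keeps the mean and adds
        # state noise to the variance.
        mean = filt_mean
        var = filt_var + state_var
    return filtered


if __name__ == "__main__":
    series = [1.2, 0.9, 1.4, 1.1, 1.6]
    for step, (m, v) in enumerate(local_level_filter(series, 0.5, 0.1, 0.0, 10.0), 1):
        print(f"t={step}: mean={m:.3f}, variance={v:.3f}")
```

The sketch handles only a scalar Gaussian series. The non-Gaussian observation distributions listed above require approximation techniques that are beyond this example.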

seqHMM: Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series

Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Some more restricted versions of these types of models are also available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for visualizing multichannel sequence data and hidden Markov models.

ggstudent: Continuous Confidence Interval Plots using t-Distribution

Provides an extension to ‘ggplot2’ (Wickham, 2016, doi:10.1007/978-3-319-24277-4) for creating two types of continuous confidence interval plots (Violin CI and Gradient CI plots), typically for the sample mean. These plots contain multiple user-defined confidence areas with varying colours, defined by the underlying t-distribution used to compute standard confidence intervals for the mean of the normal distribution when the variance is unknown. Two types of plots are available, a gradient plot with rectangular areas, and a violin plot where the shape (horizontal width) is defined by the probability density function of the t-distribution.

Rlibeemd: Ensemble Empirical Mode Decomposition (EEMD) and Its Complete Variant (CEEMDAN)

An R interface for libeemd (Luukko, Helske, Räsänen, 2016) doi:10.1007/s00180-015-0603-9, a C library of highly efficient parallelizable functions for performing the ensemble empirical mode decomposition (EEMD), its complete variant (CEEMDAN), the regular empirical mode decomposition (EMD), and bivariate EMD (BEMD). Due to possible portability issues, the CRAN version no longer supports OpenMP; an OpenMP-supported version can be installed from GitHub: https://github.com/helske/Rlibeemd/.

tsPI: Improved Prediction Intervals for ARIMA Processes and Structural Time Series

Prediction intervals for ARIMA and structural time series models using an importance sampling approach with uninformative priors for model parameters, leading to more accurate coverage probabilities in the frequentist sense. Instead of sampling the future observations and hidden states of the state space representation of the model, only model parameters are sampled, and the method is based on solving the equations corresponding to the conditional coverage probability of the prediction intervals. This makes the method relatively fast compared to, for example, MCMC methods, and standard errors of prediction limits can also be computed straightforwardly.

ramcmc: Robust Adaptive Metropolis Algorithm

Function for adapting the shape of the random walk Metropolis proposal as specified by robust adaptive Metropolis algorithm by Vihola (2012) doi:10.1007/s11222-011-9269-5. The package also includes fast functions for rank-one Cholesky update and downdate. These functions can be used directly from R or the corresponding C++ header files can be easily linked to other R packages.
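As an illustration of what a rank-one Cholesky update does, the sketch below takes a lower-triangular factor L with A = L Lᵀ and a vector x, and returns the factor of A + x xᵀ without refactorizing from scratch. It uses the standard textbook update written in plain Python; it is not the ramcmc API, and the function name `cholesky_rank_one_update` is hypothetical.

```python
import math


def cholesky_rank_one_update(chol, x):
    """Return the lower Cholesky factor of A + x x^T, given chol with
    A = chol * chol^T (illustrative sketch, O(n^2) operations).

    `chol` is a list of rows of a lower-triangular matrix with a positive
    diagonal; neither input is modified.
    """
    n = len(x)
    L = [row[:] for row in chol]  # work on copies
    x = list(x)
    for k in range(n):
        r = math.hypot(L[k][k], x[k])  # new diagonal entry
        c = r / L[k][k]
        s = x[k] / L[k][k]
        L[k][k] = r
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * x[i]) / c
            x[i] = c * x[i] - s * L[i][k]  # uses the updated L[i][k]
    return L


if __name__ == "__main__":
    factor = [[2.0, 0.0], [1.0, 3.0]]  # factor of [[4, 2], [2, 10]]
    for row in cholesky_rank_one_update(factor, [1.0, 2.0]):
        print(row)
```

The corresponding downdate (the factor of A - x xᵀ) follows the same pattern with sign changes, but it requires the result to remain positive definite, so it is omitted here.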

diagis: Diagnostic Plot and Multivariate Summary Statistics of Weighted Samples from Importance Sampling

Fast functions for effective sample size, weighted multivariate mean and variance computation, and weight diagnostic plot for generic importance sampling type results.
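The quantities named above can be sketched for the scalar case as follows. The example is plain Python rather than the diagis interface, and it assumes the common definition of effective sample size, 1 / Σ w̃ᵢ², where w̃ᵢ are the weights normalized to sum to one. All function names are hypothetical.

```python
def _normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights."""
    return 1.0 / sum(w * w for w in _normalize(weights))


def weighted_mean(values, weights):
    """Self-normalized importance sampling estimate of the mean."""
    return sum(w * x for x, w in zip(values, _normalize(weights)))


def weighted_variance(values, weights):
    """Weighted variance around the weighted mean (no bias correction)."""
    norm = _normalize(weights)
    mean = sum(w * x for x, w in zip(values, norm))
    return sum(w * (x - mean) ** 2 for x, w in zip(values, norm))


if __name__ == "__main__":
    draws = [0.2, 1.5, -0.3, 0.9]
    weights = [0.1, 2.0, 0.4, 1.5]
    print(effective_sample_size(weights))
    print(weighted_mean(draws, weights), weighted_variance(draws, weights))
```

With equal weights the effective sample size equals the number of draws, and it drops toward one as a single weight dominates, which is why it serves as a quick diagnostic for importance sampling output.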

changer: Change R Package Name

Changing the name of an existing R package is an annoying but common task, especially in the early stages of package development. This package (mostly) automates this task.

Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R

Abstract Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations.

KFAS: Exponential Family State Space Models in R

Abstract State space modeling is an efficient and flexible method for statistical inference of a broad class of time series and other data. This paper describes the R package KFAS for state space modeling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions. After introducing the basic theory behind Gaussian and non-Gaussian state space models, an illustrative example of Poisson time series forecasting is provided.

Introducing libeemd: A program package for performing the ensemble empirical mode decomposition

Abstract The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite number of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series, and provides a way to separate short time-scale events from a general trend.