From sequences to variables - Rethinking the relationship between sequences and outcomes

By Satu Helske, Jouni Helske, Guilherme K. Chihaya in Clustering

March 8, 2023



Sequence analysis (SA) has attracted growing interest in the social sciences for the holistic analysis of life course and other longitudinal data. The usual approach is to construct sequences, calculate dissimilarities, group similar sequences with cluster analysis, and use cluster membership as a dependent or independent variable in a linear or nonlinear regression model. This approach may be problematic because the subsequent analysis treats cluster memberships as fixed, known characteristics of the subjects. Furthermore, it is often more reasonable to assume that individual sequences are mixtures of multiple ideal types rather than equal members of a single group. Failing to account for these issues may lead to wrong conclusions about the nature of the studied relationships. In this paper, we discuss the problems of the “traditional” use of SA clusters and compare four approaches for different types of data. We conduct a simulation study and an empirical study, which demonstrate the importance of considering how sequences and outcomes are related and the need to adjust the analysis accordingly. In many typical social science applications, the traditional approach is prone to wrong conclusions, and so-called position-dependent approaches such as representativeness should be preferred.
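
The contrast between hard cluster membership and a position-dependent measure can be illustrated with a small sketch. It is not the authors' implementation. It assumes Hamming distance as the dissimilarity, two hand-picked medoid sequences standing in for cluster "ideal types", and a representativeness score of the form 1 - d / d_max, where d_max is the largest possible distance. The function names, the toy state codes (E = education, W = work, U = unemployment), and the example sequences are all hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hard_membership(seqs: List[str], medoids: List[str]) -> List[int]:
    """Traditional approach: assign each sequence to its nearest medoid.

    Ties are resolved in favour of the first medoid, which hides the fact
    that a sequence may be equally close to several ideal types.
    """
    return [
        min(range(len(medoids)), key=lambda k: hamming(s, medoids[k]))
        for s in seqs
    ]


def representativeness(seqs: List[str], medoids: List[str]) -> List[List[float]]:
    """Assumed form: 1 - d / d_max, one score per (sequence, medoid) pair.

    For Hamming distance, d_max equals the sequence length.
    """
    d_max = len(medoids[0])
    return [[1 - hamming(s, m) / d_max for m in medoids] for s in seqs]


# Hypothetical toy data: two ideal types and three observed sequences.
medoids = ["EEWWWW", "EEUUUU"]
seqs = ["EEWWWW", "EEWWUU", "EEUUUU"]

labels = hard_membership(seqs, medoids)
scores = representativeness(seqs, medoids)

for s, lab, sc in zip(seqs, labels, scores):
    print(s, "cluster:", lab, "representativeness:", [round(v, 3) for v in sc])
```

In this toy example the middle sequence is equally distant from both medoids, yet hard assignment places it in the first cluster. The representativeness scores keep that ambiguity visible, and they could enter a regression model as continuous covariates instead of a single categorical membership variable.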
