From sequences to variables - Rethinking the relationship between sequences and outcomes

Abstract: Sequence analysis (SA) has gained increasing interest in the social sciences for the holistic analysis of life course and other longitudinal data. The usual approach is to construct sequences, calculate dissimilarities, group similar sequences with cluster analysis, and use cluster membership as a dependent or independent variable in a linear or nonlinear regression model. This approach may be problematic because it treats cluster memberships as fixed, known characteristics of the subjects in the subsequent analysis.
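
The usual pipeline described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy sequences, the outcome values, the choice of Hamming distance as the dissimilarity, and average-linkage clustering. The last step relies on the fact that regressing an outcome on a single cluster dummy by least squares gives the difference in group means as the coefficient.

```python
from itertools import combinations

# Hypothetical toy data: one state per year (S = studying, E = employed, U = unemployed).
sequences = {"a": "SSEEEE", "b": "SSSEEE", "c": "UUUUEE", "d": "UUUUUE"}
outcome = {"a": 3.0, "b": 3.2, "c": 1.9, "d": 1.7}  # hypothetical outcome values


def hamming(s, t):
    """Dissimilarity: number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def average_linkage(c1, c2):
    """Mean pairwise dissimilarity between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(hamming(sequences[i], sequences[j]) for i, j in pairs) / len(pairs)


def agglomerate(ids, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in ids]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


clusters = agglomerate(sorted(sequences), k=2)

# Cluster membership is now treated as a fixed, known covariate.
membership = {i: idx for idx, members in enumerate(clusters) for i in members}


def group_mean(idx):
    values = [outcome[i] for i, m in membership.items() if m == idx]
    return sum(values) / len(values)


# Least-squares fit of the outcome on a dummy for cluster 1:
# intercept = mean of cluster 0, slope = difference in cluster means.
intercept = group_mean(0)
slope = group_mean(1) - group_mean(0)
print(clusters, intercept, slope)
```

Note that `membership` enters the final step as if it were observed without error, which is exactly the assumption the abstract flags as potentially problematic.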

Minimum description length based hidden Markov model clustering for life sequence analysis

Abstract: In this article, a model-based method for clustering life sequences is suggested. In the social sciences, model-free clustering methods are often used to find typical life sequences. The suggested method, which is based on hidden Markov models, provides a principled probabilistic ranking of candidate clusterings for choosing the best solution. After the principle of the method and the algorithm are presented, the method is tested with real-life data, where it finds eight descriptive clusters with clear probabilistic structures.
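
The abstract does not give the algorithm, so the following is only a rough sketch of the two generic ingredients such a method needs, not the article's procedure. It shows the forward algorithm for the log-likelihood of a sequence under a discrete hidden Markov model, and a generic two-part description length (negative log-likelihood plus a BIC-style parameter cost) that could be used to rank candidate clusterings. The model parameters, the observation coding, and the form of the penalty are all assumptions.

```python
import math


def forward_loglik(seq, start, trans, emit):
    """Log-likelihood of an observation sequence under a discrete HMM.

    Uses the scaled forward algorithm: alpha is renormalised at every step
    and the logs of the normalising constants are accumulated.
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][seq[0]] for s in states]
    loglik = 0.0
    for obs in seq[1:]:
        norm = sum(alpha)
        loglik += math.log(norm)
        alpha = [a / norm for a in alpha]
        alpha = [
            sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs]
            for s in states
        ]
    return loglik + math.log(sum(alpha))


def description_length(logliks, n_params, n_sequences):
    """Assumed two-part code length in nats: data cost plus a BIC-style model cost."""
    return -sum(logliks) + 0.5 * n_params * math.log(n_sequences)


# Hypothetical cluster models; observations are coded 0 = employed, 1 = unemployed.
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.2, 0.8]]
models = {
    "mostly_employed": ([0.9, 0.1], trans, emit),
    "mostly_unemployed": ([0.1, 0.9], trans, emit),
}


def assign(seq):
    """Assign a sequence to the cluster model under which it is most likely."""
    return max(models, key=lambda name: forward_loglik(seq, *models[name]))


print(assign([0, 0, 0, 0]), assign([1, 1, 1, 1]))
```

Under this kind of criterion, a candidate clustering with a smaller description length would be ranked higher. The code length actually used in the article may differ.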

Clustering