Paper 1

The Markov Memory Ladder

Stage 0 – Foundation (In Progress)

Abstract

Sequence analysis methods widely used in sociology rarely test the process that generates employment trajectories. Models of those trajectories often assume, implicitly, a Markov chain of order 1: a memoryless process in which each state depends only on the immediately preceding one. This paper introduces the Markov Memory Ladder (MML), a framework that embeds trajectories in a topological space and measures the degree to which observed sequence topology exceeds that of a Markov-1 null model. I apply the MML to UK longitudinal data from the BHPS and Understanding Society panels. The results show that employment trajectories carry persistent memory: topological features survive permutation testing against the null (z ≈ −3.7). The verdict depends on the test statistic, however. In Understanding Society, Wasserstein diagram-level distances reject the Markov-1 null, whereas scalar total persistence does not. This result motivates a programme of higher-order methods (Papers 2–6) and provides the foundational benchmark against which topological complexity is measured throughout the research programme.

Plain-Language Summary

Most statistical models of employment over a lifetime assume that what you do next depends only on what you are doing right now — not on your full history. This paper tests that assumption rigorously. Using UK household panel data tracked over decades, I show that people's employment histories contain strong "memory": the shape of a person's career path over time is far more structured than a memoryless model would predict. The Markov Memory Ladder is the measuring instrument that detects this structure. The result sets the foundation for the rest of the research programme, which develops new mathematical tools to describe and explain that structure.

Introduction

Employment trajectories are sequences of occupational and labour-market states recorded across the working life. The dominant analytic tradition — optimal matching and its variants — treats sequence similarity as a distance problem. Topological data analysis opens a complementary question: what shape does a trajectory have?

This paper introduces the Markov Memory Ladder (MML), a framework for measuring the degree to which employment trajectory topology exceeds that of a Markov-1 null model. The MML operationalises “memory” as a topological property and provides rigorous permutation-based hypothesis tests.

Background

Employment Trajectories and Sequence Analysis

Abbott's (1995) work is a foundational reference for sequence analysis in sociology. The field has expanded substantially since then, but most methods remain agnostic about the process that generates sequences. The Markov assumption, that state transitions depend only on the current state, is rarely tested directly.

Topological Data Analysis

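Persistent homology provides a multi-scale summary of the topological features of a data set. Applied to a space of trajectories, it records connected components, loops, and voids together with the range of scales over which each feature persists, so that long-lived features can be separated from noise. Neither a single matrix of pairwise distances nor a clustering at one fixed scale reveals this multi-scale structure on its own.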
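The H₀ (connected-component) part of a Rips filtration has a simple closed form. Every point is born at scale 0, and a component dies at the edge length where it merges into another. Those merge lengths are exactly the edges that Kruskal's minimum-spanning-tree algorithm accepts.

The sketch below illustrates that idea under stated assumptions. It takes a plain distance matrix, and edges enter the filtration at their full length (some conventions use half the length instead). The function name `h0_diagram` is hypothetical. This is not the paper's pipeline: a full analysis, including H₁ loops, needs a dedicated TDA library.

```python
from itertools import combinations


def h0_diagram(dist):
    """H0 persistence pairs (birth, death) for a Vietoris-Rips filtration.

    dist is a symmetric n x n distance matrix (list of lists). Every point is
    born at scale 0; when an edge first joins two components, one of them dies
    at that edge's length (Kruskal's algorithm with union-find). The single
    component that never dies (infinite persistence) is omitted.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Process edges from shortest to longest, as the filtration grows.
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    return pairs
```

Consider four points at positions 0, 1, 2 and 10 on a line. The call returns `[(0.0, 1.0), (0.0, 1.0), (0.0, 8.0)]`: two short-lived components and one long-lived component marking the isolated point. That long-lived component is the kind of cluster separation that H₀ captures.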

Methods

The Markov Memory Ladder proceeds in three stages:

  1. Embedding: pairwise state-sequence distances between trajectories define a metric space in which each trajectory is a point. A Rips complex is built over that space, on a landmark subset when the sample is large.
  2. Measurement: persistent homology extracts the lifespan of topological features (connected components, loops) across filtration scales.
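  3. Testing: observed statistics (total persistence and Wasserstein diagram-level distances) are compared against a battery of null models. The battery comprises an order shuffle that randomly permutes the states within each trajectory (preserving state marginals but destroying temporal order), Markov-1 and Markov-2 surrogates that preserve first- and second-order transition structure respectively, and a label-shuffle control.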
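The two null models that bracket the first rung of the ladder can be sketched as follows. The first is an order shuffle, which keeps each trajectory's state counts but destroys ordering. The second is a Markov-1 surrogate, which regenerates a sequence from fitted first-order transition probabilities.

The sketch rests on several assumptions that the paper does not state. Transitions are pooled across all trajectories. Each surrogate keeps the original length and start state. A state never observed with a successor is held in place. The function names are hypothetical. A Markov-2 surrogate would follow the same pattern, conditioning on the previous two states.

```python
import random
from collections import Counter, defaultdict


def order_shuffle(seq, rng):
    """Permute one trajectory's states: keeps its state counts, destroys order."""
    out = list(seq)
    rng.shuffle(out)
    return out


def fit_markov1(sequences):
    """Pooled first-order transition counts: state -> Counter of next states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def markov1_surrogate(seq, counts, rng):
    """Simulate a sequence with the same length and start state from the fitted chain."""
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = counts.get(out[-1])
        if not nxt:
            # Assumption: a state never seen with a successor is held in place.
            out.append(out[-1])
            continue
        states, weights = zip(*nxt.items())
        out.append(rng.choices(states, weights=weights)[0])
    return out
```

Passing a seeded `random.Random` instance makes each surrogate battery reproducible.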

Data

Data are drawn from two UK household panel surveys:

  • BHPS (British Household Panel Survey): 1991–2008, 18 annual waves
  • Understanding Society: 2009–present, 14 waves available at time of analysis

Monthly activity sequences are constructed following the BHPS coding scheme: employed, self-employed, unemployed, inactive, in education, and retired.
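The embedding stage needs a distance between pairs of monthly activity sequences. The specific sequence distance is not stated here. The sketch below therefore assumes optimal matching, the edit-distance family named in the introduction, with one indel cost and one substitution cost for every pair of distinct states.

The one-letter state codes are hypothetical: E employed, S self-employed, U unemployed, I inactive, N in education, R retired. The function name `om_distance` and the cost values are also hypothetical.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal-matching (edit) distance between two state sequences.

    Uses one substitution cost for every pair of distinct states; applied OM
    work often uses state-specific substitution costs instead.
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a[i-1]
                d[i][j - 1] + indel,  # insert b[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )
    return d[n][m]
```

For example, `om_distance("EEUU", "EEEU")` returns 2.0. That is one substitution, or equivalently one deletion plus one insertion. Setting the substitution cost to twice the indel cost makes those two routes tie.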

Results

Markov Memory Ladder: Scalar Statistics

Under total-persistence testing, H₀ (connected-component) structure is order-dependent (order-shuffle p < 0.005), but the Markov-1 null is not rejected (Markov-1 p = 1.000). The Markov-2 and label-shuffle controls behave as expected (p = 1.000 and p = 0.315 respectively). The scalar test is thus consistent with a first-order Markov generating process. Observed persistence sums do not exceed those of Markov-1 surrogates, which in fact produce more total persistence than the observed data.
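Reading p = 1.000 correctly depends on the direction of the test. The sketch below assumes the common (b + 1)/(n + 1) Monte Carlo estimator; the estimator used in the paper is not stated. The function name is hypothetical.

Under a one-sided "greater" test, p = 1 arises exactly when every null draw is at least as large as the observed statistic. That matches surrogates producing more total persistence than the data. With n null draws, the smallest attainable p-value is 1/(n + 1).

```python
def permutation_p_value(observed, null_stats, alternative="greater"):
    """Monte Carlo permutation p-value with the +1 correction.

    "greater" asks whether the observed statistic is unusually large relative
    to the null draws; "less" asks whether it is unusually small.
    """
    if alternative == "greater":
        extreme = sum(s >= observed for s in null_stats)
    elif alternative == "less":
        extreme = sum(s <= observed for s in null_stats)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return (extreme + 1) / (len(null_stats) + 1)
```

Take an observed total of 10.0 against null draws [12.0, 11.5, 13.2, 10.4]. The "greater" p-value is 1.0, while the "less" p-value is 0.2.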

The Wasserstein Reversal

Upgrading the test statistic to Wasserstein diagram-level distances — comparing the full geometric configuration of persistence features rather than collapsing them to a scalar — reverses the Markov-1 conclusion. Under Wasserstein testing (100 permutations, 2,000 landmarks), the Markov-1 null is rejected for H₀ (p = 0.002): observed persistence diagrams differ significantly in geometric configuration from Markov-generated diagrams, even though Markov surrogates produce more total persistence. This discrepancy — detectable only because both statistics were applied to the same null battery — is the paper’s central methodological finding. Total persistence masks structural differences that Wasserstein distances detect.

Seven Mobility Regimes

Gaussian mixture modelling of the embedded trajectory space, with the number of components selected by BIC, identifies seven mobility regimes:
(R1) stable high-income employment;
(R2) stable mid-income employment;
(R3) employment-income churning;
(R4) inactivity-to-employment transitions;
(R5) inactivity-income churning;
(R6) persistently low income;
(R7) mixed trajectories.

The typology qualitatively resembles what sequence analysis finds in similar data, although agreement with an optimal-matching (OM) baseline at matched k = 7 is modest (ARI = 0.26). What TDA adds is multi-scale geometric characterisation and formal significance tests.
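The adjusted Rand index (ARI) compares two partitions of the same people, here the mixture-model regimes and the OM clusters. It ignores cluster names. Its expected value is 0 for chance agreement and it equals 1 for identical partitions, which puts ARI = 0.26 on that scale.

Below is a minimal stdlib sketch of the standard formula. The function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    Chance agreement has expected value 0; identical partitions score 1
    regardless of how clusters are named.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        raise ValueError("need at least two items")
    # Pairs placed together by both labelings, by labels_a, and by labels_b.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:  # degenerate, e.g. both put every item in one cluster
        return 1.0
    return (together_both - expected) / (max_index - expected)
```

Relabelling clusters does not change the score: `[0, 0, 1, 1]` against `[1, 1, 0, 0]` gives 1.0. Splitting one cluster, as in `[0, 0, 1, 1]` against `[0, 0, 1, 2]`, gives 4/7 ≈ 0.571.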

Regime Stickiness

Overlapping 10-year career-phase windows show strong regime stickiness. Among 7,453 individuals starting in disadvantaged regimes (R3, R5, R6), only 416 (5.6%) escape to advantaged regimes within the observation window (11.0% under non-overlapping windows). A phase-order shuffle test (n = 500 permutations) finds no evidence that the ordering of phases creates additional topology (H₀ p = 0.98, H₁ p = 0.542).
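The escape rate can be computed from one regime label per person per career-phase window. The sketch rests on several assumptions. Escape means ever reaching an advantaged regime after the first window. Window construction, overlapping or not, happens upstream. The advantaged set is not listed in the text, so R1 and R2 are used here purely for illustration. The function name is hypothetical.

```python
def escape_share(regime_paths, disadvantaged, advantaged):
    """Count people who start in a disadvantaged regime and later reach an advantaged one.

    regime_paths maps a person id to that person's regime labels, one per
    career-phase window, in time order. Returns (escaped, starters, share).
    """
    starters = [p for p in regime_paths.values() if p and p[0] in disadvantaged]
    escaped = sum(any(r in advantaged for r in p[1:]) for p in starters)
    share = escaped / len(starters) if starters else float("nan")
    return escaped, len(starters), share
```

The paper's counts reproduce the reported rate: 416 / 7,453 ≈ 0.056, the 5.6% given above.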

Stratification and Cross-Era Validation

Stratified Wasserstein distances reveal significant topological differences by gender, parental NS-SEC, and birth cohort: 30 of 50 tests survive Benjamini–Hochberg FDR correction at q < 0.05. The BHPS cross-era replication (1991–2008) strengthens the Wasserstein result. There, the Markov-1 null is rejected for H₀ under both total persistence and Wasserstein distance (p = 0.000 in each case), so in the earlier panel the scalar statistic agrees with the diagram-level one. This is consistent with the longer BHPS observation window (mean 14.5 years versus 12.9 years in Understanding Society).
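The Benjamini–Hochberg step-up procedure decides which of the 50 stratified tests survive. First, sort the m p-values and find the largest rank k whose p-value is at most k·q/m. Then reject every test at or below that rank.

Below is a minimal sketch with a hypothetical function name.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a reject flag per test, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value clears k * q / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(m)]
```

With p-values [0.02, 0.025, 0.001, 0.5] and q = 0.05, the first three are rejected. A Bonferroni threshold of 0.05 / 4 would keep only the smallest.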

Discussion

The central methodological contribution of this paper is the demonstration that two legitimate summary statistics, total persistence and Wasserstein diagram-level distance, can yield opposite verdicts on whether the Markov-1 null accounts for trajectory topology. This is not a paradox. Total persistence collapses a diagram into a single number, so Markov surrogates that produce many short-lived features can match or exceed the observed total while arranging those features very differently. Wasserstein distance compares the full geometric configuration of the diagram and detects structural differences invisible to the scalar. The implication for TDA hypothesis testing in empirical applications is direct: choosing a test statistic is not a technical formality but a substantive decision.
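A toy pair of diagrams makes the mechanism concrete. One diagram has a single long-lived feature; the other has four short-lived ones with the same total persistence. The scalar cannot tell them apart, but a diagram-level Wasserstein distance can.

The sketch rests on several assumptions. It uses the L∞ ground metric (a common convention; the paper's choice is not stated). It computes an exact brute-force matching that is feasible only for a handful of points, whereas practical TDA libraries use assignment solvers instead. The diagrams and function names are hypothetical.

```python
from itertools import permutations


def total_persistence(diagram):
    """Sum of lifespans (death - birth) over all finite features in a diagram."""
    return sum(death - birth for birth, death in diagram)


def wasserstein(d1, d2, p=1):
    """Exact p-Wasserstein distance between two small persistence diagrams.

    Uses the L-infinity ground metric and lets any point be matched to the
    diagonal. Brute force over all matchings: only feasible for a few points.
    """
    n, m = len(d1), len(d2)
    # None is a diagonal slot; each side gets one slot per point on the other side.
    left = list(d1) + [None] * m
    right = list(d2) + [None] * n

    def cost(a, b):
        if a is None and b is None:
            return 0.0  # diagonal matched to diagonal is free
        if a is None:
            return (b[1] - b[0]) / 2  # L-infinity distance from b to the diagonal
        if b is None:
            return (a[1] - a[0]) / 2
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    best = min(
        sum(cost(left[i], right[j]) ** p for i, j in enumerate(perm))
        for perm in permutations(range(n + m))
    )
    return best ** (1 / p)
```

Take `observed = [(0.0, 4.0)]` and `surrogate = [(0.0, 1.0)] * 4`. Both have total persistence 4.0, so the scalar difference is zero. Yet `wasserstein(observed, surrogate)` is 4.0. The cheapest matching sends the long feature and all four short ones to the diagonal rather than pairing the long feature with a short one.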

Finding no evidence for H₁ loops (cycles) is the study's cleanest negative result. The absence of significant cyclical structure in embedding space is a substantive finding. It suggests that employment churning produces fragmentation (H₀, cluster separation) but not the persistent loop topology one would expect from cyclic regimes in which return to an earlier state is structurally guaranteed. Standard sequence analysis methods cannot draw this distinction between fragmented and genuinely cyclic topology.

Conclusion

UK employment trajectories exhibit topological structure that first-order Markov dynamics do not fully account for, but the conclusion depends on the test statistic. In Understanding Society, total persistence does not reject the Markov-1 null, whereas Wasserstein distance does; in the longer BHPS panel, both reject it. The Markov Memory Ladder formalises this as a graded hypothesis-testing framework whose outputs provide the baseline calibration for higher-order topological methods in Papers 2–4.


Computational Requirements

Hardware: CPU
Runtime: minutes
