Mapper for Interior Trajectory Structure

Dorman, Stephen

Introduction

Employment trajectories are high-dimensional objects. A career observed monthly for 30 years is a point in a 360-dimensional space (30 × 12 = 360); even after dimensionality reduction, comparing careers requires choices about which aspects of similarity matter most. The Mapper algorithm, introduced by Singh, Mémoli, and Carlsson (2007), provides a principled method for constructing a low-dimensional network representation of a high-dimensional point cloud while preserving topological features at multiple scales.

Building on the foundational MML result of Paper 1, this paper applies Mapper to UK employment trajectory data to reveal the interior shape of trajectory space: the clusters, flares, and loops that characterise how careers are structured across the working-age population.

Background

The Mapper Algorithm

Mapper applies a lens function to the data (here, a kernel density estimate over trajectory space), covers the range of the lens with overlapping bins, and assigns each trajectory to every bin that contains its lens value. Within each bin, trajectories are grouped by hierarchical clustering. The nerve of the resulting cover is a simplicial complex whose 1-skeleton, a network, summarises the topology of the data: each node is a cluster, and two nodes are joined when their clusters share trajectories.
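The pipeline described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the paper: the function names, the bandwidth, the clustering threshold, and the toy one-dimensional points are all assumptions made for the example, and single-linkage clustering cut at a fixed threshold stands in for whichever hierarchical clustering variant the analysis uses.

```python
from itertools import combinations
from math import exp


def gaussian_density(dist, bandwidth):
    """Lens: Gaussian kernel density at each point, from a distance matrix."""
    return [sum(exp(-(d / bandwidth) ** 2 / 2) for d in row) / len(row)
            for row in dist]


def interval_cover(values, resolution, overlap):
    """Cover the lens range with `resolution` equal-length intervals in which
    consecutive intervals share a fraction `overlap` of their length."""
    lo, hi = min(values), max(values)
    # Solve length * (1 + (resolution - 1) * (1 - overlap)) = hi - lo.
    length = (hi - lo) / (1 + (resolution - 1) * (1 - overlap)) or 1.0
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(resolution)]


def single_linkage(indices, dist, threshold):
    """Cut a single-linkage dendrogram at `threshold`; equivalently, take the
    connected components of the graph linking points within `threshold`."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if dist[i][j] <= threshold}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(dist, lens, resolution, overlap, threshold):
    """Return (nodes, edges): nodes are clusters of point indices, and an edge
    joins two nodes whenever their clusters share at least one point."""
    eps = 1e-9 * ((max(lens) - min(lens)) or 1.0)  # guard against rounding
    nodes = []
    for a, b in interval_cover(lens, resolution, overlap):
        members = [i for i, v in enumerate(lens) if a - eps <= v <= b + eps]
        nodes.extend(single_linkage(members, dist, threshold))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Toy data (hypothetical): two tight groups and one outlier on a line.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 5.0]
dist = [[abs(p - q) for q in points] for p in points]
lens = gaussian_density(dist, bandwidth=0.5)
nodes, edges = mapper_graph(dist, lens, resolution=3, overlap=0.5, threshold=0.3)
```

Because the bins overlap, a trajectory can belong to clusters in adjacent bins, and it is exactly these shared members that produce the edges of the network.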

Prior Work

Sequence analysis has identified distinct trajectory clusters in employment data, but existing clustering approaches do not produce topologically meaningful representations. Mapper extends this tradition by constructing a network whose global shape carries mathematical meaning independent of cluster-label assignment.

Methods

Trajectories are encoded as categorical sequences from the BHPS/Understanding Society coding scheme. Pairwise distances are computed under three alternative metrics (Levenshtein, optimal matching (OM), and Hamming), and the sensitivity of results to this choice is assessed. A Gaussian kernel density lens with a cover resolution of 15 and an overlap of 50% is used for all primary analyses. Bootstrap resampling (1000 draws) is used to assess cluster stability.
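To make the difference between the three metrics concrete, the sketch below implements them for short categorical sequences. It is illustrative only: the single-letter state codes (E, U, I) and the unit costs are assumptions for the example, not the paper's coding scheme or cost settings. Optimal matching is written as a general edit distance with configurable substitution and insertion/deletion costs, so Levenshtein distance is the special case in which every cost is 1.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def optimal_matching(a, b, sub_cost=lambda x, y: 1.0, indel=1.0):
    """Minimum total cost of the substitutions, insertions and deletions
    needed to turn sequence a into sequence b (dynamic programming)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if x == y else sub_cost(x, y))
            curr.append(min(substitute, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev[-1]


def levenshtein(a, b):
    """Optimal matching with unit substitution and indel costs."""
    return int(optimal_matching(a, b))


# Hypothetical monthly states: E = employed, U = unemployed, I = inactive.
# The second career is the first one shifted by a single month.
career_a = "EUEUEU"
career_b = "UEUEUE"
shifted_hamming = hamming(career_a, career_b)          # every month differs
shifted_levenshtein = levenshtein(career_a, career_b)  # one deletion, one insertion
```

Hamming distance compares careers month by month and so penalises timing shifts heavily, whereas the edit-based distances can realign the sequences, which is one reason results may be sensitive to the choice of metric.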

Data

Data are drawn from the BHPS (1991–2009) and Understanding Society (2009–present), following the data construction protocol established in Paper 1.

Results

Regime Structure and Mapper Topology

Mapper produces a connected graph whose global structure reproduces the seven-regime GMM typology from Paper 1 (regime-shuffle permutation null: observed sub-regime count 358 vs null mean 86, p < 0.01). The graph has one giant component and several peripheral nodes; high-income stable employment occupies the central mass and disadvantaged regimes form branches. Sensitivity analysis across 24 configurations (varying lens function, cover resolution, overlap, and clustering algorithm) identifies five findings as topologically robust; the estimated noise rate and bridge node counts are algorithm-dependent.
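A label-shuffle permutation test of the kind reported above can be sketched as follows. The test statistic here is a hypothetical stand-in (the number of Mapper nodes whose members all carry the same regime label); the paper's sub-regime count is not defined in this section, so the function names, the statistic, and the toy data are assumptions for illustration only.

```python
import random


def pure_node_count(nodes, labels):
    """Hypothetical statistic: number of nodes whose members share one label."""
    return sum(len({labels[i] for i in node}) == 1 for node in nodes)


def label_shuffle_test(nodes, labels, n_perm=999, seed=0):
    """Compare the observed statistic with its distribution under random
    reassignment of labels to trajectories (graph structure held fixed)."""
    rng = random.Random(seed)
    observed = pure_node_count(nodes, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pure_node_count(nodes, shuffled))
    # One-sided p-value with the usual +1 correction, so it is never zero.
    exceed = sum(value >= observed for value in null)
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, sum(null) / n_perm, p_value


# Toy example (hypothetical): three regime-pure nodes and one mixed bridge node.
toy_nodes = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {2, 3}]
toy_labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
observed, null_mean, p_value = label_shuffle_test(toy_nodes, toy_labels)
```

The comparison of an observed count with a null mean, as in the reported 358 vs 86, corresponds to the first two returned values; the p-value is the share of shuffles that match or exceed the observed count.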

Within-Regime Heterogeneity

The primary novel result is within-regime heterogeneity that the scalar topology and GMM typology both compress away. Regimes R3 (employment-income churning) and R5 (inactivity-income churning) are superficially similar, sharing identical E↔I transition rates, but differ in Mapper sub-graph structure: R3 nodes are dark when coloured by employment income bands (churning within employment), while R5 nodes are dark when coloured by inactivity income bands (churning within inactivity). This decomposition is invisible to standard sequence analysis and to the global PH of Paper 1. A within-node-shuffle permutation test (p < 0.01) confirms that the sub-regime structure is not a distributional artefact.

Outcome Geography

Colouring Mapper nodes by substantive outcomes (income endpoint, occupational class, benefit receipt, housing tenure) creates an outcome geography of trajectory space: nodes — sub-populations of individuals with similar careers — are coloured by their eventual outcomes, revealing spatial gradients from the disadvantaged branches to the stable-employment core. The early-exit branch (heavily female, 71% of its sub-population) grades from domestic inactivity in its distal nodes to mid-career re-entry near the junction with the main component, providing a more granular characterisation than the binary early-exit category.
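Node colouring amounts to summarising an outcome over the members of each node. The sketch below assumes that nodes are stored as collections of trajectory indices and that each outcome is a per-trajectory numeric value; the function name and the toy values are hypothetical. With a 0/1 indicator as the outcome, the same function returns a within-node share, which is the form of statistic behind a figure such as the 71% female share quoted above.

```python
def colour_nodes(nodes, outcome):
    """Mean outcome over the members of each node, used as the node colour."""
    return [sum(outcome[i] for i in node) / len(node) for node in nodes]


# Toy example (hypothetical): five trajectories, three overlapping nodes
# ordered from a distal branch node to a node near the main component.
toy_nodes = [[0, 1], [1, 2], [2, 3, 4]]
income_endpoint = [10.0, 14.0, 20.0, 26.0, 32.0]  # arbitrary units
is_female = [1, 1, 1, 0, 0]                       # 0/1 indicator

income_colours = colour_nodes(toy_nodes, income_endpoint)
female_shares = colour_nodes(toy_nodes, is_female)
```

Reading the colour values along a branch, from its distal nodes towards the junction with the main component, gives the kind of gradient described in the text.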

Robustness

A UMAP-16D embedding robustness check confirms that the sub-regime findings are embedding-invariant: the R3/R5 churning decomposition and the outcome geography gradients identified in the primary PCA-20D analysis replicate in the UMAP-16D space. Findings that depend on bridge nodes and noise rate estimates are flagged as embedding-sensitive.

Discussion

Mapper provides interpretable, topology-respecting interior structure that the global PH of Paper 1 and the GMM typology of that paper’s regime analysis both miss. The R3/R5 churning decomposition is the paper’s most substantively significant finding: two regimes that look identical in terms of transition rates have qualitatively different internal structures, implying that interventions designed for R3 (address employment-income volatility) would look different from interventions designed for R5 (address inactivity-income cycling). The outcome geography visualisation offers a navigable representation of trajectory space whose practical use is discussed in relation to targeted welfare policy design.

Conclusion

Mapper reveals structured interior topology in UK employment trajectory data that complements and extends the global PH results of Paper 1. The sub-regime heterogeneity results motivate further analysis via multi-parameter PH (Paper 4), which provides a geometry-respecting method for simultaneously conditioning on income depth and temporal persistence. The outcome geography representation introduced here forms the basis for the poverty-measurement application in Paper 5, where trajectory topology is connected directly to deprivation dimensions.

Key Findings

Mapper reproduces the seven-regime GMM typology from Paper 1; sensitivity analysis across 24 configurations (varying lens function, cover resolution, overlap, and clustering algorithm) confirms that five of these findings are topologically stable, persisting across >95% of bootstrap resamplings when using a Gaussian kernel density lens with cover resolution 15. The remaining two regime-level details are flagged as algorithm-dependent.

Methods

Computational Requirements

Hardware: CPU
Runtime: Hours

Position in Research Programme

[Programme diagram legend: this paper; dependency; enabled by this paper]

Downloads & Citation

Preprint PDF (coming soon); Supplementary Materials (coming soon); Code Repository (coming soon); Data Access (coming soon)
