Introduction
Prediction in life-course sociology is contentious: the discipline’s primary goal is explanation rather than forecasting. Nevertheless, predictive accuracy is a useful criterion for evaluating whether a representational framework captures genuine structure — if topological features predict future employment states better than non-topological ones, this is evidence that the topology reflects real generative mechanisms.
Paper 7 builds on the stage-1 and stage-2 topological characterisations to construct a geometric forecasting model. The question is: given the topological profile of an individual’s employment history up to time t, how accurately can we predict their trajectory over the months after t?
Background
Sequence Prediction in Sociology
Conventional employment forecasting uses Markov transition matrices or discrete-event survival models. Both condition on the history only through its most recent state (or a crude summary of it). Geometric trajectory forecasting instead uses the full shape of the history as the predictor.
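To make the contrast concrete, the following is a minimal sketch of a first-order Markov predictor. It is not the paper's baseline implementation: the function names and the toy sequences are invented for illustration. It shows that two histories with different shapes receive the same prediction whenever they end in the same state.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count state-to-state transitions and normalise each row to probabilities."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

def predict_next(transitions, history):
    """Markov-1 prediction: only the last state of the history is consulted."""
    row = transitions[history[-1]]
    return max(row, key=row.get)

# Toy sequences, invented for illustration only.
toy = [
    ["employed", "employed", "unemployed", "employed"],
    ["unemployed", "unemployed", "employed", "employed"],
]
P = fit_transitions(toy)

# A brief spell and a long spell of unemployment are indistinguishable here.
short_spell = ["employed", "unemployed"]
long_spell = ["unemployed", "unemployed", "unemployed"]
same_prediction = predict_next(P, short_spell) == predict_next(P, long_spell)
```

A history-aware model would need features that separate `short_spell` from `long_spell`, which is the role the topological features are intended to play.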
Feature Engineering from Topology
The topological programme generates three classes of features for each individual:
- MML features (Paper 1): H₀/H₁ persistence summary statistics
- Mapper cluster membership (Paper 2): which topological cluster the individual’s trajectory falls in
- Zigzag complexity index (Paper 3): the rolling topological complexity profile of their sequence
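As a rough illustration of how the three feature classes might be combined into one vector per individual, the sketch below summarises H₀ and H₁ persistence diagrams and appends the cluster label and zigzag profile statistics. The particular statistics (count, total persistence, maximum persistence, persistence entropy) are common choices assumed here, not ones confirmed by Papers 1–3, and all function and feature names are hypothetical.

```python
import math

def persistence_summary(diagram):
    """Summarise a persistence diagram given as (birth, death) pairs with finite deaths."""
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return {"count": 0, "total": 0.0, "max": 0.0, "entropy": 0.0}
    # Persistence entropy: Shannon entropy of the normalised lifetimes.
    entropy = -sum((l / total) * math.log(l / total) for l in lifetimes)
    return {"count": len(lifetimes), "total": total, "max": max(lifetimes), "entropy": entropy}

def feature_vector(h0, h1, mapper_cluster, zigzag_profile):
    """Assemble one flat feature dictionary (hypothetical layout) for an individual."""
    features = {}
    for name, diagram in (("h0", h0), ("h1", h1)):
        for stat, value in persistence_summary(diagram).items():
            features[f"{name}_{stat}"] = value
    features["mapper_cluster"] = mapper_cluster  # categorical label, not a magnitude
    features["zigzag_mean"] = sum(zigzag_profile) / len(zigzag_profile)
    features["zigzag_max"] = max(zigzag_profile)
    return features

# Toy inputs, invented for illustration only.
example = feature_vector(
    h0=[(0.0, 2.0)],
    h1=[(0.0, 1.0), (0.5, 1.5)],
    mapper_cluster=3,
    zigzag_profile=[0.0, 1.0, 2.0],
)
```

Because the cluster label is categorical, it would normally be one-hot encoded or passed to the classifier as a categorical column rather than treated as a number.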
Methods
For each individual in the Understanding Society panel, features are extracted from their trajectory at each wave. A 12-month forward prediction target is constructed. The training set uses waves 1–10; the holdout uses waves 11–14. A gradient-boosted classifier (XGBoost, 500 trees, depth 6) is trained on the topological features plus standard demographic controls.
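The target construction and temporal split described above can be sketched as follows. Several details are assumptions rather than facts from the paper: consecutive waves are treated as roughly 12 months apart, rows are assigned to training or holdout by the wave at which features are measured, and the row layout and function names are hypothetical. The classifier settings (500 trees, depth 6) come from the text; `xgboost` is imported lazily so the split logic runs without it.

```python
def build_rows(states_by_wave):
    """Pair each observed wave with the state one wave later (assumed ~12 months ahead).

    states_by_wave maps wave number -> employment state for one individual.
    """
    rows = []
    for wave, state in sorted(states_by_wave.items()):
        target = states_by_wave.get(wave + 1)
        if target is not None:  # skip waves with no follow-up observation
            rows.append({"wave": wave, "state": state, "target": target})
    return rows

def temporal_split(rows, first_holdout_wave=11):
    """Split by wave so that no holdout-period features enter training."""
    train = [r for r in rows if r["wave"] < first_holdout_wave]
    holdout = [r for r in rows if r["wave"] >= first_holdout_wave]
    return train, holdout

def make_classifier():
    """Gradient-boosted classifier with the tree count and depth stated in the text."""
    from xgboost import XGBClassifier  # requires the xgboost package
    # Class labels must be integer-encoded (0..5) before calling fit().
    return XGBClassifier(n_estimators=500, max_depth=6)

# Toy individual, invented for illustration only.
person = {9: "employed", 10: "employed", 11: "unemployed", 12: "unemployed"}
train, holdout = temporal_split(build_rows(person))
```

Splitting on the wave rather than at random keeps every holdout observation later in time than every training observation, which is what prevents leakage from future waves.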
Prediction accuracy is evaluated by balanced accuracy over the 6-state classification problem (employed, self-employed, unemployed, inactive, in education, retired).
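Balanced accuracy is the unweighted mean of per-class recall, so rare states count as much as common ones. The sketch below is a plain-Python illustration of that definition on invented labels; it is not the paper's evaluation code.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that appear in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Toy labels, invented for illustration: a model that always predicts "employed".
y_true = ["employed"] * 8 + ["unemployed"] * 2
y_pred = ["employed"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.8 while balanced accuracy is 0.5, which is why the metric suits a six-state problem in which employment dominates the sample.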
Data
Understanding Society waves 1–14. Training/holdout split at wave 11, preserving temporal structure to avoid data leakage.
Results
Prediction Accuracy
The geometric model achieves 79% balanced accuracy on the holdout set, versus 68% for the Markov-1 baseline and 71% for a conventional LSTM. Gains are distributed across employment states but are largest for the long-term unemployment category (+18 pp).
Feature Importance
SHAP decomposition attributes 41% of combined feature importance to the topological features. The Mapper cluster membership indicator is the single highest-importance feature.
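One way a group-level share of this kind can be computed is to take the mean absolute SHAP value per feature and sum it within feature groups. The sketch below assumes a samples-by-features matrix of SHAP values already aggregated across classes; the paper's exact aggregation is not stated, and the feature names, grouping, and numbers are invented for illustration.

```python
def group_shares(shap_values, feature_names, groups):
    """Share of total mean-absolute SHAP importance attributable to each feature group."""
    n = len(shap_values)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    total = sum(mean_abs.values())
    return {g: sum(mean_abs[f] for f in feats) / total for g, feats in groups.items()}

# Toy SHAP matrix (2 samples x 3 features), invented for illustration only.
feature_names = ["mapper_cluster", "h1_entropy", "age"]
shap_values = [
    [0.4, -0.1, 0.5],
    [-0.2, 0.1, 0.3],
]
groups = {
    "topological": ["mapper_cluster", "h1_entropy"],
    "demographic": ["age"],
}
shares = group_shares(shap_values, feature_names, groups)
```

Taking absolute values before averaging matters: positive and negative contributions would otherwise cancel and understate a feature's influence.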
Discussion
Geometric trajectory forecasting validates the applied utility of the topological programme. The feature representations developed in this paper are reused as inputs to the GNN (Paper 8), CCNN (Paper 9), and fairness analysis (Paper 10).
Conclusion
Topological trajectory features provide substantial forecasting gains over the Markov-1 and LSTM baselines (79% balanced accuracy versus 68% and 71%, respectively). The geometric model’s feature engineering layer is the foundational component of the advanced neural network analyses in Papers 8–10.
Computational Requirements
- Hardware: GPU
- Runtime: Hours
- Cloud: Cloud compute required