TDA Method

The Markov Memory Ladder

Drafting

What Is the Markov Memory Ladder?

The Markov Memory Ladder (MML) is a framework for measuring how much memory a sequence carries — not in the colloquial sense, but in a precise topological sense. A sequence has no memory if each step depends only on the current position (Markov-1). It has memory if what came before continues to influence the trajectory as a whole. The MML measures the degree of this memory as a graded topological property: how much more structured is the observed trajectory topology compared to what a Markov-1 process would produce?

The metaphor of a ladder captures the graded nature of the measurement: each rung corresponds to a higher order of memory (Markov-2, Markov-3, …), and the MML tells you where on the ladder the observed data sits.

Mathematical Formulation

Let x=(x1,x2,,xT)\mathbf{x} = (x_1, x_2, \ldots, x_T) be a categorical sequence over state set SS. A Markov-kk null model preserves the kk-th order transition matrix and sequence length but randomises the sequence beyond those constraints.

The MML memory score for dimension pp is:

MMLp(x)=(b,d)Dgmp(x)(db)EMarkov-1[(b,d)(db)]SDMarkov-1[(b,d)(db)]\text{MML}_p(\mathbf{x}) = \frac{\sum_{(b,d) \in \text{Dgm}_p(\mathbf{x})} (d - b) - \mathbb{E}_{\text{Markov-1}}[\sum_{(b,d)} (d-b)]}{\text{SD}_{\text{Markov-1}}[\sum_{(b,d)} (d-b)]}

This is a zz-score: the observed sum of Hp_p lifespans relative to the mean and standard deviation under the Markov-1 permutation null. Values significantly below zero indicate that the observed topology is less complex than random (suppressed memory); significantly above zero indicates excess topological memory.

The test statistic for Paper 1’s primary finding is MML13.7\text{MML}_1 \approx -3.7: observed H₁ complexity is 3.7 standard deviations below the Markov-1 null — not random noise, but structured, sub-random topology that is consistent with persistent patterns of employment concentration.

Python Code Stub

import numpy as np
from typing import Callable

def mml_score(
    trajectory: list[int],
    compute_persistence_sum: Callable[[list[int]], float],
    n_permutations: int = 1000,
    rng: np.random.Generator | None = None,
) -> float:
    """
    Compute the Markov Memory Ladder z-score for a single trajectory.

    Parameters
    ----------
    trajectory            : sequence of integer state codes
    compute_persistence_sum : function mapping a sequence to sum of H1 barcode lengths
    n_permutations        : number of draws from the Markov-1 null
    rng                   : NumPy random generator (for reproducibility)

    Returns
    -------
    z-score relative to Markov-1 null
    """
    rng = rng or np.random.default_rng()
    observed = compute_persistence_sum(trajectory)

    # Build empirical transition matrix from the observed trajectory
    states = sorted(set(trajectory))
    state_index = {s: i for i, s in enumerate(states)}
    n_states = len(states)
    counts = np.zeros((n_states, n_states), dtype=float)
    for from_s, to_s in zip(trajectory[:-1], trajectory[1:]):
        counts[state_index[from_s], state_index[to_s]] += 1

    # Row-normalise to get transition probabilities; rows with no outgoing
    # transitions fall back to the empirical marginal distribution
    row_sums = counts.sum(axis=1, keepdims=True)
    marginal = counts.sum(axis=0)
    marginal /= marginal.sum()
    trans_probs = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), marginal)

    null_scores = []
    seq_len = len(trajectory)
    for _ in range(n_permutations):
        # Sample a Markov-1 sequence of the same length starting from trajectory[0]
        null_seq = [trajectory[0]]
        current = state_index[trajectory[0]]
        for _ in range(seq_len - 1):
            current = rng.choice(n_states, p=trans_probs[current])
            null_seq.append(states[current])
        null_scores.append(compute_persistence_sum(null_seq))

    mean_null = np.mean(null_scores)
    sd_null = np.std(null_scores, ddof=1)
    return (observed - mean_null) / sd_null if sd_null > 0 else 0.0

Application to the Research Programme

The MML is the foundational framework of the entire programme, introduced in Paper 1. It provides:

  1. The primary test of topological memory in UK employment trajectories (z ≈ −3.7)
  2. The permutation null model framework used in all subsequent hypothesis tests
  3. The individual-level MML score that serves as a feature in the forecasting models (Papers 7–9)

An extended application in Paper 6 uses the MML framework to compare parent–offspring trajectory topology, demonstrating that the intergenerational transmission coefficient can be estimated using the same permutation infrastructure.