Mathematical Interlude
Neural Networks and the Black Box
Intuitive
A neural network is a software system loosely inspired by the structure of the brain. It is made of layers of simple computational units — neurons — each of which takes a set of numbers as input, multiplies them by learned weights, sums the result, and passes it through an activation function to produce an output. Layer after layer, these operations compose into a function that can learn extraordinarily complex patterns from data: recognising faces, translating languages, predicting which benefit claimants are likely to commit fraud.

The power of neural networks is inseparable from the opacity of what they learn. Unlike a regression model, where every coefficient has an interpretable meaning, a large neural network’s learned weights are inscrutable — there are typically millions of them, and no individual weight corresponds to a human-readable rule. This is the “black box” problem. When such a model is used to make decisions about welfare eligibility, child protection risk, or recidivism probability, the person affected cannot be told why the algorithm reached its conclusion — because neither the algorithm’s designers nor its operators can read it.
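The single-neuron computation described above — weighted sum, bias, activation — can be sketched in a few lines. This is a minimal illustration, not a real implementation; the inputs, weights, and bias are arbitrary made-up numbers.

```python
def neuron(inputs, weights, bias):
    """One unit: weighted sum of inputs plus a bias, passed through ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)  # ReLU: pass positive values through, zero out negatives

# A neuron with two inputs: 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # → 0.1
```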
Intermediate
A feedforward neural network with $L$ layers computes a function $f(x)$ through a composition of affine transformations and nonlinearities:

$$h^{(l)} = \sigma\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \qquad h^{(0)} = x, \qquad l = 1, \dots, L$$

where $x \in \mathbb{R}^d$ is the input, $W^{(l)}$ and $b^{(l)}$ are learned weight matrices and bias vectors, and $\sigma$ is a nonlinear activation function (commonly ReLU: $\sigma(z) = \max(0, z)$).
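The layer-by-layer composition above can be sketched directly in code. The network below is a hypothetical example (its weights are chosen arbitrarily for illustration): two inputs, one hidden layer of two ReLU units, one linear output.

```python
def relu(z):
    return [max(0.0, v) for v in z]

def affine(W, b, h):
    """Compute W h + b, with W a list of rows and b a bias vector."""
    return [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]

def forward(x, layers):
    """layers is a list of (W, b) pairs; ReLU after every layer except the last."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = affine(W, b, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h

# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # W1, b1
    ([[1.0, 1.0]], [0.1]),                    # W2, b2
]
print(forward([2.0, 1.0], layers))
```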
The Universal Approximation Theorem guarantees that a sufficiently wide single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy — which explains the empirical power of neural networks but also their inscrutability: they can learn any mapping, including spurious ones.
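A toy illustration of the flavour of the theorem (not a proof): a hidden layer of just two ReLU units, with hand-picked weights, represents the function $|x|$ exactly, since $|x| = \mathrm{ReLU}(x) + \mathrm{ReLU}(-x)$. Wider layers stack more such piecewise-linear pieces.

```python
def relu(z):
    return max(0.0, z)

def abs_net(x):
    """A width-2 hidden layer computing |x| exactly:
    hidden weights [1, -1], output weights [1, 1], all biases zero."""
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    return 1.0 * h1 + 1.0 * h2

for x in (-3.0, -0.5, 0.0, 2.0):
    assert abs_net(x) == abs(x)
```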
In welfare and policing applications, neural networks are typically trained on historical administrative data — past sanctions, past arrests, past child welfare interventions — which encode the decisions of a system already shaped by structural racism and poverty. The network learns to replicate those decisions at scale, and to do so in a form that is much harder to challenge legally or democratically than the original human decisions were.
Formal
Training a neural network means finding weights $\theta$ that minimise a loss function $\mathcal{L}(\theta)$ over a training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$. For classification with $K$ classes, the cross-entropy loss is:

$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log \hat{y}_{ik}$$

Minimisation proceeds via stochastic gradient descent, with backpropagation computing $\nabla_\theta \mathcal{L}$ through the chain rule.
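A minimal sketch of one gradient-descent step, assuming a toy linear classifier and a single made-up training example. It uses the standard fact that the gradient of cross-entropy with respect to the logits is the predicted probability vector minus the one-hot label.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, label):
    """Negative log probability assigned to the true class."""
    return -math.log(probs[label])

# One SGD step for a linear classifier on a single toy example.
x = [1.0, 2.0]                     # input features (arbitrary)
label = 0                          # true class index
W = [[0.1, -0.2], [0.0, 0.3]]      # one weight row per class (arbitrary)
lr = 0.5                           # learning rate

logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
loss_before = cross_entropy(softmax(logits), label)

# dL/dlogit_k = p_k - y_k; chain rule gives dL/dW via the input x.
p = softmax(logits)
for k, row in enumerate(W):
    err = p[k] - (1.0 if k == label else 0.0)
    for j in range(len(row)):
        row[j] -= lr * err * x[j]  # gradient descent update

logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
loss_after = cross_entropy(softmax(logits), label)
assert loss_after < loss_before    # the step reduced the loss on this example
```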
The explainability crisis examined in Chapter 13 arises because neither the loss function nor the gradient-based training procedure has any obligation to produce human-interpretable representations. Post-hoc methods like SHAP (SHapley Additive exPlanations) attempt to attribute predictions to features after training, but these attributions are themselves approximations that can be gamed, and they tell us nothing about the causal structure of the underlying decision process.
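To make the Shapley idea behind SHAP concrete, here is a brute-force exact computation on a hypothetical toy model (this is not the SHAP library; the model, baseline, and feature values are all invented for illustration). Each feature's attribution is its average marginal contribution across all orderings in which features are "switched on".

```python
from itertools import permutations

def f(x):
    """Toy model with an interaction between features 0 and 2 (illustrative only)."""
    return x[0] + 2.0 * x[1] + x[0] * x[2]

def value(subset, x, baseline):
    """Model output with features outside `subset` replaced by a baseline value."""
    z = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return f(z)

def shapley(x, baseline):
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        present = set()
        for i in order:
            before = value(present, x, baseline)
            present.add(i)
            phi[i] += value(present, x, baseline) - before
    return [p / len(perms) for p in phi]

phi = shapley([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
# Efficiency property: attributions sum to f(x) - f(baseline).
print(phi)  # → [1.5, 2.0, 0.5]
```

Note how the interaction term's credit is split equally between features 0 and 2 — the attribution is well defined, but it says nothing about the causal mechanism inside the model.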
Graph Neural Networks (examined as a TDA method elsewhere) address one dimension of this problem by operating natively on relational data — social networks, administrative dependency graphs, household linkage records — rather than treating each individual as an independent input vector. This structural inductive bias can produce more interpretable and more genuinely structural representations of poverty than flat-vector neural networks that ignore relational context.
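The neighbourhood-aggregation step at the heart of a GNN can be sketched as follows. This is a simplified scalar-feature version with made-up weights and a hypothetical four-node graph, not any particular GNN architecture.

```python
def message_passing_layer(adj, features, w_self, w_neigh):
    """One round of aggregation: each node's new feature is a ReLU of its own
    feature plus the mean of its neighbours' features, each scaled by a
    learned weight (scalars here for simplicity)."""
    new = []
    for i, fi in enumerate(features):
        neigh = [features[j] for j in adj[i]]
        agg = sum(neigh) / len(neigh) if neigh else 0.0
        new.append(max(0.0, w_self * fi + w_neigh * agg))
    return new

# Hypothetical linkage graph: nodes 0, 1, 2 mutually linked; node 3 isolated.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
features = [1.0, 2.0, 3.0, 4.0]
print(message_passing_layer(adj, features, w_self=0.5, w_neigh=0.5))
```

The isolated node is updated from its own feature alone, while linked nodes blend in their neighbours' values — the relational structure is part of the computation, not an afterthought.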