Generalization isn’t one thing
Humans don’t just interpolate between things we’ve seen — we extrapolate to genuinely novel combinations. Hear a sentence with an unfamiliar word in a familiar slot, and you still parse the role it’s playing. The paper asks whether Transformers, famous for their abstraction abilities, can do the same thing.
The answer is layered. Transformers comfortably outperform LSTMs at standard generalization and at recombining familiar fillers into unfamiliar role-orderings. But on the hardest test — a filler the inner model has never once seen during training — every Transformer-based architecture collapses to roughly the same middling accuracy as the plain LSTM. Only a much older, slower, reinforcement-learning-based indirection model holds up.
What “indirection” actually means here
The paper borrows the term from computer science: a pointer doesn’t hold a value directly, it holds the address of a value stored somewhere else. Kriete et al. (2013) proposed that working memory does something similar — an abstract role (agent, verb, patient) can point to whichever concrete word currently fills that role, rather than the role and the word being fused into one inseparable representation.
[Figure: three sentence roles, three arbitrary fillers.]
Swap in a brand-new filler the model has never seen, and an indirection-style system only needs to update which address the pointer resolves to — not re-learn the whole bound representation from scratch.
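The pointer idea can be sketched in a few lines of Python. This is only a toy illustration of indirection in the computer-science sense, not the paper's model: the names `memory`, `roles`, and `resolve`, and the example words, are all hypothetical.

```python
# Toy illustration of indirection (hypothetical names, not the paper's code).
# Roles never store a word directly; they store the address of a word.

memory = {0: "dog", 1: "chased", 2: "cat"}        # address -> filler
roles = {"agent": 0, "verb": 1, "patient": 2}     # role -> address


def resolve(role: str) -> str:
    """Follow the role's pointer to whichever filler is stored there."""
    return memory[roles[role]]


def sentence() -> list[str]:
    """Read the sentence out by resolving each role in order."""
    return [resolve(r) for r in ("agent", "verb", "patient")]


before = sentence()

# A brand-new filler arrives: store it at a fresh address and
# redirect a single pointer. Nothing else has to change.
memory[3] = "zebra"
roles["patient"] = 3
after = sentence()

print(before)
print(after)
```

The role-to-address table and the address-to-filler table are updated independently, which is the property the prose above describes: a novel filler changes one entry rather than the whole bound representation.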
Four tasks, increasing difficulty
| Task | Name | What changes between train & test |
|---|---|---|
| SG | Standard Generalization | same fillers, new role-orderings |
| SA | Spurious Anticorrelation | familiar fillers that never co-occurred in training |
| FC | Full Combinatorial | a filler tested in a role it never occupied in training |
| NF | Novel Filler | a word the inner model never encountered in any role |
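One way to make the three harder conditions concrete is as predicates over a training set. The sketch below is a reading of the one-line descriptions in the table, not the paper's dataset construction; the function names and the tiny training set are hypothetical.

```python
from itertools import combinations

# A sentence is a tuple of fillers in (agent, verb, patient) order.
Sentence = tuple[str, str, str]


def seen_fillers(train):
    """Every word that appears anywhere in training."""
    return {w for s in train for w in s}


def seen_role_fillers(train):
    """Every (role index, word) combination that appears in training."""
    return {(i, w) for s in train for i, w in enumerate(s)}


def seen_pairs(train):
    """Every unordered pair of words that shared a training sentence."""
    return {frozenset(p) for s in train for p in combinations(s, 2)}


def is_sa_probe(s, train):
    """Familiar fillers, but some pair never co-occurred in training."""
    pairs = seen_pairs(train)
    return set(s) <= seen_fillers(train) and any(
        frozenset(p) not in pairs for p in combinations(s, 2)
    )


def is_fc_probe(s, train):
    """Familiar fillers, but some filler sits in a role it never held."""
    held = seen_role_fillers(train)
    return set(s) <= seen_fillers(train) and any(
        (i, w) not in held for i, w in enumerate(s)
    )


def is_nf_probe(s, train):
    """At least one filler was never seen in training at all."""
    return any(w not in seen_fillers(train) for w in s)


train = [("dog", "chased", "cat"), ("bird", "saw", "fish")]
sa = ("dog", "saw", "cat")        # "dog" and "saw" never shared a sentence
fc = ("cat", "chased", "dog")     # "cat" was never an agent
nf = ("dog", "chased", "zebra")   # "zebra" is entirely new
```

Under these assumed definitions, each probe sentence triggers exactly one predicate, which mirrors the table's ordering from mild to severe departures from the training distribution.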
Cortical stripes vs. vector algebra
The original indirection model and its later, faster descendant both implement the same pointer idea — but in very different substrates.
PFC Layers (Kriete et al.)
A direct computational model of prefrontal cortex anatomy: sparsely-interconnected neuron “stripes,” each implemented as its own recurrent layer, gated by a separate basal-ganglia module that decides when a stripe updates.
Biologically plausible by design — and expensive. Many interacting recurrent parts simulating neural dynamics over time make this slow to train and test.
Holographic Reduced Representations
Jovanovich’s (2017) replacement: role–filler binding done with circular convolution on fixed-size vectors, and approximate inverse operations to unbind. No simulated neurons, no recurrent stripe dynamics — just vector math.
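Binding and unbinding with circular convolution can be sketched in NumPy as follows. This is a minimal, generic HRR example, not the code from the cited work: the vector size, the random seed, the example words, and the helper names (`bind`, `unbind`, `decode`) are all assumptions made for illustration.

```python
import numpy as np


def bind(a, b):
    """Circular convolution of a and b, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def unbind(trace, a):
    """Approximate inverse of bind: circular correlation of a with trace."""
    return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(trace), n=len(a))


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
n = 1024  # assumed dimensionality; larger n gives cleaner unbinding


def rand_vec():
    # Elements drawn i.i.d. with variance 1/n, the usual HRR convention.
    return rng.normal(0.0, 1.0 / np.sqrt(n), n)


roles = {name: rand_vec() for name in ("agent", "verb", "patient")}
fillers = {word: rand_vec() for word in ("dog", "chased", "cat")}

# One fixed-size vector holds all three role-filler bindings at once.
sentence = (
    bind(roles["agent"], fillers["dog"])
    + bind(roles["verb"], fillers["chased"])
    + bind(roles["patient"], fillers["cat"])
)


def decode(role_name):
    """Unbind a role, then clean up by picking the most similar known filler."""
    noisy = unbind(sentence, roles[role_name])
    return max(fillers, key=lambda w: cosine(noisy, fillers[w]))


print({r: decode(r) for r in roles})
```

Unbinding returns a noisy version of the filler rather than the filler itself, so a clean-up step against a set of known vectors is needed; here that step is a simple nearest-neighbour search by cosine similarity.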
Mullinax (2020) kept HRRs for role encoding but wrapped them in an LSTM-based word embedder, forming the OL/IND model used as this paper’s gold-standard baseline.
Five architectures, one shared encoder/decoder skeleton
Every model reads and reproduces three-word sentences through a nested outer/inner structure: an outer component spells words letter-by-letter, an inner component binds the resulting word-vectors into a sentence and back. The authors swap LSTM and Transformer components into both slots, then compare against the older indirection model.
Only the four LSTM/Transformer combinations were newly trained, all supervised (Adam/Nadam, MSE + BCE loss). OL/IND was not rebuilt for this paper — its reinforcement-learning training loop (Q-learning-gated stripes) doesn’t slot cleanly into the same supervised pipeline, which is likely why an OT/IND combination was never attempted either.
How a sentence actually moves through the pipeline
This is the architecture every model shares — what differs is purely whether the outer spelling component and the inner role-binding component are built from LSTM layers, Transformer blocks, or (in the baseline) an indirection mechanism.
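The nested data flow can be sketched as a skeleton with four pluggable components. Everything here is hypothetical scaffolding: the class name, the `MAX_LEN` constant, and the toy stand-in components (character codes and concatenation) are lossless placeholders chosen only to make the flow runnable, not the learned LSTM, Transformer, or indirection components the paper compares.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = list[float]


@dataclass
class NestedAutoencoder:
    """Outer components handle letters <-> word vectors;
    inner components handle word vectors <-> one sentence vector."""

    outer_encode: Callable[[str], Vector]
    inner_encode: Callable[[Sequence[Vector]], Vector]
    inner_decode: Callable[[Vector], list[Vector]]
    outer_decode: Callable[[Vector], str]

    def reconstruct(self, words: Sequence[str]) -> list[str]:
        word_vecs = [self.outer_encode(w) for w in words]    # spell in
        sentence_vec = self.inner_encode(word_vecs)          # bind
        decoded_vecs = self.inner_decode(sentence_vec)       # unbind
        return [self.outer_decode(v) for v in decoded_vecs]  # spell out


MAX_LEN = 8  # assumed maximum word length for the toy encoders


def toy_outer_encode(word: str) -> Vector:
    """Stand-in: zero-padded character codes instead of a learned embedding."""
    codes = [float(ord(c)) for c in word[:MAX_LEN]]
    return codes + [0.0] * (MAX_LEN - len(codes))


def toy_outer_decode(vec: Vector) -> str:
    return "".join(chr(int(x)) for x in vec if x > 0)


def toy_inner_encode(vecs: Sequence[Vector]) -> Vector:
    """Stand-in: concatenation instead of a learned role binding."""
    return [x for v in vecs for x in v]


def toy_inner_decode(vec: Vector) -> list[Vector]:
    return [vec[i:i + MAX_LEN] for i in range(0, len(vec), MAX_LEN)]


model = NestedAutoencoder(
    toy_outer_encode, toy_inner_encode, toy_inner_decode, toy_outer_decode
)
result = model.reconstruct(["dog", "chased", "cat"])
print(result)
```

Swapping an architecture into the outer or inner slot corresponds to replacing one encode/decode pair while leaving `reconstruct` untouched.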
Where the cracks show up
| Task | OL/IND | OL/IL | OL/IT | OT/IL | OT/IT |
|---|---|---|---|---|---|
| SG / SA | ~100% | ~90–100% | ~100% | 75–93% | ~100% |
| FC | ~100% | <40% | ~85–92% | <20% | ~100% |
| NF | ~100% | ~55–65% | ~55–95% | ~55–72% | ~58–72% |
Figures approximated from the plotted bars in Figure 2; word-level and letter-level values are merged into ranges per cell for brevity.
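The difference between the two granularities merged in each cell can be sketched as below. The exact metric definitions used in the paper are not given here, so these two functions are an assumption: letter-level accuracy counts correctly reproduced letter positions, and word-level accuracy counts words reproduced exactly.

```python
def letter_accuracy(targets, predictions):
    """Fraction of target letter positions reproduced correctly."""
    correct = 0
    total = 0
    for target, pred in zip(targets, predictions):
        total += len(target)
        correct += sum(
            1 for i, ch in enumerate(target) if i < len(pred) and pred[i] == ch
        )
    return correct / total


def word_accuracy(targets, predictions):
    """Fraction of words reproduced exactly, letter for letter."""
    matches = sum(t == p for t, p in zip(targets, predictions))
    return matches / len(targets)


targets = ["dog", "chased", "cat"]
predictions = ["dog", "chaser", "cot"]  # hypothetical model output

print(letter_accuracy(targets, predictions))  # 10 of 12 letters correct
print(word_accuracy(targets, predictions))    # 1 of 3 words correct
```

A single wrong letter costs a whole word at the word level, so word-level scores sit at or below letter-level scores for the same output.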
Three findings stand out. First, the outer-Transformer/inner-LSTM combination (OT/IL) actually underperforms the all-LSTM model on the easy tasks — the richer Transformer-produced embeddings seem to confuse a plain LSTM trying to bind them. Second, any model with an inner Transformer (OL/IT, OT/IT) handles FC well, contradicting earlier assumptions that one-hot/HRR-style representational interference was unavoidable on that task. Third, and most importantly, every non-indirection model — LSTM or Transformer, in any combination — lands in roughly the same 55–72% band on Novel Filler, while OL/IND stays near 100%.
Attention learns abstraction. It doesn’t learn pointers.
Self-attention is enough to recombine known fillers into unfamiliar arrangements. That’s a genuine win over LSTMs, and it’s why OL/IT and OT/IT dominate the FC task. But self-attention has no mechanism analogous to a memory address: when a filler has never appeared in training at all, there’s no pointer to redirect, so every attention-based model falls back to roughly the same degraded performance as a recurrent network.
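For reference, a single head of scaled dot-product self-attention can be sketched in NumPy. This is the standard textbook formulation, not the paper's Transformer configuration; the dimensions are arbitrary and random matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 16))  # three word vectors (assumed size 16)
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

Each output row is a weighted mix of projections of the input vectors themselves. Nothing in this computation stores or dereferences an address, which is consistent with the point made above about novel fillers.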
The indirection model’s advantage isn’t free, though: it trains via reinforcement learning rather than supervised backpropagation, which the authors note makes it considerably slower than any of the supervised models compared here.