Catastrophic Forgetting
Training a neural network on new data causes rapid, severe loss of performance on previously learned data. The network doesn’t gradually forget — it catastrophically overwrites old knowledge with new knowledge. Also called “catastrophic interference.”
Intuition
Think of a neural network as a shared whiteboard. When you learn task A, you write on the whiteboard. When you then learn task B, you erase parts of A to make room for B — not because you intended to, but because the same parameters (whiteboard space) must represent both tasks, and gradient descent doesn’t know which parts of A are still important.
The deeper issue is that neural networks store knowledge in distributed representations — information about task A is spread across all weights, not localised to a specific subset. When you update weights to improve on task B, every weight change is a potential corruption of task A knowledge. The more different A and B are, the more the updates conflict.
This is qualitatively different from human forgetting, which is gradual and graceful. A neural network can go from 95% accuracy on task A to 20% after a few batches of task B. The knowledge isn’t gradually fading — it’s being actively overwritten.
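The whiteboard picture can be made concrete with a toy model. The sketch below (plain Python; the task data is hypothetical, chosen so the two tasks imply different weights) trains one shared pair of parameters on task A, then on task B, with no task-A data in the second phase. Task-A error goes from near zero to large — the shared weights are overwritten.

```python
# A 2-parameter linear model trained sequentially on two conflicting tasks.

def mse(w, data):
    """Mean squared error of y_hat = w0*x0 + w1*x1 over (x0, x1, y) triples."""
    return sum((w[0] * x0 + w[1] * x1 - y) ** 2 for x0, x1, y in data) / len(data)

def train(w, data, lr=0.1, steps=200):
    """Plain gradient descent on the MSE; returns the updated weights."""
    for _ in range(steps):
        g0 = g1 = 0.0
        for x0, x1, y in data:
            err = w[0] * x0 + w[1] * x1 - y
            g0 += 2 * err * x0 / len(data)
            g1 += 2 * err * x1 / len(data)
        w = [w[0] - lr * g0, w[1] - lr * g1]
    return w

task_a = [(1, 0, 1), (0, 1, 1), (1, 1, 2)]   # consistent with w = (1, 1)
task_b = [(1, 0, 3), (0, 1, -2), (1, 1, 1)]  # consistent with w = (3, -2)

w = train([0.0, 0.0], task_a)
loss_a_before = mse(w, task_a)   # near zero: task A is learned

w = train(w, task_b)             # sequential training, no task-A data mixed in
loss_a_after = mse(w, task_a)    # large: task A has been overwritten
```

Nothing here is adversarial — it is ordinary gradient descent on a convex loss. The conflict arises purely because both tasks must share the same two parameters.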
Manifestation
- Performance on old tasks drops sharply when training on new data — not a gradual decline but a cliff
- The severity of the drop scales with how different the new data distribution is from the old one
- Fine-tuning a pretrained model on a small dataset can destroy the general capabilities the model spent millions of examples learning
- In RL: replay buffers exist specifically to mitigate this — without replay, the agent forgets how to handle states it hasn’t visited recently
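The replay-buffer mechanism mentioned above amounts to keeping a fixed-capacity window of past transitions and sampling uniformly from it, so that old states stay in the training stream alongside new ones. A minimal sketch (the class name and transition format are illustrative, not taken from any specific library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions.

    Sampling uniformly from the buffer mixes old and new experience
    into every gradient update, which is what prevents the network
    from forgetting states it hasn't visited recently.
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # when full, the deque silently evicts the oldest transition
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling without replacement over the stored window
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Eviction of the oldest entries means the buffer is still a sliding window — it slows forgetting rather than eliminating it, with the window size as the trade-off knob.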
Where It Appears
- Q-learning (q-learning/): without a replay buffer, the agent only trains on recent transitions, forgetting Q-values for states it visited earlier → replay buffers are the primary mitigation
- NN training (nn-training/): fine-tuning is a controlled form of this problem — learning rate warmup, freezing early layers, and small learning rates are all strategies to limit forgetting
- Policy gradient (policy-gradient/): on-policy methods (A2C, PPO) discard data after use, so the policy only reflects recent experience — but the shared value function can still suffer from forgetting
- Contrastive learning (contrastive-self-supervising/): fine-tuning CLIP or SimCLR representations on a downstream task can damage the general-purpose features — careful unfreezing schedules help
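Several of the mitigations mentioned above reduce to restricting which parameters an update is allowed to touch. Layer freezing can be sketched as a per-parameter mask applied at the update step (the function name and mask representation are illustrative):

```python
def sgd_step(params, grads, lr, frozen):
    """One SGD update that leaves frozen parameters untouched.

    Freezing early layers during fine-tuning preserves the general
    features they encode, at the cost of less flexibility on the
    new task; only the unfrozen parameters move.
    """
    return [p if is_frozen else p - lr * g
            for p, g, is_frozen in zip(params, grads, frozen)]
```

In frameworks this is usually expressed by excluding frozen parameters from the optimiser (or disabling their gradients) rather than masking updates by hand, but the effect is the same.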
Solutions at a Glance
| Solution | Mechanism | Where documented |
|---|---|---|
| Replay buffers | Store and replay old experiences alongside new ones | atomic-concepts/rl-specific/replay-buffers.md |
| EWC (Elastic Weight Consolidation) | Penalise changes to weights that were important for old tasks | (Kirkpatrick et al., 2017) |
| Frozen early layers | Only fine-tune the last few layers, preserving learned features | (standard fine-tuning practice) |
| Learning rate warmup / small LR | Limit the magnitude of updates during fine-tuning to reduce overwriting | atomic-concepts/optimisation-primitives/learning-rate-warmup.md |
| Data mixing | Train on a mix of old and new data to maintain performance on both | (standard practice) |
| Progressive networks | Add new capacity for new tasks instead of reusing old parameters | (Rusu et al., 2016) |
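Of the entries above, EWC has the most explicit mechanism: it adds a quadratic anchor to the new task's loss, L(θ) = L_B(θ) + (λ/2) Σᵢ Fᵢ (θᵢ − θ*₍A,i₎)², where θ* are the weights after task A and Fᵢ is an estimate of each weight's importance (diagonal Fisher information). A minimal sketch of just the penalty term — in practice the Fisher values are estimated from task-A gradients; here they are hypothetical placeholders:

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Elastic Weight Consolidation penalty (Kirkpatrick et al., 2017):
    (lam / 2) * sum_i F_i * (theta_i - theta_anchor_i)^2.

    Weights with high Fisher information (important for the old task)
    are pulled strongly back toward their old values; unimportant
    weights remain free to move for the new task.
    """
    return 0.5 * lam * sum(f * (p - p0) ** 2
                           for p, p0, f in zip(params, anchor_params, fisher))
```

During fine-tuning, the total loss would then be the new-task loss plus this penalty, so gradient descent trades off new-task progress against disturbing old-task-critical weights.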
Historical Context
McCloskey & Cohen (1989) and Ratcliff (1990) first demonstrated catastrophic interference in connectionist networks, showing that sequential training on different patterns destroyed previously learned associations. The finding challenged the prevailing optimism about neural networks as general learning systems. For decades, it was considered a fundamental limitation. French (1999) wrote an influential review arguing that the problem was inherent to distributed representations. In RL, Lin (1992) introduced experience replay specifically to address forgetting, and the DQN paper (Mnih et al., 2015) showed that replay was essential for stable deep RL — without it, the network catastrophically forgets Q-values for earlier states. The problem has renewed urgency in the era of foundation models, where fine-tuning risks destroying expensive pretrained capabilities.