Training Oscillation

Two adversarially coupled components chase each other without converging — the generator adapts to fool the discriminator, the discriminator adapts to catch the generator, and the cycle repeats without either reaching a stable equilibrium. Also called “oscillatory dynamics” or “non-convergence” in game-theoretic settings.

Imagine two chess players who can only learn by playing each other. Player A discovers a winning strategy. Player B adapts to counter it. Player A then adapts to counter the counter. Neither player ever settles into a stable strategy — they keep cycling through a sequence of moves and counter-moves. If they’re evenly matched, this cycle continues forever.

GANs are a two-player minimax game, and gradient descent on such games doesn’t have the same convergence guarantees as gradient descent on a single loss function. In a single-objective setting, the loss landscape has a clear “downhill” direction. In a minimax game, there may be no fixed point that both players converge to — the gradient field can rotate around the equilibrium rather than pointing toward it. Each player’s update makes the other player’s current state worse, and the combined dynamics spiral.
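The rotational gradient field is easiest to see in the simplest possible adversarial game. A minimal sketch (illustrative, not taken from any specific codebase) of simultaneous gradient descent on the bilinear game f(x, y) = x·y, where player x minimises and player y maximises:

```python
# Simultaneous gradient descent on f(x, y) = x * y.
# The unique equilibrium is (0, 0), but the joint update rotates
# around it and slowly spirals outward instead of converging.
import math

x, y = 1.0, 0.0
lr = 0.1
trajectory = [(x, y)]
for _ in range(200):
    gx = y                             # df/dx: gradient for the minimising player
    gy = x                             # df/dy: gradient for the maximising player
    x, y = x - lr * gx, y + lr * gy    # both players update at the same time
    trajectory.append((x, y))

r0 = math.hypot(*trajectory[0])
rN = math.hypot(*trajectory[-1])
print(f"distance from equilibrium: start {r0:.3f}, end {rN:.3f}")
# Each step multiplies the radius by sqrt(1 + lr**2), so the iterates
# orbit the equilibrium and drift away rather than approaching it.
```

Neither player's update points toward (0, 0); each points perpendicular to the line joining the current iterate to the equilibrium, which is exactly the rotation described above.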

The problem is amplified when one player is much stronger than the other. If the discriminator is too powerful, it provides no useful gradient to the generator (gradients vanish or point in erratic directions). If the generator is too strong, the discriminator can’t keep up, and the generator exploits the discriminator’s current weaknesses rather than learning the data distribution.

In practice, oscillation shows up as:
  • Generator and discriminator losses oscillate in anti-phase — when one improves, the other degrades, and vice versa
  • Generated sample quality fluctuates — good epochs alternate with bad epochs rather than steadily improving
  • FID/IS metrics oscillate rather than decreasing monotonically
  • No clear “convergence” — training can run for millions of steps without reaching a stable state
  • Learning rate sensitivity: small changes in learning rate ratios between the two networks dramatically change training dynamics
The same dynamic appears in several settings:
  • GANs (gans/): the canonical setting — the G-D minimax game has oscillatory dynamics by nature; WGAN’s Wasserstein objective and hinge loss both help by providing smoother, more informative gradients
  • Policy gradient (policy-gradient/): not directly adversarial, but the interplay between policy and value function updates can create oscillation, especially when the value function can’t keep up with rapid policy changes
  • Multi-agent RL: when multiple agents learn simultaneously, each agent’s environment is non-stationary (other agents are changing), creating oscillatory dynamics similar to GANs
| Solution | Mechanism | Where documented |
| --- | --- | --- |
| WGAN / Wasserstein distance | Smoother loss landscape with less rotational dynamics | gans/ |
| Hinge loss | Saturates the discriminator at a margin, preventing it from becoming too strong | gans/, atomic-concepts/loss-functions/hinge-loss.md |
| Spectral normalisation | Constrains the discriminator’s Lipschitz constant, balancing the two players | atomic-concepts/regularisation/spectral-normalisation.md |
| Two-timescale updates | Train the discriminator multiple steps per generator step (or use different learning rates) | gans/ |
| Gradient penalty | Regularises the discriminator’s gradient, smoothing the loss landscape | atomic-concepts/regularisation/gradient-penalty.md |
| EMA of generator weights | Average generator weights over time to smooth oscillatory weight trajectories | atomic-concepts/optimisation-primitives/exponential-moving-average.md |
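The EMA row is the simplest of these to sketch. A toy version (using a dict of floats as a stand-in for real parameter tensors; the names are illustrative, not from any particular framework):

```python
# EMA of generator weights: blend the current weights into a running
# average. The averaged weights change slowly even when the raw
# weights oscillate step to step.
def ema_update(ema_weights, weights, decay=0.99):
    """Update the running average in place."""
    for name, value in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1 - decay) * value

# Toy oscillating trajectory: a weight that flips between +1 and -1
# every step, mimicking rotational training dynamics.
weights = {"w": 1.0}
ema_weights = dict(weights)
for step in range(1, 2001):
    weights["w"] = 1.0 if step % 2 == 0 else -1.0
    ema_update(ema_weights, weights)

print(f"raw weight: {weights['w']:+.3f}, EMA weight: {ema_weights['w']:+.3f}")
# The raw weight keeps jumping between +1 and -1, while the EMA
# settles near the centre of the oscillation.
```

In practice the EMA copy of the generator is typically the one used for sampling and evaluation, precisely because it smooths out the oscillatory trajectory of the raw weights.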

Oscillatory dynamics in adversarial training were observed from the earliest GAN experiments (Goodfellow et al., 2014). The theoretical analysis came from game theory: simultaneous gradient descent on minimax games was known to be non-convergent in general (the dynamics rotate around Nash equilibria rather than converging to them). Mescheder et al. (2018) provided a thorough spectral analysis showing that the Jacobian of the GAN training dynamics has eigenvalues with large imaginary parts, which corresponds to rotational (oscillatory) dynamics. This understanding motivated the shift toward regularised objectives (WGAN-GP, spectral normalisation) that reduce the rotational component. The instability of adversarial training was ultimately one of the major motivations for the field’s migration toward diffusion models, which replace the adversarial game with a stable regression objective.