The Attention Zoo: Linear & Softmax Models Unified

An interactive guide to modern sequence models — explore architectures and recurrences across the linear-softmax landscape.


Introduction

Modern sequence models share a common mathematical skeleton: a key-value memory that is written at each step and read out by a query. The differences lie largely in how that memory decays, and this page makes those differences interactive and visual.
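To make that shared skeleton concrete, here is a minimal sketch in plain NumPy of a decayed key-value memory: at each step the memory matrix is multiplied by a forget factor, updated with the outer product of the current key and value, and read out by the query. The function name `linear_attention_scan` and the scalar `decay` parameter are illustrative assumptions, not any particular model's implementation; many models in the zoo replace the fixed scalar with learned or data-dependent gates.

```python
import numpy as np

def linear_attention_scan(q, k, v, decay=0.9):
    """Sketch of a decayed key-value memory (illustrative, not a specific model).

    q, k, v : arrays of shape (T, d) -- queries, keys, values per time step.
    decay   : scalar forget factor in [0, 1]; 1.0 keeps every past write
              (vanilla linear attention), smaller values fade old writes faster.
    """
    T, d = q.shape
    S = np.zeros((d, d))            # key-value memory matrix
    outputs = np.zeros((T, d))
    for t in range(T):
        # write: decay the old memory, then add the new key-value outer product
        S = decay * S + np.outer(k[t], v[t])
        # read: the query retrieves a key-similarity-weighted sum of values
        outputs[t] = q[t] @ S
    return outputs

# toy usage
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention_scan(q, k, v).shape)   # (8, 4)
```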

Filter by attention kernel and memory decay type below. Each model card shows that model's architecture diagram.


Interactive Explorer

[Interactive filter controls: Attention Type and Decay Type (exact match selects unique models); matching model cards appear below.]