Attention Zoo: Summary of SSMs and Transformers

An interactive guide to modern sequence models: explore architectures and recurrences across the linear-softmax landscape.

The ZOO

📬 Final Note

If you think a linear or softmax model is missing from the Zoo, feel free to ping me and I will add it. A sample architecture block template is available: create your model’s block in the same style and send it over. You can reach me on Twitter/X at @rshia_afz; DMs are open 😉.

The same recurrences and rollouts can also be applied to the residual stream, resulting in Deep Delta Learning, gating, and attention residuals. Stay tuned: another post, or an update to this one, covering upgrades to the residual stream is coming soon.