The Story of Linear-Time Sequence Modeling 📚

Summary of Linear Transformers


Why Linear Transformers?

If you’ve heard about Large Language Models (LLMs) and Transformers and are curious to learn more, there are already plenty of excellent blog posts, articles, and YouTube videos that explain them in great detail with amazing visualizations. If your main interest is understanding LLMs and how Transformers work, we’d point you there first; they’ve done a fantastic job (honestly, better than we could).

In this series, we’ll cover Linear Transformers and State Space Models (SSMs), giving a high-level summary of their core ideas. So, if you’ve come across names like Mamba or DeltaNet and wondered what they are, or if you’ve asked yourself:

1) Why did we move from RNNs to Transformers?

2) Why do we now seem to be circling back from Transformers toward Linear Transformers (almost like “RNNs on steroids”)?

then this post should be a good fit 😉.
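To make the “RNNs on steroids” intuition concrete before we dive in, here is a minimal sketch (our own toy notation, not any specific paper’s formulation) of the core trick behind linear attention: if you drop the softmax, causal attention can be computed either as a full quadratic score matrix or, equivalently, as a recurrence over a fixed-size state, which is exactly what makes it RNN-like.

```python
import numpy as np

# Toy example: un-normalized causal attention without softmax.
# o_t = sum_{i<=t} (q_t . k_i) v_i can be computed two ways:
#   1) quadratic: build the full T x T score matrix
#   2) recurrent: keep a d x d state S_t = S_{t-1} + k_t v_t^T,
#      then read out o_t = S_t^T q_t  -- constant memory per step.

rng = np.random.default_rng(0)
T, d = 5, 4
Q, K, V = rng.normal(size=(3, T, d))

# Quadratic form: full attention matrix, masked to be causal.
A = np.tril(Q @ K.T)           # zero out scores for future positions
out_quadratic = A @ V          # shape (T, d)

# Linear/recurrent form: one d x d state updated per step.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])        # accumulate key-value outer products
    out_recurrent[t] = S.T @ Q[t]    # read out with the current query

print(np.allclose(out_quadratic, out_recurrent))  # → True
```

Both forms produce identical outputs, but the recurrent one never materializes the T × T matrix — that trade-off is the thread running through everything that follows.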

So, let’s start by answering those questions.