Summary of Linear Transformers
If you've heard about Large Language Models (LLMs) and Transformers and are curious to learn more, there are already plenty of excellent blog posts, articles, and YouTube videos that explain them in great detail with amazing visualizations. If your main interest is understanding LLMs and how Transformers work, we'd point you there first; they've done a fantastic job (honestly, better than us).
In this series, we'll cover Linear Transformers and State Space Models (SSMs), giving a high-level summary of their core ideas. So, if you've come across names like Mamba or DeltaNet and wondered what they are, or if you've asked yourself:
1) Why did we move from RNNs to Transformers?
2) Why do we now seem to be circling back from Transformers toward Linear Transformers (almost like "RNNs on steroids")?
then this post should be a good fit.
So let's start by answering the above questions.