| LinAtt | \(\mathbf{1}_M\) | \(\mathbf{v}_t\mathbf{k}_t^\top\) | \(\mathbf{I}\) | \(\mathbf{S}_t\mathbf{q}_t\) |
| RetNet | \(\mathbf{1}_M\) | \(\mathbf{v}_t\mathbf{k}_t^\top\) | \(\gamma\) | \(\mathbf{S}_t\mathbf{q}_t\) |
| GLA | \(\mathbf{1}_M\) | \(\mathbf{v}_t\mathbf{k}_t^\top\) | \(\text{diag}(\sigma(\mathbf{W}\mathbf{x}_t))^{1/\tau}\) | \(\mathbf{S}_t\mathbf{q}_t\) |
| Mamba-2 | \(\mathbf{1}_M\) | \(\mathbf{v}_t\mathbf{k}_t^\top\) | \(a_t\) | \(\mathbf{S}_t\mathbf{q}_t\) |
| GDN | \(\mathbf{1}_M\) | \(\mathbf{v}_t\mathbf{k}_t^\top\) | \(a_t(\mathbf{I} - \mathbf{k}_t\mathbf{k}_t^\top)\) | \(\mathbf{S}_t\mathbf{q}_t\) |
| Raven | \({\mathbf{g}_t} / {\mathbf{1}^\top \mathbf{g}_t}\) | \([\mathbf{k}_t \ \mathbf{v}_t]^\top\) | \(\mathbf{I}\) | \((\mathbf{S}^v_t)^\top f(\mathbf{S}^k_t\mathbf{q}_t)\) |
| GSA | \(\mathbf{1}_M - \sigma(\mathbf{W}\mathbf{x}_t)^{1/\tau}\) | \([\mathbf{k}_t \ \mathbf{v}_t]^\top\) | \(\mathbf{I}\) | \((\mathbf{S}^v_t)^\top f(\mathbf{S}^k_t\mathbf{q}_t)\) |
| ABC | \(\text{softmax}(\mathbf{W}\mathbf{x}_t)\) | \([\mathbf{k}_t \ \mathbf{v}_t]^\top\) | \(\mathbf{I}\) | \((\mathbf{S}^v_t)^\top f(\mathbf{S}^k_t\mathbf{q}_t)\) |
| SWA | \(\mathbf{e}_t\) | \([\mathbf{k}_t \ \mathbf{v}_t]^\top\) | \(\mathbf{I}\) | \((\mathbf{S}^v_t)^\top f(\mathbf{S}^k_t\mathbf{q}_t)\) |
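The matrix-state rows of the table above (LinAtt, RetNet, GLA, Mamba-2, GDN) differ only in their transition column. The NumPy sketch below assumes the columns read as (write weight, write content, transition \(\mathbf{A}_t\), readout) and that these rows share the update \(\mathbf{S}_t = \mathbf{S}_{t-1}\mathbf{A}_t + \mathbf{v}_t\mathbf{k}_t^\top\) with \(\mathbf{S}_t \in \mathbb{R}^{d_v \times d_k}\) and readout \(\mathbf{S}_t\mathbf{q}_t\). The helper names (`transition`, `run`), the random projections, the unit-norm keys, \(\tau = 16\), and the sigmoid used to produce \(a_t\) are all hypothetical choices for illustration; the sigmoid is not Mamba-2's actual parameterization. The published Gated DeltaNet also scales its write by a strength \(\beta_t\); following the table, the sketch omits it (\(\beta_t = 1\)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def transition(model, x_t, k_t, params):
    """Hypothetical helper: d_k x d_k transition A_t taken from the table's transition column."""
    eye = np.eye(k_t.shape[0])
    if model == "linatt":
        return eye                                         # no decay
    if model == "retnet":
        return params["gamma"] * eye                       # fixed scalar decay gamma
    if model == "gla":
        gate = sigmoid(params["W"] @ x_t)                  # sigma(W x_t), one gate per key dim
        return np.diag(gate ** (1.0 / params["tau"]))
    if model == "mamba2":
        return sigmoid(params["w_a"] @ x_t) * eye          # data-dependent scalar a_t (assumed form)
    if model == "gdn":
        a_t = sigmoid(params["w_a"] @ x_t)
        return a_t * (eye - np.outer(k_t, k_t))            # gated delta rule with beta_t = 1
    raise ValueError(f"unknown model: {model}")

def run(model, X, Q, K, V, params):
    """Scan S_t = S_{t-1} A_t + v_t k_t^T and emit o_t = S_t q_t for every step."""
    S = np.zeros((V.shape[1], K.shape[1]))
    outputs = []
    for x_t, q_t, k_t, v_t in zip(X, Q, K, V):
        S = S @ transition(model, x_t, k_t, params) + np.outer(v_t, k_t)
        outputs.append(S @ q_t)
    return np.stack(outputs)

# Usage on random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
T, d_model, d_k, d_v = 6, 8, 4, 3
X = rng.standard_normal((T, d_model))
Q, V = rng.standard_normal((T, d_k)), rng.standard_normal((T, d_v))
K = rng.standard_normal((T, d_k))
K /= np.linalg.norm(K, axis=1, keepdims=True)              # unit-norm keys (assumed)
params = {"gamma": 0.9, "tau": 16.0,
          "W": rng.standard_normal((d_k, d_model)),
          "w_a": rng.standard_normal(d_model)}
for m in ["linatt", "retnet", "gla", "mamba2", "gdn"]:
    print(m, run(m, X, Q, K, V, params).shape)             # each prints (6, 3)
```

With unit-norm keys, the GDN transition \(a_t(\mathbf{I} - \mathbf{k}_t\mathbf{k}_t^\top)\) annihilates the old content stored along \(\mathbf{k}_t\) before the new pair is written, so querying with \(\mathbf{q}_t = \mathbf{k}_t\) returns exactly \(\mathbf{v}_t\). The other four rows only rescale \(\mathbf{S}_{t-1}\) and then add.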
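The slot-memory rows (Raven, GSA, ABC, SWA) keep two \(M\)-row matrices, \(\mathbf{S}^k_t \in \mathbb{R}^{M \times d_k}\) and \(\mathbf{S}^v_t \in \mathbb{R}^{M \times d_v}\), and read out with \((\mathbf{S}^v_t)^\top f(\mathbf{S}^k_t\mathbf{q}_t)\). The sketch below takes \(f = \text{softmax}\) and writes every slot as \(\mathbf{S}_t = \text{diag}(\mathbf{1}_M - \mathbf{w}_t)\,\mathbf{S}_{t-1} + \mathbf{w}_t[\mathbf{k}_t \ \mathbf{v}_t]\), where \(\mathbf{w}_t\) is the write-weight column. That update rule is an assumption, since this excerpt does not state it. Under it, \(\mathbf{w}_t = \mathbf{1}_M - \sigma(\mathbf{W}\mathbf{x}_t)^{1/\tau}\) gives GSA's published recurrence, in which slots decay by \(\boldsymbol{\alpha}_t = \sigma(\mathbf{W}\mathbf{x}_t)^{1/\tau}\). Likewise, \(\mathbf{w}_t = \mathbf{e}_t\), taken here to be the one-hot vector for slot \(t \bmod M\), overwrites a ring buffer, so SWA becomes softmax attention over the last \(M\) tokens. The helper names and the masking of never-written slots are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # stable; masked entries stay at -inf and map to 0
    e = np.exp(z)
    return e / e.sum()

def slot_step(Sk, Sv, filled, w_t, k_t, v_t):
    """Assumed rule: S_t = diag(1 - w_t) S_{t-1} + w_t [k_t v_t], applied to key and value slots."""
    keep = (1.0 - w_t)[:, None]
    Sk = keep * Sk + np.outer(w_t, k_t)
    Sv = keep * Sv + np.outer(w_t, v_t)
    return Sk, Sv, filled | (w_t > 0)

def slot_readout(Sk, Sv, filled, q_t):
    scores = np.where(filled, Sk @ q_t, -np.inf)   # ignore slots that were never written
    return Sv.T @ softmax(scores)                  # (S^v)^T softmax(S^k q)

def run_slots(weights, Q, K, V, M):
    """weights(t) returns the write-weight vector w_t in R^M for step t."""
    Sk, Sv = np.zeros((M, K.shape[1])), np.zeros((M, V.shape[1]))
    filled = np.zeros(M, dtype=bool)
    out = []
    for t, (q_t, k_t, v_t) in enumerate(zip(Q, K, V)):
        Sk, Sv, filled = slot_step(Sk, Sv, filled, weights(t), k_t, v_t)
        out.append(slot_readout(Sk, Sv, filled, q_t))
    return np.stack(out)

# Usage on random data (all sizes are arbitrary).
rng = np.random.default_rng(1)
T, M, d_model, d_k, d_v, tau = 10, 4, 8, 5, 3, 16.0
X = rng.standard_normal((T, d_model))
Q, K, V = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
W = rng.standard_normal((M, d_model))

swa = run_slots(lambda t: np.eye(M)[t % M], Q, K, V, M)            # w_t = e_{t mod M}
gsa = run_slots(lambda t: 1.0 - (1.0 / (1.0 + np.exp(-W @ X[t]))) ** (1.0 / tau),
                Q, K, V, M)                                         # w_t = 1 - alpha_t
```

Because softmax over slots ignores slot order, the SWA ring buffer matches a direct sliding-window attention over keys \(\mathbf{k}_{t-M+1}, \dots, \mathbf{k}_t\) once each slot is masked until its first write.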