| Model | # Params (M) | LMB. ppl\(\downarrow\) | LMB. acc\(\uparrow\) | PIQA acc\(\uparrow\) | Hella. acc\(\uparrow\) | Wino. acc\(\uparrow\) | ARC-e acc\(\uparrow\) | ARC-c acc\(\uparrow\) | Avg. acc\(\uparrow\) |
|---|---|---|---|---|---|---|---|---|---|
| Transformer | |||||||||
| w. RoPE | 340 | 42.0 | 31.0 | 64.4 | 30.2 | 51.0 | 44.3 | 18.7 | 39.9 |
| w. Gate (FoX) | 376 | 48.1 | 30.6 | 64.9 | 30.7 | 51.1 | 44.7 | 18.9 | 40.1 |
| SSM | |||||||||
| GLA | 400 | 42.1 | 30.7 | 64.4 | 30.1 | 52.7 | 43.8 | 19.6 | 40.2 |
| GSA | 399 | 44.1 | 30.3 | 64.9 | 30.7 | 51.5 | 45.6 | 20.5 | 40.5 |
| GDN | 475 | 40.1 | 31.6 | 65.6 | 31.4 | 50.2 | 45.7 | 19.3 | 40.6 |
| Mamba-2 | 382 | 43.0 | 29.9 | 65.0 | 31.5 | 51.2 | 47.5 | 20.5 | 40.1 |
| SWA | 374 | 40.7 | 30.5 | 64.5 | 30.4 | 51.6 | 44.9 | 18.6 | 40.0 |
| Raven | 424 | 41.0 | 32.7 | 64.1 | 30.3 | 51.7 | 43.9 | 18.4 | 40.2 |
| Transformer | |||||||||
| w. RoPE | 693 | 18.6 | 41.4 | 66.3 | 34.3 | 52.2 | 49.9 | 21.9 | 44.5 |
| w. Gate (FoX) | 694 | 25.3 | 38.2 | 67.8 | 34.4 | 51.7 | 49.9 | 21.5 | 44.1 |
| SSM | |||||||||
| GLA | 892 | 23.6 | 38.25 | 66.97 | 33.37 | 52.41 | 48.53 | 21.25 | 43.46 |
| GSA | 750 | 27.4 | 34.68 | 66.32 | 32.10 | 51.62 | 46.63 | 19.62 | 41.83 |
| GDN | 892 | 21.3 | 39.4 | 68.1 | 35.2 | 53.0 | 52.5 | 22.1 | 45.1 |
| Mamba-2 | 712 | 24.6 | 36.0 | 68.1 | 35.4 | 52.6 | 52.3 | 22.3 | 44.5 |
| SWA | 693 | 20.8 | 38.8 | 67.9 | 34.2 | 52.8 | 49.3 | 21.2 | 44.0 |
| Raven | 792 | 26.0 | 38.2 | 67.0 | 33.2 | 50.9 | 49.2 | 21.0 | 43.3 |