from
Hacker News
Fast Transformer Decoding: One Write-Head Is All You Need
by hislaziness on 5/30/23, 4:17 PM with 1 comment
by hislaziness on 5/30/23, 4:17 PM
A more efficient way to run inference, with lower memory requirements.
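
For a rough sense of where the memory saving comes from: as I understand the paper, the keys and values are shared across all attention heads (multi-query attention), so the key/value cache kept during incremental decoding shrinks by roughly the number of heads. A back-of-the-envelope sketch in Python; the function name and the model dimensions are made up for illustration and are not taken from the paper:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Approximate size of the key/value cache during incremental decoding.

    Assumes two cached tensors (keys and values) per layer, each of shape
    [batch, n_kv_heads, seq_len, head_dim], stored at bytes_per_elem bytes
    per element (2 for fp16/bf16).
    """
    return 2 * n_layers * batch * n_kv_heads * seq_len * head_dim * bytes_per_elem


# Hypothetical model dimensions, chosen only for illustration.
cfg = dict(n_layers=32, head_dim=128, seq_len=2048, batch=1)

multi_head = kv_cache_bytes(n_kv_heads=32, **cfg)   # one K/V pair per head
multi_query = kv_cache_bytes(n_kv_heads=1, **cfg)   # one K/V pair shared by all heads

print(f"multi-head cache:  {multi_head / 2**20:.0f} MiB")
print(f"multi-query cache: {multi_query / 2**20:.0f} MiB")
print(f"reduction factor:  {multi_head // multi_query}x")
```

With these assumed numbers the cache is smaller by a factor equal to the head count. The query projections still have one head each, so only the cached keys and values shrink, which is the part that has to be re-read from memory at every decoding step.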