from Hacker News

Fast Transformer Decoding: One Write-Head Is All You Need

by hislaziness on 5/30/23, 4:17 PM with 1 comment

by hislaziness on 5/30/23, 4:17 PM
A more efficient way to run inference, with lower memory requirements.