from Hacker News

Fast Transformer Decoding: One Write-Head Is All You Need

by hislaziness on 5/30/23, 4:17 PM with 1 comment

  • by hislaziness on 5/30/23, 4:17 PM

    A more efficient way to run inference, with lower memory requirements.