from Hacker News

DPO fine-tuning outperforms SFT

by kcorbitt on 10/2/24, 5:46 PM with 0 comments