from Hacker News
DPO fine-tuning outperforms SFT
by kcorbitt on 10/2/24, 5:46 PM with 0 comments