by djoldman on 9/22/25, 6:21 PM with 14 comments
by robots0only on 9/22/25, 8:59 PM
by thesz on 9/22/25, 9:20 PM
> This paper addresses the challenge by asking: how can we trade off more compute for less data?
Autoregressive models do not let you make this trade, and that is their major drawback. There is evidence that training RNN models that compute several steps with the same input and coefficients (but a different state) leads to better performance. This was shown in a follow-up to [1] that performed an ablation study.
[1] https://arxiv.org/abs/1611.06188
They fixed the number of inner time steps instead of varying it and got better results (a rough sketch of the idea is below).
Unfortunately, I forgot the title of that ablation paper.
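A minimal sketch of the fixed-inner-steps recurrence described above, assuming a GRU cell in PyTorch; the class name, the choice of GRU, and inner_steps=3 are illustrative assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

class FixedStepsRNNCell(nn.Module):
    """Applies the same recurrent cell K times per input token.

    The input x and the cell weights are reused at every inner step;
    only the hidden state h changes. K is fixed rather than varied,
    matching the ablation described in the comment above.
    """
    def __init__(self, input_size: int, hidden_size: int, inner_steps: int = 3):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.inner_steps = inner_steps  # fixed number of inner steps (assumed value)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_size), h: (batch, hidden_size)
        for _ in range(self.inner_steps):
            h = self.cell(x, h)  # same input, same weights, new state
        return h

# Usage over a sequence: each token spends inner_steps state updates
# of compute, independent of the amount of training data.
batch, seq_len, d_in, d_h = 2, 5, 16, 32
layer = FixedStepsRNNCell(d_in, d_h, inner_steps=3)
xs = torch.randn(batch, seq_len, d_in)
h = torch.zeros(batch, d_h)
for t in range(seq_len):
    h = layer(xs[:, t], h)
```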
by smokel on 9/22/25, 7:58 PM
Edit: from the source [1], this quote pretty much sums it all up: "Our 2022 paper predicted that high-quality text data would be fully used by 2024, whereas our new results indicate that might not happen until 2028."
[1] https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-...
by blurbleblurble on 9/22/25, 7:46 PM