from Hacker News

Nanochat

by bilsbie on 10/14/25, 12:58 AM with 15 comments

  • by Tepix on 10/14/25, 9:35 AM

    Amazingly, you can also do it on smaller hardware!

    From the readme:

    All code will run just fine on even a single GPU by omitting torchrun, and will produce ~identical results (code will automatically switch to gradient accumulation), but you'll have to wait 8 times longer. If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for --device_batch_size in the scripts and reduce it until things fit. E.g. from 32 (default) to 16, 8, 4, 2, or even 1. Below that, you'll have to know a bit more about what you're doing and get more creative.
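
    For intuition, here is a minimal, illustrative sketch of the gradient-accumulation trick the readme describes: halving --device_batch_size doubles the number of micro-batches per optimizer step, so the effective batch size (and hence the result) stays ~identical. This is generic PyTorch, not nanochat's actual training loop; the tiny model and the numbers are placeholders.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        model = nn.Linear(16, 1)                  # stand-in for the real transformer
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()

        target_batch_size = 32                    # the default --device_batch_size
        device_batch_size = 8                     # reduced to fit a smaller GPU
        accum_steps = target_batch_size // device_batch_size  # 4 micro-batches per step

        x = torch.randn(target_batch_size, 16)    # synthetic data for the sketch
        y = torch.randn(target_batch_size, 1)

        optimizer.zero_grad()
        for i in range(accum_steps):
            xb = x[i * device_batch_size:(i + 1) * device_batch_size]
            yb = y[i * device_batch_size:(i + 1) * device_batch_size]
            loss = loss_fn(model(xb), yb)
            (loss / accum_steps).backward()       # average gradients across micro-batches
        optimizer.step()                          # one step, same effective batch size

    Per the readme, the code switches to this automatically; the only knob a user touches is --device_batch_size.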

  • by ebbi on 10/14/25, 8:51 PM

    Can someone give me an ELI5 on what this is/does? I'm a non-coder who has recently started diving into the world of AI, and I'm not sure what this is or where it sits in context with the tools I currently use (ChatGPT, Claude Code, Cursor).
  • by ultimatefan1 on 10/14/25, 12:02 PM

    No seagull?