from Hacker News

PyTorch Monarch

by jarbus on 10/23/25, 10:15 AM with 42 comments

  • by chandureddyvari on 10/23/25, 1:11 PM

    Interesting - this seems to target a different layer than services like Tinker (https://thinkingmachines.ai/blog/announcing-tinker/). Monarch provides the infrastructure primitives while Tinker is a managed finetuning service. Could someone build something like Tinker on top of Monarch?
  • by pjmlp on 10/23/25, 11:55 AM

    Apparently PyTorch oxidation has started.

    > Monarch is split into a Python-based frontend, and a backend implemented in Rust.

    Other than that, it looks like quite an interesting project.

  • by alyxya on 10/23/25, 12:33 PM

    I made my own single-controller PyTorch extension [1], though mine doesn't yet support cross-node communication. It was interesting to compare how Monarch makes things performant. I believe Monarch also uses cloudpickle to share code among all nodes, which is probably the only way to do this performantly, since it turns shipping code into a one-time setup cost. The fan-out of messages from the single controller is also really interesting: it means the controller is unlikely to be the bottleneck, apart from any synchronous operations (a rough sketch of that fan-out pattern is below).

    As for things that might cost performance here, one thing I'm wondering is whether custom kernels are supported. I'm also wondering how fine-grained the control is over communication between different actors calling a function. Overall, I really like this project and hope to see it used in place of multi-controller setups.

    [1] https://github.com/alyxya/mycelya-torch
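
    A minimal sketch of the fan-out pattern described above, assuming a plain Python worker pool rather than Monarch's actual API: the controller serializes the function once with cloudpickle and broadcasts the bytes, so shipping code is a one-time setup cost. The helper names (fan_out, _run_pickled) are made up for illustration.

      # Hypothetical sketch, not Monarch's API: the controller serializes a
      # function once with cloudpickle and fans the bytes out to workers.
      import cloudpickle
      from multiprocessing import Pool

      def _run_pickled(args):
          fn_bytes, shard = args
          fn = cloudpickle.loads(fn_bytes)  # each worker rebuilds the function
          return fn(shard)

      def fan_out(fn, shards, num_workers=4):
          fn_bytes = cloudpickle.dumps(fn)  # serialize once on the controller
          with Pool(num_workers) as pool:
              return pool.map(_run_pickled, [(fn_bytes, s) for s in shards])

      if __name__ == "__main__":
          # Stand-in for a training step: each worker squares its shard.
          print(fan_out(lambda xs: [x * x for x in xs],
                        [[1, 2], [3, 4], [5, 6], [7, 8]]))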

  • by valzam on 10/23/25, 11:59 AM

    I assume this is similar to Ray?
  • by milancurcic on 10/23/25, 12:22 PM

    Cool! Essentially Fortran coarrays from 2008.
  • by porridgeraisin on 10/23/25, 12:58 PM

    > This lets us avoid single-host bottlenecks, effectively using the whole mesh as a distributed cluster for message forwarding. (Cite scalability numbers here.)

    In case someone who can fix this is reading here.

  • by fadedsignal on 10/23/25, 2:27 PM

    It is a nice project. I have questions.

    - Is this similar to Open MPI?

    - How is a mesh established? Do they need to be on the same host?

  • by semessier on 10/23/25, 5:31 PM

    This could become a major thing in the coarray world, but the issues start already:

    > ...Note that this does not support tensor engine, which is tied to CUDA and RDMA (via ibverbs).

    I.e., yet another CUDA-married approach. The issue is not ibverbs; the code shows they use GPUDirect RDMA, and from there this can only get worse, with more CUDA dependencies. OpenUCX would have been an alternative.

  • by logicchains on 10/23/25, 12:36 PM

    This seems strictly less powerful than JAX, whose compiler optimises how cross-node communication is carried out.
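
    For context, a small JAX sketch of the compiler-managed communication being referred to; the mesh axis name and array shapes are illustrative, not taken from the article.

      # Illustrative JAX sketch: the user declares how an array is sharded
      # across devices, and the XLA compiler decides what cross-device
      # communication (e.g. an all-reduce) to insert for the reduction.
      import jax
      import jax.numpy as jnp
      import numpy as np
      from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

      devices = np.array(jax.devices())
      mesh = Mesh(devices, axis_names=("data",))
      sharding = NamedSharding(mesh, P("data", None))

      x = jax.device_put(jnp.ones((len(devices) * 2, 4)), sharding)

      @jax.jit
      def loss(x):
          # Reduction over the sharded axis: the communication pattern is
          # chosen by the compiler, not written by the user.
          return jnp.sum(x * x)

      print(loss(x))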
  • by bjourne on 10/23/25, 8:25 PM

    > Monarch lets you program distributed systems the way you’d program a single machine, hiding the complexity of distributed computing:

    There is some infamous tech based on the "hiding" paradigm. PHP comes to mind. By hiding how the HTTP request/response cycle actually works, it fostered a generation of web developers who didn't know what a session cookie was, resulting in login systems that leaked like a sieve. Distributed computing is complicated. There are many parameters you need to tweak and many design decisions you need to make for distributed model training to run smoothly. I think explicit and transparent architectures are way better. Distributed model training shouldn't "feel" like running on a single device, because it isn't.

  • by jonapro on 10/23/25, 11:58 AM

    Beowulf then.
  • by nothrowaways on 10/23/25, 12:51 PM

    FB should create a PyTorch foundation and set it free before they fuck it up.
  • by SomaticPirate on 10/23/25, 2:12 PM

    "Our Rust-based backend facilitates our performance, scale, and robustness — we amply use Rust’s fearless concurrency in Monarch’s implementation"

    Found a few typos. The em dash makes me suspect an LLM was involved in the proofreading.