by FromTheArchives on 10/22/25, 12:36 PM with 96 comments
by matthewdgreen on 10/23/25, 1:10 AM
Sandboxing the agent hardly seems like a sufficient defense here.
by almosthere on 10/23/25, 5:36 PM
A massive productivity boost I get is using it to do server maintenance.
Using gcloud compute ssh, log into all gh runners and run docker system prune, in parallel for speed, and give me a summary report of the disk usage after.
This is an undocumented and underused feature of basic agentic abilities. It doesn't have to JUST write code.
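For anyone curious what that prompt boils down to, here is a rough Python sketch of the same fan-out, not what the agent actually runs. The runner names, the zone, and the helper names (`build_ssh_command`, `run_gcloud`, `prune_all`) are made up for illustration. It assumes `gcloud` is installed and authenticated, and it adds `-f` to the prune so it does not wait on a confirmation prompt.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: replace with your own runner names and zone.
RUNNERS = ["gh-runner-1", "gh-runner-2", "gh-runner-3"]
ZONE = "us-central1-a"
# Prune without prompting, then report disk usage of the root filesystem.
REMOTE_CMD = "docker system prune -f && df -h /"


def build_ssh_command(instance, zone, remote_cmd):
    """Return the argv that runs remote_cmd on one instance via gcloud SSH."""
    return ["gcloud", "compute", "ssh", instance,
            "--zone", zone, "--command", remote_cmd]


def run_gcloud(argv):
    """Run one command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def prune_all(instances, zone, remote_cmd, runner=run_gcloud, max_workers=8):
    """Run remote_cmd on every instance in parallel.

    Returns {instance: (exit code, output)}. The runner argument is
    injectable so the fan-out can be exercised without touching real hosts.
    """
    commands = [build_ssh_command(name, zone, remote_cmd) for name in instances]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(runner, commands))
    return dict(zip(instances, results))
```

Calling `prune_all(RUNNERS, ZONE, REMOTE_CMD)` and printing each entry gives the per-host summary. Threads are enough here because each worker just waits on an SSH subprocess.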
by pdntspa on 10/24/25, 2:35 AM
What I see are three tiny little projects that do one thing.
That is boring. We already know the LLMs are good at that.
Let's see it YOLO into a larger codebase with protocols and a growing feature set without making a complete mess of things.
So far CC has been great for letting me punch above my weight but the few times I let it run unattended it has gone against conventions clearly established in AGENTS.md and I wasn't there to keep it on the straight and narrow. So a bunch more time had to be spent untangling the mess it created.
by corv on 10/24/25, 4:04 PM
by mike_hearn on 10/23/25, 6:11 PM
The reason they don't do that is because some popular and necessary apps use it. Like Chrome.
However, I tried this approach too and it's the wrong way to go IMHO, quite beyond the use of undocumented APIs. What you actually want to do is virtualize, not sandbox.
by stuaxo on 10/23/25, 11:42 AM
I reckon something like Qubes could work fairly well.
Create a new Qube and have control over network connectivity, and do everything there, at the end copy the work out and destroy it.
by burgerquizz on 10/24/25, 5:28 AM
by lacker on 10/22/25, 10:27 PM
by jampa on 10/23/25, 6:31 PM
Setting up "permissions.allow" in `.claude/settings.local.json` takes minimal time. Claude even lets you configure this while approving code, and you can use wildcards like "Bash(timeout:*)". This is far safer than risking disasters like dropping a staging database or deleting all unstaged code, which Claude would have done last week had I been running it in YOLO mode.
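As a rough illustration of how small that setup is, here is a Python sketch that adds one rule to the allow list. It assumes the file is a JSON object with a `permissions` object holding an `allow` array, which is how the "permissions.allow" key above reads. The helper name `allow_rule` is made up, so check the structure against your own settings file before relying on it.

```python
import json
from pathlib import Path


def allow_rule(settings_path, rule):
    """Add rule to permissions.allow in the settings file, creating it if needed.

    Assumes the layout {"permissions": {"allow": [...]}}; any other keys
    already in the file are preserved.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if rule not in allow:  # keep the list free of duplicates
        allow.append(rule)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

For example, `allow_rule(".claude/settings.local.json", "Bash(timeout:*)")` would add the wildcard rule quoted above and leave everything else in the file alone.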
The worst part is seeing READMEs in popular GitHub repos telling people to run YOLO mode without explaining the tradeoffs. They just say, "Run with these parameters, and you're all good, bruh," without any warning about the risks.
I wish they could change the parameter to signify how scary it can be, just like React did with React.__SECRET_INTERNALS_DO_NOT_USE_OR_YOU_WILL_BE_FIRED (https://github.com/reactjs/react.dev/issues/3896)
by zxilly on 10/23/25, 7:19 PM
by igor47 on 10/22/25, 7:52 PM
by danielbln on 10/22/25, 9:02 PM
by boredtofears on 10/23/25, 5:30 PM
When I’m satisfied with the spec, I turn on “allow all edits” mode and just come back later to review the diff at the end.
I find this works a lot better than hoping I can one shot my original prompt or having to babysit the implementation the whole way.
by ZeroConcerns on 10/23/25, 7:15 PM
That would mean that their, undoubtedly extremely interesting, emails actually get met with more than a "450 4.1.8 Unable to find valid MX record for sender domain" rejection.
I'm sure this is just an oversight being caused by obsolete carbon lifeforms still being in charge of parts of their infrastructure, but still...
by catigula on 10/23/25, 2:56 AM