by swah on 2/3/26, 10:00 AM with 378 comments
by heikkilevanto on 2/8/26, 10:33 AM
Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!
by manuelabeledo on 2/8/26, 2:12 PM
> LLMs are far more nondeterministic than previous higher level languages. They also can help you figure out things at the high level (descriptions) in a way that no previous layer could help you dealing with itself. […] What about quality and understandability? If instead of a big stack, we use a good substrate, the line count of the LLM output will be much less, and more understandable. If this is the case, we can vastly increase the quality and performance of the systems we build.
How does this even work? There is no universe I can imagine where a natural language can be universal, self-descriptive, unambiguous, and have a smaller footprint than any purpose-specific language that came before it.
by anon946 on 2/8/26, 6:02 AM
by matheus-rr on 2/8/26, 9:40 AM
The generation step changed. The maintenance step didn't. And most codebases spend 90% of their life in maintenance mode.
The real test of whether prompts become a "language" is whether they become versioned, reviewed artifacts that teams commit to repos. Right now they're closer to Slack messages than source files. Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold.
by tomaytotomato on 2/7/26, 10:59 PM
"Generate a Frontend End for me now please so I don't need to think"
LLM starts outputting tokens
Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use
Then out of a shadowy alleyway a man in a trenchcoat approaches
"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"
"How much?"
"Oh that'll be $200.... and your muscle memory for running maven commands"
"Shut up and take my money"
----- 5 months later, washed up and disconnected from cloud LLMs ------
"Anyone got any spare tokens I could use?"
by WoodenChair on 2/8/26, 4:10 AM
https://www.observationalhazard.com/2025/12/c-java-java-llm....
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
by ekropotin on 2/8/26, 3:50 AM
by nly on 2/7/26, 11:35 PM
The implementations that come out are buggy or just plain broken
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Give me a whiteboard and a fellow engineer any day
by toprerules on 2/7/26, 10:52 PM
The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work; people are just catching up with that fact, and the people that just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
by DavidPiper on 2/8/26, 6:11 AM
For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
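A toy illustration of that concern (the ordering rules and names here are made up): even a small behavior, once specified tightly enough to be unambiguous, tends to take more English than code.

    # English, pinned down enough to be unambiguous:
    #   "Sort the orders by total, highest first; when two totals are equal,
    #    break ties by the earlier created_at value; treat a missing total as
    #    zero; return a new list and do not mutate the input."
    #
    # The same constraints, stated directly as code:
    def sort_orders(orders):
        return sorted(
            orders,
            key=lambda o: (-(o.get("total") or 0), o["created_at"]),
        )

And the prose version still leaves things open (what type is created_at? what counts as "missing"?) that the code settles by construction.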
by voxleone on 2/8/26, 2:08 PM
A practical way to get better results is to stop prompting with prose and start providing explicit models of what we want. In that sense, UML-like notations can act as a bridge between human intent and machine output. Instead of:
“Write a function to do X…”
we give:
“Here’s a class diagram + state machine; generate safe C/C++/Rust code that implements it.”
UML is already a formal, standardized DSL for software structure. LLMs have no trouble consuming textual forms (PlantUML, Mermaid, etc.) and generating disciplined code from them. The value isn’t diagrams for humans but constraining the model’s degrees of freedom.
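As a rough sketch of what that could look like (the door state machine and every name below are invented for illustration, and the generated code is shown in Python just to keep it short):

    # Hypothetical PlantUML handed to the model as the spec:
    #
    #   @startuml
    #   [*] --> Closed
    #   Closed --> Open   : open
    #   Open   --> Closed : close
    #   Closed --> Locked : lock
    #   Locked --> Closed : unlock
    #   @enduml
    #
    # The kind of disciplined output the diagram constrains the model toward:
    from enum import Enum, auto

    class DoorState(Enum):
        CLOSED = auto()
        OPEN = auto()
        LOCKED = auto()

    # Transition table lifted directly from the diagram: (state, event) -> state
    TRANSITIONS = {
        (DoorState.CLOSED, "open"):   DoorState.OPEN,
        (DoorState.OPEN,   "close"):  DoorState.CLOSED,
        (DoorState.CLOSED, "lock"):   DoorState.LOCKED,
        (DoorState.LOCKED, "unlock"): DoorState.CLOSED,
    }

    def step(state: DoorState, event: str) -> DoorState:
        # Reject anything the diagram does not allow, instead of guessing.
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"illegal transition: {event!r} from {state.name}")
        return TRANSITIONS[(state, event)]

The diagram enumerates every state and transition up front, so the model has far fewer degrees of freedom than it would with a prose prompt.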
by frigg on 2/8/26, 12:25 PM
What did Javascript/Python do to Java? They are not interchangeable nor comparable. I don't think Federico's opinion is worth reading further.
by PunchyHamster on 2/8/26, 7:33 AM
by gloosx on 2/8/26, 2:11 PM
Beginner programmers want: "make this feature"
Experienced devs want: control over memory, data flow, timing, failure modes
That is why abstractions feel magical at first and suffocating later, which sparks this whole debate.
by 1zael on 2/8/26, 7:03 AM
by DaedalusII on 2/8/26, 8:41 AM
For instance: Here is an email from my manager at 1pm today. Open the policy document he is referring to, create a new version, and add the changes he wants. Refer to the entire codebase (our company onedrive/google drive/dropbox, whatever) to make sure it is contextually correct.
>Sure, here is the document for your review
Great, reply back to manager with attachment linked to OneDrive
by kazinator on 2/7/26, 11:41 PM
by omarreeyaz on 2/8/26, 11:28 AM
by freetonik on 2/8/26, 11:36 AM
by notepad0x90 on 2/8/26, 9:12 PM
In the end, you have lengthy text typed by humans, and it might contain errors in logic, contradictions, and unforeseen issues in the instructions. And the same processes and tooling used for syntactic code might need to apply to it. You will need to version control your prompts, for example.
LLMs solve the labor problem, not the management problem. You have to spend a lot of time and effort with pages and pages of LLM prompts, trying to figure out which part of the prompt is generating which part of your code base. LLMs can debug and troubleshoot, but they can't debug and troubleshoot your prompts for you. I doubt they can take their own output, generated by multiple agents and lots of sessions and trace it all back to what text in your prompt caused all the mess either.
On one hand, I want to see what this experimentation will yield, on the other hand, it had better not create a whole suite of other problems to solve just to use it.
My confusion really is when experienced programmers advocate for this stuff. Actually typing in the code isn't very hard. I like the LLM-assistance aspect of figuring out what to actually code, and doing some research. But for actually figuring out what code to type in, sure, LLMs save time, but not that much time. Getting it to work, debugging, troubleshooting, maintaining: those tend to be the pain points.
Perhaps there are shops out there that just crank out lots of LoC, and even measure developer performance based on LoC? I can see where this might be useful.
I do think LLM-friendly high-level languages need to evolve for sure. But the ideal workflow is always going to be a co-pilot type of workflow. Humans researching and guiding the AI.
Psychologically, until AI can maintain its own code, this is a really bad idea. Actually typing out the code is extremely important for humans to be able to understand it. And if someone else wrote the code and you have to write something that is part of that codebase, you have to figure out how things fit together yourself; AI can't do that for you if you're still maintaining the codebase in any capacity.
by QuadrupleA on 2/8/26, 5:52 AM
Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)
LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
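To make the "RNG draw" point concrete, this is roughly what the per-token sampling step looks like (a toy sketch with invented logits, not any particular model's decoder):

    import math, random

    # Invented logits for a handful of candidate next tokens.
    logits = {"foo(": 2.1, "bar(": 1.7, "return": 0.4, "#": -1.0}
    temperature = 0.8

    # Softmax with temperature: higher temperature flattens the distribution.
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    probs = {tok: weight / total for tok, weight in scaled.items()}

    # "Compiling" a token is a weighted random draw; repeat thousands of times
    # and small differences compound into entirely different programs.
    tokens, weights = zip(*probs.items())
    print(random.choices(tokens, weights=weights, k=5))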
I'll take the gloriously reliable classical compute world to compile my stuff any day.
by asim on 2/8/26, 12:09 PM
by euroderf on 2/8/26, 2:29 PM
by kaapipo on 2/8/26, 9:09 AM
by podgorniy on 2/8/26, 4:53 PM
I discovered that it is not trivial to conceptualize an app to the degree of clarity required for deterministic output from an LLM. It's way easier to say than to actually implement yourself (that's why examples are so interesting to see).
The backwards dynamic, where you derive a spec/doc from the source code, does not work well enough.
by AlexeyBrin on 2/8/26, 1:04 AM
I can take some C or Fortran code from 10 years ago, build it and get identical results.
by BudapestMemora on 2/8/26, 10:09 AM
Ask yourself "Computer memory and disk are also not 100% reliable , but we live with it somehow without man-in-the-middle manual check layer, yes?" Answer about LLM will be the same, if good enough level of similarity/same asnwers is achieved.
by Akef on 2/8/26, 1:54 PM
In essence: we're witnessing a paradigm shift. And for moments like these—I invite you—it's invaluable to have studied Popper and Kuhn in those courses.
An even more provocative hypothesis: the 'Vienna Circle' has morphed into the 'Circle of Big Tech,' gatekeepers of the data. What's the role of academia here? What happened to professional researchers? The way we learn has been hijacked by these brilliant companies, which—at least this time—have a clear horizon: maximizing profits. What clear horizon did the stewards of the scientific method have before? Wasn't it tainted by the enunciator's position? The personal trajectory of the scientist, the institution (university) funding them? Ideology, politics?
This time, it seems, we know exactly where we're headed.
(This comment was translated from Spanish, please excuse the rough edges)
by Verdex on 2/8/26, 1:15 PM
Last I checked with every other high level language, you save the source and then rerun the compiler to generate the artifact.
With LLMs you throw away the 'source' and save the artifact.
by apical_dendrite on 2/7/26, 11:23 PM
I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally", or "you have a 'mode' parameter here that determines which branch of the code executes, which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.
Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.
The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.
by geon on 2/8/26, 7:34 AM
Has this been true since the 90s?
I pretty much only hear people saying modern compilers are unbeatable.
by skaul on 2/8/26, 5:16 PM
That changes with LLMs. For now, you can use LLMs to help you code that way; a programming buddy whose code you review. That's soon going to become "quaint" (to quote the author) given the projected productivity gains of agents (and for many developers it already has).
by dainiusse on 2/8/26, 7:57 AM
Everything else is secondary.
by TZubiri on 2/7/26, 10:58 PM
This is not an appropriate analogy, at least not right now.
Code agents are generating code from prompts, and in that sense the metaphor is correct. However, agents then read the code, it becomes input, and they generate more code. This was never the case for compilers: compilation is one-directional and never cyclic, so an LLM used in this sense is strictly not a compiler.
by pjmlp on 2/8/26, 9:24 AM
by redbell on 2/8/26, 8:37 AM
The hottest new programming language is English
by phplovesong on 2/8/26, 10:05 AM
by badgersnake on 2/8/26, 9:59 AM
by senfiaj on 2/8/26, 4:06 PM
by boomlinde on 2/9/26, 5:07 AM
by rco8786 on 2/8/26, 2:33 PM
The issue is that if you fire off 10 agents to work autonomously for an extended period of time at least 9 of them will build the WRONG THING.
The problem is context management and decision making based on that context. LLMs will always make assumptions about what you want, and the more assumptions they make the higher the likelihood that one or more of them is wrong.
by cess11 on 2/8/26, 11:01 AM
Because that's pretty much what "agentic" LLM coding systems are an automation of, skimming through forums or repos and cribbing the stuff that looks OK.
by kristjansson on 2/8/26, 6:57 AM
Code in general is also local, in the sense that a small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected in the output. No theory (that we know of) can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
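Concretely, a minimal sketch of that analogy (toy example, nothing model-specific):

    import random

    # Fixed seed, fixed output: "reproducible", but only in the trivial sense.
    print(random.Random(1).random())  # same value on every run
    print(random.Random(1).random())  # identical again

    # Perturb the input (the seed) slightly and the output is unrelated --
    # unlike editing one function and getting a correspondingly local change
    # in the compiled program.
    print(random.Random(2).random())  # a completely different value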
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires the robot (and likely the human) to attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect, their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few 100kb of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects, where quality, correctness, interoperability, novelty, etc. don't matter, that might be fine. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[1]: loosely in the statistical sense of containing all the information the programmer was able to encode https://en.wikipedia.org/wiki/Sufficient_statistic
[2]: there're of course many tiny exceptions to this. we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary timing of something that causes async tasks to schedule in a different order etc etc. I believe the point stands regardless.
by pvtmert on 2/8/26, 9:15 AM
Imagine a machine that does the job sometimes but fails at other times. Wonderful, isn't it?
by karmasimida on 2/8/26, 10:28 AM
by rvz on 2/7/26, 11:16 PM
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
by fpereiro on 2/8/26, 9:22 PM
Some context: I'm basically trying to make sense of the tidal wave that's engulfing software development. Over the last 2-3 weeks I've realized that LLMs will start writing most code very soon (I could be wrong, though!). This article is just me making sense of it, not trying to convince anybody of anything (except of, perhaps, giving the whole thing a think). Most of the "discarded" objections I presented in the list were things I espoused myself over the past year. I should have clarified that in the article.
I (barely) understand that LLMs are not a programming language. My point was that we could still think of them as a "higher level programming language", despite them 1) not being programming languages; 2) being wildly nondeterministic; 3) also jumping levels, by being able to help you direct them. This way of looking at the phenomenon of LLMs is to try to see if previous shifts in programming can explain, at least partially, the dynamics we are seeing unfold so quickly (to find, in Ray Dalio's words, "another kind of those").
I am stepping into this world of LLM code generation with complicated feelings. I'm not an AI enthusiast, at least not yet. I love writing code by hand and I am proud of my hand-written open source libraries. But I am also starting to experience the possibilities of working on a higher level of programming and being able to do much more in breadth and depth.
I fixed an important typo - here I meant: "Economically, only quality is undisputable as a goal".
Responding to a few interesting points:
@manuelabeledo: during 2025 I've been building a programming substrate called cell (think language + environment) that attempts to be both very compact and very expressive. Its goal is to massively reduce complexity to make general-purpose code more understandable (I know this is laughably ambitious and I'm desperately limited in my capability to pull off something like that). But because of the LLM tsunami, I'm reconsidering the role of cell (or any other successful substrate): even if we achieve the goal, how will this interact with a world where people mostly write and validate code through natural language prompts? I never meant to say that natural language would itself be this substrate, or that the combination of LLMs and natural languages could do that: I still see that there will be a programming language behind all of this. Apologies for the confusion.
@heikkilevanto & @matheus-rr: Mario Zechner has a very interesting article where he deals with this problem (https://mariozechner.at/posts/2025-06-02-prompts-are-code/#t...). He's exploring how structured, sequential prompts can achieve repeatable results from LLMs, which you still have to verify. I'm experimenting with the same, though I'm just getting started. The idea I sense here is that perhaps a much tighter process of guiding the LLM, with current models, can get you repeatable and reliable results. I wonder if this is the way things are headed.
@woodenchair: I think that we can already experience a revolution with LLMs that are not fully autonomous. The potential is that an engineering-like approach to a prompt flow can allow you to design and review (not write) a lot more code than before. Though you're 100% correct that the analogy doesn't strictly hold until we can stop looking at the code in the same way that a js dev doesn't look at what the interpreter is emitting.
@nly: great point. The thing is that most code we write is not elegant implementations of algorithms, but mostly glue or CRUDs. So LLMs can still broadly be useful.
I hope I didn't rage bait anybody - if I did, it wasn't intentional. This was just me thinking out loud.
by svilen_dobrev on 2/8/26, 8:32 AM
by freejazz on 2/8/26, 8:00 PM
by Razengan on 2/8/26, 4:53 PM
Well, nobody could figure out how to program them. Except the few outcasts like us who went on to suffer for the rest of our lives for it :')
With phones & LLMs this is the closest we have come to that original promise of a computer in every home and everyone being able to do anything with it, that isn't pre-dictated by corporations and their apps:
Ideally ChatGPT etc should be able to create interactive apps on the fly on iPhone etc. Imagine having a specific need and just being able to say it and get an app right away just for you on your device.
by stared on 2/7/26, 11:01 PM
by retinaros on 2/8/26, 12:17 PM
by koiueo on 2/8/26, 10:40 AM
Gosh, LLMs have been a thing for only a few years, but people have become stupid already.
> what Javascript/Python/Perl did to Java
FFS... What did python do to java?
by mock-possum on 2/8/26, 7:25 AM
by lofaszvanitt on 2/8/26, 6:02 AM
by renewiltord on 2/8/26, 5:12 AM
by OutOfHere on 2/8/26, 12:06 AM
by dsr_ on 2/7/26, 11:02 PM
"I prompted it like this"
"I gave it the same prompt, and it came out different"
It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.
by zkmon on 2/8/26, 8:04 AM
by fullstackchris on 2/8/26, 8:19 AM
critical distinction: unless you're getting paid comparable to your output (literally 0 traditional 9-5 software jobs I know of, unfortunately), this is in fact the opposite - a subscription to any of these services reduces your overall salary, it doesn't make it higher...
then there is the case I know the dishonest are doing: firing off Claude or whatever and going for a walk
by titaniumrain on 2/8/26, 8:51 AM
by gaigalas on 2/8/26, 3:30 PM
I mean, we only have them because it is strictly necessary. If we could make architectures friendly to programming directly, we would have.
In that sense, high level languages are not a marvelous thing but a burden we have to carry because of the strict requirements of low level ones. The less burdens like those we have, the better.
by abcde666777 on 2/8/26, 6:17 AM
So I'm guessing they just rise because they spark a debate?
by ares623 on 2/8/26, 9:00 AM
We don't commit compiled blobs in source control. Why can't the same be done for LLMs?
by niobe on 2/8/26, 9:26 AM
by dankobgd on 2/8/26, 12:12 PM
by echelon on 2/7/26, 10:57 PM
I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)
I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
These models are blowing my mind. It's like I have an army of juniors I can actually trust.