Programming Languages and Type Safety in the Era of LLMs
With a sigh, I acknowledge that agentive coding is here to stay, though I still think it’s a bit overrated. Elon Musk, however, is more bullish. He believes that with LLMs, compilers will become obsolete, saying things like:
Things will move, maybe even by the end of this year, to where you don’t even bother doing coding. The AI just creates the binary directly.
And like:
Code itself will go away in favor of just making the binary directly.
The next step after that is direct, real-time pixel generation by the neural net.
Elon Musk is mistaken about this. We will still need programming languages, and good compilers for them, in the era of LLMs. Just as humans do better, more efficient work with these tools, so will LLMs, reasoning systems made in the image of humans. There’s no reason to believe programming languages will become less useful.
Actually, I think Musk is worse than wrong. It’s not that programming languages will be less useful and important in the age of LLMs. It’s not even that they’ll be just as important as before. Programming languages will become more important and more essential.
For a skilled programmer, the quality of the programming language matters less. The better the programming language, the more it takes off the programmer’s plate, but a skilled programmer can compensate for a weaker one. The quality of the output will degrade with lower-quality programming languages, but relatively slowly.
For an LLM, the quality of the programming language matters more. The more we take off an LLM’s plate, the more we focus its context on relevant problems and less on irrelevancies, and the more we provide rigor, the better our outcomes. They’re struggling to keep in their limited context everything that’s going on—and they’re often failing, degrading into slop. Not only do they need programming languages and compilers, and good programming languages and compilers, they need these things more than we humans do.
Why LLMs Won’t Directly Output Machine Code
Writing machine code—or even assembly language—is hard. It’s a problem that requires meticulous attention to detail. It is fraught with counterintuitive pitfalls—footguns, if you will—for performance and correctness. In the spectrum from soft, intuitive creative work to mathematical, rigorous, logical engineering work, it falls solidly on the engineering side.
Writing machine code is hard for humans. For a human to write machine code effectively, they basically have to write out a plan ahead of time as a separate step—effectively a high-level program. Otherwise, there’s too much complexity to manage all at once. Any human’s working memory would be overwhelmed without meticulous planning.
Then, once the human has written their plan—again, at a level of detail where it may as well be a Rust program—they would have to hand-translate that plan into machine code, a step that a compiler could do much faster if the human would just write the program in Rust. And not only would the compiler be much faster at generating machine code than a human, it would generate better machine code: more performant and far less liable to subtle bugs.
To self-plagiarize from my earlier, broader exploration of LLMs:
The Rust compiler has capabilities I don’t. I can’t hand-verify whether a Rust program is correct with 100% accuracy. I could probably manually turn a Rust program into equivalent assembly or machine code given enough time – but it’d be error-prone and my version would be slower, and I’d have to try many times before it would work at all.
In all the ways that I am dumber than the Rust compiler, an LLM is as well, or even more so.
LLMs need explicit structured planning and rigor more than humans. Even if they’re not trying to hand-write machine code, LLMs work better when they write a high-level abstract plan first and hand that plan off to a new instance with a fresh context. Multi-agent systems are all the rage now for exactly this reason. Different parts of the multifaceted process are handed to different contexts and different instances of an LLM, some high level and some low level.
The insight of multi-agent LLM workflows is that specialization is key. These different tasks need different context and different instructions, without distractions from other agents. Different models and different levels of effort might be appropriate for these different individual tasks. And I’d expand that by saying not all of the components should even be LLMs.
For the final, most low-level step, converting the lowest-level plan into actual machine code, an LLM is just as ill-suited as a human. First of all, it will make sloppy mistakes: missing pointer invalidation issues (painful in C++ and even more so in raw assembly), copying offsets wrong, and so on. This type of exact, tedious work is literally what we originally designed computers for, where following an explicit, fixed algorithm with no room for creativity or intuition gets you the best answer every time.
LLMs are how computers do intuitive reasoning. For mathematical rigor, traditional programs, like compilers, packaged as traditional tools are better. LLM products can reliably do arithmetic largely because we give them access to a calculator tool internally. Similarly, they need internal tools for generating machine code.
Ideally, there’d be a mix and match of “intuitive”/“fuzzy reasoning” components like LLMs, and rigorous mathematical engineering-brain components…like programming languages. As more and more of these mixed reasoning systems are built, we’ll get better and better results—if we build effective programming languages.
In the same vein, we could also build harnesses that connect these new programming languages effectively to LLMs: better integration with tooling like LSPs, components for scratch space for internal notes, plug-ins for formulaic mathematical reasoning and visual reasoning, and so much else. But for right now, the key point I want to emphasize is this: LLMs need programming languages and compilers.
What do LLMs need from a programming language?
So, LLMs need programming languages. But what kind of programming languages? To understand what LLMs need from a programming language, we can take a step back and ask ourselves what makes LLMs successful in general. If we do this exercise, we find that LLM-based coding agents do their best when they have some metric to evaluate how they’re doing. Lots of test cases help. Validation helps. (Of course, this is true of humans as well, but LLMs need metrics and validation even more, due to their shallower grasp of context.)
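To make that concrete, here is a minimal sketch (the function and its checks are invented for illustration, not taken from any real project) of the kind of validation signal an agent can optimize against. Pass/fail on checks like these is exactly the sort of metric an LLM coding agent can consult after every edit:

```rust
// Hypothetical example: a small function plus checkable success
// criteria. An agent can rerun these after each change it makes.
fn slugify(title: &str) -> String {
    title
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '-' })
        .collect()
}

fn main() {
    // Each assertion is a tiny, machine-checkable metric.
    assert_eq!(slugify("Type Safety"), "type-safety");
    assert_eq!(slugify("LLMs in 2025"), "llms-in-2025");
    println!("all checks passed");
}
```

In a real project these would live in a `#[cfg(test)]` module and run under `cargo test`, but the principle is the same: the more of the spec that’s written down as executable checks, the better the agent’s feedback loop.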
All of this sounds like static typing, like Rust, doesn’t it?
Programming languages like Rust are hard to use incorrectly. “If it compiles, it works” is a major design goal of Rust. Between the borrow checker and its rigorous but flexible type system, Rust pulls off this goal beyond what many would think possible (for which the Internet abounds with success stories, and any friendly neighborhood Rust programmer would be overjoyed to give you more). The Rust compiler, effectively, comes with so many built-in rules (which would be opt-in lints in other languages) that it serves as its own set of test cases.
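As a small illustration of the kind of rule the borrow checker enforces (the function here is invented for illustration): a use-after-move is rejected at compile time, so a whole class of mistakes never reaches the binary at all.

```rust
// `words` is moved into this function and dropped when it returns.
fn total_length(words: Vec<String>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

fn main() {
    let words = vec!["if".to_string(), "it".to_string(), "compiles".to_string()];
    let n = total_length(words);
    // Uncommenting the next line is a compile error, not a runtime bug:
    // println!("{:?}", words); // error[E0382]: borrow of moved value: `words`
    assert_eq!(n, 12);
    println!("total length: {n}");
}
```

In C++, the equivalent mistake (using a container after it has been moved from) compiles fine and fails, if at all, at runtime.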
Rust and LLM-based coding are therefore a match made in heaven. You may object that LLMs are better at Python and JavaScript than Rust, but that’s comparing apples and mangoes: it’s just an artifact of those languages’ prevalence in the training data. In Rust, I’ve watched the LLM see compiler errors and adjust on the fly. I shudder to imagine an LLM writing C++ and the memory corruption and undefined behavior that would result. If anyone’s writing C++ with agentive workflows, please let me know how it’s going for you.
LLM-written code is slop compared to human code, as every programmer with judgment knows from experience. But there’s a limit to how sloppy Rust code can be. Rust slop just can’t be as sloppy as C++ slop.
I think we’re underutilizing Rust’s advantages in an LLM environment. Ideally, agentive systems would use more LSP integration. If an agent outputs a diff to Rust code, it shouldn’t have to query to see compiler diagnostics. Every time it writes code, it should immediately be confronted with the errors and given an opportunity to fix them, perhaps in a fork of the parent context, where the signatures and documentation of all the relevant functions are also injected. Static languages like Rust make all of this easier.
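One step of such a harness might look like the sketch below: take the compiler's diagnostics (here imitated by a hardcoded sample string in the style of `cargo check --message-format=short` output; a real harness would capture it from the subprocess) and filter them down to the lines worth injecting back into the agent's context.

```rust
// Hypothetical harness step: keep only the diagnostic lines from
// compiler output, ready to be fed back to the agent alongside the
// diff it just produced.
fn filter_diagnostics(stderr: &str) -> Vec<&str> {
    stderr
        .lines()
        .filter(|l| l.contains("error") || l.contains("warning"))
        .collect()
}

fn main() {
    // Sample string imitating cargo's short-format compiler output.
    let stderr = "Checking demo v0.1.0\n\
                  src/main.rs:12:22: error[E0382]: borrow of moved value: `words`\n\
                  warning: unused variable: `n`";
    for d in filter_diagnostics(stderr) {
        println!("{d}");
    }
}
```

A production harness would use `cargo check --message-format=json` (or an LSP's `publishDiagnostics`) and parse structured output rather than grepping text, but the shape of the loop, edit, check, inject, fix, is the same.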
But Rust only covers part of the software engineering space. It’s specialized for systems programming, things like databases, data pipelines, low-level networking, programming language runtimes, and operating system kernels. It struggles with domains like graphical user interfaces and data science, the types of programming that most programmers actually do.
Rust’s elevation of type safety is separate from its focus on systems programming. Haskell, for one, also strives for (and succeeds at) the “if it compiles, it works” maxim, and it has a garbage collector and can be made to shine in a GUI context. It’s too bad it also has an aversion to practical success. Rust brings some of the capabilities of research languages like Haskell to a broader audience, but only for some domains. We need Rust-like programming languages for more domains.
The Future
The work of programming language design, far from ending in the era of LLMs, is just beginning. We need even more Rust-like rigor in more programming languages. And we need agentive harnesses that leverage that rigor and integrate it into how we build the LLM components of these multi-agent systems.
In general, we need to think critically and build intelligently, rather than just throwing more LLMs at every problem and assuming that will be enough—like certain technically out-of-touch business tycoons would do.