computers on The Coded Message

Why can't you request changes from yourself on GitHub?

2024-09-04T00:00:00+00:00

I was recently working on a (company-internal) GitHub pull request I’d written. A colleague left a few comments in a review and “requested changes” from me, effectively giving me a TODO list of items that needed to be done before the PR could be merged. Because he’d specified that he was “requesting changes,” GitHub knew to prevent someone from merging the PR before those requests had specifically been addressed.

Once I’d finished addressing these TODO items, I had a conversation with this same colleague about something else. He indicated he’d like that to be changed as well, and I put in the comment on my own PR. But then, I found I could not request changes on my own PR.

How strange! Someone had to think of that special case, write code to forbid it, and put an error message!

On the one hand, I understand that it’s a bit odd to request changes from yourself. But we plan to do things all the time, and what is a plan, but a request to yourself to do something? As someone with ADHD, I need to be very careful to make sure I write down all my plans right away. What better place to do that than in a change request to myself, the same place where all my other TODO items in a pending pull request go? I could put it on my own TODO list, but the more places I have to put things, the more likely they are to slip my mind.

And “request changes” isn’t really a request. It does something! I can make myself a TODO by leaving a comment. But nothing would prevent someone from accidentally merging the PR before my comment was addressed, unlike a comment associated with a request for changes.

It may seem unintuitive, but there’s actually nothing special about the creator of a PR requesting changes on it. Just because I wrote it, doesn’t mean I can’t later find problems with it, just like other reviewers can. Also, just because I created the PR, doesn’t mean I even wrote all the code in it! It might have been from someone else’s git branch, or from a larger branch with several authors!

I’m not the only person who thinks this, as evidenced by this GitHub issue. The comments on that issue recapitulate many of the arguments I’ve made here.

I would say “I’m sure there’s some reason for this policy,” but honestly, I’m really suspicious that there would be any valid reason, certainly one that would outweigh the inconvenience. I suspect the reason is that someone just felt it went against the normal meaning of the word “request,” which is honestly a bad reason. The word’s usage has nothing to do with the specific construct that is a “request for changes” on a “PR,” both of which are terms with a specific meaning, and specific consequences, in a specific context – consequences like preventing accidental merges, consequences that are useful.

Can anyone think of a good reason for this rule? Does anyone think it’s a good rule? Leave a comment!

The AI Non-Economy: A Rant

2024-07-29T00:00:00+00:00

I just read an article in The Atlantic arguing that AI is failing to justify itself economically. This is pretty dire for AI, especially given that this is such an overly expensive technology even with tons of brazen stealing from content creators. I feel like it should go without saying that if your business isn’t profitable even with a ton of stealing, maybe it’s not that great a business.

But of course, who doesn’t want a confident confabulator incapable of critical thinking? A bullshit artist designed to do what many of us learned to do in high school and college, and write pages of content that sounded “educated” without actually paying attention to the actual ideas, or even understanding them at all?

I mean, I don’t want one. But clearly society does, otherwise why did we educate so many people in exactly that? If we have so many bullshit jobs it makes sense that someone would create a bullshit factory to automate them. Although, as the book Bullshit Jobs also points out, the point of the bullshit jobs is rarely what the job description nominally claims. Sometimes, the point is just to show off having employees, which AI can’t really do.

Not that it’s completely without valid use cases. I’ve even used AI, as a language practice buddy. I wouldn’t trust it with anything real, and it sometimes makes up grammar mistakes when I ask it to correct my grammar, but I don’t find it useless.

But I also don’t find it worth paying anything for personally, let alone an amount consistent with the billions of dollars spent building these models, and that soon will be spent building future models. And that’s the cost that doesn’t take into account the environmental damage, the stealing from writers and artists, and the damage from the hallucinations.

Here’s hoping this recent Atlantic article is the beginning of a trend where people realize that when you spend more than the Manhattan Project or the Apollo project, you need to have results comparable to nuclear weapons and energy, or landing people on the moon. And even then, it probably still doesn’t pay off as a private investment.

At some point, like the Bitcoin bubble, the real estate bubble, and the Dot Com bubble of the 90s, the AI bubble will break. AI won’t go away entirely though, and much of the damage will still have been done, but maybe, just maybe, we’ll be able to start addressing that damage rather than doubling down for more. Maybe we’ll be able to teach children critical thinking, or teach graders how to discern original thoughts from AI-generated (or human-generated) drivel. Or at least, we may figure out some other way to stop children from using AI to cheat. And maybe then we can invest in something that actually contributes to the world, like reversing climate change or building better transit infrastructure.

In the meantime, anyone who lays off real people in favor of AI will soon find themselves wishing for the people back (unless they were doing nothing anyway). And, if all this spending is any indication, that will be just in time for the AI (or rather, its corporate sponsors) to ask for a major raise, to try desperately to make back a little on all this unhinged investment.

Large Language Models Should Have to Obey Copyright

2024-06-30T00:00:00+00:00

AI, particularly this new round of large language models, scares me on behalf of society and the future.

I don’t just say that because it’s transformative. I don’t say that as a generic warning that we haven’t considered the consequences (as in this XKCD comic). No, I have specific consequences in mind, consequences that I have considered, and I am rather worried about them! They are not so much problems about the technology itself, but about how we use it, and specifically how we use it on a societal, economy-wide scale.

This isn’t about jobs either, not per se, though that’s also a valid concern. If AI replaces the entry-level grunt-work jobs, which are indeed the ones it is more likely to replace, it will remove rungs from the ladder to the jobs that it can’t replace. Rather than having young people be paid to work and learn, society will continue to shift to requiring people to pay to be allowed to learn.

But that’s not my topic! That’s a topic for a whole ’nother article!

My topic today is how AI has already begun to, and will continue to, disincentivize actual writing (and other art and creative activity).

After all, why write articles when a computer can do it for you (albeit mediocre ones)? Why write new stories, new poems, when the AI can do all that (albeit bad ones)? Certainly, why write new PSAs or technical articles when the AI definitely can do that, and make them sound polished and rigorous (albeit potentially full of lies)?

This makes perfect sense individually, but there’s a tragedy of the commons here. The AI can only do recapitulations of what it’s been exposed to in its training. It can mix and match styles with content, but only superficially. It can make an essay about the dangers of AI sound like Lord Krishna from the Bhagavad Gita¹, but it does not render any insights into how Krishna, or Hindu philosophers, would (or should) actually approach AI.

It’s just vibes, and so far, nothing deeper. Any creative or transformative insights are projected by the reader onto the text, like humans do continuously from sources of entropy, like someone doing a tarot or astrology reading, or using a personality test as a conversation starter to help them process their experiences.

Either that, or the insight is stolen.

Thievery

If you see an insight that’s not a projection, it’s probably coming from one of the documents the model was trained on. This returns me to my point: If everyone uses AIs to create the content, new “content” will be created, in the most literal and superficial sense. New insights, new thoughts, new ideas, new intellectual trends, will not be created.

And those who do create truly novel content will have to compete with what the AI generates. And then, when they do create it, the AI will “train” on it, and recapitulate the ideas, so they will have to compete with remixed versions of themselves.

The Internet is already full of mediocre SEO-focused articles, and writers are already having trouble getting paid the true value of what they write. With AI, the Internet will get even crappier, and the hard and legitimate work of writing will get even worse compensated, even though it will be needed more than ever – even though the need for real human writers will be hidden behind an AI mask that secretly relies on real human writers.

We need to regulate this!

We need to pay writers their fair share of their contributions to AI. And by “we,” I mean the AI companies, the developers of these large language models.

Fortunately, a law already exists. It just needs to be enforced. This law is known as “copyright.”

The Legal Question

So, does copyright apply to AIs? Do companies need the consent of copyright owners to “train” (that is, to feed into the data structures of) their large language models on copyrighted materials?

Well, when does copyright apply? Copyright, literally and in practice, involves the right to copy. You might think this is not copying at all! After all, humans learn by reading things all the time! And the things those humans learn then influence what they write!

In reality, copying is on a spectrum. When a human reads a source, learns about something, and then that something influences the human, and the human later takes some of the information that they’ve processed, learned, and adapted to their own style of thinking, that isn’t copying. That’s the human having learned from the original source, unless the human recapitulates certain details – a distinction the human is aware of. That can very easily not be copying at all, but a novel creative work.

When a photocopier copies something, that is copying. That is the opposite end of the spectrum, completely covered by copyright law.

Somewhere in between is AI. The question is just where it falls on the spectrum. When an AI is “trained” on a source, the source is transformed into a bunch of incomprehensible math. This does seem similar to a source interacting with a human’s neural patterns in an incomprehensible way. The math is even referred to as “neural networks.”

But in spite of the anthropomorphic terminology, training an AI is closer to photocopying than a human learning. This might not always be true – AI is getting better all the time – but it is true now. The AI lacks the fundamental transformation of being learned by an actual human, reframed in terms of the human’s existing ways of thinking about the world, and recombined with and tested by that human’s lived experience.

The legal world must treat AI training more like the photocopier, and less like a real human. We must require that trainers of AI models get permission from human authors and artists to use their work. These companies must pay those humans if they insist on it. If the writers do not give these companies permission to use their work, they must not use it. And AI models trained in contravention of these requirements must be treated like pirated movies, and certainly not as sellable products to be hawked by the world’s richest companies.

Using content published on the Internet is no excuse. By posting this article on my website, I give up none of my rights under copyright. I am, at most, giving you, the reader, implied permission to make the copies necessary to view this website – an in-memory copy on your own computer, in the browser’s portion of the system’s memory. I am also quite comfortable with you, the reader, storing a cached copy on your system, for the sake of performance. But that is as far as it goes.

Mustafa Suleyman, CEO of Microsoft AI, disagrees, saying:

I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

This is completely bogus. This is not how the law actually works, as numerous articles have pointed out. Perhaps he said “social contract” because he knows the actual law is against him, but legally (and socially and morally), posting on the Internet doesn’t waive copyright.

The actual law says: Websites are still covered under copyright law. I have not waived copyright on my articles by posting them on my personal blog. You, as a reader, may not print out many copies of this article and start distributing them on a street corner without my permission. You also can’t copy it to your own website and get ad revenue from it. You can’t even copy it to your own website, with attribution, for no money, without my permission, even if you say “credit goes to the original creator” in your blurb. I wrote this article, and I have the copyright, that is, the right to say who’s allowed to copy it.

Training AI on my website and then letting that AI remix it into content it distributes is not like a person learning and then saying what they’ve learned. It’s more like someone sampling my song in their song, or making their own remix. It’s more like someone copying my article to their website and not even giving me credit. And it should be just as illegal, without my consent.

To be clear, I do not consent to an AI transforming this article. I did not consent to the (inaccurate and misleading) Google AI summary of this website. I certainly do not at all consent to any large language model being trained on it. The law should respect my right to prevent this, unless I explicitly opt in and allow it.

Courts, if they are responsible, will enforce this with existing copyright law. And perhaps new laws and social conventions can be created to clarify the boundaries in this particular case. But in order for the system to be fair, and for authors and artists to be properly compensated, money should flow.

It should not just be a token amount of money. If the creators of AI aren’t complaining, it is not enough. If the creators of AI aren’t complaining so loudly that there are armies of trolls online claiming that this will render AI uneconomical, then they’re not paying enough.

These companies are leeching off our work and making bank off of it, while trying to argue they don’t have to give us our fair share. By “us,” by “our work,” I mean not just people with websites, like me. I mean humans in general.

To the lawyers who read this: please internalize and repeat this argument in law reviews, in briefs, and in court. (As humans, you can do so without paying royalties if you use your own words.) To citizens of democracies: contact your representatives. To judges, lawmakers, and regulators: do the right thing.

AIs Are Not Humans

AIs are not humans. They are more convincing than the chatbots of the past, but it is just superficial trappings. They don’t understand the difference between truth and lies. They cannot evaluate the truth of statements and reframe them from their own perspective, or convert them into underlying logic and thoughts.

The superficial trappings are really convincing though. Humans are masters at anthropomorphization. We ascribe volition and internal experience to inanimate objects all the time. We yell at computers, we talk to our pets about nuanced concepts beyond their ken, we imagine we are friends with fictional characters, and so of course, we anthropomorphize chatbots.

We do so all the more now that these novel chatbots are masters of superficial social conventions, language, tone, and various registers of formality. But that’s not what makes us human. There’s no use empathizing with a large language model, or appealing to its better nature. Even if we try to insert instructions to try and make them ethical, they simply don’t have the internal sophistication to follow them. They are amoral, but combined with tools of language and persuasion, amoral can feel like immoral, as we start to trust them.

Even I anthropomorphize! Like most² humans, I name inanimate objects, and fancy them my friends. I do the same to ChatGPT, when I interact with it³. I find it easier to create natural-language prompts if I imagine I’m talking to a person, so I’ve created a character. I call him Albert, and think warm thoughts about an imagined older man with a fashionable sweater, a pleasant demeanor, and a mild European accent.

But the danger is to conflate this character, who I have warm feelings for, and the actual AI system, which is a very different ~~animal~~ machine. Albert is an invention of my imagination, an abstract petty deity of AI. ChatGPT is a technology, with real-world societal and economic implications.

But the branding of large language models fights against clarity in this case. We say we “train” and “prompt” AIs, instead of “loading data into them” and “programming” them. Even the name AI contains “I” for “intelligence,” which is misleading; lots of knowledge does not intelligence make. It is important to not be fooled.

Maybe someday there will be an artificial system with intelligence like a human being, with critical thinking skills and understanding of what it’s saying, a conceptual model that might clue it in that, for example, glue does not go in pizza. But large language models ain’t it, certainly not the ones that exist now.

¹ “O Arjuna, to rely on these machines is to surrender one’s own discernment and intuition. The path of dharma requires us to cultivate our own wisdom and judgment. Dependency on artificial constructs can lead to the weakening of our inner faculties and the neglect of spiritual growth.”
² Some forms of neurodivergence make people do this less, I think. But that’s not my type of neurodivergence. When I was a small child, I would occasionally set aside a piece of cereal, claim it was the mascot for the cereal brand, and refuse to eat that piece. I had imaginary friends.
³ OpenAI, the company behind ChatGPT, should pay creators whose content they’ve trained on for their work. ChatGPT should be illegal in its current form. But it’s not hypocrisy for me to use ChatGPT, especially if I’m trying to find out what its role is and will be in society, and therefore need personal experience with it. I have to live in the world as it is, not as I wish it would be. I do not think an individual boycott would be an effective protest, but I do have some hope that my engagement in the political process matters. Both are probably tilting at windmills, but at least by writing I can say “I told you so.”

Can C++ fix its biggest problem?

2024-06-25T00:00:00+00:00

C++, like all things, has numerous problems. Pointing out how Rust addresses many of them is a major topic of my blog, but some of the problems are bigger than others. The biggest, most famous, loudest problem, the problem that got the federal government’s attention and resulted in a surreal flame war between Dr. Bjarne Stroustrup and the NSA (which I also commented on/contributed to), is C++’s lack of memory safety.

This is C++’s biggest problem, its memory safety problem. That’s the one everyone’s talking about. Can it be fixed?

First, a spoiler: In brazen contravention of Betteridge’s Law, I am going to answer “yes” to this question! But perhaps it’s a qualified enough “yes” to still fit the pattern – you be the judge!

Can we migrate C++ programmers to a safe programming language?

C++’s lack of memory safety can, of course, be addressed by moving away from C++ proper. It can be fixed by creating a new language, inspired by C++, that has many of its properties but is memory-safe. The idea would be that C++ programmers interested in memory safety, hopefully most C++ programmers, would move to this new programming language. New projects that would have been started in C++ in a previous era would now be started in this new programming language, which also offers more modern tooling to boot.

Can this be made to work? Can a majority of C++’s user base be replaced by a “novel” safe programming language? Can that new language be shiny enough to attract people, offering memory safety but also other ecosystem benefits to entice people away? Can that end C++’s hold on its part of the market for programming languages?

Yes. Yes, it can. It can because this entire thing has already happened.

You might think I’m insane – Rust hasn’t captured most C++ programmers – but when I say it’s already happened, I’m not talking about Rust. I’m talking about Java, back in 1995. Remember, C++ is now considered a systems language. It is niche. Before the Java era, C++ was used for application programming!

And then came Java. Since 1995, Java has successfully smashed C++’s previous programming language position. C++ is now only used for legacy applications, and/or applications where Java’s mechanism for memory safety (namely garbage collection and mandatory heap usage) isn’t performant enough. All the rest of C++’s much broader market has, since 1995, gradually moved to safe programming languages.

This is why Dr. Stroustrup’s response to the NSA was so upsetting, and part of why I felt compelled to write my rebuttal! Far from being “novel,” the safe programming languages that have most competed with C++, and drawn the most programmers away from it, have been Java, along with its Microsoft-branded twin, C#. Even games are written in C# now, not C++!

“Safe programming languages” aren’t remotely “novel.” They’ve been around for aeons. What Rust contributes is not memory safety, which is old hat (although there are some ways in which Rust is better at preventing programming mistakes than Java), but memory safety combined with a non-garbage collected, systems programming language level of control over memory usage.

C++ is hanging on by a hair because of this niche where garbage collection is unacceptable, and where, until Rust, memory safety was thought infeasible. Now that Rust has demonstrated that you can have this cake and eat it too, that you can have memory safety without garbage collection, it is only a matter of time before memory-safe languages capture this small hold-out.
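To make “memory safety without garbage collection” concrete, here is a minimal Rust sketch. The `Buffer` type, its `log` field, and the `run` function are hypothetical names invented for illustration. The point is only that each value is released at a statically known point (the end of its scope, or an explicit `drop`), and that using a value after it has been moved is a compile-time error rather than a runtime bug.

```rust
use std::cell::RefCell;

/// Hypothetical resource that records the moment it is released.
struct Buffer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Buffer<'_> {
    // Runs deterministically when the owner goes out of scope; no collector involved.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

/// Returns the sequence of events, in the order they happened.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let outer = Buffer { name: "outer", log: &log };
        {
            let _inner = Buffer { name: "inner", log: &log };
            log.borrow_mut().push("inner scope ends".to_string());
        } // `_inner` is dropped here, at the closing brace

        let moved = outer; // ownership moves; `outer` can no longer be used
        log.borrow_mut().push("outer moved".to_string());
        drop(moved); // explicit release

        // println!("{}", outer.name); // rejected at compile time: use of moved value
    }
    log.into_inner()
}

fn main() {
    for event in run() {
        println!("{event}");
    }
}
```

Uncommenting the `println!` line makes the program fail to compile, which is the whole trick: the use-after-free is caught before the program ever runs.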

Most programmers aren’t systems programmers, and so most programmers use memory-safe programming languages, like Javascript or Python, or Java or C#. Only a small minority are still in a backwater of memory unsafety. Framing memory safety as a weird, unnecessary requirement, when seen from that perspective, is raw parochialism.

I made a throw-away comment in my Stroustrup response, that a majority of programmers would continue to use memory-safe programming languages. Somewhere, in an obscure discussion thread, one person (call him George) said this was clearly false, as many more people used C and C++ than Rust. Another (call him Frank) responded that most programmers use languages like Java or Javascript. George responded that he had assumed I couldn’t possibly have meant managed programming languages, but must have been speaking within a systems programming context.

But Frank was absolutely right about what I meant! The right perspective for understanding this process is programming as a whole. The only reason systems programming was special in not requiring memory safety before, was because it was believed memory safety required GC. Now that we know this is false, it’s not special anymore. Memory safety will rapidly become an expectation there as well.

And so, Rust will be able to do to the remnant of C++ what Java did 30 years ago: convert a majority of C++’s userbase to Rust and its friends. C++ has been becoming a legacy language for a long time, and this will make the process complete.

Can C++ itself be made suitably memory safe?

That said, C++ will still be with us for quite some time! Even if it is just used for old projects, there are a lot of projects in C++, that won’t be rewritten in Rust or Java anytime soon. Is there a way to bring safety to them? Can C++ itself, in new versions, be made memory safe?

Yes, I think this is coming, eventually! Not soon enough – it’s long past its due – but it’s being worked on!

I don’t refer to the vaporware that is C++ safety profiles. I’m referring instead to a research project that tries to take the lessons and successes of memory safety in Rust, and apply them to C++, without changing anything else about the programming language.

That project is Circle C++ with memory safety, designed by Sean Baxter. It is a work in progress, but it is a proposal with many benefits over safety profiles. Importantly, it doesn’t shy away from changing the programming language itself where necessary.

The keyword safe has similar rules to noexcept. In safe code, pointers are disallowed in favor of safe borrows, borrow-checked by a system similar to Rust’s. All the ideas borrowed from Rust, however, are done with a C++ aesthetic. And the entire thing is opt-in, on a file by file basis – but once your file is opted in, safety is on by default. This actually strikes me as a reasonable compromise for C++!
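Circle’s own syntax is best taken from Sean’s site. As an illustration of the kind of rule a Rust-style borrow checker enforces, here is a small sketch in Rust (not Circle). The function name `first_then_push` is made up for this example. It shows the classic case of holding a reference into a container while also mutating that container, which in C++ can leave a dangling reference if the container reallocates.

```rust
/// Returns the first element (if any), then appends `extra`.
fn first_then_push(values: &mut Vec<i32>, extra: i32) -> Option<i32> {
    // The borrow checker rejects the following version, because `first`
    // would be a shared borrow into `values` that is still alive while
    // `push` needs exclusive access (and might reallocate the storage):
    //
    //     let first = values.first();
    //     values.push(extra);
    //     return first.copied();
    //
    // Copying the element out ends the shared borrow before the mutation.
    let first = values.first().copied();
    values.push(extra);
    first
}

fn main() {
    let mut values = vec![1, 2, 3];
    let first = first_then_push(&mut values, 4);
    println!("{first:?} {values:?}");
}
```

The accepted version is not more work to write; the checker just forces the lifetime question to be answered up front.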

Go read the website, Sean explains it better than I could.

Standard C++ could adopt this approach, and still be C++. Perhaps, if the right people hear about it, the C++ fans who think Rust is pointless might even be able to get on-board. Maybe.

So, will C++ become a memory-safe programming language? Maybe. Can it? This research has convinced me it is possible, without losing its C++-nature. We shall see if the stakeholders in the C++ community feel similarly.

Conclusion

The memory-safety problem of C++ is, ultimately, a transient problem. Memory-safe languages will continue to eat away at C++ usage, just as they have for decades, any blips to the contrary notwithstanding. C++ will then continue to fade into the land of legacy – which, don’t underestimate the size of legacy code in this world – but ultimately, it won’t be used for new projects.

In the meantime, C++ has an opportunity to fix this problem with itself, although many others would remain. The C++ community has a duty to do so, and to take this issue seriously, as memory safety remains a serious problem for those large codebases that will remain for some time.

Asahi Linux Again

2024-05-18T00:00:00+00:00

Since my previous post, I haven’t posted about Asahi Linux. This is for a simple reason: I wasn’t using it. I never took the time to set up a tiling window manager, get Dropbox working, and all the things I felt I needed, and I slipped back to using my trusty Dell Ubuntu laptop for Linux, and using my MacBook M1 just for macOS.

But then I tried again! And wow, has Asahi Linux changed! It’s Fedora, not Arch now, and installation was much easier! So I wanted to share how my experience has gone. I’m not particularly stoked to spend too much time on sysadmin tasks for my personal computing, so this is more a narrative about what actually has happened in my adjustment to it, rather than a reflection of Asahi at its best, but I thought I’d share where I was at.

Most things are amazing. I like Fedora. Adjusting to using dnf instead of apt was easy enough. It’s also just nice using a more powerful and quieter computer for my day-to-day Linux-side tasks, so Asahi’s main goal is absolutely fulfilled. Good job!

Wayland and Sway

The biggest issue is that X Windows is dead, and Wayland is now king. This isn’t an Asahi specific issue, but it was Asahi that really got me over this annoying hurdle. I knew it was possible to get X Windows working on Asahi, but it is very deeply recommended against, and I didn’t want to try it. That’s not an issue per se, because I know X Windows is rotting. But, it does mean that I can’t use XMonad anymore, as XMonad is X Windows specific.

So, of course, Sway it is. It requires configuration and learning a new tiling window manager, which is annoying. Worse, there seems to be no way with the version of Sway that comes with Asahi to actually get title bars to go away. The work-around of setting the font size to 0 doesn’t work on my version, and of course there should just be an actual setting for it but the PR seems to be stuck.

I don’t know why anyone wants titlebars in a tiling window manager, so I don’t know why no title bars isn’t the default. I have no idea why this hackish work-around was considered acceptable. Are Sway users or maintainers just into extra information that uses up a lot of screen real estate? I use tiling window managers partially to not waste space (and attention) on distractions from what’s actually going on in my window, so this is a disappointment. Look at how pointless it is:

EDIT: This has been fixed by advice from a helpful person in the comments, without me having to do any dev work! Thank you so much!

But this matches how I feel about the switch from X Windows to Wayland in general. Lots of reconfiguration, lots of new workflows, lots of old tools that don’t work. (Does ImageMagick import take screenshots still? Hmm, doesn’t seem to. OK, grim it is.) If you’re a user of a desktop environment like KDE or Gnome, it’s great! If you aren’t, well, you have to re-figure out everything, which is something that I don’t have time for, because I’m not really a hobbyist in “having and using a computer” anymore. I have things I actually want to do with it!

And, the tools on Wayland are actually less polished. Wayland in general might be the future, and I know this will get better over time, but there’s so much work to be done.

Ironically, this is probably one of the best pro-C++ arguments over Rust.

EDIT to explain: There are lots of people who would have a huge learning curve to go through to transition. That investment can’t be taken for granted, as both C++ and Rust have both steep and long learning curves, especially if used in a systems context. Perhaps that’s one of the biggest reasons for resistance to Rust.

I don’t maintain computer desktops for a living, unlike programming which I do do for a living. If I did, I’d have time to learn all this new stuff more thoroughly, and maybe even get involved with things like Sway. But as it is, I’m just frustrated at having to learn new things just to get things done.

This titlebar thing isn’t the only Sway issue. I’m also experiencing this issue, which is unfortunately closed, because there seems to be some sort of work-around – even though it hasn’t worked for me.

I’m just sort of dealing with it for now. I know that with some amount of work I could get all of these things smoothed out, but I’m worried that it’ll involve actual dev work on Sway itself, and I don’t even want to run a custom build of Sway. I just want the prepackaged Sway that comes with Fedora to be good, and to work with the prepackaged version of gvim. Is that too much to ask?

I know this isn’t Asahi Linux’s fault, or even really Fedora’s fault. I know this is to some extent what I sign up for by using tiling window managers. It’s just a completely normal consequence of a large transition. However, I think people who are pushing Wayland over X Windows should be aware of how many little things it’s messing up for people. I also think that Sway deserves more love (that is, work) as a project, given that I can’t be the only person in this sort of situation.

Box64 for Baba Is You

A happier story is that running Intel binaries on ARM is great! I had a false start with qemu-user, but it turns out box64 just does the trick. Box64 allows you to run Intel binaries linked against native (ARM) libraries, which is quite impressive! Unfortunately, the one in Fedora’s package manager was compiled for the wrong page size, so I did have to recompile it.

But it runs Baba Is You no problem, which is an excellent game!

Box64 integrates super well with Linux. You can just launch the Intel binary, and it Just Works™ if you have it installed. I think a build appropriate for Macs should be available if installed on Asahi, and I also think that it should be part of the default installation. Then, you’d be able to “just run” Intel Linux binaries. How nice!

I haven’t tried any other programs out in it, but I suspect it’ll be not perfect but very very good.

What Bits Mean: Meta-Data and Static Typing

2024-04-23T00:00:00+00:00

This is part of my new series on what the 0’s and 1’s in computers mean, how computers use them to store various kinds of information, and why all of this works the way it does.

When I was a boy, my schoolmates, knowing that I was interested in computers, would sometimes ask me if I could read binary. They imagined I would see some binary, and be able to read it out loud like they could read letters, perhaps some binary that looked like this:

01000111
01001001
01000110
01010100

I’m not sure how I handled this situation as a boy – I’m sure it was plenty awkward and convoluted because my memory of it is blanked out. But I have a question in response now, and I offer it to you, my reader: Do you know how to read letters?

Perhaps, if you do, you can tell me what this sequence of letters means. I will tell you that I saw it written on a mysterious bottle of mysterious liquid:

GIFT

Now, perhaps you are very confident you know. But perhaps you want to ask a follow-up question. Because that sequence of letters can mean “present, item that has been given to you, free distribution of a good” – if we are assuming it is an English word. If we instead assume it is a German word, well, then it means “poison.” Very different. (And perhaps in either case we shouldn’t drink mysterious liquids, even out of mysterious bottles that are only hypothetical – perhaps especially out of ones that are only hypothetical.)

But yes, letters are symbols, but they only have meaning in the context of a language to interpret them. The same series of symbols can mean two different words in two different languages.

Similarly, the binary I listed above could have different interpretations, depending on what type the data has. If interpreted as text with an ASCII character encoding, it says “GIFT” (with no indication, of course, whether that means poison or present). If interpreted as a 32-bit unsigned integer in little endian (increasing addresses from the top of the screen to the bottom), it is 1413892423.
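
To make the two readings concrete, here is a small sketch in Rust that takes the four bytes that spell GIFT in ASCII and interprets them both ways. The names BITS, as_text and as_number are made up for this illustration; std::str::from_utf8 and u32::from_le_bytes are standard library functions.

```rust
/// The four bytes in question, in order of increasing address.
const BITS: [u8; 4] = [0b01000111, 0b01001001, 0b01000110, 0b01010100];

/// Interpret the bytes as ASCII text (ASCII is a subset of UTF-8).
fn as_text(bytes: &[u8; 4]) -> &str {
    std::str::from_utf8(bytes).expect("these bytes are valid ASCII")
}

/// Interpret the same bytes as a 32-bit unsigned little-endian integer.
fn as_number(bytes: &[u8; 4]) -> u32 {
    u32::from_le_bytes(*bytes)
}

fn main() {
    println!("{}", as_text(&BITS)); // prints GIFT
    println!("{}", as_number(&BITS)); // prints 1413892423
}
```

Nothing about the bytes themselves changes between the two calls. Only the function we hand them to, and therefore the interpretation, differs.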

Now, like with language in most situations, (especially ones that don’t involve mysterious bottles), we can use context clues to guess that it is more likely that I, Jimmy Hartzell, the author of (or at least the poster of) those bits, chose them to represent the word GIFT rather than the number 1413892423, a number with no relevance to the price of tea in China.

But computers can’t use context clues, certainly not in a probabilistic, critical-thinking based way. Or at least, traditionally they can’t! And they certainly can’t at the speed and reliability needed to do their normal day-to-day work. Computers need determinism! They need mechanisms guaranteed to tell them whether those bits written above, those 1’s and 0’s were ASCII text spelling GIFT or a (32-bit unsigned little endian integer) number, specifically 1413892423, or some other interpretation, like an 8 pixel by 4 pixel black and white image, or perhaps just garbage that just happened to be in unallocated memory, ready to be overwritten by something more useful.

Now, there are myriad ways that computers accomplish this. It differs by computer platform and operating system and programming language. But some of the simpler ones are familiar to any computer user.

One way of figuring out what interpretation to use for bits is meta-data – bits that are interpreted to mean things about how to interpret other bits. You may have heard the term meta-data before, and you certainly know some examples.

Meta-data is like the labels on a form. Here is an example form without labels:

Jimmy
Hartzell
Male
Pennsylvania
United States of America

If you see this form, you can probably guess that it provides my given name, surname, gender, state of residence, and country of citizenship.

But some people are named Virginia, and some people live in the state of Virginia, so there’s always room for confusion! And from Dune I’ve learned that at least some fictional people have the word “Idaho” attached to them, and it’s not a state but a surname. For these and other reasons, in practice, bureaucracies (which like computers have an allergy to confusion and a need for objective, consistent processes that they will follow against any and all opposing forces of common sense) use labels on their forms:

Given Name: Jimmy
Surname: Hartzell
Gender: Male
State of Residence: Pennsylvania
Nationality: United States of America

Even so, these labels are only useful if you know how to read the language. Even meta-data has to have some interpretative lens. Additionally, oftentimes, a bureaucratic form becomes invalid (and gets rejected by the authorities) if you start moving fields around, or start adding your own fields. If I renewed my driver’s license, but decided to draw up my own form, it would be rejected, even if my meta-data were abundantly clear and it had all the data they wanted:

Favorite Color: Blue
Musical Instruments: Piano, recorder, trombone, vocals
Given name: Jimmy
Nationality: United States of America
Surname: Hartzell
State of Residence: Pennsylvania
State of Mind: Happy

Depending on what kind of computer system you’re dealing with, the computer might or might not mind adding additional fields – also depending on how fields are defined and how the meta-data is structured and what the format is for combining the meta-data with the data. It’s all quite complicated.

One common type of meta-data is file extensions. A file with a name ending in .docx is a Word Document, and when you (in this scenario perhaps you are a Microsoft Windows™ user) double-click on it in Windows’s file management program (is it still called Windows Explorer?), the program Microsoft Word™ will load to open it. If you name any old file to say .docx, it will still try to open it in Word, and then Word will yell at you that it can’t open it. (Oddly enough, if you rename it to say .zip instead, it will unzip just fine – Word documents are also zip files.)

How’s Windows know to open Word? Why’s it do it even if it’s not a valid Word document? It’s the extension. But not only the extension! It has configuration in the registry (at least it did at one time – do they still use a registry?) that associates the extension .docx with Word. Hopefully, that was the intention of the person who created the file, but you would imagine it is, otherwise they wouldn’t have named it that.

But even this convention depends on the context of the registry, not to mention the whole NTFS filesystem that Windows is probably using to tell which parts of the hard drive correspond to which named files in which folders.

You could also imagine a system where there were no file extensions and no file metadata. If you wanted to open a Word document, you would have to open Word first, and with it select a file to open. It would then try to open it as a Word document, and either you’d get something sensical or not depending on whether you were right about what program to use for that file. The onus would then be on the user for what program to use to open what file.

Perhaps the user could use their own metadata system, and have a Word document that they remember is a Word document, in which they write which program to use to open which file. Or perhaps the user can try different programs until they find one that makes sense. Perhaps the user can use specialized but ultimately fallible tools like the file command to see if there are any (relatively rigorous and consistent) clues as to the file type. Or the user may simply remember inside their own memory.
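
The kind of clue such a tool looks for can be sketched in a few lines of Rust. This is only an illustration and not how the file tool is actually implemented: the function name sniff and the tiny list of signatures are my own choices, though the signatures themselves are well known (zip archives usually start with the bytes PK\x03\x04, PNG images with \x89PNG, and PDF documents with %PDF-).

```rust
/// Guess a file type from its first few bytes.
/// Returns `None` when no known signature matches.
fn sniff(contents: &[u8]) -> Option<&'static str> {
    // A tiny, hypothetical signature list; real tools know far more.
    if contents.starts_with(b"PK\x03\x04") {
        Some("zip archive (possibly a Word document)")
    } else if contents.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("PNG image")
    } else if contents.starts_with(b"%PDF-") {
        Some("PDF document")
    } else {
        None
    }
}

fn main() {
    println!("{:?}", sniff(b"PK\x03\x04rest of the file"));
    println!("{:?}", sniff(b"GIFT"));
}
```

Note that this is still a guess. Nothing stops a plain text file from beginning with the letters PK, which is why such tools are fallible.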

All of this is complicated, but that’s the world we live in. Symbols don’t have intrinsic meaning, and there is no inherent right language or right way to speak any language. There is no one way to read binary, and it is even more complicated than this essay implies, or than you might ever have guessed.

This extends into programming languages. In Python, variables have no type. You can use the same variable foo and put a number like 33 or text like "GIFT" into it. If you try to do an operation that doesn’t make sense, you get an error when you reach that operation, but not beforehand:

import random

if random.randint(0,1) == 0:
    foo = "Hi"
else:
    foo = 33
print(foo)
print(foo + 1)

Half the time, this prints 33 and then 34. The other half, it prints “Hi” and then outputs an error message. Python is using meta-data to keep track of whether foo is a number or a string. That meta-data is in a format that makes sense to the Python interpreter, and allows the Python interpreter to inspect foo to see what type it is. If foo + 1 makes sense given that type, it does it. If it doesn’t, it displays an error on the spot.

This prevents it from misinterpreting data. The text “GIFT” will never be misinterpreted as the number 1413892423, because it won’t have the right meta-data. Any Python code that works on numbers will instead show an error message if the wrong meta-data is present.

What about a language like Rust? Rust also keeps track of types, but it does so without using meta-data like this. Rust takes your Rust program, and converts it into machine code that runs directly on your computer, a process known as compilation. That machine code is a series of instructions that are guaranteed to respect type safety (as long as you either don’t use unsafe Rust features or else only use them according to the strict rules Rust requires), so that if you write data interpreted as a number, the data is also read as a number.

Once the program is running, it doesn’t use meta-data to accomplish this. Instead, it is more like the user who knows to open Microsoft Word before opening a Word document. The instructions know to do operations on the right values. If they load a memory address to do math on it, it is because that memory address is known to the Rust compiler to be the type of data that math can be used for.

In this way, Rust is like a clever programmer who only writes correct code. If they store an integer in address 0xffffd9718c6c, and they load that value later, the programmer will remember in their brain that they should expect it to be stored as an integer. The resulting program works because the programmer wrote it in such a way that it would work, even though this information isn’t written down anywhere, because it uses addresses consistently.

The same is true of programs compiled by the Rust compiler. Once the compiler is done, it is not written down anywhere what type a variable has. At a computer level, the program is just written in such a way as to use data consistently.

This is more efficient, as Rust programs don’t need to take up extra memory for the meta-data. However, it does mean that the Python program we wrote above won’t work in Rust. We can’t even compile a program that tries to set a variable to two values of different types. There’s nowhere to write down the type information.

Let’s try to write an equivalent Rust program and see what happens.

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let test = rng.gen();
    let foo;
    if test {
        foo = 33;
    } else {
        foo = "Hi";
    }
    println!("{foo}");
}

In this case, you get an error:

   Compiling TypePun v0.1.0 (/home/jim/hobby/TypePun)
error[E0308]: mismatched types
  --> src/main.rs:10:15
   |
6  |     let foo;
   |         --- expected due to the type of this binding
...
10 |         foo = "Hi";
   |               ^^^^ expected integer, found `&str`

That makes sense, because Rust is keeping track of what type foo is supposed to be, so it can use it consistently. It can’t vary from run to run of the program, because that information isn’t written down anywhere. The value of foo can vary, of course – it wouldn’t be a good variable if it couldn’t – but the type, the interpretation of foo’s bits, cannot.

Of course, Rust can do everything Python can. In this case, you could tell Rust yourself to use a new type that uses meta-data to keep track of what type an inner value is. You can even do the math on it if it’s a number.

It gets complicated fast, since you have to define a new type, here StringOrInt, that indicates how to not only interpret the data in the value, but also the meta-data of what type of value it is. That outer type, however, is not stored in the resulting program as meta-meta-data.

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let test = rng.gen();

    enum StringOrInt {
        String(String),
        Int(u32),
    }

    let foo;
    if test {
        foo = StringOrInt::Int(33);
    } else {
        foo = StringOrInt::String("Hi".to_string());
    }

    match foo {
        StringOrInt::Int(foo) => {
            println!("{foo}");
            println!("{}", foo + 1);
        },
        StringOrInt::String(foo) => {
            println!("{foo}");
        },
    }
}

If you were to write a Python interpreter in Rust, you would have to do something like this for every variable, where you create a type that can contain multiple inner types. This is only one example of a technique that does this, where we created an enum type, but there are others, like “trait objects.” They all work according to similar principles: Rust needs to know explicitly that you want meta-data to keep track of additional information, and what style of meta-data you want.
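
For comparison, here is a rough sketch of the trait object approach, using the standard library’s std::any::Any trait. The function name describe and its messages are invented for this example. A &dyn Any value carries run-time meta-data about the type behind it, and downcast_ref consults that meta-data.

```rust
use std::any::Any;

/// Describe a value whose type is only known at run time.
fn describe(value: &dyn Any) -> String {
    // `downcast_ref` checks the run-time type information carried by
    // the trait object and returns `Some` only if the type matches.
    if let Some(n) = value.downcast_ref::<u32>() {
        format!("{n} and then {}", n + 1)
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("{s} (text, so no arithmetic)")
    } else {
        "something else entirely".to_string()
    }
}

fn main() {
    let values: [&dyn Any; 2] = [&33u32, &"Hi".to_string()];
    for value in values {
        println!("{}", describe(value));
    }
}
```

As with the enum, we had to ask for the meta-data explicitly, here by choosing dyn Any as the type.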

Note that, in Rust, it still knows at compile-time whether + is an appropriate operation.

I mentioned something about safety earlier. You can get Rust to violate its rules with unsafe. This results in undefined behavior in general, and so the results you get with unsafe are not guaranteed to be consistent. However, we can use this to demonstrate what happens if Rust were to get its type information wrong.

fn main() {
    let foo = "GIFT";
    let foo_ptr: *const str = &*foo;

    // Safety: This just is unsafe.
    let foo_number = unsafe { *(foo_ptr as *const u32) };

    println!("{foo_number}");
}

The key here is this line:

let foo_number = unsafe { *(foo_ptr as *const u32) };

This means something like this:

Rust, I know you’re keeping track of what types go with what memory addresses. I know foo_ptr is a memory address of text (*const str means pointer to str, and str means text). But I want you to pretend it’s a pointer to an unsigned 32-bit integer (which is little endian on most machines, including the author’s M1 MacBook, which has an ARM64 processor), and read it according to that interpretation instead, letting me do operations appropriate to that interpretation.

And it prints, of course, on my machine:

1413892423

If we’d done println!("{foo}"), we would’ve gotten:

GIFT

The same data is passed to println!, but what it actually does is based on the type of the data. Again, this type is not tracked explicitly in the outputted machine code. Rust just makes sure that the machine code is appropriate for types that make sense.

This mechanism that Rust uses is called static typing, where instead of using meta-data like Python does, Rust creates a program that does the right thing, or else rejects a program that does something nonsensical or incoherent (or else fails to reject it because you tell Rust you know what you’re doing is unsafe).

Static typing has many uses. It is primarily used to make sure that you only do operations that make sense for the type you have. Some operations do different things to different types – + means one hardware operation for an integer like u32, and something else for a floating-point number like f32, and static typing also keeps track of that. You can create new operations like that – they are called polymorphic.
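
As a small sketch of what defining your own polymorphic operation can look like, here is + extended to a custom type through the standard library’s std::ops::Add trait. The Meters type is hypothetical, invented only for this example.

```rust
use std::ops::Add;

/// A hypothetical custom type: a distance measured in whole meters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

// Teach `+` what to do when both operands are `Meters`.
impl Add for Meters {
    type Output = Meters;

    fn add(self, other: Meters) -> Meters {
        Meters(self.0 + other.0)
    }
}

fn main() {
    // Three uses of `+`, three different operations, each picked
    // at compile time from the static types of the operands.
    println!("{}", 2u32 + 3u32); // integer addition
    println!("{}", 0.5f32 + 0.25f32); // floating-point addition
    println!("{:?}", Meters(2) + Meters(3)); // our own addition
}
```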

Static typing is also used to reject programs where that is not possible, where you write according to one binary format and read from another, unless you use unsafe to override these checks. The resulting programs would otherwise be incoherent and nonsensical, which could lead to memory corruption, especially if the optimizer is involved, which assumes you’re following the rules when it modifies the program to make it faster.

Static typing can also be used via creating custom types. These custom types might mean specific things in a certain context, to distinguish bits in more detailed ways than the built-in types do. Are three f64 values a color (red, green, and blue) or a coordinate in a three-dimensional grid (X, Y, and Z)? Two types can be created to distinguish them, beyond what the built-in type of f64 already does:

struct ThreeDimensionalCoordinate {
    x: f64,
    y: f64,
    z: f64,
}

struct Color {
    red: f64,
    green: f64,
    blue: f64,
}

fn draw(coord: ThreeDimensionalCoordinate, color: Color) {
    // ...
}

Now, if there are 3 f64 values in a row, we can use Rust’s static typing system not to just track that they’re all f64 values, but whether they together represent a color or a coordinate. Otherwise, a user might accidentally mix them up calling the draw function, and the program might do something illogical.
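
Here is a sketch of what calling such a function looks like. To keep it runnable, draw here is a stand-in that returns a description instead of drawing anything, which is my own assumption and not part of the original function.

```rust
struct ThreeDimensionalCoordinate {
    x: f64,
    y: f64,
    z: f64,
}

struct Color {
    red: f64,
    green: f64,
    blue: f64,
}

// Stand-in for `draw`: instead of drawing, it returns a description,
// so that this sketch has something to print.
fn draw(coord: ThreeDimensionalCoordinate, color: Color) -> String {
    format!(
        "point ({}, {}, {}) in color ({}, {}, {})",
        coord.x, coord.y, coord.z, color.red, color.green, color.blue
    )
}

fn main() {
    let corner = ThreeDimensionalCoordinate { x: 1.0, y: 2.0, z: 3.0 };
    let teal = Color { red: 0.0, green: 0.5, blue: 0.5 };

    println!("{}", draw(corner, teal));

    // Swapping the arguments, as in `draw(teal, corner)`, would be
    // rejected by the compiler with a "mismatched types" error, even
    // though both arguments are just three f64 values underneath.
}
```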

So, static typing prevents incoherent code. It does it before you get a chance to run it, making it easier to catch bugs. And it makes it so you need less meta-data at run time (though some programming languages leverage both static typing and run-time type meta-data).

What Bits Mean: Binary Integers and Two's Complement

2024-04-15T00:00:00+00:00

I was explaining two’s complement recently¹ to a friend, and I thought my explanation was decent, so I decided to write it up and share it with you, my general blog audience, as well! If you already know about two’s complement, this will pretty much just be a review. If not, you may learn something, and you may not understand all of it. Try to get what you can without getting too anxious: there will not be a test!

In either case, feel free to ask questions in the comments or nit-pick any mistakes you see!

Storing Numbers in Binary

So, without further ado, let’s talk about binary.

Computers store all information in binary, in terms of combinations of two values, conventionally called “zero” and “one.” It is easy to distinguish two values in a physical representation: the presence or absence of current on a wire, or of a radio signal; two areas magnetized in the same direction or opposite direction; a capacitor that is charged or not charged. Only having two possibilities for each storage location is just easier for computing circuitry to work with, and so all information in a computer is stored as patterns of two values.

All information in a computer is stored in binary, whether it be text, images, audio, video, scientific data, or even programs. Binary is meaningless without a convention for interpreting it. Today, we will talk about how numbers are stored in binary, specifically integers.

So how do we encode integers in binary? Let’s start out by assuming all the integers we might want to store are non-negative, and then we will discuss later how to accommodate negative numbers.

Computers encode integers similar to how humans do with a pen and paper, by using a positional number system. When we read a number like 357, we know that it contains 3 hundreds (3*10^2), 5 tens (5*10^1) and 7 ones (7*10^0), where ^ is used as the symbol for exponentiation.

Computers use a similar system, but with 2 playing the role of 10. Said another way, computers use base 2 instead of base 10. So, 1101 represents the number 13:

1101
1 * 2 ^ 3 + 1 * 2 ^ 2 + 0 * 2 ^ 1 + 1 * 2 ^ 0
1 * 8     + 1 * 4     + 0 * 2     + 1 * 1
8         + 4         + 0         + 1
13

Instead of (from right to left) seeing a ones place, tens place, and hundreds place, we have a ones place, a twos place, a fours place, and an eights place. It continues with increasing powers of 2 (again, right to left in the number): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536…
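
The same place-value calculation can be sketched as a short Rust function. The name binary_value is made up for this example, and the sketch assumes the input is short enough to fit in a u32.

```rust
/// Compute the value of a string of binary digits by hand.
/// Returns `None` if a character is not 0 or 1. (For simplicity,
/// this sketch assumes at most 32 digits.)
fn binary_value(digits: &str) -> Option<u32> {
    let mut total = 0;
    // Walk from the rightmost digit (the ones place) leftwards.
    for (position, digit) in digits.chars().rev().enumerate() {
        let place = 1u32 << position; // 2^position
        match digit {
            '0' => {}
            '1' => total += place,
            _ => return None,
        }
    }
    Some(total)
}

fn main() {
    println!("{:?}", binary_value("1101")); // prints Some(13)
    // The standard library can do the same conversion for us.
    println!("{:?}", u32::from_str_radix("1101", 2)); // prints Ok(13)
}
```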

When we write numbers in base 10, we write as many digits as it takes to store the number, and no more. 357 is a 3-digit number, but 5364 is a 4-digit number. But when computers store numbers in binary, they generally have a fixed amount of memory to work with, memory that is specifically designated to store this number. If the number can’t be stored in that many binary digits – that many bits – then, well, it can’t be stored in that amount of memory. Programmers in lower-level languages (like C++ or Rust) simply have to be careful that this situation doesn’t arise, either by ensuring no larger numbers will be stored, or else checking for situations where larger numbers would be stored, and either signalling an error or arranging an alternative storage arrangement.
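
One way of checking for this situation and signalling an error, sketched in Rust: the standard library’s TryFrom conversion refuses to squeeze a number into a type that is too small for it. The function name store_in_byte and its error message are made up for this example.

```rust
use std::convert::TryFrom;

/// Try to store a number in 8 bits, signalling an error if it
/// needs more bits than that.
fn store_in_byte(n: u32) -> Result<u8, String> {
    u8::try_from(n).map_err(|_| format!("{n} does not fit in 8 bits"))
}

fn main() {
    println!("{:?}", store_in_byte(255));
    println!("{:?}", store_in_byte(256));
}
```

The first call succeeds, since 255 is the largest value 8 bits can hold. The second returns an error instead of silently storing something else.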

Usually, a programming language would support 8-bit, 16-bit, 32-bit, or 64-bit numbers. For unsigned numbers, these correspond to the types uint8_t, uint16_t, uint32_t and uint64_t in C or C++, or u8, u16, u32 and u64 in Rust. A type, by the way, is a particular way of looking at a collection of bits in binary as a value, determining both a mode of interpretation and a size of how many bits are included in the value.

All the bits in the allocated space must be set to 0 or 1. So, if we want to store the number 13 in an 8-bit number, we have to use 0s for all the higher-order bits: 00001101 means the same as 1101, just as 000357 means the same thing as 357.
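
Rust’s formatting machinery can show those leading 0s. In this small sketch, the function name to_byte_string is my own, and {n:08b} asks for binary output padded with zeros to a width of 8.

```rust
/// Render a number as exactly 8 binary digits, padding with leading 0s.
fn to_byte_string(n: u8) -> String {
    format!("{n:08b}")
}

fn main() {
    println!("{}", to_byte_string(13)); // prints 00001101

    // Parsing it back: the leading 0s do not change the value.
    println!("{:?}", u8::from_str_radix("00001101", 2)); // prints Ok(13)
    println!("{:?}", u8::from_str_radix("1101", 2)); // prints Ok(13)
}
```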

For the purposes of this document, however, let’s talk about 4-bit integers. 4 bits is half a byte, also known as a nibble. Most processors do not have the capacity for directly interfacing with 4-bit integers – they only deal with 8 bits, or one byte, at a time. But you can still program with 4-bit numbers, you just have to do some extra work. And it makes it possible to show every possibility in this document. The lessons learned from it generalize to wider numbers.

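Here are all the possible 4-bit integers and their decimal (base 10, normal human) equivalents:

BINARY | DECIMAL
  0000 |       0
  0001 |       1
  0010 |       2
  0011 |       3
  0100 |       4
  0101 |       5
  0110 |       6
  0111 |       7
  1000 |       8
  1001 |       9
  1010 |      10
  1011 |      11
  1100 |      12
  1101 |      13
  1110 |      14
  1111 |      15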

Note that the maximum is one less than 2^4, which is 16. There are 16 possible combinations of bits, but the maximum value is 15, as one of the possible combinations is used to encode 0. All integer types support storing 0.

So, how do we do addition? It’s through a very similar process to how we do addition in base 10. We take the two numbers we want to add, and line them up bit by bit. Let’s add 7 and 3, which we can see from the table are 0111 and 0011. Let’s line them up in classic grade school fashion:

0111
0011

Then, we can start adding. We start from the right, just as we did when doing arithmetic in grade school. 1 + 1 is 2, which in base 2 is 10. So we write down the 0, and carry the 1:

   1   (carry bits)
 0111  (7)
 0011  (3)
 ----
    0

OK, now 1 + 1, + the 1 we carried from the previous step, is 3, which in base 2 is 11. We write down the (right) 1 and carry the (left) 1:

  11   (carry bits)
 0111  (7)
 0011  (3)
 ----
   10

Now, 1 + 0, + the 1 we carried from before, is 2, which is 10. Keep the 0, carry the 1:

 111   (carry bits)
 0111  (7)
 0011  (3)
 ----
  010

Final step, 0 + 0 + the 1 we carried is 1:

 111   (carry bits)
 0111  (7)
 0011  (3)
 ----
 1010

This has 1 in the 8s place and 1 in the 2s place, and 0 in the other places. 8+2=10, which is good because we were adding 7 and 3, so 10 is the right answer. If we look in the table above, we see that 1010 indeed corresponds to 10, so we know we’ve done it right.

So that is how we add binary numbers.

Adding Numbers with Circuitry or Program Logic

The addition table, as we can see, is very simple, so it can be represented in circuitry. For each bit, we have three inputs: one bit each from the two numbers, and the carry from the previous bit. We have two outputs, the bit we’re keeping, and the carry to pass on to the next bit.

We can create a complete table for this:

BIT A | BIT B | INPUT CARRY | OUTPUT CARRY | OUTPUT
    0 |     0 |           0 |            0 |      0
    0 |     0 |           1 |            0 |      1
    0 |     1 |           0 |            0 |      1
    0 |     1 |           1 |            1 |      0
    1 |     0 |           0 |            0 |      1
    1 |     0 |           1 |            1 |      0
    1 |     1 |           0 |            1 |      0
    1 |     1 |           1 |            1 |      1

Whenever we have a complete table of a limited number of inputs and outputs, it can be converted to a circuit. If we did so, and then wired a bunch of these circuits together, we could create a hardware adder. I will not go into how to do this in detail, as that is out of the scope of this post, but you can see how it would be simpler than encoding an entire addition table for base 10 in logic gates, which would have 200 entries instead of 8 (addition table with and without carried 1s).

I will, however, show you how simple it is conceptually by writing a program to do it in Rust. In this program, the binary numbers are represented as slices of bools, or true/false values. (A slice is a region of memory with multiple values of the same type.) true corresponds to 1, and false corresponds to 0. In this program, the slices start from the right (the least significant bit, the one’s place), and go to the left (the most significant bit, the 2^N place) – backwards from what you may be used to, but better suited for implementing math like addition.

fn add(a: &[bool], b: &[bool]) -> Result<Vec<bool>, Error> {
    // Keep track of carry
    // 'mut' means it can change
    let mut carry = false;

    // Place to store the result
    //
    // A `Vec` lets us store a varying number of values of the
    // same type. It's like a slice, but it can grow over time.
    // `Vec::new()` gives us an empty `Vec`.
    let mut res = Vec::new();

    // Make sure we have the same number of bits in each input
    if a.len() != b.len() {
        // We don't? That's an error!
        return Err(Error::MismatchedInputSizes);
    }

    // Go through each position
    for i in 0..a.len() {
        // Examine the corresponding input bits from each input nibble
        // Also examine the carry from the previous step.
        // The result will be the output and the new carry value.
        // 
        // This corresponds to a "full adder" circuit, and in a hardware
        // adder, one of these is used per bit.
        let (output, new_carry) = match (a[i], b[i], carry) {
            // We have a table of what possibilities there are
            // for these input bits, and what two outputs to generate.
            // 
            // In a circuit, this would be expressed through logic gates.
            // All 8 possible combinations of 3 inputs are enumerated.
            // 8 = 2 ^ 3
            //
            // This is a Rust representation of the table shown above.
            (false, false, false) => (false, false),
            (false, false, true) => (true, false),
            (false, true, false) => (true, false),
            (false, true, true) => (false, true),
            (true, false, false) => (true, false),
            (true, false, true) => (false, true),
            (true, true, false) => (false, true),
            (true, true, true) => (true, true),
        };
        carry = new_carry;

        // This bit is added to our result
        res.push(output);
    }

    // What if the result doesn't fit in the same number of bits?
    // Because the highest order bits have a carry
    // We'll write some guess temporarily for now, and
    // discuss carries in more detail later.
    if carry {
        panic!("error message"); // ??? or something?
    }

    // We have successfully obtained a result!
    Ok(res)
}

A full, runnable program is available on GitHub, as are the other examples from this post.

This program uses booleans as a stand-in for bits, with true standing in for 1 and false for 0. It contains the table we created above, but in the form of a match expression. It loops through the two values, from least-significant bit (rightmost, 1’s place) to most-significant bit (leftmost, 2^N place), bringing the carry output from each place and using it as input in the next operation.
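
For the curious, the same table can also be written with the logic operations that gates perform, rather than as a match. This is one standard formulation of a full adder, offered as a sketch and not as the code from the accompanying repository; the function name full_adder is my own.

```rust
/// One "full adder": three input bits in, (output, carry) out.
fn full_adder(a: bool, b: bool, carry_in: bool) -> (bool, bool) {
    // `^` is XOR, `&` is AND, `|` is OR: one kind of logic gate each.
    let output = a ^ b ^ carry_in;
    let carry_out = (a & b) | (carry_in & (a ^ b));
    (output, carry_out)
}

fn main() {
    // Print all 8 rows so they can be compared with the table:
    // BIT A | BIT B | INPUT CARRY | OUTPUT CARRY | OUTPUT
    for a in [false, true] {
        for b in [false, true] {
            for c in [false, true] {
                let (output, carry_out) = full_adder(a, b, c);
                println!(
                    "{} | {} | {} | {} | {}",
                    a as u8, b as u8, c as u8, carry_out as u8, output as u8
                );
            }
        }
    }
}
```

The output bit is 1 when an odd number of the inputs are 1, and the carry is 1 when at least two of them are.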

Ironically, running this program will actually use many more than 4 bits for each number. It is designed to correspond conceptually with the details of adding 4-bit numbers. In practice, we’d use the built-in u8 type and let the computer’s built-in addition circuitry do it for us.

Overflow

So, what happens if we have a carry on the last bit? If we’re adding two 4-bit numbers, and we’re storing the result in a 4-bit number, what happens if we add 10 and 10? The result won’t fit in a 4-bit number! You might assume (as my friend Ilse Purrenhage did) that there would be “an error message or something” (and therefore that’s what the Rust sample code does)!

Let’s see what happens in practice!

Here is a Rust program that overflows an 8-bit unsigned integer.

fn main() {
    let mut integer: u8 = 255;
    integer += 1;
    println!("{integer}");
}

Here is a C program:

#include <stdint.h>
#include <stdio.h>

int main() {
    // The highest 8-bit unsigned integer possible is 255,
    // or 2^8 - 1, the highest number you can represent
    // before you need a 2^8 or 256 place in binary.
    uint8_t integer = 255;

    // OK, so what happens if we add 1 to it?
    integer += 1;

    // Let's print it out and see!
    printf("%u\n", integer);
}

Let’s start with the Rust program, as this is a Rust-focused blog.

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin overflow
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.37s
     Running `target/debug/overflow`
thread 'main' panicked at 'attempt to add with overflow', overflow.rs:3:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: overflow::main
   4: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Ah, looks like my friend Ilse was right, as this definitely qualifies as “an error message or something.” It’s reasonable that Rust does this! There’s no way to store 256 in a u8, so attempting to should lead to an error.

But the experienced Rustaceans in the audience know there’s another shoe that’s about to drop. We’ve finished developing overflow, and we want to do a production release, so we run it in release mode, giving the compiler more time to work on making the program run fast, and –

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --release --bin overflow
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/overflow`
0

Yes, that is indeed a 0 that was printed. 255 + 1 no longer results in an error message. No, we now see that in debug mode, while we are still developing the project, it results in an error. But once we switch to release mode, we get a 0. And unfortunately, 0 is (checks notes) not the sum of 255 + 1. This is really putting the “or something” into “an error message or something!”

What is going on here? Alright, maybe this is one of the things they are complaining about when they say Rust is hard to learn. Let’s move on to C, the “easier” programming language –

[jim@palatinate:~/Writing/thecodedmessage-examples/src]$ cc -o overflow overflow.c
[jim@palatinate:~/Writing/thecodedmessage-examples/src]$ ./overflow
0

– in which it would also appear that, for 8-bit integers, 255 + 1 = 0.

So what exactly is going on? Well, let’s do out 255 + 1 in binary, using the good ol’ elementary school addition algorithm. We add 1 and 1, and get 10 (2), so carry the 1 and write down 0, which collides against the next 1, so carry the 1 and write down 0, until:

11111111  (carry bits)
 11111111 (255)
 00000001 (1)
---------
 00000000

After doing all 8 bits, we have 8 bits of 0 as output, and still a 1 being carried to the 9th bit (also known as bit 8), the 256s place. Of course, there is no 9th bit in the output, which is why we get an error message or something when we run this. See our original binary.rs program which does this algorithm out by hand using booleans:

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin binary 00000001 11111111
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.41s
     Running `target/debug/binary 00000001 11111111`
thread 'main' panicked at 'error message', src/binary.rs:64:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: binary::add
             at ./src/binary.rs:64:9
   3: binary::main
             at ./src/binary.rs:78:18
   4: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

This was, if you remember from our example above, triggered because the carry variable was still true after the entire loop!

// What if the result doesn't fit in the same number of bits?
// Because the highest order bits have a carry
if carry {
    panic!("error message"); // ??? or something?
}

But what if we just remove that code?

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin binary 00000001 11111111
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.39s
     Running `target/debug/binary 00000001 11111111`
00000000

If we just ignore the last carry bit, instead of doing anything at all, we see that 1 + 255 does indeed equal 0. It equals 0, carrying a 1 into the 256s place, but if we just ignore that that carry might happen, it just equals 0. And that is what C does. And if we are asking Rust to optimize for performance rather than debuggability, that is also what Rust does.

In fact, this is exactly the behavior that the C standard explicitly requires for unsigned values. Code can be written that relies on this behavior.
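Rust also lets you pick either behavior explicitly, whatever the build mode. Here is a minimal sketch using the standard library’s checked_add, wrapping_add, and overflowing_add methods on u8. The helper name add_three_ways is made up for this illustration:

```rust
// Three explicit ways to add two u8 values, independent of debug/release mode.
// `add_three_ways` is a hypothetical helper, not part of any library.
fn add_three_ways(a: u8, b: u8) -> (Option<u8>, u8, (u8, bool)) {
    (
        a.checked_add(b),     // None if the sum does not fit in 8 bits
        a.wrapping_add(b),    // drop the final carry: arithmetic modulo 256
        a.overflowing_add(b), // wrapped sum, plus whether a carry was dropped
    )
}

fn main() {
    let (checked, wrapped, (sum, carried)) = add_three_ways(255, 1);
    println!("checked:     {:?}", checked);
    println!("wrapping:    {}", wrapped);
    println!("overflowing: {} (carry dropped: {})", sum, carried);
}
```

With checked_add, the “error message or something” becomes a None that the caller has to deal with; with wrapping_add, you get the C behavior on purpose.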

So what exactly is happening here? How can we understand 255 + 1 = 0? Well, we are ignoring a carry, and ignoring places above the 128s place. This creates a special form of math, where instead of increasing, at a certain point numbers wrap around, like an old-fashioned odometer going from 999999 to 000000, or a 24-hour clock going from 23:59 to 00:00, or the day of the month going from 31 to 1.

Modular Arithmetic

This is known in mathematics as modular arithmetic, which is an important part of number theory, or the study of whole numbers and integers (as opposed to rationals and reals, where modular arithmetic would make little sense). In modular arithmetic, we treat numbers as if they are equal – some call it congruent – if they have the same remainder when divided by a number. The number we divide by can vary – it can be, for example, 2, 10, 24… or 256.

For example, 24-hour time works modulo 24. 23 (or 11PM) + 1 hour yields 0 (or midnight). 22 (or 10PM) + 4 hours yields 2 (or 2AM).

We can also do modular arithmetic modulo 10 by considering decimal numbers and only paying attention to the ones place. 3 * 5 = 15, but we are only paying attention to the ones place, so we say that 3 * 5 = 5 modulo 10.

One property of modular arithmetic is that negation works weirdly. When working modulo 10, 9 also serves as -1. They are equivalent modulo 10, because they are a multiple of 10 apart. It works functionally: 6+9=5 modulo 10, or 1+9=0 modulo 10. Adding 9 is the same as subtracting 1, so we can say that 9 is congruent to -1.
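The clock and ones-place examples can be checked with a few lines of Rust. This is a sketch that assumes we always want the representative between 0 and m - 1, which is what rem_euclid gives us (the plain % operator would give -1 for -1 % 10). The function name modulo is just for illustration:

```rust
// Reduce `n` modulo `m`, always returning a value in 0..m (m must be positive).
// `modulo` is a hypothetical helper wrapping the standard `rem_euclid`.
fn modulo(n: i32, m: i32) -> i32 {
    n.rem_euclid(m)
}

fn main() {
    println!("{}", modulo(23 + 1, 24)); // 11PM + 1 hour
    println!("{}", modulo(22 + 4, 24)); // 10PM + 4 hours
    println!("{}", modulo(3 * 5, 10));  // only the ones place survives
    println!("{}", modulo(-1, 10));     // -1 and 9 are congruent modulo 10
    println!("{}", modulo(6 + 9, 10) == modulo(6 - 1, 10)); // adding 9 acts like subtracting 1
}
```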

Well, in C, and in Rust with debugging turned off, arithmetic on a u8, or any unsigned integer type, wraps around modulo the maximum value plus 1. For 4-bit integers, this would be 1111 + 1 in binary (or 15 + 1 in decimal), or 16. For 8-bit integers, it’s 256.

This is a natural consequence of ignoring the final output of the carry bit. Just like ignoring digits above the 1s place yields modulo-10 arithmetic, ignoring bits above the 8s place yields modulo-16 arithmetic.

It does mean that, when we want to subtract 1 from a u8, we can instead add 0 - 1, which is 256 - 1, which is 255. When we want to subtract 2, we can instead add 0 - 2, aka 256 - 2, aka 254.
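To make that concrete, here is a small sketch that subtracts using only wrapping addition on a u8. The name sub_by_adding is hypothetical, and the code just restates the arithmetic above:

```rust
// Subtract `k` from `x` using only addition: add the value that is
// congruent to -k modulo 256. `sub_by_adding` is a made-up name.
fn sub_by_adding(x: u8, k: u8) -> u8 {
    let minus_k = 0u8.wrapping_sub(k); // 0 - 1 is 255, 0 - 2 is 254, ...
    x.wrapping_add(minus_k)
}

fn main() {
    println!("{}", 0u8.wrapping_sub(1));
    println!("{}", 0u8.wrapping_sub(2));
    println!("{}", sub_by_adding(10, 1));
    println!("{}", sub_by_adding(10, 2));
}
```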

Two’s Complement

In fact, this is how computers implement subtraction in circuitry – and not just how we implement subtraction, but also negative numbers. To negate a number, instead of counting up from 0, we count down from 0, but with wrap-around.

Let’s return to our 4-bit example, as that’s easier to work with. Each combination of bits can be interpreted as a negative number, or a positive number.

Bit Pattern | As Positive | As Negative
0000        | 0           | -16
0001        | 1           | -15
0010        | 2           | -14
0011        | 3           | -13
0100        | 4           | -12
0101        | 5           | -11
0110        | 6           | -10
0111        | 7           | -9
1000        | 8           | -8
1001        | 9           | -7
1010        | 10          | -6
1011        | 11          | -5
1100        | 12          | -4
1101        | 13          | -3
1110        | 14          | -2
1111        | 15          | -1

And, of course, it just wraps around after: 1111 + 1 (15 + 1) = 0000 (with a carry beyond the highest order digit).

For most operations, it doesn’t matter whether we interpret the number as positive or negative. Addition, subtraction, and multiplication will all wrap around, and they produce the same bits under either interpretation. The computer just cares about the pattern of bits, and applies the circuitry to it.

Of course, if we want to display the number to the user, it matters. We don’t want the user to enter -1 and the computer to randomly display 15 later (or 255 for 8-bit, or 65535 for 16-bit, or 4294967295 for 32-bit). Somehow, we need some mechanism of deciding when we want to interpret the number as -1, and when we want to interpret it as 15.

Similarly, comparisons. -1 is less than 1. 15 is greater than 1. Sign matters. Where does it wrap around? For what N is N + 1 < N?

It’s an arbitrary cut off. And there are two conventions.

For one of the conventions, unsigned integers, well, it’s always positive. The N for which N + 1 < N is 15 (for 4 bit arithmetic), as 15 + 1 wraps around to 0. 15 > 1. 0 < 1. Subtract 0 - 1 and display it? You get 15.

The following program displays 15:


#include <stdio.h>

int main() {
    struct {
        unsigned four_bits: 4;
    } bitfield;
    bitfield.four_bits = 0; // Starts out with 0000
    bitfield.four_bits -= 1; // Modify field by subtracting 1
    printf("%u\n", bitfield.four_bits); // Display on screen
}

See the word unsigned there in the declaration of four_bits? That’s the unsigned convention for arithmetic.

Well, the opposite of that is signed integers. For these, the numbers that are interpreted as negative are the numbers for which the highest-order bit is a 1. The cut-off N, for which N + 1 < N, is 7. 7 + 1 = -8. 7 + 1 < 7.
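The same two cut-offs exist at 8 bits, and we can find them by brute force. This is only a sketch: it uses Rust’s u8 and i8 rather than 4-bit fields, and the function names are made up for illustration:

```rust
// The cut-off N for which N + 1 < N, found by brute force under wrapping
// arithmetic, for 8-bit unsigned and signed integers.
// Both function names are hypothetical.
fn unsigned_cutoff() -> u8 {
    (0..=u8::MAX).find(|&n| n.wrapping_add(1) < n).unwrap()
}

fn signed_cutoff() -> i8 {
    (i8::MIN..=i8::MAX).find(|&n| n.wrapping_add(1) < n).unwrap()
}

fn main() {
    println!("unsigned cut-off: {}", unsigned_cutoff());
    println!("signed cut-off:   {}", signed_cutoff());
    // Same eight bits, two readings: `as` keeps the bit pattern.
    let bits: u8 = 0b1111_1111;
    println!("{} as unsigned, {} as signed", bits, bits as i8);
}
```

For 8 bits, the unsigned cut-off plays the role that 15 plays for 4 bits, and the signed cut-off plays the role of 7.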

Here’s our original table with the standard signed interpretations left in:

Bit Pattern | As Positive | As Negative
0000        | 0           |
0001        | 1           |
0010        | 2           |
0011        | 3           |
0100        | 4           |
0101        | 5           |
0110        | 6           |
0111        | 7           |
1000        |             | -8
1001        |             | -7
1010        |             | -6
1011        |             | -5
1100        |             | -4
1101        |             | -3
1110        |             | -2
1111        |             | -1

For another way of looking at it, remember we discussed that these bits were coefficients to powers of two. 0101 is 5 because it’s 0 * 2^3 + 1 * 2^2 + 0 * 2^1 + 1 * 2^0. It has 1s in the 4s place and the 1s place, so it has 1 4, and 1 1, and nothing else.

Well, what if instead of that higher-order bit being the 8s place, it were the -8s place? The other bits remain positive in value. 8 and -8 are equivalent modulo 16, so this doesn’t actually change the meaning of the bit in the modular sense. But it does change the meaning of the > and < operations.

0101 is still 5, but it’s 0 * (-8) + 1 * 4 + 0 * 2 + 1 * 1 now. And 1111 is -1, because it’s 1 * (-8) + 1 * 4 + 1 * 2 + 1 * 1, or -8 + 4 + 2 + 1 or -8 + 7 or -1.
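Here is that place-value reading written out as a sketch. Rust has no 4-bit integer type, so this assumes we keep the 4 bits in the low half of a u8; the function names are hypothetical:

```rust
// Interpret the low 4 bits of `bits` as a signed number by treating the
// top bit as the -8s place. Hypothetical helper for illustration.
fn signed_4bit(bits: u8) -> i8 {
    let bit = |i: u8| ((bits >> i) & 1) as i8;
    bit(3) * -8 + bit(2) * 4 + bit(1) * 2 + bit(0) * 1
}

// Same bits, but with the top bit as the ordinary 8s place.
fn unsigned_4bit(bits: u8) -> u8 {
    bits & 0b1111
}

fn main() {
    println!("bits | unsigned | signed");
    for bits in 0u8..16 {
        println!("{:04b} | {:8} | {:6}", bits, unsigned_4bit(bits), signed_4bit(bits));
    }
}
```

The two readings of any pattern always differ by either 0 or 16, which is another way of saying they are congruent modulo 16.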

I’m not making this needlessly complicated! If it is needlessly complicated, it’s not me who’s making it that way! This is how computers actually work, because it makes sense that way, from a circuitry design perspective and from an engineering perspective.

OK, now for some fun facts!

This way of storing signed integers in computers is known as two’s complement.

Two’s complement negation can be computed by inverting all the bits and adding 1. This is opposed to 1’s complement, where we negate by inverting all the bits, and the circuitry is annoying and stupid. It is also opposed to sign/magnitude, where the top bit indicates sign by negating the whole rest of the number if it is 1, rather than subtracting a value (e.g. 8 in 4-bit integers, or 128 in 8-bits).

Almost all computers use two’s complement to store integers these days, for all the reasons discussed above. For non-integers, all bets are off, but sign-magnitude is popular for floating point numbers overall.

In two’s complement, -1 is always represented as all bits 1. Why? Well, for the same reason 10,000 - 1 is 9,999, and 100,000 - 1 is 99,999.

In fact, if we start by 1 followed by many zeros in any base, and we subtract 1, then we get the maximum digit of that base repeated. If we’re in base 8, 1000 - 1 = 777. If in base 16, then 100 - 1 = FF (in base 16, it is conventional to use the letters A-F to represent the digits for 10-15).

Why is this? Well, doing the subtraction out, right to left, we start with 10-1, with a borrowed 1. The value one less than the base is the greatest digit of that base: 1 less than 10 is 9. Then, to get that borrowed 1, we must go to the next digit to the left, and subtract that 1. But that digit is also a 0, so we must borrow a 1 even further out, so we get 10-1 again. This continues until we reach an actual 1 and no longer need to borrow.
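If you’d rather let the computer do the borrowing, here is a quick sketch that prints one less than a power of the base, in that base. The function name is made up; the formatting specifiers are the standard ones for octal, hexadecimal, and binary:

```rust
// One less than base^digits: the largest digit of `base`, repeated `digits` times.
// `one_less_than_power` is a hypothetical helper.
fn one_less_than_power(base: u32, digits: u32) -> u32 {
    base.pow(digits) - 1
}

fn main() {
    println!("{}", one_less_than_power(10, 4));   // decimal: 10,000 - 1
    println!("{:o}", one_less_than_power(8, 3));  // octal: 1000 - 1
    println!("{:X}", one_less_than_power(16, 2)); // hexadecimal: 100 - 1
    println!("{:b}", one_less_than_power(2, 4));  // binary: 10000 - 1
    println!("{:b}", -1i8);                       // the bits of an 8-bit -1
}
```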

Of course, when subtracting from 0 in a computer context, you can always borrow past the left-hand side. 0000 and 10000 are the same bottom four bits; they are congruent modulo 16. Borrowing a 1 from off the end is always an option.

Or, considered another way, if you add binary 1 to 1111, you will get 10000, for the same reason that if you add 1 to 9,999, you will get 10,000. That last bit falls off the end, though, so 1111 is just -1, because 1111 + 1 = 0.

This also means that to negate a number N, you can:

Invert all the bits. This gives you -1 - N.
Add 1. This gives you -N.

Let’s do this one step at a time.

Invert all the bits. This gives you -1 - N. Let’s figure out how to do -1 - N, and we’ll see it works out to inverting all the bits. So, -1 has all bits 1, so no bit will need to be borrowed. If you subtract a 1 (from the N) from the 1 (from -1), it becomes a 0. If you subtract a 0 (from the N) from the 1 (from the -1), it becomes a 1.

Okay, perhaps it’s better to demonstrate visually. This is -1-5, as a subtraction problem in binary:

  1111    (-1 aka 15)
- 0101    ( 5)
  ----
  1010    (-6 aka 10)

By subtracting 1111 - 0101, we got 1010, the bit-inversion of 0101. Said another way, by inverting all the bits in 0101, we got 1111 - 0101. Said another way, by inverting all the bits in 5, we got -6.

Add 1. This gives you -N.

And that is what computers do with signed integers when you have an integer n and write let negate_n = -n; – the computer provides circuitry internally that lets you invert all the bits and then add one.
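Both steps are easy to try out. A minimal sketch on i8, where ! is Rust’s bit-inversion operator and the function names are only for illustration:

```rust
// Step 1: inverting all the bits of n gives -1 - n.
fn invert(n: i8) -> i8 {
    !n
}

// Steps 1 and 2 together: invert all the bits, then add 1, giving -n.
// wrapping_add is used so that -128, which has no positive counterpart
// in 8 bits, wraps back to itself instead of panicking.
fn negate(n: i8) -> i8 {
    (!n).wrapping_add(1)
}

fn main() {
    println!("{:08b} inverted is {:08b} ({})", 5i8, invert(5), invert(5));
    println!("negate(5) = {}", negate(5));
    println!("negate(-5) = {}", negate(-5));
}
```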

So, now we know how to represent numbers as signed and unsigned. But we also treat them as equivalent. That’s confusing. So what’s up with that?

For some operations, this representation makes it so you don’t need to care about signed and unsigned. These include addition, subtraction, and multiplication. As long as you don’t need to detect wrap-around (and skipping that check is the faster, more efficient implementation in circuitry), these operations literally do the same thing in signed and unsigned. One person’s overflow is another person’s subtraction, but as long as we’re OK with overflow, subtraction is cool too.

Ironically, negation goes in this category. It doesn’t need to care which numbers you interpret as positive or negative to give you a number -N that, when you add it, undoes the effect of adding N.
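We can check that claim exhaustively for 8 bits. This sketch compares the bit patterns produced by the wrapping operations on u8 and i8; same_bits is a hypothetical name:

```rust
// Add, subtract, multiply, and negate two bit patterns, once as unsigned
// and once as signed, and report whether the resulting bits always agree.
fn same_bits(a: u8, b: u8) -> bool {
    let (sa, sb) = (a as i8, b as i8);
    a.wrapping_add(b) == (sa.wrapping_add(sb) as u8)
        && a.wrapping_sub(b) == (sa.wrapping_sub(sb) as u8)
        && a.wrapping_mul(b) == (sa.wrapping_mul(sb) as u8)
        && a.wrapping_neg() == (sa.wrapping_neg() as u8)
}

fn main() {
    let all_agree = (0..=255u8).all(|a| (0..=255u8).all(|b| same_bits(a, b)));
    println!("all 65536 pairs agree: {}", all_agree);
    // Division is a different story: the interpretation changes the answer.
    println!("{} vs {}", 255u8 / 2, (255u8 as i8) / 2);
}
```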

However, if you want to do checked versions of addition, subtraction, and multiplication, where the program notices when adding two positive numbers results in a number smaller than both, and causes a trap, a stop in the program’s normal behavior, “an error message or something,” then how that check works differs in signed and unsigned.

This check would involve a version of the addition operation that actually used the carry output from the last bit. But there are two interpretations of that. Should -1 + 1 trigger? In unsigned arithmetic, where it’s actually 15 + 1 (or 255 + 1, or 65,535 + 1), it probably should, because wrapping around to 0 is exactly what we want to trap. But in signed arithmetic, -1 + 1 is not an overflow at all.

So, in unsigned arithmetic, the normal carry output of the leftmost (most significant) adder circuit can be used. Given the leftmost input bit of the first argument, the leftmost input bit of the second argument, and the carry from the next adder circuit to the right (the second-most-significant bit, if you will), if 2 or more of those bits are 1, well, then, signal an overflow.

For signed arithmetic, it’s about how the sign bits line up in the input and the output.

Fun aside: The sign bit is another name in two’s complement for the most significant bit, as it is 1 if and only if the number is negative. Some overzealous teachers will say it’s not a sign bit because its significance can be ignored in many operations, but that’s even more obnoxiously pedantic than saying that there’s no such thing as centrifugal force, and more inaccurate. Every processor designer (including Intel and ARM) refers to this bit as the sign bit, which makes sense because it tells you what sign the number is.

If the two input sign bits are the same in an addition, and the result bit is different, then you have a signed overflow. For example, 1111+1111=1110 is fine, as that’s -1 + -1 = -2. Similarly, 1111+0010=0001 is fine, as that’s -1 + 2 = 1. But 1000+1000=0000 is weird, as that’s -8 + -8 = 0. The two input sign bits are both 1 in an addition, and the output sign bit is 0 – suspicious.

So the situation where unsigned integers wrap around in defiance of normal integer math is known as “carry” on Intel, and is indicated by a “carry flag,” which can be checked to output an error message or something. Similarly, the situation where signed integers wrap around is known on Intel as “overflow,” which is indicated by an overflow flag.

Whichever flag you don’t check is meaningless for your purposes. Which flags you check on Intel is an indication of whether you are doing unsigned or signed (i.e. two’s complement) arithmetic. And, since operations like less than and greater than are also implemented on Intel by checking flags, literally the only difference on Intel between signed and unsigned arithmetic is what flags you check.
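Here is a sketch of one 8-bit addition producing both flags, following the two rules described above. It is a software model under my own naming (add_with_flags), not a description of how any particular processor is wired:

```rust
// Add two 8-bit patterns and compute both flags.
// Returns (result bits, carry flag, overflow flag).
fn add_with_flags(a: u8, b: u8) -> (u8, bool, bool) {
    let wide = a as u16 + b as u16;      // wide enough to keep the 9th bit
    let result = wide as u8;             // keep only the low 8 bits
    let carry = wide > 0xFF;             // a 1 was carried into the 256s place
    let sign = |x: u8| (x & 0x80) != 0;  // the most significant bit
    // Signed overflow: the inputs share a sign bit and the result's differs.
    let overflow = sign(a) == sign(b) && sign(result) != sign(a);
    (result, carry, overflow)
}

fn main() {
    let pairs = [(0xFFu8, 0x01u8), (0x7F, 0x01), (0xFF, 0xFF), (0x80, 0x80)];
    for &(a, b) in pairs.iter() {
        let (result, carry, overflow) = add_with_flags(a, b);
        println!(
            "{:08b} + {:08b} = {:08b}  carry: {:5}  overflow: {}",
            a, b, result, carry, overflow
        );
    }
}
```

The first pair is 255 + 1 if you read it as unsigned (carry set) and -1 + 1 if you read it as signed (overflow clear): one addition, two verdicts.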

Fun fact: Comparisons on Intel are done by the cmp instruction, which does a subtraction, throws away the result, and sets the flags accordingly. The flags can then be inspected to determine which input was greater or less, or less-or-equal or greater-or-equal, with either signed or unsigned semantics. All the same flags are set with a normal subtraction.
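A comparison built on subtraction can be modeled the same way. In this sketch, the struct and function names are mine; the rules it uses (unsigned less-than is the carry flag, signed less-than is sign flag not equal to overflow flag) are the usual x86 conditions as I understand them:

```rust
// Flags left behind by an 8-bit subtraction a - b, in the spirit of `cmp`.
// This is a software model with hypothetical names, not real circuitry.
struct Flags {
    carry: bool,    // a borrow was needed past the top bit
    sign: bool,     // top bit of the (discarded) result
    overflow: bool, // the signed result wrapped around
    zero: bool,     // the result was all zero bits
}

fn cmp_flags(a: u8, b: u8) -> Flags {
    let (result, carry) = a.overflowing_sub(b);
    let (_, overflow) = (a as i8).overflowing_sub(b as i8);
    Flags {
        carry,
        sign: (result & 0x80) != 0,
        overflow,
        zero: result == 0,
    }
}

// Unsigned "less than" looks only at the carry flag.
fn unsigned_less(a: u8, b: u8) -> bool {
    cmp_flags(a, b).carry
}

// Signed "less than" checks whether the sign and overflow flags disagree.
fn signed_less(a: u8, b: u8) -> bool {
    let f = cmp_flags(a, b);
    f.sign != f.overflow
}

// Equality is the same under both interpretations.
fn equal(a: u8, b: u8) -> bool {
    cmp_flags(a, b).zero
}

fn main() {
    // 0xFF is 255 unsigned but -1 signed, so the two answers differ.
    println!("unsigned: 0xFF < 0x01? {}", unsigned_less(0xFF, 0x01));
    println!("signed:   0xFF < 0x01? {}", signed_less(0xFF, 0x01));
    println!("equal:    0x2A == 0x2A? {}", equal(0x2A, 0x2A));
}
```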

By the way, every time you add or subtract on Intel, both carry and overflow flags are set accordingly. It’s easier in circuitry to just do both, since the operations are so similar.

You can read more on Intel’s flags here.

Summary

Collections of bits can be used to store integers, using base 2. Addition, subtraction, and multiplication are implemented in such a way that wraps around, and any number can have a positive or negative interpretation. This weird type of math is called modular arithmetic.

If we interpret all the numbers as positive, then we are doing unsigned arithmetic. Programming languages represent this by referring to unsigned integers, but to the processor, they’re all just integers. The question is just what kind of arithmetic we’ll do. With this interpretation, adding two numbers and getting a smaller one is overflow, and subtracting from a number and making it bigger is called underflow. This may or may not be detected, but if it is, it is done on Intel by checking the carry flag. Besides that, it will happily do the arithmetic in a modular way. Less than and greater than are also evaluated via the carry bit.

If we interpret some of the numbers as negative, it’s based on the top-most bit, the sign bit. This is known as signed arithmetic, and programming languages will use this for signed integers. Numbers are still interpreted based on their negative meaning in modular arithmetic. In this interpretation, adding two negative numbers and getting a positive, or adding two positive numbers and getting a negative, is known as overflow, and it shows up in the overflow flag.

In either case, Intel processors perform addition, subtraction, multiplication, and negation the exact same way for signed and unsigned arithmetic. The only difference is the flag, and Intel will always do the work to set or clear both flags appropriately. To distinguish between signed and unsigned arithmetic, programming languages check the specific flag they’re interested in for appropriate operations, like less than and greater than operations.

Two’s complement is not to be confused with “twos compliment” (drawing by my friend Ilse Purrenhage):

This word is relative. ↩︎

Sorting Polymorphically in Many Languages

2024-02-05T00:00:00+00:00

Polymorphism is a powerful programming language feature. In polymorphism, we have generic functions that don’t know exactly what type of data they will be operating on. Often, the data types won’t even all have been designed yet when the generic function is written. The generic function provides the general outline of the work, but the details of some parts of the work, some specific operations, must be tailored to the specific types being used. The generic code needs some way of accessing these specific operations, and the users of the generic code need some way of specifying them.

There are many use cases for polymorphism. When sorting an array, the algorithm will need to be adapted to the specific element type, so it knows how to compare elements. When drawing virtual objects on a screen, an algorithm might choose where to put each object and which objects to draw, whereas each type of object might have its own specialized implementation of how to draw it.

These are just two examples among many. Most complicated projects have many polymorphic functions. Even in languages that don’t support polymorphism directly, there are usually ways of building it out of existing primitives.

The example I’ve chosen is sorting, specifically sorting an array or vector. It’s just an example; a lot of what I say applies generally to how polymorphism works in that programming language.

This is a good example, as sorting is a function where it’s really obvious where polymorphism is required to get a properly generalizable algorithm. A lot of discussions of polymorphism invent contrived situations where polymorphism seems overkill, and I think that’s fundamentally confusing.

On the other hand, it’s a bad example in some ways, because it only makes sense in the context of a homogeneous array or list, where every element is the same type. This is a bad example because heterogeneous containers, where every element can have a different type and the polymorphic function has to look up as many function implementations as there are elements, provide a very different set of problems to solve.

This is especially important as Rust and C++ both provide two types of polymorphism, compile-time and run-time, also known as static and dynamic. The question of which to use is complicated, but for sorting, compile-time or static polymorphism is clearly the appropriate choice, with run-time or dynamic polymorphism feeling very awkward and forced. Heterogeneous containers generally must use some form of dynamic polymorphism (whether through virtual functions in C++ or through type erasure).

So, while I think this example will be illustrative, it won’t allow us to explore run-time, dynamic polymorphism on its home turf, if you will. Hopefully, I can make up this deficit in future blog posts.

Sorting: A Polymorphic Function

Sorting algorithms are a true use case for polymorphism: rather than distinguishing between a small set of options, many types support the operations necessary for sorting. The algorithm is agnostic to the implementation of those operations. Quick sort, insertion sort, and merge sort apply equally well to sorting integers, floating point values, or alphabetizing strings – any algorithm can be combined freely with any type, or at least any type for which a concept of “ordering” exists.

Here are the operations or properties (or dare I say, traits) that a type needs to be sortable, and that a generic sorting algorithm might need to find out about. The first one is obvious to OOP programmers, but the other two are more subtle, and implied in many OOP programming languages:

Ordering or comparison: Given two values a and b, this operation answers which is greater, or determines that they are equal. Some types have the additional possibility that they are incomparable – arrays of those types cannot be sorted by most algorithms.
Swapping or moving: The data has to be able to be moved around to turn the unsorted array into a sorted one. This is automatic in many OOP languages for object types due to ubiquitous use of indirection. It is also automatic in Rust, where every type can be moved by just copying all the bytes.
Striding the array or size: Given a pointer to one element, how do you get to the next one? By how many bytes must you increment the pointer? Most sorting algorithms require this to be constant. If you use indirection for the values, this is also trivial. If you do not, it is key information.

These operations – or more generally, traits of a type – can then be combined with a sorting algorithm to create a concrete procedure to sort an array for a given concrete type.
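To see the three traits in one place, here is a rough sketch of a generic insertion sort in Rust. It is only an illustration of where each trait shows up, not how any standard library sorts, and insertion_sort is my own name:

```rust
use std::mem::size_of;

// A generic insertion sort. The three traits of T appear as follows:
//  - ordering: the `T: Ord` bound supplies the `>` comparison;
//  - moving:   `swap` exchanges two elements in place;
//  - size:     the compiler knows size_of::<T>() and uses it to step
//              through the slice when we index it.
fn insertion_sort<T: Ord>(items: &mut [T]) {
    for i in 1..items.len() {
        let mut j = i;
        while j > 0 && items[j - 1] > items[j] {
            items.swap(j - 1, j);
            j -= 1;
        }
    }
}

fn main() {
    let mut numbers = vec![10, 9, 8, 7, 6, 5, 4, 3, 2, 1];
    insertion_sort(&mut numbers);
    println!("{:?}", numbers);

    let mut words = vec!["b", "c", "a"];
    insertion_sort(&mut words);
    println!("{:?}", words);

    println!("an i32 element takes {} bytes", size_of::<i32>());
}
```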

So let’s see how various programming languages handle this.

Programming Language #0: Sorting in C

I will start our tour of programming languages with C. C – the non-OOP, non-C++ programming language; the classic “portable assembly language” from 1972 – doesn’t have many polymorphic algorithms, algorithms that accept any type, because you have to implement polymorphism by hand. But sorting is an important enough one that standard C does have a generic sorting function: qsort for quicksort (and on many systems, heapsort and mergesort are also available). Because polymorphism is implemented by hand, we can look at this function to see how one might specifically tailor polymorphism to the problem of sorting.

Here is the function signature for qsort:

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

It can be used to sort blocks of memory containing a sequence of integers, floating point values, or (pointers to) strings – any comparable and (trivially) movable fixed-size type.

C function signatures can be hard to read, so I’ll break it down argument by argument:

void *base: This is an untyped pointer (void *) to the beginning of the block of memory to be sorted.
size_t nmemb: This is a bound: how many members the block of memory contains. C often represents aggregates by two values, a base pointer and a count of the members.
size_t size: How big is each member? On a typical 64-bit system, an int is 4 bytes, a double is 8, and a char * for strings is 8 bytes. Custom types might be any size. qsort should work for all of these types, without indirection.
int (*compar)(const void *, const void *): This is the interesting part. This is a function pointer for the comparison operation as discussed above. You write a function that takes two pointers to two elements, and returns a value that encodes their relationship.

Swapping is assumed to be byte-by-byte, and so size covers the last two attributes of the type listed above. The key one here is compar, a bit of code that qsort has to call to do an operation specific to your type, a small policy injection that adapts a generic algorithm to your particular type.

The return value of compar is an int, but it is interpreted according to a C convention, shared with (for example) the string comparison function strcmp. For a ? b, a return value r is interpreted thus:

if r < 0, a < b
if r > 0, a > b
if r == 0, a == b
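For comparison, Rust spells out the same three outcomes as an enum, std::cmp::Ordering. This sketch translates between the two conventions; decode and encode are hypothetical names:

```rust
use std::cmp::Ordering;

// Decode a C-style comparison result: only the sign of `r` matters.
fn decode(r: i32) -> Ordering {
    r.cmp(&0)
}

// Encode an Ordering back into the C convention.
fn encode(o: Ordering) -> i32 {
    match o {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

fn main() {
    println!("{:?}", decode(-5)); // any negative value means a < b
    println!("{:?}", decode(0));
    println!("{:?}", decode(42)); // any positive value means a > b
    println!("{}", encode(3i32.cmp(&7)));
}
```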

So, here’s a complete C program that sorts its command line arguments – including the program name:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare_strings(const void *a, const void *b) {
    // `a` and `b` are pointers to the element type, which in
    // this case is `char *`. Thus they are `char **`.
    //
    // Nothing is stopping you from getting this wrong and putting
    // `char *` instead -- it will just silently not work. The
    // compiler can and will make you write `const` in the right
    // place, though.

    char * const* a_str_ptr = a;
    char * const* b_str_ptr = b;

    // `strcmp` uses the same convention as `qsort` for comparison.
    return strcmp(*a_str_ptr, *b_str_ptr);
}

int main(int argc, char **argv) {
    qsort(argv, argc, sizeof(char *), &compare_strings);

    for (int i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }

    return 0;
}

But the same qsort function can also be used to sort integers, if given different parameters and a different comparison function:

#include <stdlib.h>
#include <stdio.h>

int compare_ints(const void *a_vp, const void *b_vp) {
    const int *a_ip = a_vp;
    const int *b_ip = b_vp;

    int a = *a_ip;
    int b = *b_ip;

    if (a < b) {
        return -1;
    } else if (a == b) {
        return 0;
    } else { // a > b
        return 1;
    }
}

int main(int argc, char **argv) {
    int intary[10] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    qsort(intary, 10, sizeof(int), &compare_ints);

    for (int i = 0; i < 10; i++) {
        printf("%d\n", intary[i]);
    }

    return 0;
}

qsort implements a form of manual run-time polymorphism, in a programming language with no built-in support for polymorphism. It behaves differently based on the element type, as passed to it via a variety of arguments. One of the traits – the comparison operator – differs between types in a way that requires custom code, and this is passed in via pointer. qsort then invokes the operation via indirect function call, the same mechanism that is used for polymorphism in OOP. But unlike OOP-style runtime polymorphism, there is just one function pointer for all the items, rather than each item coming with its own “vtable.”
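To show there is nothing magic about this mechanism, here is a sketch of a qsort-shaped routine in Rust that sees only raw bytes, an element size, and a function pointer. All the names here (sort_bytes, compare_i32, sort_i32_via_bytes) are hypothetical, and I use insertion sort purely to keep it short; this is not how qsort itself is implemented:

```rust
use std::cmp::Ordering;

// Sorts `base` in place, treating it as consecutive elements of `size`
// bytes each, and calling `compar` through a function pointer.
fn sort_bytes(base: &mut [u8], size: usize, compar: fn(&[u8], &[u8]) -> Ordering) {
    assert!(size > 0 && base.len() % size == 0);
    let nmemb = base.len() / size;
    for i in 1..nmemb {
        let mut j = i;
        while j > 0 {
            // Split so we can borrow two adjacent elements mutably at once.
            let (left, right) = base.split_at_mut(j * size);
            let prev = &mut left[(j - 1) * size..];
            let cur = &mut right[..size];
            if compar(&*prev, &*cur) != Ordering::Greater {
                break;
            }
            prev.swap_with_slice(cur); // swap the raw bytes of the two elements
            j -= 1;
        }
    }
}

// The type-specific part: each element is a native-endian i32.
fn compare_i32(a: &[u8], b: &[u8]) -> Ordering {
    let a = i32::from_ne_bytes([a[0], a[1], a[2], a[3]]);
    let b = i32::from_ne_bytes([b[0], b[1], b[2], b[3]]);
    a.cmp(&b)
}

// Helper for the demo: sort i32 values by round-tripping through raw bytes.
fn sort_i32_via_bytes(values: &[i32]) -> Vec<i32> {
    let mut bytes: Vec<u8> = values
        .iter()
        .flat_map(|v| v.to_ne_bytes().to_vec())
        .collect();
    sort_bytes(&mut bytes, 4, compare_i32);
    bytes
        .chunks(4)
        .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let sorted = sort_i32_via_bytes(&[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]);
    println!("{:?}", sorted);
}
```

As with qsort, sort_bytes knows nothing about i32. Hand it a different size and a different comparison function, and the same code sorts a different type.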

Note that the optimizer is not able to eliminate this indirect call, especially in the qsort example, where the sorting function is in the standard library, whereas the function calling it and the comparison function are both in application code. This comes at a performance cost, which means that if you’re programming C and the performance of this particular sort is essential to your program, it might easily make sense to write custom sorting code that is not polymorphic.

Programming Language #1: Sorting in Java

Java is about as far from C as you can get in this matter. C provides no abstraction or language features specifically for polymorphism, and in qsort we use a low-level tool it does provide – function pointers – to build it ourselves. In Java, however, the programming language is explicitly object-oriented, and so the whole programming language is designed to encourage you to leverage polymorphism, as that is one of the pillars of object-oriented programming.

The version of polymorphism available in Java is dynamic, run-time, “late binding” polymorphism, the type of polymorphism that OOP favors. It is based off of the idea of overriding methods, either from base classes, or interfaces that a custom type (a “class”) can implement.

As I mentioned before, this is not the best match for the problem of sorting, at least not the type of sorting we’re talking about. Run-time polymorphism means that every individual element could potentially have a different comparison procedure, which is unlikely. The possibility of such a thing happening increases the cognitive load.

Nevertheless, Java does support polymorphic sorting, and it’s useful to discuss specifically because it does show how OOP-style polymorphism works when applied to such a problem.

There are many methods that do sorting in Java. Some of them take an explicit argument to convey how to do comparisons, just like the qsort example. But more commonly, we sort according to what Java refers to as the “natural order” of the elements, as (for example) in this overload of Collections.sort, with the following signature:

public static <T extends Comparable<? super T>>
void sort(List<T> list)

This sorts a list of elements of type T, where “list” in Java can refer to any of a number of collections that store data in order, such as in a single allocated array (ArrayList) or a linked list (LinkedList). Therefore, it is not only polymorphic in how to compare the elements, but also in how to navigate through the list.

It needs to know about the same traits of type T that qsort does. Some are not polymorphic: for this method to make sense, we know that T must be a reference type, that it must be boxed (that is, it must use indirection), and that therefore the size of an element is always the natural pointer size of the platform, and swapping the element only involves swapping the pointers.

But there’s no getting around the polymorphism of comparisons, and so we see this strange annotation on the function signature: <T extends Comparable<? super T>>. This indicates that T must implement the interface Comparable – implement in this context is called extends. Specifically, it must implement that interface in such a way that it can be applied to other elements of type T (which means that it uses T or some “supertype” of T).

The notation is complicated, because the semantics are complicated. Technically, T could be comparable to a parent type of T, and that would still work. In fact, T could refer to an entire class hierarchy of types derived from some base class, all of them comparable in different ways to objects elsewhere in the hierarchy and to objects derived from a yet further base class. Objects of type T could even be comparable to any arbitrary object – and all of this is covered in <T extends Comparable<? super T>>, trying to express at compile-time what will cause the type T to be a reasonable type to use for sorting.

But this is all just an extra check that the compiler can do at compile-time to prevent run-time errors, because all of the information on how to do the comparisons is available at run-time. In fact, other methods don’t use such formal prerequisites at all, preferring to query at run-time for appropriate interfaces, throwing an exception if they are not present.

In all of these cases, the comparison is the “natural ordering,” which is defined to mean that comparison is done through a Java interface. Specifically, these methods use the Comparable interface, which specifies a method, compareTo, which must take an implicit this parameter and an explicit parameter of the type being compared to, and, like the comparison functions in qsort, must then return an integer whose sign indicates whether the first value was greater or the second (with zero indicating equality).

This natural ordering is defined on a per-type basis. Each type can only implement Comparable once. Fortunately, the regular built-in types, all the ones we are likely to use, all come with good natural orderings. For example, this code all works:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        List<String> argList = Arrays.asList(args);
        Collections.sort(argList);
        for (String arg : argList) {
            System.out.println(arg);
        }

        List<Integer> list = new ArrayList<Integer>();
        list.add(1);
        list.add(3);
        list.add(2);
        list.add(4);
        Collections.sort(list);
        for (int i : list) {
            System.out.println(i);
        }
    }
}

See it in use:

$ java Sort b c a
a
b
c
1
2
3
4
$

It gets a little less coherent when we mix different types of object in the same list, which Java lets us represent in the type system by using Object, which is a type that can store a reference to any non-primitive (including boxed primitives):

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<Object>();
        list.add(1);
        list.add("Hi");

        Collections.sort(list);

        for (Object i : list) {
            System.out.println(i);
        }
    }
}

While the Java runtime allows us to create such a collection, the type system does not allow us to use Collections.sort to sort it, as Object does not provide us enough information to make sure these elements can properly be compared to each other (which, in fact, they cannot, as comparing strings to integers is not defined in Java’s “natural ordering”):

$ javac Sort.java
Sort.java:9: error: no suitable method found for sort(List<Object>)
        Collections.sort(list);
                   ^
    method Collections.<T#1>sort(List<T#1>) is not applicable
      (inference variable T#1 has incompatible bounds
        equality constraints: Object
        lower bounds: Comparable<? super T#1>)
    method Collections.<T#2>sort(List<T#2>,Comparator<? super T#2>) is not applicable
      (cannot infer type-variable(s) T#2
        (actual and formal argument lists differ in length))
  where T#1,T#2 are type-variables:
    T#1 extends Comparable<? super T#1> declared in method <T#1>sort(List<T#1>)
    T#2 extends Object declared in method <T#2>sort(List<T#2>,Comparator<? super T#2>)
1 error
$

So how does this work? What is a Java interface? What are its advantages or disadvantages?

Well, Java has two types of values: primitives on the one hand, and object references on the other. In order to use interfaces, or polymorphism at all, we must be dealing with objects. For primitives, there are separate methods for sorting various types of arrays in the Arrays class. As primitives cannot be stored directly in collections, Collections doesn’t have to deal with them.

So, to use this polymorphism through interfaces, we must be dealing with objects. Objects in Java are a rich, standardized data structure, which is why it’s possible to query at run-time which interfaces an object supports. Objects contain not just the fields that the Java programmer specifies, but additional metadata that includes implementations of any supported interfaces, including Comparable. That metadata can be used to find the right version of the compareTo method to use to sort objects of type T. Once we have a T, we can query it at run-time to find the compareTo method. Theoretically, Java might query every object separately as it sorts, with a separate query for each comparison, although I trust that modern Java will in many cases realize that the method will be the same for each object, and figure out a way to optimize it out.

As a programmer of a type, we simply declare at the top of the class that our type Foo, for example, implements Comparable<Foo>, and then lower down include our implementation of compareTo among our methods, marked with the @Override annotation. Based on that, Foo objects will be created with the correct metadata such that Java will know to use that method for comparison when sorting, whether the type is known at compile-time or at run-time. We can implement our own version of compareTo that imposes a different order than the typical “natural ordering” one would expect from the state that is contained in a Foo:

import java.util.*;

public class Sort {
    private static class Foo implements Comparable<Foo> {
        int inner;

        public Foo(int inner) {
            this.inner = inner;
        }

        @Override public int compareTo(Foo foo) {
            // Less and greater are swapped by this compared to int
            // comparison
            if (foo.inner > this.inner) {
                return 1;
            } else if (foo.inner < this.inner) {
                return -1;
            } else {
                return 0;
            }
        }

        public String toString() {
            return "" + inner;
        }
    }

    public static void main(String[] args) {
        List<Foo> list = new ArrayList<Foo>();
        list.add(new Foo(3));
        list.add(new Foo(4));
        list.add(new Foo(1));
        list.add(new Foo(2));

        Collections.sort(list);

        for (Object i : list) {
            System.out.println(i);
        }
    }
}

Here is the output:

$ java Sort
4
3
2
1
$

Built-in types such as String and Integer already provide their own compareTo override methods, corresponding to more typical implementations of comparisons. Only the author of each type can provide information on how the types are to be compared in this way. To get around this, you can use a wrapper type for each element (like Foo), or you have to fall back on passing in the comparison function the old-fashioned way, like in qsort – though in Java passing in a function is accomplished here through yet another interface, Comparator, as in this alternative function:

public static <T> void sort(List<T> list,
                            Comparator<? super T> c)

Here, Comparator is effectively a function pointer with context, but it’s expressed as an interface so that you can write a concrete class that implements the desired function. Fundamentally, Rust and C++ do something similar.
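To make the "function pointer with context" idea concrete, here is a minimal sketch of the Rust counterpart, where a closure passed to sort_by plays the role of the Comparator object. The helper name sort_by_distance and the target parameter are my own inventions for illustration; only sort_by itself comes from the standard library.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: sorts `values` by how close each one is to `target`.
/// The closure passed to `sort_by` captures `target`, which is the "context"
/// that a bare C function pointer could not carry.
fn sort_by_distance(values: &mut [i32], target: i32) {
    values.sort_by(|a, b| -> Ordering {
        let da = (*a - target).abs();
        let db = (*b - target).abs();
        da.cmp(&db)
    });
}

fn main() {
    let mut values = vec![1, 9, 4, 7];
    sort_by_distance(&mut values, 5);
    println!("{:?}", values);
}
```

The ordering here belongs to the call site rather than to the element type, which is the same trade that Comparator makes in Java.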

So, how are we to evaluate this system? It’s not particularly designed for situations like sorting. The run-time system is built for heterogeneous containers, where each individual element of a collection might have a different opinion on how to compare itself to the others. That amount of run-time flexibility is overkill for this situation.

Rather than providing one sorting function pointer, as in the C example, each object comes with its own infrastructure for finding out how to not only sort, but do every other thing that Java might want to do polymorphically with that object, such as convert it to a string, or hash. While the infrastructure is well-optimized and performant for the assumption of heavy use of OOP-style polymorphism, it clearly doesn’t hold to the C++ or Rust performance ideals of not paying for what you don’t use, instead opting to pay an up-front cost under the assumption that any and all objects will regularly be used polymorphically, in OOP style.

The type system in Java is conceptualized as a way of preventing errors, a layer of safety on top of a more Smalltalk-like natural OOP state. In Smalltalk any method can be invoked on any object, and it’s simply a run-time error if that method isn’t available. In Java, the types form a more rigorous layer to check to make sure our method calls have correct semantics, allowing errors to be caught earlier, at compile-time (although Java type errors are also sometimes caught at run-time). The power of the more ideologically pure form of OOP is still available in Java, as evidenced by the signature on the Arrays.sort method alluded to above (and documented here). It takes a plain Object[] with no compile-time bound on the element type:

public static void sort(Object[] a)

Here is a use case that succeeds:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        Arrays.sort(args);
        for (String arg : args) {
            System.out.println(arg);
        }
    }
}

Here is the output:

$ java Sort a c b
a
b
c
$

Here is a use that fails:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        Object[] array = new Object[2];
        array[0] = Integer.valueOf(0);
        array[1] = "Hi";
        Arrays.sort(array);
        for (Object obj : array) {
            System.out.println(obj);
        }
    }
}

It outputs:

Exception in thread "main" java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
	at java.base/java.lang.String.compareTo(String.java:125)
	at java.base/java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
	at java.base/java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
	at java.base/java.util.Arrays.sort(Arrays.java:1249)
	at Sort.main(Sort.java:8)

The cost of this is acceptable in Java but not in Rust or C++, or C for that matter. Every object must contain individual metadata if it is to be sortable through a polymorphic function, and it must be boxed. In C++ or Rust, we must be able to sort arbitrary unboxed data, without extra metadata included directly within it. But in Java, all types except for primitives are boxed, only boxed types support polymorphism, and they pay for it with additional data in each heap allocation. And it works for Java’s goals of being a garbage-collected OOP language with a layer of types to expose errors at compile-time.

As the C example shows, this cost isn’t intrinsic to run-time polymorphism in general, but it is intrinsic to OOP-style polymorphism. OOP uses run-time polymorphism at an individual object level as one of its core features, even when the function does not need to be conveyed on a per-element basis, but only once.

Programming Language #2: Sorting in C++

C++, of course, supports this type of run-time polymorphism. We could, if we wanted, build a system like Java’s, where we had an abstract class Comparable that we could use to add run-time data to show every object of a type how to be compared with every other object. We could require that collections to be sorted contain classes that inherit from – in C++, inheritance and interface implementation are the same – Comparable. C++’s run-time polymorphism could be used to implement sorting in the exact same way as Java.

But that’s not how sorting is implemented in C++. Sorting, in C++, uses a completely unrelated mechanism of templates. Templates are C++’s mechanism for static, compile-time polymorphism, just as virtual functions and inheritance are C++’s mechanism for dynamic, run-time polymorphism (of a classical OOP variety that closely resembles Java). In spite of them both being forms of polymorphism, and having many overlapping use cases, templates and virtual functions are completely unrelated features.

I have seen people argue that templates and virtual functions are justified in being completely unrelated, because every situation clearly calls for one or the other. But if it’s possible to do sorting with run-time polymorphism, as we see from Java, then clearly the distinction is not as clear-cut as all that. What’s to stop a former Java programmer from using C++’s run-time polymorphism to implement their own sorting function a la Java, even though that’s not idiomatic C++? There’s clearly some level of overlap in use cases, even if not in semantics!

So, how do templates actually work?

Caveat for modern C++ fans: I’m going to save concepts for the end. They don’t actually substantially affect my point (as I will explain). I think it’s simpler to talk about pre-concepts C++ at first, and then discuss how concepts impact (or rather, don’t really impact) the equation.

Templates are a form of macro system. A template (class template, function template, type alias template, etc.) is given parameters at compile-time. Once the template is given parameters, it is instantiated and stamps out a concrete component of the program (a class, function, type alias, etc.).

So, that’s quite abstract. This is a situation where an example can help a lot. In line with our theme, we’re going to write a template that involves comparisons: given two values of any type that you can compare (and we’ll have to decide what that means), which is bigger?

template <typename T>
T max_value(T a, T b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

When we actually invoke it, we provide a type for T, giving us a specialized function where T is replaced by that type.

std::cout << max_value<int>(3, 4) << std::endl;
std::cout << max_value<std::string>("hi"s, "hello"s) << std::endl;

The mere mention of max_value<int> creates a function max_value<int>, and likewise for max_value<std::string>. Each such function is the template with the type given in angle brackets substituted for T.

Of course, for function templates, specifying the T is optional, as C++ can infer it, so this code works equally well:

std::cout << max_value(3, 4) << std::endl;
std::cout << max_value("hi"s, "hello"s) << std::endl;

So, what are the resulting functions? It’s much as if we had written:

int max_value(int a, int b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

std::string max_value(std::string a, std::string b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

These are separate functions. The compiler will simply generate as many separate versions of max_value as it needs to. It outputs separate assembly language for each of them, and treats them as function overloads, meaning that it uses the static (compile-time) type of the parameters to figure out which function to call.

So, from the perspective of someone reading the code, we call max_value twice, and it figures out how to do its thing on an int or a std::string. It’s polymorphic, as it does the same algorithm (finding max) with an operation that changes based on type (<). But from the perspective of someone reading the outputted assembly, it’s not polymorphic – we’ve simply got two different functions that do max_value in two different ways.

In other words, we’ve gone from polymorphic code (compile time) to monomorphic code (run time). This is why Rust calls its equivalent to template instantiation “monomorphization.” This is also why it’s called “compile time polymorphism” – it is no longer polymorphic at run-time.
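Since Rust was just mentioned, here is a small sketch of the same idea on the Rust side. It mirrors max_value from above (the PartialOrd bound is my choice, standing in for "supports <"), and names each instantiation as an ordinary, non-generic function pointer to emphasize that the results are separate monomorphic functions.

```rust
/// Generic in source; the compiler emits one specialized copy per type used.
fn max_value<T: PartialOrd>(a: T, b: T) -> T {
    if a < b { b } else { a }
}

fn main() {
    // Each instantiation can be named and stored as a plain function
    // pointer with a concrete, non-generic type.
    let max_int: fn(i32, i32) -> i32 = max_value::<i32>;
    let max_string: fn(String, String) -> String = max_value::<String>;

    println!("{}", max_int(3, 4));
    println!("{}", max_string("hi".to_string(), "hello".to_string()));
}
```

Nothing about T survives to run-time in either function; max_int only knows how to compare i32 values, and max_string only knows how to compare String values.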

The advantage: This is a zero-overhead abstraction. We’re having the compiler write, on our behalf, specialized code for each type. We do not need each element to have virtual function metadata to indicate how to do comparisons, nor do we even need a function pointer like with qsort. It’s as optimal as specialized hand-written code, but we didn’t have to do the specialization.

The disadvantage: We have to know the type at compile-time. This prevents heterogeneous containers from being possible with this style of polymorphism. This type of polymorphism can only be based off of the compile-time type, not based off of changing run-time types. It is the exact opposite of “late binding” – the binding is done at compile-time. So, this could not be used for polymorphism over different types of widgets in a list of widgets.

The other disadvantage: Compile times take longer and the resultant binary is larger. (Eh, shrug.)

So what operations are needed to support this template? What definition are we using for “comparable type” for T? We’re not explicitly using any at all, but note that if the type T doesn’t support the < operator, this code will simply fail to compile:

class Foo {
};

max_value(Foo{}, Foo{});

Giving the error:

test.cpp: In instantiation of ‘T max_value(T, T) [with T = main()::Foo]’:
test.cpp:22:14:   required from here
test.cpp:8:11: error: no match for ‘operator<’ (operand types are ‘main()::Foo’ and ‘main()::Foo’)
    8 |     if (a < b) {
      |         ~~^~~

This goes away if we give it the < operator.

class Foo {
public:
    bool operator <(const Foo &other) const {
        return false; // All Foos are created equal!
    }
};

max_value(Foo{}, Foo{}); // Now compiles

If we’d written max_value differently, using > instead, this might not have made the error message go away. It turns out, however, that < is the conventional operator to use for comparisons: the C++ equivalent to Java’s Comparable, the defining function for “natural order” by convention.

Is that all that’s required to make max_value work? It turns out no, as many an astute C++ programmer has probably already noticed. There is another operation besides operator< required to make max_value work, and this is because I intentionally made a mistake (so I could reveal it later to show how subtle templates can be).

Let’s take a look at the instantiation for std::string again, just the signature:

std::string max_value(std::string a, std::string b);

Is that how we’d write max_value by hand for std::string? No, we wouldn’t. We’d write const std::string &a, and take it by reference, so that no new objects are initialized in the comparison and return. If you’re not a C++ programmer, this might seem shocking, but max_value as we wrote it requires the type to be passable by value, which is a capability that a type might not have:

class Foo {
public:
    Foo() = default;
    Foo(const Foo&) = delete;
    bool operator <(const Foo& other) const {
        return false; // All Foos are created equal
    }
};

max_value(Foo{}, Foo{}); // Error! Error!

So, we missed the mark, quite by accident! We had an extra requirement besides comparison, and we can fix that by taking the value by (const) reference (which is what std::max does anyway), which also implies returning by reference:

template <typename T>
const T &max_value(const T &a, const T &b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

So what was required from T for us to call max_value?

In one sense, nothing besides that it should be a type! We could pass any type in for T, and the compiler will plug in the type and chug away, running into errors only once it has attempted to do so! This might actually happen several template instantiations deep, and the resulting error shows up in the template where the operation is attempted, not where you use the template with an inappropriate type, which can be confusing.

In another sense, what is required is that we pass types that make max_value compile, so in this case, ones that support operator <. However, there is no guarantee or check that the type is making the semantic promises that correspond to that operator. Sorting, for example, requires that the operator define a strict weak ordering. If the operator doesn’t in fact do that, std::sort will compile but won’t work properly.
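To show what that semantic promise amounts to, here is a rough sketch (in Rust, this blog's usual language, rather than C++) of a brute-force check that a "less than" function behaves like a strict weak ordering on a handful of sample values. The function is_strict_weak_ordering is hypothetical and purely illustrative; no compiler or standard library runs a check like this for you.

```rust
/// Hypothetical checker: brute-force tests whether `less` behaves like a
/// strict weak ordering on the given sample values.
fn is_strict_weak_ordering<T>(samples: &[T], less: impl Fn(&T, &T) -> bool) -> bool {
    // Two values are "equivalent" when neither is less than the other.
    let equiv = |a: &T, b: &T| !less(a, b) && !less(b, a);
    for a in samples {
        // Irreflexivity: nothing is less than itself.
        if less(a, a) {
            return false;
        }
        for b in samples {
            // Asymmetry: a < b and b < a cannot both hold.
            if less(a, b) && less(b, a) {
                return false;
            }
            for c in samples {
                // Transitivity of "less than".
                if less(a, b) && less(b, c) && !less(a, c) {
                    return false;
                }
                // Transitivity of equivalence.
                if equiv(a, b) && equiv(b, c) && !equiv(a, c) {
                    return false;
                }
            }
        }
    }
    true
}

fn main() {
    let samples = [1, 2, 3, 4];
    // Ordinary less-than passes.
    println!("{}", is_strict_weak_ordering(&samples, |a, b| a < b));
    // Less-than-or-equal fails: it is not irreflexive.
    println!("{}", is_strict_weak_ordering(&samples, |a, b| a <= b));
    // "Less by more than one" fails: 1~2 and 2~3, yet 1 is less than 3.
    println!("{}", is_strict_weak_ordering(&samples, |a, b| *a + 1 < *b));
}
```

The last case is the sneaky one: it looks like a perfectly sensible comparison, type-checks everywhere a comparison is expected, and still breaks the promise that sorting relies on.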

It seems reasonable in this case to expect people to use operator < for less-than as it’s such a well-established and fundamental operator. But templates can also invoke named methods. What if somebody writes a template that calls some_t.foo() expecting it to do one thing, and someone calls that template with an unrelated class that has a type-compatible foo method, but with different semantics? There is no indication to the compiler, when you write the class, that you intend for foo to be appropriate for use in the template. We didn’t have to say, when we wrote Foo here, that our operator < was valid for std::sort.

Concepts do help with that. You can statically assert that a class supports a concept’s requirements, and that documents your intention to support it semantically as well. Concepts can also cover stricter requirements than a template incidentally imposes, and help document the semantics of templates.

But everything about concepts is opt-in; you can always write a template that will sometimes fail on instantiation. And that makes them much less useful in my book. Don’t get me wrong: I’m glad they exist. I think C++ with concepts is better than C++ without concepts. But it only goes so far, especially when compared with Rust traits, which are mandatory for Rust’s form of compile-time polymorphism.

More relevant than all of this, to me, is that templates and OOP work so differently from each other. Run-time polymorphism and compile-time polymorphism are just completely different beasts. Students are taught the OOP style run-time polymorphism, and that doesn’t really help them understand templates, or even get started doing so. Again, I feel C++ is too big.

But, at least it has this zero-overhead abstraction, without requiring a method look-up and an indirection for every item to be sorted.

std::sort, by the way, takes iterators. These iterators must be value swappable legacy random access iterators, and that’s just a subset of the requirements, as seen in std::sort’s CPPReference page. The way to get from one element to another (and therefore implicitly the size), the way to swap elements, and the way to compare them are all implicitly derived from RandomIt, the type parameter specifying the type of the iterator (at least in the overloads of std::sort that do not take an explicit comparator).

Programming Language #3: Sorting in Haskell

Now for Haskell!

We’re mostly talking about Haskell to move on to talking about Rust, as this is a Rust-focused blog. There’s a lot going on with Haskell typeclasses that I won’t have time to get into here.

Haskell is where Rust got traits from, although Haskell calls them typeclasses. Incidentally, Haskell uses run-time polymorphism where Rust uses compile-time polymorphism, but the semantics are more similar than you might expect from that statement.

In Haskell, like Java, all types that sort accepts are boxed, covering size and swapping among the traits that might need to be customized. Unlike Java, the operations we need to perform on values of this type are passed to sort once, rather than looked up on a per-element basis.

Here is the type for sort:

sort :: Ord a => [a] -> [a]

a here is like T in C++: a type variable that can be replaced with any type. As in Java, this is subject to type erasure: sort just operates on generic boxed values. Any comparison-specific operations it needs come from the Ord a =>, which constrains a to types that have instances of the Ord typeclass.

Here is the definition of Ord:

class  (Eq a) => Ord a  where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

    compare x y = if x == y then EQ
                  -- NB: must be '<=' not '<' to validate the
                  -- above claim about the minimal things that
                  -- can be defined for an instance of Ord:
                  else if x <= y then LT
                  else GT

    x <  y = case compare x y of { LT -> True;  _ -> False }
    x <= y = case compare x y of { GT -> False; _ -> True }
    x >  y = case compare x y of { GT -> True;  _ -> False }
    x >= y = case compare x y of { LT -> False; _ -> True }

        -- These two default methods use '<=' rather than 'compare'
        -- because the latter is often more expensive
    max x y = if x <= y then y else x
    min x y = if x <= y then x else y

It defines many methods that an instance of Ord can support. These methods are functions defined in terms of each other; you must specifically implement at least one of them for your type to prevent infinite regress. Minimally, either compare or <= is sufficient, with compare recommended for more complex types.

Unlike in C++, when you define these methods, it is not enough to simply define a function called <= or compare. Haskell won’t even let you define functions with the same fully qualified name as the methods, which exist in the same namespace as any other functions. Unlike C++, Haskell does not have function overloading, and any time the same fully qualified name has different semantics for different types, it is through this mechanism of typeclasses. Like in Java, you have to explicitly declare your intention to implement the methods as found in Ord, by writing an instance explicitly, like so:

import Data.Ord
import Data.List

data Foo = Foo Integer
    deriving Show

instance Eq Foo where
    (Foo a) == (Foo b) = a == b

instance Ord Foo where
    (Foo a) <= (Foo b) = b <= a

main = do
    let list = [Foo 3, Foo 4, Foo 2]
    print $ sort list                   -- outputs [Foo 4,Foo 3,Foo 2]

Note that the instance declarations are separate from the definition of the type! The module where the type is declared can define them, but so can the module where the typeclass is declared. Instances defined anywhere else are “orphan instances,” which GHC can warn about, because the goal is a single canonical definition of an instance for a given type and typeclass.

How does this actually work then? Well, Ord a is a secret parameter to sort. Haskell will create a bundle of function pointers for us that represent the specific Ord instance for whatever type we pass to sort, either from knowing the type statically at that point, or passing along a bundle passed into whatever called sort. So this compiles to something quite similar to the C qsort (at least as far as polymorphism is concerned), taking in a comparison function. The big difference is, Haskell will choose the comparison function for us – but it is one comparison function, not one comparison function per item as in Java.
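The "secret parameter" can be written out by hand. Below is a minimal sketch, in Rust rather than Haskell, of what passing the bundle explicitly might look like. OrdDict, sort_with_dict, and compare_reversed are names I made up for illustration, and the insertion sort is only there to keep the example short; this is not how GHC or any real sort is implemented.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the hidden `Ord a` argument: a bundle of
/// function pointers describing how to compare values of type `T`.
struct OrdDict<T> {
    compare: fn(&T, &T) -> Ordering,
}

/// Insertion sort that receives the dictionary once, explicitly, instead of
/// asking each element how to compare itself.
fn sort_with_dict<T>(dict: &OrdDict<T>, items: &mut [T]) {
    for i in 1..items.len() {
        let mut j = i;
        while j > 0 && (dict.compare)(&items[j - 1], &items[j]) == Ordering::Greater {
            items.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// One possible "instance": integers in reverse order.
fn compare_reversed(a: &i64, b: &i64) -> Ordering {
    b.cmp(a)
}

fn main() {
    // The dictionary is chosen once at the call site, much as Haskell picks
    // the Ord instance for the element type.
    let reversed = OrdDict { compare: compare_reversed };
    let mut items = vec![3, 4, 2];
    sort_with_dict(&reversed, &mut items);
    println!("{:?}", items);
}
```

The difference in Haskell is only that nobody has to write the dict argument: the compiler finds the right bundle from the type and threads it through.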

Programming Language #4: Sorting in Rust

So, how does Rust do all of this?

As I said, a Rust trait is very much like a Haskell typeclass. Rust’s main sort method, like Haskell, requires the Ord ~~typeclass~~ trait. Like Haskell, it even has provided (but overrideable) methods as well as required methods:

pub trait Ord: Eq + PartialOrd {
    // Required method
    fn cmp(&self, other: &Self) -> Ordering;

    // Provided methods
    fn max(self, other: Self) -> Self
       where Self: Sized { ... }
    fn min(self, other: Self) -> Self
       where Self: Sized { ... }
    fn clamp(self, min: Self, max: Self) -> Self
       where Self: Sized + PartialOrd { ... }
}

Like with typeclasses, indicating that a type has a trait requires a specific block that says what trait we’re trying to implement, and lists the implementation of the required methods. Like in Haskell, that block may reside in the crate where the trait is defined, or the crate where the type is defined. Like in Haskell, this allows us to add polymorphism to previously unpolymorphic operations without having to create wrapper types.

Here is an example of implementing this trait (unfortunately, we have to implement both Ord and PartialOrd):

use std::cmp::Ordering;

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
struct Foo(u32);

impl PartialOrd for Foo {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Delegate to Ord so the two orderings can never disagree
        Some(self.cmp(other))
    }
}

impl Ord for Foo {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.cmp(&self.0)
    }
}

fn main() {
    let mut foos = vec![Foo(3), Foo(4), Foo(1), Foo(2)];
    foos.sort();
    println!("{:?}", foos); // Displays [Foo(4), Foo(3), Foo(2), Foo(1)]
}

It’s very similar to Haskell, but with “C-like” syntax and aesthetic. The syntax for the functions using the trait looks like C++ templates:

fn max<T: Ord>(a: T, b: T) -> T {
    if b > a { b } else { a }
}

What’s different from Haskell is how it’s implemented. The semantics are quite similar, and the Rust implementation can be thought of as an optimization of the Haskell semantics. Instead of passing in to sort() a secret run-time parameter with Foo’s implementation of Ord, the function is monomorphized. We can think of it as inlining just that one parameter at compile-time, and generating a specialized function.

Yes, this implementation is fundamentally very similar to C++’s implementation of templates. It’s basically the same in terms of machine code and resulting optimizations. But the semantics are more Haskell-like. Polymorphic functions are type-checked once. They may only use functionality incorporated in the traits at hand. We don’t postpone the type-checking for the template instantiation.

What’s more, the same mechanism is also used for Rust’s run-time polymorphism, where we can have a type like dyn MyTrait for some specific traits that are object-safe. These trait object types are like OOP polymorphic types, in that each value comes with its own pointer to a table of polymorphic functions, but that pointer lives outside the original object. It is a property of the pointer, not of the object, and is implemented with fat pointers.
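Here is a small sketch of what "a property of the pointer" means in practice. The trait Shape and the types Square and Rect are hypothetical, and the size comparisons reflect how current Rust compilers lay out references as far as I know, rather than a documented language guarantee.

```rust
use std::mem::size_of;

// Hypothetical trait and types, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Sums areas over a heterogeneous list; each call goes through the vtable
/// carried by that element's fat pointer.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // A plain reference is one pointer wide; a trait-object reference
    // carries a second pointer, to the vtable.
    println!("{}", size_of::<&Square>() == size_of::<usize>());
    println!("{}", size_of::<&dyn Shape>() == 2 * size_of::<usize>());
    // The object itself stores no polymorphism metadata.
    println!("{}", size_of::<Square>() == size_of::<f64>());

    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("{}", total_area(&shapes));
}
```

A Square that is never used as a dyn Shape pays nothing for the possibility, which is the contrast with the per-object metadata in Java.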

Like with any other trait, the trait implementation is separate from the type definition or the trait definition (though it must live in the same crate as one of them). Unlike C++, there is one system for polymorphism that can be used in both run-time and compile-time ways, with overlap where possible.

Conclusion

I hope this shows, if nothing else, that polymorphism itself can take many forms in many programming languages beyond the OOP variety of it. The OOP variety is in some senses self-propagating – if you optimize your language for it as in Java, then it makes sense to use for everything, even if it’s not what you would choose in a language that has other options.

For many forms of polymorphism, in C++ (for templates), Haskell, and Rust, no inheritance is necessary. It is simply not built according to the OOP frame of mind. I personally think Haskell and Rust are doing it right here, as is perhaps obvious from how I’ve written about it.

I hope to write more about run-time polymorphism in Rust, and how it differs from the C++ variety, and how you can manually implement other types of run-time polymorphism if you want. This would be a future post. But, this is a hobby blog, so no promises on timeline!

Minor News: Some Repos on GitHub

2024-01-21T00:00:00+00:00

So, there are now two additional repos of my code on GitHub that recently got published, both under the MIT license. Neither is any show-stopping major project, but I figured I’d let everyone know nevertheless, and write up a few notes about it. Both have been added to my programming portfolio garden.

Repo #1: Crate Version of Prefix Ranges

Arvid Norlander (blog, GitHub) reached out to me to ask if I wanted to publish my little Rust module from my post on prefix ranges as a crate, or, failing that, if I could license it as open source so he could publish it. I had thought of most of my code on this blog up until this point as example code not worth licensing, but his prompting changed my mind. If it’s just trivial example code, it’s not worth not open sourcing, so I might as well release the website’s example code under an MIT license.

This particular piece of code seems like the wrong end solution to the problem at hand – though it is the solution I ended up using when faced with the problem in a larger project. Ideally, I would like to write a follow-up piece to the prefix range article, discussing how to fix BTreeMap to generalize not just to splitting on various keys based on their ordering properties, but based on any appropriate function that acts as a range (i.e. that monotonically transitions from false to true when looping over items in sorted order by the Ord trait), as a generalization of Bound. Then, prefixes could be represented in terms of such a function, and we could leverage the full efficiency of a BTreeMap without having to do any extra UTF-8-mongering.
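As a rough illustration of the "function that acts as a range" idea, here is a sketch over a sorted slice instead of a BTreeMap, using the standard library's partition_point. The helper prefix_range is hypothetical and is not the code from the crate. Note that partition_point wants a predicate that goes from true to false, the mirror image of the false-to-true description above.

```rust
/// Hypothetical helper: returns the sub-slice of `sorted` whose entries start
/// with `prefix`. `sorted` must already be in ascending order.
fn prefix_range<'a, 'b>(sorted: &'a [&'b str], prefix: &str) -> &'a [&'b str] {
    // True for everything strictly before the prefix, then false.
    let start = sorted.partition_point(|s| *s < prefix);
    // True for everything before or inside the prefix block, then false.
    let end = sorted.partition_point(|s| *s < prefix || s.starts_with(prefix));
    &sorted[start..end]
}

fn main() {
    let words = ["apple", "bar", "bard", "barn", "bat", "cat"];
    println!("{:?}", prefix_range(&words, "bar"));
}
```

Both predicates are monotonic over the sorted order, so each boundary is found by binary search and no upper-bound string has to be computed. This only works on a slice; the generalization to BTreeMap is what the paragraph above describes.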

But fully implementing such a thing would mean patching the standard library, and fully writing that blog post would mean a lot of benchmarking work. I still plan on doing it someday, but as I point out many times, this is a hobby blog (although I do now support buying me a coffee, that is meant in the true spirit of buying me an extra beverage as a token of thanks. At the time of this writing no one has clicked it, and I certainly expect no more than occasional literal coffees to come of any money from it), and so follow-up posts will happen when they happen (although nagging me about it, nicely, over e-mail is allowed).

Repo #2: Texas Hold-Em Library/Quiz App

I’ve been writing some code to do with the most popular modern poker variant, Texas Hold-Em. It lives in a repo on GitHub. Ideally, it’ll turn into an app to help me and some buddies practice reading flops, counting outs, seeing who’s ahead, and doing other hold-em mental calculations. I might also extract a library or even a framework for writing AIs, or playing against them. Maybe even a front-end app could be added, either in Rust or in Reflex in Haskell.

But no promises! See the hobby blog note above! If you really want a feature, I’ll happily accept PRs!

Of course, this wouldn’t be the first such codebase, or even the first in Rust. I’m just having fun. I enjoyed writing the code so far, and I figured I’d put it on GitHub in the meantime, even if it never becomes particularly useful.

Writing it with all its combinatoric randomness made me really learn to appreciate itertools, a collection of iterator methods that for various reasons haven’t been accepted or stabilized in the standard library. It’s been good exercise writing in functional programming, iterator and iterator-transformer style, which is a little harder in Rust than in Haskell.

Also, while I understand why Rust doesn’t have generators (there is an excellent blog series about the topic on “Without Boats”), many of the reasons are historical and, well, I just really wish it did.

Additional future exploration might include zany optimizations, perhaps inspired by (but not directly following in the footsteps of) this zany hand evaluation algorithm, implemented in Rust in many places, including here by Wataru Inariba – although regular optimizations probably come first.

Rust Is Beyond Object-Oriented, Part 3: Inheritance

2023-12-07T00:00:00+00:00

In this next¹ post of my series explaining how Rust is better off without Object-Oriented Programming, I discuss the last and (in my opinion) the weirdest of OOP’s 3 traditional pillars.

It’s not encapsulation, a great idea which exists in some form in every modern programming language, just OOP does it oddly. It’s not polymorphism, also a great idea that OOP puts too many restrictions on, and that Rust borrows a better design for from Haskell (with syntax from C++).

No, it’s that third pillar, inheritance, that I am discussing today, that concept that only shows up in OOP circles, causing no end of problems for your code. Unlike encapsulation and polymorphism, Rust does not have any direct analogue.

Side note: In this series in general, but especially in this post, I am primarily discussing static OOP languages, like C++ and Java, where interfaces have to be explicit and where classes correspond to different static types. Much of what I write would have to be adapted to apply to more dynamic “duck-typing” styles of OOP like in Python or JavaScript (or Smalltalk), and won’t apply as directly. This series is about why Rust isn’t OOP, and Rust is closer to C++ or Java than to a dynamic language, so this bias makes sense in context.

Why do people like inheritance?

I can see why inheritance is so compelling. The entire system of education encourages us to categorize things into neat little hierarchies. Rectangles are a type of shape, and squares are a type of rectangle. Humans are a type of animal, and men and women are types of humans. Inheritance allows us to take this “X is a Y” and express it to a computer.

This “is a” relationship is seen as intuitive. As the entire point of OOP is to make programming more intuitive, more like reasoning about the real world, inheritance is a perfect match for it. Just like we reason about the real world with categories and subcategories, we can reason about the world of our program in a similar way.

And this allows us to feel smart when we read introductions to inheritance in various books on OOP programming. We see the Tiger class inherit from the Animal class, or the Rectangle class inherit from the Shape class.

We get so excited by the abstract principle of “is a” that we don’t even notice that the examples have nothing to do with programming. We don’t write code about shapes or animals. And even a drawing program or a zoo inventory app wouldn’t use inheritance like this! If inheritance was so useful as to be a pillar of OOP, why are there so few beginner examples that involve things programs actually do?

What do I mean by inheritance?

First, let me clarify what I mean by inheritance, or rather what I don’t mean.

I don’t mean every subtype-supertype relationship, where all values of one type are also included in another, broader type. Subtyping shows up in Rust all the time, particularly when it comes to lifetimes.

I also don’t mean the version of inheritance that only involves implementing an interface. In C++, you implement dynamic interfaces through inheritance as a mechanism, even if the “superclass” is just a list of methods. In Java, inheritance and interface implementation are separate mechanisms. I am not talking about interface implementation as inheritance, even though it is technically considered the same feature in C++:

// This class has no fields, only virtual methods.
//
// In Java, we would call this an interface. In Rust, we would
// call this a trait.
class Shape {
public:
    virtual void draw(Surface &surface) const = 0;
};

// This is considered inheritance in C++. The Java equivalent
// would use `implements` instead of `extends`. And you could still
// do this in Rust with a trait.
class Square : public Shape {
    int size;
    int x;
    int y;
public:
    void draw(Surface &surface) const override;
};

I am only opposed to the type of inheritance that is still called inheritance in Java. Having a type implement an interface (a trait in Rust) is perfectly legitimate and still allowed in Rust, as is casting a reference to a value to a generic, “dynamic” value based on that trait or interface:

trait Shape {
    fn draw(&self, surface: &mut Surface);
}

struct Square {
    size: u32,
    x: u32,
    y: u32,
}

impl Shape for Square {
    fn draw(&self, surface: &mut Surface) {
    }
}

// Assume square is Square, surface is Surface
let shape: &dyn Shape = &square;
shape.draw(&mut surface);

Shape, in this context, is a pure interface. It is only a structured form of polymorphism, not inheritance per se. Very importantly, Shape has no fields. It is defined based solely on what you can do with it. And accordingly, the “is a” language makes sense for interface implementation: Square is a Shape. A Shape has no state, though, just methods, just behaviors.

But some parent classes have fields. And that’s when inheritance really starts to have problems: when the “parent” class has fields. It is at this point that inheritance starts to seem really weird.

What does inheritance actually do?

In my article on encapsulation, I discussed how a class is secretly two things with the same name, entangled and conflated:

A record type (or what Rust would call a struct), that is, a type whose values consist of a number of fields with fixed names and types
A module (a collection of code with enforced encapsulation boundaries), containing that record type and a collection of functions (called “methods”) for interacting with it

Inheritance does something different with each of these concepts. To start out, let’s discuss what it does to the record type. We’ll continue using shapes, a classic example for discussing object-oriented features. A circle is a shape, so we can use inheritance here:

class Shape {
public:
    Color color;
};

class Point {
public:
    int x;
    int y;
};

class Circle : public Shape {
public:
    Point center;
    int radius;
};

So, what does this mean for Circle? Well, it means that all the fields of Shape (namely, color) are also fields of Circle. Therefore, references to Circle can be made into references to Shape, as everything you can do with a shape, you can do with a circle, like set the color, or get the color:

Circle circle;
Shape &shape = circle;
shape.color = Color::Blue;
assert(circle.color == Color::Blue);

The thing is, we already have a mechanism of taking all the fields of struct A and putting it in struct B: by putting a field of type A into struct B! Instead of inheritance’s “is a,” we can accomplish the same thing with having a field, or “has a.” In our example, we can do the exact same thing with Point that we did with Shape – it just involves being a little more explicit about what’s going on:

Circle circle;
Point &point = circle.center;
point.x = 3;
assert(circle.center.x == 3);

So, what does inheritance do to the classes from the record type perspective? It makes the parent class a field of the child class, just a field with no name. By writing:

class Circle : public Shape {
    // ...

… from a record type perspective, we were writing syntactic sugar for:

class Circle {
public:
    Shape shape;
    // ...

And when we wrote:

Shape &shape = circle;

That was translated into something like:

Shape &shape = circle.shape;

“Is a,” from a record type point of view, is just syntactic sugar for “has a.” If you want to do something similar in Rust, just make a has-a relationship, rather than creating an implicit field with no name. Rust doesn’t like implicit nameless things anyway.

This will also save on arguing about whether two types have an “is a” or a “has a” relationship. I regret all the time I’ve spent splitting hairs about that distinction, when really, it’s just a matter of whether we want a field to be implicit or not.

OK, so that covers what inheritance does to the record types, but what about the rest of the class, the module? What happens to the methods?

Well, for non-virtual methods, it’s also straight-forward. Instead of doing inheritance, you can still just use has-a instead, and do a field access. Instead of calling, say, circle.get_color(), we could always call circle.shape.get_color().
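As a rough Rust sketch of the same has-a arrangement, covering both the field access and the non-virtual method (the `Color` variants and the `recolor` helper are assumptions made up for this example):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Blue,
}

struct Shape {
    color: Color,
}

impl Shape {
    // A plain, non-virtual method on the "parent" type.
    fn get_color(&self) -> Color {
        self.color
    }
}

struct Point {
    x: i32,
    y: i32,
}

struct Circle {
    // The field that inheritance would have made implicit and nameless.
    shape: Shape,
    center: Point,
    radius: i32,
}

// Hypothetical helper: the "upcast" is just borrowing the field.
fn recolor(circle: &mut Circle) -> Color {
    let shape: &mut Shape = &mut circle.shape;
    shape.color = Color::Blue;
    // The "inherited" method call is an ordinary field access plus a call.
    circle.shape.get_color()
}
```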

So far, with the fields and non-virtual methods, inheritance just seems a bit weird and overrated. Like, we don’t see any reason yet why a programming language would want to support it, when just having a field of a superclass type does everything. But on the other hand, some people like implicit fields and convenient short-hands, so there’s not much of a downside either.

Inheritance without virtual methods may seem harmless, but it doesn’t have much to do with the concept of “is a.” Technically, you can use a field access as an implicit conversion, and think of it as a subtyping relationship, but it doesn’t actually correspond to how the world works. Even in the world of shapes, it doesn’t make sense: if a square is a rectangle, how come it has less state than a rectangle, with only one field for side length instead of two for width and height?

But we’ve not yet talked about virtual methods. When we do, you will see why I think inheritance is not just an unnecessary feature, but an ill-conceived anti-feature.

But what about the virtual methods?

So, earlier we discussed a class as being two things, a record type (with fields) and a module (with methods and visibility restrictions). But once we consider virtual methods, a class is actually three things with the same name:

A record type: each object has the fields
A module: the type, trait, and other methods, are all in an encapsulated module
A trait or interface: the virtual methods form an interface

Side note: some programming languages consider all methods to be virtual for some reason. For these programming languages, everything I say still applies, but all methods are in the trait as they’re all virtual.

Given that most methods aren’t self-consciously written with the intent to be virtual, making methods implicitly virtual seems like a good way to set the programmer up for surprise – that is, a horrible idea. But nevertheless having all virtual methods was for a long time considered the more ideological, more purely OOP way to do things, and so languages which strove to be purely OOP (like the original Java) did it.

Up until now, we have ignored this additional conflation, this additional role that a class plays. In discussing encapsulation, we were discussing simply how classes conflate the two distinct concepts of record types and modules. In discussing polymorphism, we were assuming interfaces, and discussing how OOP’s version of interfaces were constrained by insisting on a specific dynamic implementation. Only now, now that we discuss inheritance, do we see that OOP not only conflates record types and modules, but it also conflates record types and interfaces.

When a class has virtual functions, that constitutes an interface, implemented by dynamic polymorphism. But the only way you are allowed to implement the interface is by inheriting from the class – that is, by also having a (secret, unnamed, implicit) field of the record type.

See, as discussed above, inheriting from a class without virtual methods, a class with just fields and regular methods, is no biggie. It’s just a weird way of writing a has-a relationship that comes with some syntactic sugar and automatic conversions – things I’m not a fan of and wouldn’t put in my programming language, but not that bad.

Similarly, inheriting from a class without fields, a class with just virtual methods (and perhaps regular methods, it turns out they barely matter) is also no biggie. It has all the downsides of OOP-style polymorphism, but is fundamentally just a way to indicate that you’re implementing an interface. In languages like C++, inheritance is the mechanism by which you implement interfaces, and in languages like Java, a methods-only class should probably be an interface.

(To round out all the possibilities, I will mention that a class with neither virtual methods nor fields is just a traditional module.)

But if you have both fields and virtual methods, then you have true OOP-style inheritance, with all of its problems. You have an interface that you can only implement if you inherit from the class. If you did not intend this, perhaps because you are writing in a language like Java where allowing inheritance is the default for classes and virtual is the default for methods, you are setting yourself up for surprises when someone inherits from your class and starts overriding methods.

If you did intend this, however, why? Why make implementing an interface contingent on having certain state, on having a special unnamed field? Why conflate these two fundamentally different concepts of containing another record type’s state and having the new record implement an interface?

There’s a number of problems with this conflation. Why would we assume that in order to implement the methods, you need that state? What if that state is represented differently, like on a disk, or over a network, or computed on demand from a formula? This conflation of implementation and interface means that there is no sane way to implement proxy objects.
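For contrast, here is a small sketch of what decoupling the interface from the state allows. Everything here (`Sequence`, `Stored`, `Squares`, `sum_first`) is a made-up example: one implementor keeps its values in memory, while the other has no stored values at all and computes them from a formula.

```rust
// The interface: defined only by what you can do with it.
trait Sequence {
    fn get(&self, index: usize) -> Option<u64>;
}

// One implementation keeps its state in memory.
struct Stored {
    values: Vec<u64>,
}

impl Sequence for Stored {
    fn get(&self, index: usize) -> Option<u64> {
        self.values.get(index).copied()
    }
}

// Another implementation stores no values and computes them on demand.
struct Squares {
    len: usize,
}

impl Sequence for Squares {
    fn get(&self, index: usize) -> Option<u64> {
        if index < self.len {
            Some((index as u64) * (index as u64))
        } else {
            None
        }
    }
}

// Code written against the interface works with either one.
fn sum_first(seq: &dyn Sequence, n: usize) -> u64 {
    (0..n).filter_map(|i| seq.get(i)).sum()
}
```

If `Sequence` were a base class carrying a `values` field, `Squares` would be forced to drag that unused field along.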

But more importantly than that, I’m not entirely sure what the upside of this conflation is. It seems to make programming simpler in one particular scenario, a scenario that I rarely see come up in real life, a scenario that frankly seems like a code smell.

So what can we do instead?

There is no inheritance in Rust. There are no fields in traits. There is simply no way of saying that in order to implement a trait, your type must have certain fields. Rather than conflate the concepts of record types, modules, and traits in this God-concept of “class,” Rust keeps these three concepts quite separate.

So if we have a design that requires inheritance (either because we think in OOP or because we’re translating from an OOP programming language), how would we represent that in Rust?

Well, the most straight-forward way would be to separate out the different parts of the base class. Such a refactor would allow us to express our design in Rust, as literally as possible. This is just meant as a starting point, a proof of concept that our design can survive in a language without inheritance. Alternative, often better ways of replacing inheritance will follow subsequently.

But here’s the straight-forward method: If the base class has just fields, or just virtual methods, that’s easy: it becomes a struct or a trait, respectively. Instead of inheriting from the class, a type would have that struct as a field, or implement that trait. Actually, in this case, the straight-forward method might just be perfect – you weren’t actually using inheritance per se, just an odd syntax for a field or for implementing an interface.

If it has both, we’d have to extract both a struct and a trait. The fields would become a struct, of its own type. The interface of the virtual methods would become a trait. The implementation of the virtual methods would become the implementation of that trait for that struct, or provided methods on the trait, depending on what makes more sense. Any non-virtual methods would then become methods of the struct or provided methods on the trait, again depending on what makes more sense in context.

At this point, it might make sense to consider some of the alternatives that Rust provides to run-time polymorphism, as discussed in the polymorphism post. Is a trait, especially an OOP-style, object-safe trait, really what we want here? We’ve opened up alternative designs now, and perhaps one of the alternatives makes more sense.

Assuming we do want a trait, we can then go to all the “child” classes and make them implement the trait. They also get a new field, perhaps named parent (super itself is a reserved keyword in Rust, so it cannot be used as a field name), to contain the parent. Their trait implementations would then do a mix of implementing new methods, calling the same method on the parent field, and defaulting to the provided method.

And again, at this point it would be appropriate to consider whether we even need the parent field, or if perhaps we can get away with not having it.
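Here is a minimal sketch of what that literal translation can look like. The names (`WidgetBase`, `Widget`, `Button`) are invented for illustration and don't come from any real framework:

```rust
// The fields of the former base class become an ordinary struct.
struct WidgetBase {
    label: String,
}

// The virtual methods of the former base class become a trait.
trait Widget {
    fn render(&self) -> String;

    // A provided method: the default that children may keep or replace.
    fn describe(&self) -> String {
        format!("widget: {}", self.render())
    }
}

// The base class's own implementations of its virtual methods.
impl Widget for WidgetBase {
    fn render(&self) -> String {
        self.label.clone()
    }
}

// A former child class holds its parent as an explicit, named field.
struct Button {
    parent: WidgetBase,
    enabled: bool,
}

impl Widget for Button {
    fn render(&self) -> String {
        // Call the same method on the parent, then build on the result.
        let inner = self.parent.render();
        if self.enabled {
            format!("[{}]", inner)
        } else {
            format!("({})", inner)
        }
    }
    // `describe` is not overridden, so the provided method is used.
}
```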

After this transformation, we have valid Rust code out of our inheritance-based OOP-style design pattern. But there’s nothing requiring us to use Rust to do it: you could do the same refactor of inheritance structures in an OOP language.

If we were to do this transformation, we’ve paid a small cost of having to potentially write .parent (or whatever name we’ve given the parent field) every once in a while, as well as writing trait implementations that forward some method calls to the parent field. In return, we’ve deconflated the two very different concepts of interface and fields, and opened ourselves up to more possibilities.

What should I actually do in Rust instead of inheritance?

But notice that in discussing this transformation, I encouraged you to consider alternatives at two points. Rarely does this transformation make sense literally, which is to say, rarely does a literal translation of inheritance into Rust make sense. I find this quite telling, as it implies to me that inheritance itself only rarely makes sense – and indeed, I only tend to use inheritance in OOP languages where a framework requires me to, or as an ersatz² replacement of sum types (i.e. Rust enum).

Here are some other patterns that replace inheritance hierarchies, that you might find yourself considering instead:

A regular enum. This actually covers most situations for me. Methods that would be overridden just do a match on the enum contents, and methods that would not, do not.
struct types that contain a field with an enum type. The enum type represents all the different options, but the struct type contains the fields that are always the same.

struct MessageHeader {
    source: Address,
    destination: Address,
    seqnum: u32,
}

enum MessageBody {
    Ping(PingMessage),
    Pong(PongMessage),
    Request(RequestMessage),
    Response(ResponseMessage),
}

struct Message {
    header: MessageHeader,
    body: MessageBody,
}

Isn’t this so much nicer than putting source, destination, and seqnum in the base class?
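To sketch how methods fit into this shape: a method that would have been overridden matches on the body, and one that would not have been simply reads the common fields. The payload types are simplified to strings here, and the method names are assumptions for this example only:

```rust
struct MessageHeader {
    seqnum: u32,
}

// Simplified stand-ins for the payload types above.
enum MessageBody {
    Ping,
    Pong,
    Request(String),
    Response(String),
}

struct Message {
    header: MessageHeader,
    body: MessageBody,
}

impl Message {
    // Would not have been overridden: only touches the shared fields.
    fn seqnum(&self) -> u32 {
        self.header.seqnum
    }

    // Would have been a virtual method: every case is visible in one place.
    fn describe(&self) -> String {
        match &self.body {
            MessageBody::Ping => "ping".to_string(),
            MessageBody::Pong => "pong".to_string(),
            MessageBody::Request(path) => format!("request for {}", path),
            MessageBody::Response(text) => format!("response: {}", text),
        }
    }
}
```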

enum variants that themselves contain enum types.

enum Message {
    Client(ClientMessage),
    Server(ServerMessage),
}

enum ClientMessage {
    Ping(PingMessage),
    Request(RequestMessage),
}

enum ServerMessage {
    Pong(PongMessage),
    Response(ResponseMessage),
    Error(ErrorMessage),
}

Now, if you want any message, your type is Message. If you know for sure you have a client message, you can say ClientMessage. Or if you know for sure it’s specifically a ping, you can say PingMessage. It’s like a class hierarchy!
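If you want the "upcasts" of a class hierarchy too, one option is to write `From` implementations for each level. This is only a sketch with a trimmed-down set of variants, and `is_from_client` is a made-up helper:

```rust
struct PingMessage;
struct PongMessage;

enum ClientMessage {
    Ping(PingMessage),
}

enum ServerMessage {
    Pong(PongMessage),
}

enum Message {
    Client(ClientMessage),
    Server(ServerMessage),
}

// Each "upcast" is an explicit, ordinary conversion.
impl From<PingMessage> for ClientMessage {
    fn from(ping: PingMessage) -> Self {
        ClientMessage::Ping(ping)
    }
}

impl From<PongMessage> for ServerMessage {
    fn from(pong: PongMessage) -> Self {
        ServerMessage::Pong(pong)
    }
}

impl From<ClientMessage> for Message {
    fn from(message: ClientMessage) -> Self {
        Message::Client(message)
    }
}

impl From<ServerMessage> for Message {
    fn from(message: ServerMessage) -> Self {
        Message::Server(message)
    }
}

// The "downcast" is a pattern match, checked by the compiler.
fn is_from_client(message: &Message) -> bool {
    matches!(message, Message::Client(_))
}
```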

A struct with a template-parameterized member to set a policy.

This is perhaps the most sophisticated replacement. Imagine you have a class SocketHandler that handles reading from a socket. Imagine it looks like this:

class SocketHandler {
    CircularBuffer socket_data;
public:
    void data_available(int fd);
protected:
    virtual size_t message_size(const char *data, size_t size) = 0;
    virtual void process_message(const char *data, size_t size) = 0;
};

How this is going to work is, data_available is going to grab more and more data from the socket fd until message_size returns a non-zero value. Then, it’ll call process_message with that data. During this time, it’ll store the data in socket_data. All of that work is being done by data_available, in the parent class, and you can imagine that the socket dispatching library has a collection of these socket handlers, something like std::vector<std::unique_ptr<SocketHandler>> (or perhaps a map indexed by file descriptor).

The child class is responsible for overriding message_size and process_message to actually interpret incoming data for a specific protocol. You’d have a child class for each SocketHandler protocol, and it would include internal state like sequence numbers, etc.

But rather than have these methods overridden by a child class, the right way to do it is to have just those methods in a trait that a SocketHandler has. You can see this when you extract the implicit trait for SocketHandler for the Rust version:

trait SocketProtocol {
    fn message_size(&self, data: &[u8]) -> usize;
    fn process_message(&mut self, data: &[u8]) -> Result<()>;
}

struct SocketHandler<P: SocketProtocol> {
    buffer: CircularBuffer,
    protocol: P,
}

trait SocketHandlerTrait {
    fn data_available(&mut self, fd: u32) -> Result<()>;
}

impl<P: SocketProtocol> SocketHandlerTrait for SocketHandler<P> {
    fn data_available(&mut self, fd: u32) -> Result<()> {
        // Read from `fd` into `self.buffer`, then call
        // `self.protocol.message_size/process_message`.
        todo!()
    }
}

So, rather than each socket protocol inheriting from socket handler, with its common state, the socket handler has a socket protocol, as a policy. The SocketProtocol trait here can then be a compile-time, static trait and SocketHandlerTrait can be the object-safe, dynamic one, and the std::vector<std::unique_ptr<SocketHandler>> can be replaced with Vec<Box<dyn SocketHandlerTrait>>.
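To see the pieces working together, here is a simplified, self-contained sketch. It makes several assumptions that differ from the outline above: a `Vec<u8>` stands in for `CircularBuffer`, the handler is handed the new bytes directly through a hypothetical `data_received` method instead of reading from a file descriptor, error handling is left out, and `LineProtocol` is a made-up newline-delimited protocol.

```rust
// Policy trait: how one protocol frames and handles messages.
trait SocketProtocol {
    // Size of the first complete message in `data`, or 0 if incomplete.
    // Must never return more than `data.len()`.
    fn message_size(&self, data: &[u8]) -> usize;
    fn process_message(&mut self, data: &[u8]);
}

// The common state, plus a protocol as a policy.
struct SocketHandler<P: SocketProtocol> {
    buffer: Vec<u8>, // stand-in for CircularBuffer
    protocol: P,
}

// The object-safe trait for the handler as a whole.
trait SocketHandlerTrait {
    fn data_received(&mut self, data: &[u8]);
}

impl<P: SocketProtocol> SocketHandlerTrait for SocketHandler<P> {
    fn data_received(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
        // Hand over complete messages until only a partial one is left.
        loop {
            let size = self.protocol.message_size(&self.buffer);
            if size == 0 {
                break;
            }
            let message: Vec<u8> = self.buffer.drain(..size).collect();
            self.protocol.process_message(&message);
        }
    }
}

// Hypothetical protocol: each message is one newline-terminated line.
struct LineProtocol {
    lines: Vec<String>,
}

impl SocketProtocol for LineProtocol {
    fn message_size(&self, data: &[u8]) -> usize {
        data.iter().position(|&b| b == b'\n').map_or(0, |i| i + 1)
    }

    fn process_message(&mut self, data: &[u8]) {
        let line = String::from_utf8_lossy(data).trim_end().to_string();
        self.lines.push(line);
    }
}
```

The protocol's state (here, the collected lines) lives in the policy, the buffering lives in the handler, and neither has to know how the other is represented.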

This last refactor can be generalized. Instead of inheriting from a base class to implement specific functionality, inject that functionality using policies³, and parameterize the struct with members that implement policy traits. Then, if need be (and need might not be) write a separate dynamic trait for the overall struct.

I know I haven’t posted since February. I’ve been procrastinating this one for a long time, mostly because my life has been so gosh-darn busy, and also mostly because I don’t really instinctively remember what I (or anyone else) really liked about inheritance to begin with. ↩︎
Isn’t it weird that ersatz means replacement in German, but means mediocre as a replacement in English, so that “ersatz replacement” doesn’t mean “replacement replacement” but “mediocre replacement”? Or am I using the English word wrong? ↩︎
Policies are known in Gang of Four terminology as strategies. I’ve touched on the policy pattern in some previous posts, and at some point should write a full post about it, as policies are my favorite thing. ↩︎

Endianness, and why I don't like htons(3) and friends

2023-10-19T00:00:00+00:00

Endianness is a long-standing headache for many a computer science student, and a thorn in the side of practitioners. I have already written some about it in a different context. Today, I’d like to talk more about how to deal with endianness in programming languages and APIs, especially how to deal with it in a principled, type-safe way.

Before we get to that, I want to make some preliminary clarifications about endianness, which will help inform our API design.

Why Little Endian Bugs Us

New students are often more confused by little endian (where the least-significant component of an integer is stored first), and until they are told about it, they tend to assume computers are big endian (where the most-significant component is stored first) even if they don’t know that word. This is due primarily to the fact that big endian is what they’re used to: We write numbers with the most significant digit on the left, and in languages that write from left to right (including English, the lingua franca of programming among other things), this means that we live our day to day lives in big endian. But that doesn’t mean that big endian is more logical in any way, just that it is more conventional.

This isn’t helped by the fact that many learners are first exposed to little endian by it being confusing, and making them do more cognitive work, by reading little endian numbers from a hex dump. Take, for example, this code, which displays a 32-bit number in hexadecimal, and then displays the individual bytes of the same number as a hex dump:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);

This results in this befuddling output:

12345678
78 56 34 12

When read as a number, we can just read the number normally. However, when read as a series of bytes, we find ourselves having to read the number from right to left to read the number as big endian, as we are accustomed to doing. We can’t even just read backwards, however, as each byte is still printed internally according to our big endian convention: the higher-order hex digit is still printed first, followed by the lower-order hex digit.

The problem here isn’t little endian. The problem is that the printing functionality accommodates our big endian preference in printing, but only at the level of printing an individual number, either as a byte or as a 32-bit word. The word printed as a whole is printed big endian, to accommodate us. The individual bytes are also printed big endian, to accommodate us. However, the hex dump as a whole is printed with the lower addresses on the left, and the higher addresses on the right, to similarly accommodate our expectation that lower-indexed memory, memory that comes earlier, should be on the left. On a little endian system, this desire to print each number with the most significant digit on the left, but to print a sequence of numbers from left to right, leads to a contradiction. The resulting last line, 78 56 34 12, isn’t, properly speaking, little endian. The print-out is an odd type of mixed endian, due to our awkward conventions.

There is actually a relatively easy fix: if we insist on reading numbers with the most significant digit on the left (which we do), and the computer insists on storing less significant components first (which it does), these two desires can be reconciled by printing the hex dump from right to left:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[3], bytes[2], bytes[1], bytes[0]);

This results in a much cleaner print-out:

12345678
12 34 56 78

This should make clear that the weirdness of little endian is entirely due to our preference for big endian, and our preference for listing the lower-indexed values to the left, and how these preferences interact. It is because of human conventions, not because of any intrinsic problem with little endian. I would argue that, on little endian systems, all hex dumps should be right to left, and that would help, but there is little I can do to change these conventions.
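The same experiment can be sketched in Rust. The two function names are made up for this example, and calling `to_le_bytes` asks for the little endian byte order explicitly, so the result does not depend on the machine running it:

```rust
// Conventional hex dump: lowest-indexed byte on the left.
fn hex_dump_left_to_right(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

// Reversed hex dump: lowest-indexed byte on the right.
fn hex_dump_right_to_left(bytes: &[u8]) -> String {
    bytes
        .iter()
        .rev()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For the little endian bytes of 0x12345678, the first function gives 78 56 34 12 and the second gives 12 34 56 78.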

Now, almost all modern systems are little endian, either because they are typically configured that way for processors that support either endianness, or because they only support little endian, like Intel processors. The few programmers who have to write code for big endian systems find themselves in the minority, and find themselves doing extra work to deal with other code that no longer accommodates big endianness.

There is one big exception to this: the Internet. All of the Internet protocols are designed to use big endian ordering, known in this context as “network byte ordering.” This is because when the Internet protocols were developed, big endian was a viable rival to little endian, and both byte orders were common.

This does make some sense, as well, because hex dumps of packets are very common, and big endian does make those hex dumps easier to read and reckon with for us big endian humans.

When Endianness Comes In

I would also like to clarify something about how endianness works. A 32-bit word in a register in the processor is neither big endian nor little endian. The processor needs to be designed knowing which bits are more significant, and which are less, but there is no intrinsic way in which the less significant bits come “first.” In a word-based memory system, where only entire words were stored in memory (as the PDP-7 was, with its 18-bit words), and where it was impossible to address memory in terms of individual bytes, this would be the end of it.

As an example of this, see the documentation for std::endian on CppReference.com:

If all scalar types have sizeof equal to 1, endianness does not matter and all three values, std::endian::little, std::endian::big, and std::endian::native are the same.

However, once we come up with the idea that memory is made up of bytes, the endianness question arises: How do we split this 32-bit number into bytes? Which end of it should be byte 0, and which end byte 3? Similarly, if we read a series of bytes into memory, where should the first byte (by memory address) go in the register, the most significant (big) end, or the least significant (little) end?

As a result, types like uint32_t (and uint16_t and uint64_t) have no intrinsic endianness, so long as they are stored in registers. Only if they are written to memory, or read from memory, does their endianness matter. And then, it only matters if the actual byte representation is important – if we, as in the code above, use memcpy to copy their representation, byte by byte, into an array of bytes.

In general, if the byte representation does matter, I would argue that uint32_t should be treated as an abstract 32-bit value, devoid of endianness. Only when it is transcribed as a series of bytes should endianness be taken into account – and then the description should instead have the type of uint8_t[4] in C (or std::array<uint8_t, 4> in C++ or [u8; 4] in Rust).
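Rust's standard library already leans this way: the integer stays an abstract value, and endianness only shows up in the name of the conversion to or from a byte array. A minimal sketch, with hypothetical wrapper names `encode_be` and `decode_be`:

```rust
// The abstract value goes in; an explicitly big endian byte array comes out.
fn encode_be(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

// The byte array goes in; the abstract value comes back out.
fn decode_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

Because the two sides have different types, it is hard to forget a conversion or apply one twice: a `[u8; 4]` cannot be used where a `u32` is expected.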

The Main Argument: Why I dislike `htons` and friends

In C, however, we do not in fact do this. We instead have functions like htons, with this signature:

uint16_t htons(uint16_t hostshort);

uint16_t http_port = htons(80);

This function purports to convert a 16-bit number from host endianness (typically little) to network endianness (always big). Assuming a little endian computer, it does a byteswap: It swaps the less significant 8 bits with the more significant 8 bits in the register used to return the uint16_t.

So what are the properties of the returned uint16_t? If we passed in, for example, 80 (the port of HTTP), then http_port, the new uint16_t, is 20480 – because 80 is 0x0050 in hex, and we’ve swapped the two bytes, so we now have 0x5000. What is this number?

It is not, to be clear, a uint16_t value 80 that is now in “big endian,” though we might say that as a manner of speaking. It is almost certainly in a register, and as mentioned before, registers don’t have intrinsic endianness. It is something far more awkward: It is a value that, if we were to store it in little endian (the only option), results in a different number being stored in big endian.

To expand on this: 20480 is not a particularly meaningful number. It is not actually the port number we want to use. And it has nothing to do with the actual number 20480. It is simply a number that, if we store it in memory as bytes, will result in 0x00 being stored, followed by 0x50 – the big endian representation of 80. It is a uint16_t with a value chosen not for what number we want to store, but what bytes we will get if we store http_port as bytes.

Since uint16_t is designed to store numbers, not collections of bytes, I would argue that this type is not being used in a semantically honest way – it is a lie. What we are really storing is an array of 2 bytes, 2 uint8_ts. We are storing it in a 16-bit register, and implementation-wise that might be a good decision – but I would argue, if we want that to be possible, we should create an ABI where uint8_t[2] is storable in a single register. The C programming language, by not making arrays first-class types, is getting in our way here, which explains the situation.

Am I exaggerating when I say the type is a lie? Well, we expect to be able to do arithmetic on a uint16_t, to be able to test, for example, whether it is less than 1024, as listening on a port less than 1024 is a privileged operation. But in order to do that, we have to convert it back to a normal uint16_t – all uint16_t’s usual arithmetic operators are inappropriate for data that’s stored with its bytes swapped around.
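To illustrate the arithmetic problem, here is a small sketch that assumes a little endian host. `swap_bytes16` is a hypothetical stand-in for what htons does on such a host, and `is_privileged_port` is likewise an invented helper.

```cpp
#include <cstdint>

// What htons does on a little endian host: swap the two bytes.
// (On a big endian host, htons returns its argument unchanged.)
std::uint16_t swap_bytes16(std::uint16_t value) {
    return static_cast<std::uint16_t>((value >> 8) | (value << 8));
}

// The check we actually care about, written for a genuine port number.
bool is_privileged_port(std::uint16_t port) {
    return port < 1024;
}
```

Here `swap_bytes16(80)` is 0x5000, or 20480, so `is_privileged_port(swap_bytes16(80))` is false even though port 80 is privileged. The compiler accepts the comparison without complaint, because both values are plain uint16_ts; only swapping the bytes back first gives the right answer.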

So what should be done? Well, if we really intend to express a value in network byte order, i.e. big endian, we are changing the semantics of the information from “this is a 16-bit integer” to “this is a specific sequence of two bytes, chosen for a reason.” Therefore, the return value of htons should be an aggregate of two bytes.

Again, because of pointer decay this is impossible to express straightforwardly in C, although a wrapper struct could be used. C++ takes care of this by having a built-in wrapper struct for arrays, namely std::array. The equivalent of htons would not emphasize that the uint16_t is in the host order (which I think is the wrong way of thinking about it), but would simply indicate that we’re just storing this short in a big-endian fashion (as opposed to the hardware-supported default storage we can access with a memcpy):

std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value);
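One possible body for that signature, as a sketch rather than a definitive implementation, uses shifts instead of memcpy. Shifts operate on the value rather than on its in-memory representation, so the same code should behave identically on little and big endian hosts. The inverse function `load_short_from_big_endian` is a hypothetical name added for illustration.

```cpp
#include <array>
#include <cstdint>

// Most significant byte first, regardless of the host's byte order.
std::array<std::uint8_t, 2> store_short_as_big_endian(std::uint16_t value) {
    return {{static_cast<std::uint8_t>(value >> 8),
             static_cast<std::uint8_t>(value & 0xFF)}};
}

// Hypothetical inverse: rebuilds the number from its big endian bytes.
std::uint16_t load_short_from_big_endian(const std::array<std::uint8_t, 2>& bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

For port 80, the result is the two bytes 0x00 and 0x50, and the return type makes it impossible to mistake them for a number.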

Rust already provides this as an alternative:

impl u16 {
    pub const fn to_be_bytes(self) -> [u8; 2] {
        // ...
    }
}

Unfortunately for semantics, Rust still has the problematic signature for to_be:

impl u16 {
    pub const fn to_be(self) -> u16 {
        // ...
    }
}

Perhaps this is due to efficiency reasons, or felt efficiency. Programmers know that this byteswapped value should, for performance, be stored in a single register. Programmers can feel more confident that this is actually done if it remains a u16 (or uint16_t) than if it is transformed into an array of bytes, however semantically inappropriate the u16 is.

However, if we are using a u16 or uint16_t as an implementation layer for what is in fact a way of storing two bytes in the opposite order from the one that makes sense for our processor, that is, as an implementation trick to do something semantically different from what a uint16_t normally does, then we should at least make the type distinct, to give the maintenance programmer and compiler some ability to stop us from doing nonsensical things (like comparing the value using uint16_t’s comparison operator).

Luckily, there is a design pattern for using the implementation of a type, but applying different semantics to it: the newtype pattern. We typically think of it as a Haskell or Rust thing, but we can use it in C++ as well. I would argue that if we’re going to abuse uint16_ts and friends in such a way, we should at least abstract it using the newtype pattern. In C++, this would look something like this, assuming a little endian computer:

template <typename T>
class big_endian {
    T value;
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        value = std::byteswap(in);
        return *this;
    }

    operator T() const {
        return std::byteswap(value);
    }
};

Adding appropriate if constexpr expressions to also support big endian machines, and defining std::byteswap if you don’t have it yet on your system is left as an exercise to the reader.
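For readers who want a head start on the second half of that exercise, here is one possible sketch of a byteswap for systems without std::byteswap. The name `byteswap_fallback` is hypothetical, and the sketch assumes unsigned integer types only; it does not address the big endian host case.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reverses the bytes of an unsigned integer by peeling them off the
// low end of `value` and pushing them onto the low end of `result`.
template <typename T>
T byteswap_fallback(T value) {
    static_assert(std::is_unsigned<T>::value,
                  "byteswap_fallback expects an unsigned integer type");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        result = static_cast<T>((result << 8) | (value & 0xFF));
        value = static_cast<T>(value >> 8);
    }
    return result;
}
```

A compiler intrinsic or a real std::byteswap is likely to be faster; this version only aims to be easy to check by hand.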

But it works on my (little endian) system:

int main() {
    big_endian<uint16_t> be = 80;
    std::array<uint8_t, 2> be_bytes;
    memcpy(be_bytes.data(), &be, 2);
    printf("%04X\n", uint16_t(be));
    printf("%02X %02X\n", be_bytes[0], be_bytes[1]);
    return 0;
}

I would much rather use this to represent “we want to store a value in a register byte-swapped on some platforms” than a uint16_t with no additional type information. You cannot accidentally run invalid uint16_t operators on it, but you can convert it to a normal uint16_t first and then use those operators. However, it does have a big endian representation when stored, as indicated by the memcpy, and it can still be stored in a single register.

Even so, I would still not prioritize that ability to store it in a single register in most situations. Using a uint16_t to store the bytes swapped is still not remotely “storing a big endian value in a uint16_t,” it is “storing a big endian representation in a uint16_t so that when the processor writes that uint16_t little endian, we get a big endian representation of the number we actually want.” It’s still fundamentally a hack for performance, and while I’m comfortable with it contained within the encapsulation of this big_endian class, I would still rather actually write std::array<uint8_t, sizeof(T)> as the underlying storage type, unless the optimization is actually needed. I actually would use a big_endian class that would look more like this:

template <typename T>
class big_endian {
    std::array<uint8_t, sizeof(T)> be_representation;

    static void swap_array(std::array<uint8_t, sizeof(T)> &arr) {
        for (auto it = arr.begin(), jt = arr.end() - 1;
             it < jt;
             ++it, --jt) {
            std::swap(*it, *jt);
        }
    }
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        memcpy(be_representation.data(), &in, sizeof(T));
        swap_array(be_representation);
        return *this;
    }

    operator T() const {
        auto bytes_copy = be_representation;
        swap_array(bytes_copy);
        T out;
        memcpy(&out, bytes_copy.data(), sizeof(T));
        return out;
    }
};

This now feels like I’m actually representing accurately what a big endian representation is: a way of storing a number as a sequence of bytes, rather than however the processor feels like storing it, and certainly rather than as a value that the processor will store as little endian, but which will store the value we actually want to store as big endian. I won’t lie and say the optimizer will make it equally performant, and if I needed to actually optimize I would use the other version, but I feel like this version is hack-free. (Again, it still only works on little endian platforms – fixing this is again left as an exercise.)

This version has the added benefit of having an alignment of 1, which I will argue later is more appropriate than using the underlying alignment of uint16_t, uint32_t, etc.

Using These “Big Endian” Types

This leads to a further question, however: When do we need to support network byte order? Really, the only time is when generating messages in wire format to send over the network. In C and C++, we generally represent messages to be sent over the network as structs.

For example, one can imagine a packet format with a 32-bit sequence number. We would want to write uint32_t for this sequence number:

struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    uint32_t sequence_number;
};

However, of course, if it is in big endian byte ordering (as many protocols are), we then have to call htonl when loading this value in:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = htonl(seq_num);

As I said before, I don’t like htonl. I certainly don’t like using uint32_t as the type for sequence_number. So, we can do one of two things:

We can use a Rust-style function to convert to byte representation, and use std::array<uint8_t, 4> as the type of sequence_number. This strikes me as equally awkward. We now know that we need to do something other than just assign the value, but we don’t necessarily know what that thing is.
We can make the type more semantic, and use our big_endian wrapper. This is the purpose for which I wrote it, and the use case where its alignment of 1 makes sense – wire format structures are often packed.

// You may need to add `__attribute__((packed))` to `big_endian` as well,
// or you may not need it at all now
struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    big_endian<uint32_t> sequence_number;
};

Now, when we actually send it over the wire, we will cast or copy this packet_wire_format to get the byte-by-byte representation, and sequence_number will be in big endian, by the invariants of our big_endian class. We will not need to remember to call any function at all, as the class’s interface provides us with only appropriate options:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = seq_num; // Performs conversion

The fewer mistakes you can make by accident, the better. And of course, this has the additional advantage that the type of the wire format is more self-documenting.
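To make the wire-format claim checkable, here is a self-contained sketch. `be32` is a hypothetical, cut-down stand-in for big_endian<uint32_t> (written with shifts so it does not depend on the host), and `wire_bytes` is an invented helper that copies the struct out byte by byte. Because every member has an alignment of 1, the struct is expected to be 6 bytes even without a packed attribute; the static_assert confirms that for the compiler in use.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for big_endian<uint32_t>.
class be32 {
    std::array<std::uint8_t, 4> bytes{};
public:
    be32() = default;
    be32(std::uint32_t in) { *this = in; }
    be32& operator=(std::uint32_t in) {
        bytes = {{static_cast<std::uint8_t>(in >> 24),
                  static_cast<std::uint8_t>(in >> 16),
                  static_cast<std::uint8_t>(in >> 8),
                  static_cast<std::uint8_t>(in)}};
        return *this;
    }
    operator std::uint32_t() const {
        return (std::uint32_t{bytes[0]} << 24) | (std::uint32_t{bytes[1]} << 16) |
               (std::uint32_t{bytes[2]} << 8) | std::uint32_t{bytes[3]};
    }
};

struct packet_wire_format {
    std::uint8_t from_device;
    std::uint8_t to_device;
    be32 sequence_number;
};

static_assert(sizeof(packet_wire_format) == 6, "expected no padding");

// Copies the packet's byte-by-byte representation, as sending it would.
std::array<std::uint8_t, sizeof(packet_wire_format)>
wire_bytes(const packet_wire_format& packet) {
    std::array<std::uint8_t, sizeof(packet_wire_format)> out{};
    std::memcpy(out.data(), &packet, sizeof packet);
    return out;
}
```

With from_device 1, to_device 2, and sequence number 0x0A0B0C0D, the resulting bytes are 01 02 0A 0B 0C 0D, with no conversion call at the assignment site.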

Similarly, if you read or write from the wire format using read and write methods on a buffer type, those methods should either be parameterized to take endian information along with the values, or you can pass objects of type big_endian as the value to be copied in: big_endian<uint32_t> is just as trivially-copyable as uint32_t.

Conclusions and Loose Ends

It is a little more awkward to write big_endian for Rust. I would want to use the existing to_be_bytes method in the implementation, and unfortunately that method is not in any trait, as I’ve complained about before. This can easily be remedied by writing our own trait, however, or using external crates that already do so.

However, I wonder if maybe all of these languages should define types that correspond to uint16_t, uint32_t etc, and just are defined to store themselves in network byte order (and perhaps another one that guarantees little endian order). After all, most processors support byteswap instructions, that make writing a value as a byteswap an easy operation. They could be optimized as normal values unless actually written to memory – and only the optimizer knows when they’re actually written to memory. They could even be written to memory in native endianness unless there’s some defined way to get a byte-by-byte pointer to them – and really only the optimizer knows that.

Endianness seems more a configuration on the natural types of the programming language than it does something to be implemented on top of these natural tools. These loops I’m using to do byteswaps are surely not the most efficient way to do it (which is why the non-array based implementation of big_endian is surely more performant even if it is hackish), because processors have some support for non-native endianness baked in. If a C++ vendor provided types like big_endian (and perhaps some do, I’m sure I’ll find out in the comments) it would surely be more performant.

But again, perhaps they should be primitive types. There’s some built-in processor support for them, and only the optimizer knows when the non-native endianness actually should be used.

I am too busy a person to do the research for such a proposal. I don’t know if such a proposal exists. My interest here is simply in using the tools I have to be a good programmer. For that, to_be_bytes and my implementation of big_endian will simply have to suffice.

Operating Systems: What is the command line?

2023-10-08T00:00:00+00:00

This is my newest post in my series about operating systems. Yes, it was last updated in 2019 – I’m a hobbyist blogger. This is a post about the command line, a computer topic, but it is for educating a non-technical (but tech-curious) audience. Most of the programmers in my audience will already know everything I have to say, and may be bored by some explanation of things they already know, though I intend to discuss some technical details of how computers work.

This is not a tutorial on how to use the command line on any particular operating system. Rather, it is a discussion of the role that a command line plays in a modern operating system and why some people (including me) still use that kind of interface.

As I’ve explained before, I often use my computer through the command line. It is a major part of but not the entirety of how I interact with it. I do this so much that people looking at my computer will assume I’m programming even when I’m not – even when I’m working on my blog, or another writing project, or even just organizing my pictures.

Here is a screenshot of a command line session:

Graphical User Interfaces

This is (as you likely know since you’re reading this on a website) no longer the normal way to interact with computers. Nowadays, we usually interact with computers through graphical user interfaces (GUIs), and many people take them for granted. We access applications¹, each in its own window – or, for web applications, we can combine them into one window via browser tabs.

We navigate these applications with the mouse or touchpad, scrolling and clicking to find our way through the document, right-clicking or navigating menus to find further options, and occasionally interacting with a “dialog box” to specify details. All features are expected to be discoverable, that is to say, we expect to be able to find them in a menu, a toolbar, a right-click menu, or by navigating the dialog boxes we reveal through these other things. If we cannot discover a feature by these mechanisms, we can reasonably assume the application does not have this feature.

Here is LibreOffice Calc, a (somewhat old-fashioned) GUI program:

Nowadays, applications often run inside web browsers. This principle of discoverability is still considered important. Here is Google Docs, an application running inside a web browser:

These are both mouse-navigated programs with discoverable features. For both of these applications, there are many visible ways to interact with them. If you want to find a feature, looking through what’s right in front of you is the way to go.

The Command Line in Brief

The command line works differently.

Nowadays, the command line is usually accessed via a window within the context of a graphical desktop environment², but in the olden days, people interacted with computers via dumb terminals that couldn’t display images, just text³:

“It was a dummy terminal, and I was a dummy user.”

A member of the Baby Boomer generation describing what it was like to be a person in a non-IT role using Unix in the 80s.

Instead of being able to find various features via menus visible on the screen, you are instead given a prompt, an indication of the current state of your session that is, well, prompting you to tell the computer what to do, to give it a command:

You can then type your command, maybe a few more.

As you type commands, the output of the commands displays on the subsequent lines. When you hit the bottom of the screen, the screen scrolls up. Most terminal emulators let you scroll the window to see earlier parts of the transcript. A command might also prompt for additional input, or take full control of the terminal emulator and provide a different type of (still text-based) interface entirely.

If you type a bad command, it is not very helpful:

There is no discoverability. There are no hints as to what commands might be accepted. You can use the command line to find out more information about what commands are accepted, but you have to know the commands to do that. In practice, you have to learn a minimal set of commands from a book (or nowadays, a website) before you can actually do anything productive.

It’s not intentionally user-unfriendly. For example, on Linux, there are commands like man (for “manual”) that explain what commands do, and commands like apropos to search for useful commands. Here is the manual page for the man command itself:

Additionally, once you know the name of a command or utility, you can generally find out more about how to use it by passing -? or --help:

Command lines are available on all modern operating systems for personal computing: Windows, macOS, Linux, and certainly any other Unix you might have running. They tend not to be available on mobile OSes.

What is the command line not?

Before we talk about what this is for, and why modern operating systems still support this decidedly old-fashioned way of interacting with them, I want to dispel some myths and misconceptions about the command line, specifically two opposite misconceptions that seem to still be common amongst the computer laity.

Misconception One: The command line is literally DOS, the Microsoft operating system from the 80’s and early 90’s. It is there to support old programs from the 80’s and early 90’s, and exists solely for the support of obsolete and obsolescent software.

This misconception is common among Windows users, because it used to be true. Until Windows XP, Windows still came bundled and intertwined with a version of Microsoft’s older, fully command-line operating system, DOS. Old DOS programs were still in common use, and people needed a way to run them, so they could run a copy of DOS inside a window.

It’s not true anymore, however. Windows is no longer a chimera of DOS and more modern components. Since Windows XP, both the consumer and business versions of the Windows brand have been versions of Windows NT, a different operating system from earlier consumer versions of Windows, one originally targeted at business users, with no DOS code in it at all.

On a modern Windows computer, the command line is not primarily for DOS programs. The ability to run DOS programs isn’t even shipped with Windows by default anymore, but the command line still is. The confusion is understandable, because the command line still looks like the DOS command line. The prompt is still a form of DOS’s famous C:\>.

What is the command line for, then? It is for running modern Windows programs that happen to be designed to be used from the command line. Windows comes with a bunch of such programs, for things like systems and network administration.

There are a bunch more that you can download and install, usually tools written by computer professionals for other computer professionals. Many of these command line programs were written primarily for Linux and other Unix OSes, but also have Windows versions.

We will go into specific examples of command line programs in a later section, but the important thing to know is that a command line program has access to all the same system libraries and capabilities that any Windows (or Linux, or macOS) program can access. It can play audio, connect to the Internet, and do pretty much anything – anything except draw a new window on the screen, not because it can’t, but because that would make it not a command line program anymore.

But I don’t want to go too over-the-top rebutting this first misconception, because then I might lead you to believe the second misconception.

Misconception Two: Not only can you do anything from the command line that you can do from a graphical user interface, but the command line is fundamentally closer to the operating system. When graphical programs run, they are using the command line under the hood.

This is not true.

It should be obvious that there is at least one thing you can do from a graphical user interface that you can’t do from the command line, which is to display graphics. The command line is an interface based fundamentally on displaying a grid of text. Thanks to modern Unicode, “text” now includes “emojis,” but it does not include images or high-quality charts and graphs.

But even with that overly-obvious caveat aside, yes, it is true that anything a graphical program can do besides show graphics could be done by a command line program as well. There are command line programs that manipulate images, they just don’t show the images as they manipulate them. There are command line programs that pretend to be web browsers and scrape data off of the websites when they load. All the operating system features and computer resources that graphical programs have at their disposal, command line programs will generally have too, besides (by definition) actually doing graphical displays and interactions.

However – and this is a big however – just because a command line program could exist to do everything a graphical program does, doesn’t mean that you have that program installed on your system, or that someone’s even ever written that program. The capabilities of your computer depend on what software you have installed, and what software you can install depends on what software people have written. If someone creates a file format, but only writes a GUI program to edit it, well, then, until someone reverse-engineers it, that file format will only be editable via GUI. Similarly if they only create command line tools – that file format will then only be accessible by command line.

For example, someone with ImageMagick installed on their computer but not Photoshop may only be able to do image manipulation from the command line. Someone with Photoshop installed but not ImageMagick may only be able to do image manipulation from the GUI. There is nothing intrinsically more powerful about either interface.

Specifically, GUI programs are decidedly not wrappers around command line utilities. You could write a GUI program that way (and there are a couple that are), but the vast majority do not in fact do this. Just as command line programs have access to all the same computer resources and operating system functionality that GUI programs do, it also works the other way around. GUI programs and command line programs both are written in programming languages that allow the program to invoke operating system functionality through system libraries and system calls. These calls are not at all the same as command line commands, and the GUI doesn’t need to use the command line as an intermediate layer.

If there is a GUI version and a command line version of the same functionality, maybe this is implemented as the GUI version launching the command line version under the hood – that is certainly something GUI programs can do, and it might make sense if the command line version is the interface most people use and that most maintainers are interested in. But it is just as likely if not more likely to be implemented by the GUI program and the command line both using the same common library.

And certainly, GUI-only programs like web browsers, e-mail clients, and office suites do not by any means implement their functionality by wrapping command line programs. There is no command line version of or interface to Photoshop, nor of Microsoft Word⁴.

And just like it’s possible to have an operating system with a command line and no graphical user interface, it is possible to have an operating system with a graphical user interface and no command line, not even internal analogues of it.

History of the command line

As I said before, computers used to be frequently accessed via dumb terminals. Before this, they were accessed by teletypewriters. A teletypewriter was literally a typewriter, where the keys you pressed went to the computer, and the computer’s responses were typed on the paper.

Modern command lines mostly follow that pattern – new input goes in at the bottom of the window, and the window scrolls like a piece of paper receding from the typewriter. But on a modern command line, the program can also take over the entire terminal emulator window, as long as what it wants to draw can be expressed as text. They even support multiple colors.

Most command line systems used today, like most operating systems used today, descend from the Unix tradition, which dates back to around 1970. The exception is Windows – even though the Windows command line is not DOS, it takes many of its aesthetic principles from DOS, not only the famous prompt C:\>, but also its habit of taking options with /, where Unix and friends use -.

What are some modern command line programs?

git keeps track of different versions of a large folder (called a repository) full of code or other forms of (mostly) text, and allows changes to be merged and reconciled between different authors. While there are GUI and web wrappers around it, the flagship program is a command line utility.
ssh lets you log into a command line interface of another computer, usually a server. This is often the only way to log into and administrate the server, as Linux servers generally don’t have any GUI capabilities or GUI programs installed.
ImageMagick lets you manipulate images.
Last but not least, there are many small programs that let you do basic file management, searching, and editing. Two of my favorite new ones are RipGrep by Andrew Gallant (which lets you search for strings or patterns in text files) and fd by David Peter (which lets you search for files by name or other properties).

Why use the command line?

If you are new to a tool, discoverability is an important feature. If you are experienced with a tool, all the hints of where to find things are more distractions than they are useful.

As someone who needs all the focus that I can get⁵, I find that distractions are bad. And so are extra steps: Why spend the time moving the mouse around to access one menu, then another, when on the command line, I can just type the command I already know for what I need to do?

Additionally, the command line is designed to save on extra typing. Most modern command lines support “tab completion”: you can type the beginning of the command, or of a file that it’s operating on, and press the [TAB] key, and the command line interpreter will complete the word for you – or list the possibilities if there are multiple.

For a newbie, it might be intimidating, but for someone who’s used to it, it stays out of your way and lets you get stuff done – while showing you a detailed transcript of what you’ve been doing, in case you forget what exactly it was you were trying to do.

Command lines are even more important on the server. While Windows servers come with a graphical user interface you can log into remotely, Unix⁶ servers generally don’t. It’s more efficient to just allow administrators a command line interface – and for most server administrators, it’s quite enough.

And while command lines are not closer to the operating system in a deep technological sense, they are closer to the operating system by convention. They tend to have all the options that a power user would want – and easy ways to specify them, rather than hiding them behind multiple warning signs and buttons labelled “Advanced….”

Last but not least, if you have a series of GUI actions that you often do, you usually have to just keep doing them, even if it’s very tedious. Precious few programs let you do something like write a shortcut key for five menu commands. On the command line, however, you can use aliases or scripts, where a short command stands for a long command, or a single command stands for a whole sequence of commands. You just put into a file the same text you would type at the prompt.

How does the command line actually work?

Generally, a terminal emulator or command line window has a process running in it, called the shell, that presents the prompt (C:\> or similar on Windows, normally something ending with $ on Unix). The shell then reads the command, takes the first word, and runs that as a program. This program is launched as a separate process, just like clicking on a program icon launches a separate process in a graphical user interface. The shell waits in the background for the process to finish, and then presents a new prompt. On a modern multitasking operating system, the shell generally also allows you to run commands in the background, and use key combinations (Ctrl-Z on Unix) to put a process in the background, and commands like fg to bring processes back to the foreground. This allows you to run multiple programs at once within the terminal.
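For the programmers in the audience, the core of that loop can be sketched in a few lines of C++ on a Unix-like system. This is a simplified illustration, not how any particular shell is written: `run_command` is a hypothetical name, and it ignores quoting, background jobs, and everything else a real shell handles. It splits the line into words, launches the first word as a separate process with the POSIX calls fork and execvp, and waits for it to finish.

```cpp
#include <string>
#include <sstream>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs one command line and returns the program's exit status,
// 127 if the program could not be started, or -1 on other failures.
int run_command(const std::string& line) {
    // Split the line into whitespace-separated words.
    std::istringstream words(line);
    std::vector<std::string> args;
    for (std::string word; words >> word;) args.push_back(word);
    if (args.empty()) return 0;  // empty line: nothing to run

    // execvp wants a null-terminated array of C strings.
    std::vector<char*> argv;
    for (auto& arg : args) argv.push_back(&arg[0]);
    argv.push_back(nullptr);

    pid_t child = fork();  // the new process starts as a copy of this one
    if (child < 0) return -1;
    if (child == 0) {
        execvp(argv[0], argv.data());  // replace the copy with the program
        _exit(127);                    // only reached if execvp failed
    }

    // The parent (the "shell") waits for the program to finish.
    int status = 0;
    if (waitpid(child, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real shell would call something like this in a loop: print the prompt, read a line, run it, repeat.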

On Linux, when a program starts, it conventionally has three open files, 0, 1, and 2, for input, output, and error, respectively. On the command line, by default (for it is configurable), these all correspond to the terminal: input is read in from the keyboard on the terminal (by default line by line), and output and errors are outputted to the terminal. GUI programs will have these three files open when they start too, but unless they’re started from the terminal, the output will normally just silently be ignored.
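Again for the programmers reading, here is a minimal sketch of what those numbered files look like from inside a program on a Unix-like system. The helper `write_text` is a hypothetical name; the constants and the write call come from the POSIX header unistd.h.

```cpp
#include <string>
#include <unistd.h>

// POSIX names for the three conventional files and their numbers.
static_assert(STDIN_FILENO == 0 && STDOUT_FILENO == 1 && STDERR_FILENO == 2,
              "input, output, and error are files 0, 1, and 2");

// Writes text to an already-open file, identified only by its number.
// Returns the number of bytes written, or -1 on failure.
long write_text(int fd, const std::string& text) {
    return static_cast<long>(::write(fd, text.data(), text.size()));
}
```

Calling `write_text(STDOUT_FILENO, "result\n")` sends text to wherever output currently points, and `write_text(STDERR_FILENO, "warning\n")` does the same for errors. The program does not need to know whether that is a terminal, a file, or nothing at all.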

The program can also draw a window, if a graphical environment is available. On Linux, it is easy for the same program to have a command line interface, and a graphical interface – sometimes at the same time. This is useful if it’s mostly used from the command line, but sometimes also wants to do things like show a chart or graph that can be generated.

macOS and Windows have more complicated GUI frameworks that make a GUI application more different in structure from a command line operation, but you can still launch GUI applications from the command line.

Footnotes

An application is just a computer program that does a task besides making the computer system work as a whole, a task interesting to the user. Examples include word processors, spreadsheets, chat apps, and video games. It’s not so much a rigorous technical term as an amorphous category of software. ↩︎
A desktop environment, also known as a graphical shell, is a graphical user interface for managing the windows you have open, and providing computer-wide menus for launching applications. It also controls the root window, which is what you see when you have no windows open, normally used for shortcuts and files you’re currently working on. Windows and macOS both provide their own desktop environments, which generally aren’t mentioned by name – they are just part of the operating system. Linux and most other Unixes, when they have graphical interfaces at all, can be used with a variety of different desktop environments. ↩︎
This image is taken from Wikimedia Commons. It is by Jason Scott, and available under CC BY-SA 4.0. It was modified by the Wikimedia poster by removing the background. ↩︎
Oddly enough, most web browsers support running without the browser window actually being displayed, in a headless mode. This is generally not usable purely from the command line, but in the context of being wrapped in a larger program (which might be a command line program). Additionally, Microsoft Word and Photoshop can be programmatically controlled – they are both scriptable – but as far as I know neither Microsoft nor Adobe have chosen to provide a command line interface to this functionality, even though they could. Again, it’s about what’s actually available on your computer. ↩︎
It has been said that I have a deficit of attention. ↩︎
I use Unix in a broad sense to include Unix-like operating systems like Linux and the BSDs, even if they aren’t Unix in a trademark sense. ↩︎

Can computers think things?

2023-09-30T00:00:00+00:00

This blog post isn’t about ChatGPT. It isn’t about machine learning, neural nets, or any mysterious or border-line spiritual form of computing. That’s a whole ’nother set of philosophical and metaphysical conundrums (conundra?).

This is about a way people sometimes speak, informally, about bog-standard boring non-AI computers and computer programs. You’ve probably heard people speak this way. You’ve probably spoken this way sometimes yourself:

“The server thinks your password is wrong.”
“The computer thinks you’ve lost the connection.”
“The phone thinks you want to use your headphones. It’s wrong though.”

We normally interpret this as a metaphor, but I’m not sure it is. Is the phone “thinking” you want to use your headphones rather than your car speaker substantially different from us “thinking” our friend would rather get a phone call than a text message?

Part of the problem here is that the word “think” in English can mean different things.

It can mean to cognate, to go through a rational series of propositions in our brains, expressed as internalized speech in our mind’s ear or diagrams in our mind’s eye or pure abstractions. “I am thinking about how to approach this physics problem.” Computers probably cannot do this, and certainly are nowhere near as good at it as humans are, not even with this fancy new AI software everyone’s playing with.

But it can also mean to have a belief, a mental model about reality. “I think Joe doesn’t like me very much.” Or, “I think the reason the car won’t start is because the battery is dead.” Computers, I will argue, can do something remarkably similar to humans in this category.

Some languages distinguish these two meanings of “think.” English learners of German often say denken (to cognate), when they mean glauben (to believe), in contexts where both would translate as “to think.” And then, in case that was too simple, there’s also meinen, which means “to suppose” or “to opine,” also used when English speakers might say “to think.”

So here’s my thought on this, or rather, my opinion (meine Meinung):

Computers cannot yet denken, or cognate, like humans. But computers can definitely glauben, or internally believe, specific facts, and they’ve been able to do that since the day they were invented.

In order to figure out whether this is true, we first need to establish what it means to believe something, and then see if computers can do it. What does it mean for humans to think something, to believe something about the world? Can we extract a definition that can then be applied to computers, to see whether computers are capable of the same thing?

So, what does it mean for us to think something is true? Well, it means that we have some internal state, some internal information stored in the physical arrangement of our brains, that corresponds to that thought or belief. We then use that internal state to inform our behavior. If we think our friend would rather get a phone call than a text message, then we might choose to accommodate that and call them instead of texting them.

This internal state, when all is going well, corresponds to a specific external reality. The goal is for the internal state to match the external reality. Sometimes this goal is not met – sometimes we misapprehend the situation, our belief is wrong, or what we think is true is not true. But if we are wrong, we have the same internal state as we would have if we were right, and things were working.

We can therefore define believing or thinking that a proposition X is true thus:

A being believes X is true if they have an internal state that, when the being is functioning correctly, corresponds to X being true, that then informs their behavior such that it is the behavior that makes sense if X is true, rather than the behavior that makes sense if X is not true.

Applied to the phone example, we have some internal state in our brain that indicates that “Jill would rather get a call than a text.” How do we know that the state indicates that proposition? Well, we know that when our brains are functioning correctly (a hard thing to define, but also a concept everyone uses all the time), we only have that internal state when the proposition is true. And, we also know that this internal state drives behavior consistent with that proposition being true. Assuming we want to accommodate Jill’s preference, we will call her instead of texting her, an adaptive decision if the belief is true, and a non-adaptive one if the belief is false.

With this framework, it seems almost easier to establish that computers can think something is true than that humans can do this. Humans often have complicated, ambivalent beliefs and thoughts. Humans will often believe something for reasons other than an efficient assessment of its truth value, and act contrary to their own earnestly held beliefs. I think this definition still works for humans, if you take all the confounding factors into consideration, but it’s hard: We get into things like “conscious” or “subconscious” beliefs, or “he says he thinks X, but his actions show he really thinks Y.” And, of course, it’s extremely difficult to define whether a human is “functioning correctly.”

With computers, however, they think all sorts of things. For example, let’s talk about whether a computer thinks a user has administrator privileges. You might see code like this:

let has_admin_privileges: bool = is_admin(conn.get_current_user());

Now, we have an internal state in the computer, a boolean (i.e. true or false) variable that is intended to correspond to whether the user has administrator privileges. If the code is functioning correctly, this variable will take on the value true exactly when the user has administrator privileges, and false otherwise. We know this, because the definition of “functioning correctly” is implicit in the way the programmer wrote the code, and how they named the variable.

Furthermore, the following lines of code are almost certainly behaviors in line with that interpretation of the internal state.

if has_admin_privileges {
    // Do the thing
    requested_task.perform()?;

    // Signal success
    Ok(())
} else {
    // Signal an error
    Err(Error::AccessDenied)
}

So, when people say things like “my phone thinks I want to use my Bluetooth headphones,” it means that there is information encoded in the silicon of the phone, possibly in an explicitly-named variable, that corresponds to that belief.

So now that I’ve thought this through properly, I don’t even think statements like this are metaphorical. I think they are literally true, and completely appropriate.

My Dream C++ Additions

2023-08-30T00:00:00+00:00

UPDATE: I have updated this post to address C++ features that address these issues or have been purported to.

I have long day-dreamed about useful improvements to C++. Some of these are inspired by Rust, but some of these are ideas I already had before I learned Rust. Each of these would make programming C++ a better experience, usually in a minor way.

Explicit `self` reference instead of implicit `this` pointer

UPDATE: This is coming out in C++23, and they did it right! I’m excited! Good job C++!

I admit I haven’t been paying close attention to C++ post C++14. C++17 was up-and-coming and I hadn’t finished learning everything I wanted to about it when I left C++ programming. And I refuse to be embarrassed for not knowing about a feature in a programming language that is not my favorite before any compiler even supports it.

But I am indeed excited for them! This is a substantial improvement I have wanted since well before C++11 came out. They’ve done it pretty close to how I wished for it here, and they have good reasons for how they made it.

There are a few weird parts of the implicit this pointer.

For one, it is a pointer, but it is never allowed to be null, and it cannot be modified to point to a different object. In both of these ways, it behaves more like a reference than a pointer.

class Foo {
public:
    void bar() {
        this = new Foo{}; // Error
    }
};

int main() {
    Foo *foo = nullptr;
    foo->bar(); // Undefined behavior
}

For another, when we want to put a modifier on this, like const or volatile, there is nowhere obvious in the function signature to put it. We have to put it awkwardly after the parameters, before the ; or {:

class Foo {
public:
    void bar() const volatile && {
        // Do stuff
    }
};

Oddly enough, whether the parameter is taken by lvalue or rvalue can also be specified, which would make way more sense for a reference parameter instead of a pointer.
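A small sketch of those trailing reference qualifiers in today's C++ (C++11 and later). The class mirrors the `Foo` above, and the returned strings are only there to show which overload was picked.

```cpp
#include <string>
#include <utility>

class Foo {
public:
    // Chosen when the object is a non-const lvalue.
    std::string bar() & { return "lvalue"; }

    // Chosen when the object is a const lvalue.
    std::string bar() const & { return "const lvalue"; }

    // Chosen when the object is an rvalue (a temporary, or std::move'd).
    std::string bar() && { return "rvalue"; }
};
```

So `Foo{}.bar()` and `std::move(foo).bar()` pick the `&&` overload, while calling `bar()` on a named object picks one of the `&` overloads, even though `this` itself is still a pointer inside all three.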

The modifiers have to go in this odd location because this is implicit. This is in line with OOP ideology and theory, but in my mind, it’s just a negative. If you have to think about whether it’s const or taken by rvalue anyway when writing the signature, why put those modifiers somewhere you might forget about, instead of right with the declaration of the parameter?

I would change the syntax to fix both of these issues in one fell swoop: allow an explicit self as an alternative to implicit this, and make it a reference:

class Foo {
public:
    void bar(&self) {
        self.baz();
    }

    void baz(volatile const &self) {
        // Do stuff
    }
};

The type would still be implicit, but modifiers can be specified where the type would be. You would also only be able to take by reference or rvalue reference, and never by value, because implicit copy on method call would be a new feature of questionable value. It would not conflict with existing code, as a parameter named self without an explicit type would be illegal under the current syntax.

Of course, this looks rather similar to Rust’s syntax, but believe it or not, I had this idea long before I learned that Rust does self in this way.
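For comparison, here is roughly how the C++23 feature mentioned in the update (the explicit object parameter) spells the same idea. This is a sketch: the member names are made up, and the preprocessor guard is only there so that the snippet still builds on compilers that do not yet report support for the feature, in which case it falls back to the older trailing-qualifier spelling.

```cpp
#include <string>
#include <utility>

struct Foo {
    std::string name;

#if defined(__cpp_explicit_this_parameter)
    // C++23: the object is an ordinary, explicitly declared parameter,
    // so const and && sit right next to it.
    const std::string &get(this const Foo &self) { return self.name; }
    std::string take(this Foo &&self) { return std::move(self.name); }
#else
    // Older spelling of the same two overloads.
    const std::string &get() const & { return name; }
    std::string take() && { return std::move(name); }
#endif
};
```

Unlike the syntax wished for above, the C++23 form writes the type out and marks the parameter with the `this` keyword; `self` is just a conventional name.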

A new `byte` type for `uint8_t` and `int8_t`

In C++, the type we use for an individual byte of data, by definition, is char. This is the definition of char in the standard, and while the byte length (CHAR_BIT) doesn’t have to be 8 bits, other standard provisions and practical considerations mean that on a modern platform, it always is.

We might use uint8_t or int8_t for bytes in practical code, but these are defined as typedefs to unsigned char and signed char – I don’t know whether this is required by the standard but it is always done in practice.

However, char is also the type we use for text data, so it is a type with two different contrasting (perhaps even contradictory) sets of semantics.

That leads to many odd results, including the fact that char cannot represent all Unicode characters because it has to be 1 byte long. But the one I want to focus on today is a bit weirder. What does this code print?

#include <cstdint>
#include <iostream>

struct message_data {
    uint8_t message_type;
    uint8_t message_length;
    uint8_t data[1];
};

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << mesg.message_type << std::endl;
    std::cout << "Length: " << mesg.message_length << std::endl;
}

int main() {
    message_data data;
    data.message_type = 100;
    data.message_length = 0;
    print_message_hdr(data);
    return 0;
}

Well, if you thought the numbers 100 and 0 would show up on the output, you’d be wrong. std::cout’s operator<<’s char overloads are triggered, and so these fields, clearly meant as integers, are printed as text:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: d
Length:
[jim@palatinate:~]$

In order to get the integer print-outs we want, we have to override this strange default behavior, perhaps by casting the values to uint16_t before printing them:

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << uint16_t(mesg.message_type) << std::endl;
    std::cout << "Length: " << uint16_t(mesg.message_length) << std::endl;
}

This results in a better output:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: 100
Length: 0
[jim@palatinate:~]$
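Another spelling of the same workaround relies on integer promotion: applying unary plus to a uint8_t yields an int, which then prints as a number. A minimal sketch, where the helper name `field_to_text` is made up and writes to a string stream instead of std::cout so the result is easy to inspect:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format one byte-sized field as a number.
std::string field_to_text(std::uint8_t field) {
    std::ostringstream out;
    out << +field;  // unary plus promotes the uint8_t to int before printing
    return out.str();
}
```

It is shorter than the cast, but arguably even easier to overlook when reading the code.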

So, how do we make this a little more ergonomic? We introduce a byte type, that is similar to char, but overloads differently. Like any other integer type, it defaults to signed, and then we add overloads to operator<< and others to treat it like an integer, not like a character. Switching between byte and char would be an implicit conversion, but for overloading purposes, they would be different types.

uint8_t and int8_t could then be defined in terms of byte.

I do not know what backwards-compatibility implications it has, but I do think the decision to make char mean byte as its primary meaning instead of “character” was a particularly poor one, and anything we can do to migrate away from it would be good.

Update: Someone drew my attention to std::byte. This one I was aware of, but had not thought about here as I didn’t think it really solves the problem. As it is, it is not an arithmetic type, and therefore cannot be used as the underlying type of uint8_t, leaving the confusing behavior in place.
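The overloading half of the idea can be approximated in current C++ with a distinct one-byte type. The sketch below uses a made-up name, `byte_t`, and shares the limitation just described for std::byte: it has no arithmetic and no implicit conversions, so it is an illustration of the overload behavior only, not the proposed language feature.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in: a distinct type with the same size as uint8_t.
enum class byte_t : std::uint8_t {};

// Because byte_t is not a char type, this overload is the one selected,
// and the value is printed as a number.
std::ostream &operator<<(std::ostream &os, byte_t b) {
    return os << static_cast<unsigned>(b);
}

// Hypothetical helper so the printed form can be inspected as a string.
std::string to_text(byte_t b) {
    std::ostringstream out;
    out << b;
    return out.str();
}
```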

Real if-else Expression Syntax

Oftentimes, in C++, I find myself writing code like this:

int32_t error_code;
if (setting == Setting::Socket) {
    error_code = initialize_socket();
} else { // setting == Setting::Pipe
    error_code = initialize_pipe();
}

if (error_code < 0) {
    // ...
}

This error_code variable is just one example. I often want to have a variable get different values depending on which side of the if-else statement it’s on, without having to declare the variable without an initializer right ahead of it, and write two assignment statements. Basically, I want if-else to be an expression.

Now, of course, C++ already has the ternary operator: ?:. But it’s so ugly and unreadable that no one uses it, for good reason. It’s hard to remember what the precedence is, meaning if we want to be rigorous and friendly to our readers we need to bracket with ( and ) even if strictly unnecessary, and the result looks like garbage and is hard to format in a way that’s remotely readable:

int32_t error_code = (setting == Setting::Socket
    ? initialize_socket()
    : initialize_pipe()
);

What do I want instead? I want if-else to have this role, to be an expression, where it evaluates to the value of the end of each block (with no semicolon, to make clear that it’s an expression not a full statement):

int32_t error_code = if (setting == Setting::Socket) {
    initialize_socket()
} else {
    initialize_pipe()
};

This is way better than ?:. The blocks can be multiple statements long if necessary. You can add if-else if-else chaining. And, most importantly, it can be formatted like any other if-else.

Update: Someone drew my attention to a lambda-invocation pattern that is, in my mind, equally ugly to ?:, and also leaves you without the ability to return from the enclosing function within the block. This strikes me as extremely hackish and not really an improvement, but I suppose that’s where C++ is going. I am at a loss for why they didn’t just implement GCC’s expression blocks, followed by if as expression. It’s clearly much better in my mind.

I’ve seen the technique from time to time but I guess I figured it was too hackish to mention. I didn’t realize it was getting officially recommended in C++ Core Guidelines. I feel like when they were recommending it, they should’ve simultaneously been trying to get more usable and obvious features included in the programming language itself. Maybe they are, and if so I wish them luck in that! Maybe C++30 will be a safe and usable programming language, equivalent to Rust now.
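For reference, the lambda-invocation pattern looks roughly like this. The `Setting` enum and the two `initialize_*` functions are hypothetical stand-ins with made-up return values, so that the sketch compiles on its own.

```cpp
#include <cstdint>

enum class Setting { Socket, Pipe };

// Hypothetical stand-ins for the real initialization functions.
std::int32_t initialize_socket() { return 0; }
std::int32_t initialize_pipe() { return -1; }

std::int32_t initialize(Setting setting) {
    // The lambda is defined and immediately called; its return value
    // initializes the variable, which can therefore be const.
    const std::int32_t error_code = [&]() -> std::int32_t {
        if (setting == Setting::Socket) {
            return initialize_socket();
        } else {
            return initialize_pipe();
        }
    }();
    // A `return` inside the lambda leaves only the lambda,
    // never initialize() itself.
    return error_code;
}
```

The trailing `()` after the closing brace is what actually runs the lambda, and it is easy to miss when skimming.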

Variable Shadowing

On a related note, I want to have multiple variables with the same name shadow, rather than resulting in an error message. I want the new variable with the same name to simply hide the old variable, rather than giving me a “conflicting declaration” error (or similar).

Why? Well, a lot of production code involves taking the same conceptual thing, and migrating it through many types. Without shadowing, we have to use awkward Hungarian notation.

void handle_data(const void *data_v, size_t size) {
    const uint8_t *data_ch = (const uint8_t *)data_v;
    std::vector<uint8_t> data{data_ch, data_ch + size};
    // Actually do something with `data`
}

The new way would look like this:

void handle_data(const void *data, size_t size) {
    const uint8_t *data = (const uint8_t *)data;
    std::vector<uint8_t> data{data, data + size};
}

This also cuts down on how many variables are in scope at once.

This bugs people who are new to Rust sometimes, but it’s fairly easy to learn, and C++ has asked people to learn much, much harder things. Once learned, it is really useful, as the alternative is to use Hungarian notation or equivalents. It also helps you use the right value, as you won’t accidentally go back and use an old one, as it’s shadowed.
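For contrast, C++ today does allow shadowing, but only when the new declaration sits in a nested scope. A minimal sketch with a made-up function name:

```cpp
int shadow_demo() {
    int value = 1;
    int seen_inside = 0;
    {
        long value = 20;  // a new variable that hides the outer `value`
        seen_inside = static_cast<int>(value);
    }
    // The inner variable is gone here; `value` is the outer one again.
    return value * 100 + seen_inside;  // 1 * 100 + 20 = 120
}
```

This does not cover the use case above, though: a C++ name is already in scope inside its own initializer, so `auto data = convert(data);` in a nested block would read the new, uninitialized variable rather than the outer one.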

First-Class Support for Sum Types

std::variant is awful. I know, because few people except die-hards use it, and people use the Rust equivalent, enums, all the time. The weirdest thing about std::variant is that it supposes that all of the variants hold exactly one value, and one variant per type is sufficient. In reality, multiple variants might hold values of the same type, and many variants don’t need a value – both of which are possible but clumsy to express using std::variant’s semantics.
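To make the clumsiness concrete, here is a sketch of a small sum type built on std::variant (C++17). All type and function names are hypothetical. Two alternatives carry the same payload type and one carries nothing, so each needs its own wrapper struct.

```cpp
#include <string>
#include <variant>

struct Quit {};                      // no payload: still needs an empty struct
struct Move { int x; int y; };
struct Write { std::string text; };  // Write and Log both carry a string,
struct Log { std::string text; };    // so each needs its own wrapper type

using Command = std::variant<Quit, Move, Write, Log>;

// A visitor with one overload per alternative.
struct Describe {
    std::string operator()(const Quit &) const { return "quit"; }
    std::string operator()(const Move &m) const {
        return "move " + std::to_string(m.x) + "," + std::to_string(m.y);
    }
    std::string operator()(const Write &w) const { return "write " + w.text; }
    std::string operator()(const Log &l) const { return "log " + l.text; }
};

std::string describe(const Command &command) {
    return std::visit(Describe{}, command);
}
```

A call such as `describe(Command{Move{1, 2}})` yields `"move 1,2"`. Without the wrapper structs, a `std::variant<std::string, std::string>` is legal, but its alternatives can then only be told apart by index.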

But C++11 already introduced enum class for more powerful enums! Let’s go all the way and add Rust-style values associated with it, for a compiler-implemented tagged union. The implementation of std::optional’s fields would be so much simpler.

template <typename T>
enum class option {
    None,
    Some {
        T value;
    },

    // OK, define some methods
}

This interacts with object lifetimes and constructors in a complicated way, but if there were interest, I know it could be figured out. If you don’t think this feature is necessary, I suspect you’ve spent too long programming without it. Once you get used to this, it’s really hard to go without.

Conclusion

I am not going to do anything to try to make these things happen. I’m sure I’m not the most popular in the C++ community after my long write-ups of how Rust is so much better, and it’s not where my primary interests lie anymore. But, if someone were to make these features happen, it would make my life much easier, when for good reasons, projects I’m working on require me to code in C++.

In Defense of 'C/C++'

2023-08-28T00:00:00+00:00

One of the minor points I discussed in my response to Dr. Bjarne Stroustrup’s memory safety comments was the controversial, apparently deeply upsetting term C/C++. It is controversial and interesting enough that I decided to say a little more about it here.

A little background: Many people, especially outside the C and C++ communities (which, to be clear, don’t always like each other that much) use the term C/C++ to talk about the two programming languages together, as an informal short-hand for “C and C++” or “C or C++.” Within the ~~C/C++~~ C and C++ communities, it is widely hated.

And now for me to say the thing guaranteed to anger the most possible people: I see both sides of this debate.

On the one hand, the term “C/C++” is especially jarring because C and C++ fans regularly engage in actual controversy (famously including Linus Torvalds, of the C-based Linux kernel, insulting C++ and its programmers). It is frustrating to be a C++ programmer, to have strong opinions on what it means to be a C++ programmer, to think that C programmers are making a misguided decision, that using C over C++ is technologically backwards and regressive, and hear people cavalierly implying that the programming languages are the same. And likewise, of course, for the C programmer who feels similarly about C++.

Both C and C++ programmers are also regularly exposed to people who mix up the programming languages in ways that are harmful. They see bosses and hiring managers who expect you to transition back and forth between them without any friction, and to enjoy them equally. They see resources that promise to teach you “C/C++ skills,” and know that they won’t teach how to use either the way that that language’s particular community actually prefers. They see people using “C/C++” all the time to talk about the languages in a way that only would make sense if they were much more similar than they, in fact, are – or at the very least, than a die-hard partisan of C or C++ would think they are.

And I do think this is understandable. After all, people don’t tend to lump together other languages like this. Java and C# are probably equally related (if not more related), and no one writes that they’re hiring a “Java/C# programmer.” Why should C and C++ get treated this way?

But, on the other hand, C and C++ are actually extremely closely related programming languages. I was writing something recently comparing Rust features to C and C++ features, specifically Rust enums to the tagged union idiom which is used in … C and C++, in very similar ways. I know all the reasons why as a C programmer and as a C++ programmer, I’m not supposed to write C/C++, and still, I was tired of writing “C and C++” over and over again to describe this particular thing that those languages have in common.
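As an illustration of how much the two languages share here, the following sketch of the tagged union idiom is written in their common subset, so it should compile unchanged as either C or C++. The shape example and every name in it are made up.

```cpp
/* Intended to be valid as both C and C++. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_kind kind;              /* the tag: which member is live */
    union {
        double radius;                 /* valid when kind == SHAPE_CIRCLE */
        struct { double w, h; } rect;  /* valid when kind == SHAPE_RECT */
    } u;
};

double shape_area(const struct shape *s) {
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.14159265358979 * s->u.radius * s->u.radius;
    case SHAPE_RECT:
        return s->u.rect.w * s->u.rect.h;
    }
    return 0.0;  /* unknown tag */
}
```

Nothing stops a caller from setting the tag to one kind and reading the other member, which is exactly the gap that a compiler-checked enum closes.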

It turns out the real problem isn’t the act of writing “C/C++” – it turns out that just banning a problematic word doesn’t fix the real problem here at all – if it has ever fixed any problems. Some people do need to be told that C and C++ are different programming languages, with different communities and different skillsets, even though they are still related skillsets and related programming languages. But some people who don’t need to be told that still find themselves needing a shorthand sometimes, and don’t feel the cultural need to be over-accommodating in avoiding it.

Because when two things are similar – and stop me if this is confusing! – there are some ways in which they’re the same, and some ways in which they’re different. Sometimes, it makes sense to lump them together, and sometimes, it doesn’t. But yelling that people shouldn’t write “C/C++” won’t magically help anyone understand this – especially since those people are almost certainly not listening, and you’re preaching to the choir.

In the case of Dr. Stroustrup, he was using the “faux pas” of the NSA using “C/C++” to avoid having to actually address what they said and defend C++. He brought this up in his criticism of the NSA white paper:

As is far too common, it lumps C and C++ into the single category C/C++, ignoring 30+ years of progress.

I said, among other things, that Dr. Stroustrup was being unnecessarily exclusionary based on buzz-words:

He’s reading too much into the orthography and the NSA’s failure to use insider shibboleths of the programming languages they’re trying to criticize. Outside of the “C” and “C++” communities, “C/C++” is a fairly common way to refer to the two related programming languages.

But also, he was calling them out when they were right. In the specific category that the NSA was talking about, there actually is no difference, as I also mention in my post:

While there might be 30+ years of divergence between C and C++, none of C++’s so-called “progress” involved removing memory-unsafe C features from C++, many of which are still in common use, and many of which still make memory safety in C++ near intractable.

Perhaps we all should spend more time thinking critically than nit-picking word choice. And perhaps I should find something better to do than writing blog posts joining the fray, so that’s all I’ll say on the issue for now.

C++ Papercuts

2023-08-26T00:00:00+00:00

UPDATE: Wow, this post has gotten popular! I’ve written a new post that adds new papercuts combined with concrete suggestions for how C++ could improve, if you are interested. Also, if you want to read more about C++’s deeper-than-papercut issues, I recommend specifically my post on its move semantics. Thank you for reading!

My current day job is now again a C++ role. And so, I find myself again focusing in this blog post on the downsides of C++.

Overall, I have found returning to active C++ dev to be exactly what I expected: I still have the skills, and can still be effective in it, but now that I have worked in a more modern programming language with less legacy cruft, the downsides of C++ sting more. There are so many features I miss from Rust, not only the obvious safety features, or even primarily those, but also features that C++ could easily add, like first-class support for sum types (called enums in Rust), or tuples. (Clarification for C++ Fans: std::tuple and std::variant are not first class support, and if you’re used to first class support, you know how unacceptably clunky they are.)

In this blog post, I will focus on the minor problems of C++ that have affected me the most, the little usability papercuts, the petty inconveniences that just waste time. Instead of focusing on comparing them to Rust or other programming languages, I will focus on why they don’t make sense from a C++ point of view, with reference to just C++. I know better than to hope that by doing this, die-hard C++ fans will accept my criticism, but perhaps it will be relatable to C++ programmers who don’t have Rust experience.

Before I start getting into the papercuts, though, I want to address one of the primary defenses I’ve seen of C++, one that I’ve found particularly baffling. It goes something like this:

C++ is a great programming language. The complaints are just from people who aren’t up to it. If they were better programmers, they’d appreciate the C++ way of doing things, and they wouldn’t need their hand held. Languages like Rust are not helpful for such true professionals.

Obviously, the phrasing is a bit of a parody, but I’ve seen this sort of attitude so many times. The most charitable view I can take of it is a claim that C++’s difficulty is a sign of its power, and the natural cost of using a powerful programming language. What it reads like to me in many cases, however, is as a form of elitism: a general idea that making things easy for poorer programmers is pointless, and that good programmers don’t benefit from making things easier.

As someone who has programmed C++ professionally for a majority of my career, and who has taught (company-internal) classes in advanced C++, this is nonsense to me. I do know how to navigate the many papercuts and foot-guns of C++, and am happy to do so when working on a C++ codebase. But experienced as I am, they still slow me down and distract me, taking focus away from the actual problems I’m trying to solve, and resulting in less maintainable code.

And as for the upside, I see very little that C++ gets in exchange for all of this difficulty. The only ways in which C++ is more performant or more appropriate than Rust are in terms of platform support, legacy codebases, optimizations that are only available in specific compilers that happen to not support Rust, or other concerns irrelevant to the actual design of the programming language.

While I am proud of my C++ skills, I am not too proud to appreciate that better technology can render them partially obsolete. I am not too proud to appreciate having features that make it easier. In most cases, it’s not a matter of the programming language doing more work for me, but of C++ creating unnecessary extra make-work, often due to decisions that made sense when they were made, but have long since stopped making sense – don’t get me started on header files!

But I also want my programming language to be beginner-friendly. I am always going to work with other programmers with a variety of skill-sets, and I would rather not have to clean up my colleagues’ mistakes – or mistakes of earlier, more foolish versions of myself. If making a programming language more beginner-friendly sacrifices power, then I agree that some programming languages should not do it. But many, even most of C++’s beginner-unfriendly (and expert-annoying) features do not in fact make the language more powerful.

So, without further ado, here are the biggest papercuts I’ve noticed in the past month of returning to C++ development.

`const` is not the default

It is very easy to forget to mark a parameter const when it can be. You can just forget to type the keyword. This is especially true for this, which is an implicit parameter: there is no time when you are typing out the this parameter explicitly, and therefore it won’t sit there looking funny without the appropriate modifiers.

If C++ had the opposite default, where every value, reference, and pointer was const unless explicitly declared mutable, then we’d be much more likely to have every parameter declared correctly based on whether the function needs to mutate it or not. If someone includes a mutable keyword, it would be because they know they need it. If they need it and forget it, the compiler error would remind them.

Now, you might not think this is important, because you can just not use const and have functions with capabilities they don’t need – but sometimes you have to take things by const in C++. If you take a parameter by non-const reference, the caller can only use lvalues to call your function. But if you take a parameter by const reference, the caller can use lvalues or rvalues. So some functions, in order to be used in natural ways, must take their parameters by const reference.

Once you have a const reference, you can only (easily) call functions with it that accept const references, and so if any of those functions forgot to declare the parameter const, you have to include a const_cast – or go change the function later to correctly accept const.
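A sketch of how one forgotten const spreads. Both function names are hypothetical, and the word count is deliberately naive (it assumes words separated by single spaces).

```cpp
#include <cstddef>
#include <string>

// Only reads its argument, but forgot to say so with `const`.
std::size_t count_spaces(std::string &text) {
    std::size_t spaces = 0;
    for (char c : text) {
        if (c == ' ') ++spaces;
    }
    return spaces;
}

// Declared correctly, so callers can pass temporaries as well as variables.
std::size_t count_words(const std::string &text) {
    if (text.empty()) return 0;
    // To reuse count_spaces, the const has to be cast away. That is only
    // safe here because count_spaces never actually writes to `text`.
    return count_spaces(const_cast<std::string &>(text)) + 1;
}
```

`count_words(line + " again")` compiles because the parameter is a const reference; `count_spaces(line + " again")` would not, since a temporary cannot bind to a non-const lvalue reference.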

Lest you think this is just a sloppy newbie error, note that many functions in the standard library had to be updated to take const_iterator instead of or in addition to iterator when it was correctly discovered that they made sense with a const_iterator: functions like erase. It turns out that for functions like erase, the collection is what has to be mutable, not the iterator – a fact that the maintainers of the C++ library simply got wrong at first.
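A minimal sketch of the corrected behavior as it stands since C++11, with a made-up function name:

```cpp
#include <vector>

std::vector<int> without_first(std::vector<int> values) {
    if (!values.empty()) {
        // cbegin() yields a const_iterator. erase accepts it, because it
        // is the (non-const) vector that is modified, not the iterator.
        values.erase(values.cbegin());
    }
    return values;
}
```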

Obligatory Copying

In C++, for an object to be copyable is the default, privileged way for an object to behave. If you don’t want your object to be copyable, and all its fields are copyable, you often have to mark the copy constructor and copy assignment operator as = delete. The default is for the compiler to write code for you – code that can be incorrect.
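A sketch of opting out of the default. `FileHandle` is a hypothetical type whose only field is an int, so the compiler would happily generate copy operations for it. It does not actually open or close anything; it only shows the declarations involved.

```cpp
#include <type_traits>
#include <utility>

class FileHandle {
public:
    explicit FileHandle(int fd) : fd_(fd) {}

    FileHandle(const FileHandle &) = delete;             // no copying
    FileHandle &operator=(const FileHandle &) = delete;

    // Moving transfers the descriptor and leaves the source empty (-1).
    FileHandle(FileHandle &&other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    FileHandle &operator=(FileHandle &&other) noexcept {
        if (this != &other) {
            fd_ = other.fd_;
            other.fd_ = -1;
        }
        return *this;
    }

    int fd() const { return fd_; }

private:
    int fd_;
};

static_assert(!std::is_copy_constructible<FileHandle>::value, "move-only");
static_assert(std::is_move_constructible<FileHandle>::value, "move-only");
```

That is four special member functions written by hand just to express that a handle should not be duplicated.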

If you do make your class move-only, however, beware, because that means that there are situations where you can’t use it. In C++11, there was no ergonomic way to do a lambda capture by move – which is usually how I want to capture variables into a closure. This was “fixed” in C++14 – for when you want what should have been the default from the beginning, you can now use extremely clunky move-capture syntax.
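The C++14 move-capture syntax looks like this. A minimal sketch; `make_counter` is a made-up name.

```cpp
#include <memory>
#include <utility>

// Builds a closure that owns a heap-allocated counter.
auto make_counter(std::unique_ptr<int> start) {
    // Init-capture: `start = std::move(start)` moves the pointer into the
    // closure. A plain `[start]` would try to copy it and fail to compile.
    return [start = std::move(start)]() mutable { return (*start)++; };
}
```

Each call to the returned closure yields the current value and then increments it. The name has to be written twice in the capture list, and `std::move` has to be spelled out, for every variable captured this way.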

However, even then, good luck using the lambda. If you want to put it in a std::function, you’re still out of luck to this day. std::function expects the object it manages to be copyable, and will fail to compile if your closure object is move-only. This is going to be addressed in C++23, with std::move_only_function – but in the meantime, I have been forced to write classes with a copy constructor that throws some sort of run-time logic exception. And even in C++23, copyable functions will be the default, assumed situation.

This is strange, because most complicated objects, especially closures, are never, and should never be, copied. Generally, copying a complicated data structure is a mistake – a missing &, or a missing std::move. But it is a mistake that carries no warning with it, and no visible sign in the code that a complex, allocation-heavy action is being undertaken. This is an early lesson to new C++ devs – don’t pass non-primitive types by value – but it’s possible for even advanced devs to mess up from time to time, and once it’s in the codebase, it’s easy to miss.

By-Reference Parameter Papercuts

It is unergonomic to return multiple values by tuple in C++. It can be done, but the calls to std::tie and std::make_tuple are long-winded and distracting, not to mention that you’ll be writing unidiomatically, which is always bad for people who are reading and debugging your code.

Side note: Someone brought up structured bindings in a comment, as if this fixed the issue. Structured bindings are a great example of the half-way fixes that proponents of modern C++ love to cite. Structured bindings help some, but if you think they make returning by tuple ergonomic, you’re mistaken. You still need to either write std::pair or std::make_tuple in the function return statement, or std::tuple in the function’s return type. This isn’t the worst, but it’s still not as light-weight as full first-class tuple support, and it’s not enough to have convinced people to not use out parameters, which are my real complaint.

And even at that, it’s not that out parameters (or in-out parameters) are bad, but that they’re bad in C++, as there is no good way to express them.

So what do we do instead? The clunkiness of tuples leads people to instead use out parameters. To use an out parameter, you end up taking a parameter by non-const reference, meaning the function is supposed to modify the parameter.

The problem is, this is only marked in the function signature. If you have a function that takes a parameter by reference, the parameter looks the same as a by-value parameter at the call site:

// Return false on failure. Modify size with actual message size,
// decreasing it if it contains more than one message.
bool got_message(const char *mesg, size_t &size);

size_t size = buff.size();
got_message(buff.data(), size);
buff.resize(size);

If you’re reading the calling code quickly, it might look like the resize call is redundant, but it is not. size is being modified by got_message, and the only way to know that it is being modified is to look at the function signature, which is usually in another file.

Some people prefer out parameters and in-out parameters to be passed by pointer for this very reason:

bool got_message(const char *mesg, size_t *size);

size_t size = buff.size();
got_message(buff.data(), &size);
buff.resize(size);

This is great – or would be, if pointers weren’t nullable. What does a nullptr parameter mean in this context? Is it going to trigger undefined behavior? What if you pass a pointer from a caller into it? People often forget to document what functions do with a null pointer.

This can be addressed with a non-nullable smart pointer, but very few programmers actually do this in practice. When something isn’t the default, it tends to not be used everywhere where appropriate. The sustainable answer to this is changing the default, not heroic attempts to fight human nature.

Obligatory side-gripe: At least in non-owning situations like this, it is possible to write such a smart pointer. However, if you want to write the obvious companion, a non-nullable owning smart pointer, a companion version of std::unique_ptr, then it cannot be done in a useful way, because such a pointer cannot then be moveable.

Method Implementations Can Contradict

In C++, every time you write a class, especially a lower-level one, you have a responsibility to make decisions about certain methods with special semantic importance in the programming language:

Constructor (Copy): X(const X&)
Constructor (Move): X(X&&)
Assignment (Copy): operator=(const X&)
Assignment (Move): operator=(X&&)
Destructor: ~X()

For many classes, the default implementations are enough, and if possible you should rely on them. Whether or not this is possible depends on whether naively copying all of the fields is a sensible way to copy the entire object, which is surprisingly easy to forget to consider.

But if you need a custom implementation of one of these, you are on the hook to write all of them. This is known as the “rule of 5.” You have to write all of them, even though the correct behavior of the two assignment operators can be completely determined by the appropriate constructor combined with the destructor. The compiler could make default implementations of the assignment operators that refer to those other functions, and therefore would always be correct, but it does not. Implementing them correctly is tricky, requiring techniques like either explicitly protecting against self-assignment, or swapping with a by-value parameter. In any case, they are boilerplate, and yet another thing that can go wrong in a programming language that has many such things.

Side note: One commentator did not understand what I meant. It is true that many classes can use = default for all these methods. However, IF you customize the copy constructor or move constructor, you must THEN also customize the assignment operator to match, even though the default implementation could have been correct, if the language was defined more intelligently.

I thought this was clear by citing the rule of 5, which essentially says this.

The full rule is explained on CPP Reference. If you customize the copy or move constructor, the corresponding = default assignment operator will be wrong. Be careful! Note how the example code does not use = default for the assignment operators, even though the assignment operators contain no logic.

“Modern” C++

After seeing comments on Hacker News, I felt compelled to add this section. Every time someone complains about anything in C++, someone will mention a newer version of C++ that fixes it. These “fixes” are usually not that good, and only feel like fixes if you’re used to everything being kind of clunky.

Here’s why:

The default way still is the old, bad way. For example, capturing lambdas by move should be the default, and std::move_only_function, coming soon in C++23, should have been the default std::function.
For that reason, and because there are never warnings enabled on the old, bad way, even new coders keep doing things the bad way.

Of course, I understand that this is important for backwards-compatibility. But that is the entire problem: C++ has too many bad decisions accumulated. Why was copying the default for passing collections as parameters, let alone for lambda capture? I know the historical reasons, but that doesn’t mean that a modern programming language should work that way.

Even C++11 couldn’t clean up the fact that raw pointers and C-style arrays get nice syntax, while smart pointers and std::array look terrible. Even C++11 couldn’t clean up that it was working around a language designed without moves.

Conclusion

Unfortunately, I am all too well aware of why these decisions were made, and it is exactly one reason: Compatibility with legacy code. C++ has no editions system, no way to deprecate core language features. If a new edition of C++ was made, it would cease to be C++ – though I support the efforts of people to transition C++ to new syntax and clean some of this stuff up.

However, if you ignore backwards-compatibility and the large existing codebases, none of these papercuts make the programming language more powerful or better, just harder to use. I’ve seen good-faith arguments in favor of human-maintained header files, surprising as that is to me, but I challenge my readers to tell me what is beneficial about C++’s design choices in these matters.

You might find these things trivial, but these all slow programmers down, while simultaneously annoying them. If you are experienced enough, your subconscious might be adept at navigating it, but imagine what your subconscious could do if it didn’t have to. But how adept are you at seeing these mistakes in a code review from your junior colleagues? If you are a rigorous reviewer, how much more time does it take? How adept are you at finding these issues quickly when a bug arises?

We’d be more effective, more efficient, and happier if these issues were resolved. Programming would be both more enjoyable and faster to do. What’s the downside? The only upside of the status quo is continuity with history. And while I can see the value in that, it is a very limited value, with very limited scope.

New Link: Technical Only RSS

2023-08-06T00:00:00+00:00

TLDR: I am adding a new link for RSS subscribers who just want to subscribe to technical posts. The RSS feed has always been available, but it is now explicitly one of the links across the top, for those who want their RSS feed to only give them my new technical posts.

I am writing this post primarily to let people know about this new link, but I also want to muse on it a little.

I realize that I have, in some ways, two blogs here in one website.

The Coded Message is primarily read for its technical content, especially for the posts about Rust. But I also write about other topics that interest me, and those posts are generally much less popular.

I combine them on the same website for a few reasons.

For one, it’s easier for me to have one blog. Blogging is a hobby for me, and so it has to play second fiddle to other life obligations, which is most of why I’ve been slow to finish some blog series and some promised future posts – I have not forgotten. This also means that anything that would make blogging harder for me, including separating out these blogs into two fully separate websites, is likely to make me blog substantially less. Laziness might not always be a virtue, whatever Larry Wall might say, but some amount of it is essential to actually accomplishing goals, especially in the hobby space.

But there is also a reason besides laziness, that is a little harder to articulate. As much as this blog largely concerns my professional work, it is my personal blog. All of the programming posts are laden with my personal opinions about programming, and this website is about everything I personally have to say publicly on any topic, not just programming. A separation between my professional and personal blogs would lead, in my own mind, to a sense of obligation to make the professional blog a polished resource for programmers, with more organization and possibly even a regular schedule, as opposed to merely being a forum where I hold forth on whatever topics interest me, which often but not always happens to be programming.

That said, I do make all my posts in the hopes that people read them, and find them useful in some way (even if that use is, as for my fiction posts, primarily entertainment). And I am aware that a large portion of my readership primarily, or even exclusively, finds my technical posts useful. As much as I may wish that all of my readers who are here for Rust content also care about my musings on other topics, I know that many of them do not, or even seriously disagree with me on these topics.

I try already to accommodate this. If you sign up for my newsletter, by default, you are only subscribed to technical posts, and you have to follow an additional link and explicitly subscribe if you want other topics. If you go to www.thecodedmessage.com in your web browser, you can click the link at the top labelled Computers/Programming Posts. And now, if you want to subscribe to just the technical posts via RSS, there is also a link at the top for that purpose.

I still encourage people who are interested in my other posts to read them, and I still plan on having this website combined for at least the medium-term future, but I wanted people to know that a technical-posts only RSS feed was available, if they so chose.

As always, I welcome feedback on my blog in the form of comments and e-mails (jah259 at cornell dot edu). Thank you so much for reading!

Walk-Through: Prefix Ranges in Rust, a Surprisingly Deep Dive

2023-06-24T00:00:00+00:00

Update: Arvid Norlander has gone through the trouble of refactoring this code into a crate and publishing it. Thank you, Arvid!

Rust’s BTreeMap and corresponding BTreeSet are excellent, B-tree-based sorted map and set types. The map implements the ergonomic entry API, more flexible than other map APIs, made possible by the borrow checker. They are implemented with the more performant but more gnarly B-Tree data structure, rather than the more common AVL trees or red-black trees. All in all, they are an excellent piece of engineering, and an excellent standard library feature.

But they aren’t perfect, as I learned recently when I had a very specific operation that I needed to perform on one. I scanned the method lists diligently, trying to find the one I needed, but it was not there. range was close, but not quite there, and so I would simply have to implement the operation by hand. range is defined based on a start key (where, at our option, it includes keys that are greater than or equal to that key, or strictly greater than that key) and an end key (where the keys in the range are either less than or equal, or strictly less than that key).

Here is an example of the use of range:

let set = {
    let mut set = BTreeSet::new();

    set.insert("ABC");
    set.insert("DEF");
    set.insert("DEG");
    set.insert("HIJ");
    set.insert("KLM");
    set.insert("NOP");

    set
};

for elem in set.range("DEF".."N") {
    println!("{elem}");
}

It outputs starting with "DEF", continuing in order through the set, but not including "NOP", as that is greater than "N" (lexicographically, and therefore according to &str’s Ord instance). If "N" were in the set, it would not be printed, as .. is exclusive on the right side. ..= would include it.
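
To make the difference between the two kinds of right-hand bound concrete, here is a small sketch. The helper example_set is a hypothetical name introduced only for this illustration, and the key type is spelled out with a turbofish purely to keep the example explicit:

```rust
use std::collections::BTreeSet;

/// Builds the example set from the text (hypothetical helper name).
fn example_set() -> BTreeSet<&'static str> {
    BTreeSet::from(["ABC", "DEF", "DEG", "HIJ", "KLM", "NOP"])
}

#[test]
fn exclusive_and_inclusive_ranges() {
    let mut set = example_set();
    // Add "N" itself so the two kinds of bound behave differently.
    set.insert("N");

    // `..` excludes the right-hand bound, so "N" is not yielded.
    let exclusive: Vec<&str> = set.range::<&str, _>("DEF".."N").copied().collect();
    assert_eq!(exclusive, ["DEF", "DEG", "HIJ", "KLM"]);

    // `..=` includes it; "NOP" is still past the bound.
    let inclusive: Vec<&str> = set.range::<&str, _>("DEF"..="N").copied().collect();
    assert_eq!(inclusive, ["DEF", "DEG", "HIJ", "KLM", "N"]);
}
```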

Maps and sets: A brief aside

This discussion only concerns the keys of a map. For simplicity’s sake, throughout the discussion, I’ll be using BTreeSet, a wrapper around BTreeMap for when there are just keys (that are still unique and sorted) and no values. Internally, it contains a BTreeMap with the zero-sized struct SetValZST as its value type.

The Problem

But that isn’t the exact operation I needed. I needed all of the keys (which were also String) that started with a certain prefix. So, if the set was as in the example above, and the prefix was "DE", this operation would give me "DEF", "DEG". As you can see from the example, and as is easy to prove in general, when the keys are sorted, all the keys starting with a prefix form a contiguous range. But it is not a range that can be expressed with the range operation.

It’s close, tantalizingly close. Due to the definition of Ord on String, our prefix-based range starts with the first key that is greater than or equal to the prefix, as strings starting with a prefix always compare greater than or equal to the prefix. This side of the range is therefore expressible with the range operation.

It’s the other side that causes the problem. We don’t have a key where all the keys in the prefix are less than that key. We know that once we hit a key string that doesn’t start with the prefix, it must be greater than all the keys that do, as must all subsequent ones, but we cannot express this bound easily in terms of the prefix. We would need an element that is either the greatest possible key that starts with that prefix, or else the least possible key that does not.

There is a lot of efficiency to be gained by taking advantage of the fact that the range we want is contiguous, which is why the range method exists. But there is no operation that covers this scenario, because of the narrowness of how the range operation is defined.

On the one hand, this is frustrating. We are so close to being able to do this straight-forwardly with the provided operations. It also seems like it would be more performant to determine the bounds of that range by doing a tree search, rather than trying to implement this operation by hand. Without this operation being available, we seem doomed to slowness.

On the other hand, it’s understandable. The key type of a map is only really expected to implement the Ord trait, and nothing about Ord has anything to do with prefixes. Creating ranges with range was allowed, but based on inclusive and exclusive bounds, which is to say, purely based on ordering of opaque elements. Evaluating a prefix as a range, on the other hand (or even merely proving that the keys forming a prefix do indeed constitute a contiguous range) would be outside of the scope of the operations represented by the Ord trait.

So I needed a way of getting keys that start with a specific prefix. So what did I do? I simply coded a manual form of the operation, looping starting from the beginning of the range, and checking each iteration whether we’d left the range yet:

for key in keys.range::<String, _>((Bound::Included(prefix), Bound::Unbounded)) {
    if !key.starts_with(prefix) {
        break; // We've gone past the end of the range
    }
    // ... Actually do something with the key
}

This seemed reasonable enough. My colleagues asked me to put in a comment to clarify that, since the map was sorted, all the items with a prefix would be contiguous, and therefore break was correct and not continue. It worked, and was performant enough for my purposes in writing the code, but perhaps not as much as ideally could be achieved. I couldn’t help but wonder if it could be made a little more performant if it were part of the standard library, if we had insight into and ability to access the inner structure of how a BTreeSet is laid out. Obviously, in such a case the code would also be more concise, and (more importantly) obviously correct, without need for a comment.

The performance considerations, if present, however, would be minimal. Looping through a BTreeSet is a reasonable operation, and I took advantage of the fact that my range was contiguous to stop once we’d gone past the last item. At best, explicit library support for prefixes would simply detect this condition slightly sooner, further up in the tree, without having to actually find the node with the offending item.
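
For reference, the same early-exit loop can also be phrased as an iterator chain. This is only a sketch: keys_with_prefix is a hypothetical helper name, and it collects the matching keys into a Vec rather than doing real work with each one. take_while plays the role of the break:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Collects the keys that start with `prefix` (hypothetical helper).
fn keys_with_prefix<'a>(keys: &'a BTreeSet<String>, prefix: &str) -> Vec<&'a str> {
    keys.range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        // The set is sorted, so matching keys are contiguous:
        // stop at the first key that does not start with the prefix.
        .take_while(|key| key.starts_with(prefix))
        .map(String::as_str)
        .collect()
}

#[test]
fn finds_contiguous_keys() {
    let keys: BTreeSet<String> = ["ABC", "DEF", "DEG", "HIJ"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    assert_eq!(keys_with_prefix(&keys, "DE"), ["DEF", "DEG"]);
    assert!(keys_with_prefix(&keys, "X").is_empty());
}
```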

The next bit of code I wrote was for a closely related operation: dropping values outside of the prefix. What I wrote seemed like it definitely would be substantially less performant than a specially coded operation from the standard library would be. It certainly was harder to prove correct:

fn prefixed(mut set: BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    let mut set = set.split_off(prefix);

    let not_in_prefix = (&set).iter().find(|s| !s.starts_with(prefix));
    let not_in_prefix = not_in_prefix.map(|s| s.to_owned());
    if let Some(not_in_prefix) = not_in_prefix {
        set.split_off(&not_in_prefix);
    }

    set
}

This uses two calls to split_off, which like range needs a concrete T, a concrete String, to serve as a comparison-point for where to split. And it is certainly less performant than a dedicated method would have been, as it also uses a call to find to find a concrete String for the end of the range, which constitutes an additional loop through all the strings in the range.

Questions

This raised two questions in my mind:

Is there a way to convert a prefix into a range that can be used with range and split_off? More concretely, is there a way to construct a String such that it is the least possible String that is still greater than all the possible strings that start with our prefix, but less than or equal to all strings that do not? Would doing so in fact improve performance?
How hard would it be to add this feature to the standard library, both for iterating and for splitting the set?

In this blog post, we will focus on the first question. The second question is reserved for a future blog post.

Testing `prefixed`

The prefixed function needs the optimization more than the loop, so we’ll focus on that in our discussion. And as we’re discussing an optimization of the prefixed function, and as it is in any case a gnarly function, we will want to write some unit tests for it.

Here’s one example:

#[test]
fn it_works() {
    let set = {
        let mut set = BTreeSet::new();
        set.insert("Hi".to_string());
        set.insert("Hey".to_string());
        set.insert("Hello".to_string());
        set.insert("heyyy".to_string());
        set.insert("".to_string());
        set.insert("H".to_string());
        set
    };
    let set = prefixed(set, "H");
    assert_eq!(set.len(), 4);
    assert!(!set.contains("heyyy"));
}

This probably isn’t enough. Additional unit tests will be left as an exercise to the reader.
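
One possible way to get broader coverage, sketched under assumptions: compare prefixed against a slow but obviously correct reference on a handful of prefixes, including non-ASCII ones. The names prefixed_by_filter and sample are hypothetical, and prefixed is repeated here (lightly condensed) only so the snippet stands on its own:

```rust
use std::collections::BTreeSet;

// `prefixed` as written earlier in this post, lightly condensed.
fn prefixed(mut set: BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    let mut set = set.split_off(prefix);
    let not_in_prefix = set.iter().find(|s| !s.starts_with(prefix)).cloned();
    if let Some(not_in_prefix) = not_in_prefix {
        set.split_off(&not_in_prefix);
    }
    set
}

// Hypothetical reference implementation: easy to trust, but visits every key.
fn prefixed_by_filter(set: &BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    set.iter().filter(|s| s.starts_with(prefix)).cloned().collect()
}

// Hypothetical sample data, including multi-byte characters.
fn sample() -> BTreeSet<String> {
    ["", "H", "Hello", "Hey", "Hi", "heyyy", "h\u{e9}llo", "日本語"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

#[test]
fn agrees_with_filter() {
    let set = sample();
    for prefix in ["", "H", "He", "Hey", "h", "h\u{e9}", "日", "zzz"] {
        assert_eq!(
            prefixed(set.clone(), prefix),
            prefixed_by_filter(&set, prefix),
            "prefix {prefix:?}"
        );
    }
}
```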

Constructing an upper bound

So, let us return to our example. In our example, the prefix was "DE". As discussed, the lower bound is easy: Everything that starts with "DE" is greater than or equal to "DE". Strings that sort before the range are not:

println!("{}", "DD" >= "DE");       // Prints "false"
println!("{}", "DE" >= "DE");       // Prints "true"
println!("{}", "DEF" >= "DE");      // Prints "true"
println!("{}", "DEG" >= "DE");      // Prints "true"
println!("{}", "DF" >= "DE");       // Still prints "true" -- need something
println!("{}", "NOP" >= "DE");      // Still prints "true" -- need something

The upper bound is also easy enough, actually – we just need to increment the last character. Anything that starts with "DE" will also compare strictly less than "DF":

println!("{}", "DE" < "DF");        // Prints "true"
println!("{}", "DEF" < "DF");       // Prints "true"
println!("{}", "DEG" < "DF");       // Prints "true"
println!("{}", "DF" < "DF");        // Prints "false"
println!("{}", "NOP" < "DF");       // Prints "false"

This seems easy enough to handle. We just need to write a function that increments the last character in a string, something with this signature:

fn upper_bound_from_prefix(prefix: &str) -> String;

Incrementing the last character in a string seems like it’s just a matter of incrementing the last byte, so let’s see what that looks like:

fn upper_bound_from_prefix(prefix: &str) -> String {
    let mut prefix = prefix.to_string();
    unsafe {
        // SAFETY: It is not. ☹️. XXX
        let prefix_bytes = prefix.as_bytes_mut();
        prefix_bytes[prefix_bytes.len() - 1] += 1;
    }
    prefix
}

Well, that’s not good. It passes the unit test I wrote, but that’s because we need to write more unit tests. Unfortunately, like many programmers before us, we have forgotten about UTF-8. Rust requires all its strings to be stored as valid UTF-8 as a safety invariant. Fortunately, because we’re using Rust, we notice that we’re violating this invariant when an operation we have to invoke is marked as unsafe.

In order to capture this failure, we would have to write a unit test where the prefix ends in a multi-byte Unicode character. Unfortunately, because this is a safety issue, the test might not even fail (but it might be worth doing as an exercise anyway).

That isn’t even to mention the possibility that the prefix is empty, which would result in a panic in this code!

So, how can we get the last character of a string? get allows us to do substrings with byte indexes, but returns None if it is not a valid substring. We can loop backwards until we find an index that works for the split, and we can return an option in case the string is empty:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0 .. prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            // ???
        }
    }

    None
}
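
To see in isolation how get behaves on byte indexes that fall inside a multi-byte character, here is a small sketch. last_char_start is a hypothetical helper that performs only the backwards probing from the loop above:

```rust
/// Returns the byte index at which the last character of `s` starts,
/// found by probing backwards with `str::get` (hypothetical helper).
fn last_char_start(s: &str) -> Option<usize> {
    (0..s.len()).rev().find(|&i| s.get(i..).is_some())
}

#[test]
fn get_respects_char_boundaries() {
    // 'é' (U+00E9) takes two bytes in UTF-8, so "dé" is three bytes long.
    let s = "d\u{e9}";
    assert_eq!(s.len(), 3);
    assert_eq!(s.get(2..), None); // index 2 is in the middle of 'é'
    assert_eq!(s.get(1..), Some("\u{e9}"));

    assert_eq!(last_char_start("DE"), Some(1));
    assert_eq!(last_char_start(s), Some(1));
    assert_eq!(last_char_start(""), None); // empty string: nothing to find
}
```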

But that gives us two strs, and we want to increment a char. So we have to extract the singular char from the last_char_str, which we know to have exactly one char in it. Looking over the operations of str, we have only one real option:

let last_char = last_char_str
    .chars()
    .next()
    .expect("last_char_str will contain exactly one char");

Walking Through `char`s

But once we do have a char, we cannot simply do + 1 on it. This operation isn’t defined on a char. And before you say that we should convert it to u32 and back, you should know that the operation is left undefined on char for a reason. chars are supposed to remain valid Unicode scalar values, that is, code points outside the surrogate range, and a bare + 1 could land on a surrogate or run past char::MAX.
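
To make the pitfall concrete, here is a sketch of the convert-to-u32-and-back approach. naive_increment is a hypothetical helper, not anything from the standard library; char::from_u32 is the checked conversion, and it rejects values that are not valid chars:

```rust
/// Hypothetical helper: increments via u32 and converts back with a check.
fn naive_increment(c: char) -> Option<char> {
    char::from_u32(c as u32 + 1)
}

#[test]
fn naive_increment_gets_stuck() {
    // The ordinary case works.
    assert_eq!(naive_increment('E'), Some('F'));

    // U+D7FF + 1 is U+D800, a surrogate, which is not a valid char.
    // The next valid char is U+E000, but this approach cannot reach it.
    assert_eq!(naive_increment('\u{D7FF}'), None);

    // There is nothing after char::MAX at all.
    assert_eq!(naive_increment(char::MAX), None);
}
```

The checked conversion keeps us safe, but it reports "no next char" at the surrogate gap even though one exists.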

So, we must do something else that will skip over invalid code points. There is no obvious operation in char that will do it, but if we look in the “Trait Implementations” section, we find something that looks potentially relevant: Step. And looking at char’s implementation of Step, we see the exact function we want:

fn forward_checked(start: char, count: usize) -> Option<char> {
    let start = start as u32;
    let mut res = Step::forward_checked(start, count)?;
    if start < 0xD800 && 0xD800 <= res {
        res = Step::forward_checked(res, 0x800)?;
    }
    if res <= char::MAX as u32 {
        // SAFETY: res is a valid unicode scalar
        // (below 0x110000 and not in 0xD800..0xE000)
        Some(unsafe { char::from_u32_unchecked(res) })
    } else {
        None
    }
}

Unfortunately, this gives us an Option. Why? Well, you can see that from the code: What if last_char is the highest possible Unicode code point, 0x10FFFF, also known as char::MAX? We’re going to procrastinate handling this (admittedly rare) situation, and panic for now. Spoiler: Fortunately, there is a solution, which we will discuss later.

This is a great example of why Rust is great. Because this operation is defined to return an Option, we have to explicitly say what we’re doing in case it returns None. We don’t even have to have a unit test for 0x10FFFF code-points in our prefix to realize that we have to cover this case (although now would be a great time to write one).

Also unfortunately, we can’t directly call forward_checked … not if we want to use stable Rust, in any case. It’s marked as a nightly-only “unstable API.” Fortunately, however, we can access it indirectly, through the Range API. Some rooting around in the standard library reveals that nth, on an iterator on a closed range, calls forward_checked, which gives us:

let last_char_incr = (last_char ..= char::MAX)
    .nth(1)
    .expect("XXX fixme: can't handle highest possible codepoint");

This actually works, with the caveat of handling char::MAX set aside. All my unit tests except my 0x10FFFF one pass. Altogether, here is the state of things: We have a prefixed function that uses this to call split_off with an appropriate value, without iterating through all the strings in range in the set:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0..prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            let last_char = last_char_str
                .chars()
                .next()
                .expect("last_char_str will contain exactly one char");
            let last_char_incr = (last_char ..= char::MAX)
                .nth(1)
                .expect("XXX fixme used highest possible codepoint");

            let new_string = format!("{rest_of_prefix}{last_char_incr}");

            return Some(new_string);
        }
    }

    None
}

pub fn prefixed(mut set: BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    let mut set = set.split_off(prefix);

    if let Some(not_in_prefix) = upper_bound_from_prefix(prefix) {
        set.split_off(&not_in_prefix);
    }

    set
}
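
The stepping trick relied on above can be checked on its own. In this sketch, next_char is a hypothetical wrapper around the closed-range expression; the assertions show it stepping over the surrogate gap and returning None at char::MAX:

```rust
/// Hypothetical wrapper: the next valid `char` after `c`, if there is one.
fn next_char(c: char) -> Option<char> {
    (c..=char::MAX).nth(1)
}

#[test]
fn steps_over_invalid_code_points() {
    assert_eq!(next_char('E'), Some('F'));
    // The surrogate range U+D800..=U+DFFF is skipped entirely.
    assert_eq!(next_char('\u{D7FF}'), Some('\u{E000}'));
    // Nothing comes after the highest code point.
    assert_eq!(next_char(char::MAX), None);
}
```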

Cleaning Up the Edge Case

OK, now that we’ve got something that (kind of) works, it’s time to do some clean-up.

So, first, of course, we should address the XXX fixme, the 0x10FFFF case. So what do we do in that case? Well, if we use X to stand in for this “highest code point character”, we can reason about it a little.

Let’s say the prefix is "deX". In order for something to be out of the range of the prefix, it can’t start with "deY", as there is no 'Y' character greater than 'X'. So, it would have to differ on the previous character. It would have to start with "df" or greater.

So, if our prefix ends with this special character, we can simply drop it, and move one character back, and increment that character instead. Strangely enough, that just means going through our for loop again (and no, I did not plan this). See, if we keep going backwards to find another character to increment, we’ll get the previous character. Our way of extracting characters from the suffix works even if there’s more than one character in the second substring – it’ll just get the first character, which is exactly what we want.

So we can actually write:

let Some(last_char_incr) = (last_char ..= char::MAX).nth(1) else {
    continue;
};

Adding some comments to explain, and adjusting existing code to no longer lie to the reader (last_char_str might now contain more than one character) we get this:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0..prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            let last_char = last_char_str
                .chars()
                .next()
                .expect("last_char_str will contain at least one char");
            let Some(last_char_incr) = (last_char ..= char::MAX).nth(1) else {
                // Last character is highest possible code point.
                // Go to second-to-last character instead.
                continue;
            };
            
            let new_string = format!("{rest_of_prefix}{last_char_incr}");

            return Some(new_string);
        }
    }

    None
}

If our string contains only copies of this highest possible code point, this returns None, which is appropriate because there will be no strings greater than the strings prefixed with these characters, just like there’s nothing that comes after names that start with “Z” in alphabetical order, nor anything that comes after names that start with “Zz”.
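
As a quick check of these cases, here is the function again (comments trimmed so the snippet is self-contained) together with a test over the inputs discussed. The test name and the choice of inputs are mine, for illustration only:

```rust
fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0..prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = &prefix[0..i];
            let last_char = last_char_str
                .chars()
                .next()
                .expect("last_char_str will contain at least one char");
            let Some(last_char_incr) = (last_char..=char::MAX).nth(1) else {
                // Highest possible code point: try the previous character.
                continue;
            };
            return Some(format!("{rest_of_prefix}{last_char_incr}"));
        }
    }
    None
}

#[test]
fn upper_bounds() {
    // Ordinary case: increment the last character.
    assert_eq!(upper_bound_from_prefix("DE").as_deref(), Some("DF"));
    // Multi-byte last character: U+00E9 becomes U+00EA.
    assert_eq!(upper_bound_from_prefix("d\u{e9}").as_deref(), Some("d\u{ea}"));
    // Trailing char::MAX is dropped and the previous character incremented.
    assert_eq!(upper_bound_from_prefix("de\u{10FFFF}").as_deref(), Some("df"));
    // Only char::MAX, or nothing at all: there is no upper bound.
    assert_eq!(upper_bound_from_prefix("\u{10FFFF}\u{10FFFF}"), None);
    assert_eq!(upper_bound_from_prefix(""), None);
}
```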

Note that if we want to save the other sets that are created by split_off, we can. We can easily modify this function to return all three sets: The set of keys that come lexicographically before the prefix, the set that starts with the prefix, and the set of keys that come after the keys that start with the prefix.
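
A minimal sketch of what that three-way variant might look like. split_by_prefix is a hypothetical name, and upper_bound_from_prefix is given here in a condensed form based on char_indices, which I believe is equivalent to the loop above but is not the version walked through in this post:

```rust
use std::collections::BTreeSet;

// Condensed stand-in for `upper_bound_from_prefix` (assumed equivalent).
fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for (i, c) in prefix.char_indices().rev() {
        if let Some(next) = (c..=char::MAX).nth(1) {
            return Some(format!("{}{}", &prefix[..i], next));
        }
    }
    None
}

/// Hypothetical variant returning (before, matching, after).
fn split_by_prefix(
    mut set: BTreeSet<String>,
    prefix: &str,
) -> (BTreeSet<String>, BTreeSet<String>, BTreeSet<String>) {
    // Keys >= prefix move into `matching`; `set` keeps the keys before it.
    let mut matching = set.split_off(prefix);
    // Keys >= the upper bound move into `after`, if a bound exists.
    let after = match upper_bound_from_prefix(prefix) {
        Some(bound) => matching.split_off(bound.as_str()),
        None => BTreeSet::new(),
    };
    (set, matching, after)
}

#[test]
fn splits_three_ways() {
    let set: BTreeSet<String> = ["ABC", "DEF", "DEG", "HIJ"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let (before, matching, after) = split_by_prefix(set, "DE");
    assert_eq!(before.into_iter().collect::<Vec<_>>(), ["ABC"]);
    assert_eq!(matching.into_iter().collect::<Vec<_>>(), ["DEF", "DEG"]);
    assert_eq!(after.into_iter().collect::<Vec<_>>(), ["HIJ"]);
}
```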

Performance

This code certainly hasn’t been optimized to the fullest extent possible. In such a case, we probably would want to do some more extreme optimizations, like working with Vec<u8> rather than Strings, and checking whether they are valid UTF-8 only at the point when it is necessary (if it in fact is necessary for our application). Or, alternatively, we might want to fork the standard library’s BTree implementation and actually add this operation. Both of these are gnarly, but if the absolute best possible performance was truly our goal, they would both be in scope.

But I am reserving that for a future blog post. Detailed profiling of different implementations of this operation would require that level of optimization to be fully interesting and is therefore also reserved for a future blog post. Instead, here, I will walk through some informal reasoning about the performance of this new implementation of prefixed, and whether it is also useful for iteration rather than splitting off a new set.

So, let’s do some back-of-the-envelope reckoning. In creating this upper bound, we had to reconstruct the prefix string, which costs us an allocation as well as a string copy. In exchange, we saved an extra call for find, which might have had to loop over many, many strings that start with this prefix. We can expect this implementation of prefixed to be more performant, therefore, in situations where there are many strings that start with the prefix (and the prefix is not pathologically long).

For iterating over the range, however, we would be making an allocation, and only potentially saving ourselves some walking through the tree. Given that allocations are expensive (and potentially also involve some amount of walking around memory), it’s probably not going to be worth it unless the tree is extremely large.
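For comparison, here is a sketch of iteration that avoids the allocation: start the range at the prefix and stop at the first key that no longer starts with it. The function name iter_prefixed is hypothetical, and this is only one way such an iterator might be written.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Hypothetical helper: iterate over the keys starting with `prefix`
/// without constructing an upper-bound string.
fn iter_prefixed<'a>(
    set: &'a BTreeSet<String>,
    prefix: &'a str,
) -> impl Iterator<Item = &'a String> + 'a {
    // Begin at the first key >= prefix, with no upper bound on the range...
    set.range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        // ...and stop at the first key that does not start with the prefix.
        .take_while(move |s| s.starts_with(prefix))
}
```

The cost is one starts_with comparison per yielded key, plus one for the key that ends the iteration, instead of a single allocation up front.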

A Warning unto the Test-Shy

In an earlier draft of this post, I had the following code to increment a char rather than what I wrote above:

(last_char ..).nth(1)

This seems like it should work, in spite of having no upper bound. It stands to reason that char::MAX would, in such a case, serve as an implicit upper bound. It does still return an Option<char>, and when would None happen if not in such a situation?

But fortunately, I had a test case:

#[test]
fn maxicode() {
    let set = {
        let mut set = BTreeSet::new();
        set.insert("Hi".to_string());
        set.insert("Hey".to_string());
        set.insert("Hello".to_string());
        set.insert("heyyy".to_string());
        set.insert("H\u{10FFFF}eyyy".to_string());
        set.insert("H\u{10FFFF}".to_string());
        set.insert("I".to_string());
        set.insert("".to_string());
        set.insert("H".to_string());
        set
    };
    let set = prefixed(set, "H\u{10FFFF}");
    assert_eq!(set.len(), 2);
    assert!(!set.contains("I"));
}

This test case, in that earlier code, actually panicked! It turns out that in the case of an open-ended range like (last_char ..), which results in a value of the type RangeFrom, it is simply assumed that going forward is possible. Instead of calling forward_checked, its nth method calls forward:

#[inline]
fn nth(&mut self, n: usize) -> Option<A> {
    let plus_n = Step::forward(self.start.clone(), n);
    self.start = Step::forward(plus_n.clone(), 1);
    Some(plus_n)
}

And in forward, every None is converted into a panic:

fn forward(start: Self, count: usize) -> Self {
    Step::forward_checked(start, count).expect("overflow in `Step::forward`")
}
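To make the contrast concrete, here is a small sketch of the checked increment in isolation. The function name next_char is hypothetical; the expression inside it is the same inclusive-range form used in the final code above.

```rust
/// Hypothetical helper: the next char after `c`, or None if `c` is char::MAX.
fn next_char(c: char) -> Option<char> {
    // An inclusive range with an explicit upper bound stops at char::MAX
    // and returns None, where the open-ended `(c ..)` would panic.
    (c..=char::MAX).nth(1)
}
```

A side benefit of stepping through a range of char rather than doing arithmetic on the underlying integer is that the surrogate gap is skipped: the char after '\u{D7FF}' is '\u{E000}'.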

Conclusion

I hope you enjoyed this walk-through. You can find the final version of prefixed and two test cases here.

Please let me know what you think of this format in the comments. Also let me know if you have any follow-up topics you want me to explore, or other problems you would want walk-throughs of.

And, of course, please feel free to provide corrections and even nit-picks!

There is No One True Best Programming Language (but some are still better than others)

2023-05-24T00:00:00+00:00

I am no stranger to programming language controversy. I have a whole category on my blog dedicated to explaining why Rust is better than C++, and I’ve taken the extra step of organizing it into an MDBook for everyone’s convenience. Most of them have been argued about on Reddit, and a few even on Hacker News. Every single one of them has been subject to critique, and in the process, I’ve been exposed to every corner, every trope and tone and approach of programming language ~~debate~~ religious war, from the polite and well-considered to the tiresome and repetitive all the way to the rude and nonsensical.

There are two tropes in particular that many times have been proffered to me (or rather, levelled at me) about programming languages, two opposite errors that I would like to critique. I would say that I’d like to nip them in the bud, or respond to them once and for all, but I know the power of my blog is limited, so instead I’d just like to give my opinion on them, and explain why they are erroneous. Here are the errors:

There is one best programming language.
Every programming language has its place.

Error #1: There is one best programming language

Some languages have fans in the original sense of fanatic. Some languages inspire a level of devotion in programmers where they forswear other programming languages with an almost religious loyalty. These fanatics truly believe that the programming language is perfect, and that no other language can so perfectly capture the structure of computing and of algorithmic reasoning – or even be acceptable in light of the existence of a perfect programming language.

Any threat to this programming monolatry is then attacked as intrinsically irrational. After all, if everyone would just do the basic and obvious step of rewriting everything in this ideal programming language, then all bugs would be fixed. Then, “the wolf also shall dwell with the lamb, and the leopard shall lie down with the kid,” everyone will be immortal, and the messiah will come… And this, of course, is insufferable to normal people, who realize that programming languages are tools, not gods.

Rust, admittedly, brings this out in people. So does Lisp, and so does Haskell. And lest you think I’m exaggerating with the religious references, someone even wrote a Haskell book entitled To Kata Haskellen Evangelion, Biblical Greek for the blasphemous and hopefully tongue-in-cheek title The Gospel according to Haskell.

I know what you’re thinking; I can hear it in my head. You’re thinking: “You’re one to talk, Jimmy! The Coded Message is a Rust blog, and worse, a Rust evangelism blog! How dare you criticize when you’re one of the worst offenders?”

Nevertheless, in spite of what you might think, I don’t think Rust is the one true programming language. I think it’s ahead of other mainstream programming languages in terms of strong typing and functional features (key word “mainstream”), and I personally enjoy working on it full time, all true. But while I am a fan, I don’t think it’s perfect, or even unique in most of the ways it’s good.

Instead, I bring up this error for the reason I promised: Because it has been levelled against me. Early on, with my first Rust post, I wrote this statement (and see if you can see why it was controversial):

If you are a systems programmer, if you are used to C and C++ and to trying to solve systems programming types of problems, Rust is magical, just like when you learned your previous favorite programming language.

If you are not, Rust is overkill for your task at hand and you shouldn’t be using it. I earnestly recommend Haskell.

This got me quite a bit of anger on Reddit. One commenter was furious that I recommended Haskell, because they had tried to learn it in the past and had a bad time. Another tried to tell me I was being stubborn because the collected testimony of the Rust Reddit hadn’t somehow managed to override my 18 years of professional programming experience and convince me that garbage collection was not a necessary thing to have in a programming language sometimes.

And the key term there is Rust Reddit: There are some people there who think everyone should be writing Rust, even people who have every reason to benefit from a garbage collector and who have nothing to gain from the strictness of a borrow checker, because they think Rust is just the absolute best possible language. And the Rust sub-reddit does what any good echo chamber does, and brings out that vibe in every Rustacean.

But the echo chamber did not get me. Although I’ve moderated my opinion some – I’ve realized that there are some times where Rust beats out GC’d languages for applications outside of my narrow definition of systems programming, if only because it is both so mainstream and so successful at bringing in modern FP features – I still stand by my fundamental point:

Sometimes, indeed probably for most programming projects, Rust is the wrong choice. Just like I wouldn’t use Excel to do systems programming, I wouldn’t use Rust to keep track of splitting expenses on a trip.

Even for “serious” programming projects (whatever that means), sometimes, you simply do need a garbage collector. Sometimes, the semantics of Rust are too deep-cut or complicated to teach to the people you need to do your programming.

Heck, sometimes even existing infrastructure or existing legacy codebases or just existing skillsets are more important than what programming language features you have. Sometimes, Rust would take a re-write. And re-writing in Rust is not a panacea, or even always a good idea.

Error #2: Every programming language has its place

This one of course gets levelled against me far more often, especially in my Rust vs C++ debates. Most people realize programming languages are technical tools, and a skilled programmer can pick new ones up with relative ease. But some people act and talk instead as if, say, C++ programmers were an ethnic or religious group. If I call for the gradual deprecation and obsolescence of C++ in favor of Rust – while understanding that legacy code is a genuine concern that will be with us for decades – these people act as if I’m calling for crimes against humanity, saying Kumbaya-reminiscent statements like “All programming languages have their place.”

But of course, some tools are simply obsoleted by other tools. While Rust won’t serve your needs if what you really need is garbage collection, there are very few scenarios where C++ still beats Rust for new development. Sure, C++ has improved over time, but Rust doesn’t have a legacy to weigh it down, and so can actually do things right the first time.

Some people disagree with this in a way I respect, because of support for optimizing compilers, or the vagueness and immaturity of the semantics of unsafe Rust, or some other concrete reason where C++ has something to offer as a tool. Others simply live in worlds where too much code is in C++, and it would be impossible to migrate anytime soon, and that also makes sense to me. But I simply cannot take seriously an assertion that in some axiomatic way, reminiscent of the intrinsic value of all human beings, every programming language has its value.

Why should this be true? It’s not like which programming language someone uses is an intrinsic quality. I’ve changed from a C++ programmer to a Rust programmer, and so can you. Perhaps some of the people saying this are hobbyist programmers, asserting the right of people to enjoy C++ personally, and to program it as nerds. And that’s fair! But that’s also not what I’m talking about. I’m talking about what the best programming language is to use for projects that people will use in anger, where it matters whether a language is likely to lead to security vulnerabilities when used. If what programming language you use for such projects is a key part of your identity, then that’s not an OK way to structure your identity.

If it were true that all languages had their place and their value, does that mean that there should be shops writing in the obsolete versions of C++, like the original C with Classes? Does that mean that there should be shops writing code in INTERCAL? Does that mean that there are some situations in which it’s best to do greenfield development in COBOL?

One example of this trope is the famous essay "‘Considered Harmful’ Essays Considered Harmful", which has of course been cited to criticize my own “Considered Harmful” post (for more on the “Considered Harmful” trope, see the Wikipedia article). Ironically but unsurprisingly, “‘Considered Harmful’ Essays Considered Harmful” is dogmatic in exactly the way it criticizes, in spite of giving itself a (silly and ill-defended) out. In spite of recommending that “considered harmful” essays be replaced by “benefits and weaknesses” lists, or even “perceived benefits and weaknesses” lists, it does not follow its own advice. It does not list benefits of the “Considered Harmful” essays it considers harmful.

So I will fill in this deficit. “Considered Harmful” essays are good when a feature of a tool does indeed cause harm, and a better option is available – as is often actually the case. The title is a cliché, which is a good thing in this case: it signals to the reader, in a light-hearted way, what the thesis of the document is – as opposed to “benefits and weaknesses” lists which tend to be biased in any case and can amount to passive-aggressiveness. Weaknesses in one’s argument or benefits in one’s opponent’s argument can and should be acknowledged and addressed, but that doesn’t mean you have to pretend not to have a position. Just because something has some benefit doesn’t mean that it can’t, overall, be fairly considered harmful.

Indeed, my own post did do some “benefits and weaknesses,” in spite of being titled as a “Considered Harmful” essay. It did spend some time explaining why C++ made the decisions they did, and what the benefits of C++’s decisions were, even in the context of a post about why these decisions were considered harmful. C++ had to implement non-destructive moves for backwards-compatibility. They had boxed themselves into them, harmful as they are. That doesn’t make them any less harmful, but it does make them understandable.

So I disagree with the people who have used that post to criticize me, and ask them why they don’t also turn the arguments of that post against itself. Perhaps I could write:

‘“Considered Harmful” Essays Considered Harmful’ Considered Harmful

The only problem with this would be how to punctuate it. That and, I’m sure it would widely be considered… quite silly.

Conclusion: Restatement and Summary

Programming languages are tools. They are important tools, so it’s good to make sure they are of high quality, and do the things we demand of them, because they are often asked to do critical tasks for society. They are also not to be conflated with the people using the tools, who can retrain on new tools if they’re worth their salt.

Tools should not be idolized, and tools cannot be perfect. It is impossible to make a tool that can serve any purpose equally well – programming language design, in particular, will always have trade-offs. However, it is possible to make a tool that loses to another tool in all categories, and that is what C++ will soon be in comparison to Rust, if it is not already there.

And C++ programmers have their place in the new Rust world – it’s very easy to learn Rust from a C++ background. And C++ history has its place there too – Rust builds on C++, and it wouldn’t have been possible without the contributions of those who worked on making C++ what it is. Everything that is community about C++, everything that is people, everything that has moral value, can be migrated to Rust.

But that doesn’t mean that C++, the tool, has a place in production programming beyond legacy (i.e. pre-existing) projects. Again, there still may be a few other valid reasons to favor C++ over Rust (though they’re getting fewer and weaker with time), but a bald assertion that “every language has its place” is not one of them.

x86S: A Long Time Coming

2023-05-23T00:00:00+00:00

Intel has just released a new white paper, where they discuss removing a lot of the legacy cruft of the Intel/AMD architecture they call Intel64. Only 64-bit operating systems – and a narrow set of 32-bit legacy apps that don’t use segmentation (a small subset in theory but basically all of them in practice) – will be supported. I am surprised at how excited I am, although after all this time perhaps the better word is “relieved.”

Finally, Intel computers will dispense with the illusion that the default mode is the DOS-compatible, 16-bit “real mode.” They will drop the conceit that modern memory protection, not to mention the ability to address more than 1MB of memory (approximately, yes I know about A20), is opt-in – which it currently, literally, is. All of the code to accommodate these legacy modes can be phased out. All of the circuitry and/or microcode to implement all of these legacy modes can be removed – though I’m sure Intel has had ways to keep it from doing too much damage, it definitely increased the complexity of their processors.

This is one of the biggest tech debt paydowns I’ve seen in a long time. I have long felt about Intel architecture somewhat analogously to how Richard P. Gabriel, author of “The Rise of Worse is Better”, felt about C++ and Unix decades ago:

The good news is that in 1995 we will have a good operating system and programming language; the bad news is that they will be Unix and C++.

Similarly, I have always felt that Intel architecture would become reasonable someday, that it would gradually convert itself to something less absurd than its traditional state. I was excited when AMD (not Intel, note) came out with what Intel now calls Intel64, getting rid of segmentation in 64-bit mode and adding 8 sorely needed additional general-purpose registers (for a total of 16).

Now, finally, they’re phasing out the legacy modes. No more DOS on a modern PC (and it wouldn’t work anyway for other reasons). Good!

Like many tech debt paydowns of this magnitude and this level of historical relevance, it’s about the cognitive burden as much as it’s about the actual implementation or the actual code and circuitry to work around the complexity. We can now, slowly but surely, forget the arcane details of how things used to be.

It brings me a tinge of nostalgia, actually. 16-bit DOS programming was where I first learned assembly, at least to read it. Segmentation and the different processor modes were firmly in my awareness when I used a DOS computer with Windows 3.1 as a child. I remember playing with the edge cases, like “unreal mode” which was like real mode but where each segment could be addressed with 32-bit registers. Knowing the complexity of Intel architecture was relevant, and part of how I learned computer architecture in general.

But more recently, all of this knowledge has seemed overpresent. Too many times I’ve seen people assume Intel architecture and bring these old irrelevancies of PCs into conversation and even formal talks, assuming familiarity with not just operating system and systems concepts but the Intel-specific details of them. They’ll be talking about registers and you’ll see that instead of generic names like r3, r4, they’re talking about specific Intel registers. Or they’ll mention cr3 instead of generically saying “page table base register,” or “the syscall instruction” or even the obsolescent 32-bit int 0x80 instead of saying “issuing a syscall through a trap.”

The biggest example is how often I hear people talking about “ring 0” and “ring 3” when they should be saying “kernel mode” and “user mode.” The numbered rings are so jarringly and gratuitously Intel-specific. It makes me wonder if they genuinely think all processor architectures number protection rings or privilege levels like that (they do not), or if they think the intermediate rings between 0 and 3 are still relevant to modern OS design on Intel (they are not). Or perhaps they’re just okay with assuming Intel, ignoring the mobile and embedded worlds, and also bringing in an irrelevant, overengineered concept while they’re at it.

Maybe this will stop now that Intel is eliminating the unused rings 1 and 2. Maybe people will stop occasionally talking as if protected mode was an exceptional mode, now that it won’t be a mode at all, but the only way the processor runs.

A New Garden: Rust vs C++ mdbook

2023-04-24T00:00:00+00:00

Here it is, the Rust vs C++ mdbook.

I’ve wanted for a while to re-organize some of the content on my blog into gardens. I got the idea from the blog post “The Garden and the Stream: A Technopastoral”. Basically, some content is ill-suited to date-based, time-organized systems like blogs. In fact, most of my content remains valid over a long period of time, rather than participating in conversation (with some exceptions), but rapidly becomes less discoverable after I’ve written it, as it is buried by newer posts.

If I want to have content that is useful in a long-term fashion, the blog is not the ideal structure. While you can always scroll down, or look through tags, a more refined system would be to store information in gradually evolving, more comprehensive documents, that are gradually augmented or refined over time, that is to say, a garden.

The About Me page on a blog is one example of this, but my blog series about Rust vs. C++ seemed like another one where I had a lot of material that could be better structured and more coherently presented in a single, hierarchical document.

So I’ve posted it as an mdbook, here. I don’t like to think of this as a “book” in a form that would ever be published on paper – it’s not long enough, interesting enough, or complete enough for that. That would also go away from the garden aesthetic, where it is a continuous work-in-progress that is always evolving. But I do think the mdbook format is better suited to the material than my existing blog series, for long-term access.

I haven’t incorporated all the material from my blog series yet, as some of the older material I think could stand a re-write. It is maintained in the open on GitHub, so feel free to give feedback there in terms of issues and even merge requests. It’s released under the CC license for non-commercial, attributed, share-alike use, with this license file.

While I will continue to try and integrate existing material into this garden, and expand on it when I am inspired to do so, I plan on not focusing on Rust vs. C++ going forward. If there are any substantial additions, however, I will update you on this blog.

Thank you for reading! More, different Rust content is coming soon!

Rust: A New Attempt at C++'s Main Goal

2023-04-06T00:00:00+00:00

I know I set the goal for myself of doing less polemics and more education, but here I return for another Rust vs C++ post. I did say I doubted I would be able to get fully away from polemics, however, and I genuinely think this post will help contextualize the general Rust vs. C++ debate and contribute to the conversation. Besides, most of the outlining and thinking for this post – which is the majority of the work of writing – was already done when I set that goal. It also serves as a bit of conceptual glue, structuring and contextualizing many of my existing posts. So please bear with me as I say more on the topic of Rust and C++.

Rust is a polarizing programming language, because of how radical it is. It has gone the furthest in introducing features from functional programming languages into the mainstream world, and ignoring long-held programming language design principles from the realm of object-oriented programming. Its fans can be very enthusiastic, sometimes off-puttingly so, stereotypically demanding that all software be rewritten in Rust even when completely unfeasible – a stereotype that is mostly untrue, but whose existence and occasional true examples show the intensity of the debate. But a lot of Rust’s criticism comes specifically from C++ programmers, and correspondingly a lot of Rustaceans’ criticisms of other programming languages are directed specifically at C++, including mine. Even the creator of C++, while not mentioning it by name, entered the fray (and along with other Rustaceans, I responded).

There’s a good reason for this particular rivalry. While usable in other domains, Rust is strongest where C++ has hitherto been unopposed: as a high-level systems programming language. Many of Rust’s greatest strengths are directly based off of ideas originated in C++. And Rust has, in many ways, the same goals that C++ has. It can be argued – and in this post I shall argue – that Rust has the exact same overall goal that C++ does, albeit with a different interpretation of how that goal is best accomplished.

Zero-Cost Abstractions

C++ has an explicit goal of providing zero-cost abstractions.

This is a bit of a confusing term of art and has the potential to be misleading, but it comes attached with explanations that clarify it some. It is also referred to as the “zero-overhead principle,” which Dr. Bjarne Stroustrup, father of C++, describes (see pg. 4) as containing two components:

What you don’t use, you don’t pay for (and Dr. Stroustrup means “paying” in the sense of performance costs, e.g. in higher latency, slower throughput, or higher memory usage)
What you do use, you couldn’t hand code any better

There is also an executive summary of the concept at CppReference.com.

I, however, prefer the terminology of “zero-cost abstraction,” confusing as it can be, because it embodies a hidden third principle, that is unstated among those other two, and against which those other two principles are balanced. The word “abstraction” is the key, and the third principle is:

You can still get the abstractive and expressive power you expect from a modern programming language.

This third principle is necessary to distinguish higher-level “zero cost” languages like C++ and Rust from lower-level languages like C.

To fully explain why I include this third principle, and to delve into the history of the concept in general, I want to talk more about C.

C: The Portable Assembly

C has often been described as a “portable assembly language.” Unlike other high level programming languages before it (“high level” at the time meaning anything higher level than raw assembly language), it exposed users directly to gnarly machine-language abstractions like pointers, and to common assembly-language capabilities like shifting and bitwise operators.

The goal was to give the programmer something minimally distinct from assembly language, where the programmer had almost as much control over the computer as an assembly language programmer without sacrificing portability. Few higher-level features have been added, even now: there was no built-in string type, and only a limited array type that exposed the underlying concept of pointers the instant you poked at it. Structures are little more than a way of calculating offsets, and memory management is done by explicitly invoking memory management routines.

C’s preference, in general, was to only add onto assembly those features absolutely necessary for portability, and not to impose any other structure on the programmer – or, said another way, not to provide any other structure to the programmer.

This was far from an iron-clad rule. And there are definitely exceptions: C, built into the programming language, prefers null-terminated strings (also known as “C strings”) to arrangements that use specific lengths, a substantial constraint on the programmer beyond assembly language and probably a mistake overall.

More deeply, and probably less avoidably at the time, C assumes a traditional call structure. Many techniques that can be used to implement closures, co-routines, or other more radical alternatives to a call stack are difficult to impossible to do with standard C – while generally being possible in any assembly language.

But, with these exceptions, C generally does tend to only provide one overarching abstraction, portability, and when it does, it has the same zero-cost goals that C++ has, to only make the user pay for the abstractions they actually use, and to provide abstractions as efficiently as the equivalent hand-coded assembly.

Put another way, C++’s zero-overhead principle, as Dr. Stroustrup defines it, is more or less inherited from C. Where C++ differs from C is in the “abstraction” part of providing “zero-cost abstractions.” Everything you can do in C++ you can do in (potentially tedious and repetitive and error-prone) C, but C++ provides more abstractions, beyond just what is necessary for portability.

C++: A More Abstracted C

This gives us a framework for understanding the entire goal of C++, and I would argue, of Rust. Once we understand that C++ is trying to keep the zero-cost principle of C, where abstractions do not come with a performance penalty (and where “zero” is a reference to the difference between the performance cost and a manual assembly-language implementation), but with the expressive and abstractive power of a higher-level programming language, everything else about C++ makes sense.

C++ was originally christened “C with Classes,” and it tried to add Object-Oriented Programming to C. All the mechanisms of OOP could be portably added to C directly by an application or library developer with judicious use of function pointers and structure nesting (and glib is a famous example of a library that does exactly that), but C++ built this abstraction into the programming language itself.

Objective-C also did this (and according to Wikipedia it “first appeared” one year sooner in 1984), but Objective-C has always felt like two programming languages glued together. In Objective-C, the object-oriented features do not inherit the zero-overhead principle from C – nor do they look like C at all. They look instead like a Smalltalk dialect, where switching between C and this odd Smalltalk dialect was permitted on an expression-by-expression basis using an odd mix of square brackets and @-signs.

In C++, the added abstractions, including OOP, take on more of a resemblance to C, and importantly, continue to try to retain C’s advantages in systems programming by making the new features zero-overhead.

During much of the history of C++, OOP was considered to be the most important abstraction that a programming language could offer. But once it was added, it expanded the scope of C++ abstractions. Nowadays, C++ is considered multi-paradigm, and provides not just OOP, but a wide array of abstraction.

Nowadays, C++ tries to keep up with other programming languages in what features it offers, to the extent that it can while being limited by the zero-cost principle. This is in sharp contrast to C, which continues to try to define existing features better and make them more rigorous within the existing feature scope. The only features C++ rejects out of hand are those that do not jibe with zero-cost abstraction, showing that in actuality C++’s defining trait is to have the three-pronged concept of zero-cost abstraction that I introduced above, two prongs about “zero cost” and one about “abstraction”:

What you don’t use, you don’t pay for
What you do use, you couldn’t hand code any better
We give you the power of abstraction expected for a programming language of the day

This is why garbage collection is not offered in C++ (though it is still possible to implement manually) – it cannot be offered in a zero-cost way. However, C++’s alternative to garbage collection, namely RAII, has become more effective as new features like move semantics and std::unique_ptr were added, to the extent that in modern C++, it would be unimaginable not to have those features, and they have become essential to C++’s memory management model.

These three goals explain why C++ keeps accruing new features, whereas C maintains the features it has. They explain why C++ had to add templates – as a zero-cost alternative to OOP, or a zero-cost way of implementing collections. They explain why C++ had to add move semantics – because without it, RAII is a worse abstraction than GC.

Rust: A C++ Redo

Rust simply does a better job at achieving these goals, because Rust gets to start from scratch, with the modern concept of what’s expected in a high-level programming language, rather than working forwards through time. And, in doing so, it avoids a lot of the mistakes that C++ made, and can design a language that includes all of the modern features together.

A full set of OOP features is no longer ideologically required, so Rust doesn’t offer them. Instead, safety has become a sine qua non, so Rust offers that (with an opt-out provision). One might argue that safety violates the zero-cost abstraction because of bounds checking, but that’s simply not true as defined. You only pay for bounds checks if you’re actually using the feature of safety – unchecked unsafe accesses are in fact available just an unsafe keyword away – and the feature of safety is implemented as efficiently as one would by hand (by inserting bounds checks into array accesses).
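To make that trade-off concrete, here is a minimal sketch of the same loop written with checked indexing and with the unsafe opt-out. The function names are my own and purely illustrative; the only library call involved is the standard slice method get_unchecked.

```rust
/// Sums the first `n` elements using ordinary indexing.
/// Each `values[i]` is bounds-checked and panics if `i` is out of range
/// (the optimizer may remove checks it can prove redundant).
fn sum_first_checked(values: &[u32], n: usize) -> u32 {
    let mut total = 0;
    for i in 0..n {
        total += values[i];
    }
    total
}

/// The same loop, opting out of the per-access check.
///
/// # Safety
/// The caller must guarantee that `n <= values.len()`.
unsafe fn sum_first_unchecked(values: &[u32], n: usize) -> u32 {
    let mut total = 0;
    for i in 0..n {
        // SAFETY: the caller promised that `i < n <= values.len()`.
        total += unsafe { *values.get_unchecked(i) };
    }
    total
}
```

Both versions compute the same result for in-range input; the difference is only who is responsible for the range check, the compiler-inserted code or the caller.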

Similarly, C++ has learned that move semantics turn out to be essential in an RAII/value-semantics model to avoid spurious copy-and-deletes and/or indirections for e.g. storing std::strings in a std::vector that might be resized. Before move semantics, C++ often forced violations of the zero-cost abstraction principle by providing abstractions that would do extraneous copies or required extra indirections to use effectively, which is not what an assembly language programmer would ever write. However, since C++ move semantics were bolted on after the fact, it does them in a deeply confusing way, where Rust gets to reset and design itself for destructive moves from the get-go.
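As a rough illustration of what destructive moves buy you, the sketch below (helper names are hypothetical) checks that moving a String into a Vec, and later growing that Vec, never copies the string's character data: the heap buffer stays at the same address throughout.

```rust
/// Moves `name` into `names` and reports whether the string's heap buffer
/// stayed at the same address, i.e. whether the character data was not copied.
fn push_keeps_buffer(names: &mut Vec<String>, name: String) -> bool {
    let buffer_before = name.as_ptr();
    names.push(name); // `name` is moved; using it after this line would not compile
    names.last().map(|s| s.as_ptr()) == Some(buffer_before)
}

/// Grows a vector enough to force reallocations, then checks that the first
/// string's heap buffer never moved.
fn growth_keeps_buffers() -> bool {
    let mut names = vec![String::from("first")];
    let buffer_before = names[0].as_ptr();
    for i in 0..1000 {
        names.push(i.to_string());
    }
    names[0].as_ptr() == buffer_before
}
```

When the vector reallocates, only the small (pointer, capacity, length) headers are relocated, and the moved-from binding simply ceases to exist, so there is no moved-from state to reason about.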

A Note on “the RAII Model”

In my RAII post I referred to C++’s alternative to garbage collection, centered on RAII, as the “RAII model,” and wrote that std::unique_ptr and move semantics were essential to this model. A Reddit comment later explained that I must be confused, because RAII pre-dates those features.

They had misunderstood me, and I stand by my statements, but I think it is worth some clarification. By “RAII model,” I mean RAII and other features which, when combined, provide an alternative to garbage collection. And the RAII model before C++11 did indeed lack features essential to competing with garbage collection. It was simply a worse model then, and much harder to use correctly in a complicated codebase.

In a similar way, I would say that in Rust, borrow checking and destructive moves are essential to the RAII model, because without them, the model is a much worse competitor to garbage collection. And yes, that does imply that C++’s concept of RAII is fundamentally deficient by not being paired with borrow checking, just like pre-C++11 RAII was fundamentally deficient by not being paired with move semantics and std::unique_ptr.

The alternative to garbage collection that C++ and Rust have built has been a work in progress through most of its history. Rust had to be a new programming language rather than an evolution for a number of reasons, but fixing C++’s lack of borrow checking and weird move semantics were some of the most important such reasons.

Backwards-Compatibility

Of course, C++ does have goals that Rust drops – and in doing so, it can do better at this core goal. The biggest such goal is perhaps also a trivial example: C++ has the goal of being source-compatible with earlier versions of C++, and even to some extent with C. This makes sense, as backwards-compatibility between versions is sort of a fundamental expectation of any programming language, certainly one that tries to provide a modern set of abstractions, but it does restrain C++’s development.

While Rust tries to be backwards compatible with itself, dropping compatibility with C++ has allowed it to get out of a lot of C++’s accumulated cruft of complexity, much of which is inherited from C times.

This accomplishes a lot on its own. C++’s syntax has gotten so complex over the years that many in the C++ community are doing their own resets of the syntax, including Herb Sutter’s cppfront and Google’s Carbon. Even if starting from scratch to accomplish C++’s goals was the only thing Rust did, it would still result in a much better programming language, more ergonomic and with fewer pitfalls.

Some criticize Rust by saying that in another 30 or 50 years, Rust will end up as convoluted as C++ is now. This criticism has confused me, because it seems possible, even likely, that this is true, but that doesn’t strike me as a reason to not (gradually and responsibly) switch from C++ to Rust (especially for new projects or for when rewrites are particularly called for). If this is true, that just means programming languages are subject to entropy and obsolescence like everything else. And in that case, C++ will just continue to get worse, Rust will also continue to get worse, and Rust will be better than C++ the entire time. If all programming languages accrue cruft as they age, in what world is that a reason to use the cruftier programming language?

Most Rustaceans are not, despite the stereotype, treating Rust as some apocalyptic, messianic programming language to end all programming languages. I wouldn’t be surprised if 20 or 30 years from now, a new programming language will emerge, accomplishing the same goals from a fresh start. And when that happens, I will probably advocate in favor of this new programming language just like I now advocate in favor of Rust.

The goal isn’t to have an eternally good programming language; the goal is to have tools now. What should new projects be written in now? When a rewrite is called for (as it sometimes is), should it include a new programming language now that there is a viable alternative?

I suspect that many making this argument are including an unstated assumption – that C++’s cruft is actually a sign of its maturity, and fitness for production use. Alternatively, and a little more charitably, they might assume that Rust isn’t ready for production use yet, and by the time it is, it will be just as crufty as C++, perhaps converging to the same level of cruft. But while there are a few categories where Rust lags C++, they are mistaken in the big picture. For the vast majority of C++ projects, Rust is already a better option if the project had to be rewritten from scratch (a big “if,” but irrelevant to the merits of the programming languages).

Rust Deficits

Rust has a few downsides compared to C++.

Interfacing with C is an important goal for reasons besides backwards-compatibility. On many platforms, C serves as a lowest-common-denominator programming language, and its ABI serves as an inter-language protocol. C++ does provide smoother interfacing with this protocol than Rust does.

Relatedly, C++ generally has a relatively stable ABI on a given platform for a given compiler vendor. This allows dynamic libraries to be used as plugins with minimal glue code, something that in Rust normally requires awkwardly working through a C ABI interface. Personally, I think machine-language plugins as dynamically loaded libraries are mostly a relic of past software distribution models, and haven’t seen many situations where they make sense, but I could think of a few edge cases.

In both of these cases, Rust is clumsier, but not completely incapable. Rust still can speak the protocol that is the C ABI, just not as natively and smoothly-integrated as C++.

Other downsides of Rust have to do with network effects and Rust adoption. There is only one Rust compiler, while there are multiple C++ compilers that work together through a standards process. GCC is currently in the process of getting Rust support, and we’ll see how well that works out for Rust.

Similarly, there are a lot of libraries that exist in C++ that don’t yet exist in Rust or have Rust bindings. Though that’s true of any pair of programming languages, it is a specific reason some developers might still want to write new projects in C++ rather than Rust.

Finally, while I still think Rust would be a better programming language than C++ even if unsafe code were allowed everywhere, I think Rust could do more to make its rules clearer in the unsafe realm. The fact that the latest research on Rust’s memory models seems so deeply difficult to square with how async code often works as in this bug report makes me nervous.

I’m sure there are other ways in which Rust is behind C++, and the devil is as always in the details. I’m sure I’ll find out about some of them as soon as I post this post.

Conclusion

These were all topics I’ve discussed in other blog posts, but I hope this brings some perspective on how I think about programming languages in general, and provides a conceptual framework for thinking about some of my other posts. I was a fan of C++ because of its goals, and I’m now a fan of Rust because I think Rust pulls them off better. When I was skeptical of Rust, it was because I did not think Rust would pull them off better, but that was due to a misunderstanding.

Next Steps

I am considering using (a revised version of) this post as an introduction, and then trying to bring all of my Rust vs C++ content into an mdbook so it could be more of a garden. It would have a title like “Rust: A Better C++ Than C++” and be licensed under some CC non-commercial license, and it would accept MRs from other people as a community resource for consolidating resources on this particular issue. Then, if I had further ideas I could put them in there. What do people think of that idea?

I realize now that I write this that the repo where I already have the bones of this idea is actually already public. I think I’m going to restart from scratch with just a reorganization of existing blog posts, and save the more ambitious ideas in those notes files for later. What do people think?

Guest Collaboration: Paradigm Shift

2023-03-28T00:00:00+00:00

Does the choice of programming language matter?

For years, many programmers would answer “no”. There was an “OOP consensus” across languages as different as C++ and Python. Choice of programming language was just a matter of which syntax to use to express the same OOP patterns, or what libraries were needed for the application. Language features like type checking or closures were seen as incidental, mere curiosities or distractions.

To the extent there was a spectrum of opinions, it was between OOP denizens and those that didn’t really think software architecture mattered at all – a feeble attempt at corporatization against true programmers and their free-spirited ways. The office park versus the squatters. That’s how we got the wave of so-called “scripting languages”.

But OOP was the least of their concerns. They shrugged along with some sort of class system, and saved their criticism for (static) types and compilation (an implementation strategy, not a language property).

Now, times are changing. When in the last 30 years have we seen so many concurrent pivots in major languages?

Perhaps it began with lambdas. Once, they were seen as curiosities from the functional world, a special case of an OOP class overriding a single method (which is exactly how you had to write them in C++ and Java). Now, Java has lambdas. Even JavaScript thought its function() syntax was too heavy, replacing it with a lighter-weight =>. Hold up, even Excel has lambdas. Functional programming has intruded against the mainstream consensus.

When this intrusion broke through, the old equilibrium cracked. Both the OOP consensus and scripting language counterculture started to crumble. Now, JavaScript, Python, and Ruby are getting type checking. Java is getting a whole mish-mash of “functional” features. C++ is de-emphasizing inheritance and doubling down instead on templates. Even Go is getting generics.

So here we’ve reached a funny point. Before we had a bunch of languages which roughly did the same thing. Now we have the same bunch of languages all adopting the same features they never dreamed of having before. Within that cohort there is still little reason to adopt one or another, but over time there are clear reasons to choose the newer versions over the older versions. You might not care about Java vs Go, but you sure as hell want the version with generics over the versions without them.

So among 20+ year old languages, the choice of languages absolutely matters for programmers with time machines (or contemplating Debian stable), but what about for the rest of us?

Well, there are newer languages now mainstream (enough) too. And here we find the front of the pack, the language bringing functional features into the mainstream more completely and thoroughly than others (because being born with them helps): Rust.

There are other languages zooming out in front of the pack, leading Rust just as Rust leads the others. Being way out ahead is exciting. But it can be lonely. It might be cold. And you might run out of steam. Being at the front of the pack, the furthest along of the mainstream, is nice. You still see where we’re going better. You go there early. But you’re not alone; you’re shoulder to shoulder with others doing the same.

If that sounds nice, learn Rust. Don’t learn it as a mish-mash of exotic cool features. And don’t let it lull you into thinking you must do some sort of whiz-bang systems programming that almost no one does.

Learn Rust, idiomatic Rust, yes, for solving all the mundane problems you face in your programming life, but also to get a head start on what will be the next era of accepted programming practice. Learn type classes (aka traits) in their full power (and not just the object-safe ones), and learn how Rust’s move semantics can be used to simulate type-state.
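For a taste of the type-state idea, here is a minimal sketch. The Post, Draft, and Published names are invented for illustration; the point is only that publish takes self by value, so the draft is consumed and the compiler rejects any later use of it.

```rust
use std::marker::PhantomData;

// Marker types for the two states; they carry no data.
struct Draft;
struct Published;

struct Post<State> {
    body: String,
    _state: PhantomData<State>,
}

impl Post<Draft> {
    fn new(body: &str) -> Self {
        Post { body: body.to_string(), _state: PhantomData }
    }

    // Editing is only available while the post is still a draft.
    fn append(mut self, text: &str) -> Self {
        self.body.push_str(text);
        self
    }

    // Takes `self` by value: the draft is moved into the published post,
    // so the old binding can no longer be used.
    fn publish(self) -> Post<Published> {
        Post { body: self.body, _state: PhantomData }
    }
}

impl Post<Published> {
    // Reading the final body is only available once published.
    fn body(&self) -> &str {
        &self.body
    }
}
```

Calling append on a Post<Published>, or body on a Post<Draft>, is a compile-time error rather than a run-time check.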

These features might seem niche now, but remember, so once did lambdas.

Rust Tidbits #1

2023-03-24T00:00:00+00:00

This is a collection of little Rust thoughts that weren’t complicated enough for a full post. I saved them up until I had a few, and now I’m posting the collection. I plan to keep doing this for such little thoughts, thus the #1 in the title.

`serde` flattening

What if you want to read a JSON file, process some of the fields, and write it back out, without changing the other fields? Can you still use serde? Won’t it only keep fields that you know about in your data structure?

Turns out, you can parse the fields you want, while also just preserving the fields you don’t!

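use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

// `Record` is a placeholder name for illustration.
#[derive(Serialize, Deserialize)]
pub struct Record {
    pub known_field: KnownField,
    pub known_field2: KnownField2,

    // Any fields not named above are collected here and written back out unchanged.
    #[serde(flatten)]
    pub unknown_fields: BTreeMap<String, serde_json::Value>,
}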

I found out about this in the serde documentation, so it’s not an original insight, but it came in handy for me recently and so I’m trying to raise awareness.

`let` surprises!

So, in Jon Gjengset’s popular Twitter thread transcribed here, he wrote this:

Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488

I didn’t think much of this, besides making a note to self not to use let _ = x to ever drop anything, which hopefully I wouldn’t have done anyway because drop(x) is much more self-evident in what it intends. I remember also vaguely hoping that it did drop, because in my mind that was the obvious, logical thing for it to do.

But then later, as I was writing a match, I realized why _ couldn’t mean drop, from the match context:

match foo.bar.baz {
    MyEnum::Option1(_) => {
        // This shouldn't move from `foo.bar.baz`, but just
        // inspects whether it is `MyEnum::Option1`. Otherwise, there'd
        // be no straight-forward way to perform that inspection!
        //
        // And indeed, it doesn't.
        None
    }
    MyEnum::Option2(ref baz_inner) => {
        Some(foobar(baz_inner))
    }
}

So, if let _ = x were to be consistent with this use case, _ would have to not drop, as it’s important for _ to mean the same thing everywhere. And, after all, the left-hand side of a let is just another pattern context!

But wait, I thought! Does this mean that you can write let ref x = y;? Yes, it does. It’s just another way of writing let x = &y;… But just because you can write it that way, doesn’t mean you should. Keeping to idiom is important.

Nevertheless, fun fact! The more you know!
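If you want to see both behaviors for yourself, here is a small sketch using a type that counts its own drops. The Noisy type and the function names are made up for this example.

```rust
use std::cell::Cell;

// Increments a shared counter every time a value is dropped.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count observed after `let _ = x;` and after `drop(x)`.
fn drops_seen_after_underscore_let() -> (u32, u32) {
    let drops = Cell::new(0);
    let x = Noisy { drops: &drops };

    let _ = x; // `_` binds nothing, so `x` is neither moved nor dropped here
    let after_underscore = drops.get();

    drop(x); // still allowed, which shows that `x` was not moved above
    let after_drop = drops.get();

    (after_underscore, after_drop)
}

/// Checks that `let ref a = y;` borrows `y`, just like `let b = &y;`.
fn ref_binding_is_a_borrow() -> bool {
    let y = String::from("hello");
    let ref a = y; // `a` has type `&String`, and `y` is not moved
    let b = &y;
    std::ptr::eq(a, b)
}
```

The first function returns (0, 1): nothing is dropped by the underscore binding, and exactly one drop happens at the explicit drop(x).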

Remember: `serde` `struct`s Can Be Function-Local

Let’s say you need to extract three fields out of some JSON, like name, age, and phone_number (which, ironically, is a string in JSON terms, and not a number). One of the great things about Rust and serde is that you can just write those fields in a struct with the Deserialize trait (which is derivable) and grab the values into such a struct, even if there are other fields in the JSON:

#[derive(Deserialize)]
struct Person {
    name: String,
    phone_number: String,
    age: f64,
}

let person: Person = serde_json::from_str(json_str)?;

The question then becomes, where should Person go? Well, if you plan on passing around this Person value, and structuring the rest of your code in terms of it, then it should be a prominent type.

But more often, especially in my own code, I immediately split such a structure into its constituent parts, which I then will use for other things:

let Person {
    name,
    phone_number,
    age,
} = serde_json::from_str(json_str)?;

let handle = person_database.lookup(&name)?;
handle.set_phone_number(&phone_number);
let demographic = demographic_for_age(age.trunc() as u32);

This is very reasonable. It makes sense that our internal data structures would be designed for whatever logic we want to do on them, rather than having them coincidentally match the wire format. For most complicated applications, having the internal data format match the wire format literally is actually sort of a code smell.

So, we often will have types that we use to deserialize (and serialize) JSON in exactly one function. In that situation, the type should in fact be written locally to that function. So in the example above, where struct Person { ... } is immediately followed by the serde_json::from_str, I didn’t just write them next to each other as a convenience. I would literally put them together in a function:

fn do_thing(json_str: &str) -> Result<()> {
    do_something_else()?;

    #[derive(Deserialize)]
    struct Person {
        name: String,
        phone_number: String,
        age: f64,
    }

    // `from_str` returns a `Result`, so propagate any parse error with `?`.
    let Person {
        name,
        phone_number,
        age,
    } = serde_json::from_str(json_str)?;

    let handle = person_database.lookup(&name)?;
    handle.set_phone_number(&phone_number);
    let demographic = demographic_for_age(age.trunc() as u32);

    Ok(())
}

I bring this up mostly because many programmers don’t seem to be aware that you can do this, or don’t think to. I’ve seen people write types like Person at the top level. I realize that many programming languages either don’t let you do this sort of embedding, or else strongly discourage it. But I’m a big believer in giving things the least scope they need, and for many serde-related types, that’s function scope.

Rust Shadowing

Speaking of minimal scope, I wanted to write in praise of Rust’s penchant for shadowing that allows you to not have to come up with a bunch of names for the same thing. Oftentimes, we just convert the same information from type to type: wire format in bytes, to parsed wire format, to application domain format (wrapped in an Option in a Result), to application domain format with errors and absence handled (not wrapped in those things). Fortunately, Rust lets us shadow and re-use names for these different variables, and ultimately we get code that looks something like this (although no type annotations are normally necessary):

let foo: FooTypeC = {
    let foo: FooTypeA = get_foo();
    let foo: FooTypeB = transform_foo(&foo)?;
    match foo {
        Some(foo) => transform_foo_again(foo)?,
        None => FooTypeC::default(),
    }
};

This is really helpful, along with the fact that braces { … } enclose expressions, in really minimizing how much scope each variable has. But it’s also really helpful, because if shadowing wasn’t available, what would we name all these different variables? foo_a and foo_b and similar stupid names? This is an issue in certain other programming languages where shadowing isn’t as straight-forward, and the results aren’t fun.
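For a version of this pattern that compiles on its own, here is a small sketch that takes a port number from raw bytes to a u16, re-using one name the whole way. The parse_port function is a made-up example, and the type annotations are only there to show how the type changes at each step.

```rust
/// Converts raw bytes such as b" 8080\n" into a port number,
/// shadowing `port` at each stage instead of inventing new names.
fn parse_port(port: &[u8]) -> Result<u16, String> {
    // Bytes to text.
    let port: &str = std::str::from_utf8(port).map_err(|e| e.to_string())?;
    // Text to trimmed text.
    let port: &str = port.trim();
    // Trimmed text to number.
    let port: u16 = port.parse::<u16>().map_err(|e| e.to_string())?;
    Ok(port)
}
```

At every line, only the most recent port is nameable, so there is no way to accidentally use the untrimmed or unparsed version later on.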

The Importance of Logging

2023-03-21T00:00:00+00:00

Intro programming classes will nag you to do all sorts of programming chores: make sure your code actually compiles, write unit tests, write comments, split the code into functions (though sometimes the commenting and factoring advice is bad). Today, however, I want to talk about one little chore, one particular little habit, that is just as essential as all of those things, but rarely covered in the CS100 lectures or grading rubrics: logging.

And why am I choosing this particular topic for a blog post today? Simple: It’s to punish an earlier version of myself for not logging enough, for not caring about logging enough. It turns out it’s important. But I’ll get back to the OOP blog series soon enough, don’t worry!

Logging – writing text describing what’s been happening in your program to a file or other storage system – is essential for any software system. Luckily, Rust has a (nearly) standard logging framework, technically outside the standard library but maintained by many of the same people and solidly endorsed by the community: the log crate. But note: Even though this post is written specifically for Rustaceans, much of the advice and commentary in here will apply to logging systems in all programming languages.

Logging is essential for debugging and troubleshooting. When you find a bug, you need to find out which specific part of the program is actually broken out of the many parts, because it’s often not the part that’s visibly acting weird. This is often the first step in addressing a new bug after reproducing it, or even part of figuring out how to reproduce it – or the step before that, so obvious it goes without saying, of noticing that a bug exists.

In fact, logs can be helpful at every stage of the debugging process. You have to confirm your assumptions on what parts are known to work. After all, the whole program is supposed to work, and oftentimes, the thing that’s broken is something that you would’ve assumed definitely worked, until absolutely everything else was ruled out.

Every programmer understands this intuitively, even as a student or a beginning self-taught programmer: When you are developing a project, and it’s not working, the easiest ad hoc debugging technique is “debug print statements,” a go-to technique of CS100 students worldwide. Ironically, CS100 professors often advocate against this in favor of debuggers, in spite of the fact that logging, the grown-up version of debug prints, is more generally useful, as code often exhibits bugs in environments where it didn’t happen to be running in a debugger, like production.

Debug prints work by accomplishing two goals:

Verifying that the program got to the point of that debug print line.
Verifying that the data it has at that point is correct.

Logging is fundamentally debug print statements, but phrased and annotated correctly, so that it looks professional both in the code and in the log, and uses actual logging mechanisms with timestamps and log levels and stuff.

So instead of:

initialize_rainbows();
println!("Got here 2");
initialize_sunshine();
println!("Got here!!!!");

You write the much nicer-looking:

initialize_rainbows();
info!("Rainbows fully initialized");

initialize_sunshine();
info!("Sunshine fully initialized");

When To Log

You should log as much as possible.

Every time you make a decision, you should log it. Every time you query a URL or build a string of some kind, you should log it. Every time you load a config parameter, you should definitely log it. This might seem silly, because you’re duplicating the configuration file, but a bug processing configuration (or prioritizing different sources of configuration) can be especially hard to find.

Logging can be used instead of comments to organize functions into parts. If you feel the need to tell the reader of your code what each part of a function does, perhaps you should tell your poor ops person which parts you’ve reached in the same breath. So instead of:

fn close_out_section(mut self) -> Result<()> {
    // Flush dirty data
    for datum in &mut self.data {
        if datum.is_dirty() {
            datum.flush()?;
        }
    }

    // Close files
    for file in &mut self.files {
        file.flush()?;
        file.close();
    }

    decrease_global_section_count()?;

    Ok(())
}

You could write:

fn close_out_section(mut self) -> Result<()> {
    info!("Closing out section: {}", self.name);

    debug!("Flushing dirty data");
    for datum in &mut self.data {
        if datum.is_dirty() {
            trace!("{} is dirty, flushing...", datum.name);
            datum.flush()?;
        }
    }

    debug!("Closing files");
    for file in &mut self.files {
        trace!("Closing {}", file.name);
        file.flush()?;
        file.close();
    }

    debug!("Decreasing global section count");
    decrease_global_section_count()?;

    debug!("Section successfully closed!");
    Ok(())
}

These log statements serve both as comments to your reader and information to your administrator at the same time! And, since you are writing to someone who is perhaps not looking at the source code, you don’t feel silly adding even more information that’d be obvious to a reader – which is useful also to readers of the source code, who might not share your definition of what is obvious. In spite of what you may have heard, it’s still a good idea to err on the side of explaining things more in comments. (Yes, I linked that post twice. It’s that good.)

You may object that all this logging might slow down your process a little, and I can see wanting to avoid it in the middle of a computational loop. But oftentimes, people avoid logging when there is no possible performance excuse, when much slower I/O is happening all around it, in comparison to which the logging would be a rounding error. Remember that famous Donald Knuth quote: “[P]remature optimization is the root of all evil….”

Log Levels

In addition to performance, you might claim that the amount of logging that I show above is spammy, and that the resulting log files would cause an information overload. But our programming foreparents were wise, and created an additional tool to address both this, and the potential performance problems: log levels.

An error message is different from a warning is different from information is different from debug printing. We want to distinguish these, so we can avoid seeing insufficiently important logs. There are many systems of log levels, and Rust’s log crate endorses a pretty typical list, enumerated in its Level enum:

pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

They form an ordered, descending scale of severity, so that Trace is the least severe. You probably always want to enable Error-level logs (though even they can be turned off) but you probably only want to enable Trace-level logs if you’re doing some serious debugging.

In recognition of how the levels are ordered, log filtering is typically done by setting a level, and then logs of that level or more severe are let through. So if the level is Debug, Warn logs are also outputted, but if it is Error, Warn logs are suppressed. See the LevelFilter enum.
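The filtering rule is simple enough to sketch with nothing but the standard library. The enum below is my own stand-in that mirrors the variant order shown above, not the log crate's actual implementation; is_enabled and render are hypothetical helper names.

```rust
// Declared from most to least severe, so the derived ordering gives
// Error < Warn < Info < Debug < Trace.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// A message passes the filter if it is at the filter's level or more severe.
fn is_enabled(level: Level, filter: Level) -> bool {
    level <= filter
}

/// Formats the message only if it passes the filter.
fn render(filter: Level, level: Level, message: &str) -> Option<String> {
    if is_enabled(level, filter) {
        Some(format!("[{:?}] {}", level, message))
    } else {
        None
    }
}
```

With the filter at Debug, a Warn message gets through; with the filter at Error, the same message is suppressed before any formatting work is done.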

Errors are for problems that stop the process, or at least the specific thing the process was doing (e.g. API or RPC request being serviced). Warnings are for where something seems wrong but we’re going to do it anyway.

Info, debug, and trace are honestly kind of just labels with decreasingly urgent-sounding names, levels for the sake of levels. You should use them according to importance, so that most of the absolute nonsense can get filtered out as mere trace, like implementation details or extra information. You also want the occasional interesting high-level stuff to be captured with info, like what high-level task is the process currently working on. Medium-level tasks can get debug.

In general, the more performance-critical the code, the lower the log level you want to use, to increase the likelihood that you’ll just have a (very predictable) branch to indicate that you don’t need to print that line. Then, if there’s an actual problem, an operator can raise the log level (which they can sometimes do on a per-module basis) when those lines are worth seeing.

As a corollary, configuration should use info and warn heavily, and generally log at higher log levels. Configuration only happens once, and in one section, so it’s allowed to be spammy. Furthermore, raising the log level at run-time won’t help reveal more configuration logs: unless the configuration is re-processed, you’ve just already missed those messages. Finally, configuration is never too latency sensitive for logging – configuration is the least performance sensitive part of your program.

So there is no excuse. Loading different configuration than you thought you had is a shockingly common cause of bugs and confusing system behavior. Log obsessively in your configuration code, at high log levels.

Using the Log Crate in Your Rust Projects

So how do we log in Rust?

log is a framework – in the words of its well-written documentation, it is a “lightweight logging facade.” The front-end is shared: You output logs through the log crate itself. The backend is pluggable, meaning that different backends exist with different features.

As a result, as the documentation says, libraries should just use the log crate, so that when they output logs, it will work with any backend. Applications choose the backends, and import an appropriate crate, like for example env_logger. The log documentation has a list of available backend crates.
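To illustrate the shape of that split without pulling in any crates, here is a std-only sketch of the facade pattern. Everything in it (the Sink trait, the two sinks, the function name) is hypothetical. The real log crate works through a globally installed logger and macros rather than an explicit parameter, so treat this as a picture of the idea, not of its API.

```rust
use std::cell::RefCell;

/// The shared front-end: the only thing "library" code knows about.
trait Sink {
    fn write(&self, line: &str);
}

/// "Library" code: it logs through whatever sink it is handed and
/// never decides where the text ends up.
fn initialize_rainbows(sink: &dyn Sink) {
    sink.write("Rainbows fully initialized");
}

/// One backend an application might choose: print to standard error.
struct StderrSink;

impl Sink for StderrSink {
    fn write(&self, line: &str) {
        eprintln!("{}", line);
    }
}

/// Another backend: keep the lines in memory, e.g. for tests.
struct MemorySink {
    lines: RefCell<Vec<String>>,
}

impl Sink for MemorySink {
    fn write(&self, line: &str) {
        self.lines.borrow_mut().push(line.to_string());
    }
}
```

The library function is written once, and the application picks the backend: initialize_rainbows(&StderrSink) in production, or a MemorySink when it wants to inspect what was logged.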

This split between what crates should be used by libraries as opposed to applications is not uncommon in Rust. For example, it also comes up with error handling, where libraries should generally use thiserror to preserve error information in a way that applications can programmatically investigate, but applications generally want to use anyhow or eyre to ergonomically convey any errors they cannot handle to the user.

Write Everything Down (Part 4): My Desktop Environment

2023-02-28T00:00:00+00:00

I’d like to share with you how I use my computer, in a way that is (for me) ADHD friendly and well-suited for implementing my organization system. Tools are important to any organizational and productivity system, and optimizing your tools for your brain and your workflow is important. My computer is my most important productivity tool, where my work happens, and where my life/chore/errand/calendar organization happens, so it should be an interesting example of an optimized key tool.

Note: I consider this a non-technical post, as it is intended for a general audience. Even though it is about a computer set-up that I’m not recommending to a non-technical audience, this description and explanation of my computer set-up should be accessible enough for everybody. However, it is also literally about computers, so it’s going in the “Computers/Programming Posts” bucket as well – and therefore it will show up under both feeds.

It’s been some time since I’ve written about organization – I had basically paused the series until further inspiration struck. I had even outlined this very post, and considered writing in more detail about my personal computer usage, how my desktop actually looks, and the actual techniques I use to get this machine to work for me for programming, blogging, and planning. The reason I didn’t was basically because I didn’t think it would be interesting enough.

But inspiration did finally strike, in the form of two things that changed my mind and convinced me that there was an audience for this post, two things that happened very close together in time:

I learned that huge numbers of people were excited to hear about how somebody had optimized their arrangement of iPhone app icons on the Cortex podcast. This was a completely standard iPhone, running unmodified, not-even-jailbroken iOS – perhaps the least customizable, least interesting consumer operating system out there. If huge numbers of people were interested in how icons are arranged on iOS, and how that can be optimized for productivity and to match someone’s brain, people will definitely be interested in how I use my computers, which do not even use a normal user interface for Linux and are extremely customized to how I think.
Several friends of mine in rapid succession thought that my computer interface was worthy of comment to me or to others as a way of characterizing me. One friend even said, when I showed her how a few vim commands worked, that she understood why I used this for my organization files.

So I’ll start by taking a screenshot of how my desktop looks right now, literally as I write this, to use as a conversational starting point:

I know I’ve shown some screenshots in my last post, but this time we’re going to discuss it in some more detail.

It looks very … computer-y. Very low-level. Very much as if I’m doing programming, even though I’m actually doing blogging.

It’s not just the presence of the terminal, either, though just using a command line is considered to be advanced or even programmer-level computer usage these days. It’s the whole aesthetic. There are no window decorations on either the left side of the screen where I’m editing my post, or on the right side of the screen where I’m having a command line session – that is to say, no title bar, no minimize-maximize-close icons, no menu bars. Clearly, if I want to save the file I’m working on, I can’t go up to the menu and click File -> Save. And, actually, there don’t seem to be any places designed for clicking at all.

Along the top, instead of a start menu or a system menu or a dock of application launchers, I have a bunch of status information, formatted in such a way so that you have to know what you’re reading to understand it: one number highlighted out of several; the word Tall; jim@palatinate: ~/Writing/TheCodedMessage/conte..., which is the same text as my prompt in the terminal, and indicates who I am, what computer I’m logged into, and what directory I’m currently in (in the currently highlighted terminal). Then, what WiFi I’m connected to, my CPU percentage, memory usage, date and time, and battery status.

There’s not a single icon among these status indicators – it’s just a long line of text. Text that goes well with the text of the blog post I’m editing and the text of the command line. I can see why sometimes friends refer to my computer interface as “not logged into a graphical environment” or “in text mode” or “in command line only mode” – even though that is actually a thing, and is literally not the situation my computer is in.

It’s a modern graphical login session! Here’s me using a web browser if you don’t believe me (Chromium is off-brand Chrome, made from the same source code):

And of course, I can also view videos with VLC or look at pictures with Eye of GNOME (yes, I can use GNOME components even though I don’t use the GNOME desktop environment), and in literal text mode, that wouldn’t be possible.

But I understand why people call my set-up text mode, and now that I’m paying attention, I see that in a very literal sense, there aren’t any images or icons at all on my screen right now, just text in various colors. That is an intentional choice, and how I like it, and it does have to do with me being a programmer (in at least being aware of my options and capable of configuring it), so fair.

So what is going on? Why does my computer look so text-y, even if it’s not technically text-mode?

xmonad

To be clear, my set-up is not typical of how Linux computers normally look. On Linux, you get your choice of desktop interface, of what software draws things like window borders and docks and start menus. Usually, people use ones like GNOME or KDE (or dozens of others), which look much more like macOS or Windows, with a normal amount of icons, and sometimes even futuristic, overly dynamic graphics. Here’s a screenshot of KDE from Wikimedia Commons to demonstrate:

But I instead chose xmonad, which is designed for things like minimalism, deep configurability, and keyboard control – and in general designed almost exactly for my priorities. My XMonad set-up is not that weird, for an XMonad set-up. Like any XMonad set-up, however, it is deeply customized to my particular workflow.

But before we get into my customizations and use of it, I’d like to talk a bit about why I prefer XMonad to other, more traditional desktop environments. It’s not to be weird or to show off my technical skills or even to communicate that I’m a programmer and a nerd – I actually don’t very much like that people think I’m programming when I’m actually working on a writing project, nor do I like that other people find borrowing my laptop intimidating. Instead, it’s about adapting to what I feel comfortable with, and what works well with how my brain works.

So the lack of distractions, the lack of icons, is actually very important to helping me focus, as is the simplicity of the interface. My ADHD doesn’t manifest by having my eyes be regularly pulled away to where the icons are because they’re pretty – or at least, if it does I’m not aware of it. But if there is a dock of icons on the screen, my awareness that the dock is there can be a distraction to me, taking up precious space in my brain of very limited short-term memory that could be better served juggling the other things going on in the computer. This distraction even happens on macOS, even when the dock is hidden – I have to be aware of it so I know not to move my mouse to the bottom of the screen, or that if I do, I will suddenly see icons.

The title bars that typically line the top of windows are such a distraction, as are the menu bars (with File, Edit, etc.) that give you a list of things to do. If I were designing an operating system UI from scratch – which I have often fantasized about – the menu would show up as an overlay on top of the window when you pressed the [ALT] key, and a list of available keyboard shortcuts would show up when you pressed and released [CTRL], reminding you that paste, for example, is Ctrl-V.

Back in real life, I also don’t have menu bars on my machine for my most commonly used apps. But the replacement, unfortunately, isn’t an overlay. It is simply knowing the relevant commands for both gvim and the terminal: literal commands as well as keyboard and mouse gestures, like Ctrl-D to log out or middle-click to paste the last thing you highlighted. I find Ctrl-C/Ctrl-V too tedious and prefer copy-and-paste through the second clipboard that Linux supports (X11 calls it the “primary selection”): highlight, then middle mouse click, or three fingers on my laptop trackpad.

The streamlined simplicity allows me to just see the text of the actual app I’m using. It reminds me of math textbooks. I prefer math textbooks that just are about math. I saw a math textbook for the high school level once that was full of pictures of youths doing math, very visually busy, lots of stuff going on. I thought to myself, I don’t know how long I could read this book, not because I would jump from thing to thing, but because I would try to extract the actual math out of it, and filtering out the rest would be well-nigh impossible, and quite fatiguing.

Thus, xmonad lets me choose exactly what goes on the screen. Even xmobar, the system status bar across the top with all that status information is optional – you can make it so that it appears and disappears based on a keyboard shortcut, or leave it out altogether. And certainly, no panel of icons – if I want to start a program, I have a keyboard combination to start the terminal, another to start the browser, and another to type in the name of a program I want to run (which I could also do, of course, from the terminal). The iOS equivalent would be to have one icon for Safari, and besides that to literally always use search to find your app, with no icons visible and an empty home screen.

One thing that I like about iOS, however, is also true of xmonad: when you start a program it takes up the entire screen. For the life of me, I don’t understand what I ever saw in having different windows that could overlap on your desktop. What were you doing with the empty space? Why was it so essential to be able to arrange the screen any way with enough work? Isn’t it more important to be able to have the screen in the configuration you want consistently?

In macOS, if I want a window to be full screen, that’s easy enough – but it’s still not the default, even if it’s the only window. However, if I want multiple windows to be tiled, then I have to do so many steps. The cost of the flexibility of freely moving window arrangements around is that the one I do want is harder.

In xmonad, when I open a window, it takes up the whole screen. If I open a second window, they split the screen. I can use key combinations to adjust which one is on which side, or to switch from left-right tiling to top-bottom tiling, or to move the dividing bar left or right, but most of the time I can just immediately use it.

I can also use a key combination (⌘-TAB – it would be ALT, but I have ⌘ generally configured to replace ALT) to switch which window is focused, but I usually use the mouse for that. I have focus-follows-mouse enabled, so I don’t actually have to click the mouse before I can start typing in the newly-focused window.

If I open a third window, then, it works perfectly how I like it: arranged so I can see all three:

More than three windows is similar to three – but I don’t let that happen normally. I stick to three windows per screen, or specifically, per virtual desktop.
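
For the programmers in the audience, the placement rule I just described is simple enough to sketch in a few lines. This is not xmonad’s actual code (xmonad is written in Haskell); it is a minimal Python illustration of the behavior, assuming an even 50/50 split between the tall pane and the rest. The names `Rect` and `tall_layout` are made up for this sketch.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """A rectangle on the screen, in pixels."""
    x: int
    y: int
    width: int
    height: int


def tall_layout(n_windows: int, screen: Rect, ratio: float = 0.5) -> List[Rect]:
    """Give the first window a full-height pane and stack the rest beside it.

    `ratio` is the share of the screen width given to the tall pane;
    0.5 is an assumption for this sketch, not a measured value.
    """
    if n_windows <= 0:
        return []
    if n_windows == 1:
        return [screen]  # a lone window gets the whole screen

    main_width = int(screen.width * ratio)
    rects = [Rect(screen.x, screen.y, main_width, screen.height)]

    # Every other window shares the remaining column, split evenly by height.
    stack_count = n_windows - 1
    stack_x = screen.x + main_width
    stack_width = screen.width - main_width
    for i in range(stack_count):
        top = screen.y + screen.height * i // stack_count
        bottom = screen.y + screen.height * (i + 1) // stack_count
        rects.append(Rect(stack_x, top, stack_width, bottom - top))
    return rects
```

Calling `tall_layout(3, Rect(0, 0, 1920, 1080))` gives one tall pane on the left and two half-height panes on the right. The point is that the arrangement is a pure function of how many windows there are, which is why I never have to arrange anything by hand.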

Virtual Desktops

Virtual desktops are a key component of how I use my computer. macOS has the feature as well, described by the less techie-sounding name of spaces (and it appears that in that context, it’s also pretty easy to set up split-screening, which is good news). Virtual desktops are like having multiple full-screen windows that you switch between, except that each virtual desktop can have multiple windows on it. In my context, it means I never have more than three windows on a screen at a time, but I have multiple sets of three windows that go together that I can switch between, in my case indexed by number.

If I want to go to virtual desktop 1, I press ⌘-1 (where ⌘ is the command or logo key, a Windows™ logo on my keyboard even though I bought this computer from Dell™ with Linux™ pre-installed). To go to virtual desktop 3, I press ⌘-3. The currently available virtual desktops are shown on my status bar, with the currently showing one highlighted in yellow – if they weren’t, I would probably have forgotten about windows left in other virtual desktops when I first started using them. In the screenshot above of three windows, you can see that I am working in desktop 4. There are also windows on desktops 1, 2, and 5, but none on desktop 3, which is why there is no 3 shown. They go up to 9, or at least 9 that are accessible by that keyboard short-cut in my current configuration.

If I want to move a window from one virtual desktop to another, I just need to type ⌘-Shift-N while hovering my mouse over the window, where N is the desktop I want to move it to. Sometimes, the windows come out in the wrong arrangement on the new desktop, but I can use ⌘-Enter to switch them.

Virtual desktops are key to my workflow and my focus, because each one corresponds to a mode of using my computer, a type of action. I can switch between them, but while I’m within one, the only indication that others are available is up in the status bar.

I use specific virtual desktops for specific tasks on a permanent basis. When I have not recently been doing the task, there might be no windows in them, but when I want to do that task, I switch to that virtual desktop and start windows there. This keeps information about what is where in my long-term memory, as a fact about how my system works, rather than in my prospective memory, which as I’ve discussed is far more problematic.

To be specific, this is what I use each virtual desktop for:

Desktop 1: Browsing

Desktop 1 is a full-screen browser session. If you look at my web screenshot (also displayed above), you see that I am on desktop 1.

This is the only place I put a web browser window; you don’t need more than one because of tabs. I will occasionally also move a terminal or editor window to this desktop, if I need to type something into the terminal directly from a web browser, or manually retype text based off of what I’m reading there, but this is rare. Similarly, I will occasionally split-screen two web browser windows for the same reason – but only for as long as I need to see both pages at once.

I don’t use tabs as heavily as some people. I don’t relate to the ADHD person with hundreds of tabs open. I generally have Slack, e-mail, and then whatever exact thing I’m using the web browser for. If this is programming, and I’m reading documentation or troubleshooting an issue, that might be multiple tabs deep (e.g. of different but related documentation, or of documentation and source). And occasionally I’ll absent-mindedly find myself going on a tangent. But besides Slack, and sometimes e-mail, I close the relevant tabs as soon as I’m done doing the task – tabs are transient.

When I do read documentation from the web to write code, I do fully switch virtual desktops as I go between writing the code and reading the documentation.

I don’t like that this is how I access my e-mail. I would prefer to have it set up with a TUI-based system, while still syncing with the GMail app on my phone. I know I can do that – I’ve done it before – but I simply haven’t gotten around to it.

One final note: To help me maintain focus, I do have a blacklist of websites I don’t let myself go to, implemented through /etc/hosts. This doesn’t actually restrict me, because I can always go to those websites on my “unproductive” computer (mostly for Netflix), or on my phone. The blacklist does, however, prevent me from going off the rails and drifting into a Reddit rabbit-hole when I’m supposed to be working. I can always unblock a website if I (temporarily or permanently) do need to access it from one of my primary computers.

Here’s the blacklist: all the domain names that my computer resolves to localhost, my local computer, rather than to the actual IP address of the website’s server. These are the websites the browser will therefore fail to connect to:

127.0.0.1       facebook.com
127.0.0.1       www.facebook.com
127.0.0.1       quora.com
127.0.0.1       www.quora.com
127.0.0.1       twitter.com
127.0.0.1       www.twitter.com
127.0.0.1       news.google.com
127.0.0.1       etrade.com
127.0.0.1       us.etrade.com
127.0.0.1       www.etrade.com
127.0.0.1       reddit.com
127.0.0.1       www.reddit.com
127.0.0.1       news.ycombinator.com
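
If you’re curious why a list like this works, here is a rough Python sketch of the lookup. It is only an illustration of the idea, not how the operating system’s resolver is actually implemented; `HOSTS_TEXT` is a shortened sample of the list above, and `parse_hosts` and `is_blocked` are names I made up for the example.

```python
# Shortened sample of the blacklist above, in hosts-file format.
HOSTS_TEXT = """\
127.0.0.1       reddit.com
127.0.0.1       www.reddit.com
127.0.0.1       news.ycombinator.com
"""

# Addresses that mean "this machine" rather than a remote server.
LOOPBACK = ("127.0.0.1", "::1")


def parse_hosts(text):
    """Map each hostname in hosts-file text to the address on its line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry wins
    return table


def is_blocked(hostname, table):
    """True if the table points this exact name at the local machine."""
    return table.get(hostname.lower()) in LOOPBACK
```

Note that the match is on the exact name: hosts files have no wildcards, which is why the list spells out both `reddit.com` and `www.reddit.com`. In this sketch, `is_blocked("old.reddit.com", parse_hosts(HOSTS_TEXT))` is false, because that name is not listed.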

Desktop 2: Coding (Primary)

This is where I look at and edit files in the repo and project that I’m currently working on. I have a terminal open to the project directory, and normally two gvim windows – gvim is my preferred text editor – open to files within that project. The large full-height space is for the file I’m editing, the smaller space above the terminal for a file I’m referring to, but within the same project. If I want to edit the other file instead, I switch them so that gvim window is the new tall one – there’s a keyboard shortcut for that. The terminal stays on the right, and in the case of multiple windows on the right, the terminal stays as the lowest.

I continuously open and close new gvim windows, which is part of why I use gvim – it loads fast enough for this to be a viable strategy.

Desktop 3: Coding (Secondary)

Sometimes, when you’re working on a project, you need to know how something’s done in a different project. Perhaps you need to know an implementation detail of a function you’re calling, or maybe just the interface. Perhaps you know the other project did the thing you’re trying to do, and you need to see how they did it. Perhaps you suddenly realized you can’t have X dependency, and now you need to know if Y depends on X.

Sometimes this is a different internal project, sometimes it’s an open source project you need to download off GitHub. But it’s a different repo, with a working copy in a different directory, and that means that I have a different virtual desktop for it, with a different terminal in that directory.

There, I mostly do reading, but I can also do editing in a pinch. For example, if I need to make a change that straddles two repos, the application (for example) will often be in desktop 2 and the library in desktop 3. If it straddles 3 repos, I either switch which repo desktop 3 is used for (it is only used for one at a time) or I spill over to desktop 4 as a tertiary coding repo, as a non-standard use of that desktop. I usually feel vaguely uncomfortable when I do that, though.

Desktop 4: Blogging

I’m on desktop 4 right now as I’m writing this, because that is the desktop I use for blogging – and most other forms of prose writing (though specifically not documentation for work, which counts normally as part of a coding project).

I blog just as I program. I use gvim to edit text files. I use a terminal to open the right text files, list which text files are present, keep git up-to-date with what I’m working on, and build and deploy my blog. In this case, by “build,” I mean translate it from a directory full of Markdown files into a website, which I then upload to my server.

Here’s a screenshot of me editing the markdown for this post in the left window, and trying and failing to run my build-and-upload script in the other folder (which refused to upload as I hadn’t synchronized my files with GitHub yet):

I prefer editing my blog as a bunch of plain text files on my computer. It gives me a sense of control that I would not get if I installed Wordpress on my server – or used the official Wordpress. It allows me to use gvim to edit them as plain text, which I prefer to WYSIWYG editing.

Generally, I’m only working on one file and so I have a terminal window and single solitary gvim window, rather than two or three gvim windows. It only makes sense to work on one file at a time in writing normally, unlike in programming where there’s intricate mutual references. Occasionally, for a blog series like this, I will open a previous part of the blog series to see how much I’m repeating myself.

Desktop 5: Organization

You might notice, however, that in none of the other desktops do I describe having any of my organizational files open. I have detailed organizational files, which I edit in gvim, and discussed in detail in the previous post. And as you can see in the sample screenshot from that post (reposted here), this organizational system lives entirely on desktop 5:

I do not have the complete list of things I have to do hanging over me while I’m doing each thing, only when I’m planning. Instead, when I reach the end of whatever I’m working on – or, as often happens, when I find I’ve generated a new TODO item that I want to write down but not yet fully switch my focus to – then I switch to virtual desktop 5 to interact with my TODO system. When I switch back, with a new task or with the idea safely written down, I can then (more) fully focus on my task without worrying about other ones.

Desktop 6: Signal

When I run the desktop version of Signal, which I do sometimes, it runs on its own virtual desktop, namely desktop 6.

Desktop 7: Long-running processes

This is where I put VPN sessions, if they’re tied to a terminal window. It’s also where I put some very long-running builds or locally hosted servers.

Editor and Terminal

I’ve already discussed in a previous section how I use the web browser. Occasionally, I use a variety of other random graphical programs: an image viewer, a PDF viewer, or a video player. But most often, the two types of windows I have open besides the web browser are gvim, a text editor, and alacritty, a terminal emulator.

Both of these tools are primarily used by computer professionals of some stripe, so it’s a little unfair of me to bristle when people see them – also without any icons on the screen – and assume I am a programmer. I do have specific reasons for using them for non-programming tasks, that match my habits well, so I’d like to discuss them further.

Both of them are tools that require substantial investment in skill. Obviously, to use a terminal, you have to know commands. You can’t discover the interface like you can with a series of menus, or settings pages, or icons. Similarly but less obviously, gvim, like any version of vim, is close to useless to anyone who doesn’t know it. Both of them require reading documentation in the form of a book (or website) to explain to you what to do and at least get you started.

But I did all of that investment years ago, as a youth, and it’s been paying off ever since – to the point where if I try to edit text, or navigate file systems, without these tools, I feel substantially hindered.

Vim

I start with gvim because it’s the more relevant to my organizational particularities. It’s a text editor, which means that unlike something like Google Docs or Microsoft Word, it edits plain text files, files that just have sequences of characters organized into a sequence of lines. Characters can include Unicode – including accented letters, Chinese characters and emojis – but not styling like bold and italics.

Text editors are important to programmers because programming is done via collections of plain text files, and so text editors are universally useful tools for handling all of them. Rather than each programming language having its own special file format requiring its own special editor, text files allow programmers to bring their preferred text editors with them to a variety of projects, thus allowing a deeper investment in the skill of using the text editor.

Even this blog, which is not a programming project but a writing project, is maintained using text files, using Markdown, a format which interprets *italics* as italics and **bold** as bold, and Hugo, a software package that converts a hierarchy of Markdown-formatted plain text files appropriately into a website. And for Markdown, just as for any programming language, I can choose any text editor I want to, and it will be compatible.

This choice, the choice of text editor, can be greatly personal to a programmer. The rivalry between two major text editors from earlier eras of Unix, vi and emacs, was often referred to as a holy war for how intense the fights about it would get on Usenet (an old discussion forum that ran on an old pre-Internet network). gvim, which is the text editor I use, is a form of vim, which is a form of vi, so I have a definite position in that holy war. And I’m sure I’m going to hear from people who disagree with my position in response to this blog post!

While my gvim window looks like a terminal window – and vim can indeed run inside of a terminal – it’s actually a separate graphical application. That is what the initial g stands for, “graphical.” When I edit a file, I want a new window to be opened, and I also want to be able to use the mouse to click on a location on the screen and move my cursor there.

vim, like many of the tools I use, is optimized for expert use, rather than discoverability by beginners. It’s designed to be a skill to be invested in: I put in the effort to learn how to use it a long time ago, and it pays off over a lifetime. The commands I can make from my keyboard are more powerful than most computer text editing facilities can support, allowing me to perform complex manipulations of the text with a few keystrokes.

This is essential, in my mind, for efficient programming, which is why I put the effort in to learn it. However, it is also particularly well-suited to my organizational files, which, if you remember from my previous post, consist of plain text files with lots of highly-nested bulleted lists, like this outline for this section of the post:

* gvim
    * Text editor
        * Plain text and website generation
    * vim
        * But not terminal vim
        * Still has separate "window"
        * And can use mouse if necessary
    * Line-based editing good for organization
        * Commands work on lines
            * Delete
            * Paste last delete
            * Select multiple
            * Shift indentation level
        * Org-mode style use of hierarchical bullet points
            * Perfect match for those commands
            * No notes longer than a line
                * Make it more hierarchical instead

When I edit plain text files in this format – a custom habit inspired by Org mode but still compatible with Markdown – it’s important for me to be able to operate on the scale of entire lines. And operating on entire lines is one of vim’s strongest points! dd to remove a line, p to insert the line back in, and relevantly for hierarchical bullet points, << and >> to change indentation! Using V, I can select multiple lines, and then use <, >, or d to change indentation or move them! Meanwhile, j and k, right on the home row, move down and up through the file, line by line, respectively.

This equates to removing tasks (when they’re done or no longer wanted), moving tasks between different places in the hierarchy (which I do shockingly often), removing or adding levels of hierarchy, and other such common operations on a hierarchical list.
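
To make the connection concrete, here is a small Python model of what those four commands do to a list of lines. It is only a sketch of the effect, not of how vim works internally. The function names are invented for the example, and it assumes four spaces per indentation level, as in my outline above.

```python
INDENT = "    "  # assumed shift width: four spaces per level


def shift_right(lines, start, count=1):
    """Like `>>` on `count` lines: push them one level deeper."""
    return [
        INDENT + line if start <= i < start + count and line else line
        for i, line in enumerate(lines)
    ]


def shift_left(lines, start, count=1):
    """Like `<<`: pull lines one level shallower, if they are indented."""
    return [
        line[len(INDENT):]
        if start <= i < start + count and line.startswith(INDENT)
        else line
        for i, line in enumerate(lines)
    ]


def delete_lines(lines, start, count=1):
    """Like `dd`: remove whole lines and keep them for a later paste."""
    register = lines[start:start + count]
    return lines[:start] + lines[start + count:], register


def put_below(lines, index, register):
    """Like `p` after a line-wise delete: insert the saved lines below `index`."""
    return lines[:index + 1] + register + lines[index + 1:]


# A made-up TODO list: move "Buy milk" from "Chores" to sit under "Blog".
todo = ["* Blog", "    * Write draft", "* Chores", "    * Buy milk"]
rest, register = delete_lines(todo, 3)
todo = put_below(rest, 1, register)
```

In vim each of those functions is two keystrokes or fewer, which is the whole appeal: reshuffling a task hierarchy costs about as much effort as thinking of the change.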

Now, you may wonder, if typing dd deletes a line, how I type a literal dd. Well, dd deletes a line in normal mode, but if you type o, it opens up a new line in insert mode, so that your letters are interpreted as letters again – until you are done inserting what you had to insert, and hit [ESC] to return to normal mode.

One of the ways you can tell you’re a proficient vim user is if you keep the system in normal mode any time you are not literally typing. Typing tends to be bursty anyway, and evenly interspersed with editing and navigating – at least in programming, and in my use case, also with writing.

But it is hard for a newbie. Every once in a while, even I find myself inserting an editing command as text by accident, or running random commands trying to type text while I’m actually in normal mode. When you’re new to vim this happens all the time. It’s decidedly not beginner-friendly.

But most of your time at a text editor – especially if you’re a programmer – you won’t be a beginner. And for me, I’m extremely used to it – and frustrated when I have to write text into a non-vim interface like Google Docs, or an especially long Slack message. On top of that, I revise existing text at least as often as I type new text – I need those commands, and the ones I listed are only a brief sample.

Terminal/Command Line

This is probably the most interesting thing to many of my readers. Many readers my age or older remember DOS and the DOS prompt, and having to use the computer from the command line. For some of them, the only commands they knew were those to launch their games, or to launch other tools from which they would do their real work – the command line was fundamentally just a launcher, a menu, albeit one that didn’t list the options. Others may have simply used it to launch Microsoft Windows, by typing the win command, a usage pattern so common that Microsoft made it the premise of Windows 95, and skipped the whole “DOS” step, even though it was still present as a weird operating system layer and as a boot stage until Windows XP finally rolled out a modern Windows.

So I have some misconceptions to address about the command line that come from that perspective.

First, a modern command line is not DOS in a window. It’s certainly not on Linux or macOS, where it’s more visibly different, but it isn’t even DOS in modern Windows. The Windows command line might look like the DOS command line, with its famous prompt C:\>, but it is a modern Windows application that is used to launch modern Windows applications. No DOS involved, just a different interface mode.

On a related note, the command line, even on Windows but especially on macOS or Linux, is a modern user interface. It can do things that involve the Internet. It can make web requests, download and send e-mail, synchronize files, and do things that DOS couldn’t do.

However, on the flip side, it is not true that the command line can do everything a graphical user interface can do. It’s comparable, but it’s simply not identical, as should be obvious if you realize that it’s impossible to watch a video from the command line. You can use the command line to launch a video player, but the video player remains graphical.

And while it is true that the command line allows you more control over the operating system settings and file system, this is more an accident of graphical user interfaces trying to be “user friendly” or having limited room for options, rather than anything intrinsic. You may have heard of graphical user interfaces described as a layer or façade on top of the “underlying” command line, but that is a misconception. Graphical programs and command line programs have the same access to operating system facilities, except for user interface.

The command line does, however, have a more power user-friendly aesthetic. Like vim, it requires investment to use effectively – to use at all. And it is closer to the operating system in that by convention, it exposes as much control of it as possible, and its conventions were established in the 70s, before the modern concept of user-friendliness was really invented. This has been written about at length in many places, and one of my favorite (book-length) essays about it is Neal Stephenson’s “In the Beginning was the Command Line”.

Enough about what the command line is (and isn’t)! What do I actually use the command line for, then?

Well, the command line is an entire interface into the computer, used by many programs and utilities as the way to interact with them. And I do use it for basically all of the things I do on the computer that aren’t web browsing, text editing, or viewing various graphical-only files (like PDFs, images, or videos), and there’s some variety there.

Primarily, I use the command line for file management. I use the classic Unix tools for listing files (ls for list) and navigating directory hierarchies (cd for change directory). I use git to sync code and writing across computers and make sure it’s backed up somewhere. I use wc (for word count) to see how many lines of code or words of writing I’ve written. I use bc (basic calculator) to do back-of-the-envelope math.

I prefer this to graphical file managers. Not only do I not trust them – I’ve seen Finder crash relatively recently – they change all the time. And the changes are not good, and usually serve to hide the actual directory hierarchy and instead impose an organizational system on you. Instead of seeing directories inside your home directory, you see stuff like “Music” and “Downloads,” “Documents” and “Movies.”

Usually, when I use a graphical file manager, I know where the directory is in the file system, but then I have to translate it to their list of commonly used directories, which assumes I keep loads of movies and photos on my computer, but can have all my “documents,” whether legal documents or writing projects, in one directory? Where is my home directory? What if I want to organize my files in a different hierarchy? Can I just navigate to it from my home directory, please? If you want to put fancy icons on subdirectories of my home directory based on their names, that’s fine, but please list all the directories within my home directory, thank you very much! Not just the pre-defined things you think I ought to have, like “Music” – this is my work computer, I listen to music on my phone.

So you can see why I prefer the straight-forwardness of the command line to Finder or Windows Explorer.

I also use the command line to actually do writing and programming work, not just launching gvim – once I’ve navigated to the file in my complicated directory system – but also running compilers and build scripts to turn program source code into programs, and then running those programs, almost all of which can be controlled entirely from the command line. I log into other computers I maintain, both embedded devices and servers, and do work on them. I run scripts that run hugo to turn my Markdown files into a website and post it on a server.

I also use it for system administration: apt for installing packages (I use Ubuntu – I’m not trying to be a hero of sysadmin) and systemctl and all of those gnarly commands for other sysadmin stuff. But of course, the most powerful system administration command is just the text editor – by editing configuration files, you can accomplish a lot.

All of this is easier and more focused than if I were using the graphical equivalent. I write my command and I run it, without having to go through all the tedious boring steps of a GUI wizard. It’s faster with fewer steps, with the penalty of accumulated life expertise – which is to say it’s easy on my prospective memory at the expense of my retrospective memory, which is to say, aligned to how my brain works.

And yes, I do occasionally have to look up how to do things – though that’s more in programming than in writing. But having a graphical user interface doesn’t save you from that, and if you think it does, you’re fooling yourself. At least when I look up how to do things, I get suggestions for commands I can directly type in, rather than having to go through 10 screens and dialog boxes and search them for whatever it is the poster’s talking about, only to find out I’m using a different version of the GUI, and that the directions became obsolete in the 2022 edition of Windows 10, or some other such thing.

To reiterate, it turns out that a deep enough hierarchy of dialog boxes and settings pages is just as complicated as the command line – but usually less powerful, harder to document, and more subject to arbitrary change. Just give me the command line!

Conclusion

If I were to summarize some themes of my user interface decisions, it would be in these three inter-related points:

Don’t use condescending, corporatist concepts of “easy to use,” because they’re more focused on the appearance of ease of use, or most charitably stated, not intimidating the user, rather than actually making it usable for an expert user for a wide variety of actual tasks.
Use systems that emphasize the long term power user over the short term newbie. They will often have a learning curve, but it will pay off.
Use systems that are customizable, so that I can use them my way.

But this is all for my work computer, where work is both writing/blogging and programming. For goofing off, I have a MacBook Air M1, which I use in macOS as a glorified tablet, and that is perfectly fine for watching Netflix and YouTube.

Rust Is Beyond Object-Oriented, Part 2: Polymorphism

2023-02-07T00:00:00+00:00

In this post, I continue my series on how Rust differs from the traditional object-oriented programming paradigm by discussing the second of the three traditional pillars of OOP: polymorphism.

Polymorphism is an especially big topic in object-oriented programming, perhaps the most important of its three pillars. Several books could be (and have been) written on what polymorphism is, how various programming languages have implemented it (both within the OOP world and outside of it – yes, polymorphism exists outside of OOP), how to use it effectively, and when not to use it. Books could be written on how to use the Rust version of it alone.

Unfortunately this is just a blog post, so I cannot cover polymorphism in as much detail or variety as I want to. I shall instead focus specifically on how Rust differs from the OOP conceptualization. I will start by describing how it works in OOP, and then discuss how to accomplish the same goals in Rust.

In OOP, polymorphism is everything. It tries to take all decision-making (or as much decision-making as possible) and unite it in a common narrow mechanism: run-time polymorphism. But unfortunately, it’s not just any run-time polymorphism, but a specific, narrow form of run-time polymorphism, constrained by OOP philosophy and by details of how the implementations typically work:

It requires indirection: Every object must typically be stored on the heap for run-time polymorphism to work, as the different “run-time types” have different sizes. This encourages the aliasing of mutable objects. Not only that, but to actually call a method, it must go through three layers of indirection: dereferencing the object reference, then dereferencing the class pointer or “vtable” pointer, and then doing an indirect function call.
It precludes optimization: Beyond the intrinsic cost of an indirect function call, the fact that the call is indirect means that inlining is impossible. Often, the polymorphic methods are small or even trivial, such as returning a constant, setting a field, or re-arranging the parameters and calling another method, so inlining would be useful. Inlining is also important to allow optimizations to cross the inlining boundary.
It is polymorphic in one parameter only: The special receiver parameter, called self or this, is the only parameter through which run-time polymorphism is typically possible. Polymorphism on other parameters can be simulated with helper methods in those types, which is awkward, and return-type polymorphism is impossible.
Each value is independently polymorphic: In run-time polymorphism, there is often no way to say that all the elements of a collection are of some type T that all implement the same interface, or to say that two parameters to a function are the same type but what that type is should be determined at run-time.
It is entangled with other OOP features: In C++, runtime polymorphism is tightly coupled with inheritance. In many OOP programming languages, it is only available for class types, which as I discussed in my previous post are a constrained form of modules.

I could write an entire blog post about each of these constraints – perhaps I will someday.

But in spite of all these constraints, it is seen as the preferred way of doing decision-making in OOP languages, and as especially intuitive and accessible. Programmers are trained to reach for this tool whenever feasible, whether or not it is the best tool for the decision at hand, even if there is no current need for it to be a run-time decision. Some programming languages, such as Smalltalk, even collapsed “if-then” logic and loops into this one oddly specific decision-making structure, implementing them via polymorphic methods like ifTrue:ifFalse: that would be implemented differently in the True and False classes (and therefore on the true and false objects).

To be clear, having a mechanism of vtable-based runtime polymorphism isn’t a bad thing per se – Rust even has one (similar, but not quite identical, to the OOP version described above). But the Rust version is used in the relatively rare situations where that mechanism is the best fit, among a whole palette of mechanisms. In OOP, the elevation of this tightly constrained and unperformant form of decision making above all others, and the philosophical assertion that using it is the best way and most intuitive way to express program flow and business logic, is a problem.

It turns out that programming is much more ergonomic when you choose the tool most appropriate for the situation at hand – and OOP run-time polymorphism is only occasionally the actual tool for the jobs it is often asked to do.

So let’s look at 4 alternatives in Rust that can be used when OOP uses run-time polymorphism.

Alternative #0: `enum`

Not only are there other forms of polymorphism that have strictly fewer constraints (such as Haskell’s typeclasses) or a different set of trade-offs (such as Rust’s traits, heavily based on Haskell typeclasses), there is another decision-making system in Rust and Haskell, namely algebraic data types (ADTs), or sum types, that also takes over many of the applications of OOP-style polymorphism.

In Rust, these are known as enums. enums in many programming languages are lists of constants to be stored in integer-sized types, sometimes implemented in a typesafe fashion (like in Java), sometimes not (like in C), sometimes with either option available (like in C++ with the distinction between enum and enum class).

Rust enums support this familiar use case, with type-safety:

pub enum Visibility {
    Visible,
    Invisible,
}

But they also support additional fields associated with each option, creating what in type theory is known as a “sum type,” better known among C or C++ programmers as a “tagged union” – the difference being that in Rust, the compiler is aware of and enforces the tag. Here are some examples of enum declarations:

pub enum UserId {
    Username(String),
    Anonymous(IpAddress),
    // ^^ This isn't supposed to be a real network type,
    // just an example.
}

let user1 = UserId::Username("foo".to_string());
let user2 = UserId::Anonymous(parse_ip("127.0.0.1")?);

pub enum HostIdentifier {
    Dns(DomainName),
    Ipv4Addr(Ipv4Addr),
    Ipv6Addr(Ipv6Addr),
}

pub enum Location {
    Nowhere,
    Address(Address),
    Coordinates {
        lat: f64,
        long: f64,
    }
}

let loc1 = Location::Nowhere;
let loc2 = Location::Coordinates {
    lat: 80.0,
    long: 40.0,
};

What do these tagged unions have to do with polymorphism, you may ask? Well, most OOP languages don’t have good syntax for these sum types, but they do have powerful mechanisms for run-time polymorphism, and so you’ll see run-time polymorphism used for situations where Rust enums would actually be just as well-suited (and I will argue, better suited): when there’s a few options for how to store a value, but those options contain different details.

For example, here’s one way to represent the UserId type in Java using inheritance and run-time polymorphism – how I would’ve done it when I was a student (putting each class in a different file):

class UserId {
}

class Username extends UserId {
    private String username;
    public Username(String username) {
        this.username = username;
    }

    // ... getters, setters, etc.
}

class AnonymousUser extends UserId {
    private Ipv4Address ipAddress;
    
    // ... constructor, getters, setters, etc.
}

UserId user1 = new Username("foo");
UserId user2 = new AnonymousUser(new Ipv4Address("127.0.0.1"));

Importantly, just as in the enum example, we can put user1 and user2 in variables of the same type, and can pass them to the same kinds of functions, and in general do the same operations on them.

Now, these OOP-style classes look super-light to the point of being silly, but that’s mostly because we haven’t added any real operational code to this situation – just data and structure and a bit of variable definitions and boilerplate. Let’s consider what happens if we actually do anything with user IDs.

For example, we might want to determine whether they’re an administrator. In our hypothetical, let’s say anonymous users are never administrators, and users with usernames are only administrators if the username begins with the string admin_.

The doctrinally approved object-oriented way of doing that is to add a method, e.g. isAdministrator. In order for this method to work, we have to add it to all three classes, the base class and the two child classes:

abstract class UserId {
    // ...
    public abstract boolean isAdministrator();
}

class Username extends UserId {
    // ...
    @Override
    public boolean isAdministrator() {
        return username.startsWith("admin_");
    }
}

class AnonymousUser extends UserId {
    // ...
    @Override
    public boolean isAdministrator() {
        return false;
    }
}

So, in order to add this simple operation, this simple capability to this type in Java, we have to go to three classes, which will be stored in three files. Each of them contains a method that does something simple, but nowhere can the entire logic be seen of who is and isn’t an administrator – something that someone might naturally ask.

Rust would use match for such an operation, putting all the information about it in one place:

fn is_administrator(user: &UserId) -> bool {
    match user {
        UserId::Username(name) => name.starts_with("admin_"),
        UserId::Anonymous(_) => false,
    }
}

This yields a more complicated individual function, but it has all the logic explicitly right there. Having the logic be explicit, instead of implicit in an inheritance hierarchy, cuts against an OOP precept where methods should be simple and polymorphism used to express the logic implicitly. But that doesn’t help guarantee anything, just sweeps it under the rug: It turns out that hiding the complexity makes it harder to grapple with, not easier.
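
To make that concrete, here is a minimal, self-contained sketch of the same function. It assumes std::net::IpAddr as a stand-in for the placeholder IpAddress type, which is my substitution for illustration only. One thing worth noticing: match is exhaustive, so if a third variant were ever added to UserId, this function would stop compiling until it handled the new case.

```rust
use std::net::IpAddr;

// Stand-in for the UserId enum above; IpAddr from the standard library
// replaces the placeholder IpAddress type (an assumption of this sketch).
pub enum UserId {
    Username(String),
    Anonymous(IpAddr),
}

// All of the "who is an administrator?" logic is visible in one place.
pub fn is_administrator(user: &UserId) -> bool {
    match user {
        // Named users are administrators only with the admin_ prefix.
        UserId::Username(name) => name.starts_with("admin_"),
        // Anonymous users are never administrators.
        UserId::Anonymous(_) => false,
    }
}
```

Calling is_administrator on a UserId::Username("admin_alice") gives true, while an ordinary username or any anonymous user gives false.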

Let’s go through another example. We’ve had this UserId code for a while, and you’re tasked with writing a new web front-end for this system. You need some way of displaying the user information in HTML, either a link to a user profile (in the case of a named user) or a stringification of the IP address in red (in the case of an anonymous user). So you decide to add a new operation for this small family of types, toHTML, which outputs your new front-end’s specialized DOM type. (Maybe the Java’s compiled to WebAssembly, I’m not sure. The details don’t matter.)

You submit a pull request to the maintainer of the UserId class hierarchy, deep in a core library of the backend. And then they reject it.

They have pretty good reasons, actually, you grudgingly admit. They’re saying it’s an absurd separation of concerns. Besides, the company can’t have this core library handling types from your front-end.

So, you sigh, and write the equivalent of a Rust match expression, but in Java (please pardon my absurd hypothetical HTML library):

Html userIdToHtml(UserId userId) {
    if (userId instanceof Username) {
        Username username = (Username)userId;
        String usernameString = username.getUsername();
        Url url = ProfileHandler.getProfileForUsername(usernameString);
        return Link.createTextLink(url, username.getUsername());
    } else if (userId instanceof AnonymousUser) {
        AnonymousUser anonymousUser = (AnonymousUser)userId;
        return Span.createColoredText(anonymousUser.getIp().formatString(), "red");
    } else {
        throw new RuntimeException("IDK, man");
    }
}

And this code your boss rejects upon code review, saying you used the instanceof anti-pattern, but then later they grudgingly accept it after you make them argue with the maintainer of the core library that wouldn’t accept your other patch.

But look at how ugly that instanceof code is! No wonder Java programmers consider it an anti-pattern! But in this situation, it’s the most reasonable thing, really the only possible thing besides implementing the observer pattern or the visitor pattern or something else that just amounts to infrastructure to fake an instanceof with inversion of control.

Having operations implemented by adding a method to every subclass makes sense when the set of operations is bounded (or close to it) and the number of subclasses of the class might grow in unanticipated ways. But just as often, the number of operations will grow in unanticipated ways, while the number of subclasses is bounded (or close to it).

For the latter situation, which is more common than OOP advocates would imagine, Rust enums – and sum types in general – are perfect. Once you’ve gotten used to them, you find yourself using them all the time.
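
As a rough sketch of what the front-end story looks like with an enum, the new operation can live entirely in the front-end code, with no change to the core library. Everything here is hypothetical: the function name, the URL scheme, and the use of a plain String instead of a real DOM type are my assumptions, and a real version would also need to escape the username.

```rust
use std::net::IpAddr;

// Stand-in for the core library's type (IpAddr is an assumed address type).
pub enum UserId {
    Username(String),
    Anonymous(IpAddr),
}

// Hypothetical front-end helper, defined outside the core library.
// It returns a String for simplicity and does no HTML escaping.
pub fn user_id_to_html(user: &UserId) -> String {
    match user {
        // Named users become a link to an assumed /profile/ URL.
        UserId::Username(name) => {
            format!("<a href=\"/profile/{name}\">{name}</a>")
        }
        // Anonymous users become their IP address in red.
        UserId::Anonymous(ip) => {
            format!("<span style=\"color: red\">{ip}</span>")
        }
    }
}
```

There is no else branch throwing an exception at the end: the compiler already knows these two variants are the only possibilities.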

I will say for the record that it isn’t this bad in all object-oriented programming languages. In some, you can write arbitrary class-method combinations in any order, and so you could write all three implementations in one place if you so chose. Smalltalk traditionally lets you navigate the codebase in a special browser, where you can see either a list of methods implemented by a class, or a list of classes that accept a given “message,” as Smalltalk calls it, so you can have your cake and eat it too.

Alternative #1: Closures

Sometimes, an OOP interface or polymorphic decision only involves one actual operation. In such a situation, a closure can just be used instead.

I don’t want to spend too much time on this, because most OOP programmers are already aware of this, and have been since their OOP languages have caught up with functional languages and gotten syntax for lambdas – Java in Java 8, C++ in C++11. Silly one-method interfaces like Java’s Comparator are therefore – fortunately – mostly a thing of the past.

Also, closures in Rust technically involve traits, and so are implemented using the same mechanism as the next two alternatives, so one could also argue that this isn’t really a separate option in Rust. In my mind, however, lambdas, closures, and the FnMut/FnOnce/Fn traits are special enough aesthetically and situationally that it deserved a little bit of time.

And so I’ll take the little bit of time to just say this: If you find yourself writing a trait (or a Java interface or a C++ class) with exactly one method, please consider whether you should instead be using some sort of closure or lambda type. Only you can prevent overengineering.
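
For illustration, here is a small sketch of that advice. The PriceRule trait mentioned in the comment and the apply_to_all function are hypothetical names I made up for the example.

```rust
// Instead of declaring a hypothetical one-method trait such as
//     trait PriceRule { fn apply(&self, cents: u64) -> u64; }
// and a struct implementing it for every rule, the function below
// simply accepts any closure with the right signature.
pub fn apply_to_all(prices: &[u64], rule: impl Fn(u64) -> u64) -> Vec<u64> {
    prices.iter().map(|&price| rule(price)).collect()
}
```

A caller can then write something like apply_to_all(&prices, |p| p - p * percent_off / 100), capturing percent_off from the surrounding scope without defining any new type.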

Alternative #2: Polymorphism with Traits

Just like Rust has a version of encapsulation more flexible and more powerful than the OOP notion of classes, as I discuss in the previous post, Rust has a more powerful version of polymorphism than OOP posits: traits.

Traits are like interfaces from Java (or an all-abstract superclass in C++), but without most of the constraints that I discuss at the beginning of the blog post. They have neither the semantic constraints nor the performance constraints. Traits are heavily inspired in semantics and principle by Haskell’s typeclasses, and in syntax and implementation by C++’s templates. C++ programmers can think of them as templates with concepts (except done right, baked into the programming language from the get-go, and without having to deal with all the code that doesn’t use it).

Let’s start with the semantics: What can you do with traits that you can’t do with pure OOP, even if you throw all the indirection in the world at it? Well, in pure OOP terms, there’s no way you can write an interface like Rust Eq and Ord, given greatly oversimplified definitions here (the real definitions of Eq and Ord extend other traits that allow partial equivalence and orderings between different types, but like these simplified definitions, the Rust standard library version of non-partial Eq and Ord do cover equivalence and ordering between values of the same type):

trait Eq {
    fn eq(&self, other: &Self) -> bool;
}

pub enum Ordering {
    Less,
    Equal,
    Greater,
}

trait Ord: Eq {
    fn cmp(&self, other: &Self) -> Ordering;
}

See what’s happening? Like in an OOP-style interface, the methods take a “receiver” type, a self parameter, of the Self type – that is, of whatever concrete type implements the trait (technically here a reference to Self or &Self). But unlike in an OOP-style interface, they also take another argument of &Self type. In order to implement Eq and Ord, a type T provides a function that takes two references to T. That’s meant literally: two references to T, not one reference to T and one reference to T or any subclass (such a thing doesn’t exist in Rust), not one reference to T and one reference to any other value that implements Eq, but two bona-fide non-heterogeneous references to the same concrete type, that the function can then compare for equality (or ordering).

This is important, because we want to use this to implement methods like sort:

impl<T> Vec<T> {
    pub fn sort(&mut self) where T: Ord {
        // ...
    }
}

OOP-style polymorphism is ideal for heterogeneous containers, where each element has its own runtime type and its own implementation of the interfaces. But sort doesn’t work like that. You can’t sort a collection like [3, "Hello", true]; there’s no reasonable ordering across all types.

Instead, sort operates on homogeneous containers. All the elements have to match in type, so that they can be mutually compared. They don’t each need to have different implementations of the operations.

Nevertheless, sort is still polymorphic. A sorting algorithm is the same for integers or strings, but comparing integers is a completely different operation than comparing strings. The sorting algorithm needs a way of invoking an operation on its items – the comparison operation – differently for different types, while still having the same overall structure of code.

This can be done by injecting a comparison function, but many types have an intrinsic, default ordering, and sort should default to it. Thus, polymorphism – but not an OOP-friendly variety.
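
Both options exist side by side in Rust’s standard library, as this sketch shows. The Version type and the two helper functions are made up for the example; sort and sort_by are the real slice methods.

```rust
use std::cmp::Ordering;

// A made-up type with an intrinsic ordering: major first, then minor.
#[derive(Debug, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
}

impl Ord for Version {
    // Both arguments are the same concrete type, Version.
    fn cmp(&self, other: &Self) -> Ordering {
        self.major.cmp(&other.major).then(self.minor.cmp(&other.minor))
    }
}

impl PartialOrd for Version {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Uses the default ordering supplied by the Ord implementation.
pub fn sorted_default(mut versions: Vec<Version>) -> Vec<Version> {
    versions.sort();
    versions
}

// Injects a comparison function instead, ignoring the default ordering.
pub fn sorted_by_minor(mut versions: Vec<Version>) -> Vec<Version> {
    versions.sort_by(|a, b| a.minor.cmp(&b.minor));
    versions
}
```

The first helper only compiles because Version implements Ord; the second would work even for a type with no intrinsic ordering at all.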

See the contrivance Java goes through to define sort:

static <T extends Comparable<? super T>> 
void sort(List<T> list)

There is no simple trait that can require T to be comparable to other Ts, for T to be ordered. Instead, as far as the programming language is concerned, the idea that T is comparable to itself, rather than to any other random type, is only articulated as an accident to this method. Nothing is stopping someone from implementing the Comparable interface in an inconsistent way, like having Integer implement Comparable<String>.

Additionally, when it actually looks up the implementation of Comparable, it decides what implementation to use based on the first argument of any comparison, not based on the type. Normally, they will all be the same type, but theoretically, this list could be heterogeneous, as long as all the objects “extend” T, and they could implement Comparable differently. The computer has to do extra work to indulge this possibility, even though it would certainly be a mistake.

As we’re now drifting outside of the realm of semantics, and into the realm of performance, let’s discuss the performance implications of this fully.

The Java sort method, as we mentioned, requires every item in the collection to be a full object type, which means that instead of storing the values directly in the array, the values are stored in the heap, and references are stored in the array. This is unnecessary with a traits-based approach – the values can live directly in the array.

This means that different arrays will have different element sizes, so this has to be handled by a trait as well. And it is: The size of the values is also parameterized via the Sized trait. The size does have to be consistent among all the items of the array, but this is enforceable because we can express that all the elements are actually the exact same type – unlike Java’s List<T> which only expresses that they’re of type T or some subtype of T.

Rust’s sort method could have been implemented by passing the size information (from the Sized trait) and the ordering function (from the Ord trait) at runtime as an integer value and a function pointer. This is how typeclasses work in Haskell, which was the inspiration for Rust traits. This would still be more efficient than the Java, as there would be a single ordering function, rather than a different indirect lookup for every left side of the comparison, allowing indirect branch prediction to work in the processor.

But Rust goes even further than that, and implements its traits instead via monomorphization. This is similar to C++ template instantiation, but semantically better constrained. The premise is that while sort is only one method semantically, in the outputted, compiled code, a different version of sort is outputted for every type T that it is called with.

C++ templates create infamously bad error messages and are difficult to reason about, because they are essentially macros, and awkward ones. Even Rust cannot create great error messages with its macro system. But also, writing them requires expertise, and means that the programmer is forgoing many of the benefits of the type system – templates are often called, in my opinion rightly so, a form of compile time duck-typing. For these reasons, template programming in C++ is often considered more advanced (read as harder and less convenient rather than more powerful) than OOP-style polymorphism.

In Rust, however, traits provide an organized and more coherent way of accessing similar technology, getting the performance benefits of templates while still giving the structure of a solid type system.
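
A tiny sketch of what that looks like in practice, using a made-up largest function: there is one generic definition in the source, and the compiler emits a separate specialized copy for each concrete T it is called with.

```rust
// One generic definition. The T: Ord bound is checked here, at the
// definition, so the body can only use operations that Ord provides.
// (Copy is required just to keep this sketch short.)
pub fn largest<T: Ord + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    // An empty slice has no largest element.
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```

Calling largest on a slice of integers and on a slice of string slices produces two monomorphized versions, each with its comparison resolved statically. Calling it with a type that does not implement Ord is rejected at the call site with an error about the missing trait bound, rather than somewhere deep inside the function body.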

Alternative #3: Dynamic Trait Objects

Sometimes, however, you do need full run-time polymorphism. You have the opposite of the scenario with the enum: You have a closed set of operations that can be performed on a value, but what those operations actually do will change dynamically in a way that cannot be bounded ahead of time.

In such situations, Rust has you covered with the dyn keyword. Please don’t overuse it, though. In almost all situations where I’ve thought it might be appropriate, static polymorphism combined with other design elements has worked out better.

Legitimate use cases for dyn tend to come up in situations involving inversion of control, where a framework library takes on a main loop, and the client code says how to handle various events. In network programming, the framework library says how to juggle all the sockets and register them with the operating system, but the application needs to say what to actually do with the data. In GUI programming, the framework code can say what widget was being clicked on, but very different things happen if that widget is a button versus a text box versus a custom widget you invented for this particular app.

Now, you don’t strictly need run-time polymorphism for this. You could use closures (or even raw function pointers) instead, creating a struct of closures (or function pointers) if multiple operations are called for – which amounts to basically doing what dyn does the hard way by hand. For example, I fully expected tokio to use Rust’s run-time polymorphism feature internally to handle this inversion of control in task scheduling. Instead, for what I imagine are performance reasons, tokio implements dyn by hand, even calling its struct of function pointers Vtable.
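
To give a flavor of the by-hand approach, here is a deliberately simplified sketch of a struct of function pointers. All names are hypothetical and none of this is taken from tokio; in particular, the state is fixed to a u64 here, whereas a real hand-written vtable would typically work with a type-erased pointer to its data.

```rust
// A hand-rolled "vtable": one function pointer per operation.
pub struct HandlerVtable {
    pub on_data: fn(&mut u64, &str),
    pub on_close: fn(&mut u64) -> String,
}

fn count_bytes(total: &mut u64, chunk: &str) {
    *total += chunk.len() as u64;
}

fn report_bytes(total: &mut u64) -> String {
    format!("{} bytes", total)
}

// One concrete "implementation": a handler that counts bytes.
pub const BYTE_COUNTER: HandlerVtable = HandlerVtable {
    on_data: count_bytes,
    on_close: report_bytes,
};

// The "framework" side owns the loop and only knows about the vtable.
pub fn run(vtable: &HandlerVtable, chunks: &[&str]) -> String {
    let mut state = 0u64;
    for chunk in chunks {
        (vtable.on_data)(&mut state, chunk);
    }
    (vtable.on_close)(&mut state)
}
```

Every call goes through a function pointer chosen at run-time, which is the same basic shape as what the compiler generates for a trait object.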

But dyn does all of this work for you, for your trait. The only requirement is that your trait be object-safe, and the list of requirements may seem familiar, especially when it comes to the requirements for an associated function (e.g. a method) to be “dispatchable”:

Not have any type parameters (although lifetime parameters are allowed).
Be a method that does not use Self except in the type of the receiver.
Have a receiver with one of the following types: &Self (i.e. &self), &mut Self (i.e. &mut self), Box<Self>, Rc<Self>, Arc<Self>, or Pin<P> where P is one of the types above.
Not have a where Self: Sized bound (a receiver type of Self (i.e. self) implies this).

That is to say, it can be polymorphic in exactly one parameter, and that parameter must be by reference – more or less the exact requirements for methods to support run-time polymorphism in OOP.

This is of course because dyn uses almost exactly the same mechanism as OOP to implement run-time polymorphism: the “vtable.” Box<dyn Foo> really contains two pointers rather than one, one to the object in question, and the pointer to the “vtable,” the automatically-generated structure of function pointers for that type. The one-parameter requirement is because that is the parameter whose vtable is used to look up which concrete implementation of a method to call, and the indirection requirement is because the concrete type might be different sizes, with the size only known at run-time.
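
The two-pointer layout can be observed directly with std::mem::size_of. The Speak trait and Dog type are made up for this sketch, and the exact sizes reflect how current compilers lay out these pointers rather than something worth relying on in real code.

```rust
use std::mem::size_of;

// A made-up trait and implementor, just to have something to point at.
pub trait Speak {
    fn speak(&self) -> String;
}

pub struct Dog;

impl Speak for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

// Returns (size of a Box to the concrete type, size of a Box<dyn Speak>).
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<Box<Dog>>(), size_of::<Box<dyn Speak>>())
}
```

On current compilers the first number is the size of one pointer and the second is the size of two: the data pointer plus the vtable pointer. The Dog value itself carries no vtable pointer at all.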

To be clear, these are limitations on one particular implementation strategy for run-time polymorphism. Alternative strategies exist that fully decouple the vtable from individual values of the type, as in Haskell.

There are still a few advantages of Rust’s version of run-time polymorphism with traits as opposed to OOP-style interfaces.

Performance-wise, it’s something done alongside a type, rather than intrinsic to the type. Normal values don’t store a vtable, spreading the cost of this throughout the program, but rather, the vtables are only referenced when a dyn pointer is created. If you never create a dyn pointer to a value of a given type, that type’s vtable doesn’t even have to be created. Certainly, you don’t have 8 bytes of extra gunk in every allocation for all the vtable pointers! This also means there’s one fewer level of indirection.

Semantically, it’s also a good thing that it’s just one option among many, and that it’s not the strongly preferred option that the entire programming language is trying to push you towards. Often, even usually, static polymorphism, enums, or even just good old-fashioned closures more accurately represent the problem at hand, and should be used instead.

Finally, the fact that run-time and static polymorphism in Rust both use traits makes it easier to transition from one system to another. If you find yourself using dyn for a trait, you don’t have to use it everywhere that trait is used. You can use the mechanisms of static polymorphism (like type parameters and impl Trait) instead, freely mixing and matching with the same traits.
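
Here is a short sketch of that mixing and matching, with a made-up Shape trait. The same trait and the same impl blocks serve both a statically dispatched function and a dynamically dispatched one.

```rust
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Rect(pub f64, pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Static polymorphism: monomorphized for each concrete type passed in.
pub fn doubled_area(shape: &impl Shape) -> f64 {
    2.0 * shape.area()
}

// Run-time polymorphism: one compiled function, heterogeneous elements,
// each call going through that element's vtable.
pub fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

Nothing about Square or Rect had to change to support the dyn version; the choice is made at each use site.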

Unlike in C++, you don’t have to learn two completely different sets of syntax for concepts vs parent classes, and vastly different semantics. Really, in Rust, dynamic polymorphism is just a special case of static polymorphism, and the only differences are the things that actually are different.

My Reaction to Dr. Stroustrup's Recent Memory Safety Comments

2023-01-30T00:00:00+00:00

The NSA recently published a Cybersecurity Information Sheet about the importance of memory safety, where they recommended moving from memory-unsafe programming languages (like C and C++) to memory-safe ones (like Rust). Dr. Bjarne Stroustrup, the original creator of C++, has made some waves with his response.

To be honest, I was disappointed. As a current die-hard Rustacean and former die-hard C++ programmer, I have thought (and blogged) quite a bit about the topic of Rust vs C++. Unfortunately, I feel that in spite of the exhortation in his title to “think seriously about safety,” Dr. Stroustrup was not in fact thinking seriously himself. Instead of engaging conceptually with the article, he seems to have reflexively thrown together some talking points – some of them very stale – not realizing that they mostly are not even relevant to the NSA’s Cybersecurity Information Sheet, let alone a thoughtful rebuttal of it.

Fortunately, he does eventually discuss his own ideas of how to make C++ memory safe – in the future. If these ideas are implemented well, it will make C++ a safe programming language as the NSA’s Cybersecurity Information Sheet has defined it. But given that they are currently just proposals in an early stage, it’s unfair of him to expect the NSA to mention them when advising people on what programming language to use. C++ has been an unsafe language for a long time. Maybe someday that will change, but we’ll believe it when we actually see it.

But before I discuss that, I’d like to rebut and discuss my disappointment at the talking points he uses earlier in his response, because I think they unfairly frame the debate, shield C++ from legitimate and important criticism, and slander memory-safe programming languages and downplay memory safety as a concept, even though it’s very important.

Multiple Types of Safety?

One of the most interesting and conceptually relevant points that Dr. Stroustrup harps on is that memory safety is not the only type of safety:

Also, as described, “safe” is limited to memory safety, leaving out on the order of a dozen other ways that a language could (and will) be used to violate some form of safety and security.

This might technically be true – it’s not entirely clear what other forms of “safety” he’s talking about – but it’s misleading. Memory unsafety is not just one of a dozen equally important forms of “unsafety.” Rather, memory unsafety is by far the biggest source of security vulnerabilities and instability in memory unsafe programming languages – estimates as high as 70 percent in some contexts.

A 70% decrease in security vulnerabilities is worth committing significant resources towards. Memory safety on its own is worth writing a Cybersecurity Information Sheet about, and it is the area where C++ has the most serious deficits. Given that, this feels like a car manufacturer whose cars do not provide air bags responding to a government advisory not to buy the C++ cars by saying “What about other types of safety? By talking just about air bags, the government is clearly not thinking seriously about safety.” Sure, there’s other types of safety features besides air bags (or memory safety), but air bags are still important!

So, Dr. Stroustrup, what about memory safety in C++? Shouldn’t C++ have memory safety? Are you saying it’s not important, especially when all of these other programming languages have it?

Tellingly, he doesn’t go into detail about other types of safety. That’s because C++ doesn’t really have the advantage in any of them. For example, Rust also has a lot of mechanisms for thread safety and type safety, intimately connected with its memory safety mechanisms, and baked into the design of Rust in a way that would be next to impossible to retrofit into another programming language.
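To make the thread-safety point concrete, here is a minimal sketch in Rust. The function name `count_in_parallel` is invented for this example. It shows how shared mutable state has to go through types like `Arc` and `Mutex`: the integer can only be reached by taking the lock, and swapping in a non-thread-safe type such as `Rc<RefCell<u32>>` would be rejected at compile time rather than racing at run time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
/// (Hypothetical helper, written only to illustrate the idea.)
fn count_in_parallel(threads: u32, increments_per_thread: u32) -> u32 {
    // Arc gives shared ownership across threads; Mutex guards the data itself.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    // The only way to reach the integer is through the lock.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling `count_in_parallel(4, 1000)` always yields the full total, because no increment can happen outside the lock.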

And, when you read later on about the “safety profiles” in the C++ Core Guidelines that he makes such a big deal about, most of the focus there is also about memory safety.

Petty Irrelevancies

Let’s look at some of the other points he makes.

That specifically and explicitly excludes C and C++ as unsafe.

C++ does not enforce memory safety as a feature of the programming language. This may change in the future (as Dr. Stroustrup discusses), but is the current state of things. Dr. Stroustrup tries to downplay this, but is not convincing.

As is far too common, it lumps C and C++ into the single category C/C++, ignoring 30+ years of progress.

Writing “C/C++” to mean “C and C++” is considered a faux pas among C++ programmers, and among C programmers as well, because it is seen as asserting that these two programming languages are near-identical when there are in fact major differences between them. By pointing out that the NSA does this, Dr. Stroustrup is trying to make them look like they don’t know what they’re talking about, just because they used a “/” character instead of the word “and.”

He’s reading too much into the orthography and the NSA’s failure to use insider shibboleths of the programming languages they’re trying to criticize. Outside of the “C” and “C++” communities, “C/C++” is a fairly common way to refer to the two related programming languages.

And that’s the most relevant thing here: C and C++ are indeed related programming languages, and they have a lot in common: They are both compiled programming languages with a focus on performance, and they are (very relevantly) both not particularly focused on guaranteeing memory safety. C and C++ have a substantial common subset, with many memory unsafe features that are popular with programmers, perhaps even more popular because they work similarly in both programming languages. For the purposes of this document, it’s often the features that C and C++ have in common that are the problematic ones, so it makes sense for the NSA to lump them together.

While there might be 30+ years of divergence between C and C++, none of C++’s so-called “progress” involved removing memory-unsafe C features from C++, many of which are still in common use, and many of which still make memory safety in C++ nearly intractable. Sure, new features in C++ have been added that (in some but by no means all cases) do not make it as easy to corrupt memory, but the bad old features are not in any real way being phased out: They are not guarded by any special opt-in syntax, nor in many cases do they result in warnings. Given that, the combined set of features is only as strong as its weakest link.

Unfortunately, much C++ use is also stuck in the distant past, ignoring improvements, including ways of dramatically improving safety.

This is a common C++ talking point, but it doesn’t help Dr. Stroustrup’s position as much as he thinks it does.

He’s trying to talk up how much C++ has improved, especially in the last 11 years – and it has indeed improved. New ways of writing C++, emphasizing relatively new features, can indeed result in more reliable C++ code with less memory corruption.

But unfortunately, this talking point just serves to remind us that these old memory-unsafe features are still in common use. When someone says their project is written in Rust, we can guess that it likely uses only the safe features (including using standard library functions that use unsafe internally – that truly doesn’t count as unsafe), or maybe uses the unsafe features when absolutely necessary. But when someone says their project is written in C++, by Dr. Stroustrup’s own admission, there’s a high likelihood that it uses old features “stuck in the distant past, ignoring … ways of dramatically improving safety.” This is also a reason to avoid C++.

However, I would also contest his claim about these new features. Memory safety isn’t just an absence of memory corruption, but a reliable method for ensuring the absence of memory corruption. “Using new features” isn’t good enough. Even if using the new features in preference to the old ones were a guarantee of memory safety – which it isn’t, they’re less memory corrupting but not truly memory safe – the presence of the old ones would still cause problems. You would need some mechanism to ensure that the new features were only used safely, and that the old features were not used, and no such mechanism exists, at least not in the programming language itself. Someone who remembers the old features can always still slip up and use one by accident.

Static Analysis: Not Good Enough

Dr. Stroustrup points out that he’s been working very hard on improving memory safety in C++, for a very long time:

After all, I have worked for decades to make it possible to write better, safer, and more efficient C++. In particular, the work on the C++ Core Guidelines specifically aims at delivering statically guaranteed type-safe and resource-safe C++ for people who need that without disrupting code bases that can manage without such strong guarantees or introducing additional tool chains.

Unfortunately, it’s not done. The key word here is, of course, “aims.” The next sentences admit that this feature is not in fact available:

For example, the Microsoft Visual Studio analyzer and its memory-safety profile deliver much of the CG support today and any good static analyzer (e.g., Clang tidy, that has some CG support) could be made to completely deliver those guarantees….

For memory safety, “much of” is not really good enough, and “could be made” is practically worthless. Fundamentally, the point is that memory safety in C++ is a project being actively worked on, and close to existing. Meanwhile, Rust (and Swift, C#, Java, and others) already implements memory safety.

It’s worse than that, though. What Dr. Stroustrup is trying to downplay is that this involves using static analyzers, considered separate from the programming language, something the NSA’s original article also discusses. Theoretically, if a static analyzer could be used to guarantee memory safety, that could be just as reliable as a programming language that does it. An engineering team could have a policy that all code must pass this static analysis before being put into production.

But unfortunately, human nature is more fickle than that. If it’s not built into the programming language, it’s going to get skipped. If a vendor says their software is written in C++, or if an engineer takes a job in C++, how will they know that these static analyzers will in fact be used? A programming language that takes memory safety seriously doesn’t provide it as an optional add-on that most people will simply ignore.

But All The C++ Code!

The end of the last quote provides a common talking point in Rust vs C++ arguments:

[Static analyzers] could be made to completely deliver those guarantees at a fraction of the cost of a change to a variety of novel “safe” languages.

Besides the laughably condescending matter of calling Java (which first appeared in 1995), C# (first appeared in 2000), and Ruby (first appeared in 1995) “novel,” this is a jab at a common trope that (some immature) Rust programmers go around demanding that people rewrite their projects in Rust (please don’t do this!), and an attack on the idea that all code can be written in safe programming languages, given the large body of existing work in unsafe programming languages.

This is a bit of a straw man in this context. The NSA article that Stroustrup is responding to addresses that switching existing codebases might be expensive, even prohibitively so, saying:

It is not trivial to shift a mature software development infrastructure from one computer language to another. Skilled programmers need to be trained in a new language and there is an efficiency hit when using a new language. Programmers must endure a learning curve and work their way through any “newbie” mistakes. While another approach is to hire programmers skilled in a memory safe language, they too will have their own learning curve for understanding the existing code base and the domain in which the software will function.

It then follows this up immediately with an explanation of how tools like static analyzers can be used as a back-up plan for improving memory safety in memory unsafe programming languages – exactly what Dr. Stroustrup discusses. He’s criticizing this NSA document, implying it is not thinking “seriously,” while fundamentally making a point that they already made for him.

Of course, this is a terrible endorsement of C++. It’s far from ideal to have to use add-on tools to work around a language’s flaws. Coming from Dr. Stroustrup, it reads more like a brag that his programming language has locked everyone in than a defense of why C++ is good. Or else, it’s an admission that other programming languages should be used for new projects, and that C++’s fate is now to gradually fade like the elves from Middle Earth.

But he’s also overstating his case. As I mentioned before, safe programming languages have existed for a long time. Many programming projects that in the early 90’s would have been done in C or C++ have in fact been done in safe programming languages instead, and according to the NSA’s recommendation, that was a good idea. As computers have gotten faster and programming language technology has improved, there have been fewer and fewer reasons to settle for languages like C or C++ that don’t have memory safety as a feature.

When I was a professional C++ programmer as early as 2013, some people – even some programmers – already thought that C++ was a legacy programming language like COBOL or Fortran. And outside of narrow niches like systems programming (e.g. web browsers, operating systems, and lower-level libraries), video games, or high performance programming, it kind of has become one. The former application niches of C++ have been taken over by Java and C#, or more recently by Go. If you have an application program written in C++, chances are that it’s a relatively old codebase, or written at a shop that has reasons to write a lot of C++ (such as a high-frequency trading firm).

Now, even C++’s systems niche is under threat, with Rust, a powerful memory-safe programming language that avoids many of C++’s problems. Now, even the niches where C++ isn’t at all “legacy” have a viable, memory-safe alternative without a lot of the technical debt that C++ has. Rust is even allowed in the Linux kernel, a project that has only previously accepted C, and whose chief maintainer has always explicitly hated C++.

A Memory-Safe C++

Fortunately, after all of these ill-thought out, tired talking points, Dr. Stroustrup subtly changes his perspective. After his distractions, after bashing memory safe programming languages as “novel,” bragging about how C++ is too entrenched to be removable, pretending memory safety is just one of many equally important safety issues, and promising optional add-on tools that will eventually be standardized, he finally begins to tackle the question of how C++ could be made memory safe, in an opt-in fashion:

There is not just one definition of “safety”, and we can achieve a variety of kinds of safety through a combination of programming styles, support libraries, and enforcement through static analysis. P2410r0 gives a brief summary of the approach. I envision compiler options and code annotations for requesting rules to be enforced. The most obvious would be to request guaranteed full type-and-resource safety. P2687R0 is a start on how the standard can support this, R1 will be more specific. Naturally, comments and suggestions are most welcome.

…

For example, in application domains where performance is the main concern, the P2687R0 approach lets you apply the safety guarantees only where required and use your favorite tuning techniques where needed. Partial adoption of some of the rules (e.g., rules for range checking and initialization) is likely to be important. Gradual adoption of safety rules and adoption of differing safety rules will be important. If for no other reason than the billions of lines of C++ code will not magically disappear, and even “safe” code (in any language) will have to call traditional C or C++ code or be called by traditional code that does not offer specific safety guarantees.

This is a lot closer to what the NSA document actually specifies for memory safe programming languages than he gives the document credit for. For example, the document already provides for opting out of memory safety via annotation, paired with an observation that that will focus scrutiny on the code that opts out.

Dr. Stroustrup did not need to criticize the document for not thinking “seriously” to reach this conclusion, but simply acknowledge that it’s true that C++ is not a memory safe programming language yet, but that based on his work, it might soon become one. Maybe the next version of the NSA document will endorse using C++, but only if it’s C++ZZ – where ZZ is some future version of the C++ standard.

I’m glad comments and suggestions are welcome, however, because I have a huge one.

Opt-in for memory safety is unacceptable, and is almost as bad as having a separate static analysis tool to enforce safety. Opt-out is fine – Rust has a way to opt out of memory safety with the unsafe keyword, and this concept is discussed and defended in the NSA’s original document. But the default should be to enforce memory safety unless otherwise specified.
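As a minimal sketch of what opt-out looks like in Rust (the two function names are invented for this example): the default path is bounds-checked, and the unchecked path has to be wrapped in an `unsafe` block that a reviewer can search for.

```rust
/// Safe by default: a bad index yields `None` instead of reading
/// arbitrary memory. (Hypothetical helper for illustration.)
fn get_checked(values: &[u32], index: usize) -> Option<u32> {
    values.get(index).copied()
}

/// Explicit opt-out: the bounds check is skipped, so the `unsafe` block
/// marks exactly where the programmer takes over responsibility.
fn get_after_manual_check(values: &[u32], index: usize) -> u32 {
    assert!(index < values.len(), "index out of range");
    // SAFETY: the assertion above guarantees `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}
```

Removing the `unsafe` keyword from the second function is a compile error, not a warning, which is the property an opt-in scheme cannot provide.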

For C++, this means that if these safety features are added in C++ZZ, --std=c++ZZ should cause unsafe constructs to be rejected – and the C++ standard should require that these constructs be rejected for an implementation to be a conforming implementation of C++ZZ. Perhaps (but only perhaps) other command line arguments could be added to override this constraint on a file-by-file basis. Ideally, a new compiler command (e.g. g++ZZ) should be created for each implementation that defaults to this stricter behavior.

Parts of the codebase that use legacy features should have to have at least a file-level annotation that that file is a legacy file – and then this annotation could gradually be moved to the function level. As a side benefit, this could also be used to phase out and deprecate weird points of C++ syntax, similar to the Rust edition system: Anyone using, for example, 0 literals to mean nullptr would have to declare some sort of a legacy annotation on their file or in their build system.

Only with this sort of opt-out memory-safety system would I consider C++ a memory safe programming language. I’d be very happy to see a memory-safe C++. I earnestly hope Dr. Stroustrup is successful in his endeavors. I’m not holding my breath, though, and in the meantime, I will continue to use other programming languages, that are already memory-safe, for my new projects, as will the majority of programmers.

In the meantime, it is unfair for Dr. Stroustrup to call safe programming languages novelties or to pretend that C++ isn’t already far behind the times on this. This was already an important criticism of C++ decades ago, when Java first came out in the 90’s and was referred to as a “managed programming language.” This was discussed in detail in my classes when I was a college student in the late aughts. To read Dr. Stroustrup’s writing, C++ is being criticized by “novel” upstarts when it is well on its way to getting the feature, but in actuality, the time to act was 1996.

Rust and Default Parameters

2023-01-11T00:00:00+00:00

Rust doesn’t support default parameters in function signatures. And unlike in many languages, there’s no way to simulate them with function overloading. This is frustrating for many new Rustaceans coming from other programming languages, so I want to explain why this is actually a good thing, and how to use the Default trait and struct update syntax to achieve similar results.

Default parameters (and function overloading) are not part of object-oriented programming, but they are a common feature of a lot of the programming languages new Rustaceans are coming from. This post therefore fits in some ways with my on-going series on how Rust is not object-oriented, and so it is tagged with that series. It was also inspired by Reddit responses to my first OOP post.

How Default Parameters Work (in e.g. C++)

So before I talk about why Rust doesn’t have default parameters and what you can do instead, let’s talk a bit about what default parameters are and the situations in which they are useful.

Let’s say you have a function that takes many parameters, perhaps (to take an example from the Reddit response) one that creates a window in a GUI:

WindowHandle createWindow(int width, int height, bool visible);

auto handle = createWindow(10, 30, false); // Create invisible window
auto handle2 = createWindow(100, 500, true); // Create visible window

Now, let’s say that you assume that most windows that are created are intended to be visible, and you don’t want to burden the programmer with having to specify whether the window is visible – or even think about it explicitly – in that normal case. In a programming language that supported default parameters, you could then provide a default for visible.

WindowHandle createWindow(int width, int height, bool visible = true);

auto handle = createWindow(10, 30, false); // Create invisible window!

auto handle2 = createWindow(100, 500, true); // Create visible window!

auto handle3 = createWindow(100, 500); // Also create visible window!
auto handle4 = createWindow(100, 500); // Most of the time, that's what
auto handle5 = createWindow(100, 500); // you want, so why have to say it?

Default parameters can also be simulated with function overloading for programming languages where function overloading is available but default parameters are not:

WindowHandle createWindow(int width, int height, bool visible);

WindowHandle createWindow(int width, int height) {
    return createWindow(width, height, true);
}

Rust also does not have function overloading, and that’s a much more complicated issue, but many of the same arguments apply to this idiom.

Benefits (and Detriments) of Default Parameters

Defaults are good, and default parameters in this style are one way to implement them and reap their benefits.

Defaults are good because they uphold the DRY principle – Don’t Repeat Yourself. If we didn’t have defaults, we’d have to repeat parameters that don’t actually contribute to understanding of the goals of the code. And if the best default parameters changed in such a way that the best way to update the code was to continue using the default – perhaps because of a change of best practices – we’d have to update every call rather than just changing it once, where the default parameter is defined.

Defaults are also good because they decrease the programmer’s cognitive load. Programmers have to keep a lot of information in their brain at a time, and defaults help programmers by not forcing them to think about extra details when they don’t matter – which is the usual situation for most defaults.

Default parameters also make the code more concise, and are popular for that reason. But this isn’t a particular value that I have. I believe the DRY principle is important, and that often amounts to more concise code, but given modern editors and IDEs, and modern expectations of typing and reading speed, a moderate amount of verbosity in exchange for other benefits (such as clarity and explicitness) is completely acceptable to me. I believe that default parameters, as they are implemented in C++ and Python, have a substantial cost in clarity and explicitness, and therefore conciseness isn’t a good enough reason to justify them.

In this case, what particularly bothers me about the lack of clarity is that the reader of the code doesn’t know that there are potentially more parameters; there is no hint that there might be other parameters. If a maintenance programmer wants to change one of these calls to make invisible windows instead, they might not realize they should check the documentation for createWindow: after all, it only seems to take two parameters, and neither of them has anything remotely to do with invisible windows.

Fortunately, Rust has alternative features that allow us to reap the benefits for cognitive load and DRY without sacrificing explicitness and clarity.

Defaults in Rust: the `Default` trait

Rather than allowing default parameters, Rust allows you to optionally specify default values for your types using the Default trait. Here’s how it works:

enum Foo {
    Bar,
    Baz,
}

impl Default for Foo {
    fn default() -> Self {
        Foo::Bar
    }
}

Or, written using the more concise derive syntax:

#[derive(Default)]
enum Foo {
    #[default]
    Bar,

    Baz,
}

Once this default is defined, Foo::default() or even (in a context where the type is clear) Default::default() can stand in for Foo::Bar.
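Here is a small sketch of those forms in use. The `describe` and `pick` functions are invented for this example; they exist only to give the compiler a context in which the type is clear.

```rust
#[derive(Debug, Default, PartialEq)]
enum Foo {
    #[default]
    Bar,
    Baz,
}

/// Hypothetical function taking a `Foo`, so that a bare
/// `Default::default()` at the call site can be inferred.
fn describe(foo: Foo) -> &'static str {
    match foo {
        Foo::Bar => "bar",
        Foo::Baz => "baz",
    }
}

/// Falls back to the type's default when no choice was made.
fn pick(choice: Option<Foo>) -> Foo {
    choice.unwrap_or_default()
}
```

With these definitions, `describe(Default::default())` and `describe(Foo::default())` both behave like `describe(Foo::Bar)`, and `pick(None)` returns `Foo::Bar`.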

If you are used to re-using existing types for your function parameters, this might seem worse than useless. After all, the parameter we defaulted was of type bool, and the orphan rule (explained in the Rust book’s chapter on traits) forbids us from defining the Default trait on bool – as I alluded to above, Default allows you to define default values for your types. And even if we could, setting a default on booleans is way too overpowered a thing to do just to give this one function parameter a default! After all, some other function might also have a boolean parameter with a different default.

But this makes more sense if you consider that in Rust, it is common – even idiomatic and preferred – to create custom types for things like configuration and function parameters. After all, if you’re not looking at the documentation, it can be unclear what true means. It’s not even clear that it has anything to do with visibility, let alone that true means that the window is to be visible when the parameter could just as easily be called invisible.

In Rust, we would prefer to define a new type for this situation, an enum listing the visibility options – which will also help if a new visibility option is created. And on this enum, it would be reasonable to declare a default:

#[derive(Default)]
enum WindowVisibility {
    #[default]
    Visible,

    Invisible,
}

Yes, this is more verbosity, but it is more clear, and no less DRY, than our original code. Conciseness is again not a value in and of itself. Explicitly listing the options is preferred to leaving them implicit.

Then, when we call the function, we can use this default:

fn create_window(width: u32, height: u32, visibility: WindowVisibility) -> WindowHandle;

let handle = create_window(10, 30, WindowVisibility::Invisible);
let handle2 = create_window(100, 500, WindowVisibility::Visible);

let handle3 = create_window(100, 500, WindowVisibility::default());
let handle4 = create_window(100, 500, WindowVisibility::default());
let handle5 = create_window(100, 500, Default::default()); // Also permitted

This is, as promised, more verbose, but equally DRY, and much more explicit and clear.

NB: I’m using free-standing functions for example purposes only. In reality, this particular function is just as likely to be part of a type’s intrinsic methods, something like WindowHandle::new or WindowHandle::create_window.

Scaling defaults in Rust: Struct update syntax

So this is all well and good for one default. But it doesn’t scale that well. What if we want to add another 3 parameters to our window creation function? In a language like C++, we can give them defaults, and the callers don’t even need to be updated (parameters are for example purposes only and do not represent a well-thought out list of what you might want to specify in creating a window):

WindowHandle createWindow(int width, int height, bool visible = true,
                          WindowStyle windowStyle = WindowStyle::Standard,
                          int z_position = -1,
                          bool autoclose = false);

createWindow(100, 500); // Still works identically
createWindow(100, 500, false); // Also still works
createWindow(100, 500, false, WindowStyle::Standard, 2, true); // Specify everything

This is a useful feature. In Rust, with the techniques we’ve discussed so far, we’d have to write Default::default() repeatedly for however many parameters there are. This is a DRY violation, and interferes with the ability to add new parameters.

There is a flaw with this feature, however. You’ve now constrained yourself to specifying parameters to the left in order to specify parameters on the right. In the last example call to createWindow, we violate DRY by explicitly specifying a value when we probably wanted to use the default, but that wasn’t available because we wanted to override the default for a later parameter.

Fortunately, Rust has a version of this too. Just as we created an enum just for the purposes of this function call, it is idiomatic in Rust to create structures for configuration parameters like this. The structure would look something like this:

pub struct WindowConfig {
    pub width: u32,
    pub height: u32,
    pub visibility: WindowVisibility,
    pub window_style: WindowStyle,
    pub z_position: i32,
    pub autoclose: AutoclosePolicy,
}

Then, we can implement Default for that entire struct:

impl Default for WindowConfig {
    fn default() -> Self {
        Self {
            width: 100,
            height: 100,
            visibility: WindowVisibility::Visible,
            window_style: WindowStyle::Standard,
            z_position: -1,
            autoclose: AutoclosePolicy::Disable,
        }
    }
}

Now, this might seem to be extremely tedious to use. You might imagine using it something like this:

let mut config = WindowConfig::default();
config.width = 500;
config.z_position = 2;
config.autoclose = AutoclosePolicy::Enable;
let handle = create_window(config);

I would argue that even this is preferable to default parameters, because again, it is explicit. However, Rust has a syntactic construct designed exactly for situations like this, struct update syntax. With it, we get something very similar to default parameters, but a little more verbose, a lot more explicit, and a lot more flexible:

let handle = create_window(WindowConfig {
    width: 500,
    z_position: 2,
    autoclose: AutoclosePolicy::Enable,
    ..Default::default()
});

Unlike C++-style default parameters, we can override exactly the defaults we want to. It is also explicitly clear that there are other parameters we could modify if we wanted to, without forcing the maintenance programmer to check the documentation.
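Putting the pieces together, here is a self-contained sketch that compiles on its own. The variants of `WindowStyle`, the `WindowHandle` stand-in (which just remembers its configuration), and the `wide_autoclosing_window` helper are assumptions made for this example, not a real windowing API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum WindowVisibility { Visible, Invisible }

#[derive(Debug, Clone, Copy, PartialEq)]
enum WindowStyle { Standard, Borderless } // assumed variants

#[derive(Debug, Clone, Copy, PartialEq)]
enum AutoclosePolicy { Enable, Disable }

#[derive(Debug, Clone, PartialEq)]
struct WindowConfig {
    width: u32,
    height: u32,
    visibility: WindowVisibility,
    window_style: WindowStyle,
    z_position: i32,
    autoclose: AutoclosePolicy,
}

impl Default for WindowConfig {
    fn default() -> Self {
        Self {
            width: 100,
            height: 100,
            visibility: WindowVisibility::Visible,
            window_style: WindowStyle::Standard,
            z_position: -1,
            autoclose: AutoclosePolicy::Disable,
        }
    }
}

/// Stand-in for a real window handle; it only stores the config it was given.
#[derive(Debug)]
struct WindowHandle { config: WindowConfig }

fn create_window(config: WindowConfig) -> WindowHandle {
    WindowHandle { config }
}

/// Overrides three fields; every other field comes from `Default`.
fn wide_autoclosing_window() -> WindowHandle {
    create_window(WindowConfig {
        width: 500,
        z_position: 2,
        autoclose: AutoclosePolicy::Enable,
        ..Default::default()
    })
}
```

The fields named in the literal win, and the `..Default::default()` line fills in `height`, `visibility`, and `window_style` from the `Default` implementation.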

But beyond that, this allows there to be other sets of defaults defined. In addition to WindowConfig::default(), there might be another set of configuration parameters for creating dialog boxes, like WindowConfig::dialog() or WindowConfig::default_dialog(). An app where the programmer usually creates invisible windows, or windows all of the same height, might define its own default set, config::app_local_default_window_config(). These wouldn’t be mediated through the Default trait, but Default is just a trait, and Default::default() is just a method call. You can call your own methods instead, and still use this struct update syntax.
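A minimal sketch of that idea, using a trimmed-down `WindowConfig` so the example stays short. The specific dialog dimensions and the `hidden_dialog` helper are invented for illustration.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum WindowVisibility {
    #[default]
    Visible,
    Invisible,
}

#[derive(Debug, Clone, PartialEq)]
struct WindowConfig {
    width: u32,
    height: u32,
    visibility: WindowVisibility,
    z_position: i32,
}

impl Default for WindowConfig {
    fn default() -> Self {
        Self {
            width: 100,
            height: 100,
            visibility: WindowVisibility::default(),
            z_position: -1,
        }
    }
}

impl WindowConfig {
    /// A second, hypothetical set of defaults for dialog boxes,
    /// itself built on top of the general defaults.
    fn dialog() -> Self {
        Self {
            width: 300,
            height: 150,
            z_position: 10,
            ..Self::default()
        }
    }
}

/// Struct update syntax works with any expression of the right type,
/// not just `Default::default()`.
fn hidden_dialog() -> WindowConfig {
    WindowConfig {
        visibility: WindowVisibility::Invisible,
        ..WindowConfig::dialog()
    }
}
```

The base expression after `..` is an ordinary value, so an application-specific function can take the place of `WindowConfig::dialog()` in exactly the same way.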

So now, we have a system of idioms in Rust to replace default parameters. It’s just as DRY, and decreases the cognitive load just as much. More importantly, it does so without sacrificing explicitness and clarity as to exactly what’s going on – a given function always takes the same number of parameters, which is an invariant that Rust maintenance programmers can (and do) rely on.

The Builder Pattern

At this point, the old-hand Rustaceans in the audience will note that I haven’t discussed one common Rust approach to designing these configuration structs, the builder pattern.

That’s for a reason: I don’t like it. I personally prefer to use Default and struct update syntax where others might reach for the builder pattern. I think it’s less explicit, and since I have a lot of experience in non-OOP programming languages, it feels to me like a solution without a problem, the primary upshot of which is to make the code look more object-oriented.

But it is a commonly used pattern in Rust, and you will use crates that use the builder pattern, so it’s worth being familiar with it. It’s the same concept as before: using a struct full of parameters to send configuration to a constructor or to a function call. It’s probably going to be called something like WindowBuilder instead of WindowConfig.

However, instead of using the struct update syntax directly, a bunch of helper methods are added to do the struct update:

impl WindowBuilder {
    fn height(mut self, height: u32) -> Self {
        self.height = height;
        self
    }

    // ...
}

Or, as I would notate it:

impl WindowBuilder {
    fn height(self, height: u32) -> Self {
        Self {
            height,
            ..self
        }
    }

    // ...
}

Sometimes, enumerations are split into multiple update methods:

impl WindowBuilder {
    fn autoclose_enable(mut self) -> Self {
        self.autoclose = AutoclosePolicy::Enable;
        self
    }

    fn autoclose_disable(mut self) -> Self {
        self.autoclose = AutoclosePolicy::Disable;
        self
    }
}

Then, normally, instead of calling e.g. the window constructor, you call a build method defined on the builder (and at this point I cringe at the gratuitous OOP philosophy influencing the design):

impl WindowBuilder {
    fn build(self) -> WindowHandle {
        create_window(self)
    }
}

Then, instead of using struct update syntax, you chain together calls to these methods:

let handle = WindowBuilder::new()
    .width(500)
    .z_position(2)
    .autoclose_enable()
    .build();

I still prefer this to default parameters, but I also find it tacky. I don’t like being forced to think in terms of abstract “objects” like builders, and I don’t like the presumption that this style is more intuitive. Why is a “builder” an object that does something? Why is that preferred to a structure that is “configuration”? Are OOP programmers aware that in real life, the vast majority of objects literally don’t do things, and certainly don’t build other objects?

But for people familiar with the idioms of object-oriented programming, this might be preferable. It is a commonly chosen option, so it’s important at least to recognize it.
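For readers who want to recognize the pattern end to end, here is a self-contained sketch that assembles the fragments above. The `WindowHandle` stand-in, the default values in `new`, and the `build_example` helper are assumptions made for this example, and real crates vary in the details.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AutoclosePolicy { Enable, Disable }

/// Stand-in for a real window handle; it just records what was requested.
#[derive(Debug, PartialEq)]
struct WindowHandle {
    width: u32,
    height: u32,
    z_position: i32,
    autoclose: AutoclosePolicy,
}

#[derive(Debug, Clone)]
struct WindowBuilder {
    width: u32,
    height: u32,
    z_position: i32,
    autoclose: AutoclosePolicy,
}

impl WindowBuilder {
    /// Starts from the defaults (values chosen arbitrarily here).
    fn new() -> Self {
        Self { width: 100, height: 100, z_position: -1, autoclose: AutoclosePolicy::Disable }
    }

    fn width(mut self, width: u32) -> Self {
        self.width = width;
        self
    }

    /// Same effect, written with struct update syntax.
    fn height(self, height: u32) -> Self {
        Self { height, ..self }
    }

    fn z_position(mut self, z_position: i32) -> Self {
        self.z_position = z_position;
        self
    }

    fn autoclose_enable(mut self) -> Self {
        self.autoclose = AutoclosePolicy::Enable;
        self
    }

    /// Consumes the builder and produces the final value.
    fn build(self) -> WindowHandle {
        WindowHandle {
            width: self.width,
            height: self.height,
            z_position: self.z_position,
            autoclose: self.autoclose,
        }
    }
}

fn build_example() -> WindowHandle {
    WindowBuilder::new()
        .width(500)
        .z_position(2)
        .autoclose_enable()
        .build()
}
```

Each setter takes `self` by value and returns it, which is what makes the chained call in `build_example` possible.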

Conclusion and Application

Rust has a lot of idioms that are different from those in other programming languages. I often see proposals from new Rustaceans to add default parameters – and other similar features – to Rust, and these new Rustaceans are confused that the strong demand they feel is not as widely felt in the greater Rust community.

And normally, it’s similar to this situation with default parameters. There are alternative idioms that accomplish the same goals, to the extent that those goals are in line with Rust’s values: in this case, DRYness, and reducing developers’ cognitive loads. They are also better solutions in some other ways, according to Rusty values: the additional explicitness is worth a little more verbosity.

But often, the new Rustaceans making these proposals are unaware of the Rusty way of doing things. And if they are aware of it, they are approaching it from the goals of other programming languages, and don’t see how the solution measures up.

So I hope this can serve as a case study to help people understand that there often are Rusty ways of accomplishing the goals of popular features from OOP land, and why Rustaceans prefer these solutions to blind accumulation of features.

Rust Is Beyond Object-Oriented, Part 1: Intro and Encapsulation

2022-12-12T00:00:00+00:00

Rust is not an object-oriented programming language.

Rust may look like an object-oriented programming language: Types can be associated with “methods,” either “intrinsic” or through “traits.” Methods can often be invoked with C++ or Java-style OOP syntax: map.insert(key, value) or foo.clone(). Just like in an OOP language, this syntax involves a “receiver” argument placed before a . in the caller, called self in the callee.

But make no mistake: Though it may borrow some of the trappings, some of the terminology and syntax, Rust is not an object-oriented programming language. There are three pillars of object-oriented programming: encapsulation, polymorphism, and inheritance. Of these, Rust nixes inheritance entirely, so it can never be a “true” object-oriented programming language. But even for encapsulation and polymorphism, Rust implements them differently than OOP languages do – which we will go into in more detail later.

This all comes as a surprise and an adjustment to a lot of programmers. I see Rust newbies on Reddit asking how to implement OOP design patterns literally, trying to get “class hierarchies” like “shapes” or “vehicles” working with traits standing in as “the Rust version of inheritance” – in other words, trying to solve problems they only have because they’re committed to the OOP approach, and doing contrived OOP examples to try to learn what they expect to be just another version of it.

It’s a stumbling block for many. I regularly see “lack of OOP” mentioned on the Internet by Rust newbies and sceptics as a reason Rust is hard to adjust to, or not a good fit for them, or even why it will never catch on. For people who learned to program at the height of OOP as a trend – when perfectly good languages like C and ML had to become object-oriented as Objective-C and OCaml – the amount of hype about a non-OOP language just feels off.

It’s not an easy adjustment either. So many programmers learned software design and architecture in an explicitly object-oriented way. I see question after question where a beginning or intermediate Rust programmer wants to do an object-oriented thing, and wants a literal Rust equivalent. Often, these are examples of the XY problem, and they have trouble backtracking and approaching the problem in a more Rusty way.

But that isn’t Rust’s fault. The answer is still for us to adjust, even if it isn’t easy; being proficient in not only multiple languages but also different programming paradigms makes us better programmers.

And, as a paradigm, OOP is actually thoroughly mediocre – so much so that I’m writing a whole blog series to explain why, and why Rust’s approach is better.

OOP Ideology

Look, I get it. I used to drink the OOP Kool-Aid myself. I remember how it was billed to us: not as just a set of code organization practices, but a revolution in programming. The OOP way was held up as more intuitive, especially to non-programmers, because it would align better with how we think of the natural world.

For an archetypical example of this marketing, here is an excerpt from the first public article about OOP in a popular magazine (Byte Magazine, in 1981):

Many people who have no idea how a computer works find the idea of object-oriented programming quite natural. In contrast, many people who have experience with computers initially think there is something strange about object oriented systems.

It was pretty easy to buy into, as well. Of course, our everyday life doesn’t have anything like subroutines or variables – or, to the extent that it does, we don’t think about them explicitly! But it does have objects that we can interact with, each with its own capabilities. How could it not be more intuitive?

It’s very compelling pseudo-cognitive science, light on research, heavy on really persuasive rationales. The objects can be thought of as “agents,” almost as people, and so you could leverage your social skills towards it instead of just analytical thinking (never mind that objects act nothing like people, and are actually substantially dumber in a way that still requires analytical thinking). Or, you can think of objects and classes as an almost-Platonic representation of the world of forms itself, making it philosophically compelling.

And oh, how I bought in, especially in my wanton and reckless youth. I personally soaked up the connection between OOP and Platonic philosophy. I delved deep into meta-object protocols, and the fact that in Smalltalk every class had to have a metaclass. The Smalltalk expression Metaclass class felt almost mystical to me, as did the notion that any value could be organized in the same hierarchy, with Object at its root.

I remember reading in a book that OOP-style polymorphism made if-else statements redundant, and therefore we should strive to ultimately only use OOP-style polymorphism. Somehow, instead of putting me off, this excited me at the time. I was even more excited when I learned that Smalltalk in fact does this (if you ignore implementation details that optimize away some of this abstraction): In Smalltalk, the concept of if-then-else is implemented via methods like ifTrue: and ifFalse: and ifTrue:ifFalse: on the single-instance True and False classes, with their global objects, true and false.
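To make that concrete, here is a rough Rust approximation of the idea. It is my own sketch, not how Smalltalk actually implements it, and the names SmalltalkBool, True, False, and describe are made up for illustration. The branch is chosen by dynamic dispatch on the receiver, with no if statement in sight:

```rust
/// A Smalltalk-flavored boolean (hypothetical trait for this sketch):
/// the "branching" happens through dynamic dispatch on the receiver.
trait SmalltalkBool {
    fn if_true_if_false(
        &self,
        on_true: &dyn Fn() -> String,
        on_false: &dyn Fn() -> String,
    ) -> String;
}

struct True;
struct False;

impl SmalltalkBool for True {
    fn if_true_if_false(
        &self,
        on_true: &dyn Fn() -> String,
        _on_false: &dyn Fn() -> String,
    ) -> String {
        // The "true" object only ever runs the first block.
        on_true()
    }
}

impl SmalltalkBool for False {
    fn if_true_if_false(
        &self,
        _on_true: &dyn Fn() -> String,
        on_false: &dyn Fn() -> String,
    ) -> String {
        // The "false" object only ever runs the second block.
        on_false()
    }
}

/// Picks a message without writing `if`: the object decides which closure runs.
fn describe(flag: &dyn SmalltalkBool) -> String {
    flag.if_true_if_false(&|| "yes".to_string(), &|| "no".to_string())
}
```

Calling describe(&True) runs only the first closure, and describe(&False) runs only the second.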

As a more mature programmer, exposed to the less ideological OOP of C++ and the alternative of functional programming in Haskell, my positions softened, and then shifted dramatically, and now I am barely a fan of OOP at all, especially as its best ideas have been carried on to a newer synthesis in Haskell and Rust. I’ve realized that this hype about new programmers is typical for any paradigm; any new programming paradigm is more intuitive for a newbie than it is for someone who’s a veteran programmer in a different paradigm. The same thing is said for functional programming. The same thing is even said for Rust. It really doesn’t have that much to do with whether a paradigm is better.

As for if statements being fully replaceable by polymorphism, well, it’s easy to come up with a set of primitives that are Turing-complete. You can simulate if statements with polymorphism, true. You can also simulate while loops with recursion, or recursion with while loops and an explicit stack. You can simulate if statements with while loops.
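As a small illustration of one of these substitutions, here is a sketch of the same computation written twice: once with recursion, and once with a loop and an explicit stack. The Tree type and both function names are invented for this example.

```rust
/// A tiny binary tree, made up for this illustration.
enum Tree {
    Leaf(i64),
    Node(Box<Tree>, Box<Tree>),
}

/// Sums the leaves using ordinary recursion.
fn sum_recursive(tree: &Tree) -> i64 {
    match tree {
        Tree::Leaf(value) => *value,
        Tree::Node(left, right) => sum_recursive(left) + sum_recursive(right),
    }
}

/// Sums the leaves with a loop and an explicit stack standing in for
/// the call stack.
fn sum_with_stack(tree: &Tree) -> i64 {
    let mut total = 0;
    let mut stack = vec![tree];
    while let Some(current) = stack.pop() {
        match current {
            Tree::Leaf(value) => total += *value,
            Tree::Node(left, right) => {
                // Instead of recursing, remember the subtrees for later.
                stack.push(left);
                stack.push(right);
            }
        }
    }
    total
}
```

Both functions compute the same sum; they differ only in who manages the bookkeeping.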

None of these facts make such substitutions a good idea. Different features exist in a programming language for different situations, and making them distinct is actually a good thing, in moderation.

After all, the point of programming is to write programs, not to make proofs about Turing-completeness, do philosophy, or write conceptual poetry.

Practicality

So, in this blog series, I intend to evaluate OOP in practical terms, as a programmer with experience in what makes programming languages cognitively more manageable or easy to do abstraction in. I will do it in terms of my experience solving actual programming problems – I see it as a bad sign that many examples of how OOP abstractions work only make sense in really advanced programs or with contrived examples about different types of shapes or animals in a zoo.

And unlike most introductions to OOP, I will not primarily be focusing on how OOP compares to pre-OOP programming languages. I will instead be comparing to Rust, which takes many of the good ideas from OOP, and perhaps also to functional programming languages like Haskell. These programming languages have taken some of OOP’s good ideas, but transformed them in a way that fixes some of their flaws and moves them beyond what can reasonably be called OOP.

I will organize this comparison according to the three traditional pillars of object-oriented programming: encapsulation, polymorphism, and inheritance, with this first article focusing on encapsulation. For each pillar, I will discuss how OOP defines it, what equivalents or substitutes exist outside of the OOP world, and how these compare for practical ease and power of programming.

But before I jump in, I want to talk a second about a use case that turns much of this on its head: graphical user interfaces or GUIs. Especially before the era of the browser, writing GUI programs to run directly on desktop (or laptop) computers was a huge part of what programmers did. A lot of early development of OOP was done in tandem with research into graphical user interfaces at Xerox PARC, and OOP is uniquely well-suited for that use case. For this reason, the GUI deserves special consideration.

For example, it is common for people to emulate OOP in other programming languages. Gtk+ is a huge example of this, implementing OOP as a series of macros and conventions in C. This is done for many reasons, including familiarity with OOP designs and a desire to create some kind of run-time polymorphism. But in my experience, this is most common when implementing a GUI framework.

In this series of articles, we will primarily focus on applying OOP to other use cases, but we will also discuss GUIs as appropriate. In this introductory section, I will just point out that GUI frameworks are clearly possible outside traditional OOP designs and programming languages, and even in Rust. Sometimes, they work by completely different mechanisms, like the functional-reactive programming mostly pioneered in Haskell, which I personally prefer to traditional OOP-based programming and for which traditional OOP features would not be helpful.

Now, without further ado, let us compare OOP to Rust and other post-OOP programming languages, pillar by pillar, from a pragmatic perspective. For the rest of this first post, we will focus on encapsulation.

First Pillar: Encapsulation

In object-oriented programming, encapsulation is bound up with the idea of a class, the fundamental layer of abstraction in object-oriented programming. Each class contains a layout for some data in a record format, that is, a data structure where each instance contains a set number of fields. Individual instances of the record type are known as “objects.” Each class also contains code that is tightly paired to that record type, organized into procedures called methods. The idea is then that all of the fields will only be accessible from inside the methods, either by the conventions of OOP ideology or by the enforced rules of the programming language.

The fundamental benefit here is that the interface, which is how the code interacts with other code, or what you have to understand to use the code, is much simpler than the implementation, which consists of the more fluidly changing details of how the code actually accomplishes its job.

But of course, lots of programming languages have abstractions like this. Any program longer than a dozen lines has too many parts to keep in your brain all at once, and so all remotely modern programming languages have ways of dividing a program into smaller components, as a way to manage the complexity, so that the interface is simpler than the implementation, whether enforced by the programming language or a matter of the “honor system.” So in a broader sense of the word, all modern programming languages have some version of encapsulation.

One simple form of encapsulation – one that most object-oriented programming languages maintain as a layer within the class – is procedures, also known as functions, subroutines, or (as OOP calls them) methods. Rather than allow any line of code to jump to any other line of code, modern programming languages tend to group blocks of code together into procedures, and you can then change the contents of the procedure without affecting the outside code, and change the outside code without affecting the procedure, as long as they follow the same interface and contract.

The contract is usually at least partially a human-level convention. There’s not usually much stopping you from taking a procedure that is supposed to process some data and making it instead loop indefinitely or crash the program. But some of it, like the separation of the procedure from the rest of the program, and in many cases the number and types of values it is allowed to accept and return in an invocation, will be enforced by the programming language.

For example, variables declared inside the procedure are usually local, and there’s generally no way to reference them outside the procedure. The inputs and outputs are usually listed in a signature at the top of the procedure. Normally, outside code can only enter the procedure on its first line, rather than on an arbitrary line half-way through. In some programming languages – including Rust – procedures can even contain other procedures, which can only be called within the outer procedure.
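Here is a minimal Rust sketch of these rules. The function names are made up for illustration; the point is the nested procedure and the local variable, neither of which is reachable from outside.

```rust
/// Returns the average of the squares of `values`, or `None` if the
/// slice is empty. The signature lists every input and output.
fn mean_of_squares(values: &[f64]) -> Option<f64> {
    // A procedure nested inside another: `square` can only be called
    // from within `mean_of_squares`.
    fn square(x: f64) -> f64 {
        x * x
    }

    if values.is_empty() {
        return None;
    }

    // `total` is local: code outside this function cannot name it.
    let total: f64 = values.iter().map(|&x| square(x)).sum();
    Some(total / values.len() as f64)
}
```

Trying to call square or read total from anywhere else in the program is a compile error, which is exactly the enforcement described above.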

But of course, modern programs are often more complicated than a mere handful of procedures. And so, modern programming languages (and again, the word “modern” here is being used in a very loose way) have another layer of encapsulated abstraction: modules.

Modules will generally contain a group of procedures, some externally accessible, and some not. And in non-duck typed languages, they will generally define a number of aggregate types, again some externally accessible, and some not. It is generally even possible to expose these types abstractly, so the existence of a type is accessible to the rest of the program, but not the record fields, or even the fact that it is a record type. Even C has this ability in its module system – C++ did not introduce it, just added an additional, orthogonal level of field-by-field access controls.
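As a sketch of what this looks like in Rust, here is a hypothetical counter module, written with free functions to stay close to how one would do it in C. The module exposes the existence of the Counter type, but not its field, and it keeps one helper procedure to itself.

```rust
mod counter {
    /// The type's existence is public, but its field is not: code outside
    /// this module cannot tell how the count is stored.
    pub struct Counter {
        count: u64,
    }

    /// Externally accessible procedures.
    pub fn new() -> Counter {
        Counter { count: 0 }
    }

    pub fn increment(counter: &mut Counter) {
        bump(counter, 1);
    }

    pub fn value(counter: &Counter) -> u64 {
        counter.count
    }

    // Not `pub`: a helper procedure hidden inside the module.
    fn bump(counter: &mut Counter, by: u64) {
        counter.count += by;
    }
}
```

Outside code can hold a counter::Counter and pass it back to the module’s functions, but writing c.count or calling counter::bump will not compile.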

Seen from my pragmatic point of view, class-based encapsulation is not some special insight of OOP, but a specialized – or rather, tightly restricted – form of module. In an OOP programming language, we have this notion of a class, which is a special form of module (sometimes the only supported form, or sometimes even layered underneath a completely different, more traditional notion of module, for extra confusion). It’s just that, for a “class,” there can only generally be one primary type defined, which shares a name with the module itself, and where the fields of that type are given special protection against access by code outside the class.

Of course, there are other differences between a class and a module, but these have to do with the other pillars, and we will get to them later. For right now, we will just discuss the idea of a “class” as it relates to encapsulation – where a class is just a special module with one privileged, abstracted type.

And this is a reasonable way to write a module, but it’s not as special as object-oriented programming makes it out to be (especially once we discuss alternative approaches to the other pillars, but again, more later). There are some situations where a module doesn’t have any record type that it defines, which is awkward in programming languages like Java, where you have to define an empty record type anyway and still make a “class.” There are also situations in which a module defines multiple publicly accessible types that are tightly entangled – and where the encapsulation between those types that OOP style would encourage you to do is more of a hindrance than a help.

Fundamentally, being able to hide the fields of a record from other modules is important, which is why even C supports it. It is even essential for implementing safe abstractions over unsafe features in Rust, such as for collections, where raw pointers have invariants in combination with other fields in the same record. But it is not new to OOP, and it is simply not the best choice for every possible type.
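To show why hidden fields matter for unsafe code, here is a deliberately tiny sketch. Prefix is a made-up type, and I use unchecked slicing as a stand-in for the raw pointers a real collection would hold; the principle is the same, in that one field is only safe to use in combination with another.

```rust
mod prefix {
    /// Invariant (for this hypothetical type): `len <= data.len()`.
    pub struct Prefix {
        data: Vec<u8>,
        len: usize,
    }

    impl Prefix {
        /// Keeps at most `len` bytes of `data` visible, clamping `len`
        /// so that the invariant holds from the start.
        pub fn new(data: Vec<u8>, len: usize) -> Self {
            let len = len.min(data.len());
            Prefix { data, len }
        }

        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: `len <= data.len()` is established in `new`, and no
            // code outside this module can modify either field.
            unsafe { self.data.get_unchecked(..self.len) }
        }
    }
}
```

If len were a public field, any caller could set it past the end of data and turn as_slice into undefined behavior. That is a case where hiding fields earns its keep; plenty of types have no such invariant to protect.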

As evidence of this, in Java and Smalltalk, and to a lesser extent even in C++ or Python, the insistence on a one-type-per-class style of encapsulation means that you get these boilerplate methods like setFoo and getFoo. These methods do nothing but serve as field accessors for something that is fundamentally a dumb record type. In theory, this helps you if you want to change what happens when these fields are set or read, but in practice, the fact that they are raw field accessors is part of the contract. If they, for example, instead made a network call rather than just returning a value, that would strongly violate the principle of least surprise for such simply named methods.

It is far simpler to say:

pub struct Point {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

… than the Java idiomatic “JavaBean” equivalent from when I was a Java programmer (Java has apparently changed since then, but this is representative of many OOP programming languages including Smalltalk and many books on how to program):

class Point {
    private double x;
    private double y;
    private double z;

    double getX() {
        return x;
    }

    void setX(double x) {
        this.x = x;
    }

    double getY() {
        return y;
    }

    void setY(double y) {
        this.y = y;
    }

    double getZ() {
        return z;
    }

    void setZ(double z) {
        this.z = z;
    }
}

Such data types generally don’t use any of the other features that OOP classes get, such as polymorphism or inheritance. To use such features in such “JavaBean” classes would also violate the principle of least surprise. The “class” concept is overkill for these record types.

And of course, a Java developer (or Smalltalk, or C#) will say that by accessing the fields indirectly through these getter and setter methods, they are future-proofing the class, in case the design changes (and in fact I was reminded to add this paragraph when someone on Reddit made exactly this point). But I find this disingenuous, or at least misguided – it is often used for structures internal to a portion of the program, where the far more reasonable thing to do would be to change the fields openly to all users of the structure. It is also extremely difficult to think of an unsurprising thing for these methods to do besides literally set or get a field, as the method name implies – making a network call, for example, would be a shocking surprise for a get or set method and therefore a violation of at least the implicit contract. In my time programming in object-oriented programming languages, I never once saw a situation where it was appropriate for a getter or setter to do anything but literally get or set the field.

If code does change to require the getter or setter to do something else, I would rather change the name of the method to reflect what else it does, rather than pretend that’s somehow not a breaking change. fetchZFromNetwork or setAndValidateZ seem more appropriate than a getZ or setZ that does something more than the simple field access that we assume a setter or getter does. OOP’s insistence that every type should be its own code abstraction boundary is often absurd when applied to these lightweight aggregate types. These sorts of getters and setters are used to protect an abstraction boundary that shouldn’t exist and just gets in the way, and future-proof against implementation changes that shouldn’t be made without also changing the interface.

Setters and getters, in short, are an anti-pattern. If you intend to create an abstraction besides “data structure,” where validation or network calls or anything else beyond raw field accesses would be appropriate, then these get and set names are the wrong names for that abstraction.

Edit 2023-02-13 to add this paragraph: To be clear, these objections apply to properties as well. It’s not the syntactic inconvenience that I object to, but the entire notion that replacing field accesses with code transparently is a good thing to strive for, or an important possibility to leave open. I should hope that foo.bar = 3 would never make a network call in Rust! And what if it had to be async? It should be clear if I’m calling a function. Rust is about explicitness.

The get and set functions, in reality, are only used as wrappers to satisfy the constraints of object-oriented ideology. The future-proofing they purportedly provide is an illusion. If you provide “JavaBean” style types, or types with properties, over an abstraction boundary, you are in practice just as locked in as if you’d provided raw field access – the changes you are most likely to want to make to those structures would not allow shifting the getters and setters to maintain compatibility. Leveraging this future-proofing is likely to be completely impossible for the changes you’d want to make, and at best it would involve a horrendous hack.

Rust might seem to be the same as OOP languages in all of this; it superficially looks like it has something very similar to classes. You can define functions associated with a given type – and they are even called methods! Like OOP methods, they syntactically privilege taking values of that type (or references to those values) as the first argument, called the special name self. You can even mark fields of a record type (called struct in Rust) as public or (by default) private, encouraging private fields just like in an object-oriented programming language.

According to this pillar, Rust seems pretty close to being OOP. And that’s a fair assessment, for this pillar, and an intentional choice to make Rust programming more comfortable to people used to the everyday syntax of OOP programming in C++ (or Java, or JavaScript).

But the similarity is only skin-deep. Encapsulation is the least distinct pillar of OOP (after all, all modern programming languages have some form of it), and the implementation in Rust is not bound with the type. When you declare a field private in Rust (by not specifying pub), that doesn’t mean private to its methods, that means private to the module. A module can provide multiple types, and any function in that module, whether a “method” of that type or not, can access all of the fields defined in that type. Passing around records is encouraged when appropriate, rather than discouraged to the point that accessors are forced instead, even in tightly-bound related code.
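Here is a minimal sketch of what module-level privacy allows. The inventory module and everything in it are invented for this example. Two types live in one module, and free functions that are “methods” of neither type read the private fields of both.

```rust
mod inventory {
    pub struct Item {
        name: String,
        quantity: u32,
    }

    pub struct Shelf {
        items: Vec<Item>,
    }

    pub fn new_item(name: &str, quantity: u32) -> Item {
        Item { name: name.to_string(), quantity }
    }

    pub fn new_shelf(items: Vec<Item>) -> Shelf {
        Shelf { items }
    }

    /// Not a method of either type, yet it reads private fields of both,
    /// because privacy is per-module rather than per-type.
    pub fn total_quantity(shelf: &Shelf) -> u32 {
        shelf.items.iter().map(|item| item.quantity).sum()
    }

    /// Same story: plain field access across two types, no accessors.
    pub fn fullest_item_name(shelf: &Shelf) -> Option<&str> {
        shelf
            .items
            .iter()
            .max_by_key(|item| item.quantity)
            .map(|item| item.name.as_str())
    }
}
```

Code outside the module still cannot write shelf.items or item.quantity, so the abstraction boundary exists; it just sits around the module instead of around each type.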

This is the first sign we see that Rust, in spite of its superficial syntax, is not an OOP programming language.

Future Posts

And at this point I’m going to have to pause for today.

Of course, encapsulation isn’t the only fancy thing OOP-style classes can do. If it were, classes wouldn’t have enamored so many people: it would simply be obvious to everyone that classes were nothing more than glorified modules, and methods nothing more than glorified procedures.

In the next posts of this series, we will discuss the other features associated with OOP, the two remaining traditional pillars of OOP, polymorphism and inheritance, analyze them from a practical point of view, and see how Rust compares with OOP as it comes to those pillars.

Next up will be polymorphism!

How to Write a JIRA Ticket in ... Relatively Few Steps

2022-10-31T00:00:00+00:00

If you’re confused by how to use JIRA effectively, do not worry! If you learn this process, which is ~~very simple~~ not literally impossible, you too can become ~~good at JIRA~~ ~~passingly competent at JIRA~~ not liable to being fired for being bad at JIRA.

Here are the steps:

Create personal TODO item to write JIRA ticket
- Accumulate requirements for JIRA ticket in personal notes
  - Often more complicated than the feature itself
    - This is the System Working™
- Write TODO items strategizing how to:
  - Share the JIRA ticket with other people
  - Connect it properly with other JIRA tickets
    - Advanced: Also epics, projects, or other meta-JIRA constructs
Write JIRA ticket
- Fail to understand what any of the fields are for
  - Oh, they’re required?
- Ask random people for appropriate values for required fields
  - Sometimes they never get back to you
  - Or they get back two days from then
  - In the meantime, forget you were writing a JIRA ticket
    - And then get reminded only by personal TODO list item
      - You did write one of those, right?
- Curse the names of whoever designed the schema
  - Find out it’s someone you actually liked
    - It made sense at the time
      - No, it cannot be changed now
Do follow up connecting JIRA ticket to other people’s JIRA
- Argue with people about whether JIRA set-up appropriate
- Reconcile said arguments
Relitigate everything at next stand-up meeting
- Potentially go back to beginning to write JIRA ticket again
Be too tired to code anymore
- What even is code?

First Impressions of Asahi Linux

2022-10-24T00:00:00+00:00

I bought my M1 Mac over a year ago with the intention of installing Asahi Linux on it, but I never got around to it until now. I am still thrilled to be using an ARM workstation made by a major computer manufacturer, and it’s good to be able to run the operating system of my choice on it (though macOS is acceptable for entertainment and video calls, Linux is what I work and do my organization in). And I don’t particularly do GPU-intensive things in my day to day computing – I run XMonad, of all things! – so I don’t really feel like I’m missing out by not having a “proper” graphics driver.

Installation

The Asahi Linux installation process, in spite of some dire warnings, was relatively friendly. It was a “wizard” process rather than a series of instructions to run individual commands that I would have to read off a website. Wizard is definitely better, because those instruction series almost always contain mistakes, assumptions of things you’d “obviously” do, or un-fleshed-out untested alternatives; NixOS in particular has stolen from me many hours of frustration I’ll never get back (and hours later of fixing configuration issues that resulted just from me following instructions from official materials).

So, I guess simply because I’m comparing it to NixOS, Asahi Linux felt extremely easy to install! I didn’t even mind that there wasn’t a concrete recommendation for how much space to give each operating system (although I would have appreciated it). The installer did, however, do two things that annoyed me.

The first thing was that it asked me if I wanted to enter some sort of an expert mode. It said that the questions it would ask in that mode were only interesting to developers, and while normally that would be “yes, absolutely me,” in this case I think they meant “developers of Asahi Linux” – so, not me. I wanted to say y out of curiosity, but I didn’t want to actually choose any wrong option and risk bricking my laptop – which I don’t think were the actual stakes, but I wasn’t entirely sure.

I really hope that if I’d said y, it would have been okay. I would hope that the default option in each “advanced” prompt would be the same as what I’d get if I didn’t do advanced options, but I didn’t really trust them to do that, and it was intimidating.

I’d much rather they said what the advanced options actually did, and reassured me that you could always go with the pre-set defaults if you were unsure, rather than just ask me if I wanted to do “expert mode.”

So that was a little annoying.

The second thing that annoyed me was something that the designers have definitely put some thought into, and I’m befuddled how they arrived where they did.

So, there is one point where the computer is turned off, and you must follow the instructions on how to turn the computer back on very carefully and particularly, or else there be dragons, because if you don’t boot it into recovery mode for the first boot, then Linux will never install.

That isn’t the problem. I appreciate them communicating the stakes, and communicating how it works. I’m sure it’s not their fault that you have to do this extra step, but rather something to do with how the M1’s firmware works. However, I am befuddled why they provide the instructions in the most detail on the laptop where you’re currently installing it – you know, a screen that’s immediately going to disappear as soon as you turn the computer off. There were 7 steps!

It appears that I was expected to:

- Read all 7 steps very carefully
- Memorize them (carefully!) and remember them when I turned the computer back on

Now, I have ADHD, so my short-term prospective memory is very poor. There’s also a high chance that I’ll get distracted while the computer’s off, and will have to come back to the turning-it-on step later. But even a neurotypical person can’t be expected to reliably remember how to do 7 steps carefully.

I took a picture of the instructions with my phone. I think they should have:

- Suggested writing down or taking a picture of the instructions, because “careful” is likely not good enough for many people.
- Included all 7 instructions in detail on the website, so if you fail to write it down, you get more than this condensed summary:

Once the first stage of the installation is done, you will have to reboot into 1TR mode (One True recoveryOS) in order to finish the install. Read the instructions that the installer prints carefully! Simply rebooting into the new OS won’t work until this is done. You need to fully shut down your machine, then boot by holding down the power button until you see “Entering startup options”, choose your new OS in the boot selector menu, and follow the prompts.

The website references the transient “instructions that the installer prints.” If anything, the installer should direct you to the website, which should give the instructions in equal detail to how the installer gives them.

In any case, what I actually did was panic, close the laptop, panic again, open it again, realize that made it turn on, and hold down the power button – which worked, in spite of blatantly violating the instructions. So maybe warn people not to close and then re-open the laptop, while you’re at it?

… Perhaps it’s moves like this that prevented me from installing NixOS correctly, where they just kind of assume you wouldn’t do something that dumb.

First Boot

I haven’t dual booted a computer since I lived with my parents, and either had to share a computer with them (my Linux partition and their Windows) or later when I only had one computer that I could use in full privacy, but needed both Linux and a more “normal” OS – thus an iBook which ran Mac OS and a PowerPC version of Ubuntu. Even when I ran FreeBSD and other out-there OSes, I had a dedicated (old) full-tower desktop to run it from.

So the idea of dual-booting a “normal” OS that comes with the computer and the more “edgy” programmer-friendly OS that is Linux is quite nostalgic for me. I wondered whether there was any way to refer to macOS with capitalism-criticizing character substitutions a la Mi¢ro$oft: maybe macO$? And to be honest, I was even a little nervous that my IT/sysadmin skills had rotted a little bit since I was a kid. Even though this installer was bending over backwards to make everything easy, this was an alpha operating system unsupported by the workstation vendor.

But all went well.

Once you have it installed, the computer boots into Asahi Linux. You have to hold down the power button to get the boot menu – it uses a firmware-based boot manager to distinguish macOS and Linux. This is a little annoying, as I prefer being asked what operating system I want to boot every time in a dual boot set up, but I can deal with it.

The first boot requires a few remaining setup steps to select keyboard layout, language, and time zone, and also to name the computer and set up a default user. I named the computer protectorate as part of my forms-of-government naming scheme (my Dell laptop is palatinate), and in reference to the fact that this is Linux acting in somewhat foreign territory, claimed by another Unix.

Once this setup was complete, I turned on Wi-Fi, which to my mild surprise worked immediately and like a charm, from the KDE-based graphical Wi-Fi menu.

I mean, in all honesty, I kind of knew it would work the first time – that was the point – but I was still viscerally surprised. I guess I am used to the idea of getting Linux to run on a “new” or “odd” platform being an issue of chasing down driver after driver, so I’m happy that I have a distribution designed for basically exactly the computer that I have, even if it’s not a computer particularly associated with Linux.

Then, as soon as I’d verified that Linux worked, I very nervously rebooted the whole thing into macOS – which also worked. Yay!

Next Steps

So that’s where I am now.

To get my normal Linux workflow set up, I’m going to need XMonad and Dropbox. This should be interesting, as I understand neither of those things are Arch Linux packages on ARM, and Dropbox isn’t supported on Linux ARM at all (though you can maybe use their APIs directly to implement a janky home version?)

So, when I get that all set up, I will let you know in another post!

Pictures will come with the next blog post.

I make no promises as to schedule.

RAII: Compile-Time Memory Management in C++ and Rust

2022-10-11T00:00:00+00:00

I don’t want you to think of me as a hater of C++. In spite of the fact that I’ve been writing a Rust vs C++ blog series in Rust’s favor (in which this post is the latest installment), I am very aware that Rust as it exists would never have been possible without C++. Like all new technology and science, Rust stands on the shoulders of giants, and many of those giants contributed to C++.

And this makes sense if you think about it. Rust and C++ have very similar goals. The C++ community has done a lot over all these years to pioneer new programming language features in line with those goals. C++ has then given these features years to mature in its humongous ecosystem. And because Rust also doesn’t have to be compatible with C++, it can then steal those features without some of the caveats they come with in C++.

One of the biggest such features – perhaps the biggest one – is RAII, C++’s and now Rust’s (somewhat oddly-named) scope-based feature for resource management. And while RAII is for managing all kinds of resources, its biggest use case is as part of a compile-time alternative to run-time garbage collection and reference counting.

As an alternative to garbage collection, RAII has deficits. While many allocations are created and freed neatly in line with variables coming in and out of scope, sometimes that’s not possible. To fully compete with garbage collection and capture the diverse ways programs use the heap, RAII needs to be combined with other features.

And C++ has done a lot of this. C++ added move semantics in C++11, which Rust also has – though cleaner in Rust because Rust was designed with them from the start and so it can pull off destructive moves. C++ also has opt-in reference counting, which, again, Rust also has.

But C++ still doesn’t have lifetimes (Rust got that from Cyclone, which called them “regions”), nor the infamous borrow checker that goes along with them in Rust. And even though the borrow checker is perhaps the most hated part of Rust, in this post, I will argue that it brings Rust’s RAII-centric compile-time memory management system much closer to feature-parity with run-time reference counting and other run-time garbage-collection technologies.

I will start by talking about the problem that RAII was originally designed to solve. Then, I will re-hash the basics of how RAII works, and work through memory usage patterns where RAII needs to be combined with these other features, especially the borrow checker. Finally, I will discuss the downsides of these memory management techniques, especially performance implications and handling of cyclic data structures.

But before I get into the weeds, I have some important caveats:

Caveat: No Turing-complete programming language can completely prevent memory leaks. Even in fully-GC’d languages, you can still leak memory by filling up a data structure with increasing amounts of unnecessary data. This can be done by accident, especially when sophisticated callback systems are combined with closures. This is out of the scope of this post, which only concerns memory management issues that automated GC can actually help with.

Caveat #2: Rust allows you to leak memory on purpose, even when a garbage collector would have reclaimed it. In extreme circumstances, the reference counting system can be abused to leak memory as well. This fact has been used in anti-Rust rhetoric to imply its memory safety system is somehow worthless.

For the purposes of this post, we assume a programmer who is trying to get actual work done and needs help not leaking memory or causing memory corruption, not an adversarial programmer trying to make the system leak on purpose.
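To make Caveat #2 concrete, here is a minimal Rust sketch of both kinds of deliberate leak: `std::mem::forget`, which is safe code that skips a destructor, and an `Rc` that points at itself. The `Noisy` and `Node` types and the function names are made up for this illustration; the shared counter is only there so that destructor calls can be observed.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Increments a shared counter when dropped, so destructor calls are observable.
struct Noisy {
    drops: Rc<Cell<u32>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// A node that can hold a counted reference to another node (or to itself).
struct Node {
    _payload: Noisy,
    next: RefCell<Option<Rc<Node>>>,
}

/// Returns how many destructors ran for a value that simply went out of scope.
fn drops_when_dropped_normally() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _value = Noisy { drops: Rc::clone(&drops) };
    } // `_value` goes out of scope here and its destructor runs
    drops.get()
}

/// Returns how many destructors ran for a value passed to `mem::forget`.
fn drops_when_forgotten() -> u32 {
    let drops = Rc::new(Cell::new(0));
    let value = Noisy { drops: Rc::clone(&drops) };
    std::mem::forget(value); // safe code: ownership is taken, the destructor is skipped
    drops.get()
}

/// Returns how many destructors ran for a node that points at itself.
fn drops_when_in_rc_cycle() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let node = Rc::new(Node {
            _payload: Noisy { drops: Rc::clone(&drops) },
            next: RefCell::new(None),
        });
        // The node now keeps itself alive: its count can never reach zero.
        *node.next.borrow_mut() = Some(Rc::clone(&node));
    }
    drops.get()
}
```

Only the first function reports a destructor call. Neither leak happens by accident in ordinary code, which is why the rest of this post sets them aside.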

Caveat #3: RAII is a terrible name. OBRM (Ownership-Based Resource Management) is used in Rust sometimes, and is a much better name. I call it RAII in this article though, because that’s what most people call it, even in Rust.

The Problem: Manual Memory Management is Hard, GC is “Slow”

Caveat: To be clear, “slow” here is an oversimplification, and I address that more later. I mean it as a tongue-in-cheek way of saying that it has performance costs, whereas Rust and C++ try to adhere to a zero-cost principle.

So. C-style manual memory management – “just call free when you’re done with the allocation” – is error prone.

It is error prone when it is easy and tedious, because programmers can make stupid mistakes and just forget to write free and it isn’t immediately broken. It is error prone when multiple programmers work together, because they might make different assumptions about who is supposed to free something. It is error prone when multiple parts of the code need to use the same data, especially when that usage changes with new requirements and new features.

And the consequences of doing it wrong are not just memory leaks. Use-after-free can lead to memory corruption, and bugs in one part of the program can abruptly show up when allocation patterns change somewhere else entirely.

This is a problem that can be solved with discipline, but like many tedious clerical disciplines, it can also be solved by computer.

It can be solved at run-time, which is what garbage collection and reference counting do. These systems do two things:

They keep allocations from lasting too long. When memory becomes unreachable, it can be reclaimed. This prevents memory leaks.
They keep allocations from being freed early. If memory is still reachable, it will still be valid. This prevents memory corruption.

And for most programmers and applications, this is good enough. And so for almost all modern programming languages, this run-time cost is well worth not troubling the programmer with the error-prone tedious tasks of C-style manual memory management, enabling memory safety and resource efficiency at the same time.

GC (including RC) Has Costs

But there are costs to having the computer do memory management at run-time.

I lump mark-sweep garbage collection and reference counting together here. Both mark-sweep garbage collection and reference counting have costs above C-style manual memory management that make them unacceptable according to the zero-cost principle. GC comes with pauses, and additional threads, in the best case. RC comes with myriad increments and decrements to a reference count. These costs might be small enough to be okay for your application – and that’s well and good – but they are costs, and therefore they can’t be the main memory management model in C++ or Rust.

This is a complicated issue, and so before continuing, here comes another caveat:

Caveat: GC is not necessarily slower, but it does have performance implications that are often unacceptable for situations where C++ (or Rust) is used. To achieve its full performance, it needs to be enabled for the entire heap, and that has costs associated with it. For these reasons, C++ and Rust do not use GC. The details of these performance trade-offs are beyond the scope of this blog post.

A Dilemma

But C++ and Rust are not most programming languages. They face a dilemma:

On the one hand, manual memory management is unacceptably error prone for a high level language, a detail the computer should be able to handle for you.
On the other hand, run-time garbage collection violates a fundamental goal that C++ and Rust share: the zero-cost principle. Code written in these languages is supposed to be as performant as the equivalent manually-written C. To conform to that principle, reference counting (or GC) has to be opt-in (because, after all, sometimes manually written C code does use these technologies).

So, for the vast majority of situations, where a C programmer wouldn’t use reference counting (or mark-sweep), Rust and C++ need something more sophisticated. They need tools to prevent memory management mistakes – that is, to at least partially automate this tedious and error-prone task – without sacrificing any run-time performance.

And this is the reason C++ invented (and Rust appropriated) RAII. Instead of addressing the problem at run-time, RAII automates memory management at compile-time. Analogous to how templates and trait monomorphization can bring some but not all of the power of polymorphism without many of the run-time costs, RAII brings some but not all of the power of garbage collection without constant reference count updates or GC pauses.

But as we will see, RAII as C++ implements it only solves one of the two problems addressed by garbage collection: leaks. It cannot address memory corruption; it cannot keep allocations alive long enough for all the code that could possibly need to use them.

Raw RAII: How RAII Works on its Own

The simplest use case for RAII is underwhelming: it automatically inserts calls to free up heap allocations at the end of the block where we made the allocation. It replaces a malloc/free sandwich from C with simply the allocation side, by inserting an implicit (and unwritten) call to a destructor, which in its simplest version is an equivalent of free. And if that was all RAII did, it wouldn’t be that interesting.

For example, take this C-style (no RAII) code:

#include <stdio.h>  // puts
#include <stdlib.h> // malloc, free

void print_int_little_endian_decimal(int foo) {
    // Little endian decimal print of `foo`
    // i.e. backwards from how we normally write decimal numbers
    // e.g. 831 prints out as "138"

    // Big endian would be too hard
    // Little endian is as always actually simpler platonically,
    // if somehow not for humans.

    // Yes, this only works for positive ints. It's an example.

    char *buffer = malloc(11);
    for(char *it = buffer; it < buffer + 10; ++it) {
        *it = '0' + foo % 10;
        foo /= 10;
        if (foo == 0) {
            it[1] = '\0';
            break;
        }
    }
    puts(buffer); // put-string, not the 3sg verb form "puts"
    free(buffer); // Don't forget to do this!
}

Just using RAII (and unique_ptrs, which are an essential part of the RAII model), but using no other features of C++, we get this very unidiomatic and unimpressive version:

#include <cstdio> // puts
#include <memory> // std::unique_ptr

void print_int_little_endian_decimal(int foo) {
    std::unique_ptr<char[]> buffer{new char[11]};
    for(char *it = &buffer[0]; it < &buffer[10]; ++it) {
        *it = '0' + foo % 10;
        foo /= 10;
        if (foo == 0) {
            it[1] = '\0';
            break;
        }
    }
    puts(&buffer[0]);
}

It doesn’t help us with our random guess of an appropriate buffer size, our awkward redundant attempts to avoid a buffer-overflow, or with any abstraction over the fact that we’re trying to implement a collection.

In fact, it makes the code more awkward, for a benefit that seems hardly worth it, to just automatically call free at the end of the block – which might not even be where we want to call free! We could instead have wanted to return the data to the caller, or inserted it into a bigger, greater data structure, or similar.

It’s a bit less ugly when you use C++’s abstractions. Destructors don’t have to just call free (or rather its C++ analogue delete) as unique_ptr’s does. Any C programmer can tell you that idiomatic C code is rife with custom free functions to free all of the allocations of a data structure, and C++ (and Rust) will choose which destructor to call for you based on the type of the data. Calling free when a custom destructor must be called is a common careless mistake in C. This is true especially among beginners, and (hot take!) making programming languages less needlessly tricky for beginners is a good thing for everybody.
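The same idea can be sketched in Rust, where a custom destructor is written as an implementation of the `Drop` trait. The `TempFile` type below is a made-up stand-in for a resource whose clean-up is more than a plain free (nothing touches the file system); it records its clean-up in a shared log so that the order of the implicit calls is visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log of clean-up actions, so we can see which destructor ran and when.
type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical resource whose clean-up is more than a plain `free`.
struct TempFile {
    name: String,
    log: Log,
}

impl Drop for TempFile {
    fn drop(&mut self) {
        // Custom clean-up, selected by the compiler from the value's type.
        self.log.borrow_mut().push(format!("delete {}", self.name));
    }
}

/// Creates two resources in nested scopes and returns the order of events.
fn run_scopes() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = TempFile { name: "outer.tmp".to_string(), log: Rc::clone(&log) };
        {
            let _inner = TempFile { name: "inner.tmp".to_string(), log: Rc::clone(&log) };
            log.borrow_mut().push("inner block ends".to_string());
        } // `_inner` is dropped here; no explicit call is written
        log.borrow_mut().push("outer block ends".to_string());
    } // `_outer` is dropped here
    let entries: Vec<String> = log.borrow().clone();
    entries
}
```

Calling `run_scopes()` yields the two "block ends" markers, each followed immediately by the matching "delete" entry, even though no clean-up call appears anywhere in the function.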

We can combine RAII with other features of C++ to get this more idiomatic code, with the first do-while loop I’ve written in years:

#include <iostream> // std::cout
#include <string>   // std::string

void print_int_little_endian_decimal(int foo) {
    std::string res;
    do {
        res += '0' + foo % 10;
        foo /= 10;
    } while (foo != 0);
    std::cout << res << std::endl;
}

Does std::string allocate memory on the heap? Maybe it only does if the string goes above a certain size. But the custom destructor, ~std::string, will call delete[] only when the allocation was actually made, abstracting that question away, along with handling terminating nuls and avoiding overruns in a cleaner way.

This ability of RAII – to call custom destructors that abstract away allocation decisions – gets more impressive when we consider that many data structures don’t make just 0 or 1 heap allocations, but whole complicated trees of complicated heap allocations. In many cases, C++ (and Rust) will write your destructors for you, even for complicated types like this:

struct PersonRecord {
    std::string name;
    uint64_t salary;
};

std::unordered_map<std::string, std::vector<PersonRecord>> thing;

To destroy thing in C, you’d have to loop through the hash map, free all the keys, and then free all the values, which then requires freeing all the strings in each PersonRecord before freeing the backing for each vector. Only then could you free the actual allocations backing the hash map.

And perhaps a C-based hash map library could do this for you, but only by assuming that the keys are strings, and then taking a function pointer to know how to free the values, which would ironically be a form of dynamic polymorphism and therefore a performance hit. And the function to free the values would then still have to manually free the string, knowing which field of the PersonRecord was a pointer and duplicating that information between the structure and the manually-written “free” function, and still likely not supporting the small-string optimization that C++ enables.

In C++, this freeing code is all automatically generated. PersonRecord gets an automatic destructor that calls the destructor of each field (uint64_t’s destructor is trivial), and the destructors of std::unordered_map and std::vector are templated so that, at compile time, a fresh destructor is built from those templates that handles all of this, all without any indirect function calls or run-time cost beyond what would be written manually for exactly this data structure in C.
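Rust generates the equivalent code for the equivalent type. The sketch below builds a `HashMap<String, Vec<PersonRecord>>` and drops it without a single hand-written destructor for the map, the vectors, or the record. The `_probe` field and the `DropCounter` type are additions made purely so this example can count how many records were destroyed; they are not part of the original structure.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

/// Counts destructor calls; embedded in each record to make clean-up observable.
struct DropCounter(Rc<Cell<usize>>);

impl Drop for DropCounter {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Rust analogue of the `PersonRecord` above, plus a hypothetical probe field.
#[allow(dead_code)] // `name` and `salary` are stored but never read in this sketch
struct PersonRecord {
    name: String,
    salary: u64,
    _probe: DropCounter,
}

/// Builds a map of departments to records, drops it, and reports how many
/// records were destroyed along the way.
fn build_and_drop(departments: usize, people_per_department: usize) -> usize {
    let dropped = Rc::new(Cell::new(0));
    let mut thing: HashMap<String, Vec<PersonRecord>> = HashMap::new();
    for d in 0..departments {
        let staff: Vec<PersonRecord> = (0..people_per_department)
            .map(|p| PersonRecord {
                name: format!("person-{d}-{p}"),
                salary: 50_000 + p as u64,
                _probe: DropCounter(Rc::clone(&dropped)),
            })
            .collect();
        thing.insert(format!("department-{d}"), staff);
    }
    drop(thing); // frees keys, vectors, records, and every string inside them
    dropped.get()
}
```

With three departments of four people each, `build_and_drop(3, 4)` reports twelve destroyed records, one per `PersonRecord`.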

See, with RAII, a destructor isn’t just automatically and implicitly called at the end of a scope in a function, but also in the destructors of values (“objects” in C++) that own other values. Even if you do write a custom destructor for aggregate types, that just specifies what the computer should do on destruction beyond the automatic calls to the destructors of the fields, which are still implicit.
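Rust follows the same rule, and a small sketch makes the order visible. The `Machine` and `Part` types here are invented for the example. `Machine` has a custom destructor, yet it never mentions its fields: their destructors run afterwards, implicitly, in declaration order.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log of destructor calls.
type Log = Rc<RefCell<Vec<String>>>;

/// A field type with its own destructor.
struct Part {
    label: &'static str,
    log: Log,
}

impl Drop for Part {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label.to_string());
    }
}

/// An aggregate that owns two parts and also has a custom destructor.
struct Machine {
    log: Log,
    engine: Part,
    wheels: Part,
}

impl Drop for Machine {
    fn drop(&mut self) {
        // Only the extra work is written here; the fields are not mentioned.
        self.log.borrow_mut().push("machine: custom clean-up".to_string());
    }
}

/// Builds a machine, destroys it, and returns the order of destructor calls.
fn destroy_machine() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let machine = Machine {
        log: Rc::clone(&log),
        engine: Part { label: "engine", log: Rc::clone(&log) },
        wheels: Part { label: "wheels", log: Rc::clone(&log) },
    };
    drop(machine);
    let entries: Vec<String> = log.borrow().clone();
    entries
}
```

The log shows the custom clean-up first, then `engine`, then `wheels`.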

Ownership and its limitations

This is all possible based on the concept of “ownership,” one of the key principles of RAII. The key assumption is that every allocation has one owner at any given time. Allocations can own each other (forming a tree of allocations), or a scope can own an allocation (forming the root of such a tree). RAII then can make sure the allocation ends when its owner does – by the scope exiting, or when the owning object is destroyed.

But what if the allocation needs to outlive its parent, or its scope? It’s not always the case that a function has primitive types as its arguments and return value, and then only constructs trees of allocations privately. We need to take these sophisticated collections and pass them as arguments to functions. We need to have them be returned from functions.

This becomes apparent if we try to refactor our little-endian integer decimalizer to allow us to do other things with the resultant string besides print it:

#include <iostream> // std::cout
#include <string>   // std::string

std::string render_int_little_endian_decimal(int foo) {
    std::string res;
    do {
        res += '0' + foo % 10;
        foo /= 10;
    } while (foo != 0);
    return res;
}

int main() {
    std::cout << render_int_little_endian_decimal(3781) << std::endl;
    return 0;
}

Based on our previous discussion of RAII, you might assume that the ~std::string destructor is called on the end of its scope, rendering the allocation unusable for later printing, but instead this code “Just Works.”

We’ve hit one of many mitigations against the limitations of raw RAII that are necessary for it to work. This mitigation is the “Named Return Value Optimization (NRVO),” which allows the compiler, when the same named variable is used in all of the return statements in a function, to construct (and destruct) it in the context of the caller. Compilers are permitted rather than required to do this; when they don’t, the value is moved (or, before C++11, copied) out to the caller instead. It is misnamed an “optimization” because it affects the semantics: when applied, it eliminates entirely the call to the destructor at the end of the scope, even if that destructor call would have side effects.

This is just one of many ways RAII is made competitive with run-time garbage collection, and we can have values that live outside of a certain scope of a function. This one is narrow and peculiar to C++, but many of the others lead to interesting comparisons. In the next section, we discuss the others.

Filling the Gaps in RAII

Copying/Cloning

We’re going to start with one of the oldest of these: copying. When C++ was designed, the intention was that the programmer would not see a difference between types that don’t involve allocation (like int or double) and types that do (like std::string or std::unordered_map<std::string, std::vector<std::string>>).

When a function takes an int argument, as in print_int_little_endian_decimal, that integer is copied. Similarly, if we take a std::string argument without additional annotation, C++ will also make a copy:

#include <iostream> // std::cout
#include <string>   // std::string

int parse_int_le(std::string foo) {
    int res = 0;
    int pos = 1;
    for (char c: foo) {
        res += (c - '0') * pos; // No input validation -- example!
        pos *= 10;
    }
    return res;
}

int main(int argc, char **argv) {
    std::string s = argv[1];
    std::cout << parse_int_le(s) << std::endl;
    return 0;
}

This is indeed consistent. Treating ints and std::string objects in parallel ways is also in line with how higher-level programming languages sometimes work: a string is a value, an int is a value, why not give them the same semantics? Aliasing is confusing, why not avoid it with copying?

It’s made to work by an implicit function call. Just like destructor calls are implicit in C++, copying also calls a function in the type’s implementation. Here, it calls std::string’s “copy constructor.”

The problem here is that this is slow. Not only is an unnecessary copy made, but an unnecessary allocation and deallocation creep in. There is no reason not to use the same allocation the caller already has, here in s from the main function. A C programmer would never write this copying version.

The only reason this feature is allowed under C++’s zero-cost principle is because it is optional. It may be the default – and making it the default is one of the most questionable decisions C++ ever made – but we can still alias if we want to. It just takes more work.

Rust, as you can guess by my tone, requires explicit annotation to copy types that have an allocation. In fact, Rust doesn’t even use the term “copy,” which is reserved for types that can be copied without allocations. It calls this cloning, and requires use of the clone() method to accomplish it.

Some types don’t use an allocation, and “copying” them is just a simple memory copy. Some types do use an allocation, and “cloning” them requires allocating. This distinction is important and fundamental to how computers work. It’s relevant and visible in Java and even Python, and pretending it doesn’t exist is unbecoming for a systems programming language like C++.
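A short Rust sketch of the distinction, using a made-up `Point` type and `shout` function: the plain-data type is copied implicitly, while the allocation-making copy of a `String` has to be requested in writing.

```rust
/// Plain data with no allocation: copying it is just a memory copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Takes ownership of a `String`; the caller decides whether to clone first.
fn shout(mut text: String) -> String {
    text.make_ascii_uppercase();
    text
}

/// Returns the original point, the modified copy, the original string, and
/// the upper-cased clone.
fn copy_and_clone_demo() -> (Point, Point, String, String) {
    let a = Point { x: 1, y: 2 };
    let mut b = a; // implicit copy: `a` stays usable
    b.x = 10;

    let original = String::from("hello");
    // `shout(original)` would move the string and make `original` unusable below.
    // The copy that allocates has to be spelled out:
    let loud = shout(original.clone());
    (a, b, original, loud)
}
```

Nothing stops a Rust programmer from cloning everywhere, but each allocation then shows up as a visible `.clone()` in the source.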

Moves

Returning an allocation from a function can’t always use NRVO. So if you want your value to outlast your function, but it’s created inside the function (and therefore “owned” by the function scope), what you really need is a way for the value to change owners. You need to be able to move the value from the scope into the caller’s scope. Similarly, if you have a value in a vector, and need to remove the last value, you can move it.

This is distinct from copying, because, well, no copy is made – the allocation just stays the same. The allocation is “moved” because the previous scope no longer has responsibility for destroying the allocation, and the new scope gains the responsibility.

Move semantics fix the most serious issue with RAII: your allocation might not live exactly as long as its owner. The root of an allocation tree might outlive the stack-based scope it’s in, such as when you want to return a collection from a function. The other nodes of an allocation tree might leave that tree and be owned by another stack frame, or by another part of the same allocation tree, or by a different allocation tree. In general, “each allocation has a unique owner” becomes “each allocation has a unique owner at any given time,” which is much more flexible.

In Rust, this is done via “destructive moves,” which oddly enough means not calling the destructor on the moved-from value. In fact, the moved-from value ceases to be a value when it’s moved from, and accessing that variable is no longer permitted. The destructor is then called as normal in the place where the value is moved to. This is tracked statically at compile-time in the vast majority of situations, and when it cannot be, an extra boolean is inserted as a “drop flag” (“drop” is how Rust refers to its destructors).
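Here is a minimal sketch of a move that only happens on one branch, which is the situation where the compiler cannot know statically whether the value is still owned at the end of the scope. `Tracked`, `consume`, and `conditional_move` are names invented for this example; the counter exists only to observe destructor calls.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Increments a shared counter when dropped.
struct Tracked(Rc<Cell<u32>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Takes ownership; the value is dropped when this function returns.
fn consume(_value: Tracked) {}

/// Moves a value into `consume` only if `move_it` is true. Returns the number
/// of drops seen right after the `if`, and after the owning scope has ended.
fn conditional_move(move_it: bool) -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let after_if;
    {
        let value = Tracked(Rc::clone(&drops));
        if move_it {
            consume(value); // ownership leaves this scope; the destructor runs in `consume`
        }
        // Using `value` here would be a compile error: it may have been moved.
        after_if = drops.get();
    } // a hidden drop flag decides whether `value` still needs dropping here
    (after_if, drops.get())
}
```

Either way the destructor runs exactly once: inside `consume` when the move happened, or at the closing brace when it did not.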

C++ didn’t add move semantics until C++11; it was not part of the original RAII scheme. This is surprising given how essential moves are to RAII. Returning collections from functions is super important, and you can’t copy every time. But before C++11, there were only poor man’s special cases for move, like NRVO and the related RVO for objects constructed in the return statement itself. These have completely different semantics than C++ move semantics – they’re still more efficient than C++ moves in many cases.

When C++ did eventually add moves, the other established semantics of C++ forced it to add moves in a weird and deeply confusing way: it added “non-destructive” moves. In C++, rather than the drop flag being a flag inserted by the compiler, it is internal to the value. Every type that supports moves must have a special “empty state,” because the destructor is called on the moved-from value. If the allocation had moved to another value, there would be no allocation to free, and this had to be handled by the destructor at run-time, which can amount to a violation of the zero-cost principle in some situations.

C++ justifies this by making moves a special case of copy. Moves are said to be like copies, but make no promises of preserving the initial value. In exchange, you might get the optimization of being able to use the original allocation, but then the initial value will not have an allocation, and will be forced to be different. This definition is very different than what moves are actually used for (cf. the name of the operation), and therefore, even though it is technically simple, claiming that focusing on that definition (as Herb Sutter does) will simplify things for the programmer is disingenuous, as I discuss in my post on move semantics.

In practice, this means that all types support the operation of moving – even ints – but even some types that manage an allocation might fall back on copying if moves haven’t been implemented for them. This inconsistency, like all inconsistencies, is bad for programmers.

In practice, this also means that moved-from objects are a problem. A moved-from object might stay the same, if no moving was done. It might also change in value, if the move caused an allocation (or other resource) to move into the new object. This forces C++ smart pointers to choose between movability and non-nullability – no movable, non-nullable pointer is possible in C++. Nulls – and the other “moved-from” empty collections that you get from C++ move semantics – can then be referenced later on in the function, and though they must be “valid” values of the object, they are probably not the values you expect, and in the case of null pointers, they are famously difficult values to reason about.

This is a consequence of the fact that C++ was a pioneer of RAII semantics, and didn’t design RAII and moves together from the start. Rust has the advantage of having included moves from the beginning, and so Rust move semantics are much cleaner.

In Rust also, all types can be moved. But in Rust, no resources or allocations are ever copied. Instead, moves always have the same implementation: copy the memory that is stored in-line in the value itself, and then do not call the destructor. For copyable types like int that do not manage an allocation or other resource, this does amount to a copy, but the original is still not usable. But no allocation or resource is ever copied; for those types, the pointer or handle is simply brought along bit-by-bit just like other data, and the old value is never touched again, making this a safe operation.
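One way to see that the allocation itself never moves is to record the address of a string’s heap buffer while the string changes owners. This is only a sketch; `pass_through` and `buffer_addresses` are names made up for it.

```rust
/// Takes ownership of a string and hands it straight back.
fn pass_through(s: String) -> String {
    s
}

/// Moves a string through a variable, a function, and a vector, recording the
/// address of its heap buffer at each stop.
fn buffer_addresses() -> Vec<*const u8> {
    let mut seen = Vec::new();
    let original = String::from("a string that lives on the heap");
    seen.push(original.as_ptr());

    let moved = original; // `original` can no longer be used
    seen.push(moved.as_ptr());

    let returned = pass_through(moved); // moved in, then moved back out
    seen.push(returned.as_ptr());

    let mut shelf = vec![returned]; // now owned by the vector
    seen.push(shelf[0].as_ptr());

    let popped = shelf.pop().expect("shelf holds one string");
    seen.push(popped.as_ptr());
    seen
}
```

All five recorded addresses are equal: each move copied the small in-line part of the `String` (pointer, length, capacity) and left the buffer where it was.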

All types must then be written in such a way to assume that values might not stay in the same place in memory. If some operations on a type can’t be written that way, they can be defined on “pinned” versions of that type. A pin is a type of reference or box that promises that the pointed-to value will never move again. The underlying type is still movable, but these particular values are not.
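As a rough sketch of what that looks like, the made-up `Anchored` type below opts out of `Unpin` with `PhantomPinned` and defines one operation that is only callable through a pin. The pin itself (a pointer) can still be moved around; the value it points to stays put.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// A type that opts out of `Unpin`, so safe code cannot move a pinned value
/// of it. The name and fields are invented for this sketch.
struct Anchored {
    value: u32,
    _pin: PhantomPinned,
}

impl Anchored {
    /// Only callable on a pinned value, so it may rely on a stable address.
    fn stable_address(self: Pin<&Self>) -> *const Anchored {
        self.get_ref() as *const Anchored
    }
}

/// Pins a value on the heap, moves the pin into a vector, and returns the
/// value's address before and after, along with the stored number.
fn pin_and_move() -> (*const Anchored, *const Anchored, u32) {
    let pinned: Pin<Box<Anchored>> = Box::pin(Anchored { value: 7, _pin: PhantomPinned });
    let before = pinned.as_ref().stable_address();

    // Moving the pin moves only the pointer; the pointee stays where it is.
    let holder = vec![pinned];
    let after = holder[0].as_ref().stable_address();

    // Moving the `Anchored` itself out of the box would need ownership or a
    // `&mut Anchored`, which safe code cannot obtain from this pin.
    (before, after, holder[0].value)
}
```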

This is a gnarly exception to Rust’s “all types can be moved” rule that makes it false in practice, though still true in pedantic, language-lawyery theory. But that’s not important. What is important is that Rust’s move semantics are consistent, and do not rely on move constructors and manual implementations of Rust’s drop flags within the object. The dangerous possibility of interacting with a moved-from object, whose value is unpredictable and quite possibly a special “empty” state like null, is not present in Rust.

Borrows in Rust

While moves cover returning a collection (or other resource-managing value) from a function, they don’t cover passing such a value into a function, or at least not in the general case. Sometimes, when we pass a value into a function, we want to move the value in, so that the function can consume it or add it to an allocation tree (like inserting into a collection). But most times, we want the function to be able to see and perhaps mutate it, but then we want to give it back to the owner.

Enter the borrow.

In Rust, borrows are commonly introduced as a sort of an improvement on moves. Consider our example function that parses a string to an int, here implemented in C++ with copies:

int parse_int_le(std::string foo) {
    int res = 0;
    int pos = 1;
    for (char c: foo) {
        res += (c - '0') * pos; // No input validation -- example!
        pos *= 10;
    }
    return res;
}

Here is a Rust version, with moves, so that the function consumes the string:

use std::env::args;

fn parse_int_le(foo: String) -> u32 {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    res
}

fn main() {
    let mut args: Vec<String> = args().collect();
    println!("{}", parse_int_le(args.remove(1)));
}

As we can see with the “move” version of this, we are in the awkward position of removing the string from the vector, so that parse_int_le can consume the string, so it doesn’t have multiple owners.

But parse_int_le doesn’t need to own the string. In fact, it could be written so that it can give the string back when it’s done:

fn parse_int_le(foo: String) -> (u32, String) {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    (res, foo)
}

“Taking temporary ownership” in real life is also known as borrowing, and Rust has such a feature built-in. It is more powerful than the above code that literally takes temporary ownership, though. That code would have to remove the string from the vector and then put it back – which is even more inefficient than just removing it. Rust borrowing allows you to borrow it even while it’s inside the vector, and stays inside the vector. This is implemented by a Rust reference, which has this borrowing semantics, and is, like most “references,” implemented as a pointer at the machine level.

In order to accomplish these semantics, Rust has its infamous borrow checker. While we are borrowing something inside the vector, we can’t simultaneously be mutating the vector, which could cause the thing we’re borrowing to move. Rust statically ensures that this is impossible, rejecting code that uses a reference after a mutation, destruction, or move somewhere else would invalidate it.

This enables us to extend the RAII-based system and both prevent leaks and maintain safety, just like a GC or RC-based system. The borrow checker is essential to doing so.
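A small sketch of both halves of that rule, with invented function names: an element is mutated in place while it stays inside the vector, and a push that would invalidate a live reference is shown commented out, since the compiler rejects it.

```rust
/// Borrows one string mutably and edits it in place; the borrow ends when
/// the function returns.
fn add_exclamation(text: &mut String) {
    text.push('!');
}

/// Borrows a slice of strings immutably; nothing is moved or copied.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Returns the final vector, the length of the first word, and the total length.
fn borrow_demo() -> (Vec<String>, usize, usize) {
    let mut words = vec![String::from("hello"), String::from("world")];

    // The string is edited while it stays inside the vector.
    add_exclamation(&mut words[1]);

    let first = &words[0]; // shared borrow of an element
    // words.push(String::from("again"));
    // ^ rejected if uncommented: pushing may reallocate the vector and leave
    //   `first` dangling, and `first` is still used on the next line.
    let first_len = first.len();

    words.push(String::from("again")); // fine: `first` is no longer in use
    let total = total_len(&words);
    (words, first_len, total)
}
```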

For completeness, here is the idiomatic way to handle the parameter in parse_int_le, with an actual borrow, using &str, the special borrowed form of String that also allows slices:

use std::env::args;

fn parse_int_le(foo: &str) -> u32 {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    res
}

fn main() {
    let args: Vec<String> = args().collect();
    println!("{}", parse_int_le(&args[1]));
}

Dodging memory safety in C++

In C++, of course, there is no borrow checker. In the parse_int_le example, it’s still possible to use a pointer, or a reference, but then you’re on your own. When RAII-based code frees your allocation, your reference is invalidated, which means it’s undefined behavior to use it. No coordination is performed by the compiler between the RAII/move system and your references, which point into the ownership tree with no guarantee that said tree won’t move underneath it. This can lead to memory corruption bugs, with security implications.

It’s not just pointers and references. Other types that contain references, such as iterators, can also be invalidated. Sometimes those are more insidious because intermediate C++ programmers might know about pointer invalidation, but let their guard down with iterators. If you add to a vector while looping through it, you’ve just done undefined behavior, and that’s surprising because no pointers or references even have to show up. Rust’s borrow checker handles these as well.
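For comparison, here is how the add-while-looping case plays out in Rust. The direct version is shown only as a comment, because it does not compile; the two functions below it are a couple of accepted alternatives, not the only ones. All names are made up for the sketch.

```rust
/// Appends a doubled copy of every even number to the vector.
fn append_doubled_evens(numbers: &mut Vec<i32>) {
    // Rejected by the borrow checker if written directly:
    //
    //     for n in numbers.iter() {
    //         if n % 2 == 0 { numbers.push(n * 2); }
    //     }
    //
    // The loop holds a shared borrow of `numbers` through the iterator, and
    // `push` needs an exclusive one.

    // Accepted alternative: finish iterating first, then mutate.
    let additions: Vec<i32> = numbers
        .iter()
        .filter(|n| **n % 2 == 0)
        .map(|n| n * 2)
        .collect();
    numbers.extend(additions);
}

/// Another accepted alternative: walk the indices of the original length.
fn append_doubled_evens_indexed(numbers: &mut Vec<i32>) {
    let original_len = numbers.len();
    for i in 0..original_len {
        let n = numbers[i]; // copies the `i32` out; no borrow is held across the push
        if n % 2 == 0 {
            numbers.push(n * 2);
        }
    }
}
```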

Even though the Rust borrow checker gets a bad reputation, its safety guarantees often make it worth it. It’s hard to write correct C++ when references and non-owning pointers are involved. Maybe some of you have that skill, and are unsympathetic to those who don’t yet have it, but it is a specialized skill, and the compiler can do a lot of the work for you, by checking your work. Automation is a good thing, and so is making systems programming more accessible to beginners.

And of course, many C++ programmers do make mistakes. Even if it’s not you, it might be one of your colleagues, and then you’ll have to clean up the mess. Rust addresses this, and limits this more difficult mode of thinking to writing unsafe code, which can be contained in modules.

Multiple Ownership

In RAII, an allocation has one owner at a time, and if your owner is destroyed before the allocation is moved to another owner, the allocation must be destroyed along with it.

Of course, sometimes this isn’t how your allocations work. Sometimes they need to live until both of two parent allocations are destroyed, and sometimes there is no way to predict which parent is destroyed first. Sometimes, the only way to solve that situation – even in C – is to use runtime information – and so you can model multiple ownership through reference counting: std::shared_ptr in C++, or Rc and Arc in Rust (depending on whether it is shared between multiple threads).

This is something that C programmers will sometimes do in the face of complicated allocation DAGs, and end up implementing bespoke on a framework-by-framework basis (cf. GTK+ and other C GUI frameworks). C++ and Rust are just standardizing the implementation of this, but, in line with the zero-cost rule, making it optional.

Interestingly enough, reference counting is implemented in terms of RAII and moves. The destructor for a reference-counted pointer decrements the reference count, and cloning/copying such a pointer increments it. Moves, of course, don’t change it at all.
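A minimal sketch of this, using Rust's Rc and its `Rc::strong_count` function to watch the count change (the helper name `observe_counts` is just for this example):

```rust
use std::rc::Rc;

/// Returns the strong counts observed after: creation, a clone, a move,
/// and dropping the clone.
fn observe_counts() -> [usize; 4] {
    let a = Rc::new(String::from("shared"));
    let created = Rc::strong_count(&a); // 1

    let b = Rc::clone(&a); // copying the pointer increments the count
    let after_clone = Rc::strong_count(&a); // 2

    let c = a; // a move: the count does not change
    let after_move = Rc::strong_count(&c); // 2

    drop(b); // the destructor decrements the count
    let after_drop = Rc::strong_count(&c); // 1

    [created, after_clone, after_move, after_drop]
}

fn main() {
    println!("{:?}", observe_counts()); // [1, 2, 2, 1]
}
```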

RAII+: What this all adds up to

Between RAII, moves, reference counting, and the borrow checker, we now have the memory management system of safe Rust. Safe Rust is a powerful programming language, and in it, you can write programs almost as easily as in a traditionally GC’d programming language like Java, but get the performance of manually written, manually memory managed C.

The cost is annotation. In Java, there is no distinction between “borrowing” and “owning”, even though sometimes the code follows similar structures as if there were. In Rust, the compiler must be informed about the chain of owners, and about borrowers. Every time an allocation crosses scope boundaries or is referred to inside another allocation, you must write different syntax to tell Rust whether it’s a move or a borrow, and it must comply with the rules of the borrow checker.
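To make the annotation concrete, here is a small sketch with two made-up functions, `consume` and `inspect`. The only difference at the call site is the `&`, but it tells the compiler (and the reader) whether ownership changes hands:

```rust
/// Takes ownership: the caller can no longer use the String afterwards.
fn consume(s: String) -> usize {
    s.len()
} // `s` is freed here

/// Borrows: the caller keeps ownership.
fn inspect(s: &str) -> usize {
    s.len()
}

fn main() {
    let owned = String::from("hello");
    let n = inspect(&owned); // `&` marks a borrow
    let m = consume(owned); // no `&`: this is a move
    // println!("{}", owned); // would not compile: value used after move
    println!("{} {}", n, m);
}
```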

But it turns out most code has a natural progression of owners, and most borrows are valid in the borrow checker. When they’re not, it’s usually straight-forward to rethink the code so that it can work that way, and the resultant code is usually cleaner anyway. And in situations where neither of them work, reference counting is still an option.

At the cost of this annotation, Rust gives you everything a GC does: Allocations are freed when their handles go out of scope, and memory safety is still guaranteed, because the annotations are checked. Memory leaks are as difficult as in a reference counting language, and the annotations are checked, which is most of the benefit of automating them. It’s an excellent happy medium between manual memory management and full run-time GC with no run-time cost over a certain discipline of C memory management.

Of course, other disciplines of C memory management are possible. And using this Rust system takes away flexibility that might be relevant to performance. Rust, like C++, allows you to sidestep the “compile-time GC” and use raw pointers, and that can often be better for performance. A recent blog post I read explores some of that in more detail; encouragingly, that blog post also considers RAII to be in-between manual memory management and run-time GC – serendipitously, because I had already drafted much of this post when it came out.

But the standard memory management tools of Rust cover the common cases well, and unsafe is available for when it’s inappropriate – and can be wrapped in abstractions for interfacing with code that uses the RAII-based system.

In C++, the annotations of “borrows” vs “moves” can easily result in undefined behavior. Leaks are prevented, but memory corruption is not. So the C++ system is a much worse replacement for garbage collection – RAII is only doing some of its job, as it is not paired with a borrow checker.

Cycles

I leave the most awkward topic for the end. We’ve talked about allocation trees and DAGs, but not general graphs. These require unsafe in Rust, even something as supposedly basic as doubly linked lists. It’s against the borrow checker’s rules, and the compiler will statically prevent you from making them using safe, borrowing references. They simply aren’t borrows in the Rust sense, but are rather something else, something about which Rust doesn’t know how to guarantee safety.

This is not as bad as you might think, because cycles also form a hole in reference counting, which is a popular run-time GC system. This is why you can’t use Rc or Arc alone to implement a doubly-linked list correctly in Rust either: You’ll get past the borrow checker, but guarantee a memory leak. These systems generally can’t detect cycles at all, and leak them, which is arguably worse than forbidding them to be created.
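Here is a sketch of such a leak, under the assumption that we count destructor calls with a shared counter (the `Node` type and `drops_observed` function are invented for this example). With a cycle of strong Rc pointers, neither destructor ever runs:

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// A node that records its own destruction in a shared counter.
struct Node {
    other: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<u32>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Builds two nodes, optionally links them into a cycle, lets both handles
/// go out of scope, and returns how many destructors ran.
fn drops_observed(make_cycle: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node {
            other: RefCell::new(None),
            drops: Rc::clone(&drops),
        });
        let b = Rc::new(Node {
            other: RefCell::new(Some(Rc::clone(&a))), // b -> a
            drops: Rc::clone(&drops),
        });
        if make_cycle {
            *a.other.borrow_mut() = Some(Rc::clone(&b)); // a -> b -> a
        }
    } // both local handles are dropped here
    drops.get()
}

fn main() {
    println!("without cycle: {} destructors ran", drops_observed(false)); // 2
    println!("with cycle:    {} destructors ran", drops_observed(true)); // 0
}
```

The standard library's Weak pointer is the usual way to break such a cycle, at the cost of deciding up front which direction owns and which merely observes.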

In any case, the unsafe keyword is not poison. For things that Rust doesn’t know how to keep safe, you need to exercise extra responsibility, but at least the programming language is making you aware of it – unlike C++, which is unsafe all the time.

A Strong Typing Example

2022-09-15T00:00:00+00:00

I’m a Rust programmer and in general a fan of strong typing over dynamic or duck typing. But a lot of advocacy for strong typing doesn’t actually give examples of the bugs it can prevent, or it gives overly simplistic examples that don’t really ring true to actual experience.

Today, I have a longer-form example of where static typing can help prevent bugs before they happen.

The Problem

Imagine you have a process that receives messages and must respond to them. In fact, imagine you have potentially many such processes, and want to write a framework to handle it.

The incoming messages are expected to be in JSON, and the responses are also supposed to be in JSON. So your framework parses the incoming messages from JSON before passing them to the application’s callback function, and then serializes the results.

In Rust, the interface for the callback would look something like this (Value is a parsed JSON type from serde_json):

trait MessageHandler {
    fn handle_message(&self, input: Value) -> Value;
}

In a dynamically-typed language like Python, the callback function would look more like this:

def handle_message(self, input):

The code in the callback would then (hopefully) validate the JSON to make sure it meets the expected schema, and if it doesn’t, return some error in the reply message. In a programming language like Python (I make no promises that my Python is idiomatic or accurate; it’s meant as an example of a duck-typing language), it perhaps could be written like this:

if not self.is_valid_input(input):
    return {"error": "Invalid input", "input": input}

If the JSON is in a valid format, it would do some processing and return a non-error result.

The framework code, in order to do this, runs code that looks something like this (in pseudo-Python):

input = conn.recv_message()
input = json_parse(input)
output = handler.handle_message(input)
output = json_serialize(output)
conn.send_response(output)

And all of this will work just fine.

Except… what if the input isn’t valid JSON? And what if none of our test cases considered this possibility, but it nevertheless arises in production? What if we didn’t even write test cases?

Some Attempts to Solve

Making sure we catch the error at all

In Rust, we would already have a hint that there’s something wrong. JSON parsing in Rust is a function that can fail, and that is reflected in the type of the function to parse JSON, which looks something like this:

pub fn from_slice(v: &[u8]) -> Result<Value>

The Result means that this function can fail. We have to handle that failure in some way before we can get the resultant type. We can crash the whole program:

let input = from_slice(&input).expect("Invalid JSON");

NB: Reusing the name input like this with a different type is allowed in Rust; this declares a new variable that shadows the old one. This is idiomatic when the value is being transformed and we don’t need the old form anymore.

Or we can do what Python will likely do by default, and bubble the error up to the caller of the current function:

let input = from_slice(&input)?;

Or we can handle the error. And in this case, we should handle the error in some way, as we need to reply to the message whether it’s in JSON or not, and so we don’t want to skip over the code that does the reply.

Already, Rust’s typing discipline is helping us. In order to do what Python does by default, we need to at least opt in with a ?. Admittedly, the programmer may do that on autopilot, but it at least gives the programmer a hint that there might be an issue worth spending a second or two considering before moving onwards.

What to do with the error?

But let’s assume that the programmer did, in fact, realize that these errors need to be handled. What should we do in case of an error?

One possibility is to handle it completely in the framework. If we know all inputs must be valid JSON, we can take this burden off of the application code:

try:
    input = json_parse(input)
except JsonError:
    output = {"error": "Invalid JSON"}
else:
    output = handler.handle_message(input)

But what if we want to give the application-writer more flexibility? What if we envision a situation where the application-writer wants to accept either JSON or non-JSON data?

In a duck-typed programming language like Python, if the parsing fails, we can simply pass the original input to the handler. This is really easy to do.

try:
    input = json_parse(input)
except JsonError:
    pass

Now, the handler function just needs to ensure that the passed-in value is a dictionary in our validation:

def is_valid_input(self, input):
    if type(input) is not dict:
        return False
    if 'requiredField' not in input:
        return False
    return True

Of course, we might forget to do that, and if we do, the not in test might now throw an exception, since it is not defined for every type that input could have when it is not in fact a dictionary. This would be bad, as not even all JSON parses to dictionaries, but it’s a mistake someone could make if they’re not thinking about error handling.

In Rust, we can’t pass the initial input directly to the handler, as it would be a different type. So if we try to do the direct equivalent to the Python, it gives us an error:

let input = match from_slice(&input) {
    Ok(parsed_value) => parsed_value, // This is the parsed value, type `Value`
    Err(_) => input, // This is the raw `Vec<u8>` data... TYPE MISMATCH!
};

We are then forced to brainstorm another solution, which might raise ideas we didn’t otherwise consider, and force us to backtrack in our design a little, which is actually a good thing because this solution, while simple in Python, has some flaws.

Here are some solutions we might brainstorm:

- Call a different callback in handler for unparsed data
  - Application specifies whether data should be parsed
  - Framework chooses which callback to call dynamically
- Use an enum

That last one is interesting. If we do want to create a value that can contain either Value or Vec<u8>, we still can in Rust. We just have to create a new type that tells the compiler we want that:

enum IncomingMessage {
    Parsed(Value),
    Unparsed(Vec<u8>),
}

Then, before we can do any work on the wrapped Value, we have to say what happens if it’s actually a Vec<u8>:

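let input = match input {
    IncomingMessage::Parsed(value) => value,
    IncomingMessage::Unparsed(_) => {
        // return an error JSON blob (built here with serde_json's json! macro)
        return json!({"error": "Invalid JSON"});
    }
};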

In fact, this even helps with the fact that not all parsed JSON is a dictionary, as serde_json::Value is itself an enum!
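To see the whole flow in one place, here is a self-contained sketch. To keep it free of external crates, it does not use serde_json: the `Value` type below is a two-variant stand-in, and `toy_parse` is a hypothetical toy parser that only understands `null` and integers. Only the shape of the `IncomingMessage` handling is the point:

```rust
/// Stand-in for `serde_json::Value`, reduced to two variants so that the
/// sketch compiles without external crates.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Number(i64),
}

enum IncomingMessage {
    Parsed(Value),
    Unparsed(Vec<u8>),
}

/// Hypothetical toy parser standing in for `serde_json::from_slice`:
/// it only understands `null` and integers.
fn toy_parse(raw: &[u8]) -> Result<Value, String> {
    let text = std::str::from_utf8(raw).map_err(|e| e.to_string())?;
    let text = text.trim();
    if text == "null" {
        return Ok(Value::Null);
    }
    text.parse::<i64>()
        .map(Value::Number)
        .map_err(|e| e.to_string())
}

/// Framework side: wrap the input instead of putting two types in one variable.
fn classify(raw: Vec<u8>) -> IncomingMessage {
    match toy_parse(&raw) {
        Ok(value) => IncomingMessage::Parsed(value),
        Err(_) => IncomingMessage::Unparsed(raw),
    }
}

/// Application side: the compiler insists that both cases are handled.
fn handle_message(input: IncomingMessage) -> String {
    match input {
        IncomingMessage::Parsed(value) => format!("parsed: {:?}", value),
        IncomingMessage::Unparsed(bytes) => {
            format!("error: {} bytes of non-JSON input", bytes.len())
        }
    }
}

fn main() {
    println!("{}", handle_message(classify(b"42".to_vec())));
    println!("{}", handle_message(classify(b"not json".to_vec())));
}
```

If the application author deletes the `Unparsed` arm, the match no longer compiles, which is exactly the reminder the Python version never gives.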

Further Problem

But even if we do correctly validate that we have a dictionary, and we output an error in our message response if we don’t, I want to point back to our original pseudo-Python for what error to output:

if not self.is_valid_input(input):
    return {"error": "Invalid input", "input": input}

If input is JSON parsed into a dictionary, it will definitely serialize back into JSON, and this line makes sense. But now that input might not be parsed JSON, but instead might be in some sort of raw format, this dictionary might fail to serialize back into JSON.

Conclusion

A lot of programming is converting data from one format to another and validating it. Strong static typing systems like Rust’s can help prevent mistakes before they happen, and force people to come up with more rigorous designs rather than shoe-horning different values into the same variable, which dynamic typing makes easy – too easy. I hope this example was relatable!

Blocking Sockets and Async

2022-08-08T00:00:00+00:00

Using async in Rust can lead to bad surprises. I recently came across a particularly gnarly one, and I thought it was interesting enough to share a little discussion. I think that we are too used to the burden of separating async from blocking being on the programmer, and Rust can and should do better, and so can operating system APIs, especially in subtle situations like the one I describe here.

Every async programmer learns early on not to call a blocking function from an async function. If you do, it is a hidden color violation, as I discuss in a previous post. By “hidden,” I mean that unlike other color violations, Rust gives you no compile-time help. You just have to use discipline. You just have to “make sure not to do it.” You just have to increase your cognitive load. It is a rule that the computer is no help with – which means that you’ll definitely mess it up at some point, possibly at many points.

Unfortunately, it’s also a gnarly problem to debug. The actual blocking function call will quite possibly work just fine. It’ll return when the resource is ready, and block until then – probably exactly what you wanted. It’s the rest of the system that falls apart – other tasks on the same thread starve, tasks that are depending on them for progress also starve, but meanwhile other tasks might proceed without a problem. Worse, there’s no guarantee that the bug will manifest every time, so the bug isn’t readily reproducible.

You might think this is an easy problem to address, either through improvements in the programming language or better programming discipline.

At a programming language level, you could imagine Rust having some sort of generalization of unsafe, or maybe an effects system. Functions that block would have blocking as part of their signature. Calling a blocking function from an async function would then be an error, with a way out for functions like spawn_blocking.

Unfortunately, Rust doesn’t have this feature, so we have to rely on programmer discipline. The discipline seems easy enough: If you’re in an async function, and you call a function that’s going to take some time or do I/O, make sure you’re doing an async call, which in most cases means using the await keyword.

Unfortunately, this doesn’t work 100% of the time, because the operating system isn’t on board. There are system calls that block sometimes, based on dynamic configuration. Does the recv system call block? Well, that depends on whether the socket is a blocking socket, or a non-blocking socket. Fundamentally, recv is run-time polymorphic on socket type, in a way that makes it a different color based on run-time information.
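Here is a sketch of that run-time polymorphism using only the standard library. It assumes a Unix-like platform and uses a `UnixStream::pair()` socket pair rather than TCP, purely so that it runs without a network; the function names are made up for this example. The same `read_one_byte` function either fails immediately or waits, depending on a flag set elsewhere:

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;
use std::time::{Duration, Instant};

/// One and the same call site; whether it blocks is decided at run time.
fn read_one_byte(sock: &mut UnixStream) -> std::io::Result<u8> {
    let mut buf = [0u8; 1];
    sock.read_exact(&mut buf)?;
    Ok(buf[0])
}

/// Non-blocking mode: with no data pending, the read fails immediately.
fn nonblocking_read_kind() -> ErrorKind {
    // `_b` keeps the peer open, so the read sees "no data yet", not EOF.
    let (mut a, _b) = UnixStream::pair().expect("socketpair should succeed");
    a.set_nonblocking(true)
        .expect("set_nonblocking should succeed");
    read_one_byte(&mut a)
        .expect_err("no data was written")
        .kind()
}

/// Blocking mode (the default): the same function waits for the peer.
fn blocking_read_waits(delay: Duration) -> (u8, Duration) {
    let (mut a, mut b) = UnixStream::pair().expect("socketpair should succeed");
    let start = Instant::now();
    let writer = thread::spawn(move || {
        thread::sleep(delay);
        b.write_all(&[7]).expect("write should succeed");
    });
    let byte = read_one_byte(&mut a).expect("blocking read should succeed");
    writer.join().expect("writer thread should not panic");
    (byte, start.elapsed())
}

fn main() {
    println!("non-blocking: {:?}", nonblocking_read_kind());
    let (byte, waited) = blocking_read_waits(Duration::from_millis(50));
    println!("blocking: got {} after {:?}", byte, waited);
}
```

Nothing in the signature of `read_one_byte` tells you which of the two behaviors you will get.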

This is bad design: BSD should have split recv into two system calls, recv and recv_nonblock. recv could error if given a non-blocking socket, and recv_nonblock could error if given a blocking one. Linux at least has a flag MSG_DONTWAIT that makes an individual recv call unconditionally non-blocking, but it’s non-standard. It’s not supported on macOS and tokio/mio understandably doesn’t use it.

Most of the time, this isn’t an issue. Sockets controlled through tokio or other async runtimes are always configured with the operating system to be non-blocking, as an invariant on those socket types. Sockets controlled through std or other libraries will be blocking, and will be contained in completely different Rust types. The Rust type system is used to keep track of the distinction even if the operating system won’t.

But this becomes an issue where these boundaries are broken, namely in conversion functions between them. These methods then have whether or not a socket is blocking as part of their contract. For example, the documentation for TcpStream::from_std says:

This function is intended to be used to wrap a TCP stream from the standard library in the Tokio equivalent. The conversion assumes nothing about the underlying stream; it is left up to the user to set it in non-blocking mode.

Thus, as a precondition of calling the from_std function, you must pass a “non-blocking” socket. If you instead did not set the socket as non-blocking – perhaps because you were making it with some extra options you needed, but assumed that tokio would handle the non-blocking part – bad things happen.

If blocking were considered a safety issue, this function would be marked unsafe. But it’s not, and so it’s simply an unchecked precondition – and we’re not used to those in Rust. Most safe functions check their preconditions, either returning a special value (like an Err) or panicking if something is wrong. The ones that don’t are typically marked unsafe. Unchecked preconditions still exist – they cause rogue behavior but not behavior deemed “unsafe” under Rust’s definition – but they are rare, and therefore surprising to a Rust programmer.

Why is it not a checked precondition? That’s easy to answer: Checking it would take an extra system call, as would unconditionally setting the socket to non-blocking inside from_std itself. System calls are slow, and that would be an unacceptable performance penalty for many applications.

This leads to a disappointing end result, though. It’s not enough to simply make sure you don’t call I/O methods unless they come with an async version. To be disciplined enough to be an async Rust programmer, you also have to watch out for these extra unchecked preconditions.

Otherwise, you get a hidden color bug that’s even harder to track down because the blocking functions you’re calling don’t look blocking. tokio calls recv, thinking it’s not blocking, but it is. You expect tokio to be correct, but because of this broken invariant, it isn’t. These sorts of issues can be very hard and time-consuming to debug.

Why Rust should only have provided `expect` for turning errors into panics, and not also provided `unwrap`

2022-07-14T00:00:00+00:00

UPDATE 2: I have made the title longer because people seem to be insisting on misunderstanding me, giving examples where the only reasonable thing to do is to escalate an Err into a panic. Indeed, such situations exist. I am not advocating for panic-free code. I am advocating that expect should be used for those functions, and if a function is particularly prone to being called like that (e.g. Mutex::lock or regex compilation), there should be a panicking version.

UPDATE: This post by Andrew Gallant, author of the excellent ripgrep, is a good overall discussion of the topic I am trying to address here. I basically entirely agree with it and recommend it as very educational; specifically, I disagree only in that I think that linting for unwrap is a good thing, for the reasons he acknowledges but ultimately does not find compelling in that section. In his own terms, I just think that the juice is worth the squeeze.

I see the unwrap function called a lot, especially in example code, quick-and-dirty prototype code, and code written by beginner Rustaceans. Most of the time I see it, ? would be better and could be used instead with minimal hassle, and the remainder of the time, I would have used expect instead. In fact, I personally never use unwrap, and I even wish it hadn’t been included in the standard library.

The simple reason is that something like expect is necessary and sometimes the best tool for the job, but it’s necessary rarely and should be used in the strictest moderation, just like panicking should be used in strictest moderation, and only where it is appropriate (e.g. array indexing, for reasons I elaborate on later). unwrap is too easy and indiscriminate, and using it at all encourages immoderate use.

This has turned out, much to my surprise, to be a somewhat controversial stance, and so I’d like to take some time to explain why I feel that way.

I’ll begin by reviewing what Result is and what options we have for dealing with its recoverable errors.

`Result`s and what to do with them

Rust is widely and rightly praised for its use of Result for recoverable error handling. Instead of using exceptions like C++, which propagate invisibly and surprisingly, or using sentinel values like NULL and -1, Rust has sum types and thus, a function can return a value that is either an error (of a specified, potentially narrow type) or the value we want:

#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

If we have a function call foo() that can fail and therefore returns a Result, we have a few different tactics we can use to handle it:

Ignore: We can ignore the return value, and therefore also ignore whether it errors. This is almost never what we actually want, and so the #[must_use] annotation on Result causes a warning to be issued:

foo(); // WARNING

Manual: We can manually match on the return value and do different things:

match foo() {
    Ok(value) => do_something(value),
    Err(err) => handle_error(err),
}

Propagate: We can propagate the error ergonomically using the ? operator. This makes Result work like exceptions in many of the good ways, while cluing in the reader to the additional control flow, which is good:

foo()?;

Panic with custom message: We can transform the error into an “unrecoverable” panic using expect, which takes a string argument which is used to customize the error message:

foo().expect("foo error");

Panic without custom message: We can transform the error into a panic with unwrap, which does not take a string argument and therefore leads to more generic error messages:

foo().unwrap();
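To put the tactics from the list above side by side in something runnable, here is a sketch where `foo` is a stand-in that parses a number (the function names `manual`, `propagate` and `escalate` are invented for this example):

```rust
use std::num::ParseIntError;

/// A fallible function standing in for `foo()`.
fn foo(input: &str) -> Result<u32, ParseIntError> {
    input.trim().parse::<u32>()
}

/// Manual: match on the result and recover with a default.
fn manual(input: &str) -> u32 {
    match foo(input) {
        Ok(value) => value,
        Err(_) => 0,
    }
}

/// Propagate: hand the error to the caller with `?`.
fn propagate(input: &str) -> Result<u32, ParseIntError> {
    let value = foo(input)?;
    Ok(value + 1)
}

/// Panic with a custom message: only sensible if failure means a bug,
/// as with this hard-coded input.
fn escalate(input: &'static str) -> u32 {
    foo(input).expect("hard-coded input should be a valid u32")
}

fn main() {
    println!("{}", manual("not a number")); // 0
    println!("{:?}", propagate("41")); // Ok(42)
    println!("{}", escalate("7")); // 7
}
```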

Most of the time, in production code, we will want to go with the propagate option, especially in library code where the application will likely have a better notion of what to do with the error. This option makes the flow control clearer, tends to result in better error messages when the error messages are ultimately outputted, and gives the calling functions more options.

The manual option is useful even in a library for when an error is in fact recoverable at that particular point (e.g. by retrying). In an application, we’re often at a point where it makes sense to report the error (via a log message or console output or user-facing error message).

Sometimes, errors are in fact no big deal, and should be suppressed completely, but this is better expressed through the manual option at the application layer, with a comment explaining why the right thing is to do nothing.

But sometimes (and only sometimes), panic is appropriate, and for that, there are two options, unwrap and expect. I always prefer expect for this, and pretend that unwrap doesn’t exist, because it makes panicking too easy. To explain, I’d like to discuss in what situations I think panics are appropriate.

When to panic?

Escalating an Err result to a panic should be done in similar situations to when panic is appropriate in general, which the Rust book offers some guidance on.

The most clear-cut case is when a code path is a logic error, when the error is only possible if the programmer has made a mistake and an invariant has been broken.

A typical example is array indexing. We often find ourselves with an array index that we (think we) know is valid and we want to use it to index an array, because we got it from looping or otherwise operating on the array bounds. We’re not so confident that we want to use unsafe and do an unchecked array access – that could result in a security vulnerability if we’re wrong – but it would also be nonsensical to try to recover from such an invalid access.

For array indexing, this is actually the most common scenario, and so the index operator in Rust actually panics for us if we specify an out-of-bounds index. An unsafe, unchecked array indexing method is available, as is one that never panics and instead returns an Option, but most of the time we want the panicking checked operation and so that is the version that gets the syntactic sugar: arr[index] will neither corrupt memory nor return a recoverable error on an invalid index, but instead panic.
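For reference, the three flavors on slices are the index operator, `get`, and the unsafe `get_unchecked`. A short sketch (the helper `lookup_three_ways` is made up for this example) shows them next to each other:

```rust
/// Looks up index `i` three ways. Panics if `i` is out of bounds.
fn lookup_three_ways(arr: &[i32], i: usize) -> (i32, Option<i32>, i32) {
    // Checked and panicking: the version with the syntactic sugar.
    let sugar = arr[i];

    // Checked and non-panicking: None if out of bounds.
    let optional = arr.get(i).copied();

    // Unchecked: undefined behavior on a bad index, hence `unsafe`.
    // SAFETY: `arr[i]` above would already have panicked if `i` were
    // out of bounds.
    let unchecked = unsafe { *arr.get_unchecked(i) };

    (sugar, optional, unchecked)
}

fn main() {
    let arr = [10, 20, 30];
    println!("{:?}", lookup_three_ways(&arr, 1)); // (20, Some(20), 20)
    println!("{:?}", arr.get(3)); // None
}
```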

This is definitely the best default for array indexing. But sometimes, logic errors result in recoverable errors, in Err (or its equivalent in the Option world, None). For example, maybe you have an array-like data structure in which only a get method is available, which returns an Option. If you were confident in your indexing, you would want to panic on None, and you can call expect or unwrap to make that happen.

It is certainly more ergonomic to write expect than to do the match manually, and less likely to lead to mistakes:

// With expect:
let val = arr.get(i).expect("i should be valid index");

// The manual equivalent:
let val = match arr.get(i) {
    Some(val) => val,
    None => panic!("i should be valid index"),
};

Besides logic errors, panics are also relevant in test cases, where they are used to indicate test failure.

No need to panic: Propagation Made Easy

However, expect and unwrap – especially unwrap – are also amenable to overuse and misuse.

Perhaps you’re doing prototyping and just need something that works most of the time, or you’re writing a simple app with limited error-handling needs. Some people use unwrap and expect for this situation, but I don’t. I use ? even in that situation, because I never know when prototype code might have to escalate to production code – either so suddenly there’s no time for me to intervene and improve the error handling or so gradually there’s no occasion for it and it never gets prioritized. Fixing crappy usage of ? in such a situation is way easier and more likely to happen than fixing a bunch of expects or unwraps.

How can I prototype with ?? Doesn’t it require a lot of extra work, compared with unwrap? Honestly, not really. Writing Result<Foo> is not substantially harder than writing Foo for functions which can error. As for converting between error types, libraries like eyre and anyhow exist so that all errors can be included.
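As a rough illustration of how little ceremony this takes, here is a sketch that uses `Box<dyn Error>` from the standard library as a stand-in for what eyre or anyhow provide (those crates offer nicer reports; the alias `AnyResult` and the function names are assumptions of this example, chosen to keep it dependency-free):

```rust
use std::error::Error;
use std::fs;

/// Catch-all result type for prototyping.
type AnyResult<T> = Result<T, Box<dyn Error>>;

/// Sums the integers in a whitespace-separated string.
fn sum_numbers(text: &str) -> AnyResult<i64> {
    let mut total = 0;
    for word in text.split_whitespace() {
        total += word.parse::<i64>()?; // ParseIntError is boxed by `?`
    }
    Ok(total)
}

/// Reads a file and sums the numbers in it. Two unrelated error types
/// (io::Error and ParseIntError) flow through the same `?`.
fn sum_file(path: &str) -> AnyResult<i64> {
    let text = fs::read_to_string(path)?;
    sum_numbers(&text)
}

fn main() -> AnyResult<()> {
    println!("{}", sum_numbers("1 2 3")?); // 6
    if let Err(err) = sum_file("/no/such/file") {
        println!("could not sum file: {}", err);
    }
    Ok(())
}
```

Compared with the unwrap version, the only extra typing is the return type and the `?`, and the code is already shaped for real error handling later.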

Example code similarly can be written with ?. This is important because Rust is rapidly growing and has a lot of new programmers using it. They see that a function returns a thing, and want to get to the thing and don’t know how to, and they see unwrap in the example code and they cargo cult it. Even if they have learned a thing or two about Rust, it does have the perfect type signature for their problem, and so they jump on it, and end up using it in prototype code and then trying to use it in production code. Perhaps they know about ?, but it has a higher barrier to entry, and so they’ll procrastinate learning about it.

In these situations, unwrap provides an easy, ergonomic way of calling a function that might error, and so it’s very tempting, like walking through the grass when there’s a paved path available. However, ? is generally preferable to unwrap or expect, and so the relative easiness is misaligned to the order of preferences.

And unfortunately, once code has been written using unwrap or expect heavily, it’s hard to adapt it to use ? and propagation, especially if those interfaces have come to be relied upon.

Why I prefer `expect` to `unwrap`

There are definitely legitimate use cases for turning an error into a panic, but they are relatively rare, especially if the code is well-factored. Turning an error into a panic is also extremely tempting to abuse. The abusive situation is more common than the legitimate one, so in many codebases, the bad unwraps and expects, the sloppy “OK for now” ones or the “it’s just an example” ones, outnumber the legitimate use cases.

Raising the barrier to entry seems like a good solution, and expect seems like the perfect balance. The error string can also serve as documentation of why this decision was made, like comments for unsafety. The fact that expect is a little less ergonomic is a feature, as it discourages casual use. expect has enough convenience to encapsulate the concept of escalation from a “recoverable” error to an “unrecoverable” error, but not so much that it competes with ? in ergonomics.

expect’s error message can serve as a comment as to why the panic is justified. Comments are a good thing, and for as questionable an operation as escalating an Err to a panic, it’s useful to explain why we think it will never happen even if we think it’s obvious. Like the comments recommended for unsafe blocks, I think that expect is a situation that deserves some indication to the reader as to why the author thinks this is OK.

Why have this in the error message rather than just a comment? expect’s error message is also helpful in debugging. unwrap can give good error messages, printing the error value and providing a backtrace, but in other configurations and deployments you might not see a backtrace and the error value might not be useful. Some unwrap calls might provide good enough error messages sometimes, but it doesn’t work 100% of the time, so it can’t be relied upon – especially when expect is readily available. Especially in the case of a logic error, when the condition was thought impossible, debugging will already be hard, and the person doing the debugging needs all the help they can get.

Objections

When I’ve expressed my opinions about unwrap before, one objection stands out in my mind as particularly interesting and particularly valid. I say above that legitimate use cases for turning an Err into a panic are rare, which is generally true, but sometimes can seem false. There are certain APIs where it comes up a lot, APIs where Errs frequently are actually logic errors.

For example, regular expressions. The regex crate uses a method called new that is used to prepare regular expressions. It is practically always called on a constant string, making any failure a logic error, which should result in a panic, as discussed above. However, this same new method returns a Result, necessitating an unwrap or an expect to make the logic error into a panic. Am I seriously suggesting that the poor user write .expect("bad regular expression") instead of .unwrap() every time?

Well, that puts regex compilation in the same category as array indexing in my mind, and means that the default regex compilation function should panic on the user’s behalf (of course, the Result version should still be possible, just as get is a possible function for slices).

Similarly, when I’ve expressed my opinions about unwrap, some have assumed I’m opposed to panics altogether, and asked me if I used array indexing, implying that if I accept the possibility of panics in array indexing, I should accept the possibility of panics in unwrap as well.

For both of these objections, I want to clarify something: I’m not opposed to panicking in logic error situations. But that does not imply that unwrap is a good idea. Most Errs are not logic errors, and so converting one to a panic should be a little inconvenient, and should require the user to think enough to write an error message.

For those situations where an error is actually likely to be a logic error, such as array indexing or regex compilation, returning Result need not be the function’s default behavior. Perhaps the author of regex can make new panic on a compilation error, and another function can be written for when the regex in question comes from user input, or where a regex compilation error would otherwise not be a logic error.

In general, when you find yourself using expect or unwrap over and over again in the same way, and you’re sure it’s legitimate each time, do what you do with all smelly-seeming code if you know it’s actually the right thing in spite of the smell: Wrap it in an abstraction. Put it in a function that calls expect to panic on error.

This is not cheating. This new, panicking function would instead serve as documentation of the fact that in this context, an Err is in fact likely to be a logic error, a tangible paper trail that someone made a conscious call that, as a policy, panicking is appropriate in this instance. The decision to panic instead of returning an Err in this situation is made in one place instead of many, where it can be explained in a detailed comment if desired, and where it certainly won’t be too much of a burden to use expect instead of unwrap. Even the fact of the function existing and having a panic-based interface is a signal from the library author that they have thought about this issue, and deemed the situation to be more analogous to array indexing than, say, a file-not-found.
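
Here is a minimal sketch of such a wrapper. To keep it standard-library-only, it uses IP address parsing as a stand-in for regex compilation; the names literal_ip and user_supplied_ip are hypothetical and not part of any crate.

```rust
use std::net::{AddrParseError, IpAddr};

/// Parses an IP address that is hard-coded in the source.
///
/// Panics if `literal` is not a valid IP address. Callers only pass string
/// literals, so a parse failure is a bug in the program rather than a
/// recoverable condition. The decision to panic is made once, here.
pub fn literal_ip(literal: &'static str) -> IpAddr {
    literal
        .parse()
        .expect("hard-coded IP address literal should be valid")
}

/// For addresses that come from user input, keep the `Result`.
pub fn user_supplied_ip(input: &str) -> Result<IpAddr, AddrParseError> {
    input.parse()
}
```

Call sites then read literal_ip("127.0.0.1") with no unwrap or expect in sight, and the single expect message explains the policy for all of them.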

Tendencies and Statistics

In any case, array indexing and regex compilation are the exceptions, not the rule. Almost all bounds check failures may be logic errors. Almost all regex compilation errors may be logic errors. Making these functions panic would indeed do little damage, as panicking is almost always the right move.

But – and this is a big “but” – most functions, when they return Err, genuinely are signalling recoverable errors, and unwrap doesn’t discriminate – it works equally well on all of them, in the inappropriate situations as well as the appropriate situations. With array indexing and regex compiling, the nature of the function being called gives some indication of why it’s a logic error; with unwrap, there is no indication.

Generally, this argument is in terms of statistics and human nature, not in terms of absolutes. Turning an Err into a panic should be rare, not necessarily in terms of how often it happens, but in how often it shows up in code. If it is common, either the programmer is using bad practices, and should be using better practices, or the API has a design flaw, and that needs to be fixed. In either case, expect is better than unwrap.

Ideally, we don’t get used to seeing expect and unwrap being used all the time. We don’t get used to casually panicking on Err, but instead treat panicking like an operation that should be considered carefully, whether once for all instances of a specific call (as in array indexing or regex construction), or on a case-by-case basis (for other uses of expect).

Humans are creatures of habit and lazy by nature. unwrap is a powerful tool, a way to get around the type system, and as such, we might find ourselves addicted to it. We should treat even expect as mildly suspicious, something only to be used with consideration, something to be wrapped behind an abstraction (as in the regex case). unwrap is even more dangerous, because it is easier, and given that legitimate usage of expect should be rare (again in terms of lines of code, not frequency of invocation) and hidden behind an abstraction when it is common in frequency of invocation, I see no need for unwrap to exist.

Context

I am aware that removing unwrap from Rust is not a viable option at this point, which is why I said that I wish it was never put in Rust to begin with. I am aware that unwrap is used in the Rust compiler, and that there is no consensus to avoid unwrap to the level that I avoid it.

I will however note that the documentation of unwrap comes with a warning not to use it. The warning is framed in terms of the fact that unwrap may panic, but the documentation of expect, where this is equally true, does not come with such a warning.

Conclusion

Escalating an Err to a panic is sometimes appropriate. But it should be a considered choice, either on a function-by-function basis (through a wrapper function calling expect or a different choice of interface), or on a case-by-case basis. In either case, unwrap makes it too easy.

Including an error message, and documenting why a panic is appropriate (either through the error message or separately) should not be too much to ask. If it is, that’s a code smell. The fact that expect is more difficult is a feature.

In this article I have mentioned only briefly the other motivation for using expect – better error messages for debugging. I thought the code smell argument was more important. But debuggability can be very important as well, so I’ll discuss it briefly here. I don’t think it’s safe to assume backtraces will always be available. I don’t think it’s safe to assume every use of unwrap will print a useful error message, even if it sometimes can. Maybe an individual use of unwrap in one context does not cause this problem, but once unwrap is established as acceptable, it opens the door for it to be abused.

I personally do not use unwrap, nor do I sign off on code that does. I even prefer expect("foo") to unwrap, because it signals that it’s off-the-cuff example code and shows that the person writing it knows that more consideration would be needed to put it into production. Please consider joining me in this approach.

If you do not want to implement so strict a policy, and you think I’m too extreme in this way, hopefully this article at least makes my argument clearer, and explains why I do not call unwrap but still feel comfortable indexing my arrays. Hopefully also this has given food for thought about Results, errors, and panics.

Edits

This post has been edited to clarify certain things, including a clarification in the opening to the post to make sure my overall position is easily comprehensible.

Another Confusing Haskell Error Message

2022-06-17T00:00:00+00:00

The Error Message

I’ve written before about just how befuddling Haskell error messages can be, especially for beginners. And now, even though I have some professional Haskell development under my belt, I ran across a Haskell error message that confused me for a bit, where I had to get help. It’s clear to me now when I look at the error message what it’s trying to say, but I legitimately was stumped by it, and so, even though it’s embarrassing for me now, I feel the need to write about how this error message could have been easier to understand:

frontend/src/Frontend/WordTiles.hs:87:25-45: error:
    • Could not deduce (HasDomEvent t () 'ClickTag)
        arising from a use of ‘domEvent’
      from the context: (DomBuilder t m, PostBuild t m, MonadHold t m,
                         MonadFix m)
        bound by the type signature for:
                   app :: forall t (m :: * -> *).
                          (DomBuilder t m, PostBuild t m, MonadHold t m, MonadFix m) =>
                          m ()
        at frontend/src/Frontend/WordTiles.hs:(70,1)-(76,9)
    • In the expression: domEvent Click submit
      In an equation for ‘click’: click = domEvent Click submit
      In the second argument of ‘($)’, namely
        ‘do inputText <- fmap value $ inputElement $ def
            submit <- el "button" $ text "Submit"
            let click = domEvent Click submit
            pure $ current inputText <@ click’
   |
87 |             let click = domEvent Click submit
   |                         ^^^^^^^^^^^^^^^^^^^^^

The code in question was in the Reflex FRP’s “widget” monad, defined as usual by a number of monad typeclasses:

app
  :: ( DomBuilder t m
     , PostBuild t m
     , MonadHold t m
     , MonadFix m
     )
  => m ()
app = do
    let
        start = Game [] wordSet "PIETY"
        moveAll word (gm, _) = move word gm
    rec
        game <- foldDyn moveAll (start, []) newWord
        gameDisplay game
        newWord <- fmap (fmap T.unpack) $ el "div" $ do
            inputText <- fmap value $ inputElement $ def
            submit <- el "button" $ text "Submit"
            let click = domEvent Click submit
            pure $ current inputText <@ click
    pure ()

My Confusion

Some of you might already see the problem, especially those who know Reflex. But I didn’t see it. My brain saw (HasDomEvent t () 'ClickTag) and completely misread it. I assumed it meant something like “with t as the tag, we can get the DOM event as 'ClickTag.” I assumed that the () was irrelevant to understanding the type, a placeholder indicating that some optional type argument had not needed to be provided.

I then tried to address this by adding (HasDomEvent t () 'ClickTag) to the context of app:

app
  :: ( DomBuilder t m
     , PostBuild t m
     , MonadHold t m
     , MonadFix m
     , HasDomEvent t () 'ClickTag
     )
  => m ()

It wasn’t the issue.

I had hoped this wasn’t the issue, but I thought it might be, and I had no idea what the issue actually was. Maybe we just needed to list all the DOM events t can handle, I had thought. I should’ve noticed it was t and not m, and I would expect m to be involved in such a context. I should have read the thing out loud in my head, and realized that it wasn’t t that didn’t have the DOM event of 'ClickTag, but (). But I didn’t. My eyes kind of glazed over at the complicated typeclass expression. I just didn’t think.

The Solution

The problem, a friend had to tell me, was nothing to do with t and everything to do with (). submit was not, as I had thought, a representation of the DOM element I had created with a button. To do that, you need to call el':

(submit, _) <- el' "button" $ text "Submit"
let click = domEvent Click submit
pure $ current inputText <@ click

submit, gotten from el, was actually of type (). And, of course, you can’t get any DOM event out of (), let alone a Click.

Better Error Messages

But while I left this situation with take-aways for myself, to better read Haskell error messages in the future, I was also frustrated at the Haskell compiler, especially in comparison to the Rust compiler I have gotten used to recently through my job.

List Involved Types

How on earth did it not indicate at all that (HasDomEvent t () 'ClickTag) was a problem with the type of submit? Sure, the constraint “arose” from the type of domEvent, but submit is clearly an important value involved in making the type not work.

This is easier to implement than a Haskell person might think. I understand that it’s unclear which type “caused” the problem from a human perspective. So why not list them all? Just a laundry list of inferred types would’ve been helpful: I would have seen that submit was of type (), and that would’ve helped me through the situation. Is that too much to ask? Something like this:

Related types:
domEvent :: HasDomEvent t a en => EventName en -> a -> Event t (EventResultType en)
Click :: EventName ClickTag
submit :: ()

Any two of those types would have given me the hint I needed. Really, either domEvent or submit would have enabled me to figure it out.

Warn About `()` Bindings

Similarly, how on earth was I allowed to write this line without a warning:

submit <- el "button" $ text "Submit"

submit is invariably (). Shouldn’t binding a () value be at least a warning? In what possible situation would you want to do that? I know that situations exist, especially situations where a type is sometimes (), but this type is invariably (), and I have -Wall turned on in this project. I want warnings for things that there are occasionally legitimate use cases for. Binding a name to (), especially when it’s from a function call and not literally let unit = (), has got to be a mistake 99 times out of 100.

This is apparently not a warning in Rust either, and I am confused by that, because Rust is normally better about its warnings:

fn foo() {
}

fn main() {
    let x = foo(); // Compiles without warning!
    drop(x);
}

I think it would be a reasonable and useful warning in both programming languages. The opposite situation already provokes a warning in Haskell, where you have an action in a do-block that returns a value and you implicitly ignore it:

[jim@palatinate:~/Writing/TheCodedMessage/content/posts]$ ghci -Wall
GHCi, version 8.8.4: https://www.haskell.org/ghc/  :? for help
Prelude> do { pure 'x'; pure () }

<interactive>:1:6: warning: [-Wunused-do-bind]
    A do-notation statement discarded a result of type ‘Char’
    Suppress this warning by saying ‘_ <- pure 'x'’
Prelude>

It only makes sense that the converse mistake, which is even more likely to be a mistake, also have a warning.

Conclusion

Error messages are an extremely important part of a programming language, both for adoption and for programmer efficiency. Part of the point in working in a strongly-typed language with a sophisticated type system, like Rust or Haskell, is supposed to be that we discover most of our problems through compiler error messages, rather than through runtime bugs. So most of our troubleshooting will happen at compile time, grappling with these error messages. This makes error messages in Haskell more important than in the average programming language, and makes the standard for good error messages even higher. We can do better than the status quo, and we should.

Command Line Interface UXes Need Love, Too

2022-06-16T00:00:00+00:00

It took me a long time to admit to myself that the venerable Unix command line interface is stuck in the past and in need of a refresh, but it was a formative moment in my development as a programmer when I finally did. Coming from that perspective, I am very glad that there is a new wave of enthusiasm (coming especially from the Rust community) to build new tools that are fixing some of the problems with this very old and established user-interface.

The Role of the Unix CLI Interface

To describe the Unix command line interface, “venerable” is definitely the right word: many programmers (including myself at some points of my life) have an awe of Unix and its role in computing history that has sometimes bordered on veneration.

Since the Unix operating system began development at Bell Labs in 1969, it has gone viral. That’s probably an understatement: Most modern operating systems descend from this original Unix, either directly through gradual code change (macOS and iOS are descended from it through BSD), or through Linux (the kernel behind most servers and behind Android and ChromeOS) and its accompanying usermode software (much of which was part of the GNU project), which were designed to work like Unix due to its familiarity for users and programmers.

Unix was and is billed not just as an operating system, but a philosophy. Among other things, its command line interface has been held up time and time again as an example of good design practices and an ideal realization of this philosophy, with its developer- and administrator-friendly orientation towards plain text files and with its modularity, especially as embodied in the concept of pipelining.

And as a result, when people say they know “the command line,” it’s almost certainly the Unix command-line interface that they’re talking about. And what’s more, many of us were taught it from texts that gushed about how great it is. But even the Unix command line interface, though part of a well-established standard, the topic of many books, and used by and intimately familiar to millions of programmers and admins across generations, is, in the end, just another computer interface for users and developers. And it has its flaws.

A Disappointing Ambiguity

As I alluded to before, when I was a much younger programmer, I had an awe-struck veneration for Unix. One of my colleagues at an early job in my career referred to me as our company’s “Unix philosopher.” While I wasn’t sure whether he meant it as a compliment, at the time, I took it as one.

The first flaw that really got my attention in the Unix command line had to do with the mv command. I’m going to take some time explaining this flaw in detail, as it’s somewhat subtle, and as discovering it was a formative moment for me in my development as a programmer.

mv, as many of you know, is short for “move.” And while its job indeed includes moving files from one place to another, due to idiosyncrasies of the Unix file system (if they can be called idiosyncrasies when most file systems followed Unix’s lead on this), moving files and renaming files are closely related operations under the hood, causing the mv command to be both the “move” command and the “rename” command:

# Assume a file called 'draft-file'
# Assume a directory called 'final-docs'

# Rename 'draft-file' to 'final-file' and put it in 'final-docs'
mv draft-file final-file # rename 'draft-file' to 'final-file'
mv final-file final-docs # move 'final-file' into 'final-docs' directory

# Alternatively, one step:
mv draft-file final-docs/final-file

As you can see, there is no distinction between these operations. There is no option that you must enable to get the “moving” feature as opposed to the “renaming” feature. And this can result in surprises, which are bad in software development.

Consider this command again:

mv draft-file final-file

What does it do? It changes the name of the file from draft-file to final-file, keeping it in the same directory, right? Well, probably, and that’s almost certainly what the user intended, but what if someone, accidentally or intentionally, had created a directory called final-file? That command would be interpreted instead as moving draft-file into the final-file directory:

$ # Rename operation
$ touch draft-file
$ ls
draft-file
$ mv draft-file final-file
$ ls
final-file
$ ls final-file
final-file
$ rm final-file
$
$ # Move operation
$ mkdir final-file # Imagine someone else did this, or it was done by accident
$ touch draft-file
$ mv draft-file final-file
$ ls
final-file
$ ls final-file
draft-file
$ rm final-file
rm: cannot remove 'final-file': Is a directory
$ rm -rf final-file

Notice that if there is no color-coding enabled, a simple ls command doesn’t even distinguish the two situations, so you can’t tell which one happened without issuing a more specific command, as ls also has a dual role: it can either show you the names of the files you specify, if they are present, or it can show you the files in a directory you specify. The -d option disambiguates that you want the names and not the contents, but the default is still ambiguous.

In the case of the mv command, this potentially could even be a security vulnerability in a shell script (which is admittedly not a very secure platform). It is in any case an unnecessary complication.

The GNU version of mv has a -T option to indicate that the destination is not to be interpreted as a directory to put things in, and a -t option to show unambiguous intent for a target directory to be used. But these are extensions; the POSIX standard manual page for mv doesn’t mention them.

And while these GNU extensions are helpful, especially in scripts that you know will only be run with the GNU version of mv (that is, not on macOS), I don’t think they go far enough. Most people don’t know about them, and the possibility of surprise is still there.

Disillusioned

When I realized this, it created a huge hole in my previous (admittedly unreasonable) esteem for the Unix command line interface. I realized that the ideal solution was something impractical, almost unthinkable to the younger version of me: mv should be deprecated in favor of two commands, one to do renaming, and one to do targeted directory-dropping.
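
To make the idea concrete, here is a rough sketch of what the two separate operations might look like as Rust functions built on std::fs::rename. The names rename_to and move_into are hypothetical, and the directory checks are not atomic (another process could create or remove the destination between the check and the rename), so treat this as an illustration of the interface rather than a hardened implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Renames `from` to `to`, refusing to proceed if `to` is an existing
/// directory. It never silently turns into "move into that directory".
pub fn rename_to(from: &Path, to: &Path) -> io::Result<()> {
    if to.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "destination is a directory; use move_into instead",
        ));
    }
    fs::rename(from, to)
}

/// Moves `from` into the directory `dir`, keeping its file name.
/// Fails if `dir` is not an existing directory.
pub fn move_into(from: &Path, dir: &Path) -> io::Result<()> {
    if !dir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            "destination is not a directory; use rename_to instead",
        ));
    }
    let name = from.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "source has no file name")
    })?;
    fs::rename(from, dir.join(name))
}
```

With this split, the caller's intent is visible in the name of the operation, and a stray final-file directory produces an error instead of a surprise.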

This glitch in the mv command is just a gotcha to be aware of, one of many minor flaws to dance around when shell scripting. But I remember it strongly, because rather than being warned about it in a book, I discovered it myself, and therefore it was the distinct moment I realized that the command line interface would need to be improved at some point. And once the metaphorical levee was broken, I started noticing many inconveniences and problems in the traditional Unix CLI tools, often more relevant to my day-to-day workflow than this minor gotcha.

I ultimately came to read more critical sources about Unix, such as the famous UNIX-HATERS Handbook, and similar sources that emphasized the problems. And I’m very glad I went through this process, because before this, I was a naive CLI user and shell-scripter, trusting the system way more than I should, leaving myself open to serious problems.

Many Unix commands have gotchas and inconveniences, some I knew about before this revelation and brushed aside, others that I found out about later. tar has its idiosyncratic traditional syntax that many, many scripts (and people) still use, and inconsistency between platforms on whether you need -z to unpack a compressed archive. The way the shell itself worked also contained gotchas: What happens if you have files whose names start with a -? (Answer: Their names get misinterpreted as options, even if you didn’t type them but simply included them accidentally in a wildcard expansion.)

Among the more practical issues that particularly affect me, I want to emphasize two: Why is find’s syntax so gnarly, so that you have to type out -name and explicitly specify the current directory? Why is it so hard to get grep to not display the pages-long lines of minified JavaScript or similar files when I want to only display the shorter lines from actual source files?

The Future

Luckily, improvement is on its way. For the last two cherry-picked examples, there are new re-conceptions of find and grep -r that fix them (with new names, of course, so they’re not beholden to interface-compatibility), and I recommend them (dare I say such blasphemy?) over the traditional equivalents:

fd-find, a replacement for find
ripgrep, a replacement for grep -r

Don’t let their long names dissuade you; they are commonly installed as fd and rg, respectively, and come with such modern features as:

Normal command line syntax (fd)
Integration with git, the de facto standard version control system, by ignoring .git and .gitignore’d files by default (both)
Line length maximums (rg)
Modern leveraging of multithreading (both)
Better performance than their traditional counterparts

These are the only new Rust-based commands I’ve tried, but they’ve already vastly improved my workflow, so that I miss having them (fd especially) when SSH’d into relatively minimalist embedded devices. And I have reason to hope there are more gems out there as part of this explosive movement to implement new Rust-based commands.

Whether people are doing this to improve their Rust chops, or because they’ve felt a need for a long time and Rust is just their PL of choice, it’s good to see some actual evolution in my day-to-day experience as a Unix CLI user. It hasn’t fixed mv – yet – but it’s good to see it evolving.

On the implementation side of things, I am also very happy to see a Rust project to reimplement the standard coreutils. The C implementations undoubtedly leave some performance and stability on the table, and a new implementation is long overdue. A fresh implementation of these utilities will hopefully also spark improvements to the interfaces.

And Meanwhile, in `git`-land

On a related positive note, I learned very recently (in 2022) that git has (in 2019) fixed a problem similar to mv’s: git checkout, ambiguous in a similar way, has been rendered unnecessary by the less ambiguous git switch and git restore.

Trivia About Rust Types: An (Authorized) Transcription of Jon Gjengset's Twitter Thread

2022-06-06T00:00:00+00:00

Preface (by Jimmy Hartzell)

I am a huge fan of Jon Gjengset’s Rust for Rustaceans, an excellent book to bridge the gap between beginner Rust programming skills and becoming a fully-functional member of the Rust community. He’s famous for his YouTube channel as well; I’ve heard good things about it (watching video instruction isn’t really my thing personally). I have also greatly enjoyed his Twitter feed, and especially have enjoyed the thread surrounding this tweet:

Okay, learning time! Name a @rustlang type (can be generic), and I’ll (try to) tell you something you didn’t know about that type!

What great fun!

I immediately felt that this thread should have a transcription outside of social media (Jon Gjengset already did a Reddit transcription), and so I asked him if he had any plans to turn it into a blog post, and failing that, whether I could. Much to my surprise, he gave me the go-ahead.

So I have done so, and this is the blog post! It wasn’t even boring, because I learned so much as I copied the entries! Minor edits have been made to add formatting and adapt links to how blogs work rather than how Twitter works. This is taken from the Reddit version. My markdown source is also available.

So, without further ado, Jon Gjengset’s “Trivia About Rust Types.”

Trivia About Rust Types (by Jon Gjengset)

`std::fmt::Debug`

Did you know that the Formatter argument to Debug::fmt makes it really easy to customize debug representations for structs, enums, lists, and sets? See the debug_* methods on it.

`Formatter`

Did you know that std::fmt::Formatter is super easy to use if you want more control over debugging for a custom type? For example, to emit a “list-like” type, just Formatter::debug_list().entries(self.0.iter()).finish().
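
A minimal sketch of that one-liner in context, assuming a made-up newtype called Scores:

```rust
use std::fmt;

/// A hypothetical list-like newtype, invented for this example.
pub struct Scores(pub Vec<u32>);

impl fmt::Debug for Scores {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Emit the wrapped values as a plain list, hiding the newtype wrapper.
        f.debug_list().entries(self.0.iter()).finish()
    }
}
```

Formatting a Scores value with {:?} then looks like formatting the inner list directly, and {:#?} pretty-printing comes along for free.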

`Option<T>`

Did you know that Option<T> implements IntoIterator yielding 0/1 elements, and you can then call Iterator::flatten to make that be 0/n elements if T: IntoIterator?
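
As a small sketch of how this plays out (the function name tags is just for illustration):

```rust
/// Turns an optional list of tags into a plain list, yielding nothing
/// for `None`. `into_iter` gives 0 or 1 `Vec`s; `flatten` then yields
/// the elements of that `Vec`, if there is one.
pub fn tags(maybe_tags: Option<Vec<&str>>) -> Vec<&str> {
    maybe_tags.into_iter().flatten().collect()
}
```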

`type EmptyTupleList = Vec<()>`

Did you know that since () is a zero-sized type, and the vector never actually has to store any data, the capacity of Vec<()> is usize::MAX!
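
This is easy to check for yourself; a tiny sketch (the function name is arbitrary):

```rust
/// Returns the capacity reported by a freshly created, empty `Vec<()>`.
pub fn unit_vec_capacity() -> usize {
    let units: Vec<()> = Vec::new();
    units.capacity()
}
```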

`T`

Did you know that T doesn’t imply ownership? When we say a type is generic over T, that T can just as easily be a reference to something on the stack, and the type system will still be happy. Even T: 'static doesn’t imply owned — consider &'static str for example.

[Reminds me of this excellent article -Jimmy]

`std::sync::mpsc::channel::Sender`

Did you know that std::sync::mpsc has had a known bug since 2017, and that the implementation may actually be replaced entirely with the crossbeam channel implementation? https://github.com/rust-lang/rust/pull/93563

`u128`

Did you know that even though we got u128 a long time ago now, we still don’t have repr(128)? https://github.com/rust-lang/rust/issues/56071

`std::ffi::OsString`

Did you know that there are per-platform extension traits for OsString that bake in the assumptions you can safely make on that platform? Such as strings being [u8] on Unix and UTF-16 on Windows.

`std::ptr::NonNull`

Did you know that one of the super neat features of NonNull is that it enables the same niche optimization that regular references and the NonZero* types get where Option<NonNull<T>> is the same size as *mut T?
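
A short sketch that checks this size claim with std::mem::size_of (the helper name is made up):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// True if wrapping `NonNull<T>` in `Option` costs no extra space
/// compared to a raw pointer, because `None` reuses the null value.
pub fn option_nonnull_is_pointer_sized<T>() -> bool {
    size_of::<Option<NonNull<T>>>() == size_of::<*mut T>()
}
```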

`Cow<T>`

Did you know that there used to be a special IntoCow trait, but it was deprecated before 1.0 was released! https://github.com/rust-lang/rust/issues/27735

`Box<T>`

Did you know that Box<T> is a #[fundamental] type, which means that it’s exempt from the normal rules that don’t allow you to implement foreign traits for foreign types (assuming T is a local type)?

`std::process::Child`

Did you know that std has three different ways to spawn a child process on Linux (posix_spawn, clone3/exec, fork/exec) depending on what capabilities your kernel version has?

`Pin<T>`

Did you know that the name Pin (and the name Unpin) were both heavily debated? Pin was almost called Pinned, for example. The discussion is an interesting read now after the fact.

`Vec<T>`

Did you know that Vec::swap_remove is way faster than Vec::remove if you can tolerate changes to ordering?
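
A small sketch of the difference in behavior (the function exists only to show both results side by side):

```rust
/// Removes index 1 from the same starting vector in two ways and
/// returns both results.
pub fn remove_vs_swap_remove() -> (Vec<char>, Vec<char>) {
    let mut ordered = vec!['a', 'b', 'c', 'd'];
    let mut unordered = ordered.clone();
    // `remove` shifts every later element one slot to the left: O(n).
    ordered.remove(1);
    // `swap_remove` moves the last element into the hole: O(1).
    unordered.swap_remove(1);
    (ordered, unordered)
}
```

The first vector keeps its order (a, c, d), while the second ends up as a, d, c.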

Did you know that the smallest non-zero capacity for a Vec<T> depends on the size of T?

`CStr`

Did you know that CStr::default creates a CStr that points to a const string "\0" stored in the binary text segment, which means all default CStrs point to the same (non-null) string!

`for<'a> SomeTrait<'a>`

Did you know that you can use for<'a> to say that a bound has to hold for any lifetime 'a, not just a specific lifetime you happen to have available at the time. For example, where for<'a> &'a T: Read says that any shared reference to a T must implement Read.
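
Here is a minimal sketch of such a bound in use. It swaps Read for IntoIterator, purely because that makes for a shorter example; the function name sum_twice is made up.

```rust
/// Sums a collection twice without consuming it. The bound says that a
/// shared reference to `C`, with any lifetime, can be iterated over and
/// yields references to `u32` with that same lifetime.
pub fn sum_twice<C>(collection: &C) -> u32
where
    for<'a> &'a C: IntoIterator<Item = &'a u32>,
{
    let first: u32 = collection.into_iter().sum();
    let second: u32 = collection.into_iter().sum();
    first + second
}
```

The function works for any collection whose shared references are iterable this way, such as Vec<u32> or BTreeSet<u32>.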

This monstrous warp type

Did you know that the trailing commas you see in some places in there, ,), are to distinguish one-element tuples from regular parenthetical expressions?

`FnOnce`

Did you know that until Rust 1.35, you couldn’t call a Box<dyn FnOnce> and needed a special type (FnBox) for it! This was because it requires “unsized rvalues” to implement, which are still unstable today. https://github.com/rust-lang/rust/issues/28796 + https://github.com/rust-lang/rust/issues/48055

`f32`

Did you know that in Rust 1.62 we’ll get a deterministic ordering function for floating point numbers? https://github.com/rust-lang/rust/pull/95431
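
That function is f32::total_cmp. A minimal sketch of sorting with it, assuming Rust 1.62 or later (the wrapper name sort_floats is arbitrary):

```rust
/// Sorts floats using the IEEE 754 total order, which gives every value,
/// including NaN and the two zeros, a definite position.
pub fn sort_floats(values: &mut [f32]) {
    values.sort_by(|a, b| a.total_cmp(b));
}
```

Under this ordering, negative zero sorts before positive zero, and a positive NaN sorts after every other positive value.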

`Arc<T>`

Did you know that Arc has a make_mut method that effectively gives you copy-on-write? Given a &mut Arc<T>, it will either give you &mut T if there are no other Arcs, or it will clone T, make the Arc<T> point to that new T, and then give you a &mut to it!
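
A minimal sketch of that copy-on-write behavior (push_cow is a made-up helper):

```rust
use std::sync::Arc;

/// Appends to a vector behind an `Arc`. If other `Arc`s still point at the
/// same data, the data is cloned first, so they are left untouched.
pub fn push_cow(shared: &mut Arc<Vec<i32>>, value: i32) {
    Arc::make_mut(shared).push(value);
}
```

If a second Arc exists when push_cow is called, it keeps seeing the old contents; if not, the vector is modified in place with no clone.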

`!`

Did you know that std::convert::Infallible is the “original” !, and that the plan is to one day replace Infallible with a type alias for !?

`fn`

Specifically, did you know that the name of a function is not an fn? It’s a FnDef, which can then be coerced to a FnPtr?

`PhantomData`

Did you know that it’s actually kind of tricky to define PhantomData yourself: https://github.com/dtolnay/ghost

`u32`

Did you know that u32 now has associated constants for MIN and MAX, so you no longer need to use std::u32::MIN and can use u32::MIN directly instead?

`bool`

Did you know that bool isn’t just “stored as a byte”, the compiler straight up declares its representation as the same as that of u8?

`Any`

Did you know that Any is really non-magical? It just has a blanket implementation for all T that returns TypeId::of::<T>(), and to downcast it simply compares the return value of that trait method to see if it’s safe to downcast to a type! TypeId is magic though.
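
From the user's side, that comparison is hidden behind downcast_ref. A small sketch (the function describe is invented for illustration):

```rust
use std::any::Any;

/// Describes a value whose concrete type is only known at run time.
/// Each `downcast_ref` call succeeds only if the stored type matches.
pub fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("an i32: {}", n)
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("a String: {}", s)
    } else {
        String::from("something else")
    }
}
```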

`Self`

Did you know that fn foo(self) is syntactic sugar for fn foo(self: Self), and that one day you’ll be able to use other types for self that involve Self, like fn foo(self: Arc<Self>)? https://github.com/rust-lang/rust/issues/44874

`()`

Did you know that () implements FromIterator, so you can .collect::<Result<(), E>> to just see if anything in an iterator erred?

[Note that this doesn’t say whether or not this is a good idea. -Jimmy]
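
For the curious, a minimal sketch of what that looks like (all_parse is a hypothetical name):

```rust
use std::num::ParseIntError;

/// Checks that every input parses as a `u32` without keeping the parsed
/// values. Collecting stops at, and returns, the first parse error.
pub fn all_parse(inputs: &[&str]) -> Result<(), ParseIntError> {
    inputs
        .iter()
        // Discard each parsed value so the items are `Result<(), _>`.
        .map(|s| s.parse::<u32>().map(|_| ()))
        .collect()
}
```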

`struct S`

Did you know that struct S implicitly declares a constant called S, which is why you can make one using just S?

`RefCell`

Did you know that RefCell allows you to replace a value in-place directly (like std::mem::replace)? https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.replace
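
A tiny sketch of RefCell::replace in use (the function and its names are made up):

```rust
use std::cell::RefCell;

/// Swaps in a new label through a shared reference and returns the old one.
pub fn rotate_label(label: &RefCell<String>, next: &str) -> String {
    label.replace(next.to_string())
}
```

Like any mutable borrow of a RefCell, replace panics if the value is currently borrowed.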

`core::num::Wrapping`

Did you know that there used to also be a trait accompanying Wrapping, WrappingOps, that was removed last minute before 1.0? https://github.com/rust-lang/rust/pull/23549

`*const T`

Did you know that, at least for the time being, *const T and *mut T are more or less equivalent? https://github.com/rust-lang/unsafe-code-guidelines/issues/257

`std::os::unix::net::UnixStream`

Did you know that (on nightly) you can pass UNIX file descriptors over UnixStreams too, and thereby give another process access to a file it may not otherwise be able to open?

`std::sync::Condvar`/`Mutex`

Did you know that Mara is doing some awesome work on making Condvar (and Mutex and RwLock) much better on a wide array of platforms? https://github.com/rust-lang/rust/issues/93740

`std::task::Waker`

Did you know that Waker is secretly just a dyn std::task::Wake + Clone done in a way that doesn’t require a wide pointer or support for multi-trait dynamic dispatch? See https://doc.rust-lang.org/std/task/struct.RawWakerVTable.html

`impl Trait`

Did you know that impl Trait in argument position and impl Trait in return position represent completely different type constructs, even though they “feel” related? https://doc.rust-lang.org/nightly/reference/types/impl-trait.html

`BTreeMap<K, V>`

Did you know that BTreeMap is one of the few collections that still doesn’t have a drain method? https://github.com/rust-lang/rust/issues/81074

`struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);`

Did you know that PhantomData<T> has variance like T, and *mut T is invariant over T, and so by placing a lifetime inside T you make the outer type invariant over that lifetime?

`Rc<T>`

Did you know that the Rc type was among the arguments for why std::mem::forget shouldn’t be marked as unsafe? https://github.com/rust-lang/rust/issues/24456

`std::future::Ready`

Did you know that these days you can just use async move { x } instead of future::ready(x). The main reason to still use future::ready(x) is that you can name the future it returns, which is harder with async (without type_alias_impl_trait that is).

`usize`

Did you know that usize isn’t really “the size of a pointer”. Instead, it’s more like “the size of a pointer address difference”, and the two can be fairly different! https://github.com/rust-lang/rust/issues/95228

`std::thread::Thread`

Did you know that the ThreadId that’s available for each Thread is entirely a std construct? Creating a ThreadId simply increments a global static counter under a lock.

`std::ops::ControlFlow`

Did you know that ControlFlow is really a stepping stone towards making ? work for other types than Option and Result? The full design has gone through a lot of iterations, but the latest and greatest is RFC3058.
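ControlFlow is already usable on stable Rust with iterator methods such as try_for_each. A short sketch, where first_negative is a made-up helper:

```rust
use std::ops::ControlFlow;

// Stops iterating as soon as a negative value is seen and returns it.
fn first_negative(values: &[i32]) -> Option<i32> {
    let flow = values.iter().try_for_each(|&v| {
        if v < 0 {
            ControlFlow::Break(v)
        } else {
            ControlFlow::Continue(())
        }
    });
    match flow {
        ControlFlow::Break(v) => Some(v),
        ControlFlow::Continue(()) => None,
    }
}

fn main() {
    println!("{:?}", first_negative(&[3, 1, -4, 1, -5]));
    println!("{:?}", first_negative(&[3, 1, 4]));
}
```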

`File`

Did you know that there are implementations of Read, Write, and Seek for &File as well, so multiple threads can share a single File and call those concurrently. Whether they should is a different question of course.

`Result<T, E>`

Did you know that Rust originally (pre-1.0) had both Result and an Either type? They decided to remove Either way back in 2013

`Cow<str>`

Did you know that because Cow<'a, T> is covariant in 'a, you can always assign Cow::Borrowed("some string") to one no matter what it originally held?
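A sketch of that assignment, with a hypothetical reset function. The string literal is &'static str, which is accepted wherever a shorter-lived borrow is expected:

```rust
use std::borrow::Cow;

// Overwrites whatever the Cow held with a borrowed 'static string.
fn reset<'a>(name: &mut Cow<'a, str>) {
    *name = Cow::Borrowed("some string");
}

fn main() {
    let owned = String::from("temporary");
    let mut name: Cow<'_, str> = Cow::Borrowed(owned.as_str());
    reset(&mut name);
    println!("{name}");
}
```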

`PanicInfo`

Did you know that since PanicInfo is in core, its Display implementation cannot access the panic data if it’s a String (since it can’t name that type), so trying to print the PanicInfo after a std::panic::panic_any(format!("x y z")) won’t print "x y z"? Source link.

`std::ffi::c_void`

Did you know that the whole c_void type is a collection of hacks to try to work around the lack of extern types? https://github.com/rust-lang/rust/issues/43467

`#[feature(raw_ref_op)] &raw const T`

Definitely cheating :p But did you know that originally the intention was to have &const raw variable be just a MIR construct and let &variable as *const _ be automatically changed to &const raw? https://github.com/RalfJung/rfcs/blob/fd4b4cd769300cfde5d54865d227990b71b762d1/text/0000-raw-reference-operator.md

`u256`

Did you know that because Rust compiles through LLVM, we’re sort of constrained to the primitive types LLVM supports, and LLVM itself only goes up to 128?

`_`

Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488

`MaybeUninit`

Did you know that MaybeUninit arose because the previous mechanism, std::mem::uninitialized, produced immediate undefined behavior when invoked with most types (like uninitialized::<bool>()).

`struct T<const C: usize>`

Did you know that with Rust 1.59.0 you can now give C a default value?
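A minimal sketch of a defaulted const parameter. Buffer and its default of 4 are invented for this example:

```rust
// N defaults to 4 when the type is written without an argument.
struct Buffer<const N: usize = 4> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    fn new() -> Self {
        Buffer { data: [0; N] }
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    let small: Buffer = Buffer::new(); // same as Buffer<4>
    let big: Buffer<16> = Buffer::new();
    println!("{} {}", small.len(), big.len());
}
```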

`Weak<T>`

Did you know that actual deallocation logic for Arc<T> is implemented in Weak<T>, and is invoked by considering all copies of a particular Arc<T> to collectively hold a single Weak<T> between them? Source link.

`[T; N]`

Did you know that while most trait implementations for arrays now use const generics to impl for any length N, we can’t yet do the same for Default.

`u8`

Did you know that as of Rust 1.60, you can now use u8::escape_ascii to get an iterator of the bytes needed to escape that byte character in most contexts.
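A quick sketch of escape_ascii in use. The escape_byte wrapper is hypothetical and just collects the escaped bytes into a String:

```rust
// Collects the escaped form of a single byte into a String.
// The bytes produced by escape_ascii are always ASCII, so the
// byte-to-char conversion is lossless.
fn escape_byte(b: u8) -> String {
    b.escape_ascii().map(char::from).collect()
}

fn main() {
    println!("{}", escape_byte(b'a'));
    println!("{}", escape_byte(b'\n'));
    println!("{}", escape_byte(0xff));
}
```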

`HashMap<K, V>`

Did you know that the Rust devs are working on a “raw” entry API for HashMap that allows you to (unsafely) avoid re-hashing a key you’ve already hashed? https://github.com/rust-lang/rust/issues/56167

`&mut T`

Did you know that while &mut T is defined as meaning “mutable reference” in the Rust reference, you’re often better off thinking of it as “mutually exclusive reference”. Quoth David Tolnay.

`std::ops::Range`

Did you know that there’s been a lot of debate around whether or not the Range types should be Copy? https://github.com/rust-lang/rust/pull/21846

`AtomicU32`

Did you know that you’ll often want compare_exchange_weak over compare_exchange to get more efficient code on ARM cores.
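The usual shape of a compare_exchange_weak retry loop, sketched with a made-up atomic_double function. The weak variant may fail spuriously, which is harmless here because the loop retries anyway:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Atomically doubles the value and returns the previous one.
fn atomic_double(value: &AtomicU32) -> u32 {
    let mut current = value.load(Ordering::Relaxed);
    loop {
        match value.compare_exchange_weak(
            current,
            current.wrapping_mul(2),
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            // Either another thread changed the value or the failure was
            // spurious; retry with the value we actually observed.
            Err(actual) => current = actual,
        }
    }
}

fn main() {
    let v = AtomicU32::new(21);
    let previous = atomic_double(&v);
    println!("{previous} -> {}", v.load(Ordering::SeqCst));
}
```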

`std::ops::Hash`

Did you know that Hash is responsible for not just one, but two of the issues on the “rust 2 breakage wishlist”?

`{integer}`

Did you know that fasterthanlime’s most recent article does a great job at explaining {integer}?

`Fn`

Did you know that until Rust 1.35.0, Box<T> where T: Fn did not impl Fn, so you couldn’t (easily) call boxed closures! https://github.com/rust-lang/rust/pull/55431

`((), ())`

Did you know that ((), ()) and () have the same hash? Playground link.
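One way to check this locally, as a sketch using the standard library's DefaultHasher (hash_of is a hypothetical helper). Hashing () feeds nothing to the hasher, and a tuple just hashes its elements in order:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hashes a value with a freshly constructed DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    println!("{}", hash_of(&()) == hash_of(&((), ())));
}
```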

`[T]`

Did you know that &[u8] implements Read and Write? So for anything that takes impl Read, you can provide &mut slice instead! Comes in handy for testing. Note that the slice itself is shortened for each read, hence &mut &[u8].
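A sketch of the testing trick, with a hypothetical read_four function that accepts any impl Read:

```rust
use std::io::Read;

// Reads exactly four bytes from any reader.
fn read_four(mut source: impl Read) -> std::io::Result<[u8; 4]> {
    let mut buf = [0u8; 4];
    source.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let mut data: &[u8] = b"abcdefgh";
    // Passing `&mut data` (a `&mut &[u8]`) lets us see the slice shrink.
    let first = read_four(&mut data)?;
    println!("{:?} then {:?} remain", first, data);
    Ok(())
}
```

Write, for its part, is implemented for &mut [u8] rather than for &[u8], since writing needs mutable access to the underlying bytes.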

`*`

Did you know that * is (mostly) just syntax sugar for the std::ops::Mul trait?

`UnsafeCell<T>`

Did you know that UnsafeCell is one of those types that the compiler needs “special magic” for because it has to instruct LLVM to not assume Rust’s normal aliasing rules hold once code traverses the boundary of any UnsafeCell?

Function Overloading in Rust

2022-06-04T00:00:00+00:00

I just made a pull request to reqwest. I thought this particular one was interesting enough to be worth blogging about, so I am.

We know that many C++ family languages have a feature known as function overloading, where two functions or methods can exist with the same name but different argument types. It looks something like this:

void use_connector(ConnectorA conn) {
    // IMPL
}

void use_connector(ConnectorB conn) {
    // IMPL
}

The compiler then chooses which method to call, at compile-time, based on the static type of the argument. In C++, this is part of compile-time polymorphism, an easy “if statement” in the template meta-language. In Java and many other languages, it’s merely a convenience, for when an ad-hoc group of types are possible for what an outsider sees as the same operation, but which from the perspective of the library requires different implementations.

Rust does not support this, at least not in this form. This is a mildly controversial decision; I’ve seen many people complain about it, because it is a commonly-used feature in the languages they’ve come from. Ultimately, I think Rust made the right call. There are too many advantages of having a one-to-one correspondence between method or function names and implementations, and ultimately I think the feature is more confusing than helpful. Traits cover a lot of the same ability, but in a more structured fashion, acting like C++’s compile-time “if-statements.” But of course, there is always a learning curve giving up a feature you’re used to using.

But just because Rust doesn’t officially support function overloading as a feature, surprisingly doesn’t mean that it’s completely impossible. Recently, I was looking into the depths of reqwest, trying to troubleshoot an issue, and I came across this code:

#[cfg(any(feature = "native-tls", feature = "__rustls",))]
#[cfg_attr(docsrs, doc(cfg(any(feature = "native-tls", feature = "rustls-tls"))))]
pub fn use_preconfigured_tls(mut self, tls: impl Any) -> ClientBuilder {
    let mut tls = Some(tls);
    #[cfg(feature = "native-tls")]
    {
        if let Some(conn) =
            (&mut tls as &mut dyn Any).downcast_mut::<Option<native_tls_crate::TlsConnector>>()
        {
            let tls = conn.take().expect("is definitely Some");
            let tls = crate::tls::TlsBackend::BuiltNativeTls(tls);
            self.config.tls = tls;
            return self;
        }
    }
    #[cfg(feature = "__rustls")]
    {
        if let Some(conn) =
            (&mut tls as &mut dyn Any).downcast_mut::<Option<rustls::ClientConfig>>()
        {
            let tls = conn.take().expect("is definitely Some");
            let tls = crate::tls::TlsBackend::BuiltRustls(tls);
            self.config.tls = tls;
            return self;
        }
    }

    // Otherwise, we don't recognize the TLS backend!
    self.config.tls = crate::tls::TlsBackend::UnknownPreconfigured;
    self
}

I was shocked to see this! I felt like I was reading Java. My first thought was that this was the Java instanceof (anti-)pattern, but after a little more thought, I realized that this in practice would work out to function overloading.

Since this uses impl Any instead of &mut dyn Any, this function will be monomorphized at compile-time, and I would expect that the relevant branching would be collapsed, resulting in these monomorphizations, written in an imaginary version of Rust where function overloading is supported:

#[cfg(feature = "native-tls")]
pub fn use_preconfigured_tls(mut self, tls: native_tls_crate::TlsConnector) -> ClientBuilder {
    let tls = crate::tls::TlsBackend::BuiltNativeTls(tls);
    self.config.tls = tls;
    self
}

#[cfg(feature = "__rustls")]
pub fn use_preconfigured_tls(mut self, tls: rustls::ClientConfig) -> ClientBuilder {
    let tls = crate::tls::TlsBackend::BuiltRustls(tls);
    self.config.tls = tls;
    self
}

There is a wrinkle though. Unlike the Java or pseudo-Rust equivalent, the Rust code in reqwest will still allow functions to compile if they specify another type that is not one of the two supported. So you can call this function with anything, even an i32, and the compiler won’t signal an error or even a warning:

client_builder.use_preconfigured_tls(42); // COMPILES!

In this implementation, it eventually causes a run-time error instead (a separate function produces it in the case of UnknownPreconfigured). But this odd type-safety work-around still can’t be removed without breaking API-compatibility. Code could theoretically be relying on this function producing a run-time error in certain situations, or it could rely on that other function not being called. Luckily, reqwest is not 1.0, and I have reason to hope they won’t consider this problematic.
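To make the pattern easier to play with outside of reqwest, here is a stripped-down sketch. NativeConnector, RustlsConfig, and Backend are hypothetical stand-ins I made up for this example, not reqwest's real types, and the function returns the backend instead of storing it in a builder:

```rust
use std::any::Any;

// Hypothetical stand-ins for the two supported TLS configuration types.
struct NativeConnector;
struct RustlsConfig;

#[derive(Debug, PartialEq)]
enum Backend {
    Native,
    Rustls,
    UnknownPreconfigured,
}

fn use_preconfigured_tls(tls: impl Any) -> Backend {
    // Wrapping in Option lets us take() the value out after a successful downcast.
    let mut tls = Some(tls);
    if let Some(conn) = (&mut tls as &mut dyn Any).downcast_mut::<Option<NativeConnector>>() {
        let _conn: NativeConnector = conn.take().expect("is definitely Some");
        return Backend::Native;
    }
    if let Some(conn) = (&mut tls as &mut dyn Any).downcast_mut::<Option<RustlsConfig>>() {
        let _conn: RustlsConfig = conn.take().expect("is definitely Some");
        return Backend::Rustls;
    }
    // Any other type is accepted by the compiler and only detected here.
    Backend::UnknownPreconfigured
}

fn main() {
    println!("{:?}", use_preconfigured_tls(NativeConnector));
    println!("{:?}", use_preconfigured_tls(RustlsConfig));
    println!("{:?}", use_preconfigured_tls(42)); // compiles, fails only at run time
}
```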

There are other ways to accomplish the same goal. Instead of an ad-hoc list of supported types, this code could’ve used a trait. Such code would look something like this:

pub trait TlsConfig {
    fn to_tls_backend(self) -> crate::tls::TlsBackend;
}

#[cfg(feature = "native-tls")]
impl TlsConfig for native_tls_crate::TlsConnector {
    fn to_tls_backend(self) -> crate::tls::TlsBackend {
        crate::tls::TlsBackend::BuiltNativeTls(self)
    }
}

#[cfg(feature = "__rustls")]
impl TlsConfig for rustls::ClientConfig {
    fn to_tls_backend(self) -> crate::tls::TlsBackend {
        crate::tls::TlsBackend::BuiltRustls(self)
    }
}

pub fn use_preconfigured_tls(mut self, tls: impl TlsConfig) -> ClientBuilder {
    self.config.tls = tls.to_tls_backend();
    self
}

This would allow the library to be used in the exact same way for valid uses, but would still allow the compiler to catch invalid types. To be sure, the trait and its impls would have to be separated in the code from the use_preconfigured_tls method, as you can’t put a trait inside an impl block. But I think such an inconvenience is worth the better type-safety.
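A self-contained sketch of the trait-based approach, so the compile-time check can be tried in isolation. NativeConnector, RustlsConfig, and Backend are hypothetical stand-ins for the real connector types, and the function returns the backend directly instead of updating a builder:

```rust
// Hypothetical stand-ins for the real connector types.
struct NativeConnector;
struct RustlsConfig;

#[derive(Debug, PartialEq)]
enum Backend {
    Native,
    Rustls,
}

// Only types that implement this trait are accepted.
trait TlsConfig {
    fn to_tls_backend(self) -> Backend;
}

impl TlsConfig for NativeConnector {
    fn to_tls_backend(self) -> Backend {
        Backend::Native
    }
}

impl TlsConfig for RustlsConfig {
    fn to_tls_backend(self) -> Backend {
        Backend::Rustls
    }
}

fn use_preconfigured_tls(tls: impl TlsConfig) -> Backend {
    tls.to_tls_backend()
}

fn main() {
    println!("{:?}", use_preconfigured_tls(NativeConnector));
    println!("{:?}", use_preconfigured_tls(RustlsConfig));
    // use_preconfigured_tls(42); // rejected at compile time: integers don't implement TlsConfig
}
```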

My take-away here is to be wary of emulating features from other programming languages, and also to be wary of std::any.

Addendum/Errata

I was wrong about the existing code not providing a run-time error. It sets an enum to UnknownPreconfigured, which then triggers a run-time error elsewhere in a separate function. The article has been updated accordingly.

The trait example code was also edited to reflect a version that actually compiles, but not the final version in the MR.

I also edited the intro to clarify the relationship between function overloading and traits.

The MR was ultimately rejected for reasons I deeply disagree with.

Can you have too many programming language features?

2022-05-11T00:00:00+00:00

There’s more than one way to do it.

Perl motto

There should be one– and preferably only one –obvious way to do it.

The Zen of Python (inconsistent formatting is part of the quote)

When it comes to statically-typed systems programming languages, C++ is the Perl, and Rust is the Python. In this post, the next installment of my Rust vs C++ series, I will attempt to explain why C++’s feature-set is problematic, and explain how Rust does better.

C++ fans brag that it is “multi-paradigm,” and it is. You can do everything the C way, as C++ has a subset almost exactly identical to C. You can use pointers and virtual functions and inheritance to create all the classic OOP design patterns, as C++ is object-oriented. Or you can use templates, and “static” or “compile-time” polymorphism, and program that way.

At first glance, this all seems like an unmitigated good thing, because it gives you, as a programmer, flexibility. You can express your code in OOP style if that matches the problem at hand, or even if you just like it better. If you need the performance of templates, you can use them, and if you don’t (or you just find them confusing), you can use run-time polymorphism instead. Or you can just ignore all of it, and program in almost-plain C. Flexibility is good: you can use the features you want, and not use the features you don’t want. Even if a feature is downright harmful, in your opinion, that’s easy enough to handle: Just don’t use it.

And this is all very well and good if you’re programming a quick project completely by yourself. But most code comes in long-lived projects, with developers jumping in and out of the project all the time. In such an environment, as Robert C. Martin puts it:

“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.”

Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

(Sidenote: I will admit to knowing almost nothing about Robert C. Martin besides this famous quote. I have no idea if the rest of his work is as insightful as this quote, or not, and will probably try to find out someday, but not today.)

Since programmers in general spend much more time reading code than writing it, we very rarely actually get to reap the benefits of this flexibility as writers. Much more often, as maintainers and readers, we have to be flexible ourselves. We have to be ready to read code in any style, in any paradigm, using any feature-set.

This is why Perl was commonly panned as a write-only programming language: It had so many features that you could not be up to speed on all of them. Each programmer at each point in time had a set that they used, but no one could ever get proficient at working in the entire available feature-set.

In Perl, the features were syntactic, so the programs would be unreadable at a line-by-line level. In C++, the different features have more to do with code organization, which is harder to make fun of, but I think more insidious, because a lot of the features are structural.

Let me explain what I mean. Let’s say you’re a C++ maintenance programmer, and you don’t like exceptions. You’re trying to maintain a program that uses exceptions heavily, and add new features to it. Not only do you have to be able to understand exceptions to read the code, you have to write your own code so that it handles the exceptions where appropriate, and so that it’s exception-safe. Even if you’re just using a third-party library that throws exceptions, you have to understand exceptions to use that library.

The entire programming language, with all the features, is part of the necessary skill-set to program proficiently. Even if it is just you writing your own project, you still will have to use libraries, and the features involved with it. And even if it is just you, if the project lives long enough, you will have to deal with your previous decisions. Migrating from dynamic to static polymorphism in C++ is no joke. Ask me how I know.

And of course, every feature has to be considered when writing advice. Every best practices manual for C++ is written for C++, not a subset of C++ features. The more things it’s possible for a future programmer or future library writer to do, the more things you have to worry about coding defensively, and the more things that have to be included in best practices manuals, and finally the more things that a proficient programmer has to stuff into their brain.

Specific C++ Examples, Rust Responses

But I’m also not trying to advocate for absolute minimalism. There may be a cost to every feature, and it may be that no feature is optional, but that doesn’t mean that we should have the bare minimum number of features. Sometimes the cognitive and maintenance cost of a seemingly extraneous feature is still worth it. Especially in a systems programming context, different problems often do actually call for different implementation strategies with different programming language features to express them.

C++, however, does this poorly. I’m not even sure I’d claim that C++ has too many features; it’s more that the features are not consistent. They clash with each other. Different feature-sets make assumptions that are violated by other feature-sets. C++ is not designed with the costs of extra features in mind, and as such, the features cost more than they have to.

Let’s discuss a few specific ways in which C++’s features cause problems and clash with each other. For each of these categories, I then discuss how Rust handles the same topic, with a more coherently-designed feature set.

Value and Reference Semantics: Slicing

Slicing is a famous beginner error in C++, where the semantics of combining certain features are surprising with a tendency to break invariants, but no diagnostics are issued as the code is completely valid. Perhaps unsurprisingly, this error comes from a mismatch between two C++ features designed for two C++ programming styles.

Specifically, C++ has a distinction between value and reference semantics.

With value semantics, you can use operator overloading to make your custom class look and act like a built-in type, supporting operators like + and +=:

class Complex {
    double re;
    double im;
public:
    Complex &operator=(const Complex &other) {
        re = other.re;
        im = other.im;
        return *this;
    }

    Complex &operator+=(const Complex &other) {
        re += other.re;
        im += other.im;
        return *this;
    }

    Complex operator+(const Complex &other) const {
        Complex res = *this;
        res += other;
        return res;
    }
};

// Sample usage
Complex a, b;
a = b;
Complex c = a + b;

With reference semantics, you can use polymorphism to create many different types of object that support the same interface. You can then access these objects through pointers or references to the base class.

class Complex {
protected:
    double re;
    double im;
public:
    virtual double getMagnitude() {
        return sqrt(re * re + im * im);
    }
};

class Quaternion : public Complex {
protected:
    double j;
    double k;
public:
    double getMagnitude() override {
        return sqrt(re * re + im * im + j * j + k * k);
    }
};

// Sample usage
void print_magnitude(Complex &c) {
    std::cout << c.getMagnitude() << std::endl;
}

Quaternion a;
Complex b;
print_magnitude(a);
print_magnitude(b);

However, these two programming techniques cannot be combined. You cannot assign a Complex object a Quaternion value:

Quaternion a;
Complex b;
b = a; // Non-sensical

Why? Well, unlike in Java, Complex b actually allocates the space for a Complex number as a local variable on the stack. This means that it only has room for the two fields, re and im.

But, unfortunately, if you include all the methods from both examples, that code will compile, and run, and b will have only re and im from a. This is almost certainly not what you want, and may in fact break invariants (e.g. you might only be dealing with values of magnitude 1, and this truncation would lower the magnitude).

This comes from two alternative paradigms for objects: by value as “primitive replacement,” where Complex can be used like an int, and by reference with traditional OOP inheritance and polymorphism. These paradigms don’t use different keywords, however. They can just all be used in the same objects, causing this trouble.

Advice on how to prevent this includes rules like “give all parent classes at least one pure virtual function,” which would make Complex b as a by-value declaration illegal. But if this rule is recommended in leading books on C++, why isn’t it enforced in the programming language itself?

How Rust Handles This

C++’s slicing is caused by a conflict between two features, inheritance and assignment. Rust handles both of those features differently, so that they do not conflict.

So the most important difference here between Rust and C++ is that Rust does not have implementation inheritance like C++ does. For two given C++ concrete types, one of which surrounds the other, there are two possible relationships between them: is-a, and has-a. Rust only does has-a for concrete types.

C++ inheritance is a feature with many use cases, such as sharing implementation, implementing policy, and implementing interfaces (what Rust calls traits). Rust, rather than having one big broad feature, instead implements individual features as appropriate. The closest feature Rust has to inheritance is traits (including subtraits and supertraits), but because traits are not concrete types, they cannot be assigned, and so this issue is avoided.
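Here is a rough sketch of how the Complex/Quaternion example might look in Rust. The Magnitude trait and the base field are my own inventions for illustration; the point is that the shared interface lives in a trait while Quaternion merely has a Complex, so there is no derived-to-base assignment to slice:

```rust
// The shared interface, expressed as a trait instead of a base class.
trait Magnitude {
    fn magnitude(&self) -> f64;
}

struct Complex {
    re: f64,
    im: f64,
}

// Has-a: a Quaternion contains a Complex instead of inheriting from it.
struct Quaternion {
    base: Complex,
    j: f64,
    k: f64,
}

impl Magnitude for Complex {
    fn magnitude(&self) -> f64 {
        (self.re * self.re + self.im * self.im).sqrt()
    }
}

impl Magnitude for Quaternion {
    fn magnitude(&self) -> f64 {
        let b = &self.base;
        (b.re * b.re + b.im * b.im + self.j * self.j + self.k * self.k).sqrt()
    }
}

// Dynamic dispatch through a trait object, like the C++ reference parameter.
fn print_magnitude(value: &dyn Magnitude) {
    println!("{}", value.magnitude());
}

fn main() {
    let a = Quaternion { base: Complex { re: 1.0, im: 1.0 }, j: 1.0, k: 1.0 };
    let b = Complex { re: 3.0, im: 4.0 };
    print_magnitude(&a);
    print_magnitude(&b);
    // let c: Complex = a; // type mismatch: a Quaternion is never silently truncated
}
```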

But also in assignments, Rust implements a simpler feature that is easier to reason about: Rust does not allow custom assignment operators. Rust instead builds assignment out of two operations: move, and drop (cf. C++ destructors). If drop is implemented correctly, so is assignment. If you want to copy instead of move, you have to explicitly call a clone() method. And moves are not customizable either.
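To see the move-plus-drop behavior directly, here is a small sketch. The Noisy type is made up; it records its name in a shared log when dropped so the order of events is observable:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn assign_demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut a = Noisy { name: "first", log: Rc::clone(&log) };
    let b = Noisy { name: "second", log: Rc::clone(&log) };
    // Assignment drops the old value of `a` and moves `b` into its place.
    a = b;
    // `b` has been moved from; the compiler rejects any further use of it.
    log.borrow_mut().push("after assignment");
    drop(a);
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", assign_demo());
}
```

No user code runs for the assignment itself; the only customizable step is the Drop implementation.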

So, although Rust has some of the best parts of inheritance in traits, and still allows assignment of custom types (but through customizing drop, not assignment per se), it avoids this particular clash through restricting the scope of those features.

Exceptions and “Exception Safety”

It would be impossible to write a post criticizing C++ for its problematic feature-clashes and not talk some about exceptions.

Exceptions are another famous example of a C++ feature you simply can’t “not use.”

Exceptions are viral by nature. If you call a function that might throw and don’t catch all the exceptions that it throws – which might be impossible to determine – then your function can throw as well.

And lots of functions can cause exceptions. Allocating memory indicates failure via exception. Exceptions are the only way for constructors to signal failure, and C++ idiom encourages constructors to be written in such a way that success guarantees that the object is usable. The programming language was clearly not designed to be used without exceptions.

But exceptions are gnarly and confusing. I already know people will comment on this post and say that if you write and structure C++ code correctly, it will be exception-safe. And that’s almost trivially true, since exception safety is part of correct C++ practice, but it’s not easy and it doesn’t follow naturally from easy-to-learn principles, which is why Herb Sutter, a huge name in C++, felt the need to write two books about it. Of course, in practice, people just write exception-unsafe code, all the time.

Every time you call a function – which can happen in C++ simply by declaring a variable, or even by ending a scope (though destructors are supposed to avoid throwing exceptions) – you have to worry about whether that function throws an exception, and if you’re leaving things within that function in an inconsistent state. In C++, it is very common to implement your own unsafe data structures, and exceptions are designed to be sometimes recoverable from. Lack of exception safety can mean memory corruption or even exploitable security vulnerabilities.

No wonder a lot of codebases ban exceptions. Unfortunately, many shops simply avoid using exceptions instead of banning them, leaving exceptions possible. Also, code from “exception-free” codebases can then later be mixed back in with regular C++, re-opening it to exception-safety concerns.

The fact that exceptions are so controversial can lead to confusion as well. Consider this function signature:

std::unique_ptr<DatabaseConnection> connect(const ConnectionParameters&);

How does this function indicate failure? From the signature, there are two possibilities: It could either return nullptr, or it could throw an exception. Hopefully the documentation would clarify – but again, oftentimes, people don’t write documentation, especially for internal APIs.

How Rust Handles This

Normal Error Handling in Rust

For recoverable errors, Rust encodes them in the type. Rust’s equivalent to std::unique_ptr – Box – is not nullable. If we want to return one, but possibly also signal an error, we use a sum type or what Rust calls an enum, and what C++ would call a “tagged union” and make you implement by hand:

fn connect(param: &ConnectionParameters) ->
    Result<Box<DatabaseConnection>, OurError>;

This means that it can return either a database connection or an error. This is the convention to return any error condition that is recoverable, which is half of what exceptions are used for in C++. Since Box is not nullable, you have to say more than just Box to signal that it’s possible to return an error, proving that you really mean it.
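A fleshed-out sketch of that signature, so the caller's side is visible too. The field names, the EmptyHost variant, and the validation rule are all invented for this example:

```rust
#[derive(Debug, PartialEq)]
enum OurError {
    EmptyHost,
}

struct ConnectionParameters {
    host: String,
}

struct DatabaseConnection {
    host: String,
}

// The return type alone tells the caller that failure is possible.
fn connect(param: &ConnectionParameters) -> Result<Box<DatabaseConnection>, OurError> {
    if param.host.is_empty() {
        return Err(OurError::EmptyHost);
    }
    Ok(Box::new(DatabaseConnection { host: param.host.clone() }))
}

fn main() {
    let params = ConnectionParameters { host: String::from("db.example") };
    // The caller cannot reach the connection without handling the error case.
    match connect(&params) {
        Ok(conn) => println!("connected to {}", conn.host),
        Err(e) => println!("failed: {:?}", e),
    }
}
```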

For unrecoverable exceptions – for situations like logic and programming errors that the program has caught – Rust has panics, which work much more like C++ exceptions in practice.

Panic Safety

Rust aficionados will know that Rust has not escaped exception safety, having instead an analogous notion of “panic safety.” How, then, can I criticize C++ so boldly?

There are two notable differences between C++ exceptions and Rust panics. The first is that Rust panics are used primarily for unrecoverable errors, such as a bug that violated the programmer’s assumptions, a misunderstanding on the programmer’s part, or a circumstance that the program cannot recover from. Rust by convention uses a different mechanism, Results, for recoverable errors. So most Rust code doesn’t have to care about maintaining invariants in the face of panics, because most Rust code can presume that if it panics, that’s the end. This is better scoping for the panic feature, as opposed to exceptions.

But the fact remains that panics can be recovered from, and do still do stack unwinding and destructor/drop calls, and safety issues can still exist. Panics in Rust can cause memory corruption – in unsafe code. And that’s where panic safety really still matters: in unsafe code only. By cordoning off the implementations of sophisticated data structures that require unsafe, Rust also cordons off who has to worry about panic safety.
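A sketch of what recovering from a panic looks like, assuming the default unwinding panic strategy (with panic = "abort" nothing here would be caught). Guard, risky, and the helper functions are hypothetical names:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Set by Guard's destructor so we can observe that it ran.
static DROPPED: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

fn risky() {
    let _guard = Guard;
    panic!("assumption violated");
}

// Returns true if the panic in `risky` was caught.
fn recovered() -> bool {
    // catch_unwind returns Err if the closure panicked.
    panic::catch_unwind(risky).is_err()
}

fn guard_was_dropped() -> bool {
    DROPPED.load(Ordering::SeqCst)
}

fn main() {
    println!("recovered from panic: {}", recovered());
    println!("guard dropped during unwinding: {}", guard_was_dropped());
}
```

Safe code like this stays memory-safe no matter where the panic happens; it is unsafe code that must make sure a half-finished operation doesn't leave a structure in a state that drop will misinterpret.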

In C++, every function that calls another function has to be written in an exception-safe way. In Rust, it’s really only unsafe code that has to worry about it. This, in my mind, is a huge win, and it comes from both better scoping of panics, and better management of the situations where panics can break things.

C-style vs C++-style Pointers and Arrays

There is a subset of C++ that is almost identical to C, and C++ must maintain compatibility with this subset for tradition’s sake. It also must maintain compatibility with previous versions of itself. Between the C and the C++, the concepts contained in C++20 stretch from 1972 to 2020, almost 50 years of active change in programming language technology. This leads to features being duplicated, but differently, and in ways that unfortunately clash with each other.

For example: How do you express indirection? How do you alias a value? There are three different ways to do it, and rather than breaking down by use case, the biggest difference between them is era:

Pointers, from the original C
References, a newer innovation that attempts to solve some of the issues with pointers
Smart pointers, an even newer innovation that attempts to cover some of the remaining use cases. For pointers into arrays, iterators also cover a lot of the same territory as smart pointers, and can be lumped together for this conversation.

These overlap a lot, and there is no single principle that will tell you when to use which. You can invent some rules, and come up with some principled reasonings for them, but your colleagues won’t necessarily listen, and external libraries and other codebases you have to interact with certainly won’t, not even the standard library, not even the programming language itself. Fundamentally, the difference is era.

Nullability? Part of original pointers. Later, we learned it was harmful and got rid of it in references, but due to issues with how C++ does move semantics it comes back with a vengeance for smart pointers. (Of course, you still can make a null reference, it’s just undefined behavior. Ah well.)

Pointers and references have special syntax, whereas smart pointers, because they came from a later era, use the more standard ptr_type<T> syntax. Pointers and smart pointers can be used to manage ownership, and references should not be.

How should out parameters be expressed? It’s easy to say they should be expressed with references, because otherwise they’re nullable, and you have to worry about whether to check for nulls or not. On the other hand, expressing out parameters with references means you can’t tell at the caller whether it’s an out parameter, only at the callee:

int foo_ptr(int in, int *out);
int foo_ref(int in, int &out);
int out;
foo_ptr(3, &out);
foo_ref(4, out); // Surprise, this changes `out`! Can't tell, though!

foo_ptr(5, nullptr); // Does this crash? Does this work? Who knows!
// Read the docs, I guess *shrug* hope there are docs

References should be used, in my practice and in the practice of many people I respect, in every case where the reference is not owning, will not be used for arithmetic, and is not optional. Of course, the this pointer meets all of those requirements, but is a pointer, not a reference (but a special pointer, where being null is undefined behavior, like a reference), simply because references were invented after this was, and for no stronger reason.

Similarly, my practice dictated that std::unique_ptr should be used for owning pointers. It’s nullable, but at least it auto-frees, and so you should use it everywhere you’re conveying ownership. And then, Foo * can be used when you want an optional non-owning reference. But old APIs and APIs from C exist all over the place that will use Foo * invariably, and some will use Foo * for out parameters because of the call-site readability issue, or because of concerns about std::unique_ptr, or simply out of old habit, meaning you can’t count on this convention actually being upheld, not at all.

And of course, converting between these different representations is sometimes as easy as & or *, and sometimes as difficult as having & and * compile and seem to work but result in memory corruption, and everywhere in between.

Similarly, T foo[N] and std::array<T, N> foo are different ways of writing the same basic thing. It gets weird when N = 0, of course; this is only supported by std::array. And (on compilers that support it at all) having N be dynamic on the stack is only supported by T foo[N]. And of course, new T[N] returns a raw pointer to T whereas new std::array<T, N> returns a pointer to a std::array, which makes much more sense.

So, basically, T foo[N] should be completely deprecated, but keeps on being used even by new C++ programmers because it looks like it should be the normal way to write an array, and because it looks like the arrays from C. But they’re completely different types – one isn’t syntactic sugar for the other.

This gets unwieldy, because the ways with the syntactic sugar (like new and T* instead of std::make_unique and std::unique_ptr) are the old, more C-style ways, the ways that yield more memory leaks (you have to explicitly free or delete a T*) and memory corruption (T foo[] doesn’t even have a safe indexing operation, or proper iterators).

And of course, even if you use the more modern formulations to save on cognitive load because they’re more consistent with the rest of the programming language (where std::unique_ptr does RAII unlike traditional pointers (spelled *) and std::array implements the expected collections methods unlike traditional arrays ([])), you still have to understand the traditional pointers and arrays completely to call yourself a C++ programmer. Due to C interop, people not changing their ways, and old resources, lots of new code is still written with them, and there are still situations where they’re unavoidable, like pointer arithmetic or the this pointer.

Besides, even if you do correctly discern that a T* must be freed, how do you free it – free, delete, or delete[]? Choose wisely, because the consequences of mixing malloc and delete can go beyond whether destructors are called, and lead to undefined behavior and general memory corruption. The documentation (or lack thereof), however, might just assume you know which one to call.

How Rust Handles This

Rust also has references and various types of smart pointers and iterators. It also has raw pointers, from which smart pointers can be implemented. So in terms of the range of features, it’s actually the same as C++. What’s the difference then?

Well, in Rust, the difference is that they don’t overlap in the same way. Each feature has its own purpose, unlike in C++ where it’s anyone’s bet whether references or pointers are used for aliasing or pass-by-reference, or whether raw pointers or smart pointers are used to express ownership. Nullability is mostly a separate concern from day-to-day use of Rust’s types, and so it is implemented orthogonally through Option and Result, rather than being available in some types but not in others haphazardly.

References are for everyday aliasing and pass-by-reference. They are not nullable. They represent the primitive concept of aliasing and pass-by-reference, and they are the only feature that does so. Unlike C++ references, you must use the & operator to create a Rust reference, making them explicit on the caller.
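As a rough illustration, here is how the earlier out-parameter example might look in Rust. The names foo_ref and foo_opt are hypothetical, chosen only to mirror the C++ snippet; this is a minimal sketch, not a recommended API.

```rust
/// Out parameter through a mutable reference. The caller has to write
/// `&mut out`, so the possibility of mutation is visible at the call site.
/// The reference can never be null.
fn foo_ref(input: i32, out: &mut i32) -> i32 {
    *out = input * 2;
    0 // hypothetical status code, as in the C++ snippet
}

/// Optional out parameter. Nullability is opted into with `Option`,
/// and the callee is forced to handle the `None` case explicitly.
fn foo_opt(input: i32, out: Option<&mut i32>) -> i32 {
    if let Some(slot) = out {
        *slot = input * 2;
    }
    0
}
```

A call such as foo_ref(4, &mut out) announces the mutation where it happens, and foo_opt(5, None) has a meaning spelled out in the signature rather than in the documentation.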

Smart pointers are, for the most part, also not nullable, possibly partially because Rust has destructive moves. They represent ownership semantics – whether “unique” ownership (Box), shared ownership (Rc or Arc), or locking (Mutex or RefCell).
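A small sketch of what that looks like in practice, using only standard library types. The function names are made up for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Unique ownership: a `Box` always points at a value, and moving it
/// leaves no null husk behind, because the old binding becomes unusable.
fn unique_ownership() -> i32 {
    let boxed: Box<i32> = Box::new(41);
    let moved = boxed; // destructive move: `boxed` cannot be used after this
    *moved + 1
}

/// Shared ownership plus interior mutability: `Rc` counts the owners,
/// `RefCell` checks the borrowing rules at run time.
fn shared_ownership() -> (usize, i32) {
    let shared = Rc::new(RefCell::new(1));
    let other = Rc::clone(&shared);
    *other.borrow_mut() += 1;
    let owners = Rc::strong_count(&shared);
    let value = *shared.borrow();
    (owners, value)
}
```

If a nullable owning pointer is really wanted, it is written Option<Box<T>>, which keeps the nullability visible in the type.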

Raw pointers in Rust are very special – they are for implementing smart pointers or other low-level data structures. They are for situations where the structure of memory and the concept of a pointer is actually key to the situation. They are kept within these narrow bounds, and outside of everyday application programming, by having most of their features considered unsafe.

If only that could be done for raw pointers in C++! But there is too much momentum behind the C++ raw pointer.
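To show where Rust draws the line around raw pointers, here is a minimal sketch with hypothetical function names. Creating and inspecting a raw pointer is safe; dereferencing one is the part that has to sit inside an unsafe block.

```rust
/// Reads an `i32` through a raw pointer derived from a reference.
fn read_through_raw(value: &i32) -> i32 {
    let ptr: *const i32 = value; // creating the raw pointer is safe
    // SAFETY: `ptr` comes from a valid reference that outlives this call.
    unsafe { *ptr }
}

/// Raw pointers, unlike references, may be null, and checking for
/// null is an ordinary safe operation.
fn null_is_explicit() -> bool {
    let ptr: *const i32 = std::ptr::null();
    ptr.is_null()
}
```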

Dynamic vs Static Polymorphism

This is the most intense one, and could be a blog post all on its own – and probably I’ll write it one day.

In response to comments, I’m going to add a caveat here even though I address it later: In this section, I’m discussing the status of C++ pre-concepts, from C++17 and earlier, because that is the form of C++ that most people are still using, and that the vast majority of code is still written in. It is too early to tell how much concepts will help, but because they are an optional feature, I’m not at all optimistic.

We have two forms of polymorphism in C++, two very different systems. One is a Turing-complete macro system that comprises overloads, templates, and template metaprogramming. The other is an object-oriented style system of polymorphism through inheritance.

They were designed with different purposes in mind, and considering their original purpose, it’s clear to see why they must have different implementations.

Templates were designed for collections and algorithms, for being able to write a vector or linked list that could contain any arbitrary type, without resorting to a C-style void* that would require both indirection and type erasure. The lack of indirection is the point – at least it was for C++ – and so as a consequence templates had to be carried out statically.

Dynamic polymorphism, on the other hand, was designed for OOP design patterns. As such, in line with OOP principles, it supports heterogeneous containers, especially necessary to support OOP’s core use case of GUI programming.

But in spite of this deep contrast between static and dynamic, they overlap in use case. For example, Smalltalk, Objective-C, and Java (pre Java 5) all show us that you can use dynamic polymorphism to implement generic containers. If C++ had been less performance-centric, and could tolerate the indirection, it could have used a similar strategy, the (old school) Java approach to generic containers without generics or templates:

Make all classes inherit from a universal base class, Object. This way, Object * (just Object in Java) can refer to any object. Make sure, for C++, that this has a virtual destructor, so you can delete any object through its Object* handle.
Write “boxed versions” of all primitive types, classes that extend Object to correspond to int and double, etc.
Write all collection classes (std::vector, std::list) in terms of Object *, writing Object * instead of T.
Use RTTI and dynamic_cast (or in Java terms, casts) to allow the user to get whatever object type they want out of them.

Voilà! You can now store anything in your collections without need for generics or templates, using dynamic_cast, an obscure feature of the OOP-style dynamic polymorphism that C++ has. And this system is in fact still the basis of Java generics, and so we can project that C++ would have used something similar if performance weren’t a concern and indirections and RTTI were acceptable.

So that shows the overlap between templates and runtime polymorphism in a theoretical sense, but do these very differently implemented features in fact overlap in practice?

I’ve seen skepticism. I once interviewed people for a job, and I asked candidates to explain to me the similarities and differences between dynamic polymorphism and templates. One candidate said there was no overlap; templates were for generic programming (e.g. collections and algorithms and STL), and dynamic polymorphism was for object-oriented programming.

But they do overlap in practice. I know, because I spent a lot of time transitioning object-oriented dynamic code into static form, and teaching the static equivalents to dynamic polymorphism patterns. It wasn’t easy, because even though the overlap is huge, the semantics are vastly different.

Let me give an example. Let’s start with one of my favorite patterns: the policy pattern. Let’s imagine we have a function that sends messages in a way that can fail, and let’s also imagine that we have a policy that indicates how we should delay and retry sending this message. I’ll start out writing it the object-oriented way, something like this:

struct RetryPolicy {
    virtual bool should_retry(mesg_send_err_t error_code) = 0;
    virtual uint32_t delay_microseconds() = 0;
};

mesg_send_err_t retry_send_message(Message &mesg, RetryPolicy &policy) {
    while (true) {
        auto err = send_message_once(mesg);
        if (err == mesg_send_err_t::SUCCESS) {
            return mesg_send_err_t::SUCCESS;
        } else if (!policy.should_retry(err)) {
            return err;
        } else {
            usleep(policy.delay_microseconds());
        }
    }
}

The policy can then do things like “retry 5 times, waiting 0.01 seconds between each retry” or “exponential back-off, so that each retry waits twice as long as the previous.” It can also deem certain errors as fatal, but others as worth sleeping and retrying for. Here’s an example of using this interface:

struct WaitOneSecondAndTryFiveTimes : RetryPolicy {
    int retry_count = 0;

    bool should_retry(mesg_send_err_t error_code) override {
        if (error_code == mesg_send_err_t::MALFORMED_MESG) {
            return false;
        }

        retry_count++;
        if (retry_count == 5) {
            return false; // do not retry
        }

        return true; // do retry
    }

    uint32_t delay_microseconds() override {
        return 1000000;
    }
};

WaitOneSecondAndTryFiveTimes policy;
auto err = retry_send_message(mesg, policy);

Now, it turns out we can do this exact same pattern with static polymorphism. The callee code now looks like this:

template <typename T>
mesg_send_err_t retry_send_message(Message &mesg, T policy) {
    while (true) {
        auto err = send_message_once(mesg);
        if (err == mesg_send_err_t::SUCCESS) {
            return mesg_send_err_t::SUCCESS;
        } else if (!policy.should_retry(err)) {
            return err;
        } else {
            usleep(policy.delay_microseconds());
        }
    }
}

This is no longer a function. It is a function template, which is a type of macro. Its implementation must now move from the .cpp file to the .h or .hpp file, for reasons that only make sense if you think about how the programming language is implemented.

No longer is the policy interface spelled out separately. The only thing the function signature says about the type of policy is that it is T – which can be any type. Only in the implementation, in the body, do we see that should_retry() and delay_microseconds() must be implemented on it. This is an implicit interface, defined by usage, very similar to Python and Ruby’s duck typing. More importantly, it is completely unrelated to the OOP-style explicit interface using inheritance and virtual functions.

The errors are completely different, because the rules are completely different. Suppose WaitOneSecondAndTryFiveTimes leaves out delay_microseconds. With the OOP version, you get:

test.cpp:57:34: error: variable type 'WaitOneSecondAndTryFiveTimes' is an abstract class
    WaitOneSecondAndTryFiveTimes policy;
                                 ^
test.cpp:22:22: note: unimplemented pure virtual method 'delay_microseconds' in 'WaitOneSecondAndTryFiveTimes'
    virtual uint32_t delay_microseconds() = 0;
                     ^
1 error generated.

With the template version, you get:

test.cpp:34:27: error: no member named 'delay_microseconds' in 'WaitOneSecondAndTryFiveTimes'
            usleep(policy.delay_microseconds());
                   ~~~~~~ ^
test.cpp:59:16: note: in instantiation of function template specialization 'retry_send_message<WaitOneSecondAndTryFiveTimes>' requested here
    auto err = retry_send_message(mesg, policy);
               ^
1 error generated.

The ad-hoc nature of template requirements should not be underestimated. It means that objects that are designed to work with a whole library might only work with the exact combinations of functions they’ve been used with so far. It means that documentation, if it wants to be rigorous, must do the work of defining the protocols itself of every argument taken by every function. It means that it’s not clear when you’re putting new requirements on arguments to a function, as there is no warning and no clear red-line step to tell you that you’re breaking backwards-compatibility.

Concepts have been introduced recently to clean it up, and I think it’s still early to tell how good a job they will do. But even if they do a great job, the polymorphism will still look very different from the OOP style, and the old template-based code will still exist, and so in the meantime the C++ programming language has simply continued to grow.

And the concrete consequences: It’s a perfectly reasonable decision to use OOP-style polymorphism, for the benefits of cleaner structure and explicit specification of the interface, even when the dynamic nature of the polymorphism – and its concomitant performance costs – is never actually called for. Meanwhile, using static polymorphism to accomplish the same goals is simply harder, requiring much more skill and training.

How Rust Handles This

Like C++, Rust has both static (compile-time) polymorphism, and dynamic (run-time) polymorphism. Unlike C++, Rust integrates them closely into a single feature, inspired by Haskell’s typeclasses: traits.

Let’s use the same example again, but in Rust, using static polymorphism, which is the more Rusty way to write such a function:

trait RetryPolicy {
    // Return None to not retry at all
    // Takes `self` as `&mut` to implement counting and back-off
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

fn retry_send_message(mesg: &Message, mut policy: impl RetryPolicy) -> Result<(), MesgSendError> {
    loop {
        match send_message_once(mesg) {
            Ok(()) => {
                return Ok(());
            }
            Err(err) => match policy.retry_microseconds(err) {
                None => {
                    return Err(err);
                }
                Some(delay) => sleep(delay),
            },
        }
    }
}

I changed the example a little to showcase some other differences with Rust. Instead of querying two functions, for example, to know whether to try again and how long to delay, I feel in Rust it is more natural to use sum types (and in particular Option) to fold them into a single function. Similarly, rather than a u32 count of microseconds, std::thread::sleep takes a Duration, and so I felt the policy trait should reflect that as well. Also, last but not least, in Rust it is not necessary to consider SUCCESS to be one of the error options, and so the types are more well-honed to the situation.

Notice that this is the more performant static version, and yet it has an explicit in-code specification of what the interface is for the policy. The policy code and the generic code are still fully integrated, just like in the C++ templated version, through a process known as monomorphization. Fundamentally, monomorphization exhibits the same behavior as C++ template instantiation, but in a more principled, constrained fashion.

Here is the example of the usage of such a polymorphic function:

struct WaitOneSecondAndTryFiveTimes {
    retry_count: u32,
}

impl WaitOneSecondAndTryFiveTimes {
    fn new() -> Self {
        Self {
            retry_count: 0,
        }
    }
}

impl RetryPolicy for WaitOneSecondAndTryFiveTimes {
    fn retry_microseconds(&mut self, err: MesgSendError) -> Option<Duration> {
        if err == MesgSendError::MalformedMessage {
            return None;
        }

        self.retry_count += 1;
        if self.retry_count == 5 {
            return None;
        }

        Some(Duration::from_secs(1))
    }
}

let policy = WaitOneSecondAndTryFiveTimes::new();
let res = retry_send_message(&mesg, policy);
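To make the monomorphization point a little more concrete: impl RetryPolicy in argument position is roughly shorthand for a named type parameter with a trait bound, and the compiler generates a separate copy of the function for each concrete policy type it is called with. The sketch below is self-contained, so MesgSendError is a one-variant stand-in, and NoRetry, Always and the first_delay functions are hypothetical names, not part of the example above.

```rust
use std::time::Duration;

// Stand-in for the article's error type; the single variant is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MesgSendError {
    Timeout,
}

trait RetryPolicy {
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

struct NoRetry;
impl RetryPolicy for NoRetry {
    fn retry_microseconds(&mut self, _error: MesgSendError) -> Option<Duration> {
        None
    }
}

struct Always(Duration);
impl RetryPolicy for Always {
    fn retry_microseconds(&mut self, _error: MesgSendError) -> Option<Duration> {
        Some(self.0)
    }
}

// `impl Trait` in argument position...
fn first_delay_sugar(mut policy: impl RetryPolicy) -> Option<Duration> {
    policy.retry_microseconds(MesgSendError::Timeout)
}

// ...is roughly shorthand for a type parameter bounded by the trait.
// Each concrete `P` gets its own compiled copy of this function.
fn first_delay_generic<P: RetryPolicy>(mut policy: P) -> Option<Duration> {
    policy.retry_microseconds(MesgSendError::Timeout)
}
```

Unlike the C++ template, the bound P: RetryPolicy is checked at the function definition, so a body that calls a method the trait does not declare fails to compile there, not at some later instantiation.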

If we wanted to use dynamic polymorphism for some reason – for example, if we wanted to look the policy up in some sort of map based on a user-supplied keyword, or load the policy from a dynamic library – we could, easily.

Unlike in C++, barely anything has to change. In fact, only three lines have to change.

The function signature has to change, to indicate that it’s using dynamic polymorphism now. Dynamic polymorphism, due to its nature, can only be done through indirection, so we have to add that (though it does not affect the function body):

fn retry_send_message(mesg: &Message, policy: &mut dyn RetryPolicy) -> Result<(), MesgSendError> {

Similarly, the call site has to change, to implement the indirection:

let mut policy = WaitOneSecondAndTryFiveTimes::new();
let res = retry_send_message(&mesg, &mut policy);

And that’s it! Now it’s dynamic polymorphism!

It was when I first saw this that I was truly convinced that Rust would eclipse C++.
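The keyword-lookup scenario mentioned above might be sketched like this. It is a self-contained illustration under stated assumptions: MesgSendError is a two-variant stand-in, and NeverRetry, FixedDelay, policy_table, count_retries and retries_for are hypothetical names invented for the sketch. The policies live behind Box<dyn RetryPolicy> so that different concrete types can share one map.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Stand-in for the article's error type; the variants are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MesgSendError {
    MalformedMessage,
    Timeout,
}

trait RetryPolicy {
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

struct NeverRetry;
impl RetryPolicy for NeverRetry {
    fn retry_microseconds(&mut self, _error: MesgSendError) -> Option<Duration> {
        None
    }
}

struct FixedDelay {
    remaining: u32,
    delay: Duration,
}
impl RetryPolicy for FixedDelay {
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration> {
        if error == MesgSendError::MalformedMessage || self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.delay)
    }
}

/// Heterogeneous policies stored behind one trait-object type.
fn policy_table() -> HashMap<&'static str, Box<dyn RetryPolicy>> {
    let mut table: HashMap<&'static str, Box<dyn RetryPolicy>> = HashMap::new();
    table.insert("never", Box::new(NeverRetry));
    table.insert(
        "fixed",
        Box::new(FixedDelay { remaining: 2, delay: Duration::from_millis(10) }),
    );
    table
}

/// Asks the policy repeatedly until it refuses, without sleeping or sending.
fn count_retries(policy: &mut dyn RetryPolicy, error: MesgSendError) -> u32 {
    let mut retries = 0;
    while policy.retry_microseconds(error).is_some() {
        retries += 1;
    }
    retries
}

/// Looks a policy up by a user-supplied keyword and drives it dynamically.
fn retries_for(keyword: &str, error: MesgSendError) -> Option<u32> {
    let mut table = policy_table();
    let policy = table.get_mut(keyword)?;
    Some(count_retries(&mut **policy, error))
}
```

The trait and its implementations are written exactly as they would be for the static version; only the way the policy is passed around changes.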

Discussion and Conclusion

So, assuming I’ve convinced you that Rust has a better organized feature-set than C++, we have to discuss what, in the big picture, C++ has done wrong and Rust has done right.

The first and most obvious thing Rust did right was learn from the mistakes of the past. Each new version of C++ has to be compatible with previous versions to a great extent, including (in a lot of ways) C, giving it a legacy back into the early 70’s. Rust started maintaining compatibility in 2015, and so it’s only had 7 years or so of cruft, but knew about all of C++’s later add-ons from the beginning.

And one of the things Rust learned from the experience of others is how to mitigate this effect, so we can hope Rust retains its youthful freshness for longer going forward. Rust has an edition system, so that features actually can be deprecated and phased out, while still maintaining compatibility.

But also, Rust’s goal of separating safe and unsafe features – and keeping unsafe code encapsulated using the unsafe keyword – forces Rust’s feature set to be more coherent. If two features clash in C++, the standards committee can put the work of reconciling them on the programmer, but in Rust, they often have to do the work to make them make sense together, so they can continue to guarantee that safe code can’t cause undefined behavior.

Additionally, Rust believes in, and has, invariants. In C++, some structs can be trivially copied. In Rust, all data types can be trivially moved (Pin is almost but not quite an exception, and the work that went into making Pin not break everything shows how important the invariant is). In Rust, a mutable reference always means that a block of code has exclusive access to a value. These invariants also structure other features, and force them to work in concert.
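Both invariants show up in very ordinary code. The following is a minimal sketch with made-up function names, not anything specific to a particular library.

```rust
/// Moving a `Vec` copies only its small header (pointer, length, capacity);
/// no user-defined move constructor runs, and the source binding is dead
/// afterwards as far as the compiler is concerned.
fn move_then_len(source: Vec<i32>) -> usize {
    let destination = source; // `source` is unusable after this line
    destination.len()
}

/// While this function holds `&mut Vec<i32>`, no other reference to the
/// vector can exist, so nothing can observe it between the two pushes.
fn push_pair(items: &mut Vec<i32>, a: i32, b: i32) {
    items.push(a);
    items.push(b);
}
```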

Enough about Rust, though. I think there are deeper lessons to be learned from the flaws in C++. Bjarne Stroustrup famously said, “Within C++, there is a much smaller and cleaner language struggling to get out.” I think he regrets the quote, which he clarified is about the modern semantics of C++, held down by the outdated syntax of C. It’s such a compelling quote, though, because C++ is so messy and dirty, so we want to believe in a small clean underlying core.

The truth, however, is that there isn’t one smaller and cleaner programming language struggling to get out. There are multiple. And from the beginning, C++ was a glomming-together of multiple ideas: a Simula-like OOP system glued awkwardly to C, without a unified motivating vision. Operator overloading was intended to help integrate the two parts, which it did but at the expense of creating its own entire sub-paradigm. And then came templates, which tried to add generic containers and algorithms but unexpectedly exploded into their own programming paradigm.

So inside C++, struggling to get out, is of course C, the original “portable assembly,” which does its very simple job well. There’s also Java/C# in there, if we take the OOP features on their own. For the operator overloading and RAII and templates, the closest I can really imagine is Rust, which I think if Bjarne was being fair, he would have to admit is close to what he specified when he clarified his quote: Rust does emphasize “programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.”

It’s understandable that Bjarne glommed OOP, a foreign paradigm, onto the otherwise-stable base of C. OOP was extremely popular for a long time, and has been awkwardly glommed on to many programming languages, and I think Rust benefits from not even trying to be an OOP language in the traditional 3-pillar sense (Rust doesn’t have inheritance at all, and has non-OOP concepts of encapsulation and polymorphism).

C++ wasn’t even the only programming language to result from glomming on object-oriented programming to C, and of the two big ones, it is the more coherently integrated. Objective-C comes from a more dynamic tradition of object-oriented programming, and it really feels like two programming languages glued together, in this case C and Smalltalk.

I programmed Objective-C professionally for a while, and most of the time, the C only came out when you had to do a little bit of pure logic outside of the object-oriented framework. In the meantime, all of the OOP code had to be written using the little wisps of syntax C left behind, especially @, which basically served as a sigil to indicate that what followed was to be interpreted in an Objective-C way … which in an Objective-C codebase basically should have been the default.

At the time, I dreamed of the leaner programming language inside Objective-C (the non-C one), and even started designing a Smalltalk dialect designed to interact with Apple’s Cocoa APIs: CocoaTalk, I think it was called. Ultimately, Apple unveiled their concept of it, sharing many ideas with Rust, known as Swift. I felt very vindicated the day Swift was announced.

Rust is C++’s chance to get a leaner, cleaner programming language. The syntax is heavily influenced by C++, even as the semantics come from a variety of sources. The design was done de novo with guiding principles that allowed all of C++’s vast repertoire of features to be reimagined but working in concert with each other. As someone who used to love programming in C++, which enabled programming techniques no other programming language could, I continue to be deeply impressed by the feature design of Rust.

A Checklist of Dev-Ops Disciplines

2022-05-09T00:00:00+00:00

I have worked on a lot of programming projects in my time, and while I was a programming consultant I worked in a lot of different corporate environments. At some of them, it was easy to be concretely productive: I was able to contribute immediately, and at a rapid rate. At others, actual useful contributions would be impossible until I had a month or more of experience with a codebase, and even then every change would be a long slog. The difference can be overwhelming and palpable.

The biggest contributor to this difference wasn’t what programming language was chosen (though I do care a lot about that), nor how well the code was factored (though that’s also very important), but rather the organizational structure that surrounded the code: the build system, the repo configuration, the tests, the documentation, the ticketing system – the stuff outside of the code itself that was essential to how programmers interacted with the code. Most, but not quite all, of what I’m talking about falls under the header of Dev Ops.

After having read (some of) Code That Fits in Your Head, I have come to believe in the importance of check-lists, so I’ll share with you my personal check-list of important dev-ops and dev-ops adjacent considerations when setting up a new project, so that developers can work rapidly, effectively, and with fewer mistakes.

The stakes are high – if it takes forever to make a change, if the process between modifying your code and running your code is too long, programmers won’t be able to work unless they’re much more confident, biasing them towards overly simple fixes and against more complicated refactors. If there are no tests, programmers will be overly careful modifying the code to avoid breaking things, and so the code won’t be able to evolve. New team members will take much longer to gear up, and everyone will be much less productive.

The worst thing is, managers are liable to dismiss developers’ complaints, and developers are unlikely to have the confidence to raise them. It’s easy to be unsympathetic to the complaint that a job is tedious or inconvenient. It sounds to many programmers and managers alike like laziness, and the obvious answer is “Well, that’s why we pay you the big bucks.” Obviously development at these shops is still possible, and the old hands at the company, who are used to whatever system’s in place, have accepted the costs already.

But make no mistake: Developer convenience and happiness is closely connected to developer productivity and accuracy. So let’s discuss how to make a development environment convenient for a developer.

What follows is a check-list, with some explanation for each item. Many of these items I learned from colleagues and leaders along the way in my programming career; this is my first attempt to collect all of them.

Development Environment

Let programmers use their own preferred development environment

Many developers have life-long habits and long-accumulated configurations for their favorite editors. I know I do! Standardizing IDEs or even operating systems can be tempting, but in general it isn’t worth it. Programming includes a lot of little steps, and making all of them take longer by changing a programmer’s environment can destroy momentum.

Provide standards for developer workstations

This might seem to contradict the previous point, but I honestly think you need both. It should be super easy to figure out what kind of operating system requirements and dependencies are necessary to build all the projects, because, as we’ll get to soon, developers should be able to build projects locally.

For example, most Linux distributions are customizable enough that programmers will be able to find a development environment within that distribution that suits them. The dependencies of a project can then be specified as a package list within that distribution, but the developers can then customize the rest of their interface. Commonly used distributions should be preferred if developers are doing their own IT, so that they can easily find help online.

Build System

You’ve changed a line of code. Congratulations! Now how long will it be until you can see the results of your change in action? How many steps do you have to take to see if it fixed your issue? If it broke compilation? If it passes tests?

If the amount of time or number of steps is low, then people will be able to try out various solutions, use trace statements to debug issues, and otherwise interact with their code like a live system. If it’s high, they have to rely more on their own reasoning, which is fallible, more likely to lead to bugs, and more likely to lead to timid, overly-conservative changes, that work around problems rather than addressing them.

So how do we accomplish this?

Projects should run natively and directly on developer workstations

In my mind, this is almost a deal-breaker for development. If you have to deploy to a dev environment or install on a physical piece of embedded hardware to test your software, your dev cycle will be far too long. Dev environments and physical hardware are of course essential for testing, but using them for absolutely all development introduces resource constraints where there don’t have to be, and lengthens dev cycles.

Even if the local dev environment is different from the prod environment, that’s fine. Even if some of the code won’t run and make sense, it’s still important to be able to run the rest of the code locally. Even if it’s running on an embedded platform and operating system with no proper simulator, some of the code will work on Linux or macOS. Those components should be testable on the developer workstation itself.

Building a project locally should be a single command
Building and running automated tests for a project should be a single command

When I say a single command, I mean it. Exactly one. Two is far too many. Once you try it, you’ll never go back. If your workplace doesn’t do this, write a script. Check the script in.

Of course, if different developers have different computers, this might be difficult, but if you assume a standard set of dependencies (or use a reproducible build system like NixOS), this command can just be an invocation of the build system.

In situations where it’s more complicated, a shell script should be written to encapsulate the complexity. This shell script should be included in the repo and maintained and checked by CI along with the other code, so that it always works. The exact invocation of the command should be completely invariant, and documented in the project’s README.md file.

Programmers don’t need to be distracted by complicated multi-part instructions that haven’t worked exactly right in years, or that work on some machines and not others, or by twiddling with their Docker settings. They should be focused on actually improving and fixing code.

Building and running a project locally should be a single command

This is similar to the above but might require sample configuration to be checked in along with the repo.

Builds should be reasonably fast

Developers should program on sufficiently powerful computers for their builds. Build scripts should use options like -j and if helpful send builds seamlessly to build farms (the seamlessness is important; it should still be a single command for the developer and result in a local build and run). Private caches should be set up, if this is possible with your build system.

If programming in C or C++, header file hygiene can be an important consideration in build speeds – invest time into it. Use incrementality features of your build systems rather than having scripts that clean every time. Structure the code so that incremental builds are possible.

If necessary, allow developers to build only part of the project (while still making it simple to build the entire project).

Version Control

Use version control for all projects

This is hopefully obvious to all modern teams, but I wanted to make sure I said it anyway to talk some about why it’s important.

The first and more obvious upshot of version control is being able to undo and research mistakes. If the code changed how it works, developers should be able to ask “when did it break” before asking “how did it break.” If the changes in the log are fine-grained enough, this might prevent the need for investigating the “how.” (Note that bisecting often requires fast dev turn-around as well – these are interconnected.) Version control should always be used. Even informal, one-person projects, such as writing test programs to try out APIs, should exist within a version-controlled repo.

The second upshot is that it enables collaboration. This also makes it important even for very small projects, because it enables you to easily ask your colleagues for help, and your colleagues can then look at the code with their own preferred development environment and try out fixes on their own machine.

Developers should be proficient in Git

It’s not enough to cargo cult Git knowledge or focus on that “one guy who understands Git.” Everyone should put the effort in to be that “one guy.” If you don’t know what “reflog” means or how rebasing differs from merging or how to edit commits deep in the history, you’re not a sufficiently proficient Git user. Many, perhaps even most, programmers aren’t.

Use and Enforce a Branching Discipline

Even on relatively small projects, no one should be committing and pushing directly to the trunk/main/master branch. If people push directly to master, every commit is automatically collaborative. This will make developers commit less frequently than they otherwise should, and will decrease the effectiveness of version control by having fewer versions to go back to.

It will also, obviously, lead to people accidentally “breaking the build” as projects get bigger. Committing a small change and merging that change into trunk or develop should be two different actions. The first should be done extremely often, and the second should only be allowed if a certain number of hoops have been jumped through.

Enforce CI

Before code can be merged into master, it should build. By default, merging into master should be impossible unless the repository has verified the build with CI. This is where we can easily test that it builds in a deployment setting in addition to a development setting, where artifacts can be created to deploy to servers or embedded devices (though this should also be possible to do locally) and where we can run automated tests. Coding standards should be enforced here, through lints. clippy and cargo fix are great tools for Rust.

Ideally, your CI scripts should be checked into the same repo as the systems they test, as is supported by GitLab with its .gitlab-ci.yml files.

Have tests in the repo

This is related. I’m not going to go into how to write tests and test coverage and all of that here; that’s again a separate topic for many many books. But there should be tests, and the important tests should be in the repo, and they should automatically be run by CI.

Remember: Tests aren’t just a tool for making sure the developers didn’t mess up after the fact. They’re there so developers can make sweeping changes with confidence.

Avoid mono repos

This one’s simple: The git log is too spammy and CI for the whole thing takes too long to run. Also, we have the technology of submodules, or, if on Nix, nix-thunk.

Require code review

This should be enforced by your Git system. As for how to actually do code review, this is a big enough topic to be its own section, which is coming up.

Code Review

The main point of code review is not to make sure bugs don’t get into the code, although it helps with that. The main point of code review is to mitigate bus factor, that is, to make sure there’s more than one person who is ready to maintain the code. All other guidance flows from here.

At least the person who maintains the code should also review

If the MR is written by the primary maintainer of the codebase, it should be reviewed by whoever would have to step up if they were abruptly “hit by a bus.”

This ensures that everyone maintaining the code is in agreement not just on style and correctness, but on the general design, architecture, and organization of the code.

The standard should be “Would I take responsibility to maintain this?”

If the answer is no, why not? Asking myself this question motivates me to make more suggestions about how the code should be factored, so I can jump in and make changes easily like I can with my own codebases, rather than just simply verify that it looks like it works and doesn’t have any unwrap() calls.

This question leads to some natural sub-questions:

How hard is it to find bugs in?

It shouldn’t just not have bugs, it should be obvious it doesn’t have bugs. This way, when a bug is actually discovered, code that isn’t buggy but is complicated won’t distract the poor developer trying to find the cause.

How hard is it to modify to do something else?
How easy is it to mess up?

This is where DRY (don’t repeat yourself) comes in. If I repeat the same pattern of code more than 2 times, and someone modifies it, they might only modify some of the instances of the pattern. This can also be mitigated not through abstraction but by putting all the instances next to each other, which is sometimes appropriate.

The code, however, should also not do premature abstraction, because then it will be impossible to find issues among all the spaghetti of function calls and variable references, so this is a balancing act.
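As a small illustration of that balance, here is a sketch in Rust. The scenario is invented: several settings that all need the same “0 to 100” check. The name parse_percent and the inputs are hypothetical. The point is only that the rule lives in one small, boring function instead of being pasted at every call site, without building a framework around it.

```rust
/// Parses a percentage setting. If the valid range ever changes, this is the
/// only place to edit, so no copy of the check can be forgotten.
fn parse_percent(raw: &str) -> Result<u8, String> {
    let value = raw
        .trim()
        .parse::<u8>()
        .map_err(|e| format!("{raw:?} is not a valid number: {e}"))?;
    if value > 100 {
        return Err(format!("{value} is out of range (0-100)"));
    }
    Ok(value)
}

fn main() {
    // Three hypothetical settings, all validated by the same rule.
    for raw in ["42", " 100 ", "101", "abc"] {
        println!("{raw:?} -> {:?}", parse_percent(raw));
    }
}
```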

If a bug is found to be caused by this change, will we know which part to revert?

Remember, programmers should be able to bisect instead of having to read an entire codebase when they want to find a bug. If you found out that the bug was caused by this change set, would you be relieved to know or would you still have a lot of work ahead of you?

Documentation

Last but not least, documentation.

Documentation should say how to build the project

It should, as mentioned, be one command, and it should not depend on much setup beyond “having a standard development workstation.”

Documentation should say how to run the project

What flags or configuration does it take? How do you tell it to re-read the configuration? Does it use any environment variables?

Documentation should say what the project is for

This should be before how to build it and run it, and should explain who might want to run it and where it fits into the broader organization, and the first things a programmer might want to know before looking at it. This will help people understand the stakes of modifying it, and where to start looking for features. This should be covered in the lede paragraph.

Which leads me to:

There should be a lede paragraph

This should introduce the repo to someone who’s never heard of it and doesn’t have any context for what they’ve stumbled across. It should include its role in the company’s tech stack, its status, and what technologies it uses.

Here are some examples:

This is the main repo for our flagship product, and it is one of our few repos that is not open source. Customers use it directly to control the widget machines, which it contains all the drivers for, and also Node.js code to serve the user-facing web interface.

This is run as a twice-daily batch job to automate pruning the widget description files. It is run on customer machines, and is open source as local administrators might want to customize it. It is written entirely in Perl 4 except for one module that is written in APL. Sometimes, it doesn’t work correctly, and we have to manually run an earlier version written in JCL and Cobol (link).

This implements the new DSL for widget description. Currently, it only supports translation to old widget descriptions, but it is hoped that it will eventually be integrated into the main repo. It is a research project still under active development. It is written in Haskell and Idris, and contains, as a component, a custom Prolog interpreter.

Documentation should be discoverable

It should either be in the README.md of the relevant repo or linked to directly from there.

Ticket Systems

I guess I lied when I said that documentation was last. Project management is, I think, a topic for a different blog post, but what I wanted to say about this is: It should be very easy to add a new TODO item that the programmer doesn’t have to remember anymore. If it takes too long to make a ticket, developers will lose their flow on the project they were trying to work on, or will produce fewer tickets, in a bad way.

The ideal is “type a single sentence and press a single button,” either on the web or (preferably) on the command line. The resultant TODO items can then be fleshed out in a separate grooming meeting.

Conclusion

Paying attention to these things is a bigger multiplier on developer productivity than finding “10x developers,” and is essential for attracting and retaining good developers. Improving these things is hard, especially at organizations that are set in their ways, but it is far more important than it might look. Dedicated dev-ops professionals are essential in such things.

Can you reproduce it?

2022-03-22T00:00:00+00:00

NOTE: This post has the #programming tag, but is intended to be comprehensible by everyone, programmer or not. In fact, I hope some non-programmers read it, as my goal with this post is to explain some of what it means to be a programmer to non-programmers. Therefore, it is also tagged with “nontechnical”.

What is the most important skill for a software engineer? It’s definitely not any particular programming language; they come and go, and a good programmer can pick them up as they work. It’s not estimating how long a project will take, as important and elusive as that skill is – because fundamentally, no one can, and many, many programmers are successful without having fully built up that skill.

No, in my learned and considered opinion, the most important skill in a software engineer is solving – and preventing! – problems. It is squashing and preventing “bugs” – those situations where the software behaves in an undesirable fashion, where it fails to meet expectations, whether or not you knew about those expectations ahead of time. That is the crux of the software engineering skillset. Preventing and fixing bugs is the goal which the other skills uphold, and the criterion by which software engineering principles and practices should be evaluated.

My other programming posts can be understood through that lens. All my posts on why Rust is a better programming language than C++ – the point is that Rust, as a programming language, is top-notch bug repellant technology. For any post about code organization and readability, the reason it’s important for code to be organized and readable is so that another programmer trying to find a bug is able to find it quickly, or that a programmer trying to add a feature doesn’t end up also adding more bugs, due to a misunderstanding of how the code works.

But today, I wanted to talk less about the prevention, and more about the squashing, about what to do when you’ve found a bug.

So how do you squash bugs?

First, I want to note that the most important bug-squashing tool is the human brain.

There is a tool, a type of program, called a “debugger,” but that is less essential than you might think from the name. A debugger won’t fix bugs for you, and unless the bug is a crash that actually happened, it can’t even find them. If a debugger could fix – or even just find – all your bugs, that would be almost equivalent to a program that could write programs, because, as mentioned before, preventing, finding, and fixing bugs is the crux of the entire job, and I know it hasn’t been automated, because I still get a paycheck.

What a debugger can do is attach to a running program, let you run it one line at a time instead of all at once, and let you inspect the program’s internal state to make sure it is what you think it is. Additionally, if there is a crash, the debugger can inspect the crash data, sometimes in the form of what’s known as a “core dump,” and tell you what line of code was running when the crash happened, a “backtrace” of how the program got there, and what values were in what variables then.

This is all useful in the debugging process, but not essential. The program ought to be called an “inspector” – or perhaps the “debugger’s companion,” because as a programmer, the true debugger is you, and much of what the “debugger” tool can do, you can do without it as well, using (for example) more verbose log lines and error messages, and just good old-fashioned reasoning power.

So what do you do, when you have a bug? Where do you start?

You might assume that the first thing to do when you see a bug is to try to find out what caused it. But not only is that going to be difficult without some initial steps, it can lead to problems, where you think you’ve got it, you go and do your fix, think it’s better, and actually the bug turns out to still be there.

No, more important than trying to guess what might have caused a bug is figuring out how to tell when the bug is actually fixed. If all we know is “sometimes, the app crashes,” and we change something, and the app doesn’t crash right away, well, is that because we fixed it, or is that because it just happened to be one of those times where the app doesn’t crash? If I had a nickel for every time a programmer thought they’d fixed a bug…

And this is where, although programming definitely is a type of engineering – software engineering – it has an advantage over other engineering fields. With software, we can run the same program over and over again, often almost for free, in a way that we can’t rebuild a bridge or dig a new mine tunnel. With software, often – not always, but usually – we can re-run a program, do a few things, and see if the bug arises again. And then, through experimentation, we can come up with a procedure that allows us to always trigger the bug.

In this way, “sometimes the app crashes” can be refined through experiment to “when you go to the settings page in particular, sometimes, the app crashes” which can then be refined to “when you go to the settings page, and you’re not logged in, and you’re on an iPhone from the past 3 years, it crashes every time.” And now, you have a way of knowing when you’ve fixed it. If you do those exact things, and it doesn’t crash, then you can be confident that your fix actually took.

Refining the conditions in which your bug is a bug, or rather, coming up with a list of instructions to make the bug happen on purpose – ideally as short a list as possible – is known in the business as “reproducing the bug.” And it is the most important skill-set in people who are testing software.

Because if I’ve written some code, and someone else is testing it, and they’ve gotten it to crash, but don’t know how or why it crashed – or especially if they don’t even know what they were doing when it crashed – well, that doesn’t do very much for me. I have no idea where to even start. Because I haven’t experienced any crashes, I can’t look at your crashes. It works on my machine. How can I solve a problem I can’t even see?

So this was a problem for me when I was a lowly iPhone programmer working in a small three-person company. My app would have error messages pop up – and this was frequent, as my boss, who was not a programmer, didn’t let me spend time improving how the app worked unless I was making continuous visible progress. My boss would tell me that he’d gotten an error message while using the app. Can I look into that?

I didn’t know what to do. I always looked into it when I got error messages, and was often able to reproduce them, find them, and fix them, but obviously my boss was better at QAing my app than I was – at least in the triggering bugs department. So I told him so. “I don’t know what to do – I can’t fix it unless you can reproduce it.”

What happened subsequently confused me. My boss would text me, giddy, for some reason acting as if he was winning an argument against me, saying “I reproduced it.” And then he’d send me a screenshot of the problem. He did this repeatedly. He seemed to think “reproducing the bug” meant screenshotting the bug in action.

Later I saw him in person, and he asked me about whether I’d fixed the bug yet. I tried to explain that that wasn’t enough; what I needed was a step by step explanation of how to make the bug happen. And he said something along the lines of, “How can you still not believe me? I sent the screenshot.”

I was flabbergasted. I said, “Wait. I’m not literally doing this as a policy, where I don’t work on it unless you prove to me there’s an actual problem. It’s not because I don’t believe you. It’s not because you have to convince me it’s real before I work on it.”

And my boss responded, with an affected, over-the-top laugh, “Haha, no I get it.” And then paused a second and continued, “But you kinda are though. That’s exactly what you’re doing.”

Apparently, when I said “I can’t,” my boss heard “I won’t.” My boss thought this was about me standing up for myself against potential spurious work, and being overly strict about burdens of proof, rather than me literally asking questions that would make it possible for me to do my job and troubleshoot these issues.

At this point, I was just shocked. How did he think I was going to do my job? Obviously I have to figure out what was going wrong. And obviously – or at least it was obvious to me – that required follow-up questions. Why was he so affronted by my follow-up questions? What did he think fixing the issue looked like? Did he think I’d just say, “Oh, error messages. I must have left some extras in. I’ll go take them out.” No, the error messages were the visible result that happened when the code didn’t know how to proceed to accomplish its tasks, and I had to go deeper in to find out why that was happening.

Luckily, I was able to compose myself, and think quickly on my feet. I knew I wasn’t going to be able to actually explain reproducibility to him – after all, I just had, and he somehow misinterpreted it as insubordination – so I fibbed a little.

The debugger that came with Macs for debugging iPhones looked very sophisticated, and to use it with the app, you had to connect the phone to the computer with a cable. This would allow you to, as I said before, inspect the current program state, and so on. It looked like a very useful tool – and it was. Just not as useful as the human brain, and not something you could use to skip steps.

But even though what I really needed was instructions on how to make the bug happen, I told my boss that what I needed was for the bug to actually happen while the phone was plugged into the debugger.

This served two purposes: It made him believe me that I wasn’t just messing with him, and it gave him a concrete reason to reproduce the bug. Now that he had the goal of making the error message pop up while it’s plugged in, he was able to figure out for himself that the easiest way to do that was to come up with a way to make it happen on purpose.

Now, when he found a new bug, he wouldn’t bring it to me until he was prepared to make it happen while it was plugged in. Some of them, he already knew how to trigger; he just hadn’t heard an adequate explanation for why he should tell me. Other bugs, he would figure out. In either case, once he was done, he would make the bug happen while it was plugged in.

And when this happened, I could either ask him how he got it to happen, casually, while pretending that the more important thing is that it’s plugged in, and while he sees that I’m using a computer and therefore “working” – or, if the steps were inconsistent or unclear, or if we’d just gotten lucky to see an error message while attached to the debugger, I could use information from the debugger to figure out what the program had been doing before the error – and use this, not to fix the bug immediately, but to figure out how to reproduce it myself.

So what did I learn from this experience? Well, even though the most important tool in programming (and bug-squashing) is the human brain, I learned that people, especially non-programmers, are more comfortable when they see you using other, concrete, fancy-looking tools. And if your human brain needs input to complete a task, people might be more likely to give it to you if you pretend the computer needs that input.

For those of us who work primarily with our brains, this can be frustrating and disappointing, and can lead to the need to fib a little sometimes to accommodate these biases.

And additionally, I learned about the depths of the disconnect between different levels of expertise. I thought it was obvious that to fix a problem with some code, you had to understand it at least well enough to make it happen again. This seemed to follow directly from first principles, to be the only logical way that it could work, if you thought about it. But my boss hadn’t thought about it, and didn’t understand this. The gap in our perspectives didn’t come from his lack of detailed technical knowledge of specific technologies, but rather from his lack of a developed intuition for how programming works.

A Rust Gem: The Rust Map API

2022-03-12T00:00:00+00:00

For my next entry in my series comparing Rust to C++, I will be discussing a specific data structure API: the Rust map API. Maps are often one of the more awkward parts of a collections library, and the Rust map API is top-notch, especially its entry API – I literally squealed when I first learned about entries.

And as we shall discuss, this isn’t just because Rust made better choices than other standard libraries when designing the maps API. Even more so, it’s because the Rust programming language provides features that better express the concepts involved in querying and mutating maps. Therefore, this serves as a window into some deep differences between C++ and Rust that show why Rust is better.

And for this post, specifically, we’ll also be discussing Java, so this will be a three-way comparison, between Java, C++ and Rust.

Reading from a Map

So, let’s talk about map APIs. But before we get to Entry and friends, let’s discuss something a little simpler: getting an item from a map. Let’s say we have a sorted map of strings to integers:

In Java, TreeMap<String, Integer>
In C++, std::map<std::string, int>
In Rust, BTreeMap<&str, i32>

Let’s also say we have a string "foo", and want to know what integer corresponds to it. Now, if we’re sure that the string we’re looking up is always in the map, then we know what we want: we want to get an integer.

But what if we’re not sure? There are plenty of situations where we want to read a value corresponding to the key – or do something else when that key is not present. Maybe the value is a count, and an absent key means 0. Or maybe the absent key means that the user has made a typo, and needs to be informed. Or maybe the map is a cache, and the absent key means we need to read a file or query a database. In all of these cases, we need to know either the value, or the fact that the key is absent.

Let’s see how this is handled in our three programming languages, and how fundamental design choices in these programming languages lead to such APIs.

Java `get` a (Nullable) Reference

A long time ago, Java made an extreme choice in the name of simplicity: It divided all values into a dichotomy of “primitives” and “objects.” Primitives are passed around by implicit copy, whereas objects are aliased through many mutable references. Objects always have optionality built in – any object reference is automatically “nullable,” which means you can store the special sentinel/invalid value null in it, the interpretation of which varies wildly. Primitives are not optional in this way.

Also for the sake of simplicity, and very relevantly to the topic at hand, generics are only supported for object types, not primitives. That means that map values can only ever be object types. And that means that our map from strings to integers in Java doesn’t use Java’s primitive integer type int, but rather this special wrapper/adapter type Integer, which auto-casts to and from int, and which, like any object type, is managed through mutable, nullable references. (At this point, I for one am beginning to suspect they missed the mark on their simplicity).

So what’s that mean for our map? How do we find out what value corresponds to "foo" in our map, or else that there is none? Well, the method for this is called get, and that returns the value in question if there is one. And when there isn’t? Well, Java here leverages nullability, and returns null when there is no value.

So we can write something like this:

Integer value = map.get("foo");
if (value == null) {
    System.out.println("No value for foo");
} else {
    int i_value = value;
    System.out.println("Value for foo was: " + i_value);
}

So far, so good. But there are problems. And perhaps I’m missing some – now is a good time to take a second, look at the code, and try to imagine in your mind what problems there may be with this system (you know, besides the fact that I have to use i_ as improvised Hungarian notation due to lack of support in Java for shadowing).

You have some? I’ll now list what I’ve got.

Problem the first: The signature of get doesn’t really alert us to the possibility of a value not being in a map. This is the sort of “edge case” that programmers regularly forget to handle; a programmer may know, due to their situation-specific knowledge, that the key ought to be present, and forget to consider that the key might not be.

Compilers of strongly typed languages generally work to ensure that programmers don’t miss edge cases like this, don’t make simple “thinkos” (typos but with thought) or “stupid mistakes.” How’s Java hold up? Well, remember how we mentioned that primitives can’t be null, but these wrapper types like Integer are coercible to primitives? Well, this compiles without a word of complaint from the compiler:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();

map.put("foo", 3);

int foo = map.get("foo");
System.out.println("int foo: " + foo);

int bar = map.get("bar");
System.out.println("int bar: " + bar);

And what happens at run-time? Similar behavior to Rust’s infamous unwrap function. The conversion from the nullable Integer to the non-nullable int crashes when the Integer is in fact null:

int foo: 3
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.TreeMap.get(Object)" is null
        at test.main(test.java:12)

So you might try to fix this by querying if the key exists first:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();

if (map.containsKey("bar")) {
    int bar = map.get("bar");
    System.out.println("int bar: " + bar);
} else {
    System.out.println("bar not present");
}

But now we’ve reached problem the second. Unfortunately, even though this looks like it addresses the issue, this won’t prevent the crash either. There is nothing stopping you from putting a null into the map, so this code also crashes given the right context:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();
map.put("bar", null);
if (map.containsKey("bar")) {
    int bar = map.get("bar");
    System.out.println("int bar: " + bar);
} else {
    System.out.println("bar not present");
}

So for a given key in a Java map, there are actually three possible situations:

The key is absent.
The key corresponds to an integer.
The key corresponds to one of these special null-values.

get can distinguish 2 from 1 and 3, but cannot distinguish between 1 and 3. containsKey can distinguish 1 from 2 and 3, but cannot distinguish 2 from 3. To distinguish all 3 scenarios, and handle all the representable values, you need to call both get and containsKey:

if (map.containsKey("bar")) {
    Integer bar = map.get("bar");
    if (bar == null) {
        System.out.println("bar present and null");
    } else {
        int i_bar = bar;
        System.out.println("int bar: " + i_bar);
    }
} else {
    System.out.println("bar not present");
}

In addition to this precaution not being enforced by the compiler, it leads to problem the third: We are now querying the map twice. We are walking the tree twice with our containsKey followed by get.

At this point, we find ourselves scrolling through the Map methods in Java’s documentation, trying to find a more general solution. getOrDefault might help in some situations – when there’s a value that makes sense as the default. compute might be useful – if we’re OK with modifying the map in the process.

But in general, nothing clean exists to tidy up these problems. And the blame lies squarely on Java’s decision to make almost all types – and all types that can be map values – nullable.

But wait! – you might object – Can’t we just maintain an invariant on the map that it contains no null values? If we have a map without null values, all these issues – well, many of these issues – dry up.

And this is true. Maintaining such an invariant makes for a much cleaner situation. Pretend you aren’t allowed to put nulls in maps, and arrange not to do it.

But, first off, maintaining an invariant like this is easier said than done. Programmers often do this sort of thing implicitly in their head, but it’s much better to comment. Either way, you have to trust future programmers – even future versions of the same programmers – to know about the invariant, either by intuiting it (all too common) or by reading the relevant comment (which, even if there is one, might not happen). And you have to trust them to not intentionally violate the invariant, and also to not accidentally violate the invariant: Are they sure that all those values they add to the map can never be null?

And second off, somewhat shockingly, sometimes people do assign special meanings to null. I said before null has a wide range of meanings, and it’s not uncommon to use null to mean special things. Maybe “not mapped” means “load from cache,” but “null” means “there actually is no value and we know it.” Or maybe the opposite convention applies. null is frustratingly without intrinsic meaning.

For such situations, programmers should probably compose the map with other types or better yet, write custom types that make the semantics of these situations abundantly clear. But let’s not put all the blame on the programmers. If Java had really wanted to protect people from distinguishing these “not mapped” and “mapped to null” situations, Java maps shouldn’t have made the distinction representable at all. It’s bad programming language design to put features in a library that can only be abused, and it’s bad understanding of human nature to then solely blame the programmers for misusing them.
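To make “custom types that make the semantics abundantly clear” a bit more concrete, here is a minimal sketch, written in Rust rather than Java since that is where this post is headed. The Cached enum and the describe function are hypothetical names invented for this illustration, using the cache scenario from above: “not in the map” means we haven’t looked yet, and the two “present” cases each get a name instead of sharing a null.

```rust
use std::collections::BTreeMap;

/// Hypothetical cache value: each "present" case has its own name.
#[derive(Debug)]
enum Cached {
    /// We looked it up before, and there is definitely no value.
    KnownMissing,
    /// We looked it up before, and this is the value.
    Value(i32),
}

/// Describes what the cache knows about `key`, using a single lookup.
fn describe(cache: &BTreeMap<&str, Cached>, key: &str) -> String {
    match cache.get(key) {
        None => format!("{key}: not cached yet"),
        Some(Cached::KnownMissing) => format!("{key}: known to have no value"),
        Some(Cached::Value(v)) => format!("{key}: {v}"),
    }
}

fn main() {
    let mut cache = BTreeMap::new();
    cache.insert("foo", Cached::Value(3));
    cache.insert("bar", Cached::KnownMissing);
    for key in ["foo", "bar", "baz"] {
        println!("{}", describe(&cache, key));
    }
}
```

All three situations are told apart with one walk of the tree. A reader of the code also never has to guess which convention a bare null was supposed to follow.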

C++: No Nulls No More

So now we move on to C++.

In C++, fewer types are nullable, and non-nullable types like int can be used as the value type of a map. For our map, of type std::map<std::string, int>, we no longer have the trichotomy of “key not present, value null, or value non-null,” but the much more reasonable dichotomy of either the key is present and there is an int, or it’s absent and there isn’t one.

This is, in my mind, the bare minimum a strongly typed language should be able to provide, but after the context of Java it’s worth pointing out.

There are three (3) methods in C++ that look like they might be usable as a get operation, an operation where we either get an int value or learn that the key is absent:

at
operator[]
find

See if you can identify which one is the right one to use.

Spoiler alert! It’s find, the one whose name superficially looks least like it’ll be the right one. at throws an exception if the key is absent, and operator[], the one with the most appealing name, is an eldritch abomination which we’ll discuss and condemn later.

But all well-deserved teasing aside, find is much better than Java’s get. It returns a special object – an iterator – that can be easily tested to see whether we’ve found an int, and easily probed to extract the int.

auto it = map.find(key);
if (it == map.end()) {
    std::cout << key << " not present" << std::endl;
} else {
    std::cout << key << " " << it->second << std::endl;
}

This is actually pretty good! The -> operator also serves as a signal to experienced C++ programmers that we’re assuming that it is valid: generally -> or * means that the object being operated on is “nullable” in some way.

So when a C++ programmer reads something like this, they have a little bit of warning that they’re doing something that might crash:

int foo = map.find(key)->second;

And certainly, they have more warning than the Java programmer with the equivalent Java:

int foo = map.get(key);

Of course, this is awkward. find returns an iterator, which isn’t exactly the type we’d expect for this “optional value” situation. And to determine if the value isn’t present, we compare it to map.end(), which is a weird value to compare it to. Nothing about what these things are named is specifically intuitive, and people would be forgiven for using the accursed operator[]. map["foo"] just looks like an expression for doing boring map indexing, doesn’t it?

And what does operator[] do, if the key isn’t present? It inserts the key, with a default-constructed value. No configuration is possible of what value gets inserted, short of defining a new type for the object values. This is sometimes what you want – like if your value type has a good default (especially if you defined it yourself), or if you’re about to overwrite the value anyway. But in most cases, you want some other behavior if the value is not present – operator[] doesn’t really tell you that it inserted the item, so if you need to make a network query or read a file or print an error, you’re out of luck. operator[], as innocuous as it looks, has surprising behavior, and that is not good.

But all in all, as far as getting values goes, as far as querying the map goes, C++ is doing OK. Solid B result on this exam, I think. Decent work, C++. Especially since we just looked at Java.

The Rust `Option`

So now on to Rust: we want to query our BTreeMap<&str, i32>.

(Or… it might be a BTreeMap<String, i32>, depending on whether we want to own the strings. This is a decision we also have to make in C++ (where we could have used string_views as the keys), but do not have to make in Java. At least in Rust, we know that whichever decision we make, we will not accidentally introduce undefined behavior. But that’s a distraction!)

So let’s apply the same test to Rust as we’ve applied before. Here, the method in question is given an obvious name, get rather than find. So let’s see how it does in our test, of allowing us to read a value if present, but know if not:

if let Some(val) = map.get(key) {
    println!("{key}: {val}");
} else {
    println!("{key} not present");
}

See, get returns an Option type. Therefore, unlike in C++, we can test for the presence of the value and extract the value inside the same if statement. Unlike in C++, the return value of get isn’t a map-specific type, but rather the completely normal way to express a maybe-present value in Rust. This means that if we want to implement defaulting, we get that for free by using the Option type in Rust, which implements that already:

// Let's say missing keys means the count is 0:
let value = *map.get("foo").unwrap_or(&0);

Similarly, calling is_none() or pattern-matching against None is much more ergonomic than comparing an iterator to map.end(). It requires some more intimate knowledge – or some follow-up reading – to learn that the concept of “end of collection” and “not found” are for various reasons combined into one in C++.

So while C++ avoids the problematic elements of Java maps, Rust does so more ergonomically, because it has a well-established Option type. C++ now has one as well, std::optional, but it hasn’t yet reached its map API, because it was only added very recently, in C++17.

And Option integrates even better than std::optional with the programming language, because Option is just a garden-variety sum type, a Rust enum, which lets you do things like if let Some(x) = ..., and combine testing and unpacking in the same statement. C++ could not design a map API this ergonomic, because it lacks this fundamental feature.

Also, unlike with null in Java, if you want to use Option as a meaningful distinction in your map, you still can. The get function would then return Option<Option<...>> instead of just Option – the outer one representing presence, the inner one representing whether the value was None or Some(...). Option is composable in a way that null is not.
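To make the nesting concrete, here is a minimal sketch. The map contents and the describe helper are made up for illustration; the only real API in play is BTreeMap::get and ordinary pattern matching:

```rust
use std::collections::BTreeMap;

/// Describe what a lookup found. Using `None` as a stored value is a
/// hypothetical convention here: "we know this key, but it has no value yet".
fn describe(map: &BTreeMap<&str, Option<i32>>, key: &str) -> String {
    match map.get(key) {
        // Outer None: the key is not in the map at all.
        None => format!("{key}: not in the map"),
        // Outer Some, inner None: the key is present, and its value is None.
        Some(None) => format!("{key}: in the map, but no value"),
        // Both layers present.
        Some(Some(val)) => format!("{key}: {val}"),
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("measured", Some(3));
    map.insert("pending", None);

    for key in ["measured", "pending", "unknown"] {
        println!("{}", describe(&map, key));
    }
}
```

The three arms are three genuinely different situations, which is exactly the distinction a null-returning get cannot make.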

For the record, the Rust equivalent to operator[] – the Index trait implementation on maps – does the equivalent to C++ at, and panics if the key isn’t present. While not as generally useful as get, I think this is a reasonable interpretation of what map["foo"] should mean.
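A small sketch of the difference, with made-up keys. The panicking line is left commented out so the program runs to completion:

```rust
use std::collections::BTreeMap;

fn main() {
    let mut map = BTreeMap::new();
    map.insert("foo", 3);

    // Indexing a present key yields the value directly, no Option involved.
    println!("foo = {}", map["foo"]);

    // `get` reports absence as a value we can inspect.
    println!("bar via get = {:?}", map.get("bar"));

    // Uncommenting the next line would panic at run time, because "bar"
    // is not a key in the map. Unlike C++'s operator[], nothing is inserted.
    // println!("bar = {}", map["bar"]);
}
```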

Mutation Station

So Rust wins, I’d say pretty handily, when comparing how to access a value from a map, how to query them. But where Rust truly shines is when mutating a map. For mutation, I’m going to approach the discussion differently. I’m going to start by specifying what use cases might exist, and then, in that context, we can discuss how an API might be built.

The mutation situation has a similar dilemma to querying: the key in question might or might not already be in the map. And, for example, we often want to change the value if the key is present, and insert a fresh value if the key is absent.

Of course, we could always check if the key is present first, and then do something different in these two scenarios. But that has the same problem we already discussed for querying: We then have to iterate the tree twice, or hash the key twice, or in general traverse the container twice:

auto it = map.find(key); // first traversal
if (it != map.end()) {
    return it->second;
} else {
    int res = load_from_file(key);
    map.insert(std::pair{key, res}); // second traversal
    return res;
}

So what should we do for our API for this scenario, where we want to change the value if the key is present, and insert a fresh value if the key is absent?

Well, sometimes that fresh value is a default value, like if we’re counting and the key is the thing we’re counting – in that case, we can always insert 0. In that case, C++’s operator[] – when combined with an appropriate default constructor – can actually work well.

And sometimes, that fresh value depends on the key, like if the value is a more complicated record of many data points about the item in question. If the value is a sophisticated OOP-style “object,” and the key indexes one of the fields also contained in the value, C++’s operator[] would not work. The default value is a function of the key.

And sometimes, there isn’t a default value per se. Sometimes, if the key is absent, we need to do additional work to find out what value should be inserted. This is the case if the map is a cache of some database, accessed via IPC or file or even Internet. In that situation, we only want to send a query if the key is not present. We would not be able to accomplish our goals by simply providing a default value when sending the mutation operation.

C++ doesn’t have anything for us here. operator[] is pretty much its most sophisticated “query-and-mutate” operation. Java, somewhat surprisingly, does have something relevant, compute. This handles all of these situations, with a relatively unergonomic callback function – and as long as your map never contains nulls.

Rust’s solution, however, is to create a value that encapsulates being at a key in the map that might or might not have a value associated with it, a value of the Entry type.

As long as you have that value, the borrow checker prevents you from modifying the map and potentially invalidating it. And as long as you have it, you can query which situation you’re in – the missing key or the present key. You can update a present key. You can compute a default for the missing key, either by providing the value or providing a function to generate it. There are many options, and you can read all of them in the Entry documentation; the world is your oyster.

So the C++ code above can be ergonomically expressed as something like this in Rust:

let entry = map.entry(key.to_string());
*entry.or_insert_with(|| load_from_file(key))
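To see that fragment run end to end, here is a minimal sketch. The load_from_file below is a hypothetical stand-in (it just returns the key's length and counts how often it is called), and get_or_load is a name I made up for the wrapper:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the expensive lookup (a file read, a database
/// query, ...). It returns the key's length and counts its own calls.
fn load_from_file(key: &str, loads: &mut u32) -> i32 {
    *loads += 1;
    key.len() as i32
}

/// Return the cached value for `key`, loading it only if it is absent.
fn get_or_load(cache: &mut BTreeMap<String, i32>, key: &str, loads: &mut u32) -> i32 {
    *cache
        .entry(key.to_string())
        // The closure only runs when the entry is vacant.
        .or_insert_with(|| load_from_file(key, loads))
}

fn main() {
    let mut cache = BTreeMap::new();
    let mut loads = 0;

    let first = get_or_load(&mut cache, "hello", &mut loads);
    let second = get_or_load(&mut cache, "hello", &mut loads);

    // The second call finds the entry occupied, so nothing is loaded again.
    println!("first = {first}, second = {second}, loads = {loads}");
}
```

One cost worth knowing about: entry takes the key by value, so key.to_string() allocates even when the key is already present.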

And the idiom where we’re counting something could be expressed something like:

map.entry(string)
    .and_modify(|v| *v += 1)
    .or_insert(1);

So we get this nice little program that counts how many times we use different command line arguments:

use std::collections::BTreeMap;
use std::env;

fn count_strings(strings: Vec<String>) -> BTreeMap<String, u32> {
    let mut map = BTreeMap::new();
    for string in strings {
        map.entry(string)
            .and_modify(|v| *v += 1)
            .or_insert(1);
    }
    map
}

fn main() {
    for (string, count) in count_strings(env::args().collect()) {
        println!("{string} shows up {count} times");
    }
}

Conclusion

So first off, Entrys are super nice, and neither Java nor C++ has anything anywhere near as nice. Even when it comes to just querying, Rust’s get is much better than Java’s get, and a little more ergonomic than C++’s find.

But this isn’t an accident. This isn’t just about Rust’s map API having a nice touch. When we look at the definition of Entry, we see things that Java and C++ can’t do:

pub enum Entry<'a, K, V> 
where
    K: 'a,
    V: 'a, 
 {
    Vacant(VacantEntry<'a, K, V>),
    Occupied(OccupiedEntry<'a, K, V>),
}

First, this is an enum: There are two options, and in both options, there’s additional information. Of course, Java and C++ can express a dichotomy between two options, but it’s a lot clumsier. Either you’d have to use a class hierarchy, or std::variant, or something else. In Rust, this is as easy as pie, and since it does it the easy way, you can not only use the various combinator methods in Rust, you can also use Entrys with a good old-fashioned match or if let to distinguish between the Vacant and Occupied situation.
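Here is roughly what the match version looks like, as a sketch. The visit function and its return strings are invented for illustration; VacantEntry::insert and OccupiedEntry::get_mut are the real standard library methods:

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Record a visit for `name`, reporting whether the name was new.
fn visit(visits: &mut BTreeMap<String, u32>, name: &str) -> &'static str {
    match visits.entry(name.to_string()) {
        // The key was absent: fill the slot with an initial count.
        Entry::Vacant(slot) => {
            slot.insert(1);
            "first visit"
        }
        // The key was present: update the existing count in place.
        Entry::Occupied(mut slot) => {
            *slot.get_mut() += 1;
            "returning visitor"
        }
    }
}

fn main() {
    let mut visits = BTreeMap::new();
    for name in ["ada", "grace", "ada"] {
        println!("{name}: {}", visit(&mut visits, name));
    }
    println!("{visits:?}");
}
```

Each arm can do arbitrary, different work, and the map is still only traversed once.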

Second, there’s a little lifetime annotation there: 'a. This is an indication that while you have an Entry into a map, Rust won’t let you change the map. Now, Java and C++ also have iterators, and you may not change a map while you’re holding one, but in both those languages, you have to enforce that constraint yourself. In Rust, the compiler can enforce it for you, making Entrys impossible to use wrong in this way.

Without both of these features, Entry would not have been an obvious API to create. It would’ve been barely possible. But Rust’s feature set encourages things like Entry, which is yet another reason to prefer Rust over C++ (and Java): Rust has enums (and lifetimes) and uses them to good effect.

Addendum

I wanted to address a few points that people have raised in comments since I posted this.

Some people have pointed out that C++ has insert_or_assign, but in spite of the promising name, it just unconditionally sets a key to be associated with a value, whether or not it previously was. This is not the same as behaving differently based on whether a value previously existed, and it is therefore not relevant to our discussion.

More interestingly, it has been pointed out to me that with the return value of insert, you can tell whether the insert actually inserted anything, and also get an iterator to the entry that existed before if it didn’t. This allows implementing some, but not all, of the patterns of Entry without traversing the map twice.

For example, counting:

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main(int argc, char **argv) {
    std::vector<std::string> args{argv, argv + argc};
    std::map<std::string, int> counts;

    for (const auto &arg : args) {
        counts.insert(std::pair{arg, 0}).first->second += 1;
    }

    for (const auto &pair : counts) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }

    return 0;
}

This works, but is much less clear and ergonomic than the Entry-based API. But perhaps more importantly, this functionality is much more constrained than Entry, and is equivalent to using Entry with just or_insert, and never using any of the other methods. As another commentator pointed out, counting is possible with just or_insert:

*map.entry(key).or_insert(0) += 1

But counting is just one example. C++’s insert is still deeply limited. Using C++’s insert means you have to know a priori what value you would be inserting. You can’t use it to notice that a key is missing and then go off and do other work to figure out what the value should be. So you can’t do my load_from_file example.

In order to do the load_from_file example in C++, even with this use of insert, you would have to temporarily insert some sentinel value in the map – and that goes against how strongly typed languages ought to work, in addition to breaking the C++ concept of exception safety.

This is, as was pointed out in another comment, exactly what C++ programmers sometimes have to do, to meet performance goals, at the expense of clarity and simplicity, and therefore, especially in C++, at the expense of confidence in safety and correctness.

The Good Ol' Days of QBasic Nibbles

2022-02-28T00:00:00+00:00

Let’s talk about an ancient programming language! I think we can all learn things from history, and it gives us grounding to realize that our time is just one time among many, to see what people in the past did differently, what they got wrong that we would never do now, and also to see what they got right.

Do you remember MS-DOS? Do you remember that it came with an interpreted programming language? From MS-DOS 5 onwards, it came with not Python, not Javascript or R or Matlab, but a dialect of BASIC. But I think most people, especially most people my age who were children at the height of the MS-DOS era, remember it for the games, the two sample programs that came with it, namely Gorillas and Nibbles (their name for Snake).

Nibbles is extra near and dear to my heart because not only is it the game that I better enjoyed, but more interestingly because it’s the first “large program” that I ever did work on (for me as a child, “large” meant multiple subroutines), and the first existing program I ever modified.

So recently, I tried to see if I could find it. And indeed, I could. To run it, I just needed DOSBox and the QBasic interpreter (you want QBasic EN 1.1). After that, you just need the program itself; then you can throw them in a directory, “mount” it from inside DOSBox, run QBASIC.EXE, and use its very discoverable interface (by ’90s standards).

It looks a little less impressive in such a small little emulation window, but of course at the time it took the entire screen of an entire CRT monitor, and was the best technology available for me to interact with.

Nibbles was a sample game designed for you to learn to program as well as having fun with. True to its time, it had a little set-up interface where you answered questions in a very basic prompt-and-respond TUI before you could start playing:

You ate numbers going from 1-9, which were easy to display – the program, though a video game, runs in text mode! – but at the time I just appreciated that it helped you keep track of how far along in the level you were. So I decided to take a look at the code and discuss it a little bit.

The first thing that struck me was how short it was – at 721 lines, this would be a rather short source file for a single “simple” module or class, let alone a whole program! I suppose things do seem bigger when you’re a kid.

But also, I didn’t view it as one block of size-12 text on a high-resolution monitor. I read it in QBasic’s built-in code browser where it showed up as 14 different logically separate parts, at the time an overwhelming number:

And this is then what the subroutine would look like:

Code browsers are great, and this interface is a solid reminder that subroutines are a very early form of modules, especially given that in QBasic, these subroutines could contain their own sub-subroutines using the more traditional GOSUB command.

So let’s talk about this programming language and program that once people used to get real work (and real play) done.

First off, we see some mutable global variables, a big no-no by modern standards, but can you really blame them when their scope is no larger than that of a small modern class, where the fields would be effectively global within the context of an instance?

But also, to my pleasant surprise, there were also some global constants, and they are marked as such, with the CONST keyword. In fact, as we see in multiple places, QBasic is actually strongly typed, sometimes even using the sigils, for which the BASIC family is infamous.

The “B” in BASIC stands for “beginner,” and that is exactly the target audience QBasic was designed for. So it’s really refreshing that in the past they didn’t have this notion that types were too advanced for novices, or perhaps too tedious, that an easy-to-learn programming language wouldn’t have you declare them.

Or, of course, maybe duck-typing was seen as too difficult or inefficient to implement. But in that case, why did they have what I imagine would be an equally difficult compromise measure, alphabetically-based type defaulting?

To be fair, for a long time I had no idea what DEFINT even meant, but DEFINT A-Z certainly seemed like an appropriately mysterious and even badass way to start a subroutine, a magical invocation, covering the ends of the alphabet to start off each page of code.

Obviously, QBasic is not object oriented. Its fundamental notion of module isn’t a class, but rather a subroutine or function. These two notions were distinct: functions returned values (like in math) – though they could also have side effects – and subroutines did not. (Both had strongly-typed arguments, however).

This might seem an odd distinction to make, but it makes sense at a certain level. Especially syntactically, subroutine calls definitionally must be the top-level construct of a statement. And lo and behold! – they do not require parentheses around their arguments whereas functions do.

There’s really no reason not to do something like that in Rust, come to think of it. And come to think of it, Haskell makes a vaguely similar distinction, where if what others would call a “function” does IO and does not take arguments, it’s not a function at all, but a special value known as an “action,” which can then only be called in certain contexts.

So what did I do with this? I added more action keys. I added keys to speed up and slow down gameplay on command, so that if you pressed the arrow in the direction the snake was currently going, instead of doing nothing, it sped up the snake. Pressing the opposite direction of where you were going would then slow it down. And then, I wrote new levels, using the existing levels code as a baseline.

And then, after that, I began to attack the main subroutine’s main loop. I thought it would be cool if multiple numbers could be on the screen at the same time, but this required modifying how the locations of the numbers were stored, replacing the two variables indicating their current location with a two-dimensional array of boolean values (represented by 0 and -1 – integer/boolean distinctions were not yet well-established).

I wish I still had the code. But more importantly, I’m grateful that the Microsoft of the ’90s, as evil and monopolistic as it was, saw the need to include a programming language, a little IDE, and some sample programs with their operating system. Bill Gates was my hero when I was a small child – before I knew what anti-trust was – and the fact that Microsoft made sure that computers came with plenty of fun corridors for me to explore was a huge part of why.

But also, there was no particular reason why the stuff I was doing couldn’t be done by any other elementary schooler, if there were interest in the schools in teaching it. Variables in programming are far more concrete in their meaning than variables in algebra – for one thing, their values actually vary with time, which made me think the variables in algebra were a bit of a misnomer.

And yet, programming isn’t even a required course in most American high schools. And that, I think, is a real shame. I understand that most schools don’t have the resources to do a good job of it, and that also, honestly, is a real shame.

Warnings and Linter Errors: The Awkward Middle Children

2022-02-25T00:00:00+00:00

What is “bad” Rust?

When we say that a snippet of code is “bad” Rust, it’s ambiguous.

We might on the one hand mean that it is “invalid” Rust, like the following function (standing on its own in a module):

fn foo(bar: u32) -> u32 {
    bar + baz // but baz is never declared...
}

In this situation, a rule of the programming language has been violated. The compiler stops compiling and does not output a binary. In fact, it has to stop compiling, because this is not a Rust program. It might resemble one, but it in fact does not make any sense, because it is violating one of the extra-syntactic constraints that text has to have to be a Rust program.

What would it even mean, to access a variable that’s not declared? When you write a variable access, the compiler issues an access to the corresponding register or location in memory. When a variable is undeclared, no such location exists. The compiler couldn’t compile this code if it wanted to!

On the other hand, there’s this sort of “bad Rust” as well:

fn foo(bar: bool) -> &'static str {
    match bar == false {
        true => "false",
        false => "true",
    }
}

This code is – as the kids say – cringe. Whatever this code is trying to do, it should not be done this way. But for all its flaws, it’s definitely “good” Rust in a validity sense: the compiler knows exactly what to do to output a binary from it, and will do so with no complaints. Whatever is “bad” about this code – and it’s a lot – is bad from a human perspective only; the computer doesn’t even notice. It’s bad idiomatic Rust, not erroneous invalid Rust, and it’s bad because humans prefer not to structure their concepts this way.

So now we have a nice little dichotomy of problems with a Rust program. On the one hand, we have errors, where the compiler will not – cannot, even – produce an output. And on the other hand, we have idiomatic failures. It’s a nice neat tidy distinction that a lot of people make, but in the context of Rust – and with most programming languages – it’s actually problematic, because problems with programs, like gender or political views, don’t actually quite form a tidy binary. And as with gender and politics, oversimplifying types of “bad Rust” into a binary, even conceptually, can lead to practical problems.

I am, of course, talking about warnings and linter errors – those rules that if you violate them, it won’t necessarily cause the compiler to reject the program, but it may, depending on its settings. I’m also talking about things like safety rules, where if you dereference raw pointers the compiler will normally reject your program, but it can be told not to on a block-by-block basis.

Here’s an example of that, for Rust:

fn foo() -> u32 {
    return 3;
    println!("Not reached!");
}

The compiler knows that the println can’t be called, and it makes a point to tell the user about it:

warning: unreachable statement

But more on those later. For right now, we’ll continue to try and brush these warnings under the rug.

The Binary Error Model

I call the philosophical framework I am criticizing the “binary error” model, and before I start picking it apart and denouncing it, I’d like to spend some time explaining what I mean by it, and why it’s appealing.

So to talk about “the binary error model,” as I’ve termed it, we’ll start by talking about why it exists, what problem it’s trying to solve. It’s trying to distinguish between a notion of the programming language in itself, as a platonic ideal almost, versus the other things that surround it – like a reference implementation, or a set of community norms. What would belong in a formal specification, and what not? What would have to be the same for another compiler to also be a Rust compiler?

In the “binary error model,” Rust, or any programming language, is a set of valid programs and their semantics. You could look at it as being analogous to a Rust function with this signature:

fn rust_programming_language(program: SourceTree) -> Option<Semantics>;

SourceTree in this context is a directory hierarchy of properly organized Rust code at some level of organization, maybe a crate. Semantics is a little harder to define – it’s an abstract notion of what the program “does,” a representation of the platonic essence of what the program should output (meaning, in this context, any observable behavior) given a set of inputs (meaning, in this context, any information the program can observe).

So this definition is to say, the Rust programming language, in general, can be thought of, philosophically, as a function from source trees to specifications of concrete behavior. Since this isn’t an actual Rust function, we can handwave those specifications a bit, and discuss them in English or a formal model of our choice.

And this is a coherent way to talk about Rust, a philosophical abstraction with practical applications. For example, if we were comparing two Rust compilers, trying to find out if they implemented “the same programming languages,” we could use this model as our criterion.

So, to find out whether two Rust implementations both implement the same programming language, we use this function signature as our guide: Given the same source tree, do they output programs with the same semantics, the same concrete interaction with the outside world?

There are a lot of things that can be different between implementations:

Do the programs, as compiled by these two different implementations, print out the same values when given the same inputs?
Do the programs write the same data to the disk?
Do they panic in the same situations?
Do they have the same FFI characteristics to interact with a C library?
Do they have the same asymptotic complexity? (For a systems programming language, we definitely want to include this under “semantics”)
Do they have the same memory model for internal inter-thread interactions?
Do they make the same safety guarantees?
Do they accept and reject the same set of programs?
Do they print the same exact error messages?
Do they issue warnings on the same set of programs?
Are the two compilers invoked by the same command?
Is one of the compilers actually an interpreter?
Do they target the same processor architecture?
Do they output the exact same binaries?
Do they run with exactly equal performance?

Obviously, different implementations will differ in some of these ways. But we do need some way of defining whether two compilers both implement Rust, rather than one implementing Rust and one implementing Go, or one implementing Rust and the other one not quite succeeding at implementing Rust.

In the model, as we’ve defined it, the question comes down to whether accepted programs have the same semantics (but not form) and whether the set of accepted programs is the same. This means that, of the above questions, the ones after “do they accept and reject the same set of programs?” stop mattering. That is where the binary error model draws the line.

To apply this model, the relevant part of a compiler is that it implements something like this:

fn rust_compiler(program: SourceTree) -> Option<CompiledProgram>;

And then, you could compare two compiled programs based on their semantics.
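As a toy illustration of that comparison, here is a sketch in which a “compiler” is just a function from source text to an optional string of output. Everything in it is an assumption made for the sake of the example: the one-keyword “language,” the three compiler functions, and the idea that a String can stand in for semantics.

```rust
/// Toy stand-in for a compiler in the binary error model: source text in,
/// `Some(observable behavior)` out, or `None` for "rejected".
type ToyCompiler = fn(&str) -> Option<String>;

/// Accepts only programs of the form `print <text>`.
fn compiler_a(source: &str) -> Option<String> {
    source.strip_prefix("print ").map(|text| text.to_string())
}

/// Same toy language as `compiler_a`, implemented differently.
fn compiler_b(source: &str) -> Option<String> {
    let (keyword, text) = source.split_once(' ')?;
    if keyword == "print" {
        Some(text.to_string())
    } else {
        None
    }
}

/// Also accepts `echo`, so it accepts a larger set of programs.
fn compiler_c(source: &str) -> Option<String> {
    let (keyword, text) = source.split_once(' ')?;
    match keyword {
        "print" | "echo" => Some(text.to_string()),
        _ => None,
    }
}

/// True if both compilers accept the same programs from `corpus` and give
/// each accepted program the same behavior.
fn same_language_on(corpus: &[&str], x: ToyCompiler, y: ToyCompiler) -> bool {
    corpus.iter().all(|&source| x(source) == y(source))
}

fn main() {
    let corpus = ["print hello", "echo hello", "garbage"];
    println!("a vs b: {}", same_language_on(&corpus, compiler_a, compiler_b));
    println!("a vs c: {}", same_language_on(&corpus, compiler_a, compiler_c));
}
```

A finite corpus can only ever show that two implementations differ, never that they agree everywhere, but it captures the shape of the criterion: one disagreement about acceptance is enough to make them different languages.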

This model could also be useful for writing a formal specification of the Rust programming language (no, “the compiler itself” doesn’t count as a specification), and for programming languages that have a formal, written specification, it is couched in terms of something similar to this model – but not necessarily exactly.

Warnings and Errors

Let’s take another look at our abstract “function signature” for the Rust programming language:

fn rust_programming_language(program: SourceTree) -> Option<Semantics>;

We have so far been glossing over a feature of the return type, Option. But that is what makes this particular model the “binary error” model, and that’s what I’m going to be criticizing, so let’s discuss it now.

Some source trees are not Rust programs. Some are, in fact, Go programs, or directories full of plain text files, or random binary data. Some, on the other hand, are almost Rust programs, like the example from above:

fn foo(bar: u32) -> u32 {
    bar + baz // but baz is never declared...
}

This model treats all of these programs equally. From the perspective of this abstract function, these all return the same value, None. Which means, from the perspective of this philosophical perspective, all of these are the same: not a valid Rust program.

If we’re comparing two implementations of Rust, this model therefore considers these questions to be irrelevancies:

Do they generate the same error messages?
Are their error messages equally relevant to the problem?
Are their error messages equally comprehensible to a beginner programmer?

These things, however, are still relevant:

Do they reject the same source trees?

In fact, a single program accepted by one and not by the other would make these two compilers implementations of different programming languages.

And what about warnings? This abstract function signature barely has room for errors, flattening them all to None. The complexities of the ways in which a Rust program might be bad are simplified to a binary: it is or is not a valid Rust program. Warnings are rounded to “it is valid.”

So in the “binary error” model, where the “return value” of the abstract function for the programming language is just Option<Semantics>, this function falls into the “valid Rust” side of the binary:

fn foo() -> i32 {
    let Foo = 3;
    Foo
}

This is considered to be the case, even though the standard Rust compiler outputs a warning for it:

warning: variable `Foo` should have a snake case name
 --> test.rs:2:9
  |
2 |     let Foo = 3;
  |         ^^^ help: convert the identifier to snake case (notice the capitalization): `foo`
  |
  = note: `#[warn(non_snake_case)]` on by default

warning: 1 warning emitted

So what’s going on here?

Well, in point of fact, our compiler implementation does not implement Option<CompiledProgram> as its conceptual return value. Its contract looks more like this:

fn rust_compiler(program: SourceTree) ->
    (Result<CompiledProgram, Vec<ErrorMessage>>, Vec<Warning>);

But when we compare the compiler to other compilers in the “binary error” model, we pretend instead the compiler was wrapped in this wrapper:

fn rust_compiler_for_comparison(program: SourceTree) -> Option<CompiledProgram> {
    let res = rust_compiler(program);
    let (res, _) = res; // strip warnings
    res.ok() // flatten errors, did it compile or not?
}

In this model, only the parts that are part of our original rust_programming_language function truly are part of the Rust programming language. Only the rules that would cause every hypothetical compiler to reject the program are part of the programming language. This warning is “just the compiler’s opinion, man.”

It’s as if the compiler had two jobs: compiling the Rust programming language (defined as including a binary distinction between valid and invalid programs) and separately a linter, which tells you the compiler-writer’s opinions about what might be considered wrong with the code.

And this is a self-consistent way to think about Rust and about programming languages. It has practical applications: It gives you a definition of when two compilers implement the “same” programming language, and it allows you to define a formal specification for Rust – or to imagine an abstract formal specification, if you so choose, and use this notion to think about how your Rust code might fare under alternative implementations of the programming language.

Alternatives to the “Binary Error Model”

There is no coherent way to say that this way of thinking about Rust is wrong, per se. It is a philosophical perspective, a definition of what concepts (like type safety) are part of the “programming language” and the “programming language specification” (even if none has been written) and what concepts are not, what concepts (like using snake case) are just opinions and conventions outside of the scope of the programming language.

But on the other hand, we are not forced to assume this model. As it is a definition of what is part of the “programming language,” we are free to use a different operating definition. As it is a scope for what goes in the “programming language specification,” the Rust community is free to write a formal specification with different scope.

And I think we should, when that time comes, use a different scope. I think that when the people in charge of writing the spec come to it, they will use a different scope rather than strictly following the definitions explained here. Because even though the “binary error model” isn’t wrong, per se, I think it is, nevertheless, harmful.

I think not only that a formal Rust specification, if one is written, should not use this model, but also that people should not assume this model at all. I think it will lead to mistakes in your thinking. I also think that, if you do assume this model specifically in Rust, you have to do a lot more mental work that can be saved by adopting a different model.

So what’s the alternative? Well, our original definition of a programming language did two things. It determined if the program was valid (a binary up-down decision), and it mapped each valid program to its semantics.

An alternative model would not make validity so binary. If we do this in the most straight-forward way, we get something like this:

fn rust_programming_language(program: SourceTree) ->
    (Result<Semantics, Vec<Error>>, Vec<Warning>);

This loses a few of the nice properties that we had in the previous definition. “Valid Rust programs” is no longer a straight-forward set. Instead, we have a potential multiplicity of sets distinguished by this definition:

Programs that compile
Programs that compile without warnings
Programs that compile without a specific warning we may care about
Programs that don’t compile but only have one error
Programs that don’t compile but only have one category of error
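
To make that multiplicity concrete, here is a minimal sketch of those sets as predicates over the function's output. Every name in it (Semantics, CompileError, Warning, Outcome, and the predicate functions) is a hypothetical stand-in I am making up to match the pseudocode above; none of it is a real compiler API.

```rust
// Hypothetical stand-ins for the pseudocode's types; not a real compiler API.
#[derive(Debug, Clone, PartialEq)]
struct Semantics;

#[derive(Debug, Clone, PartialEq)]
struct CompileError {
    category: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
struct Warning {
    lint: &'static str,
}

// The output of the non-binary model: a result plus a list of warnings.
type Outcome = (Result<Semantics, Vec<CompileError>>, Vec<Warning>);

// Programs that compile.
fn compiles(o: &Outcome) -> bool {
    o.0.is_ok()
}

// Programs that compile without warnings.
fn compiles_without_warnings(o: &Outcome) -> bool {
    o.0.is_ok() && o.1.is_empty()
}

// Programs that compile without a specific warning we may care about.
fn compiles_without_lint(o: &Outcome, lint: &str) -> bool {
    o.0.is_ok() && o.1.iter().all(|w| w.lint != lint)
}

// Programs that don't compile but only have one error.
fn fails_with_one_error(o: &Outcome) -> bool {
    matches!(&o.0, Err(errors) if errors.len() == 1)
}

// Programs that don't compile but only have one category of error.
fn fails_with_one_category(o: &Outcome) -> bool {
    match &o.0 {
        Err(errors) => match errors.first() {
            Some(first) => errors.iter().all(|e| e.category == first.category),
            None => false,
        },
        Ok(_) => false,
    }
}

fn main() {
    let warned: Outcome = (Ok(Semantics), vec![Warning { lint: "non_snake_case" }]);
    let broken: Outcome = (
        Err(vec![
            CompileError { category: "lifetime" },
            CompileError { category: "lifetime" },
        ]),
        vec![],
    );
    println!("compiles: {}", compiles(&warned));
    println!("compiles without warnings: {}", compiles_without_warnings(&warned));
    println!("no dead_code warning: {}", compiles_without_lint(&warned, "dead_code"));
    println!("one error: {}", fails_with_one_error(&broken));
    println!("one category: {}", fails_with_one_category(&broken));
}
```

Each predicate picks out a different set of source trees, which is exactly what the binary model cannot express.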

Also, this definition imposes more on the writers of alternative implementations. Suddenly, a compiler is only a valid Rust compiler if it outputs the exact same list of errors and warnings, given an input program.

This seems to me a little too strict. I don’t think the exact wording of an error or warning should necessarily matter or be part of a programming language spec. And compilers regularly stop compiling after experiencing too many errors (where too many can sometimes be one), and implementations would reasonably differ about which errors they would output before giving up.

But I think it’s a good starting point, and in any case much better than the binary-error Option<Semantics> model. Part of the benefit of Rust as a programming language is how much work has gone into its warnings. For an alternative implementation to claim to be Rust without having the same warning system would strike me as extremely misleading. Warnings – obligatory warnings – should be included in any language spec.

Many important Rust safety features are actually warnings. Ignoring a #[must_use] value is technically a warning – a lint (unused_must_use) that can be raised to an error with #[deny]. A function that has dead code after a return statement: this is a warning, but also a serious correctness issue.
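
As a small illustration of the #[must_use] case, here is a sketch with a made-up function (checked_div is hypothetical, not from any library). The attribute itself and the unused_must_use lint are real; the rest is just scaffolding.

```rust
// `checked_div` is a hypothetical example function.
#[must_use = "the caller should check whether the division succeeded"]
fn checked_div(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        None
    } else {
        Some(a / b)
    }
}

fn main() {
    // A bare `checked_div(10, 0);` statement here would be reported under
    // the `unused_must_use` lint, because the result is silently dropped.
    match checked_div(10, 0) {
        Some(q) => println!("quotient: {q}"),
        None => println!("division by zero was caught"),
    }

    // An explicit discard states intent and is not reported.
    let _ = checked_div(10, 2);
}
```

The program means the same thing either way; the lint exists only to catch the programmer forgetting to look at the result.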

Rust Warnings are Complicated

And of course any Rust implementation would have to include warnings. Just as C (in practice) has a #warning directive, which causes the compiler to issue warnings, Rust has a number of annotations that control the issuance of warnings.

For example, if we add an annotation to our function from before:

#[deny(non_snake_case)]
fn foo() -> i32 {
    let Foo = 3;
    Foo
}

… the warning becomes an error:

error: variable `Foo` should have a snake case name
 --> test.rs:3:9
  |
3 |     let Foo = 3;
  |         ^^^ help: convert the identifier to snake case (notice the capitalization): `foo`
  |
note: the lint level is defined here
 --> test.rs:1:8
  |
1 | #[deny(non_snake_case)]
  |        ^^^^^^^^^^^^^^

error: aborting due to previous error

Any Rust specification, even one with the binary error model, would therefore have to include:

The rules about snake case (variables should have snake case)
The rules about annotations (so that #[deny(...)] triggers an error)

This means that, even if we did imagine a specification where only errors were in scope, rules for warnings would have to also be in that specification, because they can be configured to become errors. And at that point, why not also specify in the specification that the warnings are obligatory?

Especially because we can also say #[warn(...)] as a tolerance level for these configurable rules. What do we say about #[warn(...)] in the spec if warnings are out of scope?
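
Here is a sketch of how these levels nest, which is part of why I think they are hard to leave out of a spec. The module and function names are made up for illustration; the lint name and the attributes are the real ones used above.

```rust
// Everything inside this module treats non-snake-case names as errors...
#[deny(non_snake_case)]
mod strict {
    pub fn snake() -> i32 {
        let foo = 3;
        foo
    }

    // ...except this one function, which lowers the level again.
    // Replacing `allow` with `warn` here would still compile,
    // but would print the warning.
    #[allow(non_snake_case)]
    pub fn exempt() -> i32 {
        let Foo = 4;
        Foo
    }
}

fn main() {
    println!("{}", strict::snake() + strict::exempt());
}
```

To decide whether this file compiles, an implementation has to know the snake case rule, the three levels, and how an inner annotation overrides an outer one.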

The Other Side of the Binary

Now that I’ve criticized the “binary error” model from the warnings side, I also want to address the notion that all errors are created equal. Errors are different from each other.

First off, there’s an obvious distinction between syntax errors and semantic errors. This is kind of boring and obvious, but it adds some nuance into the idea that invalid Rust is simply “not Rust,” and it comes up in practice sometimes.

As I write my code, I sometimes run cargo fmt as part of my editing workflow. Usually, this helps me read my own code better for further editing, and usually, this works even if my code is full of errors – it might even help me find and understand the errors. But sometimes, my code has a relatively superficial, syntactic error, like a missing }, and cargo fmt can’t even help me. This sends me into a little bit of a panic, but I’m usually also glad I didn’t keep working longer with such a problem.

If a Rust specification wanted to include formatting tools in its scope, it could conceivably make a formal distinction between syntax and semantic errors.

More interesting, however, are errors that don’t have to be errors, where the compiler could keep compiling, but it chooses not to.

We have the obvious example, where the error is configurable, where it’s actually a warning that’s just been set to #[deny(...)] as a lint level.

But we also have things like lifetime errors, which cannot be disabled. Or the rule against dereferencing a pointer outside an unsafe block. The Rust compiler could, if it wanted to, simply allow those things. We could do something like:

#[unsafe_allow(lifetime_mismatches)]

The compiler would then output a program, which would then exhibit undefined behavior – or not. It would then be potentially unsound – or not.

This is not included in Rust, but it’s theoretically possible, unlike referring to a variable that doesn’t exist, where there is no reasonable interpretation of what the code should do.

On the border are things that C++ allows, but that are arguably as nonsensical as referring to a variable that doesn’t exist. If a function returns u32, and you reach the end of the function without returning anything, that’s nonsensical, right?

fn foo() -> u32 { }

But depending on the ABI, you can just not output the code that sets the return value, and perhaps not even output the code that returns from the function. This is definitely undefined behavior, but C++ will often allow it, sometimes without even a warning.

Unsafety as Always-On Warnings

As an aside, the #[allow] and #[deny] annotations are very similar to how unsafe works. We could imagine an alternative world where there was no unsafe keyword for blocks. Instead of writing:

unsafe { *ptr }

… we could instead imagine a Rust where this is written as:

#[allow(unsafe)]
*ptr

Basically, operations like dereferencing a raw pointer (*ptr) are disallowed in Rust by default, but can be allowed. They are disallowed because, like code that triggers a warning, they are indications that the programmer likely made a mistake. But as with code that triggers a warning, the programmer can make explicit that they are using the construct on purpose.

The fact that unsafe/safety, one of Rust’s core features, works in a way very similar to warnings should make us take seriously the importance of warnings. It would have been just as valid from a safety point of view to use literally the same mechanism with #[allow] and #[deny], but I think safety is such an important category of possible mistakes that it’s probably for the best that it has its own special syntax.
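
The two mechanisms can even be layered in today's Rust: rustc has an allow-by-default lint named unsafe_code that reports any use of unsafe, and it takes the same level annotations as other lints. A sketch, with made-up module and function names:

```rust
// Inside this module, any use of `unsafe` is a compile error...
#[deny(unsafe_code)]
mod mostly_safe {
    pub fn double(x: i32) -> i32 {
        x * 2
    }

    // ...unless a function explicitly opts back in, just as with any lint.
    #[allow(unsafe_code)]
    pub fn read_raw(value: &i32) -> i32 {
        let ptr: *const i32 = value;
        // SAFETY: `ptr` was just derived from a live shared reference.
        unsafe { *ptr }
    }
}

fn main() {
    let x = 21;
    println!("{}", mostly_safe::double(mostly_safe::read_raw(&x)));
}
```

The unsafe block is still required; the lint just adds a second, configurable layer of "did you really mean this?" on top of it.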

Take-Aways

So why am I writing all of this, besides thinking it’s all an interesting mental exercise?

I don’t think the authors of any future Rust spec would actually err in such a way as to not discuss warnings at all. But I think it’s important to understand the theoretical implications.

But I also know that people do think in terms of the hypothetical Rust specification which only accepts or rejects programs. I recently saw someone write that capitalization conventions, such as snake case, are not part of the Rust programming language. They meant that according to the “binary error” model that we discussed above, which they implicitly subscribed to, using snake case or not will never change whether your program is a valid Rust program, and therefore, the entire convention is not part of Rust.

But even if we ignore the fact that a Rust compiler needs to know about this convention in case #[deny] is used, this assumes a definition of Rust programming language and Rust specification that uses the “binary error” model.

And while that is one way to think about Rust, it’s not a very good one, and I would say it’s not a very useful one. And more fundamentally, you don’t have to. You don’t have to use this philosophical framework where only rules that cause compilation failures are part of the programming language.

So I don’t think it’s fair to say “in Haskell, variable name case is significant, and in Rust, it is not, it is only a convention and not part of the programming language.” I think it’s more fair to say “in Haskell, case conventions are mandatory, violations are errors, and they are used to disambiguate the syntax. In Rust, they are non-fatal warnings by default, and the compiler can still process Rust with incorrect case, and in some situations has to.” Or, more simply, “in Haskell, case convention violations are errors, but in Rust, they’re just warnings.” But, in both Haskell and Rust, capitalization conventions are part of the programming language. In both, the compiler has to know about them, and enforces them in at least some situations.

This may seem like a nitpick, but I think drawing a line between “programming language” and “convention” can make the “convention” stuff seem less important than it should be. I think that if you think that way, and were writing an alternative implementation of Rust, you might give yourself permission to not care about the warnings. You might be less likely to add a policy to use -Werror (spelled -D warnings for rustc), or require clippy to pass in your CI.

If someone with that attitude were writing the language spec – which I don’t think they would be, but if they were – they might underspecify the things that make Rust the useful tool that it is. Programming is about contracts, and as far as I’m concerned, the warnings are part of the Rust compiler’s contract. And a compiler should not be allowed to call itself a Rust compiler if it doesn’t follow it.

Snake case for variables is part of the Rust programming language, as I define the Rust programming language, and – I think – as most of the community defines it. Certainly it is part of “the Rust programming language” as that phrase is used in common parlance, and it is one of many features that make Rust special. If there is to be a specification, it should be part of the Rust specification. I understand that if you use the “binary error” model to define what a programming language is, and what a programming specification should be, you don’t get this result. But I just don’t think you should be using that model, and I think it does matter whether you do, even though it is a philosophical perspective that cannot be disproven.

Of course, Rust will probably not ever have a single monolithic specification mediated by ISO or an equivalent. It will certainly continue to be a community organization, with many different standards and specifications, perhaps one for a compiler with basic features, another for a compiler that fully supports errors, another for formatters like cargo fmt. Each of these specifications will delineate different sets of source trees: source trees with the syntax of Rust, source trees without errors, source trees without warnings, etc.

Just like the notion of a “programming language” doesn’t have to be a single set, a single binary between valid and invalid, the notion of a specification also needn’t be so monolithic.

Haskell Error Messages: Come on!

2022-02-16T00:00:00+00:00

I am a big fan of strongly typed languages, and my favorite GC’d language is Haskell. And I want you, the reader, to keep that in mind today. What I am writing is some commentary about a language I deeply love, some loving criticism.

So here’s what happened: A few days ago, I was showing off some Haskell for a friend who primarily programs in Python. The stakes were high – could I demonstrate that this strange language was worth some investigation?

My primary focus was on infinite lists, and defining Fibonacci as a recursive data structure – all fun things to show off Haskell’s laziness. But at some point, we wrote an expression by accident that had a type error in it, and so we got to see how the compiler treated such things. I don’t remember the exact expression – it was deep in context – but the problem was I was trying to add an integer to a list. Something analogous to 1+[2,3].

Now, in some “weakly typed” languages, this sort of thing is actually allowed, as a colleague of mine recently pointed out:

[jim@palatinate:~]$ node
> 1+[2,3]
'12,3'

This is, of course, hilarious. But! We shouldn’t paint “weakly typed” languages with such a broad brush. In my friend’s native Python, it would have been an error, as it should be. It is a run-time error, but what does that matter when you’re working in an interpreted language, writing ad hoc scripts? The important thing is that failure is recognized as failure, and it doesn’t try to continue with nonsense:

[jim@palatinate:~]$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1+[2,3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'

This is an error message. It’s even a pretty decent error message. There are many things you can pass to the + operator in Python, but an int and a list together are not among them.

So now, what did Haskell do, this language that I’m trying to show off? Well, unfortunately, my friend didn’t see the actual problem in the code, but was first made aware of it from the compiler’s error message. And if you’ve ever done this before in Haskell, you’re probably wincing right now, because you know what this error message is:

[jim@palatinate:~]$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude> 1+[2,3]

<interactive>:1:1: error:
    • Non type-variable argument in the constraint: Num [a]
      (Use FlexibleContexts to permit this)
    • When checking the inferred type
        it :: forall a. (Num a, Num [a]) => [a]

Now, my friend didn’t understand this error message at all. Since I was in Demonstration Mode, my instinct was to explain it to him, but after a few false starts, I realized that this would simply not help, and pointed out that you couldn’t add integers to lists, and showed him where this was happening (it was a little more subtle than this example).

But since then, my colleagues and I have been discussing error messages in Slack, specifically how good Rust’s error messages are, and how much better they are than Haskell’s. So I had an opportunity to paste that very bad Haskell error message my friend and I discovered into the Slack. There, it served as a case study, so we could discuss how problematically incomprehensible it is, sparking a lot of discussion, from which I shall try to extract the most interesting parts into this post.

For one, this error message has little to do with the concrete problem. The problem is – and the error message should say this – that you can’t add lists. Specifically, in Haskell, you can only add things that implement the Num typeclass (which lists don’t), and so you’d think the compiler would be smart enough to mention somewhere in this error message something along the lines of “expecting [a] to have Num instance, but it does not.” That’s the actual problem, even if not well-explained.

But instead, ghc tries to assume you meant what you wrote, and figure out a way in which [a] can have the Num instance. This is where it fails, and then it gives advice on how to make that succeed. As my professor-colleague points out, this is dangerous advice, especially for beginners, because there’s no way that using FlexibleContexts will actually help in that situation. The problem isn’t that these lists aren’t numbers in particular, and that you need to only accept lists that are numbers in your function. The problem is that no lists are (or at least should be) numbers! But a beginner might just follow the advice, try to figure out what the hell FlexibleContexts are, and find themselves in a world of pain, and no closer to solving the actual problem.

Part of what causes this is the type of 1 itself. Haskell, unlike Rust, allows literals like 1 to be interpreted in any number type. Given that Haskell (like Rust) has return-type polymorphism, it can directly express this in the type system:

Prelude> :type 1
1 :: Num p => p

In Rust, this would be something like impl Num. It means that 1 can be any type that is Num. Combine that with the fact that + requires its arguments to be Num and to match ((+) :: Num a => a -> a -> a), and when we see 1+[2,3], we’re simply left trying to figure out how [2,3] is Num.
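
For readers coming from the Rust side, here is a rough sketch of the same idea. Rust's standard library has no Num trait, so the One trait below is a hypothetical stand-in for the part of Num that gives a literal its meaning; one_plus is likewise made up for illustration.

```rust
use std::ops::Add;

// Hypothetical stand-in for the part of Haskell's `Num` that lets the
// literal `1` take on whatever type the context asks for.
trait One {
    fn one() -> Self;
}

impl One for i32 {
    fn one() -> Self {
        1
    }
}

impl One for f64 {
    fn one() -> Self {
        1.0
    }
}

// Roughly `\x -> 1 + x`: the "literal" is chosen by return-type polymorphism,
// and `+` forces both operands to be the same type `T`.
fn one_plus<T: One + Add<Output = T>>(x: T) -> T {
    T::one() + x
}

fn main() {
    println!("{}", one_plus(2_i32));
    println!("{}", one_plus(0.5_f64));
}
```

Calling one_plus with a Vec fails to compile because Vec implements neither One nor Add, which is the Rust analogue of the compiler being left to figure out how [2,3] is Num.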

If we did not have this polymorphic literal, this notion that the meaning of 1 is flexible, we would have seen a much more comprehensible error message. If 1 meant the same thing as (1::Integer) (or any arbitrary choice), we’d have this beautiful explanation:

Prelude> (1::Integer) + [2,3]

<interactive>:4:16: error:
    • Couldn't match expected type ‘Integer’
                  with actual type ‘[Integer]’
    • In the second argument of ‘(+)’, namely ‘[2, 3]’
      In the expression: (1 :: Integer) + [2, 3]
      In an equation for ‘it’: it = (1 :: Integer) + [2, 3]

Or even if we just had non-numbers on both sides, we’d similarly have a better error message:

[jim@palatinate:~]$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude> () + [1,2]

<interactive>:1:6: error:
    • Couldn't match expected type ‘()’ with actual type ‘[Integer]’
    • In the second argument of ‘(+)’, namely ‘[1, 2]’
      In the expression: () + [1, 2]
      In an equation for ‘it’: it = () + [1, 2]
Prelude>

What is my take-away here? I don’t think the compiler has been sufficiently tweaked when it comes to error messages, or that the Haskell community cares sufficiently about beginners. Rust as a community puts a lot of energy into good error messages, so that even though Rust also has a trait you could add to arrays to make + work, it still has a better error message:

error[E0277]: cannot add `[{integer}; 2]` to `{integer}`
 --> test.rs:2:7
  |
2 |     1 + [2,3];
  |       ^ no implementation for `{integer} + [{integer}; 2]`
  |
  = help: the trait `Add<[{integer}; 2]>` is not implemented for `{integer}`

But I also think the semantics of 1 are too liberal, leaving the compiler in an awkward place. See, the weird thing is, you can declare [2,3] a number, making 1+[2,3] an expression that adds two lists:

instance Num [a] where
    (+) = (<>)
    (-) = (<>) -- Eh, why not?
    (*) = (<>)
    negate = reverse
    abs = id
    signum = const []
    fromInteger i = take (fromInteger i) $ repeat undefined

main = do
    print $ signum $ 1 + [2,3]

Once you’ve defined lists as a number, 1 is suddenly a list if it wants to be. And this contributes to the difficulty of finding the right error message: what you asked for is possible after all.

And in the end, this leaves me with the feeling that Haskell has this in common with Javascript, and that makes me sad. A polymorphic enough strongly typed language is no longer strongly typed.

Being Fair about Memory Safety and Performance

2022-01-20T00:00:00+00:00

For this next iteration in my series comparing Rust to C++, I want to talk about something I’ve been avoiding so far: memory safety. I’ve been avoiding this topic so far because I think it is the most discussed difference between C++ and Rust, and therefore I felt I’d have relatively little to add to the conversation. I’ve also been avoiding it because I wanted to draw attention to all the other little ways in which Rust is a better-designed programming language, to say that even if you concede to the C++ people that Rust isn’t “truly memory safe” or “memory safe enough,” Rust still wins.

Array Indexing

But there is a persistent and persnickety little argument that I wanted to talk specifically about. This argument is really persuasive on its face, and so I think it deserves some attention – especially since I am guilty of having used this argument myself, many years ago when I still worked at an HFT firm, to claim that C++ had a niche that Rust wasn’t ready for. I’ve also seen it a few times in a row in the wild, and it’s made me so emotional that I simply had to write this, and as a result, it’s a little more emotional than some of the other posts.

In this argument, array indexing stands in for a number of little features. But – I’ve seen array indexing cited so often as a canonical example that I feel compelled to address it directly!

The argument goes like this: In Rust, array accesses are checked. Every time you write arr[i], there is an extra prepended if i >= arr.len() { panic!(..) }. As you can see, that is more code, and worse, a run-time check. And while the optimizer might eliminate it, or the branch predictor may well predict it right every time, the extra code bloat and possible run-time check are just unacceptable in [insert field here (I used HFT)], where every nanosecond matters. And until some acceptable solution is found to this, I just don’t see Rust making it in [insert field].
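
Spelled out, the expansion the argument has in mind looks roughly like this. It is a conceptual sketch with a made-up function name, not what the compiler literally emits:

```rust
// Conceptual expansion of `arr[i]`: a bounds check, then the raw access.
fn index_with_explicit_check(arr: &[u32], i: usize) -> u32 {
    if i >= arr.len() {
        panic!(
            "index out of bounds: the len is {} but the index is {}",
            arr.len(),
            i
        );
    }
    // SAFETY: the check above guarantees `i < arr.len()`.
    unsafe { *arr.get_unchecked(i) }
}

fn main() {
    let arr = [10, 20, 30];
    println!("{}", index_with_explicit_check(&arr, 1));

    // An out-of-range index panics instead of reading past the end.
    let outcome = std::panic::catch_unwind(|| index_with_explicit_check(&[10, 20, 30], 3));
    println!("out-of-bounds access panicked: {}", outcome.is_err());
}
```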

When I made this argument, to a group of programming-language academics, the defenders of Rust countered with a number of points, all of which accepted the basic premise:

Do I really need those extra nanoseconds? Yes.
Is it really too much of a price to pay for all that extra safety? Yes.
Do I really distrust the optimizer that much? Yes. If only Rust had a way to do optimizer assertions, a way to statically verify that the panic had been optimized out.
Would dependent typing on integer values help? Yes. That sounds very promising. I think Rust will get there someday, but for right now we must use C++.

Now that I know more about Rust I’m happy to tell you that I was completely off base. I wasn’t off base about the performance considerations, or the unacceptability of even the slightest risk of a run-time check. I was off base about an even more basic premise: that Rust uses checked array indexing, whereas C++ uses unchecked array indexing.

But wait! Isn’t that the whole point? Doesn’t C++ avoid checking everything, to make sure all abstractions are zero-cost, to be blazing fast? Doesn’t Rust, while trying for performance, in the end always concede to the demands of safety?

Well, let’s look at the APIs in question. C++ apologists are always saying to use the modern C++ features from C++11 and later, rather than the more C-like “old style” C++ features, so on the C++ side let’s take a look at the documentation for std::array, introduced in C++11.

Here we see two indexing methods. The first one, at, is bounds checked and will throw an exception if the index is out of bounds, whereas the second one, operator[], is not, and will instead exhibit undefined behavior of a very difficult-to-debug nature. It looks like C++ actually believes in free choice here, leaving the choice of method up to the user. Not quite what we supposed, but the important part is that unchecked indexing is available, so the argument can still stand for now.

Now let’s look at Rust. Rust arrays and vectors can also be used with methods from slice, as can slices, so the slice documentation is the best place to look. And looking there, we immediately see – drum roll please – 4 methods. We see get and get_mut, which are checked, and right underneath them, in alphabetical order, get_unchecked and get_unchecked_mut, which are not.

To review, where do Rust and C++, these programming languages with their vastly different philosophies, Rust for the cautious, C++ for the fast and bold, stand? In the exact same place. Both programming languages have both checked and unchecked indexing.

Let me say that again. This is the talking point form, what to say if you need something quick to say, if you’re ever debating programming languages on a political-style talk show (or at a party or even a job interview):

In both Rust and C++, there is a method for checked array indexing, and a method for unchecked array indexing. The languages actually agree on this issue. They only disagree about which version gets to be spelled with brackets.
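
On the Rust side, the options look like this side by side. The slice methods get and get_unchecked are the real ones from the standard library; the wrapper function names are made up for the sketch.

```rust
// Checked, reports failure as a value: the counterpart of a non-throwing `at`.
fn checked(arr: &[i32], i: usize) -> Option<i32> {
    arr.get(i).copied()
}

// Checked, panics on failure: this is the one spelled with brackets.
fn bracketed(arr: &[i32], i: usize) -> i32 {
    arr[i]
}

// Unchecked: the counterpart of C++'s `operator[]`.
fn second_unchecked(arr: &[i32; 3]) -> i32 {
    // SAFETY: the array type guarantees three elements, so index 1 exists.
    unsafe { *arr.get_unchecked(1) }
}

fn main() {
    let arr = [10, 20, 30];
    println!("{:?}", checked(&arr, 1));
    println!("{:?}", checked(&arr, 3));
    println!("{}", bracketed(&arr, 1));
    println!("{}", second_unchecked(&arr));
}
```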

The difference is simply in the default, which one gets that old fashioned arr[index] syntax. And even that can be changed. Even if the C++ default were superior – and, as I will argue later, it is not – this is surely a minor issue. After all, don’t we normally use our fancy for x in arr syntax in Rust? This issue is just so small as to be unlikely to be a deciding factor in what programming language is better, even if we’re in a special application domain where every nanosecond matters.

The Unsafe Keyword

So that’s a wrap folks. We can all go home, and none of us will ever see this extremely silly argument on the Internet or in person again. It’s just a misunderstanding, the person making it was simply misinformed, and all it will take is a link to this blog post – or the relevant method in the docs – to set them straight.

But wait! The C++ apologists are still talking! What are they saying? How have they not been completely flummoxed? They’re pointing at that method, chanting a word like a slogan at a protest march. I can’t quite make it out – what is it?

Oh. They’re chanting unsafe. And credit where credit is due: it’s very difficult to chant in a monospace font.

Well, that is easy to respond to! The nerve, that C++ programmers would call our unchecked array indexing method unsafe. For one, all unchecked array indexing methods are unsafe: that’s what unchecked means. If it were safe, it would be at least statically checked. For another, isn’t this the pot calling the kettle black? Isn’t C++ all about unsafety, so much that C++ programmers don’t even mark their unsafe code regions because it all is, or their unsafe functions because they all are?

“But isn’t that the whole point of Rust?” they cry. “If you have to use unsafe to write good Rust, then Rust isn’t a safe language after all! It’s a cute effort, but it’s failing at its purpose! Might as well use C++ like a Real Programmer!”

This, my friends, is a straw man. No, the point of Rust and specifically Rust’s memory safety features is not to create an entirely safe programming language that can’t be circumvented in any circumstance; you must be thinking of Sing#, the programming language for Microsoft’s defunct research OS.

Let me be abundantly clear: The point of memory safety, the unsafe keyword, and friends in Rust is not to completely enforce memory safety, to make it impossible for the programmer to do anything they want to with the computer, even if they can’t prove to the compiler that it’s OK. In fact, the point of memory safety isn’t to make it impossible to do anything at all – it’s to make it possible to reason about the program.

The premise of Rust is that the vast majority of code in a systems program doesn’t need to be unsafe, and so it might as well be safe. People used to believe that you needed garbage collection for safety, but Rust proved that you could use lifetimes to still get safety without that performance cost. Now that we’re there, why worry about null pointers? Why not tell the compiler which things can be null, and which things can’t, so the compiler can check for you whether you’re handling nulls correctly? I’ve programmed C++ professionally for years without such a feature. You’d better believe I would have totally annotated the crap out of the code so the compiler could’ve caught them ahead of time.
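
In Rust that annotation is just the type. A sketch, with a made-up Order struct and field names, of what "telling the compiler which things can be null" looks like:

```rust
// Hypothetical example type: one field may be absent, the other may not.
struct Order {
    symbol: String,             // can never be "null"
    client_tag: Option<String>, // may be absent, and the type says so
}

// The compiler refuses to let us read `client_tag` without handling `None`.
fn tag_or_default(order: &Order) -> &str {
    match &order.client_tag {
        Some(tag) => tag.as_str(),
        None => "untagged",
    }
}

fn main() {
    let tagged = Order {
        symbol: "XYZ".to_string(),
        client_tag: Some("desk-4".to_string()),
    };
    let untagged = Order {
        symbol: "XYZ".to_string(),
        client_tag: None,
    };
    println!("{} {}", tagged.symbol, tag_or_default(&tagged));
    println!("{} {}", untagged.symbol, tag_or_default(&untagged));
}
```

Leaving out the None arm is a compile error, which is the check I would have wanted in those C++ codebases.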

Sometimes, C++ apologists cite valgrind. I’ve had codebases where I tried to use valgrind. Unfortunately, there was so much undefined behavior and memory leaks already caked into this project that new ones were simply impossible to see among all the noise. An army of junior engineers was at some point required to clean this up when finally the hierarchy decided that “valgrind” was something we might want to be able to use in the future.

And a lot of those undefined behaviors were ticking time bombs. Certainly, this codebase had its issues. A friend of mine took days to find a bug where a pointer had a value of 7. I don’t mean 7 elements into some array, not 7 of the relatively wide pointer type, not a convenient, testable-for NULL, value. No, none of that: The pointer’s value was exactly 0x7.

Update: My friend had a very similar incident to that described in this piece, but it was not the same incident. Some time after, I read that piece and shared it with this friend … and I must have conflated the numbers from the piece and from what happened to my friend. It was some null-page number, some “low integer,” however, even if not 0x7.

I’ve had memory corruption issues where I pored over every line of code that I wrote, over and over again, finding nothing. Ultimately, I learned that the issue was in framework code – code written by my boss’s boss. The code was untested, and written extremely poorly, and had rotted, so that it didn’t work at all. In Rust, I might have had some idea that my code – which in Rust would have all been able to be “safe” – couldn’t possibly be the source of the problem. Maybe my humble assumption that my code was to blame would be a little less tenable.

If I wanted a language that was always safe, at the time I knew Java or Python existed. Some companies even do finance in Java, for exactly that reason. But sometimes you still need that extra bit of performance. unsafe is sometimes necessary.

But given what gains safe Rust has made in predictable performance, it’s not as necessary as it used to be. The majority of the code I wrote then could’ve been written in safe Rust, and not lost a single clock cycle. The parts that needed to be unsafe could have been isolated, delegated to specific sections, wrapped in abstract data types, perhaps entrusted to a specific team.
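
As a sketch of what that isolation can look like, here is a made-up ring buffer (the Ring type and its methods are hypothetical, not from any library) that uses unchecked indexing internally but exposes only safe methods. The invariant that makes the unsafe blocks sound lives in one place, the constructor.

```rust
// Hypothetical example: a fixed-size ring whose capacity is a power of two,
// so any index can be wrapped into range with a single bit mask.
pub struct Ring {
    slots: Vec<u64>,
    mask: usize,
}

impl Ring {
    pub fn new(capacity: usize) -> Self {
        // This check is the invariant every unsafe block below relies on.
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        Ring {
            slots: vec![0; capacity],
            mask: capacity - 1,
        }
    }

    pub fn set(&mut self, i: usize, value: u64) {
        let idx = i & self.mask;
        // SAFETY: `idx <= mask`, and `mask == slots.len() - 1`.
        unsafe {
            *self.slots.get_unchecked_mut(idx) = value;
        }
    }

    pub fn get(&self, i: usize) -> u64 {
        let idx = i & self.mask;
        // SAFETY: `idx <= mask`, and `mask == slots.len() - 1`.
        unsafe { *self.slots.get_unchecked(idx) }
    }
}

fn main() {
    let mut ring = Ring::new(8);
    ring.set(10, 42); // 10 & 7 == 2, so this writes slot 2
    println!("{}", ring.get(2));
}
```

Callers never write unsafe themselves, so if memory is ever corrupted, these two blocks and the constructor are the places to start looking.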

And even then, I’m sure we would have been debugging memory corruption issues. But we’d know where to look. We’d know where to throw the tests. And we’d have saved programmer-years of time, days if not months of my life.

Now, I’m proud of my C++ skills. There is some part of me that wishes that C++ was better than Rust, that all that time getting better at debugging memory corruption wasn’t dedicated to a skill that is becoming obsolescent through better technology. And to be honest, that’s part of why I dismissed Rust as a candidate for HFT programming languages.

But it’s possible to be proud of a skill that is also becoming obsolete. And I am trying to replace it with a new skill to be proud of – writing Rust as performant as idiomatic C++, or even more performant, while reaching for the unsafe keyword rarely and modularly. I think it’s truly possible, for where it’s relevant.

Now I must turn to a subset of C++ apologists, who write using “modern C++” which is “very safe now” and experience therefore no memory corruption issues. To them I say, you are not doing high performance programming. If you were, you’d have to do some wonky things with pointers to spell the bespoke high-performance constructs you’d need.

There is indeed a safe subset of C++ heavy with modern features. If you are disciplined and keep your programming in that realm, you can avoid memory corruption mostly. But first, this safe subset covers fewer high-performance features than Rust. I’ve read some of this code and its idioms: It’s full of shared_ptrs not to share ownership but simply to avoid types that might be invalidated. It ironically leans on reference counting more than idiomatic Rust. This is among other, similar problems.

Let me be clear: First off, instead of keeping in your brain which features are “modern” and which are “edgy,” why not have a distinction where it’s well-marked? Second off, if you are writing entirely in this safe subset of C++, you can get much better performance instead out of the safe subset of Rust. You have no right to complain about Rust’s safety trade-offs, as you’re using a worse set, where you get no safety promises from the compiler and none of Rust’s surprising safe performance.

Rust’s safe and “slow” subset is faster than C++’s while still being, obviously, safer. Rust’s unsafe subset is better factored and better distinguished. Comparing apples to apples, Rust is a better programming language for extracting performance out of LLVM, because you’ll be able to code more often without fear, and with very focussed fear when you do feel it.

A tool is even more useful if you can adjust it. The defenders of C++ talk about choosing trade-offs, but really, Rust offers both trade-offs. Mark your code as unsafe and convince yourself of its safety manually, or rely on programming language features. It’s up to you, on a function-by-function, even block-by-block, basis. In C++, if you have a problem, every line of code is suspect; you simply can’t opt in to safety, but in Rust, for where you don’t need the performance of unchecked indexing and other unsafe features, you can relax about the possibility of going bankrupt due to inadvertent memory reinterpretation – and how I wish my NDA permitted me to talk about consequences at my own previous jobs!

And for where you do need to use unsafe, you can make sure your debugging and overthinking efforts are well-directed, for the few places in a large project you need it.

Unchecked Indices

This has gotten a little far from the original question. Should array indices be checked? Well, let me be clear about two facts that are both true, but in tension with each other:

Unchecked array indexing is sometimes absolutely necessary.
Unchecked array indexing is an edge-case feature, which you normally don’t want.

If unchecked array indexing were unavailable in Rust, that would be a bug. What is not a bug is making it inconvenient. C++ programmers probably should be using at instead of operator[] more often. But in C++, what would it gain? There are so many unsafe features, what’s the cost of one more?

But in Rust, where so much code can be written that’s completely safe, defaulting to the safe version makes more sense. Lack of safety is a cost too, and Rust makes that cost explicit. Isn’t that the goal of C++, making costs explicit?
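To make the “explicit cost” concrete, here is a minimal sketch of the three ways to read an element of a slice in Rust. The function names (checked, checked_opt, unchecked) are made up for illustration; the slice methods get and get_unchecked are from the standard library.

```rust
/// Checked indexing: panics if `i` is out of bounds.
fn checked(xs: &[u32], i: usize) -> u32 {
    xs[i]
}

/// Checked indexing that reports failure as a value instead of panicking.
fn checked_opt(xs: &[u32], i: usize) -> Option<u32> {
    xs.get(i).copied()
}

/// Unchecked indexing: no bounds check at all.
///
/// # Safety
/// The caller must guarantee `i < xs.len()`; otherwise this is undefined behavior.
unsafe fn unchecked(xs: &[u32], i: usize) -> u32 {
    // SAFETY: the caller promises `i < xs.len()`.
    unsafe { *xs.get_unchecked(i) }
}
```

The safe forms are the short ones. The unchecked form needs the unsafe keyword both where it is defined and where it is called, which is exactly the inconvenience argued for above.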

Let’s look at situations where you are indexing memory. First off, most of them I saw were in old C-style for-loops, where you loop over an index rather than using iterators directly with a collection. Both Rust and C++ have safe versions of for that loop over collections with iterators, and those use the same check for the loop as they do for bounds, so those are easy enough to address. Nevertheless, I think that a lot of the noise about checked vs. unchecked array accesses comes from people who use indexing for their for-loops instead of iterators, and therefore mistakenly think that array indexing in general is a far more common operation than it is.
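As a rough illustration of the difference, here is the same sum written both ways, plus a two-slice case that people often reach for indices to solve. The function names are hypothetical; this is a sketch, not a benchmark.

```rust
/// C-style loop: each `xs[i]` is a bounds-checked access
/// (unless the optimizer manages to prove the check away).
fn sum_indexed(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

/// Iterator version: no index appears anywhere, so there is nothing to check
/// beyond the loop's own termination condition.
fn sum_iter(xs: &[u64]) -> u64 {
    xs.iter().sum()
}

/// Walking two slices in lockstep also needs no indexing: `zip` stops at the
/// end of the shorter slice.
fn dot(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```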

For the remaining situations, most are implementing either gnarly business logic, or a subtle, fast algorithm.

If it’s gnarly business logic, in my experience, it’s usually at config time – along with a good third to half to even more of the code in a complicated production system.

What do I mean by config time? A running high-performance system, whether optimized for latency or throughput, has a bunch of data structures organized just so, a lot of threads set up just right to move data between them in the perfect rhythm, and a lot of the work is in arranging them. That work is generally not performance-sensitive, but often has to be in the same programming language as the performance-intensive stuff.

Config-time is, depending on how you look at it, less of a thing or the entire thing in a programming language like Python. Python basically exists to do config-time programming for performance-intensive code put in very comprehensive “libraries” written in C or C++. But in C++, where you have a constructor that runs only once or a few times at first, and other methods related to it, in the same programming language as the money-making do-it part, you have to really adjust programming style between them.

Config-time is obviously when you read the configuration files. It’s where you open the relevant files. It’s where you call socket and bind and listen on your listening port. It’s where you spin up your worker threads, and make computations on how many worker threads there are. It’s where you construct your objects and your object pools. It’s where you memory map your log file. It’s where you set your process priorities. It’s where you recursively call the constructors and init functions of every object in your overwrought OOP hierarchy.

There is no need to sacrifice safety for performance at config time – especially since undefined behavior might lie latent and destabilize the system once it’s actually up and running. If you do an unchecked array access at config time, you might put garbage data in an important field, maybe one that determines how much money you’re willing to risk that day or how many of a thing to buy. And for what? To save a few nanoseconds before your process has even “gone live”?

So, when do you truly need unchecked array accesses? If it’s a subtle fast algorithm, probably deep in an inner loop, you should probably be wrapping it in an abstraction anyway. The code that actually executes the algorithm should be separate from the business logic, so that programmers trying to maintain the business logic don’t accidentally break it. And that’s exactly where it makes the most sense to use unsafe – when implementing a special algorithm. Maybe the proof that the index is within bounds relies upon some number theory the compiler was never going to understand without its own proof engine: great! You should probably be explaining that in a comment in C++ anyway, and so the conventional comment that goes with the unsafe block in Rust is a perfect place to explain it.
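Here is a small sketch of what that wrapping might look like. The Ring type and its bump method are invented for this example; the point is only that the unchecked access lives in one place, behind private fields, next to the comment that justifies it.

```rust
/// Fixed-size ring of counters whose length is always a power of two.
pub struct Ring {
    slots: Box<[u64]>,
    mask: usize, // invariant: mask == slots.len() - 1
}

impl Ring {
    /// Creates a ring with `1 << log2_len` slots, all zero.
    pub fn new(log2_len: u32) -> Self {
        assert!(log2_len < usize::BITS, "ring too large");
        let len = 1usize << log2_len;
        Ring { slots: vec![0; len].into_boxed_slice(), mask: len - 1 }
    }

    /// Adds `amount` to the slot selected by `key` and returns the new total.
    pub fn bump(&mut self, key: usize, amount: u64) -> u64 {
        let i = key & self.mask;
        // SAFETY: `slots.len()` is a power of two and `mask == slots.len() - 1`,
        // so `key & mask` is always in `0..slots.len()`. Both fields are private
        // and never modified after `new`, so callers cannot break this invariant.
        let slot = unsafe { self.slots.get_unchecked_mut(i) };
        *slot += amount;
        *slot
    }
}
```

Business logic calls bump like any other safe method. Anyone auditing for memory corruption has one unsafe block to read.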

But maybe I’m wrong about all of this. Maybe your experience hasn’t matched mine. Maybe your particular application needs to make unchecked array accesses a lot, needs them to be unchecked, and needs them littered all over the codebase. I raise my eyebrows at you, suspect you need more iterators and perhaps other abstractions, and wonder what problem you’re trying to solve. But even if you’re absolutely right, I think it’s still a better idea to write Rust littered with unsafe every time you index an array, than to write C++.

Because, as I keep emphasizing, Rust is still a better unsafe programming language than C++. It would be better than C++ even if safety weren’t a feature.

Post-Script: Some Perspective for the New Rustacean

I understand where this straw man argument comes from. The word unsafe is scary, and advice, especially aimed at people coming from safe languages like Python and Javascript, is to avoid unsafe features while learning. And while I think adding unsafe to production code should only be done once you’ve exhausted safe possibilities – which requires full understanding of safe possibilities – this advice can feel overbearing for a transitioning C++ programmer, especially when it is immediately obvious that the safe features are very constrained and can’t literally do everything.

For that good-faith recovering C++ programmer, new to Rust: You’re right. The safe subset isn’t enough to do everything you want to do. And when it isn’t, that doesn’t mean it failed. Its goal is to make unsafe code rare, not non-existent. But it might surprise you how rarely you truly need unsafe. And a good resource for you might be, as it was for me, the excellent Learn Rust the Dangerous Way by Cliff L. Biffle.

For what it’s worth, however, this criticism of Rust in general is often levelled either in bad faith, or from a misunderstanding of what the unsafe keyword is for. For all the philosophical discussion of what unsafe truly means – and how it interacts with the surrounding module and encapsulation/privacy boundaries – as well as principled conventions for using it, please see the Rustonomicon, the canonical book on unsafe Rust, the same way “the book” (The Rust Programming Language) is canonical for introducing Rust.

Other criticisms of Rust from an HFT or low-latency point of view are more relevant. Most specifically, gcc and icc are much better compilers for those use cases – empirically – than is LLVM. Also, the large codebases existing in C++ are often tested and contain thousands upon thousands of programmer-years of optimizations and bugfixes, where even small compiler upgrades are scrutinized closely for performance regressions. Migrating to another programming language from that starting point would be prohibitively expensive.

None of which is to say that if Rust gradually replaced C++ altogether, eventually such ultra-optimizing compilers and ultra-optimized codebases wouldn’t start appearing in Rust. I hope to see that day within my lifetime.

In Defense of Async: Function Colors Are Rusty

2022-01-03T00:00:00+00:00

Finally in 2019, Rust stabilized the async feature, which supports asynchronous operations in a way that doesn’t require multiple operating system threads. This feature was so anticipated and hyped and in demand that there was a website whose sole purpose was to announce its stabilization.

async was controversial from its inception; it’s still controversial today; and in this post I am throwing my own 2 cents into this controversy, in defense of the feature. I am only going to try to counter one particular line of criticism here, and I don’t anticipate I’ll cover all the nuance of it – this is a multifaceted issue, and I have a day job. I am also going to assume for this post that you have some understanding of how async works, but if you don’t, or just want a refresher I heartily recommend the Tokio tutorial.

The Questionable Feature: Colored Functions

In any discussion of a programming language feature, the first thing to ask is what problem the feature is trying to solve. In the case of async, it’s trying to deal with asynchronous operations – operations that don’t require more work from the CPU to make progress, and where several might be in flight at any given time. For example, a single process might be writing some data to a file, reading data from another file, waiting for new incoming connections, and servicing an existing connection.

So how does Rust solve this? The easiest way to address this problem would be to have a thread for each operation, and to let the thread block at the asynchronous operation, essentially pretending that the operation is a long-running function like any other the CPU has to do, rather than something taking place elsewhere. But operating system threads are expensive. And rather than using green threads as some other programming languages do, Rust decided to create a syntactic sugar for futures, meaning that Rust’s async feature now suffers from the dreaded function coloring effect first explained by Bob Nystrom in a Javascript context in 2015.

In Bob Nystrom’s now-famous essay he complains that an analogous feature in Javascript is harmful, because asynchronous functions – which he refers to as “red” functions – can only be called from other red functions. Once a red function is needed, the function that calls it must also be red, and same with the function that calls that, the whole way up the call chain. And the syntax and semantics of calling a red function are more complicated than those of calling blue functions – especially in Javascript, where the next thing to do had to be enclosed in a lambda, resulting in callback hell (I do not endorse the suggestions in that post).

Colored Is Good, Actually, and Rusty

My position is close to that of this article, but with enough nuance that I wanted to write my own blog post to explain it in more detail. Fundamentally, I agree that Rust does indeed have colored functions, and that it’s not a bad thing. But I would go further. I say that function coloring has always existed in Rust, even before it manifested in the async world, that it is the Rustiest way to solve this problem, and furthermore, that Rust needs more function coloring than it has.

Rust, unlike the Javascript of the original colored functions article, is strongly typed, and influenced heavily by Haskell. This means that it has lots of type distinction on its values: “colored” values, if you will.

This type information includes basic ideas of type (string vs number vs widget), but also shades of distinction that a Javascript programmer won’t even be aware of. Let’s say you want to take a parameter to your function, a “widget.” In Javascript, you just take a parameter widget and do widget things with it, and hope that it works out. The name is just a comment: it’s up to the caller to know what exactly is expected, hopefully some sort of widget that works. In Rust, on the other hand, you have to annotate the parameter with a type, which not only ensures it’s actually a widget, but distinguishes between these potential requirements:

Exclusive reference to a widget: &mut Widget
Ownership of a widget: Widget
Reference to a widget that lives forever: &'static Widget
Optional widget (in Javascript this is very unclear): Option<Widget>

If Widget is a trait, you have even more options:

Owned run-time generic widget: Box<dyn Widget>
Non-owned reference to compile-time generic widget: &impl Widget

The list goes on. For each of these options, also, the caller often has to do something different. If the parameter is optional with Option, and the caller in fact has a widget, the caller still has to add Some to the parameter:

fn foo(widget: &Widget) { ... }
fn foo2(widget: Option<&Widget>) { ... }
fn foo3(widget: Widget) { ... }

let baz = Widget::new();
foo(&baz);
foo2(Some(&baz));
foo3(baz);

All of these, in my synaesthetic mind, are expressed by different colors and textures on the parameter. For all of these, Rust has made a value judgment that the programmer should be explicitly aware of these shades of distinction, if you will (pun intended). If a parameter is to be optional, the function is called differently than if it is mandatory. If a borrow happens, that requires a & from the caller, to make clear to the programmer what is going on, to make sure the writer of the caller and the writer of the callee are on the same page. Parameters in Rust are, in general, colored.

And this value coloring, like the async/sync function coloring, propagates. If a function requires a parameter to be 'static in lifetime, that requirement propagates to the caller of that function to the caller of that function to the originator of the value in question.
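A tiny sketch of that propagation, using std::thread::spawn as the origin of the 'static requirement. The function names log_in_background and announce are hypothetical.

```rust
use std::thread;

/// The spawned thread may outlive the caller, so the closure (and everything it
/// captures) must be `'static`. That requirement lands on the parameter.
fn log_in_background(msg: &'static str) -> thread::JoinHandle<usize> {
    thread::spawn(move || msg.len())
}

/// This function does nothing but forward `msg`, yet it has to repeat the
/// `'static` annotation. Declaring `msg: &str` here would fail to compile.
fn announce(msg: &'static str) -> usize {
    log_in_background(msg).join().unwrap()
}
```

A string literal satisfies the requirement; a reference into a local String would not, and the compiler would point at whichever caller first fails to provide the “color.”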

Similarly with return values – I disagree with “More Stina” about Result. I say Result-returning fallible functions are colored. In many programming languages, including Javascript, any and all functions can throw recoverable exceptions. In Rust, functions that might fail (in a recoverable fashion) must have a different return type than those that do not – they must return a Result<...>. Functions that return Result<T, E> are, as with async functions, harder to call than functions that just return T. If you don’t want to use the syntactic sugar ? to propagate the error, you have to grapple with Result as a literal return type, which means unpacking it and doing something else in the Err case. This is more straightforward than dealing with a raw impl Future, but fundamentally the same concept: either propagate the “color” with ? or async, or else deal with all the implications of Result or Future on the spot.
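To illustrate the two choices, here is a minimal sketch with made-up function names (parse_port, next_port, port_or_default). One caller propagates the Result “color” with ?, the other absorbs it on the spot.

```rust
use std::num::ParseIntError;

/// A fallible function: its return type advertises that it can fail.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

/// Propagating the color: `?` is short, but this function must now
/// return a `Result` too.
fn next_port(s: &str) -> Result<u16, ParseIntError> {
    let port = parse_port(s)?;
    Ok(port.wrapping_add(1))
}

/// Absorbing the color: the return type is a plain `u16`, at the price of
/// deciding right here what the `Err` case means.
fn port_or_default(s: &str) -> u16 {
    match parse_port(s) {
        Ok(port) => port,
        Err(_) => 8080,
    }
}
```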

And all of these distinctions mean something. Passing by shared reference, mutable reference, or value are different, and put different safety requirements on the calling code, safety requirements that allow Rust to make more safety guarantees than Javascript ever could. Passing by reference is literally different at the ABI level from by-value, so each can implement the exact contract as efficiently as possible, unlike Javascript which leans on an expensive garbage collector for cleaning up the difference between these notions. That is to say, where Javascript (and Python) use garbage collectors, Rust uses distinctions – color distinctions, one might say – between types to achieve the same result, benefitting in performance but requiring more exactness from the programmer.

And in Rust, a statically typed programming language, we believe this to be a good thing. Rust is not for every project – it’s a steeper learning curve than Python or Javascript, and not every project needs to be maintainable long-term – but it has a distinct, consistent philosophy, which says that different things should be treated differently.

Async Functions Are Just Different

A blocking or asynchronous function is not the same thing as a non-blocking function. A non-blocking function fundamentally does some CPU tasks, taking control of the processor, using it, and giving it back. An asynchronous function does the set-up necessary for work to happen elsewhere. That work doesn’t need control of the CPU, and can be dealt with through a handle – a future – rather than just waiting for completion. These are fundamentally different notions, and while it might (or might not) make sense in Go or Javascript to lump them together into one notion of “calling a function,” Rust doesn’t do lumping.

When you call a normal function – without async/await – you build up a stack. When you use async/await, you build up a complex nested state object. If you use async/await with an executor to spawn a new task, that complex state object ends up on the heap in a data structure next to other task objects.
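The following sketch shows the state object directly, using only the standard library. block_on here is a deliberately naive, hypothetical stand-in for a real executor: it polls in a busy loop with a waker that does nothing, which is only acceptable because these toy futures never actually wait on anything.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for futures that never return `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: pins the state object on the heap and polls it to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn add_one(x: u32) -> u32 {
    x + 1
}

/// Calling `add_two(1)` runs none of this body. It only builds a value that
/// records the argument and which `.await` point to resume from.
async fn add_two(x: u32) -> u32 {
    let y = add_one(x).await;
    add_one(y).await
}
```

Writing let fut = add_two(1); produces an inert value; only block_on(fut) drives it to 3. A synchronous call would have built a stack frame and run immediately instead.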

Both “call stack” (for synchronous code) and “task state object” (for asynchronous code) are reasonable ways of managing memory. Honestly, the miracle is that Rust, through async and await, manages to make these two vastly different paradigms look as similar as they do. Having to annotate the difference is a small price to pay for high-performance reactive programming.

It’s not 100% perfect. Even with the must_use warnings, people forget to call await on their futures sometimes. And writing reactive, async code is harder – which makes sense, because the resulting code is a more difficult but more performant usage of memory. Writing code that passes the borrow checker is harder, but considered worth it because we can remove indirections and avoid garbage collection. async offers us the same deal for reactive programming.

Alternatives to Async

But let’s say we did want to remove the coloring here. Let’s say we did want to pretend that blocking functions were just like CPU-based ones, but just taking a long time. What would we have to do?

Well, we’d still have to wait for multiple things simultaneously. Our servers have many connections they have to service at once, and when a message comes in on socket B, it can’t be ignored just because the code happens to be on socket A. If asynchronous operations are implemented by blocking, we have to handle this with multithreading.

Kernel multithreading is expensive, but even Go-style “green threads” have to have a separate stack for each green thread. Stacks are gnarly, because it’s unclear how much space should be reserved for them ahead of time. They have to dynamically adjust to the run-time demand, and when the original allocation is used up, you get a pause as you try to allocate more. The advantage is, you have a simpler mental model with fewer distinctions. Basically, you trade performance for simplicity – like in garbage collection.

If you want to do this trade, Rust doesn’t stop you from implementing it yourself. OS threads and blocking system calls are perfectly reasonable solutions to many problems. But Rust isn’t going to encourage the trade by creating a new compromise point of “green threads.” You have to do async the whole way, and if you think of what async code actually de-sugars to, you wouldn’t complain about how hard async functions are, but be impressed it’s so darn easy to write them!
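For comparison, here is roughly what the thread-per-operation version looks like with nothing but the standard library. slow_op is a hypothetical stand-in for a blocking read or write, and run_all is an invented helper.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Stand-in for a blocking I/O call: the thread simply waits.
fn slow_op(id: u32, millis: u64) -> u32 {
    thread::sleep(Duration::from_millis(millis));
    id
}

/// Runs each operation on its own OS thread and collects the ids
/// in whatever order the operations happen to finish.
fn run_all(ops: Vec<(u32, u64)>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    for (id, millis) in ops {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means the receiver is gone; ignore it.
            let _ = tx.send(slow_op(id, millis));
        });
    }
    // Drop our own sender so the channel closes once the last worker is done.
    drop(tx);
    rx.into_iter().collect()
}
```

No coloring anywhere: slow_op looks like any other function. The price is one full OS thread, stack included, per operation in flight.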

Rust is a systems programming language at heart. I understand and respect that, because of its type system and guarantees, it has found use outside of the old domains of C and C++, but those C and C++ systems programmers are Rust’s ideal “base,” in a political sense of the word. Rust should not sacrifice performance for ease of programmability.

Blocking vs Non-Blocking

Rust has two ways of doing off-CPU “IO” operations, blocking and non-blocking. Blocking takes over the thread, and non-blocking works through async. This mirrors a distinction in the system calls that most kernels provide. The operating system API has this distinction built into it, and it makes sense for Rust to propagate that to the user.

But fundamentally, one of these constructs is more honest than the other. When we call a blocking kernel system call, rather than the kernel taking over the CPU, running on it, and then returning the thread of execution to us, what actually happens internally is more of a mirage. The kernel deschedules the current process, and using an internal mechanism more like async than like blocking, schedules it again, recovering its previous state as if nothing had happened, when the IO is done.

This means that we can pretend the I/O operation was just an operation like any other, but it comes at a risk – the operation might not return anytime soon. It might in fact wait for a situation that’s not going to happen anytime soon.

If such a blocking function is called from a non-async Rust thread, we assume that the caller is using threads to juggle multiple I/O events – or else that they simply don’t have anything else going on. But it is very dangerous to call a blocking function from an async function. It can starve threads in a thread pool, and cause knock-on effects in other places. Maybe an async task is waiting for a message from a channel, and even though the message was sent, the task doesn’t resume because the thread it’s scheduled on is busy on this blocking function. The effects are unpredictable and non-local – similar to the dreaded “undefined behavior” – and debugging is similarly difficult – ask me how I know!

Functions that block but are not async are referred to in the “More Stina” blog post (also linked above) as “purple functions.” They are not true async “red functions” that you can call with async, but they are also not safe to simply call from an async function like a truly CPU-based “blue function” would be. Calling a blocking function from an async function is extremely unsafe, and there is simply no warning generated by rustc, normally so helpful about such things, to let you know how deep and undebuggable a mistake you’re making.

These purple functions ought to be a different color in Rust, just like they are in practice. It should be an error to call a blocking system call from an async function. I don’t know how this would work – I imagine a generalization of unsafe that includes things like blocking, perhaps as well as panicking. That would fundamentally be an “effects system,” as is regularly proposed, but that’s not the only solution. But I do fundamentally think that something ought to be done about this deficit in Rust’s otherwise quite rigorous function-coloring system.

So, in conclusion, I say: yes, Rust async functions are colored. This is the same as saying they are strongly typed, and this is a good thing. And instead of trying to fix it, we should have more of it.

Postscript: Monads

As I mentioned before, calling an async function does something fundamentally different under the hood from calling a vanilla “blue” function. Similarly, calling a fallible function with Result does something different from calling a function with a normal return value. In both cases, the control flow is different – either it contains short-circuits to error code (Result); or regular hops back and forth between the task, other tasks, and the executor (async/Future).

In both these cases, it’s like the meaning of having one statement come after another has changed: ; itself has been overridden. And it would be nice if generic collections methods, like map and filter, supported this, so that you could fail, or await, in the closures.

This is possible in Haskell, because Haskell has a typeclass (equivalent to Rust traits) for abstracting over different styles of control flow. That is what Haskell’s infamous monads are for, and why Haskell persists in using this technology even though it’s so famously confusing for beginners.

Fundamentally, every Haskell monad is a function color. And often, they can be stacked together (via “monad transformers”) so that you can say something like “this function can do IO, fail, and be asynchronous.” You can also create functions that are polymorphic on “color”: the control flow is rewritten based on which monad you actually end up in.

Why is this useful? As “More Stina”’s post already mentions, there is a proposal to add try_ versions of iterator adapter methods: try_filter, etc., to enable them to work smoothly with Result-“colored” functions. A method like filter or map also would need an adapter to work well with async. If there were an abstract concept of monad, we could write code with filter-like methods that could short-circuit on failure and do the right thing with await:

vec![2, 3, 4]
    .iter()
    .filter_monad(|x| fallable_thing.contains(x)?)
    .filter_monad(|x| network_file_thing.contains(x).await?)
    .for_each_monad(|x| network_other_thing.send(x).await);
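The hypothetical filter_monad above does not exist, but today’s standard library already has one-off versions of the idea for Result. As a sketch (function names invented): collect can short-circuit on the first Err, and where no adapter fits, you fall back to a plain loop so that ? works.

```rust
use std::num::ParseIntError;

/// `collect` into `Result<Vec<_>, _>` stops at the first `Err` and returns it:
/// a `map` that is allowed to fail.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

/// There is no fallible `filter` to reach for here, so the closure becomes
/// a loop body, where `?` can propagate out of the enclosing function.
fn parse_evens(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    let mut evens = Vec::new();
    for s in inputs {
        let n: i32 = s.parse()?;
        if n % 2 == 0 {
            evens.push(n);
        }
    }
    Ok(evens)
}
```

Each of these is tied to Result specifically. Nothing here is generic over “color,” and none of it helps with await.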

Perhaps Rust will someday gain this abstraction as well. I actually think that would be good for Rust. Monads are hard to deal with conceptually, and I’m not sure how to make them more user-accessible, but I think if anyone could do it, it’s the Rust people, who’ve already done such a good job so far at programming language design and maintenance.

Endianness, API Design, and Polymorphism in Rust

2021-11-21T00:00:00+00:00

I have been working on a serialization project recently that involves endianness (also known as byte order), and it caused me to explore the parts of the Rust standard library that deal with endianness. I’d like to share my thoughts about how endianness should be represented in a programming language and its standard library. I think this is also something that Rust does better than C++, and it makes for a good case study to talk about API design and polymorphism in Rust.

To start with, let’s discuss endianness a little. I assume most of my audience has some familiarity with endianness; nevertheless, I’d like to explain it from first principles. That way, we can subsequently apply the insights from the explanation to API design. That, and I want practice explaining concepts, even if they are basic. I’ll try to keep it interesting, but also feel free to skim the next section.

Big End, Little End

I first encountered the concept of endianness when I was first learning to program using the DEBUG.EXE program on DOS. When a 16-bit value was displayed as a 16-bit value, it was just normal hexadecimal, but when it was displayed as two 8-bit bytes, something weird happened with the display.

Here’s a C++ snippet that demonstrates the effect:

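#include <cstdio>
#include <cstring>

template<typename T>
void display_bytes(const T &val) {
    // unsigned char avoids sign extension when a byte is 0x80 or above
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &val, sizeof(T));
    for (auto byte : bytes) {
        std::printf("%02x ", byte);
    }
    std::printf("\n");
}

int main() {
    int value = 0x12345678;
    std::printf("%x\n", value);
    display_bytes(value);
}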

When run on any little-endian processor (the vast majority of processors), we get:

12345678
78 56 34 12

The least significant byte is first, so if you print out the individual bytes in order, you have to read it backwards – though each individual byte is still forwards.

If you were to run the same code on a big-endian processor, you would get:

12345678
12 34 56 78

At this point, little endianness as I understood it was a weird thing that Intel processors did for reasons I didn’t understand, that made me do a little extra work when reading hex dumps. At the time, this was fine: I thought that having to apply this extra arcane knowledge was cool for its own sake. But also, I thought of little endianness as the weird way to do things that required extra work, and big endianness as the more natural design. It wasn’t until much later that I got some nuance on that opinion.

See, the writing system we use for numbers is big endian. Instead of dividing a number into bytes (base 256), we divide it into digits. We consider the left of the page to come before the right of the page, and we write the most significant digit first. This is all taught explicitly in grade school:

1234 = 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0

There’s a certain, very human logic to this system: the more important information comes first, then the details. Mathematically, though, we see decreasing numbers for the powers of 10: first we specify a factor for 10 to the third, then to the second, then to the first, then to the zeroth. We could instead imagine a world where we wrote our decimal numbers little endian, where the same number would be written 4321, and still mean “one thousand two hundred thirty-four,” where we’d count like this:

4321 = 4*10^0 + 3*10^1 + 2*10^2 + 1*10^3

This would have the advantage that the first digit, digit zero, would be multiplied by 10^0, digit one by 10^1. Not what humans would normally decide to do, but it has a certain logic. And if you think about languages like Hebrew or Arabic, which are written right to left, but which write numbers the same direction we do, the least significant digit is actually reached first in the normal direction of reading: when they see “100” in the midst of the text, the zeroes are “before” the “1”. (I am told that this is not how most people think of it; that they instead just think of numbers as going the other way from other text, but it just goes to demonstrate how based on convention all of this stuff is).
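The “digit i is multiplied by 10^i” property is easy to see in code. In this sketch (both function names are made up), the little endian digit list falls out of the usual divide-by-ten loop with no reversal step, and rebuilding the number uses the index directly as the exponent.

```rust
/// Decimal digits of `n`, least significant first (little endian).
fn digits_le(mut n: u32) -> Vec<u32> {
    let mut digits = Vec::new();
    loop {
        digits.push(n % 10);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    digits
}

/// Rebuilds the number: the digit at index `i` is worth `digit * 10^i`.
fn from_digits_le(digits: &[u32]) -> u32 {
    digits
        .iter()
        .enumerate()
        .map(|(i, d)| d * 10u32.pow(i as u32))
        .sum()
}
```

Producing the familiar big endian order from the same loop would require reversing the list afterwards.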

So all of this is to say that the weird effect we had before with big endian looking “normal” and little endian looking “weird” has nothing to do with the intrinsic logic of big vs little endian, but rather with the fact that we’re mixing a little endian processor with a big endian writing notation. If we instead were to print the digits of each number in increasing significance – that is, if we were to use little endian as our printing convention – we’d get:

# Little Endian Machine
87654321
87 65 43 21
# Big Endian Machine
87654321
21 43 65 87

The mismatch between writing the whole word in hex and writing the individual bytes in order in hex is caused by a mismatch between the endianness of the system (normally little in practice) and the endianness of the writing system (normally big in practice). When the writing system isn’t a factor, little endian makes more mathematical sense, is easier to reason about in circuitry and code, and therefore has won out over big endian in every major processor architecture.

The only real exception is network byte order, which uses big endian. This is convenient for manually reading hex dumps of packets, but probably has more to do with the fact that the Internet developed when this question was much less settled. Due to the presence of network byte order, however, and the fact that the endianness of Intel and modern ARM is opposite of the endianness of most human writing systems, the concept remains with us.

When is endianness relevant?

In writing numbers, a digit has no endianness: 8 means the same thing as a single digit number. Similarly, a byte is indivisible in a processor. Bytes are made up of bits, but outside of special instructions, the ordering of those bits is not relevant. One of them is most significant, one of them is least, but unless we’re indexing them for a special instruction, or sending them over a wire one by one, there is no way to say which such bit comes “first.”

Indeed, if we want to display a byte as a series of bits, we as the programmer get to choose the endianness, and the program runs identically on either a big endian or a little endian platform. The little endian version is a little more intuitive, as “2 to the N” is an operation that’s easy to write on computers, and in little endian the N increases as the index increases:

fn byte_as_bits_le(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        let mask = 1 << i;
        if byte & mask == 0 {
            res[i] = 0;
        } else {
            res[i] = 1;
        }
    }
    res
}

Nevertheless, at no point is this function relying on the endianness of the hardware, and it does the same thing on either type of hardware.
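For comparison, here is a sketch of the big endian counterpart. The name byte_as_bits_be is my own, chosen to mirror the function above. Only the indexing changes, and it is just as independent of the hardware:

```rust
/// Bits of a byte, most significant bit first.
fn byte_as_bits_be(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        // Index 0 holds the 2^7 bit, index 7 holds the 2^0 bit.
        res[i] = (byte >> (7 - i)) & 1;
    }
    res
}

fn main() {
    // 0x12 is 0001 0010 in binary.
    println!("{:?}", byte_as_bits_be(0x12)); // [0, 0, 0, 1, 0, 0, 1, 0]
}
```

The shift amount is now 7 - i rather than i, which is the small extra step that makes the little endian version feel more natural in code.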

Why do I bring this up? Well, I don’t think it makes any sense to speak of the endianness of a (multi-byte) word per se. The endianness of the word only comes into play when it is stored as – and accessible as – a series of bytes.

So from that point of view, what are the operations where endianness is relevant? Given a word, what series of bytes comprises it? And then, given a series of bytes, what word is it?

In Rust terms, these are to_be_bytes (for big endian)/to_le_bytes (for little) in the one direction, and from_be_bytes/from_le_bytes in the other. These methods are all bundled together in the Rust documentation for – in this case – the primitive u32 type, along with ne which gives whatever the native endianness of the processor is.
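A short sketch of both directions, using only the standard methods named above. The value 0x12345678 is just an example:

```rust
fn main() {
    let word: u32 = 0x12345678;

    // Word to bytes: the answer depends on the endianness we ask for.
    let be: [u8; 4] = word.to_be_bytes();
    let le: [u8; 4] = word.to_le_bytes();
    assert_eq!(be, [0x12, 0x34, 0x56, 0x78]);
    assert_eq!(le, [0x78, 0x56, 0x34, 0x12]);

    // Bytes to word: each from_ method undoes the matching to_ method.
    assert_eq!(u32::from_be_bytes(be), word);
    assert_eq!(u32::from_le_bytes(le), word);

    // The ne variants pick whichever of the two the target uses.
    let expected_ne = if cfg!(target_endian = "little") { le } else { be };
    assert_eq!(word.to_ne_bytes(), expected_ne);

    println!("all conversions round-trip");
}
```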

These are the APIs I’m going to be discussing. But before discussing how they might be improved, I’m going to point out an API that I think doesn’t make as much sense: to_be. This method takes in a word, a u32, and outputs a u32, and yet claims to change the endianness of that word, which as I mentioned, does not have endianness per se, only in that it’s represented by bytes.

I know what they mean by it. On a little endian platform, it will replace 0x12345678 with 0x78563412. But what does that actually mean? In its form as a u32, as I have argued above, a number has no endianness. So what is this number 0x78563412? It is the number that, if stored in bytes in the native endianness, will store the original number in big endian.

That’s a mouthful, I know, because it’s actually a complicated concept. That is to say, it’s a hack. We want to write a number – say, 2000 – in big endian, but we don’t want to think of it as bytes, yet. We want to be able to load the whole number into a register, and when we write it, we want it to be 2000 in big endian. So we byte swap the number, and instead of storing 2000, we store 3490119680, so that if we write it using the processor’s normal mechanism for writing, it comes out to 2000 in big endian.

Basically, to_be does the equivalent of u32::from_le_bytes(input.to_be_bytes()), and using it looks like this:

let input: u32 = 2000;
// These two invocations do the same thing
let be = input.to_be();
let be2: u32 = u32::from_le_bytes(input.to_be_bytes());
println!("{} {}", be, be2); // 3490119680 for both

// The result can be written using native (little endian) byte
// order, and it will give 2000 in big endian byte order.
assert_eq!(be.to_le_bytes(), input.to_be_bytes());

This is arguably a useful hack – though I’m not fully convinced – but it is definitely a hack. I do not think the description is sufficiently rigorous. The output of to_be is not a number “in big endian,” it is a different number that resembles the big endian representation of the original number. The description is a simplification, and I think a conceptually incoherent one – which is understandable because the concept at play here is so hackish.

It appears that to_be was in Rust 1.0, and to_be_bytes was introduced later. This to me is a good sign, as to_be_bytes, I think, makes much more sense as an interface. And as to why we started out with the to_be type of interface in Rust, that makes sense as well, because in C the traditional (POSIX but not ANSI C) functions for these conversions have similar semantics, such as htonl (host to network long), where we have this conceit of storing a “big endian” or “network byte order” value in a uint32_t (C for u32). This always struck me as the wrong abstraction, but it is justified – or at least more understandable – for C as we simply can’t pass around things like char[4] (C for [u8; 4]) by value in C.

There are other technical and historical reasons why htonl and to_be and friends exist, even if conceptually messy, but in any case, since I’m talking about API design, and to_be_bytes and friends are a better match for the concepts at hand, I am now going to pretend to_be is deprecated (it is not), and move on to discussing the design of to_be_bytes and to_le_bytes.

Policies

So the first thing I notice is that there are six methods that deal with fundamentally one topic:

from_be_bytes
from_le_bytes
from_ne_bytes
to_be_bytes
to_le_bytes
to_ne_bytes

But really they vary in two ways, namely:

which operation is performed (from_X_bytes vs to_X_bytes)
which endianness is required (le, be, and ne)

For us humans, this is clear from the names, but to the compiler, these names do not form a pattern that it is capable of recognizing. There are simply 6 separate functions named with 6 separate combinations of characters.

Now, having separate functions for separate operations makes sense; that’s what functions are for. But for the same operation but with different endianness, it might make more sense to indicate that it is one operation with several possible endiannesses by making the endianness into a parameter.

The obvious way to do this would be via run-time parameter. A fairly literal translation of this API would be something like:

// Copy, so the same value can be passed to several calls in a row
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
    Native,
}

// Hypothetical: only the standard library can add inherent methods to u32.
impl u32 {
    fn to_endian_bytes(self, endianness: Endian) -> [u8; 4] {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }

    fn from_endian_bytes(bytes: [u8; 4], endianness: Endian) -> Self {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }
}

This would also allow us to implement the concept of “native” byte order a little differently, and create more names for byte orders:

#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}
static NATIVE_ENDIAN: Endian = Endian::Little;
static NETWORK_ENDIAN: Endian = Endian::Big;

So, besides simplifying away the need for a separate implementation for the ne functions, and making the code more in sync with what’s happening, what other positive things have we accomplished? Well, given that we now have a parameter, we can now make more complicated code parametric on it. Imagine we have an entire structure to write out, and we want to write the entire structure as big-endian or little-endian, perhaps because the protocol in question changed endianness at some version. Or perhaps we just want to make clear to the reader that one endianness is used for the entire structure. We can now do something like this:

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize(&self, endianness: Endian) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
        res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
        res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize(Endian::Big)
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize(Endian::Little)
    }
}

The alternative would be to write two separate serializers, and duplicate all the logic of how to arrange the layout. Duplication is bad, because bug fixes don’t necessarily get to all the duplicate copies. So, to save on duplication, we’d have to basically wrap to_be_bytes and to_le_bytes in a version of this; it would be more convenient if the standard library had done this for us.
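Since methods cannot be added to u32 from outside the standard library, here is a sketch of that wrapper that compiles today. The free function to_endian_bytes is a stand-in for the hypothetical method, and the Native variant is left out to keep it short:

```rust
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

// Stand-in for the hypothetical u32::to_endian_bytes method.
fn to_endian_bytes(value: u32, endianness: Endian) -> [u8; 4] {
    match endianness {
        Endian::Little => value.to_le_bytes(),
        Endian::Big => value.to_be_bytes(),
    }
}

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    // The layout logic exists once; only the endianness varies.
    fn serialize(&self, endianness: Endian) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&to_endian_bytes(self.a, endianness));
        res[4..8].copy_from_slice(&to_endian_bytes(self.b, endianness));
        res[8..12].copy_from_slice(&to_endian_bytes(self.c, endianness));
        res
    }
}

fn main() {
    let s = Structure { a: 1, b: 2, c: 3 };
    println!("{:?}", s.serialize(Endian::Big)); // [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3]
    println!("{:?}", s.serialize(Endian::Little)); // [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
}
```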

What is the downside of this? Well, the implementation didn’t really get any simpler. Actually, in the normal case, where you don’t change your mind about endianness, the implementation got more complicated. We now have a match expression in our two simplified functions, which theoretically indicates a run-time decision. We could trust the optimizer to fold the decision in through inlining and constant-propagation, but trusting the optimizer is suspicious and unnecessary.

Nothing we’ve done so far requires this decision to be made at run-time, and so we can instead make the decision at compile-time. Where we had a run-time parameter, we can now have a compile-time parameter.

Now, although Rust has rudimentary support for other kinds of compile-time parameters, the archetypical compile-time parameter is a type, bound by a trait. Our enum from before would then have to be lifted into the type space, as a trait and a few types:

trait Endianness { }

struct BigEndian;
impl Endianness for BigEndian { }

struct LittleEndian;
impl Endianness for LittleEndian { }

Now, we need a compile-time equivalent for match. This is a little harder, as at the time of this writing stable Rust does not have the most direct equivalent of match for implementors of traits, that is, “specialization.” But Rust does allow something similar: the code for each branch of the match must go in each type’s implementation of that trait, and the fact of the match must be provided in the trait itself.

This will also help us simplify the implementation. This to_be_bytes/to_le_bytes API is not just implemented for u32, but for all primitive numeric types. Currently, these mostly-similar implementations are stamped out by a macro, along with other methods for primitive types. But we might imagine that there are two things going on in the implementation:

write out the type into an array of bytes
either swap the bytes, or not, based on whether we’re using the hardware endianness

We could then make the trait come into play – with the decision made at compile time – for the swapping part.

trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // actually swap here
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // no need to do anything here
    }
}

We have now moved some of the implementation into a trait, where the specifics of the implementation are determined by which type implements that trait. This is an example of the policy pattern, where a portion of the code is abstracted out into a policy, and the policy and the main body of the function are sewn together – in this case, at compile-time – into many variations of a function that execute similarly to what an implementor might have written by hand.

Note that there is no possibility of doing run-time endianness determination in this version. This trait method does not take a self parameter, and would have to be invoked as T::possibly_swap. This is possible in Rust because we are doing compile-time polymorphism, not run-time, so there is no need to make this trait object-safe.
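To make the policy concrete, here is a sketch with the bodies filled in. Two things are assumptions of mine: the generic free function to_endian_bytes stands in for a method the standard library does not have, and step one uses to_le_bytes to model little endian hardware so that the example behaves identically on any host:

```rust
trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Reversing little endian bytes yields big endian bytes.
        bytes.reverse();
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // Already in the modeled hardware order; nothing to do.
    }
}

// Hypothetical stand-in for a generic u32::to_endian_bytes::<E>().
fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    // Step 1: write the value out in (modeled) hardware order.
    let mut bytes = value.to_le_bytes();
    // Step 2: the policy decides, at compile time, whether to swap.
    E::possibly_swap(&mut bytes);
    bytes
}

fn main() {
    println!("{:02x?}", to_endian_bytes::<BigEndian>(0x12345678)); // [12, 34, 56, 78]
    println!("{:02x?}", to_endian_bytes::<LittleEndian>(0x12345678)); // [78, 56, 34, 12]
}
```

There is no value of type BigEndian or LittleEndian anywhere in this program. The types exist only to select an implementation.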

Our previous example serializer, with the two versions, now looks something like this:

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize<T: Endianness>(&self) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
        res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
        res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize::<BigEndian>()
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize::<LittleEndian>()
    }
}

The policy pattern is a fairly common pattern in generic programming just like in object-oriented programming, but when generic programming is implemented through monomorphization, as it is in Rust, it can be just as efficient as hand-implementing the combinations of policy and code, while allowing for more policies.

For example, if there were a platform where 4-byte chunks were split into 2-byte chunks little endian, but 2-byte chunks were split into 1-byte chunks big endian, we could write a new policy for this platform and all the existing code would support it.
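A sketch of what such a policy might look like, under the same assumptions as before: the trait is the one from this post, MixedEndian and the free function to_endian_bytes are hypothetical names, and to_le_bytes models the starting byte order:

```rust
trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

// Hypothetical policy: 2-byte chunks in little endian order,
// bytes within each chunk in big endian order.
struct MixedEndian;
impl Endianness for MixedEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Starting from little endian bytes, the chunks are already in
        // the right order; only the two bytes inside each chunk swap.
        for pair in bytes.chunks_exact_mut(2) {
            pair.swap(0, 1);
        }
    }
}

// Hypothetical stand-in for a generic u32::to_endian_bytes::<E>().
fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    let mut bytes = value.to_le_bytes();
    E::possibly_swap(&mut bytes);
    bytes
}

fn main() {
    // Low half 0x5678 comes first, written big endian, then 0x1234.
    println!("{:02x?}", to_endian_bytes::<MixedEndian>(0x12345678)); // [56, 78, 12, 34]
}
```

Nothing in to_endian_bytes had to change to support the new byte order, which is the point of the pattern.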

A much more complicated example of the policy pattern is serde, where the generated serializers and deserializers for each structure are all polymorphic on what serialization format should be used. If a new serialization format comes out with serde support, all existing Serialize instances can then be used with the new format without modification.

Now, in practice, there are often processor instructions that do byte swaps. The hardware uses an interface analogous to the hackish, conceptually messy to_be(), which at a hardware level makes sense because elegance of abstraction is not as important a goal as performance. This converts 0x12345678 into 0x78563412, and similar. So, this implementation is not actually what the policy would look like in a production context. Nevertheless, the endianness argument could definitely be passed in by a trait-constrained type parameter; the implementation would just be more complicated.
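As a rough sketch of a word-based variant, the policy could adjust the whole word with the standard to_be and to_le methods and only then write it out in native order. The trait method name adjust and the free function to_endian_bytes are hypothetical, and whether this compiles down to a single byte swap instruction depends on the target:

```rust
trait Endianness {
    // Returns the word that, written in native order, has the desired layout.
    fn adjust(word: u32) -> u32;
}

struct BigEndian;
impl Endianness for BigEndian {
    fn adjust(word: u32) -> u32 {
        word.to_be() // a byte swap on little endian targets, a no-op otherwise
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn adjust(word: u32) -> u32 {
        word.to_le() // a no-op on little endian targets, a byte swap otherwise
    }
}

// Hypothetical stand-in for a generic u32::to_endian_bytes::<E>().
fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    E::adjust(value).to_ne_bytes()
}

fn main() {
    // swap_bytes is the unconditional word-level byte swap.
    assert_eq!(0x12345678u32.swap_bytes(), 0x78563412);

    println!("{:02x?}", to_endian_bytes::<BigEndian>(0x12345678)); // [12, 34, 56, 78]
    println!("{:02x?}", to_endian_bytes::<LittleEndian>(0x12345678)); // [78, 56, 34, 12]
}
```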

Traits

I mentioned before that u32 is not the only type that implements this set of methods, this convention, this informal protocol of to_be_bytes, to_le_bytes, etc. This means that if we were writing in C++, we would have enough from this informal protocol to write a function that did something like “write this value in big endian twice, and little endian twice, to different locations” that was agnostic to the type provided, as long as it implemented this informal interface. It would look something like this:

template <typename T>
void write_four_times(T val) {
    write_to_location_1(val.to_be_bytes());
    write_to_location_2(val.to_be_bytes());
    write_to_location_3(val.to_le_bytes());
    write_to_location_4(val.to_le_bytes());
}

This would allow you to call write_four_times on any type for which that code made sense, as C++ templates are literally templates, and the T is filled in before type-checking. The protocol here is implicit in the structure of the function – it is compile-time duck typing.

Rust generic functions are type-checked before monomorphization, so we can’t do this in Rust. Instead of defining to_le_bytes() and friends separately on each type, this function would require them to be in a trait, maybe EndianBytes:

fn write_four_times<T: EndianBytes>(val: T) {
    write_to_location_1(&val.to_be_bytes());
    write_to_location_2(&val.to_be_bytes());
    write_to_location_3(&val.to_le_bytes());
    write_to_location_4(&val.to_le_bytes());
}

EndianBytes would have to define at least those methods:

trait EndianBytes {
    fn to_be_bytes(self) -> [u8; ???];
    fn to_le_bytes(self) -> [u8; ???];
}

Unfortunately, as the ??? shows, the different output arrays have different lengths – a u16 would be 2 bytes and a u64 8 bytes – and so the Rust trait system at the time of this writing is (to my knowledge) not powerful enough to represent this trait as is. Instead, it would have to return a slice, which introduces an additional run-time value (the length) into the mix that we’d rather avoid in this exercise on compile-time generic programming.

Run-Time Endianness

What if we want to make decisions about endianness at run-time, say, because we are implementing DBus? This is, as Linus Torvalds pointed out in one of his famously angry emails, a stupid idea for a protocol, but we don’t always get to choose what protocol we implement. Even though choosing one endianness and sticking to it would have avoided the run-time cost of making a decision (which as Torvalds points out is more than the cost of either decision), the developers of DBus did not do that. UTF-16 also didn’t – it also does run-time endianness adjustment, with a sentinel character at the top of the text block to indicate the endianness.
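To illustrate the UTF-16 case, here is a small sketch of a decoder that reads the sentinel (the byte order mark, U+FEFF) and picks the endianness at run-time. The function names are made up for this example, and input without a byte order mark is simply rejected rather than guessed at:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Endian {
    Little,
    Big,
}

// U+FEFF is stored as FE FF in big endian and FF FE in little endian.
fn detect_bom(bytes: &[u8]) -> Option<Endian> {
    match bytes {
        [0xFE, 0xFF, ..] => Some(Endian::Big),
        [0xFF, 0xFE, ..] => Some(Endian::Little),
        _ => None,
    }
}

fn decode_utf16(bytes: &[u8]) -> Option<String> {
    let endian = detect_bom(bytes)?;
    let units: Vec<u16> = bytes[2..]
        .chunks_exact(2) // a trailing odd byte is ignored in this sketch
        .map(|pair| match endian {
            Endian::Big => u16::from_be_bytes([pair[0], pair[1]]),
            Endian::Little => u16::from_le_bytes([pair[0], pair[1]]),
        })
        .collect();
    String::from_utf16(&units).ok()
}

fn main() {
    let big = [0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69];
    let little = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00];
    println!("{:?}", decode_utf16(&big)); // Some("Hi")
    println!("{:?}", decode_utf16(&little)); // Some("Hi")
}
```

Note that this version makes the endianness decision again for every code unit, which is exactly the kind of repeated run-time decision discussed in the rest of this section.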

The most obvious solution is to use the run-time parameterized version we discussed towards the beginning of this post, and have an enum Endianness parameter. This would be parsed in each message (or connection, or whatever duration of time endianness is configured) and then passed through to all the serializing and deserializing code, which would look something like our original serialization example:

fn serialize(&self, endianness: Endian) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
    res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
    res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
    res
}

pub fn serialize_old_version(&self) -> [u8; 12] {
    self.serialize(Endian::Big)
}

pub fn serialize_new_version(&self) -> [u8; 12] {
    self.serialize(Endian::Little)
}

We can do better than that, though. This has one copy of the serialization code in the source, and one copy in the binary. What we could do instead, is expand the more sophisticated compile-time version of the serialization code, and move the match into a wrapper serialize method:

fn serialize_impl<T: EndiannessTrait>(&self) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
    res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
    res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
    res
}

pub fn serialize(&self, endianness: EndiannessEnum) -> [u8; 12] {
    match endianness {
        EndiannessEnum::Big => self.serialize_impl::<BigEndian>(),
        EndiannessEnum::Little => self.serialize_impl::<LittleEndian>(),
    }
}

This generates two serializers from one serializer function (thus mitigating the biggest problem with code duplication – that of maintainability), and makes the run-time decision further up in the call tree. This ability – to adjust between finer-grained run-time decisions and duplication of run-time code – is one of the greatest powers of C++ and of Rust. We can effectively – in the DBus case – create two entire DBus deserializers – one for little-endian, one for big endian – and then decide between the two deserializers at run-time on a per-message basis, which, because fewer run-time decisions are being made, will be much more efficient than making the run-time deserialization decision at every deserialization site.
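Here is a self-contained sketch of that arrangement. To keep it short, the policy trait here has a single hypothetical method, word_bytes, that wraps the standard to_be_bytes and to_le_bytes instead of the swap-based policy from earlier:

```rust
trait EndiannessTrait {
    fn word_bytes(value: u32) -> [u8; 4];
}

struct BigEndian;
impl EndiannessTrait for BigEndian {
    fn word_bytes(value: u32) -> [u8; 4] {
        value.to_be_bytes()
    }
}

struct LittleEndian;
impl EndiannessTrait for LittleEndian {
    fn word_bytes(value: u32) -> [u8; 4] {
        value.to_le_bytes()
    }
}

#[derive(Clone, Copy)]
enum EndiannessEnum {
    Big,
    Little,
}

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    // Written once, compiled once per policy type.
    fn serialize_impl<T: EndiannessTrait>(&self) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&T::word_bytes(self.a));
        res[4..8].copy_from_slice(&T::word_bytes(self.b));
        res[8..12].copy_from_slice(&T::word_bytes(self.c));
        res
    }

    // The only run-time decision: which compiled copy to call.
    pub fn serialize(&self, endianness: EndiannessEnum) -> [u8; 12] {
        match endianness {
            EndiannessEnum::Big => self.serialize_impl::<BigEndian>(),
            EndiannessEnum::Little => self.serialize_impl::<LittleEndian>(),
        }
    }
}

fn main() {
    let s = Structure { a: 1, b: 2, c: 3 };
    println!("{:?}", s.serialize(EndiannessEnum::Big)); // [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3]
    println!("{:?}", s.serialize(EndiannessEnum::Little)); // [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
}
```

One match per structure replaces one match per field, and in a larger protocol it could be one match per message.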

Of course, for serialization we can simply write one serializer and always generate little-endian DBus messages.

C++ Move Semantics Considered Harmful (Rust is better)

2021-11-03T00:00:00+00:00

This post is part of my series comparing C++ to Rust, which I introduced with a discussion of C++ and Rust syntax. In this post, I discuss move semantics. This post is framed around the way moves are implemented in C++, and the fundamental problem with that implementation. With that context, I shall then explain how Rust implements the same feature. I know that move semantics in Rust are often confusing to new Rustaceans – though not as confusing as move semantics in C++ – and I think an exploration of how move semantics work in C++ can be helpful in understanding why Rust is designed the way it is, and why Rust is a better alternative to C++.

I am by far not the first person to discuss this topic, but I intend:

to discuss it thoroughly enough to contribute to the conversation
to nevertheless discuss it in such a way that those familiar with systems programming, but unfamiliar with either C++ or move semantics, can understand it, starting from first principles

Modern C++

First, some background.

In 2011, C++ finally fixed a set of long-standing deficits in the programming language with the shiny new C++11 standard, bringing it into the modern era. Programmers enthusiastically pushed their companies to allow them to migrate their codebases, champing at the bit to be able to use these new features. Writers to this day talk about “modern C++,” with the cut-off being 2011. Programmers who only used C++ pre-C++11 are told that it is a new programming language, the best version of its old self, worth a complete fresh try.

There were a lot of new features to be excited about. C++ standard threads were added then – and thread standardization was indeed good, though anyone who wanted to use threads before likely had their choice of good libraries for their platform. Closures were also very exciting, especially for people like me who came from functional programming, but to be honest, closures were just syntactic sugar for existing patterns of boilerplate that could be readily used to write function objects.

Indeed, the real excitement at the time, certainly the one my colleagues and I were most excited about, was move semantics. To explain why this feature was so important, I’ll need to talk a little about the C++ object model, and the problem that move semantics exist to solve.

Value Semantics

Let’s start by talking about a primitive type in C++: int. Objects – in C++ standard parlance, int values are indeed considered objects – of type int only take up a few bytes of storage, and so copying them has always been very cheap. When you assign an int from one variable to another, it is copied. When you pass it to a function, it is copied:

void print_i(int arg) {
    arg += 3;
    std::cout << arg << std::endl;
}

int foo = 3;
int bar = foo; // copy
foo += 1; // foo gets 4
std::cout << bar << std::endl; // bar is still 3
print_i(foo); // prints 4+3 ==> 7
std::cout << foo << std::endl; // foo is still 4

As you can see, each variable of type int acts independently of the others when mutated, which is how primitive types like int work in many programming languages.

In the C++ version of object-oriented programming, it was decided that values of custom, user-defined types would have the same semantics, that they would work the same way as the primitive types. So for C++ strings:

std::string foo = "foo";
std::string bar = foo; // copy (!)
foo += "__";
bar += "!!";
std::cout << foo << std::endl; // foo is "foo__"
std::cout << bar << std::endl; // bar is "foo!!"

This means that whenever we assign a string to a new variable, or pass it to a function, a copy is made. This is important, because the std::string object proper is just a handle, a small structure that manages a larger memory allocation on the heap, where the actual string data is stored. Each new std::string that is made via copy requires a new heap allocation, a relatively expensive operation.

This would cause a problem when we want to pass a std::string to a function, just like an int, but don’t want to actually make a copy of it. But C++ has a feature that helps with that: const references. Details of the C++ reference system are a topic for another post, but const references allow a function to operate on the std::string without the need for a copy, but still promising not to change the original value.

The feature is available for both int and std::string; the principle that they’re treated the same is preserved. But for the sake of performance, ints are passed by value, and std::strings are passed by const reference in the same situation. This dilutes the benefit of treating them the same, as in practice the function signatures are different if we don’t want to trigger spurious expensive deep copies:

void foo(int bar);
void foo(const std::string &bar);

If you instead declare the function foo like you would with an int, you get a poorly performing deep copy. The default is something you probably don’t want:

void foo(std::string bar);
void foo2(const std::string &bar);
std::string bar("Hi"); // Make one heap allocation
foo(bar); // Make another heap allocation
foo2(bar); // No copy is made

This is all part of “pre-modern” C++, but already we’re seeing negative consequences of the decision to treat int and std::string as identical when they are not, a decision that will get more gnarly when applied to moves. This is why Rust has the Copy trait to mark types like i32 (the Rust equivalent of int) as being copyable, so that they can be passed around freely, while requiring an explicit call to clone() for types like String so we know we’re paying the cost of a deep copy, or else an explicit indication that we’re passing by reference:

fn foo(bar: String) {
    // Implementation
}

fn foo2(bar: &str) {
    // Implementation
}

let bar = "hi".to_string();
foo(bar.clone());
foo2(&bar);

The third option in Rust is to move, but we’ll discuss that after we discuss moves in C++.
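As a small check of the Copy versus clone() distinction described above, this sketch compares buffer addresses to show that a cloned String really does own a separate heap allocation. The helper name clone_has_own_buffer is made up for the example:

```rust
// True if cloning the text produced a separate heap buffer.
// (Meaningful for non-empty input, where an allocation is required.)
fn clone_has_own_buffer(original: &str) -> bool {
    let copy = original.to_owned();
    copy.as_ptr() != original.as_ptr()
}

fn main() {
    // i32 is Copy: assignment silently duplicates the value.
    let a: i32 = 3;
    let mut b = a;
    b += 1;
    assert_eq!((a, b), (3, 4));

    // String is not Copy: the deep copy has to be spelled out.
    let s = String::from("foo");
    let mut t = s.clone();
    t.push_str("!!");
    assert_eq!(s, "foo");
    assert_eq!(t, "foo!!");

    // The clone lives in its own allocation.
    assert!(clone_has_own_buffer(&s));
    println!("{} {}", s, t); // foo foo!!
}
```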

Copy-Deletes and Moves

C++ value semantics break down even more when we do need the function to hold onto the value. References are only valid as long as the original value is valid, and sometimes a function needs it to stay alive longer. Taking by reference is not an option when the object (whether int or std::string) is being added to a vector that will outlive the original object:

std::vector<int> vi;
std::vector<std::string> vs;
{
    int foo = 3;
    foo += 4;
    vi.push_back(foo);
} // foo goes out of scope, vi lives on
{
    std::string bar = "Hi!";
    bar += " Joe!";
    vs.push_back(bar);
} // bar goes out of scope, vs lives on

So, to add this string to the vector, we must first make an allocation corresponding to the object contained in the variable bar, and then must make a new allocation for the object that lives in vs, and then copy all the data.

Then, when bar goes out of scope, its destructor is called, as is done automatically whenever an object with a destructor goes out of scope. This allows std::string to free its heap allocation.

Which means we copied an allocation into a new heap allocation, just to free the original allocation. Copying an allocation and freeing the old one is equivalent to just re-using the old allocation, just slower. Wouldn’t it make more sense to make the string in the vector just refer to the same heap allocation that bar formerly did?

Such an operation is referred to as a “move,” and the original C++ – pre C++11 – didn’t support them. This was possibly because they didn’t make sense for ints, and so they were not added for objects that were trying to act like ints – but on the other hand, destructors were supported and ints don’t need to be destructed.

In any case, moves were not supported. And so, objects that managed resources – in this case, a heap allocation, but other resources could apply as well – could not be put onto vectors or stored in collections directly without a copy and delete of whatever resource was being managed.

Now, there were ways to handle this in pre-C++11 days. You could add an indirection, and make a separate heap allocation to contain the std::string object itself, which is only a small object with a pointer to another allocation. That would at least let you pass around a std::string *, a raw pointer, which would not trigger all these copies, because a raw pointer does not automatically manage the heap allocation behind this façade of value semantics. Or you could manually manage a C-style string with char *.

But the most ergonomic, clear std::vector<std::string> could not be used without performance degradation. Worse, if the vector ever needed to be resized, and had to itself switch to a different allocation, it would have to copy all those std::string objects internally and delete the originals, N useless reallocations.

As a demonstration of this, I wrote a sample program with a vastly simplified version of std::string, that tracks how many allocations it makes. It allows C++11-style moves to be enabled or disabled, and then it takes all the command line arguments, creates string objects out of them, and puts them in a vector. For 8 command line arguments, the version with move made, as you might expect, 8 allocations, whereas the version without the move, that just put these strings into a vector, made 23. Each time a string was added to a vector, a spurious allocation was made, and then N spurious allocations had to be made each time the vector doubled.
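The arithmetic behind those two numbers can be sketched with a small model. This is not the sample program itself, only a count under two assumptions of mine: the vector starts with no capacity and doubles (1, 2, 4, 8, ...) when full, and only the strings’ allocations are counted, not the vector’s own buffer:

```rust
// String allocations made when n strings are created and then
// copied (not moved) into a vector that doubles when full.
fn allocations_without_move(n: usize) -> usize {
    let mut total = 0;
    let mut capacity = 0;
    let mut len = 0;
    for _ in 0..n {
        total += 1; // the original string
        if len == capacity {
            total += len; // growing copies every element already stored
            capacity = if capacity == 0 { 1 } else { capacity * 2 };
        }
        total += 1; // the copy made by pushing onto the vector
        len += 1;
    }
    total
}

// With moves, each string keeps the one allocation it started with.
fn allocations_with_move(n: usize) -> usize {
    n
}

fn main() {
    println!("{}", allocations_with_move(8)); // 8
    println!("{}", allocations_without_move(8)); // 23
}
```

For 8 strings that is 8 originals, 8 copies on insertion, and 1 + 2 + 4 copies across the three times the vector grows past a non-empty buffer.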

This problem is purely an artifact of the limitations of the tools provided by C++ to encapsulate and automatically manage memory, RAII and “value semantics.”

Consider this snippet of code:

// Pre-C++11, without moves
std::vector<std::string> vec;
{ // This might take place inside another function
  // Using local block scope for simplicity
    std::string foo = "Hi!";
    vec.push_back(foo);
}
{
    std::string bar = "Hello!";
    vec.push_back(bar);
}
// Use the vector

If we didn’t use this string class, we would not have made a copy just to free the original allocation. We would have simply put the pointer into the vector. We would then have been responsible for freeing all the allocations – once – when we’re done:

// Manually written equivalent
std::vector<char *> vec;
{
    // strdup, a POSIX call, makes a new allocation and copies a
    // string into it, here used to turn a static string into one
    // on the heap. We will assume we have a reason to store it
    // on the heap -- perhaps we did more manipulation in the
    // real application to generate the string.

    // The allocation is necessary to be the direct equivalent of
    // `vec.push_back("Hi")` or even `vec.emplace_back("Hi")` for
    // a `std::vector<std::string>`, because that data structure has
    // the invariant that all strings in the vector must have their
    // own heap allocation (assuming no small string optimization,
    // which many strings are ineligible for).

    char *foo = strdup("Hi!");
    vec.push_back(foo);
}
{
    char *bar = strdup("Hello!");
    vec.push_back(bar);
}

// Use the vector

// Then, later, when we are done with the vector, free all the elements once
for (char *c: vec) {
    free(c);
}

The copy version of the C++ code instead does – after de-sugaring the RAII and value semantics and inlining – something that no programmer would ever write manually, something equivalent to this:

// Desugaring of pre-C++11 version of code
std::vector<char *> vec;
{
    char *foo = strdup("Hi!");
    vec.push_back(strdup(foo)); // Why the additional allocate-and-copy?
    free(foo); // Because the destructor of foo will free the original
}
{
    char *bar = strdup("Hello!");
    vec.push_back(strdup(bar));
    free(bar);
}

// Use the vec
for (char *c: vec) {
    free(c);
}

C++ without move semantics fails to reach its goal of zero-cost abstraction. The version with the abstraction, with the value semantics, compiles to code less efficient than any code someone would write manually, because what we really want is to make the allocation while it’s a local variable foo, use the same allocation in the vector, and then only free it once, when the vector is done with it.

The abstractions of only supporting “copy” and “destruct” mean that the destructor of the variable foo must be called when foo goes out of scope. This means that the “copy” operation must make an independent allocation, as it cannot control when the original goes out of scope, or will be replaced with another value. If we had instead re-used the same allocation, it would be freed by foo’s destructor.

But copying just to destroy the original is silly – silly and ill-performant. What any programmer would naturally write in that situation results in a “move”. So this gap – and it was a huge gap – in C++ value semantics was filled in C++11 when they added a “move” operation.

Because of this addition, using objects with value semantics that managed resources became possible. It also became possible to use objects with value semantics for resources that could not meaningfully be copied, like unique ownership of an object or a thread handle, while still being able to get the advantages of putting such objects in collections and, well, moving them. Shops that previously had to work around value semantics for performance reasons could now use them directly.

It is not, therefore, surprising that this was for many the most exciting change in C++11.

How Move Is Implemented in C++

But for now, let’s put ourselves in the place of the language designers who designed this new move operation. What should this move operation look like? How could we integrate it into the rest of C++?

Ideally, we would want it to output – after inlining – exactly the code that we would expect to write manually. When foo is moved into the vector, the original allocation must not be freed. Instead, it is only freed when the vector itself is freed. This is an absolute necessity to solve the problem, as we must remove a free in order to remove the allocation, but we also cannot leak memory. If there is to be exactly one allocation, there must be exactly one deallocation.

Calls to free (or delete[] in my example program) are made in the destructor, so the most straightforward approach is to say that the destructor should only be called when the vector is destroyed, but not when foo goes out of scope. If foo is moved onto the vector, then the compiler should take note that it has been moved from, and simply not call the destructor. The move should be treated as having already destroyed the object, as an operation that accomplishes both initialization of the new object (the string on the vector) from the original object and the destruction of the original object.

This notion is called “destructive move,” and it is how moves are done in Rust, but it is not what C++ opted for. In Rust, the compiler would simply not output a destructor call (a “drop” in Rust) for foo because it has been moved from. But, in fact, the C++ compiler still does. In destructive move semantics, the compiler would not allow foo to be read from after the move, but in fact, the C++ compiler still does, not just for the destructor, but for any operation.

So how is the deallocation avoided, if the compiler doesn’t remove it in this situation? Well, there is a decision to make here. If an object has been moved from, no deallocation should be performed. If it has not, a deallocation should be performed. Rust makes this decision at compile-time (with rare exceptions where it has to add a “drop flag”), but C++ makes it at run-time.

When you write the code that defines what it means to move from an object in C++, you must make sure the original object is in a run-time state where the destructor will still be called on it, and will still succeed. And, since we established already that we must save a deallocation by moving, that means that the destructor must make a run-time decision as to whether to deallocate or not.

The more C-style post-inlining code for our example would then look something like this:

// Desugaring of the C++11 (move) version of the code
std::vector<char *> vec;
{
    char *foo = strdup("Hi");
    vec.push_back(foo);
    foo = nullptr;
    if (foo != nullptr) {
        free(foo);
    }
}
{
    char *bar = strdup("Hello!");
    vec.push_back(bar);
    bar = nullptr;
    if (bar != nullptr) {
        free(bar);
    }
}

This null check is hidden by the fact that in C++, free and delete and friends are defined to be no-ops on null, but it still exists. And while the check might be very cheap compared to the cost of calling free, it might not be cheap when things are moved in a tight loop, where free is never actually called. That is to say, this run-time check is not cheap compared to the cost of not calling free.

So, given the semantics of move in C++, it results in code that is not the same as – and not as performant as – the equivalent hand-written C-style code, and therefore it is not a zero-cost abstraction, and doesn’t live up to the goals of C++.

Now, it looks like the optimizer should be able to clean up an assignment to null that is immediately followed by a check for null, but not all examples are as simple as this one, and, as in many situations where the abstraction relies on the optimizer, the optimizer doesn’t always get it.

Arguing Semantics

But that performance hit is small, and it is usually possible to optimize out. If that were the only problem with C++ move semantics, I might find it annoying, but ultimately I’d say, as I would about many things in both C++ and Rust, something like: Well, this decision was made, remember to profile, and if you absolutely have to make sure the optimizer got it in a particular instance, check the assembly by hand.

But there are a few further consequences of that decision.

First off, the resource might not be a memory allocation, and null pointers might not be an appropriate way to indicate that that resource doesn’t exist. This responsibility of having some run-time indication of what resources need to be freed – rather than a one-to-one correspondence between objects and resources – is left up to the implementors of classes. For heap allocations, it is made relatively easy, but the implementor of the class is still responsible for re-setting the original object. In my example, the move constructor reads:

string(string &&other) noexcept {
    m_len = other.m_len;
    m_str = other.m_str;
    other.m_str = nullptr; // Don't forget to do this
}

The move constructor has two responsibilities, where a destructive version would only have one: It must set up state for the new object, and it must set up a valid “moved from” state for the old object. That second obligation is a direct consequence of non-destructive moves, and provides the programmer with another chance to mess something up.
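
To show both responsibilities next to the destructor that depends on them, here is a minimal sketch. OwnedString is a hypothetical stand-in for the string class in my example program, and the holds_allocation and size accessors are assumptions added only so the moved-from state can be observed.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical stand-in for the example string class.
class OwnedString {
public:
    explicit OwnedString(const char *s)
        : m_len(std::strlen(s)), m_str(new char[m_len + 1]) {
        std::memcpy(m_str, s, m_len + 1);
    }

    OwnedString(OwnedString &&other) noexcept
        : m_len(other.m_len), m_str(other.m_str) { // 1. set up the new object
        other.m_len = 0;                           // 2. leave the old object in a
        other.m_str = nullptr;                     //    valid "moved from" state
    }

    OwnedString(const OwnedString &) = delete;
    OwnedString &operator=(const OwnedString &) = delete;

    // Still runs for moved-from objects. delete[] on a null pointer is a
    // no-op, which is where the run-time decision hides.
    ~OwnedString() { delete[] m_str; }

    // Accessors added for illustration only.
    bool holds_allocation() const { return m_str != nullptr; }
    std::size_t size() const { return m_len; }

private:
    std::size_t m_len;
    char *m_str;
};
```

Forgetting step 2 would still compile, and both objects would then free the same buffer.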

In fact, since destructive moves can almost always be implemented by just copying the memory (and leaving the original memory as garbage data as the destructor will not be called on it), a default move constructor would correctly cover the vast majority of implementations, creating even fewer opportunities to introduce bugs.

But in C++, the moved-from state also has obligations. The destructor has to know at run-time not to reclaim any resources if the object no longer has any, but in general, there is no rule that moved-from objects must immediately be destroyed. The programming language has explicitly decided not to enforce such a rule, and so, to be properly safe, moved-from objects must be considered – and must be – valid values for those objects.

This means that any object that manages a resource now must manage either 1 or 0 copies of that resource. Collections are easy – moved-from collections can be made equivalent to the “empty” collection that has no elements. For things like thread handles or file handles, this means that you can have a file handle with no corresponding file. Optionality is imported into all “value types.”

So, smart pointer types that manage single-ownership heap allocations, or any sort of transferrable ownership of heap allocations, now of necessity must be nullable. Nullable pointers are a serious cause of errors, as often they are used with the implicit contract that they will not be null, but that contract is not actually represented in the type. Every time a nullable pointer is passed around, you have a potential miscommunication of whether nullptr is a valid value, one that will cause some sort of error condition, or one that may lead to undefined behavior.

C++ move semantics of necessity perpetuate this confusion. Non-nullable smart pointers are unimplementable in C++, at least if you want them to be movable as well.

Move, Complicatedly

This leads me to Herb Sutter’s explanation of C++ move semantics from his blog. I respect Herb Sutter greatly as someone explaining C++, and his materials helped me learn C++ and teach it. An explanation like this is really useful if programming in C++ is what you have to do.

However, I am instead investigating whether C++’s move semantics are reasonable, especially in comparison to programming languages like Rust which do have a destructive move. And from that point of view, I think this blog post, and its necessity, serve as a good illustration of the problems with C++’s move semantics.

I shall respond to specific excerpts from the post.

C++ “move” semantics are simple, and unchanged since C++11. But they are still widely misunderstood, sometimes because of unclear teaching and sometimes because of a desire to view move as something else instead of what it is.

Given the definition he’s about to give of C++ move semantics, I think this is unfair. The goal of move is clear: to allow resources to be transferred when copying would force them to be duplicated. It is obvious from the name. However, the semantics as the language defines them, while enabling that goal, are defined without reference to that goal.

This is doomed to lead to confusion, no matter how good the teaching is. And it is desirable to try to understand the semantics as they connect to the goal of the feature.

To explain what I mean, see the definition he then gives for moving:

In C++, copying or moving from an object a to an object b sets b to a’s original value. The only difference is that copying from a won’t change a, but moving from a might.

This is a fair statement of C++’s move semantics as defined. But it has a disconnect with the goals.

In this definition, we are discussing the assignment written as b = a or as b = std::move(a). The reason why moving might change a, as we’ve discussed, is that a might contain a resource. Moving indicates that we do not wish to copy resources that are expensive or impossible to copy, and that in exchange for this ability, we give up the right to expect that a retain its value.

This definition is the correct one to use for reasoning about C++ programs, but it is not directly connected to why you might want to use the feature at all. It is natural that programmers would want to be able to reason about a feature in a way that aligns with its goals.

The goal of this post is to obscure the goal, and to treat move as if it were a pure optimization of copy, which will not help a programmer understand why a’s value might change, or why move-only types like std::unique_ptr exist.

The explanation of the goal of this operation is reserved in this post for the section entitled “advanced notes for type implementors”.

Of course, almost all C++ programmers in a sufficiently large project have to become “type implementors” to understand and maintain custom types, if not to write fresh implementations of them, so I think most professional programmers should be reading these notes, and so I think it’s unfair to call them advanced. But beyond that, this explanation is core to why the operation exists, and the only explanation for why move-only types exist, which all C++ programmers will have to use:

For types that are move-only (not copyable), move is C++’s closest current approximation to expressing an object that can be cheaply moved around to different memory addresses, by making at least its value cheap to move around.

He follows up with an acknowledgement that destructive moves are a theoretical possibility:

(Other not-yet-standard proposals to go further in this direction include ones with names like “relocatable” and “destructive move,” but those aren’t standard yet so it’s premature to talk about them.)

For his purposes, this is extremely fair, but since my purposes are to compare C++ to Rust and other programming languages which have destructive moves, it is not premature for me to talk about them.

This gets more interesting in the Q&A.

How can moving from an object not change its state?

For example, moving an int doesn’t change the source’s value because an int is cheap to copy, so move just does the same thing as copy. Copy is always a valid implementation of move if the type didn’t provide anything more efficient.

Indeed, for reasons of consistency and generic programming, move is defined on all types that can be moved or copied, even types that don’t implement move differently than copy.

What makes this confusing in C++, however, is that types that manage resources might be written without an implementation of move. They might pre-date the move feature, or their implementor might not have understood move well enough to implement them, or there might be a technical reason why moving couldn’t be implemented in a way that elides the resource duplication. For these types, a move falls back on a copy, even if the copy does significant work. This can be surprising to the programmer, and surprises in programming are never good. More direly, there is no warning when this happens, because the notion of resource management is not referenced in the semantics.
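
A minimal sketch of that silent fallback, assuming a hypothetical type LegacyBuffer that was written with only a copy constructor (the copies counter is likewise an assumption, added so the copy can be observed):

```cpp
#include <utility>

// Hypothetical type that pre-dates move semantics.
struct LegacyBuffer {
    static int copies; // illustration only: counts copy-constructor calls

    LegacyBuffer() = default;
    LegacyBuffer(const LegacyBuffer &) { ++copies; }
    // No move constructor is written, and declaring the copy constructor
    // stops the compiler from generating one.
};
int LegacyBuffer::copies = 0;

// Returns how many copies a "move" of a LegacyBuffer performs.
int copies_made_by_move() {
    LegacyBuffer::copies = 0;
    LegacyBuffer a;
    LegacyBuffer b(std::move(a)); // compiles without a warning, but calls the copy constructor
    (void)b;
    return LegacyBuffer::copies;
}
```

The call site reads as a move, and nothing in the language tells the programmer that it is not one.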

In Rust, a move is always implemented by copying the data in the object itself and then not destructing the original object, and never by copying resources managed by the object, or running any custom code.

But what about the “moved-from” state, isn’t it special somehow?

No. The state of a after it has been moved from is the same as the state of a after any other non-const operation. Move is just another non-const function that might (or might not) change the value of the source object.

I disagree in practice. For objects that use move as intended, to avoid copying resources, move will (at least usually) drain its resource. This means that an object that often manages a resource will enter a state in which it is not managing a resource. That state is special, because it is the state when a resource-managing object is doing something other than its normal job, and is not managing a resource. This is not a “special state” by any rigorous definition, but is guaranteed to be intuitively special by virtue of being resource-free. (It is also a special state in that the value is unspecified in general, whereas most of the time, the value is specified.)

Collections can, as I said before, get away with becoming the empty collection in this scenario, but even for those, the empty state is special: It is the only state that can be represented without holding a resource. And many other types of objects cannot even do this. std::unique_ptr’s moved-from state is the null pointer, and without these move semantics, it would be possible to design a std::unique_ptr that did not have a null state.

Once std::unique_ptr is forced to be allowed to have null values, it makes sense that there be other ways to create a null std::unique_ptr, e.g. by default-constructing it. But it is the design of move semantics that forces it to have a null value in the first place.
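
Both routes to a null std::unique_ptr can be seen in a short sketch. The function names are hypothetical; that a moved-from std::unique_ptr is null is something the standard library does specify for this type.

```cpp
#include <memory>
#include <utility>

// A default-constructed unique_ptr owns nothing.
bool default_constructed_is_null() {
    std::unique_ptr<int> p;
    return p == nullptr;
}

// A moved-from unique_ptr owns nothing either: the same null state,
// reached through move semantics.
bool moved_from_is_null() {
    std::unique_ptr<int> a = std::make_unique<int>(42);
    std::unique_ptr<int> b = std::move(a);
    return a == nullptr && b != nullptr && *b == 42;
}
```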

Put another way: std::unique_ptr and thread handles are therefore collections of 0 or 1 heap allocation handles or thread handles, and once defined that way, the “empty” state is not special, but it is move semantics that force them to be defined that way.

Does “but unspecified” mean the object’s invariants might not hold?

No. In C++, an object is valid (meets its invariants) for its entire lifetime, which is from the end of its construction to the start of its destruction…. Moving from an object does not end its lifetime, only destruction does, so moving from an object does not make it invalid or not obey its invariants.

This is true, as discussed above. The moved-from object must be able to be destructed, and there is nothing stopping a programmer from instead doing something else with it. Given that, it must be in some state that its operations can reckon with. But that state is not necessarily one that would be valid if move semantics didn’t force its conclusion, and so again, we are close to the problem.

Does “but unspecified” mean the only safe operation on a moved-from object is to call its destructor?

No.

Does “but unspecified” mean the only safe operation on a moved-from object is to call its destructor or to assign it a new value?

No.

Does “but unspecified” sound scary or confusing to average programmers?

It shouldn’t, it’s just a reminder that the value might have changed, that’s all. It isn’t intended to make “moved-from” seem mysterious (it’s not).

I disagree firmly with the answer to the last question. “Unspecified” values are extremely scary, especially to programmers on team projects, because it means that the behavior of the program is subject to arbitrary change, but that change will not be considered breaking.

For example, std::string does not make any promises about the contents of a moved-from string. However, a programmer – even a senior programmer – may, instead of consulting the documentation, write a test program to find out what the value is of a moved-from string. Seeing an empty string, the programmer might write a program that relies on the string being empty:

std::vector<std::string>
split_into_chunks(const std::string &in) {
    int count = 0;
    std::vector<std::string> res;
    std::string acc;
    for (char c: in) {
        if (count == 4) {
            res.push_back(std::move(acc));
            // Don't need to clear string.
            // I checked and it's empty.
            count = 0;
        }
        acc += c;
        ++count;
    }
    if (count > 0) {
        res.push_back(std::move(acc)); // Final, possibly short, chunk
    }
    return res;
}

Of course, you should not do that. A later version of std::string might implement the small string optimization, where strings below a certain size are not stored in an expensive-to-copy heap resource, but in the actual object itself. In that situation, it would be reasonable to implement move as a copy, which is allowed, and then this program would no longer do the same thing.

But this is a surprise. This is a result of the “unspecified value.” And so while it may, strictly speaking, be “safe” to do things with a moved-from object other than destruct it or assign to it, in practice, without documentation to the contrary making stronger guarantees, the only way to get “not surprising” behavior is to greatly limit what you do with moved-from objects.
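
A version of the chunking function that does not depend on the unspecified value might look like the following sketch. The chunk_size parameter and its default of 4 are assumptions for illustration; the point is the explicit clear() after the move.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits `in` into pieces of `chunk_size` characters; the last piece may be shorter.
// A chunk_size of 0 never matches, so the whole input comes back as one piece.
std::vector<std::string>
split_into_chunks(const std::string &in, std::size_t chunk_size = 4) {
    std::vector<std::string> res;
    std::string acc;
    for (char c : in) {
        acc += c;
        if (acc.size() == chunk_size) {
            res.push_back(std::move(acc));
            acc.clear(); // moved-from: give it a specified value before reusing it
        }
    }
    if (!acc.empty()) {
        res.push_back(std::move(acc)); // acc is not used again after this
    }
    return res;
}
```

Calling clear() on a moved-from std::string is fine, since it has no preconditions, and afterwards the string is empty no matter how move happens to be implemented.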

What about objects that aren’t safe to be used normally after being moved from?

They are buggy….

By this definition, std::unique_ptr should likely be considered buggy, as null pointers cannot be used “normally”. Similarly, a std::thread object that does not represent a thread handle. It is only by stretching the definition of “used normally” to include these special “empty values” that std::unique_ptr gets to claim to not be buggy under that definition, although a null pointer simply cannot be used the way a normal pointer can.

Again, this attitude, that a null pointer is a normal pointer, that an empty thread handle is a normal type of thread handle, is adaptive to programming C++. But it will inevitably exist in a programmer’s blind spot, as null pointers always have. The “not null” invariant is often expressed implicitly. Many uses of std::unique_ptr rely on the pointer never being null, and simply leave this up to the programmer to ensure.

Herb Sutter himself discusses this:

Since the problem is that we are not expressing the “not null” invariant, we should express that by construction — one way is to make the pointer member a gsl::not_null<> (see for example the Microsoft GSL implementation) which is copyable but not movable or default-constructible.

In a programming language with destructive moves, it would be possible to have a smart pointer that was both “non-null” and movable. If we need both movability and the ability to express this invariant in the type system, well, C++ cannot help us.
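
To illustrate the trade-off, here is a sketch of a hypothetical NonNullBox that keeps the “not null” invariant in C++. It is not gsl::not_null and not a proposal for a real library type, only an assumption-laden illustration: because there is no valid empty state to leave behind, it cannot offer a cheap move, and a “move” has to be a deep copy.

```cpp
#include <string>
#include <utility>

// Hypothetical owning pointer whose invariant is "m_ptr is never null".
template <typename T>
class NonNullBox {
public:
    explicit NonNullBox(T value) : m_ptr(new T(std::move(value))) {}

    // Copying allocates a fresh object, so both boxes stay non-null.
    NonNullBox(const NonNullBox &other) : m_ptr(new T(*other.m_ptr)) {}
    NonNullBox &operator=(const NonNullBox &other) {
        if (this != &other) {
            *m_ptr = *other.m_ptr;
        }
        return *this;
    }

    // No move constructor: stealing other.m_ptr would leave other null,
    // and other's destructor and operator* must still work afterwards.
    ~NonNullBox() { delete m_ptr; }

    T &operator*() { return *m_ptr; }
    const T &operator*() const { return *m_ptr; }

private:
    T *m_ptr; // invariant: never null
};

// "Moving" a NonNullBox falls back to the copy constructor.
bool source_survives_move() {
    NonNullBox<std::string> a(std::string("Hi"));
    NonNullBox<std::string> b(std::move(a)); // a second heap allocation, not a transfer
    return *a == "Hi" && *b == "Hi";
}
```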

But what about a third option, that the class intends (and documents) that you just shouldn’t call operator< on a moved-from object… that’s a hard-to-use class, but that doesn’t necessarily make it a buggy class, does it?

Yes, in my view it does make it a buggy class that shouldn’t pass code review.

But in a sense, this is exactly what std::unique_ptr is. It has a special state where you cannot call its most important operator, the dereference operator. It only avoids being called buggy because it expands this state so it can be arrived at by other means.

Again, everything Herb Sutter says is true in a strict sense. It is memory-safe to use moved-from objects other than to destroy or assign to them, even if the move operation makes no further guarantees. It simply isn’t safe in a broader sense, in that it will have surprising, changeable behavior. It is true that the null pointer is a valid value of std::unique_ptr, but smart pointers that implement move are forced to have such a value.

And therefore, it should not be surprising that these questions come up. The misconceptions that Herb Sutter is addressing are an unfortunate consequence of the dissonance between the strict semantics of the programming language, where his statements are true, and the practical implications of how these features are used and are intended to be used, where the situation is more complicated.

Moves in Rust

So the natural follow-up question is, how does Rust handle move semantics?

First off, as mentioned before, Rust makes a special case for types that do not need move semantics, where the value itself contains all the information necessary to represent it, where no heap allocations or resources are managed by the value, types like i32. These types implement the special Copy trait, because for these types, copying is cheap, and is the default way to pass to functions or to handle assignments:

fn foo(bar: i32) {
    // Implementation
}

let var: i32 = 3;
foo(var); // copy
foo(var); // copy
foo(var); // copy

For types that are not Copy, such as String, the default function call uses move semantics. In Rust, when a variable is moved from, that variable’s lifetime ends early. The move replaces the destructor call at the end of the block, at compile time, which means it’s a compile-time error to write the equivalent code for String:

fn foo(bar: String) {
    // Implementation
}

let var: String = "Hi".to_string();
foo(var); // Move
foo(var); // Compile-Time Error
foo(var); // Compile-Time Error

Copy is a trait, but more entwined with the compiler than most traits. Unlike most traits, it has no methods, so you can’t customize what a copy does: it is always a plain copy of the value’s bytes. You can only opt in, usually with #[derive(Copy, Clone)], and only for types whose fields all implement Copy. Types like Box, that manage a heap allocation, do not implement Copy, and therefore structs that contain Box also cannot.

This is already an advantage to Rust. C++ pretends that all types are the same, even though they require different usage patterns in practice. You can pass a std::string by copy just like an int. Even if you have a vector of vectors of strings, you can pass by copy and that’s usually the default way to pass it – moves in many cases require explicit opt-in. For int it’s a reasonable default, but for collection types it isn’t, and in Rust the programming language is designed accordingly.
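
The C++ default can be observed with a small sketch. The function names are hypothetical, and comparing buffer addresses is just one assumed way to tell a deep copy from a move.

```cpp
#include <string>
#include <utility>
#include <vector>

// Takes its argument by value and reports whether it ended up holding
// the caller's original element buffer.
bool received_same_buffer(std::vector<std::string> v,
                          const std::string *original_data) {
    return v.data() == original_data;
}

// Passing an lvalue: the default is a silent deep copy into a new buffer.
bool copy_reuses_buffer() {
    std::vector<std::string> names{"Hi", "Hello!"};
    const std::string *data = names.data();
    return received_same_buffer(names, data);
}

// Moving requires explicit opt-in at the call site; the buffer is handed over.
bool move_reuses_buffer() {
    std::vector<std::string> names{"Hi", "Hello!"};
    const std::string *data = names.data();
    return received_same_buffer(std::move(names), data);
}
```

The two call sites differ only by std::move, and the expensive version is the one you get by writing less.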

If you want a deep copy, you can always explicitly ask for it with .clone():

fn foo(bar: String) {
    // Implementation
}

let var: String = "Hi".to_string();
foo(var.clone()); // Copy
foo(var.clone()); // Copy
foo(var);         // Move

What this actually does is create a clone, or a deep copy, and then move the clone, as foo takes its parameter by move, the default for non-Copy types.

What does a move in Rust actually entail? C++ implements moves with custom-written move constructors, which collections and other resource-managing types have to implement in addition to implementing copying (though automatic implementation is available if building out of other movable types). Rust requires implementations for clone, but for all moves, the implementation is the same: copy the memory in the value itself, and don’t call the destructor on the original value. And in Rust, all types are movable with this exact implementation – non-movable types don’t exist (though non-movable values do). The bytes encode information – such as a pointer – about the resource that the value is managing, and they must accomplish that in the new location just as well as they did in the old location.

C++ can’t do that, because in C++, the implementation of move has to mark the moved-from value as no longer containing the resource. How this marking works depends on the details of the type.

But even if C++ implemented destructive moves, some sort of “move constructor” or custom move implementation would still be required. C++, unlike Rust, does not require that the bytes contained in an object mean the same thing in any arbitrary location. The object could contain a reference to itself, or to part of itself, that would be invalidated by moving it. Or, there could be a data structure somewhere with a reference to it, that would need to be updated. C++ would have to give types an opportunity to address such things.
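
As a sketch of the self-referential case, assume two hypothetical buffer types whose cursor member points into their own storage. Copying the bytes leaves the cursor pointing at the old object, while a custom move constructor gets the chance to re-point it.

```cpp
#include <cstring>
#include <utility>

// Hypothetical plain struct: copying it copies its members as-is, which for
// this type amounts to copying its bytes.
struct RawBuffer {
    char storage[16];
    char *cursor; // meant to point into storage
};

bool bytewise_copy_keeps_cursor_internal() {
    RawBuffer a{};
    a.cursor = a.storage;
    RawBuffer b = a;              // b.cursor still points into a.storage
    return b.cursor == b.storage; // false: the self-reference was not updated
}

// Hypothetical fixed version: the move constructor repairs the self-reference.
struct FixedBuffer {
    char storage[16];
    char *cursor;

    FixedBuffer() : storage{}, cursor(storage) {}
    FixedBuffer(FixedBuffer &&other) noexcept : cursor(storage) {
        std::memcpy(storage, other.storage, sizeof storage);
    }
};

bool custom_move_keeps_cursor_internal() {
    FixedBuffer a;
    FixedBuffer b(std::move(a));
    return b.cursor == b.storage; // true: cursor points into b's own storage
}
```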

Safe Rust forbids these things. The lifetime of a value takes moves into account; you can’t move from a value unless there are no references to it. And in safe Rust, there is no way for the user to create a self-referential value (though the compiler can in its implementation of async – but only if the value is already “pinned,” which we will discuss in a moment).

But even in unsafe Rust, such things violate the principle of move. Moving is always safe, and unsafe Rust is always responsible for keeping safe code safe. As a result, Rust has a mechanism called “pinning” that indicates, in the type system, that a particular value will never move again, which can be used to implement self-referential values and which is used in async. The details are beyond the scope of this blog post, but it does mean that Rust can avoid the issue of move semantics for non-movable values without ruining the simplicity of its move semantics.

For these rare circumstances, the features of moving can be accomplished by indirection, and using a Box that points to a pinned value on the heap. And there is nothing stopping such types from implementing a custom function which effectively implements a custom move by consuming the pinned value, and outputs a new value, which can then be pinned in a different location. There is no need to muddy the built-in move operation with such semantics.

Practical Implications for C++ Programmers

So, obviously, in light of my blog series, I recommend using Rust over C++. For Rust users, I hope this clarifies why the move semantics are the way they are, and why the Copy trait exists and is so important.

But of course, not everyone has the choice of using Rust. There are a lot of large, mature C++ codebases that are well-tested and not going away anytime soon, and many programmers working on those codebases. For these programmers, here is some advice for the footgun that is C++ move semantics, both based on what we’ve discussed, and a few gotchas that were out of the scope of this post:

Learn the difference between rvalue, lvalue, and forwarding references. Learn the rules for how passing by value works in modern C++. These topics are out of the scope of this blog post, but they are core parts of C++ move semantics and especially how overloading is handled in situations where moves are possible. Scott Meyers’s Effective Modern C++ is an excellent resource.
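Move constructors and assignment operators should always be noexcept. Otherwise, for copyable types, std::vector and many other library utilities will quietly copy instead of move wherever they need to preserve their exception guarantees, for example when a std::vector reallocates its storage. There is no warning for this.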
The only sane things to do with most moved-from objects are to immediately destroy them or reset their values. Comment about this in your code! If the class specifically defines that moved-from values are empty or null, note that in a comment too, so that programmers don’t get the impression that there are any guarantees about moved-from values in general.
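
The noexcept advice can be checked with a small experiment. This is a sketch with a hypothetical Tracked type whose counters exist only for illustration; the counts in the usage note are what I would expect from the mainstream standard library implementations, which relocate elements through std::move_if_noexcept or an equivalent.

```cpp
#include <vector>

// Hypothetical element type that counts how it gets relocated.
// IsNoexcept controls whether its move constructor is declared noexcept.
template <bool IsNoexcept>
struct Tracked {
    static int copies;
    static int moves;

    Tracked() = default;
    Tracked(const Tracked &) { ++copies; }
    Tracked(Tracked &&) noexcept(IsNoexcept) { ++moves; }
};
template <bool IsNoexcept> int Tracked<IsNoexcept>::copies = 0;
template <bool IsNoexcept> int Tracked<IsNoexcept>::moves = 0;

// Forces one reallocation of a one-element vector and reports how many
// copy-constructor calls it took to relocate that element.
template <bool IsNoexcept>
int copies_during_reallocation() {
    std::vector<Tracked<IsNoexcept>> vec(1); // one default-constructed element
    Tracked<IsNoexcept>::copies = 0;
    Tracked<IsNoexcept>::moves = 0;
    vec.reserve(vec.capacity() + 1); // must reallocate and relocate the element
    return Tracked<IsNoexcept>::copies;
}
```

With a noexcept move constructor, copies_during_reallocation<true>() reports 0 copies. Without it, copies_during_reallocation<false>() reports 1, and nothing at the call site hints at the difference.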

Conclusion

Move semantics are essential to the performance of modern C++. Without them, much of its standard library would become much more difficult to use. However, the specific design of moves in C++:

is misaligned with the purpose of moving
fails to eliminate all run-time cost
surprises programmers, and
forces designers of types to implement an “empty-yet-valid” state

Why, then, does C++ use such a definition? Well, C++ was not originally designed with move semantics in mind. Proposals to add destructive move do not interact well with the existing language semantics. One interesting blog post that I found even says, when following through on the consequences of adding destructive move semantics:

… if you try to statically detect such situations, you end up with Rust.

C++ has so many unsafe features and so many existing mechanisms, that this was deemed the most reasonable way to add move semantics to C++, harmful as it is.

And perhaps this decision was unnecessary. Perhaps there was a way – perhaps there still is a way – to add destructive moves to C++. But for right now, non-destructive moves are the ones the maintainers of C++ have decided on. And even if destructive moves were added, it’s unlikely that they’d be as clean as the Rust version, and the existing non-destructive moves would still have to be supported for the sake of backwards compatibility.

In any case, Rust has taken this opportunity to learn from existing programming languages, and to solve the same problems in a cleaner, more principled way. And so, for the move semantics as well as for the syntax, I recommend Rust over C++.

And to be clear, this still has very little to do with the safety features of Rust. A more C++-style language with no unsafe keyword and no safety guarantees could have still gone the Rust way, or something similar to it. Rust is not just a safer alternative to C++, but, as I continue to argue, unsafe Rust is a better unsafe language than C++.

Sayonara, C++, and hello to Rust!

2021-10-26T00:00:00+00:00

This past May, I started a new job working in Rust. I was somewhat skeptical of Rust for a while, but it turns out, it really is all it’s cracked up to be. As a long-time C++ programmer, and C++ instructor, I am convinced that Rust is better than C++ in all of C++’s application space, that for any new programming project where C++ would make sense as the programming language, Rust would make more sense.

What Rust is not for

Before going into more detail about why I think that, I’d like to throw out a few caveats, so you know I’m a reasonable person and not just an extremist fanboy.

Caveat the first: Note that I said this about new programming projects. There are some people on the Internet who demand the re-writing of all existing C and C++ projects in Rust, and while I think Rust is a better language for new projects, and that many existing projects should seriously consider integrating it, I realize, like a reasonable mature programmer, that for most existing projects, a Rust re-write would be a prohibitively expensive rabbit hole. In short, Rust is not so amazing that it will protect you from second system syndrome or from the perils of a complete rewrite. It is but a mortal programming language.

That said, new Rust versions of aging C and C++ projects are often very worthwhile and exciting, like new versions of many aging projects can be. It’s just not a magical exception to basic economics.

Caveat the second: Note also that I said “where C++ would make sense.” Rust has a lot of enthusiastic fans, and so there are a lot of people learning Rust expecting the magic they felt when they first learned their favorite programming language. And what they find is a programming language that requires lots of arcane rules, where everything seems rather tedious, and where a lot of their favorite features don’t exist.

Rust is a systems programming language. It is not garbage collected, meaning you do have to manually manage memory. While Rust makes it much harder to do that egregiously wrong, it’s still a very hard problem, and there are trade-offs that Rust – unlike GC’d languages – refuses to make for you. Meanwhile, like C++, the emphasis is on performance (or at least control over performance), whether latency or throughput or memory footprint. Rust is trying to make sure that all its organizational abstractions have no run-time cost, or, if they do, to make sure it’s abundantly clear exactly what that cost is. If you are a systems programmer, if you are used to C and C++ and to trying to solve systems programming types of problems, Rust is magical, just like when you learned your previous favorite programming language.

If you are not, Rust is overkill for your task at hand and you shouldn’t be using it. I earnestly recommend Haskell.

For more clarity on what I mean by systems programming: If you write Python or JavaScript or Ruby, then you’re running the code in a Python interpreter, in Node or a web browser, in the Ruby interpreter, all on top of an operating system with an operating system kernel. Rust doesn’t replace those tools. Rather, the Python interpreter, the web browser, and Node, and even the kernel, are programs written in C or C++, and Rust replaces that. It’s a whole ’nother level of programming, where you have to manage the actual hardware.

Purpose of this series

But enough of what Rust is and is not for. It is an excellent systems programming language, and one that was a long time coming. I plan on writing several posts about Rust features, why they’re an improvement upon C++ features, and why Rust is a better, more modern programming language. Mostly, this will be a discussion of why Rust is better than C++, which I think is the most comparable existing programming language, but it will also touch on why Rust is an improvement on C.

Because of this C++ focus, this series will at times be as much or more a criticism of C++ as it is a commendation of Rust. I think that is unavoidable, as this type of criticism of C++ is most truly credible when an alternative is available, and similarly, Rust is most practically evaluated in terms of its most viable alternative. Unfortunately, that also means that I’ll assume some level of familiarity with C++, but hopefully not too much.

I know that this is a much-discussed topic. Perhaps this is the Rust equivalent of the dreaded Haskell monad tutorial, where every person new to the programming language excitedly writes the same thing, and so thank you for reading. I’m going to try and avoid the obvious tropes: I’m going to try to do more than simply beat the table about type-safety and memory safety and avoiding undefined behavior – though of course these topics will come up. In fact, I had until rather recently simply assumed that the safety of Rust would lead to unacceptable performance degradation, that Rust might be well and good for some applications but could unfortunately never be useful in a true low-latency environment. I had to be persuaded that memory safety wasn’t a downside in such contexts, that Rust could truly be a competitor to C or C++ and not just to Go or Swift.

The syntax of C++

So for today, I’m going to ignore memory safety completely. Even assuming that C++ was more or less right that performance and optimization requires a broad range of undefined behaviors, there were still problems with C++ that left me regularly begging for at least a syntactic rewrite. As Bjarne Stroustrup, creator of C++, famously said: “Within C++, there is a much smaller and cleaner language struggling to get out.”

He later clarified that he wasn’t talking about a streamlined GC’d language like Java, and of course he is aware of Rust and still on the C++ train. As he clarified, he was talking about the syntax of C++, and the legacy of C. But just that category, just syntax, is I think enough to justify a do-over of C++. I fantasized continually about a new syntax – with identical semantics in my mind – that could be migrated to on a file-by-file basis, with its own file extension. This, I realize now, would in practice make for a new language, and a good opportunity to introduce modern typing in it, and in Rust, I see my hope realized, if a little more inconveniently than I imagined.

So that is what I want to focus on for the rest of this post: Why C++ syntax is rotten. Analyses of other Rust features I will reserve for future posts.

Many of C++’s syntactic foibles have to do with its C heritage. This is not to smear C: the same features that make sense in a simple “portable assembly” like C begin to break down when they are preserved with almost-identical syntax and naively extended semantics in a language that promises powerful generic programming features that assist in automatic code generation, resource management, and memory safety.

Header Files

This is really clear in my first example: header files. In C, they serve two purposes. For the programmer, they allow a separation of interface and implementation, especially when considering that modern IDE technology did not exist when C was developed. The header file shows the external interface of how to use the module, and the C file shows how it is implemented.

For the compiler, this arrangement simplifies implementation. The information necessary to compile each module is all included in the C file for the module plus all the headers included by it (and included by them, etc.). None of the other .c C files need be consulted, only the much smaller .h headers, in a practice known as separate compilation.

The problem when this is extended to C++ is templates. These are essentially macros, in that they allow the on-demand generation of new code, based on what is going on in the client code. So if we imagine that the module broadcast depends on the module connection, and that the compiler is currently compiling broadcast, it would not only be necessary to investigate the interface to connection, but also the complete implementation of the templates.

If we are to preserve the programmers’ perspective, and keep only the interface in the header files, this means that “separate compilation” is broken and that the compiler would need to fish around in the main C++ files. If we instead preserve the concept of separate compilation, the headers are no longer about just interfaces, but also implementation details.

The inventors of C++ decided to preserve the concept of separate compilation, and in the laziest way possible: they kept literally the exact same implementation as C, rather than applying the same principles and goals and re-engineering something better.

So now, we have the situation for the programmer where the header file contains a duplicate of the interface, in addition to the implementation of all functions that happen to be templates. To a compiler, a function and a function template are very different things, but to a user, functions go back and forth between template and not all the time, requiring the programmer to move them between files.
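
As a rough sketch of that split, here is what the two kinds of function look like side by side. The file names and both functions are hypothetical, invented only for illustration, and everything is shown in one listing with comments marking where each piece would have to live.

```cpp
// ---- connection.h (hypothetical header) ----

// Ordinary function: other modules only need this declaration.
int open_connection(int port);

// Function template: the whole body has to sit in the header, because the
// compiler generates the code for each T in whichever module uses it.
template <typename T>
T clamp_to_range(T value, T low, T high) {
    if (value < low) return low;
    if (high < value) return high;
    return value;
}

// ---- connection.cpp (hypothetical implementation file) ----

// The body of the ordinary function stays out of the header.
int open_connection(int port) { return port > 0 ? port : -1; }
```

If open_connection later needs to become a template, its body has to move up into the header; if clamp_to_range stops being one, its body should move back down.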

Why does this distinction exist at all in C++? Computers are much faster now. The compiler, to preserve efficiency, could automatically extract its own binary file of what information it needs from each module to compile other modules. The compiler could do this work for us. It is doing far more work in all of its optimization steps.

C is famously a portable syntax for writing assembly. In C, the information needed by the compiler, the application binary interface or ABI, is exactly the same as the interface needed by the programmer, the application programming interface or API – or at least very close to it. And so in C, the concept of header files makes sense, if unnecessary with modern compiler technology.

And to be clear, templates are just the most egregious example of non-interface code needed from other modules: bodies of inline functions and private member variables in classes are also not part of the interface from a programming perspective, but part of the binary interface, part of what the compiler needs to know about a module to compile the other modules that depend on it.

Now, you may think, why is this such a big deal? Why such a complaint about the inconvenience of moving things between files, or duplicating some information? Why does it matter if the rules of which things to put in which file are on the arcane side? You might imagine, so what? You’ll mess it up sometimes, the compiler will issue an error, you’ll say “oh, right” and move the code to the appropriate location.

To such objections I say: You don’t know C++. Unfortunately, I’ve seen this attitude taken by professional C++ programmers, who were careless with header files, moving code around in bulk in a way that was liable to break this particular set of arcane rules, and accusing me of overreacting and wasting time when I objected to this.

For those who haven’t had the misfortune of finding this out the hard way, when you break an arcane rule in C++ – even rules that have nothing to do with run-time behavior or memory safety – you are lucky if you get away with a simple compiler error – or even the somewhat more common arcane, incomprehensible compiler error. Unfortunately, the result is regularly no error at all. The compiler cannot tell, from its separate compilation point of view, if the information provided in the headers is consistent. One module might import one version of a header, and another module might import another.

This may sound unlikely, but many codebases have the practice of separating out the actual interface in one header, and the template implementations in another. At this point, it becomes important which one is included, especially because “template specializations” mean that additional template code doesn’t just make more templates available, but changes the meaning of existing templates.

If the templates included in different compilation units are inconsistent, the result is undefined behavior, and the program might potentially do anything. Unfortunately, this also means the behavior might switch arbitrarily between different compiler versions, different compiler vendors, or based on seemingly unrelated permutations. Unpredictable behavior changes lead to bugs and security vulnerabilities.

Worse, header files are implemented by textual inclusion. The compiler proceeds as if the contents of the header were literally included in the module that imports them. Cycles of inclusion don’t result in error, but instead, a header is simply not included (if common precautions are made) when the second recursion happens.

Thus: Imports via header files are sensitive to ordering. A seemingly-innocuous change, like alphabetizing the included header files in each module, can break builds or change behavior. Such a change rolled out over an entire company’s codebase can be disastrous, and take many programmer-months to unravel the consequences of. Ask me how I know.
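
Here is a minimal sketch of one way ordering can change meaning without any error. The header names and functions are hypothetical, and the "headers" are pasted into a single listing in the order a compiler would see them after textual inclusion. The same call, describe(1), picks a different overload depending on which declarations came before it.

```cpp
#include <string>

// ---- contents of a hypothetical "wide.h" ----
inline std::string describe(double) { return "double"; }

// ---- code from a hypothetical "user.h", included BEFORE "narrow.h" ----
// Only describe(double) has been declared at this point, so the int
// argument is silently converted and the double overload is chosen.
inline std::string describe_one_early() { return describe(1); }

// ---- contents of a hypothetical "narrow.h" ----
inline std::string describe(int) { return "int"; }

// ---- the same code as in "user.h", but placed AFTER "narrow.h" ----
// Now describe(int) is visible and is an exact match.
inline std::string describe_one_late() { return describe(1); }
```

Neither version produces a diagnostic, so swapping two include lines is enough to change which function runs.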

So it should make sense that the first concept I had for my “new syntax” for C++ was that header files should be auto-generated from source files, preferably in a pre-compiled binary format.

This would be an implementation detail of the build artifact, maintained in a build directory, and be a compiler-specific optimization in favor of better compilation times. Semantically, rather than textual inclusion, there would simply be a declaration in one module to say that another module’s public interface could be used, where order wouldn’t matter.

Rust is a modern programming language, and the Rust use directive does in fact work that way.

This isn’t particularly a special point about Rust. This would be the obvious way to construct any new programming language. The compiler doesn’t need header files – the C preprocessor that implements #include directives, along with the rule that functions and structures must be declared before use, is a hold-over from the time when compilers ran on computers slower than a modern thermostat. And programmers don’t need them either: A better place for interfaces to be put in a separate file would be automatically-generated documentation.

So Rust here gets points for doing what any sensible modern programming language would do, and C++ loses points for carrying over an implementation detail from C to a context where it no longer makes any sense.

Syntax and Layout

Since we’re talking about the syntax of C++, I wanted to touch on something very basic but very serious: basic syntax for control structures. C and its syntactic descendants, including C# and Java, use something like this for if-statements and for-statements:

if (!foo.is_empty()) {
    spin_up_thread(foo);
    destroy(&bar);
}
do_something_else();

In this example, the calls to spin_up_thread and destroy are inside the if statement, and only happen if foo is indeed non-empty. The call to do_something_else is not part of the if statement.

How do we know that? Well, the compiler knows that because after the if statement there is an opening brace, and so all statements are included until the matching closing brace, including the two mentioned. But, depending on how fast we’re skimming the code, we probably know that because the spin_up_thread and destroy calls are indented.

In this situation, in what will be a recurring theme in this comparison, the compiler and the programmer are getting their information from different places. Therefore, the compiler and the programmer can disagree, especially as braces aren’t mandatory, and if omitted indicate that only the first subsequent statement is included:

if (!foo.is_empty())
    spin_up_thread(foo);
    destroy(&bar); // Warning: This is done unconditionally
do_something_else();

This looks like it only destroys &bar conditionally, and to a human following the indentation in code review or casual reading, that’s exactly what you would expect. But there are no braces and the compiler, for whatever reason, ignores the same whitespace that human readers rely on.
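
To watch this happen, here is a small self-contained sketch. CallLog and misleading_indentation are hypothetical names of my own; the two counters stand in for spin_up_thread and destroy so that the control flow can be observed.

```cpp
// Hypothetical stand-in for the calls in the example above: each counter
// records how many times the corresponding call would have run.
struct CallLog {
    int spin_ups = 0;
    int destroys = 0;
};

// Braces omitted, indentation misleading: only the first statement after
// the `if` belongs to it.
CallLog misleading_indentation(bool foo_is_empty) {
    CallLog log;
    if (!foo_is_empty)
        ++log.spin_ups;
        ++log.destroys;  // runs unconditionally, despite the indentation
    return log;
}
```

Calling misleading_indentation(true) still increments destroys, even though the condition is false. Some compilers can warn about this (GCC has -Wmisleading-indentation), but the language itself accepts the code as written.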

This has come up in personal projects of mine, usually when collaborating with someone else. Even if you maintain the personal discipline of always including the braces { around the body of your if-statements }, someone else might not have that discipline, and therefore, you might be exposed to this intermediate-state code:

if (!foo.is_empty())
    spin_up_thread(foo);
do_something_else();

Needing to add a call to destroy(&bar) in the condition, after spin_up_thread, you find the line, add a new line at the same indentation level, and simply fail to notice that the new line is not actually wrapped in any {.

This was, of course, the direct cause of a major security vulnerability in iOS and macOS:

if (some_err_condition)
    goto fail;
    goto fail;

Since humans use indentation to read code, and to determine what is in a block and what isn’t, I would’ve wanted my wish-list “new C++ syntax” programming language to take a page from Python and use significant whitespace:

if !foo.is_empty():
    spin_up_thread(foo)
    destroy(&bar)
do_something_else()

Rust is a minor disappointment in this department. It stuck to braces, and whitespace being “insignificant.” But it made a huge improvement, far outweighing my disappointment: Rust at least prevents the goto fail scenario by making braces mandatory, helping ergonomics by instead removing bracketing around the condition. Having the body of the if-statement without brackets is simply not worth it as a short-cut, but if the braces are mandatory, then the parentheses aren’t necessary:

if !foo.is_empty() {
    spin_up_thread(foo);
    destroy(bar);
}
do_something_else();

This is better because then the goto fail example would still be glaringly obviously failsome, because even if the indentation does not match the braces, the braces still have to go somewhere, and will jump out at you:

if some_err_condition {
    goto fail; }
    goto fail;

It disappoints me that these issues tend to be dismissed as “matters of taste,” because as Apple learned, there are actual consequences to this misalignment of what programmers pay attention to and what the compiler pays attention to. I would have liked Rust to go the whole way, and remove altogether this strange concept that whitespace should be insignificant, a concept that my oldest C and C++ books exclaimed as a great feature without explanation or justification. But at least Rust has fixed the most egregious consequences of C++ syntax. Again, this problem with C++ comes from a feature inherited from C, but in this case C was just wrong to begin with, and should’ve done it the Rust way (or the Python way) from the very start.

Additionally, Rust has good auto-formatting, which, unlike C++ auto-formatting tools, does not break code (by, for example, re-ordering headers). This fundamentally replaces the whitespace provided by the programmer – which might be misleading to other programmers – with whitespace that aligns with the compiler’s interpretation, and therefore is correct to rely on when skimming. A good cargo fmt should therefore be run before every code review, to make sure that the code can be easily and correctly read.

C-isms vs “Modern C++”

I then have one more topic before I wrap up syntax. C++ programmers nowadays are telling everyone who was upset with their language in the 90’s and aughts that it’s better now, that C++ has cleaned up its act. C++11 really has changed a lot, and C++ is innovating again, and that’s very good. C++ is full of new features, and part of its claim to be a modern programming language involves claiming that programming in C++ is good, if you use these new features.

Smart pointers, written std::unique_ptr<Foo>, allow automatic implementation of construction and destruction for owning pointers, and allow much clearer communication about ownership semantics in function signatures, and so are preferable to writing the C-style Foo *. C++ arrays, std::array<Foo, 12> arr;, act like any other STL collection, allow them and their iterators to be passed to standard templates that expect STL interfaces, and provide a number of useful features as methods, and so using them is preferable to the C-style Foo arr[12];. static_cast<A>(b) is much more specific, and therefore less prone to accident, than the C-style (A)b.
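
For readers who haven’t used these features, here is a brief sketch of all three. Foo, make_foo, sum and to_whole_number are hypothetical names chosen for illustration; the library pieces (std::unique_ptr, std::make_unique, std::array, std::accumulate) are standard, and std::make_unique assumes C++14 or later.

```cpp
#include <array>
#include <memory>
#include <numeric>

struct Foo {
    int value = 0;
};

// Returning std::unique_ptr says in the signature that the caller now owns
// the Foo; it is destroyed automatically when the pointer goes out of scope.
std::unique_ptr<Foo> make_foo(int value) {
    auto foo = std::make_unique<Foo>();
    foo->value = value;
    return foo;
}

// std::array carries its size in its type and offers begin()/end(), so it
// works with standard algorithms like any other container.
int sum(const std::array<int, 4>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// static_cast names the one conversion that is intended, where the C-style
// (int)x would also permit less related conversions.
int to_whole_number(double x) {
    return static_cast<int>(x);
}
```

Each of these has a shorter C-style spelling (Foo *, int values[4], (int)x) that compiles just as readily.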

These are among the features that are trotted out whenever someone says they used C++ in the 90’s and it had all these problems. These features are among the ones used to claim that C++ is a better, cleaner, tidier, more modern programming language than it used to be. Whether or not they’ve done enough to replace their old counter-parts – they’re generally preferred whenever possible.

The problem? Convenience. Who wants to type std::unique_ptr<Foo> when instead you can write Foo *? Why are the somewhat-deprecated options the easy ones to write? Why isn’t it something like std::raw_ptr<Foo> with some convenient notation for std::unique_ptr?

But of course, that would break compatibility with C, and with earlier versions of C++.

I don’t want to get into the myriad reasons why smart pointers are to be preferred to raw pointers, or why raw pointers occupy such an awkward place in C++ – those are topics for a future post. But for however many seemingly-principled reasons some of my colleagues might state for why they used raw pointers in this or that situation, I couldn’t shake the feeling that it was partially because raw pointers were given the old-fashioned, easy-to-type notation.

And so, when I imagined my new C++ syntax, it would have Foo * mean std::unique_ptr<Foo>, and Foo arr[12] mean std::array<Foo, 12>. Why have the not-entirely-deprecated-but-not-preferred legacy C features be the easier ones to type?

Conclusions

All in all, this shows that a lot of purely syntactic but still substantial and consequential problems with C++ can be fixed with a syntax reboot, which Rust mostly provides. And I haven’t once mentioned type safety or memory safety! This will be developed on further in this blog series, where I will maintain that Rust is not only a better programming language than C++, but a better unsafe programming language than C++. Even if I had to use unsafe for every function in my module, I’d still rather write my module in Rust than C++, for all these reasons. I say this as a pre-emptive strike against the argument that occasionally having to use unsafe to achieve performance parity with C++ (and it is very occasional) “defeats the whole purpose” of Rust.

But of course, there are deeper problems with C++ that Rust also addresses, beyond just the syntactic. But those will have to wait for future posts.

Apple Silicon

2020-11-16T00:00:00+00:00

This year, Apple released, to much fanfare, a somewhat obscure technical change to how its computers work: Macs will transition away from Intel’s CPUs to in-house processors known as “Apple Silicon,” more similar to the technology Apple already uses in its phones and tablets. It is a tremendous amount of hype for something rather technical, and to people used to more user-visible feature announcements, this can be somewhat disappointing, or at least confusing.

What does this actually mean for the end user? Apple claims that these new Macs will be (many times) faster, run cooler, and have much better battery life. Are these improvements as drastic as Apple claims? Will there be downsides and other adjustments that users will have to make, or will these new computers just work like faster, less power-hungry Macs?

A lot of responses I’ve seen seem pretty skeptical, which is fair. It’s been a long time since the drastic improvements of Moore’s law have been the norm in computing; we’re used to much more incremental improvements. And Apple is claiming to achieve these improvements by moving away from Intel, when Intel is the established market leader in making high-powered PC processors. Can these new computers really be that much better?

My answer, as your computer-nerdy friend, is that these computers are not only going to be a great technical improvement over previous Macs, but represent a revolution even beyond the Apple ecosystem, a turning point for PCs in general, and one that was a long time coming. To explain why, I’m going to delve into some computer history to give context to this shift, and some of the technical details of how computers work, and specifically in what ways these new Macs will work differently from the current ones.

So bear with me as we go deep. I promise it’s relevant.

Operating Systems and App Compatibility

Do you remember when it mattered a lot more which operating system you ran?

Nowadays, most of my personal time on the computer is spent on the web browser, doing my writing on a website, my TV watching on a website and even my TODO lists. I look at the bottom of the screen on my non-work computer, the MacBook I’m using to type this, and I see a slew of icons for various apps: a messenger app, a mail app, a calendar app, a spreadsheet, a word processor, all in all standard computer fare, but not getting much use compared to that Google Chrome icon. As a result, unless we’re doing specialized tasks, like programming (as I do for work) or CAD or photo-editing, apps besides the browser are not the big deal they used to be.

But once, it was a huge deal. Every computer user had several different apps in their workflow, and using an alternative operating system (as macOS once was) risked not being able to find appropriate equivalent apps, and possibly not even being able to read documents that would be in “Windows formatting.” There was tremendous social pressure to use the same software as other people, to the extent that this webcomic rang true.

Therefore, every serious personal computer application existed as a Windows program, specifically a Windows program that ran on Intel (or Intel-compatible) processors (known as “Wintel,” especially by those who criticized it as a monopoly). A company would support other types of computers only as an after-thought. And at that time, apps were distributed on CDs and stored on physical media. It was common to have an old version of an app lying around, not be able to receive live updates for it, and to expect it to run on a newly-purchased computer, unmodified.

In such an environment, compatibility was key to profits. Each new Intel processor, and each new version of Windows, had to support all the apps that could run on the previous version. There was no higher priority. Since before Windows 95 was even released, and for years afterward, Microsoft had another operating system, Windows NT, that was far more stable and technically superior. But Windows 95 was more similar to Windows 3.1 before it, and supported more apps, so until Microsoft could get Windows NT to run all those other apps, it was stuck with the inferior product. Eventually, 6 years later, Microsoft came out with a version of Windows NT able to run 95 apps: Windows XP. Even that transition was gnarly.

The Intel side of the “Wintel” monopoly was similar. To this day, a modern Intel processor is capable of running MS-DOS programs from the 80s directly, without requiring any emulation layer in the operating system or any modifications to those programs. The antiquated 16-bit instructions that comprise those programs will still be interpreted by the modern hardware, which also supports countless other compatibility modes for various eras of the processor history. And this is true not only for the Intel processors that power Windows PCs, where it makes some amount of sense to support all the Microsoft ecosystems of years past, but also on Macs, where this history is much shallower.

Processor Design and Instruction-Set Architecture

See, Intel is more than just a company. Intel is also an instruction set architecture, also known as Intel 64 (or IA-32 for 32-bit versions) or x86 (sometimes x64 for 64-bit versions). When applications are prepared to run on Intel, they are (traditionally) compiled to a file containing a sequence of instructions. The meaning of each instruction is determined by complicated standards, and the hardware of the processor must take these instructions, and actually perform the designated operations.

The amount of complexity involved in the meanings of these instructions, or the instruction set architecture (ISA) is large. Intel’s ISA was documented in 3 paperback volumes back when I was a child in the early aughts when my parents were gracious enough to order the set for me. It has only grown more gnarly since then.

Because of the vast compatibility requirements that Intel in particular historically has faced, their ISA has never been redesigned from scratch since the 80’s. This means that, instead of having every instruction be a fixed length of say, 4 bytes, Intel instructions can range from 1 to 15 bytes. Some of them do simple things like adding two numbers together, whereas others do more complicated things like copying an entire string of characters. Many of the design choices would never be made by a modern engineer, but Intel is stuck with them for historical reasons.
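
To give a feel for why instruction length matters, here is a toy sketch. The variable-length encoding below is entirely made up (it is not real Intel machine code): I simply assume the first byte of each instruction stores that instruction’s total length. Both function names are hypothetical too.

```cpp
#include <cstddef>
#include <vector>

// Fixed-length encoding: with 4-byte instructions, the start of
// instruction number `index` is known immediately.
std::size_t fixed_offset(std::size_t index) {
    return index * 4;
}

// Toy variable-length encoding (NOT real x86): the first byte of every
// instruction holds its total length in bytes. Finding where instruction
// number `index` starts means stepping through every instruction before it.
std::size_t variable_offset(const std::vector<unsigned char>& code,
                            std::size_t index) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index && offset < code.size(); ++i) {
        std::size_t length = code[offset];
        if (length == 0) break;  // malformed input: stop rather than loop forever
        offset += length;
    }
    return offset;
}
```

Even in this simplified form, the variable-length case cannot locate an instruction without examining the ones before it, and real hardware has to do that work on every instruction it fetches.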

But Intel hasn’t been able to clean this up, for the same reason that “Wintel” customers can’t switch from Intel: Any clean-up would mean old apps would not be able to run on the new computers. When Intel did attempt this, with Itanium, the world wasn’t ready, even though Intel tried to leverage the already-difficult transition to 64-bit as a reason to get people to switch to a completely new architecture. Even worse, any clean-up would mean that Intel would have to compete with other companies as equals, whereas now there is only one other company, AMD, that is allowed (for historical reasons) to design Intel-ISA processors. Embarrassingly, it was AMD that actually convinced everyone to switch to 64-bit, by providing a much more gradual transition to a 64-bit ISA far more similar to the existing 32-bit one.

Processor architectures with such convoluted ISAs are referred to as CISC, for Complex Instruction Set Computing. All of this complexity must be implemented using more complex hardware. Intel processors come with decoders to break down over-complicated instructions into smaller pieces, reorder buffers to optimize on-the-fly which pieces can be done when, and extra circuitry to handle all of the various compatibility modes that are necessary to support old software. All of this constrains processor design and, very specifically, draws extra power.

What’s the alternative? The transition to mobile, to phones and tablets, gave computing something of a fresh start. No one ever expected their “Wintel” apps to run on a phone, and so when initial iPhones and Androids came out, new apps were written from scratch. Those would be written for whatever processor architecture Apple and Google chose, and they chose a more modern, less-CISC ISA: ARM. ARM stands for Advanced RISC Machine, where RISC, or Reduced Instruction Set Computing, is the opposite of CISC.

The ARM ISA, which unlike Intel’s is available for any company to license and design their own compatible processors, takes a moderate position in the historical RISC/CISC wars, and requires in any case far less decoding circuitry than Intel processors require. When Intel tried to make Intel-ISA processors for phones and lightweight laptops, the Atom processors, it was a failure. The decoder got in the way of achieving a good combination of performance and power consumption, and the resulting phones were either unacceptably slow or unacceptably low in battery life.

And Apple Silicon is Apple’s branding for their ARM ISA processors, supporting the ARM ISA with Apple’s proprietary processor design. They’re bringing the benefits of phones to the PC world.

New Modes of App Development

So the “Wintel” monopoly and Intel in general never jumped from the PC world to mobile, and as a result we have our cool, fanless, high battery life but high performance phones we have today. But why, then, do Macs, which never ran Windows programs, use Intel processors to begin with? Why are they switching now? And why is this a turning point for PCs in general?

Well, for one thing, it’s not entirely true that Macs don’t run Windows programs. A key reason why Apple switched from Power to Intel in the first place was that Macs can run Windows programs: by either running a version of Windows simultaneously to running macOS (https://www.parallels.com/), or, more simply, by rebooting the same computer into Windows, which runs on Mac just as well as on any other type of PC. When Intel Macs first came out, this was a decisive feature for many switchers, nervous to abandon app compatibility.

And at the time, Intel was the best processor manufacturer in existence, so that Intel processors with their flaws were still better (as they were produced through better manufacturing processes) than the POWER-based RISC processors Apple was previously using. Get better processors and get some level of Windows-compatibility: the decision was clear for Apple at the time.

But now, Windows compatibility is important to hardly any Mac users. And there are a number of reasons for that. Nowadays, a lot of software is not translated to machine code, the level at which ISAs are relevant. A lot of software is delivered to us via the browser, where a portion runs on servers in the cloud and a portion is written in JavaScript, and then either interpreted in that form, or translated to the ISA of the computer live by the browser. Once the browser supports an ISA, all websites come with it.

And even for apps, many of them are written in higher-level languages that use a virtual machine or Just-In-Time compilation to be processor-architecture neutral. These programming language technologies matured after Intel had already become stuck with its backwards-compatibility advantage, and apps written with them also are easily portable, which is to say, brought to a new ISA. Once the Java virtual machine (for example) is ported to ARM, all Java programs come with it.

And even for programs that are compiled in a traditional fashion, written in an old-fashioned compiled programming language like C or C++ (or the Apple-specific Objective-C or Swift), ISA compatibility is no longer the issue it once was. These languages have evolved over time to make it easier to re-target ISAs, and programmers nowadays are better trained in writing their code in such a way that it can be compiled for any ISA. Creating an ARM version of a Mac app is just a switch in the compilation system, and maybe finding a few obscure bugs where the differences matter a little more deeply.
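
As a small sketch of what ISA-neutral C++ tends to look like: the function names here are hypothetical, but the macros are the ones GCC, Clang and MSVC predefine for the target architecture.

```cpp
#include <cstdint>
#include <string>

// Portable: a fixed-width type and explicit shifts, with no assumption
// about the processor's word size or byte order.
std::uint32_t read_u32_le(const unsigned char* bytes) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// The rare place where the ISA does matter is fenced off behind the
// compiler's predefined macros, so the rest of the code never has to care.
std::string target_isa() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#else
    return "other";
#endif
}
```

Code written like read_u32_le needs no changes at all when the target switches; only the small, clearly marked conditional sections do.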

And once the new version is made, getting it to users is easy. Android and iOS both transitioned from 32-bit to 64-bit ARM, requiring new apps to be built, and we as customers hardly noticed. Developers quietly prepared 64-bit versions of our apps, and when we upgraded to 64-bit compatible phones, we didn’t notice that the app store sent us a different version. After all, we get updated versions of apps from the app store all the time. As long as the developer can adjust, the end user just has to do some more downloading, which is cheap and easy in an era of widespread broadband.

Conclusion

And so what does Apple lose out on for the Apple Silicon Macs? The ability to boot Windows is now more of a liability than an asset for them. Old programs will rapidly be ported. Due to modern technology, which allows us to translate between ISAs on the spot, emulating Intel on the new Macs is often faster than running the Intel programs directly on the old Macs.

As users, we might be mildly frustrated by it. I certainly will be a little worried about buying such a computer until I know that some version of Linux will work smoothly on it: Linux on ARM is currently very much a second-class citizen in the PC Linux world.

But the advantages are great. No longer constrained by hardware decoders, Macs will cheat the old trade-off between computational power and battery life, and get to have their cake and eat it too, at least for one round of abrupt improvement for the transition. And because now the PC and phones use the same processor architectures, iOS and iPadOS apps will now work on macOS as well, saving time for writers of tablet applications.

And other PC manufacturers will end up having to notice. Windows on ARM exists already, though it is currently obscure. If Microsoft can cultivate as modern an ecosystem as Apple, where a combination of emulation and streamlined distribution makes it easy to get ARM versions, these new Macs might start an ARMification trend.

This spells (long-term) doom for Intel. Their business model is tied to their ISA, on the premise that no one can afford to switch away from the most popular ISA, that everyone is locked in. This was never true for mobile, in spite of Intel’s best efforts, and as Apple is demonstrating, it also hasn’t really been true for PCs for a while either.

Open Internet, Closed Web

2019-12-23T00:00:00+00:00

The Internet promised — and still promises — a revolution in democratic, decentralized, and open communications. And yet, we see today a tech world controlled by a few central players, as Elizabeth Warren promises to break them up and Congress summons Mark Zuckerberg to explain his company’s role in privacy-violating election-manipulating foreign conspiracies. But Presidential use of anti-trust laws and new Congressional regulations of social media won’t address the more fundamental issues: The Internet is now structured, on a technical and social level, so as to naturally encourage centralized monopolies.

To explain this, we’ll first have to explain some terms. In common parlance, the terms Web and Internet are used interchangeably, but technically they refer to different elements of what now looks like a single system. The Internet refers to the single global connected network, and technologies that allow any computer on it to connect to any other computer on it, without saying much about what the connection looks like. The Web is but one way of communicating information over the Internet, where you use a browser to access “websites,” but other ones exist: for example, online video games don’t generally use the web to sync data between players. Examples are easier to find as we go back in time: the stand-alone AOL Instant Messenger app did not use the web, and neither did old-fashioned e-mail clients like Outlook or Thunderbird, or BitTorrent and other file-sharing clients.

What makes the web different, that it has eaten up these other services, that now we do our movie-watching, our chatting, and our e-mailing in the web browser?

The web started out as a way of posting content: you would enter your URL, which identified what server (or publicly accessible computer) you wanted a webpage from, and what page you wanted. The browser would send a request to the server, and it would send you back the page at that URL, likely either an article, or a directory of articles. They would have text and possibly embedded images, and could link to each other, and specify another URL to go to. The original concept of the web would have included sites like magazines, and envisioned sites like Wikipedia, but would not have been able to support e-mail or a chat app or a social media platform like Facebook.
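
To make the request step concrete, here is a minimal sketch in C of how a URL breaks down into the server to contact and the page to ask for, and what the text of a simple HTTP/1.0 request looks like. The function names are hypothetical, and the sketch only handles plain `http://` URLs without ports or query strings; a real browser does far more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "http://host/path" into host and path.
 * Returns 0 on success, or -1 if the URL does not start with "http://"
 * or a piece does not fit in the caller's buffer. */
int split_http_url(const char *url, char *host, size_t host_size,
                   char *path, size_t path_size)
{
    static const char prefix[] = "http://";
    const size_t prefix_len = sizeof prefix - 1;
    assert(url != NULL && host != NULL && path != NULL);

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;

    const char *host_start = url + prefix_len;
    const char *slash = strchr(host_start, '/');
    size_t host_len = slash ? (size_t)(slash - host_start) : strlen(host_start);
    const char *path_start = slash ? slash : "/"; /* no path means the root page */

    if (host_len == 0 || host_len >= host_size || strlen(path_start) >= path_size)
        return -1;

    memcpy(host, host_start, host_len);
    host[host_len] = '\0';
    strcpy(path, path_start); /* length was checked above */
    return 0;
}

/* Hypothetical helper: format the request text a browser would send.
 * Returns the number of characters written, or -1 if out is too small. */
int format_get_request(const char *host, const char *path,
                       char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The host part tells the browser which computer to connect to; the path is all the server needs to pick out the page to send back.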

The web was just the “public content” protocol alongside other protocols, and similar to them. You could choose your own browser, Netscape or Internet Explorer, and access the same web pages, just like you could choose your own e-mail client, Outlook Express or Eudora, to access the same feed of e-mail. The software was installed on your computer, and what you accessed through it was content, and that content all was for you to read.

Gradually, however, this changed as the web became more flexible. CGI allowed forms on websites to connect to programs that would be run in response on the server. Java and Flash and ActiveX allowed you to embed programs in your website — programs that you would not download and run on their own, but that came with the page and acted as if they were part of the page. And gradually, Javascript, originally used to validate forms before they were submitted, or to do simple animations, became powerful, as browser vendors competed to make it run fast, and as it gained more capabilities.

When you go to Facebook, you are not reading a page that someone posted there; you’re not accessing “content” in the traditional sense. What you are doing is downloading, on the spot, a large application. Not only is the content sent over the wire — the statuses, the comments, the pictures, the lists of people who like it — but, inseparably from it, we are sent the software that is used to process the content, the application used to enter it and generate it, sent to run in the browser every time we type in “www.facebook.com”. It is only through the lens of that Javascript program that we can access the content itself.

Indeed, every time we go to a modern website, especially one by a major tech company, we load a fresh program into our browsers. No longer are browsers just renderers of pages stored on servers, they are platforms where programs run, where the programs are written not for Windows or Mac or Linux, but for the web browser, now typically for Google Chrome, which has become an operating system unto itself.

Why does this lend itself to monopolization and privacy problems? For one thing, the web lends itself to an integration of frontend or client code, which runs on your computer, and backend code, which runs on a server. With a non-web protocol, you can use many programs to access the content on a specific server: different e-mail clients for the same provider, different BitTorrent clients for the same torrent. You can also combine multiple e-mail providers or torrents in a single window. With the web, you go to the server, and you are provided with the client program to access the services it provides. You can’t take the Facebook Javascript code and point it at Twitter, nor can you expect your own custom Facebook app to work.

Imagine how a social network like Facebook might work if it were conceived of outside of the web. There might be a standardized protocol (say SSP for Standard Social Protocol) and multiple packages of client and server software. A school or a church or another community stakeholder might run their own copy of the server software, and you might have accounts on multiple such servers. All the status updates could be aggregated together in a single feed, and you could configure settings to indicate which servers your posts went to. Perhaps you could have “friends” at a server you don’t subscribe to, and specify both their username and what server they use (with an at-sign, like eric.smith@cornell.edu), and the servers could sync with each other so that you could still see their posts.
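
As a small illustration of the addressing part of such a scheme, here is a sketch in C of how client software might split an address like eric.smith@cornell.edu into a username and the server to contact. SSP is hypothetical, and so is the function name `split_handle`; this is only one plausible way to do it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split "user@server" at the last '@'.
 * Returns 0 on success, or -1 if there is no '@', if either side is
 * empty, or if a piece does not fit in the caller's buffer. */
int split_handle(const char *handle, char *user, size_t user_size,
                 char *server, size_t server_size)
{
    assert(handle != NULL && user != NULL && server != NULL);

    const char *at = strrchr(handle, '@');
    if (at == NULL)
        return -1;

    size_t user_len = (size_t)(at - handle);
    size_t server_len = strlen(at + 1);
    if (user_len == 0 || server_len == 0 ||
        user_len >= user_size || server_len >= server_size)
        return -1;

    memcpy(user, handle, user_len);
    user[user_len] = '\0';
    memcpy(server, at + 1, server_len + 1); /* includes the terminating NUL */
    return 0;
}
```

A client would then connect to the server named on the right of the at-sign, or ask its own home server to sync with it, to fetch the posts of the user named on the left.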

Who would pay for all this software to be written? The software would be sold to you like Outlook was, or perhaps open source packages like Thunderbird (Mozilla’s e-mail client) would arise. And who would pay for the servers? Your school, workplace, ISP, or community, and probably you could sign up for a public ad-supported or for-fee service.

And in this model, if you control your client social software, you could have any strategy for what statuses it shows you and what doesn’t, rather than Facebook’s algorithm deliberately designed to addict you. You would be able to pay for the service rather than be thrown into a huge advertising pool.

It’s also fundamentally less monopolistic. You could imagine that someone, instead of using a standardized protocol, released a single client and sold the server software. Other companies or open source communities would soon make compatible software, and since the network of interactions was already decentralized, using those compatible systems would not prevent you from interacting in the same community, as happens to alternatives to current social networks.

Of course, they could also try harder, and force you to use their server, and release a single client, like AOL Instant Messenger did. But then programs like Pidgin came to aggregate that and other messenger clients, so that you could talk to contacts on different messaging systems in the same app.

This type of social network, which is known as a federated social network, isn’t an unachievable dream. E-mail used to work this way before Gmail gobbled it up, and still does theoretically: that’s why there’s an @-sign in e-mail addresses, to indicate which of many compatible servers you have an account at. Social media used to work like this, too: You could be a member of many listservs or newsgroups, and it would be handled through a single e-mail and newsreader app. Messaging doesn’t work this way, even today, but there is a protocol out there that would work like that, called XMPP: It simply never caught on.

There even exist software, like Mastodon, and a protocol like our hypothetical SSP, called ActivityPub, that do exactly what I just described. But Facebook, Twitter, Reddit and similar sites have stolen all the actual user-base. A social network, of all things, needs a certain critical mass before anyone can really get good use out of it: Facebook is very useful when everyone in your college is socially obligated to have it, less so when you have a niche social network only used by open source enthusiasts.

Before we talk about how or even whether we can or should turn the tide on this, I’d like to point out a side issue: Mobile. On iOS and Android, you do download individual client apps. But most of the time, we use the same model: You use the Instagram app to connect to Instagram services, and you use no other app for those services. WhatsApp messages stay on WhatsApp servers. If it’s the technological layout of the web that makes for this business model, why has it carried over into mobile?

I remember being excited when mobile came out for the comeback of the standalone application. There are multiple Twitter apps available, all posting to the same service and accessing, differently, the same content. But it hasn’t led to a return to a more federated model for new software, or openness in general.

There are a few reasons for this. One is, by the time mobile platforms started gaining steam, the web revolution had already mostly run its course. We’d gotten used to that business model. The assumptions that are built in to how the web works (that you would get your client software from the company that also provides the only server it works with) had become entrenched enough that a different technology landscape didn’t overcome them.

Another is the closed nature of both major mobile platforms. It is very annoying to put an app on the app store. It is annoying to write one — historically, it was a quite constrained platform. Apple can and will reject you arbitrarily. It increases the barrier to entry, so that established companies have a huge advantage.

But the biggest reason, in my opinion, is that the mobile world and the web world are too entwined. Not only do we expect to use many services from the phone on the computer as well, where the web dominates, but the platforms use the same servers and often, the same frontend code. It is relatively easy, and commonly done, to use some or all of the code that normally runs in a web browser, and instead run it in a browser engine embedded into a mobile app.

So the pattern set by the modern web is deeply entrenched. The end result is a computer as an endpoint for service. Rather than as a tool we control and use directly, it is an adaptable terminal that we use to enter into corporate-controlled environments, where people make their livelihoods and run their social lives, but the rules can change at the companies’ whim.

So how do we return to a locally controlled system again? Anti-trust and regulation aren’t enough: that’ll simply change what companies we do the interactions with. Getting rid of the web isn’t feasible and probably still wouldn’t be enough, since we’ve thoroughly convinced ourselves by now that this is how computers are supposed to work.

We need to build an alternative. We need a complete suite of software that meets all the needs that websites currently serve, but which does not rely on the same level of centralization. This requires a lot of work, and while open source software can spontaneously and freely arise as collaboration between companies when technical concerns are at play (Linux, compilers, libraries), when it comes to polished and well-designed products, that usually requires more explicit funding.

So if I were someday, somehow elected president, I would not only carry out Elizabeth Warren’s noble anti-trust plan. I would also fund a government program to give grants to build open source software that could be used this way, with a mission of re-building a computer culture that doesn’t rely on the same level of centralization and corporatization. This would be an effective use of tax money, because what differentiates software from other products is that, once created, software can be duplicated and re-deployed without any natural cost.

And federated social networks would be a small, relatively unimportant part of it. What if craftspeople could easily sell directly to consumers, rather than listing on Etsy? What if cab drivers didn’t have to sign up for apps that take giant cuts for doing very little? What if we had time logging and vacation tracking software for our small companies that actually worked? What if someone didn’t feel like they had to buy an iPhone so they could Facetime their family, but could feel confident using whatever phone they wanted?

The Haskeller's Hungarian Notation

2019-08-11T00:00:00+00:00

When I was first learning to program, a long time ago, it was in BASIC, and you had to annotate your variable names to indicate what type something is. foo would be a number, whereas foo$ would be a string. This meant that there could only be as many types of information as there were symbols to put after your variable, but that was okay for the sort of programming BASIC was used for. These were called sigils, and they helped you keep straight in your head what was going on, and made it easier for the computer too. Any aggregates had to be explicitly declared.

Later on, I learned Perl, which had a similar system, but with a twist. A variable named $foo could contain a number or a string — or even some sort of object or reference — but it could only contain one of them. It was a “scalar.” @foo would contain many scalars with indices in an array, and %foo would contain many with string or other keys in a hash map. The computer kept track, dynamically, of the practical types of the scalars, and could easily do the same for the aggregate types, but chose to instead enforce a mechanism where the programmer would be reminded of whether it was a single value or some sort of aggregate that was being discussed.

In Haskell terms, BASIC had you use sigils for data types, but Perl had you use sigils for functors. And not to make people too upset by comparing Haskell and Perl, but Haskellers regularly do the same today, voluntarily annotating variable names with the functors by convention. For example, dmdMenuItems might translate, in a Reflex codebase, to Dynamic of Maybe of Dynamic of list of DomElement.

The usage originally struck me as quite strange, and I didn’t like it. I remember thinking the original Hungarian notation was redundant: int iFoo; literally says int right before it. And besides, wasn’t the point of a type system to not need extra mnemonics, because the compiler will stop you from messing things up?

At my previous job, we used prefixes like m_ and g_ in C++ to indicate scope (member variable/field and global, respectively), and it similarly took me a while to adapt. In those situations, it turned out to help because the sigils told you where to look for more information. If there wasn’t a m_, you looked in the same function, but otherwise you had to immediately go to the class declaration. But that wasn’t the only advantage. What scope something was in was important in how you treated the variable, in many subtle ways that would be bad to confuse, and which the compiler in C++ wouldn’t really help you with.

Similarly, in Haskell, indicating what functor something is in tells you something important: What kinds of things can you do to get a regular value out of it? Do you need to provide a default value (Maybe), or only provide it to versions of functions adapted for it (Dynamic), or perhaps just keep the functor around while transforming the values inside ((<$>), and (<$$>), and (<$$$>)…where which one depends on how many functors)? And while the compiler will help us with this, it’s something it’s convenient to see all the time, and the type of each individual variable is sometimes inferred and is never spelled out at every usage.

And when we do write the pure function or the lambda or the fromMaybe or the dyn_ $ ffor ..., what do we name the new variable? Many times we have many variables with the exact same semantic role, the only difference being what functors they’ve been wrapped with. We want to say ffor dSelectedId $ \selectedId -> ... or fmap (\number -> number + 1) eNumber or let fish = fromMaybe defaultFish mFish. The alternative is, what, judicious use of ' for the different but analogous variables? The difference between these variables, intuitively, is how wrapped up in functors they are, and that should also be the difference in their names.
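
The same naming idea can be tried in languages without functors. As a loose analogue in C (my own assumption for illustration, not an established C convention), a pointer that may be NULL plays the role of Maybe, the m prefix marks the possibly-absent value, and the bare name is reserved for the unwrapped one:

```c
#include <stddef.h>

/* Loose C analogue of fromMaybe: mValue may be NULL, standing in for
 * Nothing, in which case the default is returned. */
int from_maybe_int(int default_value, const int *mValue)
{
    return mValue != NULL ? *mValue : default_value;
}

/* The m prefix marks the wrapped input; the unprefixed fishCount is the
 * plain value with the same semantic role. */
int fish_count_plus_one(const int *mFishCount)
{
    int fishCount = from_maybe_int(0, mFishCount);
    return fishCount + 1;
}
```

The pair mFishCount and fishCount reads the same way as mFish and fish above: same role, one layer of wrapping apart.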

And I’ve decided this is a good thing. Conventionalized terseness is the least problematic type of terseness. Single-letter abbreviations are great if they communicate information efficiently and everyone agrees on what they mean. I’ve seen dyn and may as well, and I prefer d and m, as they are easier to stack up without getting too unwieldy, and besides, dyn is used for functions and may is also a verb (does mayFish mean something that’s a Maybe Fish or a boolean about whether you are permitted to fish?).

And so, in spite of my initial skepticism, I’ve come to like this naming convention, and I recommend it to all of you as well.

Components of a Modern Operating System

2019-07-11T00:00:00+00:00

In previous posts, we discussed historic operating systems and where various OS features come from, but we only gave a brief overview of how they worked.

Now that we have a modern operating system’s full complement of features, we can look at what components need to exist in a modern operating system to get those features. As discussed with MS-DOS, an operating system, even today, is partially code, and partially conventions, like file formats or rules of good behavior – the difference being, that modern operating systems have more ability to enforce some of these conventions.

These conventions are still important. Linux is considered a version of Unix by the original authors of Unix — even though for legal and trademark reasons it is not — not because it has any code in common (it doesn’t), but because it follows the conventions of Unix.

So on our tour we’ll discuss both more concrete software components that are a body of code, and also conventions that hold the operating system together at various levels.

The Kernel

One big problem with the MS-DOS model is that a program could circumvent its interfaces. It could directly access hardware if it wanted to, without regard to the OS’s file system code, setting the file system conventions in stone. A program could install its own procedures to run when hardware events happened, its own interrupt handlers, and the system wouldn’t stop it.

This wasn’t really a limitation of MS-DOS per se, but of the 8086, the processor MS-DOS was designed for. If code is running on an 8086, it can execute any of an 8086’s instructions, no matter what. A more modern processor – including Intel’s later processors and therefore most of the processors MS-DOS ran on in practice – has a distinction between user mode and a supervisor mode, which will only allow hardware access to take place while the processor is in the supervisor mode (also known as kernel mode).

Application code, regular program code, will all run in user mode. A lot of operating system code can as well: How much code should be actually run in kernel mode as opposed to user mode is a complicated design decision. Certain instructions in the processor are only allowed in kernel mode, including those that control what memory is mapped, or currently accessible, those that install interrupt handlers, and those that control which pieces of hardware the processor is currently permitted to send data to.

In MS-DOS, all code was functionally in kernel mode – or more precisely, in a legacy mode of the Intel processor that emulated a time when the distinction didn’t exist, and all instructions were always allowed. A separate mode, referenced above, put the processor into a different legacy mode where it also acted like an 8086, but invoked special procedures whenever the program executed a privileged instruction, basically allowing MS-DOS to run inside a sandbox inside a larger operating system (I’ve used both Windows and Linux as the larger operating system in this model).

Unlike MS-DOS, a modern operating system will have controls on what is allowed to run in kernel mode, and everything else must run instead in user mode. The body of code that is intended to run in kernel mode is known as the kernel, or kernel code. If someone asks you what an operating system kernel is, this is the answer — the set of code that runs in kernel mode. It might be stored in multiple files, it might be all in one file, and it might be divided into internal components with different names, but that is what the kernel is.

So, if only the kernel can access hardware directly, and most code isn’t allowed to be in the kernel, then how does a normal application access the hardware? Well, instead of accessing it directly, the application must ask the operating system to do the thing on its behalf. Just as the operating system can install procedures as interrupt handlers, for the processor to trigger in case of hardware events, it can install system call handlers, procedures that run in kernel mode but can be invoked in user mode. These procedures will be designed to make sure that the user program in question is accessing the hardware in an acceptable way, and only perform the operation if it is allowed. Possibly, there will be no reasonable way for the program to even request an impermissible hardware operation.
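
Here is a minimal sketch of what asking the kernel looks like from a C program on a Unix-like system. The wrapper name `write_all` is hypothetical; the `write` function it calls is the standard POSIX interface to the write system call. The interesting part is the error path: the kernel checks the request and can refuse it, for instance with EBADF when the file descriptor does not refer to an open file.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the kernel to write len bytes to fd.
 * Returns 0 on success, or the errno value reported by the kernel. */
int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t written = write(fd, data, len); /* the system call */
        if (written < 0) {
            if (errno == EINTR)
                continue;  /* interrupted before writing anything: retry */
            return errno;  /* the kernel refused, e.g. EBADF */
        }
        /* The kernel may accept only part of the data; send the rest. */
        data += written;
        len -= (size_t)written;
    }
    return 0;
}
```

Calling `write_all(STDOUT_FILENO, "hello\n", 6)` sends text to the terminal without the program ever touching the hardware itself, while `write_all(-1, "x", 1)` is rejected by the kernel with EBADF.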

This is a key distinction between modern operating systems and MS-DOS, or even older Mac operating systems: whereas all operating systems provide abstractions, those with an OS kernel can provide mandatory abstractions. This means that, if you want to support new features, you can change what the system calls do, and all programs will automatically adapt to it. If your file system is suddenly stored over the network, programs won’t get tripped up trying to access the hard drive directly. The operating system can insert itself at the level of the system call interface and redirect your request to the network instead, as long as the system call interface is well-designed.

The Application Binary Interface

So let’s say you have a Windows program, and you want to run it on Linux. Or you have a Linux program, and you want to run it on macOS, which are both Unixes and have a better chance of being compatible. It won’t work — certainly not “out of the box.”

Why? Well, one reason is mentioned above. Different operating systems provide different ways of organizing the functionality of the computer into system calls. They provide different abstractions, which are nowadays mandatory.

For example, on Windows, different drives use different letters, and volumes shared over the network are also assigned letters, e.g. the famous C: drive, or A: for floppies, or X: maybe for a shared drive. On Unixes, different volumes — Unix doesn’t use the word “drive” as often — are assigned different mount points within the system. One volume might be /, and another at /home, and another /mnt/network, and it would provide the illusion of one unified hierarchical filesystem. Imagine if you had — as a simplified example — a system call to assign a drive letter to a network share. This would make sense with the Windows abstraction, but what would it even mean on Linux?

Another reason has to do with how programs are stored on the drive. Programs are not just a list of instructions for the processor. They usually have to be loaded at a particular address. Memory must be mapped for them to store their variables, and how much memory varies program by program. They have to load libraries of other procedures, which may be stored separately through dynamic linking in a shared library (Unix terminology, .so) or a dynamic-link library (Windows terminology, .dll), which is also going to be mapped at a certain address in memory according to arcane rules.

Different operating systems have different binary file formats, or formats for storing programs (which are often called binaries when stored on disk, although everything a disk stores is in binary). Linux has ELF (Executable and Linkable Format, which can use DWARF to store its debugging information), Windows has PE (standing for portable executable, which falsely implies it runs on more systems besides just Windows). Different Unix varieties have different binary file formats; it’s something that evolves over time. Some operating systems, indeed many operating systems, have different binary formats supported, for backwards-compatibility, or for simulating other operating systems, or even for different types of programs or programs written in different programming languages.
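
These formats can usually be told apart by the first few bytes of the file, often called a magic number. The following is a rough sketch in C, with hypothetical names, that classifies a buffer by its leading bytes. It also recognizes 64-bit Mach-O, the macOS format, which is not discussed above. It is only a first-pass check: a PE file, for example, begins with an old MS-DOS "MZ" header, and a thorough check would follow that header to the actual PE signature.

```c
#include <stddef.h>
#include <string.h>

enum binary_format { FORMAT_UNKNOWN, FORMAT_ELF, FORMAT_PE, FORMAT_MACHO };

/* Hypothetical helper: guess the binary format from the first bytes. */
enum binary_format identify_binary_format(const unsigned char *data, size_t len)
{
    /* ELF files start with 0x7F followed by the letters "ELF". */
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};
    /* 64-bit Mach-O magic 0xFEEDFACF as stored on a little-endian machine. */
    static const unsigned char macho64_magic[4] = {0xCF, 0xFA, 0xED, 0xFE};

    if (data == NULL)
        return FORMAT_UNKNOWN;
    if (len >= 4 && memcmp(data, elf_magic, 4) == 0)
        return FORMAT_ELF;
    if (len >= 4 && memcmp(data, macho64_magic, 4) == 0)
        return FORMAT_MACHO;
    /* PE files begin with the MS-DOS header signature "MZ". */
    if (len >= 2 && data[0] == 'M' && data[1] == 'Z')
        return FORMAT_PE;
    return FORMAT_UNKNOWN;
}
```

An operating system's program loader does a check of this kind first, and then parses the rest of the file according to the rules of whichever format it found.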

The combination of the set of available system calls, the available libraries on the system, and the format of the binaries constitutes the main blocker to compatibility between operating systems. This combination is called the ABI, or application binary interface, a phrase intended to sum up everything that needs to match for binary compatibility: the ability to run binaries (compiled programs as they are usually stored for running) from one system on another.

The Application Programming Interface(s)

There are other kinds of compatibility. Even though you can’t take the Windows version of a program and run it on macOS, we see plenty of programs that have versions available, right on their website, for both Windows and macOS. Similarly, most phone apps are available in both the iPhone and Android stores.

In some cases, that’s because there’s two applications, written by different teams, that solve the same problem (and have the same branding) or interact with the same servers (which run on Linux and where all the complex stuff happens anyway). But in others, it is substantially the same program that is run on both systems.

In many of those cases, that’s because the versions were written sharing a lot of the same source code, with a layer of software interfacing between that and the specific operating systems in question. This might be because there were different teams (or people) who maintained compatibility layers proprietary to that company (this is what many traditional software vendors do and have done in the past). Nowadays, it is more likely because there was a programming language that has implementations available on both platforms, and versions of the same library functions available for each (which is what Java was originally famous for and what Python does today).

This is fairly common for relatively new programming languages, where the programming language was written after the operating system was already around, and where part of the point of the programming language is to support multiple operating systems for your programs. For programming in an operating system’s “native language,” so to speak (for programming in C on Linux or Objective-C on macOS), it’s a bit harder: An Objective-C macOS program is unlikely to be particularly portable to anything (except maybe iOS).

There are some exceptions to this. A program written for Linux can usually be made to run on macOS, because of their common Unix heritage. Even though Linux and macOS have different ABIs or application binary interfaces, they have very similar APIs, which stands for application programming interface (NB: This term means something different in a modern, web programming context). This means that, although they are not very binary compatible, they are source compatible, or close to it, which is to say, that there are few changes to the source code you would have to make to a Linux C-based program to make it work on macOS. It might be invoking different system calls with different identification numbers when you write the code to open and read a file, but that code looks exactly identical on both platforms, possibly something like this:

    // Simultaneously both Linux and macOS C code
    int file_descriptor = open(filename, O_RDONLY);
    ssize_t res = read(file_descriptor, buffer, sizeof buffer);
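
The two calls above leave out the headers and the error handling. A slightly fuller sketch, which should compile unchanged on both Linux and macOS, might look like the following. The function name `read_file_prefix` is made up for this example; `open`, `read`, and `close` are the standard POSIX calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to size bytes from the start of a file.
 * Returns the number of bytes read, or -1 on error with errno set. */
ssize_t read_file_prefix(const char *filename, char *buffer, size_t size)
{
    int file_descriptor = open(filename, O_RDONLY);
    if (file_descriptor < 0)
        return -1;

    size_t total = 0;
    while (total < size) {
        ssize_t res = read(file_descriptor, buffer + total, size - total);
        if (res < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: try again */
            int saved_errno = errno;   /* close() may overwrite errno */
            close(file_descriptor);
            errno = saved_errno;
            return -1;
        }
        if (res == 0)
            break;                     /* end of file */
        total += (size_t)res;
    }

    close(file_descriptor);
    return (ssize_t)total;
}
```

The source is identical on both systems even though the compiled binaries are not interchangeable: each platform's C library turns these calls into its own system calls.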

As you might have picked up, this applies to only a subset of the functionality. Any GUI-related code would not enjoy this level of portability, since macOS and Linux have very different GUIs. More likely this is code intended to be primarily run on servers (and perhaps run on a Mac for testing), or code used by programmers (like git and other development tools designed to be run from the command line) or by scientists or other researchers (like the non-GUI components of Matlab and R or even Python).

The baseline API that all Unix-like operating systems have in common is called POSIX. Operating systems are certified as brand-name Unixes based on a bigger API specification, with more functions and more requirements, called X/Open — which is to say that Unix is defined not by where the code originated nor by its ABI, but rather by its C programming API. To be clear, an operating system based on Linux could probably pass X/Open and become legally a Unix, but nobody has decided to spend the time and money to try and make this certification happen. It is the fact that it is as close as it is that leads many of the original developers of Unix to consider Linux “a Unix,” as it is this API that ties the Unix family together.
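
A program can ask which revision of POSIX a system claims to support. As a minimal sketch (the two function names are hypothetical; the `_POSIX_VERSION` macro and `sysconf(_SC_VERSION)` are part of POSIX itself), this C code reports the version at compile time and at run time:

```c
#include <unistd.h>

/* The POSIX revision the headers claim, as a number of the form YYYYMM
 * (for example 200809L for POSIX.1-2008), or -1 if not declared. */
long posix_version_at_compile_time(void)
{
#ifdef _POSIX_VERSION
    return _POSIX_VERSION;
#else
    return -1L;
#endif
}

/* The POSIX revision the running system reports, or -1 if unknown. */
long posix_version_at_runtime(void)
{
    return sysconf(_SC_VERSION);
}
```

Portable Unix software often uses checks like these, rather than checks for a specific operating system, to decide which functions it can rely on.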

The Unix/Linux API is so important that Microsoft needed to add it to Windows and that macOS’s native use of it is considered a selling point, especially for developers. This is because a lot of server software and programmer tools assume this Unix API (as well as, for example, Unix filesystem conventions), or else assume Linux, which has too few peculiar features to make much of a difference. Most users are isolated from this, but anyone who has to write software to run on servers (which is most programmers) or use programmer tools (which is all programmers) is very keenly aware of this.

This Unix API is a core API provided by the operating system itself, the official, default way for applications to be written, but the other programming interfaces discussed above are also APIs. That is to say, Java comes with its own API that it brings to every operating system it runs on, leading to its once-famous “write once, run anywhere” slogan.

The most important API for application compatibility today is something irrelevant to most of this discussion though, and relatively new to operating system history. Most applications that run on your computer today run in JavaScript in the very controlled environment of a web browser. Part of what a web browser does is provide a stable, cross-platform (that is, multi-operating system) API for the portion of a web application that runs on each local computer. This interface is so important that many modern apps for phone and desktop are internally implemented as running inside a web browser, or something that resembles a web browser in more or fewer ways.

The System Library/Libraries

We spoke in the last section about the POSIX or Unix APIs. There are a lot of functions that a Unix-like operating system is expected to provide functionality for, in a lot of domains. Some, like opening or reading files, more or less have to be implemented as system calls, at least the most basic versions of them. Others, like calculating a square root, are simply procedures that run in user mode. Still others, like printing a number to the console, have to involve some system calls (to output text to the screen) but also some computation appropriate for user mode (to convert the number into a string of digits).

To provide these functions, Unix-like systems will provide their own version of the C standard library. On most Unix systems, this is maintained by the same organization that maintains the kernel, with Linux as the major exception. The set of POSIX APIs that a Unix will maintain is implemented through the standard library — some of them system calls, some of them implemented in user mode, and the programmer doesn’t have to care which.

In fact, between versions of the same operating system, and certainly between different operating systems, what used to be a system call might become a wrapper around a new, more advanced system call interface, where basically the library is providing compatibility with other versions of the same operating system. This is especially important in Unix, as there’s a lot of calls descended from different branches of the family tree with slightly different semantics, or subtleties of meaning, all of which are used by modern programmers, who can use whichever is more convenient to them or simply preferred.

The library enables source compatibility and API compatibility, even in situations where the kernel itself is much more particular about its system calls. The question is, where does the ABI compatibility layer go? On Linux, the kernel itself is responsible for ensuring that its updates don’t break working programs, and its founding and lead developer Linus Torvalds is adamant and dictatorial (sometimes abusively so) about this rule. If you want a system call to behave differently, what you do in these situations is actually make a new system call that behaves the new way, and leave the old system call available at the old number in case a program wants to use it.

However, all modern operating systems support dynamic linking. This means that the libraries and the main program binary are stored in separate files, and the main program binary specifies the names of the functions it calls, rather than using numbers. If all programs use dynamic linking, and only call system calls through the library, you can update the library to use a different system call interface, and change the kernel along with it. This is what macOS requires: while it is technically possible on macOS to bypass the library to call a system call, the attitude is that if you do that, you should not expect your program to work as expected. The operating system will still ensure it won’t break other programs, but will not guarantee your program to behave the same from version to version.

These are two vastly different approaches to maintaining ABI compatibility. In making the standard library part of the ABI, macOS doesn’t allow static linking, where all code in a process comes from a single file and copies of the libraries are placed into the main binary when you compile it. It’s not only not recommended; by default, macOS will not even run statically compiled binaries. If you want to have an alternative version of the C library, you can’t. If you’re writing in another programming language that doesn’t work like C, you still have to go through the C library to talk to the operating system, which isn’t written necessarily with other programming languages in mind.

But, the kernel developers have the ability to control their system call interface better. If they want to add a new system call, they can make their old way of doing it call the new system call, and keep the kernel cleaner. This is important because all code in the kernel constitutes a greater level of vulnerability — if a kernel accesses unmapped memory, it’s generally a kernel panic (the Blue Screen of Death on Windows), but a user process will just crash with a segmentation fault. Or worse, if you exploit a vulnerability in the kernel and manage to manipulate it into doing something for you it wasn’t supposed to, it can literally do everything on your computer. This as opposed to a regular program, which still can only do things the kernel permits it to.

Linux, on the other hand, has more flexibility. You can have statically linked files, your own C library, or libraries specialized for other programming languages. You can avoid all the baggage that comes with its implementation of the C library functions that have nothing to do with system calls.

Honestly, my preference would be somewhere in between. I’d have a smaller library than libc — maybe libsystem — that every program would be automatically dynamically linked to. This would be for things that are usually implemented as system calls, or that were system calls in previous versions of the operating system. These would be things that any programming language might reasonably want to use. The more C-specific stuff would be relegated to its own, more general library. libsystem would be as simple as possible.

Libraries that form part of the main API and that are provided with any installation of the operating system definitely constitute part of the operating system. Libraries that come bundled with specific applications or that exist to do certain program tasks are not part of the operating system. Which count as core operating system functionality is up to the operating system vendor, but all operating systems come with at least some libraries, to abstract their austere system call interfaces into something that you can actually program against.

The Shell (Command Line)

All modern (non-mobile) operating systems come with a command line interface, whether on the computer or on the server. When you type commands into the command line interface, it isn’t the kernel itself that reads the line you typed and decides how to proceed. Instead, a separate process does that. This process is key to the core job of an operating system — letting you run multiple programs and share resources between them — and therefore counts as part of the operating system, but is also not part of the kernel.

The concept of having the shell be a user process like any other was actually one of the early innovations of Unix over other contemporary operating systems. Before that, the kernel would often be responsible for this. By removing it from the kernel, Unix allows different users to use different shells, with different syntax for advanced features like scripting or running commands conditionally on the results of other commands. Even Windows has two shells now, traditional cmd and its newer “object oriented” PowerShell.

All shells can run any terminal-oriented program, and usually can also be used as a starting point to launch graphical programs when the system supports it, i.e., when it’s a desktop OS and not a server OS.

The Shell (GUI)

Not all modern operating systems have GUIs. Remember that many computers are servers (or embedded devices) where you don’t actually sit at a monitor and keyboard — where they likely don’t even have a monitor and keyboard. But for those that do, the concept of shell can be generalized to the program from which you run other programs.

On macOS this is called Finder, and it dates back to the early pre-Unix Macintoshes. On Windows this is called Windows Explorer. On Linux, and other Unixes that share Linux’s user interface philosophy, there are multiple desktop environments available, each of which handles program launching differently, and each of which usually comes bundled with a window manager that draws decorations around your windows and allows you to minimize, maximize, tile or overlap them. This leads to a rich diversity of Linux systems in their appearance and casual use.

It is usually these graphical shells, these desktop environments, that form your mental image of what an “operating system” is. But that can be misleading. Linux can have one of many different graphical user interfaces — or none at all — and most of what makes Linux Linux will be the same.

So what about Linux, Android, and ChromeOS? Are they the same operating system then, because they all share the same kernel? Linux and Android differ at a deeper level than a shell. An Android program can’t be run on a normal Linux distribution without some layer to accommodate the additional libraries, and vice versa. The different desktop environments on Linux all tend to be compatible with X, a unified protocol for UI interactions, and the many command line shells all run the same set of command line utilities, but Android display is not done through X.

In the case of ChromeOS, the situation is different. The shell in ChromeOS is basically the Google Chrome browser, which is the same thing that on other platforms acts as a single program in a larger context. So many programs nowadays are run through the medium of the browser that it’s become more than a single program in practice — many people only open the browser on their computer and use that for all or almost all of their computer-oriented tasks: one tab open to GMail for their e-mail; one tab open to Twitter; one to Spotify, to play the background music; another to Slack to talk with their colleagues; and finally yet another to Google Docs to do the actual productive work of writing whatever it is they’re writing. Is Chrome a shell in practice on these other operating systems? Is it just an annoyance for some users that there is the taskbar to switch between multiple programs, in addition to the tab bar to switch between multiple websites? Google certainly thinks this is true for some users, and it is for them that the Chromebook is intended.

Operating Systems Part II: Modern Operating Systems

2019-05-26T00:00:00+00:00

We use operating systems all the time in our life, whether designed for a computer, a phone, or for a server we’re more indirectly interacting with, but a lot of people don’t know very much about what connects the different systems we use, and what makes them distinct. We discussed fundamental concepts of operating systems in the last post, so in this post we will discuss how some of the same concepts apply to modern operating systems, going over them one at a time.

macOS

Unix moved on from controlling dumb terminals to having several graphical user interfaces. When Steve Jobs was fired from Apple in 1985, he started a company called NeXT to develop NeXTSTEP, a version of Unix with graphical user interface ideas, some from his work with the Macintosh, some developed independently.

When Apple was struggling to bring its operating system into the modern era, when Mac OS System 9 was still using cooperative multitasking, Apple bought NeXT and brought Steve Jobs back into leadership to turn NeXTSTEP into the next version of Mac OS, then called Mac OS X for the Roman numeral 10. In spite of superficial similarities to previous versions – the NeXT interface was changed to look more like previous Mac OS systems – and application compatibility (which was bolted on by running Mac OS System 9 as a single process within Mac OS X, which shows how much more sophisticated Mac OS X really was), the new version was completely different software descended from the original AT&T Unix.

It used to be common wisdom in some IT-savvy crowds (including a Best Buy salesman in my hometown when Mac OS X first came out) to claim that Mac OS X was a version of Linux, but this is not true. Linux is one of many operating systems that come from the Unix tradition, and Mac OS is a different one, sharing much of the Unix core instead with FreeBSD, a much less common version of Unix descended from the version developed at UC Berkeley (BSD stands for Berkeley Software Distribution).

For “desktop” computers, including laptops, macOS is now by far the most installed brand-name Unix operating system, and even if you include Linux in a broader category of Unix-like operating systems, it still is the most popular one on the desktop.

This is in spite of the fact – or perhaps because of the fact – that macOS doesn’t really emphasize its Unix “underpinnings.” Its graphical user interface is proprietary to Apple, and there are often macOS-specific libraries that circumvent or supersede equivalent Unix ones, especially when focusing on the GUI applications.

Apple also doesn’t invest a lot of resources into making the command line interface friendly or powerful. Most Unixes make it easier to install new applications and frameworks via command line, and the macOS command line is not particularly well-integrated with the graphical interface, to the point where it sometimes seems like the GUI is next to Unix rather than being built on Unix.

Finally, strangely for a Unix, Apple does not provide a server version of its operating system, making it difficult for software developers for Macs to be able to run server-side tasks like bulk automated testing on the same environment as their workstation.

iOS, watchOS, etc.

iOS, watchOS, and their ilk are locked-down versions of macOS. Unlike on macOS, each application is locked into its own directory and can only access its own files, rather than being able to access any files owned by the current user. The security features of Unix are applied to isolate applications from each other rather than users, and the user doesn’t really see the concept of the file system; instead, each app simply remembers information for the user, and presents how it’s organized in its own way.

Since only one application is visible at a time on many of these devices, this gives it a feel similar to an old single-tasking operating system, where each application is more its own universe. Since they don’t visibly share a file system, the applications also interact less with each other.

The scariest thing about these operating systems is that they’re set up to protect the owner of the device “from themselves.” Only Apple-approved applications can be installed unless you jailbreak the device, which voids the warranty. Apple constantly lobbies for jailbreaking to be made illegal, which they claim is for the users’ protection and to prevent users from illegally copying apps, but it is also because they get a huge cut of all sales done through iOS apps, which Spotify claims is against European law.

Open Source and Linux on the Desktop

The open source movement, and its more opinionated cousin the free software movement, believe, to various extents, that it is valuable for software to be open source (or alternatively phrased free as in speech). This means that anyone can read the source code to the software, the version of it that is human readable and editable by actual programmers. It also means that anyone can make modified versions of it, and publish them, usually with different branding. Some open source/free software licenses require those modified versions to also be open source, while others allow them to be proprietary, but in all cases, the fundamental nature of open source software is that anyone can make their own version (given sufficient programmers and time).

Linux (sometimes called GNU/Linux because Linux technically only refers to one part of the operating system, the kernel) is an open source reimplementation of Unix. It organizes software in the same way that Unix traditionally would, is written so that Unix programs can treat it as yet another version of Unix (of which there were already many incompatible versions), and follows the design of Unix function call by function call, command by command.

Linux is a really big deal on the server, and as a component of the Android operating system, as we’ll discuss later. It also is usable as a desktop operating system in its own right. It inherited a graphical user interface framework from Unix, known as the X Window System (often called X Windows or simply X), and the open source movement inspired a lot of work writing desktop environments within that framework, so that there could be an entire modern desktop operating system that was open source.

Throughout the 90’s and 2000’s, many Linux enthusiasts would hope that someday, a completely open source operating system could reach common use. Articles would be written claiming this was imminent, to the point where it became an easy-to-mock cliche: “This is the year of Linux on the desktop!”

Ultimately, though many companies tried, no one succeeded in arranging for it to be pre-installed on mainstream desktops or laptops nor in polishing it enough to convince the normal user to install it over what their computer came with. It is now a mostly-usable operating system, should you choose to install it on your computer or buy a computer with it pre-installed (which is an option some manufacturers now market towards software developers). It is very well-suited for programming for reasons we’ll discuss later, but still a bit awkward for things like setting up Bluetooth or getting interesting features to work.

Windows NT, XP, etc.

The history of Windows is intricate and arcane, and as a result, the Windows 10 of today has virtually no code in common with the Windows 3.1 discussed above. Similar to macOS, the Windows brand at some point was switched out with a better operating system implementation, although in Windows’s case, that implementation came from Microsoft’s “workstation” or “business” version, Windows NT.

Windows NT first came out shortly after Windows 3.1, and to avoid having a Windows NT 1.0, which might sound less sophisticated than the existing Windows 3.1, the very first version of Windows NT was called Windows NT 3.1. It was based off of OS/2, a failed collaboration between Microsoft and IBM to render MS-DOS obsolete, and it did not boot off of MS-DOS nor use MS-DOS as a layer.

Windows NT was designed from the beginning to support programs designed for other operating systems. For more sophisticated operating systems, programs have to go through the operating system to access hardware, by invoking procedures that invoke operating system code, and different operating systems provide different procedures. Based on what program you were running, Windows NT could support many sets of procedures (also known as APIs, but distinct from what API means on the web), which it called personalities.

Windows NT had from the get-go a personality to support Windows 3.1 programs, a 32-bit personality to support new Windows NT programs, and a personality to support MS-DOS (which involved much more machinery to give the program the illusion of more direct hardware access). It also originally came with personalities for Unix and OS/2, which eventually were removed.

As Windows NT supported traditional Windows programs as a personality, Windows and Windows NT co-existed for a long time. Windows 95, 98, and Millennium were versions of Windows that still used MS-DOS as part of their structure and which did not attempt strong security or rigor (though they did adopt preemptive multitasking), while Windows NT 4.0 and Windows 2000 (aka NT 5.0) were versions of Microsoft’s more sophisticated operating system, that could more or less run the same programs but focused on stability and workplace use (with the presumption of professional IT people), rather than Microsoft’s maniacal obsession with application support and its easy-to-use brand.

Eventually, in Windows XP, they made the switch. They risked worse compatibility with really old applications (after all, the operating system was completely switched out under the hood) in order to push everyone towards their more modern operating system. Windows XP was internally Windows NT 5.1 (and remember that Windows NT 3.1 was the first one because it borrowed its number from the other OS called Windows), and it replaced Windows 98 and Millennium as Microsoft’s flagship consumer OS.

Now, they don’t have to maintain two completely different operating systems anymore. Their server OSes are still distributed separately, but that is mostly for licensing and configuration reasons – it’s the same fundamental OS with different features enabled and different auxiliary programs shipped. All in all, Microsoft has a simpler tech architecture now that they’ve pushed everyone towards NT.

This is a good place to clear up a common misconception: the Windows command line, in modern NT-based Windows, is not a version of MS-DOS. It is only related to MS-DOS aesthetically: It has a similar look to the prompt (C:\>, C:\WINDOWS\>), and similar commands to do similar things (dir to list files instead of Unix’s ls). It is simply the Windows command line.

Furthermore, support for MS-DOS binary compatibility was finally dropped with the transition to 64-bit computing, not because Microsoft wanted to, but because that would require a processor mode that AMD (and therefore Intel) decided not to support in their hardware.

You can’t, on the AMD64/Intel64 platform, have a 64-bit operating system and a “virtual 8086” mode process, where the processor would have to pretend to give you full control over the computer and pretend to be an ancient MS-DOS-era computer while also giving final say to the real 64-bit operating system. Intel32 supported this for 32-bit OSes and 16-bit MS-DOS compatibility, but I suppose the processor manufacturers thought the 64-bit vs 16-bit compatibility bridge was just a bit too far.

Microsoft Windows’s Monopolistic Market Dominance and the Open Source Movement

In the 90’s and 2000’s, Microsoft had a lot of power through Windows. It constituted a monopoly on consumer operating systems, and people were scared to run other operating systems, because application compatibility was a big deal. Only major application vendors had the resources to support two operating systems (which was much harder in those days), and so having a different operating system (especially an ill-supported open source operating system like Linux) could cut you off from the rest of the computing world.

Microsoft used this power to control the application market, because any application it bundled with the operating system would drive any competitor out of business. It did this any time it thought an application was interesting, including writing its own web browser that drove Netscape out of business, finally attracting a lawsuit that almost split Microsoft into multiple companies. When that didn’t happen, it looked bad for the computer industry.

Microsoft also had corrupt relationships with computer manufacturers. Deals were signed where the hardware vendors would have to exclusively install Microsoft Windows on their computers, or else pay Microsoft based on how many total computers they sold rather than how many came with Windows. This meant that Microsoft didn’t actually have to improve Windows to compete; they could just rest on their laurels due to their shrewd and blatantly illegal business dealings.

At that time, it seemed like the only way to break Microsoft’s competitive hold was compatible, open source alternative versions of everything. OpenOffice was written to try to be an alternative to Microsoft Office, but it was a non-starter unless it could read and write Microsoft’s proprietary Office file formats. Similarly, Mozilla Firefox, the first web browser to erode Internet Explorer’s hold on the web, only worked on many sites because it used to be configured by default to tell web servers that it was Internet Explorer rather than identifying itself honestly.

The crown jewel of this effort would have been working compatibility with Windows programs on another operating system — at the time, that was often seen as the only hope for breaking Microsoft’s monopoly on operating systems. Two efforts co-existed in that regard, Wine and ReactOS.

Wine was the more serious effort, which would have allowed Windows programs to run unmodified on Linux, including Microsoft Office, which was the only program that could perfectly read Microsoft Office documents. Wine would provide Windows applications with a personality, like Windows NT had, where they could call Windows’s library functions, and have them translated into the equivalent series of Linux library function calls to get their work done.

ReactOS was fascinating to me at the time because it attempted a complete open source reimplementation of Windows NT. Programs running on ReactOS would act like programs running on Windows because the operating system was designed from the beginning to act like Windows.

Neither of these projects gained enough stability to be used in any production setting. What ultimately lessened Microsoft’s stranglehold on power was the fact that nowadays, it’s not really relevant for most applications what operating system you use, because applications have transitioned to the web for deployment.

Nowadays, when you want to do something new with your desktop or laptop computer, you don’t install a new application (although interestingly, you still do with your phone). Instead, for the most part, you go to a website, whether for matchmaking services, communicating with people through many different means of communication, or ordering food to your apartment. The local program you buy at a store, or even download over the Internet, has been obsoleted by just going to a website, where you don’t even need to install anything. And as a result, Microsoft’s biggest stranglehold was eroded from a direction they barely expected.

They tried to hold on, as long as they could, by making their web browser, Internet Explorer, the standard web browser, and encouraging websites to use Internet Explorer-specific features. Eventually, Firefox was compatible enough with Internet Explorer to break through that monopoly and force Microsoft to update its browser, which led to the current situation, where Chrome is becoming the new monopolistic web browser and now it is Google that is close to single-handedly controlling our primary platform for deploying applications.

Linux and Unix on the Server

I mentioned before that Linux and macOS were both popular among developers. Linux certainly allows a lot of customization, and you could see how that would be appealing for advanced users like many developers are – but that doesn’t really explain the popularity of macOS, which is the opposite.

Really, Linux and macOS are popular among developers because they are Unixes. Unix — and Linux, which is now basically the best Unix for most tasks — never waned in popularity in the minicomputer space, which evolved into the server space. When you are running a server, having a powerful (and programmable) command line is a huge plus, and not having a smooth GUI experience or drivers for every consumer device is a non-issue. Linux is the de facto standard for server operating systems now, and when developing applications to run on the server (like the server side components of any web application, including Facebook, Twitter, GMail, and more or less any you can think of), it is useful to have a match between what you run on the server and what you run on your personal computer.

macOS provides a close enough match to Linux servers to be useful for development. Most Linux software also runs on macOS, because of their shared Unix heritage and continuing efforts to keep compatibility. The compatibility isn’t perfect, and many programmers like the flexibility that comes with Linux (and don’t mind the inconvenience), and so Linux is also popular among developers as a client OS.

Windows is actually actively trying to catch up with macOS in this domain; it has introduced the Windows Subsystem for Linux, an NT-based personality that allows Windows to run Linux programs, unmodified. This is an impressive technology marketed at developers and used for practical applications by many people I know.

What is a server?

What does a server do? It waits for incoming connections from other servers and from client computers like your laptop or phone, and responds to requests. It stores your data in databases and file systems, and does the heavy lifting that needs to be done by a more powerful computer than you really need to have in your own home. We interact with servers every time we use a web browser or an e-mail client, and most phone apps and games have a server-side component — certainly if they involve coordination with other people and other phones!

As the “cloud” grows as a concept, more and more of our computing is done on servers owned by big companies. We store our documents and spreadsheets on Google Drive, keep our contact information on iCloud, or let our photos be saved on Instagram. All of these services use Linux to power the servers that actually store the data and provide it to us in an organized and secure way.

Android

As mentioned earlier, Linux is technically only one component of the operating system called Linux (or rather the family of operating systems, because many companies and organizations leverage its open source nature and distribute their own Linux-based operating systems, and there is no one official complete distribution), namely, the kernel. The kernel is the portion of the operating system that runs in a privileged mode on the processor, which forces the applications to go through it rather than access the hardware directly (as on MS-DOS).

Android uses the Linux kernel — but nothing else from the operating system commonly called Linux. Like iOS, it uses its kernel in an idiosyncratic, locked-down way — not quite as locked-down as iOS, but much more locked down nevertheless than any desktop operating system.

Android is open source, but you need to pay Google to use their app store and standard apps and brand. Off-brand Android can only be used in practice by companies rich and powerful enough to build out their own app store, like Amazon. Being able to run Android apps would be a relatively easy way for another mobile OS to gain a pre-existing developer base.

ChromeOS

And Google somehow, after writing Android, wanted yet another Linux-based operating system. ChromeOS, popular in American public schools like Mac OS was in my school days, is exactly what it sounds like: a laptop operating system where you just run Google Chrome. With so many apps in the browser anyway, what’s the downside?

In a ChromeOS context, from a user’s point of view, you begin to wonder what the difference is between a browser and an operating system, really. An operating system lets you run multiple applications — but now those are just different browser tabs. Who cares whether the Linux kernel or Chrome itself are the pieces of software that separate the applications from each other — from the user’s perspective, it’s all the same.

If you unlock the developer mode, you get a somewhat dumb version of “Linux on the desktop,” with a Linux command line interface. This is convenient for people who only want to use the web and log into remote servers, which is a surprisingly large demographic.

What is an operating system?

2019-04-28T00:00:00+00:00

A user of modern technology hears the term “operating system” thrown around a lot. Most people can name a few examples: Windows and macOS on workstations and laptops, iOS and Android on phones. Some people might even throw in Linux or Unix or ChromeOS. Most people also understand that a program or a game or even a sufficiently advanced website might work on some operating systems but not others, and might require different versions for different operating systems. But it’s a bit less clear what an operating system actually is, how it fits into the general model of a computer, and how it works.

This isn’t surprising, because “operating system” is a bit of an amorphous concept. Is it a type of program? It’s certainly different from most programs we think of!

It wasn’t my idea to ask this question. I listened to a talk recently by the lead programmer on a project to develop a new operating system, and he spent at least the first quarter of the lecture and many slides trying to come up with a workable definition that jibed well with most programmers’ and users’ intuitions. [Edited to add: It was Bryan Cantrill, who brings this up in multiple talks. I am unsure which one inspired this.]

But now that I’ve heard the question posed, I feel compelled to try to answer it. So, to explore this concept, I’m going to talk about a lot of operating systems from history. These aren’t going to be the operating systems that invented the models in question, but rather typical examples of those models, especially very popular operating systems of their era and ones that were direct predecessors to popular operating systems today. All of the fundamental technologies discussed pre-date the operating systems I discuss to typify them.

Computers Without Operating Systems

To see what an operating system is, and why we might want one, let’s imagine a computer without an operating system, or perhaps with a very minimal operating system. Such computers once existed; people my age or older might remember the Apple II or the Sega Genesis. A more recent example might include earlier versions of the Game Boy. These computers (and a game console is a type of a computer for these purposes) could only run one program at a time; if you wanted to run a different program or game, you had to turn the device off, insert a new floppy or cartridge, and turn the device back on again.

The same physical machine took on an entirely different interface based on what software you provided. Each program had full control of the computer while you were running it, to the extent that you had to turn the computer off to stop running the program. Each program also managed its own storage; you would save your Sega Genesis games on the cartridge, not the console, and could then resume them on your neighbor’s console if you wanted to.

This is very different from how computers with operating systems work, and leads me to the following definition of an operating system: an operating system is a set of software that allows multiple programs to co-exist on a computer. You need an operating system to, for example, reasonably have a permanent hard disk, because there needs to be some convention for deciding which programs write their data to which portions of the disk.

A Minimal Operating System: MS-DOS

This definition includes older operating systems like MS-DOS (see the original source code), Microsoft’s flagship operating system from the 80’s and early 90’s. MS-DOS could only run one application at a time, like the Apple II or the Sega Genesis. The difference is that MS-DOS would at least let you share a hard disk between applications, and it also let you switch which application you were using without rebooting or inserting new media. Sharing a hard disk between programs was its defining feature, to the point where DOS actually stands for “disk operating system.” MS-DOS shared this acronym DOS with other, similarly featured microcomputer operating systems of its day, which also focused on simply letting programs share a hard drive.

To share a hard drive between multiple programs over time, all the programs have to agree on how the hard drive is organized. It wouldn’t do for a game to store its game data on sector 13 of the hard drive when a word processing editor wanted to store its list of documents on the same sector. The hard drive required not only an organization scheme, but one shared between different programs by different authors.

This was done through a file system, which allowed you to assign names to long blobs of bytes, called files. A programmer could have a program store whatever it wanted in the files it created, but as long as it created files with different names from the other programs, the operating system, with its file system, would ensure that the data could be found again without each program having to have its own, possibly conflicting, ideas of where to look directly on the disk.

On MS-DOS, file names had to be 12 characters long or less: 8 characters of name, a dot ., and a 3-character extension, for example, teleport.doc or taxr1998.xls. The extension served as a convention to indicate which program was supposed to care about this file. Your spreadsheet program would let you save spreadsheets on the same file system that your word processor would let you save your documents. Some mechanism was needed to say which program should be run to make sense of which blob of binary bytes, especially because the first version of MS-DOS didn’t even have support for directories (which we now might call folders).
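
To make the 8.3 rule concrete, here is a minimal sketch in C of a check for it. The function name fits_8_3 is made up for illustration, and it simplifies real MS-DOS: it assumes the extension is optional and may be shorter than three characters, and it ignores which characters were actually legal in a name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` has the 8.3 shape,
 * meaning 1 to 8 characters of base name, optionally followed by a
 * single dot and 1 to 3 characters of extension. */
bool fits_8_3(const char *name) {
  const char *dot = strchr(name, '.');
  size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
  if (base_len < 1 || base_len > 8) {
    return false;
  }
  if (dot == NULL) {
    return true; /* no extension at all */
  }
  const char *ext = dot + 1;
  if (strchr(ext, '.') != NULL) {
    return false; /* a second dot is not allowed */
  }
  size_t ext_len = strlen(ext);
  return ext_len >= 1 && ext_len <= 3;
}
```

Under these assumptions, teleport.doc and taxr1998.xls pass the check, while a name like real_long_name.html is rejected because both its base name and its extension are too long.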

If you opened a file with the wrong program, the program might notice you used the wrong extension — or it might not, and give you gibberish results from misinterpreting the data. It would certainly encourage you to save files with the proper extension — a concept that survives in Windows to this day, where programs only offer to open files that have an appropriate extension.

By modern standards, MS-DOS and its file system didn’t do very much. It didn’t stop a program from modifying files intended for another program, or even from wiping the computer entirely; it simply created an organizational system that allowed programs to co-exist and store their data in an organized fashion, as long as the programs were well-behaved and not buggy (or malicious).

It did have to define a format for programs themselves to be stored on the disk. You could tell which files represented runnable programs because they had the extension com (for “command”) or exe (for “executable”). It also had to provide a program to launch your application programs, known as a shell. It was the first program that ran when you turned on the computer, and you could use it to select other programs to run. At the time, this was done through a command-line interface: it would prompt you with the text C:\>, and you would have to type the name of the file that contained the program you wanted to load (or alternatively do some very basic file management directly from the command line through built-in commands).

Besides its core mission of providing a system to operate a disk, the “disk operating system” did also have other code, to help programs interact with the hardware. As most components besides the disk could be used by the programs however they wanted without damaging others (because only one ran at a time), this code wasn’t as essential to its functionality, but it did exist. Software used to interact with hardware is called drivers, and they might be included in an operating system or might be loaded separately, depending on the design. Driver code is organized into procedures that programs invoke to do things to the hardware (e.g. draw on the screen or print a file), or code that is installed as interrupt handlers so that the processor will interrupt the current task whenever a certain hardware event happens (e.g., what to do when the user presses a key). Because MS-DOS was so minimal, both types of drivers could be circumvented.

And in actuality, application programs could circumvent the driver that was the most core to its role as a “disk” operating system — the driver for the hard drive, and the layer that allowed you to edit it in terms of files. MS-DOS couldn’t even force programs to use its procedures for the one abstraction it absolutely had to maintain. Though the existence of official filesystem procedures provided some stability, many programs circumvented these procedures and modified the hard disk directly, (hopefully) making sure to respect the conventions but not using MS-DOS’s actual code. MS-DOS, especially at first, was a little bit of code, and a lot of “gentlemen’s agreement” — it had no security or rigor whatsoever.

This had some upsides. Every application had access to the full power of the computer. Microcomputers were much slower then, and so every ounce of direct hardware access could be a major performance boon, especially for games. Furthermore, many applications supported hardware that the operating system itself could not: In MS-DOS days, you often had to do separate sound card or even graphics configuration for every game you had, but at least you weren’t limited by what Microsoft had chosen to provide support for.

It also had some downsides. Obviously, securing your files was impossible: there was a way to mark files as read-only, but it could only be advisory. There was no system of multiuser file ownership — though an application could individually provide an encryption feature. These downsides weren’t too bad — if you trusted everyone who used your computer, it wasn’t really a problem. It’s generally better anyway to secure your computer with encryption or just by putting it in a locked room.

More importantly, this was a hazard for the stability of the system. Any program could decide to circumvent the standard ways of doing file access, and many did, to cut corners on performance. But many different pieces of code all interacting with the same file system is many opportunities to mess up and have bugs instead of just one. There was a real risk of a poorly-written program corrupting your file system, deleting files it wasn’t even supposed to touch or potentially rendering the entire filesystem unusable.

The biggest long-term problem for Microsoft was a subtler version of this: If Microsoft wanted to change the file system — if they, for example, wanted to make filenames longer than 8.3 (so you could say real_long_name.html instead of rllngnam.htm), they couldn’t just go do it themselves. Changing a bit of code is easy. Changing a subtle gentlemen’s agreement requires all the gentlemen in question to agree. If they had changed the format to allow more characters, programs that used their officially recognized libraries would keep working, but those that accessed the file system on the hard drive directly would be following the old ways when the conventions had changed. They would be thrown off by the long filenames like old people thrown off by how young people dress. The software that followed the old conventions could easily accidentally delete data that no longer follows them.

If this were just an occasional program that was doing things its own way, then Microsoft could just break that one program. Unfortunately, many many programs had their own ways of accessing the disk. The “disk operating system” couldn’t even keep control of its central feature.

The other major downside of MS-DOS and OSes like it is that you couldn’t run multiple programs at the same time. It allowed different programs to run in sequence, and to share permanent resources (the filesystem). On a modern operating system we take for granted the ability to multitask programs. We listen to music while being ready to receive a call at any moment — and to return to the music when the call is finished. We expect to be able to look up directions or text messages while talking to our friends while a file is downloading in the background. This takes much more sophistication than MS-DOS could provide.

Luckily for those who wanted multitasking, many systems existed to add multitasking to an MS-DOS installation. Because MS-DOS was so minimalistic, an MS-DOS program took full control of the computer when it was run. If it used that control to dispatch between multiple, simultaneously running programs, it fits our definition of an operating system: a software system that allows multiple programs to coexist on a computer. Basically, operating systems existed that used DOS as their launching point, taking over the computer and providing richer and more modern services to the programs running under its scope.

These programs/OSes were called “DOS extenders,” and the most famous of them was written by Microsoft, DOS’s vendor, to add multitasking (and GUI, which in the personal computer world often went hand in hand) to their otherwise primitive operating system. This was called “Windows.”

For those of you who don’t remember this era, Windows was not always the operating system a computer would immediately boot into. It used to be that Windows masqueraded as a MS-DOS program, that you’d boot up the computer and see a command-line prompt, and have to type win before you saw any graphical user interface whatsoever. Without a preexisting MS-DOS installation to set up the file system and do initial hardware configuration, you couldn’t run Windows at all — not that Windows wasn’t sophisticated enough, but it had always been run that way, and so it never replicated that functionality in the boot process. Similarly, Windows at the time was constrained, just as DOS was, by its 8.3 filename convention. It had to share a filesystem with DOS programs, as it was itself a DOS program — as well as an operating system in its own right.

By the time Windows had gotten to version 3, it had the ability, on sufficiently powerful computers, to run multiple copies of MS-DOS at the same time and an MS-DOS program in each of those copies. And yet, at another layer of abstraction, it was itself a program run from the one copy of MS-DOS that your computer booted. Microsoft cleaned up this situation in Windows 95, which still used DOS internally as part of its boot process, but went straight to graphical, Windows mode when the computer turned on.

Cooperative Multitasking

Windows 3 supported graphical user interfaces and running multiple programs at the same time, and so did Mac OS System 7, both from the early 1990’s. However, multiple programs did not, and could not, literally run at the same time — the processor executed instructions in a stream and that stream of instructions represented only one program at a time.

To maintain the illusion of running multiple programs at the same time, these systems used cooperative multitasking. In cooperative multitasking a program runs for a short amount of time, and then it is expected to yield control of the processor back to the operating system.

In a graphical user interface, this usually corresponded to an event of some sort. When the user clicked in some window, the program that owned the window would get to run for enough time to decide how to respond to it: what internal memory it should update, what it should write to the hard drive, and what new things it should display on the screen. Once it was done handling the event, it would return to the operating system, which would then see if the user had pressed a key in the meantime, which might mean sending an event to another program. The program could also, however, maliciously or accidentally, fail to return to the operating system, in which case the computer would simply hang and refuse to respond to more input. This is why operating systems of that time would regularly freeze completely in the presence of a poorly-written program.

All the programs were loaded in memory at the same time, and there was nothing protecting one program’s internal data from being overwritten, maliciously or accidentally, by another program. Basically, the different programs could be thought of, in a modern sense, as collections of loadable event-handling subroutines for one graphical interface system. They were kept separate again by convention, by gentlemen’s agreement.
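
The "collection of event-handling subroutines" picture can be sketched in a few lines of C. This is a toy model, not how Windows 3 or System 7 were actually written; the names program, event, dispatch_all and count_events are all invented for the illustration.

```c
#include <stddef.h>

/* Each "program" is just a handler plus some private state. */
typedef void (*event_handler)(int code, void *state);

struct program {
  event_handler handle;
  void *state;
};

struct event {
  size_t target; /* index of the program that owns the window */
  int code;      /* what happened, e.g. a click or a key press */
};

/* The toy "operating system": hand each event to its program and wait.
 * Control comes back only when the handler returns, so a handler that
 * loops forever would hang every other program too. */
void dispatch_all(struct program *programs, const struct event *events,
                  size_t n_events) {
  for (size_t i = 0; i < n_events; i++) {
    const struct event *ev = &events[i];
    programs[ev->target].handle(ev->code, programs[ev->target].state);
  }
}

/* A well-behaved handler: count the event and return promptly. */
void count_events(int code, void *state) {
  (void)code;
  int *counter = state;
  (*counter)++;
}
```

Nothing in this loop can take the processor back from a handler, which is exactly the weakness of the cooperative model.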

For certain background tasks, like playing music, the code to keep sending data to the speakers has to be run repeatedly, on a timer — so any apps that use that feature can crash the computer at any time by simply failing to complete.

So while these operating systems were more sophisticated than MS-DOS and its cohorts, in another sense they promised more than they could deliver, and relied even more on the good behavior of the programs they managed.

They allowed multiple programs to run simultaneously, but actually required more out of the individual programs to have a harmonious system. After all, if an MS-DOS program crashed, the computer might have to be rebooted, but at least you only lost your work in that program. If a Windows 3.1 or Mac OS System 7 program crashed, you’d lose work in all the other programs it was “multitasking” with.

By this point, there were stronger protections against a program circumventing the operating system with its own drivers. It was still generally possible, but less likely to be done. This is important, because while in MS-DOS, it makes perfect sense for each program to define what happens when you click the mouse, on a graphical system, the mouse has to control a mouse pointer which moves from window to window and acts the same whichever application is in the foreground. When more than one application runs at a time, more hardware becomes shared resources, and so the operating system must take on responsibility for it, even if this responsibility is only carried out cooperatively.

Windows wasn’t Microsoft’s first attempt at a more robust operating system than MS-DOS. For a while, it tried to market a more sophisticated version of MS-DOS, still command-line centric, but without many of the deficits we’ve discussed. This operating system was Xenix.

Xenix was Microsoft’s entry into a longer, older tradition of the Unix operating system. This tradition is mostly present today in Unix’s off-brand workalike clone, Linux. It is from the world of minicomputers, which is what we used to call what we now call server-class computers, from before the primary use of them was to provide centralized infrastructure for other “client” computers.

Before any of the other operating systems we’ve discussed, Unix was developed at Bell Labs for minicomputers (see the original source code). Don’t let the name fool you: they’re so named because they’re the size of a refrigerator rather than the size of a warehouse room like a mainframe. Unix ran on a single computer that had multiple dumb terminals connected to it, which means that there was a non-computer device that the user would sit at, and use a command-line interface to interact, over the phone or some other connection, with a centralized computer that was shared with other users.

In such an environment, the laxness of MS-DOS or Windows 3.1 was simply unacceptable. While security against malicious users was not necessarily important, depending on your user-base, there needed to be some level of robustness against ill-behaved programs, especially as at the time, most computer users would regularly write new programs that could easily behave poorly, as they were still being developed.

More importantly, programs would often have to bulk-process data. On the spectrum of “consumer interaction” to “serious work,” these early minicomputers were very much on the side of “serious work” in their common use cases. You might leave a program running for hours as it processed a large bulk of data. You didn’t want to have to worry about letting other users’ programs get a chance to run — at the very least, you didn’t want to have to put active effort into making it possible. It would be inconvenient.

On the hardware side, these computers’ processors, like processors on microcomputers (as personal desktop and laptop computers were once called), processed one series of instructions at a time. Something had to be done to give each of the users the illusion that they were the only one running their tasks on the computer.

If a process — meaning a currently active instance of a user running a program — was waiting for more data, because it had requested a read from the operating system (which mediated all reads from files or any terminal), it was similar to the cooperative situation: the operating system would suspend or block the execution of the current process, and schedule it again when the read had completed, perhaps in response to a terminal user hitting the [Enter] key.

But there could be long gaps between when a process would enter into a blocked state like this. A user could try to calculate a million digits of Pi. On Mac OS System 7, some sort of yield function would have to be called from time to time, to give other events a chance to be handled, but ideally we don’t want that complexity to be passed onto the application programmer.
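
To see what that burden looks like, here is a small sketch of a long computation written in the cooperative style. The yield_fn type and the function names are hypothetical; a real system of that era had its own calls for this, and this sketch only assumes that the application must call something periodically.

```c
/* Hypothetical hook the application must call to let others run. */
typedef void (*yield_fn)(void *os_state);

/* Sum the integers 1..n, yielding after every `slice` iterations.
 * The yield calls are pure bookkeeping that has nothing to do with
 * the computation itself, yet the programmer has to remember them. */
unsigned long sum_with_yields(unsigned long n, unsigned long slice,
                              yield_fn yield, void *os_state) {
  unsigned long total = 0;
  for (unsigned long i = 1; i <= n; i++) {
    total += i;
    if (slice != 0 && i % slice == 0) {
      yield(os_state);
    }
  }
  return total;
}

/* A stand-in for the system: just count how often we were yielded to. */
void count_yield(void *os_state) {
  unsigned *yields = os_state;
  (*yields)++;
}
```

If the programmer picks too large a slice, or forgets the yield entirely, every other program on the machine stalls for the duration of the loop.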

Instead, before letting a process run on the processor, the operating system will first set a timer in the hardware. When the timer goes off, it will cause a timer interrupt, where the processor will stop what it’s doing and run an operating system procedure instead. That operating system procedure will suspend the currently running process, using features of the processor to make it so that when the process is resumed, it is almost impossible for the user — or even for the program — to detect that it had ever been interrupted.
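
A rough user-space analogy for this can be built from POSIX signals. This is not how a kernel implements preemption; it only assumes a POSIX system with sigaction and setitimer, and the names on_tick and spin_until_interrupted are invented. The point is that the loop never yields, and yet it gets interrupted anyway.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks = 0;

/* Plays the role of the timer interrupt handler. */
static void on_tick(int signo) {
  (void)signo;
  ticks++;
}

/* Arms a one-shot 10 ms timer, then spins in a loop that never yields.
 * Returns the number of loop iterations completed before the timer
 * fired, or -1 if the timer could not be set up. */
long spin_until_interrupted(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_tick;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGALRM, &sa, NULL) != 0) {
    return -1;
  }

  struct itimerval timer = {
    .it_interval = { .tv_sec = 0, .tv_usec = 0 },     /* do not repeat */
    .it_value    = { .tv_sec = 0, .tv_usec = 10000 }, /* fire in 10 ms */
  };
  ticks = 0;
  if (setitimer(ITIMER_REAL, &timer, NULL) != 0) {
    return -1;
  }

  long iterations = 0;
  while (ticks == 0) {
    iterations++; /* no yield call anywhere in this loop */
  }
  return iterations;
}
```

The loop body has no idea it was interrupted; it simply observes, after the fact, that ticks has changed.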

In that case, while we hope that only one user is running a complicated task at a time, even when multiple are, their long-running tasks simply split the processor 50/50 — or in some other proportion deemed fair by the system’s scheduler.

For every purpose but speed, however, the user has the illusion that they’re the only one using the computer, although in fact many users might be using it at the same time. Just as sharing a disk was the primary feature of MS-DOS, splitting processor time was the primary feature of Unix, as evidenced by its original full name, the “Unix Time-Sharing System.”

Time sharing was often, but not always, paired with memory protection, the idea that a process was limited in what memory it could modify, and isolated from other processes. This was a feature that most minicomputers had, but that it took a longer time to mature on microcomputers. This feature usually goes hand-in-hand with a mechanism to force programs to interact with hardware through the operating system, which also requires hardware support, known in the Intel universe — appropriately — as protected mode. MS-DOS did not run in protected mode. Windows 3 could. Windows 95 always did.

There were other time-sharing systems of that time, but Unix was one of the most famous, partially because it has survived in continuous evolution to this day. Its off-brand open source clone, Linux, is the most popular OS for servers as well as part of the Android operating system for mobile devices. One of the more popular workstation operating systems, macOS, is nowadays also a fully licensed brand-name Unix.

I bring up Unix to show that time-sharing features pre-date MS-DOS and much of the microcomputer era. They were considered overkill for microcomputers while they were still underpowered, but they existed in other contexts. At the time, the focus was more on supporting multiple simultaneous users — the fact that a single user might be able to run multiple processes at once was a minor side benefit. After all, these systems were mostly command-line based, and it was only possible for a user to interact with one process at a time (per terminal), so besides background computation (which some users did really care about), it didn’t have the same immediate practical use as being able to edit your Word document while playing music.

So why did cooperative operating systems ever exist, if Unix predates Windows 3.1 and Mac OS System 7? Well, they existed in different domains. Preemptive multitasking was difficult to program, and was mostly available on operating systems for minicomputers (more powerful systems than individuals could generally own) or else expensive desktop computers known as “workstations” for particular specialized jobs.

The operating system is, after all, about coordinating between programs in sharing hardware resources. It makes sense that what those hardware resources are should influence operating system design. When it is a single terminal and no disk, you barely need an operating system, but when it is a graphical user interface, you need more of one, and when it is several terminals, you have different needs. Nowadays, we expect a lot out of simple devices, beyond what would be necessary to get good use out of them, but in the past, the hardware (and human/programmer) resources were not as up to the challenge.

Modern operating systems combine all of these concepts, and provide graphical user interfaces while using all the technical advantage of time-sharing and memory protection, and more can be read about them in the next post.

Function Pointers in C and C++

2019-02-26T00:00:00+00:00

Programmers of functional programming languages will often point out that, in functional programming languages, the order of the arguments is often significant, because of currying. If you have a function that takes two arguments (e.g. map which takes a function to apply and a list to apply it to) it actually takes the first argument, and returns a function that takes the second argument and returns the final result. This makes it more convenient to write a lambda where the second argument is the unknown parameter: \x -> map someFunc x can be written as map someFunc, whereas \f -> map f someValue has no such convenient shorthand (flip map someValue is actually clunkier).

To this, I sometimes respond that the order of arguments is significant in C (and thus its hipper cousin, C++) as well. This is most obvious in a function that uses variable arguments like printf: the first argument tells the compiler what to expect from the others. If you write printf("%s %i\n", "foo", 3);, we know from the first parameter that a char* and an int are expected later. If, however, we just have printf("Hi!\n"); it takes no further arguments.

The C mechanism used to do this, called “varargs,” works from left to right only. You declare the function as int printf(const char *fmt, ...);, and then during the function dynamically decide what the further arguments are. You could not instead arrange to have the last argument be the format string and then on that basis determine how many previous arguments there would be. The C programming language allows functions to dynamically determine what arguments they take, but only left to right.
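
As a small illustration of that left-to-right discovery, here is a sketch of a variadic function whose first argument describes the rest, in the spirit of printf. The function sum_by_format and its tiny format language ('i' for an int, 'l' for a long) are made up for this example.

```c
#include <stdarg.h>

/* Adds up its trailing arguments. Each character in `fmt` describes
 * the next argument: 'i' means an int, 'l' means a long. Any other
 * character is ignored and consumes no argument. */
long sum_by_format(const char *fmt, ...) {
  va_list args;
  va_start(args, fmt); /* start reading just after the last named parameter */
  long total = 0;
  for (const char *p = fmt; *p != '\0'; p++) {
    if (*p == 'i') {
      total += va_arg(args, int);
    } else if (*p == 'l') {
      total += va_arg(args, long);
    }
  }
  va_end(args);
  return total;
}
```

A call like sum_by_format("il", 1, 40L) reads an int and then a long. The function has no other way to learn what was passed, which is why the describing argument has to come first.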

ABI Considerations

This has consequences for the ABI, which specifies for each platform how C function calls are represented as assignments to registers or writes to stack memory. For any function that takes varargs, this left-to-right dynamic argument reading must be supported. This means that if an ABI assigns the first parameter to r2 in a varargs function with one parameter, it had better assign it to r2 in a function that takes that parameter plus an additional one. If it assigns the first four parameters to registers when there’s only four parameters, it had better use the same registers when there’s more than 4 parameters as well.

And, in practice, this doesn’t just apply to varargs functions. Other functions will have the same ABI. The standard doesn’t explicitly require this, but C does allow traditional K&R declarations (int printf();) or even implicit function declarations (in older C standards that are still common enough to be worth considering), so that you might not be able to tell when you’re calling a function what its official signature is or whether it takes a variable number of arguments. The way printf("%s %i\n", "foo", 3); is called, on a machine code level, will be the same whether printf was declared int printf(const char *fmt,...);, as int printf(const char *fmt, const char *arg1, int arg2); or as int printf();.

The principle is always the same: You never need to know anything about the latter arguments to access the former arguments. Number of former arguments, the type of the former arguments — fair game. Latter arguments? Right out.

Function Pointers and Callbacks

This has an interesting consequence for function pointers. What follows is not, strictly speaking, endorsed by the standard, but the standard is written in such a way that ABI designers have to make it work, and I haven’t seen a compiler optimization yet that breaks it.

Let’s say you have a function pointer used as a callback. Let’s say it gets called whenever data comes in on a socket. It would receive perhaps a pointer to the buffer of the incoming data, and a size indicating how much data, and would return how much of the data it had consumed. It would therefore have a signature that would look something like this:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);

The arguments and return value make sense for what it does, and are all absolutely necessary for a callback that acts like that, except for one, context. The context parameter is a convention in C that allows the same function to serve as a callback for different situations.

For example, if we wanted to write the data that came into the socket to a file, but wanted to write to different files based on which socket the data had come into, the context might indicate which file to write to, and perhaps even what to do in case of a write error (which, if it is a function pointer, might similarly require a context):

struct callback_data {
  int fd;
  void (*error_callback)(void *context);
  void *context;
};

size_t write_to_file_callback(const char *buff, size_t size, void *context) {
  struct callback_data *data = context; // No cast required in C
  ssize_t res = write(data->fd, buff, size);
  if (res < 0) {
    data->error_callback(data->context);
    return 0;
  }
  return (size_t)res;
}

And then we’d register the callback along with the callback_data it corresponds to, which would then be stored by whatever socket library we were using, without any knowledge of what that data would mean.
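
The library's side of that arrangement might look something like the following sketch. The names socket_handler, register_handler and deliver are hypothetical and do not come from any real socket library; count_bytes is a toy callback included only so the example is complete.

```c
#include <stddef.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size,
                                    void *context);

/* What the library stores per socket: the callback and an opaque
 * pointer it never looks inside. */
struct socket_handler {
  process_data_cb_t cb;
  void *context;
};

void register_handler(struct socket_handler *h, process_data_cb_t cb,
                      void *context) {
  h->cb = cb;
  h->context = context;
}

/* Called by the library when data arrives: pass the stored context
 * back to the callback unchanged. */
size_t deliver(const struct socket_handler *h, const char *buff,
               size_t size) {
  return h->cb(buff, size, h->context);
}

/* Toy callback: the context is a running total of bytes seen. */
size_t count_bytes(const char *buff, size_t size, void *context) {
  (void)buff;
  size_t *total = context;
  *total += size;
  return size;
}
```

The same count_bytes function could be registered on two sockets with two different totals, which is the whole point of the context parameter.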

Now, let’s say that you have a function that just prints the data to the screen, and doesn’t care which context was used:

size_t print_data(const char *buff, size_t size) {
  return write(1, buff, size);
}

Or, for a more extreme example, let’s say that you have a function that panic-quits the program, that you want to be able to pass to any function that takes a callback, no matter what type of callback it takes:

__attribute__((noreturn)) size_t panic() {
  abort(); // Or you could just use the library's abort function...
}

Can you use these functions as the callback, if the callback type is defined as process_data_cb is above?

Officially, the answer is no. Certainly, this sort of thing won’t compile:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);
process_data_cb = panic;

But, if you include a cast, it will:

typedef size_t (*process_data_cb_t)(const char*, size_t, void*);
process_data_cb_t cb = (process_data_cb_t)panic;

And will it work? Well, try it! You will find that it will.

Why? Because the function we’re calling takes a prefix of the parameters we’re calling it with, and so we’ll be writing to the right registers for that function to read. It just won’t read the registers with the parameters that it doesn’t have — which is fine, it didn’t have to anyway.

And the return type is the same. This is important, because return types don’t have anything to do with varargs. Returning a struct can add a secret first parameter in some ABIs, changing which register goes with which parameter for every parameter.
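To make the prefix argument concrete, here is a small sketch that calls a two-parameter function through the three-parameter pointer type. The helper names count_spaces and call_with_extra_argument are hypothetical. The C standard treats a call through an incompatible function pointer type as undefined behavior, so take this as an illustration of what common register-passing ABIs happen to do, not as something the language guarantees.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Takes only the first two of the three callback parameters. */
size_t count_spaces(const char *buff, size_t size) {
  size_t spaces = 0;
  for (size_t i = 0; i < size; i++) {
    if (buff[i] == ' ') {
      spaces++;
    }
  }
  return spaces;
}

/* Calls the two-parameter function through the three-parameter pointer type.
 * Undefined behavior per the C standard; shown only to illustrate the ABI
 * argument made in the text. */
size_t call_with_extra_argument(const char *buff, size_t size) {
  int unused = 0;
  process_data_cb_t cb = (process_data_cb_t)count_spaces;
  assert(cb != NULL);
  /* On common register-passing ABIs, the third argument is placed where
   * the callee never looks for a parameter. */
  return cb(buff, size, &unused);
}
```

Some compilers warn about the cast itself when extra warnings are enabled, which is a fair hint that this sits outside what the language promises.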

Implications for Programmers

Is this a horrible hack? Perhaps. Is this officially allowed by the standard? Not really — although it works on all compilers and platforms I’ve tested it on, which is all the ones I’ve developed on.

It certainly wouldn’t be the end of the world to avoid this nonsense and write wrapper functions:

size_t panic_cb(const char*, size_t, void*) {
  abort();
}
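The same wrapper approach applies to a function that does return, such as print_data. The sketch below assumes a POSIX system for write, and print_data_cb is a hypothetical name. Naming the unused parameter and discarding it with a (void) cast also keeps the definition valid in C versions before C23, which do not allow unnamed parameters in function definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Two-parameter function from the text, with a guard for failed writes. */
size_t print_data(const char *buff, size_t size) {
  ssize_t res = write(1, buff, size); /* 1 is standard output */
  return res < 0 ? 0 : (size_t)res;
}

/* Wrapper with the full callback signature; the context is deliberately
 * ignored, so any pointer (including NULL) may be passed for it. */
size_t print_data_cb(const char *buff, size_t size, void *context) {
  (void)context;
  assert(buff != NULL || size == 0);
  return print_data(buff, size);
}
```

Because print_data_cb has exactly the callback's type, it can be registered without a cast, at the price of one extra call per invocation.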

I have two problems with this. First, it can create a lot of boilerplate for the very lightweight operation of turning an existing function into a callback. C++ lambdas help with that (though they’re not available in C), yielding lightweight, low-boilerplate results:

// With lambdas
register_callback(some_socket, [](const char *, size_t, void *) -> size_t { abort(); });
// With a cast
register_callback(some_socket, reinterpret_cast<process_data_cb_t>(abort));

But then again, C++ already has better mechanisms than this void *context pattern for callback functions. std::function handles these things anyway for situations where the callback must be stored, and templates can be used to take functors when the callback need not be.

The other problem is a little harder to avoid: performance. By doing a cast, we avoid the cost of an extra function call. In most situations, this doesn’t matter, and wouldn’t be a reason for a hack, if it is a hack. But there are some situations where every little bit of performance matters, and function pointer indirection like this can be hard for a compiler to optimize.

Specifically, most C++ compilers could improve the overall performance of std::function by adopting a variant of this trick — but more on that in a future post.

My Personal Opinions

I think the standards of both programming languages should be amended to require this. In fact, I think calling a function with extra arguments in general should only be a warning, and that functions with fewer arguments should be able to override functions with more arguments in C++ (assuming appropriate use of POD types). Unfortunately — or fortunately — that is not my call to make.

More important than all of this, I think this fact about C and C++ ABIs is something that every serious C or C++ programmer should be aware of. And I think it should be used within the standard library (in the implementation of std::function) wherever the platform is known, readability is relatively unimportant (the standard library is maintained by C++ experts), and performance improvements are possible that would help every user of that library.
