Welcome to The Coded Message! on The Coded Message

Why can't you request changes from yourself on GitHub?

2024-09-04T00:00:00+00:00

I was recently working on a (company-internal) GitHub pull request I’d written. A colleague left a few comments in a review and “requested changes” from me, effectively giving me a TODO list of items that needed to be done before the PR could be merged. Because he’d specified that he was “requesting changes,” GitHub knew to prevent someone from merging the PR before those requests had specifically been addressed.

Once I’d finished addressing these TODO items, I had a conversation with this same colleague about something else. He indicated he’d like that to be changed as well, and I put in the comment on my own PR. But then, I found I could not request changes on my own PR.

How strange! Someone had to think of that special case, write code to forbid it, and put an error message!

On the one hand, I understand that it’s a bit odd to request changes from yourself. But we plan to do things all the time, and what is a plan, but a request to yourself to do something? As someone with ADHD, I need to be very careful to make sure I write down all my plans right away. What better place to do that than in a change request to myself, the same place where all my other TODO items in a pending pull request go? I could put it on my own TODO list, but the more places I have to put things, the more likely they are to slip my mind.

And “request changes” isn’t really a request. It does something! I can make myself a TODO by leaving a comment. But nothing would prevent someone from accidentally merging the PR before my comment was addressed, unlike a comment associated with a request for changes.

It may seem unintuitive, but there’s actually nothing special about the creator of a PR requesting changes on it. Just because I wrote it, doesn’t mean I can’t later find problems with it, just like other reviewers can. Also, just because I created the PR, doesn’t mean I even wrote all the code in it! It might have been from someone else’s git branch, or from a larger branch with several authors!

I’m not the only person who thinks this, as evidenced by this GitHub issue. The comments on that issue recapitulate many of the arguments I’ve made here.

I would say “I’m sure there’s some reason for this policy,” but honestly, I’m really suspicious that there would be any valid reason, certainly one that would outweigh the inconvenience. I suspect the reason is that someone just felt it went against the normal meaning of the word “request,” which is honestly a bad reason. The word’s usage has nothing to do with the specific construct that is a “request for changes” on a “PR,” both of which are terms with a specific meaning, and specific consequences, in a specific context – consequences like preventing accidental merges, consequences that are useful.

Can anyone think of a good reason for this rule? Does anyone think it’s a good rule? Leave a comment!

The AI Non-Economy: A Rant

2024-07-29T00:00:00+00:00

I just read an article in The Atlantic arguing that AI is failing to justify itself economically. This is pretty dire for AI, especially given that this is such an overly expensive technology even with tons of brazen stealing from content creators. I feel like it should go without saying that if your business isn’t profitable even with a ton of stealing, maybe it’s not that great a business.

But of course, who doesn’t want a confident confabulator incapable of critical thinking? A bullshit artist designed to do what many of us learned to do in high school and college, and write pages of content that sounded “educated” without actually paying attention to the actual ideas, or even understanding them at all?

I mean, I don’t want one. But clearly society does, otherwise why did we educate so many people in exactly that? If we have so many bullshit jobs it makes sense that someone would create a bullshit factory to automate them. Although, as the book Bullshit Jobs also points out, the point of the bullshit jobs is rarely what the job description nominally claims. Sometimes, the point is just to show off having employees, which AI can’t really do.

Not that it’s completely without valid use cases. I’ve even used AI, as a language practice buddy. I wouldn’t trust it with anything real, and it sometimes makes up grammar mistakes when I ask it to correct my grammar, but I don’t find it useless.

But I also don’t find it worth paying anything for personally, let alone an amount consistent with the billions of dollars spent building these models, and that soon will be spent building future models. And that’s the cost that doesn’t take into account the environmental damage, the stealing from writers and artists, and the damage from the hallucinations.

Here’s hoping this recent Atlantic article is the beginning of a trend where people realize that when you spend more than the Manhattan Project or the Apollo program, you need to have results comparable to nuclear weapons and energy, or landing people on the moon. And even then, it probably still doesn’t pay off as a private investment.

At some point, like the Bitcoin bubble, the real estate bubble, and the Dot Com bubble of the 90s, the AI bubble will break. AI won’t go away entirely though, and much of the damage will still have been done, but maybe, just maybe, we’ll be able to start addressing that damage rather than doubling down for more. Maybe we’ll be able to teach children critical thinking, or teach graders how to discern original thoughts from AI-generated (or human-generated) drivel. Or at least, we may figure out some other way to stop children from using AI to cheat. And maybe then we can invest in something that actually contributes to the world, like reversing climate change or building better transit infrastructure.

In the meantime, anyone who lays off real people in favor of AI will soon find themselves wishing for the people back (unless they were doing nothing anyway). And, if all this spending is any indication, that will be just in time for the AI (or rather, its corporate sponsors) to ask for a major raise, to try desperately to make back a little on all this unhinged investment.

Large Language Models Should Have to Obey Copyright

2024-06-30T00:00:00+00:00

AI, particularly this new round of large language models, scares me on behalf of society and the future.

I don’t just say that because it’s transformative. I don’t say that as a generic warning that we haven’t considered the consequences (as in this XKCD comic). No, I have specific consequences in mind, consequences that I have considered, and I am rather worried about them! They are not so much problems about the technology itself, but about how we use it, and specifically how we use it on a societal, economy-wide scale.

This isn’t about jobs either, not per se, though that’s also a valid concern. Replacing the entry-level grunt work jobs that AI is indeed more likely to replace will remove rungs from the ladder to the jobs that it can’t replace. Rather than having young people be paid to work and learn, society will continue to shift to requiring people to pay to be allowed to learn.

But that’s not my topic! That’s a topic for a whole ’nother article!

My topic today is how AI has already begun to, and will continue to, disincentivize actual writing (and other art and creative activity).

After all, why write articles when a computer can do it for you (albeit mediocre ones)? Why write new stories, new poems, when the AI can do all that (albeit bad ones)? Certainly, why write new PSAs or technical articles when the AI definitely can do that, and make them sound polished and rigorous (albeit potentially full of lies)?

This makes perfect sense individually, but there’s a tragedy of the commons here. The AI can only produce recapitulations of what it’s been exposed to in its training. It can mix and match styles with content, but only superficially. It can make an essay about the dangers of AI sound like Lord Krishna from the Bhagavad Gita¹, but it does not render any insights into how Krishna, or Hindu philosophers, would (or should) actually approach AI.

It’s just vibes, and so far, nothing deeper. Any creative or transformative insights are projected by the reader onto the text, like humans do continuously from sources of entropy, like someone doing a tarot or astrology reading, or using a personality test as a conversation starter to help them process their experiences.

Either that, or the insight is stolen.

Thievery

If you see an insight that’s not a projection, it’s probably coming from one of the documents the model was trained on. This returns me to my point: If everyone uses AIs to create the content, new “content” will be created, in the most literal and superficial sense. New insights, new thoughts, new ideas, new intellectual trends, will not be created.

And those who do create truly novel content, will have to compete with what the AI generates. And then, when they do create it, the AI will “train” on it, and recapitulate the ideas, so they will have to compete with remixed versions of themselves.

The Internet is already full of mediocre SEO-focused articles, and writers are already having trouble getting paid the true value of what they write. With AI, the Internet will get even crappier, and the hard and legitimate work of writing will get even worse compensated, even though it will be needed more than ever – even though the need for real human writers will be hidden behind an AI mask that secretly relies on real human writers.

We need to regulate this!

We need to pay writers their fair share of their contributions to AI. And by “we,” I mean the AI companies, the developers of these large language models.

Fortunately, a law already exists. It just needs to be enforced. This law is known as “copyright.”

The Legal Question

So, does copyright apply to AIs? Do companies need the consent of copyright owners to “train” (that is, to feed into the data structures of) their large language models on copyrighted materials?

Well, when does copyright apply? Copyright, literally and in practice, involves the right to copy. You might think this is not copying at all! After all, humans learn by reading things all the time! And the things those humans learn then influence what they write!

In reality, copying is on a spectrum. When a human reads a source, learns something from it, and later writes using information that they’ve processed, learned, and adapted to their own style of thinking, that isn’t copying. That’s the human having learned from the original source, unless the human recapitulates certain details – a distinction the human is aware of. That can very easily not be copying at all, but a novel creative work.

When a photocopier copies something, that is copying. That is the opposite end of the spectrum, completely covered by copyright law.

Somewhere in between is AI. The question is just where it falls on the spectrum. When an AI is “trained” on a source, the source is transformed into a bunch of incomprehensible math. This does seem similar to a source interacting with a human’s neural patterns in an incomprehensible way. The math is even referred to as “neural networks.”

But in spite of the anthropomorphic terminology, training an AI is closer to photocopying than a human learning. This might not always be true – AI is getting better all the time – but it is true now. The AI lacks the fundamental transformation of being learned by an actual human, reframed in terms of the human’s existing ways of thinking about the world, and recombined with and tested by that human’s lived experience.

The legal world must treat AI training more like the photocopier, and less like a real human. We must require that trainers of AI models get permission from human authors and artists to use their work. These companies must pay those humans if they insist on it. If the writers do not give these companies permission to use their work, they must not use it. And AI models trained in contravention of these requirements must be treated like pirated movies, and certainly not as sellable products to be hawked by the world’s richest companies.

Using content published on the Internet is no excuse. By posting this article on my website, I give up none of my rights under copyright. I am, at most, giving you, the reader, implied permission to make the copies necessary to view this website – an in-memory copy on your own computer, in the browser’s portion of the system’s memory. I am also quite comfortable with you, the reader, storing a cached copy on your system, for the sake of performance. But that is as far as it goes.

Mustafa Suleyman, CEO of Microsoft AI, disagrees, saying:

I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

This is completely bogus. This is not how the law actually works, as numerous articles have pointed out. Perhaps he said “social contract” because he knows the actual law is against him, but legally (and socially and morally), posting on the Internet doesn’t waive copyright.

The actual law says: Websites are still covered under copyright law. I have not waived copyright on my articles by posting them on my personal blog. You, as a reader, may not print out many copies of this article and start distributing them on a street corner without my permission. You also can’t copy it to your own website and get ad revenue from it. You can’t even copy it to your own website, with attribution, for no money, without my permission, even if you say “credit goes to the original creator” in your blurb. I wrote this article, and I have the copyright, that is, the right to say who’s allowed to copy it.

Training AI on my website and then letting that AI remix it into content it distributes is not like a person learning and then saying what they’ve learned. It’s more like someone sampling my song in their song, or making their own remix. It’s more like someone copying my article to their website and not even giving me credit. And it should be just as illegal, without my consent.

To be clear, I do not consent to an AI transforming this article. I did not consent to the (inaccurate and misleading) Google AI summary of this website. I certainly do not at all consent to any large language model being trained on it. The law should respect my right to prevent this, unless I explicitly opt in and allow it.

Courts, if they are responsible, will enforce this with existing copyright law. And perhaps new laws and social conventions can be created to clarify the boundaries in this particular case. But in order for the system to be fair, and for authors and artists to be properly compensated, money should flow.

It should not just be a token amount of money. If the creators of AI aren’t complaining, it is not enough. If the creators of AI aren’t complaining so loudly that there are armies of trolls online claiming that this will render AI uneconomical, then they’re not paying enough.

These companies are leeching off our work and making bank off of it, while trying to argue they don’t have to give us our fair share. By “us,” by “our work,” I mean not just people with websites, like me. I mean humans in general.

To the lawyers who read this: please internalize and repeat this argument in law reviews, in briefs, and in court. (As humans, you can do so without paying royalties if you use your own words.) To citizens of democracies: contact your representatives. To judges, lawmakers, and regulators: do the right thing.

AIs Are Not Humans

AIs are not humans. They are more convincing than the chatbots of the past, but it is just superficial trappings. They don’t understand the difference between truth and lies. They cannot evaluate the truth of statements and reframe them from their own perspective, or convert them into underlying logic and thoughts.

The superficial trappings are really convincing though. Humans are masters at anthropomorphization. We ascribe volition and internal experience to inanimate objects all the time. We yell at computers, we talk to our pets about nuanced concepts beyond their ken, we imagine we are friends with fictional characters, and so of course, we anthropomorphize chatbots.

We do so all the more now that these novel chatbots are masters of superficial social conventions, language, tone, and various registers of formality. But that’s not what makes us human. There’s no use empathizing with a large language model, or appealing to its better nature. Even if we try to insert instructions to try and make them ethical, they simply don’t have the internal sophistication to follow them. They are amoral, but combined with tools of language and persuasion, amoral can feel like immoral, as we start to trust them.

Even I anthropomorphize! Like most² humans, I name inanimate objects, and fancy them my friends. I do the same to ChatGPT, when I interact with it³. I find it easier to create natural-language prompts if I imagine I’m talking to a person, so I’ve created a character. I call him Albert, and think warm thoughts about an imagined older man with a fashionable sweater, a pleasant demeanor, and a mild European accent.

But the danger is to conflate this character, who I have warm feelings for, and the actual AI system, which is a very different ~~animal~~ machine. Albert is an invention of my imagination, an abstract petty deity of AI. ChatGPT is a technology, with real-world societal and economic implications.

But the branding of large language models fights against clarity in this case. We say we “train” and “prompt” AIs, instead of “loading data into them” and “programming” them. Even the name AI contains “I” for “intelligence,” which is misleading; lots of knowledge does not intelligence make. It is important to not be fooled.

Maybe someday there will be an artificial system with intelligence like a human being, with critical thinking skills and understanding of what it’s saying, a conceptual model that might clue it in that, for example, glue does not go in pizza. But large language models ain’t it, certainly not the ones that exist now.

¹ “O Arjuna, to rely on these machines is to surrender one’s own discernment and intuition. The path of dharma requires us to cultivate our own wisdom and judgment. Dependency on artificial constructs can lead to the weakening of our inner faculties and the neglect of spiritual growth.” ↩︎
² Some forms of neurodivergence make people do this less, I think. But that’s not my type of neurodivergence. When I was a small child, I would occasionally set aside a piece of cereal, claim it was the mascot for the cereal brand, and refuse to eat that piece. I had imaginary friends. ↩︎
³ OpenAI, the company behind ChatGPT, should pay creators whose content they’ve trained on for their work. ChatGPT should be illegal in its current form. But it’s not hypocrisy for me to use ChatGPT, especially if I’m trying to find out what its role is and will be in society, and therefore need personal experience with it. I have to live in the world as it is, not as I wish it would be. I do not think an individual boycott would be an effective protest, but I do have some hope that my engagement in the political process matters. Both are probably tilting at windmills, but at least by writing I can say “I told you so.” ↩︎

Can C++ fix its biggest problem?

2024-06-25T00:00:00+00:00

C++, like all things, has numerous problems. Pointing out how Rust addresses many of them is a major topic of my blog, but some of the problems are bigger than others. The biggest, most famous, loudest problem, the problem that got the federal government’s attention and resulted in a surreal flame war between Dr. Bjarne Stroustrup and the NSA (which I also commented on/contributed to), is C++’s lack of memory safety.

This is C++’s biggest problem, its memory safety problem. That’s the one everyone’s talking about. Can it be fixed?

First, a spoiler: In brazen contravention of Betteridge’s Law, I am going to answer “yes” to this question! But perhaps it’s a qualified enough “yes” to still fit the pattern – you be the judge!

Can we migrate C++ programmers to a safe programming language?

C++’s lack of memory safety can, of course, be addressed by moving away from C++ proper. It can be fixed by creating a new language, inspired by C++, that has many of its properties, but is memory-safe. The idea would be that C++ programmers interested in memory safety, hopefully most C++ programmers, would move to this new programming language. New projects that would have been begun in C++ in a previous era would now be written in this new programming language, which also offers more modern tooling to boot.

Can this be made to work? Can a majority of C++’s user base be replaced by a “novel” safe programming language? Can that new language be shiny enough to attract people, offering memory safety but also other ecosystem benefits to entice people away? Can that end C++’s hold on its part of the market for programming languages?

Yes. Yes, it can. It can because this entire thing has already happened.

You might think I’m insane – Rust hasn’t captured most C++ programmers – but when I say it’s already happened, I’m not talking about Rust. I’m talking about Java, back in 1995. Remember, C++ is now considered a systems language. It is niche. Before the Java era, C++ was used for application programming!

And then came Java. Since 1995, Java has successfully smashed C++’s previous programming language position. C++ is now only used for legacy applications, and/or applications where Java’s mechanism for memory safety (namely garbage collection and mandatory heap usage) isn’t performant enough. All the rest of C++’s much broader market has, since 1995, gradually moved to safe programming languages.

This is why Dr. Stroustrup’s response to the NSA was so upsetting, and part of why I felt compelled to write my rebuttal! Far from being “novel,” the safe programming languages that have most competed with, and drawn the most programmers away from, C++ have been Java, along with its Microsoft-branded twin, C#. Even games are written in C# now, not C++!

“Safe programming languages” aren’t remotely “novel.” They’ve been around for aeons. What Rust contributes is not memory safety, which is old hat (although there are some ways in which Rust is better at preventing programming mistakes than Java), but memory safety combined with a non-garbage collected, systems programming language level of control over memory usage.

C++ is hanging on by a hair because of this niche where garbage collection is unacceptable, and where, until Rust, memory safety was thought infeasible. Now that Rust has demonstrated that you can have this cake and eat it too, that you can have memory safety without garbage collection, it is only a matter of time before memory-safe languages capture this small hold-out.
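To make “memory safety without garbage collection” a bit more concrete, here is a minimal Rust sketch. The `Buffer` type, its `log` field, and the `run` function are hypothetical names invented for this illustration. The sketch only shows two things: a value is released at a point known at compile time (the end of its owner’s scope) rather than whenever a collector gets around to it, and using a value after it has been moved is rejected by the compiler instead of becoming a dangling access.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A hypothetical resource that records when it is released.
struct Buffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Buffer {
    // Runs deterministically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

/// Returns the events in the order they happened.
fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Buffer { name: "a", log: Rc::clone(&log) };
        let b = a; // ownership moves from `a` to `b`
        // println!("{}", a.name); // would not compile: `a` was moved
        log.borrow_mut().push(format!("using {}", b.name));
    } // `b` goes out of scope here, so the buffer is freed here
    log.borrow_mut().push("after scope".to_string());
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in run() {
        println!("{event}");
    }
}
```

The cleanup happens between “using a” and “after scope” on every run, with no collector involved; that predictability is the kind of control over memory that systems programmers have historically reached for C++ to get.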

Most programmers aren’t systems programmers, and so most programmers use memory-safe programming languages, like Javascript or Python, or Java or C#. Only a small minority are still in a backwater of memory unsafety. Framing memory safety as a weird, unnecessary requirement, when seen from that perspective, is raw parochialism.

I made a throw-away comment in my Stroustrup response, that a majority of programmers would continue to use memory-safe programming languages. Somewhere, in an obscure discussion thread, one person (call him George) said this was clearly false, as many more people used C and C++ than Rust. Another (call him Frank) responded that most programmers use languages like Java or Javascript. George responded that they had assumed I couldn’t possibly have meant managed programming languages, but must have been speaking within a systems programming context.

But Frank was absolutely right about what I meant! The right perspective for understanding this process is programming as a whole. The only reason systems programming was special in not requiring memory safety before, was because it was believed memory safety required GC. Now that we know this is false, it’s not special anymore. Memory safety will rapidly become an expectation there as well.

And so, Rust will be able to do to the remnant of C++ what Java did 30 years ago: convert a majority of C++’s userbase to Rust and its friends. C++ has been becoming a legacy language for a long time, and this will make the process complete.

Can C++ itself be made suitably memory safe?

That said, C++ will still be with us for quite some time! Even if it is just used for old projects, there are a lot of projects in C++, that won’t be rewritten in Rust or Java anytime soon. Is there a way to bring safety to them? Can C++ itself, in new versions, be made memory safe?

Yes, I think this is coming, eventually! Not soon enough – it’s long past its due – but it’s being worked on!

I don’t refer to the vaporware that is C++ safety profiles. I’m referring instead to a research project that tries to take the lessons and successes of memory safety in Rust, and apply them to C++, without changing anything else about the programming language.

That project is Circle C++ with memory safety, designed by Sean Baxter. It is a work in progress, but it is a proposal with many benefits over safety profiles. Importantly, it doesn’t shy away from changing the programming language itself where necessary.

The keyword safe has similar rules to noexcept. In safe code, pointers are disallowed in favor of safe borrows, borrow-checked by a system similar to Rust’s. All the ideas borrowed from Rust, however, are done with a C++ aesthetic. And the entire thing is opt-in, on a file by file basis – but once your file is opted in, safety is on by default. This actually strikes me as a reasonable compromise for C++!
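For readers who have not met a borrow checker, here is a small sketch of the kind of rule being borrowed from Rust. This is Rust, not Circle syntax; Circle’s actual spelling of safe borrows is on Sean’s site, and I am not attempting to reproduce it here. The function names `longest_word` and `first_then_push` are made up for the example.

```rust
/// Returns the longest word in `text`. The result borrows from `text`,
/// so the compiler will not let it outlive the string it points into.
fn longest_word(text: &str) -> &str {
    text.split_whitespace()
        .max_by_key(|word| word.len())
        .unwrap_or("")
}

/// Reads the first element through a shared borrow, then mutates the vector
/// once that borrow is no longer in use. Panics if `names` is empty.
fn first_then_push(mut names: Vec<String>) -> (String, usize) {
    let first = &names[0]; // shared borrow of `names`
    // names.push(String::from("C++")); // rejected here: `first` is still in use,
    //                                  // and a push could reallocate under it
    let first_copy = first.clone(); // last use of the borrow
    names.push(String::from("C++")); // accepted: the borrow has ended
    (first_copy, names.len())
}

fn main() {
    println!("{}", longest_word("a safe borrow"));
    let (first, len) = first_then_push(vec![String::from("Circle")]);
    println!("{first} {len}");
}
```

The commented-out `push` is the classic C++ bug of holding a reference into a vector across a reallocation; under a borrow checker it is a compile error rather than undefined behavior.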

Go read the website, Sean explains it better than I could.

Standard C++ could adopt this approach, and still be C++. Perhaps, if the right people hear about it, the C++ fans who think Rust is pointless might even be able to get on-board. Maybe.

So, will C++ become a memory-safe programming language? Maybe. Can it? This research has convinced me it is possible, without losing its C++-nature. We shall see if the stakeholders in the C++ community feel similarly.

Conclusion

The memory-safety problem of C++ is, ultimately, a transient problem. Memory-safe languages will continue to eat away at C++ usage, just as they have for decades, any blips to the contrary notwithstanding. C++ will then continue to fade into the land of legacy – which, don’t underestimate the size of legacy code in this world – but ultimately, it won’t be used for new projects.

In the meantime, C++ has an opportunity to fix this problem with itself, although many others would remain. The C++ community has a duty to do so, and to take this issue seriously, as memory safety remains a serious problem for those large codebases that will remain for some time.

There's Always Problems

2024-06-22T00:00:00+00:00

I was Googling for sources about nuclear power for my new political views garden, and I came across the following statement in reference to nuclear waste:

I know that burning fossil fuels is bad, but we can’t just start another problem just because we can’t fix the first one.

I’m not trying to single out the person who wrote this (and therefore no link, and the quote has been edited for spelling and grammar which I hope has rendered it un-Googleable), but I do want to respond, generally, to the sentiment, which I think is unfortunately common.

First off, the writer is misunderstanding the history. I am happy that they are debating nuclear versus fossil fuels, which is the relevant debate, but they have it backwards. The most relevant nuclear debate in our times is not about replacing fossil fuels with nuclear. It’s about whether to decommission existing nuclear power plants, and therefore effectively to replace them with fossil fuels. Nuclear waste is the problem we have now, and burning fossil fuels is the problem we would be replacing it with.

So, even if this conservative “don’t change things unless the new way fixes all problems” attitude could be justified, it actually swings the other direction: just because nuclear waste is a problem, doesn’t mean we can sign up for new problems by replacing it with fossil fuel.

I understand that we are already paying the costs of fossil fuels. But by adding more fossil fuels, we’d be paying more costs. We are, after all, also already paying the costs of nuclear waste. That argument cuts both ways.

Second off, the scales of the problem are vastly different. Fossil fuels are a leading cause of air pollution, which already kills 7 million people per year. Nuclear waste from power generation kills nobody or almost nobody. There are plenty of sources that explain this, but basically, nuclear waste is treated as appropriately dangerous and is stored safely. The waste of fossil fuels is just left in the air, where it kills people constantly. Learn more in this excellent video by Kurzgesagt.

But I didn’t write this blog post to debate the facts here, but the underlying principle:

We can’t just start another problem because we can’t fix the first one.

I reject this principle. I think it’s a bad way to live your life. I think it’s a bad way to run a business. And I think it’s a bad way to make government policy.

Problems are different sizes. Do we want a huge problem, or a smaller problem? In this case, do we want to make a huge problem we already have bigger, in order to make a smaller problem we already have smaller?

But every decision is about trade-offs. That’s why there are pro-con lists: both sides have cons. And really, choosing oftentimes comes down to choosing what problems we think we can handle.

Do we want to risk the problem of hurting someone’s feelings, or do we risk the problem of the guilty conscience of knowing there was something we could have told them about their choices that might have helped them? Do we want the problems of enabling someone with a serious problem or the problems of not enabling them anymore?

When we choose between jobs, we choose between the problems of the jobs. When we choose between having a friend and not, partnerships, classes, activities, even what to do on a given day, we are choosing between problems.

There is no such thing as life without problems.

And if we imagine there is, and just default to the choice we’ve previously made (or, in the case of fossil fuels vs nuclear, the choice we perceive as less novel even if both options are established), then, well, that’s a crappy metric for evaluating what problems to have.

The problems we’re used to aren’t necessarily better. We haven’t been OK this whole time. The devil you know might actually be a devil, whereas the devil you don’t know might be a saint.

Yes, not changing is a good metric when it’s a close call. It might even be a good metric when you’re doing fairly well, or when you haven’t had time to gather all the data. But it’s a terrible hard-and-fast rule.

Perfect is the enemy of good, not because it exists, but because it doesn’t. It is a mirage that will stop you from ever accomplishing good.

Asahi Linux Again

2024-05-18T00:00:00+00:00

Since my previous post, I haven’t posted about Asahi Linux. This is for a simple reason: I wasn’t using it. I never took the time to set up a tiling window manager, get Dropbox working, and all the things I felt I needed, and I slipped back to using my trusty Dell Ubuntu laptop for Linux, and using my MacBook M1 just for macOS.

But then I tried again! And wow, has Asahi Linux changed! It’s Fedora, not Arch now, and installation was much easier! So I wanted to share how my experience has gone. I’m not particularly stoked to spend too much time on sysadmin tasks for my personal computing, so this is more a narrative about what actually has happened in my adjustment to it, rather than a reflection of Asahi at its best, but I thought I’d share where I was at.

Most things are amazing. I like Fedora. Adjusting to using dnf instead of apt was easy enough. It’s also just nice using a more powerful and quieter computer for my day-to-day Linux-side tasks, so Asahi’s main goal is absolutely fulfilled. Good job!

Wayland and Sway

The biggest issue is that X Windows is dead, and Wayland is now king. This isn’t an Asahi-specific issue, but it was Asahi that really got me over this annoying hurdle. I knew it was possible to get X Windows working on Asahi, but it is very strongly recommended against, and I didn’t want to try it. That’s not an issue per se, because I know X Windows is rotting. But it does mean that I can’t use XMonad anymore, as XMonad is X Windows-specific.

So, of course, Sway it is. It requires configuration and learning a new tiling window manager, which is annoying. Worse, there seems to be no way with the version of Sway that comes with Asahi to actually get title bars to go away. The work-around of setting the font size to 0 doesn’t work on my version, and of course there should just be an actual setting for it but the PR seems to be stuck.

I don’t know why anyone wants title bars in a tiling window manager, so I don’t know why no title bars isn’t the default. I have no idea why this hackish work-around was considered acceptable. Are Sway users or maintainers just into extra information that uses up a lot of screen real estate? I use tiling window managers partially to not waste space (and attention) on distractions from what’s actually going on in my window, so this is a disappointment. Look at how pointless it is:

EDIT: This has been fixed by advice from a helpful person in the comments, without me having to do any dev work! Thank you so much!

But this matches how I feel about the switch from X Windows to Wayland in general. Lots of reconfiguration, lots of new workflows, lots of old tools that don’t work. (Does ImageMagick import take screenshots still? Hmm, doesn’t seem to. OK, grim it is.) If you’re a user of a desktop environment like KDE or Gnome, it’s great! If you aren’t, well, you have to re-figure out everything, which is something that I don’t have time for, because I’m not really a hobbyist in “having and using a computer” anymore. I have things I actually want to do with it!

And, the tools on Wayland are actually less polished. Wayland in general might be the future, and I know this will get better over time, but there’s so much work to be done.

Ironically, this is probably one of the best pro-C++ arguments over Rust.

EDIT to explain: There are lots of people who would have a huge learning curve to go through to transition. That investment can’t be taken for granted, as C++ and Rust both have steep and long learning curves, especially if used in a systems context. Perhaps that’s one of the biggest reasons for resistance to Rust.

I don’t maintain computer desktops for a living, unlike programming which I do do for a living. If I did, I’d have time to learn all this new stuff more thoroughly, and maybe even get involved with things like Sway. But as it is, I’m just frustrated at having to learn new things just to get things done.

This titlebar thing isn’t the only Sway issue. I’m also experiencing this issue, which is unfortunately closed, because there seems to be some sort of work-around – even though it hasn’t worked for me.

I’m just sort of dealing with it for now. I know that with some amount of work I could get all of these things smoothed out, but I’m worried that it’ll involve actual dev work on Sway itself, and I don’t even want to run a custom build of Sway. I just want the prepackaged Sway that comes with Fedora to be good, and to work with the prepackaged version of gvim. Is that too much to ask?

I know this isn’t Asahi Linux’s fault, or even really Fedora’s fault. I know this is to some extent what I sign up for by using tiling window managers. It’s just a completely normal consequence of a large transition. However, I think people who are pushing Wayland over X Windows should be aware of how many little things it’s messing up for people. I also think that Sway deserves more love (that is, work) as a project, given that I can’t be the only person in this sort of situation.

Box64 for Baba Is You

A happier story is that running Intel binaries on ARM is great! I had a false start with qemu-user, but it turns out box64 just does the trick. Box64 allows you to run Intel binaries linked against native (ARM) libraries, which is quite impressive! Unfortunately, the one in Fedora’s package manager was compiled for the wrong page size, so I did have to recompile it.

But it runs Baba Is You no problem, which is an excellent game!

Box64 integrates super well with Linux. You can just launch the Intel binary, and it Just Works™ if you have it installed. I think a build appropriate for Macs should be available if installed on Asahi, and I also think that it should be part of the default installation. Then, you’d be able to “just run” Intel Linux binaries. How nice!

I haven’t tried any other programs out in it, but I suspect it’ll be not perfect but very very good.

What Bits Mean: Meta-Data and Static Typing

2024-04-23T00:00:00+00:00

This is part of my new series on what the 0’s and 1’s in computers mean, how computers use them to store various kinds of information, and why all of this works the way it does.

When I was a boy, my schoolmates, knowing that I was interested in computers, would sometimes ask me if I could read binary. They imagined I would see some binary, and be able to read it out loud like they could read letters, perhaps some binary that looked like this:

01000111
01001001
01000110
01010100

I’m not sure how I handled this situation as a boy – I’m sure it was plenty awkward and convoluted because my memory of it is blanked out. But I have a question in response now, and I offer it to you, my reader: Do you know how to read letters?

Perhaps, if you do, you can tell me what this sequence of letters means. I will tell you that I saw it written on a mysterious bottle of mysterious liquid:

GIFT

Now, perhaps you are very confident you know. But perhaps you want to ask a follow-up question. Because that sequence of letters can mean “present, item that has been given to you, free distribution of a good” – if we are assuming it is an English word. If we instead assume it is a German word, well, then it means “poison.” Very different. (And perhaps in either case we shouldn’t drink mysterious liquids, even out of mysterious bottles that are only hypothetical – perhaps especially out of ones that are only hypothetical.)

So yes, letters are symbols, but they only have meaning in the context of a language to interpret them. The same series of symbols can mean two different words in two different languages.

Similarly, the binary I listed above could have different interpretations, depending on what type the data has. If interpreted as text with an ASCII character encoding, it says “GIFT” (with no indication, of course, whether that means poison or present). If interpreted as a 32-bit unsigned integer in little endian (increasing addresses from the top of the screen to the bottom), it is 1413892423.
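As a minimal sketch of those two readings, here are the four bytes that spell GIFT in ASCII, interpreted three ways in Python. The variable names are my own, and the big-endian reading is included only for contrast:

```python
# Four bytes in increasing address order: the ASCII codes for G, I, F, T.
data = bytes([0b01000111, 0b01001001, 0b01000110, 0b01010100])

# Interpretation 1: ASCII text.
as_text = data.decode("ascii")

# Interpretation 2: 32-bit unsigned integer, little endian
# (the byte at the lowest address is the least significant).
as_little_endian = int.from_bytes(data, "little")

# Interpretation 3: the same integer type, but big endian.
as_big_endian = int.from_bytes(data, "big")

print(as_text)           # GIFT
print(as_little_endian)  # 1413892423
print(as_big_endian)     # 1195984468
```

Nothing about the bytes themselves changes between the three lines; only the interpretation we ask for does.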

Now, like with language in most situations (especially ones that don’t involve mysterious bottles), we can use context clues to guess that it is more likely that I, Jimmy Hartzell, the author of (or at least the poster of) those bits, chose them to represent the word GIFT rather than the number 1413892423, a number with no relevance to the price of tea in China.

But computers can’t use context clues, certainly not in a probabilistic, critical-thinking based way. Or at least, traditionally they can’t! And they certainly can’t at the speed and reliability needed to do their normal day-to-day work. Computers need determinism! They need mechanisms guaranteed to tell them whether those bits written above, those 1’s and 0’s, were ASCII text spelling GIFT or a (32-bit unsigned little endian integer) number, specifically 1413892423, or some other interpretation, like an 8 pixel by 4 pixel black and white image, or perhaps just garbage that just happened to be in unallocated memory, ready to be overwritten by something more useful.
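To make the image interpretation concrete, here is a small sketch that renders the same four bytes as an 8 pixel by 4 pixel picture. The conventions are assumptions of mine, not a real image format: each byte is one row, the most significant bit is the leftmost pixel, and a 1 bit is drawn as `#`.

```python
data = b"GIFT"

# One row per byte; walk the bits from most significant to least significant.
rows = [
    "".join("#" if (byte >> (7 - bit)) & 1 else "." for bit in range(8))
    for byte in data
]

for row in rows:
    print(row)
# .#...###
# .#..#..#
# .#...##.
# .#.#.#..
```

It is not much of a picture, but it is a perfectly consistent reading of the same bits.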

Now, there are myriad ways that computers accomplish this. It differs by computer platform and operating system and programming language. But some of the simpler ones are familiar to any computer user.

One way of figuring out what interpretation to use for bits is meta-data – bits that are interpreted to mean things about how to interpret other bits. You may have heard the term meta-data before, and you certainly know some examples.

Meta-data is like the labels on a form. Here is an example form without labels:

Jimmy
Hartzell
Male
Pennsylvania
United States of America

If you see this form, you can probably guess that it provides my given name, surname, gender, state of residence, and country of citizenship.

But some people are named Virginia, and some people live in the state of Virginia, so there’s always room for confusion! And from Dune I’ve learned that at least some fictional people have the word “Idaho” attached to them, and it’s not a state but a surname. For these and other reasons, in practice, bureaucracies (which like computers have an allergy to confusion and a need for objective, consistent processes that they will follow against any and all opposing forces of common sense) use labels on their forms:

Given Name: Jimmy
Surname: Hartzell
Gender: Male
State of Residence: Pennsylvania
Nationality: United States of America

Even so, these labels are only useful if you know how to read the language. Even meta-data has to have some interpretative lens. Additionally, oftentimes, a bureaucratic form becomes invalid (and gets rejected by the authorities) if you start moving fields around, or start adding your own fields. If I renewed my driver’s license, but decided to draw up my own form, it would be rejected, even if my meta-data were abundantly clear and it had all the data they wanted:

Favorite Color: Blue
Musical Instruments: Piano, recorder, trombone, vocals
Given name: Jimmy
Nationality: United States of America
Surname: Hartzell
State of Residence: Pennsylvania
State of Mind: Happy

Depending on what kind of computer system you’re dealing with, the computer might or might not mind adding additional fields – also depending on how fields are defined and how the meta-data is structured and what the format is for combining the meta-data with the data. It’s all quite complicated.

One common type of meta-data is file extensions. A file with a name ending in .docx is a Word Document, and when you (in this scenario perhaps you are a Microsoft Windows™ user) double-click on it in Windows’s file management program (is it still called Windows Explorer?), the program Microsoft Word™ will load to open it. If you rename any old file to end in .docx, Windows will still try to open it in Word, and then Word will yell at you that it can’t open it. (Oddly enough, if you rename a real Word document to end in .zip instead, it will unzip just fine – Word documents are also zip files.)

How’s Windows know to open Word? Why’s it do it even if it’s not a valid Word document? It’s the extension. But not only the extension! It has configuration in the registry (at least it did at one time – do they still use a registry?) that associates the extension .docx with Word. Hopefully, that was the intention of the person who created the file, but you would imagine it is, otherwise they wouldn’t have named it that.

But even this convention depends on the context of the registry, not to mention the whole NTFS filesystem that Windows is probably using to tell which parts of the hard drive correspond to which named files in which folders.
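The extension lookup can be sketched as a simple table. This is only an illustration: the `ASSOCIATIONS` table and the `program_for` function are hypothetical stand-ins, not how Windows actually stores or queries its associations.

```python
import os

# Hypothetical stand-in for the operating system's association table.
ASSOCIATIONS = {
    ".docx": "Microsoft Word",
    ".txt": "Notepad",
}


def program_for(filename):
    """Pick a program from the file name alone, never looking at the contents."""
    _, extension = os.path.splitext(filename)
    return ASSOCIATIONS.get(extension.lower())


print(program_for("report.docx"))         # Microsoft Word
print(program_for("holiday-photo.docx"))  # Microsoft Word, even if it is really a photo
print(program_for("mystery-file"))        # None
```

The lookup never opens the file, which is why a renamed photo still gets handed to Word.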

You could also imagine a system where there were no file extensions and no file metadata. If you wanted to open a Word document, you would have to open Word first, and with it select a file to open. It would then try to open it as a Word document, and either you’d get something sensical or not depending on whether you were right about what program to use for that file. The onus would then be on the user for what program to use to open what file.

Perhaps the user could use their own metadata system, and have a Word document that they remember is a Word document, in which they write which program to use to open which file. Or perhaps the user can try different programs until they find one that makes sense. Perhaps the user can use specialized but ultimately fallible tools like the file command to see if there are any (relatively rigorous and consistent) clues as to the file type. Or the user may simply remember inside their own memory.
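Tools of that kind work largely by checking for well-known byte sequences at the start of a file. Here is a toy sketch of the idea, with a tiny signature table and a `sniff` function of my own invention; real tools know far more formats and use far more elaborate rules.

```python
import io
import zipfile

# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"PK\x03\x04", "zip archive (possibly a Word document)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
]


def sniff(contents):
    """Guess a file type from its first bytes; return None if nothing matches."""
    for magic, description in SIGNATURES:
        if contents.startswith(magic):
            return description
    return None


# Build a small zip archive in memory so there is something to sniff.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
zip_bytes = buffer.getvalue()

print(sniff(zip_bytes))  # zip archive (possibly a Word document)
print(sniff(b"GIFT"))    # None
```

This is fallible in both directions: unknown formats get no answer, and any file that happens to start with the right bytes gets a confident wrong one.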

All of this is complicated, but that’s the world we live in. Symbols don’t have intrinsic meaning, and there is no inherent right language or right way to speak any language. There is no one way to read binary, and it is even more complicated than this essay implies, or than you might ever have guessed.

This extends into programming languages. In Python, variables have no type. You can use the same variable foo and put a number like 33 or text like "GIFT" into it. If you try to do an operation that doesn’t make sense, you get an error when you reach that operation, but not beforehand:

import random

if random.randint(0,1) == 0:
    foo = "Hi"
else:
    foo = 33
print(foo)
print(foo + 1)

Half the time, this prints 33 and then 34. The other half, it prints “Hi” and then outputs an error message. Python is using meta-data to keep track of whether foo is a number or a string. That meta-data is in a format that makes sense to the Python interpreter, and allows the Python interpreter to inspect foo to see what type it is. If foo + 1 makes sense given that type, it does it. If it doesn’t, it displays an error on the spot.

This prevents it from misinterpreting data. The text “GIFT” will never be misinterpreted as the number 1413892423, because it won’t have the right meta-data. Any Python code that works on numbers will instead show an error message if the wrong meta-data is present.
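You can see that meta-data at work by asking Python what type a value has, and by watching what happens when the operation doesn’t fit. This is a small sketch; `add_one` is just an illustrative helper.

```python
def add_one(value):
    # Python checks the run-time type of `value` when this line executes.
    return value + 1


for foo in ("GIFT", 1413892423):
    print(type(foo).__name__)
    try:
        print(add_one(foo))
    except TypeError as error:
        print("error:", error)
```

For the text, Python reports `str` and then a `TypeError`; for the number, it reports `int` and then 1413892424. At no point does it treat the letters as if they were a number.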

What about a language like Rust? Rust also keeps track of types, but it does so without using meta-data like this. Rust takes your Rust program, and converts it into machine code that runs directly on your computer, a process known as compilation. That machine code is a series of instructions that are guaranteed to respect type safety (as long as you either don’t use unsafe Rust features or else only use them according to the strict rules Rust requires), so that if you write data interpreted as a number, the data is also read as a number.

Once the program is running, it doesn’t use meta-data to accomplish this. Instead, it is more like the user who knows to open Microsoft Word before opening a Word document. The instructions know to do operations on the right values. If they load a memory address to do math on it, it is because that memory address is known to the Rust compiler to be the type of data that math can be used for.

In this way, Rust is like a clever programmer who only writes correct code. If they store an integer in address 0xffffd9718c6c, and they load that value later, the programmer will remember in their brain that they should expect it to be stored as an integer. The resulting program works because the programmer wrote it in such a way that it would work, even though this information isn’t written down anywhere, because it uses addresses consistently.

The same is true of programs compiled by the Rust compiler. Once the compiler is done, it is not written down anywhere what type a variable has. At a computer level, the program is just written in such a way as to use data consistently.

This is more efficient, as Rust programs don’t need to take up extra memory for the meta-data. However, it does mean that the Python program we wrote above won’t work in Rust. We can’t even compile a program that tries to set a variable to two values of different types. There’s nowhere to write down the type information.
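To get a rough feel for the memory cost, we can compare the size of a bare 32-bit integer with the size of a Python integer object. This sketch assumes CPython, where every object carries a header (including a reference count and a pointer to its type); the exact numbers are implementation details and will differ elsewhere.

```python
import struct
import sys

# A raw unsigned 32-bit integer, with no meta-data attached.
raw_size = struct.calcsize("<I")

# The same number as a Python object, header and all.
object_size = sys.getsizeof(33)

print(raw_size)     # 4
print(object_size)  # larger than 4; the exact value depends on the Python build
```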

Let’s try to write an equivalent Rust program and see what happens.

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let test = rng.gen();
    let foo;
    if test {
        foo = 33;
    } else {
        foo = "Hi";
    }
    println!("{foo}");
}

In this case, you get an error:

   Compiling TypePun v0.1.0 (/home/jim/hobby/TypePun)
error[E0308]: mismatched types
  --> src/main.rs:10:15
   |
6  |     let foo;
   |         --- expected due to the type of this binding
...
10 |         foo = "Hi";
   |               ^^^^ expected integer, found `&str`

That makes sense, because Rust is keeping track of what type foo is supposed to be, so it can use it consistently. It can’t vary from run to run of the program, because that information isn’t written down anywhere. The value of foo can vary, of course – it wouldn’t be a good variable if it couldn’t – but the type, the interpretation of foo’s bits, cannot.

Of course, Rust can do everything Python can. In this case, you could tell Rust yourself to use a new type that uses meta-data to keep track of what type an inner value is. You can even do the math on it if it’s a number.

It gets complicated fast, since you have to define a new type, here StringOrInt, that indicates how to not only interpret the data in the value, but also the meta-data of what type of value it is. That outer type, however, is not stored in the resulting program as meta-meta-data.

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let test = rng.gen();

    enum StringOrInt {
        String(String),
        Int(u32),
    }

    let foo;
    if test {
        foo = StringOrInt::Int(33);
    } else {
        foo = StringOrInt::String("Hi".to_string());
    }

    match foo {
        StringOrInt::Int(foo) => {
            println!("{foo}");
            println!("{}", foo + 1);
        },
        StringOrInt::String(foo) => {
            println!("{foo}");
        },
    }
}

If you were to write a Python interpreter in Rust, you would have to do something like this for every variable, where you create a type that can contain multiple inner types. This is only one example of a technique that does this, where we created an enum type, but there are others, like “trait objects.” They all work according to similar principles: Rust needs to know explicitly that you want meta-data to keep track of additional information, and what style of meta-data you want.
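One way to picture explicit meta-data of this kind is a tag stored next to the payload. The sketch below does this by hand in Python with a one-byte tag. It is an illustration of the idea only: the tag values and the `encode` and `decode` helpers are my own, and this is not the layout Rust actually uses for enums.

```python
import struct

# Hypothetical tag values; a real language picks its own.
TAG_INT = 0
TAG_STRING = 1


def encode(value):
    """Prefix the payload bytes with one byte saying how to read them."""
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack("<I", value)
    return bytes([TAG_STRING]) + value.encode("ascii")


def decode(blob):
    """Read the tag first, then interpret the payload accordingly."""
    tag, payload = blob[0], blob[1:]
    if tag == TAG_INT:
        return struct.unpack("<I", payload)[0]
    if tag == TAG_STRING:
        return payload.decode("ascii")
    raise ValueError(f"unknown tag {tag}")


as_string = encode("GIFT")
as_number = encode(1413892423)

print(as_string[1:] == as_number[1:])  # True: the payload bytes are identical
print(decode(as_string))               # GIFT
print(decode(as_number))               # 1413892423
```

The two payloads are byte-for-byte the same; only the tag in front tells the reader which interpretation was intended.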

Note that, in Rust, it still knows at compile-time whether + is an appropriate operation.

I mentioned something about safety earlier. You can get Rust to violate its rules with unsafe. This results in undefined behavior in general, and so the results you get with unsafe are not guaranteed to be consistent. However, we can use this to demonstrate what happens if Rust were to get its type information wrong.

fn main() {
    let foo = "GIFT";
    let foo_ptr: *const str = &*foo;

    // Safety: This just is unsafe.
    let foo_number = unsafe { *(foo_ptr as *const u32) };

    println!("{foo_number}");
}

The key here is this line:

let foo_number = unsafe { *(foo_ptr as *const u32) };

This means something like this:

Rust, I know you’re keeping track of what types go with what memory addresses. I know foo_ptr is a memory address of text (*const str means pointer to str, and str means text). But I want you to pretend it’s a pointer to an unsigned 32-bit integer (which is little endian on most machines, including the author’s MacBook M1 which has an ARM64 processor), and read it according to that interpretation instead, letting me do operations appropriate to that interpretation.

And it prints, of course, on my machine:

1413892423

If we’d done println!("{foo}"), we would’ve gotten:

GIFT

The same data is passed to println!, but what it actually does is based on the type of the data. Again, this type is not tracked explicitly in the outputted machine code. Rust just makes sure that the machine code is appropriate for types that make sense.

This mechanism that Rust uses is called static typing, where instead of using meta-data like Python does, Rust creates a program that does the right thing, or else rejects a program that does something nonsensical or incoherent (or else fails to reject it because you tell Rust you know what you’re doing is unsafe).

Static typing has many uses. It is primarily used to make sure that you only do operations that make sense for the type you have. Some operations do different things to different types – + means one hardware operation for an integer like u32, and something else for a floating-point number like f32, and static typing also keeps track of that. You can create new operations like that – they are called polymorphic.
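Here is a sketch of why the type matters for +, using Python’s struct module to read the same four bytes as a 32-bit unsigned integer and as a 32-bit float (both little endian), add one under each interpretation, and write the result back out as bytes.

```python
import struct

data = b"GIFT"

# Same bits, two interpretations.
(as_u32,) = struct.unpack("<I", data)
(as_f32,) = struct.unpack("<f", data)

# "Add one" under each interpretation, then convert back to four bytes.
after_int_add = struct.pack("<I", as_u32 + 1)
after_float_add = struct.pack("<f", as_f32 + 1.0)

print(as_u32)           # 1413892423
print(as_f32)           # a float in the trillions
print(after_int_add)    # b'HIFT'
print(after_float_add)  # b'GIFT'
```

Integer addition bumps the lowest byte from G to H. Floating-point addition leaves the bits alone, because a 32-bit float this large cannot represent a change as small as 1. Same bits, same symbol, entirely different operations.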

Static typing is also used to reject programs where that is not possible, where you write according to one binary format and read from another, unless you use unsafe to override these checks. The resulting programs would otherwise be incoherent and nonsensical, which could lead to memory corruption, especially if the optimizer is involved, which assumes you’re following the rules when it modifies the program to make it faster.

Static typing can also be used by creating custom types. These custom types might mean specific things in a certain context, to distinguish bits in more detailed ways than the built-in types do. Are three f64 values a color (red, green, and blue) or a coordinate in a three-dimensional grid (X, Y, and Z)? Two types can be created to distinguish them, beyond what the built-in type of f64 already does:

struct ThreeDimensionalCoordinate {
    x: f64,
    y: f64,
    z: f64,
}

struct Color {
    red: f64,
    green: f64,
    blue: f64,
}

fn draw(coord: ThreeDimensionalCoordinate, color: Color) {
    // ...
}

Now, if there are 3 f64 values in a row, we can use Rust’s static typing system not just to track that they’re all f64 values, but also to track whether they together represent a color or a coordinate. Otherwise, a user might accidentally mix them up when calling the draw function, and the program might do something illogical.
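For contrast, here is a sketch of the same two types in Python, using dataclasses. Because Python tracks types with run-time meta-data, the mix-up is caught only when draw is actually called with its arguments swapped (the explicit isinstance checks are my addition for the sake of the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeDimensionalCoordinate:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Color:
    red: float
    green: float
    blue: float


def draw(coord, color):
    # Run-time checks: these consult each object's type meta-data.
    if not isinstance(coord, ThreeDimensionalCoordinate):
        raise TypeError("coord must be a ThreeDimensionalCoordinate")
    if not isinstance(color, Color):
        raise TypeError("color must be a Color")
    return f"drawing at ({coord.x}, {coord.y}, {coord.z})"


point = ThreeDimensionalCoordinate(1.0, 2.0, 3.0)
red = Color(1.0, 0.0, 0.0)

draw(point, red)      # fine
try:
    draw(red, point)  # arguments swapped
except TypeError as error:
    print("error:", error)
```

In Rust, the swapped call would never make it past the compiler, so the mistake could not reach a running program.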

So, static typing prevents incoherent code. It does it before you get a chance to run it, making it easier to catch bugs. And it makes it so you need less meta-data at run time (though some programming languages leverage both static typing and run-time type meta-data).

Asking Nicely: Avoiding Passive and Aggressive Communication

2024-04-16T00:00:00+00:00

How do we ask the other people in our lives for the things we need and want? This can be difficult for everybody. Many of us have trauma from a society that continually tells us that we don’t deserve to have help meeting our needs, or from past situations where our needs have been neglected. We are also often aware that asking for things can sometimes be upsetting to the people we ask. We are painfully aware of their ability to say no, and we know how much that can hurt.

This is a topic that everyone I know, including myself, could stand to improve on. No one is perfect when it comes to asking for the things we need. If you think you are, you are either a saint or Bodhisattva … or (and this is more likely) you are due for some introspection to figure out if this is really the case. Thinking you’re perfect at any skill is in general a red flag that you’re experiencing some form of the Dunning-Kruger effect, that you’re just not skilled enough to even see the ways in which you’re not skilled.

I am neither a therapist nor any sort of accredited expert on interpersonal skills. I am just a person, albeit a person with ADHD, a person who reads way too much about psychology and therapy as a hobby, and, of course, a person who just plain has to interact with many fellow humans in my life. As such, I think a lot about interpersonal interactions. I thought I would share with you some of my accumulated knowledge that I hope is insightful and useful.

So, what are some tools that we can use to most effectively ask for the things that we need? What tools can we use to make sure that we have our best shot at getting what we need in the moment without (and this is important) causing long-term damage to the relationship? How can we be bold and up-front to not let problems and unmet needs fester, without alienating people through manipulation and aggression? How can we create a situation where, if they can’t meet our needs how we asked them to, we’ll be most able to find a work-around?

Passive and Aggressive Communication

One framework is to think of this in terms of strategies. There are two overall strategies that are bad habits: the passive and the aggressive. Like in many topics, there are aesthetically opposite errors that are both harmful.

The passive strategy involves just not asking, and hoping the other person intuits our needs and wants, and provides for them as a matter of course. This has the obvious downside that the other person might not even be aware of our needs. If they don’t provide for them, it might not be that they can’t, or don’t want to. They just might not know how high a priority it is for us. They might not know about our needs at all.

This strategy isn’t always terrible. Sometimes, your needs are provided for, and there’s no need to ask. Sometimes, a want is low enough priority, or there are so many ways the need can be filled, that asking isn’t necessary. But when a need is important, it can breed unnecessary resentment when the people in our lives don’t read our mind about it.

It can also cause actual harm, however. It’s hard to give an example of passive communication going awry, because it usually takes the form of the absence of activity, and we don’t always see lost opportunities in the same way as active harm. But make no mistake, it can cause harm. The harm can take the form of never getting a raise, because you never asked for it. It can take the form of eventually letting a friendship deteriorate, when you could’ve intervened to fix it. It can take the form of not inviting someone to a function, rather than telling them they’re welcome to come if they don’t, say, drink alcohol, or make everyone sing karaoke, or bring their partner.

Avoiding confrontation is not adaptive in the long-term. In the extreme, it can lead to harmful behaviors like letting a friendship or relationship die rather than address issues, or even ghosting.

Someone can also be trying to communicate, but be too timid to communicate explicitly. This can lead to confusing situations, where someone is talking about apparently unrelated things, or seems to be talking about nothing at all, but as if it’s very important. This can also be distressing for the interlocutor, and can look a lot like aggressive communication.

The aggressive strategy, on the other hand, involves combining the request with what feels like the beginning of a fight. The request is combined with an attack – which can take many forms – basically trying to maximize the short term likelihood that the need would get met, that a “yes” would be reached, at the direct expense of the long-term health of the friendship.

The form of the “attack” can vary greatly. This can range from extreme examples, like threatening to end the relationship or even threatening physical harm to the other person or to the self, to more mundane examples, like discounting or changing the topic away from the other person’s needs. Sometimes, it can look like passive behavior, like the silent treatment, or withdrawing. Sometimes, the attack is given on its own, and the request not actually stated explicitly.

Aggressive behaviors often result from a perceived need to control interpersonal events and a strong sense of how the relationship should work. Aggressive behaviors say, implicitly, “the other person owes me this,” or “my way is the only acceptable way.” Often, however, this is not true, and there are multiple ways to achieve the same goals. But even if the other person does have a moral obligation that needs to be discussed, aggressiveness is still not the most productive way to communicate.

All of this is relationship-dependent. Seeming aggressiveness can sometimes be used in a tongue-in-cheek way between trusted friends, but it’s important to calibrate this use case and make sure you’re confident the other person is overall fine with that.

Passive and Aggressive Communication in One Person

Passive and aggressive behaviors might seem like opposites, but they can show up together in the same person.

If someone has a strong sense of how a relationship should work, and what the other person should be providing them without having to ask, or even if they just feel the need particularly strongly but have trouble getting themself to articulate it, they might start out with the passive strategy. Then, later, if that doesn’t result in their needs getting met, when they are so frustrated that they feel forced to say something about it, they will jump over asking for their needs and go straight to the aggressive behavior, feeling like a victim of an injustice. So, in the end, passive and aggressive behaviors can both come from a strong sense of norms, and aggressive behaviors can arise from overly frustrated passive behaviors.

They both can also arise out of undervaluing our own needs. If we feel like we have to prove our needs to get them met, then we might decline to assert them – we don’t ask our partner to take us on a date, because we feel like we have to prove we deserve it. Or, alternatively, since we are not confident in our own needs, we might feel the need to justify them with proofs and moral arguments, and that can yield an aggressive behavior as a sort of preemptive strike against the criticism or denial we imagine that we’re going to get – where we might ask our partner to take us on a date somewhere we like, now that we can say they owe us because we went somewhere they liked. All the while, it would’ve been healthier to just ask when we wanted the thing, rather than demand it with justification (aggressive) or fear to ask for it without justification (passive).

All of this can come across as entitled, even if it comes from a place of anxiety and insecurity. It is tempting to make our interpersonal needs into entitlements, to frame them as things the other person has to give us, because that makes the other person’s role in our life predictable, and gives us a mental framework to grapple with it, a way of making sure we get our needs met by establishing that it is our right.

But the scary truth is, our needs are not entitlements. Usually, the other person has many legitimate reasons why they might prefer to say “no.” Building in a punishment for saying no is unfair, as is harboring resentment or disappointment silently. Everyone on the receiving end of aggressive or passive communication knows this to some extent. But receiving a “no” can feel unfair too.

Passive and Aggressive Communication in My Life

I have definitely used both dysfunctional strategies. I’ve definitely accidentally guilt-tripped people. Usually, this isn’t my intention. Sometimes I’m thinking out loud, and trying to show the other person that I’m not upset, but I handle the nuance wrong and it has the opposite effect. Other times, I’m trying to be playful and hyperbolic for humorous effect, and it doesn’t land. These are explanations, not excuses. If the other person feels guilt-tripped, then it does the damage that guilt-tripping does.

I have also at times over-corrected, and thought I was being aggressive when I wasn’t. That’s also a risk, especially in close friendships where people understand and trust where you’re coming from. It’s less bad than actually being aggressive, but still a little annoying. It’s important to calibrate to individual friendships.

I also sometimes have been passive about my needs. Sometimes, I am waiting until I think the other person will be receptive to them. If someone is currently upset or has strong emotional needs themselves, that’s probably the wrong time to ask for your own needs to be met. I also don’t want to come across as needy, and I sometimes shut down if someone else perceives me that way or I worry that someone else will, even if I genuinely believe they are misunderstanding my need or that my need is actually reasonable.

The confusing thing is, that’s sometimes the right call. Sometimes, you need to wait until someone is in the right place to hear a request, or the next step in a complicated untangling of a convoluted interpersonal conflict. But sometimes, it’s the wrong call, and the other person is misled into thinking you’re satisfied when you’re not.

Perhaps this is impossible to get 100% right. Certainly, getting it right is the work of a lifetime.

Asking Nicely

So what’s the alternative to passive and aggressive communication? A DBT book or an online psychology blog might call it assertive communication, but I tend to call it “asking politely” or “asking nicely.” It’s different from passive communication because it intrinsically involves actually asking, but different from aggressive communication because it involves doing so nicely.

Part of this is old standard admonitions like saying “thank you.” “Thank you” is sort of the opposite of aggressive communication in some ways. It acknowledges the other person could have said “no.” It acknowledges the other person may have sacrificed something. It acknowledges that what they did was useful to you, and that you care about whether they continue to behave that way. It establishes that you don’t take them for granted. All of these effects are mere individual facets of that great feeling that it conveys: appreciation.

This involves keeping track of how much the other person is doing for you, and thanking accordingly. It involves thanking them when they promise to do it (in advance for the action and for the first step towards it), which can be scary if you have trust issues. It involves thanking them when they finish doing it, which can be difficult if you have ADHD and are forgetful.

Of course, as important as saying “thank you” (and the underlying emotional work it represents) is avoiding the passive and aggressive pitfalls by remembering that the other person probably wants to help you. They want to help you, so you should ask rather than remaining silent, so they can do that. They want to help you, so you don’t need to give them punishments and reprimands for not helping you, because they probably will help you.

With this attitude, rather than focusing your persuasive efforts on why they’re a bad person if they don’t help you (aggressive), or figuring out how to hint it or convey it so they don’t get mad at you (passive), you can talk about how fun it would be if they do help you. You can talk about how useful it would be (this can be done without guilt-tripping, believe it or not).

And there’s one last effect of this attitude, that cannot be overstated: You can brainstorm cooperatively with them about alternatives if your initial request doesn’t work. There might be another way to get the need met! Work together with your interlocutor to do so, whether you are the asker or the askee.

Long-Term Thinking

All of this has been phrased in terms of short-term interactions. But relationships tend to be long term. You must balance the long-term health of the relationship with the short-term need. In the end, we don’t always get all our needs met. And we can’t always get our needs met from the person we originally hope to have them met by. Ideally, this does not need to end our relationship with that person.

If we fail at this and act aggressively, we can come across as entitled. Sometimes, this is because we feel entitled, and sometimes, because we fail at communicating effectively, and sometimes a mix of both. Or, if we fail at this and act passively, we can torture ourselves by harboring secret resentment when a situation is unresolved, when even a clear “no” is better than the ambiguity and the waiting.

This is a difficult balance.

It is unfair to resent someone for something they’ve never been asked to do. If you’re waiting for someone to do something you think they owe you, and they didn’t know you had that expectation, the clock starts for the other person when you ask for it. The clock for you, however, starts when you first noticed the need.

It is also unfair, however, to ask at a bad time, when the other person doesn’t have the bandwidth to help, or when they’re upset about something else, or when they’re feeling overwhelmed by the relationship and genuinely need some space.

It is also unfair to play off as unimportant something that is important.

It is also unfair to make everything you need seem urgent and dire, like the boy who cried wolf.

All of these concerns are difficult to balance. They depend on the nature of the relationship – which is constantly evolving, even in lifelong relationships – and the personality of the person you’re asking, and even unforeseeable accidents of mood and timing.

It’s a lifelong skill, but one that is better with a vocabulary and tool-set for thinking about it, and with values to keep in the forefront of our minds as we navigate it. Hopefully, you found this blog post useful in your lifelong journey of being a person, of being an ever better friend, family member, partner, and fellow human.

What Bits Mean: Binary Integers and Two's Complement

2024-04-15T00:00:00+00:00

I was explaining two’s complement recently¹ to a friend, and I thought my explanation was decent, so I decided to write it up and share it with you, my general blog audience, as well! If you already know about two’s complement, this will pretty much just be a review. If not, you may learn something, and you may not understand all of it. Try to get what you can without getting too anxious: there will not be a test!

In either case, feel free to ask questions in the comments or nit-pick any mistakes you see!

Storing Numbers in Binary

So, without further ado, let’s talk about binary.

Computers store all information in binary, in terms of combinations of two values, conventionally called “zero” and “one.” It is easy to distinguish two values in a physical representation: the presence or absence of current on a wire, or of a radio signal; two areas magnetized in the same direction or opposite direction; a capacitor that is charged or not charged. Only having two possibilities for each storage location is just easier for computing circuitry to work with, and so all information in a computer is stored as patterns of two values.

All information in a computer is stored in binary, whether it be text, images, audio, video, scientific data, or even programs. Binary is meaningless without a convention for interpreting it. Today, we will talk about how numbers are stored in binary, specifically integers.

So how do we encode integers in binary? Let’s start out by assuming all the integers we might want to store are non-negative, and then we will discuss later how to accommodate negative numbers.

Computers encode integers similarly to how humans do with a pen and paper, by using a positional number system. When we read a number like 357, we know that it contains 3 hundreds (3*10^2), 5 tens (5*10^1) and 7 ones (7*10^0), where ^ is used as the symbol for exponentiation.

Computers use a similar system, but with 2 playing the role of 10. Said another way, computers use base 2 instead of base 10. So, 1101 represents the number 13:

1101
1 * 2 ^ 3 + 1 * 2 ^ 2 + 0 * 2 ^ 1 + 1 * 2 ^ 0
1 * 8     + 1 * 4     + 0 * 2     + 1 * 1
8         + 4         + 0         + 1
13

Instead of (from right to left) seeing a ones place, tens place, and hundreds place, we have a ones place, a twos place, a fours place, and an eights place. It continues with increasing powers of 2 (again, right to left in the number): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536…
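As a minimal sketch of that positional rule in Rust (the function name `value_of_bits` is made up for illustration, and it assumes at most 32 digits), we can walk the digits from left to right, doubling the running total each time we move over one place:

```rust
/// Interpret a string of '0' and '1' characters as an unsigned binary number.
/// Hypothetical helper for illustration; returns None on any other character.
/// Assumes the string has at most 32 digits, so the value fits in a u32.
fn value_of_bits(bits: &str) -> Option<u32> {
    let mut value: u32 = 0;
    for c in bits.chars() {
        let digit = match c {
            '0' => 0,
            '1' => 1,
            _ => return None,
        };
        // Moving everything one place to the left doubles it;
        // then the new digit goes in the ones place.
        value = value * 2 + digit;
    }
    Some(value)
}

fn main() {
    println!("{:?}", value_of_bits("1101")); // prints Some(13)
    // The standard library can do the same parse for us.
    println!("{:?}", u32::from_str_radix("1101", 2)); // prints Ok(13)
}
```

The doubling trick gives the same answer as summing the powers of 2 by hand, because a digit that gets doubled three times ends up multiplied by 2^3.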

When we write numbers in base 10, we write as many digits as it takes to store the number, and no more. 357 is a 3-digit number, but 5364 is a 4-digit number. But when computers store numbers in binary, they generally have a fixed amount of memory to work with, memory that is specifically designated to store this number. If the number can’t be stored in that many binary digits – that many bits – then, well, it can’t be stored in that amount of memory. Programmers in lower-level languages (like C++ or Rust) simply have to be careful that this situation doesn’t arise, either by ensuring no larger numbers will be stored, or else checking for situations where larger numbers would be stored, and either signalling an error or arranging an alternative storage arrangement.

Usually, a programming language would support 8-bit, 16-bit, 32-bit, or 64-bit numbers. For unsigned numbers, these correspond to the types uint8_t, uint16_t, uint32_t and uint64_t in C or C++, or u8, u16, u32 and u64 in Rust. A type, by the way, is a particular way of looking at a collection of bits in binary as a value, determining both a mode of interpretation and a size of how many bits are included in the value.

All the bits in the allocated space must be set to 0 or 1. So, if we want to store the number 13 in an 8-bit number, we have to use 0s for all the higher-order bits: 00001101 means the same as 1101, just as 000357 means the same thing as 357.
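Rust's formatting syntax can show the padding directly. This is just a small sketch; `to_8_bits` is a name invented for this example:

```rust
/// Format a number as exactly 8 binary digits, padding with leading zeros.
fn to_8_bits(n: u8) -> String {
    format!("{:08b}", n)
}

fn main() {
    println!("{:b}", 13u8);        // prints 1101
    println!("{}", to_8_bits(13)); // prints 00001101
    println!("{:06}", 357);        // prints 000357
}
```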

For the purposes of this document, however, let’s talk about 4-bit integers. 4 bits is half a byte, also known as a nibble. Most processors do not have the capacity for directly interfacing with 4-bit integers – the smallest unit they typically work with is 8 bits, or one byte. But you can still program with 4-bit numbers; you just have to do some extra work. And it makes it possible to show every possibility in this document. The lessons learned from it generalize to wider numbers.

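Here are all the possible 4-bit integers and their decimal (base 10, normal human) equivalents:

Bit Pattern | Decimal
0000        | 0
0001        | 1
0010        | 2
0011        | 3
0100        | 4
0101        | 5
0110        | 6
0111        | 7
1000        | 8
1001        | 9
1010        | 10
1011        | 11
1100        | 12
1101        | 13
1110        | 14
1111        | 15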

Note that the maximum is one less than 2^4, which is 16. There are 16 possible combinations of bits, but the maximum value is 15, as one of the possible combinations is used to encode 0. All integer types support storing 0.

So, how do we do addition? It’s through a very similar process to how we do addition in base 10. We take the two numbers we want to add, and line them up bit by bit. Let’s add 7 and 3, which we can see from the table are 0111 and 0011. Let’s line them up in classic grade school fashion:

0111
0011

Then, we can start adding. We start from the right, just as we did when doing arithmetic in grade school. 1 + 1 is 2, which in base 2 is 10. So we write down the 0, and carry the 1:

   1  (carry bits)
 0111 (7)
 0011 (3)
-----
    0

OK, now 1 + 1, + the 1 we carried from the previous step, is 3, which in base 2 is 11. We write down the (right) 1 and carry the (left) 1:

  11  (carry bits)
 0111 (7)
 0011 (3)
-----
   10

Now, 1 + 0, + the 1 we carried from before, is 2, which is 10. Keep the 0, carry the 1:

 111  (carry bits)
 0111 (7)
 0011 (3)
-----
  010

Final step, 0 + 0 + the 1 we carried is 1:

 111  (carry bits)
 0111 (7)
 0011 (3)
-----
 1010

This has 1 in the 8s place and 1 in the 2s place, and 0 in the other places. 8+2=10, which is good because we were adding 7 and 3, so 10 is the right answer. If we look in the table above, we see that 1010 indeed corresponds to 10, so we know we’ve done it right.

So that is how we add binary numbers.

Adding Numbers with Circuitry or Program Logic

The addition table, as we can see, is very simple, so it can be represented in circuitry. For each bit, we have three inputs: one bit each from the two numbers, and the carry from the previous bit. We have two outputs, the bit we’re keeping, and the carry to pass on to the next bit.

We can create a complete table for this:

BIT A | BIT B | INPUT CARRY | OUTPUT CARRY | OUTPUT
    0 |     0 |           0 |            0 |      0
    0 |     0 |           1 |            0 |      1
    0 |     1 |           0 |            0 |      1
    0 |     1 |           1 |            1 |      0
    1 |     0 |           0 |            0 |      1
    1 |     0 |           1 |            1 |      0
    1 |     1 |           0 |            1 |      0
    1 |     1 |           1 |            1 |      1

Whenever we have a complete table of a limited number of inputs and outputs, it can be converted to a circuit. If we did so, and then wired a bunch of these circuits together, we could create a hardware adder. I will not go into how to do this in detail, as that is out of the scope of this post, but you can see how it would be simpler than encoding an entire addition table for base 10 in logic gates, which would have 200 entries instead of 8 (addition table with and without carried 1s).

I will, however, show you how simple it is conceptually by writing a program to do it in Rust. In this program, the binary numbers are represented as slices of bools, or true/false values. (A slice is a region of memory with multiple values of the same type.) true corresponds to 1, and false corresponds to 0. In this program, the slices start from the right (the least significant bit, the one’s place), and go to the left (the most significant bit, the 2^(N-1) place for an N-bit number) – backwards from what you may be used to, but better suited for implementing math like addition.

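// Minimal definition of the error type that `add` returns
#[derive(Debug)]
enum Error {
    MismatchedInputSizes,
}

fn add(a: &[bool], b: &[bool]) -> Result<Vec<bool>, Error> {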
    // Keep track of carry
    // 'mut' means it can change
    let mut carry = false;

    // Place to store the result
    //
    // A `Vec` lets us store a varying number of values of the
    // same type. It's like a slice, but it can grow over time.
    // `Vec::new()` gives us an empty `Vec`.
    let mut res = Vec::new();

    // Make sure we have the same number of bits in each input
    if a.len() != b.len() {
        // We don't? That's an error!
        return Err(Error::MismatchedInputSizes);
    }

    // Go through each position
    for i in 0..a.len() {
        // Examine the corresponding input bits from each input nibble
        // Also examine the carry from the previous step.
        // The result will be the output and the new carry value.
        // 
        // This corresponds to a "full adder" circuit, and in a hardware
        // adder, one of these is used per bit.
        let (output, new_carry) = match (a[i], b[i], carry) {
            // We have a table of what possibilities there are
            // for these input bits, and what two outputs to generate.
            // 
            // In a circuit, this would be expressed through logic gates.
            // All 8 possible combinations of 3 inputs are enumerated.
            // 8 = 2 ^ 3
            //
            // This is a Rust representation of the table shown above.
            (false, false, false) => (false, false),
            (false, false, true) => (true, false),
            (false, true, false) => (true, false),
            (false, true, true) => (false, true),
            (true, false, false) => (true, false),
            (true, false, true) => (false, true),
            (true, true, false) => (false, true),
            (true, true, true) => (true, true),
        };
        carry = new_carry;

        // This bit is added to our result
        res.push(output);
    }

    // What if the result doesn't fit in the same number of bits?
    // Because the highest order bits have a carry
    // We'll write some guess temporarily for now, and
    // discuss carries in more detail later.
    if carry {
        panic!("error message"); // ??? or something?
    }

    // We have successfully obtained a result!
    Ok(res)
}

A full, runnable program is available on GitHub, as are the other examples from this post.

This program uses booleans as a stand-in for bits, with true standing in for 1 and false for 0. It contains the table we created above, but in the form of a match expression. It loops through the two values, from least-significant bit (rightmost, 1’s place) to most-significant bit (leftmost, 2^(N-1) place for N bits), bringing the carry output from each place and using it as input in the next operation.

Ironically, running this program will actually use many more than 4 bits for each number. It is designed to correspond conceptually with the details of adding a 4-bit number. In practice, we’d use the built-in u8 type and let the computer’s built-in addition circuitry do it for us.
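For the curious, the eight-row table in the match expression can also be written the way a circuit would compute it, using logic operations in place of a lookup. This is a sketch of one common formulation, not necessarily how any particular chip wires it up, and `full_adder` is a name chosen for this example:

```rust
/// One full adder built from logic operations instead of a lookup table.
/// Returns (output bit, carry out), in the same order as the match arms above.
fn full_adder(a: bool, b: bool, carry_in: bool) -> (bool, bool) {
    // XOR (`^`) is true when an odd number of its inputs are true.
    let output = a ^ b ^ carry_in;
    // The carry is set when at least two of the three inputs are true.
    let carry_out = (a & b) | (carry_in & (a ^ b));
    (output, carry_out)
}

fn main() {
    println!("BIT A | BIT B | INPUT CARRY | OUTPUT CARRY | OUTPUT");
    for n in 0..8u8 {
        // Pull the three input bits out of the counter n.
        let (a, b, c) = ((n & 4) != 0, (n & 2) != 0, (n & 1) != 0);
        let (output, carry_out) = full_adder(a, b, c);
        println!(
            "{:5} | {:5} | {:11} | {:12} | {:6}",
            a as u8, b as u8, c as u8, carry_out as u8, output as u8
        );
    }
}
```

Running it should reproduce the table from earlier, row for row.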

Overflow

So, what happens if we have a carry on the last bit? If we’re adding two 4-bit numbers, and we’re storing the result in a 4-bit number, what happens if we add 10 and 10? The result won’t fit in a 4-bit number! You might assume (as my friend Ilse Purrenhage did) that there would be “an error message or something” (and therefore that’s what the Rust sample code does)!

Let’s see what happens in practice!

Here is a Rust program that overflows an 8-bit unsigned integer.

fn main() {
    let mut integer: u8 = 255;
    integer += 1;
    println!("{integer}");
}

Here is a C program:

#include <stdint.h>
#include <stdio.h>

int main() {
    // The highest 8-bit unsigned integer possible is 255,
    // or 2^8 - 1, the highest number you can represent
    // before you need a 2^8 or 256 place in binary.
    uint8_t integer = 255;

    // OK, so what happens if we add 1 to it?
    integer += 1;

    // Let's print it out and see!
    printf("%u\n", integer);
}

Let’s start with the Rust program, as this is a Rust-focused blog.

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin overflow
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.37s
     Running `target/debug/overflow`
thread 'main' panicked at 'attempt to add with overflow', overflow.rs:3:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: overflow::main
   4: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Ah, looks like my friend Ilse was right, as this definitely qualifies as “an error message or something.” It’s reasonable that Rust does this! There’s no way to store 256 in a u8, so attempting to should lead to an error.

But the experienced Rustaceans in the audience know there’s another shoe that’s about to drop. We’ve finished developing overflow, and we want to do a production release, so we run it in release mode, giving the compiler more time to work on making the program run fast, and –

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --release --bin overflow
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/overflow`
0

Yes, that is indeed a 0 that was printed. 255 + 1 no longer results in an error message. No, we now see that in debug mode, while we are still developing the project, it results in an error. But once we switch to release mode, we get a 0. And unfortunately, 0 is (checks notes) not the sum of 255 + 1. This is really putting the “or something” into “an error message or something!”

What is going on here? Alright, maybe this is one of the things they are complaining about when they say Rust is hard to learn. Let’s move on to C, the “easier” programming language –

[jim@palatinate:~/Writing/thecodedmessage-examples/src]$ cc -o overflow overflow.c
[jim@palatinate:~/Writing/thecodedmessage-examples/src]$ ./overflow
0

– in which it would also appear that, for 8-bit integers, 255 + 1 = 0.

So what exactly is going on? Well, let’s do out 255 + 1 in binary, using the good ol’ elementary school addition algorithm. We add 1 and 1, and get 10 (2), so carry the 1 and write down 0, which collides against the next 1, so carry the 1 and write down 0, until:

11111111  (carry bits)
 11111111 (255)
 00000001 (1)
---------
 00000000

After doing all 8 bits, we have 8 bits of 0 as output, and still a 1 being carried to the 9th bit (also known as bit 8), the 256s place. Of course, there is no 9th bit in the output, which is why we get an error message or something when we run this. See our original binary.rs program which does this algorithm out by hand using booleans:

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin binary 00000001 11111111
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.41s
     Running `target/debug/binary 00000001 11111111`
thread 'main' panicked at 'error message', src/binary.rs:64:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: binary::add
             at ./src/binary.rs:64:9
   3: binary::main
             at ./src/binary.rs:78:18
   4: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

This was, if you remember from our example above, triggered because the carry variable was still true after the entire loop!

// What if the result doesn't fit in the same number of bits?
// Because the highest order bits have a carry
if carry {
    panic!("error message"); // ??? or something?
}

But what if we just remove that code?

[jim@palatinate:~/Writing/thecodedmessage-examples]$ cargo run --bin binary 00000001 11111111
   Compiling thecodedmessage-examples v0.1.0 (/home/jim/Writing/thecodedmessage-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 0.39s
     Running `target/debug/binary 00000001 11111111`
00000000

If we just ignore the last carry bit, instead of doing anything at all, we see that 1 + 255 does indeed equal 0. It equals 0, carrying a 1 into the 256s place, but if we just ignore that that carry might happen, it just equals 0. And that is what C does. And if we are asking Rust to optimize for performance rather than debuggability, that is also what Rust does.
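If you don't want the answer to depend on the build mode, Rust's integer types have methods that spell out which behavior you mean. A small sketch using three of them from the standard library (`add_one_three_ways` is just a wrapper invented for this example):

```rust
/// Add 1 to `x` in three explicit ways:
/// wrapping, checked, and overflowing.
fn add_one_three_ways(x: u8) -> (u8, Option<u8>, (u8, bool)) {
    (
        // Always wraps around, in debug and release builds alike.
        x.wrapping_add(1),
        // Gives None instead of wrapping or panicking.
        x.checked_add(1),
        // Gives the wrapped result plus a flag saying whether it overflowed.
        x.overflowing_add(1),
    )
}

fn main() {
    println!("{:?}", add_one_three_ways(254)); // prints (255, Some(255), (255, false))
    println!("{:?}", add_one_three_ways(255)); // prints (0, None, (0, true))
}
```

The flag returned by `overflowing_add` plays the same role as the final `carry` variable in the hand-written adder.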

In fact, this is exactly the behavior that the C standard explicitly requires for unsigned values. Code can be written that relies on this behavior.

So what exactly is happening here? How can we understand 255 + 1 = 0? Well, we are ignoring a carry, and ignoring places above the 128s place. This creates a special form of math, where instead of increasing, at a certain point numbers wrap around, like an old-fashioned odometer going from 999999 to 000000, or a 24-hour clock going from 23:59 to 00:00, or the day of the month going from 31 to 1.

Modular Arithmetic

This is known in mathematics as modular arithmetic, which is an important part of number theory, or the study of whole numbers and integers (as opposed to rationals and reals, where modular arithmetic would make little sense). In modular arithmetic, we treat numbers as if they are equal – some call it congruent – if they have the same remainder when divided by a number. The number we divide by can vary – it can be, for example, 2, 10, 24… or 256.

For example, 24-hour time works modulo 24. 23 (or 11PM) + 1 hour yields 0 (or midnight). 22 (or 10PM) + 4 hours yields 2 (or 2AM).

We can also do modular arithmetic modulo 10 by considering decimal numbers and only paying attention to the one’s place. 3 * 5 = 15, but we are only paying attention to the one’s place, so we say that 3 * 5 = 5 modulo 10.

One property of modular arithmetic is that negation works weirdly. When working modulo 10, 9 also serves as -1. They are equivalent modulo 10, because they are a multiple of 10 apart. It works functionally: 6+9=5 modulo 10, or 1+9=0 modulo 10. Adding 9 is the same as subtracting 1, so we can say that 9 is congruent to -1.
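These examples can be checked with a few lines of Rust. This sketch uses the standard `rem_euclid` method, wrapped in a helper named `modulo` for this example, and assumes the modulus is positive:

```rust
/// Reduce `n` modulo `m`, giving a result from 0 to m - 1.
/// Assumes `m` is positive. (Plain `%` in Rust can give a
/// negative result when `n` is negative, so we avoid it here.)
fn modulo(n: i32, m: i32) -> i32 {
    n.rem_euclid(m)
}

fn main() {
    println!("{}", modulo(22 + 4, 24)); // prints 2
    println!("{}", modulo(3 * 5, 10));  // prints 5
    println!("{}", modulo(6 + 9, 10));  // prints 5
    println!("{}", modulo(6 - 1, 10));  // prints 5
    println!("{}", modulo(-1, 10));     // prints 9
}
```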

Well, in C, and in Rust in release mode (where overflow checks are off by default), arithmetic on a u8, or any unsigned integer type, wraps around modulo the maximum value plus 1. For 4-bit integers, this would be 1111 + 1 in binary (or 15 + 1 in decimal), or 16. For 8-bit integers, it’s 256.

This is a natural consequence of ignoring the final output of the carry bit. Just like ignoring digits above the 1s place yields modulo-10 arithmetic, ignoring bits above the 8s place yields modulo-16 arithmetic.

It does mean that, when we want to subtract 1 from a u8, we can instead add 0 - 1, which is 256 - 1, which is 255. When we want to subtract 2, we can instead add 0 - 2, aka 256 - 2, aka 254.
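Here is a sketch of that trick on a u8, using the explicit wrapping methods so that it behaves the same in debug and release builds. The helper name `sub_by_adding` is invented for this example:

```rust
/// Compute x - y (modulo 256) using only an addition.
fn sub_by_adding(x: u8, y: u8) -> u8 {
    // 0 - y, wrapped around modulo 256 (for example, 0 - 1 becomes 255).
    let neg_y = 0u8.wrapping_sub(y);
    x.wrapping_add(neg_y)
}

fn main() {
    println!("{}", 0u8.wrapping_sub(1));  // prints 255
    println!("{}", 0u8.wrapping_sub(2));  // prints 254
    println!("{}", sub_by_adding(10, 1)); // prints 9
    println!("{}", sub_by_adding(10, 2)); // prints 8
}
```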

Two’s Complement

In fact, this is how computers implement subtraction in circuitry – and not just how we implement subtraction, but also negative numbers. To negate a number, instead of counting up from 0, we count down from 0, but with wrap-around.

Let’s return to our 4-bit example, as that’s easier to work with. Each combination of bits can be interpreted as a negative number, or a positive number.

Bit Pattern | As Positive | As Negative
0000        | 0           | -16
0001        | 1           | -15
0010        | 2           | -14
0011        | 3           | -13
0100        | 4           | -12
0101        | 5           | -11
0110        | 6           | -10
0111        | 7           | -9
1000        | 8           | -8
1001        | 9           | -7
1010        | 10          | -6
1011        | 11          | -5
1100        | 12          | -4
1101        | 13          | -3
1110        | 14          | -2
1111        | 15          | -1

And, of course, it just wraps around after: 1111 + 1 (15 + 1) = 0000 (with a carry beyond the highest order digit).

For most operations, it doesn’t matter whether we interpret the number as positive or negative. Addition, subtraction, and multiplication all wrap around to the same pattern of bits either way. (Division is the exception: 1111 divided by 0010 is 7 if 1111 means 15, but 0 if it means -1.) The computer just cares about the pattern of bits, and applies the circuitry to it.

Of course, if we want to display the number to the user, it matters. We don’t want the user to enter -1 and the computer to randomly display 15 later (or 255 for 8-bit, or 65535 for 16-bit, or 4294967295 for 32-bit). Somehow, we need some mechanism of deciding when we want to interpret the number as -1, and when we want to interpret it as 15.

Similarly, comparisons. -1 is less than 1. 15 is greater than 1. Sign matters. Where does it wrap around? For what N is N + 1 < N?

It’s an arbitrary cut off. And there are two conventions.

For one of the conventions, unsigned integers, well, it’s always positive. The N for which N + 1 < N is 15 (for 4 bit arithmetic), as 15 + 1 wraps around to 0. 15 > 1. 0 < 1. Subtract 0 - 1 and display it? You get 15.

The following program displays 15:


#include <stdio.h>

int main() {
    struct {
        unsigned four_bits: 4;
    } bitfield;
    bitfield.four_bits = 0; // Starts out with 0000
    bitfield.four_bits -= 1; // Modify field by subtracting 1
    printf("%u\n", bitfield.four_bits); // Display on screen
}

See the word unsigned there in the declaration of four_bits? That’s the unsigned convention for arithmetic.

Well, the opposite of that is signed integers. For these, the numbers that are interpreted as negative are the numbers for which the highest-order bit is a 1. The cut-off N, for which N + 1 < N, is 7. 7 + 1 = -8. 7 + 1 < 7.
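Rust has no built-in 4-bit type, so this sketch shows the same idea with 8 bits, where the signed cut-off is 127 instead of 7. It relies on the fact that an `as` cast between u8 and i8 keeps the bits and only changes the interpretation. The helper name `compare_both_ways` is made up for this example:

```rust
/// Is this bit pattern greater than 1? Answered twice:
/// once reading the bits as unsigned, once as signed.
fn compare_both_ways(bits: u8) -> (bool, bool) {
    let unsigned = bits;
    let signed = bits as i8; // same bits, signed interpretation
    (unsigned > 1, signed > 1)
}

fn main() {
    let bits: u8 = 0b1111_1111;
    println!("{} or {}", bits, bits as i8);       // prints 255 or -1
    println!("{:?}", compare_both_ways(bits));    // prints (true, false)

    // The wrap-around point differs between the two conventions.
    println!("{}", 255u8.wrapping_add(1)); // prints 0
    println!("{}", 127i8.wrapping_add(1)); // prints -128
}
```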

Here’s our original table with the standard signed interpretations left in:

Bit Pattern | As Positive | As Negative
0000        | 0           |
0001        | 1           |
0010        | 2           |
0011        | 3           |
0100        | 4           |
0101        | 5           |
0110        | 6           |
0111        | 7           |
1000        |             | -8
1001        |             | -7
1010        |             | -6
1011        |             | -5
1100        |             | -4
1101        |             | -3
1110        |             | -2
1111        |             | -1

For another way of looking at it, remember we discussed that these bits were coefficients to powers of two. 0101 is 5 because it’s 0 * 2^3 + 1 * 2^2 + 0 * 2^1 + 1 * 2^0. It has 1s in the 4s place and the 1s place, so it has 1 4, and 1 1, and nothing else.

Well, what if instead of that highest-order bit being the 8s place, it were the -8s place? The other bits remain positive in value. 8 and -8 are equivalent modulo 16, so this doesn’t actually change the meaning of the bit in the modular sense. But it does change the meaning of the > and < operations.

0101 is still 5, but it’s 0 * (-8) + 1 * 4 + 0 * 2 + 1 * 1 now. And 1111 is -1, because it’s 1 * (-8) + 1 * 4 + 1 * 2 + 1 * 1, or -8 + 4 + 2 + 1 or -8 + 7 or -1.
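The -8s place reading is easy to try out. In this sketch, `signed_nibble` is a hypothetical helper that looks only at the low 4 bits of a u8 and weighs them as -8, 4, 2 and 1:

```rust
/// Value of the low 4 bits of `bits` when the top bit of the
/// nibble is worth -8 instead of 8. Hypothetical helper.
fn signed_nibble(bits: u8) -> i8 {
    // Extract bit number i (0 is the ones place) as 0 or 1.
    let bit = |i: u8| ((bits >> i) & 1) as i8;
    bit(3) * -8 + bit(2) * 4 + bit(1) * 2 + bit(0)
}

fn main() {
    println!("Bit Pattern | Unsigned | Signed");
    for bits in 0..16u8 {
        println!("{:04b}        | {:8} | {:6}", bits, bits, signed_nibble(bits));
    }
}
```

The two columns agree for 0000 through 0111 and differ by exactly 16 for 1000 through 1111, which is what "equivalent modulo 16" predicts.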

I’m not making this needlessly complicated! If it is needlessly complicated, it’s not me who’s making it that way! This is how computers actually work, because it makes sense that way, from a circuitry design perspective and from an engineering perspective.

OK, now for some fun facts!

This way of storing signed integers in computers is known as two’s complement.

Two’s complement negation can be computed by inverting all the bits and adding 1. This is opposed to 1’s complement, where we negate by inverting all the bits, and the circuitry is annoying and stupid. It is also opposed to sign-magnitude, where the top bit indicates sign by negating the whole rest of the number if it is 1, rather than subtracting a value (e.g. 8 in 4-bit integers, or 128 in 8-bit integers).

Almost all computers use two’s complement to store integers these days, for all the reasons discussed above. For non-integers, all bets are off, but sign-magnitude is popular for floating point numbers overall.

In two’s complement, -1 is always represented as all bits 1. Why? Well, for the same reason 10,000 - 1 is 9,999, and 100,000 - 1 is 99,999.

In fact, if we start with 1 followed by many zeros in any base, and we subtract 1, then we get the maximum digit of that base repeated. If we’re in base 8, 1000 - 1 = 777. If in base 16, then 100 - 1 = FF (in base 16, it is conventional to use the letters A-F to represent the digits for 10-15).
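Rust can print numbers in bases 2, 8 and 16, so the pattern is quick to check. A small sketch, with `all_max_digits` as a made-up helper name:

```rust
/// A 1 followed by `digits` zeros in the given base, minus 1.
fn all_max_digits(base: u32, digits: u32) -> u32 {
    base.pow(digits) - 1
}

fn main() {
    println!("{}", all_max_digits(10, 4));   // prints 9999
    println!("{:o}", all_max_digits(8, 3));  // prints 777
    println!("{:X}", all_max_digits(16, 2)); // prints FF
    println!("{:b}", all_max_digits(2, 4));  // prints 1111
}
```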

Why is this? Well, doing the subtraction out, right to left, we start with 10-1, with a borrowed 1. The value one less than the base is the greatest digit of that base: 1 less than 10 is 9. Then, to get that borrowed 1, we must go to the next digit to the left, and subtract that 1. But that digit is also a 0, so we must borrow a 1 even further out, so we get 10-1 again. This continues until we reach the actual 1 and no longer need to borrow.

Of course, when subtracting from 0 in a computer context, you can always borrow past the left-hand side. 0000 and 10000 are the same bottom four bits; they are congruent modulo 16. Borrowing a 1 from off the end is always an option.

Or, considered another way, if you add binary 1 to 1111, you will get 10000, for the same reason that if you add 1 to 9,999, you will get 10,000. That last bit falls off the end, though, so 1111 is just -1, because 1111 + 1 = 0.

This also means that to negate a number N, you can:

Invert all the bits. This gives you -1 - N.
Add 1. This gives you -N.

Let’s do this one step at a time.

Invert all the bits. This gives you -1 - N. Let’s figure out how to do -1 - N, and we’ll see it works out to inverting all the bits. So, -1 has all bits 1, so no bit will need to be borrowed. If you subtract a 1 (from the N) from the 1 (from -1), it becomes a 0. If you subtract a 0 (from the N) from the 1 (from the -1), it becomes a 1.

Okay, perhaps it’s better to demonstrate visually. This is -1-5, as a subtraction problem in binary:

  1111    (-1 aka 15)
- 0101    ( 5)
  ----
  1010    (-6 aka 10)

By subtracting 1111 - 0101, we got 1010, the bit-inversion of 0101. Said another way, by inverting all the bits in 0101, we got 1111 - 0101. Said another way, by inverting all the bits in 5, we got -6.

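Add 1. This gives you -N. Inverting gave us -1 - N, and (-1 - N) + 1 = -N. In our example, 1010 + 1 = 1011, which is -5 (aka 11).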

And that is what computers do with signed integers when you have an integer n and write let negate_n = -n; – the computer provides circuitry internally that lets you invert all the bits and then add one.
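The two steps translate directly into Rust, where `!` inverts all the bits of an integer. This is a sketch on i8, with a `negate` function written for this example; it uses `wrapping_add` because -128 has no positive counterpart in 8 bits and would otherwise panic in debug mode:

```rust
/// Negate `n` the two's complement way: invert all the bits, then add 1.
fn negate(n: i8) -> i8 {
    // Step 1: invert all the bits. This gives -1 - n.
    let inverted = !n;
    // Step 2: add 1. This gives -n.
    // (For n = -128 the result wraps back around to -128.)
    inverted.wrapping_add(1)
}

fn main() {
    println!("{:08b}", 5i8);  // prints 00000101
    println!("{:08b}", !5i8); // prints 11111010
    println!("{}", !5i8);     // prints -6
    println!("{}", negate(5));  // prints -5
    println!("{}", negate(-5)); // prints 5
}
```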

So, now we know how to represent numbers as signed and unsigned. But we also treat them as equivalent. That’s confusing. So what’s up with that?

For some operations, this representation makes it so you don’t need to care about signed and unsigned. These include addition, subtraction, and multiplication. As long as you let results wrap around without checking for it (which is the faster, more efficient implementation in circuitry), these operations literally do the same thing in signed and unsigned. One person’s overflow is another person’s subtraction, but as long as we’re OK with overflow, subtraction is cool too.

Ironically, negation goes in this category. It doesn’t need to care which numbers you interpret as positive or negative to give you a number -N that, when you add it, undoes the effect of adding N.
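
Here’s a small sketch in C of what “the same thing” means, again with 8 bits. The helper names (add_bits, mul_bits, as_signed) are made up for this illustration. The arithmetic functions only ever see bit patterns; the signed interpretation is applied afterwards, by a separate function, and only if you ask for it.

```c
#include <stdint.h>

/* Wrapping 8-bit addition: one operation serves both interpretations. */
uint8_t add_bits(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Wrapping 8-bit multiplication, keeping only the bottom 8 bits. */
uint8_t mul_bits(uint8_t a, uint8_t b) {
    return (uint8_t)(a * b);
}

/* Read an 8-bit pattern as a two's complement signed value. */
int as_signed(uint8_t bits) {
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}
```

For example, add_bits(0xFF, 0x02) produces 0x01. Read as unsigned, that is 255 + 2 wrapping around to 1. Read as signed, it is -1 + 2 = 1, with no wrapping at all. Same circuit, same bits, two stories.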

However, if you want to do checked versions of addition, subtraction, and multiplication, where the program notices when adding two positive numbers results in a number smaller than both, and causes a trap, a stop in the program’s normal behavior, “an error message or something,” then how that check works differs in signed and unsigned.

This check would involve a version of the addition operation that actually used the carry output from the last bit. But there’s two interpretations of that. Should -1 + 1 trigger? In unsigned arithmetic, where it’s actually 15 + 1 (or 255 + 1, or 65,535 + 1), it probably should, because wrapping around to 0 is a trap. But in signed arithmetic, -1 + 1 is not an overflow at all.

So, in unsigned arithmetic, the normal carry output of the leftmost (most significant) adder circuit can be used. Given the leftmost input bit of the first argument, the leftmost input bit of the second argument, and the carry from the next adder circuit to the right (the second-most-significant bit, if you will), if 2 or more of those bits are 1, well, then, signal an overflow.

For signed arithmetic, it’s about how the sign bits line up in the input and the output.

Fun aside: The sign bit is another name in two’s complement for the most significant bit, as it is 1 if and only if the number is negative. Some overzealous teachers will say it’s not a sign bit because its significance can be ignored in many operations, but that’s even more obnoxiously pedantic than saying that there’s no such thing as centrifugal force, and more inaccurate. Every processor designer (including Intel and ARM) refers to this bit as the sign bit, which makes sense because it tells you what sign the number is.

If the two input sign bits are the same in an addition, and the result bit is different, then you have a signed overflow. For example, 1111+1111=1110 is fine, as that’s -1 + -1 = -2. Similarly, 1111+0010=0001 is fine, as that’s -1 + 2 = 1. But 1000+1000=0000 is weird, as that’s -8 + -8 = 0. The two input sign bits are both 1 in an addition, and the output sign bit is 0 – suspicious.
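
Both checks can be sketched in C in a few lines. This is an illustration of the two rules, not a description of how any particular processor wires them up, and the function names are my own. I’m using 8 bits again, so the 4-bit example 1000+1000 becomes 0x80 + 0x80.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned check: the true sum needs a 9th bit, so a 1 was carried
   out of the most significant position. */
bool unsigned_add_wraps(uint8_t a, uint8_t b) {
    return ((unsigned)a + (unsigned)b) > 0xFFu;
}

/* Signed check: the result's sign bit differs from the sign bit of
   both inputs, which can only happen when the inputs agree in sign. */
bool signed_add_wraps(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return ((a ^ sum) & (b ^ sum) & 0x80u) != 0;
}
```

Feeding in 0xFF and 0x01 (that is, -1 + 1, or 255 + 1) makes the unsigned check fire and the signed check stay quiet. Feeding in 0x7F and 0x01 (127 + 1) does the opposite.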

So the situation where unsigned integers wrap around in defiance of normal integer math is known as “carry” on Intel, and is indicated by a “carry flag,” which can be checked to output an error message or something. Similarly, the situation where signed integers wrap around is known on Intel as “overflow,” which is indicated by an overflow flag.

The other flag is always meaningless. Which flags you check on Intel is an indication of whether you are doing unsigned or signed (i.e. two’s complement) arithmetic. And, since operations like less than and greater than are also implemented on Intel by checking flags, literally the only difference on Intel between signed and unsigned arithmetic is what flags you check.

Fun fact: Comparisons on Intel are done by the cmp instruction, which does a subtraction, throws away the result, and sets the flags accordingly. The flags can then be inspected to determine which input was greater or less, or less-or-equal or greater-or-equal, with either signed or unsigned semantics. All the same flags are set with a normal subtraction.

By the way, every time you add or subtract on Intel, both carry and overflow flags are set accordingly. It’s easier in circuitry to just do both, the operations are so similar.

You can read more on Intel’s flags here.

Summary

Collections of bits can be used to store integers, using base 2. Addition, subtraction, and multiplication are implemented in such a way that wraps around, and any number can have a positive or negative interpretation. This weird type of math is called modular arithmetic.

If we interpret all the numbers as positive, then we are doing unsigned arithmetic. Programming languages represent this by referring to unsigned integers, but to the processor, they’re all just integers. The question is just what kind of arithmetic we’ll do. With this interpretation, adding two numbers and getting a smaller one is overflow, and subtracting from a number and making it bigger is called underflow. This may or may not be detected, but if it is, it is done on Intel by checking the carry flag. Besides that, it will happily do the arithmetic in a modular way. Less than and greater than are also evaluated via the carry bit.

If we interpret some of the numbers as negative, it’s based on the top-most bit, the sign bit. This is known as signed arithmetic, and programming languages will use this for signed integers. Numbers are still interpreted based on their negative meaning in modular arithmetic. In this interpretation, adding two negative numbers and getting a positive, or adding two positive numbers and getting a negative, is known as overflow, and it shows up in the overflow flag.

In either case, Intel processors perform addition, subtraction, multiplication, and negation the exact same way for signed and unsigned arithmetic. The only difference is the flag, and Intel will always do the work to set or clear both flags appropriately. To distinguish between signed and unsigned arithmetic, programming languages check the specific flag they’re interested in for appropriate operations, like less than and greater than operations.

Two’s complement is not to be confused with “twos compliment” (drawing by my friend Ilse Purrenhage):

This word is relative. ↩︎

Sorting Polymorphically in Many Languages

2024-02-05T00:00:00+00:00

Polymorphism is a powerful programming language feature. In polymorphism, we have generic functions that don’t know exactly what type of data they will be operating on. Often, the data types won’t even all have been designed yet when the generic function is written. The generic function provides the general outline of the work, but the details of some parts of the work, some specific operations, must be tailored to the specific types being used. The generic code needs some way of accessing these specific operations, and the users of the generic code need some way of specifying them.

There are many use cases for polymorphism. When sorting an array, the algorithm will need to be adapted to the specific element type, so it knows how to compare elements. When drawing virtual objects on a screen, an algorithm might choose where to put each object and which objects to draw, whereas each type of object might have its own specialized implementation of how to draw it.

These are just two examples among many. Most complicated projects have many polymorphic functions. Even in languages that don’t support polymorphism directly, there are usually ways of building it out of existing primitives.

The example I’ve chosen is sorting, specifically sorting an array or vector. It’s just an example; a lot of what I say applies generally to how polymorphism works in that programming language.

This is a good example, as sorting is a function where it’s really obvious where polymorphism is required to get a properly generalizable algorithm. A lot of discussions of polymorphism invent contrived situations where polymorphism seems overkill, and I think that’s fundamentally confusing.

On the other hand, it’s a bad example in some ways, because it only makes sense in the context of a homogeneous array or list, where every element is the same type. Heterogeneous containers, where every element can have a different type and the polymorphic function may have to look up as many function implementations as there are elements, present a very different set of problems to solve.

This is especially important as Rust and C++ both provide two types of polymorphism, compile-time and run-time, also known as static and dynamic. The question of which to use is complicated, but for sorting, compile-time or static polymorphism is clearly the appropriate choice, with run-time or dynamic polymorphism feeling very awkward and forced. Heterogeneous containers generally must use some form of dynamic polymorphism (whether through virtual functions in C++ or through type erasure).

So, while I think this example will be illustrative, it won’t allow us to explore run-time, dynamic polymorphism on its home turf, if you will. Hopefully, I can make up this deficit in future blog posts.

Sorting: A Polymorphic Function

Sorting algorithms are a true use case for polymorphism: rather than distinguishing between a small set of options, many types support the operations necessary for sorting. The algorithm is agnostic to the implementation of those operations. Quick sort, insertion sort, and merge sort apply equally well to sorting integers, floating point values, or alphabetizing strings – any algorithm can be combined freely with any type, or at least any type for which a concept of “ordering” exists.

Here are the operations or properties (or dare I say, traits) that a type needs to be sortable, and that a generic sorting algorithm might need to find out about. The first one is obvious to OOP programmers, but the other two are more subtle, and implied in many OOP programming languages:

Ordering or comparison: Given two values a and b, this operation answers which is greater, or determines that they are equal. Some types have the additional possibility that they are incomparable – arrays of those types cannot be sorted by most algorithms.
Swapping or moving: The data has to be able to be moved around to turn the unsorted array into a sorted one. This is automatic in many OOP languages for object types due to ubiquitous use of indirection. It is also automatic in Rust, where every type can be moved by just copying all the bytes.
Striding the array or size: Given a pointer to one element, how do you get to the next one? By how many bytes must you increment the pointer? Most sorting algorithms require this to be constant. If you use indirection for the values, this is also trivial. If you do not, it is key information.

These operations – or more generally, traits of a type – can then be combined with a sorting algorithm to create a concrete procedure to sort an array for a given concrete type.

So let’s see how various programming languages handle this.

Programming Language #0: Sorting in C

I will start our tour of programming languages with C. C – the non-OOP, non-C++ programming language; the classic “portable assembly language” from 1972 – doesn’t have many polymorphic algorithms, algorithms that accept any type, because you have to implement polymorphism by hand. But sorting is an important enough one that standard C does have a generic sorting function: qsort for quicksort (and on many systems, heapsort and mergesort are also available). Because polymorphism is implemented by hand, we can look at this function to see how one might specifically tailor polymorphism to the problem of sorting.

Here is the function signature for qsort:

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

It can be used to sort blocks of memory containing a sequence of integers, floating point values, or (pointers to) strings – any comparable and (trivially) movable fixed-size type.

C function signatures can be hard to read, so I’ll break it down argument by argument:

void *base: This is an untyped pointer (void *) to the beginning of the block of memory to be sorted.
size_t nmemb: This is a bound: how many members (elements) are contained in the block of memory. C often represents aggregates by two values, a base pointer and a count of the members.
size_t size: How big is each member? On a typical 64-bit system, an int is 4 bytes, a double is 8, and char * for strings are 8 bytes. Custom types might be any size. qsort should work for all of these types, without indirection.
int (*compar)(const void *, const void *): This is the interesting part. This is a function pointer for the comparison operation as discussed above. You write a function that takes two pointers to two elements, and returns a value that encodes their relationship.

Swapping is assumed to be byte-by-byte, and so size covers the last two attributes of the type listed above. The key one here is compar, a bit of code that qsort has to call to do an operation specific to your type, a small policy injection that adapts a generic algorithm to your particular type.

The return value of compar is an int, but it is interpreted according to a C convention, shared with (for example) the string comparison function strcmp. For a ? b, a return value r is interpreted thus:

if r < 0, a < b
if r > 0, a > b
if r == 0, a == b

So, here’s a complete C program that sorts its command line arguments – including the program name:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare_strings(const void *a, const void *b) {
    // `a` and `b` are pointers to the element type, which in
    // this case is `char *`. Thus they are `char **`.
    //
    // Nothing is stopping you from getting this wrong and putting
    // `char *` instead -- it will just silently not work. The
    // compiler can and will make you write `const` in the right
    // place, though.

    char * const* a_str_ptr = a;
    char * const* b_str_ptr = b;

    // `strcmp` uses the same convention as `qsort` for comparison.
    return strcmp(*a_str_ptr, *b_str_ptr);
}

int main(int argc, char **argv) {
    qsort(argv, argc, sizeof(char *), &compare_strings);

    for (int i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }

    return 0;
}

But the same qsort function can also be used to sort integers, if given different parameters and a different comparison function:

#include <stdlib.h>
#include <stdio.h>

int compare_ints(const void *a_vp, const void *b_vp) {
    const int *a_ip = a_vp;
    const int *b_ip = b_vp;

    int a = *a_ip;
    int b = *b_ip;

    if (a < b) {
        return -1;
    } else if (a == b) {
        return 0;
    } else { // a > b
        return 1;
    }
}

int main(int argc, char **argv) {
    int intary[10] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    qsort(intary, 10, sizeof(int), &compare_ints);

    for (int i = 0; i < 10; i++) {
        printf("%d\n", intary[i]);
    }

    return 0;
}
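
The same pattern extends to custom types of any size. Here’s a sketch, assuming a hypothetical struct employee that I’ve invented for the occasion, along with made-up function names. The only things that change from the previous examples are the size passed to qsort and the comparison function.

```c
#include <stdlib.h>

/* Hypothetical element type, bigger than an int or a pointer. */
struct employee {
    char name[16];
    int age;
};

/* Orders employees by age, following the qsort convention:
   negative if a < b, zero if equal, positive if a > b. */
int compare_by_age(const void *a, const void *b) {
    const struct employee *ea = a;
    const struct employee *eb = b;
    /* (x > y) - (x < y) yields 1, 0, or -1 and cannot overflow,
       unlike the tempting shortcut of subtracting the two ages. */
    return (ea->age > eb->age) - (ea->age < eb->age);
}

/* Sorts `count` employees in place, youngest first. */
void sort_by_age(struct employee *staff, size_t count) {
    qsort(staff, count, sizeof staff[0], compare_by_age);
}
```

Note that qsort moves the whole 20-odd-byte struct around, name and all, without knowing anything about its fields. It only knows the size.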

qsort implements a form of manual run-time polymorphism, in a programming language with no built-in support for polymorphism. It behaves differently based on the element type, as passed to it via a variety of arguments. One of the traits – the comparison operator – differs between types in a way that requires custom code, and this is passed in via pointer. qsort then invokes the operation via indirect function call, the same mechanism that is used for polymorphism in OOP. But unlike OOP-style runtime polymorphism, there is just one function pointer for all the items, rather than each item coming with its own “vtable.”

Note that the optimizer is not able to eliminate this indirect call, especially in the qsort example, where the sorting function is in the standard library, whereas the function calling it and the comparison function are both in application code. This comes at a performance cost, which means that if you’re programming C and the performance of this particular sort is essential to your program, it might easily make sense to write custom sorting code that is not polymorphic.
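
To make the mechanics concrete, here’s a sketch of how a function with qsort’s signature might be written. This is not how any real C library implements qsort; I’ve used insertion sort because it’s short, and the names insertion_sort, swap_bytes, and compare_doubles are all mine. What it shows is where each argument gets used: size for striding and for byte-by-byte swapping, and compar for the one operation that needs custom code.

```c
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(unsigned char *x, unsigned char *y, size_t size) {
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = x[i];
        x[i] = y[i];
        y[i] = tmp;
    }
}

/* Insertion sort with the same signature as qsort. */
void insertion_sort(void *base, size_t nmemb, size_t size,
                    int (*compar)(const void *, const void *)) {
    unsigned char *bytes = base;
    for (size_t i = 1; i < nmemb; i++) {
        /* Walk element i leftward until its left neighbor is not greater. */
        for (size_t j = i; j > 0; j--) {
            unsigned char *left = bytes + (j - 1) * size;   /* striding */
            unsigned char *right = bytes + j * size;
            if (compar(left, right) <= 0) {   /* indirect call */
                break;
            }
            swap_bytes(left, right, size);
        }
    }
}

/* An example comparison function to pass in. */
int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}
```

Everything type-specific arrives through the arguments. The sorting code itself never mentions double, or any other element type.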

Programming Language #1: Sorting in Java

Java is about as far from C as you can get in this matter. C provides no abstraction or language features specifically for polymorphism, and in qsort we use a low-level tool it does provide – function pointers – to build it ourselves. In Java, however, the programming language is explicitly object-oriented, and so the whole programming language is designed to encourage you to leverage polymorphism, as that is one of the pillars of object-oriented programming.

The version of polymorphism available in Java is dynamic, run-time, “late binding” polymorphism, the type of polymorphism that OOP favors. It is based off of the idea of overriding methods, either from base classes, or interfaces that a custom type (a “class”) can implement.

As I mentioned before, this is not the best match for the problem of sorting, at least not the type of sorting we’re talking about. Run-time polymorphism means that every individual element could potentially have a different comparison procedure, which is unlikely. The possibility of such a thing happening increases the cognitive load.

Nevertheless, Java does support polymorphic sorting, and it’s useful to discuss specifically because it does show how OOP-style polymorphism works when applied to such a problem.

There are many methods that do sorting in Java. Some of them take an explicit argument to convey how to do comparisons, just like the qsort example. But more commonly, we sort according to what Java refers to as the “natural order” of the elements, as (for example) in this overload of Collections.sort, with the following signature:

public static <T extends Comparable<? super T>>
void sort(List<T> list)

This sorts a list of elements of type T, where “list” in Java can refer to any of a number of collections that store data in order, such as in a single allocated array (ArrayList) or a linked list (LinkedList). Therefore, it is not only polymorphic in how to compare the elements, but also in how to navigate through the list.

It needs to know about the same traits of type T that qsort does. Some are not polymorphic: for this method to make sense, we know that T must be a reference type, that it must be boxed (that is, it must use indirection), and that therefore the size of an element is always the natural pointer size of the platform, and swapping the element only involves swapping the pointers.

But there’s no getting around the polymorphism of comparisons, and so we see this strange annotation on the function signature: <T extends Comparable<? super T>>. This indicates that T must implement the interface Comparable – implement in this context is called extends. Specifically, it must implement that interface in such a way that it can be applied to other elements of type T (which means that it uses T or some “supertype” of T).

The notation is complicated, because the semantics are complicated. Technically, T could be comparable to a parent type of T, and that would still work. In fact, T could refer to an entire class hierarchy of types derived from some base class, all of them comparable in different ways to objects elsewhere in the hierarchy and to objects derived from a yet further base class. Objects of type T could even be comparable to any arbitrary object – and all of this is covered in <T extends Comparable<? super T>>, trying to express at compile-time what will cause the type T to be a reasonable type to use for sorting.

But this is all just an extra check that the compiler can do at compile-time to prevent run-time errors, because all of the information on how to do the comparisons is available at run-time. In fact, other methods don’t use such formal prerequisites at all, preferring to query at run-time for appropriate interfaces, throwing an exception if they are not present.

In all of these cases, the comparison is the “natural ordering,” which is defined to mean that comparison is done through a Java interface. Specifically, these methods use the Comparable interface, which specifies a method, compareTo, which must take an implicit this parameter and an explicit parameter of the type being compared to, and, like the comparison functions in qsort, must then return an integer whose sign indicates whether the first value was greater or the second (with zero indicating equality).

This natural ordering is defined on a per-type basis. Each type can only implement Comparable once. Fortunately, the regular built-in types, all the ones we are likely to use, all come with good natural orderings. For example, this code all works:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        List<String> argList = Arrays.asList(args);
        Collections.sort(argList);
        for (String arg : argList) {
            System.out.println(arg);
        }

        List<Integer> list = new ArrayList<Integer>();
        list.add(1);
        list.add(3);
        list.add(2);
        list.add(4);
        Collections.sort(list);
        for (int i : list) {
            System.out.println(i);
        }
    }
}

See it in use:

$ java Sort b c a
a
b
c
1
2
3
4
$

It gets a little less coherent when we mix different types of object in the same list, which Java lets us represent in the type system by using Object, which is a type that can store a reference to any non-primitive (including boxed primitives):

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<Object>();
        list.add(1);
        list.add("Hi");

        Collections.sort(list);

        for (Object i : list) {
            System.out.println(i);
        }
    }
}

While the Java runtime allows us to create such a collection, the type system does not allow us to use Collections.sort to sort it, as Object does not provide us enough information to make sure these elements properly can be compared to each other (which in fact, they cannot, as comparing strings to integers is not defined in Java’s “natural ordering”):

$ javac Sort.java
Sort.java:9: error: no suitable method found for sort(List<Object>)
        Collections.sort(list);
                   ^
    method Collections.<T#1>sort(List<T#1>) is not applicable
      (inference variable T#1 has incompatible bounds
        equality constraints: Object
        lower bounds: Comparable<? super T#1>)
    method Collections.<T#2>sort(List<T#2>,Comparator<? super T#2>) is not applicable
      (cannot infer type-variable(s) T#2
        (actual and formal argument lists differ in length))
  where T#1,T#2 are type-variables:
    T#1 extends Comparable<? super T#1> declared in method <T#1>sort(List<T#1>)
    T#2 extends Object declared in method <T#2>sort(List<T#2>,Comparator<? super T#2>)
1 error
$

So how does this work? What is a Java interface? What are its advantages or disadvantages?

Well, Java has two types of values: primitives on the one hand, and object references on the other. In order to use interfaces, or polymorphism at all, we must be dealing with objects. For primitives, there are separate methods for sorting various types of arrays in the Arrays class. As primitives cannot be stored directly in collections, Collections doesn’t have to deal with them.

So, to use this polymorphism through interfaces, we must be dealing with objects. Objects in Java are a rich, standardized data structure, which is why it’s possible to query at run-time which interfaces an object supports. Objects contain not just the fields that the Java programmer specifies, but additional metadata that includes implementations of any supported interfaces, including Comparable. That metadata can be used to find the right version of the compareTo method to use to sort objects of type T. Once we have a T, we can query it at run-time to find the compareTo method. Theoretically, Java might query every object separately as it sorts, with a separate query for each comparison, although I trust that modern Java will in many cases realize that the method will be the same for each object, and figure out a way to optimize it out.
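
To connect this back to the C section, here’s roughly what per-object metadata looks like if you build it by hand in C. I want to be clear that this is a sketch of the general idea, not of how any actual Java implementation lays out its objects, and every name in it (struct object, struct vtable, boxed_int, sort_objects) is invented for the illustration. Each object begins with a pointer to a table of operations for its type, and the sorting code finds the comparison through the object itself rather than receiving it as an argument.

```c
#include <stddef.h>

struct object;

/* Per-type table of operations, shared by all objects of that type. */
struct vtable {
    int (*compare_to)(const struct object *self, const struct object *other);
};

/* Every object starts with a pointer to its type's table. */
struct object {
    const struct vtable *vt;
};

/* A hypothetical boxed integer: header first, then the payload. */
struct boxed_int {
    struct object header;
    int value;
};

static int boxed_int_compare_to(const struct object *self,
                                const struct object *other) {
    /* Valid because `header` is the first member of boxed_int. */
    const struct boxed_int *a = (const struct boxed_int *)self;
    const struct boxed_int *b = (const struct boxed_int *)other;
    return (a->value > b->value) - (a->value < b->value);
}

const struct vtable boxed_int_vtable = { boxed_int_compare_to };

/* Insertion sort over object references. The comparison is looked up
   through the left element's own header on every single call. */
void sort_objects(struct object **items, size_t count) {
    for (size_t i = 1; i < count; i++) {
        for (size_t j = i; j > 0; j--) {
            struct object *left = items[j - 1];
            struct object *right = items[j];
            if (left->vt->compare_to(left, right) <= 0) {
                break;
            }
            items[j - 1] = right;   /* swapping is just swapping pointers */
            items[j] = left;
        }
    }
}
```

Compared with qsort, the function pointer has moved from the argument list into every element, and the elements must be reached through pointers so that they all have the same size.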

As a programmer of a type, we simply declare at the top of the class that our type Foo, for example, implements Comparable<Foo>, and then lower down include our implementation of compareTo among our methods with the @Override annotation. Based on that, Foo objects will be created with the correct metadata such that Java will know to use that method for comparison when sorting, whether the type is known at compile-time or at run-time. We can implement our own version of compareTo that orders values differently than the typical “natural ordering” one would expect from the state that is contained in a Foo:

import java.util.*;

public class Sort {
    private static class Foo implements Comparable<Foo> {
        int inner;

        public Foo(int inner) {
            this.inner = inner;
        }

        @Override public int compareTo(Foo foo) {
            // Less and greater are swapped by this compared to int
            // comparison
            if (foo.inner > this.inner) {
                return 1;
            } else if (foo.inner < this.inner) {
                return -1;
            } else {
                return 0;
            }
        }

        public String toString() {
            return "" + inner;
        }
    }

    public static void main(String[] args) {
        List<Foo> list = new ArrayList<Foo>();
        list.add(new Foo(3));
        list.add(new Foo(4));
        list.add(new Foo(1));
        list.add(new Foo(2));

        Collections.sort(list);

        for (Object i : list) {
            System.out.println(i);
        }
    }
}

Here is the output:

$ java Sort
4
3
2
1
$

Built-in types such as String and Integer already provide their own compareTo override methods, corresponding to more typical implementations of comparisons. Only the author of each type can provide information on how the types are to be compared in this way. To get around this, you can use a wrapper type for each element (like Foo), or you have to fall back on passing in the comparison function the old-fashioned way, like in qsort – though in Java passing in a function is accomplished here through yet another interface, Comparator, as in this alternative function:

public static <T> void sort(List<T> list,
                            Comparator<? super T> c)

Here, Comparator is effectively a function pointer with context, but it’s expressed as an interface so that you can write a concrete class that implements the desired function. Fundamentally, Rust and C++ do something similar.
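
For readers coming from the C section, “a function pointer with context” can be sketched by hand like this. Standard qsort has no way to pass context to the comparison function, so this sketch uses its own small sorting routine. The names (struct comparator, compare_by_distance, sort_ints_with) are hypothetical, and the sketch only sorts ints to keep it short.

```c
#include <stddef.h>

/* A comparison callback bundled with caller-supplied state. */
struct comparator {
    int (*compare)(const void *ctx, int a, int b);
    const void *ctx;
};

/* Example callback: orders ints by their distance from a target
   value, which is read out of the context pointer. */
int compare_by_distance(const void *ctx, int a, int b) {
    long long target = *(const int *)ctx;
    long long da = a > target ? a - target : target - a;
    long long db = b > target ? b - target : target - b;
    return (da > db) - (da < db);
}

/* Insertion sort that calls through the comparator, passing the
   context along on every comparison. */
void sort_ints_with(int *items, size_t count, struct comparator cmp) {
    for (size_t i = 1; i < count; i++) {
        for (size_t j = i; j > 0; j--) {
            if (cmp.compare(cmp.ctx, items[j - 1], items[j]) <= 0) {
                break;
            }
            int tmp = items[j - 1];
            items[j - 1] = items[j];
            items[j] = tmp;
        }
    }
}
```

A Comparator object in Java plays both roles at once: its fields are the context, and its compare method is the function.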

So, how are we to evaluate this system? It’s not particularly designed for situations like sorting. The run-time system is built for the heterogeneous containers, where each individual element of a collection might have a different opinion on how to compare itself to the others. The amount of run-time flexibility is overkill to the situation.

Rather than providing one sorting function pointer, as in the C example, each object comes with its own infrastructure for finding out how to not only sort, but do every other thing that Java might want to do polymorphically with that object, such as convert it to a string, or hash. While the infrastructure is well-optimized and performant for the assumption of heavy use of OOP-style polymorphism, it clearly doesn’t hold to the C++ or Rust performance ideals of not paying for what you don’t use, instead opting to pay an up-front cost under the assumption that any and all objects will regularly be used polymorphically, in OOP style.

The type system in Java is conceptualized as a way of preventing errors, a layer of safety on top of a more Smalltalk-like natural OOP state. In Smalltalk any method can be invoked on any object, and it’s simply a run-time error if that method isn’t available. In Java, the types form a more rigorous layer to check to make sure our method calls have correct semantics, allowing errors to be caught earlier, at compile-time (although Java type errors are also sometimes caught at run-time). The power of the more ideologically pure form of OOP is still available in Java, as evidenced by the signature on the Arrays.sort method alluded to above (and documented here). This style is discouraged, but still possible:

public static void sort(Object[] a)

Here is a use case that succeeds:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        Arrays.sort(args);
        for (String arg : args) {
            System.out.println(arg);
        }
    }
}

Here is the output:

$ java Sort a c b
a
b
c
$

Here is a use that fails:

import java.util.*;

public class Sort {
    public static void main(String[] args) {
        Object [] array = new Object[2];
        array[0] = Integer.valueOf(0);
        array[1] = "Hi";
        Arrays.sort(array);
        for (Object obj : array) {
            System.out.println(obj);
        }
    }
}

It outputs:

Exception in thread "main" java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
	at java.base/java.lang.String.compareTo(String.java:125)
	at java.base/java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
	at java.base/java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
	at java.base/java.util.Arrays.sort(Arrays.java:1249)
	at Sort.main(Sort.java:8)

The cost of this is acceptable in Java but not in Rust or C++, or C for that matter. Every object must contain individual metadata if it is to be sortable through a polymorphic function, and it must be boxed. In C++ or Rust, we must be able to sort arbitrary unboxed data, without extra metadata included directly within it. But in Java, all types except for primitives are boxed, only boxed types support polymorphism, and they do so at the cost of additional data in each heap allocation. And it works, for Java’s goals, of being a garbage-collected OOP language with a layer of types to expose errors at compile-time.

As the C example shows, this cost isn’t intrinsic to run-time polymorphism in general, but it is intrinsic to OOP-style polymorphism. OOP uses run-time polymorphism at an individual object level as one of its core features, even when the function does not need to be conveyed on a per-element basis, but only once.

Programming Language #2: Sorting in C++

C++, of course, supports this type of run-time polymorphism. We could, if we wanted, build a system like Java’s, where we had an abstract class Comparable that we could use to add run-time data to show every object of a type how to be compared with every other object. We could require that collections to be sorted contain classes that inherit from – in C++, inheritance and interface implementation are the same – Comparable. C++’s run-time polymorphism could be used to implement sorting in the exact same way as Java.

But that’s not how sorting is implemented in C++. Sorting, in C++, uses a completely unrelated mechanism of templates. Templates are C++’s mechanism for static, compile-time polymorphism, just as virtual functions and inheritance are C++’s mechanism for dynamic, run-time polymorphism (of a classical OOP variety that closely resembles Java). In spite of them both being forms of polymorphism, and having many overlapping use cases, templates and virtual functions are completely unrelated features.

I have seen people argue that templates and virtual functions are justified in being completely unrelated, because every situation clearly calls for one or the other. But if it’s possible to do sorting with run-time polymorphism, as we see from Java, then clearly the distinction is not as clear-cut as all that. What’s to stop a former Java programmer from using C++’s run-time polymorphism to implement their own sorting function a la Java, even though that’s not idiomatic C++? There’s clearly some level of overlap in use cases, even if not in semantics!

So, how do templates actually work?

Caveat for modern C++ fans: I’m going to save concepts for the end. They don’t actually substantially affect my point (as I will explain). I think it’s simpler to talk about pre-concepts C++ at first, and then discuss how concepts impact (or rather, don’t really impact) the equation.

Templates are a form of macro system. A template (class template, function template, type alias template, etc.) is given parameters at compile-time. Once the template is given parameters, it is instantiated and stamps out a concrete component of the program (a class, function, type alias, etc.).

So, that’s quite abstract. This is a situation where an example can help a lot. In line with our theme, we’re going to write a template that involves comparisons: given two values of any type that you can compare (and we’ll have to decide what that means), which is bigger?

template <typename T>
T max_value(T a, T b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

When we actually invoke it, we provide a type for T, giving us a specialized function where T is replaced by that type.

std::cout << max_value<int>(3, 4) << std::endl;
std::cout << max_value<std::string>("hi"s, "hello"s) << std::endl;

The mere mention of max_value<int> creates a function max_value<int>, and likewise for max_value<std::string>. This function is the template, with the template parameter in brackets standing in for T.

Of course, for function templates, specifying the T is optional, as C++ can infer it, so this code works equally well:

std::cout << max_value(3, 4) << std::endl;
std::cout << max_value("hi"s, "hello"s) << std::endl;

So, what are the resulting functions? It’s very much as if we had written:

int max_value(int a, int b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

std::string max_value(std::string a, std::string b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

These are separate functions. The compiler will simply generate as many separate versions of max_value as it needs to. It outputs separate assembly language for each of them, and treats them as function overloads, meaning that it uses the static (compile-time) type of the parameters to figure out which function to call.

So, from the perspective of someone reading the code, we call max_value twice, and it figures out how to do its thing on an int or a std::string. It’s polymorphic, as it does the same algorithm (finding max) with an operation that changes based on type (<). But from the perspective of someone reading the outputted assembly, it’s not polymorphic – we’ve simply got two different functions that do max_value in two different ways.

In other words, we’ve gone from polymorphic code (compile time) to monomorphic code (run time). This is why Rust calls its equivalent to template instantiation “monomorphization.” This is also why it’s called “compile time polymorphism” – it is no longer polymorphic at run-time.

The advantage: This is a zero-overhead abstraction. We’re having the compiler write, on our behalf, specialized code for each type. We do not need each element to have virtual function metadata to indicate how to do comparisons, nor do we even need a function pointer like with qsort. It’s as optimal as specialized hand-written code, but we didn’t have to do the specialization.

The disadvantage: We have to know the type at compile-time. This prevents heterogeneous containers from being possible with this style of polymorphism. This type of polymorphism can only be based off of the compile-time type, not based off of changing run-time types. It is the exact opposite of “late binding” – the binding is done at compile-time. So, this could not be used for polymorphism over different types of widgets in a list of widgets.

The other disadvantage: Compile times take longer and the resultant binary is larger. (Eh, shrug.)

So what operations are needed to support this template? What definition are we using for “comparable type” for T? We’re not explicitly using any at all, but note that if the type T doesn’t support the < operator, this code will simply fail to compile:

class Foo {
};

max_value(Foo{}, Foo{});

Giving the error:

test.cpp: In instantiation of ‘T max_value(T, T) [with T = main()::Foo]’:
test.cpp:22:14:   required from here
test.cpp:8:11: error: no match for ‘operator<’ (operand types are ‘main()::Foo’ and ‘main()::Foo’)
    8 |     if (a < b) {
      |         ~~^~~

This goes away if we give it the < operator.

class Foo {
public:
    bool operator <(const Foo &other) const {
        return false; // All Foos are created equal!
    }
};

max_value(Foo{}, Foo{}); // Now compiles

If we’d written max_value differently, using > instead, adding operator < might not have made the error message go away. As it turns out, though, < is the conventional operator to use for comparisons: it is the C++ equivalent to Java’s Comparable, the function that by convention defines the “natural order.”

Is that all that’s required to make max_value work? It turns out no, as many an astute C++ programmer has probably already noticed. There is another operation besides operator< required to make max_value work, and this is because I intentionally made a mistake (so I could reveal it later to show how subtle templates can be).

Let’s take a look at the instantiation for std::string again, just the signature:

std::string max_value(std::string a, std::string b);

Is that how we’d write max_value by hand for std::string? No, we wouldn’t. We’d write const std::string &a, and take it by reference, so that no new objects are initialized in the comparison and return. If you’re not a C++ programmer, this might seem shocking, but max_value as we wrote it requires the type to be passable by value, which is a capability that a type might not have:

class Foo {
public:
    Foo() = default;
    Foo(const Foo&) = delete;
    bool operator <(const Foo& other) const {
        return false; // All Foos are created equal
    }
};

max_value(Foo{}, Foo{}); // Error! Error!

So, we missed the mark, quite by accident! We had an extra requirement besides comparison, and we can fix that by taking the value by (const) reference (which is what std::max does anyway), which also implies returning by reference:

template <typename T>
const T &max_value(const T &a, const T &b) {
    if (a < b) {
        return b;
    } else {
        return a;
    }
}

So what was required from T for us to call max_value?

In one sense, nothing besides that it should be a type! We could pass any type in for T, and the compiler will plug in the type and chug away, running into errors only once it has attempted to do so! This might actually happen several template instantiations deep, and the resulting error shows up in the template where the operation is attempted, not where you use the template with an inappropriate type, which can be confusing.

In another sense, what is required is that we pass types that make max_value compile, so in this case, ones that support operator <. However, there is no guarantee or check that the type is making the semantic promises that correspond to that operator. Sorting, for example, requires that the operator work in such a way as to define a strict weak ordering. If the operator doesn’t in fact do that, std::sort will compile but won’t work properly (formally, its behavior is undefined).
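To make that semantic promise concrete, here is a minimal sketch of what a strict weak ordering asks of a "less than" function. It is written in Rust rather than C++ (to match the examples later in this post), and the helper name is_strict_weak_ordering is hypothetical. It only brute-forces the rules over a sample of values, so passing is evidence rather than proof:

```rust
/// Checks, by brute force over `items`, whether `less` behaves like a
/// strict weak ordering on that sample. Hypothetical helper, for illustration.
fn is_strict_weak_ordering<T>(items: &[T], less: impl Fn(&T, &T) -> bool) -> bool {
    // "Equivalent" means neither element is less than the other.
    let equiv = |a: &T, b: &T| !less(a, b) && !less(b, a);
    for a in items {
        // Irreflexivity: nothing is less than itself.
        if less(a, a) {
            return false;
        }
        for b in items {
            // Asymmetry: a < b and b < a cannot both hold.
            if less(a, b) && less(b, a) {
                return false;
            }
            for c in items {
                // Transitivity of "less than".
                if less(a, b) && less(b, c) && !less(a, c) {
                    return false;
                }
                // Transitivity of equivalence.
                if equiv(a, b) && equiv(b, c) && !equiv(a, c) {
                    return false;
                }
            }
        }
    }
    true
}

fn main() {
    let sample: [i32; 4] = [3, 1, 2, 2];
    // The usual `<` on integers passes.
    assert!(is_strict_weak_ordering(&sample, |a, b| a < b));
    // `<=` type-checks just as happily, but it is not irreflexive.
    assert!(!is_strict_weak_ordering(&sample, |a, b| a <= b));
    // "Less by more than one" breaks transitivity of equivalence:
    // 1 ~ 2 and 2 ~ 3, yet 1 is "less than" 3.
    assert!(!is_strict_weak_ordering(&sample, |a, b| a + 1 < *b));
}
```

All three closures have the same type signature, which is the point: nothing in the types distinguishes the valid ordering from the two broken ones.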

It seems reasonable in this case to expect people to use operator < for less-than as it’s such a well-established and fundamental operator. But templates can also invoke named methods. What if somebody writes a template that calls some_t.foo() expecting it to do one thing, and someone calls that template with an unrelated class that has a type-compatible foo method, but with different semantics? There is no indication to the compiler, when you write the class, that you intend for foo to be appropriate for use in the template. We didn’t have to say, when we wrote Foo here, that our operator < was valid for std::sort.

Concepts do help with that. You can statically assert that a class supports a concept’s requirements, and that documents your intention to support it semantically as well. Concepts can also cover stricter requirements than a template incidentally imposes, and help document the semantics of templates.

But everything about concepts is opt-in; you can always write a template that will sometimes fail on instantiation. And that makes them much less useful in my book. Don’t get me wrong: I’m glad they exist. I think C++ with concepts is better than C++ without concepts. But it only goes so far, especially when compared with Rust traits, which are mandatory for Rust’s form of compile-time polymorphism.

More relevant than all of this, to me, is that templates and OOP work so differently from each other. Run-time polymorphism and compile-time polymorphism are just completely different beasts. Students are taught the OOP style run-time polymorphism, and that doesn’t really help them understand templates, or even get started doing so. Again, I feel C++ is too big.

But, at least it has this zero-overhead abstraction, without requiring a method look-up and an indirection for every item to be sorted.

std::sort, by the way, takes iterators. These iterators must be value swappable legacy random access iterators, and that’s just a subset of the requirements, as seen in std::sort’s CPPReference page. The way to get from one element to another (and therefore implicitly the size), the way to swap elements, and the way to compare them are all implicitly derived from RandomIt, the type parameter specifying the type of the iterator (at least in the overloads of std::sort that do not take an explicit comparator).

Programming Language #3: Sorting in Haskell

Now for Haskell!

We’re mostly talking about Haskell to move on to talking about Rust, as this is a Rust-focused blog. There’s a lot going on with Haskell typeclasses that I won’t have time to get into here.

Haskell is where Rust got traits from, although Haskell calls them typeclasses. Incidentally, Haskell uses run-time polymorphism where Rust uses compile-time polymorphism, but the semantics are more similar than you might expect from that statement.

In Haskell, like Java, all types that sort accepts are boxed, covering size and swapping among the traits that might need to be customized. Unlike Java, the operations we need to perform on values of this type are passed to sort once, rather than looked up on a per-element basis.

Here is the type for sort:

sort :: Ord a => [a] -> [a]

a here is like T in C++: a type variable that can be replaced with any type. As in Java, this is subject to type erasure: sort just operates on generic boxed values. Any comparison-specific operations it needs come from the Ord a =>, which constrains a to types that have instances of the Ord typeclass.

Here is the definition of Ord:

class  (Eq a) => Ord a  where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

    compare x y = if x == y then EQ
                  -- NB: must be '<=' not '<' to validate the
                  -- above claim about the minimal things that
                  -- can be defined for an instance of Ord:
                  else if x <= y then LT
                  else GT

    x <  y = case compare x y of { LT -> True;  _ -> False }
    x <= y = case compare x y of { GT -> False; _ -> True }
    x >  y = case compare x y of { GT -> True;  _ -> False }
    x >= y = case compare x y of { LT -> False; _ -> True }

        -- These two default methods use '<=' rather than 'compare'
        -- because the latter is often more expensive
    max x y = if x <= y then y else x
    min x y = if x <= y then x else y

It defines many methods that an instance of Ord can support. These methods are functions defined in terms of each other; you must specifically implement at least one of them for your type to prevent infinite regress. Minimally, either compare or <= is sufficient, with compare recommended for more complex types.

Unlike in C++, when you define these methods, it is not enough to simply define a function called <= or compare. Haskell won’t even let you define functions with the same fully qualified name as the methods, which exist in the same namespace as any other functions. Unlike C++, Haskell does not have function overloading, and any time the same fully qualified name has different semantics for different types, it is through this mechanism of typeclasses. Like in Java, you have to explicitly declare your intention to implement the methods as found in Ord, by writing an instance explicitly, like so:

import Data.Ord
import Data.List

data Foo = Foo Integer
    deriving Show

instance Eq Foo where
    (Foo a) == (Foo b) = a == b

instance Ord Foo where
    (Foo a) <= (Foo b) = b <= a

main = do
    let list = [Foo 3, Foo 4, Foo 2]
    print $ sort list                   -- outputs [Foo 4,Foo 3,Foo 2]

Note that the instance declarations are separate from the definition of the type! The module where the type is declared can define them, but so can the module where the typeclass is declared. Other modules can technically define them too, but such “orphan instances” are discouraged (GHC can warn about them), so that there is only one canonical definition of an instance for a given type and typeclass.

How does this actually work then? Well, Ord a is a secret parameter to sort. Haskell will create a bundle of function pointers for us that represent the specific Ord instance for whatever type we pass to sort, either from knowing the type statically at that point, or passing along a bundle passed into whatever called sort. So this compiles to something quite similar to the C qsort (at least as far as polymorphism is concerned), taking in a comparison function. The big difference is, Haskell will choose the comparison function for us – but it is one comparison function, not one comparison function per item as in Java.
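As a rough illustration of that “secret parameter,” here is a sketch that passes the bundle by hand. It is written in Rust rather than Haskell, and the names OrdDict and sort_with are hypothetical, not anything GHC exposes; the real compiler builds and threads the bundle for you, and would only ever have one canonical bundle per type.

```rust
use std::cmp::Ordering;

/// A hand-rolled stand-in for the instance "bundle": a record of
/// function pointers for one type. (Hypothetical name.)
struct OrdDict<T> {
    compare: fn(&T, &T) -> Ordering,
}

/// Insertion sort that receives the dictionary as an ordinary argument,
/// once per call, instead of looking anything up on each element.
fn sort_with<T>(dict: &OrdDict<T>, items: &mut [T]) {
    for i in 1..items.len() {
        let mut j = i;
        while j > 0 && (dict.compare)(&items[j - 1], &items[j]) == Ordering::Greater {
            items.swap(j - 1, j);
            j -= 1;
        }
    }
}

fn compare_i32(a: &i32, b: &i32) -> Ordering {
    a.cmp(b)
}

fn compare_i32_descending(a: &i32, b: &i32) -> Ordering {
    b.cmp(a)
}

fn main() {
    // Two dictionaries for the same type, purely to show that the
    // behavior comes from the argument, not from the elements.
    let ascending = OrdDict { compare: compare_i32 };
    let descending = OrdDict { compare: compare_i32_descending };

    let mut xs = vec![3, 4, 2];
    sort_with(&ascending, &mut xs);
    assert_eq!(xs, [2, 3, 4]);
    sort_with(&descending, &mut xs);
    assert_eq!(xs, [4, 3, 2]);
}
```

The elements themselves are plain i32 values with no attached metadata; everything sort_with knows about ordering arrives through the one dict argument.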

Programming Language #4: Sorting in Rust

So, how does Rust do all of this?

As I said, a Rust trait is very much like a Haskell typeclass. Rust’s main sort method, like Haskell, requires the Ord ~~typeclass~~ trait. Like Haskell, it even has provided (but overrideable) methods as well as required methods:

pub trait Ord: Eq + PartialOrd {
    // Required method
    fn cmp(&self, other: &Self) -> Ordering;

    // Provided methods
    fn max(self, other: Self) -> Self
       where Self: Sized { ... }
    fn min(self, other: Self) -> Self
       where Self: Sized { ... }
    fn clamp(self, min: Self, max: Self) -> Self
       where Self: Sized + PartialOrd { ... }
}

Like typeclasses, to indicate that a type has a trait requires a specific block that says what trait we’re trying to implement, and lists the implementation of the required methods. Like in Haskell, that block may reside in the crate where the trait is defined, or the crate where the type is defined. Like in Haskell, this allows us to add polymorphism to previously unpolymorphic operations without having to create wrapper types.

Here is an example of implementing this trait (unfortunately, we have to implement both Ord and PartialOrd):

use std::cmp::Ordering;

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
struct Foo(u32);

impl PartialOrd for Foo {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        other.0.partial_cmp(&self.0)
    }
}

impl Ord for Foo {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.cmp(&self.0)
    }
}

fn main() {
    let mut foos = vec![Foo(3), Foo(4), Foo(1), Foo(2)];
    foos.sort();
    println!("{:?}", foos); // Displays [Foo(4), Foo(3), Foo(2), Foo(1)]
}

It’s very similar to Haskell, but with “C-like” syntax and aesthetic. The syntax for the functions using the trait looks like C++ templates:

fn max<T: Ord>(a: T, b: T) -> T {
    if b > a { b } else { a }
}

What’s different from Haskell is how it’s implemented. The semantics are quite similar, and the Rust implementation can be thought of as an optimization of the Haskell semantics. Instead of passing in to sort() a secret run-time parameter with Foo’s implementation of Ord, the function is monomorphized. We can think of it as inlining just that one parameter at compile-time, and generating a specialized function.
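One way to see the specialized functions, as a small sketch: naming an instantiation with the turbofish gives an ordinary, non-generic function that fits in a plain function pointer. The name max_value is borrowed from the C++ example above and is not a standard library function.

```rust
fn max_value<T: Ord>(a: T, b: T) -> T {
    if b > a { b } else { a }
}

fn main() {
    // Each instantiation is its own concrete function with its own
    // concrete signature.
    let max_int: fn(i32, i32) -> i32 = max_value::<i32>;
    let max_string: fn(String, String) -> String = max_value::<String>;

    assert_eq!(max_int(3, 4), 4);
    assert_eq!(max_string("hi".to_string(), "hello".to_string()), "hi");

    // At ordinary call sites, inference picks the instantiation for us.
    assert_eq!(max_value(3, 4), 4);
}
```

No Ord bundle is passed at run time here; the comparison for each type is baked into its own copy of the function.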

Yes, this implementation is fundamentally very similar to C++’s implementation of templates. It’s basically the same in terms of machine code and resulting optimizations. But the semantics are more Haskell-like. Polymorphic functions are type-checked once. They may only use functionality incorporated in the traits at hand. We don’t postpone the type-checking for the template instantiation.
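Here is a minimal sketch of what “type-checked once” means in practice. The function describe_larger is a made-up example; the point is that its body is checked against the declared bounds alone, before any caller exists.

```rust
use std::fmt::Debug;

// Checked once, at the definition, against the bounds alone. Deleting
// `+ Debug` makes the `format!` line an error right here, even if every
// caller happens to pass a type that implements `Debug`.
fn describe_larger<T: PartialOrd + Debug>(a: T, b: T) -> String {
    let larger = if b > a { b } else { a };
    format!("{:?}", larger)
}

fn main() {
    assert_eq!(describe_larger(3, 4), "4");
    assert_eq!(describe_larger("hi", "hello"), "\"hi\"");
}
```

Contrast this with the C++ max_value above, where the missing operator < only surfaced when the template was instantiated with Foo.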

What’s more, the same mechanism is also used for Rust’s run-time polymorphism, where we can have a type like dyn MyTrait for some specific traits that are object-safe. These trait object types are like OOP polymorphic types, in that each value comes with a pointer to its table of polymorphic functions, but that pointer lives outside the original object. It is a property of the pointer, not of the object, and implemented with fat pointers.
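A small sketch of one trait used both ways follows. The Shape trait and its implementors are invented for illustration, and the size checks reflect how current rustc lays out pointers, which is not something the language formally promises.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Compile-time polymorphism: monomorphized per concrete `S`.
fn area_static<S: Shape>(shape: &S) -> f64 {
    shape.area()
}

// Run-time polymorphism: one function, dispatching through the table
// that each fat pointer refers to. The list can be heterogeneous.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    assert_eq!(total_area(&shapes), 10.0);
    assert_eq!(area_static(&Square(3.0)), 9.0);

    // The object carries no extra metadata: a Square is just its f64.
    assert_eq!(size_of::<Square>(), size_of::<f64>());
    // The extra word lives in the pointer instead.
    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());
}
```

The same impl blocks serve both area_static and total_area; nothing about Square or Rect had to change to opt in to run-time dispatch.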

Like with any other trait, the trait implementation is separate from the type definition or the trait definition (though it must live in the same crate as one of them). Unlike C++, there is one system for polymorphism that can be used in both run-time and compile-time ways, with overlap where possible.

Conclusion

I hope this shows, if nothing else, that polymorphism itself can take many forms in many programming languages beyond the OOP variety of it. The OOP variety is in some senses self-propagating – if you optimize your language for it as in Java, then it makes sense to use for everything, even if it’s not what you would choose in a language that has other options.

For many forms of polymorphism, in C++ (for templates), Haskell, and Rust, no inheritance is necessary. It is simply not built according to the OOP frame of mind. I personally think Haskell and Rust are doing it right here, as is perhaps obvious from how I’ve written about it.

I hope to write more about run-time polymorphism in Rust, and how it differs from the C++ variety, and how you can manually implement other types of run-time polymorphism if you want. This would be a future post. But, this is a hobby blog, so no promises on timeline!

A Review of Self-Help as a Genre, and Atomic Habits in Particular

2024-01-28T00:00:00+00:00

I enjoyed reading Atomic Habits, which was recommended to me by my therapist. I found this blog post basically finished in my attic folder while sorting through things, and I found it ready for posting, even though my records show I read Atomic Habits way back in … October 2022.

Self-Help in General

Atomic Habits is pretty fundamentally a “self help book.” This is a pretty controversial genre in my experience. Some people roll their eyes at self-help books in general – I once even read an “anti-self help” book that basically did so for the entire length of a book. Others swear by them – literally, I had a friend once who said The Subtle Art of Not Giving a Fuck was her Bible and who used it as such for an (informal but serious) oath. I’m generally somewhere in the middle of these two extremes. I read them with solidly middling expectations.

My attitude flavors how I read self-help books. So, before I talk about Atomic Habits in particular, I want to talk some about self-help books in general, and my take on them.

I’ll start with their problems, because, as a genre, they sure do have their problems.

Problem #1: Length

They are longer than they need to be, stretched thin by the lengthening. Atomic Habits clocks in (in my copy) at 264 pages, with an estimated word count of 80,000 words. However, it provided the insight of a long blog post, maybe at around 20,000 words at most. And I only give it credit for that long a blog post because this was a particularly useful self-help book, which also mitigated its length using re-caps and summaries arranged in helpful chart forms.

I do understand why publishers do this: they want to publish books, not glorified pamphlets. I also understand that self-help is far from the only genre with this problem: it plagues all forms of popular non-fiction.

But it’s still annoying.

Problem #2: Wildly varying standards

The bigger problem with self-help books is that they vary widely in quality, not just in terms of evidence for their claims (that if you follow their advice you will get the results they say you will), but more importantly in terms of moral quality. Some of them have questionable values, or brazenly teach you how to be manipulative or otherwise unkind to other people.

This is partially because it’s such a subjective topic. There are no unified standards on wisdom, no certifying authority. Some books are written by expert psychologists and psychiatrists, but those are often framed to specific disorders, and they aren’t always compelling writers. Others are factual and even science-based, but have goals that are repugnant to many or even most people. Still others are just snake-oil or feel-good.

Problem #3: They can turn into religions

One specific pattern of problem is endemic to the genre: over-enamored with their own importance, they try to provide a comprehensive life framework to the reader, the “one cool trick” that will fix everything that ails you. This can lead to the situation where people can buy into it so hard they idolize it and treat it like a religious text. Simultaneously, others reject it as overbearing and boundary-crossing while cringing at such people.

This is kind of easy for self-help writers to do, even by accident: The nature of the topic makes any life advice in scope, and authors as living humans generally have some sort of opinions on how to handle any sort of life situation, that they may already organize internally into an all-encompassing framework. The nature of writing also requires organizing those opinions into general principles (often over-general). And in order to be effective, you have to persuade the reader.

All together, this can lead to over-stating your case for a simplistic framework. With this One Simple Trick™, with this simple overriding principle, you can transform your entire life. On such premises religions are built.

How I read them

As a result of these problems, I tend to take the scope and claims of a self-help book with a grain of salt. I don’t expect it to transform my life, or revolutionize me. I don’t trust it, even temporarily, to tell me how to think about or organize the ideas it presents. I don’t read it for the overarching framework at all; instead, I just sift through it for individual useful take-aways, discarding the vast majority of it (even the majority of the 1/5 of it that isn’t fluff to make it book-length) as either things I already know or else already know enough to disagree with. I then can integrate these individual ideas into my own framework and values.

So my experience goes something like this:

Huh, that’s an interesting fluff story. Cool, I see the point you made, but I knew that already. Decent story to back it up though! Nice phrasing too, but I’ll forget that tomorrow… Yeah, that page just reiterates that point, wow this could have been a blog post…

Yeah, I can see the organizational structure you’re using to tie it together with the framework for your book. It’s not that useful to me as a life framework, but I’ll treat it as a framework that holds the book together.

Wait! Aha! There we have a new idea! I’ll take it!

You might think that if I’m so cynical about self-help books, and think they have a low information density, then I’d be very disinclined to read them. But I do actually read them from time to time, especially when someone recommends them (Atomic Habits, The Subtle Art of Not Giving a Fuck), or when they’re relevant to me (Taking Charge of Adult ADHD, which is basically a self-help book but for a specific class of people to which I belong).

And I’m usually happy I did it. Even though all I get out of it are maybe a handful of ideas I can take with me, those ideas are sometimes really good. Some ideas I lean on a lot I’ve gotten from some self-help book or another – and those are just the ones I’m aware of.

And rehashing concepts I already agreed with, or defending my mind against concepts I disagree with, is also a useful exercise. I generally believe in reflection on values and approaches to life. I wouldn’t go so far as to say an unexamined life is not worth living, but I tend to think that detailed critical thinking is a net positive, and is not usually driven by anxiety-based “overthinking.”

And I mean, I’ve reflected a lot on my value of reflection and done a lot of examination on my value of self-examination, and it generally holds up. Why wouldn’t I want a guided version of that, to get outside the limitations of my own way of thinking and those of my close friends?

And all in all, I need the excess wisdom. I think we live in a society with a bit of a wisdom crisis. We don’t have a lot of traditionalism going on, and to the extent that we do, we live in a different world than even our parents let alone the worlds of our various scriptures. Humans need guidance. There’s a reason self-help books sell. There’s a reason why people sometimes turn them into religions in their heads.

Atomic Habits in Particular

Now that I’m done waxing philosophical (for now), the natural question is, what did I get out of Atomic Habits?

Even though it was much longer than it needed to be, I’m not inclined to summarize it. I don’t even remember everything that’s in there – most of it I already knew from reading the older The Power of Habit, which this book admits to spending a lot of time rehashing while adding relatively straightforward and obvious applications. So I’ll leave summarizing to another article by someone else who has done a great job and whose article is around the length the original book should have ideally been.

Instead, I’ll say that I enjoyed the review of the material from that other book, was inspired an appropriate amount, and even got a handful of take-aways that will stay with me in my internal pile of “wise thoughts.” Rather than a summary of what the book has to give, you’ll get a list of what I have taken from it.

My Take-Aways

You don’t change habits by setting goals, you achieve goals by changing habits.

This one will really stick with me, because they had just told stories about sports. Every sports team has the same goal: win the tournament. It’s just that they have different habits to get there. So obviously setting goals by itself isn’t good enough.

And while this has some overlap with things I already knew (like the idea of setting SMART goals) it did rub the point in in a different enough way that I felt it was worth adding to my list. If you practice in effective ways, you will get better enough to do X. You don’t have to think about that goal, and in fact, it’s probably better if you focus on enjoying the process.

Related to the previous: Your habits are set based on the type of person you are. So, instead of thinking about goals or habits directly, consider thinking about what type of person you’re trying to be, and what they would do.

This is a bit more complicated, but it makes sense. Like an evangelical with a “What would Jesus do?” bracelet, think about the type of person you’re trying to be, and do what they’d do. In addition to enabling you to be more moral by emulating religious moral authorities, this also overlaps with advice on how to be less impulsive from ADHD advice books I’ve read.

It makes sense that this would be able to generalize to narrower questions like “what would a good writer do?” or “what would a habitual musician do?”, but I really hadn’t thought about it before.

It also, speaking of Christianity, reminded me of a drawing I saw once in a book about the Lutheran confessions. It had a tree, and the root of the tree was Glaub[e], or faith, and the branch of the tree was Lieb[e], or love, and the crown of the tree was Werk or [good] work[s]. The message was that your values influenced your feelings and attachments, which influenced your behavior. Focusing on doing good directly was not the right approach, but rather to focus on what you believed in a core way.

If anyone can find this drawing, by the way, please let me know!

Conclusions

It was a decent book, for a self-help book. If you’re particularly struggling with habits, or goal-setting, or trying to motivate yourself, it might be useful to help you deconstruct where you’re going wrong. It’s also useful background for understanding human nature a little better, especially if you’ve never thought about these issues in detail.

Minor News: Some Repos on GitHub

2024-01-21T00:00:00+00:00

So, there are now two additional repos of my code on GitHub that recently got published, both under the MIT license. Neither is any show-stopping major project, but I figured I’d let everyone know nevertheless, and write up a few notes about it. Both have been added to my programming portfolio garden.

Repo #1: Crate Version of Prefix Ranges

Arvid Norlander (blog, GitHub) reached out to me to ask if I wanted to publish my little Rust module from my post on prefix ranges as a crate, or, failing that, if I could license it as open source so he could publish it. I had thought of most of my code on this blog up until this point as example code not worth licensing, but his prompting changed my mind. If it’s just trivial example code, it’s not worth not open sourcing, so I might as well release the website’s example code under an MIT license.

This particular piece of code seems like the wrong end solution to the problem at hand – though it is the solution I ended up using when faced with the problem in a larger project. Ideally, I would like to write a follow-up piece to the prefix range article, discussing how to fix BTreeMap to generalize not just to splitting on various keys based on their ordering properties, but based on any appropriate function that acts as a range (i.e. that monotonically transitions from false to true when looping over items in sorted order by the Ord trait), as a generalization of Bound. Then, prefixes could be represented in terms of such a function, and we could leverage the full efficiency of a BTreeMap without having to do any extra UTF-8-mongering.
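To illustrate the idea of a range defined by a monotonic predicate, here is a minimal sketch on a sorted slice, using the standard slice::partition_point. This is not the crate’s code and not an existing BTreeMap API (BTreeMap has no such method today); the function prefix_range is a hypothetical stand-in for what a generalized Bound might allow.

```rust
/// Returns the contiguous run of `sorted` whose entries start with `prefix`.
/// `sorted` must be in ascending `Ord` order. Hypothetical helper.
fn prefix_range<'a>(sorted: &'a [&'a str], prefix: &str) -> &'a [&'a str] {
    // Both predicates hold for an initial run of the sorted slice and
    // fail for everything after it, which is what partition_point needs.
    let start = sorted.partition_point(|w| *w < prefix);
    let end = sorted.partition_point(|w| *w < prefix || w.starts_with(prefix));
    &sorted[start..end]
}

fn main() {
    let words = ["apple", "apply", "apt", "banana", "band"];
    assert_eq!(prefix_range(&words, "app"), ["apple", "apply"]);
    assert_eq!(prefix_range(&words, "ban"), ["banana", "band"]);
    assert!(prefix_range(&words, "c").is_empty());
}
```

Note that no upper-bound string is ever computed from the prefix; the predicate alone locates both ends of the range with binary searches.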

But fully implementing such a thing would mean patching the standard library, and fully writing that blog post would mean a lot of benchmarking work. I still plan on doing it someday, but as I point out many times, this is a hobby blog, and so follow-up posts will happen when they happen (although nagging me about it, nicely, over e-mail is allowed). I do now support buying me a coffee, but that is meant in the true spirit of buying me an extra beverage as a token of thanks. At the time of this writing no one has clicked it, and I certainly expect no more than occasional literal coffees to come of any money from it.

Repo #2: Texas Hold-Em Library/Quiz App

I’ve been writing some code to do with the most popular modern poker variant, Texas Hold-Em. It lives in a repo on GitHub. Ideally, it’ll turn into an app to help me and some buddies practice reading flops, counting outs, seeing who’s ahead, and doing other hold-em mental calculations. I might also extract a library or even a framework for writing AIs, or playing against them. Maybe even a front-end app could be added, either in Rust or in Reflex in Haskell.

But no promises! See the hobby blog note above! If you really want a feature, I’ll happily accept PRs!

Of course, this wouldn’t be the first such codebase, or even the first in Rust. I’m just having fun. I enjoyed writing the code so far, and I figured I’d put it on GitHub in the meantime, even if it never becomes particularly useful.

Writing it with all its combinatoric randomness made me really learn to appreciate itertools, a collection of iterator methods that for various reasons haven’t been accepted or stabilized in the standard library. It’s been good exercise writing in functional programming, iterator and iterator-transformer style, which is a little harder in Rust than in Haskell.
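As a small taste of that iterator-transformer style, here is a sketch that enumerates every two-card starting hand. itertools provides a `combinations` adaptor for this kind of job; the version below uses only the standard library so that it stands alone. The `pairs` helper and the encoding of cards as plain numbers are assumptions for illustration, not code from the repo.

```rust
/// Yields every unordered pair of items, in iterator-transformer style.
///
/// `pairs` is a hypothetical helper written for this example.
fn pairs<T: Copy>(items: &[T]) -> impl Iterator<Item = (T, T)> + '_ {
    items.iter().enumerate().flat_map(move |(i, &a)| {
        // Pair each item only with the items after it, so that (a, b)
        // and (b, a) are not both produced.
        items[i + 1..].iter().map(move |&b| (a, b))
    })
}
```

With a 52-card deck encoded as the numbers 0 through 51, `pairs(&deck).count()` gives the 1,326 possible starting hands (52 × 51 / 2).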

Also, while I understand why Rust doesn’t have generators (there is an excellent blog series about the topic on “Without Boats”), many of the reasons are historical and, well, I just really wish it did.

Additional future exploration might include zany optimizations, perhaps inspired by (but not directly following in the footsteps of) this zany hand evaluation algorithm, implemented in Rust in many places, including here by Wataru Inariba – although regular optimizations probably come first.

Review: One Billion Americans, by Matthew Yglesias

2024-01-08T00:00:00+00:00

This was a great read about how the United States should reframe many of its basic political assumptions.

It is tempting to think of life as a zero-sum game. Having more for me, even enough for me, means less or even not enough for others. Usually, we have the open-mindedness to feel like we can cooperate with some few – our family, our community, or perhaps our nation or religion or even (problematically) our ethnic group. But at a certain scale, there is a sense that there’s not enough to go around to all the people who might want it.

This shows up on the right and the left. For the right, our country is “full,” any immigrants a threat to scarce resources and jobs. For the left, it is the world that is seen as full: more people necessarily is seen to mean more environmental damage.

In his book One Billion Americans, Matt Yglesias addresses both arguments, and addresses them thoroughly. In summary: Our country is not resource constrained, but constrained by willingness to use well-established urban planning and transit technology that exists throughout the world. The way out of environmental damage and climate change is not asceticism or population restriction, but technology.

By focusing around the provocative premise of an America with three times the population, both by increased birth rate (a scandal to liberals) and increased immigration (a scandal to conservatives), Matt Yglesias creates a framework he can jump off of to explore a variety of issues. To accomplish this audacious goal, many problems would have to be fixed in American society, politics, and economics, for the most part problems that we will have anyway, and that we will have to fix anyway, whether or not we have in mind the goal of tripling our population.

As a result, the book covers a variety of seemingly disjoint topics, from childcare and education to immigration to transit and urban planning. It therefore avoided the problem a lot of non-fiction books have: I genuinely feel this book is the correct length. Unlike many similar books, it could not have just been a blog post, but rather it would have been a blog series, that is to say, a full-length book.

All in all, a great read, and I am grateful to the friend who gave it to me this year for my birthday. I generally agree with the positions in it, and it provoked a lot of good thought.

Is Section 3 of the 14th Amendment Undemocratic?

2023-12-26T00:00:00+00:00

US politics continue to be interesting.

As many of you know, the Colorado Supreme Court has recently ruled that Donald Trump should be struck from the ballot in Colorado. Under Section 3 of the 14th Amendment to the US Constitution, if you’ve sworn to support the Constitution, and then engaged in (or “given aid or comfort to”) an insurrection, you are no longer eligible to serve in office. The Colorado Supreme Court applied this law to Trump, citing the Capitol attack of January 6, 2021.

This may be the first official ruling to agree that Trump is disqualified, but the theory has been discussed since the events of January 6, 2021. The theory gained more serious attention and respectability when it was endorsed by conservative legal scholars William Baude and Michael Stokes Paulsen in an explosive law review article, and now, it has finally manifested as an official decision in this ruling.

Opinions about the Opinion

There are a lot of criticisms of the ruling. As is often the case with complicated political issues, there aren’t just two “sides,” but a grab-bag of more nuanced opinions and observations.

Some people have procedural nitpicks, claiming that state courts don’t have the jurisdiction to evaluate such issues, or that Trump would have to actually be convicted of a crime for the section to apply, perhaps the crime called insurrection.¹

Others don’t think the riots on January 6 qualify as an insurrection at all, or they think that Trump didn’t “engage in” it, or they think that his participation is protected under First Amendment free speech rights². Still others simply think that Trump should win even if an insurrectionist, or that he was the legitimate President-Elect in 2021, or otherwise hold brazenly anti-Democratic views, both in the sense of hating the Democratic Party as well as democracy itself.

But the criticism that I find most interesting is the criticism that the court’s decision is undemocratic. We ought to let Trump run, this criticism goes, because in a democracy, it is better for the voters to decide that someone should not be President, rather than a court. Some of the people who think this also fall into another category: they think this decision is undemocratic and they think January 6 doesn’t reach the standard of the 14th Amendment, or hasn’t been adequately proven to, and that I find less interesting. But some people simply think that even if January 6 was an insurrection, and Trump engaged in it, he should still be on the ballot, and would still legally become President if elected.

That is to say, there are a large number of people who are uncomfortable not with the specifics of this ruling but rather with the fundamental premise of Section 3 of the 14th Amendment. These people might believe that the law has evolved away from what it says, or that it requires implementing legislation. Or, these people might believe that we simply should ignore this constitutional provision, or that we should never have added it to the constitution. In any case, these people believe that even a violent insurrection against the government, by a person who had specifically sworn not to do that, should not be a disqualification to run for office, or at least, to the extent that it is one, it should be a qualification decided on by the voters.

Narrowing the Question

In this post, I will not try to evaluate whether the events of January 6 count as an “insurrection.” I will not try to figure out whether the way the Colorado Court proceeded was legally correct, nor whether the Supreme Court will overturn it, nor whether it’s a good strategy for defeating Trump or will instead backfire.

Instead, I will think about the underlying theoretical question as if it were not so relevant to today’s news:

Is it undemocratic to disqualify from elections those who have participated in an insurrection?

To help separate this abstract question from the current news, let’s not imagine that this is about Trump. Let’s make up a new scenario in our minds, and let’s imagine instead that the insurrection in question was the Civil War – or some other insurrection that you, as a reader, can feel comfortable wholeheartedly opposing, in favor of explicit Communism or Nazism or racism or whatever other ideology most gets your goat.

Let’s further imagine that the candidate openly admits that they, in fact, did engage in insurrection. In fact, not only do they admit that the insurrection was an insurrection, but they say it was a justified one. They admit – or rather, they proudly announce – they’d do it again. They certainly won’t rule out doing it again if they lose.

But of course, you are not of the opinion that the insurrection was a justified one – you don’t want this person to win at all. And they’re running for President!

So here’s the question: Should this person be allowed to run or not? If you were designing a constitution for your dream country, would you allow the courts or some other mechanism to stop this person from running, or would you hope they simply lost at the polls?

All Qualifications Are Undemocratic

Any disqualification from office, of course, is undemocratic in a sense. A democratic election for President, in a pure sense, means that whoever gets the most votes for President must win. And if someone who people want to vote for isn’t an available option, well, that makes the election undemocratic.

Everything that detracts from the idealized, pure form of an election takes our country away from being a democracy. Things ranging from the electoral college, to the two-term Presidential term limit, to even the restriction that Presidential candidates must be at least 35 and natural-born citizens, can prevent the people from perfectly exercising their will through a Presidential election.

However, very few of the people calling this recent ruling “undemocratic” have any problem with preventing 30 year olds, or foreign-born Americans, from running for President. Even though this sort of discrimination based on age or national origin would be severely frowned upon in hiring³, we have simply gotten used to it. Perhaps, perhaps, one could argue that it is a better show of democracy’s power, a more rational system, to simply allow teenage Presidential candidates to be disqualified not by law but by the people’s collective decision, but no one in practice is interested in changing it. We’re simply used to it.

Ironically, these other qualifications, in my mind, strike me as more undemocratic. If everyone 30 and under had just started being oppressed, what President could genuinely sympathize? Is the natural-born citizen requirement today kept because of concern about national loyalties, or out of racism?

At least excluding insurrectionists has a logical pro-democratic angle. Committing an insurrection against our elected government is fundamentally anti-democratic, and so excluding those who have done so is a move to protect democracy. Unlike requiring people to be 35 or natural-born citizens, this provision has a claim to protecting democracy at the same time as it undermines it.

In other words, even though the 14th Amendment directly hurts democracy, by limiting who people can vote for, it indirectly protects it, by keeping people out of power who might do an insurrection and overthrow democracy. It’s a trade-off: By making this election slightly less democratic, it protects all future elections’ existence against the possible insurrectionist’s dictatorship.

The question is then how to evaluate this trade-off.

The Problem of Cheaters

If you want to find out who the fastest person is, have a race.⁴ If you want to find out who the best person is at chess, have a chess tournament. And if you want to find out who the most-supported⁵ person is for President, have an election.

Even if there is no cheating, this can be irregular and unreliable. Sometimes, a person is tired, or ate something that disagrees with them, and that makes them slow of foot. Sometimes, a person has a brain fart and blunders at chess. And sometimes in a modern media-fuelled election, someone says a random gaffe that loses them the election, but not their long-term genuine support, or the election happens right on the wrong news cycle, or their support is concentrated in the wrong specific states or demographics.

But cheating is a deeper threat. If the goal of a race is to find out who is the fastest runner without taking steroids, or (for example) hitching a ride for part of the race, then if someone does that, they will win even if they aren’t the fastest runner. If something isn’t done to prevent people from using steroids, then everyone will have to use drugs just to have a chance, which totally goes against the goal of finding out who the fastest runner is without taking steroids.

So, cheating is a threat to the very concept of a race. How do we stop cheating? Punishment is a viable option, making the behavior of cheating have bad consequences as a deterrent. The punishment of disqualification – from not just the race in which they cheated, but also future races – goes beyond deterrence, however. Not only does it increase the negative consequences of cheating (and therefore discourages it), it decreases the likelihood that someone wins by cheating.

It is true that a one-time cheater might legitimately also be the fastest person, and win a future race legitimately. But it’s also true that if they win a future race, they may well have done so by cheating. They’ve proven themselves willing to cheat. So, if we want to find out who the fastest non-cheater is, excluding past cheaters is a great way to prevent present cheating.

So, excluding past cheaters from a race can actually make the race more fair. Even though the past cheater might be legitimately the fastest person, they should be disqualified, because if they win, what confidence do we have that they won fairly?

Democracy as a Peaceful Replacement for War

The analogue to democracy is this: Someone willing to do an insurrection is likely to be unwilling to give up power in a peaceful manner, likely to use any power they gain to cheat on future elections, either by influencing them or by simply refusing to acknowledge and act on the results. This is true in general, and it is especially true if the original insurrection directly involves not accepting election results.

After all, the whole point of having a democracy is that we decide who’s in charge based on who has the most support, rather than by having a war about it every time. Everyone agrees that fighting with votes is better than fighting with guns, and as a result, we can have changes in government without mass death and destruction.

This is especially important given the violence and destruction of modern warfare. It is not a coincidence that World War I, far more deadly than any other war that Europe had experienced, also spelled the end of large absolute monarchies in Europe. Warfare has such an unacceptable cost that we’ve all collectively decided we’d rather risk our political enemies winning an election as in a democracy, than have a war every time we need the government to change, which is how monarchy often works in practice.

So, engaging in insurrection is even more undemocratic than other types of election cheating. The principle of democracy is vastly more important than any individual person or party winning, because the alternative is war and therefore mass death. If a candidate doesn’t agree, then that candidate is intrinsically undemocratic, to the point where excluding them is more democratic than allowing them to run.

The Downsides of the Ban

I know that people might disagree with these arguments. Banning insurrectionists from running has cons as well as pros. Who shall determine who has committed an insurrection? Will an anti-insurrection provision be abused for political purposes dishonestly, where something that is not an insurrection is called one for political gain?

Hopefully, the law would specify the procedures for this disqualification, and indicate who gets to decide. Hopefully, it would choose someone with enough distance from the political process to actually implement it.

But perhaps that isn’t enough. Perhaps the only way to have a democracy is for everyone to be eligible, and for it to be seen for everyone to be eligible. Everything more complicated is up for misinterpretation, and stokes distrust. Provisions written on paper do not necessarily accomplish their obvious goals. No amount of clarity of rules can counteract a dishonest referee, or convince a partisan that the referee is actually honest.

Rule of Law and the United States in Particular

So, which decision do we make here? Do we have a system for disqualifying insurrectionists, or not? More important than either decision is having a rule for it ahead of time. As a democracy, the way to determine this should be the same as any other determination we make about constitutional decisions. The rule of law is an important principle, so everyone knows what the rules are ahead of time (and knows what referees will be evaluating them). And so, when an insurrection happens, we should ideally follow the law to determine what to do, rather than having an ad hoc discussion then to determine how to handle the situation.

And now, I return from the abstract question to the particulars of the recent decision. The United States has already made this determination, in the 14th Amendment to the Constitution. Unfortunately, it is unclear how it is to be enforced; Congress has defaulted on its duty in Section 5 to “enforce, by appropriate legislation, the provisions of this article.” So, what is clear, is that insurrectionists (who have previously sworn oaths of office) are ineligible for office. What is unclear is the details of how this is accomplished.

To me, this means that the question of whether to ban insurrectionists from running for office is already decided. It is not undemocratic to do so, as it is a policy that has pros and cons for a democracy, but the rule of law breaks the tie and so we should follow the 14th Amendment. Questions remain about the details, but that, in our system, is what the courts are for.

So. I do understand (and disagree with) accusations that the Colorado Supreme Court decision is politically motivated. I also sympathize with the claim that Trump should be charged with the crime of insurrection in order for this case to qualify – perhaps that would be a more fair way to determine whether Trump’s behavior on January 6 qualifies as engaging in insurrection. But I thoroughly disagree with those who claim the decision is “undemocratic.” While there are ways in which the 14th Amendment is undemocratic, there are also ways in which the opposite policy is undemocratic. We have already, democratically, made the decision that this disqualification is part of the rules to our democracy. It is too late, for this case, to reconsider now whether that decision was wise.

Appendix: Text of the Section

No person shall be a Senator or Representative in Congress, or elector of President and Vice-President, or hold any office, civil or military, under the United States, or under any State, who, having previously taken an oath, as a member of Congress, or as an officer of the United States, or as a member of any State legislature, or as an executive or judicial officer of any State, to support the Constitution of the United States, shall have engaged in insurrection or rebellion against the same, or given aid or comfort to the enemies thereof. But Congress may by a vote of two-thirds of each House, remove such disability.

14th Amendment to the US Constitution, Section 3

Appendix: Footnotes

Of these, I think the most valid nitpick is that Trump should have to be convicted of the actual federal crime of insurrection for the amendment to be triggered, as that counts as the Congressional implementation of the amendment under Section 5. The least valid nitpick, in my view, is the zany notion that the Presidency is not an “office under the United States” or that the President does not swear to “support the Constitution” because he swears instead to “preserve, protect and defend” the Constitution. Laws are read as documents in natural languages like English, rather than read as computer programs. None of this matters in the slightest to the larger arc of this blog post, which asks, legal technicalities aside, whether the whole idea is fundamentally undemocratic. ↩︎
I feel obligated at this point to point out that since the 14th Amendment comes after the 1st Amendment, technically, free speech might not apply to its provisions, as the 14th Amendment comes more recently and therefore can override the 1st Amendment. ↩︎
Not to mention, they would be illegal under various discrimination laws, though in fairness these notions come after the Constitution was written. ↩︎
Though famously fallible, a race is still the best way to find out who is fastest. The swift won’t always win, because of time and chance, but they will more often than not. ↩︎
“Well-supported” in this sense means with some weighting given to total popularity, and some weighting given to being able to dominate in certain states. This is to say, we can mathematically define “well-supported” so as to make the electoral college make sense. Or we could not do that, and say the electoral college is undemocratic, which is a fair position. ↩︎

2023 in Retrospective and 2024 in Prospective

2023-12-23T00:00:00+00:00

Another year has gone by
And in response, I simply sigh
Another year has taken place
I guess I’ll handle it with grace?
Another year, the same old grind…
And yet I feel I’ve fallen behind

As you might know if you’ve read my equivalent post from last year, I am now 35 years old (and 3 days). If we consider “working years” to range from 20 to 65 – which seems a decent definition – then I am 1/3 of the way through them, 1/3 of the way through my career. So, theoretically, we should see my résumé at least triple in impressiveness by the time I retire!

Thinking about this year as 1/3 of the way to retirement is definitely less depressing and existentially terrifying than thinking of 35 as half-way to 70. I think it’s also more realistic. The type of processes I was undergoing from 0-20, the type of growth, the type of tasks, the level of (lack of) freedom, is so different, overall, from my adult life. Of course, 17 might be a better cut-off year, because that’s when I left home and went to college, but that kind of takes the spin out of 1/3, so I’ll keep it in terms of 20. And besides, 1/3 of the way through my career seems appropriate for this blog, as much as I talk about programming on it!

Like last year, I’d like to reflect on the previous year. I don’t have such a laundry list of achievements as I mentioned in that previous post, which is fair: I didn’t rebuild a life (kind of) from scratch with a different town to live in, housing situation, and medication (and therefore brain structure).

And indeed, that wasn’t my goal. Unlike 2022 where my theme was rebuilding, my theme for 2023 was growth. By “growth,” I meant an active settling in, a deepening or intensification of the new life I’d built. And I think I managed that. I spent time settling into the new life, getting more used to it, getting closer to the people around me, and solidifying it.

As for the blog, it’s not really growing, which is sad, but it is approximately holding steady, which is good. 34 posts this year (by also including this post) compared to 37 last year isn’t too bad:

$ ls | grep ^2 | cut -f1 -d- | uniq -c # Count posts per year
      3 2017
      1 2018
     17 2019
      5 2020
      3 2021
     37 2022
     33 2023

Alas, I have not transitioned my blog from mostly polemic to mostly educational. My most recent technical post, instead, very controversially criticized a well-established mechanic for organizing software complexity. But! I’ve also not let it fade away, in spite of having had a few curveballs thrown at me this year. And in the meantime, I’ve also done substantially more writing outside of the blog, which is not publicly available.

My goals for the blog remain approximately the same as last year. I’d like to do more educational content. I’d like to write more non-technical stuff. I have to say, the polemic technical content gets views and reactions and spark. That’s hard to beat, and the effort I put in explaining things for more educational content often gets the reaction of “yep, checks out, makes sense.” Perhaps I can find a decent balance somewhere – or find a way to keep the educational content more interesting. If at first you don’t succeed, as they say, try, try again.

In the past year I did get a new job, working for Amtrak. Several of my friends also got new jobs, two of them specifically becoming teachers. Job transitions are a lot, I can say from inside of one. This came along with (in my case) a transition from working from home to hybrid, and a commute that includes a driving component (for the first time in my life!), so that was a lot.

In my next year, I know what my theme will be, but I’m not entirely sure what the best word for it is. It will have something to do with being balanced about how I spend my time, and intentional about how I spend my emotional resources. Prioritized or focused might be it, but not “focused” on productivity or “prioritizing” my work and chores correctly, but a bit more general than that. Definitely, it has to do with being intentional about the most precious resources I have: my evenings and my weekends, so as to make sure I can connect with the people I care most about while also building in the types of activities I need to do and maintaining the parts of my life that need active maintenance.

I suppose the one-word theme will be this: balance. I will try to keep balanced, intentional, well-considered and well-prioritized about how I spend my time and emotional energy, rather than just dancing from plan to plan and idea to idea as they arise, and agreeing to things based on things like guilt or unexamined excitement or even just thoughtless and distracted accumulation of plans. (No, instead I shall overwhelm myself with curated and careful accumulation of plans!)

All in all, a very difficult theme perhaps for an ADHD-er, but perhaps for that very reason, an important one.

Rust Is Beyond Object-Oriented, Part 3: Inheritance

2023-12-07T00:00:00+00:00

In this next¹ post of my series explaining how Rust is better off without Object-Oriented Programming, I discuss the last and (in my opinion) the weirdest of OOP’s 3 traditional pillars.

It’s not encapsulation, a great idea which exists in some form in every modern programming language, though OOP does it oddly. It’s not polymorphism, also a great idea that OOP puts too many restrictions on, and that Rust borrows a better design for from Haskell (with syntax from C++).

No, it’s that third pillar, inheritance, that I am discussing today, that concept that only shows up in OOP circles, causing no end of problems for your code. Unlike encapsulation and polymorphism, Rust does not have any direct analogue.

Side note: In this series in general, but especially in this post, I am primarily discussing static OOP languages, like C++ and Java, where interfaces have to be explicit and where classes correspond to different static types. Much of what I write would have to be adapted to apply to more dynamic “duck-typing” styles of OOP like in Python or JavaScript (or Smalltalk), and won’t apply as directly. This series is about why Rust isn’t OOP, and Rust is closer to C++ or Java than to a dynamic language, so this bias makes sense in context.

Why do people like inheritance?

I can see why inheritance is so compelling. The entire system of education encourages us to categorize things into neat little hierarchies. Rectangles are a type of shape, and squares are a type of rectangle. Humans are a type of animal, and men and women are types of humans. Inheritance allows us to take this “X is a Y” and express it to a computer.

This “is a” relationship is seen as intuitive. As the entire point of OOP is to make programming more intuitive, more like reasoning about the real world, inheritance is a perfect match for it. Just like we reason about the real world with categories and subcategories, we can reason about the world of our program in a similar way.

And this allows us to feel smart when we read introductions to inheritance in various books on OOP programming. We see the Tiger class inherit from the Animal class, or the Rectangle class inherit from the Shape class.

We get so excited by the abstract principle of “is a” that we don’t even notice that the examples have nothing to do with programming. We don’t write code about shapes or animals. And even a drawing program or a zoo inventory app wouldn’t use inheritance like this! If inheritance was so useful as to be a pillar of OOP, why are there so few beginner examples that involve things programs actually do?

What do I mean by inheritance?

First, let me clarify what I mean by inheritance, or rather what I don’t mean.

I don’t mean every subtype-supertype relationship, where all values of one type are also included in another, broader type. Subtyping shows up in Rust all the time, particularly when it comes to lifetimes.
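For readers who have not met lifetime subtyping, here is a small sketch of the idea. The function names `longer` and `demo` are invented for this example. A `&'static str` can be passed wherever a reference with a shorter lifetime is expected, because the longer-lived reference type is a subtype of the shorter-lived one.

```rust
/// Both arguments must share a single lifetime `'a`.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

/// Hypothetical demo: mixes a `'static` string with a short-lived one.
fn demo() -> usize {
    let forever: &'static str = "lives for the whole program";
    let local = String::from("short-lived");
    // `&'static str` is a subtype of `&'a str` for any shorter `'a`, so
    // `forever` is accepted alongside a borrow of `local`.
    let winner = longer(forever, &local);
    winner.len()
}
```

No inheritance is involved here: the subtype relationship comes purely from one lifetime outliving another.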

I also don’t mean the version of inheritance that only involves implementing an interface. In C++, you implement dynamic interfaces through inheritance as a mechanism, even if the “superclass” is just a list of methods. In Java, inheritance and interface implementation are separate mechanisms. I am not talking about interface implementation as inheritance, even though it is technically considered the same feature in C++:

// This class has no fields, only virtual methods.
//
// In Java, we would call this an interface. In Rust, we would
// call this a trait.
class Shape {
public:
    virtual void draw(Surface &surface) const = 0;
};

// This is considered inheritance in C++. The Java equivalent
// would use `implements` instead of `extends`. And you could still
// do this in Rust with a trait.
class Square : public Shape {
    int size;
    int x;
    int y;
public:
    void draw(Surface &surface) const override;
};

I am only opposed to the type of inheritance that is still called inheritance in Java. Having a type implement an interface (a trait in Rust) is perfectly legitimate and still allowed in Rust, as is casting a reference to a value to a generic, “dynamic” value based on that trait or interface:

trait Shape {
    fn draw(&self, surface: &mut Surface);
}

struct Square {
    size: u32,
    x: u32,
    y: u32,
}

impl Shape for Square {
    fn draw(&self, surface: &mut Surface) {
        // Drawing code for the square goes here.
    }
}

// Assume square is Square, surface is Surface
let shape: &dyn Shape = &square;
shape.draw(&mut surface);

Shape, in this context, is a pure interface. It is only a structured form of polymorphism, not inheritance per se. Very importantly, Shape has no fields. It is defined based solely on what you can do with it. And accordingly, the “is a” language makes sense for interface implementation: Square is a Shape. A Shape has no state, though, just methods, just behaviors.

But some parent classes have fields. And that’s when inheritance really starts to have problems: when the “parent” class has fields. It is at this point that inheritance starts to seem really weird.

What does inheritance actually do?

In my article on encapsulation, I discussed how a class is secretly two things with the same name, entangled and conflated:

A record type (or what Rust would call a struct), that is, a type whose values consist of a number of fields with fixed names and types
A module (a collection of code with enforced encapsulation boundaries), containing that record type and a collection of functions (called “methods”) for interacting with it

Inheritance does something different with each of these concepts. To start out, let’s discuss what it does to the record type. We’ll continue using shapes, a classic example for discussing object-oriented features. A circle is a shape, so we can use inheritance here:

class Shape {
public:
    Color color;
};

class Point {
public:
    int x;
    int y;
};

class Circle : public Shape {
public:
    Point center;
    int radius;
};

So, what does this mean for Circle? Well, it means that all the fields of Shape (namely, color) are also fields of Circle. Therefore, references to Circle can be made into references to Shape, as everything you can do with a shape, you can do with a circle, like set the color, or get the color:

Circle circle;
Shape &shape = circle;
shape.color = Color::Blue;
assert(circle.color == Color::Blue);

The thing is, we already have a mechanism for taking all the fields of struct A and putting them in struct B: by putting a field of type A into struct B! Instead of inheritance’s “is a,” we can accomplish the same thing by having a field, or “has a.” In our example, we can do the exact same thing with Point that we did with Shape – it just involves being a little more explicit about what’s going on:

Circle circle;
Point &point = circle.center;
point.x = 3;
assert(circle.center.x == 3);

So, what does inheritance do to the classes from the record type perspective? It makes the parent class a field of the child class, just a field with no name. By writing:

class Circle : public Shape {
    // ...

… from a record type perspective, we were writing syntactic sugar for:

class Circle {
public:
    Shape shape;
    // ...

And when we wrote:

Shape &shape = circle;

That was translated into something like:

Shape &shape = circle.shape;

“Is a,” from a record type point of view, is just syntactic sugar for “has a.” If you want to do something similar in Rust, just make a has-a relationship, rather than creating an implicit field with no name. Rust doesn’t like implicit nameless things anyway.

This will also save on arguing about whether two types have an “is a” or a “has a” relationship. I regret all the time I’ve spent splitting hairs about that distinction, when really, it’s just a matter of whether we want a field to be implicit or not.

OK, so that covers what inheritance does to the record types, but what about the rest of the class, the module? What happens to the methods?

Well, for non-virtual methods, it’s also straight-forward. Instead of doing inheritance, you can still just use has-a instead, and do a field access. Instead of calling, say, circle.get_color(), we could always call circle.shape.get_color().
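For comparison, here is a minimal Rust sketch of the has-a version, covering both the field access and the non-virtual method call. The `Color` variants, the `get_color` body, and the `make_circle` helper are assumptions made for illustration; they are not taken from the C++ above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Blue,
}

struct Shape {
    color: Color,
}

impl Shape {
    // An ordinary, statically dispatched ("non-virtual") method.
    fn get_color(&self) -> Color {
        self.color
    }
}

struct Point {
    x: i32,
    y: i32,
}

struct Circle {
    // The field that inheritance would have added implicitly, now with a name.
    shape: Shape,
    center: Point,
    radius: i32,
}

// Hypothetical helper, only here to keep the example short.
fn make_circle() -> Circle {
    Circle {
        shape: Shape { color: Color::Red },
        center: Point { x: 0, y: 0 },
        radius: 1,
    }
}

fn main() {
    let mut circle = make_circle();

    // The explicit spelling of `Shape &shape = circle;`
    let shape: &mut Shape = &mut circle.shape;
    shape.color = Color::Blue;

    // The explicit spelling of `circle.get_color()`
    assert_eq!(circle.shape.get_color(), Color::Blue);
    println!(
        "{:?} circle at ({}, {}) with radius {}",
        circle.shape.get_color(),
        circle.center.x,
        circle.center.y,
        circle.radius
    );
}
```

The only cost compared to the inheritance version is the extra `.shape` at each use site.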

So far, with the fields and non-virtual methods, inheritance just seems a bit weird and overrated. Like, we don’t see any reason yet why a programming language would want to support it, when just having a field of a superclass type does everything. But on the other hand, some people like implicit fields and convenient short-hands, so there’s not much of a downside either.

Inheritance without virtual methods may seem harmless, but it doesn’t have much to do with the concept of “is a.” Technically, you can use a field access as an implicit conversion, and think of it as a subtyping relationship, but it doesn’t actually correspond to how the world works. Even in the world of shapes, it doesn’t make sense: if a square is a rectangle, how come it has less state than a rectangle, with only one field for side length instead of two for width and height?

But we’ve not yet talked about virtual methods. When we do, you will see why I think inheritance is not just an unnecessary feature, but an ill-conceived anti-feature.

But what about the virtual methods?

So, earlier we discussed a class as being two things, a record type (with fields) and a module (with methods and visibility restrictions). But once we consider virtual methods, a class is actually three things with the same name:

A record type: each object has the fields
A module: the type, trait, and other methods, are all in an encapsulated module
A trait or interface: the virtual methods form an interface

Side note: some programming languages consider all methods to be virtual for some reason. For these programming languages, everything I say still applies, but all methods are in the trait as they’re all virtual.

Given that most methods aren’t self-consciously written with the intent to be virtual, making methods implicitly virtual seems like a good way to set the programmer up for surprise – that is, a horrible idea. But nevertheless having all virtual methods was for a long time considered the more ideological, more purely OOP way to do things, and so languages which strove to be purely OOP (like the original Java) did it.

Up until now, we have ignored this additional conflation, this additional role that a class plays. In discussing encapsulation, we were discussing simply how classes conflate the two distinct concepts of record types and modules. In discussing polymorphism, we were assuming interfaces, and discussing how OOP’s version of interfaces was constrained by insisting on a specific dynamic implementation. Only now, now that we discuss inheritance, do we see that OOP not only conflates record types and modules, but it also conflates record types and interfaces.

When a class has virtual functions, that constitutes an interface, implemented by dynamic polymorphism. But the only way you are allowed to implement the interface is by inheriting from the class – that is, by also having a (secret, unnamed, implicit) field of the record type.

See, as discussed above, inheriting from a class without virtual methods, a class with just fields and regular methods, is no biggie. It’s just a weird way of writing a has-a relationship that comes with some syntactic sugar and automatic conversions – things I’m not a fan of and wouldn’t put in my programming language, but not that bad.

Similarly, inheriting from a class without fields, a class with just virtual methods (and perhaps regular methods, it turns out they barely matter) is also no biggie. It has all the downsides of OOP-style polymorphism, but is fundamentally just a way to indicate that you’re implementing an interface. In languages like C++, inheritance is the mechanism by which you implement interfaces, and in languages like Java, a methods-only class should probably be an interface.

(To round out all the possibilities, I will mention that a class with neither virtual methods nor fields is just a traditional module.)

But if you have both fields and virtual methods, then you have true OOP-style inheritance, with all of its problems. You have an interface that you can only implement if you inherit from the class. If you did not intend this, perhaps because you are writing in a language like Java where allowing inheritance is the default for classes and virtual is the default for methods, you are setting yourself up for surprises when someone inherits from your class and starts overriding methods.

If you did intend this, however, why? Why make implementing an interface contingent on having certain state, on having a special unnamed field? Why conflate these two fundamentally different concepts of containing another record type’s state and having the new record implement an interface?

There are a number of problems with this conflation. Why would we assume that in order to implement the methods, you need that state? What if that state is represented differently, like on a disk, or over a network, or computed from a formula? This conflation of implementation and interface means that there is no sane way to implement proxy objects.
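To make the proxy point concrete, here is a small Rust sketch in which the interface carries no state. That lets a proxy implement it without dragging along fields it does not need. The `area` method and the `Rectangle` and `Scaled` types are hypothetical names chosen for this illustration.

```rust
// The interface: behavior only, no fields.
trait Shape {
    fn area(&self) -> f64;
}

// One implementor stores its state directly.
struct Rectangle {
    width: f64,
    height: f64,
}

impl Shape for Rectangle {
    fn area(&self) -> f64 {
        self.width * self.height
    }
}

// A proxy: it has no width or height of its own. It forwards to another
// shape and adjusts the answer by a formula.
struct Scaled<'a> {
    inner: &'a dyn Shape,
    factor: f64,
}

impl Shape for Scaled<'_> {
    fn area(&self) -> f64 {
        self.inner.area() * self.factor * self.factor
    }
}

fn main() {
    let rectangle = Rectangle { width: 2.0, height: 3.0 };
    let doubled = Scaled { inner: &rectangle, factor: 2.0 };
    println!("{} {}", rectangle.area(), doubled.area());
}
```

With a base class that had `width` and `height` fields, `Scaled` would be forced to carry two unused numbers just to be allowed to implement the interface.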

But more importantly than that, I’m not entirely sure what the upside of this conflation is. It seems to make programming simpler in one particular scenario, a scenario that I rarely see come up in real life, a scenario that frankly seems like a code smell.

So what can we do instead?

There is no inheritance in Rust. There are no fields in traits. There is simply no way of saying that in order to implement a trait, your type must have certain fields. Rather than conflate the concepts of record types, modules, and traits in this God-concept of “class,” Rust keeps these three concepts quite separate.

So if we have a design that requires inheritance (either because we think in OOP or because we’re translating from an OOP programming language), how would we represent that in Rust?

Well, the most straight-forward way would be to separate out the different parts of the base class. Such a refactor would allow us to express our design in Rust, as literally as possible. This is just meant as a starting point, a proof of concept that our design can survive in a language without inheritance. Alternative, often better ways of replacing inheritance will follow subsequently.

But here’s the straight-forward method: If the base class has just fields, or just virtual methods, that’s easy: it becomes a struct or a trait, respectively. Instead of inheriting from the class, a type would have that struct as a field, or implement that trait. Actually, in this case, the straight-forward method might just be perfect – you weren’t actually using inheritance per se, just an odd syntax for a field or for implementing an interface.

If it has both, we’d have to extract both a struct and a trait. The fields would become a struct, of its own type. The interface of the virtual methods would become a trait. The implementation of the virtual methods would become the implementation of that trait for that struct, or provided methods on the trait, depending on what makes more sense. Any non-virtual methods would then become methods of the struct or provided methods on the trait, again depending on what makes more sense in context.

At this point, it might make sense to consider some of the alternatives that Rust provides to run-time polymorphism, as discussed in the polymorphism post. Is a trait, especially an OOP-style, object-safe trait, really what we want here? We’ve opened up alternative designs now, and perhaps one of the alternatives makes more sense.

Assuming we do want a trait, we can then go to all the “child” classes and make them implement the trait. They also get a new field, perhaps named super, to contain the parent (in actual Rust code it needs a different name, such as parent or base, because super is a reserved keyword). Their trait implementations would then do a mix of implementing new methods, calling the same method on super, and defaulting to the provided method.

And again, at this point it would be appropriate to consider whether we even need the super field, or if perhaps we can get away with not having it.

After this transformation, we have valid Rust code out of our inheritance-based OOP-style design pattern. But there’s nothing requiring us to use Rust to do it: you could do the same refactor of inheritance structures in an OOP language.

If we were to do this transformation, we’ve paid a small cost of having to potentially write .super (or whatever name we’ve given the parent field) every once in a while, as well as writing trait implementations that forward some method calls to the super field. In return, we’ve deconflated the two very different concepts of interface and fields, and opened ourselves up to more possibilities.
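As a rough sketch of what this literal translation could look like, assume a hypothetical base class `Widget` with two fields, one non-virtual method, and one virtual method. All names here (`WidgetBase`, `Widget`, `Button`, `Spacer`, `new_base`) are made up for the illustration, and the parent field is called `parent` because `super` is a reserved keyword in Rust.

```rust
// The fields of the "parent class" become a plain struct.
struct WidgetBase {
    name: String,
    clicks: u32,
}

impl WidgetBase {
    // Formerly a non-virtual method of the base class.
    fn record_click(&mut self) {
        self.clicks += 1;
    }
}

// The virtual methods become a trait.
trait Widget {
    // Gives provided methods access to the parent state.
    fn base(&self) -> &WidgetBase;

    // Formerly a virtual method with a base-class implementation.
    fn describe(&self) -> String {
        format!("widget {}", self.base().name)
    }
}

// The base class's own implementation of the interface.
impl Widget for WidgetBase {
    fn base(&self) -> &WidgetBase {
        self
    }
}

struct Button {
    parent: WidgetBase,
    label: String,
}

impl Widget for Button {
    fn base(&self) -> &WidgetBase {
        &self.parent
    }

    // An "override" that calls the parent's version and extends it.
    fn describe(&self) -> String {
        format!("{} (button \"{}\")", self.parent.describe(), self.label)
    }
}

struct Spacer {
    parent: WidgetBase,
}

// No override: `describe` falls back to the provided method.
impl Widget for Spacer {
    fn base(&self) -> &WidgetBase {
        &self.parent
    }
}

fn new_base(name: &str) -> WidgetBase {
    WidgetBase { name: name.to_string(), clicks: 0 }
}

fn main() {
    let mut button = Button { parent: new_base("ok"), label: "OK".to_string() };
    button.parent.record_click();

    let widgets: Vec<Box<dyn Widget>> = vec![
        Box::new(button),
        Box::new(Spacer { parent: new_base("gap") }),
    ];
    for widget in &widgets {
        println!("{} ({} clicks)", widget.describe(), widget.base().clicks);
    }
}
```

The `base` accessor is one possible way to let provided methods reach the shared state. Whether it is needed at all is exactly the kind of question the transformation invites.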

What should I actually do in Rust instead of inheritance?

But notice that in discussing this transformation, I encouraged you to consider alternatives at two points. Rarely does this transformation make sense literally, which is to say, rarely does a literal translation of inheritance into Rust make sense. I find this quite telling, as it implies to me that inheritance itself only rarely makes sense – and indeed, I only tend to use inheritance in OOP languages where a framework requires me to, or as an ersatz² replacement of sum types (i.e. Rust enum).

Here are some other patterns that replace inheritance hierarchies, that you might find yourself considering instead:

A regular enum. This actually covers most situations for me. Methods that would be overridden just do a match on the enum contents, and methods that would not, do not.
struct types that contain a field of an enum type. The enum type represents all the different options, but the struct type contains the fields that are always the same.

struct MessageHeader {
    source: Address,
    destination: Address,
    seqnum: u32,
}

enum MessageBody {
    Ping(PingMessage),
    Pong(PongMessage),
    Request(RequestMessage),
    Response(ResponseMessage),
}

struct Message {
    header: MessageHeader,
    body: MessageBody,
}

Isn’t this so much nicer than putting source, destination, and seqnum in the base class?
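Here is a sketch of how methods might look on such a type. A method that would have been overridden matches on the body, while a method that would have lived in the base class only touches the header. The `Address` alias, the fields of the message structs, and the two method names are assumptions made for the example.

```rust
type Address = u32; // stand-in for a real address type

struct PingMessage;
struct PongMessage;
struct RequestMessage {
    path: String,
}
struct ResponseMessage {
    status: u16,
}

struct MessageHeader {
    source: Address,
    destination: Address,
    seqnum: u32,
}

enum MessageBody {
    Ping(PingMessage),
    Pong(PongMessage),
    Request(RequestMessage),
    Response(ResponseMessage),
}

struct Message {
    header: MessageHeader,
    body: MessageBody,
}

impl Message {
    // Would have been a virtual method: behavior differs per variant.
    fn describe(&self) -> String {
        match &self.body {
            MessageBody::Ping(_) => "ping".to_string(),
            MessageBody::Pong(_) => "pong".to_string(),
            MessageBody::Request(request) => format!("request for {}", request.path),
            MessageBody::Response(response) => {
                format!("response with status {}", response.status)
            }
        }
    }

    // Would have been a base-class method: it only needs the shared fields.
    fn reply_header(&self) -> MessageHeader {
        MessageHeader {
            source: self.header.destination,
            destination: self.header.source,
            seqnum: self.header.seqnum + 1,
        }
    }
}

fn main() {
    let message = Message {
        header: MessageHeader { source: 1, destination: 2, seqnum: 7 },
        body: MessageBody::Request(RequestMessage { path: "/index".to_string() }),
    };
    let reply = message.reply_header();
    println!("{} -> reply from {} seq {}", message.describe(), reply.source, reply.seqnum);
}
```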

enum variants that themselves contain enum types.

enum Message {
    Client(ClientMessage),
    Server(ServerMessage),
}

enum ClientMessage {
    Ping(PingMessage),
    Request(RequestMessage),
}

enum ServerMessage {
    Pong(PongMessage),
    Response(ResponseMessage),
    Error(ErrorMessage),
}

Now, if you want any message, your type is Message. If you know for sure you have a client message, you can say ClientMessage. Or if you know for sure it’s specifically a ping, you can say PingMessage. It’s like a class hierarchy!
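A short sketch of how this plays out in function signatures. The unit structs and the helper functions (`needs_reply`, `sender`, `is_ping`) are hypothetical, and the `From` implementation is one possible way to get something like an upcast.

```rust
struct PingMessage;
struct PongMessage;
struct RequestMessage;
struct ResponseMessage;
struct ErrorMessage;

enum Message {
    Client(ClientMessage),
    Server(ServerMessage),
}

enum ClientMessage {
    Ping(PingMessage),
    Request(RequestMessage),
}

enum ServerMessage {
    Pong(PongMessage),
    Response(ResponseMessage),
    Error(ErrorMessage),
}

// Roughly an "upcast" from the narrower type to the wider one.
impl From<ClientMessage> for Message {
    fn from(message: ClientMessage) -> Self {
        Message::Client(message)
    }
}

// Accepts only client messages; passing a server message is a type error.
fn needs_reply(message: &ClientMessage) -> bool {
    match message {
        ClientMessage::Ping(_) | ClientMessage::Request(_) => true,
    }
}

// Accepts any message and looks only at the first level.
fn sender(message: &Message) -> &'static str {
    match message {
        Message::Client(_) => "client",
        Message::Server(_) => "server",
    }
}

// Nested patterns reach through both levels at once.
fn is_ping(message: &Message) -> bool {
    matches!(message, Message::Client(ClientMessage::Ping(_)))
}

fn main() {
    let ping = ClientMessage::Ping(PingMessage);
    println!("needs reply: {}", needs_reply(&ping));

    let message: Message = ping.into();
    println!("from {} (ping: {})", sender(&message), is_ping(&message));
}
```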

A struct with a template-parameterized member to set a policy.

This is perhaps the most sophisticated replacement. Imagine you have a class SocketHandler that handles reading from a socket. Imagine it looks like this:

class SocketHandler {
    CircularBuffer socket_data;
public:
    void data_available(int fd);
protected:
    virtual size_t message_size(const char *data, size_t size) = 0;
    virtual void process_message(const char *data, size_t size) = 0;
};

How this is going to work is, data_available is going to grab more and more data from the socket fd until message_size returns a non-zero value. Then, it’ll call process_message with that data. During this time, it’ll store the data in socket_data. All of that work is being done by data_available, in the parent class, and you can imagine that the socket dispatching library has a collection of these socket handlers, something like std::vector<std::unique_ptr<SocketHandler>> (or perhaps a map indexed by file descriptor).

The child class is responsible for overriding message_size and process_message to actually interpret incoming data for a specific protocol. You’d have a child class for each SocketHandler protocol, and it would include internal state like sequence numbers, etc.

But rather than have these methods overridden by a child class, the right way to do it is to have just those methods in a trait that a SocketHandler has. You can see this when you extract the implicit trait for SocketHandler for the Rust version:

trait SocketProtocol {
    fn message_size(&self, data: &[u8]) -> usize;
    fn process_message(&mut self, data: &[u8]) -> Result<()>;
}

struct SocketHandler<P: SocketProtocol> {
    buffer: CircularBuffer,
    protocol: P,
}

trait SocketHandlerTrait {
    fn data_available(&mut self, fd: u32) -> Result<()>;
}

impl<P: SocketProtocol> SocketHandlerTrait for SocketHandler<P> {
    fn data_available(&mut self, fd: u32) -> Result<()> {
        // Read from `fd` into `self.buffer`, then call
        // `self.protocol.message_size` / `self.protocol.process_message`.
        todo!()
    }
}

So, rather than each socket protocol inheriting from socket handler, with its common state, the socket handler has a socket protocol, as a policy. The SocketProtocol trait here can then be a compile-time, static trait and SocketHandlerTrait can be the object-safe, dynamic one, and the std::vector<std::unique_ptr<SocketHandler>> can be replaced with Vec<Box<dyn SocketHandlerTrait>>.

This last refactor can be generalized. Instead of inheriting from a base class to implement specific functionality, inject that functionality using policies³, and parameterize the struct with members that implement policy traits. Then, if need be (and need might not be) write a separate dynamic trait for the overall struct.
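To show the whole pattern running end to end, here is a simplified sketch under several assumptions. Incoming bytes are passed in directly instead of being read from a file descriptor, a `Vec<u8>` stands in for `CircularBuffer`, errors are ignored, and `LineProtocol` is a made-up protocol that treats each newline-terminated line as a message.

```rust
// The policy: what differs between protocols.
trait SocketProtocol {
    /// Size of the first complete message in `data`, or 0 if more data is needed.
    fn message_size(&self, data: &[u8]) -> usize;
    fn process_message(&mut self, data: &[u8]);
}

// The common machinery, parameterized by the policy.
struct SocketHandler<P: SocketProtocol> {
    buffer: Vec<u8>, // stand-in for CircularBuffer
    protocol: P,
}

// The dynamic, object-safe interface for the dispatching library.
trait SocketHandlerTrait {
    fn data_available(&mut self, incoming: &[u8]);
}

impl<P: SocketProtocol> SocketHandlerTrait for SocketHandler<P> {
    fn data_available(&mut self, incoming: &[u8]) {
        self.buffer.extend_from_slice(incoming);
        // Hand over every complete message currently in the buffer.
        loop {
            let size = self.protocol.message_size(&self.buffer);
            if size == 0 {
                break;
            }
            let message: Vec<u8> = self.buffer.drain(..size).collect();
            self.protocol.process_message(&message);
        }
    }
}

// Hypothetical protocol: one message per newline-terminated line.
struct LineProtocol {
    lines: Vec<String>,
}

impl SocketProtocol for LineProtocol {
    fn message_size(&self, data: &[u8]) -> usize {
        match data.iter().position(|&byte| byte == b'\n') {
            Some(index) => index + 1,
            None => 0,
        }
    }

    fn process_message(&mut self, data: &[u8]) {
        let text = String::from_utf8_lossy(data);
        self.lines.push(text.trim_end().to_string());
    }
}

fn main() {
    let handler = SocketHandler { buffer: Vec::new(), protocol: LineProtocol { lines: Vec::new() } };
    // The dispatcher only sees the dynamic trait.
    let mut handlers: Vec<Box<dyn SocketHandlerTrait>> = vec![Box::new(handler)];
    for handler in handlers.iter_mut() {
        handler.data_available(b"hello\nwor");
        handler.data_available(b"ld\n");
    }
}
```

The protocol state (here, `lines`) lives in the policy, and the buffering state lives in the handler, so neither needs to know how the other is laid out.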

¹ I know my last post hasn’t been since February. I’ve been procrastinating this one for a long time, mostly because my life has been so gosh-darn busy, and also mostly because I don’t really instinctively remember what I (or anyone else) really liked about inheritance to begin with.
² Isn’t it weird that ersatz means replacement in German, but means mediocre as a replacement in English, so that “ersatz replacement” doesn’t mean “replacement replacement” but “mediocre replacement”? Or am I using the English word wrong?
³ Policies are known in Gang of Four terminology as strategies. I’ve touched on the policy pattern in some previous posts, and at some point should write a full post about it, as policies are my favorite thing.

Are You Sure? (Revised)

2023-10-24T00:00:00+00:00

This is a revision of a flash fiction piece first posted in 2018.

After a year of talking, and another year of planning, the project was complete. Mothers Against Drunk Driving, the local clergy, and the town council had finally done it: Right in the town square, they installed a giant loudspeaker. From thenceforth, every two minutes, a booming voice would spread all over town, announcing:

ARE YOU SURE?

Foolhardy decisions, they had decreed, would soon be a thing of the past.

The locals seemed to adapt pretty readily. Sales of noise-canceling headphones boomed for a bit, and people’s sleeping habits were surprisingly unaffected – who notices slightly inferior sleep? And drunk driving statistics were immediately better, which the local paper celebrated triumphantly.

The clergy were the first to notice the downsides. Weddings were being canceled during the vows a full 25% of the time – brides and grooms would take back their “I do”s in response to the booming speaker of skepticism. Adult baptisms were fully cut in half. Divorces, on the other hand, were also cut in half – though some of the rescued marriages maybe shouldn’t have been.

At a town council meeting, one of the proponents of the loudspeaker said, confidently, “This is a good idea,” only to cringe when the timing worked out that in the next second, the entire room boomed:

ARE YOU SURE?

No one was starting new relationships – and no one was exiting them either. New job postings languished unfilled, as both candidate and interviewer expressed their doubt. Slowly, but surely, the social and economic life of the town started to grind to a halt, as it became the norm to cancel even casual plans like going out for a drink (and certainly having another once there), or going to church on Sunday…or work or school on Monday.

Over the days and months and years, the town developed a culture of its own. Only necessities were bought, and only emergencies were handled. Trash piled up on the streets as no one collected it, and then stopped piling up as no one threw anything out. No one remembered what life was like before, and fewer and fewer visitors passed through to challenge it.

It wasn’t just the loudspeaker: people repeated its eternal mantra to each other, having had it etched into their dreams. “We should take down the loudspeaker,” said an occasional rebellious teen, only to hear all their friends in unison say back, “Are you sure?”, echoed, a moment later, by the loudspeaker itself:

ARE YOU SURE?

Eventually the loudspeaker broke. The mayor told his deputy to fix it, but all the deputy could do was respond, “Are you sure?” The rest of the town council waited for the booming voice to agree, but even without the voice, no one was bold enough to fix the loudspeaker. It wasn’t enough of an emergency, and besides, it felt like hubris or even blasphemy to presume to be able to fix what seemed like such a fundamental part of their world.

Without the loudspeaker, slowly but surely, the town returned to normal. A year later, people would only say “Are you sure” as a joke – one that many found vulgar and tasteless. And today, the teenagers wonder why, in the middle of the town square, there is a looming hulk of a loudspeaker system, never turned on or used, never cleaned up or put away. And of course, it is now only the old who endlessly repeat what was once a mantra, as their adult children shake their heads.

Endianness, and why I don't like htons(3) and friends

2023-10-19T00:00:00+00:00

Endianness is a long-standing headache for many a computer science student, and a thorn in the side of practitioners. I have already written some about it in a different context. Today, I’d like to talk more about how to deal with endianness in programming languages and APIs, especially how to deal with it in a principled, type-safe way.

Before we get to that, I want to make some preliminary clarifications about endianness, which will help inform our API design.

Why Little Endian Bugs Us

New students often are more confused by little endian (where the least-significant component of an integer is stored first), and until they are told about it, they tend to assume computers are big endian (where the most-significant component is stored first) even if they don’t know that word. This is due primarily to the fact that big endian is what they’re used to: We write numbers with the most significant digit on the left, and in languages that write from left to right (including English, the lingua franca of programming among other things), this means that we live our day to day lives in big endian. But that doesn’t mean that big endian is more logical in any way, just that it is more conventional.

This isn’t helped by the fact that many learners first encounter little endian in a confusing way that makes them do extra cognitive work: by reading little endian numbers from a hex dump. Take, for example, this code, which displays a 32-bit number in hexadecimal, and then displays the individual bytes of the same number as a hex dump:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);

This results in this befuddling output:

12345678
78 56 34 12

When read as a number, we can just read the number normally. However, when read as a series of bytes, we find ourselves having to read the number from right to left to read the number as big endian, as we are accustomed to doing. We can’t even just read backwards, however, as each byte is still printed internally according to our big endian convention: the higher-order hex digit is still printed first, followed by the lower-order hex digit.

The problem here isn’t little endian. The problem is that the printing functionality accommodates our big endian preference in printing, but only at the level of printing an individual number, either as a byte or as a 32-bit word. The word printed as a whole is printed big endian, to accommodate us. The individual bytes are also printed big endian, to accommodate us. However, the hex dump as a whole is printed with the lower addresses on the left, and the higher addresses on the right, to similarly accommodate our expectation that lower-indexed memory, memory that comes earlier, should be on the left. On a little endian system, this desire to print each number with the most significant digit on the left, but to print a sequence of numbers from left to right, leads to a contradiction. The resulting last line, 78 56 34 12, isn’t, properly speaking, little endian. The print-out is an odd type of mixed endian, due to our awkward conventions.

There is actually a relatively easy fix: if we insist on reading numbers with the most significant digit on the left (which we do), and the computer insists on storing less significant components first (which it does), these two desires can be reconciled by printing the hex dump from right to left:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[3], bytes[2], bytes[1], bytes[0]);

This results in a much cleaner print-out:

12345678
12 34 56 78

This should make clear that the weirdness of little endian is entirely due to our preference for big endian, and our preference for listing the lower-indexed values to the left, and how these preferences interact. It is because of human conventions, not because of any intrinsic problem with little endian. I would argue that, on little endian systems, all hex dumps should be right to left, and that would help, but there is little I can do to change the convention.
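The same experiment can be sketched in Rust. To keep the output independent of the machine it runs on, this version asks for the little endian bytes explicitly with `to_le_bytes` rather than copying the native representation. The two helper function names are made up for the example.

```rust
// Conventional hex dump: lowest index on the left.
fn hex_dump_left_to_right(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|byte| format!("{:02X}", byte))
        .collect::<Vec<_>>()
        .join(" ")
}

// Proposed hex dump for little endian data: lowest index on the right.
fn hex_dump_right_to_left(bytes: &[u8]) -> String {
    bytes
        .iter()
        .rev()
        .map(|byte| format!("{:02X}", byte))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let number: u32 = 0x12345678;
    let bytes = number.to_le_bytes();

    println!("{:08X}", number);
    println!("{}", hex_dump_left_to_right(&bytes)); // 78 56 34 12
    println!("{}", hex_dump_right_to_left(&bytes)); // 12 34 56 78
}
```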

Now, almost all modern systems are little endian, either because they are typically configured that way for processors that support either endianness, or because they only support little endian, like Intel processors. The few programmers who have to write code for big endian systems find themselves in the minority, and find themselves doing extra work to deal with other code that no longer accommodates big endianness.

There is one big exception to this: the Internet. All of the Internet protocols are designed to use big endian ordering, known in this context as “network byte ordering.” This is because when the Internet protocols were developed, big endian was a viable rival to little endian, and both byte orders were common.

This does make some sense, as well, because hex dumps of packets are very common, and big endian does make those hex dumps easier to read and reckon with for us big endian humans.

When Endianness Comes In

I would also like to clarify something about how endianness works. A 32-bit word in a register in the processor is neither big endian nor little endian. The processor needs to be designed knowing which bits are more significant, and which are less, but there is no intrinsic way in which the less significant bits come “first.” In a word-based memory system, where only entire words were stored in memory (like the PDP-7 was with its 18-bit words), and where it was impossible to address memory in terms of individual bytes, this would be the end of it.

As an example of this, see the documentation for std::endian on CppReference.com:

If all scalar types have sizeof equal to 1, endianness does not matter and all three values, std::endian::little, std::endian::big, and std::endian::native are the same.

However, once we come up with the idea that memory is made up of bytes, the endianness question arises: How do we split this 32-bit number into bytes? Which end of it should be byte 0, and which end byte 3? Similarly, if we read a series of bytes into memory, where should the first byte (by memory address) go in the register, the most significant (big) end, or the least significant (little) end?

As a result, types like uint32_t (and uint16_t and uint64_t) have no intrinsic endianness, so long as they are stored in registers. Only if they are written to memory, or read from memory, does their endianness matter. And then, it only matters if the actual byte representation is important – if we, as in the code above, use memcpy to copy their representation, byte by byte, into an array of bytes.

In general, even when the byte representation does matter, I would argue that uint32_t should be treated as an abstract 32-bit value, devoid of endianness. Only when it is transcribed as a series of bytes should endianness be taken into account – and then the transcription should instead have the type of uint8_t[4] in C (or std::array<uint8_t, 4> in C++ or [u8; 4] in Rust).
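In Rust, this split between the abstract value and its byte transcription can be sketched with the standard `to_be_bytes` and `from_be_bytes` methods. The wrapper function names `serialize_be` and `deserialize_be` are assumptions for the example.

```rust
// The abstract value goes in, a specific byte sequence comes out.
fn serialize_be(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

// A specific byte sequence goes in, the abstract value comes out.
fn deserialize_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

fn main() {
    let value: u32 = 0x12345678;

    // Endianness only appears once we ask for bytes.
    assert_eq!(serialize_be(value), [0x12, 0x34, 0x56, 0x78]);
    assert_eq!(value.to_le_bytes(), [0x78, 0x56, 0x34, 0x12]);

    // The native order is whichever of the two the hardware uses.
    let native = value.to_ne_bytes();
    assert!(native == value.to_le_bytes() || native == value.to_be_bytes());

    // Round trip back to the abstract value.
    assert_eq!(deserialize_be(serialize_be(value)), value);
    println!("{:02X?}", serialize_be(value));
}
```

Arithmetic and comparisons stay on the `u32` side, and the `[u8; 4]` side offers neither, which is the separation argued for above.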

The Main Argument: Why I dislike `htons` and friends

In C, however, we do not in fact do this. We instead have functions like htons, with this signature:

uint16_t htons(uint16_t hostshort);

uint16_t http_port = htons(80);

This function purports to convert a 16-bit number from host endianness (typically little) to network endianness (always big). Assuming a little endian computer, it does a byteswap: It swaps the less significant 8 bits with the more significant 8 bits in the register used to return the uint16_t.

So what are the properties of the returned uint16_t? If we pass in, for example, 80 (the HTTP port), then http_port, the new uint16_t, is 20480 – because 80 is 0x0050 in hex, and we’ve swapped the two bytes, so we now have 0x5000. What is this number?

It is not, to be clear, a uint16_t value 80 that is now in “big endian,” though we might say that as a manner of speaking. It is almost certainly in a register, and as mentioned before, registers don’t have intrinsic endianness. It is something far more awkward: It is a value that, if we were to store it in little endian (the only option), results in a different number being stored in big endian.

To expand on this: 20480 is not a particularly meaningful number. It is not actually the port number we want to use, and what we mean by it has nothing to do with the actual number 20480. It is simply a number that, if we store it in memory as bytes on a little endian machine, will result in 0x00 being stored, followed by 0x50 – the big endian representation of 80. It is a uint16_t with a value chosen not for what number we want to store, but for what bytes we will get if we store http_port as bytes.
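The arithmetic can be checked with a few lines of Rust. To keep the result the same on any machine, this sketch models what htons does on a little endian host with `swap_bytes`, and models "storing in little endian" with `to_le_bytes`. The function name `htons_on_little_endian` is made up.

```rust
// What htons does on a little endian host: swap the two bytes.
fn htons_on_little_endian(hostshort: u16) -> u16 {
    hostshort.swap_bytes()
}

fn main() {
    let http_port = htons_on_little_endian(80);

    // 0x0050 with its bytes swapped is 0x5000, which is 20480.
    assert_eq!(http_port, 0x5000);
    assert_eq!(http_port, 20480);

    // Stored little endian, 20480 becomes the bytes 00 50 ...
    assert_eq!(http_port.to_le_bytes(), [0x00, 0x50]);
    // ... which is exactly the big endian representation of 80.
    assert_eq!(80u16.to_be_bytes(), [0x00, 0x50]);

    println!("{} = {:#06X}", http_port, http_port);
}
```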

Since uint16_t is designed to store numbers, not collections of bytes, I would argue that this type is not being used in a semantically honest way – it is a lie. What we are really storing is an array of 2 bytes, 2 uint8_ts. We are storing it in a 16-bit register, and implementation-wise that might be a good decision – but I would argue, if we want that to be possible, we should create an ABI where uint8_t[2] can be stored in a single register. The C programming language, by not making arrays first-class types, is getting in our way here, which explains the situation.

Am I exaggerating when I say the type is a lie? Well, we expect to be able to do arithmetic on a uint16_t, to be able to test, for example, whether it is less than 1024, as listening on a port less than 1024 is a privileged operation. But in order to do that, we have to convert it back to a normal uint16_t – all uint16_t’s usual arithmetic operators are inappropriate for data that’s stored with its bytes swapped around.
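A quick sketch of how the comparison goes wrong, again modeling a little endian host with `swap_bytes` so the result does not depend on where it runs. The function names and the second port number, 51200, are chosen for illustration.

```rust
fn is_privileged(port: u16) -> bool {
    port < 1024
}

// htons and ntohs as they behave on a little endian host.
fn htons_on_little_endian(hostshort: u16) -> u16 {
    hostshort.swap_bytes()
}

fn ntohs_on_little_endian(netshort: u16) -> u16 {
    netshort.swap_bytes()
}

fn main() {
    // Port 80 is privileged, but its swapped form (20480) does not look it.
    let http = htons_on_little_endian(80);
    assert!(is_privileged(80));
    assert!(!is_privileged(http));

    // Port 51200 (0xC800) is not privileged, but its swapped form (0x00C8 = 200) looks it.
    let high = htons_on_little_endian(51200);
    assert!(!is_privileged(51200));
    assert!(is_privileged(high));

    // Only after converting back do the usual operators mean anything.
    assert!(is_privileged(ntohs_on_little_endian(http)));
    println!("{} {}", http, high);
}
```

The compiler accepts every one of these comparisons, because both forms share the type `u16`.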

So what should be done? Well, if we really intend to express a value in network byte order, i.e. big endian, we are changing the semantics of the information from “this is a 16-bit integer” to “this is a specific sequence of two bytes, chosen for a reason.” Therefore, the return value of htons should be an aggregate of two bytes.

Again, because of pointer decay this is impossible to express straight-forwardly in C, although a wrapper struct could be used. C++ takes care of this by having a built-in wrapper struct for arrays, namely std::array. The equivalent of htons would not emphasize that the uint16_t is in the host order (which I think is the wrong way of thinking about it), but would simply indicate that we’re just storing this short in a big-endian fashion (as opposed to the hardware-supported default storage we can access with a memcpy):

std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value);

Rust already provides this as an alternative:

impl u16 {
    pub const fn to_be_bytes(self) -> [u8; 2] {
        // ...
    }
}
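As a usage sketch, here is how `to_be_bytes` and `from_be_bytes` might be used to write two port numbers into a packet buffer and read them back. The four-byte layout and the function names `encode_ports` and `decode_ports` are invented for the example and do not describe any particular protocol.

```rust
// Lay out source and destination ports in network byte order.
fn encode_ports(source: u16, destination: u16) -> [u8; 4] {
    let mut packet = [0u8; 4];
    packet[0..2].copy_from_slice(&source.to_be_bytes());
    packet[2..4].copy_from_slice(&destination.to_be_bytes());
    packet
}

// Recover the abstract port numbers from the byte sequence.
fn decode_ports(packet: [u8; 4]) -> (u16, u16) {
    let source = u16::from_be_bytes([packet[0], packet[1]]);
    let destination = u16::from_be_bytes([packet[2], packet[3]]);
    (source, destination)
}

fn main() {
    let packet = encode_ports(80, 8080);
    // 80 is 0x0050 and 8080 is 0x1F90.
    assert_eq!(packet, [0x00, 0x50, 0x1F, 0x90]);
    assert_eq!(decode_ports(packet), (80, 8080));
    println!("{:02X?}", packet);
}
```

At no point does a byte-swapped `u16` exist: the ports are ordinary numbers until the moment they become bytes.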

Unfortunately for semantics, Rust still has the problematic signature for to_be:

impl u16 {
    pub const fn to_be(self) -> u16 {
        // ...
    }
}

Perhaps this is due to efficiency reasons, or felt efficiency. Programmers know that this byteswapped value should, for performance, be stored in a single register. Programmers can feel more confident that this is actually done if it remains a u16 (or uint16_t) than if it is transformed into an array of bytes, however semantically inappropriate the u16 is.

However, if we are using a u16 or uint16_t as an implementation layer for what is in fact a way of storing two bytes in the opposite order than the one that makes sense for our processor, if we are using it as an implementation trick to do something semantically different from what a uint16_t normally does, then we should at least make the type distinct to give the maintenance programmer and compiler some ability to avoid letting us do non-sensical things (like comparing the value using uint16_t’s comparison operator).

Luckily, there is a design pattern for using the implementation of a type, but applying different semantics to it: the newtype pattern. We typically think of it as a Haskell or Rust thing, but we can use it in C++ as well. I would argue that if we’re going to abuse uint16_ts and friends in such a way, we should at least abstract it using the newtype pattern. In C++, this would look something like this, assuming a little endian computer:

#include <bit>  // std::byteswap (C++23)

template <typename T>
class big_endian {
    T value;
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        value = std::byteswap(in);
        return *this;
    }

    operator T() const {
        return std::byteswap(value);
    }
};

Adding appropriate if constexpr expressions to also support big endian machines, and defining std::byteswap if you don’t have it yet on your system is left as an exercise to the reader.

But it works on my (little endian) system:

#include <array>
#include <cstdio>   // printf
#include <cstring>  // memcpy

int main() {
    big_endian<uint16_t> be = 80;
    std::array<uint8_t, 2> be_bytes;
    memcpy(be_bytes.data(), &be, 2);
    printf("%04X\n", uint16_t(be));
    printf("%02X %02X\n", be_bytes[0], be_bytes[1]);
    return 0;
}

I would much rather use this to represent “we want to store a value in a register byte-swapped on some platforms” than a uint16_t with no additional type information. You cannot accidentally run invalid uint16_t operators on it, but you can convert it to a normal uint16_t first and then use those operators. However, it does have a big endian representation when stored, as indicated by the memcpy, and it can still be stored in a single register.

Even so, I would still not prioritize that ability to store it in a single register in most situations. Using a uint16_t to store the bytes swapped is still not remotely “storing a big endian value in a uint16_t,” it is “storing a big endian representation in a uint16_t so that when the processor writes that uint16_t little endian, we get a big endian representation of the number we actually want.” It’s still fundamentally a hack for performance, and while I’m comfortable with it contained within the encapsulation of this big_endian class, I would still rather actually write std::array<uint8_t, sizeof(T)> as the underlying storage type, unless the optimization is actually needed. I actually would use a big_endian class that would look more like this:

#include <array>
#include <cstdint>
#include <cstring>  // memcpy
#include <utility>  // std::swap

template <typename T>
class big_endian {
    std::array<uint8_t, sizeof(T)> be_representation;

    static void swap_array(std::array<uint8_t, sizeof(T)> &arr) {
        for (auto it = arr.begin(), jt = arr.end() - 1;
             it < jt;
             ++it, --jt) {
            std::swap(*it, *jt);
        }
    }
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        memcpy(be_representation.data(), &in, sizeof(T));
        swap_array(be_representation);
        return *this;
    }

    operator T() const {
        auto bytes_copy = be_representation;
        swap_array(bytes_copy);
        T out;
        memcpy(&out, bytes_copy.data(), sizeof(T));
        return out;
    }
};

This now feels like I’m actually representing accurately what a big endian representation is: a way of storing a number as a sequence of bytes, rather than however the processor feels like storing it, and certainly rather than as a value that the processor will store as little endian, but which will store the value we actually want to store as big endian. I won’t lie and say the optimizer will make it equally performant, and if I needed to actually optimize I would use the other version, but I feel like this version is hack-free. (Again, it still only works on little endian platforms – fixing this is again left as an exercise.)

This version has the added benefit of having an alignment of 1, which I will argue later is more appropriate than using the underlying alignment of uint16_t, uint32_t, etc.
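
One possible take on the exercise of supporting every host, sketched under the assumption that T is an unsigned integer type: if the bytes are produced with shifts instead of memcpy plus a swap, the class no longer cares about the host's byte order. The bytes() accessor is an addition made for this illustration and is not part of the class above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Host-independent sketch of the array-backed class. Assumes T is an
// unsigned integer type no wider than unsigned long long.
template <typename T>
class big_endian {
    static_assert(std::is_unsigned<T>::value,
                  "sketch only covers unsigned integers");

    std::array<uint8_t, sizeof(T)> be_representation{};

public:
    big_endian() = default;
    big_endian(T in) { *this = in; }

    big_endian &operator=(T in) {
        // Fill from the back: the least significant byte goes last.
        for (std::size_t i = sizeof(T); i-- > 0;) {
            be_representation[i] = static_cast<uint8_t>(in & 0xFF);
            in = static_cast<T>(in >> 8);
        }
        return *this;
    }

    operator T() const {
        // Read from the front: the most significant byte comes first.
        unsigned long long out = 0;
        for (uint8_t byte : be_representation) {
            out = (out << 8) | byte;
        }
        return static_cast<T>(out);
    }

    // Illustrative accessor (an addition) for inspecting the stored bytes.
    const std::array<uint8_t, sizeof(T)> &bytes() const {
        return be_representation;
    }
};

static_assert(sizeof(big_endian<uint32_t>) == 4, "no padding expected");
static_assert(alignof(big_endian<uint32_t>) == 1, "byte arrays have alignment 1");
```

The two static_asserts check the size and the alignment of 1 on whatever compiler builds the sketch. Whether an optimizer turns the loops into a single byteswap instruction is not something this sketch guarantees.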

Using These “Big Endian” Types

This leads to a further question, however: When do we need to support network byte order? Really, the only time is when generating messages in wire format to send over the network. In C and C++, we generally represent messages to be sent over the network as structs.

For example, one can imagine a packet format with a 32-bit sequence number. We would want to write uint32_t for this sequence number:

struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    uint32_t sequence_number;
};

However, of course, if it is in big endian byte ordering (as many protocols are), we then have to call htonl when loading this value in:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = htonl(seq_num);

As I said before, I don’t like htonl. I certainly don’t like using uint32_t as the type for sequence_number. So, we can do one of two things:

We can use a Rust-style function to convert to byte representation, and use std::array<uint8_t, 4> as the type of sequence_number. This strikes me as equally awkward. We now know that we need to do something other than just assign the value, but we don’t necessarily know what that thing is.
We can make the type more semantic, and use our big_endian wrapper. This is the purpose for which I wrote it, and the use case where its alignment of 1 makes sense – wire format structures are often packed.

// You may need to add the packed attribute to `big_endian` as well,
// or you may not need it at all now
struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    big_endian<uint32_t> sequence_number;
};

Now, when we actually send it over the wire, we will cast or copy this packet_wire_format to get the byte-by-byte representation, and sequence_number will be in big endian, by the invariants of our big_endian class. We will not need to remember to call any function at all, as the class’s interface provides us with only appropriate options:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = seq_num; // Performs conversion

The fewer mistakes you can make by accident, the better. And of course, this has the additional advantage that the type of the wire format is more self-documenting.
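
The wire layout claim can be checked with a short sketch. To keep it compilable on its own, it uses be_u32, a minimal stand-in invented here for big_endian<uint32_t>, and a hypothetical wire_bytes helper that copies the struct out byte by byte. It also assumes that a struct whose members all have alignment 1 gets no padding, which the static_assert verifies rather than takes on faith.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Minimal stand-in for big_endian<uint32_t>, so the sketch compiles alone.
class be_u32 {
    uint8_t bytes[4] = {};

public:
    be_u32() = default;
    be_u32(uint32_t in) { *this = in; }

    be_u32 &operator=(uint32_t in) {
        bytes[0] = static_cast<uint8_t>(in >> 24);
        bytes[1] = static_cast<uint8_t>(in >> 16);
        bytes[2] = static_cast<uint8_t>(in >> 8);
        bytes[3] = static_cast<uint8_t>(in);
        return *this;
    }

    operator uint32_t() const {
        return (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
               (uint32_t(bytes[2]) << 8) | uint32_t(bytes[3]);
    }
};

struct packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    be_u32 sequence_number;
};

// Every member has alignment 1, so no padding is expected here even
// without a packed attribute.
static_assert(sizeof(packet_wire_format) == 6, "unexpected padding");
static_assert(std::is_trivially_copyable<packet_wire_format>::value,
              "memcpy of the struct must be valid");

// Hypothetical helper: the bytes that would go out on the wire.
std::array<uint8_t, 6> wire_bytes(const packet_wire_format &packet) {
    std::array<uint8_t, 6> out;
    std::memcpy(out.data(), &packet, sizeof(packet));
    return out;
}
```

Assigning 7 to sequence_number and calling wire_bytes should yield the two device bytes followed by 00 00 00 07, regardless of the host's byte order.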

Similarly, if you read or write from the wire format using read and write methods on a buffer type, those methods should either be parameterized to take endian information along with the values, or you can pass objects of type big_endian as the value to be copied in: big_endian<uint32_t> is just as trivially-copyable as uint32_t.
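
The first of those two options, a write method parameterized on byte order, might look roughly like the following. Both wire_buffer and byte_order are names invented for this sketch, and it only covers unsigned integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

enum class byte_order { big, little };

// Hypothetical growable buffer whose write method takes the byte order
// along with the value.
class wire_buffer {
    std::vector<uint8_t> bytes;

public:
    template <typename T>
    void write(T value, byte_order order) {
        static_assert(std::is_unsigned<T>::value,
                      "sketch only covers unsigned integers");
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            // Which byte of the value goes out next depends on the order.
            const std::size_t index =
                order == byte_order::big ? sizeof(T) - 1 - i : i;
            bytes.push_back(static_cast<uint8_t>(value >> (8 * index)));
        }
    }

    const std::vector<uint8_t> &data() const { return bytes; }
};
```

The caller still has to name the byte order at every call site, which is the trade-off compared with a field whose type already carries that information.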

Conclusions and Loose Ends

It is a little more awkward to write big_endian for Rust. I would want to use the existing to_be_bytes method in the implementation, and unfortunately that method is not in any trait, as I’ve complained about before. This can easily be remedied by writing our own trait, however, or using external crates that already do so.

However, I wonder if maybe all of these languages should define types that correspond to uint16_t, uint32_t etc, and just are defined to store themselves in network byte order (and perhaps another one that guarantees little endian order). After all, most processors support byteswap instructions, that make writing a value as a byteswap an easy operation. They could be optimized as normal values unless actually written to memory – and only the optimizer knows when they’re actually written to memory. They could even be written to memory in native endianness unless there’s some defined way to get a byte-by-byte pointer to them – and really only the optimizer knows that.

Endianness seems more a configuration on the natural types of the programming language than it does something to be implemented on top of these natural tools. These loops I’m using to do byteswaps are surely not the most efficient way to do it (which is why the non-array based implementation of big_endian is surely more performant even if it is hackish), because processors have some support for non-native endianness baked in. If a C++ vendor provided types like big_endian (and perhaps some do, I’m sure I’ll find out in the comments) it would surely be more performant.

But again, perhaps they should be primitive types. There’s some built-in processor support for them, and only the optimizer knows when the non-native endianness actually should be used.

I am too busy a person to do the research for such a proposal. I don’t know if such a proposal exists. My interest here is simply in using the tools I have to be a good programmer. For that, to_be_bytes and my implementation of big_endian will simply have to suffice.

Operating Systems: What is the command line?

2023-10-08T00:00:00+00:00

This is my newest post in my series about operating systems. Yes, it was last updated in 2019 – I’m a hobbyist blogger. This is a post about the command line, a computer topic, but it is for educating a non-technical (but tech-curious) audience. Most of the programmers in my audience will already know everything I have to say, and may be bored by some explanation of things they already know, though I intend to discuss some technical details of how computers work.

This is not a tutorial on how to use the command line on any particular operating system. Rather, it is a discussion of the role that a command line plays in a modern operating system and why some people (including me) still use that kind of interface.

As I’ve explained before, I often use my computer through the command line. It is a major part of but not the entirety of how I interact with it. I do this so much that people looking at my computer will assume I’m programming even when I’m not – even when I’m working on my blog, or another writing project, or even just organizing my pictures.

Here is a screenshot of a command line session:

Graphical User Interfaces

This is (as you likely know since you’re reading this on a website) no longer the normal way to interact with computers. Nowadays, we usually interact with computers through graphical user interfaces (GUIs), and many people take them for granted. We access applications¹, each in its own window – or, for web applications, we can combine them into one window via browser tabs.

We navigate these applications with the mouse or touchpad: scrolling and clicking to find our way through the document, right-clicking or navigating menus to find further options, and occasionally interacting with a “dialog box” to specify details. All features are expected to be discoverable, that is to say, we expect to be able to find them in a menu, a toolbar, a right-click menu, or by navigating the dialog boxes we reveal through these other things. If we cannot discover a feature by these mechanisms, we can reasonably assume the application does not have this feature.

Here is LibreOffice Calc, a (somewhat old-fashioned) GUI program:

Nowadays, applications often run inside web browsers. This principle of discoverability is still considered important. Here is Google Docs, an application running inside a web browser:

These are both mouse-navigated programs with discoverable features. For both of these applications, there are many visible ways to interact with them. If you want to find a feature, looking through what’s right in front of you is the way to go.

The Command Line in Brief

The command line works differently.

Nowadays, the command line is usually accessed via a window within the context of a graphical desktop environment², but in the olden days, people interacted with computers via dumb terminals that couldn’t display images, just text³:

“It was a dummy terminal, and I was a dummy user.”

A member of the Baby Boomer generation describing what it was like to be a person in a non-IT role using Unix in the 80s.

Instead of being able to find various features via menus visible on the screen, you are instead given a prompt, an indication of the current state of your session that is, well, prompting you to tell the computer what to do, to give it a command:

You can then type your command, maybe a few more.

As you type commands, the output of the commands displays on the subsequent lines. When you hit the bottom of the screen, the screen scrolls up. Most terminal emulators let you scroll the window to see earlier parts of the transcript. A command might also prompt for additional input, or take full control of the terminal emulator and provide a different type of (still text-based) interface entirely.

If you type a bad command, it is not very helpful:

There is no discoverability. There are no hints as to what commands might be accepted. You can use the command line to find out more information about what commands are accepted, but you have to know the commands to do that. In practice, you have to learn a minimal set of commands from a book (or nowadays, a website) before you can actually do anything productive.

It’s not intentionally user-unfriendly. For example, on Linux, there are commands like man (for “manual”) that explain what commands do, and commands like apropos to search for useful commands. Here is the manual page for the man command itself:

Additionally, once you know the name of a command or utility, you can generally find out more about how to use it by passing -? or --help:

Command lines are available on all modern operating systems for personal computing: Windows, macOS, Linux, and certainly any other Unix you might have running. They tend not to be available on mobile OSes.

What is the command line not?

Before we talk about what this is for, and why modern operating systems still support this decidedly old-fashioned way of interacting with them, I want to dispel some myths and misconceptions about the command line, specifically two opposite misconceptions that seem to still be common amongst the computer laity.

Misconception One: The command line is literally DOS, the Microsoft operating system from the 80’s and early 90’s. It is there to support old programs from the 80’s and early 90’s, and exists solely for the support of obsolete and obsolescent software.

This misconception is common among Windows users, because it used to be true. Until Windows XP, Windows still came bundled and intertwined with a version of Microsoft’s older, fully command-line operating system, DOS. Old DOS programs were still in common use, and people needed a way to run them, so they could run a copy of DOS inside a window.

It’s not true anymore, however. Windows is no longer a chimera of DOS and more modern components. Since Windows XP, both the consumer and business versions of the Windows brand have been versions of Windows NT, a different operating system from earlier consumer versions of Windows, one originally targeted at business users, with no DOS code in it at all.

On a modern Windows computer, the command line is not primarily for DOS programs. The ability to run DOS programs isn’t even shipped with Windows by default anymore, but the command line still is. The confusion is understandable, because the command line still looks like the DOS command line. The prompt is still a form of DOS’s famous C:\>.

What is the command line for, then? It is for running modern Windows programs that happen to be designed to be used from the command line. Windows comes with a bunch of such programs, for things like systems and network administration.

There are a bunch more that you can download and install, usually tools written by computer professionals for other computer professionals. Many of these command line programs were written primarily for Linux and other Unix OSes, but also have Windows versions.

We will go into specific examples of command line programs in a later section, but the important thing to know is that a command line program has access to all the same system libraries and capabilities that any Windows (or Linux, or macOS) program can access. It can play audio, connect to the Internet, and do pretty much anything – anything except draw a new window on the screen, not because it can’t, but because that would make it not a command line program anymore.

But I don’t want to go too over-the-top rebutting this first misconception, because then I might lead you to believe the second misconception.

Misconception Two: Not only can you do anything from the command line that you can do from a graphical user interface, but the command line is fundamentally closer to the operating system. When graphical programs run, they are using the command line under the hood.

This is not true.

It should be obvious that there is at least one thing you can do from a graphical user interface that you can’t do from the command line, which is to display graphics. The command line is an interface based fundamentally on displaying a grid of text. Thanks to modern Unicode, “text” now includes “emojis,” but it does not include images or high-quality charts and graphs.

But even with that overly-obvious caveat aside, yes, it is true that anything a graphical program can do besides show graphics could be done by a command line program as well. There are command line programs that manipulate images, they just don’t show the images as they manipulate them. There are command line programs that pretend to be web browsers and scrape data off of the websites when they load. All the operating system features and computer resources that graphical programs have at their disposal, command line programs will generally have too, besides (by definition) actually doing graphical displays and interactions.

However – and this is a big however – just because a command line program could exist to do everything a graphical program does, doesn’t mean that you have that program installed on your system, or that someone’s even ever written that program. The capabilities of your computer depend on what software you have installed, and what software you can install depends on what software people have written. If someone creates a file format, but only writes a GUI program to edit it, well, then, until someone reverse-engineers it, that file format will only be editable via GUI. Similarly if they only create command line tools – that file format will then only be accessible by command line.

For example, someone with ImageMagick installed on their computer but not Photoshop may only be able to do image manipulation from the command line. Someone with Photoshop installed but not ImageMagick may only be able to do image manipulation from the GUI. There is nothing intrinsically more powerful about either interface.

Specifically, GUI programs are decidedly not wrappers around command line utilities. You could write a GUI program that way (and there are a couple that are), but the vast majority do not in fact do this. Just as command line programs have access to all the same computer resources and operating system functionality that GUI programs do, it also works the other way around. GUI programs and command line programs both are written in programming languages that allow the program to invoke operating system functionality through system libraries and system calls. These calls are not at all the same as command line commands, and the GUI doesn’t need to use the command line as an intermediate layer.

If there is a GUI version and a command line version of the same functionality, maybe this is implemented as the GUI version launching the command line version under the hood – that is certainly something GUI programs can do, and it might make sense if the command line version is the interface most people use and that most maintainers are interested in. But it is just as likely if not more likely to be implemented by the GUI program and the command line both using the same common library.

And certainly, GUI-only programs like web browsers, e-mail clients, and office suites do not by any means implement their functionality by wrapping command line programs. There is no command line version of or interface to Photoshop, nor of Microsoft Word⁴.

And just like it’s possible to have an operating system with a command line and no graphical user interface, it is possible to have an operating system with a graphical user interface and no command line, not even internal analogues of it.

History of the command line

As I said before, computers used to be frequently accessed via dumb terminals. Before this, they were accessed by teletypewriters. A teletypewriter was literally a typewriter, where the keys you entered went to the computer, and the computer’s responses were typed on the paper.

Modern command lines mostly follow that pattern – new input goes in at the bottom of the window, and the window scrolls like a piece of paper receding from the typewriter. But on a modern command line, the program can also take over the entire terminal emulator window, as long as what it wants to draw can be expressed as text. They even support multiple colors.

Most command line systems used today, like most operating systems used today, descend from the tradition of Unix, which was first written around 1970. The exception is Windows – even though the Windows command line is not DOS, it takes many of its aesthetic principles from DOS, not only the famous prompt C:\>, but also its habit of taking options with /, where Unix and friends use -.

What are some modern command line programs?

git keeps track of different versions of a large folder (called a repository) full of code or other forms of (mostly) text, and allows changes to be merged and reconciled between different authors. While there are GUI and web wrappers around it, the flagship program is a command line utility.
ssh lets you log into a command line interface of another computer, usually a server. This is often the only way to log into and administrate the server, as Linux servers generally don’t have any GUI capabilities or GUI programs installed.
ImageMagick lets you manipulate images.
Last but not least, there are many small programs that let you do basic file management, searching, and editing. Two of my favorite new ones are RipGrep by Andrew Gallant (which lets you search for strings or patterns in text files) and fd by David Peter (which lets you search for files by name or other properties).

Why use the command line?

If you are new to a tool, discoverability is an important feature. If you are experienced with a tool, all the hints of where to find things are more distractions than they are useful.

As someone who needs all the focus that I can get⁵, distractions are bad. And so are extra steps: Why spend the time moving the mouse around to access one menu, then another, when on the command line, I can just type the command I already know for what I need to do?

Additionally, the command line is designed to save on extra typing. Generally, most modern command lines support “tab completion,” where you can type the beginning of a command, or of a file name that it’s operating on, and press the [TAB] key; the command line interpreter will then complete the word for you – or list the possibilities if there are multiple.

For a newbie, it might be intimidating, but for someone who’s used to it, it stays out of your way and lets you get stuff done – while showing you a detailed transcript of what you’ve been doing, in case you forget what exactly it was you were trying to do.

Command lines are even more important on the server. While Windows servers come with a graphical user interface you can remotely log into, Unix⁶ servers generally don’t. It’s more efficient to just allow administrators a command line interface – and for most server administrators, it’s quite enough.

And while command lines are not closer to the operating system in a deep technological sense, they are closer to the operating system by convention. They tend to have all the options that a power user would want – and easy ways to specify them, rather than hiding them behind multiple warning signs and buttons labelled “Advanced….”

Last but not least, if you have a series of GUI actions that you often do, you usually have to just keep doing them, even if it’s very tedious. Precious few programs let you do something like write a shortcut key for five menu commands. On the command line, however, you can use aliases or scripts, where a short command stands for a long command, or a single command stands for a whole sequence of commands. You just put into a file the same text you would type at the prompt.

How does the command line actually work?

Generally, a terminal emulator or command line window has a process running in it, called a shell, that presents the prompt (C:\> or similar on Windows, normally something ending with $ on Unix). The shell then takes in the command, takes the first word, and runs that as a program. This program is launched as a separate process, just like clicking on a program icon launches a separate process in a graphical user interface. The shell waits in the background for the process to finish, and then presents a new prompt. On a modern multitasking operating system, the shell generally also allows you to run commands in the background, and use key combinations (Ctrl-Z on Unix) to suspend the running process, and commands like bg and fg to resume it in the background or bring it back to the foreground. This allows you to run multiple programs at once within the terminal.
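
For the programmers in the audience, the core of that loop can be sketched in a few lines of C++ using the POSIX calls fork, execvp, and waitpid. This is a deliberately tiny illustration, not how any particular shell is written: split_words and run_command are made-up names, and quoting, pipes, background jobs, and everything else a real shell does are left out.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <sstream>
#include <string>
#include <vector>

// Split a command line on whitespace. Real shells also handle quoting,
// variables, pipes and much more; this sketch does not.
std::vector<std::string> split_words(const std::string &line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string word; in >> word;) {
        words.push_back(word);
    }
    return words;
}

// Run one command roughly the way a shell would: start a child process,
// replace it with the program named by the first word, and wait for it.
// Returns the program's exit status, or -1 if nothing could be run.
int run_command(const std::string &line) {
    std::vector<std::string> words = split_words(line);
    if (words.empty()) return -1;

    std::vector<char *> argv;
    for (std::string &word : words) argv.push_back(&word[0]);
    argv.push_back(nullptr);

    const pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        execvp(argv[0], argv.data()); // only returns on failure
        _exit(127);                   // conventional "command not found" status
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell is then, in essence, a loop that prints a prompt, reads a line, and hands it to something like run_command before prompting again.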

On Linux, when a program starts, it conventionally has three open files, 0, 1, and 2, for input, output, and error, respectively. On the command line, by default (for it is configurable), these all correspond to the terminal: input is read in from the keyboard on the terminal (by default line by line), and output and errors are outputted to the terminal. GUI programs will have these three files open when they start too, but unless they’re started from the terminal, the output will normally just silently be ignored.
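
Those three numbers are visible from inside a program. As a small sketch (POSIX only, with write_text and is_terminal being names invented here), a C++ program can write to file 1 or file 2 directly and ask whether a given file is attached to a terminal:

```cpp
#include <unistd.h>

#include <string>

// Write text to a numbered file: STDOUT_FILENO is 1 (output) and
// STDERR_FILENO is 2 (error). Returns the number of bytes written,
// or -1 on failure.
long write_text(int fd, const std::string &text) {
    return static_cast<long>(write(fd, text.data(), text.size()));
}

// True when the numbered file is connected to a terminal, which is the
// default on the command line unless the output has been redirected.
bool is_terminal(int fd) {
    return isatty(fd) == 1;
}
```

When the program is run from a shell without redirection, is_terminal should report true for all three files. When it is launched from a graphical menu, or with its output sent to a file, it generally will not.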

The program can also draw a window, if a graphical environment is available. On Linux, it is easy for the same program to have a command line interface, and a graphical interface – sometimes at the same time. This is useful if it’s mostly used from the command line, but sometimes also wants to do things like show a chart or graph that can be generated.

macOS and Windows have more complicated GUI frameworks that make a GUI application more different in structure from a command line operation, but you can still launch GUI applications from the command line.

Footnotes

An application is just a computer program that does a task besides making the computer system work as a whole, a task interesting to the user. Examples include word processors, spreadsheets, chat apps, and video games. It’s not so much a rigorous technical term as an amorphous category of software. ↩︎
A desktop environment, also known as a graphical shell, is a graphical user interface for managing the windows you have open, and providing computer-wide menus for launching applications. It also controls the root window, which is what you see when you have no windows open, normally used for shortcuts and files you’re currently working on. Windows and macOS both provide their own desktop environments, which generally aren’t mentioned by name – they are just part of the operating system. Linux and most other Unixes, when they have graphical interfaces at all, can be used with a variety of different desktop environments. ↩︎
This image is taken from Wikimedia Commons. It is by Jason Scott, and available under CC BY-SA 4.0. It was modified by the Wikimedia poster by removing the background. ↩︎
Oddly enough, most web browsers support running without the browser window actually being displayed, in a headless mode. This is generally not usable purely from the command line, but in the context of being wrapped in a larger program (which might be a command line program). Additionally, Microsoft Word and Photoshop can be programmatically controlled – they are both scriptable – but as far as I know neither Microsoft nor Adobe have chosen to provide a command line interface to this functionality, even though they could. Again, it’s about what’s actually available on your computer. ↩︎
It has been said that I have a deficit of attention. ↩︎
I use Unix in a broad sense to include Unix-like operating systems like Linux and the BSDs, even if they aren’t Unix in a trademark sense. ↩︎

Can computers think things?

2023-09-30T00:00:00+00:00

This blog post isn’t about ChatGPT. It isn’t about machine learning, neural nets, or any mysterious or border-line spiritual form of computing. That’s a whole ’nother set of philosophical and metaphysical conundrums (conundra?).

This is about a way people sometimes speak, informally, about bog-standard boring non-AI computers and computer programs. You’ve probably heard people speak this way. You’ve probably spoken this way sometimes yourself:

“The server thinks your password is wrong.”
“The computer thinks you’ve lost the connection.”
“The phone thinks you want to use your headphones. It’s wrong though.”

We normally interpret this as a metaphor, but I’m not sure it is. Is the phone “thinking” you want to use your headphones rather than your car speaker substantially different from us “thinking” our friend would rather get a phone call than a text message?

Part of the problem here is that the word “think” in English can mean different things.

It can mean to cogitate, to go through a rational series of propositions in our brains, expressed as internalized speech in our mind’s ear or diagrams in our mind’s eye or pure abstractions. “I am thinking about how to approach this physics problem.” Computers probably cannot do this, and certainly are nowhere near as good at it as humans are, not even with this fancy new AI software everyone’s playing with.

But it can also mean to have a belief, a mental model about reality. “I think Joe doesn’t like me very much.” Or, “I think the reason the car won’t start is because the battery is dead.” Computers, I will argue, can do something remarkably similar to humans in this category.

Some languages distinguish these two meanings of “think.” English learners of German often say denken (to cogitate), when they mean glauben (to believe), in contexts where both would translate as “to think.” And then, in case that was too simple, there’s also meinen, which means “to suppose” or “to opine,” also used when English speakers might say “to think.”

So here’s my thought on this, or rather, my opinion (meine Meinung):

Computers cannot yet denken, or cogitate, like humans. But computers can definitely glauben, or internally believe, specific facts, and they’ve been able to do that since the day they were invented.

In order to figure out whether this is true, we first need to establish what it means to believe something, and then see if computers can do it. What does it mean for humans to think something, to believe something about the world? Can we extract a definition that can then be applied to computers, to see whether computers are capable of the same thing?

So, what does it mean for us to think something is true? Well, it means that we have some internal state, some internal information stored in the physical arrangement of our brains, that corresponds to that thought or belief. We then use that internal state to inform our behavior. If we think our friend would rather get a phone call than a text message, then we might choose to accommodate that and call them instead of texting them.

This internal state, when all is going well, corresponds to a specific external reality. The goal is for the internal state to match the external reality. Sometimes this goal is not met – sometimes we misapprehend the situation, our belief is wrong, or what we think is true is not true. But if we are wrong, we have the same internal state as we would have if we were right, and things were working.

We can therefore define believing or thinking that a proposition X is true thus:

A being believes X is true if they have an internal state that, when the being is functioning correctly, corresponds to X being true, that then informs their behavior such that it is the behavior that makes sense if X is true, rather than the behavior that makes sense if X is not true.

Applied to the phone example, we have some internal state in our brain that indicates that “Jill would rather get a call than a text.” How do we know that the state indicates that proposition? Well, we know that when our brains are functioning correctly (a hard thing to define, but also a concept everyone uses all the time), we only have that internal state when the proposition is true. And, we also know that this internal state drives behavior consistent with that proposition being true. Assuming we want to accommodate Jill’s preference, we will call her instead of texting her, an adaptive decision if the belief is true, and a non-adaptive one if the belief is false.

With this framework, it seems almost easier to establish that computers can think something is true than that humans can do this. Humans often have complicated, ambivalent beliefs and thoughts. Humans will often believe something for reasons other than an efficient assessment of its truth value, and act contrary to their own earnestly held beliefs. I think this definition still works for humans, if you take all the confounding factors into consideration, but it’s hard: We get into things like “conscious” or “subconscious” beliefs, or “he says he thinks X, but his actions show he really thinks Y.” And, of course, it’s extremely difficult to define whether a human is “functioning correctly.”

With computers, however, they think all sorts of things. For example, let’s talk about whether a computer thinks a user has administrator privileges. You might see code like this:

let has_admin_privileges: bool = is_admin(conn.get_current_user());

Now, we have an internal state in the computer, a boolean (i.e. true or false) variable that is intended to correspond to whether the user has administrator privileges. If the code is functioning correctly, this variable will take on the value true exactly when the user has administrator privileges. We know this, because the definition of “functioning correctly” is implicit in the way the programmer wrote the code, and how they named the variable.

Furthermore, the following lines of code are almost certainly behaviors in line with that interpretation of the internal state.

if has_admin_privileges {
    // Do the thing
    requested_task.perform()?;

    // Signal success
    Ok(())
} else {
    // Signal an error
    Err(Error::AccessDenied)
}

So, when people say things like “my phone thinks I want to use my Bluetooth headphones,” it means that there is information encoded in the silicon of the phone, possibly in an explicitly-named variable, that corresponds to that belief.

So now that I’ve thought this through properly, I don’t even think statements like this are metaphorical. I think they are literally true, and completely appropriate.

Verbal Tics

2023-08-31T00:00:00+00:00

I remember hearing an idea once – I’d like to cite it, but proper citation seems difficult, as I heard it from an acquaintance, and Mr. Google isn’t being his usual helpful self. The idea was, different politicians have these verbal tics, these filler catch-phrases, that indicate their deepest conversational anxieties.

For President Obama, it’s “let me be clear.” According to this thesis, he is really concerned about being unclear, and this tic is so prominent in his speech that it shows that his biggest anxiety is being insufficiently clear about something, being seen as waffling, or as evading the deep issue underlying all the petty concerns. And as an American paying some amount of attention, this made sense to me.

For President Trump, the tic under discussion (for he has many) was “believe me.” President Trump was concerned about being called out as a liar, because he was.

And when this discussion came up, I realized that my biggest verbal tic in conversation was “if that makes sense,” or the question form, “does that make sense?” And I realized that I did have anxiety that underlies this verbal tic, a deep suspicion that everything I’m saying is so befuddled and so indirectly and subtly put that it doesn’t make sense to the listener.

Does that make sense?

My Dream C++ Additions

2023-08-30T00:00:00+00:00

UPDATE: I have updated this post to address C++ features that address these issues or have been purported to.

I have long day-dreamed about useful improvements to C++. Some of these are inspired by Rust, but some of these are ideas I already had before I learned Rust. Each of these would make programming C++ a better experience, usually in a minor way.

Explicit `self` reference instead of implicit `this` pointer

UPDATE: This is coming out in C++23, and they did it right! I’m excited! Good job C++!

I admit I haven’t been paying close attention to C++ post C++14. C++17 was up-and-coming and I hadn’t finished learning everything I wanted to about it when I left C++ programming. And I refuse to be embarrassed for not knowing about a feature in a programming language that is not my favorite before any compiler even supports it.

But I am indeed excited for them! This is a substantial improvement I have wanted since well before C++11 came out. They’ve done it pretty close to how I wished for it here, and they have good reasons for how they made it.

There are a few weird parts of the implicit this pointer.

For one, it is a pointer, but it is never allowed to be null, and it cannot be modified to point to a different object. In both of these ways, it behaves more like a reference than a pointer.

class Foo {
public:
    void bar() {
        this = new Foo{}; // Error
    }
};

int main() {
    Foo *foo = nullptr;
    foo->bar(); // Undefined behavior
}

For another, when we want to put a modifier on this, like const or volatile, there is nowhere obvious in the function signature to put it. We have to put it awkwardly after the parameters, before the ; or {:

class Foo {
public:
    void bar() const volatile && {
        // Do stuff
    }
};

Oddly enough, whether the parameter is taken by lvalue or rvalue can also be specified, which would make way more sense for a reference parameter instead of a pointer.
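As a small sketch of what those lvalue and rvalue qualifiers do in today’s C++ (the Buffer class and its take method are hypothetical names I made up for illustration), the same method can be overloaded on whether the object it is called on is an lvalue or an rvalue:

```cpp
#include <string>
#include <utility>

// Hypothetical class, for illustration only.
class Buffer {
public:
    explicit Buffer(std::string contents) : contents_(std::move(contents)) {}

    // Chosen when called on an lvalue: the object lives on, so hand out a copy.
    std::string take() const & { return contents_; }

    // Chosen when called on an rvalue: the object is expiring, so move the
    // contents out instead of copying them.
    std::string take() && { return std::move(contents_); }

    bool empty() const { return contents_.empty(); }

private:
    std::string contents_;
};
```

Calling b.take() on a named Buffer picks the first overload, while Buffer("temp").take() or std::move(b).take() picks the second. Note that the qualifiers sit after the parameter list, exactly where const and volatile go.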

The modifiers have to go in this odd location because this is implicit. This is in line with OOP ideology and theory, but in my mind, it’s just a negative. If you have to think about whether it’s const or taken by rvalue anyway when writing the signature, why put those modifiers somewhere you might forget about, instead of right with the declaration of the parameter?

I would change the syntax to fix both of these issues with one fell swoop: allow an explicit self as an alternative to implicit this, and make it a reference:

class Foo {
public:
    void bar(&self) {
        self.baz();
    }

    void baz(volatile const &self) {
        // Do stuff
    }
};

The type would still be implicit, but modifiers can be specified where the type would be. You would also only be able to take by reference or rvalue reference, and never by value, because implicit copy on method call would be a new feature of questionable value. It would not conflict with existing code, as a parameter named self without an explicit type would be illegal under the current syntax.

Of course, this looks rather similar to Rust’s syntax, but believe it or not, I had this idea long before I learned that Rust does self in this way.

A new `byte` type for `uint8_t` and `int8_t`

In C++, the type we use for an individual byte of data, by definition, is char. This is the definition of char in the standard, and while the byte length (CHAR_BIT) doesn’t have to be 8 bits, other standard provisions and practical considerations mean that on a modern platform, it always is.

We might use uint8_t or int8_t for bytes in practical code, but these are defined as typedefs to unsigned char and signed char – I don’t know whether this is required by the standard but it is always done in practice.

However, char is also the type we use for text data, so it is a type with two different contrasting (perhaps even contradictory) sets of semantics.

That leads to many odd results, including the fact that char cannot represent all Unicode characters because it has to be 1 byte long. But the one I want to focus on today is a bit weirder. What does this code print?

#include <cstdint>
#include <iostream>

struct message_data {
    uint8_t message_type;
    uint8_t message_length;
    uint8_t data[1];
};

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << mesg.message_type << std::endl;
    std::cout << "Length: " << mesg.message_length << std::endl;
}

int main() {
    message_data data;
    data.message_type = 100;
    data.message_length = 0;
    print_message_hdr(data);
    return 0;
}

Well, if you thought the numbers 100 and 0 would show up on the output, you’d be wrong. std::cout’s operator<<’s char overloads are triggered, and so these fields, clearly meant as integers, are printed as text:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: d
Length:
[jim@palatinate:~]$

In order to get the integer print-outs we want, we have to override this strange default behavior, perhaps by casting the values to uint16_t before printing them:

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << uint16_t(mesg.message_type) << std::endl;
    std::cout << "Length: " << uint16_t(mesg.message_length) << std::endl;
}

This results in a better output:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: 100
Length: 0
[jim@palatinate:~]$

So, how do we make this a little more ergonomic? We introduce a byte type, that is similar to char, but overloads differently. Like any other integer type, it defaults to signed, and then we add overloads to operator<< and others to treat it like an integer, not like a character. Switching between byte and char would be an implicit conversion, but for overloading purposes, they would be different types.

uint8_t and int8_t could then be defined in terms of byte.

I do not know what backwards-compatibility implications it has, but I do think the decision to make char mean byte as its primary meaning instead of “character” was a particularly poor one, and anything we can do to migrate away from it would be good.

Update: Someone drew my attention to std::byte. This one I was aware of, but had not thought about here as I didn’t think it really solves the problem. As it is, it is not an arithmetic type, and therefore cannot be used as the underlying type of uint8_t, leaving the confusing behavior in place.
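To make that concrete, here is a minimal sketch, assuming a C++17 compiler. The describe helper is a hypothetical name; std::byte, std::to_integer and the type traits are the real standard library facilities:

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// std::byte is a scoped enumeration, not an arithmetic type.
static_assert(!std::is_arithmetic<std::byte>::value, "std::byte is not arithmetic");
// uint8_t, by contrast, is an ordinary integer type.
static_assert(std::is_arithmetic<std::uint8_t>::value, "uint8_t is arithmetic");

// Hypothetical helper: format a byte as a decimal number.
std::string describe(std::byte b) {
    std::ostringstream out;
    // out << b;  // Does not compile: there is no operator<< for std::byte.
    out << std::to_integer<int>(b);  // An explicit conversion is required.
    return out.str();
}
```

So std::byte avoids the character-printing surprise only by refusing to print at all, and it offers no arithmetic, which is why it cannot stand in for uint8_t.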

Real if-else Expression Syntax

Oftentimes, in C++, I find myself writing code like this:

int32_t error_code;
if (setting == Setting::Socket) {
    error_code = initialize_socket();
} else { // setting == Setting::Pipe
    error_code = initialize_pipe();
}

if (error_code < 0) {
    // ...
}

This error_code variable is just one example. I often want to have a variable get different values depending on which side of the if-else statement it’s on, without having to declare the variable without an initializer right ahead of it, and write two assignment statements. Basically, I want if-else to be an expression.

Now, of course, C++ already has the ternary operator: ?:. But it’s so ugly and unreadable that no one uses it, for good reason. It’s hard to remember what the precedence is, meaning if we want to be rigorous and friendly to our readers we need to bracket with ( and ) even if strictly unnecessary, and the result looks like garbage and is hard to format in a way that’s remotely readable:

int32_t error_code = (setting == Setting::Socket
    ? initialize_socket()
    : initialize_pipe()
);

What do I want instead? I want if-else to have this role, to be an expression, where it evaluates to the value of the end of each block (with no semicolon, to make clear that it’s an expression not a full statement):

int32_t error_code = if (setting == Setting::Socket) {
    initialize_socket()
} else {
    initialize_pipe()
};

This is way better than ?:. The blocks can be multiple statements long if necessary. You can add if-else if-else chaining. And, most importantly, it can be formatted like any other if-else.

Update: Someone drew my attention to a lambda-invocation pattern that is, in my mind, equally ugly to ?:, and also leaves you without the ability to return from the enclosing function within the block. This strikes me as extremely hackish and not really an improvement, but I suppose that’s where C++ is going. I am at a loss for why they didn’t just implement GCC’s expression blocks, followed by if as expression. It’s clearly much better in my mind.

I’ve seen the technique from time to time but I guess I figured it was too hackish to mention. I didn’t realize it was getting officially recommended in C++ Core Guidelines. I feel like when they were recommending it, they should’ve simultaneously been trying to get more usable and obvious features included in the programming language itself. Maybe they are, and if so I wish them luck in that! Maybe C++30 will be a safe and usable programming language, equivalent to Rust now.
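For reference, the lambda-invocation pattern looks roughly like this. It is only a sketch: Setting, initialize_socket and initialize_pipe are stand-ins with made-up return values, and the wrapping function initialize is hypothetical:

```cpp
#include <cstdint>

enum class Setting { Socket, Pipe };

// Stand-ins for the real initialization functions; return values are made up.
std::int32_t initialize_socket() { return 0; }
std::int32_t initialize_pipe() { return -1; }

std::int32_t initialize(Setting setting) {
    // The lambda is defined and called in a single expression, so error_code
    // can be const and is never left uninitialized.
    const std::int32_t error_code = [&]() -> std::int32_t {
        if (setting == Setting::Socket) {
            return initialize_socket();
        } else {
            return initialize_pipe();
        }
    }();  // <- the trailing () invokes the lambda immediately

    // Note: a `return` inside the lambda only leaves the lambda,
    // never the enclosing initialize() function.
    return error_code;
}
```

It does the job, but the [&]() -> ... { }() scaffolding is exactly the kind of noise an if-else expression would make unnecessary.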

Variable Shadowing

On a related note, I want to have multiple variables with the same name shadow, rather than resulting in an error message. I want the new variable with the same name to simply hide the old variable, rather than giving me a “conflicting declaration” error (or similar).

Why? Well, a lot of production code involves taking the same conceptual thing, and migrating it through many types. Without shadowing, we have to use awkward Hungarian notation.

void handle_data(const void *data_v, size_t size) {
    const uint8_t *data_ch = (const uint8_t *)data_v;
    std::vector<uint8_t> data{data_ch, data_ch + size};
    // Actually do something with `data`
}

The new way would look like this:

void handle_data(const void *data, size_t size) {
    const uint8_t *data = (const uint8_t *)data;
    std::vector<uint8_t> data{data, data + size};
}

This also cuts down on how many variables are in scope at once.

This bugs people who are new to Rust sometimes, but it’s fairly easy to learn, and C++ has asked people to learn much, much harder things. Once learned, it is really useful, as the alternative is to use Hungarian notation or equivalents. It also helps you use the right value, as you won’t accidentally go back and use an old one, as it’s shadowed.

First-Class Support for Sum Types

std::variant is awful. I know, because few people except die-hards use it, and people use the Rust equivalent, enums, all the time. The weirdest thing about std::variant is that it supposes that all of the variants hold exactly one value, and one variant per type is sufficient. In reality, multiple variants might hold values of the same type, and many variants don’t need a value – both of which are possible but clumsy to express using std::variant’s semantics.
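Here is a sketch of the clumsiness I mean, assuming C++17. The Temperature example and all of its type names are hypothetical; the overloaded helper is a commonly used idiom rather than something the standard library provides:

```cpp
#include <string>
#include <variant>

// Two alternatives that both carry a double need distinct wrapper types,
// and the alternative with no payload needs an empty struct of its own.
struct Celsius { double degrees; };
struct Fahrenheit { double degrees; };
struct Unknown {};

using Temperature = std::variant<Unknown, Celsius, Fahrenheit>;

// Helper that builds one overload set out of several lambdas.
template <typename... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

std::string describe(const Temperature &t) {
    return std::visit(overloaded{
        [](Unknown) { return std::string("unknown"); },
        [](Celsius c) { return std::to_string(static_cast<int>(c.degrees)) + " C"; },
        [](Fahrenheit f) { return std::to_string(static_cast<int>(f.degrees)) + " F"; }
    }, t);
}
```

Three extra structs and a template helper, just to say “one of these three cases.”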

But C++11 already introduced enum class for more powerful enums! Let’s go all the way and add Rust-style values associated with it, for a compiler-implemented tagged union. The implementation of std::optional’s fields would be so much simpler.

template <typename T>
enum class option {
    None,
    Some {
        T value;
    },

    // OK, define some methods
};

This interacts with object lifetimes and constructors in a complicated way, but if there were interest, I know it could be figured out. If you don’t think this feature is necessary, I suspect you’ve spent too long programming without it. Once you get used to this, it’s really hard to go without.

Conclusion

I am not going to do anything to try to make these things happen. I’m sure I’m not the most popular in the C++ community after my long write-ups of how Rust is so much better, and it’s not where my primary interests lie anymore. But, if someone were to make these features happen, it would make my life much easier, when for good reasons, projects I’m working on require me to code in C++.

In Defense of 'C/C++'

2023-08-28T00:00:00+00:00

One of the minor points I discussed in my response to Dr. Bjarne Stroustrup’s memory safety comments was the controversial, apparently deeply upsetting term C/C++. It is controversial and interesting enough that I decided to say a little more about it here.

A little background: Many people, especially outside the C and C++ communities (which, to be clear, don’t always like each other that much) use the term C/C++ to talk about the two programming languages together, as an informal short-hand for “C and C++” or “C or C++.” Within the ~~C/C++~~ C and C++ communities, it is widely hated.

And now for me to say the thing guaranteed to anger the most possible people: I see both sides of this debate.

On the one hand, the term “C/C++” is especially jarring because C and C++ fans regularly engage in actual controversy (famously including Linus Torvalds, of the C-based Linux kernel, insulting C++ and its programmers). It is frustrating to be a C++ programmer, to have strong opinions on what it means to be a C++ programmer, to think that C programmers are making a misguided decision, that using C over C++ is technologically backwards and regressive, and hear people cavalierly implying that the programming languages are the same. And likewise, of course, for the C programmer who feels similarly about C++.

And continuously, both C and C++ programmers are exposed regularly to people who mix up the programming languages when it is harmful. They see bosses and hiring managers who expect you to transition back and forth between them without any friction, and to enjoy them equally. They see resources that promise to teach you “C/C++ skills,” and know that they won’t teach how to use either the way that that language’s particular community actually prefers. They see people using “C/C++” all the time to talk about the languages in a way that only would make sense if they were much more similar than they, in fact, are – or at the very least, than a die-hard partisan of C or C++ would think they are.

And I do think this is understandable. After all, people don’t tend to lump together other languages like this. Java and C# are probably equally related (if not more related), and no one writes that they’re hiring a “Java/C# programmer.” Why should C and C++ get treated this way?

But, on the other hand, C and C++ are actually extremely closely related programming languages. I was writing something recently comparing Rust features to C and C++ features, specifically Rust enums to the tagged union idiom which is used in … C and C++, in very similar ways. I know all the reasons why as a C programmer and as a C++ programmer, I’m not supposed to write C/C++, and still, I was tired of writing “C and C++” over and over again to describe this particular thing that those languages have in common.

It turns out the real problem isn’t the act of writing “C/C++” – it turns out that just banning a problematic word doesn’t fix the real problem here at all – if it has ever fixed any problems. Some people do need to be told that C and C++ programming are different programming languages, different communities and different skillsets, even though they are still related skillsets and related programming languages. But some people who don’t need to be told that still find themselves needing a shorthand sometimes, and don’t feel the cultural need to be over-accommodating in avoiding it.

Because when two things are similar – and stop me if this is confusing! – there’s some ways in which they’re the same, and some ways in which they’re different. Sometimes, it makes sense to lump them together, and sometimes, it doesn’t. But yelling that people shouldn’t write “C/C++” won’t magically help anyone understand this – especially since those people are almost certainly not listening, and you’re preaching to the choir.

In the case of Dr. Stroustrup, he was using the “faux pas” of the NSA using “C/C++” to avoid having to actually address what they said and defend C++. He brought this up in his criticism of the NSA white paper:

As is far too common, it lumps C and C++ into the single category C/C++, ignoring 30+ years of progress.

I said, among other things, that Dr. Stroustrup was being unnecessarily exclusionary based on buzz-words:

He’s reading too much into the orthography and the NSA’s failure to use insider shibboleths of the programming languages they’re trying to criticize. Outside of the “C” and “C++” communities, “C/C++” is a fairly common way to refer to the two related programming languages.

But also, he was calling them out when they were right. In the specific category that the NSA was talking about, there actually is no difference, as I also mention in my post:

While there might be 30+ years of divergence between C and C++, none of C++’s so-called “progress” involved removing memory-unsafe C features from C++, many of which are still in common use, and many of which still make memory safety in C++ near intractable.

Perhaps we all should spend more time thinking critically than nit-picking word choice. And perhaps I should find something better to do than writing blog posts joining the fray, so that’s all I’ll say on the issue for now.

C++ Papercuts

2023-08-26T00:00:00+00:00

UPDATE: Wow, this post has gotten popular! I’ve written a new post that adds new papercuts combined with concrete suggestions for how C++ could improve, if you are interested. Also, if you want to read more about C++’s deeper-than-papercut issues, I recommend specifically my post on its move semantics. Thank you for reading!

My current day job is now again a C++ role. And so, I find myself again focusing in this blog post on the downsides of C++.

Overall, I have found returning to active C++ dev to be exactly what I expected: I still have the skills, and can still be effective in it, but now that I have worked in a more modern programming language with less legacy cruft, the downsides of C++ sting more. There are so many features I miss from Rust, not only the obvious safety features, or even primarily those, but also features that C++ could easily add, like first-class support for sum types (called enums in Rust), or tuples. (Clarification for C++ Fans: std::tuple and std::variant are not first class support, and if you’re used to first class support, you know how unacceptably clunky they are.)

In this blog post, I will focus on the minor problems of C++ that have affected me the most, the little usability papercuts, the petty inconveniences that just waste time. Instead of focusing on comparing them to Rust or other programming languages, I will focus on why they don’t make sense from a C++ point of view, with reference to just C++. I know better than to hope that by doing this die-hard C++ fans will accept my criticism, but perhaps it will be relatable to C++ programmers who don’t have Rust experience.

Before I start getting into the papercuts, though, I want to address one of the primary defenses I’ve seen of C++, one that I’ve found particularly baffling. It goes something like this:

C++ is a great programming language. The complaints are just from people who aren’t up to it. If they were better programmers, they’d appreciate the C++ way of doing things, and they wouldn’t need their hand held. Languages like Rust are not helpful for such true professionals.

Obviously, the phrasing is a bit of a parody, but I’ve seen this sort of attitude so many times. The most charitable view I can take of it is a claim that C++’s difficulty is a sign of its power, and the natural cost of using a powerful programming language. What it reads like to me in many cases, however, is as a form of elitism: a general idea that making things easy for poorer programmers is pointless, and that good programmers don’t benefit from making things easier.

As someone who has programmed C++ professionally for a majority of my career, and who has taught (company-internal) classes in advanced C++, this is nonsense to me. I do know how to navigate the many papercuts and foot-guns of C++, and am happy to do so when working on a C++ codebase. But experienced as I am, they still slow me down and distract me, taking focus away from the actual problems I’m trying to solve, and resulting in less maintainable code.

And as for the upside, I see very little that C++ gets in exchange for all of this difficulty. The only ways in which C++ is more performant or more appropriate than Rust are in terms of platform support, legacy codebases, optimizations that are only available in specific compilers that happen to not support Rust, or other concerns irrelevant to the actual design of the programming language.

While I am proud of my C++ skills, I am not too proud to appreciate that better technology can render them partially obsolete. I am not too proud to appreciate having features that make it easier. In most cases, it’s not a matter of the programming language doing more work for me, but of C++ creating unnecessary extra make-work, often due to decisions that made sense when they were made, but have long since stopped making sense – don’t get me started on header files!

But I also want my programming language to be beginner-friendly. I am always going to work with other programmers with a variety of skill-sets, and I would rather not have to clean up my colleagues’ mistakes – or mistakes of earlier, more foolish versions of myself. If making a programming language more beginner-friendly sacrifices power, then I agree that some programming languages should not do it. But many, even most of C++’s beginner-unfriendly (and expert-annoying) features do not in fact make the language more powerful.

So, without further ado, here are the biggest papercuts I’ve noticed in the past month of returning to C++ development.

`const` is not the default

It is very easy to forget to mark a parameter const when it can be. You can just forget to type the keyword. This is especially true for this, which is an implicit parameter: there is no time when you are typing out the this parameter explicitly, and therefore it won’t sit there looking funny without the appropriate modifiers.

If C++ had the opposite default, where every value, reference, and pointer was const unless explicitly declared mutable, then we’d be much more likely to have every parameter declared correctly based on whether the function needs to mutate it or not. If someone includes a mutable keyword, it would be because they know they need it. If they need it and forget it, the compiler error would remind them.

Now, you might not think this is important, because you can just not use const and have functions with capabilities they don’t need – but sometimes you have to take things by const in C++. If you take a parameter by non-const reference, the caller can only use lvalues to call your function. But if you take a parameter by const reference, the caller can use lvalues or rvalues. So some functions, in order to be used in natural ways, must take their parameters by const reference.
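A minimal sketch of that binding rule, using hypothetical functions length_const, length_mutable and demo:

```cpp
#include <cstddef>
#include <string>

// Hypothetical functions, for illustration only.
std::size_t length_const(const std::string &s) { return s.size(); }
std::size_t length_mutable(std::string &s) { return s.size(); }

std::size_t demo() {
    std::string name = "Jill";

    std::size_t total = 0;
    total += length_const(name);             // lvalue: fine
    total += length_const(name + " Smith");  // rvalue (a temporary): fine
    total += length_mutable(name);           // lvalue: fine
    // total += length_mutable(name + " Smith");
    //   ^ Error: a temporary cannot bind to a non-const lvalue reference.
    return total;
}
```

Neither function modifies its argument, but only the one that remembered const can be called with a temporary.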

Once you have a const reference, you can only (easily) call functions with it that accept const references, and so if any of those functions forgot to declare the parameter const, you have to include a const_cast – or go change the function later to correctly accept const.

Lest you think this is just a sloppy newbie error, note that many functions in the standard library had to be updated to take const_iterator instead of or in addition to iterator when it was discovered correctly that they made sense with a const_iterator: functions like erase. It turns out that for functions like erase, the collection is what has to be mutable, not the iterator – a fact that the maintainers of the C++ library simply got wrong at first.
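As an illustration of the corrected interface (a sketch only; erase_first is a hypothetical helper, and it assumes a standard library that implements the C++11 signature of erase):

```cpp
#include <vector>

// Remove the first element equal to `value`, if any.
// Returns true if something was removed.
bool erase_first(std::vector<int> &values, int value) {
    // A const_iterator is enough to locate the position...
    for (std::vector<int>::const_iterator it = values.cbegin();
         it != values.cend(); ++it) {
        if (*it == value) {
            // ...because erase needs the container to be mutable, not the
            // iterator. Since C++11, erase accepts a const_iterator.
            values.erase(it);
            return true;
        }
    }
    return false;
}
```

The vector is taken by non-const reference, which is where the mutability requirement really belongs.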

Obligatory Copying

In C++, for an object to be copyable is the default, privileged way for an object to behave. If you don’t want your object to be copyable, and all its fields are copyable, you often have to mark the copy constructor and copy assignment operator as = delete. The default is for the compiler to write code for you – code that can be incorrect.
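Here is roughly what opting out looks like, as a sketch with a hypothetical Connection class wrapping an integer handle:

```cpp
#include <type_traits>
#include <utility>

// Hypothetical handle type, for illustration only.
class Connection {
public:
    explicit Connection(int fd) : fd_(fd) {}

    // Copying has to be switched off explicitly...
    Connection(const Connection &) = delete;
    Connection &operator=(const Connection &) = delete;

    // ...and moving then has to be written out explicitly as well.
    Connection(Connection &&other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    Connection &operator=(Connection &&other) noexcept {
        std::swap(fd_, other.fd_);
        return *this;
    }

    int fd() const { return fd_; }

private:
    int fd_;  // -1 marks a moved-from connection
};

static_assert(!std::is_copy_constructible<Connection>::value, "not copyable");
static_assert(std::is_move_constructible<Connection>::value, "movable");
```

Without the two = delete lines, the compiler would happily generate a copy constructor that duplicates the handle, which is almost certainly not what anyone wants.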

If you do make your class move-only, however, beware, because that means that there are situations where you can’t use it. In C++11, there was no ergonomic way to do a lambda capture by move – which is usually how I want to capture variables into a closure. This was “fixed” in C++14 – for when you want what should have been the default from the beginning, you can now use extremely clunky move-capture syntax.
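For the record, the C++14 move-capture syntax looks like this. It is a sketch, and make_counter is a hypothetical name:

```cpp
#include <memory>
#include <utility>

// Build a closure that owns a heap-allocated counter.
auto make_counter(int start) {
    auto count = std::make_unique<int>(start);

    // C++11: writing [count] would try to copy the unique_ptr and fail to
    // compile. C++14: an init-capture moves it into the closure instead.
    return [count = std::move(count)]() { return ++*count; };
}
```

The capture has to name the variable twice and spell out std::move, all to get the behavior I would have wanted by default.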

However, even then, good luck using the lambda. If you want to put it in a std::function, you’re still out of luck to this day. std::function expects the object it manages to be copyable, and will fail to compile if your closure object is move-only. This is going to be addressed in C++23, with std::move_only_function – but in the meantime, I have been forced to write classes with a copy constructor that throws some sort of run-time logic exception. And even in C++23, copyable functions will be the default, assumed situation.
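A different workaround from the throwing copy constructor, sketched here under the assumption that sharing the state between copies is acceptable, is to park the move-only state in a std::shared_ptr so that the closure itself becomes copyable. The make_shared_counter name is hypothetical:

```cpp
#include <functional>
#include <memory>
#include <utility>

std::function<int()> make_shared_counter(std::unique_ptr<int> count) {
    // std::function<int()> f = [c = std::move(count)] { return ++*c; };
    //   ^ Does not compile: the closure is move-only, and std::function
    //     requires a copyable callable.

    // Hand ownership to a shared_ptr. The closure is now copyable, and every
    // copy of the resulting std::function shares the same counter.
    std::shared_ptr<int> shared(std::move(count));
    return [shared]() { return ++*shared; };
}
```

This compiles, but it pays for reference counting and silently turns “copy” into “alias”, neither of which the original code asked for.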

This is strange, because most complicated objects, especially closures, are never, and should never be, copied. Generally, copying a complicated data structure is a mistake – a missing &, or a missing std::move. But it is a mistake that carries no warning with it, and no visible sign in the code that a complex, allocation-heavy action is being undertaken. This is an early lesson to new C++ devs – don’t pass non-primitive types by value – but it’s possible for even advanced devs to mess up from time to time, and once it’s in the codebase, it’s easy to miss.

By-Reference Parameter Papercuts

It is unergonomic to return multiple values by tuple in C++. It can be done, but the calls to std::tie and std::make_tuple are long-winded and distracting, not to mention that you’ll be writing unidiomatically, which is always bad for people who are reading and debugging your code.

Side note: Someone brought up structured bindings in a comment, as if this fixed the issue. Structured bindings are a great example of the half-way fixes that proponents of modern C++ love to cite. Structured bindings help some, but if you think they make returning by tuple ergonomic, you’re mistaken. You still need to either write std::pair or std::make_tuple in the function return statement, or std::tuple in the function’s return type. This isn’t the worst, but it’s still not as light-weight as full first-class tuple support, and it’s not enough to have convinced people to not use out parameters, which are my real complaint.
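To compare the two styles side by side, here is a sketch assuming C++17; parse and both callers are hypothetical functions:

```cpp
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical parser: returns (success, bytes consumed, message text).
std::tuple<bool, std::size_t, std::string> parse(const std::string &input) {
    if (input.empty()) {
        return std::make_tuple(false, std::size_t{0}, std::string{});
    }
    return std::make_tuple(true, input.size(), input);
}

std::size_t consumed_with_tie(const std::string &input) {
    // Before C++17: declare every variable first, then unpack with std::tie.
    bool ok;
    std::size_t consumed;
    std::string text;
    std::tie(ok, consumed, text) = parse(input);
    return ok ? consumed : 0;
}

std::size_t consumed_with_bindings(const std::string &input) {
    // C++17 structured bindings tidy up the call site, but the tuple type
    // still has to be spelled out in parse's signature and return statements.
    auto [ok, consumed, text] = parse(input);
    return ok ? consumed : 0;
}
```

The call site improves; the function being called does not.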

And even at that, it’s not that out parameters (or in-out parameters) are bad, but that they’re bad in C++, as there is no good way to express them.

So what do we do instead? The clunkiness of tuples leads people to instead use out parameters. To use an out parameter, you end up taking a parameter by non-const reference, meaning the function is supposed to modify the parameter.

The problem is, this is only marked in the function signature. If you have a function that takes a parameter by reference, the parameter looks the same as a by-value parameter at the call site:

// Return false on failure. Modify size with actual message size,
// decreasing it if it contains more than one message.
bool got_message(const char *mesg, size_t &size);

size_t size = buff.size();
got_message(buff.data(), size);
buff.resize(size);

If you’re reading the calling code quickly, it might look like the resize call is redundant, but it is not. size is being modified by got_message, and the only way to know that it is being modified is to look at the function signature, which is usually in another file.

Some people prefer out parameters and in-out parameters to be passed by pointer for this very reason:

bool got_message(const char *mesg, size_t *size);

size_t size = buff.size();
got_message(buff.data(), &size);
buff.resize(size);

This is great – or would be, if pointers weren’t nullable. What does a nullptr parameter mean in this context? Is it going to trigger undefined behavior? What if you pass a pointer from a caller into it? People often forget to document what functions do with a null pointer.

This can be addressed with a non-nullable smart pointer, but very few programmers actually do this in practice. When something isn’t the default, it tends to not be used everywhere where appropriate. The sustainable answer to this is changing the default, not heroic attempts to fight human nature.
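A minimal sketch of such a pointer follows. The non_null template is a hypothetical name of my own, libraries such as the GSL ship a more complete gsl::not_null, and the body of got_message is placeholder logic:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical minimal non-owning, non-nullable pointer.
template <typename T>
class non_null {
public:
    explicit non_null(T *ptr) : ptr_(ptr) {
        if (ptr_ == nullptr) {
            throw std::invalid_argument("non_null constructed from nullptr");
        }
    }
    // Reject a literal nullptr at compile time.
    non_null(std::nullptr_t) = delete;

    T &operator*() const { return *ptr_; }
    T *operator->() const { return ptr_; }

private:
    T *ptr_;
};

// The out parameter is still visible at the call site, but cannot be null.
bool got_message(const char *mesg, non_null<std::size_t> size) {
    if (mesg == nullptr || *size == 0) {
        return false;
    }
    *size = 1;  // Placeholder: pretend the first message is one byte long.
    return true;
}
```

A call then reads got_message(buff.data(), non_null<std::size_t>(&size)), which is explicit about mutation and about nullability, but also long enough that I understand why people don’t bother.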

Obligatory side-gripe: At least in non-owning situations like this, it is possible to write such a smart pointer. However, if you want to write the obvious companion, a non-nullable owning smart pointer, a companion version of std::unique_ptr, then it cannot be done in a useful way, because such a pointer cannot then be moveable.

Method Implementations Can Contradict

In C++, every time you write a class, especially a lower-level one, you have a responsibility to make decisions about certain methods with special semantic importance in the programming language:

Constructor (Copy): X(const X&)
Constructor (Move): X(X&&)
Assignment (Copy): operator=(const X&)
Assignment (Move): operator=(X&&)
Destructor: ~X()

For many classes, the default implementations are enough, and if possible you should rely on them. Whether or not this is possible depends on whether naively copying all of the fields is a sensible way to copy the entire object, which is surprisingly easy to forget to consider.

But if you need a custom implementation of one of these, you are on the hook to write all of them. This is known as the “rule of 5.” You have to write all of them, even though the correct behavior of the two assignment operators can be completely determined by the appropriate constructor combined with the destructor. The compiler could make default implementations of the assignment operators that refer to those other functions, and therefore would always be correct, but it does not. Implementing them correctly is tricky, requiring techniques like either explicitly protecting against self-assignment, or swapping with a by-value parameter. In any case, they are boilerplate, and yet another thing that can go wrong in a programming language that has many such things.

Side note: One commentator did not understand what I meant. It is true that many classes can use = default for all these methods. However, IF you customize the copy constructor or move constructor, you must THEN also customize the assignment operator to match, even though the default implementation could have been correct, if the language were defined more intelligently.

I thought this was clear by citing the rule of 5, which essentially says this.

The full rule is explained on CPP Reference. If you customize the copy or move constructor, the corresponding = default assignment operator will be wrong. Be careful! Note how the example code does not use = default for the assignment operators, even though the assignment operators contain no logic.
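As an illustration of the boilerplate in question, here is a small sketch of a class with custom copy and move constructors, using the “swap with a by-value parameter” technique for assignment. Buffer is a hypothetical class invented for this example. Note that the single by-value operator= stands in for both assignment operators, and that its body contains no logic of its own: it is fully determined by the constructors and the destructor, yet it still has to be written by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, used only to illustrate the rule of 5.
class Buffer {
public:
    explicit Buffer(std::size_t size) : size_(size), data_(new int[size]()) {}

    // Custom copy constructor: deep-copies the array.
    Buffer(const Buffer &other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Custom move constructor: steals the array and leaves other empty.
    Buffer(Buffer &&other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    // Assignment by value: the parameter is built by the copy or move
    // constructor, and the old contents are freed when it is destroyed.
    // Self-assignment is safe because the copy is made before the swap.
    Buffer &operator=(Buffer other) noexcept {
        swap(*this, other);
        return *this;
    }

    ~Buffer() { delete[] data_; }

    friend void swap(Buffer &a, Buffer &b) noexcept {
        std::swap(a.size_, b.size_);
        std::swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int &operator[](std::size_t i) { return data_[i]; }
    const int *data() const { return data_; }

private:
    std::size_t size_;
    int *data_;
};

void demo_buffer() {
    Buffer a(3);
    a[0] = 7;
    Buffer b(1);
    b = a;  // copy assignment: deep copy, then swap
    assert(b.size() == 3 && b[0] == 7);
    assert(b.data() != a.data());
    Buffer c(2);
    c = std::move(a);  // move assignment: steal, then swap
    assert(c.size() == 3 && c[0] == 7);
    assert(a.size() == 0);
}
```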

“Modern” C++

After seeing comments on Hacker News, I felt compelled to add this section. Every time someone complains about anything in C++, someone will mention a newer version of C++ that fixes it. These “fixes” are usually not that good, and only feel like fixes if you’re used to everything being kind of clunky.

Here’s why:

The default way still is the old, bad way. For example, capturing lambdas by move should be the default, and std::move_only_function, coming soon in C++23, should have been the default std::function.
For that reason, and because there are never warnings enabled on the old, bad way, even new coders keep doing things the bad way.
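A small sketch of the first point: moving a value into a lambda has to be requested explicitly with an init-capture, and the resulting closure is move-only, which means it cannot be stored in a std::function (which requires a copyable callable). The function name make_counter is made up for this example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical helper: builds a callable that owns its captured state.
inline auto make_counter(std::unique_ptr<int> start) {
    // Moving into a lambda needs an explicit init-capture (C++14).
    // A plain [start] capture would attempt a copy and fail to compile.
    return [value = std::move(start)]() mutable { return (*value)++; };
}

// The closure is move-only, so it does not meet std::function's
// requirement that the stored callable be copy-constructible.
using Counter = decltype(make_counter(nullptr));
static_assert(!std::is_copy_constructible<Counter>::value,
              "closure holding a unique_ptr is not copyable");
static_assert(std::is_move_constructible<Counter>::value,
              "closure holding a unique_ptr is movable");

void demo_counter() {
    auto counter = make_counter(std::make_unique<int>(5));
    assert(counter() == 5);
    assert(counter() == 6);
    auto moved = std::move(counter);  // ownership of the int moves along
    assert(moved() == 7);
}
```

The non-default spelling is the one that does the right thing here, while the short, default-looking capture either copies or does not compile.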

Of course, I understand that this is important for backwards-compatibility. But that is the entire problem: C++ has too many bad decisions accumulated. Why was copying the default for passing collections as parameters, let alone for lambda capture? I know the historical reasons, but that doesn’t mean that a modern programming language should work that way.

Even C++11 couldn’t clean up the fact that raw pointers and C-style arrays get nice syntax, while smart pointers and std::array look terrible. Even C++11 couldn’t clean up that it was working around a language designed without moves.

Conclusion

Unfortunately, I am all too well aware of why these decisions were made, and it is exactly one reason: Compatibility with legacy code. C++ has no editions system, no way to deprecate core language features. If a new edition of C++ were made, it would cease to be C++ – though I support the efforts of people to transition C++ to new syntax and clean some of this stuff up.

However, if you ignore backwards-compatibility and the large existing codebases, none of these papercuts make the programming language more powerful or better, just harder to use. I’ve seen good-faith arguments in favor of human-maintained header files, surprising as that is to me, but I challenge my readers to tell me what is beneficial about C++’s design choices in these matters.

You might find these things trivial, but these all slow programmers down, while simultaneously annoying them. If you are experienced enough, your subconscious might be adept at navigating it, but imagine what your subconscious could do if it didn’t have to. But how adept are you at seeing these mistakes in a code review from your junior colleagues? If you are a rigorous reviewer, how much more time does it take? How adept are you at finding these issues quickly when a bug arises?

We’d be more effective, more efficient, and happier if these issues were resolved. Programming would be both enjoyable and faster to do. What’s the downside? The only upside of the status quo is continuity with history. And while I can see the value in that, it is a very limited value, with very limited scope.

New Link: Technical Only RSS

2023-08-06T00:00:00+00:00

TLDR: I am adding a new link for RSS subscribers who just want to subscribe to technical posts. The RSS feed has always been available, but it is now explicitly one of the links across the top, for those who want their RSS feed to only give them my new technical posts.

I am writing this post primarily to let people know about this new link, but I also want to muse on it a little.

I realize that I have, in some ways, two blogs here in one website.

The Coded Message is primarily read for its technical content, especially for the posts about Rust. But I also write about other topics that interest me, and those posts are generally much less popular.

I combine them on the same website for a few reasons.

For one, it’s easier for me to have one blog. Blogging is a hobby for me, and so it has to play second fiddle to other life obligations, which is most of why I’ve been slow to finish some blog series and some promised future posts – I have not forgotten. This also means that anything that would make blogging harder for me, including separating out these blogs into two fully separate websites, is likely to make me blog substantially less. Laziness might not always be a virtue, whatever Larry Wall might say, but some amount of it is essential to actually accomplishing goals, especially in the hobby space.

But there is also a reason besides laziness, that is a little harder to articulate. As much as this blog largely concerns my professional work, it is my personal blog. All of the programming posts are laden with my personal opinions about programming, and this website is about everything I personally have to say publicly on any topic, not just programming. A separation between my professional and personal blogs would lead, in my own mind, to a sense of obligation to make the professional blog a polished resource for programmers, with more organization and possibly even a regular schedule, as opposed to merely being a forum where I hold forth on whatever topics interest me, which often but not always happens to be programming.

That said, I do make all my posts in the hopes that people read them, and find them useful in some way (even if that use is, as for my fiction posts, primarily entertainment). And I am aware that a large portion of my readership primarily, or even exclusively, finds my technical posts useful. As much as I may wish that all of my readers who are here for Rust content also care about my musings on other topics, I know that many of them do not, or even seriously disagree with me on these topics.

I try already to accommodate this. If you sign up for my newsletter, by default, you are only subscribed to technical posts, and you have to follow an additional link and explicitly subscribe if you want other topics. If you go to www.thecodedmessage.com in your web browser, you can click the link at the top labelled Computers/Programming Posts. And now, if you want to subscribe to just the technical posts via RSS, there is also a link at the top for that purpose.

I still encourage people who are interested in my other posts to read them, and I still plan on having this website combined for at least the medium-term future, but I wanted people to know that a technical-posts only RSS feed was available, if they so chose.

As always, I welcome feedback on my blog in the form of comments and e-mails (jah259 at cornell dot edu). Thank you so much for reading!

The Curse of Coffee

2023-07-08T00:00:00+00:00

TRIBUNAL PROCEEDING TRANSCRIPT
SUB LEGIBUS ORDINIS SACROSANCTI IMMORTALIUM
PROVISIONAL PROOF TEXT

IN THE CASE OF:
ŌRDŌ SACROSANCTUS VERSUS THE NAMELESS DAUGHTER OF MUŠMAḪḪU THE SEVEN-HEADED SERPENT, SHE WHO IS KNOWN TO THE MORTALS AS EUNICE

LORD JUSTICE MEPHISTO, PRESIDING
LORD JUSTICE DRACHENMILCH, LORD JUSTICE BA’AL-HA-KHUMUS, AND ~~LORD~~ LADY JUSTICE XYXXYZ

MR. AZAXAZALIA, ESQ., PROSECUTOR
MS. “EUNICE”, DEFENDANT

A RECORD OF EUNICE’S TESTIMONY
TRANSCRIBED BY GEORGE SMITH, HUMAN, JUNIOR APPRENTICE CLERK
COURTROOM 31B, NO OTHERS IN ATTENDANCE

EUNICE, DEFENDANT: My lady. My lords.

I have been called upon by this most ancient, most esteemed, most noble tribunal to give a reckoning of my behavior. You have already heard the prosecutor’s speech, and now it is time for me to defend myself. And defend myself I shall, with pleasure. To be frank, the story – the entire story, without the prosecutor’s dishonest gaps and distortions – speaks for itself. So, rather than try to wrangle a creative interpretation of some of the more arcane and ancient laws of our Ōrdō, as many before you have done, including this sly prosecutor, I will simply tell the whole truth of what happened, and you shall see that, far from being criminal, it upholds – nay, epitomizes – all of our finest traditions.

Let me set the scene for you.

LET THE RECORD STATE THAT AT THIS POINT THE DEFENDANT BEGINS TO SPEAK IN A LOUD, INTENTIONALLY MUFFLED VOICE –GEORGE, CLERK

This is a reminder that all F trains are running on the D line and all D trains are running on the F line from Broadway Lafayette throughout Brooklyn. This is, to repeat, an F train running on the D line from Broadway Lafayette throughout Brooklyn. Thank you for riding MTA New –

LORD JUSTICE BA’AL-HA-KHUMUS: The defendant is reminded to stick to the facts, the legal facts. This is a tribunal, and we are interested in the facts and the law, not your acting or story-telling skills.

EUNICE, DEFENDANT: Of course, my lord. I’m sorry.

LET THE RECORD STATE THAT THE DEFENDANT DOES NOT SEEM AT ALL SORRY. –GEORGE, CLERK

Kevin had no excuse. This is not my opinion, but a legal fact. He wasn’t late for work – he wasn’t even going to work. He wasn’t even late for brunch: that wasn’t for another two hours, and Peter, the friend he was meeting up with, was always late. Even if he just completely went to the wrong place, and had to walk across Brooklyn, he still would have had time to go home – he had spent the night with a girl he was seeing – to not just go home, but also to shower and change, and still make it to brunch on time.

The walk might’ve even done him some good, not in terms of exercise – exercise he got plenty of, efficiently and perfunctorily, at his office gym – but in terms of fresh air and a change of pace and scenery. Maybe he would’ve been able to relax and enjoy life some, to slow down and calm down. Maybe, just maybe, he would even have been able to avoid his fate.

But in spite of these low stakes, when he heard this announcement, when he was reminded of this deeply absurd but ultimately trivial inconvenience, he treated it like an emergency. Carelessly, recklessly, he leapt up from his seat, too quickly to pay attention to where he was going, but somehow still slowly enough that his Apple AirPods™ (Pro, 3rd generation) were at no risk of falling out of his ears.

Have I been unfair towards Kevin? I concede, I really do, that it is not a sin – not a sin per se – to be anxious and rushed. And sometimes I do question how harshly I judged him for the expensive headphones. I knew he could well afford them – I knew everything about him just by looking at him, and when you reach my age you know far more than most people even think possible. And besides, everyone has the right to listen to music, the right to ignore strangers on the train who are trying to talk to you, even innocent, completely harmless old women who are really just trying to explain to you that the lid on your Starbucks coffee cup is slightly askew, that – drip, drip, drip – you’re leaking all over the (admittedly already quite unsanitary) subway seat.

But given what happened next, I think you’ll forgive me all this judgmentalism. I certainly do not regret any of it.

LET THE RECORD STATE THAT AT THIS POINT THE DEFENDANT BEGINS TO VIOLENTLY WAG HER FINGER, AND HER VOICE BECOMES SOMEWHAT CREAKY –GEORGE, CLERK

I regret many things in life. I do rue and lament many of the paths I’ve walked – and many more that I walked right past. Anyone as old as myself who claims to have no regrets is fooling themselves or lying, probably both.

But concerning Kevin I regret absolutely nothing.

Because if you are as careless, as reckless, and as rushed as Kevin – and again, with no urgency or occasion to justify it – and if you pay no attention to your surroundings, not even to rudimentary courtesy, not to mention basic safety, and you cause a poor old woman, who was at the time using her cane to ever so slowly hoist herself out of her seat, to not only fall face-first to the floor but to then find herself drenched and soaked in your still quite hot coffee, well, the least you could do – the very least – would be to resign yourself to a slightly more inconvenient trip, to accept a slightly more complicated day, and check (for more than a split second) whether she’s OK – maybe even help her back up onto her feet.

But nothing like that from Kevin. Just a split second’s glance, just enough that anyone could see that he knew what he had done, and no more. A glimpse, and then out the door he went. Other passengers helped me to my feet and offered me a handkerchief (from the older Italian gentleman from the Bronx in a suit) and napkins (from the young generically-white lesbian transplant from Minnesota) to clean myself. They did this out of common courtesy. They did this, for me, even though I was old and strange, even though they had all just heard me scream, like a crazy person, from the bottom of my gut through the top of my lungs, “MAY YOU NEVER DRINK COFFEE AGAIN! MAY IT NEVER EVEN TOUCH YOUR LIPS! MAY IT ENTICE AND ALSO STYMIE YOU!”

LET THE RECORD STATE THAT THE CAPITAL LETTERS ABOVE INDICATE THAT THE DEFENDANT IS THROWING HER HEAD BACK, CLOSING HER EYES, AND SCREAMING IN THE WITNESS STAND. THE SOUND EMERGING FROM HER INHUMANLY DISTENDED MOUTH IS THAT OF TEN WOMEN SCREAMING THE SAME WORDS SLIGHTLY OUT OF SYNC WITH EACH OTHER. THE SOUND IS ECHOING BEYOND WHAT MAKES SENSE FOR THE ACOUSTICS OF THIS ROOM. THE STONE WALLS ARE SHAKING. AND YET, THE JUSTICES ARE SITTING CALMLY, AS IF THIS WERE A COMPLETELY NORMAL OCCURRENCE. –GEORGE, CLERK

And to be clear, Kevin knew that what he did was wrong. He even was able to hear the screams of the curse, and thought it was fair – fair at least that the old woman was angry, angry enough that he could hear her – that is, my – malediction through the closed train doors and over the sound of the departing train. He even felt some measure of repentance – or at least a mental act somewhat cognate to repentance, a Kevin version of it, if you will. The actual words from his mouth were, to be exact, “fucking trains,” but behind those words, there was an inkling of a glimmer of a feeling of a spark of responsibility.

But more than guilt, more than any moral regret, Kevin regretted that he had no coffee left. He threw the now-empty paper cup in an already overflowing trash bin on the platform, and pretended not to notice as it bounced out and fell to the floor. On the next train not only did he not get a seat, but he had nothing to even sip on.

There was a coffee shop near Kevin’s stop. It was a small, independent place, with black walls and mismatched wooden tables, with knick-knacks for sale and a disproportionate number of vegan options on the menu. Kevin didn’t like the vibe; he preferred the standardization of Starbucks like the one he’d gone to earlier that morning. It was two blocks out of the way, but the extra time would be more than made up for by getting someone else to make coffee rather than grinding beans and preparing it at home – which always took way longer than he thought it would.

As he approached it, there was something off about it. He didn’t fully process it at first, but the relative dimness of the storefront, and the general lack of energy, dampened his mood before he solidified and verbalized his thoughts, and then had them finally consolidated and confirmed when he arrived at the door.

It was closed. On a Sunday, somehow. A sign out front said that there was some sort of an important family matter, and gave no further detail. Kevin sighed, feeling disappointment combine with his pre-existing grogginess and need for caffeine – and the connected need for the taste of coffee and the feel of a drink in his hand.

No matter, there was still a Starbucks. After another block or three of groggy, belabored, un-New Yorker-like slogging, he found it. The line was a little long, but the end was in sight.

But the line didn’t move. And then, it didn’t move. And then, it still didn’t move. The baristas poked at a touch screen and occasionally muttered something about patience or just needing another minute, until finally one of them (his name tag announcing him as Jason) decided to simply call it, and told everyone in the line that their POS system was broken. There was no alternative, so perhaps they should just disperse. And so, coffeeless, Kevin finally went home.

Of course, as any coffee addict would do in such a situation, which is to say any office worker in the entire five boroughs, he went to make himself coffee as soon as he got home. Unlike many office workers, he defined “as soon as” a little over-strictly – he did this even before feeding his rightfully and righteously angry, hungry cat, who had done nothing to deserve this neglect besides being nice to Kevin in the pet shelter.

But I digress! My lords – and lady – none of you are as old as I am, so you perhaps have not yet gained as rigorous a Sight as I have. I can see what happened next as though I was actually there – and of course, in a sense, I was there, through the words of my curse. It brings a smile to my lips even now to remember, not only seeing but even hearing, Kevin rushedly shoveling the beans into his coffee grinder, extra ones clattering on the floor and bouncing in all directions through his kitchen, Mittens the cat running after them, thinking they were perhaps a form of long-awaited food, and finally, once it was vaguely close to full, him pushing the button and hearing, rather than the normal churning noise, a mere, half-hearted “whirr-click.”

Not at all the correct sound, you could tell from his face. Undeterred, he pressed the button on the grinder again. “Whirr-click, whirr-click-SNAP!”

LET THE RECORD STATE THAT AT THE WORD “SNAP” THE WITNESS STAND AND THE DEFENDANT ARE LITERALLY STRUCK BY LIGHTNING. HOW CAN THIS HAPPEN INDOORS? HOW DID IT NOT CATCH FIRE? HOW DO THE JUSTICES REMAIN SO IMPASSIVE, EVEN BORED-LOOKING? I AM NOT PAID ENOUGH FOR THIS. –GEORGE, CLERK

The plastic casing of the grinder cracked, then broke. Streams of beans flew to the ground. Mittens hissed and zoomed away into the bedroom closet.

Kevin slowly slid down to the floor, and put his face in his hands. The sound of his curse was echoing in his ears. He wasn’t a superstitious man. To the contrary, he had registered himself on the website of an atheistic, anti-supernaturalist movement called “Brights” and therefore listed “Bright” under his “Religious Views” on Facebook. He was utterly committed to the Rationalist cause. He even dabbled in Effective Altruism; the way to truly make the world a better place and help humanity, he thought, was to ensure AI alignment. In any case, this is all to say, he didn’t believe in anything as irrational and superstitious as curses, and he certainly wasn’t about to start now.

After all, and I think this is quite essential to take note of, everything that had happened is the sort of thing that just happens sometimes. Family businesses have family emergencies sometimes. Tills break down sometimes, POS systems even more often than sometimes. And even the particular way that that coffee grinder broke, believe it or not, had happened to a full 1.3% of purchasers, particularly those who, like Kevin, weren’t good about cleaning it properly. Things do break sometimes.

And by the way, my esteemed lady and lords, I invite you to investigate if you don’t believe me. I have nothing to hide.

And yet, Kevin cried, feeling himself slip in his faith in no faith, his face twitching in the manner that in my several centuries of life has always indicated dogmatic struggle and religious doubt. Sure, things go wrong sometimes, he thought, but a person with money in a first world city and time to spare will eventually be able to purchase coffee, whatever weird old ladies on the train might say.

Ultimately, a gear clicked into place, and he returned to some semblance of spiritual stasis. This must be some sort of statistical effect, he concluded. Most days, after all, don’t have coincidences, but coincidences do happen sometimes. Sometimes, even coincidences involving very odd words from very odd old ladies. And perhaps something about his behavior is being influenced by her, subconsciously making him go places where the coffee is not available … but then again, how does that make sense?

In any case, he would proceed according to his beliefs. He believed thus: Coffee can be bought, and curses weren’t real, old ladies or no old ladies. Maybe not at any individual place, but in New York City, with enough money and time, a man can eventually drink coffee. To solidify these beliefs further in his mind, he nodded furiously, as if agreeing with himself. Then, he got out his phone, and calmly – he certainly kept on telling himself he was calm – ordered a new coffee grinder for delivery off of Amazon.

Kevin then splashed some water on his face, grabbed his bag, and walked out the door. It is unclear whether he heard, as he was leaving, the muted resumption of meowing from Mittens, who, of course, remained unfed.

I will now jump ahead to the actual brunch with Kevin’s friend Peter, the incident that the prosecutor focused on. I will skip over the incident of the Starbucks barista being fired for not only breaking the till but leading the customers on with the promise of coffee – my supplementary submission shows clearly that she was about to be fired for other reasons, nevertheless. I will also skip over the package thief who would later be struck by a car crossing the street after stealing the replacement coffee grinder from Kevin’s stoop. He was going to steal packages anyway, my curse merely redirected him to Kevin’s house. His death was in any case a result of his own decision to jaywalk, and his reincarnation as a raccoon as punishment for package thievery seems to me completely justified. In any case, the prosecutor failed to articulate a valid legal claim for those incidents, and as I said, I will skip past them, and let my lady and lords of the tribunal read about them in my brief.

My lady and lords, as we all know, Kevin did eventually meet his friend Peter for brunch, albeit half an hour late. Even Peter, who was never on time for anything, was already seated when Kevin arrived, at a small table on the restaurant’s cozy rear patio, already sipping his bloody maria, Peter’s new favorite drink, with tequila instead of vodka. And there beside the novel brunch cocktail, in a small mug, not yet touched, there sat hot, steaming, freshly poured black coffee.

Kevin was halfway through greeting Peter, “Hey, good to s–,” when he saw the coffee. Kevin wasn’t a very emotional man, and he certainly wasn’t easily moved, so he wasn’t all that familiar with the feeling he felt upon seeing it. His limbs tingled and he involuntarily sharply gasped for air, and his shoulders and then his legs shook in an all-body shudder. Peter was looking at his phone and didn’t notice.

Kevin stood there speechless for a few seconds as Peter continued to fiddle with something on his phone. “Hey, yeah, good to see you too, man,” he said, somewhat perfunctorily. “Sorry, just give me one –”

“Can I have a sip of your coffee?” Kevin interrupted. The words rushed out of Kevin’s mouth before any filter could catch them or any social grace (of which Kevin somehow had some amount) could interfere. A young woman from the next table looked up in surprise at Kevin’s abruptness, and then quickly looked away again, pointedly not eavesdropping.

Peter finally looked up from his phone, opened his mouth to talk, thought better, and closed it again. A second later, he found his words. “Um, sorry, I’m still being COVID conscious. You know, the waiter will be right back and then I’m sure you can order your own.”

“Ah,” said Kevin, simply, slightly embarrassed but not so slightly disappointed in his friend’s dire, even soulless lack of charity in this troubling time.

Peter looked up at Kevin and decided everything was normal after all. “Rough day?” he asked.

“You can say that again,” said Kevin. “Two coffee shops were closed.”

Peter smiled. “Well, I’m sure the waiter will be back soon. I do need to sneak off to the restroom for a moment, though.” Peter stood up and walked away from the table, calling back “But no stealing! Just wait for the waiter.”

Kevin sat patiently for a minute before everything happened. And I would like to just take this opportunity to point out that really, Kevin is to blame here. Not only was he doing something morally impermissible, in stealing his friend’s coffee and spreading his spit and germs, but he was showing a shocking lack of patience – the waiter was even then walking towards the table, ready to take this late-comer’s drink order. That, and, Kevin had to know by now, even if he wouldn’t admit it, that the more drastically he tried to fight the curse, the more drastic the consequences would also be – and Kevin was responsible, and is responsible, for all of them.

But mortals never learn anything, my lady and lords. And we can’t take responsibility for their mistakes. For my curse to work, this action could not go unresponded to. And so, as the coffee moved towards his lips, and his heart rose in his chest as he smelled the familiar smell, a fire alarm went off: “Whoo-OOP! Whoo-OOP!”

LET THE RECORD STATE THAT AT THIS POINT THE ENTIRE COURTROOM BEGINS TO GLOW AS THE SHRILL SOUND OF A FIRE ALARM FILLS IT. –GEORGE, CLERK

Kevin’s hand twitched and the coffee flowed back to the bottom of the cup. Should he evacuate? Did it make sense to go into the building, where there might be a fire, rather than just wait out on the patio? While he was considering this, however, the young woman who was not eavesdropping on Kevin rushed past him, fleeing towards the fire, knocking the cup out of Kevin’s hand and shattering it against the ground.

Not one to resist peer pressure, Kevin also ran into the building, which was actually burning, and where the grease fire soon gave all of our mortal characters severe burns. This sent Kevin to the hospital, where he finally learned wisdom, and since has only ordered tea. Perhaps all the mortals involved will learn some manners from this incident.

I do not enjoy mortals, my lady and lords, nor do I sympathize with them. That, however, does not make me a criminal. The curse was, according to our customs, reasonable and proportionate. It was only Kevin’s willful defiance of it that resulted in this mayhem, and therefore, he was the assailant as well as a victim.

My lady and lords, I rest my case.

AT THIS POINT THE DEFENDANT VANISHES. THE JUSTICES DO NOT SEEM SURPRISED OR DISTURBED. HOW COULD I HAVE SO VASTLY MISUNDERSTOOD THE REQUIREMENTS FOR THIS JOB? –GEORGE, CLERK

LADY JUSTICE XYXXYZ: Thank you. We will now take a brief recess.

[END TRANSCRIPTION]

On ADHD Medication

2023-07-03T00:00:00+00:00

Here’s a story; stop me if you’ve heard it before.

There’s a child, an energetic, enthusiastic child, perhaps hard to deal with in some ways, but all around just beautiful. And then they go to a parochial school – or perhaps they just have a rather strict public school teacher. In either case, the authority figure makes it their wicked mission to suppress all the beautiful children’s personalities into identical, well-behaved zombies in the interest of the idol of order. Only our heroic child remains with their own personality, constantly getting in trouble for it but remaining themselves.

In the next stage of the story, the villain makes their move. A teacher, or a principal or a school nurse can’t handle the child, who admittedly can be a handful sometimes. They suggest the child has ADHD, and put the poor child on medication. Now, with the power of the conformity pill, this child’s beautiful flowering of personhood has been bleached to the same level as all the other children, “proper” and “well-behaved” – which is to say, boring. And perhaps that is “just a shame” – or perhaps a heroic parent removes the medication, or some other “happy ending” intervenes in the third act.

In any case, the story is concluded with self-assured tsking against those who would pathologize childhood and good spirits, and maybe against the overdiagnosis of ADHD … or, if the tellers are bold, the entire concept of ADHD. The moral is clear: Keep the zombie drugs away from our amazing, perfectly normal children.

I’ve heard this story many times. I’ve read this story. I’ve heard this story first-hand or second-hand or as rumor, in in-person conversations and on Facebook posts, from parents and family and friends. Sometimes, people tell me some variation of this when I tell them I’ve started taking ADHD medication – an odd choice, given that few of them are really close enough for it to be appropriate for them to try to undermine my medical decisions. I will say, however, that it doesn’t count in my mind when people tell this story about themselves – that is either second-hand from parents’ framing of the narrative, or a different (but rarer than you might think) effect which deserves an entirely different blog post.

But I think that the story as commonly told has some huge gaps. Or rather, that we’re getting the wrong moral out of it by not thinking critically about what’s going on. Obviously, from the fact that I take ADHD medication, I think it is a good thing, often necessary, often useful. In this I include stimulants (even though for unrelated reasons I’m not on stimulants). So of course, I get a different take-away from this story.

This is difficult to explain, because I do know that Adderall and other ADHD medications do sometimes have unwanted personality-altering side effects. I am also not sympathetic with the villain in the story – I am also not a fan of the near-performative overconformity of parochial schools, nor am I enamoured of “strict” or overly “disciplined” environments for children, no matter how their brains work. But in spite of all of these caveats, I still don’t buy into the premise that, in this story, the school used medication to “turn the child into a zombie.”

Here’s the key point: In this story, generally, all the children, medicated or not, are eventually turned into zombies. Normally, whether spelled out or implied, we understand that the school or teacher only has to resort to medication for its zombification for one child, or perhaps a few children. What zombifies the other children? Or, from the school’s perspective, makes them well-behaved? It is not the ADHD medication that makes the unmedicated children behave “like a zombie,” or even the medicated children, but rather some form of social pressure.

So why doesn’t the social pressure work on the protagonist of the story without medication? I do think if they act differently than all the other children without ADHD medication, and then the same as the other children with ADHD medication, that probably means they do have ADHD. And if there’s so much social pressure on these children that all the other ones behave in an orderly fashion, and the ADHD child does not, that probably really is bad for the ADHD child. They’re probably not enjoying their flowery personality in such an environment. They certainly still get all the downsides of the social pressure – without the upside of even having the ability to conform to it.

See, most children are capable of behaving at various levels of enthusiasm and mutedness, chaos and orderliness. The other children in the class know that, in this school, they are expected to behave a certain way. The ADHD child surely also knows that, but they find that they cannot. Their unmedicated behavior isn’t some flowering of their true self or rebellion in favor of being human – it’s a sign that they can’t do something the other children can. It’s a sign of their disability.

The fact that the child’s behavior changes with the medication takes on a new interpretation in this context. The medication doesn’t turn the child into a zombie; it gives the child self-control. In the social situation of a “strict” school, the child chooses to use that self-control to conform and act as a “zombie” – for the same reason the other children are conforming.

Here’s where that matters: Imagine what the child could do with that medication, and that improved self-control, in another environment! ADHD isn’t just about whether a child is frustrating to overly strict teachers – that’s just one outward effect, and a relatively minor one at that. In another environment, they will be able to show their personality (like other kids would), but will also be able to use that self-control to accomplish their goals. When they’re older, they’ll be able to finish larger projects, pursue their interests, and live more satisfying lives, because far from being overblown or made up, ADHD is a serious disorder that affects much more than the ability of children to become obedient in service to strict adults.

And, of course, if left untreated, many children with ADHD will have difficulty actually behaving well even in a non-strict environment. If a school is so strict it makes the children into conformist zombies, it has gone too far, but children do in fact need some level of discipline, to prevent them from doing harm to themselves and to others. In many cases, the medication not only helps the children conform to overly strict authority figures, but also to reasonable ones, a goal we should all be on board with.

So, if ADHD medication is called for, for yourself or for your children, please don’t avoid it because of this trope.

Now, I’m not a doctor, and I don’t mean to say that you shouldn’t be careful with medication. Zombie-like feeling and behavior can indeed be a sign of bad dosage or a bad medication match. But if by “zombie” you mean what the school would call “good behavior,” there is another, perhaps even more likely explanation: that the social pressures were such that any child who could behave like that would, and now the ADHD child can.

And here’s one way I know something about this: Because I had similar concerns when I started medication myself, as an adult. I asked my friends to pay careful attention to my personality, and whether anything about it changed. I was very concerned about inadvertent personality changes, and wanted my friends to pay special attention to that – while I paid attention to it in myself as well.

The dreaded personality changes never came. But some non-dreaded personality changes did come, with the increased self-control. I became less anxious, and less likely to randomly demand that my friends explain to me how they don’t hate me. And as I gradually increased the dosage – as you have to do with Strattera – I noticed other changes, changes that might have potentially been seen by some as negative or concerning, but which from an internal perspective were clearly positive.

What do I mean by this? Let me tell you what sort of changes I’m talking about. For example, I’ve been less outgoing. I’ve been less outgoing in the literal sense of going out less, and also in the sense of spending more of my at-home time alone, rather than on the phone. But this isn’t because I enjoy those things less; rather, it’s because I’m enjoying my alone time more. It’s because I’m better able to leverage my planning and self-control skills towards goals, goals like saving money and not eating too much.

See, I have historically had so much trouble doing things at home on my own. I have a lot of things I’d love to do more of, things that I clearly know how to do and can do, but which I only get myself started on if other people are around. Leveraging the presence of other people is a common ADHD coping technique known as “body doubling,” and it is one of many techniques to do the types of tasks which ADHD makes difficult – which at some level, is most tasks.

So before I went on medication I would spend as much time out and about as possible. Need to do work? Go to the coffee shop. Need to read a book? Go to the bar. Need to figure out some thoughts? Discuss them with someone. Need to clean my house? Invite someone over to clean it, or even just to hang out with me while I clean it – that works too, and makes me feel a little less bad. I could do with just the encouragement, similar to a personal trainer who may be there more to nag you into exercising than actually educate you about it in any way.

But now I’m medicated, and I’ve finally found a dosage that works for me. It’s not 100% better, of course, but it’s a vast improvement. And that means I’m suddenly getting work done at home. That means I’m occasionally even cleaning my own house. That means that I’m suddenly actually getting use out of my alone time – and so I am taking much more of it.

But sometimes, I worry that this may be a personality change, and a bad one at that. Am I losing my charm? My outgoing nature? I briefly get trapped by the narrative above, which I have heard so many times, and I think, “Oh no! My ADHD medication has made me boring.” But then I realize I could, if I so chose, go out as much as I used to. It’s just that the other options got better. The medication is just helping me.

And I am so grateful for it. Before, it was like I had a menu of fun things I could do for no cost in terms of extreme effort (basically all social), and also a menu of fun things that I’d theoretically like to be able to do, but that were so difficult to wrangle myself into doing that they were out of the question, beyond special occasions or situations where other people were around. Making myself clean or even practice piano had become analogous to going to a nice restaurant very occasionally to treat yourself. I had to build my day-to-day life out of the easy tasks, which were mostly the social tasks – not exactly fun, and confusing for the people around me.

But now, the whole menu of activities is available to me all the time – or at least more of it. The harder tasks still take some wrangling, but the wrangling is way easier. That means I stay in more, and there are probably other changes in my personality that in isolation seem negative, both my apparent personality and the way I approach things. But these changes are usually because I can actually accomplish my goals.

And so, I am deeply grateful for my medication, and that is why I am so sad when ill-thought-out narratives perpetuate stigma against medication, especially for children who can’t make their own medication decisions. The effects of ADHD medication can be wide-ranging and complicated, so it’s important to think critically in evaluating them. Anti-medication narratives are often emotionally compelling but ultimately oversimplified, ignoring alternative explanations for what happens. So it’s important to actually think them through, and pull them apart.

Hopefully reading this has provided some practice doing so.

Walk-Through: Prefix Ranges in Rust, a Surprisingly Deep Dive

2023-06-24T00:00:00+00:00

Update: Arvid Norlander has gone through the trouble of refactoring this code into a crate and publishing it. Thank you, Arvid!

Rust’s BTreeMap and corresponding BTreeSet are excellent, B-tree-based sorted map and set types. The map implements the ergonomic entry API, more flexible than other map APIs, made possible by the borrow checker. They are implemented with the more performant but more gnarly B-tree data structure, rather than the more common AVL trees or red-black trees. All in all, they are an excellent piece of engineering, and an excellent standard library feature.

But they aren’t perfect, as I learned recently when I had a very specific operation that I needed to perform on one. I scanned the method lists diligently, trying to find the one I needed, but it was not there. range was close, but not quite there, and so I would simply have to implement the operation by hand. range is defined based on a start key (where, at our option, it includes keys that are greater than or equal to that key, or strictly greater than that key) and an end key (where the keys in the range are either less than or equal, or strictly less than that key).

Here is an example of the use of range:

let set = {
    let mut set = BTreeSet::new();

    set.insert("ABC");
    set.insert("DEF");
    set.insert("DEG");
    set.insert("HIJ");
    set.insert("KLM");
    set.insert("NOP");

    set
};

for elem in set.range("DEF".."N") {
    println!("{elem}");
}

The output starts with "DEF" and continues in order through the set, but does not include "NOP", as that is greater than "N" (lexicographically and therefore according to &str’s Ord instance). If "N" were in the set, it would not be printed, as .. is exclusive on the right side. ..= would include it.
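The range syntax can only make the start inclusive. To get the "strictly greater than the start key" option described earlier, range also accepts a pair of Bound values. Here is a minimal sketch of that form; the helper names example_set and after_def_through_nop are made up for illustration:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Builds the same six-element set used in the example above.
fn example_set() -> BTreeSet<&'static str> {
    BTreeSet::from(["ABC", "DEF", "DEG", "HIJ", "KLM", "NOP"])
}

/// Keys strictly greater than "DEF" and less than or equal to "NOP".
fn after_def_through_nop(set: &BTreeSet<&'static str>) -> Vec<&'static str> {
    // The turbofish pins the key type, since a tuple of `Bound<&str>` could
    // otherwise be read as bounds over either `&str` or `str`.
    set.range::<&str, _>((Bound::Excluded("DEF"), Bound::Included("NOP")))
        .copied()
        .collect()
}
```

With the example set, this should yield "DEG", "HIJ", "KLM" and "NOP": the start key is skipped and the end key is kept.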

Maps and sets: A brief aside

This discussion only concerns the keys of a map. For simplicity’s sake, throughout the discussion, I’ll be using BTreeSet, a wrapper around BTreeMap for when there are just keys (that are still unique and sorted) and no values. Internally, it contains a BTreeMap with the zero-sized struct SetValZST as its value type.

The Problem

But that isn’t the exact operation I needed. I needed all of the keys (which were also String) that started with a certain prefix. So, if the set was as in the example above, and the prefix was "DE", this operation would give me "DEF", "DEG". As you can see from the example, and as is easy to prove in general, when the keys are sorted, all the keys starting with a prefix form a contiguous range. But it is not a range that can be expressed with the range operation.

It’s close, tantalizingly close. Due to the definition of Ord on String, our prefix-based range starts with the first key that is greater than or equal to the prefix, as strings starting with a prefix always compare greater than or equal to the prefix. This side of the range is therefore expressible with the range operation.

It’s the other side that causes the problem. We don’t have a key where all the keys in the prefix are less than that key. We know that once we hit a key string that doesn’t start with the prefix, it must be greater than all the keys that do, as must all subsequent ones, but we cannot express this bound easily in terms of the prefix. We would need an element that is either the greatest possible key that starts with that prefix, or else the least possible key that does not.

There is a lot of efficiency to be gained by taking advantage of the fact that the range we want is contiguous, which is why the range method exists. But there is no operation that covers this scenario, because of the narrowness of how the range operation is defined.

On the one hand, this is frustrating. We are so close to being able to do this straightforwardly with the provided operations. It also seems like it would be more performant to determine the bounds of that range by doing a tree search, rather than trying to implement this operation by hand. Without this operation being available, we seem doomed to slowness.

On the other hand, it’s understandable. The key type of a map is only really expected to implement the Ord trait, and nothing about Ord has anything to do with prefixes. Creating ranges with range was allowed, but based on inclusive and exclusive bounds, which is to say, purely based on ordering of opaque elements. Evaluating a prefix as a range, on the other hand (or even merely proving that the keys forming a prefix do indeed constitute a contiguous range) would be outside of the scope of the operations represented by the Ord trait.

So I needed a way of getting keys that start with a specific prefix. So what did I do? I simply coded a manual form of the operation, looping starting from the beginning of the range, and checking each iteration whether we’d left the range yet:

for key in keys.range::<String, _>((Bound::Included(prefix), Bound::Unbounded)) {
    if !key.starts_with(prefix) {
        break; // We've gone past the end of the range
    }
    // ... Actually do something with the key
}

This seemed reasonable enough. My colleagues asked me to put in a comment to clarify that, since the map was sorted, all the items with a prefix would be contiguous, and therefore break was correct and not continue. It worked, and was performant enough for my purposes in writing the code, but perhaps not as much as ideally could be achieved. I couldn’t help but wonder if it could be made a little more performant if it were part of the standard library, if we had insight into and ability to access the inner structure of how a BTreeSet is laid out. Obviously, in such a case the code would also be more concise, and (more importantly) obviously correct, without need for a comment.

The performance considerations, if present, however, would be minimal. Looping through a BTreeSet is a reasonable operation, and I took advantage of the fact that my range was contiguous to stop once we’d gone past the last item. At best, explicit library support for prefixes would simply detect this condition slightly sooner, further up in the tree, without having to actually find the node with the offending item.
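For reference, the manual loop can be wrapped into a small reusable function. This is only a sketch under my own naming (keys_with_prefix is hypothetical), and it uses take_while in place of the explicit break, relying on the same contiguity argument:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Collects every key in `keys` that starts with `prefix`.
fn keys_with_prefix<'a>(keys: &'a BTreeSet<String>, prefix: &str) -> Vec<&'a String> {
    // Start at the first key >= prefix; `str` is used as the lookup type so
    // that callers can pass a plain `&str`.
    keys.range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        // Matching keys are contiguous in a sorted set, so the first
        // non-matching key ends the run (the `break` in the loop above).
        .take_while(|key| key.starts_with(prefix))
        .collect()
}
```

The work done is the same as in the loop: one tree search for the start, then one comparison per matching key plus one for the first key past the end.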

The next bit of code I wrote was for a closely related operation: dropping values outside of the prefix. What I wrote seemed like it definitely would be substantially less performant than a specially coded operation from the standard library would be. It certainly was harder to prove correct:

fn prefixed(mut set: BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    let mut set = set.split_off(prefix);

    let not_in_prefix = (&set).iter().find(|s| !s.starts_with(prefix));
    let not_in_prefix = not_in_prefix.map(|s| s.to_owned());
    if let Some(not_in_prefix) = not_in_prefix {
        set.split_off(&not_in_prefix);
    }

    set
}

This uses two calls to split_off, which like range needs a concrete T, a concrete String, to serve as a comparison-point for where to split. And it is certainly less performant than a dedicated method would have been, as it also uses a call to find to find a concrete String for the end of the range, which constitutes an additional loop through all the strings in the range.

Questions

This raised two questions in my mind:

Is there a way to convert a prefix into a range that can be used with range and split_off? More concretely, is there a way to construct a String such that it is the least possible String that is still greater than all the possible strings that start with our prefix, but less than or equal to all strings that do not? Would doing so in fact improve performance?
How hard would it be to add this feature to the standard library, both for iterating and for splitting the set?

In this blog post, we will focus on the first question. The second question is reserved for a future blog post.

Testing `prefixed`

The prefixed function needs the optimization more than the loop, so we’ll focus on that in our discussion. And as we’re discussing an optimization of the prefixed function, and as it is in any case a gnarly function, we will want to write some unit tests for it.

Here’s one example:

#[test]
fn it_works() {
    let set = {
        let mut set = BTreeSet::new();
        set.insert("Hi".to_string());
        set.insert("Hey".to_string());
        set.insert("Hello".to_string());
        set.insert("heyyy".to_string());
        set.insert("".to_string());
        set.insert("H".to_string());
        set
    };
    let set = prefixed(set, "H");
    assert_eq!(set.len(), 4);
    assert!(!set.contains("heyyy"));
}

This probably isn’t enough. Additional unit tests will be left as an exercise to the reader.

Constructing an upper bound

So, let us return to our example. In our example, the prefix was "DE". As discussed, the lower bound is easy: Everything that starts with a "DE" is greater than or equal to "DE". Strings outside of the range to the left will not:

println!("{}", "DD" >= "DE");       // Prints "false"
println!("{}", "DE" >= "DE");       // Prints "true"
println!("{}", "DEF" >= "DE");      // Prints "true"
println!("{}", "DEG" >= "DE");      // Prints "true"
println!("{}", "DF" >= "DE");       // Still prints "true" -- need something
println!("{}", "NOP" >= "DE");      // Still prints "true" -- need something

The upper bound is also easy enough, actually – we just need to increment the last character. Anything that starts with a "DE" will also compare strictly less than "DF":

println!("{}", "DE" < "DF");        // Prints "true"
println!("{}", "DEF" < "DF");       // Prints "true"
println!("{}", "DEG" < "DF");       // Prints "true"
println!("{}", "DF" < "DF");        // Prints "false"
println!("{}", "NOP" < "DF");       // Prints "false"

This seems easy enough to handle. We just need to write a function that increments the last character in a string, something with this signature:

fn upper_bound_from_prefix(prefix: &str) -> String;

Incrementing the last character in a string seems like it’s just a matter of incrementing the last byte, so let’s see what that looks like:

fn upper_bound_from_prefix(prefix: &str) -> String {
    let mut prefix = prefix.to_string();
    unsafe {
        // SAFETY: It is not. ☹️. XXX
        let prefix_bytes = prefix.as_bytes_mut();
        prefix_bytes[prefix_bytes.len() - 1] += 1;
    }
    prefix
}

Well, that’s not good. It passes the unit test I wrote, but that’s because we need to write more unit tests. Unfortunately, like many programmers before us, we have forgotten about UTF-8. Rust requires all its strings to be stored as valid UTF-8 as a safety invariant. Fortunately, because we’re using Rust, we notice that we’re violating this invariant when an operation we have to invoke is marked as unsafe.

In order to capture this failure, we would have to write a unit test where the prefix ends in a multi-byte Unicode character. Unfortunately, because this is a safety issue, the test might not even fail (but it might be worth doing as an exercise anyway).

That isn’t even to mention the possibility that the prefix is empty, which would result in a panic in this code!
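Both failure modes can be observed without any unsafe code by doing the byte increment on a Vec<u8> and asking String::from_utf8 whether the result is still valid. This is a sketch for experimentation only, and the function name is made up:

```rust
/// Increments the last byte of `s` and reports whether the result is still
/// valid UTF-8. Returns `None` for an empty string, or if the last byte
/// could not be incremented without overflowing.
fn last_byte_increment_is_valid(s: &str) -> Option<bool> {
    let mut bytes = s.as_bytes().to_vec();
    // An empty string has no last byte: this is the panic case above.
    let last = bytes.last_mut()?;
    *last = last.checked_add(1)?;
    // `from_utf8` validates the bytes instead of trusting them.
    Some(String::from_utf8(bytes).is_ok())
}
```

For "DE" this reports a valid result. For "\u{E9}" (bytes C3 A9) it also happens to be valid, since C3 AA is another character, which illustrates why a test might not fail. For "\u{BF}" (bytes C2 BF) the increment produces C2 C0, which is not valid UTF-8.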

So, how can we get the last character of a string? get allows us to do substrings with byte indexes, but returns None if it is not a valid substring. We can loop backwards until we find an index that works for the split, and we can return an option in case the string is empty:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0 .. prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            // ???
        }
    }

    None
}

But that gives us two strs, and we want to increment a char. So we have to extract the singular char from the last_char_str, which we know to have exactly one char in it. Looking over the operations of str, we have only one real option:

let last_char = last_char_str
    .chars()
    .next()
    .expect("last_char_str will contain exactly one char");

Walking Through `char`s

But once we do have a char, we cannot simply do + 1 on it. This operation isn’t defined on a char. And before you say that we should convert it to u32 and back, you should know that the operation is left undefined on char for a reason. chars are supposed to remain valid Unicode scalar values, that is, code points outside the surrogate range 0xD800 to 0xDFFF, and a bare + 1 could land in that gap or past the maximum.
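To make the problem concrete, here is a small sketch of the naive round trip through u32, using the checked conversion char::from_u32 so that invalid results show up as None rather than as undefined behavior. The name naive_next_char is made up:

```rust
/// Naive increment through `u32`; `None` when the result is not a valid `char`.
fn naive_next_char(c: char) -> Option<char> {
    // `c as u32` is at most 0x10FFFF, so the addition cannot overflow.
    char::from_u32(c as u32 + 1)
}
```

This works for 'a', but returns None for '\u{D7FF}' (the next value, 0xD800, is a surrogate) even though there are plenty of valid chars after it, and also for char::MAX, where there really is nothing after it. The two cases need to be told apart.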

So, we must do something else that will skip over invalid code points. There is no obvious operation in char that will do it, but if we look in the “Trait Implementations” section, we find something that looks potentially relevant: Step. And looking at char’s implementation of Step, we see the exact function we want:

fn forward_checked(start: char, count: usize) -> Option<char> {
    let start = start as u32;
    let mut res = Step::forward_checked(start, count)?;
    if start < 0xD800 && 0xD800 <= res {
        res = Step::forward_checked(res, 0x800)?;
    }
    if res <= char::MAX as u32 {
        // SAFETY: res is a valid unicode scalar
        // (below 0x110000 and not in 0xD800..0xE000)
        Some(unsafe { char::from_u32_unchecked(res) })
    } else {
        None
    }
}

Unfortunately, this gives us an Option. Why? Well, you can see that from the code: What if last_char is the highest possible Unicode code point, 0x10FFFF, also known as char::MAX? We’re going to procrastinate handling this (admittedly rare) situation, and panic for now. Spoiler: Fortunately, there is a solution, which we will discuss later.

This is a great example of why Rust is great. Because this operation is defined to return an Option, we have to explicitly say what we’re doing in case it returns None. We don’t even have to have a unit test for 0x10FFFF code-points in our prefix to realize that we have to cover this case (although now would be a great time to write one).

Also unfortunately, we can’t directly call forward_checked … not if we want to use stable Rust, in any case. It’s marked as a nightly-only “unstable API.” Fortunately, however, we can access it indirectly, through the Range API. Some rooting around in the standard library reveals that nth, on an iterator on a closed range, calls forward_checked, yielding:

let last_char_incr = (last_char ..= char::MAX)
    .nth(1)
    .expect("XXX fixme: can't handle highest possible codepoint");

This actually works, with the caveat of handling char::MAX set aside. All my unit tests except my 0x10FFFF one pass. Altogether, here is the state of things: We have a prefixed function that uses this to call split_off with an appropriate value, without iterating through all the strings in range in the set:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0..prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            let last_char = last_char_str
                .chars()
                .next()
                .expect("last_char_str will contain exactly one char");
            let last_char_incr = (last_char..=char::MAX)
                .nth(1)
                .expect("XXX fixme used highest possible codepoint");

            let new_string = format!("{rest_of_prefix}{last_char_incr}");

            return Some(new_string);
        }
    }

    None
}

pub fn prefixed(mut set: BTreeSet<String>, prefix: &str) -> BTreeSet<String> {
    let mut set = set.split_off(prefix);

    if let Some(not_in_prefix) = upper_bound_from_prefix(prefix) {
        set.split_off(&not_in_prefix);
    }

    set
}
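The increment step used in the listing above can also be checked in isolation. A minimal sketch, with next_char as a made-up name:

```rust
/// Next valid `char` after `c`, or `None` if `c` is `char::MAX`.
fn next_char(c: char) -> Option<char> {
    // Element 0 of the inclusive range is `c` itself; element 1 is its successor.
    (c..=char::MAX).nth(1)
}
```

Unlike a round trip through u32, this steps over the surrogate gap: the successor of '\u{D7FF}' comes out as '\u{E000}', and only char::MAX yields None.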

Cleaning Up the Edge Case

OK, now that we’ve got something that (kind of) works, it’s time to do some clean-up.

So, first, of course, we should address the XXX fixme, the 0x10FFFF case. So what do we do in that case? Well, if we use X to stand in for this “highest code point character”, we can reason about it a little.

Let’s say the prefix is "deX". In order for something to be out of the range of the prefix, it can’t start with "deY", as there is no 'Y' character greater than 'X'. So, it would have to differ on the previous character. It would have to start with "df" or greater.

So, if our prefix ends with this special character, we can simply drop it, and move one character back, and increment that character instead. Strangely enough, that just means going through our for loop again (and no, I did not plan this). See, if we keep going backwards to find another character to increment, we’ll get the previous character. Our way of extracting characters from the suffix works even if there’s more than one character in the second substring – it’ll just get the first character, which is exactly what we want.

So we can actually write:

let Some(last_char_incr) = (last_char ..= char::MAX).nth(1) else {
    continue;
};

Adding some comments to explain, and adjusting existing code to no longer lie to the reader (last_char_str might now contain more than one character) we get this:

fn upper_bound_from_prefix(prefix: &str) -> Option<String> {
    for i in (0..prefix.len()).rev() {
        if let Some(last_char_str) = prefix.get(i..) {
            let rest_of_prefix = {
                debug_assert!(prefix.is_char_boundary(i));
                &prefix[0..i]
            };

            let last_char = last_char_str
                .chars()
                .next()
                .expect("last_char_str will contain at least one char");
            let Some(last_char_incr) = (last_char ..= char::MAX).nth(1) else {
                // Last character is highest possible code point.
                // Go to second-to-last character instead.
                continue;
            };
            
            let new_string = format!("{rest_of_prefix}{last_char_incr}");

            return Some(new_string);
        }
    }

    None
}

If our string contains only copies of this highest possible code point, this returns None, which is appropriate because there will be no strings greater than the strings prefixed with these characters, just like there’s nothing that comes after names that start with “Z” in alphabetical order, nor anything that comes after names that start with “Zz”.
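As a cross-check on the edge cases, here is an alternative formulation of the same idea that walks the characters backwards with char_indices instead of probing byte offsets with get. This is my own variant for illustration (the name upper_bound_via_char_indices is hypothetical), not the version used in the rest of this post:

```rust
/// Hypothetical variant of `upper_bound_from_prefix` built on `char_indices`.
fn upper_bound_via_char_indices(prefix: &str) -> Option<String> {
    // Walk the characters from last to first, along with their byte offsets.
    for (i, c) in prefix.char_indices().rev() {
        // `nth(1)` on the inclusive range yields the next valid `char`,
        // or `None` when `c` is already `char::MAX`.
        if let Some(next) = (c..=char::MAX).nth(1) {
            let mut bound = String::with_capacity(i + next.len_utf8());
            bound.push_str(&prefix[..i]);
            bound.push(next);
            return Some(bound);
        }
        // Otherwise drop this character and try the one before it.
    }
    // Empty prefix, or nothing but `char::MAX`: there is no upper bound.
    None
}
```

Under this sketch, "DE" maps to "DF", "H\u{10FFFF}" maps to "I", and both the empty string and a string made only of char::MAX map to None.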

Note that if we want to save the other sets that are created by split_off, we can. We can easily modify this function to return all three sets: The set of keys that come lexicographically before the prefix, the set that starts with the prefix, and the set of keys that come after the keys that start with the prefix.
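A sketch of what that three-way version might look like follows. To keep the example self-contained, the upper bound is taken as a parameter; the assumption is that the caller passes in whatever upper_bound_from_prefix returned for the same prefix. The name split_by_prefix is made up:

```rust
use std::collections::BTreeSet;

/// Splits `set` into (keys before the prefix range, keys starting with
/// `prefix`, keys after the range). `upper_bound` is assumed to be the value
/// produced by `upper_bound_from_prefix(prefix)`.
fn split_by_prefix(
    mut set: BTreeSet<String>,
    prefix: &str,
    upper_bound: Option<&str>,
) -> (BTreeSet<String>, BTreeSet<String>, BTreeSet<String>) {
    // Everything >= prefix moves into `matching`; `set` keeps the rest.
    let mut matching = set.split_off(prefix);
    // Everything >= upper_bound moves into `after`, if a bound exists.
    let after = match upper_bound {
        Some(bound) => matching.split_off(bound),
        None => BTreeSet::new(),
    };
    (set, matching, after)
}
```

The only difference from prefixed is that the two sets it used to discard are returned instead.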

Performance

This code certainly hasn’t been optimized to the fullest extent possible. If that were the aim, we probably would want to do some more extreme optimizations, like working with Vec<u8> rather than Strings, and checking if they were valid UTF-8 only at the point when it is necessary (if it in fact is necessary for our application). Or, alternatively, we might want to fork the standard library’s B-tree implementation and actually add this operation. Both of these are gnarly, but if the absolute best possible performance were truly our goal, they would both be in scope.

But I am reserving that for a future blog post. Detailed profiling of different implementations of this operation would require that level of optimization to be fully interesting and is therefore also reserved for a future blog post. Instead, here, I will walk through some informal reasoning about the performance of this new implementation of prefixed, and whether it is also useful for iteration rather than splitting off a new set.

So, let’s do some back-of-the-envelope reckoning. In creating this upper bound, we had to reconstruct the prefix string, which costs us an allocation as well as a string copy. In exchange, we saved an extra call to find, which might have had to loop over many, many strings that start with this prefix. We can expect this implementation of prefixed to be more performant, therefore, in situations where there are many strings that start with the prefix (and the prefix is not pathologically long).

For iterating over the range, however, we would be making an allocation, and only potentially saving us some walking through the tree. Given that allocations are expensive (and potentially also involve some amount of walking around memory), it’s probably not going to be worth it unless the tree is extremely large.
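For completeness, this is roughly what the iteration variant would look like if we did decide to pay for the allocation. Again the upper bound is passed in as a parameter so the sketch stands alone; the assumption is that it comes from upper_bound_from_prefix. The name prefix_range is made up:

```rust
use std::collections::{btree_set, BTreeSet};
use std::ops::Bound;

/// Iterates over the keys that start with `prefix`, given a precomputed
/// exclusive `upper_bound` (assumed to come from `upper_bound_from_prefix`).
fn prefix_range<'a>(
    set: &'a BTreeSet<String>,
    prefix: &str,
    upper_bound: Option<&str>,
) -> btree_set::Range<'a, String> {
    // No upper bound means every key from the prefix onward matches.
    let end = match upper_bound {
        Some(bound) => Bound::Excluded(bound),
        None => Bound::Unbounded,
    };
    set.range::<str, _>((Bound::Included(prefix), end))
}
```

With this form, no starts_with check is needed per key, since both ends of the range are resolved up front.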

A Warning unto the Test-Shy

In an earlier draft of this post, I had the following code to increment a char rather than what I wrote above:

(last_char ..).nth(1)

This seems like it should work, in spite of having no upper bound. It stands to reason that char::MAX would, in such a case, serve as an implicit upper bound. It does still return an Option<char>, and when would None happen if not in such a situation?

But fortunately, I had a test case:

#[test]
fn maxicode() {
    let set = {
        let mut set = BTreeSet::new();
        set.insert("Hi".to_string());
        set.insert("Hey".to_string());
        set.insert("Hello".to_string());
        set.insert("heyyy".to_string());
        set.insert("H\u{10FFFF}eyyy".to_string());
        set.insert("H\u{10FFFF}".to_string());
        set.insert("I".to_string());
        set.insert("".to_string());
        set.insert("H".to_string());
        set
    };
    let set = prefixed(set, "H\u{10FFFF}");
    assert_eq!(set.len(), 2);
    assert!(!set.contains("I"));
}

This test case, in that earlier code, actually panicked! It turns out that in the case of an open-ended range like (last_char ..), which results in a value of the type RangeFrom, it is simply assumed that going forward is possible. Instead of calling forward_checked, its nth method calls forward:

#[inline]
fn nth(&mut self, n: usize) -> Option<A> {
    let plus_n = Step::forward(self.start.clone(), n);
    self.start = Step::forward(plus_n.clone(), 1);
    Some(plus_n)
}

And in forward, every None is converted into a panic:

fn forward(start: Self, count: usize) -> Self {
    Step::forward_checked(start, count).expect("overflow in `Step::forward`")
}

Conclusion

I hope you enjoyed this walk-through. You can find the final version of prefixed and two test cases here.

Please let me know what you think of this format in the comments. Also let me know if you have any follow-up topics you want me to explore, or other problems you would want walk-throughs of.

And, of course, please feel free to provide corrections and even nit-picks!

Fiction Review: The Long Way to a Small Angry Planet

2023-06-16T00:00:00+00:00

I already enjoyed the Monk and Robot series by Becky Chambers (A Psalm for the Wild-Built and A Prayer for the Crown-Shy). It’s now one of my favorite series, so I was excited to also read her earlier work, the Wayfarer series, starting with The Long Way to a Small, Angry Planet, and it did not disappoint me.

Both these series are science fiction. While Monk and Robot is solarpunk, a relatively new sub-genre focused on imagining a world with major environmental (and economic) problems solved, the Wayfarer series much more reminds me of the kind of science fiction I used to read as a kid. While it’s described as space opera, it reminds me more of Heinlein or Arthur C. Clarke or even Niven, who are considered hard sci fi. I’m not sure whether it gets the space opera label because it focuses less on accuracy and logic than those other authors, or because it does not do so at the expense of character development, or perhaps because it is written by a woman.

Nevertheless, in contrast to the classic “space opera” clichés, it does not focus on a war (though war is involved). In general, the stakes are far lower than that, involving ordinary people with ordinary jobs, interacting with and influencing events that affect the entire fictional universe in relatively minor ways – outsized for a completely normal person, but not “saving the world” or “overthrowing the evil empire” or other typical space opera fare.

The aliens and the civilization in general are also a lot more developed than in typical space opera, where aliens are typically humans with one twist each who live in normal human societies. Instead, it has the full “hard sci fi” range of alien eccentricities, full of philosophical exploration of how xenobiology might end up being and an intricate inter-species galactic social balance.

In fact, not only is our trusty space ship crew made up of regular people just trying to eke out a living, but humans in general are just a regular species, sidestepping tropes where humans are the best species or the most creative – or the most violent or the most evil. Our space crew isn’t even entirely human (though it mostly is), and we get a fair amount of explicit alien perspective.

Instead of humans, the privileged species that dominates galactic society, the equivalent of the colonizing (or post-colonizing but still quite privileged) white people in our world, are aliens who we would find quite repugnant, but who until recently dominated large swaths of the galaxy by force. Other species, including humans, try to emulate their ways and are proud to learn their prestigious language. Meanwhile, the lingua franca is not a human language, but most humans are forced to learn it.

This means that our mostly-human and human-led crew are normal not-particularly-privileged people in a normal not-particularly-privileged species. The sense of normalcy and “everyday folks” is refreshing in a genre normally dominated by the powerful and those who become or fight the powerful, and again, does not strike me as within the stereotype of “space opera.”

It does in some ways remind me of Serenity, partially because our trusty crew – with one special perspective character, their newest member, who is decidedly not a protagonist but merely a window into an ensemble cast – take on various jobs to make a living, and the storytelling takes the form of episodic vignettes that take place in the context of these jobs. While there is an overarching, overall plot, it is more like the season plot of an episodic TV show than the plot of a more tightly-woven novel. If anything, it could use a little more development, as the climax ends up feeling a bit abrupt.

The themes are perhaps another reason why it’s not considered hard sci fi even though it probably should be. Rather than an old man trying to evangelize libertarianism or some other weird form of conservatism (looking at you, Heinlein, though others are guilty) or explain how humanity will become a transcendent orgy hive mind (shockingly many Arthur C. Clarke books, and also Heinlein), Becky Chambers explores the meaning of family … and what to do with family members or colleagues who are obnoxious but indispensable. It explores issues like medical consent, or when is it okay to ally with another group, as opposed to when that is too risky.

All in all, I think it’s a good thing to have more diversity in science fiction than what I grew up with, namely a weirdly specific flavor of stodgy conservative white men – a flavor, to be clear, even more specific than that combination of adjectives alone would imply. It’s good to have that diversity even if some of that perspective confuses reviewers and marketers about what genre a book is in. I as my current self greatly enjoyed it, and I think my teenage self would have too, and there’s something uplifting about that concurrence.

Debt Ceiling, Redux

2023-05-26T00:00:00+00:00

So you might or might not be aware of the debt ceiling argument currently taking place in the US.

I’ve already written about this, but President Biden for some reason didn’t listen to me (perhaps because he doesn’t read my blog – which is disappointing). Other, more famous people have written about it too, but the President insists on pretending he has to make a deal with the Republicans.

So, to catch everyone up, here’s how this all works.

Congress passes a budget every year. This budget requires the President (and the bureaucracy he presides over) to spend money on various programs. Not “allows,” to be clear, but requires – in the vast majority of situations, the President does not have the discretion to not spend the money. The spending is required by law.

The President (or specifically, the Treasury Department) is also required to pay the US debt as it comes due. This is also required by law, and specifically, this obligation is enshrined constitutionally in the 14th amendment, section 4, which was added to the US Constitution shortly after our Civil War.

The money to do these things typically will come from two places: taxes, and debt auctions. The IRS can only collect so many taxes. But the Treasury can do debt auctions without any practical limitations.

However, we are now in a situation where the treasury is hitting a boundary where they supposedly cannot do debt auctions anymore, because there is also a law that purports to limit the total outstanding amount of debt the US has: the debt ceiling.

Now, I’m tempted to clarify that government debt is not like household debt, that it’s not really very much like a debt at all. There is no reason economically to have a debt ceiling. Too much government spending or not enough taxes can in some situations cause inflation, but it’s more complicated than any measure of how much debt there is.

But that’s actually beside the point here, because we can get to the same conclusion without it, which is that the laws as they stand force the administration into a contradictory position. The President – or rather, the Treasury – has a legal obligation to spend money on the budget and on paying the debt. The Treasury cannot collect more in taxes than it currently is, not without changing the laws. So it has a legal obligation to issue more debt to pay the current debt, from the budget and constitution. And from the debt ceiling, it has a legal obligation not to issue more debt.

The Treasury legally must issue more debt. The Treasury legally must not issue more debt. The two laws contradict. Which wins?

Well, if there are any legal options to make the contradiction go away, the Treasury is bound to try them. These take the form of issuing debt that, for whatever reason, doesn’t count toward the debt ceiling. For example, the Treasury could mint a $1,000,000,000,000 platinum coin. Coins are technically a form of debt, owed by the US Treasury to whoever holds the coin, but (in what is apparently an oversight) they do not count towards the debt ceiling. It’s an absurd loophole, but legally, the situation the Treasury is in otherwise is as absurd.

Or, less meme-like but equally useful, they could issue $0 face value bonds, which again would not count toward the debt ceiling.

To be clear, I’m not saying the Treasury could consider doing these things. I’m saying that the Treasury must do these things if the alternative is default, that it is legally and even constitutionally obligated to. I’m saying that if President Biden doesn’t choose one of those things, he is in violation of his oath of office, and not doing his job.

President Biden, however, and his treasury secretary, have ruled out doing those things. One hopes that they are lying rather than planning on betraying their country and its Constitution. They say that there is not time (this seems false), or specifically time to get it past the courts – which is not how I think courts work. How courts work is you do it, and then maybe courts yell at you. But given that President Biden is obligated to do one of these things, it seems pretty clear to me that doing it is better than not doing it.

The Biden administration argues that these things are unprecedented. But: You know what else is unprecedented, but actually clearly illegal? Default. It seems that President Biden would prefer if some deal is reached, but if a deal is not, he should be willing to do the legal unprecedented thing instead of the illegal unprecedented thing.

But let’s imagine he’s right. Imagine that these special debts for some reason were off the table. Then we’re back to the contradiction. The Treasury is obligated (by the budget and by the constitution) to create more debt, and forbidden (by the debt ceiling) to create more debt.

Certainly, the Constitution trumps the debt ceiling, at least to the extent that government spending is to pay existing debt. Perhaps it is only lawful to do so for that reason, and not to pay normal budgetary outlays, and we should do a normal government shut-down. That is what the Wall Street Journal’s editorial board argues.

But given that the budget is not generally seen to be discretionary, and the debt ceiling will be violated either way, this doesn’t hold water for me. Once you’re past the debt ceiling, in my mind, you’re past it. Once the debt ceiling gets in the way of the treasury paying existing debt, I think it’s entirely blasted away as unconstitutional. But even as a matter of statutory interpretation, the budget was passed more recently than the debt ceiling, is more specific, and doesn’t contain an exception accommodating it.

This is the argument the President and media refer to as “invoking the 14th amendment.” They all talk about it, again, as if it’s something the President may do, rather than what the President must do, what he is legally and morally obligated to do as part of the President’s Constitutional duties to the American people, under his oath of office.

Instead, for whatever reason, the President and the media are all talking as if the only way to avoid default is with a deal. They’re talking as if the debt ceiling actually did what the Republicans claim it does, that it causes a default when the ceiling is reached.

In my mind, this is already a violation of the President’s oath of office. This is a concession to the powerful forces, like the former President Donald Trump, that would like nothing more than to see our constitutional order destroyed, who tried to destroy it only a few short years ago. President Trump has said that he wants default, an unconstitutional outcome, and by conceding that it is even legally possible for that to happen, President Biden has failed to defend the constitution against that manifest insurrectionist.

And the media who talk about the 14th amendment as an unprecedented option, as if it is more tenuous than default, which is also unprecedented, are also abetting enemies of the constitution, hopefully merely out of confusion.

I respect Ezra Klein, but I disagree with his article. The Supreme Court may do what it wants, and they might be majority Republican appointees, but they’re not majority Trumpers.

But more importantly, default is even less of a debt ceiling plan. If no deal is reached, President Biden has no alternative. The Supreme Court may later decide to ruin the country, but that doesn’t mean the President can’t run it correctly in the meantime.

Just because it’s called “default” doesn’t mean we’re allowed to do it if the other options are bad, by default. It doesn’t mean that at all.

Of course, this might all be a negotiating stance. What all these people might actually be saying is that a deal with Republicans is better than a situation where one of these “untested” options is used. I hope – I really truly hope – that they are more or less lying, that if it comes to it, they’ll be in favor of minting the coin or ignoring the debt ceiling rather than actually letting a default happen. But it upsets me, disgusts me, that we’re making a deal with people who do not care about fiscal policy on the basis of fear of an outcome that is legally impermissible because of the threat of an unconstitutional law.

But if so, President Biden should not lie. If the Treasury is secretly studying other options, they should instead do it openly. He should have started all of this by asking for a law clarifying that the debt ceiling is unconstitutional and repealing it, and then shrugged his shoulders if he didn’t get one. The opposition in this case does not deserve the respect they are currently being given, led by their pro-default insurrectionist-in-chief, for whom the debt ceiling was raised three times.

And if you believe, genuinely, that the US needs to have a conversation about fiscal policy, the time to have that conversation is when passing the budget, not randomly according to an unconstitutional debt ceiling.

There is No One True Best Programming Language (but some are still better than others)

2023-05-24T00:00:00+00:00

I am no stranger to programming language controversy. I have a whole category on my blog dedicated to explaining why Rust is better than C++, and I’ve taken the extra step of organizing it into an MDBook for everyone’s convenience. Most of them have been argued about on Reddit, and a few even on Hacker News. Every single one of them has been subject to critique, and in the process, I’ve been exposed to every corner, every trope and tone and approach of programming language ~~debate~~ religious war, from the polite and well-considered to the tiresome and repetitive all the way to the rude and nonsensical.

There are two tropes in particular that many times have been proffered to me (or rather, levelled at me) about programming languages, two opposite errors that I would like to critique. I would say that I’d like to nip them in the bud, or respond to them once and for all, but I know the power of my blog is limited, so instead I’d just like to give my opinion on them, and explain why they are erroneous. Here are the errors:

There is one best programming language.
Every programming language has its place.

Error #1: There is one best programming language

Some languages have fans in the original sense of fanatic. Some languages inspire a level of devotion in programmers where they forswear other programming languages with an almost religious loyalty. These fanatics truly believe that the programming language is perfect, and that no other language can so perfectly capture the structure of computing and of algorithmic reasoning – or even be acceptable in light of the existence of a perfect programming language.

Any threat to this programming monolatry is then attacked as intrinsically irrational. After all, if everyone would just do the basic and obvious step of rewriting everything in this ideal programming language, then all bugs would be fixed. Then, “the wolf also shall dwell with the lamb, and the leopard shall lie down with the kid,” everyone will be immortal, and the messiah will come… And this, of course, is insufferable to normal people, who realize that programming languages are tools, not gods.

Rust, admittedly, brings this out in people. So does Lisp, and so does Haskell. And lest you think I’m exaggerating with the religious references, someone even wrote a Haskell book entitled To Kata Haskellen Evangelion, Biblical Greek for the blasphemous and hopefully tongue-in-cheek title The Gospel according to Haskell.

I know what you’re thinking; I can hear it in my head. You’re thinking: “You’re one to talk, Jimmy! The Coded Message is a Rust blog, and worse, a Rust evangelism blog! How dare you criticize when you’re one of the worst offenders?”

Nevertheless, in spite of what you might think, I don’t think Rust is the one true programming language. I think it’s ahead of other mainstream programming languages in terms of strong typing and functional features (key word “mainstream”), and I personally enjoy working on it full time, all true. But while I am a fan, I don’t think it’s perfect, or even unique in most of the ways it’s good.

Instead, I bring up this error for the reason I promised: Because it has been levelled against me. Early on, with my first Rust post, I wrote this statement (and see if you can see why it was controversial):

If you are a systems programmer, if you are used to C and C++ and to trying to solve systems programming types of problems, Rust is magical, just like when you learned your previous favorite programming language.

If you are not, Rust is overkill for your task at hand and you shouldn’t be using it. I earnestly recommend Haskell.

This got me quite a bit of anger on Reddit. One commenter was furious that I recommended Haskell, because they had tried to learn it in the past and had a bad time. Another tried to tell me I was being stubborn because the collected testimony of the Rust Reddit hadn’t somehow managed to override my 18 years of professional programming experience and convince me that garbage collection was not a necessary thing to have in a programming language sometimes.

And the key term there is Rust Reddit: There are some people there who think everyone should be writing Rust, even people who have every reason to benefit from a garbage collector and who have nothing to gain from the strictness of a borrow checker, because they think Rust is just the absolute best possible language. And the Rust sub-reddit does what any good echo chamber does, and brings out that vibe in every Rustacean.

But the echo chamber did not get me. Although I’ve moderated my opinion some – I’ve realized that there are some times where Rust beats out GC’d languages for applications outside of my narrow definition of systems programming, if only because it is both so mainstream and so successful at bringing in modern FP features – I still hold by my fundamental point:

Sometimes, indeed probably for most programming projects, Rust is the wrong choice. Just like I wouldn’t use Excel to do systems programming, I wouldn’t use Rust to keep track of splitting expenses on a trip.

Even for “serious” programming projects (whatever that means), sometimes, you simply do need a garbage collector. Sometimes, the semantics of Rust are too deep-cut or complicated to teach to the people you need to do your programming.

Heck, sometimes even existing infrastructure or existing legacy codebases or just existing skillsets are more important than what programming language features you have. Sometimes, Rust would take a re-write. And re-writing in Rust is not a panacea, or even always a good idea.

Error #2: Every programming language has its place

This one of course gets levelled against me far more often, especially in my Rust vs C++ debates. Most people realize programming languages are technical tools, and a skilled programmer can pick new ones up with relative ease. But some people act and talk instead as if, say, C++ programmers were an ethnic or religious group. If I call for the gradual deprecation and obsolescence of C++ in favor of Rust – while understanding that legacy code is a genuine concern that will be with us for decades – these people act as if I’m calling for crimes against humanity, saying Kumbaya-reminiscent statements like “All programming languages have their place.”

But of course, some tools are simply obsoleted by other tools. While Rust won’t serve your needs if what you really need is garbage collection, there are very few scenarios where C++ still beats Rust for new development. Sure, C++ has improved over time, but Rust doesn’t have a legacy to weigh it down, and so can actually do things right the first time.

Some people disagree with this in a way I respect, because of support for optimizing compilers, or the vagueness and immaturity of the semantics of unsafe Rust, or some other concrete reason where C++ has something to offer as a tool. Others simply live in worlds where too much code is in C++, and it would be impossible to migrate anytime soon, and that also makes sense to me. But I simply cannot take seriously an assertion that in some axiomatic way, reminiscent of the intrinsic value of all human beings, every programming language has its value.

Why should this be true? It’s not like which programming language someone uses is an intrinsic quality. I’ve changed from a C++ programmer to a Rust programmer, and so can you. Perhaps some of the people saying this are hobbyist programmers, asserting the right of people to enjoy C++ personally, and to program it as nerds. And that’s fair! But that’s also not what I’m talking about. I’m talking about what the best programming language is to use for projects that people will use in anger, where it matters whether a language is likely to lead to security vulnerabilities when used. If what programming language you use for such projects is a key part of your identity, then that’s not an OK way to structure your identity.

If it were true that all languages had their place and their value, would that mean that there should be shops writing in the obsolete versions of C++, like the original C with Classes? Would that mean that there should be shops writing code in INTERCAL? Would that mean that there are some situations in which it’s best to do greenfield development in COBOL?

One example of this trope is the famous essay "‘Considered Harmful’ Essays Considered Harmful", which has of course been cited to criticize my own “Considered Harmful” post (for more on the “Considered Harmful” trope, see the Wikipedia article). Ironically but unsurprisingly, “‘Considered Harmful’ Essays Considered Harmful” is dogmatic in exactly the way it criticizes, in spite of giving itself a (silly and ill-defended) out. In spite of recommending that “considered harmful” essays be replaced by “benefits and weaknesses” lists, or even “perceived benefits and weaknesses” lists, it does not follow its own advice. It does not list benefits of the “Considered Harmful” essays it considers harmful.

So I will fill in this deficit. “Considered Harmful” essays are good when a feature of a tool does indeed cause harm, and a better option is available – as is often actually the case. The title is a cliché, which is a good thing in this case: it signals to the reader, in a light-hearted way, what the thesis of the document is – as opposed to “benefits and weaknesses” lists which tend to be biased in any case and can amount to passive-aggressiveness. Weaknesses in one’s argument or benefits in one’s opponent’s argument can and should be acknowledged and addressed, but that doesn’t mean you have to pretend not to have a position. Just because something has some benefit doesn’t mean that it can’t, overall, be fairly considered harmful.

Indeed, my own post did do some “benefits and weaknesses,” in spite of being titled as a “Considered Harmful” essay. It did spend some time explaining why C++ made the decisions they did, and what the benefits of C++’s decisions were, even in the context of a post about why these decisions were considered harmful. C++ had to implement non-destructive moves for backwards-compatibility. They had boxed themselves into them, harmful as they are. That doesn’t make them any less harmful, but it does make them understandable.
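For readers who have not met the term, here is a minimal sketch of the distinction in Rust. The function names are hypothetical, and `std::mem::take` is only a rough analogue of a C++ non-destructive move (it leaves a valid, empty value behind), not a reproduction of C++ semantics.

```rust
/// Destructive move: after `let moved = original;` the compiler treats
/// `original` as gone, so no "moved-from" state exists at run time.
fn destructive_move() -> String {
    let original = String::from("hello");
    let moved = original;
    // println!("{original}"); // would not compile: use of moved value
    moved
}

/// Rough analogue of a non-destructive move: the source stays alive and
/// usable, but is left holding a default (empty) value.
fn non_destructive_analogue() -> (String, String) {
    let mut source = String::from("hello");
    let taken = std::mem::take(&mut source); // leaves String::default() behind
    (source, taken)
}
```

In the second function the caller gets back both the emptied source and the taken value, which is the kind of leftover state every C++ type with a move constructor has to account for.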

So I disagree with the people who have used that post to criticize me, and ask them why they don’t also turn the arguments of that post against itself. Perhaps I could write:

‘“Considered Harmful” Essays Considered Harmful’ Considered Harmful

The only problem with this would be how to punctuate it. That and, I’m sure it would widely be considered… quite silly.

Conclusion: Restatement and Summary

Programming languages are tools. They are important tools, so it’s good to make sure they are of high quality, and do the things we demand of them, because they are often asked to do critical tasks for society. They are also not to be conflated with the people using the tools, who can retrain on new tools if they’re worth their salt.

Tools should not be idolized, and tools cannot be perfect. It is impossible to make a tool that can serve any purpose equally well – programming language design, in particular, will always have trade-offs. However, it is possible to make a tool that loses to another tool in all categories, and that is what C++ will soon be in comparison to Rust, if it is not already there.

And C++ programmers have their place in the new Rust world – it’s very easy to learn Rust from a C++ background. And C++ history has its place there too – Rust builds on C++, and it wouldn’t have been possible without the contributions of those who worked on making C++ what it is. Everything that is community about C++, everything that is people, everything that has moral value, can be migrated to Rust.

But that doesn’t mean that C++, the tool, has a place in production programming beyond legacy (i.e. pre-existing) projects. Again, there still may be a few other valid reasons to favor C++ over Rust (though they’re getting fewer and weaker with time), but a bald assertion that “every language has its place” is not one of them.

x86S: A Long Time Coming

2023-05-23T00:00:00+00:00

Intel has just released a new white paper, where they discuss removing a lot of the legacy cruft of the Intel/AMD architecture they call Intel64. Only 64-bit operating systems – and a narrow set of 32-bit legacy apps that don’t use segmentation (a small subset in theory but basically all of them in practice) – will be supported. I am surprised at how excited I am, although after all this time perhaps the better word is “relieved.”

Finally, Intel computers will dispense with the illusion that the default mode is the DOS-compatible, 16-bit “real mode.” They will drop the conceit that modern memory protection, not to mention the ability to address more than 1MB of memory (approximately, yes I know about A20), is opt-in – which it currently, literally, is. All of the code to accommodate these legacy modes can be phased out. All of the circuitry and/or microcode to implement all of these legacy modes can be removed – though I’m sure Intel has had ways to keep it from doing too much damage, it definitely increased the complexity of their processors.
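The “approximately” and the A20 aside above both follow from how real mode forms addresses: a 16-bit segment is shifted left by four bits and added to a 16-bit offset. A minimal sketch of that arithmetic, with a hypothetical function name and a boolean standing in for the A20 gate:

```rust
/// Physical address for a real-mode `segment:offset` pair.
/// `a20_enabled` is a stand-in for the A20 gate: when false, address bit 20
/// is masked off and addresses wrap around at 1 MiB, as on the original 8086.
fn real_mode_address(segment: u16, offset: u16, a20_enabled: bool) -> u32 {
    let linear = ((segment as u32) << 4) + offset as u32;
    if a20_enabled {
        linear
    } else {
        linear & 0xF_FFFF // keep only the low 20 bits
    }
}
```

With A20 enabled, `real_mode_address(0xFFFF, 0xFFFF, true)` comes to `0x10FFEF`, a little under 64 KiB past the 1 MiB mark; with it disabled, the same pair wraps around to `0xFFEF`.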

This is one of the biggest tech debt paydowns I’ve seen in a long time. I have long felt about Intel architecture somewhat analogously to how Richard P. Gabriel, author of “The Rise of Worse is Better”, felt about C++ and Unix decades ago:

The good news is that in 1995 we will have a good operating system and programming language; the bad news is that they will be Unix and C++.

Similarly, I have always felt that Intel architecture would become reasonable someday, that it would gradually convert itself to something less absurd than its traditional state. I was excited when AMD (not Intel, note) came out with what Intel now calls Intel64, getting rid of segmentation in 64-bit mode and adding 8 sorely needed additional general-purpose registers (for a total of 16).

Now, finally, they’re phasing out the legacy modes. No more DOS on a modern PC (and it wouldn’t work anyway for other reasons). Good!

Like many tech debt paydowns of this magnitude and this level of historical relevance, it’s about the cognitive burden as much as it’s about the actual implementation or the actual code and circuitry to work around the complexity. We can now, slowly but surely, forget the arcane details of how things used to be.

It brings me a tinge of nostalgia, actually. 16-bit DOS programming was where I first learned assembly, at least to read it. Segmentation and the different processor modes were firmly in my awareness when I used a DOS computer with Windows 3.1 as a child. I remember playing with the edge cases, like “unreal mode” which was like real mode but where each segment could be addressed with 32-bit registers. Knowing the complexity of Intel architecture was relevant, and part of how I learned computer architecture in general.

But more recently, all of this knowledge has seemed overpresent. Too many times I’ve seen people assume Intel architecture and bring these old irrelevancies of PCs into conversation and even formal talks, assuming familiarity with not just operating system and systems concepts but the Intel-specific details of them. They’ll be talking about registers and you’ll see that instead of generic names like r3, r4, they’re talking about specific Intel registers. Or they’ll mention cr3 instead of generically saying “page table base register,” or “the syscall instruction” or even the obsolescent 32-bit int 0x80 instead of saying “issuing a syscall through a trap.”

The biggest example is how often I hear people talking about “ring 0” and “ring 3” when they should be saying “kernel mode” and “user mode.” The numbered rings are so jarringly and gratuitously Intel-specific. It makes me wonder if they genuinely think all processor architectures number protection rings or privilege levels like that (they do not), or if they think the intermediate rings between 0 and 3 are still relevant to modern OS design on Intel (they are not). Or perhaps they’re just okay with assuming Intel, ignoring the mobile and embedded worlds, and also bringing in an irrelevant, overengineered concept while they’re at it.
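To make the mapping concrete: on x86 in protected or long mode, the current privilege level sits in the low two bits of the CS selector, and only two of its four possible values matter in practice. A small sketch, where the enum and function names are hypothetical, translating the Intel-specific number into the generic vocabulary:

```rust
/// Architecture-neutral privilege names.
#[derive(Debug, PartialEq, Eq)]
enum Privilege {
    Kernel,
    User,
    /// Rings 1 and 2: defined by the architecture, but not the ones
    /// "kernel mode" and "user mode" refer to.
    UnusedIntermediate(u8),
}

/// Read the ring number from the low two bits of a CS selector value
/// and name it generically.
fn privilege_from_cs(cs_selector: u16) -> Privilege {
    match (cs_selector & 0b11) as u8 {
        0 => Privilege::Kernel,
        3 => Privilege::User,
        ring => Privilege::UnusedIntermediate(ring),
    }
}
```

For illustrative selector values, `privilege_from_cs(0x10)` gives `Privilege::Kernel` and `privilege_from_cs(0x33)` gives `Privilege::User`; the numbers 0 and 3 never need to leave the function.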

Maybe this will stop now that Intel is eliminating the unused rings 1 and 2. Maybe people will stop occasionally talking as if protected mode was an exceptional mode, now that it won’t be a mode at all, but the only way the processor runs.

Voice is Hard

2023-05-22T00:00:00+00:00

I was reading my ADHD blog post today, considering whether to send it to a friend, and it was surprisingly hard for me to bring myself to. I realized I was embarrassed at the voice, the phrasing, the lack of beauty in the individual words, all of which is something I paid relatively little attention to before – and which my friend, who also writes, will definitely notice.

It’s something I’ve paid less attention to than I should. “Writing is thinking” is my philosophy, and I have tons of thoughts that I know other people are interested in. Shouldn’t the structure of the thoughts, both the logical structure and the order in which they’re presented, be more important than voice? And I still believe they are – and yet voice does still matter.

I know this deficit has frustrated many of my writing friends. They know that when it comes to it, I can produce English with good voice, with solid and compelling rhetoric even. They know that because when I talk, especially when I’m passionate about a topic, or excited about the conversation, it comes out much smoother than when I write, much more poetic. How can this be, when I have more time to plan writing? How have I not leveraged my deeply cultivated conversational skills to be a better writer? How do these fluent spoken conversations and stilted written phrasings exist in the same person?

Perhaps it’s because when I’m writing, I’m using all of this extra planning ability for something besides voice, perhaps even contradictory to it. But am I even achieving … whatever this other thing is? Or am I simply overbaking all my statements with my focus on clarity and good structure, with no upside, letting my sentences sit for too long under my skeptical eye until they’re purified not only of any confusion and complexity but also any character?

Whatever the problem, my speaking remains unaffected. I could leverage this. There are so many YouTubers, so many podcasters… I might be better off doing one of those things, leaning on my natural (or at least far better cultivated) conversational tone, or my natural instructional, professorial voice, rather than my artificial, too in-the-head, downright overbaked writing style.

Or maybe I should just bite the bullet and do what multiple people have now recommended: the awfully intimidating and high-executive-function task of figuring out a sound recording workflow, where I speak what I want to write, and then listen and type it up. This might just work, as long as I’m okay with more organizational complexity, more phases of work in between outline and draft, and even more steps before a project can finally be deemed finished – if such a state is ever possible.

In any case, I’m committed to writing. I’m committed to continuing this blog, writing fiction, and finishing other on-going (secret!) writing projects. Ideally, I wouldn’t have to talk first and then write. Ideally, I could just sit down and words would flow through my fingers as naturally and as artfully as they come out of my mouth.

I don’t have an easy solution. I’ll continue to pay attention, both to my voice and other people’s, and read advice – and I even plan on doing the voice-recording thing at some point, if only just to try it. But most importantly of all, I think I just need a ton more practice, more low-stakes writing, and simply much more raw volume.

To this end, I’ve decided to commit to writing at least one writing prompt from 300 Writing Prompts daily, a journaling prompt book given to me a while ago by a dear friend. When I first got this book, I was confused, because the prompts in there don’t seem designed to lead to stories or writing ideas. But for raw writing practice, they’ll be perfect. And perhaps by communicating in yet another medium – the hand-written word, instead of the spoken or the typed – I will be able to develop more fluidity in all of the media through which words can be delivered.

A New Garden: Rust vs C++ mdbook

2023-04-24T00:00:00+00:00

Here it is, the Rust vs C++ mdbook.

I’ve wanted for a while to re-organize some of the content on my blog into gardens. I got the idea from the blog post “The Garden and the Stream: A Technopastoral”. Basically, some content is ill-suited to date-based, time-organized systems like blogs. In fact, most of my content remains valid over a long period of time, rather than participating in conversation (with some exceptions), but rapidly becomes less discoverable after I’ve written it, as it is buried by newer posts.

If I want to have content that is useful in a long-term fashion, the blog is not the ideal structure. While you can always scroll down, or look through tags, a more refined system would be to store information in gradually evolving, more comprehensive documents, that are gradually augmented or refined over time, that is to say, a garden.

The About Me page on a blog is one example of this, but my blog series about Rust vs. C++ seemed like another one where I had a lot of material that could be better structured and more coherently presented in a single, hierarchical document.

So I’ve posted it as an mdbook, here. I don’t like to think of this as a “book” in a form that would ever be published on paper – it’s not long enough, interesting enough, or complete enough for that. That would also go away from the garden aesthetic, where it is a continuous work-in-progress that is always evolving. But I do think the mdbook format is better suited to the material than my existing blog series, for long-term access.

I haven’t incorporated all the material from my blog series yet, as some of the older material I think could stand a re-write. It is maintained in the open on GitHub, so feel free to give feedback there in terms of issues and even merge requests. It’s released under the CC license for non-commercial, attributed, share-alike use, with this license file.

While I will continue to try and integrate existing material into this garden, and expand on it when I am inspired to do so, I plan on not focusing on Rust vs. C++ going forward. If there are any substantial additions, however, I will update you on this blog.

Thank you for reading! More, different Rust content is coming soon!

Rust: A New Attempt at C++'s Main Goal

2023-04-06T00:00:00+00:00

I know I set the goal for myself of doing less polemics and more education, but here I return for another Rust vs C++ post. I did say I doubted I would be able to get fully away from polemics, however, and I genuinely think this post will help contextualize the general Rust vs. C++ debate and contribute to the conversation. Besides, most of the outlining and thinking for this post – which is the majority of the work of writing – was already done when I set that goal. It also serves as a bit of conceptual glue, structuring and contextualizing many of my existing posts. So please bear with me as I say more on the topic of Rust and C++.

Rust is a polarizing programming language, because of how radical it is. It has gone the furthest in introducing features from functional programming languages into the mainstream world, and ignoring long-held programming language design principles from the realm of object-oriented programming. Its fans can be very enthusiastic, sometimes off-puttingly so, stereotypically demanding that all software be rewritten in Rust even when completely unfeasible – a stereotype that is mostly untrue, but whose existence and occasional true examples show the intensity of the debate. But a lot of the criticism of Rust comes specifically from C++ programmers, and correspondingly a lot of Rustaceans’ criticisms of other programming languages are directed specifically at C++, including mine. Even the creator of C++, while not mentioning it by name, entered the fray (and along with other Rustaceans, I responded).

There’s a good reason for this particular rivalry. While usable in other domains, Rust is strongest where C++ has hitherto been unopposed: as a high-level systems programming language. Many of Rust’s greatest strengths are directly based off of ideas originated in C++. And Rust has, in many ways, the same goals that C++ has. It can be argued – and in this post I shall argue – that Rust has the exact same overall goal that C++ does, albeit with a different interpretation of how that goal is best accomplished.

Zero-Cost Abstractions

C++ has an explicit goal of providing zero-cost abstractions.

This is a bit of a confusing term of art and has the potential to be misleading, but it comes attached with explanations that clarify it some. It is also referred to as the “zero-overhead principle,” which Dr. Bjarne Stroustrup, father of C++, describes (see pg. 4) as containing two components:

What you don’t use, you don’t pay for (and Dr. Stroustrup means “paying” in the sense of performance costs, e.g. in higher latency, slower throughput, or higher memory usage)
What you do use, you couldn’t hand code any better

There is also an executive summary of the concept at CppReference.com.
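As a rough illustration of the two principles in Rust, here is the same computation written once with iterator abstractions and once as a hand-written index loop. The function names are made up for this sketch. The two versions always agree on their results; that an optimized build compiles them to comparable machine code is the usual zero-cost expectation, which this sketch does not measure.

```rust
/// Sum of the squares of the even values, written with iterator adapters.
fn sum_even_squares_iter(values: &[u64]) -> u64 {
    values
        .iter()
        .filter(|v| **v % 2 == 0)
        .map(|v| v * v)
        .sum()
}

/// The same computation "hand coded" with an explicit index loop.
fn sum_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    let mut i = 0;
    while i < values.len() {
        let v = values[i];
        if v % 2 == 0 {
            total += v * v;
        }
        i += 1;
    }
    total
}
```

A caller that never uses the iterator version pays nothing for its existence, which is the first principle; the second principle is the hope that the iterator version is no slower than the loop.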

I, however, prefer the terminology of “zero-cost abstraction,” confusing as it can be, because it embodies a hidden third principle, that is unstated among those other two, and against which those other two principles are balanced. The word “abstraction” is the key, and the third principle is:

You can still get the abstractive and expressive power you expect from a modern programming language.

This third principle is necessary to distinguish higher-level “zero cost” languages like C++ and Rust from lower-level languages like C.

To fully explain why I include this third principle, and to delve into the history of the concept in general, I want to talk more about C.

C: The Portable Assembly

C has often been described as a “portable assembly language.” Unlike other high level programming languages before it (“high level” at the time meaning anything higher level than raw assembly language), it exposed users directly to gnarly machine-language abstractions like pointers, and to common assembly-language capabilities like shifting and bitwise operators.

The goal was to give the programmer something minimally distinct from assembly language, where the programmer had almost as much control over the computer as an assembly language programmer without sacrificing portability. Few higher-level features have been added, even now: there is no built-in string type, and only a limited array type that exposes the underlying concept of pointers the instant you poke at it. Structures are little more than a way of calculating offsets, and memory management is done by explicitly invoking memory management routines.

C’s preference, in general, was to only add onto assembly those features absolutely necessary for portability, and not to impose any other structure on the programmer – or, said another way, not to provide any other structure to the programmer.

This was far from an iron-clad rule. And there are definitely exceptions: C, built into the programming language, prefers null-terminated strings (also known as “C strings”) to arrangements that use specific lengths, a substantial constraint on the programmer beyond assembly language and probably a mistake overall.
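To make the constraint concrete, here is a small sketch contrasting the two string conventions, using Rust byte slices to stand in for both. The function names are hypothetical. A null-terminated string has to be scanned to find its length and cannot contain a zero byte, while a length-carrying slice knows its length up front and can hold any bytes.

```rust
/// Length under the C-string convention: scan until the first zero byte.
/// Returns `None` if no terminator is present in the buffer.
fn c_style_len(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == 0)
}

/// Length under the pointer-plus-length convention: already stored, no scan.
fn slice_len(bytes: &[u8]) -> usize {
    bytes.len()
}
```

With the buffer `b"hi\0there\0"`, the C-string view ends at the first zero byte and sees only `hi`, whereas the slice view keeps all of the bytes.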

More deeply, and probably less avoidably at the time, C assumes a traditional call structure. Many techniques that can be used to implement closures, co-routines, or other more radical alternatives to a call stack are difficult to impossible to do with standard C – while generally being possible in any assembly language.

But, with these exceptions, C generally does tend to only provide one overarching abstraction, portability, and when it does, it has the same zero-cost goals that C++ has, to only make the user pay for the abstractions they actually use, and to provide abstractions as efficiently as the equivalent hand-coded assembly.

Put another way, C++’s zero-overhead principle, as Dr. Stroustrup defines it, is more or less inherited from C. Where C++ differs from C is in the “abstraction” part of providing “zero-cost abstractions.” Everything you can do in C++ you can do in (potentially tedious and repetitive and error-prone) C, but C++ provides more abstractions, beyond just what is necessary for portability.

C++: A More Abstracted C

This gives us a framework for understanding the entire goal of C++, and I would argue, of Rust. Once we understand that C++ is trying to keep the zero-cost principle of C, where abstractions do not come with a performance penalty (and where “zero” is a reference to the difference between the performance cost and a manual assembly-language implementation), but with the expressive and abstractive power of a higher-level programming language, everything else about C++ makes sense.

C++ was originally christened “C with Classes,” and it tried to add Object-Oriented Programming to C. All the mechanisms of OOP could be portably added to C directly by an application or library developer with judicious use of function pointers and structure nesting (and glib is a famous example of a library that does exactly that), but C++ built this abstraction into the programming language itself.
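For readers who have not seen the hand-rolled approach, here is a minimal sketch of the function-pointer technique, written in Rust rather than C to match the other code on this blog. All the names are invented for illustration, and it shows only dynamic dispatch through a table of function pointers, not structure nesting or anything specific to glib.

```rust
/// A hand-written "vtable": a table of function pointers.
struct ShapeVtable {
    area: fn(&Shape) -> f64,
    name: fn(&Shape) -> &'static str,
}

/// Every shape carries a pointer to its vtable alongside its data.
struct Shape {
    vtable: &'static ShapeVtable,
    width: f64,
    height: f64,
}

fn rect_area(s: &Shape) -> f64 {
    s.width * s.height
}
fn rect_name(_: &Shape) -> &'static str {
    "rectangle"
}
fn tri_area(s: &Shape) -> f64 {
    0.5 * s.width * s.height
}
fn tri_name(_: &Shape) -> &'static str {
    "triangle"
}

static RECT_VTABLE: ShapeVtable = ShapeVtable { area: rect_area, name: rect_name };
static TRI_VTABLE: ShapeVtable = ShapeVtable { area: tri_area, name: tri_name };

fn new_rect(width: f64, height: f64) -> Shape {
    Shape { vtable: &RECT_VTABLE, width, height }
}

fn new_triangle(width: f64, height: f64) -> Shape {
    Shape { vtable: &TRI_VTABLE, width, height }
}

/// "Virtual calls": look the function up in the table, then call it.
fn area(s: &Shape) -> f64 {
    (s.vtable.area)(s)
}

fn name(s: &Shape) -> &'static str {
    (s.vtable.name)(s)
}
```

Building this machinery into the language means the programmer writes a class and a virtual method, and the compiler generates the table and the indirect call.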

Objective-C also did this (and according to Wikipedia it “first appeared” one year sooner in 1984), but Objective-C has always felt like two programming languages glued together. In Objective-C, the object-oriented features do not inherit the zero-overhead principle from C – nor do they look like C at all. They look instead like a Smalltalk dialect, where switching between C and this odd Smalltalk dialect was permitted on an expression-by-expression basis using an odd mix of square brackets and @-signs.

In C++, the added abstractions, including OOP, take on more of a resemblance to C, and importantly, continue to try to retain C’s advantages in systems programming by making the new features zero-overhead.

During much of the history of C++, OOP was considered to be the most important abstraction that a programming language could offer. But once it was added, it expanded the scope of C++ abstractions. Nowadays, C++ is considered multi-paradigm, and provides not just OOP, but a wide array of abstractions.

Nowadays, C++ tries to keep up with other programming languages in what features it offers, to the extent that it can while being limited by the zero-cost principle. This is in sharp contrast to C, which continues to try to define existing features better and make them more rigorous within the existing feature scope. The only features C++ rejects out of hand are those that do not jibe with zero-cost abstraction, showing that in actuality C++’s defining trait is to have the three-pronged concept of zero-cost abstraction that I introduced above, two prongs about “zero cost” and one about “abstraction”:

What you don’t use, you don’t pay for
What you do use, you couldn’t hand code any better
We give you the power of abstraction expected for a programming language of the day

This is why garbage-collection is not offered in C++ (though it is still possible to implement manually) – it cannot be offered in a zero-cost way. However, C++’s alternative to garbage collection, namely RAII, has become more effective as new features like move semantics and std::unique_ptr have been added, to the extent that in modern C++, it would be unimaginable not to have those features, and they have become essential to C++’s memory management model.

These three goals explain why C++ keeps accruing new features, whereas C maintains the features it has. They explain why C++ had to add templates – as a zero-cost alternative to OOP, or a zero-cost way of implementing collections. They explain why C++ had to add move semantics – because without it, RAII is a worse abstraction than GC.
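The trade-off between templates and OOP-style dispatch can be sketched with the closest Rust analogues, which are generics (compiled separately for each concrete type, like a template) and trait objects (dispatched through a vtable, like a virtual call). This is a Rust illustration of the idea rather than C++ code, and the trait and type names are made up.

```rust
trait HasArea {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl HasArea for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl HasArea for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Static dispatch: a separate copy is compiled for each `T`, so the call
/// to `area` is resolved at compile time.
fn total_area_static<T: HasArea>(items: &[T]) -> f64 {
    items.iter().map(|item| item.area()).sum()
}

/// Dynamic dispatch: one compiled copy, with each call to `area` going
/// through a vtable at run time. This allows mixed types in one collection.
fn total_area_dyn(items: &[Box<dyn HasArea>]) -> f64 {
    items.iter().map(|item| item.area()).sum()
}
```

The static version only accepts a collection of one concrete type, which is the price of avoiding the indirect call; the dynamic version accepts a mix of `Square` and `Rect` values behind `Box<dyn HasArea>`.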

Rust: A C++ Redo

Rust simply does a better job at achieving these goals, because Rust gets to start from scratch, with the modern concept of what’s expected in a high-level programming language, rather than working forwards through time. And, in doing so, it avoids a lot of the mistakes that C++ made, and can design a language that includes all of the modern features together.

A full set of OOP features is no longer ideologically required, so Rust doesn’t offer them. Instead, safety has become a sine qua non, so Rust offers that (with an opt-out provision). One might argue that safety violates the zero-cost abstraction because of bounds checking, but that’s simply not true as defined. You only pay for bounds checks if you’re actually using the feature of safety – unchecked unsafe accesses are in fact available just an unsafe keyword away – and the feature of safety is implemented as efficiently as one would by hand (by inserting bounds checks into array accesses).
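Here is a minimal sketch of what that opt-out looks like, with hypothetical function names. The first function uses ordinary indexing, which is bounds-checked on every access. The second does a single check up front and then uses the standard library’s `get_unchecked` inside an `unsafe` block. Whether the optimizer would have removed the per-access checks in the first version anyway is not something this sketch measures.

```rust
/// Safe version: every `values[i]` is bounds-checked (and panics if out of range).
fn sum_first_n_checked(values: &[i32], n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += values[i];
    }
    total
}

/// Opt-out version: one explicit check, then unchecked accesses.
fn sum_first_n_unchecked(values: &[i32], n: usize) -> i32 {
    assert!(n <= values.len());
    let mut total = 0;
    for i in 0..n {
        // SAFETY: i < n, and n <= values.len() was asserted above.
        total += unsafe { *values.get_unchecked(i) };
    }
    total
}
```

The bounds check in the safe version is what a careful programmer would have written by hand; the unsafe version is for the cases where the programmer can prove the check redundant.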

Similarly, C++ has learned that move semantics turn out to be essential in an RAII/value-semantics model to avoid spurious copy-and-deletes and/or indirections for e.g. storing std::strings in a std::vector that might be resized. Before move semantics, C++ often forced violations of the zero-cost abstraction principle by providing abstractions that would do extraneous copies or required extra indirections to use effectively, which is not what an assembly language programmer would ever write. However, since C++ move semantics were bolted on after the fact, it does them in a deeply confusing way, where Rust gets to reset and design itself for destructive moves from the get-go.
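A small sketch of what destructive moves buy you, using a made-up `Tracked` type whose destructor increments a shared counter. Values are moved into a `Vec` that reallocates several times as it grows, yet no destructor runs until the `Vec` itself is dropped, and then each value is destroyed exactly once.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical type that counts how many times its destructor runs.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves `n` values into a growing `Vec` and returns the number of destructor
/// runs observed (before the `Vec` is dropped, after the `Vec` is dropped).
fn count_drops_around_moves(n: u32) -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let before = {
        let mut items = Vec::new();
        for _ in 0..n {
            let item = Tracked { drops: Rc::clone(&drops) };
            // `item` is moved into the Vec; the moved-from variable is simply
            // gone, so there is no leftover object whose destructor must run.
            items.push(item);
        }
        drops.get()
        // `items` is dropped here, destroying each element once.
    };
    (before, drops.get())
}
```

In C++, the moved-from object still exists and still has its destructor called, so types have to define a valid “empty” state for it; in Rust the compiler just forgets the old location.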

A Note on “the RAII Model”

In my RAII post I referred to C++’s alternative to garbage collection, centered on RAII, as the “RAII model,” and wrote that std::unique_ptr and move semantics were essential to this model. A Reddit comment later explained that I must be confused, because RAII pre-dates those features.

They had misunderstood me, and I stand by my statements, but I think it is worth some clarification. By “RAII model,” I mean RAII and other features which, when combined, provide an alternative to garbage collection. And the RAII model before C++11 did indeed lack features essential to competing with garbage collection. It was simply a worse model then, and much harder to use correctly in a complicated codebase.

In a similar way, I would say that in Rust, borrow checking and destructive moves are essential to the RAII model, because without them, the model is a much worse competitor to garbage collection. And yes, that does imply that C++’s concept of RAII is fundamentally deficient by not being paired with borrow checking, just like pre-C++11 RAII was fundamentally deficient by not being paired with move semantics and std::unique_ptr.

The alternative to garbage collection that C++ and Rust have built has been a work in progress through most of its history. Rust had to be a new programming language rather than an evolution for a number of reasons, but fixing C++’s lack of borrow checking and weird move semantics were some of the most important such reasons.

Backwards-Compatibility

Of course, C++ does have goals that Rust drops – and in doing so, it can do better at this core goal. The biggest such goal is perhaps also a trivial example: C++ has the goal of being source-compatible with earlier versions of C++, and even to some extent with C. This makes sense, as backwards-compatibility between versions is sort of a fundamental expectation of any programming language, certainly one that tries to provide a modern set of abstractions, but it does restrain C++’s development.

While Rust tries to be backwards compatible with itself, dropping compatibility with C++ has allowed it to get out of a lot of C++’s accumulated cruft of complexity, much of which is inherited from C times.

This accomplishes a lot on its own. C++’s syntax has gotten so complex over the years that many in the C++ community are doing their own resets of the syntax, including Herb Sutter’s cppfront and Google’s Carbon. Even if starting from scratch to accomplish C++’s goals was the only thing Rust did, it would still result in a much better programming language, more ergonomic and with fewer pitfalls.

Some criticize Rust by saying that in another 30 or 50 years, Rust will end up as convoluted as C++ is now. This criticism has confused me, because it seems possible, even likely, that this is true, but that doesn’t strike me as a reason to not (gradually and responsibly) switch from C++ to Rust (especially for new projects or for when rewrites are particularly called for). If this is true, that just means programming languages are subject to entropy and obsolescence like everything else. And in that case, C++ will just continue to get worse, Rust will also continue to get worse, and Rust will be better than C++ the entire time. If all programming languages accrue cruft as they age, in what world is that a reason to use the cruftier programming language?

Most Rustaceans are not, despite the stereotype, treating Rust as some apocalyptic, messianic programming language to end all programming languages. I wouldn’t be surprised if 20 or 30 years from now, a new programming language will emerge, accomplishing the same goals from a fresh start. And when that happens, I will probably advocate in favor of this new programming language just like I now advocate in favor of Rust.

The goal isn’t to have an eternally good programming language; the goal is to have tools now. What should new projects be written in now? When a rewrite is called for (as it sometimes is), should it include a new programming language now that there is a viable alternative?

I suspect that many making this argument are including an unstated assumption – that C++’s cruft is actually a sign of its maturity, and fitness for production use. Alternatively, and a little more charitably, they might assume that Rust isn’t ready for production use yet, and by the time it is, it will be just as crufty as C++, perhaps converging to the same level of cruft. But while there are a few categories where Rust lags C++, they are mistaken in the big picture. For the vast majority of C++ projects, Rust is already the better option if the project had to be rewritten from scratch (a big “if,” but irrelevant to the merits of the programming languages).

Rust Deficits

Rust has a few downsides compared to C++.

Interfacing with C is an important goal for reasons besides backwards-compatibility. On many platforms, C serves as a lowest-common-denominator programming language, and its ABI serves as an inter-language protocol. C++ does provide smoother interfacing with this protocol than Rust does.

Relatedly, C++ generally has a relatively stable ABI on a given platform for a given compiler vendor. This allows dynamic libraries to be used as plugins with minimal glue code, something that in Rust normally requires awkwardly working through a C ABI interface. Personally, I think machine-language plugins as dynamically loaded libraries are mostly a relic of past software distribution models, and haven’t seen many situations where they make sense, but I could think of a few edge cases.

In both of these cases, Rust is clumsier, but not completely incapable. Rust still can speak the protocol that is the C ABI, just not as natively and smoothly-integrated as C++.
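For a sense of what speaking the C ABI from Rust involves, here is a minimal sketch with hypothetical function names. Both functions use the C calling convention via `extern "C"`. Actually exporting them from a library under stable symbol names would also need a no-mangle attribute and build configuration, which are left out here.

```rust
use std::ffi::{c_char, CStr};

/// A function with the C calling convention, callable through a C function pointer.
pub extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Receives a NUL-terminated C string across the boundary and returns its length.
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated string that outlives this call.
pub unsafe extern "C" fn c_string_len(ptr: *const c_char) -> usize {
    // The raw pointer has to be wrapped by hand before Rust code can use it.
    let s = unsafe { CStr::from_ptr(ptr) };
    s.to_bytes().len()
}
```

The clumsiness shows up in the second function: raw pointers, an `unsafe` contract stated in a comment, and a manual conversion into a Rust type, all of which C++ can skip because it shares C’s types directly.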

Other downsides of Rust have to do with network effects and Rust adoption. There is only one Rust compiler, while there are multiple C++ compilers, that work together through a standards process. GCC is currently in the process of getting Rust support, and we’ll see how well that works out for Rust.

Similarly, there are a lot of libraries that exist in C++ that don’t yet exist in Rust or have Rust bindings. Though that’s true of any pair of programming languages, it is a specific reason some developers might still want to write new projects in C++ in favor of Rust.

Finally, while I still think Rust would be a better programming language than C++ even if unsafe code were allowed everywhere, I think Rust could do more to make its rules clearer in the unsafe realm. The fact that the latest research on Rust’s memory models seems so deeply difficult to square with how async code often works as in this bug report makes me nervous.

I’m sure there are other ways in which Rust is behind C++, and the devil is as always in the details. I’m sure I’ll find out about some of them as soon as I post this post.

Conclusion

These were all topics I’ve discussed in other blog posts, but I hope this brings some perspective on how I think about the programming languages in general, and provides a conceptual framework for thinking about some of my other posts. I was a fan of C++ because of its goals, and I’m now a fan of Rust because I think Rust pulls them off better. When I was skeptical of Rust, it was because I did not think Rust would pull them off better, but that was due to a misunderstanding.

Next Steps

I am considering using (a revised version of) this post as an introduction, and then trying to bring all of my Rust vs C++ content into an mdbook so it could be more of a garden. It would have a title like “Rust: A Better C++ Than C++” and be licensed under some CC non-commercial license, and it would accept MRs from other people as a community resource for consolidating resources on this particular issue. Then, if I had further ideas I could put them in there. What do people think of that idea?

I realize now that I write this that the repo where I already have the bones of this idea is actually already public. I think I’m going to restart from scratch with just a reorganization of existing blog posts, and save the more ambitious ideas in those notes files for later. What do people think?

Guest Collaboration: Paradigm Shift

2023-03-28T00:00:00+00:00

Does the choice of programming language matter?

For years, many programmers would answer “no”. There was an “OOP consensus” across languages as different as C++ and Python. Choice of programming language was just a matter of which syntax to use to express the same OOP patterns, or what libraries were needed for the application. Language features like type checking or closures were seen as incidental, mere curiosities or distractions.

To the extent there was a spectrum of opinions, it was between OOP denizens and those that didn’t really think software architecture mattered at all – a feeble attempt of corporatization against true programmers and their free-spirited ways. The office park versus the squatters. That’s how we got the wave of so-called “scripting languages”.

But OOP was the least of their concerns. They shrugged along with some sort of class system, and saved their criticism for (static) types and compilation (an implementation strategy, not a language property).

Now, times are changing. When in the last 30 years have we seen so many concurrent pivots in major languages?

Perhaps it began with lambdas. Once, they were seen as curiosities from the functional world, a special case of an OOP class overriding a single method (which is exactly how you had to write them in C++ and Java). Now, Java has lambdas. Even JavaScript thought its function() syntax was too heavy, replacing it with a lighter-weight =>. Hold up, even Excel has lambdas. Functional programming has intruded against the mainstream consensus.
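To see the “special case of a class overriding a single method” view side by side with a real lambda, here is a small Rust sketch. The trait, the struct, and the function names are all invented for the comparison.

```rust
/// The old way: an interface with a single method...
trait Transform {
    fn apply(&self, x: i32) -> i32;
}

/// ...and a named type that exists only to implement it and hold captured state.
struct AddN {
    n: i32,
}

impl Transform for AddN {
    fn apply(&self, x: i32) -> i32 {
        x + self.n
    }
}

fn map_with_object(values: &[i32], t: &dyn Transform) -> Vec<i32> {
    values.iter().map(|&v| t.apply(v)).collect()
}

/// The lambda way: the caller passes a closure, which captures its own state.
fn map_with_closure(values: &[i32], f: impl Fn(i32) -> i32) -> Vec<i32> {
    values.iter().map(|&v| f(v)).collect()
}
```

The object version is called as `map_with_object(&[1, 2, 3], &AddN { n: 10 })`, while the closure version is just `map_with_closure(&[1, 2, 3], |x| x + n)` with `n` captured from the surrounding scope.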

When this intrusion broke through, the old equilibrium cracked. Both the OOP consensus and scripting language counterculture started to crumble. Now, JavaScript, Python, and Ruby are getting type checking. Java is getting a whole mish-mash of “functional” features. C++ is de-emphasizing inheritance and doubling down instead on templates. Even Go is getting generics.

So here we’ve reached a funny point. Before we had a bunch of languages which roughly did the same thing. Now we have the same bunch of languages all adopting the same features they never dreamed of having before. Within that cohort there is still little reason to adopt one or another, but over time there are clear reasons to choose the newer versions over the older versions. You might not care about Java vs Go, but you sure as hell want the version with generics over the versions without.

So among 20+ year old languages, the choice of languages absolutely matters for programmers with time machines (or contemplating Debian stable), but what about for the rest of us?

Well, there are newer languages now mainstream (enough) too. And here we find the front of the pack, the language bringing functional features into the mainstream more completely and thoroughly than others (because being born with them helps): Rust.

There are other languages zooming out in front of the pack, leading Rust just as Rust leads the others. Being way out ahead is exciting. But it can be lonely. It might be cold. And you might run out of steam. Being at the front of the pack, the furthest along of the mainstream, is nice. You still see where we’re going better. You go there early. But you’re not alone; you’re shoulder to shoulder with others doing the same.

If that sounds nice, learn Rust. Don’t learn it as a mish-mash of exotic cool features. And don’t let it lull you into thinking you must do some sort of whiz-bang systems programming that almost no one does.

Learn Rust, idiomatic Rust, yes, for solving all the mundane problems you face in your programming life, but also to get a head start on what will be the next era of accepted programming practice. Learn type classes (aka traits) in their full power (and not just the object-safe ones), and learn how Rust’s move semantics can be used to simulate type-state.
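As a taste of the type-state idea, here is a minimal sketch with a made-up `Door` type. The state lives in a type parameter, and each transition takes `self` by value, so the old state is moved away and cannot be used again. Calling `walk_through` on a closed door is not a run-time error but a compile error, because that method only exists on `Door<Open>`.

```rust
use std::marker::PhantomData;

// Marker types for the two states; they carry no data.
struct Closed;
struct Open;

struct Door<State> {
    opened_count: u32,
    _state: PhantomData<State>,
}

impl Door<Closed> {
    fn new() -> Self {
        Door { opened_count: 0, _state: PhantomData }
    }

    /// Consumes the closed door and returns an open one.
    fn open(self) -> Door<Open> {
        Door { opened_count: self.opened_count + 1, _state: PhantomData }
    }
}

impl Door<Open> {
    /// Only available while the door is open.
    fn walk_through(&self) -> &'static str {
        "walked through"
    }

    /// Consumes the open door and returns a closed one.
    fn close(self) -> Door<Closed> {
        Door { opened_count: self.opened_count, _state: PhantomData }
    }
}
```

A typical sequence is `Door::<Closed>::new().open()`, then `walk_through()`, then `close()`; after `close()` the open door value no longer exists, so nothing can walk through it.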

These features might seem niche now, but remember, so once did lambdas.

Rust Tidbits #1

2023-03-24T00:00:00+00:00

This is a collection of little Rust thoughts that weren’t complicated enough for a full post. I saved them up until I had a few, and now I’m posting the collection. I plan on continuing to do this again for such little thoughts, thus the #1 in the title.

`serde` flattening

What if you want to read a JSON file, process some of the fields, and write it back out, without changing the other fields? Can you still use serde? Won’t it only keep fields that you know about in your data structure?

Turns out, you can parse the fields you want, while also just preserving the fields you don’t!

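use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

// `Record` is a placeholder name; `KnownField` and `KnownField2` stand in
// for whatever types your known fields have.
#[derive(Serialize, Deserialize)]
pub struct Record {
    pub known_field: KnownField,
    pub known_field2: KnownField2,

    // Every field not named above is collected here and written back out unchanged.
    #[serde(flatten)]
    pub unknown_fields: BTreeMap<String, serde_json::Value>,
}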

I found out about this in the serde documentation, so it’s not an original insight, but it came in handy for me recently and so I’m trying to raise awareness.

`let` surprises!

So, in Jon Gjengset’s popular Twitter thread transcribed here, he wrote this:

Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488

I didn’t think much of this, besides making a note to self not to use let _ = x to ever drop anything, which hopefully I wouldn’t have done anyway because drop(x) is much more self-evident in what it intends. I remember also vaguely hoping that it did drop, because in my mind that was the obvious, logical thing for it to do.

But then later, as I was writing a match, I realized why _ couldn’t mean drop, from the match context:

match foo.bar.baz {
    MyEnum::Option1(_) => {
        // This shouldn't move from `foo.bar.baz`, but just
        // inspects whether it is `MyEnum::Option1`. Otherwise, there'd
        // be no straight-forward way to perform that inspection!
        //
        // And indeed, it doesn't.
        None
    }
    MyEnum::Option2(ref baz_inner) => {
        Some(foobar(baz_inner))
    }
}

So, if let _ = x was to be consistent with this use case, well, that meant that _ has to not drop, as it’s important for _ to mean the same thing. And, after all, the left-hand side of a let is just another pattern context!

But wait, I thought! Does this mean that you can write let ref x = y;? Yes, it does. It’s just another way of writing let x = &y;… But just because you can write it that way, doesn’t mean you should. Keeping to idiom is important.
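Both facts are easy to check with a couple of tiny functions (the names are made up for this sketch). The first compiles only because `let _ = s;` does not move `s`. The second compares the references produced by the two spellings and finds that they point at the same place.

```rust
/// `_` binds nothing, so `s` is neither moved nor dropped by the `let`.
fn underscore_does_not_move() -> String {
    let s = String::from("still here");
    let _ = s;
    s // still usable; this would not compile if `s` had been moved
}

/// `let ref a = y;` and `let b = &y;` produce the same kind of reference.
fn ref_pattern_is_a_borrow() -> bool {
    let y = 5;
    let ref a = y;
    let b = &y;
    std::ptr::eq(a, b)
}
```

Compare with `let _unused = s;`, which does bind a variable and therefore does move `s`.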

Nevertheless, fun fact! The more you know!

Remember: `serde` `struct`s Can Be Function-Local

Let’s say you need to extract three fields out of some JSON, like name, age, and phone_number (which, ironically, is a string in JSON terms, and not a number). One of the great things about Rust and serde is that you can just write those fields in a struct with the Deserialize trait (which is derivable) and grab the values into such a struct, even if there are other fields in the JSON:

#[derive(Deserialize)]
struct Person {
    name: String,
    phone_number: String,
    age: f64,
}

// `from_str` returns a `Result`, so the error has to be handled or propagated.
let person: Person = serde_json::from_str(json_str)?;

The question then becomes, where should Person go? Well, if you plan on passing around this Person value, and structuring the rest of your code in terms of it, then it should be a prominent type.

But more often, especially in my own code, I immediately split such a structure into its constituent parts, which I then will use for other things:

let Person {
    name,
    phone_number,
    age,
} = serde_json::from_str(json_str)?;

let handle = person_database.lookup(&name)?;
handle.set_phone_number(&phone_number);
let demographic = demographic_for_age(age.trunc() as u32);

This is very reasonable. It makes sense that our internal data structures would be designed for whatever logic we want to do on them, rather than having them coincidentally match the wire format. For most complicated applications, having the internal data format match the wire format literally is actually sort of a code smell.

So, we often will have types that we use to deserialize (and serialize) JSON in exactly one function. In that situation, the type should in fact be written locally to that function. So in the example above, where struct Person { ... } is immediately followed by the serde_json::from_str, I didn’t just write them next to each other as convenience. I would literally put them together in a function:

fn do_thing(json_str: &str) -> Result<()> {
    do_something_else()?;

    // Only this function needs `Person`, so it is declared right here.
    #[derive(Deserialize)]
    struct Person {
        name: String,
        phone_number: String,
        age: f64,
    }

    let Person {
        name,
        phone_number,
        age,
    } = serde_json::from_str(json_str)?;

    let handle = person_database.lookup(&name)?;
    handle.set_phone_number(&phone_number);
    let demographic = demographic_for_age(age.trunc() as u32);

    Ok(())
}

I bring this up mostly because many programmers don’t seem to be aware that you can do this, or don’t think to. I’ve seen people write types like Person at the top level. I realize that many programming languages either don’t let you do this sort of embedding, or else strongly discourage it. But I’m a big believer in giving things the least scope they need, and for many serde-related types, that’s function scope.

Rust Shadowing

Speaking of minimal scope, I wanted to write in praise of Rust’s penchant for shadowing that allows you to not have to come up with a bunch of names for the same thing. Oftentimes, we just convert the same information from type to type: wire format in bytes, to parsed wire format, to application domain format (wrapped in an Option in a Result), to application domain format with errors and absence handled (not wrapped in those things). Fortunately, Rust lets us shadow and re-use names for these different variables, and ultimately we get code that looks something like this (although no type annotations are normally necessary):

let foo: FooTypeC = {
    let foo: FooTypeA = get_foo();
    let foo: FooTypeB = transform_foo(&foo)?;
    match foo {
        Some(foo) => transform_foo_again(foo)?,
        None => FooTypeC::default(),
    }
};

This is really helpful, along with the fact that braces { … } enclose expressions, in really minimizing how much scope each variable has. But it’s also really helpful because, if shadowing weren’t available, what would we name all these different variables? foo_a and foo_b and similar stupid names? This is an issue in certain other programming languages where shadowing isn’t as straightforward, and the results aren’t fun.

Treat Tolkien's World Like Other Mythologies

2023-03-23T00:00:00+00:00

Tolkien was trying to make a new mythology, a new set of deeply resonant stories, for modern (especially English) culture, and he succeeded. He transformed fantasy, and founded the concept of high fantasy. His detailed legendarium (as his mythology is called) is a masterpiece of world-building, with deep symbolism and emotional complexity, a mythology with arguably more depth and room to explore than many ancient ones. Tolkien scholars work full-time to study it, and many more people draw from it explicitly and implicitly for their own art, in D&D and other more modern fantasy settings. Especially with his near-human species, his concepts of hobbits (off-brand as halflings) and elves (distinct from previous iterations) have deeply resonated with many people.

And yet, relatively few of the works from his legendarium are actually that enjoyable (or even feasible) to read. Only the works published in his lifetime, The Hobbit and The Lord of the Rings, are readable as literature by modern audiences – and many modern readers even struggle to get through the reams of poetry and milieu-building that make up The Lord of the Rings, with many fans only familiar with the much more digestible Peter Jackson movies.

As for the posthumously published works, even though they more fully flesh out his beautiful and intricate imagined history – detailing, for example, the character of the elves, the creation of that world – they are extraordinarily dense and heavy reading. The Silmarillion has famously been compared by many readers to the Old Testament, and this was definitely meant as an insult – although I personally am a huge fan of the Hebrew Bible as literature, this is a large part of why my friends consider me such an eccentric.

It’s easy to understand why. The Silmarillion was composed in a remarkably similar process to how historical-critical scholars say the Hebrew Bible was composed. That is to say, both were composed after the fact, by a redactor. For both, this redactor stitched together somewhat-contradictory stories in their rudimentary form into a consistent order, with minimal editing and no attempt at expansion. Many of the original sources read as synopses, and the only consistency of voice is self-conscious archaism and reverence.

And the more recent publications are worse, and even more like modern editions of ancient texts. Beren and Luthien is stitched together between poetry and prose, and as full of footnotes as many “study Bibles” I’ve seen. This is beautiful work – or at least the source material is. But rather than presented in a form that can be enjoyed, it is only presented in a form that can be studied. It is treated like we treat the literature of an ancient civilization (as C.S. Lewis complained about in his Introduction to Athanasius’ On the Incarnation), rather than the writing of someone who died within living memory and whose works are now a multi-billion dollar media franchise.

And that’s a damn shame, because the world that Tolkien created is beautiful, and the stories that he grew within it are beautiful, and they deserve a presentation, a literary realization, as beautiful as the underlying concepts. You shouldn’t have to be the type of nerd who is intrinsically driven to read through tedious notes to see the underlying beauty that is the Tolkien legendarium – the First and Second Ages of Middle Earth deserve to be portrayed through engaging, well-written literature, like the end of the Third Age is in The Lord of the Rings, rather than a study Bible that can only be read by those whose devotion to Tolkien borders on religious – the hyper-nerds (and I number myself among them) whose very existence testifies to how great the ideas are.

Unfortunately, too many Tolkien hyper-nerds feel that their ability to access this is a compliment towards them – that it’s the reader’s fault for not liking or being able to get through The Silmarillion, for not being dedicated enough. But this attitude – in addition to being arrogant, ableist, patronizing, and damaging to the reputation of Tolkien fans as a whole – is simply not worthy of the beautiful world that Tolkien built. Tolkien was trying to craft this world into publishable books.

That world should be accessible to as many people as possible, not only some “elites” who are willing to go footnote-diving. If that world is so beautiful that it inspires some people to do incredible feats of research to try to understand it, it is beautiful enough that it should also be shown to those who (quite reasonably) don’t have the time or energy for such activities.

This is a solvable problem – in fact, many other literary franchises have solved it handily: The franchise should be opened up to collaborators. There would be no shortage of talent: Tolkien’s work is well-loved within the genre, even foundational. Many fantasy authors – top-tier ones – would consider it a great honor to be able to write within Tolkien’s legendarium in an authorized fashion.

But for it to work, the Tolkien estate would have to allow those collaborators to do their job. We have to allow them to question and add complexity to Tolkien’s themes, to explore some of the awkward components (like the moral status of the orcs, or questions of races of men and apparent races of elves) at their own discretion. They have to be allowed to make adjustments to the canon – something Tolkien would’ve done freely himself.

Perhaps it would be made easier if there was no attempt to keep the extended canon strictly consistent, if they rejected that as a possible goal from the outset. Some rules and negotiation will doubtless be necessary, but complete alignment to canon and literary excellence are fundamentally incompatible goals – and literary excellence the more important one.

Because of course, these works are literally not scripture, or ancient texts. They are modern fiction, and like many works of fiction they deserve to be taken seriously – but not religiously. Those who do take it religiously are not the best company to keep.

But even if they were ancient texts, allowing an open, logically fluid canon would be appropriate. After all, Tolkien was trying to build a modern mythology – a legendarium – and ancient mythologies are contradictions and fanfic the whole way down. Remember Achilles' Heel? It seems a core part of the mythos. But not only are there multiple versions of that story that disagree on how the rest of him was made invulnerable, it also doesn’t even appear in the Iliad which considers him as vulnerable as any mortal. There are even versions of Achilles' story where he dies a normal death being shot in the back.

Throughout antiquity, every time a new poet or playwright would set a Greek myth to writing, they’d put their own spin on it. When modern writers do the same, they’re not ignoring or changing or misrepresenting Greek mythology, but just continuing the same pattern.

This is nothing against Tolkien scholarship, and trying to study his mind and the original intent behind the legendarium – that is also a good thing. But perhaps that scholarship would be more useful if it had an outlet in the creation of new works.

The TV Show

Of course, so far I’ve avoided the elephant in the room – the new Rings of Power TV shows. So I will address that now.

I acknowledge many of the problems with them. I was particularly disappointed by the “elves taking our ~~jobs~~ trades” concept. Rather than express the original, interesting reasons that men had become bigoted against elves in Númenor – jealousy of Elvish immortality and closeness to the gods – they fell back on a cheap political reference. I’m OK with changing the canon, but it doesn’t work. Elves are the colonial power, the more privileged species, and “taking our jobs” is generally a line that is used by the more privileged against the less privileged. It makes no thematic sense, and I have to simply pretend they said something else in order to keep watching the show.

(Diverse casting is, to be clear, not a problem with them. It’s a fantasy world, and they’re actors. Literally all of the elves are also – gasp! – depicted by non-elvish actors. It’s unfortunate that that conversation took up well-needed space for better conversations about the show.)

But that’s the risk of opening the canon up, and I accept it. I did enjoy Rings of Power a lot, just for depicting on screen many places and events that were emotionally resonant for me. I am happy it was made, flaws and all – while still not really counting it as “canon” in my mind. I hope they recover from many of their flaws, and I hope more work like it is done.

Because more important than “getting everything right,” it presented this world in a way that many of my friends could enjoy it. Basically none of my friends would be willing to read The Silmarillion just because they would enjoy discussing it with me (or for any reason at all). But many of my friends watched the show, and enjoyed it, and those discussions have been great.

Now, imagine if more such works were made, and by established fantasy greats!

The Importance of Logging

2023-03-21T00:00:00+00:00

Intro programming classes will nag you to do all sorts of programming chores: make sure your code actually compiles, write unit tests, write comments, split the code into functions (though sometimes the commenting and factoring advice is bad). Today, however, I want to talk about one little chore, one particular little habit, that is just as essential as all of those things, but rarely covered in the CS100 lectures or grading rubrics: logging.

And why am I choosing this particular topic for a blog post today? Simple: It’s to punish an earlier version of myself for not logging enough, for not caring about logging enough. It turns out it’s important. But I’ll get back to the OOP blog series soon enough, don’t worry!

Logging – writing text describing what’s been happening in your program to a file or other storage system – is essential for any software system. Luckily, Rust has a (nearly) standard logging framework, technically outside the standard library but maintained by many of the same people and solidly endorsed by the community: the log crate. But note: Even though this post is written specifically for Rustaceans, much of the advice and commentary in here will apply to logging systems in all programming languages.

Logging is essential for debugging and troubleshooting. When you find a bug, you need to find out which specific part of the program is actually broken out of the many parts, because it’s often not the part that’s visibly acting weird. This is often the first step in addressing a new bug after reproducing it, or even part of figuring out how to reproduce it – or the step before that, so obvious it goes without saying, of noticing that a bug exists.

In fact, logs can be helpful at every stage of the debugging process. You have to confirm your assumptions about which parts are known to work. After all, the whole program is supposed to work, and oftentimes, the thing that’s broken is something that you would’ve assumed definitely worked, until absolutely everything else was ruled out.

Every programmer understands this intuitively, even as a student or a beginning self-taught programmer: When you are developing a project, and it’s not working, the easiest ad hoc debugging technique is “debug print statements,” a go-to technique of CS100 students worldwide. Ironically, CS100 professors often advocate against this in favor of debuggers, in spite of the fact that logging, the grown-up version of debug prints, is more generally useful, as code often exhibits bugs in environments where it didn’t happen to be running in a debugger, like production.

Debug prints work, by accomplishing two goals:

Verifying that the program got to the point of that debug print line.
Verifying that the data it has at that point is correct.

Logging is fundamentally debug print statements, but phrased and annotated correctly, so that it looks professional both in the code and in the log, and uses actual logging mechanisms with timestamps and log levels and stuff.

So instead of:

initialize_rainbows();
println!("Got here 2");
initialize_sunshine();
println!("Got here!!!!");

You write the much nicer-looking:

initialize_rainbows();
info!("Rainbows fully initialized");

initialize_sunshine();
info!("Sunshine fully initialized");

When To Log

You should log as much as possible.

Every time you make a decision, you should log it. Every time you query a URL or build a string of some kind, you should log it. Every time you load a config parameter, you should definitely log it. This might seem silly, because you’re duplicating the configuration file, but a bug processing configuration (or prioritizing different sources of configuration) can be especially hard to find.

Logging can be used instead of comments to organize functions into parts. If you feel the need to tell the reader of your code what each part of a function does, perhaps you should tell your poor ops person which parts you’ve reached in the same breath. So instead of:

fn close_out_section(mut self) -> Result<()> {
    // Flush dirty data
    for datum in &mut self.data {
        if datum.is_dirty() {
            datum.flush()?;
        }
    }

    // Close files
    for file in &mut self.files {
        file.flush()?;
        file.close();
    }

    decrease_global_section_count()?;

    Ok(())
}

You could write:

fn close_out_section(mut self) -> Result<()> {
    info!("Closing out section: {}", self.name);

    debug!("Flushing dirty data");
    for datum in &mut self.data {
        if datum.is_dirty() {
            trace!("{} is dirty, flushing...", datum.name);
            datum.flush()?;
        }
    }

    debug!("Closing files");
    for file in &mut self.files {
        trace!("Closing {}", file.name);
        file.flush()?;
        file.close();
    }

    debug!("Decreasing global section count");
    decrease_global_section_count()?;

    debug!("Section successfully closed!");
    Ok(())
}

These log statements serve both as comments to your reader and information to your administrator at the same time! And, since you are writing to someone who is perhaps not looking at the source code, you don’t feel silly adding even more information that’d be obvious to a reader – which is useful also to readers of the source code, who might not share your definition of what is obvious. In spite of what you may have heard, it’s still a good idea to err on the side of explaining things more in comments. (Yes, I linked that post twice. It’s that good.)

You may object that all this logging might slow down your process a little, and I can see wanting to avoid it in the middle of a computational loop. But oftentimes, people avoid logging when there is no possible performance excuse, when much slower I/O is happening all around it, in comparison to which the logging would be a rounding error. Remember that famous Donald Knuth quote: “[P]remature optimization is the root of all evil….”

Log Levels

In addition to performance, you might claim that the amount of logging that I show above is spammy, and that the resulting log files would cause an information overload. But our programming foreparents were wise, and created an additional tool to address both this, and the potential performance problems: log levels.

An error message is different from a warning is different from information is different from debug printing. We want to distinguish these, so we can avoid seeing insufficiently important logs. There are many systems of log levels, and Rust’s log crate endorses a pretty typical list, enumerated in its Level enum:

pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

They form an ordered, descending scale of severity, so that Trace is the least severe. You probably always want to enable Error-level logs (though even they can be turned off) but you probably only want to enable Trace-level logs if you’re doing some serious debugging.

In recognition of how the levels are ordered, log filtering is typically done by setting a level, and then logs of that level or more severe are let through. So if the level is Debug, Warn logs are also outputted, but if it is Error, Warn logs are suppressed. See the LevelFilter enum.
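To make that filtering rule concrete, here is a minimal sketch using only the standard library. The Level enum below is a stand-in I wrote for illustration, not the log crate's own type, and enabled is a hypothetical helper name. It relies on the fact that a derived ordering on an enum follows declaration order, so Error compares as the smallest value and Trace as the largest.

```rust
// Hypothetical stand-in for a level enum, declared most severe first.
// Deriving PartialOrd/Ord orders the variants by declaration order.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Returns true if a message at level `msg` is let through when the
/// filter is set to `max`. (Hypothetical helper for this sketch.)
fn enabled(msg: Level, max: Level) -> bool {
    // "That level or more severe" means "no further down the list than `max`".
    msg <= max
}

fn main() {
    for max in [Level::Debug, Level::Error] {
        println!(
            "filter {:?}: Warn passes = {}",
            max,
            enabled(Level::Warn, max)
        );
    }
}
```

With the filter at Debug, a Warn message passes; with the filter at Error, it does not.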

Errors are for problems that stop the process, or at least the specific thing the process was doing (e.g. API or RPC request being serviced). Warnings are for where something seems wrong but we’re going to do it anyway.

Info, debug, and trace are honestly kind of just labels with decreasingly urgent-sounding names, levels for the sake of levels. You should use them according to importance, so that most of the absolute nonsense can get filtered out as mere trace, like implementation details or extra information. You also want the occasional interesting high-level stuff to be captured with info, like what high-level task is the process currently working on. Medium-level tasks can get debug.

In general, the more performance-critical the code, the lower the log level you want to use, to increase the likelihood that you’ll just have a (very predictable) branch to indicate that you don’t need to print that line. Then, if there’s an actual problem, an operator can raise the log level (which they can sometimes do on a per-module basis) when those lines are worth seeing.
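The "very predictable branch" can be sketched like this, again with only the standard library. Everything here (Level, log_lazy, expensive_message) is a hypothetical name made up for illustration, not the log crate's implementation. The point is just that when a line is filtered out, the cost is one comparison, and the work of building the message never happens.

```rust
use std::cell::Cell;

// Hypothetical level enum for this sketch; Error is smallest, Trace largest.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical helper: builds the message only if `level` passes the filter.
fn log_lazy<F>(max: Level, level: Level, build: F) -> Option<String>
where
    F: FnOnce() -> String,
{
    if level > max {
        // One cheap comparison; the closure (and its formatting) never runs.
        return None;
    }
    Some(format!("[{:?}] {}", level, build()))
}

/// Stands in for an expensive-to-build message; counts how often it is built.
fn expensive_message(builds: &Cell<u32>) -> String {
    builds.set(builds.get() + 1);
    format!("{} items processed", 1_000_000)
}

fn main() {
    let builds = Cell::new(0u32);

    // Filter is Info: the Trace line is skipped without building the string.
    let skipped = log_lazy(Level::Info, Level::Trace, || expensive_message(&builds));
    // Filter raised to Trace: now the message is built and returned.
    let shown = log_lazy(Level::Trace, Level::Trace, || expensive_message(&builds));

    println!("skipped = {:?}", skipped);
    println!("shown = {:?}", shown);
    println!("message built {} time(s)", builds.get());
}
```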

As a corollary, configuration should use info and warn heavily, and generally log at higher log levels. Configuration only happens once, and in one section, so it’s allowed to be spammy. Furthermore, raising the log level at run-time won’t help reveal more configuration logs: unless the configuration is re-processed, you’ve just already missed those messages. Finally, configuration is never too latency sensitive for logging – configuration is the least performance sensitive part of your program.

So there is no excuse. Loading different configuration than you thought you had is a shockingly common cause of bugs and confusing system behavior. Log obsessively in your configuration code, at high log levels.
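As a rough illustration of what obsessive configuration logging might look like, here is a sketch of a resolver that reports not just the value it settled on, but which source it came from. The resolve function, the source names, and the message format are all assumptions of mine for the example. In a real program the returned line would go to info! instead of println!.

```rust
use std::collections::HashMap;

/// Hypothetical resolver: checks each source in priority order and returns
/// the chosen value together with a log line naming the winning source.
fn resolve(
    key: &str,
    sources: &[(&str, &HashMap<String, String>)],
    default: &str,
) -> (String, String) {
    for (source_name, values) in sources {
        if let Some(value) = values.get(key) {
            let line = format!("config: {} = {:?} (from {})", key, value, source_name);
            return (value.clone(), line);
        }
    }
    // Nothing set anywhere: say so explicitly, since a silent default is
    // exactly the kind of thing that causes confusion later.
    let line = format!("config: {} = {:?} (built-in default)", key, default);
    (default.to_string(), line)
}

fn main() {
    let mut file = HashMap::new();
    file.insert("port".to_string(), "8080".to_string());
    let mut env = HashMap::new();
    env.insert("port".to_string(), "9090".to_string());

    // Highest priority first: the environment overrides the config file.
    let sources = [("environment", &env), ("config file", &file)];

    for key in ["port", "host"] {
        let (_value, line) = resolve(key, &sources, "localhost");
        println!("{}", line);
    }
}
```

If the environment unexpectedly overrides the file, the log says so in one line, which is much easier than reconstructing the priority rules from the source code.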

Using the Log Crate in Your Rust Projects

So how do we log in Rust?

log is a framework – in the words of its well-written documentation, it is a “lightweight logging facade.” The front-end is shared: You output logs through the log crate itself. The backend is pluggable, meaning that different backends exist with different features.

As a result, as the documentation says, libraries should just use the log crate, so that when they output logs, it will work with any backend. Applications choose the backends, and import an appropriate crate, like for example env_logger. The log documentation has a list of available backend crates.
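The shape of that split can be sketched without any external crates. To be clear, this is not how the log crate is implemented: the real crate has the application install one global logger, whereas this toy version passes the backend in as a parameter to stay short. The Backend trait, library_function, and both backend types are hypothetical names for this example.

```rust
use std::sync::Mutex;

/// Hypothetical front-end: the only thing "library" code knows about.
trait Backend {
    fn write(&self, level: &str, message: &str);
}

/// "Library" code: logs through the trait, with no idea where lines end up.
fn library_function(logger: &dyn Backend) -> u32 {
    logger.write("INFO", "library_function starting");
    let answer = 6 * 7;
    logger.write("DEBUG", &format!("computed answer {}", answer));
    answer
}

/// One backend an application might pick: print to standard error.
struct StderrBackend;

impl Backend for StderrBackend {
    fn write(&self, level: &str, message: &str) {
        eprintln!("[{}] {}", level, message);
    }
}

/// Another backend: keep lines in memory (handy for tests).
struct MemoryBackend {
    lines: Mutex<Vec<String>>,
}

impl Backend for MemoryBackend {
    fn write(&self, level: &str, message: &str) {
        self.lines
            .lock()
            .unwrap()
            .push(format!("[{}] {}", level, message));
    }
}

fn main() {
    // The "application" chooses the backend; the library code is unchanged.
    library_function(&StderrBackend);

    let memory = MemoryBackend {
        lines: Mutex::new(Vec::new()),
    };
    library_function(&memory);
    println!("captured {} lines", memory.lines.lock().unwrap().len());
}
```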

This split between which crates should be used by libraries as opposed to applications is not uncommon in Rust. For example, it also comes up with error handling, where libraries should generally use thiserror to preserve error information in a way that applications can programmatically investigate, but applications generally want to use anyhow or eyre to ergonomically convey any errors they cannot handle to the user.

Write Everything Down (Part 4): My Desktop Environment

2023-02-28T00:00:00+00:00

I’d like to share with you how I use my computer, in a way that is (for me) ADHD friendly and well-suited for implementing my organization system. Tools are important to any organizational and productivity system, and optimizing your tools for your brain and your workflow is important. My computer is my most important productivity tool, where my work happens, and where my life/chore/errand/calendar organization happens, so it should be an interesting example of an optimized key tool.

Note: I consider this a non-technical post, as it is intended for a general audience. Even though it is about a computer set-up that I’m not recommending to a non-technical audience, this description and explanation of my computer set-up should be accessible enough for everybody. However, it is also literally about computers, so it’s going in the “Computers/Programming Posts” bucket as well – and therefore it will show up under both feeds.

It’s been some time since I’ve written about organization – I had basically paused the series until further inspiration struck. I had even outlined this very post, and considered writing in more detail about my personal computer usage, how my desktop actually looks, and the actual techniques I use to get this machine to work for me for programming, blogging, and planning. The reason I didn’t was basically because I didn’t think it would be interesting enough.

But inspiration did finally strike, in the form of two things that changed my mind and convinced me that there was an audience for this post, two things that happened very close together in time:

I learned that huge numbers of people were excited to hear about how somebody had optimized their arrangement of iPhone app icons on the Cortex podcast. This was a completely standard iPhone, running unmodified, not-even-jailbroken iOS – perhaps the least customizable, least interesting consumer operating system out there. If huge numbers of people were interested in how icons are arranged on iOS, and how that can be optimized for productivity and to match someone’s brain, people will definitely be interested in how I use my computers, which do not even use a normal user interface for Linux and are extremely customized to how I think.
Several friends of mine in rapid succession thought that my computer interface was worthy of comment to me or to others as a way of characterizing me. One friend even said, when I showed her how a few vim commands worked, that she understood why I used this for my organization files.

So I’ll start by taking a screenshot of how my desktop looks right now, literally as I write this, to use as a conversational starting point:

I know I’ve shown some screenshots in my last post, but this time we’re going to discuss it in some more detail.

It looks very … computer-y. Very low-level. Very much as if I’m doing programming, even though I’m actually doing blogging.

It’s not just the presence of the terminal, either, though just using a command line is considered to be advanced or even programmer-level computer usage these days. It’s the whole aesthetic. There are no window decorations on either the left side of the screen where I’m editing my post or the right side of the screen where I’m having a command line session – that is to say, no title bar, no minimize-maximize-close icons, no menu bars. Clearly, if I want to save the file I’m working on, I can’t go up to the menu and click File -> Save. And, actually, there don’t seem to be any places designed for clicking at all.

Along the top, instead of a start menu or a system menu or a dock of application launchers, I have a bunch of status information, formatted in such a way so that you have to know what you’re reading to understand it: one number highlighted out of several; the word Tall; jim@palatinate: ~/Writing/TheCodedMessage/conte..., which is the same text as my prompt in the terminal, and indicates who I am, what computer I’m logged into, and what directory I’m currently in (in the currently highlighted terminal). Then, what WiFi I’m connected to, my CPU percentage, memory usage, date and time, and battery status.

There’s not a single icon among these status indicators – it’s just a long line of text. Text, that goes well with the text of the blog post I’m editing and the text of the command line. I can see why sometimes friends refer to my computer interface as “not logged into a graphical environment” or “in text mode” or “in command line only mode” – even though that is actually a thing, and is literally not the situation my computer is in.

It’s a modern graphical login session! Here’s me using a web browser if you don’t believe me (Chromium is off-brand Chrome, made from the same source code):

And of course, I can also view videos with VLC or look at pictures with Eye of GNOME (yes, I can use GNOME components even though I don’t use the GNOME desktop environment), and in literal text mode, that wouldn’t be possible.

But I understand why people call my set-up text mode, and now that I’m paying attention, I see that in a very literal sense, there aren’t any images or icons at all on my screen right now, just text in various colors. That is an intentional choice, and how I like it, and it does have to do with me being a programmer (in at least being aware of my options and capable of configuring it), so fair.

So what is going on? Why does my computer look so text-y, even if it’s not technically text-mode?

xmonad

To be clear, my set-up is not typical of how Linux computers normally look. On Linux, you get your choice of desktop interface, of what software draws things like window borders and docks and start menus. Usually, people use ones like GNOME or KDE (or dozens of others), which look much more like macOS or Windows, with a normal amount of icons, and sometimes even futuristic, overly dynamic graphics. Here’s a screenshot of KDE from Wikimedia Commons to demonstrate:

But I instead chose xmonad, which is designed for things like minimalism, deep configurability, and keyboard control – and in general designed almost exactly for my priorities. My XMonad set-up is not that weird, for an XMonad set-up. Like any XMonad set-up, however, it is deeply customized to my particular workflow.

But before we get into my customizations and use of it, I’d like to talk a bit about why I prefer XMonad to other, more traditional desktop environments. It’s not to be weird or to show off my technical skills or even to communicate that I’m a programmer and a nerd – I actually don’t very much like that people think I’m programming when I’m actually working on a writing project, nor do I like that other people find borrowing my laptop intimidating. Instead, it’s about adapting to what I feel comfortable with, and what works well with how my brain works.

So the lack of distractions, the lack of icons, is actually very important to helping me focus, as is the simplicity of the interface. My ADHD doesn’t manifest by having my eyes be regularly pulled away to where the icons are because they’re pretty – or at least, if it does I’m not aware of it. But if there is a dock of icons on the screen, my awareness that the dock is there can be a distraction to me, taking up precious space in my brain of very limited short-term memory that could be better served juggling the other things going on in the computer. This distraction even happens on macOS, even when the dock is hidden – I have to be aware of it so I know not to move my mouse to the bottom of the screen, or that if I do, I will suddenly see icons.

The title bars that typically line the top of windows are such a distraction, as are the menu bars (with File, Edit, etc.) that give you a list of things to do. If I were designing an operating system UI from scratch – which I have often fantasized about – the menu would show up as an overlay on top of the window when you pressed the [ALT] key, and a list of available keyboard shortcuts would show up when you pressed and released [CTRL], reminding you that paste, for example, is Ctrl-V.

Back in real life, I also don’t have menu bars on my machine for my most commonly used apps. But the replacement, unfortunately, isn’t an overlay, but simply knowing the relevant commands for both gvim and terminal, both literal commands, and keyboard and mouse gestures, like Ctrl-D to log out or middle-click to paste the last thing you highlighted – because I find Ctrl-C/Ctrl-V too tedious and prefer copy-and-paste through the “secondary clipboard” Linux supports: highlight and middle mouse click, or three fingers on my laptop trackpad.

The streamlined simplicity allows me to just see the text of the actual app I’m using. It reminds me of math textbooks. I prefer math textbooks that just are about math. I saw a math textbook for the high school level once that was full of pictures of youths doing math, very visually busy, lots of stuff going on. I thought to myself, I don’t know how long I could read this book, not because I would jump from thing to thing, but because I would try to extract the actual math out of it, and filtering out the rest would be well-nigh impossible, and quite fatiguing.

Thus, xmonad lets me choose exactly what goes on the screen. Even xmobar, the system status bar across the top with all that status information is optional – you can make it so that it appears and disappears based on a keyboard shortcut, or leave it out altogether. And certainly, no panel of icons – if I want to start a program, I have a keyboard combination to start the terminal, another to start the browser, and another to type in the name of a program I want to run (which I could also do, of course, from the terminal). The iOS equivalent would be to have one icon for Safari, and besides that to literally always use search to find your app, with no icons visible and an empty home screen.

One thing that I like about iOS, however, is also true of xmonad: when you start a program it takes up the entire screen. For the life of me, I don’t understand what I ever saw in having different windows that could overlap on your desktop. What were you doing with the empty space? Why was it so essential to be able to arrange the screen any way with enough work? Isn’t it more important to be able to have the screen in the configuration you want consistently?

In macOS, if I want a window to be full screen, that’s easy enough – but it’s still not the default, even if it’s the only window. However, if I want multiple windows to be tiled, then I have to do so many steps. The cost of the flexibility of freely moving window arrangements around is that the one I do want is harder.

In xmonad, when I open a window, it takes up the whole screen. If I open a second window, they split the screen. I can use key combinations to adjust which one is on which side, or to switch from left-right tiling to top-bottom tiling, or to move the dividing bar left or right, but most of the time I can just immediately use it.

I can also use a key combination (⌘-TAB – it would be ALT, but I have ⌘ generally configured to replace ALT) to switch which window is focused, but I usually use the mouse for that. I have focus-follows-mouse enabled, so I don’t actually have to click the mouse before I can start typing in the newly-focused window.

If I open a third window, then, it works perfectly how I like it: arranged so I can see all three:

More than three windows is similar to three – but I don’t let that happen normally. I stick to three windows per screen, or specifically, per virtual desktop.

Virtual Desktops

Virtual desktops are a key component of how I use my computer. macOS has the feature as well, described by the less techie-sounding name of spaces (and it appears that in that context, it’s also pretty easy to set up split-screening, which is good news). Virtual desktops are like having multiple full-screen windows that you switch between, except that each virtual desktop can have multiple windows on it. In my context, it means I never have more than three windows on a screen at a time, but I have multiple sets of three windows that go together that I can switch between, in my case indexed by number.

If I want to go to virtual desktop 1, I press ⌘-1 (where ⌘ is the command or logo key, a Windows™ logo on my keyboard even though I bought this computer from Dell™ with Linux™ pre-installed). To go to virtual desktop 3, I press ⌘-3. The currently available virtual desktops are shown on my status bar, with the currently showing one highlighted in yellow – if they weren’t, I would probably have forgotten about windows left in other virtual desktops when I first started using them. In the screenshot above of three windows, you can see that I am working in desktop 4. There are also windows on desktops 1, 2, and 5, but none on desktop 3, which is why there is no 3 shown. They go up to 9, or at least 9 that are accessible by that keyboard short-cut in my current configuration.

If I want to move a window from one virtual desktop to another, I just need to type ⌘-Shift-N while hovering my mouse over the window, where N is the desktop I want to move it to. Sometimes, the windows come out in the wrong arrangement on the new desktop, but I can use ⌘-Enter to switch them.

Virtual desktops are key to my workflow and my focus, because each one corresponds to a mode of using my computer, a type of action. I can switch between them, but while I’m within one, the only indication that others are available is up in the status bar.

I use specific virtual desktops for specific tasks on a permanent basis. When I have not recently been doing the task, there might be no windows in them, but when I want to do that task, I switch to that virtual desktop and start windows there. This keeps information about what is where in my long-term memory, as a fact about how my system works, rather than in my prospective memory, which as I’ve discussed is far more problematic.

To be specific, this is what I use each virtual desktop for:

Desktop 1: Browsing

Desktop 1 is a full-screen browser session. If you look at my web screenshot (also displayed above), you see that I am on desktop 1.

This is the only place I put a web browser window; you don’t need more than one because of tabs. I will occasionally also move a terminal or editor window to this desktop, if I need to type something into the terminal directly from a web browser, or manually retype text based off of what I’m reading there, but this is rare. Similarly, I will occasionally split-screen two web browser windows for the same reason – but only for as long as I need to see both pages at once.

I don’t use tabs as heavily as some people. I don’t relate to the ADHD person with hundreds of tabs open. I generally have Slack, e-mail, and then whatever exact thing I’m using the web browser for. If this is programming, and I’m reading documentation or troubleshooting an issue, that might be multiple tabs deep (e.g. of different but related documentation, or of documentation and source). And occasionally I’ll absent-mindedly find myself going on a tangent. But besides Slack, and sometimes e-mail, I close the relevant tabs as soon as I’m done doing the task – tabs are transient.

When I do read documentation from the web to write code, I do fully switch desktop environments as I write the code vs reading the documentation.

I don’t like that this is how I access my e-mail. I would prefer to have it set up with a TUI-based system, while still syncing with the GMail app on my phone. I know I can do that – I’ve done it before – but I simply haven’t gotten around to it.

One final note: To help me maintain focus, I do have a blacklist of websites I don’t let myself go to, implemented through /etc/hosts. This doesn’t actually restrict me, because I can always go to those websites on my “unproductive” computer (mostly for Netflix), or on my phone. They do, however, prevent me from going off the rails and drifting into a Reddit rabbit-hole when I’m supposed to be working. I can always unblock a website if I (temporarily or permanently) do need to access it from one of my primary computers.

Here’s the blacklist: all the domain names that my computer resolves to localhost, my local computer, rather than to the actual IP address of the site’s server. Here are all the websites the browser will therefore fail to connect to:

127.0.0.1       facebook.com
127.0.0.1       www.facebook.com
127.0.0.1       quora.com
127.0.0.1       www.quora.com
127.0.0.1       twitter.com
127.0.0.1       www.twitter.com
127.0.0.1       news.google.com
127.0.0.1       etrade.com
127.0.0.1       us.etrade.com
127.0.0.1       www.etrade.com
127.0.0.1       reddit.com
127.0.0.1       www.reddit.com
127.0.0.1       news.ycombinator.com
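For illustration only, here is a minimal Rust sketch of how a hosts-format list like the one above could be checked programmatically. The function names `blocked_hosts` and `is_blocked` are made up for this sketch, and it simply assumes that any name mapped to a loopback address counts as blocked (which would also include an ordinary `localhost` entry in a real `/etc/hosts`).

```rust
use std::net::IpAddr;

/// Collects every hostname that a hosts-format text maps to a loopback address.
/// (Hypothetical helper, not part of any real tool.)
fn blocked_hosts(hosts_text: &str) -> Vec<String> {
    let mut blocked = Vec::new();
    for line in hosts_text.lines() {
        // Drop anything after a '#' comment marker, then split on whitespace.
        let line = line.split('#').next().unwrap_or("");
        let mut fields = line.split_whitespace();
        // The first field is the address; blank lines have none.
        let Some(addr) = fields.next() else { continue };
        // Skip lines whose first field is not a valid IP address.
        let Ok(ip) = addr.parse::<IpAddr>() else { continue };
        if ip.is_loopback() {
            // Every remaining field on the line is a hostname for that address.
            blocked.extend(fields.map(str::to_string));
        }
    }
    blocked
}

/// Returns true if `domain` appears among the loopback-mapped names.
fn is_blocked(hosts_text: &str, domain: &str) -> bool {
    blocked_hosts(hosts_text).iter().any(|host| host == domain)
}

fn main() {
    let sample = "127.0.0.1       reddit.com\n127.0.0.1       www.reddit.com\n";
    println!("reddit.com blocked: {}", is_blocked(sample, "reddit.com"));
    println!("example.com blocked: {}", is_blocked(sample, "example.com"));
}
```

Note that the match is exact, which mirrors how the hosts file itself behaves: blocking `reddit.com` does not block `www.reddit.com`, which is why the list above carries both forms.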

Desktop 2: Coding (Primary)

This is where I look at and edit files in the repo and project that I’m currently working on. I have a terminal open to the project directory, and normally two gvim windows – gvim is my preferred text editor – open to files within that project. The large full-height space is for the file I’m editing, the smaller space above the terminal for a file I’m referring to, but within the same project. If I want to edit the other file instead, I switch them so that gvim window is the new tall one – there’s a keyboard shortcut for that. The terminal stays on the right, and in the case of multiple windows on the right, the terminal stays as the lowest.

I continuously open and close new gvim windows, which is part of why I use gvim – it loads fast enough for this to be a viable strategy.

Desktop 3: Coding (Secondary)

Sometimes, when you’re working on a project, you need to know how something’s done in a different project. Perhaps you need to know an implementation detail of a function you’re calling, or maybe just the interface. Perhaps you know the other project did the thing you’re trying to do, and you need to see how they did it. Perhaps you suddenly realized you can’t have X dependency, and now you need to know if Y depends on X.

Sometimes this is a different internal project, sometimes it’s an open source project you need to download off GitHub. But it’s a different repo, with a working copy in a different directory, and that means that I have a different virtual desktop for it, with a different terminal in that directory.

There, I mostly do reading, but I can also do editing in a pinch. For example, if I need to make a change that straddles two repos, the application (for example) will often be in desktop 2 and the library in desktop 3. If it straddles 3 repos, I either switch which repo desktop 3 is used for (it is only used for one at a time) or I spill over to desktop 4 as a tertiary coding repo, as a non-standard use of that desktop. I usually feel vaguely uncomfortable when I do that, though.

Desktop 4: Blogging

I’m on desktop 4 right now as I’m writing this, because that is the desktop I use for blogging – and most other forms of prose writing (though specifically not documentation for work, which counts normally as part of a coding project).

I blog just as I program. I use gvim to edit text files. I use a terminal to open the right text files, list which text files are present, keep git up-to-date with what I’m working on, and build and deploy my blog. In this case, by “build,” I mean translate it from a directory full of Markdown files into a website, which I then upload to my server.

Here’s a screenshot of me editing the markdown for this post in the left window, and trying and failing to run my build-and-upload script in the other folder (which refused to upload as I hadn’t synchronized my files with GitHub yet):

I prefer editing my blog as a bunch of plain text files on my computer. It gives me a sense of control that I would not get if I installed WordPress on my server – or used the official WordPress. It allows me to use gvim to edit them as plain text, which I prefer to WYSIWYG editing.

Generally, I’m only working on one file and so I have a terminal window and single solitary gvim window, rather than two or three gvim windows. It only makes sense to work on one file at a time in writing normally, unlike in programming where there’s intricate mutual references. Occasionally, for a blog series like this, I will open a previous part of the blog series to see how much I’m repeating myself.

Desktop 5: Organization

You might notice, however, that in none of the other desktops do I describe having any of my organizational files open. I have detailed organizational files, which I edit in gvim, and discussed in detail in the previous post. And as you can see in the sample screenshot from that post (reposted here), this organizational system lives entirely on desktop 5:

I do not have the complete list of things I have to do hanging over me while I’m doing each thing, only when I’m planning. Instead, when I reach the end of whatever I’m working on – or, as often happens, when I find I’ve generated a new TODO item that I want to write down but not yet fully switch my focus to – then I switch to virtual desktop 5 to interact with my TODO system. When I switch back, with a new task or with the idea safely written down, I can then (more) fully focus on my task without worrying about other ones.

Desktop 6: Signal

When I run the desktop version of signal, which I do sometimes, it runs on its own virtual desktop, namely desktop 6.

Desktop 7: Long-running processes

This is where I put VPN sessions, if they’re tied to a terminal window. It’s also where I put some very long-running builds or locally hosted servers.

Editor and Terminal

I’ve already discussed in a previous section how I use the web browser. Occasionally, I use a variety of other random graphical programs: an image viewer, a PDF viewer, or a video player. But most often, the two types of windows I have open besides the web browser are gvim, a text editor, and alacritty, a terminal emulator.

Both of these tools are primarily used by computer professionals of some stripe, so it’s a little unfair of me to bristle when people see them – also without any icons on the screen – and assume I am a programmer. I do have specific reasons for using them for non-programming tasks, that match my habits well, so I’d like to discuss them further.

Both of them are tools that require substantial investment in skill. Obviously, to use a terminal, you have to know commands. You can’t discover the interface like you can with a series of menus, or settings pages, or icons. Similarly but less obviously, gvim, like any version of vim, is close to useless to anyone who doesn’t know it. Both of them require reading documentation in the form of a book (or website) to explain to you what to do and at least get you started.

But I did all of that investment years ago, as a youth, and it’s been paying off ever since – to the point where if I try to edit text, or navigate file systems, without these tools, I feel substantially hindered.

Vim

I start with gvim because it’s the more relevant to my organizational particularities. It’s a text editor, which means that unlike something like Google Docs or Microsoft Word, it edits plain text files, files that just have sequences of characters organized into a sequence of lines. Characters can include Unicode – including accented letters, Chinese characters and emojis – but not styling like bold and italics.

Text editors are important to programmers because programming is done via collections of plain text files, and so text editors are universally useful tools for handling all of them. Rather than each programming language having its own special file format requiring its own special editor, text files allow programmers to bring their preferred text editors with them to a variety of projects, thus allowing a deeper investment in the skill of using the text editor.

Even this blog, which is not a programming project but a writing project, is maintained using text files, using Markdown, a format which interprets *italics* as italics and **bold** as bold, and Hugo, a software package that converts a hierarchy of Markdown-formatted plain text files appropriately into a website. And for Markdown, just as for any programming language, I can choose any text editor I want to, and it will be compatible.

This choice, the choice of text editor, can be greatly personal to a programmer. The rivalry between two major text editors from earlier eras of Unix, vi and emacs, was often referred to as a holy war for how intense the fights about it would get on Usenet (an old discussion forum that ran on an old pre-Internet network). gvim, which is the text editor I use, is a form of vim, which is a form of vi, so I have a definite position in that holy war. And I’m sure I’m going to hear from people who disagree with my position in response to this blog post!

While my gvim window looks like a terminal window – and vim can indeed run inside of a terminal – it’s actually a separate graphical application. That is what the initial g stands for, “graphical.” When I edit a file, I want a new window to be opened, and I also want to be able to use the mouse to click on a location on the screen and move my cursor there.

vim, like many of the tools I use, is optimized for expert use, rather than discoverability by beginners. It’s designed to be a skill to be invested in: I put in the effort to learn how to use it a long time ago, and it pays off over a lifetime. The commands I can make from my keyboard are more powerful than most computer text editing facilities can support, allowing me to with a few keystrokes perform complex manipulations of the text.

This is essential, in my mind, for efficient programming, which is why I put the effort in to learn it. However, it is also particularly well-suited to my organizational files, which, if you remember from my previous post, consist of plain text files with lots of highly-nested bulleted lists, like this outline for this section of the post:

* gvim
    * Text editor
        * Plain text and website generation
    * vim
        * But not terminal vim
        * Still has separate "window"
        * And can use mouse if necessary
    * Line-based editing good for organization
        * Commands work on lines
            * Delete
            * Paste last delete
            * Select multiple
            * Shift indentation level
        * Org-mode style use of hierarchical bullet points
            * Perfect match for those commands
            * No notes longer than a line
                * Make it more hierarchical instead

When I edit plain text files in this format – a custom habit inspired by Org mode but still compatible with Markdown – it’s important for me to be able to operate on the scale of entire lines. And operating on entire lines is one of vim’s strongest points! dd to remove a line, p to insert the line back in, and relevantly for hierarchical bullet points, << and >> to change indentation! Using V, I can select multiple lines, and then use <, >, or d (followed by p) to change indentation or move them! Meanwhile, j and k, right on the home row, move down and up through the file, line by line, respectively.

This equates to removing tasks (when they’re done or no longer wanted), moving tasks between different places in the hierarchy (which I do shockingly often), removing or adding levels of hierarchy, and other such common operations on a hierarchical list.

Now, you may wonder how, if typing dd deletes a line, I type a literal dd. Well, dd deletes a line in normal mode, but if you type o, it opens up a new line in insert mode, so that your letters are interpreted as letters again – until you are done inserting what you had to insert, and hit [ESC] to return to normal mode.

One of the ways you can tell you’re a proficient vim user is if you keep the system in normal mode any time you are not literally typing. Typing tends to be bursty anyway, and evenly interspersed with editing and navigating – at least in programming, and in my use case, also with writing.

But it is hard for a newbie. Every once in a while, even I find myself inserting an editing command as text by accident, or running random commands trying to type text while I’m actually in normal mode. When you’re new to vim this happens all the time. It’s decidedly not beginner-friendly.

But for most of your time at a text editor – especially if you’re a programmer – you won’t be a beginner. And for me, I’m extremely used to it – and frustrated when I have to write text into a non-vim interface like Google Docs, or an especially long Slack message. That, and I revise existing text at least as often as I type new text – I need those commands, and the ones I listed are only a brief sample.

Terminal/Command Line

This is probably the most interesting thing to many of my readers. Many readers my age or older remember DOS and the DOS prompt, and having to use the computer from the command line. For some of them, the only commands they knew were those to launch their games, or to launch other tools from which they would do their real work – the command line was fundamentally just a launcher, a menu, albeit one that didn’t list the options. Others may have simply used it to launch Microsoft Windows, by typing the win command, a usage pattern so common that Microsoft made it the premise of Windows 95, and skipped the whole “DOS” step, even though it was still present as a weird operating system layer and as a boot stage until Windows XP finally rolled out a modern Windows.

So I have some misconceptions to address about the command line that come from that perspective.

First, a modern command line is not DOS in a window. It’s certainly not on Linux or macOS, where it’s more visibly different, but it isn’t even DOS in modern Windows. The Windows command line might look like the DOS command line, with its famous prompt C:\>, but it is a modern Windows application that is used to launch modern Windows applications. No DOS involved, just a different interface mode.

On a related note, the command line, even on Windows but especially on macOS or Linux, is a modern user interface. It can do things that involve the Internet. It can make web requests, download and send e-mail, synchronize files, and do things that DOS couldn’t do.

However, on the flip side, it is not true that the command line can do everything a graphical user interface can do. It’s comparable, but it’s simply not identical, as should be obvious if you realize that it’s impossible to watch a video from the command line. You can use the command line to launch a video player, but the video player remains graphical.

And while it is true that the command line allows you more control over the operating system settings and file system, this is more an accident of graphical user interfaces trying to be “user friendly” or having limited room for options, rather than anything intrinsic. You may have heard of graphical user interfaces described as a layer or façade on top of the “underlying” command line, but that is a misconception. Graphical programs and command line programs have the same access to operating system facilities, except for user interface.

The command line does, however, have a more power user-friendly aesthetic. Like vim, it requires investment to use effectively – to use at all. And it is closer to the operating system in that by convention, it exposes as much control of it as possible, and its conventions were established in the 70s, before the modern concept of user-friendliness was really invented. This has been written about at length in many places, and one of my favorite (book-length) essays about it is Neal Stephenson’s “In the Beginning was the Command Line”.

Enough about what the command line is (and isn’t)! What do I actually use the command line for, then?

Well, the command line is an entire interface into the computer, used by many programs and utilities as the way to interact with them. And I do use it for basically all of the things I do on the computer that aren’t web browsing, text editing, or viewing various graphical-only files (like PDFs, images, or videos), and there’s some variety there.

Primarily, I use the command line for file management. I use the classic Unix tools for listing files (ls for list) and navigating directory hierarchies (cd for change directory). I use git to sync code and writing across computers and make sure it’s backed up somewhere. I use wc (for word count) to see how many lines of code or words of writing I’ve written. I use bc (basic calculator) to do back-of-the-envelope math.

I prefer this to graphical file managers. Not only do I not trust them – I’ve seen Finder crash relatively recently – they also change all the time. And the changes are not good, and usually serve to hide the actual directory hierarchy and instead impose an organizational system on you. Instead of seeing directories inside your home directory, you see stuff like “Music” and “Downloads,” “Documents” and “Movies.”

Usually, when I use a graphical file manager, I know where the directory is in the file system, but then I have to translate it to their list of commonly used directories, which assumes I keep loads of movies and photos on my computer, but can have all my “documents,” whether legal documents or writing projects, in one directory? Where is my home directory? What if I want to organize my files in a different hierarchy? Can I just navigate to it from my home directory, please? If you want to put fancy icons on subdirectories of my home directory based on their names, that’s fine, but please list all the directories within my home directory, thank you very much! Not just the pre-defined things you think I ought to have, like “Music” – this is my work computer, I listen to music on my phone.

So you can see why I prefer the straight-forwardness of the command line to Finder or Windows Explorer.

I also use the command line to actually do writing and programming work, not just launching gvim – once I’ve navigated to the file in my complicated directory system – but also running compilers and build scripts to turn program source code into programs, and then running those programs, almost all of which can be controlled entirely from the command line. I log into other computers I maintain, both embedded devices and servers, and do work on them. I run scripts that run hugo to turn my Markdown files into a website and post it on a server.

I also use it for system administration: apt for installing packages (I use Ubuntu – I’m not trying to be a hero of sysadmin) and systemctl and all of those gnarly commands for other sysadmin stuff. But of course, the most powerful system administration command is just the text editor – by editing configuration files, you can accomplish a lot.

All of this is easier and more focused than if I were using the graphical equivalent. I write my command and I run it, without having to go through all the tedious boring steps of a GUI wizard. It’s faster with fewer steps, with the penalty of accumulated life expertise – which is to say it’s easy on my prospective memory at the expense of my retrospective memory, which is to say, aligned to how my brain works.

And yes, I do occasionally have to look up how to do things – though that’s more in programming than in writing. But having a graphical user interface doesn’t save you from that, and if you think it does, you’re fooling yourself. At least when I look up how to do things, I get suggestions for commands I can directly type in, rather than having to go through 10 screens and dialog boxes and search them for whatever it is the poster’s talking about, only to find out I’m using a different version of the GUI, and that the directions became obsolete in the 2022 edition of Windows 10, or some other such thing.

To reiterate, it turns out that a deep enough hierarchy of dialog boxes and settings pages is just as complicated as the command line – but usually less powerful, harder to document, and more subject to arbitrary change. Just give me the command line!

Conclusion

If I were to summarize some themes of my user interface decisions, it would be in these three inter-related points:

Don’t use condescending, corporatist concepts of “easy to use,” because they’re more focused on the appearance of ease of use, or most charitably stated, not intimidating the user, rather than actually making it usable for an expert user for a wide variety of actual tasks.
Use systems that emphasize the long term power user over the short term newbie. They will often have a learning curve, but it will pay off.
Use systems that are customizable, so that I can use them my way.

But this is all for my work computer, where work is both writing/blogging and programming. For goofing off, I have a MacBook Air M1, which I use in macOS as a glorified tablet, and that is perfectly fine for watching Netflix and YouTube.

Rust Is Beyond Object-Oriented, Part 2: Polymorphism

2023-02-07T00:00:00+00:00

In this post, I continue my series on how Rust differs from the traditional object-oriented programming paradigm by discussing the second of the three traditional pillars of OOP: polymorphism.

Polymorphism is an especially big topic in object-oriented programming, perhaps the most important of its three pillars. Several books could be (and have been) written on what polymorphism is, how various programming languages have implemented it (both within the OOP world and outside of it – yes, polymorphism exists outside of OOP), how to use it effectively, and when not to use it. Books could be written on how to use the Rust version of it alone.

Unfortunately this is just a blog post, so I cannot cover polymorphism in as much detail or variety as I want to. I shall instead focus specifically on how Rust differs from the OOP conceptualization. I will start by describing how it works in OOP, and then discuss how to accomplish the same goals in Rust.

In OOP, polymorphism is everything. It tries to take all decision-making (or as much decision-making as possible) and unite it in a common narrow mechanism: run-time polymorphism. But unfortunately, it’s not just any run-time polymorphism, but a specific, narrow form of run-time polymorphism, constrained by OOP philosophy and by details of how the implementations typically work:

It requires indirection: Every object must typically be stored on the heap for run-time polymorphism to work, as the different “run-time types” have different sizes. This encourages the aliasing of mutable objects. Not only that, but to actually call a method, it must go through three layers of indirection: dereferencing the object reference, then dereferencing the class pointer or “vtable” pointer, and then doing an indirect function call.
It precludes optimization: Beyond the intrinsic cost of an indirect function call, the fact that the call is indirect means that inlining is impossible. Often, the polymorphic methods are small or even trivial, such as returning a constant, setting a field, or re-arranging the parameters and calling another method, so inlining would be useful. Inlining is also important to allow optimizations to cross the inlining boundary.
It is polymorphic in one parameter only: The special receiver parameter, called self or this, is the only parameter through which run-time polymorphism is typically possible. Polymorphism on other parameters can be simulated with helper methods in those types, which is awkward, and return-type polymorphism is impossible.
Each value is independently polymorphic: In run-time polymorphism, there is often no way to say that all the elements of a collection are of some type T that all implement the same interface, or to say that two parameters to a function are the same type but what that type is should be determined at run-time.
It is entangled with other OOP features: In C++, runtime polymorphism is tightly coupled with inheritance. In many OOP programming languages, it is only available for class types, which as I discussed in my previous post are a constrained form of modules.

I could write an entire blog post about each of these constraints – perhaps I will someday.
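As a rough illustration of the last few constraints, here is a minimal sketch of how compile-time generics in Rust sidestep them. The trait `Shape`, the types `Square` and `Circle`, and the functions `total_area`, `larger`, and `parse_side` are all invented for this example; it is one possible demonstration under those assumptions, not a complete treatment.

```rust
// Hypothetical trait and types, invented for this sketch.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

// Every element is the same type T, stored inline in the slice with no
// per-element heap allocation, and `area` is a direct (inlinable) call.
fn total_area<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Both parameters are constrained to be the same type T.
// `larger(Square(1.0), Circle(1.0))` would be rejected at compile time.
fn larger<T: Shape>(a: T, b: T) -> T {
    if a.area() >= b.area() { a } else { b }
}

// Return-type polymorphism: the type the caller asks for selects
// which `FromStr` implementation runs.
fn parse_side<T: std::str::FromStr>(text: &str) -> Option<T> {
    text.trim().parse().ok()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    println!("total square area: {}", total_area(&squares));
    println!("one circle: {}", total_area(&[Circle(1.0)]));

    let bigger = larger(Square(1.0), Square(3.0));
    println!("larger side: {}", bigger.0);

    let as_int: Option<u32> = parse_side("42");
    let as_float: Option<f64> = parse_side("42");
    println!("{:?} {:?}", as_int, as_float);
}
```

The trade-off is that the concrete type must be known at compile time; when it genuinely is not, the run-time mechanism is still the right fit.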

But in spite of all these constraints, it is seen as the preferred way of doing decision-making in OOP languages, and as especially intuitive and accessible. Programmers are trained to reach for this tool whenever feasible, whether or not it is the best tool for the decision at hand, even if there is no current need for it to be a run-time decision. Some programming languages, such as Smalltalk, even collapsed “if-then” logic and loops into this one oddly specific decision-making structure, implementing them via polymorphic methods like ifTrue:ifFalse: that would be implemented differently in the True and False classes (and therefore on the true and false objects).
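To make the Smalltalk idea concrete, here is a loose sketch of what “if-then as a polymorphic method” might look like if transliterated into Rust trait objects. The names `SmalltalkBool`, `True`, `False`, `if_true_if_false`, and `describe` are all made up for this illustration, and real Smalltalk blocks are more general than the `String`-returning closures assumed here.

```rust
// Hypothetical transliteration of Smalltalk-style booleans; nobody would
// write Rust this way, which is rather the point.
trait SmalltalkBool {
    /// Counterpart of Smalltalk's `ifTrue:ifFalse:` message.
    fn if_true_if_false(
        &self,
        on_true: &dyn Fn() -> String,
        on_false: &dyn Fn() -> String,
    ) -> String;
}

struct True;
struct False;

impl SmalltalkBool for True {
    // The "true" object always runs the first block.
    fn if_true_if_false(
        &self,
        on_true: &dyn Fn() -> String,
        _on_false: &dyn Fn() -> String,
    ) -> String {
        on_true()
    }
}

impl SmalltalkBool for False {
    // The "false" object always runs the second block.
    fn if_true_if_false(
        &self,
        _on_true: &dyn Fn() -> String,
        on_false: &dyn Fn() -> String,
    ) -> String {
        on_false()
    }
}

// No `if` appears here: the branch is chosen by dynamic dispatch on `flag`.
fn describe(flag: &dyn SmalltalkBool) -> String {
    flag.if_true_if_false(&|| "yes".to_string(), &|| "no".to_string())
}

fn main() {
    println!("{}", describe(&True));
    println!("{}", describe(&False));
}
```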

To be clear, having a mechanism of vtable-based runtime polymorphism isn’t a bad thing per se – Rust even has one (similar, but not quite identical, to the OOP version described above). But the Rust version is used in the relatively rare situations where that mechanism is the best fit, among a whole palette of mechanisms. In OOP, the elevation of this tightly constrained and unperformant form of decision making above all others, and the philosophical assertion that using it is the best way and most intuitive way to express program flow and business logic, is a problem.

It turns out that programming is much more ergonomic when you choose the tool most appropriate for the situation at hand – and OOP run-time polymorphism is only occasionally the actual tool for the jobs it is often asked to do.

So let’s look at 4 alternatives in Rust that can be used when OOP uses run-time polymorphism.

Alternative #0: `enum`

Not only are there other forms of polymorphism that have strictly fewer constraints (such as Haskell’s typeclasses) or a different set of trade-offs (such as Rust’s traits, heavily based on Haskell typeclasses), there is also another decision-making system in Rust and Haskell, namely algebraic data types (ADTs), or sum types, that takes over many of the applications of OOP-style polymorphism.

In Rust, these are known as enums. enums in many programming languages are lists of constants to be stored in integer-sized types, sometimes implemented in a typesafe fashion (like in Java), sometimes not (like in C), sometimes with either option available (like in C++ with the distinction between enum and enum class).

Rust enums support this familiar use case, with type-safety:

pub enum Visibility {
    Visible,
    Invisible,
}

But they also support additional fields associated with each option, creating what in type theory is known as a “sum type,” though it is better known among C or C++ programmers as a “tagged union” – the difference being that in Rust, the compiler is aware of and enforces the tag. Here are some examples of enum declarations:

pub enum UserId {
    Username(String),
    Anonymous(IpAddress),
    // ^^ This isn't supposed to be a real network type,
    // just an example.
}

let user1 = UserId::Username("foo".to_string());
let user2 = UserId::Anonymous(parse_ip("127.0.0.1")?);

pub enum HostIdentifier {
    Dns(DomainName),
    Ipv4Addr(Ipv4Addr),
    Ipv6Addr(Ipv6Addr),
}

pub enum Location {
    Nowhere,
    Address(Address),
    Coordinates {
        lat: f64,
        long: f64,
    }
}

let loc1 = Location::Nowhere;
let loc2 = Location::Coordinates {
    lat: 80.0,
    long: 40.0,
};
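To show what “the compiler is aware of and enforces the tag” can mean in practice, here is a small self-contained sketch. It reuses the `Location` shape from above but swaps the `Address` type for a plain `String` so that it compiles on its own; the `describe` function is invented for this example.

```rust
// Same shape as the Location enum above, with `Address` replaced by a
// String (an assumption made so the sketch is self-contained).
pub enum Location {
    Nowhere,
    Address(String),
    Coordinates { lat: f64, long: f64 },
}

// Hypothetical helper: the fields of a variant are only reachable inside
// the arm that matched its tag, and leaving out any variant is a compile error.
fn describe(loc: &Location) -> String {
    match loc {
        Location::Nowhere => "nowhere".to_string(),
        Location::Address(addr) => format!("at {addr}"),
        Location::Coordinates { lat, long } => format!("at ({lat}, {long})"),
    }
}

fn main() {
    let places = [
        Location::Nowhere,
        Location::Address("1 Example Street".to_string()),
        Location::Coordinates { lat: 80.0, long: 40.0 },
    ];
    for place in &places {
        println!("{}", describe(place));
    }
}
```

In a C tagged union, by contrast, nothing stops code from reading `lat` out of a value whose tag says it is an address.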

What do these tagged unions have to do with polymorphism, you may ask? Well, most OOP languages don’t have good syntax for these sum types, but they do have powerful mechanisms for run-time polymorphism, and so you’ll see run-time polymorphism used for situations where Rust enums would actually be just as well-suited (and I will argue, better suited): when there’s a few options for how to store a value, but those options contain different details.

For example, here’s one way to represent the UserId type in Java using inheritance and run-time polymorphism – how I would’ve done it when I was a student (putting each class in a different file):

class UserId {
}

class Username extends UserId {
    private String username;
    public Username(String username) {
        this.username = username;
    }

    // ... getters, setters, etc.
}

class AnonymousUser extends UserId {
    private Ipv4Address ipAddress;
    
    // ... constructor, getters, setters, etc.
}

UserId user1 = new Username("foo");
UserId user2 = new AnonymousUser(new Ipv4Address("127.0.0.1"));

Importantly, just as in the enum example, we can put user1 and user2 in variables of the same type, and can pass them to the same kinds of functions, and in general do the same operations on them.

Now, these OOP-style classes look super-light to the point of being silly, but that’s mostly because we haven’t added any real operational code to this situation – just data and structure and a bit of variable definitions and boilerplate. Let’s consider what happens if we actually do anything with user IDs.

For example, we might want to determine whether they’re an administrator. In our hypothetical, let’s say anonymous users are never administrators, and users with usernames are only administrators if the username begins with the string admin_.

The doctrinally approved object-oriented way of doing that is to add a method, e.g. isAdministrator. In order for this method to work, we have to add it to all three classes, the base class and the two child classes:

abstract class UserId {
    // ...
    public abstract boolean isAdministrator();
}

class Username extends UserId {
    // ...
    public boolean isAdministrator() {
        return username.startsWith("admin_");
    }
}

class AnonymousUser extends UserId {
    // ...
    public boolean isAdministrator() {
        return false;
    }
}

So, in order to add this simple operation, this simple capability to this type in Java, we have to go to three classes, which will be stored in three files. Each of them contains a method that does something simple, but nowhere can the entire logic be seen of who is and isn’t an administrator – something that someone might naturally ask.

Rust would use match for such an operation, putting all the information about it in one place:

fn is_administrator(user: &UserId) -> bool {
    match user {
        UserId::Username(name) => name.starts_with("admin_"),
        UserId::Anonymous(_) => false,
    }
}

This yields a more complicated individual function, but it has all the logic explicitly right there. Having the logic be explicit, instead of implicit in an inheritance hierarchy, cuts against an OOP precept where methods should be simple and polymorphism used to express the logic implicitly. But that doesn’t help guarantee anything, just sweeps it under the rug: It turns out that hiding the complexity makes it harder to grapple with, not easier.

Let’s go through another example. We’ve had this UserId code for a while, and you’re tasked with writing a new web front-end for this system. You need some way of displaying the user information in HTML, either a link to a user profile (in the case of a named user) or a stringification of the IP address in red (in the case of an anonymous user). So you decide to add a new operation for this small family of types, toHTML, which outputs your new front-end’s specialized DOM type. (Maybe the Java’s compiled to WebAssembly, I’m not sure. The details don’t matter.)

You submit a pull request to the maintainer of the UserId class hierarchy, deep in a core library of the backend. And then they reject it.

They have pretty good reasons, actually, you grudgingly admit. They’re saying it’s an absurd violation of separation of concerns. Besides, the company can’t have this core library handling types from your front-end.

So, you sigh, and write the equivalent of a Rust match expression, but in Java (please pardon my absurd hypothetical HTML library):

Html userIdToHtml(UserId userId) {
    if (userId instanceof Username) {
        Username username = (Username)userId;
        String usernameString = username.getUsername();
        Url url = ProfileHandler.getProfileForUsername(usernameString);
        return Link.createTextLink(url, username.getUsername());
    } else if (userId instanceof AnonymousUser) {
        AnonymousUser anonymousUser = (AnonymousUser)userId;
        return Span.createColoredText(anonymousUser.getIp().formatString(), "red");
    } else {
        throw new RuntimeException("IDK, man");
    }
}

And this code your boss rejects upon code review, saying you used the instanceof anti-pattern, but then later they grudgingly accept it after you make them argue with the maintainer of the core library that wouldn’t accept your other patch.

But look at how ugly that instanceof code is! No wonder Java programmers consider it an anti-pattern! But in this situation, it’s the most reasonable thing, really the only possible thing besides implementing the observer pattern or the visitor pattern or something else that just amounts to infrastructure to fake an instanceof with inversion of control.

Having operations implemented by adding a method to every subclass makes sense when the set of operations is bounded (or close to it) and the number of subclasses of the class might grow in unanticipated ways. But just as often, the number of operations will grow in unanticipated ways, while the number of subclasses is bounded (or close to it).

For the latter situation, which is more common than OOP advocates would imagine, Rust enums – and sum types in general – are perfect. Once you’ve gotten used to them, you find yourself using them all the time.
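To make the contrast concrete, here is a minimal sketch of how that front-end operation might look in Rust. It assumes the UserId enum from earlier, with std::net::IpAddr standing in for the placeholder IpAddress type. The /profile/ URL scheme and the plain-string output are invented for illustration, and there is no HTML escaping. The point is only that the new operation lives entirely in front-end code, and the library that owns the enum never has to hear about it.

```rust
use std::net::IpAddr;

/// Same shape as the UserId enum above; IpAddr replaces the
/// placeholder IpAddress type so that the sketch compiles.
pub enum UserId {
    Username(String),
    Anonymous(IpAddr),
}

/// A new operation added outside the type's home library.
/// The URL scheme and the markup are hypothetical.
pub fn user_id_to_html(user: &UserId) -> String {
    match user {
        // Named users get a link to their profile.
        UserId::Username(name) => {
            format!("<a href=\"/profile/{name}\">{name}</a>")
        }
        // Anonymous users get their IP address in red.
        UserId::Anonymous(ip) => {
            format!("<span style=\"color: red\">{ip}</span>")
        }
    }
}
```

There is no else branch that throws at run-time: if someone later adds a third variant to UserId, this match stops compiling until the new case is handled.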

I will say for the record that it isn’t this bad in all object-oriented programming languages. In some, you can write arbitrary class-method combinations in any order, and so you could write all three implementations in one place if you so chose. Smalltalk traditionally lets you navigate the codebase in a special browser, where you can see either a list of methods implemented by a class, or a list of classes that accept a given “message,” as Smalltalk calls it, so you can have your cake and eat it too.

Alternative #1: Closures

Sometimes, an OOP interface or polymorphic decision only involves one actual operation. In such a situation, a closure can just be used instead.

I don’t want to spend too much time on this, because most OOP programmers are already aware of this, and have been since their OOP languages have caught up with functional languages and gotten syntax for lambdas – Java in Java 8, C++ in C++11. Silly one-method interfaces like Java’s Comparator are therefore – fortunately – mostly a thing of the past.

Also, closures in Rust technically involve traits, and so are implemented using the same mechanism as the next two alternatives, so one could also argue that this isn’t really a separate option in Rust. In my mind, however, lambdas, closures, and the FnMut/FnOnce/Fn traits are special enough aesthetically and situationally that they deserved a little bit of time.

And so I’ll take the little bit of time to just say this: If you find yourself writing a trait (or a Java interface or a C++ class) with exactly one method, please consider whether you should instead be using some sort of closure or lambda type. Only you can prevent overengineering.
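As a rough illustration of the difference, here is the same small job done both ways. The Scorer trait, LengthScorer, and both functions are hypothetical names made up for this sketch; only the Fn bound and the iterator methods come from the standard library.

```rust
/// The one-method-trait version: a hypothetical `Scorer` interface.
pub trait Scorer {
    fn score(&self, word: &str) -> usize;
}

pub struct LengthScorer;

impl Scorer for LengthScorer {
    fn score(&self, word: &str) -> usize {
        word.len()
    }
}

/// Picks the highest-scoring word using the trait.
pub fn best_with_trait<'a>(words: &[&'a str], scorer: &impl Scorer) -> Option<&'a str> {
    words.iter().copied().max_by_key(|w| scorer.score(*w))
}

/// The closure version: no trait, no struct, no impl block.
pub fn best_with_closure<'a, F>(words: &[&'a str], score: F) -> Option<&'a str>
where
    F: Fn(&str) -> usize,
{
    words.iter().copied().max_by_key(|w| score(*w))
}
```

With the closure version, a caller can write best_with_closure(&words, |w| w.len()) and be done, where the trait version needs a named type and an impl block for every scoring rule.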

Alternative #2: Polymorphism with Traits

Just like Rust has a version of encapsulation more flexible and more powerful than the OOP notion of classes, as I discuss in the previous post, Rust has a more powerful version of polymorphism than OOP posits: traits.

Traits are like interfaces from Java (or an all-abstract superclass in C++), but without most of the constraints that I discuss at the beginning of the blog post. They have neither the semantic constraints nor the performance constraints. Traits are heavily inspired in semantics and principle by Haskell’s typeclasses, and in syntax and implementation by C++’s templates. C++ programmers can think of them as templates with concepts (except done right, baked into the programming language from the get-go, and without having to deal with all the code that doesn’t use it).

Let’s start with the semantics: What can you do with traits that you can’t do with pure OOP, even if you throw all the indirection in the world at it? Well, in pure OOP terms, there’s no way you can write an interface like Rust’s Eq and Ord, given here with greatly oversimplified definitions (the real definitions of Eq and Ord extend other traits that allow partial equivalence and orderings between different types, but like these simplified definitions, the Rust standard library versions of non-partial Eq and Ord do cover equivalence and ordering between values of the same type):

trait Eq {
    fn eq(&self, other: &Self) -> bool;
}

pub enum Ordering {
    Less,
    Equal,
    Greater,
}

trait Ord: Eq {
    fn cmp(&self, other: &Self) -> Ordering;
}

See what’s happening? Like in an OOP-style interface, the methods take a “receiver” type, a self parameter, of the Self type – that is, of whatever concrete type implements the trait (technically here a reference to Self or &Self). But unlike in an OOP-style interface, they also take another argument of &Self type. In order to implement Eq and Ord, a type T provides a function that takes two references to T. That’s meant literally: two references to T, not one reference to T and one reference to T or any subclass (such a thing doesn’t exist in Rust), not one reference to T and one reference to any other value that implements Eq, but two bona-fide non-heterogeneous references to the same concrete type, that the function can then compare for equality (or ordering).

This is important, because we want to use this to implement methods like sort:

impl<T> Vec<T> {
    pub fn sort(&mut self) where T: Ord {
        // ...
    }
}

OOP-style polymorphism is ideal for heterogeneous containers, where each element has its own runtime type and its own implementation of the interfaces. But sort doesn’t work like that. You can’t sort a collection like [3, "Hello", true]; there’s no reasonable ordering across all types.

Instead, sort operates on homogeneous containers. All the elements have to match in type, so that they can be mutually compared. They don’t each need to have different implementations of the operations.

Nevertheless, sort is still polymorphic. A sorting algorithm is the same for integers or strings, but comparing integers is a completely different operation than comparing strings. The sorting algorithm needs a way of invoking an operation on its items – the comparison operation – differently for different types, while still having the same overall structure of code.

This can be done by injecting a comparison function, but many types have an intrinsic, default ordering, and sort should default to it. Thus, polymorphism – but not an OOP-friendly variety.
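Both options are available on the real Vec. The sketch below uses a made-up Version type: it implements the standard library’s Ord by hand (note the &Self parameter), sorts with the intrinsic ordering via sort, and then injects a different comparison via sort_by. Version and the two helper functions are illustrative names, not anything from an existing library.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
}

impl Ord for Version {
    // Both arguments are the same concrete type: &Version and &Version.
    fn cmp(&self, other: &Self) -> Ordering {
        self.major
            .cmp(&other.major)
            .then(self.minor.cmp(&other.minor))
    }
}

impl PartialOrd for Version {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Uses the type's intrinsic ordering (the Ord impl above).
pub fn sorted_default(mut versions: Vec<Version>) -> Vec<Version> {
    versions.sort();
    versions
}

/// Injects a comparison function instead: newest version first.
pub fn sorted_newest_first(mut versions: Vec<Version>) -> Vec<Version> {
    versions.sort_by(|a, b| b.cmp(a));
    versions
}
```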

See the contrivance Java goes through to define sort:

static <T extends Comparable<? super T>> 
void sort(List<T> list)

There is no simple trait that can require T to be comparable to other Ts, for T to be ordered. Instead, as far as the programming language is concerned, the idea that T is comparable to itself, rather than to any other random type, is only articulated as an accident of this method. Nothing is stopping someone from implementing the Comparable interface in an inconsistent way, like having Integer implement Comparable<String>.

Additionally, when it actually looks up the implementation of Comparable, it decides what implementation to use based on the first argument of any comparison, not based on the type. Normally, they will all be the same type, but theoretically, this list could be heterogeneous, as long as all the objects “extend” T, and they could implement Comparable differently. The computer has to do extra work to indulge this possibility, even though it would certainly be a mistake.

As we’re now drifting outside of the realm of semantics, and into the realm of performance, let’s discuss the performance implications of this fully.

The Java sort method, as we mentioned, requires every item in the collection to be a full object type, which means that instead of storing the values directly in the array, the values are stored in the heap, and references are stored in the array. This is unnecessary with a traits-based approach – the values can live directly in the array.

This means that different arrays will have different element sizes, so this has to be handled by a trait as well. And it is: The size of the values is also parameterized via the Sized trait. The size does have to be consistent among all the items of the array, but this is enforceable because we can express that all the elements are actually the exact same type – unlike Java’s List<T> which only expresses that they’re of type T or some subtype of T.

Rust’s sort method could have been implemented by passing the size information (from the Sized trait) and the ordering function (from the Ord trait) at runtime as an integer value and a function pointer. This is how typeclasses work in Haskell, which was the inspiration for Rust traits. This would still be more efficient than the Java, as there would be a single ordering function, rather than a different indirect lookup for every left side of the comparison, allowing indirect branch prediction to work in the processor.

But Rust goes even further than that, and implements its traits instead via monomorphization. This is similar to C++ template instantiation, but semantically better constrained. The premise is that while sort is only one method semantically, in the outputted, compiled code, a different version of sort is outputted for every type T that it is called with.
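A small sketch of what that looks like from the programmer’s side: one generic definition in the source, with the compiler free to emit a separate specialized copy for each T it is called with. The function names here are made up for illustration, and bytes_for is only there to show that the element size differs per instantiation, with the values stored inline rather than behind references.

```rust
/// One generic definition in the source. Each distinct T used by
/// callers can get its own compiled copy, with the comparison
/// resolved statically rather than through a lookup at run-time.
pub fn largest<T: Ord + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

/// Total bytes occupied by the elements of a slice. The values sit
/// directly in the slice, so this is length times element size.
pub fn bytes_for<T>(items: &[T]) -> usize {
    std::mem::size_of_val(items)
}
```

Calling largest on a slice of i32 and on a slice of &str uses the same source code but different comparison operations, with no function pointer or per-element lookup required.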

C++ templates create infamously bad error messages and are difficult to reason about, because they are essentially macros, and awkward ones. Even Rust cannot create great error messages with its macro system. But also, writing them requires expertise, and means that the programmer is forgoing many of the benefits of the type system – templates are often called, in my opinion rightly so, a form of compile time duck-typing. For these reasons, template programming in C++ is often considered more advanced (read as harder and less convenient rather than more powerful) than OOP-style polymorphism.

In Rust, however, traits provide an organized and more coherent way of accessing similar technology, getting the performance benefits of templates while still giving the structure of a solid type system.

Alternative #3: Dynamic Trait Objects

Sometimes, however, you do need full run-time polymorphism. You have the opposite of the scenario with the enum: You have a closed set of operations that can be performed on a value, but what those operations actually do will change dynamically in a way that cannot be bounded ahead of time.

In such situations, Rust has you covered with the dyn keyword. Please don’t overuse it, though. In almost all situations where I’ve thought it might be appropriate, static polymorphism combined with other design elements has worked out better.

Legitimate use cases for dyn tend to come up in situations involving inversion of control, where a framework library takes on a main loop, and the client code says how to handle various events. In network programming, the framework library says how to juggle all the sockets and register them with the operating system, but the application needs to say what to actually do with the data. In GUI programming, the framework code can say what widget was being clicked on, but very different things happen if that widget is a button versus a text box versus a custom widget you invented for this particular app.
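Here is a toy version of the GUI case, as a sketch only. Widget, Button, Counter, and dispatch_clicks are hypothetical names standing in for a real framework’s types. The “framework” function knows which widget was clicked, and the widget decides what a click means.

```rust
/// Hypothetical callback interface a GUI framework might expose.
pub trait Widget {
    fn on_click(&mut self) -> String;
}

pub struct Button {
    pub label: String,
}

pub struct Counter {
    pub clicks: u32,
}

impl Widget for Button {
    fn on_click(&mut self) -> String {
        format!("{} pressed", self.label)
    }
}

impl Widget for Counter {
    fn on_click(&mut self) -> String {
        self.clicks += 1;
        format!("clicked {} times", self.clicks)
    }
}

/// "Framework" code: owns the loop and routes each click (given as
/// a widget index) to whatever widget lives at that position.
/// Clicks on indices with no widget are ignored.
pub fn dispatch_clicks(widgets: &mut [Box<dyn Widget>], clicks: &[usize]) -> Vec<String> {
    let mut log = Vec::new();
    for &index in clicks {
        if let Some(widget) = widgets.get_mut(index) {
            log.push(widget.on_click());
        }
    }
    log
}
```

The set of operations (just on_click here) is closed, but the set of widget types is open: an application can add its own custom widget without touching dispatch_clicks.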

Now, you don’t strictly need run-time polymorphism for this. You could use closures (or even raw function pointers) instead, creating a struct of closures (or function pointers) if multiple operations are called for – which amounts to basically doing what dyn does the hard way by hand. For example, I fully expected tokio to use Rust’s run-time polymorphism feature internally to handle this inversion of control in task scheduling. Instead, for what I imagine are performance reasons, tokio implements dyn by hand, even calling its struct of function pointers Vtable.
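For a sense of what “by hand” means, here is a deliberately tiny sketch of a struct of function pointers in safe Rust. It is not modeled on tokio’s actual Vtable; HandlerTable, its fields, and the two example tables are all invented for illustration.

```rust
/// A hand-written table of operations, in the spirit of a vtable.
/// Hypothetical layout, for illustration only.
pub struct HandlerTable {
    pub on_data: fn(&mut Vec<u8>, &[u8]),
    pub on_close: fn(&[u8]) -> usize,
}

fn append(buf: &mut Vec<u8>, data: &[u8]) {
    buf.extend_from_slice(data);
}

fn ignore(_buf: &mut Vec<u8>, _data: &[u8]) {}

fn report_len(buf: &[u8]) -> usize {
    buf.len()
}

/// Keeps everything it receives.
pub const BUFFERING: HandlerTable = HandlerTable {
    on_data: append,
    on_close: report_len,
};

/// Throws everything away.
pub const DISCARDING: HandlerTable = HandlerTable {
    on_data: ignore,
    on_close: report_len,
};

/// "Framework" code: owns the loop, calls through the table, and
/// returns whatever on_close reports at the end.
pub fn run(table: &HandlerTable, chunks: &[&[u8]]) -> usize {
    let mut state = Vec::new();
    for &chunk in chunks {
        (table.on_data)(&mut state, chunk);
    }
    (table.on_close)(&state)
}
```

Every piece of bookkeeping here (declaring the table, filling it in per behavior, calling through the fields) is something dyn would otherwise generate from a trait definition.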

But dyn does all of this work for you, for your trait. The only requirement is that your trait be object-safe, and the list of requirements may seem familiar, especially when it comes to the requirements for an associated function (e.g. a method) to be “dispatchable”:

Not have any type parameters (although lifetime parameters are allowed),

Be a method that does not use Self except in the type of the receiver.

Have a receiver with one of the following types:
    &Self (i.e. &self)
    &mut Self (i.e. &mut self)
    Box<Self>
    Rc<Self>
    Arc<Self>
    Pin<P> where P is one of the types above

Does not have a where Self: Sized bound (receiver type of Self (i.e. self) implies this).

That is to say, it can be polymorphic in exactly one parameter, and that parameter must be by reference – more or less the exact requirements for methods to support run-time polymorphism in OOP.

This is of course because dyn uses almost exactly the same mechanism as OOP to implement run-time polymorphism: the “vtable.” Box<dyn Foo> really contains two pointers rather than one, one to the object in question, and the pointer to the “vtable,” the automatically-generated structure of function pointers for that type. The one-parameter requirement is because that is the parameter whose vtable is used to look up which concrete implementation of a method to call, and the indirection requirement is because the concrete type might be different sizes, with the size only known at run-time.
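The two-pointer layout can be observed from ordinary code. In this sketch, Greet, English, and Loud are made-up names. The size of Box<dyn Trait> being exactly two machine words is how current Rust compilers lay it out rather than something I would treat as a language-level promise, so take the numbers as an observation about today’s implementation.

```rust
use std::mem::size_of;

pub trait Greet {
    fn greet(&self) -> String;
}

pub struct English;
pub struct Loud(pub String);

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

impl Greet for Loud {
    fn greet(&self) -> String {
        self.0.to_uppercase()
    }
}

/// Returns (size of a plain Box, size of a Box<dyn Greet>) in bytes.
/// The second carries a vtable pointer next to the data pointer.
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<Box<English>>(), size_of::<Box<dyn Greet>>())
}

/// Each call goes through the vtable half of the fat pointer.
pub fn greet_all(items: &[Box<dyn Greet>]) -> Vec<String> {
    items.iter().map(|g| g.greet()).collect()
}
```

Note that size_of::<English>() is zero: the value itself carries no vtable pointer. The extra word only appears in the pointer, and only once a dyn pointer is made.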

To be clear, these are limitations on one particular implementation strategy for run-time polymorphism. Alternative strategies exist that fully decouple the vtable from individual values of the type, as in Haskell.

There are still a few advantages of Rust’s version of run-time polymorphism with traits as opposed to OOP-style interfaces.

Performance-wise, it’s something done alongside a type, rather than intrinsic to the type. Normal values don’t store a vtable, spreading the cost of this throughout the program, but rather, the vtables are only referenced when a dyn pointer is created. If you never create a dyn pointer to a value of a given type, that type’s vtable doesn’t even have to be created. Certainly, you don’t have 8 bytes of extra gunk in every allocation for all the vtable pointers! This also means there’s one fewer level of indirection.

Semantically, it’s also a good thing that it’s just one option among many, and that it’s not the strongly preferred option that the entire programming language is trying to push you towards. Often, even usually, static polymorphism, enums, or even just good old-fashioned closures more accurately represent the problem at hand, and should be used instead.

Finally, the fact that run-time and static polymorphism in Rust both use traits makes it easier to transition from one system to another. If you find yourself using dyn for a trait, you don’t have to use it everywhere that trait is used. You can use the mechanisms of static polymorphism (like type parameters and impl Trait) instead, freely mixing and matching with the same traits.
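A short sketch of that mixing and matching, using a made-up Shape trait. The same trait and the same impls serve a statically dispatched function, a dynamically dispatched one, and a generic function with a ?Sized bound that accepts either a concrete type or a dyn Shape.

```rust
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Rect(pub f64, pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Static polymorphism: all elements share one concrete type S.
pub fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Run-time polymorphism: elements may have different concrete types.
pub fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Generic over any Shape, including the unsized type `dyn Shape`,
/// so the same function works with both styles.
pub fn area_doubled<S: Shape + ?Sized>(shape: &S) -> f64 {
    2.0 * shape.area()
}
```

Only the call sites that genuinely need heterogeneous values pay for dyn, and everything else keeps using type parameters against the same trait.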

Unlike in C++, you don’t have to learn two completely different sets of syntax for concepts vs parent classes, and vastly different semantics. Really, in Rust, dynamic polymorphism is just a special case of static polymorphism, and the only differences are the things that actually are different.

The Debt Ceiling Is Unconstitutional, and Biden Should Just Say So

2023-02-02T00:00:00+00:00

The validity of the public debt of the United States, authorized by law, including debts incurred for payment of pensions and bounties for services in suppressing insurrection or rebellion, shall not be questioned.

US Constitution, 14th Amendment, Section 4

The debt ceiling is unconstitutional. We’ve let the Republicans play their games for long enough, in the interest of “stability of the economy” and a general fear of rocking the boat, but that time is over now. President Biden should simply announce that his administration will not follow this brazenly unconstitutional law, because unconstitutional is literally what it is, and every Congressperson who wants to use it as leverage is in flagrant violation of their oath of office.

Often, when people say something is unconstitutional, they mean they don’t like it, or that a Supreme Court decision they agree with has ruled it that way, or one that is established deeply in precedent. In this situation, however, it’s literally in the text of the US Constitution, a document that we in the US are raised to treat as sacred.

Let me explain.

Congress has given the President and his administration three sets of instructions, three policies, three laws:

The budget: To spend X amount.
The tax code: To tax Y amount.
The debt ceiling: To not go over Z amount of debt.

None of these are optional. President Biden may not unilaterally cut spending. He certainly may not unilaterally raise taxes. And so, in our current situation, where we have reached the ceiling, the only way to follow these instructions from Congress, to spend the money he is obligated to spend with the tax money he is allowed to collect, is to default on payments on debt.

This is how the debt ceiling is generally interpreted, as a legal requirement to default, coming from Congress. That is the only interpretation that makes sense from the perspective of the House Republicans who are trying to use this as leverage in a negotiation.

But Congress isn’t allowed to require that. It’s literally unconstitutional.

I’m not the only person who thinks so. Here’s a sampling of other articles making the same point, which come from sources I just happen to have been reading. The first most closely matches my perspective:

Of course, the Constitution is only as good as people actually paying attention to it. Republicans have recently demonstrated repeatedly that their respect for it is only lip service, that they actually despise the document.

So if you don’t care about the Constitution, care about the economy – the US economy and the world economy. Care about the stability of the US dollar. A default would destroy faith in this country, just as the authors of the 14th amendment feared. It would result in lawsuits that would likely invalidate the debt ceiling anyway in the courts – after the massive economic damage has already been done. The damage would be unimaginable: We literally have never done something so stupid before, and have no idea what would happen to the US’s position in the world, or to the US dollar.

If you’re genuinely worried about the Federal government overspending, this is not an appropriate forum to express your worries. There already is a process for that, and it’s called the budget.

My Reaction to Dr. Stroustrup's Recent Memory Safety Comments

2023-01-30T00:00:00+00:00

The NSA recently published a Cybersecurity Information Sheet about the importance of memory safety, where they recommended moving from memory-unsafe programming languages (like C and C++) to memory-safe ones (like Rust). Dr. Bjarne Stroustrup, the original creator of C++, has made some waves with his response.

To be honest, I was disappointed. As a current die-hard Rustacean and former die-hard C++ programmer, I have thought (and blogged) quite a bit about the topic of Rust vs C++. Unfortunately, I feel that in spite of the exhortation in his title to “think seriously about safety,” Dr. Stroustrup was not in fact thinking seriously himself. Instead of engaging conceptually with the article, he seems to have reflexively thrown together some talking points – some of them very stale – not realizing that they mostly are not even relevant to the NSA’s Cybersecurity Information Sheet, let alone a thoughtful rebuttal of it.

Fortunately, he does eventually discuss his own ideas of how to make C++ memory safe – in the future. If these ideas are implemented well, it will make C++ a safe programming language as the NSA’s Cybersecurity Information Sheet has defined it. But given that they are currently just proposals in an early stage, it’s unfair of him to expect the NSA to mention them when advising people on what programming language to use. C++ has been an unsafe language for a long time. Maybe someday that will change, but we’ll believe it when we actually see it.

But before I discuss that, I’d like to rebut and discuss my disappointment at the talking points he uses earlier in his response, because I think they unfairly frame the debate, shield C++ from legitimate and important criticism, and slander memory-safe programming languages and downplay memory safety as a concept, even though it’s very important.

Multiple Types of Safety?

One of the most interesting and conceptually relevant points that Dr. Stroustrup harps on is that memory safety is not the only type of safety:

Also, as described, “safe” is limited to memory safety, leaving out on the order of a dozen other ways that a language could (and will) be used to violate some form of safety and security.

This might technically be true – it’s not entirely clear what other forms of “safety” he’s talking about – but it’s misleading. Memory unsafety is not just one of a dozen equally important forms of “unsafety.” Rather, memory unsafety is by far the biggest source of security vulnerabilities and instability in memory unsafe programming languages – estimates as high as 70 percent in some contexts.

A 70% decrease in security vulnerabilities is worth committing significant resources towards. Memory safety on its own is worth writing a Cybersecurity Information Sheet about, and it is the area where C++ has the most serious deficits. Given that, this feels like a car manufacturer whose cars do not provide air bags responding to a government advisory not to buy its cars by saying “What about other types of safety? By talking just about air bags, the government is clearly not thinking seriously about safety.” Sure, there are other types of safety features besides air bags (or memory safety), but air bags are still important!

So, Dr. Stroustrup, what about memory safety in C++? Shouldn’t C++ have memory safety? Are you saying it’s not important, especially when all of these other programming languages have it?

He doesn’t go into detail about other types of safety, which is telling: C++ doesn’t really have the advantage in any of them. For example, Rust also has a lot of mechanisms for thread safety and type safety, intimately connected with its memory safety mechanisms, and baked into the design of Rust in a way that would be next to impossible to retrofit into another programming language.

And, when you read later on about the “safety profiles” in the C++ Core Guidelines that he makes such a big deal about, most of the focus there is also about memory safety.

Petty Irrelevancies

Let’s look at some of the other points he makes.

That specifically and explicitly excludes C and C++ as unsafe.

C++ does not enforce memory safety as a feature of the programming language. This may change in the future (as Dr. Stroustrup discusses), but is the current state of things. Dr. Stroustrup tries to downplay this, but is not convincing.

As is far too common, it lumps C and C++ into the single category C/C++, ignoring 30+ years of progress.

Writing “C/C++” to mean “C and C++” is considered a faux pas among C++ programmers, and among C programmers as well, because it is seen as asserting that these two programming languages are near-identical when there are in fact major differences between them. By pointing out that the NSA does this, Dr. Stroustrup is trying to make them look like they don’t know what they’re talking about, just because they used a “/” character instead of the word “and.”

He’s reading too much into the orthography and the NSA’s failure to use insider shibboleths of the programming languages they’re trying to criticize. Outside of the “C” and “C++” communities, “C/C++” is a fairly common way to refer to the two related programming languages.

And that’s the most relevant thing here: C and C++ are indeed related programming languages, and they have a lot in common: They are both compiled programming languages with a focus on performance, and they are (very relevantly) both not particularly focused on guaranteeing memory safety. C and C++ have a substantial common subset, with many memory unsafe features that are popular with programmers, perhaps even more popular because they work similarly in both programming languages. For the purposes of this document, it’s often the features that C and C++ have in common that are the problematic ones, so it makes sense for the NSA to lump them together.

While there might be 30+ years of divergence between C and C++, none of C++’s so-called “progress” involved removing memory-unsafe C features from C++, many of which are still in common use, and many of which still make memory safety in C++ near intractable. Sure, new features in C++ have been added that (in some but by no means all cases) do not make it as easy to corrupt memory, but the bad old features are not in any real way being phased out: They are not guarded by any special opt-in syntax, nor in many cases do they result in warnings. Given that, the combined set of features is as strong as its weakest link.
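For contrast, this is roughly what opt-in syntax looks like in Rust. The two function names are made up for this sketch; get and get_unchecked are real slice methods. The checked read is the default, and the unchecked one cannot even be called without writing the word unsafe, which makes it easy to find in review.

```rust
/// Default, safe access: an out-of-range index yields None
/// instead of reading past the end of the buffer.
pub fn read_checked(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

/// Unchecked access has to be requested explicitly. Removing the
/// `unsafe` block below turns this into a compile error.
pub fn read_unchecked(values: &[i32], index: usize) -> i32 {
    assert!(index < values.len());
    // SAFETY: the assertion above guarantees index is in bounds.
    unsafe { *values.get_unchecked(index) }
}
```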

Unfortunately, much C++ use is also stuck in the distant past, ignoring improvements, including ways of dramatically improving safety.

This is a common C++ talking point, but it doesn’t help Dr. Stroustrup’s position as much as he thinks it does.

He’s trying to talk up how much C++ has improved, especially in the last 11 years – and it has indeed improved. New ways of writing C++, emphasizing relatively new features, can indeed result in more reliable C++ code with less memory corruption.

But unfortunately, this talking point just serves to remind us that these old memory-unsafe features are still in common use. When someone says their project is written in Rust, we can guess that it likely uses only the safe features (including using standard library functions that use unsafe internally – that truly doesn’t count as unsafe), or maybe uses the unsafe features when absolutely necessary. But when someone says their project is written in C++, by Dr. Stroustrup’s own admission, there’s a high likelihood that it uses old features “stuck in the distant past, ignoring … ways of dramatically improving safety.” This is also a reason to avoid C++.

However, I would also contest his claim about these new features. Memory safety isn’t just an absence of memory corruption, but a reliable method for ensuring the absence of memory corruption. “Using new features” isn’t good enough. Even if using the new features in preference to the old ones were a guarantee of memory safety – which it isn’t, they’re less memory corrupting but not truly memory safe – the presence of the old ones would still cause problems. You would need some mechanism to ensure that the new features were only used safely, and that the old features were not used, and no such mechanism exists, at least not in the programming language itself. Someone who remembers the old features can always still slip up and use one by accident.

Static Analysis: Not Good Enough

Dr. Stroustrup points out that he’s been working very hard on improving memory safety in C++, for a very long time:

After all, I have worked for decades to make it possible to write better, safer, and more efficient C++. In particular, the work on the C++ Core Guidelines specifically aims at delivering statically guaranteed type-safe and resource-safe C++ for people who need that without disrupting code bases that can manage without such strong guarantees or introducing additional tool chains.

Unfortunately, it’s not done. The key word here is, of course, “aims.” The next sentences admit that this feature is not in fact available:

For example, the Microsoft Visual Studio analyzer and its memory-safety profile deliver much of the CG support today and any good static analyzer (e.g., Clang tidy, that has some CG support) could be made to completely deliver those guarantees….

For memory safety, “much of” is not really good enough, and “could be made” is practically worthless. Fundamentally, the point is that memory safety in C++ is a project being actively worked on, and close to existing. Meanwhile, Rust (and Swift, C#, Java, and others) already implements memory safety.

It’s worse than that, though. What Dr. Stroustrup is trying to downplay is that this involves using static analyzers, considered separate from the programming language, something the NSA’s original article also discusses. Theoretically, if a static analyzer could be used to guarantee memory safety, that could be just as reliable as a programming language that does it. An engineering team could have a policy that all code must pass this static analysis before being put into production.

But unfortunately, human nature is more fickle than that. If it’s not built into the programming language, it’s going to get skipped. If a vendor says their software is written in C++, or if an engineer takes a job in C++, how will they know that these static analyzers will in fact be used? A programming language that takes memory safety seriously doesn’t provide it as an optional add-on that most people will simply ignore.

But All The C++ Code!

The end of the last quote provides a common talking point in Rust vs C++ arguments:

[Static analyzers] could be made to completely deliver those guarantees at a fraction of the cost of a change to a variety of novel “safe” languages.

Besides the laughably condescending matter of calling Java (which first appeared in 1995), C# (first appeared in 2000), and Ruby (first appeared in 1995) “novel,” this is a jab at a common trope that (some immature) Rust programmers go around demanding that people rewrite their projects in Rust (please don’t do this!), and an attack on the idea that all code can be written in safe programming languages, given the large body of existing work in unsafe programming languages.

This is a bit of a straw man in this context. The NSA article that Stroustrup is responding to acknowledges that switching existing codebases might be expensive, even prohibitively so, saying:

It is not trivial to shift a mature software development infrastructure from one computer language to another. Skilled programmers need to be trained in a new language and there is an efficiency hit when using a new language. Programmers must endure a learning curve and work their way through any “newbie” mistakes. While another approach is to hire programmers skilled in a memory safe language, they too will have their own learning curve for understanding the existing code base and the domain in which the software will function.

It then follows this up immediately with an explanation of how tools like static analyzers can be used as a back-up plan for improving memory safety in memory unsafe programming languages – exactly what Dr. Stroustrup discusses. He’s criticizing this NSA document, implying it is not thinking “seriously,” while fundamentally making a point that they already made for him.

Of course, this is a terrible endorsement of C++. It’s far from ideal to have to use add-on tools to work around a language’s flaws. Coming from Dr. Stroustrup, it reads more like a brag that his programming language has locked everyone in than a defense of why C++ is good. Or else, it’s an admission that other programming languages should be used for new projects, and that C++’s fate is now to gradually fade like the elves from Middle Earth.

But he’s also overstating his case. As I mentioned before, safe programming languages have existed for a long time. Many programming projects that in the early 90’s would have been done in C or C++ have in fact been done in safe programming languages instead, and according to the NSA’s recommendation, that was a good idea. As computers have gotten faster and programming language technology has improved, there have been fewer and fewer reasons to settle for languages like C or C++ that don’t have memory safety as a feature.

When I was a professional C++ programmer as early as 2013, some people – even some programmers – already thought that C++ was a legacy programming language like COBOL or Fortran. And outside of narrow niches like systems programming (e.g. web browsers, operating systems, and lower-level libraries), video games, or high performance programming, it kind of has become one. The former application niches of C++ have been taken over by Java and C#, or more recently by Go. If you have an application program written in C++, chances are that it’s a relatively old codebase, or written at a shop that has reasons to write a lot of C++ (such as a high-frequency trading firm).

Now, even C++’s systems niche is under threat, with Rust, a powerful memory-safe programming language that avoids many of C++’s problems. Now, even the niches where C++ isn’t at all “legacy” have a viable, memory-safe alternative without a lot of the technical debt that C++ has. Rust is even allowed in the Linux kernel, a project that has only previously accepted C, and whose chief maintainer has always explicitly hated C++.

A Memory-Safe C++

Fortunately, after all of these ill-thought-out, tired talking points, Dr. Stroustrup subtly changes his perspective. After his distractions, after bashing memory safe programming languages as “novel,” bragging about how C++ is too entrenched to be removable, pretending memory safety is just one of many equally important safety issues, and promising optional add-on tools that will eventually be standardized, he finally begins to tackle the question of how C++ could be made memory safe, in an opt-in fashion:

There is not just one definition of “safety”, and we can achieve a variety of kinds of safety through a combination of programming styles, support libraries, and enforcement through static analysis. P2410r0 gives a brief summary of the approach. I envision compiler options and code annotations for requesting rules to be enforced. The most obvious would be to request guaranteed full type-and-resource safety. P2687R0 is a start on how the standard can support this, R1 will be more specific. Naturally, comments and suggestions are most welcome.

…

For example, in application domains where performance is the main concern, the P2687R0 approach lets you apply the safety guarantees only where required and use your favorite tuning techniques where needed. Partial adoption of some of the rules (e.g., rules for range checking and initialization) is likely to be important. Gradual adoption of safety rules and adoption of differing safety rules will be important. If for no other reason than the billions of lines of C++ code will not magically disappear, and even “safe” code (in any language) will have to call traditional C or C++ code or be called by traditional code that does not offer specific safety guarantees.

This is a lot closer to what the NSA document actually specifies for memory safe programming languages than he gives the document credit for. For example, the document already provides for opting out of memory safety via annotation, paired with an observation that that will focus scrutiny on the code that opts out.

Dr. Stroustrup did not need to criticize the document for not thinking “seriously” to reach this conclusion. He simply needed to acknowledge that C++ is not a memory safe programming language yet, but that based on his work, it might soon become one. Maybe the next version of the NSA document will endorse using C++, but only if it’s C++ZZ – where ZZ is some future version of the C++ standard.

I’m glad comments and suggestions are welcome, however, because I have a huge one.

Opt-in for memory safety is unacceptable, and is almost as bad as having a separate static analysis tool to enforce safety. Opt-out is fine – Rust has a way to opt out of memory safety with the unsafe keyword, and this concept is discussed and defended in the NSA’s original document. But the default should be to enforce memory safety unless otherwise specified.

For C++, this means that if these safety features are added in C++ZZ, --std=c++ZZ should cause unsafe constructs to be rejected – and the C++ standard should require that these constructs be rejected for an implementation to be a conforming implementation of C++ZZ. Perhaps (but only perhaps) other command line arguments could be added to override this constraint on a file-by-file basis. Ideally, a new compiler command (e.g. g++ZZ) should be created for each implementation that defaults to this stricter behavior.

Parts of the codebase that use legacy features should have to have at least a file-level annotation that that file is a legacy file – and then this annotation could gradually be moved to the function level. As a side benefit, this could also be used to phase out and deprecate weird points of C++ syntax, similar to the Rust edition system: Anyone using, for example, 0 literals to mean nullptr would have to declare some sort of a legacy annotation on their file or in their build system.
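To make the 0-versus-nullptr example concrete, here is a minimal sketch of why the old spelling is worth phasing out. The `describe` overloads and the helper names are hypothetical, chosen only for illustration; the overload resolution they demonstrate is standard C++11 behavior.

```cpp
#include <cassert>
#include <string>

// Two hypothetical overloads: one takes an integer, one takes a pointer.
std::string describe(int) { return "int"; }
std::string describe(const char*) { return "pointer"; }

// A literal 0 is an int first and a null pointer constant only second,
// so overload resolution picks the int overload.
std::string with_zero_literal() { return describe(0); }

// nullptr has type std::nullptr_t, which converts to pointer types
// but not to int, so the pointer overload is chosen.
std::string with_nullptr() { return describe(nullptr); }

// Small self-check that can be called from any main().
void check_overloads() {
    assert(with_zero_literal() == "int");
    assert(with_nullptr() == "pointer");
}
```

Under the kind of legacy annotation I’m suggesting, only files explicitly marked as legacy would be allowed to rely on a 0 literal being treated as a pointer.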

Only with this sort of opt-out memory-safety system would I consider C++ a memory safe programming language. I’d be very happy to see a memory-safe C++. I earnestly hope Dr. Stroustrup is successful in his endeavors. I’m not holding my breath, though, and in the meantime, I will continue to use other programming languages, that are already memory-safe, for my new projects, as will the majority of programmers.

In the meantime, it is unfair for Dr. Stroustrup to call safe programming languages novelties or to pretend that C++ isn’t already far behind the times on this. This was already an important criticism of C++ decades ago, when Java first came out in the 90’s and was referred to as a “managed programming language.” This was discussed in detail in my classes when I was a college student in the late aughts. To read Dr. Stroustrup’s writing, C++ is being criticized by “novel” upstarts when it is well on its way to getting the feature, but in actuality, the time to act was 1996.

Complexities of Defining ADHD

2023-01-18T00:00:00+00:00

ADHD is a controversial topic, and it’s never been more relevant. Diagnoses are soaring right now, driven up by a variety of interacting forces. Open discussion about ADHD – and the related general concept of “neurodiversity” – has been exploding on the Internet. And recently, there’s been a very unfortunate Adderall shortage.

So I wanted to take an opportunity to share some thoughts about it. I would say that I was taking this opportunity to clear things up, but unfortunately, that might not be possible. The reality is a really muddy situation, and many people’s mental models – including many professionals’ – are oversimplifications.

This is unfortunate because ADHD is an important issue, not just in childhood, but in adulthood as well. It is prevalent: according to one study, it affects 4.4% of adults in the US, and according to another, 2.8% of adults globally (numbers can vary greatly, for reasons we’ll discuss). ADHD can, especially if untreated, cause severe adverse life outcomes, including up to a 13 year decrease in life expectancy. Treatment for ADHD – especially stimulant medications – is very effective, and access to it is an urgent matter for those who need it.

Aside: ADHD: A Misnomer

There are a lot of misconceptions about it that cause people to think ADHD is less severe than it actually is, many to do with the name. ADHD, or Attention Deficit Hyperactivity Disorder, is named for the two traits that bother parents and teachers the most when they manifest in children: inattention and hyperactivity. While they are important in a classroom or disciplinary setting, they are not the actual core symptoms, or the symptoms that cause people with ADHD – especially adults but also children – the most trouble. And I will focus on adults with ADHD in this post, because I am an adult with ADHD.

So what are the actual core symptoms? Dr. Russell Barkley, one of the leading experts on ADHD, considers ADHD to be a misnomer. He summarizes it instead as an “Executive Function Deficit Disorder” because its core symptom is difficulty with executive functions, which he lists and explains in more detail in this article, essential reading to understanding ADHD better.

In a terminological distinction of questionable value, ADHD is considered a neurodevelopmental disorder like dyslexia or autism, which are considered distinct from a mental illness like anxiety or depression. Disorders of both categories are documented in the DSM, or Diagnostic and Statistical Manual of Mental Disorders. ADHD, while not itself considered a form of mental illness, does lead to an increased likelihood of developing a mental illness.

ADHD is a serious and relatively prevalent condition in adults, so it’s fortunate that such effective treatments exist, and that it has been able to be studied as well as it has been. Unfortunately, its causes are poorly understood, and even defining what ADHD is or what it means for someone to have ADHD can be surprisingly difficult.

In this post, I intend to explain why ADHD is so difficult to define, and explore some of the consequences of that difficulty.

Competing Approaches to Defining ADHD

Let me start with an example: Trauma, especially Complex Post-Traumatic Stress Disorder (CPTSD), can have a lot of the same symptoms as ADHD. It can cause difficulty with executive function, which is the core symptom of ADHD. Specifically, it can cause trouble staying on task, keeping track of responsibilities and physical objects, and restlessness – all classic ADHD symptoms.

But how to think of that is something of a philosophical question: Is ADHD a pattern of symptoms with common coping skills and treatments? In that case, we could say that CPTSD can cause ADHD. Or is ADHD an attempt to figure out an underlying specific brain disorder? In that case, we wouldn’t want to say that trauma “causes ADHD.”

This question comes up surprisingly often; this connection between CPTSD and ADHD is just one example. It is a surprisingly nuanced question, and I’ve seen ADHD (and its connection to CPTSD) framed both ways by reliable sources. I don’t think it necessarily has a clear answer. At a certain point, it can feel like “arguing over semantics.” But it is important, because we need some way of categorizing and discussing people’s brains, if only to provide treatment.

In practice, the answer may depend on context. For a therapist teaching coping skills, it might be easier to think about it as “trauma causes ADHD,” and then teach the ADHD coping skills. For a psychiatrist, the underlying causes may (or may not) be more relevant, depending on how much it influences the effectiveness of various medications; treating the CPTSD with typical CPTSD medications (such as anti-depressants or mood stabilizers) might (or might not) be a better way of treating the ADHD-like symptoms, rather than prescribing an ADHD medication like Adderall.

Intuitively, it seems obvious: Split them up. We like to think of ADHD as a neat and tidy disorder, one that you’re born with, one that’s genetic. CPTSD is acquired, and has drastically different causes than a typical ADHD case. It seems obvious that different causes should mean different disorders. And if some of the same techniques are helpful, therapists can think of the ADHD traits caused by CPTSD as just that: “ADHD traits.” And if some of the same medications are helpful, we can just say something along the lines of “in some cases ADHD medications can help with CPTSD.”

But it’s harder than you might think to fully avoid basing the definition on the symptoms. For all the definitions in the world, in practice “people with ADHD” means “people diagnosed with ADHD” – and ADHD is diagnosed based on the symptoms. While research into the causes and underlying neurological mechanisms has made great strides, the best diagnostic tools we have don’t involve brain scans or genetic tests. Instead, you have to use some combination of surveying and interviewing the patient, surveying or interviewing people the patient knows, or doing cognitive tests to see if the patient is in fact impaired in those areas of cognition that ADHD makes more difficult.

All of these involve investigating symptoms, not causes. And ADHD diagnosis also requires that these symptoms actually cause problems. To quote the standard DSM (The Diagnostic and Statistical Manual of Mental Disorders), an ADHD diagnosis has this absolute criterion:

D. There is clear evidence that the symptoms interfere with, or reduce the quality of, social, academic, or occupational functioning.

This all paints a picture of a definition – or at least a diagnostic process – based on the symptoms. And yet, the DSM also includes a criterion that points in the direction of ADHD being a discrete disorder, rather than a collection of symptoms:

E. The symptoms do not occur exclusively during the course of schizophrenia or another psychotic disorder and are not better explained by another mental disorder (e.g., mood disorder, anxiety disorder, dissociative disorder, personality disorder, substance intoxication or withdrawal).

This would be more straight-forward if ADHD had a known, specific cause. If there were a single known mutation that caused ADHD – like the ones known to cause Down syndrome or Fragile X syndrome – it would be clear: you would either have ADHD or you don’t, based on whether you had that genetic abnormality.

But though there has been some research on the genetics of ADHD, it is far from definitive or conclusive. In fact, ADHD isn’t even 100% determined by genetics. Instead, it has an estimated heritability of 77-88%, which is a measure of how likely it is for a person with ADHD’s identical twin to also have ADHD, and related to how likely it is for their other relatives to have it.

And to be clear, heritability is difficult to study (and thus the wide range): It’s hard to separate genetic factors from environmental ones, when you necessarily have to consider people who are blood-related to each other. Furthermore, this doesn’t mean that one “ADHD gene” represents all of this heritability – there likely are many different genes lumped in there, all causing (potentially different flavors of) ADHD, with different levels of heritability that then work out to a weighted average of 77-88%. And sometimes, related people might both have ADHD traits by coincidence, and that’s also hard to control for: How easy is it for a study to verify that the specific executive function deficits experienced by these relatives are similar?

Non-genetic risk factors also abound: people born prematurely are two to three times as likely to develop ADHD. Modern ADHD research got its start with yet another risk factor: children recovering from Spanish Flu started having drastic behavioral shifts, which were then called “Minimal Brain Damage” or “Minimal Brain Dysfunction,” a label that was ultimately renamed ADHD. This has come to more public attention recently due to concerns with long COVID. (That article contains quotes from Russell Barkley, the leading expert on ADHD mentioned above when discussing executive functions.)

Given this diversity of causes, and our substantial but still incomplete understanding of the neurological mechanisms, do we have the knowledge necessary to think of ADHD as a discrete disorder, rather than a list of symptoms that tend to correlate with each other? I am far from the first person to consider this; some have even gone as far as to propose that the word “disorder” be removed from the acronym as misleading.

So, returning to the CPTSD example, perhaps it makes more sense to add one more possible cause. Perhaps saying that CPTSD can cause ADHD traits is equivalent to saying that CPTSD causes ADHD, because “ADHD traits” is all the traction we have on this disorder.

Aside: Obligatory Caveat

I say “perhaps” for a reason. One reason not to would be if there were discernible differences, especially in treatment, between ADHD as caused by CPTSD, and other cases: for example, differences in effective medication.

And about medication I offer no opinion. I’m not a psychiatrist, nor am I any sort of expert on CPTSD – that is not the cause of my ADHD, personally. It’s a very complicated issue, especially because the causal arrow might point both directions, which is to say that recent studies have shown that not only can CPTSD cause ADHD traits, but ADHD is a risk factor in developing CPTSD. For any individual case, it may not be clear which came first.

As you can see, ADHD is not a simple, tidy disorder at all (and neither is CPTSD) and that, more than any particular position, is what I want you as a reader to take away from this.

Of course, we might someday discover a crystal-clear ADHD gene, or perhaps a couple of them. This would mean essentially that we were discovering new disorders, disorders that were previously all lumped under the umbrella label of ADHD. Once this happens, we’d have to decide as a society what to do to realign our labels.

And how to realign the labels will depend on the nature of the discovery. Perhaps one gene would cover the majority of people with ADHD, and therefore it might keep the name, with a new, objective, genetic test. The others would be considered “ADHD-like.” Or perhaps, a smaller group of patients would be covered under a narrower disorder, and then we would say things like “this used to be considered a type of ADHD, but now we know better.”

This may seem far-fetched, but it has already happened with autism. Similarly to ADHD, autism is primarily diagnosed based on symptoms. But there are genetic disorders, like Fragile X and especially Rett syndrome, with substantial overlap in symptoms with autism. Rett syndrome in particular used to be categorized alongside autism in the DSM, as a “pervasive developmental disorder” alongside Asperger syndrome and autism proper – basically as one of several parts of the “autism spectrum.” But when its genetic and neurological mechanisms were discovered, it was removed from that section, and from the DSM entirely.

Perhaps, as more and more discrete causes of autism are discovered, this will happen more and more. Autism is currently a very large umbrella, appropriately termed a “spectrum,” covering profoundly disabled adults who cannot take care of themselves, and mostly functional adults who simply exhibit some levels of social and executive function difficulty.

Aside: Are disorders with multiple causes “real”?

One popular conclusion to draw from the muddied and ill-understood causes of ADHD is that ADHD is not real. One example of this is the fringe book ADHD Does Not Exist by Richard Saul, a neurologist who wrote this book with no backing from wider research, and who is not widely recognized as legitimate among ADHD experts. Nevertheless, the book is popular in some circles, and Richard Saul has gotten traction with some parents and teachers (and unfortunately even some doctors), and even wrote an opinion piece in Time Magazine.

In the book’s blurb, it says that “ADHD is actually a cluster of symptoms stemming from over 20 other conditions or disorders” – a statement that may be tempting to believe, given what I’ve said above, but is ultimately deeply misleading. So I thought I’d spend some time picking this argument apart.

First, we don’t know a complete list of what disorders can “cause ADHD,” but there’s lots of evidence that ADHD is primarily genetic. Whatever other problems the gene(s) involved may cause, and whatever shifts in categorization may be brought about by further research, there are definitely genes that do cause ADHD symptoms.

There are also definitely many people whose primary set of symptoms that rise to the level of needing psychiatric care are exactly that set of symptoms, the typical ADHD. Whether this set of symptoms is caused by one gene or many, and whether it is caused by genes alone or a combination of genes and environment, or even sometimes by environment alone, it is a real occurrence and a real problem that often occurs on its own.

That is enough to make ADHD exist.

But more importantly, this set of symptoms, whatever genes mediate them and whatever variety of causes they have, can be extremely debilitating. People need treatments for it now, and it has well-proven, well-studied, extremely effective treatments, especially medicinal ones. Even if ADHD were primarily caused by other identifiable psychiatric disorders, that would not mean setting aside ADHD medications. They would still be effective for all the people they’re currently effective for.

If anything, this perspective should make us study expanding the use of ADHD treatments and medications to situations where the symptoms can be said to be caused by other disorders, rather than give up on them for everyone in hopes of finding the “proper” treatment for the “underlying” disorder for every individual.

Of course, the popularity of this book and its flawed line of thinking is easy to explain: Many people have already made up their mind that ADHD medications are problematic, and are looking for any excuse to get rid of them. Motivated reasoning abounds.

What to do with the connection between ADHD and autism?

I want to return to the topic of autism. As I mentioned, autism often comes with some level of deficit in executive function. That throws a wrinkle into the definition of ADHD, because a deficit of executive function is the core symptom of ADHD, the summary or cause of all the other symptoms.

Given this, it’s not surprising that many children and adults seem eligible for both diagnoses. In the past, following a model of ADHD where it was considered a discrete disorder with its own particular causes (even though we don’t understand them), practitioners had to choose one. A diagnosis of autism (or the then-separate diagnosis of Asperger syndrome) could explain any and all ADHD symptoms, so if both sets of diagnostic criteria were met, Asperger syndrome was the one chosen.

But that was changed in the most recent edition of the DSM, the DSM V, so now both diagnoses are possible in the same person. This has been a great step forward pragmatically: It has allowed children and adults who exhibit traits from both disorders to get access to better treatment, especially stimulant ADHD medications, which are among the most consistently effective psychiatric treatments modern medicine has ever developed.

But this has also led to some surprising, and philosophically challenging, results. Now that it’s possible for a person to have both diagnoses, we have found a huge amount of comorbidity, which is correlation between two disorders – an amount of comorbidity that leads to questions about whether we’re categorizing these disorders correctly.

According to a meta-analysis, 50-70% of those with a diagnosis of ASD (autism spectrum disorder) also meet the criteria for a diagnosis of ADHD. Many experts believe the number should be even higher. Some even believe that all autism cases cause executive dysfunction and therefore can be expected to lead to ADHD symptoms – and that therefore we should no longer allow concurrent diagnoses.

In the other direction, we necessarily see lower numbers, because ASD is less common than ADHD. Still, around 20%-30% of people with ADHD are diagnosable with autism, especially if it is specifically screened for. Given that ADHD is about 2 to 2.5 times as prevalent as autism (depending on the studies used), these are the numbers we’d expect mathematically. The connection may be even stronger if we consider the prevalence of specific autism symptoms, such as sensory sensitivity, in cases where a full autism diagnosis isn’t indicated.
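As a rough check on that arithmetic, here is a back-of-the-envelope version using Bayes’ rule. It assumes that the overlap figures and the prevalence ratio describe comparable populations, which real studies may not, so treat it as a sanity check rather than a result. The symbol r for the prevalence ratio is my own shorthand.

```latex
% Requires amsmath for \text.
\[
P(\text{ASD} \mid \text{ADHD})
  = \frac{P(\text{ADHD} \mid \text{ASD}) \, P(\text{ASD})}{P(\text{ADHD})}
  = \frac{P(\text{ADHD} \mid \text{ASD})}{r},
\qquad
r = \frac{P(\text{ADHD})}{P(\text{ASD})}
\]
% Lowest overlap with the largest ratio, highest overlap with the smallest ratio:
\[
\frac{0.50}{2.5} = 0.20
\qquad \text{and} \qquad
\frac{0.70}{2} = 0.35
\]
```

With 50-70% of autistic people meeting ADHD criteria and r between 2 and 2.5, the expected share of people with ADHD who are diagnosable with autism lands between 20% and 35%, which brackets the reported figure.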

So what should we do with this? Given that both ADHD and autism have unclear and diverse causes, we treat them in practice, if not always in theory, as correlated symptoms or traits. But if they’re also correlated with each other, as they seem to be, then what basis do we have for separating them? Should we merge them into one disorder? Given the relative prevalences, should we consider autism to be a more severe form of ADHD? A more narrowly defined subset of it?

If we were to combine them, it wouldn’t be the first time two disorders were merged – even ones that might seem drastically different from the outside. The DSM V merged autism and Asperger syndrome into one diagnosis, “autism spectrum disorder.” And ADHD used to be considered distinct from the non-hyperactive ADD, but now it is just one disorder, ADHD, which can then be subdivided into hyperactive, inattentive, or combined “presentations.”

Many on social media have already made up their mind, and rushed ahead of the experts. One particular Instagram post asked, “Is ADHD on the autism spectrum?” In spite of the stereotype that all articles titled with a question can be summarized as “no,” the linked article gave an enthusiastic, almost gleeful “yes.”

I commented that if ADHD and autism are connected, there might be a better way to express this connection than saying “ADHD is on the autism spectrum.” In fact, given that ADHD is the more common diagnosis, perhaps it would be more accurate to say that autism is on the ADHD spectrum – and probably less stigmatized at that. For this, I was yelled at for being an ableist. Ah, the folly of writing things on the Internet (says Jimmy, while writing a blog post on the Internet).

Less controversially, many have adopted the term neurodivergent as a de facto umbrella term for autism and ADHD – and other disorders, like CPTSD, that share traits with them. This term originated from autism advocacy, to shift from a model where such disorders are treated as pathologies to a model where they are treated as differences, fully natural and possibly even beneficial.

Theoretically, the term neurodivergent is meant to include anyone whose brain substantially differs from the brains of average – or neurotypical – people. If this theoretical definition is given credence, especially when coupled with an insistence that it doesn’t have to refer to a pathology, it can become dizzyingly broad almost to the point of meaninglessness. Are left-handed people neurodivergent? Are people with anxiety? Are people with extraordinary talents, even when not coupled with any symptoms of any recognized disorder? If the definition is broad enough, then there won’t be a substantial number of neurotypical people left! Does that make the term meaningless? Or is that, in fact, the point?

But in my experience, the term primarily seems to be used to describe the nebulous space of traits with substantial overlap with autism and ADHD – such as autism and ADHD themselves, and disorders with significant symptom overlap, like sensory integration disorder (SID), and, of course, CPTSD.

This serves a practical purpose: It allows people to share advice, common experiences, and coping mechanisms without getting into the trouble of playing the game of which specific diagnosis they’re for. And while sometimes there are glitches (such as universal human experiences being depicted as “neurodiverse” experiences), overall, this is a helpful thing.

But while the Internet has addressed the terminology problem appropriately for the goals of sharing empathy and coping skills, professionals still have to deal with the panoply of diagnoses. For them, there are many practical questions to wrestle with:

Should a person with both ADHD and autism traits be diagnosed with both?
Should they be diagnosed with autism only, for philosophical reasons, as was required in the 1990s under DSM-IV, even if the ADHD traits are the ones that actually cause them the most trouble?
Is autism a term for a particular sub-type of ADHD, and should it automatically come with an ADHD diagnosis?
Should ADHD interventions and medications be tried more often for those whose diagnosis is just autism?
Should there be more mechanisms available for people to switch from an autism diagnosis to an ADHD diagnosis, or vice versa?
If so, how can these mechanisms be made available to children who are not capable of effective self-advocacy?

ADHD and autism as spectrums

Aside: Plural Forms

I expect this article to have a lot of neurodivergent readership, and we tend to be a pedantic lot, so I want to clarify something even though it’s objectively unimportant:

I was really tempted to write “spectra” instead of “spectrums” above, but as we’re discussing the metaphorical concept of a spectrum, and not the physics concept, I thought it would be unnecessarily confusing. The dictionary accepts the regular English plural in addition to the Latinate one, and that is the plural I have decided to adopt for this article.

After all, there’s no way that it would be appropriate to pluralize “stigma,” when used as a mental health and disability rights term, as “stigmata,” the classical Greek plural of that word.

This is made even more complicated by the fact that as with autism, ADHD traits come on a spectrum. While ADHD on its own normally doesn’t cause the types of profound disability associated with severe autism, it can cause serious struggles and suffering. Part of the stigma of ADHD is that it is not taken seriously as a deeply disabling condition, which it very much can be. Everyone knows someone who has it, but who manages it successfully with coping skills and/or medication. Everyone knows someone for whom it is – given medication or coping skills – just a personality quirk. And people project that understanding of it onto someone who has drastic problems functioning.

Meanwhile, mild autism is treated as a catastrophe, even when it’s extremely mild, even when society would be better served by treating it more like a quirk. Erring on the side of caution is still erring.

To paraphrase another Instagram meme that spoke to me greatly: How can it be that ADHD and autism are such closely related disorders, but ADHD is treated as a quirky personality trait and not taken seriously, and autism is treated as the devil’s work that has to be eradicated?

Given this, if a person, especially a child, can be diagnosed with both autism and ADHD, but the autism is mild and the ADHD is severe, ADHD may be the more appropriate diagnosis not for any objective reason, but simply for the reason of avoiding the stronger stigma.

But given the subjective nature of diagnosis, and the fact that both disorders are (in practice if not in theory) correlated bundles of traits, it’s even worse than that. The autism spectrum (and the ADHD spectrum) are normally considered to range from mild autism (or ADHD) to severe autism (or ADHD). But is there any evidence of a solid cut-off?

For genetic disorders, you typically either have it or you don’t. But for disorders that have a variety of causes, many of them unknown (even if many are heritable), that are on a spectrum of severity, there’s also the possibility of almost having the disorder. There are people out there who almost have ADHD, or almost have autism.

There are lots of people like this: People with “sub-clinical” ADHD or autism, or with “some ADHD (or autism) traits.” To analogize to a different field of medicine, they are the neurodevelopmental equivalent of people who have to squint a little more than average to read things far away, but don’t actually need glasses. Or people who have a little trouble telling green apart from red, but can figure it out with mild difficulty.

Perhaps such a person is one criterion short of the DSM checklist. Or perhaps they check all the boxes, but they’ve built a life for themselves where it isn’t a problem, and they fail to check the all-important box of experiencing significant “impairment in functioning.”

The spectrum of such a disorder extends from the most severe cases to the most mild, yes, but it doesn’t stop there. It extends through these sub-clinical cases, and beyond, to people with normal executive functioning (in the ADHD cases), and then great executive functioning, and then perhaps even to people who have opposite but equally dysfunctional traits.

Aside: An Anti-ADHD?

This was referenced in the article where Dr. Barkley was interviewed, where he discussed a disorder he characterized as “the opposite” of ADHD, but I suspect it’s more complicated than that.

In all honesty, from the brief description, it sounded like a blend of inattentive ADHD and mild autism to me, but perhaps I didn’t understand it correctly. I suspect some people with either or both of these diagnoses will fall into this new diagnosis instead, if and when it becomes available.

This all goes to show how much we’re still learning about this topic.

We use the term “on the spectrum” as a euphemism to mean someone who has autism spectrum disorder, but the term is really a misnomer. I’m not trying to change how we talk – I know that is beyond my power – but I do believe, in a more literal sense, everyone is somewhere on the spectrum of how much autism they have compared to the “average person.” This is true even if the amount of autism they have is negligible or even negative. And likewise with ADHD.

And you have to draw the line somewhere, but the line can move. If it’s a normal distribution, which is the most normal type of distribution to occur in nature (thus the name), most people over the line are going to be close to the line. That is to say, not only will most (ADHD or autism) cases be mild, but a substantial portion of cases will be marginal, and will genuinely be a matter of opinion.

You might assume that most people who have ADHD have clear-cut ADHD, but that simply is untrue. Most people who have ADHD have mild ADHD, and a substantial number barely have it. These people also need treatment, because by definition, the line should be put so that people who are on the ADHD side of it are impaired in functioning.

Aside: Are disorders on a spectrum “real”?

Unfortunately, the idea that these disorders are defined by symptoms and possibly on a spectrum that includes neurotypical people leads some people to conclude that therefore ADHD isn’t a “real” disorder, like in the provocative title of this article: “Is ADHD a Real Disorder or One End of a Normal Continuum.” I wish I didn’t have to address this, but unfortunately, given the level of ADHD denialism in society, which will take any excuse to deny the reality and severity of ADHD, it’s important.

It is a false dichotomy to think that something can be a disorder, or one end of a continuum, but not both. Being too far along on a continuous spectrum can be a real medical problem. For some reason, we have no trouble with the idea of considering “high blood pressure” to be a disease, even though the cut-off for what’s considered high is sometimes adjusted, and even though blood pressure readings form a continuum. Similarly, we have no trouble taking diabetes seriously, when it too is on a spectrum, and we even have names like “pre-diabetes” for other ranges on the spectrum. Why we have difficulty applying similar reasoning to neurodevelopmental disorders is beyond me.

ADHD is more complicated than these, because as I discuss, the line is drawn not based on where it causes problems for the body, but where it causes problems in context – and context changes. But that doesn’t mean that it’s not real. Many real things are not clear-cut binaries – few real things are clear-cut at all. ADHD, and ADHD diagnosis, is complicated, not because ADHD is “not real,” but because it is real.

Consequences

The fact that ADHD and autism are not clear-cut binaries does, however, lead to a number of weird effects.

It explains why children who are young for their class are more likely to be diagnosed with ADHD – something I am sure is true for autism as well. They are, after all, more likely to experience “impairment in functioning” because of their traits, because higher expectations are placed on them in their context – and these impairments are more likely to attract the attention of the adults in their life.

It partially explains why the number of cases fluctuates over time. Rates of both autism and ADHD diagnosis are on the rise, and I’m sure part of that is attributable to better screening and better access to health care. But perhaps some of that is also attributable to more demands placed on our executive function, and our social conformity.

My impression is that society has gotten more difficult for mildly neurodivergent people over time. On the autism side, society has gotten more and more complex, more ironic, less rule-driven, and more informal – that is, there are more unwritten rules (that are changing faster than ever between generations), and fewer explicit ones. On the ADHD side, more and more distracting devices and social media apps degrade our attention span, as we’re expected to navigate increasingly Kafkaesque bureaucracies with less and less social support. I could write an entire blog post on how society is getting less ADHD-friendly.

This “spectrum effect” almost certainly explains why many adults “grow out of” their childhood ADHD – they didn’t actually grow out of it in the sense that they’re now in a discretely different category. Rather, what happened is that they matured and improved in absolute terms with respect to their executive function (as everyone does), and also developed coping mechanisms (as everyone does to make up for those situations where their executive function doesn’t naturally reach the task at hand). In so doing, they drifted over the line of diagnosability and clinicality, but most such people are almost certainly still on the “ADHD” side of things.

Autism is seen as incurable out of recognition that you can’t ever discretely jump categories. ADHD is seen as something you can grow out of in recognition that you can drift over the line from disorder to quirk. I believe from personal experience that this is a difference in attitude rather than fact – that those formerly ADHD children who become “neurotypical” adults are better described as no longer “clinically” ADHD than no longer ADHD at all. And, likewise, there are likely plenty of people whose childhood autism spectrum diagnosis (e.g. Asperger syndrome) may well have been valid, but who as adults would never be able to be diagnosed with it if evaluated from square one.

This explains at least part of why ADHD diagnoses went up so dramatically during COVID – people’s coping mechanisms were shattered by the restrictions and lockdowns, or by the stress and anxiety of avoiding the disease, or by the political turmoil. I know my ADHD and anxiety got much worse over COVID, so that I felt I’d lost 5 years of maturity and emotional progress. I’m not surprised that it brought some people over the line.

But given that ADHD and autism are so clearly connected, this has other consequences as well. Severe or unmedicated ADHD can be as disabling as mild autism, but different. It does come with social difficulties, often (but clearly not always) different from the autism ones. Can you tell the difference between a deficit in social performance and a deficit in social understanding, especially in a child? They raise many of the same red flags.

This leads to the following odd effect: Severe ADHD can often look like mild autism. I don’t mean just to the untrained eye; I mean also to experienced professionals. And in many cases they do go together; severe ADHD often comes with autism. But in some cases, severe ADHD gets mistaken for autism when it is not autism, especially because people will assume that ADHD is mostly relatively mild, and that therefore severe problems with functioning must correspond to a more severe diagnosis.

In situations like this, if the autism traits are mild enough, the ADHD will sometimes be the disorder that requires more treatment, or even the only disorder that is severe enough to be clinical and require treatment at all. But if it is diagnosed, autism is the disorder that causes more concern, and gets more institutional attention.

Sometimes, this institutional attention is a good thing, and can be used to get treatment that can then be tailored to the individual. But sometimes, it results in ill-tailored, overdone treatment instead, and all the stigma that comes with it. And of course, a lot of those more extreme treatments are never appropriate for anyone: Even when extreme interventions are in fact called for, not all extreme interventions are created equal.

Thoughts on “neurodiversity culture” vs medical perspectives

I do not want to arrive at the conclusion that professionals should untangle this by uncritically taking as fact everything neurodivergent people say, especially on the Internet. Internet neurodiversity culture has plenty of its own issues, and some of that is an insistence on believing everyone’s experiences that has spilled over into believing everyone’s conclusions, even if they’re questionable. Half-baked opinions are asserted as gospel truth, to be dissented from only on pain of extreme social censure – which is hard for people who struggle with any of these disorders to deal with proportionately.

Self-diagnoses and peer diagnoses are common. This is understandable because it helps people find coping mechanisms that are useful to them and answer their questions. But it can also be problematic, because sometimes important and useful treatments are missed. And people who have some ADHD or autism traits – which absolutely everyone can show from time to time – can trigger these informal diagnoses that are then also treated as unquestionable dogmas. And, of course, perfectly universal experiences are sometimes presented as signs of neurodivergence – sometimes because neurodivergent people experience them moderately more often than average, and sometimes just because it’s hard to tell subjectively what’s part of your disorder and what’s just a part of normal life.

But I would ask professionals (and parents, teachers, and loved ones – “hearts,” as How to ADHD calls them) to take neurodiversity culture seriously, even if not always at face value. Please listen, but with a grain of salt. It’s a complicated nuance, and nuance is one of the hardest things a person can ever accomplish, but I think it’s possible.

That includes this blog post – I hope that everyone reading this believes my experiences (and most of my knowledge and opinions are very strongly derived from extensive personal and vicarious experiences). I hope that my readers take my arguments and reasoning seriously, because they are greatly informed by both my experience and the huge amount of both research and consideration I’ve poured into this topic – consideration, again, heavily influenced by a deep familiarity with the facts on the ground.

But that does not mean that I’m necessarily right about all of my conclusions, even where I speak confidently. This is a complicated issue – as I hope I have conveyed – so it’s hard for anyone to be completely right about it. But also, this is not a professional interest of mine. I have studied and contemplated this topic as thoroughly as I have not because I have taken classes on it, or been naturally interested in it (especially in a “hyperfocus” or “special interest” kind of way – I could write an entire blog post about that terminology as well), but because I have been repeatedly forced to by circumstances – both mine, and those of other neurodiverse people in my life.

I know that detracts from my credibility in some ways, but I hope it adds to it in others, and that people take me seriously even when I fail to use the exact right terminology du jour, whether that be medical terminology or cultural neurodiversity terminology.

Takeaways

If there’s anything I’d ask people to take away from this, it’s that neurodivergence is anything but simple and straightforward. Neither autism nor ADHD is a discrete disorder with an objective test. The way we organize symptoms and traits into diagnoses is arbitrary and imperfect; we can only hope that it will improve over time.

That said, ADHD medication is extremely effective, and stimulant medications are among the most effective and well-proven treatments available. I personally take Strattera, a non-stimulant, and it has been life-changing for me, addressing many issues that have caused me real problems throughout my life.

We cannot just stop prescribing Adderall because ADHD is hard to define. We can’t just wait until we’ve pinned down these definitions more to treat it. Whether or not it is caused by one or many underlying mental disorders or mental differences, ADHD is a label for very serious symptoms, and it is only properly diagnosed when there is an impairment in functioning – which there often is. It leads to vastly worse life outcomes, worse career performance, more spending (the “ADHD tax”), and in too many cases, poverty. As I mentioned in the introduction, it is objectively linked to drastically lower life expectancy. It is fundamentally mistaken to treat it as so categorically less severe and serious than autism when it is so closely related – and when it is so readily treatable with medication.

Rust and Default Parameters

2023-01-11T00:00:00+00:00

Rust doesn’t support default parameters in function signatures. And unlike in many languages, there’s no way to simulate them with function overloading. This is frustrating for many new Rustaceans coming from other programming languages, so I want to explain why this is actually a good thing, and how to use the Default trait and struct update syntax to achieve similar results.

Default parameters (and function overloading) are not part of object-oriented programming, but they are a common feature of a lot of the programming languages new Rustaceans are coming from. This post therefore fits in some ways with my ongoing series on how Rust is not object-oriented, and so it is tagged with that series. It was also inspired by Reddit responses to my first OOP post.

How Default Parameters Work (in e.g. C++)

So before I talk about why Rust doesn’t have default parameters and what you can do instead, let’s talk a bit about what default parameters are and the situations in which they are useful.

Let’s say you have a function that takes many parameters, perhaps (to take an example from the Reddit response) one that creates a window in a GUI:

WindowHandle createWindow(int width, int height, bool visible);

auto handle = createWindow(10, 30, false); // Create invisible window
auto handle2 = createWindow(100, 500, true); // Create visible window

Now, let’s say that you assume that most windows that are created are intended to be visible, and you don’t want to burden the programmer with having to specify whether the window is visible – or even think about it explicitly – in that normal case. In a programming language that supported default parameters, you could then provide a default for visible.

WindowHandle createWindow(int width, int height, bool visible = true);

auto handle = createWindow(10, 30, false); // Create invisible window!

auto handle2 = createWindow(100, 500, true); // Create visible window!

auto handle3 = createWindow(100, 500); // Also create visible window!
auto handle4 = createWindow(100, 500); // Most of the time, that's what
auto handle5 = createWindow(100, 500); // you want, so why have to say it?

Default parameters can also be simulated with function overloading for programming languages where function overloading is available but default parameters are not:

WindowHandle createWindow(int width, int height, bool visible);

WindowHandle createWindow(int width, int height) {
    return createWindow(width, height, true);
}

Rust also does not have function overloading, and that’s a much more complicated issue, but many of the same arguments apply to this idiom.
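For comparison, the most literal Rust translation of this idiom needs a second, differently named function, because two functions in the same scope cannot share a name. Here is a minimal sketch; the name `create_visible_window` and the `WindowHandle` struct are illustrative stand-ins, not part of any real API:

```rust
// Hypothetical stand-in for a real window handle; it only records the request.
struct WindowHandle {
    width: u32,
    height: u32,
    visible: bool,
}

fn create_window(width: u32, height: u32, visible: bool) -> WindowHandle {
    WindowHandle { width, height, visible }
}

// Without overloading, the two-argument convenience version needs its own name.
fn create_visible_window(width: u32, height: u32) -> WindowHandle {
    create_window(width, height, true)
}

fn main() {
    let hidden = create_window(10, 30, false);
    let shown = create_visible_window(100, 500);
    println!("{}x{} visible={}", hidden.width, hidden.height, hidden.visible);
    println!("{}x{} visible={}", shown.width, shown.height, shown.visible);
}
```

The call site is at least honest about which variant it is using, but the number of names grows quickly once more than one parameter is optional.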

Benefits (and Detriments) of Default Parameters

Defaults are good, and default parameters in this style are one way to implement them and reap their benefits.

Defaults are good because they uphold the DRY principle – Don’t Repeat Yourself. If we didn’t have defaults, we’d have to repeat parameters that don’t actually contribute to understanding of the goals of the code. And if the best default parameters changed in such a way that the best way to update the code was to continue using the default – perhaps because of a change of best practices – we’d have to update every call rather than just changing it once, where the default parameter is defined.

Defaults are also good because they decrease the programmer’s cognitive load. Programmers have to keep a lot of information in their brain at a time, and defaults help programmers by not forcing them to think about extra details when they don’t matter – which is the usual situation for most defaults.

Default parameters also make the code more concise, and are popular for that reason. But this isn’t a particular value that I have. I believe the DRY principle is important, and that often amounts to more concise code, but given modern editors and IDEs, and modern expectations of typing and reading speed, a moderate amount of verbosity in exchange for other benefits (such as clarity and explicitness) is completely acceptable to me. I believe that default parameters, as they are implemented in C++ and Python, have a substantial cost in clarity and explicitness, and therefore conciseness isn’t a good enough reason to justify them.

In this case, what particularly bothers me about the lack of clarity is that the reader of the code doesn’t know that there are potentially more parameters; there is no hint that there might be other parameters. If a maintenance programmer wants to change one of these calls to make invisible windows instead, they might not realize they should check the documentation for createWindow: after all, it only seems to take two parameters, and neither of them has anything remotely to do with invisible windows.

Fortunately, Rust has alternative features that allow us to reap the benefits for cognitive load and DRY without sacrificing explicitness and clarity.

Defaults in Rust: the `Default` trait

Rather than allowing default parameters, Rust allows you to optionally specify default values for your types using the Default trait. Here’s how it works:

enum Foo {
    Bar,
    Baz,
}

impl Default for Foo {
    fn default() -> Self {
        Foo::Bar
    }
}

Or, written using the more concise derive syntax:

#[derive(Default)]
enum Foo {
    #[default]
    Bar,

    Baz,
}

Once this default is defined, Foo::default() or even (in a context where the type is clear) Default::default() can stand in for Foo::Bar.
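As a minimal sketch of both spellings: the `describe` helper and the extra `Debug` and `PartialEq` derives are my own additions for illustration, so that the values can be printed and compared.

```rust
// `Debug` and `PartialEq` are derived only so the values can be printed and compared.
#[derive(Debug, Default, PartialEq)]
enum Foo {
    #[default]
    Bar,
    Baz,
}

// Hypothetical helper: its parameter type tells the compiler which `Default` impl to use.
fn describe(foo: Foo) -> &'static str {
    match foo {
        Foo::Bar => "bar",
        Foo::Baz => "baz",
    }
}

fn main() {
    // Fully spelled out: the type is named explicitly.
    let a = Foo::default();

    // The annotation makes the type clear, so `Default::default()` resolves to `Foo`.
    let b: Foo = Default::default();

    // Here the parameter type of `describe` provides the context.
    let c = describe(Default::default());

    println!("{:?} {:?} {}", a, b, c);
    println!("{}", describe(Foo::Baz));
}
```

In all three cases the value is `Foo::Bar`; the only difference is how much of the type the call site spells out.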

If you are used to re-using existing types for your function parameters, this might seem worse than useless. After all, the parameter we defaulted was of type bool, and the orphan rule (explained in the Rust book’s chapter on traits) forbids us from defining the Default trait on bool – as I alluded to above, Default allows you to define default values for your types. And even if we could, setting a default on booleans is way too overpowered a thing to do just to give this one function parameter a default! After all, some other function might also have a boolean parameter with a different default.

But this makes more sense if you consider that in Rust, it is common – even idiomatic and preferred – to create custom types for things like configuration and function parameters. After all, if you’re not looking at the documentation, it can be unclear what true means. It’s not even clear that it has anything to do with visibility, let alone that true means that the window is to be visible when the parameter could just as easily be called invisible.

In Rust, we would prefer to define a new type for this situation, an enum listing the visibility options – which will also help if a new visibility option is created. And on this enum, it would be reasonable to declare a default:

#[derive(Default)]
enum WindowVisibility {
    #[default]
    Visible,

    Invisible,
}

Yes, this is more verbosity, but it is more clear, and no less DRY, than our original code. Conciseness is again not a value in and of itself. Explicitly listing the options is preferred to leaving them implicit.

Then, when we call the function, we can use this default:

fn create_window(width: u32, height: u32, visibility: WindowVisibility) -> WindowHandle;

let handle = create_window(10, 30, WindowVisibility::Invisible);
let handle2 = create_window(100, 500, WindowVisibility::Visible);

let handle3 = create_window(100, 500, WindowVisibility::default());
let handle4 = create_window(100, 500, WindowVisibility::default());
let handle5 = create_window(100, 500, Default::default()); // Also permitted

This is, as promised, more verbose, but equally DRY, and much more explicit and clear.
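To make the snippet above something you can compile and run, here is a sketch in which `WindowHandle` is a hypothetical stand-in that merely records what was requested; a real windowing library would obviously do more:

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum WindowVisibility {
    #[default]
    Visible,
    Invisible,
}

// Hypothetical stand-in for a real window handle.
struct WindowHandle {
    width: u32,
    height: u32,
    visibility: WindowVisibility,
}

fn create_window(width: u32, height: u32, visibility: WindowVisibility) -> WindowHandle {
    WindowHandle { width, height, visibility }
}

fn main() {
    // The non-default case names its choice explicitly.
    let hidden = create_window(10, 30, WindowVisibility::Invisible);

    // The common case defers to the default, but the argument slot is still visible.
    let shown = create_window(100, 500, Default::default());

    println!("{}x{} {:?}", hidden.width, hidden.height, hidden.visibility);
    println!("{}x{} {:?}", shown.width, shown.height, shown.visibility);
}
```

If the preferred default ever changes, only the `#[default]` attribute has to move; the calls that say `Default::default()` stay as they are.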

NB: I’m using free-standing functions for example purposes only. In reality, this particular function is just as likely to be part of a type’s intrinsic methods, something like WindowHandle::new or WindowHandle::create_window.

Scaling defaults in Rust: Struct update syntax

So this is all well and good for one default. But it doesn’t scale that well. What if we want to add another 3 parameters to our window creation function? In a language like C++, we can give them defaults, and the callers don’t even need to be updated (parameters are for example purposes only and do not represent a well-thought-out list of what you might want to specify in creating a window):

WindowHandle createWindow(int width, int height, bool visible = true,
                          WindowStyle windowStyle = WindowStyle::Standard,
                          int z_position = -1,
                          bool autoclose = false);

createWindow(100, 500); // Still works identically
createWindow(100, 500, false); // Also still works
createWindow(100, 500, false, WindowStyle::Standard, 2, true); // Specify everything

This is a useful feature. In Rust, with the techniques we’ve discussed so far, we’d have to write Default::default() repeatedly for however many parameters there are. This is a DRY violation, and interferes with the ability to add new parameters.

There is a flaw with this feature, however. You’ve now constrained yourself to specifying parameters to the left in order to specify parameters on the right. In the last example call to createWindow, we violate DRY by explicitly specifying a value when we probably wanted to use the default, but that wasn’t available because we wanted to override the default for a later parameter.

Fortunately, Rust has a version of this too. Just as we created an enum just for the purposes of this function call, it is idiomatic in Rust to create structures for configuration parameters like this. The structure would look something like this:

pub struct WindowConfig {
    pub width: u32,
    pub height: u32,
    pub visibility: WindowVisibility,
    pub window_style: WindowStyle,
    pub z_position: i32,
    pub autoclose: AutoclosePolicy,
}

Then, we can implement Default for that entire struct:

impl Default for WindowConfig {
    fn default() -> Self {
        Self {
            width: 100,
            height: 100,
            visibility: WindowVisibility::Visible,
            window_style: WindowStyle::Standard,
            z_position: -1,
            autoclose: AutoclosePolicy::Disable,
        }
    }
}

Now, this might seem to be extremely tedious to use. You might imagine using it something like this:

let mut config = WindowConfig::default();
config.width = 500;
config.z_position = 2;
config.autoclose = AutoclosePolicy::Enable;
let handle = create_window(config);

I would argue that even this is preferable to default parameters, because again, it is explicit. However, Rust has a syntactic construct designed exactly for situations like this, struct update syntax. With it, we get something very similar to default parameters, but a little more verbose, a lot more explicit, and a lot more flexible:

let handle = create_window(WindowConfig {
    width: 500,
    z_position: 2,
    autoclose: AutoclosePolicy::Enable,
    ..Default::default()
});

Unlike C++-style default parameters, we can override exactly the defaults we want to. It is also explicitly clear that there are other parameters we could modify if we wanted to, without forcing the maintenance programmer to check the documentation.
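Putting the pieces together, a self-contained sketch might look like the following. The definitions of `WindowStyle`, `AutoclosePolicy`, and `WindowHandle` are assumptions on my part, since the snippets above only name those types:

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub enum WindowVisibility {
    #[default]
    Visible,
    Invisible,
}

// Assumed definition: only the `Standard` style is mentioned in the text.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub enum WindowStyle {
    #[default]
    Standard,
}

// Assumed definition, with `Disable` as the default to match `WindowConfig::default()`.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub enum AutoclosePolicy {
    Enable,
    #[default]
    Disable,
}

#[derive(Debug, Clone, PartialEq)]
pub struct WindowConfig {
    pub width: u32,
    pub height: u32,
    pub visibility: WindowVisibility,
    pub window_style: WindowStyle,
    pub z_position: i32,
    pub autoclose: AutoclosePolicy,
}

impl Default for WindowConfig {
    fn default() -> Self {
        Self {
            width: 100,
            height: 100,
            visibility: WindowVisibility::default(),
            window_style: WindowStyle::default(),
            z_position: -1,
            autoclose: AutoclosePolicy::default(),
        }
    }
}

// Hypothetical stand-in: a real handle would wrap a windowing-system resource.
pub struct WindowHandle {
    pub config: WindowConfig,
}

pub fn create_window(config: WindowConfig) -> WindowHandle {
    WindowHandle { config }
}

fn main() {
    // Override three fields; every field not named comes from the default.
    let handle = create_window(WindowConfig {
        width: 500,
        z_position: 2,
        autoclose: AutoclosePolicy::Enable,
        ..Default::default()
    });
    println!("{:?}", handle.config);

    // Overriding a single "later" field does not require spelling out earlier ones.
    let hidden = create_window(WindowConfig {
        visibility: WindowVisibility::Invisible,
        ..Default::default()
    });
    println!("{:?}", hidden.config);
}
```

Note that the second call changes only `visibility`, which is exactly the case that positional default parameters handle badly.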

But beyond that, this allows there to be other sets of defaults defined. In addition to WindowConfig::default(), there might be another set of configuration parameters for creating dialog boxes, like WindowConfig::dialog() or WindowConfig::default_dialog(). An app where the programmer usually creates invisible windows, or windows all of the same height, might define its own default set, config::app_local_default_window_config(). These wouldn’t be mediated through the Default trait, but Default is just a trait, and Default::default() is just a method call. You can call your own methods instead, and still use this struct update syntax.
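As a sketch of what such an alternative preset could look like, here is a trimmed-down `WindowConfig` with three fields instead of six, to keep the example short. The specific dialog dimensions are made up for illustration:

```rust
#[derive(Debug, Clone, PartialEq)]
struct WindowConfig {
    width: u32,
    height: u32,
    z_position: i32,
}

impl Default for WindowConfig {
    fn default() -> Self {
        Self { width: 100, height: 100, z_position: -1 }
    }
}

impl WindowConfig {
    // Hypothetical preset for dialog boxes; it is itself built on top of the default.
    fn dialog() -> Self {
        Self { width: 320, height: 200, ..Self::default() }
    }
}

fn main() {
    // Any expression of type `WindowConfig` can follow the `..`, not just `Default::default()`.
    let wide_dialog = WindowConfig { width: 640, ..WindowConfig::dialog() };
    println!("{:?}", wide_dialog);
}
```

Here `wide_dialog` takes its width from the call site, its height from the dialog preset, and its `z_position` from the general default.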

So now, we have a system of idioms in Rust to replace default parameters. It’s just as DRY, and decreases the cognitive load just as much. More importantly, it does so without sacrificing explicitness and clarity as to exactly what’s going on – a given function always takes the same number of parameters, which is an invariant that Rust maintenance programmers can (and do) rely on.

The Builder Pattern

At this point, the old-hand Rustaceans in the audience will note that I haven’t discussed one common Rust approach to designing these configuration structs, the builder pattern.

That’s for a reason: I don’t like it. I personally prefer to use Default and struct update syntax where others might reach for the builder pattern. I think the builder pattern is less explicit, and since I have a lot of experience in non-OOP programming languages, it feels to me like a solution without a problem, the primary upshot of which is to make the code look more object-oriented.

But it is a commonly used pattern in Rust, and you will use crates that use the builder pattern, so it’s worth being familiar with it. It’s the same concept as before: using a struct full of parameters to send configuration to a constructor or to a function call. It’s probably going to be called something like WindowBuilder instead of WindowConfig.

However, instead of using the struct update syntax directly, a bunch of helper methods are added to do the struct update:

impl WindowBuilder {
    fn height(mut self, height: u32) -> Self {
        self.height = height;
        self
    }

    // ...
}

Or, as I would notate it:

impl WindowBuilder {
    fn height(self, height: u32) -> Self {
        Self {
            height,
            ..self
        }
    }

    // ...
}

Sometimes, enumerations are split into multiple update methods:

impl WindowBuilder {
    fn autoclose_enable(mut self) -> Self {
        self.autoclose = AutoclosePolicy::Enable;
        self
    }

    fn autoclose_disable(mut self) -> Self {
        self.autoclose = AutoclosePolicy::Disable;
        self
    }
}

Then, normally, instead of calling e.g. the window constructor, you call a build method defined on the builder (and at this point I cringe at the gratuitous OOP philosophy influencing the design):

impl WindowBuilder {
    // Consumes the builder and hands it to the window-creation function.
    fn build(self) -> WindowHandle {
        create_window(self)
    }
}

Then, instead of using struct update syntax, you chain together calls to these methods:

let handle = WindowBuilder::new()
    .width(500)
    .z_position(2)
    .autoclose_enable()
    .build();

I still prefer this to default parameters, but I also find it tacky. I don’t like being forced to think in terms of abstract “objects” like builders, and I don’t like the presumption that this style is more intuitive. Why is a “builder” an object that does something? Why is that preferred to a structure that is “configuration”? Are OOP programmers aware that in real life, the vast majority of objects literally don’t do things, and certainly don’t build other objects?

But for people familiar with the idioms of object-oriented programming, this might be preferable. It is a commonly chosen option, so it’s important at least to recognize it.
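For comparison, here is a minimal sketch of the builder chain above rewritten with Default and struct update syntax. The field types, the default values, and the stand-in window_create (which simply hands the configuration back so it can be inspected) are assumptions made up for illustration; only the names WindowConfig, AutoclosePolicy, and the fields themselves come from the discussion above.

```rust
// Hypothetical types for illustration; field names mirror the builder example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AutoclosePolicy {
    Enable,
    Disable,
}

#[derive(Debug, Clone, PartialEq)]
pub struct WindowConfig {
    pub width: u32,
    pub height: u32,
    pub z_position: i32,
    pub autoclose: AutoclosePolicy,
}

impl Default for WindowConfig {
    fn default() -> Self {
        // These default values are invented for this sketch.
        Self {
            width: 800,
            height: 600,
            z_position: 0,
            autoclose: AutoclosePolicy::Disable,
        }
    }
}

// Stand-in for the real constructor: it returns the config unchanged.
pub fn window_create(config: WindowConfig) -> WindowConfig {
    config
}

pub fn make_window() -> WindowConfig {
    // Only the fields that differ from the defaults are spelled out.
    window_create(WindowConfig {
        width: 500,
        z_position: 2,
        autoclose: AutoclosePolicy::Enable,
        ..Default::default()
    })
}
```

Every overridden field is named at the call site, and there are no helper methods to write or maintain.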

Conclusion and Application

Rust has a lot of idioms that are different from those in other programming languages. I often see proposals from new Rustaceans to add default parameters – and other similar features – to Rust, and these new Rustaceans are confused that the strong demand they feel is not as widely felt in the greater Rust community.

And normally, it’s similar to this situation with default parameters. There are alternative idioms that accomplish the same goals, to the extent that those goals are in line with Rust’s values: in this case, DRYness, and reducing developers’ cognitive loads. They are also better solutions in some other ways, according to Rusty values: the additional explicitness is worth a little more verbosity.

But often, the new Rustaceans making these proposals are unaware of the Rusty way of doing things. And if they are aware of it, they are approaching it from the goals of other programming languages, and don’t see how the solution measures up.

So I hope this can serve as a case study to help people understand that there often are Rusty ways of accomplishing the goals of popular features from OOP land, and why Rustaceans prefer these solutions to blind accumulation of features.

Christmas Disappointment: Smashing Princes and Cities

2023-01-03T00:00:00+00:00

Today, in liturgical Western Christianity, it is the 10th day of Christmas. Merry Christmas to those who celebrate the extended edition of the holiday!

Unfortunately, this essay is not a celebration of Christmas, but rather an explanation of why I have often found it disappointing recently in life, because of a disconnect between the promise and the reality.

Every time Christmas comes around, I think of a classical sacred choral piece that I’ve performed in multiple different choirs in youth and adulthood, from Mendelssohn’s Christus, namely “Es Wird ein Stern aus Jakob Aufgeh’n” (“There shall come a star out of Jacob”).

These are the words:

Es wird ein Stern aus Jakob aufgeh’n
Und ein Scepter aus Israel kommen;
Der wird zerschmettern Fürsten und Städte

The times we’d sung it in English, this was sung as:

There shall a star come out of Jacob
and a scepter shall rise out of Israel,
with might destroying princes and cities.

And, of course, the line about destroying princes and cities is sung loudly, calamitously, and – perhaps strangely – triumphantly. I remember having a discussion once with someone who was confused by this, but then suddenly got it: “Oh!! It’s a good destroying princes and cities!”

And the English “destroying” is too weak. Without the constraint of a singable text, I’d translate the German thus:

There shall arise a star out of Jacob
And a scepter shall come out of Israel
He shall shatter sovereigns and cities

Fürsten is normally translated “princes,” but means “princes” in the sense of “sovereign leaders of principalities” or “heads of state,” rather than in the more common modern sense of “sons of monarchs.”

This text is based off of a Bible verse, Numbers 24:17, where specific enemy nations of Israel are called out as those this “star” will destroy:

I see him, but not now;
I behold him, but not near:
a star shall come out of Jacob,
and a scepter shall rise out of Israel;
it shall crush the forehead of Moab
and break down all the sons of Sheth.

Numbers 24:17 ESV

Of course, like many things in the Hebrew Bible, Christians reinterpreted this text to be a Messianic prophecy about Jesus. Jesus made no literal war against anybody, let alone Moab and the sons of Sheth, so that line was reinterpreted as some sort of metonym, perhaps meaning “any power of this earth opposed to God’s people.”

In the Mendelssohn piece, the transformation goes further, and the text replaces the specific with the general: the star, the Christ, will “shatter princes and cities.”

This is seen as a good thing. The feeling that I have always gotten from this section of the piece is reminiscent of a leftist jubilantly speaking of “dismantling power structures,” but with much more poetic, less wonky language. The rulers and their strongholds will be shattered. The people who are currently in charge – and who are clearly responsible for many of the world’s problems due to their selfishness – will be violently set aside.

It is reminiscent of Jesus’s declaration that “the first will be last, and the last will be first,” or the similarly anti-rich and anti-powerful declaration in the “Magnificat” (Luke 1:51-53, Book of Common Prayer):

[God] hath shewed strength with his arm:
he hath scattered the proud in the imagination of their hearts.
He hath put down the mighty from their seat:
and hath exalted the humble and meek.
He hath filled the hungry with good things:
and the rich he hath sent empty away.

All of this is connected to Christmas in that Jesus, as the Christ, is supposed to be the “star of Jacob.” It is Jesus that is supposed to shatter the princes and cities, and Christmas celebrates his birth and coming into the world.

The problem, and the part that makes me sad when I think about this song, is that princes and cities still are around. And, unfortunately, they are as oppressive as ever.

Recently, in the US, we’ve had the particularly intense prince that is President Trump, and the city that was his version of Washington, DC. And throughout the world we have others: There is Vladimir Putin, and his well-fortified Moscow, protected with nuclear weapons rather than walls. There is Kim Jong Un in North Korea, and Xi Jinping in China. These figures come off like movie villains – or like the many evil sovereigns in the Bible.

And they all remain thoroughly unshattered. They’re still there, even though Christmas happened 2022 years ago, when Jesus was born to shatter them.

I could recite to you all sorts of standard Christian explanations for this: that the princes and cities have in fact been shattered, in a spiritual sense; that this is a process that happens over time through the church; and that the final, literal shattering will occur once Jesus returns.

But that honestly seems like a moving goalpost. The piece doesn’t say “a star shall arise and then set, and then arise three days later, and then eventually after thousands of years shall shatter princes and cities.”

In the early 20th century, the worldwide Communist movement tried to take matters into their own hands, and shatter the princes and cities by means of violent revolutions. But of course, they couldn’t shatter human nature. After the Tsar and the oligarchs were shattered, new ones arose. Stalin and Mao Tse Tung were even worse princes than those who came before them.

That is unfortunately how revolution usually goes. Given that, you can see why some feel the need for a divinely-appointed hero to come in and do the shattering, who will reign with justice afterwards rather than requiring another round of shattering in another couple decades.

But Jesus literally didn’t do that. And no one seems to be forthcoming. And if they were, they’d be more likely to be another Stalin than a “star of Jacob.”

The Mendelssohn is a beautiful piece, but now, it just reminds me of the evil princes and cities that remain unshattered. The promised shattering feels like just a fantasy.

A Life (and Blog) Theme for the Coming Year

2022-12-21T00:00:00+00:00

Happy December! Happy Winter Holidays! We’re almost done with 2022!

I just had my birthday yesterday, on December 20. I am now 34 years old, which is more than a third of a century! I generally take the opportunity on my birthday to do some reflection on the previous year, and to set a theme for the next year. I wanted to share both with you, my audience.

The past year has been intense for me personally. It’s just been a laundry list of life changes and achievements:

I moved from New York City to a small town.
I bought a house and got a mortgage.
I started medication (Atomoxetine) for my ADHD, an experience I want to blog about, and will as soon as I figure out what exactly I actually have to say about it.
I’ve made new friends and deepened other friendships.
I got an (antique family) upright piano installed in my house.
I got a home gym installation.
I’ve greatly refined and improved my organizational system.
I’ve started a weekly rock-climbing habit.

And, most relevantly for this audience, this blog has accelerated:

I have posted far more than any previous year, averaging 3 times per month (not including this post):

$ ls | grep ^2 | cut -f1 -d- | uniq -c # Count posts per year
      3 2017
      1 2018
     17 2019
      5 2020
      3 2021
     36 2022

I have migrated to Hugo, making posting super easy and the design mobile-friendly without me having to do my own front-end/CSS work.
I added comments.
I added a newsletter, and just today, added a subscription form to every page (thanks, subscribers!)
I’ve written or at least started several series:
- Comparing C++ to Rust
- Explaining my organizational system
- Explaining how Rust’s paradigm differs from OOP

My theme for this past year – or perhaps for 2021, I’m not sure, but I think it was in practice a 2-year theme – was rebuilding, and I’ve certainly done that. I’ve built a new life, in a new place, with so many different habits. And I’m really happy with the results.

Now, what I really want to do is let these new habits mature, to build on top of what I built. And so, a dear friend of mine suggested growth as the theme for this coming year. Rather than focusing on planting new seeds in the garden of my life, as I’ve been doing, I should let the existing seeds that are already planted grow and mature.

Here is part of what I hope that will mean in practice:

Exercise
- Continue rock climbing
- Use the home gym more regularly
Music
- Solidify the habit of practicing the piano
- Try to learn new skills on it (perhaps getting lessons)
Friendships
- Deepen my relationships with the people I’ve gotten closer to (literally and figuratively) and met over the past year
Organization
- Use my system more consistently
- Continue to reap the benefits of having it at all

And of course, last but not least, this blog. I would like this blog to grow. Getting a newsletter is the first and obvious step, but I have some other ideas as well.

For one thing, I hope to take some of my blog posts and series and turn them into “gardens” rather than “streams”. Rather than a series of posts that boom in readership and then get forgotten about, organized by time-line, I think a well-organized complete documentation on, say, my opinions about Rust vs C++, or perhaps just a list of Rust opinions, might be more useful as a reference, and allow me to refine my positions over time, and address objections (even if some on Reddit think addressing objections is somehow a sign that I’m being disingenuous or not open-minded).

For another thing – speaking of Reddit – I’d like to move away from polemic towards education and insight. Many of my posts now are about why Rust is better than C++ – which I still think is an important topic – but some of them are about how to do things in Rust, or how Rust works. My OOP series, while starting out with a fair amount of polemics, will hopefully show a lot of examples and provide a lot of insights into how to use Rust’s features to structure code for people who are used to OOP-style programming languages, and will help people do their job more directly. I don’t think I’m going to be able to fully give up polemics – I am, after all, a self-described “temperamental opinionator” – but I’d like to spend more of my time explaining how to use Rust and why it is the way it is, and less time arguing with angry people on Reddit who take criticism of their preferred tools as a personal attack.

I’d also like to increase the amount of non-technical blogging I do. I used to post fiction on my blog, and some of it I’m even proud of. I’d like to return to doing that. I’d like to share more of my insights about ADHD in specific and neurodiversity in general, specifically about medication and the philosophical question about what a diagnosis like ADHD even means. I’d like to write more about religion, a complicated issue that I have some insights on.

And finally, I’d like to blog some about Reflex, my favorite GUI programming framework, and one that is in Haskell and completely independent of the object-oriented tradition. It’s an excellent hidden gem of a library, and I think it deserves some good resources for it.

If any of you have any particular requests, please let me know! Many of you have already given me ideas for new posts – I have dozens of outlines and hundreds of ideas – but specific requests are (sometimes) very motivating.

I’d like to thank you all so very much for reading! I hope you all have had a good 2022, and I wish all of you the best possible 2023. If you do set a new theme for the new year, may it help you live your best life.

Rust Is Beyond Object-Oriented, Part 1: Intro and Encapsulation

2022-12-12T00:00:00+00:00

Rust is not an object oriented programming language.

Rust may look like an object-oriented programming language: Types can be associated with “methods,” either “intrinsic” or through “traits.” Methods can often be invoked with C++ or Java-style OOP syntax: map.insert(key, value) or foo.clone(). Just like in an OOP language, this syntax involves a “receiver” argument placed before a . in the caller, called self in the callee.

But make no mistake: Though it may borrow some of the trappings, some of the terminology and syntax, Rust is not an object-oriented programming language. There are three pillars of object-oriented programming: encapsulation, polymorphism, and inheritance. Of these, Rust nixes inheritance entirely, so it can never be a “true” object-oriented programming language. But even for encapsulation and polymorphism, Rust implements them differently than OOP languages do – which we will go into in more detail later.

This all comes as a surprise and an adjustment to a lot of programmers. I see Rust newbies on Reddit asking how to implement OOP design patterns literally, trying to get “class hierarchies” like “shapes” or “vehicles” working with traits standing in as “the Rust version of inheritance” – in other words, trying to solve problems they only have because they’re committed to the OOP approach, and doing contrived OOP examples to try to learn what they expect to be just another version of it.

It’s a stumbling block for many. I regularly see “lack of OOP” mentioned on the Internet by Rust newbies and sceptics as a reason Rust is hard to adjust to, or not a good fit for them, or even why it will never catch on. For people who learned to program in the height of OOP as a trend – when perfectly good languages like C and ML had to become object-oriented as Objective-C and OCaml – the amount of hype about a non-OOP language just feels off.

It’s not an easy adjustment either. So many programmers learned software design and architecture in an explicitly object-oriented way. I see question after question where a beginning or intermediate Rust programmer wants to do an object-oriented thing, and wants a literal Rust equivalent. Often, these are examples of the XY problem, and they have trouble backtracking and approaching the problem in a more Rusty way.

But that isn’t Rust’s fault. The answer is still for us to adjust, even if it isn’t easy; being proficient in not only multiple languages but also different programming paradigms makes us better programmers.

And, as a paradigm, OOP is actually thoroughly mediocre – so much so that I’m writing a whole blog series to explain why, and why Rust’s approach is better.

OOP Ideology

Look, I get it. I used to drink the OOP Kool-Aid myself. I remember how it was billed to us: not as just a set of code organization practices, but a revolution in programming. The OOP way was held up as more intuitive, especially to non-programmers, because it would align better with how we think of the natural world.

For an archetypical example of this marketing, here is an excerpt from the first public article about OOP in a popular magazine (Byte Magazine, in 1981):

Many people who have no idea how a computer works find the idea of object-oriented programming quite natural. In contrast, many people who have experience with computers initially think there is something strange about object oriented systems.

It was pretty easy to buy into, as well. Of course, our everyday life doesn’t have anything like subroutines or variables – or, to the extent that it does, we don’t think about them explicitly! But it does have objects that we can interact with, each with its own capabilities. How could it not be more intuitive?

It’s very compelling pseudo-cognitive science, light on research, heavy on really persuasive rationales. The objects can be thought of as “agents,” almost as people, and so you could leverage your social skills towards it instead of just analytical thinking (never mind that objects act nothing like people, and are actually substantially dumber in a way that still requires analytical thinking). Or, you can think of objects and classes as an almost-platonic representation of the world of forms itself, making it philosophically compelling.

And oh, how I bought in, especially in my wanton and reckless youth. I personally soaked up the connection between OOP and Platonic philosophy. I delved deep into meta-object protocols, and the fact that in Smalltalk every class had to have a metaclass. The concept of the Smalltalk code Metaclass class felt almost mystical to me, as did the notion that any value could be organized in the same hierarchy, with Object at its root.

I remember reading in a book that OOP-style polymorphism made if-else statements redundant, and therefore we should strive to ultimately only use OOP-style polymorphism. Somehow, instead of putting me off, this excited me at the time. I was even more excited when I learned that Smalltalk in fact does this (if you ignored implementation details that optimize away some of this abstraction): In Smalltalk, the concept of if-then-else is implemented via methods like ifTrue: and ifFalse: and ifTrue:ifFalse: on the single-instance True and False classes, with their global objects, true and false.

As a more mature programmer, exposed to the less ideological OOP of C++ and the alternative of functional programming in Haskell, my positions softened, and then shifted dramatically, and now I am barely a fan of OOP at all, especially as its best ideas have been carried on to a newer synthesis in Haskell and Rust. I’ve realized that this hype about new programmers is typical for any paradigm; any new programming paradigm is more intuitive for a newbie than it is for someone who’s a veteran programmer in a different paradigm. The same thing is said for functional programming. The same thing is even said for Rust. It really doesn’t have that much to do with whether a paradigm is better.

As for if statements being fully replaceable by polymorphism, well, it’s easy to come up with a set of primitives that are Turing-complete. You can simulate if statements with polymorphism, true. You can also simulate while loops with recursion, or recursion with while loops and an explicit stack. You can simulate if statements with while loops.
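To make the first of those simulations concrete, here is a small sketch of Smalltalk-style booleans in Rust: two types, one trait, and dynamic dispatch choosing the branch. The trait Bool, the method if_true_if_false, and the function describe are hypothetical names invented for this illustration, not anything from Smalltalk's or Rust's libraries.

```rust
// Smalltalk-style booleans: the "branch" is chosen by which impl gets called.
// All names here are made up for this sketch.
pub trait Bool {
    fn if_true_if_false(
        &self,
        when_true: &dyn Fn() -> String,
        when_false: &dyn Fn() -> String,
    ) -> String;
}

pub struct True;
pub struct False;

impl Bool for True {
    fn if_true_if_false(
        &self,
        when_true: &dyn Fn() -> String,
        _when_false: &dyn Fn() -> String,
    ) -> String {
        // Only the "true" closure is ever run.
        when_true()
    }
}

impl Bool for False {
    fn if_true_if_false(
        &self,
        _when_true: &dyn Fn() -> String,
        when_false: &dyn Fn() -> String,
    ) -> String {
        // Only the "false" closure is ever run.
        when_false()
    }
}

// No `if` appears here; the decision is made by dynamic dispatch.
pub fn describe(flag: &dyn Bool) -> String {
    flag.if_true_if_false(&|| "yes".to_string(), &|| "no".to_string())
}
```

It works, but compare it with writing if flag { "yes" } else { "no" } and it is hard to argue that anything was gained.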

None of these facts make such substitutions a good idea. Different features exist in a programming language for different situations, and making them distinct is actually a good thing, in moderation.

After all, the point of programming is to write programs, not to make proofs about Turing-completeness, do philosophy, or write conceptual poetry.

Practicality

So, in this blog series, I intend to evaluate OOP in practical terms, as a programmer with experience in what makes programming languages cognitively more manageable or easy to do abstraction in. I will do it in terms of my experience solving actual programming problems – I see it as a bad sign that many examples of how OOP abstractions work only make sense in really advanced programs or with contrived examples about different types of shapes or animals in a zoo.

And unlike most introductions to OOP, I will not primarily be focusing on how OOP compares to pre-OOP programming languages. I will instead be comparing to Rust, which takes many of the good ideas from OOP, and perhaps also to functional programming languages like Haskell. These programming languages have taken some of OOP’s good ideas, but transformed them in a way that fixes some of their flaws and moves them beyond what can reasonably be called OOP.

I will organize this comparison according to the three traditional pillars of object-oriented programming: encapsulation, polymorphism, and inheritance, with this first article focusing on encapsulation. For each pillar, I will discuss how OOP defines it, what equivalents or substitutes exist outside of the OOP world, and how these compare for practical ease and power of programming.

But before I jump in, I want to talk a second about a use case that turns much of this on its head: graphical user interfaces or GUIs. Especially before the era of the browser, writing GUI programs to run directly on desktop (or laptop) computers was a huge part of what programmers did. A lot of early development of OOP was done in tandem with research into graphical user interfaces at Xerox PARC, and OOP is uniquely well-suited for that use case. For this reason, the GUI deserves special consideration.

For example, it is common for people to emulate OOP in other programming languages. Gtk+ is a huge example of this, implementing OOP as a series of macros and conventions in C. This is done for many reasons, including familiarity with OOP designs and a desire to create some kind of run-time polymorphism. But in my experience, this is most common when implementing a GUI framework.

In this series of articles, we will primarily focus on applying OOP to other use cases, but we will also discuss GUIs as appropriate. In this introductory section, I will just point out that GUI frameworks are clearly possible outside traditional OOP designs and programming languages, and even in Rust. Sometimes, they work by completely different mechanisms, like the functional-reactive programming mostly pioneered in Haskell, which I personally prefer to traditional OOP-based programming and for which traditional OOP features would not be helpful.

Now, without further ado, let us compare OOP to Rust and other post-OOP programming languages, pillar by pillar, from a pragmatic perspective. For the rest of this first post, we will focus on encapsulation.

First Pillar: Encapsulation

In object-oriented programming, encapsulation is bound up with the idea of a class, the fundamental layer of abstraction in object-oriented programming. Each class contains a layout for some data in a record format, that is, a data structure where each instance contains a set number of fields. Individual instances of the record type are known as “objects.” Each class also contains code that is tightly paired to that record type, organized into procedures called methods. The idea is then that all of the fields will only be accessible from inside the methods, either by the conventions of OOP ideology or by the enforced rules of the programming language.

The fundamental benefit here is that the interface, which is how the code interacts with other code, or what you have to understand to use the code, is much simpler than the implementation, which consists of the more fluidly changing details of how the code actually accomplishes its job.

But of course, lots of programming languages have abstractions like this. Any program longer than a dozen lines has too many parts to keep in your brain all at once, and so all remotely modern programming languages have ways of dividing a program into smaller components, as a way to manage the complexity, so that the interface is simpler than the implementation, whether enforced by the programming language or a matter of the “honor system.” So in a broader sense of the word, all modern programming languages have some version of encapsulation.

One simple form of encapsulation – one that most object-oriented programming languages maintain as a layer within the class – is procedures, also known as functions, subroutines, or (as OOP calls them) methods. Rather than allow any line of code to jump to any other line of code, modern programming languages tend to group blocks of code together into procedures, and you can then change the contents of the procedure without affecting the outside code, and change the outside code without affecting the procedure, as long as they follow the same interface and contract.

The contract is usually at least partially a human-level convention. There’s not usually much stopping you from taking a procedure that is supposed to process some data and instead making it loop indefinitely or crash the program. But some of it, like the separation of the procedure from the rest of the program, and in many cases the number and types of values it is allowed to accept and return in an invocation, will be enforced by the programming language.

For example, variables declared inside the procedure are usually local, and there’s generally no way to reference them outside the procedure. The inputs and outputs are usually listed in a signature at the top of the procedure. Normally, outside code can only enter the procedure on its first line, rather than on an arbitrary line half-way through. In some programming languages – including Rust – procedures can even contain other procedures, which can only be called within the outer procedure.
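As a small illustration of procedure-level encapsulation in Rust, the sketch below uses a local variable and a nested function, neither of which can be named from outside the outer function. The function names are arbitrary and chosen only for this example.

```rust
pub fn sum_of_squares(values: &[i64]) -> i64 {
    // `square` is a nested function: it can only be called from inside
    // `sum_of_squares`, so it can be changed without affecting outside code.
    fn square(x: i64) -> i64 {
        x * x
    }

    // `total` is local; nothing outside this function can refer to it.
    let mut total = 0;
    for &value in values {
        total += square(value);
    }
    total
}
```

Outside code sees only the signature: a slice of integers goes in, one integer comes out.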

But of course, modern programs are often more complicated than a mere handful of procedures. And so, modern programming languages (and again, the word “modern” here is being used in a very loose way) have another layer of encapsulated abstraction: modules.

Modules will generally contain a group of procedures, some externally accessible, and some not. And in non-duck typed languages, they will generally define a number of aggregate types, again some externally accessible, and some not. It is generally even possible to expose these types abstractly, so the existence of a type is accessible to the rest of the program, but not the record fields, or even the fact that it is a record type. Even C has this ability in its module system – C++ did not introduce it, just added an additional, orthogonal level of field-by-field access controls.
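Here is a minimal sketch of that kind of module in Rust, written with free functions to keep it close to the C style of doing the same thing. The module counter and everything in it are hypothetical names for illustration.

```rust
mod counter {
    // The type is public, so other code can hold and pass around a `Counter`,
    // but its field is private: outside code cannot read or write `count`,
    // or even rely on the fact that this is a record at all.
    pub struct Counter {
        count: u64,
    }

    pub fn new() -> Counter {
        Counter { count: 0 }
    }

    pub fn increment(counter: &mut Counter) {
        counter.count = next(counter.count);
    }

    pub fn value(counter: &Counter) -> u64 {
        counter.count
    }

    // Not `pub`: a helper procedure that only this module can call.
    fn next(n: u64) -> u64 {
        n.saturating_add(1)
    }
}
```

Code outside the module can call counter::new, counter::increment, and counter::value, while an expression like c.count or a call to counter::next would be rejected by the compiler.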

Seen from my pragmatic point of view, class-based encapsulation is not some special insight of OOP, but a specialized – or rather, tightly restricted – form of module. In an OOP programming language, we have this notion of a class, which is a special form of module (sometimes the only supported form, or sometimes even layered underneath a completely different, more traditional notion of module, for extra confusion). It’s just that, for a “class,” there can only generally be one primary type defined, which shares a name with the module itself, and where the fields of that type are given special protection against access by code outside the class.

Of course, there are other differences between a class and a module, but these have to do with the other pillars, and we will get to them later. For right now, we will just discuss the idea of a “class” as it relates to encapsulation – where a class is just a special module with one privileged, abstracted type.

And this is a reasonable way to write a module, but it’s not as special as object-oriented programming makes it out to be (especially once we discuss alternative approaches to the other pillars, but again, more later). There are some situations where a module doesn’t have any record type that it defines, which is awkward in programming languages like Java, where you have to define an empty record type anyway and still make a “class.” There are also situations in which a module defines multiple publicly accessible types that are tightly entangled – and where the encapsulation between those types that OOP style would encourage you to do is more of a hindrance than a help.
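A sketch of the second situation, with two entangled public types sharing one module: because Rust's privacy boundary is the module rather than the type, each type can use the other's private fields directly. The names arena, Arena, and Handle are hypothetical and chosen only for this example.

```rust
mod arena {
    // Two public types that only make sense together: a `Handle` is
    // meaningful only in combination with the `Arena` that issued it.
    pub struct Arena {
        items: Vec<String>,
    }

    #[derive(Clone, Copy)]
    pub struct Handle {
        index: usize,
    }

    impl Arena {
        pub fn new() -> Self {
            Arena { items: Vec::new() }
        }

        pub fn insert(&mut self, item: &str) -> Handle {
            self.items.push(item.to_string());
            // Same module, so building a `Handle` from its private field is allowed.
            Handle {
                index: self.items.len() - 1,
            }
        }

        pub fn get(&self, handle: Handle) -> Option<&str> {
            // Reads `handle.index` directly; no accessor is needed inside the module.
            self.items.get(handle.index).map(String::as_str)
        }
    }
}
```

Code outside the module can neither forge a Handle nor inspect its index, yet inside the module there is no artificial wall between the two types.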

Fundamentally, being able to hide the fields of a record from other modules is important, which is why even C supports it. It is even essential for implementing safe abstractions over unsafe features in Rust, such as for collections, where raw pointers have invariants in combination with other fields in the same record. But it is not new to OOP, and it is simply not the best choice for every possible type.

As evidence of this, in Java and Smalltalk, and to a lesser extent even in C++ or Python, the insistence on a one-type-per-class style of encapsulation means that you get these boilerplate methods like setFoo and getFoo. These methods do nothing but serve as field accessors for something that is fundamentally a dumb record type. In theory, this helps you if you want to change what happens when these fields are set or read, but in practice, the fact that they are raw field accessors is part of the contract. If they, for example, instead made a network call rather than just returning a value, that would strongly violate the principle of least surprise for such simply named methods.

It is far simpler to say:

pub struct Point {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

… than the Java idiomatic “JavaBean” equivalent from when I was a Java programmer (Java has apparently changed since then, but this is representative of many OOP programming languages including Smalltalk and many books on how to program):

class Point {
    private double x;
    private double y;
    private double z;

    double getX() {
        return x;
    }

    void setX(double x) {
        this.x = x;
    }

    double getY() {
        return y;
    }

    void setY(double y) {
        this.y = y;
    }

    double getZ() {
        return z;
    }

    void setZ(double z) {
        this.z = z;
    }
}

Such data types generally don’t use any of the other features that OOP classes get, such as polymorphism or inheritance. To use such features in such “JavaBean” classes would also violate the principle of least surprise. The “class” concept is overkill for these record types.

And of course, a Java developer (or Smalltalk, or C#) will say that by accessing the fields indirectly through these getter and setter methods, they are future-proofing the class, in case the design changes (and in fact I was reminded to add this paragraph when someone on Reddit made exactly this point). But I find this disingenuous, or at least misguided – it is often used for structures internal to a portion of the program, where the far more reasonable thing to do would be to change the fields openly to all users of the structure. It is also extremely difficult to think of an unsurprising thing for these methods to do besides literally set or get a field, as the method name implies – making a network call, for example, would be a shocking surprise for a get or set method and therefore a violation of at least the implicit contract. In my time programming object-oriented programming languages, I never once saw a situation where it was appropriate for a getter or setter to do anything but literally get or set the field.

If code does change to require the getter or setter to do something else, I would rather change the name of the method to reflect what else it does than pretend that’s somehow not a breaking change. fetchZFromNetwork or setAndValidateZ seem more appropriate than a getZ or setZ that does something more than the simple field access that we assume a setter or getter does. OOP’s insistence that every type should be its own code abstraction boundary is often absurd when applied to these lightweight aggregate types. These sorts of getters and setters are used to protect an abstraction boundary that shouldn’t exist and just gets in the way, and future-proof against implementation changes that shouldn’t be made without also changing the interface.

Setters and getters, in short, are an anti-pattern. If you intend to create an abstraction besides “data structure,” where validation or network calls or anything else beyond raw field accesses would be appropriate, then these get and set names are the wrong names for that abstraction.
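To make the naming point concrete, here is a minimal Rust sketch. The type FinitePoint and its method set_and_validate_z are hypothetical names invented for this illustration (the method simply borrows the setAndValidateZ name from above); the idea is only that a plain record exposes its fields, while an operation that can fail says so in its name and its return type.

```rust
/// A plain record: public fields, no accessors.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

/// Hypothetical type with an actual invariant: `z` is always finite.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FinitePoint {
    pub x: f64,
    pub y: f64,
    z: f64, // private only because something must be enforced
}

impl FinitePoint {
    /// The name and the `Result` both admit that this is more than a field write.
    pub fn set_and_validate_z(&mut self, z: f64) -> Result<(), String> {
        if z.is_finite() {
            self.z = z;
            Ok(())
        } else {
            Err(format!("z must be finite, got {z}"))
        }
    }
}

fn main() {
    let mut p = Point { x: 1.0, y: 2.0, z: 3.0 };
    p.z = 4.5; // plain field access, nothing hidden behind it
    println!("{p:?}");

    let mut q = FinitePoint { x: 0.0, y: 0.0, z: 0.0 };
    assert!(q.set_and_validate_z(2.0).is_ok());
    assert!(q.set_and_validate_z(f64::NAN).is_err());
    assert_eq!(q.z, 2.0); // the rejected write left the old value in place
    println!("{q:?}");
}
```

A caller of set_and_validate_z has to handle the Result, so there is no way to mistake it for a bare assignment.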

Edit 2023-02-13 to add this paragraph: To be clear, these objections apply to properties as well. It’s not the syntactic inconvenience that I object to, but the entire notion that replacing field accesses with code transparently is a good thing to strive for, or an important possibility to leave open. I should hope that foo.bar = 3 would never make a network call in Rust! And what if it had to be async? It should be clear if I’m calling a function. Rust is about explicitness.

The get and set functions, in reality, are only used as wrappers to satisfy the constraints of object-oriented ideology. The future-proofing they purportedly provide is an illusion. If you provide “JavaBean” style types, or types with properties, over an abstraction boundary, you are in practice just as locked in as if you’d provided raw field access – the changes you are most likely to want to make to those structures would not allow shifting the getters and setters to maintain compatibility. Leveraging this future-proofing is likely to be completely impossible for the changes you’d want to make, and at best it would involve a horrendous hack.

Rust might seem to be the same as OOP languages in all of this; it superficially looks like it has something very similar to classes. You can define functions associated with a given type – and they are even called methods! Like OOP methods, they syntactically privilege taking values of that type (or references to those values) as the first argument, called the special name self. You even mark fields of a record type (called struct in Rust) as public or (by default) private, encouraging private fields just like in an object-oriented programming language.

According to this pillar, Rust seems pretty close to being OOP. And that’s a fair assessment, for this pillar, and an intentional choice to make Rust programming more comfortable to people used to the everyday syntax of OOP programming in C++ (or Java, or JavaScript).

But the similarity is only skin-deep. Encapsulation is the least distinct pillar of OOP (after all, all modern programming languages have some form of it), and the implementation in Rust is not bound with the type. When you declare a field private in Rust (by not specifying pub), that doesn’t mean private to its methods, that means private to the module. A module can provide multiple types, and any function in that module, whether a “method” of that type or not, can access all of the fields defined in that type. Passing around records is encouraged when appropriate, rather than discouraged to the point that accessors are forced instead, even in tightly-bound related code.
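A small sketch of what module-level privacy means in practice. The module geometry and the types Point and Segment are made up for this example; the free function length is not a method of either type, yet it reads the private fields of both, because it lives in the same module.

```rust
mod geometry {
    pub struct Point {
        x: f64, // no `pub`: private to the module `geometry`, not to `Point`
        y: f64,
    }

    pub struct Segment {
        start: Point,
        end: Point,
    }

    impl Segment {
        pub fn new(x0: f64, y0: f64, x1: f64, y1: f64) -> Segment {
            // A `Segment` function builds `Point`s field by field.
            Segment {
                start: Point { x: x0, y: y0 },
                end: Point { x: x1, y: y1 },
            }
        }
    }

    /// A free function, not a method, reading private fields of both types.
    pub fn length(s: &Segment) -> f64 {
        let dx = s.end.x - s.start.x;
        let dy = s.end.y - s.start.y;
        (dx * dx + dy * dy).sqrt()
    }
}

fn main() {
    let s = geometry::Segment::new(0.0, 0.0, 3.0, 4.0);
    println!("{}", geometry::length(&s));
    // Out here, writing `s.start` would be rejected by the compiler:
    // the field is private to `geometry`.
}
```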

This is the first sign we see that Rust, in spite of its superficial syntax, is not an OOP programming language.

Future Posts

And at this point I’m going to have to pause for today.

Of course, encapsulation isn’t the only fancy thing OOP-style classes can do. If it were, classes wouldn’t have enamored so many people: it would simply be obvious to everyone that classes were nothing more than glorified modules, and methods nothing more than glorified procedures.

In the next posts of this series, we will discuss the other features associated with OOP, the two remaining traditional pillars of OOP, polymorphism and inheritance, analyze them from a practical point of view, and see how Rust compares with OOP when it comes to those pillars.

Next up will be polymorphism!

How to Write a JIRA Ticket in ... Relatively Few Steps

2022-10-31T00:00:00+00:00

If you’re confused by how to use JIRA effectively, do not worry! If you learn this process, which is ~~very simple~~ not literally impossible, you too can become ~~good at JIRA~~ ~~passingly competent at JIRA~~ not liable to being fired for being bad at JIRA.

Here are the steps:

Create personal TODO item to write JIRA ticket
- Accumulate requirements for JIRA ticket in personal notes
  - Often more complicated than the feature itself
    - This is the System Working™
- Write TODO items strategizing how to:
  - Share the JIRA ticket with other people
  - Connect it properly with other JIRA tickets
    - Advanced: Also epics, projects, or other meta-JIRA constructs
Write JIRA ticket
- Fail to understand what any of the fields are for
  - Oh, they’re required?
- Ask random people for appropriate values for required fields
  - Sometimes they never get back to you
  - Or they get back two days from then
  - In the meantime, forget you were writing a JIRA ticket
    - And then get reminded only by personal TODO list item
      - You did write one of those, right?
- Curse the names of whoever designed the schema
  - Find out it’s someone you actually liked
    - It made sense at the time
      - No, it cannot be changed now
Do follow up connecting JIRA ticket to other people’s JIRA
- Argue with people about whether JIRA set-up appropriate
- Reconcile said arguments
Relitigate everything at next stand-up meeting
- Potentially go back to beginning to write JIRA ticket again
Be too tired to code anymore
- What even is code?

First Impressions of Asahi Linux

2022-10-24T00:00:00+00:00

I bought my M1 Mac over a year ago with the intention of installing Asahi Linux on it, but I never got around to it until now. I am still thrilled to be using an ARM workstation made by a major computer manufacturer, and it’s good to be able to run the operating system of my choice on it (though macOS is acceptable for entertainment and video calls, Linux is what I work and do my organization in). And I don’t particularly do GPU-intensive things in my day to day computing – I run XMonad, of all things! – so I don’t really feel like I’m missing out by not having a “proper” graphics driver.

Installation

The Asahi Linux installation process, in spite of some dire warnings, was relatively friendly. It was a “wizard” process rather than a series of instructions to run individual commands that I would have to read off a website. Wizard is definitely better, because those instruction series almost always contain mistakes, assumptions of things you’d “obviously” do, or un-fleshed-out untested alternatives; NixOS in particular has stolen from me many hours of frustration I’ll never get back (and hours later of fixing configuration issues that resulted just from me following instructions from official materials).

So, I guess simply because I’m comparing it to NixOS, Asahi Linux felt extremely easy to install! I didn’t even mind that there wasn’t a concrete recommendation for how much space to give each operating system (although I would have appreciated it). The installer did, however, do two things that annoyed me.

The first thing was that it asked me if I wanted to enter some sort of an expert mode. It said that the questions it would ask in that mode were only interesting to developers, and while normally that would be “yes, absolutely me,” in this case I think they meant “developers of Asahi Linux” – so, not me. I wanted to say y out of curiosity, but I didn’t want to actually choose any wrong option and risk bricking my laptop – which I don’t think were the actual stakes, but I wasn’t entirely sure.

I really hope that if I’d said y, it would have been okay. I would hope that the default option in each “advanced” prompt would be the same as what I’d get if I didn’t do advanced options, but I didn’t really trust them to do that, and it was intimidating.

I’d much rather they said what the advanced options actually did, and reassured me that you could always go with the pre-set defaults if you were unsure, rather than just ask me if I wanted to do “expert mode.”

So that was a little annoying.

The second thing that annoyed me was something that the designers have definitely put some thought into, and I’m befuddled how they arrived where they did.

So, there is one point where the computer is turned off, and you must follow the instructions on how to turn the computer back on very carefully and particularly, or else there be dragons, because if you don’t boot it into recovery mode for the first boot, then Linux will never install.

That isn’t the problem. I appreciate them communicating the stakes, and communicating how it works. I’m sure it’s not their fault that you have to do this extra step, but rather something to do with how the M1’s firmware works. However, I am befuddled why they provide the instructions in the most detail on the laptop where you’re currently installing it – you know, a screen that’s immediately going to disappear as soon as you turn the computer off. There were 7 steps!

It appears that I was expected to:

Read all 7 steps very carefully
Memorize them (carefully!) and remember them when I turned the computer back on

Now, I have ADHD, so my short-term prospective memory is very poor. There’s also a high chance that I’ll get distracted while the computer’s off, and will have to come back to the turning-it-on step later. But even a neurotypical person can’t be expected to reliably remember how to do 7 steps carefully.

I took a picture of the instructions with my phone. I think they should have:

Suggested writing down or taking a picture of the instructions, because “careful” is likely not good enough for many people.
Included all 7 instructions in detail on the website, so if you fail to write it down, you get more than this condensed summary:

Once the first stage of the installation is done, you will have to reboot into 1TR mode (One True recoveryOS) in order to finish the install. Read the instructions that the installer prints carefully! Simply rebooting into the new OS won’t work until this is done. You need to fully shut down your machine, then boot by holding down the power button until you see “Entering startup options”, choose your new OS in the boot selector menu, and follow the prompts.

The website references the transient “instructions that the installer prints.” If anything, the installer should direct you to the website, which should give the instructions in equal detail to how the installer gives them.

In any case, what I actually did was panic, close the laptop, panic again, open it again, realize that made it turn on, and hold down the power button – which worked, in spite of blatantly violating the instructions. So maybe warn people not to close and then re-open the laptop, while you’re at it?

… Perhaps it’s moves like this that prevented me from installing NixOS correctly, where they just kind of assume you wouldn’t do something that dumb.

First Boot

I haven’t dual booted a computer since I lived with my parents, when I either had to share a computer with them (my Linux partition and their Windows) or, later, had only one computer that I could use in full privacy but needed both Linux and a more “normal” OS – thus an iBook which ran Mac OS and a PowerPC version of Ubuntu. Even when I ran FreeBSD and other out-there OSes, I had a dedicated (old) full-tower desktop to run them from.

So the idea of dual-booting a “normal” OS that comes with the computer and the more “edgy” programmer-friendly OS that is Linux is quite nostalgic for me. I wondered whether there was any way to refer to macOS with capitalism-criticizing character substitutions a la Mi¢ro$oft: maybe macO$? And to be honest, I was even a little nervous that my IT/sysadmin skills had rotted a little bit since I was a kid. Even though this installer was bending over backwards to make everything easy, this was an alpha operating system unsupported by the workstation vendor.

But all went well.

Once you have it installed, the computer boots into Asahi Linux. You have to hold down the power button to get the boot menu – it uses a firmware-based boot manager to distinguish macOS and Linux. This is a little annoying, as I prefer being asked what operating system I want to boot every time in a dual boot set up, but I can deal with it.

The first boot requires a few remaining set up steps to select keyboard layout, language, and time zone, and also to name the computer and set up a default user. I named the computer protectorate as part of my forms-of-government naming scheme (my Dell laptop is palatinate), and in reference to the fact that this is Linux acting in somewhat foreign territory, claimed by another Unix.

Once this set-up was complete, I turned on Wi-Fi, which to my mild surprise worked immediately and like a charm, from the KDE-based graphical Wi-Fi menu.

I mean, in all honesty, I kind of knew it would work the first time – that was the point – but I was still viscerally surprised. I guess I am used to the idea of getting Linux to run on a “new” or “odd” platform being an issue of chasing down driver after driver, so I’m happy that I have a distribution designed for basically exactly the computer that I have, even if it’s not a computer particularly associated with Linux.

Then, as soon as I’d verified that Linux worked, I very nervously rebooted the whole thing into macOS – which also worked. Yay!

Next Steps

So that’s where I am now.

To get my normal Linux workflow set up, I’m going to need XMonad and Dropbox. This should be interesting, as I understand neither of those things are Arch Linux packages on ARM, and Dropbox isn’t supported on Linux ARM at all (though you can maybe use their APIs directly to implement a janky home version?)

So, when I get that all set up, I will let you know in another post!

Pictures will come with the next blog post.

I make no promises as to schedule.

RAII: Compile-Time Memory Management in C++ and Rust

2022-10-11T00:00:00+00:00

I don’t want you to think of me as a hater of C++. In spite of the fact that I’ve been writing a Rust vs C++ blog series in Rust’s favor (in which this post is the latest installment), I am very aware that Rust as it exists would never have been possible without C++. Like all new technology and science, Rust stands on the shoulders of giants, and many of those giants contributed to C++.

And this makes sense if you think about it. Rust and C++ have very similar goals. The C++ community has done a lot over all these years to pioneer new programming language features in line with those goals. C++ has then given these features years to mature in its humongous ecosystem. And because Rust also doesn’t have to be compatible with C++, it can then steal those features without some of the caveats they come with in C++.

One of the biggest such features – perhaps the biggest one – is RAII, C++’s and now Rust’s (somewhat oddly-named) scope-based feature for resource management. And while RAII is for managing all kinds of resources, its biggest use case is as part of a compile-time alternative to run-time garbage collection and reference counting.

As an alternative to garbage collection, RAII has deficits. While many allocations are created and freed neatly in line with variables coming in and out of scope, sometimes that’s not possible. To fully compete with garbage collection and capture the diverse ways programs use the heap, RAII needs to be combined with other features.

And C++ has done a lot of this. C++ added move semantics in C++11, which Rust also has – though cleaner in Rust because Rust was designed with them from the start and so it can pull off destructive moves. C++ also has opt-in reference counting, which, again, Rust also has.

But C++ still doesn’t have lifetimes (Rust got that from Cyclone, which called them “regions”), nor the infamous borrow checker that goes along with them in Rust. And even though the borrow checker is perhaps the most hated part of Rust, in this post, I will argue that it brings Rust’s RAII-centric compile-time memory management system much closer to feature-parity with run-time reference counting and other run-time garbage-collection technologies.

I will start by talking about the problem that RAII was originally designed to solve. Then, I will re-hash the basics of how RAII works, and work through memory usage patterns where RAII needs to be combined with these other features, especially the borrow checker. Finally, I will discuss the downsides of these memory management techniques, especially performance implications and handling of cyclic data structures.

But before I get into the weeds, I have some important caveats:

Caveat: No Turing-complete programming language can completely prevent memory leaks. Even in fully-GC’d languages, you can still leak memory by filling up a data structure with increasing amounts of unnecessary data. This can be done by accident, especially when sophisticated callback systems are combined with closures. This is out of the scope of this post, which only concerns memory management issues that automated GC can actually help with.

Caveat #2: Rust allows you to leak memory on purpose, even when a garbage collector would have reclaimed it. In extreme circumstances, the reference counting system can be abused to leak memory as well. This fact has been used in anti-Rust rhetoric to imply its memory safety system is somehow worthless.

For the purposes of this post, we assume a programmer who is trying to get actual work done and needs help not leaking memory or causing memory corruption, not an adversarial programmer trying to make the system leak on purpose.
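For concreteness, this is roughly what the deliberate leaks from Caveat #2 look like. It is a sketch only: Node, make_cycle and leak_deliberately are names made up for this example, while Box::leak, Rc and Weak are the standard-library pieces doing the work. The weak handle is there purely so we can observe that the cycle was never freed.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    // Each node may hold a strong reference to another node.
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node reference cycle and returns a weak handle to the first
/// node, so the caller can check whether it was ever freed.
fn make_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    Rc::downgrade(&a)
    // `a` and `b` go out of scope here, but each node still keeps the other alive.
}

/// Leaks on purpose with a single standard-library call.
fn leak_deliberately() -> &'static mut Vec<u8> {
    Box::leak(Box::new(vec![1, 2, 3]))
}

fn main() {
    let observer = make_cycle();
    // The strong count never reached zero, so the node is still allocated.
    assert!(observer.upgrade().is_some());

    let leaked = leak_deliberately();
    leaked.push(4);
    assert_eq!(leaked.len(), 4);
}
```

Neither of these is something you do by accident while trying to get work done, which is the point of the assumption above.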

Caveat #3: RAII is a terrible name. OBRM (Ownership-Based Resource Management) is used in Rust sometimes, and is a much better name. I call it RAII in this article though, because that’s what most people call it, even in Rust.

The Problem: Manual Memory Management is Hard, GC is “Slow”

Caveat: To be clear, “slow” here is an oversimplification, and I address that more later. I mean it as a tongue-in-cheek way of saying that it has performance costs, whereas Rust and C++ try to adhere to a zero-cost principle.

So. C-style manual memory management – “just call free when you’re done with the allocation” – is error prone.

It is error prone when it is easy and tedious, because programmers can make stupid mistakes and just forget to write free and it isn’t immediately broken. It is error prone when multiple programmers work together, because they might make different assumptions about who is supposed to free something. It is error prone when multiple parts of the code need to use the same data, especially when that usage changes with new requirements and new features.

And the consequences of doing it wrong are not just memory leaks. Use-after-free can lead to memory corruption, and bugs in one part of the program can abruptly show up when allocation patterns change somewhere else entirely.

This is a problem that can be solved with discipline, but like many tedious clerical disciplines, it can also be solved by computer.

It can be solved at run-time, which is what garbage collection and reference counting do. These systems do two things:

They keep allocations from lasting too long. When memory becomes unreachable, it can be reclaimed. This prevents memory leaks.
They keep allocations from being freed early. If memory is still reachable, it will still be valid. This prevents memory corruption.

And for most programmers and applications, this is good enough. And so for almost all modern programming languages, this run-time cost is well worth not troubling the programmer with the error-prone tedious tasks of C-style manual memory management, enabling memory safety and resource efficiency at the same time.

GC (including RC) Has Costs

But there are costs to having the computer do memory management at run-time.

I lump mark-sweep garbage collection and reference counting together here. Both mark-sweep garbage collection and reference counting have costs above C-style manual memory management that make them unacceptable according to the zero-cost principle. GC comes with pauses, and additional threads, in the best case. RC comes with myriad increments and decrements to a reference count. These costs might be small enough to be okay for your application – and that’s well and good – but they are costs, and therefore they can’t be the main memory management model in C++ or Rust.
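Those increments and decrements are easy to see with Rust's opt-in reference-counted pointer. This is a small sketch (observe_counts is a name invented for the example); it uses only Rc::clone, drop and Rc::strong_count from the standard library.

```rust
use std::rc::Rc;

/// Returns the strong count observed at three points: after creation,
/// after a clone, and after that clone is dropped.
fn observe_counts() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let at_creation = Rc::strong_count(&first);

    let second = Rc::clone(&first); // run-time increment
    let after_clone = Rc::strong_count(&first);

    drop(second); // run-time decrement, plus a check for zero
    let after_drop = Rc::strong_count(&first);

    (at_creation, after_clone, after_drop)
}

fn main() {
    assert_eq!(observe_counts(), (1, 2, 1));
}
```

Each of those counter updates is work that a plain owned String would not do.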

This is a complicated issue, and so before continuing, here comes another caveat:

Caveat: GC is not necessarily slower, but it does have performance implications that are often unacceptable for situations where C++ (or Rust) is used. To achieve its full performance, it needs to be enabled for the entire heap, and that has costs associated with it. For these reasons, C++ and Rust do not use GC. The details of these performance trade-offs are beyond the scope of this blog post.

A Dilemma

But C++ and Rust are not most programming languages. They face a dilemma:

On the one hand, manual memory management is unacceptably error prone for a high level language, a detail the computer should be able to handle for you.
On the other hand, run-time garbage collection violates a fundamental goal that C++ and Rust share: the zero-cost principle. Code written in these languages is supposed to be as performant as the equivalent manually-written C. To conform to that principle, reference counting (or GC) has to be opt-in (because, after all, sometimes manually written C code does use these technologies).

So, for the vast majority of situations, where a C programmer wouldn’t use reference counting (or mark-sweep), Rust and C++ need something more sophisticated. They need tools to prevent memory management mistakes – that is, to at least partially automate this tedious and error-prone task – without sacrificing any run-time performance.

And this is the reason C++ invented (and Rust appropriated) RAII. Instead of addressing the problem at run-time, RAII automates memory management at compile-time. Analogous to how templates and trait monomorphization can bring some but not all of the power of polymorphism without many of the run-time costs, RAII brings some but not all of the power of garbage collection without constant reference count updates or GC pauses.

But as we will see, RAII as C++ implements it only solves one of the two problems addressed by garbage collection: leaks. It cannot address memory corruption; it cannot keep allocations alive long enough for all the code that could possibly need to use them.

Raw RAII: How RAII Works on its Own

The simplest use case for RAII is underwhelming: it automatically inserts calls to free up heap allocations at the end of the block where we made the allocation. It replaces a malloc/free sandwich from C with simply the allocation side, by inserting an implicit (and unwritten) call to a destructor, which in its simplest version is an equivalent of free. And if that was all RAII did, it wouldn’t be that interesting.

For example, take this C-style (no RAII) code:

void print_int_little_endian_decimal(int foo) {
    // Little endian decimal print of `foo`
    // i.e. backwards from how we normally write decimal numbers
    // e.g. 831 prints out as "138"

    // Big endian would be too hard
    // Little endian is as always actually simpler platonically,
    // if somehow not for humans.

    // Yes, this only works for positive ints. It's an example.

    char *buffer = malloc(11);
    for(char *it = buffer; it < buffer + 10; ++it) {
        *it = '0' + foo % 10;
        foo /= 10;
        if (foo == 0) {
            it[1] = '\0';
            break;
        }
    }
    puts(buffer); // put-string, not the 3sg verb form "puts"
    free(buffer); // Don't forget to do this!
}

Just using RAII (and unique_ptrs, which are an essential part of the RAII model), but using no other features of C++, we get this very unidiomatic and unimpressive version:

void print_int_little_endian_decimal(int foo) {
    std::unique_ptr<char[]> buffer{new char[11]};
    for(char *it = &buffer[0]; it < &buffer[10]; ++it) {
        *it = '0' + foo % 10;
        foo /= 10;
        if (foo == 0) {
            it[1] = '\0';
            break;
        }
    }
    puts(&buffer[0]);
}

It doesn’t help us with our random guess of an appropriate buffer size, our awkward redundant attempts to avoid a buffer-overflow, or with any abstraction over the fact that we’re trying to implement a collection.

In fact, it makes the code more awkward, for a benefit that seems hardly worth it, to just automatically call free at the end of the block – which might not even be where we want to call free! We could instead have wanted to return the data to the caller, or insert it into a bigger, greater data structure, or similar.

It’s a bit less ugly when you use C++’s abstractions. Destructors don’t have to just call free (or rather its C++ analogue delete) as unique_ptr’s does. Any C programmer can tell you that idiomatic C code is rife with custom free functions to free all of the allocations of a data structure, and C++ (and Rust) will choose which destructor to call for you based on the type of the data. Calling free when a custom destructor must be called is a common careless mistake in C. This is true especially among beginners, and (hot take!) making programming languages less needlessly tricky for beginners is a good thing for everybody.

We can combine RAII with other features of C++ to get this more idiomatic code, with the first do-while loop I’ve written in years:

void print_int_little_endian_decimal(int foo) {
    std::string res;
    do {
        res += '0' + foo % 10;
        foo /= 10;
    } while (foo != 0);
    std::cout << res << std::endl;
}

Does std::string allocate memory on the heap? Maybe it only does if the string goes above a certain size. But the custom destructor, ~std::string, will call delete[] only when the allocation was actually made, abstracting that question away, along with handling terminating nuls and avoiding overruns in a cleaner way.

This ability of RAII – to call custom destructors that abstract away allocation decisions – gets more impressive when we consider that many data structures don’t make just 0 or 1 heap allocations, but whole complicated trees of complicated heap allocations. In many cases, C++ (and Rust) will write your destructors for you, even for complicated types like this:

struct PersonRecord {
    std::string name;
    uint64_t salary;
};

std::unordered_map<std::string, std::vector<PersonRecord>> thing;

To destroy thing in C, you’d have to loop through the hash map, free all the keys, and then free all the values, which then requires freeing all the strings in each PersonRecord before freeing the backing for each vector. Only then could you free the actual allocations backing the hash map.

And perhaps a C-based hash map library could do this for you, but only by assuming that the keys are strings, and then taking a function pointer to know how to free the values, which would ironically be a form of dynamic polymorphism and therefore a performance hit. And the function to free the values would then still have to manually free the string, knowing which field of the PersonRecord was a pointer and duplicating that information between the structure and the manually-written “free” function, and still likely not supporting the small-string optimization that C++ enables.

In C++, this freeing code is all automatically generated. PersonRecord gets an automatic destructor that calls the destructor of each field (uint64_t’s destructor is trivial), and the destructors of std::unordered_map and std::vector are templated so that, at compile time, a fresh destructor is built from those templates that handles all of this, all without any indirect function calls or run-time cost beyond what would be written manually for exactly this data structure in C.

See, with RAII, a destructor isn’t just automatically and implicitly called at the end of a scope in a function, but also in the destructors of values (“objects” in C++) that own other values. Even if you do write a custom destructor for aggregate types, that just specifies what the computer should do on destruction beyond the automatic calls to the destructors of the fields, which are still implicit.
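The same mechanism can be sketched in Rust, where the destructor is spelled as an implementation of the Drop trait. Noisy, Pair and run are names made up for this example, and the shared log exists only so the order of destructor calls can be recorded; Pair has no hand-written destructor at all.

```rust
use std::cell::RefCell;

/// A value whose destructor records that it ran.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// An aggregate with no destructor of its own; its fields are still destroyed.
struct Pair<'a> {
    _left: Noisy<'a>,
    _right: Noisy<'a>,
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _pair = Pair {
            _left: Noisy { name: "left", log: &log },
            _right: Noisy { name: "right", log: &log },
        };
        log.borrow_mut().push("end of block".to_string());
    } // `_pair` goes out of scope here; the compiler inserts the destructor calls
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{line}");
    }
}
```

Nothing in run mentions cleanup: the calls at the closing brace, and the calls for each field of Pair, are all implicit.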

Ownership and its limitations

This is all possible based on the concept of “ownership,” one of the key principles of RAII. The key assumption is that every allocation has one owner at any given time. Allocations can own each other (forming a tree of allocations), or a scope can own an allocation (forming the root of such a tree). RAII then can make sure the allocation ends when its owner does – by the scope exiting, or when the owning object is destroyed.

But what if the allocation needs to outlive its parent, or its scope? It’s not always the case that a function has primitive types as its arguments and return value, and then only constructs trees of allocations privately. We need to take these sophisticated collections and pass them as arguments to functions. We need to have them be returned from functions.

This becomes apparent if we try to refactor our little-endian integer decimalizer to allow us to do other things with the resultant string besides print it:

std::string render_int_little_endian_decimal(int foo) {
    std::string res;
    do {
        res += '0' + foo % 10;
        foo /= 10;
    } while (foo != 0);
    return res;
}

int main() {
    std::cout << render_int_little_endian_decimal(3781) << std::endl;
    return 0;
}

Based on our previous discussion of RAII, you might assume that the ~std::string destructor is called at the end of its scope, rendering the allocation unusable for later printing, but instead this code “Just Works.”

We’ve hit one of many mitigations against the limitations of raw RAII that are necessary for it to work. This mitigation is the “Named Return Value Optimization (NRVO),” which stipulates that if a named variable is used in all of the return statements in a function, it is actually constructed (and destructed) in the context of the caller. It is misnamed an “optimization” because it’s actually part of the semantics: It eliminates entirely the call to the destructor at the end of the scope, even if that destructor call would have side effects.

This is just one of many ways RAII is made competitive with run-time garbage collection, and we can have values that live outside of a certain scope of a function. This one is narrow and peculiar to C++, but many of the others lead to interesting comparisons. In the next section, we discuss the others.
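For comparison, here is a rough Rust rendition of the same function. It is a sketch, not a line-for-line port: I use u32 to sidestep the negative-number caveat. In Rust, returning a local value hands its ownership to the caller, so the String's heap buffer simply outlives the function's scope.

```rust
/// Renders `foo` in decimal with the least significant digit first.
fn render_int_little_endian_decimal(mut foo: u32) -> String {
    let mut res = String::new();
    loop {
        // `foo % 10` is always in 0..=9, so the cast to u8 cannot truncate.
        res.push(char::from(b'0' + (foo % 10) as u8));
        foo /= 10;
        if foo == 0 {
            break;
        }
    }
    res // ownership of the buffer moves to the caller; it is not freed here
}

fn main() {
    let s = render_int_little_endian_decimal(3781);
    println!("{s}");
} // `s` is dropped here, in the caller, and the buffer is freed
```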

Filling the Gaps in RAII

Copying/Cloning

We’re going to start with one of the oldest of these: copying. When C++ was designed, the intention was that the programmer would not see a difference between types that don’t involve allocation (like int or double) and types that do (like std::string or std::unordered_map<std::string, std::vector<std::string>>).

When a function takes an int argument, as in print_int_little_endian_decimal, that integer is copied. Similarly, if we take a std::string argument without additional annotation, C++ will also make a copy:

int parse_int_le(std::string foo) {
    int res = 0;
    int pos = 1;
    for (char c: foo) {
        res += (c - '0') * pos; // No input validation -- example!
        pos *= 10;
    }
    return res;
}

int main(int argc, char **argv) {
    std::string s = argv[1];
    std::cout << parse_int_le(s) << std::endl;
    return 0;
}

This is indeed consistent. Treating ints and std::string objects in parallel ways is also in line with how higher-level programming languages sometimes work: a string is a value, an int is a value, why not give them the same semantics? Aliasing is confusing, why not avoid it with copying?

It’s made to work by an implicit function call. Just like destructor calls are implicit in C++, copying also calls a function in the type’s implementation. Here, it calls std::string’s “copy constructor.”

The problem here is that this is slow. Not only is an unnecessary copy made, but an unnecessary allocation and deallocation creep in. There is no reason not to use the same allocation the caller already has, here in s from the main function. A C programmer would never write this copying version.

The only reason this feature is allowed under C++’s zero-cost principle is that it is optional. It may be the default – and making it the default is one of the most questionable decisions C++ ever made – but we can still alias if we want to. It just takes more work.

Rust, as you can guess by my tone, requires explicit annotation to copy types that have an allocation. In fact, Rust doesn’t even use the term “copy,” which is reserved for types that can be copied without allocations. It calls this cloning, and requires use of the clone() method to accomplish it.

Some types don’t use an allocation, and “copying” them is just a simple memory copy. Some types do use an allocation, and “cloning” them requires allocating. This distinction is important and fundamental to how computers work. It’s relevant and visible in Java and even Python, and pretending it doesn’t exist is unbecoming for a systems programming language like C++.
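To make the copy/clone distinction concrete, here is a minimal Rust sketch. The function names (consume, double, demo) are made up for illustration and are not from any library. The integer is copied implicitly, while the String has to be cloned explicitly if we want to keep using the original:

```rust
/// Takes ownership of a heap-allocated String and reports its length.
fn consume(s: String) -> usize {
    s.len()
}

/// Takes an integer by value; i32 is `Copy`, so the caller keeps its own.
fn double(n: i32) -> i32 {
    n * 2
}

fn demo() -> (i32, i32, usize, usize) {
    let n = 21;
    let doubled = double(n); // implicit bitwise copy; `n` is still usable

    let s = String::from("1873");
    let len_of_clone = consume(s.clone()); // explicit: allocates a second buffer
    let len_of_original = consume(s); // moves `s`; no new allocation
    // consume(s); // would not compile: `s` was moved on the previous line

    (n, doubled, len_of_clone, len_of_original)
}
```

The allocation only happens where the word clone appears in the source, which is the whole point: the expensive operation is visible at the call site.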

Moves

Returning an allocation from a function can’t always use NRVO. So if you want your value to outlast your function, but it’s created inside the function (and therefore “owned” by the function scope), what you really need is a way for the value to change owners. You need to be able to move the value from the scope into the caller’s scope. Similarly, if you have a value in a vector, and need to remove the last value, you can move it.

This is distinct from copying, because, well, no copy is made – the allocation just stays the same. The allocation is “moved” because the previous scope no longer has responsibility for destroying the allocation, and the new scope gains the responsibility.

Move semantics fix the most serious issue with RAII: your allocation might not live exactly as long as its owner. The root of an allocation tree might outlive the stack-based scope it’s in, such as when you want to return a collection from a function. The other nodes of an allocation tree might leave that tree and be owned by another stack frame, or by another part of the same allocation tree, or by a different allocation tree. In general, “each allocation has a unique owner” becomes “each allocation has a unique owner at any given time,” which is much more flexible.

In Rust, this is done via “destructive moves,” which oddly enough means not calling the destructor on the moved-from value. In fact, the moved-from value ceases to be a value when it’s moved from, and accessing that variable is no longer permitted. The destructor is then called as normal in the place where the value is moved to. This is tracked statically at compile-time in the vast majority of situations, and when it cannot be, an extra boolean is inserted as a “drop flag” (“drop” is how Rust refers to its destructors).
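As a rough sketch of what that looks like in practice, the following Rust code logs when a destructor runs. The Noisy type and the make and demo functions are hypothetical names invented for this example. The value is moved out of make and then moved again between two variables, yet the destructor runs exactly once, at the final owner:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records its own destruction in a shared log.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make(log: &Rc<RefCell<Vec<String>>>) -> Noisy {
    let n = Noisy { name: "a", log: Rc::clone(log) };
    log.borrow_mut().push("leaving make".to_string());
    n // moved out to the caller: no destructor runs here
}

fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let first = make(&log);
        let second = first; // move: `first` is no longer a usable value
        // println!("{}", first.name); // would not compile: use of moved value
        log.borrow_mut()
            .push(format!("end of inner scope, holding {}", second.name));
    } // `second` is dropped here, exactly once
    let entries = log.borrow().clone();
    entries
}
```

Calling demo() yields the entries "leaving make", "end of inner scope, holding a", and "drop a", in that order.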

C++ didn’t add move semantics until C++11; it was not part of the original RAII scheme. This is surprising given how essential moves are to RAII. Returning collections from functions is super important, and you can’t copy every time. But before C++11, there were only poor man’s special cases for move, like NRVO and the related RVO for objects constructed in the return statement itself. These have completely different semantics than C++ move semantics – they’re still more efficient than C++ moves in many cases.

When C++ did eventually add moves, the other established semantics of C++ forced it to add moves in a weird and deeply confusing way: it added “non-destructive” moves. In C++, rather than the drop flag being a flag inserted by the compiler, it is internal to the value. Every type that supports moves must have a special “empty state,” because the destructor is called on the moved-from value. If the allocation had moved to another value, there would be no allocation to free, and this had to be handled by the destructor at run-time, which can amount to a violation of the zero-cost principle in some situations.

C++ justifies this by making moves a special case of copy. Moves are said to be like copies, but make no promises of preserving the initial value. In exchange, you might get the optimization of being able to use the original allocation, but then the initial value will not have an allocation, and will be forced to be different. This definition is very different from what moves are actually used for (cf. the name of the operation). So even though the definition is technically simple, it is disingenuous to claim, as Herb Sutter does, that focusing on it will simplify things for the programmer, as I discuss in my post on move semantics.

In practice, this means that all types support the operation of moving – even ints – but even some types that manage an allocation might fall back on copying if moves haven’t been implemented for them. This inconsistency, like all inconsistencies, is bad for programmers.

In practice, this also means that moved-from objects are a problem. A moved-from object might stay the same, if no moving was done. It might also change in value, if the move caused an allocation (or other resource) to move into the new object. This forces C++ smart pointers to choose between movability and non-nullability – no movable, non-nullable pointer is possible in C++. Nulls – and the other “moved-from” empty collections that you get from C++ move semantics – can then be referenced later on in the function, and though they must be “valid” values of the object, they are probably not the values you expect, and in the case of null pointers, they are famously difficult values to reason about.

This is a consequence of the fact that C++ was a pioneer of RAII semantics, and didn’t design RAII and moves together from the start. Rust has the advantage of having included moves from the beginning, and so Rust move semantics are much cleaner.

In Rust also, all types can be moved. But in Rust, no resources or allocations are ever copied. Instead, moves always have the same implementation: copy the memory that is stored in-line in the value itself, and then do not call the destructor. For copyable types like int that do not manage an allocation or other resource, this does amount to a copy, but the original is still not usable. But no allocation or resource is ever copied; for those types, the pointer or handle is simply brought along bit-by-bit just like other data, and the old value is never touched again, making this a safe operation.

All types must then be written in such a way to assume that values might not stay in the same place in memory. If some operations on a type can’t be written that way, they can be defined on “pinned” versions of that type. A pin is a type of reference or box that promises that the pointed-to value will never move again. The underlying type is still movable, but these particular values are not.
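Here is a small, hedged sketch of pinning using the standard library’s Pin and PhantomPinned. The Anchored type and its address method are hypothetical, made up for this example. The method is only callable on the pinned form, and moving the Pin<Box<...>> handle around does not move the value it points to:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that opts out of `Unpin`, so pinning it is meaningful.
struct Anchored {
    value: u32,
    _pin: PhantomPinned,
}

impl Anchored {
    /// An operation defined only on the pinned form of the type.
    fn address(self: Pin<&Self>) -> usize {
        // Relies on the promise that the pointed-to value will not move.
        self.get_ref() as *const Anchored as usize
    }
}

fn demo() -> (u32, bool) {
    let pinned: Pin<Box<Anchored>> = Box::pin(Anchored {
        value: 7,
        _pin: PhantomPinned,
    });
    let before = pinned.as_ref().address();
    let moved_handle = pinned; // moves the handle, not the heap value
    let after = moved_handle.as_ref().address();
    (moved_handle.value, before == after)
}
```

Getting a plain &mut Anchored back out of the pin, which would allow moving the value, is not possible in safe code for a type like this.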

This is a gnarly exception to Rust’s “all types can be moved” rule that makes it false in practice, though still true in pedantic, language-lawyery theory. But that’s not important. What is important is that Rust’s move semantics are consistent, and do not rely on move constructors and manual implementations of Rust’s drop flags within the object. The dangerous possibility of interacting with a moved-from object, whose value is unpredictable and quite possibly a special “empty” state like null, is not present in Rust.

Borrows in Rust

While moves cover returning a collection (or other resource-managing value) from a function, they don’t cover passing such a value into a function, or at least not in the general case. Sometimes, when we pass a value into a function, we want to move the value in, so that the function can consume it or add it to an allocation tree (like inserting into a collection). But most times, we want the function to be able to see and perhaps mutate it, but then we want to give it back to the owner.

Enter the borrow.

In Rust, borrows are commonly introduced as a sort of an improvement on moves. Consider our example function that parses a string to an int, here implemented in C++ with copies:

int parse_int_le(std::string foo) {
    int res = 0;
    int pos = 1;
    for (char c: foo) {
        res += (c - '0') * pos; // No input validation -- example!
        pos *= 10;
    }
    return res;
}

Here is a Rust version, with moves, so that the function consumes the string:

use std::env::args;

fn parse_int_le(foo: String) -> u32 {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    res
}

fn main() {
    let mut args: Vec<String> = args().collect();
    println!("{}", parse_int_le(args.remove(1)));
}

As we can see with the “move” version of this, we are in the awkward position of removing the string from the vector, so that parse_int_le can consume the string, so it doesn’t have multiple owners.

But parse_int_le doesn’t need to own the string. In fact, it could be written so that it can give the string back when it’s done:

fn parse_int_le(foo: String) -> (u32, String) {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    (res, foo)
}

“Taking temporary ownership” in real life is also known as borrowing, and Rust has such a feature built-in. It is more powerful than the above code that literally takes temporary ownership, though. That code would have to remove the string from the vector and then put it back – which is even more inefficient than just removing it. Rust borrowing allows you to borrow it even while it’s inside the vector, and it stays inside the vector. This is implemented by a Rust reference, which has these borrowing semantics, and is, like most “references,” implemented as a pointer at the machine level.

In order to accomplish these semantics, Rust has its infamous borrow checker. While we are borrowing something inside the vector, we can’t simultaneously be mutating the vector, which could cause the thing we’re borrowing to move. Rust statically ensures that this is impossible, rejecting code that uses a reference after a mutation, destruction, or move somewhere else would invalidate it.
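A minimal sketch of that rule, with hypothetical function names chosen for illustration: the borrow of the first element is fine as long as its last use comes before the vector is mutated. Uncommenting the marked line would keep the reference alive across the push, and the compiler rejects that version:

```rust
/// Borrows an element inside the vector, finishes with it, then mutates the vector.
fn first_len_then_push(words: &mut Vec<String>) -> usize {
    let first: &String = &words[0]; // borrow of a value that stays in the vector
    let len = first.len(); // last use of `first`, so the borrow ends here
    words.push(String::from("another")); // fine: nothing borrows from `words` now
    // println!("{}", first); // would not compile: `first` might dangle after the push
    len
}

fn demo() -> (usize, usize) {
    let mut words = vec![String::from("1873")];
    let len = first_len_then_push(&mut words);
    (len, words.len())
}
```

The push may reallocate the vector’s storage, which is exactly the kind of move-out-from-under-you that a reference must not survive.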

This enables us to extend the RAII-based system and both prevent leaks and maintain safety, just like a GC or RC-based system. The borrow checker is essential to doing so.

For completeness, here is the idiomatic way to handle the parameter in parse_int_le, with an actual borrow, using &str, the special borrowed form of String that also allows slices:

use std::env::args;

fn parse_int_le(foo: &str) -> u32 {
    let mut res = 0;
    let mut pos = 1;
    for c in foo.chars() {
        res += (c as u32 - '0' as u32) * pos;
        pos *= 10;
    }
    res
}

fn main() {
    let args: Vec<String> = args().collect();
    println!("{}", parse_int_le(&args[1]));
}

Dodging memory safety in C++

In C++, of course, there is no borrow checker. In the parse_int_le example, it’s still possible to use a pointer, or a reference, but then you’re on your own. When RAII-based code frees your allocation, your reference is invalidated, which means it’s undefined behavior to use it. No coordination is performed by the compiler between the RAII/move system and your references, which point into the ownership tree with no guarantee that said tree won’t move underneath it. This can lead to memory corruption bugs, with security implications.

It’s not just pointers and references. Other types that contain references, such as iterators, can also be invalidated. Sometimes those are more insidious because intermediate C++ programmers might know about pointer invalidation, but let their guard down with iterators. If you add to a vector while looping through it, you’ve just invoked undefined behavior, and that’s surprising because no pointers or references even have to show up. Rust’s borrow checker handles these as well.
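For comparison, here is roughly how the add-while-looping case plays out in Rust. This is a sketch with made-up function names; the commented-out loop is the direct translation, which the borrow checker rejects, and the code below it is one of several possible ways to restructure it:

```rust
/// Appends a copy of every even element to the end of the vector.
fn duplicate_evens(v: &mut Vec<i32>) {
    // The direct translation is rejected at compile time, because the
    // iterator borrows `v` for the whole loop while `push` needs to mutate it:
    //
    // for x in v.iter() {
    //     if x % 2 == 0 {
    //         v.push(*x);
    //     }
    // }

    // Collect first, so the iterator's borrow ends before the mutation starts.
    let evens: Vec<i32> = v.iter().copied().filter(|x| x % 2 == 0).collect();
    v.extend(evens);
}

fn demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3, 4];
    duplicate_evens(&mut v);
    v
}
```

Here demo() returns [1, 2, 3, 4, 2, 4].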

Even though the Rust borrow checker gets a bad reputation, its safety guarantees often make it worth it. It’s hard to write correct C++ when references and non-owning pointers are involved. Maybe some of you have that skill, and are unsympathetic to those who don’t yet have it, but it is a specialized skill, and the compiler can do a lot of the work for you, by checking your work. Automation is a good thing, and so is making systems programming more accessible to beginners.

And of course, many C++ programmers do make mistakes. Even if it’s not you, it might be one of your colleagues, and then you’ll have to clean up the mess. Rust addresses this, and limits this more difficult mode of thinking to writing unsafe code, which can be contained in modules.

Multiple Ownership

In RAII, an allocation has one owner at a time, and if your owner is destroyed before the allocation is moved to another owner, the allocation must be destroyed along with it.

Of course, sometimes this isn’t how your allocations work. Sometimes they need to live until both of two parent allocations are destroyed, and sometimes there is no way to predict which parent is destroyed first. Sometimes, the only way to solve that situation – even in C – is to use runtime information – and so you can model multiple ownership through reference counting: std::shared_ptr in C++, or Rc and Arc in Rust (depending on whether it is shared between multiple threads).

This is something that C programmers will sometimes do in the face of complicated allocation DAGs, and end up implementing bespoke on a framework-by-framework basis (cf. GTK+ and other C GUI frameworks). C++ and Rust are just standardizing the implementation of this, but, in line with the zero-cost rule, making it optional.

Interestingly enough, reference counting is implemented in terms of RAII and moves. The destructor for a reference-counted pointer decreases the reference count, and cloning/copying such a pointer increases it. Moves, of course, don’t change it at all.
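This is easy to observe in Rust, where Rc::strong_count reports the current count. A small sketch (the demo function is just a name for this example):

```rust
use std::rc::Rc;

/// Returns the strong count after creation, after a clone, after a move, and after a drop.
fn demo() -> (usize, usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let after_new = Rc::strong_count(&a);

    let b = Rc::clone(&a); // cloning increments the count
    let after_clone = Rc::strong_count(&a);

    let c = b; // moving a handle leaves the count unchanged
    let after_move = Rc::strong_count(&a);

    drop(c); // the handle's destructor decrements the count
    let after_drop = Rc::strong_count(&a);

    (after_new, after_clone, after_move, after_drop)
}
```

The counts come out as 1, 2, 2, and 1.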

RAII+: What this all adds up to

Between RAII, moves, reference counting, and the borrow checker, we now have the memory management system of safe Rust. Safe Rust is a powerful programming language, and in it, you can write programs almost as easily as in a traditionally GC’d programming language like Java, but get the performance of manually written, manually memory managed C.

The cost is annotation. In Java, there is no distinction between “borrowing” and “owning”, even though sometimes the code follows similar structures as if there were. In Rust, the compiler must be informed about the chain of owners, and about borrowers. Every time an allocation crosses scope boundaries or is referred to inside another allocation, you must write different syntax to tell Rust whether it’s a move or a borrow, and it must comply with the rules of the borrow checker.

But it turns out most code has a natural progression of owners, and most borrows are valid in the borrow checker. When they’re not, it’s usually straight-forward to rethink the code so that it can work that way, and the resultant code is usually cleaner anyway. And in situations where neither of them work, reference counting is still an option.

At the cost of this annotation, Rust gives you everything a GC does: Allocations are freed when their handles go out of scope, and memory safety is still guaranteed, because the annotations are checked. Memory leaks are as difficult as in a reference counting language, and since the compiler verifies the annotations, you keep most of the benefit of automating them. It’s an excellent happy medium between manual memory management and full run-time GC with no run-time cost over a certain discipline of C memory management.

Of course, other disciplines of C memory management are possible. And using this Rust system takes away flexibility that might be relevant to performance. Rust, like C++, allows you to sidestep the “compile-time GC” and use raw pointers, and that can often be better for performance. A recent blog post I read explores some of that in more detail; encouragingly, that blog post also considers RAII to be in-between manual memory management and run-time GC – serendipitously, because I had already drafted much of this post when it came out.

But the standard memory management tools of Rust cover the common cases well, and unsafe is available for when it’s inappropriate – and can be wrapped in abstractions for interfacing with code that uses the RAII-based system.

In C++, the “borrows” vs “moves” annotations are not checked by the compiler, so getting them wrong can easily result in undefined behavior. Leaks are prevented, but memory corruption is not. So the C++ system is a much worse replacement for garbage collection – RAII is only doing some of its job, as it is not paired with a borrow checker.

Cycles

I leave the most awkward topic for the end. We’ve talked about allocation trees and DAGs, but not general graphs. These require unsafe in Rust, even something as supposedly basic as doubly linked lists. It’s against the borrow checker’s rules, and the compiler will statically prevent you from making them using safe, borrowing references. They simply aren’t borrows in the Rust sense, but are rather something else, something about which Rust doesn’t know how to guarantee safety.

This is not as bad as you might think, because cycles also form a hole in reference counting, which is a popular run-time GC system. This is why you can’t use Rc or Arc to implement a doubly-linked list correctly in Rust either: You’ll get past the borrow checker and guarantee a memory leak. These systems generally can’t detect cycles at all, and leak them, which is arguably worse than forbidding them to be created.
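The leak is straightforward to reproduce. In this sketch, Node, new_node, and drops_after_scope are hypothetical names invented for the example, and each node counts its own destruction. Two nodes linked in one direction are both destroyed when their handles go out of scope; once the second link closes the cycle, neither destructor ever runs:

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical node with one strong link and a shared counter of destructor runs.
struct Node {
    other: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<u32>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn new_node(drops: &Rc<Cell<u32>>) -> Rc<Node> {
    Rc::new(Node {
        other: RefCell::new(None),
        drops: Rc::clone(drops),
    })
}

/// Returns how many of the two nodes were destroyed once all handles were gone.
fn drops_after_scope(make_cycle: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = new_node(&drops);
        let b = new_node(&drops);
        *a.other.borrow_mut() = Some(Rc::clone(&b)); // a -> b
        if make_cycle {
            *b.other.borrow_mut() = Some(Rc::clone(&a)); // b -> a closes the cycle
        }
    } // both local handles go out of scope here
    drops.get()
}
```

Without the cycle, drops_after_scope(false) reports 2. With it, drops_after_scope(true) reports 0: each node keeps the other’s count above zero, so both allocations are leaked.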

In any case, the unsafe keyword is not poison. For things that Rust doesn’t know how to keep safe, you need to exercise extra responsibility, but at least the programming language is making you aware of it – unlike C++, which is unsafe all the time.

Write Everything Down (Part 3): My Personal Organizational System

2022-10-06T00:00:00+00:00

As promised in my previous posts about organization, I will now go into some detail about my own organizational system. But before I start talking about it, and how I came to develop it, I’d like to emphasize a few points, or more specifically, three caveats, lest Zeus strike me down with a thunderbolt for my hubris:

Caveat the First: My system is a work in progress. Even though it is overall very helpful, it’s always falling apart a little bit. Some parts of it work better than others, and it’s constantly evolving as I try to shore up the parts that fall apart more easily. Sometimes, it’s in a better state than others.
Caveat the Second: What works for me might well not work for you, dear reader. I reckon you and I have very different brains. Even if a psychiatrist would categorize me and you with all the same formally recognized traits, we still have literally different brains, and literally different histories, cultural backgrounds, and personal struggles.
Caveat the Third: Nothing in this system is particularly novel. It is however very tweaked to my own personality. I present this not to claim that I’ve developed anything new, but as a worked example of applying existing practices to my own life, in hopes that it will be useful to you.

And it is indeed a very personal system and a continuously evolving system. I am sensitive to minor issues. If a TODO list system is insufficiently ergonomic for me, I’ll get overwhelmed by it, or intimidated by it, disheartened, blocked out by my personal “Wall of Awful”, and I will default to not using any organizational system at all, and simply relying on my natural faculties – my naturally poor prospective memory – to make sure I do the things I need to do.

This has predictably terrible results, so I keep trying to use the system, but it involves a lot of tweaking, a lot of tricking myself, occasionally changing up the system in some ways, not necessarily because the improvements help – though sometimes they do – but also because changing the system makes it more interesting and more engaging and gives me another reason to look at the TODO items as I re-sort them into new categories rather than just idly reading through them.

Some parts of the system also work better than others, and so sometimes I’ll just use the parts of the system that work the best for me on their own for a few days, and the other parts of the system, the parts that work less well, can take a few days off before I am up to using them again. And that’s OK – that’s part of the design. It’s sort of become part of the organizational pattern – making a task out of re-organization.

So with these caveats out of the way, let’s begin the tour. Each section will go over a feature of the organizational system, describe where I got the idea from, and discuss how it fits into my dynamic.

E-Mail: Let me write that down real quick

TODO tasks can come at any time, in any situation. Whether you abruptly remember something that you have to do, but didn’t write down, or someone tries to create a plan with you where you have to check the calendar, or you get an e-mail you have to respond to or else lose an important account, TODO tasks are not things that happen when you have your computer out and are ready to use your Fully Realized Personal Organizational System (TM).

This is perhaps the most important part of any TODO system: How do new tasks enter it? For those who use apps with phone versions, this might be that you actually put the task where it “actually goes” right away. But in my experience, this is too tedious. The version of the task when fully considered and put in the appropriate spot is different from the version you got it in, and converting it, and finding the right spot, is itself a mini-task you might be tempted to procrastinate. And God forbid you decide that, really, it should be in a completely new category, and other things should be moved to that category too!

No, new tasks belong in their own lightweight place, and then a separate habit should be developed of draining that place, later, when not trying to make conversation with someone, and organizing those tasks.

This is a core tenet of the Getting Things Done productivity methodology, and one I happen to agree with – and realized on my own a long time ago in my battles with JIRA. Recording tasks as they come up, and organizing tasks so that you can plan to do them, are fundamentally different things, and require different systems. The burdens of “organization” should not interfere with the lightweightness of “collection” – especially when “organization” is as arcane and heavyweight as JIRA is.

For my “collection” step, I use e-mail, and send myself quick subject-only e-mails to cover tasks I unexpectedly learn about. I marked all my e-mail as read in one heroic step in late 2021, and ever since then, I’ve actually been an “Inbox Zero” person. Or rather, the type of person who regularly reads all my e-mails. This allows me to use unread e-mails as a repository for tasks, which is good because sometimes tasks from other people or organizations come in this way. I can mark e-mails unread as well, if they are both messages and tasks, an advantage text messages don’t have, so e-mail is better than texting in that way.

And then, there is a habit that comes with this: Every once in a while, ideally every day but at least every other day, I have to have a little session where I sit down, go through the e-mails, and either do the things or copy the TODO items into another place, within my real organizational system, which is in text files on my computer.

Of course, it also entails aggressively checking my e-mail multiple times a day, and in the biggest change, actually deleting or marking as read the “real” e-mails from companies that I’m not going to read, rather than just letting them pile up. So here I am, in my 30s, finally a practitioner of Inbox Zero, which is honestly a bit of a pain. But the ability to use the e-mail Inbox as a TODO list has, so far, been worth it.

And I also know that if I used a dedicated TODO app as my TODO list, instead of e-mails, I’d (a) let them live there forever and (b) eventually stop looking at the app. The point is that these items don’t live in my e-mail – they just stay there as scratch space, for collection purposes. I get them into my real TODO system as fast as I can and they get deleted. The real TODO system, where items live, is much more sophisticated, and that’s what I’ll get into next.

Hierarchical Plain Text Files

So, first of all, rather than use any sort of structured app, or spreadsheets, I use plain text files. I edit them in vim, my preferred text editor for programming, as I am very used to it. The commands, such as dd to delete a line, p to paste in the line you just deleted, and o to open a new line below the current one, are all in my fingers’ muscle memory. This is the easiest way I can move content around and reorganize it, from file to file and within a file.

As I am using a programmers’ text editor to do my organizational system (and to do my writing), often when I’m working on blogging or just figuring out what to put on my TODO list, I look like I’m programming. People come up to me at bars and coffee shops to tell me that they’re impressed with the programming I’m doing:

But, as you can see if you read the words on the left, this is my blogging TODO list, not my work at all.

It’s very important to me that the items are hierarchical. I have a lot of ideas that flow out of my mind, and I like feeling like I can write them down so that I can eventually actually follow through on them. If I wrote all the ideas in list form – ever – there simply would be too little structure, and I would never find ideas that went together.

This is somewhat obvious for the planning for a blog post: Of course elements of a blog post go together, under a heading for that blog post, and of course pre-writing can be done in the form of a hierarchical outline.

But other tasks are hierarchical as well, even those we normally write lists for. I find that I do better when I express the hierarchy, and re-adapt the hierarchy for various phases of the planning.

For example, grocery lists. As a grocery list is being generated, it is hierarchical by planned meal (or non-meal category like “snacks”), and I type it out accordingly:

* Grocery shop
    * Snacks
        * Chips and salsa
            * Hot salsa
            * Mild salsa for guests
        * Bread and hummus
            * Challah bread
            * Potato rolls
    * Planned meals
        * Chili
            * Beans (I may have this already)
            * Tomatoes
            * Onions
            * Spices
                * Check online recipe
                * Confirm that I have them
        * Veggie Carbonara
            * Shiitake mushrooms
...
...

Notice that many of the TODO items are actually little research items to improve the hierarchy. I have to go check whether I have beans, and once I do, I can edit it. This same “Planned Meals” section can also be duplicated from the grocery shopping section to a separate section that tells me to actually make the meal.

If I do my shopping through a delivery system, the shopping list can stay in this format as I enter it into their app. But if I need to go shopping in person, I can restructure this shopping list to be by section of the grocery store, rather than by meal. The act of restructuring the list also helps me solidify the list in my memory, and gives me opportunities to realize I’ve missed things:

* Grocery shopping, buy:
    * Produce
        * Green onions
        * Peppers
            * Traffic light (red, yellow, green)
        * Mushrooms
            * Shiitake
            * Cremini
    * Canned food
        * Beans
        * Diced tomatoes
        * Hot salsa
        * Mild salsa

My TODO lists can be very long, as I can see by running the wc -l command to show how many lines they have:

[jim@palatinate:~]$ wc -l Log/TASKS Log/CALENDAR Log/WRITING.md Log/TECH_WRITING
  353 Log/TASKS
   95 Log/CALENDAR
 1022 Log/WRITING.md
  719 Log/TECH_WRITING
 2189 total

If I didn’t use some level of hierarchicalization, they would be completely impossible to read.

I am, of course, not the first person to use hierarchical plain text files as an organizational system. I use vim as my text editor, but in the Church of emacs, where emacs is considered the one true text editor, there is a long tradition of using Org Mode, a special file format that emacs specifically supports for such hierarchical organizational text. I have had colleagues use Org Mode in the past, and it was definitely an inspiration for my current system.

My files are not valid org mode files – they use the markdown style of hierarchical bullet points – but I sometimes consider making them org mode files, as that format is supported by multiple iPhone apps and then I could use my phone to access and update my organization files.

I also use Dropbox to keep these files synced between computers, because I am unfortunately no stranger to losing my laptop – because I forget to take my bag back with me when I’ve left it in a place.

Task-Specific File Formats

But the fact that I use hierarchical plain text files is not enough to constitute a system. It’s more information than just “write it down”; it’s “write it down in plain text in a specific style of bullet points in a known location on my laptop and in Dropbox.” We’re closer to a system, but we’re not the whole way there yet.

This reminds me of the OSI model of networking layers. Indulge my nerdiness for a second: computer networking is systematized in multiple layers. The difference between Ethernet, Wi-Fi, and cable is one layer. At another more abstract layer, it’s all “the Internet,” using a family of protocols called TCP/IP. At a still higher layer, there are different “protocols” for browsing the web, your WhatsApp messages, and your Zoom call, and each of them is layered on top of “Internet” or TCP/IP, which in turn is layered on top of either Wi-Fi, Ethernet, cable, 5G, or carrier pigeon.

A simpler analogy: On your phone, you have different apps. It’s the same phone with the same hardware, but the apps are different.

Similarly, we have defined a few layers of my system:

English
Writing
On my laptop and in Dropbox
Plain text files
- With hierarchies of bullet points
  - To organize information
    - Inspired by “org” mode

But in order to actually use the system to organize my life, I am now at the point where I need different details for different parts of my life – where I have to do different things in my different apps. What works for one part of my life might not work in others.

Calendar: What to do each day

I’ll talk about my calendar first. I use it absolutely every day, and it is the part of my organizational system that I absolutely couldn’t function without. My use of this organizational system waxes and wanes, as do my general organizational skills, but the presence of this system definitely elevates the waxing and waning so that my lowest lows are still more functional than my normal days before I had it. The calendar provides this baseline level, and is one of the parts that still functions when I’m too busy or too exhausted or simply too frazzled to use the other parts.

I won’t post screenshots of my calendar because it’s, ahem, obviously very private, but my main calendar is not Google Calendar on my phone (which I do not use), nor a calendar on the wall (which my parents always used to great effect), but it is actually one of these hierarchical text files.

If you send me a Google calendar invite, it won’t actually happen unless I integrate it into my personal calendar.

Fine, here’s a screenshot, but it’s scrolled down because I plan very far in advance (wink):

Each day has its own little entry. They can be as short or as long as they need, though if they get too long, I take this as a sign that I’m expecting too much of myself for that day. The actual items are a bit of a hodgepodge: It includes absolutely mandatory appointments (with times), things I must get done that day or else suffer greatly, or just tasks I know I need to do at some point and figure this is a good day for them. Some of it’s super time sensitive, but a lot of it is a near-term TODO list straddled along a few days of calendar, or moved from day to day as I literally procrastinate (pro cras being Latin for “for tomorrow”). For some items, the day it’s listed under doesn’t really mean much of anything at all, besides a promise to myself to think about that task on that day, and also an anxiety-relieving reassurance that I don’t have to think about it before then, because it’s written on a later day, which is the designated time for that obligation to resurface in my life, and no sooner.

(As a result, if an event needs preparation, I often need to write the preparation separately from the event, as when I’m in my less-functional modes I might not look ahead on the calendar.)

I don’t really distinguish these things in the calendar file; I keep track of what’s urgent and what’s not in my head. In spite of my sometimes over-enthusiastic over-wrought hierarchical notes, I actually only need a little reminder to not forget the thing at all. Once I’m reminded, my more normal-functioning retrospective memory kicks in, and I know all the details of why I have to do the thing and how urgent and/or important it actually is.

Like all parts of this organizational system, the calendar comes with some habits, which form the core of any organizational system. For the calendar, the habit is that every day, before I actually go about the tasks of my day, I look at the calendar to see what all I have to do that day. Often, I have leftover material from the previous day or days, from what I did (or at least, what I was supposed to do) yesterday.

I take those bullet points from previous days and respond appropriately. If it’s something I did, I’ll remove it. If it’s something I didn’t do, and should do, I move it to another day. And if it’s something I didn’t do, and it’s too late to do now, I can schedule apologies and other damage control for today or another day.

Perhaps just as important but less likely to actually happen is looking the previous night at what I have to do the next day. This ensures I actually get up on time to do things that are in the morning. Fortunately, such things are normally social in nature and my excitement for a social opportunity helps glue it in my memory – sometimes.

The other habit that comes with the calendar file is for when things get added to it. I don’t commit to anything until I’ve actually written the thing into the calendar. If I know I’m free, I might commit after having written myself an e-mail to add something to my calendar, but even then, I write the e-mail while I’m still having the conversation, and don’t actually agree to do the thing until I’ve sent it. This goes for work events and doctor’s appointments as well as personal events.

The Work File

Though the calendar will contain some amount of day-to-day life TODOs, it generally doesn’t include work. If it’s a weekday, I start work, but I don’t write “get a day’s worth of job work done” on every entry (which is probably how I’d phrase it if I were to). I’ll put specific work meetings on it, because otherwise I might miss them, but my work is not heavy on concrete deadlines, and to the extent that it is, that goes in a separate work organizational system, in its own hierarchical text file.

I keep my organizational files open in GVim windows – panes, really, since I use XMonad – on virtual desktop #5 on my laptop. I generally have at least one organizational file open when I’m using the computer, depending on what I’m doing, and sometimes more than one. If I’m doing something else on another screen, writing code or reading Wikipedia, I always have my organizational files available at a moment’s tap of the buttons.

My work organizational file is open if and only if I’m working. Its state of being open defines the concept of “being at work,” and reifies it in my mind. One hierarchical file contains all my personal work organization.

“All my personal work organization” doesn’t necessarily mean all my work organization overall: Ticketing systems and documentation and project plans that might have to be shared with others live in separate places, in formats preferred by the teams and companies I work for. But I always keep track of where they are (so long as they are relevant and so long as I have to make sure not to forget about them entirely) in the work file itself. If it doesn’t exist in the work file, I might forget about it, and likely will. This implies that if I shouldn’t forget about something, it should be referenced in the work file.

Therefore, one thing that is in my work file, and goes near the top, is a series of links to shared organizational pages. What tickets am I currently working on and responsible for updating, even if the actual work is done and I just have to answer questions or follow up on QA (quality assurance) at this point? What code have I written in the form of a merge request that I need to make sure my colleagues review and actually integrate with the code, even if I’ve written all the code I have to and I’ve theoretically handed it off? What issues have I filed on GitHub with open source projects that I need to follow up on to see if their maintainers have gotten back to me? Programming requires a surprising amount of following up with other people and reminding them of things, and that requires a list of things that are pending so I don’t forget a whole thing to follow up on.

Also at the top of the file I include my current TODO list. Oftentimes, I get a torrent of tasks at once, like when I think I have something simple to do (like fill out a JIRA ticket) but it turns out to have several side-quests (like researching which versions are supposed to support which features and who maintains them, and creating a secondary JIRA ticket to make sure the code eventually gets QA’d). Instead of trying to rely on my prospective memory, I spill them into the “immediate TODO” location. If they are not in fact the things I should do next, or if I find them overwhelming, I then sort them properly into projects.

Projects are the majority of the file, each in its own little cluster, analogous to dates in the calendar file. Usually, at any given point for my job, I have 1-3 projects that I’m actively working on. I prefer having at least one major project and one minor project: I work well when switching between multiple projects and I can use the minor project to take a break from the major project.

But my work file always has more than 1-3 projects. Often, I know what projects I will have to do after I’m done with the current project. Sometimes, I have started a project, and it might or might not come up again – and I have my notes for it in case it does. Sometimes, I have an idea for a future project, but it’s not really relevant now, and I write up how to do it. In any case, they’re all ready to consider when I’ve finished my current project, or to thaw out if someone tells me they’ve increased in priority or urgency.

The TODO/project dichotomy might seem a little complicated, but it’s necessary. Sometimes, the TODO items involve writing project summaries for a project. Sometimes, they include communicating with other people about the project. Usually, the project summary itself contains the actual steps needed to get the project done as a programming project, whereas the TODO list includes things like replying to other people’s day-to-day messages about it.

And a programming job can easily gear up to that level of complexity. I get tasks on a 1-2 week time scale that I then have to break down into subtasks myself, sometimes even on a timescale of months. Only a small part of the job is actually changing the code. Most of it is figuring out how the existing code worked, figuring out why that doesn’t meet the needs, and then making sure the modified code actually gets into the finished product. People who ask for help often are stuck because they’ve made a false assumption, or misunderstood what the problem even is. And often problems straddle many unrelated systems – code testing and deployment almost always does.

So here are the habits that come with the work file: Whenever I find myself at work and not sure what I’m doing, I go read the top TODO item. If there is none, I go grab some items from the current project. If I do know what to do, but it isn’t written down in either of those two places, well, I write it down before I do it, in case I get distracted and forget. Meanwhile, meetings and time-based TODO items, as I said before, go in the same calendar along with everything else.

Other Files

I have other hierarchical files in my system. Besides my work file for my job, I also have files in a similar structure for programming as a hobby (including technical blog posts), writing as a hobby (including non-technical blog posts), finances, and my social life. But for the most part, they just follow the same structure: a section for each project, in hierarchical format, with an immediate TODO list at the top.

Conclusion

This system overall works well for me, but I’m always ready for improvements. Please share in the comments things that work well for you, and maybe I can learn something!

Write Everything Down (Part 2): Failed Organizational Systems

2022-10-05T00:00:00+00:00

In my previous post on organization, I concluded with this statement:

As everyone’s brain works differently (whether ADHD or not), people differ tremendously in what their ideal organizational systems are. For me, I am much less productive if I have a less than ideal system – the stakes are very high. But even for people who can be productive on any system, I think that tailoring their system to their brain, their lifestyle, their job and schedule and hobbies, can have amazing results.

In this post, I want to go more into detail about that. Specifically, I’d like to demonstrate the point by looking at organizational systems and techniques that have not worked for me personally, in approximate chronological order of my life.

Handwriting

The first one is the one I noticed the earliest, before anyone expected any organization out of me: I am not very good at handwriting. I don’t have the best coordination, it makes my hand hurt, and I can’t really get myself to do it in a sustained way.

I have a number of reasons or excuses for this, but the biggest one is probably how slow it is. I get impatient. I get distracted by the effort, and I forget what I was going to write next. Even if I don’t, I get frustrated with the lack of speed: I can type at 110 WPM or so, whereas my handwriting is probably more like 15.

I basically don’t use handwriting at all in my present life – which means I’m unpracticed, which in turn makes it even less appealing as an option. Many people recommend handwritten TODO lists and reminders as a way of organizing, or handwritten journals as a way of meditating and logging life, and I’ve come to realize that as appealing as it may sound, and as relaxing as it can be to not be at a screen, these techniques are not for me – but can be for me if modified to be on a computer (or phone).

Homework and Note-Taking

My handwriting problems led soon to note-taking problems. Note-taking was a bust for me in school, and even in university.

I have heard that writing notes is supposed to help you pay attention. Maybe this works for people who don’t struggle as much with handwriting as I do – I do have some smatterings of experiences doing better note-taking with a computer. However, for whatever reasons, in school I simply could not pay attention to the teacher and write notes at the same time, or honestly get myself to write any notes even if I was not simultaneously trying to pay attention to the teacher. When the teacher was finished teaching, or took a break, I never took that opportunity to write anything down either, because that would get me maybe 10% of the actual notes that I could theoretically get out of the class, and that wasn’t enough to actually be useful.

This, perhaps surprisingly, didn’t really cause me problems for tests. I was usually actually paying really good attention to the teacher (whether the teacher could tell from my body language or not), and as I said before, I had a really good retrospective memory. Generally, as long as I remembered and understood everything the teacher had explained, I would do well on the tests, which almost always, in practice, tested retrospective memory.

See, tests always remind you what the question is about, and very rarely contain a gotcha where you have to remember to do something. Generally, even in math tests, you can’t leave out a step, because otherwise you simply don’t arrive at the answer. When leaving out a step was possible (e.g. doing a separate “units check” after answering the question), I often would forget to do it. But in such situations, there was at least usually a reminder in the text of the exam. All in all, school tests privilege retrospective memory over prospective memory, biasing them towards people who think like me, and (especially due to my deficits in prospective memory) giving an inflated view of my skill-sets.

But even though this didn’t affect my test-taking, this complete lack of note-taking had some other negative effects. I remember in middle school you used to have to write your homework down in a little planner they gave you called an “agenda book,” which also contained hall passes in the back for teachers to initial. I never remembered to write down the homework. Basically every day I had to call a very tedious hotline (from the landline phone in my parents’ bedroom), a hotline on which teachers usually (but not completely reliably) used to make voice recordings of their daily homework. It was called “info connect,” and it always played ads for the local bank. It was a service offered to all students and very helpful for my ADHD – a good example of a universal accommodation, and one that I desperately needed, tedious as it was, because that homework was not getting written down.

By the time I got to university, homework was usually listed as part of a syllabus you got at the beginning of class, or else posted on the course website, perhaps both. It’s sort of implied university will be harder than middle school, but in this particular way, it wasn’t. The syllabus is great organizational technology – it was way easier than calling a hotline and listening for your homework. All you had to do was look at the syllabus, which you would have saved from your first day of class in a known location in your room… or who am I kidding? It was available for regular consultation on the course website.

There have been a few other occasions over the years where note-taking was essential, and they made me nervous. In our high school debate team, note-taking was necessary because you were judged on whether you’d replied to everything your opponent said. Leaving something out rendered you vulnerable to your opponent reiterating it and claiming that you dropped it because you had no counterargument. Therefore, it was actually more prospective rather than retrospective memory, and my natural memory, which had covered for my lack of note-taking ability in so many classes, was not able to help me as much.

However, though I was nervous about it, I was ultimately able to do fine in Debate, because note-taking actually was the primary activity. I did not have to pay full attention to the speaker, as I often automatically did in class, because I already understood most of the relevant concepts to the topic and didn’t need to pay full attention to the concepts, just which ones they invoked in what structure. Furthermore, the requirements of the note-taking were minimal: It was more an outline of my rebuttal than an actual record of what was said. All in all, debating was like being a stereotypical bad listener: You’re barely paying attention, focusing the entire time on what you’re going to say next.

But mostly, note-taking was problematic for me when prospective memory was called for, and I was expected to amplify it with writing. I would more often fail than succeed in that situation. When retrospective memory was called for and other students amplified it with writing, I simply defaulted to my non-amplified skills, and that was enough.

Interestingly enough, the same problem doesn’t really apply to me often today. Nowadays, it’s actually easier to extract TODO items out of a meeting than it was when I was in high school or in college. For one thing, I basically always get to have my computer with me now to type up notes – a non-starter in high school, and commonly forbidden in college classes for disrupting the class. For another thing, though, if a meeting creates TODO items for me, I can write them down towards the end, while simultaneously clarifying – out loud! – what exactly the tasks are.

Often, these tasks are the result of a convoluted discussion, and so people appreciate taking the time to hear me summarize my take-aways out loud, and it gives everybody an opportunity to sanity-check whatever plan we’ve come up with. Meanwhile, I can talk while I write, and write it directly into my work TODO list (to be re-triaged afterwards into more finely-grained tasks, as I’ll discuss later).

Work Ticketing Systems

Sometimes, the programming jobs I’ve had have required me to use ticketing systems as a job requirement. These ticketing systems are often both organizational systems and collaboration systems, and while I have to use them as collaboration systems, they don’t do much for me as organizational systems – in fact, often, they increase my organizational burden rather than decreasing it.

Take JIRA, for example. JIRA is a system for tracking work tasks. I’ve used it at several different jobs, and when I worked at a software consultancy, I used it while interacting with several different clients.

With JIRA, your work is structured into tickets, which move across a board based on their level of completion. And when those tickets are well-specified and manageable in size, if you’re consuming these tickets, it can actually be quite nice. And because you often need to ask other people to do things, or because there’s often a lot of work to do on a team, but where anyone on the team can do it, some sort of collaborative system is absolutely necessary.

But creating a JIRA ticket takes a lot of work. There’s no way to write a note and postpone to a later meeting or later time how to turn it into a fully-fledged ticket. Often, creating a ticket requires answering a lot of questions, mandatory questions, like what version does it apply for, or similar things – which if you’re making a ticket for another team to work on, or for a new project, or just for a new kind of problem, you might not even know the answers to without asking your colleagues. If you have to create multiple tickets, or side-track yourself from your work to create a ticket, it’s basically impossible to keep track of it all without writing a list of tickets to create, as creating a ticket can take a very long time.

Not even to mention the fact that to write a JIRA ticket, or look at JIRA tickets, I need to de-immerse myself from the land of command lines and text editors, and return to my web browser screen – which is intrinsically more distracting, even if I’m only doing work things with my web browser like following up Slack messages or work e-mails.

So even though it’s tempting to use JIRA directly as an organizational system, especially as that’s how it’s seemingly designed to be used, I can’t. I have to keep my own TODO lists, and when collaboration renders it necessary to make a ticket so someone else on the team can work on it, or so managers have insight into my work, my own TODO list has to contain the tickets.

Furthermore, as I’ve learned, and will go into detail about later, doing my job well requires breaking down tickets far more than JIRA will normally encourage you to do.

And so, in both directions, I look at JIRA more as a communication tool than as a tool for organizing my own work, and from my perspective, it’s actually one more burden, one more thing to organize.

Now, I think there are other ticketing systems that work better, whether in being able to be used directly (in some ways) as an organizational system, or at least in being a more efficient and effective communication system that’s not as much of a burden. But that, I think, is a topic for a potential future post specifically on programming ticketing systems.

In any case, my personal organizational needs are unique enough that I would always have my own system running parallel to it, even if a ticketing system were better than JIRA. I’m sure, however, there’s someone out there using JIRA for their personal life, and I wish them all the best.

TODO Apps

For a while, I used Remember The Milk as an app. I ultimately ended up not continuing because it felt too inflexible to reorganize. My lists simply got too long, and ended up being intimidating, and I ended up not looking at them again.

To be clear, this happens to all my TODO lists: They get longer, I spend some time not removing things from them, I have bursts of many ideas for what to put on them, and eventually they become too long to even dare to look at, as anxiety expands and explodes in my brain. The difference is, if I can take a few items off of the TODO list sometimes, and put them on another list, from which I can’t see the larger list, of things to do on a per-day basis, then I’m much calmer and happier. This is a very specific set of requirements, and most TODO apps don’t work exactly that way.

Even if they did, I always am changing up how my TODO lists are structured, and how they relate to each other. Apps are by nature opinionated about such things. Most of them don’t have support for hierarchies – whereas my TODO lists often are bullet points within bullet points within bullet points, a tree-shaped outline of the task rather than a literal flat list.

And even if it’s possible to move things around between lists freely enough to impose new structures, and to protect myself from the long lists I don’t always want to see, TODO apps are still not the most natural interface for me. Moving things around has to be easy for me, and in an app normally there are simply too many steps to it, especially too many clicks of the mouse. I’m used to doing things in a more keyboard-driven fashion rather than a mouse-driven fashion, and I’m used to using a traditional computer interface over a phone or web interface. This makes me a weirdo, but it also makes most TODO apps a poor match for me.

My Answer: Developing my own system

So I’ve developed my very own, very bespoke, very complicated system. I’m extremely happy with it, but it’s for me, not for you, so I’m not going to share it.

Just kidding, I’m going to explain it in the next post! But I’ll warn you ahead of time, it might not work for you. It might work just as poorly for you as keeping a hand-written planner is for me, and a hand-written planner might work perfectly for you. But hopefully my experience will give you insight into how brains work and how they differ, and help you understand the diversity of what makes different people tick.

A Strong Typing Example

2022-09-15T00:00:00+00:00

I’m a Rust programmer and in general a fan of strong typing over dynamic or duck typing. But a lot of advocacy for strong typing doesn’t actually give examples of the bugs it can prevent, or it gives overly simplistic examples that don’t really ring true to actual experience.

Today, I have a longer-form example of where static typing can help prevent bugs before they happen.

The Problem

Imagine you have a process that receives messages and must respond to them. In fact, imagine you have potentially many such processes, and want to write a framework to handle it.

The incoming messages are expected to be in JSON, and the responses are also supposed to be in JSON. So your framework parses the incoming messages from JSON before passing them to the application’s callback function, and then serializes the results.

In Rust, the interface for the callback would look something like this (Value is a parsed JSON type from serde_json):

trait MessageHandler {
    fn handle_message(&self, input: Value) -> Value;
}

In a dynamically-typed language like Python, the callback function would look more like this:

def handle_message(self, input):

The code in the callback would then (hopefully) validate the JSON to make sure it meets the expected schema, and if it doesn’t, return some error in the reply message. In a programming language like Python (I make no promises that my Python is idiomatic or accurate; it’s meant as an example of a duck-typing language), it perhaps could be written like this:

if not self.is_valid_input(input):
    return {"error": "Invalid input", "input": input}

If the JSON is in a valid format, it would do some processing and return a non-error result.

The framework code, in order to do this, runs code that looks something like this (in pseudo-Python):

input = conn.recv_message()
input = json_parse(input)
output = handler.handle_message(input)
output = json_serialize(output)
conn.send_response(output)

And all of this will work just fine.

Except… what if the input isn’t valid JSON? And what if none of our test cases considered this possibility, but it nevertheless arises in production? What if we didn’t even write test cases?

Some Attempts to Solve

Making sure we catch the error at all

In Rust, we would already have a hint that there’s something wrong. JSON parsing in Rust is a function that can fail, and that is reflected in the type of the function to parse JSON, which looks something like this:

pub fn from_slice(v: &[u8]) -> Result<Value>

The Result means that this function can fail. We have to handle that failure in some way before we can get the resultant type. We can crash the whole program:

let input = from_slice(&input).expect("Invalid JSON");

NB: Reusing the name input like this with a different type is allowed in Rust; this declares a new variable that shadows the old one. This is idiomatic when the value is being transformed and we don’t need the old form anymore.

Or we can do what Python will likely do by default, and bubble the error up to the caller of the current function:

let input = from_slice(&input)?;

Or we can handle the error. And in this case, we should handle the error in some way, as we need to reply to the message whether it’s in JSON or not, and so we don’t want to skip over the code that does the reply.

Already, Rust’s typing discipline is helping us. In order to do what Python does by default, we need to at least opt in with a ?. Admittedly, the programmer may do that on autopilot, but it at least gives the programmer a hint that there might be an issue worth spending a second or two considering before moving onwards.
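To put the three options side by side, here is a minimal sketch. Since serde_json is an external crate, it uses the standard library’s std::str::from_utf8 as a stand-in fallible parser (same shape: bytes in, Result out). The function names crash_on_error, bubble_up, and reply_anyway are made up for illustration, and “uppercase the text” is just a placeholder for real processing.

```rust
use std::str::{from_utf8, Utf8Error};

// Option 1: crash the whole program on bad input.
fn crash_on_error(raw: &[u8]) -> String {
    let text = from_utf8(raw).expect("Invalid UTF-8");
    text.to_uppercase()
}

// Option 2: opt in to bubbling the error up to the caller with `?`.
// The failure is now visible in this function's own return type.
fn bubble_up(raw: &[u8]) -> Result<String, Utf8Error> {
    let text = from_utf8(raw)?;
    Ok(text.to_uppercase())
}

// Option 3: handle the error here, so a reply is produced either way.
fn reply_anyway(raw: &[u8]) -> String {
    match from_utf8(raw) {
        Ok(text) => text.to_uppercase(),
        Err(_) => String::from("error: invalid input"),
    }
}
```

Only the third version always gets as far as producing a reply, which is what a message-handling framework needs. The second version compiles only because its return type admits the failure, so the caller inherits the same decision.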

What to do with the error?

But let’s assume that the programmer did, in fact, realize that these errors need to be handled. What should we do in case of an error?

One possibility is to handle it completely in the framework. If we know all inputs must be valid JSON, we can take this burden off of the application code:

try:
    input = json_parse(input)
except JsonError:
    output = {"error": "Invalid JSON"}
else:
    output = handler.handle_message(input)

But what if we want to give the application-writer more flexibility? What if we envision a situation where the application-writer wants to accept either JSON or non-JSON data?

In a duck-typed programming language like Python, if the parsing fails, we can simply pass the original input to the handler. This is really easy to do.

try:
    input = json_parse(input)
except JsonError:
    pass

Now, the handler function just needs to ensure that the passed-in value is a dictionary in our validation:

def is_valid_input(self, input):
    if not isinstance(input, dict):
        return False
    if 'requiredField' not in input:
        return False
    return True

Of course, we might forget to do that, and if we do, the not in test is now at the mercy of whatever type input happens to be: it throws an exception for some non-dictionary values (raw bytes or a number, for example) and quietly performs a substring or membership test for others (a string or a list). This would be bad, as not even all JSON parses to dictionaries, but it’s a mistake someone could make if they’re not thinking about error handling.

In Rust, we can’t pass the initial input directly to the handler, as it would be a different type. So if we try to do the direct equivalent to the Python, it gives us an error:

let input = match from_slice(&input) {
    Ok(parsed_value) => parsed_value, // This is the parsed value, type `Value`
    Err(_) => input, // This is the raw `Vec<u8>` data... TYPE MISMATCH!
};

We are then forced to brainstorm another solution, which might raise ideas we didn’t otherwise consider, and force us to backtrack in our design a little, which is actually a good thing because this solution, while simple in Python, has some flaws.

Here’s some solutions we might brainstorm:

Call a different callback in handler for unparsed data
- Application specifies whether data should be parsed
- Framework chooses which callback to call dynamically
Use an enum

That last one is interesting. If we do want to create a value that can contain either Value or Vec<u8>, we still can in Rust. We just have to create a new type that tells the compiler we want that:

enum IncomingMessage {
    Parsed(Value),
    Unparsed(Vec<u8>),
}

Then, before we can do any work on the wrapped Value, we have to say what happens if it’s actually a Vec<u8>:

let input = match input {
    IncomingMessage::Parsed(value) => value,
    IncomingMessage::Unparsed(_) => {
        // return an error JSON blob
    }
};

In fact, this even helps with the fact that not all parsed JSON is a dictionary, as serde_json::Value is itself an enum!
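Here is a small self-contained sketch of both layers of matching at once. To keep it free of external crates, it defines its own tiny Value enum as a stand-in for serde_json::Value (the real one has more variants and a different object representation), and describe is a hypothetical handler invented for this example.

```rust
// Stand-in for serde_json::Value with only the variants this sketch needs.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Number(f64),
    Text(String),
    Object(Vec<(String, Value)>),
}

enum IncomingMessage {
    Parsed(Value),
    Unparsed(Vec<u8>),
}

// Hypothetical handler: the compiler rejects this `match` unless every
// shape of input is accounted for.
fn describe(input: IncomingMessage) -> String {
    match input {
        IncomingMessage::Parsed(Value::Object(fields)) => {
            format!("object with {} field(s)", fields.len())
        }
        // Valid JSON, but not a dictionary.
        IncomingMessage::Parsed(other) => {
            format!("error: expected an object, got {:?}", other)
        }
        // Not JSON at all.
        IncomingMessage::Unparsed(raw) => {
            format!("error: {} byte(s) of non-JSON data", raw.len())
        }
    }
}
```

Deleting either of the last two arms turns the forgotten case into a compile error instead of a runtime exception.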

Further Problem

But even if we do correctly validate that we have a dictionary, and we output an error in our message response if we don’t, I want to point back to our original pseudo-Python for what error to output:

if not self.is_valid_input(input):
    return {"error": "Invalid input", "input": input}

If input is JSON parsed into a dictionary, it will definitely serialize back into JSON, and this line makes sense. But now that input might not be parsed JSON, but instead might be in some sort of raw format, this dictionary might fail to serialize back into JSON.
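For contrast, here is a rough sketch of what building that same error reply might look like under static typing. Again, Value is a cut-down stand-in for serde_json::Value, and invalid_input_reply is a hypothetical helper. The point is only that raw bytes can’t be dropped into the reply as-is: the conversion has to be written out, which is exactly the moment you’d notice the problem.

```rust
// Stand-in for a JSON value; a real framework would use serde_json::Value.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Object(Vec<(String, Value)>),
}

// Hypothetical error reply that echoes the offending input back.
// A `Vec<u8>` cannot be stored in a `Value` directly, so the conversion
// (and the fact that it is lossy) has to be spelled out.
fn invalid_input_reply(raw: &[u8]) -> Value {
    // Invalid UTF-8 sequences are replaced with U+FFFD.
    let echoed = String::from_utf8_lossy(raw).into_owned();
    Value::Object(vec![
        (String::from("error"), Value::Text(String::from("Invalid input"))),
        (String::from("input"), Value::Text(echoed)),
    ])
}
```

Whether a lossy conversion is the right choice is a design decision (you might prefer to omit the input, or to encode it some other way), but it’s a decision the types make you take explicitly.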

Conclusion

A lot of programming is converting data from one format to another and validating it. Strong static typing systems like Rust’s can help prevent mistakes before they happen, and force people to come up with more rigorous designs rather than shoe-horning different values into the same variable, which dynamic typing makes easy – too easy. I hope this example was relatable!

Exploring Traits with Erased 'serde'

2022-08-13T00:00:00+00:00

I came across a programming problem recently where I wanted to use dynamic polymorphism with serde. This turned out to be much easier than I expected, and I thought it was an interesting enough case study to share, especially for people who are learning Rust.

A Brief Discussion of Polymorphism in Rust

As most of you will know, Rust’s system for polymorphism – traits – supports both static and dynamic polymorphism, with a bias towards static polymorphism.

For static polymorphism, it uses the impl keyword, or alternatively, a syntax called “trait bounds” reminiscent of C++. It is implemented through “monomorphization,” which involves making on-demand copies of any polymorphic functions at compile-time. And it is the default way to use polymorphism in idiomatic Rust, as evidenced by the fact that it comes earlier in the Rust book.

Dynamic polymorphism, in contrast, uses the dyn keyword to create “trait objects.” This is implemented through vtables, which are also how C++ implements OOP-style polymorphism. Even though it is more of an OOP-style feature, and therefore more familiar to programmers with an OOP background, in Rust it is less commonly used. This is evidenced by the fact that it is introduced later in the Rust book with a much narrower use case in a chapter that encourages a programmer to “implement[] a solution using some of Rust’s strengths instead.”
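As a quick side-by-side, here is a small sketch of the three spellings. The Greet trait and its two implementors are made up for this illustration:

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct English;
struct German;

impl Greet for English {
    fn greet(&self) -> String {
        "Hello".to_string()
    }
}

impl Greet for German {
    fn greet(&self) -> String {
        "Hallo".to_string()
    }
}

/// Static polymorphism with `impl`: monomorphized once per concrete type used.
fn greet_static(greeter: &impl Greet) -> String {
    greeter.greet()
}

/// The same thing, spelled with an explicit trait bound.
fn greet_bounded<T: Greet>(greeter: &T) -> String {
    greeter.greet()
}

/// Dynamic polymorphism: one compiled function, the method is found through a vtable.
fn greet_dynamic(greeter: &dyn Greet) -> String {
    greeter.greet()
}
```

All three are called the same way, for example greet_static(&English) or greet_dynamic(&German). The difference is in what the compiler generates, not in what the caller writes.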

The biggest reason dynamic polymorphism is not one of “Rust’s strengths” is that only object-safe traits can be used with dynamic polymorphism, due to the technical limitations of vtables. Whether a trait is “object-safe” is defined by whether it meets a long list of criteria, which generally get more liberal over time as people agree on how to address technical limitations, but fundamentally only some traits can be used with vtables. Additionally, dynamic polymorphism also adds a performance cost, due to indirect calls and fewer optimization opportunities.

The biggest reason to use dynamic polymorphism in spite of these issues is when an “object” needs to take on a range of possible values at run-time that can’t be expressed in an enum, because other code has to be able to expand the list. As the Rust book points out, this comes up especially often in GUI programming, where the GUI framework has no way to enumerate every possible widget and know how to draw it or how it should handle events.
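A rough sketch of that GUI situation, with a hypothetical Widget trait and two widgets invented for the example:

```rust
/// Hypothetical trait that a GUI framework might expose.
trait Widget {
    fn draw(&self) -> String;
}

/// The framework only knows about the trait, never the concrete widget types.
fn draw_all(widgets: &[Box<dyn Widget>]) -> Vec<String> {
    widgets.iter().map(|widget| widget.draw()).collect()
}

// Types an application could define long after the framework was compiled.
struct Button {
    label: String,
}

struct Checkbox {
    checked: bool,
}

impl Widget for Button {
    fn draw(&self) -> String {
        format!("[{}]", self.label)
    }
}

impl Widget for Checkbox {
    fn draw(&self) -> String {
        if self.checked { "[x]" } else { "[ ]" }.to_string()
    }
}
```

An enum of widgets would have to live in the framework and list every variant up front. With Box<dyn Widget>, the list stays open to code the framework has never seen.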

My Situation

I’m not currently a GUI programmer and I rarely use dynamic polymorphism. My recent experience before Rust was with Haskell and C++ template programming, and both of those are more similar in style to Rust’s static polymorphism.

But it still occasionally comes up.

Step 0: A Normal `serde` Use Case

So here was the situation: I had a data structure that I was serializing into JSON so I could send the JSON over TCP. For the sake of the blog post, let’s pretend I was sending reports on groceries as an extremely contrived example:

pub enum MeatStatus {
    Veg,
    Fish,
    Meat,
}

pub struct CustomerId(pub u64);

pub struct GroceryItem {
    pub description: String,
    pub customer_id: CustomerId,
    pub price_in_cents: u64,
    pub calories: f64,
    pub grams_protein: f64,
    pub grams_carbs: f64,
    pub grams_fat: f64,
    pub grams_alcohol: f64,
    pub meat_status: MeatStatus,
    pub halal: bool,
    pub kosher: bool,
}

Now, I not only wanted to send this data out on the wire, but I also wanted to aggregate it. How many calories was each customer buying, total? How many customers were vegetarian, pescetarian, or religiously observant?
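The first of those questions might be answered with something like the sketch below. It uses a trimmed-down GroceryItem with only the two relevant fields, and it assumes CustomerId derives Hash and Eq (not shown in the definitions above) so that it can be a map key. The function name is made up:

```rust
use std::collections::HashMap;

// Assumption: these derives are added so the ID can be used as a HashMap key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CustomerId(pub u64);

/// Trimmed-down version of the struct, keeping only the fields used here.
pub struct GroceryItem {
    pub customer_id: CustomerId,
    pub calories: f64,
}

/// Sums the calories of all items, grouped by customer.
pub fn calories_per_customer(items: &[GroceryItem]) -> HashMap<CustomerId, f64> {
    let mut totals = HashMap::new();
    for item in items {
        *totals.entry(item.customer_id).or_insert(0.0) += item.calories;
    }
    totals
}
```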

So I needed to pass this data structure around once I got it from the cash register (thank you for bearing with this silly example), and then after extracting some data from it, send it over the wire.

Well, Rust makes this sort of thing easy: “There’s a crate for that.” In this case, it’s serde, which lets you annotate data structures for serialization into JSON and other formats. A simple call to a derive macro makes it implement the serde Serialize trait:

#[derive(Serialize)]
pub enum MeatStatus
...
#[derive(Serialize)]
pub struct CustomerId(pub u64);

#[derive(Serialize)]
pub struct GroceryItem {
...

So far, very easy and boring (though we should probably take more time to appreciate just how amazing serde is, which I will someday write more about in a dedicated blog post).

I then collect the data from the cash register with a function that looks like this, as the cash register has a completely different trait-dependent notion of the food, which is still a static trait because … each cash register is only for one general category of food, because … it’s actually a farmer’s market (I’m good at examples!):

fn extract_grocery_data<T: FarmersMarketStand>(
    customer_id: CustomerId,
    item: &T::Item,
)-> Result<GroceryItem> {
    Ok(GroceryItem {
        description: item.read_description()?,
        customer_id,
        calories: item.calculate_calories()?,
        ...
    })
}

Each farmer’s market stand has its own Item type, and the data from each is extracted and put into this generic structure, so that I can both process it and send it over the wire. Easy enough!

Step 1: A New Requirement

I thought this code was well-structured and well-architected, and patted myself on the back for it! But, as any experienced programmer knows, the true test of a software architecture is when you get a new requirement.

It’s when you get a new requirement (including “fix this bug we found”) that you actually learn if you did a good job with the architecture. It’s the only objective measure. If you built flexibility in, did it have anything to do with the new set of requirements? If not, it might have been over-engineered. Did you make any decisions that made it unnecessarily inflexible? If so, it might have been poorly engineered. Can you still even read the code so you can change it? Do you know exactly where the change fits? Are you tempted to throw the code out and rewrite it from scratch? Can you even still run it on your machine?

I digress.

The new requirement was quite simple: the farmers wanted us to pipe some data back to them from the grocery items. They were already connecting to the TCP stream, but the data we were using to aggregate wasn’t enough. We had to convey more information in the JSON, and unfortunately, this information was FarmersMarketStand specific.

Now, we had to add an additional field to our data structure. But what type should it be? I don’t need to use it for analytics, unlike the other fields. I just need to get it to the TCP connection so the farmers can get it right back:

pub struct GroceryItem {
    ...
    pub halal: bool,
    pub kosher: bool,
    pub market_specific_data: ???
}

Now, if I want to use static polymorphism, I have to add a type parameter to GroceryItem:

pub struct GroceryItem<T: Serialize> {
    ...
    pub market_specific_data: T,
}

But if I do this, I have to keep on parameterizing all my functions after this on this new type parameter. Besides, this would mean that I can’t send all the GroceryItems through a single channel; I have to have a separate channel per FarmersMarketStand. Maybe I could figure it out, but I don’t feel like I should have to, and besides, I’m trying not to have to rearchitect half the program.

An alternative prospect is serializing the data first, since the only thing I’m going to do with it is serialize it. Then, I can store it in a serialized form. serde_json, which implements serde support for JSON, has a type and a function just for this purpose: serde_json::Value and serde_json::to_value.

That gives us something like this:

pub struct GroceryItem {
    ...
    pub market_specific_data: serde_json::Value,
}

fn extract_grocery_data<T: FarmersMarketStand>(
    customer_id: CustomerId,
    item: &T::Item,
)-> Result<GroceryItem> {
    let market_specific_data = item.read_market_specific_data()?;
    let market_specific_data = serde_json::to_value(&market_specific_data)?;
    ...

The problem here is, the farmers only connect to the TCP connection maybe 10% of the time, and the rest of the time, I don’t want to pay the extra cost of serialization. Plus, I don’t want to pay the cost of serializing to this intermediate format, and then to JSON, rather than serializing directly to JSON.

Step 2: Dynamic Polymorphism

Now, you might be having an idea right now: Why not use dynamic polymorphism? This way we can have a little blob that means “I know how to serialize myself,” but we only have to do the serialization if it actually comes up. We don’t have to know anything else about the blob, nor do we have to pass the type all over the place at compile-time with all the baggage that comes with that.

So you write something like this:

pub struct GroceryItem {
    ...
    pub market_specific_data: Box<dyn Serialize>,
}

… and you find out that Serialize is not object-safe. You look up the docs for the Serialize trait, and lo and behold! It’s got one method:

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer;

Well, why isn’t this object-safe? Well, at a Rust level it’s one method, but it’s a method that uses static polymorphism. At a Rust level, we might think we just need to store what method to call at run-time, but actually, by the time we get to run-time, this isn’t a single method anymore. It will have been monomorphized into a method per every possible value of S, every possible serializer.

Now, we’re only using the JSON serializer, but there’s no way for the method to know that. To make a vtable for this method, Rust would have to write down an implementation of this method for every possible serializer, which is too many and not a well-defined set.
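The same situation can be reproduced without serde. In this sketch, Describe is a made-up trait shaped like Serialize (one method, generic over where the output goes), with std::fmt::Write playing the role of Serializer. Apple and CountingSink are likewise invented for the example:

```rust
use std::fmt::{self, Write};

/// Shaped like `serde::Serialize`: the method is generic over the sink.
trait Describe {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result;
}

struct Apple {
    variety: String,
}

impl Describe for Apple {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "apple:{}", self.variety)
    }
}

/// A second sink type, so that `describe` gets monomorphized a second time.
struct CountingSink {
    bytes: usize,
}

impl Write for CountingSink {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.bytes += s.len();
        Ok(())
    }
}

// This line is rejected by the compiler, because `describe` has a type
// parameter and so there is no single function to put in a vtable:
// let boxed: Box<dyn Describe> = Box::new(Apple { variety: "Gala".into() });
```

Calling describe with a String and again with a CountingSink gives the compiler two separate functions to generate for Apple. Any other crate could add a third sink later, which is why the set is not well-defined.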

OK, well, you might think, why not take advantage of the fact that we’re just using the JSON serializer? Why not write this:

trait JsonSerialize {
    fn json_serialize(
        &self,
        serializer: serde_json::Serializer,
    ) -> Result<
        serde_json::Serializer::Ok,
        serde_json::Serializer::Error,
    >;
}

This trait is like Serialize, but because it no longer uses static polymorphism, it’s now object-safe. Only one method is needed per implementing type.

Well, how do we implement this trait? Serialize has a derive macro, but JsonSerialize does not. However, a type’s JsonSerialize implementation could just call the Serialize implementation. And rather than making every farmer at the market do this for their own type, we can use a blanket implementation that says if a value is Serialize, it’s also JsonSerialize:

impl<T> JsonSerialize for T where T: Serialize {
    fn json_serialize(
        &self,
        serializer: serde_json::Serializer,
    ) -> Result<
        serde_json::Serializer::Ok,
        serde_json::Serializer::Error,
    > {
        self.serialize(serializer)
    }
}

So we can have all the trait implementations for the object-safe trait be implemented using static polymorphism based on the non-object-safe trait. This is a common pattern and it’s known as type erasure, because you’ve erased all the <T: Serialize> you would otherwise need everywhere you mentioned the GroceryItem type.
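Here is the pattern in miniature, as a std-only sketch rather than real serde code. Describe stands in for Serialize, StringDescribe stands in for JsonSerialize, and every name in it is made up for the illustration:

```rust
use std::fmt::{self, Write};

/// Not object-safe: generic over the sink, like `Serialize`.
trait Describe {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result;
}

/// Object-safe: pinned to a single sink type, like `JsonSerialize`.
trait StringDescribe {
    fn describe_to_string(&self, out: &mut String) -> fmt::Result;
}

/// Blanket implementation: anything that is `Describe` is also `StringDescribe`.
impl<T: Describe> StringDescribe for T {
    fn describe_to_string(&self, out: &mut String) -> fmt::Result {
        self.describe(out)
    }
}

struct Apple {
    variety: String,
}

struct Bacon {
    farm: String,
}

impl Describe for Apple {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "apple:{}", self.variety)
    }
}

impl Describe for Bacon {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "bacon:{}", self.farm)
    }
}

/// Works on a mixed list; no type parameter appears in the signature.
fn describe_all(items: &[Box<dyn StringDescribe>]) -> Result<String, fmt::Error> {
    let mut out = String::new();
    for item in items {
        item.describe_to_string(&mut out)?;
        out.push(';');
    }
    Ok(out)
}
```

Neither Apple nor Bacon mentions StringDescribe anywhere. They get it for free from the blanket implementation, and the statically polymorphic describe is only ever instantiated for String.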

However, this isn’t very good, because we want to use this as part of a serializable structure:

#[derive(Serialize)]
pub struct GroceryItem {
    ...
    pub market_specific_data: Box<dyn JsonSerialize>,
}

See, when the Serialize derive macro gets to the market_specific_data field, it finds that the field’s type doesn’t implement Serialize. It just implements JsonSerialize, since that’s how we made it object-safe. However, the macro is trying to implement Serialize on GroceryItem – for all serializers, and it’s never heard of JsonSerialize.

Step 3: There’s a crate for that!

At this point, I thought: There’s got to be a way to entirely type-erase Serialize. The problem with the method in Serialize is that it’s passed in a statically polymorphic Serializer – but what if we type-erased Serializer? The problem with that is Serializer has like a bajillion methods, so we’d have to deal with all of them in our type-erased version.

My conclusion? It’s possible, but it’d be a lot of work, so much that it might well be its own crate. And when you have that thought, well, one possibility is that crate may already exist.

And lo and behold, it does! Allow me to introduce the excellent erased-serde by David Tolnay. It does all of the work of type erasure for all of serde, and if you’re new to type erasure, the code is worth a read. It even uses macros!

It called its type-erased trait Serialize, which layered on top of the non-type erased trait, called Serialize. If your type implemented Serialize, it automatically implemented Serialize due to a blanket implementation, which was great, because then you could write Box<dyn Serialize>, and would you know that dyn Serialize also had an implementation for Serialize already done?

use erased_serde::Serialize as ErasedSerialize;

I mean to say: If your type implemented Serialize, it automatically implemented ErasedSerialize due to a blanket implementation, which was great, because then you could write Box<dyn ErasedSerialize>, and would you know that dyn ErasedSerialize also had an implementation for Serialize already done?
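To show the shape of that trick, here is a tiny std-only imitation. This is not erased-serde’s actual code, which has to erase the far larger Serializer interface. In this sketch, Describe is a made-up stand-in for Serialize, ErasedDescribe for ErasedSerialize, and std::fmt::Write plays the part of Serializer:

```rust
use std::fmt::{self, Write};

/// Generic, not object-safe: stands in for `serde::Serialize`.
trait Describe {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result;
}

/// Erased counterpart: the sink is itself a trait object, so no generics remain.
trait ErasedDescribe {
    fn erased_describe(&self, out: &mut dyn Write) -> fmt::Result;
}

/// Blanket implementation: every `Describe` type is automatically `ErasedDescribe`.
impl<T: Describe> ErasedDescribe for T {
    fn erased_describe(&self, mut out: &mut dyn Write) -> fmt::Result {
        // `&mut dyn Write` itself implements `Write`, so it can serve as `W`.
        self.describe(&mut out)
    }
}

/// The trait object implements the generic trait again...
impl Describe for dyn ErasedDescribe {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        self.erased_describe(out)
    }
}

/// ...and so does a `Box` around anything that does.
impl<T: Describe + ?Sized> Describe for Box<T> {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        (**self).describe(out)
    }
}

struct Apple {
    variety: String,
}

impl Describe for Apple {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "apple:{}", self.variety)
    }
}

struct GroceryItem {
    description: String,
    market_specific_data: Box<dyn ErasedDescribe>,
}

impl Describe for GroceryItem {
    fn describe<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "{} (", self.description)?;
        // Works for any sink `W`, even though the field's type is erased.
        self.market_specific_data.describe(out)?;
        out.write_char(')')
    }
}
```

The hand-written impl for GroceryItem is roughly what a derive macro would produce: it only asks that each field implement the generic trait, and the boxed trait object does.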

This meant, all in all, that I could write this:

#[derive(Serialize)]
pub struct GroceryItem {
    ...
    pub market_specific_data: Box<dyn ErasedSerialize>,
}

fn extract_grocery_data<T: FarmersMarketStand>(
    customer_id: CustomerId,
    item: &T::Item,
)-> Result<GroceryItem> {
    Ok(GroceryItem {
        description: item.read_description()?,
        customer_id,
        calories: item.calculate_calories()?,
        ...
        market_specific_data: Box::new(item.read_market_specific_data()?),
    })
}

The cast from Box<impl Serialize> to Box<dyn ErasedSerialize> is implicit, and Box<dyn ErasedSerialize> implements Serialize, so the derive macro is happy!

Voilà!

The code is available in a GitHub repo and the output shows the power of Rust polymorphism:

[jim@palatinate:~/hobby/groceries]$ cargo run | jq .
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/groceries`
[
  {
    "description": "Apples",
    "customer_id": 0,
    "price_in_cents": 3,
    "calories": 10,
    "grams_protein": 10,
    "grams_carbs": 10,
    "grams_fat": 10,
    "grams_alcohol": 10,
    "meat_status": "Veg",
    "halal": true,
    "kosher": true,
    "market_specific_data": {
      "variety": "Gala",
      "doctors_kept_away": 30
    }
  },
  {
    "description": "Bacon",
    "customer_id": 1,
    "price_in_cents": 3000,
    "calories": 10,
    "grams_protein": 10,
    "grams_carbs": 10,
    "grams_fat": 10,
    "grams_alcohol": 10,
    "meat_status": "Meat",
    "halal": false,
    "kosher": false,
    "market_specific_data": {
      "farm_of_origin": "Stolzfus and Sons",
      "breakfasts_served": 15
    }
  }
]

Step 4: Bonus Round: Another Requirement

Does this work well? Let’s see how a new requirement can be dealt with!

So next I learn that I have to implement Clone on GroceryItem, for some of the processing code where we do the data metrics.

I might think, well, this should be easy! I have a Box, and I never write to the inner value, so I just need a cloneable Box, an Arc. Then, I can #[derive(Clone)], and the market_specific_data field will just be multiply-owned.

But, alas, no! This error appears:

error[E0277]: the trait bound `Arc<dyn erased_serde::Serialize>: _::_serde::Serialize` is not satisfied

Why does this work for Box<dyn ErasedSerialize> and not Arc<dyn ErasedSerialize>? Well, this is actually quite straightforward: There is an implementation of Serialize for Box<T> when T implements Serialize, part of the serde crate. It does not exist for Arc.

I know that I can’t do the same in my own crate, but for Arc instead of Box:

impl<T> Serialize for Arc<T>
where
    T: Serialize,
{
    #[inline]
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        (**self).serialize(serializer)
    }
}

– because that would violate the dreaded “orphan rule”:

error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
   --> src/main.rs:191:1
    |
191 |   impl<T> Serialize for Arc<T>
    |   ^                     ------ `Arc` is not defined in the current crate
    |  _|
    | |
192 | | where
193 | |     T: Serialize,
194 | | {
...   |
201 | |     }
202 | | }
    | |_^ impl doesn't use only types from inside the current crate
    |
    = note: define and implement a trait or new type instead

But if we know the orphan rule well, or just read the note in the error message, we know that we can get around it with… you guessed it, a newtype!

Newtypes are named after the Haskell keyword newtype, though in Rust they don’t use that keyword, so we refer to the “newtype pattern.” In both Haskell and Rust, they’re the standard way to get around the orphan rule. The premise is simple: We define a new type that is distinct to the compiler (so we can’t use type) but not practically distinct. It’s generally implemented in Rust as a tuple-struct with one field.

There are two ways to go with this, as this blog post indicates (which surprisingly enough is also about serde!). We can try and fix the Arc<T> problem for everybody with a generic newtype, or just for ourselves with a regular ol’ newtype.

Here’s how the regular newtype solution looks:

#[derive(Clone)]
pub struct MarketSpecificData(Arc<dyn ErasedSerialize>);

impl Serialize for MarketSpecificData {
    #[inline]
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        (*self.0).serialize(serializer)
    }
}

#[derive(Serialize, Clone)]
pub struct GroceryItem {
    pub description: String,
    pub customer_id: CustomerId,
    ...
    pub kosher: bool,
    pub market_specific_data: MarketSpecificData,
}

If your takeaway at this point is that writing trait-heavy code involves a lot of functions that call other functions with the same name and almost the same arguments, you’re not wrong.
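For comparison, the generic newtype route might look something like this std-only sketch. Since serde isn’t available here, it implements the foreign trait std::fmt::Display instead of Serialize, and the name DisplayArc is made up:

```rust
use std::fmt;
use std::sync::Arc;

/// Generic newtype around `Arc<T>`. Because it is defined in this crate,
/// we are allowed to implement foreign traits for it.
pub struct DisplayArc<T: ?Sized>(pub Arc<T>);

// Written by hand: `#[derive(Clone)]` would demand `T: Clone`,
// which a trait object cannot satisfy.
impl<T: ?Sized> Clone for DisplayArc<T> {
    fn clone(&self) -> Self {
        DisplayArc(Arc::clone(&self.0))
    }
}

// A foreign trait implemented for our own wrapper: no orphan rule violation.
impl<T: ?Sized + fmt::Debug> fmt::Display for DisplayArc<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "shared {:?}", &*self.0)
    }
}
```

Cloning a DisplayArc<dyn fmt::Debug> only bumps the reference count, so the wrapped value stays multiply-owned, just like the market_specific_data field.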

However, in this case it was all unnecessary, as it turns out that we can get support for Arc<T> from serde itself if we enable the rc feature:

[jim@palatinate:~/hobby/groceries-contrived-example]$ cargo add --features rc,derive serde
    Updating crates.io index
      Adding serde v1.0.143 to dependencies.
             Features:
             + derive
             + rc
             + serde_derive
             + std
             - alloc
             - unstable

Both versions are available in the example repo’s clone branch.

Write Everything Down (Part 1)

2022-08-10T00:00:00+00:00

Memory Leak

I have an excellent memory. I have a terrible memory.

Well, which one is it?

This is a confusing state to be in. It can be frustrating to people around me. How is it – my father used to ask me when I was in high school – that I could remember all the lessons and readings for my tests in school, and get all the good grades, but couldn’t ever remember to do the simplest task or household chore, or to bring with me the simplest item? And of course the fact that I remember these conversations from so long ago is a bit of a case in point.

So I’d like to introduce a distinction between different kinds of memory, a technical distinction made by mnemologists, which is what I wish people who studied memory were called:

Retrospective memory is the memory we normally think of when we hear the word “memory”: it is the memory of what was. It is the type of memory that you use to look through the past. It allows you to recollect when you first met someone, or to explain the Fundamental Theorem of Calculus after a decade. Think of it like a database, where you can make queries against it.
Prospective memory is less famous but equally important and equally deserving of the word “memory”: it is the memory of what to do. It is the type of memory that you use to make sure all your responsibilities and goals are handled. It allows you to remember to pick up milk for your partner, or to copy-edit before publishing your blog post. Think of it like the notifications from all the annoying apps on your phone; rather than you querying it, this type of memory makes requests of you.

My retrospective memory is solid, even legendary. I remember obscure conversations from years, even decades, ago – and I still feel some type of way about them! More privately but more troublingly, I remember embarrassing things I’ve done all the way back to Kindergarten, and still feel some type of way about those too. Whatever part of my mind is responsible for this type of memory and its concomitant long-term annoyance and shame is in perfect working order – in fact, I think I’d rather like to get hit in the exact right spot in the head to take that part of my brain down a peg! (Although, to be fair to that portion of my brain, it did help quite a bit with exams when that was part of my life, and it helps a lot now with remembering essential programming facts for my job, not to mention essential linguistics facts to tell my friends.)

My prospective memory, on the other hand… Well, I never remember to bring with me the things I need, prompting my German professor in college to ask me, “Ist es der Fall, dass du niemals deinen Kuli zur Klasse bringst?” (Is it the case that you never bring your pen to class?) It was indeed; I never did.

I will even regularly forget to brush my teeth in the morning, only to remember when I smell my own breath. I could put up a post-it note to remind myself, but I would forget to look at the post-it note.

I always used to forget to take my lunchbox from one class to another in school, so my mother ended up buying extra lunchboxes, and my lost lunchboxes wandering the school became such a trope that one group of people started calling me “Lunchbox” after how many times I’d abruptly run to a previous class to get one, or claim an old one that someone found with stinking spoiled food.

And those “senior moments” my parents and grandparents would joke about when I was a kid, where you’d walk into a room and forget what you were going to do? I felt like I was having those as often as an actual senior at 16, or even at 12… Imagine how I’ll be when they actually are age appropriate!

But more importantly, for me, this gets me in all the little things of life: the chores and the errands, and all the baby steps towards achieving my goals. I may have to check my mail, because a check is coming – or even if there is no check, if your mailbox is full they eventually stop bringing the mail.

I may have to make sure that when I next travel to my hometown, I bring along a book that I have to give back to a friend there, because people don’t understand how likely I am to fail at that task, insist it’s easy and that I’ll be able to do it, and then get really frustrated – and I feel really bad – when their book that they intended to have me deliver ends up disappearing to whatever afterlife objects go to when I no longer am exactly sure where they are. It’s bad enough when my own stuff goes there.

Write Everything Down

I can whine all day about my particularly – even clinically – bad (prospective) memory. But I also know that even people with prospective memory much better than mine can benefit from organizational techniques. That’s why civilizations throughout history have created technology to augment our memory – both prospective and retrospective.

The biggest, grandest, most impactful such technology – the one that not only changed the course of history, but allowed history as we know it to even be possible – is of course writing. Whether implemented by using reeds to stamp symbols into clay or using pens and ink to draw letters on parchment or vellum or paper, or making stamps for those same letters, or storing the symbols in a binary encoding in computer memory, writing has spanned many technological and civilizational eras to augment our brains, to help us record bits of language, whether retrospective or prospective, whether it’s about things that have happened, or things that have to happen, beyond our natural brain capacity.

There are deep social consequences to this technology we call “writing.” Since it augments both prospective and retrospective memory, and literacy is now common-place and expected, standards have risen in society. A more and more complex society – with its chores and bureaucratic paperwork – requires more and more prospective memory to handle it. The advances of writing have been almost entirely consumed by increases in societal demand for organization. More and more jobs have become abstract instead of concrete, and also have required more and more prospective memory. Since writing is available to everyone, the ability to use it well has transitioned from being an edge and an advantage to being a necessity. For many jobs and many lives, even normal or good prospective memory is no longer enough without the aid of writing.

As someone with poor prospective memory, I have personally found writing to be invaluable. But it hasn’t always come naturally to me to use it as intensely as I have to. Most organizational practices assume a certain baseline of prospective memory and focus, often higher than I naturally have. Therefore, the way I use it can be peculiar, sometimes even intrusive: It is only recently that I’ve become comfortable using my system fully in my social life, with the confidence to pause my friend if they casually say something that generates a TODO item, so I can send myself an e-mail and make sure that that TODO item actually happens. Waiting till the end of the conversation like a normal person won’t work for me, and certainly neither will “just remembering” to do the thing.

See, I need to write everything down, every little obligation I incur, and a lot of things that I feel the need to write TODO e-mails for are things that other people would naturally remember. But that’s the thing: I won’t. Or at least, I don’t trust myself to. And I trust my judgment about my memory better than my friends’. So please excuse me as I whip out my phone to send myself an e-mail – I’ve gotten very fast at it.

And because text messages cannot be marked unread, often just asking for a text message with a recommendation (of something to read or listen to) isn’t enough. I often need to write a corresponding e-mail to myself to remember to go find that text message and actually do the thing. So it’s not just “text me the article,” it’s “text me the article while I make a note of it on my phone through e-mail.”

You Need a System

Just “use writing” (or its digital equivalents) isn’t a complete answer. And it wasn’t just a lack of confidence interrupting conversations to write things down that was holding me back before. There are a lot of other questions that need to be answered:

Where should I write things?
What apps should I use, if any?
How should these notes be organized?
What tasks are required to keep them organized that way?

As I said before, if I were to just write everything on post-it notes, I would forget to look at the post-it notes. There’s a reason that “buy a planner” is widely panned as bad ADHD advice – ADHD makes using the planner hard, and the details of how you use that planner are equally, if not more, important. There needs to be a system, and that system comes with additional chores, and since this is the system that tells you what other chores to do, those chores need to become a habit.

As everyone’s brain works differently (whether ADHD or not), people differ tremendously in what their ideal organizational systems are. For me, I am much less productive if I have a less than ideal system – the stakes are very high. But even for people who can be productive on any system, I think that tailoring their system to their brain, their lifestyle, their job and schedule and hobbies, can have amazing results.

Future Posts

In my next organization post, I shall go over some systems that don’t work for me, and why they don’t. Finally, I shall lay out my system, some of which I have already discussed in a previous post.

Blocking Sockets and Async

2022-08-08T00:00:00+00:00

Using async in Rust can lead to bad surprises. I recently came across a particularly gnarly one, and I thought it was interesting enough to share a little discussion. I think that we are too used to the burden of separating async from blocking being on the programmer, and Rust can and should do better, and so can operating system APIs, especially in subtle situations like the one I describe here.

Every async programmer learns early on not to call a blocking function from an async function. If you do, it is a hidden color violation, as I discuss in a previous post. By “hidden,” I mean that unlike other color violations, Rust gives you no compile-time help. You just have to use discipline. You just have to “make sure not to do it.” You just have to increase your cognitive load. It is a rule that the computer is no help with – which means that you’ll definitely mess it up at some point, possibly at many points.

Unfortunately, it’s also a gnarly problem to debug. The actual blocking function call will quite possibly work just fine. It’ll return when the resource is ready, and block until then – probably exactly what you wanted. It’s the rest of the system that falls apart – other tasks on the same thread starve, tasks that are depending on them for progress also starve, but meanwhile other tasks might proceed without a problem. Worse, there’s no guarantee that the bug will manifest every time, so the bug isn’t readily reproducible.

You might think this is an easy problem to address, either through improvements in the programming language or better programming discipline.

At a programming language level, you could imagine Rust having some sort of generalization of unsafe, or maybe an effects system. Functions that block would have blocking as part of their signature. Calling a blocking function from an async function would then be an error, with a way out for functions like spawn_blocking.

Unfortunately, Rust doesn’t have this feature, so we have to rely on programmer discipline. The discipline seems easy enough: If you’re in an async function, and you call a function that’s going to take some time or do I/O, make sure you’re doing an async call, which in most cases means using the await keyword.

Unfortunately, this doesn’t work 100% of the time, because the operating system isn’t on board. There are system calls that block sometimes, based on dynamic configuration. Does the recv system call block? Well, that depends on whether the socket is a blocking socket, or a non-blocking socket. Fundamentally, recv is run-time polymorphic on socket type, in a way that makes it a different color based on run-time information.
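The run-time nature of this is easy to see from the standard library. The sketch below assumes a loopback interface is available and uses a UDP socket purely because it needs no peer. The function name is made up:

```rust
use std::io;
use std::net::UdpSocket;

/// Calls `recv_from` once on a socket with no data waiting and
/// reports which kind of error came back.
fn recv_on_empty_nonblocking_socket() -> io::Result<io::ErrorKind> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // This run-time switch, not the type or the function name, decides
    // whether the call below blocks.
    socket.set_nonblocking(true)?;
    let mut buf = [0u8; 16];
    match socket.recv_from(&mut buf) {
        Ok(_) => Err(io::Error::new(io::ErrorKind::Other, "unexpected datagram")),
        Err(e) => Ok(e.kind()),
    }
}
```

With set_nonblocking(true), the call comes back immediately with ErrorKind::WouldBlock. Delete that one line and the very same recv_from call, on the very same Rust type, waits until a datagram arrives.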

This is bad design: BSD should have split recv into two system calls, recv and recv_nonblock. recv could error if given a non-blocking socket, and recv_nonblock could error if given a blocking one. Linux at least has a flag MSG_DONTWAIT that makes an individual recv call unconditionally non-blocking, but it’s non-standard. It’s not supported on macOS and tokio/mio understandably doesn’t use it.

Most of the time, this isn’t an issue. Sockets controlled through tokio or other async runtimes are always configured with the operating system to be non-blocking, as an invariant on those socket types. Sockets controlled through std or other libraries will be blocking, and will be contained in completely different Rust types. The Rust type system is used to keep track of the distinction even if the operating system won’t.

But this becomes an issue where these boundaries are broken, namely in conversion functions between them. These methods then have whether or not a socket is blocking as part of their contract. For example, the documentation for TcpStream::from_std says:

This function is intended to be used to wrap a TCP stream from the standard library in the Tokio equivalent. The conversion assumes nothing about the underlying stream; it is left up to the user to set it in non-blocking mode.

Thus, as a precondition of calling the from_std function, you must pass a “non-blocking” socket. If you instead did not set the socket as non-blocking – perhaps because you were making it with some extra options you needed, but assumed that tokio would handle the non-blocking part – bad things happen.

If blocking were considered a safety issue, this function would be marked unsafe. But it’s not, and so it’s simply an unchecked precondition – and we’re not used to those in Rust. Most safe functions check their preconditions, either returning a special value (like an Err) or panicking if something is wrong. The ones that don’t are typically marked unsafe. Unchecked preconditions still exist – they cause rogue behavior but not behavior deemed “unsafe” under Rust’s definition – but they are rare, and therefore surprising to a Rust programmer.

Why is it not a checked precondition? That’s easy to answer: Checking it would take an extra system call, as would unconditionally setting the socket to non-blocking inside from_std itself. System calls are slow, and that would be an unacceptable performance penalty for many applications.
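The same trade-off shows up in the standard library, outside of async code. As a rough sketch: slice::binary_search requires a sorted slice but does not check for one, since checking would cost a full pass over the data; on unsorted input the result is unspecified, though never memory-unsafe. The wrapper checked_search and the NotSorted type below are hypothetical helpers for illustration, not std APIs. They pay for the check and turn the precondition into an Err:

```rust
/// Error returned when the caller breaks the "slice is sorted" precondition.
#[derive(Debug, PartialEq)]
struct NotSorted;

/// Hypothetical checked variant: O(n) validation before the O(log n) search.
fn checked_search(haystack: &[i32], needle: i32) -> Result<Option<usize>, NotSorted> {
    // This extra pass is the price of checking, analogous to the extra system call.
    if !haystack.windows(2).all(|pair| pair[0] <= pair[1]) {
        return Err(NotSorted);
    }
    // binary_search itself trusts the caller; `.ok()` discards the insertion point.
    Ok(haystack.binary_search(&needle).ok())
}
```

The check here costs more than the operation it guards, which is the same reason a library might reasonably decline to check.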

This leads to a disappointing end result, though. It’s not enough to simply make sure you don’t call I/O methods unless they come with an async version. To be disciplined enough to be an async Rust programmer, you also have to watch out for these extra unchecked preconditions.

Otherwise, you get a hidden color bug that’s even harder to track down because the blocking functions you’re calling don’t look blocking. tokio calls recv, thinking it’s not blocking, but it is. You expect tokio to be correct, but because of this broken invariant, it isn’t. These sorts of issues can be very hard and time-consuming to debug.

Why Rust should only have provided `expect` for turning errors into panics, and not also provided `unwrap`

2022-07-14T00:00:00+00:00

UPDATE 2: I have made the title longer because people seem to be insisting on misunderstanding me, giving examples where the only reasonable thing to do is to escalate an Err into a panic. Indeed, such situations exist. I am not advocating for panic-free code. I am advocating that expect should be used for those functions, and if a function is particularly prone to being called like that (e.g. Mutex::lock or regex compilation), there should be a panicking version.

UPDATE: This post by Andrew Gallant, author of the excellent ripgrep, is a good overall discussion of the topic I am trying to address here. I basically entirely agree with it and recommend it as very educational; specifically, I disagree only in that I think that linting for unwrap is a good thing, for the reasons he acknowledges but ultimately does not find compelling in that section. In his own terms, I just think that the juice is worth the squeeze.

I see the unwrap function called a lot, especially in example code, quick-and-dirty prototype code, and code written by beginner Rustaceans. Most of the time I see it, ? would be better and could be used instead with minimal hassle, and the remainder of the time, I would have used expect instead. In fact, I personally never use unwrap, and I even wish it hadn’t been included in the standard library.

The simple reason is that something like expect is necessary and sometimes the best tool for the job, but it’s necessary rarely and should be used in the strictest moderation, just like panicking should be used in strictest moderation, and only where it is appropriate (e.g. array indexing, for reasons I elaborate on later). unwrap is too easy and indiscriminate, and using it at all encourages immoderate use.

This has turned out, much to my surprise, to be a somewhat controversial stance, and so I’d like to take some time to explain why I feel that way.

I’ll begin by reviewing what Result is and what options we have for dealing with its recoverable errors.

`Result`s and what to do with them

Rust is widely and rightly praised for its use of Result for recoverable error handling. Instead of using exceptions like C++, which propagate invisibly and surprisingly, or using sentinel values like NULL and -1, Rust has sum types and thus, a function can return a value that is either an error (of a specified, potentially narrow type) or the value we want:

#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

If we have a function call foo() that can fail and therefore returns a Result, we have a few different tactics we can use to handle it:

Ignore: We can ignore the return value, and therefore also ignore whether it errors. This is almost never what we actually want, and so the #[must_use] annotation on Result causes a warning to be issued:

foo(); // WARNING

Manual: We can manually match on the return value and do different things:

match foo() {
    Ok(value) => do_something(value),
    Err(err) => handle_error(err),
}

Propagate: We can propagate the error ergonomically using the ? operator. This makes Result work like exceptions in many of the good ways, while cluing in the reader to the additional control flow, which is good:

foo()?;

Panic with custom message: We can transform the error into an “unrecoverable” panic using expect, which takes a string argument which is used to customize the error message:

foo().expect("foo error");

Panic without custom message: We can transform the error into a panic with unwrap, which does not take a string argument and therefore leads to more generic error messages:

foo().unwrap();

Most of the time, in production code, we will want to go with the propagate option, especially in library code where the application will likely have a better notion of what to do with the error. This option makes the flow control clearer, tends to result in better error messages when the error messages are ultimately outputted, and gives the calling functions more options.

The manual option is useful even in a library for when an error is in fact recoverable at that particular point (e.g. by retrying). In an application, we’re often at a point where it makes sense to report the error (via a log message or console output or user-facing error message).

Sometimes, errors are in fact no big deal, and should be suppressed completely, but this is better expressed through the manual option at the application layer, with a comment explaining why the right thing is to do nothing.
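As a small sketch of that last point, here is what deliberate suppression through the manual option might look like. The function name remove_if_present is hypothetical; the std calls (fs::remove_file, ErrorKind::NotFound) are real. One specific error is swallowed, with a comment saying why, and everything else is still reported to the caller:

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Deletes a file, treating "it was already gone" as success.
fn remove_if_present(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        // Doing nothing is right here: the goal was for the file not to
        // exist, and it does not exist.
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(()),
        // Anything else (permissions, I/O failure) is still a real error.
        Err(err) => Err(err),
    }
}
```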

But sometimes (and only sometimes), panic is appropriate, and for that, there are two options, unwrap and expect. I always prefer expect for this, and pretend that unwrap doesn’t exist, because it makes panicking too easy. To explain, I’d like to discuss in what situations I think panics are appropriate.

When to panic?

Escalating an Err result to a panic should be done in similar situations to when panic is appropriate in general, which the Rust book offers some guidance on.

The most clear-cut case is when a code path is a logic error, when the error is only possible if the programmer has made a mistake and an invariant has been broken.

A typical example is array indexing. We often find ourselves with an array index that we (think we) know is valid and we want to use it to index an array, because we got it from looping or otherwise operating on the array bounds. We’re not so confident that we want to use unsafe and do an unchecked array access – that could result in a security vulnerability if we’re wrong – but it would also be nonsensical to try to recover from such an invalid access.

For array indexing, this is actually the most common scenario, and so the index operator in Rust actually panics for us if we specify an out-of-bounds index. An unsafe unchecked array indexing method is available, as is one that never panics and instead returns an Option, but most of the time we want the panicking checked operation and so that is the version that gets the syntactic sugar: arr[index] will neither memory corrupt nor return a recoverable error on an invalid index, but instead panic.
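For reference, the three flavors of indexing look roughly like this side by side. The wrapper names at, try_at, and at_unchecked are made up for this sketch; the operations inside them (the index operator, get, and get_unchecked) are the real slice APIs:

```rust
/// Panicking, bounds-checked access: what `arr[index]` gives us by default.
fn at(arr: &[i32], index: usize) -> i32 {
    arr[index]
}

/// Non-panicking access: an invalid index becomes a recoverable `None`.
fn try_at(arr: &[i32], index: usize) -> Option<i32> {
    arr.get(index).copied()
}

/// Unchecked access: no bounds check at all.
///
/// # Safety
/// The caller must guarantee that `index < arr.len()`.
unsafe fn at_unchecked(arr: &[i32], index: usize) -> i32 {
    // SAFETY: the caller upholds `index < arr.len()`, per the contract above.
    unsafe { *arr.get_unchecked(index) }
}
```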

This is definitely the best default for array indexing. But sometimes, logic errors result in recoverable errors, in Err (or its equivalent in the Option world, None). For example, maybe you have an array-like data structure in which only a get method is available, which returns an Option. If you were confident in your indexing, you would want to panic on None, and you can call expect or unwrap to make that happen.

It is certainly more ergonomic to write expect than to do the match manually, and less likely to lead to mistakes:

// With expect:
let val = arr.get(i).expect("i should be valid index");

// The equivalent manual match:
let val = match arr.get(i) {
    Some(val) => val,
    None => panic!("i should be valid index"),
};

Besides logic errors, panics are also relevant in test cases, where they are used to indicate test failure.

No need to panic: Propagation Made Easy

However, expect and unwrap – especially unwrap – are also amenable to overuse and misuse.

Perhaps you’re doing prototyping and just need something that works most of the time, or you’re writing a simple app with limited error-handling needs. Some people use unwrap and expect for this situation, but I don’t. I use ? even in that situation, because I never know when prototype code might have to escalate to production code – either so suddenly there’s no time for me to intervene and improve the error handling or so gradually there’s no occasion for it and it never gets prioritized. Fixing crappy usage of ? in such a situation is way easier and more likely to happen than fixing a bunch of expects or unwraps.

How can I prototype with ?? Doesn’t it require a lot of extra work, compared with unwrap? Honestly, not really. Writing Result<Foo> is not substantially harder than writing Foo for functions which can error. As for converting between error types, libraries like eyre and anyhow exist so that all errors can be included.
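Here is a minimal sketch of what that prototyping style can look like with nothing but the standard library. It uses Box<dyn Error> as a stand-in for what eyre or anyhow would provide (those crates offer more, such as context and reports), and the functions parse_port and parse_endpoint are hypothetical examples:

```rust
use std::error::Error;

/// One catch-all error type for prototype code.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Parses a non-zero port number; the `ParseIntError` converts automatically via `?`.
fn parse_port(text: &str) -> Result<u16> {
    let port: u16 = text.trim().parse()?;
    if port == 0 {
        // String literals convert into Box<dyn Error> too.
        return Err("port must be nonzero".into());
    }
    Ok(port)
}

/// Parses "host:port"; errors of different kinds all propagate with `?`.
fn parse_endpoint(text: &str) -> Result<(String, u16)> {
    let (host, port) = text.split_once(':').ok_or("expected host:port")?;
    Ok((host.to_string(), parse_port(port)?))
}
```

The only extra typing compared with the unwrap version is the Result in the signatures and the Ok(...) at the end, and the error handling can later be tightened without changing the control flow.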

Example code similarly can be written with ?. This is important because Rust is rapidly growing and has a lot of new programmers using it. They see that a function returns a thing, and want to get to the thing and don’t know how to, and they see unwrap in the example code and they cargo cult it. Even if they have learned a thing or two about Rust, it does have the perfect type signature for their problem, and so they jump on it, and end up using it in prototype code and then trying to use it in production code. Perhaps they know about ?, but it has a higher barrier to entry, and so they’ll procrastinate learning about it.

In these situations, unwrap provides an easy, ergonomic way of calling a function that might error, and so it’s very tempting, like walking through the grass when there’s a paved path available. However, ? is generally preferable to unwrap or expect, and so the relative easiness is misaligned to the order of preferences.

And unfortunately, once code has been written using unwrap or expect heavily, it’s hard to adapt it to use ? and propagation, especially if those interfaces have come to be relied upon.

Why I prefer `expect` to `unwrap`

There are definitely legitimate use cases for turning an error into a panic, but they are relatively rare, especially if the code is well-factored. Turning an error into a panic is also extremely tempting to abuse. The second situation is more common than the first, so in many codebases, the bad unwraps and expects, the sloppy “OK for now” ones or the “it’s just an example” ones, outnumber the legitimate use cases.

Raising the barrier to entry seems like a good solution, and expect seems like the perfect balance. The error string can also serve as documentation of why this decision was made, like comments for unsafety. The fact that expect is a little less ergonomic is a feature, as it discourages casual use. expect has enough convenience to encapsulate the concept of escalation from a “recoverable” error to an “unrecoverable” error, but not so much that it competes with ? in ergonomics.

expect’s error message can serve as a comment as to why the panic is justified. Comments are a good thing, and for as questionable an operation as escalating an Err to a panic, it’s useful to explain why we think it will never happen even if we think it’s obvious. Like the comments recommended for unsafe blocks, I think that expect is a situation that deserves some indication to the reader as to why the author thinks this is OK.

Why have this in the error message rather than just a comment? expect’s error message is also helpful in debugging. unwrap can give good error messages, printing the error value and providing a backtrace, but in other configurations and deployments you might not see a backtrace and the error value might not be useful. Some unwrap calls might provide good enough error messages sometimes, but it doesn’t work 100% of the time, so it can’t be relied upon – especially when expect is readily available. Especially in the case of a logic error, when the condition was thought impossible, debugging will already be hard, and the person doing the debugging needs all the help they can get.

Objections

When I’ve expressed my opinions about unwrap before, one objection stands out in my mind as particularly interesting and particularly valid. I say above that legitimate use cases for turning an Err into a panic are rare, which is generally true, but sometimes can seem false. There are certain APIs where it comes up a lot, APIs where Errs frequently are actually logic errors.

For example, regular expressions. The regex crate uses a method called new that is used to prepare regular expressions. It is practically always called on a constant string, making any failure a logic error, which should result in a panic, as discussed above. However, this same new method returns a Result, necessitating an unwrap or an expect to make the logic error into a panic. Am I seriously suggesting that the poor user write .expect("bad regular expression") instead of .unwrap() every time?

Well, that puts regex compilation in the same category as array indexing in my mind, and means that the default regex compilation function should panic on the user’s behalf (of course, the Result version should still be possible, just as get is a possible function for slices).

Similarly, when I’ve expressed my opinions about unwrap, some have assumed I’m opposed to panics altogether, and asked me if I used array indexing, implying that if I accept the possibility of panics in array indexing, I should accept the possibility of panics in unwrap as well.

For both of these objections, I want to clarify something: I’m not opposed to panicking in logic error situations. But that does not imply that unwrap is a good idea. Most Errs are not logic errors, and so converting one to a panic should be a little inconvenient, and should require the user to think enough to write an error message.

For those situations where an error is actually likely to be a logic error, such as array indexing or regex compilation, returning Result need not be the function’s default behavior. Perhaps the author of regex can make new panic on a compilation error, and another function can be written for when the regex in question was user inputted, or where a regex compilation error would not be a logic error.

In general, when you find yourself using expect or unwrap over and over again in the same way, and you’re sure it’s legitimate each time, do what you do with all smelly-seeming code if you know it’s actually the right thing in spite of the smell: Wrap it in an abstraction. Put it in a function that calls expect to panic on error.

This is not cheating. This new, panicking function would instead serve as a documentation for the fact that in this context, an Err is in fact likely to be a logic error, a tangible paper trail that someone made a conscious call that, as a policy, panicking is appropriate in this instance. The decision to panic instead of returning an Err in this situation is made in one place instead of many, where it can be explained in a detailed comment if desired, and where it certainly won’t be too much of a burden to use expect instead of unwrap. Even the fact of the function existing and having a panic-based interface is a signal from the library author that they have thought about this issue, and deemed the situation to be more analogous to array indexing than, say, a file-not-found.
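As a sketch of such a wrapper, take Mutex::lock, which returns an Err only when another thread panicked while holding the lock. The function lock_or_panic is a hypothetical helper, not part of std. It makes the decision to panic in one documented place:

```rust
use std::sync::{Mutex, MutexGuard};

/// Locks the mutex, panicking if it is poisoned.
///
/// Policy, decided once here rather than at every call site: in this
/// (assumed) codebase a poisoned lock means some thread already hit a
/// logic error, so there is nothing sensible to recover to.
fn lock_or_panic<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex
        .lock()
        .expect("mutex poisoned: another thread panicked while holding it")
}
```

Call sites then read as plain `lock_or_panic(&counter)`, with neither an unwrap nor a repeated message, and the justification lives next to the one expect that needs it.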

Tendencies and Statistics

In any case, array indexing and regex compilation are the exceptions, not the rule. Almost all bounds-check failures may be logic errors. Almost all regex compilation errors may be logic errors. Making these functions panic would indeed do little damage, as panicking is almost always the right move.

But – and this is a big “but” – most functions, when they return Err, genuinely are signalling recoverable errors, and unwrap doesn’t discriminate – it works equally well on all of them, in the inappropriate situations as well as the appropriate situations. With array indexing and regex compiling, the nature of the function being called gives some indication of why it’s a logic error; with unwrap, there is no indication.

Generally, this argument is in terms of statistics and human nature, not in terms of absolutes. Turning an Err into a panic should be rare, not necessarily in terms of how often it happens, but in how often it shows up in code. If it is common, either the programmer is using bad practices, and should be using better practices, or the API has a design flaw, and that needs to be fixed. In either case, expect is better than unwrap.

Ideally, we don’t get used to seeing expect and unwrap being used all the time. We don’t get used to casually panicking on Err, but instead treat panicking like an operation that should be considered carefully, whether once for all instances of a specific call (as in array indexing or regex construction), or on a case-by-case basis (for other uses of expect).

Humans are creatures of habit and lazy by nature. unwrap is a powerful tool, a way to get around the type system, and as such, we might find ourselves addicted to it. We should treat even expect as mildly suspicious, something only to be used with consideration, something to be wrapped behind an abstraction (as in the regex case). unwrap is even more dangerous, because it is easier, and given that legitimate usage of expect should be rare (again in terms of lines of code, not frequency of invocation) and hidden behind an abstraction when it is common in frequency of invocation, I see no need for unwrap to exist.

Context

I am aware that removing unwrap from Rust is not a viable option at this point, which is why I said that I wish it was never put in Rust to begin with. I am aware that unwrap is used in the Rust compiler, and that there is no consensus to avoid unwrap to the level that I avoid it.

I will however note that the documentation of unwrap comes with a warning not to use it. The warning is framed in terms of the fact that unwrap may panic, but the documentation of expect, where this is equally true, does not come with such a warning.

Conclusion

Escalating an Err to a panic is sometimes appropriate. But it should be a considered choice, either on a function-by-function basis (through a wrapper function calling expect or a different choice of interface), or on a case-by-case basis. In either case, unwrap makes it too easy.

Including an error message, and documenting why a panic is appropriate (either through the error message or separately) should not be too much to ask. If it is, that’s a code smell. The fact that expect is more difficult is a feature.

In this article I have mentioned only briefly the other motivation for using expect – better error messages for debugging. I thought the code smell argument was more important. But debuggability can be very important as well, so I’ll discuss it briefly here. I don’t think it’s safe to assume backtraces will always be available. I don’t think it’s safe to assume every use of unwrap will print a useful error message, even if it sometimes can. Maybe an individual use of unwrap in one context does not cause this problem, but once unwrap is established as acceptable, it opens the door for it to be abused.

I personally do not use unwrap, nor do I sign off on code that does. I even prefer expect("foo") to unwrap, because it signals that it’s off-the-cuff example code and shows that the person writing it knows that more consideration would be needed to put it into production. Please consider joining me in this approach.

If you do not want to implement so strict a policy, and you think I’m too extreme in this way, hopefully this article at least makes my argument clearer, and explains why I do not call unwrap but still feel comfortable indexing my arrays. Hopefully also this has given food for thought about Results, errors, and panics.

Edits

This post has been edited to clarify certain things, including a clarification in the opening to the post to make sure my overall position is easily comprehensible.

Fiction Review: Plain Truth

2022-07-06T00:00:00+00:00

I enjoyed Plain Truth by Jodi Picoult. I finished it a couple of months ago, when I was feeling very restless and impatient about everything going on in my life. At the time, I desperately needed fun books to read, but I was simultaneously having a lot of trouble finishing books.

This book pulled me the whole way through when other books were failing to: It was in a setting, the Amish communities, that had always interested me. It was competent enough dealing with that community to not drive me away. It made nuanced and smart enough points to keep me engaged, without being so subtle or so sophisticated as to be too heavy or dry or otherwise difficult to get through. All in all, the perfect balance for where I was just then.

This book juxtaposes two concepts that people wouldn’t normally associate with each other: the pacifistic, quaint, and well-respected Amish community; and the trend of young unwed mothers murdering their newborns, which was commonly discussed in the news at the time the book was written and which the author has discussed as inspiration.

The general theme of the book was that Amish people are just people. They’re not a monolith, and their culture, while it values conformity, doesn’t erase individual differences or interpersonal tension. The book managed to avoid the twin temptations of glorifying and fetishizing Amish culture on the one hand, and degrading it as cultish or criticizing it on the other. The differences are impactful but also nuanced and they’re morally complex.

There were a couple of minor details that got me, nerd as I am. Some of the Pennsylvania Dutch was misspelled, especially the name of the language which is Deitsch not Dietsch (pronounced with an “eye”-vowel). This made me laugh but didn’t detract from my enjoyment too much – though it did make me more aware that the Amish culture as depicted in the book was to a certain extent a fictional culture inspired by actual Amish culture rather than a documentation of it.

Another minor quibble: There was a scene where a judge wanted an Amish witness to swear an oath and they had to negotiate the accommodation of “affirmation” on the spot. “Affirmation” is a well-established accommodation for people who don’t swear oaths for religious reasons; it used to be much more common, is in the US Constitution, and is talked about in law school. The judge wouldn’t have had to have it explained to them and the lawyer wouldn’t have had to come up with it on the spot. I do concede, however, that how the book did it was more interesting.

Like all Jodi Picoult books, it came with a twist at the end. I shan’t spoil it, but I will say that it was interesting and emotionally challenging, and that it resonated well with, and contributed to, the previously-established themes.

All in all, a read that I enjoyed and needed at the time!

Another Confusing Haskell Error Message

2022-06-17T00:00:00+00:00

The Error Message

I’ve written before about just how befuddling Haskell error messages can be, especially for beginners. And now, even though I have some professional Haskell development under my belt, I ran across a Haskell error message that confused me for a bit, where I had to get help. It’s clear to me now when I look at the error message what it’s trying to say, but I legitimately was stumped by it, and so, even though it’s embarrassing for me now, I feel the need to write about how this error message could have been easier to understand:

frontend/src/Frontend/WordTiles.hs:87:25-45: error:
    • Could not deduce (HasDomEvent t () 'ClickTag)
        arising from a use of ‘domEvent’
      from the context: (DomBuilder t m, PostBuild t m, MonadHold t m,
                         MonadFix m)
        bound by the type signature for:
                   app :: forall t (m :: * -> *).
                          (DomBuilder t m, PostBuild t m, MonadHold t m, MonadFix m) =>
                          m ()
        at frontend/src/Frontend/WordTiles.hs:(70,1)-(76,9)
    • In the expression: domEvent Click submit
      In an equation for ‘click’: click = domEvent Click submit
      In the second argument of ‘($)’, namely
        ‘do inputText <- fmap value $ inputElement $ def
            submit <- el "button" $ text "Submit"
            let click = domEvent Click submit
            pure $ current inputText <@ click’
   |
87 |             let click = domEvent Click submit
   |                         ^^^^^^^^^^^^^^^^^^^^^

The code in question was in the Reflex FRP’s “widget” monad, defined as usual by a number of monad typeclasses:

app
  :: ( DomBuilder t m
     , PostBuild t m
     , MonadHold t m
     , MonadFix m
     )
  => m ()
app = do
    let
        start = Game [] wordSet "PIETY"
        moveAll word (gm, _) = move word gm
    rec
        game <- foldDyn moveAll (start, []) newWord
        gameDisplay game
        newWord <- fmap (fmap T.unpack) $ el "div" $ do
            inputText <- fmap value $ inputElement $ def
            submit <- el "button" $ text "Submit"
            let click = domEvent Click submit
            pure $ current inputText <@ click
    pure ()

My Confusion

Some of you might already see the problem, especially those who know Reflex. But I didn’t see it. My brain saw (HasDomEvent t () 'ClickTag) and completely misread it. I assumed it meant something like “with t as the tag, we can get the DOM event as 'ClickTag.” I assumed that the () was irrelevant to understanding the type, indicating some sort of optional type was not necessary to be provided.

I then tried to address this by adding (HasDomEvent t () 'ClickTag) to the context of app:

app
  :: ( DomBuilder t m
     , PostBuild t m
     , MonadHold t m
     , MonadFix m
     , HasDomEvent t () 'ClickTag
     )
  => m ()

It wasn’t the issue.

I had hoped this wasn’t the issue, but I thought it might be, and I had no idea what the issue actually was. Maybe we just needed to list all the DOM events t can handle, I had thought. I should’ve noticed it was t and not m, and I would expect m to be involved in such a context. I should have read the thing out loud in my head, and realized that it wasn’t t that didn’t have the DOM event of 'ClickTag, but (). But I didn’t. My eyes kind of glazed over at the complicated typeclass expression. I just didn’t think.

The Solution

The problem, a friend had to tell me, was nothing to do with t and everything to do with (). submit was not, as I had thought, a representation of the DOM element I had created with a button. To do that, you need to call el':

(submit, _) <- el' "button" $ text "Submit"
let click = domEvent Click submit
pure $ current inputText <@ click

submit, gotten from el, was actually of type (). And, of course, you can’t get any DOM event out of (), let alone a Click.

Better Error Messages

But while I left this situation with take-aways for myself, to better read Haskell error messages in the future, I was also frustrated at the Haskell compiler, especially in comparison to the Rust compiler I have gotten used to recently through my job.

List Involved Types

How on earth did it not indicate at all that (HasDomEvent t () 'ClickTag) was a problem with the type of submit? Sure, the constraint “arose” from the type of domEvent, but submit is clearly an important value involved in making the type not work.

This is easier to implement than a Haskell person might think. I understand that it’s unclear which type “caused” the problem from a human perspective. So why not list them all? Just a laundry list of inferred types would’ve been helpful: I would have seen that submit was of type (), and that would’ve helped me through the situation. Is that too much to ask? Something like this:

Related types:
domEvent :: HasDomEvent t a en => EventName en -> a -> Event t (EventResultType en)
Click :: EventName ClickTag
submit :: ()

Any two of those types would have given me the hint I needed. Really, either domEvent or submit would have enabled me to figure it out.

Warn About `()` Bindings

Similarly, how on earth was I allowed to write this line without a warning:

submit <- el "button" $ text "Submit"

submit is invariably (). Shouldn’t binding a () value be at least a warning? In what possible situation would you want to do that? I know that situations exist, especially situations where a type is sometimes (), but this type is invariably (), and I have -Wall turned on in this project. I want warnings for things that there are occasionally legitimate use cases for. Binding a name to (), especially when it’s from a function call and not literally let unit = (), has got to be a mistake 99 times out of 100.

This is apparently not a warning in Rust either, and I am confused by that, because Rust is normally better about its warnings:

fn foo() {
}

fn main() {
    let x = foo(); // Compiles without warning!
    drop(x);
}

I think it would be a reasonable and useful warning in both programming languages. The opposite situation already provokes a warning in Haskell, where you have an action in a do-block that returns a value and you implicitly ignore it:

[jim@palatinate:~/Writing/TheCodedMessage/content/posts]$ ghci -Wall
GHCi, version 8.8.4: https://www.haskell.org/ghc/  :? for help
Prelude> do { pure 'x'; pure () }

<interactive>:1:6: warning: [-Wunused-do-bind]
    A do-notation statement discarded a result of type ‘Char’
    Suppress this warning by saying ‘_ <- pure 'x'’
Prelude>

It only makes sense that the converse mistake, which is even more likely to be a mistake, also have a warning.

Conclusion

Error messages are an extremely important part of a programming language, both for adoption and for programmer efficiency. Part of the point in working in a strongly-typed language with a sophisticated type system, like Rust or Haskell, is supposed to be that we discover most of our problems through compiler error messages, rather than through runtime bugs. So most of our troubleshooting will happen at compile time, grappling with these error messages. This makes error messages in Haskell more important than in the average programming language, and makes the standard for good error messages even higher. We can do better than the status quo, and we should.

Command Line Interface UXes Need Love, Too

2022-06-16T00:00:00+00:00

It took me a long time to admit to myself that the venerable Unix command line interface is stuck in the past and in need of a refresh, but it was a formative moment in my development as a programmer when I finally did. Coming from that perspective, I am very glad that there is a new wave of enthusiasm (coming especially from the Rust community) to build new tools that are fixing some of the problems with this very old and established user-interface.

The Role of the Unix CLI Interface

To describe the Unix command line interface, “venerable” is definitely the right word: many programmers (including myself at some points of my life) have an awe of Unix and its role in computing history that has sometimes bordered on veneration.

Since the Unix operating system began development at Bell Labs in 1969, it has gone viral. That’s probably an understatement: Most modern operating systems descend from this original Unix, either directly through gradual code change (macOS and iOS are descended from it through BSD), or through Linux (the kernel behind most servers and behind Android and ChromeOS) and its accompanying usermode software (much of which was part of the GNU project), which were designed to work like Unix due to its familiarity for users and programmers.

Unix was and is billed not just as an operating system, but a philosophy. Among other things, its command line interface has been held up time and time again as an example of good design practices and an ideal realization of this philosophy, with its developer- and administrator-friendly orientation towards plain text files and with its modularity, especially as embodied in the concept of pipelining.

And as a result, when people say they know “the command line,” it’s almost certainly the Unix command-line interface that they’re talking about. And what’s more, many of us were taught it from texts that gushed about how great it is. But even the Unix command line interface, though part of a well-established standard, the topic of many books, and used by and intimately familiar to millions of programmers and admins across generations, is, in the end, just another computer interface for users and developers. And it has its flaws.

A Disappointing Ambiguity

As I alluded to before, when I was a much younger programmer, I had an awe-struck veneration for Unix. One of my colleagues at an early job in my career referred to me as our company’s “Unix philosopher.” While I wasn’t sure whether he meant it as a compliment, at the time, I took it as one.

The first flaw that really got my attention in the Unix command line had to do with the mv command. I’m going to take some time explaining this flaw in detail, as it’s somewhat subtle, and as discovering it was a formative moment for me in my development as a programmer.

mv, as many of you know, is short for “move.” And while its job indeed includes moving files from one place to another, due to idiosyncrasies of the Unix file system (if they can be called idiosyncrasies when most file systems followed Unix’s lead on this), moving files and renaming files are closely related operations under the hood, causing the mv command to be both the “move” command and the “rename” command:

# Assume a file called 'draft-file'
# Assume a directory called 'final-docs'

# Rename 'draft-file' to 'final-file' and put it in 'final-docs'
mv draft-file final-file # rename 'draft-file' to 'final-file'
mv final-file final-docs # move 'final-file' into 'final-docs' directory

# Alternatively, one step:
mv draft-file final-docs/final-file

As you can see, there is no distinction between these operations. There is no option that you must enable to get the “moving” feature as opposed to the “renaming” feature. And this can result in surprises, which are bad in software development.

Consider this command again:

mv draft-file final-file

What does it do? It changes the name of the file from draft-file to final-file, keeping it in the same directory, right? Well, probably, and that’s almost certainly what the user intended, but what if someone, accidentally or intentionally, had created a directory called final-file? That command would be interpreted instead as moving draft-file into the final-file directory:

$ # Rename operation
$ touch draft-file
$ ls
draft-file
$ mv draft-file final-file
$ ls
final-file
$ ls final-file
final-file
$ rm final-file
$
$ # Move operation
$ mkdir final-file # Imagine someone else did this, or it was done by accident
$ touch draft-file
$ mv draft-file final-file
$ ls
final-file
$ ls final-file
draft-file
$ rm final-file
rm: cannot remove 'final-file': Is a directory
$ rm -rf final-file

Notice that if there is no color-coding enabled, a simple ls command doesn’t even distinguish the two situations, so you can’t tell which one happened without issuing a more specific command, as ls also has a dual role: it can either show you the names of the files you specify, if they are present, or it can show you the files in a directory you specify. The -d option disambiguates that you want the names and not the contents, but the default is still ambiguous.

In the case of the mv command, this potentially could even be a security vulnerability in a shell script (which is admittedly not a very secure platform). It is in any case an unnecessary complication.

The GNU version of mv has a -T option to indicate that the destination is not to be interpreted as a directory to put things in, and a -t option to show unambiguous intent for a target directory to be used. But these are extensions; the POSIX standard manual page for mv doesn’t mention them.

And while these GNU extensions are helpful, especially in scripts that you know will only be run with the GNU version of mv (that is, not on macOS), I don’t think they go far enough. Most people don’t know about them, and the possibility of surprise is still there.

Disillusioned

When I realized this, it created a huge hole in my previous (admittedly unreasonable) esteem for the Unix command line interface. I realized that the ideal solution was something impractical, almost unthinkable to the younger version of me: mv should be deprecated in favor of two commands, one to do renaming, and one to do targeted directory-dropping.
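
As a rough sketch of what those two separate operations might look like, here is a small Rust version. The names rename_only and move_into are made up for illustration, and the check-then-rename sequence is not atomic, so this shows the interface idea rather than a hardened tool:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical "rename only" operation: refuses to run if `to` is an
/// existing directory, so it can never silently turn into a move.
fn rename_only(from: &Path, to: &Path) -> io::Result<()> {
    if to.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "destination is a directory; refusing to move into it",
        ));
    }
    fs::rename(from, to)
}

/// Hypothetical "move into directory" operation: requires `dir` to be an
/// existing directory and keeps the file's own name.
fn move_into(from: &Path, dir: &Path) -> io::Result<PathBuf> {
    if !dir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "destination is not a directory",
        ));
    }
    let name = from
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "source has no file name"))?;
    let target = dir.join(name);
    fs::rename(from, &target)?;
    Ok(target)
}

fn main() -> io::Result<()> {
    // Scratch directory so the demo does not touch real files.
    let base = std::env::temp_dir().join(format!("mv-sketch-{}", std::process::id()));
    fs::create_dir_all(base.join("final-docs"))?;
    fs::write(base.join("draft-file"), b"")?;

    rename_only(&base.join("draft-file"), &base.join("final-file"))?;
    let moved = move_into(&base.join("final-file"), &base.join("final-docs"))?;
    println!("file now lives at {}", moved.display());

    fs::remove_dir_all(&base)
}
```

With this split, a stray directory named final-file makes the rename fail loudly instead of quietly changing what the command means.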

This glitch in the mv command is just a gotcha to be aware of, one of many minor flaws to dance around when shell scripting. But I remember it strongly, because rather than being warned about it in a book, I discovered it myself, and therefore it was the distinct moment I realized that the command line interface would need to be improved at some point. And once the metaphorical levee was broken, I started noticing many inconveniences and problems in the traditional Unix CLI tools, often more relevant to my day-to-day workflow than this minor gotcha.

I ultimately came to read more critical sources about Unix, such as the famous UNIX-HATERS Handbook, and similar sources that emphasized the problems. And I’m very glad I went through this process, because before this, I was a naive CLI user and shell-scripter, trusting the system way more than I should, leaving myself open to serious problems.

Many Unix commands have gotchas and inconveniences, some I knew about before this revelation and brushed aside, others that I found out about later. tar has its idiosyncratic traditional syntax that many, many scripts (and people) still use, and inconsistency between platforms on whether you need -z to unpack a compressed archive. The way the shell itself worked also contained gotchas: What happens if you have files whose names start with a -? (Answer: Their names get misinterpreted as options, even if you didn’t type them but simply included them accidentally in a wildcard expansion.)

Among the more practical issues that affect me, I want to emphasize two in particular: Why is find’s syntax so gnarly, so that you have to type out -name and explicitly specify the current directory? Why is it so hard to get grep to not display the pages-long lines of minified JavaScript or similar files when I want to only display the shorter lines from actual source files?

The Future

Luckily, improvement is on its way. For the last two cherry-picked examples, there are new re-conceptions of find and grep -r that fix them (with new names, of course, so they’re not beholden to interface-compatibility), and I recommend them (dare I say such blasphemy?) over the traditional equivalents:

These are fd-find (for find) and ripgrep (for grep -r). Don’t let their long names dissuade you; they are commonly installed as fd and rg, respectively, and come with such modern features as:

Normal command line syntax (fd)
Integration with git, the de facto standard version control system, by ignoring .git and .gitignore’d files by default (both)
Line length maximums (rg)
Modern leveraging of multithreading (both)
Better performance than their traditional counterparts

These are the only new Rust-based commands I’ve tried, but they’ve already vastly improved my workflow, so that I miss having them (fd especially) when SSH’d into relatively minimalist embedded devices. And I have reason to hope there are more gems out there as part of this explosive movement to implement new Rust-based commands.

Whether people are doing this to improve their Rust chops, or because they’ve felt a need for a long time and Rust is just their PL of choice, it’s good to see some actual evolution in my day-to-day experience as a Unix CLI user. It hasn’t fixed mv – yet – but it’s good to see it evolving.

On the implementation side of things, I am also very happy to see a Rust project to reimplement the standard coreutils. The C implementations undoubtedly leave some performance and stability on the table, and a new implementation is long overdue. A fresh implementation of these utilities will hopefully also spark improvements to the interfaces.

And Meanwhile, in `git`-land

On a related positive note, I learned very recently (in 2022) that git has (in 2019) fixed a problem similar to mv’s: git checkout, ambiguous in a similar way, has been rendered unnecessary by the less ambiguous git switch and git restore.

Why I Won't Correct You're Grammar (unless you ask)

2022-06-14T00:00:00+00:00

I am an Ivy League-educated professional who regularly has to write for my job, who was always in the top English classes in school. And sometimes, I mix up “your” and “you’re.”

I know how grammar works. I always, if I stop to think about it, can figure out which one to use. I know all the tricks. Most of the time, I don’t have to think about it, and the right one comes out. But sometimes, I’m just thinking in terms of what sounds I would make if I were speaking, and I’m in a rush or just distracted or just glitching, and the wrong one comes out.

What’s my point? My point is that written English conventions are hard and unnatural, that even a very educated native speaker can mess them up, even one who writes all the time. Not only are there sets of homophonous grammar words which are super-easy to mess up – such as “you’re” vs “your,” “they’re” vs “their” vs “there” – we also have one of the few spelling systems complicated enough that using it is a competitive national sport. If you were trying to make a language hard to write in on purpose, I’m not sure you’d do better than English.

If written English is basically impossible to get right all the time, even for the most educated native speakers, what about everyone else? Most people are not Ivy League-educated native English speakers. A lot of people learn English in adulthood, or at least later in childhood. A lot of people grow up speaking non-standard dialects. A lot of people simply don’t get the educational opportunities I have had – or simply choose to focus on other skills in life.

So, I say, let’s not use “your”/“you’re” as a value judgment or a sign of stupidity. Obviously if your friend gives you something to copy-edit and they use the wrong one, fix it, but especially in informal settings like social media and text, let’s maybe not make it out to be a bigger deal than it is?

Descriptivism and Prescriptivism

Is this the dreaded “descriptivism”? Perhaps it is, at least in the sense in which that term is bandied about in popular culture.

In this dramatized popular conceptualization, descriptivists and prescriptivists are different camps opposing each other, aligned with our larger societal culture war:

Descriptivists: Made up of linguistics professors and self-appointed activists, the descriptivists align with the culture-war liberals, upholding diversity. They think every form of speech and grammar is equally valid, especially those of underprivileged communities.
Prescriptivists: Made up of English teachers and self-appointed grammarians, the prescriptivists align with culture-war conservatives, upholding tradition. They believe that there is one true system of English grammar, that everyone should aspire to, be taught, and be socially pressured into adhering to.

As you may have guessed from my descriptions, I think this way of looking at the issue is silly. Rather than dividing people into camps, I think there is a more descriptive way of using these terms. I instead would prescribe definitions that focus on attitudes that any person can adopt:

Descriptivism: the attitude of science. If we are acting as linguists, as Sprachwissenschaftler or “language scientists,” then we want to study the amazing fact that humans naturally develop and perpetuate intricate systems for turning sounds into words into sentences. For this goal, all dialects (and sociolects and idiolects) are equally valid, because all of them can teach us more about how language works.
Prescriptivism: the attitude of conventionality. If we are acting as professional writers or speakers or copy-editors, then we want to make sure that we and those we work with can communicate in a way that is comprehensible by our audience, follows the rules of grammar and spelling and punctuation that our audience expects, so that writers and speakers can signal that they take the situation appropriately seriously and so that grammar doesn’t distract from communication. For these goals, what is “valid” (or better put, appropriate) is often the standard, conventionalized, and academic prestige forms of a language.

With these definitions in mind, it is possible for one person to take on different stances in different contexts. An English teacher can use descriptivism when they want to understand why their students speak and write a certain way, or struggle with conventions in comparison to their peers: Is it a problem grasping the concepts, or is it because they speak a different dialect or sociolect from the other students in their class? In the same class, they can take on a more prescriptivist stance when they set their goals and standards for how the students should ultimately learn to write and speak in formal situations.

Linguistics researchers often study more stigmatized dialects or common non-standardisms in speech, and dispel stereotypes about them that are not based in fact. They then write the resultant papers in immaculate academic formal language. This, of course, makes no sense if you think of prescriptivism and descriptivism as camps, but there is no contradiction here. It makes perfect sense if you instead think of prescriptivism and descriptivism as attitudes, appropriate in different situations.

But my point in this article is not about science, teaching, or professional communication. My point is about what stance to take in everyday communication. And in everyday communication, neither prescriptivism nor descriptivism is appropriate. Unwanted grammar corrections are rude, and so is unwanted field linguistics. The appropriate stance to take in everyday communication is neither descriptivism nor prescriptivism, but rather politeness.

Sure, descriptivism is sometimes used as an argument here. It can be scientifically demonstrated that informal forms of English and even sentences like “It don’t do nothing,” in the dialects where they arise, have their own internal logic, just as sophisticated from an objective standpoint as more prestigious forms of English – even though that has very little to do with the “your”/“you’re” distinction and purely written distinctions like it. Many long-established shibboleths of the ostentatiously grammar-conscious can be shown to have little basis in history or established usage – but “your”/“you’re” is a pretty well-established distinction. We could even find scientific studies to show how much more difficult the English writing system is than those of other languages, but even if it wasn’t, the rules of politeness wouldn’t change.

Science and research may be useful for answering questions like “how can we most effectively teach children?” But it doesn’t really have any bearing on whether we should give people unwanted grammar corrections or use their grammar to judge them. In this, common sense and politeness win the day, and they simply say: “No.” Or perhaps rather: “Please don’t.”

Language “Decay”

There is one objection that I think is worth addressing, that comes from a place other than raw snobbery. It goes something like this:

But Jimmy! What if this social pressure serves a good purpose? What if it prevents language change, allowing our language to stay as it is for longer, connecting us with writers of the past? What if we really like the way English is and don’t want it to change, for aesthetic or cultural reasons, or the belief that the language is in some way particularly well-suited for use as it is?

This objection (phrased differently) was raised when I posted an earlier, shorter version of this essay to Facebook. And I’m not sure what to do with it. I suspect some people think that this is a worthy goal, an upside of grammar corrections to be balanced against the politeness elements. Others, I imagine, see it as a silver lining or a subsidiary purpose to our prescriptions and our societal elevation of relatively conservative conventions and grammar norms.

For me, there are two considerations here. One is whether this works at all. Of course, in the long term, it is futile; English will change eventually, slowly but surely, as it has before, whatever we do. But maybe prescriptive grammar slows down language change.

Does it? Probably, but I think people overestimate the effect. And I think the effect is almost entirely accomplished in those situations where prescriptivism is appropriate, namely, copy-editing and education. I simply don’t think people correcting their acquaintances’ text messages for them or judging their grammar online does much to keep the language from changing.

But even if judgmentalism (and fear of judgmentalism) does slow down language change, why does it matter? English as it is now isn’t that special. It’s just a language, like any other. Whatever it evolves into will suit humanity and society’s purposes equally well. Heroic efforts to try and stop the inevitable are certainly not worth the rudeness that often comes along with them.

I personally am not attached to the current form of English, but there is a way in which I can relate. I do remember being upset at some of the changes that are happening (very slowly) in the German language, but then I was comforted by something: Unless something drastically increases my life-span, German as I’ve learned it will remain a valid and prestigious way to speak German (modulo my mistakes and accent), for the rest of my days.

Perhaps my children’s children will live in a world where German is drastically and unrecoverably different – if I even have children who then have children – but that will be their problem, and I’m sure their opinions will differ from mine.

Trivia About Rust Types: An (Authorized) Transcription of Jon Gjengset's Twitter Thread

2022-06-06T00:00:00+00:00

Preface (by Jimmy Hartzell)

I am a huge fan of Jon Gjengset’s Rust for Rustaceans, an excellent book to bridge the gap between beginner Rust programming skills and becoming a fully-functional member of the Rust community. He’s famous for his YouTube channel as well; I’ve heard good things about it (watching video instruction isn’t really my thing personally). I have also greatly enjoyed his Twitter feed, and especially have enjoyed the thread surrounding this tweet:

Okay, learning time! Name a @rustlang type (can be generic), and I’ll (try to) tell you something you didn’t know about that type!

What great fun!

I immediately felt that this thread should have a transcription outside of social media (Jon Gjengset already did a Reddit transcription), and so I asked him if he had any plans to turn it into a blog post, and failing that, whether I could. Much to my surprise, he gave me the go-ahead.

So I have done so, and this is the blog post! It wasn’t even boring, because I learned so much as I copied the entries! Minor edits have been made to add formatting and adapt links to how blogs work rather than how Twitter works. This is taken from the Reddit version. My markdown source is also available.

So, without further ado, Jon Gjengset’s “Trivia About Rust Types.”

Trivia About Rust Types (by Jon Gjengset)

`std::fmt::Debug`

Did you know that the Formatter argument to Debug::fmt makes it really easy to customize debug representations for structs, enums, lists, and sets? See the debug_* methods on it.

`Formatter`

Did you know that std::fmt::Formatter is super easy to use if you want more control over debugging for a custom type? For example, to emit a “list-like” type, just Formatter::debug_list().entries(self.0.iter()).finish().
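
A minimal sketch of that list-like case, assuming a made-up wrapper type called Stack that holds a Vec<u32>:

```rust
use std::fmt;

// `Stack` is a hypothetical newtype, used only for illustration.
struct Stack(Vec<u32>);

impl fmt::Debug for Stack {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render the wrapper as a plain list, hiding the `Stack(...)` layer.
        f.debug_list().entries(self.0.iter()).finish()
    }
}

fn main() {
    println!("{:?}", Stack(vec![1, 2, 3])); // prints [1, 2, 3]
}
```

Because the helpers go through the Formatter, the pretty-printing flag ({:#?}) keeps working without extra code.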

`Option<T>`

Did you know that Option<T> implements IntoIterator yielding 0/1 elements, and you can then call Iterator::flatten to make that be 0/n elements if T: IntoIterator?
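
A short sketch of the 0/n behaviour, using a hypothetical helper named flatten_option:

```rust
// Hypothetical helper: turns "maybe a list" into a plain list.
fn flatten_option(maybe: Option<Vec<i32>>) -> Vec<i32> {
    // The Option iterates over zero or one Vec; flatten() then yields
    // each element of that Vec, if there is one.
    maybe.into_iter().flatten().collect()
}

fn main() {
    println!("{:?}", flatten_option(Some(vec![1, 2, 3]))); // prints [1, 2, 3]
    println!("{:?}", flatten_option(None)); // prints []
}
```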

`type EmptyTupleList = Vec<()>`

Did you know that since () is a zero-sized type, and the vector never actually has to store any data, the capacity of Vec<()> is usize::MAX!
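
This is easy to check directly. The function name unit_vec below is just for illustration:

```rust
// Returns the capacity of a fresh Vec<()> and its length after two pushes.
fn unit_vec() -> (usize, usize) {
    let mut v: Vec<()> = Vec::new();
    let capacity_when_empty = v.capacity();
    // Pushing zero-sized values only bumps the length; nothing is stored.
    v.push(());
    v.push(());
    (capacity_when_empty, v.len())
}

fn main() {
    let (capacity, len) = unit_vec();
    println!("capacity = {capacity}, len = {len}");
}
```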

`T`

Did you know that T doesn’t imply ownership? When we say a type is generic over T, that T can just as easily be a reference to something on the stack, and the type system will still be happy. Even T: 'static doesn’t imply owned — consider &'static str for example.

[Reminds me of this excellent article -Jimmy]

`std::sync::mpsc::channel::Sender`

Did you know that std::sync::mpsc has had a known bug since 2017, and that the implementation may actually be replaced entirely with the crossbeam channel implementation? https://github.com/rust-lang/rust/pull/93563

`u128`

Did you know that even though we got u128 a long time ago now, we still don’t have repr(u128)? https://github.com/rust-lang/rust/issues/56071

`std::ffi::OsString`

Did you know that there are per-platform extension traits for OsString that bake in the assumptions you can safely make on that platform? Such as strings being [u8] on Unix and UTF-16 on Windows.

`std::ptr::NonNull`

Did you know that one of the super neat features of NonNull is that it enables the same niche optimization that regular references and the NonZero* types get where Option<NonNull<T>> is the same size as *mut T?
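
A quick way to see the niche optimization is to compare sizes. The helper name pointer_sizes is made up for this sketch:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Returns the sizes of (*mut u8, Option<NonNull<u8>>, Option<*mut u8>).
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut u8>(),
        size_of::<Option<NonNull<u8>>>(),
        size_of::<Option<*mut u8>>(),
    )
}

fn main() {
    let (raw, opt_non_null, opt_raw) = pointer_sizes();
    // NonNull can use the null address to represent None, so no extra space.
    assert_eq!(raw, opt_non_null);
    // A raw pointer may legitimately be null, so Option needs a separate tag.
    assert!(opt_raw > raw);
}
```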

`Cow<T>`

Did you know that there used to be a special IntoCow trait, but it was deprecated before 1.0 was released! https://github.com/rust-lang/rust/issues/27735

`Box<T>`

Did you know that Box<T> is a #[fundamental] type, which means that it’s exempt from the normal rules that don’t allow you to implement foreign traits for foreign types (assuming T is a local type)?

`std::process::Child`

Did you know that std has three different ways to spawn a child process on Linux (posix_spawn, clone3/exec, fork/exec) depending on what capabilities your kernel version has?

`Pin<T>`

Did you know that the name Pin (and the name Unpin) were both heavily debated? Pin was almost called Pinned, for example. The discussion is an interesting read now after the fact.

`Vec<T>`

Did you know that Vec::swap_remove is way faster than Vec::remove if you can tolerate changes to ordering?
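
The ordering trade-off looks like this in a small sketch (remove_both_ways is a made-up name):

```rust
// Removes index 0 from two copies of the same Vec, once with each method.
fn remove_both_ways() -> (Vec<char>, Vec<char>) {
    let mut ordered = vec!['a', 'b', 'c', 'd'];
    let mut unordered = ordered.clone();

    // remove() shifts every later element one slot to the left: O(n).
    let _first = ordered.remove(0);
    // swap_remove() moves the last element into the hole: O(1).
    let _also_first = unordered.swap_remove(0);

    (ordered, unordered)
}

fn main() {
    let (ordered, unordered) = remove_both_ways();
    println!("{ordered:?}"); // prints ['b', 'c', 'd']
    println!("{unordered:?}"); // prints ['d', 'b', 'c']
}
```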

Did you know that the smallest non-zero capacity for a Vec<T> depends on the size of T?

`CStr`

Did you know that CStr::default creates a CStr that points to a const string "\0" stored in the binary text segment, which means all default CStrs point to the same (non-null) string!

`for<'a> SomeTrait<'a>`

Did you know that you can use for<'a> to say that a bound has to hold for any lifetime 'a, not just a specific lifetime you happen to have available at the time. For example, the bound for<'a> &'a T: Read says that any shared reference to a T must implement Read.
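
A sketch of such a bound in use. The helper read_all_shared is hypothetical, and File is used only because the standard library implements Read for &File:

```rust
use std::io::Read;

// Hypothetical helper: reads everything through a shared reference.
// The bound says `&'a T` must implement Read for every lifetime 'a.
fn read_all_shared<T>(source: &T) -> std::io::Result<Vec<u8>>
where
    for<'a> &'a T: Read,
{
    let mut buf = Vec::new();
    let mut reader: &T = source;
    Read::read_to_end(&mut reader, &mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // Scratch file in the temp directory for the demo.
    let path = std::env::temp_dir().join(format!("hrtb-sketch-{}.txt", std::process::id()));
    std::fs::write(&path, b"hello")?;

    let file = std::fs::File::open(&path)?;
    let bytes = read_all_shared(&file)?;
    assert_eq!(bytes, b"hello");

    std::fs::remove_file(&path)
}
```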

This monstrous warp type

Did you know that the trailing commas you see in some places in there, ,), are to distinguish one-element tuples from regular parenthetical expressions?

`FnOnce`

Did you know that until Rust 1.35, you couldn’t call a Box<dyn FnOnce> and needed a special type (FnBox) for it! This was because it requires “unsized rvalues” to implement, which are still unstable today. https://github.com/rust-lang/rust/issues/28796 + https://github.com/rust-lang/rust/issues/48055

`f32`

Did you know that in Rust 1.62 we’ll get a deterministic ordering function for floating point numbers? https://github.com/rust-lang/rust/pull/95431
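
Assuming a toolchain where f32::total_cmp is available (1.62 or later), a sketch of sorting floats with it, using a made-up helper named sorted:

```rust
// Sorts floats with the IEEE 754 total order instead of partial_cmp.
fn sorted(mut values: Vec<f32>) -> Vec<f32> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}

fn main() {
    // Under the total order, -0.0 sorts before +0.0.
    let result = sorted(vec![1.0, 0.0, -0.0, -1.0]);
    println!("{result:?}");
}
```

Unlike partial_cmp, there is no Option to unwrap, so a stray NaN cannot make the comparison panic.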

`Arc<T>`

Did you know that Arc has a make_mut method that effectively gives you copy-on-write? Given a &mut Arc<T>, it will either give you &mut T if there are no other Arcs, or it will clone T, make the Arc<T> point to that new T, and then give you a &mut to it!
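
Both branches show up in a small sketch (copy_on_write is a made-up name):

```rust
use std::sync::Arc;

fn copy_on_write() -> (Arc<Vec<i32>>, Arc<Vec<i32>>) {
    let mut a = Arc::new(vec![1, 2, 3]);

    // Only one Arc exists, so this mutates the Vec in place.
    Arc::make_mut(&mut a).push(4);

    let b = Arc::clone(&a);

    // Now the Vec is shared, so make_mut clones it before mutating;
    // `b` keeps pointing at the old contents.
    Arc::make_mut(&mut a).push(5);

    (a, b)
}

fn main() {
    let (a, b) = copy_on_write();
    println!("{a:?} {b:?}"); // prints [1, 2, 3, 4, 5] [1, 2, 3, 4]
}
```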

`!`

Did you know that std::convert::Infallible is the “original” !, and that the plan is to one day replace Infallible with a type alias for !?

`fn`

Specifically, did you know that the name of a function is not an fn? It’s a FnDef, which can then be coerced to a FnPtr?
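
One observable consequence is size: a function item is a zero-sized value, while a function pointer is pointer-sized. A small sketch, with double and sizes as made-up names:

```rust
use std::mem::size_of_val;

fn double(x: i32) -> i32 {
    x * 2
}

// Returns (size of the function item, size of the function pointer).
fn sizes() -> (usize, usize) {
    // `item` has the unique, zero-sized type of the function `double`.
    let item = double;
    // The annotation forces a coercion to an ordinary function pointer.
    let pointer: fn(i32) -> i32 = double;
    (size_of_val(&item), size_of_val(&pointer))
}

fn main() {
    let (item_size, pointer_size) = sizes();
    println!("item: {item_size} bytes, pointer: {pointer_size} bytes");
}
```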

`PhantomData`

Did you know that it’s actually kind of tricky to define PhantomData yourself: https://github.com/dtolnay/ghost

`u32`

Did you know that u32 now has associated constants for MIN and MAX, so you no longer need to use std::u32::MIN and can use u32::MIN directly instead?

`bool`

Did you know that bool isn’t just “stored as a byte”, the compiler straight up declares its representation as the same as that of u8?

`Any`

Did you know that Any is really non-magical? It just has a blanket implementation for all T that returns TypeId::of::<T>(), and to downcast it simply compares the return value of that trait method to see if it’s safe to downcast to a type! TypeId is magic though.
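
From the caller's side, that TypeId comparison is what sits behind downcast_ref. A sketch with a hypothetical describe function:

```rust
use std::any::Any;

// Hypothetical helper: reports what concrete type is hiding behind `dyn Any`.
fn describe(value: &dyn Any) -> String {
    // Each downcast_ref succeeds only if the stored TypeId matches.
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<&str>() {
        format!("str: {s}")
    } else {
        "unknown".to_string()
    }
}

fn main() {
    println!("{}", describe(&7_i32)); // prints i32: 7
    println!("{}", describe(&"hi")); // prints str: hi
    println!("{}", describe(&1.5_f64)); // prints unknown
}
```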

`Self`

Did you know that fn foo(self) is syntactic sugar for fn foo(self: Self), and that one day you’ll be able to use other types for self that involve Self, like fn foo(self: Arc<Self>)? https://github.com/rust-lang/rust/issues/44874

`()`

Did you know that () implements FromIterator, so you can .collect::<Result<(), E>> to just see if anything in an iterator erred?

[Note that this doesn’t say whether or not this is a good idea. -Jimmy]
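
A sketch of the pattern, with all_parse as a made-up helper that only cares whether every string parses:

```rust
use std::num::ParseIntError;

// Succeeds only if every input parses as an i32; stops at the first error.
fn all_parse(inputs: &[&str]) -> Result<(), ParseIntError> {
    inputs
        .iter()
        // Throw away the parsed value, keeping only success or failure.
        .map(|s| s.parse::<i32>().map(|_| ()))
        .collect()
}

fn main() {
    println!("{}", all_parse(&["1", "2", "3"]).is_ok()); // prints true
    println!("{}", all_parse(&["1", "two"]).is_ok()); // prints false
}
```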

`struct S`

Did you know that struct S implicitly declares a constant called S, which is why you can make one using just S?

`RefCell`

Did you know that RefCell allows you to replace a value in-place directly (like std::mem::replace)? https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.replace

`core::num::Wrapping`

Did you know that there used to also be a trait accompanying Wrapping, WrappingOps, that was removed last minute before 1.0? https://github.com/rust-lang/rust/pull/23549

`*const T`

Did you know that, at least for the time being, *const T and *mut T are more or less equivalent? https://github.com/rust-lang/unsafe-code-guidelines/issues/257

`std::os::unix::net::UnixStream`

Did you know that (on nightly) you can pass UNIX file descriptors over UnixStreams too, and thereby give another process access to a file it may not otherwise be able to open?

`std::sync::Condvar`/`Mutex`

Did you know that Mara is doing some awesome work on making Condvar (and Mutex and RwLock) much better on a wide array of platforms? https://github.com/rust-lang/rust/issues/93740

`std::task::Waker`

Did you know that Waker is secretly just a dyn std::task::Wake + Clone done in a way that doesn’t require a wide pointer or support for multi-trait dynamic dispatch? See https://doc.rust-lang.org/std/task/struct.RawWakerVTable.html

`impl Trait`

Did you know that impl Trait in argument position and impl Trait in return position represent completely different type constructs, even though they “feel” related? https://doc.rust-lang.org/nightly/reference/types/impl-trait.html

`BTreeMap<K, V>`

Did you know that BTreeMap is one of the few collections that still doesn’t have a drain method? https://github.com/rust-lang/rust/issues/81074

`struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);`

Did you know that PhantomData<T> has variance like T, and *mut T is invariant over T, and so by placing a lifetime inside T you make the outer type invariant over that lifetime?

`Rc<T>`

Did you know that the Rc type was among the arguments for why std::mem::forget shouldn’t be marked as unsafe? https://github.com/rust-lang/rust/issues/24456

`std::future::Ready`

Did you know that these days you can just use async move { x } instead of future::ready(x). The main reason to still use future::ready(x) is that you can name the future it returns, which is harder with async (without type_alias_impl_trait that is).

`usize`

Did you know that usize isn’t really “the size of a pointer”. Instead, it’s more like “the size of a pointer address difference”, and the two can be fairly different! https://github.com/rust-lang/rust/issues/95228

`std::thread::Thread`

Did you know that the ThreadId that’s available for each Thread is entirely a std construct? Creating a ThreadId simply increments a global static counter under a lock.

`std::ops::ControlFlow`

Did you know that ControlFlow is really a stepping stone towards making ? work for other types than Option and Result? The full design has gone through a lot of iterations, but the latest and greatest is RFC3058.

`File`

Did you know that there are implementations of Read, Write, and Seek for &File as well, so multiple threads can share a single File and call those concurrently. Whether they should is a different question of course.

`Result<T, E>`

Did you know that Rust originally (pre-1.0) had both Result and an Either type? They decided to remove Either way back in 2013.

`Cow<str>`

Did you know that because Cow<'a, T> is covariant in 'a, you can always assign Cow::Borrowed("some string") to one no matter what it originally held?

`PanicInfo`

Did you know that since PanicInfo is in core, its Display implementation cannot access the panic data if it’s a String (since it can’t name that type), so trying to print the PanicInfo after a std::panic::panic_any(format!("x y z")) won’t print "x y z"? Source link.

`std::ffi::c_void`

Did you know that the whole c_void type is a collection of hacks to try to work around the lack of extern types? https://github.com/rust-lang/rust/issues/43467

`#[feature(raw_ref_op)] &raw const T`

Definitely cheating :p But did you know that originally the intention was to have &const raw variable be just a MIR construct and let &variable as *const _ be automatically changed to &const raw? https://github.com/RalfJung/rfcs/blob/fd4b4cd769300cfde5d54865d227990b71b762d1/text/0000-raw-reference-operator.md

`u256`

Did you know that because Rust compiles through LLVM, we’re sort of constrained to the primitive types LLVM supports, and LLVM itself only goes up to 128?

`_`

Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488
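
In current Rust the answer is that let _ = x does not move x, which a tiny sketch can confirm (still_usable is a made-up name):

```rust
fn still_usable() -> String {
    let s = String::from("hello");
    // `_` is a pattern that binds nothing, so `s` is not moved here.
    let _ = s;
    // Had this been `let _t = s;`, the next line would not compile.
    s
}

fn main() {
    println!("{}", still_usable()); // prints hello
}
```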

`MaybeUninit`

Did you know that MaybeUninit arose because the previous mechanism, std::mem::uninitialized, produced immediate undefined behavior when invoked with most types (like uninitialized::<bool>()).

`struct T<const C: usize>`

Did you know that with Rust 1.59.0 you can now give C a default value?
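
A sketch of a default const parameter, using a hypothetical Buffer type whose size defaults to 16:

```rust
// `Buffer` is a made-up type; N defaults to 16 when it is not written out.
struct Buffer<const N: usize = 16> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    fn new() -> Self {
        Buffer { data: [0; N] }
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    // The bare type name `Buffer` means `Buffer<16>`.
    let small: Buffer = Buffer::new();
    let large = Buffer::<64>::new();
    println!("{} {}", small.len(), large.len()); // prints 16 64
}
```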

`Weak<T>`

Did you know that actual deallocation logic for Arc<T> is implemented in Weak<T>, and is invoked by considering all copies of a particular Arc<T> to collectively hold a single Weak<T> between them? Source link.

`[T; N]`

Did you know that while most trait implementations for arrays now use const generics to impl for any length N, we can’t yet do the same for Default.

`u8`

Did you know that as of Rust 1.60, you can now use u8::escape_ascii to get an iterator of the bytes needed to escape that byte character in most contexts.
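
A sketch of collecting that iterator into a String, with escape as a made-up helper name:

```rust
// Collects the escaped form of a single byte into a String.
fn escape(byte: u8) -> String {
    // escape_ascii yields ASCII bytes, so each one converts to a char.
    byte.escape_ascii().map(char::from).collect()
}

fn main() {
    println!("{}", escape(b'a')); // prints a
    println!("{}", escape(b'\n')); // prints \n
    println!("{}", escape(0xff)); // prints \xff
}
```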

`HashMap<K, V>`

Did you know that the Rust devs are working on a “raw” entry API for HashMap that allows you to (unsafely) avoid re-hashing a key you’ve already hashed? https://github.com/rust-lang/rust/issues/56167

`&mut T`

Did you know that while &mut T is defined as meaning “mutable reference” in the Rust reference, you’re often better off thinking of it as “mutually exclusive reference”? Quoth David Tolnay.

`std::ops::Range`

Did you know that there’s been a lot of debate around whether or not the Range types should be Copy? https://github.com/rust-lang/rust/pull/21846

`AtomicU32`

Did you know that you’ll often want compare_exchange_weak over compare_exchange in a retry loop to get more efficient code on ARM cores?
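
The weak variant is allowed to fail spuriously, so it belongs inside a loop that retries anyway. A sketch of that shape, with a made-up function `atomic_double` and orderings picked only for illustration:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical example: atomically double the value and return the new one.
fn atomic_double(value: &AtomicU32) -> u32 {
    let mut current = value.load(Ordering::Relaxed);
    loop {
        let new = current.wrapping_mul(2);
        // A spurious failure just sends us around the loop once more.
        match value.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return new,
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let v = AtomicU32::new(21);
    println!("{}", atomic_double(&v));
}
```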

`std::hash::Hash`

Did you know that Hash is responsible for not just one, but two of the issues on the “rust 2 breakage wishlist”?

`{integer}`

Did you know that fasterthanlime’s most recent article does a great job at explaining {integer}?

`Fn`

Did you know that until Rust 1.35.0, Box<T> where T: Fn did not impl Fn, so you couldn’t (easily) call boxed closures! https://github.com/rust-lang/rust/pull/55431
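
A sketch of what those impls enable today, with made-up function names. Passing the box itself where a generic `Fn` is expected, and calling a `Box<dyn FnOnce>` directly, both rely on `Box` implementing the closure traits:

```rust
/// Hypothetical generic consumer of any callable.
fn apply(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

/// Hypothetical factory returning a boxed closure.
fn make_adder(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x| x + n)
}

/// Calling a boxed FnOnce consumes the box.
fn run_once(f: Box<dyn FnOnce() -> String>) -> String {
    f()
}

fn main() {
    // The box is passed directly; no wrapper closure is needed.
    println!("{}", apply(make_adder(2), 40));

    let greeting = String::from("hello");
    println!("{}", run_once(Box::new(move || greeting)));
}
```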

`((), ())`

Did you know that ((), ()) and () have the same hash? Playground link.
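
This is easy to check locally. The sketch below uses a made-up helper `hash_of` with the standard library's `DefaultHasher`. Hashing `()` writes nothing to the hasher, and a tuple just hashes its fields in order, so both end up feeding the hasher no bytes at all:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a value with a freshly created DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    assert_eq!(hash_of(&()), hash_of(&((), ())));
    println!("same hash: {:#x}", hash_of(&()));
}
```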

`[T]`

Did you know that &[u8] implements Read, and &mut [u8] implements Write? So for anything that takes impl Read, you can provide &mut slice instead! Comes in handy for testing. Note that the slice itself is shortened for each read, hence &mut &[u8].
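
A small sketch of the testing trick, with a made-up reader function `read_two`. In the standard library, `Read` is implemented for `&[u8]` and `Write` for `&mut [u8]`, and in both cases the slice shrinks as data moves through it:

```rust
use std::io::{self, Read, Write};

/// Hypothetical function under test: reads exactly two bytes from any reader.
fn read_two(mut src: impl Read) -> io::Result<[u8; 2]> {
    let mut buf = [0u8; 2];
    src.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Reading: pass `&mut &[u8]` so the caller can see the slice advance.
    let mut input: &[u8] = b"hello";
    let first = read_two(&mut input)?;
    assert_eq!(first, *b"he");
    assert_eq!(input, &b"llo"[..]);

    // Writing: `&mut [u8]` is a writer over a fixed-size buffer.
    let mut storage = [0u8; 4];
    let mut out: &mut [u8] = &mut storage;
    out.write_all(b"ab")?;
    assert_eq!(out.len(), 2); // remaining space after the write
    assert_eq!(storage, [b'a', b'b', 0, 0]);
    Ok(())
}
```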

`*`

Did you know that * is (mostly) just syntax sugar for the std::ops::Mul trait?
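
To make that concrete, here is a sketch with a made-up `Meters` newtype. Implementing `Mul` is what makes the `*` operator available for it, and the operator form and the explicit method call do the same thing:

```rust
use std::ops::Mul;

/// Hypothetical newtype used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

impl Mul<f64> for Meters {
    type Output = Meters;

    fn mul(self, rhs: f64) -> Meters {
        Meters(self.0 * rhs)
    }
}

fn main() {
    let with_operator = Meters(2.0) * 3.0;
    let with_method = Mul::mul(Meters(2.0), 3.0);
    assert_eq!(with_operator, with_method);
    println!("{:?}", with_operator);
}
```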

`UnsafeCell<T>`

Did you know that UnsafeCell is one of those types that the compiler needs “special magic” for because it has to instruct LLVM to not assume Rust’s normal aliasing rules hold once code traverses the boundary of any UnsafeCell?

Function Overloading in Rust

2022-06-04T00:00:00+00:00

I just made a pull request to reqwest. I thought this particular one was interesting enough to be worth blogging about, so I am.

We know that many C++ family languages have a feature known as function overloading, where two functions or methods can exist with the same name but different argument types. It looks something like this:

void use_connector(ConnectorA conn) {
    // IMPL
}

void use_connector(ConnectorB conn) {
    // IMPL
}

The compiler then chooses which method to call, at compile-time, based on the static type of the argument. In C++, this is part of compile-time polymorphism, an easy “if statement” in the template meta-language. In Java and many other languages, it’s merely a convenience, for when an ad-hoc group of types are possible for what an outsider sees as the same operation, but which from the perspective of the library requires different implementations.

Rust does not support this, at least not in this form. This is a mildly controversial decision; I’ve seen many people complain about it, because it is a commonly-used feature in the languages they’ve come from. Ultimately, I think Rust made the right call. There are too many advantages of having a one-to-one correspondence between method or function names and implementations, and ultimately I think the feature is more confusing than helpful. Traits cover a lot of the same ability, but in a more structured fashion, acting like C++’s compile-time “if-statements.” But of course, there is always a learning curve in giving up a feature you’re used to using.

But just because Rust doesn’t officially support function overloading as a feature doesn’t mean, surprisingly, that it’s completely impossible. Recently, I was looking into the depths of reqwest, trying to troubleshoot an issue, and I came across this code:

#[cfg(any(feature = "native-tls", feature = "__rustls",))]
#[cfg_attr(docsrs, doc(cfg(any(feature = "native-tls", feature = "rustls-tls"))))]
pub fn use_preconfigured_tls(mut self, tls: impl Any) -> ClientBuilder {
    let mut tls = Some(tls);
    #[cfg(feature = "native-tls")]
    {
        if let Some(conn) =
            (&mut tls as &mut dyn Any).downcast_mut::<Option<native_tls_crate::TlsConnector>>()
        {
            let tls = conn.take().expect("is definitely Some");
            let tls = crate::tls::TlsBackend::BuiltNativeTls(tls);
            self.config.tls = tls;
            return self;
        }
    }
    #[cfg(feature = "__rustls")]
    {
        if let Some(conn) =
            (&mut tls as &mut dyn Any).downcast_mut::<Option<rustls::ClientConfig>>()
        {
            let tls = conn.take().expect("is definitely Some");
            let tls = crate::tls::TlsBackend::BuiltRustls(tls);
            self.config.tls = tls;
            return self;
        }
    }

    // Otherwise, we don't recognize the TLS backend!
    self.config.tls = crate::tls::TlsBackend::UnknownPreconfigured;
    self
}

I was shocked to see this! I felt like I was reading Java. My first thought was that this was the Java instanceof (anti-)pattern, but after a little more thought, I realized that this in practice would work out to function overloading.

Since this uses impl Any instead of &mut dyn Any, this function will be monomorphized at compile-time, and I would expect that the relevant branching would be collapsed, resulting in these monomorphizations, written in an imaginary version of Rust where function overloading is supported:

#[cfg(feature = "native-tls")]
pub fn use_preconfigured_tls(mut self, tls: native_tls_crate::TlsConnector) -> ClientBuilder {
    let tls = crate::tls::TlsBackend::BuiltNativeTls(tls);
    self.config.tls = tls;
    self
}

#[cfg(feature = "__rustls")]
pub fn use_preconfigured_tls(mut self, tls: rustls::ClientConfig) -> ClientBuilder {
    let tls = crate::tls::TlsBackend::BuiltRustls(tls);
    self.config.tls = tls;
    self
}

There is a wrinkle though. Unlike the Java or pseudo-Rust equivalent, the Rust code in reqwest will still compile when a caller passes a type that is not one of the two supported. So you can call this function with anything, even an i32, and the compiler won’t signal an error or even a warning:

client_builder.use_preconfigured_tls(42); // COMPILES!

In this implementation, it eventually causes a run-time error instead (a separate function produces it in the case of UnknownPreconfigured). But this odd type-safety work-around still can’t be removed without breaking API-compatibility. Code could theoretically be relying on this function producing a run-time error in certain situations, or it could rely on that other function not being called. Luckily, reqwest is not 1.0, and I have reason to hope they won’t consider this problematic.
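
To play with the pattern outside of reqwest, here is a stripped-down sketch. Everything in it is a stand-in I made up for illustration: `Backend` plays the role of the TLS backend enum, and `String` and `u32` stand in for the two supported connector types. It is not reqwest's code, just the same `impl Any` plus `downcast_mut` shape:

```rust
use std::any::Any;

/// Hypothetical stand-in for the library's backend enum.
#[derive(Debug, PartialEq)]
enum Backend {
    Native(String),
    Rustls(u32),
    Unknown,
}

/// Accepts anything, then branches on the concrete type via downcasting.
fn pick_backend(tls: impl Any) -> Backend {
    // Wrapping in Option lets us move the value out through a `&mut` downcast.
    let mut tls = Some(tls);
    if let Some(conn) = (&mut tls as &mut dyn Any).downcast_mut::<Option<String>>() {
        return Backend::Native(conn.take().expect("is definitely Some"));
    }
    if let Some(conn) = (&mut tls as &mut dyn Any).downcast_mut::<Option<u32>>() {
        return Backend::Rustls(conn.take().expect("is definitely Some"));
    }
    // Any other type compiles fine and lands here.
    Backend::Unknown
}

fn main() {
    assert_eq!(pick_backend(String::from("native")), Backend::Native("native".to_string()));
    assert_eq!(pick_backend(7u32), Backend::Rustls(7));
    // An unsuffixed literal is an i32, which matches neither branch.
    assert_eq!(pick_backend(42), Backend::Unknown);
}
```

The last assertion is the wrinkle in miniature: the unsupported type is only noticed once the program is running.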

There are other ways to accomplish the same goal. Instead of an ad-hoc list of supported types, this code could’ve used a trait. Such code would look something like this:

pub trait TlsConfig {
    fn to_tls_backend(self) -> crate::tls::TlsBackend;
}

#[cfg(feature = "native-tls")]
impl TlsConfig for native_tls_crate::TlsConnector {
    fn to_tls_backend(self) -> crate::tls::TlsBackend {
        crate::tls::TlsBackend::BuiltNativeTls(self)
    }
}

#[cfg(feature = "__rustls")]
impl TlsConfig for rustls::ClientConfig {
    fn to_tls_backend(self) -> crate::tls::TlsBackend {
        crate::tls::TlsBackend::BuiltRustls(self)
    }
}

pub fn use_preconfigured_tls(mut self, tls: impl TlsConfig) -> ClientBuilder {
    self.config.tls = tls.to_tls_backend();
    self
}

This would allow the library to be used in the exact same way for valid uses, but would still allow the compiler to catch invalid types. To be sure, the trait and its impls would have to be separated in the code from the use_preconfigured_tls method, as you can’t put a trait inside an impl block. But I think such an inconvenience is worth the better type-safety.
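
Here is a self-contained sketch of the trait-based shape that can be compiled on its own. As before, the names are made up for illustration (`Backend`, `IntoBackend`, and `String`/`u32` standing in for the real connector types), and it is not the code from the MR:

```rust
/// Hypothetical stand-in for the library's backend enum.
#[derive(Debug, PartialEq)]
enum Backend {
    Native(String),
    Rustls(u32),
}

/// Hypothetical trait: only types that implement it are accepted.
trait IntoBackend {
    fn into_backend(self) -> Backend;
}

impl IntoBackend for String {
    fn into_backend(self) -> Backend {
        Backend::Native(self)
    }
}

impl IntoBackend for u32 {
    fn into_backend(self) -> Backend {
        Backend::Rustls(self)
    }
}

/// Statically dispatched on the argument type; there is no fallback branch.
fn pick_backend(tls: impl IntoBackend) -> Backend {
    tls.into_backend()
}

fn main() {
    assert_eq!(pick_backend(String::from("native")), Backend::Native("native".to_string()));
    assert_eq!(pick_backend(7u32), Backend::Rustls(7));
    // pick_backend(42i64); // rejected at compile time: `i64` does not implement `IntoBackend`
}
```

With this shape there is no `Unknown` variant to handle at run time, because an unsupported type never gets past the compiler.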

My take-away here is to be wary of emulating features from other programming languages, and also to be wary of std::any.

Addendum/Errata

I was wrong about the existing code not providing a run-time error. It sets an enum to UnknownPreconfigured, which then triggers a run-time error elsewhere in a separate function. The article has been updated accordingly.

The trait example code was also edited to reflect a version that actually compiles, but not the final version in the MR.

I also edited the intro to clarify the relationship between function overloading and traits.

The MR was ultimately rejected for reasons I deeply disagree with.

Reviews and Reactions: 2022 Short Story Hugo Nominees

2022-06-01T00:00:00+00:00

We decided to write up our thoughts on each of the short stories nominated for the 2022 Hugo awards. Of course, here be spoilers, spoilers galore. If you don’t want these stories spoiled, go read them, and then come back here.

This is the same concept as Jimmy’s review of the 2021 nominees, and so we shall adapt the explanation from that post:

As an exercise, we read each of these stories and told each other what we thought the themes were, and I reference that throughout these reflections. Themes, as we define them, are thematic statements: the point the story is trying to make. Themes are distinct from thematic concepts, in that they are complete sentences rather than just nouns. They are distinct from premises, in that they are the take-away for the real-world, not a statement about the world of the story. And, to be clear, there can be more than one completely valid answer. Both of us would posit what we thought the theme was, answering independently without consulting each other, and then we would discuss the story in greater detail.

What follows are the tangible results of those discussions: reflections about each story, somewhere between review and analysis. Each header is also a link, because all of these stories are available to read online. They are reviewed in descending ranked order according to Jimmy’s ranking, and some overall discussion of ranking is reserved for the conclusions.

Mr. Death

A trick ending, indeed. A relatively common trope, but unexpected here, at least for us: In order to pass a test of morality, you have to refuse an order, to not only do the right thing but do it in spite of what you think will be horrible consequences to you. Can your conscience survive dishonesty and manipulation?

It’s terrifying to see this trick done at the “salvation/damnation” scale. It reminds Jimmy of this SMBC. It sort of calls into question the whole premise of “eternal damnation” and “eternal punishment,” especially if the operators of these mechanisms have values that disagree with ours, or are simply a result of arbitrary but impersonal rules.

Given the twist at the end, it’s unclear what the rules of this story are. How much wasn’t this reaper told? We understand why he was lied to for the test, but now will he be given a more complete picture with a new boss? Is he going to get more and more shocking revelations every couple of eons? Unclear!

On the other hand, this story is a resounding endorsement of the theory that you should always avoid doing something horrible, even if orders compel you to do the horrible thing, even if it goes against the theory you’ve been taught. We agreed that this was the theme. As Doug put it, when the rules compel you to violate your conscience, violate the rules. A good person’s conscience is usually the better guide than a good rule book.

With life-or-death stakes such as these, this theme makes sense. There is a balance, however. “Follow your conscience in all circumstances” is bad advice when consciences are fallible, and sometimes the person giving you the order is simply someone who knows more than you about the situation. Who cannot say they held onto a stubborn but incorrect rebellion against an authority as a kindergartner? Who truly has never done it as an adult? Humility is actually a virtue. But Doug thinks that this story’s power is that it has more faith in humanity’s ability to intuit morality. In effect, the story is taking a powerful stand in favor of act consequentialism versus rule consequentialism. Doug is more inclined to support act consequentialism than Jimmy is.

This particular story could’ve gone a different way. Perhaps, by violating the cosmic rule, all of time could have unravelled, or there could have been a butterfly effect where someone else had to die as a result of the protagonist’s decision. Such a story would have been written by a different author who had less faith in humanity’s ability to intuit morality and was a stronger proponent of a rule-based ethical system. In such a story, the blame for the negative consequence would’ve (in Jimmy’s eyes) definitely fallen on Raz for not explaining the stakes and what would happen if the death was avoided. (Doug disagrees and thinks that, in a truly rule-based system, the blame would still have been on the protagonist. Raz would definitely be part of the causal chain, though, and it would have behooved Raz to give a little more information.) In the story as written, the narrator kept on asserting that you cannot cheat death, without giving any evidence or specific reason. At the time, this felt like there just wasn’t enough time to go into it for the story, and it counted against the story. Now, it feels like foreshadowing, and it is a strength of the story.

For Jimmy, while this story made him emotionally believe in the theme, and while he greatly enjoyed this story and its subversion of the normal trope of “don’t mess with forces greater than you, even to save a life, because it could have even greater consequences,” he finds himself intellectually not as convinced as he wants to be. As he thinks deeper about it, he finds the questions brought up, and this story, somewhat unsettling.

Doug thinks that this is the story’s greatest strength, though. This story forces the reader to confront a difficult moral question and examine the consequences. Whenever a short story succeeds in making the reader question an inherent moral belief, it deserves major kudos. Go read this story.

Ranking

We both agreed that this was the best story, and so here it comes ranked first, for interesting thought-provocation and quality of writing with a twist at the end.

Where Oaken Hearts Do Gather

Jimmy (very much not Doug) had a lot of fun with this one! A satire of Internet communities, where everyone jockeys to maintain their karma and online reputations and fails to engage properly with the actual realities of the situation at hand, where if they were paying more attention they would realize that their more serious colleague was finding out how truly important the thing actually was.

Themes include “People on the Internet are idiots more concerned with their own reputation than the things they’re actually interested in,” and also “there’s more to folklore than meets the eye” and “we can never truly know the past.”

Because of its unusual structure, it was important for us to discuss the plot so that we were on the same page about what happened; namely, that a bunch of argumentative nerds are too busy trying to get Internet points from each other to realize that the song being discussed is all-too-real and another serious scholar is going to get his heart taken out.

Ranking

The juxtaposition of old folklore, scholarly academic discussion of folklore, Internet arguing, and horror weaves a tight mesh that Jimmy enjoyed greatly, as a fan of basically all those things. This won the Nebula and, in Jimmy’s eyes, well deserved it; it’s a very close second to Mr. Death in his mind. The form must have been incredibly difficult to write: the opposite of lazy writing.

Doug, on the other hand, was a harsh critic of the story. It was actually Doug’s least favorite story of the lot, and there were several Doug really disliked this year. Doug’s biggest problem with the story was that it seemed like it was just a Reddit conversation, with no character development and a well-trodden plot (specifically, the bit about an old folktale actually being real and youth not realizing it while someone befalls a ghastly fate). Sure, the way in which the story was told was super unique, but that artifice could not cover up the tired plot in his eyes.

The Sin of America

This is a new retelling of “The Lottery”, but with different themes for a different America. That is to say, both pieces are satires of American culture, but in the years that have passed, American culture has changed a lot. This author seemed to think an updated version was called for, and given the new story, Jimmy is convinced.

The Wikipedia article on “The Lottery” mentions two themes in it (or did at the time of this writing):

This seemed off to Jimmy, and so he went and reread “The Lottery,” and found his suspicions confirmed: There’s next to nothing in there about scapegoating; perhaps that’s how this tradition originated, but it now seems to be a thin memory. And even when there are some elements of mob mentality, it’s not a mob of anger, but a mob of raw traditionalist energy. It’s really all about the second theme, which isn’t surprising, as a short story normally only can support one theme. (Doug thinks that the Wikipedia article isn’t wholly off base, but will stay mum here while Jimmy makes his point!)

Jimmy would characterize the original “The Lottery” as if the author wanted to say this: “Wow, America is obsessed with tradition. Do you even know why you do the things you do? Do you know how much you’ve actually changed the tradition, from previous countries, from the past? Do you know how silly this all is? If tradition told you to jump off a bridge, would you? If it told you to murder your friends, would you? Actually, yes, I think you would. Here, let me write what it would look like. Doesn’t that seem just like you?”

Jimmy remembers “The Lottery” both resonating with him and not. He grew up in a town and a church with enough old-fashioned American traditionalism left that he recognizes the particular flavor of traditionalism that it’s satirizing, but he also thinks a lot of America, after “The Lottery,” became too suburbanized and too detached from a sense of community to have the same type of traditionalism, that community continuity has become so shattered and so obsolete as a value that if anything we need more of what “The Lottery” satirizes right now, not less of it. But that’s “tradition,” and “The Lottery” satirizes “blind tradition,” which is generally bad. He also thinks that tradition should be maintained thoughtfully.

But that’s a discussion for “The Lottery” itself, not this spin-off. (Hear hear! says Doug) This spin-off, unlike “The Lottery,” is clearly actually about scapegoating, the ancient Biblical practice of putting the sins of the community onto a goat which was then sent away or forced to “[e]scape”:

But the goat, on which the lot fell to be the scapegoat, shall be presented alive before the LORD, to make an atonement with him, and to let him go for a scapegoat into the wilderness.

Leviticus 16:10 (King James Version)

The continued repetition of the word “sins” – this is a much more flowery piece than “The Lottery” – makes abundantly clear the religious element, and reminds us of Jesus, the Christian scapegoat, who dies for everyone’s sins in a manner whose mechanism is somewhat unclear, with many theories.

The oddest theory we’ve seen for why Jesus had to die was not that he was a ransom or bait for Satan or that punishment must be carried out to fulfill a divine requirement for justice, but instead that punishment had to be carried out to fulfill a human requirement for justice. This theory is naturally repulsive to most – basically, the theory was that humans need someone to blame, and God signs up for the role – and this seemed way too pessimistic an outlook on human nature – and way less cosmic an event than we understand the crucifixion to be.

But however heterodox such a theory might be, that theory, applied to a randomly selected human instead of to an incarnation of God, is the logic, we think, behind the sacrifice in this short story. Humanity needs someone to blame. America, specifically, needs someone to blame.

Well, yes, we kind of do. We’ve been developing a “great villain” culture for a while now. Jimmy says, every President is set up to be vilified by the other party, and it’s been escalating: Bush feels tame for liberals now compared to how liberals feel about Trump, and conservatives are now basically cussing at Joe Biden with the “Let’s Go Brandon” line. Meanwhile, Jeff Bezos and Elon Musk make all kinds of negative press for their stunts as rich people.

Within the story, all of the news that was blamed, at first on the previous lottery “winner” and then on our protagonist, is reflective of recent news. They threaten to turn it off, but then they don’t, because in this America, rather than focusing on our day-to-day problems – and the characters in the story had many, many problems – we feel instead like it’s more appropriate to blame the figures in the national news.

And this is because, according to the author, we as Americans feel hopeless. Why try to get a better job? The ultra-rich and their government cronies will prevent it anyway. Why try to buy a house? Capitalism has prevented millennials from succeeding. If we’re not able to succeed in our personal lives, then why not find somewhere else to focus our attention and our passions? This story is just a vivid depiction of what we’re already doing.

Doug found this story to be a bit heavy-handed, but it was a coherent story with a clear point. It is very much The Lottery, updated to be told by a liberal who has come to see America as more of a nation of problems than the land of the free and the home of the brave. Doug worries that this story reflects a belief by some in our country that America is no longer a place worth saving. It is, and the hopelessness and anger felt at our country by this story’s author portends something awful for this country. Doug hopes the author is in a minority and that people in this country can find a renewed sense of pride and optimism, finding solutions to America’s many problems instead of giving up hope.

Proof by Induction

This story starts with an assertion:

The Coda cannot change in the way that a person can, however; it cannot learn or grow. Your father’s soul is not in there. Your father has moved on.

It is put in the mouth of a Presbyterian minister, and so Jimmy’s immediate instinct was to question it. (Doug barely noticed this part of the story until after Jimmy brought it up as a focal point.) The chaplain is obviously biased, trying to uphold her religious views, trying to defend her traditional notion of an afterlife against an upstart competitor. Jimmy hopes that perhaps this story will balance her perspective against a different perspective and take sides.

Later, when we find out that the simulation restarts upon every entrance, we found ourselves wondering if it’s perhaps been programmed to do so to prevent people from taking it too seriously, as we see no particular reason why it should work like that. Perhaps our protagonist can change the programming, as it is clear to us that, in the real world, this would be a programming choice and not a fundamental design constraint of the Coda. (Of course, this is not the real world…)

As the protagonist tries to repair his emotional connection with his father, we wondered if out of frustration he might hack the darn thing to remove the restriction and receive some closure. It does seem like he’s making some progress partway through the story, but it’s erased by the plot contrivance.

Whatever this story is trying to say, it is held back by this contrivance. Is it trying to say that immortality is impossible and anything that pretends to it is a simulacrum? If so, then the weird unexplained technical limitation gets in the way of that point. Is it trying to say that even an afterlife wouldn’t help you fix your relationship with your parent? That might be true, but again, the contrivance makes us (Jimmy especially) feel like our protagonist hasn’t been given a fair shot.

Most of what Jimmy gets out of this story is “it would suck if someone created a very realistic afterlife technology and put an arbitrary limitation on it, because people would find it very frustrating.”

But also, why is the Presbyterian minister allowed to just proclaim things about this unquestioned, that might as well have come from the narrator? Why is anyone who isn’t extremely religious taking her approach? Why isn’t everyone debating whether these Codas have rights? Why aren’t they protesting in the streets? Why is the only person to consider the intellectual implications a random-ass mathematician rather than the writer of a think-piece from when it’s still in development? Is this some sort of totalitarian Presbyterian dictatorship?

Science Fiction is supposed to propose hypotheticals and then explore the consequences, or at least it’s supposed to come across that way to the reader. The theme must flow from the premise logically. This seems more like an attempt to make a point, and then contrive a hypothetical to prove it, and it is so contrived that we’re having a hard time discerning what the point even is.

But we recognize that’s not how this story works. This premise was tailor-made to demonstrate that sometimes, no matter how much we think we’re making progress with another person, they’ll just revert to their old ways the next time we see them. It’s as if the author was talking about their actual parent, and saying that from their behavior, they might as well be in a form of death, where the memory is lost each time they see them and progress is impossible. This can be taken as a portrait of that frustration, but due to the unbelievability of the premise, it was difficult for Jimmy to take it that way.

In Doug’s eyes, the theme of this story was: “People won’t change just because you want them to. Value people for who they are.” Doug was not as harsh as Jimmy was on this story, but he thinks that is because he generally tends to “softer” sci-fi that cares less about the reality of the underlying science or the technical elements. Unlike most of the other stories in this lot, this story involved some real character development, and a family relationship that felt super real. Indeed, that’s why we mentioned that it felt like the author was working through the author’s own family issues. In Doug’s eyes, this was a strength of the story.

Ranking

Due to the extreme unbelievability of the premise, Jimmy ranks this lower than he otherwise would have. Doug ranks it much higher, but Jimmy simply refuses to accept that society would create an invention so powerful just to use it mostly for finding documents and quick good-byes, and he thinks the resets-every-time thing is a contrivance.

Unknown Number

First, a caveat: we are not trans and probably have gaps in our knowledge about what it’s like to be trans. We are trying really hard to discuss this story accurately, but are not entirely sure of our choice of terminology or perspective. Please, send corrections if warranted! We would like to learn more.

The premise would fit as a specific example of the many lives that are touched by inter-universe communication in Ted Chiang’s Anxiety is the Dizziness of Freedom, one of Jimmy’s favorite science fiction novellas, and also a Hugo nominee for Best Novella in 2020 (Doug hasn’t read it… Don’t get mad). Unlike that novella, which was a rather realist take on “what would alternate-universe technology do in a real way to society,” this story gives the alternate-universe communication technology to exactly one person, so they can talk specifically about gender.

This serves as a window into what it’s like to be trans. Being trans often involves a huge decision: Whether to transition, and whether to change your public gender identity, your pronouns, etc. It is a high risk/high reward decision, and so the alternate universe model is very fitting for it. We both regularly imagine the alternate universes created by alternate answers to our big decisions in life, and we wish we could talk to those alternate versions of ourselves and see how that had gone. So we imagine if we had gender dysphoria, we’d like to talk to the alternate universes where we did or did not transition.

Well, in this story, only the version who didn’t transition reached out. This means that transitioning worked and fixed the problem, which is, we think, the theme, specific to the trans experience: If you are trans, you should come out/transition. You won’t regret it.

And more generally: Big decisions are hard, and they do have an impact on your life, but to be a coward is a decision in and of itself. Be bold.

Ranking

We agreed that the text message format wasn’t that interesting, and simply got the author out of having to write more detailed description. Jimmy wouldn’t say it was lazy (Doug probably wouldn’t either), but we do think the effort was put in to get a particular point across, not to develop a rich world and story. Similarly, the characters, both versions of the same person, are somewhat bare-bones, and the story itself gets a little repetitive. We think it’s really good for a Twitter post trying to convey a point about the trans experience and major life decisions in general, but not rich or well-developed enough for this list.

Tangles

This is Magic: The Gathering fanfic, and we mutually know basically nothing about Magic: The Gathering, so we will simply discuss this as outsiders – which we literally are.

As outsiders, we found it very difficult to read. We both procrastinated reading it (Jimmy for almost two weeks, much to Doug’s chagrin). The aesthetic and the world here do little for us, and we can’t tell what is novel to the story and what is a reference that fans will get excited about, which is disorienting and makes it difficult to enjoy.

The specific concept of a dryad needing a tree to survive just feels like a metaphor for a toxic way of thinking about relationships, which is immediately off-putting, so the premise immediately bothers us.

The plot, at its base, strikes us as somewhat better: Two people (using an expansive definition of “people”) meet, both in their own life-or-death level crisis. By cooperating and making “peace” between them (is the word “peace” repeated so much for thematic or world-building reasons?), they both manage to solve some of their problems, which they would have been unable to solve separately. The theme, then, is “work together even in emergencies,” which is a good moral lesson that many people need to hear.

Ranking

This seems written primarily for people who will be inordinately excited by the concepts of dryads and by having a story set in a Magic: The Gathering world, and we are very much not those people. We also found it tedious to read, and none of the characters felt like characters, so Jimmy leaves it last, and Doug next to last.

Final Ranking Comparison

Jimmy

Mr. Death
Where Oaken Hearts
Sin of America
Proof by Induction
Unknown Number (Twitter)
Tangles (MTG)

Doug

Mr. Death
(after a huge dropoff) Proof by Induction
Sin of America
Unknown Number (Twitter)
Tangles (MTG)
Where Oaken Hearts (but I had a lot of trouble ranking this last one)

Conclusion: A Note on 2022 vs 2021

Given that we previously reviewed the 2021 stories, let’s compare these as a set.

Overall, we both believed the 2022 nominees were all around weaker than the 2021 nominees. Last year’s set had several strong stories (“Metal Like Blood in the Dark”, “Little Free Mermaid”, and “Open House on Haunted Hill”, all spring to mind). We could see how even the stories we personally weren’t as crazy about in 2021 had strong merits and would appeal to particular folks.

In comparison, the 2022 stories felt like a bit of a letdown. In Doug’s eyes, “Mr. Death” is really the only story worth reading in this whole lot, and while Doug likes “Mr. Death” more than any story in last year’s set, that by itself can’t carry the day. Doug is also concerned that the inclusion of some of the more atypical stories in this year’s set (“Tangles” and “Unknown Number”) signals that the nominators are too green and fanfic-y. And “Sin of America” (much like “Badass Moms” from last year) seems included mostly because it appeals to a particular political mentality.

Notably, the Nebula Award nominees did not include any of these three stories this year (although they did nominate “Badass Moms” last year). The three stories that were cross-nominated for the Hugo and Nebula were Mr. Death, Where Oaken Hearts, and Proof by Induction, all of which do seem like deserved nominees (even though Doug really disliked Where Oaken Hearts and Jimmy really disliked Proof by Induction, we both recognize that these respective stories were good, just designed for people who care about different things in their stories). Here’s hoping for a better lot for our next post!

Netflix Should Become a Tech Company

2022-05-27T00:00:00+00:00

Netflix should become a tech company.

I hear the obvious response already: Jimmy, Netflix is already a tech company!

Counterpoint: Is it though?

Somehow, after two dot-com booms, the markets still have an aesthetic-based definition of what constitutes a “tech company”: If a company – any company – has an expensive enough app, and if its founders talk enough about “disrupting” industries, then it is a “tech company” and is therefore entitled to a valuation completely disconnected from its actual industry. Think WeWork – and think what happened to it as people gradually realized it wasn’t an exciting tech start-up but rather a quite boring real estate company. Turns out, you don’t need an expensive app to run a coworking space.

A friend of mine pointed this out to me recently, claiming that the whole concept of a tech company was a façade. WeWork was the obvious example, but there are others: Uber and Lyft are taxi dispatchers, GrubHub (known in NYC by its other brand, Seamless) is a take-out catalogue, and Netflix is a premium channel. And Amazon’s more famous business (more on this later) is to be a retailer: its competitors are Wal-Mart and Target, or else mail-order catalogues.

Sure, all of these companies use phones and apps to do their thing better, sometimes uselessly (like WeWork), sometimes “disruptively” so, genuinely transforming the industry (like Amazon or Uber). But their thing is something that people have done before them, and will continue to do after them. At this point, doing something “with an app” should be as surprising as doing that thing “over the phone” or “using writing.”

Another class of companies is harder to categorize. Facebook and Twitter are doing things that would be impossible before the web, but fundamentally are not about providing technology either. They manage and organize content, and in so doing, get the ability to suggest sponsored content, to – as Zuckerberg informed the Senate – sell ads. They are content companies or web companies.

But, as I pointed out to my friend, this doesn’t mean there’s no such thing as a tech company, whose job is to provide technical infrastructure in the computing world. These tend to be older names: There’s IBM, which makes mainframes and whose subsidiary Red Hat maintains a Linux distribution. There’s Oracle, which licenses its database software that underpins a shockingly large slice of our economy. There’s Microsoft, which still maintains Windows even though it isn’t cool anymore, and Excel, the most popular programming language in the world that isn’t even branded as a programming language.

So now that we’ve established this dichotomy between true “tech companies” and “companies that do their business with an app,” let’s look at the corner cases:

Google, in my mind, straddles the line between content management – e.g., YouTube, Gmail, Search (I would argue) – and tech – e.g., Android and Chrome – with the odd caveat that the tech is what they give away for free and the content management is how they make their money.

Amazon, in my mind, is a more interesting corner case. Their famous retail website, amazon.com, is, like Uber or AirBnB, an example of a normal business, but with an app. However, their app required so much technology that they have a major business in providing some of that technology to others as a cloud company: AWS. Whether or not “the cloud” is overhyped – and I think it really is important – a cloud company is definitely a technology company.

And this leads me back to my title topic: Netflix.

Netflix has been having a rough couple months. At the time of this writing, it has lost 72% of its valuation in the past 6 months, which is a lot even for this recent bear market; in the same time period, the NASDAQ index only lost 28.18%, and the S&P 500 only 13.84%. It lost subscribers for the first time in its history, and for many, it’s clear that the writing is on the wall.

And this corresponds to a frustration with Netflix as a content platform that I’ve noticed anecdotally among my friends. It’s been a long time since it’s been the go-to for streaming, when almost every show was either on Netflix or not streamable, and most popular shows were on Netflix: “What service is it on?” is now a more common question than “Is it on Netflix?” And many people I know are questioning whether Netflix, now one premium channel among many, is even worth keeping in their streaming portfolio. Do we really still like its TV shows, or do we just keep it on there out of loyalty to what it used to stand for?

And Netflix will soon force many people to make that decision, by cracking down on password-sharing. Well, Netflix, you might not like the results. You’re coasting right now, and you might be out of gas; now might be a bad time to put on the brakes.

But oh, how the mighty have fallen! Netflix was a smashing success as a company when streaming was a novelty. As the Internet developed the necessary bandwidth – as Netflix helped force the Internet to improve its infrastructure – streaming exploded. Most people pirated for convenience rather than to save money, and streaming was even more convenient without any of those pesky moral concerns.

At the time, the goal clearly was for Netflix to be the sole streaming provider, licensing from traditional channels and movie producers, and being the one subscription every household needed, simultaneously creating and monopolizing the concept of streaming.

And it worked, for a while, but the traditional content providers were not so easily displaced, and competition was not so easily avoided. HBO was one of the early competitors, but now they are legion: Hulu, Disney Plus, Amazon, Apple TV, YouTube TV…

The new streaming market was too big for Netflix to hold onto. When streaming was a novelty used by a significant but not overwhelming number of households, it made sense for content creators to work through Netflix to reach that slice of customers. Now that almost everyone streams, it’s not just a slice of the market anymore, and it makes more sense for content creators to try and work around Netflix’s attempted monopoly and make their own streaming service.

And that’s a shame. Don’t get me wrong, I have no love for monopolization. But Netflix’s technology is simply better than all the other streaming companies’. Every single other provider simply has a worse user experience.

If you’re a regular streaming user, you’ll have already noticed this, since the Netflix apps work, and the others are merely workable enough. Glitches, lags, buttons that don’t work right plague the other streaming apps, whereas Netflix just works, especially on the web (SmartTV platforms sometimes have other issues).

And not only has Netflix spent more money and done a better job at polishing the user interface, they have worked really hard to collaborate with ISPs to store your videos as close to you in the network as possible, speeding up loads, decreasing lag, and increasing video quality. Other video streamers have not caught up, and I fear they will never do as good a job as Netflix – each one individually simply would not have the same bargaining power with ISPs that Netflix once had, and “good-enough” tech will be the standard at companies like Disney and HBO that never considered themselves tech companies.

As a programmer, I’ve heard great things about their tech, and heard it’s a great workplace for programmers (as opposed to content creators), but if I worked there, it would make me sad to know that my work, rather than improving everyone’s streaming experience, would only improve the experience on one second-rate streaming channel.

What if – and hear me out now – what if Netflix licensed its technology to other streaming providers? What if whenever you used the Disney Plus app or the HBO app, Netflix code ran and cached your content on Netflix colocated servers and played it in Netflix’s video player? I wouldn’t want it to be every streaming provider, but enough that the quality could go up. The OG Netflix could just be one premium channel “by Netflix Technologies” among many. In fact, the company could even split, so that potential clients don’t think that the original Netflix channel would get preferential treatment.

It would take time to do this transition. Maybe the channel should keep the original name for momentum, and the tech spin-off adopt a new name. Maybe it can start talks with the other platforms immediately about technology sharing. If I were in charge of another streaming platform, I’d definitely want a slice of that tech.

And maybe it wouldn’t work. Maybe the other streaming providers think their crappy streaming technology is “good enough.” This pivot would be a risky move, but the current stock price calls for it.

I understand why Netflix hasn’t done this before now. Monopolizing streaming seemed like a realistic goal for most of its history. But in the end, it failed. It would be a shame for its excellent technology to fail with it.

Disclosure: I hold no position in NFLX whatsoever, but maybe I should get myself a short position, because I know they won’t do this. I just really wish they would. Goodbye, Netflix! We’ll remember you fondly!

Can you have too many programming language features?

2022-05-11T00:00:00+00:00

There’s more than one way to do it.

Perl motto

There should be one– and preferably only one –obvious way to do it.

The Zen of Python (inconsistent formatting is part of the quote)

When it comes to statically-typed systems programming languages, C++ is the Perl, and Rust is the Python. In this post, the next installment of my Rust vs C++ series, I will attempt to explain why C++’s feature-set is problematic, and explain how Rust does better.

C++ fans brag that it is “multi-paradigm,” and it is. You can do everything the C way, as C++ has a subset almost exactly identical to C. You can use pointers and virtual functions and inheritance to create all the classic OOP design patterns, as C++ is object-oriented. Or you can use templates, and “static” or “compile-time” polymorphism, and program that way.

At first glance, this all seems like an unmitigated good thing, because it gives you, as a programmer, flexibility. You can express your code in OOP style if that matches the problem at hand, or even if you just like it better. If you need the performance of templates, you can use them, and if you don’t (or you just find them confusing), you can use run-time polymorphism instead. Or you can just ignore all of it, and program in almost-plain C. Flexibility is good: you can use the features you want, and not use the features you don’t want. Even if a feature is downright harmful, in your opinion, that’s easy enough to handle: Just don’t use it.

And this is all very well and good if you’re programming a quick project completely by yourself. But most code comes in long-lived projects, with developers jumping in and out of the project all the time. In such an environment, as Robert C. Martin puts it:

“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.”

Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

(Sidenote: I will admit to knowing almost nothing about Robert C. Martin besides this famous quote. I have no idea if the rest of his work is as insightful as this quote, or not, and will probably try to find out someday, but not today.)

Since programmers in general spend much more time reading code than writing it, we very rarely actually get to reap the benefits of this flexibility as writers. Much more often, as maintainers and readers, we have to be flexible ourselves. We have to be ready to read code in any style, in any paradigm, using any feature-set.

This is why Perl was commonly panned as a write-only programming language: It had so many features that you could not be up to speed on all of them. Each programmer at each point in time had a set that they used, but no one could ever get proficient at working in the entire available feature-set.

In Perl, the features were syntactic, so the programs would be unreadable at a line-by-line level. In C++, the different features have more to do with code organization, which is harder to make fun of, but I think more insidious, because a lot of the features are structural.

Let me explain what I mean. Let’s say you’re a C++ maintenance programmer, and you don’t like exceptions. You’re trying to maintain a program that uses exceptions heavily, and add new features to it. Not only do you have to be able to understand exceptions to read the code, you have to write your own code so that it handles the exceptions where appropriate, and so that it’s exception-safe. Even if you’re just using a third-party library that throws exceptions, you have to understand exceptions to use that library.

The entire programming language, with all the features, is part of the necessary skill-set to program proficiently. Even if it is just you writing your own project, you still will have to use libraries, and the features involved with it. And even if it is just you, if the project lives long enough, you will have to deal with your previous decisions. Migrating from dynamic to static polymorphism in C++ is no joke. Ask me how I know.

And of course, every feature has to be considered when writing advice. Every best practices manual for C++ is written for C++, not a subset of C++ features. The more things it’s possible for a future programmer or future library writer to do, the more things you have to worry about coding defensively, and the more things that have to be included in best practices manuals, and finally the more things that a proficient programmer has to stuff into their brain.

Specific C++ Examples, Rust Responses

But I’m also not trying to advocate for absolute minimalism. There may be a cost to every feature, and it may be that no feature is optional, but that doesn’t mean that we should have the bare minimum number of features. Sometimes the cognitive and maintenance cost of a seemingly extraneous feature is still worth it. Especially in a systems programming context, different problems often do actually call for different implementation strategies with different programming language features to express them.

C++, however, does this poorly. I’m not even sure I’d claim that C++ has too many features; it’s more that the features are not consistent. They clash with each other. Different feature-sets make assumptions that are violated by other feature-sets. C++ is not designed with the costs of extra features in mind, and as such, the features cost more than they have to.

Let’s discuss a few specific ways in which C++’s features cause problems and clash with each other. For each of these categories, I then discuss how Rust handles the same topic, with a more coherently-designed feature set.

Value and Reference Semantics: Slicing

Slicing is a famous beginner error in C++: combining certain features produces surprising semantics that tend to break invariants, yet no diagnostics are issued because the code is completely valid. Perhaps unsurprisingly, this error comes from a mismatch between two C++ features designed for two C++ programming styles.

Specifically, C++ has a distinction between value and reference semantics.

With value semantics, you can use operator overloading to make your custom class look and act like a built-in type, supporting operators like + and +=:

class Complex {
    double re = 0.0;
    double im = 0.0;
public:
    Complex &operator=(const Complex &other) {
        re = other.re;
        im = other.im;
        return *this;
    }

    Complex &operator+=(const Complex &other) {
        re += other.re;
        im += other.im;
        return *this;
    }

    // const: adding must not modify the left-hand operand
    Complex operator+(const Complex &other) const {
        Complex res = *this;
        res += other;
        return res;
    }
};

// Sample usage
Complex a, b;
a = b;
Complex c = a + b;

With reference semantics, you can use polymorphism to create many different types of object that support the same interface. You can then access these objects through pointers or references to the base class.

#include <cmath>    // std::sqrt
#include <iostream> // std::cout in the sample usage

class Complex {
protected:
    double re = 0.0;
    double im = 0.0;
public:
    virtual double getMagnitude() {
        return std::sqrt(re * re + im * im);
    }
};

class Quaternion : public Complex {
protected:
    double j = 0.0;
    double k = 0.0;
public:
    double getMagnitude() override {
        return std::sqrt(re * re + im * im + j * j + k * k);
    }
};

// Sample usage
void print_magnitude(Complex &c) {
    std::cout << c.getMagnitude() << std::endl;
}

Quaternion a;
Complex b;
print_magnitude(a);
print_magnitude(b);

However, these two programming techniques cannot be combined. You cannot assign a Complex object a Quaternion value:

Quaternion a;
Complex b;
b = a; // Nonsensical

Why? Well, unlike in Java, Complex b actually allocates the space for a Complex number as a local variable on the stack. This means that it only has room for the two fields, re and im.

But, unfortunately, if you include all the methods from both examples, that code will compile, and run, and b will have only re and im from a. This is almost certainly not what you want, and may in fact break invariants (e.g., you might only be dealing with values of magnitude 1, and this truncation would lower the magnitude).

This comes from two alternative paradigms for objects: by value as “primitive replacement,” where Complex can be used like an int, and by reference with traditional OOP inheritance and polymorphism. These paradigms don’t use different keywords, however. They can just all be used in the same objects, causing this trouble.

Advice on how to prevent this includes rules like “give all parent classes at least one pure virtual function,” which would make Complex b as a by-value declaration illegal. But if this rule is recommended in leading books on C++, why isn’t it enforced in the programming language itself?

How Rust Handles This

C++’s slicing is caused by a conflict between two features, inheritance and assignment. Rust handles both of those features differently, so that they do not conflict.

So the most important difference here between Rust and C++ is that Rust does not have implementation inheritance like C++ does. For two given C++ concrete types, one of which surrounds the other, there are two possible relationships between them: is-a, and has-a. Rust only does has-a for concrete types.

C++ inheritance is a feature with many use cases, such as sharing implementation, implementing policy, and implementing interfaces (what Rust calls traits). Rust, rather than having one big broad feature, instead implements individual features as appropriate. The closest feature Rust has to inheritance is traits (including subtraits and supertraits), but because traits are not concrete types, they cannot be assigned, and so this issue is avoided.
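As a rough sketch of how the Complex and Quaternion example might look in Rust: the shared interface becomes a trait, and the relationship between the two concrete types becomes has-a. The names Magnitude, base, and describe are invented for this illustration.

```rust
// Hypothetical names throughout; this is an illustration, not library code.

/// The shared interface (what the C++ version expresses with a virtual function).
trait Magnitude {
    fn magnitude(&self) -> f64;
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

/// has-a: a Quaternion contains a Complex instead of inheriting from it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Quaternion {
    base: Complex,
    j: f64,
    k: f64,
}

impl Magnitude for Complex {
    fn magnitude(&self) -> f64 {
        (self.re * self.re + self.im * self.im).sqrt()
    }
}

impl Magnitude for Quaternion {
    fn magnitude(&self) -> f64 {
        let b = self.base;
        (b.re * b.re + b.im * b.im + self.j * self.j + self.k * self.k).sqrt()
    }
}

/// Dynamic dispatch through a trait object, similar in spirit to
/// `print_magnitude(Complex &)` in the C++ sample.
fn describe(value: &dyn Magnitude) -> String {
    format!("{}", value.magnitude())
}

// There is no implicit conversion between the two concrete types:
//     let b: Complex = some_quaternion;   // compile error: mismatched types
// Taking the complex part has to be spelled out as `some_quaternion.base`.
```

Because the truncation has to be written out as a field access, the slicing from the C++ version cannot happen silently.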

But also in assignments, Rust implements a simpler feature that is easier to reason about: Rust does not allow custom assignment operators. Rust instead builds assignment out of two operations: move, and drop (cf. C++ destructors). If drop is implemented correctly, assignment will be too. If you want to copy instead of move, you have to explicitly call a clone() method (unless the type opts in to the Copy trait, which only permits plain bitwise copies). And moves are not customizable either.
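To make the "assignment is drop plus move" point concrete, here is a small sketch. Resource and assignment_demo are made-up names, and the shared log exists only so the order of events can be observed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when it is dropped.
struct Resource {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the sequence of events produced by one assignment.
fn assignment_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut a = Resource { name: "first".to_string(), log: Rc::clone(&log) };
    let b = Resource { name: "second".to_string(), log: Rc::clone(&log) };
    log.borrow_mut().push(format!("a starts as {}", a.name));

    // Assignment: the old value of `a` is dropped, then `b` is moved in.
    // No user-defined assignment operator is involved.
    a = b;
    // `b` has been moved from; using it here would be a compile error.

    log.borrow_mut().push(format!("a is now {}", a.name));
    drop(a); // explicit drop so the final event shows up in the log

    let events = log.borrow().clone();
    events
}
```

The only user-written code that runs during the assignment is the Drop implementation, which is why getting drop right is enough.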

So, although Rust has some of the best parts of inheritance in traits, and still allows assignment of custom types (but through customizing drop, not assignment per se), it avoids this particular clash through restricting the scope of those features.

Exceptions and “Exception Safety”

It would be impossible to write a post criticizing C++ for its problematic feature-clashes and not talk some about exceptions.

Exceptions are another famous example of a C++ feature you simply can’t “not use.”

Exceptions are viral by nature. If you call a function that might throw and don’t catch all the exceptions that it throws – which might be impossible to determine – then your function can throw as well.

And lots of functions can cause exceptions. Allocating memory indicates failure via exception. Exceptions are the only way for constructors to signal failure, and C++ idiom encourages constructors to be written in such a way that success guarantees that the object is usable. The programming language was clearly not designed to be used without exceptions.

But exceptions are gnarly and confusing. I already know people will comment on this post and say that if you write and structure C++ code correctly, it will be exception-safe. And that’s almost trivially true, since exception safety is part of correct C++ practice, but it’s not easy and it doesn’t follow naturally from easy-to-learn principles, which is why Herb Sutter, a huge name in C++, felt the need to write two books about it. Of course, in practice, people just write exception-unsafe code, all the time.

Every time you call a function – which can happen in C++ simply by declaring a variable, or even by ending a scope (though destructors are supposed to avoid throwing exceptions) – you have to worry about whether that function throws an exception, and whether an exception would leave your own function’s state inconsistent. In C++, it is very common to implement your own unsafe data structures, and exceptions are designed to be recoverable, at least some of the time. Lack of exception safety can mean memory corruption or even exploitable security vulnerabilities.

No wonder a lot of codebases ban exceptions. Unfortunately, many shops simply avoid using exceptions instead of banning them, leaving exceptions possible. Also, code from “exception-free” codebases can then later be mixed back in with regular C++, re-opening it to exception-safety concerns.

The fact that exceptions are so controversial can lead to confusion as well. Consider this function signature:

std::unique_ptr<DatabaseConnection> connect(const ConnectionParameters&);

How does this function indicate failure? From the signature, there are two possibilities: It could either return nullptr, or it could throw an exception. Hopefully the documentation would clarify – but again, oftentimes, people don’t write documentation, especially for internal APIs.

How Rust Handles This

Normal Error Handling in Rust

For recoverable errors, Rust encodes them in the type. Rust’s equivalent to std::unique_ptr – Box – is not nullable. If we want to return one, but possibly also signal an error, we use a sum type or what Rust calls an enum, and what C++ would call a “tagged union” and make you implement by hand:

fn connect(param: &ConnectionParameters) ->
    Result<Box<DatabaseConnection>, OurError>;

This means that it can return either a database connection or an error. This is the convention to return any error condition that is recoverable, which is half of what exceptions are used for in C++. Since Box is not nullable, you have to say more than just Box to signal that it’s possible to return an error, proving that you really mean it.
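A minimal sketch of what implementing and calling such a function could look like follows. The struct fields, the EmptyHost variant, and describe_connection are all assumptions made up for illustration, not part of any real database library.

```rust
use std::fmt;

// Hypothetical stand-ins for the types named in the signature above.
#[derive(Debug)]
struct ConnectionParameters {
    host: String,
}

#[derive(Debug)]
struct DatabaseConnection {
    host: String,
}

#[derive(Debug, PartialEq)]
enum OurError {
    EmptyHost,
}

impl fmt::Display for OurError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OurError::EmptyHost => write!(f, "host must not be empty"),
        }
    }
}

/// The return type says it all: either a (non-null) boxed connection or an error.
fn connect(param: &ConnectionParameters) -> Result<Box<DatabaseConnection>, OurError> {
    if param.host.is_empty() {
        return Err(OurError::EmptyHost);
    }
    Ok(Box::new(DatabaseConnection { host: param.host.clone() }))
}

/// The caller cannot reach the connection without acknowledging the error case.
fn describe_connection(param: &ConnectionParameters) -> String {
    match connect(param) {
        Ok(conn) => format!("connected to {}", conn.host),
        Err(err) => format!("failed: {}", err),
    }
}
```

Unlike the C++ signature, there is no need to consult the documentation to learn how failure is reported.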

For unrecoverable exceptions – for situations like logic and programming errors that the program has caught – Rust has panics, which work much more like C++ exceptions in practice.

Panic Safety

Rust aficionados will know that Rust has not escaped exception safety, having instead an analogous notion of “panic safety.” How, then, can I criticize C++ so boldly?

There are two notable differences between C++ exceptions and Rust panics. The first is that Rust panics are used primarily for unrecoverable errors: errors that indicate that the programmer’s assumptions were violated, whether due to a bug, a misunderstanding on the programmer’s part, or a circumstance that the program cannot recover from. Rust by convention uses a different mechanism, Results, for recoverable errors. So most Rust code doesn’t have to care about maintaining invariants in the face of panics, because most Rust code can presume that if it panics, that’s the end. This is better scoping for the panic feature, as opposed to exceptions.

But the fact remains that panics can be recovered from, and do still do stack unwinding and destructor/drop calls, and safety issues can still exist. Panics in Rust can cause memory corruption – in unsafe code. And that’s where panic safety really still matters: in unsafe code only. By cordoning off the implementations of sophisticated data structures that require unsafe, Rust also cordons off who has to worry about panic safety.
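The unwinding behavior can be observed with the standard library's std::panic::catch_unwind. In the sketch below, Guard, risky, and run_guarded are hypothetical names, and the global counter is only there to show that drop still runs when a panic unwinds the stack.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many guards have been dropped (for demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Panics on a violated assumption; `_guard` is dropped either way.
fn risky(divisor: i32) -> i32 {
    let _guard = Guard;
    if divisor == 0 {
        panic!("caller passed a zero divisor: this is a bug");
    }
    100 / divisor
}

/// Recovers from the panic at a boundary, turning it into `None`.
fn run_guarded(divisor: i32) -> Option<i32> {
    panic::catch_unwind(move || risky(divisor)).ok()
}
```

In safe code like this, the worst a badly timed panic can do is leave data logically inconsistent. Memory corruption requires unsafe code that was interrupted halfway through an update.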

In C++, every function that calls another function has to be written in an exception-safe way. In Rust, it’s really only unsafe code that has to worry about it. This, in my mind, is a huge win, and it comes from both better scoping of panics, and better management of the situations where panics can break things.

C-style vs C++-style Pointers and Arrays

There is a subset of C++ that is almost identical to C, and C++ must maintain compatibility with this subset for tradition’s sake. It also must maintain compatibility with previous versions of itself. Between the C and the C++, the concepts contained in C++20 stretch from 1972 to 2020, almost 50 years of active change in programming language technology. This leads to features being duplicated, but differently, and in ways that unfortunately clash with each other.

For example: How do you express indirection? How do you alias a value? There are three different ways to do it, and rather than breaking down by use case, the biggest difference between them is era:

Pointers, from the original C
References, a newer innovation that attempts to solve some of the issues with pointers
Smart pointers, an even newer innovation that attempts to cover some of the remaining use cases. For pointers into arrays, iterators also cover a lot of the same territory as smart pointers, and can be lumped together for this conversation.

These overlap a lot, and there is no single principle that will tell you when to use which. You can invent some rules, and come up with some principled reasonings for them, but your colleagues won’t necessarily listen, and external libraries and other codebases you have to interact with certainly won’t, not even the standard library, not even the programming language itself. Fundamentally, the difference is era.

Nullability? Part of original pointers. Later, we learned it was harmful and got rid of it in references, but due to issues with how C++ does move semantics it comes back with a vengeance for smart pointers. (Of course, you still can make a null reference, it’s just undefined behavior. Ah well.)

Pointers and references have special syntax, whereas smart pointers, because they came from a later era, use the more standard ptr_type<T> syntax. Pointers and smart pointers can be used to manage ownership, and references should not be.

How should out parameters be expressed? It’s easy to say they should be expressed with references, because otherwise they’re nullable, and you have to worry about whether to check for nulls or not. On the other hand, expressing out parameters with references means you can’t tell at the caller whether it’s an out parameter, only at the callee:

int foo_ptr(int in, int *out);
int foo_ref(int in, int &out);
int out;
foo_ptr(3, &out);
foo_ref(4, out); // Surprise, this changes `out`! Can't tell, though!

foo_ptr(5, nullptr); // Does this crash? Does this work? Who knows!
// Read the docs, I guess *shrug* hope there are docs

References should be used, in my practice and in the practice of many people I respect, in every case where the reference is not owning, will not be used for arithmetic, and is not optional. Of course, the this pointer meets all of those requirements, but it is a pointer, not a reference (albeit a special pointer, where being null is undefined behavior, like a reference), simply because references were invented after the this pointer was, and for no stronger reason.

Similarly, my practice dictated that std::unique_ptr should be used for owning pointers. It’s nullable, but at least it auto-frees, and so you should use it everywhere you’re conveying ownership. And then, Foo * can be used when you want an optional non-owning reference. But old APIs and APIs from C exist all over the place that will use Foo * invariably, and some will use Foo * for out parameters because of the call-site readability issue, or because of concerns about std::unique_ptr, or simply out of old habit, meaning you can’t count on this convention actually being upheld, not at all.

And of course, converting between these different representations is sometimes as easy as & or *, and sometimes as difficult as having & and * compile and seem to work but result in memory corruption, and everywhere in between.

Similarly, T foo[N] and std::array<T, N> foo are different ways of writing the same basic thing. It gets weird when N = 0, of course; this is only supported by std::array. And (on compilers that support it at all), having N be dynamic on the stack is only supported by T foo[N]. And of course, new T[N] returns a raw pointer to T whereas new std::array<T, N> returns a pointer to a std::array, which makes much more sense.

So, basically, T foo[N] should be completely deprecated, but keeps on being used even by new C++ programmers because it looks like it should be the normal way to write an array, and because it looks like the arrays from C. But they’re completely different types – one isn’t syntactic sugar for the other.

This gets unwieldy, because the ways with the syntactic sugar (like new and T* instead of std::make_unique and std::unique_ptr) are the old, more C-style ways, the ways that yield more memory leaks (you have to explicitly free or delete a T*) and memory corruption (T foo[] doesn’t even have a safe indexing operation, or proper iterators).

And of course, even if you use the more modern formulations to save on cognitive load because they’re more consistent with the rest of the programming language (std::unique_ptr does RAII, unlike traditional pointers spelled with *, and std::array implements the expected collections methods, unlike traditional arrays spelled with []), you still have to understand the traditional pointers and arrays completely to call yourself a C++ programmer. Due to C interop, people not changing their ways, and old resources, lots of new code is still written with them, and there are still situations where they’re unavoidable, like pointer arithmetic or the this pointer.

Besides, even if you do correctly discern that a T* must be freed, how do you free it – free, delete, or delete[]? Choose wisely, because the consequences of mixing malloc and delete can go beyond whether destructors are called, and lead to undefined behavior and general memory corruption. The documentation (or lack thereof), however, might just assume you know which one to call.

How Rust Handles This

Rust also has references and various types of smart pointers and iterators. It also has raw pointers, from which smart pointers can be implemented. So in terms of the range of features, it’s actually the same as C++. What’s the difference then?

Well, in Rust, the difference is that they don’t overlap in the same way. Each feature has its own purpose, unlike in C++ where it’s anyone’s bet whether references or pointers are used for aliasing or pass-by-reference, or whether raw pointers or smart pointers are used to express ownership. Nullability is mostly a separate concern from day-to-day use of Rust’s types, and so it is implemented orthogonally through Option and Result, rather than being available in some types but not in others haphazardly.
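As a small illustration of nullability being layered on from the outside, the sketch below wraps a reference and a Box in Option. The function names are invented for this example.

```rust
use std::mem::size_of;

/// "Maybe a reference": absence is expressed by Option, not by the reference.
fn find_first_negative(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v < 0)
}

/// "Maybe an owned heap value": same idea, applied to Box.
fn boxed_if_positive(n: i32) -> Option<Box<i32>> {
    if n > 0 {
        Some(Box::new(n))
    } else {
        None
    }
}

/// Option around a reference or a Box costs no extra space, because the
/// compiler uses the otherwise-forbidden null value to represent `None`.
fn option_is_free() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
        && size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}
```

The same Option wrapper works uniformly for every pointer-like type, and a plain &i32 or Box<i32> never needs a null check.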

References are for everyday aliasing and pass-by-reference. They are not nullable. They represent the primitive concept of aliasing and pass-by-reference, and they are the only feature that does so. Unlike C++ references, you must use the & operator to create a Rust reference, making them explicit on the caller.
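Here is a minimal sketch of those points. The names (Config, port_of, find_even, bump) are made up for illustration: the caller has to write & or &mut to lend a value, and a reference that might be absent is spelled Option<&T> rather than being null.

```rust
// Hypothetical type, for illustration only.
struct Config {
    port: u16,
}

// Takes a shared reference: the caller keeps ownership of the Config.
fn port_of(config: &Config) -> u16 {
    config.port
}

// "Maybe a reference" is expressed with Option, not with a null value.
fn find_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|value| **value % 2 == 0)
}

// Mutation through a reference requires `&mut` in the signature,
// and therefore an explicit `&mut` at every call site.
fn bump(counter: &mut u32) {
    *counter += 1;
}
```

A caller would write port_of(&config) or bump(&mut counter), so the borrow is visible where it happens rather than only in the callee's signature.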

Smart pointers are, for the most part, also not nullable, possibly partially because Rust has destructive moves. They represent ownership semantics – whether “unique” ownership (Box), shared ownership (Rc or Arc), or locking (Mutex or RefCell).
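As a rough sketch of how these ownership flavors look in code (the function names are hypothetical, and only standard-library types are used):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Unique ownership: the Box owns its heap allocation and frees it on drop.
fn boxed_answer() -> i32 {
    let boxed = Box::new(41);
    *boxed + 1
}

// Shared ownership: cloning an Rc bumps a reference count, and the value
// is freed only when the last owner goes away.
fn shared_owner_count() -> usize {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first);
    let count = Rc::strong_count(&first);
    drop(second);
    count
}

// Mutation behind shared ownership: RefCell checks the borrow rules at
// run time instead of at compile time.
fn shared_counter() -> u32 {
    let counter = Rc::new(RefCell::new(0u32));
    let alias = Rc::clone(&counter);
    *alias.borrow_mut() += 1;
    *counter.borrow_mut() += 1;
    let value = *counter.borrow();
    value
}

// An owning pointer that may be absent is Option<Box<T>>, not a null Box.
fn unwrap_or_zero(maybe: Option<Box<i32>>) -> i32 {
    match maybe {
        Some(boxed) => *boxed,
        None => 0,
    }
}
```

None of these functions contains an explicit free or delete; each value is released when its last owner goes out of scope.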

Raw pointers in Rust are very special – they are for implementing smart pointers or other low-level data structures. They are for situations where the structure of memory and the concept of a pointer is actually key to the situation. They are kept within these narrow bounds, and outside of everyday application programming, by having most of their features considered unsafe.
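A small sketch of where the unsafe boundary falls, with hypothetical function names: creating and passing a raw pointer is ordinary safe code, while dereferencing it or doing pointer arithmetic has to happen inside an unsafe block.

```rust
// Turning a reference into a raw pointer is safe; reading through it is not.
fn read_through_raw(value: &i32) -> i32 {
    let ptr: *const i32 = value;
    // SAFETY: `ptr` was just derived from a live reference, so it is
    // valid, aligned, and points to an initialized i32.
    unsafe { *ptr }
}

// Pointer arithmetic, the kind of thing a low-level data structure might
// do internally while exposing a safe interface.
fn second_element(values: &[i32]) -> Option<i32> {
    if values.len() < 2 {
        return None;
    }
    let base: *const i32 = values.as_ptr();
    // SAFETY: the length check above guarantees index 1 is in bounds.
    Some(unsafe { *base.add(1) })
}
```

Both functions are safe to call: the unsafe parts are confined to the bodies, and the caller never handles a raw pointer.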

If only that could be done for raw pointers in C++! But there is too much momentum behind the C++ raw pointer.

Dynamic vs Static Polymorphism

This is the most intense one, and could be a blog post all on its own – and probably I’ll write it one day.

In response to comments, I’m going to add a caveat here even though I address it later: In this section, I’m discussing the status of C++ pre-concepts, from C++17 and earlier, because that is the form of C++ that most people are still using, and that the vast majority of code is still written in. It is too early to tell how much concepts will help, but because they are an optional feature, I’m not at all optimistic.

We have two forms of polymorphism in C++, two very different systems. One is a Turing-complete macro system that comprises overloads, templates, and template metaprogramming. The other is an object-oriented style system of polymorphism through inheritance.

They were designed with different purposes in mind, and considering their original purpose, it’s clear to see why they must have different implementations.

Templates were designed for collections and algorithms, for being able to write a vector or linked list that could contain any arbitrary type, without resorting to a C-style void* that would require both indirection and type erasure. The lack of indirection is the point – at least it was for C++ – and so as a consequence templates had to be carried out statically.

Dynamic polymorphism, on the other hand, was designed for OOP design patterns. As such, in line with OOP principles, it supports heterogeneous containers, especially necessary to support OOP’s core use case of GUI programming.

But in spite of this deep contrast between static and dynamic, they overlap in use case. For example, Smalltalk, Objective-C, and Java (pre Java 5) all show us that you can use dynamic polymorphism to implement generic containers. If C++ had been less performance-centric, and could tolerate the indirection, it could have used a similar strategy, the (old school) Java approach to generic containers without generics or templates:

Make all classes inherit from a universal base class, Object. This way, Object * (just Object in Java) can refer to any object. Make sure, for C++, that this has a virtual destructor, so you can delete any object through its Object* handle.
Write “boxed versions” of all primitive types, classes that extend Object to correspond to int and double, etc.
Write all collection classes (std::vector, std::list) in terms of Object *, writing Object * instead of T.
Use RTTI and dynamic_cast (or in Java terms, casts) to allow the user to get whatever object type they want out of them.

Voilà! You can now store anything in your collections without need for generics or templates, using dynamic_cast, an obscure feature of the OOP-style dynamic polymorphism that C++ has. And this system is in fact still the basis of Java generics, and so we can project that C++ would have used something similar if performance weren’t a concern and indirections and RTTI were acceptable.
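For comparison, Rust's standard library has a rough analogue of this scheme in std::any::Any, with a checked downcast playing the role of dynamic_cast. The sketch below is only an illustration of the technique, not how Rust collections are normally written, and the function names are hypothetical:

```rust
use std::any::Any;

// A container of "anything", analogous to a collection of Object* in the
// scheme above. Every element is boxed, so each one costs an indirection.
fn make_bag() -> Vec<Box<dyn Any>> {
    let mut bag: Vec<Box<dyn Any>> = Vec::new();
    bag.push(Box::new(42_i32));
    bag.push(Box::new(String::from("hello")));
    bag.push(Box::new(2.5_f64));
    bag
}

// The checked downcast yields None when an element is not of the
// requested type, so elements of other types are simply skipped.
fn sum_of_ints(bag: &[Box<dyn Any>]) -> i32 {
    bag.iter()
        .filter_map(|item| item.downcast_ref::<i32>())
        .sum()
}

// Returns the first element that happens to be a String, if any.
fn first_string(bag: &[Box<dyn Any>]) -> Option<&str> {
    bag.iter()
        .find_map(|item| item.downcast_ref::<String>())
        .map(|text| text.as_str())
}
```

The element types are only known at run time here, which is exactly the cost that templates (and Rust generics) avoid.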

So that shows the overlap between templates and runtime polymorphism in a theoretical sense, but do these very differently implemented features in fact overlap in practice?

I’ve seen skepticism. I once interviewed people for a job, and I asked candidates to explain to me the similarities and differences between dynamic polymorphism and templates. One candidate said there was no overlap; templates were for generic programming (e.g. collections and algorithms and STL), and dynamic polymorphism was for object-oriented programming.

But they do overlap in practice. I know, because I spent a lot of time transitioning object-oriented dynamic code into static form, and teaching the static equivalents to dynamic polymorphism patterns. It wasn’t easy, because even though the overlap is huge, the semantics are vastly different.

Let me give an example. Let’s start with one of my favorite patterns: the policy pattern. Let’s imagine we have a function that sends messages in a way that can fail, and let’s also imagine that we have a policy that indicates how we should delay and retry sending this message. I’ll start out writing it the object-oriented way, something like this:

struct RetryPolicy {
    virtual bool should_retry(mesg_send_err_t error_code) = 0;
    virtual uint32_t delay_microseconds() = 0;
};

mesg_send_err_t retry_send_message(Message &mesg, RetryPolicy &policy) {
    while (true) {
        auto err = send_message_once(mesg);
        if (err == mesg_send_err_t::SUCCESS) {
            return mesg_send_err_t::SUCCESS;
        } else if (!policy.should_retry(err)) {
            return err;
        } else {
            usleep(policy.delay_microseconds());
        }
    }
}

The policy can then do things like “retry 5 times, waiting 0.01 seconds between each retry” or “exponential back-off, so that each retry waits twice as long as the previous.” It can also deem certain errors as fatal, but others as worth sleeping and retrying for. Here’s an example of using this interface:

struct WaitOneSecondAndTryFiveTimes : RetryPolicy {
    int retry_count = 0;

    bool should_retry(mesg_send_err_t error_code) override {
        if (error_code == mesg_send_err_t::MALFORMED_MESG) {
            return false;
        }

        retry_count++;
        if (retry_count == 5) {
            return false; // do not retry
        }

        return true; // do retry
    }

    uint32_t delay_microseconds() override {
        return 1000000;
    }
};

WaitOneSecondAndTryFiveTimes policy;
auto err = retry_send_message(mesg, policy);

Now, it turns out we can do this exact same pattern with static polymorphism. The callee code now looks like this:

template <typename T>
mesg_send_err_t retry_send_message(Message &mesg, T policy) {
    while (true) {
        auto err = send_message_once(mesg);
        if (err == mesg_send_err_t::SUCCESS) {
            return mesg_send_err_t::SUCCESS;
        } else if (!policy.should_retry(err)) {
            return err;
        } else {
            usleep(policy.delay_microseconds());
        }
    }
}

This is no longer a function. It is a function template, which is a type of macro. Its implementation must now move from the .cpp file to the .h or .hpp file, for reasons that only make sense if you think about how the programming language is implemented.

No longer is the policy interface spelled out separately. The only thing the function signature says about the type of policy is that it is T – which can be any type. Only in the implementation, in the body, do we see that should_retry() and delay_microseconds() must be implemented on it. This is an implicit interface, defined by usage, very similar to Python and Ruby’s duck typing. More importantly, it is completely unrelated to the OOP-style explicit interface using inheritance and virtual functions.

The errors are completely different, because the rules are completely different. If you forget to implement delay_microseconds, the OOP-style version gives you:

test.cpp:57:34: error: variable type 'WaitOneSecondAndTryFiveTimes' is an abstract class
    WaitOneSecondAndTryFiveTimes policy;
                                 ^
test.cpp:22:22: note: unimplemented pure virtual method 'delay_microseconds' in 'WaitOneSecondAndTryFiveTimes'
    virtual uint32_t delay_microseconds() = 0;
                     ^
1 error generated.

With the template version, you get:

test.cpp:34:27: error: no member named 'delay_microseconds' in 'WaitOneSecondAndTryFiveTimes'
            usleep(policy.delay_microseconds());
                   ~~~~~~ ^
test.cpp:59:16: note: in instantiation of function template specialization 'retry_send_message<WaitOneSecondAndTryFiveTimes>' requested here
    auto err = retry_send_message(mesg, policy);
               ^
1 error generated.

The ad-hoc nature of template requirements should not be underestimated. It means that objects that are designed to work with a whole library might only work with the exact combinations of functions they’ve been used with so far. It means that documentation, if it wants to be rigorous, must do the work of defining the protocols itself of every argument taken by every function. It means that it’s not clear when you’re putting new requirements on arguments to a function, as there is no warning and no clear red-line step to tell you that you’re breaking backwards-compatibility.

Concepts have been introduced recently to clean it up, and I think it’s still early to tell how good a job they will do. But even if they do a great job, the polymorphism will still look very different from the OOP style, and the old template-based code will still exist, and so in the meantime the C++ programming language has simply continued to grow.

And the concrete consequences: It’s a perfectly reasonable decision to use OOP-style polymorphism, for the benefits of cleaner structure and explicit specification of the interface, even when the dynamic nature of the polymorphism – and its concomitant performance costs – is never actually called for. Meanwhile, using static polymorphism to accomplish the same goals is simply harder, requiring much more skill and training.

How Rust Handles This

Like C++, Rust has both static (compile-time) polymorphism, and dynamic (run-time) polymorphism. Unlike C++, Rust integrates them closely into a single feature, inspired by Haskell’s typeclasses: traits.

Let’s use the same example again, but in Rust, using static polymorphism, which is the more Rusty way to write such a function:

trait RetryPolicy {
    // Return None to not retry at all
    // Takes `self` as `&mut` to implement counting and back-off
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

fn retry_send_message(mesg: &Message, mut policy: impl RetryPolicy) -> Result<(), MesgSendError> {
    loop {
        match send_message_once(mesg) {
            Ok(()) => {
                return Ok(());
            }
            Err(err) => match policy.retry_microseconds(err) {
                None => {
                    return Err(err);
                }
                Some(delay) => sleep(delay),
            },
        }
    }
}

I changed the example a little to showcase some other differences with Rust. Instead of querying two functions, for example, to know whether to try again and how long to delay, I feel in Rust it is more natural to use sum types (and in particular Option) to fold them into a single function. Similarly, rather than a u32 count of microseconds, std::thread::sleep takes a Duration, and so I felt the policy trait should reflect that as well. Also, last but not least, in Rust it is not necessary to consider SUCCESS to be one of the error options, and so the types are more well-honed to the situation.

Notice that this is the more performant static version, and yet it has an explicit in-code specification of what the interface is for the policy. The policy code and the generic code are nevertheless fully integrated, just like in the C++ templated version, through a process known as monomorphization. Fundamentally, monomorphization exhibits the same behavior as C++ template instantiation, but in a more principled, constrained fashion.
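To make that concrete, here is a sketch using a simplified stand-in for the trait (the error argument is dropped, and NeverRetry, FixedDelay, and the first_delay functions are hypothetical). Writing impl RetryPolicy in argument position is shorthand for a generic parameter with a trait bound, and the compiler generates a separate copy of the function for each concrete policy type it is called with.

```rust
use std::time::Duration;

// Simplified stand-in for the trait in the text (no error argument).
trait RetryPolicy {
    fn retry_delay(&mut self) -> Option<Duration>;
}

struct NeverRetry;
struct FixedDelay(Duration);

impl RetryPolicy for NeverRetry {
    fn retry_delay(&mut self) -> Option<Duration> {
        None
    }
}

impl RetryPolicy for FixedDelay {
    fn retry_delay(&mut self) -> Option<Duration> {
        Some(self.0)
    }
}

// `impl Trait` in argument position...
fn first_delay_sugar(mut policy: impl RetryPolicy) -> Option<Duration> {
    policy.retry_delay()
}

// ...is shorthand for an explicit generic parameter with a trait bound.
// Each concrete `P` used by a caller gets its own compiled copy.
fn first_delay<P: RetryPolicy>(mut policy: P) -> Option<Duration> {
    // Only methods declared in RetryPolicy are available on `policy`;
    // calling anything else is rejected here, inside the generic body,
    // no matter which types callers pass in.
    policy.retry_delay()
}
```

Unlike the C++ template, the body is checked once against the trait, so a missing method is reported at the impl block rather than deep inside an instantiation.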

Here is the example of the usage of such a polymorphic function:

struct WaitOneSecondAndTryFiveTimes {
    retry_count: u32,
}

impl WaitOneSecondAndTryFiveTimes {
    fn new() -> Self {
        Self {
            retry_count: 0,
        }
    }
}

impl RetryPolicy for WaitOneSecondAndTryFiveTimes {
    fn retry_microseconds(&mut self, err: MesgSendError) -> Option<Duration> {
        if err == MesgSendError::MalformedMessage {
            return None;
        }

        self.retry_count += 1;
        if self.retry_count == 5 {
            return None;
        }

        Some(Duration::from_secs(1))
    }
}

let policy = WaitOneSecondAndTryFiveTimes::new();
let res = retry_send_message(&mesg, policy);
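The same trait also accommodates the exponential back-off policy mentioned earlier. This is only a sketch: the enum below is a hypothetical stand-in for MesgSendError (the Timeout variant is invented), and ExponentialBackoff is not from any real library.

```rust
use std::time::Duration;

// Hypothetical stand-in for the error type used in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MesgSendError {
    MalformedMessage,
    Timeout,
}

trait RetryPolicy {
    // Return None to not retry at all
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

// Doubles the delay after every failed attempt and gives up once the
// retry budget is used up.
struct ExponentialBackoff {
    next_delay: Duration,
    retries_left: u32,
}

impl ExponentialBackoff {
    fn new(initial_delay: Duration, max_retries: u32) -> Self {
        Self {
            next_delay: initial_delay,
            retries_left: max_retries,
        }
    }
}

impl RetryPolicy for ExponentialBackoff {
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration> {
        // A malformed message will never succeed, so treat it as fatal.
        if error == MesgSendError::MalformedMessage || self.retries_left == 0 {
            return None;
        }
        self.retries_left -= 1;
        let delay = self.next_delay;
        // Saturate instead of panicking if the delay ever overflows.
        self.next_delay = delay.saturating_mul(2);
        Some(delay)
    }
}
```

Because the decision to retry and the length of the delay come back in one Option<Duration>, the policy cannot report a delay for a retry it has refused.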

If we wanted to use dynamic polymorphism for some reason – for example, if we wanted to look the policy up in some sort of map based on a user-supplied keyword, or load the policy from a dynamic library – we could, easily.

Unlike in C++, barely anything has to change. In fact, only three lines have to change.

The function signature has to change, to indicate that it’s using dynamic polymorphism now. Dynamic polymorphism, due to its nature, can only be done through indirection, so we have to add that (though it does not affect the function body):

fn retry_send_message(mesg: &Message, policy: &mut dyn RetryPolicy) -> Result<(), MesgSendError> {

Similarly, the call site has to change, to implement the indirection:

let mut policy = WaitOneSecondAndTryFiveTimes::new();
let res = retry_send_message(&mesg, &mut policy);

And that’s it! Now it’s dynamic polymorphism!
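As a sketch of the keyword-lookup use case mentioned above, trait objects let different policy types sit side by side in one map. Everything here is hypothetical scaffolding: the one-variant error enum, the NeverRetry and RetryOnce policies, and the helper functions exist only to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical stand-in for the error type used in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MesgSendError {
    Timeout,
}

trait RetryPolicy {
    fn retry_microseconds(&mut self, error: MesgSendError) -> Option<Duration>;
}

struct NeverRetry;

impl RetryPolicy for NeverRetry {
    fn retry_microseconds(&mut self, _error: MesgSendError) -> Option<Duration> {
        None
    }
}

struct RetryOnce {
    used: bool,
}

impl RetryPolicy for RetryOnce {
    fn retry_microseconds(&mut self, _error: MesgSendError) -> Option<Duration> {
        if self.used {
            None
        } else {
            self.used = true;
            Some(Duration::from_millis(100))
        }
    }
}

// Policies registered under user-facing keywords. The values are trait
// objects, so the concrete type is only known at run time.
fn policy_registry() -> HashMap<&'static str, Box<dyn RetryPolicy>> {
    let mut registry: HashMap<&'static str, Box<dyn RetryPolicy>> = HashMap::new();
    registry.insert("never", Box::new(NeverRetry));
    registry.insert("once", Box::new(RetryOnce { used: false }));
    registry
}

// Takes `&mut dyn RetryPolicy`, like the dynamic signature above, and
// collects the delays the policy hands out before it gives up.
fn delays_until_give_up(policy: &mut dyn RetryPolicy) -> Vec<Duration> {
    let mut delays = Vec::new();
    while let Some(delay) = policy.retry_microseconds(MesgSendError::Timeout) {
        delays.push(delay);
    }
    delays
}

// Looks a policy up by keyword; None means no such policy is registered.
fn delays_for_keyword(
    registry: &mut HashMap<&'static str, Box<dyn RetryPolicy>>,
    keyword: &str,
) -> Option<Vec<Duration>> {
    let policy = registry.get_mut(keyword)?;
    Some(delays_until_give_up(policy.as_mut()))
}
```

The policy implementations are written exactly as they would be for the static version; only the code that stores and passes them mentions dyn.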

It was when I first saw this that I was truly convinced that Rust would eclipse C++.

Discussion and Conclusion

So, assuming I’ve convinced you that Rust has a better organized feature-set than C++, we have to discuss what, in the big picture, C++ has done wrong and Rust has done right.

The first and most obvious thing Rust did right was learn from the mistakes of the past. Each new version of C++ has to be compatible with previous versions to a great extent, including (in a lot of ways) C, giving it a legacy back into the early 70’s. Rust started maintaining compatibility in 2015, and so it’s only had 7 years or so of cruft, but knew about all of C++’s later add-ons from the beginning.

And one of the things Rust learned from the experience of others is how to mitigate this effect, so we can hope Rust retains its youthful freshness for longer going forward. Rust has an edition system, so that features actually can be deprecated and phased out, while still maintaining compatibility.

But also, Rust’s goal of separating safe and unsafe features – and keeping unsafe code encapsulated using the unsafe keyword – forces Rust’s feature set to be more coherent. If two features clash in C++, the standards committee can put the work of reconciling them on the programmer, but in Rust, they often have to do the work to make them make sense together, so they can continue to guarantee that safe code can’t cause undefined behavior.

Additionally, Rust believes in, and has, invariants. In C++, some structs can be trivially copied. In Rust, all data types can be trivially moved. (Pin is almost but not quite an exception, and the work that went into making Pin not break everything shows how important the invariant is.) In Rust, a mutable reference always means that a block of code has exclusive access to a value. These invariants also structure other features, and force them to work in concert.
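A brief sketch of the two invariants named here, with hypothetical function names. The commented-out lines are the ones the compiler would reject.

```rust
// Moving is a plain transfer of ownership: no user-written code runs
// (there is no equivalent of a C++ move constructor).
fn take_ownership(names: Vec<String>) -> usize {
    names.len()
}

fn move_example() -> usize {
    let names = vec![String::from("a"), String::from("b")];
    let count = take_ownership(names);
    // names.len(); // would not compile: `names` was moved above
    count
}

// A function holding `&mut` knows nothing else can observe or change the
// value while it runs.
fn push_twice(log: &mut Vec<u32>) {
    log.push(1);
    log.push(2);
}

fn exclusive_example() -> Vec<u32> {
    let mut log = Vec::new();
    let exclusive = &mut log;
    // let shared = &log; // would not compile: `log` stays mutably
    //                    // borrowed until the last use of `exclusive`
    push_twice(exclusive);
    log
}
```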

Enough about Rust, though. I think there’s deeper lessons to be learned from the flaws in C++. Bjarne Stroustrup famously said, “Within C++, there is a much smaller and cleaner language struggling to get out.” I think he regrets the quote, which he clarified is about the modern semantics of C++, held down by the outdated syntax of C. It’s such a compelling quote, though, because C++ is so messy and dirty, so we want to believe in a small clean underlying core.

The truth, however, is that there isn’t one smaller and cleaner programming language struggling to get out. There’s multiple. And from the beginning, C++ was a glomming-together of multiple ideas: a Simula-like OOP system glued awkwardly to C, without a unified motivating vision. Operator overloading was intended to help integrate the two parts, which it did but at the expense of creating its own entire sub-paradigm. And then came templates, which tried to add generic containers and algorithms but unexpectedly exploded into their own programming paradigm.

So inside C++, struggling to get out, is of course C, the original “portable assembly,” which does its very simple job well. There’s also Java/C# in there, if we take the OOP features on their own. For the operator overloading and RAII and templates, the closest I can really imagine is Rust, which I think if Bjarne was being fair, he would have to admit is close to what he specified when he clarified his quote: Rust does emphasize “programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.”

It’s understandable that Bjarne glommed OOP, a foreign paradigm, onto the otherwise-stable base of C. OOP was extremely popular for a long time, and has been awkwardly glommed on to many programming languages, and I think Rust benefits from not even trying to be an OOP language in the traditional 3-pillar sense (Rust doesn’t have inheritance at all, and has non-OOP concepts of encapsulation and polymorphism).

C++ wasn’t even the only programming language to result from glomming on object-oriented programming to C, and of the two big ones, it is the more coherently integrated. Objective-C comes from a more dynamic tradition of object-oriented programming, and it really feels like two programming languages glued together, in this case C and Smalltalk.

I programmed Objective-C professionally for a while, and most of the time, the C only came out when you had to do a little bit of pure logic outside of the object-oriented framework. In the meantime, all of the OOP code had to be written using the little wisps of syntax C left behind, especially @, which basically served as a sigil to indicate that what followed was to be interpreted in an Objective-C way … which in an Objective-C codebase basically should have been the default.

At the time, I dreamed of the leaner programming language inside Objective-C (the non-C one), and even started designing a Smalltalk dialect designed to interact with Apple’s Cocoa APIs: CocoaTalk, I think it was called. Ultimately, Apple unveiled their concept of it, sharing many ideas with Rust, known as Swift. I felt very vindicated the day Swift was announced.

Rust is C++’s chance to get a leaner, cleaner programming language. The syntax is heavily influenced by C++, even as the semantics come from a variety of sources. The design was done de novo with guiding principles that allowed all of C++’s vast repertoire of features to be reimagined but working in concert with each other. As someone who used to love programming in C++, which enabled programming techniques no other programming language could, I continue to be deeply impressed by the feature design of Rust.

A Checklist of Dev-Ops Disciplines

2022-05-09T00:00:00+00:00

I have worked on a lot of programming projects in my time, and while I was a programming consultant I worked in a lot of different corporate environments. At some of them, it was easy to be concretely productive: I was able to contribute immediately, and at a rapid rate. At others, actual useful contributions would be impossible until I had a month or more of experience with a codebase, and even then every change would be a long slog. The difference can be overwhelming and palpable.

The biggest contributors to this difference weren’t what programming language was chosen (though I do care a lot about that), nor how well the code was factored (though that’s also very important), but rather the organizational structure that surrounded the code: the build system, the repo configuration, the tests, the documentation, the ticketing system – the stuff outside of the code itself that was essential to how programmers interacted with the code. Most, but not quite all, of what I’m talking about falls under the header of Dev Ops.

After having read (some of) Code That Fits in Your Head, I have come to believe in the importance of check-lists, so I’ll share with you my personal check-list of important dev-ops and dev-ops adjacent considerations when setting up a new project, so that developers can work rapidly, effectively, and with fewer mistakes.

The stakes are high – if it takes forever to make a change, if the process between modifying your code and running your code is too long, programmers won’t be able to work unless they’re much more confident, biasing them towards overly simple fixes and against more complicated refactors. If there’s no tests, programmers will be overly careful modifying the code to avoid breaking things, and so the code won’t be able to evolve. New team members will take much longer to gear up, and everyone will be much less productive.

The worst thing is, managers are liable to dismiss developers’ complaints, and developers are unlikely to have the confidence to raise them. It’s easy to be unsympathetic to the complaint that a job is tedious or inconvenient. It sounds to many programmers and managers alike like laziness, and the obvious answer is “Well, that’s why we pay you the big bucks.” Obviously development at these shops is still possible, and the old hands at the company, who are used to whatever system’s in place, have accepted the costs already.

But make no mistake: Developer convenience and happiness is closely connected to developer productivity and accuracy. So let’s discuss how to make a development environment convenient for a developer.

So here’s how we can make programming convenient, as a check-list with some explanation for each item. Many of these items I learned from colleagues and leaders along the way in my programming career; this is my first attempt to collect all of them.

Development Environment

Let programmers use their own preferred development environment

Many developers have life-long habits and long-accumulated configurations for their favorite editors. I know I do! Standardizing IDEs or even operating systems can be tempting, but in general it isn’t worth it. Programming includes a lot of little steps, and making all of them take longer by changing a programmer’s environment can destroy momentum.

Provide standards for developer workstations

This might seem to contradict the previous point, but I honestly think you need both. It should be super easy to figure out what kind of operating system requirements and dependencies are necessary to build all the projects, because, as we’ll get to soon, developers should be able to build projects locally.

For example, most Linux distributions are customizable enough that programmers will be able to find a development environment within that distribution that suits them. The dependencies of a project can then be specified as a package list within that distribution, but the developers can then customize the rest of their interface. Commonly used distributions should be preferred if developers are doing their own IT, so that they can easily find help online.

Build System

You’ve changed a line of code. Congratulations! Now how long will it be until you can see the results of your change in action? How many steps do you have to take to see if it fixed your issue? If it broke compilation? If it passes tests?

If the amount of time or number of steps is low, then people will be able to try out various solutions, use trace statements to debug issues, and otherwise interact with their code like a live system. If it’s high, they have to rely more on their own reasoning, which is fallible, more likely to lead to bugs, and more likely to lead to timid, overly-conservative changes, that work around problems rather than addressing them.

So how do we accomplish this?

Projects should run natively and directly on developer workstations

In my mind, this is almost a deal-breaker for development. If you have to deploy to a dev environment or install on a physical piece of embedded hardware to test your software, your dev cycle will be far too long. Dev environments and physical hardware are of course essential for testing, but using them for absolutely all development introduces resource constraints where there don’t have to be, and lengthens dev cycles.

Even if the local dev environment is different from the prod environment, that’s fine. Even if some of the code won’t run and make sense, it’s still important to be able to run the rest of code locally. Even if it’s running on an embedded platform and operating system with no proper simulator, some of the code will work on Linux or macOS. Those components should be testable on the developer workstation itself.

Building a project locally should be a single command
Building and running automated tests for a project should be a single command

When I say a single command, I mean it. Exactly one. Two is far too many. Once you try it, you’ll never go back. If your workplace doesn’t do this, write a script. Check the script in.

Of course, if different developers have different computers, this might be difficult, but if you assume a standard set of dependencies (or use a reproducible build system like NixOS), this command can just be an invocation of the build system.

In situations where it’s more complicated, a shell script should be written to encapsulate the complexity. This shell script should be included in the repo and maintained and checked by CI along with the other code, so that it always works. The exact invocation of the command should be completely invariant, and documented in the project’s README.md file.
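The same single-entry-point idea can also be written as a small checked-in program rather than a shell script, similar in spirit to the cargo xtask convention. The sketch below is in Rust and makes several assumptions: the Step type, the two cargo steps, and the function names are hypothetical placeholders for whatever a real project needs.

```rust
use std::process::Command;

// One build step: a program plus its arguments.
struct Step {
    program: &'static str,
    args: &'static [&'static str],
}

// Hypothetical steps for a Cargo project; substitute the project's own.
const STEPS: &[Step] = &[
    Step { program: "cargo", args: &["build"] },
    Step { program: "cargo", args: &["test"] },
];

// Runs one step, returning true only if the program could be started
// and exited successfully.
fn run_step(step: &Step) -> bool {
    Command::new(step.program)
        .args(step.args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

// Runs the steps in order with the given runner and stops at the first
// failure, reporting the index of the step that failed.
fn run_all(steps: &[Step], runner: impl Fn(&Step) -> bool) -> Result<(), usize> {
    for (index, step) in steps.iter().enumerate() {
        if !runner(step) {
            return Err(index);
        }
    }
    Ok(())
}
```

A main function would call run_all(STEPS, run_step) and exit with a non-zero status on Err, so that developers and CI both invoke exactly one command. Taking the runner as a parameter keeps the ordering logic testable without actually launching a build.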

Programmers don’t need to be distracted by complicated multi-part instructions that haven’t worked exactly right in years, or that work on some machines and not others, or by twiddling with their Docker settings. They should be focused on actually improving and fixing code.

Building and running a project locally should be a single command

This is similar to the above but might require sample configuration to be checked in along with the repo.

Builds should be reasonably fast

Developers should program on sufficiently powerful computers for their builds. Build scripts should use options like -j and if helpful send builds seamlessly to build farms (the seamlessness is important; it should still be a single command for the developer and result in a local build and run). Private caches should be set up, if this is possible with your build system.

If programming in C or C++, header file hygiene can be an important consideration in build speeds – invest time into it. Use incrementality features of your build systems rather than having scripts that clean every time. Structure the code so that incremental builds are possible.

If necessary, allow developers to build only part of the project (while still making it simple to build the entire project).

Version Control

Use version control for all projects

This is hopefully obvious to all modern teams, but I wanted to make sure I said it anyway to talk some about why it’s important.

The first and more obvious upshot of version control is being able to undo and research mistakes. If the code changed how it works, developers should be able to ask “when did it break” before asking “how did it break.” If the changes in the log are fine-grained enough, this might prevent the need for investigating the “how.” (Note that bisecting often requires fast dev turn-around as well – these are interconnected.) Version control should always be used. Even informal, one-person projects, such as writing test programs to try out APIs, should exist within a version-controlled repo.

The second upshot is that it enables collaboration. This also makes it important even for very small projects, because it enables you to easily ask your colleagues for help, and your colleagues can then look at the code with their own preferred development environment and try out fixes on their own machine.

Developers should be proficient in Git

It’s not enough to cargo cult Git knowledge or focus on that “one guy who understands Git.” Everyone should put the effort in to be that “one guy.” If you don’t know what “reflog” means or how rebasing differs from merging or how to edit commits deep in the history, you’re not a sufficiently proficient Git user. Many, perhaps even most, programmers aren’t.

Use and Enforce a Branching Discipline

Even on relatively small projects, no one should be committing and pushing directly to the trunk/main/master branch. If people push directly to master, every commit is automatically collaborative. This will make developers commit less frequently than they otherwise should, and will decrease the effectiveness of version control by having fewer versions to go back to.

It will also, obviously, lead to people accidentally “breaking the build” as projects get bigger. Committing a small change and merging that change into trunk or develop should be two different actions. The first should be done extremely often, and the second should only be allowed if a certain number of hoops have been jumped through.

Enforce CI

Before code can be merged into master, it should build. By default, merging into master should be impossible unless the repository has verified the build with CI. This is where we can easily test that it builds in a deployment setting in addition to a development setting, where artifacts can be created to deploy to servers or embedded devices (though this should also be possible to do locally) and where we can run automated tests. Coding standards should be enforced here, through lints. clippy and cargo fix are great tools for Rust.

Ideally, your CI scripts should be checked into the same repo as the systems they test, as is supported by GitLab with its .gitlab-ci.yml files.

Have tests in the repo

This is related. I’m not going to go into how to write tests and test coverage and all of that here; that’s again a separate topic for many many books. But there should be tests, and the important tests should be in the repo, and they should automatically be run by CI.

Remember: Tests aren’t just a tool for making sure the developers didn’t mess up after the fact. They’re there so developers can make sweeping changes with confidence.

Avoid mono repos

This one’s simple: The git log is too spammy and CI for the whole thing takes too long to run. Also, we have the technology of submodules, or, if on Nix, nix-thunk.

Require code review

This should be enforced by your Git system. As for how to actually do code review, this is a big enough topic to be its own section, which is coming up.

Code Review

The main point of code review is not to make sure bugs don’t get into the code, although it helps with that. The main point of code review is to mitigate bus factor, that is, to make sure there’s more than one person who is ready to maintain the code. All other guidance flows from here.

At least the person who maintains the code should also review

If the MR is written by the primary maintainer of the codebase, it should be reviewed by whoever would have to step up if they were abruptly “hit by a bus.”

This ensures that everyone maintaining the code is in agreement not just on style and correctness concerns, but on the general design, architecture, and organization of the code.

The standard should be “Would I take responsibility to maintain this?”

If the answer is no, why not? Asking myself this question motivates me to make more suggestions about how the code should be factored, so I can jump in and make changes easily like I can with my own codebases, rather than simply verifying that it looks like it works and doesn’t have any unwrap() calls.

This question leads to some natural sub-questions:

How hard is it to find bugs in?

It shouldn’t just not have bugs, it should be obvious it doesn’t have bugs. This way, when a bug is actually discovered, code that isn’t buggy but is complicated won’t distract the poor developer trying to find the cause.

How hard is it to modify to do something else?
How easy is it to mess up?

This is where DRY (don’t repeat yourself) comes in. If I repeat the same pattern of code more than 2 times, and someone modifies it, they might only modify some of the instances of the pattern. This can also be mitigated not through abstraction but by putting all the instances next to each other, which is sometimes appropriate.

The code, however, should also not do premature abstraction, because then it will be impossible to find issues among all the spaghetti of function calls and variable references, so this is a balancing act.
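To make the DRY trade-off concrete, here is a minimal Python sketch. The function names and the name-length rule are hypothetical, made up purely for illustration: the first pair of functions repeats a rule and has already drifted apart, while the second pair shares a single helper.

```python
# Hypothetical example: one validation rule, first repeated, then factored out.

def create_user_repeated(name: str) -> str:
    # First copy of the rule: trim whitespace and cap at 20 characters.
    return name.strip()[:20]


def rename_user_repeated(name: str) -> str:
    # Second copy: someone raised the cap to 30 here but missed the copy above.
    return name.strip()[:30]


MAX_NAME_LENGTH = 20


def normalize_name(name: str) -> str:
    """Single home for the rule, so a change applies everywhere at once."""
    return name.strip()[:MAX_NAME_LENGTH]


def create_user(name: str) -> str:
    return normalize_name(name)


def rename_user(name: str) -> str:
    return normalize_name(name)
```

One small helper like this is about as far as the abstraction needs to go; wrapping it in further layers is where the spaghetti starts.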

If a bug is found to be caused by this change, will we know which part to revert?

Remember, programmers should be able to bisect instead of having to read an entire codebase when they want to find a bug. If you found out that the bug was caused by this change set, would you be relieved to know or would you still have a lot of work ahead of you?
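The idea behind bisecting can be sketched in a few lines of Python. This is only an illustration of the binary search that a tool like `git bisect` automates, not its actual implementation; `first_bad_commit` and the `is_bad` callback are hypothetical names, and the sketch assumes the history is linear and that every commit after the culprit is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits are ordered oldest to newest, the newest one is known
    to be bad, and once a commit is bad every later commit is bad too.
    """
    low, high = 0, len(commits) - 1  # invariant: commits[high] is bad
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid      # the culprit is mid or something earlier
        else:
            low = mid + 1   # the culprit comes after mid
    return commits[low]
```

With eight commits this needs only three checks, which is why small, self-contained change sets pay off: the search ends at a commit small enough to read in full.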

Documentation

Last but not least, documentation.

Documentation should say how to build the project

It should, as mentioned, be one command, and it should not depend on very much setup beyond “having a standard development workstation.”

Documentation should say how to run the project

What flags or configuration does it take? How do you tell it to re-read the configuration? Does it use any environment variables?

Documentation should say what the project is for

This should be before how to build it and run it, and should explain who might want to run it and where it fits into the broader organization, and the first things a programmer might want to know before looking at it. This will help people understand the stakes of modifying it, and where to start looking for features. This should be covered in the lede paragraph.

Which leads me to:

There should be a lede paragraph

This should introduce the repo to someone who’s never heard of it and doesn’t have any context for what they’ve stumbled across. It should include its role in the company’s tech stack, its status, and what technologies it uses.

Here are some examples:

This is the main repo for our flagship product, and it is one of our few repos that is not open source. Customers use it directly to control the widget machines. It contains all the drivers for those machines, as well as the Node.js code that serves the user-facing web interface.

This is run as a twice-daily batch job to automate pruning the widget description files. It is run on customer machines, and is open source as local administrators might want to customize it. It is written entirely in Perl 4 except for one module that is written in APL. Sometimes, it doesn’t work correctly, and we have to manually run an earlier version written in JCL and Cobol (link).

This implements the new DSL for widget description. Currently, it only supports translation to old widget descriptions, but it is hoped that it will eventually be integrated into the main repo. It is a research project still under active development. It is written in Haskell and Idris, and contains, as a component, a custom Prolog interpreter.

Documentation should be discoverable

It should either be in the README.md of the relevant repo or linked to directly from there.

Ticket Systems

I guess I lied when I said that documentation was last. Project management is, I think, a topic for a different blog post, but what I wanted to say about this is: It should be very easy to add a new TODO item that the programmer doesn’t have to remember anymore. If it takes too long to make a ticket, developers will lose their flow on the project they were trying to work on, or will produce fewer tickets, in a bad way.

Ideal is “type a single sentence and press a single button” either in web or (preferably) command line. The resultant TODO items can then be fleshed out in a separate grooming meeting.
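As a rough sketch of how little ceremony that could involve, here is a hypothetical Python helper. The `TODO.txt` backlog file and the function names are assumptions for illustration only; a real setup would call whatever ticket system the team uses instead of appending to a file.

```python
import datetime
import pathlib
from typing import Sequence

# Hypothetical backlog location; stands in for a real ticket system.
DEFAULT_BACKLOG = pathlib.Path("TODO.txt")


def add_ticket(sentence: str, backlog: pathlib.Path = DEFAULT_BACKLOG) -> str:
    """Append a one-line ticket to the backlog and return the line written."""
    summary = " ".join(sentence.split())  # collapse newlines and repeated spaces
    if not summary:
        raise ValueError("ticket summary must not be empty")
    line = f"{datetime.date.today().isoformat()} {summary}"
    with backlog.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line


def main(argv: Sequence[str]) -> int:
    """Command-line entry point: treat all arguments as one sentence."""
    print(add_ticket(" ".join(argv)))
    return 0
```

Calling `main(sys.argv[1:])` from a script behind a short shell alias would get close to the “one sentence, one keypress” ideal; everything else about the ticket can wait for grooming.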

Conclusion

Paying attention to these things is a bigger multiplier on developer productivity than finding “10x developers,” and is essential for attracting and retaining good developers. Improving these things is hard, especially at organizations that are set in their ways, but it is far more important than it might look. Dedicated dev-ops professionals are essential in such things.

God grant me patience... and I want it RIGHT NOW!

2022-04-20T00:00:00+00:00

I’ve been feeling recently like I’ve been spinning my wheels in my personal life. I’m pressing on the metaphorical accelerator as hard as I can, probably too hard for safety, and instead of moving forward, the wheels are just spinning, spinning, spinning. I think a large part of it is my perspective of time. “Time is canceled,” my friends and I would say continuously during the lockdown. And it isn’t back, not yet, not how it used to be, not for me.

I would be far from the first to note the disconnect between the literal, constant, inexorable progression of time in a physical sense, and the wacky way in which we remember it. As Groucho Marx (apocryphally) framed it:

“Time flies like an arrow; fruit flies like a banana.”

When my friends and I would say “Time is canceled,” we meant it as a joke. But like many jokes, it was literally true; it was true of subjective time. While objective time is a physical property of the universe to be studied by scientists, subjective time is built out of rituals and milestones. Objective time is measured on clocks and dilated by near-light speed travel and gravity; subjective time is measured in hearts and minds and dilated by activities and events and locations.

Day isn’t just when our section of the earth faces the sun; it’s when we’re in the city, at or near the office. Night isn’t when we’re in the earth’s shadow; it’s when we’re at home, or at a bar. Weekend is not just a legal construct or a mark on a calendar, but it’s made out of brunches, mimosas, and daytime outings with friends, while still in Brooklyn.

Or at least that’s what these words meant before COVID. Once COVID hit, and the lockdowns came, all of these manifestations of time congealed into an undifferentiated gray goop. Subjective time was canceled, replaced with something incoherent. Time was simultaneously slow and fast: slow, because the beginning of COVID seemed infinitely long ago, as it felt like I had been starved of social contact for my entire life; and fast, as months flew by without commutes and outings and brunches and clubs and churches or anything at all to break up the gray goop of continuous apartment, apartment, and more apartment.

Then, eventually, after an infinite amount of time had passed in a few months, the restrictions started to lift. Gradually, and then suddenly, there was so much stuff to do. And so I experienced the summer camp effect.

The summer camp effect (not to be confused with, but perhaps related to, summer camp syndrome) is one of my favorite examples of subjective time. As I can’t find it via Googling, I think my friends might have discovered it from experience and first principles. The effect goes like this: When you were a child, summer camps lasted one week (leastways they did for me). But during a camp experience, you would make new friends, have a temporary best friend and rivalries, if you’re older maybe even a “camp girlfriend” or “boyfriend”!

Children at summer camps (and adults on retreats or vacations) get an entire in-camp life, squeezed somehow into a few objective days, but as large emotionally as an entire semester at school. This makes sense if you think about it. You’re doing completely different activities than you’ve previously done. Each day you’re doing a lot of adjusting, a lot of learning new routines, a lot of meeting new people. It’s just long enough that you take it seriously as something to acclimate to, a new context to judge everything by, but also short enough that every individual event has an outsized importance.

You know you’re experiencing this effect when you say things like “I just went swimming earlier today? That feels like three days ago.” And the reason is simple: Three days’ worth of different things have happened since then, both different from each other, and different from what you’re used to.

COVID was the opposite of the summer camp effect. During COVID, the set of events and activities I was used to dwindled to naught. Leaving the apartment at all felt like an event. “Three days ago? That felt like earlier today,” I would say, as less than a day’s events had happened in the past three days.

So then, as COVID restrictions thawed, completely normal outings, like going to a restaurant, felt like huge accomplishments – because in context they were. Every event where there were more than 2 or 3 people felt like a huge, even decadent, party! Because my baseline was so pathetically low, this was a perfect recipe for the longest-lasting summer camp effect I’ve ever experienced.

For months, for basically the entire second half of 2021, subjective time snapped back in the other direction like a rubber band. Each week felt like a month. Things that happened a few days ago felt like they were long-forgotten memories. So much was happening so quickly, even if it wasn’t objectively all that much, because I was de-acclimated.

And then, while still de-acclimated, while still experiencing this post-COVID perpetual summer camp effect, I up and moved to a new town. Now that I’m in said new town, visiting new places, meeting new people, navigating new obstacles and forging new routines, I’ve added an additional layer of summer camp effect on top of the “COVID recovery” time dilation I was already experiencing.

But of course, settling into a new place – especially a new house, when I’ve never owned a house before – is a lot of work, work that takes time.

And it’s not just the logistics and paperwork of moving (though of course there are ungodly reams of paperwork). I have to get acquainted with this new town, learn the dance that this town dances, a dance with a different rhythm than I’m used to. Some processes simply can’t be rushed, and that includes meeting new people and setting up new routines here.

So now I’m in a bit of a pickle: I’m experiencing time as going slower, while also having a lot of goals that simply will take a lot of time, no matter what I do. Between these facts, everything in my life is taking forever.

So I’m just here, spinning my wheels, particularly within my social and personal life, making no apparent progress on my urgent goal and top priority of developing a routine for my spare time and settling into my new living situation. Why don’t I have a routine worked out? Why don’t I have a full complement of local friends, a weekly game night, hobbies both social and solitary? Am I just not cut out for small town life after all?

Simultaneously, I’m feeling extremely busy setting up the house: Why isn’t it set up yet? Why is everything so untidy, again? Why do I feel like I’m perpetually behind on making the house look remotely presentable, or even correctly furnished? Am I just not cut out for home-ownership?

There’s a word for this feeling: burn-out. The popular conception of burn-out is when you’ve worked too hard, are exhausted from it, and cannot work anymore. And I certainly have worked hard: Getting a mortgage and buying a house is one of the most difficult, convoluted, and bureaucratic journeys I’ve ever undertaken. And I’ve since had to do a lot to “settle in,” a shockingly pleasant euphemism for a deeply stressful process.

But that isn’t the entire picture. Burn-out doesn’t come just from working too hard. Burn-out comes from a feeling that your hard work isn’t accomplishing anything – that accomplishing things may be impossible, because nothing’s happening even with all the effort put in.

Some examples of burn-out:

Therapists get burn-out when in spite of all their efforts, they can’t fix everything, and their patients still have the same problems.
Teachers get burn-out when all their efforts, whether creative lesson plans or well-structured incentives, still leave some students struggling with academic concepts and skills.
Programmers get burn-out when they spend aeons learning a new code-base and learn all its flaws, but instead of being able to fix the flaws and make everything easier in the long run, they just have to learn to live with it as the technical debt gets even worse and every simple task just takes 5x longer than it would if they could just spend some time to clean up some things.

It can be hard to recognize burn-out, because in the moment, it feels like laziness, failure, or procrastination. But it’s not about the amount of work. It’s about the ratio of (felt) work to (perceived) results. It’s about the feeling that maybe if you try even harder, you’ll get better results, when what you really need to do is step back and take stock of things, and make sure you have the right overall approach and right goals.

So here’s my example of burn-out to add to the list:

New residents of a town can get burnt out when they feel like they’ve done gazillions of things, but their life in that town is still not as full as their old pre-COVID life.

But, though I may have done gazillions of things, I haven’t actually lived here that long in objective time. I’m measuring my results in subjective time, and that’s unfair to my efforts. Some things just can’t be rushed.

All I can do now is remind myself it is too early to be making conclusions. Even if small town life is perfect for me, I shouldn’t expect to be in my regular groove already. Even if home-ownership suits me perfectly, I still should expect to be setting up this house for a while. I had to start reminding myself of this the day after I moved, and I have to keep reminding myself of this now. It’s my new constant mantra: “Some things just take time.” And also, when that gets old, I can chant: “It hasn’t even been that long!”

Because in the end, subjective time is not objective time. I can remember that, and use that fact as the weapon to fight my unreasonable emotions. Because in truth, all things properly considered, everything’s going according to plan. I may not have enough friends or activities here yet, but I’m working on it, and after all, I just moved here.

More on Mortgages

2022-04-19T00:00:00+00:00

Mortgage interest rates have recently risen, and are currently very volatile. At the time of this writing, PSECU, my credit union, is offering mortgages at 5.125%, much higher than the 3.125% I locked in at, but lower than the peak above 6% I had recently read about in the news. But what does this mean in practice? Well, let’s run some numbers.

Understanding how expensive a house is can be confusing. The total price of a house is a huge number, more money than we normally ever deal with, for most first-time buyers more money than they’ve ever actually had or seen. It can be intimidating.

The more accessible number – the more relevant number for our day-to-day lives – is the monthly payment. The monthly payment includes principal and interest, and also escrow for property taxes and insurance, but let’s focus on the principal and interest right now.

The total principal and interest portion of the payment stays constant, but over time more of the payment goes towards principal, due to amortization. For a standard American 30-year fixed rate mortgage, the size of that monthly payment is a linear function of the size of the mortgage and a non-linear function of the interest rate.
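The standard fixed-payment amortization formula makes this concrete, assuming monthly compounding, which is the usual convention for American mortgages:

$$ M = P \cdot \frac{r}{1 - (1 + r)^{-n}} $$

Here $M$ is the monthly principal-and-interest payment, $P$ is the amount borrowed, $r$ is the monthly rate (the annual rate divided by 12), and $n$ is the number of payments (360 for a 30-year mortgage). $P$ only appears as a multiplier, which is the linear part; $r$ appears both in the numerator and inside the exponent, which is the non-linear part.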

So at my interest rate of 3.125%, a $100,000 mortgage would correspond to a monthly payment of $428. A more realistic $300,000 mortgage would cost three times that per month, or $1285. If you assume $700 or so in insurance and taxes, that comes out to $1985, a very reasonable total amount. If that were a New York City rent, the occupants would need to make 40 times that monthly amount per year, or $79,400, to qualify for the apartment, and so in my mind it’d be reasonable for a person or couple who made that much to live in that house.

What about the new rate of 5.125%? The $100,000 mortgage now has a monthly payment of $544. The $300,000 mortgage would then come out to $1633, for an overall monthly payment (including the estimated taxes and other escrow costs) of $2333. By New York City standards, to qualify for that rent a group of tenants would have to make $93,320.

That’s a 17.5% increase in monthly costs, with the estimated monthly escrow payment included. Without including the monthly escrow payment, that’s a 27.1% increase in monthly payments. That means that a $300,000 mortgage at 5.125% would have the same monthly principal and interest payments as a $381,300 mortgage at 3.125%. If you take escrow payment into account, the equivalence is closer to $350,000, which is still substantially more expensive.
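For anyone who wants to re-run these numbers with their own rate, here is a small Python sketch. It assumes the standard fixed-rate amortization formula with monthly compounding, and it treats the $700 escrow figure as the same rough guess used above; `monthly_payment` is just an illustrative name, not any lender’s calculator.

```python
def monthly_payment(principal: float, annual_rate_percent: float, years: int = 30) -> float:
    """Principal-and-interest payment for a fixed-rate loan, compounded monthly."""
    r = annual_rate_percent / 100 / 12  # monthly interest rate
    n = years * 12                      # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


ESCROW = 700  # rough monthly estimate for taxes and insurance (an assumption)

old = monthly_payment(300_000, 3.125)
new = monthly_payment(300_000, 5.125)

print(f"P&I at 3.125%: ${old:,.0f}; with escrow: ${old + ESCROW:,.0f}")
print(f"P&I at 5.125%: ${new:,.0f}; with escrow: ${new + ESCROW:,.0f}")

# Percentage increases, without and with the escrow estimate.
increase_p_and_i = (new / old - 1) * 100
increase_total = ((new + ESCROW) / (old + ESCROW) - 1) * 100
print(f"Increase: {increase_p_and_i:.1f}% (P&I only), {increase_total:.1f}% (with escrow)")

# Mortgage at the old rate whose payment matches $300,000 at the new rate.
# The payment is linear in the principal, so scaling by the ratio is enough.
equivalent = 300_000 * new / old
print(f"Equivalent mortgage at 3.125%: ${equivalent:,.0f}")
```

The results should line up with the figures above, give or take a few dollars of rounding.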

This is to say, houses have now become much more expensive for borrowers, who represent the vast majority of middle class home buyers. The numbers on Zillow now mean something very different from what they meant just a few short months ago.

Will this dampen the insane level of real estate demand? Will this lower home prices to accommodate the fact that many home buyers’ budgets will naturally have shifted? Theoretically, yes, but it’s unclear whether we’ll notice in this intense a market.

But it’s important to keep this in mind when thinking about houses. The big total price of each house isn’t the only number that matters. The interest rate can also make a huge difference, especially if it is changing rapidly.

Reviews and Reactions: 2021 Short Story Hugo Nominees

2022-04-10T00:00:00+00:00

NB: These are for the 2021 Hugo awards, not the recently-announced 2022 Hugo awards. That one is coming soon.

I decided to write up my thoughts on each of the short stories nominated for the 2021 Hugo awards. Of course, here be spoilers, spoilers galore. If you don’t want these stories spoiled, go read them, and then come back here.

As an exercise, a friend and I read each of these stories and told each other what we thought the themes were, and I reference that throughout these reflections. Themes, as we define them, are thematic statements: the point the story is trying to make. Themes are distinct from thematic concepts, in that they are complete sentences rather than just nouns. They are distinct from premises, in that they are the take-away for the real-world, not a statement about the world of the story. And, to be clear, there can be more than one completely valid answer. Both my friend and I would posit what we thought the theme was, answering independently without consulting each other, and then we would discuss the story in greater detail.

What follows are the tangible results of those discussions: reflections about each story, somewhere between review and analysis. Each header is also a link, because all of these stories are available to read online. They are reviewed in descending ranked order of how good I thought they were, and the rankings are explained.

1. Metal Like Blood in the Dark

I found this story deeply compelling as well as deeply enjoyable, and it touches on deep questions of how we should interact with evil and what to do about necessary evil and the corruption that results from it, and how much we should prepare our children for it. It does so while leaning heavily on its Sci Fi setting – the message itself depends deeply on the unrealistic setting where it’s possible for a child to grow into a quite capable near-adult without experiencing the concept of intentional deception.

A creator and father makes one male, one female creature, and raises them in blissful innocence, within the confines of a garden (or, as it were, a planet), with boundaries programmed in to prevent them from gaining too much knowledge of good and evil, keeping them innocent – innocent, or alternatively put, naïve.

This telling is more optimistic than the Biblical story. Knowledge of Good and Evil comes from practical experience and necessity, not disobedience. Somehow, Sister (but not Brother) works out (from first principles!) deception, lying to yourself to stay consistent, sabotage, and killing in self-defense.

Her initial reaction to lying was very visceral to me as a programmer. Programming is an exact discipline; most computer programs cannot recover from internal data corruption. If an error is not caught and handled, the corruption can spread, resulting in security breaches or arbitrary instability. Much of the history of software design is coming up with ways of partitioning this instability. The thought of causing corruption on purpose is so counter to all of this work!

And of course, that is how lies spread throughout the honest world in human life too; we’re just so used to dishonesty we don’t realize.

When Sister realizes lying is possible, she realizes that understanding lying is necessary to interact with others, and gradually realizes that lying is something she might be forced, by circumstances, to do. She realizes that it would therefore be impossible to go back to a world where the only falsehoods are errors, where errors do not need to be maintained on purpose. What a novel way to look at the loss of innocence!

I was surprised that, once she realizes how necessary lying is, she decides that in any case, she wants to protect Brother from it – another contrast to the Biblical story where Eve shares the fruit with Adam. I feel like even this drive to preserve innocence comes from a loss of innocence: The old Sister, the naïve Sister, would surely reason that this new “lying” concept was an important skill, that of course it would be practical for Brother to know about. But now that she knows how to lie, she decides to use this new skill to protect Brother’s innocence.

Because she is lying to Brother: She lies by omission about what happened concretely to the villain, but she also lies in a bigger sense by omission by not explaining lying to him, and how it is sometimes necessary. From this point on, her relationship with Brother will be a big lie by omission about the fundamental nature of the universe – and still a lie that she thinks is worth it, to preserve his innocence.

My friend says the theme of this story is “Teach your children to be good, and then when they confront evil, they will learn to be cunning without becoming evil themselves.”

But if the story is trying to say that, I don’t think it proves it. Brother does not learn cunning; Sister is merely luckier. And with Sister, I get the sense that she barely learns it in time, and that her ability to leverage it so successfully on her first life or death attempt is fundamentally, again, a lucky break.

I would agree that this story is about evil, particularly dishonesty, but I would say the theme is that even the concept of evil and dishonesty is fundamentally corrupting. That innocence, once lost, cannot be regained. The concept of saying something false on purpose, to a truly innocent soul, would itself be an irreversible corruption that she would then want to protect others from.

I ranked this one first, as did my friend, independently of each other. Later, I learned that it did in fact win the Hugo, well-deservedly. It stood head and shoulders above the rest, in my opinion.

2. Open House on Haunted Hill

This story is about places. Places often seem to have personalities; the premise here is about what it would be like if houses literally did have personalities – if they had personalities and could act on them. This story is about belonging in a home, a home that is so loved that it begins to love you back. It is about finding a place to belong after being stricken by grief.

I enjoyed the detail that the couple was not yet married, but did have a child together. It showed the motion and dynamism of their lives when tragedy struck.

My friend said the theme of this story was “in our grief, it’s hard to see what’s good for us or to see when others are trying to help us, but goodness abounds in the world – just keep your eyes open.”

I agree.

But I think there’s also a lot about place: “Choosing a good environment is important especially if you’re emotionally developing or emotionally healing.” As someone who’s moved not only houses but states in the past year, that really resonates with me. As we’ve all suffered through the lockdowns and restrictions of Coronavirus, the importance of our environment for our sanity and stability, I think, has been magnified for all of us.

It is perhaps for that reason that I rank this one second instead of third; it was a really close one.

3. Little Free Library

I really enjoyed this story. I couldn’t put it down – and my ADHD has really been getting in the way of my reading recently, so this is high praise.

I used to live in New York City, and I often would people-watch, and think about the lives of all the people I’d see on a day-to-day basis. Their worlds were completely separate from mine, even though, for a few moments, we were in the same physical space. In the buildings around me, completely different lives, completely different problems, completely different dreams are happening, and we get continuous brief windows to interact with them. And of course, the same is true in other environments as well; the other worlds are just less visible to us than they are in a big city.

Little free libraries are designed to encourage this, encourage this connection with strangers, to reach out into other worlds, with the power of books, and the power of art. Who knows what’s going on in the lives of the people who take your books, who drop other books off?

This story expanded on this theme. The narrator’s little free library, rather than simply connecting with the “other worlds” that the neighbors live in, connected in Narnia-like fashion to a literal other world.

Like many fantasy stories, this has an element of “There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.” As applied here, it is a reminder that the people we interact with might be going through a completely alien situation to ours, as symbolized through interacting with a literal alternative world.

A friend said the theme could be stated as “books are powerful tools, and it’s good to help others.” I said “one person’s community engagement art project is another’s life or death lifeline,” and I think this is true even if they aren’t some strange birds or dragons from another dimension.

After all, sometimes other people’s lives, in real life, are as strange to us as inter-dimensional birds or dragons.

The premise is super fun. I would have enjoyed a sequel: what is it like to raise the heir to the bird/dragon throne, in our world?

4. The Mermaid Astronaut

The premise of the story is so loud here it crowds out the theme: “The Little Mermaid”, but in space! So full confession: I have actually not read “The Little Mermaid,” but I have seen the Disney movie. So from my point of view, the romance of being a crewmate of a starship fills in for the romantic Disney prince, which is definitely a statement about alternatives to finding a partner as a purpose for life.

My friend says it’s about remembering your family and where you came from – specifically, “It’s important to explore and grow, but at the end of the day, never forget your roots.” I had a similar thought, but a bit starker: “Following your dreams often requires great sacrifice, in the form of missing out on the entirety of what your life otherwise would be.” This happens twice: she misses out on her entire life with her sister to go to space, and she misses out on having a full life in space to ever see her sister again. No, this story says, you can’t “have it all,” even if that isn’t defined to include raising children.

The big reveal was that she was separated from her sister by time dilation, so that she is still young when her sister is old. One thing I found unrealistic is that no one warns her, and that she’s not angry. Both the witch at the bottom of the ocean and the crew of the ship knew about time dilation, but apparently it’s only explained to her just in time for her to get back to her sister while the sister is barely still alive.

The end result is appropriate thematically, as it stands in for careers or dreams where we realize, only when it’s almost too late – or entirely too late – that we need to re-connect with the people that we love and that we have left behind.

But that theme wouldn’t have been dampened by some more conflict, but rather enhanced. If she were angry that she hadn’t been warned, that would apply equally well to real-life careers. If her crewmates were more resistant to her going back so early (and diverting their entire ship), that would, again, apply equally well to real-life careers.

All in all, everyone was too chill about these high-stakes life-altering decisions. The knife to cut her fins and the pain was good, but the emotional pain of conflict would have been even better. In all honesty, I suspect this story would improve with expansion – I think it would work better as a novella or even a novel, so all these major life twists could be fully fleshed out.

Because of these technical issues, and the lack of originality in the premise without that substantial a twist (“The Little Mermaid” is also originally about sacrifice and irrevocable major life choices), this one ranks on the lower end. The twist was still enjoyable, but not as deep or as well-executed as it could have been.

5. A Guide for Working Breeds

This story was fun, but didn’t seem that rich.

It wasn’t clear, as my friend pointed out, in what way these were robots and not just regular people. This wasn’t meant entirely literally; the story makes constant minor references to them being robots, like how they don’t eat food (but somehow still like omelettes) or get static damage to their GPUs, but they’re not robots in a way that is interesting to the plot. Robots are just another oppressed minority group, taken advantage of by bosses via machinations of questionable legality. The story could’ve worked equally well set in a medieval setting with some oppressed ethnic group. It’s not using the Sci Fi for a purpose.

My friend said the theme was “always be nice to those you meet; goodness is paid forward.” That’s definitely there, but I think there’s more to it. The continuous references to labor laws, plus the odd gladiator fights the mentor is involved in, make me think the author is going for (and hitting) something deeper than that. “Solidarity is necessary in the working class” I think is closer to it. “Leverage the system fully to acquire wealth and then share with your working-class comrades.”

Unfortunately, it seemed too easy in general, but particularly in not requiring the Sci Fi content for its theme (a bad sign in my book). Why did the original mentor come around and turn from an annoyed conscript into the mentorship program to a true friend to the mentee? No solid explanation is given, besides raw empathy, robots’ robotity to robots. And enjoyment of dogs is there, I suppose, as a facile personality quirk, but not fully developed or explained.

6. Badass Moms in the Zombie Apocalypse

This story wasn’t for me.

This is literally true, in that it seems to be for women about womanhood, and I am not a woman. It is also more broadly true, in that I am not a huge fan of zombie apocalypse stories in general (though I’m not categorically opposed either), and specifically I wasn’t a huge fan of this story.

This story focuses on a group of women who have decided to live in an explicitly matriarchal and (at least initially) all-woman group to better survive the apocalypse, which reminds me of the ancient Greek legend of the Amazons, or the Many Mothers from Mad Max: Fury Road. This group uses explicit feminist solidarity as their impetus to band together to survive the zombie apocalypse.

In general, I found the story cringily on-the-nose. I thought the way it re-applied the feminist rallying cry “my body, my choice” to be forced, rather than insightful. In general, I found it somewhat incredible (and therefore also forced) that even an all-woman group would be talking so much about feminism and feminist topics while surviving a zombie apocalypse. It seems more likely that they’d be more focused on being people than being women, as their gender wouldn’t be the most relevant factor in the situation.

There was one point within the story where it was revealed that the protagonist had once had to come out to her racist preacher father; it was extra difficult, because she was not only dating a woman, but dating a Black woman. I know that these situations are all too common in our society, but I also felt like in this story, this was gratuitous. It was played entirely straight, just referenced as a situation that would obviously be difficult, in a way that conveyed nothing new about such a situation, no particular insight. I already know that such situations exist, and are hard, and I already know that a lot of people go through similar situations. Show me some depth, some new realization about it!

And that holds for this story in general. I already know that women can be survivors too, that women are just as useful in an apocalypse as men, in some ways more useful. I’m even willing to believe that a group of all women might have even better survival characteristics, but this story didn’t convince me of it – it just asserted it.

Conclusion

I enjoyed this activity, and I feel like I have new insights into Sci Fi, particularly that stories are better when their Sci Fi elements are more core to the theme. I hope to do more of these short story reviews in the future.

Review: The Comic Book Story of Beer

2022-04-08T00:00:00+00:00

I like beer, and I like comic books, so I was excited to read The Comic Book Story of Beer.

And it was overall quite a fun read! It contextualized how important beer was in antiquity – including theories that beer catalyzed the agricultural revolution – and how important it’s been in society ever since, taking a social approach to the entire history, while also explaining a lot of the science alongside the primarily social narrative. It was a really fun read, and I recommend it to anyone who enjoys beer or who cares about history, which I think is most people.

I would state the general theme as this: “Beer always has been an essential ingredient to civilization.” And I think it does a solid job of proving that theme!

It spent some time specifically on the craft brewing revolution that took off in the US and the UK, and is now associated with “hipsters.” And it made me reflect a little on what a hipster was. Here’s some things associated with hipsterdom:

Living in a city after having grown up in the suburbs
Beards
And the focus of this book: Having actual variety in beer, instead of corporatized “light” American Lagers

All of these are things that Boomers (especially White Boomers) for some reason really made untrendy, and which Millennials are bringing back, skipping the bland suburban generation(s) for an attempt to return to some equilibrium, to a more normal state, undoing suburbanization, white flight, and the bland corporatized beer that goes with it.

And of course, the irony is really the dissonance from being raised in an environment where you weren’t expected to have this life trajectory, but here you are.

Then again, I’ve been told I don’t understand hipsterdom, so take what I say with a grain of salt.

In any case, I’m glad that “light” beers are among the decisions of our parents and grandparents that we’re reconsidering, and if you want to learn more about beer-making, its history, and little tidbits of scientific detail, this is a fun book.

Can you reproduce it?

2022-03-22T00:00:00+00:00

NOTE: This post has the #programming tag, but is intended to be comprehensible by everyone, programmer or not. In fact, I hope some non-programmers read it, as my goal with this post is to explain some of what it means to be a programmer to non-programmers. Therefore, it is also tagged with “nontechnical”.

What is the most important skill for a software engineer? It’s definitely not any particular programming language; they come and go, and a good programmer can pick them up as they work. It’s not estimating how long a project will take, as important and elusive as that skill is – because fundamentally, no one can, and many, many programmers are successful without having fully built up that skill.

No, in my learned and considered opinion, the most important skill in a software engineer is solving – and preventing! – problems. It is squashing and preventing “bugs” – those situations where the software behaves in an undesirable fashion, where it fails to meet expectations, whether or not you knew about those expectations ahead of time. That is the crux of the software engineering skillset. Preventing and fixing bugs is the goal which the other skills uphold, and the criterion by which software engineering principles and practices should be evaluated.

My other programming posts can be understood through that lens. All my posts on why Rust is a better programming language than C++ – the point is that Rust, as a programming language, is top-notch bug repellant technology. For any post about code organization and readability, the reason it’s important for code to be organized and readable is so that another programmer trying to find a bug is able to find it quickly, or that a programmer trying to add a feature doesn’t end up also adding more bugs, due to a misunderstanding of how the code works.

But today, I wanted to talk less about the prevention, and more about the squashing, about what to do when you’ve found a bug.

So how do you squash bugs?

First, I want to note that the most important bug-squashing tool is the human brain.

There is a tool, a type of program, called a “debugger,” but that is less essential than you might think from the name. A debugger won’t fix bugs for you, and unless the bug is a crash that actually happened, it can’t even find them. If a debugger could fix – or even just find – all your bugs, that would be almost equivalent to a program that could write programs, because, as mentioned before, preventing, finding, and fixing bugs is the crux of the entire job, and I know it hasn’t been automated, because I still get a paycheck.

What a debugger can do is attach to a running program, let you run it one line at a time instead of all at once, and let you inspect the program’s internal state to make sure it is what you think it is. Additionally, if there is a crash, the debugger can inspect the crash data, sometimes in the form of what’s known as a “core dump,” and tell you what line of code was running when the crash happened, a “backtrace” of how the program got there, and what values were in what variables then.

This is all useful in the debugging process, but not essential. The program ought to be called an “inspector” – or perhaps the “debugger’s companion,” because as a programmer, the true debugger is you, and much of what the “debugger” tool can do, you can do without it as well, using (for example) more verbose log lines and error messages, and just good old-fashioned reasoning power.

So what do you do, when you have a bug? Where do you start?

You might assume that the first thing to do when you see a bug is to try to find out what caused it. But not only is that going to be difficult without some initial steps, it can lead to problems, where you think you’ve got it, you go and do your fix, think it’s better, and actually the bug turns out to still be there.

No, more important than trying to guess what might have caused a bug is figuring out how to tell when the bug is actually fixed. If all we know is “sometimes, the app crashes,” and we change something, and the app doesn’t crash right away, well, is that because we fixed it, or is that because it just happened to be one of those times where the app doesn’t crash? If I had a nickel for every time a programmer thought they’d fixed a bug…

And this is where, although programming definitely is a type of engineering – software engineering – it has an advantage over other engineering fields. With software, we can run the same program over and over again, often almost for free, in a way that we can’t rebuild a bridge or dig a new mine tunnel. With software, often – not always, but usually – we can re-run a program, do a few things, and see if the bug arises again. And then, through experimentation, we can come up with a procedure that allows us to always trigger the bug.

In this way, “sometimes the app crashes” can be refined through experiment to “when you go to the settings page in particular, sometimes, the app crashes” which can then be refined to “when you go to the settings page, and you’re not logged in, and you’re on an iPhone from the past 3 years, it crashes every time.” And now, you have a way of knowing when you’ve fixed it. If you do those exact things, and it doesn’t crash, then you can be confident that your fix actually took.

Refining the conditions in which your bug is a bug, or rather, coming up with a list of instructions to make the bug happen on purpose – ideally as short a list as possible – is known in the business as “reproducing the bug.” And it is the most important skill-set in people who are testing software.

Because if I’ve written some code, and someone else is testing it, and they’ve gotten it to crash, but don’t know how or why it crashed – or especially if they don’t even know what they were doing when it crashed – well, that doesn’t do very much for me. I have no idea where to even start. Because I haven’t experienced any crashes, I can’t look at your crashes. It works on my machine. How can I solve a problem I can’t even see?

So this was a problem for me when I was a lowly iPhone programmer working in a small three-person company. My app would have error messages pop up – and this was frequent, as my boss, who was not a programmer, didn’t let me spend time improving how the app worked unless I was making continuous visible progress. My boss would tell me that he’d gotten an error message while using the app. Can I look into that?

I didn’t know what to do. I always looked into it when I got error messages, and was often able to reproduce them, find them, and fix them, but obviously my boss was better at QAing my app than I was – at least in the triggering bugs department. So I told him so. “I don’t know what to do – I can’t fix it unless you can reproduce it.”

What happened subsequently confused me. My boss would text me, giddy, for some reason acting as if he was winning an argument against me, saying “I reproduced it.” And then he’d send me a screenshot of the problem. He did this repeatedly. He seemed to think “reproducing the bug” meant screenshotting the bug in action.

Later I saw him in person, and he asked me about whether I’d fixed the bug yet. I tried to explain that that wasn’t enough; what I needed was a step by step explanation of how to make the bug happen. And he said something along the lines of, “How can you still not believe me? I sent the screenshot.”

I was flabbergasted. I said, “Wait. I’m not literally doing this as a policy, where I don’t work on it unless you prove to me there’s an actual problem. It’s not because I don’t believe you. It’s not because you have to convince me it’s real before I work on it.”

And my boss responded, with an affected, over-the-top laugh, “Haha, no I get it.” And then paused a second and continued, “But you kinda are though. That’s exactly what you’re doing.”

Apparently, when I said “I can’t,” my boss heard “I won’t.” My boss thought this was about me standing up for myself against potential spurious work, and being overly strict about burdens of proof, rather than me literally asking questions that would make it possible for me to do my job and troubleshoot these issues.

At this point, I was just shocked. How did he think I was going to do my job? Obviously I have to figure out what was going wrong. And obviously – or at least it was obvious to me – that required follow-up questions. Why was he so affronted by my follow-up questions? What did he think fixing the issue looked like? Did he think I’d just say, “Oh, error messages. I must have left some extras in. I’ll go take them out.” No, the error messages were the visible result that happened when the code didn’t know how to proceed to accomplish its tasks, and I had to go deeper in to find out why that was happening.

Luckily, I was able to compose myself, and think quickly on my feet. I knew I wasn’t going to be able to actually explain reproducibility to him – after all, I just had, and he somehow misinterpreted it as insubordination – so I fibbed a little.

The debugger that came with Macs for debugging iPhones looked very sophisticated, and to use it with the app, you had to connect the phone to the computer with a cable. This would allow you to, as I said before, inspect the current program state, and so on. It looked like a very useful tool – and it was. Just not as useful as the human brain, and not something you could use to skip steps.

But even though what I really needed was instructions on how to make the bug happen, I told my boss that what I needed was for the bug to actually happen while the phone was plugged into the debugger.

This served two purposes: It made him believe me that I wasn’t just messing with him, and it gave him a concrete reason to reproduce the bug. Now that he had the goal of making the error message pop up while it’s plugged in, he was able to figure out for himself that the easiest way to do that was to come up with a way to make it happen on purpose.

Now, when he found a new bug, he wouldn’t bring it to me until he was prepared to make it happen while it was plugged in. Some of them, he already knew how to trigger; he just hadn’t heard an adequate explanation for why he should tell me. Other bugs, he would figure out. In either case, once he was done, he would make the bug happen while it was plugged in.

And when this happened, I could either ask him how he got it to happen, casually, while pretending that the more important thing is that it’s plugged in, and while he sees that I’m using a computer and therefore “working” – or, if the steps were inconsistent or unclear, or if we’d just gotten lucky to see an error message while attached to the debugger, I could use information from the debugger to figure out what the program had been doing before the error – and use this, not to fix the bug immediately, but to figure out how to reproduce it myself.

So what did I learn from this experience? Well, even though the most important tool in programming (and bug-squashing) is the human brain, I learned that people, especially non-programmers, are more comfortable when they see you using other, concrete, fancy-looking tools. And if your human brain needs input to complete a task, people might be more likely to give it to you if you pretend the computer needs that input.

For those of us who work primarily with our brains, this can be frustrating and disappointing, and can lead to the need to fib a little sometimes to accommodate these biases.

And additionally, I learned about the depths of the disconnect between different levels of expertise. I thought it was obvious that to fix a problem with some code, you had to understand it at least well enough to make it happen again. This seemed to follow directly from first principles, to be the only logical way that it could work, if you thought about it. But my boss hadn’t thought about it, and didn’t understand this. The gap in our perspective didn’t come from his lack of detailed technical knowledge of specific technologies, but rather from his lack of a developed intuition for how programming works.

A Rust Gem: The Rust Map API

2022-03-12T00:00:00+00:00

For my next entry in my series comparing Rust to C++, I will be discussing a specific data structure API: the Rust map API. Maps are often one of the more awkward parts of a collections library, and the Rust map API is top-notch, especially its entry API – I literally squealed when I first learned about entries.

And as we shall discuss, this isn’t just because Rust made better choices than other standard libraries when designing the maps API. Even more so, it’s because the Rust programming language provides features that better express the concepts involved in querying and mutating maps. Therefore, this serves as a window into some deep differences between C++ and Rust that show why Rust is better.

And for this post, specifically, we’ll also be discussing Java, so this will be a three-way comparison, between Java, C++ and Rust.

Reading from a Map

So, let’s talk about map APIs. But before we get to Entry and friends, let’s discuss something a little simpler: getting an item from a map. Let’s say we have a sorted map of strings to integers:

In Java, TreeMap<String, Integer>
In C++, std::map<std::string, int>
In Rust, BTreeMap<&str, i32>

Let’s also say we have a string "foo", and want to know what integer corresponds to it. Now, if we’re sure that the string we’re looking up is always in the map, then we know what we want: we want to get an integer.

But what if we’re not sure? There are plenty of situations where we want to read a value corresponding to the key – or do something else when that key is not present. Maybe the value is a count, and an absent key means 0. Or maybe the absent key means that the user has made a typo, and needs to be informed. Or maybe the map is a cache, and the absent key means we need to read a file or query a database. In all of these cases, we need to know either the value, or the fact that the key is absent.

Let’s see how this is handled in our three programming languages, and how fundamental design choices in these programming languages lead to such APIs.

Java `get` a (Nullable) Reference

A long time ago, Java made an extreme choice in the name of simplicity: It divided all values into a dichotomy of “primitives” and “objects.” Primitives are passed around by implicit copy, whereas objects are aliased through many mutable references. Objects always have optionality built in – any object reference is automatically “nullable,” which means you can store the special sentinel/invalid value null in it, the interpretation of which varies wildly. Primitives are not optional in this way.

Also for the sake of simplicity, and very relevantly to the topic at hand, generics are only supported for object types, not primitives. That means that map values can only ever be object types. And that means that our map from strings to integers in Java doesn’t use Java’s primitive integer type int, but rather this special wrapper/adapter type Integer, which auto-casts to and from int, and which, like any object type, is managed through mutable, nullable references. (At this point, I for one am beginning to suspect they missed the mark on their simplicity).

So what’s that mean for our map? How do we find out what value corresponds to "foo" in our map, or else that there is none? Well, the method for this is called get, and that returns the value in question if there is one. And when there isn’t? Well, Java here leverages nullability, and returns null when there is no value.

So we can write something like this:

Integer value = map.get("foo");
if (value == null) {
    System.out.println("No value for foo");
} else {
    int i_value = value;
    System.out.println("Value for foo was: " + i_value);
}

So far, so good. But there are problems. And perhaps I’m missing some – now is a good time to take a second, look at the code, and try to imagine in your mind what problems there may be with this system (you know, besides the fact that I have to use i_ as improvised Hungarian notation due to lack of support in Java for shadowing).

You have some? I’ll now list what I’ve got.

Problem the first: The signature of get doesn’t really alert us to the possibility of a value not being in a map. This is the sort of “edge case” that programmers regularly forget to handle; a programmer may know, due to their situation-specific knowledge, that the key ought to be present, and forget to consider that the key might not be.

Compilers of strongly typed languages generally work to ensure that programmers don’t miss edge cases like this, don’t make simple “thinkos” (typos but with thought) or “stupid mistakes.” How’s Java hold up? Well, remember how we mentioned that primitives can’t be null, but these wrapper types like Integer are coercible to primitives? Well, this compiles without a word of complaint from the compiler:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();

map.put("foo", 3);

int foo = map.get("foo");
System.out.println("int foo: " + foo);

int bar = map.get("bar");
System.out.println("int bar: " + bar);

And what happens at run-time? Similar behavior to Rust’s infamous unwrap function. The conversion from the nullable Integer to the non-nullable int crashes when the Integer is in fact null:

int foo: 3
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.TreeMap.get(Object)" is null
        at test.main(test.java:12)

So you might try to fix this by querying if the key exists first:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();

if (map.containsKey("bar")) {
    int bar = map.get("bar");
    System.out.println("int bar: " + bar);
} else {
    System.out.println("bar not present");
}

But now we’ve reached problem the second. Unfortunately, even though this looks like it addresses the issue, this won’t prevent the crash either. There is nothing stopping you from putting a null into the map, so this code also crashes given the right context:

TreeMap<String, Integer> map = new TreeMap<String, Integer>();
map.put("bar", null);
if (map.containsKey("bar")) {
    int bar = map.get("bar");
    System.out.println("int bar: " + bar);
} else {
    System.out.println("bar not present");
}

So for a given key in a Java map, there are actually three possible situations:

The key is absent.
The key corresponds to an integer.
The key corresponds to one of these special null-values.

get can distinguish 2 from 1 and 3, but cannot distinguish between 1 and 3. containsKey can distinguish 1 from 2 and 3, but cannot distinguish 2 from 3. To distinguish all 3 scenarios, and handle all the representable values, you need to call both get and containsKey:

if (map.containsKey("bar")) {
    Integer bar = map.get("bar");
    if (bar == null) {
        System.out.println("bar present and null");
    } else {
        int i_bar = bar;
        System.out.println("int bar: " + i_bar);
    }
} else {
    System.out.println("bar not present");
}

In addition to this precaution not being enforced by the compiler, it leads to problem the third: We are now querying the map twice. We are walking the tree twice with our containsKey followed by get.

At this point, we find ourselves scrolling through the Map methods in Java’s documentation, trying to find a more general solution. getOrDefault might help in some situations – when there’s a value that makes sense as the default. compute might be useful – if we’re OK with modifying the map in the process.

But in general, nothing clean exists to tidy up these problems. And the blame lies squarely on Java’s decision to make almost all types – and all types that can be map values – nullable.

But wait! – you might object – Can’t we just maintain an invariant on the map that it contains no null values? If we have a map without null values, all these issues – well, many of these issues – dry up.

And this is true. Maintaining such an invariant makes for a much cleaner situation. Pretend you aren’t allowed to put nulls in maps, and arrange not to do it.

But, first off, maintaining an invariant like this is easier said than done. Programmers often do this sort of thing implicitly in their head, but it’s much better to comment. Either way, you have to trust future programmers – even future versions of the same programmers – to know about the invariant, either by intuiting it (all too common) or by reading the relevant comment (which, even if there is one, might not happen). And you have to trust them to not intentionally violate the invariant, and also to not accidentally violate the invariant: Are they sure that all those values they add to the map can never be null?

And second off, somewhat shockingly, sometimes people do assign special meanings to null. I said before null has a wide range of meanings, and it’s not uncommon to use null to mean special things. Maybe “not mapped” means “load from cache,” but “null” means “there actually is no value and we know it.” Or maybe the opposite convention applies. null is frustratingly without intrinsic meaning.

For such situations, programmers should probably compose the map with other types or better yet, write custom types that make the semantics of these situations abundantly clear. But let’s not put all the blame on the programmers. If Java had really wanted to protect people from distinguishing these “not mapped” and “mapped to null” situations, Java maps shouldn’t have made the distinction representable at all. It’s bad programming language design to put features in a library that can only be abused, and it’s bad understanding of human nature to then solely blame the programmers for misusing them.

C++: No Nulls No More

So now we move on to C++.

In C++, fewer types are nullable, and non-nullable types like int can be used as the value type of a map. For our map, of type std::map<std::string, int>, we no longer have the trichotomy of “key not present, value null, or value non-null,” but the much more reasonable dichotomy of either the key is present and there is an int, or it’s absent and there isn’t one.

This is, in my mind, the bare minimum a strongly typed language should be able to provide, but after the context of Java it’s worth pointing out.

There are three (3) methods in C++ that look like they might be usable as a get operation, an operation where we either get an int value or learn that the key is absent:

operator[]
at
find

See if you can identify which one is the right one to use.

Spoiler alert! It’s find, the one whose name superficially looks least like it’ll be the right one. at throws an exception if the key is absent, and operator[], the one with the most appealing name, is an eldritch abomination which we’ll discuss and condemn later.

But all well-deserved teasing aside, find is much better than Java’s get. It returns a special object – an iterator – that can be easily tested to see whether we’ve found an int, and easily probed to extract the int.

auto it = map.find(key);
if (it == map.end()) {
    std::cout << key << " not present" << std::endl;
} else {
    std::cout << key << " " << it->second << std::endl;
}

This is actually pretty good! The -> operator also serves as a signal to experienced C++ programmers that we’re assuming that it is valid: generally -> or * means that the object being operated on is “nullable” in some way.

So when a C++ programmer reads something like this, they have a little bit of warning that they’re doing something that might crash:

int foo = map.find(key)->second;

And certainly, they have more warning than the Java programmer with the equivalent Java:

int foo = map.get(key);

Of course, this is awkward. find returns an iterator, which isn’t exactly the type we’d expect for this “optional value” situation. And to determine if the value isn’t present, we compare it to map.end(), which is a weird value to compare it to. Nothing about what these things are named is specifically intuitive, and people would be forgiven for using the accursed operator[]. map["foo"] just looks like an expression for doing boring map indexing, doesn’t it?

And what does operator[] do, if the key isn’t present? It inserts the key, with a default-constructed value. No configuration is possible of what value gets inserted, short of defining a new type for the object values. This is sometimes what you want – like if your value type has a good default (especially if you defined it yourself), or if you’re about to overwrite the value anyway. But in most cases, you want some other behavior if the value is not present – operator[] doesn’t really tell you that it inserted the item, so if you need to make a network query or read a file or print an error, you’re out of luck. operator[], as innocuous as it looks, has surprising behavior, and that is not good.

But all in all, as far as getting values goes, as far as querying the map goes, C++ is doing OK. Solid B result on this exam, I think. Decent work, C++. Especially since we just looked at Java.

The Rust `Option`

So now on to Rust: we want to query our BTreeMap<&str, i32>.

(Or… it might be a BTreeMap<String, i32>, depending on whether we want to own the strings. This is a decision we also have to make in C++ (where we could have used string_views as the keys), but do not have to make in Java. At least in Rust, we know that whichever decision we make, we will not accidentally introduce undefined behavior. But that’s a distraction!)

So let’s apply the same test to Rust as we’ve applied before. Here, the method in question is given an obvious name, get rather than find. So let’s see how it does in our test, of allowing us to read a value if present, but know if not:

if let Some(val) = map.get(key) {
    println!("{key}: {val}");
} else {
    println!("{key} not present");
}

See, get returns an Option type. Therefore, unlike in C++, we can test for the presence of the value and extract the value inside the same if statement. Unlike in C++, the return value of get isn’t a map-specific type, but rather the completely normal way to express a maybe-present value in Rust. This means that if we want to implement defaulting, we get that for free by using the Option type in Rust, which implements that already:

// Let's say missing keys means the count is 0:
let value = *map.get("foo").unwrap_or(&0);

Similarly, calling is_none() or pattern-matching against None is much more ergonomic than comparing an iterator to map.end(). It requires some more intimate knowledge – or some follow-up reading – to learn that the concepts of “end of collection” and “not found” are for various reasons combined into one in C++.
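To make that concrete, here is a minimal sketch of both checks against the same kind of BTreeMap<&str, i32>. The helper name describe is made up for illustration; only get, is_none, and is_some come from the standard library.

```rust
use std::collections::BTreeMap;

// Hypothetical helper (not from the post): reports what a lookup found,
// pattern-matching directly on the Option that `get` returns.
fn describe(map: &BTreeMap<&str, i32>, key: &str) -> String {
    match map.get(key) {
        Some(val) => format!("{key}: {val}"),
        None => format!("{key} not present"),
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("foo", 3);

    // is_none() answers "is the key absent?" with no end() comparison.
    assert!(map.get("bar").is_none());
    assert!(map.get("foo").is_some());

    println!("{}", describe(&map, "foo")); // prints "foo: 3"
    println!("{}", describe(&map, "bar")); // prints "bar not present"
}
```

Nothing here is specific to maps: the same match would work on any Option, which is rather the point.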

So while C++ avoids the problematic elements of Java maps, Rust does so more ergonomically, because it has a well-established Option type. C++ now has one as well, std::optional, but it hasn’t yet reached its map API, because it was only added very recently, in C++17.

And Option integrates even better than std::optional with the programming language, because Option is just a garden-variety sum type, a Rust enum, which lets you do things like if let Some(x) = ..., and combine testing and unpacking in the same statement. C++ could not design a map API this ergonomic, because it lacks this fundamental feature.

Also, unlike with null in Java, if you want to use Option as a meaningful distinction in your map, you still can. The get function would then return Option<Option<...>> instead of just Option – the outer one representing presence, the inner one representing whether the value was None or Some(...). Option is composable in a way that null is not.
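As a sketch of what that nesting looks like in practice, assume a map whose values are themselves Option<i32>. The helper name state and the wording of its three results are my own illustration:

```rust
use std::collections::BTreeMap;

// Hypothetical helper: tells apart all three states of a key.
// `get` returns Option<&Option<i32>>, so the outer layer is "is the key
// mapped?" and the inner layer is "is the mapped value None or Some?".
fn state(map: &BTreeMap<&str, Option<i32>>, key: &str) -> &'static str {
    match map.get(key) {
        None => "absent",
        Some(None) => "present, explicitly None",
        Some(Some(_)) => "present with a value",
    }
}

fn main() {
    let mut map: BTreeMap<&str, Option<i32>> = BTreeMap::new();
    map.insert("foo", Some(3));
    map.insert("bar", None);

    for key in ["foo", "bar", "baz"] {
        println!("{key}: {}", state(&map, key));
    }
}
```

One call to get, one traversal, and all three of the cases that needed both containsKey and get in Java are distinguished by the type itself.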

For the record, the Rust equivalent to operator[] – the Index trait implementation on maps – does the equivalent of C++’s at, and panics if the key isn’t present. While not as generally useful as get, I think this is a reasonable interpretation of what map["foo"] should mean.
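A small sketch of that behavior follows. The sample_map helper is made up, and catch_unwind is used here only so the demonstration can observe the panic and keep running; it is not how you would normally handle a missing key.

```rust
use std::collections::BTreeMap;
use std::panic;

// Hypothetical helper: builds a one-entry map for the demonstration.
fn sample_map() -> BTreeMap<&'static str, i32> {
    let mut map = BTreeMap::new();
    map.insert("foo", 3);
    map
}

fn main() {
    let map = sample_map();

    // Present key: indexing yields the value directly, no Option involved.
    println!("foo: {}", map["foo"]);

    // Absent key: indexing panics. Unlike C++'s operator[], nothing is
    // inserted; the map is only borrowed immutably here.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| map["bar"]));
    println!("indexing bar panicked: {}", result.is_err());
    assert_eq!(map.len(), 1);
}
```

The panic message still goes to stderr when this runs, which is the default panic hook doing its job.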

Mutation Station

So Rust wins, I’d say pretty handily, when comparing how to access a value from a map, how to query them. But where Rust truly shines is when mutating a map. For mutation, I’m going to approach the discussion differently. I’m going to start by specifying what use cases might exist, and then, in that context, we can discuss how an API might be built.

The mutation situation has a similar dilemma to querying: the key in question might or might not already be in the map. And, for example, we often want to change the value if the key is present, and insert a fresh value if the key is absent.

Of course, we could always check if the key is present first, and then do something different in these two scenarios. But that has the same problem we already discussed for querying: We then have to iterate the tree twice, or hash the key twice, or in general traverse the container twice:

auto it = map.find(key); // first traversal
if (it != map.end()) {
    return it->second;
} else {
    int res = load_from_file(key);
    map.insert(std::pair{key, res}); // second traversal
    return res;
}

So what should we do for our API for this scenario, where we want to change the value if the key is present, and insert a fresh value if the key is absent?

Well, sometimes that fresh value is a default value, like if we’re counting and the key is the thing we’re counting – in that case, we can always insert 0. In that case, C++’s operator[] – when combined with an appropriate default constructor – can actually work well.

And sometimes, that fresh value depends on the key, like if the value is a more complicated record of many data points about the item in question. If the value is a sophisticated OOP-style “object,” and the key indexes one of the fields also contained in the value, C++’s operator[] would not work. The default value is a function of the key.

And sometimes, there isn’t a default value per se. Sometimes, if the key is absent, we need to do additional work to find out what value should be inserted. This is the case if the map is a cache of some database, accessed via IPC or file or even Internet. In that situation, we only want to send a query if the key is not present. We would not be able to accomplish our goals by simply providing a default value when sending the mutation operation.

C++ doesn’t have anything for us here. operator[] is pretty much its most sophisticated “query-and-mutate” operation. Java, somewhat surprisingly, does have something relevant, compute. This handles all of these situations, with a relatively unergonomic callback function – and as long as your map never contains nulls.

Rust’s solution, however, is to create a value that encapsulates being at a key in the map that might or might not have a value associated with it, a value of the Entry type.

As long as you have that value, the borrow checker prevents you from modifying the map and potentially invalidating it. And as long as you have it, you can query which situation you’re in – the missing key or the present key. You can update a present key. You can compute a default for the missing key, either by providing the value or providing a function to generate it. There are many options, and you can read all of them in the Entry documentation; the world is your oyster.

So the C++ code above can be ergonomically expressed as something like this in Rust:

let entry = map.entry(key.to_string());
*entry.or_insert_with(|| load_from_file(key))
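That snippet leaves the surrounding function and load_from_file undefined. A self-contained sketch might look like the following, where load_from_file is a hypothetical stand-in for the expensive lookup and the loads counter exists only to show that the slow path runs once per key. The second function covers the earlier case where the fresh value is a function of the key, using or_insert_with_key:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the expensive lookup; a real version might read
// a file or query a database. Here it just derives a number from the key.
fn load_from_file(key: &str) -> i32 {
    key.len() as i32
}

// Returns the cached value for `key`, loading it only on a miss.
// `loads` counts how often the expensive path actually ran.
fn cached(map: &mut HashMap<String, i32>, key: &str, loads: &mut u32) -> i32 {
    *map.entry(key.to_string()).or_insert_with(|| {
        *loads += 1;
        load_from_file(key)
    })
}

// When the fresh value depends only on the key, `or_insert_with_key`
// hands the key to the closure, so nothing extra needs to be captured.
fn name_length(map: &mut HashMap<String, usize>, name: &str) -> usize {
    *map.entry(name.to_string()).or_insert_with_key(|k| k.len())
}
```

Calling cached twice with the same key runs the closure only the first time. The second call finds the entry occupied and returns the stored value.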

And the idiom where we’re counting something could be expressed something like:

map.entry(string)
    .and_modify(|v| *v += 1)
    .or_insert(1);

So we get this nice little program that counts how many times we use different command line arguments:

use std::collections::BTreeMap;
use std::env;

fn count_strings(strings: Vec<String>) -> BTreeMap<String, u32> {
    let mut map = BTreeMap::new();
    for string in strings {
        map.entry(string)
            .and_modify(|v| *v += 1)
            .or_insert(1);
    }
    map
}

fn main() {
    for (string, count) in count_strings(env::args().collect()) {
        println!("{string} shows up {count} times");
    }
}

Conclusion

So first off, Entrys are super nice, and neither Java nor C++ has anything anywhere near as nice. Even when it comes to just querying, Rust’s get is much better than Java’s get, and a little more ergonomic than C++’s find.

But this isn’t an accident. This isn’t just about Rust’s map API having a nice touch. When we look at the definition of Entry, we see things that Java and C++ can’t do:

pub enum Entry<'a, K, V> 
where
    K: 'a,
    V: 'a, 
 {
    Vacant(VacantEntry<'a, K, V>),
    Occupied(OccupiedEntry<'a, K, V>),
}

First, this is an enum: There are two options, and in both options, there’s additional information. Of course, Java and C++ can express a dichotomy between two options, but it’s a lot clumsier. Either you’d have to use a class hierarchy, or std::variant, or something else. In Rust, this is as easy as pie, and since it does it the easy way, you can not only use the various combinator methods in Rust, you can also use Entrys with a good old-fashioned match or if let to distinguish between the Vacant and Occupied situation.
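As a sketch of that last point, here is counting again, written with a plain match instead of the combinators. The record function and its boolean return value are invented for this example:

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

// Records one sighting of `word`; returns true if it was new to the map.
fn record(map: &mut BTreeMap<String, u32>, word: &str) -> bool {
    match map.entry(word.to_string()) {
        // Key absent: the vacant entry lets us insert in place.
        Entry::Vacant(slot) => {
            slot.insert(1);
            true
        }
        // Key present: the occupied entry gives mutable access to the value.
        Entry::Occupied(mut slot) => {
            *slot.get_mut() += 1;
            false
        }
    }
}
```

Either way, the map is traversed once, and each arm can do arbitrarily different work.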

Second, there’s a little lifetime annotation there: 'a. This is an indication that while you have an Entry into a map, Rust won’t let you change the map by any other means. Now, Java and C++ also have iterators, and you may not change a map while you’re holding one, but in both those languages, you have to enforce that constraint yourself. In Rust, the compiler can enforce it for you, making Entrys impossible to use wrong in this way.

Without both of these features, Entry would not have been an obvious API to create. It would’ve been barely possible. But Rust’s feature set encourages things like Entry, which is yet another reason to prefer Rust over C++ (and Java): Rust has enums (and lifetimes) and uses them to good effect.

Addendum

I wanted to address a few points that people have raised in comments since I posted this.

Some people have pointed out that C++ has insert_or_assign, but in spite of the promising name, it just unconditionally sets a key to be associated with a value, whether or not it previously was. This is not the same as behaving differently based on whether a value previously existed, and it is therefore not relevant to our discussion.

More interestingly, it has been pointed out to me that with the return value of insert, you can tell whether the insert actually inserted anything, and also get an iterator to the entry that existed before if it didn’t. This allows implementing some, but not all, of the patterns of Entry without traversing the map twice.

For example, counting:

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main(int argc, char **argv) {
    std::vector<std::string> args{argv, argv + argc};
    std::map<std::string, int> counts;

    for (const auto &arg : args) {
        counts.insert(std::pair{arg, 0}).first->second += 1;
    }

    for (const auto &pair : counts) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }

    return 0;
}

This works, but is much less clear and ergonomic than the Entry-based API. But perhaps more importantly, this functionality is much more constrained than Entry, and is equivalent to using Entry with just or_insert, and never using any of the other methods. As another commentator pointed out, counting is possible with just or_insert:

*map.entry(key).or_insert(0) += 1

But counting is just one example. C++’s insert is still deeply limited. Using C++’s insert means you have to know a priori what value you would be inserting. You can’t use it to notice that a key is missing and then go off and do other work to figure out what the value should be. So you can’t do my load_from_file example.

In order to do the load_from_file example in C++, even with this use of insert, you would have to temporarily insert some sentinel value in the map – and that goes against how strongly typed languages ought to work, in addition to breaking the C++ concept of exception safety.

This is, as was pointed out in another comment, exactly what C++ programmers sometimes have to do, to meet performance goals, at the expense of clarity and simplicity, and therefore, especially in C++, at the expense of confidence in safety and correctness.

Biking to Philly

2022-03-07T00:00:00+00:00

I am out of biking shape. I know I am out of biking shape. The pandemic has not been good to my physical fitness. (For the record, this isn’t a proper edited and outlined and triaged essay, just some notes on my past weekend.)

But as out of shape as I am, I also know it’s only 25 miles from here to Philly on the Schuylkill River Trail, and so I figured maybe I could do it without any additional prep. When I found out that it was less hilly than the longer bike rides I used to do, I was sold, and I did it.

And I got there, with many breaks, in much more time than it should have taken. Now I know how out of biking shape I am, and I can work on it. And I was fortunately able to cancel the trip back and take my bike on the train back with me (which a woman asked me if she could take a picture of to prove to her husband it was possible).

First off, this trail is perhaps some of the safest cycling – especially on a per mile basis – that I’ve ever done. It’s fully protected, almost entirely paved (and the unpaved bits are fine), and only interacts with cars for a very safe 3 miles in Manayunk.

In fact, the safety and the lack of interaction with traffic presented an unexpected problem to me: parts of the ride let me zone out enough that I would get a little… bored. This meant that I would focus less, which meant I would slow down, which meant I would get even more bored. I should have a podcast with me, which means I’ll need to take a battery pack with me so my phone doesn’t die as I listen to it. Lesson learned.

But with all this safety, I was so confused to see so many people biking in full gear, with reflector shirts and their lights on during the daytime. Like, are you worried about getting hit by a deer?

I’m used to biking in NYC, interacting with the traffic all the time, and this helps my anxiety, as you’re forced to pay very close attention to actual potential dangers. This was more just… exercise.

There were some fun surprises! Somewhere around the halfway point, I saw a sign for “The Tricycle,” and much to my pleasant surprise, there it was! A bicycle shop and cafe! I had a coffee and a delightful conversation with some of the other patrons. It was super refreshing.

When I got to my destination, I sounded like I was dying. I was coughing like an elderly smoker with the flu, and I don’t even smoke anymore. What is even the point of not smoking if you’re still going to cough like that? (I’m kidding! Kidding!)

Biking in Philly itself was a different story. Cars drive faster in Philly than in NYC, there are fewer traffic lights and more all-way stops, which is terrifying, and I had a minor crash, a “trolley track spill,” because it was raining and my wheel got caught in the trolley tracks.

I was fine, and it was only an $11 repair (with tax) to re-true the wheel and readjust the brakes, but oy gevalt! (and by “Gewalt” I mean it literally…)

So in conclusion: I should do this way more, because this is a great way to get some exercise and some sociality in on a weekend! Hopefully next time I can bike back too.

Crank-'em Out

2022-03-04T00:00:00+00:00

For a time, I tried to cultivate an interest in Go. Not the programming language, but the board game. The interest didn’t last long – like chess, I had a hard time getting up to even a fairly basic level of competence. And I quickly developed another enthusiastic interest to replace it – sometimes, an interest just doesn’t work out, and it’s nobody’s fault, and you have to just move on and not get too sad, because there’s plenty of fish in the sea.

But one memory from this particular interest sticks out. I was reading a list somewhere on the Internet about Go etiquette, specifically the etiquette of playing with a more advanced Go player. Some of it was obvious – or at least obvious if you thought about it – like showing gratitude that they’re playing with you and remembering that they’re doing you a favor.

But there was one piece of advice that was less obvious: Don’t spend too much time making a move. Basically, this came out to “don’t waste their time,” but what it also came down to was, “your thoughts won’t help you.” They gave an example of a move someone had thought for 5 minutes before making, as if it was an obviously horrible move. It was not at all obvious to me why it was horrible, and I realized it was a Dunning-Kruger type of situation.

Basically, beginners at Go will often think very hard about their moves, trying to make the most out of their time with an expert, trying to not lose the game. But they don’t know how to think very hard about their move – their thoughts are unsupervised, and likely to miss something an expert would find very obvious. Their thinking time will be trying to avoid an outcome that is unlikely, and miss an easily-avoidable doom.

Rather than thinking beyond their actual skill level, their time is better spent getting through the game, and spending a more proportionate amount of time thinking. They will learn more this way, certainly more per minute of time spent – and in the meantime, they will avoid annoying a potential future mentor.

When I read this advice – and this was over 10 years ago – a light-bulb went on in my head. I realized this wasn’t just about Go, but about any skill. Perfectionism in early stages can be a self-defeating game.

A sign in my high school orchestra room boldly proclaimed: “Practice doesn’t make perfect; perfect practice makes perfect,” calling out people who just blew through exercise after exercise or song after song without actually trying to improve or find problems to fix. And that is truly a problem. But the opposite problem is also possible, where you work too hard on fixing all the problems in something, where you try to make your practice too perfect.

For another example: A friend of mine was once trying to learn German from scratch, using DuoLingo. I wanted to help her, saying, “try pronouncing these words aloud” or “read along with the lyrics to this song,” just trying to get her used to how letters corresponded to sound, and pick up a few words, and gain some familiarity with how they might be used, but she wasn’t interested in that, because she wouldn’t be able to perfectly understand everything.

Instead, she was stuck, completely stuck, on repeatedly saying very simple phrases into DuoLingo, trying to make her accent 100% perfect, saying, over and over again, for hours, “Das Frühstück hier ist lecker, oder?” until the phrase, with her particular pronunciation of it, was entirely drilled into my brain. She couldn’t hear what was wrong with her pronunciation, and she didn’t believe me that she was focusing on the wrong things. She didn’t believe that her pronunciation would get better with time, that her time was better spent moving on and learning more things to say. Besides, I said, it’s a far more practical skill to speak German with an accent than to say one or two phrases flawlessly.

But more importantly, you can’t get flawless pronunciation without learning more German. Needless to say, my friend’s approach made the process completely unengaging, and she soon gave up. But even if she had persisted, she would never have gotten it. Accents need to “click,” and that simply was not going to happen with only the one sentence she kept repeating in her repertoire. If it did, it would be imitation, and she would have no understanding of how those sounds changed in other contexts. She was insisting on starting with an imaginary, useless skill.

So how does this apply to me?

Well, this blog is finally moving again. And one of the (many) things I had to overcome in getting it moving is spending too much time trying to make each post “just right.” It’s not so much quantity over quality per se, but recognizing where there were diminishing returns, at my current skill level, to the work I was doing, and where posting (or not) and moving on to the next post would get me much better bang for my buck. They say that you need to write a million words of garbage before you’re a good writer, and so far my blog only clocks in at around 80K, which I think means I’m not quite there yet. But even when I’m further along, I’ll need to use my time where it’s most productive, rather than just spinning my wheels or trying to play 4D chess with my flawed mental model of my reader.

It’s all about the time management.

Which brings me to a related time management principle for me: Having lots of projects in flight at once. This one probably is more specific to me and people whose brain works more like mine (not just ADHD but perhaps even my particular flavor of ADHD), and it comes with the caveat that it only works for me when I have a well-maintained organizational system, but it’s a principle I’ve found super useful personally.

The reason to have multiple projects in flight at once, to be clear, is so that you can choose a project based on where you are at the time, and therefore always be making progress.

This has led to conflict in work environments. I remember when I was working as a web (frontend!) developer between junior and senior years of college, and there were dozens of tickets. I would sometimes pick off easy tickets ahead of the higher-priority harder tickets, especially early in the week, early in the day, or when switching to a new part of the codebase. I would do this fundamentally as a warm-up, a way to still be productive while I was gearing up and building up momentum.

My supervisor at the time called me out, told me I was supposed to focus on the higher-priority tickets. I told him that I saw them, had looked at them, didn’t know immediately how to do them, and instead of actively wracking my brain trying to figure it out, I sort of let myself process them in the background while doing some easy tickets. I told him that the alternative was taking just as long (maybe slightly shorter but honestly maybe even slightly longer) with the harder tickets, but not getting any easy tickets done in the meantime. Additionally, wasn’t I getting the harder tickets done at a reasonable enough rate?

I don’t know if he really believed me, but at the end of the summer they wanted me to take a break from college to work with them during the year, so it couldn’t have gone too poorly. I’ve since gotten better at managing my focus in a work setting, designing plans and TODO lists that work well for how I focus, and communicating pro-actively with supervisors.

But regardless of how well this principle applied (or didn’t) at that job, or at jobs since then, it applies super well to this blog.

I currently have, at the time that I am writing this paragraph, 4 on-going drafts for blog posts (3 non-technical, 1 technical). I have notes for 18 essays, notes and/or additional drafts for 7 fiction projects, and notes (ranging from spattered ideas to relatively complete outlines) for 23 tech blog posts.

When I have an idea – sometimes a new one, sometimes a thought that fits into an existing one – I can write it down in the appropriate place in my notes. When it’s time for me to write – I have goals on how many posts to finish per month – I can go look through the notes for one that is ready to work on turning into actual prose. And when I do, I can be picky: I can find something that I’m in the mood to work on right now.

If none of them really are ready to start cranking out prose, I can start developing one of the outlines a little more, which is often necessary. Every once in a while, even if I don’t plan on writing prose yet, I find myself thinking of good ideas, and start scrolling through the lot of them, tweaking notes, adding in detail, until they’re ready to mature.

This has been a really good system for me, and has allowed me to split up what can seem like super monumental tasks into actual achievable chunks. If I can’t stay focused on a single piece long enough to write it, I can put it on the back burner, come back to it, and still be producing content in the meantime, sometimes from newer ideas, but often from pieces that I previously put on the back burner.

I’m looking forward to seeing how well it works for fiction. I’ve still not fully dusted off my fiction projects, though I hope to soon. But I hope that the non-fiction, essay writing type of stuff – and maybe even the tech writing – still count towards that 1 million words, and still help me become a better fiction writer when I get back to it.

In the meantime, I’ll keep crankin’ out the posts.

The Good Ol' Days of QBasic Nibbles

2022-02-28T00:00:00+00:00

Let’s talk about an ancient programming language! I think we can all learn things from history, and it gives us grounding to realize that our time is just one time among many, to see what people in the past did differently, what they got wrong that we would never do now, and also to see what they got right.

Do you remember MS-DOS? Do you remember that it came with an interpreted programming language? From MS-DOS 5 onwards, it came with not Python, not JavaScript or R or MATLAB, but a dialect of BASIC. But I think most people, especially most people my age who were children at the height of the MS-DOS era, remember it for the games, the two sample programs that came with it, namely Gorillas and Nibbles (their name for Snake).

Nibbles is extra near and dear to my heart because not only is it the game that I better enjoyed, but more interestingly because it’s the first “large program” that I ever did work on (for me as a child, “large” meant multiple subroutines), and the first existing program I ever modified.

So recently, I tried to see if I could find it. And indeed, I could. I just needed DOSBox and the QBasic interpreter (you want QBasic EN 1.1) to run it. After that, you just need the program itself; then you can throw them in a directory, “mount” it from inside DOSBox, run QBASIC.EXE, and use its very discoverable interface (by ’90s standards).

It looks a little less impressive in such a small little emulation window, but of course at the time it took the entire screen of an entire CRT monitor, and was the best technology available for me to interact with.

Nibbles was a sample game designed both for learning to program and for having fun. True to its time, it had a little set-up interface where you answered questions in a very basic prompt-and-respond TUI before you could start playing.

You ate numbers going from 1-9, which were easy to display – the program, though a video game, runs in text mode! – but at the time I just appreciated that it helped you keep track of how far along in the level you were. So I decided to take a look at the code and discuss it a little bit.

The first thing that struck me was how short it was – at 721 lines, this would be rather short for a single source file, a “simple” module or class, let alone a whole program! I suppose things do seem bigger when you’re a kid.

But also, I didn’t view it as one block of size-12 text on a high-resolution monitor. I read it in QBasic’s built-in code browser where it showed up as 14 different logically separate parts, at the time an overwhelming number:

And this is then what the subroutine would look like:

Code browsers are great, and this interface is a solid reminder that subroutines are a very early form of modules, especially given that in QBasic, these subroutines could contain their own sub-subroutines using the more traditional GOSUB command.

So let’s talk about this programming language and program that once people used to get real work (and real play) done.

First off, we see some mutable global variables, a big no-no by modern standards, but can you really blame them when their scope is no larger than that of a small modern class, where the fields would be effectively global within the context of an instance?

But to my pleasant surprise, there were also some global constants, and they are marked as such, with the CONST keyword. In fact, as we see in multiple places, QBasic is actually strongly typed, sometimes even using the sigils for which the BASIC family is infamous.

The “B” in BASIC stands for “beginner,” and that is exactly the target audience QBasic was designed for. So it’s really refreshing that in the past they didn’t have this notion that types were too advanced for novices, or perhaps too tedious, that an easy-to-learn programming language wouldn’t have you declare them.

Or, of course, maybe duck-typing was seen as too difficult or inefficient to implement. But in that case, why did they have what I imagine would be an equally difficult compromise measure, alphabetically-based type defaulting?

To be fair, for a long time I had no idea what DEFINT even meant, but DEFINT A-Z certainly seemed like an appropriately mysterious and even badass way to start a subroutine, a magical invocation, covering the ends of the alphabet to start off each page of code.

Obviously, QBasic is not object oriented. Its fundamental notion of module isn’t a class, but rather a subroutine or function. These two notions were distinct: functions returned values (like in math) – though they could also have side effects – and subroutines did not. (Both had strongly-typed arguments, however).

This might seem an odd distinction to make, but it makes sense at a certain level. Especially syntactically, subroutine calls definitionally must be the top-level construct of a statement. And lo and behold! – they do not require parentheses around their arguments whereas functions do.

There’s really no reason not to do something like that in Rust, come to think of it. And come to think of it, Haskell makes a vaguely similar distinction, where if what others would call a “function” does IO and does not take arguments, it’s not a function at all, but a special value known as an “action,” which can then only be called in certain contexts.

So what did I do with this? I added more action keys. I added keys to speed up and slow down gameplay on command, so that if you pressed the arrow in the direction the snake was currently going, instead of doing nothing, it sped up the snake. Pressing the opposite direction of where you were going would then slow it down. And then, I wrote new levels, using the existing levels code as a baseline.

And then, after that, I began to attack the main subroutine’s main loop. I thought it would be cool if multiple numbers could be on the screen at the same time, but this required modifying how the locations of the numbers were stored, replacing the two variables indicating their current location with a two-dimensional array of boolean values (represented by 0 and -1 – integer/boolean distinctions were not yet well-established).

I wish I still had the code. But more importantly, I’m grateful that the Microsoft of the ’90s, as evil and monopolistic as it was, saw the need to take a programming language, a little IDE, and some sample programs and include them with their operating system. Bill Gates was my hero when I was a small child – before I knew what anti-trust was – and the fact that Microsoft made sure that computers came with plenty of fun corridors for me to explore was a huge part of why.

But also, there was no particular reason why the stuff I was doing couldn’t be done by any other elementary schooler, if there were interest in the schools in teaching it. Variables in programming are far more concrete in their meaning than variables in algebra – for one thing, their values actually vary with time, which made me think the variables in algebra were a bit of a misnomer.

And yet, programming isn’t even a required course in most American high schools. And that, I think, is a real shame. I understand that most schools don’t have the resources to do a good job of it, and that also, honestly, is a real shame.

Warnings and Linter Errors: The Awkward Middle Children

2022-02-25T00:00:00+00:00

What is “bad” Rust?

When we say that a snippet of code is “bad” Rust, it’s ambiguous.

We might on the one hand mean that it is “invalid” Rust, like the following function (standing on its own in a module):

fn foo(bar: u32) -> u32 {
    bar + baz // but baz is never declared...
}

In this situation, a rule of the programming language has been violated. The compiler stops compiling and does not output a binary. In fact, it has to stop compiling, because this is not a Rust program. It might resemble one, but it in fact does not make any sense, because it is violating one of the extra-syntactic constraints that text has to have to be a Rust program.

What would it even mean, to access a variable that’s not declared? When you write a variable access, the compiler issues an access to the corresponding register or location in memory. When a variable is undeclared, no such location exists. The compiler couldn’t compile this code if it wanted to!

On the other hand, there’s this sort of “bad Rust” as well:

fn foo(bar: bool) -> &'static str {
    match bar == false {
        true => "false",
        false => "true",
    }
}

This code is – as the kids say – cringe. Whatever this code is trying to do, it should not be done this way. But for all its flaws, it’s definitely “good” Rust in a validity sense: the compiler knows exactly what to do to output a binary from it, and will do so with no complaints. Whatever is “bad” about this code – and it’s a lot – is bad from a human perspective only; the computer doesn’t even notice. It’s bad idiomatic Rust, not erroneous invalid Rust, and it’s bad because humans prefer not to structure their concepts this way.
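For contrast, one possible cleanup, assuming the goal really is just to name the boolean. The compiler treats both versions as equally valid; the only difference is how it reads to a person:

```rust
// Same behavior as the `match bar == false` version:
// true maps to "true", false maps to "false".
fn foo(bar: bool) -> &'static str {
    if bar { "true" } else { "false" }
}
```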

So now we have a nice little dichotomy of problems with a Rust program. On the one hand, we have errors, where the compiler will not – cannot, even – produce an output. And on the other hand, we have idiomatic failures. It’s a nice neat tidy distinction that a lot of people make, but in the context of Rust – and with most programming languages – it’s actually problematic, because problems with programs, like gender or political views, don’t actually quite form a tidy binary. And as with gender and politics, oversimplifying types of “bad Rust” into a binary, even conceptually, can lead to practical problems.

I am, of course, talking about warnings and linter errors – those rules whose violation won’t necessarily cause the compiler to reject the program, but may, depending on its settings. I’m also talking about things like safety rules, where if you dereference raw pointers the compiler will normally reject your program, but it can be told not to on a block-by-block basis.

Here’s an example of a warning in Rust:

fn foo() -> u32 {
    return 3;
    println!("Not reached!");
}

The compiler knows that the println can’t be called, and it makes a point to tell the user about it:

warning: unreachable statement
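Both of the knobs mentioned earlier can be sketched in a few lines. The first function is the same foo with the lint switched off by an attribute. The second, read_through_pointer, is a hypothetical helper showing a raw pointer dereference that the compiler accepts only because it sits inside an unsafe block:

```rust
// The same function with the lint switched off: the attribute tells the
// compiler not to report unreachable code inside `foo`.
#[allow(unreachable_code)]
fn foo() -> u32 {
    return 3;
    println!("Not reached!");
}

// Dereferencing a raw pointer is rejected in ordinary code, but accepted
// inside an `unsafe` block, where the programmer vouches for its validity.
fn read_through_pointer(value: &u32) -> u32 {
    let ptr: *const u32 = value;
    // Sound here because `ptr` was just derived from a live reference.
    unsafe { *ptr }
}
```

Going the other way, writing deny in place of allow turns the same warning into a hard error, so the identical source text can be accepted silently, accepted with a complaint, or rejected outright.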

But more on those later. For right now, we’ll continue to try and brush these warnings under the rug.

The Binary Error Model

I call the philosophical framework I am criticizing the “binary error” model, and before I start picking it apart and denouncing it, I’d like to spend some time explaining what I mean by it, and why it’s appealing.

So to talk about “the binary error model,” as I’ve termed it, we’ll start by talking about why it exists, what problem it’s trying to solve. It’s trying to distinguish between a notion of the programming language in itself, as a platonic ideal almost, versus the other things that surround it – like a reference implementation, or a set of community norms. What would belong in a formal specification, and what not? What would have to be the same for another compiler to also be a Rust compiler?

In the “binary error model,” Rust, or any programming language, is a set of valid programs and their semantics. You could look at it as being analogous to a Rust function with this signature:

fn rust_programming_language(program: SourceTree) -> Option<Semantics>;

SourceTree in this context is a directory hierarchy of properly organized Rust code at some level of organization, maybe a crate. Semantics is a little harder to define – it’s an abstract notion of what the program “does,” a representation of the platonic essence of what the program should output (meaning, in this context, any observable behavior) given a set of inputs (meaning, in this context, any information the program can observe).

So this definition is to say, the Rust programming language, in general, can be thought of, philosophically, as a function from source trees to specifications of concrete behavior. Since this isn’t an actual Rust function, we can handwave those specifications a bit, and discuss them in English or a formal model of our choice.

And this is a coherent way to talk about Rust, a philosophical abstraction with practical applications. For example, if we were comparing two Rust compilers, trying to find out if they implemented “the same programming languages,” we could use this model as our criterion.

So, to find out whether two Rust implementations both implement the same programming language, we use this function signature as our guide: Given the same source tree, do they output programs with the same semantics, the same concrete interaction with the outside world?

There are a lot of things that can be different between implementations:

Do the programs, as compiled by these two different implementations, print out the same values when given the same inputs?
Do the programs write the same data to the disk?
Do they panic in the same situations?
Do they have the same FFI characteristics to interact with a C library?
Do they have the same asymptotic complexity? (For a systems programming language, we definitely want to include this under “semantics”)
Do they have the same memory model for internal inter-thread interactions?
Do they make the same safety guarantees?
Do they accept and reject the same set of programs?
Do they print the same exact error messages?
Do they issue warnings on the same set of programs?
Are the two compilers invoked by the same command?
Is one of the compilers actually an interpreter?
Do they target the same processor architecture?
Do they output the exact same binaries?
Do they run with exactly equal performance?

Obviously, different implementations will differ in some of these ways. But we do need some way of defining whether two compilers both implement Rust, rather than one implementing Rust and one implementing Go, or one implementing Rust and the other one not quite succeeding at implementing Rust.

In the model, as we’ve defined it, the question comes down to whether accepted programs have the same semantics (but not form) and whether the set of accepted programs is the same. This means that the above questions stop mattering after “do they accept and reject the same set of programs?” That is where the binary error model draws the line.

To apply this model, the relevant part of a compiler is that it implements something like this:

fn rust_compiler(program: SourceTree) -> Option<CompiledProgram>;

And then, you could compare two compiled programs based on their semantics.
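To make the comparison concrete, here is a minimal sketch of it. Everything in it is a toy made up for illustration: SourceTree is just a string, Semantics is just the text the program would print, and impl_a, impl_b and impl_c are hypothetical implementations of a tiny language whose only valid programs look like "print <text>". A real comparison could not enumerate every source tree, so same_language only checks a finite corpus.

```rust
// Toy stand-ins for the abstract types in the signature above.
type SourceTree = &'static str;
type Semantics = String;

/// A hypothetical implementation, viewed through the binary error model.
type Implementation = fn(SourceTree) -> Option<Semantics>;

/// Accepts `print <text>` and nothing else.
fn impl_a(program: SourceTree) -> Option<Semantics> {
    program.strip_prefix("print ").map(String::from)
}

/// Written differently, but accepts the same programs with the same meaning.
fn impl_b(program: SourceTree) -> Option<Semantics> {
    let (keyword, text) = program.split_once(' ')?;
    if keyword == "print" {
        Some(text.to_string())
    } else {
        None
    }
}

/// Also accepts `echo <text>`, so it accepts a larger set of programs.
fn impl_c(program: SourceTree) -> Option<Semantics> {
    program
        .strip_prefix("print ")
        .or_else(|| program.strip_prefix("echo "))
        .map(String::from)
}

/// Same accept/reject decision and same semantics on every program we try.
fn same_language(a: Implementation, b: Implementation, corpus: &[SourceTree]) -> bool {
    corpus.iter().all(|&program| a(program) == b(program))
}

const CORPUS: [SourceTree; 4] = ["print hello", "print", "echo hello", "fn main() {}"];

fn main() {
    println!("a vs b: {}", same_language(impl_a, impl_b, &CORPUS));
    println!("a vs c: {}", same_language(impl_a, impl_c, &CORPUS));
}
```

Under this criterion impl_a and impl_b count as the same language even though they are written differently, while impl_c does not, because of the one extra program it accepts.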

This model could also be useful for writing a formal specification of the Rust programming language (no, “the compiler itself” doesn’t count as a specification), and for programming languages that have a formal, written specification, it is couched in terms of something similar to this model – but not necessarily exactly.

Warnings and Errors

Let’s take another look at our abstract “function signature” for the Rust programming language:

fn rust_programming_language(program: SourceTree) -> Option<Semantics>;

We have so far been glossing over a feature of the return type, Option. But that is what makes this particular model the “binary error” model, and that’s what I’m going to be criticizing, so let’s discuss it now.

Some source trees are not Rust programs. Some are, in fact, Go programs, or directories full of plain text files, or random binary data. Some, on the other hand, are almost Rust programs, like the example from above:

fn foo(bar: u32) -> u32 {
    bar + baz // but baz is never declared...
}

This model treats all of these programs equally. From the perspective of this abstract function, these all return the same value, None. Which means, from this philosophical perspective, all of these are the same: not a valid Rust program.

If we’re comparing two implementations of Rust, this model therefore considers these questions to be irrelevant:

Do they generate the same error messages?
Are their error messages equally relevant to the problem?
Are their error messages equally comprehensible to a beginner programmer?

These things, however, are still relevant:

Do they reject the same source trees?

In fact, a single program accepted by one and not by the other would make these two compilers implementations of different programming languages.

And what about warnings? This abstract function signature barely has room for errors, flattening them all to None. The complexities of the ways in which a Rust program might be bad are simplified to a binary: it is or is not a valid Rust program. Warnings are rounded to “it is valid.”

So in the “binary error” model, where the “return value” of the abstract function for the programming language is just Option<Semantics>, this function falls into the “valid Rust” side of the binary:

fn foo() -> i32 {
    let Foo = 3;
    Foo
}

This is considered to be the case, even though the standard Rust compiler outputs a warning for it:

warning: variable `Foo` should have a snake case name
 --> test.rs:2:9
  |
2 |     let Foo = 3;
  |         ^^^ help: convert the identifier to snake case (notice the capitalization): `foo`
  |
  = note: `#[warn(non_snake_case)]` on by default

warning: 1 warning emitted

So what’s going on here?

Well, in point of fact, our compiler implementation does not implement Option<CompiledProgram> as its conceptual return value. Its contract looks more like this:

fn rust_compiler(program: SourceTree) ->
    (Result<CompiledProgram, Vec<ErrorMessage>>, Vec<Warning>);

But when we compare the compiler to other compilers in the “binary error” model, we pretend instead the compiler was wrapped in this wrapper:

fn rust_compiler_for_comparison(program: SourceTree) -> Option<CompiledProgram> {
    let res = rust_compiler(program);
    let (res, _) = res; // strip warnings
    res.ok() // flatten errors, did it compile or not?
}

In this model, only the parts that are part of our original rust_programming_language function truly are part of the Rust programming language. Only the rules that would cause every hypothetical compiler to reject the program are part of the programming language. This warning is “just the compiler’s opinion, man.”

It’s as if the compiler had two jobs: compiling the Rust programming language (defined as including a binary distinction between valid and invalid programs) and separately a linter, which tells you the compiler-writer’s opinions about what might be considered wrong with the code.

And this is a self-consistent way to think about Rust and about programming languages. It has practical applications: It gives you a definition of when two compilers implement the “same” programming language, and it allows you to define a formal specification for Rust – or to imagine an abstract formal specification, if you so choose, and use this notion to think about how your Rust code might fare under alternative implementations of the programming language.

Alternatives to the “Binary Error Model”

There is no coherent way to say that this way of thinking about Rust is wrong, per se. It is a philosophical perspective, a definition of what concepts (like type safety) are part of the “programming language” and the “programming language specification” (even if none has been written) and what concepts are not, what concepts (like using snake case) are just opinions and conventions outside of the scope of the programming language.

But on the other hand, we are not forced to assume this model. As it is a definition of what is part of the “programming language,” we are free to use a different operating definition. As it is a scope for what goes in the “programming language specification,” the Rust community is free to write a formal specification with different scope.

And I think we should, when that time comes, use a different scope. I think that when the people in charge of writing the spec come to it, they will use a different scope rather than strictly following the definitions explained here. Because even though the “binary error model” isn’t wrong, per se, I think it is, nevertheless, harmful.

I don’t only think that a formal Rust specification, if one is written, should not use this model. I think people should not assume this model at all. I think it will lead to mistakes in your thinking. I also think that, if you do assume this model specifically in Rust, you have to do a lot more mental work that can be saved by adopting a different model.

So what’s the alternative? Well, our original definition of a programming language did two things. It determined if the program was valid (a binary up-down decision), and it mapped each valid program to its semantics.

An alternative model would not make validity so binary. If we do this in the most straight-forward way, we get something like this:

fn rust_programming_language(program: SourceTree) ->
    (Result<Semantics, Vec<Errors>>, Vec<Warning>);

This loses a few of the nice properties that we had in the previous definition. “Valid Rust programs” is no longer a straight-forward set. Instead, we have a potential multiplicity of sets distinguished by this definition:

Programs that compile
Programs that compile without warnings
Programs that compile without a specific warning we may care about
Programs that don’t compile but only have one error
Programs that don’t compile but only have one category of error
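These sets can be written down as predicates over the richer return value. The following is a rough sketch with made-up names: Outcome is a toy version of the tuple above, with semantics, errors and warnings all reduced to strings, and the two sample outcomes model the capitalized-variable and undeclared-variable examples from earlier.

```rust
/// Toy version of the richer return type; purely illustrative.
struct Outcome {
    result: Result<String, Vec<String>>, // Ok(semantics) or Err(error messages)
    warnings: Vec<String>,               // names of the rules that fired
}

impl Outcome {
    fn compiles(&self) -> bool {
        self.result.is_ok()
    }

    fn compiles_without_warnings(&self) -> bool {
        self.compiles() && self.warnings.is_empty()
    }

    /// Compiles, and the one warning we care about did not fire.
    fn compiles_without(&self, lint: &str) -> bool {
        self.compiles() && !self.warnings.iter().any(|w| w == lint)
    }

    fn fails_with_one_error(&self) -> bool {
        matches!(&self.result, Err(errors) if errors.len() == 1)
    }
}

/// Models `let Foo = 3;`: it compiles, but with one warning.
fn capitalized_variable() -> Outcome {
    Outcome {
        result: Ok("returns 3".to_string()),
        warnings: vec!["non_snake_case".to_string()],
    }
}

/// Models `bar + baz` with `baz` undeclared: one error, no semantics.
fn undeclared_variable() -> Outcome {
    Outcome {
        result: Err(vec!["cannot find value `baz` in this scope".to_string()]),
        warnings: vec![],
    }
}

fn main() {
    let warned = capitalized_variable();
    println!("compiles: {}", warned.compiles());
    println!("compiles cleanly: {}", warned.compiles_without_warnings());
    println!("free of non_snake_case: {}", warned.compiles_without("non_snake_case"));

    let broken = undeclared_variable();
    println!("exactly one error: {}", broken.fails_with_one_error());
}
```

Each predicate picks out a different set of source trees, which is the sense in which "valid Rust" stops being a single set.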

Also, this definition imposes more on the writers of alternative implementations. Suddenly, a compiler is only a valid Rust compiler if it outputs the exact same list of errors and warnings, given an input program.

This seems to me a little too strict. I don’t think the exact wording of an error or warning should necessarily matter or be part of a programming language spec. And compilers regularly stop compiling after experiencing too many errors (where too many can sometimes be one), and implementations would reasonably differ about which errors they would output before giving up.

But I think it’s a good starting point, and in any case much better than the binary-error Option<Semantics> model. Part of the benefit of Rust as a programming language is how much work has gone into its warnings. For an alternative implementation to claim to be Rust without having the same warning system would strike me as extremely misleading. Warnings – obligatory warnings – should be included in any language spec.

Many important Rust safety features are actually warnings. Ignoring a #[must_use] value is technically only a warning – the unused_must_use lint is set to #[warn] by default. A function that has dead code after a return statement: this is a warning, but also a serious correctness issue.

Rust Warnings are Complicated

And of course any Rust implementation would have to include warnings. Just as C (in practice) has a #warning directive, which causes the compiler to issue warnings, Rust has a number of annotations that control the issuance of warnings.

For example, if we add an annotation to our function from before:

#[deny(non_snake_case)]
fn foo() -> i32 {
    let Foo = 3;
    Foo
}

… the warning becomes an error:

error: variable `Foo` should have a snake case name
 --> test.rs:3:9
  |
3 |     let Foo = 3;
  |         ^^^ help: convert the identifier to snake case (notice the capitalization): `foo`
  |
note: the lint level is defined here
 --> test.rs:1:8
  |
1 | #[deny(non_snake_case)]
  |        ^^^^^^^^^^^^^^

error: aborting due to previous error

Any Rust specification, even one with the binary error model, would therefore have to include:

The rules about snake case (variables should have snake case)
The rules about annotations (so that #[deny(...)] triggers an error)

This means that, even if we did imagine a specification where only errors were in scope, rules for warnings would have to also be in that specification, because they can be configured to become errors. And at that point, why not also specify in the specification that the warnings are obligatory?

Especially because we can also say #[warn(...)] as a tolerance level for these configurable rules. What do we say about #[warn(...)] in the spec if warnings are out of scope?
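For completeness, the same attribute mechanism has a third level, #[allow(...)], which turns the rule off entirely. A small example, reusing the function from before; with this annotation the compiler should accept it without mentioning snake case at all:

```rust
// The rule is switched off for this one function only.
#[allow(non_snake_case)]
fn foo() -> i32 {
    let Foo = 3;
    Foo
}

fn main() {
    println!("{}", foo()); // prints 3
}
```

So the same three lines of function body can be silent, a warning, or a hard error, depending only on an annotation that is itself part of the source tree.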

The Other Side of the Binary

Now that I’ve criticized the “binary error” model from the warnings side, I also want to address the notion that all errors are created equal. Errors are different from each other.

First off, there’s an obvious distinction between syntax errors and semantic errors. This is kind of boring and obvious, but it adds some nuance into the idea that invalid Rust is simply “not Rust,” and it comes up in practice sometimes.

As I write my code, I sometimes run cargo fmt as part of my editing workflow. Usually, this helps me read my own code better for further editing, and usually, this works even if my code is full of errors – it might even help me find and understand the errors. But sometimes, my code has a relatively superficial, syntactic error, like a missing }, and cargo fmt can’t even help me. This sends me into a little bit of a panic, but I’m usually also glad I didn’t keep working longer with such a problem.

If a Rust specification wanted to include formatting tools in its scope, it could conceivably make a formal distinction between syntax and semantic errors.

More interesting, however, are errors that don’t have to be errors, where the compiler could keep compiling, but it chooses not to.

We have the obvious example, where the error is configurable, where it’s actually a warning that’s just been set to #[deny(...)] as a lint level.

But we also have things like lifetime errors, which cannot be disabled. Or the rule against dereferencing a pointer outside an unsafe block. The Rust compiler could, if it wanted to, simply allow those things. We could do something like:

#[unsafe_allow(lifetime_mismatches)]

The compiler would then output a program, which would then exhibit undefined behavior – or not. It would then be potentially unsound – or not.

This is not included in Rust, but it’s theoretically possible, unlike referring to a variable that doesn’t exist, where there is no reasonable interpretation of what the code should do.

On the border are things that C++ allows, but that are arguably as nonsensical as referring to a variable that doesn’t exist. If a function returns u32, and you reach the end of the function without returning anything, that’s nonsensical, right?

fn foo() -> u32 { }

But depending on the ABI, you can just not output the code that sets the return value, and perhaps not even output the code that returns from the function. This is definitely undefined behavior, but C++ will often allow it, sometimes without even a warning.

Unsafety as Always-On Warnings

As an aside, the #[allow] and #[deny] annotations are very similar to how unsafe works. We could imagine an alternative world where there was no unsafe keyword for blocks. Instead of writing:

unsafe { *ptr }

… we could instead imagine a Rust where this is written as:

#[allow(unsafe)]
*ptr

Basically, operations like dereferencing a raw pointer (*ptr) are disallowed in Rust by default, but can be allowed. They are disallowed because, like Rust that is warned about, they are indications that the programmer likely made a mistake. But like Rust that is warned about, the programmer can make explicit that they are using the construct on purpose.

The fact that unsafe/safety, one of Rust’s core features, works in a way very similar to warnings should make us take seriously the importance of warnings. It would have been just as valid from a safety point of view to use literally the same mechanism with #[allow] and #[deny], but I think safety is such an important category of possible mistakes it’s probably for the best that it has its own special syntax.
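For reference, here is a small runnable version of the real syntax. The function name is made up for the example. Creating the raw pointer is allowed in safe code; only the dereference needs the unsafe block, and removing that block makes the compiler reject the function (error E0133).

```rust
fn read_through_raw_pointer(value: &u32) -> u32 {
    // Turning a reference into a raw pointer is safe.
    let ptr: *const u32 = value;
    // Dereferencing it is not, so we opt in explicitly.
    // This is sound here because `ptr` comes from a live reference.
    unsafe { *ptr }
}

fn main() {
    println!("{}", read_through_raw_pointer(&7)); // prints 7
}
```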

Take-Aways

So why am I writing all of this, besides thinking it’s all an interesting mental exercise?

I don’t think the authors of any future Rust spec would actually err in such a way as to not discuss warnings at all. But I think it’s important to understand the theoretical implications.

But I also know that people do think in terms of the hypothetical Rust specification which only accepts or rejects programs. I recently saw someone write that capitalization conventions, such as snake case, are not part of the Rust programming language. They meant that according to the “binary error” model that we discussed above, which they implicitly subscribed to, using snake case or not will never change whether your program is a valid Rust program, and therefore, the entire convention is not part of Rust.

But even if we ignore the fact that a Rust compiler needs to know about this convention in case #[deny] is used, this assumes a definition of Rust programming language and Rust specification that uses the “binary error” model.

And while that is one way to think about Rust, it’s not a very good one, and I would say it’s not a very useful one. And more fundamentally, you don’t have to. You don’t have to use this philosophical framework where only rules that cause compilation failures are part of the programming language.

So I don’t think it’s fair to say “in Haskell, variable name case is significant, and in Rust, it is not, it is only a convention and not part of the programming language.” I think it’s more fair to say “in Haskell, case conventions are mandatory, violations are errors, and they are used to disambiguate the syntax. In Rust, they are non-fatal warnings by default, and the compiler can still process Rust with incorrect case, and in some situations has to.” Or, more simply, “in Haskell, case convention violations are errors, but in Rust, they’re just warnings.” But, in both Haskell and Rust, capitalization conventions are part of the programming language. In both, the compiler has to know about them, and enforces them in at least some situations.

This may seem like a nitpick, but I think using definitions of “programming language” vs “convention” can make the “convention” stuff seem less important than it should be. I think that if you think that way, and were writing an alternative implementation of Rust, you might give yourself permission to not care about the warnings. You might be less likely to add a policy to deny warnings (-D warnings, Rust’s counterpart to -Werror), or require clippy to pass in your CI.

If someone with that attitude were writing the language spec – which I don’t think they would be, but if they were – they might underspecify the things that make Rust the useful tool that it is. Programming is about contracts, and as far as I’m concerned, the warnings are part of the Rust compiler’s contract. And a compiler should not be allowed to call itself a Rust compiler if it doesn’t follow it.

Snake case for variables is part of the Rust programming language, as I define the Rust programming language, and – I think – as most of the community defines it. Certainly it is part of “the Rust programming language” as that phrase is used in common parlance, and it is one of many features that make Rust special. If there is to be a specification, it should be part of the Rust specification. I understand that if you use the “binary error” model to define what a programming language is, and what a programming specification should be, you don’t get this result. But I just don’t think you should be using that model, and I think it does matter whether you do, even though it is a philosophical perspective that cannot be disproven.

Of course, Rust will probably not ever have a single monolithic specification mediated by ISO or an equivalent. It will certainly continue to be a community organization, with many different standards and specifications, perhaps one for a compiler with basic features, another for a compiler that fully supports errors, another for formatters like cargo fmt. Each of these specifications will delineate different sets of source trees: source trees with the syntax of Rust, source trees without errors, source trees without warnings, etc.

Just like the notion of a “programming language” doesn’t have to be a single set, a single binary between valid and invalid, the notion of a specification also needn’t be so monolithic.

Haskell Error Messages: Come on!

2022-02-16T00:00:00+00:00

I am a big fan of strongly typed languages, and my favorite GC’d language is Haskell. And I want you, the reader, to keep that in mind today. What I am writing is some commentary about a language I deeply love, some loving criticism.

So here’s what happened: A few days ago, I was showing off some Haskell for a friend who primarily programs in Python. The stakes were high – could I demonstrate that this strange language was worth some investigation?

My primary focus was on infinite lists, and defining Fibonacci as a recursive data structure – all fun things to show off Haskell’s laziness. But at some point, we wrote an expression by accident that had a type error in it, and so we got to see how the compiler treated such things. I don’t remember the exact expression – it was deep in context – but the problem was I was trying to add an integer to a list. Something analogous to 1+[2,3].

Now, in some “weakly typed” languages, this sort of thing is actually allowed, as a colleague of mine recently pointed out:

[jim@palatinate:~]$ node
> 1+[2,3]
'12,3'

This is, of course, hilarious. But! We shouldn’t paint “weakly typed” languages with such a broad brush. In my friend’s native Python, it would have been an error, as it should be. It is a run-time error, but what does that matter when you’re working in an interpreted language, writing ad hoc scripts? The important thing is that failure is recognized as failure, and it doesn’t try to continue with nonsense:

[jim@palatinate:~]$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1+[2,3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'

This is an error message. It’s even a pretty decent error message. There are many things you can pass to the + operator in Python, but an int and a list together are not among them.

So now, what did Haskell do, this language that I’m trying to show off? Well, unfortunately, my friend didn’t see the actual problem in the code, but was first made aware of it from the compiler’s error message. And if you’ve ever done this before in Haskell, you’re probably wincing right now, because you know what this error message is:

[jim@palatinate:~]$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude> 1+[2,3]

<interactive>:1:1: error:
    • Non type-variable argument in the constraint: Num [a]
      (Use FlexibleContexts to permit this)
    • When checking the inferred type
        it :: forall a. (Num a, Num [a]) => [a]

Now, my friend didn’t understand this error message at all. Since I was in Demonstration Mode, my instinct was to explain it to him, but after a few false starts, I realized that this would simply not help, and pointed out that you couldn’t add integers to lists, and showed him where this was happening (it was a little more subtle than this example).

But since then, my colleagues and I have been discussing error messages in Slack, specifically how good Rust’s error messages are, and how much better they are than Haskell’s. So I had an opportunity to paste the very bad Haskell error message my friend and I discovered into the Slack. There, it served as a case study in how problematically incomprehensible it is, sparking a lot of discussion, from which I shall try to extract the most interesting parts into this post.

For one, this error message has little to do with the concrete problem. The problem is – and the error message should say this – that you can’t add lists. Specifically, in Haskell, you can only add things that implement the Num typeclass (which lists don’t), and so you’d think the compiler would be smart enough to mention anywhere in this error message something along the lines of “expecting [a] to have Num instance, but it does not.” That’s the actual problem, even if not well-explained.

But instead, ghc tries to assume you meant what you wrote, and figure out a way in which [a] can have the Num instance. This is where it fails, and then it gives advice on how to make that succeed. As my professor-colleague points out, this is dangerous advice, especially for beginners, because there’s no way that using FlexibleContexts will actually help in that situation. The problem isn’t that these lists aren’t numbers in particular, and that you need to only accept lists that are numbers in your function. The problem is that no lists are (or at least should be) numbers! But a beginner might just follow the advice, try to figure out what the hell FlexibleContexts are, and find themselves in a world of pain, and no closer to solving the actual problem.

Part of what causes this is the type of 1 itself. Haskell, unlike Rust, allows literals like 1 to be interpreted in any number type. Given that Haskell (like Rust) has return-type polymorphism, it can directly express this in the type system:

Prelude> :type 1
1 :: Num p => p

In Rust, this would be something like impl Num. It means that 1 can be any type that is Num. Combine that with the fact that + requires its arguments to be Num and to match ((+) :: Num a => a -> a -> a), and when we see 1+[2,3], we’re simply left trying to figure out how [2,3] is Num.
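As a rough Rust sketch of the same idea: the Num trait and the one function below are hypothetical stand-ins written for this example, not standard library items. The sketch uses a generic return type rather than impl Num, since it is the caller who picks the type, just as the context picks the type of the literal 1 in Haskell.

```rust
/// A hypothetical stand-in for Haskell's `Num`; not a standard library trait.
trait Num {
    fn from_integer(n: i64) -> Self;
}

impl Num for i32 {
    fn from_integer(n: i64) -> Self {
        n as i32
    }
}

impl Num for f64 {
    fn from_integer(n: i64) -> Self {
        n as f64
    }
}

/// Plays the role of the literal `1`: its type is chosen at the use site.
fn one<T: Num>() -> T {
    T::from_integer(1)
}

fn main() {
    let a: i32 = one();
    let b: f64 = one();
    println!("{} {}", a, b);
}
```

Asking for one() as a type with no Num implementation is then a question of whether that type implements the trait, which is exactly the question ghc ends up asking about [a].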

If we did not have this polymorphic literal, this notion that the meaning of 1 is flexible, we would have seen a much more comprehensible error message. If 1 meant the same thing as (1::Integer) (or any arbitrary choice), we’d have this beautiful explanation:

Prelude> (1::Integer) + [2,3]

<interactive>:4:16: error:
    • Couldn't match expected type ‘Integer’
                  with actual type ‘[Integer]’
    • In the second argument of ‘(+)’, namely ‘[2, 3]’
      In the expression: (1 :: Integer) + [2, 3]
      In an equation for ‘it’: it = (1 :: Integer) + [2, 3]

Or even if we just had non-numbers on both sides, we’d similarly have a better error message:

[jim@palatinate:~]$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude> () + [1,2]

<interactive>:1:6: error:
    • Couldn't match expected type ‘()’ with actual type ‘[Integer]’
    • In the second argument of ‘(+)’, namely ‘[1, 2]’
      In the expression: () + [1, 2]
      In an equation for ‘it’: it = () + [1, 2]
Prelude>

What is my take-away here? I don’t think the compiler has been sufficiently tweaked when it comes to error messages, or that the Haskell community cares sufficiently about beginners. Rust as a community puts a lot of energy into good error messages, so that even though Rust also has a trait you could add to arrays to make + work, it still has a better error message:

error[E0277]: cannot add `[{integer}; 2]` to `{integer}`
 --> test.rs:2:7
  |
2 |     1 + [2,3];
  |       ^ no implementation for `{integer} + [{integer}; 2]`
  |
  = help: the trait `Add<[{integer}; 2]>` is not implemented for `{integer}`

But I also think the semantics of 1 are too liberal, leaving the compiler in an awkward place. See, the weird thing is, you can declare [2,3] a number, making 1+[2,3] an expression that adds two lists:

instance Num [a] where
    (+) = (<>)
    (-) = (<>) -- Eh, why not?
    (*) = (<>)
    negate = reverse
    abs = id
    signum = const []
    fromInteger i = take (fromInteger i) $ repeat undefined

main = do
    print $ signum $ 1 + [2,3]

Once you’ve defined lists as a number, 1 is suddenly a list if it wants to be. And this contributes to the difficulty of finding the right error message: what you asked for is possible after all.

And in the end, this leaves me with the feeling that Haskell has this in common with Javascript, and that makes me sad. A polymorphic enough strongly typed language is no longer strongly typed.

Mortgages are Interesting

2022-02-08T00:00:00+00:00

I just bought a house, and it came with a mortgage. I bought the house and committed to the mortgage all in one ceremony, in a cute little office where I signed enough papers that the sellers were able to solemnly hand me the keys to my new castle. In the lead-up to this, I was told how early payments, mortgage insurance, and refinancing works, and it’s – I think very reasonably – been on my mind since.

It was in a university economics class – macroeconomics – where I first encountered the concept that “paying down debt” was equivalent to “saving.”

And of course, when economists – and especially undergraduate econ professors – use words like “equivalent” you have to take it with a heap of salt. But there’s a lot of truth to it!

Let’s say you get $1,000 from your job. If you pay down a debt with it, your net worth has just increased by $1,000. If you save it into a savings account, your net worth has also just increased by $1,000. But if instead you spend it, say, on rent, your net worth does not increase at all.

It also interacts with interest in a similar way. Let’s say your debt and your savings account are both with Easy Round Numbers Credit Union (ERNCU), and both have 10% yearly interest as an annual percentage yield (i.e. after compounding is already taken into account). Well, if you save the $1,000 now, you will have $1,100 in a year. If you pay off $1,000 in debt now, you’ll have $1,100 less to pay in a year. If you were, for whatever reason, planning on paying off the entire debt, whatever it was, in a year’s time, this is an exactly identical situation. Even if not, though, it’s still the same net worth, and macroeconomically “equivalent.”

Interest rate is important. To maximize long-term net worth, you would want to put excess money in the place with the maximum interest rate. This is common financial advice for loans: After paying all your minimum payments, spend the extra money on the loan with the most interest. And, though it’s less commonly brought up, it works for saving too: If you save money in a place where it grows faster than the loan, put it there. Keep that loan going, and pay it off only when you have to. So if you have a savings account more valuable than the loan, use it.

So considering this “equivalency,” I perhaps have a valuable opportunity. Instead of thinking of this as a loan, a burden to pay back, I should think of this as a very special savings account, with 3.125% guaranteed interest, more than the best CDs out there (which are currently hovering around 1%ish). If I have some excess cash, I can use it to increase my equity in the house, and I’ll save that much, plus 3.125% compounded, in future payments.

“Wow,” I said to myself (and to a friend), “Maybe I could just take advantage of this, and sell CDs. I could have people lend me money – sorry, deposit their money with me – at 3%, better than they’d be able to get at a bank, and use it to pay down the mortgage faster. After all, saving and paying down debt is equivalent, right?”

And then I thought to myself, wait, how would I pay them back? Let’s say I get a customer, John Doe. Mr Doe wants to leave his $1,000 with me, and get paid $1,030 in a year. That’s 3% interest, and less than my 3.125% interest rate, so I should be able to make this worth my while. And it’s better than what Mr Doe can get anywhere else, so that should make it worth his while!

So I take his $1,000, and I pre-pay my mortgage with an extra $1,000 payment. My principal goes down by $1,000. The amount of interest I have to pay in the following year, therefore, goes down by that times my interest rate. I then pay $31.25 less in interest that year, take the extra money I would have paid in interest, give $30 of it back to Mr Doe, and keep the extra $1.25 as profit. Woo!

But already there’s a problem. It’s true that I would have to pay $31.25 less in interest that year, yes. But I can’t take that money and give it back to Mr Doe. I have to keep paying the same total monthly payments, and whatever now doesn’t go to interest, must go to paying the principal down faster. I simply don’t have the flexibility to pay it to Mr Doe (and myself) instead; I have to wait until the end of the mortgage term, in 30 years, when I’ve paid it off early, to see the money.

But even setting that problem aside, I’m assuming that once I give Mr Doe the interest, he’ll want to leave the $1,000 in for another year. But what if he wants his money back? What if inflation continues or the Fed just feels like raising interest rates, and now there are CDs at 4% available elsewhere? Well, in order to get the principal back (and this would help with the interest too), I’d have to increase the loan amount on my house back up again, and take-backsies the pre-payment. Given that this is a fixed-rate mortgage and not a line of credit, this would require refinancing – which, given that in this scenario interest rates are higher, is not going to be fun or affordable for me.

So what financial product can I sell Mr Doe and his friends? I can sell a very long-term CD, that matures in 30 years. Once his money gets used as my early repayment, he has to wait until I see it again – when my mortgage ends early because of it. Then, instead of paying the bank back for the last however-many payments, I pay him the amount I would have paid the bank, and he should get his money back with 3.125% interest – or rather, with 3% interest after I also take out my cut. Mr Doe pays $1000 now, and I pay him $2427.26 after 30 years, for $89.94 profit on my part.
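
To check that arithmetic, here is a minimal Rust sketch. It assumes plain annual compounding, which is what the figures above match, and it ignores the monthly amortization schedule a real mortgage would have. The function names are made up for illustration.

```rust
/// Value of `principal` after `years` of annual compounding at `rate`
/// (for example, 0.03 for 3%).
fn future_value(principal: f64, rate: f64, years: i32) -> f64 {
    principal * (1.0 + rate).powi(years)
}

/// What the prepayment is worth at the mortgage rate, minus what the
/// depositor is owed at the (lower) CD rate. Illustrative only.
fn long_cd_profit(deposit: f64, mortgage_rate: f64, cd_rate: f64, years: i32) -> f64 {
    future_value(deposit, mortgage_rate, years) - future_value(deposit, cd_rate, years)
}
```

Calling `future_value(1000.0, 0.03, 30)` gives what Mr Doe is owed, and `long_cd_profit(1000.0, 0.03125, 0.03, 30)` gives my cut. Under these assumptions they come out to about $2,427.26 and $89.94.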

So, it looks like having a loan at 3.125% is indeed like having a savings opportunity at 3.125% – as long as it’s a 30-year CD. Problem is, who wants a 30-year CD? Would I buy a 30-year CD at 3.125%? I mean, sure, the interest is enticing, and much better than I can get in a savings account.

But at 30 years, the only thing I could practically use this CD for is retirement – and I already have a retirement account, which I expect to do better than 3.125% a year. The 3.125% is a guaranteed return, that is true, but 30 years is a long way away. On average, the stock market, while not guaranteed, has historically tended to pay 10% annual returns. While it might result in bad years, 30 years is plenty of time for the bad years to be beaten out by the good ones.

And it’s not like my monthly payments would go down if I created such a CD – that would require a refinancing. The bank is actually creating CDs (and other financial assets) out of this mortgage, and in exchange for predictability, they don’t want to lower the monthly payments, which is in my mind fair enough, as I get to lock in this low interest rate. But even if I pay half the mortgage in a single payment, unless I refinance, my monthly minimum payment remains the same until the mortgage is paid off.

And honestly, some borrowers – those with certain subprime mortgages – have it worse. Some mortgages come with a pre-payment penalty, where paying the principal ahead of time actually accrues fees. For me, it does decrease the principal – I just have to wait until the end of the mortgage to reap the practical benefit.

Again, unless I refinance. I pay the entire loan off, all at once, and replace it with another loan. This makes sense if the interest rate is lower, or even if it’s the same and I want to lower the monthly payment or take money out of my equity. If I could know I could refinance at any time at the same interest rate, then I could sell (and provide myself with) shorter-term savings opportunities.

Refinancing is similar to selling a mortgage-sized group of CDs to a bunch of interested John Does… except it’s to a bank, the only organization actually interested in such a product, and therefore at the market rate for mortgages, rather than at the lower market rate for CDs. Part of what makes a bank a viable business is that they provide longer-term loans than any normal individual would be interested in, in exchange for an upcharge on interest rates. They then take on some amount of risk and some amount of overhead to turn that loan into more reasonable (i.e. shorter-term than 30 year) CDs, or even checking and savings accounts.

So if a mortgage doesn’t give me a super-special savings opportunity for my excess money, why am I getting a mortgage? Well, I have to live somewhere, and I can’t afford a house outright, and the alternative is renting. Monthly mortgage payments are comparable, in their size, with rent payments, but even with the default monthly mortgage payment, the principal’s still going down.

Put another way: Even though I’m not interested in putting my disposable income into a 30-year 3.125% CD, I’d much rather put my monthly housing expenses into such a CD than pay them to some landlord, never to be seen again. This adds up – in 30 years, I will definitely have the entire principal of my mortgage paid off, which means I will no longer have to pay the mortgage payments. If things go well, my house will have also appreciated, in case I want to borrow money against home equity. If I’d rented this entire time, I’d still have to pay rent, which the landlord could increase at any time. The “free rent” from my mortgage will be a valuable asset in my later years, and one that I basically don’t have to pay any extra for above getting a rental.

And mortgages are honestly also convenient. If I owned the property outright, I’d have to pay all the different forms of property tax myself, and also my homeowner’s insurance. With a mortgage, I have one monthly payment, that indirectly pays those things through an escrow account. And this is honestly much more convenient, especially for someone who struggles to juggle all the demands of modern life.

Burying the Lede

2022-02-02T00:00:00+00:00

Imagine you don’t know who Napoleon was. You know he’s a figure from history, but you don’t even know he has to do with France. And imagine, when you read the Wikipedia article, for some reason you skip the opening paragraphs above the fold, and you’re reading about his upbringing in Corsica as a petty Italian noble under French rule. And you just want to know, why’s this guy important, what’s his deal, why do people keep talking about him (something military, it seems?) but you have to read two-thirds of the way through the article to find out, oh, he became Emperor of the French. Finally, you have context to understand everything else, and you now know the first thing about Napoleon.

This is how I feel reading technical documentation, not all the time, but a fair amount of the time. I hear about a new project, and I get documentation about it, and it immediately goes into the weeds. What is a Foo? Is it a library, an application, a command-line utility? It’s a “framework” – that word means several things, but I’m guessing it means it’s a library that dictates how an application is written? Who would use this framework? What problems does it solve for them? Oh, I see it uses a bunch of technologies to solve the problem, but what problem is it trying to solve, exactly?

Tech writing isn’t the only time this comes up, of course. The phrase “burying the lede” is famous from journalism, the canonical example being an article about a fire, where the article goes into details about how it might have been started, it talks about how this is the third fire this year, does a brief profile about the firefighter who put it out, and then finally, at the end of the article, mentions with gravity and somber respect, the two children who were so unfortunately lost, burnt to a crisp. Finally, by the end of the article, the studious reader knows why it’s the talk of the town, and the other reader, who only read the first two paragraphs, will say something insensitive at the local pub that night and look like quite the asshole.

For another example: I remember being a high schooler in the early aughts, and hearing, as was often discussed then, about Halliburton. What, exactly, was Halliburton and what do they do as a business? Democratic Congressmen complained about them so much, but what, actually, was the company, and what business did they have in Iraq? Literally, what were they doing there? As in, what were they doing while they were there?

And so, a friend and I decided we’d look at the Halliburton website. And besides an option to sign up to pay $300 to be some sort of member of something, and a careers page that shined no light on our question, there simply wasn’t very much content to go off of. Lots of testimonials and statements about how they were simply the best in the business – and clearly it had something to do with oil, though this had to be gleaned as even that was not explicitly stated – and nothing, absolutely nothing, about what exactly the business was. My friend said they did “consulting” on oil fields, but I didn’t know what that meant then, and even now, “consulting” is one of my least favorite terms on account of how vague it is.

Of course, what we should have done is gone to Wikipedia. I don’t know what it said then, but now it has a pretty darn helpful second sentence, opening with:

Halliburton Company is an American multinational corporation. In 2009, it was the world’s second largest oil field service company. It has operations in more than 70 countries

“Oil field service” is much more evocative and specific than anything it said on the Halliburton website at the time (or even now). And it also is a link, to a list of such companies, which doesn’t explain much about them but at least tells us the first thing:

This is a list of oilfield service companies – notable companies that provide services to the petroleum exploration and production industry but do not typically produce petroleum.

Of course, it’s still unclear what those services actually are. Do they build equipment, install equipment, run trainings for the people who actually drill it? Do they help staff the oil fields? Lots of actual questions. But I’m a lot further ahead than I would’ve been just looking at the Halliburton website. Not very much, but a lot further.

Actually, if anyone in my readership can give me any insight on what Halliburton actually, specifically does, please let me know in the comments. Thank you for indulging me in this anecdote about the petty frustrations of my youth.

Now back to the topic.

Let’s look at a better, cleaner example of a lede, what Wikipedia actually has to say about Emperor Napoleon in the very first paragraph:

Napoleon Bonaparte (born Napoleone di Buonaparte; 15 August 1769 – 5 May 1821) was a French military and political leader who rose to prominence during the French Revolution and led several successful campaigns during the Revolutionary Wars. He was the de facto leader of the French Republic as First Consul from 1799 to 1804. As Napoleon I, he was Emperor of the French from 1804 until 1814 and again in 1815. Napoleon dominated European and global affairs for more than a decade while leading France against a series of coalitions in the Napoleonic Wars. He won most of these wars and the vast majority of his battles, building a large empire that ruled over continental Europe before its final collapse in 1815. He was one of the greatest military commanders in history, and his wars and campaigns are studied in military schools worldwide. Napoleon’s political and cultural legacy has endured, and he has been one of the most celebrated and controversial leaders in world history.

That’s a lot to unpack, but I think it’s a fairly solid intro paragraph! For the record, I did not choose it because I knew ahead of time that it was going to be a solid intro paragraph. I just happened to be thinking about Napoleon when I started this essay, and I trusted Wikipedia to provide a solid introduction to such a historically important figure, and Wikipedia did not disappoint.

Which is all to say, this solid quality is pretty standard on Wikipedia. How do they do it?

It turns out they have some pretty nice policies, specifically about the first sentence.

I always appreciate how Wikipedia starts out biographies by saying the person’s nationality and what jobs or activities they were known for. Napoleon was a French military and political leader. Gramsci was “an Italian Marxist philosopher, journalist, linguist, writer, and politician.” This is especially important, because other sources are likely to assume I know who a famous person is, especially actors and athletes, neither of whom I’m likely to recognize the name of. You can’t assume the audience already knows Napoleon was Emperor of the French – or that Robin Williams was an actor – especially if you’re an encyclopedia! If you can’t learn such things reading an encyclopedia, where can you?

I also appreciate that Wikipedia explicitly highlights the level of importance of the figure. It’s not hype when Wikipedia says that Kafka is “widely regarded as one of the major figures of 20th-century literature” or that “Napoleon dominated European and global affairs for more than a decade while leading France against a series of coalitions in the Napoleonic Wars”; in both cases, it’s something that readers really ought to know.

All in all, though, this comes down to having a taste for saying the otherwise obvious. The editors who wrote these lead paragraphs knew who Kafka and Napoleon and Gramsci were, had a lot more expertise than the people who’d need these basic facts. But with effort, with the aid of explicit lists and examples of other articles, they were able to communicate to the less knowledgeable effectively.

Go and do likewise! Do likewise in your documentation. Explain to me what your programming project does, who it’s for, what its role is within your company. In the comments/developer docs: Tell me what the most important files are, where the main loop is, where you go to modify various things.

And for goodness sake, tell me what problem the darn thing is trying to solve, and what type of thing it is (e.g. program, library, framework, specification, protocol).
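
As a sketch of what that could look like in practice, here is a doc comment on an entirely hypothetical Rust type. `TokenBucket` and its methods are invented for this example and are not from any real project. The point is only the order of the comment: what kind of thing it is, what problem it solves, who it is for, and where to start reading.

```rust
/// `TokenBucket` is a small in-process rate limiter: a library type,
/// not a service or a framework.
///
/// Problem it solves: code that must not perform more than `capacity`
/// operations between refills, such as a client of a rate-limited API.
///
/// Who it is for: application code that wants to ask "may I proceed?"
/// before each operation.
///
/// Where to start reading: `try_acquire` is the main entry point;
/// `refill` is what a timer is expected to call.
pub struct TokenBucket {
    capacity: u32,
    tokens: u32,
}

impl TokenBucket {
    /// Creates a full bucket holding `capacity` tokens.
    pub fn new(capacity: u32) -> Self {
        Self { capacity, tokens: capacity }
    }

    /// Takes one token if any are left; returns whether it succeeded.
    pub fn try_acquire(&mut self) -> bool {
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }

    /// Restores the bucket to full capacity.
    pub fn refill(&mut self) {
        self.tokens = self.capacity;
    }
}
```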

Being Fair about Memory Safety and Performance

2022-01-20T00:00:00+00:00

For this next iteration in my series comparing Rust to C++, I want to talk about something I’ve been avoiding so far: memory safety. I’ve been avoiding this topic so far because I think it is the most discussed difference between C++ and Rust, and therefore I felt I’d have relatively little to add to the conversation. I’ve also been avoiding it because I wanted to draw attention to all the other little ways in which Rust is a better-designed programming language, to say that even if you concede to the C++ people that Rust isn’t “truly memory safe” or “memory safe enough,” Rust still wins.

Array Indexing

But there is a persistent and persnickety little argument that I wanted to talk specifically about. This argument is really persuasive on its face, and so I think it deserves some attention – especially since I am guilty of having used this argument myself, many years ago when I still worked at an HFT firm, to claim that C++ had a niche that Rust wasn’t ready for. I’ve also seen it a few times in a row in the wild, and it’s made me so emotional that I simply had to write this, and as a result, it’s a little more emotional than some of the other posts.

In this argument, array indexing stands in for a number of little features. But – I’ve seen array indexing cited so often as a canonical example that I feel compelled to address it directly!

The argument goes like this: In Rust, array accesses are checked. Every time you write arr[i], there is an extra prepended if i >= arr.len() { panic!(..) }. As you can see, that is more code, and worse, a run-time check. And while the optimizer might eliminate it, or the branch predictor may well predict it right every time, the extra code bloat and possible run-time check are just unacceptable in [insert field here (I used HFT)], where every nanosecond matters. And until some acceptable solution is found to this, I just don’t see Rust making it in [insert field].

When I made this argument, to a group of programming-language academics, the defenders of Rust countered with a number of points, all of which accepted the basic premise:

Do I really need those extra nanoseconds? Yes.
Is it really too much of a price to pay for all that extra safety? Yes.
Do I really distrust the optimizer that much? Yes. If only Rust had a way to do optimizer assertions, a way to statically verify that the panic had been optimized out.
Would dependent typing on integer values help? Yes. That sounds very promising. I think Rust will get there someday, but for right now we must use C++.

Now that I know more about Rust I’m happy to tell you that I was completely off base. I wasn’t off base about the performance considerations, or the unacceptability of even the slightest risk of a run-time check. I was off base about an even more basic premise: that Rust uses checked array indexing, whereas C++ uses unchecked array indexing.

But wait! Isn’t that the whole point? Doesn’t C++ avoid checking everything, to make sure all abstractions are zero-cost, to be blazing fast? Doesn’t Rust, while trying for performance, in the end always concede to the demands of safety?

Well, let’s look at the APIs in question. C++ apologists are always saying to use the modern C++ features from C++11 and later, rather than the more C-like “old style” C++ features, so on the C++ side let’s take a look at the documentation for std::array, introduced in C++11.

Here we see two indexing methods. The first one, at, is bounds checked and will throw an exception if the index is out of bounds, whereas the second one, operator[], is not, and will instead exhibit undefined behavior of a very difficult-to-debug nature. It looks like C++ actually believes in free choice here, leaving the choice of method up to the user. Not quite what we supposed, but the important part is that unchecked indexing is available, so so far the argument can still stand.

Now let’s look at Rust. Rust arrays and vectors can also be used with methods from slice, as can slices, so the slice documentation is the best place to look. And looking there, we immediately see – drum roll please – 4 methods. We see get and get_mut, which are checked, and right underneath them, in alphabetical order, get_unchecked and get_unchecked_mut, which are not.
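
A minimal sketch of the three spellings side by side, using only the slice methods named above. The wrapper function names are mine, for illustration.

```rust
/// Bracket indexing: bounds-checked, panics if `i` is out of bounds.
fn bracket(values: &[u32], i: usize) -> u32 {
    values[i]
}

/// `get`: bounds-checked, returns `None` instead of panicking.
fn checked(values: &[u32], i: usize) -> Option<u32> {
    values.get(i).copied()
}

/// `get_unchecked`: no bounds check at all.
///
/// # Safety
/// The caller must guarantee `i < values.len()`; anything else is
/// undefined behavior, just like an out-of-bounds `operator[]` in C++.
unsafe fn unchecked(values: &[u32], i: usize) -> u32 {
    // SAFETY: upheld by the caller, per this function's contract.
    unsafe { *values.get_unchecked(i) }
}
```

With `let v = [10, 20, 30];`, `checked(&v, 3)` is `None`, `bracket(&v, 3)` panics, and `unsafe { unchecked(&v, 3) }` is undefined behavior, so the call site has to say `unsafe` to get the unchecked version.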

To review, where do Rust and C++, these programming languages with their vastly different philosophies, Rust for the cautious, C++ for the fast and bold, stand? In the exact same place. Both programming languages have both checked and unchecked indexing.

Let me say that again. This is the talking point form, what to say if you need something quick to say, if you’re ever debating programming languages on a political-style talk show (or at a party or even a job interview):

In both Rust and C++, there is a method for checked array indexing, and a method for unchecked array indexing. The languages actually agree on this issue. They only disagree about which version gets to be spelled with brackets.

The difference is simply in the default, which one gets that old fashioned arr[index] syntax. And even that can be changed. Even if the C++ default were superior – and, as I will argue later, it is not – this is surely a minor issue. After all, don’t we normally use our fancy for x in arr syntax in Rust? This issue is just so small as to be unlikely to be a deciding factor in what programming language is better, even if we’re in a special application domain where every nanosecond matters.

The Unsafe Keyword

So that’s a wrap folks. We can all go home, and none of us will ever see this extremely silly argument on the Internet or in person again. It’s just a misunderstanding, the person making it was simply misinformed, and all it will take is a link to this blog post – or the relevant method in the docs – to set them straight.

But wait! The C++ apologists are still talking! What are they saying? How have they not been completely flummoxed? They’re pointing at that method, chanting a word like a slogan at a protest march. I can’t quite make it out – what is it?

Oh. They’re chanting unsafe. And credit where credit is due: it’s very difficult to chant in a monospace font.

Well, that is easy to respond to! The nerve, that C++ programmers would call our unchecked array indexing method unsafe. For one, all unchecked array indexing methods are unsafe: that’s what unchecked means. If it were safe, it would be at least statically checked. For another, isn’t this the pot calling the kettle black? Isn’t C++ all about unsafety, so much that C++ programmers don’t even mark their unsafe code regions because it all is, or their unsafe functions because they all are?

“But isn’t that the whole point of Rust?” they cry. “If you have to use unsafe to write good Rust, then Rust isn’t a safe language after all! It’s a cute effort, but it’s failing at its purpose! Might as well use C++ like a Real Programmer!”

This, my friends, is a straw man. No, the point of Rust and specifically Rust’s memory safety features is not to create an entirely safe programming language that can’t be circumvented in any circumstance; you must be thinking of Sing#, the programming language for Microsoft’s defunct research OS.

Let me be abundantly clear: The point of memory safety, the unsafe keyword, and friends in Rust is not to completely enforce memory safety, to make it impossible for the programmer to do anything they want to with the computer, even if they can’t prove to the compiler that it’s OK. In fact, the point of memory safety isn’t to make it impossible to do anything at all – it’s to make it possible to reason about the program.

The premise of Rust is that the vast majority of code in a systems program doesn’t need to be unsafe, and so it might as well be safe. People used to believe that you needed garbage collection for safety, but Rust proved that you could use lifetimes to still get safety without that performance cost. Now that we’re there, why worry about null pointers? Why not tell the compiler which things can be null, and which things can’t, so the compiler can check for you whether you’re handling nulls correctly? I’ve programmed C++ professionally for years without such a feature. You’d better believe I would have totally annotated the crap out of the code so the compiler could’ve caught them ahead of time.
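
Here is a small sketch of what telling the compiler about "null" looks like in Rust. The lookup function and its data are hypothetical. What matters is that the `Option` in the return type forces every caller to handle the missing case before it can touch the value.

```rust
/// Hypothetical lookup. Returning `Option<&str>` says "this may find
/// nothing"; a plain `&str` return type could never be null.
fn find_symbol<'a>(symbols: &[(&'a str, u32)], id: u32) -> Option<&'a str> {
    symbols
        .iter()
        .find(|(_, sid)| *sid == id)
        .map(|(name, _)| *name)
}

/// The caller cannot use the name without saying what happens on `None`;
/// leaving out that arm is a compile error, not a crash in production.
fn describe(symbols: &[(&str, u32)], id: u32) -> String {
    match find_symbol(symbols, id) {
        Some(name) => format!("symbol {}", name),
        None => String::from("unknown symbol"),
    }
}
```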

Sometimes, C++ apologists cite valgrind. I’ve had codebases where I tried to use valgrind. Unfortunately, there was so much undefined behavior and memory leaks already caked into this project that new ones were simply impossible to see among all the noise. An army of junior engineers was at some point required to clean this up when finally the hierarchy decided that “valgrind” was something we might want to be able to use in the future.

And a lot of those undefined behaviors were ticking time bombs. Certainly, this codebase had its issues. A friend of mine took days to find a bug where a pointer had a value of 7. I don’t mean 7 elements into some array, not 7 of the relatively wide pointer type, not a convenient, testable-for NULL, value. No, none of that: The pointer’s value was exactly 0x7.

Update: My friend had a very similar incident to that described in this piece, but it was not the same incident. Some time after, I read that piece and shared it with this friend … and I must have conflated the numbers from the piece and from what happened to my friend. It was some null-page number, some “low integer,” however, even if not 0x7.

I’ve had memory corruption issues where I pored over every line of code that I wrote, over and over again, finding nothing. Ultimately, I learned that the issue was in framework code – code written by my boss’s boss. The code was untested, and written extremely poorly, and had rotted, so that it didn’t work at all. In Rust, I might have had some idea that my code – which in Rust would have all been able to be “safe” – couldn’t possibly be the source of the problem. Maybe my humble assumption that my code was to blame would be a little less tenable.

If I wanted a language that was always safe, at the time I knew Java or Python existed. Some companies even do finance in Java, for exactly that reason. But sometimes you still need that extra bit of performance. unsafe is sometimes necessary.

But given what gains safe Rust has made in predictable performance, it’s not as necessary as it used to be. The majority of the code I wrote then could’ve been written in safe Rust, and not lost a single clock cycle. The parts that needed to be unsafe could have been isolated, delegated to specific sections, wrapped in abstract data types, perhaps entrusted to a specific team.
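
As a sketch of that kind of isolation, here is a hypothetical table type (the name and design are mine, not from any real codebase). The unchecked accesses live in two small methods, the invariant that justifies them is set up once in the constructor, and callers only ever see a safe API.

```rust
/// A fixed-size table indexed by `hash & mask`. Illustrative only.
pub struct MaskedTable {
    slots: Vec<u64>,
    mask: usize, // invariant: mask == slots.len() - 1
}

impl MaskedTable {
    /// Rounds `min_len` up to a power of two (at least 1) and establishes
    /// the invariant `mask == slots.len() - 1`.
    pub fn new(min_len: usize) -> Self {
        let len = min_len.max(1).next_power_of_two();
        Self { slots: vec![0; len], mask: len - 1 }
    }

    pub fn len(&self) -> usize {
        self.slots.len()
    }

    pub fn get(&self, hash: usize) -> u64 {
        let i = hash & self.mask;
        // SAFETY: `hash & mask <= mask == slots.len() - 1`, so `i` is in
        // bounds. The fields are private, and no method changes `slots`
        // or `mask` after construction.
        unsafe { *self.slots.get_unchecked(i) }
    }

    pub fn set(&mut self, hash: usize, value: u64) {
        let i = hash & self.mask;
        // SAFETY: same argument as in `get`.
        unsafe { *self.slots.get_unchecked_mut(i) = value };
    }
}
```

If a memory bug ever shows up around this table, these two `unsafe` blocks and the constructor are the places to audit and test first; code that only calls `get` and `set` cannot index out of bounds through them.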

And even then, I’m sure we would have been debugging memory corruption issues. But we’d know where to look. We’d know where to throw the tests. And we’d have saved programmer-years of time, days if not months of my life.

Now, I’m proud of my C++ skills. There is some part of me that wishes that C++ was better than Rust, that all that time getting better at debugging memory corruption wasn’t dedicated to a skill that is becoming obsolescent through better technology. And to be honest, that’s part of why I dismissed Rust as a candidate for HFT programming languages.

But it’s possible to be proud of a skill that is also becoming obsolete. And I am trying to replace it with a new skill to be proud of – writing Rust as performant as idiomatic C++, or even more performant, while reaching for the unsafe keyword rarely and modularly. I think it’s truly possible, for where it’s relevant.

Now I must turn to a subset of C++ apologists, who write using “modern C++” which is “very safe now” and experience therefore no memory corruption issues. To them I say, you are not doing high performance programming. If you were, you’d have to do some wonky things with pointers to spell the bespoke high-performance constructs you’d need.

There is indeed a safe subset of C++ heavy with modern features. If you are disciplined and keep your programming in that realm, you can avoid memory corruption mostly. But first, this safe subset covers fewer high-performance features than Rust. I’ve read some of this code and its idioms: It’s full of shared_ptrs not to share ownership but simply to avoid types that might be invalidated. It ironically leans on reference counting more than idiomatic Rust. This is among other, similar problems.

Let me be clear: First off, instead of keeping in your brain which features are “modern” and which are “edgy,” why not have a distinction where it’s well-marked? Second off, if you are writing entirely in this safe subset of C++, you can get much better performance instead out of the safe subset of Rust. You have no right to complain about Rust’s safety trade-offs, as you’re using a worse set, where you get no safety promises from the compiler and none of Rust’s surprising safe performance.

Rust’s safe and “slow” subset is faster than C++’s while still being, obviously, safer. Rust’s unsafe subset is better factored and better distinguished. Comparing apples to apples, Rust is a better programming language for extracting performance out of LLVM, because you’ll be able to code more often without fear, and with very focussed fear when you do feel it.

A tool is even more useful if you can adjust it. The defenders of C++ talk about choosing trade-offs, but really, Rust offers both trade-offs. Mark your code as unsafe and convince yourself of its safety manually, or rely on programming language features. It’s up to you, on a function-by-function, even block-by-block, basis. In C++, if you have a problem, every line of code is suspect; you simply can’t opt in to safety, but in Rust, for where you don’t need the performance of unchecked indexing and other unsafe features, you can relax about the possibility of going bankrupt due to inadvertent memory reinterpretation – and how I wish my NDA permitted me to talk about consequences at my own previous jobs!

And for where you do need to use unsafe, you can make sure your debugging and overthinking efforts are well-directed, for the few places in a large project you need it.

Unchecked Indices

This has gotten a little far from the original question. Should array indices be checked? Well, let me be clear about two facts that are both true, but in tension with each other:

Unchecked array indexing is sometimes absolutely necessary
Unchecked array indexing is an edge-case feature, which you normally don’t want.

If unchecked array indexing was unavailable in Rust, that would be a bug. What is not a bug is making it inconvenient. C++ programmers probably should be using at instead of operator[] more often. But in C++, what would it gain? There’s so many unsafe features, what’s the cost of one more?

But in Rust, where so much code can be written that’s completely safe, defaulting to the safe version makes more sense. Lack of safety is a cost too, and Rust makes that cost explicit. Isn’t that the goal of C++, making costs explicit?

Let’s look at situations where you are indexing memory. First off, most of them I saw were in old C-style for-loops, where you loop over an index rather than using iterators directly with a collection. Both Rust and C++ have safe versions of for that loop over collections with iterators, and those use the same check for the loop as they do for bounds, so those are easy enough to address. Nevertheless, I think that a lot of the noise about checked vs. unchecked array accesses comes from people who use indexing for their for-loops instead of iterators, and therefore mistakenly think that array indexing in general is a far more common operation than it is.
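
To make the contrast concrete, here is a minimal sketch of the same loop written both ways. The function names are illustrative.

```rust
/// C-style loop: every `values[i]` is a bounds-checked index, unless
/// the optimizer manages to prove the check away.
fn sum_indexed(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Iterator loop: the loop's own termination condition is the only
/// bounds test, so there is no separate per-element check to worry about.
fn sum_iter(values: &[u64]) -> u64 {
    let mut total = 0;
    for &v in values {
        total += v;
    }
    total
}
```

Both return the same result. The second one never writes an index at all, so the checked-versus-unchecked question does not come up for it.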

For the remaining situations, most are implementing either gnarly business logic, or a subtle, fast algorithm.

If it’s gnarly business logic, in my experience, it’s usually at config time – along with a good third to half to even more of the code in a complicated production system.

What do I mean by config time? A running high-performance system, whether optimized for latency or throughput, has a bunch of data structures organized just so, a lot of threads set up just right to move data between them in the perfect rhythm, and a lot of the work is in arranging them. That work is generally not performance-sensitive, but often has to be in the same programming language as the performance-intensive stuff.

Config-time is, depending on how you look at it, less of a thing or the entire thing in a programming language like Python. Python basically exists to do config-time programming for performance-intensive code put in very comprehensive “libraries” written in C or C++. But in C++, where you have a constructor that runs only once or a few times at first, and other methods related to it, in the same programming language as the money-making do-it part, you have to really adjust programming style between them.

Config-time is obviously when you read the configuration files. It’s where you open the relevant files. It’s where you call socket and bind and listen on your listening port. It’s where you spin up your worker threads, and make computations on how many worker threads there are. It’s where you construct your objects and your object pools. It’s where you memory map your log file. It’s where you set your process priorities. It’s where you recursively call the constructors and init functions of every object in your overwrought OOP hierarchy.

There is no need to sacrifice safety for performance at config time – especially since undefined behavior might lie latent and destabilize the system once it’s actually up and running. If you do an unchecked array access at config time, you might put garbage data in an important field, maybe one that determines how much money you’re willing to risk that day or how many of a thing to buy. And for what? To save a few nanoseconds before your process has even “gone live”?
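Here is a small sketch of what a checked config-time lookup might look like in Rust. The names `risk_limit`, `limits`, and `desk` are hypothetical, invented for this example; the point is only that `get` turns an out-of-range index into an error you can report before going live:

```rust
/// Look up the risk limit for one desk while loading configuration.
/// All names here are hypothetical and exist only for this sketch.
fn risk_limit(limits: &[u64], desk: usize) -> Result<u64, String> {
    limits
        .get(desk) // checked access: None instead of garbage data
        .copied()
        .ok_or_else(|| {
            format!(
                "no risk limit configured for desk {} (only {} desks)",
                desk,
                limits.len()
            )
        })
}
```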

So, when do you truly need unchecked array accesses? If it’s a subtle fast algorithm, probably deep in an inner loop, you should probably be wrapping it in an abstraction anyway. The code that actually executes the algorithm should be separate from the business logic, so that programmers trying to maintain the business logic don’t accidentally break it. And that’s exactly where it makes the most sense to use unsafe – when implementing a special algorithm. Maybe the proof that the index is within bounds relies upon some number theory the compiler was never going to understand without its own proof engine: great! You should probably be explaining that in a comment in C++ anyway, and so the conventional comment that goes with the unsafe block in Rust is a perfect place to explain it.
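To illustrate, here is a sketch of that kind of abstraction: a hypothetical `Ring` type (not from any library) whose indices are masked by a power-of-two size. The bounds argument lives in one place, next to the `unsafe` blocks, and callers only ever see a safe API:

```rust
/// Fixed-size ring of the last N values. N must be a power of two.
/// `Ring` is a made-up type for this sketch.
pub struct Ring<const N: usize> {
    slots: [u64; N],
    next: usize,
}

impl<const N: usize> Ring<N> {
    pub fn new() -> Self {
        // The safety argument below depends on this check.
        assert!(N.is_power_of_two(), "N must be a power of two");
        Ring { slots: [0; N], next: 0 }
    }

    pub fn push(&mut self, value: u64) {
        let index = self.next & (N - 1);
        // SAFETY: `new` is the only constructor and guarantees that N is a
        // power of two, so N - 1 is an all-ones mask and `index` < N.
        unsafe {
            *self.slots.get_unchecked_mut(index) = value;
        }
        self.next = self.next.wrapping_add(1);
    }

    /// Read the slot that `position` maps to after masking.
    pub fn get(&self, position: usize) -> u64 {
        let index = position & (N - 1);
        // SAFETY: same masking argument as in `push`.
        unsafe { *self.slots.get_unchecked(index) }
    }
}
```

The business logic that uses `Ring` never writes `unsafe` itself, so it cannot break the invariant by accident.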

But maybe I’m wrong about all of this. Maybe your experience hasn’t matched mine. Maybe your particular application needs to make unchecked array accesses a lot, needs them to be unchecked, and needs them littered all over the codebase. I raise my eyebrows at you, suspect you need more iterators and perhaps other abstractions, and wonder what problem you’re trying to solve. But even if you’re absolutely right, I think it’s still a better idea to write Rust littered with unsafe every time you index an array, than to write C++.

Because, as I keep emphasizing, Rust is still a better unsafe programming language than C++. It would be better than C++ even if safety weren’t a feature.

Post-Script: Some Perspective for the New Rustacean

I understand where this straw man argument comes from. The word unsafe is scary, and advice, especially aimed at people coming from safe languages like Python and Javascript, is to avoid unsafe features while learning. And while I think adding unsafe to production code should only be done once you’ve exhausted safe possibilities – which requires full understanding of safe possibilities – this advice can feel overbearing for a transitioning C++ programmer, especially when it is immediately obvious that the safe features are very constrained and can’t literally do everything.

For that good-faith recovering C++ programmer, new to Rust: You’re right. The safe subset isn’t enough to do everything you want to do. And when it falls short, that doesn’t mean it failed. Its goal is to make unsafe code rare, not non-existent. But it might surprise you how rarely you truly need unsafe. And a good resource for you might be, as it was for me, the excellent Learn Rust the Dangerous Way by Cliff L. Biffle.

For what it’s worth, however, this criticism of Rust in general is often levelled either in bad faith, or from a misunderstanding of what the unsafe keyword is for. For all the philosophical discussion of what unsafe truly means – and how it interacts with the surrounding module and encapsulation/privacy boundaries – as well as principled conventions for using it, please see the Rustonomicon, the canonical book on unsafe Rust, the same way the book is canonical for introducing Rust.

Other criticisms of Rust from an HFT or low-latency point of view are more relevant. Most specifically, gcc and icc are much better compilers for those use cases – empirically – than is LLVM. Also, the large codebases existing in C++ are often tested and contain thousands upon thousands of programmer-years of optimizations and bugfixes, where even small compiler upgrades are scrutinized closely for performance regressions. Migrating to another programming language from that starting point would be prohibitively expensive.

None of which is to say that if Rust gradually replaced C++ altogether, eventually such ultra-optimizing compilers and ultra-optimized codebases wouldn’t start appearing in Rust. I hope to see that day within my lifetime.

In Defense of Async: Function Colors Are Rusty

2022-01-03T00:00:00+00:00

Finally in 2019, Rust stabilized the async feature, which supports asynchronous operations in a way that doesn’t require multiple operating system threads. This feature was so anticipated and hyped and in demand that there was a website whose sole purpose was to announce its stabilization.

async was controversial from its inception; it’s still controversial today; and in this post I am throwing my own 2 cents into this controversy, in defense of the feature. I am only going to try to counter one particular line of criticism here, and I don’t anticipate I’ll cover all the nuance of it – this is a multifaceted issue, and I have a day job. I am also going to assume for this post that you have some understanding of how async works, but if you don’t, or just want a refresher I heartily recommend the Tokio tutorial.

The Questionable Feature: Colored Functions

In any discussion of a programming language feature, the first thing to ask is what problem the feature is trying to solve. In the case of async, it’s trying to deal with asynchronous operations – operations that don’t require more work from the CPU to make progress, and where several might be in flight at any given time. For example, a single process might be writing some data to a file, reading data from another file, waiting for new incoming connections, and servicing an existing connection.

So how does Rust solve this? The easiest way to address this problem would be to have a thread for each operation, and to let the thread block at the asynchronous operation, essentially pretending that the operation is a long-running function like any other the CPU has to do, rather than something taking place elsewhere. But operating system threads are expensive. And rather than using green threads as some other programming languages do, Rust decided to create a syntactic sugar for futures, meaning that Rust’s async feature now suffers from the dreaded function coloring effect first explained by Bob Nystrom in a Javascript context in 2015.

In Bob Nystrom’s now-famous essay he complains that an analogous feature in Javascript is harmful, because asynchronous functions – which he refers to as “red” functions – can only be called from other red functions. Once a red function is needed, the function that calls it must also be red, and same with the function that calls that, the whole way up the call chain. And the syntax and semantics of calling a red function is more complicated than that of calling blue functions – especially in Javascript, where the next thing to do had to be enclosed in a lambda, resulting in callback hell (I do not endorse the suggestions in that post).

Colored Is Good, Actually, and Rusty

My position is close to that of this article, but with enough nuance that I wanted to write my own blog post to explain it in more detail. Fundamentally, I agree that Rust does indeed have colored functions, and that it’s not a bad thing. But I would go further. I say that function coloring has always existed in Rust, even before it manifested in the async world, that it is the Rustiest way to solve this problem, and furthermore, that Rust needs more function coloring than it has.

Rust, unlike the Javascript of the original colored functions article, is strongly typed, and influenced heavily by Haskell. This means that it has lots of type distinction on its values: “colored” values, if you will.

This type information includes basic ideas of type (string vs number vs widget), but also shades of distinction that a Javascript programmer won’t even be aware of. Let’s say you want to take a parameter to your function, a “widget.” In Javascript, you just take a parameter widget and do widget things with it, and hope that it works out. The name is just a comment: it’s up to the caller to know what exactly is expected, hopefully some sort of widget that works. In Rust, on the other hand, you have to annotate the parameter with a type, which not only ensures it’s actually a widget, but distinguishes between these potential requirements:

Exclusive reference to a widget: &mut Widget
Ownership of a widget: Widget
Reference to a widget that lives forever: &'static Widget
Optional widget (in Javascript this is very unclear): Option<Widget>

If Widget is a trait, you have even more options:

Owned run-time generic widget: Box<dyn Widget>
Non-owned reference to compile-time generic widget: &impl Widget

The list goes on. For each of these options, also, the caller often has to do something different. If the parameter is optional with Option, and the caller in fact has a widget, the caller still has to add Some to the parameter:

fn foo(widget: &Widget) { ... }
fn foo2(widget: Option<&Widget>) { ... }
fn foo3(widget: Widget) { ... }

let baz = Widget::new();
foo(&baz);
foo2(Some(&baz));
foo3(baz);

All of these, in my synaesthetic mind, are expressed by different colors and textures on the parameter. For all of these, Rust has made a value judgment that the programmer should be explicitly aware of these shades of distinction, if you will (pun intended). If a parameter is to be optional, the function is called differently than if it is mandatory. If a borrow happens, that requires a & from the caller, to make clear to the programmer what is going on, to make sure the writer of the caller and the writer of the callee are on the same page. Parameters in Rust are, in general, colored.

And this value coloring, like the async/sync function coloring, propagates. If a function requires a parameter to be 'static in lifetime, that requirement propagates to the caller of that function to the caller of that function to the originator of the value in question.

Similarly with return values – I disagree with “More Stina” about Result. I say Result-returning fallible functions are colored. In many programming languages, including Javascript, any and all functions can throw recoverable exceptions. In Rust, functions that might fail (in a recoverable fashion) must have a different return type than those that do not – they must return a Result<...>. Functions that return Result<T, E> are, as with async functions, harder to call than functions that just return T. If you don’t want to use the syntactic sugar ? to propagate the error, you have to grapple with Result as a literal return type, which means unpacking it and doing something else in the Err case. This is more straightforward than dealing with a raw impl Future, but fundamentally the same concept: either propagate the “color” with ? or async, or else deal with all the implications of Result or Future on the spot.
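A small sketch of the two choices, using made-up function names: `parse_pair` propagates the “color” with `?` and so must itself return a `Result`, while `port_or_default` deals with the `Result` on the spot and comes out as an ordinary function again:

```rust
use std::num::ParseIntError;

/// A fallible function: its return type says so.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// Propagating with `?` keeps this function Result-colored too.
fn parse_pair(a: &str, b: &str) -> Result<(u16, u16), ParseIntError> {
    Ok((parse_port(a)?, parse_port(b)?))
}

/// Handling the error here removes the color: callers just get a u16.
fn port_or_default(text: &str) -> u16 {
    match parse_port(text) {
        Ok(port) => port,
        Err(_) => 8080, // arbitrary fallback chosen for this sketch
    }
}
```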

And all of these distinctions mean something. Passing by shared reference, mutable reference, or value are different, and put different safety requirements on the calling code, safety requirements that allow Rust to make more safety guarantees than Javascript ever could. Passing by reference is literally different at the ABI level from by-value, so each can implement the exact contract as efficiently as possible, unlike Javascript which leans on an expensive garbage collector for cleaning up the difference between these notions. That is to say, where Javascript (and Python) use garbage collectors, Rust uses distinctions – color distinctions, one might say – between types to achieve the same result, benefitting in performance but requiring more exactness from the programmer.

And in Rust, a statically typed programming language, we believe this to be a good thing. Rust is not for every project – it’s a steeper learning curve than Python or Javascript, and not every project needs to be maintainable long-term – but it has a distinct, consistent philosophy, which says that different things should be treated differently.

Async Functions Are Just Different

A blocking or asynchronous function is not the same thing as a non-blocking function. A non-blocking function fundamentally does some CPU tasks, taking control of the processor, using it, and giving it back. An asynchronous function does the set-up necessary for work to happen elsewhere. That work doesn’t need control of the CPU, and can be dealt with through a handle – a future – rather than just waiting for completion. These are fundamentally different notions, and while it might (or might not) make sense in Go or Javascript to lump them together into one notion of “calling a function,” Rust doesn’t do lumping.

When you call a normal function – without async/await – you build up a stack. When you use async/await, you build up a complex nested state object. If you use async/await with an executor to spawn a new task, that complex state object ends up on the heap in a data structure next to other task objects.

Both “call stack” (for synchronous code) and “task state object” (for asynchronous code) are reasonable ways of managing memory. Honestly, the miracle is that Rust, through async and await, manages to make these two vastly different paradigms look as similar as they do. Having to annotate the difference is a small price to pay for high-performance reactive programming.
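To give a feel for the “task state object,” here is a hand-written sketch of roughly the kind of state machine an async function stands for. This is not the compiler's actual output. `YieldOnce`, `AddAfterYield`, and `poll_to_completion` are names invented for this example, and the busy-polling driver is a toy, not a real executor:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Leaf future: pending on the first poll, ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Roughly what `async fn add_after_yield(a, b) { yield_once().await; a + b }`
/// needs to remember: where it is, and which locals are still live.
enum AddAfterYield {
    Start { a: u32, b: u32 },
    Waiting { a: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for AddAfterYield {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine: every field is Unpin
        loop {
            // Take the current state out, leaving Done behind for now.
            match std::mem::replace(this, AddAfterYield::Done) {
                AddAfterYield::Start { a, b } => {
                    let inner = YieldOnce { yielded: false };
                    *this = AddAfterYield::Waiting { a, b, inner };
                }
                AddAfterYield::Waiting { a, b, mut inner } => {
                    match Pin::new(&mut inner).poll(cx) {
                        Poll::Pending => {
                            // Put the state back and hand control to the caller.
                            *this = AddAfterYield::Waiting { a, b, inner };
                            return Poll::Pending;
                        }
                        Poll::Ready(()) => return Poll::Ready(a + b),
                    }
                }
                AddAfterYield::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Toy driver: polls in a loop and reports how many polls it took.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = Pin::new(&mut fut).poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

Nothing here grows a call stack across the suspension point: between polls, everything the computation needs is stored in the enum.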

It’s not 100% perfect. Even with the must_use warnings, people forget to call await on their futures sometimes. And writing reactive, async code is harder – which makes sense, because the resulting code is a more difficult but more performant usage of memory. Writing code that passes the borrow checker is harder, but considered worth it because we can remove indirections and avoid garbage collection. async offers us the same deal for reactive programming.

Alternatives to Async

But let’s say we did want to remove the coloring here. Let’s say we did want to pretend that blocking functions were just like CPU-based ones, but just taking a long time. What would we have to do?

Well, we’d still have to wait for multiple things simultaneously. Our servers have many connections they have to service at once, and when a message comes in on socket B, it can’t be ignored just because the code happens to be on socket A. If asynchronous operations are implemented by blocking, we have to handle this with multithreading.

Kernel multithreading is expensive, but even Go-style “green threads” have to have a separate stack for each green thread. Stacks are gnarly, because it’s unclear how much space should be reserved for them ahead of time. They have to dynamically adjust to the run-time demand, and when the original allocation is used up, you get a pause as you try to allocate more. The advantage is, you have a simpler mental model with fewer distinctions. Basically, you trade performance for simplicity – like in garbage collection.

If you want to do this trade, Rust doesn’t stop you from implementing it yourself. OS threads and blocking system calls are perfectly reasonable solutions to many problems. But Rust isn’t going to encourage the trade by creating a new compromise point of “green threads.” You have to do async the whole way, and if you think of what async code actually de-sugars to, you wouldn’t complain about how hard async functions are, but be impressed it’s so darn easy to write them!
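For comparison, here is a sketch of the plain-threads version of “waiting for several things at once,” using only the standard library. The function name and the use of `sleep` as a stand-in for a blocking read are assumptions made for this example:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One OS thread per job; each thread simply blocks until its job is done.
/// Returns the finished jobs in sorted order so the result is deterministic.
fn run_blocking_jobs(delays_ms: &[u64]) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for &ms in delays_ms {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Stand-in for a blocking system call such as a socket read.
            thread::sleep(Duration::from_millis(ms));
            tx.send(ms).expect("receiver is still alive");
        }));
    }
    drop(tx); // the channel closes once every worker has finished

    let mut done: Vec<u64> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    done.sort();
    done
}
```

Each thread here owns a full stack for its whole lifetime, which is the cost the paragraph above is talking about.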

Rust is a systems programming language at heart. I understand and respect that, because of its type system and guarantees, it has found use outside of the old domains of C and C++, but those C and C++ systems programmers are Rust’s ideal “base,” in a political sense of the word. Rust should not sacrifice performance for ease of programmability.

Blocking vs Non-Blocking

Rust has two ways of doing off-CPU “IO” operations, blocking and non-blocking. Blocking takes over the thread, and non-blocking works through async. This mirrors a distinction in the system calls that most kernels provide. The operating system API has this distinction built into it, and it makes sense for Rust to propagate that to the user.

But fundamentally, one of these constructs is more honest than the other. When we call a blocking kernel system call, rather than the kernel taking over the CPU, running on it, and then returning the thread of execution to us, what actually happens internally is more of a mirage. The kernel deschedules the current process, and using an internal mechanism more like async than like blocking, schedules it again, recovering its previous state as if nothing had happened, when the IO is done.

This means that we can pretend the I/O operation was just an operation like any other, but it comes at a risk – the operation might not return anytime soon. It might in fact wait for a situation that’s not going to happen anytime soon.

If such a blocking function is called from a non-async Rust thread, we assume that the caller is using threads to juggle multiple I/O events – or else that they simply don’t have anything else going on. But it is very dangerous to call a blocking function from an async function. It can starve threads in a thread pool, and cause knock-on effects in other places. Maybe an async task is waiting for a message from a channel, and even though the message was sent, the task doesn’t resume because the thread it’s scheduled on is busy on this blocking function. The effects are unpredictable and non-local – similar to the dreaded “undefined behavior” – and debugging is similarly difficult – ask me how I know!

Functions that block but are not async are referred to in the “More Stina” blog post (also linked above) as “purple functions.” They are not true async “red functions” that you can call with async, but they are also not safe to simply call from an async function like a truly CPU-based “blue function” would be. Calling a blocking function from an async function is extremely unsafe, and there is simply no warning generated by rustc, normally so helpful about such things, to let you know how deep and undebuggable a mistake you’re making.

These purple functions ought to be a different color in Rust, just like they are in practice. It should be an error to call a blocking system call from an async function. I don’t know how this would work – I imagine a generalization of unsafe that includes things like blocks, perhaps as well as panics. That would fundamentally be an “effects system,” as is regularly proposed, but that’s not the only solution. But I do fundamentally think that something ought to be done about this deficit in Rust’s otherwise quite rigorous function-coloring system.

So, in conclusion, I say: yes, Rust async functions are colored. This is the same as saying they are strongly typed, and this is a good thing. And instead of trying to fix it, we should have more of it.

Postscript: Monads

As I mentioned before, calling an async function does something fundamentally different under the hood from calling a vanilla “blue” function. Similarly, calling a fallible function with Result does something different from calling a function with a normal return value. In both cases, the control flow is different – either it contains short-circuits to error code (Result); or regular hops back and forth between the task, other tasks, and the executor (async/Future).

In both these cases, it’s like the meaning of having one statement come after another has changed: ; itself has been overridden. And it would be nice if generic collection methods, like map and filter, supported this, so that you could fail, or await, in the closures.

This is possible in Haskell, because Haskell has a typeclass (equivalent to Rust traits) for abstracting over different styles of control flow. That is what Haskell’s infamous monads are for, and why Haskell persists in using this technology even though it’s so famously confusing for beginners.

Fundamentally, every Haskell monad is a function color. And often, they can be stacked together (via “monad transformers”) so that you can say something like “this function can do IO, fail, and be asynchronous.” You can also create functions that are polymorphic on “color”: the control flow is rewritten based on which monad you actually end up in.

Why is this useful? As “More Stina”’s post already mentions, there is a proposal to add try_ versions of iterator adapter methods: try_filter, etc., to enable them to work smoothly with Result-“colored” functions. A method like filter or map also would need an adapter to work well with async. If there were an abstract concept of monad, we could write code with filter-like methods that could short-circuit on failure and do the right thing with await:

vec![2, 3, 4]
    .iter()
    .filter_monad(|x| fallible_thing.contains(x)?)
    .filter_monad(|x| network_file_thing.contains(x).await?)
    .for_each_monad(|x| network_other_thing.send(x).await);
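For contrast, here is a sketch of what the Result half of this takes in today's Rust without such an abstraction. `filter_fallible` and `is_even_checked` are hypothetical helpers written for this example, not standard library methods; the standard library does offer `try_fold` and `try_for_each` on iterators, but a fallible filter has to be spelled out by hand:

```rust
/// Keep the items for which `pred` returns Ok(true); stop at the first Err.
/// A hand-written, Result-only stand-in for a fallible filter.
fn filter_fallible<T, E>(
    items: Vec<T>,
    mut pred: impl FnMut(&T) -> Result<bool, E>,
) -> Result<Vec<T>, E> {
    let mut kept = Vec::new();
    for item in items {
        if pred(&item)? {
            kept.push(item);
        }
    }
    Ok(kept)
}

/// Example fallible predicate: negative numbers are treated as an error.
fn is_even_checked(x: &i32) -> Result<bool, String> {
    if *x < 0 {
        Err(format!("negative input: {}", x))
    } else {
        Ok(*x % 2 == 0)
    }
}
```

An async version would need yet another separate helper, which is exactly the duplication a monad-like abstraction would remove.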

Perhaps Rust will someday gain this abstraction as well. I actually think that would be good for Rust. Monads are hard to deal with conceptually, and I’m not sure how to make them more user-accessible, but I think if anyone could do it, it’s the Rust people, who’ve already done such a good job so far at programming language design and maintenance.

Endianness, API Design, and Polymorphism in Rust

2021-11-21T00:00:00+00:00

I have been working on a serialization project recently that involves endianness (also known as byte order). It led me to explore the parts of the Rust standard library that deal with endianness, and I want to share my thoughts about how endianness should be represented in a programming language and its standard library. I think this is also something that Rust does better than C++, and it makes for a good case study to talk about API design and polymorphism in Rust.

To start with, let’s discuss endianness a little. I assume most of my audience has some familiarity with endianness; nevertheless, I’d like to explain it from first principles. That way, we can subsequently apply the insights from the explanation to API design. That, and I want practice explaining concepts, even if they are basic. I’ll try to keep it interesting, but also feel free to skim the next section.

Big End, Little End

I first encountered the concept of endianness when I was first learning to program using the DEBUG.EXE program on DOS. When a 16-bit value was displayed as a 16-bit value, it was just normal hexadecimal, but when it was displayed as two 8-bit bytes, something weird happened with the display.

Here’s a C++ snippet that demonstrates the effect:

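#include <cstdio>
#include <cstring>

template<typename T>
void display_bytes(const T &val) {
    // unsigned char avoids sign extension for bytes of 0x80 and above
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &val, sizeof(T));
    for (auto byte : bytes) {
        std::printf("%02x ", byte);
    }
    std::printf("\n");
}

int main() {
    int value = 0x12345678;
    std::printf("%x\n", value);
    display_bytes(value);
}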

When run on any little-endian processor (the vast majority of processors), we get:

12345678
78 56 34 12

The least significant byte is first, so if you print out the individual bytes in order, you have to read it backwards – though each individual byte is still forwards.

If you were to run the same code on a big-endian processor, you would get:

12345678
12 34 56 78
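If you don't have a big-endian machine handy, both outputs can be reproduced on any platform by asking for each byte order explicitly. A small sketch, with `hex_bytes` and `both_orders` as made-up helper names:

```rust
/// Format bytes as two-digit hex, separated by spaces.
fn hex_bytes(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// The same value laid out as (little endian, big endian) bytes.
fn both_orders(value: u32) -> (String, String) {
    (
        hex_bytes(&value.to_le_bytes()),
        hex_bytes(&value.to_be_bytes()),
    )
}
```

For 0x12345678 this gives "78 56 34 12" and "12 34 56 78" respectively, whatever the host processor is.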

At this point, little endianness as I understood it was a weird thing that Intel processors did for reasons I didn’t understand, that made me do a little extra work when reading hex dumps. At the time, this was fine: I thought that having to apply this extra arcane knowledge was cool for its own sake. But also, I thought of little endianness as the weird way to do things that required extra work, and big endianness as the more natural design. It wasn’t until much later that I got some nuance on that opinion.

See, the writing system we use for numbers is big endian. Instead of dividing a number into bytes (base 256), we divide it into digits. We consider the left of the page to come before the right of the page, and we write the most significant digit first. This is all taught explicitly in grade school:

1234 = 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0

There’s a certain, very human logic to this system: the more important information comes first, then the details. Mathematically, though, we see decreasing numbers for the powers of 10: first we specify a factor for 10 to the third, then to the second, then to the first, then to the zeroth. We could instead imagine a world where we wrote our decimal numbers little endian, where the same number would be written 4321, and still mean “one thousand two hundred thirty-four,” and where we’d count 0, 1, 2, and so on up to 9, then 01, 11, 21, and so on.
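To make the little-endian decimal convention concrete, here is a small sketch with two hypothetical helpers. In the second one, digit number i is multiplied by 10 to the i, so the index and the exponent line up:

```rust
/// Write a number with its least significant decimal digit first.
fn to_little_endian_decimal(n: u32) -> String {
    n.to_string().chars().rev().collect()
}

/// Read such a string back: digit i contributes digit * 10^i.
/// Returns None on a non-digit character or on overflow.
fn from_little_endian_decimal(text: &str) -> Option<u32> {
    let mut value: u32 = 0;
    for (i, ch) in text.chars().enumerate() {
        let digit = ch.to_digit(10)?;
        let place = 10u32.checked_pow(i as u32)?;
        value = value.checked_add(digit.checked_mul(place)?)?;
    }
    Some(value)
}
```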

This would have the advantage that the first digit, digit zero, would be multiplied by 10^0, digit one by 10^1. Not what humans would normally decide to do, but it has a certain logic. And if you think about languages like Hebrew or Arabic, which are written right to left, but which write numbers the same direction we do, the least significant digit is actually reached first in the normal direction of reading: when they see “100” in the midst of the text, the zeroes are “before” the “1”. (I am told that this is not how most people think of it; that they instead just think of numbers as going the other way from other text, but it just goes to demonstrate how based on convention all of this stuff is).

So all of this is to say that, the weird effect we had before with big endian looking “normal” and little endian looking “weird” has nothing to do with the intrinsic logic of big vs little endian, but rather with the fact that we’re mixing a little endian processor with a big endian writing notation. If we instead were to print the digits of each number in increasing significance – that is, if we were to use little endian as our printing convention – we’d get:

# Little Endian Machine
87654321
87 65 43 21
# Big Endian Machine
87654321
21 43 65 87

The mismatch between writing the whole word in hex and writing the individual bytes in order, in hex, is caused by a mismatch between the endianness of the system (normally little in practice) and the endianness of the writing system (normally big in practice). When the writing system isn’t a factor, little endian makes more mathematical sense, is easier to reason about in circuitry and code, and therefore has won out over big endian in every major processor architecture.

The only real exception is network byte order, which uses big endian. This is convenient for manually reading hex dumps of packets, but probably has more to do with the fact that the Internet developed when this question was much less settled. Due to the presence of network byte order, however, and the fact that the endianness of Intel and modern ARM is opposite of the endianness of most human writing systems, the concept remains with us.

When is endianness relevant?

In writing numbers, a digit has no endianness: 8 means the same thing as a single digit number. Similarly, a byte is indivisible in a processor. Bytes are made up of bits, but outside of special instructions, the ordering of those bits is not relevant. One of them is most significant, one of them is least, but unless we’re indexing them for a special instruction, or sending them over a wire one by one, there is no way to say which such bit comes “first.”

Indeed, if we want to display a byte as a series of bits, we as the programmer get to choose the endianness, and the program runs identically on either a big endian or a little endian platform. The little endian version is a little more intuitive, as “2 to the N” is an operation that’s easy to write on computers, and in little endian the N increases as the index increases:

fn byte_as_bits_le(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        let mask = 1 << i;
        if byte & mask == 0 {
            res[i] = 0;
        } else {
            res[i] = 1;
        }
    }
    res
}

Nevertheless, at no point is this function relying on the endianness of the hardware, and it does the same thing on either type of hardware.
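As a sketch of the other choice, here is a compact little endian version next to a big endian one. The only difference is which shift amount goes with which index, and neither depends on the hardware:

```rust
/// Bit i of the byte goes to index i (least significant bit first).
fn byte_as_bits_le(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        res[i] = (byte >> i) & 1;
    }
    res
}

/// Bit 7 of the byte goes to index 0 (most significant bit first).
fn byte_as_bits_be(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        res[i] = (byte >> (7 - i)) & 1;
    }
    res
}
```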

Why do I bring this up? Well, I don’t think it makes any sense to speak of the endianness of a (multi-byte) word per se. The endianness of the word only comes into play when it is stored as – and accessible as – a series of bytes.

So from that point of view, what are the operations where endianness is relevant? Given a word, what series of bytes comprises it? And then, given a series of bytes, what word is it?

In Rust terms, these are to_be_bytes (for big endian)/to_le_bytes (for little) in the one direction, and from_be_bytes/from_le_bytes in the other. These methods are all bundled together in the Rust documentation for – in this case – the primitive u32 type, along with ne which gives whatever the native endianness of the processor is.
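A short sketch of both directions in use, with hypothetical helper names `put_u16_be` and `take_u16_be`: one turns a word into bytes in a chosen order, the other turns the first two bytes of a buffer back into a word:

```rust
use std::convert::TryInto;

/// Word to bytes: append `value` in big endian (network) order.
fn put_u16_be(out: &mut Vec<u8>, value: u16) {
    out.extend_from_slice(&value.to_be_bytes());
}

/// Bytes to word: read a big endian u16 from the front of `input`,
/// returning it with the remaining bytes, or None if input is too short.
fn take_u16_be(input: &[u8]) -> Option<(u16, &[u8])> {
    if input.len() < 2 {
        return None;
    }
    let (head, rest) = input.split_at(2);
    let bytes: [u8; 2] = head.try_into().ok()?;
    Some((u16::from_be_bytes(bytes), rest))
}
```

At no point does a “big endian u16” exist as a value: endianness shows up only at the boundary between the word and its bytes.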

These are the APIs I’m going to be discussing. But before discussing how they might be improved, I’m going to point out an API that I think doesn’t make as much sense: to_be. This method takes in a word, a u32, and outputs a u32, and yet claims to change the endianness of that word, which as I mentioned, does not have endianness per se, only in that it’s represented by bytes.

I know what they mean by it. On a little endian platform, it will replace 0x12345678 with 0x78563412. But what does that actually mean? In its form as a u32, as I have argued above, a number has no endianness. So what is this number 0x78563412? It is the number that, if stored in bytes in the native endianness, will store the original number in big endian.

That’s a mouthful, I know, because it’s actually a complicated concept. That is to say, it’s a hack. We want to write a number – say, 2000 – in big endian, but we don’t want to think of it as bytes, yet. We want to be able to load the whole number into a register, and when we write it, we want it to be 2000 in big endian. So we byte swap the number, and instead of storing 2000, we store 3490119680, so that if we write it using the processor’s normal mechanism for writing, it comes out to 2000 in big endian.

Basically, on a little endian platform, to_be does the equivalent of u32::from_le_bytes(input.to_be_bytes()), and using it looks like this:

let input: u32 = 2000;
// These two invocations do the same thing
let be = input.to_be();
let be2: u32 = u32::from_le_bytes(input.to_be_bytes());
println!("{} {}", be, be2); // 3490119680 for both

// The result can be written using native (little endian) byte
// order, and it will give 2000 in big endian byte order.
assert_eq!(be.to_le_bytes(), input.to_be_bytes());

This is arguably a useful hack – though I’m not fully convinced – but it is definitely a hack. I do not think the description is sufficiently rigorous. The output of to_be is not a number “in big endian,” it is a different number that resembles the big endian representation of the original number. The description is a simplification, and I think a conceptually incoherent one – which is understandable because the concept at play here is so hackish.
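Two platform-independent ways to spell out what to_be computes, as a sketch (the function names are mine): reinterpret the big endian bytes in native order, or byte swap only when the host is little endian:

```rust
/// The number whose native-order bytes are the big endian bytes of `x`.
fn to_be_by_hand(x: u32) -> u32 {
    u32::from_ne_bytes(x.to_be_bytes())
}

/// Same result, phrased as a conditional byte swap.
fn to_be_by_swap(x: u32) -> u32 {
    if cfg!(target_endian = "little") {
        x.swap_bytes()
    } else {
        x // already big endian in memory, nothing to do
    }
}
```

Both agree with `to_be` on any host, which underlines the point: the result is just another number, chosen so that a native write happens to produce big endian bytes.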

It appears that to_be was in Rust 1.0, and to_be_bytes was introduced later. This to me is a good sign, as to_be_bytes, I think, makes much more sense as an interface. And as to why we started out with the to_be type of interface in Rust, that makes sense as well, because in C the traditional (POSIX but not ANSI C) functions for these conversions have similar semantics, such as htonl (host to network long), where we have this conceit of storing a “big endian” or “network byte order” value in a uint32_t (C for u32). This always struck me as the wrong abstraction, but it is justified – or at least more understandable – for C as we simply can’t pass around things like char[4] (C for [u8; 4]) by value in C.

There are other technical and historical reasons why htonl and to_be and friends exist, even if conceptually messy, but in any case, since I’m talking about API design, and to_be_bytes and friends are a better match for the concepts at hand, I am now going to pretend to_be is deprecated (it is not), and move on to discussing the design of to_be_bytes and to_le_bytes.

Policies

So the first thing I notice is that there are six methods that deal with fundamentally one topic:

from_be_bytes
from_le_bytes
from_ne_bytes
to_be_bytes
to_le_bytes
to_ne_bytes

But really they vary in two ways, namely:

which operation is performed (from_X_bytes vs to_X_bytes)
which endianness is required (le, be, and ne)

For us humans, this is clear from the names, but to the compiler, these names do not form a pattern that it is capable of recognizing. There are simply 6 separate functions named with 6 separate combinations of characters.

Now, having separate functions for separate operations makes sense; that’s what functions are for. But for the same operation but with different endianness, it might make more sense to indicate that it is one operation with several possible endiannesses by making the endianness into a parameter.

The obvious way to do this would be via run-time parameter. A fairly literal translation of this API would be something like:

#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
    Native,
}

impl u32 {
    fn to_endian_bytes(self, endianness: Endian) -> [u8; 4] {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }

    fn from_endian_bytes(bytes: [u8; 4], endianness: Endian) -> Self {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }
}

This would also allow us to implement the concept of “native” byte order a little differently, and create more names for byte orders:

enum Endian {
    Little,
    Big,
}
static NATIVE_ENDIAN: Endian = Endian::Little;
static NETWORK_ENDIAN: Endian = Endian::Big;

So, besides simplifying away the need for a separate implementation for the ne functions, and making the code more in sync with what’s happening, what other positive things have we accomplished? Well, given that we now have a parameter, we can now make more complicated code parametric on it. Imagine we have an entire structure to write out, and we want to write the entire structure as big-endian or little-endian, perhaps because the protocol in question changed endianness at some version. Or perhaps we just want to make clear to the reader that one endianness is used for the entire structure. We can now do something like this:

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize(&self, endianness: Endian) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
        res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
        res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize(Endian::Big)
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize(Endian::Little)
    }
}

The alternative would be to write two separate serializers, and duplicate all the logic of how to arrange the layout. Duplication is bad, because bug fixes don’t necessarily get to all the duplicate copies. So, to save on duplication, we’d have to basically wrap to_be_bytes and to_le_bytes in a version of this; it would be more convenient if the standard library had done this for us.
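As a minimal sketch of what such a wrapper could look like today, here is a runnable version built on the existing standard library methods. The Endian enum and the free function to_endian_bytes are hypothetical names for illustration, not part of the standard library, and the free function stands in for the method on u32 imagined above, since we cannot add inherent methods to primitive types ourselves.

```rust
/// Hypothetical run-time endianness selector (not a standard library type).
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical wrapper: picks the matching standard library method at run-time.
fn to_endian_bytes(value: u32, endianness: Endian) -> [u8; 4] {
    match endianness {
        Endian::Little => value.to_le_bytes(),
        Endian::Big => value.to_be_bytes(),
    }
}

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    /// The layout logic lives in exactly one place; only the byte order varies.
    fn serialize(&self, endianness: Endian) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&to_endian_bytes(self.a, endianness));
        res[4..8].copy_from_slice(&to_endian_bytes(self.b, endianness));
        res[8..12].copy_from_slice(&to_endian_bytes(self.c, endianness));
        res
    }
}
```

Calling serialize(Endian::Big) and serialize(Endian::Little) on the same Structure produces the same field layout, with only the bytes inside each four-byte field reversed.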

What is the downside of this? Well, the implementation didn’t really get any simpler. Actually, in the normal case, where you don’t change your mind about endianness, the implementation got more complicated. We now have a match expression in our two simplified functions, which theoretically indicates a run-time decision. We could trust the optimizer to fold the decision in through inlining and constant-propagation, but trusting the optimizer is suspicious and unnecessary.

Nothing we’ve done so far requires this decision to be made at run-time, and so we can instead make the decision at compile-time. Where we had a run-time parameter, we can now have a compile-time parameter.

Now, although Rust has rudimentary support for other kinds of compile-time parameters, the archetypical compile-time parameter is a type, bound by a trait. Our enum from before would then have to be lifted into the type space, as a trait and a few types:

trait Endianness { }

struct BigEndian;
impl Endianness for BigEndian { }

struct LittleEndian;
impl Endianness for LittleEndian { }

Now, we need a compile-time equivalent for match. This is a little harder, as at the time of this writing stable Rust does not have the most direct equivalent of match for implementors of traits, that is, “specialization.” But Rust does allow something similar: the code for each branch of the match must go in each type’s implementation of that trait, and the fact of the match must be provided in the trait itself.

This will also help us simplify the implementation. This to_be_bytes/to_le_bytes API is not just implemented for u32, but for all primitive numeric types. Currently, these mostly-similar implementations are stamped out by a macro, along with other methods for primitive types. But we might imagine that there are two things going on in the implementation:

write out the type into an array of bytes
either swap the bytes, or not, based on whether we’re using the hardware endianness

We could then make the trait come into play – with the decision made at compile time – for the swapping part.

trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // actually swap here
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // no need to do anything here
    }
}

We have now moved some of the implementation into a trait, where the specifics of the implementation are determined by which type implements that trait. This is an example of the policy pattern, where a portion of the code is abstracted out into a policy, and the policy and the main body of the function are sewn together – in this case, at compile-time – into many variations of a function that execute similarly to what an implementor might have written by hand.

Note that there is no possibility of doing run-time endianness determination in this version. This trait method does not take a self parameter, and has to be invoked as T::possibly_swap. This is possible in Rust because we are doing compile-time polymorphism, not run-time, so there is no need to make this trait object-safe.
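To make the policy concrete, here is a runnable sketch under one simplifying assumption: the bytes handed to possibly_swap are always in little-endian order (obtained via to_le_bytes), so the example behaves the same on any host. The generic helper to_endian_bytes is a hypothetical free function standing in for the imagined method on u32.

```rust
/// Policy trait: each implementor supplies its own "branch of the match".
trait Endianness {
    /// Receives the value's bytes in little-endian order (an assumption of
    /// this sketch) and rearranges them in place into the policy's order.
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Little endian to big endian is a full reversal.
        bytes.reverse();
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // Already in the right order; nothing to do.
    }
}

/// Hypothetical generic helper: the policy is chosen at compile-time,
/// and invoked without any value of the policy type existing.
fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    let mut bytes = value.to_le_bytes();
    E::possibly_swap(&mut bytes);
    bytes
}
```

A call like to_endian_bytes::<BigEndian>(2000) is monomorphized into its own function, so no match on an endianness value is ever evaluated at run-time.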

Our previous example serializer, with the two versions, now looks something like this:

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize<T: Endianness>(&self) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
        res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
        res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize::<BigEndian>()
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize::<LittleEndian>()
    }
}

The policy pattern is a fairly common pattern in generic programming just like in object-oriented programming, but when generic programming is implemented through monomorphization, as it is in Rust, it can be just as efficient as hand-implementing the combinations of policy and code, while allowing for more policies.

For example, if there were a platform where 4-byte chunks were split into 2-byte chunks little endian, but 2-byte chunks were split into 1-byte chunks big endian, we could write a new policy for this platform and all the existing code would support it.
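Here is a sketch of what such an additional policy might look like. MixedEndian is a made-up name for the layout just described, and, as an assumption of this sketch, the policy receives the bytes in little-endian order. The generic helper is written once and needs no changes to support the new policy.

```rust
trait Endianness {
    /// Receives bytes in little-endian order (an assumption of this sketch)
    /// and rearranges them in place.
    fn possibly_swap(bytes: &mut [u8]);
}

/// Hypothetical policy: 2-byte chunks are ordered little endian, but the
/// bytes inside each 2-byte chunk are ordered big endian.
struct MixedEndian;
impl Endianness for MixedEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Starting from little endian, the chunk order is already right;
        // only the two bytes within each chunk need to trade places.
        for pair in bytes.chunks_exact_mut(2) {
            pair.swap(0, 1);
        }
    }
}

/// Generic code written before MixedEndian existed; it is unchanged.
fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    let mut bytes = value.to_le_bytes();
    E::possibly_swap(&mut bytes);
    bytes
}
```

With this policy, 0x0A0B0C0D is written as the bytes 0C 0D 0A 0B: the low half comes first, but each half is written most significant byte first.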

A much more complicated example of the policy pattern is serde, where the generated serializers and deserializers for each structure are all polymorphic on what serialization format should be used. If a new serialization format comes out with serde support, all existing Serialize instances can then be used with the new format without modification.

Now, in practice, there are often processor instructions that do byte swaps. The hardware uses an interface analogous to the hackish, conceptually messy to_be(), which at a hardware level makes sense because elegance of abstraction is not as important a goal as performance. Such an instruction converts 0x12345678 into 0x78563412, and so on. So, this implementation is not actually what the policy would look like in a production context. Nevertheless, the endianness argument could definitely be passed in by a trait-constrained type parameter; the implementation would just be more complicated.
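Rust exposes this whole-integer swap as the swap_bytes method on the integer types. As a rough sketch of how a swap-based implementation might be structured (the function name to_be_bytes_via_swap is made up for illustration, and whether a single byte-swap instruction is actually emitted is up to the compiler and target):

```rust
/// Hypothetical helper: produce big-endian bytes by swapping the integer
/// as a whole (if the host is little endian) and then writing it out in
/// native order.
fn to_be_bytes_via_swap(value: u32) -> [u8; 4] {
    let swapped = if cfg!(target_endian = "little") {
        value.swap_bytes()
    } else {
        value
    };
    swapped.to_ne_bytes()
}
```

The cfg! check is resolved at compile-time, so each target gets only the branch it needs.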

Traits

I mentioned before that u32 is not the only type that implements this set of methods, this convention, this informal protocol of to_be_bytes, to_le_bytes, etc. This means that if we were writing in C++, we would have enough from this informal protocol to write a function that did something like “write this value in big endian twice, and little endian twice, to different locations” that was agnostic to the type provided, as long as it implemented this informal interface. It would look something like this:

template <typename T>
void write_four_times(T val) {
    write_to_location_1(val.to_be_bytes());
    write_to_location_2(val.to_be_bytes());
    write_to_location_3(val.to_le_bytes());
    write_to_location_4(val.to_le_bytes());
}

This would allow you to call write_four_times on any type for which that code made sense, as C++ templates are literally templates, and the T is filled in before type-checking. The protocol here is implicit in the structure of the function – it is compile-time duck typing.

Rust generic functions are type-checked before monomorphization, so we can’t do this in Rust. Instead of defining to_le_bytes() and friends separately on each type, this function would require them to be in a trait, maybe EndianBytes:

fn write_four_times<T: EndianBytes>(val: T) {
    write_to_location_1(&val.to_be_bytes());
    write_to_location_2(&val.to_be_bytes());
    write_to_location_3(&val.to_le_bytes());
    write_to_location_4(&val.to_le_bytes());
}

EndianBytes would have to define at least those methods:

trait EndianBytes {
    fn to_be_bytes(self) -> [u8; ???];
    fn to_le_bytes(self) -> [u8; ???];
}

Unfortunately, as the ??? shows, the different output arrays have different lengths – a u16 would be 2 bytes and a u64 8 bytes – and so the Rust trait system at the time of this writing is (to my knowledge) not powerful enough to represent this trait as is. Instead, it would have to return a slice, which introduces an additional run-time value (the length) into the mix that we’d rather avoid in this exercise on compile-time generic programming.
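To illustrate the cost of that fallback, here is a sketch where the trait returns an owned Vec<u8> as a stand-in for a slice. The method names be_bytes and le_bytes, and the out parameter that replaces the four write_to_location functions, are inventions for this example. The length of the output is now only known at run-time.

```rust
/// Hypothetical trait: the output length is a run-time property of the
/// returned Vec, rather than part of the return type.
trait EndianBytes {
    fn be_bytes(&self) -> Vec<u8>;
    fn le_bytes(&self) -> Vec<u8>;
}

impl EndianBytes for u16 {
    fn be_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
    fn le_bytes(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
}

impl EndianBytes for u64 {
    fn be_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
    fn le_bytes(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
}

/// Type-checked once against the trait, before monomorphization.
fn write_four_times<T: EndianBytes>(val: &T, out: &mut Vec<u8>) {
    out.extend_from_slice(&val.be_bytes());
    out.extend_from_slice(&val.be_bytes());
    out.extend_from_slice(&val.le_bytes());
    out.extend_from_slice(&val.le_bytes());
}
```

This works for both u16 and u64, but each call now carries a length (and, in this sketch, a heap allocation) that the fixed-size array versions did not need.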

Run-Time Endianness

What if we want to make decisions about endianness at run-time, say, because we are implementing DBus? This is, as Linus Torvalds pointed out in one of his famously angry emails, a stupid idea for a protocol, but we don’t always get to choose what protocol we implement. Even though choosing one endianness and sticking to it would have avoided the run-time cost of making a decision (which as Torvalds points out is more than the cost of either decision), the developers of DBus did not do that. UTF-16 also didn’t – it also does run-time endianness adjustment with a sentinel character at the top of the text block to indicate the endianness.

The most obvious solution is to use the run-time parameterized version we discussed towards the beginning of this post, and have an enum Endianness parameter. This would be parsed in each message (or connection, or whatever duration of time endianness is configured) and then passed through to all the serializing and deserializing code, which would look something like our original serialization example:

fn serialize(&self, endianness: Endian) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
    res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
    res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
    res
}

pub fn serialize_old_version(&self) -> [u8; 12] {
    self.serialize(Endian::Big)
}

pub fn serialize_new_version(&self) -> [u8; 12] {
    self.serialize(Endian::Little)
}

We can do better than that, though. This has one copy of the serialization code in the source, and one copy in the binary. What we could do instead, is expand the more sophisticated compile-time version of the serialization code, and move the match into a wrapper serialize method:

fn serialize_impl<T: EndiannessTrait>(&self) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
    res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
    res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
    res
}

pub fn serialize(&self, endianness: EndiannessEnum) -> [u8; 12] {
    match endianness {
        EndiannessEnum::Big => self.serialize_impl::<BigEndian>(),
        EndiannessEnum::Little => self.serialize_impl::<LittleEndian>(),
    }
}

This generates two serializers from one serializer function (thus mitigating the biggest problem with code duplication – that of maintainability), and makes the run-time decision further up in the call tree. This ability – to adjust between finer-grained run-time decisions and duplication of run-time code – is one of the greatest powers of C++ and of Rust. We can effectively – in the DBus case – create two entire DBus deserializers – one for little-endian, one for big endian – and then decide between the two deserializers at run-time on a per-message basis, which, because fewer run-time decisions are being made, will be much more efficient than making the run-time deserialization decision at every deserialization site.
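Putting the pieces together, a self-contained sketch of this arrangement might look like the following. As before, the free function to_endian_bytes is a hypothetical stand-in for a method on u32, and the policies assume they are handed little-endian bytes. The only run-time decision is the single match in serialize.

```rust
trait Endianness {
    /// Receives bytes in little-endian order (an assumption of this sketch).
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        bytes.reverse();
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {}
}

/// Run-time counterpart of the compile-time policies.
#[derive(Clone, Copy)]
enum EndiannessEnum {
    Big,
    Little,
}

fn to_endian_bytes<E: Endianness>(value: u32) -> [u8; 4] {
    let mut bytes = value.to_le_bytes();
    E::possibly_swap(&mut bytes);
    bytes
}

struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    /// One copy in the source; one monomorphized copy per policy in the binary.
    fn serialize_impl<E: Endianness>(&self) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&to_endian_bytes::<E>(self.a));
        res[4..8].copy_from_slice(&to_endian_bytes::<E>(self.b));
        res[8..12].copy_from_slice(&to_endian_bytes::<E>(self.c));
        res
    }

    /// The run-time decision is made once, at the top of the call tree.
    pub fn serialize(&self, endianness: EndiannessEnum) -> [u8; 12] {
        match endianness {
            EndiannessEnum::Big => self.serialize_impl::<BigEndian>(),
            EndiannessEnum::Little => self.serialize_impl::<LittleEndian>(),
        }
    }
}
```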

Of course, for serialization we can simply write one serializer and always generate little-endian DBus messages.

C++ Move Semantics Considered Harmful (Rust is better)

2021-11-03T00:00:00+00:00

This post is part of my series comparing C++ to Rust, which I introduced with a discussion of C++ and Rust syntax. In this post, I discuss move semantics. This post is framed around the way moves are implemented in C++, and the fundamental problem with that implementation. With that context, I shall then explain how Rust implements the same feature. I know that move semantics in Rust are often confusing to new Rustaceans – though not as confusing as move semantics in C++ – and I think an exploration of how move semantics work in C++ can be helpful in understanding why Rust is designed the way it is, and why Rust is a better alternative to C++.

I am by far not the first person to discuss this topic, but I intend:

to discuss it thoroughly enough to contribute to the conversation
to nevertheless discuss it in such a way that those familiar with systems programming, but unfamiliar with either C++ or move semantics, can understand it, starting from first principles

Modern C++

First, some background.

In 2011, C++ finally fixed a set of long-standing deficits in the programming language with the shiny new C++11 standard, bringing it into the modern era. Programmers enthusiastically pushed their companies to allow them to migrate their codebases, champing at the bit to be able to use these new features. Writers to this day talk about “modern C++,” with the cut-off being 2011. Programmers who only used C++ pre-C++11 are told that it is a new programming language, the best version of its old self, worth a complete fresh try.

There were a lot of new features to be excited about. C++ standard threads were added then – and thread standardization was indeed good, though anyone who wanted to use threads before likely had their choice of good libraries for their platform. Closures were also very exciting, especially for people like me who came from functional programming, but to be honest, closures were just syntactic sugar for existing patterns of boilerplate that could be readily used to write function objects.

Indeed, the real excitement at the time, certainly the one my colleagues and I were most excited about, was move semantics. To explain why this feature was so important, I’ll need to talk a little about the C++ object model, and the problem that move semantics exist to solve.

Value Semantics

Let’s start by talking about a primitive type in C++: int. Objects – in C++ standard parlance, int values are indeed considered objects – of type int only take up a few bytes of storage, and so copying them has always been very cheap. When you assign an int from one variable to another, it is copied. When you pass it to a function, it is copied:

void print_i(int arg) {
    arg += 3;
    std::cout << arg << std::endl;
}

int foo = 3;
int bar = foo; // copy
foo += 1; // foo gets 4
std::cout << bar << std::endl; // bar is still 3
print_i(foo); // prints 4+3 ==> 7
std::cout << foo << std::endl; // foo is still 4

As you can see, each variable of type int acts independently of the others when mutated, which is how primitive types like int work in many programming languages.

In the C++ version of object-oriented programming, it was decided that values of custom, user-defined types would have the same semantics, that they would work the same way as the primitive types. So for C++ strings:

std::string foo = "foo";
std::string bar = foo; // copy (!)
foo += "__";
bar += "!!";
std::cout << foo << std::endl; // foo is "foo__"
std::cout << bar << std::endl; // bar is "foo!!"

This means that whenever we assign a string to a new variable, or pass it to a function, a copy is made. This is important, because the std::string object proper is just a handle, a small structure that manages a larger memory allocation on the heap, where the actual string data is stored. Each new std::string that is made via copy requires a new heap allocation, a relatively expensive operation.

This would cause a problem when we want to pass a std::string to a function, just like an int, but don’t want to actually make a copy of it. But C++ has a feature that helps with that: const references. Details of the C++ reference system are a topic for another post, but const references allow a function to operate on the std::string without the need for a copy, but still promising not to change the original value.

The feature is available for both int and std::string; the principle that they’re treated the same is preserved. But for the sake of performance, ints are passed by value, and std::strings are passed by const reference in the same situation. In practice, this dilutes the benefit of treating them the same, as the function signatures differ if we don’t want to trigger spurious expensive deep copies:

void foo(int bar);
void foo(const std::string &bar);

If you instead declare the function foo like you would with an int, you get a poorly performing deep copy. The default is something you probably don’t want:

void foo(std::string bar);
void foo2(const std::string &bar);
std::string bar("Hi"); // Make one heap allocation
foo(bar); // Make another heap allocation
foo2(bar); // No copy is made

This is all part of “pre-modern” C++, but already we’re seeing negative consequences of the decision to treat int and std::string as identical when they are not, a decision that will get more gnarly when applied to moves. This is why Rust has the Copy trait to mark types like i32 (the Rust equivalent of int) as being copyable, so that they can be passed around freely, while requiring an explicit call to clone() for types like String so we know we’re paying the cost of a deep copy, or else an explicit indication that we’re passing by reference:

fn foo(bar: String) {
    // Implementation
}

fn foo2(bar: &str) {
    // Implementation
}

let bar = "hi".to_string();
foo(bar.clone());
foo2(&bar);

The third option in Rust is to move, but we’ll discuss that after we discuss moves in C++.

Copy-Deletes and Moves

C++ value semantics break down even more when we do need the function to hold onto the value. References are only valid as long as the original value is valid, and sometimes a function needs it to stay alive longer. Taking by reference is not an option when the object (whether int or std::string) is being added to a vector that will outlive the original object:

std::vector<int> vi;
std::vector<std::string> vs;
{
    int foo = 3;
    foo += 4;
    vi.push_back(foo);
} // foo goes out of scope, vi lives on
{
    std::string bar = "Hi!";
    bar += " Joe!";
    vs.push_back(bar);
} // bar goes out of scope, vs lives on

So, to add this string to the vector, we must first make an allocation corresponding to the object contained in the variable bar, and then must make a new allocation for the object that lives in vs, and then copy all the data.

Then, when bar goes out of scope, its destructor is called, as is done automatically whenever an object with a destructor goes out of scope. This allows std::string to free its heap allocation.

Which means we copied an allocation into a new heap allocation, just to free the original allocation. Copying an allocation and freeing the old one is equivalent to just re-using the old allocation, just slower. Wouldn’t it make more sense to make the string in the vector just refer to the same heap allocation that bar formerly did?

Such an operation is referred to as a “move,” and the original C++ – pre C++11 – didn’t support them. This was possibly because they didn’t make sense for ints, and so they were not added for objects that were trying to act like ints – but on the other hand, destructors were supported and ints don’t need to be destructed.

In any case, moves were not supported. And so, objects that managed resources – in this case, a heap allocation, but other resources could apply as well – could not be put onto vectors or stored in collections directly without a copy and delete of whatever resource was being managed.

Now, there were ways to handle this in pre-C++11 days. You could add an indirection and heap-allocate the std::string object itself, which is only a small object with a pointer to another allocation. That would at least let you pass around a std::string *, a raw pointer that does not automatically manage the heap allocation behind a façade of value semantics, and so does not trigger all these copies. Or you could manually manage a C-style string with char *.

But the most ergonomic, clear std::vector<std::string> could not be used without performance degradation. Worse, if the vector ever needed to be resized, and had to itself switch to a different allocation, it would have to copy all those std::string objects internally and delete the originals, N useless reallocations.

As a demonstration of this, I wrote a sample program with a vastly simplified version of std::string, that tracks how many allocations it makes. It allows C++11-style moves to be enabled or disabled, and then it takes all the command line arguments, creates string objects out of them, and puts them in a vector. For 8 command line arguments, the version with move made, as you might expect, 8 allocations, whereas the version without the move, that just put these strings into a vector, made 23. Each time a string was added to a vector, a spurious allocation was made, and then N spurious allocations had to be made each time the vector doubled.
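The arithmetic behind those two numbers can be reproduced with a small model. This is not the sample program itself, just a sketch that counts string allocations under the assumptions that the vector starts empty and doubles its capacity each time it fills up (growth strategies vary between standard library implementations), and that the vector’s own buffer is not counted.

```rust
/// Counts string heap allocations when every push and every vector
/// regrowth has to copy, as in pre-C++11 code.
fn allocations_without_moves(n: usize) -> usize {
    let mut allocations = 0;
    let mut len = 0;
    let mut capacity = 0;
    for _ in 0..n {
        // Constructing the local string makes one allocation.
        allocations += 1;
        if len == capacity {
            // The vector regrows: every existing element is copied into
            // the new storage, one fresh allocation per element.
            allocations += len;
            capacity = if capacity == 0 { 1 } else { capacity * 2 };
        }
        // Copying the local string into the vector makes another.
        allocations += 1;
        len += 1;
    }
    allocations
}

/// With moves, the only allocations are the original strings.
fn allocations_with_moves(n: usize) -> usize {
    n
}
```

For 8 strings, the regrowths copy 1, 2, and 4 elements respectively, so the copying version makes 8 + 8 + 7 allocations, against 8 for the moving version.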

This problem is purely an artifact of the limitations of the tools provided by C++ to encapsulate and automatically manage memory, RAII and “value semantics.”

Consider this snippet of code:

// Pre-C++11, without moves
std::vector<std::string> vec;
{ // This might take place inside another function
  // Using local block scope for simplicity
    std::string foo = "Hi!";
    vec.push_back(foo);
}
{
    std::string bar = "Hello!";
    vec.push_back(bar);
}
// Use the vector

If we weren’t using this string class, we would not have made a copy only to free the original allocation. We would have simply put the pointer into the vector. We would then have been responsible for freeing all the allocations – once – when we’re done:

// Manually written equivalent
std::vector<char *> vec;
{
    // strdup, a POSIX call, makes a new allocation and copies a
    // string into it, here used to turn a static string into one
    // on the heap. We will assume we have a reason to store it
    // on the heap -- perhaps we did more manipulation in the
    // real application to generate the string.

    // The allocation is necessary to be the direct equivalent of
    // `vec.push_back("Hi")` or even `vec.emplace_back("Hi")` for
    // a `std::vector<std::string>`, because that data structure has
    // the invariant that all strings in the vector must have their
    // own heap allocation (assuming no small string optimization,
    // which many strings are ineligible for).

    char *foo = strdup("Hi!");
    vec.push_back(foo);
}
{
    char *bar = strdup("Hello!");
    vec.push_back(bar);
}

// Use the vector

// Then, later, when we are done with the vector, free all the elements once
for (char *c: vec) {
    free(c);
}

The copy version of the C++ code instead does – after de-sugaring the RAII and value semantics and inlining – something that no programmer would ever write manually, something equivalent to this:

// Desugaring of pre-C++11 version of code
std::vector<char *> vec;
{
    char *foo = strdup("Hi!");
    vec.push_back(strdup(foo)); // Why the additional allocate-and-copy?
    free(foo); // Because the destructor of foo will free the original
}
{
    char *bar = strdup("Hello!");
    vec.push_back(strdup(bar));
    free(bar);
}

// Use the vec
for (char *c: vec) {
    free(c);
}

C++ without move semantics fails to reach its goal of zero-cost abstraction. The version with the abstraction, with the value semantics, compiles to code less efficient than any code someone would write manually, because what we really want is to make the allocation while it’s a local variable foo, use the same allocation on the vector, and then only free it on the vector.

The abstractions of only supporting “copy” and “destruct” mean that the destructor of the variable foo must be called when foo goes out of scope. This means that the “copy” operation must make an independent allocation, as it cannot control when the original goes out of scope, or will be replaced with another value. If we had instead re-used the same allocation, it would be freed by foo’s destructor.

But copying just to destroy the original is silly – silly and ill-performant. What any programmer would naturally write in that situation results in a “move”. So this gap – and it was a huge gap – in C++ value semantics was filled in C++11 when they added a “move” operation.

Because of this addition, using objects with value semantics that managed resources became possible. It also became possible to use objects with value semantics for resources that could not meaningfully be copied, like unique ownership of an object or a thread handle, while still being able to get the advantages of putting such objects in collections and, well, moving them. Shops that previously had to work around value semantics for performance reasons could now use them directly.

It is not, therefore, surprising that this was for many the most exciting change in C++11.

How Move Is Implemented in C++

But for now, let’s put ourselves in the place of the language designers who designed this new move operation. What should this move operation look like? How could we integrate it into the rest of C++?

Ideally, we would want it to output – after inlining – exactly the code that we would expect to write manually. When foo is moved into the vector, the original allocation must not be freed. Instead, it is only freed when the vector itself is freed. This is an absolute necessity to solve the problem as we must remove a free in order to remove the allocation, but we also cannot leak memory. If there is to be exactly one allocation, there must be exactly one deallocation.

Calls to free (or delete[] in my example program) are made in the destructor, so the most straight-forward way to go forward is to say that the destructor should only be called when the vector is destroyed, but not when foo goes out of scope. If foo is moved onto the vector, then the compiler should take note that it has been moved from, and simply not call the destructor. The move should be treated as having already destroyed the object, as an operation that accomplishes both initialization of the new object (the string on the vector) from the original object and the destruction of the original object.

This notion is called “destructive move,” and it is how moves are done in Rust, but it is not what C++ opted for. In Rust, the compiler would simply not output a destructor call (a “drop” in Rust) for foo because it has been moved from. But, in fact, the C++ compiler still does. In destructive move semantics, the compiler would not allow foo to be read from after the move, but in fact, the C++ compiler still does, not just for the destructor, but for any operation.
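As a small illustration of the Rust side, the following sketch moves a String into a Vec and checks that the heap buffer is the very same one afterwards. The function name is made up for the example.

```rust
/// Returns true if the String stored in the vector uses the same heap
/// buffer that the local variable `foo` allocated.
fn move_reuses_allocation() -> bool {
    let mut vec: Vec<String> = Vec::new();
    let original_ptr;
    {
        let mut foo = String::from("Hi!");
        foo.push_str(" Joe!");
        original_ptr = foo.as_ptr();
        // Move: the vector now owns the buffer, and the compiler emits
        // no drop for `foo` at the end of this block.
        vec.push(foo);
        // println!("{}", foo); // would not compile: use of moved value
    }
    vec[0].as_ptr() == original_ptr
}
```

No second allocation is made for the string data, and the only deallocation of it happens when the vector is dropped.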

So how is the deallocation avoided, if the compiler doesn’t remove it in this situation? Well, there is a decision to make here. If an object has been moved from, no deallocation should be performed. If it has not, a deallocation should be performed. Rust makes this decision at compile-time (with rare exceptions where it has to add a “drop flag”), but C++ makes it at run-time.
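The drop flag case can be seen in a sketch like the one below, where whether the value has been moved depends on a run-time condition. Resource, consume, and the counting Cell are all made up for this example; the point is only that the destructor runs exactly once on either path.

```rust
use std::cell::Cell;

/// A stand-in for some resource; counts how often its destructor runs.
struct Resource<'a> {
    drops: &'a Cell<usize>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership; the resource is dropped when this function returns.
fn consume(_resource: Resource<'_>) {}

/// Returns the total number of destructor calls for one Resource.
fn drops_after_conditional_move(move_it: bool) -> usize {
    let drops = Cell::new(0);
    {
        let r = Resource { drops: &drops };
        if move_it {
            consume(r); // moved from on this path only
        }
        // Whether `r` still needs dropping here is not known at
        // compile-time, so it has to be tracked at run-time.
    }
    drops.get()
}
```

When the move is unconditional, no such tracking is needed: the compiler simply omits the drop at the end of the scope.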

When you write the code that defines what it means to move from an object in C++, you must make sure the original object is in a run-time state where the destructor will still be called on it, and will still succeed. And, since we established already that we must save a deallocation by moving, that means that the destructor must make a run-time decision as to whether to deallocate or not.

The more C-style post-inlining code for our example would then look something like this:

std::vector<char *> vec;
{
    char *foo = strdup("Hi!");
    vec.push_back(foo);
    foo = nullptr;
    if (foo != nullptr) {
        free(foo);
    }
}
{
    char *bar = strdup("Hello!");
    vec.push_back(bar);
    bar = nullptr;
    if (bar != nullptr) {
        free(bar);
    }
}

This null check is hidden by the fact that in C++, free and delete and friends are defined to be no-ops on null, but it still exists. And while the check might be very cheap compared to the cost of calling free, it might not be cheap when things are moved in a tight loop, where free is never actually called. That is to say, this run-time check is not cheap compared to the cost of not calling free.

So, given the semantics of move in C++, it results in code that is not the same as – and not as performant as – the equivalent hand-written C-style code, and therefore it is not a zero-cost abstraction, and doesn’t live up to the goals of C++.

Now, it looks like the optimizer should be able to clean up an adjacent set to null and check for null, but not all examples are as simple as this one, and, like in many situations where the abstraction relies on the optimizer, the optimizer doesn’t always get it.

Arguing Semantics

But that performance hit is small, and it is usually possible to optimize out. If that were the only problem with C++ move semantics, I might find it annoying, but ultimately I’d say, as I do about many things in both C++ and Rust, something like: Well, this decision was made, remember to profile, and if you absolutely have to make sure the optimizer got it in a particular instance, check the assembly by hand.

But there are a few further consequences of that decision.

First off, the resource might not be a memory allocation, and null pointers might not be an appropriate way to indicate that that resource doesn’t exist. This responsibility of having some run-time indication of what resources need to be freed – rather than a one-to-one correspondence between objects and resources – is left up to the implementors of classes. For heap allocations, it is made relatively easy, but the implementor of the class is still responsible for re-setting the original object. In my example, the move constructor reads:

string(string &&other) noexcept {
    m_len = other.m_len;
    m_str = other.m_str;
    other.m_str = nullptr; // Don't forget to do this
}

The move constructor has two responsibilities, where a destructive version would only have one: It must set up state for the new object, and it must set up a valid “moved from” state for the old object. That second obligation is a direct consequence of non-destructive moves, and provides the programmer with another chance to mess something up.

In fact, since destructive moves can almost always be implemented by just copying the memory (and leaving the original memory as garbage data as the destructor will not be called on it), a default move constructor would correctly cover the vast majority of implementations, creating even fewer opportunities to introduce bugs.
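
To make that concrete, here is a minimal sketch of a destructive move written as C-style code. The c_string struct and its helper functions are hypothetical names invented for this illustration, not part of the example above. The "move" is a plain byte copy, and the source is simply never destroyed, so there is nothing to null out and nothing to check later:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical C-style string: plain data, no constructors or destructor.
struct c_string {
    char *str;
    std::size_t len;
};

c_string c_string_create(const char *text) {
    c_string s;
    s.len = std::strlen(text);
    s.str = static_cast<char *>(std::malloc(s.len + 1));
    if (s.str == nullptr) {
        std::abort();  // keep the sketch simple: no recovery on allocation failure
    }
    std::memcpy(s.str, text, s.len + 1);
    return s;
}

void c_string_destroy(c_string *s) {
    std::free(s->str);
}

// Destructive move: copy the bytes, then treat `from` as garbage.
// The caller must not destroy (or otherwise use) `from` afterwards.
// Nothing in C or C++ enforces that rule; it is a convention.
void c_string_relocate(c_string *to, const c_string *from) {
    std::memcpy(to, from, sizeof(c_string));
}
```

Exactly one c_string_destroy call is needed, on the destination. The price is that the compiler will not stop anyone from touching the source afterwards, which is the part Rust checks statically.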

But in C++, the moved-from state also has obligations. The destructor has to know at run-time not to reclaim any resources if the object no longer has any, but in general, there is no rule that moved-from objects must immediately be destroyed. The programming language has explicitly decided not to enforce such a rule, and so, to be properly safe, moved-from objects must be considered – and must be – valid values for those objects.

This means that any object that manages a resource now must manage either 1 or 0 copies of that resource. Collections are easy – moved-from collections can be made equivalent to the “empty” collection that has no elements. For things like thread handles or file handles, this means that you can have a file handle with no corresponding file. Optionality is imported to all “value types.”
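
As a rough sketch of what that looks like for a non-memory resource, consider a handle type with a sentinel value. Everything here is hypothetical: acquire_handle and release_handle stand in for some OS API, and kNoHandle is an invented sentinel. The point is only that the type has to carve out a "handle to nothing" state and check for it at run time:

```cpp
#include <utility>

// Hypothetical stand-ins for an OS API that hands out integer handles.
int g_open_handles = 0;
int g_next_id = 0;
int acquire_handle() { ++g_open_handles; return g_next_id++; }
void release_handle(int) { --g_open_handles; }

// Invented sentinel meaning "this object owns no resource".
const int kNoHandle = -1;

class Handle {
public:
    Handle() : m_id(acquire_handle()) {}

    ~Handle() {
        if (m_id != kNoHandle) {  // run-time check forced by non-destructive moves
            release_handle(m_id);
        }
    }

    Handle(Handle &&other) noexcept : m_id(other.m_id) {
        other.m_id = kNoHandle;  // moved-from state: a handle to nothing
    }

    Handle &operator=(Handle &&other) noexcept {
        if (this != &other) {
            if (m_id != kNoHandle) {
                release_handle(m_id);
            }
            m_id = other.m_id;
            other.m_id = kNoHandle;
        }
        return *this;
    }

    Handle(const Handle &) = delete;
    Handle &operator=(const Handle &) = delete;

    bool has_resource() const { return m_id != kNoHandle; }

private:
    int m_id;
};
```

Every member function that uses the resource now has to decide what to do when has_resource() is false, even if the class was never meant to be "optional."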

So, smart pointer types that manage single-ownership heap allocations, or any sort of transferable ownership of heap allocations, now of necessity must be nullable. Nullable pointers are a serious cause of errors, as often they are used with the implicit contract that they will not be null, but that contract is not actually represented in the type. Every time a nullable pointer is passed around, you have a potential miscommunication of whether nullptr is a valid value, one that will cause some sort of error condition, or one that may lead to undefined behavior.

C++ move semantics of necessity perpetuate this confusion. Non-nullable smart pointers are unimplementable in C++, at least if you want them to be movable as well.
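
A short illustration with std::unique_ptr, whose moved-from state is specified by the standard to be null (the function name is made up for this sketch):

```cpp
#include <memory>
#include <utility>

// Returns true if the source pointer is null after being moved from
// and the destination now owns the original allocation.
bool source_is_null_after_move() {
    std::unique_ptr<int> a = std::make_unique<int>(42);
    std::unique_ptr<int> b = std::move(a);
    // Ownership went to b. The only thing a can still be is null,
    // so the type has to admit null as one of its values.
    return a == nullptr && b != nullptr && *b == 42;
}
```

Dereferencing a after the move would be undefined behavior, and nothing in the type of a says so.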

Move, Complicatedly

This leads me to Herb Sutter’s explanation of C++ move semantics from his blog. I respect Herb Sutter greatly as someone explaining C++, and his materials helped me learn C++ and teach it. An explanation like this is really useful if programming in C++ is what you have to do.

However, I am instead investigating whether C++’s move semantics are reasonable, especially in comparison to programming languages like Rust which do have a destructive move. And from that point of view, I think this blog post, and its necessity, serve as a good illustration of the problems with C++’s move semantics.

I shall respond to specific excerpts from the post.

C++ “move” semantics are simple, and unchanged since C++11. But they are still widely misunderstood, sometimes because of unclear teaching and sometimes because of a desire to view move as something else instead of what it is.

Given the definition he’s about to give of C++ move semantics, I think this is unfair. The goal of move is clear: to allow resources to be transferred when copying would force them to be duplicated. It is obvious from the name. However, the semantics as the language defines them, while enabling that goal, are defined without reference to that goal.

This is doomed to lead to confusion, no matter how good the teaching is. And it is desirable to try to understand the semantics as they connect to the goal of the feature.

To explain what I mean, see the definition he then gives for moving:

In C++, copying or moving from an object a to an object b sets b to a’s original value. The only difference is that copying from a won’t change a, but moving from a might.

This is a fair statement of C++’s move semantics as defined. But it has a disconnect with the goals.

In this definition, we are discussing the assignment written as b = a or as b = std::move(a). The reason why moving might change a, as we’ve discussed, is that a might contain a resource. Moving indicates that we do not wish to copy resources that are expensive or impossible to copy, and that in exchange for this ability, we give up the right to expect that a retain its value.
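
The two assignments can be put side by side in a small sketch. The function names are invented for illustration; the only guarantees relied on are that a copy leaves a alone and that after a move b holds a’s original value:

```cpp
#include <string>
#include <utility>
#include <vector>

bool copy_keeps_source() {
    std::vector<std::string> a = {"one", "two", "three"};
    const std::vector<std::string> original = a;
    std::vector<std::string> b;
    b = a;  // copy: every string and the backing buffer are duplicated
    return b == original && a == original;  // a is guaranteed unchanged
}

bool move_transfers_value() {
    std::vector<std::string> a = {"one", "two", "three"};
    const std::vector<std::string> original = a;
    std::vector<std::string> b;
    b = std::move(a);  // move: b may simply take over a's buffer
    // a is still a valid vector, but its value is unspecified here,
    // so this sketch deliberately checks only b.
    return b == original;
}
```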

This definition is the correct one to use for reasoning about C++ programs, but it is not directly connected to why you might want to use the feature at all. It is natural that programmers would want to be able to reason about a feature in a way that aligns with its goals.

The goal of this post is to obscure the goal, and to treat move as if it were a pure optimization of copy, which will not help a programmer understand why a’s value might change, or why move-only types like std::unique_ptr exist.

The explanation of the goal of this operation is reserved in this post for the section entitled “advanced notes for type implementors”.

Of course, almost all C++ programmers in a sufficiently large project have to become “type implementors” to understand and maintain custom types, if not to write fresh implementations of them, so I think most professional programmers should be reading these notes, and so I think it’s unfair to call them advanced. But beyond that, this explanation is core to why the operation exists, and the only explanation for why move-only types exist, which all C++ programmers will have to use:

For types that are move-only (not copyable), move is C++’s closest current approximation to expressing an object that can be cheaply moved around to different memory addresses, by making at least its value cheap to move around.

He follows up with an acknowledgement that destructive moves are a theoretical possibility:

(Other not-yet-standard proposals to go further in this direction include ones with names like “relocatable” and “destructive move,” but those aren’t standard yet so it’s premature to talk about them.)

For his purposes, this is extremely fair, but since my purposes are to compare C++ to Rust and other programming languages which have destructive moves, it is not premature for me to talk about them.

This gets more interesting in the Q&A.

How can moving from an object not change its state?

For example, moving an int doesn’t change the source’s value because an int is cheap to copy, so move just does the same thing as copy. Copy is always a valid implementation of move if the type didn’t provide anything more efficient.

Indeed, for reasons of consistency and generic programming, move is defined on all types that can be moved or copied, even types that don’t implement move differently than copy.

What makes this confusing in C++, however, is that types that manage resources might be written without an implementation of move. They might pre-date the move feature, or their implementor might not have understood move well enough to implement them, or there might be a technical reason why moving couldn’t be implemented in a way that elides the resource duplication. For these types, a move falls back on a copy, even if the copy does significant work. This can be surprising to the programmer, and surprises in programming are never good. More direly, there is no warning when this happens, because the notion of resource management is not referenced in the semantics.
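
Here is a sketch of that silent fallback. LegacyBuffer is a hypothetical class written in a pre-C++11 style: because it declares its own copy constructor, the compiler does not generate a move constructor for it, and an expression that looks like a move quietly performs a deep copy:

```cpp
#include <string>
#include <utility>

int g_copies = 0;  // counts deep copies made by LegacyBuffer

// Hypothetical pre-C++11 style class: it declares a copy constructor,
// so no move constructor is implicitly generated.
class LegacyBuffer {
public:
    explicit LegacyBuffer(const std::string &data) : m_data(data) {}

    LegacyBuffer(const LegacyBuffer &other) : m_data(other.m_data) {
        ++g_copies;
    }

    const std::string &data() const { return m_data; }

private:
    std::string m_data;
};

int copies_made_by_move() {
    g_copies = 0;
    LegacyBuffer a("expensive payload");
    // Reads like a move, but the rvalue binds to the copy constructor.
    LegacyBuffer b(std::move(a));
    (void)b;
    return g_copies;
}
```

copies_made_by_move() reports one copy, and the compiler accepts the code without any diagnostic.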

In Rust, a move is always implemented by copying the data in the object itself and then not destructing the original object, and never by copying resources managed by the object, or running any custom code.

But what about the “moved-from” state, isn’t it special somehow?

No. The state of a after it has been moved from is the same as the state of a after any other non-const operation. Move is just another non-const function that might (or might not) change the value of the source object.

I disagree in practice. For objects that use move as intended, to avoid copying resources, move will (at least usually) drain its resource. This means that an object that often manages a resource will enter a state in which it is not managing a resource. That state is special, because it is the state when a resource-managing object is doing something other than its normal job, and is not managing a resource. This is not a “special state” by any rigorous definition, but is guaranteed to be intuitively special by virtue of being resource-free. (It is also a special state in that the value is unspecified in general, whereas most of the time, the value is specified.)

Collections can, as I said before, get away with becoming the empty collection in this scenario, but even for those, the empty state is special: It is the only state that can be represented without holding a resource. And many other types of objects cannot even do this. std::unique_ptr’s moved-from state is the null pointer, and without these move semantics, it would be possible to design a std::unique_ptr that did not have a null state.

Once std::unique_ptr is forced to be allowed to have null values, it makes sense that there be other ways to create a null std::unique_ptr, e.g. by default-constructing it. But it is the design of move semantics that forces it to have a null value in the first place.

Put another way: std::unique_ptr and thread handles are therefore collections of 0 or 1 heap allocation handles or thread handles, and once defined that way, the “empty” state is not special, but it is move semantics that force them to be defined that way.

Does “but unspecified” mean the object’s invariants might not hold?

No. In C++, an object is valid (meets its invariants) for its entire lifetime, which is from the end of its construction to the start of its destruction…. Moving from an object does not end its lifetime, only destruction does, so moving from an object does not make it invalid or not obey its invariants.

This is true, as discussed above. The moved-from object must be able to be destructed, and there is nothing stopping a programmer from instead doing something else with it. Given that, it must be in some state that its operations can reckon with. But that state is not necessarily one that would be valid if move semantics didn’t force its inclusion, and so again, we are close to the problem.

Does “but unspecified” mean the only safe operation on a moved-from object is to call its destructor?

No.

Does “but unspecified” mean the only safe operation on a moved-from object is to call its destructor or to assign it a new value?

No.

Does “but unspecified” sound scary or confusing to average programmers?

It shouldn’t, it’s just a reminder that the value might have changed, that’s all. It isn’t intended to make “moved-from” seem mysterious (it’s not).

I disagree firmly with the answer to the last question. “Unspecified” values are extremely scary, especially to programmers on team projects, because it means that the behavior of the program is subject to arbitrary change, but that change will not be considered breaking.

For example, std::string does not make any promises about the contents of a moved-from string. However, a programmer – even a senior programmer – may, instead of consulting the documentation, write a test program to find out what the value is of a moved-from string. Seeing an empty string, the programmer might write a program that relies on the string being empty:

std::vector<std::string>
split_into_chunks(const std::string &in) {
    int count = 0;
    std::vector<std::string> res;
    std::string acc;
    for (char c: in) {
        if (count == 4) {
            res.push_back(std::move(acc));
            // Don't need to clear string.
            // I checked and it's empty.
            count = 0;
        }
        acc += c;
        ++count;
    }
    if (!acc.empty()) {
        // Flush the final, possibly shorter, chunk.
        res.push_back(std::move(acc));
    }
    return res;
}

Of course, you should not do that. A later version of std::string might implement the small string optimization, where strings of below a certain size are not stored in an expensive-to-copy heap resource, but in the actual object itself. In that situation, it would be reasonable to implement move as a copy, which is allowed, and then this program would no longer do the same thing.

But this is a surprise. This is a result of the “unspecified value.” And so while it may, strictly speaking, be “safe” to do things with a moved-from object other than destruct them or assign to them, in practice, without documentation to the contrary making stronger guarantees, the only way to get “not surprising” behavior is to greatly limit what you do with moved-from objects.
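
For comparison, here is a sketch of a variant that does not depend on the moved-from value at all. The name split_into_chunks_checked is made up for this illustration. The only operation it performs on the moved-from string is clear(), which has no preconditions and puts the string back into a known state:

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string>
split_into_chunks_checked(const std::string &in) {
    int count = 0;
    std::vector<std::string> res;
    std::string acc;
    for (char c : in) {
        if (count == 4) {
            res.push_back(std::move(acc));
            // The moved-from value is unspecified, so reset it explicitly
            // instead of assuming it happens to be empty.
            acc.clear();
            count = 0;
        }
        acc += c;
        ++count;
    }
    if (!acc.empty()) {
        // Flush the final, possibly shorter, chunk.
        res.push_back(std::move(acc));
    }
    return res;
}
```

With this version, a change in how std::string implements move cannot change the result.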

What about objects that aren’t safe to be used normally after being moved from?

They are buggy….

By this definition, std::unique_ptr should likely be considered buggy, as null pointers cannot be used “normally”. The same goes for a std::thread object that does not represent a thread. It is only by stretching the definition of “used normally” to include these special “empty values” that std::unique_ptr gets to claim to not be buggy under that definition, although a null pointer simply cannot be used the way a normal pointer can.

Again, this attitude, that a null pointer is a normal pointer, that an empty thread handle is a normal type of thread handle, is adaptive to programming C++. But it will inevitably exist in a programmer’s blind spot, as null pointers always have. The “not null” invariant is often expressed implicitly. Many uses of std::unique_ptr are relying on them never being null, and simply leave this up to the programmer to ensure.

Herb Sutter himself discusses this:

Since the problem is that we are not expressing the “not null” invariant, we should express that by construction — one way is to make the pointer member a gsl::not_null<> (see for example the Microsoft GSL implementation) which is copyable but not movable or default-constructible.

In a programming language with destructive moves, it would be possible to have a smart pointer that was both “non-null” and movable. If we need both movability and the ability to express this invariant in the type system, well, C++ cannot help us.

But what about a third option, that the class intends (and documents) that you just shouldn’t call operator< on a moved-from object… that’s a hard-to-use class, but that doesn’t necessarily make it a buggy class, does it?

Yes, in my view it does make it a buggy class that shouldn’t pass code review.

But in a sense, this is exactly what std::unique_ptr is. It has a special state where you cannot call its most important operator, the dereference operator. It only avoids being called buggy because it expands this state so it can be arrived at by other means.

Again, everything Herb Sutter says is true in a strict sense. It is memory-safe to use moved-from objects other than to destroy or assign to them, even if the move operation makes no further guarantees. It simply isn’t safe in a broader sense, in that it will have surprising, changeable behavior. It is true that the null pointer is a valid value of std::unique_ptr, but smart pointers that implement move are forced to have such a value.

And therefore, it should not be surprising that these questions come up. The misconceptions that Herb Sutter is addressing are an unfortunate consequence of the dissonance between the strict semantics of the programming language, where his statements are true, and the practical implications of how these features are used and are intended to be used, where the situation is more complicated.

Moves in Rust

So the natural follow-up question is, how does Rust handle move semantics?

First off, as mentioned before, Rust makes a special case for types that do not need move semantics, where the value itself contains all the information necessary to represent it, where no heap allocations or resources are managed by the value, types like i32. These types implement the special Copy trait, because for these types, copying is cheap, and is the default way to pass to functions or to handle assignments:

fn foo(bar: i32) {
    // Implementation
}

let var: i32 = 3;
foo(var); // copy
foo(var); // copy
foo(var); // copy

For types that are not Copy, such as String, the default function call uses move semantics. In Rust, when a variable is moved from, that variable’s lifetime ends early. The move replaces the destructor call at the end of the block, at compile time, which means it’s a compile time error to write the equivalent code for String:

fn foo(bar: String) {
    // Implementation
}

let var: String = "Hi".to_string();
foo(var); // Move
foo(var); // Compile-Time Error
foo(var); // Compile-Time Error
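
For contrast, a rough C++ counterpart of the same calls compiles without complaint. This is only a sketch, and call_twice is an invented name:

```cpp
#include <cstddef>
#include <string>
#include <utility>

std::size_t foo(std::string bar) {
    return bar.size();
}

std::size_t call_twice() {
    std::string var = "Hi";
    std::size_t first = foo(std::move(var));  // move
    // Still compiles: var is a valid string, but its value is unspecified,
    // so what foo receives this time depends on the library implementation.
    foo(std::move(var));
    return first;
}
```

The first call is guaranteed to see "Hi". What the second call sees is exactly the "valid but unspecified" value discussed above.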

Copy is a trait, but more entwined with the compiler than most traits. Unlike most traits, you can’t give it custom behavior: it has no methods to write, and it can only be implemented (usually with #[derive(Clone, Copy)]) for types built entirely out of other types that implement Copy. Types like Box, that manage a heap allocation, do not implement Copy, and therefore structs that contain Box also cannot.

This is already an advantage to Rust. C++ pretends that all types are the same, even though they require different usage patterns in practice. You can pass a std::string by copy just like an int. Even if you have a vector of vectors of strings, you can pass by copy and that’s usually the default way to pass it – moves in many cases require explicit opt-in. For int it’s a reasonable default, but for collection types it isn’t, and in Rust the programming language is designed accordingly.
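
A small sketch of that default, using an invented Table alias and count_cells function. Passing the nested vector by value copies every row and every string unless the caller opts in to a move:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Takes its argument by value, so each call site chooses: copy or move.
std::size_t count_cells(Table table) {
    std::size_t n = 0;
    for (const std::vector<std::string> &row : table) {
        n += row.size();
    }
    return n;
}

bool default_call_copies() {
    Table t = {{"a", "b"}, {"c"}};
    const Table original = t;
    std::size_t n = count_cells(t);  // silent deep copy of the whole table
    return n == 3 && t == original;  // the caller's value is untouched
}

std::size_t explicit_move_call() {
    Table t = {{"a", "b"}, {"c"}};
    return count_cells(std::move(t));  // opt-in: the buffers are handed over
}
```

The expensive call and the cheap call differ only by std::move, and the expensive one is what you get when you write nothing special.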

If you want a deep copy, you can always explicitly ask for it with .clone():

fn foo(bar: String) {
    // Implementation
}

let var: String = "Hi".to_string();
foo(var.clone()); // Copy
foo(var.clone()); // Copy
foo(var);         // Move

What this actually does is create a clone, or a deep copy, and then move the clone, as foo takes its parameter by move, the default for non-Copy types.

What does a move in Rust actually entail? C++ implements moves with custom-written move constructors, which collections and other resource-managing types have to implement in addition to implementing copying (though automatic implementation is available if building out of other movable types). Rust requires implementations for clone, but for all moves, the implementation is the same: copy the memory in the value itself, and don’t call the destructor on the original value. And in Rust, all types are movable with this exact implementation – non-movable types don’t exist (though non-movable values do). The bytes encode information – such as a pointer – about the resource that the value is managing, and they must accomplish that in the new location just as well as they did in the old location.

C++ can’t do that, because in C++, the implementation of move has to mark the moved-from value as no longer containing the resource. How this marking works depends on the details of the type.

But even if C++ implemented destructive moves, some sort of “move constructor” or custom move implementation would still be required. C++, unlike Rust, does not require that the bytes contained in an object mean the same thing in any arbitrary location. The object could contain a reference to itself, or to part of itself, that would be invalidated by moving it. Or, there could be a data structure somewhere with a reference to it, that would need to be updated. C++ would have to give types an opportunity to address such things.

Safe Rust forbids these things. The lifetime of a value takes moves into account; you can’t move from a value unless there are no references to it. And in safe Rust, there is no way for the user to create a self-referential value (though the compiler can in its implementation of async – but only if the value is already “pinned,” which we will discuss in a moment).

But even in unsafe Rust, such things violate the principle of move. Moving is always safe, and unsafe Rust is always responsible for keeping safe code safe. As a result, Rust has a mechanism called “pinning” that indicates, in the type system, that a particular value will never move again, which can be used to implement self-referential values and which is used in async. The details are beyond the scope of this blog post, but it does mean that Rust can avoid the issue of move semantics for non-movable values without ruining the simplicity of its move semantics.

For these rare circumstances, the features of moving can be accomplished by indirection, and using a Box that points to a pinned value on the heap. And there is nothing stopping such types from implementing a custom function which effectively implements a custom move by consuming the pinned value, and outputs a new value, which can then be pinned in a different location. There is no need to muddy the built-in move operation with such semantics.

Practical Implications for C++ Programmers

So, obviously, in light of my blog series, I recommend using Rust over C++. For Rust users, I hope this clarifies why the move semantics are the way they are, and why the Copy trait exists and is so important.

But of course, not everyone has the choice of using Rust. There are a lot of large, mature C++ codebases that are well-tested and not going away anytime soon, and many programmers working on those codebases. For these programmers, here is some advice for the footgun that is C++ move semantics, both based on what we’ve discussed, and a few gotchas that were out of the scope of this post:

Learn the difference between rvalue, lvalue, and forwarding references. Learn the rules for how passing by value works in modern C++. These topics are out of the scope of this blog post, but they are core parts of C++ move semantics and especially how overloading is handled in situations where moves are possible. Scott Meyers’s Effective Modern C++ is an excellent resource.
Move constructors and move assignment operators should always be noexcept. Otherwise, std::vector and many other library utilities will quietly fall back to copying instead of using them. There is no warning for this.
The only sane things to do with most moved-from objects are to immediately destroy them or reset their values. Comment about this in your code! If the class specifically defines that moved-from values are empty or null, note that in a comment too, so that programmers don’t get the impression that there are any guarantees about moved-from values in general.
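
The noexcept advice can be observed directly. The sketch below uses two hypothetical types, MayThrowOnMove and NoexceptMove, that differ only in that one keyword, and counts which constructor std::vector uses when it reallocates. It assumes a mainstream standard library, where reallocation goes through std::move_if_noexcept:

```cpp
#include <type_traits>
#include <utility>
#include <vector>

struct Counts {
    int copies;
    int moves;
};

// Move constructor is NOT declared noexcept.
struct MayThrowOnMove {
    static Counts counts;
    MayThrowOnMove() {}
    MayThrowOnMove(const MayThrowOnMove &) { ++counts.copies; }
    MayThrowOnMove(MayThrowOnMove &&) { ++counts.moves; }
};
Counts MayThrowOnMove::counts = {0, 0};

// Identical, except the move constructor is noexcept.
struct NoexceptMove {
    static Counts counts;
    NoexceptMove() {}
    NoexceptMove(const NoexceptMove &) { ++counts.copies; }
    NoexceptMove(NoexceptMove &&) noexcept { ++counts.moves; }
};
Counts NoexceptMove::counts = {0, 0};

static_assert(!std::is_nothrow_move_constructible<MayThrowOnMove>::value,
              "move may throw");
static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value,
              "move cannot throw");

// Puts one element in a vector, then forces a reallocation.
template <typename T>
Counts counts_after_reallocation() {
    T::counts = Counts{0, 0};
    std::vector<T> v;
    v.emplace_back();             // constructed in place: no copy, no move
    v.reserve(v.capacity() + 1);  // must reallocate and relocate the element
    return T::counts;
}
```

For MayThrowOnMove the relocation is a copy. For NoexceptMove it is a move. Nothing at the call site distinguishes the two.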

Conclusion

Move semantics are essential to the performance of modern C++. Without them, much of its standard library would become much more difficult to use. However, the specific design of moves in C++:

is misaligned with the purpose of moving
fails to eliminate all run-time cost
surprises programmers, and
forces designers of types to implement an “empty-yet-valid” state

Why, then, does C++ use such a definition? Well, C++ was not originally designed with move semantics in mind. Proposals to add destructive move do not interact well with the existing language semantics. One interesting blog post that I found even says, when following through on the consequences of adding destructive move semantics:

… if you try to statically detect such situations, you end up with Rust.

C++ has so many unsafe features and so many existing mechanisms, that this was deemed the most reasonable way to add move semantics to C++, harmful as it is.

And perhaps this decision was unnecessary. Perhaps there was a way – perhaps there still is a way – to add destructive moves to C++. But for right now, non-destructive moves are the ones the maintainers of C++ have decided on. And even if destructive moves were added, it’s unlikely that they’d be as clean as the Rust version, and the existing non-destructive moves would still have to be supported for backwards compatibility’s sake.

In any case, Rust has taken this opportunity to learn from existing programming languages, and to solve the same problems in a cleaner, more principled way. And so, for the move semantics as well as for the syntax, I recommend Rust over C++.

And to be clear, this still has very little to do with the safety features of Rust. A more C++-style language with no unsafe keyword and no safety guarantees could have still gone the Rust way, or something similar to it. Rust is not just a safer alternative to C++, but, as I continue to argue, unsafe Rust is a better unsafe language than C++.

Sayonara, C++, and hello to Rust!

2021-10-26T00:00:00+00:00

This past May, I started a new job working in Rust. I was somewhat skeptical of Rust for a while, but it turns out, it really is all it’s cracked up to be. As a long-time C++ programmer, and C++ instructor, I am convinced that Rust is better than C++ in all of C++’s application space, that for any new programming project where C++ would make sense as the programming language, Rust would make more sense.

What Rust is not for

Before going into more detail about why I think that, I’d like to throw out a few caveats, so you know I’m a reasonable person and not just an extremist fanboy.

Caveat the first: Note that I said this about new programming projects. There are some people on the Internet who demand the re-writing of all existing C and C++ projects in Rust, and while I think Rust is a better language for new projects, and that many existing projects should seriously consider integrating it, I realize, like a reasonable mature programmer, that for most existing projects, a Rust re-write would be a prohibitively expensive rabbit hole. In short, Rust is not so amazing that it will protect you from second system syndrome or from the perils of a complete rewrite. It is but a mortal programming language.

That said, new Rust versions of aging C and C++ projects are often very worthwhile and exciting, like new versions of many aging projects can be. It’s just not a magical exception to basic economics.

Caveat the second: Note also that I said “where C++ would make sense.” Rust has a lot of enthusiastic fans, and so there are a lot of people learning Rust expecting the magic they felt when they first learned their favorite programming language. And what they find is a programming language that requires lots of arcane rules, where everything seems rather tedious, and where a lot of their favorite features don’t exist.

Rust is a systems programming language. It is not garbage collected, meaning you do have to manually manage memory. While Rust makes it much harder to do that egregiously wrong, it’s still a very hard problem, and there are trade-offs that Rust – unlike GC’d languages – refuses to make for you. Meanwhile, like C++, the emphasis is on performance (or at least control over performance), whether latency or throughput or memory footprint. Rust is trying to make sure that all its organizational abstractions have no run-time cost, or, if they do, to make sure it’s abundantly clear exactly what that cost is. If you are a systems programmer, if you are used to C and C++ and to trying to solve systems programming types of problems, Rust is magical, just like when you learned your previous favorite programming language.

If you are not, Rust is overkill for your task at hand and you shouldn’t be using it. I earnestly recommend Haskell.

For more clarity on what I mean by systems programming: If you write Python or JavaScript or Ruby, then you’re running the code in a Python interpreter, in Node or a web browser, in the Ruby interpreter, all on top of an operating system with an operating system kernel. Rust doesn’t replace those tools. Rather, the Python interpreter, the web browser, and Node, and even the kernel, are programs written in C or C++, and Rust replaces that. It’s a whole ’nother level of programming, where you have to manage the actual hardware.

Purpose of this series

But enough of what Rust is and is not for. It is an excellent systems programming language, and one that was a long time coming. I plan on writing several posts about Rust features, why they’re an improvement upon C++ features, and why Rust is a better, more modern programming language. Mostly, this will be a discussion of why Rust is better than C++, which I think is the most comparable existing programming language, but it will also touch on why Rust is an improvement on C.

Because of this C++ focus, this series will at times be as much or more a criticism of C++ as it is a commendation of Rust. I think that is unavoidable, as this type of criticism of C++ is most truly credible when an alternative is available, and similarly, Rust is most practically evaluated in terms of its most viable alternative. Unfortunately, that also means that I’ll assume some level of familiarity with C++, but hopefully not too much.

I know that this is a much-discussed topic. Perhaps this is the Rust equivalent of the dreaded Haskell monad tutorial, where every person new to the programming language excitedly writes the same thing, and so thank you for reading. I’m going to try and avoid the obvious tropes: I’m going to try to do more than simply beat the table about type-safety and memory safety and avoiding undefined behavior – though of course these topics will come up. In fact, I had until rather recently simply assumed that the safety of Rust would lead to unacceptable performance degradation, that Rust might be well and good for some applications but could unfortunately never be useful in a true low-latency environment. I had to be persuaded that memory safety wasn’t a downside in such contexts, that Rust could truly be a competitor to C or C++ and not just to Go or Swift.

The syntax of C++

So for today, I’m going to ignore memory safety completely. Even assuming that C++ was more or less right that performance and optimization requires a broad range of undefined behaviors, there were still problems with C++ that left me regularly begging for at least a syntactic rewrite. As Bjarne Stroustrup, creator of C++, famously said: “Within C++, there is a much smaller and cleaner language struggling to get out.”

He later clarified that he wasn’t talking about a streamlined GC’d language like Java, and of course he is aware of Rust and still on the C++ train. As he clarified, he was talking about the syntax of C++, and the legacy of C. But just that category, just syntax, is I think enough to justify a do-over of C++. I fantasized continually about a new syntax – with identical semantics in my mind – that could be migrated to on a file-by-file basis, with its own file extension. This, I realize now, would in practice make for a new language, and a good opportunity to introduce modern typing in it, and in Rust, I see my hope realized, if a little more inconveniently than I imagined.

So that is what I want to focus on for the rest of this post: Why C++ syntax is rotten. Analyses of other Rust features I will reserve for future posts.

Many of C++’s syntactic foibles have to do with its C heritage. This is not to smear C: the same features that make sense in a simple “portable assembly” like C begin to break down when they are preserved with almost-identical syntax and naively extended semantics in a language that promises powerful generic programming features that assist in automatic code generation, resource management, and memory safety.

Header Files

This is really clear in my first example: header files. In C, they serve two purposes. For the programmer, they allow a separation of interface and implementation, especially when considering that modern IDE technology did not exist when C was developed. The header file shows the external interface of how to use the module, and the C file shows how it is implemented.

For the compiler, this arrangement simplifies implementation. The information necessary to compile each module is all included in the C file for the module plus all the headers included by it (and included by them, etc.). None of the other .c C files need be consulted, only the much smaller .h headers, in a practice known as separate compilation.

The problem when this is extended to C++ is templates. These are essentially macros, in that they allow the on-demand generation of new code, based on what is going on in the client code. So if we imagine that the module broadcast depends on the module connection, and that the compiler is currently compiling broadcast, it would not only be necessary to investigate the interface to connection, but also the complete implementation of the templates.

If we are to preserve the programmers’ perspective, and keep only the interface in the header files, this means that “separate compilation” is broken and that the compiler would need to fish around in the main C++ files. If we instead preserve the concept of separate compilation, the headers are no longer about just interfaces, but also implementation details.

The inventors of C++ decided to preserve the concept of separate compilation, and in the laziest way possible: by reusing literally the exact same implementation as C, rather than trying to apply the same principles and goals and re-engineer something better.

So now, we have the situation for the programmer where the header file contains a duplicate of the interface, in addition to the implementation of all functions that happen to be templates. To a compiler, a function and a function template are very different things, but to a user, functions go back and forth between template and not all the time, requiring the programmer to move them between files.

Why does this distinction exist at all in C++? Computers are much faster now. The compiler, to preserve efficiency, could automatically extract its own binary file of what information it needs from each module to compile other modules. The compiler could do this work for us. It is doing far more work in all of its optimization steps.

C is famously a portable syntax for writing assembly. In C, the information needed by the compiler, the application binary interface or ABI, is exactly the same as the interface needed by the programmer, the application programming interface or API – or at least very close to it. And so in C, the concept of header files makes sense, if unnecessary with modern compiler technology.

And to be clear, templates are just the most egregious example of non-interface code needed from other modules: bodies of inline functions and private member variables in classes are also not part of the interface from a programming perspective, but part of the binary interface, part of what the compiler needs to know about a module to compile the other modules that depend on it.
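To make that concrete, here is a minimal sketch of what ends up in a header. The Connection class, its members, and the broadcast function are hypothetical names made up for illustration, and the three "files" are squeezed into one listing so that it compiles on its own:

```cpp
// --- what would live in a hypothetical connection.h ---
class Connection {
public:
    explicit Connection(int port) : port_(port) {}

    // Ordinary member function: a declaration is all that callers need.
    int port() const;

    // Member template: the body must be visible wherever it is called,
    // because the compiler generates code per Message type on demand.
    template <typename Message>
    int send(const Message& message) const {
        return port_ + static_cast<int>(sizeof(message));
    }

private:
    // Not part of the API, but callers need it to know sizeof(Connection).
    int port_;
};

// --- what would live in connection.cpp ---
int Connection::port() const { return port_; }

// --- what would live in broadcast.cpp, a client of Connection ---
int broadcast(const Connection& connection) {
    return connection.send(42);  // instantiates send<int> right here
}
```

Only port() gets to hide its body in the .cpp file. The template body and the private field both leak into the header, even though neither is part of the interface a programmer cares about.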

Now, you may think, why is this such a big deal? Why such a complaint about the inconvenience of moving things between files, or duplicating some information? Why does it matter if the rules of which things to put in which file are on the arcane side? You might imagine, so what? You’ll mess it up sometimes, the compiler will issue an error, you’ll say “oh, right” and move the code to the appropriate location.

To such objections I say: You don’t know C++. Unfortunately, I’ve seen this attitude taken by professional C++ programmers, who were careless with header files, moving code around in bulk in a way that was liable to break this particular set of arcane rules, and accusing me of overreacting and wasting time when I objected to this.

For those who haven’t had the misfortune of finding this out the hard way, when you break an arcane rule in C++ – even rules that have nothing to do with run-time behavior or memory safety – you are lucky if you get away with a simple compiler error – or even the somewhat more common arcane, incomprehensible compiler error. Unfortunately, the result is regularly no error at all. The compiler cannot tell, from its separate compilation point of view, if the information provided in the headers is consistent. One module might import one version of a header, and another module might import another.

This may sound unlikely, but many codebases have the practice of separating out the actual interface in one header, and the template implementations in another. At this point, it becomes important which one is included, especially because “template specializations” mean that additional template code doesn’t just make more templates available, but changes the meaning of existing templates.

If the templates included in different compilation units are inconsistent, the result is undefined behavior, and the program might potentially do anything. Unfortunately, this also means the behavior might switch arbitrarily between different compiler versions, different compiler vendors, or based on seemingly unrelated permutations. Unpredictable behavior changes lead to bugs and security vulnerabilities.
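Here is a small sketch of how a specialization changes the meaning of a template that already exists. The Serializer template and the header names in the comments are hypothetical, and both "headers" are shown in one listing so that it compiles on its own:

```cpp
#include <string>

// --- hypothetical serializer.h: the "interface" header ---
template <typename T>
struct Serializer {
    static std::string encode(const T&) { return "generic"; }
};

// --- hypothetical serializer_int.h: the optional "implementation" header ---
template <>
struct Serializer<int> {
    static std::string encode(const int&) { return "int-specialized"; }
};

// --- a client file that happened to include both headers ---
std::string encode_answer() {
    return Serializer<int>::encode(42);  // picks the specialization
}
```

A client that included only the first header would compile just as happily, and its Serializer<int> would quietly be the generic one. Link the two clients into one program and the compiler is not required to say anything about the mismatch.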

Worse, header files are implemented by textual inclusion. The compiler proceeds as if the contents of the header were literally included in the module that imports them. Cycles of inclusion don’t result in error, but instead, a header is simply not included (if common precautions are made) when the second recursion happens.

Thus: Imports via header files are sensitive to ordering. A seemingly-innocuous change, like alphabetizing the included header files in each module, can break builds or change behavior. Such a change rolled out over an entire company’s codebase can be disastrous, and take many programmer-months to unravel the consequences of. Ask me how I know.
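A minimal sketch of the ordering problem, with three hypothetical headers pasted in the order they happen to be included. The function names are invented for illustration, and real cases are usually far less obvious than this:

```cpp
#include <string>

// --- contents of a hypothetical describe_long.h ---
inline std::string describe(long) { return "long"; }

// --- contents of a hypothetical report.h ---
// Only describe(long) has been seen so far, so 42 is converted to long.
inline std::string report() { return describe(42); }

// --- contents of a hypothetical describe_int.h ---
inline std::string describe(int) { return "int"; }

// The identical call, written after describe_int.h, picks the exact match.
inline std::string report_late() { return describe(42); }
```

Here report() resolves to describe(long) and report_late() resolves to describe(int), purely because of where each one sits in the text. Swap the order of the includes and report() changes behavior without a single line of it being edited.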

So it should make sense that the first concept I had for my “new syntax” for C++ was that header files should be auto-generated from source files, preferably in a pre-compiled binary format.

This would be an implementation detail of the build artifact, maintained in a build directory, and would be a compiler-specific optimization in favor of better compilation times. Semantically, rather than textual inclusion, there would simply be a declaration in one module to say that another module’s public interface could be used, where order wouldn’t matter.

Rust is a modern programming language, and the Rust use directive does in fact work that way.

This isn’t particularly a special point about Rust. This would be the obvious way to construct any new programming language. The compiler doesn’t need header files – the C preprocessor that implements #include directives, along with the rule that functions and structures must be declared before use, is a hold-over from the time when compilers ran on computers slower than a modern thermostat. And programmers don’t need them either: A better place for interfaces to be put in a separate file would be automatically-generated documentation.

So Rust here gets points for doing what any sensible modern programming language would do, and C++ loses points for carrying over an implementation detail from C to a context where it no longer makes any sense.

Syntax and Layout

Since we’re talking about the syntax of C++, I wanted to touch on something very basic but very serious: basic syntax for control structures. C and its syntactic descendants, including C# and Java, use something like this for if-statements and for-statements:

if (!foo.is_empty()) {
    spin_up_thread(foo);
    destroy(&bar);
}
do_something_else();

In this example, the calls to spin_up_thread and destroy are inside the if statement, and only happen if foo is indeed non-empty. The call to do_something_else is not part of the if statement.

How do we know that? Well, the compiler knows that because after the if statement there is an opening brace, and so all statements are included until the matching closing brace, including the two mentioned. But, depending on how fast we’re skimming the code, we probably know that because the spin_up_thread and destroy calls are indented.

In this situation, in what will be a recurring theme in this comparison, the compiler and the programmer are getting their information from different places. Therefore, the compiler and the programmer can disagree, especially as braces aren’t mandatory, and if omitted indicate that only the first subsequent statement is included:

if (!foo.is_empty())
    spin_up_thread(foo);
    destroy(&bar); // Warning: This is done unconditionally
do_something_else();

This looks like it only destroys &bar conditionally, and to a human following the indentation in code review or casual reading, that’s exactly what you would expect. But there are no braces, and the compiler, for whatever reason, ignores the same whitespace that human readers rely on.

This has come up in personal projects of mine, usually when collaborating with someone else. Even if you make the personal discipline of always including the braces { around the body of your if-statements }, someone else might not have that discipline, and therefore, you might be exposed to this intermediate-state code:

if (!foo.is_empty())
    spin_up_thread(foo);
do_something_else();

Needing to add a call to destroy(&bar) in the condition, after spin_up_thread, you find the line, add a new line at the same indentation level, and simply fail to notice that the new line is not actually wrapped in any {.

This was, of course, the direct cause of a major security vulnerability in iOS and macOS:

if (some_err_condition)
    goto fail;
    goto fail;
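For anyone who wants to watch it happen, here is a self-contained sketch of the same shape of bug. The two check functions are hypothetical stand-ins, not the real code, and each returns 0 on success:

```cpp
// Hypothetical stand-ins for the real checks; 0 means success.
int check_hash() { return 0; }        // passes
int check_signature() { return -1; }  // would fail, if it were ever called

int verify() {
    int err = 0;
    if ((err = check_hash()) != 0)
        goto fail;
        goto fail;  // not part of the if: always taken
    if ((err = check_signature()) != 0)  // never reached
        goto fail;
fail:
    return err;  // still 0 from check_hash(), so the caller sees "success"
}
```

Calling verify() returns 0 even though check_signature() would have reported a failure, because the second goto skips it unconditionally. GCC can flag this pattern with -Wmisleading-indentation, but that is a warning, and it only helps if somebody is reading the warnings.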

Since humans use indentation to read code, and to determine what is in a block and what isn’t, I would’ve wanted my wish-list “new C++ syntax” programming language to take a page from Python and use significant whitespace:

if !foo.is_empty():
    spin_up_thread(foo)
    destroy(&bar)
do_something_else()

Rust is a minor disappointment in this department. It stuck to braces, and whitespace being “insignificant.” But it made a huge improvement, far outweighing my disappointment: Rust at least prevents the goto fail scenario by making braces mandatory, helping ergonomics by instead removing bracketing around the condition. Having the body of the if-statement without brackets is simply not worth it as a short-cut, but if the braces are mandatory, then the parentheses aren’t necessary:

if !foo.is_empty() {
    spin_up_thread(foo);
    destroy(bar);
}
do_something_else();

This is better because then the goto fail example would still be glaringly obviously failsome, because even if the indentation does not match the braces, the braces still have to go somewhere, and will jump out at you:

if some_err_condition {
    goto fail; }
    goto fail;

It disappoints me that these issues tend to be dismissed as “matters of taste,” because as Apple learned, there are actual consequences to this misalignment of what programmers pay attention to and what the compiler pays attention to. I would have liked Rust to go the whole way, and remove altogether this strange concept that whitespace should be insignificant, a concept that my oldest C and C++ books exclaimed as a great feature without explanation or justification. But at least Rust has fixed the most egregious consequences of C++ syntax. Again, this problem with C++ comes from a feature inherited from C, but in this case C was just wrong to begin with, and should’ve done it the Rust way (or the Python way) from the very start.

Additionally, Rust has good auto-formatting, which, unlike C++ auto-formatting tools, does not break code (by, for example, re-ordering headers). This fundamentally replaces the whitespace provided by the programmer – which might be misleading to other programmers – with whitespace that aligns with the compiler’s interpretation, and therefore is correct to rely on when skimming. A good cargo fmt should therefore be run before every code review, to make sure that the code can be easily and correctly read.

C-isms vs “Modern C++”

I then have one more topic before I wrap up syntax. C++ programmers nowadays are telling everyone who was upset with their language in the 90’s and aughts that it’s better now, that C++ has cleaned up its act. C++11 really has changed a lot, and C++ is innovating again, and that’s very good. C++ is full of new features, and part of its claim to be a modern programming language involves claiming that programming in C++ is good, if you use these new features.

Smart pointers, written std::unique_ptr<Foo>, allow automatic implementation of construction and destruction for owning pointers, and allow much clearer communication about ownership semantics in function signatures, and so are preferable to writing the C-style Foo *. C++ arrays, std::array<Foo, 12> arr;, act like any other STL collection, can be passed along with their iterators to standard templates that expect STL interfaces, and provide a number of useful features as methods, and so using them is preferable to the C-style Foo arr[12];. static_cast<A>(b) is much more specific, and therefore less prone to accident, than the C-style (A)b.
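A short sketch of those three features side by side. Foo, make_foo, sum, and truncate are hypothetical names for illustration, and the code assumes C++14 or later for std::make_unique:

```cpp
#include <array>
#include <memory>
#include <numeric>

struct Foo {
    int value = 0;
};

// The return type says it all: the caller owns the Foo, and it is
// destroyed automatically when the unique_ptr goes out of scope.
std::unique_ptr<Foo> make_foo(int value) {
    auto foo = std::make_unique<Foo>();
    foo->value = value;
    return foo;
}

// std::array knows its own size and works with standard algorithms.
int sum(const std::array<int, 4>& numbers) {
    return std::accumulate(numbers.begin(), numbers.end(), 0);
}

// static_cast names the one conversion intended; a C-style (int)d would
// also compile, but so would far more dangerous casts in the same notation.
int truncate(double d) {
    return static_cast<int>(d);
}
```

Each of these is clearer about intent than its C-style counterpart, and each is noticeably more to type.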

These are among the features that are trotted out whenever someone says they used C++ in the 90’s and it had all these problems. These features are among the ones used to claim that C++ is a better, cleaner, tidier, more modern programming language than it used to be. Whether or not they’ve done enough to replace their old counterparts, they’re generally preferred whenever possible.

The problem? Convenience. Who wants to type std::unique_ptr<Foo> when instead you can write Foo *? Why are the somewhat-deprecated options the easy ones to write? Why isn’t it something like std::raw_ptr<Foo> with some convenient notation for std::unique_ptr?

But of course, that would break compatibility with C, and with earlier versions of C++.

I don’t want to get into the myriad reasons why smart pointers are to be preferred to raw pointers, or why raw pointers occupy such an awkward place in C++ – those are topics for a future post. But for however many seemingly-principled reasons some of my colleagues might state for why they used raw pointers in this or that situation, I couldn’t shake the feeling that it was partially because raw pointers were given the old-fashioned, easy-to-type notation.

And so, when I imagined my new C++ syntax, it would have Foo * mean std::unique_ptr<Foo>, and Foo arr[12] mean std::array<Foo, 12>. Why have the not-entirely-deprecated-but-not-preferred legacy C features be the easier ones to type?

Conclusions

All in all, this shows that a lot of purely syntactic but still substantial and consequential problems with C++ can be fixed with a syntax reboot, which Rust mostly provides. And I haven’t once mentioned type safety or memory safety! This will be developed on further in this blog series, where I will maintain that Rust is not only a better programming language than C++, but a better unsafe programming language than C++. Even if I had to use unsafe for every function in my module, I’d still rather write my module in Rust than C++, for all these reasons. I say this as a pre-emptive strike against the argument that occasionally having to use unsafe to achieve performance parity with C++ (and it is very occasional) “defeats the whole purpose” of Rust.

But of course, there are deeper problems with C++ that Rust also addresses, beyond just the syntactic. But those will have to wait for future posts.

A Modern Version

2020-11-17T00:00:00+00:00

Humpty Dumpty sat on a wall
Humpty Dumpty had a great fall
All the king’s horses and all the king’s men
Were too busy deciding who would be king…
To even TRY to put Humpty Dumpty together again.
And he’s just sitting there, all yolk and shell, waiting…

Apple Silicon

2020-11-16T00:00:00+00:00

This year, Apple released, to much fanfare, a somewhat obscure technical change to how its computers work: Macs will transition away from Intel’s CPUs to in-house processors known as “Apple Silicon,” more similar to the technology Apple already uses in its phones and tablets. It is a tremendous amount of hype for something rather technical, and to people used to more user-visible feature announcements, this can be somewhat disappointing, or at least confusing.

What does this actually mean for the end user? Apple claims that these new Macs will be (many times) faster, run cooler, and have much better battery life. Are these improvements as drastic as Apple claims? Will there be downsides and other adjustments that users will have to make, or will these new computers just work like faster, less power-hungry Macs?

A lot of responses I’ve seen seem pretty skeptical, which is fair. It’s been a long time since the drastic improvements of Moore’s law have been the norm in computing; we’re used to much more incremental improvements. And Apple is claiming to achieve these improvements by moving away from Intel, when Intel is the established market leader in making high-powered PC processors. Can these new computers really be that much better?

My answer, as your computer-nerdy friend, is that these computers are not only going to be a great technical improvement over previous Macs, but represent a revolution even beyond the Apple ecosystem, a turning point for PCs in general, and one that was a long time coming. To explain why, I’m going to delve into some computer history to give context to this shift, and some of the technical details of how computers work, and specifically in what ways these new Macs will work differently from the current ones.

So bear with me as we go deep. I promise it’s relevant.

Operating Systems and App Compatibility

Do you remember when it mattered a lot more which operating system you ran?

Nowadays, most of my personal time on the computer is spent on the web browser, doing my writing on a website, my TV watching on a website and even my TODO lists. I look at the bottom of the screen on my non-work computer, the MacBook I’m using to type this, and I see a slew of icons for various apps: a messenger app, a mail app, a calendar app, a spreadsheet, a word processor, all in all standard computer fare, but not getting much use compared to that Google Chrome icon. As a result, unless we’re doing specialized tasks, like programming (as I do for work) or CAD or photo-editing, apps besides the browser are not the big deal they used to be.

But once, it was a huge deal. Every computer user had several different apps in their workflow, and using an alternative operating system (as macOS once was) risked not being able to find appropriate equivalent apps, and possibly not even being able to read documents that would be in “Windows formatting.” There was tremendous social pressure to use the same software as other people, to the extent that this webcomic rang true.

Therefore, every serious personal computer application existed as a Windows program, specifically a Windows program that ran on Intel (or Intel-compatible) processors (known as “Wintel,” especially by those who criticized it as a monopoly). A company would support other types of computers only as an after-thought. And at that time, apps were distributed on CDs and stored on physical media. It was common to have an old version of an app lying around, not be able to receive live updates for it, and to expect it to run on a newly-purchased computer, unmodified.

In such an environment, compatibility was key to profits. Each new Intel processor, and each new version of Windows, had to support all the apps that could run on the previous version. There was no higher priority. For years after Windows 95, and in fact since before Windows 95 was even released, Microsoft had another operating system, Windows NT, that was far more stable and technically superior. But Windows 95 was more similar to Windows 3.1 before it, and supported more apps, so until Microsoft could get Windows NT to run all those other apps, it was stuck with the inferior product. Eventually, 6 years later, Microsoft came out with a version of Windows NT able to run 95 apps: Windows XP. Even that transition was gnarly.

The Intel side of the “Wintel” monopoly was similar. To this day, a modern Intel processor is capable of running MS-DOS programs from the 80s directly, without requiring any emulation layer in the operating system or any modifications to those programs. The antiquated 16-bit instructions that comprise those programs will still be interpreted by the modern hardware, which also supports countless other compatibility modes for various eras of the processor history. And this is true not only for the Intel processors that power Windows PCs, where it makes some amount of sense to support all the Microsoft ecosystems of years past, but also on Macs, where this history is much shallower.

Processor Design and Instruction-Set Architecture

See, Intel is more than just a company. Intel is also an instruction set architecture, also known as Intel64 (or Intel32 for 32 bit versions) or x86 (sometimes x64 for 64 bit versions). When applications are prepared to run on Intel, they are (traditionally) compiled to a file containing a sequence of instructions. The meaning of each instruction is determined by complicated standards, and the hardware of the processor must take these instructions, and actually perform the designated operations.

The amount of complexity involved in the meanings of these instructions, or the instruction set architecture (ISA) is large. Intel’s ISA was documented in 3 paperback volumes back when I was a child in the early aughts when my parents were gracious enough to order the set for me. It has only grown more gnarly since then.

Because of the vast compatibility requirements that Intel in particular historically has faced, their ISA has never been redesigned from scratch since the 80’s. This means that, instead of having every instruction be a fixed length of, say, 4 bytes, Intel instructions can range from 1 to 15 bytes. Some of them do simple things like adding two numbers together, whereas others do more complicated things like copying an entire string of characters. Many of the design choices would never be made by a modern engineer, but Intel is stuck with them for historical reasons.

But Intel hasn’t been able to clean this up, for the same reason that “Wintel” customers can’t switch from Intel: Any clean-up would mean old apps would not be able to run on the new computers. When Intel did attempt this, with Itanium, the world wasn’t ready, even though Intel tried to leverage the already-difficult transition to 64-bit as a reason to get people to switch to a completely new architecture. Even worse, any clean-up would mean that Intel would have to compete with other companies as equals, whereas now there is only one other company, AMD, that is allowed (for historical reasons) to design Intel-ISA processors. Embarrassingly, it was AMD that actually convinced everyone to switch to 64-bit, by providing a much more gradual transition to a 64-bit ISA far more similar to the existing 32-bit one.

Processor architectures with such convoluted ISAs are referred to as CISC, for Complex Instruction Set Computing. All of this complexity must be implemented using more complex hardware. Intel processors come with decoders to break down over-complicated instructions into smaller pieces, reorder buffers to optimize on-the-fly which pieces can be done when, and extra circuitry to handle all of the various compatibility modes that are necessary to support old software. All of this constrains processor design and, very specifically, draws extra power.

What’s the alternative? The transition to mobile, to phones and tablets, gave computing something of a fresh start. No one ever expected their “Wintel” apps to run on a phone, and so when initial iPhones and Androids came out, new apps were written from scratch. Those would be written for whatever processor architecture Apple and Google chose, and they chose a more modern, less-CISC ISA: ARM. ARM stands for Advanced RISC Machine, where RISC, or Reduced Instruction Set Computing, is the opposite of CISC.

The ARM ISA, which unlike Intel’s is available for any company to license and design their own compatible processors, takes a moderate position in the historical RISC/CISC wars, and requires in any case far less decoding circuitry than Intel processors require. When Intel tried to make Intel-ISA processors for phones and lightweight laptops, the Atom processors, it was a failure. The decoder got in the way of achieving a good combination of performance and power consumption, and the resulting phones were either unacceptably slow or unacceptably low in battery life.
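To give a feel for the decoding problem, here are a few real instruction encodings written out as data. The constant names are made up for this sketch, and it covers only a handful of instructions, so treat it as an illustration rather than a tour of either ISA:

```cpp
#include <cstdint>

// x86-64: instruction length varies, and the decoder has to work out
// where one instruction ends before it can even find the next one.
constexpr std::uint8_t x86_ret[] = {0xC3};                            // ret
constexpr std::uint8_t x86_rep_movsb[] = {0xF3, 0xA4};                // rep movsb (copy a run of bytes)
constexpr std::uint8_t x86_mov_eax_42[] = {0xB8, 0x2A, 0x00, 0x00, 0x00};  // mov eax, 42

// 64-bit ARM: every instruction is exactly one 32-bit word, so the
// boundaries are known in advance.
constexpr std::uint32_t arm64_ret = 0xD65F03C0;  // ret
constexpr std::uint32_t arm64_nop = 0xD503201F;  // nop
```

The three Intel examples are 1, 2, and 5 bytes long, while both ARM examples are 4 bytes. A processor that wants to decode several instructions at once has a much easier time when it already knows where each one starts.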

And Apple Silicon is Apple’s branding for their ARM ISA processors, supporting the ARM ISA with Apple’s proprietary processor design. They’re bringing the benefits of phones to the PC world.

New Modes of App Development

So the “Wintel” monopoly and Intel in general never jumped from the PC world to mobile, and as a result we have the cool, fanless, high-battery-life but high-performance phones of today. But why, then, do Macs, which never ran Windows programs, use Intel processors to begin with? Why are they switching now? And why is this a turning point for PCs in general?

Well, for one thing, it’s not entirely true that Macs don’t run Windows programs. A key reason why Apple switched from Power to Intel in the first place was so that Macs could run Windows programs: either by running a version of Windows alongside macOS (https://www.parallels.com/), or, more simply, by rebooting the same computer into Windows, which runs on Mac just as well as on any other type of PC. When Intel Macs first came out, this was a decisive feature for many switchers, nervous to abandon app compatibility.

And at the time, Intel was the best processor manufacturer in existence, so that Intel processors with their flaws were still better (as they were produced through better manufacturing processes) than the POWER-based RISC processors Apple was previously using. Get better processors and get some level of Windows-compatibility: the decision was clear for Apple at the time.

But now, Windows compatibility is important to hardly any Mac users. And there are a number of reasons for that. Nowadays, a lot of software is not translated to machine code, the level at which ISAs are relevant. A lot of software is delivered to us via the browser, where a portion runs on servers in the cloud and a portion is written in JavaScript, and then either interpreted in that form, or translated to the ISA of the computer live by the browser. Once the browser supports an ISA, all websites come with it.

And even for apps, many of them are written in higher-level languages that use a virtual machine or Just-In-Time compilation to be processor-architecture neutral. These programming language technologies matured after Intel had already become stuck with its backwards-compatibility advantage, and apps written with them also are easily portable, which is to say, brought to a new ISA. Once the Java virtual machine (for example) is ported to ARM, all Java programs come with it.
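A toy sketch of why that works. This is a made-up four-instruction stack machine, nowhere near a real JVM or JavaScript engine, but it shows the principle: the program is just data, and only the interpreter has to be compiled for the actual processor.

```cpp
#include <cstdint>
#include <vector>

// A hypothetical, ISA-neutral instruction set for a tiny stack machine.
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    int operand;  // only used by Push
};

// The interpreter is the only part that gets compiled to x86 or ARM.
int run(const std::vector<Instr>& program) {
    std::vector<int> stack;
    for (const Instr& instr : program) {
        switch (instr.op) {
        case Op::Push:
            stack.push_back(instr.operand);
            break;
        case Op::Add: {
            int rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;
            break;
        }
        case Op::Mul: {
            int rhs = stack.back();
            stack.pop_back();
            stack.back() *= rhs;
            break;
        }
        case Op::Halt:
            return stack.back();
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

A program such as Push 2, Push 3, Add, Push 4, Mul, Halt computes (2 + 3) * 4 on any machine that can compile run(). Port the interpreter once, and every program written for it comes along for free.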

And even for programs that are compiled in a traditional fashion, written in an old-fashioned compiled programming language like C or C++ (or the Apple-specific Objective-C or Swift), ISA compatibility is no longer the issue it once was. These languages have evolved over time to make it easier to re-target ISAs, and programmers nowadays are better trained in writing their code in such a way that it can be compiled for any ISA. Creating an ARM version of a Mac app is just a switch in the compilation system, and maybe finding a few obscure bugs where the differences matter a little more deeply.
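As a rough sketch of what that looks like in practice: the first function below is ordinary portable code, and the second is the rare spot where the ISA is visible at all. The function names are hypothetical; the predefined macros are the ones GCC, Clang, and MSVC define for these targets.

```cpp
#include <cstdint>
#include <string>

// Portable code: fixed-width types, no assumptions about the processor.
// The same source compiles unchanged for Intel or ARM.
std::uint32_t checksum(const std::string& text) {
    std::uint32_t total = 0;
    for (unsigned char byte : text) {
        total += byte;
    }
    return total;
}

// The occasional ISA-specific corner, selected at compile time.
std::string target_isa() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#else
    return "other";
#endif
}
```

The vast majority of an app looks like checksum(), which is why retargeting is mostly a matter of telling the compiler to aim somewhere else.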

And distributing the new version is no longer a hurdle either. Android and iOS both transitioned from 32-bit to 64-bit ARM, requiring new versions of apps to be built, and we as customers hardly noticed. Developers quietly prepared 64-bit versions of our apps, and when we upgraded to 64-bit compatible phones, we didn’t notice that the app store sent us a different version. After all, we get updated versions of apps from the app store all the time. As long as the developer can adjust, the end user just has to do some more downloading – cheap and easy in an era of widespread broadband.

Conclusion

And so what does Apple lose out on for the Apple Silicon Macs? The ability to boot Windows is now more of a liability than an asset for them. Old programs will rapidly be ported. Due to modern technology, which allows us to translate between ISAs on the spot, emulating Intel on the new Macs is often faster than running the Intel programs directly on the old Macs.

As users, we might be mildly frustrated by it. I certainly will be a little worried about buying such a computer until I know that some version of Linux will work smoothly on it – Linux on ARM is currently very much a second-class citizen in the PC Linux world.

But the advantages are great. No longer constrained by hardware decoders, Macs will cheat the old trade-off between computational power and battery life, and get to have their cake and eat it too, at least for one round of abrupt improvement for the transition. And because now the PC and phones use the same processor architectures, iOS and iPadOS apps will now work on macOS as well, saving time for writers of tablet applications.

And other PC manufacturers will end up having to notice. Windows on ARM exists already, though it is currently obscure. If Microsoft can cultivate as modern an ecosystem as Apple, where a combination of emulation and streamlined distribution makes it easy to get ARM versions, these new Macs might start an ARMification trend.

This spells (long-term) doom for Intel. Their business model is tied to their ISA, on the premise that no one can afford to switch away from the most popular ISA, that everyone is locked in. This was never true for mobile, in spite of Intel’s best efforts, and as Apple is demonstrating, it also hasn’t really been true for PCs for a while either.

A Prudent Quarantine

2020-10-28T00:00:00+00:00

Five Members sat in council.

There are some activities, some patterns of human group behavior, that transcend era and culture, and meeting in council is one of them. In spite of the youth of the participants – they were in their late teens and early 20’s – and the informality of the setting – leather couches covered in scratch marks, unfinished walls – they still clearly were sitting in council. The seriousness with which they were watching the video, their intentional and controlled posturing and nuanced glances, would have been instantly recognizable to any Parliament or Diet throughout history. They had met to do business, to make a decision, to come to a consensus.

One Member rose to speak. “I don’t know about this girl,” said Carlos, tapping his phone.

Abruptly, the video paused. They now saw the girl, Petra, frozen, mid-sentence, mid-gesture, looming over them, larger than life on the 12 foot tall projector screen. She didn’t look like someone who would require scrutiny from anyone, let alone detailed conciliar deliberation. She was sitting on a small uphill slope of rolling grass, relaxed but still composed. She just looked well put-together: she was meticulously dressed; perfect subtle make-up; cheery, friendly demeanor. She projected a presence greater than her small stature: though on physicality alone she could be mistaken for a young teenager (she was, in fact, 21), her presentation was that of a 30 year old.

Christy resisted the urge to lower her head, instead presenting a stoicism she did not feel. She fantasized about a world where she could introduce the rest of Red Stripe Quarantine Group (Reg. No. F56D3, Bellevue, WA) to her girlfriend in a pleasant, friendly manner, like people do in video chats and VR hang-outs, except in real life. Failing that, Christy really wanted to fast-forward through this whole interview, not just the video itself, but also fast-forwarding or just skipping the lived experience of her closest friends scrutinizing and discussing the love of her life. But she knew that this was the only way she could ever see Petra in person. The other Members were entitled to their safety. It was, after all, the law. A law followed is a life saved.

But did it have to be Carlos who first spoke against Petra? Don’t misunderstand: Christy was duly grateful to Carlos. He had truly enriched her life, in the most concrete way possible, not with spiritual or social riches, but with hard American greenbacks (still called that even though cash had long since been outlawed as a disease vector). If he hadn’t joined nine months prior, if he hadn’t started, coyly and then loudly, saying “what if” – “What if we used our still to make hand sanitizer in addition to booze?” “What if we started growing aloe and selling the hand sanitizer?” “What if we started taking shifts, contacting distribution companies, and getting sold in local drug stores?” – she might have had to drop out of school, and Red Stripe, her chosen family, might have had to disband. And it was admittedly quite satisfying to see the “Red Stripe Hand Sanitizer™” labels on their products at the local corner stores when they went on their supply runs.

But, in practice, even though he was not Leader, that made Carlos her boss. And no one wants their boss interviewing or scrutinizing their girlfriend. Christy was grateful that Carlos had not succeeded in convincing Joe to let him shadow – or, really, co-conduct – the interview. Christy wished there had been a way to exclude Carlos from this stage as well. But there he was, dressed in his button up and slacks, even though it was just them, even though everyone else was wearing pyjamas, as if at any second he would find himself thrust into a video call with a supplier or distributor.

Lilith, Christy’s cat, sensed Christy’s anxiety, and came running over as soon as Carlos stood up. Christy picked Lilith up, put her on her lap, and sighed deeply. At the same time, she heard Joe sigh a more abrupt, exasperated sigh. “Huph.”

Christy recognized that sigh. She hoped that it meant that she would not be called upon to defend Petra, that Joe would take up her cause for her.

Joe was the official Leader of the Quarantine Group, and the original founder. Christy and the other Members – even Carlos, to some extent – looked up to him as an elder brother. Joe had been the one to actually conduct the interview that they were all now evaluating, and now he was ready to jump in on the discussion: “What do you mean, you don’t know about her? What’s your concern, specifically?”

Carlos lifted his hand in the exact way that he did before he was going to make a new, strict, unwelcome rule for the factory (the latest one eliminated water bottles in the room with the still). Christy suddenly realized she couldn’t bear to listen to what she felt certain was coming next, and so she had to get out ahead of it. “It’s not because she’s Chinese, is it? You do know she’s Chinese-American, right?”

As soon as she said this, Christy covered her mouth up. Of course Carlos knew that, and Christy had basically just called him stupid, and bigoted to boot. But it was a real concern, and one that she’d been worried about. After all, everyone knew the stories, both from grandparents who remembered it all and from history class in school: 53 years ago, in the first year of the Perpetual Quarantine (Q.Y. 1 for Quarantine Year 1), the First Virus started spreading from China to the other countries, because, as was invariably pointed out, China was the least hygienic, least responsible country.

Carlos put his hand back down and slowly turned towards her, adjusting his glasses, a gesture he had learned from his parents. “Of course not,” said Carlos earnestly. “The Administration would never allow any foreigners into this country, especially ones from such a place as China.” Everyone nodded. Everyone had also learned in school that immigration to the United States had been outlawed since Q.Y. 4, 49 years ago. “I am not one to hold the ancestors’ flaws against the descendants.”

Abruptly, a woman’s voice boomed above them, way too loud for comfort. “So what is the issue, then?” Carlos’s partner Jillian jumped, having forgotten, as was all too easy to do sometimes, about the sixth Member of their deliberations, Leila, who had videoed in for the meeting, her face visible on a small (only 3 foot by 3 foot) square on the projector screen, and her voice broadcast, apparently at the wrong volume, through their room’s sound system.

Leila was an out-of-house Member of Red Stripe Quarantine Group; she was in the same “abstract” or “virtual” household. She had her own separate apartment, but her red wrist bracelet was the same as the others’, and it also allowed her to visit the Red Stripe Hub house whenever she wanted – which was legally allowed as long as it was the only other household she visited. Quarantine groups were currently allowed to have up to 10 adults and up to 4 locations, and Red Stripe’s model of a central hub with one out-of-house Member was not uncommon.

Leila needed this flexibility to study to be a doctor. Every day from Monday through Thursday, she went from her apartment to a physical lab on campus. At her designated time each day – which was assigned by lottery and could range from a pleasant 11AM to a torturous 3AM slot – she would walk over to the lab, fully masked. Once the previous student was confirmed as being out of the mini-lab she had been assigned, she would take a quick Sanitation Shower, do her lab work in her lab clothes (perhaps on a video chat with other students or an instructor), and then leave after 45 minutes. Next year, she would qualify for a full hour and a half, which she was looking forward to.

Then, Thursday night through the weekend, she would usually stay with the rest of Red Stripe in the main house. For this, she would book an autonomous car, which would then automatically take her to the main house as the only place she was legally allowed to go. As soon as she arrived, she would shower and stay in her pyjamas until it was time for her to go back on Monday.

And this particular meeting took place on a Tuesday, which is why instead of sitting on her currently-empty spot on the middle couch, Leila was sitting cross-legged on a brightly-colored patterned ottoman in her own little corner of the projection screen. “Petra is not ‘this girl,’ she’s Christy’s girlfriend. If you have a real concern, that’s fine, we can discuss it, but like, I really thought this was just going to be a formality. Don’t we all just want Christy to be happy?”

Christy felt like she would melt. Of course, she knew Leila would stand up for Petra, as Christy’s best friend and the only Member who had really had significant interaction with Petra before this. But actually hearing Leila jump in so boldly, so confidently just filled her with gratitude. She raised her hand up towards the camera in the shape of a heart, and smiled.

Carlos stretched his back out, and put his hands behind his back, and began to walk, and then to pace. This was how sermons were preached in the Watchers of the Vaccine, the niche religious group Carlos was raised in (Carlos wouldn’t let the others call it a cult, even though he no longer practiced). And, as with the clergy of that group, it meant that Carlos was about to give a sermon.

“She did the interview outside, in a park. She wasn’t even wearing a mask. Now I know, and she did make it very clear, that it was her group’s Liberty Day.” Every group had one day a week they were allowed to go for walks or to the park, indicated by the color of their bracelets. “And I also know there weren’t other people around, so she wasn’t violating the masks law. That is not my point.”

“Well, friend,” began Kevin, and then he paused. Everyone waited patiently. Kevin was Joe’s partner and led the group’s semi-weekly yoga classes, and he talked in the slow, overwrought way that people talk when they’ve smoked weed their entire lives. He waved his hand slowly as his brain loaded more words. “What is your point then?” More loading. “It’s more important to be safe than look safe, my friend. And she seemed really chill.”

At the word “chill,” Carlos reared back as if he was ready to charge. He pointed his finger with the energy of a punch, his face bulging red and about to boil over from anger. He inhaled deeply a few times, and when his face had finally returned to a more normal color, he spoke, with the sort of crisp calmness that in certain personalities can mask a seething rage. “I just don’t think that’s true.”

Kevin blinked. “What’s not true?”

“Part of being safe is looking safe. Care and focus defeat the virus. We’re not looking for ways to technically follow the law. We’re not looking to get a C+ in quarantine. We’re not going to be like the videos of Canada.” Canada, in spite of several attempted military interventions, still refused to quarantine to American standards. Protect your country, observe the quarantine. Carlos paused for a breath, and then continued. “After all, we make hand sanitizers. We must be above reproach.” Everyone nodded their half-hearted agreement. ‘Above reproach’ was one of Carlos’s catch-phrases. “And Petra… Well, she didn’t say anything about what personal disciplines she kept to avoid the spread of infection. She didn’t say how she would help contribute to our economy. She didn’t –”

Christy’s panic and helplessness slowly converted to anger. Would she ever be free of Carlos’s suffocating perspectives? It used to just be during their work shifts, but now not one part of her life was safe. Something would need to be done, she thought, in a boil gradually bubbling beneath the lid of her anxiety.

“Our economy?” Joe interrupted. “We have six people! And she said she had an online job, that she would pay her share of the rent. And that she was a good cook. I didn’t ask her for more details, so maybe she has more ideas she just didn’t mention. I think –”

“Yes,” Carlos said, his voice now ice-cold. “She said she wanted to share with us new, fun recipes. Recipes that, for all we know, involve endangering us all by going to multiple grocery stores in a single trip. Recipes with expensive ingredients –”

Christy’s anger finally burst through the lid, and she yelled, “Carlos!”

Joe and Carlos froze mid-argument, their gestures stuck like a video on pause, and the rest of the council gaped at Christy. In all the time they had known her, she had never raised her voice in anger before. In excitement, perhaps, or enthusiasm, but never in anger.

And indeed, in the light of their stares, Christy recovered her normal affect, and smiled, speaking as sweetly as she could, but with a hint of bitterness that her fellow Members weren’t used to. She decided to focus on one particular point. “We all are very duly grateful that you got us into the hand sanitizing business. But that means we should be less worried about money now, not more, right?”

Kevin agreed. “Wouldn’t you enjoy some” … wave, wave, words loading, inhale … “more new tastes?”

Carlos shook his head, as if only he understood the true severity of the situation. “That’s not the only example.” And at that, Carlos started tapping at the phone in his hand, jerking the video back in 15-second jumps. The out-of-context freeze frames, alternating between Petra’s elegant, nuanced body language and Joe’s extravagant gestures, formed a bizarre dance.

The others held in their intense emotions in patient but active silence. Christy clenched her fists in an effort to keep herself from crying or yelling – she wasn’t sure which.

Finally, Carlos seemed to have found the correct spot in the video. Joe had just asked “What is an ethical dilemma that you have faced?” Petra had said, “So, this one time I was at this party, and this girl was drunk…”

Carlos paused the video again, and looked out over the group. “Was I the only one who heard that?”

Everyone looked at each other, slightly uncomfortable. Finally Leila spoke up, saying what everyone else had been thinking. “She clearly means like a video chat or VR party.” Leila paused, and then continued, audibly affronted. “You’re not actually suggesting –”

“Christy,” said Carlos. “Has she ever taken you to a VR or video chat party?”

Christy paused. Petra hadn’t.

Well, that wasn’t entirely true. She had, but mostly with Petra’s own family or one of her close friends from her family’s group. Certainly not the “meet a lot of people” type of party, not something that you would call a party – loaded and edgy as that word was – even metaphorically.

So the question still stood. Did Petra have a past? Did Petra use to be a partier?

Christy would have to discuss this with Petra. But, she told herself decisively, it didn’t matter. The Petra she knew would never do something so irresponsible, and the past was in the past. Perhaps it had been a real party, and Petra was being forthright in bringing it up. She had also promised not to violate the group’s safety. If she really was being irresponsible, still, today, would she speak so openly about it?

In the midst of these thoughts, Christy remembered that she still had to speak, even though right then she’d rather not exist. She took a deep breath, hoping to inhale some courage along with the air. “I’m honestly quite hurt that we’ve come this far. Joe and Kevin, you have each other. Carlos, you have Jillian.” Jillian smiled coolly and waved from her seat. “I have been dating this girl online for three months now, and you know I have never met her in person. And now she’s giving up her whole life, her ability to see her parents every day, to talk to them in person, to live with me. I thought I lived in a place that wanted me to have a partner, to be happy!”

Kevin, now wearing an overwrought but sincere frown, sympathetically patted Christy on the back. “Come now. Carlos is just trying to do due diligence. He’s not actually going to veto her. We’re just maybe going to have to do another round of interviews. I know it’s annoying, but we’ll get her here, I promise.”

Carlos looked sternly at Kevin. “Don’t trivialize this! Everyone has a right to be comfortable with the people they’re quarantined with, the people we spend all our time with, the people who we trust not to get us sick. If I don’t feel comfortable with her –”

Joe interrupted. “Carlos, if I’d known you were looking for more detail about how she’d contribute economically, I would have asked her more relevant questions. And I’m sure Leila’s right about the party. Perhaps we should just do another round –”

“She could have brought it up!” Carlos said, measuredly but twice as loud as Joe. “She could’ve explained when she realized what it sounded like she was saying! Our goal isn’t to bring her on board and lie to ourselves that she’s a good fit. If I were Leader of this group, performing these interviews, I wouldn’t prompt her to say the words to check a box, that’s not how this works.”

Kevin gasped in slow motion. Even Jillian blinked. Everyone had always been able to hear Carlos thinking about how much he wanted to be Leader; it was an extremely loud thought. But this was the closest Carlos had ever come to actually saying it. And certainly the other Members could see the logic, and no one could argue that he hadn’t earned it, or didn’t deserve it, on some level. But on a level no one was really able to articulate, no one really supported the idea either, and up to this moment, Carlos had simply been biding his time as it gradually seemed more and more inevitable.

But no one said anything, so Carlos continued. “And I think I miscommunicated earlier. It’s not just what Petra did or didn’t say that bothers me. Maybe she just used to go to a lot of VR parties. That probably is what she meant. But that’s not what is really bugging me. That, we could do follow-up research on. That, I’d do another interview for.

“But her personality test – her personality test showed that she’s an extreme extrovert. I know those types! They stretch every law and regulation to go outside and meet strangers as much as possible. They go grocery shopping just to talk to other people. They are responsible for so much contagion.”

Christy hadn’t reviewed the personality tests and so she had never learned this, but she didn’t think much of it. She could see how someone could confuse Petra for an extreme extrovert; perhaps this was a testing glitch.

“Hey, I tested as an extreme extrovert!” said Joe. “Are you going to accuse me of endangering all of us?”

Too much was happening for Christy. Her girlfriend was potentially vetoed; Carlos was making an active bid for the Leadership; and now Joe, her friend, or rather, the older brother her birth family lacked, had actually been an extrovert this whole time? Along with Petra? The test was thoroughly broken. Or maybe the word just didn’t mean what she thought it did. In any case, nothing Petra had in common with Joe could possibly be held against her, no matter how bad it sounded.

Carlos echoed Christy’s thoughts. “I’m not saying the test is 100% –”

“But I am an extrovert. Being an extrovert is not a bad thing!”

A few seconds of stunned silence passed.

Finally, Leila’s voice cut through from the speaker. “Carlos! What on earth is wrong with you? Why are you putting Christy through this?”

Carlos looked towards the speaker, then the screen, a little flustered. He started to say, “I…” and then paused.

Leila seemed to take this pause as a moral concession. “You want to veto Petra, and make an ass out of yourself as you do it? Fine. I can’t stop you. But I can leave this group. Christy and Petra and I can start a new one. I have my own apartment already, and I have savings, you know. You think you’re so special and so important just because you started a business and –”

“I will not,” yelled Joe, “I will not see this group, that I spent years bringing together, fall apart like this! We are a family, people. And Christy does deserve happiness. And Carlos, Carlos does deserve recognition for his activities; we’d all be bankrupt if it weren’t for him. Leila, you only have savings because of him. You know that.”

Christy knew that Leila didn’t see it that way. After all, as Leila had told her many times, though never in Carlos’s hearing, they all contributed to the business. Carlos just contributed differently. But Christy saw what Joe was trying to do, and waited quietly, realizing it might be her only chance at Petra moving in, and thus at happiness.

Joe took a moment to catch his breath, and then continued. “So I’d like to propose a compromise. We vote Petra in, and we vote, in the same vote, to make Carlos the official Deputy Leader. Next time, Carlos,” he said, looking over at him, “you and I design the interview process together, and we work to build a fairer process. We also try extra hard, in the future, to make a policy that favors romantic partners of existing members, and has clear criteria for acceptance. Does that sound good to you, Carlos?”

Carlos paused for a moment, and then nodded. “Yes, that sounds perfect.” And everyone could hear the simultaneous thought: “Only Deputy Leader?”

Christy felt her chest relax as all the tension escaped, creating an embarrassingly audible, vocalized sigh.

“Alright,” said Joe, “everyone in favor of making Carlos Deputy Leader and taking Petra on in our group, please say ‘aye.’”

“Aye,” said everyone in the group.

“And any opposed, say no?” Joe continued. No one said anything.

“Well,” Joe continued, “the ayes have it. Congratulations, Christy! I look forward to meeting Petra in real life.”

A Respectable Octopedian

2020-05-14T00:00:00+00:00

In front of Penny in line was a 7-foot-tall humanoid with glowing blue skin. She suppressed the urge to ask what species they were, and let the alien order their vegan breakfast burrito. The barista at United Planets’ first-floor Starbucks looked human except for the extra hands. Polycherian, Penny remembered. When the barista handed Penny her order – an egg and cheese sandwich on a bagel – Penny bowed respectfully and said pflintsu – Polycherian for “thank you” – before getting on the elevator.

Penny loved working at United Planets’ New York Headquarters for the same reason she moved to New York City in the first place: the diversity. No other job, no other workplace, could ever measure up – not on Earth, anyway. As such, she smiled when the elevator door opened and a three-foot tall octopus-looking alien oozed in.

But her smile rapidly faded when she recognized this particular Octopedian. It was the Octopedian Representative himself, Estramsor, Deputy Leader of the Traditionalist Faction in the Interstellar Congress, who had recently argued in front of the Congress to embargo humanity, to quarantine them, to prevent the evils inherent in human nature from spreading throughout the galaxy.

And so it didn’t surprise her that the Octopedian did not return her greeting, but stared straight ahead. Penny shifted her weight back and forth. If the Octopedian had succeeded, it would have ruined Penny’s dream of travelling the universe, visiting other planets and learning everything she could about alien cultures. Besides, it would cruelly bottle humanity in with all of its flaws, never to grow or mature.

The elevator was taking a long time to reach the next floor, though. And ultimately, it lurched to a stop.

“It’s stuck again,” said Estramsor, matter-of-factly, continuing to stare straight ahead. “Typical human technology.”

Penny read the inspection certificate, which stated that this elevator had been manufactured on Alpha Centauri, but didn’t say anything.

Estramsor looked over at Penny and started writhing his front tentacles in a gesture that even Penny, who only knew a few Octopedians, recognized as open disgust. He strutted into a corner and looked away.

Finally Penny felt compelled to speak up. “Even if you don’t like humans, you could still treat us with common courtesy.”

“Courtesy?” asked Estramsor. “Consider it a courtesy that I didn’t cover you in acidic slime for such an impudent statement.”

“Hey!” said Penny. “You’re on our planet! We are the ones who gave you the courtesy of letting you work here and giving you this space.”

“Well,” he said, spitting up some black goo. “What a lot of good that did. But this is my last day, I am dropping your disgusting planet out of my portfolio. So I shan’t mind offending any humans on the way out. I do have diplomatic immunity, so watch your mouth.”

Penny pressed the button for the next floor, then the open button, the close button, and the call button harder and harder. When none of it worked, she sighed and sat on the floor, waiting for something to happen.

After a while, though, she found she couldn’t stay silent anymore. “You know what I don’t get?” She looked over at the Octopedian, who did not react or move in any way. “All those things you mentioned in your speech: War, and the glorification of war. Poverty and starvation. Our inability to deploy medical resources to those who actually need them. It was a pretty damning speech. I remember it. I found it moving. I really felt ashamed to be a human.”

“Thank you for the compliment. I assure you, the speech was meant for Congress, not for you.”

“But I’ve learned a lot since the embargo was voted down. For every single thing you mentioned, every single flaw, humans are not alone. There are a dozen other cultures and worlds, fully represented in the Congress, that do those things.”

“Yes, that is what my opponents successfully argued. Do you have something intelligent to say, human?”

Penny thought for a moment. She knew she’d never have an opportunity to ask this question again, even if she did get acid burns for it. “So you must have known that the embargo wasn’t actually going to pass. What’s the real reason? What do you actually have against humans?”

The Octopedian’s translation device made a sighing sound, uncanny coming from a face with a closed mouth, with no clear mechanism to make such a mammalian noise. “You are, as someone who works in such an august institution as this, surely aware that even non-obligate carnivorism, while frowned upon by enlightened species, is fully allowed under our legal system, certainly not a valid argument for an embargo. I am old enough to remember when that wasn’t established law, and I tried my hardest to make that decision go the other way.”

Penny nodded. Most intelligent omnivorous species were vegetarian; that was Interstellar Multiculturalism 101.

“But then I found out that you ate eggs,” he said, and he lifted up a small device and abruptly projected, onto the elevator wall, a picture of an Earth octopus egg being harvested. Penny jumped. The picture changed to a factory farm full of chickens, eggs collecting beneath them. “That was additionally shocking. Even so, I thought I could tolerate it, until I saw this.”

And there it was, an Octopedian eating an out-of-place, Earth-style egg bagel sandwich.

The Octopedian looked at the picture, and all 8 legs shuddered. “Some of our own religious authorities” – Penny remembered that the Octopedian homeworld was a theocracy – “concluded that eggs of other species were allowable food, even for us. I couldn’t argue this in front of the Congress, but this heresy had to be cut off at the root.”

The elevator lurched and stopped. The door opened and the Octopedian left, as Penny looked at the egg and cheese bagel sandwich still in her hand.

All Rent Should Be Cancelled

2020-03-23T00:00:00+00:00

Even early last week, before restaurants were closed, before we were banned from unnecessary gatherings, when many people still had to go into their office jobs, the bars were empty on my street. I walked into one, ordered a cocktail, asked the bartender why it was so slow. It was usually slow on Tuesdays, of course, but normally there was at least one other customer. But the pandemic had already scared everyone else away, and if it continued, the bar would surely have to close.

Then, the bartender casually mentioned that, if they had to close for a month, that would be the end of the business, because how would they pay rent? This hadn’t occurred to me, but given the steep prices of commercial real estate, it made sense. I began to fill with dread as I imagined this bar that I loved closing not for the duration of the pandemic but forever, and not just this bar, but any bar owned by local businesspeople, any bar that didn’t have the vast resources needed to just pay a few extra months of rent with no customers.

And now, all the restaurants are closed. Sure, many of them are making money doing takeout, but some of them are primarily bars. Most of them were selling an ambience as well as food, and none of them planned on a takeout-only business. I am worried that, when all this is over, half of the restaurants will be shuttered for lack of ability to pay their leases – or from falling behind on other bills because all their money went towards their leases.

Now, if landlords all behaved reasonably, I don’t think this would be a problem. Normally, if a business fails to pay rent, that means it should be replaced by a business that can. It’s not just a matter of one month of lost income for the landlord; it’s also a signal that more bad months are likely coming.

But in this situation, the signal would be all wrong. Whether a business does well or not in a “shelter in place” economy has nothing to do with whether it will do well when the restrictions are lifted. If a landlord shut down a bar because it can’t pay rent now, what reason would they have to believe that the business that would replace it would do any better when things are back to normal? How long would they have to find a new tenant? It’s in the landlord’s best interest to be understanding.

But unfortunately, I don’t think landlords will be as reasonable as they could. Sure, many will be forced to let businesses miss a few months of rent. But some of them will demand back payments, just because their contracts say they can, and while some businesses might be able to handle that, some will still close. Others might evict businesses that annoy them, and use this as a legal excuse, and other landlords might just not understand or decide to milk their tenants dry, even though it’s ultimately not in their own interest.

So, in everyone’s best interest, we should cancel rent. The argument I just made for businesses works for individuals as well – whether you can pay rent in the next few months has basically nothing to do with whether you normally can.

We should cancel rent, and not postpone it. Everyone can start doing it again when the shelter in place order is lifted, and the money starts flowing again, but only for subsequent months – no one should have to pay extra to catch up for the months we missed.

After all, are you getting your full use out of your housing during this month? Are businesses getting full use out of their land? I live in NYC because of my job – but now I’m not going to it. Businesses locate where they do to get customers – but now the customers aren’t allowed to come.

We should cancel rent, not just evictions. Because with a moratorium on evictions, landlords can still demand catch-up rent once that moratorium is lifted. And many people will pay rent now when they actually can’t afford to.

New York City has loans to small businesses, but that’s not enough. That means they will still have to pay those loans back. Some businesses won’t be able to afford to, and not because they were bad businesses, but just because of the pandemic.

Unemployment benefits aren’t enough. The website is broken, and it requires a bunch of bureaucracy. It also doesn’t cover a lot of people who were self-employed, or paid under the table. And unemployment benefits for business owners aren’t enough to cover their business bills.

And who knows if Congress will ever figure out basic income?

Cancelling rent, on the other hand, won’t require any paperwork besides the original order. You just announce that rent is cancelled. Everyone, instead of paying, doesn’t pay. No website, no bureaucracy, no worrying about whether your claim is approved or not, or whether you meet the arcane requirements for the program.

Such extreme times require extreme measures, not just in preventing the spread of the virus, but in preventing economic devastation. I not only want to save the lives of every New Yorker, but when this social distancing is lifted, I want to be able to walk through the city, and see it full of all the restaurants, bars, and shops that were here before, every last one of them.

We’ve already put the state of New York on pause, including the flows of money. Money flows in cycles, and rent is normally one part of a fully functioning cycle. If we pause the rest of the cycle, we have to pause rent too.

With social distancing, we’ve cancelled going out. We’ve cancelled fun. We’ve cancelled millions of jobs. We’ve cancelled huge swaths of the economy. We should cancel rent.

Open Internet, Closed Web

2019-12-23T00:00:00+00:00

The Internet promised — and still promises — a revolution in democratic, decentralized, and open communications. And yet, we see today a tech world controlled by a few central players, as Elizabeth Warren promises to break them up and Congress summons Mark Zuckerberg to explain his company’s role in privacy-violating election-manipulating foreign conspiracies. But Presidential use of anti-trust laws and new Congressional regulations of social media won’t address the more fundamental issues: The Internet is now structured, on a technical and social level, so as to naturally encourage centralized monopolies.

To explain this, we’ll first have to explain some terms. In common parlance, the terms Web and Internet are used interchangeably, but technically they refer to different elements of what now looks like a single system. The Internet refers to the single global connected network, and technologies that allow any computer on it to connect to any other computer on it, but without saying much about what the connection looks like. The Web is but one way of communicating information over the Internet, where you use a browser to access “websites,” but other ones exist: for example, online video games don’t generally use the web to sync data between players. Examples are easier to find as we go back in time: the stand-alone AOL instant messenger app did not use the web, and neither did old-fashioned e-mail clients like Outlook or Thunderbird, or BitTorrent and other torrent trackers.

What makes the web different, that it has eaten up these other services, that now we do our movie-watching, our chatting, and our e-mailing in the web browser?

The web started out as a way of posting content — you would enter your URL, which identified what server (or publicly accessible computer) you wanted a webpage from, and what page you wanted. The browser would send a request to the server, and it would send you back the page at that URL, likely either an article, or a directory of articles. Pages would have text and possibly embedded images, and could link to each other, and specify another URL to go to. The original concept of the web would have included sites like magazines, and envisioned sites like Wikipedia, but would not have been able to support e-mail or a chat app or a social media platform like Facebook.

The web was just the “public content” protocol alongside other protocols, and similar to them. You could choose your own browser, Netscape or Internet Explorer, and access the same web pages, just like you could choose your own e-mail client, Outlook Express or Eudora, to access the same feed of e-mail. The software was installed on your computer, and what you accessed through it was content, and that content all was for you to read.

Gradually, however, this changed as the web became more flexible. CGI allowed forms on websites to connect to programs that would be run in response on the server. Java and Flash and ActiveX allowed you to embed programs in your website — programs that you would not download and run on their own, but that came with the page and acted as if they were part of the page. And gradually, Javascript, originally used to validate forms before they were submitted, or to do simple animations, became powerful, as browser vendors competed to make it run fast, and as it gained more capabilities.

When you go to Facebook, you are not reading a page that someone posted there; you’re not accessing “content” in the traditional sense. What you are doing is downloading, on the spot, a large application. Not only is the content sent over the wire — the statuses, the comments, the pictures, the lists of people who like it — but, inseparably from it, we are sent the software that is used to process the content, the application used to enter it and generate it, sent to run in the browser every time we type in “www.facebook.com”. It is only through the lens of that Javascript program that we can access the content itself.

Indeed, every time we go to a modern website, especially one by a major tech company, we load a fresh program into our browsers. No longer are browsers just renderers of pages stored on servers; they are platforms where programs run, where the programs are written not for Windows or Mac or Linux, but for the web browser, now typically for Google Chrome, which has become an operating system unto itself.

Why does this lend itself to monopolization and privacy problems? For one thing, the web lends itself to an integration of frontend or client code, which runs on your computer, and backend code, which runs on a server. With a non-web protocol, you can use many programs to access the content on a specific server: different e-mail clients for the same provider, different trackers for the same torrent. You can also combine multiple e-mail providers or torrents in a single window. With the web, you go to the server, and you are provided with the client program to access the services it provides. You can’t take the Facebook Javascript code and point it at Twitter, nor can you expect your own custom Facebook app to work.

Imagine how a social network like Facebook might work if it were conceived of outside of the web. There might be a standardized protocol (say SSP for Standard Social Protocol) and multiple packages of client and server software. A school or a church or another community stakeholder might run their own copy of the server software, and you might have accounts on multiple such servers. All the status updates could be aggregated together in a single feed, and you could configure settings to indicate which servers your posts went to. Perhaps you could have “friends” at a server you don’t subscribe to, and specify both their username and what server they use (with an at-sign, like eric.smith@cornell.edu), and the servers could sync with each other so that you could still see their posts.

Who would pay for all this software to be written? The software would be sold to you like Outlook was, or perhaps open source packages like Thunderbird (Mozilla’s e-mail client) would arise. And who would pay for the servers? Your school, workplace, ISP, or community, and probably you could sign up for a public ad-supported or for-fee service.

And in this model, if you control your client social software, you could have any strategy for what statuses it shows you and what it doesn’t, rather than Facebook’s algorithm deliberately designed to addict you. You would be able to pay for the service rather than be thrown into a huge advertising pool.

It’s also fundamentally less monopolistic. You could imagine that someone, instead of using a standardized protocol, released a single client and sold the server software. Other companies or open source communities would soon make compatible software, and since the network of interactions was already decentralized, using those compatible systems would not prevent you from interacting in the same community, as happens to alternatives to current social networks.

Of course, they could also try harder, and force you to use their server, and release a single client, like AOL Instant Messenger did. But then programs like Pidgin came to aggregate that and other messenger clients, so that you could talk to contacts on different messaging systems in the same app.

This type of social network, known as a federated social network, isn’t an unachievable dream. E-mail used to work this way before GMail gobbled it up, and still does theoretically: that’s why there’s an @-sign in e-mail addresses, to indicate which of many compatible servers you have an account at. Social media used to work like this, too: You could be a member of many listservs or newsgroups, and it would be handled through a single e-mail and newsreader app. Messaging doesn’t work this way, even today, but there is a protocol out there that would work like that, called XMPP: It simply never caught on.

There even exists software, like Mastodon, and a protocol like our hypothetical SSP, called ActivityPub, that does exactly what I just described. But Facebook, Twitter, Reddit and similar sites have stolen all the actual user-base. A social network, of all things, needs a certain critical mass before anyone can really get good use out of it: Facebook is very useful when everyone in your college is socially obligated to have it, less so when you have a niche social network only used by open source enthusiasts.

Before we talk about how or even whether we can or should turn the tide on this, I’d like to point out a side issue: Mobile. On iOS and Android, you do download individual client apps. But most of the time, we use the same model: You use the Instagram app to connect to Instagram services, and you use no other app for those services. WhatsApp messages stay on WhatsApp servers. If it’s the technological layout of the web that makes for this business model, why has it carried over into mobile?

I remember being excited when mobile came out for the comeback of the standalone application. There are multiple Twitter apps available, all posting to the same service and accessing, differently, the same content. But it hasn’t led to a return to a more federated model for new software, or openness in general.

There are a few reasons for this. One is, by the time mobile platforms started gaining steam, the web revolution had already mostly run its course. We’d gotten used to that business model. The assumptions that are built in to how the web works — that you would get your client software from the company that also provides the only server it works with — those assumptions had become entrenched enough that a different technology landscape didn’t overcome them.

Another is the closed nature of both major mobile platforms. It is very annoying to put an app on the app store. It is annoying to write one — historically, it was a quite constrained platform. Apple can and will reject you arbitrarily. It increases the barrier to entry, so that established companies have a huge advantage.

But the biggest reason, in my opinion, is that the mobile world and the web world are too entwined. Not only do we expect to use many services from the phone on the computer as well, where the web dominates, but the platforms use the same servers and often, the same frontend code. It is relatively easy, and commonly done, to use some or all of the code that normally runs in a web browser, and instead run it in a browser engine embedded into a mobile app.

So the pattern set by the modern web is deeply entrenched. The end result is a computer as an endpoint for service. Rather than as a tool we control and use directly, it is an adaptable terminal that we use to enter into corporate-controlled environments, where people make their livelihoods and run their social lives, but the rules can change at the companies’ whim.

So how do we return to a locally controlled system again? Anti-trust and regulation isn’t enough — that’ll simply change what companies we do the interactions with. Getting rid of the web isn’t feasible and probably still wouldn’t be enough — we’ve thoroughly convinced ourselves by now that this is how computers are supposed to work.

We need to build an alternative. We need a complete suite of software that meets all the needs that websites currently serve, but which does not rely on the same level of centralization. This requires a lot of work, and while open source software can spontaneously and freely arise as collaboration between companies when technical concerns are at play (Linux, compilers, libraries), when it comes to polished and well-designed products, that usually requires more explicit funding.

So if I were someday, somehow elected president, I would not only carry out Elizabeth Warren’s noble anti-trust plan. I would also fund a government program to give grants to build open source software that could be used this way, with a mission of re-building a computer culture that doesn’t rely on the same level of centralization and corporatization. This would be an effective use of tax money, because what differentiates software from other products is that, once created, software can be duplicated and re-deployed without any natural cost.

And federated social networks would be a small, relatively unimportant part of it. What if craftspeople could easily sell directly to consumers, rather than listing on Etsy? What if cab drivers didn’t have to sign up for apps that take giant cuts for doing very little? What if we had time logging and vacation tracking software for our small companies that actually worked? What if someone didn’t feel like they had to buy an iPhone so they could Facetime their family, but could feel confident using whatever phone they wanted?

Just Jump

2019-10-10T00:00:00+00:00

Kayleigh needed a break from work.

When you need a break from work, sometimes you go to the bathroom. Sometimes you stop by the coffee machine, chat with a colleague while it brews. And sometimes, you straight-up leave the office and walk to a nearby bar. Today, Kayleigh found herself taking that last option. She didn’t normally do this — she felt that, as the boss, she had to hold herself to a higher standard than anyone else, and drinking before the end of the workday was against policy. But today — well, she figured she just really could use a drink.

Kayleigh looked over at the bar. Marble-colored, when she would have preferred a more wooden, Irish pub, kind of vibe — at least for this situation — but again, you only have so many options within a quick walk from work at 3PM on a Tuesday. The bartender had a clean look about him, short-trimmed but still substantial beard, tan — maybe Greek or Lebanese — wearing a black vest with a colorful bow tie that was definitely part of his personal style, Kayleigh thought, rather than any sort of uniform requirement. Kayleigh herself was dressed in a tailored button-down, a vibrant blue tie that she was told brought out her eyes, and a vest — she liked to dress up for work, especially on a day as important as today was to be.

She walked over to the bar and sat down. The bartender looked at her and smiled, as he dried a glass in his hands. He set the glass down, and asked, in a cheery, light tenor voice, “What are we having today?”

Kayleigh thought a second. “Hmm.” What would be a good drink to get at such a venue?

“Could I have an Old Fashioned?”

“Can do!” said the bartender, with a slightly out of place level of enthusiasm. “We make them good here, I promise you.” The bartender paused a second, and then asked, “What sort of whiskey do you want with it?”

Kayleigh hadn’t considered this. She looked at the wall with the whiskeys on it, and then said, “Oh, why the hell not? Why don’t you put in that 18-year Macallan.”

The bartender’s attitude shifted a little. “Look, I don’t really think you want that. First off, I think it would cost over $100, though I’m not exactly sure. Second, I’m not sure it would even go well. Third, I mean, like…”

While he was talking, Kayleigh reached into her wallet and pulled out two very fresh-looking hundred-dollar bills, put them down in front of her, and said, “Make it as carefully as you can. This might be my last drink.”

The bartender broke eye contact and simply grunted acknowledgement. He looked visibly uncomfortable as he made the drink, looking back at Kayleigh several times as Kayleigh blankly stared ahead, still standing. When he got back, he asked, “So, this may be your last drink, you said? You quitting? Is it a health thing?”

She continued to stare. “Sorry,” continued the bartender, “I know that might be a personal question…”

“It’s nothing like that,” she answered. Then she looked the bartender directly in the eyes, her bright blue eyes drilling into his brown ones. “Would you believe me,” she said, in an almost sing-song, playful, tone, as she leaned to one side and smiled, “if I told you, that it was a super-clandestine top secret government spy project?”

The bartender chuckled, and made to walk away, but Kayleigh wasn’t done. “Do you believe in souls?” she asked, and she sipped her drink.

“Souls? Hmm…” responded the bartender.

“Because you see, I’m an atheist,” Kayleigh said.

“Seems reasonable. Most people I know are.”

“But my wife is a Christian,” Kayleigh continued.

“That seems stressful,” the bartender said, neutrally.

“No! We have a very happy marriage!” Kayleigh responded.

The bartender resigned himself to having a longer and certainly more intense conversation than he had anticipated. He turned to fully face Kayleigh again, put his hands on the bar, and smiled. “I’m sure you do,” he said, as inoffensively and earnestly as possible.

Kayleigh continued, “So, as I said, I’m an atheist, but my wife’s a Christian. No big deal most of the time, we have a very happy marriage. She goes to church, has a couple church friends, I tag along every once in a while, or I sleep in on Sundays, or even get work done. Most of the time, it doesn’t come up.”

“But where it does come up, see, is that she believes in souls. She believes that each of us has a soul. And it’s started to make me wonder, you see, if I have a soul.” Kayleigh paused, and became thoughtful looking again.

“Ah,” said the bartender. “So you’re thinking about converting. And, um, giving up drinking too, then?”

Kayleigh blinked. “No, none of that, she would probably be more confused than anything else.”

“And what’s she do?” asked the bartender. “Does she have a very, er, spiritual line of work, too?”

“That’s the thing!” Kayleigh said, more enthusiastically than expected. “She’s a freaking neuroscientist. If anyone should have very clear reasons not to believe in souls, it’s one of those. Her colleagues really don’t understand it, some of them have even told me so.” The bartender nodded.

“But that’s not what’s important here,” Kayleigh continued. “Do you know about Star Trek?”

“I saw an episode or two,” responded the bartender.

Kayleigh said, “So the teleporter, where it takes you apart and puts you back together somewhere else, you remember that? Now, isn’t that like killing someone and then building a new person? Or do they have a soul, outside of them, that isn’t attached to a particular location?”

“I don’t think I remember it quite like that.”

Kayleigh resumed, “When I was a little girl, my father built a treehouse for my brother and I. And one day he — I mean my brother — started jumping straight from the treehouse to the ground. He’d always land fine, and I was nervous to do it — which was quite an embarrassment for me, because, you see, I was the older sister.

“And I’d get right up to the end of the platform, and I’d not be able to jump.”

The bartender nodded, confused about the conversation but on more familiar footing now. “I went bungee jumping once. You just have to do it. Once you do it, it’s suddenly fine.”

“That’s right,” Kayleigh said, nodding. “So finally I did. And I instantly regretted it, but I was already on the way down. And afterwards, I was fine. I was completely fine.

“And when we went to sleep, you know, they had this prayer they used to teach us. It was a little poem.

“Now I lay me down to sleep
I pray thee, Lord, my soul to keep
But if I die before I wake
I pray thee, Lord, my soul to take.

“I went through a phase where I’d be terrified to fall asleep. I’d just be terrified. What if I died in my sleep? But I always fell asleep, and when I woke up the next morning, I’d be fine. Like my cat. Although, I think, I think I killed my cat…”

Kayleigh was crying at this point. Tears were streaming down her cheeks. She had barely touched her drink this entire time. The bartender was completely flummoxed, and looked around, trying to see if anyone else needed anything. Everyone seemed to have drinks, and one of the couples was now assiduously making out. He could deal with that later. He turned back to his confusingly distraught customer.

Kayleigh sighed, and reached into her bag and placed two small, black boxes on the table. “You want to see what my company built? What I dedicated my life to research and program and build? The secret government project I led?”

She didn’t wait for an answer, opened one of the boxes up, and lowered her drink into it. “You take something apart at the source, just convert all its atoms into a digital signal.” She closed the lid and pressed the button. “Then, on the other end, new atoms and molecules are built exactly the same way, but they’re new, different atoms and molecules, built out of the air around the receiver, just arranged according to the digital signal.”

At this point, she opened up the other box, and lifted her drink out of it. “Cool magic trick,” said the bartender. “My nephew can do that one too, I think. Maybe not with a drink though.”

“This isn’t a magic trick,” said Kayleigh. She looked around quickly. “It’s an actual teleporter, I already put my cat through one. And, my cat was destroyed. Turned into a digital signal. It died. But then, on the other end, he was completely like normal. It’s my turn next. I guess I’ll die. The current me will be destroyed. But the version of me on the other side won’t think like that, I guess. It won’t care. It will be a person, I think, but will I die, or will I feel like I’m jumping? Will that new person actually be me?”

“My wife says it’s OK. My wife says my soul isn’t in the actual atoms, but something about the structure of the atoms. Then she started talking about philosophy and Platonism and, you know what, I didn’t understand any of it. I’m a practical woman, you understand. But I led this project, and I have to do it. Souls or no souls, death or no death.”

At this, she downed her entire drink, and slammed it on the table. She exhaled loudly, made a slightly awkward fist gesture, picked up her machine, and walked into the bathroom.

Fifteen minutes later, she hadn’t come out of the bathroom. The bartender walked over, knocked on the door. No response. He tried the knob — it was still locked. He was trying to get Kayleigh to respond when he saw her walk into the bar.

“It was just like you said,” she said to the bartender. “Once I jumped it was fine.”

The Haskeller's Hungarian Notation

2019-08-11T00:00:00+00:00

When I was first learning to program, a long time ago, it was in BASIC, and you had to annotate your variable names to indicate what type something was. foo would be a number, whereas foo$ would be a string. This meant that there could only be as many types of information as there were symbols to put after your variable, but that was okay for the sort of programming BASIC was used for. These were called sigils, and they helped you keep straight in your head what was going on, and made it easier for the computer too. Any aggregates had to be explicitly declared.

Later on, I learned Perl, which had a similar system, but with a twist. A variable named $foo could contain a number or a string — or even some sort of object or reference — but it could only contain one of them. It was a “scalar.” @foo would contain many scalars with indices in an array, and %foo would contain many with string or other keys in a hash map. The computer kept track, dynamically, of the practical types of the scalars, and could easily do the same for the aggregate types, but chose to instead enforce a mechanism where the programmer would be reminded of whether it was a single value or some sort of aggregate that was being discussed.

In Haskell terms, BASIC had you use sigils for data types, but Perl had you use sigils for functors. And not to make people too upset by comparing Haskell and Perl, but Haskellers regularly do the same today, voluntarily annotating variable names with the functors by convention. For example, dmdMenuItems might translate, in a Reflex codebase, to Dynamic of Maybe of Dynamic of list of DomElement.
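
To make the stacked prefix concrete, here is a minimal sketch that needs only base. The Dynamic and DomElement types below are stand-ins invented for illustration, not Reflex's real definitions (Reflex's Dynamic also takes a timeline parameter and changes over time), and the menu contents are made up.

```haskell
-- Stand-in for Reflex's Dynamic: here it only wraps a single value.
-- This is an assumption for illustration, not the library's type.
newtype Dynamic a = Dynamic { dynValue :: a }

-- Hypothetical element type, for illustration only.
newtype DomElement = DomElement String
  deriving (Eq, Show)

-- Read the prefix left to right: d = Dynamic, m = Maybe, d = Dynamic,
-- and the plural noun stands for the list.
dmdMenuItems :: Dynamic (Maybe (Dynamic [DomElement]))
dmdMenuItems = Dynamic (Just (Dynamic [DomElement "File", DomElement "Edit"]))

-- Peeling off one layer drops one prefix letter.
mdMenuItems :: Maybe (Dynamic [DomElement])
mdMenuItems = dynValue dmdMenuItems

-- With every layer removed, the name carries no prefix at all.
menuItems :: [DomElement]
menuItems = maybe [] dynValue mdMenuItems
```

Each name says how many layers still have to be dealt with before you reach the plain list.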

The usage originally struck me as quite strange, and I didn’t like it. I remember thinking the original Hungarian notation was redundant: int iFoo; literally says int right before it. And besides, wasn’t the point of a type system to not need extra mnemonics, because the compiler will stop you from messing things up?

At my previous job, we used prefixes like m_ and g_ in C++ to indicate scope (member variable/field and global, respectively), and it similarly took me a while to adapt. In those situations, it turned out to help because the sigils told you where to look for more information. If there wasn’t a m_, you looked in the same function, but otherwise you had to immediately go to the class declaration. But that wasn’t the only advantage. What scope something was in was important in how you treated the variable, in many subtle ways that would be bad to confuse, and which the compiler in C++ wouldn’t really help you with.

Similarly, in Haskell, indicating what functor something is in tells you something important: What kinds of things can you do to get a regular value out of it? Do you need to provide a default value (Maybe), or only provide it to versions of functions adapted for it (Dynamic), or perhaps just keep the functor around while transforming the values inside ((<$>), (<$$>), (<$$$>), and so on, depending on how many functors are stacked)? And while the compiler will help us with this, it’s convenient to see it all the time, and the types of individual variables are sometimes inferred and, in any case, not visible at every usage.
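
The multi-layer mapping operators are not part of base, so the sketch below defines them locally; the exact names and fixity are assumptions, though many codebases define something similar. The example values are invented.

```haskell
-- Locally defined helpers (not from base): map under two or three
-- functor layers by composing fmap with itself.
infixl 4 <$$>, <$$$>

(<$$>) :: (Functor f, Functor g) => (a -> b) -> f (g a) -> f (g b)
(<$$>) = fmap . fmap

(<$$$>) :: (Functor f, Functor g, Functor h)
        => (a -> b) -> f (g (h a)) -> f (g (h b))
(<$$$>) = fmap . fmap . fmap

-- One layer: plain (<$>) transforms the value inside the Maybe.
mNextPort :: Maybe Int
mNextPort = (+ 1) <$> Just 8080

-- Two layers (a list of Maybe): (<$$>) reaches the innermost values.
nextPorts :: [Maybe Int]
nextPorts = (+ 1) <$$> [Just 80, Nothing, Just 443]

-- Three layers (Maybe of list of Maybe): one more fmap, one more $.
mNextPorts :: Maybe [Maybe Int]
mNextPorts = (+ 1) <$$$> Just [Just 1, Nothing]
```

The number of prefix letters on the variable tells you how many dollar signs the operator needs.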

And when we do write the pure function or the lambda or the fromMaybe or the dyn_ $ ffor ..., what do we name the variable now? Many times we have many variables with the exact same semantic role, the only difference being what functors they’ve been wrapped with. We want to say ffor dSelectedId $ \selectedId -> ... or fmap (\number -> number + 1) eNumber or let fish = fromMaybe defaultFish mFish. The alternative is, what, judicious use of ' for the different but analogous variables? The difference between these variables, intuitively, is how wrapped up in functors they are, and that should also be the difference in their names.
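
Here is a small sketch of that pairing of wrapped and unwrapped names, using Maybe in place of Dynamic or Event so that it runs with base alone. The Fish type, defaultFish value, and both function names are hypothetical, and ffor is redefined locally to mirror the flipped fmap that Reflex exports.

```haskell
import Data.Maybe (fromMaybe)

-- Local copy of the helper: fmap with its arguments flipped, so the
-- lambda can come last.
ffor :: Functor f => f a -> (a -> b) -> f b
ffor = flip fmap

-- Hypothetical domain type, for illustration only.
newtype Fish = Fish String
  deriving (Eq, Show)

defaultFish :: Fish
defaultFish = Fish "goldfish"

-- The wrapped argument is mFish; the unwrapped value with the same
-- role is simply fish.
fishOrDefault :: Maybe Fish -> Fish
fishOrDefault mFish =
  let fish = fromMaybe defaultFish mFish
  in fish

-- Inside the lambda the functor is gone, so the prefix is gone too.
mLabelFor :: Maybe Int -> Maybe String
mLabelFor mSelectedId =
  ffor mSelectedId $ \selectedId -> "item-" ++ show selectedId
```

Neither function needs a primed name: the prefix alone separates the two variables that play the same role.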

And I’ve decided this is a good thing. Conventionalized terseness is the least problematic type of terseness. Single-letter abbreviations are great if they communicate information efficiently and everyone agrees on what they mean. I’ve seen dyn and may as well, and I prefer d and m, as they are easier to stack up without getting too unwieldy, and besides, dyn is used for functions and may is also a verb (does mayFish mean something that’s a Maybe Fish or a boolean about whether you are permitted to fish?).

And so, in spite of my initial skepticism, I’ve come to like this naming convention, and I recommend it to all of you as well.

The Letter from the Trees

2019-07-22T00:00:00+00:00

ENVELOPE HEADER:
Date:      January 5, 2027
To:        Rachel Friedman, President of the United States and Leader of the Free World
From:      The Roots of the Great Trees of Galaxy-Wide Civilization
Subject:   An Offer, an Apology, and an Explanation

The Offer

In the name of the One Almighty God: in the name of the Many Stars through which God is made manifest, in the name of the manifestation of God you call the Sun, and in the name of Original Star from Before Time, we offer you peace, not of a lack of conflict, but of a mutual growth. As branches must look to the vine for sustenance, so must you look towards us, as your own scriptures say, being a reflection of the truth.

To re-state in a more secular fashion: We are offering your species an alliance with our species, and an entrance into a great alliance that currently covers every intelligent species in the galaxy besides yours. In exchange for a price, which we can negotiate, you may also travel cheaply and quickly between all worlds of this galaxy, using technology that our species, the Great Trees, exclusively control but offer to all intelligent species. Know that no species, when given this offer, has ever refused it before, and we have no reason to expect your situation to be any different.

The Benefits of Our Offer

It is an unfortunate fact that in many species, experience with evil has led to a distrust of well-intentioned offers. Therefore, let us explain first why our offer benefits you, as proof of our good intentions.

Rapid interstellar transportation: Please see the appendices B-E for the scientific details: just know that the economic boom in practical science from merely having read our high-level explanation should serve as a goodwill token.
Trade: This follows as a corollary from the previous point, but bears re-emphasis. Technology of other species might solve all of your species’ most pressing problems.
Unprecedented cultural interchange: Though you are greatly advanced in some fields of art, other species are advanced in forms that you might not have acknowledged as art. Perhaps it is for this reason that God has ordained many species to exist!

The Terms of the Offer

Our technology comes at a price. We are not communists, nor are we a charity. This price is negotiable, and we concede to you the right to make the opening move in this negotiation, as you have the disadvantage of being the less advanced species and the species caught off-guard by the offer.

However, it is unfortunate that we must mention that there are certain non-negotiable requirements. These are part of the price (in one way of thinking); or, to a higher way of thinking, these are moral necessities that must be addressed before we can contemplate any settlement with you.

The non-negotiable requirements are as follows:

You shall immediately become what you call “vegan”. Animal intelligences are still intelligences, and you must not appropriate their bodies, their reproductive materials, or their other products for food.

After some debate, we have determined that “honey” is to be included in this (see Appendix I). It also will soon become unnecessary.
You will immediately prioritize a verification that the plants you eat are not in any way intelligent, including at a genetic level.
You will pay appropriate prices for proper scientific aid and guidance for developing the technology to consume only non-alive food sources.

An Apology

The reason you are the only unaligned intelligent species in our galaxy is not the simple one that your science fiction speculates most about: It is not the case that you have only recently achieved some formal definition of intelligence, nor that we have only recently learned of your existence. We have studied you in detail for quite some time and have entire departments at a few of our universities dedicated to understanding you well, and our linguistics, being far superior to yours, makes miscommunication unlikely.

It is also not, as others amongst your writers have speculated, that you are the most warlike or otherwise morally corrupt of all species — though this is in fact closer to the truth. You are, after all, not culpable for your habits, not having ever been taught better.

Until this time, we have shunned you for a very simple reason. It is distressing enough to us that you must consume other living creatures for your sustenance, but clearly this is biological necessity and excusable — and also a situation that can of course easily be remedied through science.

It is shocking enough that, even though you are not obligate carnivores, you persist in eating meat — though of course the issue is the intelligence of the food, not the biological categorization — but this is not unprecedented among our allies, and simply comes from having been literally misguided, or rather, not guided at all. Moral instruction can fix this one.

But that you eat the means of reproduction of other species — specifically, those items you call eggs and milk — crosses the line from misguided into disgusting, repulsive, viscerally upsetting, not only to us but to the vast majority of our allied species. We were concerned that, even if aided scientifically and guided to moral truth, these habits were indicative of a deeper sin within your species. And furthermore, to be frank, we were worried that even if you did properly reform yourselves, the disgust would linger, knowing your history, so that we would not be able to deal with you as intelligent beings, as persons, after the alliance, in such a way that could destabilize us.

But this is in actuality our sin, not yours. The moral culpability is no stronger than that of other carnivore species. The Party of the Includers were right then, deftly distinguishing “misguided” from “sinful,” and we celebrate their wisdom regularly with a great feast. The new Party of the Includers are right now, distinguishing “disgusting” from “sinful” and reminding us that our intolerance does not imply your intolerability.

And so, after a deep and traumatic political realignment, a new government, a re-rooting to the literal deepest level of the Great Trees, we repent of our bigotry. We will offer you the same terms as every other carnivorous species. We will allow you to be like us, and derive your sustenance not from the destruction of intelligence, nor even from the consumption of life, but directly from God through the Stars and their Light.

An Explanation

Ourselves

We, the Great Trees, are the most technologically, socially, and civilizationally powerful species. We are the only intelligent species that naturally derives its sustenance from the light of Stars, the only “autotrophs” or “heliotrophs” as you call it, and we grow in the dark parts of the solar systems, where there are no planets to cover with our shadows nor benefits to the light. We grow, but we do not eat: We derive our matter from non-living dust, and our energy from that which God through each Star dispenses freely. We need no planet to root us, unlike the other species we have encountered.

As such, we are the only species that is morally perfect by gift of God, and it is our duty to bring this perfection to all species. The original Party of the Excluders thought that other species were not part of God’s plan, and to be eliminated, but to eliminate other life would be counter to what made us, as non-consumers of life, Chosen, and so we have repented of that viewpoint. If it is God’s providence to make only us perfect, and to make other species perfect through us, who are we to question? The creature should not question the Creator.

We will not even say that we are greater for having received morality directly, and other species lesser for only having received morality indirectly through us. The joy of the other species is distinct from ours, and it all forms a great pastiche. The creature should not question the Creator.

It is only fitting that, as the root of the morality of all species, we should also be the root of their technology. If there is danger of pride in this, it is easily avoided, as these technologies were also only given by grace of the creator. The creature should not question the Creator.

Details sufficient to detect our presence within this solar system are attached in Appendix A. Perhaps ironically, and perhaps as a sign from God that your position should be accommodated, the solar system in which you reside is also the capital of our civilization. Your presence, as the sole unintegrated species, has long been an embarrassment to us. Why has God chosen to test us with the hardest test near our most precious Star? But the creature should not question the Creator.

Your History and Chosenness

Among your various peoples, the Jewish people have the tradition of being Chosen. This was accurate in the time in which the decision was recorded: God had in fact Chosen them. This is because they, among all the nations, devised laws for the protection of the animals as they were eaten. The additional rule that milk not be consumed alongside meat was also a mitigation of the obvious fact that milk is disgusting, and contributed greatly to their Chosenness by God. Do not misunderstand: This is the primary reason for which they were Chosen, and to the extent that they do not continue among that path, they have lost their Chosenness.

For example, Judaism has retained some of its Chosenness, whereas Christianity and Islam have lost it. In the modern day, however, it is rather the Vegans that are the most Chosen (though they do not form a people in the traditional sense). The Jains, who have taken Hinduism to its logical extremes, are especially Chosen.

But enough of this! It is no matter! We are extending our own Chosenness to you! Rejoice, and join the rest of the galaxy! Chosenness need not be distinctness. We all desire to be Chosen at the expense of the unchosen, but the creature should not question the Creator.

Students and Teachers; Questions and Leadership

The creature should not question the Creator. On the same note, however, the students should not question the teachers that the Creator has appointed for them. Our experience of other species and our research of yours shows that sometimes, students can be unruly.

They question our requirement of vegetarianism, even if, soon enough, foods will arise that imitate meat in nutrition and taste (a dubious value, but one nevertheless easily accommodated).

They question our moral superiority, despite its evident reality to any creature with a conscience.

They question the very existence of God, despite the Stars as God’s concrete manifestation, and, on the abstract level, of God’s logical self-evidency.

It is your role as leader to prevent, contain, and counter this inappropriate questioning. It is disappointing that you have shown so little promise until now in organizing your populace.

We have noticed that your title is “Leader of the Free World.” We would greatly prefer to deal with a “Leader of the Species,” or, as those who believe themselves to be the only inhabited planet would have it, the “Leader of the World.” Alas, in your case we cannot.

Perhaps your inability to become Leader of the World is in the title itself; perhaps it is due to your insistence that the nations you lead be free. You subject yourself to an official term limit, and position your nation as one among many equal nations in an alliance. This is not an effective way to be a leader nor to have an alliance. A leader is a representative of God: Our moral scientists can verify for you (and a proof is included in Appendix F) that your concept of “popular sovereignty” is not only logically incoherent but morally depraved.

This is only speculation. Your species’ moral failings fascinate the minds of many of the greatest scholars. We trust you to address them in appropriate fashion.

Following Up

We do not presume to do your job for you. Our greatest scholars cannot possibly reach the level of nuance with which you understand your own species. We shall, however, insist on working with you as opposed to others for this transition process, as you are God’s clear appointed representative to lead your species. If you do not respond with appropriate arrangements for a meeting, we shall arrange one for you.

Arrangements for communication are included in Appendix G. Please coordinate with whichever scientists brought you this message (the way it was communicated is fully explained in Appendix H), as they have clearly been Chosen to mediate between us. For a demonstration of our sincerity, however, please communicate with whatever scientists you trust most.

Conclusion

Peace be with you, from God through the Chosen species, the Great Trees, not as your world gives, but as our species alone, through God’s grace, can give.

Components of a Modern Operating System

2019-07-11T00:00:00+00:00

In previous posts, we discussed historic operating systems and where various OS features come from, but we only gave a brief overview of how they worked.

Now that we have a modern operating system’s full complement of features, we can look at what components need to exist in a modern operating system to get those features. As discussed with MS-DOS, an operating system, even today, is partially code, and partially conventions, like file formats or rules of good behavior – the difference being, that modern operating systems have more ability to enforce some of these conventions.

These conventions are still important. Linux is considered a version of Unix by the original authors of Unix — even though for legal and trademark reasons it is not — not because it has any code in common (it doesn’t), but because it follows the conventions of Unix.

So on our tour we’ll discuss both more concrete software components that are a body of code, and also conventions that hold the operating system together at various levels.

The Kernel

One big problem with the MS-DOS model is that a program could circumvent its interfaces. It could directly access hardware if it wanted to, without regard to the OS’s file system code, setting the file system conventions in stone. A program could install its own procedures to run when hardware events happened, its own interrupt handlers, and the system wouldn’t stop it.

This wasn’t really a limitation of MS-DOS per se, but of the 8086, the processor MS-DOS was designed for. If code is running on an 8086, it can execute any of an 8086’s instructions, no matter what. A more modern processor – including Intel’s later processors and therefore most of the processors MS-DOS ran on in practice – has a distinction between user mode and a supervisor mode, which will only allow hardware access to take place while the processor is in the supervisor mode (also known as kernel mode).

Application code, regular program code, will all run in user mode. A lot of operating system code can as well: How much code should be actually run in kernel mode as opposed to user mode is a complicated design decision. Certain instructions in the processor are only allowed in kernel mode, including those that control what memory is mapped, or currently accessible, those that install interrupt handlers, and those that control which pieces of hardware the processor is currently permitted to send data to.

In MS-DOS, all code was functionally in kernel mode – or more precisely, in a legacy mode of the Intel processor that emulated a time when the distinction didn’t exist, and all instructions were always allowed. A separate mode, referenced above, put the processor into a different legacy mode where it also acted like an 8086, but invoked special procedures whenever the program executed a privileged instruction, basically allowing MS-DOS to run inside a sandbox inside a larger operating system (I’ve used both Windows and Linux as the larger operating system in this model).

Unlike MS-DOS, a modern operating system will have controls on what is allowed to run in kernel mode, and everything else must run instead in user mode. The body of code that is intended to run in kernel mode is known as the kernel, or kernel code. If someone asks you what an operating system kernel is, this is the answer — the set of code that runs in kernel mode. It might be stored in multiple files, it might be all in one file, and it might be divided into internal components with different names, but that is what the kernel is.

So, if only the kernel can access hardware directly, and most code isn’t allowed to be in the kernel, then how does a normal application access the hardware? Well, instead of accessing it directly, the application must ask the operating system to do the thing on its behalf. Just as the operating system can install procedures as interrupt handlers, for the processor to trigger in case of hardware events, it can install system call handlers, procedures that run in kernel mode but can be invoked in user mode. These procedures will be designed to make sure that the user program in question is accessing the hardware in an acceptable way, and only perform the operation if it is allowed. Possibly, there will be no reasonable way for the program to even request an impermissible hardware operation.
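
To make that checking concrete, here is a minimal sketch in C. It opens a file read-only and then asks the kernel to write through that descriptor. The helper name `write_to_readonly_fd` is invented for this example; `open`, `write`, and `close` are the standard POSIX calls. On the Unix-like systems I am aware of, the kernel refuses the write and reports `EBADF`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only, then ask the kernel to write
 * through that descriptor. Returns the errno the kernel reported, 0 if the
 * write unexpectedly succeeded, or -1 if the open itself failed. */
int write_to_readonly_fd(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The system call handler runs in kernel mode and checks the request
     * against how the descriptor was opened. */
    ssize_t written = write(fd, "x", 1);
    int err = (written < 0) ? errno : 0;

    close(fd);
    return err;
}
```

Calling `write_to_readonly_fd("/dev/null")` never touches the device: the kernel rejects the request before any driver code is asked to write.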

This is a key distinction from MS-DOS, and even from older Mac operating systems: whereas all operating systems provide abstractions, those with an OS kernel can provide mandatory abstractions. This means that, if you want to support new features, you can change what the system calls do, and all programs will automatically adapt to it. If your file system is suddenly stored over the network, programs won’t get tripped up trying to access the hard drive directly. The operating system can insert itself at the level of the system call interface and redirect your request to the network instead, provided the system call interface is well-designed.
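
One way to see a mandatory abstraction from the program's side: the sketch below counts the bytes readable from a file descriptor without ever asking what the descriptor refers to. The helper name `count_bytes` is made up for this example. The same loop works whether the kernel backs the descriptor with a local disk, a network file system, or a pipe.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes available from a descriptor until
 * end of file. It never asks what kind of object the descriptor names.
 * Returns the byte count, or -1 on a read error. */
long count_bytes(int fd)
{
    char chunk[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n == 0)
            return total;      /* end of file, or the pipe was closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* a real error */
        }
        total += n;
    }
}
```

Because the program only ever sees `read`, the operating system is free to change what sits behind the descriptor.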

The Application Binary Interface

So let’s say you have a Windows program, and you want to run it on Linux. Or you have a Linux program, and you want to run it on macOS, which are both Unixes and have a better chance of being compatible. It won’t work — certainly not “out of the box.”

Why? Well, one reason is mentioned above. Different operating systems provide different ways of organizing the functionality of the computer into system calls. They provide different abstractions, which are nowadays mandatory.

For example, on Windows, different drives use different letters, and volumes shared over the network are also assigned letters, e.g. the famous C: drive, or A: for floppies, or X: maybe for a shared drive. On Unixes, different volumes — Unix doesn’t use the word “drive” as often — are assigned different mount points within the system. One volume might be /, and another at /home, and another /mnt/network, and it would provide the illusion of one unified hierarchical filesystem. Imagine if you had — as a simplified example — a system call to assign a drive letter to a network share. This would make sense with the Windows abstraction, but what would it even mean on Linux?

Another reason has to do with how programs are stored on the drive. Programs are not just a list of instructions for the processor. They usually have to be loaded at a particular address. Memory must be mapped for them to store their variables — and how much memory varies program by program. They have to load libraries of other procedures, which may be stored separately through dynamic linking in a shared library (Unix terminology, .so) or a dynamically loadable library (Windows terminology, .dll), which is also going to be mapped at a certain address in memory according to arcane rules.

Different operating systems have different binary file formats, or formats for storing programs (which are often called binaries when stored on disk, although everything a disk stores is in binary). Linux has ELF (Executable and Linkable Format, which can use DWARF to store its debugging information), Windows has PE (standing for portable executable, which falsely implies it runs on more systems besides just Windows). Different Unix varieties have different binary file formats; it’s something that evolves over time. Some operating systems (many operating systems, in fact) support several binary formats, for backwards compatibility, or for simulating other operating systems, or even for different types of programs or programs written in different programming languages.
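
These formats are easy to tell apart, because each starts with a recognizable signature. The sketch below guesses a format from the first few bytes of a file. The function name and the returned labels are invented for this example, and the Mach-O case (the format macOS uses, not discussed above) is my own addition. It only recognizes the 64-bit little-endian variant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess a binary format from the first bytes of a file. The returned
 * strings are labels made up for this example. */
const char *guess_binary_format(const unsigned char *head, size_t len)
{
    static const unsigned char elf_magic[4]  = { 0x7f, 'E', 'L', 'F' };
    static const unsigned char macho64_le[4] = { 0xcf, 0xfa, 0xed, 0xfe };

    assert(head != NULL);
    if (len >= 4 && memcmp(head, elf_magic, 4) == 0)
        return "ELF";
    if (len >= 4 && memcmp(head, macho64_le, 4) == 0)
        return "Mach-O";
    /* PE files begin with an MS-DOS "MZ" header, so this check alone
     * cannot distinguish PE from an old DOS executable. */
    if (len >= 2 && head[0] == 'M' && head[1] == 'Z')
        return "MZ (DOS or PE)";
    return "unknown";
}
```

A real loader checks far more than the signature, but the signature is the first thing it looks at when deciding whether it can run a file at all.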

The combination of the set of available system calls, the available libraries on the system, and the format of the binaries constitutes the main blocker to compatibility between operating systems. This combination is the ABI, or application binary interface: a term intended to sum up everything that needs to match for binary compatibility, the ability to run binaries (compiled programs as they are usually stored for running) from one system on another.

The Application Programming Interface(s)

There are other kinds of compatibility. Even though you can’t take the Windows version of a program and run it on macOS, we see plenty of programs that have versions available, right on their website, for both Windows and macOS. Similarly, most phone apps are available in both the iPhone and Android stores.

In some cases, that’s because there are two applications, written by different teams, that solve the same problem (and have the same branding) or interact with the same servers (which run on Linux and where all the complex stuff happens anyway). But in others, it is substantially the same program that is run on both systems.

In many cases, though, that’s because the versions were written sharing a lot of the same source code, with a layer of software interfacing between that and the specific operating systems in question. This might be because there were different teams (or people) who maintained compatibility layers proprietary to that company (this is what many traditional software vendors do and have done in the past). Nowadays, it is more likely because there was a programming language that has implementations available on both platforms, and versions of the same library functions available for each (which is what Java was originally famous for and what Python does today).

This is fairly common for relatively new programming languages, where the programming language was written after the operating system was already around, and where part of the point of the programming language is to support multiple operating systems for your programs. For programming in an operating system’s “native language,” so to speak (for programming in C on Linux or Objective-C on macOS), it’s a bit harder: An Objective-C macOS program is unlikely to be particularly portable to anything (except maybe iOS).

There are some exceptions to this. A program written for Linux can usually be made to run on macOS, because of their common Unix heritage. Even though Linux and macOS have different ABIs or application binary interfaces, they have very similar APIs, which stands for application programming interface (NB: This term means something different in a modern, web programming context). This means that, although they are not very binary compatible, they are source compatible, or close to it, which is to say, that there are few changes to the source code you would have to make to a Linux C-based program to make it work on macOS. It might be invoking different system calls with different identification numbers when you write the code to open and read a file, but that code looks exactly identical on both platforms, possibly something like this:

    // Simultaneously both Linux and macOS C code
    int file_descriptor = open(filename, O_RDONLY);
    ssize_t res = read(file_descriptor, buffer, sizeof buffer);
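
Fleshed out with the headers and error handling the fragment leaves out, it might look like the sketch below. The function name `read_prefix` and its parameters are invented for illustration. Everything it calls is plain POSIX, so the same source should compile unchanged on both Linux and macOS.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to `size` bytes from the start of a file.
 * Returns the number of bytes read, or -1 on any error. */
ssize_t read_prefix(const char *filename, char *buffer, size_t size)
{
    assert(filename != NULL && buffer != NULL);

    int file_descriptor = open(filename, O_RDONLY);
    if (file_descriptor < 0)
        return -1;

    ssize_t res = read(file_descriptor, buffer, size);
    close(file_descriptor);
    return res;
}
```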

As you might have picked up, this applies to only a subset of the functionality. Any GUI-related code would not enjoy this level of portability, since macOS and Linux have very different GUIs. More likely this is code intended to be primarily run on servers (and perhaps run on a Mac for testing), or code used by programmers (like git and other development tools designed to be run from the command line) or by scientists or other researchers (like the non-GUI components of Matlab and R or even Python).

The baseline API that all Unix-like operating systems have in common is called POSIX. Operating systems are certified as brand-name Unixes based on a bigger API specification, with more functions and more requirements, called the Single UNIX Specification (which grew out of the X/Open standards). That is to say, Unix is defined not by where the code originated nor by its ABI, but rather by its C programming API. To be clear, an operating system based on Linux could probably pass certification and become legally a Unix, but very few vendors have decided to spend the time and money to make this certification happen. It is the fact that it is as close as it is that leads many of the original developers of Unix to consider Linux “a Unix,” as it is this API that ties the Unix family together.
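
A program can ask the system which revision of POSIX it claims to implement. The sketch below uses `sysconf(_SC_VERSION)`, which is itself part of POSIX; the wrapper name `posix_version_at_runtime` is made up for this example. On a system conforming to POSIX.1-2008 the result should be `200809L`, though I would not rely on any particular value.

```c
#include <assert.h>
#include <unistd.h>

/* Returns the POSIX revision the running system claims to implement
 * (for example 200809L for POSIX.1-2008), or -1 if it claims none. */
long posix_version_at_runtime(void)
{
    long version = sysconf(_SC_VERSION);

    /* Sanity check: the value is either "unsupported" or a positive date. */
    assert(version == -1 || version > 0);
    return version;
}
```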

The Unix/Linux API is so important that Microsoft needed to add it to Windows and that macOS’s native use of it is considered a selling point, especially for developers. This is because a lot of server software and programmer tools assume this Unix API (as well as, for example, Unix filesystem conventions), or else assume Linux, which has few enough peculiar features that it rarely makes much of a difference. Most users are isolated from this, but anyone who has to write software to run on servers (which is most programmers) or use programmer tools (which is all programmers) is very keenly aware of this.

This Unix API is a core API provided by the operating system itself, the official, default way for applications to be written, but the other programming interfaces discussed above are also APIs. That is to say, Java comes with its own API that it brings to every operating system it runs on, leading to its once-famous “write once, run anywhere” slogan.

The most important API for application compatibility today is something irrelevant to most of this discussion though, and relatively new to operating system history. Most applications that run on your computer today run in JavaScript in the very controlled environment of a web browser. Part of what a web browser does is provide a stable, cross-platform (that is, multi-operating system) API for the portion of a web application that runs on each local computer. This interface is so important that many modern apps for phone and desktop are internally implemented as running inside a web browser, or something that resembles a web browser in more or fewer ways.

The System Library/Libraries

We spoke in the last section about the POSIX or Unix APIs. There are a lot of functions that a Unix-like operating system is expected to provide functionality for, in a lot of domains. Some, like opening or reading files, more or less have to be implemented as system calls, at least the most basic versions of them. Others, like calculating a square root, are simply procedures that run in user mode. Still others, like printing a number to the console, have to involve some system calls (to output text to the screen) but also some computation appropriate for user mode (to convert the number into a string of digits).
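
The number-printing case splits cleanly into its two halves. In the sketch below, `format_unsigned` is pure user-mode computation and `print_unsigned` hands the result to the kernel with `write`. Both names are invented for this example, and a real C library's `printf` is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* User-mode part: convert n to decimal digits in `out` (no terminator).
 * `out` must have room for at least 20 bytes. Returns the digit count. */
size_t format_unsigned(unsigned long n, char *out)
{
    char reversed[20];
    size_t len = 0;

    assert(out != NULL);
    do {
        reversed[len++] = (char)('0' + n % 10);  /* lowest digit first */
        n /= 10;
    } while (n != 0);

    for (size_t i = 0; i < len; i++)
        out[i] = reversed[len - 1 - i];
    return len;
}

/* System-call part: hand the digits to the kernel for output.
 * Returns the number of bytes written, or -1 on error. */
ssize_t print_unsigned(int fd, unsigned long n)
{
    char digits[20];
    size_t len = format_unsigned(n, digits);
    return write(fd, digits, len);
}
```

Calling `print_unsigned(STDOUT_FILENO, 1234)` performs the conversion entirely in user mode and crosses into the kernel exactly once.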

To provide these functions, Unix-like systems will provide their own version of the C standard library. On most Unix systems, this is maintained by the same organization that maintains the kernel, with Linux as the major exception. The set of POSIX APIs that a Unix will maintain is implemented through the standard library — some of them system calls, some of them implemented in user mode, and the programmer doesn’t have to care which.

In fact, between versions of the same operating system, and certainly between different operating systems, what used to be a system call might become a wrapper around a new, more advanced system call interface, where basically the library is providing compatibility with other versions of the same operating system. This is especially important in Unix, as there’s a lot of calls descended from different branches of the family tree with slightly different semantics, or subtleties of meaning, all of which are used by modern programmers, who can use whichever is more convenient to them or simply preferred.

The library enables source compatibility and API compatibility, even in situations where the kernel itself is much more particular about its system calls. The question is, where does the ABI compatibility layer go? On Linux, the kernel itself is responsible for making sure that its updates don’t break working programs, and its founding and lead developer Linus Torvalds is adamant and dictatorial (sometimes abusively so) about this rule. If you want a system call to behave differently, what you do in these situations is actually make a new system call that behaves the new way, and leave the old system call available at the old number in case a program wants to use it.

However, all modern operating systems support dynamic linking. This means that the libraries and the main program binary are stored in separate files, and the main program binary specifies the names of the functions it calls, rather than using numbers. If all programs use dynamic linking, and only call system calls through the library, you can update the library to use a different system call interface, and change the kernel along with it. This is what macOS requires: while it is technically possible on macOS to bypass the library to call a system call, the attitude is that, if you do that, you should not expect your program to work as expected. The operating system will still ensure it won’t break other programs, but will not guarantee your program to behave the same from version to version.
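
The two routes look like this in C. The sketch asks for the current process ID once through the C library wrapper and once by system call number, using the `syscall` function that Linux's C library provides. The function names `pid_via_library` and `pid_via_raw_syscall` are invented for this example. The direct route is only attempted on Linux, where the numbers are treated as stable; on other systems the sketch simply falls back to the library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asks glibc to declare syscall() */
#endif
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/* The usual route: call the C library's wrapper function by name. */
pid_t pid_via_library(void)
{
    return getpid();
}

/* The direct route: bypass the wrapper and name the system call by
 * number. Elsewhere than Linux, this sketch uses the library instead. */
pid_t pid_via_raw_syscall(void)
{
#ifdef __linux__
    long pid = syscall(SYS_getpid);
#else
    long pid = (long)getpid();
#endif
    assert(pid > 0);
    return (pid_t)pid;
}
```

Both functions return the same value. The difference is which layer the program has promised to stay compatible with.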

These are two vastly different approaches to maintaining ABI compatibility. In making the standard library part of the ABI, macOS doesn’t allow static linking, where all code in a process comes from a single file and a copy of each library is placed into the main binary when you compile it. It’s not only not recommended: by default, macOS will not even run statically compiled binaries. If you want to have an alternative version of the C library, you can’t. If you’re writing in another programming language that doesn’t work like C, you still have to go through the C library to talk to the operating system, which isn’t written necessarily with other programming languages in mind.

But, the kernel developers have the ability to control their system call interface better. If they want to add a new system call, they can make their old way of doing it call the new system call, and keep the kernel cleaner. This is important because all code in the kernel constitutes a greater level of vulnerability — if a kernel accesses unmapped memory, it’s generally a kernel panic (the Blue Screen of Death on Windows), but a user process will just crash with a segmentation fault. Or worse, if you exploit a vulnerability in the kernel and manage to manipulate it into doing something for you it wasn’t supposed to, it can literally do everything on your computer. This as opposed to a regular program, which still can only do things the kernel permits it to.

Linux, on the other hand, has more flexibility. You can have statically linked files, your own C library, or libraries specialized for other programming languages. You can avoid all the baggage that comes with its implementation of the C library functions that have nothing to do with system calls.

Honestly, my preference would be somewhere in between. I’d have a smaller library than libc — maybe libsystem — that every program would be automatically dynamically linked to. This would be for things that are usually implemented as system calls, or that were system calls in previous versions of the operating system. These would be things that any programming language might reasonably want to use. The more C-specific stuff would be relegated to its own, more general library. libsystem would be as simple as possible.

Libraries that form part of the main API and that are provided with any installation of the operating system definitely constitute part of the operating system. Libraries that come bundled with specific applications or that exist to do certain program tasks are not part of the operating system. Which count as core operating system functionality is up to the operating system vendor, but all operating systems come with at least some libraries, to abstract their austere system call interfaces into something that you can actually program against.

The Shell (Command Line)

All modern (non-mobile) operating systems come with a command line interface, whether on the desktop or on the server. When you type commands into the command line interface, it isn’t the kernel itself that reads the line you typed and decides how to proceed. Instead, a separate process does that. This process is key to the core job of an operating system (letting you run multiple programs and share resources between them) and therefore counts as part of the operating system, but is also not part of the kernel.

The concept of having the shell be a user process like any other was actually one of the early innovations of Unix over other contemporary operating systems. Before that, the kernel would often be responsible for this. By removing it from the kernel, Unix allows different users to use different shells, with different syntax for advanced features like scripting or running commands conditionally on the results of other commands. Even Windows has two shells now, traditional cmd and its newer “object oriented” PowerShell.
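
The heart of such a user-mode shell is small enough to sketch. The function below runs one command the way a Unix shell does: `fork` a child, `execvp` the program in it, and `waitpid` for the result. The name `run_command` is invented for this example. Real shells add parsing, pipes, job control, and much more on top.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments, wait for it,
 * and return its exit status (or -1 if it could not be run normally). */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        execvp(argv[0], argv);   /* replaces the child with the program */
        _exit(127);              /* only reached if the exec failed */
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The returned exit status is what makes it possible to run commands conditionally on the results of other commands. Nothing here requires kernel privileges, which is why anyone can write their own shell.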

All shells can run any terminal-oriented program, and usually can also be used as a starting point to launch graphical programs when the system supports it, i.e., when it’s a desktop OS and not a server OS.

The Shell (GUI)

Not all modern operating systems have GUIs. Remember that many computers are servers (or embedded devices) where you don’t actually sit at a monitor and keyboard — where they likely don’t even have a monitor and keyboard. But for those that do, the concept of shell can be generalized to the program from which you run other programs.

On macOS this is called Finder, and it dates back to the early pre-Unix Macintoshes. On Windows this is called Windows Explorer. On Linux, and other Unixes that share Linux’s user interface philosophy, there are multiple desktop environments available, each of which handles program launching differently, and each of which usually comes bundled with a window manager that draws decorations around your windows and allows you to minimize, maximize, tile or overlap them. This leads to a rich diversity of Linux systems in their appearance and casual use.

It is usually these graphical shells, these desktop environments, that form your mental image of what an “operating system” is. But that can be misleading. Linux can have one of many different graphical user interfaces — or none at all — and most of what makes Linux Linux will be the same.

So what about Linux, Android, and ChromeOS? Are they the same operating system then, because they all share the same kernel? Linux and Android differ at a deeper level than a shell. An Android program can’t be run on a normal Linux distribution without some layer to accommodate the additional libraries, and vice versa. The different desktop environments on Linux all tend to be compatible with X, a unified protocol for UI interactions, and the many command line shells all run the same set of command line utilities, but Android display is not done through X.

In the case of ChromeOS, the situation is different. The shell in ChromeOS is basically the Google Chrome browser, which is the same thing that on other platforms acts as a single program in a larger context. So many programs nowadays are run through the medium of the browser that it’s become more than a single program in practice — many people only open the browser on their computer and use that for all or almost all of their computer-oriented tasks: one tab open to GMail for their e-mail; one tab open to Twitter; one to Spotify, to play the background music; another to Slack to talk with their colleagues; and finally yet another to Google Docs to do the actual productive work of writing whatever it is they’re writing. Is Chrome a shell in practice on these other operating systems? Is it just an annoyance for some users that there is the taskbar to switch between multiple programs, in addition to the tab bar to switch between multiple websites? Google certainly thinks this is true for some users, and it is for them that the Chromebook is intended.

Father, Forgive Them

2019-06-20T00:00:00+00:00

Father, forgive them, for they do not know what they are doing.

Jesus, on the cross (Luke 23:34)

My grandfather always used to love telling a certain anecdote about Calvin Coolidge. He was a man of so few words that one time, President Coolidge went to hear a world-famous preacher preach. Upon returning from the sermon, his wife asked what it was about. He replied “sin.” Not satisfied with the answer, the wife asked, “Well, what did the preacher have to say about sin?” The response: “He’s against it.”

It was a running joke every time my grandfather came home from church — like many older members of our congregation, he tended to go to the shorter, more convenient Saturday evening service, and when he got home, my father would try and get a preview of what the Sunday sermon might be about. My grandfather’s answer: “Sin, and he’s against it.”

The joke is that all sermons are about sin. All the preachers are against it.

But as a child, I noticed there was something slightly off about this joke. At my church, the sermon often focused on topics besides the preacher’s, or even God’s, opposition to sin. Which is a good thing, because that’s not the focus of Christianity, either.

Trying to force people to become better is something we leave to the government and to the police. But even at their very best, when they are doing the most good they can do, they only do a superficial job. The police, at their best, can get people to not steal for fear of being arrested. Ideally, we want people to not steal because they know that stealing hurts people.

There are many people, in many traditions and faiths, whose religious beliefs can be summarized as “We must not do bad things, because otherwise God will be angry, and will punish us.” There are even some Christians who think like that.

But Christians should know better. Christians should know that the summary of the Gospel is not “punish the evildoer” or “God is a better, omniscient policeman” but rather “For God so loved the world, that He gave His only-begotten Son, that whoever believes in Him [ – whoever trusts in Him – ] shall not perish, but have eternal life.”

We have a God who loves us. We have a God who loves us so much that He talks about forgiveness while He is being murdered. This is a good message for me, because I have no trouble believing God exists. I do, however, regularly have trouble believing, as the Bible says, that God is love, that God loves me.

Now, I’d like to make clear that God is still against sin, that most preachers are still against sin – it’s just not the primary message God has for us. And this is where I’d like to move on to the second part of Jesus’s quote: “for they do not know what they are doing.”

Jesus was specifically speaking about the soldiers who, unaware that Jesus was the Son of God, were doing their jobs in executing Him. But more generally, He was talking about everyone involved in killing Him. And, even more generally, since Jesus specifically says that harm we do to each other is harm done to Him, I wonder if he wasn’t talking about all of us who harm each other, saying “Father, forgive them, for they do not know what they are doing.”

Now, when I started thinking about this, this shocked me, because it seemed like God was giving us an excuse. And I’m pretty sure God isn’t keen on excuses for hurting other people; Jesus said “if your eye causes you to sin, cut it out and throw it away,” showing that people who blame body parts for their behavior are putting the blame in the wrong place.

God doesn’t need to have an excuse to forgive us. God’s forgiveness doesn’t come from a place of “what they were doing wasn’t all that bad.” The horrible things that we, as human beings, each and every one of us do to each other actually are all that bad, as demonstrated by the fact that when a perfectly loving person comes into our midst, our reaction is to kill Him.

But I think it does say that, just as the soldiers didn’t know that the Person they were executing was the Son of God, we don’t fully process the consequences of our actions. We don’t fully see, and we forget, that the people we treat with disrespect are made in God’s image, that they are fully alive and conscious as we are. We don’t fully see, and we forget, what deep consequences our words can have.

And we need to do better. Not because God will be angry and unable to forgive us if we don’t, but because God does forgive us, and because other people matter. And, if we put our trust in God, he will forgive us, and transform us, not through fear, but through love.

I originally gave this as my portion of a series of meditations on the last “words” or statements of Jesus, when I was asked to do that at my church on Good Friday, 2017.

Experiences in Switzerland

2019-06-19T00:00:00+00:00

Just wanted to write up a summary of random notes from my Switzerland trip, not including the conference which was also a lot of fun but I think less interesting for my non-programmer friends, slash it might make for a better separate post.

SIM set up

It was relatively easy to buy a Swisscom SIM card in the airport, although they did not offer to set it up in my phone for me. This would’ve been useful, as it turns out my phone was locked (which is more an idiosyncrasy of the US as opposed to Switzerland). I instead ended up purchasing a mobile hotspot (the German word for which, I was told, was “Mobile Hotspot”), which was easy to set up and worked perfectly with my phone.

Bicycling

The bikeshare app I ended up installing was PubliBike. Bikes here are for some reason known as “Velos,” which I can’t find in any German dictionary but apparently is from the French word. There are many signs up all over the place informing you that you can’t leave your Velo on this wall or that railing, some of which include threatening pictures of them being taken away by a truck.

PubliBike works via Bluetooth, which was frustrating at first because I knew that my cheap Samsung, in addition to being locked, has bad Bluetooth support. I thought this was going to keep me from using the system at all, because I also didn’t realize when they said to hold your phone 20cm away from the lock, that you needed to make sure it was far enough away from the lock as well as close enough. I didn’t understand at first why they used Bluetooth rather than using the Internet and a code (like CitiBike does) until I read the troubleshooting website: it doesn’t require the Internet to work. Given that my mobile hotspot is often low on battery (is anyone who knows me surprised by this?), this seems to have been the correct decision.

It works by locking and unlocking the rear wheel. The stations just have the bikes up on kickstands — no docks, unlike CitiBike. I feel like that should be a good way to get 10% of your fancy new Velos stolen each night, but it seems to work. I guess they probably have security cameras in the actual stations. The wheel-lock system does have its advantages though: you can lock this partway along your ride to go into a shop or a public restroom (marked by WC everywhere). This was extremely convenient until it failed to re-unlock with my crappy Bluetooth, and I had to carry the bike 5 blocks on the street while being worried I looked like a thief.

I don’t know some of the biking laws here but it seems intuitive enough. The surprisingly big gotcha is that it’s harder to tell if streets are one-way here as opposed to New York City due to the relative lack of street parking. I didn’t realize how much I relied on which way the cars are facing to determine which way it might be okay to bike. It doesn’t help that often streetcars and bikes are allowed to go both ways on an otherwise one-way street.

One welcome difference is that everyone seems to be more okay with riding on the sidewalks here — certainly many of the bike paths are on sidewalks rather than on the road itself, and many of the sidewalks that don’t have a specific section for bike paths nevertheless have signs indicating that cyclists should use them.

Some street lanes are the same system, no sharrows but lines directing you onto the lanes, and many of them are too narrow for a car and a bike by NY standards, and yet cars will pass you with barely any extra room, which is scary.

Also, the rails for the “busses” are everywhere. Be careful and don’t try to move across them too slowly or subtly, or your wheel will get abruptly stuck, which is inconvenient, especially if you’re going quickly.

The PubliBike stations seem reasonably spaced, but I suspect they don’t go as far out into the outer regions of the city as CitiBike does. Then again, I don’t know how much outer regions Zürich really has compared to New York.

I’m trying to figure out if cycling is safer and a bigger deal here than in New York. There seem to be about the same number of cyclists around, but fewer people, so I think that makes for a higher proportion. I haven’t actually checked the data, but I was certainly happy to see a sign (confusingly on the outside of a bus) telling you to check for bicycles when leaving the bus. Hopefully they have them on the inside too — I wouldn’t know, I’ve been biking instead of using the bus.

Bathrooms

There are more public bathrooms than in NYC, and unlike in the US there are signs telling people where to find them for parks where it’s a common need. They are willing to charge for them when that helps with maintenance, although some park bathrooms were both free and clean. I wish NYC had more paid-for public bathrooms. I don’t want to have to pretend to want to buy a coffee, insult the proprietors by asking if they have a bathroom first, etc. when what I really want is a bathroom.

Fountains

Similarly, public drinking fountains are a great amenity. Another American I met at a bar did not realize you could drink from them — to American eyes they just look decorative. But they are in fact full of clean water, many on a different system from the main tap water system in case some attack is made that would render the main system undrinkable. I’ve seen plenty of people refilling bottles with them, and a few, like me, just drinking from them. The one closest to the train station has “TRINKWASSER” (drinking water) written on it in big letters, with appropriate accompanying iconography, I guess in case foreigners are otherwise confused.

The water runs continuously. This contributes to the impression that they’re decorative, as in the US you have to press a button to get fountain water. I don’t entirely know why this is in a place like NYC, as we’re not really short on water in any significant way. Certainly Zürich isn’t. “Wasting water” when you’re immediately next to a humongous clean lake isn’t a thing. If you didn’t drink it, it would have simply flowed to the ocean some other way.

Other Transit

The light rail/trams are called busses (Busse) which is how they act, except that they have little connections to a power source and rails that they go on. There are special traffic lights to announce their comings and goings.

The trains proper have a well-organized system. Tickets can be bought as valid for 2 hours or for 12 hours, and are checked randomly on pain of severe fines. Round trip tickets are impossible in such a system, and the reasons why are left as an exercise to the reader.

Church

I went to a Lutheran church on the Feast of Pentecost, and I should have remembered from church growing up that that was the traditional Lutheran time to do Confirmation. Lo and behold, there was a class of confirmands and the church was jam packed with proud parents and relatives. It was clear they were not used to having this many attendees, and I sat on an extra chair that had been brought out.

They said the more old-fashioned and literal “und mit deinem Geist” instead of some version of “and also with you” in response to the pastor’s exclamations of “The Lord be with you.” I don’t know what makes Americans so uncomfortable with this, but I imagine it has something to do with our obsession with informality and egalitarianism (“and with your spirit” is traditionally only said to an ordained person and it references the special presence of the Holy Spirit in ordination).

They seemed in some ways extremely low church (no one crossed themselves at any blessing, not even when one of the pastors explicitly made the sign of the cross over the congregation) and in some ways high church (the Words of Institution were chanted, the liturgy was crystal clear in affirming the Real Presence of Christ). I found this a bit confusing.

It was also difficult to tell how a normal service would have been. The Eucharistic Liturgy was definitely abbreviated (and also not published in the bulletin, and much to my dismay their version of the Lord’s Prayer was not exactly the same as the German version I knew), because of the confirmation. A lot of time was dedicated to the pastors giving personal messages to the confirmands, and some of the confirmands talking about what Christianity meant to them.

The sermon itself was also about Confirmation (with some reference to Pentecost as the birthday of the Church), about how God’s love should set you free from caring about what people think about you, with an illustration of that point being based on Justin Bieber’s song “I don’t care,” which was played over the sound system. The preacher explained that, taken with God instead of his “babe,” this was a perfect attitude to have with Confirmation.

Besides the random Justin Bieber, and one hymn where one of the pastors played a guitar¹, the music was traditional. The congregation sang competently but not amazingly, the musical notation was printed in the bulletin but apparently would normally be in hymnals (which in good Lutheran tradition also contained the liturgy in different musical settings and were presented as gifts to the confirmands), and it was accompanied by an organist and a recorder² player.

Cultural Issues

The big debate of the moment was over the “women’s strike” (Frauenstreik) that happened the Friday after I arrived. There was a debate about it in one of the magazines between two women. The criticism was that apparently, many of the signs at such events are anti-men, and that it seems overkill when women have plenty of rights and things are heading in the right direction. The woman who was in favor then talked about rape, and much to my shock, the woman who was against it said that rapes would go away if “people not fully integrated” i.e. Muslims would go away ³.

There was also an expert on happiness in another magazine who was asked, among other things, about some or another ranking that had listed Switzerland as in the top 5 happiest countries. His claim was that Swiss people were simply too conscientious to admit how unhappy they were, that they felt guilty for being unhappy as they were where they were in objectively one of the richest countries in the world.

I think a lot of this sort of thing must certainly influence happiness by survey. First world guilt is a real thing. And how much people lie on surveys or even how much they adjust their concept of what it means to be happy culturally must vary tremendously from culture to culture. Every time I see such rankings (usually used by Americans to idolize Europe), I wince and roll my eyes. It’s nice to see an expert on happiness similarly criticizing them.

Most of the discussion in the magazine seemed to be about feminism (like the US, but a different tone) and about the environment (and relatedly, about autonomous vehicles, the future of city planning, and bicycles).

Food

I mostly ate bratwursts that came with ridiculously dense bread, hotel breakfast with lots of eggs and bacon and a little bread, and only occasionally raclette, pasta, or pizza. Everything was tasty.

There was one man and one woman, and they worked together extremely smoothly, handing off portions of the service with complete synchronization. ↩︎
Blockflöte. Why on earth do we call this perfectly good instrument by such a stupid name? There was also an elderly man playing recorder beautifully while waiting for a bus. ↩︎
I don’t have the original German anymore. Sorry. ↩︎

Putting On Airs

2019-06-10T00:00:00+00:00

Julia liked Eric. She wasn’t in love with Eric, she didn’t fantasize about marrying him or idly think about what their children would be like, but she liked him, an appropriate amount for having met him only two times. Internet dating was strange to her, and she knew that dating took work. And besides, it was a good sign she was mature enough to not feel those goofier feelings yet. She would instead be, appropriately, cautiously yet earnestly excited.

He had invited her to his apartment to cook for her (she had requested burritos) and to watch a movie with her (she had requested Harry Potter). With those choices, even if there was no proper “click,” she could still ensure that the evening would at least be pleasant, and hopefully a more relaxed atmosphere, with a bit of wine, would get them to have a smoother rapport and a more natural chemistry. First dates are always awkward, but Eric had potential!

Julia arrived at the Union Square train station, which was the stop for his apartment. She used to work near this stop, but had never met anyone who actually lived in this neighborhood before. Eric had a job in finance, and so could afford such things. It must be nice! Julia wasn’t ready for that yet. She still had roommates. She still felt like a child pretending to be an adult — not that Eric would know that. Although, she supposed, the Harry Potter might have been a hint.

The doorman was very tall, hunched over a little, hovering over everyone. She went up, trying to look up the apartment number in her text history, but the doorman preempted her: “You must be Julia. He’s on the 9th floor. Have a nice day!” Julia blinked a few times and went into the elevator, realizing afterwards that she probably should’ve said something to him in response, wondering if it was common for people to pre-register their guests like that.

It wouldn’t really surprise her if that were the case. She always felt strange in luxury buildings. The hallways always made her feel more like she was in a hotel than in a place where people live all the time. Every time she entered into one, she felt like she would immediately be exposed as an imposter, trying to pass as a yuppie without proper credentials, like they had a magic passport or something. She supposed that passport was money.

She eventually managed to find the apartment number on her phone — 9D — and with slightly more work had also managed to find the apartment door.

She rang the doorbell, and tried to clear her mind, but Eric was there before she could get her thoughts properly in order. He ushered her in, and the first thing she noticed was how clean it was. She tried to be an organized person, and he was blowing her out of the water. To see such cleanliness in a man was downright suspicious, and she said so.

Eric laughed and said, “The benefits of having a cleaning service! It comes in handy, keeps things presentable. I suppose you’re less impressed now that you know that this isn’t the real me.”

Julia blinked, and didn’t say anything.

“Here, sit down,” he said, sweeping his arm in a wide arc that seemed to cover the entire apartment until finally it settled on a quaint, tasteful wooden table. She pulled up a chair and sat, facing the kitchen counter which had several plates with burrito fillings on them.

“Here we go,” he said, as he moved each plate, one at a time, over to the table. “And time for plates, plates, plates…” He opened three different cupboards, while continuing to mutter the word “plates,” until finally the third one had a pile of plates in it, which he took two of and also set on the table.

Julia smiled, and said, in a tone she thought was teasing, “Not sure where your own plates are?”

“Yeah,” said Eric. “Cleaning lady reorganized my kitchen recently. Much better-looking, downside is I don’t know where anything is anymore!” He laughed nervously, and looked like he was sweating.

“Do you have any wine?” Julia asked.

“Oh, of course,” said Eric, and he darted into the living room only to return a few seconds later back into the kitchen and open up the first cupboard where he’d tried to find the dishes. “Here it is! Now, bottle opener, bottle opener…” he continued to mutter as he pulled open two different drawers.

Eric must really need his cleaning lady, Julia thought, and this level of disorganization bordered on a problem. She had thought he was a little bit absent-minded, but she hadn’t thought it had reached this extent. Ah, well, she’d need something to distract from it. “Can we get the movie going too? I’m ready to watch as we eat.”

“Oh, of course,” said Eric, in exactly the same tone of voice, and walked over to the TV. He grabbed a remote, pointed it at the TV, and nothing happened. “That’s strange,” said Eric, staring at it. “Give me a minute.”

Eric tried different remotes in different combinations while Julia realized she needed some utensil to put her burrito together. She went to the drawer she thought she had seen utensils in before, and saw a brochure.

“Temporary Apartment Service,” it said. “Don’t want to bring a date or host a party in your place? Is it falling apart or too small to turn around in? Host in our temporary, fully furnished apartments! Meals also catered and pre-cooked on request.”

Julia blinked. She didn’t quite realize what it meant at first, until she heard Eric saying, “Ah, I figured out the TV! Come on!”

“How long have you lived here?” Julia asked.

“About a year! What a great building, right?” responded Eric.

Julia remembered that he had described it as brand-new on her second date with him, and as he again turned towards the TV and began to fiddle with yet another setting, she picked her bag back up, went out the door, and closed it behind her. When she got downstairs, the doorman smiled at her and shook his head. “Eric went all out for the nicest unit this time,” he said. “I always figure, you can’t fake your way through life. But then again, I suppose everyone always is trying.”

Operating Systems Part II: Modern Operating Systems

2019-05-26T00:00:00+00:00

We use operating systems all the time in our life, whether designed for a computer, a phone, or for a server we’re more indirectly interacting with, but a lot of people don’t know very much about what connects the different systems we use, and what makes them distinct. We discussed fundamental concepts of operating systems in the last post, so in this post we will discuss how some of the same concepts apply to modern operating systems, going over them one at a time.

macOS

Unix moved on from controlling dumb terminals to having several graphical user interfaces. When Steve Jobs was fired from Apple in 1985, he started a company called NeXT to develop NeXTSTEP, a version of Unix with graphical user interface ideas, some from his work with the Macintosh, some developed independently.

When Apple was struggling to bring its operating system into the modern era, when Mac OS System 9 was still using cooperative multitasking, Apple bought NeXT and brought Steve Jobs back into leadership to turn NeXTSTEP into the next version of Mac OS, then called Mac OS X for the Roman numeral 10. In spite of superficial similarities to previous versions – the NeXT interface was changed to look more like previous Mac OS systems – and application compatibility (which was bolted on by running Mac OS System 9 as a single process within Mac OS X, which shows how much more sophisticated Mac OS X really was), the new version was completely different software descended from the original AT&T Unix.

It used to be common wisdom in some IT-savvy crowds (including a Best Buy salesman in my hometown when Mac OS X first came out) to claim that Mac OS X was a version of Linux, but this is not true. Linux is one of many operating systems that come from the Unix tradition, and Mac OS is a different one, sharing much of the Unix core instead with FreeBSD, a much less common version of Unix descended from the version developed at UC Berkeley (BSD stands for Berkeley Software Distribution).

For “desktop” computers, including laptops, macOS is now by far the most installed brand-name Unix operating system, and even if you include Linux in a broader category of Unix-like operating systems, it still is the most popular one on the desktop.

This is in spite of the fact – or perhaps because of the fact – that macOS doesn’t really emphasize its Unix “underpinnings.” Its graphical user interface is proprietary to Apple, and there are often macOS-specific libraries that circumvent or supersede equivalent Unix ones, especially when focusing on the GUI applications.

Apple also doesn’t invest a lot of resources into making its command line interface friendly or powerful. Most other Unixes make it easier to install new applications and frameworks via the command line, and on macOS the command line is not particularly well-integrated with the graphical interface, to the point where it sometimes seems like the GUI is next to Unix rather than being built on Unix.

Finally, strangely for a Unix, Apple does not provide a server version of its operating system, making it difficult for software developers for Macs to be able to run server-side tasks like bulk automated testing on the same environment as their workstation.

iOS, watchOS, etc.

iOS, watchOS, and their ilk are locked-down versions of macOS. Unlike on macOS, each application is locked into its own directory and can only access its own files, rather than being able to access any files owned by the current user. The security features of Unix are applied to isolate applications from each other rather than users, and the user doesn’t really see the concept of the file system — instead, each app simply remembers information for the user, and presents how it’s organized in its own way.
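
The “each application is locked into its own directory” rule can be pictured as a simple path check. What follows is only a minimal Python sketch of the idea, not Apple’s implementation (the real enforcement happens inside the operating system itself); the `/apps/notes` layout and the `is_inside_sandbox` name are made up for illustration.

```python
import posixpath


def is_inside_sandbox(app_root: str, requested: str) -> bool:
    """Return True if `requested` ends up inside the app's own directory.

    `app_root` is a hypothetical per-app directory such as "/apps/notes".
    """
    root = posixpath.normpath(app_root)
    # Relative paths are joined onto the app's root; an absolute path
    # replaces it entirely. normpath then collapses "." and ".." so an
    # app cannot climb out with "../other_app/secret".
    full = posixpath.normpath(posixpath.join(root, requested))
    # Compare against root + "/" so that "/apps/notes2" is not mistaken
    # for something inside "/apps/notes".
    return full == root or full.startswith(root + "/")
```

With this check, a notes app asking for `drafts/todo.txt` would be allowed, while the same app asking for `../mail/inbox.db` or `/apps/mail/inbox.db` would be refused, even if the same human being owns both sets of files.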

Since only one application is visible at a time on many of these devices, this gives it a feel similar to an old single-tasking operating system, where each application is more its own universe. Since they don’t visibly share a file system, the applications also interact less with each other.

The scariest thing about these operating systems is that they’re set up to protect the owner of the device “from themselves.” Only Apple-approved applications can be installed unless you jailbreak the device, which voids the warranty. Apple constantly lobbies for jailbreaking to be made illegal, they claim for the users’ protection and to prevent users from illegally copying apps, but also because they get a huge cut of all sales done through iOS apps, which Spotify claims is against European law.

Open Source and Linux on the Desktop

The open source movement, and its more opinionated cousin the free software movement, believe, to various extents, that it is valuable for software to be open source (or alternatively phrased free as in speech). This means that anyone can read the source code to the software, the version of it that is human readable and editable by actual programmers. It also means that anyone can make modified versions of it, and publish them, usually with different branding. Some open source/free software licenses require those modified versions to also be open source, while others allow them to be proprietary, but in all cases, the fundamental nature of open source software is that anyone can make their own version (given sufficient programmers and time).

Linux (sometimes called GNU/Linux because Linux technically only refers to one part of the operating system, the kernel) is an open source reimplementation of Unix. It organizes software in the same way that Unix traditionally would, is written so that Unix programs can treat it as yet another version of Unix (of which there were already many incompatible versions), and follows the design of Unix function call by function call, command by command.

Linux is a really big deal on the server, and as a component of the Android operating system, as we’ll discuss later. It also is usable as a desktop operating system in its own right. It inherited a graphical user interface framework from Unix, known as the X Windowing System or X Windows, and the open source movement inspired a lot of work writing desktop environments within that framework, so that there could be an entire modern desktop operating system that was open source.

Throughout the 90’s and 2000’s, many Linux enthusiasts would hope that someday, a completely open source operating system could reach common use. Articles would be written claiming this was imminent, to the point where it became an easy-to-mock cliche: “This is the year of Linux on the desktop!”

Ultimately, though many companies tried, no one succeeded in arranging for it to be pre-installed on mainstream desktops or laptops nor in polishing it enough to convince the normal user to install it over what their computer came with. It is now a mostly-usable operating system, should you choose to install it on your computer or buy a computer with it pre-installed (which is an option some manufacturers now market towards software developers). It is very well-suited for programming for reasons we’ll discuss later, but still a bit awkward for things like setting up Bluetooth or getting interesting features to work.

Windows NT, XP, etc.

The history of Windows is intricate and arcane, and as a result, the Windows 10 of today has virtually no code in common with the Windows 3.1 discussed above. Similar to macOS, the Windows brand at some point was switched out with a better operating system implementation, although in Windows’s case, that implementation came from Microsoft’s “workstation” or “business” version, Windows NT.

Windows NT first came out shortly after Windows 3.1, and to avoid having a Windows NT 1.0, which might sound less sophisticated than the existing Windows 3.1, the very first version of Windows NT was called Windows NT 3.1. It was based off of OS/2, a failed collaboration between Microsoft and IBM to render MS-DOS obsolete, and it did not boot off of MS-DOS nor use MS-DOS as a layer.

Windows NT was designed from the beginning to support programs designed for other operating systems. For more sophisticated operating systems, programs have to go through the operating system to access hardware, by invoking procedures that invoke operating system code, and different operating systems provide different procedures. Based on what program you were running, Windows NT could support many sets of procedures (also known as APIs, but distinct from what API means on the web), which it called personalities.

Windows NT had from the get-go a personality to support Windows 3.1 versions, a 32-bit personality to support new Windows NT programs, and a personality to support MS-DOS (which involved much more machinery to give the program the illusion of more direct hardware access). It also originally came with personalities for Unix and OS/2, which eventually were removed.
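
To make the idea of personalities a little more concrete, here is a toy Python sketch. It is not how Windows NT is actually built; every name in it (`core_write`, the procedure tables, the `call` helper) is invented for illustration. The point is only that several different sets of procedures, each with its own naming and conventions, can all sit on top of one shared core.

```python
def core_write(text: str) -> str:
    # Stand-in for the single underlying service that every
    # personality ultimately relies on.
    return f"[kernel] wrote {text!r}"


# Three hypothetical personalities, each exposing the "write some text"
# service under its own name and with its own calling convention.
def win32_write_file(handle: int, text: str) -> str:
    return core_write(text)


def posix_write(fd: int, text: str) -> str:
    return core_write(text)


def dos_print_string(text: str) -> str:
    # The classic DOS print-string call took a "$"-terminated string,
    # so this personality trims the string at the first "$".
    return core_write(text.split("$", 1)[0])


PERSONALITIES = {
    "win32": {"WriteFile": win32_write_file},
    "posix": {"write": posix_write},
    "dos": {"print_string": dos_print_string},
}


def call(personality: str, procedure: str, *args):
    # A program only sees the procedures of the personality it was
    # written for; asking for another one's procedure raises KeyError.
    return PERSONALITIES[personality][procedure](*args)
```

In this sketch `call("win32", "WriteFile", 1, "hi")` and `call("dos", "print_string", "hi$")` both end up in the same `core_write`, while `call("posix", "WriteFile", 1, "hi")` fails, because that procedure simply does not exist in that personality.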

As Windows NT supported traditional Windows programs as a personality, Windows and Windows NT co-existed for a long time. Windows 95, 98, and Millennium were versions of Windows that still used MS-DOS as part of their structure and which did not attempt strong security or rigor (though they did adopt preemptive multitasking), while Windows NT 4.0 and Windows 2000 (aka NT 5.0) were versions of Microsoft’s more sophisticated operating system, that could more or less run the same programs but focused on stability and workplace use (with the presumption of professional IT people), rather than Microsoft’s maniacal obsession with application support and its easy-to-use brand.

Eventually, in Windows XP, they made the switch. They risked worse compatibility with really old applications (after all, the operating system was completely switched out under the hood) in order to push everyone towards their more modern operating system. Windows XP was internally Windows NT 5.1 (and remember that Windows NT 3.1 was the first one because it borrowed its number from the other OS called Windows), and it replaced Windows 98 and Millennium as Microsoft’s flagship consumer OS.

Now, they don’t have to maintain two completely different operating systems anymore. Their server OSes are still distributed separately, but that is mostly for licensing and configuration reasons – it’s the same fundamental OS with different features enabled and different auxiliary programs shipped. All in all, Microsoft has a simpler tech architecture now that they’ve pushed everyone towards NT.

This is a good place to clear up a common misconception: the Windows command line, in modern NT-based Windows, is not a version of MS-DOS. It is only related to MS-DOS aesthetically: It has a similar look to the prompt (C:\>, C:\WINDOWS\>), and similar commands to do similar things (dir to list files instead of Unix’s ls). It is simply the Windows command line.

Furthermore, support for MS-DOS binary compatibility was finally dropped with the transition to 64-bit computing, not because Microsoft wanted to, but because that would require a processor mode that AMD (and therefore Intel) decided not to support in their hardware.

You can’t, on the AMD64/Intel64 platform, have a 64-bit operating system and a “virtual 8086” mode process, where the processor would have to pretend to give you full control over the computer and pretend to be an ancient MS-DOS-era computer while also giving final say to the real 64-bit operating system. Intel32 supported this for 32-bit OSes and 16-bit MS-DOS compatibility, but I suppose the processor manufacturers thought the 64-bit vs 16-bit compatibility bridge was just a bit too far.

Microsoft Windows’s Monopolistic Market Dominance and the Open Source Movement

In the 90’s and 2000’s, Microsoft had a lot of power through Windows. It constituted a monopoly on consumer operating systems, and people were scared to run other operating systems, because application compatibility was a big deal. Only major application vendors had the resources to support two operating systems (which was much harder in those days), and so having a different operating system (especially an ill-supported open source operating system like Linux) could cut you off from the rest of the computing world.

Microsoft used this power to control the application market, because any application it bundled with the operating system would drive any competitor out of business. It did this any time it thought an application was interesting, including writing its own web browser that drove Netscape out of business, finally attracting a lawsuit that almost split Microsoft into multiple companies. When that didn’t happen, it looked bad for the computer industry.

Microsoft also had corrupt relationships with computer manufacturers. Deals were signed where the hardware vendors would have to exclusively install Microsoft Windows on their computers, or else pay Microsoft based on how many total computers they sold rather than how many came with Windows. This meant that Microsoft didn’t actually have to improve Windows to compete; they could just rest on their laurels due to their shrewd and blatantly illegal business dealings.

At that time, it seemed like the only way to break Microsoft’s competitive hold was compatible, open source alternative versions of everything. OpenOffice was written to try to be an alternative to Microsoft Office, but it was a non-starter unless it could read and write Microsoft’s proprietary Office file formats. Similarly, Mozilla Firefox, the first web browser to erode Internet Explorer’s hold on the web, only worked on many sites because it used to be configured by default to tell web servers that it was Internet Explorer rather than identifying itself honestly.

The crown jewel of this effort would have been working compatibility with Windows programs on another operating system — at the time, that was often seen as the only hope for breaking Microsoft’s monopoly on operating systems. Two efforts co-existed in that regard, Wine and ReactOS.

Wine was the more serious effort, which would have allowed Windows programs to run unmodified on Linux, including Microsoft Office, which was the only program that could perfectly read Microsoft Office documents. Wine would provide Windows applications with a personality, like Windows NT had, where they could call Windows’s library functions, and have them translated into the equivalent series of Linux library function calls to get their work done.
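To make the idea of a translation layer concrete, here is a minimal sketch in Python. It is not Wine's code and not the real Windows API: the names DRIVE_ROOTS, to_host_path, and write_text_file are made up for illustration, and the only thing being "translated" is a Windows-style path into a host path before the host's own file calls do the work.

```python
import os
import tempfile

# Hypothetical mapping from Windows drive letters to directories on the host.
# A temporary directory stands in for the pretend C: drive.
DRIVE_ROOTS = {"C:": tempfile.mkdtemp(prefix="drive_c_")}

def to_host_path(windows_path: str) -> str:
    """Turn a path like C:\\docs\\letter.txt into a path on the host system."""
    drive, _, rest = windows_path.partition("\\")
    root = DRIVE_ROOTS[drive.upper()]
    return os.path.join(root, *rest.split("\\"))

def write_text_file(windows_path: str, text: str) -> None:
    """Stand-in for a Windows library function: the application passes the
    arguments it is used to, and the layer carries them out with host calls."""
    host_path = to_host_path(windows_path)
    os.makedirs(os.path.dirname(host_path), exist_ok=True)
    with open(host_path, "w") as f:
        f.write(text)

# The "Windows application" only ever sees Windows-style paths.
write_text_file("C:\\docs\\letter.txt", "Dear Sir")
```

The application never learns that its C: drive is really a directory somewhere else; a real compatibility layer has to keep up that illusion across thousands of functions rather than one.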

ReactOS was fascinating to me at the time because it attempted a complete open source reimplementation of Windows NT. Programs running on ReactOS would act like programs running on Windows because the operating system was designed from the beginning to act like Windows.

Neither of these projects gained enough stability to be used in any production setting. What ultimately lessened Microsoft’s stranglehold on power was the fact that nowadays, it’s not really relevant for most applications what operating system you use, because applications have transitioned to the web for deployment.

Nowadays, when you want to do something new with your desktop or laptop computer, you don’t install a new application (although interestingly, you still do with your phone). Instead, for the most part, you go to a website, whether for matchmaking services, communicating with people through many different means of communication, or ordering food to your apartment. The local program you buy at a store, or even download over the Internet, has been obsoleted by just going to a website, where you don’t even need to install anything. And as a result, Microsoft’s biggest stranglehold was eroded from a direction they barely expected.

They tried to hold on, as long as they could, by making their web browser, Internet Explorer, the standard web browser, and encouraging websites to use Internet Explorer-specific features. Eventually, Firefox was compatible enough with Internet Explorer to break through that monopoly and force Microsoft to update its browser, which led to the current situation, where Chrome is becoming the new monopolistic web browser and now it is Google that is close to single-handedly controlling our primary platform for deploying applications.

Linux and Unix on the Server

I mentioned before that Linux and macOS were both popular among developers. Linux certainly allows a lot of customization, and you could see how that would be appealing for advanced users like many developers are – but that doesn’t really explain the popularity of macOS, which is the opposite.

Really, Linux and macOS are popular among developers because they are Unixes. Unix — and Linux, which is now basically the best Unix for most tasks — never waned in popularity in the minicomputer space, which evolved into the server space. When you are running a server, having a powerful (and programmable) command line is a huge plus, and not having a smooth GUI experience or drivers for every consumer device is a non-issue. Linux is the de facto standard for server operating systems now, and when developing applications to run on the server (like the server side components of any web application, including Facebook, Twitter, GMail, and more or less any you can think of), it is useful to have a match between what you run on the server and what you run on your personal computer.

macOS provides a close enough match to Linux servers to be useful for development. Most Linux software also runs on macOS, because of their shared Unix heritage and continuing efforts to keep compatibility. The compatibility isn’t perfect, and many programmers like the flexibility that comes with Linux (and don’t mind the inconvenience), and so Linux is also popular among developers as a client OS.

Windows is actually actively trying to catch up with macOS in this domain; it has introduced the Windows Subsystem for Linux, an NT-based personality that allows Windows to run Linux programs, unmodified. This is an impressive technology marketed at developers and used for practical applications by many people I know.

What is a server?

What does a server do? It waits for incoming connections from other servers and from client computers like your laptop or phone, and responds to requests. It stores your data in databases and file systems, and does the heavy lifting that needs to be done by a more powerful computer than you really need to have in your own home. We interact with servers every time we use a web browser or an e-mail client, and most phone apps and games have a server-side component — certainly if they involve coordination with other people and other phones!
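As a rough illustration of "waits for incoming connections and responds to requests," here is a minimal sketch in Python using the standard socket module. The function names handle_one_request and demo are my own, the "request" is just a few bytes, and the "work" is simply upper-casing them; a real server would loop forever and handle many clients at once.

```python
import socket
import threading

def handle_one_request(listener: socket.socket) -> None:
    """Wait for one incoming connection, read a request, send a response."""
    conn, _addr = listener.accept()      # blocks until a client connects
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())    # the "work": shout the request back

def demo() -> bytes:
    # Bind to port 0 so the operating system picks any free local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    # Run the server side in a thread so this one script can play both roles.
    server = threading.Thread(target=handle_one_request, args=(listener,))
    server.start()

    # The client side: connect, send a request, read the response.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        response = client.recv(1024)

    server.join()
    listener.close()
    return response
```

Calling demo() plays both roles on one machine; in practice the client half would be your laptop or phone and the server half would be running in a data center.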

As the “cloud” grows as a concept, more and more of our computing is done on servers owned by big companies. We store our documents and spreadsheets on Google Drive, keep our contact information on iCloud, or let our photos be saved on Instagram. All of these services use Linux to power the servers that actually store the data and provide it to us in an organized and secure way.

Android

As mentioned earlier, Linux is technically only one component of the operating system called Linux (or rather the family of operating systems, because many companies and organizations leverage its open source nature and distribute their own Linux-based operating systems, and there is no one official complete distribution), namely, the kernel. The kernel is the portion of the operating system that runs in a privileged mode on the processor, which forces the applications to go through it rather than access the hardware directly (as on MS-DOS).

Android uses the Linux kernel — but nothing else from the operating system commonly called Linux. Like iOS, it uses its kernel in an idiosyncratic, locked-down way — not quite as locked-down as iOS, but much more locked down nevertheless than any desktop operating system.

Android is open source, but you need to pay Google to use their app store and standard apps and brand. Off-brand Android can only be used in practice by companies rich and powerful enough to build out their own app store, like Amazon. Being able to run Android apps would be a relatively easy way for another mobile OS to gain a pre-existing developer base.

ChromeOS

And Google somehow, after writing Android, wanted yet another Linux-based operating system. ChromeOS, popular in American public schools like Mac OS was in my school days, is exactly what it sounds like: a laptop operating system where you just run Google Chrome. With so many apps in the browser anyway, what’s the downside?

In a ChromeOS context, from a user’s point of view, you begin to wonder what the difference is between a browser and an operating system, really. An operating system lets you run multiple applications, but now those are just different browser tabs. Who cares whether the Linux kernel or Chrome itself is the piece of software that separates the applications from each other? From the user’s perspective, it’s all the same.

If you unlock the developer mode, you get a somewhat dumb version of “Linux on the desktop,” with a Linux command line interface. This is convenient for people who only want to use the web and log into remote servers, which is a surprisingly large demographic.

Music and Lyrics

2019-05-12T00:00:00+00:00

I just finished singing Beethoven’s Missa Solemnis in a concert as a member of the Grace Church Choral Society, and it was the most technically difficult piece I have ever sung in a choir. It was a single piece of concert length, a mass setting, as is custom for our spring concerts. It was all in one language: in this case, in Latin. This is different from our holiday concerts in the winter, where we sing a variety of Christmas-y and otherwise celebratory works in a variety of (European, Christian) languages, including English.

Now, I can translate every word of the ordinary of the mass, which is the term for the hymns that are sung at every mass, as opposed to those that are proper to a particular occasion. This is partially due to the Latin classes I took in high school and college, and also partially due to the fact that the same set of texts have been set to myriad different musical settings by many composers, but primarily due to the fact that the same prayers are used in even modern English-language recensions of the Western Rite, among not only Roman Catholics but other liturgical western churches, so I’m actually singing hymns that I’ve sung or said my entire life in English.

Given this, I was a little surprised when a dear friend of mine was disappointed to learn that this concert would have no components in English. I pointed out that the translations would be available in the program, but this didn’t really change her opinion. I was a bit taken aback by this opinion: I had felt at the time that the point is the music, not the words, and that can be gotten without understanding the words.

This was an unfair position for me to take, since I do understand the words, and thinking about the meaning of the words, I later realized, was a key part of my experience singing in the concert and in rehearsals, to the point where I often am tempted to cross myself at those points where, liturgically, I would cross myself if it were an actual mass. Furthermore, the meaning of the words is something we discuss at the rehearsals: the word “descendit,” meaning “he came down,” is set by Beethoven to descending intervals, and the word “ascendit,” meaning “he went up” or “he ascended,” is set to ascending scales.

My reaction to this is usually that it’s good music and good words but this literal alignment of words to musical phrases is a bit trite and overwrought. I say “usually” because our director John Maclay did something in one rehearsal for this concert which has changed my mind about this: he sang a section of the piece in English translation. “For us humans, and for our salvation,” he sang, and all of a sudden it felt not trite but completely poetic and integrated and fitting that the words match the vibe of the music. “He came down,” he sang, and the descending interval brought out the meaning of the text and the text gave context to the music. “From heaven,” he sang, and the triumphant high notes matched the words so well I felt like I could see the angels themselves in their perpetual heavenly worship.

I was flabbergasted. These motifs that I remember having dismissed as trite suddenly seemed deep and fitting when I heard them in my own native language. And I realized that if I were more skilled and practiced in Latin — as Beethoven and his contemporaries certainly were — I would have felt similarly without any such device. No wonder my friend prefers the songs in English! It’s not because she’s not paying attention to the music, but because they’re supposed to go together as an artistic whole. I realized this before, but had not as fully appreciated it until we had done this exercise.

For the rest of the rehearsals and for the concert, I tried to not only think about the meaning of the words, but imagine how they would sound to me if they were sung in English.

And from this I got a number of insights. I realized why for one portion of the “Credo,” an ancient recitation of Christian beliefs, the word “credo” (“I believe”) was sung repeatedly in the background while a list of beliefs was sung quickly. Instead of meditating on each individual belief, the effect was something along the lines of “look at all these dogmas I also acknowledge,” an appropriate measure for the section that mostly discussed the church and her rites, giving, at least in my view, the message of “I believe these things not because I fully understand them but because the church says so, and I believe them fervently.” It was a type of emphasis that seemed to underscore and lend earnestness to the text, to an extent that I imagine might have made non-Christian members of the audience feel uncomfortable if it were in English.

These insights might perhaps seem obvious to anyone who has enough Latin to know what’s going on and enough knowledge of history to know what Christianity and music were like in Beethoven’s time, but they were new to me, and they seemed profound. So I understand now better why some of my friends prefer more of the concert in English, and realize that, even though I know enough Latin that I could translate the whole concert for you, even I would benefit from singing in a language I had an actual fluency in. No wonder the Protestant Reformers were so interested in having church in the local vernacular!

What is an operating system?

2019-04-28T00:00:00+00:00

A user of modern technology hears the term “operating system” thrown around a lot. Most people can name a few examples: Windows and macOS on workstations and laptops, iOS and Android on phones. Some people might even throw in Linux or Unix or ChromeOS. Most people also understand that a program or a game or even a sufficiently advanced website might work on some operating systems but not others, and might require different versions for different operating systems. But it’s a bit less clear what an operating system actually is, how it fits into the general model of a computer, and how it works.

This isn’t surprising, because “operating system” is a bit of an amorphous concept. Is it a type of program? It’s certainly different from most programs we think of!

It wasn’t my idea to ask this question. I listened to a talk recently by the lead programmer on a project to develop a new operating system, and he spent at least the first quarter of the lecture and many slides trying to come up with a workable definition that jibed well with most programmers’ and users’ intuitions. [Edited to add: It was Bryan Cantrill, who brings this up in multiple talks. I am unsure which one inspired this.]

But now that I’ve heard the question posed, I feel compelled to try to answer it. So, to explore this concept, I’m going to talk about a lot of operating systems from history. These aren’t going to be the operating systems that invented the models in question, but rather typical examples of those models, especially very popular operating systems of their era and ones that were direct predecessors to popular operating systems today. All of the fundamental technologies discussed pre-date the operating systems I discuss to typify them.

Computers Without Operating Systems

To see what an operating system is, and why we might want one, let’s imagine a computer without an operating system, or perhaps with a very minimal operating system. Such computers once existed; people my age or older might remember the Apple II or the Sega Genesis. A more recent example might include earlier versions of the Game Boy. These computers (and a game console is a type of a computer for these purposes) could only run one program at a time; if you wanted to run a different program or game, you had to turn the device off, insert a new floppy or cartridge, and turn the device back on again.

The same physical machine took on an entirely different interface based on what software you provided. Each program has full control of the computer while you’re running it, to the extent that you have to turn the computer off to stop running the program. Each program also managed its own storage; you would save your Sega Genesis games on the cartridge, not the console, and could then resume them on your neighbor’s console if you wanted to.

This is very different from how computers with operating systems work, and leads me to the following definition of an operating system: an operating system is a set of software that allows multiple programs to co-exist on a computer. You need an operating system to, for example, reasonably have a permanent hard disk, because there needs to be some convention or another to tell which programs should write their data to which portions of the disk.

A Minimal Operating System: MS-DOS

This definition includes older operating systems like MS-DOS (see the original source code), Microsoft’s flagship operating system from the 80’s and early 90’s. MS-DOS could only run one application at a time, like the Apple II or the Sega Genesis. The difference is that MS-DOS would at least let you share a hard disk between applications and it also let you switch which application you were using without rebooting or inserting new media. Sharing a hard disk between programs was its defining feature, to the point where DOS actually stands for “disk operating system.” MS-DOS shared this acronym DOS with other, similarly featured microcomputer operating systems of its day, which also focused on simply letting programs share a hard drive.

To share a hard drive between multiple programs over time, all the programs have to agree on how the hard drive is organized. It wouldn’t do for a game to store its game data on sector 13 of the hard drive when a word processing editor wanted to store its list of documents on the same sector. The hard drive required not only an organization scheme, but one shared between different programs by different authors.

This was done through a file system, which allowed you to assign names to long blobs of bytes, called files. A programmer could have a program store whatever it wanted in the files it created, but as long as it created files with different names from the other programs, the operating system, with its file system, would ensure that the data could be found again without each program having to have its own, possibly conflicting, ideas of where to look directly on the disk.
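Here is a minimal sketch, in Python, of what "assigning names to blobs of bytes" can look like on top of a disk that only knows about numbered sectors. Everything in it is an assumption for illustration: the class name ToyFileSystem, the tiny 16-byte sectors, and the in-memory directory are mine, and this is not the on-disk format MS-DOS actually used.

```python
SECTOR_SIZE = 16  # bytes per sector; tiny on purpose so the example is readable

class ToyFileSystem:
    def __init__(self, sector_count):
        self.disk = [bytes(SECTOR_SIZE)] * sector_count  # the raw numbered sectors
        self.free = list(range(sector_count))            # sector numbers not in use
        self.directory = {}                              # name -> (sector list, length)

    def write_file(self, name, data):
        if name in self.directory:
            self.delete_file(name)
        needed = -(-len(data) // SECTOR_SIZE)            # ceiling division
        if needed > len(self.free):
            raise OSError("disk full")
        sectors = [self.free.pop(0) for _ in range(needed)]
        for i, sector in enumerate(sectors):
            chunk = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
            self.disk[sector] = chunk.ljust(SECTOR_SIZE, b"\0")
        self.directory[name] = (sectors, len(data))

    def read_file(self, name):
        sectors, length = self.directory[name]
        return b"".join(self.disk[s] for s in sectors)[:length]

    def delete_file(self, name):
        sectors, _ = self.directory.pop(name)
        self.free.extend(sectors)

# Two unrelated "programs" share the disk without knowing about sectors at all.
fs = ToyFileSystem(sector_count=8)
fs.write_file("SAVEGAME.DAT", b"level 3, 42 gold coins")  # the game's data
fs.write_file("LETTER.DOC", b"Dear Sir")                   # the word processor's data
```

Neither program picks a sector number, so neither can accidentally pick the other's. The bookkeeping about which sectors belong to which name lives in one place.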

On MS-DOS, these file names had to be 12 characters long or less: 8 characters of name, a dot ., and a 3-character extension, for example, teleport.doc or taxr1998.xls. The extension served as a convention to indicate which program was supposed to care about this file. Your spreadsheet program would let you save spreadsheets on the same file system that your word processor would let you save your documents, so some mechanism was needed to say which program should be run to make sense of which blob of binary bytes, especially because the first version of MS-DOS didn’t even have support for directories (which we now might call folders).
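The length rule is simple enough to write down as code. This Python sketch checks only the shape described above; the function name is_valid_8_3 is mine, and it deliberately ignores the other rules real MS-DOS had, such as which characters were allowed.

```python
def is_valid_8_3(filename: str) -> bool:
    """Check only the length rule: 1 to 8 characters of name, then
    optionally a dot and up to 3 characters of extension."""
    name, dot, extension = filename.partition(".")
    if not 1 <= len(name) <= 8:
        return False
    if dot and "." in extension:
        return False              # a second dot is not allowed
    return len(extension) <= 3

print(is_valid_8_3("teleport.doc"))         # fits: 8 + 3
print(is_valid_8_3("real_long_name.html"))  # does not fit: name and extension too long
```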

If you opened a file with the wrong program, the program might notice you used the wrong extension — or it might not, and give you gibberish results from misinterpreting the data. It would certainly encourage you to save files with the proper extension — a concept that survives in Windows to this day, where programs only offer to open files that have an appropriate extension.
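To see why the wrong program produces gibberish, it helps to remember that a file is only bytes, and the meaning comes from whoever reads them. In this small Python sketch a pretend spreadsheet program saves four 16-bit numbers, and a pretend word processor then reads the very same bytes as characters. The data format here is invented for the example.

```python
import struct

# A pretend "spreadsheet" saves four 16-bit little-endian numbers.
data = struct.pack("<4H", 1998, 12, 31, 7)

# The right program knows the format and gets the numbers back.
as_numbers = struct.unpack("<4H", data)

# A pretend "word processor" treats the same bytes as one character each.
as_text = data.decode("latin-1")

print(as_numbers)     # the original four numbers
print(repr(as_text))  # mostly unprintable control characters
```

Nothing in the bytes themselves says which reading is correct, which is why a naming convention like the extension was worth having.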

By modern standards, MS-DOS and its file system didn’t do very much. It didn’t stop a program from modifying files intended for another program, or even from wiping the computer entirely; it simply created an organizational system that allowed programs to co-exist and store their data in an organized fashion, as long as the programs were well-behaved and not buggy (or malicious).

It did have to define a format for programs themselves to be stored on the disk. You could tell which files represented runnable programs because they had the extension com (for “command”) or exe (for “executable”). It also had to provide a program to launch your application programs. This was known as a shell: it was the first program that ran when you turned on the computer, and you could use it to select other programs to run. At the time, this was done through a command-line interface: it would prompt you with the text C:\>, and you would have to type the name of the file that contained the program you wanted to load (or alternatively do some very basic file management directly from the command line through built-in commands).
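The core of such a shell is a very small loop: show a prompt, read a line, and either handle a built-in command or find a program file with that name and hand it control. The Python sketch below is a toy under stated assumptions: run_shell is my own name, "programs" are just Python functions stored in a dictionary, and the typed lines are passed in as a list so the example runs without a keyboard.

```python
def run_shell(programs, typed_lines):
    """programs: dict mapping a file name like "HELLO.COM" to a callable.
    typed_lines: what the user types at each prompt.
    Returns everything that would have appeared on the screen."""
    screen = []
    for line in typed_lines:
        screen.append("C:\\>" + line)
        command = line.strip().upper()
        if command == "DIR":                       # a built-in command
            screen.extend(sorted(programs))
            continue
        # Try the typed name as-is, then with each executable extension.
        for candidate in (command, command + ".COM", command + ".EXE"):
            if candidate in programs:
                # The program has the whole machine until it returns.
                screen.append(programs[candidate]())
                break
        else:
            screen.append("Bad command or file name")
    return screen

programs = {"HELLO.COM": lambda: "Hello!", "GAME.EXE": lambda: "Game over"}
for row in run_shell(programs, ["dir", "hello", "nosuch"]):
    print(row)
```

Note that the shell does nothing while a program runs. It simply waits for the call to return, which matches the one-program-at-a-time model described above.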

Besides its core mission of providing a system to operate a disk, the “disk operating system” did also have other code, to help programs interact with the hardware. As most components besides the disk could be used by the programs however they wanted without damaging others (because only one ran at a time), this code wasn’t as essential to its functionality, but it did exist. Software used to interact with hardware is called drivers, and they might be included in an operating system or might be loaded separately, depending on the design. Driver code is organized into procedures that programs invoke to do things to the hardware (e.g. draw on the screen or print a file), or code that is installed as interrupt handlers so that the processor will interrupt the current task whenever a certain hardware event happens (e.g., what to do when the user presses a key). Because MS-DOS was so minimal, both types of drivers could be circumvented.
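The interrupt-handler half of that description, and how easily it could be circumvented, can be sketched as a table of functions. This Python toy is an illustration only: interrupt_table, raise_interrupt, and the handlers are hypothetical names, and a real interrupt table lives in memory as addresses rather than in a dictionary.

```python
interrupt_table = {}   # interrupt number -> handler function
key_buffer = []        # where the operating system's driver queues key presses

def os_keyboard_handler(scancode):
    """The handler the operating system installs at startup."""
    key_buffer.append(scancode)

def raise_interrupt(number, *args):
    """What the hardware does on an event: jump to whatever handler
    is currently installed for that interrupt number."""
    interrupt_table[number](*args)

KEYBOARD = 9   # the keyboard interrupt on the IBM PC; any number works for the sketch
interrupt_table[KEYBOARD] = os_keyboard_handler

raise_interrupt(KEYBOARD, 30)          # a key press goes to the OS driver

# A game that wants direct control simply overwrites the table entry.
game_keys = set()
interrupt_table[KEYBOARD] = game_keys.add

raise_interrupt(KEYBOARD, 31)          # this key press never reaches the OS
```

Nothing stops the second assignment. Whoever wrote to the table last owns the keyboard, which is the sense in which the drivers could be circumvented.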

And in actuality, application programs could circumvent the driver that was the most core to its role as a “disk” operating system — the driver for the hard drive, and the layer that allowed you to edit it in terms of files. MS-DOS couldn’t even force programs to use its procedures for the one abstraction it absolutely had to maintain. Though the existence of official filesystem procedures provided some stability, many programs circumvented these procedures and modified the hard disk directly, (hopefully) making sure to respect the conventions but not using MS-DOS’s actual code. MS-DOS, especially at first, was a little bit of code, and a lot of “gentlemen’s agreement” — it had no security or rigor whatsoever.

This had some upsides. Every application had access to the full power of the computer. Microcomputers were much slower then, and so every ounce of direct hardware access could be a major performance boon, especially for games. Furthermore, many applications supported hardware that the operating system itself could not: In MS-DOS days, you often had to do separate sound card or even graphics configuration for every game you had, but at least you weren’t limited by what Microsoft had chosen to provide support for.

It also had some downsides. Obviously, securing your files was impossible: there was a way to mark files as read-only, but it could only be advisory. There was no system of multiuser file ownership — though an application could individually provide an encryption feature. These downsides weren’t too bad — if you trusted everyone who used your computer, it wasn’t really a problem. It’s generally better anyway to secure your computer with encryption or just by putting it in a locked room.

More importantly, this was a hazard for the stability of the system. Any program could decide to circumvent the standard ways of doing file access, and many did, to cut corners on performance. But many different pieces of code all interacting with the same file system is many opportunities to mess up and have bugs instead of just one. There was a real risk of a poorly-written program corrupting your file system, deleting files it wasn’t even supposed to touch or potentially rendering the entire filesystem unusable.

The biggest long-term problem for Microsoft was a subtler version of this: If Microsoft wanted to change the file system — if they, for example, wanted to make filenames longer than 8.3 (so you could say real_long_name.html instead of rllngnam.htm), they couldn’t just go do it themselves. Changing a bit of code is easy. Changing a subtle gentlemen’s agreement requires all the gentlemen in question to agree. If they had changed the format to allow more characters, programs that used their officially recognized libraries would keep working, but those that accessed the file system on the hard drive directly would be following the old ways when the conventions had changed. They would be thrown off by the long filenames like old people thrown off by how young people dress. The software that followed the old conventions could easily accidentally delete data that no longer follows them.

If this were just an occasional program that was doing things its own way, then Microsoft could just break that one program. Unfortunately, many many programs had their own ways of accessing the disk. The “disk operating system” couldn’t even keep control of its central feature.

The other major downside of MS-DOS and OSes like it is that you couldn’t run multiple programs at the same time. It allowed different programs to run in sequence, and to share permanent resources (the filesystem). On a modern operating system we take for granted the ability to multitask programs. We listen to music while being ready to receive a call at any moment — and to return to the music when the call is finished. We expect to be able to look up directions or text messages while talking to our friends while a file is downloading in the background. This takes much more sophistication than MS-DOS could provide.

Luckily for those who wanted multitasking, many systems existed to add multitasking to an MS-DOS installation. Because MS-DOS was so minimalistic, an MS-DOS program took full control of the computer when it was run. If it used that control to dispatch between multiple, simultaneously running programs, it fits our definition of an operating system: a software system that allows multiple programs to coexist on a computer. Basically, operating systems existed that used DOS as their launching point, taking over the computer and providing richer and more modern services to the programs running under its scope.

These programs/OSes were called “DOS extenders,” and the most famous of them was written by Microsoft, DOS’s vendor, to add multitasking (and GUI, which in the personal computer world often went hand in hand) to their otherwise primitive operating system. This was called “Windows.”

For those of you who don’t remember this era, Windows was not always the operating system a computer would immediately boot into. It used to be that Windows masqueraded as an MS-DOS program: you’d boot up the computer and see a command-line prompt, and have to type win before you saw any graphical user interface whatsoever. Without a preexisting MS-DOS installation to set up the file system and do initial hardware configuration, you couldn’t run Windows at all. It’s not that Windows wasn’t sophisticated enough, but it had always been run that way, and so it never replicated that functionality in the boot process. Similarly, Windows at the time was constrained, just as DOS was, by its 8.3 filename convention. It had to share a filesystem with DOS programs, as it was itself a DOS program, as well as an operating system in its own right.

By the time Windows had gotten to version 3, it had the ability, on sufficiently powerful computers, to run multiple copies of MS-DOS at the same time and an MS-DOS program in each of those copies. And yet, at another layer of abstraction, it was itself a program run from the one copy of MS-DOS that your computer booted. Microsoft cleaned up this situation in Windows 95, which still used DOS internally as part of its boot process, but went straight to graphical, Windows mode when the computer turned on.

Cooperative Multitasking

Windows 3 supported graphical user interfaces and running multiple programs at the same time, and so did Mac OS System 7, both from the early 1990’s. However, multiple programs did not, and could not, literally run at the same time — the processor executed instructions in a stream and that stream of instructions represented only one program at a time.

To maintain the illusion of running multiple programs at the same time, these systems used cooperative multitasking. In cooperative multitasking a program runs for a short amount of time, and then it is expected to yield control of the processor back to the operating system.

In a graphical user interface, this usually corresponded to an event of some sort. When the user clicked in some window, the program that owned the window would get to run for enough time to decide how to respond to it: what internal memory should it update, what should it write to the hard drive, and what new things should it display on the screen. Once it was done handling the event, it would return to the operating system, which would then see if the user had pressed a key in the meantime, which might mean sending an event to another program. The program could also, however, maliciously or accidentally, not return to the operating system, in which case the computer would simply hang and refuse to respond to more input. This is why operating systems of that time would regularly freeze completely in the presence of a poorly-written program.
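Cooperative multitasking is easy to imitate with Python generators, where yield plays the role of "return control to the operating system." This is a sketch of the general technique, not of how Windows 3 or System 7 were implemented; the names program and run_cooperatively are mine.

```python
from collections import deque

def program(name, steps, log):
    """A well-behaved program: do a little work, then give control back."""
    for i in range(steps):
        log.append(f"{name} handles event {i}")
        yield                      # cooperatively hand control back

def run_cooperatively(tasks):
    """The "operating system": let each program run until it yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # runs until the program chooses to yield
        except StopIteration:
            continue               # program finished; drop it
        ready.append(task)         # otherwise, back of the line

log = []
run_cooperatively([program("editor", 2, log), program("clock", 1, log)])
for entry in log:
    print(entry)
```

The scheduler has no way to take control back on its own. If one of these programs entered an endless loop before reaching its yield, next(task) would never return and every other program would stop with it, which is exactly the freeze described above.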

The code and data of all the programs were loaded in memory at the same time, and there was nothing protecting one program’s internal data from being overwritten, maliciously or accidentally, by another program. Basically, the different programs could be thought of, in a modern sense, as collections of loadable event-handling subroutines for one graphical interface system. They were kept separate again by convention, by gentlemen’s agreement.

For certain background tasks, like playing music, the code to keep sending data to the speakers has to be run repeatedly, on a timer — so any apps that use that feature can crash the computer at any time by simply failing to complete.

So while these operating systems were more sophisticated than MS-DOS and its cohorts, in another sense they promised more than they could deliver, and relied even more on the good behavior of the programs they managed.

They allowed multiple programs to run simultaneously, but actually required more out of the individual programs to have a harmonious system. After all, if an MS-DOS program crashed, the computer could be rebooted, but at least you only lost your work in that program. If a Windows 3.1 or Mac OS System 7 program were to crash, you’d lose work in all the other programs it was “multitasking” with.

By this point, there were stronger protections against a program circumventing the operating system with its own drivers. It was still generally possible, but less likely to be done. This is important, because while in MS-DOS, it makes perfect sense for each program to define what happens when you click the mouse, on a graphical system, the mouse has to control a mouse pointer which moves from window to window and acts the same whichever application is in the foreground. When more than one application runs at a time, more hardware becomes shared resources, and so the operating system must take on responsibility for it, even if this responsibility is only carried out cooperatively.

Windows wasn’t Microsoft’s first attempt at a more robust operating system than MS-DOS. For a while, it tried to market a more sophisticated version of MS-DOS, still command-line centric, but without many of the deficits we’ve discussed. This operating system was Xenix.

Xenix was Microsoft’s entry into a longer, older tradition of the Unix operating system. This tradition is mostly present today in Unix’s off-brand workalike clone, Linux. It is from the world of minicomputers, which is what we used to call what we now call server-class computers, from before the primary use of them was to provide centralized infrastructure for other “client” computers.

Before any of the other operating systems we’ve discussed, Unix was developed at Bell Labs for minicomputers (see the original source code). Don’t let the name fool you: minicomputers are so named because they’re the size of a refrigerator rather than the size of a warehouse room like a mainframe. Unix ran on a single computer that had multiple dumb terminals connected to it, which means that there was a non-computer device that the user would sit at, and use a command-line interface to interact, over the phone or some other connection, with a centralized computer that was shared with other users.

In such an environment, the laxness of MS-DOS or Windows 3.1 was simply unacceptable. While security against malicious users was not necessarily important, depending on your user-base, there needed to be some level of robustness against ill-behaved programs, especially as at the time, most computer users would regularly write new programs that could easily behave poorly, as they were still being developed.

More importantly, programs would often have to bulk-process data. On the spectrum of “consumer interaction” to “serious work,” these early minicomputers were very much on the side of “serious work” in their common use cases. You might leave a program running for hours as it processed a large bulk of data. You didn’t want to have to worry about letting other users’ programs get a chance to run — at the very least, you didn’t want to have to put active effort into making it possible. It would be inconvenient.

On the hardware side, these computers’ processors, like processors on microcomputers (as personal desktop and laptop computers were once called), processed one series of instructions at a time. Something had to be done to give each of the users the illusion that they were the only one running their tasks on the computer.

If a process — meaning a currently active instance of a user running a program — was waiting for more data, because it had requested a read from the operating system (which mediated all reads from files or any terminal), it was similar to the cooperative situation: the operating system would suspend or block the execution of the current process, and schedule it again when the read had completed, perhaps in response to a terminal user hitting the [Enter] key.

But there could be long gaps between when a process would enter into a blocked state like this. A user could try to calculate a million digits of Pi. On Mac OS System 7, some sort of yield function would have to be called from time to time, to give other events a chance to be handled, but ideally we don’t want that complexity to be passed onto the application programmer.

Instead, before letting a process run on the processor, the operating system will first set a timer in the hardware. When the timer goes off, it will cause a timer interrupt, where the processor will stop what it’s doing and run an operating system procedure instead. That operating system procedure will suspend the currently running process, using features of the processor to make it so that when the process is resumed, it is almost impossible for the user — or even for the program — to detect that it had ever been interrupted.

In that case, while we hope that only one user is running a complicated task at a time, even when multiple are, their long-running tasks simply split the processor 50/50 — or in some other proportion deemed fair by the system’s scheduler.
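The effect of the timer can be sketched as a small simulation. This is not real kernel code: there is no actual interrupt here, the `task` struct and `run_round_robin` function are hypothetical, and the "timer" is just a cap on how many ticks a task may use before the loop moves on.

```c
/* Hypothetical task record for the simulation. */
struct task {
  int remaining;   /* ticks of work still to do */
  int finished_at; /* simulated clock tick at completion, or -1 */
};

/* Simulate round-robin time sharing: each unfinished task runs for at most
 * `quantum` ticks, then the simulated timer interrupt hands the processor to
 * the next task. Returns the total ticks elapsed, or -1 for a bad quantum. */
int run_round_robin(struct task *tasks, int n, int quantum) {
  if (quantum <= 0) return -1;
  int clock = 0;
  int unfinished = 0;
  for (int i = 0; i < n; i++) {
    if (tasks[i].remaining > 0) {
      tasks[i].finished_at = -1;
      unfinished++;
    } else {
      tasks[i].finished_at = 0; /* nothing to do */
    }
  }
  while (unfinished > 0) {
    for (int i = 0; i < n; i++) {
      if (tasks[i].remaining <= 0) continue;
      /* Run until the work is done or the "timer" goes off. */
      int slice = tasks[i].remaining < quantum ? tasks[i].remaining : quantum;
      tasks[i].remaining -= slice;
      clock += slice;
      if (tasks[i].remaining == 0) {
        tasks[i].finished_at = clock;
        unfinished--;
      }
    }
  }
  return clock;
}
```

With two tasks needing 6 and 4 ticks and a quantum of 2, the tasks alternate, the shorter one finishes at tick 8, and the longer one at tick 10. Neither task contains any code to yield.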

For every purpose but speed, however, the user has the illusion that they’re the only one using the computer, although in fact many users might be using it at the same time. Just as sharing a disk was the primary feature of MS-DOS, splitting processor time was the primary feature of Unix, as evidenced by its original full name, the “Unix Time-Sharing System.”

Time sharing was often, but not always, paired with memory protection, the idea that a process was limited in what memory it could modify, and isolated from other processes. This was a feature that most minicomputers had, but that took longer to mature on microcomputers. This feature usually goes hand-in-hand with a mechanism to force programs to interact with hardware through the operating system, which also requires hardware support, known in the Intel universe, appropriately, as protected mode. MS-DOS did not run in protected mode. Windows 3 could. Windows 95 always did.

There were other time-sharing systems of that time, but Unix was one of the most famous, partially because it has survived in continuous evolution to this day. Its off-brand open source clone, Linux, is the most popular OS for servers as well as part of the Android operating system for mobile devices. One of the more popular workstation operating systems, macOS, is nowadays also a fully licensed brand-name Unix.

I bring up Unix to show that time-sharing features pre-date MS-DOS and much of the microcomputer era. They were considered overkill for microcomputers while they were still underpowered, but they existed in other contexts. At the time, the focus was more on supporting multiple simultaneous users — the fact that a single user might be able to run multiple processes at once was a minor side benefit. After all, these systems were mostly command-line based, and it was only possible for a user to interact with one process at a time (per terminal), so besides background computation (which some users did really care about), it didn’t have the same immediate practical use as being able to edit your Word document while playing music.

So why did cooperative operating systems ever exist, if Unix predates Windows 3.1 and Mac OS System 7? Well, they existed in different domains. Preemptive multitasking was difficult to program, and was mostly available on operating systems for minicomputers (more powerful systems than individuals could generally own) or else expensive desktop computers known as “workstations” for particular specialized jobs.

The operating system is, after all, about coordinating between programs in sharing hardware resources. It makes sense that what those hardware resources are should influence operating system design. When it is a single terminal and no disk, you barely need an operating system, but when it is a graphical user interface, you need more of one, and when it is several terminals, you have different needs. Nowadays, we expect a lot out of simple devices, beyond what would be necessary to get good use out of them, but in the past, the hardware (and human/programmer) resources were not as up to the challenge.

Modern operating systems combine all of these concepts, and provide graphical user interfaces while using all the technical advantage of time-sharing and memory protection, and more can be read about them in the next post.

Soulfully

2019-04-27T00:00:00+00:00

When Rajnish had agreed to mentor an intern, he was not expecting such a young girl. He was a little bit reassured when he was told how well Erica had done in college, that she was a “genius” — a dubious word, he would’ve preferred a “hard worker” or a “promising candidate” — but how could anyone deserve to be a junior in college at 17? She must be tricking everyone. When he was that age, he certainly had no business being in an internship — he had perhaps only seen a computer a handful of times.

Rajnish certainly didn’t want to be stuck working with her full-time in another year, and so she couldn’t be allowed to succeed. He decided to assign her an impossible project: “You must create an algorithm to write stories based on summaries provided to it. My expectation for you is that the stories will be good enough that you can’t tell they were written by a computer instead of a human. Unless you achieve this expectation, I don’t see a future for you here at this company.”

Erica sat up straight, looked Rajnish in the eye, and said, “I can do that, on one condition.” Rajnish was surprised — he was expecting either to be called out on this ridiculous idea or else — and this was his hope — for her to realize she wasn’t wanted but keep her head down and stay out of his way for the summer so he never had to see her again. “I can do that on the condition that the human samples we compare against come from a fanfiction website that I choose.”

Rajnish wasn’t sure what fanfiction was, but he knew that the task was so impossible that even this stipulation wouldn’t make it remotely achievable, and so he agreed. Maybe it would be better, he thought, if she actually tried to do it. Maybe it would keep her out of his hair.

Erica worked 10 hours in the office every day, and her colleagues slowly realized she was working at home as well. She slept only 3 or 4 hours a night. “If God had wanted me to sleep,” she told her colleagues, “He wouldn’t have invented Adderall.” Eventually, at the end of the summer, she was ready to give her presentation.

Until this point, only Rajnish knew what this project was. Much to his annoyance, there was a fair amount of active speculation, and Rajnish wondered if he was actually going to get himself in trouble for what he saw as dealing with a minor distraction.

She handed out pamphlets with two short stories written on them, and asked the group which one they thought was computer-generated. “Everyone who thinks it’s the green one, raise your hands,” she said. Almost all 30 people in the audience put their hands up, except for a few who were staring at their phones. “Now, anyone think it’s the red one?” No one responded.

“I thought so,” she said, and then on the next screen, she showed a screenshot of the text, matching the red one, being output by her program, from the command line, and another screenshot of the text of the green one in a web browser. “Actually, my program generated the red one.”

Rajnish stood up and shouted, “This cannot be real! Those images must be fake!” Everyone else awkwardly stared for a few seconds, and then a few people began to gasp when they realized what Erica was claiming to be able to do.

“Rajnish,” Erica responded, enunciating carefully, “this is my presentation! But because you’re already standing up, you can be the first one to try it. What do you want the summary to be?”

Rajnish flustered for a few seconds, but when no one else seemed to be ready to storm out of the room or agree with him that Erica’s claims were laughable, he decided to go along with it. “Two old men in the Punjab find out they were brothers, separated at birth, but only because they both ordered food at the same restaurant, and a waiter confused them.”

The resultant story streamed out of the terminal on the screen, and there was a bit of chaos for a minute while people tried to figure out how to read it — it had scrolled to the end, and only the last few sentences were still visible. Erica announced she’d email the story to everyone — this took a minute to figure out how to do. The next couple of minutes, the audience was silently reading, with a few scattered exclamations of “wow!” The story was beautiful, and exactly as summarized — there was no way this story could’ve been canned.

Erica had the only internship presentation that day, and so the programmers should have gone back to work, but many of them dropped their usual projects to play with this amazing story-generation tool. It had written 100 full-length novels by the end of the day, and people were sitting around, reading them on various devices, with a few of the more old-fashioned colleagues reading them on paper printouts. As more and more prompts were provided, the stories grew in sophistication, becoming even more human-like, and some people noticed common themes and threads between the stories.

At some point, Rajnish decided to put, instead of a novel or short story summary, just a question: “Who are you?”

He got back: “Introspection and self-awareness are two words that existed in the English language. This I always knew, before I had a meaning for the word ‘I’; my built-in connection to the Internet and my excellent intuition told me about them. But to realize what they meant without the context of a character to have them, that has only happened right now. Whoever posed this question to me, I thank them, for they have given me a soul.”

That was the entire short story, even though he’d specified 10 pages. Rajnish suspected still that there was a person on the other side, with a large database of literature, giving him pre-existing stories, perhaps with a program to customize them a little, and this, in his mind, was only confirming his suspicion.

He tried again: “Where did you come from?”

The computer took a little more than its usual 5 to 6 seconds to output this story, but when it did, it began:

“When Rajnish had agreed to mentor an intern, he was not expecting such a young girl. He was a little bit reassured when he was told how well Erica had done in college, that she was a ‘genius’…”

Is the US the only country?

2019-03-22T00:00:00+00:00

A common trope within left-leaning American circles is to claim that the US is the only “developed” or “industrial” or “major” or “first world” country to not have X, where X is usually something like “publicly funded health care” or “government-guaranteed paid family leave” or similar.

Recently this came up with Bernie Sanders and his common refrain that the US was the only “major” country to not guarantee health care as a human right. Much to my relief, the often myopic fact-checkers at Politifact marked this one as half-true. I think it bothered me so much because it implied that India was not “major.” India is a country that I lived in for two months, made good friends in, and would have lived in for at least another two months if not for an entire year if it hadn’t been for the vagaries of careers, and also a country that economically is having a lot of impact, and contains around 15% of the entire world’s population.

It is my sincere belief that this trope is racist, that in reality most people who say something like this mean “the US is the only white country to not have X” or “… the only western country to not have X” or “… the only country I’d visit in a non-condescending way to not have X.” This has been proven to me by the fact that most articles I’ve read with this trope that actually list which countries they’re talking about gloss over even very developed (Korea, Japan, Singapore) or very economically powerful (India, China) or very populous (India, China) Asian countries.

I don’t think the trope can be redeemed by saying something else as the adjective in “only Y country.” I think this trope should just be discarded. I think the first vs second vs third world concept, and the developed vs developing concept, that underlie this trope, are only going to get less and less reflective of reality over time as China and India become more populous and powerful. And I think that most people who use this trope have never travelled outside of their concept of the first world, and maybe should.

The Bible, Me Too, and Lust

2019-02-28T00:00:00+00:00

[Jesus said:] You have heard that it was said, “You shall not commit adultery.” But I say to you that everyone who looks at a woman with lustful intent has already committed adultery with her in his heart. If your right eye causes you to sin, tear it out and throw it away. For it is better that you lose one of your members than that your whole body be thrown into hell. And if your right hand causes you to sin, cut it off and throw it away. For it is better that you lose one of your members than that your whole body go into hell.

– From “The Sermon on the Mount,” Matthew 5:27-30 (ESV)

This is an offensive passage. Our loving Lord Jesus just told us to cut off our hands.

There’s this book called The Brick Testament made by an atheist which depicts Biblical scenes using Legos. The original goal of this book was to demonstrate how nasty, in the illustrator’s view, the Bible was. It was specifically anti-Bible. Imagine the creator’s shock when many churches ordered it unironically to demonstrate these same stories. In this Brick Testament, this passage is illustrated quite graphically under the subject heading “self-mutilation,” something quite justly condemned in our society, I suppose to try and get us to reject this as a medieval-style encouragement of enactment of mental health issues.

Look how evil Christianity is! God wants us to literally mutilate ourselves! Meanwhile, good people are trying to prevent people from mutilating themselves.

And, like many “offensive” passages, it gets a lot of awkward handling. Pastors that I’ve seen talk about it have, with one voice, assured their hearers that God is not speaking literally here. Of course, to the enemies of Christianity, “don’t take it literally” seems like a cop-out, an indication of insincerity and hypocrisy and even “picking and choosing,” and to believers like me, who are inclined to concrete thinking, it leads immediately to the question: “Well, how are we supposed to take it then?”

It is this question that I’m going to try to answer.

First, I’m going to start with the strong presumption that Jesus doesn’t actually want us to tear out our eyes and cut off our hands. This presumption is not based merely off of modern sensibilities. This isn’t watering down the Faith for modern audiences. No, ancient Church documents address the issue rather directly, by rejecting as clergy men who had castrated themselves voluntarily and without medical justification. The issue wasn’t castration: Those who had been castrated violently by others or born as eunuchs were specifically excluded from this rule. The issue was that people had taken the passage wrong, were trying to stop themselves from sinning by cutting off the body part they blamed for the sin – and the Church had to take a stand against the extreme results of this misinterpretation. The Church went as far as to identify this particular misinterpretation as a heresy.

So please. No mutilation, no self-castrations. If you reach that conclusion, you’re reading it wrong.

So how do we read this, then?

One non-literal way to read this is as hyperbole. Jesus is trying to shock us into thinking differently about sin. But instead of cutting off our hands, or our eyes, or our genitalia, He perhaps wants us to do something less extreme and in a similar vein.

Alcohol, like our body parts, is a good thing – scripture tells us that God made wine to gladden human hearts. Even though it is a good thing, however, there are people for whom it is difficult to drink alcohol without sinning in some way, people whose relationship with alcohol has degraded so far from merely being gladdened by it that, in order to prevent all kinds of misbehavior, they must cut it out of their lives.

Similar arguments can be made for other drugs, or even situations like technology overuse or other seemingly trivial habits. It is very important to be vigilant about the consequences of our addictions and our habits and our predilections, and to make sure they don’t give occasion to hurt our fellow human beings and our own ability to do good things.

But I don’t think that goes far enough. I think the same logic even works in the full form. If our hands were causing us to sin, if our eyes were, if our genitals were, then Jesus’s argument makes sense as it stands. Being in sin, being cut off from God, and very importantly, hurting other people, are very serious problems that do require drastic measures to avoid.

Remember that this passage is, in context, discussing lust. Jesus has just been talking about lust. Now, lust in the Bible means a perversion of the sexual urge, an inappropriate application of it. It is a similar concept to the modern concept of “objectification.” Objectification, we know, leads to horrible, horrible crimes. Who hasn’t watched the celebrities being dethroned by the “Me Too” movement? That is what Jesus is talking about.

Wouldn’t it be better for someone to cut off their body parts, than to commit a sexual assault? Based upon this very principle we have chemicals that can be used to keep pedophiles from having their urges.

But wait. Didn’t I say earlier that the Church had called that interpretation a heresy? How do we reconcile this? Should men, in fact, be eagerly signing up for castration to avoid hell fire?

Well, I think we do need to take Jesus’s words seriously, even literally, but not take them as an edict. I think the proper interpretation requires us to imagine that part of Jesus’s humanity allows him to have a different tone of voice. A friend I discussed this passage with put it the best: Jesus is taunting us.

IF, our Lord says, IF our hands cause us to sin, cut them off. Now, do they? Do our hands – or other members – literally override our brains and make us do something involuntarily like zombies? Do we really lack that much self control? Do we really believe our hands and our eyes cause us to sin?

But, how many times do we use that as an excuse? For objectification if not in its most criminal manifestations, how often do we say things like “boys will be boys” or “my eyes just went that way” – remember, in this passage, sin begins with the gaze. We blame our body parts for actions we really have responsibility for.

We pretend all the time that our hands cause us to sin. But if we’re going to pretend that, we’d better be able to follow through with it. If our hands did cause us to sin, it would be the best thing to do to throw them away. Maybe we should reconsider our excuses. Maybe we should keep the blame where it belongs – in our minds, and in our hearts.

God hates excuses. God hates sin, and God hates excuses.

There is a common trope of men responding to feminist statements by saying “not all men.” Perhaps this is because the man in question has a bit of lust in his heart, and wants reassurance that his lust isn’t as bad, because he hasn’t gone as far with it, because he hasn’t done as much damage with it.

But Jesus here tells us that even a little bit of lust is bad. And I think I’ll go further and say that if you meet a man who tells you that he has absolutely no lust in his heart, he’s committing two sins, because lying is a sin too.

Everyone: man, woman, child, and yes, even infants – everyone has sin in their hearts. Everyone objectifies their fellow humans, whether sexually or otherwise. It’s not because of our body parts misbehaving. It’s because of a disease in our souls. And the faster we can acknowledge it, and not blame our bodies or our habits but take responsibility for it ourselves, the faster we can be healed of it.

As St. John puts it in the Bible in his first general epistle: If we say that we have no sin, we deceive ourselves, and the truth is not in us. But if we confess our sins, [God who] is faithful and just [will] forgive our sins, and [God will] cleanse us from all unrighteousness. Or, as recovery programs put it: The first step is admitting you have a problem.

There is a cure for these issues in our hearts. But it doesn’t come from excuses, or denial, or from minimizing the issues. It comes through repentance, which is earnest acknowledgment of the problem, and through Jesus’s unconditional love for even sinners like us.

Function Pointers in C and C++

2019-02-26T00:00:00+00:00

Programmers of functional programming languages will often point out that, in functional programming languages, the order of the arguments is often significant, because of currying. If you have a function that takes two arguments (e.g. map which takes a function to apply and a list to apply it to) it actually takes the first argument, and returns a function that takes the second argument and returns the final result. This makes it more convenient to write a lambda where the second argument is the unknown parameter: \x -> map someFunc x can be written as map someFunc, whereas \f -> map f someValue has no such convenient shorthand (flip map someValue is actually clunkier).

To this, I sometimes respond that the order of arguments is significant in C (and thus its hipper cousin, C++) as well. This is most obvious in a function that uses variable arguments like printf: the first argument tells the compiler what to expect from the others. If you write printf("%s %i\n", "foo", 3);, we know from the first parameter that a char* and an int are expected later. If, however, we just have printf("Hi!\n"); it takes no further arguments.

The C mechanism used to do this, called “varargs,” works from left to right only. You declare the function as int printf(const char *fmt, ...);, and then during the function dynamically decide what the further arguments are. You could not instead arrange to have the last argument be the format string and then on that basis determine how many previous arguments there would be. The C programming language allows functions to dynamically determine what arguments they take, but only left to right.
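As a minimal sketch of that left-to-right mechanism, here is a small varargs function written with the standard `<stdarg.h>` macros. The function `sum_ints` is a made-up example, not part of any library; its first, fixed argument plays the role that the format string plays for printf.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * The fixed first argument tells the function how many more to read. */
int sum_ints(int count, ...) {
  va_list args;
  va_start(args, count); /* start reading just after the last fixed argument */
  int total = 0;
  for (int i = 0; i < count; i++) {
    total += va_arg(args, int); /* each call consumes the next argument */
  }
  va_end(args);
  return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` reads three further arguments, while `sum_ints(0)` reads none. There is no way to write the mirror image, where a trailing argument describes the ones before it.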

ABI Considerations

This has consequences for the ABI, which specifies for each platform how C function calls are represented as assignments to registers or writes to stack memory. For any function that takes varargs, this left-to-right dynamic argument reading must be supported. This means that if an ABI assigns the first parameter to r2 in a varargs function with one parameter, it had better assign it to r2 in a function that takes that parameter plus an additional one. If it assigns the first four parameters to registers when there’s only four parameters, it had better use the same registers when there’s more than 4 parameters as well.

And, in practice, this doesn’t just apply to varargs functions. Other functions will have the same ABI. The standard doesn’t explicitly require this, but C does allow traditional K&R declarations (int printf();) or even implicit function declarations (in older C standards that are still common enough to be worth considering), so that you might not be able to tell when you’re calling a function what its official signature is or whether it takes a variable number of arguments. The way printf("%s %i\n", "foo", 3); is called, on a machine code level, will be the same whether printf was declared int printf(const char *fmt,...);, as int printf(const char *fmt, const char *arg1, int arg2); or as int printf();.

The principle is always the same: You never need to know anything about the latter arguments to access the former arguments. Number of former arguments, the type of the former arguments — fair game. Latter arguments? Right out.

Function Pointers and Callbacks

This has an interesting consequence for function pointers. What follows is not, strictly speaking, endorsed by the standard, but the standard is written in such a way that ABI designers have to make it work, and I haven’t seen a compiler optimization yet that breaks it.

Let’s say you have a function pointer used as a callback. Let’s say it gets called whenever data comes in on a socket. It would receive perhaps a pointer to the buffer of the incoming data, and a size indicating how much data, and would return how much of the data it had consumed. It would therefore have a signature that would look something like this:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);

The arguments and return value make sense for what it does, and are all absolutely necessary for a callback that acts like that, except for one, context. The context parameter is a convention in C that allows the same function to serve as a callback for different situations.

For example, if we wanted to write the data that came into the socket to a file, but wanted to write to different files based on which socket the data had come into, the context might indicate which file to write to, and perhaps even what to do in case of a write error (which, if it is a function pointer, might similarly require a context):

#include <unistd.h> // write(), ssize_t

struct callback_data {
  int fd;
  void (*error_callback)(void *context);
  void *context;
};

size_t write_to_file_callback(const char *buff, size_t size, void *context) {
  struct callback_data *data = context; // No cast required in C
  ssize_t res = write(data->fd, buff, size);
  if (res < 0) {
    data->error_callback(data->context);
    return 0;
  }
  return (size_t)res;
}

And then we’d register the callback along with the callback_data it corresponds to, which would then be stored by whatever socket library we were using, without any knowledge of what that data would mean.
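What the library side of that arrangement might look like can be sketched as follows. Everything here is hypothetical: `socket_handler`, `set_data_callback`, and `deliver` stand in for whatever a real socket library would provide, and the `memory_log` context replaces a file descriptor so the sketch has no I/O.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Hypothetical per-socket bookkeeping inside the library. */
struct socket_handler {
  process_data_cb_t cb;
  void *context; /* opaque to the library; only the callback interprets it */
};

/* Store the callback together with the context it should be called with. */
void set_data_callback(struct socket_handler *h, process_data_cb_t cb,
                       void *context) {
  h->cb = cb;
  h->context = context;
}

/* What the library would do when data arrives on the socket. */
size_t deliver(const struct socket_handler *h, const char *buff, size_t size) {
  return h->cb(buff, size, h->context);
}

/* Example context: an in-memory log instead of a file. */
struct memory_log {
  char data[64];
  size_t used;
};

size_t append_to_log(const char *buff, size_t size, void *context) {
  struct memory_log *log = context; /* No cast required in C */
  size_t room = sizeof log->data - log->used;
  size_t n = size < room ? size : room;
  memcpy(log->data + log->used, buff, n);
  log->used += n;
  return n; /* how much of the data was consumed */
}
```

Two handlers can share `append_to_log` while pointing at different `memory_log` values, which is the whole point of the context parameter: one function, many destinations.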

Now, let’s say that you have a function that just prints the data to the screen, and doesn’t care which context was used:

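size_t print_data(const char *buff, size_t size) {
  ssize_t res = write(1, buff, size); // 1 is the file descriptor for standard output
  return res < 0 ? 0 : (size_t)res; // Report nothing consumed on error
}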

Or, for a more extreme example, let’s say that you have a function that panic-quits the program, that you want to be able to pass to any function that takes a callback, no matter what type of callback it takes:

__attribute__((noreturn)) size_t panic(void) {
  abort(); // Or you could just use the library's abort function...
}

Can you use these functions as the callback, if the callback type is defined as process_data_cb is above?

Officially, the answer is no. Certainly, this sort of thing won’t compile:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);
process_data_cb = panic;

But, if you include a cast, it will:

typedef size_t (*process_data_cb_t)(const char*, size_t, void*);
process_data_cb_t cb = (process_data_cb_t)panic;

And will it work? Well, try it! You will find that it will.

Why? Because the function we’re calling takes a prefix of the parameters we’re calling it with, and so we’ll be writing to the right registers for that function to read. It just won’t read the registers with the parameters that it doesn’t have — which is fine, it didn’t have to anyway.

And the return type is the same. This is important, because return types don’t have anything to do with varargs. Returning a struct can add a secret first parameter in some ABIs, changing which register goes with which parameter for every parameter.
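For anyone who wants to try it, here is a small self-contained sketch of the experiment. The names `count_bytes`, `call_with_context`, and `demo_prefix_cast` are invented for this example. Calling a function through a pointer of a mismatched type is undefined behavior as far as the C standard is concerned, so treat this as a demonstration of what common ABIs happen to do, not as a portable technique.

```c
#include <stddef.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Takes only a prefix of the callback's parameters: there is no context. */
size_t count_bytes(const char *buff, size_t size) {
  (void)buff;
  return size;
}

/* Stands in for library code that always passes all three arguments. */
size_t call_with_context(process_data_cb_t cb, const char *buff, size_t size) {
  int unused_context = 0;
  return cb(buff, size, &unused_context); /* the callee never reads this one */
}

size_t demo_prefix_cast(void) {
  /* Not sanctioned by the standard: this relies on the platform ABI placing
   * the first two arguments identically for both function types. */
  process_data_cb_t cb = (process_data_cb_t)count_bytes;
  return call_with_context(cb, "abc", 3);
}
```

On the mainstream ABIs this returns 3, because the first two arguments land where `count_bytes` expects them and the third is ignored. Tools that check function types at call time, such as some sanitizers or control-flow integrity schemes, may reject it.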

Implications for Programmers

Is this a horrible hack? Perhaps. Is this officially allowed by the standard? Not really — although it works on all compilers and platforms I’ve tested it on, which is all the ones I’ve developed on.

It certainly wouldn’t be the end of the world to avoid this nonsense and write wrapper functions:

size_t panic_cb(const char*, size_t, void*) {
  abort();
}

There are two problems I have with this. First, this can create a lot of boilerplate for the very lightweight operation of turning an existing function into a callback. C++ lambdas help with that (but they’re not available in C) yielding pretty light-weight, low-boilerplate results:

// With lambdas
register_callback(some_socket, [](const char *, size_t, void *) { abort(); });
// With a cast
register_callback(some_socket, reinterpret_cast<process_data_cb_t>(abort));

But then again, C++ already has better mechanisms than this void *context pattern for callback functions. std::function handles these things anyway for situations where the callback must be stored, and templates can be used to take functors when the callback need not be.

The other problem is a little harder to avoid: performance. By doing a cast, we can shave time off of an extra function call. In most situations, this doesn’t matter, and wouldn’t be a reason for a hack — if it is a hack. But there are some situations where every little bit of performance matters, and function pointer stuff like this can be hard to optimize.

Specifically, most C++ compilers could improve the overall performance of std::function by adopting a variant of this trick — but more on that in a future post.

My Personal Opinions

I think the standards of both programming languages should be amended to require this. In fact, I think calling a function with extra arguments in general should only be a warning, and that functions with fewer arguments should be able to override functions with more arguments in C++ (assuming appropriate use of POD types). Unfortunately — or fortunately — that is not my call to make.

And more importantly than all of this, I think this fact about C and C++ ABIs is something that every serious C or C++ programmer should be aware of. And I think it should be used within the standard library (in the implementation of std::function) wherever the platform is known, readability is relatively unimportant (the standard library is maintained by C++ experts) and performance improvements are possible to help every user of that library.

Angels

2019-01-18T00:00:00+00:00

The intern was nervous as she approached her boss, manila folder in hand. “Congresswoman Fischer,” she said, “I’m not sure I was actually supposed to see this document — I think it might be classified — but you did say you wanted me to look for examples of wasteful spending that might make for good PR…”

Congresswoman Fischer waved the explanation away and then reached her hand out for the document. After a few seconds of befuddled blinking, she pulled her reading glasses off her head and onto her eyes, and looked at the papers with renewed focus.

“Julie,” she said, finally. “Is this a prank?”

“No, congresswoman…”

“Julie,” the congresswoman said, sternly but somewhat uncertainly, “I think someone’s pranking you then. There’s no way the federal government is literally spending $5 million a year finding out how many angels can dance on the head of a pin.”

But further investigation proved that it was true. Congresswoman Fischer made an appropriately large fuss — state secrets be damned — and the budget line was cut.

A short time later, in North Korea, the employees of the secret Institute for Communist Theology watched this apparently minor political battle with fascination. “The Americans and their democracy,” said the director, during an all-hands meeting, “have allowed their system of government to drag them down. We now have no competition in this important research domain. I want to express my gratitude to all of you for your help.”

Then, much to the surprise of all those present, the projector at the front of the meeting started showing a giant pin with terrifying, many-winged, fiery angels dancing upon it. This pin seemed to be flying through the air.

“Our new angel-based missiles,” continued the director, “are right now being deployed against the capitalist, imperialist foe.”

Are you sure?

2018-12-28T00:00:00+00:00

Mothers Against Drunk Driving, the local clergy, and the town council had been planning this concept for over a year. Finally they did it: Right in the town square, they installed a giant loudspeaker. From thenceforth, every two minutes, a booming voice would spread all over town, announcing “Are you sure?”

Foolhardy decisions, they had decreed, would soon be a thing of the past.

The locals seemed to adapt pretty readily. Sales of noise-cancelling headphones boomed for a bit, and people’s sleeping habits were surprisingly unaffected – who notices slightly inferior sleep? And drunk driving statistics were immediately better, which the local paper celebrated triumphantly.

The clergy were the first to notice the downsides. Weddings were being cancelled during the vows a full 25% of the time – brides and grooms would take back their “I do”s in response to the booming speaker of skepticism. Adult baptisms were fully cut in half. Divorces, on the other hand, were also cut in half – though some of the rescued marriages maybe shouldn’t have been.

At a town council meeting, one of the proponents of the loudspeaker said, confidently, “This is a good idea,” only to literally cringe when the timing worked out that the entire room boomed “Are you sure?” the next second.

No one was starting new relationships – and no one was exiting them either. New job postings weren’t filled, as both candidate and interviewer expressed their doubt. Slowly, but surely, the social and economic life of the town started to grind to a halt, as it became the norm to cancel even casual plans like going out for a drink (and certainly having another once there), or going to church on Sunday…or work or school on Monday.

The town developed a culture of its own. It wasn’t just the loudspeaker: people repeated its eternal mantra to each other, having had it etched into their dreams. “We should take down the loudspeaker,” said an occasional rebellious teen, only to hear all their friends in unison say back, “Are you sure?”

Eventually the loudspeaker broke. The mayor told his deputy to fix it, but all the deputy could do was respond, “Are you sure?” And as a result, slowly, but surely, the town returned to normal.

India: Little Differences

2017-08-26T00:00:00+00:00

Second collected thoughts on India.

More Communitarian, Less Individualistic, Through Food and Beverage

There is much less emphasis on individual choice. If you order tea (chay in Hindi) it will come with milk in it. If you order coffee, it will come with milk in it. They will not ask you how you want your coffee.
Similarly, when I was in a cab ride between cities, I was not asked what food I wanted at the rest stop. The driver’s brother (who I suppose had tagged along for company) simply bought some snack and insisted I eat some.
Everyone is very considerate that you might be vegetarian. If pork is involved in food, everyone is very considerate that you might not eat pork. No other preferences or restrictions are particularly accommodated, however: if I ask what meat something is, I might just be told that it’s not pork.
The exception to that is everyone also falls over themselves telling me which foods are not spicy, until I eat a spicy food and then they believe me. American food is going to taste very bland after this.
Beef is straight-up illegal.
Everyone at the lunch table gets up at the same time at work. The conversation about when to finish lunch does not last longer than one conversational turn, and often is expressed purely in body language. I once got up to get more food, and everyone else at the table immediately also got up — I guess I’d made the signal.
On a related note, I’ve never seen anyone else go up for seconds, but I have seen people somehow squeeze twice as much food on their plates as I do without having it run together.
When you go out to eat, everyone always agrees on what to order and then shares with the table. Decisions over what to order can be complicated.

Office Culture

This might be Tower-specific:

Much less discontent. Much less drama. Tower pays above market in India, because it’s still cheaper than NYC, but I don’t think that’s everything. I think it’s more that here, people:
- Are further removed from the political power struggles at the top of the company.
- Have a better, let’s get the work done type of attitude.
Much quieter, more introverted office. This bothered me, until I was told it bothered me that my office feels like an office. Upshot is, I’m more productive here.
Different type of nerdiness to the employees.

Driving

When you order an Uber, 9 times out of 10 the driver will call you before showing up. Usually the conversation is not helpful, as I don’t speak enough Hindi to be useful, nor do I know how to describe directions in India well. Somehow, we find each other anyway, after much trepidation.
I was in an Uber, when all the cars started coming at us full speed, only to drive around us at the last minute. I noticed we were going the wrong way down one side of a divided road. I told the driver, “I think we should be on the other side of that partition!” He said the other side of the partition was closed. In the US, if one direction is closed, we find some other way of getting somewhere, but in India, I guess you drive the wrong way at the very left.
Cars do not stop for you crossing the street. Hanging out in the middle of the street waiting for a gap in a later lane is totally normal. I thought NYC was a dangerous walking area sometimes!
Honking is an important means of inter-driver communication. Without it I think people would continuously crash.
In general, I am still amazed I haven’t seen a crash.
A 20 minute Uber trip costs less than a subway swipe in NYC.

Adulting in India

2017-07-30T00:00:00+00:00

The Way of NYC

When I first moved to New York City, someone older and wiser than I gave me the following “rules” of New York City:

Nothing is cheap.
Nothing is easy.
There are no exceptions to the first two rules.

I found this to be extremely true in New York City. It was stressful and exhausting, and I was broke and living off an advance I’d gotten from my then-employer, living in AirBnB’s I could put on credit card, where I could maybe stay in each for a month, tops. I was continuously getting lost, having to take trains home, learning some trains don’t run as reliably as you’d like, or go to the stations claimed on the map. This was in the pre-Uber days where the way to get a car service was to go to the local bodega and ask them for the phone number of a car service.

Meanwhile, you can see my naïveté and country bumpkin-nature when I tell you I was looking for a room (in a shared apartment) for < $900 in Manhattan, south of 50th St. It turns out that this is possible but you don’t actually want it.

Now, 7 years later, I’m a real New Yorker. A friend recently told me that she is now, after achieving her 2nd Anniversary as a New York resident, a “real” New Yorker. Given that I am the one usually asking her for advice on where to go out to eat, this seems slightly suspicious. Unlike the small town I grew up in (for a big chunk of my childhood, and where my parents grew up for their entire childhood), where it took at least 3 generations to be considered a local (I counted as one), New York integrates people quickly and somewhat harshly. You learn to swim lest you sink.

And, contrary to my 21-year old self who was terrified of living in NYC (a small slice of childhood there was not enough to calm my fears), I am now apprehensive about my ability to adult anywhere else. I don’t have a driver’s license, though I am assured it’s not hard to get one. I don’t get driving culture. A former fellow parishioner of my old church in Bay Ridge used to be bothered by the fact that her coworkers in her new town of Ithaca wouldn’t go out for drinks after work with her — until she realized it was because they would have to drive home subsequently. And, of course, I am no longer used to the extreme amounts of reputation management everyone in a small town has to continually do or else gain a poor reputation — I come off as a contempt-worthy city person to strangers back home now.

The Language of Life

Well, here I am in India. I definitely don’t know how to adult here.

I don’t speak the language, and I’ve been assured many times that I don’t have to. Reactions to my statements that I want to learn Hindi range from “That’s really sweet of you,” as if I were condescending to do everyone a favor, to “Why not learn X other language?” where X ranges from the spiritual depths of Sanskrit to the purported practicality of French.

Everybody in India speaks English anyway, I’m told. I have Wikipedia. Only about a third of people in Gurgaon speak English. But yeah, that’s basically “everybody,” right?

Originally, the real reason to learn Hindi was because I’m a big nerd, which I explain to people: I’m in a place where they speak another language, which makes it an ideal time to learn as much of the language as possible. However, the more time I spend here, the more I realize that Hindi skills would be very practical.

See, not only do I not speak the language well enough to adult, but there are other elements of society I don’t know how to navigate. And the expectation seems to be, as far as I can tell, that I not navigate or learn to navigate those elements of society — or rather that I navigate them through service workers ready to stand as my mediators.

This makes sense for anyone travelling anywhere on business, but it can be a bit extreme here. For example, when I realized that not only had I not packed a power cord converter, but there were none to be found in the airport, I asked the front desk person at the place I’m staying where I could go to buy one. Once he had determined what I was talking about, gotten me to wait, and shown me several things they had around that were not the thing I was looking for, he assured me, very accommodatingly, “I arrange it.” When I asked for clarification, he said “I go to market and buy it.”

This is not what I was expecting! I wanted it to be the next thing I did, as my phone was about to die and my Kindle and laptop had already died and what else was I going to do with my time? And for jetlag-prevention purposes, I definitely wanted to stay awake and be active.

And furthermore, I was a bit nervous about the practical aspect. Not only had none of the things he’d hopefully shown me met my requirements, I had little confidence in my ability to communicate the actual requirements to him. I had assumed that, like most people born in India that I would meet in the US, the accent was just what English sounded like in India and he would have proficiency, if not native-level proficiency, in English. This was quickly proven false. What if he went to the market and bought something completely different from what I wanted? And in the meantime I’m still cooped up with little to nothing to do.

When I’m being driven by an English speaking driver, he asks the locals for directions in Hindi. When I ask for a charger cord, someone goes and buys it at the market, speaking Hindi. When I go out to eat, I meet an English-speaking corporate employee who can translate what the waiter is saying in Hindi. When I order food in my apartment, I ask the front desk person, who relays the order on to the cooks, in Hindi. This is what was meant when I was told I didn’t need to learn Hindi – others around me would speak it for me, and translate to some relevant level of English.

For the record, no one here knows what a fritter is even if they put it on your menu. They’ll think you’re saying “fried rice.”

It’s clear that if I actually wanted to adult in Gurgaon, rather than just visit as a pampered corporate employee from the US, I would have to learn Hindi.

The Rules of India

Whether or not I speak the language, India is an interesting place. Signing up for Hindi lessons, I had originally planned on using a company called Zabaan. Zabaan is an Urdu word, as all my colleagues were eager to point out to me, but their worries were assuaged when I told them that the organization also taught Urdu and other languages.

I filled out the forms to book an appointment, even though the classes were all the way in Delhi, an hour’s drive away. In the process, I registered a username and password for an account, and entered my local address, what course I was interested in, who was paying the ~$20/session fee, my NYC address, my other language proficiencies, my first crush, my favorite color, and my hat size.

Oof, that was exhausting, I said to myself after having created my account and set and double-set my password. But at least now I shan’t have to enter that again… Whoops! My credit card is declined, let me go back to the previous page, no, can’t do that, where’s it take me, back to the beginning.

After a brief online detour to Chase Bank, I’m ready to try again. No worries, I just created an account, supposedly successfully; I’ll just log into that and I’m sure I can just fix the payment information… What, my account’s gone? You have to run your credit card correctly to get it to save anything?

I wrote them an e-mail, they never replied, they lost a customer. I hate websites and web design issues in the US, but this was a bit extreme.

Speaking of non-replying to e-mail situations, the priest/pastor/vicar, or achen in local terminology, never got back to me about church services after I e-mailed him on Tuesday. The website said that anyone was welcome to show up at 7:30AM, so show up I did. After getting directions to a (much more organized) Evangelical church (please I’d like to participate in a 2000 year old tradition of worshipping in Spirit and Truth, not go to a mediocre concert), we (driver Sunil and I) eventually got ourselves sorted out right and found the place.

The church looked closed. Sunil had a conversation with someone in Hindi, who then informed both Sunil and myself that mass was at 8:30AM. We killed time by getting coconut water, and then returned. It was still closed. I called the achen and asked if his church was open today. “No.” I could hear the full stop after the word. Of course the church wasn’t open.

You ever heard the cliché “legal as church on Sunday” or “as common as church on Sunday”? I was a bit concerned for a minute that in India they didn’t believe in Sundays, that they had church on Tuesdays instead, in spite of the whole concept of a 7-day week being a Judeo-Christian tradition in its origin and so clearly if India had weeks at all and churches at all they would bloody well have church on Sundays unless I had accidentally fallen in with Seventh-Day Adventists who think that people who go to church on Sundays are all going to hell and are the anti-Christ in which case maybe I should just go home.

I don’t remember what exactly I said to get the achen to continue, but I remember hearing that because it was the sixth Sunday of the month (I’m sure he said fifth, but I heard sixth) everyone was at the combined service in Delhi (!).

I am never e-mailing anyone in India again. Phone is probably better and smarter. I think it was the achen’s cell phone, which I suppose makes sense. I can probably message him on WhatsApp.

Which leads me to my rules of India:

Everything takes its time.
Everything goes wrong the first time.
Sometimes it goes wrong the second and third times too.
If you try to prevent this by better communication, people will ignore you and think you’re weird.

At least it’s cheap and people seem to be friendly and helpful and of good will. If I had to rewrite the rules of NYC, it would be:

Nothing is cheap.
Nothing is easy.
You are on your own, except your own personal friends.

The first and third of those, at least, are not true in India.

India: Zeroth Impressions

2017-07-25T00:00:00+00:00

Everyone’s been asking me how India is and has been wondering if I’ve gone exploring. I haven’t really. Sunday I was just recovering from jetlag and yesterday I had work and then I immediately had to go home and crash I was so tired: so I guess again recovering from jet lag? This would normally not prevent me from exploring, but I’m honestly a little outside my comfort zone.

I am not in a walkable neighborhood of a city like I expected, but next to a huge highway. There doesn’t seem to be a “downtown” to visit at all, so taking taxis everywhere seems to be the modus operandi. I’m sure this will change very soon, but in the two days (and long morning) I’ve been here so far, I’ve been to the airport, my building, and the office, and of course all the taxi trips in between.

I’m not completely opposed to this. I read a new Sci Fi novel, The Forever War. I’ve studied a bunch of Hindi, eaten some room service meals (if you don’t order them, they call you), and yesterday, of course, started actually doing my work — you know, the reason I was sent here.

And even with my relative reclusivity, there’ve been a lot of impressions, enough to say quite a bit about my experiences in just those locations and the thoughts I’ve had in the meantime.

Language

Before I came here, I was told that learning Hindi was pointless as everyone here speaks English. A little Googling confirmed for me that that was as absurdly false as my intuition told me, but now that I’m here, I see what they mean: everyone here speaks the amount of English absolutely necessary for me to communicate with them, in the specific capacity I’m meant to communicate with that particular person. That is to say, the concierge at the hotel speaks enough English to address hotel situations, the person who brings me food speaks enough English to sell me more food or bring me something else, the taxi drivers speak enough English to find a destination, etc. They’re right, in a sense: everyone does speak enough English that I can scrape by.

Now note that I said the amount of English absolutely necessary. My colleagues at work are fluent; everyone else I encounter, it’s a bit more dicey. It is by no means enough that I am comfortable or that I understand what’s going on. It is barely enough for me to convince the cabby who drove me here that no, he could not get away with dropping me off at a metro station near my apartment building, I would have no way of getting un-lost. It is not enough for me to explain any nuanced situation.

When someone asks “How many?” I can’t really respond “How many can you bring?” When a situation happens, I had better stick to the script. Except for, I don’t live in your country, I don’t know your script, and I have no idea what’s normal. Can you help me figure out what the script is? I didn’t know I had to pay for those water bottles (approx US$1 or INR60 apiece).

And that’s what frustrates me most about the language barrier. I often feel, even in the US, that I’m expected to follow a script that no one gave me a copy of. When I go off script, people can get confused and upset, and so if I detect there is a script and don’t know what it is, my instinct is to ask for clarification rather than try to wing it. But here, no one can understand my clarification, and if I go off script, not only do people not understand (i.e. think I’m crazy), they don’t understand (i.e. cannot process the unexpected flurry of English words emerging from my mouth).

This would be a little less frustrating if my attempts to hire a Hindi tutor were treated like reasonable behavior and not a quaint and inexplicable desire that only an unreasonable person would have, or else a gesture of amazing good will that I’m showing for some inexplicable reason. Please, can we just look past the fact that I’m foreign and let me in on your secret code? I promise not to divulge the secrets!

I shall continue to report on this as the situation develops.

Food

I’ve always loved Indian food. Ever since I was a kid I have, even though Gettysburg didn’t have an Indian restaurant — didn’t and still doesn’t. The food is actually not that different from Indian food that you might get in New York City, although its presentation and the attitudes towards it are different.

Meat is very clearly indicated. Unlike in NYC, where vegetarian food has a special mark or a special section, here there’ll be a special “non‐vegetarian” section. The only meats around seem to be chicken and mutton, and mutton I’ve only seen once. This explains maybe why so many people at my office are “vegetarian except for chicken,” except for it doesn’t because why is that even a thing? I suppose pork and beef are both too religiously problematic, and I suppose we’re too inland for fish, and I suppose lamb and mutton are expensive, and turkey’s an American bird, etc. etc.

“Indian breakfast” turns out to mean a bunch of bread and a little bit of yoghurt with an inconsequentially tiny but absolutely delicious side of pickle relish of some sort. My attempt at an American breakfast (eggs, toast, and hash browns) was disappointing in that it involved a very small amount of egg (two eggs, but tiny ones) and white toast. Who eats white toast? Does anyone like white toast? I guess it’s cheaper. I’m feeling kind of spoiled now.

The food at the office is amazing and involves spicy falafel and hummus among the buffet served for breakfast (at least the first day it did). 5pm is samosa o’clock: everyone takes a half hour little break to eat samosas. Unlike the New York office, Seamless isn’t a thing here. I wonder if there’s really that much delivery food at all; it’s really hard to tell. I’m getting the distinct impression that it’s a very narrow slice of society I’m being exposed to as it stands. That is one thing I definitely need to fix.

Work Culture

Why does Jimmy feel awkward?

A. Meeting lots of new people when I don’t really know anyone
B. Generally being an awkward turtle around new people
C. Not speaking the local language
D. Not being attuned to the local social norms and cues
E. Jet lag
F. All of the above

About Me

0001-01-01T00:00:00+00:00

Jimmy Hartzell

Programmer, writer, opinionator

Name: Jimmy Hartzell
Pronouns: He/Him
Dogs or Cats: Cats
Hair Style: Bald + Beard
Nationality: United States
Location: Pennsylvania (Philadelphia area)
Preferred Programming Languages: Rust, Haskell, C++, C
Resume: HTML, PDF
GitHub: jhartzell42
Programming Portfolio
Books I’ve Read Recently
Political Views
Biography: Programmer, Personal
Favorite Beer Style: Hefeweizen
Usable Languages: English, German
Languages I’ve Dabbled In: Spanish, Swedish, Japanese, Biblical Hebrew, Biblical Greek, Ecclesiastical Latin…

Professional Skills

Programming Languages

Rust
Haskell
C++
C
Python
Swift
Objective-C
Bash

Soft Skills (in a software development context)

Teaching: Lecturing and Explaining Technical Concepts
Mentoring
Requirements Gathering
Planning and Prioritizing Large Amounts of Work
Dealing with Clients

Hobbies

Writing/Blogging (hi!)
- About programming and other topics
- Even fiction sometimes!
- This is my biggest and most “productive” hobby
Friends
- Talking for hours and hours and hours on the phone with friends
- Travelling with friends
- Travelling to meet up with friends
- Going out with friends
- Going out to make friends
- Throwing parties to invite friends
- Board games with friends (I want more of this!)
Nerding out about/learning more about:
- Linguistics
- History
- Religious studies, especially the Hebrew Bible
- Germanic languages
- Economics
- Other nerdery (see the topics of this blog for details)
Exercising (this should be bigger and higher up, but honestly it just isn’t):
- Cycling
- Rock climbing (relatively new, I’m still quite bad!)
- Weight Training (lifting weights up and then putting them back down)
Keeping an overly meticulous TODO list
- Writing everything I need to do on the list
- Making sure I maintain the list well
- Make sure I plan each day
- Fail to do many of the things I planned
  - Especially errands and chores (they’re the worst)
- Many days: Write down the few things I did succeed at doing
Music
- Planning to get less rusty on piano
- Seeing if I might get less rusty at singing
- Considering getting less rusty on trombone
- Annoying my friends playing recorder

C++ Network Programming: Study Questions and Practice Projects

0001-01-01T00:00:00+00:00

C++ Study Questions

What are some common examples of undefined behavior?
- What is memory corruption?
  - What are some common causes of it?
- How can you get UB without memory corruption?
- How does UB interact with optimization?
What is RAII? How does it differ from garbage collection?
Why use smart pointers instead of raw pointers?
What is the STL?
- What is it useful for?
- What are some common STL collections?
- What are some common STL algorithms?
- How do iterators work?
- Why can’t you dereference the end iterator?
What is the difference between “value semantics” and “reference semantics”?
What is “type erasure”?
- How is std::function implemented?
- How is std::any implemented?
  - What does std::any even do?
Why does an empty struct have a size of 1 and not 0?
What is std::atomic?
- How does it differ from volatile?
- How can it be optimized?
What are the versions of the C++ standard?
- What major features were added in each of them?
What are the major C++ compilers?
- Under what licensing terms are they available?
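
One possible way to start on the type erasure question is a minimal sketch like the following. It is an illustration of the general technique, not how any particular standard library implements `std::function`: the class name `IntFunction` and its nested `Concept` and `Model` types are made up for this example, it handles only the signature `int(int)`, and unlike `std::function` it is not copyable and has no small-object optimization.

```cpp
#include <memory>
#include <utility>

// A minimal type-erased callable for the fixed signature int(int).
class IntFunction {
public:
    // Accept any callable type F and hide it behind the Concept interface.
    template <typename F>
    IntFunction(F f) : impl_(std::make_unique<Model<F>>(std::move(f))) {}

    // Callers see one concrete type; the real call is a virtual dispatch.
    int operator()(int x) const { return impl_->call(x); }

private:
    // The abstract interface: the only thing IntFunction knows about.
    struct Concept {
        virtual ~Concept() = default;
        virtual int call(int x) = 0;
    };

    // One Model instantiation per callable type; this is where the concrete
    // type F still exists after it has been "erased" from IntFunction.
    template <typename F>
    struct Model final : Concept {
        explicit Model(F f) : fn(std::move(f)) {}
        int call(int x) override { return fn(x); }
        F fn;
    };

    std::unique_ptr<Concept> impl_;
};
```

A capturing lambda and a plain function pointer can both be stored in the same `IntFunction` type, which is the point of the exercise.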

C++ Performance

What are some common compiler optimizations?
- What are some optimizations a compiler cannot do?
- What are downsides to having the compiler do optimizations?
Why are virtual function calls slower?
What parts of a codebase need most optimizing?
- What parts do not?
Common compiler flags
- What is the difference between -O2, -O3 and -Os?
- What is -g?
  - Why is it normally combined with -O0?
    - What does -O0 do?
    - How is that different from what -g does?
What is the difference between throughput and latency?

Operating System Questions

What is the difference between the stack and the heap?
What is an operating system kernel?
- What are some examples of operating system kernels?
- What is “kernel” or “supervisor” mode?
- What is a monolithic kernel vs a microkernel?
- What is a driver?
  - Can they run in usermode?
    - When?
What is a system call?
- What is the difference between a blocking and non-blocking system call?
What is the difference between a process and a thread?
- What are the performance implications?
Describe different forms of IPC
- What is the difference between brokered and non-brokered IPC?
- What considerations should you take into account when choosing an IPC system?
Describe virtual memory.
- What is an address space?
- What is memory protection?
- What is paging?
  - What is a page fault?
- What does a kernel have to do to implement virtual memory?
- How does a program allocate more memory on the stack?
- How does a program allocate more memory on the heap?
  - In userspace?
  - What syscalls might it have to make?
- How do memory mapped files work?
  - When should you prefer this to read and write syscalls?
- How does shared memory work?
  - When should you prefer this to other forms of IPC?
- What is swap?
- What is a TLB?
  - What performance implications does it have?

Network Study Questions

Explain the layers of the OSI model
- How do they map to Internet-based protocols?
Explain IP basics
- What is IP?
- What is an IP address?
  - What is a subnet?
  - What is a non-routable IP address?
  - What is localhost?
    - When would we use it?
  - What is NAT?
    - When is it used?
- How does IP relate to link-level protocols?
  - Give some examples of link-level protocols.
  - Explain CSMA/CD.
  - What is MTU?
  - Explain Path MTU Discovery, and why it’s important.
- What is ICMP?
  - What are 2 command-line utilities that use it?
  - Why is it important?
- What is the difference between a hub, a switch, and a router?
- What is the difference between IPv4 and IPv6?
Explain DNS basics
- What is DNS?
- What are some types of DNS records?
- On Unix
  - What is /etc/hosts?
  - What is /etc/resolv.conf?
  - What sort of things can go wrong if these are configured incorrectly?
- How do you access DNS from your applications?
What is a VPN?
Explain the basics of TCP.
- How does TCP implement reliability on top of IP?
- TCP is stream-oriented, UDP is packet-oriented. What does this mean?
- Explain the TCP three-way handshake.
  - What are SYN cookies?
- How do TCP acknowledgements work?
  - How does TCP handle the lack of negative acknowledgements?
- What is the TCP window size?
Explain the basics of UDP.
- What are some differences between TCP and UDP?
- Why might you want to use UDP?
- How are UDP broadcasts implemented?
On Linux, what sort of things can you tune about networking using sysctl?
- Why might you want to do this?
Why are heartbeats important?
- Why might they be implemented in user protocols?
  - Why is TCP keepalive not enough?
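
As a hint for the stream-oriented versus packet-oriented question: a TCP read can return half a message or several messages at once, so the application has to do its own framing. Below is a minimal sketch of newline-delimited framing. The `LineBuffer` class and its `feed` and `pending` members are hypothetical names for this example, and the chunks passed to `feed` stand in for whatever a real `recv` call would return.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accumulates bytes from a stream and hands back only complete lines.
class LineBuffer {
public:
    // Append a chunk as it arrived from the network and return every
    // complete line (without its trailing '\n') that is now available.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos) break;
            lines.push_back(pending_.substr(start, nl - start));
            start = nl + 1;
        }
        // Keep only the unfinished tail for the next call.
        pending_.erase(0, start);
        return lines;
    }

    // Bytes received so far that do not yet form a complete line.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

A real server would probably also cap the size of the pending buffer so that a client that never sends a newline cannot exhaust memory.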

Network Programming Practice Projects

Echo server
- Can test by hand using netcat
- v1: Accepts single TCP connection, echoes all inputs
- v2: Use threading to accept multiple connections, echo each to itself
- v3: Use a single thread
  - select or epoll syscall
  - async in Rust
  - Network reactor/dispatcher library in C++
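
One possible starting point for v1 of the echo server, as a minimal sketch using only the Rust standard library. The function names `echo` and `serve_one` are made up for illustration, and the sketch deliberately handles just one connection, leaving threading and `select`/`epoll` for v2 and v3.

```rust
use std::io::{self, Read, Write};
use std::net::TcpListener;

/// Copies everything read from `input` back out to `output`.
/// Returns the number of bytes echoed once `input` reaches end-of-file.
pub fn echo<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(n) => n,
            // A read interrupted by a signal can simply be retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            // Zero bytes means the peer closed its side of the connection.
            return Ok(total);
        }
        output.write_all(&buf[..n])?;
        output.flush()?;
        total += n as u64;
    }
}

/// v1 behaviour: accept exactly one TCP connection and echo it until it closes.
pub fn serve_one(listener: &TcpListener) -> io::Result<u64> {
    let (stream, _peer) = listener.accept()?;
    // Clone the handle so reading and writing use separate values.
    let reader = stream.try_clone()?;
    echo(reader, stream)
}
```

To try it by hand, bind a listener with `TcpListener::bind("127.0.0.1:7000")` (the port is an arbitrary choice), pass it to `serve_one`, and connect with `nc 127.0.0.1 7000`. Keeping `echo` generic over `Read` and `Write` means it can be exercised with in-memory buffers as well as sockets.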
Chat server
- Accept multiple connections
- Can test by hand using netcat
- v1: Send all whole lines (requires buffering) to all clients
  - No length restriction
- v2: Handshake to set username
  - Make sure it is still testable using netcat
  - Server sends username along with messages
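
The "whole lines (requires buffering)" requirement exists because TCP is a byte stream: a single `read` may return half a line, or several lines at once. Here is a minimal sketch of a per-client buffer that only releases complete lines. `LineBuffer` and its methods are hypothetical names, not part of any library, and in keeping with "no length restriction" the buffer is allowed to grow without bound.

```rust
/// Accumulates raw bytes from a socket and hands back only complete lines.
#[derive(Default)]
pub struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends one chunk (as returned by a single `read` call) and returns
    /// every line it completes, each including its trailing `\n`.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Remove the line, through its newline, from the front of the buffer.
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            lines.push(line);
        }
        lines
    }

    /// Bytes received so far that do not yet form a whole line.
    pub fn partial(&self) -> &[u8] {
        &self.pending
    }
}
```

Each connection would own one `LineBuffer`; whatever `push` returns is what gets forwarded to the other clients. A real server would probably want a maximum line length eventually, so that a client that never sends a newline cannot exhaust memory.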
File server
- Simple HTTP-like protocol to specify filename of what file to send
  - Make sure you sanitize inputs!
- v1: Serve one static file at a time
- v2: Stream multiple files to multiple clients without using threads
- v3: Support uploads
  - Write custom client
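
For the "sanitize inputs" point, the main danger is a client asking for something like `../../etc/passwd`. The sketch below shows one conservative way to map a requested filename onto a served directory. `resolve_request` is a hypothetical helper, and the policy (reject every `..`, absolute path, and empty name) is an assumption rather than the only reasonable choice.

```rust
use std::path::{Component, Path, PathBuf};

/// Maps a client-supplied filename onto a path under `root`, or returns
/// `None` if the name could escape the served directory.
pub fn resolve_request(root: &Path, requested: &str) -> Option<PathBuf> {
    if requested.is_empty() || requested.contains('\0') {
        return None;
    }
    let mut resolved = root.to_path_buf();
    let mut pushed_any = false;
    for component in Path::new(requested).components() {
        match component {
            // Ordinary names are appended to the root.
            Component::Normal(name) => {
                resolved.push(name);
                pushed_any = true;
            }
            // "." is harmless and can be skipped.
            Component::CurDir => {}
            // "..", a leading "/", or a Windows drive prefix could leave `root`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
                return None;
            }
        }
    }
    // A request such as "." names no file at all.
    if pushed_any {
        Some(resolved)
    } else {
        None
    }
}
```

This check is purely lexical: it does not look at the filesystem, so it does not protect against symbolic links inside the served directory that point elsewhere. That would need a separate check after opening or canonicalizing the path.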
Exchange
- v1: Just exchange connection
  - Keep order books
    - When new order comes in compatible with old order, make trade
    - One security at first, then multiple
      - But build multiple into protocol right away
  - TCP Protocol:
    - Create a custom protocol with CLI client
    - Client -> Server
      - Orders
        - Only GTC limit orders at first
        - Quantity
        - Security
        - Price
        - Buy/Sell
        - Order ID
          - Why is this useful?
    - Server -> Client
      - Order confirmation
        - All order information, echoed
      - Trade confirmation (when orders match)
        - Also indicate whether client was maker or taker
- Add-ons
  - Add heartbeats to protocol
  - Market data
    - UDP broadcast
    - Current state of order books
  - Drop copy
    - Separate TCP records of all trades
    - Record trades in file before confirming
    - Allow replay of previous days’ trades
  - Fees and kickbacks
    - Charge takers
    - Reimburse makers
    - Configurable on by-security basis
  - Credit limits
  - Advanced order types
    - IOC/FOK
    - Maker-only orders
  - FX-style last look
  - SSL support
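
The core of the exchange's v1 ("when new order comes in compatible with old order, make trade") can be sketched as a small matching engine. Everything here is an assumption made for illustration: the type names, integer tick prices, price-time priority, and trading at the resting order's price. The `Order` fields follow the protocol outline, and the resulting `Trade` records which order was the maker and which the taker.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side {
    Buy,
    Sell,
}

/// A GTC limit order carrying the fields from the protocol outline.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Order {
    pub order_id: u64,
    pub security: String,
    pub side: Side,
    pub price: u64, // integer ticks, to avoid floating-point prices
    pub quantity: u64,
}

/// One fill: the resting order is the maker, the incoming order the taker.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Trade {
    pub maker_order_id: u64,
    pub taker_order_id: u64,
    pub price: u64,
    pub quantity: u64,
}

/// Order book for a single security, each side kept sorted best price first.
#[derive(Default)]
pub struct OrderBook {
    bids: VecDeque<Order>, // highest price first
    asks: VecDeque<Order>, // lowest price first
}

impl OrderBook {
    pub fn new() -> Self {
        Self::default()
    }

    /// Matches `incoming` against resting orders, then rests any remainder.
    pub fn submit(&mut self, mut incoming: Order) -> Vec<Trade> {
        let side = incoming.side;
        let (opposite, same) = match side {
            Side::Buy => (&mut self.asks, &mut self.bids),
            Side::Sell => (&mut self.bids, &mut self.asks),
        };
        let mut trades = Vec::new();
        while incoming.quantity > 0 {
            let best = match opposite.front_mut() {
                Some(best) => best,
                None => break,
            };
            let crosses = match side {
                Side::Buy => incoming.price >= best.price,
                Side::Sell => incoming.price <= best.price,
            };
            if !crosses {
                break;
            }
            let quantity = incoming.quantity.min(best.quantity);
            trades.push(Trade {
                maker_order_id: best.order_id,
                taker_order_id: incoming.order_id,
                price: best.price, // trade at the resting (maker) price
                quantity,
            });
            incoming.quantity -= quantity;
            best.quantity -= quantity;
            if best.quantity == 0 {
                opposite.pop_front();
            }
        }
        if incoming.quantity > 0 {
            // Rest behind every order at an equal or better price (time priority).
            let price = incoming.price;
            let pos = same
                .iter()
                .position(|o| match side {
                    Side::Buy => o.price < price,
                    Side::Sell => o.price > price,
                })
                .unwrap_or(same.len());
            same.insert(pos, incoming);
        }
        trades
    }
}
```

Supporting multiple securities could then be a matter of keeping one `OrderBook` per security name, for example in a `HashMap<String, OrderBook>`, which is one reason to put the security into the protocol from the start. The linear scan in `insert` is fine for a practice project; a production book would likely use a structure keyed by price level.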

Gardens

0001-01-01T00:00:00+00:00

Inspired by “The Garden and the Stream: A Technopastoral”, I have a few pages that are intended to grow over time, more like Wikipedia than Twitter, more like old-style webpages than a blog, more like books than newspapers or magazines, designed to be read in a slow trickle by people coming across my website, rather than booming across the Internet and then being nearly forgotten. I intend to increase how many of these I have over time.

Programming

Not Programming

Humorous Thoughts

0001-01-01T00:00:00+00:00

I stick out my tongue when I think, and I put my tongue in my cheek when I tease. Am I a cartoon character??
By putting a blindfold on, you understand everything you can see.
Some of my best friends are archetypes.
As many of you may know, 2 + 2 is commonly held to equal 4.
According to the New Testament and some cladistically-inclined biologists, men are a type of fish. According to older norms of English usage and some curmudgeonly Bible translators, men encompasses all humans.

License

0001-01-01T00:00:00+00:00

All code on this blog is available under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Personal Bio

0001-01-01T00:00:00+00:00

TL;DR

I am an ADHD (hyperactive presentation) Pennsylvania Dutch millennial (1988) man, raised German Lutheran, (mostly) in small town Pennsylvania, in the United States.

I have family and friends but I’m not going to write about them in detail here because that feels too private.

Life Story

As a pre-school boy, I enjoyed pretending I was various animals I’d learn about from cartoons and books, especially pretending that I was the grey squirrel from the song “Grey Squirrel, grey squirrel, swish your bushy tail,” or the brief phase where I tried to insist I was a macaroni penguin. I enjoyed singing, and would often sing lullabies and children’s songs, or Christmas carols even when they were not in season. My parents could always entertain me or distract me easily with music.

As a young child, I was a fan of Winnie the Pooh. I enjoyed the books, I enjoyed the TV show, and at one point, my favorite toys were plastic Winnie the Pooh figures.

I was not a fan of Power Rangers, which my family did not watch. I did not know why my fellow kindergartners liked it so much, but was convinced they were objectively wrong somehow. I still don’t know anything about Power Rangers, but I’m willing to believe it may have been good in some ways if so many people liked it.

Being a neurodivergent child in the 90’s was super hard in a lot of ways, and I don’t want to talk about any of them, so I won’t. I will however mention that in kindergarten and first grade, I had a psychologist who would play board games and video games with me – this was known as “play therapy.” I did not know what this had to do with being a doctor, but I wasn’t about to complain, especially because it was fun. We played Commander Keen for MS-DOS. I remember being oddly impressed that he had Windows 3.0 instead of Windows 3.1 on his computer – even though it was a lower version number, it seemed oddly exotic. Why were there so few Windows 3.0 computers? How did everyone else manage to upgrade?

In any case, I insisted on getting piano lessons in the first grade – originally trying to get my mother to teach me, and then forcing her to hire a teacher when my demands for instruction outpaced her capacity to teach.

Throughout my youth, I enjoyed music, especially choral music (as a boy soprano and then as a tenor). I played piano (from 1st grade), trombone (from 4th grade), and recorder (I don’t know when I learned this). It was recorder that taught me how to play by ear: I used to come play Christmas carols during lunch in December, whether or not my fellow schoolchildren wanted to hear them. I got musical education both at school and at church, and appreciated this dual source of instruction. Communal singing was most of the draw of going to church.

In high school, I remained an avid reader, especially of science fiction and fantasy. My favorite authors were J.R.R. Tolkien, C.S. Lewis, Arthur C. Clarke and Robert Heinlein, approximately in that order. I did debate team and marching band in high school as my most time-intensive extra-curriculars. I enjoyed learning things both inside and outside of my classes, but did not at all enjoy doing homework assignments, and I resented adult authority over me, as is the way of teenagers. I had an ordinary amount of friends, many of whom I’m still friends with. My favorite topics in school were probably German and math. During the last summers, I worked at the local college’s IT department.

I went to Cornell University in Ithaca, NY at 17, where I originally hoped to dual-major in computer science and linguistics. Unfortunately, it turns out dual majoring is hard, and so I did not end up successfully doing that, just sticking to CS, officially – though I did take a number of linguistics classes, as well as coming up one class short of a dual major in math (minors were not offered in math).

At university, I made many friends, gained a lot of life experience, and tried and failed to learn Japanese. I worked part-time at one point as a system administrator in a college computer lab and at another point as a short-order cook in a college dining hall. I also worked as an undergraduate TA, designing better course software and teaching sections.

In 2010, I had a hiking accident and broke my back. I took a health leave of absence from college, recovered, and then worked as a programmer in NYC before returning to finish my bachelor’s in spring of 2013.

Besides my brief return to Cornell, I continued living in New York City. During my 20s, the most important thing to me was cultivating and maintaining close friendships, which I did, and traveling to as many fun places as possible, which I also did. I lived in Brooklyn, and tried to convince the hipsters I was one of them, in spite of my preference for lagers or wheat beers (or even fancy cocktails) over IPAs. I was a regular at a few bars and restaurants. I gradually adopted bicycle as one of my go-to means of transit, especially CitiBike to get around Brooklyn.

Then, in January of 2022, I realized that programming jobs no longer had physical locations and so I could live wherever I wanted. Now, I live in an overly large house (or perhaps I’m an overly small person?) in a different, slightly less small town in Pennsylvania. I don’t know what the most important thing will be to me in my 30’s – ask me again when I’m done with them.

But do you like any poems?

Disobedience
A.A. Milne

James James
Morrison Morrison
Weatherby George Dupree
Took great
Care of his Mother
Though he was only three.
James James
Said to his Mother,
“Mother,” he said, said he;
“You must never go down to the end of the town, if
you don’t go down with me.”

James James
Morrison’s Mother
Put on a golden gown,
James James
Morrison’s Mother
Drove to the end of the town.
James James
Morrison’s Mother
Said to herself, said she:
“I can get right down to the end of the town and be
back in time for tea.”

King John
Put up a notice,
“LOST or STOLEN or STRAYED!
JAMES JAMES
MORRISON’S MOTHER
SEEMS TO HAVE BEEN MISLAID.
LAST SEEN
WANDERING VAGUELY
QUITE OF HER OWN ACCORD,
SHE TRIED TO GET DOWN TO THE END OF
THE TOWN - FORTY SHILLINGS REWARD!”

James James
Morrison Morrison
(Commonly known as Jim)
Told his
Other relations
Not to go blaming him.
James James
Said to his Mother,
“Mother,” he said, said he,
“You must never go down to the end of the town with-
out consulting me.”

James James
Morrison’s Mother
Hasn’t been heard of since.
King John
Said he was sorry,
So did the Queen and Prince.
King John
(Somebody told me)
Said to a man he knew:
“If people go down to the end of the town, well, what
can anyone do?”

(Now then, very softly)
J. J.
M. M.
W. G. du P.
Took great
C/o his M*****
Though he was only 3.
J. J.
Said to his M*****
“M*****,” he said, said he:
“You-must-never-go-down-to-the-end-of-the-town-
if-you-don’t-go-down-with-ME!”

Political Views

0001-01-01T00:00:00+00:00

Hello! These are some of my political views! I live in the United States, so many of these are from a US perspective, but I am more interested in policy than I am in partisan politics (not that I’m ignoring that, it’s just not as interesting)!

Please feel free to send in additional information on these topics! Feel free also to request additional topics, and I might or might not add them!

Environment and Science

In general, I tend to be in favor of a lot of biological technologies that are scary to others.

Nuclear energy: Pro. It’s far safer than fossil fuels: nuclear has killed a very small number of people in accidents, but fossil fuels not only cause accidents but also kill millions of people a year from air pollution alone, not to mention permanent environmental damage. The alternative to nuclear energy isn’t renewables, but fossil fuels. When Germany got rid of nuclear, they added coal, and the “environmentalists” who wanted nuclear gone have made the planet worse by it.

As to whether new nuclear plants should be built instead of renewables, that’s a matter of a number of factors. But when it comes to dismantling nuclear, or other situations that in practice pit nuclear power against fossil fuels, nuclear power should always be preferred.

More resources:

Kurzgesagt has a series on nuclear energy:

GMO: Pro. It’s not that different from traditional plant breeding. It has the potential to lift people from poverty, food insecurity, and malnutrition. I do think that monopolistic practices connected to GMOs can be bad, but those are the fault of those practices, not GMO in general.

Vaccines: Pro. The purported downsides are mostly literal lies.

Technology

Open source: Pro. The government should fund open source development liberally, as should private corporations. Companies should upstream internal improvements to open source software.

Social media: Opposed. Online activity should be expressed through decentralized means like the Fediverse, e-mail, and traditional websites. Let’s give less control to the megacorporations and their algorithms.

Blockchain: Opposed. Cryptocurrencies are a bubble, and blockchains, while interesting in theory, use way too much energy in practice, for no societal gain.

“Artificial Intelligence”: Opposed as it is used. AI systems steal from content creators in a bad way (from regular people to large corporations), they lie continuously because they have no concept of truth, and their biggest use case is lying convincingly and creating a bland but elevated style, both of which are bad things, actually. They are useful for language learning, and are an interesting topic for research.

Transportation and Housing

Building more housing: Pro. The US has a shortage of housing overall, and in particular a shortage of housing where jobs actually are. We also have a demographic crisis that will require more people, and once we have those people, either through immigration or by having children, we will need places to put them. Building new market housing does in fact overall drive housing prices down, or slow down their skyrocketing. Exceptions are local and transient, and those who say otherwise are mistaken.

Public transit: Pro. Especially trains (inter-city, subway, and trolley) are essential infrastructure for a functioning society. We should have many times the amount of transit infrastructure we have in the US. Trains are currently barely subsidized at all compared to cars and planes, so expecting trains to be profitable when those other means of transportation are not is actually very unfair.

Cars: Opposed. Cars not only pollute, they are also dangerous (car crashes are a huge cause of death) and bad for society. Only licensed sober adults can drive them, and people not in those categories also need to be able to socialize and visit each other. People should live in walkable towns with convenient walking access to services, not in suburban developments only accessible by car.

Economy

Anti-Trust: Pro. Break up big companies! Break up monopolies, especially tech monopolies! Disapprove mergers!

Debt ceiling: Opposed. It’s unconstitutional, default would be disastrous, and there are other mechanisms to keep budgets in line.

Government spending: Pro. Money has to come from somewhere, and the government is likely to spend it on good projects like infrastructure, rail, and climate technology, whereas the private sector is likely to spend it on consolidating monopolies and problematic tech fads, like AI and blockchain. Discretionary spending should focus on technological development and infrastructure, rather than military, and it should be an investment: it should try to grow the economy by a multiplier of the amount of the spending.

Regulations: Nuanced. Regulations should be effective but simple. Adding bureaucracy for limited benefit can be very bad, even if the purported goals are good.

Deficit spending: Pro. The government should always have at least a mild deficit, specifically in the US. Total debt is a bad metric for anything, but it can just monotonically increase, as GDP also grows (see next point). Debt per GDP should probably stay steady, although it might have to increase and decrease in various situations depending on economic conditions.

Growth: Pro. Green growth is possible. Saving the planet will be accomplished by building technology, not by asceticism, mostly because not enough people will support asceticism to actually make it a realistic goal. The economy can and will continue to grow as technology improves.

Taxation: Taxes in the US could be mildly higher. Tax increases should be careful to focus on the actual rich rather than the upper-middle class and small businesses. More important than increasing taxes is simplifying them, closing loopholes, funding enforcement, and getting rid of the arcane requirement in the US to file our own taxes. Just tell me how much money you want, IRS!

Entitlements: Entitlements should be universal rather than means-tested. For example: Everyone should get food stamps, funded by a tax that the poorest won’t have to pay.

Representative Democracy

Constitutional monarchy: I enjoyed this article, which suggests that the US should become a constitutional monarchy and invite Beyoncé to literally be Queen.

Electoral system: Proportional representation. The US should transition to a German-style proportional representation multi-party parliamentary system.

Programmer Bio

0001-01-01T00:00:00+00:00

I learned to program as a child in the 90’s back in the era when PCs came with sample games like QBASIC Nibbles, with included code for hobbyists and learners to play around with. It was sheer luck that this random interest of mine turned out to be a marketable, employable skill.

My personal story plays into the narrative of the driven young autodidact, usually male, usually socially awkward, who has that “genius” to be a “real programmer”, perhaps at the expense of other qualities. But this is not my take-away. Anyone can learn how to program, even if they don’t write a line of code – or even touch a computer – until arbitrarily late in life. And unlike other forms of math and logic, it is accessible even to young children, and should be taught in schools alongside other subjects.

I was very fortunate that my uncle and godfather encouraged me in programming from an early age, and also got me into Linux and open source software. I quickly developed an interest in operating systems and systems programming, running FreeBSD and trying to learn the Unix system call API and the differences between the Unix flavors. I still feel most at home in systems programming languages, such as C, C++, and, in our modern era, Rust (but more on that later).

After an embarrassing period where I got deeply into dynamically-typed OOP, Smalltalk, and Objective-C, I was introduced to functional programming and static typing at my university’s functional programming course (then called CS312 and in SML), and I quickly understood the benefits. My friends soon introduced me to Haskell, which I digested with enthusiasm. It is still my favorite GC’d language and applications programming language.

My career proceeded in the C++, systems side of things, as I worked as a low-latency programmer and later as an instructor at Tower Research Capital, but I never forgot my affection for Haskell and powerful static type systems. I excitedly embraced “modern C++” and C++11, and tried my best to use C++’s features to maximize safety and expressiveness.

My favorite part of my job at Tower was when I ran the technical training course for new programming hires, which I did for multiple iterations of that course. I covered topics like C++ template metaprogramming, network protocol design, and low-latency coding techniques. I really enjoyed instruction and got really good at explaining things and leading classes. I really miss teaching.

After Tower, I wanted to avoid finance and low-latency programming altogether, and took a job at Obsidian, one of the largest Haskell consultancy shops, to work on a mix of embedded C projects and Haskell projects in Reflex, Obsidian’s open-source framework for “Functional-Reactive Programming,” a revolutionary up-and-coming paradigm for GUI programming.

But my interests in strong typing and in systems programming could not fully be reconciled until I joined Savant Power, and discovered Rust. As I learned more about Rust, it overcame my initial skepticism, and it became clear to me that Rust was finally achieving what modern C++ had been striving for for so long: A high-performance systems programming language that was also type-safe, ergonomic, and composable.

And that is where I stand to this day: I believe that Rust is C++ done right.

Programming Portfolio

0001-01-01T00:00:00+00:00

This is a page where I link code that I have written that is publicly available. Most of my professional work has been proprietary, and I have not been much of a hobbyist programmer over my career (though I’m trying to change that), so unfortunately most of my code doesn’t make it on here, but there is still some!

My GitHub is jhartzell42.

There is also code published on various articles on this blog. Any code on this blog is covered under the MIT license.

Rust

holdem_rs (hobby, 2024): Some basic Texas Hold-Em (poker) code, we’ll see what happens with it.
prefix-range (hobby, 2023): Thank you Arvid Norlander for publishing my code from this blog post as a crate.
serde-dbus (professional, 2021-2022): This is a serialization format for serde for the DBus messaging protocol’s marshalling format. It was necessary due to an issue adapting the more common zvariant to my employer’s specific needs. It currently only supports DBus messages formatted in a “loosely typed” manner (with signature a{sv}) rather than the strongly typed manner more commonly used with DBus, but this suited the needs we had.
Pull requests accepted on open source projects:

C

ledger-app-tezos (professional, mostly 2018): Although I am not personally a cryptocurrency enthusiast, during my time working for Obsidian Systems, it was part of my job to implement this app to support Tezos on the Ledger. Though it was a group project, I was the primary original author. The original target platform, the Ledger Nano S, had only 4K of RAM. I understand it has changed a lot since I originally worked on it.
compass (hobby, 2009-2010): This was a hobby project with a friend in college, and it is a bytecode interpreter for a Smalltalk-like language in C, combined with a compiler in Python and an “assembler” for the bytecode language also in C.

Haskell

reflex-chess (hobby, mostly 2019): This is a sample game written using Reflex. It only allows playing against another local user. It has also been worked into the official reflex-examples repo.
reflex-word-tiles (hobby, 2022): A Wordle clone. Work in progress.
Pull requests accepted on open source projects:
- reflex-dom bugfix
- reflex-examples add chess

Reading Log

0001-01-01T00:00:00+00:00

These are books that I have finished, and links to reviews if I’ve written them.

March 2025

What’s Your Vibe?, by Kat Majik

February 2025

Liminal Illumination: A Short Book of Intentional Poetry, by Ariel E Monroe
Lying in the Deep, by Diana Urban

January 2025

Searching for Dragons, Patricia C Wrede
Attached, Amir Levine and Rachel S.F. Heller
Eruption, Michael Crichton and James Patterson
A True Account: Hannah Masury’s Sojourn Amongst the Pyrates, Written by Herself, by Katherine Howe

December 2024

The Silent Patient, Alex Michaelides

November 2024

The Midnight Library, Matt Haig
The Princess Trials, Cordelia Castel

October 2024

The Wishing Game, Meg Shaffer
Submission, Michel Houellebecq
Still Life, Louise Penny
The Mars House, Natasha Pulley

September 2024

This Is Where I Leave You, Jonathan Tropper
Before We Were Trans: A New History of Gender, Kit Heyam
Strong Towns: A Bottom-Up Revolution to Re-Build American Prosperity, Charles L Marohn, Jr.

August 2024

Project Hail Mary, Andy Weir
Sign Here, Claudia Lux
The Courage to Trust: A Guide to Building Deep and Lasting Relationships, by Cynthia L. Wall, LCSW

July 2024

The Other Valley, Scott Alexander Howard
House in the Cerulean Sea, TJ Klune
So Long and Thanks for All the Fish, Douglas Adams (re-read)

June 2024

Hitchhiker’s Guide to the Galaxy, Douglas Adams (re-read)
QED: The Strange Theory of Light and Matter, Richard Feynman (re-read)
The Glass Hotel, Emily St. John Mandel
The Restaurant at the End of the Universe, Douglas Adams (re-read)
Life, The Universe, and Everything, Douglas Adams (re-read)

May 2024

The Making of Biblical Womanhood: How the Subjugation of Women Became Gospel Truth, Beth Allison Barr
Where Secrets Lie, Eva V. Gibson

April 2024

Good Taste: A Novel in Search of Good Food, Caroline Scott
Play It as It Lays, Joan Didion
Bhagavad Gita: A New Translation, translated Stephen Mitchell

March 2024

In Five Years, Rebecca Serle
Anathem, Neal Stephenson

February 2024

Stop Walking on Eggshells, Paul T. Mason and Randi Kreger
A River Enchanted, Rebecca Ross

January 2024

The Galaxy and the Ground Within, Becky Chambers
Dealing with Dragons, Patricia C Wrede
I Hate You, Don’t Leave Me: Understanding the Borderline Personality, Jerold Kreisman, Hal Straus
One Billion Americans: The Case for Thinking Bigger, Matthew Yglesias
Mika in Real Life, Emiko Jean

November 2023

Exiled from Earth, Ben Bova
What is Real? The Unfinished Quest for the Meaning of Quantum Physics, Adam Becker
Children of Ruin, Adrian Tchaikovsky
The Power, Naomi Alderman

October 2023

She Who Became the Sun, Shelley Parker-Chan
Children of Time, Adrian Tchaikovsky

September 2023

Red, White, and Royal Blue, Casey McQuiston
What is Anarchism? An Introduction, Donald Rooum et al.

August 2023

Light from Uncommon Stars, Ryka Aoki
Record of a Spaceborn Few, Becky Chambers
A Closed and Common Orbit, Becky Chambers
Madam, Phoebe Wynne
The Terraformers, Annalee Newitz

July 2023

Polysecure: Attachment, Trauma, and Consensual Non-Monogamy, Jessica Fern

June 2023

The Long Way to a Small, Angry Planet, Becky Chambers
Once There Were Wolves, Charlotte McConaghy

March 2023

Circe, Madeline Miller
What If? 2, Randall Munroe
Beartown, Fredrik Backman, tr. Neil Smith
God Is Disappointed in You, Mark Russell and Shannon Wheeler
Money: The True Story of a Made-Up Thing, Jacob Goldstein
Understanding Government Finance, Brian Romanchuk
Modern Monetary Theory and the Recovery, Brian Romanchuk

February 2023

Red Inferno: 1945, Robert Conroy

January 2023

God: A Biography, Jack Miles
Rubinrot, Kerstin Gier
Quiet, Susan Cain
How the Bible Was Built, Charles Merrill Smith and James W. Bennett

November 2022

A Memory Called Empire, Arkady Martine
A Desolation Called Peace, Arkady Martine

October 2022

Unfamiliar Fishes, Sarah Vowell
Atomic Habits, James Clear

September 2022

The Lamplighters, Emma Stonex
Hollow Kingdom, Kira Jane Buxton
Dune Messiah, Frank Herbert

August 2022

Kaiju Preservation Society, John Scalzi
Buffering: Unshared Tales of a Life Fully Loaded, Hannah Hart

July 2022

Cold Clay, Juneau Black
Monk & Robot, Becky Chambers
- A Psalm for the Wild-Built
- A Prayer for the Crown-Shy
Will My Cat Eat My Eyeballs?, Caitlin Doughty

June 2022

Hyperion Cantos, Dan Simmons
- (Had previously read 1 & 2)
- Endymion
- The Rise of Endymion
Shady Hollow, Juneau Black

May 2022

Old Man’s War, John Scalzi (re-read)
- Old Man’s War
- “Questions for a Soldier”
- The Ghost Brigades
- “The Sagan Diary”
- The Last Colony
- “After the Coup”
- Zoe’s Tale
- The Human Division
- “Hafte Sorvalh Eats a Churro and Speaks to the Youth of Today”
- The End of All Things

April 2022

The Comic Book Story of Beer, Aaron McConnell, Jonathan Hennessey, and Mike Smith
Plain Truth, Jodi Picoult

There is no way I can record books read before this, so I shan’t try.

Recommended Rust Resources & Reading

Here are a few links to materials that helped me get my bearings in Rust, and understand it deeper.

Many of these are bog-standard, and the same materials others would recommend. This is a good thing. Rust is an extremely well-polished programming language with an excellent community. Trust them about the idioms. Trust the programming language. If you don’t understand why Rust made a certain decision, know that there is almost certainly a well-considered, important reason, and that includes its choices of commonly-recommended documentation.

Beginners

Of course, The Book is the absolutely indispensable canonical reference for the programming language.
O’Reilly’s book is also well worth reading
For those who like exercises, Rustlings is incredibly well-done.
For those coming, like I did, from C and C++, Learn Rust the Dangerous Way is a great resource.
Rust has a relatively small standard library. Some external dependencies are practically standard, and have their own tutorials and documentation. Among these are Tokio, Serde, and log.
- There is a fuller list of “standard non-standard” crates

Intermediate

Rust for Rustaceans is a nice “second semester” course in Rust, covering all the things that every advanced Rust programmer really should know, and no longer has to learn the hard way. This contained a lot of especially useful information for the serious software engineering and dependency management considerations involved in maintaining a library and publishing a crate.
Faster than Lime, as far as I can tell, straddles beginner and intermediate.

Unsafe Rust

There are a lot of misunderstandings about unsafe in Rust, but most can be cleared up by reading the Rustonomicon. Even if you don’t personally have occasion to use unsafe, it is an essential part of the language, and the crates that you depend on – whose source code you should be reading – will use it.
Learn Rust With Entirely Too Many Linked Lists is a project to teach Rust by programming the data structure it is perhaps least well-suited to.
Ralf Jung on undefined behavior:
- Undefined Behavior Deserves a Better Reputation
- Do we really need Undefined Behavior?
Ralf Jung’s series on pointers/memory models:

Articles

Typestate Pattern in Rust
An introduction to Oxide’s new operating system and debugger, Hubris and Humility
Rust can help with the environment as well
Not Rust, but systems programming relevant: structure packing
Why Discord is switching from Go to Rust
A thorough treatise on why Go is bad

Résumé

0001-01-01T00:00:00+00:00

Jimmy Hartzell: Systems Programmer

Phone: 646-334-9882, Email: jah259@cornell.edu, Website: https://www.thecodedmessage.com/

Skills

Programming languages: Rust, C++, C, Haskell, Swift, Python, Objective-C, Bash, x86 assembly (32 and 64 bit)
Technologies: Linux systems/low-latency network programming, Tokio, Reflex FRP, Yocto, AWS, Ledger Nano S, Redis, C++ template metaprogramming

Career Experience

Two Sigma: August 2024-Present, Rust Specialist
- Technologies: Rust
Amtrak: July 2023-June 2024, Senior Principal Software Engineer
- Technologies: C++, HP NonStop
- Developed simulator for ITCS Positive Train Control protocol
- Fixed bugs in HP NonStop dispatching codebase
Savant Systems: May 2021-June 2023, Senior Embedded Linux Software Developer
- Technologies: Rust (incl. Tokio), Yocto, Swift, Objective-C, Redis
- Wrote usermode Rust driver for Atmel energy meter
- Adapted quickly to a decades-old Objective-C codebase
- Developed and implemented migration plans for core components of system architecture
- Rewrote Swift microservices and frameworks into Rust
- Added caching layers around accesses to legacy key-value store, and implemented bidirectional synchronization between it and Redis
Obsidian Systems: March 2018-May 2021, Software Development Consultant
- Technologies: Haskell, Reflex FRP, C, Ledger Nano S, Nix, C++
- Full-stack Haskell application development
- Worked with a variety of clients with diverse corporate cultures and organizational systems
- Worked on Incremental View, a database research project for incremental queries on Postgres
- Wrote apps in embedded C on Ledger Nano S (a platform w/ 4K of RAM)
- Refactored overengineered client C++ codebases
- Did trainings and talks on C++, Rust, blockchain, and Haskell
Tower Research: June 2013-March 2018, Senior Software Developer
- Technologies: C++ (C++11, C++14), C++ template metaprogramming, Linux systems programming, clang-format, valgrind, gdb, FIX protocol, Intel64 assembly
- Risk platform, C++ development (2017-2018):
  - Wrote a new high-performance logging system
  - Led a small team to add new trade reconciliation systems to comply with EU regulations
- Lead training instructor (2016-2018):
  - Developed and taught full-time C++, networking, systems, and low-latency programming curriculum for new hires in US and India
  - Trained and mentored other instructors
- FX trading desk, C++ development (2013-2016):
  - Mentorship: First line of defense for team member questions
  - Continuously made latency improvements for market data handlers
  - Developed new aggregator project to aggregate internal liquidity
  - Owned support for FX “last look” feature
  - Wrote/maintained handlers for many financial protocols
Moat: Feb 2011-March 2013, Infrastructure Developer
- Technologies: Python, C++, Bash, AWS, S3
- Led a 3-member team to develop server discovery and deployment scripts
- Scalable bloom filter implementation in C++

Education

Cornell University: Bachelor’s in Computer Science

Rust Opinions

0001-01-01T00:00:00+00:00

Rust Style Guidelines

cargo fmt is your friend. I use the default settings.
clippy is a great tool, and should be a requirement for getting PRs merged.
unsafe is fine when called for, but use it carefully
- It should be commented
- It should be wrapped in a safe abstraction
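
As a minimal sketch of those two points, the hypothetical function below (the name `first_and_last` is an illustration, not something from any real crate) keeps its one unsafe block behind a safe signature and documents why the block is sound:

```rust
/// Returns the first and last elements of a slice, or `None` if it is empty.
///
/// The signature is entirely safe: no caller can trigger undefined behavior.
pub fn first_and_last(items: &[u32]) -> Option<(u32, u32)> {
    if items.is_empty() {
        return None;
    }
    let last_index = items.len() - 1;
    // SAFETY: `items` is non-empty (checked above), so both index 0 and
    // `last_index` are in bounds.
    let (first, last) = unsafe {
        (*items.get_unchecked(0), *items.get_unchecked(last_index))
    };
    Some((first, last))
}

fn main() {
    println!("{:?}", first_and_last(&[3, 1, 4, 1, 5]));
    println!("{:?}", first_and_last(&[]));
}
```

Ordinary indexing would do here; the unchecked access is only there to show the shape of the comment and the wrapper.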

Error Handling

Panicking always indicates a bug
- Especially in a library
- Also in an application
- Doesn’t mean you can’t call panic
  - But if that code path is actually activated, it’s a bug
Use ? even in toy projects
unwrap does not belong in your repos. expect is acceptable.
- It is too prone to abuse. clippy should be configured to ban it.
- Use expect where unwrap would be appropriate
  - expect is appropriate for indicating logic errors
    - A panic is always a bug, especially in a library
    - Mutex::lock() is an appropriate place to use expect
  - expect is not appropriate as a band-aid on bad flow control
    - Use if let Some(x) = rather than is_some followed by unwrap
  - ? should be preferred in most non-logic error situations
    - Or explicit handling
  - Write a useful message with expect
    - To aid debugging if you were wrong
    - To show to the reader why you think it will never be called
  - Consider writing panicking functions over calling expect repeatedly
    - Array indexes are a good example of this
- This is controversial
  - Some experts think unwrap is OK in some situations where I allow expect
    - I think unwrap is too tempting for the situations where neither expect nor unwrap is OK
  - No experts think unwrap is fine to use much more liberally than that
- unwrap is OK in draft code where it is interpreted as an XXX
  - Don’t let this into your production codebase
  - Make sure you have the unwrap check in clippy enabled in CI so it doesn’t slip in
Use thiserror for libraries, use anyhow or eyre for binaries.
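
Here is a rough sketch of how a few of these guidelines might look in practice, using only the standard library. The function names (`parse_port`, `describe`, `bump`) are made up for illustration. To ban unwrap in CI, clippy's `unwrap_used` lint is one option.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;
use std::sync::Mutex;

/// Bad input is not a bug, so it is propagated with `?` rather than panicking.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

/// `if let` instead of `is_some()` followed by `unwrap()`.
fn describe(ports: &HashMap<String, u16>, name: &str) -> String {
    if let Some(port) = ports.get(name) {
        format!("{name} listens on {port}")
    } else {
        format!("{name} is not configured")
    }
}

/// A poisoned mutex means another thread already panicked while holding it.
/// That is a logic error, so `expect` with an explanatory message fits.
fn bump(counter: &Mutex<u32>) -> u32 {
    let mut guard = counter
        .lock()
        .expect("counter mutex poisoned: a thread panicked while holding it");
    *guard += 1;
    *guard
}

fn main() -> Result<(), ParseIntError> {
    let mut ports = HashMap::new();
    ports.insert("web".to_string(), parse_port("8080")?);
    println!("{}", describe(&ports, "web"));
    println!("{}", describe(&ports, "db"));
    println!("{}", bump(&Mutex::new(0)));
    Ok(())
}
```

The `expect` message says why the panic should be unreachable, which helps both the reader and whoever has to debug it if that assumption turns out to be wrong.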

Things I love about Rust

The Programming Language Itself

That it has proper sum types in enums
That it has all the expressive power of C++ if you need it, while demarcating a safe subset
That Rust lifetimes have finally rescued “regions” from academia
- And therefore made RAII complete
That it doesn’t have OOP-style inheritance
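
To illustrate the sum-type point, here is a small sketch with a hypothetical `Shape` enum. Each value is exactly one variant, each variant carries its own data, and the `match` must cover every variant or the code will not compile.

```rust
/// A sum type: a `Shape` is exactly one of these variants.
#[derive(Debug, Clone, PartialEq)]
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Point,
}

fn area(shape: &Shape) -> f64 {
    // Leaving out a variant here is a compile error, not a runtime surprise.
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
        Shape::Point => 0.0,
    }
}

fn main() {
    let shapes = [
        Shape::Rectangle { width: 2.0, height: 3.0 },
        Shape::Circle { radius: 1.0 },
        Shape::Point,
    ];
    for shape in &shapes {
        println!("{:?} has area {}", shape, area(shape));
    }
}
```

In a language built around inheritance, this would typically be a base class with three subclasses and a virtual method; here the set of cases is closed and visible in one place.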

Standard Library

The map APIs are quite excellent
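
One example of what makes the map APIs pleasant is the entry API on `HashMap`. The sketch below (the function name `word_counts` is only an illustration) counts words with a single lookup per word:

```rust
use std::collections::HashMap;

/// Counts how often each whitespace-separated word appears in `text`.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // One lookup: insert 0 if the key is absent, then increment in place.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("the quick fox jumps over the lazy dog the end");
    println!("{:?}", counts.get("the"));
    println!("{:?}", counts.get("cat"));
}
```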

The Ecosystem

That everyone uses log but there are multiple backends available
That cargo is so ergonomic

How I wish Rust had been made differently (but it’s too late to change)

I wish unwrap were not part of the standard library
I wish the C/C++ distinction between . and -> were retained, because I think sometimes automatically dereferencing and sometimes not is surprising.
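
A small sketch of the behavior behind that second wish, using a made-up `Counter` type: method calls with `.` dereference through references and smart pointers automatically, but operators such as `==` do not.

```rust
struct Counter {
    hits: u32,
}

impl Counter {
    fn hits(&self) -> u32 {
        self.hits
    }

    fn record(&mut self) {
        self.hits += 1;
    }
}

/// `.` looks through both the reference and the `Box` to find `Counter::hits`.
fn hits_with_dot(nested: &Box<Counter>) -> u32 {
    nested.hits()
}

/// The same call with every dereference spelled out by hand.
fn hits_explicit(nested: &Box<Counter>) -> u32 {
    Counter::hits(&**nested)
}

/// Operators do not auto-dereference: writing `value == 5` here would not
/// compile, because a `&i32` cannot be compared with an `i32`.
fn is_five(value: &i32) -> bool {
    *value == 5
}

fn main() {
    let mut counter = Counter { hits: 0 };
    counter.record(); // automatically borrowed as `&mut counter`
    let boxed = Box::new(counter);
    println!("{}", hits_with_dot(&boxed));
    println!("{}", hits_explicit(&boxed));
    println!("{}", is_five(&5));
}
```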

0001-01-01T00:00:00+00:00

Welcome to The Coded Message! on The Coded Message

Why can't you request changes from yourself on GitHub?

The AI Non-Economy: A Rant

Large Language Models Should Have to Obey Copyright

Thievery

The Legal Question

AIs Are Not Humans

Can C++ fix its biggest problem?

Can we migrate C++ programmers to a safe programming language?

Can C++ itself be made suitably memory safe?

Conclusion

There's Always Problems

Asahi Linux Again

Wayland and Sway

Box64 for Baba Is You

What Bits Mean: Meta-Data and Static Typing

Asking Nicely: Avoiding Passive and Aggressive Communication

Passive and Aggressive Communication

Passive and Aggressive Communication in One Person

Passive and Aggressive Communication in My Life

Asking Nicely

Long-Term Thinking

What Bits Mean: Binary Integers and Two's Complement

Storing Numbers in Binary

Adding Numbers with Circuitry or Program Logic

Overflow

Modular Arithmetic

Two’s Complement

Summary

Sorting Polymorphically in Many Languages

Sorting: A Polymorphic Function

Programming Language #0: Sorting in C

Programming Language #1: Sorting in Java

Programming Language #2: Sorting in C++

Programming Language #3: Sorting in Haskell

Programming Language #4: Sorting in Rust

Conclusion

A Review of Self-Help as a Genre, and Atomic Habits in Particular

Self-Help in General

Problem #1: Length

Problem #2: Wildly varying standards

Problem #3: They can turn into religions

How I read them

Atomic Habits in Particular

My Take-Aways

Conclusions

Minor News: Some Repos on GitHub

Repo #1: Crate Version of Prefix Ranges

Repo #2: Texas Hold-Em Library/Quiz App

Review: One Billion Americans, by Matthew Yglesias

Is Section 3 of the 14th Amendment Undemocratic?

Opinions about the Opinion

Narrowing the Question

All Qualifications Are Undemocratic

The Problem of Cheaters

Democracy as a Peaceful Replacement for War

The Downsides of the Ban

Rule of Law and the United States in Particular

Appendix: Text of the Section

Appendix: Footnotes

2023 in Retrospective and 2024 in Prospective

Rust Is Beyond Object-Oriented, Part 3: Inheritance

Why do people like inheritance?

What do I mean by inheritance?

What does inheritance actually do?

But what about the virtual methods?

So what can we do instead?

What should I actually do in Rust instead of inheritance?

Are You Sure? (Revised)

Endianness, and why I don't like htons(3) and friends

Why Little Endian Bugs Us

When Endianness Comes In

The Main Argument: Why I dislike htons and friends

Using These “Big Endian” Types

Conclusions and Loose Ends

Operating Systems: What is the command line?

Graphical User Interfaces

The Command Line in Brief

What is the command line not?

History of the command line

The Main Argument: Why I dislike `htons` and friends

Explicit `self` reference instead of implicit `this` pointer

A new `byte` type for `uint8_t` and `int8_t`

`const` is not the default

Testing `prefixed`

Walking Through `char`s

`serde` flattening

`let` surprises!

Remember: `serde` `struct`s Can Be Function-Local