The Haskeller's Hungarian Notation

When I was first learning to program, a long time ago, it was in BASIC, and you had to annotate your variable names to indicate what type something was. foo would be a number, whereas foo$ would be a string. This meant that there could only be as many types of information as there were symbols to put after your variable, but that was okay for the sort of programming BASIC was used for. These were called sigils, and they helped you keep straight in your head what was going on, and made it easier for the computer too. Any aggregates had to be explicitly declared.

Later on, I learned Perl, which had a similar system, but with a twist. A variable named $foo could contain a number or a string — or even some sort of object or reference — but it could only contain one of them. It was a “scalar.” @foo would contain many scalars with indices in an array, and %foo would contain many with string or other keys in a hash map. The computer kept track, dynamically, of the practical types of the scalars, and could easily do the same for the aggregate types, but chose to instead enforce a mechanism where the programmer would be reminded of whether it was a single value or some sort of aggregate that was being discussed.

In Haskell terms, BASIC had you use sigils for data types, but Perl had you use sigils for functors. And not to make people too upset by comparing Haskell and Perl, but Haskellers regularly do the same today, voluntarily annotating variable names with the functors by convention. For example, dmdMenuItems might translate, in a Reflex codebase, to Dynamic of Maybe of Dynamic of list of DomElement.
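To make the prefix reading concrete, here is a minimal sketch of the type that dmdMenuItems spells out. Dynamic and DomElement below are hypothetical stand-ins defined locally so the snippet compiles with base alone (Reflex's real Dynamic also takes a timeline parameter, as in Dynamic t a), and countItems and the sample menu entries are invented purely for illustration.

```haskell
-- Hypothetical stand-ins, NOT Reflex's real types: they exist only so
-- this sketch compiles without any dependencies beyond base.
newtype Dynamic a = Dynamic a

newtype DomElement = DomElement String

-- Read the prefix left to right, one letter per functor layer:
--   d  -> Dynamic
--   m  -> Maybe
--   d  -> Dynamic
-- and the plural "Items" hints at the list at the bottom.
dmdMenuItems :: Dynamic (Maybe (Dynamic [DomElement]))
dmdMenuItems = Dynamic (Just (Dynamic [DomElement "File", DomElement "Edit"]))

-- Illustrative helper: peel the layers in the same order as the prefix.
countItems :: Dynamic (Maybe (Dynamic [DomElement])) -> Int
countItems (Dynamic Nothing) = 0
countItems (Dynamic (Just (Dynamic items))) = length items
```

Loading this in GHCi and asking for :type dmdMenuItems shows the same nesting that the name already announces.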

The usage originally struck me as quite strange, and I didn’t like it. I remember thinking the original Hungarian notation was redundant: int iFoo; literally says int right before it. And besides, wasn’t the point of a type system to not need extra mnemonics, because the compiler will stop you from messing things up?

At my previous job, we used prefixes like m_ and g_ in C++ to indicate scope (member variable/field and global, respectively), and it similarly took me a while to adapt. In those situations, it turned out to help because the sigils told you where to look for more information. If there wasn’t an m_, you looked in the same function, but otherwise you had to immediately go to the class declaration. But that wasn’t the only advantage. What scope something was in was important in how you treated the variable, in many subtle ways that would be bad to confuse, and which the compiler in C++ wouldn’t really help you with.

Similarly, in Haskell, indicating what functor something is in tells you something important: what kinds of things can you do to get a regular value out of it? Do you need to provide a default value (Maybe), or only provide it to versions of functions adapted for it (Dynamic), or perhaps just keep the functor around while transforming the values inside ((<$>), (<$$>), (<$$$>), and so on, depending on how many functors are stacked)? And while the compiler will help us with this, it’s convenient to be able to see it all the time: the types of individual variables are sometimes inferred, and they are never immediately visible at every usage.
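As a rough illustration of those options using only base, the sketch below supplies a default for a Maybe, maps through one functor layer with (<$>), and maps through two layers with (<$$>). Note that (<$$>) is not part of base; it is defined locally here as fmap . fmap, which is an assumption about what the operator means in this post. The variable names are made up for the example.

```haskell
import Data.Maybe (fromMaybe)

-- Not in base: defined locally, assuming (<$$>) means
-- "fmap through two functor layers".
infixl 4 <$$>
(<$$>) :: (Functor f, Functor g) => (a -> b) -> f (g a) -> f (g b)
(<$$>) = fmap . fmap

mCount :: Maybe Int
mCount = Nothing

-- Maybe: supply a default to get a plain value out.
count :: Int
count = fromMaybe 0 mCount

-- One layer: keep the functor and transform the value inside with (<$>).
mNext :: Maybe Int
mNext = (+ 1) <$> Just 41

-- Two layers (a list of Maybe): reach through both with (<$$>).
mNexts :: [Maybe Int]
mNexts = (+ 1) <$$> [Just 1, Nothing, Just 3]
```

In each case the prefix on the name tells you which of these moves is available before you look up the type.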

And when we do write the pure function or the lambda or the fromMaybe or the dyn_ $ ffor ..., what do we name the resulting variable? Many times we have many variables with the exact same semantic role, the only difference being what functors they’ve been wrapped with. We want to say ffor dSelectedId $ \selectedId -> ... or fmap (\number -> number + 1) eNumber or let fish = fromMaybe defaultFish mFish. The alternative is, what, judicious use of ' for the different but analogous variables? The difference between these variables, intuitively, is how wrapped up in functors they are, and that should also be the difference in their names.
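Here is a small sketch of how that wrapped and unwrapped pairing reads in practice. To stay within base, Dynamic is a hypothetical stand-in newtype rather than Reflex's real type, and ffor is redefined locally as flip fmap (which matches how Reflex defines it). The functions dLabel and fishOrDefault and the "trout" default are invented for illustration.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for Reflex's Dynamic (the real one is Dynamic t a).
newtype Dynamic a = Dynamic a

instance Functor Dynamic where
  fmap f (Dynamic a) = Dynamic (f a)

-- Redefined locally so the sketch needs no Reflex dependency.
ffor :: Functor f => f a -> (a -> b) -> f b
ffor = flip fmap

-- Outside the lambda the value is wrapped (dSelectedId);
-- inside it is plain (selectedId). Same role, one prefix apart.
dLabel :: Dynamic Int -> Dynamic String
dLabel dSelectedId = ffor dSelectedId $ \selectedId -> "item " ++ show selectedId

defaultFish :: String
defaultFish = "trout"

-- mFish is the Maybe; fish is what remains once the default is applied.
fishOrDefault :: Maybe String -> String
fishOrDefault mFish =
  let fish = fromMaybe defaultFish mFish
  in fish
```

Dropping one prefix letter each time a layer is removed keeps the two names visibly related without resorting to primes.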

And I’ve decided this is a good thing. Conventionalized terseness is the least problematic type of terseness. Single-letter abbreviations are great if they communicate information efficiently and everyone agrees on what they mean. I’ve seen dyn and may as well, and I prefer d and m, as they are easier to stack up without getting too unwieldy. Besides, dyn is used for functions, and may is also a verb (does mayFish mean something that’s a Maybe Fish, or a boolean about whether you are permitted to fish?).

And so, in spite of my initial skepticism, I’ve come to like this naming convention, and I recommend it to all of you as well.

If you want to send me something privately and anonymously, you can use my admonymous to admonish (or praise) me anonymously.
