Endianness is a long-standing headache for many a computer science student, and a thorn in the side of practitioners. I have already written some about it in a different context. Today, I’d like to talk more about how to deal with endianness in programming languages and APIs, especially how to deal with it in a principled, type-safe way.

Before we get to that, I want to make some preliminary clarifications about endianness, which will help inform our API design.

Why Little Endian Bugs Us#

New students are often confused by little endian (where the least-significant component of an integer is stored first), and until they are told about it, they tend to assume computers are big endian (where the most-significant component is stored first), even if they don’t know that word. This is due primarily to the fact that big endian is what they’re used to: we write numbers with the most significant digit on the left, and in languages that write from left to right (including English, the lingua franca of programming among other things), this means that we live our day-to-day lives in big endian. But that doesn’t mean that big endian is more logical in any way, just that it is more conventional.

This isn’t helped by the fact that many learners are first exposed to little endian in the situation where it is most confusing and demands the most cognitive work: reading little endian numbers out of a hex dump. Take, for example, this code, which displays a 32-bit number in hexadecimal, and then displays the individual bytes of the same number as a hex dump:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);

This results in this befuddling output:

12345678
78 56 34 12

When printed as a number, we can read it normally. However, when printed as a series of bytes, we find ourselves having to read from right to left to recover the big endian order we are accustomed to. We can’t even simply read backwards, as each byte is still printed internally according to our big endian convention: the higher-order hex digit comes first, followed by the lower-order hex digit.

The problem here isn’t little endian. The problem is that the printing functionality accommodates our big endian preference, but only at the level of printing an individual number, whether as a byte or as a 32-bit word. The word printed as a whole is printed big endian, to accommodate us. The individual bytes are also printed big endian, to accommodate us. However, the hex dump as a whole is printed with the lower addresses on the left and the higher addresses on the right, to accommodate our expectation that lower-indexed memory, memory that comes earlier, should be on the left. On a little endian system, this desire to print each number with its most significant digit on the left, while printing a sequence of numbers from left to right, is what leads to the contradiction. The resulting last line, 78 56 34 12, isn’t, properly speaking, little endian. The print-out is an odd type of mixed endian, due to our awkward conventions.

There is actually a relatively easy fix: if we insist on reading numbers with the most significant digit on the left (which we do), and the computer insists on storing less significant components first (which it does), these two desires can be reconciled by printing the hex dump from right to left:

uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[3], bytes[2], bytes[1], bytes[0]);

This results in a much cleaner print-out:

12345678
12 34 56 78

This should make clear that the weirdness of little endian is entirely due to our preference for big endian, our preference for listing lower-indexed values on the left, and how these preferences interact. It is a matter of human convention, not of any intrinsic problem with little endian. I would argue that, on little endian systems, all hex dumps should be printed right to left, which would help, but there is little I can do to change convention here.
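
If we wanted to apply that convention in general, a right-to-left dump is short to write. Here is a minimal sketch (the function name is my own):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Print a buffer as hex with the highest address on the left, so that on a
// little endian system a multi-byte value reads naturally, most significant
// digit first.
void hex_dump_right_to_left(const uint8_t *data, size_t len) {
    for (size_t i = len; i > 0; --i) {
        printf("%02X ", data[i - 1]);
    }
    printf("\n");
}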

Now, almost all modern systems are little endian, either because processors that support both endiannesses are typically configured that way, or because, like Intel processors, they only support little endian. The few programmers who still have to write code for big endian systems find themselves in the minority, doing extra work to deal with other code that no longer accommodates big endianness.

There is one big exception to this: the Internet. All of the Internet protocols are designed to use big endian ordering, known in this context as “network byte order.” This is because when the Internet protocols were developed, big endian was a viable rival to little endian, and both byte orders were common.

This also makes some sense, because hex dumps of packets are very common, and big endian does make those dumps easier for us big endian humans to read and reckon with.

When Endianness Comes In#

I would also like to clarify something about how endianness works. A 32-bit word in a processor register is neither big endian nor little endian. The processor needs to be designed knowing which bits are more significant and which are less, but there is no intrinsic sense in which the less significant bits come “first.” In a word-addressed memory system, where only entire words were stored in memory (as on the PDP-7, with its 18-bit words), and where it was impossible to address individual bytes, this would be the end of it.

As an example of this, see the documentation for std::endian on CppReference.com:

If all scalar types have sizeof equal to 1, endianness does not matter and all three values, std::endian::little, std::endian::big, and std::endian::native are the same.
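
As an aside, std::endian also gives us a way to ask the question in code; a minimal sketch, assuming C++20:

#include <bit>
#include <cstdio>

int main() {
    // std::endian::native is a compile-time constant describing the platform.
    if constexpr (std::endian::native == std::endian::little) {
        puts("this platform is little endian");
    } else if constexpr (std::endian::native == std::endian::big) {
        puts("this platform is big endian");
    } else {
        puts("this platform is mixed endian");
    }
    return 0;
}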

However, once we come up with the idea that memory is made up of bytes, the endianness question arises: How do we split this 32-bit number into bytes? Which end of it should be byte 0, and which end byte 3? Similarly, if we read a series of bytes into memory, where should the first byte (by memory address) go in the register, the most significant (big) end, or the least significant (little) end?

As a result, types like uint32_t (and uint16_t and uint64_t) have no intrinsic endianness, so long as they are stored in registers. Only if they are written to memory, or read from memory, does their endianness matter. And then, it only matters if the actual byte representation is important – if we, as in the code above, use memcpy to copy their representation, byte by byte, into an array of bytes.

In general, if the byte representation does matter, I would argue that uint32_t should be treated as an abstract 32-bit value, devoid of endianness. Only when it is transcribed as a series of bytes should endianness be taken into account – and then the transcription should instead have the type uint8_t[4] in C (or std::array<uint8_t, 4> in C++ or [u8; 4] in Rust).
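
To make the distinction concrete, such a transcription might look like the following sketch (the function name is my own). Because it builds the bytes with shifts, it describes the desired byte order directly and never consults the host’s endianness:

#include <array>
#include <cstdint>

// Transcribe an abstract 32-bit value into four bytes, most significant first.
std::array<uint8_t, 4> transcribe_big_endian(uint32_t value) {
    return {
        static_cast<uint8_t>(value >> 24),
        static_cast<uint8_t>(value >> 16),
        static_cast<uint8_t>(value >> 8),
        static_cast<uint8_t>(value),
    };
}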

The Main Argument: Why I dislike htons and friends#

In C, however, we do not in fact do this. We instead have functions like htons, shown here with its signature and a typical use:

uint16_t htons(uint16_t hostshort);

uint16_t http_port = htons(80);

This function purports to convert a 16-bit number from host endianness (typically little) to network endianness (always big). Assuming a little endian computer, it does a byteswap: It swaps the less significant 8 bits with the more significant 8 bits in the register used to return the uint16_t.
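
On such a machine, the swap itself amounts to something like this sketch (the helper name is my own):

#include <cstdint>

// Exchange the high and low bytes of a 16-bit value.
uint16_t byteswap16(uint16_t x) {
    return static_cast<uint16_t>((x >> 8) | (x << 8));
}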

So what are the properties of the returned uint16_t? If we passed in, for example, 80 (the port for HTTP), the new uint16_t, http_port, is 20480 – because 80 is 0x0050 in hex, and swapping the two bytes gives 0x5000. What is this number?

It is not, to be clear, a uint16_t value 80 that is now in “big endian,” though we might say that as a manner of speaking. It is almost certainly in a register, and as mentioned before, registers don’t have intrinsic endianness. It is something far more awkward: a value that, if we store it in little endian (the only option on this machine), produces in memory the big endian representation of the number we actually care about.

To expand on this: 20480 is not a particularly meaningful number. It is not the port number we want to use, and our use of it has nothing to do with the value 20480 itself. It is simply a number that, if we store it in memory as bytes, will result in 0x00 being stored, followed by 0x50 – the big endian representation of 80. It is a uint16_t with a value chosen not for what number we want to store, but for what bytes we will get if we store http_port as bytes.

Since uint16_t is designed to store numbers, not collections of bytes, I would argue that this type is not being used in a semantically honest way – it is a lie. What we are really storing is an array of 2 bytes, 2 uint8_ts. We are storing it in a 16-bit register, and implementation-wise that might be a good decision – but I would argue that, if we want that to be possible, we should create an ABI where a uint8_t[2] can be stored in a single register. The C programming language, by not making arrays first-class types, is getting in our way here, which explains the situation.

Am I exaggerating when I say the type is a lie? Well, we expect to be able to do arithmetic on a uint16_t, to be able to test, for example, whether it is less than 1024, as listening on a port less than 1024 is a privileged operation. But in order to do that, we have to convert it back to a normal uint16_t – all uint16_t’s usual arithmetic operators are inappropriate for data that’s stored with its bytes swapped around.
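
To make that concrete, here is a small sketch of the pitfall on a little endian host, using the POSIX htons and ntohs:

#include <arpa/inet.h>  // htons, ntohs (POSIX)
#include <cstdint>
#include <cstdio>

int main() {
    uint16_t http_port = htons(80);  // 20480 (0x5000) on a little endian host

    if (http_port < 1024) {          // wrong: compares the byteswapped value
        puts("privileged port?");    // never reached on a little endian host
    }
    if (ntohs(http_port) < 1024) {   // right: convert back to a normal uint16_t first
        puts("privileged port");
    }
    return 0;
}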

So what should be done? Well, if we really intend to express a value in network byte order, i.e. big endian, we are changing the semantics of the information from “this is a 16-bit integer” to “this is a specific sequence of two bytes, chosen for a reason.” Therefore, the return value of htons should be an aggregate of two bytes.

Again, because arrays decay to pointers, this is impossible to express straightforwardly in C, although a wrapper struct could be used. C++ takes care of this by having a built-in wrapper struct for arrays, namely std::array. The equivalent of htons would not emphasize that the uint16_t is in host order (which I think is the wrong way of thinking about it), but would simply indicate that we’re storing this short in a big-endian fashion (as opposed to the hardware-supported default storage we can access with a memcpy):

std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value);
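
One possible implementation of that signature, as a sketch; because it uses shifts, the host’s byte order never enters into it:

#include <array>
#include <cstdint>

std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value) {
    // Most significant byte first, regardless of how the host stores integers.
    return { static_cast<uint8_t>(value >> 8),
             static_cast<uint8_t>(value & 0xFF) };
}

// store_short_as_big_endian(80) yields {0x00, 0x50}.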

Rust already provides this as an alternative:

impl u16 {
    pub const fn to_be_bytes(self) -> [u8; 2] {
        // ...
    }
}

Unfortunately for semantics, Rust still has the problematic signature for to_be:

impl u16 {
    pub const fn to_be(self) -> u16 {
        // ...
    }
}

Perhaps this is for efficiency reasons, or perceived efficiency. Programmers know that this byteswapped value should, for performance, live in a single register, and they can feel more confident that this actually happens if it remains a u16 (or uint16_t) than if it is transformed into an array of bytes, however semantically inappropriate the u16 is.

However, if we are using a u16 or uint16_t as the implementation layer for what is in fact a way of storing two bytes in the opposite order from the one that makes sense for our processor – if we are using it as an implementation trick to do something semantically different from what a uint16_t normally does – then we should at least make the type distinct, giving the maintenance programmer and the compiler some ability to stop us from doing nonsensical things (like comparing values using uint16_t’s comparison operators).

Luckily, there is a design pattern for using the implementation of a type, but applying different semantics to it: the newtype pattern. We typically think of it as a Haskell or Rust thing, but we can use it in C++ as well. I would argue that if we’re going to abuse uint16_ts and friends in such a way, we should at least abstract it using the newtype pattern. In C++, this would look something like this, assuming a little endian computer:

#include <bit>      // std::byteswap (C++23)
#include <cstdint>

template <typename T>
class big_endian {
    T value;  // the byteswapped value; storing these bytes little endian yields big endian
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        value = std::byteswap(in);
        return *this;
    }

    operator T() const {
        return std::byteswap(value);
    }
};

Adding appropriate if constexpr expressions to also support big endian machines, and defining std::byteswap if you don’t have it on your system yet, are left as an exercise to the reader.
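
For reference, one shape that exercise could take, as a sketch (assuming C++20 for std::endian and C++23 for std::byteswap):

#include <bit>      // std::endian, std::byteswap
#include <cstdint>

template <typename T>
class big_endian {
    T value;

    // Swap only on little endian hosts; a big endian host already stores the
    // bytes in the order we want.
    static constexpr T to_storage(T in) {
        if constexpr (std::endian::native == std::endian::little) {
            return std::byteswap(in);
        } else {
            return in;
        }
    }
public:
    big_endian() = default;
    big_endian(T in) : value(to_storage(in)) {}
    big_endian& operator=(T in) { value = to_storage(in); return *this; }
    operator T() const { return to_storage(value); }
};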

In any case, the original little endian only version works on my (little endian) system:

#include <array>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    big_endian<uint16_t> be = 80;
    std::array<uint8_t, 2> be_bytes;
    memcpy(be_bytes.data(), &be, 2);
    printf("%04X\n", uint16_t(be));                   // prints 0050
    printf("%02X %02X\n", be_bytes[0], be_bytes[1]);  // prints 00 50
    return 0;
}

I would much rather use this to represent “we want to store a value in a register byte-swapped on some platforms” than a uint16_t with no additional type information. You cannot accidentally run invalid uint16_t operators on it, but you can convert it to a normal uint16_t first and then use those operators. However, it does have a big endian representation when stored, as indicated by the memcpy, and it can still be stored in a single register.

Even so, I would still not prioritize that ability to store it in a single register in most situations. Using a uint16_t to store the bytes swapped is still not remotely “storing a big endian value in a uint16_t,” it is “storing a big endian representation in a uint16_t so that when the processor writes that uint16_t little endian, we get a big endian representation of the number we actually want.” It’s still fundamentally a hack for performance, and while I’m comfortable with it contained within the encapsulation of this big_endian class, I would still rather write std::array<uint8_t, sizeof(T)> as the underlying storage type, unless the optimization is actually needed. I would actually use a big_endian class that looks more like this:

#include <array>
#include <cstdint>
#include <cstring>   // memcpy
#include <utility>   // std::swap

template <typename T>
class big_endian {
    std::array<uint8_t, sizeof(T)> be_representation;

    // Reverse the bytes in place (assumes a little endian host).
    static void swap_array(std::array<uint8_t, sizeof(T)> &arr) {
        for (auto it = arr.begin(), jt = arr.end() - 1;
             it < jt;
             ++it, --jt) {
            std::swap(*it, *jt);
        }
    }
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;

    big_endian(T in) {
        *this = in;
    }

    big_endian& operator=(T in) {
        memcpy(be_representation.data(), &in, sizeof(T));
        swap_array(be_representation);
        return *this;
    }

    operator T() const {
        auto bytes_copy = be_representation;
        swap_array(bytes_copy);
        T out;
        memcpy(&out, bytes_copy.data(), sizeof(T));
        return out;
    }
};

This now feels like I’m accurately representing what a big endian representation is: a way of storing a number as a sequence of bytes, rather than however the processor feels like storing it, and certainly rather than as a value chosen so that, when the processor stores it as little endian, the bytes come out as the big endian representation of the value we actually want. I won’t lie and say the optimizer will make it equally performant – if I needed to actually optimize, I would use the other version – but I feel this version is hack-free. (Again, it still only works on little endian platforms; fixing this is again left as an exercise.)

This version has the added benefit of having an alignment of 1, which I will argue later is more appropriate than using the underlying alignment of uint16_t, uint32_t, etc.
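
A quick way to check that claim, as a sketch (assuming the array-backed big_endian above is the one in scope; the alignment of uint32_t itself is implementation-defined, but is typically 4):

#include <cstdint>

// The only member is std::array<uint8_t, sizeof(T)>, so the wrapper is as
// small as the value it represents and needs only byte alignment.
static_assert(sizeof(big_endian<uint32_t>) == sizeof(uint32_t));
static_assert(alignof(big_endian<uint32_t>) == 1);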

Using These “Big Endian” Types#

This leads to a further question, however: When do we need to support network byte order? Really, the only time is when generating messages in wire format to send over the network. In C and C++, we generally represent messages to be sent over the network as structs.

For example, one can imagine a packet format with a 32-bit sequence number. We would want to write uint32_t for this sequence number:

struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    uint32_t sequence_number;
};

However, of course, if the protocol uses big endian byte ordering (as many protocols do), we then have to remember to call htonl when loading this value in:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = htonl(seq_num);

As I said before, I don’t like htonl. I certainly don’t like using uint32_t as the type for sequence_number. So, we can do one of two things:

  • We can use a Rust-style function to convert to byte representation, and use std::array<uint8_t, 4> as the type of sequence_number. This strikes me as equally awkward: we now know that we need to do something other than just assign the value, but we don’t necessarily know what that something is.
  • We can make the type more semantic, and use our big_endian wrapper. This is why I wrote it, and it is the use case where its alignment of 1 makes sense – wire format structures are often packed.
struct __attribute__((packed)) packet_wire_format {
    // You may need to mark `big_endian` itself as packed as well,
    // or, with the array-backed version, you may not need packing at all.
    uint8_t from_device;
    uint8_t to_device;
    big_endian<uint32_t> sequence_number;
};

Now, when we actually send it over the wire, we will cast or copy this packet_wire_format to get the byte-by-byte representation, and sequence_number will be in big endian, by the invariants of our big_endian class. We will not need to remember to call any function at all, as the class’s interface provides us with only appropriate options:

packet_wire_format packet;

uint32_t seq_num = current_seqnum++;
packet.sequence_number = seq_num; // Performs conversion
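
And when the packet actually goes out, the copy really is just a byte-for-byte copy of the struct; a sketch, with the actual send call elided:

std::array<uint8_t, sizeof(packet_wire_format)> wire;
memcpy(wire.data(), &packet, sizeof(packet));
// With the packed struct above, wire[2] through wire[5] now hold
// sequence_number with its most significant byte first.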

The fewer mistakes you can make by accident, the better. And of course, this has the additional advantage that the type of the wire format is more self-documenting.

Similarly, if you read or write the wire format using read and write methods on a buffer type, those methods should either be parameterized to take endian information along with the values, or you can pass objects of type big_endian as the values to be copied in: big_endian<uint32_t> is just as trivially copyable as uint32_t.
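
As a sketch of that second option (the buffer type and its method name here are hypothetical, not from any particular library; only big_endian is from above):

#include <cstdint>
#include <type_traits>
#include <vector>

// A toy output buffer whose append() accepts any trivially copyable type,
// so a big_endian<uint32_t> can be passed exactly like a uint32_t.
class out_buffer {
    std::vector<uint8_t> bytes;
public:
    template <typename T>
    void append(const T &value) {
        static_assert(std::is_trivially_copyable_v<T>);
        const auto *p = reinterpret_cast<const uint8_t *>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }

    const std::vector<uint8_t> &data() const { return bytes; }
};

// Usage: the caller chooses the byte order by choosing the type.
//   buf.append(big_endian<uint32_t>(seq_num));  // big endian on the wire
//   buf.append(seq_num);                        // native byte order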

Conclusions and Loose Ends#

It is a little more awkward to write big_endian for Rust. I would want to use the existing to_be_bytes method in the implementation, and unfortunately that method is not in any trait, as I’ve complained about before. This can easily be remedied by writing our own trait, however, or using external crates that already do so.

However, I wonder if maybe all of these languages should define types that correspond to uint16_t, uint32_t, etc., but are defined to store themselves in network byte order (and perhaps another set that guarantees little endian order). After all, most processors support byteswap instructions, which make writing a value byteswapped an easy operation. Such values could be optimized as normal values unless actually written to memory – and only the optimizer knows when they’re actually written to memory. They could even be written to memory in native endianness unless there is some defined way to get a byte-by-byte pointer to them – and really only the optimizer knows that.

Endianness seems more like a configuration on the natural types of the programming language than something to be implemented on top of those natural tools. The loops I’m using to do byteswaps are surely not the most efficient way to do it (which is why the non-array-based implementation of big_endian is surely more performant, even if it is hackish), because processors have some support for non-native endianness baked in. If a C++ vendor provided types like big_endian (and perhaps some do – I’m sure I’ll find out in the comments), they would surely be more performant.

But again, perhaps they should be primitive types. There’s some built-in processor support for them, and only the optimizer knows when the non-native endianness actually should be used.

I am too busy a person to do the research for such a proposal. I don’t know if such a proposal exists. My interest here is simply in using the tools I have to be a good programmer. For that, to_be_bytes and my implementation of big_endian will simply have to suffice.