Endianness, and why I don’t like htons(3) and friends
Endianness is a long-standing headache for many a computer science student, and a thorn in the side of practitioners. I have already written some about it in a different context. Today, I’d like to talk more about how to deal with endianness in programming languages and APIs, especially how to deal with it in a principled, type-safe way.
Before we get to that, I want to make some preliminary clarifications about endianness, which will help inform our API design.
Why Little Endian Bugs Us
New students often are more confused by little endian (where the least-significant component of an integer is stored first), and until they are told about it, they tend to assume computers are big endian (where the most-significant component is stored first) even if they don’t know that word. This is due primarily to the fact that big endian is what they’re used to: We write numbers with the most significant digit on the left, and in languages that write from left to right (including English, the lingua franca of programming, among other things), this means that we live our day-to-day lives in big endian. But that doesn’t mean that big endian is more logical in any way, just that it is more conventional.
This isn’t helped by the fact that many learners first encounter little endian in a context where it is confusing and makes them do extra cognitive work: reading little endian numbers out of a hex dump. Take, for example, this code, which displays a 32-bit number in hexadecimal, and then displays the individual bytes of the same number as a hex dump:
uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);
This results in this befuddling output:
12345678
78 56 34 12
Printed as a number, it reads normally. Printed as a series of bytes, however, we find ourselves reading from right to left to recover the big endian order we are accustomed to. And we can’t even read it straightforwardly backwards, because each byte is still printed internally according to our big endian convention: the higher-order hex digit first, followed by the lower-order one.
The problem here isn’t little endian. The problem is that the printing functionality accommodates our big endian preference, but only at the level of an individual number, whether a byte or a 32-bit word. The word printed as a whole is printed big endian, to accommodate us. The individual bytes are also printed big endian, to accommodate us. However, the hex dump as a whole is printed with the lower addresses on the left and the higher addresses on the right, to accommodate our expectation that lower-indexed memory, memory that comes earlier, should be on the left. On a little endian system, the desire to print each number with its most significant digit on the left, combined with the desire to print a sequence of numbers from left to right, leads to a contradiction. The resulting last line, 78 56 34 12, isn’t, properly speaking, little endian. The print-out is an odd kind of mixed endian, due to our awkward conventions.
There is actually a relatively easy fix: if we insist on reading numbers with the most significant digit on the left (which we do), and the computer insists on storing the less significant components first (which it does), these two desires can be reconciled by printing the hex dump from right to left:
uint32_t number = 0x12345678;
printf("%08X\n", number);
uint8_t bytes[4];
memcpy(bytes, &number, 4);
printf("%02X %02X %02X %02X\n", bytes[3], bytes[2], bytes[1], bytes[0]);
This results in a much cleaner print-out:
12345678
12 34 56 78
This should make clear that the weirdness of little endian is entirely due to our preference for big endian, our preference for putting lower-indexed values on the left, and how these preferences interact. It is a matter of human convention, not of any intrinsic problem with little endian. I would argue that, on little endian systems, all hex dumps should run right to left, and that this would help – but there is little I can do to change convention.
Now, almost all modern systems are little endian, either because they are typically configured that way on processors that support either endianness, or because, like Intel processors, they only support little endian. The few programmers who have to write code for big endian systems find themselves in the minority, doing extra work to deal with other code that no longer accommodates big endianness.
There is one big exception to this: the Internet. All of the Internet protocols are designed to use big endian ordering, known in this context as “network byte ordering.” This is because when the Internet protocols were developed, big endian was a viable rival to little endian, and both byte orders were common.
This does make some sense, as well, because hex dumps of packets are very common, and big endian does make those hex dumps easier to read and reckon with for us big endian humans.
When Endianness Comes In
I would also like to clarify something about how endianness works. A 32-bit word in a register in the processor is neither big endian nor little endian. The processor needs to be designed knowing which bits are more significant, and which are less, but there is no intrinsic way in which the less significant bits come “first.” In a word-based memory system, where only entire words are stored in memory (as on the PDP-7 with its 18-bit words), and where it is impossible to address memory in terms of individual bytes, this would be the end of it.
As an example of this, see the documentation for std::endian on CppReference.com:

“If all scalar types have sizeof equal to 1, endianness does not matter and all three values, std::endian::little, std::endian::big, and std::endian::native, are the same.”
However, once we come up with the idea that memory is made up of bytes, the endianness question arises: How do we split this 32-bit number into bytes? Which end of it should be byte 0, and which end byte 3? Similarly, if we load a series of bytes from memory into a register, where should the first byte (by memory address) go: the most significant (big) end, or the least significant (little) end?
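As a concrete illustration, the platform’s answer to that question can be queried directly. A minimal sketch, using the C++20 std::endian quoted above:

#include <bit>
#include <cstdio>

int main() {
    // std::endian::native records which convention this platform's
    // ABI chose for splitting words into bytes.
    if constexpr (std::endian::native == std::endian::little)
        printf("byte 0 holds the least significant end\n");
    else if constexpr (std::endian::native == std::endian::big)
        printf("byte 0 holds the most significant end\n");
    else
        printf("mixed endian\n");
    return 0;
}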
As a result, types like uint32_t (and uint16_t and uint64_t) have no intrinsic endianness, so long as they are stored in registers. Only if they are written to memory, or read from memory, does their endianness matter. And then, it only matters if the actual byte representation is important – if we, as in the code above, use memcpy to copy their representation, byte by byte, into an array of bytes.
In general, if the byte representation does matter, I would argue that uint32_t should be treated as an abstract 32-bit value, devoid of endianness. Only when it is transcribed as a series of bytes should endianness be taken into account – and then the transcribed form should instead have the type uint8_t[4] in C (or std::array<uint8_t, 4> in C++, or [u8; 4] in Rust).
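In that spirit, here is a sketch of what such a transcription might look like – hypothetical helpers, written with shifts so that they act on the abstract value rather than on its storage, and are therefore correct on any host endianness:

#include <array>
#include <cstdint>

// Hypothetical helpers: move between an abstract uint32_t and an
// explicit big endian byte sequence. The shifts operate on the value,
// not its storage, so host endianness never enters the picture.
std::array<uint8_t, 4> to_big_endian_bytes(uint32_t value) {
    return {static_cast<uint8_t>(value >> 24),
            static_cast<uint8_t>(value >> 16),
            static_cast<uint8_t>(value >> 8),
            static_cast<uint8_t>(value)};
}

uint32_t from_big_endian_bytes(const std::array<uint8_t, 4> &bytes) {
    return (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
           (uint32_t(bytes[2]) << 8) | uint32_t(bytes[3]);
}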
The Main Argument: Why I dislike htons and friends
In C, however, we do not in fact do this. We instead have functions like htons, with this signature, used like so:

uint16_t htons(uint16_t hostshort);

uint16_t http_port = htons(80);
This function purports to convert a 16-bit number from host endianness (typically little) to network endianness (always big). Assuming a little endian computer, it does a byteswap: It swaps the less significant 8 bits with the more significant 8 bits in the register used to return the uint16_t.
So what are the properties of the returned uint16_t? If we pass in, for example, 80 (the HTTP port), the new uint16_t – http_port above – is 20480, because 80 is 0x0050 in hex, and we’ve swapped the two bytes, so we now have 0x5000. What is this number?
It is not, to be clear, a uint16_t value 80 that is now in “big endian,” though we might say that as a manner of speaking. It is almost certainly in a register, and as mentioned before, registers don’t have intrinsic endianness. It is something far more awkward: It is a value that, if we were to store it in little endian (the only option), results in a different number being stored in big endian.
To expand on this: 20480 is not a particularly meaningful number. It is not actually the port number we want to use, and our use of it has nothing to do with the number 20480 itself. It is simply a number that, if we store it in memory as bytes, will result in 0x00 being stored, followed by 0x50 – the big endian representation of 80. It is a uint16_t with a value chosen not for what number we want to store, but for what bytes we will get if we store http_port as bytes.
Since uint16_t is designed to store numbers, not collections of bytes, I would argue that this type is not being used in a semantically honest way – it is a lie. What we are really storing is an array of 2 bytes, 2 uint8_ts. We are storing it in a 16-bit register, and implementation-wise that might be a good decision – but I would argue that, if we want that to be possible, we should create an ABI where uint8_t[2] can be stored in a single register. The C programming language, by not making arrays first-class types, is getting in our way here, which explains the situation.
Am I exaggerating when I say the type is a lie? Well, we expect to be able to do arithmetic on a uint16_t – to be able to test, for example, whether it is less than 1024, as listening on a port less than 1024 is a privileged operation. But in order to do that, we have to convert it back to a normal uint16_t – all of uint16_t’s usual arithmetic operators are inappropriate for data that’s stored with its bytes swapped around.
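A minimal demonstration of the trap, again assuming a little endian machine and POSIX’s arpa/inet.h:

#include <arpa/inet.h> // htons, ntohs (POSIX)
#include <cstdint>
#include <cstdio>

int main() {
    uint16_t port = htons(80);
    // Wrong: this compares the byteswapped value, 20480, against 1024,
    // and concludes that port 80 is not privileged.
    if (port < 1024)
        printf("privileged\n");
    else
        printf("not privileged (wrong!)\n");
    // Right: convert back to a normal uint16_t first.
    if (ntohs(port) < 1024)
        printf("privileged (correct)\n");
    return 0;
}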
So what should be done? Well, if we really intend to express a value in network byte order, i.e. big endian, we are changing the semantics of the information from “this is a 16-bit integer” to “this is a specific sequence of two bytes, chosen for a reason.” Therefore, the return value of htons should be an aggregate of two bytes.
Again, because of pointer decay this is impossible to express straightforwardly in C, although a wrapper struct could be used. C++ takes care of this by having a built-in wrapper struct for arrays, namely std::array. The equivalent of htons would not emphasize that the uint16_t is in the host order (which I think is the wrong way of thinking about it), but would simply indicate that we’re just storing this short in a big-endian fashion (as opposed to the hardware-supported default storage we can access with a memcpy):
std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value);
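A sketch of how this might be implemented – using shifts, so it is correct regardless of host endianness:

std::array<uint8_t, 2> store_short_as_big_endian(uint16_t value) {
    return {static_cast<uint8_t>(value >> 8), // most significant byte first
            static_cast<uint8_t>(value)};
}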
Rust already provides this as an alternative:
impl u16 {
    pub const fn to_be_bytes(self) -> [u8; 2] {
        // ...
    }
}
Unfortunately for semantics, Rust still has the problematic signature for to_be:
impl u16 {
    pub const fn to_be(self) -> u16 {
        // ...
    }
}
Perhaps this is due to efficiency reasons, or felt efficiency. Programmers know that this byteswapped value should, for performance, be stored in a single register. Programmers can feel more confident that this is actually done if it remains a u16 (or uint16_t) than if it is transformed into an array of bytes, however semantically inappropriate the u16 is.
However, if we are using a u16 or uint16_t as an implementation layer for what is in fact a way of storing two bytes in the opposite order from the one that makes sense for our processor – if we are using it as an implementation trick to do something semantically different from what a uint16_t normally does – then we should at least make the type distinct, to give the maintenance programmer and the compiler some ability to stop us from doing nonsensical things (like comparing the value using uint16_t’s comparison operator).
Luckily, there is a design pattern for using the implementation of a type while applying different semantics to it: the newtype pattern. We typically think of it as a Haskell or Rust thing, but we can use it in C++ as well. I would argue that if we’re going to abuse uint16_t and friends in such a way, we should at least abstract it using the newtype pattern. In C++, this would look something like this, assuming a little endian computer:
#include <bit>     // std::byteswap (C++23)
#include <cstdint>

template <typename T>
class big_endian {
    T value; // invariant: always holds the byteswapped representation
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;
    big_endian(T in) {
        *this = in;
    }
    big_endian& operator=(T in) {
        value = std::byteswap(in);
        return *this;
    }
    operator T() {
        return std::byteswap(value);
    }
};
Adding appropriate if constexpr expressions to also support big endian machines, and defining std::byteswap if you don’t have it yet on your system, is left as an exercise to the reader. But it works on my (little endian) system:
#include <array>
#include <cstdio>
#include <cstring>

int main() {
    big_endian<uint16_t> be = 80;
    std::array<uint8_t, 2> be_bytes;
    memcpy(be_bytes.data(), &be, 2);
    printf("%04X\n", uint16_t(be));                  // prints 0050
    printf("%02X %02X\n", be_bytes[0], be_bytes[1]); // prints 00 50
    return 0;
}
I would much rather use this to represent “we want to store a value in a register byte-swapped on some platforms” than a uint16_t with no additional type information. You cannot accidentally run invalid uint16_t operators on it, but you can convert it to a normal uint16_t first and then use those operators. However, it does have a big endian representation when stored, as indicated by the memcpy, and it can still be stored in a single register.
Even so, I would still not prioritize that ability to store it in a single register in most situations. Using a uint16_t to store the bytes swapped is still not remotely “storing a big endian value in a uint16_t”; it is “storing a big endian representation in a uint16_t so that when the processor writes that uint16_t little endian, we get a big endian representation of the number we actually want.” It’s still fundamentally a hack for performance, and while I’m comfortable with it contained within the encapsulation of this big_endian class, I would still rather actually write std::array<uint8_t, sizeof(T)> as the underlying storage type, unless the optimization is actually needed.
I actually would use a big_endian class that would look more like this:
#include <array>
#include <cstdint>
#include <cstring>
#include <utility> // std::swap

template <typename T>
class big_endian {
    std::array<uint8_t, sizeof(T)> be_representation;

    // Reverse the bytes in place.
    static void swap_array(std::array<uint8_t, sizeof(T)> &arr) {
        for (auto it = arr.begin(), jt = arr.end() - 1;
             it < jt;
             ++it, --jt) {
            std::swap(*it, *jt);
        }
    }
public:
    big_endian() = default;
    big_endian& operator=(const big_endian&) = default;
    big_endian(T in) {
        *this = in;
    }
    big_endian& operator=(T in) {
        memcpy(be_representation.data(), &in, sizeof(T));
        swap_array(be_representation);
        return *this;
    }
    operator T() {
        auto bytes_copy = be_representation;
        swap_array(bytes_copy);
        T out;
        memcpy(&out, bytes_copy.data(), sizeof(T));
        return out;
    }
};
This now feels like I’m actually representing what a big endian representation is: a way of storing a number as a sequence of bytes, rather than however the processor feels like storing it – and certainly rather than as a value chosen so that, when the processor stores it little endian, the bytes come out as the big endian representation of the value we actually want. I won’t lie and say the optimizer will make it equally performant, and if I actually needed to optimize I would use the other version, but I feel like this version is hack-free. (Again, it still only works on little endian platforms – fixing this is again left as an exercise.)
This version has the added benefit of having an alignment of 1, which I will argue later is more appropriate than using the underlying alignment of uint16_t, uint32_t, etc.
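A quick check of that claim, compiled against the array-based big_endian above:

// These hold for the array-based version; the first assertion would fail
// for the byteswap-based version, whose alignment is that of the
// underlying uint32_t.
static_assert(alignof(big_endian<uint32_t>) == 1);
static_assert(sizeof(big_endian<uint32_t>) == 4);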
Using These “Big Endian” Types
This leads to a further question, however: When do we need to support network byte order? Really, the only time is when generating messages in wire format to send over the network. In C and C++, we generally represent messages to be sent over the network as structs.
For example, one can imagine a packet format with a 32-bit sequence number. We would want to write uint32_t for this sequence number:
struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    uint32_t sequence_number;
};
However, of course, if it is in big endian byte ordering (as many protocols are), we then have to call htonl when loading this value in:
packet_wire_format packet;
uint32_t seq_num = current_seqnum++;
packet.sequence_number = htonl(seq_num);
As I said before, I don’t like htonl. I certainly don’t like using uint32_t as the type for sequence_number. So, we can do one of two things:
- We can use a Rust-style function to convert to byte representation, and use std::array<uint8_t, 4> as the type of sequence_number. This strikes me as equally awkward. We now know that we need to do something other than just assign the value, but we don’t necessarily know what that thing is.
- We can make the type more semantic, and use our big_endian wrapper. This is why I wrote it, and the use case where its alignment of 1 makes sense – wire format structures are often packed.
// You may need to mark big_endian packed as well,
// or you may not need the attribute at all now
struct __attribute__((packed)) packet_wire_format {
    uint8_t from_device;
    uint8_t to_device;
    big_endian<uint32_t> sequence_number;
};
Now, when we actually send it over the wire, we will cast or copy this packet_wire_format to get the byte-by-byte representation, and sequence_number will be in big endian, by the invariants of our big_endian class. We will not need to remember to call any function at all, as the class’s interface provides us with only appropriate options:
packet_wire_format packet;
uint32_t seq_num = current_seqnum++;
packet.sequence_number = seq_num; // Performs conversion
The fewer mistakes you can make by accident, the better. And of course, this has the additional advantage that the type of the wire format is more self-documenting.
Similarly, if you read or write the wire format using read and write methods on a buffer type, those methods should either be parameterized to take endian information along with the values, or you can pass objects of type big_endian as the value to be copied in: big_endian<uint32_t> is just as trivially copyable as uint32_t.
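A sketch of the latter approach, with a hypothetical output_buffer type whose put() appends the bytes of any trivially copyable value – passing a big_endian<uint32_t> makes the byte order part of the argument’s type, so no endian parameter is needed:

#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

class output_buffer {
    std::vector<uint8_t> bytes;
public:
    // Append the raw byte representation of any trivially copyable value.
    template <typename T>
    void put(const T &value) {
        static_assert(std::is_trivially_copyable_v<T>);
        const uint8_t *p = reinterpret_cast<const uint8_t *>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }
};

// Usage: buf.put(big_endian<uint32_t>(seq_num)) appends big endian bytes,
// while buf.put(seq_num) would append the processor's native representation.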
Conclusions and Loose Ends
It is a little more awkward to write big_endian for Rust. I would want to use the existing to_be_bytes method in the implementation, and unfortunately that method is not in any trait, as I’ve complained about before. This can easily be remedied by writing our own trait, however, or by using external crates that already do so.
However, I wonder if maybe all of these languages should define types that correspond to uint16_t, uint32_t, etc., and are simply defined to store themselves in network byte order (and perhaps another set that guarantees little endian order). After all, most processors support byteswap instructions that make writing a value byteswapped an easy operation. These types could be optimized as normal values unless actually written to memory – and only the optimizer knows when they’re actually written to memory. They could even be written to memory in native endianness unless there’s some defined way to get a byte-by-byte pointer to them – and really only the optimizer knows that.
Endianness seems more like a configuration of the natural types of the programming language than something to be implemented on top of those natural tools. The loops I’m using to do byteswaps are surely not the most efficient way to do it (which is why the non-array-based implementation of big_endian is surely more performant, even if it is hackish), because processors have some support for non-native endianness baked in. If a C++ vendor provided types like big_endian (and perhaps some do; I’m sure I’ll find out in the comments), they would surely be more performant.
But again, perhaps they should be primitive types. There’s some built-in processor support for them, and only the optimizer knows when the non-native endianness actually should be used.
I am too busy a person to do the research for such a proposal. I don’t know if such a proposal exists. My interest here is simply in using the tools I have to be a good programmer. For that, to_be_bytes and my implementation of big_endian will simply have to suffice.