Endianness, API Design, and Polymorphism in Rust
I have been working on a serialization project recently that involves endianness (also known as byte order). It led me to explore the parts of the Rust standard library that deal with endianness, and I’d like to share my thoughts on how endianness should be represented in a programming language and its standard library. I think this is something Rust does better than C++, and it also makes for a good case study in API design and polymorphism in Rust.
To start with, let’s discuss endianness a little. I assume most of my audience has some familiarity with endianness; nevertheless, I’d like to explain it from first principles. That way, we can subsequently apply the insights from the explanation to API design. That, and I want practice explaining concepts, even if they are basic. I’ll try to keep it interesting, but also feel free to skim the next section.
Big End, Little End#
I first encountered the concept of endianness when I was first learning
to program using the DEBUG.EXE
program on DOS. When a 16-bit value was
displayed as a 16-bit value, it was just normal hexadecimal, but when
it was displayed as two 8-bit bytes, something weird happened with the
display.
Here’s a C++ snippet that demonstrates the effect:
#include <cstdio>
#include <cstring>

template<typename T>
void display_bytes(const T &val) {
    unsigned char bytes[sizeof(T)];
    memcpy(bytes, &val, sizeof(T));
    for (auto byte : bytes) {
        printf("%02x ", byte);
    }
    printf("\n");
}

int main() {
    int value = 0x12345678;
    printf("%x\n", value);
    display_bytes(value);
}
When run on any little-endian processor (the vast majority of processors), we get:
12345678
78 56 34 12
The least significant byte is first, so if you print out the individual bytes in order, you have to read it backwards – though each individual byte is still forwards.
If you were to run the same code on a big-endian processor, you would get:
12345678
12 34 56 78
At this point, little endianness as I understood it was a weird thing that Intel processors did for reasons I didn’t understand, that made me do a little extra work when reading hex dumps. At the time, this was fine: I thought that having to apply this extra arcane knowledge was cool for its own sake. But also, I thought of little endianness as the weird way to do things that required extra work, and big endianness as the more natural design. It wasn’t until much later that I got some nuance on that opinion.
See, the writing system we use for numbers is big endian. Instead of dividing a number into bytes (base 256), we divide it into digits. We consider the left of the page to come before the right of the page, and we write the most significant digit first. This is all taught explicitly in grade school:
1234 = 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0
There’s a certain, very human logic to this system: the more important information comes first, then the details. Mathematically, though, we see decreasing powers of 10: first we specify a factor for 10 to the third, then to the second, then to the first, then to the zeroth. We could instead imagine a world where we wrote our decimal numbers little endian, where the same number would be written 4321, and still mean “one thousand two hundred thirty-four,” and where we’d count like this:
0
1
2
3
...
9
01
11
21
...
91
02
12
...
99
001
This would have the advantage that the first digit, digit zero, would be multiplied by 10^0, digit one by 10^1, and so on. Not what humans would normally decide to do, but it has a certain logic.
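As a toy illustration, here is a sketch of a Rust function (the name `to_little_endian_decimal` is my own invention) that renders a number in this hypothetical little-endian decimal notation:

```rust
// Render a number's decimal digits least-significant-first, as in the
// hypothetical little-endian notation described above.
fn to_little_endian_decimal(mut n: u32) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut s = String::new();
    while n > 0 {
        // Digit i (worth 10^i) lands at string position i.
        s.push(char::from_digit(n % 10, 10).unwrap());
        n /= 10;
    }
    s
}

fn main() {
    // 1234 comes out as "4321": digit zero comes first.
    assert_eq!(to_little_endian_decimal(1234), "4321");
    assert_eq!(to_little_endian_decimal(100), "001");
}
```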
And if you think about languages like Hebrew or Arabic, which are written right to left, but which write numbers in the same direction we do, the least significant digit is actually reached first in the normal direction of reading: when they see “100” in the midst of the text, the zeroes come “before” the “1”. (I am told that this is not how most people think of it; that they instead just think of numbers as going the other way from the rest of the text, but it just goes to demonstrate how convention-based all of this stuff is.)
So all of this is to say that, the weird effect we had before with big endian looking “normal” and little endian looking “weird” has nothing to do with the intrinsic logic of big vs little endian, but rather with the fact that we’re mixing a little endian processor with a big endian writing notation. If we instead were to print the digits of each number in increasing significance – that is, if we were to use little endian as our printing convention – we’d get:
# Little Endian Machine
87654321
87 65 43 21
# Big Endian Machine
87654321
21 43 65 87
The mismatch between writing the whole word in hex and writing the individual bytes in order in hex is caused by a mismatch between the endianness of the system (usually little, in practice) and the endianness of the writing system (usually big, in practice). When the writing system isn’t a factor, little endian makes more mathematical sense, is easier to reason about in circuitry and code, and has therefore won out over big endian in every major processor architecture.
The only real exception is network byte order, which uses big endian. This is convenient for manually reading hex dumps of packets, but probably has more to do with the fact that the Internet developed when this question was much less settled. Due to the presence of network byte order, however, and the fact that the endianness of Intel and modern ARM is opposite of the endianness of most human writing systems, the concept remains with us.
When is endianness relevant?#
In writing numbers, a single digit has no endianness: as a one-digit number, 8 means the same thing under either convention. Similarly, a byte is indivisible in a processor. Bytes are made up of bits, but outside of special instructions, the ordering of those bits is not relevant. One of them is most significant, one of them is least, but unless we’re indexing them for a special instruction, or sending them over a wire one by one, there is no way to say which bit comes “first.”
Indeed, if we want to display a byte as a series of bits, we as the programmer get to choose the endianness, and the program runs identically on either a big endian or a little endian platform. The little endian version is a little more intuitive, as “2 to the N” is an operation that’s easy to write on computers, and in little endian the N increases as the index increases:
fn byte_as_bits_le(byte: u8) -> [u8; 8] {
    let mut res = [0u8; 8];
    for i in 0..8 {
        // 2 to the i: the bit's significance increases with its index.
        let mask = 1 << i;
        res[i] = if byte & mask == 0 { 0 } else { 1 };
    }
    res
}
Nevertheless, at no point does this function rely on the endianness of the hardware, and it does the same thing on either type of hardware.
Why do I bring this up? Well, I don’t think it makes any sense to speak of the endianness of a (multi-byte) word per se. The endianness of the word only comes into play when it is stored as – and accessible as – a series of bytes.
So from that point of view, what are the operations where endianness is relevant? Given a word, what series of bytes comprises it? And then, given a series of bytes, what word is it?
In Rust terms, these are to_be_bytes (for big endian) / to_le_bytes (for little) in the one direction, and from_be_bytes / from_le_bytes in the other. These methods are all bundled together in the Rust documentation for – in this case – the primitive u32 type, along with ne variants, which use whatever the native endianness of the processor is.
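For concreteness, here is what these methods look like in use (all of them are real standard library methods on u32):

```rust
fn main() {
    let x: u32 = 0x12345678;

    // One word, two possible byte serializations:
    assert_eq!(x.to_be_bytes(), [0x12, 0x34, 0x56, 0x78]);
    assert_eq!(x.to_le_bytes(), [0x78, 0x56, 0x34, 0x12]);

    // And back again:
    assert_eq!(u32::from_be_bytes([0x12, 0x34, 0x56, 0x78]), x);
    assert_eq!(u32::from_le_bytes([0x78, 0x56, 0x34, 0x12]), x);
}
```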
These are the APIs I’m going to be discussing. But before discussing how they might be improved, I’m going to point out an API that I think doesn’t make as much sense: to_be. This method takes in a word, a u32, and outputs a u32, and yet claims to change the endianness of that word – which, as I mentioned, does not have endianness per se, only in its representation as bytes.
I know what they mean by it. On a little endian platform, it will replace 0x12345678 with 0x78563412. But what does that actually mean? In its form as a u32, as I have argued above, a number has no endianness. So what is this number 0x78563412? It is the number that, if stored in bytes in the native endianness, will store the original number in big endian.
That’s a mouthful, I know, because it’s actually a complicated concept. That is to say, it’s a hack. We want to write a number – say, 2000 – in big endian, but we don’t want to think of it as bytes yet. We want to be able to load the whole number into a register, and when we write it, we want it to be 2000 in big endian. So we byte swap the number, and instead of storing 2000, we store 3490119680, so that if we write it using the processor’s normal mechanism for writing, it comes out to 2000 in big endian.
Basically, to_be does the equivalent of u32::from_le_bytes(input.to_be_bytes()), and using it looks like this:
let input: u32 = 2000;
// These two invocations do the same thing
let be = input.to_be();
let be2: u32 = u32::from_le_bytes(input.to_be_bytes());
println!("{} {}", be, be2); // 3490119680 for both
// The result can be written using native (little endian) byte
// order, and it will give 2000 in big endian byte order.
assert_eq!(be.to_le_bytes(), input.to_be_bytes());
This is arguably a useful hack – though I’m not fully convinced – but it is definitely a hack. I do not think the description is sufficiently rigorous. The output of to_be is not a number “in big endian”; it is a different number that resembles the big endian representation of the original number. The description is a simplification, and I think a conceptually incoherent one – which is understandable, because the concept at play here is so hackish.
It appears that to_be was in Rust 1.0, and to_be_bytes was introduced later. This to me is a good sign, as to_be_bytes, I think, makes much more sense as an interface. As to why we started out with the to_be type of interface in Rust, that makes sense as well: in C, the traditional (POSIX but not ANSI C) functions for these conversions have similar semantics, such as htonl (host to network long), where we have this conceit of storing a “big endian” or “network byte order” value in a uint32_t (C for u32). This always struck me as the wrong abstraction, but it is justified – or at least more understandable – for C, as we simply can’t pass around things like char[4] (C for [u8; 4]) by value in C.
There are other technical and historical reasons why htonl and to_be and friends exist, even if they are conceptually messy. In any case, since I’m talking about API design, and to_be_bytes and friends are a better match for the concepts at hand, I am now going to pretend to_be is deprecated (it is not), and move on to discussing the design of to_be_bytes and to_le_bytes.
Policies#
So the first thing I notice is that there are six methods that deal with fundamentally one topic:
from_be_bytes
from_le_bytes
from_ne_bytes
to_be_bytes
to_le_bytes
to_ne_bytes
But really they vary in two ways, namely:
- which operation is performed (from_X_bytes vs to_X_bytes)
- which endianness is required (le, be, or ne)
For us humans, this is clear from the names, but to the compiler, these names do not form a pattern that it is capable of recognizing. There are simply 6 separate functions named with 6 separate combinations of characters.
Now, having separate functions for separate operations makes sense; that’s what functions are for. But for the same operation but with different endianness, it might make more sense to indicate that it is one operation with several possible endiannesses by making the endianness into a parameter.
The obvious way to do this would be via run-time parameter. A fairly literal translation of this API would be something like:
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
    Native,
}

impl u32 {
    fn to_endian_bytes(self, endianness: Endian) -> [u8; 4] {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }

    fn from_endian_bytes(bytes: [u8; 4], endianness: Endian) -> Self {
        match endianness {
            Endian::Little | Endian::Native => { ... }
            Endian::Big => { ... }
        }
    }
}
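Since user code can’t actually write impl u32, a runnable sketch of this design has to use an extension trait; EndianBytesExt here is my own hypothetical name, and I lean on the real to_ne_bytes / from_ne_bytes methods for the Native arm rather than hardcoding a host endianness:

```rust
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
    Native,
}

// Hypothetical extension trait standing in for methods on u32 itself.
trait EndianBytesExt: Sized {
    fn to_endian_bytes(self, endianness: Endian) -> [u8; 4];
    fn from_endian_bytes(bytes: [u8; 4], endianness: Endian) -> Self;
}

impl EndianBytesExt for u32 {
    fn to_endian_bytes(self, endianness: Endian) -> [u8; 4] {
        match endianness {
            Endian::Big => self.to_be_bytes(),
            Endian::Little => self.to_le_bytes(),
            Endian::Native => self.to_ne_bytes(),
        }
    }

    fn from_endian_bytes(bytes: [u8; 4], endianness: Endian) -> Self {
        match endianness {
            Endian::Big => u32::from_be_bytes(bytes),
            Endian::Little => u32::from_le_bytes(bytes),
            Endian::Native => u32::from_ne_bytes(bytes),
        }
    }
}

fn main() {
    let x: u32 = 0x12345678;
    assert_eq!(x.to_endian_bytes(Endian::Big), [0x12, 0x34, 0x56, 0x78]);
    assert_eq!(u32::from_endian_bytes([0x78, 0x56, 0x34, 0x12], Endian::Little), x);
}
```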
This would also allow us to implement the concept of “native” byte order a little differently, and create more names for byte orders:
enum Endian {
    Little,
    Big,
}

// Assuming a little-endian build target:
static NATIVE_ENDIAN: Endian = Endian::Little;
static NETWORK_ENDIAN: Endian = Endian::Big;
So, besides simplifying away the need for a separate implementation for the ne functions, and making the code more in sync with what’s happening, what other positive things have we accomplished? Well, given that we now have a parameter, we can make more complicated code parametric on it. Imagine we have an entire structure to write out, and we want to write the entire structure as big-endian or little-endian, perhaps because the protocol in question changed endianness at some version. Or perhaps we just want to make clear to the reader that one endianness is used for the entire structure. We can now do something like this:
struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize(&self, endianness: Endian) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
        res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
        res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize(Endian::Big)
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize(Endian::Little)
    }
}
The alternative would be to write two separate serializers, and duplicate all the logic of how to arrange the layout. Duplication is bad, because bug fixes don’t necessarily reach all the duplicate copies. So, to save on duplication, we’d have to basically wrap to_be_bytes and to_le_bytes in a version of this; it would be more convenient if the standard library had done this for us.
What is the downside of this? Well, the implementation didn’t really get any simpler. Actually, in the normal case, where you don’t change your mind about endianness, the implementation got more complicated. We now have a match expression in our two simplified functions, which theoretically indicates a run-time decision. We could trust the optimizer to fold the decision away through inlining and constant propagation, but trusting the optimizer is suspicious and unnecessary.
Nothing we’ve done so far requires this decision to be made at run-time, and so we can instead make the decision at compile-time. Where we had a run-time parameter, we can now have a compile-time parameter.
Now, although Rust has rudimentary support for other kinds of compile-time parameters, the archetypical compile-time parameter is a type, bound by a trait. Our enum from before would then have to be lifted into the type space, as a trait and a few types:
trait Endianness { }
struct BigEndian;
impl Endianness for BigEndian { }
struct LittleEndian;
impl Endianness for LittleEndian { }
Now, we need a compile-time equivalent for match. This is a little harder, as at the time of this writing stable Rust does not have the most direct equivalent of match for implementors of traits, that is, “specialization.” But Rust does allow something similar: the code for each branch of the match must go in each type’s implementation of that trait, and the fact of the match must be provided in the trait itself.
This will also help us simplify the implementation. This to_be_bytes / to_le_bytes API is not just implemented for u32, but for all primitive types. Currently, these mostly-similar implementations are stamped out by a macro, along with other methods for primitive types. But we might imagine that there are two things going on in the implementation:
- write out the type into an array of bytes
- either swap the bytes, or not, based on whether we’re using the hardware endianness
We could then make the trait come into play – with the decision made at compile time – for the swapping part.
trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // actually swap here
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // no need to do anything here
    }
}
We have now moved some of the implementation into a trait, where the specifics of the implementation are determined by which type implements that trait. This is an example of the policy pattern, where a portion of the code is abstracted out into a policy, and the policy and the main body of the function are sewn together – in this case, at compile-time – into many variations of a function that execute similarly to what an implementor might have written by hand.
Note that there is no possibility of doing run-time endianness determination in this version. This trait method does not take a self parameter, and would have to be invoked as T::possibly_swap. This is possible in Rust because we are doing compile-time polymorphism, not run-time, so there is no need to make this trait object-safe.
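Putting the pieces together, here is a runnable sketch of the policy in action, repeating the trait and policies so the snippet stands alone. To keep the demo portable, the free function to_endian_bytes (my own invention, standing in for a generic method on the integer types) starts from to_le_bytes rather than the native byte order, so the swap is relative to little-endian order rather than the hardware’s:

```rust
trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

struct BigEndian;
impl Endianness for BigEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Little-endian base order, big-endian target: reverse the bytes.
        bytes.reverse();
    }
}

struct LittleEndian;
impl Endianness for LittleEndian {
    fn possibly_swap(_: &mut [u8]) {
        // Target order matches the base order: nothing to do.
    }
}

// Hypothetical free function; in the real design this would be a
// generic method on each primitive integer type.
fn to_endian_bytes<E: Endianness>(val: u32) -> [u8; 4] {
    let mut bytes = val.to_le_bytes();
    E::possibly_swap(&mut bytes);
    bytes
}

fn main() {
    let x: u32 = 0x12345678;
    assert_eq!(to_endian_bytes::<BigEndian>(x), [0x12, 0x34, 0x56, 0x78]);
    assert_eq!(to_endian_bytes::<LittleEndian>(x), [0x78, 0x56, 0x34, 0x12]);
}
```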
Our previous example serializer, with the two versions, now looks something like this:
struct Structure {
    a: u32,
    b: u32,
    c: u32,
}

impl Structure {
    fn serialize<T: Endianness>(&self) -> [u8; 12] {
        let mut res = [0u8; 12];
        res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
        res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
        res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
        res
    }

    pub fn serialize_old_version(&self) -> [u8; 12] {
        self.serialize::<BigEndian>()
    }

    pub fn serialize_new_version(&self) -> [u8; 12] {
        self.serialize::<LittleEndian>()
    }
}
The policy pattern is a fairly common pattern in generic programming just like in object-oriented programming, but when generic programming is implemented through monomorphization, as it is in Rust, it can be just as efficient as hand-implementing the combinations of policy and code, while allowing for more policies.
For example, if there were a platform where 4-byte chunks were split into 2-byte chunks little endian, but 2-byte chunks were split into 1-byte chunks big endian, we could write a new policy for this platform and all the existing code would support it.
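A hedged sketch of that hypothetical policy, using the same little-endian-relative swapping scheme as the possibly_swap trait above (the trait is repeated here so the snippet stands alone):

```rust
trait Endianness {
    fn possibly_swap(bytes: &mut [u8]);
}

// Hypothetical policy: 2-byte chunks ordered little endian, but the
// bytes within each chunk ordered big endian.
struct MixedEndian;
impl Endianness for MixedEndian {
    fn possibly_swap(bytes: &mut [u8]) {
        // Starting from little-endian bytes, the chunk order is already
        // right; only the bytes within each 2-byte chunk need swapping.
        for pair in bytes.chunks_exact_mut(2) {
            pair.swap(0, 1);
        }
    }
}

fn main() {
    let mut bytes = 0x12345678u32.to_le_bytes(); // [0x78, 0x56, 0x34, 0x12]
    MixedEndian::possibly_swap(&mut bytes);
    // Low half 0x5678 written big endian, then high half 0x1234.
    assert_eq!(bytes, [0x56, 0x78, 0x12, 0x34]);
}
```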
A much more complicated example of the policy pattern is serde, where the generated serializers and deserializers for each structure are all polymorphic on what serialization format should be used. If a new serialization format comes out with serde support, all existing Serialize instances can then be used with the new format without modification.
Now, in practice, there are often processor instructions that do byte swaps. The hardware uses an interface analogous to the hackish, conceptually messy to_be(), which at a hardware level makes sense, because elegance of abstraction is not as important a goal as performance. These instructions convert 0x12345678 into 0x78563412, and similar. So this implementation is not actually what the policy would look like in a production context. Nevertheless, the endianness argument could definitely be passed in by a trait-constrained type parameter; the implementation would just be more complicated.
Traits#
I mentioned before that u32 is not the only type that implements this set of methods, this convention, this informal protocol of to_be_bytes, to_le_bytes, etc. This means that if we were writing in C++, we would have enough from this informal protocol to write a function that did something like “write this value in big endian twice, and in little endian twice, to different locations” that was agnostic to the type provided, as long as it implemented this informal interface. It would look something like this:
template <typename T>
void write_four_times(T val) {
    write_to_location_1(val.to_be_bytes());
    write_to_location_2(val.to_be_bytes());
    write_to_location_3(val.to_le_bytes());
    write_to_location_4(val.to_le_bytes());
}
This would allow you to call write_four_times on any type for which that code made sense, as C++ templates are literally templates, and the T is filled in before type-checking. The protocol here is implicit in the structure of the function – it is compile-time duck typing.
Rust generic functions are type-checked before monomorphization, so we can’t do this in Rust. Instead of defining to_le_bytes() and friends separately on each type, this function would require them to be in a trait, maybe EndianBytes:
fn write_four_times<T: EndianBytes>(val: T) {
    write_to_location_1(&val.to_be_bytes());
    write_to_location_2(&val.to_be_bytes());
    write_to_location_3(&val.to_le_bytes());
    write_to_location_4(&val.to_le_bytes());
}
EndianBytes would have to define at least those methods:
trait EndianBytes {
    fn to_be_bytes(self) -> [u8; ???];
    fn to_le_bytes(self) -> [u8; ???];
}
Unfortunately, as the ??? shows, the different output arrays have different lengths – a u16 would be 2 bytes and a u64 8 bytes – and so the Rust trait system at the time of this writing is (to my knowledge) not powerful enough to represent this trait as is. Instead, it would have to return a slice, which introduces an additional run-time value (the length) into the mix that we’d rather avoid in this exercise in compile-time generic programming.
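One partial workaround is to hide the array length behind an associated type. This is a sketch of my own (the trait and method names are made up, suffix-free names would shadow the inherent std methods), and it still doesn’t expose the length to generic code as a constant – generic code only ever sees a slice:

```rust
// Hypothetical trait; not part of the standard library.
trait EndianBytes {
    type Bytes: AsRef<[u8]>;
    fn be_bytes(self) -> Self::Bytes;
    fn le_bytes(self) -> Self::Bytes;
}

impl EndianBytes for u16 {
    type Bytes = [u8; 2];
    fn be_bytes(self) -> [u8; 2] { self.to_be_bytes() }
    fn le_bytes(self) -> [u8; 2] { self.to_le_bytes() }
}

impl EndianBytes for u32 {
    type Bytes = [u8; 4];
    fn be_bytes(self) -> [u8; 4] { self.to_be_bytes() }
    fn le_bytes(self) -> [u8; 4] { self.to_le_bytes() }
}

// Generic code works through AsRef<[u8]>, i.e. a slice.
fn serialize_be<T: EndianBytes>(val: T) -> Vec<u8> {
    val.be_bytes().as_ref().to_vec()
}

fn main() {
    assert_eq!(serialize_be(0x1234u16), vec![0x12, 0x34]);
    assert_eq!(serialize_be(0x12345678u32), vec![0x12, 0x34, 0x56, 0x78]);
}
```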
Run-Time Endianness#
What if we want to make decisions about endianness at run-time, say, because we are implementing DBus? This is, as Linus Torvalds pointed out in one of his famously angry emails, a stupid idea for a protocol, but we don’t always get to choose what protocol we implement. Even though choosing one endianness and sticking to it would have avoided the run-time cost of making a decision (which, as Torvalds points out, is more than the cost of either decision), the developers of DBus did not do that. UTF-16 also didn’t – it also does run-time endianness adjustment, with a sentinel character at the top of the text block to indicate the endianness.
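For UTF-16, that sentinel is the byte order mark U+FEFF at the start of the text, and detecting it is exactly this kind of small run-time endianness decision. A sketch, with a local Endian enum of my own:

```rust
#[derive(Debug, PartialEq)]
enum Endian {
    Little,
    Big,
}

// Detect UTF-16 endianness from a leading byte order mark (U+FEFF).
// U+FEFF encodes as FE FF in big endian and FF FE in little endian.
fn detect_utf16_endianness(bytes: &[u8]) -> Option<Endian> {
    match bytes {
        [0xFE, 0xFF, ..] => Some(Endian::Big),
        [0xFF, 0xFE, ..] => Some(Endian::Little),
        _ => None,
    }
}

fn main() {
    assert_eq!(detect_utf16_endianness(&[0xFF, 0xFE, 0x41, 0x00]), Some(Endian::Little));
    assert_eq!(detect_utf16_endianness(&[0xFE, 0xFF, 0x00, 0x41]), Some(Endian::Big));
    assert_eq!(detect_utf16_endianness(b"plain"), None);
}
```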
The most obvious solution is to use the run-time parameterized version we discussed towards the beginning of this post, and have an enum Endian parameter. This would be parsed with each message (or connection, or whatever duration of time endianness is configured for) and then passed through to all the serializing and deserializing code, which would look something like our original serialization example:
fn serialize(&self, endianness: Endian) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes(endianness));
    res[4..8].copy_from_slice(&self.b.to_endian_bytes(endianness));
    res[8..12].copy_from_slice(&self.c.to_endian_bytes(endianness));
    res
}

pub fn serialize_old_version(&self) -> [u8; 12] {
    self.serialize(Endian::Big)
}

pub fn serialize_new_version(&self) -> [u8; 12] {
    self.serialize(Endian::Little)
}
We can do better than that, though. This has one copy of the serialization code in the source, and one copy in the binary. What we could do instead is expand the more sophisticated compile-time version of the serialization code, and move the match into a wrapper serialize method:
fn serialize_impl<T: Endianness>(&self) -> [u8; 12] {
    let mut res = [0u8; 12];
    res[0..4].copy_from_slice(&self.a.to_endian_bytes::<T>());
    res[4..8].copy_from_slice(&self.b.to_endian_bytes::<T>());
    res[8..12].copy_from_slice(&self.c.to_endian_bytes::<T>());
    res
}

pub fn serialize(&self, endianness: Endian) -> [u8; 12] {
    match endianness {
        Endian::Big => self.serialize_impl::<BigEndian>(),
        Endian::Little => self.serialize_impl::<LittleEndian>(),
    }
}
This generates two serializers from one serializer function (thus mitigating the biggest problem with code duplication – that of maintainability), and makes the run-time decision further up in the call tree. This ability – to trade off between finer-grained run-time decisions and duplication of compiled code – is one of the greatest powers of C++ and of Rust. In the DBus case, we can effectively create two entire DBus deserializers – one for little endian, one for big endian – and then decide between the two at run-time on a per-message basis, which, because fewer run-time decisions are being made, will be much more efficient than making the decision at every deserialization site.
Of course, for serialization we can simply write one serializer and always generate little-endian DBus messages.