Rust Tidbits #1
This is a collection of little Rust thoughts that weren’t complicated enough for a full post. I saved them up until I had a few, and now I’m posting the collection. I plan to keep doing this for such little thoughts, hence the #1 in the title.
serde flattening
What if you want to read a JSON file, process some of the fields, and write it back out, without changing the other fields? Can you still use serde? Won’t it only keep fields that you know about in your data structure?
Turns out, you can parse the fields you want, while also just preserving the fields you don’t!
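use std::collections::BTreeMap;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Record { // `Record` is a placeholder name
    pub known_field: KnownField,
    pub known_field2: KnownField2,
    // Collects every field not named above, so it is written back out unchanged.
    #[serde(flatten)]
    pub unknown_fields: BTreeMap<String, serde_json::Value>,
}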
I found out about this in the serde documentation, so it’s not an original insight, but it came in handy for me recently and so I’m trying to raise awareness.
let surprises!
So, in Jon Gjengset’s popular Twitter thread transcribed here, he wrote this:
Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488
I didn’t think much of this, besides making a note to self never to use let _ = x to drop anything, which hopefully I wouldn’t have done anyway because drop(x) is much more self-evident in what it intends.
I remember also vaguely hoping that it did drop, because in my mind that
was the obvious, logical thing for it to do.
But then later, as I was writing a match, I realized why _ couldn’t mean drop, from the match context:
match foo.bar.baz {
    MyEnum::Option1(_) => {
        // This shouldn't move from `foo.bar.baz`, but just
        // inspects whether it is `MyEnum::Option1`. Otherwise, there'd
        // be no straight-forward way to perform that inspection!
        //
        // And indeed, it doesn't.
        None
    }
    MyEnum::Option2(ref baz_inner) => {
        Some(foobar(baz_inner))
    }
}
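To try this out, here is a minimal self-contained sketch of the same shape. The `Holder` struct, the `String` payloads, and the `inspect` function are assumptions made up for illustration; only the `MyEnum::Option1` and `MyEnum::Option2` names come from the snippet above. Because `holder` is only a shared reference, the fact that this compiles at all shows that neither arm moves out of `holder.baz`.

```rust
#[allow(dead_code)]
enum MyEnum {
    Option1(String),
    Option2(String),
}

// Hypothetical container, standing in for `foo.bar` above.
struct Holder {
    baz: MyEnum,
}

// `holder` is borrowed immutably, so moving out of `holder.baz` would be
// a compile error. `_` binds nothing and `ref` only borrows, so both arms
// are accepted.
fn inspect(holder: &Holder) -> Option<usize> {
    match holder.baz {
        MyEnum::Option1(_) => None,
        MyEnum::Option2(ref inner) => Some(inner.len()),
    }
}
```

Replacing `_` with a plain binding such as `s` in the first arm should make the compiler reject the function, since that would try to move the `String` out from behind the reference.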
So, if let _ = x were to be consistent with this use case, well, that meant that _ has to not drop, as it’s important for _ to mean the same thing everywhere. And, after all, the left-hand side of a let is just another pattern context!
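One way to observe this directly is a type that records when it is dropped. The following is a small sketch using only the standard library; `Noisy` and `underscore_demo` are hypothetical names invented for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: pushes its label onto a shared log when dropped.
struct Noisy {
    label: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

// Returns the events in the order they happened.
fn underscore_demo() -> Vec<&'static str> {
    let log: Rc<RefCell<Vec<&'static str>>> = Rc::new(RefCell::new(Vec::new()));
    {
        let x = Noisy { label: "x dropped", log: Rc::clone(&log) };
        let _ = x; // `_` binds nothing: `x` is neither moved nor dropped here
        log.borrow_mut().push("after let _");
        drop(x); // still allowed, because `x` was never moved
        log.borrow_mut().push("after drop");
    }
    let events = log.borrow().clone();
    events
}
```

The returned log has "after let _" before "x dropped", and the explicit `drop(x)` compiles only because `let _ = x;` left `x` in place.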
But wait, I thought! Does this mean that you can write let ref x = y;? Yes, it does. It’s just another way of writing let x = &y;…
But just because you can write it that way, doesn’t mean you should.
Keeping to idiom is important.
Nevertheless, fun fact! The more you know!
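For the curious, a tiny sketch that checks the two spellings produce the same reference. The function name `ref_binding_demo` is made up for this example.

```rust
// Returns true when both bindings point at the same `String`.
fn ref_binding_demo() -> bool {
    let y = String::from("hello");
    let ref a = y; // a: &String, borrowed via a `ref` pattern
    let b = &y;    // b: &String, borrowed via the usual `&` expression
    // `y` is still owned here; both `a` and `b` are shared borrows of it.
    std::ptr::eq(a, b) && y == "hello"
}
```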
Remember: serde structs Can Be Function-Local
Let’s say you need to extract three fields out of some JSON, like name, age, and phone_number (which, ironically, is a string in JSON terms, and not a number). One of the great things about Rust and serde is that you can just write those fields in a struct with the Deserialize trait (which is derivable) and grab the values into such a struct, even if there are other fields in the JSON:
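use serde::Deserialize;

#[derive(Deserialize)]
struct Person {
    name: String,
    phone_number: String,
    age: f64,
}

// `from_str` returns a `Result`, so the error has to be propagated or handled.
let person: Person = serde_json::from_str(json_str)?;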
The question then becomes, where should Person go? Well, if you plan on passing around this Person value, and structuring the rest of your code in terms of it, then it should be a prominent type.
But more often, especially in my own code, I immediately split such a structure into its constituent parts, which I then will use for other things:
let Person {
    name,
    phone_number,
    age,
} = serde_json::from_str(json_str)?;
let handle = person_database.lookup(&name)?;
handle.set_phone_number(&phone_number);
let demographic = demographic_for_age(age.trunc() as u32);
This is very reasonable. It makes sense that our internal data structures would be designed for whatever logic we want to do on them, rather than having them coincidentally match the wire format. For most complicated applications, having the internal data format match the wire format literally is actually sort of a code smell.
So, we often will have types that we use to deserialize (and serialize) JSON in exactly one function. In that situation, the type should in fact be written locally to that function. So in the example above, where struct Person { ... } is immediately followed by the serde_json::from_str, I didn’t just write them next to each other as a convenience. I would literally put them together in a function:
fn do_thing(json_str: &str) -> Result<()> {
    do_something_else()?;

    // Declared inside the function: nothing outside `do_thing` can name it.
    #[derive(Deserialize)]
    struct Person {
        name: String,
        phone_number: String,
        age: f64,
    }

    let Person {
        name,
        phone_number,
        age,
    } = serde_json::from_str(json_str)?;
    let handle = person_database.lookup(&name)?;
    handle.set_phone_number(&phone_number);
    let demographic = demographic_for_age(age.trunc() as u32);
    Ok(())
}
I bring this up mostly because many programmers don’t seem to be aware that you can do this, or don’t think to. I’ve seen people write types like Person at the top level. I realize that many programming languages either don’t let you do this sort of embedding, or else strongly discourage it. But I’m a big believer in giving things the least scope they need, and for many serde-related types, that’s function scope.
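The scoping rule itself is not specific to serde: any item, including a struct with derives and its impl block, can live inside a function body. Here is a standard-library-only sketch of that. The `describe` function and the comma-separated input are assumptions for illustration, with the hand-written `parse` acting as a stand-in for `serde_json::from_str`.

```rust
// Hypothetical example: parse a "name, age" record and use the parts right away.
fn describe(record: &str) -> Option<String> {
    // Visible only inside `describe`; the rest of the module cannot name it.
    #[derive(Debug)]
    struct Person {
        name: String,
        age: u32,
    }

    impl Person {
        // Stand-in for a real deserializer such as `serde_json::from_str`.
        fn parse(s: &str) -> Option<Person> {
            let (name, age) = s.split_once(',')?;
            Some(Person {
                name: name.trim().to_string(),
                age: age.trim().parse().ok()?,
            })
        }
    }

    // Immediately split the wire-format type into its parts.
    let Person { name, age } = Person::parse(record)?;
    Some(format!("{} is {}", name, age))
}
```

Trying to mention `Person` from another function in the same module should fail to compile, which is exactly the narrow scope being argued for here.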
Rust Shadowing
Speaking of minimal scope, I wanted to write in praise of Rust’s penchant for shadowing, which means you don’t have to come up with a bunch of names for the same thing. Oftentimes, we just convert the same information from type to type: wire format in bytes, to parsed wire format, to application domain format (wrapped in an Option in a Result), to application domain format with errors and absence handled (not wrapped in those things). Fortunately, Rust lets us shadow and re-use names for these different variables, and ultimately we get code that looks something like this (although no type annotations are normally necessary):
let foo: FooTypeC = {
    let foo: FooTypeA = get_foo();
    let foo: FooTypeB = transform_foo(&foo)?;
    match foo {
        Some(foo) => transform_foo_again(foo)?,
        None => FooTypeC::default(),
    }
};
This is really helpful, along with the fact that braces { … } enclose expressions, in really minimizing how much scope each variable has. But it’s also really helpful because, if shadowing weren’t available, what would we name all these different variables? foo_a and foo_b and similar stupid names? This is an issue in certain other programming languages where shadowing isn’t as straightforward, and the results aren’t fun.
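As a concrete, compilable variant of that pipeline, here is a sketch that takes one value from raw bytes to a usable number while re-using a single name. The `parse_port` function, the default of 8080, and the `String` error type are all assumptions chosen for the example.

```rust
// Hypothetical example: bytes -> &str -> Option<u16> -> u16, all named `port`.
fn parse_port(raw: &[u8]) -> Result<u16, String> {
    // Wire format in bytes, decoded to text.
    let port = std::str::from_utf8(raw).map_err(|e| e.to_string())?;
    // Same information, with surrounding whitespace removed.
    let port = port.trim();
    // Domain value, where an empty input means "absent".
    let port: Option<u16> = if port.is_empty() {
        None
    } else {
        Some(port.parse::<u16>().map_err(|e| e.to_string())?)
    };
    // Absence handled: fall back to an assumed default.
    let port = port.unwrap_or(8080);
    Ok(port)
}
```

Each `let port` ends the usefulness of the previous binding, so there is no chance of accidentally reaching for the untrimmed or unparsed version further down.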