This is a collection of little Rust thoughts that weren’t complicated enough for a full post. I saved them up until I had a few, and now I’m posting the collection. I plan to keep doing this for such little thoughts, hence the #1 in the title.

serde flattening

What if you want to read a JSON file, process some of the fields, and write it back out, without changing the other fields? Can you still use serde? Won’t it only keep fields that you know about in your data structure?

Turns out, you can parse the fields you want, while also just preserving the fields you don’t!

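use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Record { // placeholder name; the original snippet left the struct unnamed
    pub known_field: KnownField,
    pub known_field2: KnownField2,

    // Every key not matched by a field above is collected here on
    // deserialization and written back out on serialization.
    #[serde(flatten)]
    pub unknown_fields: BTreeMap<String, serde_json::Value>,
}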

I found out about this in the serde documentation, so it’s not an original insight, but it came in handy for me recently, so I’m trying to raise awareness.

let surprises!

So, in Jon Gjengset’s popular Twitter thread transcribed here, he wrote this:

Did you know that whether or not let _ = x should move x is actually fairly subtle? https://github.com/rust-lang/rust/issues/10488

I didn’t think much of this, besides making a note to self not to use let _ = x to ever drop anything, which hopefully I wouldn’t have done anyway because drop(x) is much more self-evident in what it intends. I remember also vaguely hoping that it did drop, because in my mind that was the obvious, logical thing for it to do.

But then later, as I was writing a match, I realized why _ couldn’t mean drop, from the match context:

match foo.bar.baz {
    MyEnum::Option1(_) => {
        // This shouldn't move from `foo.bar.baz`, but just
        // inspects whether it is `MyEnum::Option1`. Otherwise, there'd
        // be no straight-forward way to perform that inspection!
        //
        // And indeed, it doesn't.
        None
    }
    MyEnum::Option2(ref baz_inner) => {
        Some(foobar(baz_inner))
    }
}

So, if let _ = x were to be consistent with this use case, then _ must not drop, since it’s important for _ to mean the same thing everywhere. And, after all, the left-hand side of a let is just another pattern context!
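A small std-only sketch makes the difference observable. The names `Noisy` and `wildcard_vs_drop` are made up for this illustration; the type simply records a message in a shared log when it is dropped, so the order of events can be inspected afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper type: records a message in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the order of events for `let _ = x` versus `drop(x)`.
fn wildcard_vs_drop() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { name: "a", log: Rc::clone(&log) };
        let b = Noisy { name: "b", log: Rc::clone(&log) };

        // `_` binds nothing, so `a` is neither moved nor dropped here.
        let _ = a;
        log.borrow_mut().push("after let _ = a".to_string());

        // `drop` takes `b` by value and destroys it immediately.
        drop(b);
        log.borrow_mut().push("after drop(b)".to_string());

        // `a` is still usable; this line would not compile had it been moved.
        log.borrow_mut().push(format!("a is still {}", a.name));
    } // `a` is dropped here, at the end of its scope.

    let events = log.borrow().clone();
    events
}
```

Calling `wildcard_vs_drop()` yields "drop b" right after the first entry, while "drop a" only shows up last, once the inner scope ends.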

But wait, I thought! Does this mean that you can write let ref x = y;? Yes, it does. It’s just another way of writing let x = &y;… But just because you can write it that way, doesn’t mean you should. Keeping to idiom is important.
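To check that the two spellings really are equivalent, here is a minimal sketch (the function name is hypothetical) that borrows the same value both ways and compares the resulting pointers:

```rust
/// Returns true if `let ref a = y;` and `let b = &y;` borrow the same value.
fn ref_pattern_matches_borrow() -> bool {
    let y = String::from("hello");

    let ref a = y; // `a: &String`, borrowed through the pattern
    let b = &y;    // `b: &String`, the idiomatic spelling

    // Both references point at the very same `String`.
    std::ptr::eq(a, b)
}
```

Clippy flags the `let ref` form by default, which is one more reason to stick with `&y`.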

Nevertheless, fun fact! The more you know!

Remember: serde structs Can Be Function-Local

Let’s say you need to extract three fields out of some JSON, like name, age, and phone_number (which, ironically, is a string in JSON terms, and not a number). One of the great things about Rust and serde is that you can just write those fields in a struct with the Deserialize trait (which is derivable) and grab the values into such a struct, even if there are other fields in the JSON:

#[derive(Deserialize)]
struct Person {
    name: String,
    phone_number: String,
    age: f64,
}

let person: Person = serde_json::from_str(json_str)?;

The question then becomes, where should Person go? Well, if you plan on passing around this Person value, and structuring the rest of your code in terms of it, then it should be a prominent type.

But more often, especially in my own code, I immediately split such a structure into its constituent parts, which I then will use for other things:

let Person {
    name,
    phone_number,
    age,
} = serde_json::from_str(json_str)?;

let handle = person_database.lookup(&name)?;
handle.set_phone_number(&phone_number);
let demographic = demographic_for_age(age.trunc() as u32);

This is very reasonable. It makes sense that our internal data structures would be designed for whatever logic we want to do on them, rather than having them coincidentally match the wire format. For most complicated applications, having the internal data format match the wire format literally is actually sort of a code smell.

So, we will often have types that we use to deserialize (and serialize) JSON in exactly one function. In that situation, the type should in fact be written locally to that function. So in the example above, where struct Person { ... } is immediately followed by the serde_json::from_str, I didn’t just write them next to each other for convenience. I would literally put them together in a function:

// `Result<()>` is assumed to be an alias such as `anyhow::Result<()>`.
fn do_thing(json_str: &str) -> Result<()> {
    do_something_else()?;

    // Only this function can name `Person`; it exists purely to describe
    // the wire format.
    #[derive(Deserialize)]
    struct Person {
        name: String,
        phone_number: String,
        age: f64,
    }

    let Person {
        name,
        phone_number,
        age,
    } = serde_json::from_str(json_str)?;

    let handle = person_database.lookup(&name)?;
    handle.set_phone_number(&phone_number);
    let demographic = demographic_for_age(age.trunc() as u32);

    Ok(())
}

I bring this up mostly because many programmers don’t seem to be aware that you can do this, or don’t think to. I’ve seen people write types like Person at the top level. I realize that many programming languages either don’t let you do this sort of embedding, or else strongly discourage it. But I’m a big believer in giving things the least scope they need, and for many serde-related types, that’s function scope.
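The same scoping works for any item, not only serde types. As a std-only sketch (the `name,age` line format and the names `greeting` and `parse` are invented for this illustration, and hand-rolled parsing stands in for serde), both the struct and its parser can live inside the one function that needs them:

```rust
/// Hypothetical example: turns a "name,age" line into a short description.
fn greeting(line: &str) -> Option<String> {
    // Local to this function: nothing else in the module can name `Person`.
    struct Person<'a> {
        name: &'a str,
        age: u32,
    }

    // Items declared in the same block can refer to each other,
    // so this nested helper can return the local struct.
    fn parse(line: &str) -> Option<Person<'_>> {
        let (name, age) = line.split_once(',')?;
        Some(Person {
            name: name.trim(),
            age: age.trim().parse().ok()?,
        })
    }

    // Split the value into its parts right away, as in the serde version.
    let Person { name, age } = parse(line)?;
    Some(format!("{name} is {age}"))
}
```

Outside of `greeting`, neither `Person` nor `parse` exists, so there is nothing for the rest of the module to accidentally depend on.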

Rust Shadowing

Speaking of minimal scope, I wanted to write in praise of Rust’s penchant for shadowing, which saves you from having to come up with a bunch of names for the same thing. Oftentimes, we just convert the same information from type to type: wire format in bytes, to parsed wire format, to application domain format (wrapped in an Option in a Result), to application domain format with errors and absence handled (no longer wrapped in those things). Fortunately, Rust lets us shadow and re-use names for these different variables, and ultimately we get code that looks something like this (although the type annotations are normally unnecessary):

let foo: FooTypeC = {
    let foo: FooTypeA = get_foo();
    let foo: FooTypeB = transform_foo(&foo)?;
    match foo {
        Some(foo) => transform_foo_again(foo)?,
        None => FooTypeC::default(),
    }
};

This is really helpful, along with the fact that braces {} enclose expressions, in minimizing how much scope each variable has. But it’s also really helpful because, if shadowing weren’t available, what would we name all these different variables? foo_a and foo_b and similar stupid names? This is an issue in certain other programming languages where shadowing isn’t as straightforward, and the results aren’t fun.
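For a version that compiles on its own, here is a std-only sketch of the same chain. The function `parse_port` and its fallback of 8080 are assumptions made up for this example; the point is only that one name, `port`, follows the value through every type it takes.

```rust
use std::error::Error;

/// Hypothetical helper: reads a port number from raw bytes, falling back to
/// 8080 (an arbitrary default chosen for this sketch) when the input is blank.
fn parse_port(raw: &[u8]) -> Result<u16, Box<dyn Error>> {
    let port = {
        // bytes -> &str (fails on invalid UTF-8)
        let port = std::str::from_utf8(raw)?;
        // &str -> &str without surrounding whitespace
        let port = port.trim();
        // &str -> Option<u16> (fails on a non-numeric value)
        let port = if port.is_empty() {
            None
        } else {
            Some(port.parse::<u16>()?)
        };
        // Option<u16> -> u16, with absence handled
        port.unwrap_or(8080)
    };
    Ok(port)
}
```

Each intermediate `port` goes out of reach as soon as the next one shadows it, and none of them is visible after the closing brace of the block.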