UPDATE: I have updated this post to cover C++ features that solve these issues, or have been purported to.

I have long day-dreamed about useful improvements to C++. Some of these are inspired by Rust, but some of these are ideas I already had before I learned Rust. Each of these would make programming C++ a better experience, usually in a minor way.

Explicit self reference instead of implicit this pointer

UPDATE: This is coming out in C++23, as explicit object parameters (also known as “deducing this”), and they did it right! I’m excited! Good job C++!

I admit I haven’t been paying close attention to C++ post C++14. C++17 was up-and-coming and I hadn’t finished learning everything I wanted to about it when I left C++ programming. And I refuse to be embarrassed for not knowing about a feature in a programming language that is not my favorite before any compiler even supports it.

But I am indeed excited for them! This is a substantial improvement I have wanted since well before C++11 came out. They’ve done it pretty close to how I wished for it here, and they have good reasons for how they made it.

There are a few weird things about the implicit this pointer as it exists today.

For one, it is a pointer, but it is never allowed to be null, and it cannot be modified to point to a different object. In both of these ways, it behaves more like a reference than a pointer.

class Foo {
public:
    void bar() {
        this = new Foo{}; // Error
    }
};

int main() {
    Foo *foo = nullptr;
    foo->bar(); // Undefined behavior
}

For another, when we want to put a modifier on this, like const or volatile, there is nowhere obvious in the function signature to put it. We have to put it awkwardly after the parameters, before the ; or {:

class Foo {
public:
    void bar() const volatile && {
        // Do stuff
    }
};

Oddly enough, whether the object is taken by lvalue or rvalue can also be specified, which would make way more sense for a reference parameter than for a pointer.
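To make the lvalue/rvalue point concrete, here is a minimal sketch of ref-qualified member functions as they exist since C++11. The Message class and all of its member names are hypothetical, invented only for this illustration:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class for illustration; none of these names are from the post.
class Message {
public:
    explicit Message(std::string text) : text_(std::move(text)) {}

    // Selected when the object is an lvalue: return a read-only reference.
    const std::string &text() const & { return text_; }

    // Selected when the object is an rvalue: it is about to be destroyed,
    // so the string can be moved out instead of copied.
    std::string text() && { return std::move(text_); }

    // Report which kind of object the call was made on.
    const char *category() const & { return "lvalue"; }
    const char *category() && { return "rvalue"; }

private:
    std::string text_;
};

// Usage: the same member name resolves differently for a named object
// and for a temporary.
void demo_ref_qualifiers() {
    Message named("hello");
    assert(std::string(named.category()) == "lvalue");
    assert(std::string(Message("bye").category()) == "rvalue");
    assert(named.text() == "hello");
    assert(Message("bye").text() == "bye");
}
```

Note that the qualifiers that pick the overload sit after the parameter list, far from where a reader would look for parameter types.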

The modifiers have to go in this odd location because this is implicit. This is in line with OOP ideology and theory, but in my mind, it’s just a negative. If you have to think about whether it’s const or taken by rvalue anyway when writing the signature, why put those modifiers somewhere you might forget about, instead of right with the declaration of the parameter?

I would change the syntax to fix both of these issues in one fell swoop: allow an explicit self as an alternative to implicit this, and make it a reference:

class Foo {
public:
    void bar(&self) {
        self.baz();
    }

    void baz(volatile const &self) {
        // Do stuff
    }
};

The type would still be implicit, but modifiers can be specified where the type would be. You would also only be able to take self by lvalue reference or rvalue reference, and never by value, because implicit copy on method call would be a new feature of questionable value. It would not conflict with existing code, as a parameter named self without an explicit type would be illegal under the current syntax.

Of course, this looks rather similar to Rust’s syntax, but believe it or not, I had this idea long before I learned that Rust does self in this way.

A new byte type for uint8_t and int8_t

In C++, the type we use for an individual byte of data, by definition, is char. This is the definition of char in the standard, and while the number of bits in a byte (CHAR_BIT) doesn’t have to be 8, other standard provisions and practical considerations mean that on a modern platform, it always is.

We might use uint8_t or int8_t for bytes in practical code, but these are defined as typedefs to unsigned char and signed char – I don’t know whether this is required by the standard but it is always done in practice.

However, char is also the type we use for text data, so it is a type with two different contrasting (perhaps even contradictory) sets of semantics.

That leads to many odd results, including the fact that char cannot represent all Unicode characters because it has to be 1 byte long. But the one I want to focus on today is a bit weirder. What does this code print?

#include <cstdint>
#include <iostream>

struct message_data {
    uint8_t message_type;
    uint8_t message_length;
    uint8_t data[1];
};

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << mesg.message_type << std::endl;
    std::cout << "Length: " << mesg.message_length << std::endl;
}

int main() {
    message_data data;
    data.message_type = 100;
    data.message_length = 0;
    print_message_hdr(data);
    return 0;
}

Well, if you thought the numbers 100 and 0 would show up on the output, you’d be wrong. std::cout’s operator<<’s char overloads are triggered, and so these fields, clearly meant as integers, are printed as text:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: d
Length:
[jim@palatinate:~]$

In order to get the integer print-outs we want, we have to override this strange default behavior, perhaps by casting the values to uint16_t before printing them:

void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << uint16_t(mesg.message_type) << std::endl;
    std::cout << "Length: " << uint16_t(mesg.message_length) << std::endl;
}

This results in a better output:

[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: 100
Length: 0
[jim@palatinate:~]$
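A shorter workaround than the cast is unary plus, which applies integer promotion to its operand. The sketch below compares the two behaviors using a string stream instead of std::cout. It assumes uint8_t is a typedef for unsigned char and an ASCII-compatible character set; the function names are made up for this illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is, so the character overload of operator<< is
// selected wherever uint8_t is a typedef for unsigned char.
std::string format_raw(std::uint8_t value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unary plus promotes the operand to int, so the integer overload of
// operator<< is selected instead.
std::string format_promoted(std::uint8_t value) {
    std::ostringstream out;
    out << +value;
    return out.str();
}

void demo_formatting() {
    assert(format_raw(100) == "d");        // printed as a character
    assert(format_promoted(100) == "100"); // printed as a number
}
```

This is terser than the cast, but it is still something you have to remember at every call site, which is the underlying complaint.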

So, how do we make this a little more ergonomic? We introduce a byte type that is similar to char, but overloads differently. Like any other integer type, it defaults to signed, and then we add overloads to operator<< and others to treat it like an integer, not like a character. Switching between byte and char would be an implicit conversion, but for overloading purposes, they would be different types.

uint8_t and int8_t could then be defined in terms of byte.

I do not know what backwards-compatibility implications it has, but I do think the decision to make char mean byte as its primary meaning instead of “character” was a particularly poor one, and anything we can do to migrate away from it would be good.

Update: Someone drew my attention to std::byte. This one I was aware of, but had not thought about here as I didn’t think it really solves the problem. As it is, it is not an arithmetic type, and therefore cannot be used as the underlying type of uint8_t, leaving the confusing behavior in place.
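To illustrate why std::byte does not fill this role, here is a small sketch, assuming a C++17 compiler. The helper names are hypothetical; std::byte, std::to_integer, and std::is_arithmetic are the standard library's own:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// std::byte is a scoped enumeration, not an arithmetic type.
static_assert(!std::is_arithmetic<std::byte>::value,
              "std::byte is not an arithmetic type");

// There is no operator<< for std::byte, so it has to be converted
// explicitly before it can be printed as a number.
std::string format_byte(std::byte value) {
    std::ostringstream out;
    out << std::to_integer<int>(value);
    return out.str();
}

// Only bitwise operators are provided for std::byte.
std::byte low_nibble(std::byte value) {
    // value + value;  // would not compile: no operator+ for std::byte
    return value & std::byte{0x0F};
}

void demo_byte() {
    assert(format_byte(std::byte{100}) == "100");
    assert(low_nibble(std::byte{0xAB}) == std::byte{0x0B});
}
```

So std::byte works for opaque storage, but it cannot serve as a small integer, and uint8_t keeps its character-like printing behavior.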

Real if-else Expression Syntax

Oftentimes, in C++, I find myself writing code like this:

int32_t error_code;
if (setting == Setting::Socket) {
    error_code = initialize_socket();
} else { // setting == Setting::Pipe
    error_code = initialize_pipe();
}

if (error_code < 0) {
    // ...
}

This error_code variable is just one example. I often want to have a variable get different values depending on which side of the if-else statement it’s on, without having to declare the variable without an initializer right ahead of it, and write two assignment statements. Basically, I want if-else to be an expression.

Now, of course, C++ already has the ternary operator: ?:. But it’s so ugly and unreadable that no one uses it, for good reason. It’s hard to remember what the precedence is, meaning if we want to be rigorous and friendly to our readers we need to bracket with ( and ) even if strictly unnecessary, and the result looks like garbage and is hard to format in a way that’s remotely readable:

int32_t error_code = (setting == Setting::Socket
    ? initialize_socket()
    : initialize_pipe()
);

What do I want instead? I want if-else to have this role, to be an expression, where it evaluates to the value of the end of each block (with no semicolon, to make clear that it’s an expression not a full statement):

int32_t error_code = if (setting == Setting::Socket) {
    initialize_socket()
} else {
    initialize_pipe()
};

This is way better than ?:. The blocks can be multiple statements long if necessary. You can add if-else if-else chaining. And, most importantly, it can be formatted like any other if-else.

Update: Someone drew my attention to the immediately invoked lambda pattern, which is, in my mind, just as ugly as ?:, and also leaves you without the ability to return from the enclosing function within the block. This strikes me as extremely hackish and not really an improvement, but I suppose that’s where C++ is going. I am at a loss for why they didn’t just implement GCC’s statement expressions, followed by if as an expression. It’s clearly much better in my mind.
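For readers who have not seen the pattern, this is roughly what it looks like applied to the error_code example. It is a sketch only: the bodies and return values of initialize_socket() and initialize_pipe() are invented stand-ins, and the wrapping initialize() function is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

enum class Setting { Socket, Pipe };

// Invented stand-ins for the post's functions; the return values are
// arbitrary and only serve the illustration.
std::int32_t initialize_socket() { return 0; }
std::int32_t initialize_pipe() { return -1; }

std::int32_t initialize(Setting setting) {
    // The lambda is defined and called in the same expression, so
    // error_code can be const and is never left uninitialized.
    const std::int32_t error_code = [&]() -> std::int32_t {
        if (setting == Setting::Socket) {
            return initialize_socket();
        } else {
            // This return leaves only the lambda, not initialize().
            return initialize_pipe();
        }
    }();
    return error_code;
}

void demo_initialize() {
    assert(initialize(Setting::Socket) == 0);
    assert(initialize(Setting::Pipe) == -1);
}
```

The trailing }(); is easy to miss, and every return inside the block now means something different from a return in the surrounding function.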

I’ve seen the technique from time to time but I guess I figured it was too hackish to mention. I didn’t realize it was getting officially recommended in C++ Core Guidelines. I feel like when they were recommending it, they should’ve simultaneously been trying to get more usable and obvious features included in the programming language itself. Maybe they are, and if so I wish them luck in that! Maybe C++30 will be a safe and usable programming language, equivalent to Rust now.

Variable Shadowing

On a related note, I want to have multiple variables with the same name shadow, rather than resulting in an error message. I want the new variable with the same name to simply hide the old variable, rather than giving me a “conflicting declaration” error (or similar).

Why? Well, a lot of production code involves taking the same conceptual thing, and migrating it through many types. Without shadowing, we have to use awkward Hungarian notation.

void handle_data(const void *data_v, size_t size) {
    const uint8_t *data_ch = static_cast<const uint8_t *>(data_v);
    std::vector<uint8_t> data{data_ch, data_ch + size};
    // Actually do something with `data`
}

The new way would look like this:

void handle_data(const void *data, size_t size) {
    const uint8_t *data = static_cast<const uint8_t *>(data);
    std::vector<uint8_t> data{data, data + size};
}

This also cuts down on how many variables are in scope at once.

This bugs people who are new to Rust sometimes, but it’s fairly easy to learn, and C++ has asked people to learn much, much harder things. Once learned, it is really useful, as the alternative is to use Hungarian notation or equivalents. It also helps you use the right value, as you won’t accidentally go back and use an old one, as it’s shadowed.

First-Class Support for Sum Types

std::variant is awful. I know, because few people except die-hards use it, and people use the Rust equivalent, enums, all the time. The weirdest thing about std::variant is that it supposes that all of the variants hold exactly one value, and one variant per type is sufficient. In reality, multiple variants might hold values of the same type, and many variants don’t need a value – both of which are possible but clumsy to express using std::variant’s semantics.
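Here is a sketch of the clumsiness I mean, assuming C++17. Two alternatives carry a std::string and one carries nothing, so each one needs its own wrapper struct to act as a tag. The Response type and everything in it are hypothetical names for this illustration:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Wrapper structs exist only to give each alternative a distinct type.
struct Pending {};
struct Ok { std::string body; };
struct Error { std::string message; };

using Response = std::variant<Pending, Ok, Error>;

// Without pattern matching, each alternative is probed by type.
std::string describe(const Response &response) {
    if (const Ok *ok = std::get_if<Ok>(&response)) {
        return "ok: " + ok->body;
    }
    if (const Error *error = std::get_if<Error>(&response)) {
        return "error: " + error->message;
    }
    return "pending";
}

void demo_response() {
    assert(describe(Response{Ok{"hello"}}) == "ok: hello");
    assert(describe(Response{Error{"timeout"}}) == "error: timeout");
    assert(describe(Response{}) == "pending");
}
```

It works, but the names of the alternatives live in three separate struct declarations rather than in the sum type itself.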

But C++11 already introduced enum class for more powerful enums! Let’s go all the way and add Rust-style values associated with it, for a compiler-implemented tagged union. The implementation of std::optional’s fields would be so much simpler.

template <typename T>
enum class option {
    None,
    Some {
        T value;
    },

    // OK, define some methods
};

This interacts with object lifetimes and constructors in a complicated way, but if there were interest, I know it could be figured out. If you don’t think this feature is necessary, I suspect you’ve spent too long programming without it. Once you get used to this, it’s really hard to go without.

Conclusion

I am not going to do anything to try to make these things happen. I’m sure I’m not the most popular in the C++ community after my long write-ups of how Rust is so much better, and it’s not where my primary interests lie anymore. But, if someone were to make these features happen, it would make my life much easier when, for good reasons, projects I’m working on require me to code in C++.