My Dream C++ Additions
UPDATE: I have updated this post to cover C++ features that address these issues, or that have been purported to.
I have long day-dreamed about useful improvements to C++. Some of these are inspired by Rust, but some of these are ideas I already had before I learned Rust. Each of these would make programming C++ a better experience, usually in a minor way.
Explicit self reference instead of implicit this pointer
UPDATE: This is coming out in C++23, and they did it right! I’m excited! Good job C++!
I admit I haven’t been paying close attention to C++ post C++14. C++17 was up-and-coming and I hadn’t finished learning everything I wanted to about it when I left C++ programming. And I refuse to be embarrassed for not knowing about a feature in a programming language that is not my favorite before any compiler even supports it.
But I am indeed excited for them! This is a substantial improvement I have wanted since well before C++11 came out. They’ve done it pretty close to how I wished for it here, and they have good reasons for how they made it.
There are a few weird parts of the this pointer.
For one, it is a pointer, but it is never allowed to be null, and it cannot be modified to point to a different object. In both of these ways, it behaves more like a reference than a pointer.
class Foo {
public:
    void bar() {
        this = new Foo{}; // Error
    }
};
int main() {
    Foo *foo = nullptr;
    foo->bar(); // Undefined behavior
}
For another, when we want to put a modifier on this, like const or volatile, there is nowhere obvious in the function signature to put it. We have to put it awkwardly after the parameters, before the ; or {:
class Foo {
public:
    void bar() const volatile && {
        // Do stuff
    }
};
Oddly enough, whether the parameter is taken by lvalue or rvalue can also be specified, which would make way more sense for a reference parameter instead of a pointer.
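To make the lvalue/rvalue part concrete, here is a minimal sketch of ref-qualified overloads as they work in C++11 and later. The class name Foo and the strings returned by bar are made up for illustration; the only point is which overload gets picked for which kind of object.

```cpp
#include <string>

// Hypothetical class: three overloads of bar() that differ only in the
// qualifiers applied to the implicit object.
class Foo {
public:
    // Chosen when bar() is called on a non-const lvalue.
    std::string bar() & { return "lvalue"; }
    // Chosen when bar() is called on a const lvalue.
    std::string bar() const & { return "const lvalue"; }
    // Chosen when bar() is called on an rvalue, such as a temporary.
    std::string bar() && { return "rvalue"; }
};
```

Calling bar() on a named Foo picks the first overload, on a const Foo the second, and on a temporary such as Foo{} the third.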
The modifiers have to go in this odd location because the this pointer is implicit. This is in line with OOP ideology and theory, but in my mind, it’s just a negative. If you have to think about whether it’s const or taken by rvalue anyway when writing the signature, why put those modifiers somewhere you might forget about, instead of right with the declaration of the parameter?
I would change the syntax to fix both of these issues in one fell swoop: allow an explicit self as an alternative to implicit this, and make it a reference:
class Foo {
public:
    void bar(&self) {
        self.baz();
    }
    void baz(volatile const &self) {
        // Do stuff
    }
};
The type would still be implicit, but modifiers can be specified where the type would be. You would also only be able to take by reference or rvalue reference, and never by value, because implicit copy on method call would be a new feature of questionable value. It would not conflict with existing code, as a parameter named self without an explicit type would be illegal under the current syntax.
Of course, this looks rather similar to Rust’s syntax, but believe it or not, I had this idea long before I learned that Rust does self in this way.
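For comparison, the C++23 feature mentioned in the update at the top of this section (explicit object parameters, often called "deducing this") spells the idea roughly as below. This is only a sketch: the class Foo, its name_ member, and the return values are invented for illustration, and the feature-test macro guard is there so the snippet still compiles on compilers that do not advertise support yet.

```cpp
#include <string>

class Foo {
public:
#if defined(__cpp_explicit_this_parameter)
    // C++23: the object is an ordinary named reference parameter, and its
    // qualifiers live in the parameter list. Unlike the syntax wished for
    // above, the type has to be written out.
    std::string bar(this const Foo &self) { return self.baz(); }
    std::string baz(this const Foo &self) { return self.name_; }
#else
    // Fallback for older compilers: the same behavior with implicit this
    // and a trailing const qualifier.
    std::string bar() const { return baz(); }
    std::string baz() const { return name_; }
#endif

private:
    std::string name_ = "foo";
};
```

Either branch is called the same way, for example foo.bar() on a Foo object.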
A new byte type for uint8_t and int8_t
In C++, the type we use for an individual byte of data, by definition, is char. This is the definition of char in the standard, and while the byte length (CHAR_BIT) doesn’t have to be 8 bits, other standard provisions and practical considerations mean that on a modern platform, it always is.
We might use uint8_t or int8_t for bytes in practical code, but these are defined as typedefs to unsigned char and signed char. I don’t know whether this is required by the standard, but it is always done in practice.
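Whether a given toolchain really defines these typedefs in terms of the char types can be checked at compile time. A small sketch, where the two constant names are made up for this example:

```cpp
#include <cstdint>
#include <type_traits>

// True when the platform defines std::uint8_t as a typedef for unsigned char.
constexpr bool uint8_is_unsigned_char =
    std::is_same<std::uint8_t, unsigned char>::value;

// True when the platform defines std::int8_t as a typedef for signed char.
constexpr bool int8_is_signed_char =
    std::is_same<std::int8_t, signed char>::value;
```

On the mainstream desktop and server toolchains I would expect both constants to be true, but that is an expectation about common platforms, not a guarantee.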
However, char is also the type we use for text data, so it is a type with two different contrasting (perhaps even contradictory) sets of semantics.
That leads to many odd results, including the fact that char cannot represent all Unicode characters because it has to be 1 byte long. But the one I want to focus on today is a bit weirder.
What does this code print?
#include <cstdint>
#include <iostream>
struct message_data {
    uint8_t message_type;
    uint8_t message_length;
    uint8_t data[1];
};
void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << mesg.message_type << std::endl;
    std::cout << "Length: " << mesg.message_length << std::endl;
}
int main() {
    message_data data;
    data.message_type = 100;
    data.message_length = 0;
    print_message_hdr(data);
    return 0;
}
Well, if you thought the numbers 100 and 0 would show up on the output, you’d be wrong. std::cout’s operator<<’s char overloads are triggered, and so these fields, clearly meant as integers, are printed as text:
[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: d
Length:
[jim@palatinate:~]$
In order to get the integer print-outs we want, we have to override this strange default behavior, perhaps by casting the values to uint16_t before printing them:
void print_message_hdr(message_data &mesg) {
    std::cout << "Type: " << uint16_t(mesg.message_type) << std::endl;
    std::cout << "Length: " << uint16_t(mesg.message_length) << std::endl;
}
This results in a better output:
[jim@palatinate:~]$ c++ -std=c++11 test.cpp
[jim@palatinate:~]$ ./a.out
Type: 100
Length: 0
[jim@palatinate:~]$
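The cast to uint16_t is one workaround among several. As a sketch, the helper below (format_header is a made-up name, and it writes to a string stream only so the result is easy to inspect) shows two other common spellings: unary plus, which promotes the value to int, and an explicit static_cast.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: formats the two header fields as numbers.
std::string format_header(std::uint8_t type, std::uint8_t length) {
    std::ostringstream out;
    // Unary plus applies integer promotion, so the int overload of
    // operator<< is selected instead of the char one.
    out << "Type: " << +type << "\n";
    // An explicit cast to a wider integer type has the same effect.
    out << "Length: " << static_cast<unsigned>(length) << "\n";
    return out.str();
}
```

Both spellings still rely on the author remembering to apply them at every print site, which is the ergonomic problem described here.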
So, how do we make this a little more ergonomic? We introduce a byte type that is similar to char, but overloads differently. Like any other integer type, it defaults to signed, and then we add overloads to operator<< and others to treat it like an integer, not like a character. Switching between byte and char would be an implicit conversion, but for overloading purposes, they would be different types. uint8_t and int8_t could then be defined in terms of byte.
I do not know what backwards-compatibility implications it has, but I do think the decision to make char mean byte as its primary meaning instead of “character” was a particularly poor one, and anything we can do to migrate away from it would be good.
Update: Someone drew my attention to std::byte. This one I was aware of, but had not thought about here, as I didn’t think it really solves the problem. As it is, it is not an arithmetic type, and therefore cannot be used as the underlying type of uint8_t, leaving the confusing behavior in place.
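A short C++17 sketch of what that means in practice. The function name show is made up; std::byte, std::to_integer, and the type traits are the standard ones.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// std::byte is a scoped enumeration, not an arithmetic type, and it is a
// distinct type from unsigned char.
static_assert(!std::is_arithmetic<std::byte>::value,
              "std::byte is not an arithmetic type");
static_assert(!std::is_same<std::byte, unsigned char>::value,
              "std::byte is distinct from unsigned char");

// Hypothetical helper: renders a std::byte as a decimal number.
std::string show(std::byte b) {
    std::ostringstream out;
    // Writing `out << b` would not compile, because no operator<< accepts
    // std::byte; the value has to be converted explicitly first.
    out << std::to_integer<int>(b);
    return out.str();
}
```

So std::byte avoids the character-printing surprise, but only by refusing to behave like an integer at all.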
Real if-else Expression Syntax
Oftentimes, in C++, I find myself writing code like this:
int32_t error_code;
if (setting == Setting::Socket) {
    error_code = initialize_socket();
} else { // setting == Setting::Pipe
    error_code = initialize_pipe();
}
if (error_code < 0) {
    // ...
}
This error_code variable is just one example. I often want to have a variable get different values depending on which side of the if-else statement it’s on, without having to declare the variable without an initializer right ahead of it, and write two assignment statements. Basically, I want if-else to be an expression.
Now, of course, C++ already has the ternary operator: ?:. But it’s so ugly and unreadable that no one uses it, for good reason. It’s hard to remember what the precedence is, meaning if we want to be rigorous and friendly to our readers we need to bracket with ( and ) even if strictly unnecessary, and the result looks like garbage and is hard to format in a way that’s remotely readable:
int32_t error_code = (setting == Setting::Socket
                      ? initialize_socket()
                      : initialize_pipe()
                     );
What do I want instead? I want if-else to have this role, to be an expression, where it evaluates to the value of the end of each block (with no semicolon, to make clear that it’s an expression not a full statement):
int32_t error_code = if (setting == Setting::Socket) {
    initialize_socket()
} else {
    initialize_pipe()
};
This is way better than ?:. The blocks can be multiple statements long if necessary. You can add if-else if-else chaining. And, most importantly, it can be formatted like any other if-else.
Update: Someone drew my attention to a lambda-invocation pattern that is, in my mind, equally ugly to ?:, and also leaves you without the ability to return from the enclosing function within the block. This strikes me as extremely hackish and not really an improvement, but I suppose that’s where C++ is going. I am at a loss for why they didn’t just implement GCC’s expression blocks (which GCC calls statement expressions), followed by if as expression. It’s clearly much better in my mind.
I’ve seen the technique from time to time but I guess I figured it was too hackish to mention. I didn’t realize it was getting officially recommended in C++ Core Guidelines. I feel like when they were recommending it, they should’ve simultaneously been trying to get more usable and obvious features included in the programming language itself. Maybe they are, and if so I wish them luck in that! Maybe C++30 will be a safe and usable programming language, equivalent to Rust now.
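For readers who have not seen the lambda-invocation pattern, here is a minimal sketch of it applied to the error_code example. The Setting enum and the two initialize functions are stand-ins invented so the snippet compiles on its own; their return values are arbitrary.

```cpp
#include <cstdint>

enum class Setting { Socket, Pipe };

// Hypothetical stand-ins for the real initialization functions.
std::int32_t initialize_socket() { return 0; }
std::int32_t initialize_pipe() { return -1; }

std::int32_t initialize(Setting setting) {
    // The lambda is defined and called in the same expression, so
    // error_code can be const and is initialized exactly once.
    const std::int32_t error_code = [&]() -> std::int32_t {
        if (setting == Setting::Socket) {
            return initialize_socket();
        } else { // setting == Setting::Pipe
            return initialize_pipe();
        }
    }(); // the trailing () invokes the lambda immediately
    return error_code;
}
```

Note that each return inside the lambda only leaves the lambda, not initialize, which is the limitation complained about above.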
Variable Shadowing
On a related note, I want to have multiple variables with the same name shadow, rather than resulting in an error message. I want the new variable with the same name to simply hide the old variable, rather than giving me a “conflicting declaration” error (or similar).
Why? Well, a lot of production code involves taking the same conceptual thing, and migrating it through many types. Without shadowing, we have to use awkward Hungarian notation.
void handle_data(const void *data_v, size_t size) {
    // Cast to const uint8_t *; a const char * would not convert implicitly.
    const uint8_t *data_ch = (const uint8_t *)data_v;
    std::vector<uint8_t> data{data_ch, data_ch + size};
    // Actually do something with `data`
}
The new way would look like this:
void handle_data(const void *data, size_t size) {
    const uint8_t *data = (const uint8_t *)data;
    std::vector<uint8_t> data{data, data + size};
}
This also cuts down on how many variables are in scope at once.
This bugs people who are new to Rust sometimes, but it’s fairly easy to learn, and C++ has asked people to learn much, much harder things. Once learned, it is really useful, as the alternative is to use Hungarian notation or equivalents. It also helps you use the right value, as you won’t accidentally go back and use an old one, as it’s shadowed.
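For contrast, today’s C++ only lets a declaration hide a name from an enclosing scope, and the new name is already in scope inside its own initializer, so the old value has to be read before the new declaration. A minimal sketch, with the function name shadow_demo made up for illustration:

```cpp
// Hypothetical demo of the shadowing that C++ already permits.
int shadow_demo() {
    int value = 1;
    {
        // Read the outer variable first. Writing `int value = value + 41;`
        // here would refer to the new, still-uninitialized variable.
        const int previous = value;
        int value = previous + 41; // hides the outer `value` until the block ends
        return value;
    }
}
```

The extra block and the temporary are exactly the kind of ceremony that same-scope shadowing would remove.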
First-Class Support for Sum Types
std::variant is awful. I know, because few people except die-hards use it, and people use the Rust equivalent, enums, all the time. The weirdest thing about std::variant is that it supposes that all of the variants hold exactly one value, and one variant per type is sufficient. In reality, multiple variants might hold values of the same type, and many variants don’t need a value – both of which are possible but clumsy to express using std::variant’s semantics.
But C++11 already introduced enum class for more powerful enums! Let’s go all the way and add Rust-style values associated with it, for a compiler-implemented tagged union. The implementation of std::optional’s fields would be so much simpler.
template <typename T>
enum class option {
    None,
    Some {
        T value;
    },
    // OK, define some methods
};
This interacts with object lifetimes and constructors in a complicated way, but if there were interest, I know it could be figured out. If you don’t think this feature is necessary, I suspect you’ve spent too long programming without it. Once you get used to this, it’s really hard to go without.
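To show what "possible but clumsy" looks like with std::variant today, here is a C++17 sketch. All of the names (Quit, Write, Rename, Message, Describe, describe) are invented for this example: a variant with no payload needs an empty tag struct, and two variants carrying the same payload type each need their own wrapper struct.

```cpp
#include <string>
#include <variant>

// A variant with no payload still needs its own empty type.
struct Quit {};
// Two variants that both carry a string need separate wrapper types,
// because std::variant distinguishes alternatives by type.
struct Write { std::string text; };
struct Rename { std::string text; };

using Message = std::variant<Quit, Write, Rename>;

// Visitor with one overload per alternative; std::visit picks the
// overload matching whichever alternative is currently held.
struct Describe {
    std::string operator()(const Quit &) const { return "quit"; }
    std::string operator()(const Write &w) const { return "write " + w.text; }
    std::string operator()(const Rename &r) const { return "rename " + r.text; }
};

std::string describe(const Message &message) {
    return std::visit(Describe{}, message);
}
```

With the enum syntax proposed above, the three wrapper structs and the visitor would collapse into a single declaration.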
Conclusion
I am not going to do anything to try to make these things happen. I’m sure I’m not the most popular in the C++ community after my long write-ups of how Rust is so much better, and it’s not where my primary interests lie anymore. But, if someone were to make these features happen, it would make my life much easier, when for good reasons, projects I’m working on require me to code in C++.