Zig's @constCast
Jul 06, 2024
In the Coding in Zig section of my Learning Zig series, an invalid snippet was recently pointed out to me. The relevant part was:
```zig
if (try stdin.readUntilDelimiterOrEof(&buf, '\n')) |line| {
    var name = line;
    if (builtin.os.tag == .windows) {
        name = std.mem.trimRight(u8, name, "\r");
    }
    if (name.len == 0) {
        break;
    }
    try lookup.put(name, .{ .power = i });
}
```
The purpose of the code was to look at more cases of dangling pointers: in this example, `name` is used as a key in our `lookup` map, but it isn't valid beyond the `if` block. To show how clever I am, I included code to deal with Windows' different line endings. However, the code failed to compile on Windows.
This issue has nothing to do with Windows, and it has nothing to do with the issue this example was trying to highlight. So let's disentangle and simplify the error case.
We begin with a working function that normalizes a user's name:
```zig
fn normalize(name: []u8) void {
    std.debug.assert(name.len > 0);
    name[0] = std.ascii.toUpper(name[0]);
}
```
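To try the in-place version, the bytes have to actually be mutable. A minimal sketch, using a test block as the harness: dereferencing a string literal with `.*` copies it into a stack array (`buf` is just an illustrative name), and `&buf` then coerces to the `[]u8` that `normalize` expects.

```zig
const std = @import("std");

fn normalize(name: []u8) void {
    std.debug.assert(name.len > 0);
    name[0] = std.ascii.toUpper(name[0]);
}

test "normalize uppercases the first letter in place" {
    // Copy the literal into a stack array so the bytes are really mutable.
    var buf = "leto".*;
    normalize(&buf);
    try std.testing.expectEqualStrings("Leto", &buf);
}
```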
The function mutates the name in place, which is possible because the `name` parameter is declared as a `[]u8` instead of a `[]const u8`. (This is a valid real-world pattern, but it would be equally common to have the normalization process dupe the input and mutate that duped copy, in which case `name` could be a `[]const u8`.) The next step in our normalization is to trim spaces. This requires changing our `normalize` function to return a new slice:
```zig
fn normalize(name: []u8) []u8 {
    std.debug.assert(name.len > 0);
    name = std.mem.trim(u8, name, " ");
    name[0] = std.ascii.toUpper(name[0]);
    return name;
}
```
This code doesn't compile because we're trying to assign a value to `name`, and parameters are always `const`. So we make a small modification, right?
```zig
var trimmed = std.mem.trim(u8, name, " ");
trimmed[0] = std.ascii.toUpper(trimmed[0]);
return trimmed;
```
Just like the example in Learning Zig, this code doesn't compile: `error: cannot assign to constant`, on the line which tries to uppercase the first letter. To me, at first glance, the problem isn't obvious. `trimmed` is a `var`, and it's a slice into `name`, which, as before, is a `[]u8`, not a `[]const u8`. What gives?
To understand the issue, we need to look at the definition of `std.mem.trim`:
```zig
pub fn trim(comptime T: type, slice: []const T, strip: []const T) []const T
```
This is a generic function, which is to say it can work on any type `T`. As in this example, there's a good chance that you'll only ever use `trim` where `T == u8`. We can make this code a little less abstract by imagining the implementation generated for `u8`:
```zig
pub fn trim(slice: []const u8, strip: []const u8) []const u8
```
Both inputs are of type `[]const u8`, which makes sense: neither `slice` nor `strip` is mutated (the function returns a sub-slice of `slice`). Whenever possible, you should make function parameters `const`. Not only can this result in optimizations, but it also means the function can be used with both non-`const` and `const` inputs. Because a non-`const` value can always be cast to `const`, Zig does it implicitly. By having `slice` be a `[]const u8`, `trim` is able to operate on both `[]u8` and `[]const u8` values.
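Both points can be checked with a small test sketch using only std: the same `std.mem.trim` call accepts a string literal and a mutable buffer, and what it returns is a view into its input, not a copy.

```zig
const std = @import("std");

test "trim accepts both const and mutable input" {
    // A string literal: the bytes are const.
    const from_literal = std.mem.trim(u8, "  leto  ", " ");
    try std.testing.expectEqualStrings("leto", from_literal);

    // A mutable buffer: its []u8 coerces implicitly to []const u8.
    var buf = "  leto  ".*;
    const mutable: []u8 = &buf;
    const from_buffer = std.mem.trim(u8, mutable, " ");
    try std.testing.expectEqualStrings("leto", from_buffer);

    // The result is a sub-slice of the input, not a copy:
    // it starts two bytes into the buffer.
    try std.testing.expect(@intFromPtr(from_buffer.ptr) == @intFromPtr(mutable.ptr) + 2);
}
```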
Our issue isn't with the input parameters, it's with the return type, which is also `[]const u8`. If we go back to our code, we can now see why Zig refused to compile when we tried to write into `trimmed[0]`: the value returned by `trim` is a `[]const u8`. Although we declared `trimmed` as `var`, this only means we can mutate the slice itself (i.e. we can change its length, or change where it points to). The data it points to is still `const`, because that's what `trim` returns.
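The distinction between the slice and the data it points to is easier to see with an explicit type annotation. In this sketch, `name` is a `var` of type `[]const u8`: reassigning or reslicing it is fine, writing through it is not. If nothing needs to write through the result, declaring the variable this way is probably enough to make code like the original Windows snippet compile, though that assumes the rest of that code only reads `name`.

```zig
const std = @import("std");

test "a var slice of const data" {
    // `name` is a mutable variable whose type is []const u8:
    // the slice can change, the bytes it points to cannot.
    var name: []const u8 = std.mem.trim(u8, "  leto  ", " ");
    try std.testing.expectEqualStrings("leto", name);

    // Allowed: changing the slice itself (where it points, and its length).
    name = name[1..];
    try std.testing.expectEqualStrings("eto", name);

    // Not allowed: writing through it. Uncommenting this is a compile error.
    // name[0] = 'E';
}
```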
@constCast
The simplest solution is to use `@constCast` to strip away the `const`. This works:
```zig
fn normalize(name: []u8) []u8 {
    const trimmed = @constCast(std.mem.trim(u8, name, " "));
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    return trimmed;
}
```
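Here is the same function with a test sketch showing that the write lands in the caller's buffer. The assertion is added for this sketch only: input consisting entirely of spaces trims to an empty slice, and `trimmed[0]` would then be out of bounds.

```zig
const std = @import("std");

fn normalize(name: []u8) []u8 {
    const trimmed = @constCast(std.mem.trim(u8, name, " "));
    // Guard against all-space input, which trims to an empty slice.
    std.debug.assert(trimmed.len > 0);
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    return trimmed;
}

test "normalize trims and capitalizes" {
    var buf = "  leto  ".*;
    const result = normalize(&buf);
    try std.testing.expectEqualStrings("Leto", result);
    // The write went through to the caller's buffer.
    try std.testing.expectEqualStrings("  Leto  ", &buf);
}
```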
`@constCast` is similar to `@ptrCast` and `@alignCast`, which I've talked about in Zig Interfaces and Tiptoeing Around @ptrCast. All three are tools to override the compiler. An important part of the compiler's job is to know the type of data and make sure our manipulations of that data are valid. `@constCast` is probably the simplest of the three. It tells the compiler: I know you think this is `const`, but trust me, it isn't. This is dangerous because, if you're wrong, what would have been a compile-time error becomes undefined behavior.
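A small sketch of what "trust me" means in practice: `@constCast` produces a value of a different type that describes exactly the same memory. The cast is only sound here because `buf` really is a mutable stack array.

```zig
const std = @import("std");

test "@constCast changes the type, not the memory" {
    var buf = "leto".*;
    const view: []const u8 = &buf;
    const writable = @constCast(view);

    // The compiler's view of the type changed...
    try std.testing.expect(@TypeOf(view) == []const u8);
    try std.testing.expect(@TypeOf(writable) == []u8);

    // ...but both slices describe exactly the same bytes.
    try std.testing.expect(@intFromPtr(view.ptr) == @intFromPtr(writable.ptr));
    try std.testing.expect(view.len == writable.len);

    // Well defined only because buf is genuinely mutable stack memory.
    writable[0] = 'L';
    try std.testing.expectEqualStrings("Leto", view);
}
```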
We can easily see this in action. This code won't compile because Zig knows that `name` points to `const` data and won't let us write to it. String literals are always constants:
```zig
pub fn main() !void {
    const name = "leto";
    name[0] = 'L';
    std.debug.print("{s}\n", .{name});
}
```
Like our `trimmed` variable, we can try to declare `name` as `var`, but we'll get the same error. We've made the variable itself mutable (it could be pointed at a different string), but the underlying data is still `const`:
```zig
pub fn main() !void {
    var name = "leto";
    name[0] = 'L';
    std.debug.print("{s}\n", .{name});
}
```
But this version with `@constCast` will compile:
```zig
pub fn main() !void {
    const name = @constCast("leto");
    name[0] = 'L';
    std.debug.print("{s}\n", .{name});
}
```
Try to run this code, though, and it will almost certainly crash. `@constCast` and its siblings are unsafe, and `@constCast` tends to have fewer legitimate use cases than the others. Some people would say you should never use it. Others, myself included, think the world isn't perfect, libraries aren't perfect (`std.mem.trim` being, I'd argue, a good example), and it's a useful tool to have. But if you do use it, or its siblings, you must understand the distinction between changing the compiler's perspective and changing reality. `@constCast` merely changes the compiler's perspective; the reality remains unchanged. If you're wrong, the result is undefined behavior, typically a crash.
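Changing reality, as opposed to the compiler's perspective, means getting bytes that really are mutable. For a literal, one way, sketched here, is to dereference it with `.*`, which copies it into a stack array we own. No cast is needed and writing to it is well defined. Running `main` prints `Leto`.

```zig
const std = @import("std");

pub fn main() void {
    // "leto".* copies the literal's four bytes into a stack array we own,
    // so unlike @constCast("leto"), the bytes we write to really are mutable.
    var name = "leto".*;
    name[0] = 'L';
    std.debug.print("{s}\n", .{&name});
}

test "copying a literal gives mutable bytes" {
    var name = "leto".*;
    name[0] = 'L';
    try std.testing.expectEqualStrings("Leto", &name);
}
```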
If we go back to our non-working example, we can see how `@constCast` is a reasonable solution:
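```zig
fn normalize(name: []u8) []u8 {
    // `name` is a parameter, and parameters are const, so the trimmed
    // result goes into a new local instead of being assigned back to `name`.
    const trimmed = @constCast(std.mem.trim(u8, name, " "));
    // Checked after trimming: a name of only spaces trims to an empty slice.
    std.debug.assert(trimmed.len > 0);
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    return trimmed;
}
```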
I say it's "reasonable" because we know `name` is mutable and we know `trim` returns a slice into `name`. I can't think of any future change to `trim` which would make this unsafe; I literally can't think of how you'd change `trim` to make this unsafe, let alone with a reasonable change.
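For comparison, the dupe-then-mutate variant mentioned at the start avoids `@constCast` altogether, at the cost of an allocation. A sketch, with `normalizeAlloc` and `error.EmptyName` as hypothetical names: the input can now be a `[]const u8`, including a string literal, and the caller owns and frees the result.

```zig
const std = @import("std");

/// Hypothetical: trims and capitalizes a copy; the caller frees the result.
fn normalizeAlloc(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    const trimmed = std.mem.trim(u8, name, " ");
    if (trimmed.len == 0) return error.EmptyName;
    // dupe returns freshly allocated, genuinely mutable memory.
    const copy = try allocator.dupe(u8, trimmed);
    copy[0] = std.ascii.toUpper(copy[0]);
    return copy;
}

test "normalizeAlloc works on string literals" {
    const allocator = std.testing.allocator;
    const result = try normalizeAlloc(allocator, "  leto  ");
    defer allocator.free(result);
    try std.testing.expectEqualStrings("Leto", result);
}
```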
Alternatives?
If we look at `trim`'s signature again:
```zig
pub fn trim(comptime T: type, slice: []const T, strip: []const T) []const T
```
It's tempting to think that we could change the return type to not be `const`:
```zig
pub fn trim(comptime T: type, slice: []const T, strip: []const T) []T
```
This works in our specific case, where the input `slice` is mutable and thus the returned slice can be mutable. But now we've dangerously broken the other case, where the input `slice` is not mutable. For example, this version of `trim` would not be safe in this common case: `trim(u8, " Leto ", " ")`. To return a `[]T` from a `[]const T` input, `trim` would itself have to `@constCast` its result, and here that means calling `@constCast` on data which is a string literal, i.e. `const`. As we just saw, that might compile, but writing to the result will crash.
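One sidestep that keeps the const case safe is a second function that only accepts mutable input. The sketch below uses a hypothetical `trimMut`, which is not part of std, built from the straightforward two-loop approach to trimming. Because a string literal can't coerce to `[]T`, the unsafe call above simply doesn't compile, and no cast is needed. The downside is having two functions to maintain.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std: trims `strip` values from both
/// ends of a mutable slice and returns a mutable sub-slice of it.
fn trimMut(comptime T: type, slice: []T, strip: []const T) []T {
    var begin: usize = 0;
    var end: usize = slice.len;
    // Advance past leading values that appear in `strip`.
    while (begin < end and std.mem.indexOfScalar(T, strip, slice[begin]) != null) : (begin += 1) {}
    // Back up past trailing values that appear in `strip`.
    while (end > begin and std.mem.indexOfScalar(T, strip, slice[end - 1]) != null) : (end -= 1) {}
    return slice[begin..end];
}

test "trimMut returns a writable view" {
    var buf = "  leto  ".*;
    const name = trimMut(u8, &buf, " ");
    name[0] = 'L';
    try std.testing.expectEqualStrings("Leto", name);
    // trimMut(u8, "  leto  ", " ") would not compile: a string literal
    // is const and can't coerce to []u8.
}
```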
This can be solved in Zig, but it isn't trivial and isn't something I'd feel comfortable doing (let alone explaining). The solution might involve using `anytype` instead of an explicit `comptime T: type` parameter. Something like:
```zig
pub fn trim(slice: anytype, strip: ???) @TypeOf(slice)
```
Now if `slice` is given as a `[]const T`, our return type is `[]const T`, and if `slice` is a `[]T`, the return type is `[]T`. That's promising. However, if we called `trim(" Ghanima ", " ")`, then `@TypeOf(slice) == *const [9:0]u8`, which isn't the return type we want. And we still don't have a type for `strip`.
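The `@TypeOf(slice)` problem is easy to reproduce. In this sketch, `passthrough` is a hypothetical function that returns its `anytype` argument unchanged, so its return type is exactly `@TypeOf(slice)`:

```zig
const std = @import("std");

/// Hypothetical stand-in for an anytype-based trim: it returns its
/// argument unchanged, so the return type is exactly @TypeOf(slice).
fn passthrough(slice: anytype) @TypeOf(slice) {
    return slice;
}

test "@TypeOf(slice) follows whatever the caller passes" {
    var buf = " Ghanima ".*;
    const mutable: []u8 = &buf;
    const constant: []const u8 = " Ghanima ";

    // Slices in, matching slices out: the promising part.
    try std.testing.expect(@TypeOf(passthrough(mutable)) == []u8);
    try std.testing.expect(@TypeOf(passthrough(constant)) == []const u8);

    // A bare literal is a pointer to a 0-terminated array, not a slice.
    try std.testing.expect(@TypeOf(passthrough(" Ghanima ")) == *const [9:0]u8);
}
```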
Our solution needs to get more complicated. Something like:
```zig
pub fn trim(slice: anytype, strip: TrimStrip(@TypeOf(slice))) TrimReturn(@TypeOf(slice))
```
Now we can write functions, `TrimStrip` and `TrimReturn`, to generate the correct types for `strip` and for our return value. These are just normal functions, but they'll be evaluated at comptime (types always have to be known at comptime). In Zig, types and functions which return types are, by convention, PascalCase (which is also why the built-in function is `@TypeOf` instead of `@typeOf`).
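As a minimal illustration of a function that returns a type, consider a hypothetical `SliceOf` (not part of std). Nothing marks it as special beyond its `type` return value and `comptime` parameters, which is what forces every call to run at compile time:

```zig
const std = @import("std");

/// Hypothetical type function: PascalCase because it returns a type.
fn SliceOf(comptime T: type, comptime is_const: bool) type {
    return if (is_const) []const T else []T;
}

test "type functions are evaluated at comptime" {
    try std.testing.expect(SliceOf(u8, true) == []const u8);
    try std.testing.expect(SliceOf(u8, false) == []u8);

    // The result can be used anywhere a type is expected.
    const name: SliceOf(u8, true) = "leto";
    try std.testing.expectEqualStrings("leto", name);
}
```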
This would be my implementation of `TrimReturn`:
```zig
fn TrimReturn(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return if (ptr.is_const) []const std.meta.Elem(ptr.child) else []std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}
```
Again, this type of comptime programming isn't something I'm confident about. I might be missing cases that should or should not be allowed, or mappings which aren't right. The three empty `else` cases are for unsupported types: they fall through to the `@compileError`, which makes the compiler emit an error. The `@typeInfo` built-in returns a tagged union, currently consisting of 24 possible values (like `Int` or `Fn`). Here we're only interested in the `Pointer` type, which has 4 sub-types based on its `size` field: `Slice`, `One`, `Many` and `C`. We rely on the `is_const` and `child` fields of the `Pointer` type to generate the correct return type. The goal, at this point, isn't to provide a working example (sorry, I wish it could be), but rather to give some insight into this type of comptime programming.
For completeness, where `TrimReturn` makes the const-ness of the return type match the const-ness of `slice`, `TrimStrip` would always make it `const`. As such, it would be generally similar to the above, with a slight change:
```zig
fn TrimStrip(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []const ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return []const std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}
```
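For the curious, here is one way the two helpers might be wired into a generic trim. This is a sketch under stated assumptions: it targets Zig 0.13 (later releases renamed the `@typeInfo` tags to lowercase, e.g. `.pointer`), the two helper bodies are copied from above, and `trimAny` is a hypothetical name. The `@constCast` inside is only reached when `TrimReturn` decided the input was mutable, which is the same argument that made the `normalize` version reasonable.

```zig
const std = @import("std");

// Assumes Zig 0.13: later releases renamed the @typeInfo tags
// (.Pointer became .pointer, .Slice became .slice, and so on).

fn TrimReturn(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return if (ptr.is_const) []const std.meta.Elem(ptr.child) else []std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}

fn TrimStrip(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []const ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return []const std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}

/// Hypothetical name. Delegates the trimming to std.mem.trim and only
/// decides, at comptime, whether the result may be mutable.
fn trimAny(slice: anytype, strip: TrimStrip(@TypeOf(slice))) TrimReturn(@TypeOf(slice)) {
    const Elem = std.meta.Elem(@TypeOf(slice));
    const trimmed = std.mem.trim(Elem, slice, strip);
    // Const input: hand back the const result as-is.
    if (comptime TrimReturn(@TypeOf(slice)) == []const Elem) return trimmed;
    // Mutable input: trimmed is a sub-slice of it, so casting away const
    // only restores what the caller already had.
    return @constCast(trimmed);
}

test "trimAny matches the const-ness of its input" {
    const from_literal = trimAny(" Ghanima ", " ");
    try std.testing.expect(@TypeOf(from_literal) == []const u8);
    try std.testing.expectEqualStrings("Ghanima", from_literal);

    var buf = "  leto  ".*;
    const name = trimAny(&buf, " ");
    try std.testing.expect(@TypeOf(name) == []u8);
    name[0] = 'L';
    try std.testing.expectEqualStrings("Leto", name);
}
```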
Conclusion
While we diverted into a poor introduction to comptime programming, the main goal of this post was to introduce `@constCast`. To me, the interesting part isn't having `@constCast` as a tool, but rather being able to see, interact with, and change the compiler's perspective. It seems delicate. Not as in frail, because I think runtime bugs caused by compiler bugs are shockingly rare, but as in beautiful. The compiler can infer so much from so little and enforce correctness.
Having escape hatches, whether through built-in functions like `@constCast` or the `unsafe` blocks found in other languages, can be essential. Be mindful of what those escape hatches are and aren't doing: they aren't changing the data, they're just changing how the compiler treats the data. And remember that you're replacing compile-time safety with the possibility of undefined behavior at runtime, a trade-off no one should want.