Switching on Strings in Zig
Feb 13, 2025
Newcomers to Zig will quickly learn that you can't switch on a string (i.e. []const u8). The following code gives us the unambiguous error message "cannot switch on strings":
switch (color) {
    "red" => {},
    "blue" => {},
    "green" => {},
    "pink" => {},
    else => {},
}
I've seen two explanations for why this isn't supported. The first is that there's ambiguity around string identity. Are two strings only considered equal if they point to the same address? Is a null-terminated string the same as its non-null-terminated counterpart? The other reason is that users of switch [apparently] expect certain optimizations which are not possible with strings (although, presumably, these same users would know that such optimizations aren't possible with strings).
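To make the identity question concrete, here is a small sketch of two strings that hold the same bytes but live at different addresses. The helper sameAddress is a name I made up for illustration; it is not part of the standard library.

```zig
const std = @import("std");

// Hypothetical helper: "identity" in the pointer sense.
fn sameAddress(a: []const u8, b: []const u8) bool {
    return a.ptr == b.ptr;
}

pub fn main() void {
    const literal: []const u8 = "red";

    // Copy the bytes into a stack buffer so the two slices
    // live at different addresses.
    var buf: [3]u8 = undefined;
    @memcpy(&buf, literal);
    const copy: []const u8 = &buf;

    // A null-terminated string coerces to a plain slice;
    // the terminator is not counted in its length.
    const terminated: [:0]const u8 = "red";

    std.debug.print("same address: {}\n", .{sameAddress(literal, copy)});
    std.debug.print("same bytes: {}\n", .{std.mem.eql(u8, literal, copy)});
    std.debug.print("terminated vs plain: {}\n", .{std.mem.eql(u8, terminated, copy)});
}
```

A switch would have to pick one of these notions of equality. std.mem.eql picks "same length and same bytes".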
Instead, in Zig, there are two common methods for comparing strings.
The most common way to compare strings is using std.mem.eql with if / else if / else:
if (std.mem.eql(u8, color, "red")) {
} else if (std.mem.eql(u8, color, "blue")) {
} else if (std.mem.eql(u8, color, "green")) {
} else if (std.mem.eql(u8, color, "pink")) {
} else {
}
The implementation for std.mem.eql depends on what's being compared. Specifically, it has an optimized code path when comparing strings. Although that's what we're interested in, let's look at the non-optimized version:
pub fn eql(comptime T: type, a: []const T, b: []const T) bool {
    if (a.len != b.len) return false;
    if (a.len == 0 or a.ptr == b.ptr) return true;
    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}
Whether we're dealing with slices of bytes or some other type, if they're of different length, they can't be equal. Once we know that they're the same length, if they point to the same memory, then they must be equal. I'm not a fan of this second check; it might be cheap, but I think the case it catches is quite uncommon. Once those initial checks are done, we compare each element (each byte of our string) one at a time.
The optimized version, which is used for strings, is much more involved. But it's fundamentally the same as the above with SIMD to compare multiple bytes at once.
The nature of string comparison means that real-world performance is dependent on the values being compared. We know that if we have 100 if / else if branches then, in the worst case, we'll need to call std.mem.eql 100 times. But comparing strings of different lengths or strings which differ early will be significantly faster. For example, consider these three cases:
{
    // Same length, differing only in the last byte.
    const str1 = "a" ** 10_000 ++ "1";
    const str2 = "a" ** 10_000 ++ "2";
    _ = std.mem.eql(u8, str1, str2);
}
{
    // Same length, differing in the first byte.
    const str1 = "1" ++ "a" ** 10_000;
    const str2 = "2" ++ "a" ** 10_000;
    _ = std.mem.eql(u8, str1, str2);
}
{
    // Different lengths.
    const str1 = "a" ** 999_999;
    const str2 = "a" ** 1_000_000;
    _ = std.mem.eql(u8, str1, str2);
}
For me, the first comparison takes ~270ns, whereas the other two take ~20ns - despite the last one involving much larger strings. The second case is faster because the difference is early in the string, allowing the for loop to return after only one comparison. The third case is faster because the strings are of a different length: false is returned by the initial len check.
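If you want to try this on your own machine, here is a rough timing sketch. It assumes a Zig version where std.time.Timer and std.mem.doNotOptimizeAway are available, builds the strings at runtime rather than with ** so the comparison can't be resolved at compile time, and uses a hypothetical helper named averageNs. Treat the numbers it prints as indicative only; this is not a rigorous benchmark, and an optimizing build may still hoist work out of the loop.

```zig
const std = @import("std");

// Hypothetical helper: average nanoseconds per std.mem.eql call.
fn averageNs(a: []const u8, b: []const u8) !u64 {
    const iterations: u64 = 100_000;
    var timer = try std.time.Timer.start();
    var i: u64 = 0;
    while (i < iterations) : (i += 1) {
        // Keep the optimizer from discarding the unused result.
        std.mem.doNotOptimizeAway(std.mem.eql(u8, a, b));
    }
    return timer.read() / iterations;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const str1 = try allocator.alloc(u8, 10_001);
    defer allocator.free(str1);
    const str2 = try allocator.alloc(u8, 10_001);
    defer allocator.free(str2);
    @memset(str1, 'a');
    @memset(str2, 'a');

    // Same length, differing only in the last byte.
    str1[10_000] = '1';
    str2[10_000] = '2';
    std.debug.print("late difference: ~{d}ns\n", .{try averageNs(str1, str2)});

    // Same length, now also differing in the first byte.
    str1[0] = '1';
    str2[0] = '2';
    std.debug.print("early difference: ~{d}ns\n", .{try averageNs(str1, str2)});

    // Different lengths: drop the last byte of one slice.
    const shorter = str1[0 .. str1.len - 1];
    std.debug.print("different length: ~{d}ns\n", .{try averageNs(shorter, str2)});
}
```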
The second method is std.meta.stringToEnum, which takes an enum type and a string value and returns the corresponding enum value or null. This code prints "you picked: blue":
const std = @import("std");

const Color = enum {
    red,
    blue,
    green,
    pink,
};

pub fn main() !void {
    const color = std.meta.stringToEnum(Color, "blue") orelse {
        return error.InvalidChoice;
    };
    switch (color) {
        .red => std.debug.print("you picked: red\n", .{}),
        .blue => std.debug.print("you picked: blue\n", .{}),
        .green => std.debug.print("you picked: green\n", .{}),
        .pink => std.debug.print("you picked: pink\n", .{}),
    }
}
If you don't need the enum type (i.e. Color) beyond this check, you can leverage Zig's anonymous types. This is equivalent:
const std = @import("std");

pub fn main() !void {
    const color = std.meta.stringToEnum(enum {
        red,
        blue,
        green,
        pink,
    }, "blue") orelse return error.InvalidChoice;
    switch (color) {
        .red => std.debug.print("you picked: red\n", .{}),
        .blue => std.debug.print("you picked: blue\n", .{}),
        .green => std.debug.print("you picked: green\n", .{}),
        .pink => std.debug.print("you picked: pink\n", .{}),
    }
}
It's not obvious how this should perform versus the straightforward if / else if approach. Yes, we now have a switch statement that the compiler can [hopefully] optimize, but std.meta.stringToEnum still has to convert our input, "blue", into an enum.
The implementation of std.meta.stringToEnum depends on the number of possible values, i.e. the number of enum values. Currently, if there are more than 100 values, it'll fall back to using the same if / else if that we explored above. Thus, with more than 100 values it does the if / else if check PLUS the switch. This should improve in the future.
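To make that fallback concrete, here is a minimal sketch of a linear lookup over an enum's fields. The name linearStringToEnum is my own, and this approximates the behavior described above rather than reproducing the standard library's code.

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// Hypothetical helper: one std.mem.eql comparison per enum field.
fn linearStringToEnum(comptime T: type, str: []const u8) ?T {
    // inline for is unrolled at compile time, which effectively
    // produces an if / else if chain over the field names.
    inline for (std.meta.fields(T)) |field| {
        if (std.mem.eql(u8, str, field.name)) {
            return @field(T, field.name);
        }
    }
    return null;
}

pub fn main() void {
    if (linearStringToEnum(Color, "green")) |color| {
        std.debug.print("matched: {s}\n", .{@tagName(color)});
    }
    if (linearStringToEnum(Color, "teal") == null) {
        std.debug.print("teal: no match\n", .{});
    }
}
```

With four values this is cheap. With hundreds, a miss has to run through every comparison before returning null.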
However, with 100 or fewer values, std.meta.stringToEnum creates a comptime std.StaticStringMap which can then be used to look up the value. std.StaticStringMap isn't something we've looked at before. It's a specialized map that buckets keys by their length. Its advantage over Zig's other hash maps is that it can be constructed at compile-time. For our Color enum, the internal state of a StaticStringMap would look something like:
// keys are ordered by length
keys: ["red", "blue", "pink", "green"];
// values[N] corresponds to keys[N]
values: [.red, .blue, .pink, .green];
// What's this though?
indexes: [0, 0, 0, 0, 1, 3];
It might not be obvious how indexes is used. Let's write our own get implementation, simulating the above StaticStringMap state:
fn get(str: []const u8) ?Color {
    const keys = [_][]const u8{"red", "blue", "pink", "green"};
    const values = [_]Color{.red, .blue, .pink, .green};
    const indexes = [_]usize{0, 0, 0, 0, 1, 3};
    if (str.len >= indexes.len) {
        return null;
    }
    var index = indexes[str.len];
    while (index < keys.len) {
        const key = keys[index];
        if (key.len != str.len) {
            return null;
        }
        if (std.mem.eql(u8, key, str)) {
            return values[index];
        }
        index += 1;
    }
    return null;
}
Take note that keys are ordered by length. As a naive implementation, we could iterate through the keys until we either find a match or find a key with a longer length. Once we find a key with a longer length, we can stop searching, since none of the remaining candidates can match: they're all too long. StaticStringMap goes a step further and records the index within keys where entries of a specific length begin. indexes[3] tells us where to start looking for keys with a length of 3 (at index 0). indexes[5] tells us where to start looking for keys with a length of 5 (at index 3).
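You can also use std.StaticStringMap directly instead of going through std.meta.stringToEnum. Here is a minimal sketch, assuming Zig 0.13 or later (where the type goes by this name) and the same four colors; color_map is a name I picked for the example.

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// Built at compile time; each entry is a (key, value) tuple.
const color_map = std.StaticStringMap(Color).initComptime(.{
    .{ "red", .red },
    .{ "blue", .blue },
    .{ "green", .green },
    .{ "pink", .pink },
});

pub fn main() void {
    const inputs = [_][]const u8{ "pink", "green", "purple" };
    for (inputs) |input| {
        if (color_map.get(input)) |color| {
            std.debug.print("{s} -> {s}\n", .{ input, @tagName(color) });
        } else {
            // "purple" is longer than every key, so the lookup
            // can bail out before comparing any bytes.
            std.debug.print("{s} -> no match\n", .{input});
        }
    }
}
```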
Above, we fall back to using std.mem.eql for any key which is the same length as our target string. StaticStringMap uses its own "optimized" version:
pub fn defaultEql(a: []const u8, b: []const u8) bool {
    if (a.ptr == b.ptr) return true;
    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}
This is the same as the simple std.mem.eql implementation, minus the length check. This is done because the eql within our while loop is only ever called for values with matching length. On the flip side, StaticStringMap's eql doesn't use SIMD, so it would be slower for large strings.
In my own benchmarks, in general, I've seen little difference between the two approaches. It does seem like std.meta.stringToEnum is generally as fast or faster. It also results in more concise code and is ideal if the resulting enum is useful beyond the comparison.
You usually don't have long enum values, so the lack of SIMD-optimization isn't a concern. However, if you're considering building your own StaticStringMap at compile time with long keys, you should benchmark with a custom eql function based on std.mem.eql.
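As a rough sketch of what that could look like, assuming Zig 0.13 or later and that std.StaticStringMapWithEql takes a two-argument comparison function: memEql, Kind and long_map are hypothetical names, and whether this is actually faster for your keys is something only a benchmark can tell you.

```zig
const std = @import("std");

const Kind = enum { first, second };

// Hypothetical wrapper: adapts std.mem.eql to the
// (a, b) signature used for a custom eql.
fn memEql(a: []const u8, b: []const u8) bool {
    return std.mem.eql(u8, a, b);
}

// Two long keys of equal length that differ only in the last byte.
const long_map = std.StaticStringMapWithEql(Kind, memEql).initComptime(.{
    .{ "a" ** 256 ++ "1", .first },
    .{ "a" ** 256 ++ "2", .second },
});

pub fn main() void {
    const input: []const u8 = "a" ** 256 ++ "2";
    if (long_map.get(input)) |kind| {
        std.debug.print("matched: {s}\n", .{@tagName(kind)});
    }
}
```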
We could manually bucket those if / else if branches ourselves, similar to what the StaticStringMap does. Something like:
switch (color.len) {
    3 => {
        if (std.mem.eql(u8, color, "red")) {
            return;
        }
    },
    4 => {
        if (std.mem.eql(u8, color, "blue")) {
            return;
        }
        if (std.mem.eql(u8, color, "pink")) {
            return;
        }
    },
    5 => {
        if (std.mem.eql(u8, color, "green")) {
            return;
        }
    },
    else => {},
}
Ughhh. This highlights the convenience of using std.meta.stringToEnum to generate similar code. Also, do remember that std.mem.eql quickly discards strings of different lengths, which helps to explain why both approaches generally perform similarly.