
Switching on Strings in Zig

Feb 13, 2025

Newcomers to Zig will quickly learn that you can't switch on a string (i.e. []const u8). The following code gives us the unambiguous error message cannot switch on strings:

switch (color) {
    "red" => {},
    "blue" => {},
    "green" => {},
    "pink" => {},
    else => {},
}

I've seen two explanations for why this isn't supported. The first is that there's ambiguity around string identity. Are two strings only considered equal if they point to the same address? Is a null-terminated string the same as its non-null-terminated counterpart? The other reason is that users of switch [apparently] expect certain optimizations which are not possible with strings (although, presumably, these same users would know that such optimizations aren't possible with strings).
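To make the identity question concrete, here is a small sketch. The sameAddress helper is hypothetical and exists only for this illustration. It compares a string literal with a runtime copy of the same bytes, and a null-terminated slice with a plain one.

```zig
const std = @import("std");

// Hypothetical helper: true when both slices start at the same address.
fn sameAddress(a: []const u8, b: []const u8) bool {
    return a.ptr == b.ptr;
}

pub fn main() void {
    const literal = "red"; // a pointer to a null-terminated array

    // Copy the bytes into a separate stack buffer at runtime.
    var buf: [3]u8 = undefined;
    @memcpy(&buf, literal);
    const copy: []const u8 = &buf;

    // The same literal viewed with and without its sentinel in the type.
    const terminated: [:0]const u8 = literal;
    const plain: []const u8 = literal;

    std.debug.print("same address: {}\n", .{sameAddress(literal, copy)});
    std.debug.print("same bytes: {}\n", .{std.mem.eql(u8, literal, copy)});
    std.debug.print("terminated vs plain: {}\n", .{std.mem.eql(u8, terminated, plain)});
}
```

The copy lives at a different address but holds the same bytes, so an address-based switch and a content-based switch would disagree about it.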

Instead, in Zig, there are two common methods for comparing strings.

The most common way to compare strings is using std.mem.eql with if / else if / else:

if (std.mem.eql(u8, color, "red") == true) {

} else if (std.mem.eql(u8, color, "blue") == true) {

} else if (std.mem.eql(u8, color, "green") == true) {

} else if (std.mem.eql(u8, color, "pink") == true) {

} else {

}

The implementation for std.mem.eql depends on what's being compared. Specifically, it has an optimized code path when comparing strings. Although that's what we're interested in, let's look at the non-optimized version:

pub fn eql(comptime T: type, a: []const T, b: []const T) bool {
    if (a.len != b.len) return false;
    if (a.len == 0 or a.ptr == b.ptr) return true;

    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}

Whether we're dealing with slices of bytes or some other type, if they're of different length, they can't be equal. Once we know that they're the same length, if they point to the same memory, then they must be equal. I'm not a fan of this second check; it might be cheap, but I think it's quite uncommon for the two slices being compared to share an address. Once those initial checks are done, we compare each element (each byte of our string) one at a time.

The optimized version, which is used for strings, is much more involved. But it's fundamentally the same as the above with SIMD to compare multiple bytes at once.
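As a rough idea of what "multiple bytes at once" can look like, here is a minimal sketch using Zig's @Vector. This is not the standard library's code. The eqlChunked name and the 16-byte chunk width are assumptions made for illustration.

```zig
const std = @import("std");

// Hypothetical sketch, not the standard library's implementation:
// compare 16 bytes per step, then finish the tail one byte at a time.
fn eqlChunked(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) return false;

    const lanes = 16; // assumed chunk width, chosen for illustration
    const Chunk = @Vector(lanes, u8);

    var i: usize = 0;
    while (i + lanes <= a.len) : (i += lanes) {
        const va: Chunk = a[i..][0..lanes].*;
        const vb: Chunk = b[i..][0..lanes].*;
        // va != vb yields a vector of bools; any true lane is a mismatch.
        if (@reduce(.Or, va != vb)) return false;
    }

    // Fewer than `lanes` bytes remain.
    while (i < a.len) : (i += 1) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

pub fn main() void {
    std.debug.print("{}\n", .{eqlChunked("a" ** 40 ++ "1", "a" ** 40 ++ "1")});
    std.debug.print("{}\n", .{eqlChunked("a" ** 40 ++ "1", "a" ** 40 ++ "2")});
}
```

The structure is the same as the simple loop: a length check, then a walk over the data that stops at the first mismatch. Only the step size changes.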

The nature of string comparison means that real-world performance is dependent on the values being compared. We know that if we have 100 if / else if branches then, in the worst case, we'll need to call std.mem.eql 100 times. But comparing strings of different lengths or strings which differ early will be significantly faster. For example, consider these three cases:

{
    const str1 = "a" ** 10_000 ++ "1";
    const str2 = "a" ** 10_000 ++ "2";
    _ = std.mem.eql(u8, str1, str2);
}

{
    const str1 = "1" ++ "a" ** 10_000;
    const str2 = "2" ++ "a" ** 10_000;
    _ = std.mem.eql(u8, str1, str2);
}

{
    const str1 = "a" ** 999_999;
    const str2 = "a" ** 1_000_000;
    _ = std.mem.eql(u8, str1, str2);
}

For me, the first comparison takes ~270ns, whereas the other two take ~20ns - despite the last one involving much larger strings. The second case is faster because the difference is early in the string allowing the for loop to return after only one comparison. The third case is faster because the strings are of a different length: false is returned by the initial len check.
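Timings vary by machine, so another way to see the same effect is to count the work. The sketch below uses a hypothetical comparisonsNeeded helper that mirrors the simple byte-at-a-time loop and returns how many element comparisons ran. The optimized path compares several bytes per step, so its counts would be smaller, but the ordering of the three cases should be the same.

```zig
const std = @import("std");

// Hypothetical helper mirroring the simple eql loop, but returning how
// many element comparisons ran before the answer was known.
fn comparisonsNeeded(a: []const u8, b: []const u8) usize {
    if (a.len != b.len) return 0;
    if (a.len == 0 or a.ptr == b.ptr) return 0;

    var count: usize = 0;
    for (a, b) |a_elem, b_elem| {
        count += 1;
        if (a_elem != b_elem) break;
    }
    return count;
}

pub fn main() void {
    const late = comparisonsNeeded("a" ** 10_000 ++ "1", "a" ** 10_000 ++ "2");
    const early = comparisonsNeeded("1" ++ "a" ** 10_000, "2" ++ "a" ** 10_000);
    const lengths = comparisonsNeeded("a" ** 999_999, "a" ** 1_000_000);

    std.debug.print("late difference: {d}\n", .{late});
    std.debug.print("early difference: {d}\n", .{early});
    std.debug.print("different lengths: {d}\n", .{lengths});
}
```

The first case needs 10,001 comparisons, the second needs 1, and the third needs none because the length check settles it.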

The second method is std.meta.stringToEnum, which takes an enum type and a string value and returns the corresponding enum value or null. This code prints "you picked: blue":

const std = @import("std");

const Color = enum {
    red,
    blue,
    green,
    pink,
};

pub fn main() !void {
    const color = std.meta.stringToEnum(Color, "blue") orelse {
        return error.InvalidChoice;
    };

    switch (color) {
        .red => std.debug.print("you picked: red\n", .{}),
        .blue => std.debug.print("you picked: blue\n", .{}),
        .green => std.debug.print("you picked: green\n", .{}),
        .pink => std.debug.print("you picked: pink\n", .{}),
    }
}

If you don't need the enum type (i.e. Color) beyond this check, you can leverage Zig's anonymous types. This is equivalent:

const std = @import("std");

pub fn main() !void {
    const color = std.meta.stringToEnum(enum {
        red,
        blue,
        green,
        pink,
    }, "blue") orelse return error.InvalidChoice;

    switch (color) {
        .red => std.debug.print("you picked: red\n", .{}),
        .blue => std.debug.print("you picked: blue\n", .{}),
        .green => std.debug.print("you picked: green\n", .{}),
        .pink => std.debug.print("you picked: pink\n", .{}),
    }
}
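Both versions hard-code "blue", so the null case never shows up. Here is a small sketch with arbitrary input. The hexFor helper and the hex values it returns are made up for illustration.

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// Hypothetical helper: map a color name to an illustrative hex code,
// or null when the name isn't one of the enum's values.
fn hexFor(input: []const u8) ?[]const u8 {
    const color = std.meta.stringToEnum(Color, input) orelse return null;
    return switch (color) {
        .red => "#ff0000",
        .blue => "#0000ff",
        .green => "#00ff00",
        .pink => "#ffc0cb",
    };
}

pub fn main() void {
    const inputs = [_][]const u8{ "blue", "Blue", "teal" };
    for (inputs) |input| {
        std.debug.print("{s} => {s}\n", .{ input, hexFor(input) orelse "unknown" });
    }
}
```

Only "blue" matches. The lookup is an exact byte comparison against the enum's field names, so "Blue" and "teal" both come back as null.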

It's not obvious how this should perform versus the straightforward if / else if approach. Yes, we now have a switch statement that the compiler can [hopefully] optimize, but std.meta.stringToEnum still has to convert our input, "blue", into an enum.

The implementation of std.meta.stringToEnum depends on the number of possible values, i.e. the number of enum values. Currently, if there are more than 100 values, it'll fall back to using the same if / else if that we explored above. Thus, with more than 100 values it does the if / else if check PLUS the switch. This should improve in the future.
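A minimal sketch of what that linear fallback amounts to, assuming it is essentially a compile-time unrolled loop over the enum's fields. The stringToEnumLinear name is hypothetical, and this is not a copy of the standard library's source.

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// Hypothetical sketch of a linear lookup: `inline for` is unrolled at
// compile time, leaving one std.mem.eql call per enum field, which is
// the same shape as a hand-written if / else if chain.
fn stringToEnumLinear(comptime T: type, str: []const u8) ?T {
    inline for (std.meta.fields(T)) |field| {
        if (std.mem.eql(u8, str, field.name)) {
            return @field(T, field.name);
        }
    }
    return null;
}

pub fn main() void {
    const found = stringToEnumLinear(Color, "pink");
    const name = if (found) |c| @tagName(c) else "none";
    std.debug.print("pink => {s}\n", .{name});
}
```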

However, with 100 or fewer values, std.meta.stringToEnum creates a comptime std.StaticStringMap which can then be used to look up the value. std.StaticStringMap isn't something we've looked at before. It's a specialized map that buckets keys by their length. Its advantage over Zig's other hash maps is that it can be constructed at compile-time. For our Color enum, the internal state of a StaticStringMap would look something like:

// keys are ordered by length
keys:     ["red", "blue", "pink", "green"];

// values[N] corresponds to keys[N]
values:   [.red, .blue, .pink, .green];

// What's this though?
indexes:  [0, 0, 0, 0, 1, 3];

It might not be obvious how indexes is used. Let's write our own get implementation, simulating the above StaticStringMap state:

fn get(str: []const u8) ?Color {
    // Simulate the state of the StaticStringMap which
    // stringToEnum built at compile-time.
    const keys = [_][]const u8{"red", "blue", "pink", "green"};
    const values = [_]Color{.red, .blue, .pink, .green};
    const indexes = [_]usize{0, 0, 0, 0, 1, 3};

    if (str.len >= indexes.len) {
        // our map has no strings of this length
        return null;
    }

    var index = indexes[str.len];
    while (index < keys.len) {
        const key = keys[index];

        if (key.len != str.len) {
            // we've gone into the next bucket, everything after
            // this is longer and thus can't be a match
            return null;
        }

        if (std.mem.eql(u8, key, str)) {
            return values[index];
        }
        index += 1;
    }
    return null;
}

Take note that keys are ordered by length. As a naive implementation, we could iterate through the keys until we either find a match or find a key with a longer length. Once we find a key with a longer length, we can stop searching, as all remaining candidates won't match - they'll all be too long. StaticStringMap goes a step further and records the index within keys where entries of a specific length begin. indexes[3] tells us where to start looking for keys with a length of 3 (at index 0). indexes[5] tells us where to start looking for keys with a length of 5 (at index 3).
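To show where those numbers come from, here is a sketch that derives indexes from keys that are already sorted by length. The buildIndexes helper is hypothetical. It follows the description above rather than the library's exact code.

```zig
const std = @import("std");

// Hypothetical helper: given keys already sorted by length, set
// out[len] to the index of the first key that is at least `len` bytes
// long. out.len must be (longest key length + 1).
fn buildIndexes(keys: []const []const u8, out: []usize) void {
    var i: usize = 0;
    for (out, 0..) |*slot, len| {
        // Skip keys that are shorter than the length being indexed.
        while (keys[i].len < len) i += 1;
        slot.* = i;
    }
}

pub fn main() void {
    const keys = [_][]const u8{ "red", "blue", "pink", "green" };
    var indexes: [6]usize = undefined; // the longest key is 5 bytes
    buildIndexes(&keys, &indexes);
    std.debug.print("{any}\n", .{indexes});
}
```

For these four keys the result is 0, 0, 0, 0, 1, 3, matching the simulated state. Lengths 0 through 2 have no keys of their own, so they point at index 0, where the key.len != str.len check in get rejects them immediately.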

Above, we fall back to using std.mem.eql for any key which is the same length as our target string. StaticStringMap uses its own "optimized" version:

pub fn defaultEql(a: []const u8, b: []const u8) bool {
    if (a.ptr == b.ptr) return true;
    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}

This is the same as the simple std.mem.eql implementation, minus the length check. This is done because the eql within our while loop is only ever called for values with matching length. On the flip side, StaticStringMap's eql doesn't use SIMD, so it would be slower for large strings.

In my own benchmarks, in general, I've seen little difference between the two approaches. It does seem like std.meta.stringToEnum is generally as fast or faster. It also results in more concise code and is ideal if the resulting enum is useful beyond the comparison.

You usually don't have long enum values, so the lack of SIMD-optimization isn't a concern. However, if you're considering building your own StaticStringMap at compile time with long keys, you should benchmark with a custom eql function based on std.mem.eql.
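As a starting point for such a benchmark, here is a sketch of a map with a custom eql. It assumes a Zig version whose standard library provides std.StaticStringMapWithEql and initComptime. The memEql and lookup names are hypothetical, and the keys are kept short only to show the shape.

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// Hypothetical eql callback that defers to std.mem.eql.
fn memEql(a: []const u8, b: []const u8) bool {
    return std.mem.eql(u8, a, b);
}

// Same bucketing by key length, but key comparisons go through memEql.
const ColorMap = std.StaticStringMapWithEql(Color, memEql);

const colors = ColorMap.initComptime(.{
    .{ "red", Color.red },
    .{ "blue", Color.blue },
    .{ "green", Color.green },
    .{ "pink", Color.pink },
});

// Hypothetical wrapper around the map lookup.
fn lookup(name: []const u8) ?Color {
    return colors.get(name);
}

pub fn main() void {
    const name = if (lookup("green")) |c| @tagName(c) else "none";
    std.debug.print("green => {s}\n", .{name});
}
```

Whether this beats the default depends on the key lengths and the target, so it's worth measuring both with your real keys.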

We could manually bucket those if / else if branches ourselves, similar to what the StaticStringMap does. Something like:

switch (color.len) {
    3 => {
        if (std.mem.eql(u8, color, "red") == true) {
            // ...
            return;
        }
    },
    4 => {
        if (std.mem.eql(u8, color, "blue") == true) {
            // ...
            return;
        }
        if (std.mem.eql(u8, color, "pink") == true) {
            // ...
            return;
        }
    },
    5 => {
        if (std.mem.eql(u8, color, "green") == true) {
            // ...
            return;
        }
    },
    else => {},
}
// not found

Ughhh. This highlights the convenience of using std.meta.stringToEnum to generate similar code. Also, do remember that std.mem.eql quickly discards strings of different lengths, which helps to explain why both approaches generally perform similarly.