Zig's BoundedArray

Aug 27, 2024

In Zig an array always has a compile-time length. The length of the array is part of the type, so a [4]u32 is a different type than a [5]u32. But in real-life code, the length that we need is often known only at runtime so we rely on dynamic allocations via allocator.alloc. In some cases the length isn't even known until after we're done adding all the values, so we use an std.ArrayList which can dynamically grow as values are added.

Dynamic allocation, either directly or via std.ArrayList, provides great flexibility but has drawbacks. First of all, more things can go wrong. We can forget to free the dynamically allocated memory and we have to be more mindful of bugs or attacks which might end up allocating more memory than we originally intended. Dynamic allocation also adds overhead, both in terms of performance and in terms of having to manage an allocator.

Sometimes you'll have cases where the length isn't known at compile-time, but it's maximum length is. For example, imagine you need to generate a user's full name, combining their first and last name. If your system enforces a maximum first name and last name length of 100 bytes each, then you know that the combined name, including the space, will be a maximum of 201 bytes. So you could use an allocator:

const full_name = try std.fmt.allocPrint(allocator, "{s} {s}", .{first_name, last_name});

Or, you could use an array sized to the maximum possible length:

std.debug.assert(first_name.len <= 100);
std.debug.assert(last_name.len <= 100);

var buf: [201]u8 = undefined;
const full_name = try std.fmt.bufPrint(&buf, "{s} {s}", .{first_name, last_name});

Above we see that bufPrint returns a slice to the combined name. This is a slice that points to our buf. An alternative API that you'll run into is a function which returns the length. For example, std.net.Stream.read returns the number of bytes read into the supplied buffer:

var buf: [1024]u8 = undefined;
const n = try stream.read(&buf);
if (n == 0) {
  return error.Closed;
}
processInput(buf[0..n]);

Both stream.read and fmt.bufPrint take a []u8. The source of that buf doesn't have to be an array - it could be a dynamically allocated slice. It's up to use to decide what's the best approach for your specific use-case.

What about cases where you'd normally reach for an std.ArrayList? We can do something similar, but we'll need to add another variable to track the current length. For example, when reading a postgresql []integer column using pg.zig, we might do something like:

var numbers: [10]i32 = undefined;
var number_length: usize = 0;

var it = try row.getCol(pg.Iterator(i32), "history");
while (it.next()) |value| {
  numbers[number_length] = value;
  number_length += 1;
}

// now we can use numbers[0..number_length];

It's a bit more clumsy: numbers is tied to numbers_length. It might make more sense to create a structure for both fields and to expose methods to wrap any access to the underlying array. This is where std.BoundedArray comes in. It's a generic with two types: the type of the array and the maximum length of our array. The above could be rewritten as:

var numbers = std.BoundedArray(i32, 10){};

var it = try row.getCol(pg.Iterator(i32), "history");
while (it.next()) |value| {
  try numbers.append(value);
}

// now we can use numbers.slice();

It's a small thing, but by combining the two variables into a structure and adding an explicit API (i.e. the append and slice methods), things have gotten both cleaner and safer (numbers and number_length can no longer get out of sync).

std.BoundedArray has a number of other useful methods, such as get, set and clear. None take an allocator, because we never allocate. std.BoundedArray is a wrapper around an array. The comptime-known length, which an array has to have, was specified as part of the type: std.BoundedArray(i32, 10).

If the syntax around var numbers = std.BoundedArray(i32, 10){}; is unclear, consider this code which initializes a User:

var value = User{};

Zig requires all fields to be set, so either User has no field, or every field has a default value. Now, replace User with std.BoundedArray(i32, 10). It's the exact same thing. I doesn't look like it, but std.BoundedArray(i32, 10) is a type, just like User.

One last thing to keep in mind: std.BoundedArray doesn't use any internal pointers. When you see: std.BoundedArray(i32, 10), you should really be imagining:

const BoundedArray = struct {
  buffer: [10]i32,
  len: usize = 0,
};

The reason this is important is that an assignment, including passing it by value to a function, will make a copy of the underlying array:

var numbers = std.BoundedArray(i32, 10){};

var it = try row.getCol(pg.Iterator(i32), "history");
while (it.next()) |value| {
  try numbers.append(value);
}

// since numbers.buffer is an array, this copies the array
var c = numbers;

// this modified the copy, not the original `numbers`
c.set(0, 100);

// this is true
std.debug.assert(c.get(0) == 100);

// this is also true
std.debug.assert(numbers.get(0) == 1);

std.BoundedArray probably isn't something you want to use all the time, but it can be useful in specific cases where the maximum length is known and relatively small and you'd otheriwse want something like an std.ArrayList.