Zig Interfaces
Oct 08, 2023
If you're picking up Zig, it won't be long before you realize there's no syntactic sugar for creating interfaces. But you'll probably notice interface-like things, such as std.mem.Allocator. This is because Zig doesn't have a dedicated mechanism for creating interfaces (e.g. interface and implements keywords), but the language itself can be used to achieve the same goal.
There are existing articles that I've been able to copy and paste to get this working, but none really clicked with me. It wasn't until I broke the code down into two parts, making it work and then making it pretty, that I finally understood. So, first, let's make something that works.
A Simple Interface Implementation
We're going to create a Writer interface; it's something simple to understand that won't get in our way. We'll stick with a single function; once we know how to do this with one function, it's easy to repeat the pattern for more. First, the interface itself:
const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,

    fn writeAll(self: Writer, data: []const u8) !void {
        return self.writeAllFn(self.ptr, data);
    }
};
This is our complete interface. Depending on your knowledge of Zig, there might be some things you aren't sure about. Our interface has two fields. The first, ptr, is a pointer to the actual implementation. We'll talk about *anyopaque in a bit. The second, writeAllFn, is a function pointer to the function of the actual implementation.
Notice that the interface's writeAll implementation just calls our function pointer and passes it the ptr field as well as any other arguments. Here's a skeleton implementation:
const File = struct {
    fd: os.fd_t,

    fn writeAll(ptr: *anyopaque, data: []const u8) !void {
        // Skeleton only; the body is filled in further down.
        _ = ptr;
        _ = data;
    }

    fn writer(self: *File) Writer {
        return .{
            .ptr = self,
            .writeAllFn = writeAll,
        };
    }
};
First, the writer function is how we get a Writer from a *File. This is like calling gpa.allocator() on a GeneralPurposeAllocator to get an std.mem.Allocator. Aside from the fact that we're able to assign self to ptr (a *File to *anyopaque), there's nothing special here. We're just creating a Writer struct. And even this assignment isn't too special: Zig's automatic type coercion requires guaranteed safety and no ambiguity, two properties that always hold when assigning to an *anyopaque.
The part that glues everything together, the part that we've left out, is: what do we do with the ptr: *anyopaque passed back into writeAll? First, *anyopaque is a pointer to something of unknown type and unknown size. Hopefully it's clear why Writer.ptr has to be of this type. It can't be a *File, else it wouldn't be usable for other implementations. The nature of interfaces means that, at compile time, we don't know what the implementation will be and thus *anyopaque is the only possible choice.
It's important to know that when we create a Writer via file.writer(), ptr is the file because we assign self to it. But because ptr is of type *anyopaque, the assignment erases its true type. The memory pointed to by ptr does hold a File, the compiler just doesn't know that. We need a way to inject this information into the compiler. We can do this with a combination of @ptrCast and @alignCast:
fn writeAll(ptr: *anyopaque, data: []const u8) !void {
    const self: *File = @ptrCast(@alignCast(ptr));
    _ = try std.os.write(self.fd, data);
}
@ptrCast converts a pointer from one type to another. The type to convert to is inferred by the value the result is assigned to. In the above case, we're telling the compiler: give me a variable pointing to the same thing as ptr but treat that like a *File, trust me, I know what I'm doing. @ptrCast is powerful as it allows us to force the type associated with specific memory. If we're wrong and use @ptrCast to convert a pointer into a type incompatible with the underlying memory, we'll have serious runtime issues, with a crash being the best possible outcome.
@alignCast is more complicated. There are CPU-specific rules for how data must be arranged in memory. This is called data alignment: every type has an alignment, and a value of that type must sit at an address that is a multiple of it. anyopaque always has an alignment of 1. But our File has a different alignment (4 on Linux, where fd_t is a 32-bit integer). If you want, you can see this by printing @alignOf(File) and @alignOf(anyopaque). Just like we need @ptrCast to tell the compiler what the type is, we need @alignCast to tell the compiler what the alignment is. And, just like @ptrCast, @alignCast infers this based on what it's being assigned to.
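To make the erase-and-restore step concrete, here is a minimal sketch of the round trip on its own. The File below is a hypothetical stand-in (a plain i32 instead of os.fd_t, so nothing depends on the OS), and roundTrip is a helper name I made up for illustration.

```zig
const std = @import("std");

// Hypothetical stand-in for the article's File: an i32 replaces os.fd_t.
const File = struct {
    fd: i32,
};

// Erase the type, then restore it the same way File.writeAll does.
fn roundTrip(file: *File) *File {
    const erased: *anyopaque = file; // implicit coercion, no cast needed
    // The target type (*File) is inferred from the function's return type.
    return @ptrCast(@alignCast(erased));
}

pub fn main() void {
    var file = File{ .fd = 1 };
    const restored = roundTrip(&file);
    std.debug.print("alignment of File: {d}\n", .{@alignOf(File)});
    std.debug.print("same address: {}\n", .{restored == &file});
}
```

The address never changes. Only what the compiler believes about the pointed-to memory does.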
Our complete solution is:
// The snippets in this post assume these two declarations.
const std = @import("std");
const os = std.os;

const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,

    fn writeAll(self: Writer, data: []const u8) !void {
        return self.writeAllFn(self.ptr, data);
    }
};

const File = struct {
    fd: os.fd_t,

    fn writeAll(ptr: *anyopaque, data: []const u8) !void {
        // Restore the type that was erased in writer().
        const self: *File = @ptrCast(@alignCast(ptr));
        _ = try std.os.write(self.fd, data);
    }

    fn writer(self: *File) Writer {
        return .{
            .ptr = self,
            .writeAllFn = writeAll,
        };
    }
};
Hopefully, this is pretty clear to you. It comes down to two things: using *anyopaque to be able to store a pointer to any implementation, and then using @ptrCast(@alignCast(ptr)) to restore the correct type information.
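An interface only earns its keep once there is more than one implementation behind it. The sketch below reuses the same Writer and plugs in two implementations that I made up for illustration, Buffer and Counter, along with a hypothetical greet function that only knows about Writer. Neither touches the OS, so it should run anywhere.

```zig
const std = @import("std");

const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,

    fn writeAll(self: Writer, data: []const u8) !void {
        return self.writeAllFn(self.ptr, data);
    }
};

// Hypothetical implementation: appends into a fixed in-memory buffer.
const Buffer = struct {
    data: [64]u8 = undefined,
    len: usize = 0,

    fn writeAll(ptr: *anyopaque, bytes: []const u8) anyerror!void {
        const self: *Buffer = @ptrCast(@alignCast(ptr));
        if (self.len + bytes.len > self.data.len) return error.NoSpaceLeft;
        @memcpy(self.data[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }

    fn writer(self: *Buffer) Writer {
        return .{ .ptr = self, .writeAllFn = writeAll };
    }
};

// Hypothetical implementation: only counts the bytes it is given.
const Counter = struct {
    total: usize = 0,

    fn writeAll(ptr: *anyopaque, bytes: []const u8) anyerror!void {
        const self: *Counter = @ptrCast(@alignCast(ptr));
        self.total += bytes.len;
    }

    fn writer(self: *Counter) Writer {
        return .{ .ptr = self, .writeAllFn = writeAll };
    }
};

// Knows nothing about Buffer or Counter, only about Writer.
fn greet(w: Writer) !void {
    try w.writeAll("hello ");
    try w.writeAll("world");
}

pub fn main() !void {
    var buffer = Buffer{};
    var counter = Counter{};
    try greet(buffer.writer());
    try greet(counter.writer());
    std.debug.print("{s} ({d} bytes)\n", .{ buffer.data[0..buffer.len], counter.total });
}
```

Each implementation repeats the same @ptrCast(@alignCast(ptr)) line with its own type, which is exactly the bit of boilerplate this pattern asks of implementers.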
As an aside, the interface's ptr type has to be a pointer to anyopaque, i.e. *anyopaque; it cannot be just anyopaque. Do you know why? As I said, anyopaque is of unknown size and in Zig, like most languages, every struct field has to have a known size. Writer has a size of 16 bytes: 2 pointers with each pointer being 8 bytes on a 64-bit platform. If we were to try and use anyopaque, then the size of Writer becomes unknown, which the compiler will not allow. (Pointers always have a known size, which depends on the underlying architecture, e.g. 4 bytes on a 32-bit CPU.)
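If you'd rather have the compiler confirm the size than take my word for it, a comptime assertion does the job. This is only a sketch: it restates the two fields of Writer and compares against @sizeOf(usize), on the assumption that pointers are usize-wide on your target.

```zig
const std = @import("std");

const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,
};

comptime {
    // Two pointers, so twice the pointer width: 16 bytes on a 64-bit target,
    // 8 bytes on a 32-bit one.
    std.debug.assert(@sizeOf(*anyopaque) == @sizeOf(usize));
    std.debug.assert(@sizeOf(Writer) == 2 * @sizeOf(usize));
}

pub fn main() void {
    std.debug.print("Writer is {d} bytes\n", .{@sizeOf(Writer)});
}
```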
Making it Prettier
I'm a fan of the above implementation. There's only a little magic to know and implement. Some of the interfaces in the standard library, like std.mem.Allocator, look just like it. (Because Allocator has a few more functions, a nested structure called VTable (virtual table) is used to hold the function pointers, but that's a small change.)
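For the curious, here is roughly what that small change looks like when applied to our Writer. This is a sketch of the general shape rather than a copy of Allocator: flush is a hypothetical second function added only so the table has more than one entry, and Counter is a made-up implementation.

```zig
const std = @import("std");

const Writer = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        writeAll: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,
        flush: *const fn (ptr: *anyopaque) anyerror!void,
    };

    fn writeAll(self: Writer, data: []const u8) !void {
        return self.vtable.writeAll(self.ptr, data);
    }

    fn flush(self: Writer) !void {
        return self.vtable.flush(self.ptr);
    }
};

// Hypothetical implementation that just records what it was asked to do.
const Counter = struct {
    total: usize = 0,
    flushes: usize = 0,

    // One table per implementation, shared by every Counter instance.
    const vtable = Writer.VTable{ .writeAll = writeAll, .flush = flush };

    fn writeAll(ptr: *anyopaque, data: []const u8) anyerror!void {
        const self: *Counter = @ptrCast(@alignCast(ptr));
        self.total += data.len;
    }

    fn flush(ptr: *anyopaque) anyerror!void {
        const self: *Counter = @ptrCast(@alignCast(ptr));
        self.flushes += 1;
    }

    fn writer(self: *Counter) Writer {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

pub fn main() !void {
    var counter = Counter{};
    const w = counter.writer();
    try w.writeAll("abc");
    try w.flush();
    std.debug.print("{d} bytes, {d} flush\n", .{ counter.total, counter.flushes });
}
```

The upside is that Writer stays two pointers wide no matter how many functions the interface grows; the cost is one extra pointer hop per call.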
The major drawback is that it's only usable through the interface. We can't use file.writeAll directly since writeAll doesn't have a *File receiver. So it's fine if implementations are always accessed through the interface, like Zig's allocators, but it won't work if we need implementations to function on their own as well as through an interface.
In other words, we'd like File.writeAll to be a normal method, essentially not having to deal with *anyopaque:
fn writeAll(self: *File, data: []const u8) !void {
    _ = try std.os.write(self.fd, data);
}
This is something we can achieve, but it requires changing our Writer interface:
const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,

    fn init(ptr: anytype) Writer {
        const T = @TypeOf(ptr);
        const ptr_info = @typeInfo(T);

        const gen = struct {
            pub fn writeAll(pointer: *anyopaque, data: []const u8) anyerror!void {
                const self: T = @ptrCast(@alignCast(pointer));
                return ptr_info.Pointer.child.writeAll(self, data);
            }
        };

        return .{
            .ptr = ptr,
            .writeAllFn = gen.writeAll,
        };
    }

    pub fn writeAll(self: Writer, data: []const u8) !void {
        return self.writeAllFn(self.ptr, data);
    }
};
What's new here is the init function. To me, it's pretty complicated, but it helps to think of it from the point of view of our original implementation. The point of all the code in init is to turn an *anyopaque into a concrete type, such as *File. This was easy to do from within File, because within File.writeAll, ptr had to be a *File. But here, to know the type, we need to capture more information.
To better understand init, it might help to see how it's used. Our File.writer, which previously created a Writer directly, is now changed to:
fn writer(self: *File) Writer {
    return Writer.init(self);
}
So we know that the ptr argument to init is our implementation. The @TypeOf and @typeInfo builtin functions are central to most compile-time work in Zig. The first returns the type of ptr, in this case *File, and the latter returns a tagged union which fully describes the type. You can see that we create a nested structure which also has a writeAll implementation. This is where the *anyopaque is converted to the correct type and the implementation's function is invoked. The structure is needed because Zig lacks anonymous functions. We need Writer.writeAllFn to be this little two-line wrapper, and using a nested structure is the only way to do that.
Obviously file.writer() is something that'll be executed at runtime. It can be tempting to think that everything inside Writer.init, which file.writer() calls, is created at runtime. You might wonder about the lifetime of our internal gen structure, particularly in the face of multiple implementations. But aside from the return statement, init is all compile-time code generation. Specifically, the Zig compiler will create a version of init for each type of ptr that the program uses. The init function is more like a template for the compiler (all because the ptr argument is anytype). When file.writer() is called at runtime, the Writer.init function that ends up being executed is distinct from the Writer.init function that would be executed for a different type.
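To see the per-type generation in action, here is a sketch with two made-up implementations, Counter and Lines, both wired up through Writer.init. One deliberate deviation from the version above: inside gen I call self.writeAll(data) using method syntax rather than going through ptr_info, which keeps the example independent of how @typeInfo names its fields.

```zig
const std = @import("std");

const Writer = struct {
    ptr: *anyopaque,
    writeAllFn: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,

    fn init(ptr: anytype) Writer {
        const T = @TypeOf(ptr); // e.g. *Counter or *Lines, known at comptime
        const gen = struct {
            fn writeAll(pointer: *anyopaque, data: []const u8) anyerror!void {
                const self: T = @ptrCast(@alignCast(pointer));
                // Method-call syntax; resolves to the writeAll of whatever T points to.
                return self.writeAll(data);
            }
        };
        return .{ .ptr = ptr, .writeAllFn = gen.writeAll };
    }

    fn writeAll(self: Writer, data: []const u8) !void {
        return self.writeAllFn(self.ptr, data);
    }
};

// Hypothetical implementation: counts bytes.
const Counter = struct {
    total: usize = 0,

    fn writeAll(self: *Counter, data: []const u8) !void {
        self.total += data.len;
    }

    fn writer(self: *Counter) Writer {
        return Writer.init(self);
    }
};

// Hypothetical implementation: counts newlines.
const Lines = struct {
    count: usize = 0,

    fn writeAll(self: *Lines, data: []const u8) !void {
        for (data) |c| {
            if (c == '\n') self.count += 1;
        }
    }

    fn writer(self: *Lines) Writer {
        return Writer.init(self);
    }
};

pub fn main() !void {
    var counter = Counter{};
    var lines = Lines{};
    try counter.writer().writeAll("one\ntwo\n");
    try lines.writer().writeAll("one\ntwo\n");
    // writeAll is now a normal method too, so it also works without the interface.
    try counter.writeAll("!");
    std.debug.print("bytes={d} lines={d}\n", .{ counter.total, lines.count });
}
```

Both Writer values have the same type and the same size, yet each carries a function pointer to a wrapper the compiler generated for its own T.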
In the original version, each implementation is responsible for converting *anyopaque to the correct type, essentially by including that one line of code, @ptrCast(@alignCast(ptr)). In this fancy version, each implementation also has its own code to do this conversion; we've just managed to embed it in the interface and leveraged Zig's comptime capabilities to generate the code for us.
The last part of this code is the function invocation, via ptr_info.Pointer.child.writeAll(self, data). @typeInfo(T) returns a std.builtin.Type, which is a tagged union that describes a type. It can describe 24 different kinds of type (as of this writing), such as integers, optionals, structs and pointers. Each kind has its own properties. For example, an integer has a signedness which other types don't. Here's what @typeInfo(*File) looks like:
builtin.Type{
    .Pointer = builtin.Type.Pointer{
        .address_space = builtin.AddressSpace.generic,
        .alignment = 4,
        .child = demo.File,
        .is_allowzero = false,
        .is_const = false,
        .is_volatile = false,
        .sentinel = null,
        .size = builtin.Type.Pointer.Size.One
    }
}
The child field is the actual type behind the pointer. When init is called with a *File, ptr_info.Pointer.child.writeAll(...) translates to File.writeAll(...), exactly what we want.
If you look at other implementations of this pattern, you might find their init function does a few more things. For example, you might find these two additional lines of code after ptr_info is created:
if (ptr_info != .Pointer) @compileError("ptr must be a pointer");
if (ptr_info.Pointer.size != .One) @compileError("ptr must be a single item pointer");
The purpose of these is to add compile-time checks on the type of value passed to init, essentially making sure that we passed it a pointer to a single item.
Also, instead of calling the function via:
ptr_info.Pointer.child.writeAll(self, data);
You might see:
@call(.always_inline, ptr_info.Pointer.child.writeAll, .{self, data});
The @call builtin function is the same as calling a function directly (as we did), but gives more flexibility by allowing us to supply a CallModifier. As you can see, using @call allows us to tell the compiler to inline the function.
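If @call is new to you, it's easier to see on a tiny function than inside the interface code. In this sketch, add is a throwaway function of my own; the same call is made three ways, and only the modifier differs.

```zig
const std = @import("std");

fn add(a: u32, b: u32) u32 {
    return a + b;
}

pub fn main() void {
    // A plain call.
    const direct = add(2, 3);
    // The same call spelled with @call; .auto behaves like a plain call.
    const plain = @call(.auto, add, .{ 2, 3 });
    // Same again, but the compiler must inline it at this call site.
    const inlined = @call(.always_inline, add, .{ 2, 3 });
    std.debug.print("{d} {d} {d}\n", .{ direct, plain, inlined });
}
```

The arguments are passed as a tuple, which is why the interface version wraps them in .{self, data}.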
Hopefully this has made the implementation of interfaces in Zig clearer and maybe exposed new capabilities of the language. However, for simple cases where all implementations are known, you might want to consider a different approach.
Tagged Unions
As an alternative to the above solutions, tagged unions can be used to emulate interfaces. Here's a complete working example:
const std = @import("std");
const os = std.os;

const Writer = union(enum) {
    file: File,

    fn writeAll(self: Writer, data: []const u8) !void {
        switch (self) {
            .file => |file| return file.writeAll(data),
        }
    }
};

const File = struct {
    fd: os.fd_t,

    fn writeAll(self: File, data: []const u8) !void {
        _ = try std.os.write(self.fd, data);
    }
};

pub fn main() !void {
    const file = File{ .fd = std.io.getStdOut().handle };
    const writer = Writer{ .file = file };
    try writer.writeAll("hi");
}
Remember that when we switch on a tagged union, the captured value, e.g. |file|, has the correct type: File in this case.
The downside with this approach is that it requires all implementations to be known ahead of time. You can't use it, for example, to create an interface that third parties can create implementations for. Each possible implementation has to be baked into the union.
Within an app, there are plenty of cases where such a restriction is fine. You can build a Cache union with all supported caching implementations, e.g. InMemory, Redis and PostgreSQL. If your application adds a new implementation, you just update the union.
In many cases, the interface will call the underlying implementation directly. For those cases, you can use the special inline else syntax:
switch (self) {
    .null => {},
    inline else => |impl| return impl.writeAll(data),
}
This essentially gets expanded automatically for us, meaning that impl will have the correct underlying type for each case. The other thing this highlights is that the interface can inject its own logic. Above, we see it short-circuit the call when the "null" implementation is active. Exactly how much logic you want to add to the interface is up to you, but we're all adults here and I'm not going to tell you that interfaces should only do dispatching.
As far as I'm concerned, tagged unions should be your first option.
As an aside, I want to mention that there's yet another option for creating interfaces which relies on the @fieldParentPtr builtin. This used to be the standard way to create interfaces, but is now infrequently used. Still, you might see references to it.
If you're interested in learning Zig, consider my Learning Zig series.