Writing JSON to a custom output in Zig
If you want to serialize (or "stringify") data to JSON using Zig, you'll probably take a peek at Zig's (beta) API documentation and find:
```zig
pub fn stringify(
    value: anytype,
    options: StringifyOptions,
    out_stream: anytype,
) @TypeOf(out_stream).Error!void
```
And not much else. It would be reasonable to guess that the first parameter is the value we want to serialize, and that the second parameter holds options controlling how it is serialized (e.g., indentation). The third parameter is a lot more opaque. Since this function doesn't take an allocator, and since it returns a strange error union with void, we can safely say that it won't be returning a string / []const u8. The name out_stream implies that the output will be written to this parameter, but what is it?!
It helps (but only a little) to understand anytype, which is the type of our third parameter. It is commonly explained as duck typing that happens at compile time. Despite the name, the argument usually isn't a type: it can be any value, so long as that value can do what stringify needs from it. The obvious next question is: what does stringify need from our out_stream?
anytype
Before I disappoint you with the answer, let's look at a small anytype example:
```zig
const std = @import("std");

pub fn info(msg: []const u8, into: anytype) void {
    into.writei64(std.time.timestamp());
    into.writeByte(' ');
    into.writeString(msg);
}
```
If we try to use the above info function with a [100]u8 as the 2nd parameter, like so:
```zig
var out: [100]u8 = undefined;
info("application started", out);
```
We'll get an error:
```
error: no field or member function named 'writei64' in '[100]u8'
    into.writei64(std.time.timestamp());
```
So what happens if we create a type that satisfies the requirements of our info function, starting with a writei64 function?
```zig
const BufWriter = struct {
    len: usize = 0,
    buf: [100]u8 = undefined,

    // just makes it so we can use Self inside of here instead of BufWriter
    const Self = @This();

    // todo: make sure we don't go over our buffer's size of 100!
    fn writei64(self: *Self, value: i64) void {
        self.len += std.fmt.formatIntBuf(self.buf[self.len..], value, 10, .lower, .{});
    }
};
```
If we try to use our new BufWriter:
```zig
var writer = BufWriter{};
info("application started", &writer);
std.debug.print("{s}", .{writer.string()});
```
We no longer get an error about a missing writei64 member. Instead, we get an error about a missing writeByte member. For completeness, let's add our missing functions:
```zig
    // todo: make sure we don't go over our buffer's size of 100!
    fn writeByte(self: *Self, b: u8) void {
        self.buf[self.len] = b;
        self.len += 1;
    }

    // todo: make sure we don't go over our buffer's size of 100!
    fn writeString(self: *Self, data: []const u8) void {
        @memcpy(self.buf[self.len..][0..data.len], data);
        self.len += data.len;
    }

    // This isn't needed by the info function, but we called:
    //   std.debug.print("{s}", .{writer.string()});
    // in our little demo. It takes a pointer so that the returned slice
    // points into this BufWriter rather than into a temporary copy.
    fn string(self: *const Self) []const u8 {
        return self.buf[0..self.len];
    }
```
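Putting the pieces together, here is a complete, runnable version of the BufWriter demo. It is a sketch that resolves the "todo" comments using one assumption of mine: bytes that don't fit in the 100-byte buffer are silently dropped. To keep the bounds check in a single place (writeString), it also formats the integer with std.fmt.bufPrint into a small stack buffer instead of calling formatIntBuf directly.

```zig
const std = @import("std");

// Generic over `into`: anything with writei64, writeByte and writeString works.
pub fn info(msg: []const u8, into: anytype) void {
    into.writei64(std.time.timestamp());
    into.writeByte(' ');
    into.writeString(msg);
}

const BufWriter = struct {
    len: usize = 0,
    buf: [100]u8 = undefined,

    const Self = @This();

    // Every write funnels through here. Assumption for this sketch:
    // whatever doesn't fit in the 100-byte buffer is silently dropped.
    fn writeString(self: *Self, data: []const u8) void {
        const n = @min(self.buf.len - self.len, data.len);
        @memcpy(self.buf[self.len..][0..n], data[0..n]);
        self.len += n;
    }

    fn writeByte(self: *Self, b: u8) void {
        self.writeString(&[_]u8{b});
    }

    fn writei64(self: *Self, value: i64) void {
        // 21 bytes fits any i64, including the minus sign.
        var tmp: [21]u8 = undefined;
        const digits = std.fmt.bufPrint(&tmp, "{d}", .{value}) catch unreachable;
        self.writeString(digits);
    }

    // Takes a pointer so the returned slice points into this BufWriter.
    fn string(self: *const Self) []const u8 {
        return self.buf[0..self.len];
    }
};

pub fn main() void {
    var writer = BufWriter{};
    info("application started", &writer);
    std.debug.print("{s}\n", .{writer.string()});
}
```

Dropping overflow keeps info's void return type intact. A variant that reports overflow would need an error union, and that change would ripple into info's signature.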
And this is what anytype means. You can pass anything, as long as it implements the necessary functionality, and this check is done at compile time. What actually happens is that the compiler sees that info is called with a *BufWriter and creates a specialized function, something like:
```zig
// anytype -> *BufWriter
pub fn info(msg: []const u8, into: *BufWriter) void {
    into.writei64(std.time.timestamp());
    into.writeByte(' ');
    into.writeString(msg);
}
```
It'll generate a specialized function like this for every type that is used with the function.
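To see the "one specialization per type" behavior, here is a small self-contained sketch with two made-up types, CountingWriter and SliceWriter. Neither comes from the standard library. Both satisfy info's implicit contract in different ways, so the compiler generates one copy of info for *CountingWriter and a separate copy for *SliceWriter.

```zig
const std = @import("std");

pub fn info(msg: []const u8, into: anytype) void {
    into.writei64(std.time.timestamp());
    into.writeByte(' ');
    into.writeString(msg);
}

// Hypothetical type #1: stores nothing, only counts bytes.
const CountingWriter = struct {
    count: usize = 0,

    fn writei64(self: *CountingWriter, value: i64) void {
        var tmp: [21]u8 = undefined;
        self.count += (std.fmt.bufPrint(&tmp, "{d}", .{value}) catch unreachable).len;
    }
    fn writeByte(self: *CountingWriter, b: u8) void {
        _ = b;
        self.count += 1;
    }
    fn writeString(self: *CountingWriter, data: []const u8) void {
        self.count += data.len;
    }
};

// Hypothetical type #2: writes into a caller-owned slice, dropping overflow.
const SliceWriter = struct {
    out: []u8,
    len: usize = 0,

    fn writeString(self: *SliceWriter, data: []const u8) void {
        const n = @min(self.out.len - self.len, data.len);
        @memcpy(self.out[self.len..][0..n], data[0..n]);
        self.len += n;
    }
    fn writeByte(self: *SliceWriter, b: u8) void {
        self.writeString(&[_]u8{b});
    }
    fn writei64(self: *SliceWriter, value: i64) void {
        var tmp: [21]u8 = undefined;
        self.writeString(std.fmt.bufPrint(&tmp, "{d}", .{value}) catch unreachable);
    }
};

pub fn main() void {
    var counter = CountingWriter{};
    info("hi", &counter); // instantiates info for *CountingWriter

    var storage: [64]u8 = undefined;
    var slice_writer = SliceWriter{ .out = &storage };
    info("hi", &slice_writer); // instantiates info for *SliceWriter

    std.debug.print("{d} bytes: {s}\n", .{ counter.count, storage[0..slice_writer.len] });
}
```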
json.stringify
We now know that the third parameter to json.stringify can be any value that satisfies its requirements. But what are those requirements? As far as I can tell, there's no easy way to find out except to examine the code, or to build, by trial and error, the skeleton of a type that compiles.
So, we start with an empty type:
```zig
const std = @import("std");

pub fn main() !void {
    var buffer = Buffer{};
    try std.json.stringify(.{.x = 123}, .{}, &buffer);
}

const Buffer = struct {
    const Self = @This();
};
```
And try to compile:
```
json.zig:2248:22: error: type '*demo.Buffer' has no members
) @TypeOf(out_stream).Error!void {
```
Well, that's both similar to and different from what we saw before. Clearly our Buffer skeleton is missing something, but what's this @TypeOf(out_stream).Error!void all about? If you look back at stringify's signature, you'll note that the return type is @TypeOf(out_stream).Error!void. In other words, the return type is based on data that our type must provide. In this case, it isn't a function, but a public Error declaration holding an error set.
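Here is a minimal sketch of that signature shape in our own code. The names writeGreeting and CountingOut are hypothetical. Like stringify, writeGreeting computes its return type from whatever type is passed in, so that type must declare a pub const Error. The argument in this sketch is a small struct passed by value that carries a pointer, so it can still record what was written.

```zig
const std = @import("std");

// Hypothetical function with the same signature shape as stringify:
// the error set in the return type is read off the argument's type.
fn writeGreeting(out: anytype) @TypeOf(out).Error!void {
    try out.writeAll("hello");
}

// Hypothetical output type. Without its `Error` declaration,
// writeGreeting's return type could not be computed.
const CountingOut = struct {
    total: *usize,

    pub const Error = error{};

    pub fn writeAll(self: CountingOut, data: []const u8) Error!void {
        self.total.* += data.len;
    }
};

pub fn main() !void {
    var n: usize = 0;
    try writeGreeting(CountingOut{ .total = &n });
    std.debug.print("wrote {d} bytes\n", .{n});
}
```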
I would have hoped that the following might work:
```zig
const Buffer = struct {
    const Self = @This();
    pub const Error = std.mem.Allocator.Error;
};
```
But note that we're passing a *Buffer to stringify, and this Error type is defined on Buffer, not on *Buffer (the pointer type has no members of its own, which is exactly what the error said). If we passed buffer instead of &buffer, we'd be able to move past this error, but in the long run, we know that our buffer will need to be mutable.
As far as I can tell, the only way to solve this issue is to wrap our buffer:
```zig
const Buffer = struct {
    const Self = @This();

    pub const Writer = struct {
        buffer: *Buffer,
        pub const Error = std.mem.Allocator.Error;
    };

    pub fn writer(self: *Self) Writer {
        // .{...} is shorthand for Writer{...}
        // The return type is inferred.
        return .{.buffer = self};
    }
};
```
Now we can pass our Buffer.Writer to stringify:
```zig
var buffer = Buffer{};
try std.json.stringify(.{.x = 123}, .{}, buffer.writer());
```
And move on to the next compilation error, which is that our Buffer.Writer is missing a writeByte function. If we create an empty writeByte and try again, we'll see that we're missing a writeByteNTimes function. Thankfully, if we do this one more time, we'll find the last function that we're missing: writeAll. Finally, our skeleton compiles.

Update: around July 24th, 2023, the stringify logic was changed and now requires another function, print, which I've included in the snippet:
```zig
const Buffer = struct {
    const Self = @This();

    pub const Writer = struct {
        buffer: *Buffer,

        pub const Error = std.mem.Allocator.Error;

        pub fn writeByte(self: Writer, b: u8) !void {
            _ = self;
            _ = b;
        }

        pub fn writeByteNTimes(self: Writer, b: u8, n: usize) !void {
            _ = self;
            _ = b;
            _ = n;
        }

        // the requirement for print was added in 0.11 dev around July 24th
        pub fn print(self: Writer, comptime format: []const u8, args: anytype) !void {
            return std.fmt.format(self, format, args);
        }

        pub fn writeAll(self: Writer, b: []const u8) !void {
            _ = self;
            _ = b;
        }
    };

    pub fn writer(self: *Self) Writer {
        return .{.buffer = self};
    }
};
```
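The skeleton's bodies are empty, so it compiles but writes nothing. As one possible implementation, here is a sketch that fills them in using a fixed 256-byte array. The capacity, the slice helper, and reporting overflow as error.OutOfMemory are my own assumptions. The code targets the Zig 0.11-era std.json and std.fmt described above. I've also added writeBytesNTimes because, as far as I know, some later std.fmt versions call it when padding output. Releases that reworked std.io may still need further changes.

```zig
const std = @import("std");

const Buffer = struct {
    data: [256]u8 = undefined, // assumed fixed capacity
    len: usize = 0,

    const Self = @This();

    pub const Writer = struct {
        buffer: *Buffer,

        pub const Error = std.mem.Allocator.Error;

        // All storage logic lives here; overflow is reported as OutOfMemory.
        pub fn writeAll(self: Writer, bytes: []const u8) Error!void {
            const b = self.buffer;
            if (b.data.len - b.len < bytes.len) return error.OutOfMemory;
            @memcpy(b.data[b.len..][0..bytes.len], bytes);
            b.len += bytes.len;
        }

        pub fn writeByte(self: Writer, byte: u8) Error!void {
            return self.writeAll(&[_]u8{byte});
        }

        pub fn writeByteNTimes(self: Writer, byte: u8, n: usize) Error!void {
            var i: usize = 0;
            while (i < n) : (i += 1) try self.writeByte(byte);
        }

        // Assumption: included for std.fmt versions that pad with it.
        pub fn writeBytesNTimes(self: Writer, bytes: []const u8, n: usize) Error!void {
            var i: usize = 0;
            while (i < n) : (i += 1) try self.writeAll(bytes);
        }

        pub fn print(self: Writer, comptime format: []const u8, args: anytype) !void {
            return std.fmt.format(self, format, args);
        }
    };

    pub fn writer(self: *Self) Writer {
        return .{ .buffer = self };
    }

    pub fn slice(self: *const Self) []const u8 {
        return self.data[0..self.len];
    }
};

pub fn main() !void {
    var buffer = Buffer{};
    try std.json.stringify(.{ .x = 123 }, .{}, buffer.writer());
    std.debug.print("{s}\n", .{buffer.slice()});
}
```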
With this skeleton, we could serialize JSON into whatever destination, and in whatever way, we wanted. The implementation of this skeleton doesn't matter. Also, for common cases, like writing to a file, the standard library provides existing writers. Still, this all seems difficult to discover.
io.Writer
With our skeleton writer done, we could call it a day. But there's a more common approach to solve this particular problem.
The above three functions, writeByte, writeByteNTimes and writeAll, are all about writing data of various shapes (a single byte, a single byte multiple times, multiple bytes) to wherever our writer wants it to go. The first two can be implemented as wrappers around the more general writeAll function. For example, writeByte could be implemented as:
```zig
pub fn writeByte(self: Writer, b: u8) !void {
    const array = [1]u8{b};
    return self.writeAll(&array);
}
```
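We can take this a small step further and make writeAll itself a generic wrapper around an even more focused write(data: []const u8) !usize function. Since write may accept only part of the data and returns how many bytes it actually wrote, writeAll keeps calling it until everything has been written: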
```zig
pub fn writeAll(self: Self, data: []const u8) !void {
    var index: usize = 0;
    while (index != data.len) {
        index += try self.write(data[index..]);
    }
}
```
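To see why writeAll needs that loop, here is a sketch built around a made-up ChunkedSink whose write accepts at most 3 bytes per call, much like a socket or pipe may accept only part of what it is given. Only write knows about storage. writeAll, writeByte and writeByteNTimes are the generic wrappers described above. The 64-byte capacity and the NoSpaceLeft error are assumptions for illustration.

```zig
const std = @import("std");

const ChunkedSink = struct {
    data: [64]u8 = undefined,
    len: usize = 0,

    const Self = @This();
    pub const Error = error{NoSpaceLeft};

    // The only implementation-specific function: it stores at most 3 bytes
    // per call and returns how many it actually stored.
    pub fn write(self: *Self, bytes: []const u8) Error!usize {
        const room = self.data.len - self.len;
        if (room == 0 and bytes.len > 0) return error.NoSpaceLeft;
        const n: usize = @min(bytes.len, 3, room);
        @memcpy(self.data[self.len..][0..n], bytes[0..n]);
        self.len += n;
        return n;
    }

    // Generic: loops until write has accepted every byte.
    pub fn writeAll(self: *Self, bytes: []const u8) Error!void {
        var index: usize = 0;
        while (index != bytes.len) {
            index += try self.write(bytes[index..]);
        }
    }

    // Generic: a single byte is just a one-element slice.
    pub fn writeByte(self: *Self, b: u8) Error!void {
        const array = [1]u8{b};
        return self.writeAll(&array);
    }

    // Generic: repeat writeByte n times.
    pub fn writeByteNTimes(self: *Self, b: u8, n: usize) Error!void {
        var i: usize = 0;
        while (i < n) : (i += 1) try self.writeByte(b);
    }
};

pub fn main() !void {
    var sink = ChunkedSink{};
    try sink.writeAll("hello, world");
    try sink.writeByte('!');
    std.debug.print("{s}\n", .{sink.data[0..sink.len]});
}
```

Even though "hello, world" needs four separate write calls here, the caller of writeAll never sees the chunking.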
When we take this approach, all of the wrapper functions are generic; only write has any implementation-specific logic. This is what Zig's built-in io.Writer leverages. io.Writer provides all of the generic writeXYZ functions; it just needs us to give it a write function to call. So instead of providing our own Writer, we can provide an io.Writer which points to our specific write implementation:
```zig
const Buffer = struct {
    const Self = @This();

    const WriteError = error{OutOfMemory};
    pub const Writer = std.io.Writer(*Self, WriteError, write);

    pub fn writer(self: *Self) Writer {
        return .{.context = self};
    }

    pub fn write(self: *Self, data: []const u8) WriteError!usize {
        _ = self;
        // just lie and tell io.Writer that we successfully wrote everything
        return data.len;
    }
};
```
This has the same usage as our fully customized approach:
```zig
var buffer = Buffer{};
try std.json.stringify(.{.x = 123}, .{}, buffer.writer());
```
We still define our own Buffer.Writer, but this time it is an instance of the generic std.io.Writer. We provide the "context" type, the error type, and our write function (which can be called anything you want; it doesn't have to be write). When we actually create an instance of our Writer, via the writer function, we set the "context" to the instance of our buffer. Seeing how io.Writer calls back into our code will probably make things clearer:
```zig
// Self here is the io.Writer, or, more correctly, it's our
// std.io.Writer(*Buffer, error{OutOfMemory}, write)
pub fn write(self: Self, bytes: []const u8) Error!usize {
    return writeFn(self.context, bytes);
}
```
The context could be anything. In our case, we just use the Buffer itself. But if we had a lot of state to track while writing, and if our Buffer were a busy object doing a lot of things, it wouldn't be crazy to create a write-specific object when writer() is called.
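Here is a sketch of the io.Writer version with real storage in place of the "lie", so its output can be checked against stringify. It assumes the Zig 0.11-era generic std.io.Writer(Context, WriteError, writeFn) and a fixed 256-byte array, and the context is simply *Buffer. Our write may store fewer bytes than it is offered. io.Writer's writeAll keeps looping until everything is written or write returns an error. The print and writeByteNTimes calls in the test come from io.Writer, not from Buffer.

```zig
const std = @import("std");

const Buffer = struct {
    data: [256]u8 = undefined, // assumed fixed capacity
    len: usize = 0,

    const Self = @This();
    const WriteError = error{OutOfMemory};

    // io.Writer supplies writeAll, writeByte, print, ... on top of `write`.
    pub const Writer = std.io.Writer(*Self, WriteError, write);

    pub fn writer(self: *Self) Writer {
        return .{ .context = self };
    }

    // Stores as many bytes as fit and reports how many that was.
    fn write(self: *Self, bytes: []const u8) WriteError!usize {
        const room = self.data.len - self.len;
        if (room == 0 and bytes.len > 0) return error.OutOfMemory;
        const n = @min(room, bytes.len);
        @memcpy(self.data[self.len..][0..n], bytes[0..n]);
        self.len += n;
        return n;
    }

    pub fn slice(self: *const Self) []const u8 {
        return self.data[0..self.len];
    }
};

pub fn main() !void {
    var buffer = Buffer{};
    try std.json.stringify(.{ .x = 123 }, .{}, buffer.writer());
    std.debug.print("{s}\n", .{buffer.slice()});
}
```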
Conclusion
Using io.Writer requires a lot less code. It also makes our Buffer.Writer (and, by extension, our Buffer) usable in more places, because Buffer.Writer implements more functions. For example, it also implements writeIntNative, because that's a function of io.Writer which wraps our write function.
The custom implementation has less overhead and could provide more optimization opportunities (e.g., a custom implementation of writeByte can be a lot simpler).
If the json.stringify logic changes, our custom writer might fail (though this would be a compile-time error). If we're using io.Writer, chances are we wouldn't have to make any changes (unless json.stringify changes so much that even io.Writer is no longer a valid type).
Use whichever you want.
Again, none of this is particularly obvious. There's nothing in std.json that points to io.Writer. More generally, as far as I can tell, there's nothing that describes the contract. This makes me wonder about backwards compatibility, since json.stringify doesn't seem to be constrained in any way: at any point, it can change what it requires from its third parameter.
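If you depend on this implicit contract in your own code, one option is to spell it out with a compile-time check, so that a mismatch fails with a readable message near your call site. The helpers assertJsonWriter and toJson below are hypothetical. The list of required declarations is simply the one discovered above for Zig 0.11, and the check doesn't stop stringify from requiring something new later. It expects a struct type (like Buffer.Writer or an io.Writer instance), not a pointer.

```zig
const std = @import("std");

// Hypothetical helper: the declarations this article found json.stringify
// to need (Zig 0.11). Fails compilation with a clear message if one is missing.
fn assertJsonWriter(comptime T: type) void {
    const required = [_][]const u8{ "Error", "writeByte", "writeByteNTimes", "writeAll", "print" };
    inline for (required) |name| {
        if (!@hasDecl(T, name)) {
            @compileError(@typeName(T) ++ " is missing '" ++ name ++ "'");
        }
    }
}

// Hypothetical wrapper that documents and checks the contract up front.
fn toJson(value: anytype, out: anytype) @TypeOf(out).Error!void {
    comptime assertJsonWriter(@TypeOf(out));
    return std.json.stringify(value, .{}, out);
}

pub fn main() !void {
    var list = std.ArrayList(u8).init(std.heap.page_allocator);
    defer list.deinit();
    try toJson(.{ .x = 123 }, list.writer());
    std.debug.print("{s}\n", .{list.items});
}
```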