Custom String Formatting and JSON [De]Serializing in Zig
In our last blog post, we saw how builtins like `@hasDecl` and functions like `std.meta.hasMethod` can be used to inspect a type to determine its capabilities. Zig's standard library makes use of these in a few places to allow developers to opt into specific behavior. In particular, both `std.fmt` and `std.json` give developers the ability to define functions that control how a type is formatted and JSON serialized/deserialized.
format
While Zig does a good job of printing out custom types, it can be useful or necessary to tweak that output. For example, you might want to exclude a specific field from the output. If you define a public `format` method on your struct, enum or union, Zig will call it rather than using the default formatter:
```zig
const std = @import("std");

pub fn main() !void {
    const u = User{ .power = 9001 };
    std.debug.print("{}\n", .{u});
}

const User = struct {
    power: u32,

    pub fn format(self: *const User, comptime fmt: []const u8, _: std.fmt.FormatOptions, writer: anytype) !void {
        if (fmt.len != 0) {
            std.fmt.invalidFmtError(fmt, self);
        }
        return writer.print("power level @ {d}!!!", .{self.power});
    }
};
```
As you can see, `format` takes 4 parameters: the value being formatted (your type), the format string, format options, and the writer. It's pretty common to ignore both the format string and the format options. Above, we did use the format string as a simple guard, ensuring that our user variable was formatted using `{}` or `{any}`, and returning an error if something like `{s}` was used. The real question we need to answer is: what is `writer`? It's generally understood to be an `std.io.Writer`, but by using `anytype` we automatically support anything that can fulfill our needs.
More specifically, the two methods that you're most likely to use are `writer.print` and `writer.writeAll`. We used `print` above, which is a simple yet powerful way to customize the string representation. We'd use `writeAll` if we wanted to directly emit a `[]const u8`. Of course, there's nothing stopping us from using both methods, as well as the other `io.Writer` methods like `writeByte` and `writeByteNTimes`.
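For instance, here's a sketch of a `format` method that mixes those writer methods (the `Badge` type is made up for this example):

```zig
const std = @import("std");

const Badge = struct {
    stars: u8,

    pub fn format(self: *const Badge, comptime fmt: []const u8, _: std.fmt.FormatOptions, writer: anytype) !void {
        if (fmt.len != 0) {
            std.fmt.invalidFmtError(fmt, self);
        }
        // writeAll emits a raw []const u8 as-is
        try writer.writeAll("rating: [");
        // writeByteNTimes repeats a single byte
        try writer.writeByteNTimes('*', self.stars);
        try writer.writeByte(']');
    }
};
```

Formatted with `{}`, a `Badge{ .stars = 3 }` would render as `rating: [***]`.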
jsonStringify
We can control the JSON serialization of a struct, union or enum by defining a `jsonStringify` method:
```zig
const User = struct {
    power: u32,

    pub fn jsonStringify(self: *const User, jws: anytype) !void {
        try jws.beginObject();
        try jws.objectField("power");
        try jws.write(self.power);
        try jws.endObject();
    }
};
```
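This method is picked up automatically by the standard serializer. As a quick sketch of how it's invoked (the exact entry point varies by Zig version; `std.json.stringifyAlloc` exists in recent releases):

```zig
const std = @import("std");

const User = struct {
    power: u32,

    pub fn jsonStringify(self: *const User, jws: anytype) !void {
        try jws.beginObject();
        try jws.objectField("power");
        try jws.write(self.power);
        try jws.endObject();
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    const u = User{ .power = 9001 };
    // stringifyAlloc routes through our jsonStringify method
    const out = try std.json.stringifyAlloc(allocator, u, .{});
    defer allocator.free(out);
    std.debug.print("{s}\n", .{out}); // {"power":9001}
}
```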
The signature is simpler, with the writer assumed to be an `std.json.WriteStream`. Unlike the more generic `io.Writer` used in `format`, the `WriteStream` is designed specifically for JSON. In addition to `beginObject` and `endObject`, there's also a `beginArray` and `endArray`. And unlike the `write` and `writeAll` found on the more generic `io.Writer`, the `write` method of the `WriteStream` is JSON-aware. If we called `jws.write()` on a string value, the value would be quoted and escaped. If we called it on a structure, that structure would be JSON-encoded (either using Zig's default JSON encoder, or the structure's own `jsonStringify` method).
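To illustrate, here's a small sketch (the nested `Role` struct and these fields are made up for this example) where `write` handles both a string and a nested structure:

```zig
const std = @import("std");

const Role = struct {
    level: u8,
};

const User = struct {
    name: []const u8,
    role: Role,

    pub fn jsonStringify(self: *const User, jws: anytype) !void {
        try jws.beginObject();
        try jws.objectField("name");
        // write is JSON-aware: the string gets quoted and escaped
        try jws.write(self.name);
        try jws.objectField("role");
        // a nested struct is encoded with the default JSON encoder
        // (or its own jsonStringify, if it defined one)
        try jws.write(self.role);
        try jws.endObject();
    }
};
```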
The `WriteStream` also has a `print` method. Like the `io.Writer`'s `print` method we used in `format`, it takes a format string and optional parameters. `print` will apply the correct indentation (based on the options passed to `stringify`) but will not apply any additional JSON-specific formatting. So if we changed the code to:
```zig
const User = struct {
    name: []const u8,

    pub fn jsonStringify(self: *const User, jws: anytype) !void {
        try jws.beginObject();
        try jws.objectField("name");
        try jws.print("{s}", .{self.name});
        try jws.endObject();
    }
};
```
We'd likely end up generating invalid JSON, since the `name` value wouldn't be quoted or correctly escaped (notice the value isn't quoted):

```
{"name":leto}
```

Care must be taken when using `print`, but it provides the greatest flexibility; for example, it's useful if you want to format numbers a specific way.
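For example, since a bare number is valid JSON without any quoting or escaping, `print` can be used to control numeric precision (the `Point` type is hypothetical):

```zig
const std = @import("std");

const Point = struct {
    x: f64,
    y: f64,

    pub fn jsonStringify(self: *const Point, jws: anytype) !void {
        try jws.beginObject();
        try jws.objectField("x");
        // print bypasses JSON escaping, but a bare number needs none,
        // so we can safely limit the output to 2 decimal places
        try jws.print("{d:.2}", .{self.x});
        try jws.objectField("y");
        try jws.print("{d:.2}", .{self.y});
        try jws.endObject();
    }
};
```

A `Point{ .x = 1.0 / 3.0, .y = 2 }` would serialize as something like `{"x":0.33,"y":2.00}`.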
jsonParse
The counterpart to the `jsonStringify` method is the `jsonParse` function:

```zig
const User = struct {
    name: []const u8,

    pub fn jsonParse(allocator: Allocator, source: anytype, options: std.json.ParseOptions) !User {
        // ...
    }
};
```
The `source` is assumed to be either a `std.json.Scanner` or a `std.json.Reader`. These both expose the same methods so, from our point of view, it doesn't really matter which it is. Unfortunately, implementing a custom `jsonParse` function is a lot more complicated than implementing a custom `format` or `jsonStringify` method. This is because the scanner is low-level and reads a token at a time. For example, a naive implementation for the above `User` with its single `name` field looks like:
```zig
pub fn jsonParse(allocator: Allocator, source: anytype, options: std.json.ParseOptions) !User {
    if (try source.next() != .object_begin) {
        return error.UnexpectedToken;
    }

    var name: []const u8 = undefined;

    switch (try source.nextAlloc(allocator, .alloc_if_needed)) {
        .string, .allocated_string => |field| {
            if (std.mem.eql(u8, field, "name") == false) {
                return error.UnknownField;
            }
        },
        else => return error.UnexpectedToken,
    }

    switch (try source.nextAlloc(allocator, options.allocate.?)) {
        .string, .allocated_string => |value| name = value,
        else => return error.UnexpectedToken,
    }

    if (try source.next() != .object_end) {
        return error.UnexpectedToken;
    }

    return .{ .name = name };
}
```
Of course, if we were to add more fields, keeping in mind that we should generally allow JSON objects to have their fields in any order, the code would get much more complicated. We should also consult the `options.ignore_unknown_fields` value to determine whether we should error on, or skip, an unknown field.
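A sketch of what that might look like, with a hypothetical second `power: u32` field added to `User`, accepting the fields in any order and honoring `ignore_unknown_fields` (error handling kept minimal):

```zig
pub fn jsonParse(allocator: Allocator, source: anytype, options: std.json.ParseOptions) !User {
    if (try source.next() != .object_begin) {
        return error.UnexpectedToken;
    }

    var name: ?[]const u8 = null;
    var power: u32 = 0;

    while (true) {
        switch (try source.nextAlloc(allocator, .alloc_if_needed)) {
            // the object can end after any number of fields
            .object_end => break,
            .string, .allocated_string => |field| {
                if (std.mem.eql(u8, field, "name")) {
                    switch (try source.nextAlloc(allocator, options.allocate.?)) {
                        .string, .allocated_string => |value| name = value,
                        else => return error.UnexpectedToken,
                    }
                } else if (std.mem.eql(u8, field, "power")) {
                    // numbers arrive as their textual representation
                    switch (try source.nextAlloc(allocator, .alloc_if_needed)) {
                        .number, .allocated_number => |n| power = try std.fmt.parseInt(u32, n, 10),
                        else => return error.UnexpectedToken,
                    }
                } else if (options.ignore_unknown_fields) {
                    // consume and discard the unknown field's value
                    try source.skipValue();
                } else {
                    return error.UnknownField;
                }
            },
            else => return error.UnexpectedToken,
        }
    }

    return .{ .name = name orelse return error.MissingField, .power = power };
}
```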
You can probably tell from the above code, but the type returned by `next` and `nextAlloc` is a tagged union, specifically a `std.json.Token`. Notice that when we're looking for a string (for the field name and field value), we match against both a `.string` and an `.allocated_string`. Also notice that when we're reading the field name, we use `nextAlloc` with `.alloc_if_needed`, but when reading the value, we're passing `options.allocate.?`. Why is this so complicated?
The json `Scanner`, `Reader` and `Token` are all designed to work with both a generic `io.Reader` and a string input. When dealing with a generic `io.Reader`, a 4K buffer is used. This has two implications. First, when the buffer fills up, it's reset and reused for the next 4K of data, which invalidates any old references. Second, both field names and field values can get split across multiple reads. For this reason, we need to tell `nextAlloc` how to proceed: should it always create a duplicate of the value, or should it only do so when necessary? There's no "never allocate" option because of the second case mentioned above: when a field name or value is spread across multiple fills of the buffer, an allocation is required to create a single coherent value.
We use `.alloc_if_needed` for the field name because that value is only needed until the next call to `next` or `nextAlloc`; we just need it long enough to compare it to our expected `"name"`. Hopefully the full field name is inside the buffer, meaning the scanner won't have to allocate anything. If it doesn't have to allocate, it'll return a `.string`; if it does, it'll return an `.allocated_string`. For our `jsonParse`, it doesn't matter which it is, we just need the value. But in some cases, you might care about the difference.
`options.allocate.?` is the option given to the `std.json.parseXYZ` functions. We use it for the field value, letting the caller decide whether or not to always dupe string values. When `options.allocate` isn't explicitly set, the default depends on which parse function is used and on whether a `Scanner` or `Reader` is provided.
The documentation for `std.json.Token` isn't the clearest, but it does help explain some of this, and it's probably required reading if you plan on writing your own `jsonParse`.
Conclusion
I don't think anyone has ever claimed that Zig's documentation is best-in-class. I think these three methods are particularly good examples of the documentation's shortcomings. It isn't just that the behaviors aren't easily discoverable; the use of `anytype`, while providing flexibility, also harms understandability.
Still, both `format` and `jsonStringify` are straightforward once you've seen an example or two, and they both provide a flexible and expressive API. For `jsonParse`, you'll almost certainly need or want to write some helpers to deal with the low-level API that's exposed.