Writing JSON to a custom output in Zig

Mar 17, 2023

If you want to serialize (or "stringify") data to JSON using Zig, you'll probably take a peek at Zig's [beta] API documentation and find:

pub fn stringify(
  value: anytype,
  options: StringifyOptions,
  out_stream: anytype,
) @TypeOf(out_stream).Error!void

And not much else. It would be reasonable to guess that the first parameter is the value we want to serialize. The second parameter is probably options controlling how to serialize (e.g., indentation). The third parameter is a lot more opaque. Given that this function doesn't take an allocator, and that it returns a strange error union with void, we can safely say that it won't be returning a string / []const u8. The name out_stream implies that the output will be written to this parameter, but what is it?!

It helps (but only a little) to understand anytype, which is the type of our third parameter. This is commonly explained as duck typing that happens at compile-time. Despite the name, it usually isn't a type. It can be anything, so long as it can do what stringify needs from it. The obvious next question is: what does stringify need from our out_stream?

anytype

Before I disappoint you with the answer, let's look at a small anytype example:

pub fn info(msg: []const u8, into: anytype) void {
  into.writei64(std.time.timestamp());
  into.writeByte(' ');
  into.writeString(msg);
}

If we try to use the above info function with a [100]u8 as the second parameter, like so:

var out: [100]u8 = undefined;
info("application started", out);

We'll get an error:

error: no field or member function named 'writei64' in '[100]u8'
 into.writei64(std.time.timestamp());

So what happens if we create a type that satisfies the requirements of our info function, starting with a writei64 function?

const BufWriter = struct {
  len: usize = 0,
  buf: [100]u8 = undefined,

  // just makes it so we can use Self inside of here instead of BufWriter
  const Self = @This();

  // todo: make sure we don't go over our buffer's size of 100!
  fn writei64(self: *Self, value: i64) void {
    self.len += std.fmt.formatIntBuf(self.buf[self.len..], value, 10, .lower, .{});
  }
};

If we try to use our new BufWriter:

var writer = BufWriter{};
info("application started", &writer);
std.debug.print("{s}", .{writer.string()});

We no longer get an error about a missing writei64 member. Instead, we get an error about a missing writeByte member. For completeness, let's add our missing functions:

// todo: make sure we don't go over our buffer's size of 100!
fn writeByte(self: *Self, b: u8) void {
  self.buf[self.len] = b;
  self.len += 1;
}

// todo: make sure we don't go over our buffer's size of 100!
fn writeString(self: *Self, data: []const u8) void {
  std.mem.copy(u8, self.buf[self.len..], data);
  self.len += data.len;
}

// This isn't needed by the info function, but we called:
//    std.debug.print("{s}", .{writer.string()});
// in our little demo
fn string(self: Self) []const u8 {
  return self.buf[0..self.len];
}

And this is what anytype means. You can pass anything, as long as it implements the necessary functionality. This check is done at compile-time. What actually happens is that the compiler sees that info is called with a *BufWriter and creates a specialized function, something like:

// anytype -> *BufWriter
pub fn info(msg: []const u8, into: *BufWriter) void {
  into.writei64(std.time.timestamp());
  into.writeByte(' ');
  into.writeString(msg);
}

It'll generate a specialized function like this for every type that is used with the function.
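
To make the "one specialization per type" point concrete, here is a minimal sketch that calls the same info function with two unrelated types. Printer and Counter are hypothetical names invented for this illustration, and the code assumes a Zig 0.11-era standard library.

```zig
const std = @import("std");

pub fn info(msg: []const u8, into: anytype) void {
    into.writei64(std.time.timestamp());
    into.writeByte(' ');
    into.writeString(msg);
}

// Hypothetical target #1: forwards everything to stderr.
const Printer = struct {
    fn writei64(_: Printer, value: i64) void {
        std.debug.print("{d}", .{value});
    }
    fn writeByte(_: Printer, b: u8) void {
        std.debug.print("{c}", .{b});
    }
    fn writeString(_: Printer, data: []const u8) void {
        std.debug.print("{s}", .{data});
    }
};

// Hypothetical target #2: only counts how many bytes would be written.
const Counter = struct {
    count: usize = 0,

    fn writei64(self: *Counter, value: i64) void {
        var tmp: [32]u8 = undefined;
        const text = std.fmt.bufPrint(&tmp, "{d}", .{value}) catch unreachable;
        self.count += text.len;
    }
    fn writeByte(self: *Counter, b: u8) void {
        _ = b;
        self.count += 1;
    }
    fn writeString(self: *Counter, data: []const u8) void {
        self.count += data.len;
    }
};

pub fn main() void {
    // The compiler generates one info for Printer...
    info("application started", Printer{});
    std.debug.print("\n", .{});

    // ...and a second, separate info for *Counter.
    var counter = Counter{};
    info("application started", &counter);
    std.debug.print("{d} bytes\n", .{counter.count});
}
```

Neither type declares that it implements anything. Each one compiles only because it happens to have the three functions that info calls.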

json.stringify

We now know that the third parameter to json.stringify can be any value that satisfies its requirements. But what are those requirements? As far as I can tell, there's no easy way to tell except to examine the code or to implement the skeleton of a type that compiles.

So, we start with an empty type, and try to compile:

pub fn main() !void {
  var buffer = Buffer{};
  try std.json.stringify(.{.x = 123}, .{}, &buffer);
}

const Buffer = struct {
  const Self = @This();
};

This fails with:

json.zig:2248:22: error: type '*demo.Buffer' has no members
) @TypeOf(out_stream).Error!void {

Well, that's both similar to and different from what we saw before. Clearly our Buffer skeleton is missing something, but what's this @TypeOf(out_stream).Error!void all about? If you look back at stringify's signature, you'll note that the return type is @TypeOf(out_stream).Error!void. In other words, the return type is based on data that our type must provide. In this case, it isn't a function, but a constant error type.

I would have hoped that the following might work:

const Buffer = struct {
  const Self = @This();

  pub const Error = std.mem.Allocator.Error;
};

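But note that we're passing a *Buffer to stringify and this Error type is defined on Buffer. @TypeOf(out_stream) is therefore the pointer type *Buffer, which has no declarations of its own, so the lookup of Error fails. If we passed buffer instead of &buffer, we'd be able to move past this error, but in the long run, we know that our buffer will need to be mutable.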

As far as I can tell, the only way to solve this issue is to wrap our buffer:

const Buffer = struct {
  const Self = @This();

  pub const Writer = struct {
    buffer: *Buffer,
    pub const Error = std.mem.Allocator.Error;
  };

  pub fn writer(self: *Self) Writer {
    // .{...} is shorthand for Writer{...}
    // The return type is inferred.
    return .{.buffer = self};
  }
};

Now we can pass our Buffer.Writer to stringify:

var buffer = Buffer{};
try std.json.stringify(.{.x = 123}, .{}, buffer.writer());

And move on to the next compilation error, which is that our Buffer.Writer is missing a writeByte function. If we create an empty writeByte and try again, we'll see that we're missing a writeByteNTimes function. Thankfully, if we do this one more time, we'll find the last function that we're missing: writeAll. Finally, our skeleton compiles:

update: around July 24th, 2023 the stringify logic was changed and now requires another function, print, which I've included in the snippet.

const Buffer = struct {
  const Self = @This();

  pub const Writer = struct {
    buffer: *Buffer,
    pub const Error = std.mem.Allocator.Error;

    pub fn writeByte(self: Writer, b: u8) !void {
      _ = self;
      _ = b;
    }

    pub fn writeByteNTimes(self: Writer, b: u8, n: usize) !void {
      _ = self;
      _ = b;
      _ = n;
    }

    // the requirement for print was added in 0.11 dev around July 24th
    pub fn print(self: Writer, comptime format: []const u8, args: anytype) !void {
      return std.fmt.format(self, format, args);
    }

    pub fn writeAll(self: Writer, b: []const u8) !void {
      _ = self;
      _ = b;
    }
  };

  pub fn writer(self: *Self) Writer {
    return .{.buffer = self};
  }
};

With this skeleton, we could serialize JSON into whatever we wanted, however we wanted. The implementation of this skeleton doesn't matter. Also, for common cases, like writing to a file, the standard library provides existing writers. Still, this all seems difficult to discover.
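
As a rough illustration of those existing writers, the sketch below serializes the same value into a fixed-size buffer and into a growable list. It assumes the Zig 0.11-era std.json.stringify signature shown at the top of this post; newer releases may differ.

```zig
const std = @import("std");

pub fn main() !void {
    // Fixed-size destination: no allocator involved.
    var storage: [64]u8 = undefined;
    var stream = std.io.fixedBufferStream(&storage);
    try std.json.stringify(.{ .x = 123 }, .{}, stream.writer());
    std.debug.print("{s}\n", .{stream.getWritten()});

    // Growable destination backed by an allocator.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    var list = std.ArrayList(u8).init(gpa.allocator());
    defer list.deinit();
    try std.json.stringify(.{ .x = 123 }, .{}, list.writer());
    std.debug.print("{s}\n", .{list.items});
}
```

Both writer() calls return a value that already provides Error, writeByte, writeByteNTimes, writeAll and print, which is why no skeleton is needed for them.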

io.Writer

With our skeleton writer done, we could call it a day. But there's a more common approach to solve this particular problem.

The above 3 functions, writeByte, writeByteNTimes and writeAll, are all about writing data of various shapes (a single byte, a single byte multiple times, multiple bytes) into wherever our writer wants it to go. These functions can be implemented by wrapping a generic writeAll function. For example, writeByte could be implemented as:

pub fn writeByte(self: Writer, b: u8) !void {
  const array = [1]u8{b};
  return self.writeAll(&array);
}

We can take this a small step further and make writeAll itself a generic wrapper around an even more focused write(data: []const u8) !usize function, like so:

pub fn writeAll(self: Self, data: []const u8) !void {
  var index: usize = 0;
  while (index != data.len) {
    index += try self.write(data[index..]);
  }
}
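
The loop exists because write is allowed to accept fewer bytes than it was given. A small sketch of that situation, using a hypothetical Chunky sink (not part of std) whose write takes at most 4 bytes per call:

```zig
const std = @import("std");

// Hypothetical sink, not part of std: each write call accepts at most 4 bytes.
const Chunky = struct {
    buf: [64]u8 = undefined,
    len: usize = 0,
    calls: usize = 0,

    const Self = @This();

    pub fn write(self: *Self, data: []const u8) error{NoSpaceLeft}!usize {
        self.calls += 1;
        const n: usize = @min(data.len, 4);
        if (self.len + n > self.buf.len) return error.NoSpaceLeft;
        var i: usize = 0;
        while (i < n) : (i += 1) {
            self.buf[self.len + i] = data[i];
        }
        self.len += n;
        // Report how much was actually consumed; the caller handles the rest.
        return n;
    }

    // Same loop as above: keep calling write until everything is consumed.
    pub fn writeAll(self: *Self, data: []const u8) error{NoSpaceLeft}!void {
        var index: usize = 0;
        while (index != data.len) {
            index += try self.write(data[index..]);
        }
    }
};

pub fn main() !void {
    var sink = Chunky{};
    try sink.writeAll("hello world");
    // 11 bytes in chunks of 4, 4 and 3: prints "hello world (3 calls)"
    std.debug.print("{s} ({d} calls)\n", .{ sink.buf[0..sink.len], sink.calls });
}
```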

When we take this approach, all of the wrapper functions are generic. It is only write that has any implementation-specific logic. This is what Zig's built-in io.Writer leverages. io.Writer provides all of the generic writeXYZ functions. It just needs us to give it a write function to call. So instead of providing our own Writer, we can provide an io.Writer which points to our specific write implementation:

const Buffer = struct {
  const Self = @This();

  pub const Writer = std.io.Writer(*Self, error{OutOfMemory}, write);

  pub fn writer(self: *Self) Writer {
    return .{.context = self};
  }

  pub fn write(self: *Self, data: []const u8) !usize {
    _ = self;

    // just lie and tell io.Writer that we successfully wrote everything
    return data.len;
  }
};

This has the same usage as our fully customized approach:

var buffer = Buffer{};
try std.json.stringify(.{.x = 123}, .{}, buffer.writer());

We still define our own Buffer.Writer, but this is a generic implementation of std.io.Writer. We provide the "context" type, the error type, and our write function (which can be called anything you want; it doesn't have to be write). When we actually create an instance of our Writer, via the writer function, we set the "context" to the instance of our buffer. Seeing how io.Writer calls back into our code will probably make things clearer:

// Self here is the io.Writer, or, more correctly, it's our
// std.io.Writer(*Buffer, error{OutOfMemory}, write)
pub fn write(self: Self, bytes: []const u8) Error!usize {
  return writeFn(self.context, bytes);
}

The context could be anything. In our case, we just use the Buffer itself. But if we have a lot of state to track when writing and if our Buffer is a busy object doing a lot of things, it wouldn't be crazy to create a write-specific object which gets created when writer() is called.
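
Putting the pieces together, here is a sketch of a Buffer whose write actually stores the bytes instead of lying about it. Backing it with an std.ArrayList(u8), and the init/deinit functions, are assumptions made for this illustration; the code targets a Zig 0.11-era standard library.

```zig
const std = @import("std");

const Buffer = struct {
    // Assumption for this sketch: the bytes live in a growable list.
    data: std.ArrayList(u8),

    const Self = @This();

    pub const Writer = std.io.Writer(*Self, error{OutOfMemory}, write);

    pub fn init(allocator: std.mem.Allocator) Self {
        return .{ .data = std.ArrayList(u8).init(allocator) };
    }

    pub fn deinit(self: *Self) void {
        self.data.deinit();
    }

    pub fn writer(self: *Self) Writer {
        return .{ .context = self };
    }

    // The only implementation-specific function: append and report
    // that every byte was consumed.
    pub fn write(self: *Self, bytes: []const u8) error{OutOfMemory}!usize {
        try self.data.appendSlice(bytes);
        return bytes.len;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var buffer = Buffer.init(gpa.allocator());
    defer buffer.deinit();

    try std.json.stringify(.{ .x = 123 }, .{}, buffer.writer());
    std.debug.print("{s}\n", .{buffer.data.items});
}
```

The write function here spells out error{OutOfMemory}!usize rather than relying on an inferred error set, so its signature lines up with the error type handed to std.io.Writer.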

Conclusion

Using io.Writer requires a lot less code. It also makes our Buffer.Writer (and by extension our Buffer) usable in more places because Buffer.Writer implements more functions (for example, it also implements writeIntNative, because that's a function of io.Writer which wraps our write function).

The custom implementation has less overhead and could provide greater optimization opportunity (e.g. a custom implementation of writeByte can be a lot simpler).

If the json.stringify logic changes, our custom writer might fail (though this would be a compile-time error). If we're using io.Writer, chances are we wouldn't have to make any changes (unless json.stringify changes so much that even io.Writer is no longer a valid type).

Use whichever you want.

Again, none of this is particularly obvious. There's nothing in std.json that points to io.Writer. More generally, as far as I can tell, there's nothing that describes the contract. This makes me wonder about backwards compatibility, as it seems like json.stringify isn't constrained in any way. At any point, it can change what it requires from its third parameter.
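
One way to at least write the observed contract down is a compile-time check. The sketch below is purely illustrative: looksLikeJsonWriter is a hypothetical helper, and the list of names is only what this post discovered by trial and error, not an official requirement of std.json.

```zig
const std = @import("std");

// Hypothetical helper: reports whether T declares every name that
// json.stringify was observed to need in this post.
fn looksLikeJsonWriter(comptime T: type) bool {
    const required = [_][]const u8{
        "Error",
        "writeByte",
        "writeByteNTimes",
        "writeAll",
        "print",
    };
    inline for (required) |name| {
        if (!@hasDecl(T, name)) return false;
    }
    return true;
}

pub fn main() void {
    // A writer type from the standard library, and an empty type for contrast.
    const StdWriter = std.io.FixedBufferStream([]u8).Writer;
    const Empty = struct {};

    const std_ok = comptime looksLikeJsonWriter(StdWriter);
    const empty_ok = comptime looksLikeJsonWriter(Empty);
    std.debug.print("std writer: {}, empty struct: {}\n", .{ std_ok, empty_ok });
}
```

A check like this only confirms that the names exist, not that their signatures match, so it is documentation more than a guarantee.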