Zig's std.json.Parsed(T)

Nov 17, 2023

When parsing JSON using one of Zig's std.json.parseFrom* functions, you'll have to deal with the return type: an std.json.Parsed(T):

const file = try std.fs.openFileAbsolute(file_path, .{});
defer file.close();

var buffered = std.io.bufferedReader(file.reader());
var reader = std.json.reader(allocator, buffered.reader());
defer reader.deinit();

const parsed = try std.json.parseFromTokenSource(Config, allocator, &reader, .{
	.allocate = .alloc_always,
});
defer parsed.deinit();
const config = parsed.value;

In the above code parseFromTokenSource returns a std.json.Parsed(Config). This is a simple structure which exposes a deinit function as well as the parsed value. The reason parseFromTokenSource cannot simply return T (Config in the above case), is that parsing JSON likely results in memory allocations. For example, if our imaginary Config struct had a tags: [][]const u8 field, then parseFromTokenSource would need to allocate a slice.

Memory allocated while parsing can be difficult to manage, at least in a way that works in all cases. It would be an unreasonable burden to ask the caller to know what needs freeing, especially for complex/nested objects. To work around this, parseFromTokenSource creates an std.heap.ArenaAllocator from which all allocations are made. When deinit() is called on the returned Parsed(T), the arena allocator is freed and destroyed.
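To make the "one deinit frees everything" point concrete, here is a minimal sketch. It assumes a hypothetical Config with a name and a tags field (the post never defines the real one) and uses std.json.parseFromSlice rather than a file reader so that it stands alone. std.testing.allocator reports leaks, so the test only passes if the single deinit() really releases the nested slice and its strings.

```zig
const std = @import("std");

// Hypothetical Config, mirroring the imaginary struct described above.
const Config = struct {
    name: []const u8,
    tags: [][]const u8,
};

// Parses `json`, reads from the value, then releases everything via deinit().
fn countTags(allocator: std.mem.Allocator, json: []const u8) !usize {
    const parsed = try std.json.parseFromSlice(Config, allocator, json, .{
        .allocate = .alloc_always,
    });
    // One call frees the tags slice, every duplicated string, and the arena.
    defer parsed.deinit();

    // parsed.value is only valid until deinit() runs, so we return a plain
    // integer rather than anything that points into the arena.
    return parsed.value.tags.len;
}

test "a single deinit releases every allocation" {
    // std.testing.allocator fails the test if any allocation is leaked.
    const json =
        \\{"name": "demo", "tags": ["a", "b", "c"]}
    ;
    const count = try countTags(std.testing.allocator, json);
    try std.testing.expectEqual(@as(usize, 3), count);
}
```

Notice that the caller never has to know that tags needed a slice plus one allocation per string. That bookkeeping is exactly what the arena hides.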

You might have noticed the .allocate = .alloc_always passed as an option to parseFromTokenSource. This option only relates to whether or not strings are duplicated from the source. In the above example, because our source, file, outlives parsed, we could use the alternative option .alloc_if_needed. But that would not alter the fact that other allocations, such as slices for arrays, would still be created and managed by the arena.
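As a rough illustration of what .alloc_always buys you, the sketch below parses from a temporary buffer and frees that buffer before the caller ever looks at the value. The Config struct and the parseThenDiscardSource helper are hypothetical, and I'm using std.json.parseFromSlice to keep it self-contained. Because every string is duplicated into the arena, the parsed value stays valid after the source is gone. With .alloc_if_needed, strings may reference the source directly, so this pattern would not be safe.

```zig
const std = @import("std");

// Hypothetical Config, mirroring the imaginary struct described above.
const Config = struct {
    name: []const u8,
    tags: [][]const u8,
};

// Hypothetical helper: parses from a short-lived copy of `json`.
fn parseThenDiscardSource(
    allocator: std.mem.Allocator,
    json: []const u8,
) !std.json.Parsed(Config) {
    // Copy the input so the source has a shorter lifetime than the result.
    const scratch = try allocator.dupe(u8, json);
    // The source buffer is freed as soon as this function returns.
    defer allocator.free(scratch);

    // .alloc_always duplicates every string into the arena, so nothing in
    // the returned value points back into `scratch`.
    return std.json.parseFromSlice(Config, allocator, scratch, .{
        .allocate = .alloc_always,
    });
}

test "value outlives its source with .alloc_always" {
    const json =
        \\{"name": "demo", "tags": ["x", "y"]}
    ;
    const parsed = try parseThenDiscardSource(std.testing.allocator, json);
    defer parsed.deinit();

    try std.testing.expectEqualStrings("demo", parsed.value.name);
    try std.testing.expectEqualStrings("y", parsed.value.tags[1]);
}
```

Either way, the tags slice itself lives in the arena. The option only changes where the string bytes come from.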

While Parsed(T) is an effective way of dealing with allocations, it's still, in my opinion, cumbersome to use. Specifically, in our above example, config is tied to the lifecycle of parsed. If we wanted to write a loadConfig function, we wouldn't be able to return Config, we'd have to return std.json.Parsed(Config). A program that deals with a lot of JSON might find itself passing std.json.Parsed(T) all over the place.

Unfortunately, there's no great solution to this problem. If our object is simple, we can clone all allocated fields, thus decoupling the lifetime of our object from std.json.Parsed(T), but that's neither efficient nor scalable. If you look at the implementation of parseFromTokenSource, you'll see that it creates the ArenaAllocator and calls a Leaky variant:

fn parseFromTokenSource(comptime T: type, allocator: Allocator, ....) !Parsed(T) {
	var parsed = Parsed(T){
		.arena = try allocator.create(ArenaAllocator),
		.value = undefined,
	};
	errdefer allocator.destroy(parsed.arena);
	parsed.arena.* = ArenaAllocator.init(allocator);
	errdefer parsed.arena.deinit();

	parsed.value = try parseFromTokenSourceLeaky(T, parsed.arena.allocator(), scanner_or_reader, options);

	return parsed;
}

Since parseFromTokenSourceLeaky is public, we can call it directly and provide our own std.heap.ArenaAllocator. It isn't that different from using parseFromTokenSource, but in some cases you might already have a suitable ArenaAllocator, or suitable lifetime hooks in place. In such cases, using the Leaky variant makes sense.
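Here is a small sketch of that idea. To keep it self-contained it uses std.json.parseFromSliceLeaky, the slice-based sibling of parseFromTokenSourceLeaky, and both the Config struct and the loadConfig helper are hypothetical. The point is that the function returns a plain Config, and the caller's arena decides when the memory goes away.

```zig
const std = @import("std");

// Hypothetical Config, mirroring the imaginary struct described above.
const Config = struct {
    name: []const u8,
    tags: [][]const u8,
};

// Hypothetical helper: returns Config directly, no Parsed(T) wrapper.
// `arena` should be an arena-style allocator, since nothing allocated here
// is ever freed individually.
fn loadConfig(arena: std.mem.Allocator, json: []const u8) !Config {
    return std.json.parseFromSliceLeaky(Config, arena, json, .{
        .allocate = .alloc_always,
    });
}

test "Leaky variant with a caller-owned arena" {
    // The arena could just as well be request-scoped or app-scoped.
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    // Everything loadConfig allocated is released here, in one go.
    defer arena.deinit();

    const json =
        \\{"name": "demo", "tags": ["a", "b"]}
    ;
    const config = try loadConfig(arena.allocator(), json);

    try std.testing.expectEqualStrings("demo", config.name);
    try std.testing.expectEqual(@as(usize, 2), config.tags.len);
}
```

The trade-off is that the lifetime is now implicit: nothing in the type of Config tells the caller that it depends on the arena.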

While it might seem superficial, I really dislike exposing std.json.Parsed(T). The fact that T came from JSON is an irrelevant implementation detail. Also, there's nothing JSON or parsing-specific about std.json.Parsed. It's just an ArenaAllocator and your value. So, while it doesn't really change anything, given no other option, I like to create my own little Parsed(T)-like wrapper:

pub fn Managed(comptime T: type) type {
	return struct {
		value: T,
		arena: *std.heap.ArenaAllocator,

		const Self = @This();

		pub fn fromJson(parsed: std.json.Parsed(T)) Self {
			return .{
				.arena = parsed.arena,
				.value = parsed.value,
			};
		}

		pub fn deinit(self: Self) void {
			const arena = self.arena;
			const allocator = arena.child_allocator;
			arena.deinit();
			allocator.destroy(arena);
		}
	};
}

Which means that I can pass a myApp.Managed(Config) around rather than a std.json.Parsed(Config).

Our original example, rewritten as a function that returns a Managed(Config), looks like:

fn parseConfig(file_path: []const u8, allocator: Allocator) !Managed(Config) {
	const file = try std.fs.openFileAbsolute(file_path, .{});
	defer file.close();

	var buffered = std.io.bufferedReader(file.reader());
	var reader = std.json.reader(allocator, buffered.reader());
	defer reader.deinit();

	const parsed = try std.json.parseFromTokenSource(Config, allocator, &reader, .{
		.allocate = .alloc_always,
	});
	return Managed(Config).fromJson(parsed);
}
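From the caller's side, nothing about JSON leaks out. The sketch below repeats the Managed wrapper so that it compiles on its own, and swaps the file reader for a hypothetical parseConfigFromSlice helper built on std.json.parseFromSlice. The Config fields are assumptions as well.

```zig
const std = @import("std");

// Hypothetical Config, mirroring the imaginary struct described above.
const Config = struct {
    name: []const u8,
    tags: [][]const u8,
};

// Same wrapper as above, repeated so this sketch stands alone.
pub fn Managed(comptime T: type) type {
    return struct {
        value: T,
        arena: *std.heap.ArenaAllocator,

        const Self = @This();

        pub fn fromJson(parsed: std.json.Parsed(T)) Self {
            return .{ .arena = parsed.arena, .value = parsed.value };
        }

        pub fn deinit(self: Self) void {
            const arena = self.arena;
            // The arena itself was created with its child allocator.
            const allocator = arena.child_allocator;
            arena.deinit();
            allocator.destroy(arena);
        }
    };
}

// Hypothetical slice-based counterpart to parseConfig.
fn parseConfigFromSlice(allocator: std.mem.Allocator, json: []const u8) !Managed(Config) {
    const parsed = try std.json.parseFromSlice(Config, allocator, json, .{
        .allocate = .alloc_always,
    });
    // Ownership of the arena moves to the Managed value. We must not call
    // parsed.deinit() here, or the arena would be freed twice.
    return Managed(Config).fromJson(parsed);
}

test "the caller only sees Managed(Config)" {
    const json =
        \\{"name": "demo", "tags": ["a"]}
    ;
    const managed = try parseConfigFromSlice(std.testing.allocator, json);
    defer managed.deinit();

    try std.testing.expectEqualStrings("demo", managed.value.name);
}
```

The one thing to be careful about is ownership: once fromJson has been called, only the Managed value should be deinit'd.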