Zig's @memcpy, copyForwards and copyBackwards

Sep 03, 2024

If you've used Zig for a bit, you've probably come across the @memcpy builtin. It copies bytes from one region of memory to another. For example, if we wanted to concat two arrays, we could write a little helper:

fn concat(comptime T: type, allocator: std.mem.Allocator, arr1: []const T, arr2: []const T) ![]T {
  // const: the slice itself is never reassigned, only the memory it points to
  const combined = try allocator.alloc(T, arr1.len + arr2.len);
  @memcpy(combined[0..arr1.len], arr1);
  @memcpy(combined[arr1.len..], arr2);
  return combined;
}

You might have also come across std.mem.copyForwards and std.mem.copyBackwards. All of these do the same thing, but have slightly different constraints on their parameters. Specifically, each restricts if or how the two parameters can overlap. At first glance, having 3 ways to do the same thing might seem like overkill. If we look at the concat example above, combined is a newly allocated chunk of memory: it can never overlap with the memory of arr1 and arr2.

While you're probably using @memcpy in a way similar to concat, there are equally legitimate cases where the two parameters could overlap. For example, in pg.zig we read messages from the PostgreSQL server into a static buffer. For efficiency, we read as much data as possible - ideally, we fill our buffer. As a result, our buffer could end with a partial payload. As a made up example, consider:

// In reality, we use a much larger buffer
var buffer: [16]u8 = undefined;
const n = try socket.read(&buffer);

Assuming this fills up our buffer (i.e. n == 16), buffer might look like:

'D', 0, 4, 'G', 'O', 'K', 'U',            // our first row
'D', 0, 9, 'O', 'V', 'E', 'R', '9', '0',  // the start of our next row

To complete our 2nd row, we need to read more data from the socket, but our buffer is full. We could dynamically allocate a larger buffer. However, since the data for the 1st row is no longer needed, our buffer is large enough as long as we move our partially-read 2nd row to the beginning. Unlike our concat, which was copying into a totally new memory location, we now need to copy from buffer into buffer. Is that ok? Let's extract this specific example and try:

const std = @import("std");

pub fn main() !void {
  var buf = [16]u8{
    'D', 0, 4, 'G', 'O', 'K', 'U',            // our first row
    'D', 0, 9, 'O', 'V', 'E', 'R', '9', '0',  // the start of our next row
  };

  @memcpy(buf[0..9], buf[7..16]);
  std.debug.print("{any}\n", .{&buf});
}

If you try to run the above, you'll get a runtime panic: panic: @memcpy arguments alias. To solve this, we need to use std.mem.copyForwards. If I were to implement my own memcpy functionality, I'd likely end up with something similar to copyForwards:

fn cp(comptime T: type, dest: []T, src: []const T) void {
  std.debug.assert(dest.len == src.len);
  for (0..src.len) |i| {
    dest[i] = src[i];
  }
}

But, as a general solution, this code has bugs. Consider this usage:

pub fn main() !void {
  var buf = [4]u8{1, 2, 3, 4};
  cp(u8, buf[1..3], buf[0..2]);
  std.debug.print("{any}", .{&buf});
}

We're saying copy the values 1, 2 over 2, 3, so presumably we'd expect the result to be 1, 1, 2, 4. But, with our simple cp function, we'd get 1, 1, 1, 4. The issue is that our cp function overwrites part of src before it's copied into dest. In this case, we can solve the issue by copying backwards. In other words, instead of:

buf[1] = buf[0]; // this overwrites buf[1], which we haven't copied yet
buf[2] = buf[1];

We want to do:

buf[2] = buf[1]; // this copies buf[1] before we overwrite it
buf[1] = buf[0];

You can see why we need both a copyForwards and a copyBackwards. In some cases we need to copy src front to back and in other cases we need to copy it back to front. It depends on which part of the buffers overlaps: if dest starts before src, copy forwards; if dest starts after src, copy backwards. In many other cases, like concat above, where there is no overlap, an implementation is free to optimize the copy operation - perhaps copying multiple bytes using a single operation.
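To make that concrete, here's a small sketch that runs both of the earlier examples through the std.mem functions instead of @memcpy and cp. It assumes a Zig version where std.mem.copyForwards and std.mem.copyBackwards take (T, dest, source); the variable name nums is mine.

```zig
const std = @import("std");

pub fn main() void {
  // dest (buf[0..9]) starts before src (buf[7..16]): copying front to back
  // never overwrites a source byte that hasn't been read yet.
  var buf = [16]u8{
    'D', 0, 4, 'G', 'O', 'K', 'U',
    'D', 0, 9, 'O', 'V', 'E', 'R', '9', '0',
  };
  std.mem.copyForwards(u8, buf[0..9], buf[7..16]);
  std.debug.print("{s}\n", .{buf[3..9]}); // the row's payload: OVER90

  // dest (nums[1..3]) starts after src (nums[0..2]): copy back to front.
  var nums = [4]u8{ 1, 2, 3, 4 };
  std.mem.copyBackwards(u8, nums[1..3], nums[0..2]);
  std.debug.print("{any}\n", .{nums}); // values are now 1, 1, 2, 4
}
```

Swapping the two calls would reproduce the 1, 1, 1, 4 style of corruption we saw with cp.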

Aliasing

In many languages, including Zig, you're allowed to have multiple variables reference the same memory. As we saw above, aliasing can impact the correctness of code. Because of aliasing, @memcpy crashes on overlapping memory and copyForwards and copyBackwards can give unexpected results.

While Zig documents the behavior of all three functions, the issue is pervasive. This code also crashes:

const std = @import("std");

pub fn main() !void {
  var buf: [10]u8 = undefined;
  const prefix = try std.fmt.bufPrint(&buf, "{s}", .{"over"});
  // prefix is a slice of buf, so this call copies buf into itself
  const warning = try std.fmt.bufPrint(&buf, "{s}{d}!", .{ prefix, 9000 });
  std.debug.print("{s}\n", .{warning});
}

Because bufPrint internally calls @memcpy, the second time we call it we get a panic. This is because we're trying to copy prefix (which references buf) into buf.
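One way around this, sketched below, is to make sure nothing passed as an argument points into the destination buffer. Here I give the prefix its own buffer; prefix_buf is a name I made up for the example, and a single bufPrint call that formats both pieces at once would work just as well.

```zig
const std = @import("std");

pub fn main() !void {
  // Separate backing storage: prefix no longer references buf.
  var prefix_buf: [4]u8 = undefined;
  var buf: [10]u8 = undefined;

  const prefix = try std.fmt.bufPrint(&prefix_buf, "{s}", .{"over"});
  const warning = try std.fmt.bufPrint(&buf, "{s}{d}!", .{ prefix, 9000 });
  std.debug.print("{s}\n", .{warning}); // over9000!
}
```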

In addition to correctness, when aliasing is possible, compilers need to be careful about the assumptions they make. This often means avoiding certain optimizations. The output for this convoluted code is 30:

const std = @import("std");

pub fn main() !void {
  var incr = [_]i32{10};
  var result: i32 = 10;

  add(&result, &incr);
  std.debug.print("{d}\n", .{result});
}

fn add(a: *i32, b: []i32) void {
  a.* += b[0];
  a.* += b[0];
}

The code is essentially doing 10 + 10 + 10. If we keep add exactly as-is, but change how we call it, we'll get a different result:

pub fn main() !void {
  var incr = [_]i32{10};
  const result: *i32 = &incr[0];

  add(result, &incr);
  std.debug.print("{d}\n", .{result.*});
}

Even though both cases start with a.* == 10 and b[0] == 10, the result is now 40. Why? Because in this second version, a points to b[0]. Thus the first a.* += b[0] increments b[0] as well: we get 10 + 10 = 20, followed by 20 + 20 = 40.

When I look at add, I don't think about a referencing b[0], but the compiler does. And as unlikely as that might be, the compiler has no choice but to play it safe. Instead of loading b[0] once, it has to load it twice: once for each read. More generally, aliasing means that compilers need to handle the possibility that writing to a pointer affects other variables.

Like I said above, this is a convoluted example. But in the name of completeness, we can see all of this in action, albeit indirectly. If we take the complete code, save it as "test.zig" and run it in ReleaseFast, we'll get the aforementioned output of 40:

// $ zig run test.zig -O ReleaseFast
// 40
const std = @import("std");

pub fn main() !void {
  var incr = [_]i32{10};
  const result: *i32 = &incr[0];

  add(result, &incr);
  std.debug.print("{d}\n", .{result.*});
}

pub fn add(a: *i32, b: []i32) void {
  a.* += b[0];
  a.* += b[0];
}

Now if we mark one or both parameters with noalias and run it again, we'll get 30:

// $ zig run test.zig -O ReleaseFast
// 30
const std = @import("std");

pub fn main() !void {
  var incr = [_]i32{10};
  const result: *i32 = &incr[0];

  add(result, &incr);
  std.debug.print("{d}\n", .{result.*});
}

// only thing that's changed is that we've added noalias
pub fn add(noalias a: *i32, b: []i32) void {
  a.* += b[0];
  a.* += b[0];
}

Why is this happening? Because the noalias hint lets the compiler reorganize the code so that b[0] is read only once. It's important to note that noalias doesn't forbid aliasing. The fact that this code runs, despite a and b[0] referencing the same memory, proves that. It's merely a promise that we're making to the compiler. To be clear, I've never used noalias in the past and I'm 99% sure I'll never use it in the future. The only reason I bring it up is to hopefully explain/show how aliasing impacts the compiler.
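If it helps, here's roughly what I imagine that reorganization looks like if we wrote it by hand. This is only a sketch of the idea, not the compiler's actual output; addLoadOnce and increment are names I made up. Because b[0] is read into a local before any write through a, the result is 30 whether or not a and b overlap, and in any build mode.

```zig
const std = @import("std");

// Hypothetical hand-written version of add: b[0] is loaded exactly once.
fn addLoadOnce(a: *i32, b: []const i32) void {
  const increment = b[0]; // single read, taken before any write through a
  a.* += increment;
  a.* += increment;
}

pub fn main() void {
  var incr = [_]i32{10};
  const result: *i32 = &incr[0]; // a and b[0] still alias

  addLoadOnce(result, &incr);
  std.debug.print("{d}\n", .{result.*}); // 30
}
```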

Conclusion

In most cases, you'll end up using @memcpy. Thankfully, if it ever gets called with overlapping memory, you'll get a runtime panic, at least in Debug and ReleaseSafe builds where safety checks are enabled (I say thankfully, because a runtime panic is better than undefined behavior). Still, unless you're copying into newly allocated memory, it's probably worth spending a few seconds to consider whether the source and destination could overlap and, if so, whether std.mem.copyForwards (or, less likely in my experience, std.mem.copyBackwards) is the correct choice.
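If the direction of the overlap isn't known ahead of time, a small wrapper could pick for you by comparing addresses. To be clear, copyOverlapping below is a hypothetical helper of my own, not something from std, and it assumes both slices have the same length:

```zig
const std = @import("std");

// Hypothetical helper (not part of std): chooses the copy direction
// based on where dest starts relative to src.
fn copyOverlapping(comptime T: type, dest: []T, src: []const T) void {
  std.debug.assert(dest.len == src.len);
  if (@intFromPtr(dest.ptr) <= @intFromPtr(src.ptr)) {
    // dest starts at or before src: front to back is safe
    std.mem.copyForwards(T, dest, src);
  } else {
    // dest starts after src: back to front is safe
    std.mem.copyBackwards(T, dest, src);
  }
}

pub fn main() void {
  var buf = [4]u8{ 1, 2, 3, 4 };
  copyOverlapping(u8, buf[1..3], buf[0..2]);
  std.debug.print("{any}\n", .{buf}); // values are now 1, 1, 2, 4
}
```

This trades a pointer comparison for not having to think about it, which may or may not be worth it when the direction is obvious from the call site.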

Beyond that, for the sake of readability and simplicity, aliasing is something worth minimizing. Some guidelines use the word "avoid", but I've settled on merely being more mindful of it (at least for now); maybe now and again I'll manage to limit the scope where two variables reference the same memory.