Demystifying Write Durability

Much has been said about MongoDB's write durability. Some of it true, some of it not, and some of it was once true but no longer is. This can lead to confusion about a topic where no confusion should exist.

History

There was a time when an insert or update happened in memory with no options available to developers. The data files would get synced periodically (configurable, but defaulting to 60 second). This meant that, should the server crash, up to 60 seconds of writes would be lost. At the time, the answer to this was to run replica pairs (which were later replaced with replica sets). As the number of machines in your replica set grows, the chances of data loss decreases.

But replication is asynchronous, meaning the master could crash before replicating your data. Under this circumstance, the loss-of-data window was much smaller than 60 seconds, but it was still a concern. It therefore became possible, by calling getLastError with {w:N} after a write, to specify the number (N) of servers the write must be replicated to before returning.

At one point, it also became possible to specify {fsync:true} via getLastError, but that killed performance, so it was rarely used.

Then 1.8 added journaling and 2.0 enabled it by default. This largely addressed any concern about MongoDB write durability, especially for single-server setups.

Current State

Although journaling is now enabled by default, it's important to know that it does group commits. That means, writes are still done in memory. The difference though is that the journal file is synced every 100ms (configurable down to 2ms), while the data files continue to sync every 60 seconds. So, there's still a window of possible data loss, though it's much smaller. If you are wondering why it's ok to sync one often (100ms) and the other not (60s), it's because journaling uses an append-only file, which is much faster to write to.

There are two things developers can do to eliminate the remaining gap, both using getLastError. You can use either one, or both. First, the {w:N} option remains available. Secondly, you can specify {j: true}, which won't return until the journal data is committed to disk. (to specify both, you use {j: true, w:N})

Where Does Safe Mode Come In?

Safe mode is yet another option you can pass to getLastError, via {safe: true}. Safe mode only tells you that the server received the data and that it's in memory. That's better than nothing, but it really doesn't give an guarantees about durability. Safe mode should be thought of as validation, specifically to catch things like unique-index violations. It tells you that the data is valid and the writes (journal and data files) will proceed, but it doesn't say anything about when those writes will occur.

An Empowering Journey

Whether or not you agree with the initial assertion that durability is best achieved over multiple servers, the fact is that MongoDB not only has a good single-server solution, but it also has some of the most flexible write capabilities. The ability to opt-in or out of nodes-to-replicate and journal commit, on a per-write basis, is something that a number of developers leverage to maximize their write throughput.