homedark

Redis' slowlog and DEL's surprisingly bad performance

May 10, 2013

Redis has a useful slowlog facility which, as you can probably guess, lets you inspect slow running queries. Slowlog will log the last X queries that took more than Y microseconds to run. X and Y can be specified in either redis.conf or, at runtime, via the CONFIG command:

CONFIG SET slowlog-log-slower-than 5000
CONFIG SET slowlog-max-len 25

The slowlog-log-slower-than is specified in microseconds, so the above will log queries that take over 5 milliseconds to execute. To get the log entries, you use the SLOWLOG GET X command, where X is the number of items you want to retrieve:

SLOWLOG GET 10

This'll display a unique id, the timestamp the query took place, the amount of time the query took to execute and the actual command + parameters that were executed. You can erase the log via SLOWLOG RESET.

The last time that I looked at our slowlog, I was shocked to see that we had DEL commands taking over 20 milliseconds to execute. Remember, Redis is single-threaded, so this blocks (and consequently seriously harms) our system's concurrency. Also, since this is a write operation, it'll block all slaves while it replicates to them. Ugh. What could be going on?

Maybe everyone already knows this but me, but it turns out Redis' DEL command is O(1) for strings and hashes, but O(N) for lists, sets and sorted sets (where N is the number of items in the collection). Deleting a set containing millions of items? Prepare to block.

Our solution was simple: rather than deleting these items, we'd rename them and delete them in a background job in small and interruptible chunks. First, our delayed_delete function:

local key = KEYS[1]
local data_type = redis.call('type', key).ok

if data_type == 'set' or data_type == 'zset' then
  local temp = 'gc:tmp:' .. redis.call('incr', 'gc:ids') .. ':' .. key
  redis.call('rename', key,  temp)
  return redis.call('sadd', 'gc:' .. data_type, temp)
end

return redis.call('del', key)

This'll rename the collection and add that new name to the gc:set or gc:zset sets (we don't use lists, but if you do, you should add support for them as well).

Next we scheduled a Ruby script to run every minute:

require 'redis'
r = Redis.new(driver: :hiredis)
r.srandmember('gc:set', 10000).each do |set|
  items = r.srandmember(set, 5000)
  if items.nil? || items.length == 0
    r.srem('gc:set', set)
    next
  end
  r.srem(set, items)
end

r.srandmember('gc:zset', 10000).each do |zset|
  if r.zremrangebyrank(zset, 0, 5000) < 5000
    r.srem('gc:zset', zset)
  end
end

You'll want to play with the numbers based on your own needs. How big are your collections and how often do they get deleted? Since we don't do these types of deletes too frequently, we can go about deleting small chunks at a time.

While this approach is much slower than a straight delete, it plays much nicer in a concurrent world.