Paging & Ranking With Large Offsets: MongoDB vs Redis vs PostgreSQL
Hopefully you already know that paging with large offsets tends to be inefficient in most systems. Both the MongoDB and the PostgreSQL documentation warn developers away from large offsets. But how bad can it really be?
In mogade we provide two features that are tied to paging/offset performance: leaderboard pages and ranks. Most users only care about the first page or two of a leaderboard, so the offset is small. However, we also provide a leaderboard view that jumps straight to the page containing the user, and that results in a large offset. With 1 million scores, the average case means an offset of 500 000.
Paging
Testing the impact of large offsets is really simple. Assume we have some dummy data that looks like:
db.scores.find();
{lid: ObjectId("4fe506dabb2bfa742d000001"), score: 1, name: 'user_1'}
{lid: ObjectId("4fe506dabb2bfa742d000001"), score: 2, name: 'user_2'}
{lid: ObjectId("4fe506dabb2bfa742d000001"), score: 3, name: 'user_3'}
{lid: ObjectId("4fe506dabb2bfa742d000001"), score: 4, name: 'user_4'}
lid is the leaderboard id. Our test data is made up of 5 leaderboards, each containing 1 200 000 scores. There's an index on lid and score.
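The post doesn't show how the data was seeded, but with the 1.x Ruby driver it might look roughly like this; the batch size and the exact seeding loop are my assumptions:
require 'mongo'

collection = Mongo::Connection.new.db('test').collection('scores')
lids = Array.new(5) { BSON::ObjectId.new }  # one id per leaderboard
lids.each do |lid|
  1_200_000.times.each_slice(1_000) do |batch|
    docs = batch.map { |i| {:lid => lid, :score => rand(10_000_000), :user => "user_#{i}"} }
    collection.insert(docs)  # batch insert 1 000 documents at a time
  end
end
collection.create_index([['lid', 1], ['score', -1]])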
require 'mongo'
require 'benchmark'

collection = Mongo::Connection.new.db('test').collection('scores')
# lids is the array of 5 leaderboard ids
Benchmark.bmbm do |x|
  x.report("mongo small") do
    100.times do |i|
      collection.find({:lid => lids.sample}, {:fields => {:_id => false, :score => true, :user => true}}).sort({:score => -1}).limit(20).skip(i * 20).to_a
    end
  end
  x.report("mongo medium") do
    100.times do |i|
      collection.find({:lid => lids.sample}, {:fields => {:_id => false, :score => true, :user => true}}).sort({:score => -1}).limit(20).skip(i * 1000).to_a
    end
  end
  x.report("mongo large") do
    100.times do |i|
      collection.find({:lid => lids.sample}, {:fields => {:_id => false, :score => true, :user => true}}).sort({:score => -1}).limit(20).skip(i * 10000).to_a
    end
  end
end
All three iterate 100 times, getting 20 records per iteration. The only difference is the offset: the first has a maximum offset of 1 980, the second 99 000 and the last one 990 000. The results? 0.6 seconds, 17 seconds and 173 seconds. The offset hurts because the server still has to walk past every skipped document before it can return the 20 we actually want.
If we use Redis and store the scores in a sorted set, we can rewrite our benchmark like so:
require 'redis'
require 'benchmark'

redis = Redis.new(:driver => :hiredis)
# lids is the same array of 5 leaderboard ids, used here as the sorted set keys
Benchmark.bmbm do |x|
  x.report("redis small") do
    100.times do |i|
      start = i * 20
      # zrevrange's stop index is inclusive, so start + 19 returns 20 entries
      redis.zrevrange(lids.sample, start, start + 19, :with_scores => true)
    end
  end
  x.report("redis medium") do
    100.times do |i|
      start = i * 1000
      redis.zrevrange(lids.sample, start, start + 19, :with_scores => true)
    end
  end
  x.report("redis large") do
    100.times do |i|
      start = i * 10000
      redis.zrevrange(lids.sample, start, start + 19, :with_scores => true)
    end
  end
end
We end up with times of 0.028, 0.025 and 0.028 seconds.
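The sorted sets have to be loaded up front, which the benchmark doesn't show. A minimal sketch of that load, assuming the same lids, user names and score range as the MongoDB data:
lids.each do |lid|
  1_200_000.times do |i|
    redis.zadd(lid, rand(10_000_000), "user_#{i}")  # zadd(key, score, member)
  end
end
In practice you'd want to pipeline these inserts, but that only affects the load time, not the reads being benchmarked.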
Modifying our MongoDB code to test PostgreSQL is trivial. The structure stays the same: we create an index on lid and score, and we benchmark simple selects:
pg.exec("select score, u from scores where lid = $1 order by score limit 20 offset $2", [lids.sample, i * 10000]).to_a
The results, with the same three groups of offsets, are 1, 122 and 650 seconds.
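For reference, the PostgreSQL side might be set up and looped over along these lines; the connection details, table definition and column types are assumptions on my part (only the select itself comes from the post), and only the offset multiplier changes between the three groups:
require 'pg'
require 'benchmark'

pg = PG.connect(:dbname => 'test')
pg.exec("create table scores (lid varchar(24), score integer, u varchar(20))")
pg.exec("create index scores_lid_score on scores (lid, score)")

# lids: the same 5 leaderboard ids used above
Benchmark.bmbm do |x|
  {'pg small' => 20, 'pg medium' => 1000, 'pg large' => 10000}.each do |name, multiplier|
    x.report(name) do
      100.times do |i|
        pg.exec("select score, u from scores where lid = $1 order by score limit 20 offset $2", [lids.sample, i * multiplier]).to_a
      end
    end
  end
end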
Comparing these we get (times in seconds):
mongo small      0.6
mongo medium    17
mongo large    173
redis small      0.028
redis medium     0.025
redis large      0.028
pg small         1
pg medium      122
pg large       650
Ranking
I talked about ranking a while ago, but decided to repeat the experiment. Basically, in MongoDB or a relational database, if you want to get the rank of a score, you need to do:
-- sql
select count(*) from scores where lid = $1 and score > $2

// mongo
db.scores.find({lid: lid, score: {$gt: score}}).count()
Given 1 000 000 scores, an average player will need to count through 500 000 records. How does each system perform?
# scores have a random value of 0 - 10 000 000
Benchmark.bmbm do |x|
  x.report("mongo top rank") do
    1000.times do |i|
      collection.find({:lid => lids.sample, :score => {'$gt' => 9999988 - i}}).count()
    end
  end
  x.report("mongo average") do
    10.times do |i|
      collection.find({:lid => lids.sample, :score => {'$gt' => 5000000 - i}}).count()
    end
  end
  x.report('redis top rank') do
    1000.times do |i|
      redis.zcount(lids.sample, 9999988 - i, 'inf')
    end
  end
  x.report('redis average') do
    1000.times do |i|
      redis.zcount(lids.sample, 5000000 - i, 'inf')
    end
  end
  x.report('pg top rank') do
    1000.times do |i|
      pg.exec("select count(1) from scores where lid = $1 and score > $2", [lids.sample, 9999988 - i]).to_a
    end
  end
  x.report('pg average') do
    10.times do |i|
      pg.exec("select count(1) from scores where lid = $1 and score > $2", [lids.sample, 5000000 - i]).to_a
    end
  end
end
There's something very important in the above benchmark: the MongoDB and PostgreSQL runs for average-ranked players only do 10 iterations, while Redis does 1000. Why? Well, consider the results (times in seconds):
mongo top rank     1.155847
mongo average     22.291007
redis top rank     0.169442
redis average      0.162205
pg top rank        0.714144
pg average        21.771570
Redis is doing 100x the work in less than 1/100th of the time.
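As an aside, when you want the rank of a specific member rather than of an arbitrary score, Redis can answer directly with zrevrank and skip the counting entirely; the user name here is just an example:
rank = redis.zrevrank(lid, 'user_42')  # 0-based position in descending score order
puts rank + 1 unless rank.nil?         # nil means the user isn't in the sorted set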
Memory
I don't want to talk too much about memory. I mean, once you get to the point where it takes 2 seconds to get 20 documents, it really doesn't matter how much less memory you take...it just isn't usable. However, I am surprised that Redis and MongoDB take up roughly the same amount of memory (~900MB). Furthermore, the MongoDB version could be optimized by shrinking the field names and partitioning each leaderboard into its own collection (doing these two changes brings the total storage+index space to 675MB).
Given that MongoDB stores an extra _id field, stores the field name with each document, and stores the leaderboard id with each document, I really expected Redis to have a considerably smaller memory footprint. Anyone know what gives? I'm so surprised that I'm thinking I did something wrong.
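If anyone wants to double-check my numbers, both systems report their own usage; something along these lines should work, though the exact stat names vary a little by version:
stats = collection.stats                    # MongoDB collStats, values in bytes
puts stats['size'], stats['totalIndexSize']

puts redis.info['used_memory']              # Redis INFO, in bytes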
What Can You Do?
Rather than using a large offset, you might try to rewrite your query with an additional where filter. For example, if you knew the last score from the previous page, you could avoid the offset and do:
db.scores.find({lid: lid, score: {$lt: last_score}}).sort({score: -1}).limit(20)
This will perform very well (Redis will still be faster). However, it requires that you know the last page's last score (which we don't in the case of the jump to my page feature). If you do know it, it requires that you persist it and pass it around. And, if you have ties, you might end up skipping results.
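One common workaround for the ties problem (my suggestion, not something the post describes) is to page on the score and _id pair, so documents with equal scores still have a stable order; last_score and last_id are the hypothetical values remembered from the previous page:
collection.find(
  {:lid => lid,
   '$or' => [
     {:score => {'$lt' => last_score}},
     {:score => last_score, :_id => {'$lt' => last_id}}  # ties broken by _id
   ]}
).sort([['score', -1], ['_id', -1]]).limit(20).to_a
Ideally this is backed by a compound index on lid, score and _id.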
The moral of the story is: avoid large offsets and counting over many records. When you can't, use Redis.