High Concurrency LRU Caching
Oct 19, 2013
LRU Cache Introduction
The most common way of implementing an LRU cache is to use a hashtable for lookups and a linked list to track when items were used. You might want to read this introduction if you aren't familiar with the implementation.
There are two Aha! moments most people have writing one. The first is the realization that you need two different data structures. The second is realizing how you link them together. You see, when you GET or SET an item, you do so from the hashtable. However, you need to promote it in the list. Conversely, when you free up space, you do so from the list, but you need to delete it from the hashtable. The result tends to be a wrapper object that looks like:
type Item struct {
key string
value Value
element *list.Element
}
Such an object gives you access to the linked list from the hashtable, or the hashtable from the linked list.
Concurrency
Supporting concurrent access to our cache is pretty simple. It's important to realize though that granular locks are key to achieving high throughput. If you had a single lock for the entire cache, you'd end up serializing access to it. That's not even worth exploring. As a first step, we can create a separate read-write mutex for our hashtable and our list.
Hashtable
A read-write mutex for the hashtable is efficient. Assuming that we are GETing more than we are SETting, we'll mostly be acquiring read-locks (which multiple threads can secure). The only time we need a write lock is when setting an item. If we keep things basic, it ends up looking like:
func (c *Cache) Get(key string) Value {
c.lookupLock.RLock()
item := c.lookup[key]
c.lookupLock.RUnlock()
if item == nil { return nil }
return item.value
}
func (c *Cache) Set(key string, value Value) {
item := &Item{key: key, value: value,}
c.lookupLock.Lock()
defer c.lookupLock.Unlock()
c.lookup[key] = item
}
If necessary, we could always shard our hashtable to support more write throughput.
List
Concurrent access of our list is a much bigger challenge. Why? Because every GET requires a write lock on our list. Promoting an item involves moving a node (or inserting one in the case of an initial set) at the head of the list:
func (c *Cache) Get(key string) Value {
c.lookupLock.RLock()
item := c.lookup[key]
c.lookupLock.RUnlock()
if item == nil { return nil }
c.promote(item)
return item.value
}
func (c *Cache) Promote(item *Item) {
c.listLock.Lock()
defer c.listLock.Unlock()
c.list.MoveToFront(item.element)
}
What can we do?
Window
One of my favorite solutions is to use a window to limit how often you'll promote an item. In other words, if an item was recently promoted, it's probably near enough the front of our list that we don't need to re-promote it. Given a large enough cache (both in terms of total space and number of items), your window could be measured in minutes.
To achieve this we must first change our Item
structure:
type Item struct {
key string
sync.Mutex
value Value
promoted time.Time
element *list.Element
}
Now we can reduce the frequency of write locks on our list:
func (c *Cache) promote(item *Item) {
now := time.Now()
stale := now.Add(time.Minute * -2)
item.Lock()
defer item.Unlock()
if item.promoted.Before(stale) {
item.promoted = now
c.listLock.Lock()
defer c.listLock.Unlock()
c.list.MoveToFront(item.element)
}
}
Buffering
In addition to using a window, we can also promote in a separate dedicated thread. Go channels make this solution particularly clean:
nc New() *Cache {
c := &Cache{
list: list.New(),
lookup: make(map[string]*Item),
promotables: make(chan *Item, 2048),
}
go c.promoter()
return c
}
func (c *Cache) promote(item *Item) {
c.promotable <- item
}
func (c *Cache) promoter() {
for {
item := <- c.promotables
c.list.MoveToFront(item.element)
}
}
Notice that the above code doesn't require a lock on the list. Synchronization is instead done through Go's channels, which gives us buffering for free.
But wait, we're cheating. What about freeing up memory? Well, if we want to keep everything simple and lock free, we can do it all in the promoter (and maybe rename it along the way):
func (c *Cache) worker() {
ms := new(runtime.MemStats)
for {
item := <- c.promotables
c.list.MoveToFront(item.element)
runtime.ReadMemStats(ms)
if ms.HeapAlloc > c.size {
c.gc()
}
}
}
func (c *Cache) gc() {
for i := 0; i < c.itemsToPrune; i++ {
element := c.list.Back()
if element == nil { return }
item := element.Value.(*Item)
c.bucket(item.key).Remove(item.key)
c.list.Remove(element)
}
}
I like this approach. However, we're leaning heavily on our channel's buffer. If the garbage collection takes long, our promotable
channel will fill up, and our promote
method will block.
A small improvement would be to make promote
non blocking:
func (c *Cache) promote(item *Item) {
select {
c.promotable <- item:
default:
}
}
If we're able to write to the channel, great. Otherwise, the item just won't get promoted. (Go's select
is powerful, the default
path will execute if all other paths are blocking).
Rather than failing to promote the latest item, we could even attempt to dump the oldest one:
func (c *Cache) promote(item *Item) {
select {
c.promotable <- item:
default:
<-c.promotable
c.promote(item)
}
}
But this isn't risk free (another goroutines could keep using up our freed slot!)
Intrusive List
What if you aren't comfortable running the promoter and garbage collector in the same thread, for fear of dropping promote requests? Are you back to a coarse lock across the list?
The beauty of a linked list is that it isn't an atomic data structure. When you're promoting an item to the head, you can also safely manipulate the tail, as long as they aren't the same item (or siblings of the item). In other words, synchronizing a linked list can happen at the node level.
(As an aside, a key benefit of a skiplist is exactly this. The ability to lock only those nodes which participate directly in a modification, makes it a concurrent friendly data structure.)
What we want to do is build a linked list with nodes that can be synchronized. Our Item
structure already has a mutex (to synchronize access its promoted
field). So the ideal solution is to build a linked list with Item
as the element. This is called an intrusive linked list since the data and next/prev
pointers sit within the same structure.
We're doing all of this so that we can lock a node at a time and manipulate it as needed, rather than using a single lock across the entire list. This isn't as simple as it might first seem (in fact, I won't provide a working solution). The problem is that, if we aren't careful, we'll create dealocks. How? Imagine a list with 26 items, labeld A-Z:
A<->B<->C<->D<->E<->...X<->Y<->Z
What happens if we want to promote Y
while also freeing up memory? Our promoter will lock Y
, and our GC will lock Z
. Our promoter will then try to lock Z
while our GC will try to lock Y
. This is a deadlock. To make this work, in pseudocode, we need to do something like:
function lockFamily(node)
lock node
if try lock node.next
if try lock node.prev
//we can safely manipulate this chunk
else
unlock node.next
unlock node
lockFamily node //try again
end
else
unlock node
lockFamily node
end
end
Not impossible, but not fun either
Conclusion
Ultimately, our goal was to reduce lock contention against our list. We achieved this in three ways. First, we use a window to limit the frequency of promotion. Second, we use a buffered channel to process promotions in a separate thread. Finally, we can do promotion and GC within the same thread.
This is all preliminary work. The idea came to me while walking home last night, and has been quickly implemented as a prototype. If you're interested, and brave, you can check out the source code here. It takes care of some of the edge cases that I overlooked (such as promoting a new vs existing item).