Basic Micro-Optimizations Part 3
This intersects various things I've previously written about micro-optimization and my most recent overview of the specialized integer set. It's ok if you haven't already read those other posts. All you need to do is understand this function:
func exists(haystack []int, needle int) bool {
l := len(haystack)
for i := 0; i < l; i++ {
if haystack[i] == needle {
return true
}
}
return false
}
This code is looking to see if a value exists within an array. Its performance depends on a number of factor. The most important is the size of our haystack. It also depends on where (and if) we expect to find the needle. A huge haystack won't have a negative impact if the needle is always at the beginning. Still, if we assume a random distribution, we get the following numbers (with the size of the haystack in parenthesis) :
linear (16) 100000000 24.1 ns/op
linear (32) 50000000 35.2 ns/op
linear (512) 10000000 194 ns/op
linear (4096) 1000000 1367 ns/op
The most common way to try and make this better is to have the values in our array sorted. We'll ignore the cost of keeping our values sorted: these are read-heavy data structures and while we'll want to make sure our insertion cost doesn't become unbearable, we'll first see if its even worth it.
The first thing we can do is a binary search. Go's sort
package does this for us, so the code becomes:
func exists(haystack []int, needle int) (int, bool) {
index := sort.SearchInts(haystack, needle)
return index, haystack[index] == needle
}
The return value has changed a bit. We not only return if it exists, but where it exists (or ought to exist). This is information insert/delete would need (we could have an even more optimized version of exists
only for lookups, but one thing at a time). The numbers:
binary (16) 50000000 37.6 ns/op
binary (32) 30000000 44.0 ns/op
binary (512) 20000000 100 ns/op
binary (4096) 10000000 115 ns/op
The binary search shines with a large haystack. It's important to take note of the fact that, for a small haystack, the performance is worse. Why? The smaller the input size, the more significant, proportionally speaking, the constant time of an algorithm. Binary searching requires more comparisons and should benefit less from CPU caches.
With sorted data, the other option is to do a linear search and exit early. Consider:
func early(haystack []int, needle int) (int, bool) {
l := len(haystack)
for i := 0; i < l; i++ {
value := haystack[i]
if value < needle {
continue
}
return i, value == needle
}
return l, false
}
The results aren't surprising. At any size, it's an improvement over the simplest linear solution:
linear-exit (16) 100000000 21.1 ns/op
linear-exit (32) 50000000 28.5 ns/op
linear-exit (512) 10000000 152 ns/op
linear-exit (4096) 2000000 988 ns/op
Can we do more? Well, linear-exit, like linear, is biased towards values at the beginning of the array. When the array is sorted, couldn't be conditionally scan forwards or backwards? Something like:
func exists(haystack []int, needle int) (int, bool) {
l := len(haystack)
if l == 0 {
return 0, false
}
if needle <= haystack[l/2] {
return existsForward(haystack, needle)
}
return existsBackward(haystack, needle)
}
Which gives the following performance:
linear-bi (16) 100000000 21.6 ns/op
linear-bi (32) 50000000 24.8 ns/op
linear-bi (512) 20000000 105 ns/op
linear-bi (4096) 2000000 620 ns/op
The above code looks promising, but in reality, it's just cutting our input in half. In the case of IntSet, we specify a target haystack size, so it's more efficient to get a smaller size upfront versus introducing the overhead of an extra comparison and division per call.
So, did we learn anything? It comes down to trade offs. If we expect to have large haystacks, we'll probably have to burden our inserts to preserve ordering and use binary search. If we expect smaller haystacks, which is already very fast as a linear search, we need to decide if a 25% gain is worth it.