Learning Golang By Benchmarking Set Implementations
As my twitter followers probably know, I've spent the last couple of days having fun learning Google's Go language. My goal is to build something in the FlockDB universe, but for now I'm just getting familiar with things. For those who aren't familiar with it, Go is Google's attempt to build a modern systems programming language. A modern-day C, if you will.
Go has some pretty neat features, and it has some annoying limitations. I'll blog about those when I get more familiar with it. One thing I can definitively say is that it's young, and there appears to be a strong will to keep it slim and focused. The result is a comparatively sparse framework (or set of built-in packages, as they are called). For example, there's no built-in Set.
Please keep in mind that I don't really know what I'm doing.
My goal tonight was to benchmark various Set implementations, in the hope that I'd not only find a good implementation but also familiarize myself with Go. I want to be very clear: I have specific requirements for my Set, so I'm focusing on testing against my use-case. What is my use-case? Basically, I want to populate a collection with integers and explicitly remove all duplicates on demand. In other words, I only care about unique values when I care about unique values - the rest of the time, duplicates are ok. Ordering isn't important. Here's an example of how I might use it:
func Load(users []User) *MySpecialCollection {
	ids := new(MySpecialCollection)
	for _, user := range users {
		for _, id := range user.friendIds {
			ids.Add(id)
		}
	}
	ids.RemoveDuplicates()
	return ids
}
So, I want to get all the unique ids of all the users' friends. Duplicates are removed once at the end, and then the "set" is returned.
We'll begin by defining an interface:
type Set interface {
	Add(value int)
	Contains(value int) (bool)
	Length() (int)
	RemoveDuplicates()
}
We only really care about Add and RemoveDuplicates. However, the Contains and Length methods will help us test our implementation (kinda pointless to see which runs faster if we don't also test that they actually work!)
The first implementation uses a map (Go's built-in hashtable type). This is our baseline because it's the simplest, most obvious solution. Here's the implementation:
type HashSet struct {
	data map[int]bool
}
func (this *HashSet) Add(value int) {
	this.data[value] = true
}
func (this *HashSet) Contains(value int) (exists bool) {
	_, exists = this.data[value]
	return
}
func (this *HashSet) Length() (int) {
	return len(this.data)
}
func (this *HashSet) RemoveDuplicates() {}
func NewSet() (Set) {
	return &HashSet{make(map[int]bool)}
}
If this is your first time looking at Go, you might be a little lost. First we define our type, HashSet, with a private member data (it's private because the name starts with a lowercase letter), which is a map. One really cool thing about Go is that you don't have to explicitly implement an interface. Simply implement its methods and the type can safely be cast to it (it's awesome!). Methods are defined as functions with a receiver, (this *HashSet). All of the implementations are pretty straightforward. RemoveDuplicates does nothing because duplicates are never added (that's just a basic property of a hash/map).
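To make that concrete, here's a tiny sketch of my own (it isn't in the repository): a *HashSet can be used anywhere a Set is expected, with no "implements" declaration in sight.
func implicitInterfaceExample() Set {
	var s Set = &HashSet{make(map[int]bool)} // compiles because *HashSet has all four Set methods
	s.Add(9)
	return s // s.Contains(9) would now return true
}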
Two things about the HashSet implementation are worth exploring. First, the Contains method is somewhat funky. In Go, functions can return multiple values. Getting a value from a map (this.data[value]) returns two values: the actual value at the key, and true/false indicating whether the key existed. We discard the actual value by assigning it to the blank identifier _ and assign whether the key existed to the exists variable - which is actually a named return parameter. The other thing is the NewSet function. Functions like this are how you write a constructor. I think it kinda sucks. Within our "constructor" we actually initialize our map using the special make function. make is something else which I think sucks about Go. Three built-in types (maps, slices and channels) need to be instantiated with make; all other types are instantiated with new (or the &TYPE{} syntax we are using here, which is essentially syntactic sugar for new and is similar to C#'s constructor initializers).
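If the make/new distinction is confusing, here's a small sketch (my own example, not from the set code) of how the three forms differ in practice:
func makeVersusNew() {
	m := make(map[int]bool) // make returns an initialized, ready-to-use map
	m[1] = true

	p := new(map[int]bool) // new only zeroes memory: *p is a nil map
	// (*p)[1] = true      // this would panic, because the map was never initialized

	h := &HashSet{make(map[int]bool)} // composite literal: allocate and initialize in one step
	h.Add(1)
	_ = p
}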
We'll create two tests to make sure our implementation works:
import (
	"math/rand"
	"testing"
)
func TestCanAddUniqueValues(t *testing.T) {
	values := []int{5, 4, 3, 9, 1, 2, 8}
	set := NewSet()
	for _, v := range values {
		set.Add(v)
	}
	set.RemoveDuplicates()
	Assert(t, set.Length() == len(values), "Wrong length")
	for _, v := range values {
		Assert(t, set.Contains(v), "Missing value")
	}
}
func TestUniqueRemovesDuplicates(t *testing.T) {
	values := []int{5, 4, 3, 5, 2, 1, 2, 3, 4, 5, 1, 2, 2}
	uniques := []int{5, 4, 3, 2, 1}
	set := NewSet()
	for _, v := range values {
		set.Add(v)
	}
	set.RemoveDuplicates()
	Assert(t, set.Length() == len(uniques), "Wrong length")
	for _, v := range uniques {
		Assert(t, set.Contains(v), "Missing value")
	}
}
func Assert(t *testing.T, condition bool, message string) {
	if !condition {
		t.Fatal(message)
	}
}
Testing in Go isn't particularly pretty. But at least we'll be able to make sure that whenever RemoveDuplicates is called on our set, duplicates are, in fact, removed.
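(As an aside, one idiom that cuts down on this kind of repetition is a table-driven test. This is just a sketch of the idiom, not code from the repository:)
func TestSetTableDriven(t *testing.T) {
	cases := []struct {
		input   []int
		uniques int
	}{
		{[]int{5, 4, 3, 9, 1, 2, 8}, 7},
		{[]int{5, 4, 3, 5, 2, 1, 2, 3, 4, 5, 1, 2, 2}, 5},
	}
	for _, c := range cases {
		set := NewSet()
		for _, v := range c.input {
			set.Add(v)
		}
		set.RemoveDuplicates()
		if set.Length() != c.uniques {
			t.Fatalf("expected %d unique values, got %d", c.uniques, set.Length())
		}
	}
}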
Now comes the interesting part. Go has built-in benchmarking capabilities, and it's pretty neat. Rather than defining a Test function, we define a Benchmark function, like so:
func BenchmarkSmallSetWithFewCollisions(b *testing.B) {
	for i := 0; i < b.N; i++ {
		set := NewSet()
		for j := 0; j < 100; j++ {
			set.Add(rand.Int() % 700)
		}
	}
}
If you look at the outer for loop, you'll notice that we are looping b.N times. b is the parameter passed into our function. The way it works is that the test engine will loop through our code as many times as it needs to until it feels comfortable that it has an accurate measurement of the cost of a single iteration. This will become clearer shortly.
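With the current toolchain, these run with go test -bench=. from the package directory. One related thing worth knowing: any expensive one-time setup can be kept out of the measurement with b.ResetTimer. Here's a sketch (buildFixture and useFixture are hypothetical placeholders, not from the repository):
func BenchmarkWithExpensiveSetup(b *testing.B) {
	data := buildFixture() // hypothetical one-time setup we don't want to measure
	b.ResetTimer()         // restart the clock so the setup isn't counted
	for i := 0; i < b.N; i++ {
		useFixture(data) // hypothetical work being measured b.N times
	}
}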
My goal was to test the performance of small, medium and large sets (100, 5000 and 100000 items) and, for each of those, with a small, medium and large number of duplicates. BenchmarkSmallSetWithFewCollisions above generates a set of 100 values, where a given value can be between 0 and 699 - so duplicates are pretty rare. Here's what all 9 benchmarks look like:
func benchmark(b *testing.B, size int, fill int) {
	for i := 0; i < b.N; i++ {
		set := NewSet()
		for j := 0; j < size; j++ {
			set.Add(rand.Int() % fill)
		}
	}
}
func BenchmarkSmallSetWithFewCollisions(b *testing.B) {
	benchmark(b, 100, 700)
}
func BenchmarkSmallSetWithMoreCollisions(b *testing.B) {
	benchmark(b, 100, 100)
}
func BenchmarkSmallSetWithManyCollisions(b *testing.B) {
	benchmark(b, 100, 25)
}
func BenchmarkMediumSetWithFewCollisions(b *testing.B) {
	benchmark(b, 5000, 35000)
}
func BenchmarkMediumSetWithMoreCollisions(b *testing.B) {
	benchmark(b, 5000, 5000)
}
func BenchmarkMediumSetWithManyCollisions(b *testing.B) {
	benchmark(b, 5000, 1250)
}
func BenchmarkLargeSetWithFewCollisions(b *testing.B) {
	benchmark(b, 100000, 700000)
}
func BenchmarkLargeSetWithMoreCollisions(b *testing.B) {
	benchmark(b, 100000, 100000)
}
func BenchmarkLargeSetWithManyCollisions(b *testing.B) {
	benchmark(b, 100000, 25000)
}
And, here's the output from running this against our simple HashSet implementation:
set.BenchmarkSmallSetWithFewCollisions 50000 52600 ns/op
set.BenchmarkSmallSetWithMoreCollisions 50000 42991 ns/op
set.BenchmarkSmallSetWithManyCollisions 50000 31406 ns/op
set.BenchmarkMediumSetWithFewCollisions 1000 2663531 ns/op
set.BenchmarkMediumSetWithMoreCollisions 1000 1703337 ns/op
set.BenchmarkMediumSetWithManyCollisions 1000 1411807 ns/op
set.BenchmarkLargeSetWithFewCollisions 50 54807860 ns/op
set.BenchmarkLargeSetWithMoreCollisions 50 48436380 ns/op
set.BenchmarkLargeSetWithManyCollisions 50 34149020 ns/op
The first column is the benchmark that was run. The second column is how many iterations it ran (that is, the value of b.N). The last column is the time it took (in nanoseconds) per iteration. From the above, we can tell that the more collisions there are, the faster the code runs (I'd guess that's because it has to insert fewer values into the map).
For my second implementation, I decided to use an array and scan it for a duplicate before adding each value. The interesting methods here are Add and Contains:
func (this *RealtimeArraySet) Add(value int) {
	if this.Contains(value) == false {
		this.data = append(this.data, value)
	}
}
func (this *RealtimeArraySet) Contains(value int) (exists bool) {
	for _, v := range this.data {
		if v == value {
			return true
		}
	}
	return false
}
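(The rest of the type isn't shown above. It presumably looks something like this - the constructor name is my own placeholder, not necessarily what's in the repository:)
type RealtimeArraySet struct {
	data []int
}
func (this *RealtimeArraySet) Length() (int) {
	return len(this.data)
}
// nothing to remove: duplicates are never added in the first place
func (this *RealtimeArraySet) RemoveDuplicates() {}
func NewRealtimeArraySet() (Set) {
	return &RealtimeArraySet{make([]int, 0)}
}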
And, here are the results:
set.BenchmarkSmallSetWithFewCollisions 100000 22390 ns/op
set.BenchmarkSmallSetWithMoreCollisions 100000 19421 ns/op
set.BenchmarkSmallSetWithManyCollisions 100000 17037 ns/op
set.BenchmarkMediumSetWithFewCollisions 100 19248020 ns/op
set.BenchmarkMediumSetWithMoreCollisions 100 12210990 ns/op
set.BenchmarkMediumSetWithManyCollisions 500 5158366 ns/op
set.BenchmarkLargeSetWithFewCollisions 1 7482466000 ns/op
set.BenchmarkLargeSetWithMoreCollisions 1 4660109000 ns/op
set.BenchmarkLargeSetWithManyCollisions 1 1785185000 ns/op
For small sets, this is much faster. For large sets, it's much slower. This isn't really a surprise. Doing a linear search in the array just can't scale with large sets.
The last implementation I did was to allow duplicates to get added to the array, but to remove them when explicitly told to:
func (this *OnDemandArraySet) Add(value int) {
	// the append function will automatically grow the underlying array if needed
	this.data = append(this.data, value)
}
func (this *OnDemandArraySet) RemoveDuplicates() {
	length := len(this.data) - 1
	for i := 0; i < length; i++ {
		for j := i + 1; j <= length; j++ {
			if this.data[i] == this.data[j] {
				this.data[j] = this.data[length]
				this.data = this.data[0:length]
				length--
				j--
			}
		}
	}
}
We remove duplicates by comparing every value to every other value. This leans heavily on Go slices. You can think of a Go slice as a movable window over an underlying array. Slices are really efficient. When we find a duplicate, we overwrite it with the last element and shrink the slice by 1. So, if we have [1,2,3,2,5,6], it'll become [1,2,3,6,5] (the 6 overwrites the duplicate 2 and the slice is shrunk). Again, all of this is efficient - we aren't creating copies of the array, just playing with pointers.
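If the slice trick isn't obvious, here's a standalone sketch of that exact move on the example above (my own illustration):
func shrinkExample() []int {
	data := []int{1, 2, 3, 2, 5, 6}
	last := len(data) - 1
	data[3] = data[last] // overwrite the duplicate 2 with the last element: [1 2 3 6 5 6]
	data = data[0:last]  // shrink the window by one: [1 2 3 6 5]
	// nothing was copied or reallocated - the slice still points at the same backing array
	return data
}
The results: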
set.BenchmarkSmallSetWithFewCollisions 200000 12362 ns/op
set.BenchmarkSmallSetWithMoreCollisions 200000 12425 ns/op
set.BenchmarkSmallSetWithManyCollisions 200000 12430 ns/op
set.BenchmarkMediumSetWithFewCollisions 5000 682792 ns/op
set.BenchmarkMediumSetWithMoreCollisions 2000 720831 ns/op
set.BenchmarkMediumSetWithManyCollisions 2000 706491 ns/op
set.BenchmarkLargeSetWithFewCollisions 100 12645380 ns/op
set.BenchmarkLargeSetWithMoreCollisions 100 12682960 ns/op
set.BenchmarkLargeSetWithManyCollisions 100 12863000 ns/op
Under all circumstances, this is a clear winner. In the simplest case it's 4.25x faster than the map and 1.8x faster than the original array implementation. In the most complex case it's 2.6x faster than the map and 138x faster than the original array implementation. (I did play with pre-allocating the underlying structures with different sizes; it didn't change anything.)
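(For what it's worth, pre-allocating just means giving make an initial size so the structures don't have to grow as values are added. Something along these lines, with the size picked to match the benchmark:)
func preallocated(size int) (map[int]bool, []int) {
	m := make(map[int]bool, size) // the second argument is a size hint for the map
	s := make([]int, 0, size)     // zero length, but capacity for size values before append must reallocate
	return m, s
}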
Beyond raw speed, we can also measure how much memory each approach takes. For this, I simply enabled Go's memory profiler and looped through each set a fixed number of times. To be honest, I'm not positive what all the values mean, but I extracted the three which seemed the most relevant:
HashSet
Alloc = 1227039448
TotalAlloc = 2249302288
HeapInuse = 1241645056
RealtimeArraySet
Alloc = 1033523080
TotalAlloc = 1033574400
HeapInuse = 1047220224
OnDemandArraySet
Alloc = 3631414928
TotalAlloc = 3631466248
HeapInuse = 3659673600
No huge surprise here. Since the fastest implementation doesn't discard duplicates until we tell it to, it uses the most memory. The RealtimeArraySet uses the least memory because duplicates are discarded as they're added and there's little overhead to the underlying structure (an array). I'm quite surprised at how close the memory footprint of the map is to the array.
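(If you want to pull similar numbers yourself without the profiler, they're also exposed through runtime.MemStats; a minimal sketch:)
import (
	"fmt"
	"runtime"
)

func printMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("Alloc =", m.Alloc)
	fmt.Println("TotalAlloc =", m.TotalAlloc)
	fmt.Println("HeapInuse =", m.HeapInuse)
}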
The source code for this is available at https://github.com/karlseguin/golang-set-fun.