Speeding Up Queries: Understanding Query Plans

Indexes are one of the most important things to master about MongoDB. Proper indexes can mean the difference between a query that takes minutes and one that takes a few seconds. Looking at a query's explain output is the first thing you should do when troubleshooting a slow query.

A query that doesn't use an index can be identified by its BasicCursor cursor type. Such a query must scan every document in the collection. Or, put differently, the number of scanned documents (nscannedObjects) will always be equal to the collection's count. This should only be tolerated for small collection and possibly infrequently-ran background tasks.

> db.unicorns.find({weight: {$gt: 500}}).explain()
cursor: "BasicCursor",
nscannedObjects: 12,
n: 10 // matching documents

A query that uses an index has a cursor of type BtreeCursor (or GeoBrowse-XYZ for a 2D index). The used index is part the cursor type (_id_ in this case). MongoDB decides which query plan to use by occasionally executing them all, and using the first one to finish. You can see all potential plans for a query by passing true to explain.

> db.unicorns.find({_id: ObjectId("SOME_ID")}).explain()
cursor: "BtreeCursor _id_",
nscannedObjects: 1,
n: 1

As important as it is to have MongoDB use an index, it's as important that we make the correct indexes available. Selecting the correct fields to index isn't complicated, but it is important. Even a single field equality operation requires some consideration.

Indexes should be selective. Imagine building an online bookstore, where 90% of your products are categories as books, and the remaining 10% is places in a variety of categories. An index on cat will do wonders when searching for bookmark, but won't be useful for when searching for book. Such low-selectivity indexes often end up wasting space.

Compare the outputs from our two queries. When searching for a rare term, the index is highly effective. However, when we search for a common term, the query takes considerably longer. If 95% of searches happen to be for bookmark this inconsistency might be acceptable. In most cases, it isn't.

Compound indexes, discussed next, can help improve the selectivity of such indexes.

> db.products.find({cat: 'bookmark'}).explain().explain()
cursor: "BtreeCursor cat_1",
nscannedObjects: 9908,
n: 9908,
millis: 34 // took 34 milliseconds

> db.products.find({cat: 'book'}).explain()
cursor: "BtreeCursor cat_1",
nscannedObjects: 910056,
n: 910056,
millis: 2699

Compound Indexes

MongoDB cannot merge multiple indexes. For a given query, a single index will be used. Therefore, well thought-out compound indexes should be your goal. Identifying what group of fields should be indexed together is about understanding the queries that you are running. Beyond that, there are two things to know. First, index ordering matters. Second, and somewhat related, being smart about it so we have the fewest possible indexes.

Field Order

The order of the fields used to create your index is critical. The good news is that, if you get the order wrong, the index won't be used at all, so it'll be easy to spot with explain.

The queries on the right are two different queries. What's important though is that the first query cannot use the index because cat is defined first in our index. You should think of compound indexes as having the field values concatenated together. So, an "index value" of "book405" (cat + qty) can only be used to find something by cat or cat and qty.

The two queries return around the same number of documents, but the second, index-powered query, has to scan a significantly smaller number of objects. This is due to the logarithmic nature of B-trees, which Mongo indexes use.

> db.products.createIndex({cat: 1, qty: 1})
> db.products.find({qty: {$gt: 400}}).explain()
cursor: "BasicCursor",
nscannedObjects: 990001,
n: 196267,
millis: 1342

> db.products.find({qty: {$gt: 400}, cat: 'book'}).explain()
cursor: "BtreeCursor cat_1_qty_1",
nscannedObjects: 178442,
n: 178442,
millis: 635

Sorting

Sorting can be an expensive operation, so it's important to treat the fields that you are sorting on just like a field that you are querying. There is one important difference though, the field that you are sorting must be the last field in your index. The only exception to this rule is if the field is also part of your query, then the must-be-last-rule doesn't apply.

The query on the right sorts on a third field which isn't part of our index. The same documents are returned, but it takes more than 6 times longer. The only solution is to add price as the last field of our index. If however, we wanted to sort on qty or cat, our index wouldn't need to change.

What would happen if we kept our existing index and added a separate index to price? If you think it'll solve the issue, head back to the top and start reading again. A MongoDB query can only use one query. The correct answer is...You can't know unless you run explain. It's a pretty safe bet that MongoDB will use an index, but whether it uses the original index or the new index is up to its query plan (which is based on which one actually runs faster).

> db.products.find({qty: {$gt: 400}, cat: 'book'})
             .sort({price:1}).explain()

cursor: "BtreeCursor cat_1_qty_1",
nscannedObjects: 178442,
n: 178442,
millis: 4047,
scanAndOrder: true, // an indexless sort was used (bad!)

> db.products.createIndex({price:1})
> db.products.find({qty: {$gt: 400}, cat: 'book'})
             .sort({price:1}).explain()

cursor: "BtreeCursor cat_1_qty_1", // for me it used the original

Being Smart About Indexes

Indexes aren't free. They take memory, and impose a performance penalty when doing inserts, updates and deletes. Normally the performance hit is negligible (especially compared to gains in read performance), but that doesn't mean that we can't be smart about creating our indexes.

When we first looked at the importance of field order, we created a single index and looked at two queries. The first query, on qty, couldn't make use of the index due to field ordering. If we just reverse our indexes fields, from {cat: 1, qty: 1} to {qty: 1, cat: 1}, both queries will be able to use the index.

There's no point at looking at other examples. The ability to reorder indexes to satisfy a greater number of queries is highly dependent on your particular scenario. However, I assure you that it happens often, and looking at your queries and indexes from this perspective is the key to fast code.

They Key Points Are...

Learning to use and love explain should be your first priority. If you aren't sure, try it and explain it. Then change it and explain it. And, if you are going to ask people for help with a query, make sure to include the output of explain (as well as db.coll.getIndexes()).

After that, understand two key points. First, while an index is better than no index, the correct index is much better than either. Secondly, MongoDB will only use one index per query, making compound indexes with proper field ordering what you probably want to use.

As the ultimate goal, the closer you can get nscannedObjects to n (scanned documents/indexes vs matching documents), the better things will run.