Multiple Collections vs Embedded Documents

Inevitably, everyone who uses MongoDB has to choose between multiple collections with id references and embedded documents.

Use separate collections

db.posts.find();
{_id: 1, title: 'unicorns are awesome', ...}

db.comments.find();
{_id: 1, post_id: 1, title: 'i agree', ...}
{_id: 2, post_id: 1, title: 'they kill vampires too!', ...}

- or -

Use embedded documents

db.posts.find();
{_id: 1, title: 'unicorns are awesome', ..., comments: [
  {title: 'i agree', ...},
  {title: 'they kill vampires too!', ...}
]}

Don't worry if the choice isn't obvious; that means you get it

Both solutions have their strengths and weaknesses. Learn to use both

Separate collections offer the greatest querying flexibility

// sort comments however you want
db.comments.find({post_id: 3}).sort({votes: -1}).limit(5)

// pull out one or more specific comment(s)
db.comments.find({post_id: 3, user: 'leto'})

// get all of a user's comments joining the posts to get the title
var comments = db.comments.find({user: 'leto'}, {post_id: true})
var postIds = comments.map(function(c) { return c.post_id; });
db.posts.find({_id: {$in: postIds}}, {title: true});

Selecting embedded documents is more limited

// you can select a range (useful for paging)
// but can't sort, so you are limited to the insertion order
db.posts.find({_id: 3}, {comments: {$slice: [0, 5]}})

// you can also select the post without any comments
// (0 excludes a field from the result)
db.posts.find({_id: 54}, {comments: 0})

// you can't use the update's positional operator ($) for field selections
db.posts.find({'comments.user': 'leto'}, {title: 1, 'comments.$': 1})

(there are two separate features, currently in planning/development, which should address this)
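
Because embedded comments come back in insertion order, any other ordering has to happen in application code once the post has been fetched. A minimal sketch in plain JavaScript, assuming each comment carries a numeric `votes` field (as in the separate-collection example); `topComments` is a hypothetical helper, not a MongoDB API, and the sample document is made up:

```javascript
// Hypothetical helper: returns the `limit` highest-voted embedded comments.
// Assumes `post.comments` is an array and each comment has a numeric `votes`.
function topComments(post, limit) {
  var comments = (post.comments || []).slice(); // copy so the post is not mutated
  comments.sort(function (a, b) { return b.votes - a.votes; });
  return comments.slice(0, limit);
}

// In-memory stand-in for a fetched post with embedded comments
var post = {_id: 3, title: 'unicorns are awesome', comments: [
  {title: 'i agree', votes: 2},
  {title: 'they kill vampires too!', votes: 7},
  {title: 'meh', votes: 1}
]};
var best = topComments(post, 2);
```

The whole comments array still has to be loaded, which is the price of sorting on the client.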

A document, including all its embedded documents and arrays, cannot exceed 16MB

Don't freak out: The Complete Works of William Shakespeare is around 5.5MB
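
To get a feel for the limit, a back-of-the-envelope calculation helps. The sketch below assumes an average comment size and a fixed size for the rest of the post; both numbers are invented for illustration, and `maxEmbeddedComments` is a hypothetical helper. In the legacy mongo shell, `Object.bsonsize(doc)` reports the real size of a document.

```javascript
// The document limit: 16MB = 16 * 1024 * 1024 bytes
var MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: how many comments of a given average size fit in one post.
// Ignores per-element BSON array overhead, so treat the result as an upper bound.
function maxEmbeddedComments(avgCommentBytes, postBytes) {
  return Math.floor((MAX_DOCUMENT_BYTES - postBytes) / avgCommentBytes);
}

// Assumed sizes: a 2,000 byte post and 500 byte comments
var capacity = maxEmbeddedComments(500, 2000); // 33550
```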

Separate collections require more work

// finding a post + its comments is two queries and requires extra work
// in your code to make it all pretty (or your ODM might do it for you)
db.posts.find({_id: 9001});
db.comments.find({post_id: 9001})
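
The "extra work" is mostly stitching the two result sets back together. A minimal sketch in plain JavaScript, using in-memory stand-ins for the two query results; `withComments` is a hypothetical helper name:

```javascript
// Hypothetical helper: attach the results of the second query to the first.
function withComments(post, comments) {
  var result = {};
  for (var key in post) { result[key] = post[key]; } // shallow copy of the post
  result.comments = comments.filter(function (c) {
    return c.post_id === post._id; // keep only this post's comments
  });
  return result;
}

// In-memory stand-ins for the two query results
var post = {_id: 9001, title: 'unicorns are awesome'};
var comments = [
  {_id: 1, post_id: 9001, title: 'i agree'},
  {_id: 2, post_id: 9001, title: 'they kill vampires too!'}
];
var full = withComments(post, comments);
```

The result has the same shape as the embedded version, so the rest of the code does not need to know which storage model is in use.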

This also takes [a bit] more space (+1 field, +1 index)

Embedded documents are easy and fast (single seek)

// finding a post + its comments
db.posts.find({_id: 9001});

No big differences for inserts and updates

// separate collection insert and update
db.comments.insert({post_id: 43, title: 'i hate unicrons', user: 'dracula'});
db.comments.update({_id: 4949}, {$set : {title: 'i hate unicorns'}});

// embedded document insert and update
// $push needs the name of the array field to append to
db.posts.update({_id: 43}, {$push: {comments: {title: 'lol @ emo vampire', user: 'paul'}}})
// this specific update requires that we store an _id with each comment
db.posts.update({'comments._id': 4949}, {$inc: {'comments.$.votes': 1}})
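
The `$` in `comments.$.votes` stands for the first array element matched by the query. As a mental model only (this is not how MongoDB implements it), the same update can be sketched over an in-memory document; `incVotes` is a hypothetical name and the sample post is made up:

```javascript
// Hypothetical model of {$inc: {'comments.$.votes': 1}} with query {'comments._id': id}:
// find the first comment whose _id matches and increment its votes in place.
function incVotes(post, commentId) {
  for (var i = 0; i < post.comments.length; i++) {
    if (post.comments[i]._id === commentId) {
      // like $inc, a missing field starts from 0
      post.comments[i].votes = (post.comments[i].votes || 0) + 1;
      return true;  // only the first match is updated
    }
  }
  return false;     // no comment matched, nothing changes
}

var post = {_id: 43, comments: [
  {_id: 4948, title: 'i agree', votes: 3},
  {_id: 4949, title: 'lol @ emo vampire'}
]};
var updated = incVotes(post, 4949);
```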

Documents that grow, such as a post that keeps gaining new comments, will have a padding factor

So, separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.

Embedded documents are good when you want the entire document, the document with a $slice of comments, or with no comments at all.

As a general rule, if you have a lot of "comments" or if they are large, a separate collection might be best.

Smaller and/or fewer documents tend to be a natural fit for embedding.

Virtual collection + positional field selection are relevant upcoming[ish] features

Remember, you can always change your mind. Trying both is the best way to learn.
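
Changing your mind mostly means reshaping documents. A minimal sketch of going from the embedded form to separate collections, working on an in-memory document; `splitPost` is a hypothetical helper, and handing out sequential comment `_id`s is an assumption made for the example:

```javascript
// Hypothetical helper: turn one embedded post into a bare post plus comment documents.
// Comments without their own _id get nextCommentId, nextCommentId + 1, ...
function splitPost(post, nextCommentId) {
  var bare = {};
  for (var key in post) {
    if (key !== 'comments') { bare[key] = post[key]; }
  }
  var comments = (post.comments || []).map(function (c, i) {
    var doc = {_id: nextCommentId + i, post_id: post._id};
    for (var k in c) { doc[k] = c[k]; } // an existing _id on the comment wins
    return doc;
  });
  return {post: bare, comments: comments};
}

var embedded = {_id: 1, title: 'unicorns are awesome', comments: [
  {title: 'i agree'},
  {title: 'they kill vampires too!'}
]};
var parts = splitPost(embedded, 1);
// parts.post could go into db.posts and parts.comments into db.comments
```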