home
Inevitably, everyone who uses MongoDB has to choose between using multiple collections with id references or embedded documents

Use separate collections

  db.posts.find();
  {_id: 1, title: 'unicorns are awesome', ...}

  db.comments.find();
  {_id: 1, post_id: 1, title: 'i agree', ...}
  {_id: 2, post_id: 1, title: 'they kill vampires too!', ...}
- or -

Use embedded documents

  db.posts.find();
  {_id: 1, title: 'unicorns are awesome', ..., comments: [
    {title: 'i agree', ...},
    {title: 'they kill vampires too!', ...}
  ]}
Don't worry, this means that you get it
Both solutions have their strengths and weaknesses. Learn to use both
Separate collections offer the greatest querying flexibility
  // sort comments however you want
  db.comments.find({post_id: 3}).sort({votes: -1}).limit(5)
  
  // pull out one or more specific comment(s)
  db.comments.find({post_id: 3, user: 'leto'})
  
  // get all of a user's comments joining the posts to get the title
  var comments = db.comments.find({user: 'leto'}, {post_id: true})
  var postIds = comments.map(function(c) { return c.post_id; });
  db.posts.find({_id: {$in: postIds}}, {title: true});
Selecting embedded documents is more limited
  // you can select a range (useful for paging)
  // but can't sort, so you are limited to the insertion order
  db.posts.find({_id: 3}, {comments: {$slice: [0, 5]}})
  
  // you can select the post without any comments also
  db.posts.find({_id: 54}, {comments: -1})

  // you can't use the update's position operator ($) for field selections
  db.posts.find({'comments.user': 'leto'}, {title: 1, 'comments.$': 1})
(there are two separate features, currently in planning/development, which should address this)
A document, including all its embedded documents and arrays, cannot exceed 16MB
Don't freak out, The Complete Works of William Shakespeare is around 5.5MB
Separate collections require more work
  // finding a post + its comments is two queries and requires extra work
  // in your code to make it all pretty (or your ODM might do it for you)
  db.posts.find({_id: 9001});
  db.comments.find({post_id: 9001})
this also takes [a bit] more space (+1 field, +1 index)
Embedded documents are easy and fast (single seek)
  // finding a post + its comments
  db.posts.find({_id: 9001});
No big differences for inserts and updates
  // separate collection insert and update
  db.comments.insert({post_id: 43, title: 'i hate unicrons', user: 'dracula'});
  db.comments.update({_id: 4949}, {$set : {title: 'i hate unicorns'}});
  
  // embedded document insert and update
  db.posts.update({_id: 43}, {$push: {title: 'lol @ emo vampire', user: 'paul'}})
  // this specific update requires that we store an _id with each comment
  db.posts.update( {'comments._id': 4949}, {$inc:{'comments.$.votes':1}})
documents that grow, like inserting new comments, will have a padding factor

So, separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.

Embedded documents are good when you want the entire document, the document with a $slice of comments, or with no comments at all.

As a general rule, if you have a lot of "comments" or if they are large, a separate collection might be best.

Smaller and/or fewer documents tend to be a natural fit for embedding.

Virtual collection + positional field selection are relevant upcoming[ish] features
Remember, you can always change your mind. Trying both is the best way to learn.