May 05, 2010

Posted by John


MongoSF Notes and Thoughts

Last week was MongoSF. It marked the first conference for MongoDB in San Francisco and was also my first trip to San Fran. Overall the conference was great. Really good content (which I find rare for conferences) and I had a blast meeting all the people (which is always the case).

Speaker Dinner

Thursday night, 10Gen hosted a dinner for the speakers at Palio D’Asti. It was a great dose of geek mixed with food and drinks. I sat at a table with Ben and Jason (from MongoHQ), Les (from Hashrocket) and Lincoln (from Hotpotato). Lots of good conversation, especially the API discussion with Lincoln (who just posted some thoughts on MongoDB).

Kristina Chodorow

Kristina Chodorow gave the first talk of the day that I attended. She sounded a bit nervous throughout, but had some great content. She talked a bit about the Mongo console and how you can get help and information from it.

db.runCommand({datasize: collectionname})
db.runCommand({dbstats:1})
db.commandHelp('distinct')

One other interesting tidbit is that you can type the name of a shell function without the parentheses to print its source, like so:

> db.activities.count  
function (x) {
    return this.find(x).count();
}

Quite helpful for when you are just spelunking in the shell and are curious about how something works.
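This works because the shell is a JavaScript interpreter: a function referenced without parentheses is just a value, and its string form is its source. Here is a minimal sketch in plain JavaScript. The collection object and its find and count helpers are hypothetical stand-ins, not the shell's real implementation.

```javascript
// Hypothetical stand-in for a shell collection object; not MongoDB's real code.
const collection = {
  docs: [{ type: "post" }, { type: "comment" }, { type: "post" }],
  find(filter) {
    const keys = Object.keys(filter || {});
    return this.docs.filter((doc) => keys.every((k) => doc[k] === filter[k]));
  },
  count: function (x) {
    return this.find(x).length;
  },
};

// Referencing the function without parentheses yields the function itself;
// converting it to a string reveals its source.
const source = String(collection.count);
console.log(source);

// Adding parentheses calls it instead.
const postCount = collection.count({ type: "post" });
console.log(postCount);
```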

$slice and $elemMatch

She also showed a few examples using $slice to paginate embedded documents.

db.blog.posts.find({_id: 123}, {comments: {$slice : [20, 10]}}) // skip 20, limit 10
db.blog.posts.find({_id: 123}, {comments: {$slice : -20}}) // last 20
db.blog.posts.find({_id: 123}, {comments: {$slice : [-20, 10]}}) // start 20 from the end, limit 10
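To make the skip/limit behavior concrete, here is a rough in-memory imitation of how I understand $slice to treat its argument: a single number takes from the front (positive) or the back (negative), and a [skip, limit] pair with a negative skip counts back from the end. The sliceProjection function is a name I made up for this sketch. It is plain JavaScript, not how MongoDB implements the operator.

```javascript
// In-memory imitation of $slice projection semantics (illustrative only).
function sliceProjection(items, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    // A negative skip counts back from the end of the array.
    const start = skip < 0 ? Math.max(items.length + skip, 0) : skip;
    return items.slice(start, start + limit);
  }
  // A single number: positive takes the first n, negative takes the last n.
  return spec >= 0 ? items.slice(0, spec) : items.slice(spec);
}

// 50 fake comments numbered 1..50.
const comments = Array.from({ length: 50 }, (_, i) => i + 1);

const page = sliceProjection(comments, [20, 10]);     // comments 21..30
const lastTwenty = sliceProjection(comments, -20);    // comments 31..50
const fromEnd = sliceProjection(comments, [-20, 10]); // comments 31..40

console.log(page, lastTwenty, fromEnd);
```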

And an example of how to use $elemMatch, which is new in 1.4:

db.blog.posts.find({
  comments: {
    '$elemMatch': {
      date  : {'$gt': today}, 
      votes : {'$gt': 10}
    }
  }
}, {})
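The point of $elemMatch is that a single comment has to satisfy both conditions. If you query the two fields separately, one comment can satisfy the date condition while a different one satisfies the votes condition. Here is a small plain JavaScript sketch of that difference. The sample posts and the today cutoff are made up for illustration, and this is an in-memory imitation rather than the real query engine.

```javascript
// Illustrative in-memory matcher; not MongoDB's query engine.
const today = new Date("2010-05-05"); // hypothetical cutoff date

const posts = [
  {
    _id: 1,
    comments: [
      { date: new Date("2010-05-06"), votes: 2 },  // recent, few votes
      { date: new Date("2010-04-01"), votes: 50 }, // old, many votes
    ],
  },
  {
    _id: 2,
    comments: [{ date: new Date("2010-05-06"), votes: 12 }], // recent and popular
  },
];

const isRecent = (c) => c.date > today;
const isPopular = (c) => c.votes > 10;

// $elemMatch-style: one comment must satisfy both conditions.
const elemMatchIds = posts
  .filter((p) => p.comments.some((c) => isRecent(c) && isPopular(c)))
  .map((p) => p._id);

// Separate conditions: different comments may satisfy each one.
const separateIds = posts
  .filter((p) => p.comments.some(isRecent) && p.comments.some(isPopular))
  .map((p) => p._id);

console.log(elemMatchIds, separateIds);
```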

Dwight Merriman

Dwight’s first talk was about schema design. I didn’t really learn anything new, but it was reaffirming to see some of the recommendations he made that we have already implemented in Harmony.

  • One thing I really liked is he referred to array and hash keys as a “contains” relationship. Great term.
  • Embedding documents is just a way of pre-joining data, which is a good way of explaining.
  • The 4MB document limit is arbitrary and there only to make you think about your design. If you have docs larger than that, maybe you should have a different schema. It is a technical limit meant to nudge you in the correct direction.
  • MongoDB is not a key/value store. It does not store opaque blob objects. You can reach into those documents with queries and indexes.
  • In Mongo, map/reduce is more for aggregation and reporting.
  • You can embed things like comments in a post, because the 4MB limit is about the size of a full-length book.

Hierarchical Data

Three ways to store hierarchical data:

  • Embed: if your tree is not that big and will fit in one document.
  • Parent Links: parent_id key in each document. Getting children is easy, but getting descendants is harder.
  • Array of Ancestors: each document has an array of ancestors and a parent_id. We actually do this in Harmony. parent_id stores the parent and allows easy access to children: db.items.find({parent_id:...}). parent_ids is the array of ancestors, which makes getting ancestors really easy, db.items.find({_id:{'$in': [1, 2,...]}}), as well as querying for descendants: db.items.find({parent_ids:...}).
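Here is a rough in-memory sketch of the array of ancestors approach in plain JavaScript. The tree and the helper names are invented for illustration. It assumes ancestors are fetched by matching _id against the item's parent_ids array, and that querying an array field for a value matches when the array contains it.

```javascript
// Hypothetical tree: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
const items = [
  { _id: 1, parent_id: null, parent_ids: [] },
  { _id: 2, parent_id: 1, parent_ids: [1] },
  { _id: 3, parent_id: 1, parent_ids: [1] },
  { _id: 4, parent_id: 2, parent_ids: [1, 2] },
];

const ids = (docs) => docs.map((d) => d._id);

// Children: like find({parent_id: id}).
const childrenOf = (id) => items.filter((d) => d.parent_id === id);

// Descendants: like find({parent_ids: id}), which matches when the array contains id.
const descendantsOf = (id) => items.filter((d) => d.parent_ids.includes(id));

// Ancestors: the documents whose _id appears in the item's parent_ids,
// like find({_id: {$in: item.parent_ids}}).
const ancestorsOf = (item) => items.filter((d) => item.parent_ids.includes(d._id));

console.log(ids(childrenOf(1)));
console.log(ids(descendantsOf(1)));
console.log(ids(ancestorsOf(items[3])));
```

The tradeoff is that moving a subtree means rewriting parent_ids on every descendant, so this fits trees that are read far more often than they are rearranged.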

Other Stuff

  • Single collection inheritance: You can use type (or _type as MongoMapper does). You can index it as well obviously for querying by type.
  • Atomicity: At the document level ($operators, compare and swap)
  • Sharding: You can use a collection per customer or something instead of sharding on a key. If you have fewer than 10k or whatever, this might be a better fit. You can have 100k collections and nothing bad will happen.
  • Capped collections: think of them by size, not document count, as the size is what matters. Great for LRU stuff like history. They preserve insert order. No deletes, and no updates that grow the document. The profiler uses a capped collection. Great for logging.
  • --nssize option to set the number of collections allowed
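To picture the capped collection behavior described above (bounded by size, insert order preserved, oldest entries pushed out first), here is a toy model in plain JavaScript. The CappedLog class is something I made up for this sketch, and it approximates document size with the length of the JSON string. It is not how MongoDB stores anything.

```javascript
// Toy model of a capped collection: bounded by bytes, insertion-ordered,
// oldest entries evicted first. Not MongoDB's storage implementation.
class CappedLog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.entries = [];
  }

  insert(doc) {
    // Approximate the document's size by its JSON length (an assumption of this sketch).
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) throw new Error("document larger than the cap");
    // Evict from the front until the new document fits.
    while (this.usedBytes + size > this.maxBytes) {
      const evicted = this.entries.shift();
      this.usedBytes -= evicted.size;
    }
    this.entries.push({ doc, size });
    this.usedBytes += size;
  }

  docs() {
    return this.entries.map((e) => e.doc);
  }
}

// Each of these documents serializes to 19 characters, so a 60-byte cap holds three.
const log = new CappedLog(60);
for (let i = 1; i <= 5; i++) log.insert({ n: i, msg: "hit" });

console.log(log.docs());
```

Because the cap is on bytes, the number of documents that survive depends on how big each one is, which is why thinking in size rather than count matters.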

Other Talks

The other talks I went to that I did not take a lot of notes on were by Gilt Groupe and Justin.tv. Gilt is using Mongo for a lot of serious real-time analytics. They showed a fun demo of a node/mongo/canvas graph that updates with order information live. It is called Hummingbird. Justin.tv uses Mongo for a lot of analytics as well, such as A/B testing and tracking the funnel people go through to get to places. Lots of map/reduce goodness and data.

The final talk that I can remember anything about was Replication by Dwight (again). He demo’d replication live and talked a lot about what you should be aware of. The main thing I remember is that replica pairs are deprecated and replica sets will be out in a month or two (they sound awesome).

My Talk

Overall, I was happy with how my talk went. Though my content was probably the least technical, I finished right on time and seemed to keep everyone’s interest. Some noticed that I am slightly insane (hahaha) and others were impressed with the polish. The feedback drove me to post some presentation tips yesterday.

Conclusion

I love conferences around this size (200ish). You get to know more people and, for whatever reason, the talks seem to have more guts. MongoSF did not disappoint. It was great to meet a lot of the people I have had Twitter conversations with, and I left quite inspired.

I would definitely recommend attending MongoNY, MongoUK, or MongoFR if you can. Video for MongoSF should be out soon, but until then, feel free to spend some time flipping through all the presentations.

If you are sad you missed MongoSF and would like some in-depth training, be sure to sign up for the IdeaFoundry training in Holland, MI on May 24-25th (taught by Kyle Banker and me).

Labels: Events

1 Comment

  1. Nice post John. I also love how Dwight described the “contains” relationship. I was looking for a way to describe the difference between certain types of one-to-many relationships (post has many tags) and others (author has many posts). The “contains” relationship is the best way to quantify this difference that I know of. Thanks for pointing it out.


About

Authored by John Nunemaker (Noo-neh-maker), a web developer and programmer who has fallen deeply in love with Mongo. More about John.
