Tips from a Production MongoDB Deployment
May 3
Even though we’re still in private-beta, we’re already pushing MongoDB fairly hard — it turns out people have a lot of emails!
MongoDB is amazing for dealing with these sorts of big-data problems. However, there’s no such thing as a free lunch. MongoDB makes sharding and replication easy, but not free. While building out our first production cluster, I learned a ton.
This post goes over some of my lessons learned, while undertaking a production MongoDB deployment:
The Servers
Attachments.me is running on AWS. 10gen has a great presentation on deploying to this architecture. Here are the specifics of our current deployment.
Replica Set 1
- 2 x Large EC2 instances, Master/Secondary
- 1 x Micro EC2 Instance, Arbiter
Replica Set 2
- 2 x large EC2 instances, Master/Secondary
- 1 x Micro EC2 instance, Arbiter
Config Servers
- 3 x micro EC2 instances.
Rails Boxes (Running Mongos)
- N x Medium High-CPU instances
Crawler Boxes (Running Mongos)
- N x Medium High-CPU instances
The Operating System
I run ubuntu. My clean install was based on ami-4b4ba522, which is a Lucid AMI.
Installing MongoDB
Here are the steps lifted directly from mongodb.org:
echo ” deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen” | sudo tee -a /etc/apt/sources.list
sudo apt-key adv —keyserver keyserver.ubuntu.com —recv 7F0CEB10
sudo apt-cache update
sudo apt-get install mongodb-10gen
Optimizing for MongoDB
There are a couple nifty tricks that can be used to improve MongoDB’s performance:
1. Turn off directory/file access-time logging:
- edit /etc/fstab
- add noatime,nodiratime to the options for the drives being mounted.
- our fstab looks something like this:
/dev/sdb /mnt auto defaults„noatime,nodiratime,comment=cloudconfig,nobootwait 0 0
2. Increase the available file descriptors:
cat » /etc/security/limits.conf « EOF
* hard nofile 65536
* soft nofile 65536
EOF
Mounting the Ethereal Drive
To avoid changing MongoDB’s default logging and data directories, I added these lines to fstab:
/mnt/var/lib/mongodb /var/lib/mongodb none bind
/mnt/var/log/mongodb /var/log/mongodb none bind
Create An AMI
Once I had everything running, I made an AMI of the box. When I spin-up more MongoDB instances, all the above tweaks are already in place. Here’s a great article on how to make an EBS AMI Image.
Putting the Pieces Together
All of the servers in our MongoDB cluster start with an install resembling the one outlined above. As a general Rule, MongoDB does not play well with others. With the exception of mongos processes, which run on the application servers, each component of the cluster runs on its own EC2 instance.
Creating the First Replica Set
Our MongoDB deployment started as a single replica set. The easiest way to setup a sharded infrastructure, or to add new shards to your infrastructure, is to start by creating replica sets in isolation.
1. We started with two large EC2 instances and a micro EC2 instance for the arbiter.
2. I edited the /etc/mongo.conf on each and uncommented the replicationSet line:
replSet = attachmentsme-xx
3. Next we had to make sure that all of the boxes could talk to each other. You will need to setup security groups on EC2 to allow this. I run firewalls on each box, and had to also add ips to a whitelist on each box.
4. Once the boxes could talk to each other, I could hop on one of the large instances and run a command resembling this:
config = {_id: ‘foo’, members: [
{_id: 0, host: ‘large_instance_1:27017’},
{_id: 1, host: ‘large_instance_2:27017’},
{_id: 2, host: ‘micro_instance:27017’, arbiterOnly: true}]
}
rs.initiate(config)
5. Now just wait, it will take a while for the local.x files to be created on each box.
Let’s Shard This Puppy
We’re using MongoDB, and sharding is the secret-sauce of web-scale. To move towards a sharded infrastructure, from a single replica set, here’s the approach we took:
We started by getting all the components in place. Starting with a sharded cluster that includes just the first replica set.
Config Servers
1. We ran three micro EC2 instances as our config servers.
2. To keep stuff nice and consistent, I modified /etc/init/mongodb.conf to the following command, instead of the default:
/usr/bin/mongod — —configsvr —config /etc/mongodb.conf
The Mongos Processes
The mongos processes know the specifics about routing requests to the MongoDB cluster. They allow your application to simply connect to localhost, pretty slick.
1. I modified the mongodb.conf file on these boxes, this time to resemble this:
/home/ubuntu/run/mongos.pid —logpath /var/log/mongos.log —configdb x.x.x.x:27019,x.x.x.x:27019,x.x.x.x:27019
where the ips of each config server are passed to the configdb parameter.
Putting Stuff Together
1. I connected to mongodb on localhost of one of the application servers.
2. I could now add the replica set with this command:
db.runCommand( { addshard : “attachmentsme-xx/<serverhostname>[:<port>]” } );
3. Now I could actually shard a collection!
use admin
db.runCommand( { shardcollection : “db.table” , key : { shard_key : 1 } , unique : true } );
Adding More Shards
To add new shards to attachments.me, we can now just repeat the process.
1. Create a new replica set in isolation, with a different name.
2. Again, hop on the application box and add the shard.
Automation
MongoDB has a lot of moving parts to keep track of, i.e., there’s a lot of room for stuff to blow up. We took some steps to deal with this:
1. Fabric Scripts
Making sure that every server can talk to every other server is actually really annoying. We created fabric scripts to automate the process of adding and removing boxes to firewall whitelists.
2. Monit
We have monit setup to monitor all the pieces of our MongoDB deployment. If an essential piece of infrastructure tips over, our faithful employee monit attempts to restart it and sends us an email.
Epilogue
We’re still learning a ton about our MongoDB administration. While there are a lot of moving parts, getting things going was fairly straight-forward. What’s exciting, is that we now have an architecture that we know is easy to scale.
Would you like to play with some of this architecture yourself? Did I mention we’re hiring?
— Ben Coe (@benjamincoe)
