It’s been a long time since my first post on EC2 AMI Creation Tips. At the time, the primary images people were using were the Red Hat-based ones supplied by Amazon, but I was trying to do something Ubuntu-based. Since then, a whole host of other well-prepared images have become available. I was even lucky enough to be invited to create an AMI for Sun’s launch of OpenSolaris on EC2, but am not allowed to say much more about it…
Recently, though, I’ve been speaking to more and more people who are trying to take an existing AMI and customise it for their own use. They do this by booting the AMI they want to base theirs on, doing the customisation, then bundling up that volume. Generally they do pretty well, but three common themes crop up that often cause pain: transient runtime configuration getting bundled up, the time (and to a lesser extent the effort) it takes to bundle the new image in the first place, and making further changes to the image down the line.
Thankfully there is a single, simple solution to all three problems: bundle from an image, not a running volume, and keep that image (or a set of images) along with some nice helper scripts on an EBS volume. That’s the theory, but as always there’s something in the real world that stops it being easy. By default, only the owner of an image can download and unpack it directly from S3, and the bundle is encrypted in a way that requires the owner’s EC2 private key to decrypt it. For this process to work, you’ll need to bootstrap yourself at least once by going through the well-known and well-documented process of bundling a running system. After that, though, it’s easy. Really. Promise…
Before I go into how we fix them, though, let’s take a closer look at the problems we’re trying to solve.
Transient Runtime Pollution
The most frustrating of these is the udev system recording the source machine’s MAC address for eth0. This makes the custom image unusable, because the network interface does not come up. Every new instance gets a different MAC address, so udev gives its network card the next free name (typically eth1), and the configuration for eth0 never applies. There are still distributions out there which try to be “helpful” by remembering which physical device, such as a network card, maps to which logical device name, such as eth0. This is not in itself a bad thing. It’s one feature I was crying out for 10 years ago when I started using Linux on bigger iron. I used to dread adding another network card to a server, because I would normally end up having to re-label the external interfaces. The thing is, though, that you’re now creating an image that could be running anywhere, and you don’t have physical or even console access to it.
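If you do end up re-bundling a system that has already booted, the remembered mapping has to go. On Ubuntu and similar udev-based systems it is typically kept in /etc/udev/rules.d/70-persistent-net.rules. Treat that file name as an assumption and check your own distribution. Here’s a minimal sketch that clears the file from an image mounted under a given root directory (the function name is hypothetical):

```shell
#!/bin/sh
# Remove udev's remembered NIC-to-name mapping from an image mounted at $1.
# The rules file name is an assumption based on Ubuntu-style udev setups.
clear_persistent_net_rules() {
    root="$1"
    rules="$root/etc/udev/rules.d/70-persistent-net.rules"
    if [ -e "$rules" ]; then
        rm -f "$rules"
        echo "removed $rules"
    fi
}
```

For example: clear_persistent_net_rules /ebs/mnt. On those systems, udev normally writes a fresh rule on the next boot, based on whatever network card the new instance actually has.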
Helper scripts are another example of things that break. Because we’re now on an operating system image that is meant to be able to run anywhere, there are certain things you want to run only once, the very first time the system boots. Once they’ve run, these scripts either create a lock file, clear their own executable bits or even delete themselves. If you’re trying to re-bundle an image you’ve already booted, you need to make sure you back out these changes.
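The lock-file variant of this run-once pattern looks roughly like the following minimal sketch. The lock file path and both function names are hypothetical. Real first-boot scripts each pick their own marker, so you’d need to check what yours actually create.

```shell
#!/bin/sh
# Hypothetical marker path, relative to the filesystem root.
FIRSTBOOT_LOCK_REL="var/lib/firstboot.done"

# Run a command only if the lock file doesn't exist yet, then create the lock.
run_once() {
    lock="$1"
    shift
    if [ -e "$lock" ]; then
        return 0
    fi
    "$@" || return 1
    mkdir -p "$(dirname "$lock")"
    touch "$lock"
}

# Back out the "already ran" marker inside an image mounted at $1.
reset_first_boot() {
    rm -f "$1/$FIRSTBOOT_LOCK_REL"
}
```

On a booted system, a first-boot script would call something like run_once "/$FIRSTBOOT_LOCK_REL" some-setup-command. Before re-bundling, you would run reset_first_boot /ebs/mnt so the command runs again on the new instance.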
Doing your customization in an image that has never actually been booted helps you keep all these things pristine.
Time and Effort of Creation
This one is actually quite straightforward. When you bundle a running volume, this is what happens under the hood:
- A new sparse file for the image is created
- A new filesystem is created on the new image file
- This new filesystem is mounted somewhere
- The contents of your running volume are copied into the new image file
- The new filesystem is then unmounted
- The image file is compressed and encrypted
- The compressed and encrypted file is then split into chunks
- Your manifest is created
Some of these steps are very I/O intensive. When you’re working with an image, though, steps 1 to 5 don’t happen (well, steps 3 and 5 are still needed so you can make your changes), so you’ll be doing almost 50% less I/O. This means that bundling a new image will take about half the time. If you keep your image on an EBS volume it’ll be even faster, as EBS has better performance characteristics than the standard instance store.
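To make the I/O cost more concrete, here’s a rough, unprivileged illustration of steps 1, 6 and 7 using standard tools. It’s a sketch of the idea, not what the AMI tools literally run. Encryption is left out, the chunk size is just a parameter, and steps 2 to 5 are skipped because they need root.

```shell
#!/bin/sh
# Step 1: create a sparse file of $2 megabytes; no data blocks are written yet.
make_sparse_image() {
    dd if=/dev/zero of="$1" bs=1048576 count=0 seek="$2" 2>/dev/null
}

# Steps 6 and 7 without encryption: compress image $1 and split the
# compressed stream into parts of $3 bytes named $2.part.aa, $2.part.ab, ...
compress_and_split() {
    gzip -c "$1" | split -b "$3" - "$2.part."
}
```

Even though the sparse file takes up almost no disk space, gzip still has to read through the full apparent size of the image. With a real, populated image, that read is where much of the bundling time goes.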
Bundling and uploading images are not simple commands, though. You need to specify things like your AWS access key and provide your EC2 private key and certificate. There are options for which kernel and ramdisk to use. There’s a lot of typing, which means a lot of room for human error. The way around this is to keep small shell scripts with all these options in them. Now they are simple commands…
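Here’s a minimal sketch of what such a bundle-and-upload.sh script might look like. It’s an illustration, not a definitive version, and it makes several assumptions. It relies on the /ebs directory layout described in step 3, and the image name, bucket, architecture and optional kernel/ramdisk IDs are placeholders. It also assumes nothing has the image loop-mounted when it runs. The flags are the standard ec2-bundle-image and ec2-upload-bundle options.

```shell
#!/bin/sh
# Hypothetical bundle-and-upload.sh sketch, assuming the /ebs layout from step 3.
# IMAGE, BUCKET, ARCH, KERNEL and RAMDISK are placeholders to edit or override.
EBS="${EBS:-/ebs}"
IMAGE="${IMAGE:-$EBS/your-image-name}"
BUCKET="${BUCKET:-your-bucket}"
ARCH="${ARCH:-i386}"
KERNEL="${KERNEL:-}"    # optional aki-... ID; empty means use the default
RAMDISK="${RAMDISK:-}"  # optional ari-... ID; empty means use the default

# Run a command, or just print it when DRY_RUN=1 (note: that prints your secret key too).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Compress, encrypt and split the image into $EBS/upload.
bundle() {
    set -- ec2-bundle-image -i "$IMAGE" -d "$EBS/upload" -r "$ARCH" \
        -k "$EBS/.ec2/ec2-pk.pem" -c "$EBS/.ec2/ec2-cert.pem" \
        -u "$(cat "$EBS/.ec2/id")"
    if [ -n "$KERNEL" ]; then set -- "$@" --kernel "$KERNEL"; fi
    if [ -n "$RAMDISK" ]; then set -- "$@" --ramdisk "$RAMDISK"; fi
    run "$@"
}

# Upload the parts and manifest; the manifest name defaults to the image file name.
upload() {
    run ec2-upload-bundle -b "$BUCKET" \
        -m "$EBS/upload/$(basename "$IMAGE").manifest.xml" \
        -a "$(cat "$EBS/.ec2/s3.access")" -s "$(cat "$EBS/.ec2/s3.secret")"
}

# Only act when invoked as bundle-and-upload.sh, so the file can also be sourced.
case "$0" in
    */bundle-and-upload.sh|bundle-and-upload.sh) bundle && upload ;;
esac
```

Running DRY_RUN=1 ./bundle-and-upload.sh prints the two commands instead of executing them. Once uploaded, the image still has to be registered with ec2-register before you can launch it.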
Once you’ve got your new AMI looking the way you want and doing the things you need, chances are that a few weeks after you start using it, you’ll find a security fix or package update you’d like to apply. Often this ends up with people starting the whole process from scratch again: boot up a new instance of the AMI you want to update, update it, then retype all those commands and try to remember the options you used to bundle the volume and upload the new one. If you’d kept your scripts and image on an EBS volume, you could simply attach it to a running instance and make the fixes there using the same scripts you used last time. How’s that for repeatability?
“So, just how do I work with an image then?” I hear you ask. Here’s a basic outline to get you started.
1. Set up your environment
These steps assume that you have the EC2 AMI and API tools installed locally, and that you’re running the commands on an EC2 instance. If you don’t have them, please look at EC2 AMI Tools and EC2 API Tools.
You’ll also want some environment variables configured to make life easier.
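The exact set depends on how you installed the tools. Here’s a typical set, assuming both tool kits were unpacked under your home directory and your key files live in ~/.ec2. All of these paths are placeholders.

```shell
#!/bin/sh
# Placeholder install locations: point these at wherever you unpacked the tools.
export EC2_HOME="${EC2_HOME:-$HOME/ec2-api-tools}"
export EC2_AMITOOL_HOME="${EC2_AMITOOL_HOME:-$HOME/ec2-ami-tools}"
export PATH="$EC2_HOME/bin:$EC2_AMITOOL_HOME/bin:$PATH"

# The API tools read these instead of needing -K and -C on every command.
# Once your EBS volume is set up, you could point them at /ebs/.ec2 instead.
export EC2_PRIVATE_KEY="${EC2_PRIVATE_KEY:-$HOME/.ec2/ec2-pk.pem}"
export EC2_CERT="${EC2_CERT:-$HOME/.ec2/ec2-cert.pem}"

# The API tools are Java-based, so JAVA_HOME must also point at a Java runtime.
```

Put these in your shell profile, or source them before starting work.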
2. Create your EBS Volume
The hardest thing will be working out how big to make it. The absolute worst case is 20 GB per image, but in reality 10 GB should be plenty. Remember, though, that an EBS volume can only be attached to instances in the availability zone it was created in, so this command creates it in the same zone as the instance you’re on.
ec2-create-volume -s 10 -z `curl http://169.254.169.254/2008-09-01/meta-data/placement/availability-zone`
3. Prepare the Volume
First, attach the volume to a running EC2 instance. Make sure it’s at least the same type (i386 or x86_64) as the image you’re working on.
ec2-attach-volume vol-<your vol id> -i `curl http://169.254.169.254/2008-09-01/meta-data/instance-id` -d /dev/sdp
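ec2-attach-volume returns while the attachment is still in progress, so the device may not exist yet when you try to create a filesystem on it. A small polling helper avoids that race. The function name and the default 60-second timeout are my own choices.

```shell
#!/bin/sh
# Poll once a second until block device $1 exists, giving up after $2 seconds.
wait_for_device() {
    dev="$1"
    timeout="${2:-60}"
    waited=0
    while [ ! -b "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $dev" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example: wait_for_device /dev/sdp 120 && mke2fs -j /dev/sdp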
An EBS volume is a raw bit bucket. You could partition it (if you’re into that kind of thing), but partitions don’t really make sense here, so just create a nice shiny filesystem on it once it’s attached to the instance.
mke2fs -j /dev/sdp
Here I’m making an ext3 filesystem, but you can use any filesystem that’s supported by the host machine. Do make sure, though, that the block device you specify (in this example /dev/sdp) matches the device you told EC2 to attach your EBS volume as.
mount /dev/sdp /ebs
This mounts your new filesystem on a directory called /ebs, which needs to exist already (run mkdir /ebs first if it doesn’t).
You’ll also want a few handy directories on the volume to help you along with the process. This is what they’re for:
- mnt: Will be used as the mount point to access your image
- download: This is where you’ll download your initial bundle to
- upload: When you bundle an image, put it here ready to be uploaded
- .ec2: This will contain your AWS access keys and your EC2 PEM files as follows:
- s3.secret: S3 Secret Key
- s3.access: S3 Access Key
- ec2-pk.pem: EC2 Private Key
- ec2-cert.pem: EC2 Certificate
- id: EC2 user ID (Note: AWS account number, NOT Access Key ID)
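The layout above can be created, and the .ec2 contents sanity-checked, with a couple of small helpers like these. The function names are hypothetical. Restricting .ec2 to mode 700 is a suggestion, not something the tools require. The 12-digit check reflects the standard length of an AWS account number.

```shell
#!/bin/sh
# Create the working directories under $1 (defaults to /ebs).
make_layout() {
    root="${1:-/ebs}"
    mkdir -p "$root/mnt" "$root/download" "$root/upload" "$root/.ec2"
    # Keys and certificates live here, so keep the directory private.
    chmod 700 "$root/.ec2"
}

# Check that every expected file in $1 (defaults to /ebs/.ec2) exists and is non-empty.
check_ec2_dir() {
    dir="${1:-/ebs/.ec2}"
    rc=0
    for f in s3.secret s3.access ec2-pk.pem ec2-cert.pem id; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            rc=1
        fi
    done
    # id must hold the AWS account number, not the Access Key ID.
    if [ -s "$dir/id" ] && ! grep -Eq '^[0-9]{12}$' "$dir/id"; then
        echo "$dir/id does not look like a 12-digit AWS account number" >&2
        rc=1
    fi
    return "$rc"
}
```

Run make_layout /ebs once after mounting the volume. Copy your keys into /ebs/.ec2, then run check_ec2_dir before going any further.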
ec2-download-bundle -b your-bucket -a `cat /ebs/.ec2/s3.access` -s `cat /ebs/.ec2/s3.secret` -k /ebs/.ec2/ec2-pk.pem -d /ebs/download -p your-image-name
This command pulls down the bundle you want to customise from S3. As I said before, though, this will only work if you have sufficient rights on S3 to download the bundle and you hold the EC2 private key of the account that bundled and encrypted it in the first place.
ec2-unbundle -k /ebs/.ec2/ec2-pk.pem -s /ebs/download -d /ebs -m /ebs/download/your-image-name.manifest.xml
This command decrypts and uncompresses the image file from the downloaded bundle. It takes a while…
Now you’re ready to work. All you need now are two shell scripts in your /ebs directory. work.sh mounts your image (if it’s not mounted already) and chroots you into it, and you’re up and running: customise to your heart’s content. When you’re done, make sure you’ve logged out of all your work.sh sessions (yes, you can run more than one) and then run bundle-and-upload.sh.
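Here’s a hypothetical sketch of what work.sh could look like. It assumes the unbundled image file sits at /ebs/your-image-name (the placeholder name used above) and is loop-mounted on /ebs/mnt. Mounting /proc inside the chroot is an addition of mine, since many package tools expect it. A second session finds the image already mounted and just chroots in.

```shell
#!/bin/sh
# Hypothetical work.sh sketch: loop-mount the image if needed, then chroot into it.
EBS="${EBS:-/ebs}"
IMAGE="${IMAGE:-$EBS/your-image-name}"
MNT="$EBS/mnt"

# Succeeds if $1 is currently a mount point, according to /proc/mounts.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

work() {
    if ! is_mounted "$MNT"; then
        mount -o loop "$IMAGE" "$MNT" || return 1
        mount -t proc proc "$MNT/proc" || return 1
    fi
    chroot "$MNT" /bin/bash -l
}

# Only act when invoked as work.sh, so the file can also be sourced.
case "$0" in
    */work.sh|work.sh) work ;;
esac
```

It needs root, so run it as sudo /ebs/work.sh. Before running bundle-and-upload.sh, unmount /ebs/mnt/proc and then /ebs/mnt, in that order, so nothing is writing to the image while it’s being bundled.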
When you’re done, just shut down your host machine. When you want to work on the image again later, just boot up a new instance, attach your volume, mount it and you’re at step 4 already.