Speeding up host TCP metric collection

11 November 2015 Chris Read Leave a comment

We currently use Sensu to monitor our environment, and I’ve taken to using standalone checks to collect various metrics. Standalone metrics don’t rely on the server to issue a check request which provides more reliable interval between checks. One of the metrics we collect is the number of TCP sockets in each of the possible states on each server. We started off using the metrics-netstat-tcp.rb check from the excellent set of Sensu community plugins.

This plugin was doing the job quite nicely, until I noticed that some of our machines had widely varying intervals for publishing this data, especially when under load. This started to be noticeable once the server passed roughly 10k connections, and got worse as the number of connections increased. Given that it’s not uncommon for some of our servers to handle in the region of 100k connections during busy times, I decided to have a closer look at what was going on. Closer inspection on one of the servers revealed that the script was pegging a CPU core at 100% and still taking around 10s to complete when a server had ~60k TCP connections in various states – not a good use of valuable resources.

Taking a look at the code of this plugin for the first time, everything looked pretty reasonable and nicely readable, but the large regular expression running for every line in /proc/net/tcp looks glaringly suspicious. As my skills with awk are greater than my skills with ruby, I decided it would be quicker for me to simply rewrite the check using a tool that was built for running efficiently over large text files. The result a few minutes later was metrics-netstat-tcp.awk. Although the parameters are not the same, the output and functionality matches making it an almost but not quite drop in replacement.

The more important feature for me though is that collecting the metrics on a machine with ~60k connections now completes in under 60ms instead of around 10s. Hopefully the lesson for everyone else is that the older tools are still around for a reason, and you need to know when and how to pick the right tool for the right job.

Categories: DevOps, Linux, Monitoring, Networks

ZFS on Linux ‘insufficient replicas’ panic

17 December 2014 Chris Read Leave a comment

I run a lovely little HP N54L MicroServer at home to keep all my important bits. It’s been a faithful companion for many years across two continents. I’m running Ubuntu LTS on it, booting off a small SSD but keeping years worth of backups across two ZFS mirrors.

I discovered this evening that the little PCIe card I was using for my boot drive had failed. There’s a spare SATA port on the motherboard I never bothered using (it’s only SATA II, the SSD is SATA III), so I just pulled out the old card and booted off the onboard controller. Imagine the horror when I got the following response to my zpool status after the first boot:

root@dumpy:~# zpool status
  pool: first
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
	or invalid.  There are insufficient replicas for the pool to continue
	functioning.
action: Destroy and re-create the pool from
	a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	first       UNAVAIL      0     0     0  insufficient replicas
	  mirror-0  UNAVAIL      0     0     0  insufficient replicas
	    sda     UNAVAIL      0     0     0
	    sdb     FAULTED      0     0     0  corrupted data

  pool: second
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
	or invalid.  There are insufficient replicas for the pool to continue
	functioning.
action: Destroy and re-create the pool from
	a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	second      UNAVAIL      0     0     0  insufficient replicas
	  mirror-0  UNAVAIL      0     0     0  insufficient replicas
	    sdc     FAULTED      0     0     0  corrupted data
	    sdd     FAULTED      0     0     0  corrupted data

The whole point of having two separate mirrors was so that bad things like this would need something more serious than an unconnected controller failure corrupting them!

After taking a deep breath I had a look at the data again, and at the rest of my system. /dev/sda was now my boot SSD, but ZFS thought it was part of an array. Looks like using the on board port had shuffled drive names around. This data is stored in /etc/zfs/zpool.cache to speed up mounting on boot. Moving drives around had invalidated this information.

So, I did the following:

rm /etc/zfs/zpool.cache
Rebooted the machine (unloading the ZFS modules should also theoretically work)
zpool import <my pools>

And all my bits were back in the correct order!

root@dumpy:~# zpool status
  pool: first
 state: ONLINE
  scan: scrub repaired 0 in 3h27m with 0 errors on Sun Dec 14 03:27:14 2014
config:

	NAME                                          STATE     READ WRITE CKSUM
	first                                         ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-WDC_WD20EARX-00PASB0_WD-WMAZA6447754  ONLINE       0     0     0
	    ata-WDC_WD20EARX-00PASB0_WD-WMAZA6448154  ONLINE       0     0     0

errors: No known data errors

  pool: second
 state: ONLINE
  scan: scrub repaired 0 in 9h42m with 0 errors on Sun Dec 14 09:42:32 2014
config:

	NAME                                          STATE     READ WRITE CKSUM
	second                                        ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0231617  ONLINE       0     0     0
	    ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0221030  ONLINE       0     0     0

errors: No known data errors

I initially created the system with an early 0.6.0 release candidate of ZFS on Linux, which is why it was doing something as silly as identifying drives by /dev/sd? in the first place. Now I’m running on the 0.6.3 release I’m happy to see it using drive serial numbers instead.

Hopefully this information will save someone from blowing away a valid mirror and having to restore from backups…

Categories: Linux Tags: sysadmin, zfs

£106.50 per Terabyte Storage Server

2 June 2011 Chris Read Leave a comment

With the price of storage dropping all the time, there is a constant perception from people who don’t deal with it every day that “disk space is cheap”, especially when it comes to developers. The problem is that so called “Enterprise” storage costs are still astronomical compared to what people are used to paying for home storage – even when using SATA disks.

A lot of this extra cost comes from a perceived requirement for the highest available capacity, availability and performance. Achieving all three characteristics is expensive, but if you’re willing to sacrifice one of them then costs start to fall considerably. Lowering requirements on two of the three drops it even more.

One of the teams I work with has a requirement primarily on capacity. Performance and availability are nice, but capacity is the key. We generate gigabytes worth of log files every day, but didn’t have one place to store it all for easy analysis. Just before I joined the team they’d purchased the cheapest “Enterprise” storage system the IT team at the time would allow – it ended up costing in the region of £12k for 12TB of raw storage. That’s £1000 per TB!

In addition to the price, the other problems were accessibility and management of the data and managing growth. This inspired a hunt for something that would provide a cheaper and more flexible solution.

Our requirements were:

*nix based system. The current storage solution was based on Windows Storage Server, but all our systems and tools for this team are Linux based. Yes, Windows does technically provide things like an NFS server, but fighting with the file system permissions and overall performance are two things that impacted us.
Cheap to expand. We need to have a clear path to grow the storage in the server easily by simply adding more disks.
Large filesystems. There’s nothing more wasteful from a storage point of view than having lots of small filesystems. Besides the management overhead, there’s also many wasted blocks lying around un-used.
Cheap to build. This inevitably means commodity hardware.
Reasonable availability. We don’t need 99.999% uptime, but would be happy with somewhere in the region of 90%+
Reasonable performance. Primary access to the data on this machine is via gigabit Ethernet. As long as it can keep up with the network card we’re happy…

Quick Chef Tip

11 August 2010 Chris Read Leave a comment

I’m busy wiring together a new server configuration environment using Windows Deployment Services (don’t ask), Cobbler and Chef. So far things seem to be going quite well, until I bumped in to the following error trying to get a new client to register with the Chef server:

HTTP Request Returned 401 Unauthorized: Failed to authenticate!

A quick sift through Google results didn’t get anything usable. A quick sniff of the packets going over the wire though showed that it was authenticating using a signed certificate. Normally when you sign HTTP requests like that you add some kind of timed expiry. Could the problem be clock related?

Sure enough, a quick check on the new client and the server showed that there was just over an hour time difference. Getting the time on the client and the server in sync got the client registered!

Categories: DevOps, Linux, Unix

EC2 AMI Creation Tips Part 2: Work with Images, not Volumes

8 April 2009 Chris Read 1 comment

It’s been a long time since my first post on EC2 AMI Creation Tips. At the time the primary images people were using were the RedHat based ones supplied by Amazon, but I was trying to do something Ubuntu based. Since then a whole host of other well prepared images are now available. I was even lucky enough to be invited to create an AMI for Sun’s launch of OpenSolaris on EC2, but am not allowed to say much more about it…

Recently though I’ve been speaking to more and more people who are trying to take an existing AMI and customise it for their own use. They do this by booting the AMI they want to base theirs on, doing the customization, then bundling up that volume. Generally they do pretty well, but there are three common themes that crop up that often cause pain: transient runtime configuration being bundled up, the time (and to a lesser extent effort) it takes to bundle the new image in the first place and making further changes to the image down the line.

Thankfully there is a simple single solution to these three problems – bundle from an image, not a running volume, and keep that image (or a set of images) along with some nice helper scripts on an EBS volume. That’s the theory, but as always there’s something in the real world that stops it being easy. By default, only the owner of an image can download and unpack an image directly from S3 and the images are encrypted with the owners EC2 private key. For this process to work, you’ll need to at least bootstrap yourself initially by going through the well known and well documented process of bundling a running system. After that though it’s easy. Really. Promise…

Let’s have a closer look at the problems we’re trying to solve first though before I go into how we fix them.

Transient Runtime Pollution

The most frustrating of these is the udev system flagging the source machines MAC address for eth0 and so making their custom image unusable because the network interface does not come up. There are still distributions out there which try to be “helpful” by remembering which physical device, such as network card, maps to which logical device name, such as eth0. This is not in itself a bad thing. This is one feature I was crying for 10 years ago when I started using Linux on bigger iron. I would dread adding another network card to a server because I would normally end up having to re-label the external interfaces. The thing is though that you’re now creating an image that could be running anywhere and you don’t have physical or even console access to it.

Other examples of things that break are helper scripts. Because we’re now on an operating system image that is meant to be able to run anywhere, there are certain things you want to run only once the very first time the system boots. Once they’ve run the these scripts either create a lock file, clear their own executable bits or even delete themselves. If you’re trying to re-bundle and image you’ve already booted, you need to make sure you back out these changes.

Doing your customization in an image that has never actually been booted helps you keep all these things pristine.

Time and Effort of Creation

This one is actually quite straight forward. When you’re bundling a running volume, what happens under the hood is:

A new sparse file for the image is created
A new filesystem is created on the new image file
This new filesystem is mounted somewhere
The contents of your running volume is copied into the new image file
The new filesystem is then unmounted
The image file is compressed and encrypted
The compressed and encrypted file is then split into chunks
Your manifest is created

Some of these steps very I/O intensive. When you’re working with an image though, steps 1 to 5 don’t happen (well, steps 3 and 5 are needed for you to make changes) so you’ll be doing almost 50% less IO. This means that bundling a new image will take about half time. If you work with your image on an EBS volume it’ll be even faster as they have better performance characteristics than the standard instance stores.

Bundling and uploading images are not simple commands though. You need to specify things like your AWS access key and provide your EC2 encryption key. There’s options for which kernels and ramdisks to use. There’s lots of typing which means lots of room for human error. The way to get around this is to have small shell scripts with all these options in them. Now they are simple commands…

Maintenance

Once you’ve got your new AMI looking the way you want and doing the things you need, chances are that a few weeks after you’ve started using it you find that there’s a security fix or package update you’d like to apply. Often this ends up with people starting the whole process from scratch again. Boot up a new instance of the AMI you want to update, update it, type in all those commands and remember the options you used to bundle the volume and upload the new one. If you kept your scripts and image on an EBS you could simply attach it to a running instance and make the fixes there using the same scripts you used last time. Hows that for repeatability?

“So, just how do I work with an image then?” I hear you ask. Here’s a basic outline to get you started.

1. Set up your environment

These steps assume that you have the EC2 AMI and API tools installed locally, and that you’re running the commands on an EC2 instance. If you don’t have them, please look at EC2 AMI Tools and EC2 API Tools.

You also need some environment variables configured to make life easier:

EC2_PRIVATE_KEY=/path/to/your/EC2/private/key
EC2_CERT=/path/to/your/EC2/cert

2. Create your EBS Volume

The hardest thing will be working out how big you need to make it. Absolute worst case will be 20gb per image, but in reality 10gb should be plenty. Remember though that an EBS can only be mounted in the availability zone it is created in, so this command creates one in the same zone you are in.

ec2-create-volume -s 10 -z `curl http://169.254.169.254/2008-09-01/meta-data/placement/availability-zone`

3. Prepare the Volume

First, attach the volume to a running EC2 instance. Make sure it’s at least the same type (i386 or x86_64) as the image you’re working on.

ec2-attach-volume vol-<your vol id> -i `curl http://169.254.169.254/2008-09-01/meta-data/instance-id` -d /dev/sdp

An EBS volume is a raw bit bucket. You need to partition it (if you’re in to that kind of thing) and create the filesystem on it. Partitions don’t really make sense here though, so just create a nice shiny filesystem on it once it’s mounted on the instance.

mke2fs -j /dev/sdp

In this instance I’m making an EXT3 filesystem, but you can use any filesystem that’s supported by the host machine. Please make sure though that the block device you specify (in this example /dev/sdp) matches what you told EC2 to mount your EBS volume on.

mkdir /ebs

mount /dev/sdp /ebs

This mounts your new filesystem on a directory called /ebs

mkdir /ebs/mnt

mkdir /ebs/download

mkdir /ebs/upload

mkdir /ebs/.ec2

This creates some handy directories to help you along with the process. This is what they do:

mnt: Will be used as the mount point to access your image
download: This is where you’ll to download your initial bundle to
upload: When you bundle an image, put it here ready to be uploaded
.ec2: This will contain your AWS access keys and your EC2 PEM files as follows:
- s3.secret: S3 Secret Key
- s3.access: S3 Access Key
- ec2-pk.pem: EC2 Private Key
- ec2-cert.pem: EC2 Certificate
- id: EC2 user ID (Note: AWS account number, NOT Access Key ID)

ec2-download-bundle -b your-bucket -a `cat /ebs/.ec2/s3.access` -s `cat /ebs/.ec2/s3.secret` -k /ebs/.ec2/ec2-pk.pem -d /ebs/download -p your-image-name

This command pulls down the bundle you want to customise from S3. As I said before though, this will only work if you have sufficient rights on S3 to download the image and the EC2 private key that bundled it up and encrypted it in the first place.

ec2-unbundle -k /ebs/.ec2/ec2-pk.pem -s /ebs/download -d /ebs -m /ebs/download/your-image-name.manifest.xml

This command uncompresses and decrypts the image file from the downloaded bundle. It takes a while…

4. Customise

Now you’re ready to work. All you need is two shell scripts to go in your /ebs directory. work.sh mounts up your image (if it’s not mounted already) and chroot’s you in and you’re now up and running – customise to your hearts content. When you’re done, make sure you’ve logged out of all your work.sh scripts (yes you can run more than one) and then run bundle-and-upload.sh.

When you’re done, just shut down your host machine. When you want to work on it again later, just boot up a new AMI, attach your volume, mount it up and you’re at step 4 already.

Have fun…

Categories: Cloud Computing, Linux, Unix Tags: ami, aws, cloud, Cloud Computing, ec2

EC2 AMI Creation Tips

19 November 2007 Chris Read 1 comment

While we were still working on Buildix 2, people started asking about an AMI for Buildix on Amazons EC2. This didn’t seem to be such a big ask, but now that I’ve finally gotten around to working on this I’ve found it can be a bit fiddly! While there is a lot of good documentation in the various sections of the EC2 site, I still had a quite a few head scratching moments trying to create my own Ubuntu 7.04 Server image to load Buildix into.

The Buildix image is now available for public use as ami-e4ca2f8d.
Read more…

Categories: Linux, Software, Unix, Virtualization

CruiseControl and Buildix 2 at JAOO 2007

26 September 2007 Chris Read Leave a comment

I’ve been at JAOO for the past few days, and while here I had a chance to do a presentation with Erik Doernenburg on Continuous Integration and CruiseControl. We used the new Beta version of Buildix 2 to show people the new CruiseControl Dashboard, and quite a few people were impressed with it. Favourite features were the CCTray integration, and the ability to see the status of a large number of projects at a glance.

As always, there were also people who were interested in hearing about how it can be used for non Java projects. I had a good chat to one person who is interested in using it on a mixed Common LISP and Erlang project he’s working on. I’m looking forward to hearing how it goes for him. Due to lack of experience on my part I could unfortunately not help him much with the darcs problems he’s having though. Some people have all the fun…

It was also quite useful to speak to people about the problems they’re currently facing when trying to use CruiseControl. A common theme is people trying to manage large numbers of builds, or trying to build products across large numbers of different platforms. These are problems the dedicated ThoughtWorks development team are currently working on, so it’s great to get the validation that we’re putting effort into the things people care about now.

Categories: Development, Linux, Software, Unix

Buildix Demo at London 2.0 RC6

8 August 2006 Chris Read Leave a comment

For those of you in the London area who would like to know a bit more about Buildix, see it in action or just ask questions – I’ll be showing it off (so to speak) at London 2.0 RC6. For more info on where and when, check out Sam Newman’s blog entry. I think we even have a few CD’s left from Agile 2006 for those who are interested…

Categories: Linux, Software, Unix

Introducting Buildix – The Agile Development Platform on a disk

7 July 2006 Chris Read Leave a comment

Ever since I started working for ThoughtWorks, I have heard people saying things along the lines of “Wouldn’t it be nice if we had some kind of Cruise-in-a-box to help us get projects up and running quickly?”. After about 6 months I was thinking the same thing, and started tinkering around with various options. Nothing really happened, until the first week of January this year when a group of us who often fill “Build Master” type rolls were all in the office together with a few days unassigned to clients. We were all sitting around a desk together catching up, when somehow the topic once again emerged. With the critical mass in place, this sparked off the birth of Buildix.

The whole point of Buildix is to help any Java based Agile Development Project get up and running as quickly as possible buy providing them with pre-configured and integrated version control system, continuous integration framework, wiki and issue tracking system. We chose our favourite products in each of these areas – Subversion, CruiseControl and Trac. Another common difficulty faced by our development teams, especially in the early stages of a project, is network access. Sometimes all we get from our clients when we arrive on site is a switch to allow us to get our laptops to talk to each other – no DNS, file shares, anything. Buildix can also help in situations like this, as it also runs Samba, and will run as a DNS and DHCP server if given the correct kernel boot parameters.

So, six months after we started, and after a few internal releases, we decided to give something back to the community that helps us do our job, and make Buildix available to everyone. Enjoy!

Categories: Development, Linux, Software, Unix

Chris Read

Archive

Speeding up host TCP metric collection

ZFS on Linux ‘insufficient replicas’ panic

£106.50 per Terabyte Storage Server

Read more…

Quick Chef Tip

EC2 AMI Creation Tips

CruiseControl and Buildix 2 at JAOO 2007

Buildix Demo at London 2.0 RC6

Introducting Buildix – The Agile Development Platform on a disk

@cread

Recent Posts

Top Posts

Archives

Meta