We currently use Sensu to monitor our environment, and I’ve taken to using standalone checks to collect various metrics. Standalone metrics don’t rely on the server to issue a check request which provides more reliable interval between checks. One of the metrics we collect is the number of TCP sockets in each of the possible states on each server. We started off using the metrics-netstat-tcp.rb check from the excellent set of Sensu community plugins.
This plugin was doing the job quite nicely, until I noticed that some of our machines had widely varying intervals for publishing this data, especially when under load. This started to be noticeable once the server passed roughly 10k connections, and got worse as the number of connections increased. Given that it’s not uncommon for some of our servers to handle in the region of 100k connections during busy times, I decided to have a closer look at what was going on. Closer inspection on one of the servers revealed that the script was pegging a CPU core at 100% and still taking around 10s to complete when a server had ~60k TCP connections in various states – not a good use of valuable resources.
Taking a look at the code of this plugin for the first time, everything looked pretty reasonable and nicely readable, but the large regular expression running for every line in /proc/net/tcp looks glaringly suspicious. As my skills with awk are greater than my skills with ruby, I decided it would be quicker for me to simply rewrite the check using a tool that was built for running efficiently over large text files. The result a few minutes later was metrics-netstat-tcp.awk. Although the parameters are not the same, the output and functionality matches making it an almost but not quite drop in replacement.
The more important feature for me though is that collecting the metrics on a machine with ~60k connections now completes in under 60ms instead of around 10s. Hopefully the lesson for everyone else is that the older tools are still around for a reason, and you need to know when and how to pick the right tool for the right job.
I run a lovely little HP N54L MicroServer at home to keep all my important bits. It’s been a faithful companion for many years across two continents. I’m running Ubuntu LTS on it, booting off a small SSD but keeping years worth of backups across two ZFS mirrors.
I discovered this evening that the little PCIe card I was using for my boot drive had failed. There’s a spare SATA port on the motherboard I never bothered using (it’s only SATA II, the SSD is SATA III), so I just pulled out the old card and booted off the onboard controller. Imagine the horror when I got the following response to my
zpool status after the first boot:
root@dumpy:~# zpool status pool: first state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://zfsonlinux.org/msg/ZFS-8000-5E scan: none requested config: NAME STATE READ WRITE CKSUM first UNAVAIL 0 0 0 insufficient replicas mirror-0 UNAVAIL 0 0 0 insufficient replicas sda UNAVAIL 0 0 0 sdb FAULTED 0 0 0 corrupted data pool: second state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://zfsonlinux.org/msg/ZFS-8000-5E scan: none requested config: NAME STATE READ WRITE CKSUM second UNAVAIL 0 0 0 insufficient replicas mirror-0 UNAVAIL 0 0 0 insufficient replicas sdc FAULTED 0 0 0 corrupted data sdd FAULTED 0 0 0 corrupted data
The whole point of having two separate mirrors was so that bad things like this would need something more serious than an unconnected controller failure corrupting them!
After taking a deep breath I had a look at the data again, and at the rest of my system.
/dev/sda was now my boot SSD, but ZFS thought it was part of an array. Looks like using the on board port had shuffled drive names around. This data is stored in
/etc/zfs/zpool.cache to speed up mounting on boot. Moving drives around had invalidated this information.
So, I did the following:
- Rebooted the machine (unloading the ZFS modules should also theoretically work)
zpool import<my pools>
And all my bits were back in the correct order!
root@dumpy:~# zpool status pool: first state: ONLINE scan: scrub repaired 0 in 3h27m with 0 errors on Sun Dec 14 03:27:14 2014 config: NAME STATE READ WRITE CKSUM first ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-WDC_WD20EARX-00PASB0_WD-WMAZA6447754 ONLINE 0 0 0 ata-WDC_WD20EARX-00PASB0_WD-WMAZA6448154 ONLINE 0 0 0 errors: No known data errors pool: second state: ONLINE scan: scrub repaired 0 in 9h42m with 0 errors on Sun Dec 14 09:42:32 2014 config: NAME STATE READ WRITE CKSUM second ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0231617 ONLINE 0 0 0 ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0221030 ONLINE 0 0 0 errors: No known data errors
I initially created the system with an early
0.6.0 release candidate of ZFS on Linux, which is why it was doing something as silly as identifying drives by
/dev/sd? in the first place. Now I’m running on the
0.6.3 release I’m happy to see it using drive serial numbers instead.
Hopefully this information will save someone from blowing away a valid mirror and having to restore from backups…
It’s been way too long since I’ve posted anything here, so I’ve decided to resume again with what I’ve been busy with lately.
We’ve been using Logstash for quite a while now, and one of the annoyances I’ve had is that with the default configuration we threw together we only got second resolution on our timestamps. 99% of the time this has been good enough for the simple things we’ve been trying to keep an eye on, but now and again it’s resulted trouble stitching together the exact sequence of events when trying to diagnose a problem.
Rsyslog is the primary tool we use to get data into Logstash in the first place. As we don’t have the luxury of using anything cloud oriented we need to care about our hardware too, and using syslog makes this very easy. Add to this how simple it is to get most applications to log via syslog too and it makes using it a no brainer.
Getting the high resolution timestamps into your rsyslog messages is actually really simple. Just make sure you have the following configuration option set in your rsyslog configuration:
With the price of storage dropping all the time, there is a constant perception from people who don’t deal with it every day that “disk space is cheap”, especially when it comes to developers. The problem is that so called “Enterprise” storage costs are still astronomical compared to what people are used to paying for home storage – even when using SATA disks.
A lot of this extra cost comes from a perceived requirement for the highest available capacity, availability and performance. Achieving all three characteristics is expensive, but if you’re willing to sacrifice one of them then costs start to fall considerably. Lowering requirements on two of the three drops it even more.
One of the teams I work with has a requirement primarily on capacity. Performance and availability are nice, but capacity is the key. We generate gigabytes worth of log files every day, but didn’t have one place to store it all for easy analysis. Just before I joined the team they’d purchased the cheapest “Enterprise” storage system the IT team at the time would allow – it ended up costing in the region of £12k for 12TB of raw storage. That’s £1000 per TB!
In addition to the price, the other problems were accessibility and management of the data and managing growth. This inspired a hunt for something that would provide a cheaper and more flexible solution.
Our requirements were:
- *nix based system. The current storage solution was based on Windows Storage Server, but all our systems and tools for this team are Linux based. Yes, Windows does technically provide things like an NFS server, but fighting with the file system permissions and overall performance are two things that impacted us.
- Cheap to expand. We need to have a clear path to grow the storage in the server easily by simply adding more disks.
- Large filesystems. There’s nothing more wasteful from a storage point of view than having lots of small filesystems. Besides the management overhead, there’s also many wasted blocks lying around un-used.
- Cheap to build. This inevitably means commodity hardware.
- Reasonable availability. We don’t need 99.999% uptime, but would be happy with somewhere in the region of 90%+
- Reasonable performance. Primary access to the data on this machine is via gigabit Ethernet. As long as it can keep up with the network card we’re happy…
When I got back from DevOpsDays in Hamburg this year I felt the need to explain my journey to the “DevOps” world and my view on where it’s headed. I started writing a State of the Nation paper to lay it all out. A couple of weeks in I got a message from Matthias Marschall asking if I’d like to do a guest post as part of their DevOps series. I agreed, and after a lot of effort (and help from a couple of great editors) you can now read it here.
It’s by far the longest article I’ve ever written, and I was amazed at how ideas that had been floating around in my head for a while crystalized through the processes of writing them down. I found I got so passionate talking to people about what was in it that I’ve decided to make a talk out of it, the first iteration of which will be in Chicago on Tuesday (see previous post).
I’m busy wiring together a new server configuration environment using Windows Deployment Services (don’t ask), Cobbler and Chef. So far things seem to be going quite well, until I bumped in to the following error trying to get a new client to register with the Chef server:
HTTP Request Returned 401 Unauthorized: Failed to authenticate!
A quick sift through Google results didn’t get anything usable. A quick sniff of the packets going over the wire though showed that it was authenticating using a signed certificate. Normally when you sign HTTP requests like that you add some kind of timed expiry. Could the problem be clock related?
Sure enough, a quick check on the new client and the server showed that there was just over an hour time difference. Getting the time on the client and the server in sync got the client registered!
A few months ago I got an email from Patrick Debois who I’d met at CITCON Europe asking if I’d be interested in speaking at the first conference aimed at System Administrators practising/interested in/sceptical about Agile. One of the key beliefs of those of us doing this already is that Agile practices are generally too narrowly focussed in their implementation. At the moment it’s primarily the Development organization who drive its adoption, but to get the most benefit Development and Operations groups within an organization need to work together.
With this in mind it was decided to call the conference DevOpsDays. Videos of the talks will be online in the archives section soon, so I’ve decided to write down my thoughts about what went well and not so well – I am a fan of retrospectives.