Archive
ZFS on Linux ‘insufficient replicas’ panic
I run a lovely little HP N54L MicroServer at home to keep all my important bits. It’s been a faithful companion for many years across two continents. I’m running Ubuntu LTS on it, booting off a small SSD but keeping years’ worth of backups across two ZFS mirrors.
I discovered this evening that the little PCIe card I was using for my boot drive had failed. There’s a spare SATA port on the motherboard I never bothered using (it’s only SATA II, the SSD is SATA III), so I just pulled out the old card and booted off the onboard controller. Imagine the horror when I got the following response to my zpool status after the first boot:
root@dumpy:~# zpool status
  pool: first
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        first       UNAVAIL      0     0     0  insufficient replicas
          mirror-0  UNAVAIL      0     0     0  insufficient replicas
            sda     UNAVAIL      0     0     0
            sdb     FAULTED      0     0     0  corrupted data

  pool: second
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        second      UNAVAIL      0     0     0  insufficient replicas
          mirror-0  UNAVAIL      0     0     0  insufficient replicas
            sdc     FAULTED      0     0     0  corrupted data
            sdd     FAULTED      0     0     0  corrupted data
The whole point of having two separate mirrors was that it should take something far more serious than the failure of an unrelated controller to corrupt them both!
After taking a deep breath I had a look at the data again, and at the rest of my system. /dev/sda was now my boot SSD, but ZFS thought it was part of an array. It looked like using the onboard port had shuffled the drive names around. ZFS caches this device information in /etc/zfs/zpool.cache to speed up mounting on boot, and moving the drives around had invalidated it.
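In hindsight, a useful sanity check before deleting anything is to dump the configuration ZFS cached at its last import and compare it with what the kernel currently calls each disk. This is just a rough sketch, not something I ran at the time, and the exact zdb output varies between releases:

# Show the pool configuration recorded in /etc/zfs/zpool.cache
zdb -C

# List the stable by-id names and the sdX devices they currently point to
ls -l /dev/disk/by-id/ | grep -v part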
So, I did the following (sketched as a single sequence after the list):
- rm /etc/zfs/zpool.cache
- Rebooted the machine (unloading and reloading the ZFS modules should also theoretically work)
- zpool import <my pools>
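Put together, the fix amounts to the handful of commands below. The module-unload line is the ‘theoretically’ path I didn’t test, and -d /dev/disk/by-id is an optional extra that asks ZFS to import using the stable serial-number names rather than whatever sdX happens to be that day:

# 1. Drop the stale cache so ZFS stops trusting the old device paths
rm /etc/zfs/zpool.cache

# 2. Reboot, or (untested by me) unload and reload the modules instead:
#    modprobe -r zfs && modprobe zfs

# 3. Re-scan the disks and import each pool by name, using by-id paths
zpool import -d /dev/disk/by-id first
zpool import -d /dev/disk/by-id second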
And all my bits were back in the correct order!
root@dumpy:~# zpool status
  pool: first
 state: ONLINE
  scan: scrub repaired 0 in 3h27m with 0 errors on Sun Dec 14 03:27:14 2014
config:

        NAME                                          STATE     READ WRITE CKSUM
        first                                         ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WMAZA6447754  ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WMAZA6448154  ONLINE       0     0     0

errors: No known data errors

  pool: second
 state: ONLINE
  scan: scrub repaired 0 in 9h42m with 0 errors on Sun Dec 14 09:42:32 2014
config:

        NAME                                          STATE     READ WRITE CKSUM
        second                                        ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0231617  ONLINE       0     0     0
            ata-WDC_WD20EARS-00J2GB0_WD-WCAYY0221030  ONLINE       0     0     0

errors: No known data errors
I initially created the system with an early 0.6.0 release candidate of ZFS on Linux, which is why it was doing something as silly as identifying drives by /dev/sd? in the first place. Now that I’m running the 0.6.3 release, I’m happy to see it using drive serial numbers instead.
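If you have an older pool that still shows plain /dev/sdX names in zpool status, the commonly recommended approach (again, just a sketch; make sure nothing is using the pool while you do it) is to export it and re-import it against the by-id directory:

# Export the pool, then re-import it using /dev/disk/by-id paths,
# which survive controller and port changes
zpool export first
zpool import -d /dev/disk/by-id first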
Hopefully this information will save someone from blowing away a valid mirror and having to restore from backups…