Discussion:
[zfs-discuss] Silly me!
UbuntuNewbie
2015-02-25 13:50:22 UTC
Today I ran into a little surprise that I'd like to share:

In order to run a long SMART test on a drive inside a RAID, I offlined it
and issued the smartctl command, which took quite some time to finish.

In the meantime I went on with other stuff on the degraded pool, including
a software update.
Later, when the SMART test was over, I rebooted in order to make use of the
newer kernel and...
OOPS - grub couldn't boot any longer.

It turned out that I forgot to online the device before the reboot. (Silly
me!)

Luckily, I still had a "rescue install" around outside the pool that I
could boot, import the pool and online the device, but...

Things still went bad, because the newer initrd still had a reference to
the zpool.cache with the offlined device in it.
The trouble went away after running update-initramfs while the pool config
was ONLINE (not DEGRADED).
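
A minimal sketch of a guard for the last step, assuming the usual `zpool status` output with a `state:` line. The function names `pool_state` and `safe_to_rebuild_initramfs` and the pool name `rpool` are made up for illustration; only `zpool online`, `zpool status` and `update-initramfs -u` are real commands.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status <pool>` output on stdin and
# print the value of its "state:" line (for example ONLINE or DEGRADED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Hypothetical guard: succeed only if the state read from stdin is ONLINE.
safe_to_rebuild_initramfs() {
    [ "$(pool_state)" = "ONLINE" ]
}

# Intended use on a real system (not run here; "rpool" is an assumed name):
#   zpool online rpool <device>
#   zpool status rpool | safe_to_rebuild_initramfs && update-initramfs -u
```

The point is only to refuse to regenerate the initrd while the pool is still DEGRADED, so a stale device state does not get baked into it.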

Imagine one device really failing on a RAID pool with a ZFS root on it, and
grub failing to boot from the pool. SCARY!

To unsubscribe from this group and stop receiving emails from it, send an email to zfs-discuss+***@zfsonlinux.org.
Gordan Bobic
2015-02-25 13:55:37 UTC
Does this issue (grub not booting off a degraded array) only affect
RAIDZn/striped arrangements, or does it also affect pools that use only
plain (unstacked) mirroring?
Hajo Möller
2015-02-25 14:55:35 UTC
only affect RAIDZn/stripe arrangement
As far as I know, ZoL won't boot off pools built from multiple top-level
vdevs (i.e. "striped") as of now. Richard Yao was working on it and
built a proof of concept, but it hasn't been merged yet.
--
Regards,
Hajo Möller

Fajar A. Nugraha
2015-02-25 16:05:38 UTC
Post by Hajo Möller
only affect RAIDZn/stripe arrangement
As far as I know ZoL won't boot off pools built from multiple top-level
vdevs (i.e. "striped") as of now,
Sure it will. At least it works on Ubuntu when you follow my howto.
Just tested it.

# zpool status
  pool: rzpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 391M in 0h2m with 0 errors on Mon Jan 12 11:00:19 2015
config:

NAME                                             STATE     READ WRITE CKSUM
rzpool                                           DEGRADED     0     0     0
  raidz2-0                                       DEGRADED     0     0     0
    sda1                                         OFFLINE      0     0     0
    ata-VBOX_HARDDISK_VB0a394d20-76c87e6a-part1  ONLINE       0     0     0
    ata-VBOX_HARDDISK_VBe51e2eb6-75e186e2-part1  ONLINE       0     0     0
    ata-VBOX_HARDDISK_VBfbf70a2a-d7002bce-part1  ONLINE       0     0     0
    sde1                                         OFFLINE      0     0     0
  ata-VBOX_HARDDISK_VB32860776-12b776df-part1    ONLINE       0     0     0

# df -h /
Filesystem          Size  Used Avail Use% Mounted on
rzpool/ROOT/ubuntu   49G  1.2G   48G   3% /

Yes, I know the layout of raidz2 + single vdev is silly. I even had to
use "-f" to add the new disk. But it proves my point perfectly :)
And yes, I did reboot it to verify that it works.
Post by Hajo Möller
Richard Yao was working on it and
built a proof of concept, but it hasn't been merged yet.
Different functionality, perhaps? Or something redhat-specific?
--
Fajar

Fajar A. Nugraha
2015-02-25 14:41:50 UTC
Post by UbuntuNewbie
In order to do a long SMART test for a drive inside a RAID, i offlined it
and issued the smartctl command, which took quite some time to finish.
In the meantime i went on with other stuff on the degraded pool, including a
software update.
Later, when the SMART test was over, i rebootet in order to make use of the
newer kernel and...
OOPS - grub couldnt boot any longer.
It turned out, that i forgot to online the device before the reboot. (Silly
me!)
Luckily, i still had a "rescue install" around outside the pool that i could
boot, import the pool and online the device, but...
Still things went bad, because the newer initrd still had a reference to the
zpool.cache with the offlined device in it.
Troubles went away after update-initramfs while the pool config was ONLINE
(not DEGRADED).
Imagine one device really failing on a raid pool with ZFS ROOT in it and
grub fails to boot from the pool? SCARY!
I'm pretty sure I tested both mirror and raidz in degraded mode, by
REMOVING the disk on virtualbox. Grub works fine in both cases.

I did NOT test offlining a vdev though.

Can you try doing a test with similar setup on virtualbox or similar,
using both methods (offline-disk present, and removing the disk)?

I'm GUESSing that your problem is specific to "offline", since the disk
still has a valid zfs label on it, with the same label and GUID as the
other disks, but can NOT be used by the zfs grub code to build the pool.
--
Fajar

Fajar A. Nugraha
2015-02-25 15:58:23 UTC
Post by Fajar A. Nugraha
Post by UbuntuNewbie
Imagine one device really failing on a raid pool with ZFS ROOT in it and
grub fails to boot from the pool? SCARY!
I'm pretty sure I tested both mirror and raidz in degraded mode, by
REMOVING the disk on virtualbox. Grub works fine on both cases.
I did NOT test offlining a vdev though.
Can you try doing a test with similar setup on virtualbox or similar,
using both methods (offline-disk present, and removing the disk)?
I'm GUESSing that your problem is specific to "offline",since the disk
still has a valid zfs label on it, with the same label and GUID as the
other disks, but can NOT be used by zfs grub code to build the pool.
Yup, I guessed correctly. Just tested it.

Offline -> reboot -> grub still works.

write some data -> reboot -> grub stops working

remove the offlined disks -> grub works again

There seems to be some threshold up to which grub still accepts a txg
difference between disks. When the difference gets too big, it refuses to
read the pool. Moral of the story: don't offline disks in rpool :)

( yes, I know it's a bug, but IMHO it's an acceptable workaround for now )
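
Before rebooting a root pool, it may be worth checking whether any member is still marked OFFLINE. A rough sketch, assuming the usual `zpool status` column layout (device name first, state second); the function name `offline_devices` and the pool name `rpool` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status <pool>` output on stdin and
# print the name of every device whose STATE column is OFFLINE.
offline_devices() {
    awk '$2 == "OFFLINE" { print $1 }'
}

# Intended use on a real system (not run here; "rpool" is an assumed name):
#   zpool status rpool | offline_devices
# Any name printed is a disk to bring back with `zpool online` (or to
# physically remove) before rebooting.
```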
--
Fajar

Fajar A. Nugraha
2015-02-26 04:22:38 UTC
Post by Fajar A. Nugraha
Post by UbuntuNewbie
Imagine one device really failing on a raid pool with ZFS ROOT in it and
grub fails to boot from the pool? SCARY!
There seems to be some treshold where grub still accepts txg
difference between disks. When it gets too big, it refused to read the
pool. Moral of the story: don't offline disks in rpool :)
I've added notes about this to the howto
https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-14.04-or-Later-to-a-Native-ZFS-Root-Filesystem
:
- what to do when grub displays "unknown filesystem" (end of howto)
- remove zpool.cache from both ext4 and zfs root so that you don't end
up with stale zpool.cache (step 3.7 and step 5)
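
To see whether an initrd carries a copy of zpool.cache at all, something like the following might help. It is only a sketch: the function name `has_zpool_cache` is made up, and it assumes the listing comes from `lsinitramfs` (part of Ubuntu's initramfs-tools) and that the cache sits at the default etc/zfs/zpool.cache path.

```shell
#!/bin/sh
# Hypothetical helper: read an initramfs file listing on stdin (for example
# from lsinitramfs) and succeed if it contains a zpool.cache entry.
has_zpool_cache() {
    grep -q 'etc/zfs/zpool\.cache$'
}

# Intended use on a real Ubuntu system (not run here):
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | has_zpool_cache \
#       && echo "initrd carries a zpool.cache"
```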
--
Fajar
