Discussion:
End User Problem - zfs-fuse won't let me replace a bad disk
VL
2009-08-24 17:42:26 UTC
Hello zfs-fuse peoples; I apologize if this is an inappropriate
location for this question, but I've been unable to find any other
forum which seems better suited.

I am running zfs-fuse version 13 on Ubuntu 8.04.2. I have a situation
where I have a 4x200gb raidz setup which is losing a disk. So far all
attempts to replace that drive have met with failure. When the drive
is failed, I am met with "one or more devices is currently
unavailable" error. When it is failed and offline, I can not bring it
online even though it tells me I should be able to. After a power
cycle and letting the bad drive cool off, I can then import it and it
starts out as "online" but I can not remove that drive or replace it,
as it says it is part of an active pool. I have no more free drive
bays to bring a 5th drive online to try a replace; I need to "replace
in place".

Is this normal? Is this something with ZFS, or perhaps version 13 of
zfs-fuse build?

I have a little longer post prepared with output from the various
zpool commands if it might help.

Thanks so much for any time/insight you could help with. I have been
looking forward to using a stack of all 200gb drives in multiple raidz
sets with zfs.

vl
Emmanuel Anne
2009-08-24 19:06:58 UTC
2009/8/24 VL <val.luck-***@public.gmane.org>

>
> Hello zfs-fuse peoples; I apologize if this is an inappropriate
> location for this question, but I've been unable to find any other
> forum which seems better suited.


Good place, no excuses needed ! ;-)

>
> I am running zfs-fuse version 13 on Ubuntu 8.04.2.


Don't know what it is. The only versions released so far were 0.5.0 and
0.4.x. Anyway...


> I have a situation
> where I have a 4x200gb raidz setup which is losing a disk. So far all
> attempts to replace that drive have met with failure. When the drive
> is failed, I am met with "one or more devices is currently
> unavailable" error. When it is failed and offline, I can not bring it
> online even though it tells me I should be able to. After a power
> cycle and letting the bad drive cool off, I can then import it and it
> starts out as "online" but I can not remove that drive or replace it,
> as it says is it part of an active pool. I have no more free drive
> bays to bring a 5th drive online to try a replace; I need to "replace
> in place".
>
> Is this normal? Is this something with ZFS, or perhaps version 13 of
> zfs-fuse build?
>
> I have a little longer post prepared with output from the various
> zpool commands if it might help.
>
> Thanks so much for any time/insight you could help with. I have been
> looking forward to using a stack of all 200gb drives in multiple raidz
> sets with zfs.


Probably not normal, no.
You should have sent the outputs of the commands directly, so that we know
better what is happening.
Anyway, normally, according to the man page, once you have physically
replaced the disk, you can just type:
zpool replace <pool> <device>
and it should work.
There is also a -f flag in case of problems, but normally there are none.
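
If the replacement disk comes up under a different device name than the old
one, the man page also has a two-device form; a minimal sketch with
placeholder names (say the old disk was sdi and the new one appeared as sdl):
zpool replace rstore2 sdi sdl
or, if it complains, with the force flag:
zpool replace -f rstore2 sdi sdl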

VL
2009-08-24 20:21:07 UTC
>
> Good place, no excuses needed ! ;-)

Thanks so much !

> > I am running zfs-fuse version 13 on Ubuntu 8.04.2.
>
> Don't know what it is. The only versions released so far were 0.5.0 and
> 0.4.x. Anyway...

I'm not sure how to get the version I'm running. The ubuntu package
is 0.5.1-1ubuntu4, which probably means little to anyone outside the
ubuntu package maintainer. "zpool upgrade" tells me "This system is
currently running ZFS pool version 13.". Is there a better way to get the
currently installed version?
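
(For reference, this is roughly how I got those two numbers; a sketch,
assuming the Ubuntu package really is named zfs-fuse:
dpkg -l zfs-fuse
shows the package version (0.5.1-1ubuntu4), and
zpool upgrade -v
lists the pool versions this build of the tools supports.)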

As I currently write this, the pool is online, resilvered and working
with the flaky drive. I am contemplating trying to make it break
again so I can get the text dump of the commands. This morning, I had
tried to bring it online, and couldn't:

# zpool import
  pool: rstore2
    id: 762967251253940714
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

        rstore2     DEGRADED
          raidz1    DEGRADED
            sdf     ONLINE
            sdg     ONLINE
            sdk     ONLINE
            sdi     UNAVAIL  cannot open


..but no matter what I do, I can not bring it online. It errors out
with:

# zpool import rstore2
cannot import 'rstore2': one or more devices is currently unavailable

Shouldn't I be able to import it even though sdi was not available?

Trying to summarize my steps:
*sdi fails
*unable to detach drive from raidz
*exported pool, unable to import with bad device
*let drive "cool off", re-power it up. Import OK. All devices
working.
*drive starts to resilver, fails mid process, leaving pool degraded
*unable to replace drive, even with -f -- saying it can't open the
device sdi
*export pool, replace sdi with a new 200gb hard drive (new drive
showed up as /dev/sdl)
*try to bring it online, but can't import with only 3/4 raidz drives,
"one or more devices is currently unavailable"
*start process over, export pool, power off drives, replace orig flaky
sdi back in, power on, import, re-silver
*(currently) re-silver finished with flaky drive and is online.

The pool is online, no errors. I tried to "replace" the sdi:

# zpool replace -f rstore2 sdi
invalid vdev specification
the following errors must be manually repaired:
/dev/sdi is part of active pool 'rstore2'

I don't understand the "invalid vdev" error. I get the same error
using /dev/sdi rather than just sdi.

I have a stack of old 200 gb drives I would like to use, and my
question is, if one of the drives fails outright (instead of the flaky
one I have now), am I supposed to be able to remove or replace it?
Based on other raid-5 type devices (and software) that I've used over
the years, I would guess yes, I should be able to.

Do I potentially have an old or bad version of zfs-fuse for my
ubuntu? May I ask what the recommended "release" version is? Should I
download and compile a more recent version?

Thanks so much for your help, I'm really excited that fuse is bringing
zfs to Linux. I hope my post here was somewhat intelligible.
VL
2009-08-25 05:17:19 UTC
So, I copied more data to "rstore2" with the flaky sdi device. Sure
enough, I got errors in dmesg and zpool complained. Oddly enough, it
didn't degrade the array:

# zpool status
  pool: rstore2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h27m, 22.87% done, 1h32m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rstore2     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE    127K  126K    10

errors: No known data errors

then I tried to replace the disk - hoping it'd just take sdi out of
the picture:

# zpool replace rstore2 sdi
cannot replace sdi with sdi: sdi is busy

What is the proper process at this point? How do I get zfs to give up
sdi so I can replace it? My experience is if sdi is removed
completely, I'll be unable to import rstore2, and it seems you can't
act at all on non-imported (i.e. not online) pools.

Am I doing something wrong? Thanks so much.

Val
Emmanuel Anne
2009-08-25 09:17:01 UTC
sdi is busy because there is a scrub in progress: since it had errors, the
scrub scans the whole disk to see what on it is still reliable.
So if at this point you want to replace it no matter what, you must do
zpool offline rstore2 sdi
when it's offline you replace it by a new drive, and then
zpool replace rstore2 sdi
and zpool online rstore2 sdi

Notice that you can also wait for the scrub to complete to see what it will
say at the end before trying to replace it.
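
If you would rather not wait, I believe a running scrub can also be stopped
with the -s flag; a sketch, please check the zpool man page first:
zpool scrub -s rstore2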

Then it will start a scrub operation to put what's needed on sdi; I guess it
will take time, and you can follow the progress with zpool status.

For the version, it's ok, the pool version 13 is the most recent released
version until now, a new one should be released soon.
For the bad vdev error, I am not sure, did you try to specify /dev/sdi
instead of sdi?

ssc
2009-08-25 10:18:40 UTC
I think I have a somewhat similar problem:
I have 3 zpools zpool01, zpool02, zpool03 with 4 harddrives each,
configured as raidz1.
Now one hd died and became unavailable, setting the entire pool as
unavailable.
I took the hd out and replaced it with a new one.

I had to do zpool clear to get the pool from unavailable to degraded
state, then I did a zpool replace.
The resilvering took a couple of hours, then it reported completion
and 365 errors. I had to reboot, so the pool would be mounted at
system startup.
Then zpool status -v shows the affected files - no major loss, I've
got copies elsewhere.

PROBLEM: The pool is still degraded and I can not delete one of the
affected files.

***@host:~$ sudo zpool status -v
  pool: zpool01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        zpool01                                          DEGRADED     0     0     4
          raidz1                                         DEGRADED     0     0     4
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3CPW7    ONLINE       0     0     0
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3QNW7    ONLINE       0     0     0
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3P9VB    ONLINE       0     0     0
            replacing                                    DEGRADED     0     0     0
              disk/by-id/scsi-1ATA_ST3500320AS_9QM3R3ZK  UNAVAIL      0     0     0  cannot open
              disk/by-id/scsi-1ATA_ST3500418AS_9VM2LMCV  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zpool01:<0x37368>
        zpool01:<0x37369>
        <some file path and name>
        zpool01:<0x349c7>

There were 4 affected files listed, after I deleted 3 of them, these
zpool01:<0x37368> lines came up.
No problem with them as such, but the output doesn't look like a
reliable raid to me.
Also, when I try to delete the fourth file, I get an error message
'invalid exchange'.
How do I deal with this ?

Why is the zpool still degraded ?
Why is this 'replacing' line still in there ?
Why is the old hd still listed ?

What do I need to do to complete the HD replacing ?
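
(My own untested guess, for what it's worth: since the 'replacing' entry
seems to behave like a mirror of the old and the new disk, detaching the
old, unavailable half once the resilver is done might collapse it, something
like
zpool detach zpool01 disk/by-id/scsi-1ATA_ST3500320AS_9QM3R3ZK
but I have not dared to try that yet.)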

Thank you very much,

Cheers,

Steve
VL
2009-08-25 16:03:13 UTC
Hi Emmanuel,

Thanks so much for your reply.

On Aug 25, 2:17 am, Emmanuel Anne <emmanuel.a...-***@public.gmane.org> wrote:
> sdi is busy because there is a scrub in progress : since it had errors it
> scans the whole disk to see what is reliable with it.
> So if at this point you want to replace it no matter what, you must do
> zpool offline rstore2 sdi

Trying to offline the bad sdi device doesn't seem to work; it wants a
replica. Shouldn't it be OK to remove 1 drive from a 4-drive "raidz"
set?

# zpool offline rstore2 sdi
cannot offline sdi: no valid replicas

I didn't see a -f force type option. Is there a way to be more
forceful with zpool and offline a bad disk?

> when it's offline you replace it by a new drive, and then
> zpool replace rstore2 sdi
> and zpool online rstore2 sdi

I didn't try these yet as I was unable to offline sdi. Should I just
remove the bad drive from the system without first 'removing' it from
that pool? My experience yesterday tells me that if I remove the
device from the pool, then I will be unable to do any action on that
pool as it will have a device missing.

> Notice that you can also wait for the scrub to complete to see what it will
> say at the end before trying to replace it.

I tried it again (now that there are no actions like scrub or resilver
going on) and I get the same busy error:

# zpool replace rstore2 sdi
cannot replace sdi with sdi: sdi is busy


> For the version, it's ok, the pool version 13 is the most recent released
> version until now, a new one should be released soon.

I can't wait. Maybe it will make my problems go away?

> For the bad vdev error, I am not sure, did you try to specify /dev/sdi
> instead of sdi?

Yes, I tried that command replace -f with both sdi and /dev/sdi and
both returned the exact same message, "invalid vdev specification".

Thanks again for all your help; I can't wait to get this fixed.

Val
Emmanuel Anne
2009-08-25 18:13:50 UTC
VL: I hadn't noticed that you were using 4 drives for a raidz1 pool. You
had to force the 4th drive with a -f flag; raidz1 pools always use drives in
multiples of 3 (3, 6, ...).

So your pool is now unstable: if you lose 1 drive, you lose most of it,
since the data can't be replicated anymore.

Maybe there is a trick to try to save things in this case, but I don't know
it (and I doubt there is).
zfs accepts -f to add a 4th drive in this case, but it's better never to
have a drive failure in that configuration, because it's as if you don't
have any raidz1 at all.

You can try the zfsadmin.pdf to check if they have a clue about that :
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf


Now Rudd-O: in case you read this, I just found a blocker bug for your
release. To reproduce these bugs with raidz1 pools, I tried with raw
files as usual, and this time our zfs-fuse has real problems here: zpool
status never finishes, with out-of-memory errors in the syslog. So if you
get a failure in a raidz1 pool with the current zfs-fuse, you are in bad
shape! This should be investigated...

Rudd-O
2009-08-27 12:23:35 UTC
Do these problems of out-of-memory happen with or without your patches?
devzero-S0/
2009-08-25 18:23:44 UTC
>VL : I hadn't noticed that you were using 4 drives for a raidz1 pool. You had to force the 4th drive with a -f flag, raidz1 pools always use drives by 3 (3, 6, ...).

Are you sure about that?
Any pointers?


Emmanuel Anne
2009-08-25 18:52:37 UTC
I am sure because I just tried to reproduce it with raw files, but it's the
basis of raid5; the principle is that you have 3 disks,
a, b, c, and c = a xor b;
this way, if you lose 1 disk of the 3, you can recover, because each disk is
the result of a simple xor between the 2 others (to simplify).

So if you take a number of disks which is not a multiple of 3, then the
parity can't be kept equally on each disk anymore.
It would probably be a good idea to look at a detailed zfs admin manual for
that; I guess you need to add a spare at least in this case.

Anyway you can reproduce this like that :
mkdir /root/dd
cd /root/dd
dd if=/dev/zero bs=1M count=100 of=image
dd if=/dev/zero bs=1M count=100 of=image2
dd if=/dev/zero bs=1M count=100 of=image3
dd if=/dev/zero bs=1M count=100 of=new

zpool create test raidz1 /root/dd/image*
-> ok, using 3 disks, parity everywhere.

Copy some files on the pool to use some space and then
zpool export test
rm image2
zpool import test -d .
-> ok
you will just get a degraded state with zpool status which is normal since
you still have 2 disks to rebuild the 3rd one.
Here you can safely run dd again to simulate buying a new disk:

dd if=/dev/zero bs=1M count=100 of=image2

and then
zpool replace test /root/dd/image2

a zpool status after that will show that the scrub completed in no time,
because 100 MB is very small!

Now if you use 4 disks instead of 3, there is real danger:
1st, export test if you have created it:
zpool export test
then erase everything :
dd if=/dev/zero bs=1M count=100 of=image
dd if=/dev/zero bs=1M count=100 of=image2
dd if=/dev/zero bs=1M count=100 of=image3

then create test again but with 4 disks :
zpool create test raidz1 /root/dd/image* /root/dd/new

in this case the parity is unbalanced. If you put some data on the pool
again, export it as before and then rm new;
if you try to import it after this, you'll get the famous error message
saying that 1 device is unavailable, because you don't have enough disks
anymore to rebuild it.

(Notice that if you copy no data onto the pool, then it will import happily
even if new is deleted, because there is nothing to rebuild in this case.)
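
When you are done experimenting you can throw the test pool away; a sketch,
which only works while the pool is still importable, otherwise just delete
the files:
zpool destroy test
cd /root/dd && rm -f image image2 image3 new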


tsuraan
2009-08-25 21:09:39 UTC
> I am sure because I just tried to reproduce it with raw files, but it's the
> base for raid5, the principle is you have 3 disks :
> a, b, c and c = a xor b
> this way if you loose 1 disk in the 3 you can recover because each disk is
> the result of a simple xor between the 2 others (to simplify).

That's not quite accurate. The principle is that you have N disks,
and corresponding blocks on N-1 disks are xor'd to create the block on
disk N. This is perfectly valid with any number of disks, although
there's a point where double-disk failure is inevitable and RAID5
isn't the right answer.
>
> So if you take a number of disks which is not a multiple of 3, then the
> parity can't be kept equally on each disk anymore.
> It would probably be a good idea ot look at a detailed zfs admin manual for
> that, I guess you need to add a spare at least in this case.

Multiples of 3 aren't relevant with RAID5. You can xor any number of
blocks together to make a recovery block, and the loss of any of those
blocks can be recovered by xor'ing all the remaining blocks with the
recovery block. As for parities being equal on all disks, that's just
a matter of shifting which disk holds the parity for each set of
corresponding blocks. There's nothing about being a multiple of 3
that allows this to happen.
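
A quick way to convince yourself, with plain integers standing in for
blocks (a toy shell sketch, nothing ZFS-specific, and it works for any
number of values):

a=23; b=42; c=99; d=7
p=$(( a ^ b ^ c ^ d ))
echo $(( a ^ c ^ d ^ p ))
42

The echo rebuilds the "lost" b from the surviving values plus the parity
value p, and the same recovery works no matter how many values you xor.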
VL
2009-08-26 00:43:31 UTC
Hello again zfs-fuse peoples, and thanks for all the attention.

After administering raid5 on NAS's, DAS's, raid PCI cards and software
raid, I have to say that I would believe a raid5 (raidz1) that is
limited to multiples of 3 (or multiples of anything) would be a flawed
implementation, as Roland commented.

I can't believe that Sun would have such an implementation, so at this
point I have to think it's something about Linux, and/or how zfs-fuse
interacts with Linux devices, that keeps zfs-fuse from working with
erroring devices. Perhaps it's my distro (Ubuntu) that handles them
differently than whatever distro the people porting the code are using.

Correct me if I'm wrong, but in a raid5 array (or zfs raidz pool), if
you have N drives, then you should be able to bring it online, mount
it and use it with N-1 drives working. It would be degraded, and
suggest you replace the failed drive immediately to bring it up to N
drives, but N-1 should work. With N-1 working drives, the zpool command
refuses to do anything; I can't replace the missing drive when it's
imported/online, and when it's exported with N-1 working drives, I
can't import it.

As it is, it is a show stopper. What if I were to lose a drive that
stopped spinning up? As it is now (zfs-fuse on Ubuntu, zfs pool
version 13), I would lose the array/pool.

I was actually hoping for someone to step in and point out some silly
mistake or syntax error of mine, so it would just start magically working
as it logically should. I'm guessing now that perhaps I've uncovered some
odd incompatibility with the version of Ubuntu (or kernel) I am running.
VL
2009-08-26 01:03:28 UTC
Ok, as a proof of concept, I did this.

1) create 4 disk images to be used (similar to what Emmanuel did)

mkdir /root/zfstest
cd /root/zfstest
dd if=/dev/zero bs=1M count=100 of=image1.dat
dd if=/dev/zero bs=1M count=100 of=image2.dat
dd if=/dev/zero bs=1M count=100 of=image3.dat
dd if=/dev/zero bs=1M count=100 of=image4.dat
zpool create ztest raidz1 /root/zfstest/image*

This created a /ztest mountpoint with just under 300mb usable space.
All looked well.

2) I copied 75mb to it, resilvered it, exported the pool, and imported
it, checking status after each step. All good.

3) I then exported it, deleted /root/zfstest/image4.dat. I checked,
and it claimed to be importable, but degraded.

4) I imported it, it was degraded. Data was still there.

5) I created a replacement device, image4replace.dat

dd if=/dev/zero bs=1M count=100 of=image4replace.dat

6) and I replaced image4.dat with image4replace.dat

zpool replace ztest /root/zfstest/image4.dat /root/zfstest/image4replace.dat

7) everything came up and seemed to be happy.

# zpool status ztest
  pool: ztest
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Aug 25 17:52:17 2009
config:

        NAME                                 STATE     READ WRITE CKSUM
        ztest                                ONLINE       0     0     0
          raidz1                             ONLINE       0     0     0
            /root/zfstest/image1.dat         ONLINE       0     0     0
            /root/zfstest/image2.dat         ONLINE       0     0     0
            /root/zfstest/image3.dat         ONLINE       0     0     0
            /root/zfstest/image4replace.dat  ONLINE       0     0     0

errors: No known data errors

So this test tells me that zfs-fuse CAN work on my version of Ubuntu
and that it's probably a problem with how Ubuntu interacts with/
exposes its hardware devices. I'm not a hardware programmer at all,
so I'm not sure where to troubleshoot next, other than "it can work,
but for me with my hardware, it doesn't work".

Anyone have any further troubleshooting steps I might take that might
shed light on the picture?

As always, thanks so much.

Val
ssc
2009-08-26 01:12:45 UTC
I get the same 'invalid vdev specification' error when I use e.g. sdl
as device name (from the old /dev/sda or /dev/hdb device name scheme).
The error does not occur on my machine when using the same drive under
a different name, e.g. disk/by-id/scsi-1ATA_ST3500418AS_9VM2LMCV.
Check if you have a /dev/disk folder and try some of the alternate
device names.
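
A sketch of what I mean, with a placeholder for the id (yours will be some
long model-plus-serial string):
ls -l /dev/disk/by-id/
zpool replace rstore2 sdi disk/by-id/<id of the new drive>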

I can't find any solution for the problem I am looking at. The raid is
still degraded. I tried offline / online, scrubbing, replace -f, and so
on; they all just restart a resilver, which takes ~8 hours every
time.
Status is that there is one permanent error with an unimportant file
that I just can't delete ("invalid exchange"); I can rename it, change
its properties, etc., but can't get rid of it.

I am running out of options and I need to get the zpool back up asap.
I think I'll try exporting the zpool, reboot under OpenSolaris,
re-import and fix it there.

Conclusion: zfs-fuse fails when a harddrive dies.

VL
2009-08-26 01:57:22 UTC
Hi Steven,

On Aug 25, 6:12 pm, ssc <steven.samuel.c...-***@public.gmane.org> wrote:
> I get the same 'invalid vdev specification' error when I use e.g. sdl
> as device name (from the old /dev/sda or /dev/hdb device name scheme).
> The error does not occur on my machine when using the same drive under
> a different name, e.g. disk/by-id/scsi-1ATA_ST3500418AS_9VM2LMCV.
> Check if you have a /dev/disk folder and try some of the alternate
> device names.

I do have /dev/disk/by-id. I will start to play around with that to
see if I might be able to get past that step.

The error I'm seeing now is: I exported the pool, removed the failed
sdi, replaced it with a new drive, and powered on; the new drive came up
as sdi. I'm trying to import the pool, and I can't, because it sees an
sdi (the new disk) and thinks it is corrupt.

# zpool import
  pool: rstore2
    id: 762967251253940714
 state: ONLINE
status: One or more devices contains corrupted data.
action: The pool can be imported using its name or numeric identifier.
   see: http://www.sun.com/msg/ZFS-8000-4J
config:

        rstore2     ONLINE
          raidz1    ONLINE
            sdf     ONLINE
            sdg     ONLINE
            sdh     ONLINE
            sdi     UNAVAIL  corrupted data

See how it says it CAN be imported? Well, it can't:

# zpool import -f rstore2
cannot import 'rstore2': one or more devices is currently unavailable

It seems that perhaps the error "one or more devices is currently
unavailable" is being thrown by mistake? Having one device
unavailable in a raidz1 shouldn't be a problem, and should be a
warning not an error. Two, yes, error, but not one.

In a non-imported state, I can not seem to do anything to the pool
rstore2 - it just errors out with 'no such pool'.

# zpool replace rstore2 sdi sdi
cannot open 'rstore2': no such pool
# zpool replace rstore2 sdi
cannot open 'rstore2': no such pool

So, back to my previous post -- zfs-fuse seems to work as advertised
with using disk images created with dd; it does not work on actual
hardware disks -- at least with my hardware.
VL
2009-08-26 02:05:48 UTC
I apologize if I am overloading everyone with posts. This will be my
last self followup unless someone else has questions for me or my
setup.

So, I managed to duplicate the test I did with "ztest", the one with
4x100mb disk images. In that test (documented above), I exported the
pool, deleted (removed) one of the disk images, then successfully
imported the degraded pool.

I have the exact same situation, with the 4th drive (sdi) removed from
the system, so it shows up "UNAVAIL cannot open", just as did my
image4.dat missing test above:

# zpool import
  pool: rstore2
    id: 762967251253940714
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

        rstore2     DEGRADED
          raidz1    DEGRADED
            sdf     ONLINE
            sdg     ONLINE
            sdh     ONLINE
            sdi     UNAVAIL  cannot open

Unfortunately, zfs-fuse must be hard coded to handle disk images vs
physical devices differently, as I tried the exact same command to
import the hardware pool with 1 missing as I did with disk image pool
with 1 missing:

# zpool import rstore2
cannot import 'rstore2': one or more devices is currently unavailable

Again, I don't know what other tests I can do, so, I will standby to
see if any developer-types have any questions or insight for me.

Thanks for reading,

Val
ssc
2009-08-26 02:18:17 UTC
Why do you export your zpool ? Just as a last resort ? AFAIK, you do
that only when you want to move a zpool to a different machine or use
it under a new OS.

> # zpool import rstore2
> cannot import 'rstore2': one or more devices is currently unavailable

I think that's because zfs is still looking for the old hd (which you
replaced). Unfortunately, both old and new hd are running under the
same name.
How about
***@host:~$ sudo zpool replace <pool name> disk/by-id/<old hd name>
disk/by-id/<new hd name>
At least this did not result in the problems you describe.
<old hd name> won't be in there as a file anymore, since the hd has been
removed; you basically have to re-assemble the correct name from the
hd type and serial number, using the still-existing drives as a
guideline.

VL
2009-08-26 02:50:38 UTC
On Aug 25, 7:18 pm, ssc <steven.samuel.c...-***@public.gmane.org> wrote:
> Why do you export your zpool ? Just as a last resort ? AFAIK, you do
> that only when you want to move a zpool to a different machine or use
> it under a new OS.

I export the pool to "unmount" it and/or take it "offline", so it's not
in use and I can power down the drives. Is there a better way to
offline/umount/suspend use of a pool so its devices can be power
cycled? The "offline" command seems to be directed at a device.

> > # zpool import rstore2
> > cannot import 'rstore2': one or more devices is currently unavailable
>
> I think that's because zfs is still looking for the old hd (which you
> replaced). Unfortunately, both old and new hd are running under the
> same name.

Correct, ZFS is looking for the flaky HD which was temporarily
removed. However, it worked just fine when dealing with disk images
(created with dd). It tells me it can be imported, and works with
disk images. With physical disk drives, I get the same "it can be
imported" message, but it errors out with that error.

> How about
> ***@host:~$ sudo zpool replace <pool name> disk/by-id/<old hd name>
> disk/by-id/<new hd name>
> At least this did not result in the problems you describe.
> <old hd name> won't be in there as a file anymore as the hd has been
> removed, you basically have to re-assemble the correct name from the
> hd type and serial number using the still existing drives as a
> guideline.

In order to do this, I need to reinsert the flaky "sdi" device, power
it on, attempt to "import" (as that's the only way zfs-fuse is letting
me import it to "online" status), then remove the flaky sdi device
'hot' (never a good idea, but hey, I'm game at this point) and slide
in the new replacement drive 'hot'. Since sdi was in use, the new
drive should show up as sdl, and I'll try the replace with the
disk/by-id.

Process...
* flaky drive reinserted, unit powered on
* rstore2 pool brought online
* flaky drive sdi yanked (dang those drives are hot)
* new drive inserted, shows up as sdl as expected
* did zpool replace with /dev/disk/by-id/.. Zpool hanging.
* opened new ssh window, "zpool status" command also hanging.
* ctrl-c out of zpool replace. All zpool command still hanging.
* kill -9 the process /sbin/zfs-fuse ; remove the pid file /var/run/zfs-fuse.pid
* start zfs: /etc/init.d/zfs-fuse start
* rstore2 there, imported, online, corrupted, both sdi and sdh not
happy. There is a chance that the live subtraction/addition of a
drive in sdi's slot caused sdh to also cease functioning properly.

# zpool status
  pool: rstore2
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rstore2     UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     UNAVAIL      0     0     0  cannot open
            sdi     UNAVAIL      0     0     0  cannot open

* export the pool ; do a status. Back to square one... it sees a
degraded pool that 'can be imported' with 3 good drives and 1 bad
sdi. It can't actually be imported, though:

# zpool import
  pool: rstore2
    id: 762967251253940714
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

        rstore2     DEGRADED
          raidz1    DEGRADED
            sdf     ONLINE
            sdg     ONLINE
            sdk     ONLINE
            sdi     UNAVAIL  cannot open
# zpool import rstore2
cannot import 'rstore2': one or more devices is currently unavailable

Something about Linux (ubuntu?) hardware devices that zfs-fuse can't
handle.

I'm game to try anything else. Hmm, what if I formatted these 4x200gb
drives as reiser or xfs ... then created a 200gb image (using dd),
then linked the 4 drives' images to /root/zfs/drive#.dat ... then
created the zfs pool with those disk images /root/zfs/drive*.dat ... I
wonder if it would then instead use the zfs code for disk images
instead of the code for devices ..?
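
(Roughly what I have in mind, as a sketch; the mount points and pool name
are made up, and the dd uses seek so it creates a sparse file instead of
writing ~190 GB of zeros:
mkfs.xfs /dev/sdf ; mkdir -p /mnt/d1 ; mount /dev/sdf /mnt/d1
dd if=/dev/zero bs=1M count=0 seek=190000 of=/mnt/d1/drive1.dat
...and the same for the other three drives, then...
zpool create rtest raidz1 /mnt/d1/drive1.dat /mnt/d2/drive2.dat /mnt/d3/drive3.dat /mnt/d4/drive4.dat
)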
Steven Samuel Cole
2009-08-26 10:42:26 UTC
OK, I solved my problem. I exported the zpool under Linux, booted an
ancient FreeBSD system, re-imported it there, only this time the
resilvering worked, the replacing was complete and I was able to
delete the file in question. Export under FreeBSD, reboot Linux,
eventually get rid of the remaining issues and reports about permanent
errors. Another scrub is running now, ETA 8 hours.

Why does zfs-fuse import zpools upon system startup that had been
exported previously ? That's Micro$oft style.
Also, zfs-fuse seg-faulted a couple of times. That's Micro$oft style, too.

My conclusion from this incident: zfs-fuse might just about work in an
_ideal_ world. In the real world of broken cables and flaky
hard drives, it is just useless when it is needed most. I will add a
dual-boot FreeBSD or OpenSolaris installation to my machine.

ssc
2009-08-26 11:03:32 UTC
Hey VL,

just another piece of advice: When importing zpools, use import -d
<folder>, e.g. sudo zpool import -d /dev/disk/by-id
When I just rebooted from FreeBSD back into Kubuntu, zfs-fuse imported
all zpools automatically (which it should not do), of course without
the -d parameter, so all drives were listed in zpool status as sda,
sdb, sdc, etc. Next time I booted, the IDE & SATA controllers were in
a different order, so all sda sdb sdc etc drive names referred to
different drives and all zpools were reported as this:

  pool: zpool01
 state: UNAVAIL
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool01     UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  corrupted data
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0

  pool: zpool02
 state: UNAVAIL
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool02     UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  corrupted data
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

  pool: zpool03
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool03     UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            sda     FAULTED      0     0     0  corrupted data
            sdb     FAULTED      0     0     0  corrupted data
            sdc     FAULTED      0     0     0  corrupted data
            sdd     FAULTED      0     0     0  corrupted data

Solution: export all zpools and re-import them with the -d switch. Now
it looks like this:

  pool: zpool01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h8m, 2,69% done, 4h50m to go
config:

        NAME                                           STATE     READ WRITE CKSUM
        zpool01                                        ONLINE       0     0     4
          raidz1                                       ONLINE       0     0     4
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3CPW7  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3QNW7  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_ST3500320AS_9QM3P9VB  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_ST3500418AS_9VM2LMCV  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zpool01:<0x3736a>

  pool: zpool02
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                                                     STATE     READ WRITE CKSUM
        zpool02                                                  ONLINE       0     0     0
          raidz1                                                 ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_SP2514N_S08BJ1JL417664  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_SP2514N_S08BJ1JL417657  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_SP2514N_S08BJ10YB21736  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_SP2514N_S08BJ10YB21737  ONLINE       0     0     0

errors: No known data errors

  pool: zpool03
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                                                     STATE     READ WRITE CKSUM
        zpool03                                                  ONLINE       0     0     0
          raidz1                                                 ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_HM160JC_S0CMJ10L800059  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_HM160JC_S0CMJ10L800069  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_HM160JC_S0CMJ10L805620  ONLINE       0     0     0
            disk/by-id/scsi-1ATA_SAMSUNG_HM160JC_S0CMJ10L805605  ONLINE       0     0     0

errors: No known data errors

Let's see if I will ever get rid of that last error message...
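
(For the record, the export / re-import dance above was just this, once per
pool:
zpool export zpool01
zpool import -d /dev/disk/by-id zpool01
and likewise for zpool02 and zpool03.)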

> > config:
>
> >        NAME        STATE     READ WRITE CKSUM
> >        rstore2     UNAVAIL      0     0     0  insufficient replicas
> >          raidz1    UNAVAIL      0     0     0  insufficient replicas
> >            sdf     ONLINE       0     0     0
> >            sdg     ONLINE       0     0     0
> >            sdh     UNAVAIL      0     0     0  cannot open
> >            sdi     UNAVAIL      0     0     0  cannot open
>
> > * export the pool ; do a status.  Back to square one... it sees a
> > degraded pool that 'can be imported' with 3 good drives and 1 bad
> > sdi.  It can't actually be imported through:
>
> > # zpool import
> >  pool: rstore2
> >    id: 762967251253940714
> >  state: DEGRADED
> > status: One or more devices are missing from the system.
> > action: The pool can be imported despite missing or damaged devices.
> > The
> >        fault tolerance of the pool may be compromised if imported.
> >   see:http://www.sun.com/msg/ZFS-8000-2Q
> > config:
>
> >        rstore2     DEGRADED
> >          raidz1    DEGRADED
> >            sdf     ONLINE
> >            sdg     ONLINE
> >            sdk     ONLINE
> >            sdi     UNAVAIL  cannot open
> > # zpool import rstore2
> > cannot import 'rstore2': one or more devices is currently unavailable
>
> > Something about Linux (ubuntu?) hardware devices that zfs-fuse can't
> > handle.
>
> > I'm game to try anything else.  Hmm, what if I formatted these 4x200gb
> > drives as reiser or xfs ... then created a 200gb image (using dd),
> > then linked the 4 drives' images to /root/zfs/drive#.dat ... then
> > created the zfs pool with those disk images /root/zfs/drive*.dat ... I
> > wonder if it would then instead use the zfs code for disk images
> > instead of the code for devices ..?
Emmanuel Anne
2009-08-26 17:40:21 UTC
Permalink
Too many mails on this thread and I was away for the day.

Anyway, to sum up:
indeed zfs is clever enough to balance the parity between all disks, so
that even with 4 disks, if you delete 1 disk you don't lose anything.
BUT
there is a bug, at least in zfs-fuse-0.5.1, which makes it say "can't
import the pool because 1 disk is currently unavailable" when you have
just deleted a disk.

This bug might very well be gone in 0.5.0.

Now there is a serious bug in the current mercurial repository when
testing with raidz1 pools and deleting one of the disks this way. When
you try to import the pool after that, the zpool command just hangs
forever, with out-of-memory errors related to put_nvlist in syslog.
So the culprit was easy to find: put_nvlist.
For some reason, when you put back the line which was commented out in
an old patch, it works again.
And the surprise is that this patch was supposed to fix the disappearing
volumes bug, but even with the line uncommented the test-datasets script
still works.

And the good news is that it's working ok with 4 disks in this case, it
never complains.
I'll need more time to look into this...

BUT anyway, if you create a pool with 3 disks initially and then force
the addition of only 1 disk (you need the -f flag for this), then the
disk is added to the pool as a separate top-level device outside the
raidz vdev (as a normal disk). So if you fill your pool enough to use
this last disk and then remove the disk, the pool will become impossible
to import, with the message "can't import pool, 1 disk currently
unavailable".
So for this to work the pool must be created from the start with 4
disks, which makes sense.
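To make the difference concrete, here is a quick sketch using file-backed
test devices rather than real disks (the /tmp paths and the pool name
"tank" are only for illustration):

# all four members are part of the raidz1 vdev from the start,
# so any single one of them can be lost
dd if=/dev/zero bs=1M count=250 of=/tmp/d1.img    # same for d2..d4
zpool create tank raidz1 /tmp/d1.img /tmp/d2.img /tmp/d3.img /tmp/d4.img

# versus: a 3-disk raidz1 plus a forced single-disk add; the fourth file
# becomes a separate top-level device with no redundancy of its own
zpool create tank raidz1 /tmp/d1.img /tmp/d2.img /tmp/d3.img
zpool add -f tank /tmp/d4.img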


VL
2009-08-27 17:24:48 UTC
Permalink
Thanks for your follow up, Emmanuel.

I'm wondering, regarding today's post on "Show stoppers for a 0.6.0
release": is 0.6.0 close enough for me to download, compile and use?

And selfishly, do you think that it fixes some/any of the issues/bugs
that I'm hitting in my situation? I kind of need to move forward with
a solution and I'd love it to be zfs.

As it stands, I have a 4-drive raidz pool with one flaky drive and I
am unable to replace it. It seems my feasible options are to get
another box up and running OpenSolaris, or to use md/LVM on Linux.
VL
2009-08-27 19:50:27 UTC
Permalink
Hello zfs-fuse peoples,

I thought this would be a post with some good news. I managed to move
some data around and free up a device slot, and I brought a new drive
online as /dev/sdh.

I powered everything off and on, and the flaky /dev/sdi came up online
for the import. So far so good. I then ran the replace command, and it
took; it was replacing!

# zpool status
  pool: rstore2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h37m, 32.80% done, 1h17m to go
config:

        NAME             STATE     READ WRITE CKSUM
        rstore2          ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            sdf          ONLINE       0     0     0
            sdg          ONLINE       0     0     0
            sde          ONLINE       0     0     0
            replacing    ONLINE       0     0     0
              sdi        ONLINE       0  223K     0
              sdh        ONLINE       0     0     0

errors: No known data errors

Everything looked good, until I checked it a bit later:

# zpool status
connect: Connection refused
Please make sure that the zfs-fuse daemon is running.
internal error: failed to initialize ZFS library

It seems the /sbin/zfs-fuse process had crashed and gone away. I had to
remove the pid file, re-run /sbin/zfs-fuse, and export and re-import the
rstore2 pool before getting it back to this state, where it's "stuck"
again:

# zpool status
  pool: rstore2
 state: DEGRADED
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        rstore2                    DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            sdf                    ONLINE       0     0     0
            sdg                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              9461167363731726650  UNAVAIL      0     0     0  was /dev/sdi
              sdh                  ONLINE       0     0     0

errors: No known data errors

I tried to offline both 9461167363731726650 and /dev/sdh, but neither
could be offlined because there were "no valid replicas" for either. My
thought was to somehow get sdh out of there and try to re-add it as a
hot spare. If it was trying to copy the contents of the flaky /dev/sdi,
then of course /dev/sdi would flake out and die. I was hoping it would
reconstruct the contents of /dev/sdi from the raid 'parity' striped
across sdf, sdg and sde. Any ideas on how to do this? Is it possible?
Emmanuel Anne
2009-08-28 08:44:41 UTC
Permalink
Sorry to hear this bad news...

Well, the current version seems to behave better, but only if you patch
put_nvlist first, and since Rudd-O just moved to git it might not be an
excellent time to try it...
But yes, it's a good idea to try it; it would be good to know how it
works for you.

I'll first look at how this new git repo works, send a patch for
put_nvlist, and eventually put a simple tar.gz somewhere for you to
download. I'll post again about that soon (in less than 8 hours,
hopefully!).


Rudd-O
2009-08-28 19:39:19 UTC
Permalink
Let me know when you have a public repo pushed with the nvlist patch
so I can directly merge your changes in.
Emmanuel Anne
2009-08-29 00:48:11 UTC
Permalink
Ok, it will just take a few days, because I have a few weird ideas for that.


Rudd-O
2009-08-29 10:51:15 UTC
Permalink
No problem man. Take all the time you need.

Emmanuel Anne
2009-08-28 09:19:50 UTC
Permalink
Ok, so you can test the current version here :
http://rainemu.swishparty.co.uk/zfs-fuse-20090828.tar.bz2

I don't know if you have ever compiled zfs-fuse before, so just in case:
you need scons, libaio-dev and libfuse-dev (preferably a version >= 2.8.0,
but it will also work with version 2.7).
Then just cd to the src dir, type scons, and then sudo scons install;
it installs in /usr/local.
So if you installed the deb, you should probably temporarily replace the
binaries in /sbin with symlinks to the binaries installed in
/usr/local/sbin.
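Spelled out, the whole sequence looks roughly like this (the unpacked
directory name is a guess from the tarball name, and the symlink step
only covers zfs-fuse itself; zpool and zfs would need the same treatment):

# build dependencies (Ubuntu/Debian package names)
sudo apt-get install scons libaio-dev libfuse-dev

# build and install into /usr/local
tar xjf zfs-fuse-20090828.tar.bz2
cd zfs-fuse-20090828/src
scons
sudo scons install

# if the packaged zfs-fuse is installed, point /sbin at the new binary
sudo mv /sbin/zfs-fuse /sbin/zfs-fuse.dist
sudo ln -s /usr/local/sbin/zfs-fuse /sbin/zfs-fuse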

I hope it helps, and let me know how it works then !


VL
2009-08-28 17:23:12 UTC
Permalink
Emmanuel,

Thanks so much for your help. Ubuntu's apt-get let me install the
requirements in minutes and I have compiled and installed the version
20090828 of zfs-fuse that you provided.

I stopped the older zfs-fuse that I had running, and ran the daemon by
path (I renamed sbin to sbin.hold so it wouldn't be in the default path):

# /usr/local/sbin.hold/zfs-fuse

and then checked the pool with the new zpool

/usr/local/sbin.hold # ./zpool status
  pool: rstore2
 state: UNAVAIL
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        rstore2                    UNAVAIL      0     0     0  insufficient replicas
          raidz1                   UNAVAIL      0     0     0  corrupted data
            sdf                    ONLINE       0     0     0
            sdg                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            replacing              ONLINE       0     0     0
              9461167363731726650  ONLINE       0     0     0  was /dev/sdi
              sdh                  ONLINE       0     0     0

It was unhappy, and I was a little nervous about the "corrupted data",
but I decided to export and import:

/usr/local/sbin.hold # ./zpool export rstore2

and then import

/usr/local/sbin.hold # ./zpool import
  pool: rstore2
    id: 762967251253940714
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        rstore2                    UNAVAIL  insufficient replicas
          raidz1                   UNAVAIL  corrupted data
            sdf                    ONLINE
            sdg                    ONLINE
            sde                    ONLINE
            replacing              ONLINE
              9461167363731726650  ONLINE
              sdh                  ONLINE

Again it showed corrupted data, but I imported it anyway:

/usr/local/sbin.hold # ./zpool import rstore2
/usr/local/sbin.hold # ./zpool status rstore2
  pool: rstore2
 state: DEGRADED
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        rstore2                    DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            sdf                    ONLINE       0     0     0
            sdg                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              9461167363731726650  UNAVAIL      0     0     0  was /dev/sdi
              sdh                  ONLINE       0     0     0

errors: No known data errors

So this looks like I am back to where I was with the older packaged
version of zfs-fuse. I don't know if I want to upgrade the pool with
the upgrade command as I'm sure it's a one-way process. I suppose if
you are fairly sure this build is at least beta quality and think it
might help my situation, I would try to upgrade it.
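For what it is worth, the on-disk version can be checked without
committing to anything. A sketch, assuming this build exposes the pool
'version' property:

# with no arguments, 'zpool upgrade' only reports pools that are not at
# the latest version; it does not modify them
zpool upgrade
zpool get version rstore2
# list the versions this particular build supports
zpool upgrade -v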

So, I have 3 disks online that should be in good status ... sdf, sdg,
sde. I have 2 disks that are in funky status ... the flaky
9461167363731726650 (used to be sdi) and the sdh which was getting
copied to.

What would you recommend I try? Normally with a raid5 hardware setup,
I'd try to yank 9461167363731726650 and sdh, get it back to a 3 out of
4 disk raid5 degraded, add a disk, initiate rebuild.

Thanks so much, and waiting to hear from you on which method/process I
should try.

Val






VL
2009-08-28 17:44:26 UTC
Permalink
Emmanuel,

Sorry I didn't wait. I tried to offline those devices, which had failed
in the packaged version of zfs-fuse; it worked with the version you
built.


/usr/local/sbin.hold # ./zpool offline rstore2 9461167363731726650
/usr/local/sbin.hold # ./zpool offline rstore2 sdh
/usr/local/sbin.hold # ./zpool status
  pool: rstore2
 state: DEGRADED
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        rstore2                    DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            sdf                    ONLINE       0     0     0
            sdg                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            replacing              UNAVAIL      0     0     0  insufficient replicas
              9461167363731726650  OFFLINE      0     0     0  was /dev/sdi
              sdh                  OFFLINE      0     0     0

errors: No known data errors

So, the two drives in question went from "UNAVAIL" and "ONLINE" to both
being "OFFLINE". However, they're still in the pool. How do I remove
9461167363731726650 and sdh, and then attempt to add sdh back in as a
'spare', or whatever it takes to get it to sync?
Emmanuel Anne
2009-08-28 17:55:17 UTC
Permalink
Well, normally everything should work now. You can try

zpool remove rstore2 sdh

or the same for the other device. It might also work to remove the
'replacing' vdev itself, since it is reported as unavailable; try to
offline it first otherwise. (I never saw a replace cancelled by a crash
before, this is rather impressive!)
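If remove refuses, one sequence that should also apply here, going by
the man page wording that 'detach' covers mirror and replacing vdevs
(a sketch, not verified against this exact pool state):

# detach the new disk from the stuck 'replacing' vdev, cancelling the
# interrupted replace and leaving a plain degraded raidz1
zpool detach rstore2 sdh
# then start a fresh replace of the missing member (named by the GUID
# shown in 'zpool status') onto sdh, resilvering from parity
zpool replace rstore2 9461167363731726650 sdh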

Anyway, you are right: don't use zpool upgrade now, because you don't
need to. You can safely ignore the warning message (it's totally
useless), and if you upgrade you won't be able to use the pool anymore
with the old version, so keep the current version.

After that you can try to run your replace command again; it should not
crash in the middle like before (lots of possible crashes were
eliminated in this version, I hope it will be useful in your case).

If your replace command works, then you can choose to continue with
either this new version or the old one. Normally this new one should be
better for everything, but you are free to test and decide!
Let me know how it goes.


VL
2009-08-28 20:06:47 UTC
Permalink
Hello Rudd,

On Aug 28, 12:40 pm, Rudd-O <rud...-U/0UrcBcm+/QT0dZR+***@public.gmane.org> wrote:

> First offline, then remove.  Offline makes ZFS unlink that device from
> the pool but not remove.  Remove only removes unlinked devices.

Now that I have a version that attempts to do more with my devices,
I've been simulating failures. What I'm simulating now is "disk dies or
goes away, leaving 3 functioning drives in a 4-drive pool, degraded".
All I want to do is add a drive and make it resilver up.

I am stuck again. I removed the drive sdh physically, and imported the
pool. Sure enough, sdh was UNAVAIL and the rstore2 pool was DEGRADED,
but with "No known data errors". All good.

Now I'm stuck with device 10732834143420407953 (that was /dev/sdh) and
I can't get rid of it. I've put a new disk in /dev/sdh, but I don't
know how to get rid of 10732834143420407953 (which is playing the part
of the drive gone bad) and get the pool to use the new drive now in
/dev/sdh.

I can ONLINE and OFFLINE 10732834143420407953, which toggles the state
from UNAVAIL to OFFLINE, and back again.

It seems that the only way to 'add' a disk to a raidz pool is with
"replace" but you can't "replace" a disk that is UNAVAIL or OFFLINE.

I tried these 3 commands against 10732834143420407953 while both
UNAVAIL and OFFLINE:

# zpool detach rstore2 10732834143420407953
cannot detach 10732834143420407953: only applicable to mirror and replacing vdevs

# zpool remove rstore2 10732834143420407953
cannot remove 10732834143420407953: only inactive hot spares or cache devices can be removed

# zpool replace rstore2 10732834143420407953 /dev/sdh
cannot replace 10732834143420407953 with /dev/sdh: one or more devices is currently unavailable

As for the last command, of course one or more devices is currently
unavailable; the orig sdh is sitting on my shelf. Silly zfs. ^_^

What are the correct ZFS commands for removing a dead drive from a
raidz pool and replacing it with a new one?

Val
Emmanuel Anne
2009-08-29 00:47:24 UTC
Permalink
First, excellent news that it finally works for you.
Now a warning: my patch for put_nvlist was done in a hurry. It works for
you, but it's not perfect; I could see the disappearing-pools bug come
back (zfs list doesn't show all the filesystems, and zfs mount -a
doesn't mount everything).
I have a workaround, but it's little more than a hack. It seems to work
well, though, so I should post it soon.

2009/8/28 VL <val.luck-***@public.gmane.org>

>
> Now that I have a version that attempts to do more with my devices,
> I've been simulating failures. What I'm simulating now is "disk dies/
> goes away, leave 3 functioning drives in a 4 drive pool, degraded".
> All I want to do is add a drive and make it resilver up.


You really like danger, don't you?
But at least it will be useful! :)

>
> I am stuck again. I removed the drive sdh physically, and imported
> it. Sure enough, sdh was UNAVAIL and rstore2 pool was DEGRADED, but
> "No known data errors". All good.
>
> Now I'm stuck with device 10732834143420407953 (that was /dev/sdh) and
> I can't get rid of it. I've put a new disk in /dev/sdh, but I don't
> know how to get rid of 10732834143420407953 (that is playing the part
> of drive gone bad) and get it to use the new drive now in /dev/sdh.
>
> I can ONLINE and OFFLINE 10732834143420407953, which toggles the state
> from UNAVAIL to OFFLINE, and back again.
>
> It seems that the only way to 'add' a disk to a raidz pool is with
> "replace" but you can't "replace" a disk that is UNAVAIL or OFFLINE.
>
> I tried these 3 commands against 10732834143420407953 while both
> UNAVAIL and OFFLINE:
>
> # zpool detach rstore2 10732834143420407953
> cannot detach 10732834143420407953: only applicable to mirror and
> replacing vdevs
>
> # zpool remove rstore2 10732834143420407953
> cannot remove 10732834143420407953: only inactive hot spares or cache
> devices can be removed
>
> # zpool replace rstore2 10732834143420407953 /dev/sdh
> cannot replace 10732834143420407953 with /dev/sdh: one or more devices
> is currently unavailable
>
> As for the last command, of course one or more devices is currently
> unavailable; the orig sdh is sitting on my shelf. Silly zfs. ^_^
>
> What are the correct ZFS commands for removing a dead drive from a
> raidz pool and replacing it with a new one?
>
> Val


I have already tried this with my files simulating the drives, so I can
answer! ;-)
You tried to make your life too complicated here.
The reason why zfs displays a serial number here is just to show that
the drive changed; it's not meant to be used by you.
So, once you have replaced the drive in sdh, you should see something
like this after running zpool status:
 10732834143420407953 unavail (was /dev/sdh)
Here, once you are sure there is a new, good drive available in sdh, you
just need to type

zpool replace rstore2 sdh

that's all.
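Putting that together with the earlier advice about stable device names,
the whole recovery would look roughly like this (a sketch; adapt the
names to what 'zpool status' actually shows):

# import with /dev/disk/by-id names so a controller reshuffle cannot
# silently rename the pool members
zpool export rstore2
zpool import -d /dev/disk/by-id rstore2
# then replace the missing member in place, either by its old node name
# or by the GUID shown in 'zpool status'
zpool replace rstore2 sdh
# or, with by-id names:
zpool replace rstore2 10732834143420407953 disk/by-id/<new drive id>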

VL
2009-08-29 09:53:20 UTC
Permalink
Hi Emmanuel, thanks as always!

> You tried to make your life too complicated here.
> The reason why zfs displays a serial number here is just to show that the
> drive changed, it's not for use by you.
> So, once you have replaced the drive in sdh, you should see something like
> that after running zpool status
>  10732834143420407953 unavail (was /dev/sdh)
> Here once you are sure there is a new good drive available in sdh you just
> need to type
> zpool replace rstore2 sdh

Yes, I tried that command. I usually try many many options before I
post anything, just to make sure.

zpool replace rstore2 sdh

failed, just as did

zpool replace rstore2 sdh sdh

and just as did replacing sdh with the serial number
10732834143420407953.

I'm stuck again. I can't remove 10732834143420407953, nor can I add
the new sdh, nor can I replace sdh with sdh. And when I say remove, I
mean I tried remove, replace and detach, offline and online; I'm
pretty sure I tried every combination, with both sdh and
10732834143420407953, and with /dev/sdh and the /dev/disk/by-id/ name.

Again, not sure if it's fundamentally zfs or something not working
with zfs-fuse.

If nothing I've reported over the past week is reproducible by anyone,
perhaps Ubuntu has some funky problem with zfs-fuse? ZFS offers so much
in return and I really, really want it to work, but I follow the
documentation and the advice here, and it's just not working.

With hardware RAID (DAS, NAS, or a RAID card), a failed drive has never
been a problem over the past 10 years: physically remove the drive,
replace it with a new one, get into the RAID BIOS, add the replacement
drive to the RAID set, select 'rebuild' and go. I'm at a loss as to why
ZFS is so hard. Am I the only one ZFS is showing these problems to?
Rudd-O
2009-08-29 10:53:39 UTC
Permalink
> Am I the only one that ZFS is showing these
> problems for?

TBH, yes to the extent of my knowledge. In my case, when a drive has
failed, all I have had to do is just check in zpool status that the
drive has failed, offline it, then replace it. But then again, I do
not use RAIDZ, I use RAID1.
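For comparison, the same recovery on a two-way mirror is just this
(a sketch; the pool and device names are illustrative, not from this
thread):

# take the failed half of the mirror out of service
zpool offline tank sdb
# physically swap the disk, then resilver onto the replacement
zpool replace tank sdb
# watch the resilver
zpool status tank
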
Rudd-O
2009-08-29 11:04:28 UTC
Permalink
BTW I forgot to tell you how much we appreciate that you are trying to
make ZFS fail and then see how it behaves. This is exactly what a
project needs to get to release quality.
Emmanuel Anne
2009-08-30 08:37:44 UTC
Permalink
In this case I really don't understand how you came to this stuck
situation...

What I understood:
you recovered from your previous problem, and then decided to simulate
hardware failures to see how it would react.
So you just unplugged a disk (while the pool was imported?) and then
replaced it with another one.
And now you are stuck with the new one, because ZFS recognizes it as a
different disk, so it doesn't want to bring it into the pool, and you
can't simply force it to replace the unplugged drive with this one; am
I correct?

Except for the fact that I simulated with files instead of real hard
disks, I did mostly the same thing and it worked for me.
So please try to give some steps to reproduce this as easily as
possible...


Rudd-O
2009-08-30 10:42:17 UTC
Permalink
Seconded. We need a testcase that lets us replicate your problem.

VL
2009-08-30 22:23:30 UTC
Permalink
On Aug 30, 1:37 am, Emmanuel Anne <emmanuel.a...-***@public.gmane.org> wrote:
> In this case I really don't understand how you came to this stuck
> situation...

On Aug 30, 3:42 am, Rudd-O <rud...-U/0UrcBcm+/QT0dZR+***@public.gmane.org> wrote:
> Seconded. We need a testcase that lets us replicate your problem.

Rudd, Emmanuel, as always, thanks for the response. I'm more than
happy to run any test cases and try to help you replicate what I'm
seeing.

What I have discovered is that hardware devices and disk files behave
differently. Here is the test scenario I have just replicated (from
scratch):

**** DISK FILES ****

1. create 4 disk files:

dd if=/dev/zero bs=1M count=250 of=image1.dat
dd if=/dev/zero bs=1M count=250 of=image2.dat
dd if=/dev/zero bs=1M count=250 of=image3.dat
dd if=/dev/zero bs=1M count=250 of=image4.dat

2. create raidz from those files:

zpool create newtest1 raidz1 /root/zfstest/image*dat

3. verify all is well. Copy data to it. All is well.

4. Export the pool and destroy one disk ; important: recreate and use
the same device (ie, same disk name)

zpool export newtest1
dd if=/dev/zero bs=1M count=250 of=image4.dat

5. Import the degraded pool. Note the format of the output.

# zpool import -d . newtest1
# zpool status newtest1
  pool: newtest1
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME                          STATE     READ WRITE CKSUM
        newtest1                      ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            /root/zfstest/image1.dat  ONLINE       0     0     0
            /root/zfstest/image2.dat  ONLINE       0     0     0
            /root/zfstest/image3.dat  ONLINE       0     0     0
            7887148096686542922       UNAVAIL      0     0     0  was /root/zfstest/image4.dat

errors: No known data errors

6. now the crucial moment -- the step that fails with actual
hardware:

# zpool replace newtest1 /root/zfstest/image4.dat /root/zfstest/image4.dat

# zpool status newtest1
  pool: newtest1
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sun Aug 30 14:53:03 2009
config:

        NAME                          STATE     READ WRITE CKSUM
        newtest1                      ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            /root/zfstest/image1.dat  ONLINE       0     0     0
            /root/zfstest/image2.dat  ONLINE       0     0     0
            /root/zfstest/image3.dat  ONLINE       0     0     0
            /root/zfstest/image4.dat  ONLINE       0     0     0  136M resilvered

errors: No known data errors

7. BINGO. It worked. If you're using this method to replicate, then,
indeed, you would be confused by my failure.

**** HARDWARE ****

1. bring online 4 hard drives (250gb each in my case), sde, sdf, sdg,
sdh

2. Create raidz from those disks:

zpool create rstore2 raidz1 /dev/sde /dev/sdf /dev/sdg /dev/sdh

3. Verify all is well. Copy data to it. All is well.

4. Export the pool and destroy one disk ; important: recreate and use
the same device (ie, same disk name)

# zpool export rstore2

((power off devices, remove sdh and set on shelf, insert new 250gb
into /dev/sdh, power on))

5. Import the degraded pool. Note the format of the output is the
same for the missing disk: <dev id> UNAVAIL ... was <old device>

# zpool status rstore2
  pool: rstore2
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub completed after 0h7m with 0 errors on Fri Aug 28 12:48:27 2009
config:

        NAME                      STATE     READ WRITE CKSUM
        rstore2                   DEGRADED     0     0     0
          raidz1                  DEGRADED     0     0     0
            sde                   ONLINE       0     0     0
            sdf                   ONLINE       0     0     0
            sdg                   ONLINE       0     0     0
            10732834143420407953  UNAVAIL      0     0     0  was /dev/sdh

6. now the crucial moment -- the step that fails:

# zpool replace rstore2 /dev/sdh /dev/sdh
cannot replace /dev/sdh with /dev/sdh: one or more devices is
currently unavailable

7. Problem exists. What works for disk files, fails for hardware
devices.

THOUGHTS:

1) hardware devices are somehow being treated differently from disk
images in the zfs-fuse code.

2) the only way to replicate this would be to test on hardware
devices (get external usb drives?)

3) the 'replace' command is useless in the case of a failed/flakey device, as
it needs to read from one device to another. If the drive is flakey/bad/
missing, then you can't read from it.

4) I just noticed that, in the 2 situations where I imported the degraded
(missing a disk) pools, the file based pool had state ONLINE while the
disk drive based pool had state DEGRADED, but both had the text
"Sufficient replicas exist for the pool to continue".

Afterthoughts

*) zfs in general seems geared towards mirrors and not raid5 (raidz).
Many of the commands only work with mirrors. The man page says
'remove' is used for hot spares, not for raidz. Man page says
'detach' is used for detaching a mirror, not a raidz. Man page says
'add' is for adding a mirror to a device.

I hope that this can help someone reproduce the problems I'm having,
and maybe pave the way towards fixing these hardware issues. If there
is anything else that I can do to help, please let me know.

Thanks again !

Val
Emmanuel Anne
2009-08-30 22:38:08 UTC
Permalink
Yes, interesting. I'll try to reproduce this with partitions then, because I
don't have 4 spare hard disks to track down the cause (if somebody wants to send 4
disks to me, that's fine ! ;-)).

I'll let you know how it goes, I'll probably try that tomorrow.

For the thoughts about mirroring vs raidz :
actually it's just because zfs won't let you remove a drive from a pool once
it has been added (I mean really added: a mirror is a mirror, but when you
add a drive for more storage, it's in for good and can never be removed
after that). That's why the man page only talks about mirrors or spares in
this case.
In your case too you can't remove it, but you can replace it (normally !).
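
As a rough sketch, with a hypothetical raidz1 pool 'tank', a failing member
'sdd' and a replacement 'sde', that distinction looks like this (behaviour as
described in the zpool man page, not re-tested here):

zpool detach tank sdd        (refused: detach only applies to mirrors)
zpool remove tank sdd        (refused: remove only applies to hot spares / cache devices)
zpool replace tank sdd sde   (the supported path for a raidz member)
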
Anyway, don't expect results too soon; tomorrow is again a very sunny day
and a busy one too, but I hope to get to it shortly.

Emmanuel Anne
2009-08-31 00:33:01 UTC
Permalink
I did the testing now (too bad for tomorrow). Result: it works with
partitions too, so it's very surprising that it fails with whole disks.

In the end I just deleted the last partition and recreated it further along
the disk. Then zpool import and zpool status give the same result as yours,
and then either zpool replace test sda14 or zpool replace test /dev/sda14
/dev/sda14 works; it just takes a while to resilver the new partition.

So I can't test this further; testing was with the current version in the git
repository.
I don't have enough external disks to try this with whole disks.
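
In other words, roughly (the partition numbers here are from my test box and
will differ on yours):

zpool export test
fdisk /dev/sda               (delete the last partition, sda14, and recreate it a bit further along)
zpool import test
zpool replace test sda14     (or: zpool replace test /dev/sda14 /dev/sda14)
zpool status test            (wait for the resilver of the new partition)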

Emmanuel Anne
2009-08-31 07:54:46 UTC
Permalink
Ideas :
since it works with a partition, it should probably also work with a raw
disk, so there is probably a problem with the disk itself.
Did you check that it's actually available ? (create a partition on it,
mount it, umount it)
Did you delete all the partitions on it before trying to put it in a pool? (I
am not sure whether partitions can block this, but it's probably safer to
delete them first.)
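
For example, a quick sanity check along those lines might look like this
(device name hypothetical; adapt it to the actual replacement disk):

fdisk /dev/sdh                                (create one test partition, write, quit)
mkfs.ext2 /dev/sdh1                           (put a throwaway filesystem on it)
mount /dev/sdh1 /mnt && umount /mnt           (confirm the disk can actually be read and written)
fdisk /dev/sdh                                (delete the partition again before handing the whole disk to zfs)
dd if=/dev/zero of=/dev/sdh bs=1M count=1     (optionally clear the start of the disk as well)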

VL
2009-08-31 19:13:16 UTC
Permalink
Hi Emmanuel,

> Did you check that it's actually available ? (create a partition on it,
> mount it, umount it)
> Did you delete all the partitions on it before trying to put it in a pool (I
> am not sure wether partitions can block this, but it's probably safer to
> delete them first).

Yes, it was a working disk that had some old data on it. I mounted
it, checked it to verify it was not important, then removed the
partition with cfdisk, and then started testing with zfs-fuse.

However, I did finally get it working, and this is what I did. Since
it kept saying that it couldn't access the device when the old and new
drives had the same name (the old drive was /dev/sdh and the new drive
was also showing up as /dev/sdh), I re-ordered the drives in the system
so that the 'new' drive showed up as /dev/sdf.

and here is the import, the replace, and the status

# zpool import
  pool: rstore2
    id: 762967251253940714
 state: DEGRADED
status: One or more devices contains corrupted data.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-4J
config:

        rstore2                   DEGRADED
          raidz1                  DEGRADED
            sdh                   ONLINE
            sdg                   ONLINE
            sde                   ONLINE
            10732834143420407953  FAULTED  corrupted data

# zpool import rstore2

# zpool replace rstore2 10732834143420407953 /dev/sdf
# zpool status
  pool: rstore2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.71% done, 0h32m to go
config:

        NAME                        STATE     READ WRITE CKSUM
        rstore2                     DEGRADED     0     0     0
          raidz1                    DEGRADED     0     0     0
            sdh                     ONLINE       0     0     0
            sdg                     ONLINE       0     0     0
            sde                     ONLINE       0     0     0
            replacing               DEGRADED     0     0 1.71K
              10732834143420407953  UNAVAIL      0     0     0  was /dev/sdh
              sdf                   ONLINE       0     0     0  56.1M resilvered

errors: No known data errors

Since re-ordering device order is easy enough, I'll consider this a
valid workaround. I will re-perform the entire sequence of events to
verify.
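
To recap the workaround as a sequence (the device names and the GUID are from
the output above and will differ on other systems):

zpool export rstore2                                  (export the degraded pool)
((swap drive positions so the replacement shows up under a different device node))
zpool import rstore2                                  (import it again, still degraded)
zpool replace rstore2 10732834143420407953 /dev/sdf   (replace the missing member by its GUID, pointing at the new node)
zpool status rstore2                                  (watch the resilver complete)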
Emmanuel Anne
2009-08-31 22:18:25 UTC
Permalink
Ouf ! (French word of relief !)

Good news finally, so this problem is solved.
Ok, now the only known problem remaining is the odd freezes for Rudd-O... I
don't know why, but I bet it will be much more complex ! ;-)

Anyway, good to know it's working again for you now.
Be sure to upgrade zfs-fuse again when the stable version is released;
it will have at least some updated man pages which can be useful (you can
get them from the git repository too), and hopefully a few more bug fixes.

Rudd-O
2009-09-01 20:16:51 UTC
Permalink
The freeze and the abort: those are the two bugs I have run into. The
freeze will be diagnosed in the testing rig. For the abort, I am
recompiling with more debugging info to see what makes it hit the
bug. I suspect it might be in FUSE (it seems as if FUSE is sending a
command to free a znode twice), so I will be preparing new FUSE
packages.

VL
2009-08-28 17:58:37 UTC
Permalink
Emmanuel,

Success! I was so gun-shy after having zfs-fuse puke in my face for
the last 4 days, I didn't want to try anything.

After my last post a few min ago, I tried to detach the bad
9461167363731726650 with no hope at all. And it worked!

/usr/local/sbin.hold # ./zpool detach rstore2 9461167363731726650

NO ERRORS!! And the ugly nightmare of a broken replace ... gone ..

/usr/local/sbin.hold # ./zpool status
  pool: rstore2
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rstore2     DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdh     OFFLINE      0     0     0

errors: No known data errors

Very pleasantly surprised. So, I thought I'd keep going, and I
online'd the drive that I offlined:

/usr/local/sbin.hold # zpool online rstore2 sdh

again, no errors, and a status gave me another pleasant surprise:

/usr/local/sbin.hold # ./zpool status
  pool: rstore2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 0h0m with 0 errors on Fri Aug 28 10:49:22 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rstore2     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdh     ONLINE       0     0     0  8K resilvered

It automatically resilvered itself and went back online. Amazing. So
THIS is how zfs is supposed to work.

If it means anything, I think your version is ready to be released to
the general public, and hopefully it will make it into the RPM and DEB
repositories asap.

I guess I'm going to play around with making and breaking pools in
interesting ways just to make 100% sure it's good to go ..

Thanks again everyone, and especially Emmanuel for the custom code
package.

Val
Rudd-O
2009-08-28 19:40:22 UTC
Permalink
> So, the two drives in question went from "UNAVAIL" and "ONLINE" to
> both being "OFFLINE".  However they're still in the pool.  How do I
> remove 9461167363731726650 and sdh, and and then attempt to add sdh
> back in as a 'spare' or whatever it takes to get it to sync?

First offline, then remove. Offline makes ZFS unlink that device from
the pool but not remove. Remove only removes unlinked devices.
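
A minimal sketch of that order, on a hypothetical pool 'tank' with a device
'sdd' (whether 'remove' accepts the device still depends on its role; per the
man page discussion earlier in the thread it applies to hot spares):

zpool offline tank sdd   (take the device out of active use)
zpool remove tank sdd    (then remove it from the pool configuration)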
VL
2009-08-28 19:54:14 UTC
Permalink
Hello Rudd,

On Aug 28, 12:38 pm, Rudd-O <rud...-U/0UrcBcm+/QT0dZR+***@public.gmane.org> wrote:
> We really need a backtrace or a coredump to diagnose this issue in
> depth.  If ZFS died, it either aborted because it detected a bad
> condition (possibly signaling hardware flakiness besides the disk?)
> or it segfaulted because it encountered a really unexpected problem.
> If you run your daemon with ulimit -c unlimited before the daemon, you
> should be able to collect a core file in /, and load it up in gdb.  We
> can walk you through the procedure once you have gotten the core file.

I am no longer running the packaged version, I'm running the version
that Emmanuel provided. I have "upgraded" my pool. I'd be happy to
add that ulimit to my startup scripts, or whatever you think might be
of use ..?
Rudd-O
2009-08-28 19:38:32 UTC
Permalink
We really need a backtrace or a coredump to diagnose this issue in
depth. If ZFS died, it either aborted because it detected a bad
condition (possibly signaling hardware flakiness besides the disk?)
or it segfaulted because it encountered a really unexpected problem.
If you run ulimit -c unlimited in the shell before starting the daemon, you
should be able to collect a core file in /, and load it up in gdb. We
can walk you through the procedure once you have the core file.
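
Roughly, the procedure would be (paths are the usual defaults and may differ
on your system):

ulimit -c unlimited        (in the shell that will start the daemon)
/sbin/zfs-fuse             (start the daemon from that same shell)
...reproduce the crash...
gdb /sbin/zfs-fuse /core   (load the binary together with the core file)
(gdb) bt                   (print the backtrace to post to the list)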

> It seems /sbin/zfs-fuse process and crashed and gone away.  I had to
> remove the pidfile, re-run /sbin/zfs-fuse, export and re-import the
> rstore2 pool before getting it back to this state, where it's "stuck"
> again:
>
> # zpool status
>   pool: rstore2
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         rstore2                    DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             sdf                    ONLINE       0     0     0
>             sdg                    ONLINE       0     0     0
>             sde                    ONLINE       0     0     0
>             replacing              DEGRADED     0     0     0
>               9461167363731726650  UNAVAIL      0     0     0  was /dev/sdi
>               sdh                  ONLINE       0     0     0
>
> errors: No known data errors
>
> I tried to offline both 9461167363731726650 and /dev/sdh , but both
> could not because "no valid replicas" for either.  My thought was to
> somehow get sdh out of there and try to re-add it as a hot spare.  If
> it was trying to copy the contents of the flaky /dev/sdi, then of
> course /dev/sdi would flake out and die.  I was hoping it would
> reconstruct the contents of /dev/sdi from the raid 'parity' striped on
> sdf,sdg,sde.  Any ideas on how to do this?  Is it possible?
devzero-S0/
2009-08-25 19:30:27 UTC
Permalink
so, you're telling me it's only safe to build a raidz with x*3 disks (where x=1,2,3....) ?

that's insane and would be a severe bug or architectural problem.

how do you explain why the "Solaris ZFS Administration Guide" ( http://dlc.sun.com/pdf/819-5461/819-5461.pdf )
gives examples with 7 drives (page 56), 5 drives (page 59) and 4 drives + 1 spare (page 77) if it's not safe to use ?
and why doesn't the manual say a single word about this issue ?

furthermore - taken from the manual:

In RAID-Z, ZFS uses variable-width RAID stripes so that all writes are full-stripe
writes. This design is only possible because ZFS integrates file system and device management
in such a way that the file system's metadata has enough information about the underlying data
redundancy model to handle variable-width RAID stripes. RAID-Z is the world's first
software-only solution to the RAID-5 write hole.
A RAID-Z configuration with N disks of size X with P parity disks can hold approximately
(N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.
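
As a concrete example of that formula: a raidz1 pool of N=4 disks of X=200gb
with P=1 parity holds approximately (4-1)*200gb = ~600gb and can withstand one
disk failing, with nothing there requiring N to be a multiple of 3.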


so, whatever the issue is here - i think it's NOT a user mistake to have chosen 4 disks for raidz

regards
roland

> I am sure because I just tried to reproduce it with raw files, but it's
> the basis of raid5; the principle is you have 3 disks :
> a, b, c and c = a xor b
> this way if you lose 1 disk of the 3 you can recover, because each
> disk is the result of a simple xor between the 2 others (to simplify).
>
> So if you take a number of disks which is not a multiple of 3, then
> the parity can't be kept equally on each disk anymore.
> It would probably be a good idea to look at a detailed zfs admin
> manual for that, I guess you need to add a spare at least in this
> case.
>
> Anyway you can reproduce this like that :
> mkdir /root/dd
> cd /root/dd
> dd if=/dev/zero bs=1M count=100 of=image
> dd if=/dev/zero bs=1M count=100 of=image2
> dd if=/dev/zero bs=1M count=100 of=image3
> dd if=/dev/zero bs=1M count=100 of=new
>
> zpool create test raidz1 /root/dd/image*
> -> ok, using 3 disks, parity everywhere.
>
> Copy some files on the pool to use some space and then
> zpool export test
> rm image2
> zpool import test -d .
> -> ok
> you will just get a degraded state with zpool status which is normal
> since you still have 2 disks to rebuild the 3rd one.
> Here you can safely run dd again to simulate buying a new disk :
>
> dd if=/dev/zero bs=1M count=100 of=image2
>
> and then
> zpool replace test /root/dd/image2
>
> a zpool status after that will show that scrub completed in no time
> because 100 Mb is very short !
>
> Now if you use 4 disks instead of 3, there is real danger :
> 1st export 'test' if you have created it :
> zpool export test
> then erase everything :
> dd if=/dev/zero bs=1M count=100 of=image
> dd if=/dev/zero bs=1M count=100 of=image2
> dd if=/dev/zero bs=1M count=100 of=image3
>
> then create test again but with 4 disks :
> zpool create test raidz1 /root/dd/image* /root/dd/new
>
> in this case parity is unbalanced. If you put some data on the pool
> again, export it as before and then rm new
> if you try to import it after this you'll get the famous error
> message saying that 1 device is unavailable because you don't have
> enough disks anymore to rebuild it.
>
> (notice that if you copy no data on the pool then it will import it
> happily even if new is deleted because there is nothing to rebuild in
> this case).
>
> 2009/8/25 <devzero-S0/***@public.gmane.org>
>
> >VL : I hadn't noticed that you were using 4 drives for a raidz1 pool.
> You had to force the 4th drive with a -f flag, raidz1 pools always
> use drives by 3 (3, 6, ...).
>
> are you sure with that?
> pointers?
>
> VL : I hadn't noticed that you were using 4 drives for a raidz1 pool.
> > You had to force the 4th drive with a -f flag, raidz1 pools always
> > use drives by 3 (3, 6, ...).
> >
> > So your pool is now unstable: if you lose 1 drive, you lose most of
> > it since the data can't be replicated anymore.
> >
> > Maybe there is a trick to try to save things in this case, but I don't
> > know it (and I doubt there is).
> > zfs accepts -f to add a 4th drive in this case, but it's better never
> > to have a drive failure in this case because it's like you don't have
> > any raidz1 at all.
> >
> > You can try the zfsadmin.pdf to check if they have a clue about that :
> > http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
> >
> > Now Rudd-O : in case you read this, I just found a bug stopper for
> > your release. To reproduce these bugs with raidz1 pools, I have tried
> > with raw files as usual, and this time our zfs-fuse has real problems
> > here. zpool status never finishes, with out of memory errors in the
> > syslog. So if you get a failure in a raidz1 pool with the current
> > zfs-fuse, you are in bad shape !
> > This should be investigated...
> >
> > 2009/8/25 VL <val.luck-***@public.gmane.org>
> >
> > Hi Emmanuel,
> >
> > Thanks so much for your reply.
> >
> > On Aug 25, 2:17 am, Emmanuel Anne <emmanuel.a...-***@public.gmane.org> wrote:
> > > sdi is busy because there is a scrub in progress : since it had errors it
> > > scans the whole disk to see what is reliable with it.
> > > So if at this point you want to replace it no matter what, you must do
> > > zpool offline rstore2 sdi
> >
> > Trying to offline the bad sdi device doesn't seem to work, it wants a
> > replica. Shouldn't it be ok to remove 1 drive from a 4 drive "raidz"
> > set?
> >
> > # zpool offline rstore2 sdi
> > cannot offline sdi: no valid replicas
> >
> > I didn't see a -f force type option. Is there a way to be more
> > forceful with zpool and offline a bad disk?
> >
> > > when it's offline you replace it by a new drive, and then
> > > zpool replace rstore2 sdi
> > > and zpool online rstore2 sdi
> >
> > I didn't try these yet as I was unable to offline sdi. Should I just
> > remove the bad drive from the system without first 'removing' it from
> > that pool? My experience yesterday tells me that if I remove the
> > device from the pool, then I will be unable to do any action on that
> > pool as it will have a device missing.
> >
> > > Notice that you can also wait for the scrub to complete to see what it will
> > > say at the end before trying to replace it.
> >
> > I tried it again (now that there are no actions like scrub or resilver
> > going on) and I get the same busy:
> >
> > # zpool replace rstore2 sdi
> > cannot replace sdi with sdi: sdi is busy
> >
> > > For the version, it's ok, the pool version 13 is the most recent released
> > > version until now, a new one should be released soon.
> >
> > I can't wait. Maybe it will make my problems go away?
> >
> > > For the bad vdev error, I am not sure, did you try to specify /dev/sdi
> > > instead of sdi?
> >
> > Yes, I tried that command replace -f with both sdi and /dev/sdi and
> > both returned the exact same message, "invalid vdev specification".
> >
> > Thanks again for all your help; I can't wait to get this fixed.
> >
> > Val