Discussion:
Help: zfs pool with duplicated and missing hdd entries
Jason
2013-01-10 07:51:36 UTC
Hi,

The zfs pool on one of my servers faulted, and it shows the following:
    NAME          STATE     READ WRITE CKSUM
    backup        UNAVAIL      0     0     0  insufficient replicas
      raidz2-0    UNAVAIL      0     0     0  insufficient replicas
        c4t0d0    ONLINE       0     0     0
        c4t0d1    ONLINE       0     0     0
        c4t0d0    FAULTED      0     0     0  corrupted data
        c4t0d3    FAULTED      0     0     0  too many errors
        c4t0d4    FAULTED      0     0     0  too many errors
    ...(omit the rest).

My question is why c4t0d0 appears twice and c4t0d2 is missing.

I have checked the controller card and the hard disks; they are all working fine.

How should I troubleshoot this, what is the likely cause, and how can I
recover the pool?

Thank you.
Jim Klimov
2013-01-10 12:25:32 UTC
This renaming does look like an error in detecting (and subsequently naming)
the disks: e.g., if a connector came loose and one of the disks is not seen
by the system, the numbering can shift in such a manner. It is indeed
strange, however, that only "d2" got shifted or went missing and not all
the numbers after it.

So, did you verify that the controller sees all the disks in the output of
the "format" command (and perhaps, after a cold reboot, in the BIOS)? Just
in case, try unplugging and replugging all cables (power, data), in case
their pins have oxidized over time.
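
As a rough illustration of that check, here is a minimal Python sketch that compares the device names the OS reports with the names the pool lists. It assumes names of the form c<controller>t<target>d<lun>; the helper names and both sample listings are hypothetical and hand-written, not real "format" or pool output.

```python
import re

# Assumption: device names look like c<controller>t<target>d<lun>, e.g. c4t0d0.
NAME_RE = re.compile(r"\bc\d+t\d+d\d+\b")

def device_names(text):
    """Collect every cXtYdZ-style name found in free-form text."""
    return set(NAME_RE.findall(text))

def compare(os_listing, pool_listing):
    """Return (names in the pool but not seen by the OS,
               names seen by the OS but not in the pool)."""
    seen = device_names(os_listing)
    pool = device_names(pool_listing)
    return sorted(pool - seen), sorted(seen - pool)

# Hypothetical listings for illustration only.
os_listing = """
0. c4t0d0
1. c4t0d1
2. c4t0d2
3. c4t0d3
4. c4t0d4
"""
pool_listing = "c4t0d0 c4t0d1 c4t0d0 c4t0d3 c4t0d4"

only_in_pool, only_in_os = compare(os_listing, pool_listing)
print("in pool, not seen by OS:", only_in_pool)
print("seen by OS, not in pool:", only_in_os)
```

If the first list is non-empty, the OS has lost sight of a disk the pool expects; if the second is non-empty, a disk is visible but the pool refers to it under another name.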

HTH,
//Jim
Michael Hase
2013-01-10 14:03:38 UTC
Usually the disk numbering in any Solaris-based OS stays the same if one
disk is offline or missing: it is fixed to the controller port, SCSI
target, or WWN. IMHO that is a huge advantage of the c0t0d0 pattern over
the Linux or FreeBSD numbering. I once had an old Sun 5200 hooked up to a
Linux box, and when one of the 22 disks failed, every disk after the bad
one shifted. What a mess.

To me the c4t0d0, c4t0d1, ... numbering looks like either a hardware RAID
controller not in JBOD mode or even an external SAN. JBODs normally show
up as LUN 0 (d0) with different target numbers (t1, t2, ...). Maybe
something is wrong with the LUN numbering on your box?
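
To make the naming pattern concrete, here is a minimal Python sketch that splits names like c4t0d0 into controller, target, and LUN, then reports duplicated names and gaps in the LUN sequence. It assumes the plain c<controller>t<target>d<lun> shape only (no WWN-style targets or slice suffixes), and the function names are hypothetical, not part of any Solaris tool.

```python
import re
from collections import Counter

# Assumed name shape: c<controller>t<target>d<lun>, e.g. c4t0d0.
DEVICE_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)$")

def parse_device(name):
    """Split a device name into (controller, target, lun) integers."""
    match = DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    return tuple(int(part) for part in match.groups())

def duplicates_and_gaps(names):
    """Return (names listed more than once, names missing from the LUN run)."""
    counts = Counter(names)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    # Group the LUNs seen under each (controller, target) pair.
    luns = {}
    for name in counts:
        ctrl, tgt, lun = parse_device(name)
        luns.setdefault((ctrl, tgt), set()).add(lun)
    # A gap is any LUN between the lowest and highest seen that is absent.
    missing = []
    for (ctrl, tgt), seen in sorted(luns.items()):
        for lun in range(min(seen), max(seen) + 1):
            if lun not in seen:
                missing.append(f"c{ctrl}t{tgt}d{lun}")
    return duplicated, missing

# Device names as they appear in the pool listing from the first post.
vdevs = ["c4t0d0", "c4t0d1", "c4t0d0", "c4t0d3", "c4t0d4"]
print(duplicates_and_gaps(vdevs))
```

Here every disk sits on controller 4, target 0, and differs only in the LUN (d) field, which is the pattern that points toward a RAID controller or SAN rather than a plain JBOD.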

-- Michael
