Discussion:
ZFS hangs/freezes after disk failure, resumes when disk is replaced
Todd H. Poole
2008-08-24 04:06:54 UTC
Permalink
Howdy y'all,

Earlier this month I downloaded and installed the latest copy of OpenSolaris (2008.05) so that I could test out some of the newer features I've heard so much about, primarily ZFS.

My goal was to replace our aging Linux-based (SuSE 10.1) file and media server with a new machine running Sun's OpenSolaris and ZFS. Our old server ran your typical RAID5 setup with 4 500GB disks (3 data, 1 parity), used LVM, mdadm, and XFS to help keep things in order, and relied on NFS to export users' shares. It was solid, stable, and worked wonderfully well.

I would like to replicate this experience using the tools OpenSolaris has to offer, taking advantage of ZFS. However, there are enough differences between the two OSes - especially with respect to the filesystems and (for lack of a better phrase) "RAID managers" - to cause me to consult (on numerous occasions) the likes of Google, these forums, and other places for help.

I've been successful in troubleshooting all problems up until now.

On our old media server (the SuSE 10.1 one), when a disk failed, the machine would send out an e-mail detailing the type of failure, and gracefully fall into a degraded state, but would otherwise continue to operate using the remaining 3 disks in the system. After the faulty disk was replaced, all of the data from the old disk would be replicated onto the new one (I think the term is "resilvered" around here?), and after a few hours, the RAID5 array would be seamlessly promoted from "degraded" back up to a healthy "clean" (or "online") state.

Throughout the entire process, there would be no interruptions to the end user: all NFS shares still remained mounted, there were no noticeable drops in I/O, files, directories, and any other user-created data still remained available, and if everything went smoothly, no one would notice a failure had even occurred.

I've tried my best to recreate something similar in OpenSolaris, but I'm stuck on making it all happen seamlessly.

For example, I have a standard beige box machine running OS 2008.05 with a zpool that contains 4 disks, similar to what the old SuSE 10.1 server had. However, whenever I unplug the SATA cable from one of the drives (to simulate a catastrophic drive failure) while doing moderate reading from the zpool (such as streaming HD video), not only does the video hang on the remote machine (which is accessing the zpool via NFS), but the server running OpenSolaris seems to either hang, or become incredibly unresponsive.

And when I write unresponsive, I mean that when I type the command "zpool status" to see what's going on, the command hangs, followed by a frozen Terminal a few seconds later. After just a few more seconds, the entire GUI - mouse included - locks up or freezes, and all NFS shares become unavailable from the perspective of the remote machines. The whole machine locks up hard.

The machine then stays in this frozen state until I plug the hard disk back in, at which point everything, quite literally, pops back into existence all at once: the output of the "zpool status" command flies by (with all disks listed as "ONLINE" and all "READ," "WRITE," and "CKSUM," fields listed as "0"), the mouse jumps to a different part of the screen, the NFS share becomes available again, and the movie resumes right where it had left off.

While such a quick resume is encouraging, I'd like to avoid the freeze in the first place.

How can I keep any hardware failures like the above transparent to my users?

-Todd

PS: I've done some research, and while my problem is similar to the ones described in the following threads:

http://opensolaris.org/jive/thread.jspa?messageID=151719&#151719
http://opensolaris.org/jive/thread.jspa?messageID=240481&#240481

most of these posts are quite old, and do not offer any solutions.

PPS: I know I haven't provided any details on hardware, but I feel like this is more likely a higher-level issue (like some sort of configuration file or setting is needed) than a lower-level one (like faulty hardware). However, if someone were to give me a command to run, I'd gladly do it... I'm just not sure which ones would be helpful, or if I even know which ones to run. It took me half an hour of searching just to find out how to list the disks installed in this system (it's "format") so that I could build my zpool in the first place. It's not quite as simple as writing out /dev/hda, /dev/hdb, /dev/hdc, /dev/hdd. ;)
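
In case it helps, these are the commands I could run and post output from - I'm only guessing at what's useful here, and "tank" below is just a stand-in for whatever I end up naming the pool:

  zpool status -v tank    # pool layout, device names, and READ/WRITE/CKSUM counters
  format < /dev/null      # list every disk the OS can see, then exit
  iostat -En              # per-device error counters and model/serial info
  cfgadm -al              # attachment-point status for the controllers

Just say the word and I'll post whatever any of these spit out.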


Tim
2008-08-24 04:13:40 UTC
Permalink
Post by Todd H. Poole
[...]
It's a lower level one. What hardware are you running?
Todd H. Poole
2008-08-24 04:41:38 UTC
Permalink
Hmm... I'm leaning away a bit from the hardware, but just in case you've got an idea, the machine is as follows:

CPU: AMD Athlon X2 4850e 2.5GHz Socket AM2 45W Dual-Core Processor Model ADH4850DOBOX (http://www.newegg.com/Product/Product.aspx?Item=N82E16819103255)

Motherboard: GIGABYTE GA-MA770-DS3 AM2+/AM2 AMD 770 ATX All Solid Capacitor AMD Motherboard (http://www.newegg.com/Product/Product.aspx?Item=N82E16813128081)

RAM: G.SKILL 4GB (2 x 2GB) 240-Pin DDR2 SDRAM DDR2 800 (PC2 6400) Dual Channel Kit Desktop Memory Model F2-6400CL5D-4GBPQ (http://www.newegg.com/Product/Product.aspx?Item=N82E16820231122)

HDD (x4): Western Digital Caviar GP WD10EACS 1TB 5400 to 7200 RPM SATA 3.0Gb/s Hard Drive (http://www.newegg.com/Product/Product.aspx?Item=N82E16822136151)

The reason I don't think it's a hardware issue is that, before I got OpenSolaris up and running, I had a fully functional install of openSUSE 11.0 on this machine (with everything configured similarly to the original server) to make sure none of the components were damaged during shipping from Newegg. Everything worked as expected.

Furthermore, before making my purchases, I made sure to check the HCL, and my processor and motherboard combination is supported: http://www.sun.com/bigadmin/hcl/data/systems/details/3079.html

But, like I said earlier, I'm new here, so you might be on to something that never occurred to me.

Any ideas?


Tim
2008-08-24 04:55:55 UTC
Permalink
Post by Todd H. Poole
[...]
What are you using to connect the HDs to the system? The onboard ports?
What driver is being used? AHCI, or IDE compatibility mode?

I'm not saying the hardware is bad, I'm saying the hardware is most likely
the cause by way of the driver. There really isn't any *setting* in Solaris I'm
aware of that says "hey, freeze my system when a drive dies". That just
sounds like hot-swap isn't working as it should be.

--Tim
Todd H. Poole
2008-08-24 08:27:46 UTC
Permalink
Ah, yes - all four hard drives are connected to the motherboard's onboard SATA II ports. There is one additional drive I have neglected to mention thus far (the boot drive) but that is connected via the motherboard's IDE channel, and has remained untouched since the install... I don't really consider it part of the problem, but I thought I should mention it just in case... you never know...

As for the drivers... well, I'm not sure of the command to determine that directly, but going under System > Administration > Device Driver Utility yields the following information under the "Storage" entry:

Components: "ATI Technologies Inc. SB600 IDE"
Driver: pci-ide
--Driver Information--
Driver: pci-ide
Instance: 1
Attach Status: Attached
--Hardware Information--
Vendor ID: 0x1002
Device ID: 0x438c
Class Code: 0001018a
DevPath: /***@0,0/pci-***@14,1

and

Components: "ATI Technologies Inc. SB600 Non-Raid-5 SATA"
Driver: pci-ide
--Driver Information--
Driver: pci-ide
Instance: 0
Attach Status: Attached
--Hardware Information--
Vendor ID: 0x1002
Device ID: 0x4380
Class Code: 0001018f
DevPath: /***@0,0/pci-***@12
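
If there's a more direct way to pull that information from a terminal, let me know - I'm guessing something along these lines would show the same driver bindings as the GUI tool above, but I haven't confirmed it:

  prtconf -D | egrep -i "ide|ahci"   # driver name attached to each device node
  grep ahci /etc/driver_aliases      # PCI IDs / class codes the ahci driver claims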

Furthermore, there is one Driver Problem detected but the error is under the "USB" entry. There are seven items listed:

Components: ATI Technologies Inc. SB600 USB Controller (EHCI)
Driver: ehci

Components: ATI Technologies Inc. SB600 USB (OHCI4)
Driver: ohci

Components: ATI Technologies Inc. SB600 USB (OHCI3)
Driver: ohci

Components: ATI Technologies Inc. SB600 USB (OHCI2)
Driver: ohci

Components: ATI Technologies Inc. SB600 USB (OHCI1)
Driver: ohci (Driver Misconfigured)

Components: ATI Technologies Inc. SB600 USB (OHCI0)
Driver: ohci

Components: Microsoft Corp. Wheel Mouse Optical
Driver: hid

As you can tell, the OHCI1 device isn't properly configured, but I don't know how to configure it (there are only "Help," "Submit...," and "Close" buttons to click, no "Install Driver"). And, to tell you the truth, I'm not even sure it's worth mentioning because I don't have anything but my mouse plugged into USB, and even so... it's a mouse... plugged into USB... hardly something that is going to bring my machine to a grinding halt every time a SATA II disk gets yanked from a RAID-Z array (at least, I should hope the two don't have anything in common!).

And... wait... you mean to tell me that I can't just untick the checkbox that says "Hey, freeze my system when a drive dies" to solve this problem? Ugh. And here I was hoping for a quick fix... ;)

Anyway, how does the above sound? What else can I give you?

-Todd

PS: Thanks, by the way, for the support - I'm not sure where else to turn to for this kind of stuff!


Tim
2008-08-24 15:30:34 UTC
Permalink
I'm pretty sure pci-ide doesn't support hot-swap. I believe you need ahci.
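
If I remember right, the ahci driver binds by PCI class code rather than by
specific vendor/device IDs, so flipping the BIOS SATA mode from IDE/compatibility
to AHCI should bring the controller up under ahci instead of pci-ide on the next
boot. Something like this should confirm it afterwards (untested, and the class
code below is from memory):

  prtconf -D | grep ahci                       # ahci should now be attached to the SATA controller
  grep "pciclass,010601" /etc/driver_aliases   # the generic AHCI class code the driver binds to
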
Post by Todd H. Poole
[...]
Todd H. Poole
2008-08-24 20:23:52 UTC
Permalink
Hmmm. Alright, but supporting hot-swap isn't the issue, is it? I mean, like I said in my response to myxiplx, if I have to bring down the machine in order to replace a faulty drive, that's perfectly acceptable - I can do that whenever it's most convenient for me.

What is _not_ perfectly acceptable (indeed, what is quite _unacceptable_) is if the machine hangs/freezes/locks up or is otherwise brought down by an isolated failure in a supposedly redundant array... Yanking the drive is just how I chose to simulate that failure. I could just as easily have decided to take a sledgehammer or power drill to it,

http://youtu.be/CN6iDzesEs0 (fast-forward to the 2:30 part)
http://youtu.be/naKd9nARAes

and the machine shouldn't have skipped a beat. After all, that's the whole point behind the "redundant" part of RAID, no?

And besides, RAID's been around for almost 20 years now... It's nothing new. I've seen (countless times, mind you) plenty of regular old IDE drives fail in a simple software RAID5 array and not bring the machine down at all. Granted, you still had to power down to re-insert a new one (unless you were using some fancy controller card), but the point remains: the machine would still work perfectly with only 3 out of 4 drives present... So I know for a fact this type of stability can be achieved with IDE.

What I'm getting at is this: I don't think the method by which the drives are connected - or whether or not that method supports hot-swap - should matter. A machine _should_not_ crash when a single drive (out of a 4 drive ZFS RAID-Z array) is ungracefully removed, regardless of how abruptly that drive is excised (be it by a slow failure of the drive motor's spindle, by yanking the drive's power cable, by yanking the drive's SATA connector, by smashing it to bits with a sledgehammer, or by drilling into it with a power drill).

So we've established that one potential workaround is to use the ahci driver instead of the pci-ide driver. Good! I like this kind of problem solving! But that's still side-stepping the problem... While this machine is entirely SATA II, what about those who have a mix between SATA and IDE? Or even much larger entities whose vast majority of hardware is only a couple of years old, and still entirely IDE?

I'm grateful for your help, but is there another way that you can think of to get this to work?


James C. McPherson
2008-08-24 21:28:33 UTC
Permalink
Post by Todd H. Poole
Hmmm. Alright, but supporting hot-swap isn't the issue, is it? I mean,
like I said in my response to myxiplx, if I have to bring down the
machine in order to replace a faulty drive, that's perfectly acceptable -
I can do that whenever it's most convenient for me.
What is _not_ perfectly acceptable (indeed, what is quite _unacceptable_)
is if the machine hangs/freezes/locks up or is otherwise brought down by
an isolated failure in a supposedly redundant array... Yanking the drive
is just how I chose to simulate that failure. I could just as easily have
decided to take a sledgehammer or power drill to it,
But you're not attempting hotswap, you're doing hot plug....
and unless you're using the onboard bios' concept of an actual
RAID array, you don't have an array, you've got a JBOD and
it's not a real JBOD - it's a PC motherboard which does _not_
have the same electronic and electrical protections that a
JBOD has *by design*.
Post by Todd H. Poole
http://youtu.be/CN6iDzesEs0 (fast-forward to the 2:30
part) http://youtu.be/naKd9nARAes
and the machine shouldn't have skipped a beat. After all, that's the
whole point behind the "redundant" part of RAID, no?
Sigh.
Post by Todd H. Poole
And besides, RAID's been around for almost 20 years now... It's nothing
new. I've seen (countless times, mind you) plenty of regular old IDE
drives fail in a simple software RAID5 array and not bring the machine
down at all. Granted, you still had to power down to re-insert a new one
(unless you were using some fancy controller card), but the point
remains: the machine would still work perfectly with only 3 out of 4
drives present... So I know for a fact this type of stability can be
achieved with IDE.
And you're right, it can. But what you've been doing is outside
the bounds of what IDE hardware on a PC motherboard is designed
to cope with.
Post by Todd H. Poole
What I'm getting at is this: I don't think the method by which the drives
are connected - or whether or not that method supports hot-swap - should
matter.
Well sorry, it does. Welcome to an OS which does care.
Post by Todd H. Poole
A machine _should_not_ crash when a single drive (out of a 4
drive ZFS RAID-Z array) is ungracefully removed, regardless of how
abruptly that drive is excised (be it by a slow failure of the drive
motor's spindle, by yanking the drive's power cable, by yanking the
drive's SATA connector, by smashing it to bits with a sledgehammer, or by
drilling into it with a power drill).
If the controlling electronics for your disk can't handle
it, then you're hosed. That's why FC, SATA (in SATA mode)
and SAS are much more likely to handle this out of the box.
Parallel SCSI requires funky hardware, which is why those
old 6- or 12-disk multipacks are so useful to have.

Of the failure modes that you suggest above, only one is
going to give you anything other than catastrophic failure
(drive motor degradation) - and that is because the drive's
electronics will realise this, and send warnings to the
host.... which should have its drivers written so that these
messages are logged for the sysadmin to act upon.

The other failure modes are what we call catastrophic. And
where your hardware isn't designed with certain protections
around drive connections, you're hosed. No two ways about it.
If your system suffers that sort of failure, would you seriously
expect that non-hardened hardware would survive it?
Post by Todd H. Poole
So we've established that one potential work around is to use the ahci
instead of the pci-ide driver. Good! I like this kind of problem solving!
But that's still side-stepping the problem... While this machine is
entirely SATA II, what about those who have a mix between SATA and IDE?
Or even much larger entities whose vast majority of hardware is only a
couple of years old, and still entirely IDE?
If you've got newer hardware, which can support SATA in
native SATA mode, USE IT.

Don't _ever_ try that sort of thing with IDE. As I mentioned
above, IDE is not designed to be able to cope with what
you've been inflicting on this machine.
Post by Todd H. Poole
I'm grateful for your help, but is there another way that you can think
of to get this to work?
You could start by taking us seriously when we tell you
that what you've been doing is not a good idea, and find
other ways to simulate drive failures.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Todd H. Poole
2008-08-25 02:36:11 UTC
Permalink
Post by James C. McPherson
But you're not attempting hotswap, you're doing hot plug....
Do you mean hot UNplug? Because I'm not trying to get this thing to recognize any new disks without a restart... Honest. I'm just trying to prevent the machine from freezing up when a drive fails. I have no problem restarting the machine with a new drive in it later so that it recognizes the new disk.
Post by James C. McPherson
and unless you're using the onboard bios' concept of an actual
RAID array, you don't have an array, you've got a JBOD and
it's not a real JBOD - it's a PC motherboard which does _not_
have the same electronic and electrical protections that a
JBOD has *by design*.
I'm confused by what your definition of a RAID array is, and for that matter, what a JBOD is... I've got plenty of experience with both, but just to make sure I wasn't off my rocker, I consulted the demigod:

http://en.wikipedia.org/wiki/RAID
http://en.wikipedia.org/wiki/JBOD

and I think what I'm doing is indeed RAID... I'm not using some sort of controller card, or any specialized hardware, so it's certainly not Hardware RAID (and thus doesn't contain any of the fancy electronic or electrical protections you mentioned), but lacking said protections doesn't preclude the machine from being considered a RAID. All the disks are the same capacity, the OS still sees the zpool I've created as one large volume, and since I'm using RAID-Z (RAID5), it should be redundant... What other qualifiers out there are necessary before a system can be called RAID compliant?

If it's hot-swappable technology, or a controller hiding the details from the OS and instead presenting a single volume, then I would argue those things are extra - not a fundamental prerequisite for a system to be called a RAID.

Furthermore, while I'm not sure what the difference between a "real JBOD" and a plain old JBOD is, this set-up certainly wouldn't qualify for either. I mean, there is no concatenation going on, redundancy should be present (but due to this issue, I haven't been able to verify that yet), and all the drives are the same size... Am I missing something in the definition of a JBOD?

I don't think so...
Post by James C. McPherson
And you're right, it can. But what you've been doing is outside
the bounds of what IDE hardware on a PC motherboard is designed
to cope with.
Well, yes, you're right, but it's not like I'm making some sort of radical departure outside of the bounds of the hardware... It really shouldn't be a problem so long as it's not an unreasonable departure because that's where software comes in. When the hardware can't cut it, that's where software picks up the slack.

Now, obviously, I'm not saying software can do anything with any piece of hardware you give it - no matter how many lines of code you write, your keyboard isn't going to turn into a speaker - but when it comes to reasonable stuff like ensuring a machine doesn't crash because a user did something with the hardware that he or she wasn't supposed to do? Prime target for software.

And that's the way it's always been... The whole push behind that ZFS Promise thing (or if you want to make it less specific, the attractiveness of RAID in general) was that "RAID-Z [wouldn't] require any special hardware. It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks." (http://blogs.sun.com/bonwick/entry/raid_z)
Post by James C. McPherson
Well sorry, it does. Welcome to an OS which does care.
The half-hearted apology wasn't necessary... I understand that OpenSolaris cares about the method those disks use to plug into the motherboard, but what I don't understand is why that limitation exists in the first place. It would seem much better to me to have an OS that doesn't care (but developers that do) and just finds a way to work, versus one that does care (but developers that don't) and instead isn't as flexible and gets picky... I'm not saying OpenSolaris is the latter, but I'm not getting the impression it's the former either...
Post by James C. McPherson
If the controlling electronics for your disk can't
handle it, then you're hosed. That's why FC, SATA (in SATA
mode) and SAS are much more likely to handle this out of
the box. Parallel SCSI requires funky hardware, which is why
those old 6- or 12-disk multipacks are so useful to have.
Of the failure modes that you suggest above, only one
is going to give you anything other than catastrophic
failure (drive motor degradation) - and that is because the
drive's electronics will realise this, and send warnings to
the host.... which should have its drivers written so
that these messages are logged for the sysadmin to act upon.
The other failure modes are what we call catastrophic. And
where your hardware isn't designed with certain protections
around drive connections, you're hosed. No two ways
about it. If your system suffers that sort of failure, would
you seriously expect that non-hardened hardware would survive it?
Yes, I would. At the risk of sounding repetitive, I'll summarize what I've been getting at in my previous responses: I certainly _do_ think it's reasonable to expect non-hardened hardware to survive this type of failure. In fact, I think it's unreasonable _not_ to expect it to. The Linux kernel, the BSD kernels, and the NT kernel (or whatever chunk of code runs Windows) all provide this type of functionality, and have done so for some time. Granted, they may all do it in different ways, but at the end of the day, unplugging an IDE hard drive from a software RAID5 array in OpenSuSE or RedHat, FreeBSD, or Windows XP Professional will not bring the machine down. And it shouldn't in OpenSolaris either. There might be some sort of noticeable bump (Windows, for example, pauses for a few seconds while it tries to figure out what the hell just happened to one of its disks), but there isn't anything show-stopping...
Post by James C. McPherson
If you've got newer hardware, which can support SATA
in native SATA mode, USE IT.
I'll see what I can do - this might be some sort of BIOS setting that can be configured.
Post by James C. McPherson
Post by Todd H. Poole
I'm grateful for your help, but is there another way that you can think
of to get this to work?
You could start by taking us seriously when we tell
you that what you've been doing is not a good idea, and
find other ways to simulate drive failures.
Let's drop the confrontational attitude - I'm not trying to dick around with you here. I've done my due diligence in researching this issue on Google, these forums, and Sun's documentation before making a post, I've provided any clarifying information that has been requested by those kind enough to post a response, and I've yet to resort to any witty or curt remarks in my correspondence with you, tcook, or myxiplx. Whatever is causing you to think I'm not taking anyone seriously, let me reassure you, I am.

The only thing I'm doing is testing a system by applying the worst case scenario of survivable torture to it and seeing how it recovers. If that's not a good idea, then I guess we disagree. But that's ok - you're James C. McPherson, Senior Kernel Software Engineer, Solaris, and I'm just some user who's trying to find a solution to his problem. My bad for expecting the same level of respect I've given two other members of this community to be returned in kind by one of its leaders.

So aside from telling me to "[never] try this sort of thing with IDE" does anyone else have any other ideas on how to prevent OpenSolaris from locking up whenever an IDE drive is abruptly disconnected from a ZFS RAID-Z array?

-Todd


Matt Harrison
2008-08-25 03:06:13 UTC
Permalink
Post by Todd H. Poole
[...]
I'm far from being an expert on this subject, but this is what I understand:

Unplugging a drive (actually pulling the cable out) does not simulate a
drive failure; it simulates a drive getting unplugged, which is
something the hardware is not capable of dealing with.

If your drive were to suffer something more realistic, along the lines
of how you would normally expect a drive to die, then the system should
cope with it a whole lot better.

Unfortunately, hard drives don't come with a big button saying "simulate
head crash now" or "make me some bad sectors" so it's going to be
difficult to simulate those failures.
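
About the closest software-only stand-in I can think of is taking a device
out of the pool administratively and then putting it back, which at least
exercises the degraded/resilver path without surprising the controller. Pool
and device names here are just examples:

  zpool offline tank c1t1d0   # pool drops to DEGRADED but stays up
  zpool status tank           # watch the state change
  zpool online tank c1t1d0    # device rejoins and resilvers

It's obviously a far gentler event than a real failure, but it does show what
ZFS does when it knows a device has gone away.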

All I can say is that unplugging a drive yourself will not simulate a
failure, it merely causes the disk to disappear. Dying or dead disks
will still normally be able to communicate with the driver to some
extent, so they are still "there".

If you were using dedicated hot-swappable hardware, then I wouldn't
expect to see the problem, but AFAIK off-the-shelf SATA hardware doesn't
support this fully, so unexpected results will occur.

I hope this has been of some small help, even just to explain why the
system didn't cope as you expected.

Matt
Justin
2008-08-25 06:53:12 UTC
Permalink
Howdy Matt. Just to make it absolutely clear, I appreciate your response. I would be quite lost if it weren't for all of the input.
Post by Matt Harrison
Unplugging a drive (actually pulling the cable out) does not simulate a
drive failure, it simulates a drive getting unplugged, which is
something the hardware is not capable of dealing with.
If your drive were to suffer something more realistic, along the lines
of how you would normally expect a drive to die, then the system should
cope with it a whole lot better.
Hmmm... I see what you're saying. But, ok, let me play devil's advocate. What about the times when a drive fails in a way the system didn't expect? What you said was right - most of the time, when a hard drive goes bad, SMART will pick up on its impending doom long before it's too late - but what about the times when the cause of the problem is larger or more abrupt than that (like tin whiskers causing shorts, or a server room technician yanking the wrong drive)?

To imply that OpenSolaris with a RAID-Z array of IDE drives will _only_ protect me from data loss during _specific_ kinds of failures (the ones which OpenSolaris considers "normal") is a pretty big implication... and is certainly a show-stopping one at that. Nobody is going to want to rely on an OS/RAID solution that can only survive certain types of drive failures, while there are others out there that can survive the same and more...

But then again, I'm not sure if that's what you meant... is that what you were getting at, or did I misunderstand?
Post by Matt Harrison
Unfortunately, hard drives don't come with a big button saying "simulate
head crash now" or "make me some bad sectors" so it's going to be
difficult to simulate those failures.
lol, if only they did - just having a button to push would make testing these types of things a lot easier. ;)
Post by Matt Harrison
All I can say is that unplugging a drive yourself will not simulate a
failure, it merely causes the disk to disappear.
But isn't that a perfect example of a failure!? One in which the drive just seems to pop out of existence? lol, forgive me if I'm sounding pedantic, but why is there even a distinction between the two? This is starting to sound more and more like a bug...
Post by Matt Harrison
I hope this has been of some small help, even just to
explain why the system didn't cope as you expected.
It has, thank you - I appreciate the response.


Heikki Suonsivu on list forwarder
2008-08-25 14:36:34 UTC
Permalink
Post by Justin
[...]
Hmmm... I see what you're saying. But, ok, let me play devil's
advocate. What about the times when a drive fails in a way the system
didn't expect? What you said was right - most of the time, when a
hard drive goes bad, SMART will pick up on its impending doom long
before it's too late - but what about the times when the cause of the
problem is larger or more abrupt than that (like tin whiskers causing
shorts, or a server room technician yanking the wrong drive)?
I read a research paper by Google about this a while ago. Their
conclusion was that SMART is a poor predictor of disk failure, even
though they did find some useful indications. Google for "google disk
failure"; it came out as the second link a moment ago, title "Failure
Trends in a Large Disk Drive Population".

The problem is that trying to predict disk failures with SMART
parameters only catches a certain percentage of failing disks, and that
percentage is not all that great. Many disks will still decide to fail
catastrophically, most often in the early morning of December 25th, in
particular if there is a huge snowstorm going on :)

Heikki
Ralf Ramge
2008-08-25 10:15:41 UTC
Permalink
Post by Justin
[...]
I think there's a misunderstanding concerning the underlying concepts.
I'll try to explain my thoughts; please excuse me if this becomes a bit
lengthy. Oh, and I am not a Sun employee or ZFS fan, I'm just a customer
who loves and hates ZFS at the same time ;-)

You know, ZFS is designed for high *reliability*. This means that ZFS
tries to keep your data as safe as possible. This includes faulty
hardware, missing hardware (like in your testing scenario) and, to a
certain degree, even human mistakes.
But there are limits. For instance, ZFS does not make a backup
unnecessary. If there's a fire and your drives melt, then ZFS can't do
anything. Or if the hardware is lying about the drive geometry. ZFS is
part of the operating environment and, as a consequence, relies on the
hardware.
So ZFS can't make unreliable hardware reliable. All it can do is try
to protect the data you saved on it. But it cannot guarantee this to you
if the hardware becomes its enemy.
A real-world example: I have a 32-core Opteron server here, with 4
FibreChannel controllers and 4 JBODs with a total of 64 FC drives connected
to it, running a RAID 10 using ZFS mirrors. Sounds a lot like high-end
hardware compared to your NFS server, right? But ... I have exactly the
same symptom. If one drive fails, an entire JBOD with all 16 included
drives hangs, and all zpool access freezes. The reason for this is the
miserable JBOD hardware. There's only one FC loop inside of it, the
drives are connected serially to each other, and if one drive dies, the
drives behind it go downhill, too. ZFS immediately starts caring about
the data, the zpool command hangs (but I still have traffic on the other
half of the ZFS mirror!), and it does the right thing by doing so:
whatever happens, my data must not be damaged.
A "bad" filesystem like Linux ext2 or ext3 with LVM would just continue,
even if the Volume Manager noticed the missing drive or not. That's what
you experienced. But you run in the real danger of having to use fsck at
some point. Or, in my case, fsck'ing 5 TB of data on 64 drives. That's
not much fun and results in a lot more downtime than replacing the
faulty drive.

What can you expect from ZFS in your case? You can expect it to detect
that a drive is missing and to make sure that your _data integrity_
isn't compromised. By any means necessary. This may even require making
the system completely unresponsive until a timeout has passed.
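
A side note: newer builds have a pool-level "failmode" property that controls
what ZFS does when a device becomes unreachable (wait, continue or panic). I
don't remember whether 2008.05 is new enough to have it, and it will not help
when the hang sits in the IDE driver underneath ZFS rather than in ZFS itself,
but it is worth a look - "tank" is again just a placeholder:

  zpool get failmode tank            # check whether the property exists and how it is set
  zpool set failmode=continue tank   # return errors instead of blocking, where possible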




But what you described is not a case of reliability. You want something
completely different. You expect it to deliver *availability*.

And availability is something ZFS doesn't promise. It simply can't
deliver this. You have the impression that NTFS and various other
filesystems do so, but that's an illusion. The next reboot followed by
an fsck run will show you why. Availability requires full reliability of
every included component of your server as a minimum, and you can't
expect ZFS or any other filesystem to deliver this with cheap IDE
hardware.

Usually people want to save money when buying hardware, and ZFS is a
good choice to deliver the *reliability* then. But the conceptual
stalemate between reliability and availability of such cheap hardware
still exists - the hardware is cheap, the file system and services may
be reliable, but as soon as you want *availability*, it's getting
expensive again, because you have to buy every hardware component at
least twice.


So, you have the choice:

a) If you want *availability*, stay with your old solution. But you have
no guarantee that your data is always intact. You'll always be able to
stream your video, but you have no guarantee that the client will
receive a stream without dropouts forever.

b) If you want *data integrity*, ZFS is your best friend. But you may
have slight availability issues when it comes to hardware defects. You
may reduce the percentage of pain during a disaster by spending more
money, e.g. by making the SATA controllers redundant and creating a
mirror (then controller 1 may hang, but controller 2 will continue
working), but you must not forget that your PCI bridges, fans, power
supplies, etc. remain single points of failure which can take the entire
service down, like your pulling of the non-hotpluggable drive did.

c) If you want both, you should buy a second server and create an NFS
cluster.

Hope I could help you a bit,

Ralf
--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
***@webde.de - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
Ralf Ramge
2008-08-25 10:24:54 UTC
Permalink
Ralf Ramge wrote:
[...]

Oh, and please excuse the grammar mistakes and typos. I'm in a hurry,
not a retard ;-) At least I think so.
--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

John Sonnenschein
2008-08-25 03:19:07 UTC
Permalink
James isn't being a jerk because he hates you or anything...

Look, yanking the drives like that can seriously damage the drives or your motherboard. Solaris doesn't let you do it and assumes that something's gone seriously wrong if you try it. That Linux ignores the behavior and lets you do it sounds more like a bug in Linux than anything else.


Peter Bortas
2008-08-25 06:32:51 UTC
Permalink
Post by John Sonnenschein
James isn't being a jerk because he hates you or anything...
Look, yanking the drives like that can seriously damage the drives or your motherboard.
It can, but it's not very likely to.
Post by John Sonnenschein
Solaris doesn't let you do it and assumes that something's gone seriously wrong if you try it. That Linux ignores the behavior and lets you do it sounds more like a bug in linux than anything else.
That, if anything, sounds more like defensiveness. Pulling out the
cable isn't advisable, but it simulates the controller card on the
disk going belly up pretty well. Unless he pulls the power at the same
time, because that would also simulate a power failure.

If a piece of hardware stops responding you might do well to stop
talking to it, but there is nothing admirable about locking up the OS
if there is enough redundancy to continue without that particular
chunk of metal.
--
Peter Bortas
Justin
2008-08-25 07:34:34 UTC
Permalink
Howdy Matt, thanks for the response.

But I dunno man... I think I disagree... I'm kind of the opinion that regardless of what happens to hardware, an OS should be able to work around it, if it's possible. If a sysadmin wants to yank a hard drive out of a motherboard (despite the risk of damage to the drive and board), then no OS in the world is going to stop him, so instead of the sysadmin trying to work around the OS, shouldn't the OS instead try to work around the sysadmin?

I mean, as great of an OS as it is, Solaris can't possibly hope to stop me from doing anything I want to do... so when it assumes that something's gone seriously wrong (which yanking a disk drive would hopefully cause it to assume), instead of just freezing up and becoming totally useless, why not do something useful like eject the disk from its memory, degrade the array, send out an e-mail to a designated sysadmin, and then keep on chugging along?

Or, for a greater level of control, why not just read from some configuration set by the sysadmin, and then decide to either do the above or shut down entirely, as per the wishes of the sysadmin? Anything would be better than just going into a catatonic state in less than five seconds.

Going catatonic is exactly what Linux, BSD, and even Windows _don't_ do, which is why their continued operation even under such failures wouldn't be considered a bug.

When I yank a drive in a RAID5 array - any drive, be it IDE, SATA, USB, or Firewire - in OpenSuSE or RedHat, the kernel will immediately notice its absence and inform lvm and mdadm (the software responsible for keeping the RAID array together). mdadm will then degrade the array and consult whatever instructions root gave it when the array was configured. If the sysadmin wanted the array to "stay up as long as it could," then it will continue to do that. If root wanted the array to be "brought down after any sort of drive failure," then the array will be unmounted. If root wanted to "power the machine down," then the machine will dutifully turn off.

Shouldn't OpenSolaris do the same thing?

And as for James not being a jerk because he hates me, does that mean he's just always like that? lol, it's alright: let's not try to explain or excuse trollish behavior, and instead just call it out and expose it for what it is, and then be done with it.

I certainly am.

Anyways, thanks for the input Matt.


This message posted from opensolaris.org
Todd H. Poole
2008-08-25 07:41:34 UTC
Permalink
Howdy 404, thanks for the response.

But I dunno man... I think I disagree... I'm kind of the opinion that regardless of what happens to hardware, an OS should be able to work around it, if it's possible. If a sysadmin wants to yank a hard drive out of a motherboard (despite the risk of damage to the drive and board), then no OS in the world is going to stop him, so instead of the sysadmin trying to work around the OS, shouldn't the OS instead try to work around the sysadmin?

I mean, as great of an OS as it is, Solaris can't possibly hope to stop me from doing anything I want to do... so when it assumes that something's gone seriously wrong (which yanking a disk drive would hopefully cause it to assume), instead of just freezing up and becoming totally useless, why not do something useful like eject the disk from its memory, degrade the array, send out an e-mail to a designated sysadmin, and then keep on chugging along?

Or, for a greater level of control, why not just read from some configuration set by the sysadmin, and then decide to either do the above or shut down entirely, as per the wishes of the sysadmin? Anything would be better than just going into a catatonic state in less than five seconds.

Going catatonic is exactly what Linux, BSD, and even Windows _don't_ do, which is why their continued operation even under such failures wouldn't be considered a bug.

When I yank a drive in a RAID5 array - any drive, be it IDE, SATA, USB, or Firewire - in OpenSuSE or RedHat, the kernel will immediately notice its absence and inform lvm and mdadm (the software responsible for keeping the RAID array together). mdadm will then degrade the array and consult whatever instructions root gave it when the array was configured. If the sysadmin wanted the array to "stay up as long as it could," then it will continue to do that. If root wanted the array to be "brought down after any sort of drive failure," then the array will be unmounted. If root wanted to "power the machine down," then the machine will dutifully turn off.
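
For what it's worth, that policy doesn't take much to set up. A rough sketch of what I mean (the e-mail address, script path, and flags below are just placeholders for illustration, not my actual config):

# /etc/mdadm.conf (sketch)
MAILADDR sysadmin@example.com          # e-mail this address on any array event
PROGRAM  /usr/local/sbin/raid-policy   # optional: run this script on each event
                                       # (it can unmount the array, shut the box
                                       # down, or just log - root's choice)

# and the daemon that watches the arrays and acts on the above:
mdadm --monitor --scan --daemonise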

Shouldn't OpenSolaris do the same thing?

And as for James not being a jerk because he hates me, does that mean he's just always like that? lol, it's alright: let's not try to explain or excuse trollish behavior, and instead just call it out and expose it for what it is, and then be done with it.

I certainly am.

As always, thanks for the input.


This message posted from opensolaris.org
Richard Elling
2008-08-25 14:39:32 UTC
Permalink
Post by Todd H. Poole
Howdy 404, thanks for the response.
But I dunno man... I think I disagree... I'm kinda of the opinion that regardless of what happens to hardware, an OS should be able to work around it, if it's possible. If a sysadmin wants to yank a hard drive out of a motherboard (despite the risk of damage to the drive and board), then no OS in the world is going to stop him, so instead of the sysadmin trying to work around the OS, shouldn't the OS instead try to work around the sysadmin?
The behavior of ZFS in response to an error reported by an underlying device
driver is tunable via the zpool failmode property. By default, it is
set to "wait." For root pools, the installer may change this
to "continue." The key here is that you can argue with the choice
of default behavior, but don't argue with the option to change it.
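
For example (the pool name 'tank' below is just a placeholder; the accepted values are wait, continue, and panic):

zpool get failmode tank
zpool set failmode=continue tank
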
Post by Todd H. Poole
I mean, as great of an OS as it is, Solaris can't possibly hope to stop me from doing anything I want to do... so when it assumes that something's gone seriously wrong (which yanking a disk drive would hopefully cause it to assume), instead of just freezing up and becoming totally useless, why not do something useful like eject the disk from it's memory, degrade the array, send out an e-mail to a designated sysadmin, and then keep on chugging along?
If this does not occur, then please file a bug against the appropriate
device driver (you're not operating in ZFS code here).
Post by Todd H. Poole
Or, for a greater level of control, why not just read from some configuration set by the sysadmin, and then decide to either do the above or shut down entirely, as per the wishes of the sysadmin? Anything would be better than just going into a catatonic state in less than five seconds.
qv. zpool failmode property, at least when you are operating in the
zfs code. I think the concerns here are that hangs can, and do, occur
at other places in the software stack. Please report these in the
appropriate forums and bug categories.
-- richard
Todd H. Poole
2008-08-26 19:30:03 UTC
Permalink
Post by Richard Elling
The behavior of ZFS to an error reported by an underlying device
driver is tunable by the zpool failmode property. By default, it is
set to "wait." For root pools, the installer may change this
to "continue." The key here is that you can argue with the choice
of default behavior, but don't argue with the option to change.
I didn't want to argue with the option to change... trust me. Being able to change those types of options and having that type of flexibility in the first place is what makes a very large part of my day possible.
Post by Richard Elling
qv. zpool failmode property, at least when you are operating in the
zfs code. I think the concerns here are that hangs can, and do, occur
at other places in the software stack. Please report these in the
appropriate forums and bug categories.
-- richard
Now _that's_ a great constructive suggestion! Very good - I'll research this in a few hours, and report back on what I find.

Thanks for the pointer!

-Todd


This message posted from opensolaris.org
Todd H. Poole
2008-08-27 15:01:10 UTC
Permalink
I plan on fiddling around with this failmode property in a few hours. I'll be using http://docs.sun.com/app/docs/doc/817-2271/gftgp?l=en&a=view as a reference.

I'll let you know what I find out.

-Todd


This message posted from opensolaris.org
Ian Collins
2008-08-25 08:17:55 UTC
Permalink
Post by John Sonnenschein
James isn't being a jerk because he hates your or anything...
Look, yanking the drives like that can seriously damage the drives or your motherboard. Solaris doesn't let you do it and assumes that something's gone seriously wrong if you try it. That Linux ignores the behavior and lets you do it sounds more like a bug in linux than anything else.
One point that's been overlooked in all the chest thumping - PCs vibrate
and cables fall out. I had this happen with an SCSI connector. Luckily
for me, it fell in a fan and made a lot of noise!

So pulling a drive is a possible, if rare, failure mode.

Ian
Jens Elkner
2008-08-25 18:25:15 UTC
Permalink
Post by Ian Collins
Post by John Sonnenschein
Look, yanking the drives like that can seriously damage the drives
or your motherboard. Solaris doesn't let you do it ...
Haven't seen an android/"universal soldier" shipping with Solaris ... ;-)
Post by Ian Collins
Post by John Sonnenschein
and assumes that something's gone seriously wrong if you try it. That Linux ignores the behavior and lets you do it sounds more like a bug in linux than anything else.
Not sure whether everything that can't be understood is "likely a bug"
- maybe Linux is just "more forgiving" and tries its best to solve the problem
without taking you out of business (see below), even if that requires some
hacks not in line with specifications ...
Post by Ian Collins
One point that's been overlooked in all the chest thumping - PCs vibrate
and cables fall out. I had this happen with an SCSI connector. Luckily
Yes - and a colleague told me that he'd had the same problem once.
He also managed a Fujitsu Siemens server where the SCSI controller card
had a tiny hairline crack: very odd behavior, usually not reproducible;
IIRC, the 4th service engineer finally replaced the card ...
Post by Ian Collins
So pulling a drive is a possible, if rare, failure mode.
Definitely!

And being prepared for strange controller (or, in general, hardware) behavior is
possibly a big + for an OS which targets SMEs and "home users" as well
(everybody knows about far-east and other cheap HW producers which
sometimes seem to say: let's ship it now, and later we'll build a special
driver for MS Windows which works around the bug/problem ...).

"Similar" story: ~ 2000+ we had a WG server with 4 IDE channels PATA,
one HDD on each. HDD0 on CH0 mirrored to HDD2 on CH2, HDD1 on CH1 mirrored
to HDD3 on CH3, using Linux Softraid driver. We found out, that when
HDD1 on CH1 got on the blink, for some reason the controller got on the
blink as well, i.e. took CH0 and vice versa down too. After reboot, we
were able to force the md raid to re-take the bad marked drives and even
found out, that the problem starts, when a certain part of a partition
was accessed (which made the ops on that raid really slow for some
minutes - but after the driver marked the drive(s) as bad, performance
was back). Thus disabling the partition gave us the time to get a new
drive... During all these ops nobody (except sysadmins) realized, that we
had a problem - thanx to the md raid1 (with xfs btw.). And also we did not
have any data corruption (at least, nobody has complained about it ;-)).

With respect to what I've experienced and read on the zfs-discuss and other lists, I have the
__feeling__ that we would have gotten into real trouble using Solaris
(even the most recent one) on that system ...
So if someone asks me whether to run Solaris+ZFS on a production system, I
usually say: definitely, but only if it is a Sun server ...

My 2¢ ;-)

Regards,
jel.

PS: And yes, all the vendor-specific workarounds/hacks are a problem for the
Linux kernel folks as well - at least on Torvalds' side they are
discouraged, IIRC ...
--
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
Todd H. Poole
2008-08-30 05:32:16 UTC
Permalink
Post by Jens Elkner
Wrt. what I've experienced and read in ZFS-discussion etc. list I've the
__feeling__, that we would have got really into trouble, using Solaris
(even the most recent one) on that system ...
So if one asks me, whether to run Solaris+ZFS on a production system, I
usually say: definitely, but only, if it is a Sun server ...
My 2¢ ;-)
I can't agree with you more. I'm beginning to understand what the phrase "Sun's software is great - as long as you're running it on Sun's hardware" means...

Whether it's deserved or not, I feel like this OS isn't mature yet. And maybe it's not the whole OS, maybe it's some specific subsection (like ZFS), but my general impression of OpenSolaris has been... not stellar.

I don't think it's ready yet for a prime time slot on commodity hardware.

And while I don't intend to fan any flames that might already exist (remember, I've only just joined within the past week, and thus haven't been around long enough to figure out whether any flames even exist), I believe I'm justified in making the above statement. Just off the top of my head, here is a list of red flags I've run into in 7 days' time:

- If I don't wait for at least 2 minutes before logging into my system after I've powered everything up, my machine freezes.
- If I yank a hard drive out of a (supposedly redundant) RAID5 array (or "RAID-Z zpool," as it's called) that has an NFS mount attached to it, not only does that mount point get severed, but _all_ NFS connections to all mount points are dropped, regardless of whether they were on the zpool or not. Oh, and then my machine freezes.
- If I yank a hard drive out of a (supposedly redundant) RAID5 array (or "RAID-Z zpool," as it's called), forgetting about NFS entirely, my machine freezes.
- If I query a zpool for its status, but don't do so under the right circumstances, my machine freezes.

I've had to use the hard reset button on my case more times than I've had the ability to shut down the machine properly from a non-frozen console or GUI.

That shouldn't happen.

I dunno. If this sounds like bitching, that's fine: I'll file bug reports and then move on. It's just that sometimes, software needs to grow a bit more before it's ready for production, and I feel like trying to run OpenSolaris + ZFS on commodity hardware just might be one of those times.

Just two more cents to add to yours.

As Richard said, the only way to fix things is to file bug reports. Hopefully, the most helpful things to come out of this thread will be those forms of constructive criticism.

As for now, it looks like a return to LVM2, XFS, and one of the Linux or BSD kernels might be a more stable decision, but don't worry - I haven't been completely dissuaded, and I definitely plan on checking back in a few releases to see how things are going in the ZFS world. ;)

Thanks everyone for your help, and keep improving! :)

-Todd
--
This message posted from opensolaris.org
Toby Thain
2008-08-30 12:35:31 UTC
Permalink
Post by Todd H. Poole
Post by Jens Elkner
Wrt. what I've experienced and read in ZFS-discussion etc. list I've the
__feeling__, that we would have got really into trouble, using Solaris
(even the most recent one) on that system ...
So if one asks me, whether to run Solaris+ZFS on a production
system, I
usually say: definitely, but only, if it is a Sun server ...
My 2¢ ;-)
I can't agree with you more. I'm beginning to understand what the
phrase "Sun's software is great - as long as you're running it on
Sun's hardware" means...
...
Totally OT, but this is also why Apple doesn't sell OS X for whitebox
junk. :)

--Toby
Post by Todd H. Poole
-Todd
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
dick hoogendijk
2008-08-30 15:31:46 UTC
Permalink
On Sat, 30 Aug 2008 09:35:31 -0300
Post by Toby Thain
Post by Todd H. Poole
I can't agree with you more. I'm beginning to understand what the
phrase "Sun's software is great - as long as you're running it on
Sun's hardware" means...
Totally OT, but this is also why Apple doesn't sell OS X for
whitebox junk. :)
There are also a lot of whiteboxes that -do- run Solaris very well.
"Some apples are rotten, others are healthy." That's quite normal.
--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
++ http://nagual.nl/ + SunOS sxce snv95 ++
Joe S
2008-09-03 17:15:00 UTC
Permalink
Post by Todd H. Poole
I can't agree with you more. I'm beginning to understand what the phrase "Sun's software is great - as long as you're running it on Sun's hardware" means...
Whether it's deserved or not, I feel like this OS isn't mature yet. And maybe it's not the whole OS, maybe it's some specific subsection (like ZFS), but my general impression of OpenSolaris has been... not stellar.
I don't think it's ready yet for a prime time slot on commodity hardware.
I agree, but with careful research, you can find the *right* hardware.
In my quest (it took weeks) to find reports of reliable hardware, I found
that the AMD chipsets were way too buggy. I also noticed that of the
workstations that Sun sells, they use nVidia nForce chipsets for the AMD
CPUs and the Intel X38 (the only Intel desktop chipset that supports ECC) for
the Intel CPUs. I read good and bad stories about various hardware and
decided I would stay close to what Sun sells. I've found NO Sun
hardware using the same chipset as yours.

There are a couple of AHCI bugs with the AMD/ATI SB600 chipset. Both
Linux and Solaris were affected. Linux put in a workaround that may
hurt performance slightly. Sun still has the bug open, but for what
it's worth, who's gonna use or care about a buggy desktop chipset in a
storage server?

I have an nVidia nForce 750a chipset (not the same as the Sun
workstations, which use the nForce Pro, but it's not too different) and the
same CPU (45 Watt dual core!) you have. My system works great (so
far). I haven't tried the disconnect-drive issue though. I will try
it tonight.

Carson Gaspar
2008-08-25 09:10:08 UTC
Permalink
Post by John Sonnenschein
Look, yanking the drives like that can seriously damage the drives or
your motherboard. Solaris doesn't let you do it and assumes that
something's gone seriously wrong if you try it. That Linux ignores
the behavior and lets you do it sounds more like a bug in linux than
anything else.
OK, so far we've had a lot of knee jerk defense of Solaris. Sorry, but
that isn't helping. Let's get back to science here, shall we?

What happens when you remove a disk?

A) The driver detects the removal and informs the OS. Solaris appears to
behave reasonably well in this case.

B) The driver does not detect the removal. Commands must time out before
a problem is detected. Due to driver layering, timeouts increase
rapidly, causing the OS to "hang" for unreasonable periods of time.

We really need to fix (B). It seems the "easy" fixes are:

- Configure faster timeouts and fewer retries on redundant devices,
similar to drive manufacturers' RAID edition firmware. This could be via
driver config file, or (better) automatically via ZFS, similar to write
cache behaviour.

- Propagate timeouts quickly between layers (immediate soft fail without
retry) or perhaps just to the fault management system
--
Carson
Bob Friesenhahn
2008-08-25 15:57:57 UTC
Permalink
Post by Carson Gaspar
B) The driver does not detect the removal. Commands must time out before
a problem is detected. Due to driver layering, timeouts increase
rapidly, causig te OS to "hang" for unreasonable periods of time.
- Configure faster timeouts and fewer retries on redundant devices,
I don't think that any of these "easy" fixes are wise. Any fix based
on timeouts is going to cause problems with devices mysteriously
timing out and being resilvered.

Device drivers should know the expected behavior of the device and act
appropriately. For example, if the device is in a powered-down state,
then the device driver can expect that it will take at least 30
seconds for the device to return after being requested to power-up but
that some weak devices might take a minute. As far as device drivers
go, I expect that IDE device drivers are at the very bottom of the
feeding chain in Solaris since Solaris is optimized for enterprise
hardware.

Since OpenSolaris is open source, perhaps some brave soul can
investigate the issues with the IDE device driver and send a patch.

Bob
======================================
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Todd H. Poole
2008-08-26 19:50:07 UTC
Permalink
Post by Bob Friesenhahn
Since OpenSolaris is open source, perhaps some brave
soul can investigate the issues with the IDE device driver and
send a patch.
Fearing that other Senior Kernel Engineers, Solaris, might exhibit similar responses, or join in and play “antagonize the noob,” I decided that I would try to solve my problem on my own. I tried my best to unravel the source tree that is OpenSolaris with some help from a friend, but I'll be the first to admit - we didn't even know where to begin, much less understand what we were looking at.

To say that he and I were lost would be an understatement.

I’m familiar with some subsections of the Linux kernel, and I can read and write code in a pinch, but there's a reason why most of my work is done for small, personal projects, or just for fun... Some people out there can see things like Neo sees the Matrix… I am not one of them.

I wish I knew how to write and then submit those types of patches. If I did, you can bet I would have been all over that days ago! :)

-Todd


This message posted from opensolaris.org
Todd H. Poole
2008-08-26 20:14:46 UTC
Permalink
PS: I also think it's worthy to note the level of supportive and constructive feedback that many others have provided, and how much I appreciate it. Thanks! Keep it coming!


This message posted from opensolaris.org
MC
2008-08-27 06:00:43 UTC
Permalink
Post by John Sonnenschein
James isn't being a jerk because he hates your or
anything...
Look, yanking the drives like that can seriously
damage the drives or your motherboard. Solaris
doesn't let you do it and assumes that something's
gone seriously wrong if you try it. That Linux
ignores the behavior and lets you do it sounds more
like a bug in linux than anything else.
Solaris crashing is a Linux bug. That's a new one, folks.


This message posted from opensolaris.org
Miles Nordin
2008-08-27 17:48:33 UTC
Permalink
re> not all devices return error codes which indicate
re> unrecoverable reads.

What you mean is, ``devices sometimes return bad data instead of an
error code.''

If you really mean there are devices out there which never return
error codes, and always silently return bad data, please tell us which
one and the story of when you encountered it, because I'm incredulous.
I've never seen or heard of anything like that. Not even 5.25"
floppies do that.

Well...wait, actually I have. I heard some SGI disks had special
firmware which could be ordered to behave this way, and some kind of
ioctl or mount option to turn it on per-file or per-filesystem. But
the drives wouldn't disable error reporting unless ordered to.
Another interesting lesson SGI offers here: they pushed this feature
through their entire stack. The point was, for some video playback,
data which arrives after the playback point has passed is just as
useless as silently corrupt data, so the disk, driver, filesystem, all
need to modify their exception handling to deliver the largest amount
of on-time data possible, rather than the traditional goal of
eventually returning the largest amount of correct data possible and
clear errors instead of silent corruption. This whole-stack approach
is exactly what I thought ``green line'' was promising, and exactly
what's kept out of Solaris by the ``go blame the drivers'' mantra.

Maybe I was thinking of this SGI firmware when I suggested the
customized firmware netapp loads into the drives in their study could
silently return bad data more often than the firmware we're all using,
the standard firmware with 512-byte sectors intended for RAID layers
without block checksums.

re> I would love for you produce data to that effect.

Read the netapp paper you cited earlier

http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf

on page 234 there's a comparison of the relative prevalence of each
kind of error.

Latent sector errors / Unrecoverable reads

nearline disks experiencing latent read errors per year: 9.5%

Netapp calls the UNC errors, where the drive returns an error
instead of data, ``latent sector errors.'' Software RAID systems
other than ZFS *do* handle this error, usually better than ZFS to
my impression. And AIUI when it doesn't freeze and reboot, ZFS
counts this as a READ error. In addition to reporting it, most
consumer drives seem to log the last five of these in non-volatile
storage, and you can read the log with 'smartctl -a' (always under
Linux, or under Solaris only if smartctl is working with your
particular disk driver).


Silent corruption

nearline disks experiencing silent corruption per year: 0.466%

What netapp calls ``silent data corruption'' is bad data silently
returned by drives with no error indication, counted by ZFS as
CKSUM and seems not to cause ZFS to freeze. I think you have been
lumping this in with unrecoverable reads, but using the word
``silent'' makes it clearer because unrecoverable makes it sound to
me like the drive tried to recover, and failed, in which case the
drive probably also reported the error making it a ``latent sector
error''.


filesystem corruption

This is also discovered silently w.r.t. the driver: the corruption
that happens to ZFS systems when SAN targets disappear suddenly or
when you offline a target and then reboot (which is also counted in
the CKSUM column, and which ZFS-level redundancy also helps fix).
I would call this ``ZFS bugs'', ``filesystem corruption,'' or
``manual resilvering''. Obviously it's not included on the Netapp
table. It would be nice if ZFS had two separate CKSUM columns to
distinguish between what netapp calls ``checksum errors'' vs
``identity discrepancies''. For ZFS the ``checksum error'' would
point with high certainty to the storage and silent corruption, and
the ``identity discrepancy'' would be more like filesystem
corruption and flag things like one side of a mirror being
out-of-date when ZFS thinks it shouldn't be. but currently we have
only one CKSUM column for both cases.


so, I would say, yes, the type of read error that other software RAID
systems besides ZFS do still handle is a lot more common: 9.5%/yr vs
0.466%/yr for nearline disks, and the same ~20x factor for enterprise
disks. The rare silent error which other software LVM's miss and only
ZFS/Netapp/EMC/... handles is still common enough to worry about, at
least on the nearline disks in the Netapp drive population.

What this also shows, though, is that about 1 in 10 drives will return
an UNC per year, and possibly cause ZFS to freeze up. It's worth
worrying about availability during an exception as common as that---it
might even be more important for some applications than catching the
silent corruption. Not for my own application, but for some readily
imaginable ones, yes.
Richard Elling
2008-08-27 18:27:52 UTC
Permalink
Post by Miles Nordin
re> not all devices return error codes which indicate
re> unrecoverable reads.
What you mean is, ``devices sometimes return bad data instead of an
error code.''
If you really mean there are devices out there which never return
error codes, and always silently return bad data, please tell us which
one and the story of when you encountered it, because I'm incredulous.
I've never seen or heard of anything like that. Not even 5.25"
floppies do that.
I blogged about one such case.
http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file

However, I'm not inclined to publically chastise the vendor or device model.
It is a major vendor and a popular device. 'nuff said.
Post by Miles Nordin
Well...wait, actually I have. I heard some SGI disks had special
firmware which could be ordered to behave this way, and some kind of
ioctl or mount option to turn it on per-file or per-filesystem. But
the drives wouldn't disable error reporting unless ordered to.
Another interesting lesson SGI offers here: they pushed this feature
through their entire stack. The point was, for some video playback,
data which arrives after the playback point has passed is just as
useless as silently corrupt data, so the disk, driver, filesystem, all
need to modify their exception handling to deliver the largest amount
of on-time data possible, rather than the traditional goal of
eventually returning the largest amount of correct data possible and
clear errors instead of silent corruption. This whole-stack approach
is exactly what I thought ``green line'' was promising, and exactly
what's kept out of Solaris by the ``go blame the drivers'' mantra.
Maybe I was thinking of this SGI firmware when I suggested the
customized firmware netapp loads into the drives in their study could
silently return bad data more often than the firmware we're all using,
the standard firmware with 512-byte sectors intended for RAID layers
without block checksums.
re> I would love for you produce data to that effect.
Read the netapp paper you cited earlier
http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
on page 234 there's a comparison of the relative prevalence of each
kind of error.
Latent sector errors / Unrecoverable reads
nearline disks experiencing latent read errors per year: 9.5%
This number should scare the *%^ out of you. It basically means
that having no data redundancy is a recipe for disaster. Fortunately, with
ZFS you can have data redundancy without requiring a logical
volume manager to mirror your data. This is especially useful on
single-disk systems like laptops.
Post by Miles Nordin
Netapp calls the UNC errors, where the drive returns an error
instead of data, ``latent sector errors.'' Software RAID systems
other than ZFS *do* handle this error, usually better than ZFS to
my impression. And AIUI when it doesn't freeze and reboot, ZFS
counts this as a READ error. In addition to reporting it, most
consumer drives seem to log the last five of these non-volatilely,
and you can read the log with 'smartctl -a' (if you're using Linux
always, or under Solaris only if smartctl is working with your
particular disk driver).
Silent corruption
nearline disks experiencing silent corruption per year: 0.466%
What netapp calls ``silent data corruption'' is bad data silently
returned by drives with no error indication, counted by ZFS as
CKSUM and seems not to cause ZFS to freeze. I think you have been
lumping this in with unrecoverable reads, but using the word
``silent'' makes it clearer because unrecoverable makes it sound to
me like the drive tried to recover, and failed, in which case the
drive probably also reported the error making it a ``latent sector
error''.
Likewise, this number should scare you. AFAICT, logical volume
managers like SVM will not detect this.

Terminology-wise, silent errors are, by definition, not detected. But
in the literature you might see the term used in studies of failures where the
author intends to differentiate between a system which detects
such errors and one which does not.
Post by Miles Nordin
filesystem corruption
This is also discovered silently w.r.t. the driver: the corruption
that happens to ZFS systems when SAN targets disappear suddenly or
when you offline a target and then reboot (which is also counted in
the CKSUM column, and which ZFS-level redundancy also helps fix).
I would call this ``ZFS bugs'', ``filesystem corruption,'' or
``manual resilvering''. Obviously it's not included on the Netapp
table. It would be nice if ZFS had two separate CKSUM columns to
distinguish between what netapp calls ``checksum errors'' vs
``identity discrepancies''. For ZFS the ``checksum error'' would
point with high certainty to the storage and silent corruption, and
the ``identity discrepancy'' would be more like filesystem
corruption and flag things like one side of a mirror being
out-of-date when ZFS thinks it shouldn't be. but currently we have
only one CKSUM column for both cases.
This differentiation is noted in the FMA e-reports.
Post by Miles Nordin
so, I would say, yes, the type of read error that other software RAID
systems besides ZFS do still handle is a lot more common: 9.5%/yr vs
0.466%/yr for nearline disks, and the same ~20x factor for enterprise
disks. The rare silent error which other software LVM's miss and only
ZFS/Netapp/EMC/... handles is still common enough to worry about, at
least on the nearline disks in the Netapp drive population.
0.466%/yr is a per-disk rate. If you have 10 disks, your exposure
is 4.6% per year. For 100 disks, 46% per year, etc. For systems with
thousands of disks this is a big problem.

But I don't think using a rate-per-unit-time is the best way to look
at this problem because if you never read the data, you don't care.
This is why disk vendors spec UERs as rate-per-bits-read. I have
some field data on bits read over time, but routine activities, like
backups, zfs sends, or scrubs, can change the number of bits read
per unit time by a significant amount.
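
As a rough illustration (using a common consumer-class spec, not a measurement of any particular drive):

# a spec of 1 unrecoverable error per 1e14 bits read works out to
# roughly one expected bad sector per ~12.5 TB read
print(1e14 / 8 / 1e12)   # -> 12.5 (TB read per expected unrecoverable sector)
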
Post by Miles Nordin
What this also shows, though, is that about 1 in 10 drives will return
an UNC per year, and possibly cause ZFS to freeze up. It's worth
worrying about availability during an exception as common as that---it
might even be more important for some applications than catching the
silent corruption. not for my own application, but for some readily
imagineable ones, yes.
UNCs don't cause ZFS to freeze as long as failmode != wait or
ZFS manages the data redundancy.
-- richard
Miles Nordin
2008-08-27 21:51:49 UTC
Permalink
Post by Miles Nordin
If you really mean there are devices out there which never
return error codes, and always silently return bad data, please
tell us which one and the story of when you encountered it,
re> I blogged about one such case.
re> http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file

re> However, I'm not inclined to publically chastise the vendor or
re> device model. It is a major vendor and a popular
re> device. 'nuff said.

It's not really enough for me, but what's more the case doesn't match
what we were looking for: a device which ``never returns error codes,
always returns silently bad data.'' I asked for this because you said
``However, not all devices return error codes which indicate
unrecoverable reads,'' which I think is wrong. Rather, most devices
sometimes don't, not some devices always don't.

Your experience doesn't say anything about this drive's inability to
return UNC errors. It says you suspect it of silently returning bad
data, once, but your experience doesn't even clearly implicate the
device once: It could have been cabling/driver/power-supply/zfs-bugs
when the block was written. I was hoping for a device in your ``bad
stack'' which does it over and over.

Remember, I'm not arguing ZFS checksums are worthless---I think
they're great. I'm arguing with your original statement that ZFS is
the only software RAID which deals with the dominant error you find in
your testing, unrecoverable reads. This is untrue!

re> This number should scare the *%^ out of you. It basically
re> means that no data redundancy is a recipe for disaster.

yeah, but that 9.5% number alone isn't an argument for ZFS over other
software LVM's.

re> 0.466%/yr is a per-disk rate. If you have 10 disks, your
re> exposure is 4.6% per year. For 100 disks, 46% per year, etc.

no, you're doing the statistics wrong, and in a really elementary way.
You're counting multiple times the possible years in which more than
one disk out of the hundred failed. If what you care about for 100
disks is that no disk experiences an error within one year, then you
need to calculate

(1 - 0.00466) ^ 100 = 62.7%

so that's 37% probability of silent corruption. For 10 disks, the
mistake doesn't make much difference and 4.6% is about right.
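
You can sanity-check the arithmetic in a couple of lines (0.00466 is the per-disk annual rate from the paper):

p = 0.00466
for n in (10, 100):
    # probability that at least one of n disks sees silent corruption in a year
    print(n, 1 - (1 - p) ** n)   # ~0.0456 for 10 disks, ~0.373 for 100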

I don't dispute ZFS checksums have value, but the point stands that
the reported-error failure mode is 20x more common in netapp's study
than this one, and other software LVM's do take care of the more
common failure mode.

re> UNCs don't cause ZFS to freeze as long as failmode != wait or
re> ZFS manages the data redundancy.

The time between issuing the read and getting the UNC back can be up
to 30 seconds, and there are often several unrecoverable sectors in a
row as well as lower-level retries multiplying this 30-second value.
so, it ends up being a freeze.

To fix it, ZFS needs to dispatch read requests for redundant data if
the driver doesn't reply quickly. ``Quickly'' can be ambiguous, but
the whole point of FMD was supposed to be that complicated statistics
could be collected at various levels to identify even more subtle
things than READ and CKSUM errors, like drives that are working at
1/10th the speed they should be, yet right now we can't even flag a
drive taking 30 seconds to read a sector. ZFS is still ``patiently
waiting'', and now that FMD is supposedly integrated instead of a
discussion of what knobs and responses there are, you're passing the
buck to the drivers and their haphazard nonuniform exception state
machines. The best answer isn't changing drivers to make the drive
timeout in 15 seconds instead---it's to send the read to other disks
quickly using a very simple state machine, and start actually using
FMD and a complicated state machine to generate suspicion-events for
slow disks that aren't returning errors.
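
To make that ``very simple state machine'' concrete, here is the kind of thing I mean, as a sketch only -- the 0.5-second threshold, the read_from() callback, and everything else below are made-up illustrations, not ZFS internals:

# Sketch: give the primary device a short deadline; if it hasn't answered,
# issue the same read to the redundant copies and return whichever
# completes first. Error handling and FMD-style telemetry are omitted.
import concurrent.futures as cf

def read_with_failover(read_from, devices, block, deadline=0.5):
    pool = cf.ThreadPoolExecutor(max_workers=len(devices))
    futures = [pool.submit(read_from, devices[0], block)]
    done, _ = cf.wait(futures, timeout=deadline)
    if not done:
        # primary is slow: fan the read out to the other copies
        futures += [pool.submit(read_from, d, block) for d in devices[1:]]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)   # don't block on the laggard
    return next(iter(done)).result()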

Also the driver and mid-layer need to work with the hypothetical
ZFS-layer timeouts to be as good as possible about not stalling the
SATA chip, the channel if there's a port multiplier, or freezing the
whole SATA stack including other chips, just because one disk has an
outstanding READ command waiting to get an UNC back.

In some sense the disk drivers and ZFS have different goals. The goal
of drivers should be to keep marginal disk/cabling/... subsystems
online as aggressively as possible, while the goal of ZFS should be to
notice and work around slightly-failing devices as soon as possible.
I thought the point of putting off reasonable exception handling for
two years while waiting for FMD, was to be able to pursue both goals
simultaneously without pressure to compromise one in favor of the
other.

In addition, I'm repeating myself like crazy at this point, but ZFS
tools used for all pools like 'zpool status' need to not freeze when a
single pool, or single device within a pool, is unavailable or slow,
and this expectation is having nothing to do with failmode on the
failing pool. And NFS running above ZFS should continue serving
filesystems from available pools even if some pools are faulted, again
nothing to do with failmode.

Neither is the case now, and it's not a driver fix, but even beyond
fixing these basic problems there's vast room for improvement, to
deliver something better than LVM2 and closer to NetApp, rather than
just catching up.
Ian Collins
2008-08-27 22:21:30 UTC
Permalink
Post by Miles Nordin
In addition, I'm repeating myself like crazy at this point, but ZFS
tools used for all pools like 'zpool status' need to not freeze when a
single pool, or single device within a pool, is unavailable or slow,
and this expectation is having nothing to do with failmode on the
failing pool. And NFS running above ZFS should continue serving
filesystems from available pools even if some pools are faulted, again
nothing to do with failmode.
I agree with the bulk of this post, but I'd like to add to this last point.
I've had a few problems with ZFS tools hanging on recent builds due to
problems with a pool on a USB stick. One tiny $20 component causing a fault
that required a reboot of the host. This really shouldn't happen.

Ian
Toby Thain
2008-08-27 22:39:20 UTC
Permalink
Post by Ian Collins
Post by Miles Nordin
In addition, I'm repeating myself like crazy at this point, but ZFS
tools used for all pools like 'zpool status' need to not freeze when a
single pool, or single device within a pool, is unavailable or slow,
and this expectation is having nothing to do with failmode on the
failing pool. And NFS running above ZFS should continue serving
filesystems from available pools even if some pools are faulted, again
nothing to do with failmode.
I agree with the bulk of this post, but I'd like to add to this last point.
I've had a few problems with ZFS tools hanging on recent builds due to
problems with a pool on a USB stick. One tiny $20 component
causing a fault
that required a reboot of the host. This really shouldn't happen.
Let's not be too quick to assign blame, or to think that perfecting
the behaviour is straightforward or even possible.

Traditionally, systems bearing 'enterprisey' expectations were/are
integrated hardware and software from one vendor (e.g. Sun) which
could be certified as a unit.

Start introducing 'random $20 components' and you begin to dilute the
quality and predictability of the composite system's behaviour.

If hard drive firmware is as cr*ppy as anecdotes indicate, what can
we really expect from a $20 USB pendrive?

--Toby
Post by Ian Collins
Ian
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Tim
2008-08-27 22:43:12 UTC
Permalink
Post by Toby Thain
Let's not be too quick to assign blame, or to think that perfecting
the behaviour is straightforward or even possible.
Traditionally, systems bearing 'enterprisey' expectations were/are
integrated hardware and software from one vendor (e.g. Sun) which
could be certified as a unit.
PSSSHHH, Sun should be certifying every piece of hardware that is, or will
ever be released. Community putback shmamunnity putback.
Post by Toby Thain
Start introducing 'random $20 components' and you begin to dilute the
quality and predictability of the composite system's behaviour.
But this NEVER happens on linux *grin*.
Post by Toby Thain
If hard drive firmware is as cr*ppy as anecdotes indicate, what can
we really expect from a $20 USB pendrive?
--Toby
Perfection?

--Tim
Todd H. Poole
2008-08-30 05:05:15 UTC
Permalink
Post by Toby Thain
Let's not be too quick to assign blame, or to think that perfecting
the behaviour is straightforward or even possible.
Start introducing random $20 components and you begin to dilute the
quality and predictability of the composite system's behaviour.
But this NEVER happens on linux *grin*.
Actually, it really doesn't! At least, it hasn't in many years...

I can't tell if you were being sarcastic or not, but honestly... you find a USB drive that can bring down your Linux machine, and I'll show you someone running a kernel from November of 2003. And for all the other "cheaper" components out there? Those are the components we make serious bucks off of. Just because it costs $30 doesn't mean it won't last a _really_ long time under stress! But if it doesn't, even when hardware fails, software's always there to route around it. So no biggie.
Post by Toby Thain
Perfection?
Is Linux perfect?
Not even close. But it's certainly a lot closer when it comes to what this thread seems to be about: not crashing.

Linux may get a small number of things wrong, but it gets a ridiculously large number of them right, and stability/reliability on unstable/unreliable hardware is one of them. ;)

PS: I found this guy's experiment amusing. Talk about adding a bunch of cheap, $20 crappy components to a system, and still seeing it soar. http://linuxgazette.net/151/weiner.html
--
This message posted from opensolaris.org
Ian Collins
2008-08-27 23:04:12 UTC
Permalink
Post by Miles Nordin
In addition, I'm repeating myself like crazy at this point, but ZFS
tools used for all pools like 'zpool status' need to not freeze when a
single pool, or single device within a pool, is unavailable or slow,
and this expectation is having nothing to do with failmode on the
failing pool. And NFS running above ZFS should continue serving
filesystems from available pools even if some pools are faulted, again
nothing to do with failmode.
I agree with the bulk of this post, but I'd like to add to this last
point.
I've had a few problems with ZFS tools hanging on recent builds due to
problems with a pool on a USB stick. One tiny $20 component causing a
fault
that required a reboot of the host. This really shouldn't happen.
Let's not be too quick to assign blame, or to think that perfecting the
behaviour is straightforward or even possible.
I'm not assigning blame, just illustrating a problem.

If you look back a week or so you will see a thread I started with the
subject " ZFS commands hanging in B95". This thread went off list but the
cause was tracked back to a problem with a USB pool.
Traditionally, systems bearing 'enterprisey' expectations were/are
integrated hardware and software from one vendor (e.g. Sun) which could
be certified as a unit.
Start introducing 'random $20 components' and you begin to dilute the
quality and predictability of the composite system's behaviour.
So we shouldn't be using USB sticks to transfer data between home and office
systems? If the stick was a FAT device and it crapped out or was removed
without unmounting, the system would not have hung.
If hard drive firmware is as cr*ppy as anecdotes indicate, what can we
really expect from a $20 USB pendrive?
All the more reason not to lock up if one craps out.

Ian
Bob Friesenhahn
2008-08-27 22:42:59 UTC
Permalink
Post by Miles Nordin
In some sense the disk drivers and ZFS have different goals. The goal
of drivers should be to keep marginal disk/cabling/... subsystems
online as aggressively as possible, while the goal of ZFS should be to
notice and work around slightly-failing devices as soon as possible.
My buffer did overflow from this email, but I still noticed the stated
goal of ZFS, which might differ from the objectives the ZFS authors
have been working toward these past seven years. Could you please
define "slightly-failing device" as well as how ZFS can know when the
device is slightly-failing so it can start to work around it?

Thanks,

Bob
======================================
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Richard Elling
2008-08-27 23:24:59 UTC
Permalink
Post by Miles Nordin
Post by Miles Nordin
If you really mean there are devices out there which never
return error codes, and always silently return bad data, please
tell us which one and the story of when you encountered it,
re> I blogged about one such case.
re> http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file
re> However, I'm not inclined to publically chastise the vendor or
re> device model. It is a major vendor and a popular
re> device. 'nuff said.
It's not really enough for me, but what's more the case doesn't match
what we were looking for: a device which ``never returns error codes,
always returns silently bad data.'' I asked for this because you said
``However, not all devices return error codes which indicate
unrecoverable reads,'' which I think is wrong. Rather, most devices
sometimes don't, not some devices always don't.
I really don't know how to please you. I've got a bunch of
borken devices of all sorts. If you'd like to stop by some time
and rummage in the boneyard, feel free. Make it quick before
my wife makes me clean up :-) For the device which
I mentioned in my blog, it does return bad data far more often
than I'd like. But that is why I only use it for testing and don't
store my wife's photo album on it. Anyone who has been
around for a while will have similar anecdotes.
Post by Miles Nordin
Your experience doesn't say anything about this drive's inability to
return UNC errors. It says you suspect it of silently returning bad
data, once, but your experience doesn't even clearly implicate the
device once: It could have been cabling/driver/power-supply/zfs-bugs
when the block was written. I was hoping for a device in your ``bad
stack'' which does it over and over.
Remember, I'm not arguing ZFS checksums are worthless---I think
they're great. I'm arguing with your original statement that ZFS is
the only software RAID which deals with the dominant error you find in
your testing, unrecoverable reads. This is untrue!
To be clear, I claim:
1. The dominant failure mode in my field data for magnetic disks is
unrecoverable reads. You need some sort of data protection to get
past this problem.
2. Unrecoverable reads are not always reported by disk drives.
3. You really want a system that performs end-to-end data verification,
and if you don't bother to code that into your applications, then you
might rely on ZFS to do it for you. If you ignore this problem, it will
not go away.
Post by Miles Nordin
re> This number should scare the *%^ out of you. It basically
re> means that no data redundancy is a recipe for disaster.
yeah, but that 9.5% number alone isn't an argument for ZFS over other
software LVM's.
re> 0.466%/yr is a per-disk rate. If you have 10 disks, your
re> exposure is 4.6% per year. For 100 disks, 46% per year, etc.
no, you're doing the statistics wrong, and in a really elementary way.
You're counting multiple times the possible years in which more than
one disk out of the hundred failed. If what you care about for 100
disks is that no disk experiences an error within one year, then you
need to calculate
(1 - 0.00466) ^ 100 = 62.7%
so that's 37% probability of silent corruption. For 10 disks, the
mistake doesn't make much difference and 4.6% is about right.
Indeed. Intuitively, the AFR and population are more easily grokked by
the masses. But if you go into a customer and say "dude, there is only a
62.7% chance that your system won't be affected by a silent data corruption
problem this year with my (insert favorite non-ZFS, non-NetApp solution
here)" then you will have a difficult sale.
Post by Miles Nordin
I don't dispute ZFS checksums have value, but the point stands that
the reported-error failure mode is 20x more common in netapp's study
than this one, and other software LVM's do take care of the more
common failure mode.
I agree.
Post by Miles Nordin
re> UNCs don't cause ZFS to freeze as long as failmode != wait or
re> ZFS manages the data redundancy.
The time between issuing the read and getting the UNC back can be up
to 30 seconds, and there are often several unrecoverable sectors in a
row as well as lower-level retries multiplying this 30-second value.
so, it ends up being a freeze.
Untrue. There are disks which will retry forever. But don't take
my word for it, believe another RAID software vendor:
http://blogs.sun.com/relling/entry/adaptec_webinar_on_disks_and
[sorry about the redirect, you have to sign up for an Adaptec
webinar before you can get to the list of webinars, so it is hard
to provide the direct URL]

Incidentally, I have one such disk in my boneyard, but it isn't
much fun to work with because it just sits there and spins when
you try to access the bad sector.
Post by Miles Nordin
To fix it, ZFS needs to dispatch read requests for redundant data if
the driver doesn't reply quickly. ``Quickly'' can be ambiguous, but
the whole point of FMD was supposed to be that complicated statistics
could be collected at various levels to identify even more subtle
things than READ and CKSUM errors, like drives that are working at
1/10th the speed they should be, yet right now we can't even flag a
drive taking 30 seconds to read a sector. ZFS is still ``patiently
waiting'', and now that FMD is supposedly integrated instead of a
discussion of what knobs and responses there are, you're passing the
buck to the drivers and their haphazard nonuniform exception state
machines. The best answer isn't changing drivers to make the drive
timeout in 15 seconds instead---it's to send the read to other disks
quickly using a very simple state machine, and start actually using
FMD and a complicated state machine to generate suspicion-events for
slow disks that aren't returning errors.
I think the proposed timeouts here are too short, but the idea has
merit. Note that such a preemptive read will have negative performance
impacts for high-workload systems, so it will not be a given that people
will want this enabled by default. Designing such a proactive system
which remains stable under high workloads may not be trivial.
Please file an RFE at http://bugs.opensolaris.org
Post by Miles Nordin
Also the driver and mid-layer need to work with the hypothetical
ZFS-layer timeouts to be as good as possible about not stalling the
SATA chip, the channel if there's a port multiplier, or freezing the
whole SATA stack including other chips, just because one disk has an
outstanding READ command waiting to get an UNC back.
In some sense the disk drivers and ZFS have different goals. The goal
of drivers should be to keep marginal disk/cabling/... subsystems
online as aggressively as possible, while the goal of ZFS should be to
notice and work around slightly-failing devices as soon as possible.
I thought the point of putting off reasonable exception handling for
two years while waiting for FMD, was to be able to pursue both goals
simultaneously without pressure to compromise one in favor of the
other.
In addition, I'm repeating myself like crazy at this point, but ZFS
tools used for all pools like 'zpool status' need to not freeze when a
single pool, or single device within a pool, is unavailable or slow,
and this expectation is having nothing to do with failmode on the
failing pool. And NFS running above ZFS should continue serving
filesystems from available pools even if some pools are faulted, again
nothing to do with failmode.
You mean something like:
http://bugs.opensolaris.org/view_bug.do?bug_id=6667208
http://bugs.opensolaris.org/view_bug.do?bug_id=6667199

Yes, we all wish these to be fixed soon.
Post by Miles Nordin
Neither is the case now, and it's not a driver fix, but even beyond
fixing these basic problems there's vast room for improvement, to
deliver something better than LVM2 and closer to NetApp, rather than
just catching up.
If you find more issues, then please file bugs. http://bugs.opensolaris.org
-- richard
Ian Collins
2008-08-27 23:41:43 UTC
Permalink
Post by Richard Elling
I think the proposed timeouts here are too short, but the idea has
merit. Note that such a preemptive read will have negative performance
impacts for high-workload systems, so it will not be a given that people
will want this enabled by default. Designing such a proactive system
which remains stable under high workloads may not be trivial.
Isn't this how things already work with mirrors? By this I mean requests
are issued to all devices and if the first returned data is OK, the others
are not required.

Ian
Richard Elling
2008-08-27 23:59:29 UTC
Permalink
Post by Ian Collins
Post by Richard Elling
I think the proposed timeouts here are too short, but the idea has
merit. Note that such a preemptive read will have negative performance
impacts for high-workload systems, so it will not be a given that people
will want this enabled by default. Designing such a proactive system
which remains stable under high workloads may not be trivial.
Isn't this how things already work with mirrors? By this I mean
requests are issued to all devices and if the first returned data is
OK, the others are not required.
No. Yes. Sometimes. The details on the choice of read targets vary by
implementation. I've seen some telco systems which work this way,
but most of the general purpose systems will choose one target for
the read based on some policy: round-robin, location, etc. This way
you could get the read performance of all disks operating concurrently.
-- richard
Ian Collins
2008-08-28 00:38:04 UTC
Permalink
Post by Richard Elling
Post by Ian Collins
Post by Richard Elling
I think the proposed timeouts here are too short, but the idea has
merit. Note that such a preemptive read will have negative performance
impacts for high-workload systems, so it will not be a given that people
will want this enabled by default. Designing such a proactive system
which remains stable under high workloads may not be trivial.
Isn't this how things already work with mirrors? By this I mean requests
are issued to all devices and if the first returned data is OK, the
others are not required.
No. Yes. Sometimes. The details on choice of read targets varies by
implementation. I've seen some telco systems which work this way,
but most of the general purpose systems will choose one target for
the read based on some policy: round-robin, location, etc. This way
you could get the read performance of all disks operating concurrently.
Would it be possible to get ZFS to work the way I described? I was looking
at using an exported iSCSI target from a machine in another building to
mirror a fileserver with a mainly (>95%) read workload. A "first one
back" read implementation would be a good fit for that situation.

Ian
Anton B. Rang
2008-08-28 20:35:43 UTC
Permalink
Many mid-range/high-end RAID controllers work by having a small timeout on individual disk I/O operations. If the disk doesn't respond quickly, they'll issue an I/O to the redundant disk(s) to get the data back to the host in a reasonable time. Often they'll change parameters on the disk to limit how long the disk retries before returning an error for a bad sector (this is standardized for SCSI, I don't recall offhand whether any of this is standardized for ATA).

RAID 3 units, e.g. DataDirect, issue I/O to all disks simultaneously and when enough (N-1 or N-2) disks return data, they'll return the data to the host. At least they do that for full stripes. But this strategy works better for sequential I/O, not so good for random I/O, since you're using up extra bandwidth.

Host-based RAID/mirroring almost never takes this strategy for two reasons. First, the bottleneck is almost always the channel from disk to host, and you don't want to clog it. [Yes, I know there's more bandwidth there than the sum of the disks, but consider latency.] Second, to read from two disks on a mirror, you'd need two memory buffers.
--
This message posted from opensolaris.org
Miles Nordin
2008-08-28 01:27:27 UTC
Permalink
re> I really don't know how to please you.

dd from the raw device instead of through ZFS would be better. If you
could show that you can write data to a sector, and read back
different data, without getting an error, over and over, I'd be
totally stunned.
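
(For concreteness, here is roughly what that test could look like; the
device name, sector offset, and scratch file are made up, and the
second dd overwrites whatever is stored at that sector, so only try it
on a scratch disk:

# dd if=/dev/urandom of=/var/tmp/pattern bs=512 count=1
# dd if=/var/tmp/pattern of=/dev/rdsk/c1t0d0s0 bs=512 count=1 oseek=12345
# dd if=/dev/rdsk/c1t0d0s0 bs=512 count=1 iseek=12345 | cmp - /var/tmp/pattern

Repeat the read a few times: cmp stays quiet if the sector comes back
identical, and complains if it doesn't.)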

The netapp paper was different from your test in many ways that make
their claim that ``all drives silently corrupt data sometimes'' more
convincing than your claim that you have ``one drive which silently
corrupts data always and never returns UNC'':

* not a desktop. The circumstances were more tightly-controlled,
and their drive population installed in a repeated way

* their checksum measurement was better than ZFS's by breaking the
type of error up into three buckets instead of one, and their
filesystem more mature, and their filesystem is not already known
to count CKSUM errors for circumstances other than silent
corruption, which argues the checksums are less likely to come
from software bugs

* they make statistical arguments that at least some of the errors
are really coming from the drives by showing they have spatial
locality w.r.t. the LBA on the drive, and are correlated with
drive age and impending drive failure.

The paper was less convincing in one way:

* their drives are using nonstandard firmware

re> Anyone who has been around for a while will have similar
re> anecdotes.

yeah, you'd think, but my similar anecdote is that (a) I can get UNC's
repeatably on a specific bad sector that persist either forever or
until I write new data to that sector with dd, and do get them on at
least 10% of my drives per year, and (b) I get CKSUM errors from ZFS
all the time with my iSCSI ghetto-SAN and with an IDE/Firewire mirror,
often from things I can specifically trace back to
not-a-drive-failure, but so far never from something I can for certain
trace back to silent corruption by the disk drive.

I don't doubt that it happens, but CKSUM isn't a way to spot it. ZFS
may give me a way to stop it, but it doesn't give me an accurate way
to measure/notice it.

re> Indeed. Intuitively, the AFR and population is more easily
re> grokked by the masses.

It's nothing to do with masses. There's an error in your math. It's
not right under any circumstance.

Your point that a 100 drive population has bad/high odds of having
silent corruption within a year isn't diminished by the correction,
but it would be nice if you would own up to the statistics mistake
since we're taking you at your word on a lot of other statistics.
Post by Miles Nordin
so, it ends up being a freeze.
re> Untrue. There are disks which will retry forever.

I don't understand. ZFS freezes until the disk stops retrying and
returns an error. Because some disks never stop retrying and never
return an error, just lock up until they're power-cycled, it's untrue
that ZFS freezes? I think either you or I have lost the thread of the
argument in our reply chain bantering.

re> please file bugs.

k., I filed the NFS bug, but unfortunately I don't have output to cut
and paste into it. glad to see the 'zpool status' bug is there
already and includes the point that lots of other things are probably
hanging which shouldn't.
Richard Elling
2008-08-28 13:49:28 UTC
Permalink
Post by Miles Nordin
re> Indeed. Intuitively, the AFR and population is more easily
re> grokked by the masses.
It's nothing to do with masses. There's an error in your math. It's
not right under any circumstance.
There is no error in my math. I presented a failure rate for a time
interval, you presented a probability of failure over a time interval. The two are
both correct, but say different things. Mathematically, an AFR > 100%
is quite possible and quite common. A probability of failure > 100% (1.0)
is not. In my experience, failure rates described as annualized failure
rates (AFR) are more intuitive than their mathematically equivalent
counterpart: MTBF.
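
(For reference, assuming a constant failure rate, the annualized rate
is 8760 hours / MTBF, so an MTBF of 1,000,000 hours is roughly 0.9
failures per 100 drive-years (0.9%/yr), and an MTBF of six months is
200%/yr. The corresponding one-year failure *probability* is
1 - exp(-rate * 1 yr), which for those two cases is about 0.9% and
86%, and can never exceed 100%.)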
-- richard
Miles Nordin
2008-08-28 16:54:26 UTC
Permalink
re> There is no error in my math. I presented a failure rate for
re> a time interval,

What is a ``failure rate for a time interval''?

AIUI, the failure rate for a time interval is 0.46% / yr, no matter how
many drives you have.
Jonathan Loran
2008-08-28 18:13:15 UTC
Permalink
Post by Miles Nordin
What is a ``failure rate for a time interval''?
Failure rate => Failures/unit time
Failure rate for a time interval => (Failures/unit time) * time

For example, if we have a failure rate:

Fr = 46% failures/month

Then the expectation value of a failure in one year:

Fe = 46% failures/month * 12 months = 5.52 failures


Jon
--
- _____/ _____/ / - Jonathan Loran - -
- / / / IT Manager -
- _____ / _____ / / Space Sciences Laboratory, UC Berkeley
- / / / (510) 643-5146 ***@ssl.berkeley.edu
- ______/ ______/ ______/ AST:7731^29u18e3
Miles Nordin
2008-08-28 18:42:59 UTC
Permalink
jl> Fe = 46% failures/month * 12 months = 5.52 failures

the original statistic wasn't of this kind. It was ``likelihood a
single drive will experience one or more failures within 12 months''.

so, you could say, ``If I have a thousand drives, about 4.66 of those
drives will silently-corrupt at least once within 12 months.'' It is
0.466% no matter how many drives you have.

And it's 4.66 drives, not 4.66 corruptions. The estimated number of
corruptions is higher because some drives will corrupt twice, or
thousands of times. It's not a BER, so you can't just add it like
Richard did.

If the original statistic in the paper were of the kind you're talking
about, it would be larger than 0.466%. I'm not sure it would capture
the situation well, though. I think you'd want to talk about bits of
recoverable data after one year, not corruption ``events'', and this
is not really measured well by the type of telemetry NetApp has. If
it were, though, it would still be the same size number no matter how
many drives you had.

The 37% I gave was ``one or more within a population of 100 drives
silently corrupts within 12 months.'' The 46% Richard gave has no
meaning, and doesn't mean what you just said. The only statistic
under discussion which (a) gets intimidatingly large as you increase
the number of drives, and (b) is a ratio rather than, say, an absolute
number of bits, is the one I gave.
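
(For reference, the arithmetic behind the two figures: with a
per-drive probability p = 0.466% of at least one silent corruption in
12 months, the chance that at least one drive out of 100 is affected
is 1 - (1 - 0.00466)^100, which is about 37%, whereas
100 * 0.466% = 46.6% is an expected count of affected drives per
hundred rather than a probability.)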
Robert Milkowski
2008-08-28 13:55:10 UTC
Permalink
Hello Miles,

Wednesday, August 27, 2008, 10:51:49 PM, you wrote:

MN> It's not really enough for me, but what's more the case doesn't match
MN> what we were looking for: a device which ``never returns error codes,
MN> always returns silently bad data.'' I asked for this because you said
MN> ``However, not all devices return error codes which indicate
MN> unrecoverable reads,'' which I think is wrong. Rather, most devices
MN> sometimes don't, not some devices always don't.



Please look for slides 23-27 at http://unixdays.pl/i/unixdays-prezentacje/2007/milek.pdf
--
Best regards,
Robert Milkowski mailto:***@task.gda.pl
http://milek.blogspot.com
Richard Elling
2008-08-28 15:04:37 UTC
Permalink
Post by Robert Milkowski
Hello Miles,
MN> It's not really enough for me, but what's more the case doesn't match
MN> what we were looking for: a device which ``never returns error codes,
MN> always returns silently bad data.'' I asked for this because you said
MN> ``However, not all devices return error codes which indicate
MN> unrecoverable reads,'' which I think is wrong. Rather, most devices
MN> sometimes don't, not some devices always don't.
Please look for slides 23-27 at http://unixdays.pl/i/unixdays-prezentacje/2007/milek.pdf
You really don't have to look very far to find this sort of thing.
The scar just below my left knee is directly attributed to a bugid
fixed in patch 106129-12. Warning: the following link may
frighten experienced datacenter personnel, fortunately, the affected
device is long since EOL.
http://sunsolve.sun.com/search/document.do?assetkey=1-21-106129-12-1
-- richard
Miles Nordin
2008-08-28 16:55:45 UTC
Permalink
rm> Please look for slides 23-27 at
rm> http://unixdays.pl/i/unixdays-prezentacje/2007/milek.pdf

yeah, ok, ONCE AGAIN, I never said that checksums are worthless.

relling: some drives don't return errors on unrecoverable read events.
carton: I doubt that. Tell me a story about one that doesn't.

Your stories are about storage subsystems again, not drives. Also
most or all of the slides aren't about unrecoverable read events.
Justin
2008-08-25 03:41:26 UTC
Permalink
aye mate, I had the exact same problem, but where i work, we pay some pretty serious dollars for a direct 24/7 line to some of sun's engineers, so i decided to call them up. after spending some time with tech support, i never really got the thing resolved, and i instead ended up going back to debian for all of our simple ide-based file servers.

if you really just want zfs, you can add it to whatever installation you've got now (opensuse?) through something like zfs-fuse, but you might take a 10-15% performance hit. if you don't want that, and you're not too concerned with violating a few licenses, you can just add it to your installation yourself, the source code is out there. you know, roll your own. ;-)

you just might be trying too hard to force a round peg into a square hole.

hey, besides, where you work? i registered because i know a guy with the same name


This message posted from opensolaris.org
Todd H. Poole
2008-08-25 08:25:58 UTC
Permalink
jalex? As in Justin Alex?

If you're who I think you are, don't you have a pretty long list of things you need to get done for Jerry before your little vacation?


This message posted from opensolaris.org
Justin
2008-08-25 08:58:54 UTC
Permalink
alright, alright, but it's your fault. you left your workstation logged on, what was i supposed to do? not chime in?

grotty yank


This message posted from opensolaris.org
Bob Friesenhahn
2008-08-25 15:37:36 UTC
Permalink
Post by Todd H. Poole
So aside from telling me to "[never] try this sort of thing with
IDE" does anyone else have any other ideas on how to prevent
OpenSolaris from locking up whenever an IDE drive is abruptly
disconnected from a ZFS RAID-Z array?
I think that your expectations from ZFS are reasonable. However, it
is useful to determine if pulling the IDE drive locks the entire IDE
channel, which serves the other disks as well. This could happen at a
hardware level, or at a device driver level. If this happens, then
there is nothing that ZFS can do.

Bob
======================================
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Todd H. Poole
2008-08-26 21:09:12 UTC
Permalink
Post by Bob Friesenhahn
I think that your expectations from ZFS are
reasonable. However, it is useful to determine if pulling the IDE drive locks
the entire IDE channel, which serves the other disks as well. This
could happen at a hardware level, or at a device driver level. If this
happens, then there is nothing that ZFS can do.
Gotcha. But just to let you know, there are 4 SATA ports on the motherboard, with each drive getting its own port... how should I go about testing to see whether pulling one IDE drive (remember, they're really SATA drives, but they're being presented to the OS by the pci-ide driver) locks the entire IDE channel if there's only one drive per channel? Or do you think it's possible that two ports on the motherboard could be on one "logical channel" (for lack of a better phrase) while the other two are on the other, and thus we could test one drive while another on the same "logical channel" is unplugged?

Also, remember that OpenSolaris freezes when this occurs, so I'm only going to have 2-3 seconds to execute a command before Terminal and - after a few more seconds, the rest of the machine - stop responding to input...

I'm all for trying to test this, but I might need some instruction.
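
(I'm guessing one way might be to start some background I/O against a drive I'm NOT going to unplug and log per-device stats before pulling anything, something like "iostat -xn 2 > /var/tmp/iostat.log 2>&1 &" plus "dd if=/dev/rdsk/c3t1d0s0 of=/dev/null bs=1024k &" with the real device name substituted in, then pull the cable on a different drive, wait a minute, plug it back in, and read the log once the box recovers to see whether I/O to the untouched drive also stopped. But I'm not sure those are the right options, so corrections welcome.)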


This message posted from opensolaris.org
MC
2008-08-27 06:18:51 UTC
Permalink
Okay, so your AHCI hardware is not using an AHCI driver in solaris. A crash when pulling a cable is still not great, but it is understandable because that driver is old and bad and doesn't support hot swapping at all.

So there are two things to do here. File a bug about how pulling a sata cable crashes solaris when the device is using the old ide driver. And file another bug about how solaris recognizes your AHCI SATA hardware as old ide hardware.

The two bonus things to do are: come to the forum and bitch about the bugs to give them some attention, and come to the forum asking for help on making solaris recognize your AHCI SATA hardware properly :)

Good luck...
Post by Todd H. Poole
Gotcha. But just to let you know, there are 4 SATA
ports on the motherboard, with each drive getting its
own port... how should I go about testing to see
whether pulling one IDE drive (remember, they're
really SATA drives, but they're being presented to
the OS by the pci-ide driver) locks the entire IDE
channel if there's only one drive per channel? Or do
you think it's possible that two ports on the
motherboard could be on one "logical channel" (for
lack of a better phrase) while the other two are on
the other, and thus we could test one drive while
another on the same "logical channel" is unplugged?
Also, remember that OpenSolaris freezes when this
occurs, so I'm only going to have 2-3 seconds to
execute a command before Terminal and - after a few
more seconds, the rest of the machine - stop
responding to input...
I'm all for trying to test this, but I might need
some instruction.
This message posted from opensolaris.org
Florin Iucha
2008-08-27 12:51:03 UTC
Permalink
Post by MC
The two bonus things to do are: come to the forum and bitch about the bugs to give them some attention, and come to the forum asking for help on making solaris recognize your AHCI SATA hardware properly :)
Been there, done that. No t-shirt, though...

The Solaris kernel might be the best thing since MULTICS, but the lack
of drivers really hampers its spread.

florin
--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163
Tim
2008-08-27 16:43:38 UTC
Permalink
Post by MC
Okay, so your AHCI hardware is not using an AHCI driver in solaris. A
crash when pulling a cable is still not great, but it is understandable
because that driver is old and bad and doesn't support hot swapping at all.
His AHCI is not using AHCI because he's set it not to. If linux is somehow
ignoring the BIOS configuration, and attempting to load an AHCI driver for
the hardware anyways, that's *BROKEN* behavior. I've yet to see WHAT driver
linux was using because he was too busy having a pissing match to get that
USEFUL information back to the list.

--Tim
Miles Nordin
2008-08-27 17:58:37 UTC
Permalink
m> file another bug about how solaris recognizes your AHCI SATA
m> hardware as old ide hardware.

I don't have that board but AIUI the driver attachment's chooseable in
the BIOS Blue Screen of Setup, by setting the controller to
``Compatibility'' mode (pci-ide) or ``Native'' mode (AHCI). This
particular chip must be run in Compatibility mode because of bug
6665032.
James C. McPherson
2008-08-24 20:52:51 UTC
Permalink
Post by Tim
I'm pretty sure pci-ide doesn't support hot-swap. I believe you need ahci.
You're correct, it doesn't. Furthermore, to the best of
my knowledge, it won't ever support hotswap.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
James C. McPherson
2008-08-24 12:30:08 UTC
Permalink
Post by Todd H. Poole
Hmm... I'm leaning away a bit from the hardware, but just in case you've
CPU: AMD Athlon X2 4850e 2.5GHz Socket AM2 45W Dual-Core Processor Model
ADH4850DOBOX
(http://www.newegg.com/Product/Product.aspx?Item=N82E16819103255)
Motherboard: GIGABYTE GA-MA770-DS3 AM2+/AM2 AMD 770 ATX All Solid
Capacitor AMD Motherboard
(http://www.newegg.com/Product/Product.aspx?Item=N82E16813128081)
..
Post by Todd H. Poole
The reason why I don't think there's a hardware issue is because before I
got OpenSolaris up and running, I had a fully functional install of
openSuSE 11.0 running (with everything similar to the original server) to
make sure that none of the components were damaged during shipping from
Newegg. Everything worked as expected.
Yes, but you're running a new operating system, new filesystem...
that's a mountain of difference right in front of you.


A few commands that you could provide the output from include:


(these two show any FMA-related telemetry)
fmadm faulty
fmdump -v

(this shows your storage controllers and what's connected to them)
cfgadm -lav

You'll also find messages in /var/adm/messages which might prove
useful to review.
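
(Option letters from memory, so double-check the man pages, but two
more that can help here are

iostat -En
zpool status -xv

the former for per-device soft/hard/transport error counters, the
latter to show only pools with problems, with verbose detail.)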


Apart from that, your description of what you're doing to simulate
failure is

"however, whenever I unplug the SATA cable from one of the drives (to
simulate a catastrophic drive failure) while doing moderate reading from the
zpool (such as streaming HD video), not only does the video hang on the
remote machine (which is accessing the zpool via NFS), but the server
running OpenSolaris seems to either hang, or become incredibly unresponsive."


First and foremost, for me, this is a stupid thing to do. You've
got common-or-garden PC hardware which almost *definitely* does not
support hot plug of devices. Which is what you're telling us that
you're doing. Would you try this with your pci/pci-e cards in this
system? I think not.


If you absolutely must do something like this, then please use
what's known as "coordinated hotswap" using the cfgadm(1m) command.


Viz:

(detect fault in disk c2t3d0, in some way)

# cfgadm -c unconfigure c2::dsk/c2t3d0
# cfgadm -c disconnect c2::dsk/c2t3d0

(go and swap the drive, plug in new drive with same cable)

# zpool replace -f poolname c2t3d0


What this will do is tell the kernel to do things in the
right order, and - for zpool - tell it to do an in-place
replacement of device c2t3d0 in your pool.
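
(Depending on the controller you may also need to bring the new drive
back online at the cfgadm level before the replace; as a sketch, using
the same ap_id as above:

# cfgadm -c connect c2::dsk/c2t3d0
# cfgadm -c configure c2::dsk/c2t3d0

and then run the zpool replace.)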


There are manpages and admin guides you could have a look
through, too:

http://docs.sun.com/app/docs/coll/40.17 (manpages)
http://docs.sun.com/app/docs/coll/47.23 (system admin collection)
http://docs.sun.com/app/docs/doc/817-2271 ZFS admin guide
http://docs.sun.com/app/docs/doc/819-2723 devices + filesystems guide



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Miles Nordin
2008-08-25 18:55:38 UTC
Permalink
jcm> Don't _ever_ try that sort of thing with IDE. As I mentioned
jcm> above, IDE is not designed to be able to cope with [unplugging
jcm> a cable]

It shouldn't have to be designed for it, if there's controller
redundancy. On Linux, one drive per IDE bus (not using any ``slave''
drives) seems like it should be enough for any electrical issue, but
is not quite good enough in my experience, when there are two PATA
busses per chip. but one hard drive per chip seems to be mostly okay.
In this SATA-based case, not even that much separation was necessary
for Linux to survive on the same hardware, but I agree with you and
haven't found that level with PATA either.

OTOH, if the IDE drivers are written such that a confusing interaction
with one controller chip brings down the whole machine, then I expect
the IDE drivers to do better. If they don't, why advise people to buy
twice as much hardware ``because, you know, controllers can also fail,
so you should have some controller redundancy''---the advice is worse
than a waste of money, it's snake oil---a false sense of security.

jcm> You could start by taking us seriously when we tell you that
jcm> what you've been doing is not a good idea, and find other ways
jcm> to simulate drive failures.

well, you could suggest a method.

except that the whole point of the story is, Linux, without any
blather about ``green-line'' and ``self-healing,'' without any
concerted platform-wide effort toward availability at all, simply
works more reliably.

thp> So aside from telling me to "[never] try this sort of thing
thp> with IDE" does anyone else have any other ideas on how to
thp> prevent OpenSolaris from locking up whenever an IDE drive is
thp> abruptly disconnected from a ZFS RAID-Z array?

yeah, get a Sil3124 card, which will run in native SATA mode and be
more likely to work. Then, redo your test and let us know what
happens.

The not-fully-voiced suggestion to run your ATI SB600 in native/AHCI
mode instead of pci-ide/compatibility mode is probably a bad one
because of bug 6665032: the chip is only reliable in compatibility
mode. You could trade your ATI board for an nVidia board for about
the same price as the Sil3124 add-on card. AIUI from Linux wiki:

http://ata.wiki.kernel.org/index.php/SATA_hardware_features

...says the old nVidia chips use nv_sata driver, and the new ones use
the ahci driver, so both of these are different from pci-ide and more
likely to work. Get an old one (MCP61 or older), and a new one (MCP65
or newer), repeat your test and let us know what happens.

If the Sil3124 doesn't work, and nv_sata doesn't work, and AHCI on
newer-nVidia doesn't work, then hook the drives up to Linux running
IET on basically any old chip, and mount them from Solaris using the
built-in iSCSI initiator.

If you use iSCSI, you will find:

you will get a pause like with NT. Also, if one of the iSCSI targets
is down, 'zpool status' might hang _every time_ you run it, not just
the first time when the failure is detected. The pool itself will
only hang the first time. Also, you cannot boot unless all iSCSI
targets are available, but you can continue running if some go away
after booting.

Overall IMHO it's not as good as LVM2, but it's more robust than
plugging the drives into Solaris. It also gives you the ability to
run smartctl on the drives (by running it natively on Linux) with full
support for all commands, while someone here who I told to run
smartctl reported that on Solaris 'smartctl -a' worked but 'smartctl
-t' did not. I still have performance problems with iSCSI. I'm not
sure yet if they're unresolvable: there are a lot of tweakables with
iSCSI, like disabling Nagle's algorithm, and enabling RED on the
initiator switchport, but first I need to buy faster CPU's for the
targets.
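
For anyone wanting to try the iSCSI route, the Solaris initiator side
is roughly this (target address and device names are placeholders):

# iscsiadm add discovery-address 192.168.0.10:3260
# iscsiadm modify discovery --sendtargets enable
# devfsadm -i iscsi
# zpool create tank mirror c4t1d0 c5t1d0

with the exported LUNs showing up as ordinary cXtYdZ disks, usually
with much longer target names than shown here.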

mh> Dying or dead disks will still normally be able to
mh> communicate with the driver to some extent, so they are still
mh> "there".

The dead disks I have which don't spin also don't respond to
IDENTIFY(0) so they don't really communicate with the driver at all.
now, possibly, *possibly* they are still responsive after they fail,
and become unresponsive after the first time they're
rebooted---because I think they load part of their firmware off the
platters. Also, ATAPI standard says that while ``still
communicating'' drives are allowed to take up to 30sec to answer each
command, which is probably too long to freeze a whole system. and
still, just because ``possibly,'' it doesn't make sense to replace a
tested-working system with a tested-broken system, not even after
someone tells a complicated story trying to convince you the broken
system is actually secretly working, just completely impossible to
test, so you have to accept it based on stardust and fantasy.

js> yanking the drives like that can seriously damage the
js> drives or your motherboard.

no, it can't.

And if I want a software developer's opinion on what will electrically
damage my machine, I'll be sure to let you know first.

jcm> If you absolutely must do something like this, then please use
jcm> what's known as "coordinated hotswap" using the cfgadm(1m)
jcm> command.

jcm> Viz:

jcm> (detect fault in disk c2t3d0, in some way)

jcm> # cfgadm -c unconfigure c2::dsk/c2t3d0 # cfgadm -c disconnect
jcm> c2::dsk/c2t3d0

so....dont dont DONT do it because its STUPID and it might FRY YOUR
DISK AND MOTHERBOARD. but, if you must do it, please warn our
software first?

I shouldn't have to say it, but aside from being absurd this
warning-command completely defeats the purpose of the test.

jcm> Yes, but you're running a new operating system, new
jcm> filesystem... that's a mountain of difference right in front
jcm> of you.

so we do agree that Linux's not freezing in the same scenario
indicates the difference is inside that mountain, which, however
large, is composed entirely of SOFTWARE.

re> The behavior of ZFS to an error reported by an underlying
re> device driver is tunable by the zpool failmode property. By
re> default, it is set to "wait."

I think you like speculation well enough, so long as it's optimistic.

which is the tunable setting that causes other pools, ones not even
including failed devices, to freeze?

Why is the failmode property involved at all in a pool that still has
enough replicas to keep functioning?

cg> We really need to fix (B). It seems the "easy" fixes are:

cg> - Configure faster timeouts and fewer retries on redundant
cg> devices, similar to drive manufacturers' RAID edition
cg> firmware. This could be via driver config file, or (better)
cg> automatically via ZFS, similar to write cache behaviour.

cg> - Propagate timeouts quickly between layers (immediate soft
cg> fail without retry) or perhaps just to the fault management
cg> system
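
(On the driver config file idea: for disks that attach via sd(7d), the
per-command timeout is tunable in /etc/system, something like

set sd:sd_io_time = 10

with the value in seconds; that is only an illustration, not a
recommended number. The OP's pci-ide stack goes through cmdk/ata
rather than sd, though, and I'm not sure an equivalent knob exists
there.)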

It's also important that things unrelated to the failure aren't
frozen. This was how I heard the ``green line'' marketing campaign
when it was pitched to me, and I found it really compelling because I
felt Linux had too little of this virtue. However compelling, I just
don't find it even slightly acquainted with reality.

I can understand ``unrelated'' is a tricky concept when the boot pool
is involved, but for example when it isn't involved: I've had problems
where one exported data pool's becoming FAULTED stops NFS service from
all other pools. The pool that FAULTED contained no Solaris binaries.

and the zpool status hangs people keep discovering.

I think this is a good test in general: configure two
almost-completely independent stacks through the same kernel:


NFS export NFS export

filesystem filesystem
pool pool

ZFS/NFS

driver driver

controller controller

disks disks


Simulate whatever you regard as a ``catastrophic'' or ``unplanned'' or
``really stupid'' failure, and see how big the shared region in the
middle can be without affecting the other stack. Right now, my
experience is even the stack above does not work. Maybe mountd gets
blocked or something, I don't know. Optimistically, we would of
course like this stack below to remain failure-separate:


NFS export NFS export

filesystem filesystem
pool pool

ZFS/NFS

driver

controller

disks disks


The OP is implying that, on Linux, that stack DOES keep failures separate.
However, even if ``hot plug'' (or ``hot unplug'' for demanding Linux
users) is not supported, at least this stack below should still be
failure-independent:


NFS export NFS export

filesystem filesystem
pool pool

ZFS/NFS

driver

controller controller

disks disks


I suspect it isn't because the less-demanding stack I started with
isn't failure-independent. There is probably more than one problem
making these failures spread more widely than they should, but so far
we can't even agree on what we wish were working.

I do think the failures need to be isolated better first, independent
of time. It's not ``a failure of a drive on the left should propagate
up the stack faster so that the stack on the right unfreezes before
anyone gets too upset.'' The stack on the right shouldn't freeze at
all.
Richard Elling
2008-08-26 18:10:53 UTC
Permalink
Post by Miles Nordin
jcm> Don't _ever_ try that sort of thing with IDE. As I mentioned
jcm> above, IDE is not designed to be able to cope with [unplugging
jcm> a cable]
It shouldn't have to be designed for it, if there's controller
redundancy. On Linux, one drive per IDE bus (not using any ``slave''
drives) seems like it should be enough for any electrical issue, but
is not quite good enough in my experience, when there are two PATA
busses per chip. but one hard drive per chip seems to be mostly okay.
In this SATA-based case, not even that much separation was necessary
for Linux to survive on the same hardware, but I agree with you and
haven't found that level with PATA either.
OTOH, if the IDE drivers are written such that a confusing interaction
with one controller chip brings down the whole machine, then I expect
the IDE drivers to do better. If they don't, why advise people to buy
twice as much hardware ``because, you know, controllers can also fail,
so you should have some controller redundancy''---the advice is worse
than a waste of money, it's snake oil---a false sense of security.
No snake oil. Pulling cables only simulates pulling cables. If you
are having difficulty with cables falling out, then this problem cannot
be solved with software. It *must* be solved with hardware.

But the main problem with "simulating disk failures by pulling cables"
is that the code paths executed during that test are different than those
executed when the disk fails in other ways. It is not simply an issue
of the success or failure of the test, but it is an issue of what you are
testing.

Studies have shown that pulled cables are not the dominant failure
mode in disk populations. Bairavasundaram et al. [1] showed that
data checksum errors are much more common. In some internal Sun
studies, we also see unrecoverable read as the dominant disk failure
mode. ZFS will do well for these errors, regardless of the underlying
OS. AFAIK, none of the traditional software logical volume managers
nor the popular open source file systems (other than ZFS :-) address
this problem.

[1]
http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
-- richard
Miles Nordin
2008-08-26 18:38:16 UTC
Permalink
re> unrecoverable read as the dominant disk failure mode. [...]
re> none of the traditional software logical volume managers nor
re> the popular open source file systems (other than ZFS :-)
re> address this problem.

Other LVM's should address unrecoverable read errors as well or better
than ZFS, because that's when the drive returns an error instead of
data. Doing a good job with this error is mostly about not freezing
the whole filesystem for the 30sec it takes the drive to report the
error. Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency. I would expect all the software volume managers
including ZFS fail to do this. It's really hard to test without
somehow getting a drive that returns read errors frequently, but isn't
about to die within the month---maybe ZFS should have an error
injector at driver-level instead of block-level, and a model for
time-based errors. One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.

In terms of FUD about ``silent corruption'', there is none of it when
the drive clearly reports a sector is unreadable. Yes, traditional
non-big-storage-vendor RAID5, and all software LVM's I know of except
ZFS, depend on the drives to report unreadable sectors. And,
generally, drives do. so let's be clear about that and not try to imply
that the ``dominant failure mode'' causes silent corruption for
everyone except ZFS and Netapp users---it doesn't.

The Netapp paper focused on when drives silently return incorrect
data, which is different than returning an error. Both Netapp and ZFS
do checksums to protect from this. However Netapp never claimed this
failure mode was more common than reported unrecoverable read errors,
just that it was more interesting. I expect it's much *less* common.

Further, we know Netapp loaded special firmware into the enterprise
drives in that study because they wanted the larger sector size. They
are likely also loading special firmware into the desktop drives to
make them return errors sooner than 30 seconds. so, it's not
improbable that the Netapp drives are more prone to deliver silently
corrupt data instead of UNC/seek errors compared to off-the-shelf
drives.

Finally, for the Google paper, silent corruption ``didn't even make
the chart.'' so, saying something didn't make your chart and saying
that it doesn't happen are two different things, and your favoured
conclusion has a stake in maintaining that view, too.
Richard Elling
2008-08-26 21:26:34 UTC
Permalink
Post by Miles Nordin
re> unrecoverable read as the dominant disk failure mode. [...]
re> none of the traditional software logical volume managers nor
re> the popular open source file systems (other than ZFS :-)
re> address this problem.
Other LVM's should address unrecoverable read errors as well or better
than ZFS, because that's when the drive returns an error instead of
data.
ZFS handles that case as well.
Post by Miles Nordin
Doing a good job with this error is mostly about not freezing
the whole filesystem for the 30sec it takes the drive to report the
error.
That is not a ZFS problem. Please file bugs in the appropriate category.
Post by Miles Nordin
Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency.
ZFS will handle this case as well.
Post by Miles Nordin
I would expect all the software volume managers
including ZFS fail to do this. It's really hard to test without
somehow getting a drive that returns read errors frequently, but isn't
about to die within the month---maybe ZFS should have an error
injector at driver-level instead of block-level, and a model for
time-based errors.
qv ztest.

Project comstar creates an opportunity for better testing in an open-source
way. However, it will only work for SCSI protocol and therefore does
not provide coverage for IDE devices -- which is not a long-term issue.
Post by Miles Nordin
One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.
This is not operating in ZFS code.
Post by Miles Nordin
In terms of FUD about ``silent corruption'', there is none of it when
the drive clearly reports a sector is unreadable. Yes, traditional
non-big-storage-vendor RAID5, and all software LVM's I know of except
ZFS, depend on the drives to report unreadable sectors. And,
generally, drives do. so let's be clear about that and not try to imply
that the ``dominant failure mode'' causes silent corruption for
everyone except ZFS and Netapp users---it doesn't.
In my field data, the dominant failure mode for disks is unrecoverable
reads. If your software does not handle this case, then you should be
worried. We tend to recommend configuring ZFS to manage data
redundancy for this reason.
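In practice that means giving ZFS a mirror or raidz vdev rather than a
single LUN, e.g. (device names are only an example)
# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
so that an unrecoverable read or checksum error on one disk can be
repaired from the other devices.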
Post by Miles Nordin
The Netapp paper focused on when drives silently return incorrect
data, which is different than returning an error. Both Netapp and ZFS
do checksums to protect from this. However Netapp never claimed this
failure mode was more common than reported unrecoverable read errors,
just that it was more interesting. I expect it's much *less* common.
I would love for you produce data to that effect.
Post by Miles Nordin
Further, we know Netapp loaded special firmware into the enterprise
drives in that study because they wanted the larger sector size. They
are likely also loading special firmware into the desktop drives to
make them return errors sooner than 30 seconds. so, it's not
improbable that the Netapp drives are more prone to deliver silently
corrupt data instead of UNC/seek errors compared to off-the-shelf
drives.
I am not sure of the basis of your assertion. Can you explain
in more detail?
Post by Miles Nordin
Finally, for the Google paper, silent corruption ``didn't even make
the chart.'' so, saying something didn't make your chart and saying
that it doesn't happen are two different things, and your favoured
conclusion has a stake in maintaining that view, too.
The google paper[1] didn't deal with silent errors or corruption at all.
Section 2 describes in nice detail how they decided when a drive
was failed -- it was replaced. They also cite disk vendors who test
"failed" drives and many times the drives test clean (what they call
"no problem found"). This is not surprising because it is unlikely that
data corruption is detected in the systems under study.

[1] http://www.cs.cmu.edu/~bianca/fast07.pdf
-- richard
Mattias Pantzare
2008-08-27 01:32:58 UTC
Permalink
Post by Richard Elling
Post by Miles Nordin
Doing a good job with this error is mostly about not freezing
the whole filesystem for the 30sec it takes the drive to report the
error.
That is not a ZFS problem. Please file bugs in the appropriate category.
Whose problem is it? It can't be the device driver, as that has no
knowledge of zfs filesystems or redundancy.
Post by Richard Elling
Post by Miles Nordin
Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency.
ZFS will handle this case as well.
How is ZFS handling this? Is there a timeout in ZFS?
Post by Richard Elling
Post by Miles Nordin
One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.
This is not operating in ZFS code.
In what way is freezing a ZFS filesystem not operating in ZFS code?

Notice that he wrote filesystems unrelated to the failing drive.
Post by Richard Elling
Post by Miles Nordin
In terms of FUD about ``silent corruption'', there is none of it when
the drive clearly reports a sector is unreadable. Yes, traditional
non-big-storage-vendor RAID5, and all software LVM's I know of except
ZFS, depend on the drives to report unreadable sectors. And,
generally, drives do. so let's be clear about that and not try to imply
that the ``dominant failure mode'' causes silent corruption for
everyone except ZFS and Netapp users---it doesn't.
In my field data, the dominant failure mode for disks is unrecoverable
reads. If your software does not handle this case, then you should be
worried. We tend to recommend configuring ZFS to manage data
redundancy for this reason.
He is writing that all software LVM's will handle unrecoverable reads.

What is your definition of unrecoverable reads?
Richard Elling
2008-08-27 04:40:04 UTC
Permalink
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
Doing a good job with this error is mostly about not freezing
the whole filesystem for the 30sec it takes the drive to report the
error.
That is not a ZFS problem. Please file bugs in the appropriate category.
Whose problem is it? It can't be the device driver, as that has no
knowledge of zfs filesystems or redundancy.
In most cases it is the drivers below ZFS. For an IDE disk it
might be cmdk(7d) over ata(7d). For a USB disk it might be sd(7d)
over scsa2usb(7d) over ehci(7d). prtconf -D will show which
device drivers are attached to your system.

If you search the ZFS source code, you will find very little error
handling of devices, by design.
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency.
ZFS will handle this case as well.
How is ZFS handling this? Is there a timeout in ZFS?
Not for this case, but if configured to manage redundancy, ZFS will
"read redundant data" from alternate devices.

A business metric such as reasonable transaction latency would live
at a level above ZFS.
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.
This is not operating in ZFS code.
In what way is freezing a ZFS filesystem not operating in ZFS code?
Notice that he wrote filesystems unrelated to the failing drive.
At the ZFS level, this is dictated by the failmode property.
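For reference, the property is per-pool and can be inspected or
changed on the fly (pool name is just an example):
# zpool get failmode tank
# zpool set failmode=continue tank
with wait, continue, and panic as the possible values.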
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
In terms of FUD about ``silent corruption'', there is none of it when
the drive clearly reports a sector is unreadable. Yes, traditional
non-big-storage-vendor RAID5, and all software LVM's I know of except
ZFS, depend on the drives to report unreadable sectors. And,
generally, drives do. so let's be clear about that and not try to imply
that the ``dominant failure mode'' causes silent corruption for
everyone except ZFS and Netapp users---it doesn't.
In my field data, the dominant failure mode for disks is unrecoverable
reads. If your software does not handle this case, then you should be
worried. We tend to recommend configuring ZFS to manage data
redundancy for this reason.
He is writing that all software LVM's will handle unrecoverable reads.
I agree. And if ZFS is configured to manage redundancy and a disk
read returns EIO or the checksum does not match, then ZFS will
attempt to read from the redundant data. However, not all devices return
error codes which indicate unrecoverable reads. Also, data corrupted
in the data path between media and main memory may not have an
associated error condition reported.

I find comparing unprotected ZFS configurations with LVMs
using protected configurations to be disingenuous.
Post by Mattias Pantzare
What is your definition of unrecoverable reads?
I wrote data, but when I try to read, I don't get back what I wrote.
-- richard
Mattias Pantzare
2008-08-27 10:44:27 UTC
Permalink
Post by Richard Elling
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency.
ZFS will handle this case as well.
How is ZFS handling this? Is there a timeout in ZFS?
Not for this case, but if configured to manage redundancy, ZFS will
"read redundant data" from alternate devices.
No, ZFS will not, ZFS waits for the device driver to report an error,
after that it will read from alternate devices.

ZFS could detect that there is probably a problem with the device and
read from an alternate device much faster while it waits for the
device to answer.

You can't do this at any other level than ZFS.
Post by Richard Elling
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.
This is not operating in ZFS code.
In what way is freezing a ZFS filesystem not operating in ZFS code?
Notice that he wrote filesystems unrelated to the failing drive.
At the ZFS level, this is dictated by the failmode property.
But that is used after ZFS has detected an error?
Post by Richard Elling
I find comparing unprotected ZFS configurations with LVMs
using protected configurations to be disingenuous.
I don't think anyone is doing that.
Post by Richard Elling
Post by Mattias Pantzare
What is your definition of unrecoverable reads?
I wrote data, but when I try to read, I don't get back what I wrote.
There is only one case where ZFS is better, that is when wrong data is
returned. All other cases are managed by layers below ZFS. Wrong data
returned is not normally called unrecoverable reads.
Richard Elling
2008-08-27 17:17:40 UTC
Permalink
Post by Mattias Pantzare
Post by Richard Elling
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
Either the drives should be loaded with special firmware that
returns errors earlier, or the software LVM should read redundant data
and collect the statistic if the drive is well outside its usual
response latency.
ZFS will handle this case as well.
How is ZFS handling this? Is there a timeout in ZFS?
Not for this case, but if configured to manage redundancy, ZFS will
"read redundant data" from alternate devices.
No, ZFS will not, ZFS waits for the device driver to report an error,
after that it will read from alternate devices.
Yes, ZFS will, ZFS waits for the device driver to report an error,
after that it will read from alternate devices.
Post by Mattias Pantzare
ZFS could detect that there is probably a problem with the device and
read from an alternate device much faster while it waits for the
device to answer.
Rather than complicating ZFS code with error handling code
which is difficult to port or maintain over time, ZFS leverages
the Solaris Fault Management Architecture. There is opportunity
to expand features using the flexible FMA framework. Feel free
to propose additional RFEs.
Post by Mattias Pantzare
You can't do this at any other level than ZFS.
Post by Richard Elling
Post by Mattias Pantzare
Post by Richard Elling
Post by Miles Nordin
One thing other LVM's seem like they may do better
than ZFS, based on not-quite-the-same-scenario tests, is not freeze
filesystems unrelated to the failing drive during the 30 seconds it's
waiting for the I/O request to return an error.
This is not operating in ZFS code.
In what way is freezing a ZFS filesystem not operating in ZFS code?
Notice that he wrote filesystems unrelated to the failing drive.
At the ZFS level, this is dictated by the failmode property.
But that is used after ZFS has detected an error?
I don't understand this question. Could you rephrase to clarify?
Post by Mattias Pantzare
Post by Richard Elling
I find comparing unprotected ZFS configurations with LVMs
using protected configurations to be disingenuous.
I don't think anyone is doing that.
harrumph
Post by Mattias Pantzare
Post by Richard Elling
Post by Mattias Pantzare
What is your definition of unrecoverable reads?
I wrote data, but when I try to read, I don't get back what I wrote.
There is only one case where ZFS is better, that is when wrong data is
returned. All other cases are managed by layers below ZFS. Wrong data
returned is not normally called unrecoverable reads.
It depends on your perspective. T10 has provided a standard error
code for a device to tell a host that it experienced an unrecoverable
read error. However, we still find instances where what we wrote
is not what we read, whether it is detected at the media level or higher
in the software stack. In my pile of broken parts, I have devices
which fail to indicate an unrecoverable read, yet do indeed suffer
from forgetful media. To carry that discussion very far, it quickly
descends into the ability of the device's media checksums to detect
bad data -- even ZFS's checksums. But here is another case where
enterprise-class devices tend to perform better than consumer-grade
devices.
-- richard
Keith Bierman
2008-08-27 18:05:13 UTC
Permalink
Post by Richard Elling
In my pile of broken parts, I have devices
which fail to indicate an unrecoverable read, yet do indeed suffer
from forgetful media.
A long time ago, in a hw company long since dead and buried, I spent
some months trying to find an intermittent error in the last bits of
a complicated floating point application. It only occurred when disk
striping was turned on (but the OS and device codes checked cleanly).
In the end, it turned out that one of the device vendors had modified
the specification slightly (by like 1 nano-sec) and the result was
that least significant bits were often wrong when we drove the disk
cage to its max.

Errors were occurring randomly (e.g. swapping, paging, etc.) but no
other application noticed. As the error was "within the margin of
error" a less stubborn analyst might have not made a serious of
federal cases about the non-determinism ;>

My point is that undetected errors happen all the time; that people
don't notice doesn't mean that they don't happen ...
--
Keith H. Bierman ***@gmail.com | AIM kbiermank
5430 Nassau Circle East |
Cherry Hills Village, CO 80113 | 303-997-2749
<speaking for myself*> Copyright 2008
Carson Gaspar
2008-08-26 18:56:19 UTC
Permalink
Post by Richard Elling
No snake oil. Pulling cables only simulates pulling cables. If you
are having difficulty with cables falling out, then this problem cannot
be solved with software. It *must* be solved with hardware.
But the main problem with "simulating disk failures by pulling cables"
is that the code paths executed during that test are different than those
executed when the disk fails in other ways. It is not simply an issue
of the success or failure of the test, but it is an issue of what you are
testing.
All of that may be true, but it doesn't change the fact that Solaris'
observed behaviour under these conditions is _abysmally_ bad, and for no
good reason.

It might not be a high priority to fix, but it would be nice if one of
the Sun folks would at least acknowledge that something is terribly
wrong here, rather than claiming it's not a problem.
--
Carson
Richard Elling
2008-08-26 20:15:20 UTC
Permalink
Post by Carson Gaspar
Post by Richard Elling
No snake oil. Pulling cables only simulates pulling cables. If you
are having difficulty with cables falling out, then this problem cannot
be solved with software. It *must* be solved with hardware.
But the main problem with "simulating disk failures by pulling cables"
is that the code paths executed during that test are different than those
executed when the disk fails in other ways. It is not simply an issue
of the success or failure of the test, but it is an issue of what you are
testing.
All of that may be true, but it doesn't change the fact that Solaris'
observed behaviour under these conditions is _abysmally_ bad, and for no
good reason.
Please file bugs. That is the best way to get things fixed.
The most appropriate forum for storage driver discussions will
be storage-discuss.
-- richard
Ron Halstead
2008-08-26 20:45:58 UTC
Permalink
Todd, 3 days ago you were asked what mode the BIOS was using, AHCI or IDE compatibility. Which is it? Did you change it? What was the result? A few other posters suggested the same thing but the thread went off into left field and I believe the question / suggestions got lost in the noise.

--ron


This message posted from opensolaris.org
Todd H. Poole
2008-08-27 06:53:27 UTC
Permalink
Howdy Ron,

Right, right - I know I dropped the ball on that one. Sorry, I haven't been able to log into OpenSolaris lately, and thus haven't been able to actually do anything useful... (lol, not to rag on OpenSolaris or anything, but it can also freeze just by logging in... See: http://defect.opensolaris.org/bz/show_bug.cgi?id=1681)

Ok, so, just to give a refresher of what's going on:
When everything is in its default state (standard install of OpenSolaris, standard configuration of ZFS, factory-set BIOS settings, etc.) OpenSolaris will indeed freeze/hang/lock up, and generally become unusable _without exception_ on the hardware I've described above. I'm not confident enough to say that it will _always_ happen on _any_ machine using the 4 drive configuration of RAID-Z with the pci-ide driver and hardware set-up I've described thus far, but since I am not alone in experiencing this (see what myxiplx experienced on his [different] hardware set-up), I don't think it's an isolated case.

The factory-set BIOS setting for the 4 SATA II ports on my motherboard is [Native IDE]. I can change this setting from [Native IDE] to [RAID], [Legacy IDE], and [SATA->AHCI].

Changing the setting to [SATA->AHCI] prevents the machine from booting. There isn't any extra information that I can give aside from the fact that when I'm at the "SunOS Release 5.11 Version snv_86 64-bit" screen where the copyright is listed, the machine hangs right after listing "Hostname: ".

A restart didn't fix anything (that would sometimes fix the login bug I wrote about a few paragraphs up, but it didn't work for this).

By the way: Is there a way to pull up a text-only interface from the log in screen (or during the boot process?) without having to log in (or just sit there reading about "SunOS Release 5.11 Version snv_86 64-bit")? It would be nice if I could see a bit more information during boot, or if I didn't have to use gnome if I just wanted to get at the CLI anyways... On some OSes, if you want to access TTY1 through 6, you only need to press ESC during boot, or CTRL + ALT + F1 through F6 (or something similar) during the login screen to gain access to other non-GUI login screens...

Anyway, after changing the setting back to [Native IDE], the machine boots fine. And this time, the freeze-on-login bug didn't get me. Now, I know for a fact this motherboard supports SATA II (see link to manufacturer's website in earlier post), and that all 4 of these disks are _definitely_ SATA II disks (see hardware specifications listed in one of my earliest posts), and that I'm using all the right cables and everything... so, I don't know how to explore this any further...

Could it be that when I installed OpenSolaris, I was using the pci-ide (or [Native IDE]) setting on my BIOS, and thus if I were to change it, OpenSolaris might not know how to handle that, and might refuse to boot? Or that maybe OpenSolaris only installed the drivers it thought it would need, and the sata/ahci one wasn't one of them?
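
(I'm guessing I could check whether the ahci driver is even present and bound to anything with something like "ls /kernel/drv/ahci /kernel/drv/amd64/ahci" and "prtconf -D | grep -i ahci", but I'm not sure those are the right paths/options, so correct me if not.)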

Let me know what you think.

-Todd


This message posted from opensolaris.org
Tim
2008-08-27 16:21:14 UTC
Permalink
Post by Todd H. Poole
By the way: Is there a way to pull up a text-only interface from the log in
screen (or during the boot process?) without having to log in (or just sit
there reading about "SunOS Release 5.11 Version snv_86 64-bit")? It would be
nice if I could see a bit more information during boot, or if I didn't have
to use gnome if I just wanted to get at the CLI anyways... On some OSes, if
you want to access TTY1 through 6, you only need to press ESC during boot,
or CTRL + ALT + F1 through F6 (or something similar) during the login screen
to gain access to other non-GUI login screens...
On SXDE/Solaris, there's a dropdown menu that lets you select what type of logon you'd like to use. I haven't touched 2008.11, so I have no idea if it's got anything similar.
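A couple of other approaches should also work on 2008.05 - I'm going from memory here, so treat the exact service name and GRUB syntax as my best guess rather than gospel. You can drop the graphical login entirely with

# svcadm disable -t gdm

(the -t makes it temporary; "svcadm enable gdm" brings it back and leaves you with a plain console login in the meantime). For more detail during boot, press "e" on the GRUB menu entry and append -v to the kernel$ line for a verbose boot, e.g.

kernel$ /platform/i86pc/kernel/$ISADIR/unix -v -B $ZFS-BOOTFS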
Post by Todd H. Poole
Anyway, after changing the setting back to [Native IDE], the machine boots
fine. And this time, the freeze-on-login bug didn't get me. Now, I know for
a fact this motherboard supports SATA II (see link to manufacturer's website
in earlier post), and that all 4 of these disks are _definitely_ SATA II
disks (see hardware specifications listed in one of my earliest posts), and
that I'm using all the right cables and everything... so, I don't know how
to explore this any further...
Could it be that when I installed OpenSolaris, I was using the pci-ide (or
[Native IDE]) setting on my BIOS, and thus if I were to change it,
OpenSolaris might not know hot to handle that, and might refuse to boot? Or
that maybe OpenSolaris only installed the drivers it thought it would need,
and the stat-ahci one wasn't one of them?
Did you do a reboot reconfigure? "reboot -- -r" or "init 6"?
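For reference, something along these lines should do a reconfiguration boot; the prtconf check at the end is just my suggestion for confirming that the new driver actually attached:

# touch /reconfigure && init 6
(or equivalently)
# reboot -- -r
(then, once the box comes back up)
# prtconf -D | grep -i ahci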
Ross
2008-08-27 18:31:41 UTC
Permalink
Forgive me for being a bit wooly with this explanation (I've only recently moved over from Windows), but changing the disk mode from IDE to SATA may well not work without a re-install, or at the very least some messing around with boot settings. I've seen many systems that list SATA disks ahead of IDE ones, so changing the drives to SATA may mean that instead of your OS being on drive 0 and your data on drive 1, you now have the data on drive 0 and the OS on drive 1.

You'll get through the first part of the boot process fine, but the second stage is where you usually have problems, which sounds like what's happening to you. Unfortunately, swapping hard disk controllers (which is what you're doing here) isn't as simple as just making the change and rebooting, and that would be just as true in Windows.

I do think some Solaris drivers need a bit of work, but I suspect the standard SATA ones are pretty good, so there's a fair chance you'll find hot plug works OK in SATA mode.

Ultimately, however, you're trying to get enterprise kinds of performance out of consumer kit, and no matter how good Solaris and ZFS are, they can't guarantee to work with that. I used to have the same opinion as you, but I'm starting to see now that ZFS isn't quite an exact match for traditional RAID controllers. It's close, but you do need to think about the hardware too and make sure it can definitely cope with what you're wanting to do. I think the sales literature is a little misleading in that sense.

Ross


This message posted from opensolaris.org
Tim
2008-08-27 18:38:00 UTC
Permalink
Post by Ross
Forgive me for being a bit wooly with this explanation (I've only recently
moved over from Windows), but changing disk mode from IDE to SATA may well
not work without a re-install, or at the very least messing around with boot
settings. I've seen many systems which list SATA disks in front of IDE
ones, so you changing the drives to SATA may now mean that instead of your
OS being installed on drive 0, and your data on drive 1, you now have the
data on drive 0 and the OS on drive 1.
Solaris does not do this. This is one of the many annoyances I have with Linux. The way it handles /dev is ridiculous. Did you add a new drive? Let's renumber everything!

--Tim
Miles Nordin
2008-08-27 22:33:54 UTC
Permalink
t> Solaris does not do this.

yeah but the locators for local disks are still based on
pci/controller/channel not devid, so the disk will move to a different
device name if he changes BIOS from pci-ide to AHCI because it changes
the driver attachment. This may be the problem preventing his bootup,
rather than the known AHCI bug.

I'm not sure what's required to boot off a root pool whose devices have moved - maybe nothing - but for UFS roots it often required booting off the install media, regenerating /dev (and /devices on sol9), editing vfstab, and so on.
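Roughly, that UFS-root recovery looks something like the following - the device name and the /a mount point are placeholders, so take it as a sketch of the shape of the procedure rather than a recipe:

(boot from the install media into single-user/shell mode)
# mount /dev/dsk/c1t0d0s0 /a
# devfsadm -r /a
(rebuilds /dev and /devices under the alternate root)
# vi /a/etc/vfstab
(fix the device paths to match the new controller numbering)
# bootadm update-archive -R /a
# reboot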

Linux device names don't move around as much if you use LVM2, as some of the distros do by default even for single-device systems. Device names are then based on labels written onto the drive, which is a little scary and adds a lot of confusion, but I think it helps with this moving-device problem and is analogous to what it sounds like ZFS might do on the latest SXCEs that don't put zpool.cache in the boot archive.
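For data pools, at least, you can watch that label search happen directly - using the pool name from elsewhere in this thread (mediapool) as an example, and obviously not something you can do to the pool you booted from:

# zpool export mediapool
# zpool import
(scans the visible devices for ZFS labels and lists any importable pools, whatever their device names are now)
# zpool import mediapool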
Tim
2008-08-27 22:40:56 UTC
Permalink
Post by Miles Nordin
t> Solaris does not do this.
yeah but the locators for local disks are still based on
pci/controller/channel not devid, so the disk will move to a different
device name if he changes BIOS from pci-ide to AHCI because it changes
the driver attachment. This may be the problem preventing his bootup,
rather than the known AHCI bug.
Except he was, and is referring to a non-root disk. If I'm using raw
devices and I unplug my root disk and move it somewhere else, I would expect
to have to update my boot loader.
Post by Miles Nordin
Linux device names don't move as much if you use LVM2, as some of the
distros do by default even for single-device systems. Device names
are then based on labels written onto the drive, which is a little
scary and adds a lot of confusion, but I think helps with this
moving-device problem and is analagous to what it sounds like ZFS
might do on the latest SXCE's that don't put zpool.cache in the boot
archive.
LVM hardly changes the way devices move around in Linux, or its horrendous handling of /dev. You are correct in that it is a step towards masking the ugliness; I, however, do not consider it a fix. Unfortunately it's not used at the majority of the sites I am involved in, and as such isn't any sort of help. The administration overhead it adds is not worth the hassle for the majority of my customers.

--Tim
Miles Nordin
2008-08-27 23:02:57 UTC
Permalink
t> Except he was, and is referring to a non-root disk.

wait, what? his root disk isn't plugged into the pci-ide controller?

t> LVM hardly changes the way devices move around in Linux,

fine, be pedantic. It makes systems boot and mount all their
filesystems including '/' even when you move disks around. agreed
now?

There's a simpler Linux way of doing this which I use on my Linux
systems: mounting by the UUID in the filesystem's superblock. But I
think RedHat is using LVM2 to do it.
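For example, an fstab entry of that style looks like the following - the UUID is a made-up placeholder; blkid(8) reports the real one for a given device:

# blkid /dev/sda1
/dev/sda1: UUID="0f3e4d5c-1234-5678-9abc-def012345678" TYPE="ext3"

and then in /etc/fstab:

UUID=0f3e4d5c-1234-5678-9abc-def012345678  /  ext3  defaults  1 1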

Anyway modern Linux systems don't put names like /dev/sda in
/etc/fstab, and they don't use these names to find the root filesystem
either---they have all that LVM2 stuff in the early userspace.

Solaris seems to be going the same ``mount by label'' direction with
ZFS (except with zpool.cache, devid's, and mpxio, it's a bit of a
hybrid approach---when it goes out searching for labels, and when it
expects devices to be on the same bus/controller/channel, isn't
something I fully understand yet and I expect will only become clear
through experience).
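The labels themselves are easy enough to look at, for what that's worth: zdb -l dumps the labels ZFS wrote on a vdev (the device name below is just an example):

# zdb -l /dev/dsk/c1d0s0
(prints each label's pool name, pool and vdev GUIDs, and the devid/physical path recorded for that device, among other things)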
MC
2008-08-27 06:08:39 UTC
Permalink
Pulling cables only simulates pulling cables. If you
are having difficulty with cables falling out, then this problem cannot
be solved with software. It *must* be solved with hardware.
I don't think anyone is asking for software to fix cables that fall out... they're asking for the OS to not crash, which they perceive to be better than a crash...


This message posted from opensolaris.org
Todd H. Poole
2008-08-27 07:27:21 UTC
Permalink
Howdy James,

While responding to halstead's post (see below), I had to restart several times to complete some testing. I'm not sure if that's important to these commands or not, but I just wanted to put it out there anyway.
Post by James C. McPherson
A few commands that you could provide the output from
(these two show any FMA-related telemetry)
fmadm faulty
fmdump -v
This is the output from both commands:

***@mediaserver:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 27 01:07:08 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD Major

Fault class : fault.fs.zfs.vdev.io
Description : The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD
for more information.
Response : The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.



***@mediaserver:~# fmdump -v
TIME UUID SUNW-MSG-ID
Aug 27 01:07:08.2040 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD
100% fault.fs.zfs.vdev.io

Problem in: zfs://pool=mediapool/vdev=bfaa3595c0bf719
Affects: zfs://pool=mediapool/vdev=bfaa3595c0bf719
FRU: -
Location: -
Post by James C. McPherson
(this shows your storage controllers and what's
connected to them) cfgadm -lav
This is the output from cfgadm -lav

***@mediaserver:~# cfgadm -lav
Ap_Id Receptacle Occupant Condition Information
When Type Busy Phys_Id
usb2/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13:1
usb2/2 connected configured ok
Mfg: Microsoft Product: Microsoft 3-Button Mouse with IntelliEye(TM)
NConfigs: 1 Config: 0 <no cfg str descr>
unavailable usb-mouse n /devices/***@0,0/pci1458,***@13:2
usb3/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,2:1
usb3/2 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,2:2
usb4/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,3:1
usb4/2 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,3:2
usb5/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,4:1
usb5/2 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,4:2
usb6/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:1
usb6/2 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:2
usb6/3 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:3
usb6/4 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:4
usb6/5 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:5
usb6/6 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:6
usb6/7 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:7
usb6/8 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:8
usb6/9 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:9
usb6/10 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,5:10
usb7/1 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,1:1
usb7/2 empty unconfigured ok
unavailable unknown n /devices/***@0,0/pci1458,***@13,1:2

You'll notice that the only thing listed is my USB mouse... is that expected?
Post by James C. McPherson
You'll also find messages in /var/adm/messages which
might prove
useful to review.
If you really want, I can list the output from /var/adm/messages, but it doesn't seem to add anything new to what I've already copied and pasted.
Post by James C. McPherson
First and foremost, for me, this is a stupid thing to
do. You've got common-or-garden PC hardware which almost
*definitely* does not support hot plug of devices. Which is what you're
telling us that you're doing. Would try this with your pci/pci-e
cards in this system? I think not.
I would if I had some sort of set-up that supposedly promised me redundant PCI/PCI-E cards... You might think it's stupid, but how else could one be sure that the back-up PCI/PCI-E card would take over when the primary one died?

Unplugging one of them seems like a fine test to me - it's definitely the worst-case scenario, and if the rig survives that, then I _know_ I would be able to rely on it for redundancy should one of the cards fail (which would most likely occur in a less spectacular fashion than a quick yank anyway).
Post by James C. McPherson
If you absolutely must do something like this, then
please use what's known as "coordinated hotswap" using the
cfgadm(1m) command.
(detect fault in disk c2t3d0, in some way)
# cfgadm -c unconfigure c2::dsk/c2t3d0
# cfgadm -c disconnect c2::dsk/c2t3d0
(go and swap the drive, plugin new drive with same
cable)
# zpool replace -f poolname c2t3d0
What this will do is tell the kernel to do things in
the right order, and - for zpool - tell it to do an
in-place replacement of device c2t3d0 in your pool.
Thanks for the command listings - they'll certainly prove useful if I should ever find myself in a situation where I have to manually swap a disk like you described. Unfortunately though, I'm with Miles Nordin (see below) on this one - I don't want to warn OpenSolaris of what I'm about to do... That would defeat the purpose of the test. Even with technologies (like S.M.A.R.T.) that are designed to give you a bit of a heads-up, as Heikki Suonsivu and Google have noted, they're not very reliable at all (research.google.com/archive/disk_failures.pdf).

And I want this test to be as rough as it gets. I don't want to play nice with this system... I want to drag it through the most tortuous worst-case scenario tests I can imagine, and if it survives with all my test data intact, then (and only then) will I begin to trust it.
Post by James C. McPherson
http://docs.sun.com/app/docs/coll/40.17 (manpages)
http://docs.sun.com/app/docs/coll/47.23 (system admin collection)
http://docs.sun.com/app/docs/doc/817-2271 ZFS admin guide
http://docs.sun.com/app/docs/doc/819-2723 devices + filesystems guide
Oohh... Thank you. Good Links. I'm bookmarking these for future reading. They'll definitely be helpful if we end up choosing to deploy OpenSolaris + ZFS for our media servers.

-Todd


This message posted from opensolaris.org
Richard Elling
2008-08-27 16:47:21 UTC
Permalink
Post by Todd H. Poole
And I want this test to be as rough as it gets. I don't want to play
nice with this system... I want to drag it through the most tortuous
worst-case scenario tests I can imagine, and if it survives with all
my test data intact, then (and only then) will I begin to trust it.
http://youtu.be/naKd9nARAes
:-)
-- richard
Todd H. Poole
2008-08-28 00:24:10 UTC
Permalink
Ah yes - that video is what got this whole thing going in the first place... I referenced it in one of my other posts much earlier. Heh... there's something gruesomely entertaining about brutishly taking a drill or sledge hammer to a piece of precision hardware like that.

But yes, that's the kind of torture test I would like to conduct. However, I'm operating on a limited test budget right now, and I have to get the damn thing working in the first place before I start performing tests I can't easily reverse (I still have yet to fire up Bonnie++ and do some benchmarking), and most definitely before I can put on a show for those who control the purse strings...

But, imagine: walking into... oh say, I dunno... your manager's office, for example, and asking him to beat the hell out of one of your server's hard drives all the while promising him that no data would be lost, and none of his video on demand customers would ever notice an interruption in service. He might think you're crazy, but if it still works at the end of the day, your annual budget just might get a sizable increase to help you make all the other servers "sledge hammer resistant" like the first one. ;)

But that's just an example. That functionality could (and probably does) prove useful almost anywhere.


This message posted from opensolaris.org
Miles Nordin
2008-08-27 18:24:13 UTC
Permalink
Post by James C. McPherson
Would try this with
your pci/pci-e cards in this system? I think not.
thp> Unplugging one of them seems like a fine test to me

I've done it, with 32-bit 5-volt PCI; I forget why. I might have been trying to use a board while bypassing its broken Etherboot ROM. It was something like that.

IIRC it works sometimes, crashes the machine sometimes, and fries the
hardware eventually if you keep doing it long enough.

The exact same three cases are true of cold-plugging a PCI card. It just works a lot more often if you power down first.

Does massively inappropriate hotplugging possibly weaken the hardware
so that it's more likely to pop later? maybe. Can you think of a
good test for that?

Believe it or not, sometimes accurate information is worth more than a motherboard that cost $50 five years ago. Sometimes saving ten minutes is worth more. Or... <cough> recovering an OpenPROM password.

Testing availability claims rather than accepting them on faith, or
rather than gaining experience in a slow, oozing, anecdotal way on
production machinery, is definitely not stupid. Testing them in a way
that compares one system to another is double-un-stupid.
James C. McPherson
2008-08-28 12:52:01 UTC
Permalink
Hi Todd,
sorry for the delay in responding, been head down rewriting
a utility for the last few days.
Post by Todd H. Poole
Howdy James,
While responding to halstead's post (see below), I had to restart several
times to complete some testing. I'm not sure if that's important to these
commands or not, but I just wanted to put it out there anyway.
Post by James C. McPherson
A few commands that you could provide the output from
(these two show any FMA-related telemetry)
fmadm faulty
fmdump -v
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 27 01:07:08 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD Major
Fault class : fault.fs.zfs.vdev.io
Description : The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD
for more information.
Response : The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
TIME UUID SUNW-MSG-ID
Aug 27 01:07:08.2040 0d9c30f1-b2c7-66b6-f58d-9c6bcb95392a ZFS-8000-FD
100% fault.fs.zfs.vdev.io
Problem in: zfs://pool=mediapool/vdev=bfaa3595c0bf719
Affects: zfs://pool=mediapool/vdev=bfaa3595c0bf719
FRU: -
Location: -
In other emails in this thread you've mentioned the desire to
get an email (or some sort of notification) when Problems Happen(tm)
in your system, and the FMA framework is how we achieve that
in OpenSolaris.



# fmadm config
MODULE VERSION STATUS DESCRIPTION
cpumem-retire 1.1 active CPU/Memory Retire Agent
disk-transport 1.0 active Disk Transport Agent
eft 1.16 active eft diagnosis engine
fabric-xlate 1.0 active Fabric Ereport Translater
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
io-retire 2.0 active I/O Retire Agent
snmp-trapgen 1.0 active SNMP Trap Generation Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
zfs-diagnosis 1.0 active ZFS Diagnosis Engine
zfs-retire 1.0 active ZFS Retire Agent


You'll notice that we've got an SNMP agent there... and you
can acquire a copy of the FMA mib from the Fault Management
community pages (http://opensolaris.org/os/community/fm and
http://opensolaris.org/os/community/fm/mib/).
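If SNMP is more than you want to set up for a home media server, a crude poll-and-mail workaround along these lines also works - treat it as a sketch: the address is a placeholder, mailx has to be able to send mail off the box, and some builds print a header line even when nothing is faulted, so the emptiness test may need adjusting:

#!/bin/sh
# run from cron; mails the output of fmadm faulty whenever it is non-empty
FAULTS=`/usr/sbin/fmadm faulty 2>/dev/null`
if [ -n "$FAULTS" ]; then
        echo "$FAULTS" | /usr/bin/mailx -s "FMA fault on `/usr/bin/hostname`" admin@example.com
fi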
Post by Todd H. Poole
Post by James C. McPherson
(this shows your storage controllers and what's
connected to them) cfgadm -lav
This is the output from cfgadm -lav
Ap_Id Receptacle Occupant Condition Information
When Type Busy Phys_Id
usb2/1 empty unconfigured ok
usb2/2 connected configured ok
Mfg: Microsoft Product: Microsoft 3-Button Mouse with IntelliEye(TM)
NConfigs: 1 Config: 0 <no cfg str descr>
usb3/1 empty unconfigured ok
[snip]
Post by Todd H. Poole
usb7/2 empty unconfigured ok
You'll notice that the only thing listed is my USB mouse... is that expected?
Yup. One of the artefacts of the cfgadm architecture. cfgadm(1m)
works by using plugins - usb, FC, SCSI, SATA, pci hotplug, InfiniBand...
but not IDE.

I think you also were wondering how to tell what controller
instances your disks were using in IDE mode - two basic ways
of achieving this:

/usr/bin/iostat -En

and

/usr/sbin/format

Your IDE disks will attach using the cmdk driver and show up like this:

c1d0
c1d1
c2d0
c2d1

In AHCI/SATA mode they'd show up as

c1t0d0
c1t1d0
c1t2d0
c1t3d0

or something similar, depending on how the bios and the actual
controllers sort themselves out.
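A quick way to get that list without sitting in format's interactive menu, incidentally, is to feed it an empty stdin - it prints the disk selection list and then exits:

# echo | /usr/sbin/format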
Post by Todd H. Poole
Post by James C. McPherson
You'll also find messages in /var/adm/messages which
might prove
useful to review.
If you really want, I can list the output from /var/adm/messages, but it
doesn't seem to add anything new to what I've already copied and pasted.
No need - you've got them if you need them.

[snip]
Post by Todd H. Poole
Post by James C. McPherson
http://docs.sun.com/app/docs/coll/40.17 (manpages)
http://docs.sun.com/app/docs/coll/47.23 (system admin collection)
http://docs.sun.com/app/docs/doc/817-2271 ZFS admin guide
http://docs.sun.com/app/docs/doc/819-2723 devices + filesystems guide
Oohh... Thank you. Good Links. I'm bookmarking these for future reading.
They'll definitely be helpful if we end up choosing to deploy OpenSolaris
+ ZFS for our media servers.
There's a heap of info there; getting started with it can be like trying to drink from a fire hose :)


Best regards,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Ross
2008-08-24 08:04:25 UTC
Permalink
You're seeing exactly the same behaviour I found on my server, using a Supermicro AOC-SAT2-MV8 SATA controller. It's detailed on the forums under the topic "Supermicro AOC-SAT2-MV8 hang when drive removed", but unfortunately that topic split into 3 or 4 pieces, so it's a pain to find.

I also reported it as a bug here:
http://bugs.opensolaris.org/view_bug.do?bug_id=6735931


This message posted from opensolaris.org
Ross
2008-08-24 08:31:39 UTC
Permalink
PS. Does your system definitely support SATA hot swap? Could you, for example, test it under Windows to see if it runs fine there?

I suspect this is a Solaris driver problem, but it would be good to have confirmation that the hardware handles this fine.


This message posted from opensolaris.org
Todd H. Poole
2008-08-24 09:17:09 UTC
Permalink
Hmm... You know, that's a good question. I'm not sure if those SATA II ports support hot swap or not. The motherboard is fairly new, but taking a look at the specifications provided by Gigabyte (http://www.gigabyte.com.tw/Products/Motherboard/Products_Spec.aspx?ProductID=2874) doesn't seem to yield anything. To tell you the truth, I think they're just plain ol' dumb SATA II ports - nothing fancy here.

But that's alright, because hot swap isn't something I'm necessarily chasing after. It would be nice, of course, but the thing we want most is stability during hardware failures. For this particular server, it is _far_ more important for the thing to keep chugging along and blow right through as many hardware failures as it can. If it's still got 3 of those 4 drives (which implies at least 2 data and 1 parity, or 3 data and no parity) then I still want to be able to read and write to those NFS exports like nothing happened. Then, at the end of the day, if we need to bring the machine down in order to install a new disk and resilver the RAID-Z array, that is perfectly acceptable. We could do that around 6:00 or so, when everyone goes home for the day and when it's much more convenient for us and the users, and let the resilvering/repairing operation run overnight.

I also read the PDF summary you included in your link to your other post. And it seems we're seeing similar behavior here. Although, in this case, things are even simpler: there are only 4 drives in the case (not 8), and there is no extra controller card (just the ports on the motherboard)... It's hard to get any more basic than that.

As for testing in other OSes, unfortunately I don't readily have a copy of Windows available. But even if I did, I wouldn't know where to begin: almost all of my experience in server administration has been with Linux. For what it's worth, I have already established the above (that is, the seamless experience) with OpenSuSE 11.0 as the operating system, LVM as the volume manager, mdadm as the RAID manager, and XFS as the filesystem, so I know it can work...

I just want to get it working with OpenSolaris and ZFS. :)


This message posted from opensolaris.org
Ross
2008-08-27 15:38:28 UTC
Permalink
Hi Todd,

Having finally gotten the time to read through this entire thread, I think Ralf said it best. ZFS can provide data integrity, but you're reliant on hardware and drivers for data availability.

In this case, either your SATA controller or the drivers for it don't cope at all well with a device going offline, so what you need is a SATA card that can handle that. Provided you have a controller that can cope with disk errors, it should be able to return the appropriate status information to ZFS, which will in turn ensure your data is OK.

The technique obviously works or Sun's x4500 servers wouldn't be doing anywhere near as well as they are. The problem we all seem to be having is finding white box hardware that supports it.

I suspect your best bet would be to pick up a SAS controller based on the LSI chipsets used in the new x4540 server. There's been a fair bit of discussion here about these, and while there's a limitation in that you will have to manually keep track of drive names, I would expect it to handle disk failures (and pulled disks) much better. You would probably be well advised to ask the folks on the forums running those SAS controllers whether they've been able to pull disks successfully, though.

I think the solution you need is definitely a better disk controller, and your choice is either a plain SAS controller or a RAID controller that can present individual disks in pass-through mode, since those *definitely* are designed to handle failures.

Ross


This message posted from opensolaris.org