Discussion:
ZFS: ZFS command can block the whole ZFS subsystem!
O. Hartmann
2014-01-03 12:00:21 UTC
Permalink
For security reasons, I dumped a large file onto a 3TB disk via "dd".
The system is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56 CET
2013, amd64. The filesystem in question is a single ZFS pool.

Issuing the command

"rm dumpfile.txt"

and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire command
and even worse - it seems to wind up the pool in question for being
exported!

I expect to get the command into the background as every other UNIX
command does when sending Ctrl-Z in the console. Obviously, ZFS
related stuff in FreeBSD doesn't comply.

The file has been removed from the pool but the console is still stuck
with "^Z fg" (as I typed this in). Process list tells me:

top
17790 root 1 20 0 8228K 1788K STOP 10 0:05 0.00% rm

for the particular "rm" command issued.

Now, having the file deleted, I'd like to export the pool for further
maintenance, but that doesn't work with

zpool export -f poolname

This command is now also stuck, blocking the terminal and keeping the
pool from further actions.

This is painful. Last time I faced the problem, I had to reboot before
I could take any action regarding any pool in the system, since one
single ZFS command could obviously block the whole subsystem (I tried
to export and import).

What is up here?
Steven Hartland
2014-01-03 14:38:03 UTC
Permalink
----- Original Message -----
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto a 3TB
disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56 CET
2013 amd64. Filesystem in question is a single ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire command
and even worse - it seems to wind up the pool in question for being
exported!
I can't think of any reason why backgrounding a shell would export a pool.
Post by O. Hartmann
I expect to get the command into the background as every other UNIX
command does when sending Ctrl-Z in the console. Obviously, ZFS
related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still stuck
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
That's not backgrounded yet, otherwise it wouldn't be in the STOP state.
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for further
maintainance
Are you sure the delete is complete? Also don't forget ZFS has TRIM by
default, so depending on support of the underlying devices you could
be seeing deletes occurring.

You can check that with: gstat -d
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the pool from
further actions.
If the delete hasn't completed and is stuck in the kernel, this is
to be expected.
Post by O. Hartmann
This is painful. Last time I faced the problem, I had to reboot prior
to take any action regarding any pool in the system, since one single
ZFS command could obviously block the whole subsystem (I tried to
export and import).
What is up here?
Regards
Steve

O. Hartmann
2014-01-03 16:14:57 UTC
Permalink
On Fri, 3 Jan 2014 14:38:03 -0000
Post by Steven Hartland
----- Original Message -----
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto a 3TB
disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56
CET 2013 amd64. Filesystem in question is a single ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire
command and even worse - it seems to wind up the pool in question
for being exported!
I cant think of any reason why backgrounding a shell would export a pool.
I sent the job "rm" into background and I didn't say that implies an
export of the pool!

I said that the pool cannot be exported once the bg command has been
issued.
Post by Steven Hartland
Post by O. Hartmann
I expect to get the command into the background as every other UNIX
command does when sending Ctrl-Z in the console. Obviously, ZFS
related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
Thats not backgrounded yet otherwise it wouldnt be in the state STOP.
As I said - the job never backgrounded, locked up the terminal and
made the whole pool unresponsive.
Post by Steven Hartland
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for
further maintainance
Are you sure the delete is complete? Also don't forget ZFS has TRIM by
default, so depending on support of the underlying devices you could
be seeing deletes occuring.
Quite sure it didn't! It has taken hours (~ 8 now) and the drive is
still working, although I tried to stop it.
Post by Steven Hartland
You can check that gstat -d
The command reports 100% activity on the drive. I exported the pool in
question in single-user mode and am now trying to import it back while
in multiuser mode.

Shortly after issuing the command

zpool import POOL00

the terminal is stuck again, the drive has been working at 100% for two
hours now, and it seems the great ZFS is deleting every block one by
one. Is this supposed to last days or a week?
Post by Steven Hartland
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the pool
from further actions.
If the delete hasnt completed and is stuck in the kernel this is
to be expected.
At this moment I don't want to imagine what will happen if I have to
delete several tens of terabytes. If the weird behaviour of the current
system can be extrapolated, then this is a no-go.
Post by Steven Hartland
Post by O. Hartmann
This is painful. Last time I faced the problem, I had to reboot
prior to take any action regarding any pool in the system, since
one single ZFS command could obviously block the whole subsystem (I
tried to export and import).
What is up here?
Regards
Steve
Regards,
Oliver
Allan Jude
2014-01-03 16:21:24 UTC
Permalink
Post by O. Hartmann
On Fri, 3 Jan 2014 14:38:03 -0000
Post by Steven Hartland
----- Original Message -----
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto a 3TB
disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56
CET 2013 amd64. Filesystem in question is a single ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire
command and even worse - it seems to wind up the pool in question
for being exported!
I cant think of any reason why backgrounding a shell would export a pool.
I sent the job "rm" into background and I didn't say that implies an
export of the pool!
I said that the pool can not be exported once the bg-command has been
issued.
Post by Steven Hartland
Post by O. Hartmann
I expect to get the command into the background as every other UNIX
command does when sending Ctrl-Z in the console. Obviously, ZFS
related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
Thats not backgrounded yet otherwise it wouldnt be in the state STOP.
As I said - the job never backgrounded, locked up the terminal and
makes the whole pool inresponsive.
Post by Steven Hartland
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for
further maintainance
Are you sure the delete is complete? Also don't forget ZFS has TRIM by
default, so depending on support of the underlying devices you could
be seeing deletes occuring.
Quite sure it didn't! It takes hours (~ 8 now) and the drive is still
working, although I tried to stop.
Post by Steven Hartland
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
Shortly after issuing the command
zpool import POOL00
the terminal is stuck again, the drive is working at 100% for two
hours now and it seems the great ZFS is deleting every block per pedes.
Is this supposed to last days or a week?
Post by Steven Hartland
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the pool
from further actions.
If the delete hasnt completed and is stuck in the kernel this is
to be expected.
At this moment I will not imagine myself what will happen if I have to
delete several deka terabytes. If the weird behaviour of the current
system can be extrapolated, then this is a no-go.
Post by Steven Hartland
Post by O. Hartmann
This is painful. Last time I faced the problem, I had to reboot
prior to take any action regarding any pool in the system, since
one single ZFS command could obviously block the whole subsystem (I
tried to export and import).
What is up here?
Regards
Steve
Regards,
Oliver
Deleting large amounts of data with 'rm' is slow. For destroying a
dataset, ZFS grew a feature flag, async_destroy, that lets this happen
in the background and avoids a lot of these issues. An async_delete
might be something to consider some day.
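
(For reference, on a feature-flag-aware system you can usually check
whether a given pool already has that feature enabled with something
like:

  zpool get feature@async_destroy poolname

where "poolname" stands in for the real pool; it should report
enabled or active rather than disabled.)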
--
Allan Jude
Steven Hartland
2014-01-03 17:04:00 UTC
Permalink
----- Original Message -----
Post by O. Hartmann
On Fri, 3 Jan 2014 14:38:03 -0000
Post by Steven Hartland
----- Original Message -----
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto a 3TB
disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56
CET 2013 amd64. Filesystem in question is a single ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire
command and even worse - it seems to wind up the pool in question
for being exported!
I cant think of any reason why backgrounding a shell would export a pool.
I sent the job "rm" into background and I didn't say that implies an
export of the pool!
I said that the pool can not be exported once the bg-command has been
issued.
Sorry, I'm confused then, as you said "locks up the entire command and
even worse - it seems to wind up the pool in question for being exported!"

Which to me read like you were saying the pool ended up being exported.
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
I expect to get the command into the background as every other UNIX
command does when sending Ctrl-Z in the console. Obviously, ZFS
related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
Thats not backgrounded yet otherwise it wouldnt be in the state STOP.
As I said - the job never backgrounded, locked up the terminal and
makes the whole pool inresponsive.
Have you tried sending a continue signal to the process?
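Something along the lines of the following should do it, assuming 17790
is still the PID shown in your top output:

  kill -CONT 17790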
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for
further maintainance
Are you sure the delete is complete? Also don't forget ZFS has TRIM by
default, so depending on support of the underlying devices you could
be seeing deletes occuring.
Quite sure it didn't! It takes hours (~ 8 now) and the drive is still
working, although I tried to stop.
Deleting a file shouldn't take 8 hours, but you don't say how large
the file actually is?
Post by O. Hartmann
Post by Steven Hartland
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
Sorry, you seem to be stating conflicting things:
1. The delete hasn't finished
2. The pool export hung
3. You have exported the pool

What exactly is gstat -d reporting? Can you paste the output, please?
Post by O. Hartmann
Shortly after issuing the command
zpool import POOL00
the terminal is stuck again, the drive is working at 100% for two
hours now and it seems the great ZFS is deleting every block per pedes.
Is this supposed to last days or a week?
What controller and what drive?

What does the following report:
sysctl kstat.zfs.misc.zio_trim
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the pool
from further actions.
If the delete hasnt completed and is stuck in the kernel this is
to be expected.
At this moment I will not imagine myself what will happen if I have to
delete several deka terabytes. If the weird behaviour of the current
system can be extrapolated, then this is a no-go.
As I'm sure you'll appreciate, that depends on whether the file is
simply being unlinked or each sector is being erased; the answers to
the above questions should help determine that :)

Regards
Steve

O. Hartmann
2014-01-04 08:56:50 UTC
Permalink
On Fri, 3 Jan 2014 17:04:00 -0000
Post by Steven Hartland
----- Original Message -----
Post by O. Hartmann
On Fri, 3 Jan 2014 14:38:03 -0000
Post by Steven Hartland
----- Original Message -----
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto
a 3TB disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20
22:43:56 CET 2013 amd64. Filesystem in question is a single ZFS
pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background
via fg" (I use FreeBSD's csh in that console) locks up the
entire command and even worse - it seems to wind up the pool in
question for being exported!
I cant think of any reason why backgrounding a shell would export a pool.
I sent the job "rm" into background and I didn't say that implies an
export of the pool!
I said that the pool can not be exported once the bg-command has
been issued.
Sorry Im confused then as you said "locks up the entire command and
even worse - it seems to wind up the pool in question for being
exported!"
Which to me read like you where saying the pool ended up being
exported.
I'm not a native English speaker. My intention was, to put it briefly:

Remove the dummy file. Having issued the command in the foreground of
the terminal, I decided a second after hitting return to send it to the
background by suspending the rm command and then issuing "bg".
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
I expect to get the command into the background as every other
UNIX command does when sending Ctrl-Z in the console.
Obviously, ZFS related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
Thats not backgrounded yet otherwise it wouldnt be in the state STOP.
As I said - the job never backgrounded, locked up the terminal and
makes the whole pool inresponsive.
Have you tried sending a continue signal to the process?
No, not intentionally. Since the operation started to slow down the
whole box and seemed to influence nearly every ZFS pool operation I
attempted (zpool status, zpool import of the faulty pool, zpool export),
I rebooted the machine.

After the reboot, when ZFS came up, the drive started working like
crazy again and the system stalled while recognizing the ZFS pools.
I then did a hard reset, restarted in single-user mode, exported the
pool successfully, and rebooted. But the moment I did a zpool import
POOL, the heavy activity continued.
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for
further maintainance
Are you sure the delete is complete? Also don't forget ZFS has
TRIM by default, so depending on support of the underlying
devices you could be seeing deletes occuring.
Quite sure it didn't! It takes hours (~ 8 now) and the drive is
still working, although I tried to stop.
A delete of a file shouldn't take 8 hours, but you dont say how large
the file actually is?
The drive has a capacity of ~2.7 TiB (Western Digital 3TB drive). The
file I created was, do not laugh, please, 2.7 TB :-( I guess, given the
COW technique and what I read about ZFS in this thread and others, this
seems to be the culprit. There is no space left to delete the file
safely.

By the way - the box is still working at 100% on that drive :-( That's
now > 12 hours.
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
1. The delete hasnt finished
2. The pool export hung
3. You have exported the pool
Not conflicting, but in my non-expert terminology not quite as accurate
and precise as you may expect.

ad item 1) I terminated the copy command (by the brute force of the
mighty RESET button). As far as I can see, it hasn't finished its
operation on the pool, but what is in progress now might be a kind of
recovery mechanism, not the rm command anymore.

ad 2) Yes, first it hung, then I reset the box, then did the export in
single-user mode to avoid further interaction, then I tried to import
the pool again ...
ad 3) Yes, successfully after the reset; now I have imported the pool,
and the terminal in which I issued the command is stuck again while
the pool is under heavy load.
Post by Steven Hartland
What exactly is gstat -d reporting, can you paste the output please.
I think it is boring to look at 100% activity, but here it is ;-)


dT: 1.047s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada2
10 114 114 455 85.3 0 0 0.0 0 0 0.0 100.0| ada3
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada4
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| cd0
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p2
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p3
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p4
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p5
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p6
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p7
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p8
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p9
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p10
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p11
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p12
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p13
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0p14
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/boot
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gptid/c130298b-046a-11e0-b2d6-001d60a6fa74
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/root
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/swap
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gptid/fa3f37b1-046a-11e0-b2d6-001d60a6fa74
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/var
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/var.tmp
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/usr
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/usr.src
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/usr.obj
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/usr.ports
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/data
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/compat
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/var.mail
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| gpt/usr.local
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada1p1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada2p1
10 114 114 455 85.3 0 0 0.0 0 0 0.0 100.0| ada3p1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada4p1
Post by Steven Hartland
Post by O. Hartmann
Shortly after issuing the command
zpool import POOL00
the terminal is stuck again, the drive is working at 100% for two
hours now and it seems the great ZFS is deleting every block per
pedes. Is this supposed to last days or a week?
What controller and what drive?
Hardware is as follows:
CPU: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (3201.89-MHz K8-class CPU)
real memory = 34359738368 (32768 MB)
avail memory = 33252507648 (31712 MB)
ahci1: <Intel Patsburg AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb520000-0xfb5207ff irq 20 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahciem0: <AHCI enclosure management bridge> on ahci1
Post by Steven Hartland
sysctl kstat.zfs.misc.zio_trim
sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.unsupported: 507
kstat.zfs.misc.zio_trim.failed: 0
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the
pool from further actions.
If the delete hasnt completed and is stuck in the kernel this is
to be expected.
At this moment I will not imagine myself what will happen if I have
to delete several deka terabytes. If the weird behaviour of the
current system can be extrapolated, then this is a no-go.
As I'm sure you'll appreciate that depends if the file is simply being
unlinked or if each sector is being erased, the answers to the above
questions should help determine that :)
You're correct in that. But sometimes I'd appreciate having the choice.
Post by Steven Hartland
Regards
Steve
Regards,

Oliver
Steven Hartland
2014-01-04 17:13:04 UTC
Permalink
Post by O. Hartmann
On Fri, 3 Jan 2014 17:04:00 -0000
..
Post by O. Hartmann
Post by Steven Hartland
Sorry Im confused then as you said "locks up the entire command and
even worse - it seems to wind up the pool in question for being
exported!"
Which to me read like you where saying the pool ended up being
exported.
renove the dummy file. While having issued the command in the
foreground of the terminal, I decided a second later after hitting
return, to send it in the background via suspending the rm-command and
issuing "bg" then.
Ahh thanks for explaining :)
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
I expect to get the command into the background as every other
UNIX command does when sending Ctrl-Z in the console.
Obviously, ZFS related stuff in FreeBSD doesn't comply.
The file has been removed from the pool but the console is still
top
17790 root 1 20 0 8228K 1788K STOP 10 0:05
0.00% rm
for the particular "rm" command issued.
Thats not backgrounded yet otherwise it wouldnt be in the state STOP.
As I said - the job never backgrounded, locked up the terminal and
makes the whole pool inresponsive.
Have you tried sending a continue signal to the process?
No, not by intention. Since the operation started to slow down the
whole box and seemed to influence nearly every operation with ZFS pools
I intended (zpool status, zpool import the faulty pool, zpool export) I
rebootet the machine.
After the reboot, when ZFS came up, the drive started working like
crazy again and the system stopped while in recognizing the ZFS pools.
I did then a hard reset and restarted in single user mode, exported the
pool successfully, and rebooted. But the moment I did an zpool import
POOL, the heavy working continued.
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
Now, having the file deleted, I'd like to export the pool for
further maintainance
Are you sure the delete is complete? Also don't forget ZFS has
TRIM by default, so depending on support of the underlying
devices you could be seeing deletes occuring.
Quite sure it didn't! It takes hours (~ 8 now) and the drive is
still working, although I tried to stop.
A delete of a file shouldn't take 8 hours, but you dont say how large
the file actually is?
The drive has a capacity of ~ 2,7 TiB (Western Digital 3TB drive). The
file I created was, do not laugh, please, 2,7 TB :-( I guess depending
on COW technique and what I read about ZFS accordingly to this thread
and others, this seems to be the culprit. There is no space left to
delete the file savely.
By the way - the box is still working on 100% on that drive :-( That's
now > 12 hours.
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
1. The delete hasnt finished
2. The pool export hung
3. You have exported the pool
Not conflicting, but in my non-expert terminology not quite accurate
and precise as you may expect.
ad item 1) I terminated (by the brute force of the mighty RESET button)
the copy command. It hasn't finished the operation on the pool as I can
see, but it might be a kind of recovery mechanism in progress now, not
the rm-command anymore.
ad 2) Yes, first it hung, then I reset the box, then in single user
mode the export to avoid further interaction, then I tried to import
the pool again ...
ad 3) yes, successfully after the reset, now I imported the pool and
the terminal, in which I issued the command is still stuck again while
the pool is under heavy load.
Post by Steven Hartland
What exactly is gstat -d reporting, can you paste the output please.
I think this is boring looking at 100% activity, but here it is ;-)
dT: 1.047s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada0
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada2
10 114 114 455 85.3 0 0 0.0 0 0 0.0 100.0| ada3
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada4
...
10 114 114 455 85.3 0 0 0.0 0 0 0.0 100.0| ada3p1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| ada4p1
Post by Steven Hartland
Post by O. Hartmann
Shortly after issuing the command
zpool import POOL00
the terminal is stuck again, the drive is working at 100% for two
hours now and it seems the great ZFS is deleting every block per
pedes. Is this supposed to last days or a week?
What controller and what drive?
real memory = 34359738368 (32768 MB)
avail memory = 33252507648 (31712 MB)
ahci1: <Intel Patsburg AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem
0xfb520000-0xfb5207ff irq 20 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahciem0: <AHCI enclosure management bridge> on ahci1
Post by Steven Hartland
sysctl kstat.zfs.misc.zio_trim
sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.unsupported: 507
kstat.zfs.misc.zio_trim.failed: 0
Thanks, that confirms it's not processing deletes at the disk level.
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
but that doesn't work with
zpool export -f poolname
This command is now also stuck blocking the terminal and the
pool from further actions.
If the delete hasnt completed and is stuck in the kernel this is
to be expected.
At this moment I will not imagine myself what will happen if I have
to delete several deka terabytes. If the weird behaviour of the
current system can be extrapolated, then this is a no-go.
As I'm sure you'll appreciate that depends if the file is simply being
unlinked or if each sector is being erased, the answers to the above
questions should help determine that :)
You're correct in that. But sometimes I'd like to appreciate to have the choice.
The space has to be accounted for, so that's likely what's going on.

I can't say I've deleted such a big single file, not to mention one which
totally fills the disk. A good read would be:
http://blog.delphix.com/matt/2012/07/11/performance-of-zfs-destroy/

Regards
Steve


Ian Lepore
2014-01-03 17:41:53 UTC
Permalink
Post by O. Hartmann
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire
command and even worse - it seems to wind up the pool in question
for being exported!
It's probably just a typo in your email, but "^Z fg" suspends the
process then resumes it in the foreground; I suspect you meant "^Z bg".
Also, at the point you would hit ^Z, it might be handy to hit ^T and see
what the process is actually doing.
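In csh the sequence would look roughly like this (a sketch, not a
transcript from the affected box):

  # rm dumpfile.txt
  ^T        (while rm is still in the foreground: sends SIGINFO and
             prints a status line showing what rm is blocked on)
  ^Z        (suspends rm; the shell prints "Suspended")
  # bg      (resumes rm in the background)

Note that ^T only reports on the current foreground process, so it has
to be pressed before suspending rm.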

-- Ian
Dan Nelson
2014-01-03 18:16:22 UTC
Permalink
Post by O. Hartmann
Post by Steven Hartland
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto a 3TB
disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20 22:43:56 CET
2013 amd64. Filesystem in question is a single ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background via
fg" (I use FreeBSD's csh in that console) locks up the entire command
and even worse - it seems to wind up the pool in question for being
exported!
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
Did you happen to have enabled deduplication on the filesystem in question?
That's the only thing I can think of that would make file deletions run
slowly. I have deleted files up to 10GB on regular filesystems with no
noticeable delay at the command line. If you have deduplication enabled,
however, each block's hash has to be looked up in the dedup table, and if
you don't have enough RAM for it to be loaded completely into memory, that
will be very, very slow :)

There are varying recommendations on how much RAM you need for a given pool
size, since the DDT has to hold an entry for each block written, and
blocksize depends on whether you wrote your files sequentially (128K blocks)
or randomly (8k or smaller). Each DDT entry takes 320 bytes of RAM, so a
full 3TB ZFS pool would need at minimum 320*(3TB/128K) ~= 7GB of RAM to hold
the DDT, and much more than that if your average blocksize is less than 128K.

So, if your system has less than 8GB of RAM in it, there's no way the DDT
will be able to stay in memory, so you're probably going to have to do at
least one disk seek (probably more, since you're writing to the DDT as well)
per block in the file you're deleting. You should probably have 16GB or
more RAM, and use an SSD as an L2ARC device as well.
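
(Spelling that estimate out: 3 TB / 128 KB is roughly 23 million blocks,
and 23 million x 320 bytes is roughly 7 GB of DDT; with an 8 KB average
block size the same pool would need about 16 times that, well over 100 GB.)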
--
Dan Nelson
***@allantgroup.com
O. Hartmann
2014-01-03 19:25:35 UTC
Permalink
On Fri, 3 Jan 2014 12:16:22 -0600
Post by Dan Nelson
On Fri, 3 Jan 2014 14:38:03 -0000 "Steven Hartland"
Post by Steven Hartland
Post by O. Hartmann
For some security reasons, I dumped via "dd" a large file onto
a 3TB disk. The systems is 11.0-CURRENT #1 r259667: Fri Dec 20
22:43:56 CET 2013 amd64. Filesystem in question is a single
ZFS pool.
Issuing the command
"rm dumpfile.txt"
and then hitting Ctrl-Z to bring the rm command into background
via fg" (I use FreeBSD's csh in that console) locks up the
entire command and even worse - it seems to wind up the pool in
question for being exported!
You can check that gstat -d
command report 100% acticity on the drive. I exported the pool in
question in single user mode and now try to import it back while in
miltiuser mode.
Did you happen to have enabled deduplication on the filesystem in
question? That's the only thing I can think of that would make file
deletions run slow. I have deleted files up to 10GB on regular
filesystems with no noticable delay at the commandline. If you have
deduplication enabled, however, each block's hash has to be looked up
in the dedupe table, and if you don't have enough RAM for it to be
loaded completely into memory, that will be very very slow :)
There are varying recommendations on how much RAM you need for a
given pool size, since the DDT has to hold an entry for each block
written, and blocksize depends on whether you wrote your files
sequentially (128K blocks) or randomly (8k or smaller). Each DDT
entry takes 320 bytes of RAM, so a full 3TB ZFS pool would need at
minimum 320*(3TB/128K) ~= 7GB of RAM to hold the DDT, and much more
than that if your averge blocksize is less than 128K.
So, if your system has less than 8GB of RAM in it, there's no way the
DDT will be able to stay in memory, so you're probably going to have
to do at least one disk seek (probably more, since you're writing to
the DDT as well) per block in the file you're deleting. You should
probably have 16GB or more RAM, and use an SSD as a L2ARC device as
well.
Thanks for the explanation.

The box in question has 32GB RAM.

I wrote a single file, 2.72 TB in size, to the pool, which I then tried
to remove via "rm".

Dedup seems to be off according to this information:

[~] zfs get all BACKUP00
NAME PROPERTY VALUE SOURCE
BACKUP00 type filesystem -
BACKUP00 creation Fr Dez 20 23:14 2013 -
BACKUP00 used 2.53T -
BACKUP00 available 147G -
BACKUP00 referenced 144K -
BACKUP00 compressratio 1.00x -
BACKUP00 mounted yes -
BACKUP00 quota none default
BACKUP00 reservation none default
BACKUP00 recordsize 128K default
BACKUP00 mountpoint /BACKUP00 default
BACKUP00 sharenfs off default
BACKUP00 checksum on default
BACKUP00 compression off default
BACKUP00 atime on default
BACKUP00 devices on default
BACKUP00 exec on default
BACKUP00 setuid on default
BACKUP00 readonly off default
BACKUP00 jailed off default
BACKUP00 snapdir hidden default
BACKUP00 aclmode discard default
BACKUP00 aclinherit restricted default
BACKUP00 canmount on default
BACKUP00 xattr off temporary
BACKUP00 copies 1 default
BACKUP00 version 5 -
BACKUP00 utf8only off -
BACKUP00 normalization none -
BACKUP00 casesensitivity sensitive -
BACKUP00 vscan off default
BACKUP00 nbmand off default
BACKUP00 sharesmb off default
BACKUP00 refquota none default
BACKUP00 refreservation none default
BACKUP00 primarycache all default
BACKUP00 secondarycache all default
BACKUP00 usedbysnapshots 0 -
BACKUP00 usedbydataset 144K -
BACKUP00 usedbychildren 2.53T -
BACKUP00 usedbyrefreservation 0 -
BACKUP00 logbias latency default
BACKUP00 dedup off default
BACKUP00 mlslabel -
BACKUP00 sync standard default
BACKUP00 refcompressratio 1.00x -
BACKUP00 written 144K -
BACKUP00 logicalused 2.52T -
BACKUP00 logicalreferenced 43.5K -


Funny, the disk is supposed to be "empty" ... but is marked as used by
2.5 TB ...
Peter Jeremy
2014-01-04 22:10:04 UTC
Permalink
Post by O. Hartmann
[~] zfs get all BACKUP00
NAME PROPERTY VALUE SOURCE
...
Post by O. Hartmann
BACKUP00 usedbysnapshots 0 -
BACKUP00 usedbydataset 144K -
BACKUP00 usedbychildren 2.53T -
BACKUP00 usedbyrefreservation 0 -
Funny, the disk is supposed to be "empty" ... but is marked as used by
2.5 TB ...
That says there's another filesystem inside BACKUP00 which has 2.5TB used.

What are the results of:
zpool status -v BACKUP00
zfs list -r BACKUP00
--
Peter Jeremy
O. Hartmann
2014-01-04 22:21:47 UTC
Permalink
On Sun, 5 Jan 2014 09:10:04 +1100
On 2014-Jan-03 20:25:35 +0100, "O. Hartmann"
Post by O. Hartmann
[~] zfs get all BACKUP00
NAME PROPERTY VALUE SOURCE
...
Post by O. Hartmann
BACKUP00 usedbysnapshots 0 -
BACKUP00 usedbydataset 144K -
BACKUP00 usedbychildren 2.53T -
BACKUP00 usedbyrefreservation 0 -
Funny, the disk is supposed to be "empty" ... but is marked as used
by 2.5 TB ...
That says there's another filesystem inside BACKUP00 which has 2.5TB used.
zpool status -v BACKUP00
zfs list -r BACKUP00
Nothing - the drive is still operating on something (as reported);
every zfs-related command makes the terminal stuck ...
O. Hartmann
2014-01-04 22:26:42 UTC
Permalink
On Sun, 5 Jan 2014 09:10:04 +1100
On 2014-Jan-03 20:25:35 +0100, "O. Hartmann"
Post by O. Hartmann
[~] zfs get all BACKUP00
NAME PROPERTY VALUE SOURCE
...
Post by O. Hartmann
BACKUP00 usedbysnapshots 0 -
BACKUP00 usedbydataset 144K -
BACKUP00 usedbychildren 2.53T -
BACKUP00 usedbyrefreservation 0 -
Funny, the disk is supposed to be "empty" ... but is marked as used
by 2.5 TB ...
That says there's another filesystem inside BACKUP00 which has 2.5TB used.
zpool status -v BACKUP00
zfs list -r BACKUP00
No, not stuck, came back after a while:


zpool status -v BACKUP00
pool: BACKUP00
state: ONLINE
status: Some supported features are not enabled on the pool. The pool
can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not
support the features. See zpool-features(7) for details.
scan: none requested
config:

NAME STATE READ WRITE CKSUM
BACKUP00 ONLINE 0 0 0
ada3p1 ONLINE 0 0 0

errors: No known data errors

[...]

zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Peter Jeremy
2014-01-04 23:14:26 UTC
Permalink
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from 2.5T
to 1.47T used (though I gather that has taken several days). Can you
please post the result of
zfs get all BACKUP00/backup
--
Peter Jeremy
O. Hartmann
2014-01-05 08:11:38 UTC
Permalink
On Sun, 5 Jan 2014 10:14:26 +1100
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from 2.5T
to 1.47T used (though I gather that has taken several days). Can you
pleas post the result of
zfs get all BACKUP00/backup
Here we go:


NAME PROPERTY VALUE SOURCE
BACKUP00/backup type filesystem -
BACKUP00/backup creation Fr Dez 20 23:17 2013 -
BACKUP00/backup used 1.47T -
BACKUP00/backup available 1.19T -
BACKUP00/backup referenced 1.47T -
BACKUP00/backup compressratio 1.00x -
BACKUP00/backup mounted no -
BACKUP00/backup quota none default
BACKUP00/backup reservation none default
BACKUP00/backup recordsize 128K default
BACKUP00/backup mountpoint /backup local
BACKUP00/backup sharenfs off default
BACKUP00/backup checksum sha256 local
BACKUP00/backup compression lz4 local
BACKUP00/backup atime on default
BACKUP00/backup devices on default
BACKUP00/backup exec on default
BACKUP00/backup setuid on default
BACKUP00/backup readonly off default
BACKUP00/backup jailed off default
BACKUP00/backup snapdir hidden default
BACKUP00/backup aclmode discard default
BACKUP00/backup aclinherit restricted default
BACKUP00/backup canmount on default
BACKUP00/backup xattr on default
BACKUP00/backup copies 1 default
BACKUP00/backup version 5 -
BACKUP00/backup utf8only off -
BACKUP00/backup normalization none -
BACKUP00/backup casesensitivity sensitive -
BACKUP00/backup vscan off default
BACKUP00/backup nbmand off default
BACKUP00/backup sharesmb on local
BACKUP00/backup refquota none default
BACKUP00/backup refreservation none default
BACKUP00/backup primarycache all default
BACKUP00/backup secondarycache all default
BACKUP00/backup usedbysnapshots 0 -
BACKUP00/backup usedbydataset 1.47T -
BACKUP00/backup usedbychildren 0 -
BACKUP00/backup usedbyrefreservation 0 -
BACKUP00/backup logbias latency default
BACKUP00/backup dedup on local
BACKUP00/backup mlslabel -
BACKUP00/backup sync standard default
BACKUP00/backup refcompressratio 1.00x -
BACKUP00/backup written 1.47T -
BACKUP00/backup logicalused 1.47T -
BACKUP00/backup logicalreferenced 1.47T -
Peter Jeremy
2014-01-05 08:30:39 UTC
Permalink
Post by O. Hartmann
On Sun, 5 Jan 2014 10:14:26 +1100
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from 2.5T
to 1.47T used (though I gather that has taken several days). Can you
pleas post the result of
zfs get all BACKUP00/backup
BACKUP00/backup dedup on local
This is your problem. Before it can free any block, it has to check
for other references to the block via the DDT and I suspect you don't
have enough RAM to cache the DDT.

Your options are:
1) Wait until the delete finishes.
2) Destroy the pool with extreme prejudice: forcibly export the pool
   (probably by booting to single-user mode and not starting ZFS) and
   write zeroes to the first and last MB of ada3p1.

BTW, this problem will occur on any filesystem where you've ever
enabled dedup - once there are any dedup'd blocks in a filesystem,
all deletes need to go via the DDT.
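
If you go the destructive route, a rough sketch of zeroing those label
areas would be something like the following (double-check the device
name first; this assumes an sh-compatible shell and that diskinfo's
third field is the partition size in bytes):

  # zero the first MB of the partition
  dd if=/dev/zero of=/dev/ada3p1 bs=1m count=1
  # zero the last MB, seeking to one MB before the end
  dd if=/dev/zero of=/dev/ada3p1 bs=1m count=1 \
     oseek=$(( $(diskinfo ada3p1 | awk '{print $3}') / 1048576 - 1 ))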
--
Peter Jeremy
O. Hartmann
2014-01-05 08:46:07 UTC
Permalink
On Sun, 5 Jan 2014 19:30:39 +1100
On 2014-Jan-05 09:11:38 +0100, "O. Hartmann"
Post by O. Hartmann
On Sun, 5 Jan 2014 10:14:26 +1100
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from
2.5T to 1.47T used (though I gather that has taken several days).
Can you pleas post the result of
zfs get all BACKUP00/backup
BACKUP00/backup dedup on local
This is your problem. Before it can free any block, it has to check
for other references to the block via the DDT and I suspect you don't
have enough RAM to cache the DDT.
1) Wait until the delete finishes.
2) Destroy the pool with extreme prejudice: Forcably export the pool
(probably by booting to single user and not starting ZFS) and write
zeroes to the first and last MB of ada3p1.
BTW, this problem will occur on any filesystem where you've ever
enabled dedup - once there are any dedup'd blocks in a filesystem,
all deletes need to go via the DDT.
As I stated earlier in this thread, the box in question has 32 GB
RAM and this should be sufficient.
Adam Vande More
2014-01-05 12:43:18 UTC
Permalink
Post by O. Hartmann
On Sun, 5 Jan 2014 10:14:26 +1100
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from 2.5T
to 1.47T used (though I gather that has taken several days). Can you
pleas post the result of
zfs get all BACKUP00/backup
NAME PROPERTY VALUE SOURCE
BACKUP00/backup type filesystem -
BACKUP00/backup creation Fr Dez 20 23:17 2013 -
BACKUP00/backup used 1.47T -
BACKUP00/backup available 1.19T -
BACKUP00/backup referenced 1.47T -
BACKUP00/backup compressratio 1.00x -
BACKUP00/backup mounted no -
BACKUP00/backup quota none default
BACKUP00/backup reservation none default
BACKUP00/backup recordsize 128K default
BACKUP00/backup mountpoint /backup local
BACKUP00/backup sharenfs off default
BACKUP00/backup checksum sha256 local
BACKUP00/backup compression lz4 local
BACKUP00/backup atime on default
BACKUP00/backup devices on default
BACKUP00/backup exec on default
BACKUP00/backup setuid on default
BACKUP00/backup readonly off default
BACKUP00/backup jailed off default
BACKUP00/backup snapdir hidden default
BACKUP00/backup aclmode discard default
BACKUP00/backup aclinherit restricted default
BACKUP00/backup canmount on default
BACKUP00/backup xattr on default
BACKUP00/backup copies 1 default
BACKUP00/backup version 5 -
BACKUP00/backup utf8only off -
BACKUP00/backup normalization none -
BACKUP00/backup casesensitivity sensitive -
BACKUP00/backup vscan off default
BACKUP00/backup nbmand off default
BACKUP00/backup sharesmb on local
BACKUP00/backup refquota none default
BACKUP00/backup refreservation none default
BACKUP00/backup primarycache all default
BACKUP00/backup secondarycache all default
BACKUP00/backup usedbysnapshots 0 -
BACKUP00/backup usedbydataset 1.47T -
BACKUP00/backup usedbychildren 0 -
BACKUP00/backup usedbyrefreservation 0 -
BACKUP00/backup logbias latency default
BACKUP00/backup dedup on local
As already described by Dan and perhaps not followed up on: dedup
requires a very large amount of memory. Assuming 32GB is sufficient is
most likely wrong.

What does zdb -S BACKUP00 say?

Also, I will note you were asked if the ZFS FS in question had dedup
enabled. You replied with output from the wrong FS.
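For what it's worth, checking the property recursively would have
caught that in one go, e.g.:

  zfs get -r dedup BACKUP00

which lists the dedup setting for the pool root and every child
dataset, so BACKUP00/backup showing "on" would stand out immediately.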
--
Adam
O. Hartmann
2014-01-05 15:41:10 UTC
Permalink
On Sun, 5 Jan 2014 06:43:18 -0600
On Sun, Jan 5, 2014 at 2:11 AM, O. Hartmann
Post by O. Hartmann
On Sun, 5 Jan 2014 10:14:26 +1100
On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
Post by Peter Jeremy
zfs list -r BACKUP00
NAME USED AVAIL REFER MOUNTPOINT
BACKUP00 1.48T 1.19T 144K /BACKUP00
BACKUP00/backup 1.47T 1.19T 1.47T /backup
Well, that at least shows it's making progress - it's gone from
2.5T to 1.47T used (though I gather that has taken several
days). Can you pleas post the result of
zfs get all BACKUP00/backup
NAME PROPERTY VALUE SOURCE
BACKUP00/backup type filesystem -
BACKUP00/backup creation Fr Dez 20 23:17 2013 -
BACKUP00/backup used 1.47T -
BACKUP00/backup available 1.19T -
BACKUP00/backup referenced 1.47T -
BACKUP00/backup compressratio 1.00x -
BACKUP00/backup mounted no -
BACKUP00/backup quota none default
BACKUP00/backup reservation none default
BACKUP00/backup recordsize 128K default
BACKUP00/backup mountpoint /backup local
BACKUP00/backup sharenfs off default
BACKUP00/backup checksum sha256 local
BACKUP00/backup compression lz4 local
BACKUP00/backup atime on default
BACKUP00/backup devices on default
BACKUP00/backup exec on default
BACKUP00/backup setuid on default
BACKUP00/backup readonly off default
BACKUP00/backup jailed off default
BACKUP00/backup snapdir hidden default
BACKUP00/backup aclmode discard default
BACKUP00/backup aclinherit restricted default
BACKUP00/backup canmount on default
BACKUP00/backup xattr on default
BACKUP00/backup copies 1 default
BACKUP00/backup version 5 -
BACKUP00/backup utf8only off -
BACKUP00/backup normalization none -
BACKUP00/backup casesensitivity sensitive -
BACKUP00/backup vscan off default
BACKUP00/backup nbmand off default
BACKUP00/backup sharesmb on local
BACKUP00/backup refquota none default
BACKUP00/backup refreservation none default
BACKUP00/backup primarycache all default
BACKUP00/backup secondarycache all default
BACKUP00/backup usedbysnapshots 0 -
BACKUP00/backup usedbydataset 1.47T -
BACKUP00/backup usedbychildren 0 -
BACKUP00/backup usedbyrefreservation 0 -
BACKUP00/backup logbias latency default
BACKUP00/backup dedup on local
As already described by Dan and perhaps not followed up on: dedup
requires at very large amount of memory. Assuming 32GB is sufficient
is most likely wrong.
What does zdb -S BACKUP00 say?
That command has been stuck for 2 hours by now ...
Also I will note you were asked if the ZFS FS in question had dedup
enabled. You replied with a response from an incorrect FS.
Adam Vande More
2014-01-05 17:06:09 UTC
Permalink
Post by O. Hartmann
Post by Adam Vande More
As already described by Dan and perhaps not followed up on: dedup
requires at very large amount of memory. Assuming 32GB is sufficient
is most likely wrong.
What does zdb -S BACKUP00 say?
That command is stuck for 2 hours by now ...
That is expected. It is not stuck, it is running. Its output will
indicate the minimum required for your dataset. The command will be slow
if you have a large dataset or insufficient RAM. Really slow if both.
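
(When it finally completes, zdb -S prints a simulated dedup table
histogram; a rough RAM estimate is the total block count in that table
multiplied by the ~320 bytes per DDT entry Dan mentioned earlier, e.g.
a hypothetical 20 million blocks x 320 bytes is about 6.4 GB.)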
--
Adam