Discussion:
How about this simple 4K blocksize enhancement?
ray vantassle
2012-01-09 16:56:56 UTC
I'm still trying to get a good (accurate) feel for how important it is to
have ZFS use a 4K blocksize on Advanced Format (AF) 4KB-sector disks.
However, just this weekend I built a kludged zpool with a hardcoded 4K
blocksize and rebuilt my pool. Got the idea from a patch I found a while
ago which added a "blocksize" option to zpool/create. The patch was so
simple that I've wondered why it hasn't already been pulled into zfs-fuse.
Of course, it wouldn't help unless people knew: a) that the option exists,
b) that their drive is a 4K-sector drive, and c) to specify that option
when they create the pool.
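
(For the curious: "ashift" is just the log2 of the pool's minimum
allocation unit -- 9 for 512B sectors, 12 for 4K. A toy illustration of
what the kludge amounts to, not the real patch:)
-------------
/* Toy illustration only, not the zfs-fuse patch: the hardcoded kludge
 * just pins ashift at 12 regardless of what the drive reports. */
#include <stdio.h>
#include <stdint.h>

static int sector_to_ashift(uint64_t sector_size)
{
    int shift = 0;
    while ((1ULL << shift) < sector_size)
        shift++;
    return shift;
}

int main(void)
{
    uint64_t reported = 512;                  /* what a lying AF drive claims */
    int ashift = sector_to_ashift(reported);  /* 9 */
    if (ashift < 12)
        ashift = 12;                          /* the kludge: always force 4K */
    printf("reported %llu-byte sectors -> using ashift=%d (%d bytes)\n",
           (unsigned long long)reported, ashift, 1 << ashift);
    return 0;
}
-------------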

There's been much discussion about drives lying and reporting 512-byte
sectors when they really have 4K sectors, about whitelists to override
what a drive reports about itself, etc. So the automatic approach is of
limited benefit until the drives stop lying.

Anyway.....I thought of a simple thing to do that will probably work okay
most of the time, and even when it is wrong it doesn't harm anything.

For the interim, why not just have zpool/create assume that it should use
a blocksize of 4KB (ashift=12) for any drive that is 2TB or larger? The
downside is that file allocations are rounded up to 4K instead of 512B, so
there's slightly more wasted space, but so what? A 2TB+ drive has so much
capacity that the wasted space is trivial.
FWIW, Windows uses a default cluster size of 4KB for NTFS regardless of
the disk size, and 4KB for FAT16/32 on any disk over 256MB -- which really
means that Windows essentially *always* uses 4KB. Ext4 also seems to
default to a 4K blocksize.

So why not just make an enhancement to zpool/create and set the default
blocksize to 4KB for anything over 1TB (or wherever the line should be)?
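
The whole heuristic is only a couple of lines of code. A minimal
standalone sketch -- the constant names and the exact cutoff are
placeholders of mine, not from any existing patch:
-------------
#include <stdio.h>
#include <stdint.h>

#define ASHIFT_512B   9                /* legacy 512-byte sectors */
#define ASHIFT_4K     12               /* Advanced Format 4K      */
#define BIG_DISK_SIZE (1ULL << 40)     /* 1TB cutoff -- debatable */

/* Pick a default ashift from the device size alone, ignoring the
 * (possibly lying) sector size the drive reports. */
static int default_ashift(uint64_t dev_bytes)
{
    return (dev_bytes >= BIG_DISK_SIZE) ? ASHIFT_4K : ASHIFT_512B;
}

int main(void)
{
    uint64_t sizes[] = { 750ULL << 30, 2048ULL << 30 };  /* ~750GB, ~2TB */
    for (int i = 0; i < 2; i++)
        printf("%llu bytes -> default ashift=%d\n",
               (unsigned long long)sizes[i], default_ashift(sizes[i]));
    return 0;
}
-------------
Either way, a wrong guess costs only a little slack space or some speed,
never correctness.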

***************************************

Another question: Is there any active development work being done on
zfs-fuse? I don't see much activity.
sgheeren
2012-01-09 20:06:35 UTC
Post by ray vantassle
There's been much discussion about drives lying and reporting 512-byte
sectors when they really have 4K sectors, about whitelists to override
what a drive reports about itself, etc. So the automatic approach is of
limited benefit until the drives stop lying.
Did you see the thread/issue tracker on ZoL about this issue? I mean the
one where RuddO prototyped a profile-based optimizer? There were many
intricacies and unexpected/unpredictable performance effects, IIRC.
Anyway, I don't really have time to go re-read all of that, but in case
you haven't seen it, it will interest you.

Patches are welcome, and if you supply spare drives, I'd be happy to
test/review things for you.

Seth
sgheeren
2012-01-09 20:21:43 UTC
Post by sgheeren
and if you supply spare drives
Mmm. I had decided to drop the lame joke, but it slipped in anyway. That
wasn't serious, of course :)
ray
2012-01-09 22:07:16 UTC
...one where RuddO prototyped a profile-based optimizer...
Yes, I read the whole thing a while back, when I was first looking into
ZFS. It took me back to implementation discussions we used to have at my
job (realtime system software development) whenever somebody came up with
a complex & convoluted proposal for a simple task. There's no reason for
all the complexity. "We don't have to be perfect, we only need to be Good
Enough." As well as "K.I.S.S." and "Simple is Better".

There's no harm in defaulting to a 4KB blocksize on sufficiently large
disks. Also, if you're gonna make that change, you might as well add the
ability to specify a blocksize explicitly.
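
Roughly what I have in mind for the option side -- the option handling
and limits here are placeholders of mine, not the patch I mentioned:
-------------
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Validate a user-supplied blocksize and turn it into an ashift. */
static int blocksize_to_ashift(uint64_t bs)
{
    /* must be a power of two between 512B and 128K */
    if (bs < 512 || bs > 131072 || (bs & (bs - 1)) != 0)
        return -1;
    int shift = 0;
    while ((1ULL << shift) < bs)
        shift++;
    return shift;  /* 512 -> 9, 4096 -> 12 */
}

int main(int argc, char **argv)
{
    uint64_t bs = (argc > 1) ? strtoull(argv[1], NULL, 10) : 4096;
    int ashift = blocksize_to_ashift(bs);
    if (ashift < 0) {
        fprintf(stderr, "invalid blocksize %llu\n",
                (unsigned long long)bs);
        return 1;
    }
    printf("blocksize %llu -> ashift=%d\n", (unsigned long long)bs, ashift);
    return 0;
}
-------------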
I'll work up a patch. Just tell me how to submit it, and how to get people
to review it for inclusion. GIT? patchfile?
if you supply spare drives, I'd be happy to test/review things for
you ... lame joke

No problem. Back in the pre-Windows days I wrote a couple of shareware CPU
performance programs, and always had a message built-in for extraordinary
performance: "Your CPU is much faster than anything I have tested. Please
send it to me for further evaluation." Nobody ever took me up on it,
though.
Oh my, it's true that the internet never forgets anything. I just googled
"cachechk shareware" -- and out it popped.
sgheeren
2012-01-09 22:26:26 UTC
There's no reason for all the complexity. "We don't have to be
perfect, we only need to be Good Enough." As well as "K.I.S.S." and
"Simple is Better".
Agreed. If you can come up with a simple heuristic, and can show that
the heuristic

(a) works well for AF disks
(b) doesn't degrade significantly for 512-byte-sector disks

I'll be happy to test/accept the patch :)
GIT? patchfile?
Whatever floats your... It's not that I receive too many patches these
days. (If you're on github, let me mirror my personal repo there, so I
can learn about the pull-request workflow...: THERE, done
<https://github.com/sehe/zfs-fuse> )

I suggest you make it work against the maint or testing branches (the
matching branches mirror the official repo
<http://gitweb.zfs-fuse.net/?p=official;a=summary>)

Then again, this will most likely rebase cleanly to all current branches.
Nobody ever took me up on it, though.
Haha. Nice story :)

Regards,
Seth
ray vantassle
2012-01-11 03:20:11 UTC
Seth,
I just can't seem to understand git. Probably brain damage from all the
years we used ClearCase at work. The last couple of years I've been doing
development on the Tomato-USB router firmware, and I had to post my
changes as patches because I just couldn't get the hang of git.

So anyway....I downloaded the latest version of zfs-fuse from your link,
untarred it, and built it.

I'm having trouble with scons, too. ;-( Why, oh why can't people just
use the de-facto standard utilities -- like make -- instead of a "better"
tool that 99.9% of people have never heard of? Somehow the Linux kernel
build process manages to get by with make and file timestamps instead of
scons's MD5 checksum scheme -- and the kernel is a much more complex build
than something like zfs-fuse. ::sigh::

Anyway......I've built zfs-fuse-0.7.0 from the git tarball, and it works
on my system. BUT....Debian's version is zfs-fuse_0.7.0-3; there are a
handful of patches & bugfixes in it that aren't in your git. Google
"zfs-fuse_0.7.0-3.debian.tar.gz". What about these? Shouldn't they get
pulled into the "official" zfs-fuse?

I made my auto-blocksize change and began cursory tests on some 750GB
non-AF disks -- using partitions, not files. Creating a 1GB file runs at
about 10 MB/s, the same speed whether the blocksize is 512 or 4096
(ashift = 9 or 12). Creating the same file on an ext4 fs runs at 16 MB/s.
For testing I set the threshold at 190GB: under that is the standard
ashift=9, over that is ashift=12.
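
(Those numbers are from plain sequential writes. Something like this
crude timing sketch reproduces the measurement -- the path and sizes are
illustrative:)
-------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/pool/testfile";
    const size_t chunk = 1 << 20;  /* write in 1MB chunks */
    const int nchunks = 1024;      /* 1024 x 1MB = 1GB    */
    char *buf = malloc(chunk);
    if (buf == NULL)
        return 1;
    memset(buf, 0xA5, chunk);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return 1;
    }
    for (int i = 0; i < nchunks; i++)
        fwrite(buf, 1, chunk, f);
    fflush(f);
    fsync(fileno(f));  /* make sure the data actually hits the pool */
    fclose(f);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wrote %d MB in %.1f s -> %.1f MB/s\n", nchunks, secs,
           nchunks / secs);
    free(buf);
    return 0;
}
-------------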

Haven't yet tested raidz or mirror, just single-disk pools. Haven't yet
tested on an AF disk either; the only one I have that I can test with is
in use. To test with it I'll have to re-partition the drive -- currently
the entire 2TB is one partition dedicated to one zpool. Since zfs can't
shrink a pool, I'll have to save the data to another disk, kill the pool,
repartition the drive, and then restore the data to the new smaller pool.
Luckily I have much less than 2TB of files on it.

The change is pretty trivial -- it almost doesn't need to be tested,
since you can see whether it's right or wrong by simple inspection.

-------------
My ext4 drive is a ST3300831A.
My zpool drives are WDC_WD7501AALS (3 of these). Two have a 195GB
partition for zfs, the other has a 185GB partition.
My sole AF drive is a Samsung HD204UI 2TB disk.
Here's the info from a pool creation test run, with the debugging printouts:
igor>blockdev --getsize64 /dev/sdc3
209884469760
igor>blockdev --getsize64 /dev/sdd3
199232732160

zpool create px1 /dev/sdc3
size: 209884469760
Defaulting to 4K blocksize (ashift=12) for '/dev/sdc3'
zpool create px2 /dev/sdd3
size: 199232732160
-------------------------

What other tests do you think I should do, given the limited set of disks
available to me?
Want me to send you the patchfile?
How about the Debian patches? Want me to fix them up and send them to you,
too?

Regards,
Ray
sgheeren
2012-01-11 07:57:04 UTC
What about these? Shouldn't they get pulled into the "official"
zfs-fuse?
I'd have to see where you got that tarball, but I'm pretty sure it is
the `maint` branch already :)
Grizzly
2012-11-11 09:38:01 UTC
Is it possible now to use zfs-fuse and set the ashift to 12? On Linux Mint
Debian with 0.7.0-8 I wanted to do

zpool create -o ashift=12 xuse mirror /opt/xuse-link/xuse-f1-110g.zfs
/opt/xuse-link/xuse-f3d-110g.zfs

(I have collected all pools in a common link folder -- when I want to
replace a location I just change the symlink, and after a scrub all is
well again ;-) and got

property 'ashift' is not a valid pool property

So far I have only accomplished this:
- Start PC-BSD in a virtual machine with 4 small virtual file-based disks
- create an ashift=12 pool there
- convert the vdi's to raw image files
- import the small pool into zfs-fuse
- gradually replace the dummy files with larger ones, or with partitions
or disks, to taste.
Emmanuel Anne
2012-11-13 14:54:33 UTC
I don't understand why the patch for this was never posted... As the
original poster said, it shouldn't be hard to do looking at the code, but
now it's not easy to test.
I don't even see a way to get the ashift value from a created pool (zpool
get all has no ashift here).
But as he says, it can apparently be forced to 12 all the time without too
much trouble...
Post by Grizzly
Is it possible now to use zfs-fuse and set the ashift to 12? On Linux Mint
Debian with 0.7.0-8 I wanted to do
zpool create -o ashift=12 xuse mirror /opt/xuse-link/xuse-f1-110g.zfs
/opt/xuse-link/xuse-f3d-110g.zfs
(I have collected all pools in a common link folder -- when I want to
replace a location I just change the symlink, and after a scrub all is
well again ;-) and got
property 'ashift' is not a valid pool property
So far I have only accomplished this:
- Start PC-BSD in a virtual machine with 4 small virtual file-based disks
- create an ashift=12 pool there
- convert the vdi's to raw image files
- import the small pool into zfs-fuse
- gradually replace the dummy files with larger ones, or with partitions
or disks, to taste.
--
my zfs-fuse git repository :
http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary
ray vantassle
2012-11-23 18:37:25 UTC
Except that this won't work if you are trying to add a new device to an
existing pool where the other drives have an ashift of 9, as I discovered
later on. ;-(
So you would still need a command-line option to force ashift=9.
--
To post to this group, send email to zfs-fuse-/***@public.gmane.org
To visit our Web site, click on http://zfs-fuse.net/