silent semantic changes with reiser4

Discussion:


Christoph Hellwig

2004-08-24 20:25:21 UTC

After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems and have a slight potential to break
things even in the kernel.

o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.
- meaning of the -x permission. This one has different meanings on
directories vs files on UNIX systems. If we want to support
directories as files we'll probably have to find a way to work
around this.
- dentry aliasing. I can't find a formal guarantee in the code that this
can't happen.

o metafiles - ..metas as a magic name that's just taken out of the
namespace doesn't sound like a good idea. If we want this it should
be a VFS-level option and there should be a translation-layer to
xattrs. Not doing this will again confuse applications greatly that
expect uniform filesystem behaviour.
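
To make the O_DIRECTORY item above concrete, here is a minimal sketch of the behaviour applications rely on, assuming Linux and a conventional filesystem rather than reiser4: opening a regular file with O_DIRECTORY fails with ENOTDIR. The helper name opens_as_directory is hypothetical and only used for illustration.

```python
import errno
import os
import tempfile


def opens_as_directory(path):
    """Return True if open(path, O_RDONLY | O_DIRECTORY) succeeds."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as exc:
        if exc.errno == errno.ENOTDIR:
            # The kernel refused because the object is not a directory.
            return False
        raise
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as tmp:
    regular = os.path.join(tmp, "regular-file")
    with open(regular, "w") as f:
        f.write("example\n")
    dir_result = opens_as_directory(tmp)
    file_result = opens_as_directory(regular)
    print(dir_result, file_result)
```

If the second call succeeded, as reported for reiser4 above, a program could no longer use the flag to tell files from directories.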

Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge and, if added later, added at the proper
VFS level after discussion on linux-kernel and linux-fsdevel, like we
did for xattrs.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Lee Revell

2004-08-24 20:35:18 UTC

Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

Lee


Christoph Hellwig

2004-08-24 20:38:44 UTC

Post by Lee Revell

Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

the find I have here is using lstat and not open with O_DIRECTORY, so
no.
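
A find-style type test of the kind described here can be sketched as follows. This is only an illustration, assuming a POSIX system: the helper name is_directory_lstat is made up, and the code classifies an object purely from the st_mode that lstat() reports, without ever opening it.

```python
import os
import stat
import tempfile


def is_directory_lstat(path):
    """Classify by the inode type from lstat(); the object is never opened."""
    return stat.S_ISDIR(os.lstat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "sub")
    os.mkdir(sub)
    open(os.path.join(tmp, "plain"), "w").close()
    # lstat() does not follow symlinks, so a link to a directory is not a directory.
    os.symlink(sub, os.path.join(tmp, "link-to-sub"))
    results = {
        name: is_directory_lstat(os.path.join(tmp, name))
        for name in ("sub", "plain", "link-to-sub")
    }
    print(results)
```

Because nothing is opened, the result does not depend on how the filesystem treats O_DIRECTORY.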


Lee Revell

2004-08-24 20:42:08 UTC

Post by Christoph Hellwig

Post by Lee Revell

Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

the find I have here is using lstat and not open with O_DIRECTORY, so
no.

Ugh, how embarrassing, I completely forgot about stat().

Lee


Jamie Lokier

2004-08-24 21:18:35 UTC

Post by Christoph Hellwig

Post by Lee Revell

Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

the find I have here is using lstat and not open with O_DIRECTORY, so
no.

The find-like program I use (called treescan) uses O_DIRECTORY as an
optimisation. It assumes that O_DIRECTORY will only open objects
which are directories and can be read using readdir().

However, if reiser4 returns d_type values, then it won't even attempt
an open on non-DT_DIR objects, and that's a better optimisation.
(reiserfs doesn't return d_type values, unfortunately).

So the list of files that treescan finds depends on whether reiser4
implements d_type.
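
The d_type optimisation can be illustrated with a short sketch. This is not treescan's code; it assumes Python's os.scandir(), which uses the entry type returned by readdir() when the filesystem supplies one and falls back to a stat call otherwise. The function name walk_dirs is hypothetical.

```python
import os
import tempfile


def walk_dirs(root):
    """Yield directories under root, descending only into entries typed as directories."""
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        with os.scandir(current) as entries:
            for entry in entries:
                # No open() is attempted on anything that is not reported as a directory.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "a", "f"), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in walk_dirs(tmp))
    print(found)
```

A walker written this way never tries to enter a regular file, so what it finds depends on the reported entry type rather than on whether an O_DIRECTORY open succeeds.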

This is nothing like a POSIX filesystem. You untar a tree, and then
listing it recursively shows extra things created by reiser4.

I quite like the principle, but because it's not like POSIX and
doesn't match some programs' expectations, it's a problem in its
present form.

xattrs aren't a complete solution as you can't store structured data
in an xattr. For example, with reiser4's model, you can cd into a
.tar, .zip, .mp3 or .xml file and list the internal structure along
with the file's metadata. You can't do that with xattrs.

Programs exist which quite reasonably assume that when you create a
file, you can't opendir() the file, and recursive listings (like find,
ls -R et al.) won't automatically traverse into every file.

On the other hand, being able to enter a file in a directory-like way
allows structured representations of the contents to be accessed in
the very useful "everything's a file" way -- i.e. ordinary tools.

So here's a semantic proposal:

1. O_DIRECTORY won't open an ordinary file.
Corollary: opendir("file") won't open an ordinary file.

2. An ordinary file path followed by "/" won't open an ordinary file.
Corollary: opendir("file/") won't open an ordinary file.

This is because appending a trailing slash is an alternate
way for userspace to get the same results as O_DIRECTORY.

3. An ordinary file path followed by "/" _and_ one or more path
components will open the file as a directory and enter it.
Corollary: opendir("file/.") will open an ordinary file.

4. The type of "file/." shall be S_IFDIR, _not_ S_IFREG.
Corollary: stat("file/.") will return that it's a directory.

The intention here is that explicit requests to examine the
metadata or alternate structure representations of a file will
create such a view, but the view is only available if requested
explicitly.

When such a view is created, the results of stat(), O_DIRECTORY
and opendir() are absolutely consistent. This will minimise
confusion. Programs which recurse over a directory tree won't
look inside any of the files. However they can be explicitly
asked to recurse starting from a path inside a file: then
they'll recurse over a single file's metadata and structured data.
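
The four rules above can be written down as a toy model. This is only a sketch of the proposal as stated, not of any implementation: the function name resolve is hypothetical, the choice of ENOTDIR for the refused cases is an assumption (the proposal does not name an error code), and components other than "." are left untyped because the rules do not specify them.

```python
import errno
import stat


def resolve(path, regular_files, o_directory=False):
    """Model an open/stat of `path`, whose first component is an ordinary file.

    Returns (True, file_type) on success or (False, errno_value) on refusal.
    """
    head, slash, rest = path.partition("/")
    if head not in regular_files:
        raise ValueError("model only covers paths that start at an ordinary file")
    if not slash:
        # Rule 1: O_DIRECTORY never opens an ordinary file.
        if o_directory:
            return (False, errno.ENOTDIR)  # assumed error code
        return (True, stat.S_IFREG)
    if rest == "":
        # Rule 2: a bare trailing slash behaves like O_DIRECTORY.
        return (False, errno.ENOTDIR)  # assumed error code
    if rest == ".":
        # Rules 3 and 4: "file/." enters the file and is a directory.
        return (True, stat.S_IFDIR)
    # Rule 3: other components enter the file; their type is not specified.
    return (True, None)


files = {"file"}
print(resolve("file", files))
print(resolve("file", files, o_directory=True))
print(resolve("file/", files))
print(resolve("file/.", files, o_directory=True))
```

In this model the directory view only appears when the caller names something inside the file, which is what keeps recursive tools from descending into every file.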

Regarding the problems of safe locking in the VFS: the VFS assumes
that directories are not hard linked, i.e. that they cannot appear at
more than one path in a filesystem. Files-as-directories breaks that.

However, VFS does support directories on multiple paths, using bind mounts.

So it wouldn't be out of the question if entering a file (as described
above) effectively auto-mounted a bind mount at that point.

-- Jamie

Jeff Garzik

2004-08-24 20:38:25 UTC

Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

Ouch.

I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.

Jeff
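
The race-avoidance pattern referred to here can be sketched like this, assuming Linux. The helper name list_if_directory is hypothetical. The point is that the "is it a directory?" check is performed by open() itself, so there is no gap between a separate check and the later use in which the path could be replaced.

```python
import errno
import os
import tempfile


def list_if_directory(path):
    """Return the sorted entries of path, or None if it is not a real directory."""
    try:
        # O_DIRECTORY: fail unless the object is a directory.
        # O_NOFOLLOW: do not follow a symlink in the final component.
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    except OSError as exc:
        if exc.errno in (errno.ENOTDIR, errno.ELOOP):
            return None
        raise
    try:
        # Listing through the descriptor uses the object that was checked.
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a"), "w").close()
    listing = list_if_directory(tmp)
    not_a_dir = list_if_directory(os.path.join(tmp, "a"))
    print(listing, not_a_dir)
```

If O_DIRECTORY opens succeed on every file, the first branch is never taken and code like this silently loses its guarantee.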


v***@parcelfarce.linux.theplanet.co.uk

2004-08-24 20:53:44 UTC

Post by Jeff Garzik

Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

Ouch.
I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.

Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.

v***@parcelfarce.linux.theplanet.co.uk

2004-08-24 21:22:32 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.

While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.

Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).
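
One way to picture the link/link case is as a lock-ordering problem. The sketch below is a toy model, not kernel code. It assumes that link() locks the parent of the new name first and the source object second, and that a hybrid object can play either role. All function names are hypothetical.

```python
from itertools import combinations


def lock_order_for_link(source, target_parent):
    """Locks taken, in order, by a modelled link(source, target_parent/name)."""
    # Assumption: the parent of the new name is locked before the source.
    return (target_parent, source)


def can_deadlock(order_a, order_b):
    """True if the two operations take some pair of locks in opposite orders."""
    for first, second in combinations(order_a, 2):
        if first in order_b and second in order_b:
            if order_b.index(second) < order_b.index(first):
                return True
    return False


# Ordinary case: regular files are only ever sources, directories only parents.
plain = can_deadlock(lock_order_for_link("f", "dir"), lock_order_for_link("g", "dir"))

# Hybrid case: A and B can each be both a source and a parent.
hybrid = can_deadlock(lock_order_for_link("A", "B"), lock_order_for_link("B", "A"))
print(plain, hybrid)
```

Under these assumptions two ordinary link() calls never conflict, while link(A, B/x) racing with link(B, A/y) takes the same two locks in opposite orders.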

Hans Reiser

2004-08-25 18:28:56 UTC

I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.

It will probably take about 6 weeks. Apologies for wasting your time
before that was done.

Hans

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.

While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.
Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).

Christoph Hellwig

2004-08-25 18:45:23 UTC

Post by Hans Reiser
I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.
It will probably take about 6 weeks. Apologies for wasting your time
before that was done.

I don't think you'll get anywhere with auditing. We need to write down
the semantics you want, define them at the VFS level and make sure
they're not conflicting with defined userspace semantics or kernel
assumptions.

I think you need to learn the basic distinction between the VFS layer
and a lowlevel filesystem driver.

Hans Reiser

2004-08-26 09:02:29 UTC

Post by Christoph Hellwig
I don't think you'll get anywhere with auditing. We need to write down
the semantics you want, define them at the VFS level and make sure
they're not conflicting with defined userspace semantics or kernel
assumptions.
I think you need to learn the basic distinction between the VFS layer
and a lowlevel filesystem driver.

How old are you? I thought you were the guy at Linux Tag with fashion
oriented hair who gave a talk on his XFS work? Did I confuse you with
someone else?

Hans

Hans Reiser

2004-08-25 19:53:28 UTC

I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus. Sigh.

Dear Christoph,

Let me see if I can summarize what you and your contingent are saying,
and if I misconstrue anything, let me know.;-)

You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

Making files into directories caused only two applications out of the
entire OS to notice the change, and that was because of a bug in what
error code we returned that we are going to fix. You think that was a
disaster; I think it was a triumph.

Now a cleanly architected filesystem with no attributes and just files
and directories that can do everything attributes are used for exists.
You don't want it to have the competitive advantage. Instead, you want
it to have its clean design excised until you have something that
duplicates it ready to go, and only then should it be allowed that users
will use the features of your competitor's filesystem which you
disdained implementing for so long.

Since you never studied or understood namespace design principles (or
you would not have created and supported xattrs), you want to rename it
to be called VFS, rewrite what we have done, and take over as the
maintainer, mangling its design in a committee clusterfuck as you go.

We have just implemented very trivial semantic enhancements of the FS
namespace, nothing like as ambitious as www.namesys.com/whitepaper.html
or WinFS, and you are already pissing your pants.

Is that a fair summary?

Eat my dust.

Hans

PS

I should of course qualify what I have said. The use of files and
directories in place of attributes is not a finished work. It has bugs,
sys_reiser4() does not yet work, and there are little features still
missing like having files readdir ignores.

Still, except for the bugs, what we have is usable, and there are a lot
of happy reiser4 users right now even with the bugs. It will need a
little bit more time, and then all the pieces will be in place.

PPS

If you implement your filesystems as reiser4 plugins, and rename
reiser4's plugin code to be called "vfs", your filesystems will go
faster. Not as fast as reiser4 though, because it has a better layout
and that affects performance a lot, but faster is faster.... See
www.namesys.com/benchmarks.html for details.

PPPS

Since we have such a performance lead, Namesys is about to change its
focus from the storage layer to semantics, look at
www.namesys.com/whitepaper.html for details. Semantic enhancements are
the important stuff, and finally Namesys is where we have all the
storage layer prerequisites done right, and the real work can begin.
The gap between us is about to widen further.

Post by Christoph Hellwig
After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems

if we leave you in the dust, run faster.... not my problem....

Post by Christoph Hellwig
Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge and, if added later, added at the proper
VFS level after discussion on linux-kernel and linux-fsdevel, like we
did for xattrs.

If you can't help fight WinFS, then get out of the way. Namesys is on
the march. Read www.namesys.com/whitepaper.html.

Or, be smart, recognize that reiser4 is faster and more flexible than
your storage layers because we are older and wiser and worked harder at
it, join the team, and start contributing plugins that tap into the
higher performance it offers.

Microsoft tried to build a storage layer that could handle small objects
without losing performance, failed, and gave up at considerable cost to
their architecture and pocketbook.

We just broke a hole in the enemy line. You could come swarming through
it with us, but it sounds like you prefer complaining to HQ that we are
getting too far in front of you.

Matthew Wilcox

2004-08-25 20:06:48 UTC

That's a nice marketing talk. Get back to us when you have some technical
contribution to make.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

Hans Reiser

2004-08-26 08:41:47 UTC

Post by Matthew Wilcox
That's a nice marketing talk. Get back to us when you have some technical
contribution to make.

www.namesys.com/download.html

Christoph Hellwig

2004-08-25 20:08:59 UTC

Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles.

Actually in most of the discussion you simply didn't participate. While
xattrs might not be the nicest interface, they have the advantage of not
breaking the SuS assumption of what directories vs files are, and they
do not break the Linux O_DIRECTORY semantics, which are well defined and
needed to solve real-world races, either.

Post by Hans Reiser
When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Post by Hans Reiser
Now a cleanly architected filesystem with no attributes and just files
and directories that can do everything attributes are used for exists.
You don't want it to have the competitive advantage. Instead, you want
it to have its clean design excised until you have something that
duplicates it ready to go, and only then should it be allowed that users
will use the features of your competitor's filesystem which you
disdained implementing for so long.

My competitor's filesystem? If you look at MAINTAINERS I maintain only
vxfs and sysvfs, neither of which I'd suggest anyone run their system
on.

Post by Hans Reiser
Since you never studied or understood namespace design principles (or
you would not have created and supported xattrs), you want to rename it
to be called VFS, rewrite what we have done, and take over as the
maintainer, mangling its design in a committee clusterfuck as you go.

Hans, please stop the personal crap or the black helicopters will kidnap
you. When was the last time you actually worked on kernel namespace
code instead of talking marketing bullshit and ignoring all real-world
problems?

Post by Hans Reiser
If you implement your filesystems as reiser4 plugins, and rename
reiser4's plugin code to be called "vfs", your filesystems will go
faster. Not as fast as reiser4 though, because it has a better layout
and that affects performance a lot, but faster is faster.... See
www.namesys.com/benchmarks.html for details.

Could you pass on that crack pipe please?

Christoph Hellwig

2004-08-25 20:19:29 UTC

Btw, I just got reminded you might take what I'm saying as a "piss off,
your idea stinks" or something similar. So let me clarify the actual
technical and project management issues once more before we start
getting really personal :)

Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.

So now go on and try to work together with the other people doing VFS
level work instead of hiding, or if you think you can't work together
with us, look for a nice research OS where you can take over the VFS layer.
If your ideas prove to be good I'm sure Linux will pick them up sooner
or later.

Christoph

Linus Torvalds

2004-08-25 20:24:36 UTC

Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.

Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.

On the other hand, I think doing it inside the filesystem with ugly hacks
is an acceptable way to prototype the idea before it's been proven to
really be workable. Maybe it has more problems with legacy apps than we'd
expect..

Linus

Christoph Hellwig

2004-08-25 20:25:39 UTC

Post by Linus Torvalds
Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.
On the other hand, I think doing it inside the filesystem with ugly hacks
is an acceptable way to prototype the idea before it's been proven to
really be workable. Maybe it has more problems with legacy apps than we'd
expect..

Oh, I'm the last person to tell anyone how to prototype things. I just
don't want such inconsistencies in the mainline kernel.

v***@parcelfarce.linux.theplanet.co.uk

2004-08-25 20:59:57 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.

Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.

ACK. However, I'm still not seeing *ANYTHING* that would look like a workable
scheme in the presence of hardlinks. Show me how to make that deadlock- and
race-free and we might very well do it in VFS.

_That_ is what's missing and it's needed no matter where it's implemented.
You want hybrid objects - you want to solve that one. So far I've seen
nothing workable.

Hans Reiser

2004-08-26 08:43:32 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.

Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.
On the other hand, I think doing it inside the filesystem with ugly hacks

what is ugly? ;-/

Post by Linus Torvalds
is an acceptable way to prototype the idea before it's been proven to
really be workable. Maybe it has more problems with legacy apps than we'd
expect..
Linus

Hans Reiser

2004-08-26 08:42:06 UTC

Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem.

VFS is the common filesystem layer. The only reason you think semantics
belong in the common filesystem layer is that you are not innovating in
your semantics, and feel content with stasis.

I don't. I expect that semantics will get radically changed over the
next few years as we compete with Giampaolo and whatever lesser lights
are working at Microsoft.

Post by Christoph Hellwig
Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.
So now go on and try to work together with the other people doing VFS
level work instead of hiding, or if you think you can't work together
with us, look for a nice research OS where you can take over the VFS layer.
If your ideas prove to be good I'm sure Linux will pick them up sooner
or later.
Christoph

I tell you what, use xattrs for all the half speed filesystems, and the
users and I will use metafiles.

Christoph Hellwig

2004-08-26 09:24:14 UTC

Post by Hans Reiser

Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem.

VFS is the common filesystem layer. The only reason you think semantics
belong in the common filesystem layer is that you are not innovating in
your semantics, and feel content with stasis.
I don't. I expect that semantics will get radically changed over the
next few years as we compete with Giampaolo and whatever lesser lights
are working at Microsoft.

Hans, please stop the goddamn personal attack bullshit.

How do you, for example, suggest exporting your semantics over the
network if they're not done at the VFS level? How do you want a
cluster filesystem or tmpfs to support them?

Linus Torvalds

2004-08-25 20:22:55 UTC

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.

It's the UNIX way.

And yes, the semantics can _easily_ be solved in very unixy ways.

One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
you do is:

- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)

Problem solved. Very user-friendly, and very intuitive.
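
The two-line rule above can be modelled in a few lines. This is a sketch of the rule as stated, not of any implementation; the function name open_result is hypothetical, and it describes only what happens when the object being opened is a file-as-dir.

```python
import errno


def open_result(path, o_directory):
    """Errno for opening a file-as-dir object under the proposed rule (0 means success)."""
    has_slash = path.endswith("/")
    if not has_slash and o_directory:
        # Without the slash the object is a plain file, so O_DIRECTORY is refused.
        return errno.ENOTDIR
    if has_slash and not o_directory:
        # With the slash the object is a directory, so a plain open is refused.
        return errno.EISDIR
    return 0


for path in ("file", "file/"):
    for flag in (False, True):
        print(path, flag, errno.errorcode.get(open_result(path, flag), "ok"))
```

Note that this differs from Jamie Lokier's proposal earlier in the thread, where a bare trailing slash never opens an ordinary file at all.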

Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless. The argument that applications like "tar"
won't understand the file-as-directory thing is _flawed_, since legacy
apps won't understand xattrs either.

Oh, add a O_NOXATTRS flag to force a path lookup to only use regular
directories, the same way we have O_NOFOLLOW and friends. That allows
people to see the difference, if they care (ie a file server might decide
that it doesn't want to expose things like this).

I never liked the xattr stuff. It makes little sense, and is totally
useless for 99.9999% of everything. I still don't see the point of it,
except for samba. Ugly.

Linus

Christoph Hellwig

2004-08-25 20:35:49 UTC

Post by Linus Torvalds
And yes, the semantics can _easily_ be solved in very unixy ways.
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
you do is:
- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)
Problem solved. Very user-friendly, and very intuitive.

That would solve the O_DIRECTORY issue, the dentry aliasing still needs
work though with the semantics for link/unlink/rename.

Maybe Hans & you should start 2.7 to work this out? :)

Hans Reiser

2004-08-25 20:41:14 UTC

I just want to add that I AM capable of working with the other
filesystem developers in a team-player way, and I am happy to cooperate
with making portions more reusable where there is serious interest from
other filesystems in that, but Christoph is a puppy who has never
written or designed a major filesystem from scratch, and Nikita is a big
dog who has written stuff very few projects are lucky enough to see the
likes of, and when Christoph insults Nikita's code, or my design
guidance for that code, it is not going to bring out my good side. The
plugin and metafiles code needs many improvements, but Christoph does
not have the expertise to understand what those needed improvements are
because he hasn't invested the work into understanding the code.

Christoph is a bright and clever young fellow, who just hasn't had the
years of study of the field yet. I wish him well, and away.;-)

Hans

Chris Mason

2004-08-25 20:51:49 UTC

Post by Hans Reiser
I just want to add that I AM capable of working with the other
filesystem developers in a team-player way, and I am happy to cooperate
with making portions more reusable where there is serious interest from
other filesystems in that,

Prove it. Stop replying for today and come back tomorrow with some
useful discussions. Christoph suggested that some of the v4 semantics
belong in the VFS and therefore in Linux as a whole. He's helping you to
make sure the semantics fit nicely with the rest of the kernel
interfaces and are race free.

Take him up on the offer.

-chris

Markus Törnqvist

2004-08-25 20:58:40 UTC

Post by Hans Reiser
Christoph is a bright and clever young fellow, who just hasn't had the
years of study of the field yet. I wish him well, and away.;-)

I see this as an opportunity where you can share some of your
experience with Christoph and many others and work to get the semantics
into VFS.

Please make this work :)

--
mjt

Rik van Riel

2004-08-25 21:03:59 UTC

Post by Hans Reiser
other filesystems in that, but Christoph is a puppy who has never

It's not Christoph who's shown more bark than bite in this thread.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

Hans Reiser

2004-08-26 09:00:25 UTC

Post by Rik van Riel

Post by Hans Reiser
other filesystems in that, but Christoph is a puppy who has never

It's not Christoph who's shown more bark than bite in this thread.

The bite is at www.namesys.com/download.html

Hans

v***@parcelfarce.linux.theplanet.co.uk

2004-08-25 20:42:40 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.

Not if you allow link(2) on them. And not if you design and market your
stuff as a general-purpose backdoor into kernel. Note how *EVERY* *DAMN*
*OPERATION* is made possible to override by "plugins". Which is the reason
for deadlocks in question, BTW.

Don't fool yourself - that's what Hans is selling. Target market: ISV.
Marketed product: a set of hooks, the wider the better, no matter how
little sense it makes. The reason for doing that outside of core kernel:
bypassing any review and being able to control the product being sold (see
above).

Shame that it got an actual filesystem mixed in with the marketing plans
and general insanity...

Linus Torvalds

2004-08-25 21:00:01 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Linus Torvalds
It's the UNIX way.

Not if you allow link(2) on them.

Heh. I don't think that's a very strong argument against being "unixy",
considering how traditional unix _used_ to handle directories.

mkdir/rmdir/rename only came later. Now, obviously they did come later for
a good reason, but still..

The interesting part is that thanks to the dcache, we should be perfectly
able to actually _see_ circular links etc, so some of the problems with
linking directories should actually be quite solvable - something that is
_not_ true for a traditional UNIX VFS layer.

Of course, the dcache introduces some new problems of its own wrt
directory aliasing, but I don't actually think that should be fundamental
either. Treating them more as a "static mountpoint" from an FS angle and
less as a traditional Unix hardlink should be doable, I'd have thought.

(Also, it's entirely possible that the filesystem may not support some of
the more esoteric linking/renaming operations. For example, in a
traditional xattrs setup where the xattr is linked on-disk with the file
it is associated with, you simply _can't_ link it somewhere else, or
rename it to any other directory. That's not a VFS layer issue, obviously,
but I thought I'd bring up the point that file-as-dir cases may have
limitations that normal files don't have).

Post by v***@parcelfarce.linux.theplanet.co.uk
And not if you design and market your stuff as a general-purpose
backdoor into kernel.

Now that's a separate argument, and not one I'm personally interested in
arguing at least right now. I haven't actually looked at the reiser4 code,
so I'm really _only_ arguing against special-case attributes.

Linus

v***@parcelfarce.linux.theplanet.co.uk

2004-08-25 21:25:18 UTC

Post by Linus Torvalds
Of course, the dcache introduces some new problems of its own wrt
directory aliasing, but I don't actually think that should be fundamental
either. Treating them more as a "static mountpoint" from an FS angle and
less as a traditional Unix hardlink should be doable, I'd have thought.

Yeah, if we ditch the "mountpoints are busy and untouchable" stuff. Which
I'd love to, but it's a hell of a visible (and admin-visible) change.

FWIW, current deadlocks are unrelated to actual operation succeeding.
Look: we have sys_link() making sure that parent of target is a directory
(PATH_LOOKUP, in a "it has ->lookup()" sense), then locking target's parent,
then checking that it has ->link() (everyone on reiser4 does) and then
checking that source (old link to file) is *not* a directory (in S_ISDIR
sense). Then we lock source.

Note that currently it's OK - we get "all non-directories are always locked
after all directories". With filesystem that provides hybrid objects with
non-NULL ->link() it's not true and we are in deadlock country. Before
we get anywhere near fs code.
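
The ordering argument above can be illustrated with a small model. This is
only a sketch, not kernel code: the names link_lock_order and abba_conflicts
are hypothetical, and objects are plain strings. It assumes, as described
above, that a link() call locks the target's parent first and the source
second, and it reports any pair of objects that two calls would lock in
opposite orders.

```python
def link_lock_order(source, target_parent):
    """Return the two locks in the order the link() path above takes them."""
    # Assumption taken from the description: parent of the target first,
    # then the source object.
    return (target_parent, source)


def abba_conflicts(calls):
    """Find pairs of objects that some calls lock in opposite orders.

    `calls` is a list of (source, target_parent) tuples, one per link() call.
    """
    orders = {link_lock_order(src, parent) for src, parent in calls}
    return sorted((a, b) for a, b in orders if a < b and (b, a) in orders)


# Plain files linked into plain directories: directories always come first.
classic = [("file1", "dirA"), ("file2", "dirA"), ("file1", "dirB")]
# Hybrid objects A and B, each linked into the other.
hybrid = [("A", "B"), ("B", "A")]

print(abba_conflicts(classic))
print(abba_conflicts(hybrid))
```

With ordinary files and directories no pair is ever inverted. Once an object
can be both a link source and a parent, two concurrent calls can take the
same two locks in opposite orders, which is the deadlock described above.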

I'm not saying that this particular instance is hard to fix, but it wasn't
even looked at. All it would take is checking the description of current
locking scheme and looking through the proof of correctness (present in the
tree). That's the first point where said proof breaks if we have hybrids.
And it's what, about 4 screenfuls of text?

I have no problems with discussing such stuff and no problems with having it
merged if it actually works. But let's start with something better than
"let's hope nothing breaks if we just add such objects and do nothing else,
'cause hybrid files/directories are good, mmmkay?"

Jamie Lokier

2004-08-26 00:11:52 UTC

This message suggests a way to extend the VFS safe locking rules to
include files-as-directories.

Post by v***@parcelfarce.linux.theplanet.co.uk
Note that currently it's OK - we get "all non-directories are always locked
after all directories". With filesystem that provides hybrid objects with
non-NULL ->link() it's not true and we are in deadlock country. Before
we get anywhere near fs code.

Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Simply doing a path walk would lock the file and then cross the mount
point to a directory.

One way to preserve the lock order is to require that the
metadata is in a different filesystem to its file (i.e. not crossing a
bind mount to the same filesystem).

That has the side effect of preventing hard links between metadata
files and non-metadata, which in my opinion is fine.

Path walking will lock the file, and then lock the directory on a
different filesystem. Lock order is still safe, provided a strict
order is maintained between the two filesystems.

The strict order is ensured by preventing bind mounts which create a
path cycle containing a file->metadata edge. One way to ensure that
is to prevent mounts on the metadata filesystems, but the rule doesn't
have to be that strict. This condition only needs to be checked in
the mount() syscall.
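
The mount-time check suggested above might be sketched roughly as follows.
This is a toy model under stated assumptions, not VFS code: filesystems are
strings, each edge is a (source, destination, kind) tuple, the kinds "bind"
and "meta" and the names reachable and mount_allowed are all hypothetical.
A new mount is refused if it would close a path cycle that contains a
file->metadata edge.

```python
def reachable(edges, start, goal):
    """Depth-first search: can `goal` be reached from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for src, dst, _kind in edges if src == node)
    return False


def mount_allowed(edges, new_edge):
    """Refuse a mount that would close a cycle through a file->metadata edge."""
    candidate = edges + [new_edge]
    return not any(
        reachable(candidate, dst, src)
        for src, dst, kind in candidate
        if kind == "meta"
    )


# Hypothetical layout: a file on "rootfs" exposes its metadata tree "meta_fs".
edges = [("rootfs", "meta_fs", "meta")]
print(mount_allowed(edges, ("rootfs", "scratch", "bind")))
print(mount_allowed(edges, ("meta_fs", "rootfs", "bind")))
```

The first mount is accepted; the second would make rootfs reachable from its
own metadata tree and is refused. Forbidding all mounts on metadata
filesystems would be the stricter rule mentioned above.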

-- Jamie

v***@parcelfarce.linux.theplanet.co.uk

2004-08-26 00:30:55 UTC

Post by Jamie Lokier
Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Yes - mountpoints can't be e.g. unlinked. Moreover, having directory
mounted on non-directory is also an interesting situation.

Post by Jamie Lokier
Simply doing a path walk would lock the file and then cross the mount
point to a directory.

*Ugh*

What would happen if you open that directory or chdir there? If it's
"underlying file stays locked" - we are in even more obvious deadlocks.

Post by Jamie Lokier
One way to preserve the lock order is to require that the
metadata is in a different filesystem to its file (i.e. not crossing a
bind mount to the same filesystem).
That has the side effect of preventing hard links between metadata
files and non-metadata, which in my opinion is fine.

We don't actually need a different fs - different vfsmount will do just fine.

Post by Jamie Lokier
The strict order is ensured by preventing bind mounts which create a
path cycle containing a file->metadata edge. One way to ensure that
is to prevent mounts on the metadata filesystems, but the rule doesn't
have to be that strict. This condition only needs to be checked in
the mount() syscall.

You really don't want to lock mountpoint on path lookup, so I don't see
how that would be relevant - it's a hell to clean up, for one thing
(I've crossed ten mountpoints on the way, when do I unlock them and
how do I prevent deadlocks from that?) Besides, different namespaces
can have completely different mount trees, so tracking down all that
stuff would be hell in its own right.

The main issue I see with all schemes in that direction (and something
like that could be made workable) is the semantics of unlink() on
mountpoints. *Especially* with users being able to see attributes of
files they do not own (e.g. reiser4 mode/uid/gid stuff). Ability to
pin down any damn file on the system and make it impossible to replace
is not something you want to give to any user.

Jamie Lokier

2004-08-26 01:00:49 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Jamie Lokier
Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Yes - mountpoints can't be e.g. unlinked. Moreover, having directory
mounted on non-directory is also an interesting situation.

Ok, so can we make it so mountpoints can be unlinked? :)

The mount would continue to exist, but with no name, until its last
user disappears.

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Jamie Lokier
Simply doing a path walk would lock the file and then cross the mount
point to a directory.

*Ugh*
What would happen if you open that directory or chdir there? If it's
"underlying file stays locked" - we are in even more obvious deadlocks.

I think the underlying file does not stay locked, and once you've
entered it as a directory, it can be unlinked.

If you have the directory open or chdir into it, then it _may_ have
the effect of keeping the file's storage allocated when you unlink it
-- just like when a file is unlinked while opened. As that is not a
user-visible property, it's a filesystem-specific implementation
detail as to whether it keeps the file's data in existence while the
metadata directories and/or files are open.
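
For comparison, the existing behaviour referred to above (a file unlinked
while it is still open) can be shown with a short sketch. It assumes a
POSIX system; the function name read_after_unlink is hypothetical. It says
nothing about how a filesystem would treat open metadata directories, which
is exactly the open question here.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write a file, unlink it while it is open, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # the name is gone ...
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))   # ... the data is still there
    finally:
        os.close(fd)                       # storage is released here


print(read_after_unlink(b"still here"))
```

The storage stays allocated until the last descriptor is closed, without
that being visible in the namespace.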

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Jamie Lokier
One way to preserve the lock order is to require that the
metadata is in a different filesystem to its file (i.e. not crossing a
bind mount to the same filesystem).
That has the side effect of preventing hard links between metadata
files and non-metadata, which in my opinion is fine.

We don't actually need a different fs - different vfsmount will do just fine.

Post by Jamie Lokier
The strict order is ensured by preventing bind mounts which create a
path cycle containing a file->metadata edge. One way to ensure that
is to prevent mounts on the metadata filesystems, but the rule doesn't
have to be that strict. This condition only needs to be checked in
the mount() syscall.

You really don't want to lock mountpoint on path lookup, so I don't see
how that would be relevant - it's a hell to clean up, for one thing
(I've crossed ten mountpoints on the way, when do I unlock them and
how do I prevent deadlocks from that?) Besides, different namespaces
can have completely different mount trees, so tracking down all that
stuff would be hell in its own right.

I didn't mean locking a chain of mountpoints, I meant the temporary
state where two dentries and/or inodes are locked, parent and child,
during a path walk. However I'm not very familiar with that part of
the VFS and I see that the current RCU dcache might not lock that much
during a path walk.

Post by v***@parcelfarce.linux.theplanet.co.uk
The main issue I see with all schemes in that direction (and something
like that could be made workable) is the semantics of unlink() on
mountpoints. *Especially* with users being able to see attributes of
files they do not own (e.g. reiser4 mode/uid/gid stuff). Ability to
pin down any damn file on the system and make it impossible to replace
is not something you want to give to any user.

I agree, users shouldn't be able to pin down a file.

I think unlink() should succeed on a file while something is visiting
inside its metadata directory.

It's a filesystem quality-of-implementation feature whether that
actually releases the file's data. It's a desirable feature because
one user shouldn't be able to pin another user's quota'd data if they
don't have permission to open the file, but if it's not implemented by
a filesystem then it doesn't break anything fundamental.

It's a semantics question whether unlinking a file makes the metadata
(i.e. "uid", "mode", "content-type" etc.) disappear at the same time,
or if the metadata stays around until the last visitor leaves it. A
filesystem might be able to keep the metadata in existence even if it
deletes the file's storage on unlink(), but it would be nice for the
VFS to declare which semantic is preferred.

One of the big potential uses for file-as-directory is to go inside
archive files, ELF files, .iso files and so on in a convenient way.
In those cases, if you open one of the virtually generated "archive
content" files, then you might expect the data to continue to exist
after the underlying file is unlinked. I think that's reasonable:
being inside an archive is very similar to having it open. There is
no quota pinning problem with this, because "archive content" files
should inherit permission restrictions from the underlying file. If
you can't read the file, then you can't read its unpacked contents.

(reiser4 doesn't offer that last feature, but any of the myriad
userspace filesystem hooks could offer it if the VFS has appropriate
auto-mounting file-as-directory hooks).

-- Jamie

v***@parcelfarce.linux.theplanet.co.uk

2004-08-26 03:13:47 UTC

Post by Jamie Lokier

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Jamie Lokier
Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Yes - mountpoints can't be e.g. unlinked. Moreover, having directory
mounted on non-directory is also an interesting situation.

Ok, so can we make it so mountpoints can be unlinked? :)

User-visible change of behaviour and IIRC a SuS violation on top of that.

Post by Jamie Lokier
I think the underlying file does not stay locked, and once you've
entered it as a directory, it can be unlinked.

So why lock it at all in that case?

Post by Jamie Lokier
I didn't mean locking a chain of mountpoints, I meant the temporary
state where two dentries and/or inodes are locked, parent and child,
during a path walk. However I'm not very familiar with that part of
the VFS and I see that the current RCU dcache might not lock that much
during a path walk.

Never had been needed on crossing mountpoints, actually.

Post by Jamie Lokier
I agree, users shouldn't be able to pin down a file.
I think unlink() should succeed on a file while something is visiting
inside its metadata directory.

See above. Again, the fundamental problem with that is allowing unlink
and friends on a mountpoint. I would love to do that, but it always
generated -EBUSY on all Unices. Linux got a bit more users and userland
code than Plan 9 - they can afford such changes, but...

And yes, from the kernel POV it's trivial to do - witness the MNT_DETACH
codepath in umount - it's much simpler than "normal" umount exactly because
it doesn't try to emulate old "it's busy, can't umount" behaviour.

With umount we could introduce "don't bother with that shit" flag. With
unlink() we would have to make that default behaviour to be useful.

Hans Reiser

2004-08-26 08:49:15 UTC

Post by Jamie Lokier
One of the big potential uses for file-as-directory is to go inside
archive files, ELF files, .iso files and so on in a convenient way.

Yes, this was part of the plan, tar file-directory plugins would be cute.

Chris Wright

2004-08-26 01:13:26 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Jamie Lokier
Is this a problem if we treat entering a file-as-directory as crossing
a mount point (i.e. like auto-mounting)?

Yes - mountpoints can't be e.g. unlinked.

Could it be essentially MNT_DETACH'd?

Christophe Saout

2004-08-25 21:00:00 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.

Not if you allow link(2) on them.

That doesn't make sense anyway. (actually, I tried what happens and the
result was an Oops ;))

It should be completely forbidden to link into a meta-directory or out
of such a directory. You could think of such a meta-directory as a sysfs
for that inode. Of course it's not a filesystem of its own, and that means
there need to be a lot of security precautions in the VFS layer, which is
where something like that belongs anyway, if done correctly.

Post by v***@parcelfarce.linux.theplanet.co.uk
And not if you design and market your
stuff as a general-purpose backdoor into kernel. Note how *EVERY* *DAMN*
*OPERATION* is made possible to override by "plugins". Which is the reason
for deadlocks in question, BTW.

What do you mean? If you tell that file that you want it to be
compressed or encrypted or modify some attributes (like ACLs) this isn't
necessarily a backdoor.

Post by v***@parcelfarce.linux.theplanet.co.uk
Don't fool yourself - that's what Hans is selling. Target market: ISV.
Marketed product: a set of hooks, the wider the better, no matter how
little sense it makes. The reason for doing that outside of core kernel:
bypassing any review and being able to control the product being sold (see
above).

Yes, I don't think it was a good idea either. Probably someone should
remove these features and make it a "normal" filesystem. The people who
need it now can turn it on again and a real solution could be worked out
in Linux 2.7.

I wouldn't use it on a public server right now anyway, because I'm not
convinced that some malicious guy couldn't find a way to exploit it. What
if you changed into a meta directory using ftp and somehow managed to break
things? This might be very dangerous.

I personally think that the idea of doing something like this (I'm not
speaking of the current implementation which I think is really bad) is
the right way to go in the long term.

Andrea Arcangeli

2004-08-25 22:59:33 UTC

Post by Christophe Saout
It should be completely forbidden to link into a meta-directory or out
of such a directory. [..]

agreed.

Post by Christophe Saout
Yes, I don't think it was a good idea either. Probably someone should

I personally would like plugins only if the API they use wouldn't allow
them to corrupt the underlying fs, I'm not sure if this is the case with
reiserfs4.

About the backwards compatibility, another option is to add an O_FILEDIR
flag and have bash learn about it when you cd into a file. No magic with
the slashes then.

Hans Reiser

2004-08-26 08:35:01 UTC

Post by Andrea Arcangeli

Post by Christophe Saout
It should be completely forbidden to link into a meta-directory or out
of such a directory. [..]

agreed.

Post by Christophe Saout
Yes, I don't think it was a good idea either. Probably someone should

I personally would like plugins only if the API they use wouldn't allow
them to corrupt the underlying fs, I'm not sure if this is the case with
reiserfs4.

You compile crap into the kernel, you are screwed. We have not and will
not change that.

Reiser4 plugins are not for end users to download from amazon.com, they
are for weekend hackers to send me a cool plugin for me to review,
assign a plugin id to, and send to Linus in the next release. Sometimes
being less ambitious works better, and dynamically loaded plugins would
have made the infrastructure too bulky to be fun and fast.

Hans Reiser

2004-08-26 08:43:46 UTC

Post by v***@parcelfarce.linux.theplanet.co.uk

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.

Not if you allow link(2) on them. And not if you design and market your
stuff as a general-purpose backdoor into kernel.

What backdoor? Please spell it out. Plugins are not dynamically
loadable.

Post by v***@parcelfarce.linux.theplanet.co.uk
Note how *EVERY* *DAMN*
*OPERATION* is made possible to override by "plugins". Which is the reason
for deadlocks in question, BTW.
Don't fool yourself - that's what Hans is selling. Target market: ISV.

Hunh? No, target market is hackers who like to spend a weekend dreaming
up funky new kinds of files. One guy (Jason Holt, clever guy) came up
with an idea for write only files for which even root cannot read the
parts of the file written prior to the time root was achieved, because
the encryption key is changed in a forward computable only direction
with every write, and the start key is kept on another computer. Lots
of folks will have plugins I would never dream of. Think of photoshop
plugins, and what plugins did for photoshop. Same thing will happen to
reiser4 next year (or earlier).
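
The thread does not describe Jason Holt's actual design, so the following
is only a sketch of the general idea of a key that is changed "in a forward
computable only direction" with every write. Using SHA-256 as the one-way
step is an assumption, as are the names evolve and key_after; encryption of
the written data itself is left out.

```python
import hashlib


def evolve(key: bytes) -> bytes:
    """Advance the key one step with a one-way function (SHA-256 here)."""
    return hashlib.sha256(b"evolve" + key).digest()


def key_after(start_key: bytes, writes: int) -> bytes:
    """Key in use after `writes` writes, derived from the given key."""
    key = start_key
    for _ in range(writes):
        key = evolve(key)
    return key


start = b"kept on another computer"
k2 = key_after(start, 2)   # key the machine holds after two writes
k5 = key_after(k2, 3)      # later keys follow from k2 alone
print(k5 == key_after(start, 5))
```

Whoever obtains k2 can compute every later key, but recovering the earlier
keys would mean inverting the hash; only the holder of the start key can
rederive them and read the older parts of the file.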

Post by v***@parcelfarce.linux.theplanet.co.uk
Marketed product: a set of hooks, the wider the better, no matter how
little sense it makes. The reason for doing that outside of core kernel:
bypassing any review and being able to control the product being sold (see
above).
Shame that it got an actual filesystem mixed in with the marketing plans
and general insanity...

I am one of those free software hippy-child anarchists who thinks that
random people should come up with ideas and contribute them. You
understand me well.;-)

Matt Mackall

2004-08-25 21:52:17 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.

I thought the UNIX way is "everything's a file", not "everything's a
directory".

Post by Linus Torvalds
Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless.

There's always the option that they're both broken.

--
Mathematics is the supreme nostalgia of our time.

Linus Torvalds

2004-08-25 22:21:44 UTC

Post by Matt Mackall

Post by Linus Torvalds
It's the UNIX way.

I thought the UNIX way is "everything's a file", not "everything's a
directory".

It really was. Directories were historically largely just files too,
although with the special "lookup" operation.

Historic unix didn't have readdir/rmdir/mkdir/rename or really much _any_
special directory handling. Directories were just files, and you read them
like files.

Of course, even in that early unix, "directories" were very much a
reality even apart from the fact that they happened to be implemented
pretty much like files. Nobody has ever claimed that the UNIX way is
"everything is _one_ file", after all ;)

Post by Matt Mackall

Post by Linus Torvalds
Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless.

There's always the option that they're both broken.

Yes. Highly likely. However, something like that _does_ end up what a
Windows fileserver wants. IOW, even if it's broken, _something_ is likely
forced on us by that nasty thing we call "real users". Damn them.

Linus

Mikulas Patocka

2004-08-26 00:18:49 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.
And yes, the semantics can _easily_ be solved in very unixy ways.
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)
Problem solved. Very user-friendly, and very intuitive.
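
The two rules quoted above amount to a small decision table. The sketch
below only models that table for a file-as-directory object; it does not
call open(2), and the function name open_hybrid is hypothetical.

```python
import errno


def open_hybrid(path: str, o_directory: bool) -> int:
    """Return 0 on success or an errno value, for a file-as-directory object."""
    trailing_slash = path.endswith("/")
    if o_directory and not trailing_slash:
        return errno.ENOTDIR   # no slash: the object behaves as a plain file
    if trailing_slash and not o_directory:
        return errno.EISDIR    # slash: the object behaves as a directory
    return 0


for path, flag in [("foo", False), ("foo", True), ("foo/", False), ("foo/", True)]:
    result = open_hybrid(path, flag)
    print(path, flag, errno.errorcode.get(result, "ok"))
```

Only the two consistent combinations succeed: no slash without O_DIRECTORY,
and a trailing slash with it.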

Stupid question: who will use it? And why?

Anyone can write a userspace library that implements a function
set_attribute(char *file, char *attribute, char *value), which creates a
directory ".attr/file" in the file's directory and stores the attribute there.
(and you can get the list of attributes from the shell too:
ls `echo "$filename" |sed 's/\/\([^\/]*\)$/\/\.attr\/\1/'`
). There's no need to add extra functionality to kernel and filesystem.
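
A minimal sketch of such a userspace library, assuming one small file per
attribute inside the ".attr/<file>" directory. Apart from set_attribute,
the function names are hypothetical, and there is no locking of any kind.

```python
import os


def attr_dir(path):
    """Map 'dir/file' to 'dir/.attr/file', like the sed expression above."""
    # Unlike the sed expression, this also handles a bare file name.
    parent, name = os.path.split(path)
    return os.path.join(parent, ".attr", name)


def set_attribute(path, attribute, value):
    """Store one attribute as a small file under the .attr directory."""
    directory = attr_dir(path)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, attribute), "w") as handle:
        handle.write(value)


def get_attribute(path, attribute):
    """Read one attribute back."""
    with open(os.path.join(attr_dir(path), attribute)) as handle:
        return handle.read()


def list_attributes(path):
    """List attribute names, or nothing if the file has no .attr directory."""
    directory = attr_dir(path)
    return sorted(os.listdir(directory)) if os.path.isdir(directory) else []


print(attr_dir("/home/user/notes.txt"))
```

The attributes are tied to the file only by its name, so nothing in this
scheme follows the file when it is renamed or moved.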

Advantages:
- you don't add bloat to kernel or filesystem
- you don't need to teach tar/cp -a/mc about attributes
- you won't lose attributes after editing file in vim (it creates another
file and renames it over original one)

Post by Linus Torvalds
Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless. The argument that applications like "tar"
won't understand the file-as-directory thing is _flawed_, since legacy
apps won't understand xattrs either.

The only way xattrs are useful is that backup/restore software doesn't
have to know about every filesystem with its specific attributes and
every magic ioctl for setting them. Instead it can save/restore
filesystem-specific attributes without understanding what they mean.
However, there's no reason why an application should use them. And no
application does.

I can't imagine anyone shipping an application with a "this app requires
reiser4" prerequisite. Why should anyone use it if he can store attributes
in a ".attr" directory or wherever and make the application work on any OS
and any filesystem?

Mikulas

Linus Torvalds

2004-08-26 00:27:18 UTC

Post by Mikulas Patocka
Stupid question: who will use it? And why?
Anyone can write a userspace library that implements a function
set_attribute(char *file, char *attribute, char *value), which creates a
directory ".attr/file" in the file's directory and stores the attribute there.
(and you can get the list of attributes from the shell too:
ls `echo "$filename" |sed 's/\/\([^\/]*\)$/\/\.attr\/\1/'`
). There's no need to add extra functionality to kernel and filesystem.

..and the above is, roughly, what I understand samba etc falls back on.

The problem ends up being that the above isn't in any way safe from people
moving files around (oops, where did those attributes go?) nor does it
have any consistency guarantees. So it only works well if _one_
application does this, and that application follows all the locking rules.

Is it enough? It may have to be.

Post by Mikulas Patocka
The only way xattrs are useful is that backup/restore software doesn't
have to know about every filesystem with its specific attributes and
every magic ioctl for setting them. Instead it can save/restore
filesystem-specific attributes without understanding what they mean.
However, there's no reason why an application should use them. And no
application does.

If no application does, then why back them up? Why implement them in the
first place?

In other words - some apps obviously do want to use them. Sadly.

Linus

Mikulas Patocka

2004-08-26 00:51:22 UTC

Post by Linus Torvalds

Post by Mikulas Patocka
The only way xattrs are useful is that backup/restore software doesn't
have to know about every filesystem with its specific attributes and
every magic ioctl for setting them. Instead it can save/restore
filesystem-specific attributes without understanding what they mean.
However, there's no reason why an application should use them. And no
application does.

If no application does, then why back them up? Why implement them in the
first place?
In other words - some apps obviously do want to use them. Sadly.

You can add more functionality to filesystem and use xattrs to control it.
For example:
- acls
- compress file
- encrypt file (copy user's password into task_struct and use it to
encrypt his files)
- preallocate file in 4MB contiguous chunks, because it needs real time
multimedia access
- sync/append-only/immutable
etc.
However, there's no reason why an application should care whether the file
is compressed, whether it has acls, and so on. And applications don't.

And I think this is the only legitimate use for xattrs. Who else uses them
except samba? I don't see how reiser4's hybrids would help.

Mikulas

Hans Reiser

2004-08-26 08:36:43 UTC

Streams are quite ugly. However, if you decompose streams into all of
the little pieces that are needed to emulate them, the pieces are quite
nice.

For instance, inheriting stat data from a common parent is nice, and
inheritance is nice, and being able to cat dirname/pseudos/cat and get a
concatenation of all of the files is nice, and being able to cat
dirname/pseudos/tar and get an archive of the directory is nice, and,
well, if you decompose all of the features of streams into little
features you get a bunch of fun little features much nicer than streams.

Hans

Rik van Riel

2004-08-26 00:57:09 UTC

Post by Mikulas Patocka

Post by Linus Torvalds
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what

Stupid question: who will use it? And why?

I've got a stupid question too. How do you back up these
things ?

If your backup program reads them as a file and restores
them as a file, you might lose your directory-inside-the-file
magic.

If your backup program dives into the file despite stat()
saying it's a file and you restore your backup, how are the
"file is a file" semantics preserved ?

Obviously this is something that needs to be sorted out at
the VFS layer. A filesystem specific backup and restore
program isn't desirable, if only because then there'd be
no way for Hans's users to switch to reiser5 in 2010 ;)

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

Helge Hafting

2004-08-26 07:51:39 UTC

Post by Rik van Riel

Post by Mikulas Patocka

Post by Linus Torvalds
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what

Stupid question: who will use it? And why?

I've got a stupid question too. How do you back up these
things ?
If your backup program reads them as a file and restores
them as a file, you might lose your directory-inside-the-file
magic.
If your backup program dives into the file despite stat()
saying it's a file and you restore your backup, how are the
"file is a file" semantics preserved ?
Obviously this is something that needs to be sorted out at
the VFS layer. A filesystem specific backup and restore
program isn't desirable, if only because then there'd be
no way for Hans's users to switch to reiser5 in 2010 ;)

Sure, this sort of thing must be sorted out at the VFS layer.
And a backup program working on such a filesystem
will need to know that something can be a file, a directory - or both.

So an old "tar" won't get this right as it will assume that an object
is either file or directory. The change to get it right won't be
that big - just notice that an object is both, then back up the
ordinary file contents as usual, before recursing into the
directory it also provides and backing up stuff there as usual.

The resulting .tar can of course only be unpacked properly
on a fs supporting file-as-directory, similar to how a .tar of
a fs with links only will unpack properly on a fs supporting links.

I don't see many problems for userland. Old apps will keep working,
as the new features are a superset. Those who care about
file-as-directory extras will provide patches for "tar" and friends,
after that the extras become useable.

Helge Hafting
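
The change Helge describes for "tar" could be sketched roughly as follows. This is only an illustration: `backup_walk` and `has_directory_view` are hypothetical names, and the sketch assumes that a file-as-directory object is one that stat() reports as a regular file but that still opens with O_DIRECTORY. On an ordinary filesystem that open fails, so the walk behaves like a classic one.

```python
import os
import stat


def has_directory_view(path):
    """Return True if `path` can be opened as a directory (assumed hybrid test)."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError:
        return False
    os.close(fd)
    return True


def backup_walk(path, out=None):
    """Collect (kind, path) records: file contents first, then what is inside."""
    if out is None:
        out = []
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        # Back up the ordinary file contents as usual ...
        out.append(("file", path))
        # ... then check whether the object also provides a directory.
        descend = has_directory_view(path)
    elif stat.S_ISDIR(mode):
        out.append(("dir", path))
        descend = True
    else:
        # Symlinks, FIFOs, devices: recorded but never entered.
        out.append(("other", path))
        descend = False
    if descend:
        for name in sorted(os.listdir(path)):
            backup_walk(os.path.join(path, name), out)
    return out
```

The only difference from a classic walk is that the regular-file branch may also descend, which is the "object is both" case.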

Paul Jackson

2004-08-26 09:21:37 UTC

Post by Helge Hafting
So an old "tar" won't get this right as it will assume that an object
is either file or directory.

There are many backup apps, not just one. I've written a few myself,
none of which will ever be worthy of notice. The sourceforge
Topic.System.Archiving.Backup lists 335 projects at present.

I find the idea that most backup tools and scripts will silently
stop working correctly to be pretty scary.

And then there's archiving, installation, distribution, administration,
emulation, file system and partition managers, and on and on.

===

I wonder if we can make this "modal" somehow.

The one consistency I see is that apps that want the "enhanced" view
need to ask for it, somehow. It is the new views of the data that are
being added - let the app announce to the kernel (usually via
specialized code in some shared library that the app is using to get the
alternate views) that either per-task, or per-file descriptor, it is to
see the "enhanced" view, as a side effect of trying to access it.

Old stuff, or even new stuff that is content to work with the "classic"
view that a file is a single data stream, and that directories only
have pathnames, not data, would by default see that view, and see
_all_ the data, presented somehow in that view, perhaps as additional
files with magic names.

This still leaves the breakage that such tools don't know, and don't
preserve, the magic linkage between such magic files. But that is
much less of an issue, in my view. Programs such as backups that are
manipulating the files of apps they know nothing about already have
to presume that all the files are important in inscrutable ways, and
just be careful to preserve or copy or backup all of them.

Yeah - I realize that there will be a few followups denouncing modal
architectures. I might even agree with some of them.

If this were easy, it would have been done years ago.

The onus should be on the new stuff to request the enhanced view,
rather than on the old dogs to learn new tricks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <***@sgi.com> 1.650.933.1373

Hans Reiser

2004-08-26 08:40:29 UTC

Post by Rik van Riel
If your backup program dives into the file despite stat()
saying it's a file and you restore your backup, how are the
"file is a file" semantics preserved ?
Obviously this is something that needs to be sorted out at
the VFS layer.

It needs to be sorted out, whether it is sorted out at the VFS layer is
unimportant.

Post by Rik van Riel
A filesystem specific backup and restore
program isn't desirable, if only because then there'd be
no way for Hans's users to switch to reiser5 in 2010 ;)

It might be that we need a filenameA/metas/backup method for all of our
file plugins, which if cat'd gives a set of instructions which if
executed are adequate for restoring filenameA.

Hans

Paul Jackson

2004-08-26 09:44:53 UTC

Post by Rik van Riel
If your backup program reads them as a file and restores
them as a file, you might lose your directory-inside-the-file
magic.

Encode the magic in the names, by stealing a bit of the existing
filename space to encode it.

Such works pretty well as part of the magic to map long filenames
into DOS 8.3 names on my FAT partitions.

Apps linked with the appropriate Windows library see nice fancy
long names.

The rest of the world, including DOS apps and my Unix backup
scripts, see the primitive 8.3 names, including one or a few
extra files per directory, which are nothing special to them.

So long as these other apps don't presume to know that they can
keep some of the files in an apps directory, and drop others, then
it works well enough. And no self-respecting general purpose
backup program is going to presume such knowledge anyway.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <***@sgi.com> 1.650.933.1373

Hans Reiser

2004-08-26 08:43:10 UTC

Post by Linus Torvalds

Post by Christoph Hellwig
For one thing _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel how you try to solve the
problems with files as directories.

Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts etc standard tools to touch
the attributes.
It's the UNIX way.
And yes, the semantics can _easily_ be solved in very unixy ways.
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)
Problem solved. Very user-friendly, and very intuitive.
Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless. The argument that applications like "tar"
won't understand the file-as-directory thing is _flawed_, since legacy
apps won't understand xattrs either.
Oh, add a O_NOXATTRS flag to force a path lookup to only use regular
directories, the same way we have O_NOFOLLOW and friends. That allows
people to see the difference, if they care (ie a file server might decide
that it doesn't want to expose things like this).

I think we should require people to care enough to supply an O_NOMETAS
flag to see the difference.

Post by Linus Torvalds
I never liked the xattr stuff. It makes little sense, and is totally
useless for 99.9999% of everything. I still don't see the point of it,
except for samba. Ugly.
Linus
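
The trailing-slash rule in the quoted proposal can be written down as a small truth table. This is a model of the proposal only, not of any shipped kernel behaviour; `proposed_open_result` is a hypothetical name and the function covers file-as-directory objects only.

```python
import errno


def proposed_open_result(trailing_slash, o_directory):
    """Return 0 for success or an errno, for a file-as-directory object.

    Models the proposed rule: the path spelling and the O_DIRECTORY
    flag have to agree about which view the caller wants.
    """
    if not trailing_slash and o_directory:
        return errno.ENOTDIR   # spelled as a file, directory requested
    if trailing_slash and not o_directory:
        return errno.EISDIR    # spelled as a directory, file requested
    return 0
```

Under this model the two consistent combinations succeed and the two mixed ones fail, so a legacy opendir() on the slash-less name still gets ENOTDIR.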

Alex Zarochentsev

2004-08-25 20:35:17 UTC

Post by Christoph Hellwig

Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles.

Actually in most of the discussion you simply didn't participate. While
xattrs might not be the nicest interface they have the advantage of not
breaking the SuS assumption of what directories vs files are, and they
do not break the Linux O_DIRECTORY semantics that are defined and need
to solve real-world races either.

Reiser4 may have a mount option for those who like or have to use traditional
O_DIRECTORY semantics. There would be no /metas under non-directories, if user
wants that.

[ ... ]

--
Alex.

Christoph Hellwig

2004-08-25 20:51:49 UTC

Post by Alex Zarochentsev
Reiser4 may have a mount option for those who like or have to use traditional
O_DIRECTORY semantics. There would be no /metas under non-directories, if user
wants that.

Again, O_DIRECTORY was added to solve a real-world race, not just for
the sake of it. If you really want to add that option I'll research the
CAN number for you so you can name it after that - or just call it -o
insecure directly.

Jamie Lokier

2004-08-25 23:54:09 UTC

Post by Christoph Hellwig

Post by Alex Zarochentsev
Reiser4 may have a mount option for those who like or have to use
traditional O_DIRECTORY semantics. There would be no /metas under
non-directories, if user wants that.

Again, O_DIRECTORY was added to solve a real-world race, not just for
the sake of it. If you really want to add that option I'll research the
CAN number for you so you can name it after that - or just call it -o
insecure directly.

man open(2) explains that O_DIRECTORY is used by opendir() to prevent
blocking when opening pipes and certain devices*, and should only be
used by opendir (of course it isn't only used by opendir, as it's a
handy optimisation).

In fact O_DIRECTORY is also used by Glibc to optimise away stat()
before and fstat() after calls.

An O_NODEVICE flag would be equally secure, and more generally useful.

It's important that device nodes cannot be opened when O_DIRECTORY is
set. This is compatible with reiser4 file-as-directory semantics, but
I don't know if reiser4 actually implements this. If it does (and it
should) then there is no device blocking problem.

That leaves only the optimisation of fstat() in opendir().

But that begs the question: do we want opendir() to succeed on a reiser4 file?

It's up to us to decide if we like that semantic, or prefer a
different one such as the one I described before (path search enters
file-as-directory, but opendir() directly on it fails), or the Sun
syscalls someone mentioned.

-- Jamie

* Aside: O_NONBLOCK|O_NOCTTY is an effective way to prevent blocking on
many systems. It's best if you can avoid opening devices at all though.
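
As a rough illustration of the traditional O_DIRECTORY behaviour described above (on a conventional Linux filesystem, not reiser4), the following sketch opens a directory, a regular file and a FIFO with the flag set. The helper name `open_as_directory` is made up for the example.

```python
import errno
import os
import tempfile


def open_as_directory(path):
    """Try to open `path` with O_DIRECTORY; return 0 or the errno."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    regular = os.path.join(tmp, "plain.txt")
    with open(regular, "w") as f:
        f.write("data")
    fifo = os.path.join(tmp, "pipe")
    os.mkfifo(fifo)

    dir_result = open_as_directory(tmp)
    file_result = open_as_directory(regular)
    # A plain O_RDONLY open of a FIFO would wait for a writer;
    # with O_DIRECTORY the open is refused before that can happen.
    fifo_result = open_as_directory(fifo)
```

Only the directory open succeeds; the other two are rejected with ENOTDIR, which is the guarantee opendir() relies on.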

Hans Reiser

2004-08-26 08:44:45 UTC

Post by Jamie Lokier

Post by Christoph Hellwig

Post by Alex Zarochentsev
Reiser4 may have a mount option for those who like or have to use
traditional O_DIRECTORY semantics. There would be no /metas under
non-directories, if user wants that.

Again, O_DIRECTORY was added to solve a real-world race, not just for
the sake of it. If you really want to add that option I'll research the
CAN number for you so you can name it after that - or just call it -o
insecure directly.

man open(2) explains that O_DIRECTORY is used by opendir() to prevent
blocking when opening pipes and certain devices*, and should only be
used by opendir (of course it isn't only used by opendir, as it's a
handy optimisation).
In fact O_DIRECTORY is also used by Glibc to optimise away stat()
before and fstat() after calls.
An O_NODEVICE flag would be equally secure, and more generally useful.
It's important that device nodes cannot be opened when O_DIRECTORY is
set. This is compatible with reiser4 file-as-directory semantics, but
I don't know if reiser4 actually implements this. If it does (and it
should) then there is no device blocking problem.
That leaves only the optimisation of fstat() in opendir().
But that begs the question: do we want opendir() to succeed on a reiser4 file?

Yes, we do. Why not? Not doing so is more complex.

Post by Jamie Lokier
It's up to us to decide if we like that semantic, or prefer a
different one such as the one I described before (path search enters
file-as-directory, but opendir() directly on it fails), or the Sun
syscalls someone mentioned.
-- Jamie
* Aside: O_NONBLOCK|O_NOCTTY is an effective way to prevent blocking on
many systems. It's best if you can avoid opening devices at all though.

Hans Reiser

2004-08-26 08:43:59 UTC

Post by Christoph Hellwig

Post by Alex Zarochentsev
Reiser4 may have a mount option for those who like or have to use traditional
O_DIRECTORY semantics. There would be no /metas under non-directories, if user
wants that.

Again, O_DIRECTORY was added to solve a real-world race, not just for
the sake of it.

Can you supply more details and we will try to reply concretely? Thanks.

Post by Christoph Hellwig
If you really want to add that option I'll research the
CAN number for you so you can name it after that - or just call it -o
insecure directly.

Jeremy Allison

2004-08-25 20:20:22 UTC

Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

I don't want to comment on any of the technical issues about VFS etc. as
I would be completely out of my depth, however I do want to say 2 things. Firstly,
this is a feature that Samba users have been needing for many years to maintain
compatibility with NTFS and Windows clients. Microsoft no longer sell any servers
or clients without support for multiple data streams per file, and their latest
XP SP2 code *does* use this feature. Whatever the kernel issues I'm really glad
that Hans and Namesys have created something we can use to match this
functionality - soon we will need it in order to be able to exist in
a Microsoft client-dominated world.

My second point is the following. Hans - did you *really* have to reinvent
the wheel w.r.t userspace API calls ? Did you look at this work (done in 2001
for Solaris) ?

http://bama.ua.edu/cgi-bin/man-cgi?fsattr+5
http://bama.ua.edu/cgi-bin/man-cgi?attropen+3C
http://bama.ua.edu/cgi-bin/man-cgi?openat+2

I'm complaining here as someone who will have to write portable code
to try and work on all these "files with streams" systems.

Jeremy.

Hans Reiser

2004-08-26 08:42:20 UTC

Post by Jeremy Allison

Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

I don't want to comment on any of the technical issues about VFS etc. as
I would be completely out of my depth, however I do want to say 2 things. Firstly,
this is a feature that Samba users have been needing for many years to maintain
compatibility with NTFS and Windows clients. Microsoft no longer sell any servers
or clients without support for multiple data streams per file, and their latest
XP SP2 code *does* use this feature. Whatever the kernel issues I'm really glad
that Hans and Namesys have created something we can use to match this
functionality - soon we will need it in order to be able to exist in
a Microsoft client-dominated world.

I agree that your work is important without agreeing that MS client
domination will last.;-) It is indeed my desire to give you every
single feature you need to emulate MS streams within files, but doing it
using directories that are files. I would like to support you in
emulating windows faster than windows.

Post by Jeremy Allison
My second point is the following. Hans - did you *really* have to reinvent
the wheel w.r.t userspace API calls ? Did you look at this work (done in 2001
for Solaris) ?

I interviewed for the file system architect job at Sun in, I think,
1999, and they offered me the job conditional on my giving up on my
Linux work. (After much trying and failing to convince them that it
would be okay for me to work on Linux also, I declined the job, much to
my fiscal loss and work satisfaction.)

They do not do a pure job of implementing attributes in the file
namespace though. There are far more distinctions between files and
attributes than are necessary that are described in these man pages
below, and those distinctions cause a loss of closure. I can say more
on that if asked.

Post by Jeremy Allison
http://bama.ua.edu/cgi-bin/man-cgi?fsattr+5
http://bama.ua.edu/cgi-bin/man-cgi?attropen+3C
http://bama.ua.edu/cgi-bin/man-cgi?openat+2
I'm complaining here as someone who will have to write portable code
to try and work on all these "files with streams" systems.
Jeremy.

Chris Mason

2004-08-25 20:22:14 UTC

Post by Hans Reiser
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus. Sigh.
Dear Christoph,
Let me see if I can summarize what you and your contingent are saying,
and if I misconstrue anything, let me know.;-)

Just for fun why don't we look at the way things are today:

1) reiser4 has semantics that do belong at the VFS level. They weren't
implemented at the VFS level for a variety of reasons, none of which
really matter right now.

2) new kernel patches that fragment the application developers between
apis are a bad thing. There does need to be one interface here, and it
is in Hans' best interest to unify his work by working with people to
introduce new kernel wide apis.

This starts with exactly what Christoph described in writing a short
summary of how you want things to work today. Since we can't resist,
we'll also go ahead and rehash all the old flame wars over this, but try
to include some new ideas about where you want to see the reiser4
interfaces in 6 months as well.

-chris

Hans Reiser

2004-08-26 08:42:50 UTC

Post by Chris Mason
This starts with exactly what Christoph described in writing a short
summary of how you want things to work today. Since we can't resist,
we'll also go ahead and rehash all the old flame wars over this, but try
to include some new ideas about where you want to see the reiser4
interfaces in 6 months as well.
-chris

Did Christoph read the www.namesys.com URL that has for several years
been hiding from him? ;-)

Hans

Christoph Hellwig

2004-08-26 09:36:45 UTC

Post by Hans Reiser
Did Christoph read the www.namesys.com URL that has for several years
been hiding from him? ;-)

Yes, and I'm sick of your marketing bullshit Hans.

Chris Friesen

2004-08-25 20:23:33 UTC

Post by Hans Reiser
Making files into directories caused only two applications out of the
entire OS to notice the change, and that was because of a bug in what
error code we returned that we are going to fix. You think that was a
disaster; I think it was a triumph.
Since we have such a performance lead, Namesys is about to change its
focus from the storage layer to semantics, look at
www.namesys.com/whitepaper.html for details. Semantic enhancements are
the important stuff, and finally Namesys is where we have all the
storage layer prerequisites done right, and the real work can begin.

Just curious about your comments on Jamie Lokier's suggestions for enabling
files-as-directories semantics without breaking existing apps.

Chris

Andrew Morton

2004-08-25 22:28:05 UTC

Post by Hans Reiser
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus.

Yes, but I didn't say "flame Christoph and ignore the issues" ;)

There are lots of little things to do with implementation, coding style,
module exports, deadlocks, what code goes where, etc. These are all normal
daily kernel business and we should set them aside for now and concentrate
on the bigger issues.

And as I see it, there are two big issues:

a) reiser4 extends the Linux API in ways which POSIX/Unix/etc do not
anticipate and

b) it does this within the context of just a single filesystem.

I see three possible responses:

a) accept the reiser4-only extensions as-is (possibly with post-review
modifications, of course) or

b) accept the reiser4-only extensions with a view to turning them into
kernel-wide extensions at some time in the future, so all filesystems
will offer the extensions (as much as poss) or

c) reject the extensions.

My own order of preference is b) c) a). The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

b) means that at some time in the future we need to hoist the reiser4
extensions (at a conceptual level) (and probably with restrictions) up into
the VFS. This will involve much thought, argument and work.

To get us started on this route it would really help me (and, probably,
others) if you could describe what these API extensions are in a very
simple way, without referring to incomprehensible web pages, and without
using terms which non-reiser4 people don't understand.

It would be best if each extension was addressed in a separate email
thread.

We also need to discuss what a reiser4 "module" is, what its capabilities
are, and what licensing implications they have.

Then, we can look at each one and say "yup, that makes sense - we want
Linux to do that" and we can also think about how we would implement it at
the VFS level.

If we follow the above route I believe we can make progress in a technical
direction and not get deadlocked on personal/political stuff.

Now, an alternative to all the above is to just merge reiser4 as-is, after
addressing all the lower-level coding issues. And see what happens. That
may be a thing which Linus wishes to do. I'm easy.

Spam

2004-08-25 22:51:46 UTC

Post by Andrew Morton
a) reiser4 extends the Linux API in ways which POSIX/Unix/etc do not
anticipate and
b) it does this within the context of just a single filesystem.
a) accept the reiser4-only extensions as-is (possibly with post-review
modifications, of course) or
b) accept the reiser4-only extensions with a view to turning them into
kernel-wide extensions at some time in the future, so all filesystems
will offer the extensions (as much as poss) or
c) reject the extensions.
My own order of preference is b) c) a). The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

This last sentence makes me wonder. Where is Linux heading? The idea
that a FS cannot contain features that no other FS has is very
scary.

I am all for uniformity, but not at the expense of shutting down
advanced progress that Linux is so badly needing.

This talk about old UNIX seems like people want to still live in the
'70s and not look forward. Please wake up!

~S

Christoph Hellwig

2004-08-25 22:51:15 UTC

Post by Spam
This last sentence makes me wonder. Where is Linux heading? The idea
that a FS cannot contain features that no other FS has is very
scary.
I am all for uniformity, but not at the expense of shutting down
advanced progress that Linux is so badly needing.
This talk about old UNIX seems like people want to still live in the
'70s and not look forward. Please wake up!

Just because semantics are at the VFS layer doesn't mean every
filesystem has to implement them.

Linus Torvalds

2004-08-25 22:59:43 UTC

Post by Andrew Morton
My own order of preference is b) c) a). The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

This last sentence makes me wonder. Where is Linux heading? The idea
that a FS cannot contain features that no other FS has is very
scary.

That's not what Andrew said or meant.

Note the "cannot offer". As in "there is no way to offer them even if the
filesystem could support it otherwise".

We have tons of filesystems that do things other filesystems cannot do.
Most filesystems support writing to a file - despite the fact that some
filesystems (iso9660 being an obvious one) cannot. The infrastructure is
there in the VFS layer, and it becomes a _choice_ for the filesystem
whether it offers certain capabilities.

So look at what Andrew said, again: his top choice would be (b). Let's see
what that was again, shall we?

Post by Spam
b) accept the reiser4-only extensions with a view to turning them into
kernel-wide extensions at some time in the future, so all filesystems
will offer the extensions (as much as poss) or

In other words, if reiserfs does something special, we should make
standard interfaces for doing that special thing, so that everybody can
do it without stepping on other people's toes.

That doesn't mean that we'd _force_ everybody to do it. The same way we
don't force iso9660 to write to a CD-ROM.

Linus
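
The point about capabilities being a per-filesystem choice can be pictured with a toy operations table. This is a sketch, not kernel code: `FilesystemOps` and `generic_write` are hypothetical names, and returning EINVAL for a missing operation is an assumption of the example.

```python
import errno


class FilesystemOps:
    """Hypothetical operations table; None means 'not offered'."""

    def __init__(self, read=None, write=None):
        self.read = read
        self.write = write


def generic_write(ops, data):
    """Generic-layer entry point: dispatch if offered, else fail cleanly."""
    if ops.write is None:
        return -errno.EINVAL
    return ops.write(data)


store = bytearray()


def _append(data):
    """Write operation of the toy read-write filesystem."""
    store.extend(data)
    return len(data)


# One filesystem fills in the write slot, the other leaves it empty.
read_write_fs = FilesystemOps(read=lambda: bytes(store), write=_append)
read_only_fs = FilesystemOps(read=lambda: b"disc contents")
```

The generic layer defines the interface once; whether a slot is filled in is up to each filesystem, which is the sense in which nobody is forced to implement an extension.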

Spam

2004-08-25 23:19:35 UTC

Post by Linus Torvalds

Post by Andrew Morton
My own order of preference is b) c) a). The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

This last sentence makes me wonder. Where is Linux heading? The idea
that a FS cannot contain features that no other FS has is very
scary.

That's not what Andrew said or meant.
Note the "cannot offer". As in "there is no way to offer them even if the
filesystem could support it otherwise".
We have tons of filesystems that do things other filesystems cannot do.
Most filesystems support writing to a file - despite the fact that some
filesystems (iso9660 being an obvious one) cannot. The infrastructure is
there in the VFS layer, and it becomes a _choice_ for the filesystem
whether it offers certain capabilities.
So look at what Andrew said, again: his top choice would be (b). Let's see
what that was again, shall we?

Post by Spam
b) accept the reiser4-only extensions with a view to turning them into
kernel-wide extensions at some time in the future, so all filesystems
will offer the extensions (as much as poss) or

In other words, if reiserfs does something special, we should make
standard interfaces for doing that special thing, so that everybody can
do it without stepping on other people's toes.

Agreed that would be the best. But how much time and effort will it
be, and how much of the original ideas would be lost on the way to
implement them in the VFS? Especially with new and very advanced
FS's like Reiser4.

Isn't the line between the actual file system and the virtual one
very hair thin? Where would the separation lie in Reiser4?

Post by Linus Torvalds
That doesn't mean that we'd _force_ everybody to do it. The same way we
don't force iso9660 to write to a CD-ROM.
Linus

I got caught in the moment of flame war. My apologies.

~S

Andrew Morton

2004-08-25 23:32:25 UTC

Post by Linus Torvalds
In other words, if reiserfs does something special, we should make
standard interfaces for doing that special thing, so that everybody can
do it without stepping on other people's toes.

Agreed that would be the best. But how much time and effort will it
be

Zero.

We can add these new features tomorrow, as reiser4-only features, with a
plan in hand to generalise them later.

-->>__if__<<-- we think these are features which Linux should offer.

Jeremy Allison

2004-08-25 23:37:39 UTC

Post by Andrew Morton
We can add these new features tomorrow, as reiser4-only features, with a
plan in hand to generalise them later.
-->>__if__<<-- we think these are features which Linux should offer.

Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Jeremy.

Wichert Akkerman

2004-08-25 23:46:29 UTC

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Wichert.

--
Wichert Akkerman <***@wiggy.net> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.

Nicholas Miell

2004-08-26 00:42:21 UTC

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

As to how to do it, I think the Solaris interface is reasonably decent.
The overview is at http://docs.sun.com/db/doc/816-0220/6m6nkorp9?a=view

(An important detail for those who want to access their
multiple-data-streams from non-MDS aware apps is the runat shell
command, which basically does a chdir into the specified file's
attribute directory and then runs a command. i.e. 'runat ~/blah ls' will
list the ~/blah's attributes.)

The only real problem I have with their design is calling them
attributes and using "at" everywhere.

"Attributes", because it will get confused with the current Linux xattr
implementation (which is still useful for things that actually are file
attributes, like security labels, ACLs, weird attributes that
FAT/NTFS/whatever have, etc.).

I don't like "at" because the API changes don't have anything to do with
the actual attributes. It's a general set of changes to allow paths
relative to a fd instead of the cwd, and doesn't really have anything
specifically to do with attributes (with the exception of the O_XATTR
flag).

Replace "at" with "rel" and O_XATTR with O_FORK or O_MULTI or something,
and it's all good.
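
The "paths relative to a fd instead of the cwd" idea can be tried from Python, whose os.open() accepts a `dir_fd` argument. The sketch below uses an ordinary directory as the base; it does not touch attribute directories or O_XATTR, and `read_relative` is a name invented for the example.

```python
import os
import tempfile


def read_relative(dir_fd, name):
    """Read `name` resolved relative to the open directory `dir_fd`."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "summary"), "w") as f:
        f.write("hello")
    # Hold the directory open; later lookups do not depend on the cwd.
    dfd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        content = read_relative(dfd, "summary")
    finally:
        os.close(dfd)
```

Nothing here is specific to attributes, which is the point being made about the naming: the same call works for any directory descriptor.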

Jamie Lokier

2004-08-26 01:03:55 UTC

Post by Nicholas Miell
Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

Additionally, all of those things you describe should be deleted if
the file is modified -- to indicate that they're no longer valid and
should be regenerated if needed.

Whereas there are some other kinds of metadata which should not be
deleted if the file is modified.

-- Jamie

Nicholas Miell

2004-08-26 01:26:47 UTC

Post by Jamie Lokier

Post by Nicholas Miell
Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

Additionally, all of those things you describe should be deleted if
the file is modified -- to indicate that they're no longer valid and
should be regenerated if needed.
Whereas there are some other kinds of metadata which should not be
deleted if the file is modified.
-- Jamie

Presumably the app which uses the metadata will be smart enough to
compare the st_mtime of the MDS/stream/attribute/whatever (can we choose
a name for these things now?) to the st_mtime of the file and do the
right thing.

thumbnail - regenerate it
summary - keep it, it's relatively independent of the file's exact
contents
signature - always verify it
etc.
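A rough sketch of the st_mtime comparison suggested above, assuming the derived data (here a thumbnail) lives in an ordinary separate file, since no stream or attribute API is settled in this thread. The names `is_stale` and `demo` and the file names are hypothetical.

```python
import os
import tempfile


def is_stale(source, derived):
    """True if `derived` is missing or older than `source` by st_mtime."""
    try:
        derived_mtime = os.stat(derived).st_mtime_ns
    except FileNotFoundError:
        return True
    return derived_mtime < os.stat(source).st_mtime_ns


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "photo.jpg")
        thumb = os.path.join(tmp, "photo.thumb")
        for path in (image, thumb):
            with open(path, "wb") as f:
                f.write(b"x")
        # Set explicit whole-second timestamps so the demo does not
        # depend on clock or filesystem timestamp resolution.
        os.utime(image, ns=(1_000_000_000, 1_000_000_000))
        os.utime(thumb, ns=(2_000_000_000, 2_000_000_000))
        before = is_stale(image, thumb)  # thumbnail newer than image
        os.utime(image, ns=(3_000_000_000, 3_000_000_000))
        after = is_stale(image, thumb)   # image modified afterwards
        return before, after
```

A thumbnail cache would regenerate when `is_stale` returns True; a signature check would ignore the result and always verify.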


Jamie Lokier

2004-08-26 01:53:20 UTC

Post by Nicholas Miell

Post by Jamie Lokier
Additionally, all of those things you describe should be deleted if
the file is modified -- to indicate that they're no longer valid and
should be regenerated if needed.
Whereas there are some other kinds of metadata which should not be
deleted if the file is modified.
-- Jamie

Presumably the app which uses the metadata will be smart enough to
compare the st_mtime of the MDS/stream/attribute/whatever (can we choose
a name for these things now?) to the st_mtime of the file and do the
right thing.

[This is straying off the topic of files-as-directories].

st_mtime tests are weak. They break sometimes, and are not suitable
for strong data models such as transparently caching generated data
from a file's contents.

They're especially breakable if you change a file and then read it
within a second. Sometimes, more than a second.

As has been raised before, nanosecond timestamps (a) don't solve this
unless they're stored in the filesystem; (b) even then they will fail
one day, on a sufficiently fast box; (c) don't work when there are
changes to wall clock time.

They're fine where you don't care if the generated data is wrong sometimes.

If you want the _equivalent_ behaviour to opening and parsing the
file, but fast, then they're no good.

Modification serial numbers would work, though.

-- Jamie
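To illustrate the "modification serial numbers" alternative mentioned above, here is an in-memory sketch. It assumes a hypothetical file object that bumps a counter on every write; no existing Linux interface is implied, and the class names `VersionedFile` and `DerivedCache` are invented for this example.

```python
from typing import Callable, Dict, Tuple


class VersionedFile:
    """In-memory stand-in for a file carrying a modification serial number."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.serial = 0

    def write(self, data: bytes) -> None:
        self.data = data
        self.serial += 1  # bumped on every modification, independent of any clock


class DerivedCache:
    """Caches a value computed from a file, validated by the file's serial."""

    def __init__(self, compute: Callable[[bytes], object]) -> None:
        self._compute = compute
        self._entries: Dict[VersionedFile, Tuple[int, object]] = {}
        self.recomputed = 0

    def get(self, f: VersionedFile) -> object:
        entry = self._entries.get(f)
        if entry is None or entry[0] != f.serial:
            # Serial differs (or nothing cached yet): regenerate.
            self.recomputed += 1
            entry = (f.serial, self._compute(f.data))
            self._entries[f] = entry
        return entry[1]


def demo():
    f = VersionedFile(b"one two")
    cache = DerivedCache(lambda data: len(data.split()))
    first = cache.get(f)
    again = cache.get(f)             # served from cache, no recompute
    f.write(b"one two three")        # two writes in quick succession
    f.write(b"one two three four")   # still produce distinct serials
    after = cache.get(f)
    return first, again, after, cache.recomputed
```

Unlike a timestamp comparison, two writes within the same clock tick still change the serial, and wall-clock adjustments have no effect on the check.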

Nicholas Miell

2004-08-26 02:02:43 UTC

Post by Jamie Lokier

Post by Nicholas Miell

Post by Jamie Lokier
Additionally, all of those things you describe should be deleted if
the file is modified -- to indicate that they're no longer valid and
should be regenerated if needed.
Whereas there are some other kinds of metadata which should not be
deleted if the file is modified.
-- Jamie

Presumably the app which uses the metadata will be smart enough to
compare the st_mtime of the MDS/stream/attribute/whatever (can we choose
a name for these things now?) to the st_mtime of the file and do the
right thing.

[This is straying off the topic of files-as-directories].
st_mtime tests are weak. They break sometimes, and are not suitable
for strong data models such as transparently caching generated data
from a file's contents.
They're especially breakable if you change a file and then read it
within a second. Sometimes, more than a second.
As has been raised before, nanosecond timestamps (a) don't solve this
unless they're stored in the filesystem; (b) even then they will fail
one day, on a sufficiently fast box; (c) don't work when there are
changes to wall clock time.
They're fine where you don't care if the generated data is wrong sometimes.
If you want the _equivalent_ behaviour to opening and parsing the
file, but fast, then they're no good.
Modification serial numbers would work, though.
-- Jamie

Remember that the Solaris attribute model is just another filesystem
tree rooted in a hidden directory associated with a regular file. If you
can come up with semantics that work in general for all files, that's
fine.

inodes that delete themselves when other inodes are changed creep me
out.


Hans Reiser

2004-08-26 08:41:32 UTC

Post by Jamie Lokier

Post by Nicholas Miell
Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

Additionally, all of those things you describe should be deleted if
the file is modified -- to indicate that they're no longer valid and
should be regenerated if needed.
Whereas there are some other kinds of metadata which should not be
deleted if the file is modified.
-- Jamie

Yes, I agree.

Actually we plan to have a whole link taxonomy, and one expected feature
is that some links don't count towards the refcount needed to keep an
object in existence (For instance, the links between key words and text
documents, you don't want to have to explicitly unlink every keyword in
order to delete a document).

Jeremy Allison

2004-08-26 01:22:30 UTC

Post by Nicholas Miell
Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

Which is exactly how Windows apps have started to use streams.

Post by Nicholas Miell
As to how to do it, I think the Solaris interface is reasonably decent.
The overview is at http://docs.sun.com/db/doc/816-0220/6m6nkorp9?a=view

I agree. This is an interface Samba can live with I think. I
was thinking of implementing it anyway, just so I could piss
off the Linux kernel developers by saying "oh if you need proper
Windows semantics on Samba you have to use an advanced OS like
Solaris, Linux doesn't cut it" :-) :-).

Post by Nicholas Miell
The only real problem I have with their design is the calling them
attributes and using "at" everywhere.

Yep - they're different from xattrs. The easiest way to remember
this is that file streams are seekable and get a fd, xattrs aren't
and don't :-).

Jeremy.

Matt Mackall

2004-08-26 04:44:25 UTC

Post by Nicholas Miell

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.

While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.

A file is a stream of bytes.

--
Mathematics is the supreme nostalgia of our time.

Nicholas Miell

2004-08-26 05:09:08 UTC

Post by Matt Mackall

Post by Nicholas Miell

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.
While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.
A file is a stream of bytes.

"OMG! It breaks tar and email!!!" argument doesn't fly. Things break all
the time and are fixed. It's called progress.

cp, grep, cat, and tar will continue to work just fine on files with
multiple streams.

tar and cp will lose the extra streams until somebody fixes them, but
they lose ACLs and xattrs right now, and I don't hear anybody suggesting
that ACLs and xattrs be removed from the kernel because of this.

Fixing programs that do recursive filesystem traversals is a matter of a
glibc patch to nftw(3) and a modification to their option processing.

Holding back a useful feature because you don't want to upgrade
coreutils is just plain dumb.

(BTW, for email, multipart/parallel is a start, but a specific multipart
content type for multi-stream file attachments would probably be more
appropriate.)


James Morris

2004-08-26 05:17:16 UTC

Post by Nicholas Miell
(BTW, for email, multipart/parallel is a start, but a specific multipart
content type for multi-stream file attachments would probably be more
appropriate.)

I don't understand why something like this is so important that it needs
underlying VFS support. Multipart messages seem to have survived fine up
until now. Where is the justification?

- James

--
James Morris
<***@redhat.com>

Matt Mackall

2004-08-26 05:32:00 UTC

Post by Nicholas Miell

Post by Matt Mackall

Post by Nicholas Miell

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.
While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.
A file is a stream of bytes.

"OMG! It breaks tar and email!!!" argument doesn't fly. Things break all
the time and are fixed. It's called progress.

What it breaks is the concept of a file. In ways that are ill-defined,
not portable, hard to work with, and needlessly complex. Along the
way, it breaks every single application that ever thought it knew what
a file was.

Progress? No, this has been done before. Various dead operating
systems have done it or similar and regretted it. Most recently MacOS,
which jumped through major hurdles to begin purging themselves of
resource forks when they went to OS X. They're still there, but
heavily deprecated.

Post by Nicholas Miell
cp, grep, cat, and tar will continue to work just fine on files with
multiple streams.

Find some silly person with an iBook and open a shell on OS X. Use cp
to copy a file with a resource fork. Oh look, the Finder has no idea
what the new file is, even though it looks exactly identical in the
shell. Isn't that _wonderful_? Now try cat < a > b on a file with a
fork. How is that ever going to work?

I like cat < a > b. You can keep your progress.

--
Mathematics is the supreme nostalgia of our time.

Denis Vlasenko

2004-08-26 07:34:38 UTC

Post by Matt Mackall

Post by Nicholas Miell

Post by Matt Mackall
A file is a stream of bytes.

"OMG! It breaks tar and email!!!" argument doesn't fly. Things break all
the time and are fixed. It's called progress.

What it breaks is the concept of a file. In ways that are ill-defined,
not portable, hard to work with, and needlessly complex. Along the
way, it breaks every single application that ever thought it knew what
a file was.
Progress? No, this has been done before. Various dead operating
systems have done it or similar and regretted it. Most recently MacOS,
which jumped through major hurdles to begin purging themselves of
resource forks when they went to OS X. They're still there, but
heavily deprecated.

Post by Nicholas Miell
cp, grep, cat, and tar will continue to work just fine on files with
multiple streams.

Find some silly person with an iBook and open a shell on OS X. Use cp
to copy a file with a resource fork. Oh look, the Finder has no idea
what the new file is, even though it looks exactly identical in the
shell. Isn't that _wonderful_? Now try cat < a > b on a file with a
fork. How is that ever going to work?
I like cat < a > b. You can keep your progress.

cat <a >b does not preserve the following file properties even on standard
UNIX filesystems: name, owner, group, permissions.

--
vda
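A small sketch of the point above, assuming a POSIX system: copying only the byte stream, as `cat < a > b` does, reproduces the contents but not the permission bits. The helper `cat_redirect` is a made-up stand-in for the shell redirection.

```python
import os
import stat
import tempfile


def cat_redirect(src, dst):
    """Mimic `cat < src > dst`: copy the byte stream and nothing else."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(65536)
            if not chunk:
                break
            fout.write(chunk)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a")
        b = os.path.join(tmp, "b")
        with open(a, "wb") as f:
            f.write(b"#!/bin/sh\necho hi\n")
        os.chmod(a, 0o755)  # mark the source executable
        cat_redirect(a, b)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            same_bytes = fa.read() == fb.read()
        a_exec = bool(os.stat(a).st_mode & stat.S_IXUSR)
        b_exec = bool(os.stat(b).st_mode & stat.S_IXUSR)
        return same_bytes, a_exec, b_exec
```

The new file gets default creation permissions filtered by the umask, so the execute bit set on the source does not carry over even though the bytes are identical.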

Markus Törnqvist

2004-08-26 07:53:48 UTC

Post by Matt Mackall
What it breaks is the concept of a file. In ways that are ill-defined,
not portable, hard to work with, and needlessly complex. Along the
way, it breaks every single application that ever thought it knew what
a file was.

It breaks the concept of a file. In ways that offer more versatility,
challenge the imagination to make even better progress and keep
Linux competing with competitors who are implementing this stuff
as we speak.

I for one would truly welcome the coming of thumbnails and descriptions
in picture files, because I have a real-life project going on where
that would be extremely handy to have in the actual file.
Were I any richer, I'd pay Namesys to have this work for me :)

Post by Matt Mackall
Find some silly person with an iBook and open a shell on OS X. Use cp
to copy a file with a resource fork. Oh look, the Finder has no idea
what the new file is, even though it looks exactly identical in the
shell. Isn't that _wonderful_? Now try cat < a > b on a file with a
fork. How is that ever going to work?

Then I guess OS X ships a broken implementation of cp, yes?

On the cat example, what if cat < a > b simply copies the "main stream"
and not the metadata, as a feature. The key being, "as a feature"

The metadata streams could get file descriptors of their own OR
another program, streamcat or something, could be written to compensate.

Post by Matt Mackall
I like cat < a > b. You can keep your progress.

With all due respect, I hope not too many people agree with you :)

--
mjt

Helge Hafting

2004-08-26 08:31:54 UTC

Matt Mackall wrote:
[...]

Post by Matt Mackall
What it breaks is the concept of a file. In ways that are ill-defined,
not portable, hard to work with, and needlessly complex. Along the
way, it breaks every single application that ever thought it knew what
a file was.
Progress? No, this has been done before. Various dead operating
systems have done it or similar and regretted it. Most recently MacOS,
which jumped through major hurdles to begin purging themselves of
resource forks when they went to OS X. They're still there, but
heavily deprecated.

Post by Nicholas Miell
cp, grep, cat, and tar will continue to work just fine on files with
multiple streams.

Find some silly person with an iBook and open a shell on OS X. Use cp
to copy a file with a resource fork. Oh look, the Finder has no idea
what the new file is, even though it looks exactly identical in the
shell. Isn't that _wonderful_?

It is what I'd expect. Now, use cp -R to copy the file
_with its directory_, and see if that fares better. If not - bad
implementation of fs and/or cp. The way I see file-as-directory
is that _file_ operations (like the reads issued by cat) only
work on the file part. You want the directory part? Use
directory operations such as those "cp -R" use.

Post by Matt Mackall
Now try cat < a > b on a file with a
fork. How is that ever going to work?

It is going to copy the _file_ part, of course. I wouldn't
expect anything else - "cat" doesn't deal with directories.

This also raises the question of when to use file-as-directory.
A usage where you need everything to follow the file, even
when using "cat" calls out for an application that puts everything
into one file. Directory-as-file is the wrong tool for that job, so don't
worry about such problems.

Sticking thumbnails in a file-as-dir is another story though. Moving the
file (with "mv a b", not "cat a>b;rm a") moves the image file _and_ the
thumbnail. No time wasted on regenerating thumbnails, no disk space
or cleanup time wasted on dangling thumbs. Use the image file
for its intended purposes (view it, mail it off, serve it on the web,
edit it, print it) and the thumbnail doesn't get in the way
because it isn't embedded in the file format. Embedding it
in the file might work for JPEG, which supports generic embedded data,
but it surely won't work for every image format out there.

This is my idea of how file-as-directory should work.

Helge Hafting

Paul Jackson

2004-08-26 06:53:58 UTC

Post by Nicholas Miell
"OMG! It breaks tar and email!!!" argument doesn't fly. Things break all
the time and are fixed. It's called progress.

Yes - we break things all the time. But we take notice, when we are
breaking things, of how long standing and deeply embedded they are.

The deeper the roots, the more respect we show it, and the harder
it will be to change.

Heaping scorn on someone who is reluctant to change something as
deeply embedded as "a file is a byte stream" does not further the
discussion.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <***@sgi.com> 1.650.933.1373

Jamie Lokier

2004-08-26 05:23:19 UTC

Post by Matt Mackall

Post by Nicholas Miell
Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.
While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.
A file is a stream of bytes.

I couldn't agree more. Metadata which is easily lost is a terrible
place to put a "document summary".

However, it's not a bad place to put thumbnails which are generated
from the file contents, if they can be regenerated after transporting.
It's slightly better than a ".thumbnails" directory because the latter
won't follow renames and things like that.

It's a very good place to put metadata which is semantically bound to
the file in special ways not intended for transport, such as security
attributes, auto-invalidated digests of the file's contents, an
auto-unpacked view of an archive, a virtual tree of an XML or other
structured file, or an auto-generated textual representation of a
binary file.

The key concept is "not intended for transport".

Generally that means permissions metadata, and things which are
re-generated from the file's contents on demand.*

It's a very bad place to put an abstract of a text document, or the
names of authors, or the MIME content type, or the character encoding
of the document, or the name of the shell to run the file as a script,
unless these things are also deducible from the file's contents.
However, sometimes the content doesn't give any clue, and then
something like the character encoding of a text file is usefully
stored in the metadata simply because that's better than nothing.

Of course the facility can be abused. What does Windows XP do? Does
it only store the appropriate kind of metadata in the alternate
streams? Even Windows XP users expect files transferred over HTTP or
WebDAV to work, don't they?

-- Jamie

* - By my definition, modification time arguably shouldn't be metadata.
Personally I like downloads to be stamped with their source's
modification time, which is why I don't use Mozilla's download
manager but cut-and-paste URLs to "wget -N" instead.

Helge Hafting

2004-08-26 08:16:28 UTC

Post by Matt Mackall

Post by Nicholas Miell

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.

Utilities that work on files only, such as cat, should keep
working. No problem there. If you cat a file that also is
a directory, then the file contents is all you get - by design.

Utilities that _only_ traverse the directory tree, such as find,
should keep working too. Perhaps with a very minor update
so they don't mistake a file-directory for a file only. I.e.
find must recurse into anything that supports directory semantics.

Something that both recurses and operates on files (cp -a, tar, grep, ...)
will need minor updating. They already know how to handle
files and directories, now they will need an update so
they're open for objects that are both. I.e. let grep scan the file
as usual, then recurse into its directory part in the usual way too.
Performance problems should be avoided by not supporting
directory operations on files with no directory content, which I
believe will be many of them.
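The traversal described above can be sketched against a purely hypothetical in-memory model, where a node may carry a file part, a directory part, or both. This is not reiser4's interface or any real filesystem API; `Node` and `grep_tree` are invented names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    """Hypothetical object with an optional file part and an optional directory part."""
    data: Optional[bytes] = None                                # file part, if any
    children: Dict[str, "Node"] = field(default_factory=dict)  # directory part


def grep_tree(node: Node, needle: bytes, path: str = ".") -> Iterator[str]:
    """Yield paths whose file part contains `needle`, then recurse into any directory part."""
    if node.data is not None and needle in node.data:
        yield path
    for name in sorted(node.children):
        yield from grep_tree(node.children[name], needle, f"{path}/{name}")


def demo():
    # photo.jpg is both a file and a directory holding a "caption" entry.
    photo = Node(data=b"JPEG bytes", children={"caption": Node(data=b"holiday photo")})
    root = Node(children={"photo.jpg": photo, "notes.txt": Node(data=b"photo list")})
    return list(grep_tree(root, b"photo"))
```

A node with an empty directory part costs nothing extra to visit, which matches the remark about skipping directory operations on files that have no directory content.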

The "file-as-directory" thing will not be that useful before the tools
get these relatively simple updates. Till then it'll be a toy, which
shouldn't stop it from getting into the VFS. Updating the tools will be
a task for the file-as-dir fans.

(I don't know whether reiser4 does things this way - it is certainly
the way I would want file-as-directory though.)

Post by Matt Mackall
While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.

Stick the file in an attachment and you get the file only.
No problem, it is designed that way. An app that really
wants everything in a single file should use a file structured
for that, not file-as-dir. File-as-dir attaches stuff to a file in a
looser way.

If you want to attach the directory contents too, do what you usually do
when you want to mail someone a directory tree. You can't stick a
directory in an attachment because it is not a file. So you either attach
every file in the tree, or use tar. In this case, an updated tar.

The ftp server shouldn't be a problem. It supports both files and
directories already. It may need a minor update in order to
not mistake directory for file or vice-versa when someone
requests an operation.

ftp> get filename #Will get you the contents of the file part only - by design.
ftp> cd filename #Will change into the directory (if the file indeed provides one.)

Post by Matt Mackall
A file is a stream of bytes.

Sure. And a file with a directory supports both directory and file
operations. You can get the stream of bytes as usual - that's the file
part. Or you can cd into the directory as usual. There isn't much overlap
between file operations and directory operations, so there is little
conflict the way I see it. It is merely a matter of letting the tools know
that being a file no longer rules out the possibility of directory
operations. Getting the VFS right is another matter of course, but I
don't worry about userland tools.

Helge Hafting

Spam

2004-08-26 09:51:38 UTC

Post by Matt Mackall

Post by Nicholas Miell

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Anything that currently stores a file's metadata in another file really
wants this right now. Things like image thumbnails, document summaries,
digital signatures, etc.

That is _highly_ debatable. I would much rather have my cp and grep
and cat and tar and such continue to work than have to rewrite every
tool because we've thrown the file-is-a-stream-of-bytes concept out
the window. Never mind that I've got thumbnails, document summaries,
and digital signatures already.

In Windows, the extra file streams are not lost or removed if you
use a program that doesn't support them. They are only lost if you
move the file to a file system that doesn't support the streams.

Even RAR supports the NTFS file streams.

Post by Matt Mackall
While the number of annoying properties of files with forks is
practically endless, the biggest has got to be utter lack of
portability. How do you stick the thing in an attachment or on an ftp
site? Well you can't because it's NOT A FILE.
A file is a stream of bytes.


Spam

2004-08-26 09:40:28 UTC

Post by Wichert Akkerman

Post by Jeremy Allison
Multiple-data-stream files are something we should offer, definitely (IMHO).
I don't care how we do it, but I know it's something we need as application
developers.

Aside from samba, is there any other application that has a use for
them?

Yes, for example documents, image files etc. The multiple data
streams can contain thumbnails, info about who is editing the file
(useful for networked files) etc. Could be used for version handling
and much more.

Also, just because we do not see all the benefits right now, doesn't
mean there won't be any.

~S

Post by Wichert Akkerman
Wichert.

Andrew Morton

2004-08-26 09:49:56 UTC

Post by Spam
Yes, for example documents, image files etc. The multiple data
streams can contain thumbnails, info about who is editing the file
(useful for networked files) etc. Could be used for version handling
and much more.

All of which can be handled in userspace library code.

What compelling reason is there for doing this in the kernel?

Anton Altaparmakov

2004-08-26 09:03:04 UTC

Post by Andrew Morton

Post by Linus Torvalds
In other words, if reiserfs does something special, we should make
standard interfaces for doing that special thing, so that everybody can
do it without stepping on other peoples toes.

Agreed that would be the best. But how much time and effort will it be

Zero.
We can add these new features tomorrow, as reiser4-only features, with a
plan in hand to generalise them later.
-->>__if__<<-- we think these are features which Linux should offer.

Please don't forget that if the reiser4 features are merged as they are
now, then we will likely be stuck with the API reiser4 chooses. There
will be tools that will rely on it springing up no doubt.

Moving the reiser4 features to VFS later is fine and good, but what if
the VFS doesn't want the same API for those features? Either we would
have to allow reiser4 to continue providing the old API even though the
VFS now provides a new, shiny API or we would have to break all existing
API users on reiser4. Things like "I rebooted into the latest kernel
and my computer failed to boot because essential app FOO failed to
access the reiser4 API - Help!" spring to mind.

Yes, I know I am painting a rather black picture here and I know you
might well say "screw apps", it's been done plenty of times in Linux
kernel development before...

Best regards,

Anton

--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

Hans Reiser

2004-08-26 08:44:32 UTC

Post by Andrew Morton
The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

This last sentence makes me wonder. Where is Linux heading? The idea
that a FS cannot contain features that no other FS has is very
scary.
I am all for uniformity, but not at the expense of shutting down
advanced progress that Linux is so badly needing.
This talk about old UNIX seems like people want to still live in the
70'ies and not look forward. Please wake up!
~S

well said.

Hans Reiser

2004-08-26 08:31:34 UTC

Post by Andrew Morton

Post by Hans Reiser
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus.

Yes, but I didn't say "flame Christoph and ignore the issues" ;)

Oh....;-)

Post by Andrew Morton
There are lots of little things to do with implementation, coding style,
module exports, deadlocks, what code goes where, etc. These are all normal
daily kernel business and we should set them aside for now and concentrate
on the bigger issues.

Yes, you are right, but I am not sure Viro will go along with that..... ;-)

Post by Andrew Morton
a) reiser4 extends the Linux API in ways which POSIX/Unix/etc do not
anticipate and
b) it does this within the context of just a single filesystem.
a) accept the reiser4-only extensions as-is (possibly with post-review
modifications, of course) or
b) accept the reiser4-only extensions with a view to turning them into
kernel-wide extensions at some time in the future, so all filesystems
will offer the extensions (as much as poss) or
c) reject the extensions.
My own order of preference is b) c) a).

I don't object to b), though I think b) should wait for 2.7 and reiser4
should not.

Post by Andrew Morton
The fact that one filesystem will
offer features which other filesystems do not and cannot offer makes me
queasy for some reason.

Andrew, we need to compete with WinFS and Dominic Giampaolo's filesystem
for Apple, and that means we need to put search engine and database
functionality into the filesystem. It takes 11 years of serious
research to build a clean storage layer able to handle doing that.
Reiser4 has done that, finally. None of the other Linux filesystems
have. The next major release of ReiserFS is going to be bursting with
semantic enhancements, because the prerequisites for them are in place
now. None of the other Linux filesystems have those prerequisites.
They won't be able to keep up with the semantic enhancements. This
metafiles and file-directories stuff is actually fairly trivial stuff.

Look guys, in 1993 I anticipated the battle would be here, and I built
the foundation for a defensive tower right at the spot MS and Apple are
now maneuvering towards. Help me get the next level on the tower before
they get here. It is one hell of a foundation, they won't be able to
shake it, their trees are not as powerful. Don't move reiser4 into vfs,
use reiser4 as the vfs. Don't write filesystems, write file plugins and
disk format plugins and all the other kinds of plugins, and you won't be
missing any expressive power that you really want....

Give Saveliev and me some credit. 10 years of hard work at an ivory
tower nobody thought mattered. Now the battle leaves the browser and
swings our way. Don't duplicate that infrastructure, use it.

There is so much we could use help with if talented people like you
chose to contribute.

Post by Andrew Morton
b) means that at some time in the future we need to hoist the reiser4
extensions (at a conceptual level) (and probably with restrictions) up into
the VFS. This will involve much thought, argument and work.
To get us started on this route it would really help me (and, probably,
others) if you could describe what these API extensions are in a very
simple way, without referring to incomprehensible web pages,

what is not comprehensible....?

Post by Andrew Morton
and without
using terms which non-reiser4 people don't understand.

Well, I agree that there is value in defining things in more detail than
we have.

Post by Andrew Morton
It would be best if each extension was addressed in a separate email
thread.
We also need to discuss what a reiser4 "module" is, what its capabilities
are, and what licensing implications they have.
Then, we can look at each one and say "yup, that makes sense - we want
Linux to do that" and we can also think about how we would implement it at
the VFS level.
If we follow the above route I believe we can make progress in a technical
direction and not get deadlocked on personal/political stuff.
Now, an alternative to all the above is to just merge reiser4 as-is, after
addressing all the lower-level coding issues. And see what happens. That
may be a thing which Linus wishes to do. I'm easy.


Andrew Morton

2004-08-26 08:45:42 UTC

Post by Hans Reiser

Post by Andrew Morton
To get us started on this route it would really help me (and, probably,
others) if you could describe what these API extensions are in a very
simple way, without referring to incomprehensible web pages,

what is not comprehensible....?

Pretty much anything at www.namesys.com. The amount of time which is
needed to extract the technical info which one is looking for vastly
exceeds a gnat-like attention span.

As a starting point, please prepare a bullet-point list of
userspace-visible changes which the filesystem introduces, or is planned to
introduce.

And describe the "plugin" system. Why does the filesystem need such a
thing (other filesystems get their features via `patch -p1')?

And what are the licensing implications of plugins? Are they derived
works? Must they be GPL'ed?

Hans Reiser

2004-08-26 09:24:41 UTC

Post by Andrew Morton

Post by Hans Reiser

Post by Andrew Morton
To get us started on this route it would really help me (and, probably,
others) if you could describe what these API extensions are in a very
simple way, without referring to incomprehensible web pages,

what is not comprehensible....?

Pretty much anything at www.namesys.com. The amount of time which is
needed to extract the technical info which one is looking for vastly
exceeds a gnat-like attention span.
As a starting point, please prepare a bullet-point list of
userspace-visible changes which the filesystem introduces, or is planned to
introduce.
And describe the "plugin" system. Why does the filesystem need such a
thing (other filesystems get their features via `patch -p1')?

It takes 6 months or more to become competent to change a typical
filesystem. Creating a new reiser4 plugin is a fun weekend hack for a
programmer. Weekend programmers matter, because they tend to have
clever ideas based on understanding a need they have. How many people
can easily add new features to ext3 or reiserfs V3? Very few.

What happens if you need a disk format change?

Well, in V4, you can easily compose a plugin from plugin methods of
other plugins, write a little piece of code with the one thing you want
different, and add it in. Disk format changes, no big deal, add a new
disk format plugin, or a new item plugin, or a new node plugin, etc.,
and you got your new format.
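
A rough illustration of the "compose a plugin from the methods of other
plugins" idea, as a minimal C sketch. Everything in it is an assumption
made for illustration: struct file_plugin, its two methods, and
derive_masked() are hypothetical names and are not taken from the
reiser4 source. The new plugin copies an existing method table and
replaces only the one method that should behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method table; names are illustrative, not reiser4's. */
struct file_plugin {
    const char *label;
    /* Store up to cap bytes of buf into disk; return bytes stored. */
    size_t (*write)(char *disk, size_t cap, const char *buf, size_t len);
    /* Non-zero if files of this kind may be truncated. */
    int (*can_truncate)(void);
};

static size_t write_plain(char *disk, size_t cap, const char *buf, size_t len)
{
    if (len > cap)
        len = cap;
    memcpy(disk, buf, len);
    return len;
}

static int truncate_ok(void)     { return 1; }
static int truncate_denied(void) { return 0; }

/* Two existing plugins that share the same write method. */
static const struct file_plugin regular_plugin =
    { "regular", write_plain, truncate_ok };
static const struct file_plugin append_only_plugin =
    { "append-only", write_plain, truncate_denied };

/* The one new piece of code: store the bytes XOR-masked. */
static size_t write_masked(char *disk, size_t cap, const char *buf, size_t len)
{
    size_t i;

    if (len > cap)
        len = cap;
    for (i = 0; i < len; i++)
        disk[i] = (char)(buf[i] ^ 0x5a);
    return len;
}

/* Build a new plugin from an existing one, overriding a single method. */
static struct file_plugin derive_masked(const struct file_plugin *base)
{
    struct file_plugin p;

    assert(base != NULL);
    p = *base;               /* inherit every method of the base plugin */
    p.label = "masked";
    p.write = write_masked;  /* replace only the write method */
    return p;
}
```

Calling derive_masked(&append_only_plugin) gives a plugin that still
refuses truncation but stores data through write_masked(); deriving
from regular_plugin instead keeps truncation allowed.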

There is a huge difference between code that is designed for reuse, and
code that is not. That is the difference between V3 and V4. We were
looking at our V3 balancing code, and it special cased each different
kind of item (an item is a piece of something, which the balancing code
chops objects into as it squeezes them into nodes). It looked like the
complexity of the balancing code was going to be N squared, where N was
the number of different kinds of items.

So, we created item handlers, and wrote balancing code that could slice
and dice and merge any item that implemented all of the operations
required of an item handler. From there it grew, and we made everything
pluggable.
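
The item handler idea can be sketched in C roughly as follows. This is
only an illustration under stated assumptions: struct item, struct
item_ops, the "tail" and "entry" item kinds, and balance_pair() are
hypothetical names, not the actual reiser4 interfaces. The point is
that the balancing routine calls through the operations table and never
special-cases an item kind, so adding a kind means adding one handler
rather than touching the balancing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct item;

/* Hypothetical item-handler interface; not reiser4's real one. */
struct item_ops {
    const char *kind;
    /* Number of indivisible units currently held by the item. */
    size_t (*nr_units)(const struct item *it);
    /* Move up to count units from the head of src to the tail of dst;
     * return the number of units actually moved. */
    size_t (*shift)(struct item *dst, struct item *src, size_t count);
};

struct item {
    const struct item_ops *ops;
    unsigned char data[64];
    size_t used;                 /* bytes of data[] in use */
};

/* Shared helper: shift whole units of the given size between items. */
static size_t shift_units(struct item *dst, struct item *src,
                          size_t count, size_t unit)
{
    size_t avail = src->used / unit;
    size_t room = (sizeof dst->data - dst->used) / unit;
    size_t bytes;

    if (count > avail)
        count = avail;
    if (count > room)
        count = room;
    bytes = count * unit;
    memcpy(dst->data + dst->used, src->data, bytes);
    memmove(src->data, src->data + bytes, src->used - bytes);
    dst->used += bytes;
    src->used -= bytes;
    return count;
}

/* "tail" items: a run of bytes, one unit per byte. */
static size_t tail_units(const struct item *it) { return it->used; }
static size_t tail_shift(struct item *d, struct item *s, size_t n)
{
    return shift_units(d, s, n, 1);
}

/* "entry" items: fixed-size 8-byte records that must not be split. */
#define ENTRY_SIZE 8
static size_t entry_units(const struct item *it) { return it->used / ENTRY_SIZE; }
static size_t entry_shift(struct item *d, struct item *s, size_t n)
{
    return shift_units(d, s, n, ENTRY_SIZE);
}

static const struct item_ops tail_ops  = { "tail",  tail_units,  tail_shift };
static const struct item_ops entry_ops = { "entry", entry_units, entry_shift };

/* Generic balancing step: move half of the right item's surplus into
 * the left item. It works for any kind that fills in item_ops. */
static size_t balance_pair(struct item *left, struct item *right)
{
    size_t l, r;

    assert(left->ops == right->ops);   /* only like items are merged */
    l = left->ops->nr_units(left);
    r = right->ops->nr_units(right);
    if (r <= l)
        return 0;
    return left->ops->shift(left, right, (r - l) / 2);
}
```

With N item kinds, this arrangement needs N handlers and one
balance_pair(), instead of balancing code that grows with every pair of
kinds it has to know about.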

Adding features to the new code costs far less time than adding features
to the old code. I've seen Nikita complain that something would take 6
weeks to do (changing key assignment algorithms), and then it took him 3
afternoons, and that was because of the plugins; when we did something
similar in V3 it took 3 man months.

Post by Andrew Morton
And what are the licensing implications of plugins? Are they derived
works?

Yes.

Hans

Christoph Hellwig

2004-08-26 09:34:07 UTC

Post by Hans Reiser
Andrew, we need to compete with WinFS and Dominic Giampaolo's filesystem
for Apple, and that means we need to put search engine and database

Do you know a nice thing? We (as in the Linux Community) don't have to
compete with anyone. Sure, we're usually trying to be better than
anyone else, but unlike companies under market pressure we can wait until
something is ready.

Again, if you want your work in Linux, cooperate with the Linux
developers. As you have seen in this thread, there's a pretty broad
consensus that we want things working at the VFS (and actually working,
of course).

Markus Törnqvist

2004-08-26 09:37:41 UTC

Post by Christoph Hellwig
Do you know a nice thing? We (as in the Linux Community) don't have to
compete with anyone. Sure, we're usually trying to be better than
anyone else, but unlike companies under market pressure we can wait until
something is ready.

It won't get ready unless it's made ready, yes?

And what about the time when people will move away from Linux because
Linux has nothing but crappy file systems to offer?
Sure, it's not that we'd have to compete, but...

Post by Christoph Hellwig
Again, if you want your work in Linux, cooperate with the Linux
developers. As you have seen in this thread, there's a pretty broad
consensus that we want things working at the VFS (and actually working,
of course).

So move it into VFS. Neext! ;)

--
mjt

