Discussion:
[OpenAFS-devel] openafs / opendfs collaboration
Tom Keiser
2005-01-18 21:46:14 UTC
Hi,

Given that OpenAFS and OpenDCE have been released under different licenses
(IPL versus LGPL), is there any chance of code sharing, or other forms of
collaboration? If there is possibility for code sharing, is it
bidirectional or unidirectional?

One particularly fruitful area for collaboration would be development of a
DFS client for modern OS's. OpenAFS could potentially provide a lot of
help for jumpstarting the necessary kernel support. For instance, m4
macros for modern OS support detection, an updated osi layer, afs_syscall
support for modern kernels, and I'm sure there's plenty more that I
haven't mentioned.

Secondly, I know this is a rather drastic proposal, but is it time to
consider splitting the cache manager out of individual filesystem clients?
If the interfaces are abstract enough, we should be able to have multiple
distributed fs's using the same cache manager API. Yes, there's tons of
little details to be worked out (e.g. credential management, access
control, etc.), but it seems at least remotely feasible. If OpenAFS,
OpenDFS, and possibly NFSv4, could share a single cache management
codebase, that could drastically reduce the duplication of effort. It
would also help reduce the amount of in-kernel code for which each
project is responsible. Anyone else think this is feasible?

Finally, I know Derrick Brashear has been talking about building
multiprotocol fileservers for a long time. This seems like a very
interesting project, and a great way to combine efforts. But, what
licensing impediments must be overcome to make this a reality?


--
Tom Keiser
***@psu.edu
Harald Barth
2005-01-20 14:13:46 UTC
For kernel dependent stuff, you might want to have a look at nnpfs
(from Arla). It has a very nice license, too.

Harald.
John S. Bucy
2005-01-20 16:57:49 UTC
On Tue, Jan 18, 2005 at 04:46:14PM -0500, Tom Keiser wrote:

> Secondly, I know this is a rather drastic proposal, but is it time to
> consider splitting the cache manager out of individual filesystem clients?
> If the interfaces are abstract enough, we should be able to have multiple
> distributed fs's using the same cache manager API. Yes, there's tons of

IIRC, David Howells is working on -- or at least has been talking
about -- a generic cache manager for the Linux kernel.




john
Jeffrey Hutzelman
2005-01-21 01:24:23 UTC
On Tuesday, January 18, 2005 16:46:14 -0500 Tom Keiser <***@psu.edu>
wrote:

> Given that OpenAFS and OpenDCE have been released under different licenses
> (IPL versus LGPL), is there any chance of code sharing, or other forms of
> collaboration? If there is possibility for code sharing, is it
> bidirectional or unidirectional?

Disclaimer: I am not a lawyer, and the following is not legal advice.


In general, it's going to be tricky, because of certain terms in the IBM
Public License which render it not GPL-compatible. In particular, the IPL
requires people who contribute code which implements a patent they hold to
also grant certain rights to use that patent to anyone to whom the combined
program is distributed (there are limitations on these rights; read the
license for details). The GPL does not contain this requirement, and does
not permit people distributing GPL'd code to add such a requirement.

Under the normal GPL, if you combine GPL'd code with other code, the
resulting combination can be distributed only under the terms of the GPL.
So incorporating a piece of IPL'd code into GPL'd software would result in
something you could not distribute, because both the IPL's patent rights
provisions and the GPL's prohibition of such changes would apply at the
same time. Similarly, you can't incorporate a piece of GPL'd code into
OpenAFS, for essentially the same reason.


The fact that OpenDCE was released under the LGPL makes the issue more
interesting, but not really easier. For one thing, the LGPL was really
intended to be used only for libraries, and some of its terms reflect that.
For example, it allows a derivative work to be prepared and distributed
under its terms, but says "The modified work must itself be a software
library."

The LGPL makes a distinction between works which contain parts of the
library or are modified versions of it, and things that merely use the
library, and applies different terms to them. Specifically, a program that
uses an LGPL'd library does not have to be distributed under the same terms
as the library. As a result, it is possible for an LGPL'd library to be
used by software covered by another license, including the IPL.



So, if OpenDCE includes a library that performs some interesting function,
it would be possible to add functionality to OpenAFS that requires that
library. It would be possible to compile the new code and distribute the
binaries, as long as the source code for the relevant parts of both
packages were made available. It would even be possible to include the
entire source for the library as part of the OpenAFS distribution (which
already includes other standalone components covered by other licenses).
And, it works in the reverse direction, too -- OpenDCE could use libraries
from OpenAFS.

What would not be possible would be to combine bits of code from each
distribution to produce a single work, because that combined work would be
covered by both licenses. So, picking up cache management code from
OpenAFS and dropping it into OpenDCE would not work.




That said, there are other interesting possibilities. OpenAFS has pretty
good CVS history, and in many cases we know exactly who contributed what
code. This is particularly true for things that are likely to be of the
most interest to OpenDCE early in its life, like configure tests, linux26
syscall stuff, and so on. I expect in many cases those authors would be
willing to contribute their code as-is to OpenDCE, under whatever license
is needed. There are already several cases where we share code with Arla
in this way.


> Secondly, I know this is a rather drastic proposal, but is it time to
> consider splitting the cache manager out of individual filesystem clients?

That _is_ drastic. Such a thing would need to be cross-platform and have a
clean, well-defined interface that could be used for multiple filesystems.
It would also be necessary to be pretty careful about the licensing issues,
so we don't end up with something we can't use.

Note I'm not saying this is a bad idea, just a lot of work.



> Finally, I know Derrick Brashear has been talking about building
> multiprotocol fileservers for a long time. This seems like a very
> interesting project, and a great way to combine efforts. But, what
> licensing impediments must be overcome to make this a reality?

Actually, I think this is pretty easy from a licensing standpoint.
Both AFS and DFS have fairly well-defined protocols, with multiple
implementations (some better than others). As long as the protocols are
well-enough documented, it should be possible to write independent,
interoperable implementations.


-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA
Derrick J Brashear
2005-01-21 14:00:59 UTC
On Tue, 18 Jan 2005, Tom Keiser wrote:

> Secondly, I know this is a rather drastic proposal, but is it time to
> consider splitting the cache manager out of individual filesystem clients?

It seems like Arla would probably have a better model for us all to follow
if we did so.
Matthew Miller
2005-01-21 15:28:03 UTC
On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:
> >Secondly, I know this is a rather drastic proposal, but is it time to
> >consider splitting the cache manager out of individual filesystem clients?
> It seems like Arla would probably have a better model for us all to follow
> if we did so.

Or on Linux, something based on FUSE, which is apparently now getting
merged.

--
Matthew Miller ***@mattdm.org <http://www.mattdm.org/>
Boston University Linux ------> <http://linux.bu.edu/>
Derrick J Brashear
2005-01-21 17:22:31 UTC
On Fri, 21 Jan 2005, Matthew Miller wrote:

> On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:
>>> Secondly, I know this is a rather drastic proposal, but is it time to
>>> consider splitting the cache manager out of individual filesystem clients?
>> It seems like Arla would probably have a better model for us all to follow
>> if we did so.
>
> Or on Linux, something based on FUSE, which is apparently now getting
> merged.

Arla's nnpfs is actually portable, one filesystem per platform sort of
sucks.
John S. Bucy
2005-01-21 18:55:41 UTC
On Fri, Jan 21, 2005 at 12:22:31PM -0500, Derrick J Brashear wrote:
> On Fri, 21 Jan 2005, Matthew Miller wrote:
>
> >On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:
> >>>Secondly, I know this is a rather drastic proposal, but is it time to
> >>>consider splitting the cache manager out of individual filesystem
> >>>clients?
> >>It seems like Arla would probably have a better model for us all to follow
> >>if we did so.
> >
> >Or on Linux, something based on FUSE, which is apparently now getting
> >merged.
>
> Arla's nnpfs is actually portable, one filesystem per platform sort of
> sucks.

David Howells' cachefs appears to be getting merged in 2.6.11.

Here's the original post:
http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/1166.html



john
Alexander Boström
2005-01-28 14:45:08 UTC
fre 2005-01-21 klockan 12:22 -0500 skrev Derrick J Brashear:
> On Fri, 21 Jan 2005, Matthew Miller wrote:
>
> > On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:

> >> It seems like Arla would probably have a better model for us all to follow
> >> if we did so.
> >
> > Or on Linux, something based on FUSE, which is apparently now getting
> > merged.
>
> Arla's nnpfs is actually portable, one filesystem per platform sort of
> sucks.

But it lacks a nice libnnpfs that one can use to implement a filesystem.

Every OS should have some kind of userland filesystem interface. Linux
might get FUSE (and it might be adequate*), HURD has one, Dragonfly are
aiming at it and on some systems there's nnpfs already. On top of those
interfaces there could be a set of libraries implementing a common API
for all the platforms.

*) Last time I looked at FUSE the security model was: If the current uid
equals the owner of the mountpoint then forward the request to the
userland daemon, without any authentication information like for example
the current uid. This might have or could be changed though.

/abo
Luke Kenneth Casson Leighton
2005-01-30 03:30:20 UTC
On Fri, Jan 28, 2005 at 03:45:08PM +0100, Alexander Boström wrote:
> fre 2005-01-21 klockan 12:22 -0500 skrev Derrick J Brashear:
> > On Fri, 21 Jan 2005, Matthew Miller wrote:
> >
> > > On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:
>
> > >> It seems like Arla would probably have a better model for us all to follow
> > >> if we did so.
> > >
> > > Or on Linux, something based on FUSE, which is apparently now getting
> > > merged.
> >
> > Arla's nnpfs is actually portable, one filesystem per platform sort of
> > sucks.
>
> But it lacks a nice libnnpfs that one can use to implement a filesystem.
>
> Every OS should have some kind of userland filesystem interface. Linux
> might get FUSE (and it might be adequate*), HURD has one, Dragonfly are
> aiming at it and on some systems there's nnpfs already. On top of those
> interfaces there could be a set of libraries implementing a common API
> for all the platforms.
>
> *) Last time I looked at FUSE the security model was: If the current uid
> equals the owner of the mountpoint then forward the request to the
> userland daemon, without any authentication information like for example
> the current uid. This might have or could be changed though.

as of 2.6.7-ish (last time i looked: 2.5 months) there was
no forwarding of security: in fact there was nothing in any of the
APIs about security at all: in fact, root as a user was banned (with
good justification iirc)

also, the xattr handling was (is?) non-existent and i had to
add it, but it was unsuitable for selinux, and that's a design
mismatch between fuse's way of communicating with its userspace
daemon (err -512 "please try later") and selinux's requirement
for instant answers (inability to cope with err -512)

so i started to look at lufs instead, which appeared to be a much
cleaner design.

lufs expects the userspace daemon to handle and manage inodes,
whereas fuse instead keeps an in-memory cache of inodes in
the userspace daemon, does a hell of a lot of extra fstat'ing
for you in order to guarantee file consistency, that sort of thing.

there is an API / library which your userspace daemon is expected to
use: this library handles the communication to the kernel and also it
handles the inode proxy redirection and caching for you.

lufs has a heck of a lot more examples available for it than fuse
does.

that all having been said, i don't think lufs's API has any
security handling either.

six of one, half a dozen of the other, but if you wanted fuse
to support selinux and any other form of security that involves
extended attributes, you would definitely have problems: i didn't get
round to evaluating lufs for selinux (ran out of time and money).

l.

--
<a href="http://lkcl.net">http://lkcl.net</a>
--
Miklos Szeredi
2005-01-30 11:13:04 UTC
> > *) Last time I looked at FUSE the security model was: If the current uid
> > equals the owner of the mountpoint then forward the request to the
> > userland daemon, without any authentication information like for example
> > the current uid. This might have or could be changed though.
>
> as of 2.6.7-ish (last time i looked: 2.5 months) there was
> no forwarding of security: in fact there was nothing in any of the
> APIs about security at all: in fact, root as a user was banned (with
> good justification iirc)

There are two choices for the security model in FUSE. The first
choice is that the userspace filesystem does the permission checking
in each operation. Current uid and gid are available; the group list is
presently not.

The other choice is that the kernel does the normal file mode based
permission checking. Obviously in this case the filesystem can still
implement an additional (stricter) permission policy.

The "root banning" issue is in fact orthogonal to this. The default
operation is that only the user who mounted the filesystem is allowed
to access the contents. This behavior can be switched off with a
mount option, to allow access to all users.

> also, the xattr handling was (is?) non-existant and i had to add
> it,

Looking at the changelog it was added on 2004-03-30, so you must be
using a pretty outdated version.

> but it was unsuitable for selinux, and that's a design mismatch
> between fuse's way of communicating with its userspace daemon (err
> -512 "please try later") and selinux's requirement for instant
> answers (inability to cope with err -512)

Heh? Where did you see error value 512 (ERESTARTSYS)? It's not
something that the userspace daemon can return.

> so i started to look at lufs instead, which appeared to be a much
> cleaner design.

That's pretty subjective. Please back up your statement with concrete
examples, so maybe then I can do something about it.

> lufs expects the userspace daemon to handle and manage inodes,
> whereas fuse instead keeps an in-memory cache of inodes in
> the userspace daemon, does a hell of a lot of extra fstat'ing
> for you in order to guarantee file consistency, that sort of thing.

Well, how much of a "hell of a lot" it actually is depends on many things,
e.g. on whether the backing filesystem is modified externally (not
just through the kernel). If not, then it will stay consistent
without any extra messaging. This can be set by a timeout parameter
for each looked up entry.

The extra flexibility offered by an inode based kernel interface
(FUSE) instead of a path based one (LUFS) I think outweighs the
disadvantage of having to once look up each path element.

> there is an API / library which your userspace daemon is expected to
> use: this library handles the communication to the kernel and also it
> handles the inode proxy redirection and cacheing for you.

Yes, useful for some filesystems (sshfs, ftpfs), useless for others. I
plan to add a generic caching layer to the FUSE library as well.

> lufs has a heck of a lot more examples available for it than fuse
> does.

In the LUFS package, yes. However, I bet there are currently many more
applications which use FUSE than LUFS.

Thanks,
Miklos
Luke Kenneth Casson Leighton
2005-01-30 12:13:02 UTC
On Sun, Jan 30, 2005 at 12:13:04PM +0100, Miklos Szeredi wrote:
> > > *) Last time I looked at FUSE the security model was: If the current uid
> > > equals the owner of the mountpoint then forward the request to the
> > > userland daemon, without any authentication information like for example
> > > the current uid. This might have or could be changed though.
> >
> > as of 2.6.7-ish (last time i looked: 2.5 months) there was
> > no forwarding of security: in fact there was nothing in any of the
> > APIs about security at all: in fact, root as a user was banned (with
> > good justification iirc)
>
> There are two choices for the security model in FUSE. The first
> choice is that the userspace filesystem does the permission checking
> in each operation. Current uid and gid is available, group list is
> presently not.

> The other choice is that the kernel does the normal file mode based
> permission checking. Obviously in this case the filesystem can still
> implement an additional (stricter) permission policy.

if your users are okay with having to run a fuse-mount themselves,
that's okay [to have the kernel do the file mode checking]

the problem with that is that you can't have a "publicly accessible"
mount point like you do on an nfs server.

also if you have a completely different kind of file permission
checking system (which AFS and DFS do), you're stuffed.


> The "root banning" issue is in fact orthogonal to this. The default
> operation is that only the user who mounted the filesystem is allowed
> to access the contents. This behavior can be switched off with a
> mount option, to allow access to all users.
>
> > also, the xattr handling was (is?) non-existant and i had to add
> > it,
>
> Looking at the changelog it was added on 2004-03-30, so you must be
> using a pretty outdated version.

... Release 1.3 - 2004-07-14.

hm, error-at-memory-recall fault, redo from start...

> > but it was unsuitable for selinux, and that's a design mismatch
> > between fuse's way of communicating with its userspace daemon (err
> > -512 "please try later") and selinux's requirement for instant
> > answers (inability to cope with err -512)
>
> Heh? Where did you see error value 512 (ERESTARTSYS)? It's not
> something that the userspace daemon can return.

userspace, no; kernel, yes.

the kernel-part of fuse tells any kernel-level callers to
"go away, come back later".

obviously this gives time for the kernel-part to "wake up" the
userspace daemon, obtain an answer, such that when the kernel-level
caller _does_ come back, the information is available.

the problem with using SELinux to obtain xattrs
"security.selinux" in order to perform security checks
is that the checking is done from in the kernel ITSELF
(security/hooks.c), not by a userspace function call RESULTING
in a kernel call.

therefore when you even attempt to _mount_ a selinux-enabled fuse
filesystem, hooks.c tests to see whether the filesystem supports
xattrs, gets this silly 512 (ERESTARTSYS) error message and goes "nope,
doesn't look like it does".

for various reasons, the details of which i am not aware of,
from what i can gather, getting selinux to support ERESTARTSYS
is tricky.


> > so i started to look at lufs instead, which appeared to be a much
> > cleaner design.
>
> That's pretty subjective. Please back up your statement with concrete
> examples, so maybe then I can do something about it.

i must apologise for not having sufficient time _at present_
to do that: i would therefore ask you to treat my
statement _as_ subjective until/unless demonstrated otherwise.

sorry about that.

> > lufs expects the userspace daemon to handle and manage inodes,
> > whereas fuse instead keeps an in-memory cache of inodes in
> > the userspace daemon, does a hell of a lot of extra fstat'ing
> > for you in order to guarantee file consistency, that sort of thing.
>
> Well, how much "hell of a lot" actually is depends on a lot of things.
> E.g. on whether the backed up filesystem is modified externally (not
> just through the kernel). If not, then it will stay consistent
> without any extra messaging. This can be set by a timeout parameter
> for each looked up entry.
>
> The extra flexibility offered by an inode based kernel interface
> (FUSE) instead of a path based one (LUFS) I think outweighs the
> disadvantage of having to once look up each path element.

mrr, yehhh... mmmm :)

what about a remote NTFS filesystem which supports NT Security
Descriptors, which are "inherited" where you not only don't
have the concept of inodes, but also due to the security
model, a client must look up every path element _anyway_
and perform a conglomeration of the "inheritance" parts of
the ACEs in each security descriptor of the path components?

:)

btw so people don't freak out too badly at that concept,
there _has_ existed for a couple of decades the concept of
"change notify" in remote NT filesystems, where the client
can watch for any significant changes on a filesystem, so you
don't have to end up re-reading all of the path components,
you can get the remote server to _tell_ you when they've
changed - cool, huh?

[btw, nt's change notify is what spurred linux kernel's inotify and
dnotify to be written]


in a nutshell, inodes is an optimisation from a unix
perspective: by providing an inode based interface, you are
burdening _all_ filesystem implementers with that concept.

l.
Miklos Szeredi
2005-01-30 12:40:35 UTC
> > There are two choices for the security model in FUSE. The first
> > choice is that the userspace filesystem does the permission checking
> > in each operation. Current uid and gid is available, group list is
> > presently not.
>
> > The other choice is that the kernel does the normal file mode based
> > permission checking. Obviously in this case the filesystem can still
> > implement an additional (stricter) permission policy.
>
> if your users are okay with having to run a fuse-mount themselves,
> that's okay [to have the kernel do the file mode checking]
>
> the problem with that is that you can't have a "publicly accessible"
> mount point like you do on an nfs server.
>
> also if you have a completely different kind of file permission
> checking system (which AFS and DFS do), you're stuffed.

No. Just fall back on the first option (permission checking in each
operation).

> > Looking at the changelog it was added on 2004-03-30, so you must be
> > using a pretty outdated version.
>
> ... Release 1.3 - 2004-07-14.
>
> hm, error-at-memory-recall fault, redo from start...

OK. From 1.2 onwards it was just a bugfix branch with the new
features going into the FUSE-2 release.

> > Heh? Where did you see error value 512 (ERESTARTSYS)? It's not
> > something that the userspace daemon can return.
>
> userspace no, kernel, yes.
>
> the kernel-part of fuse tells any kernel-level callers to
> "go away, come back later".
>
> obviously this gives time for the kernel-part to "wake up" the
> userspace daemon, obtain an answer, such that when the kernel-level
> caller _does_ come back, the information is available.

It doesn't do that and never did. ERESTARTSYS is only returned if the
operation is interrupted, and in that case the operation is restarted
from scratch, the answer to the old request is never used.

> the problem with using SELinux to obtain xattrs
> "security.selinux" in order to perform security checks
> is that the checking is done from in the kernel ITSELF
> (security/hooks.c), not by a userspace function call RESULTING
> in a kernel call.
>
> therefore when you even attempt to _mount_ a selinux-enabled fuse
> filesystem, hooks.c tests to see whether the filesystem supports
> xattrs, gets this silly 512 (ERESTARTSYS) error message and goes "nope,
> doesn't look like it does".
>
> for various reasons, the details of which i am not aware of,
> from what i can gather, getting selinux to support ERESTARTSYS
> is tricky.

Just disable signal delivery while calling FUSE, and it will never
return -ERESTARTSYS or -ERESTARTNOINTR.

> > The extra flexibility offered by an inode based kernel interface
> > (FUSE) instead of a path based one (LUFS) I think outweighs the
> > disadvantage of having to once look up each path element.
>
> mrr, yehhh... mmmm :)
>
> what about a remote NTFS filesystem which supports NT Security
> Descriptors, which are "inherited" where you not only don't
> have the concept of inodes, but also due to the security
> model, a client must look up every path element _anyway_
> and perform a conglomeration of the "inheritance" parts of
> the ACEs in each security descriptor of the path components?

There's not that much difference between the inode and the path model.
If you say each "path component" corresponds to an inode, you have
just solved this problem.

> btw so people don't freak out too badly at that concept,
> there _has_ existed for a couple of decades the concept of
> "change notify" in remote NT filesystems, where the client
> can watch for any significant changes on a filesystem, so you
> don't have to end up re-reading all of the path components,
> you can get the remote server to _tell_ you when they've
> changed - cool, huh?

Yes. And it would be pretty easy to add support for this to the FUSE
interface. It currently isn't there just because nobody demanded
it.

> [btw, nt's change notify is what spurred linux kernel's inotify and
> dnotify to be written]
>
>
> in a nutshell, inodes is an optimisation from a unix
> perspective: by providing an inode based interface, you are
> burdening _all_ filesystem implementers with that concept.

Yes. However, I think the burden on performance (nothing else) is
justified by the better flexibility.

Thanks,
Miklos
Luke Kenneth Casson Leighton
2005-01-30 13:06:29 UTC
On Sun, Jan 30, 2005 at 01:40:35PM +0100, Miklos Szeredi wrote:

> > the kernel-part of fuse tells any kernel-level callers to
> > "go away, come back later".
> >
> > obviously this gives time for the kernel-part to "wake up" the
> > userspace daemon, obtain an answer, such that when the kernel-level
> > caller _does_ come back, the information is available.
>
> It doesn't do that and never did. ERESTARTSYS is only returned if the
> operation is interrupted, and in that case the operation is restarted
> from scratch, the answer to the old request is never used.

oh??

*confused* - well that's good, then! glad that's cleared up!

[must contact you again about this when i have time]

> > in a nutshell, inodes is an optimisation from a unix
> > perspective: by providing an inode based interface, you are
> > burdening _all_ filesystem implementers with that concept.
>
> Yes. However I think the burden on performance (nothing else), is
> justified by the better flexibility.

i understand.

l.
Luke Kenneth Casson Leighton
2005-01-21 15:49:19 UTC
On Fri, Jan 21, 2005 at 10:28:03AM -0500, Matthew Miller wrote:
> On Fri, Jan 21, 2005 at 09:00:59AM -0500, Derrick J Brashear wrote:
> > >Secondly, I know this is a rather drastic proposal, but is it time to
> > >consider splitting the cache manager out of individual filesystem clients?
> > It seems like Arla would probably have a better model for us all to follow
> > if we did so.
>
> Or on Linux, something based on FUSE, which is apparently now getting
> merged.

i played with fuse and it does some rather croo-joze inode caching
inside the kernel, and maintains a mapping of the inodes on your behalf
so that the userspace program doesn't have to worry about inodes.

lufs on the other hand gleefully lets _you_ manage the inodes in
userspace *cheerful* which of course if you get _wrong_...

anyway...

l.
Garrett Wollman
2005-01-24 05:09:06 UTC
<<On Fri, 21 Jan 2005 09:00:59 -0500 (EST), Derrick J Brashear <***@dementia.org> said:

> It seems like Arla would probably have a better model for us all to follow
> if we did so.

Indeed. Arla worked almost out-of-the-box on FreeBSD/amd64, to the
extent that I'm probably not going to bother working on the FreeBSD
port of OpenAFS.

-GAWollman
Tom Keiser
2005-01-21 21:56:16 UTC
Ivan,

On Fri, 21 Jan 2005 10:27:33 +0100, Ivan Popov <***@medic.chalmers.se> wrote:
> Hi Tom!
>
> On Tue, Jan 18, 2005 at 04:46:14PM -0500, Tom Keiser wrote:
> > Secondly, I know this is a rather drastic proposal, but is it time to
> > consider splitting the cache manager out of individual filesystem clients?
>
> What do you call a filesystem client and a cache manager in this context?
>

I'm (roughly) thinking of clients such as OpenAFS and OpenDFS as
several interacting components:

cache manager:
- responsible for storage of data and metadata
- responsible for cache replacement strategy
- API for interaction with implementation of VFS interface
- API for access by RPC endpoint for things like cache invalidation

credential manager:
- example would be the linux 2.6 keyring implementation

implementation of VFS interface:
- very os-specific stuff
- probably in-kernel unless something like LUFS takes off

RPC endpoint:
- listener for cache invalidations, etc.

RPC client library:
- client stub library

fs-specific syscall:
- PAG management, etc.

This is still an oversimplified view (where to put things like fsprobes?).


> I am afraid that different people (including myself) may think about
> very different things.
>
> > If the interfaces are abstract enough, we should be able to have multiple
> > distributed fs's using the same cache manager API.
>
> Do you mean any besides AFS and DFS?
>

These two are the most obvious. It's less clear whether other
filesystems would actually benefit from a cache manager complex enough
to handle AFS and DFS. It comes down to whether more lightweight
filesystems would benefit from a cache manager that sacrifices some
performance for caching aggressiveness. However, there's nothing to
preclude use of a pluggable algorithm or tunables to set what tradeoff
is desired.

> > help reduce the amount of in-kernel code for which each
> > project is responsible. Anyone else think this is feasible?
>
> Do you mean in-kernel cache management? Then probably no.
> Both filesystems and kernels are of great variety.
>

This is an argument best left for another day. Suffice it to say, I
don't think supporting M in-kernel filesystems on N os's is a
sustainable model. The less we depend on the subtle nuances of each
kernel's API, the better our chances of survival.

> If you mean a more general "cache bookkeeping library", then possibly yes,
> but still you'll get differences depending on how FSs and OSs distribute
> functionality between kernel and user space in a filesystem client.
>

This is what I was proposing in my initial post. Distributed
filesystems can benefit from an in-memory cache, but a larger cache
that survives reboots is often more appealing. Unfortunately,
utilizing os-specific cache tools is just going to increase autoconf
complexity, and produce even more ifdef soup. FS's like AFS and DFS
are so complex that we must have a common client codebase across
platforms. So, a cross-platform cache library that uses something
like the osi api for interaction with the rest of the kernel sounds
more feasible. I don't see the one-OS vision of many linux supporters
becoming a reality for several more years. So, instead I'm advocating
something that sacrifices performance for OS agnosticism (sounds a bit
like the ARLA philosophy...).

> If you mean the upcall interface (a common kernel module for different
> filesystems), then probably no - it reflects both the corresponding filesystem
> semantics and the corresponding kernel architecture...
>

I agree that the upcall interface will probably never be common. The
only way we could ever get there is the emergence of a
high-performance, cross-platform userspace filesystem API. Then maybe
we wouldn't feel compelled to put everything but the kitchen sink in
kernel-space ;)

> Though, less demanding filesystems can be happy with "foreign" kernel
> modules - like podfuk-smb or davfs2 using the Coda module.
>

While I was not trying to advocate a userspace implementation, I don't
think such an option should be ignored. But, I'm one of the last few
hold-outs who like the elegance of the microkernel architecture.
Crossing the kernelspace/userspace boundary can be optimized. If you
want speed and parallelism, the userspace/kernelspace boundary could
be crossed using something like asynchronous message queues. Granted,
there's not much reason for hope right now, but it sure would make
everyone's lives easier if a good userspace filesystem driver API
existed on multiple platforms. Yes, it will always be slower than
running in-kernel, but the reduction in maintenance to keep up with
rapidly changing kernel APIs should free up more people's time to work
on a better cache manager. Not to mention, debugging and profiling
userspace code is soooo much easier.
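The asynchronous-message-queue idea above is easy to sketch in miniature: a bounded producer/consumer queue where the "kernel side" enqueues upcalls and a "userspace side" drains them. Everything below (struct upcall, the queue size) is invented for illustration:

```c
/* Toy bounded upcall queue between a kernel-side producer and a
 * userspace-side consumer; all names are hypothetical. */
#include <pthread.h>
#include <assert.h>

#define QLEN 8
struct upcall { int op; int fid; };

static struct upcall q[QLEN];
static int q_head, q_tail, q_count;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_nonfull = PTHREAD_COND_INITIALIZER;

/* "kernel side": enqueue a message and return without waiting for
 * the filesystem logic to run. */
static void upcall_send(struct upcall m)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QLEN)
        pthread_cond_wait(&q_nonfull, &q_lock);
    q[q_tail] = m;
    q_tail = (q_tail + 1) % QLEN;
    q_count++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* "userspace side": blocking dequeue in the cache manager's worker. */
static struct upcall upcall_recv(void)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == 0)
        pthread_cond_wait(&q_nonempty, &q_lock);
    struct upcall m = q[q_head];
    q_head = (q_head + 1) % QLEN;
    q_count--;
    pthread_cond_signal(&q_nonfull);
    pthread_mutex_unlock(&q_lock);
    return m;
}
```

A real boundary crossing would marshal these messages through a character device or similar, but the batching/asynchrony argument is the same.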

Regards,

--
Tom
Ivan Popov
2005-01-22 16:56:53 UTC
Permalink
Thanks Tom,

now I understand your point better.

On Fri, Jan 21, 2005 at 04:56:16PM -0500, Tom Keiser wrote:
> cache manager:
> - responsible for storage of data and metadata
> - responsible for cache replacement strategy
> - API for interaction with implementation of VFS interface
> - API for access by RPC endpoint for things like cache invalidation

On the Coda list we have had a small discussion about splitting cache space
reclamation out of the (user space) cache manager.
It seems feasible, and possible in a portable way.
I doubt though that a similar approach would work for AFS or DFS as their
cache is a lot more complicated.
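For what it's worth, Tom's cache-manager breakdown quoted above can be sketched as a C interface. Everything here (cm_ops, the one-slot toy store backing it) is hypothetical, meant only to show the split between the VFS-facing and RPC-facing entry points:

```c
/* Hypothetical C view of the cache-manager split: storage and
 * replacement behind one API, with separate entry points for the VFS
 * glue and for RPC-driven invalidation. */
#include <string.h>
#include <assert.h>

typedef struct { unsigned cell, volume, vnode; } cm_fid;  /* AFS-style file id */

struct cm_ops {
    /* API used by the VFS-interface implementation */
    int  (*read)(cm_fid fid, void *buf, unsigned off, unsigned len);
    int  (*write)(cm_fid fid, const void *buf, unsigned off, unsigned len);
    /* API used by the RPC endpoint, e.g. a fileserver callback break */
    void (*invalidate)(cm_fid fid);
};

/* Toy one-slot backing store, standing in for real storage/replacement. */
static struct { cm_fid fid; char data[64]; int valid; } slot;

static int toy_read(cm_fid fid, void *buf, unsigned off, unsigned len)
{
    if (!slot.valid || memcmp(&slot.fid, &fid, sizeof fid) != 0)
        return -1;                    /* miss: caller would fetch via RPC */
    memcpy(buf, slot.data + off, len);
    return 0;
}

static int toy_write(cm_fid fid, const void *buf, unsigned off, unsigned len)
{
    slot.fid = fid;
    slot.valid = 1;
    memcpy(slot.data + off, buf, len);
    return 0;
}

static void toy_invalidate(cm_fid fid)
{
    if (memcmp(&slot.fid, &fid, sizeof fid) == 0)
        slot.valid = 0;               /* drop the cached copy */
}

static const struct cm_ops toy_cm = { toy_read, toy_write, toy_invalidate };
```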

> credential manager:
> - example would be the linux 2.6 keyring implementation

<skippable>
I think all that keyring business is an attempt to "work around" a basic
Unix principle - that credentials are tied to a UID, that's it.
It can work as a convenience feature, but it just hides the fact -
if you need separate security domains for different processes, they
have to be of different UIDs, otherwise the system cannot protect them
from each other (except possibly for special cases like jailed/chrooted).
The rights management in a Unix-like system is tied to UIDs, not to PAGs
or the like. We'd have to redesign all syscalls to be able to securely use PAGs.
</skippable>

> implementation of VFS interface:
> - very os-specific stuff
> - probably in-kernel unless something like LUFS takes off

LUFS seems to mean "Linux Userland File System" which does not promise a lot
about portability?

> I'm advocating
> something that sacrifices performance for OS agnosticism (sounds a bit
> like the ARLA philosophy...).

Me too. It may actually improve performance, as one invests development
time into doing the real work instead of catching up with every kernel's
peculiarities.

> there's not much reason for hope right now, but it sure would make
> everyone's lives easier if a good userspace filesystem driver API
> existed on multiple platforms. Yes, it will always be slower than

You are so right.
Alas, the world is far from perfect...

Regards,
--
Ivan
Luke Kenneth Casson Leighton
2005-01-22 18:48:11 UTC
Permalink
On Sat, Jan 22, 2005 at 05:56:53PM +0100, Ivan Popov wrote:
> Thanks Tom,
>
> now I understand your point better.
>
> On Fri, Jan 21, 2005 at 04:56:16PM -0500, Tom Keiser wrote:
> > cache manager:
> > - responsible for storage of data and metadata
> > - responsible for cache replacement strategy
> > - API for interaction with implementation of VFS interface
> > - API for access by RPC endpoint for things like cache invalidation
>
> On Coda list we have had a small discussion about splitting out cache space
> reclaimation from the (user space) cache manager.
> It seems feasible, and possible in a portable way.
> I doubt though that a similar approach would work for AFS or DFS as their
> cache is a lot more complicated.
>
> > credential manager:
> > - example would be the linux 2.6 keyring implementation
>
> <skippable>
> I think all that keyring business is an attempt to "work around" a basic
> Unix principle - that credentials are tied to a UID, that's it.

that's only relevant _if_ you are accessing files - and if you are
accessing them via the POSIX subsystem.

if you run the entire file server out of, say, a database (e.g. like
Apache Subversion / WebDav) and also the security and the concept of
the user is managed independently of the POSIX / Unix idea of security,
who _gives_ a monkeys about UIDs - they're completely and utterly
irrelevant.

which is why it's so much simpler, cleaner and just not so much of a
pig if everything client-side is done in userspace [KDE / Gnome
filesystem plugin...]


where UIDs become relevant is when you start messing about with trying
to present files from one Unix filesystem in a consistent manner on
another Unix workstation.

so then, all parties - all file servers and all workstations, and all
processes accessing the same files - need to have the same view of the
world: a distributed UID database.

DFS has all that - from what i can gather - via CDS - cell directory
services, and someone has made an effort to write a PAM plugin for DCE
CDS, and an nsswitch module, etc. which "gives" you a consistent view
of your UID database across all workstations in the same "domain".

An attempt was made to do the same thing with Samba, with a program
called "winbindd", but unfortunately, the people who were in control of
writing it did not take into account the need to "distribute" the UIDs
consistently across multiple unix workstations.

consequently, with winbindd, you can join Unix workstations to
an NT Domain, you can run pam_winbindd and nsswitch_winbindd,
so you have the same "names" but the uids could differ, depending
on who logs in first!

actually it wouldn't be difficult to distribute the sid<->uid/gid
entries: all you'd need to do is have one central server that
controls lookups for all workstations.
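The "one central server which controls lookups" scheme reduces to a single authoritative allocation table. A toy sketch (all names and the uid range are invented; a real service would sit behind an RPC interface and persist its table):

```c
/* Toy centralized sid->uid allocator: the first time a SID is seen it
 * gets the next free uid, and every later query returns the same
 * answer - exactly the per-host stability winbindd lacked. */
#include <string.h>
#include <assert.h>

#define MAXMAP 128

static struct { char sid[64]; int uid; } map[MAXMAP];
static int map_n;
static int next_uid = 10000;   /* arbitrary start of the allocated range */

static int sid_to_uid(const char *sid)
{
    for (int i = 0; i < map_n; i++)
        if (strcmp(map[i].sid, sid) == 0)
            return map[i].uid;          /* stable repeat answer */
    strncpy(map[map_n].sid, sid, sizeof map[map_n].sid - 1);
    map[map_n].uid = next_uid++;        /* first sight: allocate */
    return map[map_n++].uid;
}
```

Every workstation's nsswitch module would ask this one server, so the same SID maps to the same uid everywhere regardless of who logs in first.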


> It can work as a convenience feature, but it just hides the fact -
> if you need separate security domains for different processes, they
> have to be of different UIDs, otherwise the system cannot protect them
> from each other (except possibly for special cases like jailed/chrooted).

and selinux, which "tracks" processes in a predefined way [making sure
that they only make certain system calls]

but that's a different story.

> The rights management in a Unix-like system is tied to UIDs, not to PAGs
> or the like. We'd have to redesign all syscalls to be able to securely use PAGs.
> </skippable>

... or you have a central lookup-table service which your
nss_myweirdservice module is told about and looks things up in.

... it's not difficult - just necessary to get your head round a few
weird ideas involving lots of machines rather than just one.

anyway - i _really_ wish Unix had VMS/NT Security Descriptors
not this stupid single 32-bit rubbish, it _really_ would make
life a lot easier.



> > implementation of VFS interface:
> > - very os-specific stuff
> > - probably in-kernel unless something like LUFS takes off
>
> LUFS seems to mean "Linux Userland File System" which does not promise a lot
> about portability?

a solution for linux is still only a solution for linux: i presume
that userspace filesystem drivers exist for other OSes?

and even a solution for linux requires that the uids be
consistent and possible to look up (see above) but contacting
a remote service to find out your uid is an _awful_ lot easier
to do in userspace than it is in kernel-land.

the thought of doing that kind of thing in-kernel just makes
me wanna puke - it's just _so_ inappropriate!

l.
Ivan Popov
2005-01-22 21:08:46 UTC
Permalink
Hi Luke,

On Sat, Jan 22, 2005 at 06:48:11PM +0000, Luke Kenneth Casson Leighton wrote:
> if you run the entire file server out of, say, a database (e.g. like
> Apache Subversion / WebDav) and also the security and the concept of
> the user is managed independently of the POSIX / Unix idea of security,
> who _gives_ a monkeys about UIDs - they're completely and utterly
> irrelevant.

sure, I meant the client side.

> which is why it's so much simpler, cleaner and just not so much of a
> pig if everything client-side is done in userspace [KDE / Gnome
> filesystem plugin...]

Let's see. The least common denominator for all processes' i/o
is the corresponding host's OS' system calls.
It means we have to intercept them (via tracing?). Then we need to hand over
a "real" file to the process, otherwise things like mmap() stop working.
I think it can be pretty hard to do it in a general and efficient way
totally in user space. Maybe just nobody has tried hard enough...

> where UIDs become relevant is when you start messing about with trying
> to present files from one Unix filesystem in a consistent manner on
> another Unix workstation.
>
> so then, all parties - all file servers and all workstations, and all
> processes accessing the same files - need to have the same view of the
> world: a distributed UID database.

Not really, though it depends on what we call a "consistent" manner.
You can trick "ls" into displaying something feasible _without_
relying on a global uid space. LUFS does such tricks, for example.

> DFS has all that - from what i can gather - via CDS - cell directory
> services, and someone has made an effort to write a PAM plugin for DCE

Christer Bernerus at Chalmers wrote the (appreciated) pam_dce module.

> CDS, and an nsswitch module, etc. which "gives" you a consistent view
> of your UID database across all workstations in the same "domain".

But only inside one cell. Global access in DFS is still hardly possible
(except for anonymous read).

Regards,
--
Ivan
Luke Kenneth Casson Leighton
2005-01-22 23:52:44 UTC
Permalink
On Sat, Jan 22, 2005 at 10:08:46PM +0100, Ivan Popov wrote:
> Hi Luke,
>
> On Sat, Jan 22, 2005 at 06:48:11PM +0000, Luke Kenneth Casson Leighton wrote:

> > if you run the entire file server out of, say, a database (e.g. like
> > Apache Subversion / WebDav) and also the security and the concept of
> > the user is managed independently of the POSIX / Unix idea of security,
> > who _gives_ a monkeys about UIDs - they're completely and utterly
> > irrelevant.
>
> sure, I meant the client side.

i understood that to be the case - sorry i didn't make that clear - i
hope it became clearer later on from the rest of my reply.

> > which is why it's so much simpler, cleaner and just not so much of a
> > pig if everything client-side is done in userspace [KDE / Gnome
> > filesystem plugin...]
>
> Let's see. The least common denominator for all processes' i/o
> is the corresponding host's OS' system calls.
> It means we have to intercept them (via tracing?). Then we need to hand over
> a "real" file to the process, otherwise things like mmap() stop working.
> I think it can be pretty hard to do it in a general and efficient way
> totally in user space. Maybe just nobody has tried hard enough...

i believe that what you are suggesting is to "bounce" kde file
plugins into kernel somehow such that they are presented at
a mount point, so that POSIX apps can get at them as if they
were local files [yuk! ... but actually, thinking about it,
i think it's been done: i believe LUFS does have an example
gnome-vfs filesystem!!! anyway...]

... which, i believe, may actually solve the problem, because
LUFS (well, certainly fuse, anyway) presents file read/writes
via a mmap interface [which i don't pretend to understand,
i just mention it here so you can follow up on it and check
for yourself in case my burbling swiss-cheese memory spotted
something of use].

anyway.

i suggested KDE (and gnome) filesystem plugins for a completely
different reason: to support KDE (and gnome) applications *ONLY*.

KDE's plugin system is very simple - the operations are very
straightforward - straightforward enough for people to write
HTTP filesystem plugins, ftp plugins, a DOS floppy plugin,
the-works-plugins, and so the API comprises the "lowest common
denominator" of file operations.

mmap isn't one of them.

therefore, the programs that use the KDE file plugins
(konqueror for example) stay quite simple and straightforward,
and KDE "adapts" to the capabilities of the file plugin.

i don't believe uids even come into the equation: certainly not
for the DOS floppy plugin (which thunks down onto commands from the
mfloppy package!)

however, user credentials _are_ relevant: username, password etc.

... i do appreciate, however, that this is not exactly what
would be considered a "perfect" solution.

so we have applications that only use the "POSIX" system call
access (including mmap as you say) that cannot be modified...


the only other solution is to do LD preloading to take
over the system calls!

_yes_ this has been done before!!! andrew tridgell did it with
something called "smbsh".

it was successful: you ran "smbsh", which preloaded _open, _open64,
__open, __open64, _read, _read64, __read, __read64, you get the idea,
and then it could "trap" these system calls, checking whether you
accessed a faked-up mount point (which didn't even exist on the
filesystem) and either let it fall through to the "real" file or
provided remote access to SMB servers otherwise.
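the smbsh-style trick can still be sketched today with dlsym(RTLD_NEXT, ...): define open() yourself and forward to the real libc open(). A hedged illustration - shown as a single program rather than a preloaded .so, glibc-specific, and with smbsh's actual mount-point logic reduced to a comment:

```c
/* Sketch of the LD_PRELOAD interposition trick: our open() forwards to
 * the real libc open() found via dlsym(RTLD_NEXT, ...). Compiled into
 * a shared object and LD_PRELOADed it would trap a whole process; here
 * everything lives in one program for simplicity. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <unistd.h>
#include <assert.h>

static int intercepted;   /* calls that went through our wrapper */

int open(const char *path, int flags, ...)
{
    int (*real_open)(const char *, int, ...) =
        (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    intercepted++;
    /* smbsh would check here whether `path` lies under its fake mount
     * point and divert the call to an SMB server; we just fall through. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        int mode = va_arg(ap, int);
        va_end(ap);
        return real_open(path, flags, mode);
    }
    return real_open(path, flags);
}
```

The same wrapper pattern applies to read(), stat(), and the rest of the call surface smbsh covered; the glibc symbol changes mentioned below are what broke the original.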

... i even considered writing an rpcclient version for people
to be able to edit Windows Registries on remote NT systems!!!


unfortunately, the principle was scotched by the libc6 authors
in about 1998 when they removed some of the __<systemcallname>
things, making them hard symbols or something, and smbsh would
no longer work.

since that time, a workaround may have appeared, or it may simply work
with a more modern version of libc6.

*shrug*



> > where UIDs become relevant is when you start messing about with trying
> > to present files from one Unix filesystem in a consistent manner on
> > another Unix workstation.
> >
> > so then, all parties - all file servers and all workstations, and all
> > processes accessing the same files - need to have the same view of the
> > world: a distributed UID database.
>
> Not really, though it depends on what we call a "consistent" manner.
> You can trick "ls" into displaying something feasible _without_
> relying on a global uid space. LUFS does such tricks, for example.

yes - precisely. ah ha! :)

and that is why i was advocating that complex distributed
filesystem clients, through which POSIX filesystem semantics
can be granted to POSIX apps that expect them, use LUFS as
a simpler way to develop a filesystem client.


> > DFS has all that - from what i can gather - via CDS - cell directory
> > services, and someone has made an effort to write a PAM plugin for DCE
>
> Christer Bernerus at Chalmers wrote the (appreciated) pam_dce module.

hm, i wonder if it's the same one that can be referenced via the
3rd party sw at opengroup.org/dce?


> > CDS, and an nsswitch module, etc. which "gives" you a consistent view
> > of your UID database across all workstations in the same "domain".
>
> But only inside one cell. Global access in DFS is still hardly possible
> (except for anonymous read).

that would tend to suggest that the concept of cells needs to be
extended.

or that the concept of uids needs to be deprecated in POSIX
and replaced with SIDs (which comprise up to 5 but usually 4
32-bit numbers representing a "domain" and are appended with
a 32-bit RID - relative id)

or that a "parallel" extension allowing mappings between UIDs+GIDs
and their corresponding SIDs be allowed - in kernel.

i realise that all of these would mean a hell of a lot of work.

... but hey, some day, _someone's_ got to bite the bullet, otherwise
everybody's going to continue bitching about this for _another_ what,
20 years is it so far?

:)

l.
Dean Anderson
2005-01-24 05:18:57 UTC
Permalink
There was a discussion about it on the openafs list. I have an archive if
you need it.

--Dean

On Sat, 22 Jan 2005, Rich Salz wrote:

> Is there an easy link to where I can find out about the "linux 2.5
> keyring" stuff? From name alone, it doesn't sound great. Ivan's note
> about UID's seems right-on.
> /r$
>
>

--
Av8 Internet Prepared to pay a premium for better service?
www.av8.net faster, more reliable, better service
617 344 9000
Kyle Moffett
2005-01-24 22:25:38 UTC
Permalink
On Sat, 22 Jan 2005, Rich Salz wrote:
> Is there an easy link to where I can find out about the "linux 2.5
> keyring" stuff? From name alone, it doesn't sound great. Ivan's note
> about UID's seems right-on.

The keyring stuff essentially allows you to associate arbitrary BLOBs
with processes via a simple kernel interface. OpenAFS could store the
credentials in a session keyring, and all processes in that session
would have access to the credentials. Then OpenAFS could just run a
key search for the credentials when it needs to perform operations
with them (such as passing them to the server). It's very fast,
simple, and well designed

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------
Todd M. Lewis
2005-01-25 12:53:36 UTC
Permalink
Kyle Moffett wrote:
>
> The keyring stuff essentially allows you to associate arbitrary BLOBs
> with processes via a simple kernel interface. OpenAFS could store
> the credentials in a session keyring and all processes in that
> session would have access to the credentials. Then OpenAFS could
> just run a key search for the credentials when it needs to perform
> operations (Such as passing them to the server) with them. It's very
> fast, simple, and well designed

This is encouraging. How closely do the semantics of "session keyring
and all processes in that session" match those of PAGs? (Group
membership inheritance across fork/exec seems pretty clear; sessions
have always seemed a little fuzzy to me.)
--
+--------------------------------------------------------------+
/ ***@unc.edu 919-962-5273 http://www.unc.edu/~utoddl /
/ I fired my masseuse today. She just rubbed me the wrong way. /
+--------------------------------------------------------------+
Kyle Moffett
2005-01-25 22:28:42 UTC
Permalink
On Jan 25, 2005, at 07:53, Todd M. Lewis wrote:
> Kyle Moffett wrote:
>> The keyring stuff essentially allows you to associate arbitrary BLOBs
>> with processes via a simple kernel interface. OpenAFS could store
>> the credentials in a session keyring and all processes in that
>> session would have access to the credentials. Then OpenAFS could
>> just run a key search for the credentials when it needs to perform
>> operations (Such as passing them to the server) with them. It's very
>> fast, simple, and well designed
>
> This is encouraging. How closely do the semantics of "session keyring
> and all processes in that session" match those of PAGs? (Group
> membership inheritance across fork/exec seems pretty clear; sessions
> have always seemed a little fuzzy to me.)

I describe in more detail in my other email, but basically a given
"key-session" is preserved across clone/fork/vfork/exec. The only
way to change "key-session"s is with the keyctl syscall, using
PR_JOIN_SESSION_KEYRING to join an existing keyring or create a new
anonymous one.

Actually, Jeffrey Hutzelman has an excellent summary of the other kinds
of "sessions" on Linux in his email, he just doesn't have the specifics
right for "key-sessions".

Cheers,
Kyle Moffett

Luke Kenneth Casson Leighton
2005-01-26 00:08:04 UTC
Permalink
On Tue, Jan 25, 2005 at 05:28:42PM -0500, Kyle Moffett wrote:
> On Jan 25, 2005, at 07:53, Todd M. Lewis wrote:
> >Kyle Moffett wrote:
> >>The keyring stuff essentially allows you to associate arbitrary BLOBs
> >>with processes via a simple kernel interface. OpenAFS could store
> >>the credentials in a session keyring and all processes in that
> >>session would have access to the credentials. Then OpenAFS could
> >>just run a key search for the credentials when it needs to perform
> >>operations (Such as passing them to the server) with them. It's very
> >>fast, simple, and well designed
> >
> >This is encouraging. How closely do the semantics of "session keyring
> >and all processes in that session" match those of PAGs? (Group
> >membership inheritance across fork/exec seems pretty clear; sessions
> >have always seemed a little fuzzy to me.)
>
> I describe in more detail in my other email, but basically a given
> "key-session" is preserved across clone/fork/vfork/exec. The only
> way to change "key-session"s is with the keyctl syscall, using
> PR_JOIN_SESSION_KEYRING to join an existing keyring or create a new
> anonymous one.

what happens when a process performs a setuid / seteuid call?

i.e. what happens to a file server (such as smbd) which is dropping
in-and-out of root to perform file operations (using seteuid),
but that file server has to perform authentication to an external
[networked] service, and uses a "keyring" as a credential cache?

also relevant: if setuid / seteuid is taken into account,
is an selinux security context _also_ taken into account?

l.
Jeffrey Hutzelman
2005-01-25 17:41:25 UTC
Permalink
On Monday, January 24, 2005 17:25:38 -0500 Kyle Moffett
<***@mac.com> wrote:

> The keyring stuff essentially allows you to associate arbitrary BLOBs with
> processes via a simple kernel interface. OpenAFS could store the
> credentials in a session keyring and all processes in that session would
> have access to the credentials. Then OpenAFS could just run a key search
> for the credentials when it needs to perform operations (Such as passing
> them to the server) with them.

There still seems to be some confusion here on a couple of key points.

(1) A PAG is not a set of credentials. It is a set of _processes_ which
share the same authentication context. The distinction may seem minor
if you are used to thinking in terms of encrypted local filesystems,
but it is actually of critical importance. A caching distributed
filesystem like AFS (or DFS, or NFSv4) maintains open connections to
servers on behalf of users. Every connection is established using a
specific set of credentials, and operations done over that connection
are subject to the access rights associated with those credentials.
The authentication process is an exchange that may potentially take
multiple round trips (depending on the technology in use); it's not
simply "pass the credentials to the server".

Obviously, it is critical that any operations be done with the right
credentials. And, the cost of creating connections is high enough
that it is important to share connections between processes whenever
possible - otherwise, you'd be creating a new connection for every
ls or whatever. Meeting both of these goals means that processes
which are in the same authentication context should share connections,
while processes in different authentication contexts should not. To
make this happen, we need to keep track of which processes are in the
same authentication context.

In addition, the filesystem may cache other data associated with a
particular authentication context. For example, when we fetch a file
from the fileserver, it gives us information about what access is
available on that file to the user doing the fetch. We cache that
information, so we don't have to go back and ask the fileserver to
reevaluate the ACL on every operation. However, those cached rights
must be associated with the authentication context in which the
original access was done -- otherwise, we might grant some process
too much access to a cached file.

Both of these are ways in which we essentially need an identifier for
sets of processes in the same authentication context, so we can label
other data we track. We don't need a place to store credentials; we
need a way to associate processes.


(2) In UNIX-speak, "session" has a very specific technical meaning; it is
one of several kinds of sets of processes, used for a specific purpose.
Unfortunately, the term has been overloaded several times by people
looking for a good word to describe a concept they cared about. So,
in addition to UNIX sessions, we have PAM sessions and X11 sessions.

In the interest of clarity, a few definitions:

- A UNIX session works like a UNIX process group - a process can call
setsid() to create a new session and become a "session leader";
the new session is named for the pid of its leader. When a process
forks, the new process is in the same session as its parent, unless
and until it calls setsid() to create a new session. UNIX sessions
are used for managing terminals. This is the only kind of session
the Linux kernel knows about.

- A PAM session relates to the things that happen when you log in,
and again when you log out. It's not really a set of processes at
all - just an instance of someone being logged in to the machine.
Pretty much only the PAM subsystem knows about this.

- An X11 session consists approximately of everything that happens
from when you start the X server (or log in via xdm) until it exits
(or you log out). It contains all the applications you run while
you are logged in. This is the mechanism used to store state about
what applications you are running, where windows are placed, etc,
across login sessions. It is known to the session manager program
and to session-aware X11 applications.

- I'll use the term "login session" to describe everything I do from
when I log in to a machine until I log out. This might be the same
as an X11 session, or it might not. It could include suspended
shells, multiple terminals created by screen, etc.

As I understand it, the keyring code supports associating a keyring
with a UNIX session. Unfortunately, unless a newly-created UNIX
session automatically inherits the keyring of the session which
previously contained the new session leader, this is not enough.
In the course of a single login session, there may be many active
UNIX sessions. Every terminal (xterm, screen window, etc) will be
a separate UNIX session. If I run aklog in one window, it must be
the case that the new tokens take effect for every process which
shares the same authentication context -- it would be unacceptable
for me to have to run aklog in every window!


> It's very fast, simple, and well designed

Says the guy who designed it. :-)

Personally, I think it spends too much time on concepts like keys-as-files
and not enough on getting the inheritance rules right. But otherwise, I
won't argue -- it's basically the right approach for managing keys and
credentials.

Based on our earlier discussion, I think we'll be able to shoehorn what we
need into the keyring mechanism. But it will be ugly, because what we need
is not a place to store keys, but an identifier to label things with.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA
Kyle Moffett
2005-01-25 22:25:43 UTC
Permalink
On Jan 25, 2005, at 12:41, Jeffrey Hutzelman wrote:
> There still seems to be some confusion here on a couple of key points.
>
> (1) A PAG is not a set of credentials. It is a set of _processes_
> which
> share the same authentication context.

My main point is, why do we need PAGs? You don't just have to store
credentials in a keyring, you can store a (single) shared connection.
Basically any in-kernel or out-of-kernel security-related BLOB that
needs to be associated with a process can be used through the keyring
system. Ex:

kinit:
Connect to Kerberos server, authenticate, get keys, put them in
the session keyring
aklog:
Get the TGT from any keyring where it's available, connect to
the Kerberos server, get an AFS service ticket, put that in the
same keyring as the TGT. Then "create" an AFS connection "key",
passing in a specially formatted BLOB consisting of the service
ticket, connection parameters, and other settings. The AFS
kernel module would decode the BLOB, create the connection, and
link the connection through the newly generated "key".
file ops:
Get the first AFS connection "key" found, and use it to perform
file operations.

The session key is _NOT_ a UNIX session, but a key shared across all
processes descended via fork/vfork/clone/exec from the process where
the keyring is originally set. This means that you would just have
the PAM module create a new key-session and put the keys in there.
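For reference, the add_key/keyctl interface being described looks roughly like this from userspace. This is a hedged sketch using raw syscalls (so it needs no keyutils library), storing a toy string where a real cache manager would hang tickets or connection state:

```c
/* Linux-only sketch of the keys API: stash a blob in the process
 * keyring, then read it back. Constants copied from the kernel's
 * keyctl interface; the "afs:toy-ticket" description is made up. */
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

#define KEY_SPEC_PROCESS_KEYRING  (-2)
#define KEYCTL_READ               11

/* add_key(2): create a key of the given type and link it into a keyring */
static long toy_add_key(const char *type, const char *desc,
                        const void *payload, size_t plen, int keyring)
{
    return syscall(SYS_add_key, type, desc, payload, plen, keyring);
}

/* keyctl(KEYCTL_READ): copy a key's payload out; returns payload size */
static long toy_key_read(long key, void *buf, size_t len)
{
    return syscall(SYS_keyctl, KEYCTL_READ, key, buf, len);
}
```

An AFS module would register its own key type (the "AFS-conn" idea below) instead of the generic "user" type, with the connection teardown hung off the key's destructor.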

> In addition, the filesystem may cache other data associated with a
> particular authentication context. For example, when we fetch a file
> from the fileserver, it gives us information about what access is
> available on that file to the user doing the fetch. We cache that
> information, so we don't have to go back and ask the fileserver to
> reevaluate the ACL on every operation. However, those cached rights
> must be associated with the authentication context in which the
> original access was done -- otherwise, we might grant some process
> too much access to a cached file.

This works with the above stuff too, and more elegantly than the old
PAG code did, because it doesn't need to hack the kernel groups code.

> (2) In UNIX-speak, "session" has a very specific technical meaning; it
> is one of several kinds of sets of processes, used for a specific
> purpose. Unfortunately, the term has been overloaded several times
> by people looking for a good word to describe a concept they cared
> about. So, in addition to UNIX sessions, we have PAM sessions and
> X11 sessions.

This is a new and distinct "session", I call it here a "key-session".

> As I understand it, the keyring code supports associating a keyring
> with a UNIX session.

This is completely false. Please read linux/Documentation/keys.txt from
a recent kernel.

>> It's very fast, simple, and well designed
>
> Says the guy who designed it. :-)

I didn't design or write it. :-P I helped out David Howells by
commenting on his code and design, but I did not code a single line for
it.

> Personally, I think it spends too much time on concepts like
> keys-as-files and not enough on getting the inheritance rules right.

What about the inheritance rules described above does not work for you?
The only way to change the session keyring is with
PR_JOIN_SESSION_KEYRING. With that, you can request a new anonymous
keyring or join an already created one, assuming you have sufficient
permissions.

> Based on our earlier discussion, I think we'll be able to shoehorn what
> we need into the keyring mechanism. But it will be ugly, because what
> we need is not a place to store keys, but an identifier to label things
> with.

Just attach your connection and cache data to a key of an "AFS-conn"
type registered by the AFS module, and you get all of the same
inheritance functionality as traditional group-based PAGs, except
without the hacks introduced to hook the kernel group-manipulation
syscalls.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------
Jeffrey Hutzelman
2005-01-26 00:11:56 UTC
Permalink
[ Ugh. As I've been working on this message, I've found myself saying the
same thing in different ways in different places, just because I'm trying
to answer your comments in order. In the interest of clarity, I'm
reordering things somewhat. Please bear with me... ]



> The session key is _NOT_ a UNIX session, but a key shared across all
> processes descended via fork/vfork/clone/exec from the process where
> the keyring is originally set.

Ok. That's not the impression I got from previous discussion and from what
I read of the documentation in earlier versions of David's code. I stand
corrected, and thank you for clearing up the misunderstanding.


> My main point is, why do we need PAGs? You don't just have to store
> credentials in a keyring, you can store a (single) shared connection.

Things a PAG is not:
- a set of credentials
- a place to store things
- a 32-bit number
- a pair of funny groups


A PAG is a set of processes. In fact, it's very nearly identical to what
you called a "key session". We do in fact need PAG's, or something
equivalent. And, we need a way to "name" PAG's, so that we can label other
data structures as to which PAG they belong to.

We're not tied to labelling PAG's with a 32-bit integer. It could easily
be something else, like a larger integer or a pointer.

We're certainly not tied to representing processes' PAG membership as
groups. It's just a kludge to get the job done. We hate it as much as
anyone else. But it does get the job done.

However, we do need to be able to label open connections and cached access
rights as to what PAG they belong to. Note that we're not talking about
one open connection per PAG; we're talking about one open connection per
PAG per fileserver. And we're not talking about a cached set of groups or
SID's or something; we're talking about cached data on individual files
indicating what operations we are allowed to do on that file. So, it's not
a couple of items per PAG; it could be in the tens of thousands.

We already have data structures and code which manages this information.
That code is cross-platform, and we'd like to keep it that way.
Introducing a pervasive platform-dependent difference in behaviour does not
improve the maintainability of our code.


So, my question is... what do I use as a label?



>>> It's very fast, simple, and well designed
>>
>> Says the guy who designed it. :-)
>
> I didn't design or write it. :-P I helped out David Howells by
> commenting on his code and design, but I did not code a single line for
> it.

Hm; that wasn't the impression I got during our previous discussion, around
the time some of the design work was happening. But OK. I suppose I
should say for the benefit of others reading that I wasn't trying to
devalue your comment, just giving credit where I thought it was due.

We really do appreciate all the work you and David have done on this.


-- Jeff
Kyle Moffett
2005-01-26 01:17:59 UTC
Permalink
On Jan 25, 2005, at 19:11, Jeffrey Hutzelman wrote:
>> My main point is, why do we need PAGs? You don't just have to store
>> credentials in a keyring, you can store a (single) shared connection.
>
> Things a PAG is not:
> - a set of credentials
> - a place to store things
> - a 32-bit number
> - a pair of funny groups
>
> A PAG is a set of processes. In fact, it's very nearly identical to
> what you called a "key session". We do in fact need PAG's, or
> something equivalent. And, we need a way to "name" PAG's, so that we
> can label other data structures as to which PAG they belong to.
>
> However, we do need to be able to label open connections and cached
> access rights as to what PAG they belong to. Note that we're not
> talking about one open connection per PAG; we're talking about one
> open connection per PAG per fileserver. And we're not talking about a
> cached set of groups or SID's or something; we're talking about cached
> data on individual files indicating what operations we are allowed to
> do on that file. So, it's not a couple of items per PAG; it could be
> in the tens of thousands.
>
> We already have data structures and code which manages this
> information. That code is cross-platform, and we'd like to keep it
> that way. Introducing a pervasive platform-dependent difference in
> behaviour does not improve the maintainability of our code.

Ok, so the requirements are:
1) Shared between multiple processes with sane inheritance
2) Store a pointer to arbitrary arch-independent data structures
3) A unique globally-useable ID to locate a particular combination
of credentials, connection data, caches, etc.

As I see it, the keyring system can very simply be dropped in place of
the existing setgroups hooks. You can implement your own key_type
data structure (struct key_type afs_pag_key_type;) that contains a
pointer to an arch-independent AFS structure containing connections,
caches, etc. Then instead of a "PAG" id, you would use a "key" id,
except you would need to check if the key is of afs_pag_key_type first.
Creating and using the key is trivial with the struct key_type hooks,
and it manages the inheritance automatically for you. That way you
could manage all your own internal interfaces, caches, etc, and only
rely on the keyring system to keep track of processes for you. The
one thing the keyring system _doesn't_ provide is a list of processes
that have a certain keyring, primarily because that slows the system
down considerably and chews up a lot more RAM. :-D
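A purely illustrative userspace simulation of that idea: the key serial acts as the PAG label that locates per-PAG connection and cache state. The `afs_pag` structure and `pag_for_key` helper here are hypothetical; in a real client this state would hang off the key's payload inside the kernel module.

```c
#include <stdlib.h>

/* Hypothetical per-PAG state: connections, cached access rights, etc. */
struct afs_pag {
    long key_serial;        /* the key ID used as the PAG label  */
    int  n_connections;     /* stand-in for per-fileserver state */
    struct afs_pag *next;
};

static struct afs_pag *pag_list;

/* Look up (or lazily create) the PAG state labelled by a key serial.
 * Mirrors the proposal above: the keyring names the set of processes,
 * and the filesystem hangs its own arch-independent data off that ID. */
struct afs_pag *pag_for_key(long key_serial)
{
    struct afs_pag *p;
    for (p = pag_list; p; p = p->next)
        if (p->key_serial == key_serial)
            return p;
    p = calloc(1, sizeof(*p));
    p->key_serial = key_serial;
    p->next = pag_list;
    pag_list = p;
    return p;
}
```

Two processes holding the same key thus resolve to the same connection/cache state, while processes in different key-sessions get distinct state.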

>> I didn't design or write it. :-P I helped out David Howells by
>> commenting on his code and design, but I did not code a single line
>> for
>> it.
> Hm; that wasn't the impression I got during our previous discussion,
> around the time some of the design work was happening. But OK. I
> suppose I should say for the benefit of others reading that I wasn't
> trying to devalue your comment, just giving credit where I thought it
> was due.

Well, I tried my hand at some initial patches, but David Howells had
written more and better code, and I didn't have sufficient time to work
on it, so he wrote all the code.

Cheers,
Kyle Moffett

Todd M. Lewis
2005-01-26 13:47:27 UTC
Permalink
Kyle Moffett wrote:
>
> Ok, so the requirements are:
> 1) Shared between multiple processes with sane inheritance
> 2) Store a pointer to arbitrary arch-independent data structures
> 3) A unique globally-useable ID to locate a particular combination
> of credentials, connection data, caches, etc.

Let's flesh out #1 a bit: sane inheritance. That needs to include
repudiation. Specifically, a process needs to be able to do two things:
1) to drop its token and ensure that newly acquired tokens are
accessible only to its descendant processes, and 2) ensure that its
descendants can't "rejoin the old PAG, still in progress" so to speak.

In a previous message, you said:
> What about the inheritance rules described above does not work for you?
> The only way to change the session keyring is with PR_JOIN_SESSION_KEYRING.
> With that, you can request a new anonymous keyring or join an already
> created one, assuming you have sufficient permissions.

What exactly constitutes "sufficient permissions" to join an already
created keyring? That certainly seems like a very flexible design, but
I'm not convinced it meets the "sane inheritance" criterion. PAGs don't
let you do that (without doing some evil rootly things anyway). Maybe
this keyring thing does let you do everything PAGs can do, but can they
keep you from doing everything that PAGs keep you from doing?

> As I see it, the keyring system can very simply be dropped in place of
> the existing setgroups hooks.

I know I'm whistling in the wind here, given that local UNIX groups are
so thoroughly ingrained, simple, and efficiently implemented that
fundamental changes in that area are extremely unlikely, but each local
group membership is exactly equivalent to holding a "local token" for
local (and some shared) file systems. They ought to be treated the same
way -- and with the same code in the kernel -- as tokens for remote file
systems. I'd like to see an implementation that not only could be
"dropped in place of the existing setgroups hooks", but could replace
all the group stuff at once (and improve it at the same time, which is a
tall order given that it's almost trivial now). Not gonna hold my breath
on that one, though.
--
+--------------------------------------------------------------+
/ ***@unc.edu 919-962-5273 http://www.unc.edu/~utoddl /
/ Marriage is the mourning after the knot before. /
+--------------------------------------------------------------+
Jeffrey Hutzelman
2005-01-26 15:22:38 UTC
Permalink
On Tuesday, January 25, 2005 20:17:59 -0500 Kyle Moffett
<***@mac.com> wrote:

> Ok, so the requirements are:
> 1) Shared between multiple processes with sane inheritance
> 2) Store a pointer to arbitrary arch-independent data structures
> 3) A unique globally-useable ID to locate a particular combination
> of credentials, connection data, caches, etc.

Correct.



> As I see it, the keyring system can very simply be dropped in place of
> the existing setgroups hooks. You can implement your own key_type
> data structure (struct key_type afs_pag_key_type;) that contains a
> pointer to an arch-independent AFS structure containing connections,
> caches, etc. Then instead of a "PAG" id, you would use a "key" id,
> except you would need to check if the key is of afs_pag_key_type first.

By "key id", you mean the key's serial number?
Are these ever reused?


> one thing the keyring system _doesn't_ provide is a list of processes
> that have a certain keyring, primarily because that slows the system
> down considerably and chews up a lot more RAM. :-D

We don't have that either, for similar reasons.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA
David Howells
2005-01-26 10:55:47 UTC
Permalink
Jeffrey Hutzelman <***@cmu.edu> wrote:

> So, my question is... what do I use as a label?

You could use the session keyring ID. Each key (keyrings are a special type of
key) has a "unique" ID which is a signed 32-bit number. I say "unique", but
after all the two billion possible key IDs have been iterated through, the ID
allocator will begin again from 1, skipping the IDs still in use.
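A toy model of the wrapping allocator described above. The function and predicate names are illustrative only; the real kernel walks its internal key-serial tree rather than calling out to a predicate:

```c
#include <stdbool.h>

/* Allocate the next key serial after `last`: serials are positive
 * signed 32-bit values; after reaching 0x7fffffff the allocator wraps
 * back to 1 (never 0), skipping any IDs still in use. */
int next_key_serial(int last, bool (*in_use)(int))
{
    int id = last;
    do {
        id = (id == 0x7fffffff) ? 1 : id + 1;  /* wrap to 1, never 0 */
    } while (in_use(id));
    return id;
}

/* Example predicate for demonstration: pretend serials 2 and 3 are in use. */
static bool demo_in_use(int id)
{
    return id == 2 || id == 3;
}
```

So an allocator at serial 1 with 2 and 3 in use would hand out 4 next, and one at 0x7fffffff would wrap around to the lowest free positive serial.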

David
Derek Atkins
2005-01-26 15:04:06 UTC
Permalink
David Howells <***@redhat.com> writes:

> Jeffrey Hutzelman <***@cmu.edu> wrote:
>
>> So, my question is... what do I use as a label?
>
> You could use the session keyring ID. Each key (keyrings are a special type of
> key) has a "unique" ID which is a signed 32-bit number. I say "unique", but
> after all the two billion possible key IDs have been iterated through, the ID
> allocator will begin again from 1, skipping the IDs still in use.

So it's a deterministic and guessable number? Is that necessarily a
good thing, if I can guess your session keyring ID?

> David

-derek

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
***@MIT.EDU PGP key available
Luke Kenneth Casson Leighton
2005-01-26 00:15:42 UTC
Permalink
> > In addition, the filesystem may cache other data associated with a
> > particular authentication context. For example, when we fetch a
> >file
> > from the fileserver, it gives us information about what access is
> > available on that file to the user doing the fetch. We cache that
> > information, so we don't have to go back and ask the fileserver to
> > reevaluate the ACL on every operation. However, those cached rights
> > must be associated with the authentication context in which the
> > original access was done -- otherwise, we might grant some process
> > too much access to a cached file.

this would imply that the unix uid _is_ taken into account, yes?

so in order to correctly obtain the authentication context it would be
necessary to perform a seteuid or setuid, yes?

... and if so, what happens when you have a user-referencing
structure (NT SIDs, SElinux security contexts, DCE cells,
etc.) that has nothing to do with unix uids or unix gids?

[yes, i do realise that you can provide a one-to-one & onto mapping
table which maps each of these schemes uniquely to unix uids/gids
on a per-user basis]

l.
Luke Kenneth Casson Leighton
2005-01-25 20:31:44 UTC
Permalink
On Tue, Jan 25, 2005 at 12:41:25PM -0500, Jeffrey Hutzelman wrote:
> On Monday, January 24, 2005 17:25:38 -0500 Kyle Moffett
> <***@mac.com> wrote:
>
> >The keyring stuff essentially allows you to associate arbitrary BLOBs with
> >processes via a simple kernel interface. OpenAFS could store the
> >credentials in a session keyring and all processes in that session would
> >have access to the credentials. Then OpenAFS could just run a key search
> >for the credentials when it needs to perform operations (Such as passing
> >them to the server) with them.
>
> There still seems to be some confusion here on a couple of key points.
>
> (1) A PAG is not a set of credentials. It is a set of _processes_ which
> share the same authentication context.

*whew*. that's good.

> The distinction may seem minor
> if you are used to thinking in terms of encrypted local filesystems,
> but it is actually of critical importance. A caching distributed
> filesystem like AFS (or DFS, or NFSv4) maintains open connections to
> servers on behalf of users. Every connection is established using a
> specific set of credentials, and operations done over that connection
> are subject to the access rights associated with those credentials.
> The authentication process is an exchange that may potentially take
> multiple round trips (depending on the technology in use); it's not
> simply "pass the credentals to the server".

Windows NT authentication is the same: in NT 3.5-4.0, and in NT 5.0
(aka 2000) in "NT domain backwards-compatibility" mode, it's an exchange
of more than 15 round trips just to perform authentication; if you
are contacting a domain member server or a PDC with an inter-domain
trust relationship then it's double that (because the server needs to
contact your PDC on your behalf).

this is no fun to duplicate, and keeping an open
connection to the PDC can drastically cut the number of authentication
packets exchanged, as can credential caching.


> Obviously, it is critical that any operations be done with the right
> credentials. And, the cost of creating connections is high enough
> that it is important to share connections between processes whenever
> possible - otherwise, you'd be creating a new connection for every
> ls or whatever.

winbindd - written by tim potter - does exactly this. pam_winbindd
contacts winbindd, which performs authentication on your behalf.

so there do exist userspace applications which do the same job as
keyring.

> Meeting both of these goals means that processes
> which are in the same authentication context should share connections,
> while processes in different authentication contexts should not. To
> make this happen, we need to keep track of which processes are in the
> same authentication context.

this _can_ be managed all in userspace, by writing appropriate APIs
that are protected behind appropriate access privileges on, say unix
domain sockets.

i'd be interested to hear a justification as to why it is _necessary_
for this to be done in kernel.

> Both of these are ways in which we essentially need an identifier for
> sets of processes in the same authentication context, so we can label
> other data we track. We don't need a place to store credentials; we
> need a way to associate processes.

why does that require specific assistance from the kernel?

is there any overlap between different user contexts,
such that a userspace credential cache would be insufficient?

in other words, if i log in as user1 and create boat-loads
of processes, is there _any_ circumstance under which any
arbitrary user2 _needs_ access to the cached credentials
of user1?

l.

--
<a href="http://lkcl.net">http://lkcl.net</a>
--
Matthew N. Andrews
2005-01-26 21:41:09 UTC
Permalink
> such that a userspace credential cache would be insufficient?
>
> in other words, if i log in as user1 and create boat-loads
> of processes, is there _any_ circumstance under which any
> arbitrary user2 _needs_ access to the cached credentials
> of user1?
>
I think you're missing a key feature of PAGs here. You can have a
process acquire credentials that:

1) other processes with the same uid/gid cannot access.
2) are accessible to child processes with a different uid/gid, unless
specific actions are taken to drop access by an intermediate
descendant/ancestor.

and yes, there are circumstances when changing effective uid needs NOT
to drop access to my credentials. in particular, setuid programs run by
me should retain access to my afs credentials.

-Matt Andrews

> l.
>
> --
> <a href="http://lkcl.net">http://lkcl.net</a>
> --
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-***@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel
>
>
Luke Kenneth Casson Leighton
2005-01-26 22:35:00 UTC
Permalink
matthew, i'm going to be bluntly honest here.

it sounds like a complete nightmare.

i see nothing here that a userspace implementation plus an
appropriate selinux policy couldn't gain you.

if a credential / ACL "cache" is provided in a userspace daemon,
which provides oh, say, a unix domain socket interface, and you
created an selinux policy which protected access to those sockets,
not only are you done, but also if you port your credential caching
userspace daemon to other unixen that don't happen to have selinux,
so what? at least it's portable: you don't _have_ to "layer" selinux on
top.

there are tricks that can be played when providing a unix
domain socket interface, to ensure that only a specific uid has access
to an interface, in a way that the daemon can be absolutely sure that
only that uid has access (except root of course).

you connect to a 0777 accessible socket, you say "this is
my uid", then the daemon creates a temporary directory,
chmods it 0700, chowns it to that uid, creates a unix-d-sock
in that directory, 0600 chmods and chowns it, and then sends
the location of that directory down the 0777 socket.

(the creation of the directory is necessary because some unixen
cannot handle chmod on a socket!)

the client disconnects from the 0777 socket and then reconnects to the
temporary socket. the daemon therefore knows that only a process with
that uid is able to access that socket.
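A sketch of the daemon side of that handshake. This is illustrative only: the function name and paths are made up, error handling is trimmed, and a real daemon would run as root so the chown to an arbitrary uid succeeds. The 0700 directory (not the socket inode) carries the access control, for exactly the reason given above: some Unixen ignore permissions on a socket itself.

```c
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

/* Create a per-uid private rendezvous: a 0700 directory owned by the
 * claimed uid, with a unix-domain socket bound inside it.  Returns the
 * bound socket fd, or -1 on error; the caller listen()s on it and sends
 * the directory's location back down the 0777 "public" socket. */
int make_private_socket(const char *dir, uid_t uid, gid_t gid)
{
    char path[sizeof(((struct sockaddr_un *)0)->sun_path)];
    struct sockaddr_un sa;
    int fd;

    if (mkdir(dir, 0700) < 0)
        return -1;
    if (chown(dir, uid, gid) < 0)     /* needs root privilege in real life */
        return -1;
    snprintf(path, sizeof(path), "%s/sock", dir);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The client then disconnects from the public socket and reconnects to the private one, and the daemon knows only a process with that uid (or root) can have reached it.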

you could even, on linux, play tricks with /proc (similar to what lsof
does) to ensure that only a particular _process_ has access to a socket.

... but i believe this [restricting processes] would be much
better / cleanly achieved using an selinux policy, setting up
selinux domains to restrict access.

l.



On Wed, Jan 26, 2005 at 01:41:09PM -0800, Matthew N. Andrews wrote:

> > such that a userspace credential cache would be insufficient?
> >
> > in other words, if i log in as user1 and create boat-loads
> > of processes, is there _any_ circumstance under which any
> > arbitrary user2 _needs_ access to the cached credentials
> > of user1?
> >
> I think you're missing a key feature of PAGs here. You can have a
> process acquire credentials that:
>
> 1) other processes with the same uid/gid cannot access.
> 2) are accessible to child processes with a different uid/gid, unless
> specific actions are taken to drop access by an intermediate
> descendant/ancestor.
>
> and yes, there are circumstances when changing effective uid needs NOT
> to drop access to my credentials. in particular, setuid programs run by
> me should retain access to my afs credentials.
>
> -Matt Andrews
>
> > l.
> >
> >--
> ><a href="http://lkcl.net">http://lkcl.net</a>
> >--
> >_______________________________________________
> >OpenAFS-devel mailing list
> >OpenAFS-***@openafs.org
> >https://lists.openafs.org/mailman/listinfo/openafs-devel
> >
> >
>

--
<a href="http://lkcl.net">http://lkcl.net</a>
--
Steven French
2005-01-20 21:45:29 UTC
Permalink

> is it time to consider splitting the cache manager out of individual
filesystem clients?
> If the interfaces are abstract enough, we should be able to have
multiple
> distributed fs's using the same cache manager API. Yes, there's tons of
> little details to be worked out (e.g. credential management, access
> control, etc.)

I am pleased with the recent addition of the credential keyring to the
kernel, thanks to AFS users, which will be helpful as soon as I figure out
what the user space tools are saving there at logon time - and how to hook
in kernel the plaintext password or kerberos ticket that winbind or
pam_kerberos get at logon (CIFS client has to get Kerberos tickets to
authenticate to Samba, Windows, Netapp etc.). I don't want to be
prompting for passwords as CIFS users traverse a DFS junction.

The credential management needs improvement in the kernel to handle CIFS,
NFSv4 and AFS (very similar problems), but the other part of your comment
is puzzling, since at least for Linux the client cache is common (for all
but two major filesystems) and led to much of the improvement between the
2.4 and 2.6 Linux kernels (and there is no distinct server cache, as all
of the servers run on stock filesystems). The CIFS client, NFS client,
etc. all use the common mm cache, and the CIFS client handles primitive
delegations (oplock), somewhat similar to what NFSv4 will be doing. The
in-kernel AFS client offered a client cache-to-disk patch IIRC, which it
would be mildly helpful for everyone to standardize on - and I will be
happy to integrate support for that on the CIFS client side.

Where we could really use improvement is mapping the missing pieces of the
server side of the AFS API to a set of generic operations - to construct a
list of what needs to be added to the kernel VFS so JFS, EXT3, XFS,
etc. can replicate, move, backup, snapshot, etc. better. There has been
enormous progress in the 2.6 kernel, but Samba 4 and NFSv4 have protocol
support (which currently cannot be implemented) that could plug into many
of these already.


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com
Ivan Popov
2005-01-21 09:27:33 UTC
Permalink
Hi Tom!

On Tue, Jan 18, 2005 at 04:46:14PM -0500, Tom Keiser wrote:
> Secondly, I know this is a rather drastic proposal, but is it time to
> consider splitting the cache manager out of individual filesystem clients?

What do you call a filesystem client and a cache manager in this context?

I am afraid that different people (including myself) may think about
very different things.

> If the interfaces are abstract enough, we should be able to have multiple
> distributed fs's using the same cache manager API.

Do you mean any besides AFS and DFS?

> help reduce the amount of in-kernel code for which each
> project is responsible. Anyone else think this is feasible?

Do you mean in-kernel cache management? Then probably no.
Both filesystems and kernels are of great variety.

If you mean a more general "cache bookkeeping library", then possibly yes,
but still you'll get differences depending on how FSs and OSs distribute
functionality between kernel and user space in a filesystem client.

If you mean the upcall interface (a common kernel module for different
filesystems), then probably no - it reflects both the corresponding filesystem
semantics and the corresponding kernel architecture...

Though, less demanding filesystems can be happy with "foreign" kernel
modules - like podfuk-smb or davfs2 using the Coda module.

My 2c,
--
Ivan
Luke Kenneth Casson Leighton
2005-01-21 11:01:06 UTC
Permalink
On Fri, Jan 21, 2005 at 10:27:33AM +0100, Ivan Popov wrote:

> If you mean a more general "cache bookkeeping library", then possibly yes,
> but still you'll get differences depending on how FSs and OSs distribute
> functionality between kernel and user space in a filesystem client.

i believe stephen is referring to storing NT authentication
"session keys", which are required to perform DCE/RPC signing and
sealing, that sort of thing.

i believe stephen envisages a situation where a user logs in with a
program (or uses a modified version of a pam module) that performs
NT-style authentication, and then the program or pam module
communicates the response to the kernel, which caches it "safely"
somehow.

then, should other programs or kernel modules (such as a
filesystem module, or a user-space service, or smbclient,
or REGEDT32.EXE running under Wine) require access to those
cached credentials, they may do so immediately and via a
well-defined interface.


this may get rid of the need, for example, for smbmount to
use an ioctl to communicate with smbfs.ko which passes over
the SMB username+password. that sort of thing.



personally, i think the whole idea of having complex filesystem
clients inside the kernel is utterly insane. userspace filesystems
like fuse (not so hot) and lufs (much better), where you write a
userspace helper daemon, are a much better approach, as are
application-level plugins for desktop systems like KDE (which even
has a webdav filesystem plugin) and simple command-line tools
like smbclient and ftp.

you _really_ want to port dce/rpc client-side code into the linux
kernel???

and you've _really_ ported the AFS client-side code into the linux
kernel??

*shriek*!!! :)


the gnu/hurd has the right approach to this issue: the kernel
"helps" you to "publish" a userspace service via well-known interfaces
such that other programs may use your "service" as system calls.

ironically (given the people on this list i think you'll
appreciate this), in order to achieve this, the gnu/hurd team
had to write _their own_ RPC system - including an IDL compiler
- from the ground up and utilised Mach kernel message-passing
as the transport.

now, if you proposed that the linux kernel gained
message-passing, or similar features to gnu/hurd to achieve
the same effect, or you proposed improvements to the linux
filesystem structures such that lufs and fuse don't end up
locking large critical structures, i'd say "GREAT!"

... but instead, i'm going to say "good luck" :)

l.
Rich Salz
2005-01-22 17:16:47 UTC
Permalink
Is there an easy link to where I can find out about the "linux 2.5
keyring" stuff? From name alone, it doesn't sound great. Ivan's note
about UID's seems right-on.
/r$

--
Rich Salz Chief Security Architect
DataPower Technology http://www.datapower.com
XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html