All of lore.kernel.org
 help / color / mirror / Atom feed
* Thoughts on credential switching
@ 2014-03-27  0:23 Andy Lutomirski
  2014-03-27  0:42 ` Serge Hallyn
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27  0:23 UTC (permalink / raw)
  To: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

Hi various people who care about user-space NFS servers and/or
security-relevant APIs.

I propose the following set of new syscalls:

int credfd_create(unsigned int flags): returns a new credfd that
corresponds to current's creds.

int credfd_activate(int fd, unsigned int flags): Change current's
creds to match the creds stored in fd.  To be clear, this changes both
the "subjective" and "objective" (aka real_cred and cred) because
there aren't any real semantics for what happens when userspace code
runs with real_cred != cred.

Rules:

 - credfd_activate fails (-EINVAL) if fd is not a credfd.
 - credfd_activate fails (-EPERM) if the fd's userns doesn't match
current's userns.  credfd_activate is not intended to be a substitute
for setns.
 - credfd_activate will fail (-EPERM) if LSM does not allow the
switch.  This probably needs to be a new selinux action --
dyntransition is too restrictive.


Optional:
 - credfd_create always sets cloexec, because the alternative is silly.
 - credfd_activate fails (-EINVAL) if dumpable.  This is because we
don't want a privileged daemon to be ptraced while impersonating
someone else.
 - optional: both credfd_create and credfd_activate fail if
!ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).

The first question: does this solve Ganesha's problem?

The second question: is this safe?  I can see two major concerns.  The
bigger concern is that having these syscalls available will allow
users to exploit things that were previously secure.  For example,
maybe some configuration assumes that a task running as uid==1 can't
switch to uid==2, even with uid 2's consent.  Similar issues happen
with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
longer really true.

Alternatively, something running as uid == 0 with heavy capability
restrictions in a mount namespace (but not a uid namespace) could pass
a credfd out of the namespace.  This could break things like Docker
pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
I think that Docker is already totally screwed if a Docker root task
can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
not entirely clear that the situation is any worse, even without
requiring CAP_SYS_ADMIN.

The second concern is that it may be difficult to use this correctly.
There's a reason that real_cred and cred exist, but it's not really
well set up for being used.

As a simple way to stay safe, Ganesha could only use credfds that have
real_uid == 0.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  0:23 Thoughts on credential switching Andy Lutomirski
@ 2014-03-27  0:42 ` Serge Hallyn
  2014-03-27  1:01   ` Andy Lutomirski
  2014-03-27  2:48 ` Jeff Layton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Serge Hallyn @ 2014-03-27  0:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

Quoting Andy Lutomirski (luto@amacapital.net):
> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.

Is there a URL where I can find the motivation, and why the existing
features can't be used?

My guess would be, uid 100000 is root in a container, and you want
him to be able to send a request to a root daemon on the host, on
behalf of uid 100005 in the container, over which 100000 has
privilege.  (Which is sort of what we need for the cgmanager proxy;
there we do it by checking checking that 100000 is mapped to 0 in
the requestor's uid_map, and that 100005 is mapped in that uid_map)
The credfd would be useful there, especially combined with a
credfd_access(credfd, fd, perms) call.

But I'd like to hear exactly how nfs and ganesha would use these.

What all would be assiciated with the credfd?  Everything that is
in the kernel cred?

> Rules:
> 
>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> current's userns.  credfd_activate is not intended to be a substitute
> for setns.
>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> switch.  This probably needs to be a new selinux action --
> dyntransition is too restrictive.
> 
> 
> Optional:
>  - credfd_create always sets cloexec, because the alternative is silly.
>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.
>  - optional: both credfd_create and credfd_activate fail if
> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> 
> The first question: does this solve Ganesha's problem?
> 
> The second question: is this safe?  I can see two major concerns.  The
> bigger concern is that having these syscalls available will allow
> users to exploit things that were previously secure.  For example,
> maybe some configuration assumes that a task running as uid==1 can't
> switch to uid==2, even with uid 2's consent.  Similar issues happen
> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
> longer really true.
> 
> Alternatively, something running as uid == 0 with heavy capability
> restrictions in a mount namespace (but not a uid namespace) could pass
> a credfd out of the namespace.  This could break things like Docker
> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
> I think that Docker is already totally screwed if a Docker root task
> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
> not entirely clear that the situation is any worse, even without
> requiring CAP_SYS_ADMIN.
> 
> The second concern is that it may be difficult to use this correctly.
> There's a reason that real_cred and cred exist, but it's not really
> well set up for being used.
> 
> As a simple way to stay safe, Ganesha could only use credfds that have
> real_uid == 0.
> 
> --Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  0:42 ` Serge Hallyn
@ 2014-03-27  1:01   ` Andy Lutomirski
  2014-03-27 15:41     ` Florian Weimer
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27  1:01 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

On Wed, Mar 26, 2014 at 5:42 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto@amacapital.net):
>> Hi various people who care about user-space NFS servers and/or
>> security-relevant APIs.
>>
>> I propose the following set of new syscalls:
>>
>> int credfd_create(unsigned int flags): returns a new credfd that
>> corresponds to current's creds.
>>
>> int credfd_activate(int fd, unsigned int flags): Change current's
>> creds to match the creds stored in fd.  To be clear, this changes both
>> the "subjective" and "objective" (aka real_cred and cred) because
>> there aren't any real semantics for what happens when userspace code
>> runs with real_cred != cred.
>
> Is there a URL where I can find the motivation, and why the existing
> features can't be used?

It was an LSF talk.  There's this, though:

http://thread.gmane.org/gmane.linux.file-systems/79234

Essentially, it's a performance problem.  knfsd has override_creds,
and it can cache struct cred.  But userspace doing the same thing
(i.e. impersonating a user) has to do setresuid, setresgid, and
setgroups, which kills performance, since it results in something like
five RCU callbacks per impersonation round-trip.

Windows has something called an "impersonation token" that more or
less solves this problem.

>
> My guess would be, uid 100000 is root in a container, and you want
> him to be able to send a request to a root daemon on the host, on
> behalf of uid 100005 in the container, over which 100000 has
> privilege.  (Which is sort of what we need for the cgmanager proxy;
> there we do it by checking checking that 100000 is mapped to 0 in
> the requestor's uid_map, and that 100005 is mapped in that uid_map)
> The credfd would be useful there, especially combined with a
> credfd_access(credfd, fd, perms) call.

This requires uid 100005 to send 100000 a credfd, right?

In general, making this same probably requires a way to make it safe
to call credfd_activate on an untrusted credfd.  You don't want to
expose yourself to ptrace or proc attacks from the credential
provider.  Nor do you want to suddenly get hit by rlimits, perhaps.
So maybe there really does need to be a separate subjective and
objective state.  Ugh.

Ganesha can avoid this because the caller of credfd_create is trusted.

>
> But I'd like to hear exactly how nfs and ganesha would use these.

knfsd will presumably not use it.  Ganesha will, and Jim can probably
comment further.  Samba might want to use it, too.

>
> What all would be assiciated with the credfd?  Everything that is
> in the kernel cred?

I assume so.

--Andy

>
>> Rules:
>>
>>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
>> current's userns.  credfd_activate is not intended to be a substitute
>> for setns.
>>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> switch.  This probably needs to be a new selinux action --
>> dyntransition is too restrictive.
>>
>>
>> Optional:
>>  - credfd_create always sets cloexec, because the alternative is silly.
>>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
>> don't want a privileged daemon to be ptraced while impersonating
>> someone else.
>>  - optional: both credfd_create and credfd_activate fail if
>> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>>
>> The first question: does this solve Ganesha's problem?
>>
>> The second question: is this safe?  I can see two major concerns.  The
>> bigger concern is that having these syscalls available will allow
>> users to exploit things that were previously secure.  For example,
>> maybe some configuration assumes that a task running as uid==1 can't
>> switch to uid==2, even with uid 2's consent.  Similar issues happen
>> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
>> longer really true.
>>
>> Alternatively, something running as uid == 0 with heavy capability
>> restrictions in a mount namespace (but not a uid namespace) could pass
>> a credfd out of the namespace.  This could break things like Docker
>> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
>> I think that Docker is already totally screwed if a Docker root task
>> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
>> not entirely clear that the situation is any worse, even without
>> requiring CAP_SYS_ADMIN.
>>
>> The second concern is that it may be difficult to use this correctly.
>> There's a reason that real_cred and cred exist, but it's not really
>> well set up for being used.
>>
>> As a simple way to stay safe, Ganesha could only use credfds that have
>> real_uid == 0.
>>
>> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  0:23 Thoughts on credential switching Andy Lutomirski
  2014-03-27  0:42 ` Serge Hallyn
@ 2014-03-27  2:48 ` Jeff Layton
  2014-03-27  3:05   ` Andy Lutomirski
  2014-03-27 12:46 ` Florian Weimer
  2014-03-31 10:44 ` One Thousand Gnomes
  3 siblings, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2014-03-27  2:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.
> 
> Rules:
> 
>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> current's userns.  credfd_activate is not intended to be a substitute
> for setns.
>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> switch.  This probably needs to be a new selinux action --
> dyntransition is too restrictive.
> 
> 
> Optional:
>  - credfd_create always sets cloexec, because the alternative is
> silly.
>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.
>  - optional: both credfd_create and credfd_activate fail if
> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> 
> The first question: does this solve Ganesha's problem?
> 
> The second question: is this safe?  I can see two major concerns.  The
> bigger concern is that having these syscalls available will allow
> users to exploit things that were previously secure.  For example,
> maybe some configuration assumes that a task running as uid==1 can't
> switch to uid==2, even with uid 2's consent.  Similar issues happen
> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
> longer really true.
> 
> Alternatively, something running as uid == 0 with heavy capability
> restrictions in a mount namespace (but not a uid namespace) could pass
> a credfd out of the namespace.  This could break things like Docker
> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
> I think that Docker is already totally screwed if a Docker root task
> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
> not entirely clear that the situation is any worse, even without
> requiring CAP_SYS_ADMIN.
> 
> The second concern is that it may be difficult to use this correctly.
> There's a reason that real_cred and cred exist, but it's not really
> well set up for being used.
> 
> As a simple way to stay safe, Ganesha could only use credfds that have
> real_uid == 0.
> 
> --Andy


I still don't quite grok why having this special credfd_create call
buys you anything over simply doing what Al had originally suggested --
switch creds using all of the different syscalls and then simply caching
that in a "normal" fd:

    fd = open("/dev/null", O_PATH...);

...it seems to me that the credfd_activate call will still need to do
the same permission checking that all of the individual set*id() calls
require (and all of the other stuff like changing selinux contexts,
etc).

IOW, this fd is just a "handle" for passing around a struct cred, but I
don't see why having access to that handle would allow you to do
something you couldn't already do anyway.

Am I missing something obvious here?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  2:48 ` Jeff Layton
@ 2014-03-27  3:05   ` Andy Lutomirski
  2014-03-27  3:25     ` Jeff Layton
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27  3:05 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> Hi various people who care about user-space NFS servers and/or
>> security-relevant APIs.
>>
>> I propose the following set of new syscalls:
>>
>> int credfd_create(unsigned int flags): returns a new credfd that
>> corresponds to current's creds.
>>
>> int credfd_activate(int fd, unsigned int flags): Change current's
>> creds to match the creds stored in fd.  To be clear, this changes both
>> the "subjective" and "objective" (aka real_cred and cred) because
>> there aren't any real semantics for what happens when userspace code
>> runs with real_cred != cred.
>>
>> Rules:
>>
>>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
>> current's userns.  credfd_activate is not intended to be a substitute
>> for setns.
>>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> switch.  This probably needs to be a new selinux action --
>> dyntransition is too restrictive.
>>
>>
>> Optional:
>>  - credfd_create always sets cloexec, because the alternative is
>> silly.
>>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
>> don't want a privileged daemon to be ptraced while impersonating
>> someone else.
>>  - optional: both credfd_create and credfd_activate fail if
>> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>>
>> The first question: does this solve Ganesha's problem?
>>
>> The second question: is this safe?  I can see two major concerns.  The
>> bigger concern is that having these syscalls available will allow
>> users to exploit things that were previously secure.  For example,
>> maybe some configuration assumes that a task running as uid==1 can't
>> switch to uid==2, even with uid 2's consent.  Similar issues happen
>> with capabilities.  If CAP_SYS_ADMIN is not required, then this is no
>> longer really true.
>>
>> Alternatively, something running as uid == 0 with heavy capability
>> restrictions in a mount namespace (but not a uid namespace) could pass
>> a credfd out of the namespace.  This could break things like Docker
>> pretty badly.  CAP_SYS_ADMIN guards against this to some extent.  But
>> I think that Docker is already totally screwed if a Docker root task
>> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's
>> not entirely clear that the situation is any worse, even without
>> requiring CAP_SYS_ADMIN.
>>
>> The second concern is that it may be difficult to use this correctly.
>> There's a reason that real_cred and cred exist, but it's not really
>> well set up for being used.
>>
>> As a simple way to stay safe, Ganesha could only use credfds that have
>> real_uid == 0.
>>
>> --Andy
>
>
> I still don't quite grok why having this special credfd_create call
> buys you anything over simply doing what Al had originally suggested --
> switch creds using all of the different syscalls and then simply caching
> that in a "normal" fd:
>
>     fd = open("/dev/null", O_PATH...);
>
> ...it seems to me that the credfd_activate call will still need to do
> the same permission checking that all of the individual set*id() calls
> require (and all of the other stuff like changing selinux contexts,
> etc).
>
> IOW, this fd is just a "handle" for passing around a struct cred, but I
> don't see why having access to that handle would allow you to do
> something you couldn't already do anyway.
>
> Am I missing something obvious here?

Not really.  I think I didn't adequately explain a piece of this.

I think that what you're suggesting is for an fd to encode a set of
credentials but not to grant permission to use those credentials.  So
switch_creds(fd) is more or less the same thing as switch_creds(ruid,
euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
needs to verify that the caller can dyntransition to the label, set
all the ids, etc., but it avoids allocating anything and running RCU
callbacks.

The trouble with this is that the verification needed is complicated
and expensive.  And I think that my proposal is potentially more
useful.

A credfd is like a struct cred, but possession of a credfd carries the
permission to use those credentials.  So, for example, credfd_activate
to switch to a given uid might work even if setresuid to that uid
would be disallowed.  But, for this to be secure, the act of giving
someone a credfd needs to be explicit.  Programs implicitly send other
programs their credentials by means of f_cred all the time, and they
don't expect to allow the receiver to impersonate them.

credfd has other uses.  A file server, for example, could actually
delegate creation of the credfds to a separate process, and that
process could validate that the request is for a credfd that the file
server really should be able to obtain.  This would enable that
process to make sure that the user in question has actually
authenticated itself, so a file server compromise could only attack
users who connect instead of attacking any user on the system.  This
is an argument against requiring CAP_SYS_ADMIN to use credfd_activate.

I'm less confident in whether capabilities should be needed to use
credfd_create.

Is that clearer and/or more convincing?

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  3:05   ` Andy Lutomirski
@ 2014-03-27  3:25     ` Jeff Layton
  2014-03-27 14:08       ` Jeff Layton
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2014-03-27  3:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On Wed, 26 Mar 2014 20:05:16 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton <jlayton@redhat.com>
> wrote:
> > On Wed, 26 Mar 2014 17:23:24 -0700
> > Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >> Hi various people who care about user-space NFS servers and/or
> >> security-relevant APIs.
> >>
> >> I propose the following set of new syscalls:
> >>
> >> int credfd_create(unsigned int flags): returns a new credfd that
> >> corresponds to current's creds.
> >>
> >> int credfd_activate(int fd, unsigned int flags): Change current's
> >> creds to match the creds stored in fd.  To be clear, this changes
> >> both the "subjective" and "objective" (aka real_cred and cred)
> >> because there aren't any real semantics for what happens when
> >> userspace code runs with real_cred != cred.
> >>
> >> Rules:
> >>
> >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
> >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't match
> >> current's userns.  credfd_activate is not intended to be a
> >> substitute for setns.
> >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> >> switch.  This probably needs to be a new selinux action --
> >> dyntransition is too restrictive.
> >>
> >>
> >> Optional:
> >>  - credfd_create always sets cloexec, because the alternative is
> >> silly.
> >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> >> don't want a privileged daemon to be ptraced while impersonating
> >> someone else.
> >>  - optional: both credfd_create and credfd_activate fail if
> >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> >>
> >> The first question: does this solve Ganesha's problem?
> >>
> >> The second question: is this safe?  I can see two major concerns.
> >> The bigger concern is that having these syscalls available will
> >> allow users to exploit things that were previously secure.  For
> >> example, maybe some configuration assumes that a task running as
> >> uid==1 can't switch to uid==2, even with uid 2's consent.  Similar
> >> issues happen with capabilities.  If CAP_SYS_ADMIN is not
> >> required, then this is no longer really true.
> >>
> >> Alternatively, something running as uid == 0 with heavy capability
> >> restrictions in a mount namespace (but not a uid namespace) could
> >> pass a credfd out of the namespace.  This could break things like
> >> Docker pretty badly.  CAP_SYS_ADMIN guards against this to some
> >> extent.  But I think that Docker is already totally screwed if a
> >> Docker root task can receive an O_DIRECTORY or O_PATH fd out of
> >> the container, so it's not entirely clear that the situation is
> >> any worse, even without requiring CAP_SYS_ADMIN.
> >>
> >> The second concern is that it may be difficult to use this
> >> correctly. There's a reason that real_cred and cred exist, but
> >> it's not really well set up for being used.
> >>
> >> As a simple way to stay safe, Ganesha could only use credfds that
> >> have real_uid == 0.
> >>
> >> --Andy
> >
> >
> > I still don't quite grok why having this special credfd_create call
> > buys you anything over simply doing what Al had originally
> > suggested -- switch creds using all of the different syscalls and
> > then simply caching that in a "normal" fd:
> >
> >     fd = open("/dev/null", O_PATH...);
> >
> > ...it seems to me that the credfd_activate call will still need to
> > do the same permission checking that all of the individual set*id()
> > calls require (and all of the other stuff like changing selinux
> > contexts, etc).
> >
> > IOW, this fd is just a "handle" for passing around a struct cred,
> > but I don't see why having access to that handle would allow you to
> > do something you couldn't already do anyway.
> >
> > Am I missing something obvious here?
> 
> Not really.  I think I didn't adequately explain a piece of this.
> 
> I think that what you're suggesting is for an fd to encode a set of
> credentials but not to grant permission to use those credentials.  So
> switch_creds(fd) is more or less the same thing as switch_creds(ruid,
> euid, suid, rgid, egid, sgid, groups, mac label, ...).  switch_creds
> needs to verify that the caller can dyntransition to the label, set
> all the ids, etc., but it avoids allocating anything and running RCU
> callbacks.
> 
> The trouble with this is that the verification needed is complicated
> and expensive.  And I think that my proposal is potentially more
> useful.
> 

Is it really though? My understanding of the problem was that it was
the syscall (context switching) overhead + having to do a bunch of RCU
critical stuff that was the problem. If we can do all of this in the
context of a single RCU critical section, isn't that still a win?

As to the complicated part...maybe but it doesn't seem like it would
have to be. We could simply return -EINVAL or something if the old
struct cred doesn't have fields that match the ones we're replacing and
that we don't expect to see changed.

> A credfd is like a struct cred, but possession of a credfd carries the
> permission to use those credentials.  So, for example, credfd_activate
> to switch to a given uid might work even if setresuid to that uid
> would be disallowed.  But, for this to be secure, the act of giving
> someone a credfd needs to be explicit.  Programs implicitly send other
> programs their credentials by means of f_cred all the time, and they
> don't expect to allow the receiver to impersonate them.
> 

Eek!

Passing around permission to use the credential in conjunction with
the credentials themselves sounds a lot more dangerous to me.

My preference would be that we don't add anything that potentially
gives you privileges to do something you couldn't do already with
existing syscalls. That doesn't seem to be necessary for the intended
use case anyway. When it comes to security, I think we need to err on
the side of caution than try to shortcut it for performance.

> credfd has other uses.  A file server, for example, could actually
> delegate creation of the credfds to a separate process, and that
> process could validate that the request is for a credfd that the file
> server really should be able to obtain.  This would enable that
> process to make sure that the user in question has actually
> authenticated itself, so a file server compromise could only attack
> users who connect instead of attacking any user on the system.  This
> is an argument against requiring CAP_SYS_ADMIN to use credfd_activate.
> 
> I'm less confident in whether capabilities should be needed to use
> credfd_create.
> 
> Is that clearer and/or more convincing?
> 

My worry would be that you could then compromise the process doing this
credfd creation and trick it into passing around credentials that
aren't what are expected.

Another possibility could some kernel bug allow you to frob the creds
that are attached to an existing fd after they've been "vetted" for use.

So yeah, I think I better understand what you're proposing (and thanks
for explaining it), but I'm not convinced that it's really a safe idea.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  0:23 Thoughts on credential switching Andy Lutomirski
  2014-03-27  0:42 ` Serge Hallyn
  2014-03-27  2:48 ` Jeff Layton
@ 2014-03-27 12:46 ` Florian Weimer
  2014-03-27 13:02   ` Jeff Layton
  2014-03-31 10:44 ` One Thousand Gnomes
  3 siblings, 1 reply; 42+ messages in thread
From: Florian Weimer @ 2014-03-27 12:46 UTC (permalink / raw)
  To: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On 03/27/2014 01:23 AM, Andy Lutomirski wrote:

> I propose the following set of new syscalls:
>
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
>
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.

This interface does not address the long-term lack of POSIX compliance 
in setuid and friends, which are required to be process-global and not 
thread-specific (as they are on the kernel side).

glibc works around this by reserving a signal and running set*id on 
every thread in a special signal handler.  This is just crass, and it is 
likely impossible to restore the original process state in case of 
partial failure.  We really need kernel support to perform the 
process-wide switch in an all-or-nothing manner.

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 12:46 ` Florian Weimer
@ 2014-03-27 13:02   ` Jeff Layton
  2014-03-27 13:06     ` Florian Weimer
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2014-03-27 13:02 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields

On Thu, 27 Mar 2014 13:46:06 +0100
Florian Weimer <fweimer@redhat.com> wrote:

> On 03/27/2014 01:23 AM, Andy Lutomirski wrote:
> 
> > I propose the following set of new syscalls:
> >
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> >
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes
> > both the "subjective" and "objective" (aka real_cred and cred)
> > because there aren't any real semantics for what happens when
> > userspace code runs with real_cred != cred.
> 
> This interface does not address the long-term lack of POSIX
> compliance in setuid and friends, which are required to be
> process-global and not thread-specific (as they are on the kernel
> side).
> 
> glibc works around this by reserving a signal and running set*id on 
> every thread in a special signal handler.  This is just crass, and it
> is likely impossible to restore the original process state in case of 
> partial failure.  We really need kernel support to perform the 
> process-wide switch in an all-or-nothing manner.
> 

I disagree. We're treading new ground here with this syscall. It's
not defined by POSIX so we're under no obligation to follow its silly
directives in this regard. Per-process cred switching doesn't really
make much sense in this day and age, IMO. Wasn't part of the spec was
written before threading existed


The per-process credential switching is pretty universally a pain in
the ass for anyone who wants to write something like a threaded file
server. Jeremy Allison had to jump through some rather major hoops to
work around it for Samba [1]. I think we want to strive to make this a
per-task credential switch and ignore that part of the POSIX spec.

That said, I think we will need to understand and document what we
expect to occur if someone does this new switch_creds(fd) call and then
subsequently calls something like setuid(), if only to ensure that we
don't get blindsided by it.

[1]: http://sourceware.org/ml/libc-help/2012-07/msg00004.html
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 13:02   ` Jeff Layton
@ 2014-03-27 13:06     ` Florian Weimer
  2014-03-27 13:33       ` Boaz Harrosh
  2014-03-27 14:01       ` Jeff Layton
  0 siblings, 2 replies; 42+ messages in thread
From: Florian Weimer @ 2014-03-27 13:06 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields

On 03/27/2014 02:02 PM, Jeff Layton wrote:

>> This interface does not address the long-term lack of POSIX
>> compliance in setuid and friends, which are required to be
>> process-global and not thread-specific (as they are on the kernel
>> side).
>>
>> glibc works around this by reserving a signal and running set*id on
>> every thread in a special signal handler.  This is just crass, and it
>> is likely impossible to restore the original process state in case of
>> partial failure.  We really need kernel support to perform the
>> process-wide switch in an all-or-nothing manner.
>>
>
> I disagree. We're treading new ground here with this syscall. It's
> not defined by POSIX so we're under no obligation to follow its silly
> directives in this regard. Per-process cred switching doesn't really
> make much sense in this day and age, IMO. Wasn't part of the spec was
> written before threading existed

Okay, then we need to add a separate set of system calls.

I really, really want to get rid of that signal handler mess in glibc, 
with its partial failures.

> The per-process credential switching is pretty universally a pain in
> the ass for anyone who wants to write something like a threaded file
> server. Jeremy Allison had to jump through some rather major hoops to
> work around it for Samba [1]. I think we want to strive to make this a
> per-task credential switch and ignore that part of the POSIX spec.

Yeah, I get that, setfsuid/setfsgid already behaves this way.

(Current directory and umask are equally problematic, but it's possible 
to avoid most issues.)

> That said, I think we will need to understand and document what we
> expect to occur if someone does this new switch_creds(fd) call and then
> subsequently calls something like setuid(), if only to ensure that we
> don't get blindsided by it.

Currently, from the kernel perspective, this is not really a problem 
because the credentials are always per-task.  It's just that a 
conforming user space needs the process-wide credentials.

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 13:06     ` Florian Weimer
@ 2014-03-27 13:33       ` Boaz Harrosh
  2014-04-22 11:37         ` Florian Weimer
  2014-03-27 14:01       ` Jeff Layton
  1 sibling, 1 reply; 42+ messages in thread
From: Boaz Harrosh @ 2014-03-27 13:33 UTC (permalink / raw)
  To: Florian Weimer, Jeff Layton, Jim Lieb
  Cc: Andy Lutomirski, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On 03/27/2014 03:06 PM, Florian Weimer wrote:
> On 03/27/2014 02:02 PM, Jeff Layton wrote:
> 
>>> This interface does not address the long-term lack of POSIX
>>> compliance in setuid and friends, which are required to be
>>> process-global and not thread-specific (as they are on the kernel
>>> side).
>>>
>>> glibc works around this by reserving a signal and running set*id on
>>> every thread in a special signal handler.  This is just crass, and it
>>> is likely impossible to restore the original process state in case of
>>> partial failure.  We really need kernel support to perform the
>>> process-wide switch in an all-or-nothing manner.
>>>
>>
>> I disagree. We're treading new ground here with this syscall. It's
>> not defined by POSIX so we're under no obligation to follow its silly
>> directives in this regard. Per-process cred switching doesn't really
>> make much sense in this day and age, IMO. Wasn't part of the spec was
>> written before threading existed
> 
> Okay, then we need to add a separate set of system calls.
> 

separate set of system calls is exactly what we do not need. The syscalls
are all per thread.

What we do need is a new set of glibc APIs to follow what everyone
who cares does now. Because no body uses the current code - setuid
and friends - The huge majority of utilities are just single threaded
and could careless. The few threaded apps that do use setuid and friends
use the SYSCALL directly.

Both samba and Ganesha have their own define of setthread_uid/setthread_gid/
setthread_groops. and avoid glibc as a plague.

man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
and be done with it POSIX or no POSIX who cares?

and a new well define set is to be defined by glibc implemented easily
not only on Linux (MACOX FreeBSD are the same as linux)

And that is that

> I really, really want to get rid of that signal handler mess in glibc, 
> with its partial failures.
> 

I would not spend one second on an API no one uses. As I said most
users are single threaded and could care less. Few threaded users
already do direct syscalls

>> The per-process credential switching is pretty universally a pain in
>> the ass for anyone who wants to write something like a threaded file
>> server. Jeremy Allison had to jump through some rather major hoops to
>> work around it for Samba [1]. I think we want to strive to make this a
>> per-task credential switch and ignore that part of the POSIX spec.
> 
> Yeah, I get that, setfsuid/setfsgid already behaves this way.
> 
> (Current directory and umask are equally problematic, but it's possible 
> to avoid most issues.)
> 
>> That said, I think we will need to understand and document what we
>> expect to occur if someone does this new switch_creds(fd) call and then
>> subsequently calls something like setuid(), if only to ensure that we
>> don't get blindsided by it.
> 
> Currently, from the kernel perspective, this is not really a problem 
> because the credentials are always per-task.  It's just that a 
> conforming user space needs the process-wide credentials.
> 

Please show me one user of that and declare it brain dead.
POSIX or not it just does not have any real programming mining
at all. Plain and simple. Even if you made  process-wide credentials
atomic it would still mean nothing.

That said. I second Jeff's point of Jim's implementation doing
all we really need. It is fast and simple and can give us all
we need and with a single added SYSCALL. The patch is already
out there. I thought it was even considered for inclusion through
Al viro.

Cheers
Boaz


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 13:06     ` Florian Weimer
  2014-03-27 13:33       ` Boaz Harrosh
@ 2014-03-27 14:01       ` Jeff Layton
  2014-03-27 18:26         ` Jeremy Allison
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2014-03-27 14:01 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields

On Thu, 27 Mar 2014 14:06:32 +0100
Florian Weimer <fweimer@redhat.com> wrote:

> On 03/27/2014 02:02 PM, Jeff Layton wrote:
> 
> >> This interface does not address the long-term lack of POSIX
> >> compliance in setuid and friends, which are required to be
> >> process-global and not thread-specific (as they are on the kernel
> >> side).
> >>
> >> glibc works around this by reserving a signal and running set*id on
> >> every thread in a special signal handler.  This is just crass, and
> >> it is likely impossible to restore the original process state in
> >> case of partial failure.  We really need kernel support to perform
> >> the process-wide switch in an all-or-nothing manner.
> >>
> >
> > I disagree. We're treading new ground here with this syscall. It's
> > not defined by POSIX so we're under no obligation to follow its
> > silly directives in this regard. Per-process cred switching doesn't
> > really make much sense in this day and age, IMO. Wasn't part of the
> > spec was written before threading existed
> 
> Okay, then we need to add a separate set of system calls.
> 
> I really, really want to get rid of that signal handler mess in
> glibc, with its partial failures.
> 

I agree, it's a hack, but hardly anyone these days really wants to
switch creds on a per-process basis. It's just that we're saddled with
a spec for those calls that was written before threads really existed.

The kernel syscalls already do the right thing as far as I'm concerned.
What would be nice however is a blessed glibc interface to them
that didn't involve all of the signal handling stuff. Then samba et. al.
wouldn't need to call syscall() directly to get at them.

> > The per-process credential switching is pretty universally a pain in
> > the ass for anyone who wants to write something like a threaded file
> > server. Jeremy Allison had to jump through some rather major hoops
> > to work around it for Samba [1]. I think we want to strive to make
> > this a per-task credential switch and ignore that part of the POSIX
> > spec.
> 
> Yeah, I get that, setfsuid/setfsgid already behaves this way.
> 
> (Current directory and umask are equally problematic, but it's
> possible to avoid most issues.)
> 
> > That said, I think we will need to understand and document what we
> > expect to occur if someone does this new switch_creds(fd) call and
> > then subsequently calls something like setuid(), if only to ensure
> > that we don't get blindsided by it.
> 
> Currently, from the kernel perspective, this is not really a problem 
> because the credentials are always per-task.  It's just that a 
> conforming user space needs the process-wide credentials.
> 

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  3:25     ` Jeff Layton
@ 2014-03-27 14:08       ` Jeff Layton
  2014-03-29  6:43         ` Alex Elsayed
  2014-03-30 13:03         ` Theodore Ts'o
  0 siblings, 2 replies; 42+ messages in thread
From: Jeff Layton @ 2014-03-27 14:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On Wed, 26 Mar 2014 20:25:35 -0700
Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 26 Mar 2014 20:05:16 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton <jlayton@redhat.com>
> > wrote:
> > > On Wed, 26 Mar 2014 17:23:24 -0700
> > > Andy Lutomirski <luto@amacapital.net> wrote:
> > >
> > >> Hi various people who care about user-space NFS servers and/or
> > >> security-relevant APIs.
> > >>
> > >> I propose the following set of new syscalls:
> > >>
> > >> int credfd_create(unsigned int flags): returns a new credfd that
> > >> corresponds to current's creds.
> > >>
> > >> int credfd_activate(int fd, unsigned int flags): Change current's
> > >> creds to match the creds stored in fd.  To be clear, this changes
> > >> both the "subjective" and "objective" (aka real_cred and cred)
> > >> because there aren't any real semantics for what happens when
> > >> userspace code runs with real_cred != cred.
> > >>
> > >> Rules:
> > >>
> > >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
> > >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't
> > >> match current's userns.  credfd_activate is not intended to be a
> > >> substitute for setns.
> > >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
> > >> switch.  This probably needs to be a new selinux action --
> > >> dyntransition is too restrictive.
> > >>
> > >>
> > >> Optional:
> > >>  - credfd_create always sets cloexec, because the alternative is
> > >> silly.
> > >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because
> > >> we don't want a privileged daemon to be ptraced while
> > >> impersonating someone else.
> > >>  - optional: both credfd_create and credfd_activate fail if
> > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
> > >>
> > >> The first question: does this solve Ganesha's problem?
> > >>
> > >> The second question: is this safe?  I can see two major concerns.
> > >> The bigger concern is that having these syscalls available will
> > >> allow users to exploit things that were previously secure.  For
> > >> example, maybe some configuration assumes that a task running as
> > >> uid==1 can't switch to uid==2, even with uid 2's consent.
> > >> Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
> > >> required, then this is no longer really true.
> > >>
> > >> Alternatively, something running as uid == 0 with heavy
> > >> capability restrictions in a mount namespace (but not a uid
> > >> namespace) could pass a credfd out of the namespace.  This could
> > >> break things like Docker pretty badly.  CAP_SYS_ADMIN guards
> > >> against this to some extent.  But I think that Docker is already
> > >> totally screwed if a Docker root task can receive an O_DIRECTORY
> > >> or O_PATH fd out of the container, so it's not entirely clear
> > >> that the situation is any worse, even without requiring
> > >> CAP_SYS_ADMIN.
> > >>
> > >> The second concern is that it may be difficult to use this
> > >> correctly. There's a reason that real_cred and cred exist, but
> > >> it's not really well set up for being used.
> > >>
> > >> As a simple way to stay safe, Ganesha could only use credfds that
> > >> have real_uid == 0.
> > >>
> > >> --Andy
> > >
> > >
> > > I still don't quite grok why having this special credfd_create
> > > call buys you anything over simply doing what Al had originally
> > > suggested -- switch creds using all of the different syscalls and
> > > then simply caching that in a "normal" fd:
> > >
> > >     fd = open("/dev/null", O_PATH...);
> > >
> > > ...it seems to me that the credfd_activate call will still need to
> > > do the same permission checking that all of the individual
> > > set*id() calls require (and all of the other stuff like changing
> > > selinux contexts, etc).
> > >
> > > IOW, this fd is just a "handle" for passing around a struct cred,
> > > but I don't see why having access to that handle would allow you
> > > to do something you couldn't already do anyway.
> > >
> > > Am I missing something obvious here?
> > 
> > Not really.  I think I didn't adequately explain a piece of this.
> > 
> > I think that what you're suggesting is for an fd to encode a set of
> > credentials but not to grant permission to use those credentials.
> > So switch_creds(fd) is more or less the same thing as
> > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
> > label, ...).  switch_creds needs to verify that the caller can
> > dyntransition to the label, set all the ids, etc., but it avoids
> > allocating anything and running RCU callbacks.
> > 
> > The trouble with this is that the verification needed is complicated
> > and expensive.  And I think that my proposal is potentially more
> > useful.
> > 
> 
> Is it really though? My understanding of the problem was that it was
> the syscall (context switching) overhead + having to do a bunch of RCU
> critical stuff that was the problem. If we can do all of this in the
> context of a single RCU critical section, isn't that still a win?
> 
> As to the complicated part...maybe but it doesn't seem like it would
> have to be. We could simply return -EINVAL or something if the old
> struct cred doesn't have fields that match the ones we're replacing
> and that we don't expect to see changed.
> 
> > A credfd is like a struct cred, but possession of a credfd carries
> > the permission to use those credentials.  So, for example,
> > credfd_activate to switch to a given uid might work even if
> > setresuid to that uid would be disallowed.  But, for this to be
> > secure, the act of giving someone a credfd needs to be explicit.
> > Programs implicitly send other programs their credentials by means
> > of f_cred all the time, and they don't expect to allow the receiver
> > to impersonate them.
> > 
> 
> Eek!
> 
> Passing around permission to use the credential in conjunction with
> the credentials themselves sounds a lot more dangerous to me.
> 
> My preference would be that we don't add anything that potentially
> gives you privileges to do something you couldn't do already with
> existing syscalls. That doesn't seem to be necessary for the intended
> use case anyway. When it comes to security, I think we need to err on
> the side of caution than try to shortcut it for performance.
> 
> > credfd has other uses.  A file server, for example, could actually
> > delegate creation of the credfds to a separate process, and that
> > process could validate that the request is for a credfd that the
> > file server really should be able to obtain.  This would enable that
> > process to make sure that the user in question has actually
> > authenticated itself, so a file server compromise could only attack
> > users who connect instead of attacking any user on the system.  This
> > is an argument against requiring CAP_SYS_ADMIN to use
> > credfd_activate.
> > 
> > I'm less confident in whether capabilities should be needed to use
> > credfd_create.
> > 
> > Is that clearer and/or more convincing?
> > 
> 
> My worry would be that you could then compromise the process doing
> this credfd creation and trick it into passing around credentials that
> aren't what are expected.
> 
> Another possibility could some kernel bug allow you to frob the creds
> that are attached to an existing fd after they've been "vetted" for
> use.
> 
> So yeah, I think I better understand what you're proposing (and thanks
> for explaining it), but I'm not convinced that it's really a safe
> idea.
> 

I had some time to think about this last night...

While using a fd to pass around credentials is convenient, the danger
is that it's pretty opaque. You have a fd that you know has creds
attached to it, but it's hard to be certain what is going to change.

Perhaps we can use the flags field for that. So, assuming we have a fd
with the creds attached, we could do something like:

    err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);

...then the switch_creds syscall could be set up to fail if the new
credentials had other fields that didn't match those in the current
task credentials. So if (for instance) the cred->euid were
different between the two, the above could fail with -EINVAL or
something.

Yes, that means that we need to check this stuff on every switch_creds
call, but I don't think it'll be that expensive.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  1:01   ` Andy Lutomirski
@ 2014-03-27 15:41     ` Florian Weimer
  2014-03-27 16:21       ` Andy Lutomirski
  0 siblings, 1 reply; 42+ messages in thread
From: Florian Weimer @ 2014-03-27 15:41 UTC (permalink / raw)
  To: Andy Lutomirski, Serge Hallyn
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

On 03/27/2014 02:01 AM, Andy Lutomirski wrote:

> Essentially, it's a performance problem.  knfsd has override_creds,
> and it can cache struct cred.  But userspace doing the same thing
> (i.e. impersonating a user) has to do setresuid, setresgid, and
> setgroups, which kills performance, since it results in something like
> five RCU callbacks per impersonation round-trip.

Do you mean setfsuid instead of setresuid?

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 15:41     ` Florian Weimer
@ 2014-03-27 16:21       ` Andy Lutomirski
  0 siblings, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27 16:21 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Serge Hallyn, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thu, Mar 27, 2014 at 8:41 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 03/27/2014 02:01 AM, Andy Lutomirski wrote:
>
>> Essentially, it's a performance problem.  knfsd has override_creds,
>> and it can cache struct cred.  But userspace doing the same thing
>> (i.e. impersonating a user) has to do setresuid, setresgid, and
>> setgroups, which kills performance, since it results in something like
>> five RCU callbacks per impersonation round-trip.
>
>
> Do you mean setfsuid instead of setresuid?

No, but I should have given context.  setfsuid sucks, since its return
value is garbage.  It's also pointless nowadays, since setresuid can
be used to sanely change only the effective uid, and doing so should
be secure.

See: http://article.gmane.org/gmane.linux.kernel/1540880

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 14:01       ` Jeff Layton
@ 2014-03-27 18:26         ` Jeremy Allison
  2014-03-27 18:46           ` Andy Lutomirski
  2014-03-27 19:30           ` Jim Lieb
  0 siblings, 2 replies; 42+ messages in thread
From: Jeremy Allison @ 2014-03-27 18:26 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Florian Weimer, Andy Lutomirski, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
> On Thu, 27 Mar 2014 14:06:32 +0100
> Florian Weimer <fweimer@redhat.com> wrote:
> 
> > On 03/27/2014 02:02 PM, Jeff Layton wrote:
> > 
> > >> This interface does not address the long-term lack of POSIX
> > >> compliance in setuid and friends, which are required to be
> > >> process-global and not thread-specific (as they are on the kernel
> > >> side).
> > >>
> > >> glibc works around this by reserving a signal and running set*id on
> > >> every thread in a special signal handler.  This is just crass, and
> > >> it is likely impossible to restore the original process state in
> > >> case of partial failure.  We really need kernel support to perform
> > >> the process-wide switch in an all-or-nothing manner.
> > >>
> > >
> > > I disagree. We're treading new ground here with this syscall. It's
> > > not defined by POSIX so we're under no obligation to follow its
> > > silly directives in this regard. Per-process cred switching doesn't
> > > really make much sense in this day and age, IMO. Wasn't part of the
> > > spec was written before threading existed
> > 
> > Okay, then we need to add a separate set of system calls.
> > 
> > I really, really want to get rid of that signal handler mess in
> > glibc, with its partial failures.
> > 
> 
> I agree, it's a hack, but hardly anyone these days really wants to
> switch creds on a per-process basis. It's just that we're saddled with
> a spec for those calls that was written before threads really existed.
> 
> The kernel syscalls already do the right thing as far as I'm concerned.
> What would be nice however is a blessed glibc interface to them
> that didn't involve all of the signal handling stuff. Then samba et. al.
> wouldn't need to call syscall() directly to get at them.

Amen to that :-).

However, after talking with Jeff and Jim at CollabSummit,
I was 'encouraged' to make my opinions known on the list.

To me, calling the creds handle a file descriptor just
feels wrong. IT *isn't* an fd, you can't read/write/poll
on it, and it's only done as a convenience to get the
close-on-exec semantics and the fact that the creds are
already hung off the fd's in kernel space.

I'd rather any creads call use a different type, even if
it's a typedef of 'int -> creds_handle_t', just to make
it really clear it's *NOT* an fd.

That way we can also make it clear this thing only has
meaning to a thread group, and SHOULD NOT (and indeed
preferably CAN NOT) be passed between processes.

Cheers,

	Jeremy.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 18:26         ` Jeremy Allison
@ 2014-03-27 18:46           ` Andy Lutomirski
  2014-03-27 18:56             ` Jeremy Allison
  2014-03-27 19:30           ` Jim Lieb
  1 sibling, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27 18:46 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Jeff Layton, Florian Weimer, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison <jra@samba.org> wrote:
>
> Amen to that :-).
>
> However, after talking with Jeff and Jim at CollabSummit,
> I was 'encouraged' to make my opinions known on the list.
>
> To me, calling the creds handle a file descriptor just
> feels wrong. IT *isn't* an fd, you can't read/write/poll
> on it, and it's only done as a convenience to get the
> close-on-exec semantics and the fact that the creds are
> already hung off the fd's in kernel space.

Windows calls these things "handles."  Linux has "file descriptors,"
and there's plenty of precedent for things that aren't files.

>
> I'd rather any creads call use a different type, even if
> it's a typedef of 'int -> creds_handle_t', just to make
> it really clear it's *NOT* an fd.
>
> That way we can also make it clear this thing only has
> meaning to a thread group, and SHOULD NOT (and indeed
> preferably CAN NOT) be passed between processes.
>

If you want those semantics, then stick a struct pid * in there for
the tgid of the cretor and make sure that current's tgid matches when
you try to use it.

I think they'd be more useful without that check, though.

BTW, what do you want to have happen on fork?  I think they should keep working.

> Cheers,
>
>         Jeremy.



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 18:46           ` Andy Lutomirski
@ 2014-03-27 18:56             ` Jeremy Allison
  2014-03-27 19:02               ` Andy Lutomirski
  0 siblings, 1 reply; 42+ messages in thread
From: Jeremy Allison @ 2014-03-27 18:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jeremy Allison, Jeff Layton, Florian Weimer, Jim Lieb,
	Eric W. Biederman, LSM List, Serge E. Hallyn, Kees Cook,
	Linux FS Devel, Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison <jra@samba.org> wrote:
> >
> > Amen to that :-).
> >
> > However, after talking with Jeff and Jim at CollabSummit,
> > I was 'encouraged' to make my opinions known on the list.
> >
> > To me, calling the creds handle a file descriptor just
> > feels wrong. IT *isn't* an fd, you can't read/write/poll
> > on it, and it's only done as a convenience to get the
> > close-on-exec semantics and the fact that the creds are
> > already hung off the fd's in kernel space.
> 
> Windows calls these things "handles."  Linux has "file descriptors,"
> and there's plenty of precedent for things that aren't files.

Sure, but there's a set of expectations around
fd's that these things don't satisfy - IO-ops.

> > That way we can also make it clear this thing only has
> > meaning to a thread group, and SHOULD NOT (and indeed
> > preferably CAN NOT) be passed between processes.
> >
> 
> If you want those semantics, then stick a struct pid * in there for
> the tgid of the cretor and make sure that current's tgid matches when
> you try to use it.
> 
> I think they'd be more useful without that check, though.

I'm more worried about leakage and unintended consequences
here.

> BTW, what do you want to have happen on fork?  I think they should keep working.

Yeah, that's true. I want them to keep
working across fork, but not across exec
or any other method of fd-passing.

Jeremy.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 18:56             ` Jeremy Allison
@ 2014-03-27 19:02               ` Andy Lutomirski
  0 siblings, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27 19:02 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Jeff Layton, Florian Weimer, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 11:56 AM, Jeremy Allison <jra@samba.org> wrote:
> On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison <jra@samba.org> wrote:
>> >
>> > Amen to that :-).
>> >
>> > However, after talking with Jeff and Jim at CollabSummit,
>> > I was 'encouraged' to make my opinions known on the list.
>> >
>> > To me, calling the creds handle a file descriptor just
>> > feels wrong. IT *isn't* an fd, you can't read/write/poll
>> > on it, and it's only done as a convenience to get the
>> > close-on-exec semantics and the fact that the creds are
>> > already hung off the fd's in kernel space.
>>
>> Windows calls these things "handles."  Linux has "file descriptors,"
>> and there's plenty of precedent for things that aren't files.
>
> Sure, but there's a set of expectations around
> fd's that these things don't satisfy - IO-ops.

eventfd, timerfd, and signalfd barely satisfy those.  namespace fds
don't satisfy those expectations at all.  And /proc/pid/fd is really
quite useful for debugging.

>
>> > That way we can also make it clear this thing only has
>> > meaning to a thread group, and SHOULD NOT (and indeed
>> > preferably CAN NOT) be passed between processes.
>> >
>>
>> If you want those semantics, then stick a struct pid * in there for
>> the tgid of the cretor and make sure that current's tgid matches when
>> you try to use it.
>>
>> I think they'd be more useful without that check, though.
>
> I'm more worried about leakage and unintended consequences
> here.
>
>> BTW, what do you want to have happen on fork?  I think they should keep working.
>
> Yeah, that's true. I want them to keep
> working across fork, but not across exec
> or any other method of fd-passing.
>

This seems like an unfortunate restriction to put in the kernel to
prevent userspace from shooting itself in the foot.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Thoughts on credential switching
  2014-03-27 18:26         ` Jeremy Allison
  2014-03-27 18:46           ` Andy Lutomirski
@ 2014-03-27 19:30           ` Jim Lieb
  2014-03-27 19:45             ` Andy Lutomirski
  1 sibling, 1 reply; 42+ messages in thread
From: Jim Lieb @ 2014-03-27 19:30 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Jeff Layton, Florian Weimer, Andy Lutomirski, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

Rather than inline, I'm responding in the context of Jeremy's comments but I 
have to answer others as well.  It is Jeremy after all who said my baby was 
ugly ;).

Jeremy is right about overloading "fd".  Maybe I can call it something else 
but an fd (in implementation) has merit because a creds struct hangs off of 
either/or/both task_struct and file but nothing else.  Coming up with a creds 
token struct adds/intros another struct to the kernel that has to justify its 
existence.  This fd points to an anon inode so the resource consumption is 
pretty low and all the paths for managing it are well known and *work*.  I'm 
trying to get across the swamp, not build a new Golden Gate bridge...

As for POSIX, all of the pthreads API was based on CDE threads whose initial 
implementation was on Ultrix (BSD 4.2).  Process wide was assumed because 
utheeads was a user space hack and setuid was process wide because a proc was 
a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach 
2.5 kernel just carried that forward with its proc + thread(s) model to be 
compatible with the old world.  In other words, we incrementally backed 
ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't 
see all that far into the future.  Enough said.  These calls are *outside* 
POSIX.  Pthreads in 2014 is broken/obsolete.

For the interface, I changed it from what is in the cited lkml below.   It is 
now:

   int switch_creds(struct user_creds *creds);

Behavior is the same except the mux'd syscall idea is gone.  Adding a flags arg 
to this is a useful idea both for current use and future proofing the API.  But 
this requires a second call

   int use_creds(int fd);

This call does the "use an fd" case but adds -1 to revert to real creds.  Its 
guts are either an override_creds(fd->file->creds) or a revert_creds().  Nice 
and quick.  Note that use_creds(fd) is used if I have cached the fd/token from 
switch_creds().  Also nice and quick.

Given that I've done the checking in switch_creds and I don't pass anything 
back other than the token/fd and this token/fd is/will be confined to the 
thread group, use_creds(fd) only needs to validate that it is a switch_creds 
one, not from an arbitrary open().  I do this.

I have focused this down to "fsuid" because I intend this for ops that file 
perms.  Other stuff is out of scope unless someone can come up with a use case 
and add flag defs...  The other variations on the theme uid, euid, and that 
other one I don't understand the use for, are for long lasting creds change, 
i.e. become user "bob" and go off an fork a shell...  I am wrapping I/O.

I do not like the idea of spinning off a separate proc to do creds work.  It 
doesn't buy anything in performance (everybody is a task to the kernel) but it 
does open a door to scary places.  Jeremy and I agree that this token/fd must 
stay within the thread group, aka, process.  I have already (in the newer 
patchset) tied off inheritance by opening the anon fd with close-on-exec.  I 
think I tied off passing the fd thru a unix socket but if not, I will in my 
next submission.  This fd/token should stay within the thread group, period.

As to the "get an fd and then do set*id", you have to do this twice because 
that fd gets the creds at the time of open, not after fooling around.  I am 
trying to avoid multiple RCU cycles, not add more.  Second, the above path 
makes the creds switch atomic because I use the creds swapping infrastructure. 
Popping back up to user space before that *all* happens opens all kinds of 
ptrace+signal+??? holes.

Note also that I mask caps as well in the override creds including the caps to 
do things like set*id.  That is intentional.  This whole idea is to constrain 
the thread, not open a door *but* still provide a way to get back home 
(safely).  That is via use_creds(-1).

Some have proposed that we personalize a worker thread (the rpc op processor) 
to the creds of the client user right off.  Ganesha only needs to do this user 
creds work for VFS (kernel local) filesystems.  Most of our cluster filesystems 
have apis that allow us to pass creds directly on calls.  We only need this 
for that local mounted namespace.  The server core owns rpc and ops, the 
backend (FSAL) owns the shim layer.  User creds are backend...  Having a 
personalized thread complicates the core.

As I mentioned at LSF, I have another set pending that needs a bit more polish 
to answer issues from the last cycle.  I need to fix the issue of handling 
syscalls that would do their own creds swapping inside the switch_creds ... 
use_creds(-1) region.  The patch causes these syscalls, e.g. setuid() to 
EPERM.  Again, I like this because I want the client creds impersonating 
thread to only be able to do I/O, not escape into the wild.

Jim
On Thursday, March 27, 2014 11:26:17 Jeremy Allison wrote:
> On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
> > On Thu, 27 Mar 2014 14:06:32 +0100
> > 
> > Florian Weimer <fweimer@redhat.com> wrote:
> > > On 03/27/2014 02:02 PM, Jeff Layton wrote:
> > > >> This interface does not address the long-term lack of POSIX
> > > >> compliance in setuid and friends, which are required to be
> > > >> process-global and not thread-specific (as they are on the kernel
> > > >> side).
> > > >> 
> > > >> glibc works around this by reserving a signal and running set*id on
> > > >> every thread in a special signal handler.  This is just crass, and
> > > >> it is likely impossible to restore the original process state in
> > > >> case of partial failure.  We really need kernel support to perform
> > > >> the process-wide switch in an all-or-nothing manner.
> > > > 
> > > > I disagree. We're treading new ground here with this syscall. It's
> > > > not defined by POSIX so we're under no obligation to follow its
> > > > silly directives in this regard. Per-process cred switching doesn't
> > > > really make much sense in this day and age, IMO. Wasn't part of the
> > > > spec was written before threading existed
> > > 
> > > Okay, then we need to add a separate set of system calls.
> > > 
> > > I really, really want to get rid of that signal handler mess in
> > > glibc, with its partial failures.
> > 
> > I agree, it's a hack, but hardly anyone these days really wants to
> > switch creds on a per-process basis. It's just that we're saddled with
> > a spec for those calls that was written before threads really existed.
> > 
> > The kernel syscalls already do the right thing as far as I'm concerned.
> > What would be nice however is a blessed glibc interface to them
> > that didn't involve all of the signal handling stuff. Then samba et. al.
> > wouldn't need to call syscall() directly to get at them.
> 
> Amen to that :-).
> 
> However, after talking with Jeff and Jim at CollabSummit,
> I was 'encouraged' to make my opinions known on the list.
> 
> To me, calling the creds handle a file descriptor just
> feels wrong. IT *isn't* an fd, you can't read/write/poll
> on it, and it's only done as a convenience to get the
> close-on-exec semantics and the fact that the creds are
> already hung off the fd's in kernel space.
> 
> I'd rather any creads call use a different type, even if
> it's a typedef of 'int -> creds_handle_t', just to make
> it really clear it's *NOT* an fd.
> 
> That way we can also make it clear this thing only has
> meaning to a thread group, and SHOULD NOT (and indeed
> preferably CAN NOT) be passed between processes.
> 
> Cheers,
> 
> 	Jeremy.

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Thoughts on credential switching
  2014-03-27 19:30           ` Jim Lieb
@ 2014-03-27 19:45             ` Andy Lutomirski
  2014-03-27 20:47                 ` Jim Lieb
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27 19:45 UTC (permalink / raw)
  To: Jim Lieb
  Cc: Jeremy Allison, Jeff Layton, Florian Weimer, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb <jlieb@panasas.com> wrote:
> Rather than inline, I'm responding in the context of Jeremy's comments but I
> have to answer others as well.  It is Jeremy after all who said my baby was
> ugly ;).
>
> Jeremy is right about overloading "fd".  Maybe I can call it something else
> but an fd (in implementation) has merit because a creds struct hangs off of
> either/or/both task_struct and file but nothing else.  Coming up with a creds
> token struct adds/intros another struct to the kernel that has to justify its
> existence.  This fd points to an anon inode so the resource consumption is
> pretty low and all the paths for managing it are well known and *work*.  I'm
> trying to get across the swamp, not build a new Golden Gate bridge...
>
> As for POSIX, all of the pthreads API was based on CDE threads whose initial
> implementation was on Ultrix (BSD 4.2).  Process wide was assumed because
> utheeads was a user space hack and setuid was process wide because a proc was
> a just that.  Good grief, that kernel was UP...  When OSF/1 appeared, the Mach
> 2.5 kernel just carried that forward with its proc + thread(s) model to be
> compatible with the old world.  In other words, we incrementally backed
> ourselves into a corner.  Declaring POSIX now avoids admitting that we didn't
> see all that far into the future.  Enough said.  These calls are *outside*
> POSIX.  Pthreads in 2014 is broken/obsolete.
>
> For the interface, I changed it from what is in the cited lkml below.   It is
> now:
>
>    int switch_creds(struct user_creds *creds);

What is struct user_creds?  And why is this called switch_creds?  It
doesn't switch anything.

>
> Behavior is the same except the mux'd syscall idea is gone.  Adding a flags arg
> to this is a useful idea both for current use and future proofing the API.  But
> this requires a second call
>
>    int use_creds(int fd);
>
> This call does the "use an fd" case but adds -1 to revert to real creds.  Its
> guts are either an override_creds(fd->file->creds) or a revert_creds().  Nice
> and quick.  Note that use_creds(fd) is used if I have cached the fd/token from
> switch_creds().  Also nice and quick.
>
> Given that I've done the checking in switch_creds and I don't pass anything
> back other than the token/fd and this token/fd is/will be confined to the
> thread group, use_creds(fd) only needs to validate that it is a switch_creds
> one, not from an arbitrary open().  I do this.

Not so fast...  what if you start privileged, create a cred fd, call
unshare, do a dyntransition, chroot, drop privileges, and call
use_creds?  I don't immediately see why this is insecure, but having
it be secure seems to be more or less the same condition as having my
credfd_activate be secure.

And I still don't see why you need revert at all.  Just create a
second token/fd/whatever representing your initial creds and switch to
that.

>
> I have focused this down to "fsuid" because I intend this for ops that file
> perms.  Other stuff is out of scope unless someone can come up with a use case
> and add flag defs...  The other variations on the theme uid, euid, and that
> other one I don't understand the use for, are for long lasting creds change,
> i.e. become user "bob" and go off an fork a shell...  I am wrapping I/O.

Isn't there euid for that?

>
> I do not like the idea of spinning off a separate proc to do creds work.  It
> doesn't buy anything in performance (everybody is a task to the kernel) but it
> does open a door to scary places.  Jeremy and I agree that this token/fd must
> stay within the thread group, aka, process.  I have already (in the newer
> patchset) tied off inheritance by opening the anon fd with close-on-exec.  I
> think I tied off passing the fd thru a unix socket but if not, I will in my
> next submission.  This fd/token should stay within the thread group, period.

Maybe I'm uniquely not scared of adding sane interfaces.  setuid(2) is
insane.  Impersonating a token is quite sane and even has lots of
prior art.

>
> As to the "get an fd and then do set*id", you have to do this twice because
> that fd gets the creds at the time of open, not after fooling around.  I am
> trying to avoid multiple RCU cycles, not add more.  Second, the above path
> makes the creds switch atomic because I use the creds swapping infrastructure.
> Popping back up to user space before that *all* happens opens all kinds of
> ptrace+signal+??? holes.

I assume you're planning on caching these things.  So spending some
cycles setting this thing up shouldn't matter much.

If you want to add a totally separate syscall
setresfsuidgidgroupscaps, be my guest :)  It would actually be
generally useful.

>
> Note also that I mask caps as well in the override creds including the caps to
> do things like set*id.  That is intentional.  This whole idea is to constrain
> the thread, not open a door *but* still provide a way to get back home
> (safely).  That is via use_creds(-1).
>
> Some have proposed that we personalize a worker thread (the rpc op processor)
> to the creds of the client user right off.  Ganesha only needs to do this user
> creds work for VFS (kernel local) filesystems.  Most of our cluster filesystems
> have apis that allow us to pass creds directly on calls.  We only need this
> for that local mounted namespace.  The server core owns rpc and ops, the
> backend (FSAL) owns the shim layer.  User creds are backend...  Having a
> personalized thread complicates the core.
>
> As I mentioned at LSF, I have another set pending that needs a bit more polish
> to answer issues from the last cycle.  I need to fix the issue of handling
> syscalls that would do their own creds swapping inside the switch_creds ...
> use_creds(-1) region.  The patch causes these syscalls, e.g. setuid() to
> EPERM.  Again, I like this because I want the client creds impersonating
> thread to only be able to do I/O, not escape into the wild.

Eek!  You want this for I/O.  What if someone else wants it for
something else?  Any where does the actual list of what syscalls get
blocked come from?

I think that your patches will get a *lot* simpler if you get rid of
this override_creds and revert stuff and just switch the entire set of
creds.  No setuid blocking, no BUG, and no need to audit the whole
tree for odd real_creds uses.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Re: Thoughts on credential switching
  2014-03-27 19:45             ` Andy Lutomirski
@ 2014-03-27 20:47                 ` Jim Lieb
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Lieb @ 2014-03-27 20:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jeremy Allison, Jeff Layton, Florian Weimer, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb <jlieb@panasas.com> wrote:
> > Rather than inline, I'm responding in the context of Jeremy's comments but
> > I have to answer others as well.  It is Jeremy after all who said my baby
> > was ugly ;).
> > 
> > Jeremy is right about overloading "fd".  Maybe I can call it something
> > else
> > but an fd (in implementation) has merit because a creds struct hangs off
> > of
> > either/or/both task_struct and file but nothing else.  Coming up with a
> > creds token struct adds/intros another struct to the kernel that has to
> > justify its existence.  This fd points to an anon inode so the resource
> > consumption is pretty low and all the paths for managing it are well
> > known and *work*.  I'm trying to get across the swamp, not build a new
> > Golden Gate bridge...
> > 
> > As for POSIX, all of the pthreads API was based on CDE threads whose
> > initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
> > because utheeads was a user space hack and setuid was process wide
> > because a proc was a just that.  Good grief, that kernel was UP...  When
> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
> > proc + thread(s) model to be compatible with the old world.  In other
> > words, we incrementally backed ourselves into a corner.  Declaring POSIX
> > now avoids admitting that we didn't see all that far into the future. 
> > Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
> > broken/obsolete.
> > 
> > For the interface, I changed it from what is in the cited lkml below.   It
> > is> 
> > now:
> >    int switch_creds(struct user_creds *creds);
> 
> What is struct user_creds?  And why is this called switch_creds?  It
> doesn't switch anything.

Sorry, it was in the cited lkml... It is:

struct user_creds {
	uid_t fsuid;
	gid_t fsgid;
	int ngroups;
	gid_t altgroups[];
};

ngroups is limited by NGROUPS.

It is called switch_creds because it does switch them to the contents of 
creds.  When you return from switch_creds "bob" your fsuid is "Bob".  The 
return value is a handle to those creds so the next time you need to be "Bob" 
you can just:

     use_creds(bob_creds);

> 
> > Behavior is the same except the mux'd syscall idea is gone.  Adding a
> > flags arg to this is a useful idea both for current use and future
> > proofing the API.  But this requires a second call
> > 
> >    int use_creds(int fd);
> > 
> > This call does the "use an fd" case but adds -1 to revert to real creds. 
> > Its guts are either an override_creds(fd->file->creds) or a
> > revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
> > have cached the fd/token from switch_creds().  Also nice and quick.
> > 
> > Given that I've done the checking in switch_creds and I don't pass
> > anything
> > back other than the token/fd and this token/fd is/will be confined to the
> > thread group, use_creds(fd) only needs to validate that it is a
> > switch_creds one, not from an arbitrary open().  I do this.
> 
> Not so fast...  what if you start privileged, create a cred fd, call
> unshare, do a dyntransition, chroot, drop privileges, and call
> use_creds?  I don't immediately see why this is insecure, but having
> it be secure seems to be more or less the same condition as having my
> credfd_activate be secure.
> 
> And I still don't see why you need revert at all.  Just create a
> second token/fd/whatever representing your initial creds and switch to
> that.

The creds you get have also had the capabilities reduced.  You can't do a 
chroot because it will EPERM.  Same with a lot of others.  This is a 
restricted file access jail in itself.  We *reduce* privs+caps here.

The use_creds has a shortcut of -1 to revert to real creds.  So if you do a 
series of these, how do you get back to your priv'd state?  You can either 
keep track of that yourself or just pass in -1 to use_creds() and know you 
are...
> 
> > I have focused this down to "fsuid" because I intend this for ops that
> > file
> > perms.  Other stuff is out of scope unless someone can come up with a use
> > case and add flag defs...  The other variations on the theme uid, euid,
> > and that other one I don't understand the use for, are for long lasting
> > creds change, i.e. become user "bob" and go off an fork a shell...  I am
> > wrapping I/O.
> Isn't there euid for that?

Right.  A crappy interface given that creds are really more than just euid or 
even egid but it does that deed just fine.  It is for long term as in "become 
Dave and run this program..."  I don't change that.  I'm dealing with "As Bob, 
do a pwrite(s)+fstat" across N cores and M ops per core...

> 
> > I do not like the idea of spinning off a separate proc to do creds work. 
> > It doesn't buy anything in performance (everybody is a task to the
> > kernel) but it does open a door to scary places.  Jeremy and I agree that
> > this token/fd must stay within the thread group, aka, process.  I have
> > already (in the newer patchset) tied off inheritance by opening the anon
> > fd with close-on-exec.  I think I tied off passing the fd thru a unix
> > socket but if not, I will in my next submission.  This fd/token should
> > stay within the thread group, period.
> Maybe I'm uniquely not scared of adding sane interfaces.  setuid(2) is
> insane.  Impersonating a token is quite sane and even has lots of
> prior art.
> 
> > As to the "get an fd and then do set*id", you have to do this twice
> > because
> > that fd gets the creds at the time of open, not after fooling around.  I
> > am
> > trying to avoid multiple RCU cycles, not add more.  Second, the above path
> > makes the creds switch atomic because I use the creds swapping
> > infrastructure. Popping back up to user space before that *all* happens
> > opens all kinds of ptrace+signal+??? holes.
> 
> I assume you're planning on caching these things.  So spending some
> cycles setting this thing up shouldn't matter much.

Indeed.  NFS usage patterns as such that the RCU overhead (1 instead of 6) is 
more than fine given that the "untar" the user is doing first LOOKUP, LOOKUP, 
OPEN followed by a boatload of WRITEs.

> 
> If you want to add a totally separate syscall
> setresfsuidgidgroupscaps, be my guest :)  It would actually be
> generally useful.
> 
> > Note also that I mask caps as well in the override creds including the
> > caps to do things like set*id.  That is intentional.  This whole idea is
> > to constrain the thread, not open a door *but* still provide a way to get
> > back home (safely).  That is via use_creds(-1).
> > 
> > Some have proposed that we personalize a worker thread (the rpc op
> > processor) to the creds of the client user right off.  Ganesha only needs
> > to do this user creds work for VFS (kernel local) filesystems.  Most of
> > our cluster filesystems have apis that allow us to pass creds directly on
> > calls.  We only need this for that local mounted namespace.  The server
> > core owns rpc and ops, the backend (FSAL) owns the shim layer.  User
> > creds are backend...  Having a personalized thread complicates the core.
> > 
> > As I mentioned at LSF, I have another set pending that needs a bit more
> > polish to answer issues from the last cycle.  I need to fix the issue of
> > handling syscalls that would do their own creds swapping inside the
> > switch_creds ... use_creds(-1) region.  The patch causes these syscalls,
> > e.g. setuid() to EPERM.  Again, I like this because I want the client
> > creds impersonating thread to only be able to do I/O, not escape into the
> > wild.
> 
> Eek!  You want this for I/O.  What if someone else wants it for
> something else?  Any where does the actual list of what syscalls get
> blocked come from?

The "blocked" list is twofold.  First, any syscall that needs privs that I 
have explicitly given up here (see the nfs_setuser caps that knfsd masks)
The second is, with the new patches, any syscall that ends up doing a 
commit_creds().  That is, in general, all the set*id/groups calls, anything 
messing with keys, execve, fork, and security twiddling code.  The more I 
think about this, it is a good thing.  They would return EPERM which for my 
use case wouldn't happen but for exploit attempts it slams the door with a 
clean error.

> 
> I think that your patches will get a *lot* simpler if you get rid of
> this override_creds and revert stuff and just switch the entire set of
> creds.  No setuid blocking, no BUG, and no need to audit the whole
> tree for odd real_creds uses.

I really do want override.  Otherwise, it is an RCU and I can get into a 
position where I can't revert leaving me with the even worse situation of 
doing a thread exit because it is no good for anything else anymore.

As for someone else coming up with a new use case, sure, we can think about 
this.  One idea has been proposed to have a flags arg to spec the "type" of uid 
etc.  If there is motivation for that, I could add a flags arg with only one 
"do fsuid" as its first and, for now, only defined bit.

As for the BUG(task->creds != task->real_creds), this should really be handled 
much earlier and more friendly (EPERM) than oops'ing the task.  That is what 
the promised additional patch does.  I did audit the commit_creds calls to 
come up with the patch and the list above.

> 
> --Andy

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Re: Thoughts on credential switching
@ 2014-03-27 20:47                 ` Jim Lieb
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Lieb @ 2014-03-27 20:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jeremy Allison, Jeff Layton, Florian Weimer, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb <jlieb@panasas.com> wrote:
> > Rather than inline, I'm responding in the context of Jeremy's comments but
> > I have to answer others as well.  It is Jeremy after all who said my baby
> > was ugly ;).
> > 
> > Jeremy is right about overloading "fd".  Maybe I can call it something
> > else
> > but an fd (in implementation) has merit because a creds struct hangs off
> > of
> > either/or/both task_struct and file but nothing else.  Coming up with a
> > creds token struct adds/intros another struct to the kernel that has to
> > justify its existence.  This fd points to an anon inode so the resource
> > consumption is pretty low and all the paths for managing it are well
> > known and *work*.  I'm trying to get across the swamp, not build a new
> > Golden Gate bridge...
> > 
> > As for POSIX, all of the pthreads API was based on CDE threads whose
> > initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
> > because utheeads was a user space hack and setuid was process wide
> > because a proc was a just that.  Good grief, that kernel was UP...  When
> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
> > proc + thread(s) model to be compatible with the old world.  In other
> > words, we incrementally backed ourselves into a corner.  Declaring POSIX
> > now avoids admitting that we didn't see all that far into the future. 
> > Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
> > broken/obsolete.
> > 
> > For the interface, I changed it from what is in the cited lkml below.   It
> > is> 
> > now:
> >    int switch_creds(struct user_creds *creds);
> 
> What is struct user_creds?  And why is this called switch_creds?  It
> doesn't switch anything.

Sorry, it was in the cited lkml... It is:

struct user_creds {
	uid_t fsuid;
	gid_t fsgid;
	int ngroups;
	gid_t altgroups[];
};

ngroups is limited by NGROUPS.

It is called switch_creds because it does switch them to the contents of 
creds.  When you return from switch_creds "bob" your fsuid is "Bob".  The 
return value is a handle to those creds so the next time you need to be "Bob" 
you can just:

     use_creds(bob_creds);

> 
> > Behavior is the same except the mux'd syscall idea is gone.  Adding a
> > flags arg to this is a useful idea both for current use and future
> > proofing the API.  But this requires a second call
> > 
> >    int use_creds(int fd);
> > 
> > This call does the "use an fd" case but adds -1 to revert to real creds. 
> > Its guts are either an override_creds(fd->file->creds) or a
> > revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
> > have cached the fd/token from switch_creds().  Also nice and quick.
> > 
> > Given that I've done the checking in switch_creds and I don't pass
> > anything
> > back other than the token/fd and this token/fd is/will be confined to the
> > thread group, use_creds(fd) only needs to validate that it is a
> > switch_creds one, not from an arbitrary open().  I do this.
> 
> Not so fast...  what if you start privileged, create a cred fd, call
> unshare, do a dyntransition, chroot, drop privileges, and call
> use_creds?  I don't immediately see why this is insecure, but having
> it be secure seems to be more or less the same condition as having my
> credfd_activate be secure.
> 
> And I still don't see why you need revert at all.  Just create a
> second token/fd/whatever representing your initial creds and switch to
> that.

The creds you get have also had the capabilities reduced.  You can't do a 
chroot because it will EPERM.  Same with a lot of others.  This is a 
restricted file access jail in itself.  We *reduce* privs+caps here.

The use_creds has a shortcut of -1 to revert to real creds.  So if you do a 
series of these, how do you get back to your priv'd state?  You can either 
keep track of that yourself or just pass in -1 to use_creds() and know you 
are...
> 
> > I have focused this down to "fsuid" because I intend this for ops that
> > file
> > perms.  Other stuff is out of scope unless someone can come up with a use
> > case and add flag defs...  The other variations on the theme uid, euid,
> > and that other one I don't understand the use for, are for long lasting
> > creds change, i.e. become user "bob" and go off an fork a shell...  I am
> > wrapping I/O.
> Isn't there euid for that?

Right.  A crappy interface given that creds are really more than just euid or 
even egid but it does that deed just fine.  It is for long term as in "become 
Dave and run this program..."  I don't change that.  I'm dealing with "As Bob, 
do a pwrite(s)+fstat" across N cores and M ops per core...

> 
> > I do not like the idea of spinning off a separate proc to do creds work. 
> > It doesn't buy anything in performance (everybody is a task to the
> > kernel) but it does open a door to scary places.  Jeremy and I agree that
> > this token/fd must stay within the thread group, aka, process.  I have
> > already (in the newer patchset) tied off inheritance by opening the anon
> > fd with close-on-exec.  I think I tied off passing the fd thru a unix
> > socket but if not, I will in my next submission.  This fd/token should
> > stay within the thread group, period.
> Maybe I'm uniquely not scared of adding sane interfaces.  setuid(2) is
> insane.  Impersonating a token is quite sane and even has lots of
> prior art.
> 
> > As to the "get an fd and then do set*id", you have to do this twice
> > because
> > that fd gets the creds at the time of open, not after fooling around.  I
> > am
> > trying to avoid multiple RCU cycles, not add more.  Second, the above path
> > makes the creds switch atomic because I use the creds swapping
> > infrastructure. Popping back up to user space before that *all* happens
> > opens all kinds of ptrace+signal+??? holes.
> 
> I assume you're planning on caching these things.  So spending some
> cycles setting this thing up shouldn't matter much.

Indeed.  NFS usage patterns as such that the RCU overhead (1 instead of 6) is 
more than fine given that the "untar" the user is doing first LOOKUP, LOOKUP, 
OPEN followed by a boatload of WRITEs.

> 
> If you want to add a totally separate syscall
> setresfsuidgidgroupscaps, be my guest :)  It would actually be
> generally useful.
> 
> > Note also that I mask caps as well in the override creds including the
> > caps to do things like set*id.  That is intentional.  This whole idea is
> > to constrain the thread, not open a door *but* still provide a way to get
> > back home (safely).  That is via use_creds(-1).
> > 
> > Some have proposed that we personalize a worker thread (the rpc op
> > processor) to the creds of the client user right off.  Ganesha only needs
> > to do this user creds work for VFS (kernel local) filesystems.  Most of
> > our cluster filesystems have apis that allow us to pass creds directly on
> > calls.  We only need this for that local mounted namespace.  The server
> > core owns rpc and ops, the backend (FSAL) owns the shim layer.  User
> > creds are backend...  Having a personalized thread complicates the core.
> > 
> > As I mentioned at LSF, I have another set pending that needs a bit more
> > polish to answer issues from the last cycle.  I need to fix the issue of
> > handling syscalls that would do their own creds swapping inside the
> > switch_creds ... use_creds(-1) region.  The patch causes these syscalls,
> > e.g. setuid() to EPERM.  Again, I like this because I want the client
> > creds impersonating thread to only be able to do I/O, not escape into the
> > wild.
> 
> Eek!  You want this for I/O.  What if someone else wants it for
> something else?  Any where does the actual list of what syscalls get
> blocked come from?

The "blocked" list is twofold.  First, any syscall that needs privs that I 
have explicitly given up here (see the nfs_setuser caps that knfsd masks)
The second is, with the new patches, any syscall that ends up doing a 
commit_creds().  That is, in general, all the set*id/groups calls, anything 
messing with keys, execve, fork, and security twiddling code.  The more I 
think about this, it is a good thing.  They would return EPERM which for my 
use case wouldn't happen but for exploit attempts it slams the door with a 
clean error.

> 
> I think that your patches will get a *lot* simpler if you get rid of
> this override_creds and revert stuff and just switch the entire set of
> creds.  No setuid blocking, no BUG, and no need to audit the whole
> tree for odd real_creds uses.

I really do want override.  Otherwise, it is an RCU and I can get into a 
position where I can't revert leaving me with the even worse situation of 
doing a thread exit because it is no good for anything else anymore.

As for someone else coming up with a new use case, sure, we can think about 
this.  One idea has been proposed to have a flags arg to spec the "type" of uid 
etc.  If there is motivation for that, I could add a flags arg with only one 
"do fsuid" as its first and, for now, only defined bit.

As for the BUG(task->creds != task->real_creds), this should really be handled 
much earlier and more friendly (EPERM) than oops'ing the task.  That is what 
the promised additional patch does.  I did audit the commit_creds calls to 
come up with the patch and the list above.

> 
> --Andy

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Re: Thoughts on credential switching
  2014-03-27 20:47                 ` Jim Lieb
  (?)
@ 2014-03-27 21:19                 ` Andy Lutomirski
  -1 siblings, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-27 21:19 UTC (permalink / raw)
  To: Jim Lieb
  Cc: Jeremy Allison, Jeff Layton, Florian Weimer, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Thu, Mar 27, 2014 at 1:47 PM, Jim Lieb <jlieb@panasas.com> wrote:
> On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote:
>> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb <jlieb@panasas.com> wrote:
>> > Rather than inline, I'm responding in the context of Jeremy's comments but
>> > I have to answer others as well.  It is Jeremy after all who said my baby
>> > was ugly ;).
>> >
>> > Jeremy is right about overloading "fd".  Maybe I can call it something
>> > else
>> > but an fd (in implementation) has merit because a creds struct hangs off
>> > of
>> > either/or/both task_struct and file but nothing else.  Coming up with a
>> > creds token struct adds/intros another struct to the kernel that has to
>> > justify its existence.  This fd points to an anon inode so the resource
>> > consumption is pretty low and all the paths for managing it are well
>> > known and *work*.  I'm trying to get across the swamp, not build a new
>> > Golden Gate bridge...
>> >
>> > As for POSIX, all of the pthreads API was based on CDE threads whose
>> > initial implementation was on Ultrix (BSD 4.2).  Process wide was assumed
>> > because utheeads was a user space hack and setuid was process wide
>> > because a proc was a just that.  Good grief, that kernel was UP...  When
>> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its
>> > proc + thread(s) model to be compatible with the old world.  In other
>> > words, we incrementally backed ourselves into a corner.  Declaring POSIX
>> > now avoids admitting that we didn't see all that far into the future.
>> > Enough said.  These calls are *outside* POSIX.  Pthreads in 2014 is
>> > broken/obsolete.
>> >
>> > For the interface, I changed it from what is in the cited lkml below.   It
>> > is>
>> > now:
>> >    int switch_creds(struct user_creds *creds);
>>
>> What is struct user_creds?  And why is this called switch_creds?  It
>> doesn't switch anything.
>
> Sorry, it was in the cited lkml... It is:
>
> struct user_creds {
>         uid_t fsuid;
>         gid_t fsgid;
>         int ngroups;
>         gid_t altgroups[];
> };
>
> ngroups is limited by NGROUPS.
>
> It is called switch_creds because it does switch them to the contents of
> creds.  When you return from switch_creds "bob" your fsuid is "Bob".  The
> return value is a handle to those creds so the next time you need to be "Bob"
> you can just:
>
>      use_creds(bob_creds);
>
>>
>> > Behavior is the same except the mux'd syscall idea is gone.  Adding a
>> > flags arg to this is a useful idea both for current use and future
>> > proofing the API.  But this requires a second call
>> >
>> >    int use_creds(int fd);
>> >
>> > This call does the "use an fd" case but adds -1 to revert to real creds.
>> > Its guts are either an override_creds(fd->file->creds) or a
>> > revert_creds().  Nice and quick.  Note that use_creds(fd) is used if I
>> > have cached the fd/token from switch_creds().  Also nice and quick.
>> >
>> > Given that I've done the checking in switch_creds and I don't pass
>> > anything
>> > back other than the token/fd and this token/fd is/will be confined to the
>> > thread group, use_creds(fd) only needs to validate that it is a
>> > switch_creds one, not from an arbitrary open().  I do this.
>>
>> Not so fast...  what if you start privileged, create a cred fd, call
>> unshare, do a dyntransition, chroot, drop privileges, and call
>> use_creds?  I don't immediately see why this is insecure, but having
>> it be secure seems to be more or less the same condition as having my
>> credfd_activate be secure.
>>
>> And I still don't see why you need revert at all.  Just create a
>> second token/fd/whatever representing your initial creds and switch to
>> that.
>
> The creds you get have also had the capabilities reduced.  You can't do a
> chroot because it will EPERM.  Same with a lot of others.  This is a
> restricted file access jail in itself.  We *reduce* privs+caps here.

Why not just leave the capabilities as-is?  The caller will have to
remove CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH from the effective set
first, but that's okay.

>
> The use_creds has a shortcut of -1 to revert to real creds.  So if you do a
> series of these, how do you get back to your priv'd state?  You can either
> keep track of that yourself or just pass in -1 to use_creds() and know you
> are...

int fully_privileged_cred_fd;

>>
>> > I have focused this down to "fsuid" because I intend this for ops that
>> > file
>> > perms.  Other stuff is out of scope unless someone can come up with a use
>> > case and add flag defs...  The other variations on the theme uid, euid,
>> > and that other one I don't understand the use for, are for long lasting
>> > creds change, i.e. become user "bob" and go off an fork a shell...  I am
>> > wrapping I/O.
>> Isn't there euid for that?
>
> Right.  A crappy interface given that creds are really more than just euid or
> even egid but it does that deed just fine.  It is for long term as in "become
> Dave and run this program..."  I don't change that.  I'm dealing with "As Bob,
> do a pwrite(s)+fstat" across N cores and M ops per core...

But euid can do that just fine.  And I think it would be unfortunate
to add these new syscalls in a way that's so narrowly tailored to
userspace file serving.  Someone might want to use it for login: have
an unprivileged task that accepts connections, forward the username
and password to a privileged helper, have that helper send back creds,
then switch to those creds and call exec.

>>
>> Eek!  You want this for I/O.  What if someone else wants it for
>> something else?  Any where does the actual list of what syscalls get
>> blocked come from?
>
> The "blocked" list is twofold.  First, any syscall that needs privs that I
> have explicitly given up here (see the nfs_setuser caps that knfsd masks)
> The second is, with the new patches, any syscall that ends up doing a
> commit_creds().  That is, in general, all the set*id/groups calls, anything
> messing with keys, execve, fork, and security twiddling code.  The more I
> think about this, it is a good thing.  They would return EPERM which for my
> use case wouldn't happen but for exploit attempts it slams the door with a
> clean error.
>
>>
>> I think that your patches will get a *lot* simpler if you get rid of
>> this override_creds and revert stuff and just switch the entire set of
>> creds.  No setuid blocking, no BUG, and no need to audit the whole
>> tree for odd real_creds uses.
>
> I really do want override.  Otherwise, it is an RCU and I can get into a
> position where I can't revert leaving me with the even worse situation of
> doing a thread exit because it is no good for anything else anymore.

No RCU -- as long as you have anything else that references the
original creds, the refcount won't drop to zero, and no RCU free will
be needed.  And, if you keep an fd referencing the old creds, you can
go back.

>
> As for someone else coming up with a new use case, sure, we can think about
> this.  One idea has been proposed to have a flags arg to spec the "type" of uid
> etc.  If there is motivation for that, I could add a flags arg with only one
> "do fsuid" as its first and, for now, only defined bit.
>
> As for the BUG(task->creds != task->real_creds), this should really be handled
> much earlier and more friendly (EPERM) than oops'ing the task.  That is what
> the promised additional patch does.  I did audit the commit_creds calls to
> come up with the patch and the list above.

I'm still scared of this.  Someone will add something in the future
that breaks this.  It could be an LSM fiddling with security.  It
could be keys.  And regardless of what it is, I don't see why it
should be illegal, other than the fact that no one knows what the
semantics are.


The tl;dr version might be that you're proposing adding a way to
separate real_creds and creds in a way that's visible to userspace.
I'm suggesting a way to have a kernel object that grants permission to
switch both of them.  I think that mine is less of a departure from
the way things work now, and is therefore less scary.

If you're afraid of my approach, then make credfd_activate require
CAP_SYS_ADMIN, and it'll all be safe.

--Andy

>
>>
>> --Andy
>
> --
> Jim Lieb
> Linux Systems Engineer
> Panasas Inc.
>
> "If ease of use was the only requirement, we would all be riding tricycles"
> - Douglas Engelbart 1925-2013



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 14:08       ` Jeff Layton
@ 2014-03-29  6:43         ` Alex Elsayed
  2014-03-30 13:03         ` Theodore Ts'o
  1 sibling, 0 replies; 42+ messages in thread
From: Alex Elsayed @ 2014-03-29  6:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-security-module, linux-fsdevel

Jeff Layton wrote:

> On Wed, 26 Mar 2014 20:25:35 -0700
> Jeff Layton <jlayton@redhat.com> wrote:
> 
>> On Wed, 26 Mar 2014 20:05:16 -0700
>> Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton <jlayton@redhat.com>
>> > wrote:
>> > > On Wed, 26 Mar 2014 17:23:24 -0700
>> > > Andy Lutomirski <luto@amacapital.net> wrote:
>> > >
>> > >> Hi various people who care about user-space NFS servers and/or
>> > >> security-relevant APIs.
>> > >>
>> > >> I propose the following set of new syscalls:
>> > >>
>> > >> int credfd_create(unsigned int flags): returns a new credfd that
>> > >> corresponds to current's creds.
>> > >>
>> > >> int credfd_activate(int fd, unsigned int flags): Change current's
>> > >> creds to match the creds stored in fd.  To be clear, this changes
>> > >> both the "subjective" and "objective" (aka real_cred and cred)
>> > >> because there aren't any real semantics for what happens when
>> > >> userspace code runs with real_cred != cred.
>> > >>
>> > >> Rules:
>> > >>
>> > >>  - credfd_activate fails (-EINVAL) if fd is not a credfd.
>> > >>  - credfd_activate fails (-EPERM) if the fd's userns doesn't
>> > >> match current's userns.  credfd_activate is not intended to be a
>> > >> substitute for setns.
>> > >>  - credfd_activate will fail (-EPERM) if LSM does not allow the
>> > >> switch.  This probably needs to be a new selinux action --
>> > >> dyntransition is too restrictive.
>> > >>
>> > >>
>> > >> Optional:
>> > >>  - credfd_create always sets cloexec, because the alternative is
>> > >> silly.
>> > >>  - credfd_activate fails (-EINVAL) if dumpable.  This is because
>> > >> we don't want a privileged daemon to be ptraced while
>> > >> impersonating someone else.
>> > >>  - optional: both credfd_create and credfd_activate fail if
>> > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID).
>> > >>
>> > >> The first question: does this solve Ganesha's problem?
>> > >>
>> > >> The second question: is this safe?  I can see two major concerns.
>> > >> The bigger concern is that having these syscalls available will
>> > >> allow users to exploit things that were previously secure.  For
>> > >> example, maybe some configuration assumes that a task running as
>> > >> uid==1 can't switch to uid==2, even with uid 2's consent.
>> > >> Similar issues happen with capabilities.  If CAP_SYS_ADMIN is not
>> > >> required, then this is no longer really true.
>> > >>
>> > >> Alternatively, something running as uid == 0 with heavy
>> > >> capability restrictions in a mount namespace (but not a uid
>> > >> namespace) could pass a credfd out of the namespace.  This could
>> > >> break things like Docker pretty badly.  CAP_SYS_ADMIN guards
>> > >> against this to some extent.  But I think that Docker is already
>> > >> totally screwed if a Docker root task can receive an O_DIRECTORY
>> > >> or O_PATH fd out of the container, so it's not entirely clear
>> > >> that the situation is any worse, even without requiring
>> > >> CAP_SYS_ADMIN.
>> > >>
>> > >> The second concern is that it may be difficult to use this
>> > >> correctly. There's a reason that real_cred and cred exist, but
>> > >> it's not really well set up for being used.
>> > >>
>> > >> As a simple way to stay safe, Ganesha could only use credfds that
>> > >> have real_uid == 0.
>> > >>
>> > >> --Andy
>> > >
>> > >
>> > > I still don't quite grok why having this special credfd_create
>> > > call buys you anything over simply doing what Al had originally
>> > > suggested -- switch creds using all of the different syscalls and
>> > > then simply caching that in a "normal" fd:
>> > >
>> > >     fd = open("/dev/null", O_PATH...);
>> > >
>> > > ...it seems to me that the credfd_activate call will still need to
>> > > do the same permission checking that all of the individual
>> > > set*id() calls require (and all of the other stuff like changing
>> > > selinux contexts, etc).
>> > >
>> > > IOW, this fd is just a "handle" for passing around a struct cred,
>> > > but I don't see why having access to that handle would allow you
>> > > to do something you couldn't already do anyway.
>> > >
>> > > Am I missing something obvious here?
>> > 
>> > Not really.  I think I didn't adequately explain a piece of this.
>> > 
>> > I think that what you're suggesting is for an fd to encode a set of
>> > credentials but not to grant permission to use those credentials.
>> > So switch_creds(fd) is more or less the same thing as
>> > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac
>> > label, ...).  switch_creds needs to verify that the caller can
>> > dyntransition to the label, set all the ids, etc., but it avoids
>> > allocating anything and running RCU callbacks.
>> > 
>> > The trouble with this is that the verification needed is complicated
>> > and expensive.  And I think that my proposal is potentially more
>> > useful.
>> > 
>> 
>> Is it really though? My understanding of the problem was that it was
>> the syscall (context switching) overhead + having to do a bunch of RCU
>> critical stuff that was the problem. If we can do all of this in the
>> context of a single RCU critical section, isn't that still a win?
>> 
>> As to the complicated part...maybe but it doesn't seem like it would
>> have to be. We could simply return -EINVAL or something if the old
>> struct cred doesn't have fields that match the ones we're replacing
>> and that we don't expect to see changed.
>> 
>> > A credfd is like a struct cred, but possession of a credfd carries
>> > the permission to use those credentials.  So, for example,
>> > credfd_activate to switch to a given uid might work even if
>> > setresuid to that uid would be disallowed.  But, for this to be
>> > secure, the act of giving someone a credfd needs to be explicit.
>> > Programs implicitly send other programs their credentials by means
>> > of f_cred all the time, and they don't expect to allow the receiver
>> > to impersonate them.
>> > 
>> 
>> Eek!
>> 
>> Passing around permission to use the credential in conjunction with
>> the credentials themselves sounds a lot more dangerous to me.
>> 
>> My preference would be that we don't add anything that potentially
>> gives you privileges to do something you couldn't do already with
>> existing syscalls. That doesn't seem to be necessary for the intended
>> use case anyway. When it comes to security, I think we need to err on
>> the side of caution than try to shortcut it for performance.
>> 
>> > credfd has other uses.  A file server, for example, could actually
>> > delegate creation of the credfds to a separate process, and that
>> > process could validate that the request is for a credfd that the
>> > file server really should be able to obtain.  This would enable that
>> > process to make sure that the user in question has actually
>> > authenticated itself, so a file server compromise could only attack
>> > users who connect instead of attacking any user on the system.  This
>> > is an argument against requiring CAP_SYS_ADMIN to use
>> > credfd_activate.
>> > 
>> > I'm less confident in whether capabilities should be needed to use
>> > credfd_create.
>> > 
>> > Is that clearer and/or more convincing?
>> > 
>> 
>> My worry would be that you could then compromise the process doing
>> this credfd creation and trick it into passing around credentials that
>> aren't what are expected.
>> 
>> Another possibility could some kernel bug allow you to frob the creds
>> that are attached to an existing fd after they've been "vetted" for
>> use.
>> 
>> So yeah, I think I better understand what you're proposing (and thanks
>> for explaining it), but I'm not convinced that it's really a safe
>> idea.
>> 
> 
> I had some time to think about this last night...
> 
> While using a fd to pass around credentials is convenient, the danger
> is that it's pretty opaque. You have a fd that you know has creds
> attached to it, but it's hard to be certain what is going to change.
> 
> Perhaps we can use the flags field for that. So, assuming we have a fd
> with the creds attached, we could do something like:
> 
>     err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
> 
> ...then the switch_creds syscall could be set up to fail if the new
> credentials had other fields that didn't match those in the current
> task credentials. So if (for instance) the cred->euid were
> different between the two, the above could fail with -EINVAL or
> something.
> 
> Yes, that means that we need to check this stuff on every switch_creds
> call, but I don't think it'll be that expensive.
> 

If creating a credfd populates a bitmap in the same manner (I suspect 
_creating_ credfds might be somewhat less of a hot-path) then it would come 
down to

if (credfd_fields & ~flags) goto unexpected_cred_field;


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 14:08       ` Jeff Layton
  2014-03-29  6:43         ` Alex Elsayed
@ 2014-03-30 13:03         ` Theodore Ts'o
  2014-03-30 18:56           ` Andy Lutomirski
  2014-03-31 11:51           ` Jeff Layton
  1 sibling, 2 replies; 42+ messages in thread
From: Theodore Ts'o @ 2014-03-30 13:03 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, linux-kernel,
	bfields

On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> I had some time to think about this last night...
> 
> While using a fd to pass around credentials is convenient, the danger
> is that it's pretty opaque. You have a fd that you know has creds
> attached to it, but it's hard to be certain what is going to change.

I don't think that's a particularly tough problem.  In general, the fd
isn't something that you would want to pass around, and so the process
which generated it will know exactly what it contained.

> Perhaps we can use the flags field for that. So, assuming we have a fd
> with the creds attached, we could do something like:
> 
>     err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
> 
> ...then the switch_creds syscall could be set up to fail if the new
> credentials had other fields that didn't match those in the current
> task credentials. So if (for instance) the cred->euid were
> different between the two, the above could fail with -EINVAL or
> something.

Huh?  The whole *point* is that the creds value will be different, of
course they won't match!  I would think this would be over
complicating the interface.


A couple of other things.  What I would suggest is that we create a
few new fd flags, to join FD_CLOEXEC:

FD_NOPROCFS	disallow being able to open the inode via /proc/<pid>/fd
		(but in the case of a creds fd, for bonus points, the
		target of the pseudo-symlink could be something like:
		"uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
		a userspace file server).  This also answers Jeff's concern
		if for some reason --- I don't know how --- a process
		doesn't know what the contents of the creds fd that
		it created itself.

FD_NOPASSFD	disallow being able to pass the fd via a unix domain socket

FD_LOCKFLAGS	if this bit is set, disallow any further changes of FD_CLOEXEC,
		FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.

Some of the functionality requested by the folks suggesting the "SEAL"
API would also be covered by these fd flags.

In order to solve some potential race concerns, a credsfd must be
created with FD_CLOEXEC and FD_NOPROCFS enabled.

Why is this important even if the anon_inode is owned by root with a
mode of 0?  Because if the system is set up to use SELinux or full
Posix capabilities, merely having the a uid of 0 is not special, and
we don't want to allow a process with uid of 0 to be able modify the
mode with the /proc/<pid>/fd/<FD> and then proceed to open the inode
using open.  This way, instead of adding special case code to prevent
this from happening, we can add a more general facility which can be
used to solve a few other problems.

Regards,

						- Ted


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-30 13:03         ` Theodore Ts'o
@ 2014-03-30 18:56           ` Andy Lutomirski
  2014-03-31 11:51           ` Jeff Layton
  1 sibling, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-30 18:56 UTC (permalink / raw)
  To: Theodore Ts'o, Jeff Layton, Andy Lutomirski, Jim Lieb,
	Eric W. Biederman, LSM List, Serge E. Hallyn, Kees Cook,
	Linux FS Devel, linux-kernel, bfields

On Sun, Mar 30, 2014 at 6:03 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>> I had some time to think about this last night...
>>
>> While using a fd to pass around credentials is convenient, the danger
>> is that it's pretty opaque. You have a fd that you know has creds
>> attached to it, but it's hard to be certain what is going to change.
>
> I don't think that's a particularly tough problem.  In general, the fd
> isn't something that you would want to pass around, and so the process
> which generated it will know exactly what it contained.
>
>> Perhaps we can use the flags field for that. So, assuming we have a fd
>> with the creds attached, we could do something like:
>>
>>     err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
>>
>> ...then the switch_creds syscall could be set up to fail if the new
>> credentials had other fields that didn't match those in the current
>> task credentials. So if (for instance) the cred->euid were
>> different between the two, the above could fail with -EINVAL or
>> something.
>
> Huh?  The whole *point* is that the creds value will be different, of
> course they won't match!  I would think this would be over
> complicating the interface.
>
>
> A couple of other things.  What I would suggest is that we create a
> few new fd flags, to join FD_CLOEXEC:
>
> FD_NOPROCFS     disallow being able to open the inode via /proc/<pid>/fd
>                 (but in the case of a creds fd, for bonus points, the
>                 target of the pseudo-symlink could be something like:
>                 "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
>                 a userspace file server).  This also answers Jeff's concern
>                 if for some reason --- I don't know how --- a process
>                 doesn't know what the contents of the creds fd that
>                 it created itself.
>
> FD_NOPASSFD     disallow being able to pass the fd via a unix domain socket
>
> FD_LOCKFLAGS    if this bit is set, disallow any further changes of FD_CLOEXEC,
>                 FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
>
> Some of the functionality requested by the folks suggesting the "SEAL"
> API would also be covered by these fd flags.
>
> In order to solve some potential race concerns, a credsfd must be
> created with FD_CLOEXEC and FD_NOPROCFS enabled.

It might be nice to try to coordinate this with the Capsicum people,
who are doing something along these lines.  They even have tentative
Linux patches.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27  0:23 Thoughts on credential switching Andy Lutomirski
                   ` (2 preceding siblings ...)
  2014-03-27 12:46 ` Florian Weimer
@ 2014-03-31 10:44 ` One Thousand Gnomes
  2014-03-31 16:49   ` Andy Lutomirski
  2014-03-31 19:05   ` Jeremy Allison
  3 siblings, 2 replies; 42+ messages in thread
From: One Thousand Gnomes @ 2014-03-31 10:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

On Wed, 26 Mar 2014 17:23:24 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> Hi various people who care about user-space NFS servers and/or
> security-relevant APIs.
> 
> I propose the following set of new syscalls:
> 
> int credfd_create(unsigned int flags): returns a new credfd that
> corresponds to current's creds.
> 
> int credfd_activate(int fd, unsigned int flags): Change current's
> creds to match the creds stored in fd.  To be clear, this changes both
> the "subjective" and "objective" (aka real_cred and cred) because
> there aren't any real semantics for what happens when userspace code
> runs with real_cred != cred.

What is the sematic of a simultaneous ptrace racing a credfd_activate on
another processor core ?

What are the rules for simultaneous threads doing I/O and and credential
changes ?

What is the rule for a faulting of an mmapped page in a multithreaded app
one thread of which has changed credentials ?

Who owns a file created while you are changing credentials ?

>  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> don't want a privileged daemon to be ptraced while impersonating
> someone else.

That's one of the obvious problems but if you have that problem then
you've got races against signals and ptrace etc to deal with.

One way to implement it I think safely but which requires a fair bit more
work elsewhere is to apply the debug and signal type checks as

   'you may only do X if you also posess the right to do so for *all*
    credentials accessible to this task'

which I think is the correct logical check.

Alan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-30 13:03         ` Theodore Ts'o
  2014-03-30 18:56           ` Andy Lutomirski
@ 2014-03-31 11:51           ` Jeff Layton
  2014-03-31 18:06               ` Trond Myklebust
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2014-03-31 11:51 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, linux-kernel,
	bfields

On Sun, 30 Mar 2014 09:03:29 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> > I had some time to think about this last night...
> > 
> > While using a fd to pass around credentials is convenient, the danger
> > is that it's pretty opaque. You have a fd that you know has creds
> > attached to it, but it's hard to be certain what is going to change.
> 
> I don't think that's a particularly tough problem.  In general, the fd
> isn't something that you would want to pass around, and so the process
> which generated it will know exactly what it contained.
> 

I think there's a bit more of a use-case for passing around such an fd
via socket...

Part of the problem is that the traditional uid/gid switching glibc
wrappers are per-process. If we're proposing doing something like:

seteuid()
setegid()
setgroups()
fd = open()
(...and then revert the creds using same syscalls)

...during the time that you're doing all of that, you can't really
allow any thread in the process to be doing something that requires
_other_ creds until you've completed the above.

So, I could envision a program like ganesha firing up a separate
process to handle the credential switching and fd creation and then
handing those back to the main process via a unix domain socket.

> > Perhaps we can use the flags field for that. So, assuming we have a fd
> > with the creds attached, we could do something like:
> > 
> >     err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS);
> > 
> > ...then the switch_creds syscall could be set up to fail if the new
> > credentials had other fields that didn't match those in the current
> > task credentials. So if (for instance) the cred->euid were
> > different between the two, the above could fail with -EINVAL or
> > something.
> 
> Huh?  The whole *point* is that the creds value will be different, of
> course they won't match!  I would think this would be over
> complicating the interface.
> 

You know that *some* of the credentials won't match, but presumably
some of them (e.g. the real uid/gid) should match. Using a flags field
could act as a mask of what creds are allowed to change.

> 
> A couple of other things.  What I would suggest is that we create a
> few new fd flags, to join FD_CLOEXEC:
> 
> FD_NOPROCFS	disallow being able to open the inode via /proc/<pid>/fd
> 		(but in the case of a creds fd, for bonus points, the
> 		target of the pseudo-symlink could be something like:
> 		"uid: 15806 gid: 100: grps: 27, 50" to aid in debugging
> 		a userspace file server).  This also answers Jeff's concern
> 		if for some reason --- I don't know how --- a process
> 		doesn't know what the contents of the creds fd that
> 		it created itself.
> 
> FD_NOPASSFD	disallow being able to pass the fd via a unix domain socket
> 
> FD_LOCKFLAGS	if this bit is set, disallow any further changes of FD_CLOEXEC,
> 		FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
> 
> Some of the functionality requested by the folks suggesting the "SEAL"
> API would also be covered by these fd flags.
> 
> In order to solve some potential race concerns, a credsfd must be
> created with FD_CLOEXEC and FD_NOPROCFS enabled.
> 
> Why is this important even if the anon_inode is owned by root with a
> mode of 0?  Because if the system is set up to use SELinux or full
> Posix capabilities, merely having the a uid of 0 is not special, and
> we don't want to allow a process with uid of 0 to be able modify the
> mode with the /proc/<pid>/fd/<FD> and then proceed to open the inode
> using open.  This way, instead of adding special case code to prevent
> this from happening, we can add a more general facility which can be
> used to solve a few other problems.
> 



-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 10:44 ` One Thousand Gnomes
@ 2014-03-31 16:49   ` Andy Lutomirski
  2014-04-01 20:22     ` One Thousand Gnomes
  2014-03-31 19:05   ` Jeremy Allison
  1 sibling, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-31 16:49 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jim Lieb, Jeff Layton, Serge E. Hallyn, bfields,
	Theodore Ts'o, Linux FS Devel, linux-kernel, LSM List,
	Eric W. Biederman, Kees Cook

On Mar 31, 2014 3:45 AM, "One Thousand Gnomes"
<gnomes@lxorguk.ukuu.org.uk> wrote:
>
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
> > Hi various people who care about user-space NFS servers and/or
> > security-relevant APIs.
> >
> > I propose the following set of new syscalls:
> >
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> >
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes both
> > the "subjective" and "objective" (aka real_cred and cred) because
> > there aren't any real semantics for what happens when userspace code
> > runs with real_cred != cred.
>
> What is the sematic of a simultaneous ptrace racing a credfd_activate on
> another processor core ?

Exactly the same as for the raw syscall setresuid.  That is, we
require sequential consistency.  If the kernel fails to provide
sequential consistency, it's a bug.

>
> What are the rules for simultaneous threads doing I/O and and credential
> changes ?

None.  credfd_activate affects only the calling thread.

>
> What is the rule for a faulting of an mmapped page in a multithreaded app
> one thread of which has changed credentials ?

Same as in current kernels.  Note that checking credentials in
readpage, etc. is generally a bug.

>
> Who owns a file created while you are changing credentials ?

Can't happen.  See above.

>
> >  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> > don't want a privileged daemon to be ptraced while impersonating
> > someone else.
>
> That's one of the obvious problems but if you have that problem then
> you've got races against signals and ptrace etc to deal with.
>
> One way to implement it I think safely but which requires a fair bit more
> work elsewhere is to apply the debug and signal type checks as
>
>    'you may only do X if you also posess the right to do so for *all*
>     credentials accessible to this task'
>
> which I think is the correct logical check.

Do we include credfd fds sitting in Unix sockets?

More generally, there are plenty of cases where ptracing something
might give you access to one of its fds that you shouldn't normally
have access to.  Even real file fds are like this, and they could come
from outside a namespace, they could be a result of odd selinux
policy, etc.

The only real new issue is that, while using a subsidiary credential,
you don't want people who couldn't ptrace you before to suddenly be
able to ptrace you.  This is where "subjective" and "objective"
credentials came from, but the current in-kernel implementation is
really half-baked.  Fixing it for real will probably run into all
kinds of funny corner cases involving the existing syscalls, and it
would be nice to have a way to handle this that doesn't involve
auditing every commit_creds caller.

Hmm.  What if we had initial_creds and creds, and initial_creds never
changed unless explicitly requested.  There wouldn't be any way to
revert to initial_creds, but may_ptrace would check initial_creds
*and* creds.  This is a little hacky, and it might break Android (I'm
not really sure how the zygote thing works), but it could do the
trick.

Or we can rely on dumpable, which I think can prevent unprivileged
ptracers from attaching.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 11:51           ` Jeff Layton
@ 2014-03-31 18:06               ` Trond Myklebust
  0 siblings, 0 replies; 42+ messages in thread
From: Trond Myklebust @ 2014-03-31 18:06 UTC (permalink / raw)
  To: Layton Jeff
  Cc: Theodore Ts'o, Andy Lutomirski, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce


On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:

> On Sun, 30 Mar 2014 09:03:29 -0400
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>> I had some time to think about this last night...
>>> 
>>> While using a fd to pass around credentials is convenient, the danger
>>> is that it's pretty opaque. You have a fd that you know has creds
>>> attached to it, but it's hard to be certain what is going to change.
>> 
>> I don't think that's a particularly tough problem.  In general, the fd
>> isn't something that you would want to pass around, and so the process
>> which generated it will know exactly what it contained.
>> 
> 
> I think there's a bit more of a use-case for passing around such an fd
> via socket...
> 
> Part of the problem is that the traditional uid/gid switching glibc
> wrappers are per-process. If we're proposing doing something like:
> 
> seteuid()
> setegid()
> setgroups()
> fd = open()
> (...and then revert the creds using same syscalls)
> 
> ...during the time that you're doing all of that, you can't really
> allow any thread in the process to be doing something that requires
> _other_ creds until you've completed the above.

Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity.

> So, I could envision a program like ganesha firing up a separate
> process to handle the credential switching and fd creation and then
> handing those back to the main process via a unix domain socket.

How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?

Cheers
  Trond

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
@ 2014-03-31 18:06               ` Trond Myklebust
  0 siblings, 0 replies; 42+ messages in thread
From: Trond Myklebust @ 2014-03-31 18:06 UTC (permalink / raw)
  To: Layton Jeff
  Cc: Theodore Ts'o, Andy Lutomirski, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce


On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:

> On Sun, 30 Mar 2014 09:03:29 -0400
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>> I had some time to think about this last night...
>>> 
>>> While using a fd to pass around credentials is convenient, the danger
>>> is that it's pretty opaque. You have a fd that you know has creds
>>> attached to it, but it's hard to be certain what is going to change.
>> 
>> I don't think that's a particularly tough problem.  In general, the fd
>> isn't something that you would want to pass around, and so the process
>> which generated it will know exactly what it contained.
>> 
> 
> I think there's a bit more of a use-case for passing around such an fd
> via socket...
> 
> Part of the problem is that the traditional uid/gid switching glibc
> wrappers are per-process. If we're proposing doing something like:
> 
> seteuid()
> setegid()
> setgroups()
> fd = open()
> (...and then revert the creds using same syscalls)
> 
> ...during the time that you're doing all of that, you can't really
> allow any thread in the process to be doing something that requires
> _other_ creds until you've completed the above.

Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity.

> So, I could envision a program like ganesha firing up a separate
> process to handle the credential switching and fd creation and then
> handing those back to the main process via a unix domain socket.

How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?

Cheers
  Trond

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 18:06               ` Trond Myklebust
@ 2014-03-31 18:12                 ` Jeff Layton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeff Layton @ 2014-03-31 18:12 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Theodore Ts'o, Andy Lutomirski, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce

On Mon, 31 Mar 2014 14:06:01 -0400
Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> 
> On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Sun, 30 Mar 2014 09:03:29 -0400
> > "Theodore Ts'o" <tytso@mit.edu> wrote:
> > 
> >> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> >>> I had some time to think about this last night...
> >>> 
> >>> While using a fd to pass around credentials is convenient, the danger
> >>> is that it's pretty opaque. You have a fd that you know has creds
> >>> attached to it, but it's hard to be certain what is going to change.
> >> 
> >> I don't think that's a particularly tough problem.  In general, the fd
> >> isn't something that you would want to pass around, and so the process
> >> which generated it will know exactly what it contained.
> >> 
> > 
> > I think there's a bit more of a use-case for passing around such an fd
> > via socket...
> > 
> > Part of the problem is that the traditional uid/gid switching glibc
> > wrappers are per-process. If we're proposing doing something like:
> > 
> > seteuid()
> > setegid()
> > setgroups()
> > fd = open()
> > (...and then revert the creds using same syscalls)
> > 
> > ...during the time that you're doing all of that, you can't really
> > allow any thread in the process to be doing something that requires
> > _other_ creds until you've completed the above.
> 
> Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity.
> 

The proposal that Al originally had was to have userland set up the
credentials that it wanted to use, and then open("/dev/null", ...) to
get a fd that has those creds attached.

Then, we'd add a syscall that takes such an fd and switches to the creds
attached to it. So the open() in the above example isn't to do an actual
operation but rather to create a fd to act as a credential "handle".

> > So, I could envision a program like ganesha firing up a separate
> > process to handle the credential switching and fd creation and then
> > handing those back to the main process via a unix domain socket.
> 
> How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?
> 

That's an interesting idea. 
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
@ 2014-03-31 18:12                 ` Jeff Layton
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff Layton @ 2014-03-31 18:12 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Theodore Ts'o, Andy Lutomirski, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce

On Mon, 31 Mar 2014 14:06:01 -0400
Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> 
> On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Sun, 30 Mar 2014 09:03:29 -0400
> > "Theodore Ts'o" <tytso@mit.edu> wrote:
> > 
> >> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
> >>> I had some time to think about this last night...
> >>> 
> >>> While using a fd to pass around credentials is convenient, the danger
> >>> is that it's pretty opaque. You have a fd that you know has creds
> >>> attached to it, but it's hard to be certain what is going to change.
> >> 
> >> I don't think that's a particularly tough problem.  In general, the fd
> >> isn't something that you would want to pass around, and so the process
> >> which generated it will know exactly what it contained.
> >> 
> > 
> > I think there's a bit more of a use-case for passing around such an fd
> > via socket...
> > 
> > Part of the problem is that the traditional uid/gid switching glibc
> > wrappers are per-process. If we're proposing doing something like:
> > 
> > seteuid()
> > setegid()
> > setgroups()
> > fd = open()
> > (...and then revert the creds using same syscalls)
> > 
> > ...during the time that you're doing all of that, you can't really
> > allow any thread in the process to be doing something that requires
> > _other_ creds until you've completed the above.
> 
> Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity.
> 

The proposal that Al originally had was to have userland set up the
credentials that it wanted to use, and then open("/dev/null", ...) to
get a fd that has those creds attached.

Then, we'd add a syscall that takes such an fd and switches to the creds
attached to it. So the open() in the above example isn't to do an actual
operation but rather to create a fd to act as a credential "handle".

> > So, I could envision a program like ganesha firing up a separate
> > process to handle the credential switching and fd creation and then
> > handing those back to the main process via a unix domain socket.
> 
> How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?
> 

That's an interesting idea. 
-- 
Jeff Layton <jlayton@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 10:44 ` One Thousand Gnomes
  2014-03-31 16:49   ` Andy Lutomirski
@ 2014-03-31 19:05   ` Jeremy Allison
  1 sibling, 0 replies; 42+ messages in thread
From: Jeremy Allison @ 2014-03-31 19:05 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Andy Lutomirski, Jim Lieb, Eric W. Biederman, LSM List,
	Serge E. Hallyn, Kees Cook, Linux FS Devel, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Mon, Mar 31, 2014 at 11:44:59AM +0100, One Thousand Gnomes wrote:
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> > Hi various people who care about user-space NFS servers and/or
> > security-relevant APIs.
> > 
> > I propose the following set of new syscalls:
> > 
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> > 
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes both
> > the "subjective" and "objective" (aka real_cred and cred) because
> > there aren't any real semantics for what happens when userspace code
> > runs with real_cred != cred.
> 
> What is the sematic of a simultaneous ptrace racing a credfd_activate on
> another processor core ?
> 
> What are the rules for simultaneous threads doing I/O and and credential
> changes ?
> 
> What is the rule for a faulting of an mmapped page in a multithreaded app
> one thread of which has changed credentials ?
> 
> Who owns a file created while you are changing credentials ?
> 
> >  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> > don't want a privileged daemon to be ptraced while impersonating
> > someone else.
> 
> That's one of the obvious problems but if you have that problem then
> you've got races against signals and ptrace etc to deal with.

FYI, Any process using pthreads and glibc already
has to cope with these races as setresuid on glibc
on Linux is not atomic.

That's why Samba eventually changed to using the
raw system calls on Linux due to an interesting
bug with glibc aio interacting with setresuid
races (receiving signal thread was uid 0, sending
aio thread was non-zero - signal couldn't be
delivered, glibc aio wakeup lost).

Jeremy.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 18:06               ` Trond Myklebust
  (?)
  (?)
@ 2014-03-31 19:26               ` Andy Lutomirski
  2014-03-31 20:14                 ` Trond Myklebust
  -1 siblings, 1 reply; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-31 19:26 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Layton Jeff, Theodore Ts'o, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce

On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
> On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:
>
>> On Sun, 30 Mar 2014 09:03:29 -0400
>> "Theodore Ts'o" <tytso@mit.edu> wrote:
>>
>>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>>> I had some time to think about this last night...
>>>>
>>>> While using a fd to pass around credentials is convenient, the danger
>>>> is that it's pretty opaque. You have a fd that you know has creds
>>>> attached to it, but it's hard to be certain what is going to change.
>>>
>>> I don't think that's a particularly tough problem.  In general, the fd
>>> isn't something that you would want to pass around, and so the process
>>> which generated it will know exactly what it contained.
>>>
>>
>> I think there's a bit more of a use-case for passing around such an fd
>> via socket...
>>
>> Part of the problem is that the traditional uid/gid switching glibc
>> wrappers are per-process. If we're proposing doing something like:
>>
>> seteuid()
>> setegid()
>> setgroups()
>> fd = open()
>> (...and then revert the creds using same syscalls)
>>
>> ...during the time that you're doing all of that, you can't really
>> allow any thread in the process to be doing something that requires
>> _other_ creds until you've completed the above.
>
> Umm... open() isn't the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... Pretty much any interaction with the underlying filesystem needs to use the right identity.
>
>> So, I could envision a program like ganesha firing up a separate
>> process to handle the credential switching and fd creation and then
>> handing those back to the main process via a unix domain socket.
>
> How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?

This sounds considerably more complicated than just using fds.  What's
the advantage?

I guess using keys for local fs credentials fits in with using keys to
access things like AFS, but I'm still not sure I see why this helps
here.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 19:26               ` Andy Lutomirski
@ 2014-03-31 20:14                 ` Trond Myklebust
  2014-03-31 21:25                   ` Andy Lutomirski
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2014-03-31 20:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Layton Jeff, Theodore Ts'o, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce


On Mar 31, 2014, at 15:26, Andy Lutomirski <luto@amacapital.net> wrote:

> On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> 
>> On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:
>> 
>>> On Sun, 30 Mar 2014 09:03:29 -0400
>>> "Theodore Ts'o" <tytso@mit.edu> wrote:
>>> 
>>>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>>>> I had some time to think about this last night...
>>>>> 
>>>>> While using a fd to pass around credentials is convenient, the danger
>>>>> is that it's pretty opaque. You have a fd that you know has creds
>>>>> attached to it, but it's hard to be certain what is going to change.
>>>> 
>>>> I don't think that's a particularly tough problem.  In general, the fd
>>>> isn't something that you would want to pass around, and so the process
>>>> which generated it will know exactly what it contained.
>>>> 
>>> 
>>> I think there's a bit more of a use-case for passing around such an fd
>>> via socket...
>>> 
>>> Part of the problem is that the traditional uid/gid switching glibc
>>> wrappers are per-process. If we're proposing doing something like:
>>> 
>>> seteuid()
>>> setegid()
>>> setgroups()
>>> fd = open()
>>> (...and then revert the creds using same syscalls)
>>> 
>>> ...during the time that you're doing all of that, you can't really
>>> allow any thread in the process to be doing something that requires
>>> _other_ creds until you've completed the above.
>> 
>> Umm... open() isn't the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... Pretty much any interaction with the underlying filesystem needs to use the right identity.
>> 
>>> So, I could envision a program like ganesha firing up a separate
>>> process to handle the credential switching and fd creation and then
>>> handing those back to the main process via a unix domain socket.
>> 
>> How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?
> 
> This sounds considerably more complicated than just using fds.  What's
> the advantage?

The advantage is that it’s considerably _less_ complicated because it uses interfaces that were designed to carry security related information, and to share them across threads.

> I guess using keys for local fs credentials fits in with using keys to
> access things like AFS, but I'm still not sure I see why this helps
> here.

All you want to do is atomically store and retrieve a user identity (in practice a credential) and share it between members of a thread. As I said, that kind of storage/retrieval of structured data is what keyrings do. As far as I know, there are already LSM interfaces in place (see security_key_alloc/permission/free), and the generic keyring/keyctl interface exists with appropriate process sharing/inheritance rules via the kernel keyring interface.

So basically, the recipe is:

- You set up a kernel key type that takes a reference to the current process credential on instantiation (no upcalls, etc needed). Ganesha can then use standard libkeyutils methods to create the key and store it in its user, process or thread keyring.
- You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that has access to the key, via the keyring in which it is stored, to apply the stored process credential to itself (and perhaps cache it’s old credential in a new key?).


_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 20:14                 ` Trond Myklebust
@ 2014-03-31 21:25                   ` Andy Lutomirski
  0 siblings, 0 replies; 42+ messages in thread
From: Andy Lutomirski @ 2014-03-31 21:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Layton Jeff, Theodore Ts'o, Jim Lieb, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	linux-kernel, Dr Fields James Bruce

On Mon, Mar 31, 2014 at 1:14 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
> On Mar 31, 2014, at 15:26, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>>
>>> On Mar 31, 2014, at 7:51, Jeff Layton <jlayton@redhat.com> wrote:
>>>
>>>> On Sun, 30 Mar 2014 09:03:29 -0400
>>>> "Theodore Ts'o" <tytso@mit.edu> wrote:
>>>>
>>>>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote:
>>>>>> I had some time to think about this last night...
>>>>>>
>>>>>> While using a fd to pass around credentials is convenient, the danger
>>>>>> is that it's pretty opaque. You have a fd that you know has creds
>>>>>> attached to it, but it's hard to be certain what is going to change.
>>>>>
>>>>> I don't think that's a particularly tough problem.  In general, the fd
>>>>> isn't something that you would want to pass around, and so the process
>>>>> which generated it will know exactly what it contained.
>>>>>
>>>>
>>>> I think there's a bit more of a use-case for passing around such an fd
>>>> via socket...
>>>>
>>>> Part of the problem is that the traditional uid/gid switching glibc
>>>> wrappers are per-process. If we're proposing doing something like:
>>>>
>>>> seteuid()
>>>> setegid()
>>>> setgroups()
>>>> fd = open()
>>>> (...and then revert the creds using same syscalls)
>>>>
>>>> ...during the time that you're doing all of that, you can't really
>>>> allow any thread in the process to be doing something that requires
>>>> _other_ creds until you've completed the above.
>>>
>>> Umm... open() isn't the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... Pretty much any interaction with the underlying filesystem needs to use the right identity.
>>>
>>>> So, I could envision a program like ganesha firing up a separate
>>>> process to handle the credential switching and fd creation and then
>>>> handing those back to the main process via a unix domain socket.
>>>
>>> How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different?
>>
>> This sounds considerably more complicated than just using fds.  What's
>> the advantage?
>
> The advantage is that it's considerably _less_ complicated because it uses interfaces that were designed to carry security related information, and to share them across threads.
>
>> I guess using keys for local fs credentials fits in with using keys to
>> access things like AFS, but I'm still not sure I see why this helps
>> here.
>
> All you want to do is atomically store and retrieve a user identity (in practice a credential) and share it between members of a thread. As I said, that kind of storage/retrieval of structured data is what keyrings do. As far as I know, there are already LSM interfaces in place (see security_key_alloc/permission/free), and the generic keyring/keyctl interface exists with appropriate process sharing/inheritance rules via the kernel keyring interface.
>
> So basically, the recipe is:
>
> - You set up a kernel key type that takes a reference to the current process credential on instantiation (no upcalls, etc needed). Ganesha can then use standard libkeyutils methods to create the key and store it in its user, process or thread keyring.
> - You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that has access to the key, via the keyring in which it is stored, to apply the stored process credential to itself (and perhaps cache it's old credential in a new key?).

I can see pros and cons of using keys instead of fds:

Pros:
 - You can change may_ptrace to check that you can legally ptrace any
creds in the keyring.  (It's debatable how useful this is.)
 - You could plausibly stick subsidiary uids into a user keyring and
use them for uid_map, removing the need for newuidmap.

Cons:
 - You can't send them via SCM_RIGHTS.  I still think that the ability
to do that is extremely useful.

Neutral: What happens if the filesystem you're trying to share
requires keys?  I don't think that credfd or credkeys can really do
this.  I suppose that if the keys live in the user keyring
corresponding to the user being impersonated, everything just works
regardless of how this is implemented.

Cue brain explosion: thread_keyring lives inside struct cred.  I
suspect that any sane implementation of credential switching will need
to change that.  Sigh.


Anyway, the actual parts that seem like they need solving one way or
another are:

 - The keyring pointers in struct cred are a problem.  (Solved with
credkeys, but only if credkeys can only live on the thread keyring.
Also, there will be a problem with circular references.)

 - may_ptrace and all its users (e.g. procfs).  "Solved" by
real_creds, but no one has really defined the semantics of real_creds,
since it's not currently available to userspace.

--Andy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-31 16:49   ` Andy Lutomirski
@ 2014-04-01 20:22     ` One Thousand Gnomes
  0 siblings, 0 replies; 42+ messages in thread
From: One Thousand Gnomes @ 2014-04-01 20:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jim Lieb, Jeff Layton, Serge E. Hallyn, bfields,
	Theodore Ts'o, Linux FS Devel, linux-kernel, LSM List,
	Eric W. Biederman, Kees Cook

> Do we include credfd fds sitting in Unix sockets?

Ouch

> Hmm.  What if we had initial_creds and creds, and initial_creds never
> changed unless explicitly requested.  There wouldn't be any way to
> revert to initial_creds, but may_ptrace would check initial_creds
> *and* creds.  This is a little hacky, and it might break Android (I'm
> not really sure how the zygote thing works), but it could do the
> trick.

That mirros setfsuid and seteuid type behaviour so seems sensible enough.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-03-27 13:33       ` Boaz Harrosh
@ 2014-04-22 11:37         ` Florian Weimer
  2014-04-22 12:14           ` Boaz Harrosh
  0 siblings, 1 reply; 42+ messages in thread
From: Florian Weimer @ 2014-04-22 11:37 UTC (permalink / raw)
  To: Boaz Harrosh, Jeff Layton, Jim Lieb
  Cc: Andy Lutomirski, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On 03/27/2014 02:33 PM, Boaz Harrosh wrote:

> man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE
> and be done with it POSIX or no POSIX who cares?

The glibc side cares, and there's also this bit: "It aims towards POSIX 
and Single UNIX Specification compliance.", which should be familiar.

I get that POSIX doesn't do what people want here (umask and current 
directory handling are another story).  But it hasn't really helped that 
some folks just said "POSIX sucks" and did their own thing, completely 
bypassing the system interfaces, instead of getting things fixed 
properly.  Now we have to balance the kernel behavior, POSIX demands, 
glibc dirty hacks to implement the questionable POSIX behavior on top of 
the kernel interfaces, and more dirty hacks to work around glibc because 
some applications do not like it.

I would really like to clean up this mess, but it's not clear to me 
where to start.

> Please show me one user of that and declare it brain dead.

Safely and demonstratively relinquishing elevated privileges.

> POSIX or not it just does not have any real programming mining
> at all.

What do you mean with "mining" in this context?

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Thoughts on credential switching
  2014-04-22 11:37         ` Florian Weimer
@ 2014-04-22 12:14           ` Boaz Harrosh
  2014-04-22 16:35               ` Jim Lieb
  0 siblings, 1 reply; 42+ messages in thread
From: Boaz Harrosh @ 2014-04-22 12:14 UTC (permalink / raw)
  To: Florian Weimer, Jeff Layton, Jim Lieb
  Cc: Andy Lutomirski, Eric W. Biederman, LSM List, Serge E. Hallyn,
	Kees Cook, Linux FS Devel, Theodore Ts'o, linux-kernel,
	bfields

On 04/22/2014 02:37 PM, Florian Weimer wrote:
> On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
>> POSIX or not it just does not have any real programming mining
>> at all.
> 
> What do you mean with "mining" in this context?
> 

Sorry I saw this mistake after I posted. I meant "meaning".

What I'm saying is that the mess starts when you are trying
to keep patching a very wrong API. the POSIX politics aside,
in regard to user switching (and current directory and etc...)
this API is plain WRONG. I mean in the mathematical sense wrong.

All these application mess is not the application programmers
fault. He had to do what he had to do. The mess starts when you
are trying to keep a mathematical contradiction in your proof.

It is glibc mess for trying to maintain compatibility with
these "PROCESS WIDE OPERATIONS". And naming it holy names like
POSIX will not cover the mess that they are. As long as you try
to keep them there will be mess. If you want to honestly clean
things up is by throwing the true garbage out. Convert all legacy
code to new mathematically sound API's.

Peace
Boaz


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Thoughts on credential switching
  2014-04-22 12:14           ` Boaz Harrosh
@ 2014-04-22 16:35               ` Jim Lieb
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Lieb @ 2014-04-22 16:35 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Florian Weimer, Jeff Layton, Andy Lutomirski, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Tuesday, April 22, 2014 15:14:33 Boaz Harrosh wrote:
> On 04/22/2014 02:37 PM, Florian Weimer wrote:
> > On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
> >> POSIX or not it just does not have any real programming mining
> >> at all.
> > 
> > What do you mean with "mining" in this context?
> 
> Sorry I saw this mistake after I posted. I meant "meaning".
> 
> What I'm saying is that the mess starts when you are trying
> to keep patching a very wrong API. the POSIX politics aside,
> in regard to user switching (and current directory and etc...)
> this API is plain WRONG. I mean in the mathematical sense wrong.
> 
> All these application mess is not the application programmers
> fault. He had to do what he had to do. The mess starts when you
> are trying to keep a mathematical contradiction in your proof.
> 
> It is glibc mess for trying to maintain compatibility with
> these "PROCESS WIDE OPERATIONS". And naming it holy names like
> POSIX will not cover the mess that they are. As long as you try
> to keep them there will be mess. If you want to honestly clean
> things up is by throwing the true garbage out. Convert all legacy
> code to new mathematically sound API's.
> 
> Peace
> Boaz

I don't want to keep this churning thread going.  I'd rather solve today's 
problems.  So let's put this in perspective and move on.  Our project has a 
problem to solve.

I wouldn't necessarily say wrong, simply very out of date.  All of these 
issues with POSIX and pthreads in particular come from one common source, the 
actual capabilities of the systems, BSD, and SysV at the time.  The goofy 
localtime_r and friends were (and still are) hacks to work around the 
simplistic interfaces in the original Bell Labs code that deeply assumed a 
single process with a single thread, i.e. there were only 3 segments, no 
shared libraries or shared r/w memory and definitely no (real) TLS.  The 
pthreads interfaces deeply assumed the same.  Even worse, it had to also run 
on VMS which (I don't believe) ever had kernel support for multiple user space 
theads.  ASTs are nasty enough and only worked because everybody knew that you 
were either in app code or the AST...  So all of this stuff is based on lowest 
common denominator.  The "Single UNIX Specification" was a peace treaty among 
warring computer companies out looking for lock-ins and competitive advantages 
while reluctantly realizing that the ISV community (Oracle) really didn't care 
about anything but a common base.

Let's move on.  Glibc has to keep some semblance of POSIX in order to cope 
with legacy code from that era and I commend them for their perseverance.  But 
that should not hold us back from leveraging today's architectures and, more 
importantly, today's multi-client workloads.  If that means extensions to 
POSIX fine.  But if POSIX gets hung up because *BSD doesn't have kernel support 
for thread level creds and *bsd.org doesn't want to do it, we just do what fits 
today's requirements and what Linux is capable of,

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Re: Thoughts on credential switching
@ 2014-04-22 16:35               ` Jim Lieb
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Lieb @ 2014-04-22 16:35 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Florian Weimer, Jeff Layton, Andy Lutomirski, Eric W. Biederman,
	LSM List, Serge E. Hallyn, Kees Cook, Linux FS Devel,
	Theodore Ts'o, linux-kernel, bfields

On Tuesday, April 22, 2014 15:14:33 Boaz Harrosh wrote:
> On 04/22/2014 02:37 PM, Florian Weimer wrote:
> > On 03/27/2014 02:33 PM, Boaz Harrosh wrote:
> >> POSIX or not it just does not have any real programming mining
> >> at all.
> > 
> > What do you mean with "mining" in this context?
> 
> Sorry I saw this mistake after I posted. I meant "meaning".
> 
> What I'm saying is that the mess starts when you are trying
> to keep patching a very wrong API. the POSIX politics aside,
> in regard to user switching (and current directory and etc...)
> this API is plain WRONG. I mean in the mathematical sense wrong.
> 
> All these application mess is not the application programmers
> fault. He had to do what he had to do. The mess starts when you
> are trying to keep a mathematical contradiction in your proof.
> 
> It is glibc mess for trying to maintain compatibility with
> these "PROCESS WIDE OPERATIONS". And naming it holy names like
> POSIX will not cover the mess that they are. As long as you try
> to keep them there will be mess. If you want to honestly clean
> things up is by throwing the true garbage out. Convert all legacy
> code to new mathematically sound API's.
> 
> Peace
> Boaz

I don't want to keep this churning thread going.  I'd rather solve today's 
problems.  So let's put this in perspective and move on.  Our project has a 
problem to solve.

I wouldn't necessarily say wrong, simply very out of date.  All of these 
issues with POSIX and pthreads in particular come from one common source, the 
actual capabilities of the systems, BSD, and SysV at the time.  The goofy 
localtime_r and friends were (and still are) hacks to work around the 
simplistic interfaces in the original Bell Labs code that deeply assumed a 
single process with a single thread, i.e. there were only 3 segments, no 
shared libraries or shared r/w memory and definitely no (real) TLS.  The 
pthreads interfaces deeply assumed the same.  Even worse, it had to also run 
on VMS which (I don't believe) ever had kernel support for multiple user space 
theads.  ASTs are nasty enough and only worked because everybody knew that you 
were either in app code or the AST...  So all of this stuff is based on lowest 
common denominator.  The "Single UNIX Specification" was a peace treaty among 
warring computer companies out looking for lock-ins and competitive advantages 
while reluctantly realizing that the ISV community (Oracle) really didn't care 
about anything but a common base.

Let's move on.  Glibc has to keep some semblance of POSIX in order to cope 
with legacy code from that era and I commend them for their perseverance.  But 
that should not hold us back from leveraging today's architectures and, more 
importantly, today's multi-client workloads.  If that means extensions to 
POSIX fine.  But if POSIX gets hung up because *BSD doesn't have kernel support 
for thread level creds and *bsd.org doesn't want to do it, we just do what fits 
today's requirements and what Linux is capable of,

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-04-22 16:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-27  0:23 Thoughts on credential switching Andy Lutomirski
2014-03-27  0:42 ` Serge Hallyn
2014-03-27  1:01   ` Andy Lutomirski
2014-03-27 15:41     ` Florian Weimer
2014-03-27 16:21       ` Andy Lutomirski
2014-03-27  2:48 ` Jeff Layton
2014-03-27  3:05   ` Andy Lutomirski
2014-03-27  3:25     ` Jeff Layton
2014-03-27 14:08       ` Jeff Layton
2014-03-29  6:43         ` Alex Elsayed
2014-03-30 13:03         ` Theodore Ts'o
2014-03-30 18:56           ` Andy Lutomirski
2014-03-31 11:51           ` Jeff Layton
2014-03-31 18:06             ` Trond Myklebust
2014-03-31 18:06               ` Trond Myklebust
2014-03-31 18:12               ` Jeff Layton
2014-03-31 18:12                 ` Jeff Layton
2014-03-31 19:26               ` Andy Lutomirski
2014-03-31 20:14                 ` Trond Myklebust
2014-03-31 21:25                   ` Andy Lutomirski
2014-03-27 12:46 ` Florian Weimer
2014-03-27 13:02   ` Jeff Layton
2014-03-27 13:06     ` Florian Weimer
2014-03-27 13:33       ` Boaz Harrosh
2014-04-22 11:37         ` Florian Weimer
2014-04-22 12:14           ` Boaz Harrosh
2014-04-22 16:35             ` Jim Lieb
2014-04-22 16:35               ` Jim Lieb
2014-03-27 14:01       ` Jeff Layton
2014-03-27 18:26         ` Jeremy Allison
2014-03-27 18:46           ` Andy Lutomirski
2014-03-27 18:56             ` Jeremy Allison
2014-03-27 19:02               ` Andy Lutomirski
2014-03-27 19:30           ` Jim Lieb
2014-03-27 19:45             ` Andy Lutomirski
2014-03-27 20:47               ` Jim Lieb
2014-03-27 20:47                 ` Jim Lieb
2014-03-27 21:19                 ` Andy Lutomirski
2014-03-31 10:44 ` One Thousand Gnomes
2014-03-31 16:49   ` Andy Lutomirski
2014-04-01 20:22     ` One Thousand Gnomes
2014-03-31 19:05   ` Jeremy Allison

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.