All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] System call to switch user credentials
@ 2013-10-16 22:01 Jim Lieb
  2013-10-16 22:01 ` [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Jim Lieb
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-16 22:01 UTC (permalink / raw)
  To: tytso, viro; +Cc: linux-fsdevel, linux-kernel, bfields, jlayton


This system call implementation is the result of discussions at LSF
this last Feb about providing better kernel support to user mode file
servers.

Our use case is an NFS+pNFS+9P file server in user space.  We have to
switch user credentials for a number of operations such as CREATE,
MKDIR, and WRITE.  We currently use setfsuid(), setfsgid(), and setgroups()
for each of these calls followed by the same set of syscalls to revert
to root privileges.  We also must do a setcap to disable root privs so that
quotas and access checks work properly.  This results in a minimum of
7 system calls for each affected filesystem operation.  Each syscall
in this set creates a new creds object with its associated RCU resources.

Knfsd calls nfsd_setuser() to do the same thing in one call.

This system call does the same function as nfsd_setuser() but for user space.
It replaces the six system calls with just two and uses RCU more efficiently
by only doing it once.  This is done using the following struct which
combines all of the arguments that are passed but the corrent syscalls
and passes them to the system call.

	struct user_creds {
		uid_t		uid;
		gid_t		gid;
		unsigned	ngroups;
		gid_t		altgroups[0];
	};

Inside our server, we have implemented two functions to manage credentials
using a local user identity cache that has the following structure.
The req_ctx contains two members of interest, a pointer to the credentials
that are constructed by the server from the protocol and a file descriptor
which is described below.  The rest of the structure is housekeeping for
the user identity cache.

	struct req_ctx {
		/* other stuff like avl tree links */
		struct user_creds *creds;
		int creds_fd;
	};

The typical sequence for a protocol operation that creates or morphs
an object in the filesystem is:

	ctx = get_ctx(me->uid);
	become_client(ctx);
	/* mkdir, mknod, write as client user */
	restore_creds();

I have left out error handling to simplify the flow.  This replaces the
current setfsuid();setfsgid();setgroups(); before and after. get_ctx()
does a lookup in the cache.

The become_client function uses two forms of the system call.  We will
take the second first.  The SWCREDS_FSIDS command creates a new creds
for the task and fills it from the user_creds argument.  It also clears
the set of root capabilities in the effective capabilities.  This is
functionally equivalent to what nfsd_setuser() does for knfsd.

We currently set the reduced capabilities globally in order to keep the
overhead down.  This system call does this per call, leaving the rest
of the server with full capabilities.

This version of the system call opens an anonymous file and returns
an fd for it.  This fd is useless for I/O but it does allow
us to cache creds cheaply.  We close the file when we purge the cache
entry.

The first form uses the SWCREDS_FROMFD command and the appropriate fd
that was returned for this client user earlier.  The creds referenced
by the fd are used to override_creds the task's effective creds.  Actually,
any open file will do but the opened anonymous file is the least
overhead because all it consumes is a filp and fd slot.  The override_creds
does not consume any RCU resources so it is much faster and consumes
fewer resources.

int become_client(struct req_ctx *ctx)
{
	int ret;

	if (ctx->creds_fd >= 0) {
		ret = switch_creds(SWCREDS_FROMFD, ctx->creds_fd);
		if (ret < 0) {
			perror("become_client failed!");
			return ret;
		}
	} else {
		ret = switch_creds(SWCREDS_FSIDS, (unsigned long)ctx->creds);
		if (ret < 0) {
			perror("become_client with creds failed!");
			return ret;
		} else {
			fprintf(stderr, "New client: uid= %d, fd = %d\n",
			       ctx->creds->uid, ret);
			ctx->creds_fd = ret;
		}
	}
	return 0;
}

The restore_creds function simply uses the SWCREDS_REVERT command which
restores the task's real creds.  This is the safest route in our code
but one could also switch directly to another set safely.

int restore_creds(void)
{
	int ret;

	ret = switch_creds(SWCREDS_REVERT, 0);
	if (ret < 0) {
		perror("switch_creds back failed!\n");
		return ret;
	}
	return 0;
}

The first patch implements the system call itself.  The second two add
the syscall linkage for X86 and X86_64.  I chose the next available
numbers for those architectures as of 3.12-RC5. I added these patches as a
temporary bridge until official numbers are assigned.  I have also not
added entries for other architectures but there is nothing architecturally
dependent in this syscall so when appropriate, numbers can be assigned.

Please review and comment to me.  The code fragments above are from my
test program.

Regards,

Jim Lieb
NFS Ganesha project

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-16 22:01 [RFC PATCH 0/3] System call to switch user credentials Jim Lieb
@ 2013-10-16 22:01 ` Jim Lieb
  2013-10-16 22:42   ` Al Viro
  2013-10-16 22:01 ` [PATCH 2/3] switch_creds: Add x86 syscall number Jim Lieb
  2013-10-16 22:01 ` [PATCH 3/3] switch_creds: Assign x86_64 syscall number for switch_creds Jim Lieb
  2 siblings, 1 reply; 26+ messages in thread
From: Jim Lieb @ 2013-10-16 22:01 UTC (permalink / raw)
  To: tytso, viro; +Cc: linux-fsdevel, linux-kernel, bfields, jlayton, Jim Lieb

File servers must do some operations with the credentials of
their client.  This syscall switches the key credentials similar
to nfsd_setuser() in fs/nfsd/auth.c  with the capability of retaining a
handle to the credentials by way of an fd for an open anonymous file.
This makes switching for subsequent operations for that client more efficient.

Signed-off-by: Jim Lieb <jlieb@panasas.com>
---
 include/linux/cred.h     |  15 ++++
 include/linux/syscalls.h |   2 +
 kernel/sys.c             | 175 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sys_ni.c          |   3 +
 4 files changed, 195 insertions(+)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 04421e8..66a006e 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -138,6 +138,21 @@ struct cred {
 	struct rcu_head	rcu;		/* RCU deletion hook */
 };
 
+/* switch_creds() syscall parameters*/
+
+#define SWCREDS_REVERT 0
+#define SWCREDS_FROMFD 1
+#define SWCREDS_FSIDS  2
+
+/* arg for SWCREDS_FSIDS command */
+
+struct user_creds {
+	uid_t		uid;
+	gid_t		gid;
+	unsigned	ngroups;
+	gid_t		altgroups[0];
+};
+
 extern void __put_cred(struct cred *);
 extern void exit_creds(struct task_struct *);
 extern int copy_creds(struct task_struct *, unsigned long);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 7fac04e..3550a8f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -64,6 +64,7 @@ struct old_linux_dirent;
 struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
+struct user_creds;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -231,6 +232,7 @@ asmlinkage long sys_setresuid(uid_t ruid, uid_t euid, uid_t suid);
 asmlinkage long sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid);
 asmlinkage long sys_setfsuid(uid_t uid);
 asmlinkage long sys_setfsgid(gid_t gid);
+asmlinkage long sys_switch_creds(int cmd, unsigned long arg);
 asmlinkage long sys_setpgid(pid_t pid, pid_t pgid);
 asmlinkage long sys_setsid(void);
 asmlinkage long sys_setgroups(int gidsetsize, gid_t __user *grouplist);
diff --git a/kernel/sys.c b/kernel/sys.c
index c18ecca..cebc661 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -53,6 +53,7 @@
 #include <linux/rcupdate.h>
 #include <linux/uidgid.h>
 #include <linux/cred.h>
+#include <linux/anon_inodes.h>
 
 #include <linux/kmsg_dump.h>
 /* Move somewhere else to avoid recompiling? */
@@ -798,6 +799,180 @@ change_okay:
 	return old_fsgid;
 }
 
+static int swcreds_release(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+static int swcreds_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	const struct group_info *group_info = f->f_cred->group_info;
+	int ret;
+
+	ret = seq_printf(m, "setcreds: fsuid = %d, fsgid = %d, ngroups = %d\n",
+			 from_kuid_munged(current_user_ns(),
+					  f->f_cred->fsuid),
+			 from_kgid_munged(current_user_ns(),
+					  f->f_cred->fsgid),
+			 group_info->ngroups);
+	if (!ret) {
+		int i;
+
+		for (i = 0; i < f->f_cred->group_info->ngroups; i++) {
+			ret = seq_printf(m, "altgroup[%d] = %d\n",
+					 i,
+					 from_kgid_munged(current_user_ns(),
+							  GROUP_AT(group_info,
+								   i)));
+			if (ret)
+				break;
+		}
+	}
+	return ret;
+}
+#endif
+
+static const struct file_operations swcreds_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = swcreds_show_fdinfo,
+#endif
+	.release = swcreds_release
+};
+
+/* do_switch_creds - set thread creds from struct pointed to by arg
+ *
+ * Does setfsuid, setfsgid, setgroups for thread in one call and one rcu update.
+ * drop special root capabilities as well which are not dropped by setfsuid.
+ * This code is similar to nfsd_setuid in concept.
+ */
+
+static int do_switch_creds(unsigned long arg)
+{
+	const struct user_creds  __user *creds;
+	const gid_t __user *alt_groups;
+	const struct cred *old = current_cred();
+	int i, new_fd;
+	struct cred *new;
+	struct user_creds ncred;
+	struct group_info *group_info;
+	int retval;
+
+	creds  = (const struct user_creds  __user *)arg;
+	/* validate and copyin creds */
+	if (copy_from_user(&ncred, creds, sizeof(struct user_creds)))
+		return -EFAULT;
+
+	if (ncred.ngroups > NGROUPS_MAX)
+		return -EINVAL;
+	new = prepare_creds();
+	if (!new)
+		return -ENOMEM;
+	group_info = groups_alloc(ncred.ngroups);
+	if (!group_info) {
+		abort_creds(new);
+		return -ENOMEM;
+	}
+	new->fsuid = make_kuid(old->user_ns, ncred.uid);
+	new->fsgid = make_kgid(old->user_ns, ncred.gid);
+	alt_groups = &creds->altgroups[0];
+	for (i = 0; i < ncred.ngroups; i++) {
+		kgid_t kgid;
+		gid_t gid;
+
+		if (get_user(gid, alt_groups+i)) {
+			retval = -EFAULT;
+			goto bad_altgrp;
+		}
+		kgid = make_kgid(old->user_ns, gid);
+		if (!gid_valid(kgid)) {
+			retval = -EINVAL;
+			goto bad_altgrp;
+		}
+		GROUP_AT(group_info, i) = kgid;
+	}
+	retval = set_groups(new, group_info);
+	put_group_info(group_info);
+	if (retval < 0) {
+		abort_creds(new);
+		return retval;
+	}
+	/* We need to be the real user in capabilities as well.
+	 * don't leak my root caps into the mix */
+	new->cap_effective = cap_drop_nfsd_set(new->cap_effective);
+	old = override_creds(new);
+	/* now open a anon file to preserve these creds
+	 * and return the fd
+	 */
+	new_fd = anon_inode_getfd("[switch_creds]",
+				  &swcreds_fops,
+				  NULL,
+				  (O_RDONLY|O_CLOEXEC));
+	if (new_fd < 0) {
+		revert_creds(old);
+		abort_creds(new);
+	}
+	return new_fd;
+
+bad_altgrp:
+	put_group_info(group_info);
+	abort_creds(new);
+	return retval;
+}
+
+/**
+ *  switch_creds -- Change user (client) file system credentials
+ *
+ *  Switch the thread's current file access credentials from the argument.
+ *  The end result is the thread has the fsuid, fsgid, altgroups and
+ *  non-root capabilities of the user we want to be for subsequent syscalls.
+ *
+ *  @cmd  SWCREDS_REVERT  revert thread's creds to its real creds.
+ *                        Always returns 0
+ *        SWCREDS_FROM_FD override current creds with creds from open file.
+ *                        Returns 0 or file related error.
+ *        SWCREDS_FSIDS   set fsuid, fsgid, altgroups from arg
+ *                        Return fd or error
+ *  @arg fd or pointer to creds struct
+ *
+ *  The returned fd is a somewhat useless but minimally resource consuming
+ *  anon file that can only be used to store the new creds state.
+ *
+ *  Return 0, fd, or error.
+ */
+
+SYSCALL_DEFINE2(switch_creds, int, cmd, unsigned long, arg)
+{
+	const struct cred *old;
+
+	old = current_cred();
+	if (!ns_capable(old->user_ns, CAP_SETGID) ||
+	    !ns_capable(old->user_ns, CAP_SETUID))
+		return -EPERM;
+
+	if (cmd == SWCREDS_REVERT) {
+		if (current->real_cred != current->cred) {
+			revert_creds(current->real_cred);
+			return 0;
+		}
+	}
+	if (cmd == SWCREDS_FROMFD) {
+		struct fd f;
+		const struct cred *file_cred;
+
+		f = fdget(arg);
+		if (unlikely(!f.file))
+			return -EBADF;
+		file_cred = f.file->f_cred;
+		(void)override_creds(file_cred);
+		fdput(f);
+		return 0;
+	} else if (cmd == SWCREDS_FSIDS) {
+		return do_switch_creds(arg);
+	}
+	return -EINVAL;
+}
+
 /**
  * sys_getpid - return the thread group id of the current process
  *
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7078052..7573cad 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -209,3 +209,6 @@ cond_syscall(compat_sys_open_by_handle_at);
 
 /* compare kernel pointers */
 cond_syscall(sys_kcmp);
+
+/* switch task creds */
+cond_syscall(sys_switch_creds);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] switch_creds: Add x86 syscall number
  2013-10-16 22:01 [RFC PATCH 0/3] System call to switch user credentials Jim Lieb
  2013-10-16 22:01 ` [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Jim Lieb
@ 2013-10-16 22:01 ` Jim Lieb
  2013-10-16 22:01 ` [PATCH 3/3] switch_creds: Assign x86_64 syscall number for switch_creds Jim Lieb
  2 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-16 22:01 UTC (permalink / raw)
  To: tytso, viro; +Cc: linux-fsdevel, linux-kernel, bfields, jlayton, Jim Lieb

This is temporary number awaiting syscall number assignment.

Signed-off-by: Jim Lieb <jlieb@panasas.com>
---
 arch/x86/syscalls/syscall_32.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index aabfb83..b836839 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -357,3 +357,4 @@
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
+351	i386	switch_creds		sys_switch_creds
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] switch_creds: Assign x86_64 syscall number for switch_creds
  2013-10-16 22:01 [RFC PATCH 0/3] System call to switch user credentials Jim Lieb
  2013-10-16 22:01 ` [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Jim Lieb
  2013-10-16 22:01 ` [PATCH 2/3] switch_creds: Add x86 syscall number Jim Lieb
@ 2013-10-16 22:01 ` Jim Lieb
  2 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-16 22:01 UTC (permalink / raw)
  To: tytso, viro; +Cc: linux-fsdevel, linux-kernel, bfields, jlayton, Jim Lieb

This is a temporary while waiting for syscall number assignment.

Signed-off-by: Jim Lieb <jlieb@panasas.com>
---
 arch/x86/syscalls/syscall_64.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..f46b75c 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	switch_creds		sys_switch_creds
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-16 22:01 ` [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Jim Lieb
@ 2013-10-16 22:42   ` Al Viro
  2013-10-17  1:18     ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2013-10-16 22:42 UTC (permalink / raw)
  To: Jim Lieb; +Cc: tytso, viro, linux-fsdevel, linux-kernel, bfields, jlayton

On Wed, Oct 16, 2013 at 03:01:57PM -0700, Jim Lieb wrote:
> File servers must do some operations with the credentials of
> their client.  This syscall switches the key credentials similar
> to nfsd_setuser() in fs/nfsd/auth.c  with the capability of retaining a
> handle to the credentials by way of an fd for an open anonymous file.
> This makes switching for subsequent operations for that client more efficient.

	Yet Another Untyped Multiplexor.  Inna bun.  Onna stick.
CMOT Dibbler special...

	Switching creds to those of opener of given file descriptor
is fine, but in any realistic situation you'll get all the real win
from that - you should cache those fds (which you seem to do), and
then setuid/etc. is done once per cache miss.  Making the magical
"set them all at once" mess (complete with non-trivial structure,
32/64bit compat, etc.) pointless.  Moreover, you don't need any magic
files at all - just set the creds and open /dev/null and there's your fd.
With proper creds associated with it.  While we are at it, just _start_
with opening /dev/null.  With your initial creds.  Voila - revert is
simply switch to that fd's creds.

	IOW, you really need only one syscall:

SYSCALL_DEFINE1(switch_cred, int, fd)
{
	struct fd f = fdget(fd);
	if (!f.file)
		return -EBADF;
	put_cred(override_creds(f.file->f_cred);
	fdput(f);
	return 0;
}

and that's all there is to it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-16 22:42   ` Al Viro
@ 2013-10-17  1:18     ` Eric W. Biederman
  2013-10-17  1:20       ` Al Viro
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2013-10-17  1:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Jim Lieb, tytso, viro, linux-fsdevel, linux-kernel, bfields, jlayton

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Wed, Oct 16, 2013 at 03:01:57PM -0700, Jim Lieb wrote:
>> File servers must do some operations with the credentials of
>> their client.  This syscall switches the key credentials similar
>> to nfsd_setuser() in fs/nfsd/auth.c  with the capability of retaining a
>> handle to the credentials by way of an fd for an open anonymous file.
>> This makes switching for subsequent operations for that client more efficient.
>
> 	Yet Another Untyped Multiplexor.  Inna bun.  Onna stick.
> CMOT Dibbler special...
>
> 	Switching creds to those of opener of given file descriptor
> is fine, but in any realistic situation you'll get all the real win
> from that - you should cache those fds (which you seem to do), and
> then setuid/etc. is done once per cache miss.  Making the magical
> "set them all at once" mess (complete with non-trivial structure,
> 32/64bit compat, etc.) pointless.  Moreover, you don't need any magic
> files at all - just set the creds and open /dev/null and there's your fd.
> With proper creds associated with it.  While we are at it, just _start_
> with opening /dev/null.  With your initial creds.  Voila - revert is
> simply switch to that fd's creds.
>
> 	IOW, you really need only one syscall:

That doesn't look bad but it does need capable(CAP_SETUID) &&
capable(CAP_SETGID) or possibly something a little more refined.

I don't think we want file descriptor passing to all of a sudden become
a grant of privilege, beyond what the passed fd can do.

> SYSCALL_DEFINE1(switch_cred, int, fd)
> {
> 	struct fd f = fdget(fd);
> 	if (!f.file)
> 		return -EBADF;
> 	put_cred(override_creds(f.file->f_cred);
> 	fdput(f);
> 	return 0;
> }
>
> and that's all there is to it.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-17  1:18     ` Eric W. Biederman
@ 2013-10-17  1:20       ` Al Viro
  2013-10-17  3:35           ` Jim Lieb
  2013-10-17  3:52         ` Eric W. Biederman
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2013-10-17  1:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jim Lieb, tytso, viro, linux-fsdevel, linux-kernel, bfields, jlayton

On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:

> That doesn't look bad but it does need capable(CAP_SETUID) &&
> capable(CAP_SETGID) or possibly something a little more refined.

D'oh

> I don't think we want file descriptor passing to all of a sudden become
> a grant of privilege, beyond what the passed fd can do.

Definitely.  And an extra ) to make it compile wouldn't hurt either...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-17  1:20       ` Al Viro
@ 2013-10-17  3:35           ` Jim Lieb
  2013-10-17  3:52         ` Eric W. Biederman
  1 sibling, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-17  3:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, tytso, viro, linux-fsdevel, linux-kernel,
	bfields, jlayton

On Thursday, October 17, 2013 02:20:50 Al Viro wrote:
> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> > That doesn't look bad but it does need capable(CAP_SETUID) &&
> > capable(CAP_SETGID) or possibly something a little more refined.
> 
> D'oh
> 
> > I don't think we want file descriptor passing to all of a sudden become
> > a grant of privilege, beyond what the passed fd can do.
> 
> Definitely.  And an extra ) to make it compile wouldn't hurt either...

Ok, I'll rework this, dropping the void arg etc.  How about this:

1. have one arg, the fd, i.e. SYSCALL_DEFINE1(switch_cred, int, fd)

2. if the fd >=0 do the override in my "use the fd" variation.  This would do 
the capability check after the valid fd check.  This means that you must have
privs to mess with privs.  Returns 0 or either EBADF or EPERM

3. if the fd == -1 do the revert case.  The reason for this is there are 4 
syscalls needed to change the creds and each has an error return.  We need
a way to escape the damage and a revert to the real creds set is the best way 
to return to a known state.  This does not require a capability check because
all that can happen is to return to the immutable real set.  Also, I don't 
need the initial open of /dev/null.

Does this fit?

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
@ 2013-10-17  3:35           ` Jim Lieb
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-17  3:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, tytso, viro, linux-fsdevel, linux-kernel,
	bfields, jlayton

On Thursday, October 17, 2013 02:20:50 Al Viro wrote:
> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> > That doesn't look bad but it does need capable(CAP_SETUID) &&
> > capable(CAP_SETGID) or possibly something a little more refined.
> 
> D'oh
> 
> > I don't think we want file descriptor passing to all of a sudden become
> > a grant of privilege, beyond what the passed fd can do.
> 
> Definitely.  And an extra ) to make it compile wouldn't hurt either...

Ok, I'll rework this, dropping the void arg etc.  How about this:

1. have one arg, the fd, i.e. SYSCALL_DEFINE1(switch_cred, int, fd)

2. if the fd >=0 do the override in my "use the fd" variation.  This would do 
the capability check after the valid fd check.  This means that you must have
privs to mess with privs.  Returns 0 or either EBADF or EPERM

3. if the fd == -1 do the revert case.  The reason for this is there are 4 
syscalls needed to change the creds and each has an error return.  We need
a way to escape the damage and a revert to the real creds set is the best way 
to return to a known state.  This does not require a capability check because
all that can happen is to return to the immutable real set.  Also, I don't 
need the initial open of /dev/null.

Does this fit?

Jim
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-17  1:20       ` Al Viro
  2013-10-17  3:35           ` Jim Lieb
@ 2013-10-17  3:52         ` Eric W. Biederman
  2013-10-24  1:14           ` Andy Lutomirski
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2013-10-17  3:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Jim Lieb, tytso, viro, linux-fsdevel, linux-kernel, bfields, jlayton

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>
>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> capable(CAP_SETGID) or possibly something a little more refined.
>
> D'oh
>
>> I don't think we want file descriptor passing to all of a sudden become
>> a grant of privilege, beyond what the passed fd can do.
>
> Definitely.  And an extra ) to make it compile wouldn't hurt either...

There also appears to need to be a check that we don't gain any
capabilities.

We also need a check so that you don't gain any capabilities, and
possibly a few other things.

So I suspect we want a check something like:

if ((new_cred->securebits != current_cred->securebits)  ||
    (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
    (new_cred->cap_permitted != current_cred->cap_permitted) ||
    (new_cred->cap_effective != current_cred->cap_effective) ||
    (new_cred->cap_bset != current_cred->cap_bset) ||
    (new_cred->jit_keyring != current_cred->jit_keyring) ||
    (new_cred->session_keyring != current_cred->session_keyring) ||
    (new_cred->process_keyring != current_cred->process_keyring) ||
    (new_cred->thread_keyring != current_cred->thread_keyring) ||
    (new_cred->request_keyring != current_cred->request_keyring) ||
    (new_cred->security != current_cred->security) ||
    (new_cred->user_ns != current_cred->user_ns)) {
	return -EPERM;
}

Eric


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-17  3:52         ` Eric W. Biederman
@ 2013-10-24  1:14           ` Andy Lutomirski
  2013-10-24  5:59             ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2013-10-24  1:14 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro
  Cc: Jim Lieb, tytso, viro, linux-fsdevel, linux-kernel, bfields, jlayton

On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> Al Viro <viro@ZenIV.linux.org.uk> writes:
> 
>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>
>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>> capable(CAP_SETGID) or possibly something a little more refined.
>>
>> D'oh
>>
>>> I don't think we want file descriptor passing to all of a sudden become
>>> a grant of privilege, beyond what the passed fd can do.
>>
>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
> 
> There also appears to need to be a check that we don't gain any
> capabilities.
> 
> We also need a check so that you don't gain any capabilities, and
> possibly a few other things.

Why?  I like the user_ns part, but I'm not immediately seeing the issue
with capabilities.

> 
> So I suspect we want a check something like:
> 
> if ((new_cred->securebits != current_cred->securebits)  ||
>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>     (new_cred->cap_effective != current_cred->cap_effective) ||
>     (new_cred->cap_bset != current_cred->cap_bset) ||
>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>     (new_cred->session_keyring != current_cred->session_keyring) ||
>     (new_cred->process_keyring != current_cred->process_keyring) ||
>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>     (new_cred->request_keyring != current_cred->request_keyring) ||
>     (new_cred->security != current_cred->security) ||
>     (new_cred->user_ns != current_cred->user_ns)) {
> 	return -EPERM;
> }
> 

I *really* don't like the idea of being able to use any old file
descriptor.  I barely care what rights the caller needs to have to
invoke this -- if you're going to pass an fd that grants a capability
(in the non-Linux sense of the work), please make sure that the sender
actually wants that behavior.

IOW, have a syscall to generate a special fd for this purpose.  It's
only a couple lines of code, and I think we'll really regret it if we
fsck this up.

(I will take it as a personal challenge to find at least one exploitable
privilege escalation in this if an arbitrary fd works.)

Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
and faccessat.  That is, current userspace can't see it.  But now you'll
expose various oddities.  For example, AFAICS a capability-less process
that's a userns owner can always use setuid.  This will *overwrite*
real_cred.  Then you're screwed, especially if this happens by accident.

That being said, Windows has had functions like this for a long time.
Processes have a primary token and possibly an impersonation token.  Any
process can call ImpersonateLoggedOnUser (no privilege required) to
impersonate the credentials of a token (which is special kind of fd).
Similarly, any process can call RevertToSelf to undo it.

Is there any actual problem with allowing completely unprivileged tasks
to switch to one of these magic cred fds?  That would avoid needing a
"revert" operation.


Another note: I think that there may be issues if the creator of a token
has no_new_privs set and the user doesn't.  Imagine a daemon that
accepts one of these fds, impersonates it, and calls exec.  This could
be used to escape from no_new_privs land.

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-24  1:14           ` Andy Lutomirski
@ 2013-10-24  5:59             ` Eric W. Biederman
  2013-10-24 19:04                 ` Jim Lieb
  2013-10-24 19:28               ` Andy Lutomirski
  0 siblings, 2 replies; 26+ messages in thread
From: Eric W. Biederman @ 2013-10-24  5:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Jim Lieb, tytso, viro, linux-fsdevel, linux-kernel,
	bfields, jlayton

Andy Lutomirski <luto@amacapital.net> writes:

> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> Al Viro <viro@ZenIV.linux.org.uk> writes:
>> 
>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>>
>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>>> capable(CAP_SETGID) or possibly something a little more refined.
>>>
>>> D'oh
>>>
>>>> I don't think we want file descriptor passing to all of a sudden become
>>>> a grant of privilege, beyond what the passed fd can do.
>>>
>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
>> 
>> There also appears to need to be a check that we don't gain any
>> capabilities.
>> 
>> We also need a check so that you don't gain any capabilities, and
>> possibly a few other things.
>
> Why?  I like the user_ns part, but I'm not immediately seeing the issue
> with capabilities.

My reasoning was instead of making this syscall as generic as possible
start it out by only allowing the cases Jim cares about and working with
a model where you can't gain any permissions you couldn't gain
otherwise.

Although the fd -1 trick to revert to your other existing cred seems
reasonable.  

>> So I suspect we want a check something like:
>> 
>> if ((new_cred->securebits != current_cred->securebits)  ||
>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>>     (new_cred->cap_effective != current_cred->cap_effective) ||
>>     (new_cred->cap_bset != current_cred->cap_bset) ||
>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>>     (new_cred->session_keyring != current_cred->session_keyring) ||
>>     (new_cred->process_keyring != current_cred->process_keyring) ||
>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>>     (new_cred->request_keyring != current_cred->request_keyring) ||
>>     (new_cred->security != current_cred->security) ||
>>     (new_cred->user_ns != current_cred->user_ns)) {
>> 	return -EPERM;
>> }
>> 
>
> I *really* don't like the idea of being able to use any old file
> descriptor.  I barely care what rights the caller needs to have to
> invoke this -- if you're going to pass an fd that grants a capability
> (in the non-Linux sense of the work), please make sure that the sender
> actually wants that behavior.
>
> IOW, have a syscall to generate a special fd for this purpose.  It's
> only a couple lines of code, and I think we'll really regret it if we
> fsck this up.
>
> (I will take it as a personal challenge to find at least one exploitable
> privilege escalation in this if an arbitrary fd works.)

If you can't switch to a uid or a gid you couldn't switch to otherwise
then the worst that can happen is an information leak.  And information
leaks are rarely directly exploitable.

> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> and faccessat.  That is, current userspace can't see it.  But now you'll
> expose various oddities.  For example, AFAICS a capability-less process
> that's a userns owner can always use setuid.  This will *overwrite*
> real_cred.  Then you're screwed, especially if this happens by
> accident.

And doing in userland what faccessat, and knfsd do in the kernel is
exactly what is desired here.  But maybe there are issues with that.

> That being said, Windows has had functions like this for a long time.
> Processes have a primary token and possibly an impersonation token.  Any
> process can call ImpersonateLoggedOnUser (no privilege required) to
> impersonate the credentials of a token (which is special kind of fd).
> Similarly, any process can call RevertToSelf to undo it.
>
> Is there any actual problem with allowing completely unprivileged tasks
> to switch to one of these magic cred fds?  That would avoid needing a
> "revert" operation.

If the permission model is this switching of credentials doesn't get you
anything you couldn't get some other way.  That would seem to totally
rules out unprivileged processes switching these things.

> Another note: I think that there may be issues if the creator of a token
> has no_new_privs set and the user doesn't.  Imagine a daemon that
> accepts one of these fds, impersonates it, and calls exec.  This could
> be used to escape from no_new_privs land.

Which is why I was suggesting that we don't allow changing any field in
the cred except for uids and gids.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
  2013-10-24  5:59             ` Eric W. Biederman
@ 2013-10-24 19:04                 ` Jim Lieb
  2013-10-24 19:28               ` Andy Lutomirski
  1 sibling, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-24 19:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Al Viro, tytso, viro, linux-fsdevel,
	linux-kernel, bfields, jlayton

On Wednesday, October 23, 2013 22:59:54 Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> > On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>> 
> >>> D'oh
> >>> 
> >>>> I don't think we want file descriptor passing to all of a sudden become
> >>>> a grant of privilege, beyond what the passed fd can do.
> >>> 
> >>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
> >> 
> >> There also appears to need to be a check that we don't gain any
> >> capabilities.
> >> 
> >> We also need a check so that you don't gain any capabilities, and
> >> possibly a few other things.

My original api does not pass any caps however, the code in setting up the new 
creds does mask off a number of them in the effective set, the same set that 
knfsd does.  So all I can do is reduce, not extend.  We need to reduce caps 
because without that, quotas and access get bypassed.  If I do the generic (4 
syscalls), this opens up lots of options for setting things which is not my 
need or intent.

> > 
> > Why?  I like the user_ns part, but I'm not immediately seeing the issue
> > with capabilities.

As for name spaces, we don't have much interest here for two reasons.

1. if the server does run under a different namespace, so be it. The syscall 
does make those translations.  If you want to firewall off the service and its 
exports, fine.

2. in our main use case, nfs-ganesha is running as a pNFS metadata server and 
its uid namespace comes from somewhere else (krb5 etc.) anyway.  The kernel 
etc. it is running on is completely separate, a black box with its own 
nsswitch options, probably local only with only a small set of admin entries.

Just to be clear, if there is any mapping between uids within the ganesha env 
and the local env (an nfs server that is also an app server), all of the above 
applies assuming the nsswitch/idmapper/ipa is set up properly.  The how that 
happens is outside scope at this level.
> 
> My reasoning was instead of making this syscall as generic as possible
> start it out by only allowing the cases Jim cares about and working with
> a model where you can't gain any permissions you couldn't gain
> otherwise.

Right.  That is my intent, to do the same as nfsd_setuser.  We need to 
impersonate the user at the client per the creds we get from the appropriate 
authentication service and to only do it for file access, nothing more.  The 
api I chose allows later extension but only supports what we need now.

I looked at the case of a setuid or seteuid with this and the use case is 
NFS/CIFS/9P/... user space file servers.  That is why I originally chose an 
evil multiplexed call.  I could split it into two or three syscalls but my 
intent in either api is to have a restricted set of creds.  There is a 
(potential) new use case in NFSv4.2 labels where we will also have to shove 
those down as well, adding another variant.  We've got to pass them in somehow 
and the set*uid mess is an example of the other way.  The setuid family used 
to be pretty simple (just two syscalls) but not anymore.  At least this is an 
extensible model that is consistent and returns useful things like EPERM 
rather than an "alternate" uid/gid that must be interpreted to find the error.

I could also restructure the api to have a typed argument.  One early pass had 
a struct that had everything in it, the op, the fd, and the creds but that has 
the pitfall of now freezing the whole api at the syscall entry point. If v4.2 
labels or CIFS SIDs must be passed in future, there would be no alternative 
but yet another syscall.   I do validate the arg.  The unsigned long does pass 
32/64.  The typedefs in the struct match what is used by the other syscalls.  
Our server uses the same typedefs internally so the usual porting issues look 
minimal (and the same as the setfsuid+setfsgid+setgroups it replaces).

I was also trying to cut down on the number of RCU cleanups here.  This is 
what nfsd_setuser already does.  Caching via using an fd as a handle to the 
kernel resource is good and lightweight  but our server under load will have 
to shed "creds" fds before I/O fds (for lock reasons) and that means the 
nfsd_setuser method of setting fsuid, fsgid, altgroups, and restricted caps 
could be the common path under load.  RCU is cheaper but it is not cheap and 
I'd really prefer to not have the fallback make matters worse.

Our use case is a meta-data server part of pNFS server, in particular, pNFS in 
a clustered env.  This means a higher metadata load where these switches have 
real cost.
> 
> Although the fd -1 trick to revert to your other existing cred seems
> reasonable.

In my revised api idea, I would still need the revert especially with multiple 
set* calls that have failure paths.  I may not know what was left so a full 
reset to real is the safest choice in the error path.  It is also cheap(er) in 
the post- file operation return path.

> 
> >> So I suspect we want a check something like:
> >> 
> >> if ((new_cred->securebits != current_cred->securebits)  ||
> >> 
> >>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >>     (new_cred->session_keyring != current_cred->session_keyring) ||
> >>     (new_cred->process_keyring != current_cred->process_keyring) ||
> >>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >>     (new_cred->request_keyring != current_cred->request_keyring) ||
> >>     (new_cred->security != current_cred->security) ||
> >>     (new_cred->user_ns != current_cred->user_ns)) {
> >> 	
> >> 	return -EPERM;
> >> 
> >> }
> > 
> > I *really* don't like the idea of being able to use any old file
> > descriptor.  I barely care what rights the caller needs to have to
> > invoke this -- if you're going to pass an fd that grants a capability
> > (in the non-Linux sense of the work), please make sure that the sender
> > actually wants that behavior.
> > 
> > IOW, have a syscall to generate a special fd for this purpose.  It's
> > only a couple lines of code, and I think we'll really regret it if we
> > fsck this up.
> > 
> > (I will take it as a personal challenge to find at least one exploitable
> > privilege escalation in this if an arbitrary fd works.)
> 
> If you can't switch to a uid or a gid you couldn't switch to otherwise
> then the worst that can happen is an information leak.  And information
> leaks are rarely directly exploitable.
> 
> > Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> > and faccessat.  That is, current userspace can't see it.  But now you'll
> > expose various oddities.  For example, AFAICS a capability-less process
> > that's a userns owner can always use setuid.  This will *overwrite*
> > real_cred.  Then you're screwed, especially if this happens by
> > accident.
> 
> And doing in userland what faccessat, and knfsd do in the kernel is
> exactly what is desired here.  But maybe there are issues with that.
> 
> > That being said, Windows has had functions like this for a long time.
> > Processes have a primary token and possibly an impersonation token.  Any
> > process can call ImpersonateLoggedOnUser (no privilege required) to
> > impersonate the credentials of a token (which is special kind of fd).
> > Similarly, any process can call RevertToSelf to undo it.
> > 
> > Is there any actual problem with allowing completely unprivileged tasks
> > to switch to one of these magic cred fds?  That would avoid needing a
> > "revert" operation.
> 
> If the permission model is this switching of credentials doesn't get you
> anything you couldn't get some other way.  That would seem to totally
> rules out unprivileged processes switching these things.
> 
> > Another note: I think that there may be issues if the creator of a token
> > has no_new_privs set and the user doesn't.  Imagine a daemon that
> > accepts one of these fds, impersonates it, and calls exec.  This could
> > be used to escape from no_new_privs land.
> 
> Which is why I was suggesting that we don't allow changing any field in
> the cred except for uids and gids.

Yes.  Which is why in my original patch I pass a user_creds structure that 
only has the fsuid, fsgid, and altgroups.  This means that the caller can 
*only* impersonate these particular creds for purposes of file access, not priv 
escalation.

As for magic fds, I didn't have but should add to the perms check a test to 
validate that the passed fd is one of our "magic" ones.    The other users of 
anon fds do a similar test to verify that the fd is one that they gave you. 
This check would prevent the using of a random open of /dev/null with unknown 
caps/creds because it would throw an error, preventing an escalation.  This fd 
is also opened with O_CLOEXEC so children don't get it.   Our use case doesn't 
need to fork/exec and if we needed to, we wouldn't be using switch_creds for 
it anyway.  Adding Close-on-exec removes the exploit opportunity.

Jim

> 
> Eric

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds:  Syscall to switch creds for file server ops
@ 2013-10-24 19:04                 ` Jim Lieb
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-24 19:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Al Viro, tytso, viro, linux-fsdevel,
	linux-kernel, bfields, jlayton

On Wednesday, October 23, 2013 22:59:54 Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> > On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>> 
> >>> D'oh
> >>> 
> >>>> I don't think we want file descriptor passing to all of a sudden become
> >>>> a grant of privilege, beyond what the passed fd can do.
> >>> 
> >>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
> >> 
> >> There also appears to need to be a check that we don't gain any
> >> capabilities.
> >> 
> >> We also need a check so that you don't gain any capabilities, and
> >> possibly a few other things.

My original api does not pass any caps however, the code in setting up the new 
creds does mask off a number of them in the effective set, the same set that 
knfsd does.  So all I can do is reduce, not extend.  We need to reduce caps 
because without that, quotas and access get bypassed.  If I do the generic (4 
syscalls), this opens up lots of options for setting things which is not my 
need or intent.

> > 
> > Why?  I like the user_ns part, but I'm not immediately seeing the issue
> > with capabilities.

As for name spaces, we don't have much interest here for two reasons.

1. if the server does run under a different namespace, so be it. The syscall 
does make those translations.  If you want to firewall off the service and its 
exports, fine.

2. in our main use case, nfs-ganesha is running as a pNFS metadata server and 
its uid namespace comes from somewhere else (krb5 etc.) anyway.  The kernel 
etc. it is running on is completely separate, a black box with its own 
nsswitch options, probably local only with only a small set of admin entries.

Just to be clear, if there is any mapping between uids within the ganesha env 
and the local env (an nfs server that is also an app server), all of the above 
applies assuming the nsswitch/idmapper/ipa is set up properly.  The how that 
happens is outside scope at this level.
> 
> My reasoning was instead of making this syscall as generic as possible
> start it out by only allowing the cases Jim cares about and working with
> a model where you can't gain any permissions you couldn't gain
> otherwise.

Right.  That is my intent, to do the same as nfsd_setuser.  We need to 
impersonate the user at the client per the creds we get from the appropriate 
authentication service and to only do it for file access, nothing more.  The 
api I chose allows later extension but only supports what we need now.

I looked at the case of a setuid or seteuid with this and the use case is 
NFS/CIFS/9P/... user space file servers.  That is why I originally chose an 
evil multiplexed call.  I could split it into two or three syscalls but my 
intent in either api is to have a restricted set of creds.  There is a 
(potential) new use case in NFSv4.2 labels where we will also have to shove 
those down as well, adding another variant.  We've got to pass them in somehow 
and the set*uid mess is an example of the other way.  The setuid family used 
to be pretty simple (just two syscalls) but not anymore.  At least this is an 
extensible model that is consistent and returns useful things like EPERM 
rather than an "alternate" uid/gid that must be interpreted to find the error.

I could also restructure the api to have a typed argument.  One early pass had 
a struct that had everything in it, the op, the fd, and the creds but that has 
the pitfall of now freezing the whole api at the syscall entry point. If v4.2 
labels or CIFS SIDs must be passed in future, there would be no alternative 
but yet another syscall.   I do validate the arg.  The unsigned long does pass 
32/64.  The typedefs in the struct match what is used by the other syscalls.  
Our server uses the same typedefs internally so the usual porting issues look 
minimal (and the same as the setfsuid+setfsgid+setgroups it replaces).

I was also trying to cut down on the number of RCU cleanups here.  This is 
what nfsd_setuser already does.  Caching via using an fd as a handle to the 
kernel resource is good and lightweight  but our server under load will have 
to shed "creds" fds before I/O fds (for lock reasons) and that means the 
nfsd_setuser method of setting fsuid, fsgid, altgroups, and restricted caps 
could be the common path under load.  RCU is cheaper but it is not cheap and 
I'd really prefer to not have the fallback make matters worse.

Our use case is a meta-data server part of pNFS server, in particular, pNFS in 
a clustered env.  This means a higher metadata load where these switches have 
real cost.
> 
> Although the fd -1 trick to revert to your other existing cred seems
> reasonable.

In my revised api idea, I would still need the revert especially with multiple 
set* calls that have failure paths.  I may not know what was left so a full 
reset to real is the safest choice in the error path.  It is also cheap(er) in 
the post- file operation return path.

> 
> >> So I suspect we want a check something like:
> >> 
> >> if ((new_cred->securebits != current_cred->securebits)  ||
> >> 
> >>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >>     (new_cred->session_keyring != current_cred->session_keyring) ||
> >>     (new_cred->process_keyring != current_cred->process_keyring) ||
> >>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >>     (new_cred->request_keyring != current_cred->request_keyring) ||
> >>     (new_cred->security != current_cred->security) ||
> >>     (new_cred->user_ns != current_cred->user_ns)) {
> >> 	
> >> 	return -EPERM;
> >> 
> >> }
> > 
> > I *really* don't like the idea of being able to use any old file
> > descriptor.  I barely care what rights the caller needs to have to
> > invoke this -- if you're going to pass an fd that grants a capability
> > (in the non-Linux sense of the work), please make sure that the sender
> > actually wants that behavior.
> > 
> > IOW, have a syscall to generate a special fd for this purpose.  It's
> > only a couple lines of code, and I think we'll really regret it if we
> > fsck this up.
> > 
> > (I will take it as a personal challenge to find at least one exploitable
> > privilege escalation in this if an arbitrary fd works.)
> 
> If you can't switch to a uid or a gid you couldn't switch to otherwise
> then the worst that can happen is an information leak.  And information
> leaks are rarely directly exploitable.
> 
> > Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> > and faccessat.  That is, current userspace can't see it.  But now you'll
> > expose various oddities.  For example, AFAICS a capability-less process
> > that's a userns owner can always use setuid.  This will *overwrite*
> > real_cred.  Then you're screwed, especially if this happens by
> > accident.
> 
> And doing in userland what faccessat, and knfsd do in the kernel is
> exactly what is desired here.  But maybe there are issues with that.
> 
> > That being said, Windows has had functions like this for a long time.
> > Processes have a primary token and possibly an impersonation token.  Any
> > process can call ImpersonateLoggedOnUser (no privilege required) to
> > impersonate the credentials of a token (which is special kind of fd).
> > Similarly, any process can call RevertToSelf to undo it.
> > 
> > Is there any actual problem with allowing completely unprivileged tasks
> > to switch to one of these magic cred fds?  That would avoid needing a
> > "revert" operation.
> 
> If the permission model is this switching of credentials doesn't get you
> anything you couldn't get some other way.  That would seem to totally
> rules out unprivileged processes switching these things.
> 
> > Another note: I think that there may be issues if the creator of a token
> > has no_new_privs set and the user doesn't.  Imagine a daemon that
> > accepts one of these fds, impersonates it, and calls exec.  This could
> > be used to escape from no_new_privs land.
> 
> Which is why I was suggesting that we don't allow changing any field in
> the cred except for uids and gids.

Yes.  Which is why in my original patch I pass a user_creds structure that 
only has the fsuid, fsgid, and altgroups.  This means that the caller can 
*only* impersonate these particular creds for purposes of file access, not priv 
escalation.

As for magic fds, I didn't have but should add to the perms check a test to 
validate that the passed fd is one of our "magic" ones.    The other users of 
anon fds do a similar test to verify that the fd is one that they gave you. 
This check would prevent the using of a random open of /dev/null with unknown 
caps/creds because it would throw an error, preventing an escalation.  This fd 
is also opened with O_CLOEXEC so children don't get it.   Our use case doesn't 
need to fork/exec and if we needed to, we wouldn't be using switch_creds for 
it anyway.  Adding Close-on-exec removes the exploit opportunity.

Jim

> 
> Eric

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-24  5:59             ` Eric W. Biederman
  2013-10-24 19:04                 ` Jim Lieb
@ 2013-10-24 19:28               ` Andy Lutomirski
  2013-10-24 20:24                   ` Jim Lieb
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2013-10-24 19:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Al Viro, Jim Lieb, Theodore Ts'o, Al Viro, Linux FS Devel,
	linux-kernel, bfields, Jeff Layton

On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>>> Al Viro <viro@ZenIV.linux.org.uk> writes:
>>>
>>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>>>>
>>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>>>>> capable(CAP_SETGID) or possibly something a little more refined.
>>>>
>>>> D'oh
>>>>
>>>>> I don't think we want file descriptor passing to all of a sudden become
>>>>> a grant of privilege, beyond what the passed fd can do.
>>>>
>>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
>>>
>>> There also appears to need to be a check that we don't gain any
>>> capabilities.
>>>
>>> We also need a check so that you don't gain any capabilities, and
>>> possibly a few other things.
>>
>> Why?  I like the user_ns part, but I'm not immediately seeing the issue
>> with capabilities.
>
> My reasoning was instead of making this syscall as generic as possible
> start it out by only allowing the cases Jim cares about and working with
> a model where you can't gain any permissions you couldn't gain
> otherwise.
>
> Although the fd -1 trick to revert to your other existing cred seems
> reasonable.
>
>>> So I suspect we want a check something like:
>>>
>>> if ((new_cred->securebits != current_cred->securebits)  ||
>>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>>>     (new_cred->cap_effective != current_cred->cap_effective) ||
>>>     (new_cred->cap_bset != current_cred->cap_bset) ||
>>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>>>     (new_cred->session_keyring != current_cred->session_keyring) ||
>>>     (new_cred->process_keyring != current_cred->process_keyring) ||
>>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>>>     (new_cred->request_keyring != current_cred->request_keyring) ||
>>>     (new_cred->security != current_cred->security) ||
>>>     (new_cred->user_ns != current_cred->user_ns)) {
>>>      return -EPERM;
>>> }
>>>
>>
>> I *really* don't like the idea of being able to use any old file
>> descriptor.  I barely care what rights the caller needs to have to
>> invoke this -- if you're going to pass an fd that grants a capability
>> (in the non-Linux sense of the work), please make sure that the sender
>> actually wants that behavior.
>>
>> IOW, have a syscall to generate a special fd for this purpose.  It's
>> only a couple lines of code, and I think we'll really regret it if we
>> fsck this up.
>>
>> (I will take it as a personal challenge to find at least one exploitable
>> privilege escalation in this if an arbitrary fd works.)
>
> If you can't switch to a uid or a gid you couldn't switch to otherwise
> then the worst that can happen is an information leak.  And information
> leaks are rarely directly exploitable.

Here's the attack:

Suppose there's a daemon that uses this in conjunction with
SCM_RIGHTS.  The daemon is effectively root (under the current
proposal, it has to be).  So a client connects, sends a credential fd,
and the daemon impersonates those credentials.

Now a malicious user obtains *any* fd opened by root.  It sends that
fd to the daemon.  The daemon then impersonates root.  We lose.  (It
can't possibly be hard to obtain an fd with highly privileged f_cred
-- I bet that most console programs have stdin like that, for example.
 There are probably many setuid programs that will happily open
/dev/null for you, too.)

>
>> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
>> and faccessat.  That is, current userspace can't see it.  But now you'll
>> expose various oddities.  For example, AFAICS a capability-less process
>> that's a userns owner can always use setuid.  This will *overwrite*
>> real_cred.  Then you're screwed, especially if this happens by
>> accident.
>
> And doing in userland what faccessat, and knfsd do in the kernel is
> exactly what is desired here.  But maybe there are issues with that.
>
>> That being said, Windows has had functions like this for a long time.
>> Processes have a primary token and possibly an impersonation token.  Any
>> process can call ImpersonateLoggedOnUser (no privilege required) to
>> impersonate the credentials of a token (which is special kind of fd).
>> Similarly, any process can call RevertToSelf to undo it.
>>
>> Is there any actual problem with allowing completely unprivileged tasks
>> to switch to one of these magic cred fds?  That would avoid needing a
>> "revert" operation.
>
> If the permission model is this switching of credentials doesn't get you
> anything you couldn't get some other way.  That would seem to totally
> rules out unprivileged processes switching these things.

IMO, there are two reasonable models that involve fds carrying some
kind of credential.

1. The fd actually carries the ability to use the credentials.  You
need to be very careful whom you send these to.  The security
implications are obvious (which is good) and the receiver doesn't need
privilege (which is good, as long as the receiver is careful).

2. The fd is for identification only.  But this means that the fd
carries the ability to identify as a user.  So you *still* have to be
very careful about whom you send it to.  What you need is an operation
that allows you to identify using the fd without transitively granting
the recipient the same ability.  On networks, this is done by signing
some kind of challenge.  The kernel could work the same way, or there
could be a new CMSG_IDENTITY that you need an identity fd to send but
that does not copy that fd to the recipient.

>
>> Another note: I think that there may be issues if the creator of a token
>> has no_new_privs set and the user doesn't.  Imagine a daemon that
>> accepts one of these fds, impersonates it, and calls exec.  This could
>> be used to escape from no_new_privs land.
>
> Which is why I was suggesting that we don't allow changing any field in
> the cred except for uids and gids.

If the daemon impersonates and execs, we still lose.


--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-24 19:28               ` Andy Lutomirski
@ 2013-10-24 20:24                   ` Jim Lieb
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-24 20:24 UTC (permalink / raw)
  To: Andy Lutomirski, Linux FS Devel
  Cc: Eric W. Biederman, Al Viro, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> 
> <ebiederm@xmission.com> wrote:
> > Andy Lutomirski <luto@amacapital.net> writes:
> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>>> 
> >>>> D'oh
> >>>> 
> >>>>> I don't think we want file descriptor passing to all of a sudden
> >>>>> become
> >>>>> a grant of privilege, beyond what the passed fd can do.
> >>>> 
> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
> >>> 
> >>> There also appears to need to be a check that we don't gain any
> >>> capabilities.
> >>> 
> >>> We also need a check so that you don't gain any capabilities, and
> >>> possibly a few other things.
> >> 
> >> Why?  I like the user_ns part, but I'm not immediately seeing the issue
> >> with capabilities.
> > 
> > My reasoning was instead of making this syscall as generic as possible
> > start it out by only allowing the cases Jim cares about and working with
> > a model where you can't gain any permissions you couldn't gain
> > otherwise.
> > 
> > Although the fd -1 trick to revert to your other existing cred seems
> > reasonable.
> > 
> >>> So I suspect we want a check something like:
> >>> 
> >>> if ((new_cred->securebits != current_cred->securebits)  ||
> >>> 
> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
> >>>     (new_cred->security != current_cred->security) ||
> >>>     (new_cred->user_ns != current_cred->user_ns)) {
> >>>     
> >>>      return -EPERM;
> >>> 
> >>> }
> >> 
> >> I *really* don't like the idea of being able to use any old file
> >> descriptor.  I barely care what rights the caller needs to have to
> >> invoke this -- if you're going to pass an fd that grants a capability
> >> (in the non-Linux sense of the work), please make sure that the sender
> >> actually wants that behavior.
> >> 
> >> IOW, have a syscall to generate a special fd for this purpose.  It's
> >> only a couple lines of code, and I think we'll really regret it if we
> >> fsck this up.
> >> 
> >> (I will take it as a personal challenge to find at least one exploitable
> >> privilege escalation in this if an arbitrary fd works.)
> > 
> > If you can't switch to a uid or a gid you couldn't switch to otherwise
> > then the worst that can happen is an information leak.  And information
> > leaks are rarely directly exploitable.
> 
> Here's the attack:
> 
> Suppose there's a daemon that uses this in conjunction with
> SCM_RIGHTS.  The daemon is effectively root (under the current
> proposal, it has to be).  So a client connects, sends a credential fd,
> and the daemon impersonates those credentials.
> 
> Now a malicious user obtains *any* fd opened by root.  It sends that
> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
> can't possibly be hard to obtain an fd with highly privileged f_cred
> -- I bet that most console programs have stdin like that, for example.
>  There are probably many setuid programs that will happily open
> /dev/null for you, too.)

In my reply to Eric, I note that I need to add a check that the fd passed is 
one from switch_creds. With that test, not any fd will do and the one that 
does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd 
set) caps.  They can do no more damage than what the original switch_creds 
allowed.  The any fd by root no longer applies so use doesn't get much (no 
escalation).

> 
> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> >> and faccessat.  That is, current userspace can't see it.  But now you'll
> >> expose various oddities.  For example, AFAICS a capability-less process
> >> that's a userns owner can always use setuid.  This will *overwrite*
> >> real_cred.  Then you're screwed, especially if this happens by
> >> accident.
> > 
> > And doing in userland what faccessat, and knfsd do in the kernel is
> > exactly what is desired here.  But maybe there are issues with that.
> > 
> >> That being said, Windows has had functions like this for a long time.
> >> Processes have a primary token and possibly an impersonation token.  Any
> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> impersonate the credentials of a token (which is special kind of fd).
> >> Similarly, any process can call RevertToSelf to undo it.
> >> 
> >> Is there any actual problem with allowing completely unprivileged tasks
> >> to switch to one of these magic cred fds?  That would avoid needing a
> >> "revert" operation.
> > 
> > If the permission model is this switching of credentials doesn't get you
> > anything you couldn't get some other way.  That would seem to totally
> > rules out unprivileged processes switching these things.
> 
> IMO, there are two reasonable models that involve fds carrying some
> kind of credential.
> 
> 1. The fd actually carries the ability to use the credentials.  You
> need to be very careful whom you send these to.  The security
> implications are obvious (which is good) and the receiver doesn't need
> privilege (which is good, as long as the receiver is careful).

I test for caps.  I think using switch_creds in all forms should require 
privs.  I thought of the revert case and although it does simply return to 
"real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This means, 
of course, if you have really messed up things, you may not be able to get 
back home which, although a bad thing for the process, is a good thing for the 
system as a whole.

> 
> 2. The fd is for identification only.  But this means that the fd
> carries the ability to identify as a user.  So you *still* have to be
> very careful about whom you send it to.  What you need is an operation
> that allows you to identify using the fd without transitively granting
> the recipient the same ability.  On networks, this is done by signing
> some kind of challenge.  The kernel could work the same way, or there
> could be a new CMSG_IDENTITY that you need an identity fd to send but
> that does not copy that fd to the recipient.

I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets 
which means that the switch_creds fd test still applies here.  It is 
identification but only for within the same kernel.  As for namespaces, the 
translation was done when the creds fd was created.  I suppose if it was 
passed across namespace boundaries there could be a problem but what is in the 
creds structure is the translated fduid,fsgid etc., not the untranslated one.  
We are still doing access checks and quota stuff with the translated creds.  If 
one namespace has "fred" as uid 52 and another has "mary" as 52, the quota 
will only be credited to whoever "fred" really is only if "fred" gets to be 
"mary" in the second alternate universe...
> 
> >> Another note: I think that there may be issues if the creator of a token
> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
> >> accepts one of these fds, impersonates it, and calls exec.  This could
> >> be used to escape from no_new_privs land.
> > 
> > Which is why I was suggesting that we don't allow changing any field in
> > the cred except for uids and gids.
> 
> If the daemon impersonates and execs, we still lose.

I answered this.  You only get to impersonate for purposes of file access with 
reduced caps to prevent you from being root as well.  Also, since these are 
O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is 
gone post-exec.

> 
> 
> --Andy

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
@ 2013-10-24 20:24                   ` Jim Lieb
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-10-24 20:24 UTC (permalink / raw)
  To: Andy Lutomirski, Linux FS Devel
  Cc: Eric W. Biederman, Al Viro, Theodore Ts'o, linux-kernel,
	bfields, Jeff Layton

On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> 
> <ebiederm@xmission.com> wrote:
> > Andy Lutomirski <luto@amacapital.net> writes:
> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >>>> 
> >>>> D'oh
> >>>> 
> >>>>> I don't think we want file descriptor passing to all of a sudden
> >>>>> become
> >>>>> a grant of privilege, beyond what the passed fd can do.
> >>>> 
> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
> >>> 
> >>> There also appears to need to be a check that we don't gain any
> >>> capabilities.
> >>> 
> >>> We also need a check so that you don't gain any capabilities, and
> >>> possibly a few other things.
> >> 
> >> Why?  I like the user_ns part, but I'm not immediately seeing the issue
> >> with capabilities.
> > 
> > My reasoning was instead of making this syscall as generic as possible
> > start it out by only allowing the cases Jim cares about and working with
> > a model where you can't gain any permissions you couldn't gain
> > otherwise.
> > 
> > Although the fd -1 trick to revert to your other existing cred seems
> > reasonable.
> > 
> >>> So I suspect we want a check something like:
> >>> 
> >>> if ((new_cred->securebits != current_cred->securebits)  ||
> >>> 
> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
> >>>     (new_cred->security != current_cred->security) ||
> >>>     (new_cred->user_ns != current_cred->user_ns)) {
> >>>     
> >>>      return -EPERM;
> >>> 
> >>> }
> >> 
> >> I *really* don't like the idea of being able to use any old file
> >> descriptor.  I barely care what rights the caller needs to have to
> >> invoke this -- if you're going to pass an fd that grants a capability
> >> (in the non-Linux sense of the work), please make sure that the sender
> >> actually wants that behavior.
> >> 
> >> IOW, have a syscall to generate a special fd for this purpose.  It's
> >> only a couple lines of code, and I think we'll really regret it if we
> >> fsck this up.
> >> 
> >> (I will take it as a personal challenge to find at least one exploitable
> >> privilege escalation in this if an arbitrary fd works.)
> > 
> > If you can't switch to a uid or a gid you couldn't switch to otherwise
> > then the worst that can happen is an information leak.  And information
> > leaks are rarely directly exploitable.
> 
> Here's the attack:
> 
> Suppose there's a daemon that uses this in conjunction with
> SCM_RIGHTS.  The daemon is effectively root (under the current
> proposal, it has to be).  So a client connects, sends a credential fd,
> and the daemon impersonates those credentials.
> 
> Now a malicious user obtains *any* fd opened by root.  It sends that
> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
> can't possibly be hard to obtain an fd with highly privileged f_cred
> -- I bet that most console programs have stdin like that, for example.
>  There are probably many setuid programs that will happily open
> /dev/null for you, too.)

In my reply to Eric, I note that I need to add a check that the fd passed is 
one from switch_creds. With that test, not any fd will do and the one that 
does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd 
set) caps.  They can do no more damage than what the original switch_creds 
allowed.  The any fd by root no longer applies so use doesn't get much (no 
escalation).

> 
> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> >> and faccessat.  That is, current userspace can't see it.  But now you'll
> >> expose various oddities.  For example, AFAICS a capability-less process
> >> that's a userns owner can always use setuid.  This will *overwrite*
> >> real_cred.  Then you're screwed, especially if this happens by
> >> accident.
> > 
> > And doing in userland what faccessat, and knfsd do in the kernel is
> > exactly what is desired here.  But maybe there are issues with that.
> > 
> >> That being said, Windows has had functions like this for a long time.
> >> Processes have a primary token and possibly an impersonation token.  Any
> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> impersonate the credentials of a token (which is special kind of fd).
> >> Similarly, any process can call RevertToSelf to undo it.
> >> 
> >> Is there any actual problem with allowing completely unprivileged tasks
> >> to switch to one of these magic cred fds?  That would avoid needing a
> >> "revert" operation.
> > 
> > If the permission model is this switching of credentials doesn't get you
> > anything you couldn't get some other way.  That would seem to totally
> > rules out unprivileged processes switching these things.
> 
> IMO, there are two reasonable models that involve fds carrying some
> kind of credential.
> 
> 1. The fd actually carries the ability to use the credentials.  You
> need to be very careful whom you send these to.  The security
> implications are obvious (which is good) and the receiver doesn't need
> privilege (which is good, as long as the receiver is careful).

I test for caps.  I think using switch_creds in all forms should require 
privs.  I thought of the revert case and although it does simply return to 
"real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This means, 
of course, if you have really messed up things, you may not be able to get 
back home which, although a bad thing for the process, is a good thing for the 
system as a whole.

> 
> 2. The fd is for identification only.  But this means that the fd
> carries the ability to identify as a user.  So you *still* have to be
> very careful about whom you send it to.  What you need is an operation
> that allows you to identify using the fd without transitively granting
> the recipient the same ability.  On networks, this is done by signing
> some kind of challenge.  The kernel could work the same way, or there
> could be a new CMSG_IDENTITY that you need an identity fd to send but
> that does not copy that fd to the recipient.

I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets 
which means that the switch_creds fd test still applies here.  It is 
identification but only for within the same kernel.  As for namespaces, the 
translation was done when the creds fd was created.  I suppose if it was 
passed across namespace boundaries there could be a problem but what is in the 
creds structure is the translated fduid,fsgid etc., not the untranslated one.  
We are still doing access checks and quota stuff with the translated creds.  If 
one namespace has "fred" as uid 52 and another has "mary" as 52, the quota 
will only be credited to whoever "fred" really is only if "fred" gets to be 
"mary" in the second alternate universe...
> 
> >> Another note: I think that there may be issues if the creator of a token
> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
> >> accepts one of these fds, impersonates it, and calls exec.  This could
> >> be used to escape from no_new_privs land.
> > 
> > Which is why I was suggesting that we don't allow changing any field in
> > the cred except for uids and gids.
> 
> If the daemon impersonates and execs, we still lose.

I answered this.  You only get to impersonate for purposes of file access with 
reduced caps to prevent you from being root as well.  Also, since these are 
O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is 
gone post-exec.

> 
> 
> --Andy

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-24 20:24                   ` Jim Lieb
@ 2013-10-31 19:09                     ` Andy Lutomirski
  -1 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2013-10-31 19:09 UTC (permalink / raw)
  To: Jim Lieb
  Cc: Linux FS Devel, Eric W. Biederman, Al Viro, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <jlieb@panasas.com> wrote:
> On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
>> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>>
>> <ebiederm@xmission.com> wrote:
>> > Andy Lutomirski <luto@amacapital.net> writes:
>> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
>> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
>> >>>>
>> >>>> D'oh
>> >>>>
>> >>>>> I don't think we want file descriptor passing to all of a sudden
>> >>>>> become
>> >>>>> a grant of privilege, beyond what the passed fd can do.
>> >>>>
>> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
>> >>>
>> >>> There also appears to need to be a check that we don't gain any
>> >>> capabilities.
>> >>>
>> >>> We also need a check so that you don't gain any capabilities, and
>> >>> possibly a few other things.
>> >>
>> >> Why?  I like the user_ns part, but I'm not immediately seeing the issue
>> >> with capabilities.
>> >
>> > My reasoning was instead of making this syscall as generic as possible
>> > start it out by only allowing the cases Jim cares about and working with
>> > a model where you can't gain any permissions you couldn't gain
>> > otherwise.
>> >
>> > Although the fd -1 trick to revert to your other existing cred seems
>> > reasonable.
>> >
>> >>> So I suspect we want a check something like:
>> >>>
>> >>> if ((new_cred->securebits != current_cred->securebits)  ||
>> >>>
>> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
>> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
>> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
>> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
>> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
>> >>>     (new_cred->security != current_cred->security) ||
>> >>>     (new_cred->user_ns != current_cred->user_ns)) {
>> >>>
>> >>>      return -EPERM;
>> >>>
>> >>> }
>> >>
>> >> I *really* don't like the idea of being able to use any old file
>> >> descriptor.  I barely care what rights the caller needs to have to
>> >> invoke this -- if you're going to pass an fd that grants a capability
>> >> (in the non-Linux sense of the work), please make sure that the sender
>> >> actually wants that behavior.
>> >>
>> >> IOW, have a syscall to generate a special fd for this purpose.  It's
>> >> only a couple lines of code, and I think we'll really regret it if we
>> >> fsck this up.
>> >>
>> >> (I will take it as a personal challenge to find at least one exploitable
>> >> privilege escalation in this if an arbitrary fd works.)
>> >
>> > If you can't switch to a uid or a gid you couldn't switch to otherwise
>> > then the worst that can happen is an information leak.  And information
>> > leaks are rarely directly exploitable.
>>
>> Here's the attack:
>>
>> Suppose there's a daemon that uses this in conjunction with
>> SCM_RIGHTS.  The daemon is effectively root (under the current
>> proposal, it has to be).  So a client connects, sends a credential fd,
>> and the daemon impersonates those credentials.
>>
>> Now a malicious user obtains *any* fd opened by root.  It sends that
>> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
>> can't possibly be hard to obtain an fd with highly privileged f_cred
>> -- I bet that most console programs have stdin like that, for example.
>>  There are probably many setuid programs that will happily open
>> /dev/null for you, too.)
>
> In my reply to Eric, I note that I need to add a check that the fd passed is
> one from switch_creds. With that test, not any fd will do and the one that
> does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd
> set) caps.  They can do no more damage than what the original switch_creds
> allowed.  The any fd by root no longer applies so use doesn't get much (no
> escalation).
>
>>
>> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
>> >> and faccessat.  That is, current userspace can't see it.  But now you'll
>> >> expose various oddities.  For example, AFAICS a capability-less process
>> >> that's a userns owner can always use setuid.  This will *overwrite*
>> >> real_cred.  Then you're screwed, especially if this happens by
>> >> accident.
>> >
>> > And doing in userland what faccessat, and knfsd do in the kernel is
>> > exactly what is desired here.  But maybe there are issues with that.
>> >
>> >> That being said, Windows has had functions like this for a long time.
>> >> Processes have a primary token and possibly an impersonation token.  Any
>> >> process can call ImpersonateLoggedOnUser (no privilege required) to
>> >> impersonate the credentials of a token (which is special kind of fd).
>> >> Similarly, any process can call RevertToSelf to undo it.
>> >>
>> >> Is there any actual problem with allowing completely unprivileged tasks
>> >> to switch to one of these magic cred fds?  That would avoid needing a
>> >> "revert" operation.
>> >
>> > If the permission model is this switching of credentials doesn't get you
>> > anything you couldn't get some other way.  That would seem to totally
>> > rules out unprivileged processes switching these things.
>>
>> IMO, there are two reasonable models that involve fds carrying some
>> kind of credential.
>>
>> 1. The fd actually carries the ability to use the credentials.  You
>> need to be very careful whom you send these to.  The security
>> implications are obvious (which is good) and the receiver doesn't need
>> privilege (which is good, as long as the receiver is careful).
>
> I test for caps.  I think using switch_creds in all forms should require
> privs.  I thought of the revert case and although it does simply return to
> "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This means,
> of course, if you have really messed up things, you may not be able to get
> back home which, although a bad thing for the process, is a good thing for the
> system as a whole.
>
>>
>> 2. The fd is for identification only.  But this means that the fd
>> carries the ability to identify as a user.  So you *still* have to be
>> very careful about whom you send it to.  What you need is an operation
>> that allows you to identify using the fd without transitively granting
>> the recipient the same ability.  On networks, this is done by signing
>> some kind of challenge.  The kernel could work the same way, or there
>> could be a new CMSG_IDENTITY that you need an identity fd to send but
>> that does not copy that fd to the recipient.
>
> I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets
> which means that the switch_creds fd test still applies here.  It is
> identification but only for within the same kernel.  As for namespaces, the
> translation was done when the creds fd was created.  I suppose if it was
> passed across namespace boundaries there could be a problem but what is in the
> creds structure is the translated fduid,fsgid etc., not the untranslated one.
> We are still doing access checks and quota stuff with the translated creds.  If
> one namespace has "fred" as uid 52 and another has "mary" as 52, the quota
> will only be credited to whoever "fred" really is only if "fred" gets to be
> "mary" in the second alternate universe...

I'm not talking about namespaces here -- I think we're not really on
the same page.  Can you describe what you're envisioning doing with
these fds?

>>
>> >> Another note: I think that there may be issues if the creator of a token
>> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
>> >> accepts one of these fds, impersonates it, and calls exec.  This could
>> >> be used to escape from no_new_privs land.
>> >
>> > Which is why I was suggesting that we don't allow changing any field in
>> > the cred except for uids and gids.
>>
>> If the daemon impersonates and execs, we still lose.
>
> I answered this.  You only get to impersonate for purposes of file access with
> reduced caps to prevent you from being root as well.  Also, since these are
> O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is
> gone post-exec.

If a no_new_privs process authenticates to a daemon and that daemon
execs, then there's now a dumpable, ptraceable, non-no-new-privs
process running as the uid/gid of the no_new_privs process.  This may
be dangerous.

>
>>
>>
>> --Andy
>
> --
> Jim Lieb
> Linux Systems Engineer
> Panasas Inc.
>
> "If ease of use was the only requirement, we would all be riding tricycles"
> - Douglas Engelbart 1925–2013



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
@ 2013-10-31 19:09                     ` Andy Lutomirski
  0 siblings, 0 replies; 26+ messages in thread
From: Andy Lutomirski @ 2013-10-31 19:09 UTC (permalink / raw)
  To: Jim Lieb
  Cc: Linux FS Devel, Eric W. Biederman, Al Viro, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <jlieb@panasas.com> wrote:
> On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
>> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>>
>> <ebiederm@xmission.com> wrote:
>> > Andy Lutomirski <luto@amacapital.net> writes:
>> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
>> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
>> >>>>
>> >>>> D'oh
>> >>>>
>> >>>>> I don't think we want file descriptor passing to all of a sudden
>> >>>>> become
>> >>>>> a grant of privilege, beyond what the passed fd can do.
>> >>>>
>> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt either...
>> >>>
>> >>> There also appears to need to be a check that we don't gain any
>> >>> capabilities.
>> >>>
>> >>> We also need a check so that you don't gain any capabilities, and
>> >>> possibly a few other things.
>> >>
>> >> Why?  I like the user_ns part, but I'm not immediately seeing the issue
>> >> with capabilities.
>> >
>> > My reasoning was instead of making this syscall as generic as possible
>> > start it out by only allowing the cases Jim cares about and working with
>> > a model where you can't gain any permissions you couldn't gain
>> > otherwise.
>> >
>> > Although the fd -1 trick to revert to your other existing cred seems
>> > reasonable.
>> >
>> >>> So I suspect we want a check something like:
>> >>>
>> >>> if ((new_cred->securebits != current_cred->securebits)  ||
>> >>>
>> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
>> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
>> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
>> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
>> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
>> >>>     (new_cred->security != current_cred->security) ||
>> >>>     (new_cred->user_ns != current_cred->user_ns)) {
>> >>>
>> >>>      return -EPERM;
>> >>>
>> >>> }
>> >>
>> >> I *really* don't like the idea of being able to use any old file
>> >> descriptor.  I barely care what rights the caller needs to have to
>> >> invoke this -- if you're going to pass an fd that grants a capability
>> >> (in the non-Linux sense of the work), please make sure that the sender
>> >> actually wants that behavior.
>> >>
>> >> IOW, have a syscall to generate a special fd for this purpose.  It's
>> >> only a couple lines of code, and I think we'll really regret it if we
>> >> fsck this up.
>> >>
>> >> (I will take it as a personal challenge to find at least one exploitable
>> >> privilege escalation in this if an arbitrary fd works.)
>> >
>> > If you can't switch to a uid or a gid you couldn't switch to otherwise
>> > then the worst that can happen is an information leak.  And information
>> > leaks are rarely directly exploitable.
>>
>> Here's the attack:
>>
>> Suppose there's a daemon that uses this in conjunction with
>> SCM_RIGHTS.  The daemon is effectively root (under the current
>> proposal, it has to be).  So a client connects, sends a credential fd,
>> and the daemon impersonates those credentials.
>>
>> Now a malicious user obtains *any* fd opened by root.  It sends that
>> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
>> can't possibly be hard to obtain an fd with highly privileged f_cred
>> -- I bet that most console programs have stdin like that, for example.
>>  There are probably many setuid programs that will happily open
>> /dev/null for you, too.)
>
> In my reply to Eric, I note that I need to add a check that the fd passed is
> one from switch_creds. With that test, not any fd will do and the one that
> does has only been able to set fsuid, fsgid, altgroups, and reduced (the nfsd
> set) caps.  They can do no more damage than what the original switch_creds
> allowed.  The any fd by root no longer applies so use doesn't get much (no
> escalation).
>
>>
>> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
>> >> and faccessat.  That is, current userspace can't see it.  But now you'll
>> >> expose various oddities.  For example, AFAICS a capability-less process
>> >> that's a userns owner can always use setuid.  This will *overwrite*
>> >> real_cred.  Then you're screwed, especially if this happens by
>> >> accident.
>> >
>> > And doing in userland what faccessat, and knfsd do in the kernel is
>> > exactly what is desired here.  But maybe there are issues with that.
>> >
>> >> That being said, Windows has had functions like this for a long time.
>> >> Processes have a primary token and possibly an impersonation token.  Any
>> >> process can call ImpersonateLoggedOnUser (no privilege required) to
>> >> impersonate the credentials of a token (which is special kind of fd).
>> >> Similarly, any process can call RevertToSelf to undo it.
>> >>
>> >> Is there any actual problem with allowing completely unprivileged tasks
>> >> to switch to one of these magic cred fds?  That would avoid needing a
>> >> "revert" operation.
>> >
>> > If the permission model is this switching of credentials doesn't get you
>> > anything you couldn't get some other way.  That would seem to totally
>> > rules out unprivileged processes switching these things.
>>
>> IMO, there are two reasonable models that involve fds carrying some
>> kind of credential.
>>
>> 1. The fd actually carries the ability to use the credentials.  You
>> need to be very careful whom you send these to.  The security
>> implications are obvious (which is good) and the receiver doesn't need
>> privilege (which is good, as long as the receiver is careful).
>
> I test for caps.  I think using switch_creds in all forms should require
> privs.  I thought of the revert case and although it does simply return to
> "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This means,
> of course, if you have really messed up things, you may not be able to get
> back home which, although a bad thing for the process, is a good thing for the
> system as a whole.
>
>>
>> 2. The fd is for identification only.  But this means that the fd
>> carries the ability to identify as a user.  So you *still* have to be
>> very careful about whom you send it to.  What you need is an operation
>> that allows you to identify using the fd without transitively granting
>> the recipient the same ability.  On networks, this is done by signing
>> some kind of challenge.  The kernel could work the same way, or there
>> could be a new CMSG_IDENTITY that you need an identity fd to send but
>> that does not copy that fd to the recipient.
>
> I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets
> which means that the switch_creds fd test still applies here.  It is
> identification but only for within the same kernel.  As for namespaces, the
> translation was done when the creds fd was created.  I suppose if it was
> passed across namespace boundaries there could be a problem but what is in the
> creds structure is the translated fduid,fsgid etc., not the untranslated one.
> We are still doing access checks and quota stuff with the translated creds.  If
> one namespace has "fred" as uid 52 and another has "mary" as 52, the quota
> will only be credited to whoever "fred" really is only if "fred" gets to be
> "mary" in the second alternate universe...

I'm not talking about namespaces here -- I think we're not really on
the same page.  Can you describe what you're envisioning doing with
these fds?

>>
>> >> Another note: I think that there may be issues if the creator of a token
>> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
>> >> accepts one of these fds, impersonates it, and calls exec.  This could
>> >> be used to escape from no_new_privs land.
>> >
>> > Which is why I was suggesting that we don't allow changing any field in
>> > the cred except for uids and gids.
>>
>> If the daemon impersonates and execs, we still lose.
>
> I answered this.  You only get to impersonate for purposes of file access with
> reduced caps to prevent you from being root as well.  Also, since these are
> O_CLOEXEC and switch_creds "magic" fds, this can't happen because the fd is
> gone post-exec.

If a no_new_privs process authenticates to a daemon and that daemon
execs, then there's now a dumpable, ptraceable, non-no-new-privs
process running as the uid/gid of the no_new_privs process.  This may
be dangerous.

>
>>
>>
>> --Andy
>
> --
> Jim Lieb
> Linux Systems Engineer
> Panasas Inc.
>
> "If ease of use was the only requirement, we would all be riding tricycles"
> - Douglas Engelbart 1925–2013



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-31 19:09                     ` Andy Lutomirski
  (?)
@ 2013-10-31 19:43                     ` Jim Lieb
  2013-10-31 19:48                       ` Andy Lutomirski
  -1 siblings, 1 reply; 26+ messages in thread
From: Jim Lieb @ 2013-10-31 19:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, Eric W. Biederman, Al Viro, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <jlieb@panasas.com> wrote:
> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> >> 
> >> <ebiederm@xmission.com> wrote:
> >> > Andy Lutomirski <luto@amacapital.net> writes:
> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >> >>>> 
> >> >>>> D'oh
> >> >>>> 
> >> >>>>> I don't think we want file descriptor passing to all of a sudden
> >> >>>>> become
> >> >>>>> a grant of privilege, beyond what the passed fd can do.
> >> >>>> 
> >> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt
> >> >>>> either...
> >> >>> 
> >> >>> There also appears to need to be a check that we don't gain any
> >> >>> capabilities.
> >> >>> 
> >> >>> We also need a check so that you don't gain any capabilities, and
> >> >>> possibly a few other things.
> >> >> 
> >> >> Why?  I like the user_ns part, but I'm not immediately seeing the
> >> >> issue
> >> >> with capabilities.
> >> > 
> >> > My reasoning was instead of making this syscall as generic as possible
> >> > start it out by only allowing the cases Jim cares about and working
> >> > with
> >> > a model where you can't gain any permissions you couldn't gain
> >> > otherwise.
> >> > 
> >> > Although the fd -1 trick to revert to your other existing cred seems
> >> > reasonable.
> >> > 
> >> >>> So I suspect we want a check something like:
> >> >>> 
> >> >>> if ((new_cred->securebits != current_cred->securebits)  ||
> >> >>> 
> >> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
> >> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
> >> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
> >> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
> >> >>>     (new_cred->security != current_cred->security) ||
> >> >>>     (new_cred->user_ns != current_cred->user_ns)) {
> >> >>>     
> >> >>>      return -EPERM;
> >> >>> 
> >> >>> }
> >> >> 
> >> >> I *really* don't like the idea of being able to use any old file
> >> >> descriptor.  I barely care what rights the caller needs to have to
> >> >> invoke this -- if you're going to pass an fd that grants a capability
> >> >> (in the non-Linux sense of the work), please make sure that the sender
> >> >> actually wants that behavior.
> >> >> 
> >> >> IOW, have a syscall to generate a special fd for this purpose.  It's
> >> >> only a couple lines of code, and I think we'll really regret it if we
> >> >> fsck this up.
> >> >> 
> >> >> (I will take it as a personal challenge to find at least one
> >> >> exploitable
> >> >> privilege escalation in this if an arbitrary fd works.)
> >> > 
> >> > If you can't switch to a uid or a gid you couldn't switch to otherwise
> >> > then the worst that can happen is an information leak.  And information
> >> > leaks are rarely directly exploitable.
> >> 
> >> Here's the attack:
> >> 
> >> Suppose there's a daemon that uses this in conjunction with
> >> SCM_RIGHTS.  The daemon is effectively root (under the current
> >> proposal, it has to be).  So a client connects, sends a credential fd,
> >> and the daemon impersonates those credentials.
> >> 
> >> Now a malicious user obtains *any* fd opened by root.  It sends that
> >> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
> >> can't possibly be hard to obtain an fd with highly privileged f_cred
> >> -- I bet that most console programs have stdin like that, for example.
> >> 
> >>  There are probably many setuid programs that will happily open
> >> 
> >> /dev/null for you, too.)
> > 
> > In my reply to Eric, I note that I need to add a check that the fd passed
> > is one from switch_creds. With that test, not any fd will do and the one
> > that does has only been able to set fsuid, fsgid, altgroups, and reduced
> > (the nfsd set) caps.  They can do no more damage than what the original
> > switch_creds allowed.  The any fd by root no longer applies so use
> > doesn't get much (no escalation).
> > 
> >> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
> >> >> and faccessat.  That is, current userspace can't see it.  But now
> >> >> you'll
> >> >> expose various oddities.  For example, AFAICS a capability-less
> >> >> process
> >> >> that's a userns owner can always use setuid.  This will *overwrite*
> >> >> real_cred.  Then you're screwed, especially if this happens by
> >> >> accident.
> >> > 
> >> > And doing in userland what faccessat, and knfsd do in the kernel is
> >> > exactly what is desired here.  But maybe there are issues with that.
> >> > 
> >> >> That being said, Windows has had functions like this for a long time.
> >> >> Processes have a primary token and possibly an impersonation token. 
> >> >> Any
> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> >> impersonate the credentials of a token (which is special kind of fd).
> >> >> Similarly, any process can call RevertToSelf to undo it.
> >> >> 
> >> >> Is there any actual problem with allowing completely unprivileged
> >> >> tasks
> >> >> to switch to one of these magic cred fds?  That would avoid needing a
> >> >> "revert" operation.
> >> > 
> >> > If the permission model is this switching of credentials doesn't get
> >> > you
> >> > anything you couldn't get some other way.  That would seem to totally
> >> > rules out unprivileged processes switching these things.
> >> 
> >> IMO, there are two reasonable models that involve fds carrying some
> >> kind of credential.
> >> 
> >> 1. The fd actually carries the ability to use the credentials.  You
> >> need to be very careful whom you send these to.  The security
> >> implications are obvious (which is good) and the receiver doesn't need
> >> privilege (which is good, as long as the receiver is careful).
> > 
> > I test for caps.  I think using switch_creds in all forms should require
> > privs.  I thought of the revert case and although it does simply return to
> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This
> > means, of course, if you have really messed up things, you may not be
> > able to get back home which, although a bad thing for the process, is a
> > good thing for the system as a whole.
> > 
> >> 2. The fd is for identification only.  But this means that the fd
> >> carries the ability to identify as a user.  So you *still* have to be
> >> very careful about whom you send it to.  What you need is an operation
> >> that allows you to identify using the fd without transitively granting
> >> the recipient the same ability.  On networks, this is done by signing
> >> some kind of challenge.  The kernel could work the same way, or there
> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
> >> that does not copy that fd to the recipient.
> > 
> > I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets
> > which means that the switch_creds fd test still applies here.  It is
> > identification but only for within the same kernel.  As for namespaces,
> > the
> > translation was done when the creds fd was created.  I suppose if it was
> > passed across namespace boundaries there could be a problem but what is in
> > the creds structure is the translated fduid,fsgid etc., not the
> > untranslated one. We are still doing access checks and quota stuff with
> > the translated creds.  If one namespace has "fred" as uid 52 and another
> > has "mary" as 52, the quota will only be credited to whoever "fred"
> > really is only if "fred" gets to be "mary" in the second alternate
> > universe...
> 
> I'm not talking about namespaces here -- I think we're not really on
> the same page.  Can you describe what you're envisioning doing with
> these fds?

I have a new version that is now two syscalls and no multiplexing has more 
testing etc. to mirror exactly the tests in setfsuid().  I still am testing 
but plan to send this new patchset next week for review.

Ok,  I may have missed something.  What I meant to say is that I'm using the 
same namespace functions for switch_creds as all the set*uid syscalls use so 
what I have should work as well.  If one were to pass this fd via CMSG, it 
could cross a namespace boundary.  The changed mappings of this fd would be 
the same as for any other fd passed across now.  Maybe that is irrelevant in 
this case. 

The purpose of the fd is to create a handle to a creds set that a server can 
then efficiently switch to prior to filesystem syscalls where fsuid etc. are 
relevant (mkdir, creat, write, etc.).  This is exposing the override_creds() 
and revert_creds() to these servers.  I do not intend in our server to hand 
these fds to anyone else.  After all, they are useless for anything other than 
switch_creds/use_creds.

The server keeps a cache of its known, currently active client creds along 
with this fd.  The fast path is to lookup the creds and use the fd.  On cache 
misses, it does a switch_creds() and saves the fd in the cache.

> 
> >> >> Another note: I think that there may be issues if the creator of a
> >> >> token
> >> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
> >> >> accepts one of these fds, impersonates it, and calls exec.  This could
> >> >> be used to escape from no_new_privs land.
> >> > 
> >> > Which is why I was suggesting that we don't allow changing any field in
> >> > the cred except for uids and gids.
> >> 
> >> If the daemon impersonates and execs, we still lose.
> > 
> > I answered this.  You only get to impersonate for purposes of file access
> > with reduced caps to prevent you from being root as well.  Also, since
> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
> > because the fd is gone post-exec.
> 
> If a no_new_privs process authenticates to a daemon and that daemon
> execs, then there's now a dumpable, ptraceable, non-no-new-privs
> process running as the uid/gid of the no_new_privs process.  This may
> be dangerous.

Two things.  My new patch set now throws an error if the fd is not one 
returned by switch_creds().  The new syscall here is use_creds() btw.  That fd 
is opened O_CLOEXEC so the syscall has two guards.  First, it can only be one 
that switch_creds() returned and second, it gets closed on an exec so the 
no_new_privs process doesn't get it.  In addition, switch_creds() reduces the 
effective caps set to the same minimal set that nfsd_setuser() has.

If someone else chooses to pass this fd via CMSG, whoever gets it has reduced 
privs and in order to use it for anything, i.e. use_creds(), it still needs to 
inherit SETUID and SETGID caps from its parent or it will get an EPERM error. 
Passing this in "friendly" code defeats the purpose of the cache above in that 
we could get fd leaks.  If there is another flag that could prevent its being 
passed along via CMSG, I can add that too because using CMSG is well outside 
the use case.

Note too, it also cannot pass it on via exec.  The use case is and should be 
as for knfsd, namely  which calls nfsd_setuser() followed by fulfilling the 
protocol op followed by a revert of creds.  All this is is in the same thread.

Jim
> 
> >> --Andy
> > 
> > --
> > Jim Lieb
> > Linux Systems Engineer
> > Panasas Inc.
> > 
> > "If ease of use was the only requirement, we would all be riding
> > tricycles"
> > - Douglas Engelbart 1925–2013

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-31 19:43                     ` Jim Lieb
@ 2013-10-31 19:48                       ` Andy Lutomirski
  2013-10-31 20:39                         ` Jim Lieb
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2013-10-31 19:48 UTC (permalink / raw)
  To: Jim Lieb
  Cc: Linux FS Devel, Eric W. Biederman, Al Viro, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thu, Oct 31, 2013 at 12:43 PM, Jim Lieb <jlieb@panasas.com> wrote:
> On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
>> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <jlieb@panasas.com> wrote:
>> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
>> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
>> >>
>> >> <ebiederm@xmission.com> wrote:
>> >> > Andy Lutomirski <luto@amacapital.net> writes:
>> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
>> >> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
>> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman wrote:
>> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
>> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
>> >> >>>>
>> >> >>>> D'oh
>> >> >>>>
>> >> >>>>> I don't think we want file descriptor passing to all of a sudden
>> >> >>>>> become
>> >> >>>>> a grant of privilege, beyond what the passed fd can do.
>> >> >>>>
>> >> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt
>> >> >>>> either...
>> >> >>>
>> >> >>> There also appears to need to be a check that we don't gain any
>> >> >>> capabilities.
>> >> >>>
>> >> >>> We also need a check so that you don't gain any capabilities, and
>> >> >>> possibly a few other things.
>> >> >>
>> >> >> Why?  I like the user_ns part, but I'm not immediately seeing the
>> >> >> issue
>> >> >> with capabilities.
>> >> >
>> >> > My reasoning was instead of making this syscall as generic as possible
>> >> > start it out by only allowing the cases Jim cares about and working
>> >> > with
>> >> > a model where you can't gain any permissions you couldn't gain
>> >> > otherwise.
>> >> >
>> >> > Although the fd -1 trick to revert to your other existing cred seems
>> >> > reasonable.
>> >> >
>> >> >>> So I suspect we want a check something like:
>> >> >>>
>> >> >>> if ((new_cred->securebits != current_cred->securebits)  ||
>> >> >>>
>> >> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable) ||
>> >> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
>> >> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
>> >> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
>> >> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
>> >> >>>     (new_cred->session_keyring != current_cred->session_keyring) ||
>> >> >>>     (new_cred->process_keyring != current_cred->process_keyring) ||
>> >> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
>> >> >>>     (new_cred->request_keyring != current_cred->request_keyring) ||
>> >> >>>     (new_cred->security != current_cred->security) ||
>> >> >>>     (new_cred->user_ns != current_cred->user_ns)) {
>> >> >>>
>> >> >>>      return -EPERM;
>> >> >>>
>> >> >>> }
>> >> >>
>> >> >> I *really* don't like the idea of being able to use any old file
>> >> >> descriptor.  I barely care what rights the caller needs to have to
>> >> >> invoke this -- if you're going to pass an fd that grants a capability
>> >> >> (in the non-Linux sense of the work), please make sure that the sender
>> >> >> actually wants that behavior.
>> >> >>
>> >> >> IOW, have a syscall to generate a special fd for this purpose.  It's
>> >> >> only a couple lines of code, and I think we'll really regret it if we
>> >> >> fsck this up.
>> >> >>
>> >> >> (I will take it as a personal challenge to find at least one
>> >> >> exploitable
>> >> >> privilege escalation in this if an arbitrary fd works.)
>> >> >
>> >> > If you can't switch to a uid or a gid you couldn't switch to otherwise
>> >> > then the worst that can happen is an information leak.  And information
>> >> > leaks are rarely directly exploitable.
>> >>
>> >> Here's the attack:
>> >>
>> >> Suppose there's a daemon that uses this in conjunction with
>> >> SCM_RIGHTS.  The daemon is effectively root (under the current
>> >> proposal, it has to be).  So a client connects, sends a credential fd,
>> >> and the daemon impersonates those credentials.
>> >>
>> >> Now a malicious user obtains *any* fd opened by root.  It sends that
>> >> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
>> >> can't possibly be hard to obtain an fd with highly privileged f_cred
>> >> -- I bet that most console programs have stdin like that, for example.
>> >>
>> >>  There are probably many setuid programs that will happily open
>> >>
>> >> /dev/null for you, too.)
>> >
>> > In my reply to Eric, I note that I need to add a check that the fd passed
>> > is one from switch_creds. With that test, not any fd will do and the one
>> > that does has only been able to set fsuid, fsgid, altgroups, and reduced
>> > (the nfsd set) caps.  They can do no more damage than what the original
>> > switch_creds allowed.  The any fd by root no longer applies so use
>> > doesn't get much (no escalation).
>> >
>> >> >> Also... real_cred looks confusing.  AFAICS it is used *only* for knfsd
>> >> >> and faccessat.  That is, current userspace can't see it.  But now
>> >> >> you'll
>> >> >> expose various oddities.  For example, AFAICS a capability-less
>> >> >> process
>> >> >> that's a userns owner can always use setuid.  This will *overwrite*
>> >> >> real_cred.  Then you're screwed, especially if this happens by
>> >> >> accident.
>> >> >
>> >> > And doing in userland what faccessat, and knfsd do in the kernel is
>> >> > exactly what is desired here.  But maybe there are issues with that.
>> >> >
>> >> >> That being said, Windows has had functions like this for a long time.
>> >> >> Processes have a primary token and possibly an impersonation token.
>> >> >> Any
>> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
>> >> >> impersonate the credentials of a token (which is special kind of fd).
>> >> >> Similarly, any process can call RevertToSelf to undo it.
>> >> >>
>> >> >> Is there any actual problem with allowing completely unprivileged
>> >> >> tasks
>> >> >> to switch to one of these magic cred fds?  That would avoid needing a
>> >> >> "revert" operation.
>> >> >
>> >> > If the permission model is this switching of credentials doesn't get
>> >> > you
>> >> > anything you couldn't get some other way.  That would seem to totally
>> >> > rules out unprivileged processes switching these things.
>> >>
>> >> IMO, there are two reasonable models that involve fds carrying some
>> >> kind of credential.
>> >>
>> >> 1. The fd actually carries the ability to use the credentials.  You
>> >> need to be very careful whom you send these to.  The security
>> >> implications are obvious (which is good) and the receiver doesn't need
>> >> privilege (which is good, as long as the receiver is careful).
>> >
>> > I test for caps.  I think using switch_creds in all forms should require
>> > privs.  I thought of the revert case and although it does simply return to
>> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This
>> > means, of course, if you have really messed up things, you may not be
>> > able to get back home which, although a bad thing for the process, is a
>> > good thing for the system as a whole.
>> >
>> >> 2. The fd is for identification only.  But this means that the fd
>> >> carries the ability to identify as a user.  So you *still* have to be
>> >> very careful about whom you send it to.  What you need is an operation
>> >> that allows you to identify using the fd without transitively granting
>> >> the recipient the same ability.  On networks, this is done by signing
>> >> some kind of challenge.  The kernel could work the same way, or there
>> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
>> >> that does not copy that fd to the recipient.
>> >
>> > I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN sockets
>> > which means that the switch_creds fd test still applies here.  It is
>> > identification but only for within the same kernel.  As for namespaces,
>> > the
>> > translation was done when the creds fd was created.  I suppose if it was
>> > passed across namespace boundaries there could be a problem but what is in
>> > the creds structure is the translated fduid,fsgid etc., not the
>> > untranslated one. We are still doing access checks and quota stuff with
>> > the translated creds.  If one namespace has "fred" as uid 52 and another
>> > has "mary" as 52, the quota will only be credited to whoever "fred"
>> > really is only if "fred" gets to be "mary" in the second alternate
>> > universe...
>>
>> I'm not talking about namespaces here -- I think we're not really on
>> the same page.  Can you describe what you're envisioning doing with
>> these fds?
>
> I have a new version that is now two syscalls and no multiplexing has more
> testing etc. to mirror exactly the tests in setfsuid().  I still am testing
> but plan to send this new patchset next week for review.
>
> Ok,  I may have missed something.  What I meant to say is that I'm using the
> same namespace functions for switch_creds as all the set*uid syscalls use so
> what I have should work as well.  If one were to pass this fd via CMSG, it
> could cross a namespace boundary.  The changed mappings of this fd would be
> the same as for any other fd passed across now.  Maybe that is irrelevant in
> this case.
>
> The purpose of the fd is to create a handle to a creds set that a server can
> then efficiently switch to prior to filesystem syscalls where fsuid etc. are
> relevant (mkdir, creat, write, etc.).  This is exposing the override_creds()
> and revert_creds() to these servers.  I do not intend in our server to hand
> these fds to anyone else.  After all, they are useless for anything other than
> switch_creds/use_creds.
>
> The server keeps a cache of its known, currently active client creds along
> with this fd.  The fast path is to lookup the creds and use the fd.  On cache
> misses, it does a switch_creds() and saves the fd in the cache.

So if I understand you correctly, you're not planning on sending this
fd between process at all.  In that case, adding a new API that seems
designed for interprocess use and having exactly zero usecases to
think about makes me nervous.  Why is it going to use fds again?

Or maybe I should wait to see the updated API before complaining :)

>
>>
>> >> >> Another note: I think that there may be issues if the creator of a
>> >> >> token
>> >> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
>> >> >> accepts one of these fds, impersonates it, and calls exec.  This could
>> >> >> be used to escape from no_new_privs land.
>> >> >
>> >> > Which is why I was suggesting that we don't allow changing any field in
>> >> > the cred except for uids and gids.
>> >>
>> >> If the daemon impersonates and execs, we still lose.
>> >
>> > I answered this.  You only get to impersonate for purposes of file access
>> > with reduced caps to prevent you from being root as well.  Also, since
>> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
>> > because the fd is gone post-exec.
>>
>> If a no_new_privs process authenticates to a daemon and that daemon
>> execs, then there's now a dumpable, ptraceable, non-no-new-privs
>> process running as the uid/gid of the no_new_privs process.  This may
>> be dangerous.
>
> Two things.  My new patch set now throws an error if the fd is not one
> returned by switch_creds().  The new syscall here is use_creds() btw.  That fd
> is opened O_CLOEXEC so the syscall has two guards.  First, it can only be one
> that switch_creds() returned and second, it gets closed on an exec so the
> no_new_privs process doesn't get it.  In addition, switch_creds() reduces the
> effective caps set to the same minimal set that nfsd_setuser() has.
>
> If someone else chooses to pass this fd via CMSG, whoever gets it has reduced
> privs and in order to use it for anything, i.e. use_creds(), it still needs to
> inherit SETUID and SETGID caps from its parent or it will get an EPERM error.
> Passing this in "friendly" code defeats the purpose of the cache above in that
> we could get fd leaks.  If there is another flag that could prevent its being
> passed along via CMSG, I can add that too because using CMSG is well outside
> the use case.

I still don't see the point of lowering effective caps, but this is
IMO minor.  Are you checking the permitted set on revert?

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-31 19:48                       ` Andy Lutomirski
@ 2013-10-31 20:39                         ` Jim Lieb
  2013-11-01 13:24                           ` Tetsuo Handa
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Lieb @ 2013-10-31 20:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, Eric W. Biederman, Al Viro, Theodore Ts'o,
	linux-kernel, bfields, Jeff Layton

On Thursday, October 31, 2013 12:48:54 Andy Lutomirski wrote:
> On Thu, Oct 31, 2013 at 12:43 PM, Jim Lieb <jlieb@panasas.com> wrote:
> > On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote:
> >> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb <jlieb@panasas.com> wrote:
> >> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote:
> >> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman
> >> >> 
> >> >> <ebiederm@xmission.com> wrote:
> >> >> > Andy Lutomirski <luto@amacapital.net> writes:
> >> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote:
> >> >> >>> Al Viro <viro@ZenIV.linux.org.uk> writes:
> >> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederman 
wrote:
> >> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID) &&
> >> >> >>>>> capable(CAP_SETGID) or possibly something a little more refined.
> >> >> >>>> 
> >> >> >>>> D'oh
> >> >> >>>> 
> >> >> >>>>> I don't think we want file descriptor passing to all of a sudden
> >> >> >>>>> become
> >> >> >>>>> a grant of privilege, beyond what the passed fd can do.
> >> >> >>>> 
> >> >> >>>> Definitely.  And an extra ) to make it compile wouldn't hurt
> >> >> >>>> either...
> >> >> >>> 
> >> >> >>> There also appears to need to be a check that we don't gain any
> >> >> >>> capabilities.
> >> >> >>> 
> >> >> >>> We also need a check so that you don't gain any capabilities, and
> >> >> >>> possibly a few other things.
> >> >> >> 
> >> >> >> Why?  I like the user_ns part, but I'm not immediately seeing the
> >> >> >> issue
> >> >> >> with capabilities.
> >> >> > 
> >> >> > My reasoning was instead of making this syscall as generic as
> >> >> > possible
> >> >> > start it out by only allowing the cases Jim cares about and working
> >> >> > with
> >> >> > a model where you can't gain any permissions you couldn't gain
> >> >> > otherwise.
> >> >> > 
> >> >> > Although the fd -1 trick to revert to your other existing cred seems
> >> >> > reasonable.
> >> >> > 
> >> >> >>> So I suspect we want a check something like:
> >> >> >>> 
> >> >> >>> if ((new_cred->securebits != current_cred->securebits)  ||
> >> >> >>> 
> >> >> >>>     (new_cred->cap_inheritable != current_cred->cap_inheritable)
> >> >> >>>     ||
> >> >> >>>     (new_cred->cap_permitted != current_cred->cap_permitted) ||
> >> >> >>>     (new_cred->cap_effective != current_cred->cap_effective) ||
> >> >> >>>     (new_cred->cap_bset != current_cred->cap_bset) ||
> >> >> >>>     (new_cred->jit_keyring != current_cred->jit_keyring) ||
> >> >> >>>     (new_cred->session_keyring != current_cred->session_keyring)
> >> >> >>>     ||
> >> >> >>>     (new_cred->process_keyring != current_cred->process_keyring)
> >> >> >>>     ||
> >> >> >>>     (new_cred->thread_keyring != current_cred->thread_keyring) ||
> >> >> >>>     (new_cred->request_keyring != current_cred->request_keyring)
> >> >> >>>     ||
> >> >> >>>     (new_cred->security != current_cred->security) ||
> >> >> >>>     (new_cred->user_ns != current_cred->user_ns)) {
> >> >> >>>     
> >> >> >>>      return -EPERM;
> >> >> >>> 
> >> >> >>> }
> >> >> >> 
> >> >> >> I *really* don't like the idea of being able to use any old file
> >> >> >> descriptor.  I barely care what rights the caller needs to have to
> >> >> >> invoke this -- if you're going to pass an fd that grants a
> >> >> >> capability
> >> >> >> (in the non-Linux sense of the work), please make sure that the
> >> >> >> sender
> >> >> >> actually wants that behavior.
> >> >> >> 
> >> >> >> IOW, have a syscall to generate a special fd for this purpose. 
> >> >> >> It's
> >> >> >> only a couple lines of code, and I think we'll really regret it if
> >> >> >> we
> >> >> >> fsck this up.
> >> >> >> 
> >> >> >> (I will take it as a personal challenge to find at least one
> >> >> >> exploitable
> >> >> >> privilege escalation in this if an arbitrary fd works.)
> >> >> > 
> >> >> > If you can't switch to a uid or a gid you couldn't switch to
> >> >> > otherwise
> >> >> > then the worst that can happen is an information leak.  And
> >> >> > information
> >> >> > leaks are rarely directly exploitable.
> >> >> 
> >> >> Here's the attack:
> >> >> 
> >> >> Suppose there's a daemon that uses this in conjunction with
> >> >> SCM_RIGHTS.  The daemon is effectively root (under the current
> >> >> proposal, it has to be).  So a client connects, sends a credential fd,
> >> >> and the daemon impersonates those credentials.
> >> >> 
> >> >> Now a malicious user obtains *any* fd opened by root.  It sends that
> >> >> fd to the daemon.  The daemon then impersonates root.  We lose.  (It
> >> >> can't possibly be hard to obtain an fd with highly privileged f_cred
> >> >> -- I bet that most console programs have stdin like that, for example.
> >> >> 
> >> >>  There are probably many setuid programs that will happily open
> >> >> 
> >> >> /dev/null for you, too.)
> >> > 
> >> > In my reply to Eric, I note that I need to add a check that the fd
> >> > passed
> >> > is one from switch_creds. With that test, not any fd will do and the
> >> > one
> >> > that does has only been able to set fsuid, fsgid, altgroups, and
> >> > reduced
> >> > (the nfsd set) caps.  They can do no more damage than what the original
> >> > switch_creds allowed.  The any fd by root no longer applies so use
> >> > doesn't get much (no escalation).
> >> > 
> >> >> >> Also... real_cred looks confusing.  AFAICS it is used *only* for
> >> >> >> knfsd
> >> >> >> and faccessat.  That is, current userspace can't see it.  But now
> >> >> >> you'll
> >> >> >> expose various oddities.  For example, AFAICS a capability-less
> >> >> >> process
> >> >> >> that's a userns owner can always use setuid.  This will *overwrite*
> >> >> >> real_cred.  Then you're screwed, especially if this happens by
> >> >> >> accident.
> >> >> > 
> >> >> > And doing in userland what faccessat, and knfsd do in the kernel is
> >> >> > exactly what is desired here.  But maybe there are issues with that.
> >> >> > 
> >> >> >> That being said, Windows has had functions like this for a long
> >> >> >> time.
> >> >> >> Processes have a primary token and possibly an impersonation token.
> >> >> >> Any
> >> >> >> process can call ImpersonateLoggedOnUser (no privilege required) to
> >> >> >> impersonate the credentials of a token (which is special kind of
> >> >> >> fd).
> >> >> >> Similarly, any process can call RevertToSelf to undo it.
> >> >> >> 
> >> >> >> Is there any actual problem with allowing completely unprivileged
> >> >> >> tasks
> >> >> >> to switch to one of these magic cred fds?  That would avoid needing
> >> >> >> a
> >> >> >> "revert" operation.
> >> >> > 
> >> >> > If the permission model is this switching of credentials doesn't get
> >> >> > you
> >> >> > anything you couldn't get some other way.  That would seem to
> >> >> > totally
> >> >> > rules out unprivileged processes switching these things.
> >> >> 
> >> >> IMO, there are two reasonable models that involve fds carrying some
> >> >> kind of credential.
> >> >> 
> >> >> 1. The fd actually carries the ability to use the credentials.  You
> >> >> need to be very careful whom you send these to.  The security
> >> >> implications are obvious (which is good) and the receiver doesn't need
> >> >> privilege (which is good, as long as the receiver is careful).
> >> > 
> >> > I test for caps.  I think using switch_creds in all forms should
> >> > require
> >> > privs.  I thought of the revert case and although it does simply return
> >> > to
> >> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.  This
> >> > means, of course, if you have really messed up things, you may not be
> >> > able to get back home which, although a bad thing for the process, is a
> >> > good thing for the system as a whole.
> >> > 
> >> >> 2. The fd is for identification only.  But this means that the fd
> >> >> carries the ability to identify as a user.  So you *still* have to be
> >> >> very careful about whom you send it to.  What you need is an operation
> >> >> that allows you to identify using the fd without transitively granting
> >> >> the recipient the same ability.  On networks, this is done by signing
> >> >> some kind of challenge.  The kernel could work the same way, or there
> >> >> could be a new CMSG_IDENTITY that you need an identity fd to send but
> >> >> that does not copy that fd to the recipient.
> >> > 
> >> > I am not sure I understand this.  CMSG only applies to UNIX_DOMAIN
> >> > sockets
> >> > which means that the switch_creds fd test still applies here.  It is
> >> > identification but only for within the same kernel.  As for namespaces,
> >> > the
> >> > translation was done when the creds fd was created.  I suppose if it
> >> > was
> >> > passed across namespace boundaries there could be a problem but what is
> >> > in
> >> > the creds structure is the translated fduid,fsgid etc., not the
> >> > untranslated one. We are still doing access checks and quota stuff with
> >> > the translated creds.  If one namespace has "fred" as uid 52 and
> >> > another
> >> > has "mary" as 52, the quota will only be credited to whoever "fred"
> >> > really is only if "fred" gets to be "mary" in the second alternate
> >> > universe...
> >> 
> >> I'm not talking about namespaces here -- I think we're not really on
> >> the same page.  Can you describe what you're envisioning doing with
> >> these fds?
> > 
> > I have a new version that is now two syscalls and no multiplexing has more
> > testing etc. to mirror exactly the tests in setfsuid().  I still am
> > testing
> > but plan to send this new patchset next week for review.
> > 
> > Ok,  I may have missed something.  What I meant to say is that I'm using
> > the same namespace functions for switch_creds as all the set*uid syscalls
> > use so what I have should work as well.  If one were to pass this fd via
> > CMSG, it could cross a namespace boundary.  The changed mappings of this
> > fd would be the same as for any other fd passed across now.  Maybe that
> > is irrelevant in this case.
> > 
> > The purpose of the fd is to create a handle to a creds set that a server
> > can then efficiently switch to prior to filesystem syscalls where fsuid
> > etc. are relevant (mkdir, creat, write, etc.).  This is exposing the
> > override_creds() and revert_creds() to these servers.  I do not intend in
> > our server to hand these fds to anyone else.  After all, they are useless
> > for anything other than switch_creds/use_creds.
> > 
> > The server keeps a cache of its known, currently active client creds along
> > with this fd.  The fast path is to lookup the creds and use the fd.  On
> > cache misses, it does a switch_creds() and saves the fd in the cache.
> 
> So if I understand you correctly, you're not planning on sending this
> fd between process at all.  In that case, adding a new API that seems
> designed for interprocess use and having exactly zero usecases to
> think about makes me nervous.  Why is it going to use fds again?

Correct.  They are handles to a set of creds for filesystem calls.  I use fds 
because every filp has a pointer to a set of creds that gets swapped in for 
various things like coredumps, and in the case of knfsd, impersonating the nfs 
client for when it does filesystem morphing ops.  In order to currently do what 
nfsd_setuser() does we must do:

	setfsuid();setfsgid(); setgroups(); setcaps();
followed by 
	open/creat/mknod/write
followed by
	 setfsuid(); setfsgid(); setgroups(); setcaps();

Each one of these 4 syscalls discards an RCU copy of the creds, both going in 
and going out.  In the first case below, the revert doesn't recycle the RCU 
because the open fd still holds a ref.  Same applied to the second case.  In 
fact here, no new creds are created either.  The creds only get recycled when 
the fd is closed.

Using an fd to have a handle on what all these do is the simplest thing to 
pass back and forth between user and kernel space.  This is what Ted T'so and 
Al Viro suggested.  This ends up looking like:

	fd = switch_creds(creds struct);
followed by
	open/creat/mknod/write
followed by
	use_creds(-1);

The -1 arg is used as I proposed in the earlier round, namely, it tells 
use_creds to revert to real creds.

Subsequent uses look like:

	use_creds(cached fd);
followed by
	open/creat/mknod/write
followed by
	use_creds(-1);


There is no need to pass the fd because if the other side has the caps to use 
the fd, they have the caps to get their own.  We are saving overhead.
> 
> Or maybe I should wait to see the updated API before complaining :)
> 
> >> >> >> Another note: I think that there may be issues if the creator of a
> >> >> >> token
> >> >> >> has no_new_privs set and the user doesn't.  Imagine a daemon that
> >> >> >> accepts one of these fds, impersonates it, and calls exec.  This
> >> >> >> could
> >> >> >> be used to escape from no_new_privs land.
> >> >> > 
> >> >> > Which is why I was suggesting that we don't allow changing any field
> >> >> > in
> >> >> > the cred except for uids and gids.
> >> >> 
> >> >> If the daemon impersonates and execs, we still lose.
> >> > 
> >> > I answered this.  You only get to impersonate for purposes of file
> >> > access
> >> > with reduced caps to prevent you from being root as well.  Also, since
> >> > these are O_CLOEXEC and switch_creds "magic" fds, this can't happen
> >> > because the fd is gone post-exec.
> >> 
> >> If a no_new_privs process authenticates to a daemon and that daemon
> >> execs, then there's now a dumpable, ptraceable, non-no-new-privs
> >> process running as the uid/gid of the no_new_privs process.  This may
> >> be dangerous.
> > 
> > Two things.  My new patch set now throws an error if the fd is not one
> > returned by switch_creds().  The new syscall here is use_creds() btw. 
> > That fd is opened O_CLOEXEC so the syscall has two guards.  First, it can
> > only be one that switch_creds() returned and second, it gets closed on an
> > exec so the no_new_privs process doesn't get it.  In addition,
> > switch_creds() reduces the effective caps set to the same minimal set
> > that nfsd_setuser() has.
> > 
> > If someone else chooses to pass this fd via CMSG, whoever gets it has
> > reduced privs and in order to use it for anything, i.e. use_creds(), it
> > still needs to inherit SETUID and SETGID caps from its parent or it will
> > get an EPERM error. Passing this in "friendly" code defeats the purpose
> > of the cache above in that we could get fd leaks.  If there is another
> > flag that could prevent its being passed along via CMSG, I can add that
> > too because using CMSG is well outside the use case.
> 
> I still don't see the point of lowering effective caps, but this is
> IMO minor.  Are you checking the permitted set on revert?

We lower specific filesystem related caps to prevent using the root bypass of 
acess checks, prevent root bypass of quota restrictions, and to correctly 
charge the impersonated user for quota'd resource usage.  This is what 
nfsd_setuser() does.  I use the same mask set.

I don't check the permitted set associated with the fd because I've already 
checked that it is a switch_creds() returned fd (they are immutable).  I do 
make a check at the beginning of both syscalls for the caps to set uid/gid, 
the same test as setfsuid et al.

> 
> --Andy

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-10-31 20:39                         ` Jim Lieb
@ 2013-11-01 13:24                           ` Tetsuo Handa
  2013-11-01 15:49                             ` Jim Lieb
  0 siblings, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2013-11-01 13:24 UTC (permalink / raw)
  To: jlieb, luto
  Cc: linux-fsdevel, ebiederm, viro, tytso, linux-kernel, bfields, jlayton

Jim Lieb wrote:
> Subsequent uses look like:
> 
> 	use_creds(cached fd);
> followed by
> 	open/creat/mknod/write
> followed by
> 	use_creds(-1);

Are you aware that calling commit_creds() is prohibitted between
override_creds() and revert_creds() ?

If the caller does some operation that calls commit_creds() (like
example below), the kernel triggers BUG().

---------- example module start ----------
#include <linux/module.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/file.h>

static int __init test_init(void)
{
        { /* switch_creds() syscall */
                struct fd f = fdget(0);
                if (!f.file)
                        return -EBADF;
                put_cred(override_creds(f.file->f_cred));
                fdput(f);
        }
        { /* something that calls commit_creds() */
                struct cred *cred = prepare_creds();
                if (cred)
                        commit_creds(cred);
        }
        return 0;
}

static void test_exit(void)
{
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
---------- example module end ----------

Since nobody can guarantee that the caller of switch_creds() never does
some operation that calls commit_creds(), I don't think switch_creds()
based on override_creds() will work.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-11-01 13:24                           ` Tetsuo Handa
@ 2013-11-01 15:49                             ` Jim Lieb
  2013-11-01 16:07                               ` Tetsuo Handa
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Lieb @ 2013-11-01 15:49 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: luto, linux-fsdevel, ebiederm, viro, tytso, linux-kernel,
	bfields, jlayton

On Friday, November 01, 2013 22:24:12 Tetsuo Handa wrote:
> Jim Lieb wrote:
> > Subsequent uses look like:
> > 	use_creds(cached fd);
> > 
> > followed by
> > 
> > 	open/creat/mknod/write
> > 
> > followed by
> > 
> > 	use_creds(-1);
> 
> Are you aware that calling commit_creds() is prohibitted between
> override_creds() and revert_creds() ?
> 
> If the caller does some operation that calls commit_creds() (like
> example below), the kernel triggers BUG().

Yes, I do.  I caught this in an early pass.  I only use override_creds() and 
revert_creds().  
> 
> ---------- example module start ----------
> #include <linux/module.h>
> #include <linux/cred.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> 
> static int __init test_init(void)
> {
>         { /* switch_creds() syscall */
>                 struct fd f = fdget(0);
>                 if (!f.file)
>                         return -EBADF;
>                 put_cred(override_creds(f.file->f_cred));
>                 fdput(f);
>         }
>         { /* something that calls commit_creds() */
>                 struct cred *cred = prepare_creds();
>                 if (cred)
>                         commit_creds(cred);
>         }
>         return 0;
> }
> 
> static void test_exit(void)
> {
> }
> 
> module_init(test_init);
> module_exit(test_exit);
> MODULE_LICENSE("GPL");
> ---------- example module end ----------
> 
> Since nobody can guarantee that the caller of switch_creds() never does
> some operation that calls commit_creds(), I don't think switch_creds()
> based on override_creds() will work.

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-11-01 15:49                             ` Jim Lieb
@ 2013-11-01 16:07                               ` Tetsuo Handa
  2013-11-01 17:16                                 ` Jim Lieb
  0 siblings, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2013-11-01 16:07 UTC (permalink / raw)
  To: jlieb
  Cc: luto, linux-fsdevel, ebiederm, viro, tytso, linux-kernel,
	bfields, jlayton

Jim Lieb wrote:
> On Friday, November 01, 2013 22:24:12 Tetsuo Handa wrote:
> > Jim Lieb wrote:
> > > Subsequent uses look like:
> > > 	use_creds(cached fd);
> > > 
> > > followed by
> > > 
> > > 	open/creat/mknod/write
> > > 
> > > followed by
> > > 
> > > 	use_creds(-1);
> > 
> > Are you aware that calling commit_creds() is prohibitted between
> > override_creds() and revert_creds() ?
> > 
> > If the caller does some operation that calls commit_creds() (like
> > example below), the kernel triggers BUG().
> 
> Yes, I do.  I caught this in an early pass.  I only use override_creds() and 
> revert_creds().  

Excuse me, but even below example will trigger BUG(). You pack
override_creds() + open() + revert_creds() into one system call so that the
caller of this system call shall not do something that calls commit_creds() ?

---------- example module start ----------
#include <linux/module.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/file.h>

static int __init test_init(void)
{
        const struct cred *orig;
        { /* switch_cred() syscall */
                struct fd f = fdget(0);
                if (!f.file)
                        return -EBADF;
                orig = override_creds(f.file->f_cred);
                fdput(f);
        }
        { /* something that calls commit_creds() */
                struct cred *cred = prepare_creds();
                if (cred)
                        commit_creds(cred);
        }
        { /* restore */
                revert_creds(orig);
        }
        return 0;
}

static void test_exit(void)
{
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
---------- example module end ----------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops
  2013-11-01 16:07                               ` Tetsuo Handa
@ 2013-11-01 17:16                                 ` Jim Lieb
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Lieb @ 2013-11-01 17:16 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: luto, linux-fsdevel, ebiederm, viro, tytso, linux-kernel,
	bfields, jlayton

On Saturday, November 02, 2013 01:07:59 Tetsuo Handa wrote:
> Jim Lieb wrote:
> > On Friday, November 01, 2013 22:24:12 Tetsuo Handa wrote:
> > > Jim Lieb wrote:
> > > > Subsequent uses look like:
> > > > 	use_creds(cached fd);
> > > > 
> > > > followed by
> > > > 
> > > > 	open/creat/mknod/write
> > > > 
> > > > followed by
> > > > 
> > > > 	use_creds(-1);
> > > 
> > > Are you aware that calling commit_creds() is prohibitted between
> > > override_creds() and revert_creds() ?
> > > 
> > > If the caller does some operation that calls commit_creds() (like
> > > example below), the kernel triggers BUG().
> > 
> > Yes, I do.  I caught this in an early pass.  I only use override_creds()
> > and revert_creds().
> 
> Excuse me, but even below example will trigger BUG(). You pack
> override_creds() + open() + revert_creds() into one system call so that the
> caller of this system call shall not do something that calls commit_creds()
> ?

Ok, I see your point here.  If I do a switch_creds and the userland does 
something like seteuid before I do the revert, we are toast.  Correct?

This is an issue.  Thanks for pointing this out.  It is certainly not in my 
use case but that doesn't mean someone else won't try it.  I have some more 
work to do.
> 
> ---------- example module start ----------
> #include <linux/module.h>
> #include <linux/cred.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> 
> static int __init test_init(void)
> {
>         const struct cred *orig;
>         { /* switch_cred() syscall */
>                 struct fd f = fdget(0);
>                 if (!f.file)
>                         return -EBADF;
>                 orig = override_creds(f.file->f_cred);
>                 fdput(f);
>         }
>         { /* something that calls commit_creds() */
>                 struct cred *cred = prepare_creds();
>                 if (cred)
>                         commit_creds(cred);
>         }
>         { /* restore */
>                 revert_creds(orig);
>         }
>         return 0;
> }
> 
> static void test_exit(void)
> {
> }
> 
> module_init(test_init);
> module_exit(test_exit);
> MODULE_LICENSE("GPL");
> ---------- example module end ----------

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-11-01 17:16 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-16 22:01 [RFC PATCH 0/3] System call to switch user credentials Jim Lieb
2013-10-16 22:01 ` [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Jim Lieb
2013-10-16 22:42   ` Al Viro
2013-10-17  1:18     ` Eric W. Biederman
2013-10-17  1:20       ` Al Viro
2013-10-17  3:35         ` Jim Lieb
2013-10-17  3:35           ` Jim Lieb
2013-10-17  3:52         ` Eric W. Biederman
2013-10-24  1:14           ` Andy Lutomirski
2013-10-24  5:59             ` Eric W. Biederman
2013-10-24 19:04               ` Jim Lieb
2013-10-24 19:04                 ` Jim Lieb
2013-10-24 19:28               ` Andy Lutomirski
2013-10-24 20:24                 ` Jim Lieb
2013-10-24 20:24                   ` Jim Lieb
2013-10-31 19:09                   ` Andy Lutomirski
2013-10-31 19:09                     ` Andy Lutomirski
2013-10-31 19:43                     ` Jim Lieb
2013-10-31 19:48                       ` Andy Lutomirski
2013-10-31 20:39                         ` Jim Lieb
2013-11-01 13:24                           ` Tetsuo Handa
2013-11-01 15:49                             ` Jim Lieb
2013-11-01 16:07                               ` Tetsuo Handa
2013-11-01 17:16                                 ` Jim Lieb
2013-10-16 22:01 ` [PATCH 2/3] switch_creds: Add x86 syscall number Jim Lieb
2013-10-16 22:01 ` [PATCH 3/3] switch_creds: Assign x86_64 syscall number for switch_creds Jim Lieb

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.