linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged
@ 2014-11-29 17:26 Andy Lutomirski
  2014-12-02 12:09 ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-11-29 17:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	kenton, Andy Lutomirski, stable

Classic unix permission checks have an interesting feature.  The
group permissions for a file can be set to less than the other
permissions on a file.  Occasionally this is used deliberately to
give a certain group of users fewer permissions than the default.

User namespaces break this usage.  Groups set in rgid or egid are
unaffected because an unprivileged user namespace creator can only
map a single group, so setresgid inside and outside the namespace
have the same effect.  However, an unprivileged user namespace
creator can currently use setgroups(2) to drop all supplementary
groups, so, if a supplementary group denies access to some resource,
user namespaces can be used to bypass that restriction.

To fix this issue, this introduces a new user namespace flag
USERNS_SETGROUPS_ALLOWED.  If that flag is not set, then
setgroups(2) will fail regardless of the caller's capabilities.

USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace.  By
default, if the writer of gid_map has CAP_SETGID in the parent
userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
USERNS_SETGROUPS_ALLOWED will be set in the child.  If the writer is
not so privileged, then writing to gid_map will fail unless the
writer adds "setgroups deny" to gid_map, in which case the check is
skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.

The full semantics are:

If "setgroups allow" is present or no explicit "setgroups" setting
is written to gid_map, then writing to gid_map will fail with -EPERM
unless the opener and writer have CAP_SETGID in the parent namespace
and the parent namespace has USERNS_SETGROUPS_ALLOWED.

If "setgroups deny" is present, then writing gid_map will work as
before, but USERNS_SETGROUPS_ALLOWED will remain cleared.  This will
result in processes in the userns that have CAP_SETGID to be
nontheless unable to use setgroups(2).  If this breaks something
inside the userns, then this is okay -- the userns creator
specifically requested this behavior.

While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
namespace creator has no supplementary groups, doing so could be
surprising and could have unpleasant interactions with setns(2).

Any application that uses newgidmap(1) should be unaffected by this
fix, but unprivileged users that create user namespaces to
manipulate mounts or sandbox themselves will break until they start
using "setgroups deny".

This should fix CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---

Unlike v1, this *will* break things like Sandstorm.  Fixing them will be
easy.  I agree that this will result in better long-term semantics, but
I'm not so happy about breaking working software.

If this is unpalatable, here's a different option: get rid of all these
permission checks and just change setgroups.  Specifically, make it so
that setgroups(2) in a userns will succeed but will silently refuse to
remove unmapped groups.

Changes from v1:
 - Userns flags are now properly atomic.
 - "setgroups allow" is now the default, so legacy unprivileged gid_map
   writers will start to fail.

 include/linux/user_namespace.h |  3 +++
 kernel/groups.c                |  3 +++
 kernel/user.c                  |  1 +
 kernel/user_namespace.c        | 42 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..0ae4a8c97165 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,8 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
 	} extent[UID_GID_MAP_MAX_EXTENTS];
 };
 
+#define USERNS_SETGROUPS_ALLOWED 0
+
 struct user_namespace {
 	struct uid_gid_map	uid_map;
 	struct uid_gid_map	gid_map;
@@ -27,6 +29,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..b5ec42423202 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 	struct group_info *group_info;
 	int retval;
 
+	if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
+		return -EPERM;
 	if (!ns_capable(current_user_ns(), CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..58fba8ea0845 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.flags = (1 << USERNS_SETGROUPS_ALLOWED),
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..1f63935483e9 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	char *kbuf, *pos, *next_line;
 	ssize_t ret = -EINVAL;
 
+	bool may_setgroups = false;
+	bool setgroups_requested = true;
+	bool seen_explicit_setgroups = false;
+
 	/*
 	 * The id_map_mutex serializes all writes to any given map.
 	 *
@@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
 		goto out;
 
+	if (map == &ns->gid_map) {
+		/*
+		 * Setgroups is permitted if the writer and the
+		 * parent ns are privileged.
+		 */
+		may_setgroups =
+			test_bit(USERNS_SETGROUPS_ALLOWED,
+				 &ns->parent->flags) &&
+			file_ns_capable(file, ns->parent, CAP_SETGID) &&
+			ns_capable(ns->parent, CAP_SETGID);
+	}
+
 	/* Get a buffer */
 	ret = -ENOMEM;
 	page = __get_free_page(GFP_TEMPORARY);
@@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 				next_line = NULL;
 		}
 
+		/* Is this line a gid_map option? */
+		if (map == &ns->gid_map) {
+			if (!strcmp(pos, "setgroups deny")) {
+				if (seen_explicit_setgroups)
+					goto out;
+				seen_explicit_setgroups = true;
+				setgroups_requested = false;
+				continue;
+			} else if (!strcmp(pos, "setgroups allow")) {
+				if (seen_explicit_setgroups)
+					goto out;
+				seen_explicit_setgroups = true;
+				setgroups_requested = true;
+				continue;
+			}
+		}
+
 		pos = skip_spaces(pos);
 		extent->first = simple_strtoul(pos, &pos, 10);
 		if (!isspace(*pos))
@@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 		extent->lower_first = lower_first;
 	}
 
+	/* Validate and install setgroups permission. */
+	if (map == &ns->gid_map && setgroups_requested) {
+		if (!may_setgroups) {
+			ret = -EPERM;
+			goto out;
+		}
+		set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+	}
+
 	/* Install the map */
 	memcpy(map->extent, new_map.extent,
 		new_map.nr_extents*sizeof(new_map.extent[0]));
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged
  2014-11-29 17:26 [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged Andy Lutomirski
@ 2014-12-02 12:09 ` Eric W. Biederman
  2014-12-02 18:53   ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 12:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	kenton, stable

Andy Lutomirski <luto@amacapital.net> writes:

> Classic unix permission checks have an interesting feature.  The
> group permissions for a file can be set to less than the other
> permissions on a file.  Occasionally this is used deliberately to
> give a certain group of users fewer permissions than the default.
>
> User namespaces break this usage.  Groups set in rgid or egid are
> unaffected because an unprivileged user namespace creator can only
> map a single group, so setresgid inside and outside the namespace
> have the same effect.  However, an unprivileged user namespace
> creator can currently use setgroups(2) to drop all supplementary
> groups, so, if a supplementary group denies access to some resource,
> user namespaces can be used to bypass that restriction.
>
> To fix this issue, this introduces a new user namespace flag
> USERNS_SETGROUPS_ALLOWED.  If that flag is not set, then
> setgroups(2) will fail regardless of the caller's capabilities.
>
> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace.  By
> default, if the writer of gid_map has CAP_SETGID in the parent
> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
> USERNS_SETGROUPS_ALLOWED will be set in the child.  If the writer is
> not so privileged, then writing to gid_map will fail unless the
> writer adds "setgroups deny" to gid_map, in which case the check is
> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>
> The full semantics are:
>
> If "setgroups allow" is present or no explicit "setgroups" setting
> is written to gid_map, then writing to gid_map will fail with -EPERM
> unless the opener and writer have CAP_SETGID in the parent namespace
> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>
> If "setgroups deny" is present, then writing gid_map will work as
> before, but USERNS_SETGROUPS_ALLOWED will remain cleared.  This will
> result in processes in the userns that have CAP_SETGID to be
> nontheless unable to use setgroups(2).  If this breaks something
> inside the userns, then this is okay -- the userns creator
> specifically requested this behavior.

I think we need to do this but I also think setgroups allow/deny
should be a separate knob than the uid/gid mapping.

If for no other reason than you missed at least two implementations of
setgroups, in your implementation.

> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
> namespace creator has no supplementary groups, doing so could be
> surprising and could have unpleasant interactions with setns(2).
>
> Any application that uses newgidmap(1) should be unaffected by this
> fix, but unprivileged users that create user namespaces to
> manipulate mounts or sandbox themselves will break until they start
> using "setgroups deny".
>
> This should fix CVE-2014-8989.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
> Unlike v1, this *will* break things like Sandstorm.  Fixing them will be
> easy.  I agree that this will result in better long-term semantics, but
> I'm not so happy about breaking working software.

I know what you mean.   One of the pieces of software broken by all of
this is my test to verify the remount semantics.   Which makes all of
this very unfortunate.

> If this is unpalatable, here's a different option: get rid of all these
> permission checks and just change setgroups.  Specifically, make it so
> that setgroups(2) in a userns will succeed but will silently refuse to
> remove unmapped groups.

Nope silently refusing to remove unmapped groups is not enough.  I can
make any gid in my supplemental groups my egid, it takes a sgid helper
application but I don't need any special privileges to create that.
Once that group is my egid I can map it.  Which means I could drop
any one group of my choosing without privielges.  Which out and out
breaks negative groups :(

I got to looking and I have a significant piece of code that all of this
breaks.

tools/testing/selftests/mount/unprivileged-remount-test.c

So I am extra motivated to figure out at find a way to preserve most of
the existing functionality.  My regression tests won't pass until I can
find something pallateable.

It is very annoying that every option I have considered so far breaks
something useful.

Having a write once setgroups disable, and the allowing unprivileged
mappings after that seems the most palatable option I have seen,
semantically.  Which means existing software that doesn't care about
setgroups can just add the disable code and then work otherwise
unmodified.

The other option that I have played with is forcing a set of groups
in setgroups if your user namespace was created without privilege,
that winds up requiring that verify you don't have any other
supplementary groups, and is generally messy whichever way I look at it.

*Pounds head on desk*

What a mess.

Eric

> Changes from v1:
>  - Userns flags are now properly atomic.
>  - "setgroups allow" is now the default, so legacy unprivileged gid_map
>    writers will start to fail.
>
>  include/linux/user_namespace.h |  3 +++
>  kernel/groups.c                |  3 +++
>  kernel/user.c                  |  1 +
>  kernel/user_namespace.c        | 42 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 49 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..0ae4a8c97165 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -17,6 +17,8 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
>  	} extent[UID_GID_MAP_MAX_EXTENTS];
>  };
>  
> +#define USERNS_SETGROUPS_ALLOWED 0
> +
>  struct user_namespace {
>  	struct uid_gid_map	uid_map;
>  	struct uid_gid_map	gid_map;
> @@ -27,6 +29,7 @@ struct user_namespace {
>  	kuid_t			owner;
>  	kgid_t			group;
>  	unsigned int		proc_inum;
> +	unsigned long		flags;
>  
>  	/* Register of per-UID persistent keyrings for this namespace */
>  #ifdef CONFIG_PERSISTENT_KEYRINGS
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 451698f86cfa..b5ec42423202 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
>  #include <linux/slab.h>
>  #include <linux/security.h>
>  #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
>  #include <asm/uaccess.h>
>  
>  /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>  	struct group_info *group_info;
>  	int retval;
>  
> +	if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
> +		return -EPERM;
>  	if (!ns_capable(current_user_ns(), CAP_SETGID))
>  		return -EPERM;
>  	if ((unsigned)gidsetsize > NGROUPS_MAX)
> diff --git a/kernel/user.c b/kernel/user.c
> index 4efa39350e44..58fba8ea0845 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
>  	.owner = GLOBAL_ROOT_UID,
>  	.group = GLOBAL_ROOT_GID,
>  	.proc_inum = PROC_USER_INIT_INO,
> +	.flags = (1 << USERNS_SETGROUPS_ALLOWED),
>  #ifdef CONFIG_PERSISTENT_KEYRINGS
>  	.persistent_keyring_register_sem =
>  	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index aa312b0dc3ec..1f63935483e9 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	char *kbuf, *pos, *next_line;
>  	ssize_t ret = -EINVAL;
>  
> +	bool may_setgroups = false;
> +	bool setgroups_requested = true;
> +	bool seen_explicit_setgroups = false;
> +
>  	/*
>  	 * The id_map_mutex serializes all writes to any given map.
>  	 *
> @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>  		goto out;
>  
> +	if (map == &ns->gid_map) {
> +		/*
> +		 * Setgroups is permitted if the writer and the
> +		 * parent ns are privileged.
> +		 */
> +		may_setgroups =
> +			test_bit(USERNS_SETGROUPS_ALLOWED,
> +				 &ns->parent->flags) &&
> +			file_ns_capable(file, ns->parent, CAP_SETGID) &&
> +			ns_capable(ns->parent, CAP_SETGID);
> +	}
> +
>  	/* Get a buffer */
>  	ret = -ENOMEM;
>  	page = __get_free_page(GFP_TEMPORARY);
> @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  				next_line = NULL;
>  		}
>  
> +		/* Is this line a gid_map option? */
> +		if (map == &ns->gid_map) {
> +			if (!strcmp(pos, "setgroups deny")) {
> +				if (seen_explicit_setgroups)
> +					goto out;
> +				seen_explicit_setgroups = true;
> +				setgroups_requested = false;
> +				continue;
> +			} else if (!strcmp(pos, "setgroups allow")) {
> +				if (seen_explicit_setgroups)
> +					goto out;
> +				seen_explicit_setgroups = true;
> +				setgroups_requested = true;
> +				continue;
> +			}
> +		}
> +
>  		pos = skip_spaces(pos);
>  		extent->first = simple_strtoul(pos, &pos, 10);
>  		if (!isspace(*pos))
> @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  		extent->lower_first = lower_first;
>  	}
>  
> +	/* Validate and install setgroups permission. */
> +	if (map == &ns->gid_map && setgroups_requested) {
> +		if (!may_setgroups) {
> +			ret = -EPERM;
> +			goto out;
> +		}
> +		set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
> +	}
> +
>  	/* Install the map */
>  	memcpy(map->extent, new_map.extent,
>  		new_map.nr_extents*sizeof(new_map.extent[0]));

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged
  2014-12-02 12:09 ` Eric W. Biederman
@ 2014-12-02 18:53   ` Andy Lutomirski
  2014-12-02 19:45     ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 18:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> Classic unix permission checks have an interesting feature.  The
>> group permissions for a file can be set to less than the other
>> permissions on a file.  Occasionally this is used deliberately to
>> give a certain group of users fewer permissions than the default.
>>
>> User namespaces break this usage.  Groups set in rgid or egid are
>> unaffected because an unprivileged user namespace creator can only
>> map a single group, so setresgid inside and outside the namespace
>> have the same effect.  However, an unprivileged user namespace
>> creator can currently use setgroups(2) to drop all supplementary
>> groups, so, if a supplementary group denies access to some resource,
>> user namespaces can be used to bypass that restriction.
>>
>> To fix this issue, this introduces a new user namespace flag
>> USERNS_SETGROUPS_ALLOWED.  If that flag is not set, then
>> setgroups(2) will fail regardless of the caller's capabilities.
>>
>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace.  By
>> default, if the writer of gid_map has CAP_SETGID in the parent
>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>> USERNS_SETGROUPS_ALLOWED will be set in the child.  If the writer is
>> not so privileged, then writing to gid_map will fail unless the
>> writer adds "setgroups deny" to gid_map, in which case the check is
>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>
>> The full semantics are:
>>
>> If "setgroups allow" is present or no explicit "setgroups" setting
>> is written to gid_map, then writing to gid_map will fail with -EPERM
>> unless the opener and writer have CAP_SETGID in the parent namespace
>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>
>> If "setgroups deny" is present, then writing gid_map will work as
>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared.  This will
>> result in processes in the userns that have CAP_SETGID to be
>> nontheless unable to use setgroups(2).  If this breaks something
>> inside the userns, then this is okay -- the userns creator
>> specifically requested this behavior.
>
> I think we need to do this but I also think setgroups allow/deny
> should be a separate knob than the uid/gid mapping.

Yeah.  It should be readable, too.

>
> If for no other reason than you missed at least two implementations of
> setgroups, in your implementation.

I clearly didn't grep hard enough.  Grr.

>
>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>> namespace creator has no supplementary groups, doing so could be
>> surprising and could have unpleasant interactions with setns(2).
>>
>> Any application that uses newgidmap(1) should be unaffected by this
>> fix, but unprivileged users that create user namespaces to
>> manipulate mounts or sandbox themselves will break until they start
>> using "setgroups deny".
>>
>> This should fix CVE-2014-8989.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>> ---
>>
>> Unlike v1, this *will* break things like Sandstorm.  Fixing them will be
>> easy.  I agree that this will result in better long-term semantics, but
>> I'm not so happy about breaking working software.
>
> I know what you mean.   One of the pieces of software broken by all of
> this is my test to verify the remount semantics.   Which makes all of
> this very unfortunate.
>
>> If this is unpalatable, here's a different option: get rid of all these
>> permission checks and just change setgroups.  Specifically, make it so
>> that setgroups(2) in a userns will succeed but will silently refuse to
>> remove unmapped groups.
>
> Nope silently refusing to remove unmapped groups is not enough.  I can
> make any gid in my supplemental groups my egid, it takes a sgid helper
> application but I don't need any special privileges to create that.
> Once that group is my egid I can map it.  Which means I could drop
> any one group of my choosing without privielges.  Which out and out
> breaks negative groups :(

Whoops, right.  And you can, indeed, have egid match one of your
supplementary groups.

>
> I got to looking and I have a significant piece of code that all of this
> breaks.
>
> tools/testing/selftests/mount/unprivileged-remount-test.c
>
> So I am extra motivated to figure out at find a way to preserve most of
> the existing functionality.  My regression tests won't pass until I can
> find something pallateable.
>
> It is very annoying that every option I have considered so far breaks
> something useful.
>
> Having a write once setgroups disable, and the allowing unprivileged
> mappings after that seems the most palatable option I have seen,
> semantically.  Which means existing software that doesn't care about
> setgroups can just add the disable code and then work otherwise
> unmodified.
>
> The other option that I have played with is forcing a set of groups
> in setgroups if your user namespace was created without privilege,
> that winds up requiring that verify you don't have any other
> supplementary groups, and is generally messy whichever way I look at it.

How bad would the automatic selection of setgroups behavior really be?

Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
"deny", or "auto".  It starts out as "auto" (or "deny" if it's set to
"deny" in the parent).  Once any of the maps have been set,
userns_options becomes readonly.  If you try to write to gid_map when
setgroups == auto, then it switches to "allow" or "deny" depending on
whether the writer has privilege.

This is nasty magical behavior, but it should DTRT for existing users,
and everyone can be updated to set the value explicitly.

FWIW, it might also make sense to move all of this stuff into
/proc/PID/userns.  There may be races in which a setuid or otherwise
privileged helper pokes at more than one userns file but actually
modifies different namespaces each time.  I don't know whether these
races matter.  uid_map, gid_map, and projid_map could be symlinks.

--Andy

>
> *Pounds head on desk*
>
> What a mess.
>
> Eric
>
>> Changes from v1:
>>  - Userns flags are now properly atomic.
>>  - "setgroups allow" is now the default, so legacy unprivileged gid_map
>>    writers will start to fail.
>>
>>  include/linux/user_namespace.h |  3 +++
>>  kernel/groups.c                |  3 +++
>>  kernel/user.c                  |  1 +
>>  kernel/user_namespace.c        | 42 ++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 49 insertions(+)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..0ae4a8c97165 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -17,6 +17,8 @@ struct uid_gid_map {        /* 64 bytes -- 1 cache line */
>>       } extent[UID_GID_MAP_MAX_EXTENTS];
>>  };
>>
>> +#define USERNS_SETGROUPS_ALLOWED 0
>> +
>>  struct user_namespace {
>>       struct uid_gid_map      uid_map;
>>       struct uid_gid_map      gid_map;
>> @@ -27,6 +29,7 @@ struct user_namespace {
>>       kuid_t                  owner;
>>       kgid_t                  group;
>>       unsigned int            proc_inum;
>> +     unsigned long           flags;
>>
>>       /* Register of per-UID persistent keyrings for this namespace */
>>  #ifdef CONFIG_PERSISTENT_KEYRINGS
>> diff --git a/kernel/groups.c b/kernel/groups.c
>> index 451698f86cfa..b5ec42423202 100644
>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -6,6 +6,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/security.h>
>>  #include <linux/syscalls.h>
>> +#include <linux/user_namespace.h>
>>  #include <asm/uaccess.h>
>>
>>  /* init to 2 - one for init_task, one to ensure it is never freed */
>> @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>>       struct group_info *group_info;
>>       int retval;
>>
>> +     if (!test_bit(USERNS_SETGROUPS_ALLOWED, &current_user_ns()->flags))
>> +             return -EPERM;
>>       if (!ns_capable(current_user_ns(), CAP_SETGID))
>>               return -EPERM;
>>       if ((unsigned)gidsetsize > NGROUPS_MAX)
>> diff --git a/kernel/user.c b/kernel/user.c
>> index 4efa39350e44..58fba8ea0845 100644
>> --- a/kernel/user.c
>> +++ b/kernel/user.c
>> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
>>       .owner = GLOBAL_ROOT_UID,
>>       .group = GLOBAL_ROOT_GID,
>>       .proc_inum = PROC_USER_INIT_INO,
>> +     .flags = (1 << USERNS_SETGROUPS_ALLOWED),
>>  #ifdef CONFIG_PERSISTENT_KEYRINGS
>>       .persistent_keyring_register_sem =
>>       __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index aa312b0dc3ec..1f63935483e9 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>>       char *kbuf, *pos, *next_line;
>>       ssize_t ret = -EINVAL;
>>
>> +     bool may_setgroups = false;
>> +     bool setgroups_requested = true;
>> +     bool seen_explicit_setgroups = false;
>> +
>>       /*
>>        * The id_map_mutex serializes all writes to any given map.
>>        *
>> @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>>       if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>               goto out;
>>
>> +     if (map == &ns->gid_map) {
>> +             /*
>> +              * Setgroups is permitted if the writer and the
>> +              * parent ns are privileged.
>> +              */
>> +             may_setgroups =
>> +                     test_bit(USERNS_SETGROUPS_ALLOWED,
>> +                              &ns->parent->flags) &&
>> +                     file_ns_capable(file, ns->parent, CAP_SETGID) &&
>> +                     ns_capable(ns->parent, CAP_SETGID);
>> +     }
>> +
>>       /* Get a buffer */
>>       ret = -ENOMEM;
>>       page = __get_free_page(GFP_TEMPORARY);
>> @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>>                               next_line = NULL;
>>               }
>>
>> +             /* Is this line a gid_map option? */
>> +             if (map == &ns->gid_map) {
>> +                     if (!strcmp(pos, "setgroups deny")) {
>> +                             if (seen_explicit_setgroups)
>> +                                     goto out;
>> +                             seen_explicit_setgroups = true;
>> +                             setgroups_requested = false;
>> +                             continue;
>> +                     } else if (!strcmp(pos, "setgroups allow")) {
>> +                             if (seen_explicit_setgroups)
>> +                                     goto out;
>> +                             seen_explicit_setgroups = true;
>> +                             setgroups_requested = true;
>> +                             continue;
>> +                     }
>> +             }
>> +
>>               pos = skip_spaces(pos);
>>               extent->first = simple_strtoul(pos, &pos, 10);
>>               if (!isspace(*pos))
>> @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>>               extent->lower_first = lower_first;
>>       }
>>
>> +     /* Validate and install setgroups permission. */
>> +     if (map == &ns->gid_map && setgroups_requested) {
>> +             if (!may_setgroups) {
>> +                     ret = -EPERM;
>> +                     goto out;
>> +             }
>> +             set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
>> +     }
>> +
>>       /* Install the map */
>>       memcpy(map->extent, new_map.extent,
>>               new_map.nr_extents*sizeof(new_map.extent[0]));



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged
  2014-12-02 18:53   ` Andy Lutomirski
@ 2014-12-02 19:45     ` Eric W. Biederman
  2014-12-02 20:13       ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 19:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> Classic unix permission checks have an interesting feature.  The
>>> group permissions for a file can be set to less than the other
>>> permissions on a file.  Occasionally this is used deliberately to
>>> give a certain group of users fewer permissions than the default.
>>>
>>> User namespaces break this usage.  Groups set in rgid or egid are
>>> unaffected because an unprivileged user namespace creator can only
>>> map a single group, so setresgid inside and outside the namespace
>>> have the same effect.  However, an unprivileged user namespace
>>> creator can currently use setgroups(2) to drop all supplementary
>>> groups, so, if a supplementary group denies access to some resource,
>>> user namespaces can be used to bypass that restriction.
>>>
>>> To fix this issue, this introduces a new user namespace flag
>>> USERNS_SETGROUPS_ALLOWED.  If that flag is not set, then
>>> setgroups(2) will fail regardless of the caller's capabilities.
>>>
>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace.  By
>>> default, if the writer of gid_map has CAP_SETGID in the parent
>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>>> USERNS_SETGROUPS_ALLOWED will be set in the child.  If the writer is
>>> not so privileged, then writing to gid_map will fail unless the
>>> writer adds "setgroups deny" to gid_map, in which case the check is
>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>>
>>> The full semantics are:
>>>
>>> If "setgroups allow" is present or no explicit "setgroups" setting
>>> is written to gid_map, then writing to gid_map will fail with -EPERM
>>> unless the opener and writer have CAP_SETGID in the parent namespace
>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>>
>>> If "setgroups deny" is present, then writing gid_map will work as
>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared.  This will
>>> result in processes in the userns that have CAP_SETGID to be
>>> nontheless unable to use setgroups(2).  If this breaks something
>>> inside the userns, then this is okay -- the userns creator
>>> specifically requested this behavior.
>>
>> I think we need to do this but I also think setgroups allow/deny
>> should be a separate knob than the uid/gid mapping.
>
> Yeah.  It should be readable, too.
>
>>
>> If for no other reason than you missed at least two implementations of
>> setgroups, in your implementation.
>
> I clearly didn't grep hard enough.  Grr.
>
>>
>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>>> namespace creator has no supplementary groups, doing so could be
>>> surprising and could have unpleasant interactions with setns(2).
>>>
>>> Any application that uses newgidmap(1) should be unaffected by this
>>> fix, but unprivileged users that create user namespaces to
>>> manipulate mounts or sandbox themselves will break until they start
>>> using "setgroups deny".
>>>
>>> This should fix CVE-2014-8989.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>>> ---
>>>
>>> Unlike v1, this *will* break things like Sandstorm.  Fixing them will be
>>> easy.  I agree that this will result in better long-term semantics, but
>>> I'm not so happy about breaking working software.
>>
>> I know what you mean.   One of the pieces of software broken by all of
>> this is my test to verify the remount semantics.   Which makes all of
>> this very unfortunate.
>>
>>> If this is unpalatable, here's a different option: get rid of all these
>>> permission checks and just change setgroups.  Specifically, make it so
>>> that setgroups(2) in a userns will succeed but will silently refuse to
>>> remove unmapped groups.
>>
>> Nope silently refusing to remove unmapped groups is not enough.  I can
>> make any gid in my supplemental groups my egid, it takes a sgid helper
>> application but I don't need any special privileges to create that.
>> Once that group is my egid I can map it.  Which means I could drop
>> any one group of my choosing without privielges.  Which out and out
>> breaks negative groups :(
>
> Whoops, right.  And you can, indeed, have egid match one of your
> supplementary groups.
>
>>
>> I got to looking and I have a significant piece of code that all of this
>> breaks.
>>
>> tools/testing/selftests/mount/unprivileged-remount-test.c
>>
>> So I am extra motivated to figure out at find a way to preserve most of
>> the existing functionality.  My regression tests won't pass until I can
>> find something pallateable.
>>
>> It is very annoying that every option I have considered so far breaks
>> something useful.
>>
>> Having a write once setgroups disable, and the allowing unprivileged
>> mappings after that seems the most palatable option I have seen,
>> semantically.  Which means existing software that doesn't care about
>> setgroups can just add the disable code and then work otherwise
>> unmodified.
>>
>> The other option that I have played with is forcing a set of groups
>> in setgroups if your user namespace was created without privilege,
>> that winds up requiring that verify you don't have any other
>> supplementary groups, and is generally messy whichever way I look at it.
>
> How bad would the automatic selection of setgroups behavior really be?
>
> Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
> "deny", or "auto".  It starts out as "auto" (or "deny" if it's set to
> "deny" in the parent).  Once any of the maps have been set,
> userns_options becomes readonly.  If you try to write to gid_map when
> setgroups == auto, then it switches to "allow" or "deny" depending on
> whether the writer has privilege.
>
> This is nasty magical behavior, but it should DTRT for existing users,
> and everyone can be updated to set the value explicitly.

Rarely is everything updated unless there is a requirement for an
update.

For my code that cares an update is necessary anyway as it contains
a gratuitous setgroups(0, NULL). 

Since we have to break applications breaking them loud and clear and
letting them set the flat to recover (if possible) seems the best we can
do.  That at least allows someone to ask if they depend on setgroups or
init_groups.

> FWIW, it might also make sense to move all of this stuff into
> /proc/PID/userns.  There may be races in which a setuid or otherwise
> privileged helper pokes at more than one userns file but actually
> modifies different namespaces each time.  I don't know whether these
> races matter.  uid_map, gid_map, and projid_map could be symlinks.

I don't see how moving these files as removing any races.

Eric


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged
  2014-12-02 19:45     ` Eric W. Biederman
@ 2014-12-02 20:13       ` Andy Lutomirski
  2014-12-02 20:25         ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 20:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 11:45 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>
>>>> Classic unix permission checks have an interesting feature.  The
>>>> group permissions for a file can be set to less than the other
>>>> permissions on a file.  Occasionally this is used deliberately to
>>>> give a certain group of users fewer permissions than the default.
>>>>
>>>> User namespaces break this usage.  Groups set in rgid or egid are
>>>> unaffected because an unprivileged user namespace creator can only
>>>> map a single group, so setresgid inside and outside the namespace
>>>> have the same effect.  However, an unprivileged user namespace
>>>> creator can currently use setgroups(2) to drop all supplementary
>>>> groups, so, if a supplementary group denies access to some resource,
>>>> user namespaces can be used to bypass that restriction.
>>>>
>>>> To fix this issue, this introduces a new user namespace flag
>>>> USERNS_SETGROUPS_ALLOWED.  If that flag is not set, then
>>>> setgroups(2) will fail regardless of the caller's capabilities.
>>>>
>>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace.  By
>>>> default, if the writer of gid_map has CAP_SETGID in the parent
>>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>>>> USERNS_SETGROUPS_ALLOWED will be set in the child.  If the writer is
>>>> not so privileged, then writing to gid_map will fail unless the
>>>> writer adds "setgroups deny" to gid_map, in which case the check is
>>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>>>
>>>> The full semantics are:
>>>>
>>>> If "setgroups allow" is present or no explicit "setgroups" setting
>>>> is written to gid_map, then writing to gid_map will fail with -EPERM
>>>> unless the opener and writer have CAP_SETGID in the parent namespace
>>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>>>
>>>> If "setgroups deny" is present, then writing gid_map will work as
>>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared.  This will
>>>> result in processes in the userns that have CAP_SETGID to be
>>>> nontheless unable to use setgroups(2).  If this breaks something
>>>> inside the userns, then this is okay -- the userns creator
>>>> specifically requested this behavior.
>>>
>>> I think we need to do this but I also think setgroups allow/deny
>>> should be a separate knob than the uid/gid mapping.
>>
>> Yeah.  It should be readable, too.
>>
>>>
>>> If for no other reason than you missed at least two implementations of
>>> setgroups, in your implementation.
>>
>> I clearly didn't grep hard enough.  Grr.
>>
>>>
>>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>>>> namespace creator has no supplementary groups, doing so could be
>>>> surprising and could have unpleasant interactions with setns(2).
>>>>
>>>> Any application that uses newgidmap(1) should be unaffected by this
>>>> fix, but unprivileged users that create user namespaces to
>>>> manipulate mounts or sandbox themselves will break until they start
>>>> using "setgroups deny".
>>>>
>>>> This should fix CVE-2014-8989.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>>>> ---
>>>>
>>>> Unlike v1, this *will* break things like Sandstorm.  Fixing them will be
>>>> easy.  I agree that this will result in better long-term semantics, but
>>>> I'm not so happy about breaking working software.
>>>
>>> I know what you mean.   One of the pieces of software broken by all of
>>> this is my test to verify the remount semantics.   Which makes all of
>>> this very unfortunate.
>>>
>>>> If this is unpalatable, here's a different option: get rid of all these
>>>> permission checks and just change setgroups.  Specifically, make it so
>>>> that setgroups(2) in a userns will succeed but will silently refuse to
>>>> remove unmapped groups.
>>>
>>> Nope silently refusing to remove unmapped groups is not enough.  I can
>>> make any gid in my supplemental groups my egid, it takes a sgid helper
>>> application but I don't need any special privileges to create that.
>>> Once that group is my egid I can map it.  Which means I could drop
>>> any one group of my choosing without privielges.  Which out and out
>>> breaks negative groups :(
>>
>> Whoops, right.  And you can, indeed, have egid match one of your
>> supplementary groups.
>>
>>>
>>> I got to looking and I have a significant piece of code that all of this
>>> breaks.
>>>
>>> tools/testing/selftests/mount/unprivileged-remount-test.c
>>>
>>> So I am extra motivated to figure out at find a way to preserve most of
>>> the existing functionality.  My regression tests won't pass until I can
>>> find something pallateable.
>>>
>>> It is very annoying that every option I have considered so far breaks
>>> something useful.
>>>
>>> Having a write once setgroups disable, and the allowing unprivileged
>>> mappings after that seems the most palatable option I have seen,
>>> semantically.  Which means existing software that doesn't care about
>>> setgroups can just add the disable code and then work otherwise
>>> unmodified.
>>>
>>> The other option that I have played with is forcing a set of groups
>>> in setgroups if your user namespace was created without privilege,
>>> that winds up requiring that verify you don't have any other
>>> supplementary groups, and is generally messy whichever way I look at it.
>>
>> How bad would the automatic selection of setgroups behavior really be?
>>
>> Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
>> "deny", or "auto".  It starts out as "auto" (or "deny" if it's set to
>> "deny" in the parent).  Once any of the maps have been set,
>> userns_options becomes readonly.  If you try to write to gid_map when
>> setgroups == auto, then it switches to "allow" or "deny" depending on
>> whether the writer has privilege.
>>
>> This is nasty magical behavior, but it should DTRT for existing users,
>> and everyone can be updated to set the value explicitly.
>
> Rarely is everything updated unless there is a requirement for an
> update.
>
> For my code that cares an update is necessary anyway as it contains
> a gratuitous setgroups(0, NULL).
>
> Since we have to break applications breaking them loud and clear and
> letting them set the flat to recover (if possible) seems the best we can
> do.  That at least allows someone to ask if they depend on setgroups or
> init_groups.

Fair enough.

Any thoughts on what the API should be for v3?

>
>> FWIW, it might also make sense to move all of this stuff into
>> /proc/PID/userns.  There may be races in which a setuid or otherwise
>> privileged helper pokes at more than one userns file but actually
>> modifies different namespaces each time.  I don't know whether these
>> races matter.  uid_map, gid_map, and projid_map could be symlinks.
>
> I don't see how moving these files as removing any races.

It helps if you use openat to open the userns directory and of the
/proc infrastructure is smart enough to make that work.

Admittedly, I don't actually see a dangerous race right now.

--Andy

>
> Eric
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 20:13       ` Andy Lutomirski
@ 2014-12-02 20:25         ` Eric W. Biederman
  2014-12-02 20:28           ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
  2014-12-02 20:58           ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski
  0 siblings, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 20:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


Classic unix permission checks have an interesting feature, the group
permissions for a file can be set to less than the other permissions
on a file.  Occassionally this is used deliberately to give a certain
group of users fewer permissions than the default.

Overlooking negative groups has resulted in the permission checks for
setting up a group mapping in a user namespace to be too lax.  Tighten
the permission checks in new_idmap_permitted to ensure that mapping
uids and gids into user namespaces without privilege will not result
in new combinations of credentials being available to the users.

When setting mappings without privilege only the creator of the user
namespace is interesting as all other users that have CAP_SETUID over
the user namespace will also have CAP_SETUID over the user namespaces
parent.  So the scope of the unprivileged check is reduced to just
the case where cred->euid is the namespace creator.

For setting a uid mapping without privilege only euid is considered as
setresuid can set uid, suid and fsuid from euid without privielege
making any combination of uids possible with user namespaces already
possible without them.

For now seeting a gid mapping without privilege is removed.  The only
possible set of credentials that would be safe without a gid mapping
(egid without any supplementary groups) just doesn't happen in practice
so would simply lead to unused untested code.

setgroups is modified to fail not only when the group ids do not
map but also when there are no gid mappings at all, preventing
setgroups(0, NULL) from succeeding when gid mappings have not been
established.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivileged-remount-test.c

Most of the removed functionality will be added back with the
addition of a one way knob to disable setgroups.  Once setgroups
is disabled setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map with
privilege this change will have no effect on them.

This should fix CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/s390/kernel/compat_linux.c |  5 ++++-
 include/linux/user_namespace.h  | 10 ++++++++++
 kernel/groups.c                 |  5 ++++-
 kernel/uid16.c                  |  5 ++++-
 kernel/user_namespace.c         | 17 ++++++++++-------
 5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ca38139423ae..21c91feeca2d 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -49,6 +49,7 @@
 #include <linux/fadvise.h>
 #include <linux/ipc.h>
 #include <linux/slab.h>
+#include <linux/user_namespace.h>
 
 #include <asm/types.h>
 #include <asm/uaccess.h>
@@ -246,10 +247,12 @@ out:
 
 COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int retval;
 
-	if (!capable(CAP_SETGID))
+	if (!gid_mapping_possible(user_ns) ||
+	    !capable(CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..26d5e8f5db97 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -46,6 +46,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
 	return ns;
 }
 
+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+	return ns->gid_map.nr_extents != 0;
+}
+
 extern int create_user_ns(struct cred *new);
 extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred);
 extern void free_user_ns(struct user_namespace *ns);
@@ -70,6 +75,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
 	return &init_user_ns;
 }
 
+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+	return true;
+}
+
 static inline int create_user_ns(struct cred *new)
 {
 	return -EINVAL;
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..b9a6a5c7e100 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -220,10 +221,12 @@ out:
 
 SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!gid_mapping_possible(user_ns) ||
+	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bbbceff..602c7de2aa11 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -13,6 +13,7 @@
 #include <linux/highuid.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 
 #include <asm/uaccess.h>
 
@@ -173,10 +174,12 @@ out:
 
 SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!gid_mapping_possible(user_ns) ||
+	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..51d65b444951 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
-	/* Allow mapping to your own filesystem ids */
-	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+	const struct cred *cred = file->f_cred;
+
+	/* Allow a mapping without capabilities when allowing the root
+	 * of the user namespace capabilities restricted to that id
+	 * will not change the set of credentials available to that
+	 * user.
+	 */
+	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->fsuid))
-				return true;
-		} else if (cap_setid == CAP_SETGID) {
-			kgid_t gid = make_kgid(ns->parent, id);
-			if (gid_eq(gid, file->f_cred->fsgid))
+			if (uid_eq(uid, cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 20:25         ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman
@ 2014-12-02 20:28           ` Eric W. Biederman
  2014-12-02 20:30             ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman
  2014-12-02 21:05             ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski
  2014-12-02 20:58           ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski
  1 sibling, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 20:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of 0 means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of 1 means the segtoups system call is enabled.

- Descedent user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls
  currently do not pass in a struct file so file_ns_capable
  is unusable.

- Update new_idmap_permitted to allow unprivileged users to
  establish a mapping of their own gid, as such mappings
  can no longer allow dropping of supplemental groups.

This set of changes restores as much as possible the functionality
that was lost when new_idmap_permitted was modified to not allow
mappinges to be established without privilege.

As this fixes a regression from: "userns: Avoid problems with negative groups"
it is probably a candidate for a backport.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

This patch still needs a little bit of love.
I need to take a hard look at the interaction of barriers and atomic ops,
and it seems I have at least one comment fix that needs to move elsewhere.

But this should be enough to move the conversation forward.

 arch/s390/kernel/compat_linux.c |  1 +
 fs/proc/base.c                  | 31 ++++++++++----
 include/linux/user_namespace.h  |  3 ++
 kernel/groups.c                 |  1 +
 kernel/uid16.c                  |  1 +
 kernel/user.c                   |  1 +
 kernel/user_namespace.c         | 95 ++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 21c91feeca2d..6d0ee1b089fb 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
 	int retval;
 
 	if (!gid_mapping_possible(user_ns) ||
+	    !atomic_read(&user_ns->setgroups_allowed) ||
 	    !capable(CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
 #endif /* CONFIG_TASK_IO_ACCOUNTING */
 
 #ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
 	const struct seq_operations *seq_ops)
 {
 	struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
 	return ret;
 }
 
-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *seq = file->private_data;
 	struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)
 
 static int proc_uid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+	return proc_userns_open(inode, file, &proc_uid_seq_operations);
 }
 
 static int proc_gid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+	return proc_userns_open(inode, file, &proc_gid_seq_operations);
 }
 
 static int proc_projid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+	return proc_userns_open(inode, file, &proc_projid_seq_operations);
 }
 
 static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
 	.write		= proc_uid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
 	.write		= proc_gid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
 	.write		= proc_projid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+	return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+	.open		= proc_setgroups_open,
+	.write		= proc_setgroups_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_userns_release,
 };
 #endif /* CONFIG_USER_NS */
 
@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	REG("timers",	  S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 };
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 26d5e8f5db97..1e8cb168b1d0 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,6 +27,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	atomic_t		setgroups_allowed;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -65,9 +66,11 @@ struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
 extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index b9a6a5c7e100..467ae954e859 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -226,6 +226,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 	int retval;
 
 	if (!gid_mapping_possible(user_ns) ||
+	    !atomic_read(&user_ns->setgroups_allowed) ||
 	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602c7de2aa11..096962fa1975 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -179,6 +179,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 	int retval;
 
 	if (!gid_mapping_possible(user_ns) ||
+	    !atomic_read(&user_ns->setgroups_allowed) ||
 	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..0d78759f7dbe 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.setgroups_allowed	= ATOMIC_INIT(1),
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 51d65b444951..521c8d53ee17 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -98,6 +98,7 @@ int create_user_ns(struct cred *new)
 	ns->level = parent_ns->level + 1;
 	ns->owner = owner;
 	ns->group = group;
+	atomic_set(&ns->setgroups_allowed, atomic_read(&parent_ns->setgroups_allowed));
 
 	set_cred_user_ns(new, ns);
 
@@ -640,7 +641,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	if (!page)
 		goto out;
 
-	/* Only allow <= page size writes at the beginning of the file */
+	/* Only allow < page size writes at the beginning of the file */
 	ret = -EINVAL;
 	if ((*ppos != 0) || (count >= PAGE_SIZE))
 		goto out;
@@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, cred->euid))
 				return true;
+		} else if (cap_setid == CAP_SETGID) {
+			kgid_t gid = make_kgid(ns->parent, id);
+			if (!atomic_read(&ns->setgroups_allowed) &&
+			    gid_eq(gid, cred->egid))
+				return true;
 		}
 	}
 
@@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return (*ppos == 0) ?  ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+	++*ppos;
+	return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+
+	seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
+	return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+	.start	= setgroups_m_start,
+	.stop = setgroups_m_stop,
+	.next = setgroups_m_next,
+	.show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	char kbuf[3];
+	int setgroups_allowed;
+	ssize_t ret;
+
+	ret = -EPERM;
+	if (!file_ns_capable(file, ns, CAP_SETGID))
+		goto out;
+
+	/* Only allow a very narrow range of strings to be written */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
+		goto out;
+
+	/* What was written? */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+
+	/* What is being requested? */
+	ret = -EINVAL;
+	if (kbuf[0] == '0')
+		setgroups_allowed = 0;
+	else if (kbuf[0] == '1')
+		setgroups_allowed = 1;
+	else
+		goto out;
+
+	/* Allow a trailing newline */
+	ret = -EINVAL;
+	if ((count == 2) && (kbuf[1] != '\n'))
+		goto out;
+
+
+	if (setgroups_allowed) {
+		ret = -EINVAL;
+		if (atomic_read(&ns->setgroups_allowed) == 0)
+			goto out;
+	} else {
+		atomic_set(&ns->setgroups_allowed, 0);
+		/* sigh memory barriers! */
+	}
+
+	/* Report a successful write */
+	*ppos = count;
+	ret = count;
+out:
+	return ret;
+}
+
 static void *userns_get(struct task_struct *task)
 {
 	struct user_namespace *user_ns;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests
  2014-12-02 20:28           ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-02 20:30             ` Eric W. Biederman
  2014-12-02 21:05             ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski
  1 sibling, 0 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 20:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


A security fix in caused the way the unprivileged remount tests were
using user namespaces to break.  Tweak the way user namespaces are
being used so the test works again.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

This is what it takes to fix a broken application, in it's full glory.
This fix works even if new functionality does not exist.

 tools/testing/selftests/mount/unprivileged-remount-test.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 9669d375625a..d47227494137 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -144,13 +144,12 @@ static void create_and_enter_userns(void)
 			strerror(errno));
 	}
 
+	if (access("/proc/self/setgroups", F_OK) == 0) {
+		write_file("/proc/self/setgroups", "0");
+	}
 	write_file("/proc/self/uid_map", "0 %d 1", uid);
 	write_file("/proc/self/gid_map", "0 %d 1", gid);
 
-	if (setgroups(0, NULL) != 0) {
-		die("setgroups failed: %s\n",
-			strerror(errno));
-	}
 	if (setgid(0) != 0) {
 		die ("setgid(0) failed %s\n",
 			strerror(errno));
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 20:25         ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman
  2014-12-02 20:28           ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-02 20:58           ` Andy Lutomirski
  2014-12-02 21:26             ` Eric W. Biederman
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 20:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Classic unix permission checks have an interesting feature, the group
> permissions for a file can be set to less than the other permissions
> on a file.  Occassionally this is used deliberately to give a certain
> group of users fewer permissions than the default.
>
> Overlooking negative groups has resulted in the permission checks for
> setting up a group mapping in a user namespace to be too lax.  Tighten
> the permission checks in new_idmap_permitted to ensure that mapping
> uids and gids into user namespaces without privilege will not result
> in new combinations of credentials being available to the users.
>
> When setting mappings without privilege only the creator of the user
> namespace is interesting as all other users that have CAP_SETUID over
> the user namespace will also have CAP_SETUID over the user namespaces
> parent.  So the scope of the unprivileged check is reduced to just
> the case where cred->euid is the namespace creator.
>
> For setting a uid mapping without privilege only euid is considered as
> setresuid can set uid, suid and fsuid from euid without privielege
> making any combination of uids possible with user namespaces already
> possible without them.
>
> For now seeting a gid mapping without privilege is removed.  The only
> possible set of credentials that would be safe without a gid mapping
> (egid without any supplementary groups) just doesn't happen in practice
> so would simply lead to unused untested code.
>
> setgroups is modified to fail not only when the group ids do not
> map but also when there are no gid mappings at all, preventing
> setgroups(0, NULL) from succeeding when gid mappings have not been
> established.
>
> For a small class of applications this change breaks userspace
> and removes useful functionality.  This small class of applications
> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>
> Most of the removed functionality will be added back with the
> addition of a one way knob to disable setgroups.  Once setgroups
> is disabled setting the gid_map becomes as safe as setting the uid_map.
>
> For more common applications that set the uid_map and the gid_map with
> privilege this change will have no effect on them.
>
> This should fix CVE-2014-8989.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---

>
> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
> +{
> +       return ns->gid_map.nr_extents != 0;
> +}
> +

Can you rename this to userns_may_setgroups or something like that?
To me, gid_mapping_possible sounds like you're allowed to map gids,
which sounds like the opposite condition, and it doesn't explain what
the point is.


> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index aa312b0dc3ec..51d65b444951 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>                                 struct user_namespace *ns, int cap_setid,
>                                 struct uid_gid_map *new_map)
>  {
> -       /* Allow mapping to your own filesystem ids */
> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
> +       const struct cred *cred = file->f_cred;
> +
> +       /* Allow a mapping without capabilities when allowing the root
> +        * of the user namespace capabilities restricted to that id
> +        * will not change the set of credentials available to that
> +        * user.
> +        */
> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
> +           uid_eq(ns->owner, cred->euid)) {

What's uid_eq(ns->owner, cred->euid)) for?  This should already be covered by:

    if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
        goto out;

(except that I don't know why cap_valid(cap_setid) is checked -- this
ought to be enforced for projid_map, too, right?)

>                 u32 id = new_map->extent[0].lower_first;
>                 if (cap_setid == CAP_SETUID) {
>                         kuid_t uid = make_kuid(ns->parent, id);
> -                       if (uid_eq(uid, file->f_cred->fsuid))
> -                               return true;
> -               } else if (cap_setid == CAP_SETGID) {
> -                       kgid_t gid = make_kgid(ns->parent, id);
> -                       if (gid_eq(gid, file->f_cred->fsgid))
> +                       if (uid_eq(uid, cred->euid))

Why'd you change this from fsuid to euid?

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 20:28           ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
  2014-12-02 20:30             ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman
@ 2014-12-02 21:05             ` Andy Lutomirski
  2014-12-02 21:45               ` Eric W. Biederman
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 21:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups

Can you rename this to something clearer, e.g. userns_setgroups_mode?

>
>   A value of 0 means the setgroups system call is disabled in the
>   current processes user namespace and can not be enabled in the
>   future in this user namespace.
>
>   A value of 1 means the segtoups system call is enabled.

Would it make more sense to put strings like "allow" and "deny" in
here?  That way, future extensions could add additional values.

> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index 21c91feeca2d..6d0ee1b089fb 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>         int retval;
>
>         if (!gid_mapping_possible(user_ns) ||
> +           !atomic_read(&user_ns->setgroups_allowed) ||
>             !capable(CAP_SETGID))
>                 return -EPERM;

This is now incomprehensible because of the gid_mapping_possible
thing.  If you renamed gid_mapping_possible to
userns_setgroup_allowed, then this could be added to the
implementation, and this would all make sense (not to mention avoiding
duplicating this thing).

> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>                         kuid_t uid = make_kuid(ns->parent, id);
>                         if (uid_eq(uid, cred->euid))
>                                 return true;
> +               } else if (cap_setid == CAP_SETGID) {
> +                       kgid_t gid = make_kgid(ns->parent, id);
> +                       if (!atomic_read(&ns->setgroups_allowed) &&
> +                           gid_eq(gid, cred->egid))
> +                               return true;

I still don't see why egid is any better than fsgid here.

>                 }
>         }
>
> @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
>         return false;
>  }
>
> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +       struct user_namespace *ns = seq->private;
> +
> +       return (*ppos == 0) ?  ns : NULL;
> +}
> +
> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
> +{
> +       ++*ppos;
> +       return NULL;
> +}
> +
> +static void setgroups_m_stop(struct seq_file *seq, void *v)
> +{
> +}
> +
> +static int setgroups_m_show(struct seq_file *seq, void *v)
> +{
> +       struct user_namespace *ns = seq->private;
> +
> +       seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
> +       return 0;
> +}
> +
> +const struct seq_operations proc_setgroups_seq_operations = {
> +       .start  = setgroups_m_start,
> +       .stop = setgroups_m_stop,
> +       .next = setgroups_m_next,
> +       .show = setgroups_m_show,
> +};
> +
> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *ppos)
> +{
> +       struct seq_file *seq = file->private_data;
> +       struct user_namespace *ns = seq->private;
> +       char kbuf[3];
> +       int setgroups_allowed;
> +       ssize_t ret;
> +
> +       ret = -EPERM;
> +       if (!file_ns_capable(file, ns, CAP_SETGID))
> +               goto out;

CAP_SYS_ADMIN?  This isn't setting a gid in the namespace; it's
reconfiguring the namespace.

> +
> +       /* Only allow a very narrow range of strings to be written */
> +       ret = -EINVAL;
> +       if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
> +               goto out;
> +
> +       /* What was written? */
> +       ret = -EFAULT;
> +       if (copy_from_user(kbuf, buf, count))
> +               goto out;
> +       kbuf[count] = '\0';
> +
> +       /* What is being requested? */
> +       ret = -EINVAL;
> +       if (kbuf[0] == '0')
> +               setgroups_allowed = 0;
> +       else if (kbuf[0] == '1')
> +               setgroups_allowed = 1;
> +       else
> +               goto out;
> +
> +       /* Allow a trailing newline */
> +       ret = -EINVAL;
> +       if ((count == 2) && (kbuf[1] != '\n'))
> +               goto out;
> +
> +
> +       if (setgroups_allowed) {
> +               ret = -EINVAL;
> +               if (atomic_read(&ns->setgroups_allowed) == 0)
> +                       goto out;
> +       } else {

I would disallow this if gid_map has been written in the interest of sanity.

> +               atomic_set(&ns->setgroups_allowed, 0);
> +               /* sigh memory barriers! */

I don't think that any barriers are needed.  If you ever observe
setgroups_allowed == 0, it will stay that way forever.

> +       }
> +
> +       /* Report a successful write */
> +       *ppos = count;
> +       ret = count;
> +out:
> +       return ret;
> +}
> +
>  static void *userns_get(struct task_struct *task)
>  {
>         struct user_namespace *user_ns;

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 20:58           ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski
@ 2014-12-02 21:26             ` Eric W. Biederman
  2014-12-02 22:09               ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 21:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Classic unix permission checks have an interesting feature, the group
>> permissions for a file can be set to less than the other permissions
>> on a file.  Occassionally this is used deliberately to give a certain
>> group of users fewer permissions than the default.
>>
>> Overlooking negative groups has resulted in the permission checks for
>> setting up a group mapping in a user namespace to be too lax.  Tighten
>> the permission checks in new_idmap_permitted to ensure that mapping
>> uids and gids into user namespaces without privilege will not result
>> in new combinations of credentials being available to the users.
>>
>> When setting mappings without privilege only the creator of the user
>> namespace is interesting as all other users that have CAP_SETUID over
>> the user namespace will also have CAP_SETUID over the user namespaces
>> parent.  So the scope of the unprivileged check is reduced to just
>> the case where cred->euid is the namespace creator.
>>
>> For setting a uid mapping without privilege only euid is considered as
>> setresuid can set uid, suid and fsuid from euid without privielege
>> making any combination of uids possible with user namespaces already
>> possible without them.
>>
>> For now seeting a gid mapping without privilege is removed.  The only
>> possible set of credentials that would be safe without a gid mapping
>> (egid without any supplementary groups) just doesn't happen in practice
>> so would simply lead to unused untested code.
>>
>> setgroups is modified to fail not only when the group ids do not
>> map but also when there are no gid mappings at all, preventing
>> setgroups(0, NULL) from succeeding when gid mappings have not been
>> established.
>>
>> For a small class of applications this change breaks userspace
>> and removes useful functionality.  This small class of applications
>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>
>> Most of the removed functionality will be added back with the
>> addition of a one way knob to disable setgroups.  Once setgroups
>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>
>> For more common applications that set the uid_map and the gid_map with
>> privilege this change will have no effect on them.
>>
>> This should fix CVE-2014-8989.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>
>>
>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>> +{
>> +       return ns->gid_map.nr_extents != 0;
>> +}
>> +
>
> Can you rename this to userns_may_setgroups or something like that?
> To me, gid_mapping_possible sounds like you're allowed to map gids,
> which sounds like the opposite condition, and it doesn't explain what
> the point is.

gid_mapping_established?

What I mean to be testing if is if from_kgid and make_kgid will work
because the gid mappings have been set.

The userns knob for setgroups is a different test and is added
in the next patch.  And yes we really need both or the knob can
start out as on, and we need to provent setgroups(0, NULL)
before the user namespace is unshared.

Although come to think about it probably makes sense to roll those two
test into one function and call that inline function from the setgroups
implementation.

Anyway I will think about it and see what I can do to make it easily
comprehensible.

>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index aa312b0dc3ec..51d65b444951 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>                                 struct user_namespace *ns, int cap_setid,
>>                                 struct uid_gid_map *new_map)
>>  {
>> -       /* Allow mapping to your own filesystem ids */
>> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>> +       const struct cred *cred = file->f_cred;
>> +
>> +       /* Allow a mapping without capabilities when allowing the root
>> +        * of the user namespace capabilities restricted to that id
>> +        * will not change the set of credentials available to that
>> +        * user.
>> +        */
>> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>> +           uid_eq(ns->owner, cred->euid)) {
>
> What's uid_eq(ns->owner, cred->euid)) for?  This should already be covered by:

This means that the only user we attempt to set up unprivileged mappings
for is the owner of the user namespace.  Anyone else should already
have capabilities in the parent user namespace or shouldn't be able to
set the mapping at all.

In practice it is a clarification to make it easier to think about the code.

>     if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>         goto out;
>
> (except that I don't know why cap_valid(cap_setid) is checked -- this
> ought to be enforced for projid_map, too, right?)

What to do with projid_map is entirely different discussion.  In
practice it is dead, and either XFS needs to be fixed to use it
or that code needs to be removed.  At the time I wrote it XFS
did not require any privileges to set project ids.

>>                 u32 id = new_map->extent[0].lower_first;
>>                 if (cap_setid == CAP_SETUID) {
>>                         kuid_t uid = make_kuid(ns->parent, id);
>> -                       if (uid_eq(uid, file->f_cred->fsuid))
>> -                               return true;
>> -               } else if (cap_setid == CAP_SETGID) {
>> -                       kgid_t gid = make_kgid(ns->parent, id);
>> -                       if (gid_eq(gid, file->f_cred->fsgid))
>> +                       if (uid_eq(uid, cred->euid))
>
> Why'd you change this from fsuid to euid?

Because strangely enough I can set euid to any other uid with
setresuid, but the same does not hold with fsuid.

So strictly speaking fsuid was actually wrong before.  In practice
fsuid == euid so I don't think anyone will care.  But I want very much
to enforce the rule that user namespaces can't give you any credentials
you couldn't get otherwise.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 21:05             ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski
@ 2014-12-02 21:45               ` Eric W. Biederman
  2014-12-02 22:17                 ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 21:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
> Can you rename this to something clearer, e.g. userns_setgroups_mode?

I am not certain that is any clearer.  That just reads as wordier.

The userns bit is definitely confusing and wrong.  Why should we need to
spell out the scope?

>>   A value of 0 means the setgroups system call is disabled in the
>>   current processes user namespace and can not be enabled in the
>>   future in this user namespace.
>>
>>   A value of 1 means the segtoups system call is enabled.
>
> Would it make more sense to put strings like "allow" and "deny" in
> here?  That way, future extensions could add additional values.

If the implementation of the write side isn't too bad.  I would love
to see precedent elsewhere in the kernel.    What I don't expect to do
is have any values except setgroups are enabled and setgroups are
disabled.

I am fine with allowing for the possibility (that is just good
engineering) but I don't intend to seriously considering or
implementing other possibilities.

>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>> index 21c91feeca2d..6d0ee1b089fb 100644
>> --- a/arch/s390/kernel/compat_linux.c
>> +++ b/arch/s390/kernel/compat_linux.c
>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>         int retval;
>>
>>         if (!gid_mapping_possible(user_ns) ||
>> +           !atomic_read(&user_ns->setgroups_allowed) ||
>>             !capable(CAP_SETGID))
>>                 return -EPERM;
>
> This is now incomprehensible because of the gid_mapping_possible
> thing.  If you renamed gid_mapping_possible to
> userns_setgroup_allowed, then this could be added to the
> implementation, and this would all make sense (not to mention avoiding
> duplicating this thing).
>
>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>                         kuid_t uid = make_kuid(ns->parent, id);
>>                         if (uid_eq(uid, cred->euid))
>>                                 return true;
>> +               } else if (cap_setid == CAP_SETGID) {
>> +                       kgid_t gid = make_kgid(ns->parent, id);
>> +                       if (!atomic_read(&ns->setgroups_allowed) &&
>> +                           gid_eq(gid, cred->egid))
>> +                               return true;
>
> I still don't see why egid is any better than fsgid here.

Answered in my earlier response fsgid was a goof.
I can set any gid to my egid with my existing permissions.
Show me how I can do that with fsgid or fsuid and I will be happy to use
those.


>>                 }
>>         }
>>
>> @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file,
>>         return false;
>>  }
>>
>> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
>> +{
>> +       struct user_namespace *ns = seq->private;
>> +
>> +       return (*ppos == 0) ?  ns : NULL;
>> +}
>> +
>> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
>> +{
>> +       ++*ppos;
>> +       return NULL;
>> +}
>> +
>> +static void setgroups_m_stop(struct seq_file *seq, void *v)
>> +{
>> +}
>> +
>> +static int setgroups_m_show(struct seq_file *seq, void *v)
>> +{
>> +       struct user_namespace *ns = seq->private;
>> +
>> +       seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed));
>> +       return 0;
>> +}
>> +
>> +const struct seq_operations proc_setgroups_seq_operations = {
>> +       .start  = setgroups_m_start,
>> +       .stop = setgroups_m_stop,
>> +       .next = setgroups_m_next,
>> +       .show = setgroups_m_show,
>> +};
>> +
>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>> +                            size_t count, loff_t *ppos)
>> +{
>> +       struct seq_file *seq = file->private_data;
>> +       struct user_namespace *ns = seq->private;
>> +       char kbuf[3];
>> +       int setgroups_allowed;
>> +       ssize_t ret;
>> +
>> +       ret = -EPERM;
>> +       if (!file_ns_capable(file, ns, CAP_SETGID))
>> +               goto out;
>
> CAP_SYS_ADMIN?  This isn't setting a gid in the namespace; it's
> reconfiguring the namespace.

Hmm.  Maybe.  It is an activity that is normally controlled by
CAP_SETGID.

Frankly I think the entire split up of the capability model is almost
totally broken.  But I think CAP_SETGID is a close approximation of the
right thing here.

>> +       /* Only allow a very narrow range of strings to be written */
>> +       ret = -EINVAL;
>> +       if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>> +               goto out;
>> +
>> +       /* What was written? */
>> +       ret = -EFAULT;
>> +       if (copy_from_user(kbuf, buf, count))
>> +               goto out;
>> +       kbuf[count] = '\0';
>> +
>> +       /* What is being requested? */
>> +       ret = -EINVAL;
>> +       if (kbuf[0] == '0')
>> +               setgroups_allowed = 0;
>> +       else if (kbuf[0] == '1')
>> +               setgroups_allowed = 1;
>> +       else
>> +               goto out;
>> +
>> +       /* Allow a trailing newline */
>> +       ret = -EINVAL;
>> +       if ((count == 2) && (kbuf[1] != '\n'))
>> +               goto out;
>> +
>> +
>> +       if (setgroups_allowed) {
>> +               ret = -EINVAL;
>> +               if (atomic_read(&ns->setgroups_allowed) == 0)
>> +                       goto out;
>> +       } else {
>
> I would disallow this if gid_map has been written in the interest of
> sanity.

Not a chance.  That is part of making this an independent knob.  If
there is another reason for disabling setgroups you can flip this
knob even after mappings are established.

>> +               atomic_set(&ns->setgroups_allowed, 0);
>> +               /* sigh memory barriers! */
>
> I don't think that any barriers are needed.  If you ever observe
> setgroups_allowed == 0, it will stay that way forever.

Likely.   In practice the code works today.

But I need to review things closely to understand if there are barriers
needed.  But especially since it is a write once knob we can get away
with a lot.

>> +       }
>> +
>> +       /* Report a successful write */
>> +       *ppos = count;
>> +       ret = count;
>> +out:
>> +       return ret;
>> +}
>> +
>>  static void *userns_get(struct task_struct *task)
>>  {
>>         struct user_namespace *user_ns;
>
> --Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 21:26             ` Eric W. Biederman
@ 2014-12-02 22:09               ` Andy Lutomirski
  2014-12-02 22:48                 ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 22:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> Classic unix permission checks have an interesting feature, the group
>>> permissions for a file can be set to less than the other permissions
>>> on a file.  Occassionally this is used deliberately to give a certain
>>> group of users fewer permissions than the default.
>>>
>>> Overlooking negative groups has resulted in the permission checks for
>>> setting up a group mapping in a user namespace to be too lax.  Tighten
>>> the permission checks in new_idmap_permitted to ensure that mapping
>>> uids and gids into user namespaces without privilege will not result
>>> in new combinations of credentials being available to the users.
>>>
>>> When setting mappings without privilege only the creator of the user
>>> namespace is interesting as all other users that have CAP_SETUID over
>>> the user namespace will also have CAP_SETUID over the user namespaces
>>> parent.  So the scope of the unprivileged check is reduced to just
>>> the case where cred->euid is the namespace creator.
>>>
>>> For setting a uid mapping without privilege only euid is considered as
>>> setresuid can set uid, suid and fsuid from euid without privielege
>>> making any combination of uids possible with user namespaces already
>>> possible without them.
>>>
>>> For now seeting a gid mapping without privilege is removed.  The only
>>> possible set of credentials that would be safe without a gid mapping
>>> (egid without any supplementary groups) just doesn't happen in practice
>>> so would simply lead to unused untested code.
>>>
>>> setgroups is modified to fail not only when the group ids do not
>>> map but also when there are no gid mappings at all, preventing
>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>> established.
>>>
>>> For a small class of applications this change breaks userspace
>>> and removes useful functionality.  This small class of applications
>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>
>>> Most of the removed functionality will be added back with the
>>> addition of a one way knob to disable setgroups.  Once setgroups
>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>
>>> For more common applications that set the uid_map and the gid_map with
>>> privilege this change will have no effect on them.
>>>
>>> This should fix CVE-2014-8989.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>
>>>
>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>> +{
>>> +       return ns->gid_map.nr_extents != 0;
>>> +}
>>> +
>>
>> Can you rename this to userns_may_setgroups or something like that?
>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>> which sounds like the opposite condition, and it doesn't explain what
>> the point is.
>
> gid_mapping_established?
>
> What I mean to be testing if is if from_kgid and make_kgid will work
> because the gid mappings have been set.

But why do you care whether from_kgid and make_kgid will work?  If
they fail, then they fail.  I think that the point is that you're
checking whether allowing setgroups to drop groups is safe, and that's
only barely the same condition.

>
> The userns knob for setgroups is a different test and is added
> in the next patch.  And yes we really need both or the knob can
> start out as on, and we need to provent setgroups(0, NULL)
> before the user namespace is unshared.

Do you mean before it's mapped?

>
> Although come to think about it probably makes sense to roll those two
> test into one function and call that inline function from the setgroups
> implementation.

That's what I think, too.

>
> Anyway I will think about it and see what I can do to make it easily
> comprehensible.
>
>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index aa312b0dc3ec..51d65b444951 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>>                                 struct user_namespace *ns, int cap_setid,
>>>                                 struct uid_gid_map *new_map)
>>>  {
>>> -       /* Allow mapping to your own filesystem ids */
>>> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>> +       const struct cred *cred = file->f_cred;
>>> +
>>> +       /* Allow a mapping without capabilities when allowing the root
>>> +        * of the user namespace capabilities restricted to that id
>>> +        * will not change the set of credentials available to that
>>> +        * user.
>>> +        */
>>> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>> +           uid_eq(ns->owner, cred->euid)) {
>>
>> What's uid_eq(ns->owner, cred->euid)) for?  This should already be covered by:
>
> This means that the only user we attempt to set up unprivileged mappings
> for is the owner of the user namespace.  Anyone else should already
> have capabilities in the parent user namespace or shouldn't be able to
> set the mapping at all.
>
> In practice it is a clarification to make it easier to think about the code.

But why?  I don't see why this check is necessary or why it's relevant
to the current issue.

>
>>     if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>         goto out;
>>
>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>> ought to be enforced for projid_map, too, right?)
>
> What to do with projid_map is entirely different discussion.  In
> practice it is dead, and either XFS needs to be fixed to use it
> or that code needs to be removed.  At the time I wrote it XFS
> did not require any privileges to set project ids.
>
>>>                 u32 id = new_map->extent[0].lower_first;
>>>                 if (cap_setid == CAP_SETUID) {
>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>> -                       if (uid_eq(uid, file->f_cred->fsuid))
>>> -                               return true;
>>> -               } else if (cap_setid == CAP_SETGID) {
>>> -                       kgid_t gid = make_kgid(ns->parent, id);
>>> -                       if (gid_eq(gid, file->f_cred->fsgid))
>>> +                       if (uid_eq(uid, cred->euid))
>>
>> Why'd you change this from fsuid to euid?
>
> Because strangely enough I can set euid to any other uid with
> setresuid, but the same does not hold with fsuid.
>
> So strictly speaking fsuid was actually wrong before.  In practice
> fsuid == euid so I don't think anyone will care.  But I want very much
> to enforce the rule that user namespaces can't give you any credentials
> you couldn't get otherwise.

Fair enough.  Want to split that into a separate patch, then?

--Andy

>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 21:45               ` Eric W. Biederman
@ 2014-12-02 22:17                 ` Andy Lutomirski
  2014-12-02 23:07                   ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 22:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>
>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>
> I am not certain that is any clearer.  That just reads as wordier.
>
> The userns bit is definitely confusing and wrong.  Why should we need to
> spell out the scope?

Because it's a property of the process' userns, not a property of the process.

userns_setgroups would be okay.  (Arguably it should've been
userns_uid_map, too, but at least uid_map sounds obviously
namespace-related.)

>
>>>   A value of 0 means the setgroups system call is disabled in the
>>>   current processes user namespace and can not be enabled in the
>>>   future in this user namespace.
>>>
>>>   A value of 1 means the segtoups system call is enabled.
>>
>> Would it make more sense to put strings like "allow" and "deny" in
>> here?  That way, future extensions could add additional values.
>
> If the implementation of the write side isn't too bad.  I would love
> to see precedent elsewhere in the kernel.    What I don't expect to do
> is have any values except setgroups are enabled and setgroups are
> disabled.

current_clocksource?  I think there are lots of things like that.

>
> I am fine with allowing for the possibility (that is just good
> engineering) but I don't intend to seriously considering or
> implementing other possibilities.

I can imagine a mode where there's a fixed set of groups that are
forced set but other groups can be added and dropped.  It would be
complicated to set up right, but someone might want it some day.

>
>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>> --- a/arch/s390/kernel/compat_linux.c
>>> +++ b/arch/s390/kernel/compat_linux.c
>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>>         int retval;
>>>
>>>         if (!gid_mapping_possible(user_ns) ||
>>> +           !atomic_read(&user_ns->setgroups_allowed) ||
>>>             !capable(CAP_SETGID))
>>>                 return -EPERM;
>>
>> This is now incomprehensible because of the gid_mapping_possible
>> thing.  If you renamed gid_mapping_possible to
>> userns_setgroup_allowed, then this could be added to the
>> implementation, and this would all make sense (not to mention avoiding
>> duplicating this thing).
>>
>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>>                         if (uid_eq(uid, cred->euid))
>>>                                 return true;
>>> +               } else if (cap_setid == CAP_SETGID) {
>>> +                       kgid_t gid = make_kgid(ns->parent, id);
>>> +                       if (!atomic_read(&ns->setgroups_allowed) &&
>>> +                           gid_eq(gid, cred->egid))
>>> +                               return true;
>>
>> I still don't see why egid is any better than fsgid here.
>
> Answered in my earlier response fsgid was a goof.
> I can set any gid to my egid with my existing permissions.
> Show me how I can do that with fsgid or fsuid and I will be happy to use
> those.

You can use your fsgid to make a setgid file, I think.  But yes, point
taken, although as mentioned in the other thread, I think it would be
a lot clearer if it were a separate patch.


>>> +
>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>> +                            size_t count, loff_t *ppos)
>>> +{
>>> +       struct seq_file *seq = file->private_data;
>>> +       struct user_namespace *ns = seq->private;
>>> +       char kbuf[3];
>>> +       int setgroups_allowed;
>>> +       ssize_t ret;
>>> +
>>> +       ret = -EPERM;
>>> +       if (!file_ns_capable(file, ns, CAP_SETGID))
>>> +               goto out;
>>
>> CAP_SYS_ADMIN?  This isn't setting a gid in the namespace; it's
>> reconfiguring the namespace.
>
> Hmm.  Maybe.  It is an activity that is normally controlled by
> CAP_SETGID.
>
> Frankly I think the entire split up of the capability model is almost
> totally broken.  But I think CAP_SETGID is a close approximation of the
> right thing here.

I agree that the cap model is screwy.  But we use CAP_SYS_ADMIN for
everything else that changes the overall behavior of a namespace.

In any event, the only way it matters is for a non-ns owner in the
parent ns with CAP_SETGID set but not CAP_SYS_ADMIN.  I'd argue that
CAP_SETGID should not be usable to make unrelated process' syscalls
fail.

>
>>> +       /* Only allow a very narrow range of strings to be written */
>>> +       ret = -EINVAL;
>>> +       if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>> +               goto out;
>>> +
>>> +       /* What was written? */
>>> +       ret = -EFAULT;
>>> +       if (copy_from_user(kbuf, buf, count))
>>> +               goto out;
>>> +       kbuf[count] = '\0';
>>> +
>>> +       /* What is being requested? */
>>> +       ret = -EINVAL;
>>> +       if (kbuf[0] == '0')
>>> +               setgroups_allowed = 0;
>>> +       else if (kbuf[0] == '1')
>>> +               setgroups_allowed = 1;
>>> +       else
>>> +               goto out;
>>> +
>>> +       /* Allow a trailing newline */
>>> +       ret = -EINVAL;
>>> +       if ((count == 2) && (kbuf[1] != '\n'))
>>> +               goto out;
>>> +
>>> +
>>> +       if (setgroups_allowed) {
>>> +               ret = -EINVAL;
>>> +               if (atomic_read(&ns->setgroups_allowed) == 0)
>>> +                       goto out;
>>> +       } else {
>>
>> I would disallow this if gid_map has been written in the interest of
>> sanity.
>
> Not a chance.  That is part of making this an independent knob.  If
> there is another reason for disabling setgroups you can flip this
> knob even after mappings are established.

Then you really want CAP_SYS_ADMIN, I think.

>
>>> +               atomic_set(&ns->setgroups_allowed, 0);
>>> +               /* sigh memory barriers! */
>>
>> I don't think that any barriers are needed.  If you ever observe
>> setgroups_allowed == 0, it will stay that way forever.
>
> Likely.   In practice the code works today.
>
> But I need to review things closely to understand if there are barriers
> needed.  But especially since it is a write once knob we can get away
> with a lot.
>

Yeah.

For long-term use, I kind of like the flags approach I added in the
other patch.  It makes it easy to add more flags in the future.

In any event, I think the only barrier that's needed is when writing gid_map:

atomic_read / test_bit to determine whether unpriv mappings are okay;
smp_mb() or whatever the current _after_atomic thing is these days;
write mapping;

Although I'm not sure whether Linux supports any architectures that
can violate causality in the way that barrier is there to prevent.

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 22:09               ` Andy Lutomirski
@ 2014-12-02 22:48                 ` Eric W. Biederman
  2014-12-02 22:56                   ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 22:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Classic unix permission checks have an interesting feature, the group
>>>> permissions for a file can be set to less than the other permissions
>>>> on a file.  Occassionally this is used deliberately to give a certain
>>>> group of users fewer permissions than the default.
>>>>
>>>> Overlooking negative groups has resulted in the permission checks for
>>>> setting up a group mapping in a user namespace to be too lax.  Tighten
>>>> the permission checks in new_idmap_permitted to ensure that mapping
>>>> uids and gids into user namespaces without privilege will not result
>>>> in new combinations of credentials being available to the users.
>>>>
>>>> When setting mappings without privilege only the creator of the user
>>>> namespace is interesting as all other users that have CAP_SETUID over
>>>> the user namespace will also have CAP_SETUID over the user namespaces
>>>> parent.  So the scope of the unprivileged check is reduced to just
>>>> the case where cred->euid is the namespace creator.
>>>>
>>>> For setting a uid mapping without privilege only euid is considered as
>>>> setresuid can set uid, suid and fsuid from euid without privielege
>>>> making any combination of uids possible with user namespaces already
>>>> possible without them.
>>>>
>>>> For now seeting a gid mapping without privilege is removed.  The only
>>>> possible set of credentials that would be safe without a gid mapping
>>>> (egid without any supplementary groups) just doesn't happen in practice
>>>> so would simply lead to unused untested code.
>>>>
>>>> setgroups is modified to fail not only when the group ids do not
>>>> map but also when there are no gid mappings at all, preventing
>>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>>> established.
>>>>
>>>> For a small class of applications this change breaks userspace
>>>> and removes useful functionality.  This small class of applications
>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>>
>>>> Most of the removed functionality will be added back with the
>>>> addition of a one way knob to disable setgroups.  Once setgroups
>>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>>
>>>> For more common applications that set the uid_map and the gid_map with
>>>> privilege this change will have no effect on them.
>>>>
>>>> This should fix CVE-2014-8989.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>
>>>>
>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>>> +{
>>>> +       return ns->gid_map.nr_extents != 0;
>>>> +}
>>>> +
>>>
>>> Can you rename this to userns_may_setgroups or something like that?
>>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>>> which sounds like the opposite condition, and it doesn't explain what
>>> the point is.
>>
>> gid_mapping_established?
>>
>> What I mean to be testing if is if from_kgid and make_kgid will work
>> because the gid mappings have been set.
>
> But why do you care whether from_kgid and make_kgid will work?  If
> they fail, then they fail.  I think that the point is that you're
> checking whether allowing setgroups to drop groups is safe, and that's
> only barely the same condition.

For all of the system calls to set or change uids and gids except
setgroups it happens to fall out that if there are no mappings set the
system calls fail.  That is and was deliberate.  However setgroups is
weird because it allows the case of 0 mappings and to maintain the
constraint that it fails when there are no mapping set (just like
everything else) that requires an additional test.

>> The userns knob for setgroups is a different test and is added
>> in the next patch.  And yes we really need both or the knob can
>> start out as on, and we need to provent setgroups(0, NULL)
>> before the user namespace is unshared.
>
> Do you mean before it's mapped?

Right we need to prevent setgroups(0, NULL) before we set the gid
mapping.

>> Although come to think about it probably makes sense to roll those two
>> test into one function and call that inline function from the setgroups
>> implementation.
>
> That's what I think, too.
>
>>
>> Anyway I will think about it and see what I can do to make it easily
>> comprehensible.
>>
>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>>> index aa312b0dc3ec..51d65b444951 100644
>>>> --- a/kernel/user_namespace.c
>>>> +++ b/kernel/user_namespace.c
>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>>>                                 struct user_namespace *ns, int cap_setid,
>>>>                                 struct uid_gid_map *new_map)
>>>>  {
>>>> -       /* Allow mapping to your own filesystem ids */
>>>> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>>> +       const struct cred *cred = file->f_cred;
>>>> +
>>>> +       /* Allow a mapping without capabilities when allowing the root
>>>> +        * of the user namespace capabilities restricted to that id
>>>> +        * will not change the set of credentials available to that
>>>> +        * user.
>>>> +        */
>>>> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>>> +           uid_eq(ns->owner, cred->euid)) {
>>>
>>> What's uid_eq(ns->owner, cred->euid)) for?  This should already be covered by:
>>
>> This means that the only user we attempt to set up unprivileged mappings
>> for is the owner of the user namespace.  Anyone else should already
>> have capabilities in the parent user namespace or shouldn't be able to
>> set the mapping at all.
>>
>> In practice it is a clarification to make it easier to think about the code.
>
> But why?  I don't see why this check is necessary or why it's relevant
> to the current issue.

My goal in this check is to guarantee that any combination of uids and
gids in struct cred that you can obtain with mappings and a user
namespace you can also obtain without privilege without a user
namespace.

What limiting euid to ns->owner does is it guarantees that when a user
namespace is created without privilege root doesn't come along and set
the mapping using the unprivileged path.  That is confusing to think
about and it is not necessary to support.

With ns->owner == euid I have the guarantee especially with the gids
that they wind up paired with a uid in struct cred that came from the
same user.  Either that or someone set one of the mappings with
privilege.

With ns->owner == euid I can verify all of these things pretty
trivially.  Without that check I don't have a clue how to verify
the pairing between uids and gids in the unprivileged mapping.

Does that make things clearer?

>>>     if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>>         goto out;
>>>
>>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>>> ought to be enforced for projid_map, too, right?)
>>
>> What to do with projid_map is entirely different discussion.  In
>> practice it is dead, and either XFS needs to be fixed to use it
>> or that code needs to be removed.  At the time I wrote it XFS
>> did not require any privileges to set project ids.
>>
>>>>                 u32 id = new_map->extent[0].lower_first;
>>>>                 if (cap_setid == CAP_SETUID) {
>>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>>> -                       if (uid_eq(uid, file->f_cred->fsuid))
>>>> -                               return true;
>>>> -               } else if (cap_setid == CAP_SETGID) {
>>>> -                       kgid_t gid = make_kgid(ns->parent, id);
>>>> -                       if (gid_eq(gid, file->f_cred->fsgid))
>>>> +                       if (uid_eq(uid, cred->euid))
>>>
>>> Why'd you change this from fsuid to euid?
>>
>> Because strangely enough I can set euid to any other uid with
>> setresuid, but the same does not hold with fsuid.
>>
>> So strictly speaking fsuid was actually wrong before.  In practice
>> fsuid == euid so I don't think anyone will care.  But I want very much
>> to enforce the rule that user namespaces can't give you any credentials
>> you couldn't get otherwise.
>
> Fair enough.  Want to split that into a separate patch, then?

Strictly speaking it is part and parcel of the same thing but it
probably makes sense to split it out and emphasise and explain the
change.

Eric


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups
  2014-12-02 22:48                 ` Eric W. Biederman
@ 2014-12-02 22:56                   ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 22:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 2:48 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>
>>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman
>>>> <ebiederm@xmission.com> wrote:
>>>>>
>>>>> Classic unix permission checks have an interesting feature, the group
>>>>> permissions for a file can be set to less than the other permissions
>>>>> on a file.  Occassionally this is used deliberately to give a certain
>>>>> group of users fewer permissions than the default.
>>>>>
>>>>> Overlooking negative groups has resulted in the permission checks for
>>>>> setting up a group mapping in a user namespace to be too lax.  Tighten
>>>>> the permission checks in new_idmap_permitted to ensure that mapping
>>>>> uids and gids into user namespaces without privilege will not result
>>>>> in new combinations of credentials being available to the users.
>>>>>
>>>>> When setting mappings without privilege only the creator of the user
>>>>> namespace is interesting as all other users that have CAP_SETUID over
>>>>> the user namespace will also have CAP_SETUID over the user namespaces
>>>>> parent.  So the scope of the unprivileged check is reduced to just
>>>>> the case where cred->euid is the namespace creator.
>>>>>
>>>>> For setting a uid mapping without privilege only euid is considered as
>>>>> setresuid can set uid, suid and fsuid from euid without privielege
>>>>> making any combination of uids possible with user namespaces already
>>>>> possible without them.
>>>>>
>>>>> For now seeting a gid mapping without privilege is removed.  The only
>>>>> possible set of credentials that would be safe without a gid mapping
>>>>> (egid without any supplementary groups) just doesn't happen in practice
>>>>> so would simply lead to unused untested code.
>>>>>
>>>>> setgroups is modified to fail not only when the group ids do not
>>>>> map but also when there are no gid mappings at all, preventing
>>>>> setgroups(0, NULL) from succeeding when gid mappings have not been
>>>>> established.
>>>>>
>>>>> For a small class of applications this change breaks userspace
>>>>> and removes useful functionality.  This small class of applications
>>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c
>>>>>
>>>>> Most of the removed functionality will be added back with the
>>>>> addition of a one way knob to disable setgroups.  Once setgroups
>>>>> is disabled setting the gid_map becomes as safe as setting the uid_map.
>>>>>
>>>>> For more common applications that set the uid_map and the gid_map with
>>>>> privilege this change will have no effect on them.
>>>>>
>>>>> This should fix CVE-2014-8989.
>>>>>
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>> ---
>>>>
>>>>>
>>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns)
>>>>> +{
>>>>> +       return ns->gid_map.nr_extents != 0;
>>>>> +}
>>>>> +
>>>>
>>>> Can you rename this to userns_may_setgroups or something like that?
>>>> To me, gid_mapping_possible sounds like you're allowed to map gids,
>>>> which sounds like the opposite condition, and it doesn't explain what
>>>> the point is.
>>>
>>> gid_mapping_established?
>>>
>>> What I mean to be testing if is if from_kgid and make_kgid will work
>>> because the gid mappings have been set.
>>
>> But why do you care whether from_kgid and make_kgid will work?  If
>> they fail, then they fail.  I think that the point is that you're
>> checking whether allowing setgroups to drop groups is safe, and that's
>> only barely the same condition.
>
> For all of the system calls to set or change uids and gids except
> setgroups it happens to fall out that if there are no mappings set the
> system calls fail.  That is and was deliberate.  However setgroups is
> weird because it allows the case of 0 mappings and to maintain the
> constraint that it fails when there are no mapping set (just like
> everything else) that requires an additional test.
>
>>> The userns knob for setgroups is a different test and is added
>>> in the next patch.  And yes we really need both or the knob can
>>> start out as on, and we need to provent setgroups(0, NULL)
>>> before the user namespace is unshared.
>>
>> Do you mean before it's mapped?
>
> Right we need to prevent setgroups(0, NULL) before we set the gid
> mapping.

Fair enough.

If you factor this into a separate inline helper, it might be worth
adding a short comment to that effect.  It could be as simple as:

static inline bool whatever(whatever) {
  if (mapping is empty)
    return false;  /* setgroups with a nonempty set requires a
mapping; make sure that setgroups(0, NULL) does, too. */
  ...;
}

>
>>> Although come to think about it probably makes sense to roll those two
>>> test into one function and call that inline function from the setgroups
>>> implementation.
>>
>> That's what I think, too.
>>
>>>
>>> Anyway I will think about it and see what I can do to make it easily
>>> comprehensible.
>>>
>>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>>>> index aa312b0dc3ec..51d65b444951 100644
>>>>> --- a/kernel/user_namespace.c
>>>>> +++ b/kernel/user_namespace.c
>>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file,
>>>>>                                 struct user_namespace *ns, int cap_setid,
>>>>>                                 struct uid_gid_map *new_map)
>>>>>  {
>>>>> -       /* Allow mapping to your own filesystem ids */
>>>>> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
>>>>> +       const struct cred *cred = file->f_cred;
>>>>> +
>>>>> +       /* Allow a mapping without capabilities when allowing the root
>>>>> +        * of the user namespace capabilities restricted to that id
>>>>> +        * will not change the set of credentials available to that
>>>>> +        * user.
>>>>> +        */
>>>>> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
>>>>> +           uid_eq(ns->owner, cred->euid)) {
>>>>
>>>> What's uid_eq(ns->owner, cred->euid)) for?  This should already be covered by:
>>>
>>> This means that the only user we attempt to set up unprivileged mappings
>>> for is the owner of the user namespace.  Anyone else should already
>>> have capabilities in the parent user namespace or shouldn't be able to
>>> set the mapping at all.
>>>
>>> In practice it is a clarification to make it easier to think about the code.
>>
>> But why?  I don't see why this check is necessary or why it's relevant
>> to the current issue.
>
> My goal in this check is to guarantee that any combination of uids and
> gids in struct cred that you can obtain with mappings and a user
> namespace you can also obtain without privilege without a user
> namespace.
>
> What limiting euid to ns->owner does is it guarantees that when a user
> namespace is created without privilege root doesn't come along and set
> the mapping using the unprivileged path.  That is confusing to think
> about and it is not necessary to support.
>
> With ns->owner == euid I have the guarantee especially with the gids
> that they wind up paired with a uid in struct cred that came from the
> same user.  Either that or someone set one of the mappings with
> privilege.
>
> With ns->owner == euid I can verify all of these things pretty
> trivially.  Without that check I don't have a clue how to verify
> the pairing between uids and gids in the unprivileged mapping.
>
> Does that make things clearer?

Yes.  Thanks.  It might pay to try to improve the comment.  I
understand it with this explanation but I didn't when I just read the
comment.

>
>>>>     if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
>>>>         goto out;
>>>>
>>>> (except that I don't know why cap_valid(cap_setid) is checked -- this
>>>> ought to be enforced for projid_map, too, right?)
>>>
>>> What to do with projid_map is entirely different discussion.  In
>>> practice it is dead, and either XFS needs to be fixed to use it
>>> or that code needs to be removed.  At the time I wrote it XFS
>>> did not require any privileges to set project ids.
>>>
>>>>>                 u32 id = new_map->extent[0].lower_first;
>>>>>                 if (cap_setid == CAP_SETUID) {
>>>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>>>> -                       if (uid_eq(uid, file->f_cred->fsuid))
>>>>> -                               return true;
>>>>> -               } else if (cap_setid == CAP_SETGID) {
>>>>> -                       kgid_t gid = make_kgid(ns->parent, id);
>>>>> -                       if (gid_eq(gid, file->f_cred->fsgid))
>>>>> +                       if (uid_eq(uid, cred->euid))
>>>>
>>>> Why'd you change this from fsuid to euid?
>>>
>>> Because strangely enough I can set euid to any other uid with
>>> setresuid, but the same does not hold with fsuid.
>>>
>>> So strictly speaking fsuid was actually wrong before.  In practice
>>> fsuid == euid so I don't think anyone will care.  But I want very much
>>> to enforce the rule that user namespaces can't give you any credentials
>>> you couldn't get otherwise.
>>
>> Fair enough.  Want to split that into a separate patch, then?
>
> Strictly speaking it is part and parcel of the same thing but it
> probably makes sense to split it out and emphasise and explain the
> change.

Sounds good.

Anyway, time to do a combination of Real Work (tm) and dealing with
the fact that I found a whole family of vulnerabilities of
as-yet-unknown severity in arch/x86 this morning.

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 22:17                 ` Andy Lutomirski
@ 2014-12-02 23:07                   ` Eric W. Biederman
  2014-12-02 23:17                     ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-02 23:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>
>>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>>
>> I am not certain that is any clearer.  That just reads as wordier.
>>
>> The userns bit is definitely confusing and wrong.  Why should we need to
>> spell out the scope?
>
> Because it's a property of the process' userns, not a property of the process.
>
> userns_setgroups would be okay.  (Arguably it should've been
> userns_uid_map, too, but at least uid_map sounds obviously
> namespace-related.)
>
>>
>>>>   A value of 0 means the setgroups system call is disabled in the
>>>>   current processes user namespace and can not be enabled in the
>>>>   future in this user namespace.
>>>>
>>>>   A value of 1 means the segtoups system call is enabled.
>>>
>>> Would it make more sense to put strings like "allow" and "deny" in
>>> here?  That way, future extensions could add additional values.
>>
>> If the implementation of the write side isn't too bad.  I would love
>> to see precedent elsewhere in the kernel.    What I don't expect to do
>> is have any values except setgroups are enabled and setgroups are
>> disabled.
>
> current_clocksource?  I think there are lots of things like that.
>
>>
>> I am fine with allowing for the possibility (that is just good
>> engineering) but I don't intend to seriously considering or
>> implementing other possibilities.
>
> I can imagine a mode where there's a fixed set of groups that are
> forced set but other groups can be added and dropped.  It would be
> complicated to set up right, but someone might want it some day.

Yeah.  I am defintiely not designing for that.

>>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>>> --- a/arch/s390/kernel/compat_linux.c
>>>> +++ b/arch/s390/kernel/compat_linux.c
>>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>>>         int retval;
>>>>
>>>>         if (!gid_mapping_possible(user_ns) ||
>>>> +           !atomic_read(&user_ns->setgroups_allowed) ||
>>>>             !capable(CAP_SETGID))
>>>>                 return -EPERM;
>>>
>>> This is now incomprehensible because of the gid_mapping_possible
>>> thing.  If you renamed gid_mapping_possible to
>>> userns_setgroup_allowed, then this could be added to the
>>> implementation, and this would all make sense (not to mention avoiding
>>> duplicating this thing).
>>>
>>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>>>                         if (uid_eq(uid, cred->euid))
>>>>                                 return true;
>>>> +               } else if (cap_setid == CAP_SETGID) {
>>>> +                       kgid_t gid = make_kgid(ns->parent, id);
>>>> +                       if (!atomic_read(&ns->setgroups_allowed) &&
>>>> +                           gid_eq(gid, cred->egid))
>>>> +                               return true;
>>>
>>> I still don't see why egid is any better than fsgid here.
>>
>> Answered in my earlier response fsgid was a goof.
>> I can set any gid to my egid with my existing permissions.
>> Show me how I can do that with fsgid or fsuid and I will be happy to use
>> those.
>
> You can use your fsgid to make a setgid file, I think.  But yes, point
> taken, although as mentioned in the other thread, I think it would be
> a lot clearer if it were a separate patch.

Agreed.

>>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>>> +                            size_t count, loff_t *ppos)
>>>> +{
>>>> +       struct seq_file *seq = file->private_data;
>>>> +       struct user_namespace *ns = seq->private;
>>>> +       char kbuf[3];
>>>> +       int setgroups_allowed;
>>>> +       ssize_t ret;
>>>> +
>>>> +       ret = -EPERM;
>>>> +       if (!file_ns_capable(file, ns, CAP_SETGID))
>>>> +               goto out;
>>>
>>> CAP_SYS_ADMIN?  This isn't setting a gid in the namespace; it's
>>> reconfiguring the namespace.
>>
>> Hmm.  Maybe.  It is an activity that is normally controlled by
>> CAP_SETGID.
>>
>> Frankly I think the entire split up of the capability model is almost
>> totally broken.  But I think CAP_SETGID is a close approximation of the
>> right thing here.
>
> I agree that the cap model is screwy.  But we use CAP_SYS_ADMIN for
> everything else that changes the overall behavior of a namespace.
>
> In any event, the only way it matters is for a non-ns owner in the
> parent ns with CAP_SETGID set but not CAP_SYS_ADMIN.  I'd argue that
> CAP_SETGID should not be usable to make unrelated process' syscalls
> fail.

That is a pretty decent argument.    I will take a good stare at this
issue as I refresh these patches and see how close to perfection I can
make them.

>>>> +       /* Only allow a very narrow range of strings to be written */
>>>> +       ret = -EINVAL;
>>>> +       if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>>> +               goto out;
>>>> +
>>>> +       /* What was written? */
>>>> +       ret = -EFAULT;
>>>> +       if (copy_from_user(kbuf, buf, count))
>>>> +               goto out;
>>>> +       kbuf[count] = '\0';
>>>> +
>>>> +       /* What is being requested? */
>>>> +       ret = -EINVAL;
>>>> +       if (kbuf[0] == '0')
>>>> +               setgroups_allowed = 0;
>>>> +       else if (kbuf[0] == '1')
>>>> +               setgroups_allowed = 1;
>>>> +       else
>>>> +               goto out;
>>>> +
>>>> +       /* Allow a trailing newline */
>>>> +       ret = -EINVAL;
>>>> +       if ((count == 2) && (kbuf[1] != '\n'))
>>>> +               goto out;
>>>> +
>>>> +
>>>> +       if (setgroups_allowed) {
>>>> +               ret = -EINVAL;
>>>> +               if (atomic_read(&ns->setgroups_allowed) == 0)
>>>> +                       goto out;
>>>> +       } else {
>>>
>>> I would disallow this if gid_map has been written in the interest of
>>> sanity.
>>
>> Not a chance.  That is part of making this an independent knob.  If
>> there is another reason for disabling setgroups you can flip this
>> knob even after mappings are established.
>
> Then you really want CAP_SYS_ADMIN, I think.
>
>>
>>>> +               atomic_set(&ns->setgroups_allowed, 0);
>>>> +               /* sigh memory barriers! */
>>>
>>> I don't think that any barriers are needed.  If you ever observe
>>> setgroups_allowed == 0, it will stay that way forever.
>>
>> Likely.   In practice the code works today.
>>
>> But I need to review things closely to understand if there are barriers
>> needed.  But especially since it is a write once knob we can get away
>> with a lot.
>>
>
> Yeah.
>
> For long-term use, I kind of like the flags approach I added in the
> other patch.  It makes it easy to add more flags in the future.

I thought a dedicated word might be easier.  I don't know that it makes
much of a difference at this point.

> In any event, I think the only barrier that's needed is when writing gid_map:
>
> atomic_read / test_bit to determine whether unpriv mappings are okay;
> smp_mb() or whatever the current _after_atomic thing is these days;
> write mapping;
>
> Although I'm not sure whether Linux supports any architectures that
> can violate causality in the way that barrier is there to prevent.

The DEC Alpha at least used to be able to violate causality in ways
weird enough to confuse anyone, and the alpha still seems to be in the
tree.  What keyed me in was the part in atomic_ops.txt that says these
things don't have barriers and you will probably need them, and I
remember spin locks tend to take care of all those issues for you. 

Anyway thanks for your barrier pointer.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-02 23:07                   ` Eric W. Biederman
@ 2014-12-02 23:17                     ` Andy Lutomirski
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-02 23:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 2, 2014 at 3:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>
>>>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman
>>>> <ebiederm@xmission.com> wrote:
>>>>>
>>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>>
>>>> Can you rename this to something clearer, e.g. userns_setgroups_mode?
>>>
>>> I am not certain that is any clearer.  That just reads as wordier.
>>>
>>> The userns bit is definitely confusing and wrong.  Why should we need to
>>> spell out the scope?
>>
>> Because it's a property of the process' userns, not a property of the process.
>>
>> userns_setgroups would be okay.  (Arguably it should've been
>> userns_uid_map, too, but at least uid_map sounds obviously
>> namespace-related.)
>>
>>>
>>>>>   A value of 0 means the setgroups system call is disabled in the
>>>>>   current processes user namespace and can not be enabled in the
>>>>>   future in this user namespace.
>>>>>
>>>>>   A value of 1 means the segtoups system call is enabled.
>>>>
>>>> Would it make more sense to put strings like "allow" and "deny" in
>>>> here?  That way, future extensions could add additional values.
>>>
>>> If the implementation of the write side isn't too bad.  I would love
>>> to see precedent elsewhere in the kernel.    What I don't expect to do
>>> is have any values except setgroups are enabled and setgroups are
>>> disabled.
>>
>> current_clocksource?  I think there are lots of things like that.
>>
>>>
>>> I am fine with allowing for the possibility (that is just good
>>> engineering) but I don't intend to seriously considering or
>>> implementing other possibilities.
>>
>> I can imagine a mode where there's a fixed set of groups that are
>> forced set but other groups can be added and dropped.  It would be
>> complicated to set up right, but someone might want it some day.
>
> Yeah.  I am defintiely not designing for that.
>
>>>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>>>>> index 21c91feeca2d..6d0ee1b089fb 100644
>>>>> --- a/arch/s390/kernel/compat_linux.c
>>>>> +++ b/arch/s390/kernel/compat_linux.c
>>>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
>>>>>         int retval;
>>>>>
>>>>>         if (!gid_mapping_possible(user_ns) ||
>>>>> +           !atomic_read(&user_ns->setgroups_allowed) ||
>>>>>             !capable(CAP_SETGID))
>>>>>                 return -EPERM;
>>>>
>>>> This is now incomprehensible because of the gid_mapping_possible
>>>> thing.  If you renamed gid_mapping_possible to
>>>> userns_setgroup_allowed, then this could be added to the
>>>> implementation, and this would all make sense (not to mention avoiding
>>>> duplicating this thing).
>>>>
>>>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file,
>>>>>                         kuid_t uid = make_kuid(ns->parent, id);
>>>>>                         if (uid_eq(uid, cred->euid))
>>>>>                                 return true;
>>>>> +               } else if (cap_setid == CAP_SETGID) {
>>>>> +                       kgid_t gid = make_kgid(ns->parent, id);
>>>>> +                       if (!atomic_read(&ns->setgroups_allowed) &&
>>>>> +                           gid_eq(gid, cred->egid))
>>>>> +                               return true;
>>>>
>>>> I still don't see why egid is any better than fsgid here.
>>>
>>> Answered in my earlier response fsgid was a goof.
>>> I can set any gid to my egid with my existing permissions.
>>> Show me how I can do that with fsgid or fsuid and I will be happy to use
>>> those.
>>
>> You can use your fsgid to make a setgid file, I think.  But yes, point
>> taken, although as mentioned in the other thread, I think it would be
>> a lot clearer if it were a separate patch.
>
> Agreed.
>
>>>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>>>> +                            size_t count, loff_t *ppos)
>>>>> +{
>>>>> +       struct seq_file *seq = file->private_data;
>>>>> +       struct user_namespace *ns = seq->private;
>>>>> +       char kbuf[3];
>>>>> +       int setgroups_allowed;
>>>>> +       ssize_t ret;
>>>>> +
>>>>> +       ret = -EPERM;
>>>>> +       if (!file_ns_capable(file, ns, CAP_SETGID))
>>>>> +               goto out;
>>>>
>>>> CAP_SYS_ADMIN?  This isn't setting a gid in the namespace; it's
>>>> reconfiguring the namespace.
>>>
>>> Hmm.  Maybe.  It is an activity that is normally controlled by
>>> CAP_SETGID.
>>>
>>> Frankly I think the entire split up of the capability model is almost
>>> totally broken.  But I think CAP_SETGID is a close approximation of the
>>> right thing here.
>>
>> I agree that the cap model is screwy.  But we use CAP_SYS_ADMIN for
>> everything else that changes the overall behavior of a namespace.
>>
>> In any event, the only way it matters is for a non-ns owner in the
>> parent ns with CAP_SETGID set but not CAP_SYS_ADMIN.  I'd argue that
>> CAP_SETGID should not be usable to make unrelated process' syscalls
>> fail.
>
> That is a pretty decent argument.    I will take a good stare at this
> issue as I refresh these patches and see how close to perfection I can
> make them.
>
>>>>> +       /* Only allow a very narrow range of strings to be written */
>>>>> +       ret = -EINVAL;
>>>>> +       if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1))
>>>>> +               goto out;
>>>>> +
>>>>> +       /* What was written? */
>>>>> +       ret = -EFAULT;
>>>>> +       if (copy_from_user(kbuf, buf, count))
>>>>> +               goto out;
>>>>> +       kbuf[count] = '\0';
>>>>> +
>>>>> +       /* What is being requested? */
>>>>> +       ret = -EINVAL;
>>>>> +       if (kbuf[0] == '0')
>>>>> +               setgroups_allowed = 0;
>>>>> +       else if (kbuf[0] == '1')
>>>>> +               setgroups_allowed = 1;
>>>>> +       else
>>>>> +               goto out;
>>>>> +
>>>>> +       /* Allow a trailing newline */
>>>>> +       ret = -EINVAL;
>>>>> +       if ((count == 2) && (kbuf[1] != '\n'))
>>>>> +               goto out;
>>>>> +
>>>>> +
>>>>> +       if (setgroups_allowed) {
>>>>> +               ret = -EINVAL;
>>>>> +               if (atomic_read(&ns->setgroups_allowed) == 0)
>>>>> +                       goto out;
>>>>> +       } else {
>>>>
>>>> I would disallow this if gid_map has been written in the interest of
>>>> sanity.
>>>
>>> Not a chance.  That is part of making this an independent knob.  If
>>> there is another reason for disabling setgroups you can flip this
>>> knob even after mappings are established.
>>
>> Then you really want CAP_SYS_ADMIN, I think.
>>
>>>
>>>>> +               atomic_set(&ns->setgroups_allowed, 0);
>>>>> +               /* sigh memory barriers! */
>>>>
>>>> I don't think that any barriers are needed.  If you ever observe
>>>> setgroups_allowed == 0, it will stay that way forever.
>>>
>>> Likely.   In practice the code works today.
>>>
>>> But I need to review things closely to understand if there are barriers
>>> needed.  But especially since it is a write once knob we can get away
>>> with a lot.
>>>
>>
>> Yeah.
>>
>> For long-term use, I kind of like the flags approach I added in the
>> other patch.  It makes it easy to add more flags in the future.
>
> I thought a dedicated word might be easier.  I don't know that it makes
> much of a difference at this point.
>
>> In any event, I think the only barrier that's needed is when writing gid_map:
>>
>> atomic_read / test_bit to determine whether unpriv mappings are okay;
>> smp_mb() or whatever the current _after_atomic thing is these days;
>> write mapping;
>>
>> Although I'm not sure whether Linux supports any architectures that
>> can violate causality in the way that barrier is there to prevent.
>
> The DEC Alpha at least used to be able to violate causality in ways
> weird enough to confuse anyone, and the alpha still seems to be in the
> tree.  What keyed me in was the part in atomic_ops.txt that says these
> things don't have barriers and you will probably need them, and I
> remember spin locks tend to take care of all those issues for you.
>
> Anyway thanks for your barrier pointer.
>

My pointer was a bit off.  Barriers synchronize with barriers; the
check for whether setgroups is okay may need a barrier as well:

static inline bool may_setgroups(whatever) {
    if (no groups are mapped)
      return false;
    smp_rmb() (or _before_atomic, maybe -- I don't know whether that's
okay for atomic_read);
    if (setgroups is disallowed by option)
      return false;

    return true;
}

It would be bad if there were a window in which we'd see a group
mapped but not see that setgroups is disallowed.

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings.
  2014-12-02 23:17                     ` Andy Lutomirski
@ 2014-12-08 22:06                       ` Eric W. Biederman
  2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
                                           ` (5 more replies)
  0 siblings, 6 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b99c862a2e3f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
-	/* Allow mapping to your own filesystem ids */
+	/* Don't allow mappings that would allow anything that wouldn't
+	 * be allowed without the establishment of unprivileged mappings.
+	 */
 	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
@ 2014-12-08 22:07                         ` Eric W. Biederman
  2014-12-08 22:11                           ` Andy Lutomirski
  2014-12-08 22:17                           ` Richard Weinberger
  2014-12-08 22:07                         ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
                                           ` (4 subsequent siblings)
  5 siblings, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/user_namespace.h | 9 +++++++++
 kernel/groups.c                | 7 ++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..41cc26e5a350 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -37,6 +37,15 @@ struct user_namespace {
 
 extern struct user_namespace init_user_ns;
 
+static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
+{
+	bool established;
+	smp_mb__before_atomic();
+	established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
+	smp_mb__after_atomic();
+	return established;
+}
+
 #ifdef CONFIG_USER_NS
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index 02d8a251c476..e0335e44f76a 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -217,7 +218,11 @@ bool may_setgroups(void)
 {
 	struct user_namespace *user_ns = current_user_ns();
 
-	return ns_capable(user_ns, CAP_SETGID);
+	/* It is not safe to use setgroups until a gid mapping in
+	 * the user namespace has been established.
+	 */
+	return userns_gid_mappings_established(user_ns) &&
+		ns_capable(user_ns, CAP_SETGID);
 }
 
 /*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
@ 2014-12-08 22:07                         ` Eric W. Biederman
  2014-12-08 22:08                         ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b99c862a2e3f..8e7c87162171 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, file->f_cred->fsuid))
 				return true;
-		} else if (cap_setid == CAP_SETGID) {
-			kgid_t gid = make_kgid(ns->parent, id);
-			if (gid_eq(gid, file->f_cred->fsgid))
-				return true;
 		}
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
  2014-12-08 22:07                         ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
@ 2014-12-08 22:08                         ` Eric W. Biederman
  2014-12-08 22:12                           ` Andy Lutomirski
  2014-12-08 22:10                         ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid.  Therefor it is safe to allow an unprivileged user to map their
euid and use CAP_SETUID privileged with exactly that uid, as no new
credentials can be obtained.

I can not find a combination of existing system calls that allows
setting uid, euid, suid, and fsuid from the fsuid making the previous
use of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 8e7c87162171..da1eeb927b21 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->fsuid))
+			if (uid_eq(uid, file->f_cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                           ` (2 preceding siblings ...)
  2014-12-08 22:08                         ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
@ 2014-12-08 22:10                         ` Eric W. Biederman
  2014-12-08 22:15                           ` Andy Lutomirski
  2014-12-08 22:11                         ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
  2014-12-08 22:14                         ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
  5 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace reduces the set of credentials that must be verified
can be obtained without privielge, making code verification simpler.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index da1eeb927b21..413f60fd5983 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
+	const struct cred *cred = file->f_cred;
 	/* Don't allow mappings that would allow anything that wouldn't
 	 * be allowed without the establishment of unprivileged mappings.
 	 */
-	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->euid))
+			if (uid_eq(uid, cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                           ` (3 preceding siblings ...)
  2014-12-08 22:10                         ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
@ 2014-12-08 22:11                         ` Eric W. Biederman
  2014-12-08 22:21                           ` Andy Lutomirski
  2014-12-08 22:14                         ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
  5 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of 0 means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of 1 means the segtoups system call is enabled.

- Descedent user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls
  currently do not pass in a struct file so file_ns_capable
  is unusable.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/base.c                 | 31 ++++++++++----
 include/linux/user_namespace.h | 25 +++++++++++
 kernel/groups.c                |  1 +
 kernel/user.c                  |  1 +
 kernel/user_namespace.c        | 97 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 147 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
 #endif /* CONFIG_TASK_IO_ACCOUNTING */
 
 #ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
 	const struct seq_operations *seq_ops)
 {
 	struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
 	return ret;
 }
 
-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *seq = file->private_data;
 	struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)
 
 static int proc_uid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+	return proc_userns_open(inode, file, &proc_uid_seq_operations);
 }
 
 static int proc_gid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+	return proc_userns_open(inode, file, &proc_gid_seq_operations);
 }
 
 static int proc_projid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+	return proc_userns_open(inode, file, &proc_projid_seq_operations);
 }
 
 static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
 	.write		= proc_uid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
 	.write		= proc_gid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
 	.write		= proc_projid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+	return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+	.open		= proc_setgroups_open,
+	.write		= proc_setgroups_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_userns_release,
 };
 #endif /* CONFIG_USER_NS */
 
@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	REG("timers",	  S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 };
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 41cc26e5a350..6451c401dcf6 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,12 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
 	} extent[UID_GID_MAP_MAX_EXTENTS];
 };
 
+enum user_namespace_flags {
+	USERNS_SETGROUPS_ALLOWED,
+};
+
+#define USERNS_INIT_FLAGS BIT(USERNS_SETGROUPS_ALLOWED)
+
 struct user_namespace {
 	struct uid_gid_map	uid_map;
 	struct uid_gid_map	gid_map;
@@ -27,6 +33,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -46,6 +53,22 @@ static inline bool userns_gid_mappings_established(const struct user_namespace *
 	return established;
 }
 
+static inline bool userns_setgroups_allowed(const struct user_namespace *ns)
+{
+	bool allowed;
+	smp_mb__before_atomic();
+	allowed = test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+	smp_mb__after_atomic();
+	return allowed;
+}
+
+static inline void userns_disable_setgroups(struct user_namespace *ns)
+{
+	smp_mb__before_atomic();
+	clear_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags);
+	smp_mb__after_atomic();
+}
+
 #ifdef CONFIG_USER_NS
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -69,9 +92,11 @@ struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
 extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
diff --git a/kernel/groups.c b/kernel/groups.c
index e0335e44f76a..2f136fda7c4d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -222,6 +222,7 @@ bool may_setgroups(void)
 	 * the user namespace has been established.
 	 */
 	return userns_gid_mappings_established(user_ns) &&
+		userns_setgroups_allowed(user_ns) &&
 		ns_capable(user_ns, CAP_SETGID);
 }
 
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..2d09940c9632 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.flags = USERNS_INIT_FLAGS,
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 413f60fd5983..3d128f91ced3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -98,6 +98,11 @@ int create_user_ns(struct cred *new)
 	ns->level = parent_ns->level + 1;
 	ns->owner = owner;
 	ns->group = group;
+	ns->flags = USERNS_INIT_FLAGS;
+
+	/* Copy USERNS_SETGROUPS_ALLOWED from the parent user namespace */
+	if (!userns_setgroups_allowed(parent_ns))
+		userns_disable_setgroups(ns);
 
 	set_cred_user_ns(new, ns);
 
@@ -841,6 +846,98 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return (*ppos == 0) ?  ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+	++*ppos;
+	return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+
+	seq_printf(seq, "%s\n",
+		   test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
+		   "allow" : "deny");
+	return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+	.start	= setgroups_m_start,
+	.stop = setgroups_m_stop,
+	.next = setgroups_m_next,
+	.show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	char kbuf[8], *pos;
+	bool setgroups_allowed;
+	ssize_t ret;
+
+	ret = -EACCES;
+	if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
+		goto out;
+
+	/* Only allow a very narrow range of strings to be written */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= sizeof(kbuf)))
+		goto out;
+
+	/* What was written? */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+	pos = kbuf;
+
+	/* What is being requested? */
+	ret = -EINVAL;
+	if (strncmp(pos, "allow", 5) == 0) {
+		pos += 5;
+		setgroups_allowed = true;
+	}
+	else if (strncmp(pos, "deny", 4) == 0) {
+		pos += 4;
+		setgroups_allowed = false;
+	}
+	else
+		goto out;
+
+	/* Verify there is not trailing junk on the line */
+	pos = skip_spaces(pos);
+	if (*pos != '\0')
+		goto out;
+
+	if (setgroups_allowed) {
+		ret = -EPERM;
+		if (!userns_setgroups_allowed(ns))
+			goto out;
+	} else {
+		userns_disable_setgroups(ns);
+	}
+
+	/* Report a successful write */
+	*ppos = count;
+	ret = count;
+out:
+	return ret;
+}
+
 static void *userns_get(struct task_struct *task)
 {
 	struct user_namespace *user_ns;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
@ 2014-12-08 22:11                           ` Andy Lutomirski
       [not found]                             ` <87h9x5ok0h.fsf@x220.int.ebiederm.org>
  2014-12-08 22:17                           ` Richard Weinberger
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
>
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established.  Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
>
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/user_namespace.h | 9 +++++++++
>  kernel/groups.c                | 7 ++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..41cc26e5a350 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -37,6 +37,15 @@ struct user_namespace {
>
>  extern struct user_namespace init_user_ns;
>
> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
> +{
> +       bool established;
> +       smp_mb__before_atomic();
> +       established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
> +       smp_mb__after_atomic();
> +       return established;
> +}

I don't think this works on all platforms.  ACCESS_ONCE is not atomic
in the smp_mb__before_atomic sense.

> +
>  #ifdef CONFIG_USER_NS
>
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 02d8a251c476..e0335e44f76a 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
>  #include <linux/slab.h>
>  #include <linux/security.h>
>  #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
>  #include <asm/uaccess.h>
>
>  /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -217,7 +218,11 @@ bool may_setgroups(void)
>  {
>         struct user_namespace *user_ns = current_user_ns();
>
> -       return ns_capable(user_ns, CAP_SETGID);
> +       /* It is not safe to use setgroups until a gid mapping in
> +        * the user namespace has been established.
> +        */
> +       return userns_gid_mappings_established(user_ns) &&
> +               ns_capable(user_ns, CAP_SETGID);
>  }
>
>  /*
> --
> 1.9.1
>

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping
  2014-12-08 22:08                         ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
@ 2014-12-08 22:12                           ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:08 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> setresuid allows the euid to be set to any of uid, euid, suid, and
> fsuid.  Therefor it is safe to allow an unprivileged user to map their
> euid and use CAP_SETUID privileged with exactly that uid, as no new
> credentials can be obtained.
>
> I can not find a combination of existing system calls that allows
> setting uid, euid, suid, and fsuid from the fsuid making the previous
> use of fsuid for allowing unprivileged mappings a bug.

Right.

>
> This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 8e7c87162171..da1eeb927b21 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
>                 u32 id = new_map->extent[0].lower_first;
>                 if (cap_setid == CAP_SETUID) {
>                         kuid_t uid = make_kuid(ns->parent, id);
> -                       if (uid_eq(uid, file->f_cred->fsuid))
> +                       if (uid_eq(uid, file->f_cred->euid))
>                                 return true;
>                 }
>         }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled
  2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                           ` (4 preceding siblings ...)
  2014-12-08 22:11                         ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-08 22:14                         ` Eric W. Biederman
  2014-12-08 22:26                           ` Andy Lutomirski
  5 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivilege
setting of gid_map was removed.  Applications that use this
functionality will need to check to see if they use setgroups or
init_groups, and if they don't they can be fixed by simply
disabling of setgroups before they run.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 3d128f91ced3..459c7f647072 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, cred->euid))
 				return true;
+		} else if (cap_setid == CAP_SETGID) {
+			kgid_t gid = make_kgid(ns->parent, id);
+			if (!userns_setgroups_allowed(ns) &&
+			    gid_eq(gid, cred->egid))
+				return true;
 		}
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings
  2014-12-08 22:10                         ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
@ 2014-12-08 22:15                           ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:10 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> If you did not create the user namespace and are allowed
> to write to uid_map or gid_map you should already have the necessary
> privilege in the parent user namespace to establish any mapping
> you want so this will not affect userspace in practice.
>
> Limiting unprivileged uid mapping establishment to the creator of the
> user namespace reduces the set of credentials that must be verified
> can be obtained without privielge, making code verification simpler.
>

s/privielge/privilege/

But I still can't parse that sentence.

The code itself is:

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

> Limiting unprivileged gid mapping establishment (which is temporarily
> absent) to the creator of the user namespace also ensures that the
> combination of uid and gid can already be obtained without privilege.
>
> This is part of the fix for CVE-2014-8989.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index da1eeb927b21..413f60fd5983 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
>                                 struct user_namespace *ns, int cap_setid,
>                                 struct uid_gid_map *new_map)
>  {
> +       const struct cred *cred = file->f_cred;
>         /* Don't allow mappings that would allow anything that wouldn't
>          * be allowed without the establishment of unprivileged mappings.
>          */
> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
> +           uid_eq(ns->owner, cred->euid)) {
>                 u32 id = new_map->extent[0].lower_first;
>                 if (cap_setid == CAP_SETUID) {
>                         kuid_t uid = make_kuid(ns->parent, id);
> -                       if (uid_eq(uid, file->f_cred->euid))
> +                       if (uid_eq(uid, cred->euid))
>                                 return true;
>                 }
>         }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
  2014-12-08 22:11                           ` Andy Lutomirski
@ 2014-12-08 22:17                           ` Richard Weinberger
  2014-12-08 22:25                             ` Andy Lutomirski
  1 sibling, 1 reply; 79+ messages in thread
From: Richard Weinberger @ 2014-12-08 22:17 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable

Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
> 
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
> 
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established.  Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
> 
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/user_namespace.h | 9 +++++++++
>  kernel/groups.c                | 7 ++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..41cc26e5a350 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -37,6 +37,15 @@ struct user_namespace {
>  
>  extern struct user_namespace init_user_ns;
>  
> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
> +{
> +	bool established;
> +	smp_mb__before_atomic();
> +	established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
> +	smp_mb__after_atomic();
> +	return established;
> +}
> +

Maybe this is a stupid question, but why do we need all this magic
around established =  ... ?
The purpose of this code is to check whether ns->gid_map.nr_extents != 0
in a lock-free manner?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 22:11                         ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-08 22:21                           ` Andy Lutomirski
  2014-12-08 22:44                             ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
>   A value of 0 means the setgroups system call is disabled in the

"deny"

>   current processes user namespace and can not be enabled in the
>   future in this user namespace.
>
>   A value of 1 means the segtoups system call is enabled.
>

"allow"

> - Descedent user namespaces inherit the value of setgroups from

s/Descedent/Descendent/

> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>          * the user namespace has been established.
>          */
>         return userns_gid_mappings_established(user_ns) &&
> +               userns_setgroups_allowed(user_ns) &&
>                 ns_capable(user_ns, CAP_SETGID);
>  }

Can you add a comment explaining the ordering?  For example:

We need to check for a gid mapping before checking setgroups_allowed
because an unprivileged user can create a userns with setgroups
allowed, then disallow setgroups and add a mapping.  If we check in
the opposite order, then we have a race: we could see that setgroups
is allowed before the user clears the bit and then see that there is a
gid mapping after the other thread is done.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-08 22:17                           ` Richard Weinberger
@ 2014-12-08 22:25                             ` Andy Lutomirski
  2014-12-08 22:27                               ` Richard Weinberger
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:25 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Eric W. Biederman, Linux Containers, Josh Triplett,
	Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API,
	linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote:
> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>
>> setgroups is unique in not needing a valid mapping before it can be called,
>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>
>> The design of the user namespace assumes that CAP_SETGID can not actually
>> be used until a gid mapping is established.  Therefore add a helper function
>> to see if the user namespace gid mapping has been established and call
>> that function in the setgroups permission check.
>>
>> This is part of the fix for CVE-2014-8989, being able to drop groups
>> without privilege using user namespaces.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  include/linux/user_namespace.h | 9 +++++++++
>>  kernel/groups.c                | 7 ++++++-
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..41cc26e5a350 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -37,6 +37,15 @@ struct user_namespace {
>>
>>  extern struct user_namespace init_user_ns;
>>
>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>> +{
>> +     bool established;
>> +     smp_mb__before_atomic();
>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>> +     smp_mb__after_atomic();
>> +     return established;
>> +}
>> +
>
> Maybe this is a stupid question, but why do we need all this magic
> around established =  ... ?
> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
> in a lock-free manner?
>

See my other comment -- the ordering will matter at the end of the series.

It might be nicer to do this differently: in may_setgroups, do:

if (!userns_gid_mappings_established)
  return false;

/* User code can start with setgroups allowed, disallow it, and then
add a mapping.  We need to prevent a race that could cause this
function to return true. */
smp_rmb();

if (!userns_setgroups_allowed)
  return false;

--Andy

> Thanks,
> //richard



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled
  2014-12-08 22:14                         ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
@ 2014-12-08 22:26                           ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:14 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Now that setgroups can be disabled and not reenabled, setting gid_map
> without privielge can now be enabled when setgroups is disabled.
>
> This restores most of the functionality that was lost when unprivilege

unprivileged.

> setting of gid_map was removed.  Applications that use this
> functionality will need to check to see if they use setgroups or
> init_groups, and if they don't they can be fixed by simply
> disabling of setgroups before they run.

"disabling setgroups before writing to gid_map"?

The code is:

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 3d128f91ced3..459c7f647072 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file,
>                         kuid_t uid = make_kuid(ns->parent, id);
>                         if (uid_eq(uid, cred->euid))
>                                 return true;
> +               } else if (cap_setid == CAP_SETGID) {
> +                       kgid_t gid = make_kgid(ns->parent, id);
> +                       if (!userns_setgroups_allowed(ns) &&
> +                           gid_eq(gid, cred->egid))
> +                               return true;
>                 }
>         }
>
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-08 22:25                             ` Andy Lutomirski
@ 2014-12-08 22:27                               ` Richard Weinberger
       [not found]                                 ` <874mt5ojfh.fsf@x220.int.ebiederm.org>
  0 siblings, 1 reply; 79+ messages in thread
From: Richard Weinberger @ 2014-12-08 22:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Linux Containers, Josh Triplett,
	Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API,
	linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn,
	Kenton Varda, stable

Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote:
>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established.  Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  include/linux/user_namespace.h | 9 +++++++++
>>>  kernel/groups.c                | 7 ++++++-
>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>>  extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> +     bool established;
>>> +     smp_mb__before_atomic();
>>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> +     smp_mb__after_atomic();
>>> +     return established;
>>> +}
>>> +
>>
>> Maybe this is a stupid question, but why do we need all this magic
>> around established =  ... ?
>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>> in a lock-free manner?
>>
> 
> See my other comment -- the ordering will matter at the end of the series.

But ns->gid_map.nr_extents is not atomic, it is a plain u32.
This confuses me.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
       [not found]                             ` <87h9x5ok0h.fsf@x220.int.ebiederm.org>
@ 2014-12-08 22:33                               ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established.  Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  include/linux/user_namespace.h | 9 +++++++++
>>>  kernel/groups.c                | 7 ++++++-
>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>>  extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> +       bool established;
>>> +       smp_mb__before_atomic();
>>> +       established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> +       smp_mb__after_atomic();
>>> +       return established;
>>> +}
>>
>> I don't think this works on all platforms.  ACCESS_ONCE is not atomic
>> in the smp_mb__before_atomic sense.
>
> Documentation/atomic_ops.txt documents ACCESS_ONCE as being equivalent
> to atomic_read() and atomic_set().  smp_mb__before_atomic and
> smp_mb__after_atomic() are Documented as working with atomic_read and
> atomic_set.  Maybe it is a stretch to use them but it doesn't seem like
> much of a stretch.

I don't fully understand the design there.  I think this is an attempt
to work around the fact that test_bit is fully atomic on x86 but not
elsewhere.

>
> Further at this point I don't know that any barriers are strictly
> needed, beyond the ACCESS_ONCE.  However since x86 does all of the
> ordering in hardware that I need I am not going to find any bugs that
> don't require a barrier.
>
> All I really want is the same level of barriers I would get if I used a
> spin-lock protected data structure so I don't need to worry about
> crazy smp issues that happen when the hardware decides it is safe to
> reorder things.

Use smp_rmb(), I think.  It'll be obviously correct, and the
performance impact really doesn't matter.

Also, on platforms where this stuff matters, the barrier in
smp_mb__whatever will be a full fence, whereas smp_rmb may be lighter
weight.

--Andy

>
> Eric
>
>
>>> +
>>>  #ifdef CONFIG_USER_NS
>>>
>>>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>>> diff --git a/kernel/groups.c b/kernel/groups.c
>>> index 02d8a251c476..e0335e44f76a 100644
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -6,6 +6,7 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/security.h>
>>>  #include <linux/syscalls.h>
>>> +#include <linux/user_namespace.h>
>>>  #include <asm/uaccess.h>
>>>
>>>  /* init to 2 - one for init_task, one to ensure it is never freed */
>>> @@ -217,7 +218,11 @@ bool may_setgroups(void)
>>>  {
>>>         struct user_namespace *user_ns = current_user_ns();
>>>
>>> -       return ns_capable(user_ns, CAP_SETGID);
>>> +       /* It is not safe to use setgroups until a gid mapping in
>>> +        * the user namespace has been established.
>>> +        */
>>> +       return userns_gid_mappings_established(user_ns) &&
>>> +               ns_capable(user_ns, CAP_SETGID);
>>>  }
>>>
>>>  /*
>>> --
>>> 1.9.1
>>>
>>
>> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 22:21                           ` Andy Lutomirski
@ 2014-12-08 22:44                             ` Eric W. Biederman
  2014-12-08 22:48                               ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 22:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>
>>   A value of 0 means the setgroups system call is disabled in the
>
> "deny"
>
>>   current processes user namespace and can not be enabled in the
>>   future in this user namespace.
>>
>>   A value of 1 means the segtoups system call is enabled.
>>
>
> "allow"
>
>> - Descedent user namespaces inherit the value of setgroups from
>
> s/Descedent/Descendent/

Bah.  I updated everything but the changelog comment.

>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>          * the user namespace has been established.
>>          */
>>         return userns_gid_mappings_established(user_ns) &&
>> +               userns_setgroups_allowed(user_ns) &&
>>                 ns_capable(user_ns, CAP_SETGID);
>>  }
>
> Can you add a comment explaining the ordering?  For example:

I need to think on what I can say to make it clear.
Perhaps: /* Careful the order of these checks is important. */

> We need to check for a gid mapping before checking setgroups_allowed
> because an unprivileged user can create a userns with setgroups
> allowed, then disallow setgroups and add a mapping.  If we check in
> the opposite order, then we have a race: we could see that setgroups
> is allowed before the user clears the bit and then see that there is a
> gid mapping after the other thread is done.

Since these are independent atomic variables yes that ordering issue
seems to be the case.

For me it was the natural ordering of the checks so I didn't even bother
to think about what happens when you reorder them.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
       [not found]                                 ` <874mt5ojfh.fsf@x220.int.ebiederm.org>
@ 2014-12-08 22:47                                   ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, Josh Triplett,
	Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API,
	linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:39 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Richard Weinberger <richard@nod.at> writes:
>
>> Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
>>> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote:
>>>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>>>
>>>>> setgroups is unique in not needing a valid mapping before it can be called,
>>>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>>>
>>>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>>>> be used until a gid mapping is established.  Therefore add a helper function
>>>>> to see if the user namespace gid mapping has been established and call
>>>>> that function in the setgroups permission check.
>>>>>
>>>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>>>> without privilege using user namespaces.
>>>>>
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>> ---
>>>>>  include/linux/user_namespace.h | 9 +++++++++
>>>>>  kernel/groups.c                | 7 ++++++-
>>>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>>>> index e95372654f09..41cc26e5a350 100644
>>>>> --- a/include/linux/user_namespace.h
>>>>> +++ b/include/linux/user_namespace.h
>>>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>>>
>>>>>  extern struct user_namespace init_user_ns;
>>>>>
>>>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>>>> +{
>>>>> +     bool established;
>>>>> +     smp_mb__before_atomic();
>>>>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>>>> +     smp_mb__after_atomic();
>>>>> +     return established;
>>>>> +}
>>>>> +
>>>>
>>>> Maybe this is a stupid question, but why do we need all this magic
>>>> around established =  ... ?
>>>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>>>> in a lock-free manner?
>>>>
>>>
>>> See my other comment -- the ordering will matter at the end of the series.
>>
>> But ns->gid_map.nr_extents is not atomic, it is a plain u32.
>> This confuses me.
>
> Read Documentation/atomic_ops.txt a plain u32 is atomic by definiton.
>

I still don't understand why the helper changed to smp_mb__before_atomic.

> Which is a little bit convoluted.  However that is part of the of the
> gid mapping path and I optimized that as far as I humanly could so that
> calls like stat don't take a noticable slow donw.
>
> On this path we don't particularly care except that I am using an the
> existing data structure.

As an example, arm64 defines both smp_mb__before_atomic and
smp_mb__after_atomic as smp_mb(), which is heavier then smp_rmb(), and
there are two of them.  So I still like the explicit smp_rmb() better.

--Andy

>
> Eric
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 22:44                             ` Eric W. Biederman
@ 2014-12-08 22:48                               ` Andy Lutomirski
  2014-12-08 23:30                                 ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-08 22:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>
>>>   A value of 0 means the setgroups system call is disabled in the
>>
>> "deny"
>>
>>>   current processes user namespace and can not be enabled in the
>>>   future in this user namespace.
>>>
>>>   A value of 1 means the segtoups system call is enabled.
>>>
>>
>> "allow"
>>
>>> - Descedent user namespaces inherit the value of setgroups from
>>
>> s/Descedent/Descendent/
>
> Bah.  I updated everything but the changelog comment.
>
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>>          * the user namespace has been established.
>>>          */
>>>         return userns_gid_mappings_established(user_ns) &&
>>> +               userns_setgroups_allowed(user_ns) &&
>>>                 ns_capable(user_ns, CAP_SETGID);
>>>  }
>>
>> Can you add a comment explaining the ordering?  For example:
>
> I need to think on what I can say to make it clear.
> Perhaps: /* Careful the order of these checks is important. */
>
>> We need to check for a gid mapping before checking setgroups_allowed
>> because an unprivileged user can create a userns with setgroups
>> allowed, then disallow setgroups and add a mapping.  If we check in
>> the opposite order, then we have a race: we could see that setgroups
>> is allowed before the user clears the bit and then see that there is a
>> gid mapping after the other thread is done.
>

This text was actually my suggested comment text.

If you put smp_rmb() in this function with a comment like that, then I
think it will all make sense and be obviously correct (even with most
of the other barriers removed).

--Andy

> Since these are independent atomic variables yes that ordering issue
> seems to be the case.
>
> For me it was the natural ordering of the checks so I didn't even bother
> to think about what happens when you reorder them.
>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 22:48                               ` Andy Lutomirski
@ 2014-12-08 23:30                                 ` Eric W. Biederman
  2014-12-09 19:31                                   ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-08 23:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

Andy Lutomirski <luto@amacapital.net> writes:

> On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>
>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>>
>>>>   A value of 0 means the setgroups system call is disabled in the
>>>
>>> "deny"
>>>
>>>>   current processes user namespace and can not be enabled in the
>>>>   future in this user namespace.
>>>>
>>>>   A value of 1 means the segtoups system call is enabled.
>>>>
>>>
>>> "allow"
>>>
>>>> - Descedent user namespaces inherit the value of setgroups from
>>>
>>> s/Descedent/Descendent/
>>
>> Bah.  I updated everything but the changelog comment.
>>
>>>> --- a/kernel/groups.c
>>>> +++ b/kernel/groups.c
>>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>>>          * the user namespace has been established.
>>>>          */
>>>>         return userns_gid_mappings_established(user_ns) &&
>>>> +               userns_setgroups_allowed(user_ns) &&
>>>>                 ns_capable(user_ns, CAP_SETGID);
>>>>  }
>>>
>>> Can you add a comment explaining the ordering?  For example:
>>
>> I need to think on what I can say to make it clear.
>> Perhaps: /* Careful the order of these checks is important. */
>>
>>> We need to check for a gid mapping before checking setgroups_allowed
>>> because an unprivileged user can create a userns with setgroups
>>> allowed, then disallow setgroups and add a mapping.  If we check in
>>> the opposite order, then we have a race: we could see that setgroups
>>> is allowed before the user clears the bit and then see that there is a
>>> gid mapping after the other thread is done.
>>
>
> This text was actually my suggested comment text.

Now I see.

> If you put smp_rmb() in this function with a comment like that, then I
> think it will all make sense and be obviously correct (even with most
> of the other barriers removed).

Right.

Given that we have to be careful when using these things anyway what
I was hoping to achieve with the barriers appears impossible, and
confusing so I will see about just adding barriers where we need them
for real.  Sigh.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-08 23:30                                 ` Eric W. Biederman
@ 2014-12-09 19:31                                   ` Eric W. Biederman
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

ebiederm@xmission.com (Eric W. Biederman) writes:

> Andy Lutomirski <luto@amacapital.net> writes:
>
>>
>> This text was actually my suggested comment text.
>
> Now I see.
>
>> If you put smp_rmb() in this function with a comment like that, then I
>> think it will all make sense and be obviously correct (even with most
>> of the other barriers removed).
>
> Right.
>
> Given that we have to be careful when using these things anyway what
> I was hoping to achieve with the barriers appears impossible, and
> confusing so I will see about just adding barriers where we need them
> for real.  Sigh.

Doh.  The code has been entirely too clever.

There are no need for atomics or other cleverness, I just need to
generalize id_map_mutex.  I knew that had to be a trivially correct
way of handling this mess.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings.
  2014-12-09 19:31                                   ` Eric W. Biederman
@ 2014-12-09 20:36                                     ` Eric W. Biederman
  2014-12-09 20:38                                       ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
                                                         ` (7 more replies)
  0 siblings, 8 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b99c862a2e3f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
-	/* Allow mapping to your own filesystem ids */
+	/* Don't allow mappings that would allow anything that wouldn't
+	 * be allowed without the establishment of unprivileged mappings.
+	 */
 	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
@ 2014-12-09 20:38                                       ` Eric W. Biederman
  2014-12-09 22:49                                         ` Andy Lutomirski
  2014-12-09 20:39                                       ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
                                                         ` (6 subsequent siblings)
  7 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/user_namespace.h |  5 +++++
 kernel/groups.c                |  4 +++-
 kernel/user_namespace.c        | 14 ++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..8d493083486a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern bool userns_may_setgroups(const struct user_namespace *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns)
 {
 }
 
+static inline bool userns_may_setgroups(const struct user_namespace *ns)
+{
+	return true;
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/groups.c b/kernel/groups.c
index 02d8a251c476..664411f171b5 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -217,7 +218,8 @@ bool may_setgroups(void)
 {
 	struct user_namespace *user_ns = current_user_ns();
 
-	return ns_capable(user_ns, CAP_SETGID);
+	return ns_capable(user_ns, CAP_SETGID) &&
+		userns_may_setgroups(user_ns);
 }
 
 /*
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b99c862a2e3f..27c8dab48c07 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+bool userns_may_setgroups(const struct user_namespace *ns)
+{
+	bool allowed;
+
+	mutex_lock(&id_map_mutex);
+	/* It is not safe to use setgroups until a gid mapping in
+	 * the user namespace has been established.
+	 */
+	allowed = ns->gid_map.nr_extents != 0;
+	mutex_unlock(&id_map_mutex);
+
+	return allowed;
+}
+
 static void *userns_get(struct task_struct *task)
 {
 	struct user_namespace *user_ns;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  2014-12-09 20:38                                       ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
@ 2014-12-09 20:39                                       ` Eric W. Biederman
  2014-12-09 23:00                                         ` Andy Lutomirski
  2014-12-09 20:39                                       ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
                                                         ` (5 subsequent siblings)
  7 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 27c8dab48c07..1ce6d67c07b7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, file->f_cred->fsuid))
 				return true;
-		} else if (cap_setid == CAP_SETGID) {
-			kgid_t gid = make_kgid(ns->parent, id);
-			if (gid_eq(gid, file->f_cred->fsgid))
-				return true;
 		}
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
  2014-12-09 20:38                                       ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
  2014-12-09 20:39                                       ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
@ 2014-12-09 20:39                                       ` Eric W. Biederman
  2014-12-09 20:41                                       ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
                                                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid.  Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1ce6d67c07b7..9451b12a9b6c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->fsuid))
+			if (uid_eq(uid, file->f_cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                                         ` (2 preceding siblings ...)
  2014-12-09 20:39                                       ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
@ 2014-12-09 20:41                                       ` Eric W. Biederman
  2014-12-09 20:41                                       ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
                                                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9451b12a9b6c..1e34de2fbd60 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
+	const struct cred *cred = file->f_cred;
 	/* Don't allow mappings that would allow anything that wouldn't
 	 * be allowed without the establishment of unprivileged mappings.
 	 */
-	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->euid))
+			if (uid_eq(uid, cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                                         ` (3 preceding siblings ...)
  2014-12-09 20:41                                       ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
@ 2014-12-09 20:41                                       ` Eric W. Biederman
  2014-12-09 22:49                                         ` Andy Lutomirski
  2014-12-09 20:42                                       ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
                                                         ` (2 subsequent siblings)
  7 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


Generalize id_map_mutex so it can be used for more state of a user namespace.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1e34de2fbd60..44a555ac6104 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -24,6 +24,7 @@
 #include <linux/fs_struct.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
+static DEFINE_MUTEX(userns_state_mutex);
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
@@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map,
 	return false;
 }
 
-
-static DEFINE_MUTEX(id_map_mutex);
-
 static ssize_t map_write(struct file *file, const char __user *buf,
 			 size_t count, loff_t *ppos,
 			 int cap_setid,
@@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	ssize_t ret = -EINVAL;
 
 	/*
-	 * The id_map_mutex serializes all writes to any given map.
+	 * The userns_state_mutex serializes all writes to any given map.
 	 *
 	 * Any map is only ever written once.
 	 *
@@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	 * order and smp_rmb() is guaranteed that we don't have crazy
 	 * architectures returning stale data.
 	 */
-	mutex_lock(&id_map_mutex);
+	mutex_lock(&userns_state_mutex);
 
 	ret = -EPERM;
 	/* Only allow one successful write to the map */
@@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	*ppos = count;
 	ret = count;
 out:
-	mutex_unlock(&id_map_mutex);
+	mutex_unlock(&userns_state_mutex);
 	if (page)
 		free_page(page);
 	return ret;
@@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	bool allowed;
 
-	mutex_lock(&id_map_mutex);
+	mutex_lock(&userns_state_mutex);
 	/* It is not safe to use setgroups until a gid mapping in
 	 * the user namespace has been established.
 	 */
 	allowed = ns->gid_map.nr_extents != 0;
-	mutex_unlock(&id_map_mutex);
+	mutex_unlock(&userns_state_mutex);
 
 	return allowed;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                                         ` (4 preceding siblings ...)
  2014-12-09 20:41                                       ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
@ 2014-12-09 20:42                                       ` Eric W. Biederman
  2014-12-09 22:28                                         ` Andy Lutomirski
  2014-12-09 20:43                                       ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
  2014-12-10 16:39                                       ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman
  7 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of "deny" means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls
  currently do not pass in a struct file so file_ns_capable
  is unusable.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/base.c                 |  31 +++++++++----
 include/linux/user_namespace.h |   7 +++
 kernel/user.c                  |   1 +
 kernel/user_namespace.c        | 103 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..4ebed9f01d97 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
 #endif /* CONFIG_TASK_IO_ACCOUNTING */
 
 #ifdef CONFIG_USER_NS
-static int proc_id_map_open(struct inode *inode, struct file *file,
+static int proc_userns_open(struct inode *inode, struct file *file,
 	const struct seq_operations *seq_ops)
 {
 	struct user_namespace *ns = NULL;
@@ -2418,7 +2418,7 @@ err:
 	return ret;
 }
 
-static int proc_id_map_release(struct inode *inode, struct file *file)
+static int proc_userns_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *seq = file->private_data;
 	struct user_namespace *ns = seq->private;
@@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)
 
 static int proc_uid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+	return proc_userns_open(inode, file, &proc_uid_seq_operations);
 }
 
 static int proc_gid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+	return proc_userns_open(inode, file, &proc_gid_seq_operations);
 }
 
 static int proc_projid_map_open(struct inode *inode, struct file *file)
 {
-	return proc_id_map_open(inode, file, &proc_projid_seq_operations);
+	return proc_userns_open(inode, file, &proc_projid_seq_operations);
 }
 
 static const struct file_operations proc_uid_map_operations = {
@@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
 	.write		= proc_uid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_gid_map_operations = {
@@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
 	.write		= proc_gid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
 };
 
 static const struct file_operations proc_projid_map_operations = {
@@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
 	.write		= proc_projid_map_write,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= proc_id_map_release,
+	.release	= proc_userns_release,
+};
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+	return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
+}
+
+static const struct file_operations proc_setgroups_operations = {
+	.open		= proc_setgroups_open,
+	.write		= proc_setgroups_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_userns_release,
 };
 #endif /* CONFIG_USER_NS */
 
@@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	REG("timers",	  S_IRUGO, proc_timers_operations),
@@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 };
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8d493083486a..feb0f4ec5573 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,10 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
 	} extent[UID_GID_MAP_MAX_EXTENTS];
 };
 
+#define USERNS_SETGROUPS_ALLOWED 1UL
+
+#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
+
 struct user_namespace {
 	struct uid_gid_map	uid_map;
 	struct uid_gid_map	gid_map;
@@ -27,6 +31,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -60,9 +65,11 @@ struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
 extern const struct seq_operations proc_projid_seq_operations;
+extern const struct seq_operations proc_setgroups_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
 #else
 
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..2d09940c9632 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.flags = USERNS_INIT_FLAGS,
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 44a555ac6104..b507f9af7ff2 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -100,6 +100,11 @@ int create_user_ns(struct cred *new)
 	ns->owner = owner;
 	ns->group = group;
 
+	/* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
+	mutex_lock(&userns_state_mutex);
+	ns->flags = parent_ns->flags;
+	mutex_unlock(&userns_state_mutex);
+
 	set_cred_user_ns(new, ns);
 
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return (*ppos == 0) ?  ns : NULL;
+}
+
+static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+	++*ppos;
+	return NULL;
+}
+
+static void setgroups_m_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int setgroups_m_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+
+	seq_printf(seq, "%s\n",
+		   test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
+		   "allow" : "deny");
+	return 0;
+}
+
+const struct seq_operations proc_setgroups_seq_operations = {
+	.start	= setgroups_m_start,
+	.stop = setgroups_m_stop,
+	.next = setgroups_m_next,
+	.show = setgroups_m_show,
+};
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	char kbuf[8], *pos;
+	bool setgroups_allowed;
+	ssize_t ret;
+
+	ret = -EACCES;
+	if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
+		goto out;
+
+	/* Only allow a very narrow range of strings to be written */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= sizeof(kbuf)))
+		goto out;
+
+	/* What was written? */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+	pos = kbuf;
+
+	/* What is being requested? */
+	ret = -EINVAL;
+	if (strncmp(pos, "allow", 5) == 0) {
+		pos += 5;
+		setgroups_allowed = true;
+	}
+	else if (strncmp(pos, "deny", 4) == 0) {
+		pos += 4;
+		setgroups_allowed = false;
+	}
+	else
+		goto out;
+
+	/* Verify there is not trailing junk on the line */
+	pos = skip_spaces(pos);
+	if (*pos != '\0')
+		goto out;
+
+	mutex_lock(&userns_state_mutex);
+	if (setgroups_allowed) {
+		ret = -EPERM;
+		if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) {
+			mutex_unlock(&userns_state_mutex);
+			goto out;
+		}
+	} else {
+		ns->flags &= ~USERNS_SETGROUPS_ALLOWED;
+	}
+	mutex_unlock(&userns_state_mutex);
+
+	/* Report a successful write */
+	*ppos = count;
+	ret = count;
+out:
+	return ret;
+}
+
 bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	bool allowed;
@@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 	 * the user namespace has been established.
 	 */
 	allowed = ns->gid_map.nr_extents != 0;
+	/* Is setgroups allowed? */
+	allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
 	mutex_unlock(&userns_state_mutex);
 
 	return allowed;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                                         ` (5 preceding siblings ...)
  2014-12-09 20:42                                       ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-09 20:43                                       ` Eric W. Biederman
  2014-12-10 16:39                                       ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman
  7 siblings, 0 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-09 20:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable


Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed.  Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b507f9af7ff2..3b29b9a52332 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -826,6 +826,11 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, cred->euid))
 				return true;
+		} else if (cap_setid == CAP_SETGID) {
+			kgid_t gid = make_kgid(ns->parent, id);
+			if (!(ns->flags & USERNS_SETGROUPS_ALLOWED) &&
+			    gid_eq(gid, cred->egid))
+				return true;
 		}
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-09 20:42                                       ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-09 22:28                                         ` Andy Lutomirski
       [not found]                                           ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com>
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-09 22:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
>   A value of "deny" means the setgroups system call is disabled in the
>   current processes user namespace and can not be enabled in the
>   future in this user namespace.
>
>   A value of "allow" means the segtoups system call is enabled.
>
> - Descendant user namespaces inherit the value of setgroups from
>   their parents.
>
> - A proc file is used (instead of a sysctl) as sysctls
>   currently do not pass in a struct file so file_ns_capable
>   is unusable.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

But I still don't like the name "setgroups".  People may look at that
and have no clue what the scope of the setting is.  And anyone who, as
root, writes "deny" to /proc/self/setgroups, thinking that it acts on
self, will be in for a surprise.

--Andy

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/proc/base.c                 |  31 +++++++++----
>  include/linux/user_namespace.h |   7 +++
>  kernel/user.c                  |   1 +
>  kernel/user_namespace.c        | 103 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 134 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 772efa45a452..4ebed9f01d97 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns,
>  #endif /* CONFIG_TASK_IO_ACCOUNTING */
>
>  #ifdef CONFIG_USER_NS
> -static int proc_id_map_open(struct inode *inode, struct file *file,
> +static int proc_userns_open(struct inode *inode, struct file *file,
>         const struct seq_operations *seq_ops)
>  {
>         struct user_namespace *ns = NULL;
> @@ -2418,7 +2418,7 @@ err:
>         return ret;
>  }
>
> -static int proc_id_map_release(struct inode *inode, struct file *file)
> +static int proc_userns_release(struct inode *inode, struct file *file)
>  {
>         struct seq_file *seq = file->private_data;
>         struct user_namespace *ns = seq->private;
> @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file)
>
>  static int proc_uid_map_open(struct inode *inode, struct file *file)
>  {
> -       return proc_id_map_open(inode, file, &proc_uid_seq_operations);
> +       return proc_userns_open(inode, file, &proc_uid_seq_operations);
>  }
>
>  static int proc_gid_map_open(struct inode *inode, struct file *file)
>  {
> -       return proc_id_map_open(inode, file, &proc_gid_seq_operations);
> +       return proc_userns_open(inode, file, &proc_gid_seq_operations);
>  }
>
>  static int proc_projid_map_open(struct inode *inode, struct file *file)
>  {
> -       return proc_id_map_open(inode, file, &proc_projid_seq_operations);
> +       return proc_userns_open(inode, file, &proc_projid_seq_operations);
>  }
>
>  static const struct file_operations proc_uid_map_operations = {
> @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = {
>         .write          = proc_uid_map_write,
>         .read           = seq_read,
>         .llseek         = seq_lseek,
> -       .release        = proc_id_map_release,
> +       .release        = proc_userns_release,
>  };
>
>  static const struct file_operations proc_gid_map_operations = {
> @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = {
>         .write          = proc_gid_map_write,
>         .read           = seq_read,
>         .llseek         = seq_lseek,
> -       .release        = proc_id_map_release,
> +       .release        = proc_userns_release,
>  };
>
>  static const struct file_operations proc_projid_map_operations = {
> @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = {
>         .write          = proc_projid_map_write,
>         .read           = seq_read,
>         .llseek         = seq_lseek,
> -       .release        = proc_id_map_release,
> +       .release        = proc_userns_release,
> +};
> +
> +static int proc_setgroups_open(struct inode *inode, struct file *file)
> +{
> +       return proc_userns_open(inode, file, &proc_setgroups_seq_operations);
> +}
> +
> +static const struct file_operations proc_setgroups_operations = {
> +       .open           = proc_setgroups_open,
> +       .write          = proc_setgroups_write,
> +       .read           = seq_read,
> +       .llseek         = seq_lseek,
> +       .release        = proc_userns_release,
>  };
>  #endif /* CONFIG_USER_NS */
>
> @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = {
>         REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
>         REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
>         REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
> +       REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
>  #endif
>  #ifdef CONFIG_CHECKPOINT_RESTORE
>         REG("timers",     S_IRUGO, proc_timers_operations),
> @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = {
>         REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
>         REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
>         REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
> +       REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
>  #endif
>  };
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8d493083486a..feb0f4ec5573 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -17,6 +17,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
>         } extent[UID_GID_MAP_MAX_EXTENTS];
>  };
>
> +#define USERNS_SETGROUPS_ALLOWED 1UL
> +
> +#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
> +
>  struct user_namespace {
>         struct uid_gid_map      uid_map;
>         struct uid_gid_map      gid_map;
> @@ -27,6 +31,7 @@ struct user_namespace {
>         kuid_t                  owner;
>         kgid_t                  group;
>         unsigned int            proc_inum;
> +       unsigned long           flags;
>
>         /* Register of per-UID persistent keyrings for this namespace */
>  #ifdef CONFIG_PERSISTENT_KEYRINGS
> @@ -60,9 +65,11 @@ struct seq_operations;
>  extern const struct seq_operations proc_uid_seq_operations;
>  extern const struct seq_operations proc_gid_seq_operations;
>  extern const struct seq_operations proc_projid_seq_operations;
> +extern const struct seq_operations proc_setgroups_seq_operations;
>  extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
>  extern bool userns_may_setgroups(const struct user_namespace *ns);
>  #else
>
> diff --git a/kernel/user.c b/kernel/user.c
> index 4efa39350e44..2d09940c9632 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
>         .owner = GLOBAL_ROOT_UID,
>         .group = GLOBAL_ROOT_GID,
>         .proc_inum = PROC_USER_INIT_INO,
> +       .flags = USERNS_INIT_FLAGS,
>  #ifdef CONFIG_PERSISTENT_KEYRINGS
>         .persistent_keyring_register_sem =
>         __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 44a555ac6104..b507f9af7ff2 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -100,6 +100,11 @@ int create_user_ns(struct cred *new)
>         ns->owner = owner;
>         ns->group = group;
>
> +       /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
> +       mutex_lock(&userns_state_mutex);
> +       ns->flags = parent_ns->flags;
> +       mutex_unlock(&userns_state_mutex);
> +
>         set_cred_user_ns(new, ns);
>
>  #ifdef CONFIG_PERSISTENT_KEYRINGS
> @@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file,
>         return false;
>  }
>
> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +       struct user_namespace *ns = seq->private;
> +
> +       return (*ppos == 0) ?  ns : NULL;
> +}
> +
> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos)
> +{
> +       ++*ppos;
> +       return NULL;
> +}
> +
> +static void setgroups_m_stop(struct seq_file *seq, void *v)
> +{
> +}
> +
> +static int setgroups_m_show(struct seq_file *seq, void *v)
> +{
> +       struct user_namespace *ns = seq->private;
> +
> +       seq_printf(seq, "%s\n",
> +                  test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ?
> +                  "allow" : "deny");
> +       return 0;
> +}
> +
> +const struct seq_operations proc_setgroups_seq_operations = {
> +       .start  = setgroups_m_start,
> +       .stop = setgroups_m_stop,
> +       .next = setgroups_m_next,
> +       .show = setgroups_m_show,
> +};
> +
> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *ppos)
> +{
> +       struct seq_file *seq = file->private_data;
> +       struct user_namespace *ns = seq->private;
> +       char kbuf[8], *pos;
> +       bool setgroups_allowed;
> +       ssize_t ret;
> +
> +       ret = -EACCES;
> +       if (!file_ns_capable(file, ns, CAP_SYS_ADMIN))
> +               goto out;
> +
> +       /* Only allow a very narrow range of strings to be written */
> +       ret = -EINVAL;
> +       if ((*ppos != 0) || (count >= sizeof(kbuf)))
> +               goto out;
> +
> +       /* What was written? */
> +       ret = -EFAULT;
> +       if (copy_from_user(kbuf, buf, count))
> +               goto out;
> +       kbuf[count] = '\0';
> +       pos = kbuf;
> +
> +       /* What is being requested? */
> +       ret = -EINVAL;
> +       if (strncmp(pos, "allow", 5) == 0) {
> +               pos += 5;
> +               setgroups_allowed = true;
> +       }
> +       else if (strncmp(pos, "deny", 4) == 0) {
> +               pos += 4;
> +               setgroups_allowed = false;
> +       }
> +       else
> +               goto out;
> +
> +       /* Verify there is not trailing junk on the line */
> +       pos = skip_spaces(pos);
> +       if (*pos != '\0')
> +               goto out;
> +
> +       mutex_lock(&userns_state_mutex);
> +       if (setgroups_allowed) {
> +               ret = -EPERM;
> +               if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) {
> +                       mutex_unlock(&userns_state_mutex);
> +                       goto out;
> +               }
> +       } else {
> +               ns->flags &= ~USERNS_SETGROUPS_ALLOWED;
> +       }
> +       mutex_unlock(&userns_state_mutex);
> +
> +       /* Report a successful write */
> +       *ppos = count;
> +       ret = count;
> +out:
> +       return ret;
> +}
> +
>  bool userns_may_setgroups(const struct user_namespace *ns)
>  {
>         bool allowed;
> @@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns)
>          * the user namespace has been established.
>          */
>         allowed = ns->gid_map.nr_extents != 0;
> +       /* Is setgroups allowed? */
> +       allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
>         mutex_unlock(&userns_state_mutex);
>
>         return allowed;
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-09 20:38                                       ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
@ 2014-12-09 22:49                                         ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-09 22:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 9, 2014 at 12:38 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> setgroups is unique in not needing a valid mapping before it can be called,
> in the case of setgroups(0, NULL) which drops all supplemental groups.
>
> The design of the user namespace assumes that CAP_SETGID can not actually
> be used until a gid mapping is established.  Therefore add a helper function
> to see if the user namespace gid mapping has been established and call
> that function in the setgroups permission check.
>
> This is part of the fix for CVE-2014-8989, being able to drop groups
> without privilege using user namespaces.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/user_namespace.h |  5 +++++
>  kernel/groups.c                |  4 +++-
>  kernel/user_namespace.c        | 14 ++++++++++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index e95372654f09..8d493083486a 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations;
>  extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern bool userns_may_setgroups(const struct user_namespace *ns);
>  #else
>
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns)
>  {
>  }
>
> +static inline bool userns_may_setgroups(const struct user_namespace *ns)
> +{
> +       return true;
> +}
>  #endif
>
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 02d8a251c476..664411f171b5 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -6,6 +6,7 @@
>  #include <linux/slab.h>
>  #include <linux/security.h>
>  #include <linux/syscalls.h>
> +#include <linux/user_namespace.h>
>  #include <asm/uaccess.h>
>
>  /* init to 2 - one for init_task, one to ensure it is never freed */
> @@ -217,7 +218,8 @@ bool may_setgroups(void)
>  {
>         struct user_namespace *user_ns = current_user_ns();
>
> -       return ns_capable(user_ns, CAP_SETGID);
> +       return ns_capable(user_ns, CAP_SETGID) &&
> +               userns_may_setgroups(user_ns);
>  }
>
>  /*
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index b99c862a2e3f..27c8dab48c07 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file,
>         return false;
>  }
>
> +bool userns_may_setgroups(const struct user_namespace *ns)
> +{
> +       bool allowed;
> +
> +       mutex_lock(&id_map_mutex);
> +       /* It is not safe to use setgroups until a gid mapping in
> +        * the user namespace has been established.
> +        */
> +       allowed = ns->gid_map.nr_extents != 0;
> +       mutex_unlock(&id_map_mutex);
> +
> +       return allowed;
> +}
> +
>  static void *userns_get(struct task_struct *task)
>  {
>         struct user_namespace *user_ns;
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex
  2014-12-09 20:41                                       ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
@ 2014-12-09 22:49                                         ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-09 22:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 9, 2014 at 12:41 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Generalize id_map_mutex so it can be used for more state of a user namespace.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 1e34de2fbd60..44a555ac6104 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs_struct.h>
>
>  static struct kmem_cache *user_ns_cachep __read_mostly;
> +static DEFINE_MUTEX(userns_state_mutex);
>
>  static bool new_idmap_permitted(const struct file *file,
>                                 struct user_namespace *ns, int cap_setid,
> @@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map,
>         return false;
>  }
>
> -
> -static DEFINE_MUTEX(id_map_mutex);
> -
>  static ssize_t map_write(struct file *file, const char __user *buf,
>                          size_t count, loff_t *ppos,
>                          int cap_setid,
> @@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>         ssize_t ret = -EINVAL;
>
>         /*
> -        * The id_map_mutex serializes all writes to any given map.
> +        * The userns_state_mutex serializes all writes to any given map.
>          *
>          * Any map is only ever written once.
>          *
> @@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>          * order and smp_rmb() is guaranteed that we don't have crazy
>          * architectures returning stale data.
>          */
> -       mutex_lock(&id_map_mutex);
> +       mutex_lock(&userns_state_mutex);
>
>         ret = -EPERM;
>         /* Only allow one successful write to the map */
> @@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>         *ppos = count;
>         ret = count;
>  out:
> -       mutex_unlock(&id_map_mutex);
> +       mutex_unlock(&userns_state_mutex);
>         if (page)
>                 free_page(page);
>         return ret;
> @@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns)
>  {
>         bool allowed;
>
> -       mutex_lock(&id_map_mutex);
> +       mutex_lock(&userns_state_mutex);
>         /* It is not safe to use setgroups until a gid mapping in
>          * the user namespace has been established.
>          */
>         allowed = ns->gid_map.nr_extents != 0;
> -       mutex_unlock(&id_map_mutex);
> +       mutex_unlock(&userns_state_mutex);
>
>         return allowed;
>  }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings
  2014-12-09 20:39                                       ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
@ 2014-12-09 23:00                                         ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-09 23:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 9, 2014 at 12:39 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> As any gid mapping will allow and must allow for backwards
> compatibility dropping groups don't allow any gid mappings to be
> established without CAP_SETGID in the parent user namespace.
>
> For a small class of applications this change breaks userspace
> and removes useful functionality.  This small class of applications
> includes tools/testing/selftests/mount/unprivilged-remount-test.c
>
> Most of the removed functionality will be added back with the addition
> of a one way knob to disable setgroups.  Once setgroups is disabled
> setting the gid_map becomes as safe as setting the uid_map.
>
> For more common applications that set the uid_map and the gid_map
> with privilege this change will have no affect.
>

Reviewed-by: Andy Lutomirski <luto@amacapital.net>

> This is part of a fix for CVE-2014-8989.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 27c8dab48c07..1ce6d67c07b7 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
>                         kuid_t uid = make_kuid(ns->parent, id);
>                         if (uid_eq(uid, file->f_cred->fsuid))
>                                 return true;
> -               } else if (cap_setid == CAP_SETGID) {
> -                       kgid_t gid = make_kgid(ns->parent, id);
> -                       if (gid_eq(gid, file->f_cred->fsgid))
> -                               return true;
>                 }
>         }
>
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis
       [not found]                                           ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com>
@ 2014-12-10  0:21                                             ` Andy Lutomirski
       [not found]                                               ` <87wq5zf83t.fsf@x220.int.ebiederm.org>
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-10  0:21 UTC (permalink / raw)
  To: Eric W.Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <ebiederm@xmission.com> wrote:
>
>
> On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <luto@amacapital.net> wrote:
>>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
>><ebiederm@xmission.com> wrote:
>>>
>>> - Expose the knob to user space through a proc file
>>/proc/<pid>/setgroups
>>>
>>>   A value of "deny" means the setgroups system call is disabled in
>>the
>>>   current processes user namespace and can not be enabled in the
>>>   future in this user namespace.
>>>
>>>   A value of "allow" means the segtoups system call is enabled.
>>>
>>> - Descendant user namespaces inherit the value of setgroups from
>>>   their parents.
>>>
>>> - A proc file is used (instead of a sysctl) as sysctls
>>>   currently do not pass in a struct file so file_ns_capable
>>>   is unusable.
>>
>>Reviewed-by: Andy Lutomirski <luto@amacapital.net>
>>
>>But I still don't like the name "setgroups".  People may look at that
>>and have no clue what the scope of the setting is.  And anyone who, as
>>root, writes "deny" to /proc/self/setgroups, thinking that it acts on
>>self, will be in for a surprise.
>
> True setgroups isn't perfect.  Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl.   But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed.
>

Would "userns_setgroups" be okay?

--Andy

> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [CFT] Can I get some Tested-By's on this series?
  2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                                                         ` (6 preceding siblings ...)
  2014-12-09 20:43                                       ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
@ 2014-12-10 16:39                                       ` Eric W. Biederman
  2014-12-10 22:48                                         ` Serge Hallyn
  2014-12-16  2:05                                         ` Andy Lutomirski
  7 siblings, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-10 16:39 UTC (permalink / raw)
  To: Andy Lutomirski, Serge E. Hallyn, Richard Weinberger
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Kenton Varda, stable


Will people please test these patches with their container project?

These changes break container userspace (hopefully in a minimal way) if
I could have that confirmed by testing I would really appreciate it.  I
really don't want to send out a bug fix that accidentally breaks
userspace again.

The only issue sort of under discussion is if there is a better name for
/proc/<pid>/setgroups, and the name of the file will not affect the
functionality of the patchset.

With the code reviewed and written in simple obviously correct, easily
reviewable ways I am hoping/planning to send this to Linus ASAP.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 16:39                                       ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman
@ 2014-12-10 22:48                                         ` Serge Hallyn
  2014-12-10 22:50                                           ` Richard Weinberger
                                                             ` (2 more replies)
  2014-12-16  2:05                                         ` Andy Lutomirski
  1 sibling, 3 replies; 79+ messages in thread
From: Serge Hallyn @ 2014-12-10 22:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Serge E. Hallyn, Richard Weinberger, linux-man,
	Kees Cook, Linux API, Linux Containers, Josh Triplett, stable,
	linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages,
	Casey Schaufler, Andrew Morton

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> Will people please test these patches with their container project?
> 
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it.  I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
> 
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
> 
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.
> 
> Eric

Is there a git tree we can clone?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 22:48                                         ` Serge Hallyn
@ 2014-12-10 22:50                                           ` Richard Weinberger
  2014-12-10 23:19                                             ` Eric W. Biederman
  2014-12-13 22:31                                           ` serge
       [not found]                                           ` <87lhmcy2et.fsf@x220.int.ebiederm.org>
  2 siblings, 1 reply; 79+ messages in thread
From: Richard Weinberger @ 2014-12-10 22:50 UTC (permalink / raw)
  To: Serge Hallyn, Eric W. Biederman
  Cc: Andy Lutomirski, Serge E. Hallyn, linux-man, Kees Cook,
	Linux API, Linux Containers, Josh Triplett, stable, linux-kernel,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler,
	Andrew Morton

Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>
>> Will people please test these patches with their container project?
>>
>> These changes break container userspace (hopefully in a minimal way) if
>> I could have that confirmed by testing I would really appreciate it.  I
>> really don't want to send out a bug fix that accidentally breaks
>> userspace again.
>>
>> The only issue sort of under discussion is if there is a better name for
>> /proc/<pid>/setgroups, and the name of the file will not affect the
>> functionality of the patchset.
>>
>> With the code reviewed and written in simple obviously correct, easily
>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>
>> Eric
> 
> Is there a git tree we can clone?

I was about to ask that too.
Hopefully I can tomorrow find some time for testing.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 22:50                                           ` Richard Weinberger
@ 2014-12-10 23:19                                             ` Eric W. Biederman
  2014-12-11 19:27                                               ` Richard Weinberger
  2014-12-12  6:56                                               ` Chen, Hanxiao
  0 siblings, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-10 23:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Serge Hallyn, Andy Lutomirski, Serge E. Hallyn, linux-man,
	Kees Cook, Linux API, Linux Containers, Josh Triplett, stable,
	linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages,
	Casey Schaufler, Andrew Morton

Richard Weinberger <richard@nod.at> writes:

> Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
>> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>>
>>> Will people please test these patches with their container project?
>>>
>>> These changes break container userspace (hopefully in a minimal way) if
>>> I could have that confirmed by testing I would really appreciate it.  I
>>> really don't want to send out a bug fix that accidentally breaks
>>> userspace again.
>>>
>>> The only issue sort of under discussion is if there is a better name for
>>> /proc/<pid>/setgroups, and the name of the file will not affect the
>>> functionality of the patchset.
>>>
>>> With the code reviewed and written in simple obviously correct, easily
>>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>>
>>> Eric
>> 
>> Is there a git tree we can clone?
>
> I was about to ask that too.
> Hopefully I can tomorrow find some time for testing.

git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

That holds my entire queue of fixes against 3.18-rc6

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 23:19                                             ` Eric W. Biederman
@ 2014-12-11 19:27                                               ` Richard Weinberger
  2014-12-12  6:56                                               ` Chen, Hanxiao
  1 sibling, 0 replies; 79+ messages in thread
From: Richard Weinberger @ 2014-12-11 19:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andy Lutomirski, Serge E. Hallyn, linux-man,
	Kees Cook, Linux API, Linux Containers, Josh Triplett, stable,
	linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages,
	Casey Schaufler, Andrew Morton

Am 11.12.2014 um 00:19 schrieb Eric W. Biederman:
> Richard Weinberger <richard@nod.at> writes:
> 
>> Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
>>> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>>>
>>>> Will people please test these patches with their container project?
>>>>
>>>> These changes break container userspace (hopefully in a minimal way) if
>>>> I could have that confirmed by testing I would really appreciate it.  I
>>>> really don't want to send out a bug fix that accidentally breaks
>>>> userspace again.
>>>>
>>>> The only issue sort of under discussion is if there is a better name for
>>>> /proc/<pid>/setgroups, and the name of the file will not affect the
>>>> functionality of the patchset.
>>>>
>>>> With the code reviewed and written in simple obviously correct, easily
>>>> reviewable ways I am hoping/planning to send this to Linus ASAP.
>>>>
>>>> Eric
>>>
>>> Is there a git tree we can clone?
>>
>> I was about to ask that too.
>> Hopefully I can tomorrow find some time for testing.
> 
> git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
> 
> That holds my entire queue of fixes against 3.18-rc6

So far nothing broke on my libvirt-lxc test bed. :-)
Tested with openSUSE 13.2 and libvirt 1.2.9.

Tested-by: Richard Weinberger <richard@nod.at>

Thanks,
//richard

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis
       [not found]                                                 ` <87iohh3c9c.fsf@x220.int.ebiederm.org>
@ 2014-12-12  1:30                                                   ` Andy Lutomirski
       [not found]                                                   ` <8761dh3b7k.fsf_-_@x220.int.ebiederm.org>
  1 sibling, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-12  1:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable

On Thu, Dec 11, 2014 at 5:09 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <ebiederm@xmission.com> wrote:
>>>>
>>>>
>>>> On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman
>>>>><ebiederm@xmission.com> wrote:
>>>>>>
>>>>>> - Expose the knob to user space through a proc file
>>>>>/proc/<pid>/setgroups
>>>>>>
>>>>>>   A value of "deny" means the setgroups system call is disabled in
>>>>>the
>>>>>>   current processes user namespace and can not be enabled in the
>>>>>>   future in this user namespace.
>>>>>>
>>>>>>   A value of "allow" means the segtoups system call is enabled.
>>>>>>
>>>>>> - Descendant user namespaces inherit the value of setgroups from
>>>>>>   their parents.
>>>>>>
>>>>>> - A proc file is used (instead of a sysctl) as sysctls
>>>>>>   currently do not pass in a struct file so file_ns_capable
>>>>>>   is unusable.
>>>>>
>>>>>Reviewed-by: Andy Lutomirski <luto@amacapital.net>
>>>>>
>>>>>But I still don't like the name "setgroups".  People may look at that
>>>>>and have no clue what the scope of the setting is.  And anyone who, as
>>>>>root, writes "deny" to /proc/self/setgroups, thinking that it acts on
>>>>>self, will be in for a surprise.
>>>>
>>>> True setgroups isn't perfect.  Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl.   But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed.
>>>>
>>>
>>> Would "userns_setgroups" be okay?
>>
>> Maybe.
>>
>> I just played with this and this is a much bigger booby trap than I had
>> realized.  Disabling setgroups disables the possibility of logging in the
>> future and since it is a one way switch the only way out is to reboot.
>>
>> Hooray our software checks the returns of setgroups.  Booh.  This is a
>> really nasty knob to have anywhere.
>>
>> I need to think about this a little bit.  Giving root the power to shoot
>> himself in the foot is one thing.  Giving root a loaded gun pointed at
>> his foot with the hammer pulled back, and a sign that says I dare you to
>> pull the trigger, seems like a bad idea.
>>
>> I think I need to reduce when that knob can be used.  Grr.
>> Back to the drawing board!
>
> I tried out a bunch of things and finally found a simple rule.  Don't
> allow setgroups to be disabled after setgroups has been enabled in a
> user namespace.  Or in practical terms don't allow setgroups to be
> disabled after the gid_map has been set.
>
> Which in practice pretty nearly means that we are only allowing writes
> to setgroups when it is a single process and it's eventual children that
> can be affected.
>
> At which point I don't think a name change would make things any
> clearer.

The name change still helps the user to does:

$ ls /proc/self

"setgroups?  what's that?"

>
> I have also updated the code to move the permission checks to open
> where they belong (doh!).  Patch follows.

Will review and test.

>
> Eric
>
>
>
>
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 23:19                                             ` Eric W. Biederman
  2014-12-11 19:27                                               ` Richard Weinberger
@ 2014-12-12  6:56                                               ` Chen, Hanxiao
  1 sibling, 0 replies; 79+ messages in thread
From: Chen, Hanxiao @ 2014-12-12  6:56 UTC (permalink / raw)
  To: Eric W. Biederman, Richard Weinberger
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Serge Hallyn,
	Josh Triplett, stable, Andy Lutomirski, Kenton Varda, LSM,
	Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton,
	linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 1968 bytes --]



> -----Original Message-----
> From: containers-bounces@lists.linux-foundation.org
> [mailto:containers-bounces@lists.linux-foundation.org] On Behalf Of Eric W.
> Biederman
> Sent: Thursday, December 11, 2014 7:20 AM
> To: Richard Weinberger
> Cc: linux-man; Kees Cook; Linux API; Linux Containers; Serge Hallyn; Josh Triplett;
> stable; Andy Lutomirski; Kenton Varda; LSM; Michael Kerrisk-manpages; Casey
> Schaufler; Andrew Morton; linux-kernel@vger.kernel.org
> Subject: Re: [CFT] Can I get some Tested-By's on this series?
> 
> Richard Weinberger <richard@nod.at> writes:
> 
> > Am 10.12.2014 um 23:48 schrieb Serge Hallyn:
> >> Quoting Eric W. Biederman (ebiederm@xmission.com):
> >>>
> >>> Will people please test these patches with their container project?
> >>>
> >>> These changes break container userspace (hopefully in a minimal way) if
> >>> I could have that confirmed by testing I would really appreciate it.  I
> >>> really don't want to send out a bug fix that accidentally breaks
> >>> userspace again.
> >>>
> >>> The only issue sort of under discussion is if there is a better name for
> >>> /proc/<pid>/setgroups, and the name of the file will not affect the
> >>> functionality of the patchset.
> >>>
> >>> With the code reviewed and written in simple obviously correct, easily
> >>> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >>>
> >>> Eric
> >>
> >> Is there a git tree we can clone?
> >
> > I was about to ask that too.
> > Hopefully I can tomorrow find some time for testing.
> 
> git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
> for-testing
> 
> That holds my entire queue of fixes against 3.18-rc6
> 

Tested on Fedora20 with libvirt 1.2.11, works fine.

Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>

Thanks,
- Chen
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
       [not found]                                                     ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org>
@ 2014-12-12 21:54                                                       ` Eric W. Biederman
  2015-02-02 15:36                                                         ` Michael Kerrisk (man-pages)
  2014-12-12 21:54                                                       ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman
  1 sibling, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-12 21:54 UTC (permalink / raw)
  To: Michael Kerrisk-manpages
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Linux API, linux-man, linux-kernel, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable,
	Andy Lutomirski


Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 man5/proc.5 | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/man5/proc.5 b/man5/proc.5
index 96077d0dd195..d661e8cfeac9 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
 .\"       Added in 2.6.9
 .\"       CONFIG_SCHEDSTATS
 .TP
+.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
+This file reports
+.BR allow
+if the setgroups system call is allowed in the current user namespace.
+This file reports
+.BR deny
+if the setgroups system call is not allowed in the current user namespace.
+This file may be written to with values of
+.BR allow
+and
+.BR deny
+before
+.IR /proc/[pid]/gid_map
+is written to (enabling setgroups) in a user namespace.
+.TP
 .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
 This file shows memory consumption for each of the process's mappings.
 (The
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
       [not found]                                                     ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org>
  2014-12-12 21:54                                                       ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman
@ 2014-12-12 21:54                                                       ` Eric W. Biederman
  2015-02-02 15:37                                                         ` Michael Kerrisk (man-pages)
  2015-02-02 21:31                                                         ` Alban Crequy
  1 sibling, 2 replies; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-12 21:54 UTC (permalink / raw)
  To: Michael Kerrisk-manpages
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Linux API, linux-man, linux-kernel, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable,
	Andy Lutomirski


Files with access permissions such as ---rwx---rwx give fewer
permissions to their group then they do to everyone else.  Which means
dropping groups with setgroups(0, NULL) actually grants a process
privileges.

The uprivileged setting of gid_map turned out not to be safe after
this change.  Privilege setting of gid_map can be interpreted as
meaning yes it is ok to drop groups.

To prevent this problem and future problems user namespaces were
changed in such a way as to guarantee a user can not obtain
credentials without privilege they could not obtain without the
help of user namespaces.

This meant testing the effective user ID and not the filesystem user
ID as setresuid and setregid allow setting any process uid or gid
(except the supplemental groups) to the effective ID.

Furthermore to preserve in some form the useful applications that have
been setting gid_map without privilege the file /proc/[pid]/setgroups
was added to allow disabling setgroups.  With the setgroups system
call permanently disabled in a user namespace it again becomes safe to
allow writes to gid_map without privilege.

Here is my meager attempt to update user_namespaces.7 to reflect these
issues.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7
index d76721d9a0a1..f8333a762308 100644
--- a/man7/user_namespaces.7
+++ b/man7/user_namespaces.7
@@ -533,11 +533,16 @@ One of the following is true:
 The data written to
 .I uid_map
 .RI ( gid_map )
-consists of a single line that maps the writing process's filesystem user ID
+consists of a single line that maps the writing process's effective user ID
 (group ID) in the parent user namespace to a user ID (group ID)
 in the user namespace.
-The usual case here is that this single line provides a mapping for user ID
-of the process that created the namespace.
+The writing process must have the same effective user ID as the process
+that created the user namespace.
+In the case of
+.I gid_map
+the
+.I setgroups
+file must have been written to earlier and disabled the setgroups system call.
 .IP * 3
 The opening process has the
 .BR CAP_SETUID
@@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error
 .\"
 .\" ============================================================
 .\"
+.SS Interaction with system calls that change the uid or gid values
+When in a user namespace where the
+.I uid_map
+or
+.I gid_map
+file has not been written the system calls that change user IDs
+or group IDs respectively will fail.  After the
+.I uid_map
+and
+.I gid_map
+file have been written only the mapped values may be used in
+system calls that change user IDs and group IDs.
+
+For user IDs these system calls include
+.BR setuid ,
+.BR setfsuid ,
+.BR setreuid ,
+and
+.BR setresuid .
+
+For group IDs these system calls include
+.BR setgid ,
+.BR setfsgid ,
+.BR setregid ,
+.BR setresgid ,
+and
+.BR setgroups.
+
+Writing
+.BR deny
+to the
+.I /proc/[pid]/setgroups
+file before writing to
+.I /proc/[pid]/gid_map
+will permanently disable the setgroups system call in a user namespace
+and allow writing to
+.I /proc/[pid]/gid_map
+without
+.BR CAP_SETGID
+in the parent user namespace.
+
 .SS Unmapped user and group IDs
 .PP
 There are various places where an unmapped user ID (group ID)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 22:48                                         ` Serge Hallyn
  2014-12-10 22:50                                           ` Richard Weinberger
@ 2014-12-13 22:31                                           ` serge
       [not found]                                           ` <87lhmcy2et.fsf@x220.int.ebiederm.org>
  2 siblings, 0 replies; 79+ messages in thread
From: serge @ 2014-12-13 22:31 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn
  Cc: Andy Lutomirski, Richard Weinberger, linux-man, Kees Cook,
	Linux API, Linux Containers, Josh Triplett, stable, linux-kernel,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler,
	Andrew Morton

sorry, I've only been back from the road the days...  Two tries at compiling have failed (infrastructure problems, not your set), hoping to fire of another build tonight.On 12/10/14 16:48 Serge Hallyn wrote:
Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> Will people please test these patches with their container project?
> 
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it.  I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
> 
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
> 
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.
> 
> Eric

Is there a git tree we can clone?


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
       [not found]                                               ` <8761dgze56.fsf@x220.int.ebiederm.org>
@ 2014-12-15 19:38                                                 ` Serge Hallyn
  2014-12-15 20:11                                                   ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Serge Hallyn @ 2014-12-15 19:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stéphane Graber, Richard Weinberger, Serge Hallyn,
	Andy Lutomirski, linux-man, Kees Cook, Linux API,
	Linux Containers, Josh Triplett, stable, linux-kernel,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler,
	Andrew Morton

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Stéphane Graber <stgraber@ubuntu.com> writes:
> 
> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
> >> Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> >> 
> >> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> >> 
> >> >> Will people please test these patches with their container project?
> >> >> 
> >> >> These changes break container userspace (hopefully in a minimal way) if
> >> >> I could have that confirmed by testing I would really appreciate it.  I
> >> >> really don't want to send out a bug fix that accidentally breaks
> >> >> userspace again.
> >> >> 
> >> >> The only issue sort of under discussion is if there is a better name for
> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
> >> >> functionality of the patchset.
> >> >> 
> >> >> With the code reviewed and written in simple obviously correct, easily
> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >> >> 
> >> >> Eric
> >> >
> >> > Is there a git tree we can clone?
> >> 
> >> Have either of you been able to check to see if any of my changes
> >> affects lxc?
> >> 
> >> I am trying to gauge how hard and how fast I should push to Linus.  lxc
> >> being the largest adopter of unprivileged user namespaces for general
> >> purpose containers.
> >> 
> >> I expect you just call newuidmap and newgidmap and don't actually care
> >> about not being able to set gid_map without privilege.  But I really
> >> want to avoid pushing a security fix and then being surprised that
> >> things like lxc break.
> >> 
> >> Eric
> >
> > Hi Eric,
> >
> > I've unfortunately been pretty busy this week as I was (well, still am)
> > travelling to South Africa for a meeting. I don't have a full kernel
> > tree around here and a full git clone isn't really doable over the kind
> > of Internet I've got here :)
> >
> > Hopefully Serge can give it a quick try, otherwise I should be able to
> > do some tests on Tuesday when I'm back home.
> 
> I thought Serge was going to but I haven't heard yet so I am prodding ;-)

Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
Just to be sure I was testing the right thing I also tested using
my unprivileged nsexec testcases, and they failed on setgroup/setgid
as now expected, and succeeded there without your patches.

thanks,
-serge

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-15 19:38                                                 ` Serge Hallyn
@ 2014-12-15 20:11                                                   ` Eric W. Biederman
  2014-12-15 20:49                                                     ` Serge Hallyn
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2014-12-15 20:11 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Stéphane Graber, Richard Weinberger, Andy Lutomirski,
	linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel, Kenton Varda, LSM,
	Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton

Serge Hallyn <serge.hallyn@ubuntu.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Stéphane Graber <stgraber@ubuntu.com> writes:
>> 
>> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
>> >> Serge Hallyn <serge.hallyn@ubuntu.com> writes:
>> >> 
>> >> > Quoting Eric W. Biederman (ebiederm@xmission.com):
>> >> >> 
>> >> >> Will people please test these patches with their container project?
>> >> >> 
>> >> >> These changes break container userspace (hopefully in a minimal way) if
>> >> >> I could have that confirmed by testing I would really appreciate it.  I
>> >> >> really don't want to send out a bug fix that accidentally breaks
>> >> >> userspace again.
>> >> >> 
>> >> >> The only issue sort of under discussion is if there is a better name for
>> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
>> >> >> functionality of the patchset.
>> >> >> 
>> >> >> With the code reviewed and written in simple obviously correct, easily
>> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
>> >> >> 
>> >> >> Eric
>> >> >
>> >> > Is there a git tree we can clone?
>> >> 
>> >> Have either of you been able to check to see if any of my changes
>> >> affects lxc?
>> >> 
>> >> I am trying to gauge how hard and how fast I should push to Linus.  lxc
>> >> being the largest adopter of unprivileged user namespaces for general
>> >> purpose containers.
>> >> 
>> >> I expect you just call newuidmap and newgidmap and don't actually care
>> >> about not being able to set gid_map without privilege.  But I really
>> >> want to avoid pushing a security fix and then being surprised that
>> >> things like lxc break.
>> >> 
>> >> Eric
>> >
>> > Hi Eric,
>> >
>> > I've unfortunately been pretty busy this week as I was (well, still am)
>> > travelling to South Africa for a meeting. I don't have a full kernel
>> > tree around here and a full git clone isn't really doable over the kind
>> > of Internet I've got here :)
>> >
>> > Hopefully Serge can give it a quick try, otherwise I should be able to
>> > do some tests on Tuesday when I'm back home.
>> 
>> I thought Serge was going to but I haven't heard yet so I am prodding ;-)
>
> Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> Just to be sure I was testing the right thing I also tested using
> my unprivileged nsexec testcases, and they failed on setgroup/setgid
> as now expected, and succeeded there without your patches.

Thanks.

Serge unless you object will add your Tested-By to my pull message to Linus.

Minor question do you runprivileged nsexec test cases test to see if the
write to gid_map succeeds?  I would have expected the gid_map write to
fail before the setgroups setgid system calls came into play.

Eric


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-15 20:11                                                   ` Eric W. Biederman
@ 2014-12-15 20:49                                                     ` Serge Hallyn
  0 siblings, 0 replies; 79+ messages in thread
From: Serge Hallyn @ 2014-12-15 20:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Richard Weinberger, Linux Containers,
	Josh Triplett, stable, Andy Lutomirski, Kenton Varda, LSM,
	Michael Kerrisk-manpages, Linux API, Casey Schaufler,
	Andrew Morton, linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> Stéphane Graber <stgraber@ubuntu.com> writes:
> >> 
> >> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote:
> >> >> Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> >> >> 
> >> >> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> >> >> 
> >> >> >> Will people please test these patches with their container project?
> >> >> >> 
> >> >> >> These changes break container userspace (hopefully in a minimal way) if
> >> >> >> I could have that confirmed by testing I would really appreciate it.  I
> >> >> >> really don't want to send out a bug fix that accidentally breaks
> >> >> >> userspace again.
> >> >> >> 
> >> >> >> The only issue sort of under discussion is if there is a better name for
> >> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the
> >> >> >> functionality of the patchset.
> >> >> >> 
> >> >> >> With the code reviewed and written in simple obviously correct, easily
> >> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP.
> >> >> >> 
> >> >> >> Eric
> >> >> >
> >> >> > Is there a git tree we can clone?
> >> >> 
> >> >> Have either of you been able to check to see if any of my changes
> >> >> affects lxc?
> >> >> 
> >> >> I am trying to gauge how hard and how fast I should push to Linus.  lxc
> >> >> being the largest adopter of unprivileged user namespaces for general
> >> >> purpose containers.
> >> >> 
> >> >> I expect you just call newuidmap and newgidmap and don't actually care
> >> >> about not being able to set gid_map without privilege.  But I really
> >> >> want to avoid pushing a security fix and then being surprised that
> >> >> things like lxc break.
> >> >> 
> >> >> Eric
> >> >
> >> > Hi Eric,
> >> >
> >> > I've unfortunately been pretty busy this week as I was (well, still am)
> >> > travelling to South Africa for a meeting. I don't have a full kernel
> >> > tree around here and a full git clone isn't really doable over the kind
> >> > of Internet I've got here :)
> >> >
> >> > Hopefully Serge can give it a quick try, otherwise I should be able to
> >> > do some tests on Tuesday when I'm back home.
> >> 
> >> I thought Serge was going to but I haven't heard yet so I am prodding ;-)
> >
> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> > Just to be sure I was testing the right thing I also tested using
> > my unprivileged nsexec testcases, and they failed on setgroup/setgid
> > as now expected, and succeeded there without your patches.
> 
> Thanks.
> 
> Serge unless you object will add your Tested-By to my pull message to Linus.

Sounds good.

> Minor question do you runprivileged nsexec test cases test to see if the
> write to gid_map succeeds?  I would have expected the gid_map write to
> fail before the setgroups setgid system calls came into play.

Yes, I did that by hand, and it failed (with your kernel).

-serge

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-10 16:39                                       ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman
  2014-12-10 22:48                                         ` Serge Hallyn
@ 2014-12-16  2:05                                         ` Andy Lutomirski
  2014-12-16  9:23                                           ` Richard Weinberger
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2014-12-16  2:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Linux Containers,
	Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Kenton Varda, stable

On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Will people please test these patches with their container project?
>
> These changes break container userspace (hopefully in a minimal way) if
> I could have that confirmed by testing I would really appreciate it.  I
> really don't want to send out a bug fix that accidentally breaks
> userspace again.
>
> The only issue sort of under discussion is if there is a better name for
> /proc/<pid>/setgroups, and the name of the file will not affect the
> functionality of the patchset.
>
> With the code reviewed and written in simple obviously correct, easily
> reviewable ways I am hoping/planning to send this to Linus ASAP.


I tested this with Sandstorm.  It breaks as is and it works if I add
the setgroups thing.

Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(

I still don't like the name "setgroups".

--Andy

>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [CFT] Can I get some Tested-By's on this series?
  2014-12-16  2:05                                         ` Andy Lutomirski
@ 2014-12-16  9:23                                           ` Richard Weinberger
  0 siblings, 0 replies; 79+ messages in thread
From: Richard Weinberger @ 2014-12-16  9:23 UTC (permalink / raw)
  To: Andy Lutomirski, Eric W. Biederman
  Cc: Serge E. Hallyn, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel, LSM, Casey Schaufler, Kenton Varda, stable

Am 16.12.2014 um 03:05 schrieb Andy Lutomirski:
> On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Will people please test these patches with their container project?
>>
>> These changes break container userspace (hopefully in a minimal way) if
>> I could have that confirmed by testing I would really appreciate it.  I
>> really don't want to send out a bug fix that accidentally breaks
>> userspace again.
>>
>> The only issue sort of under discussion is if there is a better name for
>> /proc/<pid>/setgroups, and the name of the file will not affect the
>> functionality of the patchset.
>>
>> With the code reviewed and written in simple obviously correct, easily
>> reviewable ways I am hoping/planning to send this to Linus ASAP.
> 
> 
> I tested this with Sandstorm.  It breaks as is and it works if I add
> the setgroups thing.
> 
> Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(
> 
> I still don't like the name "setgroups".

I agree that the name is not optimal.
But I don't have a counterproposal as my naming skills are miserable.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2014-12-12 21:54                                                       ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman
@ 2015-02-02 15:36                                                         ` Michael Kerrisk (man-pages)
  2015-02-11  8:01                                                           ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-02 15:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski, Josh Triplett

[Adding Josh to CC in case he has anything to add.]

On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  man5/proc.5 | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/man5/proc.5 b/man5/proc.5
> index 96077d0dd195..d661e8cfeac9 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>  .\"       Added in 2.6.9
>  .\"       CONFIG_SCHEDSTATS
>  .TP
> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
> +This file reports
> +.BR allow
> +if the setgroups system call is allowed in the current user namespace.
> +This file reports
> +.BR deny
> +if the setgroups system call is not allowed in the current user namespace.
> +This file may be written to with values of
> +.BR allow
> +and
> +.BR deny
> +before
> +.IR /proc/[pid]/gid_map
> +is written to (enabling setgroups) in a user namespace.
> +.TP
>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>  This file shows memory consumption for each of the process's mappings.
>  (The

Hi Eric,

Thanks for this patch. I applied it, and then tried to work in
quite a few other details gleaned from the source code and commit 
message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
Could you please let me know if the following is correct:

    /proc/[pid]/setgroups (since Linux 3.19)
           This file displays the string "allow"  if  processes  in 
           the  user  namespace  that  contains the process pid are
           permitted to employ the setgroups(2)  system  call,  and
           "deny"  if  setgroups(2)  is  not permitted in that user
           namespace.

           A privileged process (one with the  CAP_SYS_ADMIN  capa‐
           bility in the namespace) may write either of the strings
           "allow" or "deny" to this file before writing a group ID 
           mapping   for   this   user   namespace   to   the  file
           /proc/[pid]/gid_map.  Writing the string "deny" prevents
           any  process  in  the user namespace from employing set‐
           groups(2).

           The default value of  this  file  in  the  initial  user
           namespace is "allow".

           Once  /proc/[pid]/gid_map has been written to (which has
           the effect of enabling setgroups(2) in the  user  names‐
           pace),  it is no longer possible to deny setgroups(2) by 
           writing to /proc/[pid]/setgroups.

           A child user namespace inherits the  /proc/[pid]/gid_map
           setting from its parent.

           If  the  setgroups  file  has the value "deny", then the
           setgroups(2) system call can't subsequently be reenabled
           (by writing "allow" to the file) in this user namespace.
           This restriction also propagates down to all child  user
           namespaces of this user namespace.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2014-12-12 21:54                                                       ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman
@ 2015-02-02 15:37                                                         ` Michael Kerrisk (man-pages)
  2015-02-11  8:02                                                           ` Michael Kerrisk (man-pages)
  2015-02-02 21:31                                                         ` Alban Crequy
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-02 15:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hi Eric,

Thanks for writing this up!

On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
> 
> Files with access permissions such as ---rwx---rwx give fewer
> permissions to their group then they do to everyone else.  Which means
> dropping groups with setgroups(0, NULL) actually grants a process
> privileges.
> 
> The uprivileged setting of gid_map turned out not to be safe after
> this change.  Privilege setting of gid_map can be interpreted as
> meaning yes it is ok to drop groups.

I had trouble to parse that sentence (and I'd like to make sure that
the right sentence ends up in the commit message). Did you mean: 

    "*Unprivileged* setting of gid_map can be interpreted as meaning
     yes it is ok to drop groups"

?

Or something else?

> To prevent this problem and future problems user namespaces were
> changed in such a way as to guarantee a user can not obtain
> credentials without privilege they could not obtain without the
> help of user namespaces.
> 
> This meant testing the effective user ID and not the filesystem user
> ID as setresuid and setregid allow setting any process uid or gid
> (except the supplemental groups) to the effective ID.
> 
> Furthermore to preserve in some form the useful applications that have
> been setting gid_map without privilege the file /proc/[pid]/setgroups
> was added to allow disabling setgroups.  With the setgroups system
> call permanently disabled in a user namespace it again becomes safe to
> allow writes to gid_map without privilege.
> 
> Here is my meager attempt to update user_namespaces.7 to reflect these
> issues.

It looked pretty serviceable as patch, IMO. So, thanks again. I've applied,
tweaking some wordings afterward, but changing nothing essential. See below
for a question.

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 49 insertions(+), 3 deletions(-)
> 
> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7
> index d76721d9a0a1..f8333a762308 100644
> --- a/man7/user_namespaces.7
> +++ b/man7/user_namespaces.7
> @@ -533,11 +533,16 @@ One of the following is true:
>  The data written to
>  .I uid_map
>  .RI ( gid_map )
> -consists of a single line that maps the writing process's filesystem user ID
> +consists of a single line that maps the writing process's effective user ID
>  (group ID) in the parent user namespace to a user ID (group ID)
>  in the user namespace.
> -The usual case here is that this single line provides a mapping for user ID
> -of the process that created the namespace.
> +The writing process must have the same effective user ID as the process
> +that created the user namespace.
> +In the case of
> +.I gid_map
> +the
> +.I setgroups
> +file must have been written to earlier and disabled the setgroups system call.
>  .IP * 3
>  The opening process has the
>  .BR CAP_SETUID
> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error
>  .\"
>  .\" ============================================================
>  .\"
> +.SS Interaction with system calls that change the uid or gid values
> +When in a user namespace where the
> +.I uid_map
> +or
> +.I gid_map
> +file has not been written the system calls that change user IDs
> +or group IDs respectively will fail.  After the
> +.I uid_map
> +and
> +.I gid_map
> +file have been written only the mapped values may be used in
> +system calls that change user IDs and group IDs.
> +
> +For user IDs these system calls include
> +.BR setuid ,
> +.BR setfsuid ,
> +.BR setreuid ,
> +and
> +.BR setresuid .
> +
> +For group IDs these system calls include
> +.BR setgid ,
> +.BR setfsgid ,
> +.BR setregid ,
> +.BR setresgid ,
> +and
> +.BR setgroups.
> +
> +Writing
> +.BR deny
> +to the
> +.I /proc/[pid]/setgroups
> +file before writing to
> +.I /proc/[pid]/gid_map
> +will permanently disable the setgroups system call in a user namespace
> +and allow writing to
> +.I /proc/[pid]/gid_map
> +without
> +.BR CAP_SETGID
> +in the parent user namespace.

I just want to double check: you really did mean to write "*parent* namespace"
above, right?

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2014-12-12 21:54                                                       ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman
  2015-02-02 15:37                                                         ` Michael Kerrisk (man-pages)
@ 2015-02-02 21:31                                                         ` Alban Crequy
  2015-03-04 14:00                                                           ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 79+ messages in thread
From: Alban Crequy @ 2015-02-02 21:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Kerrisk-manpages, Linux Containers, Josh Triplett,
	Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel,
	LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hello,

Thanks for updating the man page.

On 12 December 2014 at 22:54, Eric W. Biederman <ebiederm@xmission.com> wrote:
(...)
> Furthermore to preserve in some form the useful applications that have
> been setting gid_map without privilege the file /proc/[pid]/setgroups
> was added to allow disabling setgroups.  With the setgroups system
> call permanently disabled in a user namespace it again becomes safe to
> allow writes to gid_map without privilege.
>
> Here is my meager attempt to update user_namespaces.7 to reflect these
> issues.

The program userns_child_exec.c in user_namespaces.7 should be updated
to write in /proc/.../setgroups, near the line:
/* Update the UID and GID maps in the child */

Otherwise, the example given in the manpage does not work:
$ ./userns_child_exec -p -m -U -M '0 1000 1' -G '0 1000 1' bash

Cheers,
Alban

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2015-02-02 15:36                                                         ` Michael Kerrisk (man-pages)
@ 2015-02-11  8:01                                                           ` Michael Kerrisk (man-pages)
  2015-02-11 13:51                                                             ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-11  8:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hi Eric,

Ping!

Cheers,

Michael



On 2 February 2015 at 16:36, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> [Adding Josh to CC in case he has anything to add.]
>
> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  man5/proc.5 | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index 96077d0dd195..d661e8cfeac9 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>>  .\"       Added in 2.6.9
>>  .\"       CONFIG_SCHEDSTATS
>>  .TP
>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
>> +This file reports
>> +.BR allow
>> +if the setgroups system call is allowed in the current user namespace.
>> +This file reports
>> +.BR deny
>> +if the setgroups system call is not allowed in the current user namespace.
>> +This file may be written to with values of
>> +.BR allow
>> +and
>> +.BR deny
>> +before
>> +.IR /proc/[pid]/gid_map
>> +is written to (enabling setgroups) in a user namespace.
>> +.TP
>>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>>  This file shows memory consumption for each of the process's mappings.
>>  (The
>
> Hi Eric,
>
> Thanks for this patch. I applied it, and then tried to work in
> quite a few other details gleaned from the source code and commit
> message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
> Could you please let me know if the following is correct:
>
>     /proc/[pid]/setgroups (since Linux 3.19)
>            This file displays the string "allow"  if  processes  in
>            the  user  namespace  that  contains the process pid are
>            permitted to employ the setgroups(2)  system  call,  and
>            "deny"  if  setgroups(2)  is  not permitted in that user
>            namespace.
>
>            A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>            bility in the namespace) may write either of the strings
>            "allow" or "deny" to this file before writing a group ID
>            mapping   for   this   user   namespace   to   the  file
>            /proc/[pid]/gid_map.  Writing the string "deny" prevents
>            any  process  in  the user namespace from employing set‐
>            groups(2).
>
>            The default value of  this  file  in  the  initial  user
>            namespace is "allow".
>
>            Once  /proc/[pid]/gid_map has been written to (which has
>            the effect of enabling setgroups(2) in the  user  names‐
>            pace),  it is no longer possible to deny setgroups(2) by
>            writing to /proc/[pid]/setgroups.
>
>            A child user namespace inherits the  /proc/[pid]/gid_map
>            setting from its parent.
>
>            If  the  setgroups  file  has the value "deny", then the
>            setgroups(2) system call can't subsequently be reenabled
>            (by writing "allow" to the file) in this user namespace.
>            This restriction also propagates down to all child  user
>            namespaces of this user namespace.
>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2015-02-02 15:37                                                         ` Michael Kerrisk (man-pages)
@ 2015-02-11  8:02                                                           ` Michael Kerrisk (man-pages)
  2015-02-11 14:01                                                             ` Eric W. Biederman
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-11  8:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hi Eric,

Ping!

Cheers,

Michael


On 2 February 2015 at 16:37, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hi Eric,
>
> Thanks for writing this up!
>
> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>
>> Files with access permissions such as ---rwx---rwx give fewer
>> permissions to their group then they do to everyone else.  Which means
>> dropping groups with setgroups(0, NULL) actually grants a process
>> privileges.
>>
>> The uprivileged setting of gid_map turned out not to be safe after
>> this change.  Privilege setting of gid_map can be interpreted as
>> meaning yes it is ok to drop groups.
>
> I had trouble to parse that sentence (and I'd like to make sure that
> the right sentence ends up in the commit message). Did you mean:
>
>     "*Unprivileged* setting of gid_map can be interpreted as meaning
>      yes it is ok to drop groups"
>
> ?
>
> Or something else?
>
>> To prevent this problem and future problems user namespaces were
>> changed in such a way as to guarantee a user can not obtain
>> credentials without privilege they could not obtain without the
>> help of user namespaces.
>>
>> This meant testing the effective user ID and not the filesystem user
>> ID as setresuid and setregid allow setting any process uid or gid
>> (except the supplemental groups) to the effective ID.
>>
>> Furthermore to preserve in some form the useful applications that have
>> been setting gid_map without privilege the file /proc/[pid]/setgroups
>> was added to allow disabling setgroups.  With the setgroups system
>> call permanently disabled in a user namespace it again becomes safe to
>> allow writes to gid_map without privilege.
>>
>> Here is my meager attempt to update user_namespaces.7 to reflect these
>> issues.
>
> It looked pretty serviceable as patch, IMO. So, thanks again. I've applied,
> tweaking some wordings afterward, but changing nothing essential. See below
> for a question.
>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 49 insertions(+), 3 deletions(-)
>>
>> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7
>> index d76721d9a0a1..f8333a762308 100644
>> --- a/man7/user_namespaces.7
>> +++ b/man7/user_namespaces.7
>> @@ -533,11 +533,16 @@ One of the following is true:
>>  The data written to
>>  .I uid_map
>>  .RI ( gid_map )
>> -consists of a single line that maps the writing process's filesystem user ID
>> +consists of a single line that maps the writing process's effective user ID
>>  (group ID) in the parent user namespace to a user ID (group ID)
>>  in the user namespace.
>> -The usual case here is that this single line provides a mapping for user ID
>> -of the process that created the namespace.
>> +The writing process must have the same effective user ID as the process
>> +that created the user namespace.
>> +In the case of
>> +.I gid_map
>> +the
>> +.I setgroups
>> +file must have been written to earlier and disabled the setgroups system call.
>>  .IP * 3
>>  The opening process has the
>>  .BR CAP_SETUID
>> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error
>>  .\"
>>  .\" ============================================================
>>  .\"
>> +.SS Interaction with system calls that change the uid or gid values
>> +When in a user namespace where the
>> +.I uid_map
>> +or
>> +.I gid_map
>> +file has not been written the system calls that change user IDs
>> +or group IDs respectively will fail.  After the
>> +.I uid_map
>> +and
>> +.I gid_map
>> +file have been written only the mapped values may be used in
>> +system calls that change user IDs and group IDs.
>> +
>> +For user IDs these system calls include
>> +.BR setuid ,
>> +.BR setfsuid ,
>> +.BR setreuid ,
>> +and
>> +.BR setresuid .
>> +
>> +For group IDs these system calls include
>> +.BR setgid ,
>> +.BR setfsgid ,
>> +.BR setregid ,
>> +.BR setresgid ,
>> +and
>> +.BR setgroups.
>> +
>> +Writing
>> +.BR deny
>> +to the
>> +.I /proc/[pid]/setgroups
>> +file before writing to
>> +.I /proc/[pid]/gid_map
>> +will permanently disable the setgroups system call in a user namespace
>> +and allow writing to
>> +.I /proc/[pid]/gid_map
>> +without
>> +.BR CAP_SETGID
>> +in the parent user namespace.
>
> I just want to double check: you really did mean to write "*parent* namespace"
> above, right?
>
> Thanks,
>
> Michael
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2015-02-11  8:01                                                           ` Michael Kerrisk (man-pages)
@ 2015-02-11 13:51                                                             ` Eric W. Biederman
  2015-02-12 13:53                                                               ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2015-02-11 13:51 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Linux API, linux-man, linux-kernel, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable,
	Andy Lutomirski

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> Ping!
>
> Cheers,
>
> Michael

My apologies.  You description wasn't wrong but it may be a bit
misleading, explanation below.  You will have to figure out how to work
that into your proposed text.

> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> [Adding Josh to CC in case he has anything to add.]
>>
>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>
>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>> ---
>>>  man5/proc.5 | 15 +++++++++++++++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/man5/proc.5 b/man5/proc.5
>>> index 96077d0dd195..d661e8cfeac9 100644
>>> --- a/man5/proc.5
>>> +++ b/man5/proc.5
>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>>>  .\"       Added in 2.6.9
>>>  .\"       CONFIG_SCHEDSTATS
>>>  .TP
>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
>>> +This file reports
>>> +.BR allow
>>> +if the setgroups system call is allowed in the current user namespace.
>>> +This file reports
>>> +.BR deny
>>> +if the setgroups system call is not allowed in the current user namespace.
>>> +This file may be written to with values of
>>> +.BR allow
>>> +and
>>> +.BR deny
>>> +before
>>> +.IR /proc/[pid]/gid_map
>>> +is written to (enabling setgroups) in a user namespace.
>>> +.TP
>>>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>>>  This file shows memory consumption for each of the process's mappings.
>>>  (The
>>
>> Hi Eric,
>>
>> Thanks for this patch. I applied it, and then tried to work in
>> quite a few other details gleaned from the source code and commit
>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
>> Could you please let me know if the following is correct:

It is close but it may be misleading.

>>     /proc/[pid]/setgroups (since Linux 3.19)
>>            This file displays the string "allow"  if  processes  in
>>            the  user  namespace  that  contains the process pid are
>>            permitted to employ the setgroups(2)  system  call,  and
>>            "deny"  if  setgroups(2)  is  not permitted in that user
>>            namespace.

With the caveat that when gid_map is not set that setgroups is also not
allowed.

>>            A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>>            bility in the namespace) may write either of the strings
>>            "allow" or "deny" to this file before writing a group ID
>>            mapping   for   this   user   namespace   to   the  file
>>            /proc/[pid]/gid_map.  Writing the string "deny" prevents
>>            any  process  in  the user namespace from employing set‐
>>            groups(2).

Or more succintly.  You are allowed to write to /proc/[pid]/setgroups
when calling setgroups is not allowed because gid_map is unset.  This
ensures we do not have any transitions from a state where setgroups
is allowed to a state where setgroups is denied.  There are only
transitions from setgroups not-allowed to setgroups allowed.

>>            The default value of  this  file  in  the  initial  user
>>            namespace is "allow".
>>
>>            Once  /proc/[pid]/gid_map has been written to (which has
>>            the effect of enabling setgroups(2) in the  user  names‐
>>            pace),  it is no longer possible to deny setgroups(2) by
>>            writing to /proc/[pid]/setgroups.
>>
>>            A child user namespace inherits the  /proc/[pid]/gid_map
>>            setting from its parent.
>>
>>            If  the  setgroups  file  has the value "deny", then the
>>            setgroups(2) system call can't subsequently be reenabled
>>            (by writing "allow" to the file) in this user namespace.
>>            This restriction also propagates down to all child  user
>>            namespaces of this user namespace.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2015-02-11  8:02                                                           ` Michael Kerrisk (man-pages)
@ 2015-02-11 14:01                                                             ` Eric W. Biederman
  2015-02-12 10:11                                                               ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 79+ messages in thread
From: Eric W. Biederman @ 2015-02-11 14:01 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Linux API, linux-man, linux-kernel, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable,
	Andy Lutomirski

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> Ping!
>
> Cheers,
>
> Michael
>
>
> On 2 February 2015 at 16:37, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hi Eric,
>>
>> Thanks for writing this up!
>>
>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>
>>> Files with access permissions such as ---rwx---rwx give fewer
>>> permissions to their group then they do to everyone else.  Which means
>>> dropping groups with setgroups(0, NULL) actually grants a process
>>> privileges.
>>>
>>> The uprivileged setting of gid_map turned out not to be safe after
        ^^^^^^^^^^^
unprivileged  -- typo fix

>>> this change.  Privilege setting of gid_map can be interpreted as
>>> meaning yes it is ok to drop groups.
>>
>> I had trouble to parse that sentence (and I'd like to make sure that
>> the right sentence ends up in the commit message). Did you mean:
>>
>>     "*Unprivileged* setting of gid_map can be interpreted as meaning
>>      yes it is ok to drop groups"
>> ?
>>
>> Or something else?


I meant:  Setting of gid_map with privilege has been clarified to mean
that dropping groups is ok.  This allows existing programs that set
gid_map with privilege to work without changes.  That is newgidmap
continues to work unchanged.

>>> To prevent this problem and future problems user namespaces were
>>> changed in such a way as to guarantee a user can not obtain
>>> credentials without privilege they could not obtain without the
>>> help of user namespaces.
>>>
>>> This meant testing the effective user ID and not the filesystem user
>>> ID as setresuid and setregid allow setting any process uid or gid
>>> (except the supplemental groups) to the effective ID.
>>>
>>> Furthermore to preserve in some form the useful applications that have
>>> been setting gid_map without privilege the file /proc/[pid]/setgroups
>>> was added to allow disabling setgroups.  With the setgroups system
>>> call permanently disabled in a user namespace it again becomes safe to
>>> allow writes to gid_map without privilege.
>>>
>>> Here is my meager attempt to update user_namespaces.7 to reflect these
>>> issues.
>>
>> It looked pretty serviceable as patch, IMO. So, thanks again. I've applied,
>> tweaking some wordings afterward, but changing nothing essential. See below
>> for a question.
>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 49 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7
>>> index d76721d9a0a1..f8333a762308 100644
>>> --- a/man7/user_namespaces.7
>>> +++ b/man7/user_namespaces.7
>>> @@ -533,11 +533,16 @@ One of the following is true:
>>>  The data written to
>>>  .I uid_map
>>>  .RI ( gid_map )
>>> -consists of a single line that maps the writing process's filesystem user ID
>>> +consists of a single line that maps the writing process's effective user ID
>>>  (group ID) in the parent user namespace to a user ID (group ID)
>>>  in the user namespace.
>>> -The usual case here is that this single line provides a mapping for user ID
>>> -of the process that created the namespace.
>>> +The writing process must have the same effective user ID as the process
>>> +that created the user namespace.
>>> +In the case of
>>> +.I gid_map
>>> +the
>>> +.I setgroups
>>> +file must have been written to earlier and disabled the setgroups system call.
>>>  .IP * 3
>>>  The opening process has the
>>>  .BR CAP_SETUID
>>> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error
>>>  .\"
>>>  .\" ============================================================
>>>  .\"
>>> +.SS Interaction with system calls that change the uid or gid values
>>> +When in a user namespace where the
>>> +.I uid_map
>>> +or
>>> +.I gid_map
>>> +file has not been written the system calls that change user IDs
>>> +or group IDs respectively will fail.  After the
>>> +.I uid_map
>>> +and
>>> +.I gid_map
>>> +file have been written only the mapped values may be used in
>>> +system calls that change user IDs and group IDs.
>>> +
>>> +For user IDs these system calls include
>>> +.BR setuid ,
>>> +.BR setfsuid ,
>>> +.BR setreuid ,
>>> +and
>>> +.BR setresuid .
>>> +
>>> +For group IDs these system calls include
>>> +.BR setgid ,
>>> +.BR setfsgid ,
>>> +.BR setregid ,
>>> +.BR setresgid ,
>>> +and
>>> +.BR setgroups.
>>> +
>>> +Writing
>>> +.BR deny
>>> +to the
>>> +.I /proc/[pid]/setgroups
>>> +file before writing to
>>> +.I /proc/[pid]/gid_map
>>> +will permanently disable the setgroups system call in a user namespace
>>> +and allow writing to
>>> +.I /proc/[pid]/gid_map
>>> +without
>>> +.BR CAP_SETGID
>>> +in the parent user namespace.
>>
>> I just want to double check: you really did mean to write "*parent* namespace"
>> above, right?

Yes.  At this point only privilege in the *parent* user namespace is
meaningful, as applications in the new user namespace have all
privileges.

Eric

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2015-02-11 14:01                                                             ` Eric W. Biederman
@ 2015-02-12 10:11                                                               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-12 10:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

On 02/11/2015 03:01 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> 
>> Hi Eric,
>>
>> Ping!
>>
>> Cheers,
>>
>> Michael
>>
>>
>> On 2 February 2015 at 16:37, Michael Kerrisk (man-pages)
>> <mtk.manpages@gmail.com> wrote:
>>> Hi Eric,
>>>
>>> Thanks for writing this up!
>>>
>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>>
>>>> Files with access permissions such as ---rwx---rwx give fewer
>>>> permissions to their group then they do to everyone else.  Which means
>>>> dropping groups with setgroups(0, NULL) actually grants a process
>>>> privileges.
>>>>
>>>> The uprivileged setting of gid_map turned out not to be safe after
>         ^^^^^^^^^^^
> unprivileged  -- typo fix

Thanks for confirming.

>>>> this change.  Privilege setting of gid_map can be interpreted as
>>>> meaning yes it is ok to drop groups.
>>>
>>> I had trouble to parse that sentence (and I'd like to make sure that
>>> the right sentence ends up in the commit message). Did you mean:
>>>
>>>     "*Unprivileged* setting of gid_map can be interpreted as meaning
>>>      yes it is ok to drop groups"
>>> ?
>>>
>>> Or something else?
> 
> 
> I meant:  Setting of gid_map with privilege has been clarified to mean
> that dropping groups is ok.  This allows existing programs that set
> gid_map with privilege to work without changes.  That is newgidmap
> continues to work unchanged.

Thanks. I added that text to the changelog message.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2015-02-11 13:51                                                             ` Eric W. Biederman
@ 2015-02-12 13:53                                                               ` Michael Kerrisk (man-pages)
  2015-02-21  7:57                                                                 ` Michael Kerrisk (man-pages)
  2015-03-03 11:39                                                                 ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-12 13:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hello Eric,

On 02/11/2015 02:51 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> 
>> Hi Eric,
>>
>> Ping!
>>
>> Cheers,
>>
>> Michael
> 
> My apologies.  You description wasn't wrong but it may be a bit
> misleading, explanation below.  You will have to figure out how to work
> that into your proposed text.
> 
>> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages)
>> <mtk.manpages@gmail.com> wrote:
>>> [Adding Josh to CC in case he has anything to add.]
>>>
>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>>
>>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>>> ---
>>>>  man5/proc.5 | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/man5/proc.5 b/man5/proc.5
>>>> index 96077d0dd195..d661e8cfeac9 100644
>>>> --- a/man5/proc.5
>>>> +++ b/man5/proc.5
>>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>>>>  .\"       Added in 2.6.9
>>>>  .\"       CONFIG_SCHEDSTATS
>>>>  .TP
>>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
>>>> +This file reports
>>>> +.BR allow
>>>> +if the setgroups system call is allowed in the current user namespace.
>>>> +This file reports
>>>> +.BR deny
>>>> +if the setgroups system call is not allowed in the current user namespace.
>>>> +This file may be written to with values of
>>>> +.BR allow
>>>> +and
>>>> +.BR deny
>>>> +before
>>>> +.IR /proc/[pid]/gid_map
>>>> +is written to (enabling setgroups) in a user namespace.
>>>> +.TP
>>>>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>>>>  This file shows memory consumption for each of the process's mappings.
>>>>  (The
>>>
>>> Hi Eric,
>>>
>>> Thanks for this patch. I applied it, and then tried to work in
>>> quite a few other details gleaned from the source code and commit
>>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
>>> Could you please let me know if the following is correct:
> 
> It is close but it may be misleading.
> 
>>>     /proc/[pid]/setgroups (since Linux 3.19)
>>>            This file displays the string "allow"  if  processes  in
>>>            the  user  namespace  that  contains the process pid are
>>>            permitted to employ the setgroups(2)  system  call,  and
>>>            "deny"  if  setgroups(2)  is  not permitted in that user
>>>            namespace.
> 
> With the caveat that when gid_map is not set that setgroups is also not
> allowed.

Okay -- Iadded that point.

>>>            A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>>>            bility in the namespace) may write either of the strings
>>>            "allow" or "deny" to this file before writing a group ID
>>>            mapping   for   this   user   namespace   to   the  file
>>>            /proc/[pid]/gid_map.  Writing the string "deny" prevents
>>>            any  process  in  the user namespace from employing set‐
>>>            groups(2).
> 
> Or more succintly.  You are allowed to write to /proc/[pid]/setgroups
> when calling setgroups is not allowed because gid_map is unset.  This
> ensures we do not have any transitions from a state where setgroups
> is allowed to a state where setgroups is denied.  There are only
> transitions from setgroups not-allowed to setgroups allowed.

And I've worked in the above point, rewording a bit along the way.
So, how does the following look (only the first two paragraphs have
changed)?

       /proc/[pid]/setgroups (since Linux 3.19)
              This file displays the string "allow"  if  processes  in
              the  user  namespace  that  contains the process pid are
              permitted to employ the setgroups(2)  system  call,  and
              "deny"  if  setgroups(2)  is  not permitted in that user
              namespace.  (Note, however, that calls  to  setgroups(2)
              are  also  not  permitted if /proc/[pid]/gid_map has not
              yet been set.)

              A privileged process (one with the  CAP_SYS_ADMIN  capa‐
              bility in the namespace) may write either of the strings
              "allow" or "deny" to this file before writing a group ID
              mapping   for   this   user   namespace   to   the  file
              /proc/[pid]/gid_map.  Writing the string "deny" prevents
              any  process  in  the user namespace from employing set‐
              groups(2).  In other words, it is permitted to write  to
              /proc/[pid]/setgroups so long as calling setgroups(2) is
              not allowed because /proc/[pid]gid_map has not been set.
              This  ensures  that  a  process cannot transition from a
              state where setgroups(2) is allowed  to  a  state  where
              setgroups(2)  is  denied;  a process can only trabsition
              from setgroups(2) being disallowed to setgroups(2) being
              allowed.

              The  default  value  of  this  file  in the initial user
              namespace is "allow".

              Once /proc/[pid]/gid_map has been written to (which  has
              the  effect  of enabling setgroups(2) in the user names‐
              pace), it is no longer possible to deny setgroups(2)  by
              writing to /proc/[pid]/setgroups.

              A  child user namespace inherits the /proc/[pid]/gid_map
              setting from its parent.

              If the setgroups file has the  value  "deny",  then  the
              setgroups(2) system call can't subsequently be reenabled
              (by writing "allow" to the file) in this user namespace.
              This  restriction also propagates down to all child user
              namespaces of this user namespace.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2015-02-12 13:53                                                               ` Michael Kerrisk (man-pages)
@ 2015-02-21  7:57                                                                 ` Michael Kerrisk (man-pages)
  2015-03-03 11:39                                                                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-21  7:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hi Eric,

Ping!

Cheers,

Michael


On 02/12/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
> Hello Eric,
> 
> On 02/11/2015 02:51 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Ping!
>>>
>>> Cheers,
>>>
>>> Michael
>>
>> My apologies.  You description wasn't wrong but it may be a bit
>> misleading, explanation below.  You will have to figure out how to work
>> that into your proposed text.
>>
>>> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages)
>>> <mtk.manpages@gmail.com> wrote:
>>>> [Adding Josh to CC in case he has anything to add.]
>>>>
>>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>>>
>>>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>>>> ---
>>>>>  man5/proc.5 | 15 +++++++++++++++
>>>>>  1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/man5/proc.5 b/man5/proc.5
>>>>> index 96077d0dd195..d661e8cfeac9 100644
>>>>> --- a/man5/proc.5
>>>>> +++ b/man5/proc.5
>>>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>>>>>  .\"       Added in 2.6.9
>>>>>  .\"       CONFIG_SCHEDSTATS
>>>>>  .TP
>>>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
>>>>> +This file reports
>>>>> +.BR allow
>>>>> +if the setgroups system call is allowed in the current user namespace.
>>>>> +This file reports
>>>>> +.BR deny
>>>>> +if the setgroups system call is not allowed in the current user namespace.
>>>>> +This file may be written to with values of
>>>>> +.BR allow
>>>>> +and
>>>>> +.BR deny
>>>>> +before
>>>>> +.IR /proc/[pid]/gid_map
>>>>> +is written to (enabling setgroups) in a user namespace.
>>>>> +.TP
>>>>>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>>>>>  This file shows memory consumption for each of the process's mappings.
>>>>>  (The
>>>>
>>>> Hi Eric,
>>>>
>>>> Thanks for this patch. I applied it, and then tried to work in
>>>> quite a few other details gleaned from the source code and commit
>>>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
>>>> Could you please let me know if the following is correct:
>>
>> It is close but it may be misleading.
>>
>>>>     /proc/[pid]/setgroups (since Linux 3.19)
>>>>            This file displays the string "allow"  if  processes  in
>>>>            the  user  namespace  that  contains the process pid are
>>>>            permitted to employ the setgroups(2)  system  call,  and
>>>>            "deny"  if  setgroups(2)  is  not permitted in that user
>>>>            namespace.
>>
>> With the caveat that when gid_map is not set that setgroups is also not
>> allowed.
> 
> Okay -- Iadded that point.
> 
>>>>            A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>>>>            bility in the namespace) may write either of the strings
>>>>            "allow" or "deny" to this file before writing a group ID
>>>>            mapping   for   this   user   namespace   to   the  file
>>>>            /proc/[pid]/gid_map.  Writing the string "deny" prevents
>>>>            any  process  in  the user namespace from employing set‐
>>>>            groups(2).
>>
>> Or more succintly.  You are allowed to write to /proc/[pid]/setgroups
>> when calling setgroups is not allowed because gid_map is unset.  This
>> ensures we do not have any transitions from a state where setgroups
>> is allowed to a state where setgroups is denied.  There are only
>> transitions from setgroups not-allowed to setgroups allowed.
> 
> And I've worked in the above point, rewording a bit along the way.
> So, how does the following look (only the first two paragraphs have
> changed)?
> 
>        /proc/[pid]/setgroups (since Linux 3.19)
>               This file displays the string "allow"  if  processes  in
>               the  user  namespace  that  contains the process pid are
>               permitted to employ the setgroups(2)  system  call,  and
>               "deny"  if  setgroups(2)  is  not permitted in that user
>               namespace.  (Note, however, that calls  to  setgroups(2)
>               are  also  not  permitted if /proc/[pid]/gid_map has not
>               yet been set.)
> 
>               A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>               bility in the namespace) may write either of the strings
>               "allow" or "deny" to this file before writing a group ID
>               mapping   for   this   user   namespace   to   the  file
>               /proc/[pid]/gid_map.  Writing the string "deny" prevents
>               any  process  in  the user namespace from employing set‐
>               groups(2).  In other words, it is permitted to write  to
>               /proc/[pid]/setgroups so long as calling setgroups(2) is
>               not allowed because /proc/[pid]gid_map has not been set.
>               This  ensures  that  a  process cannot transition from a
>               state where setgroups(2) is allowed  to  a  state  where
>               setgroups(2)  is  denied;  a process can only trabsition
>               from setgroups(2) being disallowed to setgroups(2) being
>               allowed.
> 
>               The  default  value  of  this  file  in the initial user
>               namespace is "allow".
> 
>               Once /proc/[pid]/gid_map has been written to (which  has
>               the  effect  of enabling setgroups(2) in the user names‐
>               pace), it is no longer possible to deny setgroups(2)  by
>               writing to /proc/[pid]/setgroups.
> 
>               A  child user namespace inherits the /proc/[pid]/gid_map
>               setting from its parent.
> 
>               If the setgroups file has the  value  "deny",  then  the
>               setgroups(2) system call can't subsequently be reenabled
>               (by writing "allow" to the file) in this user namespace.
>               This  restriction also propagates down to all child user
>               namespaces of this user namespace.
> 
> Cheers,
> 
> Michael
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups
  2015-02-12 13:53                                                               ` Michael Kerrisk (man-pages)
  2015-02-21  7:57                                                                 ` Michael Kerrisk (man-pages)
@ 2015-03-03 11:39                                                                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-03-03 11:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

Hi Eric

Ping^2!

Cheers,

Michael


On 12 February 2015 at 14:53, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hello Eric,
>
> On 02/11/2015 02:51 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Ping!
>>>
>>> Cheers,
>>>
>>> Michael
>>
>> My apologies.  You description wasn't wrong but it may be a bit
>> misleading, explanation below.  You will have to figure out how to work
>> that into your proposed text.
>>
>>> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages)
>>> <mtk.manpages@gmail.com> wrote:
>>>> [Adding Josh to CC in case he has anything to add.]
>>>>
>>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote:
>>>>>
>>>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>>>> ---
>>>>>  man5/proc.5 | 15 +++++++++++++++
>>>>>  1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/man5/proc.5 b/man5/proc.5
>>>>> index 96077d0dd195..d661e8cfeac9 100644
>>>>> --- a/man5/proc.5
>>>>> +++ b/man5/proc.5
>>>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated
>>>>>  .\"       Added in 2.6.9
>>>>>  .\"       CONFIG_SCHEDSTATS
>>>>>  .TP
>>>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)"
>>>>> +This file reports
>>>>> +.BR allow
>>>>> +if the setgroups system call is allowed in the current user namespace.
>>>>> +This file reports
>>>>> +.BR deny
>>>>> +if the setgroups system call is not allowed in the current user namespace.
>>>>> +This file may be written to with values of
>>>>> +.BR allow
>>>>> +and
>>>>> +.BR deny
>>>>> +before
>>>>> +.IR /proc/[pid]/gid_map
>>>>> +is written to (enabling setgroups) in a user namespace.
>>>>> +.TP
>>>>>  .IR /proc/[pid]/smaps " (since Linux 2.6.14)"
>>>>>  This file shows memory consumption for each of the process's mappings.
>>>>>  (The
>>>>
>>>> Hi Eric,
>>>>
>>>> Thanks for this patch. I applied it, and then tried to work in
>>>> quite a few other details gleaned from the source code and commit
>>>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/.
>>>> Could you please let me know if the following is correct:
>>
>> It is close but it may be misleading.
>>
>>>>     /proc/[pid]/setgroups (since Linux 3.19)
>>>>            This file displays the string "allow"  if  processes  in
>>>>            the  user  namespace  that  contains the process pid are
>>>>            permitted to employ the setgroups(2)  system  call,  and
>>>>            "deny"  if  setgroups(2)  is  not permitted in that user
>>>>            namespace.
>>
>> With the caveat that when gid_map is not set that setgroups is also not
>> allowed.
>
> Okay -- Iadded that point.
>
>>>>            A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>>>>            bility in the namespace) may write either of the strings
>>>>            "allow" or "deny" to this file before writing a group ID
>>>>            mapping   for   this   user   namespace   to   the  file
>>>>            /proc/[pid]/gid_map.  Writing the string "deny" prevents
>>>>            any  process  in  the user namespace from employing set‐
>>>>            groups(2).
>>
>> Or more succintly.  You are allowed to write to /proc/[pid]/setgroups
>> when calling setgroups is not allowed because gid_map is unset.  This
>> ensures we do not have any transitions from a state where setgroups
>> is allowed to a state where setgroups is denied.  There are only
>> transitions from setgroups not-allowed to setgroups allowed.
>
> And I've worked in the above point, rewording a bit along the way.
> So, how does the following look (only the first two paragraphs have
> changed)?
>
>        /proc/[pid]/setgroups (since Linux 3.19)
>               This file displays the string "allow"  if  processes  in
>               the  user  namespace  that  contains the process pid are
>               permitted to employ the setgroups(2)  system  call,  and
>               "deny"  if  setgroups(2)  is  not permitted in that user
>               namespace.  (Note, however, that calls  to  setgroups(2)
>               are  also  not  permitted if /proc/[pid]/gid_map has not
>               yet been set.)
>
>               A privileged process (one with the  CAP_SYS_ADMIN  capa‐
>               bility in the namespace) may write either of the strings
>               "allow" or "deny" to this file before writing a group ID
>               mapping   for   this   user   namespace   to   the  file
>               /proc/[pid]/gid_map.  Writing the string "deny" prevents
>               any  process  in  the user namespace from employing set‐
>               groups(2).  In other words, it is permitted to write  to
>               /proc/[pid]/setgroups so long as calling setgroups(2) is
>               not allowed because /proc/[pid]gid_map has not been set.
>               This  ensures  that  a  process cannot transition from a
>               state where setgroups(2) is allowed  to  a  state  where
>               setgroups(2)  is  denied;  a process can only trabsition
>               from setgroups(2) being disallowed to setgroups(2) being
>               allowed.
>
>               The  default  value  of  this  file  in the initial user
>               namespace is "allow".
>
>               Once /proc/[pid]/gid_map has been written to (which  has
>               the  effect  of enabling setgroups(2) in the user names‐
>               pace), it is no longer possible to deny setgroups(2)  by
>               writing to /proc/[pid]/setgroups.
>
>               A  child user namespace inherits the /proc/[pid]/gid_map
>               setting from its parent.
>
>               If the setgroups file has the  value  "deny",  then  the
>               setgroups(2) system call can't subsequently be reenabled
>               (by writing "allow" to the file) in this user namespace.
>               This  restriction also propagates down to all child user
>               namespaces of this user namespace.
>
> Cheers,
>
> Michael
>
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups
  2015-02-02 21:31                                                         ` Alban Crequy
@ 2015-03-04 14:00                                                           ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-03-04 14:00 UTC (permalink / raw)
  To: Alban Crequy, Eric W. Biederman
  Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Linux API, linux-man, linux-kernel, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger,
	Kenton Varda, stable, Andy Lutomirski

On 02/02/2015 10:31 PM, Alban Crequy wrote:
> Hello,
> 
> Thanks for updating the man page.
> 
> On 12 December 2014 at 22:54, Eric W. Biederman <ebiederm@xmission.com> wrote:
> (...)
>> Furthermore to preserve in some form the useful applications that have
>> been setting gid_map without privilege the file /proc/[pid]/setgroups
>> was added to allow disabling setgroups.  With the setgroups system
>> call permanently disabled in a user namespace it again becomes safe to
>> allow writes to gid_map without privilege.
>>
>> Here is my meager attempt to update user_namespaces.7 to reflect these
>> issues.
> 
> The program userns_child_exec.c in user_namespaces.7 should be updated
> to write in /proc/.../setgroups, near the line:
> /* Update the UID and GID maps in the child */
> 
> Otherwise, the example given in the manpage does not work:
> $ ./userns_child_exec -p -m -U -M '0 1000 1' -G '0 1000 1' bash

Thanks, Alban. I've added code to the example to handle /proc/PID/setgroups
(and tested).

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2015-03-04 14:00 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-29 17:26 [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged Andy Lutomirski
2014-12-02 12:09 ` Eric W. Biederman
2014-12-02 18:53   ` Andy Lutomirski
2014-12-02 19:45     ` Eric W. Biederman
2014-12-02 20:13       ` Andy Lutomirski
2014-12-02 20:25         ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman
2014-12-02 20:28           ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
2014-12-02 20:30             ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman
2014-12-02 21:05             ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski
2014-12-02 21:45               ` Eric W. Biederman
2014-12-02 22:17                 ` Andy Lutomirski
2014-12-02 23:07                   ` Eric W. Biederman
2014-12-02 23:17                     ` Andy Lutomirski
2014-12-08 22:06                       ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
2014-12-08 22:07                         ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
2014-12-08 22:11                           ` Andy Lutomirski
     [not found]                             ` <87h9x5ok0h.fsf@x220.int.ebiederm.org>
2014-12-08 22:33                               ` Andy Lutomirski
2014-12-08 22:17                           ` Richard Weinberger
2014-12-08 22:25                             ` Andy Lutomirski
2014-12-08 22:27                               ` Richard Weinberger
     [not found]                                 ` <874mt5ojfh.fsf@x220.int.ebiederm.org>
2014-12-08 22:47                                   ` Andy Lutomirski
2014-12-08 22:07                         ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
2014-12-08 22:08                         ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
2014-12-08 22:12                           ` Andy Lutomirski
2014-12-08 22:10                         ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
2014-12-08 22:15                           ` Andy Lutomirski
2014-12-08 22:11                         ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
2014-12-08 22:21                           ` Andy Lutomirski
2014-12-08 22:44                             ` Eric W. Biederman
2014-12-08 22:48                               ` Andy Lutomirski
2014-12-08 23:30                                 ` Eric W. Biederman
2014-12-09 19:31                                   ` Eric W. Biederman
2014-12-09 20:36                                     ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
2014-12-09 20:38                                       ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
2014-12-09 22:49                                         ` Andy Lutomirski
2014-12-09 20:39                                       ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
2014-12-09 23:00                                         ` Andy Lutomirski
2014-12-09 20:39                                       ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
2014-12-09 20:41                                       ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
2014-12-09 20:41                                       ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
2014-12-09 22:49                                         ` Andy Lutomirski
2014-12-09 20:42                                       ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
2014-12-09 22:28                                         ` Andy Lutomirski
     [not found]                                           ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com>
2014-12-10  0:21                                             ` Andy Lutomirski
     [not found]                                               ` <87wq5zf83t.fsf@x220.int.ebiederm.org>
     [not found]                                                 ` <87iohh3c9c.fsf@x220.int.ebiederm.org>
2014-12-12  1:30                                                   ` Andy Lutomirski
     [not found]                                                   ` <8761dh3b7k.fsf_-_@x220.int.ebiederm.org>
     [not found]                                                     ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org>
2014-12-12 21:54                                                       ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman
2015-02-02 15:36                                                         ` Michael Kerrisk (man-pages)
2015-02-11  8:01                                                           ` Michael Kerrisk (man-pages)
2015-02-11 13:51                                                             ` Eric W. Biederman
2015-02-12 13:53                                                               ` Michael Kerrisk (man-pages)
2015-02-21  7:57                                                                 ` Michael Kerrisk (man-pages)
2015-03-03 11:39                                                                 ` Michael Kerrisk (man-pages)
2014-12-12 21:54                                                       ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman
2015-02-02 15:37                                                         ` Michael Kerrisk (man-pages)
2015-02-11  8:02                                                           ` Michael Kerrisk (man-pages)
2015-02-11 14:01                                                             ` Eric W. Biederman
2015-02-12 10:11                                                               ` Michael Kerrisk (man-pages)
2015-02-02 21:31                                                         ` Alban Crequy
2015-03-04 14:00                                                           ` Michael Kerrisk (man-pages)
2014-12-09 20:43                                       ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
2014-12-10 16:39                                       ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman
2014-12-10 22:48                                         ` Serge Hallyn
2014-12-10 22:50                                           ` Richard Weinberger
2014-12-10 23:19                                             ` Eric W. Biederman
2014-12-11 19:27                                               ` Richard Weinberger
2014-12-12  6:56                                               ` Chen, Hanxiao
2014-12-13 22:31                                           ` serge
     [not found]                                           ` <87lhmcy2et.fsf@x220.int.ebiederm.org>
     [not found]                                             ` <20141212220840.GF22091@castiana.ipv6.teksavvy.com>
     [not found]                                               ` <8761dgze56.fsf@x220.int.ebiederm.org>
2014-12-15 19:38                                                 ` Serge Hallyn
2014-12-15 20:11                                                   ` Eric W. Biederman
2014-12-15 20:49                                                     ` Serge Hallyn
2014-12-16  2:05                                         ` Andy Lutomirski
2014-12-16  9:23                                           ` Richard Weinberger
2014-12-08 22:14                         ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
2014-12-08 22:26                           ` Andy Lutomirski
2014-12-02 20:58           ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski
2014-12-02 21:26             ` Eric W. Biederman
2014-12-02 22:09               ` Andy Lutomirski
2014-12-02 22:48                 ` Eric W. Biederman
2014-12-02 22:56                   ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).