* [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged @ 2014-11-29 17:26 Andy Lutomirski 2014-12-02 12:09 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-11-29 17:26 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, kenton, Andy Lutomirski, stable Classic unix permission checks have an interesting feature. The group permissions for a file can be set to less than the other permissions on a file. Occasionally this is used deliberately to give a certain group of users fewer permissions than the default. User namespaces break this usage. Groups set in rgid or egid are unaffected because an unprivileged user namespace creator can only map a single group, so setresgid inside and outside the namespace have the same effect. However, an unprivileged user namespace creator can currently use setgroups(2) to drop all supplementary groups, so, if a supplementary group denies access to some resource, user namespaces can be used to bypass that restriction. To fix this issue, this introduces a new user namespace flag USERNS_SETGROUPS_ALLOWED. If that flag is not set, then setgroups(2) will fail regardless of the caller's capabilities. USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By default, if the writer of gid_map has CAP_SETGID in the parent userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is not so privileged, then writing to gid_map will fail unless the writer adds "setgroups deny" to gid_map, in which case the check is skipped but USERNS_SETGROUPS_ALLOWED will remain cleared. The full semantics are: If "setgroups allow" is present or no explicit "setgroups" setting is written to gid_map, then writing to gid_map will fail with -EPERM unless the opener and writer have CAP_SETGID in the parent namespace and the parent namespace has USERNS_SETGROUPS_ALLOWED. If "setgroups deny" is present, then writing gid_map will work as before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will result in processes in the userns that have CAP_SETGID to be nontheless unable to use setgroups(2). If this breaks something inside the userns, then this is okay -- the userns creator specifically requested this behavior. While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user namespace creator has no supplementary groups, doing so could be surprising and could have unpleasant interactions with setns(2). Any application that uses newgidmap(1) should be unaffected by this fix, but unprivileged users that create user namespaces to manipulate mounts or sandbox themselves will break until they start using "setgroups deny". This should fix CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- Unlike v1, this *will* break things like Sandstorm. Fixing them will be easy. I agree that this will result in better long-term semantics, but I'm not so happy about breaking working software. If this is unpalatable, here's a different option: get rid of all these permission checks and just change setgroups. Specifically, make it so that setgroups(2) in a userns will succeed but will silently refuse to remove unmapped groups. Changes from v1: - Userns flags are now properly atomic. - "setgroups allow" is now the default, so legacy unprivileged gid_map writers will start to fail. include/linux/user_namespace.h | 3 +++ kernel/groups.c | 3 +++ kernel/user.c | 1 + kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 49 insertions(+) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index e95372654f09..0ae4a8c97165 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ } extent[UID_GID_MAP_MAX_EXTENTS]; }; +#define USERNS_SETGROUPS_ALLOWED 0 + struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; @@ -27,6 +29,7 @@ struct user_namespace { kuid_t owner; kgid_t group; unsigned int proc_inum; + unsigned long flags; /* Register of per-UID persistent keyrings for this namespace */ #ifdef CONFIG_PERSISTENT_KEYRINGS diff --git a/kernel/groups.c b/kernel/groups.c index 451698f86cfa..b5ec42423202 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -6,6 +6,7 @@ #include <linux/slab.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/user_namespace.h> #include <asm/uaccess.h> /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) struct group_info *group_info; int retval; + if (!test_bit(USERNS_SETGROUPS_ALLOWED, ¤t_user_ns()->flags)) + return -EPERM; if (!ns_capable(current_user_ns(), CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) diff --git a/kernel/user.c b/kernel/user.c index 4efa39350e44..58fba8ea0845 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, + .flags = (1 << USERNS_SETGROUPS_ALLOWED), #ifdef CONFIG_PERSISTENT_KEYRINGS .persistent_keyring_register_sem = __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index aa312b0dc3ec..1f63935483e9 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf, char *kbuf, *pos, *next_line; ssize_t ret = -EINVAL; + bool may_setgroups = false; + bool setgroups_requested = true; + bool seen_explicit_setgroups = false; + /* * The id_map_mutex serializes all writes to any given map. * @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf, if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) goto out; + if (map == &ns->gid_map) { + /* + * Setgroups is permitted if the writer and the + * parent ns are privileged. + */ + may_setgroups = + test_bit(USERNS_SETGROUPS_ALLOWED, + &ns->parent->flags) && + file_ns_capable(file, ns->parent, CAP_SETGID) && + ns_capable(ns->parent, CAP_SETGID); + } + /* Get a buffer */ ret = -ENOMEM; page = __get_free_page(GFP_TEMPORARY); @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf, next_line = NULL; } + /* Is this line a gid_map option? */ + if (map == &ns->gid_map) { + if (!strcmp(pos, "setgroups deny")) { + if (seen_explicit_setgroups) + goto out; + seen_explicit_setgroups = true; + setgroups_requested = false; + continue; + } else if (!strcmp(pos, "setgroups allow")) { + if (seen_explicit_setgroups) + goto out; + seen_explicit_setgroups = true; + setgroups_requested = true; + continue; + } + } + pos = skip_spaces(pos); extent->first = simple_strtoul(pos, &pos, 10); if (!isspace(*pos)) @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf, extent->lower_first = lower_first; } + /* Validate and install setgroups permission. */ + if (map == &ns->gid_map && setgroups_requested) { + if (!may_setgroups) { + ret = -EPERM; + goto out; + } + set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags); + } + /* Install the map */ memcpy(map->extent, new_map.extent, new_map.nr_extents*sizeof(new_map.extent[0])); -- 1.9.3 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged 2014-11-29 17:26 [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged Andy Lutomirski @ 2014-12-02 12:09 ` Eric W. Biederman 2014-12-02 18:53 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 12:09 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, kenton, stable Andy Lutomirski <luto@amacapital.net> writes: > Classic unix permission checks have an interesting feature. The > group permissions for a file can be set to less than the other > permissions on a file. Occasionally this is used deliberately to > give a certain group of users fewer permissions than the default. > > User namespaces break this usage. Groups set in rgid or egid are > unaffected because an unprivileged user namespace creator can only > map a single group, so setresgid inside and outside the namespace > have the same effect. However, an unprivileged user namespace > creator can currently use setgroups(2) to drop all supplementary > groups, so, if a supplementary group denies access to some resource, > user namespaces can be used to bypass that restriction. > > To fix this issue, this introduces a new user namespace flag > USERNS_SETGROUPS_ALLOWED. If that flag is not set, then > setgroups(2) will fail regardless of the caller's capabilities. > > USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By > default, if the writer of gid_map has CAP_SETGID in the parent > userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the > USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is > not so privileged, then writing to gid_map will fail unless the > writer adds "setgroups deny" to gid_map, in which case the check is > skipped but USERNS_SETGROUPS_ALLOWED will remain cleared. > > The full semantics are: > > If "setgroups allow" is present or no explicit "setgroups" setting > is written to gid_map, then writing to gid_map will fail with -EPERM > unless the opener and writer have CAP_SETGID in the parent namespace > and the parent namespace has USERNS_SETGROUPS_ALLOWED. > > If "setgroups deny" is present, then writing gid_map will work as > before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will > result in processes in the userns that have CAP_SETGID to be > nontheless unable to use setgroups(2). If this breaks something > inside the userns, then this is okay -- the userns creator > specifically requested this behavior. I think we need to do this but I also think setgroups allow/deny should be a separate knob than the uid/gid mapping. If for no other reason than you missed at least two implementations of setgroups, in your implementation. > While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user > namespace creator has no supplementary groups, doing so could be > surprising and could have unpleasant interactions with setns(2). > > Any application that uses newgidmap(1) should be unaffected by this > fix, but unprivileged users that create user namespaces to > manipulate mounts or sandbox themselves will break until they start > using "setgroups deny". > > This should fix CVE-2014-8989. > > Cc: stable@vger.kernel.org > Signed-off-by: Andy Lutomirski <luto@amacapital.net> > --- > > Unlike v1, this *will* break things like Sandstorm. Fixing them will be > easy. I agree that this will result in better long-term semantics, but > I'm not so happy about breaking working software. I know what you mean. One of the pieces of software broken by all of this is my test to verify the remount semantics. Which makes all of this very unfortunate. > If this is unpalatable, here's a different option: get rid of all these > permission checks and just change setgroups. Specifically, make it so > that setgroups(2) in a userns will succeed but will silently refuse to > remove unmapped groups. Nope silently refusing to remove unmapped groups is not enough. I can make any gid in my supplemental groups my egid, it takes a sgid helper application but I don't need any special privileges to create that. Once that group is my egid I can map it. Which means I could drop any one group of my choosing without privielges. Which out and out breaks negative groups :( I got to looking and I have a significant piece of code that all of this breaks. tools/testing/selftests/mount/unprivileged-remount-test.c So I am extra motivated to figure out at find a way to preserve most of the existing functionality. My regression tests won't pass until I can find something pallateable. It is very annoying that every option I have considered so far breaks something useful. Having a write once setgroups disable, and the allowing unprivileged mappings after that seems the most palatable option I have seen, semantically. Which means existing software that doesn't care about setgroups can just add the disable code and then work otherwise unmodified. The other option that I have played with is forcing a set of groups in setgroups if your user namespace was created without privilege, that winds up requiring that verify you don't have any other supplementary groups, and is generally messy whichever way I look at it. *Pounds head on desk* What a mess. Eric > Changes from v1: > - Userns flags are now properly atomic. > - "setgroups allow" is now the default, so legacy unprivileged gid_map > writers will start to fail. > > include/linux/user_namespace.h | 3 +++ > kernel/groups.c | 3 +++ > kernel/user.c | 1 + > kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 49 insertions(+) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index e95372654f09..0ae4a8c97165 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ > } extent[UID_GID_MAP_MAX_EXTENTS]; > }; > > +#define USERNS_SETGROUPS_ALLOWED 0 > + > struct user_namespace { > struct uid_gid_map uid_map; > struct uid_gid_map gid_map; > @@ -27,6 +29,7 @@ struct user_namespace { > kuid_t owner; > kgid_t group; > unsigned int proc_inum; > + unsigned long flags; > > /* Register of per-UID persistent keyrings for this namespace */ > #ifdef CONFIG_PERSISTENT_KEYRINGS > diff --git a/kernel/groups.c b/kernel/groups.c > index 451698f86cfa..b5ec42423202 100644 > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -6,6 +6,7 @@ > #include <linux/slab.h> > #include <linux/security.h> > #include <linux/syscalls.h> > +#include <linux/user_namespace.h> > #include <asm/uaccess.h> > > /* init to 2 - one for init_task, one to ensure it is never freed */ > @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) > struct group_info *group_info; > int retval; > > + if (!test_bit(USERNS_SETGROUPS_ALLOWED, ¤t_user_ns()->flags)) > + return -EPERM; > if (!ns_capable(current_user_ns(), CAP_SETGID)) > return -EPERM; > if ((unsigned)gidsetsize > NGROUPS_MAX) > diff --git a/kernel/user.c b/kernel/user.c > index 4efa39350e44..58fba8ea0845 100644 > --- a/kernel/user.c > +++ b/kernel/user.c > @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { > .owner = GLOBAL_ROOT_UID, > .group = GLOBAL_ROOT_GID, > .proc_inum = PROC_USER_INIT_INO, > + .flags = (1 << USERNS_SETGROUPS_ALLOWED), > #ifdef CONFIG_PERSISTENT_KEYRINGS > .persistent_keyring_register_sem = > __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index aa312b0dc3ec..1f63935483e9 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf, > char *kbuf, *pos, *next_line; > ssize_t ret = -EINVAL; > > + bool may_setgroups = false; > + bool setgroups_requested = true; > + bool seen_explicit_setgroups = false; > + > /* > * The id_map_mutex serializes all writes to any given map. > * > @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf, > if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) > goto out; > > + if (map == &ns->gid_map) { > + /* > + * Setgroups is permitted if the writer and the > + * parent ns are privileged. > + */ > + may_setgroups = > + test_bit(USERNS_SETGROUPS_ALLOWED, > + &ns->parent->flags) && > + file_ns_capable(file, ns->parent, CAP_SETGID) && > + ns_capable(ns->parent, CAP_SETGID); > + } > + > /* Get a buffer */ > ret = -ENOMEM; > page = __get_free_page(GFP_TEMPORARY); > @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf, > next_line = NULL; > } > > + /* Is this line a gid_map option? */ > + if (map == &ns->gid_map) { > + if (!strcmp(pos, "setgroups deny")) { > + if (seen_explicit_setgroups) > + goto out; > + seen_explicit_setgroups = true; > + setgroups_requested = false; > + continue; > + } else if (!strcmp(pos, "setgroups allow")) { > + if (seen_explicit_setgroups) > + goto out; > + seen_explicit_setgroups = true; > + setgroups_requested = true; > + continue; > + } > + } > + > pos = skip_spaces(pos); > extent->first = simple_strtoul(pos, &pos, 10); > if (!isspace(*pos)) > @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf, > extent->lower_first = lower_first; > } > > + /* Validate and install setgroups permission. */ > + if (map == &ns->gid_map && setgroups_requested) { > + if (!may_setgroups) { > + ret = -EPERM; > + goto out; > + } > + set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags); > + } > + > /* Install the map */ > memcpy(map->extent, new_map.extent, > new_map.nr_extents*sizeof(new_map.extent[0])); ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged 2014-12-02 12:09 ` Eric W. Biederman @ 2014-12-02 18:53 ` Andy Lutomirski 2014-12-02 19:45 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 18:53 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> Classic unix permission checks have an interesting feature. The >> group permissions for a file can be set to less than the other >> permissions on a file. Occasionally this is used deliberately to >> give a certain group of users fewer permissions than the default. >> >> User namespaces break this usage. Groups set in rgid or egid are >> unaffected because an unprivileged user namespace creator can only >> map a single group, so setresgid inside and outside the namespace >> have the same effect. However, an unprivileged user namespace >> creator can currently use setgroups(2) to drop all supplementary >> groups, so, if a supplementary group denies access to some resource, >> user namespaces can be used to bypass that restriction. >> >> To fix this issue, this introduces a new user namespace flag >> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then >> setgroups(2) will fail regardless of the caller's capabilities. >> >> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By >> default, if the writer of gid_map has CAP_SETGID in the parent >> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the >> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is >> not so privileged, then writing to gid_map will fail unless the >> writer adds "setgroups deny" to gid_map, in which case the check is >> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared. >> >> The full semantics are: >> >> If "setgroups allow" is present or no explicit "setgroups" setting >> is written to gid_map, then writing to gid_map will fail with -EPERM >> unless the opener and writer have CAP_SETGID in the parent namespace >> and the parent namespace has USERNS_SETGROUPS_ALLOWED. >> >> If "setgroups deny" is present, then writing gid_map will work as >> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will >> result in processes in the userns that have CAP_SETGID to be >> nontheless unable to use setgroups(2). If this breaks something >> inside the userns, then this is okay -- the userns creator >> specifically requested this behavior. > > I think we need to do this but I also think setgroups allow/deny > should be a separate knob than the uid/gid mapping. Yeah. It should be readable, too. > > If for no other reason than you missed at least two implementations of > setgroups, in your implementation. I clearly didn't grep hard enough. Grr. > >> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user >> namespace creator has no supplementary groups, doing so could be >> surprising and could have unpleasant interactions with setns(2). >> >> Any application that uses newgidmap(1) should be unaffected by this >> fix, but unprivileged users that create user namespaces to >> manipulate mounts or sandbox themselves will break until they start >> using "setgroups deny". >> >> This should fix CVE-2014-8989. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: Andy Lutomirski <luto@amacapital.net> >> --- >> >> Unlike v1, this *will* break things like Sandstorm. Fixing them will be >> easy. I agree that this will result in better long-term semantics, but >> I'm not so happy about breaking working software. > > I know what you mean. One of the pieces of software broken by all of > this is my test to verify the remount semantics. Which makes all of > this very unfortunate. > >> If this is unpalatable, here's a different option: get rid of all these >> permission checks and just change setgroups. Specifically, make it so >> that setgroups(2) in a userns will succeed but will silently refuse to >> remove unmapped groups. > > Nope silently refusing to remove unmapped groups is not enough. I can > make any gid in my supplemental groups my egid, it takes a sgid helper > application but I don't need any special privileges to create that. > Once that group is my egid I can map it. Which means I could drop > any one group of my choosing without privielges. Which out and out > breaks negative groups :( Whoops, right. And you can, indeed, have egid match one of your supplementary groups. > > I got to looking and I have a significant piece of code that all of this > breaks. > > tools/testing/selftests/mount/unprivileged-remount-test.c > > So I am extra motivated to figure out at find a way to preserve most of > the existing functionality. My regression tests won't pass until I can > find something pallateable. > > It is very annoying that every option I have considered so far breaks > something useful. > > Having a write once setgroups disable, and the allowing unprivileged > mappings after that seems the most palatable option I have seen, > semantically. Which means existing software that doesn't care about > setgroups can just add the disable code and then work otherwise > unmodified. > > The other option that I have played with is forcing a set of groups > in setgroups if your user namespace was created without privilege, > that winds up requiring that verify you don't have any other > supplementary groups, and is generally messy whichever way I look at it. How bad would the automatic selection of setgroups behavior really be? Suppose we have /proc/self/userns_setgroups_mode that can be "allow", "deny", or "auto". It starts out as "auto" (or "deny" if it's set to "deny" in the parent). Once any of the maps have been set, userns_options becomes readonly. If you try to write to gid_map when setgroups == auto, then it switches to "allow" or "deny" depending on whether the writer has privilege. This is nasty magical behavior, but it should DTRT for existing users, and everyone can be updated to set the value explicitly. FWIW, it might also make sense to move all of this stuff into /proc/PID/userns. There may be races in which a setuid or otherwise privileged helper pokes at more than one userns file but actually modifies different namespaces each time. I don't know whether these races matter. uid_map, gid_map, and projid_map could be symlinks. --Andy > > *Pounds head on desk* > > What a mess. > > Eric > >> Changes from v1: >> - Userns flags are now properly atomic. >> - "setgroups allow" is now the default, so legacy unprivileged gid_map >> writers will start to fail. >> >> include/linux/user_namespace.h | 3 +++ >> kernel/groups.c | 3 +++ >> kernel/user.c | 1 + >> kernel/user_namespace.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 49 insertions(+) >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h >> index e95372654f09..0ae4a8c97165 100644 >> --- a/include/linux/user_namespace.h >> +++ b/include/linux/user_namespace.h >> @@ -17,6 +17,8 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ >> } extent[UID_GID_MAP_MAX_EXTENTS]; >> }; >> >> +#define USERNS_SETGROUPS_ALLOWED 0 >> + >> struct user_namespace { >> struct uid_gid_map uid_map; >> struct uid_gid_map gid_map; >> @@ -27,6 +29,7 @@ struct user_namespace { >> kuid_t owner; >> kgid_t group; >> unsigned int proc_inum; >> + unsigned long flags; >> >> /* Register of per-UID persistent keyrings for this namespace */ >> #ifdef CONFIG_PERSISTENT_KEYRINGS >> diff --git a/kernel/groups.c b/kernel/groups.c >> index 451698f86cfa..b5ec42423202 100644 >> --- a/kernel/groups.c >> +++ b/kernel/groups.c >> @@ -6,6 +6,7 @@ >> #include <linux/slab.h> >> #include <linux/security.h> >> #include <linux/syscalls.h> >> +#include <linux/user_namespace.h> >> #include <asm/uaccess.h> >> >> /* init to 2 - one for init_task, one to ensure it is never freed */ >> @@ -223,6 +224,8 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) >> struct group_info *group_info; >> int retval; >> >> + if (!test_bit(USERNS_SETGROUPS_ALLOWED, ¤t_user_ns()->flags)) >> + return -EPERM; >> if (!ns_capable(current_user_ns(), CAP_SETGID)) >> return -EPERM; >> if ((unsigned)gidsetsize > NGROUPS_MAX) >> diff --git a/kernel/user.c b/kernel/user.c >> index 4efa39350e44..58fba8ea0845 100644 >> --- a/kernel/user.c >> +++ b/kernel/user.c >> @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { >> .owner = GLOBAL_ROOT_UID, >> .group = GLOBAL_ROOT_GID, >> .proc_inum = PROC_USER_INIT_INO, >> + .flags = (1 << USERNS_SETGROUPS_ALLOWED), >> #ifdef CONFIG_PERSISTENT_KEYRINGS >> .persistent_keyring_register_sem = >> __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >> index aa312b0dc3ec..1f63935483e9 100644 >> --- a/kernel/user_namespace.c >> +++ b/kernel/user_namespace.c >> @@ -601,6 +601,10 @@ static ssize_t map_write(struct file *file, const char __user *buf, >> char *kbuf, *pos, *next_line; >> ssize_t ret = -EINVAL; >> >> + bool may_setgroups = false; >> + bool setgroups_requested = true; >> + bool seen_explicit_setgroups = false; >> + >> /* >> * The id_map_mutex serializes all writes to any given map. >> * >> @@ -633,6 +637,18 @@ static ssize_t map_write(struct file *file, const char __user *buf, >> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) >> goto out; >> >> + if (map == &ns->gid_map) { >> + /* >> + * Setgroups is permitted if the writer and the >> + * parent ns are privileged. >> + */ >> + may_setgroups = >> + test_bit(USERNS_SETGROUPS_ALLOWED, >> + &ns->parent->flags) && >> + file_ns_capable(file, ns->parent, CAP_SETGID) && >> + ns_capable(ns->parent, CAP_SETGID); >> + } >> + >> /* Get a buffer */ >> ret = -ENOMEM; >> page = __get_free_page(GFP_TEMPORARY); >> @@ -667,6 +683,23 @@ static ssize_t map_write(struct file *file, const char __user *buf, >> next_line = NULL; >> } >> >> + /* Is this line a gid_map option? */ >> + if (map == &ns->gid_map) { >> + if (!strcmp(pos, "setgroups deny")) { >> + if (seen_explicit_setgroups) >> + goto out; >> + seen_explicit_setgroups = true; >> + setgroups_requested = false; >> + continue; >> + } else if (!strcmp(pos, "setgroups allow")) { >> + if (seen_explicit_setgroups) >> + goto out; >> + seen_explicit_setgroups = true; >> + setgroups_requested = true; >> + continue; >> + } >> + } >> + >> pos = skip_spaces(pos); >> extent->first = simple_strtoul(pos, &pos, 10); >> if (!isspace(*pos)) >> @@ -741,6 +774,15 @@ static ssize_t map_write(struct file *file, const char __user *buf, >> extent->lower_first = lower_first; >> } >> >> + /* Validate and install setgroups permission. */ >> + if (map == &ns->gid_map && setgroups_requested) { >> + if (!may_setgroups) { >> + ret = -EPERM; >> + goto out; >> + } >> + set_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags); >> + } >> + >> /* Install the map */ >> memcpy(map->extent, new_map.extent, >> new_map.nr_extents*sizeof(new_map.extent[0])); -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged 2014-12-02 18:53 ` Andy Lutomirski @ 2014-12-02 19:45 ` Eric W. Biederman 2014-12-02 20:13 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 19:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> Classic unix permission checks have an interesting feature. The >>> group permissions for a file can be set to less than the other >>> permissions on a file. Occasionally this is used deliberately to >>> give a certain group of users fewer permissions than the default. >>> >>> User namespaces break this usage. Groups set in rgid or egid are >>> unaffected because an unprivileged user namespace creator can only >>> map a single group, so setresgid inside and outside the namespace >>> have the same effect. However, an unprivileged user namespace >>> creator can currently use setgroups(2) to drop all supplementary >>> groups, so, if a supplementary group denies access to some resource, >>> user namespaces can be used to bypass that restriction. >>> >>> To fix this issue, this introduces a new user namespace flag >>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then >>> setgroups(2) will fail regardless of the caller's capabilities. >>> >>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By >>> default, if the writer of gid_map has CAP_SETGID in the parent >>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the >>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is >>> not so privileged, then writing to gid_map will fail unless the >>> writer adds "setgroups deny" to gid_map, in which case the check is >>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared. >>> >>> The full semantics are: >>> >>> If "setgroups allow" is present or no explicit "setgroups" setting >>> is written to gid_map, then writing to gid_map will fail with -EPERM >>> unless the opener and writer have CAP_SETGID in the parent namespace >>> and the parent namespace has USERNS_SETGROUPS_ALLOWED. >>> >>> If "setgroups deny" is present, then writing gid_map will work as >>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will >>> result in processes in the userns that have CAP_SETGID to be >>> nontheless unable to use setgroups(2). If this breaks something >>> inside the userns, then this is okay -- the userns creator >>> specifically requested this behavior. >> >> I think we need to do this but I also think setgroups allow/deny >> should be a separate knob than the uid/gid mapping. > > Yeah. It should be readable, too. > >> >> If for no other reason than you missed at least two implementations of >> setgroups, in your implementation. > > I clearly didn't grep hard enough. Grr. > >> >>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user >>> namespace creator has no supplementary groups, doing so could be >>> surprising and could have unpleasant interactions with setns(2). >>> >>> Any application that uses newgidmap(1) should be unaffected by this >>> fix, but unprivileged users that create user namespaces to >>> manipulate mounts or sandbox themselves will break until they start >>> using "setgroups deny". >>> >>> This should fix CVE-2014-8989. >>> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Andy Lutomirski <luto@amacapital.net> >>> --- >>> >>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be >>> easy. I agree that this will result in better long-term semantics, but >>> I'm not so happy about breaking working software. >> >> I know what you mean. One of the pieces of software broken by all of >> this is my test to verify the remount semantics. Which makes all of >> this very unfortunate. >> >>> If this is unpalatable, here's a different option: get rid of all these >>> permission checks and just change setgroups. Specifically, make it so >>> that setgroups(2) in a userns will succeed but will silently refuse to >>> remove unmapped groups. >> >> Nope silently refusing to remove unmapped groups is not enough. I can >> make any gid in my supplemental groups my egid, it takes a sgid helper >> application but I don't need any special privileges to create that. >> Once that group is my egid I can map it. Which means I could drop >> any one group of my choosing without privielges. Which out and out >> breaks negative groups :( > > Whoops, right. And you can, indeed, have egid match one of your > supplementary groups. > >> >> I got to looking and I have a significant piece of code that all of this >> breaks. >> >> tools/testing/selftests/mount/unprivileged-remount-test.c >> >> So I am extra motivated to figure out at find a way to preserve most of >> the existing functionality. My regression tests won't pass until I can >> find something pallateable. >> >> It is very annoying that every option I have considered so far breaks >> something useful. >> >> Having a write once setgroups disable, and the allowing unprivileged >> mappings after that seems the most palatable option I have seen, >> semantically. Which means existing software that doesn't care about >> setgroups can just add the disable code and then work otherwise >> unmodified. >> >> The other option that I have played with is forcing a set of groups >> in setgroups if your user namespace was created without privilege, >> that winds up requiring that verify you don't have any other >> supplementary groups, and is generally messy whichever way I look at it. > > How bad would the automatic selection of setgroups behavior really be? > > Suppose we have /proc/self/userns_setgroups_mode that can be "allow", > "deny", or "auto". It starts out as "auto" (or "deny" if it's set to > "deny" in the parent). Once any of the maps have been set, > userns_options becomes readonly. If you try to write to gid_map when > setgroups == auto, then it switches to "allow" or "deny" depending on > whether the writer has privilege. > > This is nasty magical behavior, but it should DTRT for existing users, > and everyone can be updated to set the value explicitly. Rarely is everything updated unless there is a requirement for an update. For my code that cares an update is necessary anyway as it contains a gratuitous setgroups(0, NULL). Since we have to break applications breaking them loud and clear and letting them set the flat to recover (if possible) seems the best we can do. That at least allows someone to ask if they depend on setgroups or init_groups. > FWIW, it might also make sense to move all of this stuff into > /proc/PID/userns. There may be races in which a setuid or otherwise > privileged helper pokes at more than one userns file but actually > modifies different namespaces each time. I don't know whether these > races matter. uid_map, gid_map, and projid_map could be symlinks. I don't see how moving these files as removing any races. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged 2014-12-02 19:45 ` Eric W. Biederman @ 2014-12-02 20:13 ` Andy Lutomirski 2014-12-02 20:25 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 20:13 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 11:45 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> Andy Lutomirski <luto@amacapital.net> writes: >>> >>>> Classic unix permission checks have an interesting feature. The >>>> group permissions for a file can be set to less than the other >>>> permissions on a file. Occasionally this is used deliberately to >>>> give a certain group of users fewer permissions than the default. >>>> >>>> User namespaces break this usage. Groups set in rgid or egid are >>>> unaffected because an unprivileged user namespace creator can only >>>> map a single group, so setresgid inside and outside the namespace >>>> have the same effect. However, an unprivileged user namespace >>>> creator can currently use setgroups(2) to drop all supplementary >>>> groups, so, if a supplementary group denies access to some resource, >>>> user namespaces can be used to bypass that restriction. >>>> >>>> To fix this issue, this introduces a new user namespace flag >>>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then >>>> setgroups(2) will fail regardless of the caller's capabilities. >>>> >>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By >>>> default, if the writer of gid_map has CAP_SETGID in the parent >>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the >>>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is >>>> not so privileged, then writing to gid_map will fail unless the >>>> writer adds "setgroups deny" to gid_map, in which case the check is >>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared. >>>> >>>> The full semantics are: >>>> >>>> If "setgroups allow" is present or no explicit "setgroups" setting >>>> is written to gid_map, then writing to gid_map will fail with -EPERM >>>> unless the opener and writer have CAP_SETGID in the parent namespace >>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED. >>>> >>>> If "setgroups deny" is present, then writing gid_map will work as >>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will >>>> result in processes in the userns that have CAP_SETGID to be >>>> nontheless unable to use setgroups(2). If this breaks something >>>> inside the userns, then this is okay -- the userns creator >>>> specifically requested this behavior. >>> >>> I think we need to do this but I also think setgroups allow/deny >>> should be a separate knob than the uid/gid mapping. >> >> Yeah. It should be readable, too. >> >>> >>> If for no other reason than you missed at least two implementations of >>> setgroups, in your implementation. >> >> I clearly didn't grep hard enough. Grr. >> >>> >>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user >>>> namespace creator has no supplementary groups, doing so could be >>>> surprising and could have unpleasant interactions with setns(2). >>>> >>>> Any application that uses newgidmap(1) should be unaffected by this >>>> fix, but unprivileged users that create user namespaces to >>>> manipulate mounts or sandbox themselves will break until they start >>>> using "setgroups deny". >>>> >>>> This should fix CVE-2014-8989. >>>> >>>> Cc: stable@vger.kernel.org >>>> Signed-off-by: Andy Lutomirski <luto@amacapital.net> >>>> --- >>>> >>>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be >>>> easy. I agree that this will result in better long-term semantics, but >>>> I'm not so happy about breaking working software. >>> >>> I know what you mean. One of the pieces of software broken by all of >>> this is my test to verify the remount semantics. Which makes all of >>> this very unfortunate. >>> >>>> If this is unpalatable, here's a different option: get rid of all these >>>> permission checks and just change setgroups. Specifically, make it so >>>> that setgroups(2) in a userns will succeed but will silently refuse to >>>> remove unmapped groups. >>> >>> Nope silently refusing to remove unmapped groups is not enough. I can >>> make any gid in my supplemental groups my egid, it takes a sgid helper >>> application but I don't need any special privileges to create that. >>> Once that group is my egid I can map it. Which means I could drop >>> any one group of my choosing without privielges. Which out and out >>> breaks negative groups :( >> >> Whoops, right. And you can, indeed, have egid match one of your >> supplementary groups. >> >>> >>> I got to looking and I have a significant piece of code that all of this >>> breaks. >>> >>> tools/testing/selftests/mount/unprivileged-remount-test.c >>> >>> So I am extra motivated to figure out at find a way to preserve most of >>> the existing functionality. My regression tests won't pass until I can >>> find something pallateable. >>> >>> It is very annoying that every option I have considered so far breaks >>> something useful. >>> >>> Having a write once setgroups disable, and the allowing unprivileged >>> mappings after that seems the most palatable option I have seen, >>> semantically. Which means existing software that doesn't care about >>> setgroups can just add the disable code and then work otherwise >>> unmodified. >>> >>> The other option that I have played with is forcing a set of groups >>> in setgroups if your user namespace was created without privilege, >>> that winds up requiring that verify you don't have any other >>> supplementary groups, and is generally messy whichever way I look at it. >> >> How bad would the automatic selection of setgroups behavior really be? >> >> Suppose we have /proc/self/userns_setgroups_mode that can be "allow", >> "deny", or "auto". It starts out as "auto" (or "deny" if it's set to >> "deny" in the parent). Once any of the maps have been set, >> userns_options becomes readonly. If you try to write to gid_map when >> setgroups == auto, then it switches to "allow" or "deny" depending on >> whether the writer has privilege. >> >> This is nasty magical behavior, but it should DTRT for existing users, >> and everyone can be updated to set the value explicitly. > > Rarely is everything updated unless there is a requirement for an > update. > > For my code that cares an update is necessary anyway as it contains > a gratuitous setgroups(0, NULL). > > Since we have to break applications breaking them loud and clear and > letting them set the flat to recover (if possible) seems the best we can > do. That at least allows someone to ask if they depend on setgroups or > init_groups. Fair enough. Any thoughts on what the API should be for v3? > >> FWIW, it might also make sense to move all of this stuff into >> /proc/PID/userns. There may be races in which a setuid or otherwise >> privileged helper pokes at more than one userns file but actually >> modifies different namespaces each time. I don't know whether these >> races matter. uid_map, gid_map, and projid_map could be symlinks. > > I don't see how moving these files as removing any races. It helps if you use openat to open the userns directory and of the /proc infrastructure is smart enough to make that work. Admittedly, I don't actually see a dangerous race right now. --Andy > > Eric > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 20:13 ` Andy Lutomirski @ 2014-12-02 20:25 ` Eric W. Biederman 2014-12-02 20:28 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-02 20:58 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski 0 siblings, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 20:25 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Classic unix permission checks have an interesting feature, the group permissions for a file can be set to less than the other permissions on a file. Occassionally this is used deliberately to give a certain group of users fewer permissions than the default. Overlooking negative groups has resulted in the permission checks for setting up a group mapping in a user namespace to be too lax. Tighten the permission checks in new_idmap_permitted to ensure that mapping uids and gids into user namespaces without privilege will not result in new combinations of credentials being available to the users. When setting mappings without privilege only the creator of the user namespace is interesting as all other users that have CAP_SETUID over the user namespace will also have CAP_SETUID over the user namespaces parent. So the scope of the unprivileged check is reduced to just the case where cred->euid is the namespace creator. For setting a uid mapping without privilege only euid is considered as setresuid can set uid, suid and fsuid from euid without privielege making any combination of uids possible with user namespaces already possible without them. For now seeting a gid mapping without privilege is removed. The only possible set of credentials that would be safe without a gid mapping (egid without any supplementary groups) just doesn't happen in practice so would simply lead to unused untested code. setgroups is modified to fail not only when the group ids do not map but also when there are no gid mappings at all, preventing setgroups(0, NULL) from succeeding when gid mappings have not been established. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivileged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no effect on them. This should fix CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- arch/s390/kernel/compat_linux.c | 5 ++++- include/linux/user_namespace.h | 10 ++++++++++ kernel/groups.c | 5 ++++- kernel/uid16.c | 5 ++++- kernel/user_namespace.c | 17 ++++++++++------- 5 files changed, 32 insertions(+), 10 deletions(-) diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c index ca38139423ae..21c91feeca2d 100644 --- a/arch/s390/kernel/compat_linux.c +++ b/arch/s390/kernel/compat_linux.c @@ -49,6 +49,7 @@ #include <linux/fadvise.h> #include <linux/ipc.h> #include <linux/slab.h> +#include <linux/user_namespace.h> #include <asm/types.h> #include <asm/uaccess.h> @@ -246,10 +247,12 @@ out: COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplist) { + struct user_namespace *user_ns = current_user_ns(); struct group_info *group_info; int retval; - if (!capable(CAP_SETGID)) + if (!gid_mapping_possible(user_ns) || + !capable(CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index e95372654f09..26d5e8f5db97 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -46,6 +46,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) return ns; } +static inline bool gid_mapping_possible(const struct user_namespace *ns) +{ + return ns->gid_map.nr_extents != 0; +} + extern int create_user_ns(struct cred *new); extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred); extern void free_user_ns(struct user_namespace *ns); @@ -70,6 +75,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) return &init_user_ns; } +static inline bool gid_mapping_possible(const struct user_namespace *ns) +{ + return true; +} + static inline int create_user_ns(struct cred *new) { return -EINVAL; diff --git a/kernel/groups.c b/kernel/groups.c index 451698f86cfa..b9a6a5c7e100 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -6,6 +6,7 @@ #include <linux/slab.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/user_namespace.h> #include <asm/uaccess.h> /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -220,10 +221,12 @@ out: SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) { + struct user_namespace *user_ns = current_user_ns(); struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) + if (!gid_mapping_possible(user_ns) || + !ns_capable(user_ns, CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/kernel/uid16.c b/kernel/uid16.c index 602e5bbbceff..602c7de2aa11 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -13,6 +13,7 @@ #include <linux/highuid.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/user_namespace.h> #include <asm/uaccess.h> @@ -173,10 +174,12 @@ out: SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) { + struct user_namespace *user_ns = current_user_ns(); struct group_info *group_info; int retval; - if (!ns_capable(current_user_ns(), CAP_SETGID)) + if (!gid_mapping_possible(user_ns) || + !ns_capable(user_ns, CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) return -EINVAL; diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index aa312b0dc3ec..51d65b444951 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *new_map) { - /* Allow mapping to your own filesystem ids */ - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { + const struct cred *cred = file->f_cred; + + /* Allow a mapping without capabilities when allowing the root + * of the user namespace capabilities restricted to that id + * will not change the set of credentials available to that + * user. + */ + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && + uid_eq(ns->owner, cred->euid)) { u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { kuid_t uid = make_kuid(ns->parent, id); - if (uid_eq(uid, file->f_cred->fsuid)) - return true; - } else if (cap_setid == CAP_SETGID) { - kgid_t gid = make_kgid(ns->parent, id); - if (gid_eq(gid, file->f_cred->fsgid)) + if (uid_eq(uid, cred->euid)) return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 20:25 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman @ 2014-12-02 20:28 ` Eric W. Biederman 2014-12-02 20:30 ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman 2014-12-02 21:05 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski 2014-12-02 20:58 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski 1 sibling, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 20:28 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of 0 means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of 1 means the segtoups system call is enabled. - Descedent user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not pass in a struct file so file_ns_capable is unusable. - Update new_idmap_permitted to allow unprivileged users to establish a mapping of their own gid, as such mappings can no longer allow dropping of supplemental groups. This set of changes restores as much as possible the functionality that was lost when new_idmap_permitted was modified to not allow mappinges to be established without privilege. As this fixes a regression from: "userns: Avoid problems with negative groups" it is probably a candidate for a backport. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- This patch still needs a little bit of love. I need to take a hard look at the interaction of barriers and atomic ops, and it seems I have at least one comment fix that needs to move elsewhere. But this should be enough to move the conversation forward. arch/s390/kernel/compat_linux.c | 1 + fs/proc/base.c | 31 ++++++++++---- include/linux/user_namespace.h | 3 ++ kernel/groups.c | 1 + kernel/uid16.c | 1 + kernel/user.c | 1 + kernel/user_namespace.c | 95 ++++++++++++++++++++++++++++++++++++++++- 7 files changed, 124 insertions(+), 9 deletions(-) diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c index 21c91feeca2d..6d0ee1b089fb 100644 --- a/arch/s390/kernel/compat_linux.c +++ b/arch/s390/kernel/compat_linux.c @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis int retval; if (!gid_mapping_possible(user_ns) || + !atomic_read(&user_ns->setgroups_allowed) || !capable(CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) diff --git a/fs/proc/base.c b/fs/proc/base.c index 772efa45a452..4ebed9f01d97 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns, #endif /* CONFIG_TASK_IO_ACCOUNTING */ #ifdef CONFIG_USER_NS -static int proc_id_map_open(struct inode *inode, struct file *file, +static int proc_userns_open(struct inode *inode, struct file *file, const struct seq_operations *seq_ops) { struct user_namespace *ns = NULL; @@ -2418,7 +2418,7 @@ err: return ret; } -static int proc_id_map_release(struct inode *inode, struct file *file) +static int proc_userns_release(struct inode *inode, struct file *file) { struct seq_file *seq = file->private_data; struct user_namespace *ns = seq->private; @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file) static int proc_uid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_uid_seq_operations); + return proc_userns_open(inode, file, &proc_uid_seq_operations); } static int proc_gid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_gid_seq_operations); + return proc_userns_open(inode, file, &proc_gid_seq_operations); } static int proc_projid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_projid_seq_operations); + return proc_userns_open(inode, file, &proc_projid_seq_operations); } static const struct file_operations proc_uid_map_operations = { @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = { .write = proc_uid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_gid_map_operations = { @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = { .write = proc_gid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_projid_map_operations = { @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = { .write = proc_projid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, +}; + +static int proc_setgroups_open(struct inode *inode, struct file *file) +{ + return proc_userns_open(inode, file, &proc_setgroups_seq_operations); +} + +static const struct file_operations proc_setgroups_operations = { + .open = proc_setgroups_open, + .write = proc_setgroups_write, + .read = seq_read, + .llseek = seq_lseek, + .release = proc_userns_release, }; #endif /* CONFIG_USER_NS */ @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif #ifdef CONFIG_CHECKPOINT_RESTORE REG("timers", S_IRUGO, proc_timers_operations), @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif }; diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 26d5e8f5db97..1e8cb168b1d0 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -27,6 +27,7 @@ struct user_namespace { kuid_t owner; kgid_t group; unsigned int proc_inum; + atomic_t setgroups_allowed; /* Register of per-UID persistent keyrings for this namespace */ #ifdef CONFIG_PERSISTENT_KEYRINGS @@ -65,9 +66,11 @@ struct seq_operations; extern const struct seq_operations proc_uid_seq_operations; extern const struct seq_operations proc_gid_seq_operations; extern const struct seq_operations proc_projid_seq_operations; +extern const struct seq_operations proc_setgroups_seq_operations; extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) diff --git a/kernel/groups.c b/kernel/groups.c index b9a6a5c7e100..467ae954e859 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -226,6 +226,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) int retval; if (!gid_mapping_possible(user_ns) || + !atomic_read(&user_ns->setgroups_allowed) || !ns_capable(user_ns, CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) diff --git a/kernel/uid16.c b/kernel/uid16.c index 602c7de2aa11..096962fa1975 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -179,6 +179,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) int retval; if (!gid_mapping_possible(user_ns) || + !atomic_read(&user_ns->setgroups_allowed) || !ns_capable(user_ns, CAP_SETGID)) return -EPERM; if ((unsigned)gidsetsize > NGROUPS_MAX) diff --git a/kernel/user.c b/kernel/user.c index 4efa39350e44..0d78759f7dbe 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, + .setgroups_allowed = ATOMIC_INIT(1), #ifdef CONFIG_PERSISTENT_KEYRINGS .persistent_keyring_register_sem = __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 51d65b444951..521c8d53ee17 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -98,6 +98,7 @@ int create_user_ns(struct cred *new) ns->level = parent_ns->level + 1; ns->owner = owner; ns->group = group; + atomic_set(&ns->setgroups_allowed, atomic_read(&parent_ns->setgroups_allowed)); set_cred_user_ns(new, ns); @@ -640,7 +641,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, if (!page) goto out; - /* Only allow <= page size writes at the beginning of the file */ + /* Only allow < page size writes at the beginning of the file */ ret = -EINVAL; if ((*ppos != 0) || (count >= PAGE_SIZE)) goto out; @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, kuid_t uid = make_kuid(ns->parent, id); if (uid_eq(uid, cred->euid)) return true; + } else if (cap_setid == CAP_SETGID) { + kgid_t gid = make_kgid(ns->parent, id); + if (!atomic_read(&ns->setgroups_allowed) && + gid_eq(gid, cred->egid)) + return true; } } @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file, return false; } +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) +{ + struct user_namespace *ns = seq->private; + + return (*ppos == 0) ? ns : NULL; +} + +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) +{ + ++*ppos; + return NULL; +} + +static void setgroups_m_stop(struct seq_file *seq, void *v) +{ +} + +static int setgroups_m_show(struct seq_file *seq, void *v) +{ + struct user_namespace *ns = seq->private; + + seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed)); + return 0; +} + +const struct seq_operations proc_setgroups_seq_operations = { + .start = setgroups_m_start, + .stop = setgroups_m_stop, + .next = setgroups_m_next, + .show = setgroups_m_show, +}; + +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct seq_file *seq = file->private_data; + struct user_namespace *ns = seq->private; + char kbuf[3]; + int setgroups_allowed; + ssize_t ret; + + ret = -EPERM; + if (!file_ns_capable(file, ns, CAP_SETGID)) + goto out; + + /* Only allow a very narrow range of strings to be written */ + ret = -EINVAL; + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) + goto out; + + /* What was written? */ + ret = -EFAULT; + if (copy_from_user(kbuf, buf, count)) + goto out; + kbuf[count] = '\0'; + + /* What is being requested? */ + ret = -EINVAL; + if (kbuf[0] == '0') + setgroups_allowed = 0; + else if (kbuf[0] == '1') + setgroups_allowed = 1; + else + goto out; + + /* Allow a trailing newline */ + ret = -EINVAL; + if ((count == 2) && (kbuf[1] != '\n')) + goto out; + + + if (setgroups_allowed) { + ret = -EINVAL; + if (atomic_read(&ns->setgroups_allowed) == 0) + goto out; + } else { + atomic_set(&ns->setgroups_allowed, 0); + /* sigh memory barriers! */ + } + + /* Report a successful write */ + *ppos = count; + ret = count; +out: + return ret; +} + static void *userns_get(struct task_struct *task) { struct user_namespace *user_ns; -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests 2014-12-02 20:28 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-02 20:30 ` Eric W. Biederman 2014-12-02 21:05 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski 1 sibling, 0 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 20:30 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable A security fix in caused the way the unprivileged remount tests were using user namespaces to break. Tweak the way user namespaces are being used so the test works again. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- This is what it takes to fix a broken application, in it's full glory. This fix works even if new functionality does not exist. tools/testing/selftests/mount/unprivileged-remount-test.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c index 9669d375625a..d47227494137 100644 --- a/tools/testing/selftests/mount/unprivileged-remount-test.c +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c @@ -144,13 +144,12 @@ static void create_and_enter_userns(void) strerror(errno)); } + if (access("/proc/self/setgroups", F_OK) == 0) { + write_file("/proc/self/setgroups", "0"); + } write_file("/proc/self/uid_map", "0 %d 1", uid); write_file("/proc/self/gid_map", "0 %d 1", gid); - if (setgroups(0, NULL) != 0) { - die("setgroups failed: %s\n", - strerror(errno)); - } if (setgid(0) != 0) { die ("setgid(0) failed %s\n", strerror(errno)); -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 20:28 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-02 20:30 ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman @ 2014-12-02 21:05 ` Andy Lutomirski 2014-12-02 21:45 ` Eric W. Biederman 1 sibling, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 21:05 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > - Expose the knob to user space through a proc file /proc/<pid>/setgroups Can you rename this to something clearer, e.g. userns_setgroups_mode? > > A value of 0 means the setgroups system call is disabled in the > current processes user namespace and can not be enabled in the > future in this user namespace. > > A value of 1 means the segtoups system call is enabled. Would it make more sense to put strings like "allow" and "deny" in here? That way, future extensions could add additional values. > diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c > index 21c91feeca2d..6d0ee1b089fb 100644 > --- a/arch/s390/kernel/compat_linux.c > +++ b/arch/s390/kernel/compat_linux.c > @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis > int retval; > > if (!gid_mapping_possible(user_ns) || > + !atomic_read(&user_ns->setgroups_allowed) || > !capable(CAP_SETGID)) > return -EPERM; This is now incomprehensible because of the gid_mapping_possible thing. If you renamed gid_mapping_possible to userns_setgroup_allowed, then this could be added to the implementation, and this would all make sense (not to mention avoiding duplicating this thing). > @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, > kuid_t uid = make_kuid(ns->parent, id); > if (uid_eq(uid, cred->euid)) > return true; > + } else if (cap_setid == CAP_SETGID) { > + kgid_t gid = make_kgid(ns->parent, id); > + if (!atomic_read(&ns->setgroups_allowed) && > + gid_eq(gid, cred->egid)) > + return true; I still don't see why egid is any better than fsgid here. > } > } > > @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file, > return false; > } > > +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) > +{ > + struct user_namespace *ns = seq->private; > + > + return (*ppos == 0) ? ns : NULL; > +} > + > +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) > +{ > + ++*ppos; > + return NULL; > +} > + > +static void setgroups_m_stop(struct seq_file *seq, void *v) > +{ > +} > + > +static int setgroups_m_show(struct seq_file *seq, void *v) > +{ > + struct user_namespace *ns = seq->private; > + > + seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed)); > + return 0; > +} > + > +const struct seq_operations proc_setgroups_seq_operations = { > + .start = setgroups_m_start, > + .stop = setgroups_m_stop, > + .next = setgroups_m_next, > + .show = setgroups_m_show, > +}; > + > +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, > + size_t count, loff_t *ppos) > +{ > + struct seq_file *seq = file->private_data; > + struct user_namespace *ns = seq->private; > + char kbuf[3]; > + int setgroups_allowed; > + ssize_t ret; > + > + ret = -EPERM; > + if (!file_ns_capable(file, ns, CAP_SETGID)) > + goto out; CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's reconfiguring the namespace. > + > + /* Only allow a very narrow range of strings to be written */ > + ret = -EINVAL; > + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) > + goto out; > + > + /* What was written? */ > + ret = -EFAULT; > + if (copy_from_user(kbuf, buf, count)) > + goto out; > + kbuf[count] = '\0'; > + > + /* What is being requested? */ > + ret = -EINVAL; > + if (kbuf[0] == '0') > + setgroups_allowed = 0; > + else if (kbuf[0] == '1') > + setgroups_allowed = 1; > + else > + goto out; > + > + /* Allow a trailing newline */ > + ret = -EINVAL; > + if ((count == 2) && (kbuf[1] != '\n')) > + goto out; > + > + > + if (setgroups_allowed) { > + ret = -EINVAL; > + if (atomic_read(&ns->setgroups_allowed) == 0) > + goto out; > + } else { I would disallow this if gid_map has been written in the interest of sanity. > + atomic_set(&ns->setgroups_allowed, 0); > + /* sigh memory barriers! */ I don't think that any barriers are needed. If you ever observe setgroups_allowed == 0, it will stay that way forever. > + } > + > + /* Report a successful write */ > + *ppos = count; > + ret = count; > +out: > + return ret; > +} > + > static void *userns_get(struct task_struct *task) > { > struct user_namespace *user_ns; --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 21:05 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski @ 2014-12-02 21:45 ` Eric W. Biederman 2014-12-02 22:17 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 21:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> - Expose the knob to user space through a proc file /proc/<pid>/setgroups > > Can you rename this to something clearer, e.g. userns_setgroups_mode? I am not certain that is any clearer. That just reads as wordier. The userns bit is definitely confusing and wrong. Why should we need to spell out the scope? >> A value of 0 means the setgroups system call is disabled in the >> current processes user namespace and can not be enabled in the >> future in this user namespace. >> >> A value of 1 means the segtoups system call is enabled. > > Would it make more sense to put strings like "allow" and "deny" in > here? That way, future extensions could add additional values. If the implementation of the write side isn't too bad. I would love to see precedent elsewhere in the kernel. What I don't expect to do is have any values except setgroups are enabled and setgroups are disabled. I am fine with allowing for the possibility (that is just good engineering) but I don't intend to seriously considering or implementing other possibilities. >> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c >> index 21c91feeca2d..6d0ee1b089fb 100644 >> --- a/arch/s390/kernel/compat_linux.c >> +++ b/arch/s390/kernel/compat_linux.c >> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis >> int retval; >> >> if (!gid_mapping_possible(user_ns) || >> + !atomic_read(&user_ns->setgroups_allowed) || >> !capable(CAP_SETGID)) >> return -EPERM; > > This is now incomprehensible because of the gid_mapping_possible > thing. If you renamed gid_mapping_possible to > userns_setgroup_allowed, then this could be added to the > implementation, and this would all make sense (not to mention avoiding > duplicating this thing). > >> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, >> kuid_t uid = make_kuid(ns->parent, id); >> if (uid_eq(uid, cred->euid)) >> return true; >> + } else if (cap_setid == CAP_SETGID) { >> + kgid_t gid = make_kgid(ns->parent, id); >> + if (!atomic_read(&ns->setgroups_allowed) && >> + gid_eq(gid, cred->egid)) >> + return true; > > I still don't see why egid is any better than fsgid here. Answered in my earlier response fsgid was a goof. I can set any gid to my egid with my existing permissions. Show me how I can do that with fsgid or fsuid and I will be happy to use those. >> } >> } >> >> @@ -844,6 +850,93 @@ static bool new_idmap_permitted(const struct file *file, >> return false; >> } >> >> +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) >> +{ >> + struct user_namespace *ns = seq->private; >> + >> + return (*ppos == 0) ? ns : NULL; >> +} >> + >> +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) >> +{ >> + ++*ppos; >> + return NULL; >> +} >> + >> +static void setgroups_m_stop(struct seq_file *seq, void *v) >> +{ >> +} >> + >> +static int setgroups_m_show(struct seq_file *seq, void *v) >> +{ >> + struct user_namespace *ns = seq->private; >> + >> + seq_printf(seq, "%u\n", atomic_read(&ns->setgroups_allowed)); >> + return 0; >> +} >> + >> +const struct seq_operations proc_setgroups_seq_operations = { >> + .start = setgroups_m_start, >> + .stop = setgroups_m_stop, >> + .next = setgroups_m_next, >> + .show = setgroups_m_show, >> +}; >> + >> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, >> + size_t count, loff_t *ppos) >> +{ >> + struct seq_file *seq = file->private_data; >> + struct user_namespace *ns = seq->private; >> + char kbuf[3]; >> + int setgroups_allowed; >> + ssize_t ret; >> + >> + ret = -EPERM; >> + if (!file_ns_capable(file, ns, CAP_SETGID)) >> + goto out; > > CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's > reconfiguring the namespace. Hmm. Maybe. It is an activity that is normally controlled by CAP_SETGID. Frankly I think the entire split up of the capability model is almost totally broken. But I think CAP_SETGID is a close approximation of the right thing here. >> + /* Only allow a very narrow range of strings to be written */ >> + ret = -EINVAL; >> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) >> + goto out; >> + >> + /* What was written? */ >> + ret = -EFAULT; >> + if (copy_from_user(kbuf, buf, count)) >> + goto out; >> + kbuf[count] = '\0'; >> + >> + /* What is being requested? */ >> + ret = -EINVAL; >> + if (kbuf[0] == '0') >> + setgroups_allowed = 0; >> + else if (kbuf[0] == '1') >> + setgroups_allowed = 1; >> + else >> + goto out; >> + >> + /* Allow a trailing newline */ >> + ret = -EINVAL; >> + if ((count == 2) && (kbuf[1] != '\n')) >> + goto out; >> + >> + >> + if (setgroups_allowed) { >> + ret = -EINVAL; >> + if (atomic_read(&ns->setgroups_allowed) == 0) >> + goto out; >> + } else { > > I would disallow this if gid_map has been written in the interest of > sanity. Not a chance. That is part of making this an independent knob. If there is another reason for disabling setgroups you can flip this knob even after mappings are established. >> + atomic_set(&ns->setgroups_allowed, 0); >> + /* sigh memory barriers! */ > > I don't think that any barriers are needed. If you ever observe > setgroups_allowed == 0, it will stay that way forever. Likely. In practice the code works today. But I need to review things closely to understand if there are barriers needed. But especially since it is a write once knob we can get away with a lot. >> + } >> + >> + /* Report a successful write */ >> + *ppos = count; >> + ret = count; >> +out: >> + return ret; >> +} >> + >> static void *userns_get(struct task_struct *task) >> { >> struct user_namespace *user_ns; > > --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 21:45 ` Eric W. Biederman @ 2014-12-02 22:17 ` Andy Lutomirski 2014-12-02 23:07 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 22:17 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: >>> >>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >> >> Can you rename this to something clearer, e.g. userns_setgroups_mode? > > I am not certain that is any clearer. That just reads as wordier. > > The userns bit is definitely confusing and wrong. Why should we need to > spell out the scope? Because it's a property of the process' userns, not a property of the process. userns_setgroups would be okay. (Arguably it should've been userns_uid_map, too, but at least uid_map sounds obviously namespace-related.) > >>> A value of 0 means the setgroups system call is disabled in the >>> current processes user namespace and can not be enabled in the >>> future in this user namespace. >>> >>> A value of 1 means the segtoups system call is enabled. >> >> Would it make more sense to put strings like "allow" and "deny" in >> here? That way, future extensions could add additional values. > > If the implementation of the write side isn't too bad. I would love > to see precedent elsewhere in the kernel. What I don't expect to do > is have any values except setgroups are enabled and setgroups are > disabled. current_clocksource? I think there are lots of things like that. > > I am fine with allowing for the possibility (that is just good > engineering) but I don't intend to seriously considering or > implementing other possibilities. I can imagine a mode where there's a fixed set of groups that are forced set but other groups can be added and dropped. It would be complicated to set up right, but someone might want it some day. > >>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c >>> index 21c91feeca2d..6d0ee1b089fb 100644 >>> --- a/arch/s390/kernel/compat_linux.c >>> +++ b/arch/s390/kernel/compat_linux.c >>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis >>> int retval; >>> >>> if (!gid_mapping_possible(user_ns) || >>> + !atomic_read(&user_ns->setgroups_allowed) || >>> !capable(CAP_SETGID)) >>> return -EPERM; >> >> This is now incomprehensible because of the gid_mapping_possible >> thing. If you renamed gid_mapping_possible to >> userns_setgroup_allowed, then this could be added to the >> implementation, and this would all make sense (not to mention avoiding >> duplicating this thing). >> >>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, >>> kuid_t uid = make_kuid(ns->parent, id); >>> if (uid_eq(uid, cred->euid)) >>> return true; >>> + } else if (cap_setid == CAP_SETGID) { >>> + kgid_t gid = make_kgid(ns->parent, id); >>> + if (!atomic_read(&ns->setgroups_allowed) && >>> + gid_eq(gid, cred->egid)) >>> + return true; >> >> I still don't see why egid is any better than fsgid here. > > Answered in my earlier response fsgid was a goof. > I can set any gid to my egid with my existing permissions. > Show me how I can do that with fsgid or fsuid and I will be happy to use > those. You can use your fsgid to make a setgid file, I think. But yes, point taken, although as mentioned in the other thread, I think it would be a lot clearer if it were a separate patch. >>> + >>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, >>> + size_t count, loff_t *ppos) >>> +{ >>> + struct seq_file *seq = file->private_data; >>> + struct user_namespace *ns = seq->private; >>> + char kbuf[3]; >>> + int setgroups_allowed; >>> + ssize_t ret; >>> + >>> + ret = -EPERM; >>> + if (!file_ns_capable(file, ns, CAP_SETGID)) >>> + goto out; >> >> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's >> reconfiguring the namespace. > > Hmm. Maybe. It is an activity that is normally controlled by > CAP_SETGID. > > Frankly I think the entire split up of the capability model is almost > totally broken. But I think CAP_SETGID is a close approximation of the > right thing here. I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for everything else that changes the overall behavior of a namespace. In any event, the only way it matters is for a non-ns owner in the parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that CAP_SETGID should not be usable to make unrelated process' syscalls fail. > >>> + /* Only allow a very narrow range of strings to be written */ >>> + ret = -EINVAL; >>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) >>> + goto out; >>> + >>> + /* What was written? */ >>> + ret = -EFAULT; >>> + if (copy_from_user(kbuf, buf, count)) >>> + goto out; >>> + kbuf[count] = '\0'; >>> + >>> + /* What is being requested? */ >>> + ret = -EINVAL; >>> + if (kbuf[0] == '0') >>> + setgroups_allowed = 0; >>> + else if (kbuf[0] == '1') >>> + setgroups_allowed = 1; >>> + else >>> + goto out; >>> + >>> + /* Allow a trailing newline */ >>> + ret = -EINVAL; >>> + if ((count == 2) && (kbuf[1] != '\n')) >>> + goto out; >>> + >>> + >>> + if (setgroups_allowed) { >>> + ret = -EINVAL; >>> + if (atomic_read(&ns->setgroups_allowed) == 0) >>> + goto out; >>> + } else { >> >> I would disallow this if gid_map has been written in the interest of >> sanity. > > Not a chance. That is part of making this an independent knob. If > there is another reason for disabling setgroups you can flip this > knob even after mappings are established. Then you really want CAP_SYS_ADMIN, I think. > >>> + atomic_set(&ns->setgroups_allowed, 0); >>> + /* sigh memory barriers! */ >> >> I don't think that any barriers are needed. If you ever observe >> setgroups_allowed == 0, it will stay that way forever. > > Likely. In practice the code works today. > > But I need to review things closely to understand if there are barriers > needed. But especially since it is a write once knob we can get away > with a lot. > Yeah. For long-term use, I kind of like the flags approach I added in the other patch. It makes it easy to add more flags in the future. In any event, I think the only barrier that's needed is when writing gid_map: atomic_read / test_bit to determine whether unpriv mappings are okay; smp_mb() or whatever the current _after_atomic thing is these days; write mapping; Although I'm not sure whether Linux supports any architectures that can violate causality in the way that barrier is there to prevent. --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 22:17 ` Andy Lutomirski @ 2014-12-02 23:07 ` Eric W. Biederman 2014-12-02 23:17 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 23:07 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman >>> <ebiederm@xmission.com> wrote: >>>> >>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >>> >>> Can you rename this to something clearer, e.g. userns_setgroups_mode? >> >> I am not certain that is any clearer. That just reads as wordier. >> >> The userns bit is definitely confusing and wrong. Why should we need to >> spell out the scope? > > Because it's a property of the process' userns, not a property of the process. > > userns_setgroups would be okay. (Arguably it should've been > userns_uid_map, too, but at least uid_map sounds obviously > namespace-related.) > >> >>>> A value of 0 means the setgroups system call is disabled in the >>>> current processes user namespace and can not be enabled in the >>>> future in this user namespace. >>>> >>>> A value of 1 means the segtoups system call is enabled. >>> >>> Would it make more sense to put strings like "allow" and "deny" in >>> here? That way, future extensions could add additional values. >> >> If the implementation of the write side isn't too bad. I would love >> to see precedent elsewhere in the kernel. What I don't expect to do >> is have any values except setgroups are enabled and setgroups are >> disabled. > > current_clocksource? I think there are lots of things like that. > >> >> I am fine with allowing for the possibility (that is just good >> engineering) but I don't intend to seriously considering or >> implementing other possibilities. > > I can imagine a mode where there's a fixed set of groups that are > forced set but other groups can be added and dropped. It would be > complicated to set up right, but someone might want it some day. Yeah. I am defintiely not designing for that. >>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c >>>> index 21c91feeca2d..6d0ee1b089fb 100644 >>>> --- a/arch/s390/kernel/compat_linux.c >>>> +++ b/arch/s390/kernel/compat_linux.c >>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis >>>> int retval; >>>> >>>> if (!gid_mapping_possible(user_ns) || >>>> + !atomic_read(&user_ns->setgroups_allowed) || >>>> !capable(CAP_SETGID)) >>>> return -EPERM; >>> >>> This is now incomprehensible because of the gid_mapping_possible >>> thing. If you renamed gid_mapping_possible to >>> userns_setgroup_allowed, then this could be added to the >>> implementation, and this would all make sense (not to mention avoiding >>> duplicating this thing). >>> >>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, >>>> kuid_t uid = make_kuid(ns->parent, id); >>>> if (uid_eq(uid, cred->euid)) >>>> return true; >>>> + } else if (cap_setid == CAP_SETGID) { >>>> + kgid_t gid = make_kgid(ns->parent, id); >>>> + if (!atomic_read(&ns->setgroups_allowed) && >>>> + gid_eq(gid, cred->egid)) >>>> + return true; >>> >>> I still don't see why egid is any better than fsgid here. >> >> Answered in my earlier response fsgid was a goof. >> I can set any gid to my egid with my existing permissions. >> Show me how I can do that with fsgid or fsuid and I will be happy to use >> those. > > You can use your fsgid to make a setgid file, I think. But yes, point > taken, although as mentioned in the other thread, I think it would be > a lot clearer if it were a separate patch. Agreed. >>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, >>>> + size_t count, loff_t *ppos) >>>> +{ >>>> + struct seq_file *seq = file->private_data; >>>> + struct user_namespace *ns = seq->private; >>>> + char kbuf[3]; >>>> + int setgroups_allowed; >>>> + ssize_t ret; >>>> + >>>> + ret = -EPERM; >>>> + if (!file_ns_capable(file, ns, CAP_SETGID)) >>>> + goto out; >>> >>> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's >>> reconfiguring the namespace. >> >> Hmm. Maybe. It is an activity that is normally controlled by >> CAP_SETGID. >> >> Frankly I think the entire split up of the capability model is almost >> totally broken. But I think CAP_SETGID is a close approximation of the >> right thing here. > > I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for > everything else that changes the overall behavior of a namespace. > > In any event, the only way it matters is for a non-ns owner in the > parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that > CAP_SETGID should not be usable to make unrelated process' syscalls > fail. That is a pretty decent argument. I will take a good stare at this issue as I refresh these patches and see how close to perfection I can make them. >>>> + /* Only allow a very narrow range of strings to be written */ >>>> + ret = -EINVAL; >>>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) >>>> + goto out; >>>> + >>>> + /* What was written? */ >>>> + ret = -EFAULT; >>>> + if (copy_from_user(kbuf, buf, count)) >>>> + goto out; >>>> + kbuf[count] = '\0'; >>>> + >>>> + /* What is being requested? */ >>>> + ret = -EINVAL; >>>> + if (kbuf[0] == '0') >>>> + setgroups_allowed = 0; >>>> + else if (kbuf[0] == '1') >>>> + setgroups_allowed = 1; >>>> + else >>>> + goto out; >>>> + >>>> + /* Allow a trailing newline */ >>>> + ret = -EINVAL; >>>> + if ((count == 2) && (kbuf[1] != '\n')) >>>> + goto out; >>>> + >>>> + >>>> + if (setgroups_allowed) { >>>> + ret = -EINVAL; >>>> + if (atomic_read(&ns->setgroups_allowed) == 0) >>>> + goto out; >>>> + } else { >>> >>> I would disallow this if gid_map has been written in the interest of >>> sanity. >> >> Not a chance. That is part of making this an independent knob. If >> there is another reason for disabling setgroups you can flip this >> knob even after mappings are established. > > Then you really want CAP_SYS_ADMIN, I think. > >> >>>> + atomic_set(&ns->setgroups_allowed, 0); >>>> + /* sigh memory barriers! */ >>> >>> I don't think that any barriers are needed. If you ever observe >>> setgroups_allowed == 0, it will stay that way forever. >> >> Likely. In practice the code works today. >> >> But I need to review things closely to understand if there are barriers >> needed. But especially since it is a write once knob we can get away >> with a lot. >> > > Yeah. > > For long-term use, I kind of like the flags approach I added in the > other patch. It makes it easy to add more flags in the future. I thought a dedicated word might be easier. I don't know that it makes much of a difference at this point. > In any event, I think the only barrier that's needed is when writing gid_map: > > atomic_read / test_bit to determine whether unpriv mappings are okay; > smp_mb() or whatever the current _after_atomic thing is these days; > write mapping; > > Although I'm not sure whether Linux supports any architectures that > can violate causality in the way that barrier is there to prevent. The DEC Alpha at least used to be able to violate causality in ways weird enough to confuse anyone, and the alpha still seems to be in the tree. What keyed me in was the part in atomic_ops.txt that says these things don't have barriers and you will probably need them, and I remember spin locks tend to take care of all those issues for you. Anyway thanks for your barrier pointer. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-02 23:07 ` Eric W. Biederman @ 2014-12-02 23:17 ` Andy Lutomirski 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 23:17 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 3:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Tue, Dec 2, 2014 at 1:45 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> Andy Lutomirski <luto@amacapital.net> writes: >>> >>>> On Tue, Dec 2, 2014 at 12:28 PM, Eric W. Biederman >>>> <ebiederm@xmission.com> wrote: >>>>> >>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >>>> >>>> Can you rename this to something clearer, e.g. userns_setgroups_mode? >>> >>> I am not certain that is any clearer. That just reads as wordier. >>> >>> The userns bit is definitely confusing and wrong. Why should we need to >>> spell out the scope? >> >> Because it's a property of the process' userns, not a property of the process. >> >> userns_setgroups would be okay. (Arguably it should've been >> userns_uid_map, too, but at least uid_map sounds obviously >> namespace-related.) >> >>> >>>>> A value of 0 means the setgroups system call is disabled in the >>>>> current processes user namespace and can not be enabled in the >>>>> future in this user namespace. >>>>> >>>>> A value of 1 means the segtoups system call is enabled. >>>> >>>> Would it make more sense to put strings like "allow" and "deny" in >>>> here? That way, future extensions could add additional values. >>> >>> If the implementation of the write side isn't too bad. I would love >>> to see precedent elsewhere in the kernel. What I don't expect to do >>> is have any values except setgroups are enabled and setgroups are >>> disabled. >> >> current_clocksource? I think there are lots of things like that. >> >>> >>> I am fine with allowing for the possibility (that is just good >>> engineering) but I don't intend to seriously considering or >>> implementing other possibilities. >> >> I can imagine a mode where there's a fixed set of groups that are >> forced set but other groups can be added and dropped. It would be >> complicated to set up right, but someone might want it some day. > > Yeah. I am defintiely not designing for that. > >>>>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c >>>>> index 21c91feeca2d..6d0ee1b089fb 100644 >>>>> --- a/arch/s390/kernel/compat_linux.c >>>>> +++ b/arch/s390/kernel/compat_linux.c >>>>> @@ -252,6 +252,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis >>>>> int retval; >>>>> >>>>> if (!gid_mapping_possible(user_ns) || >>>>> + !atomic_read(&user_ns->setgroups_allowed) || >>>>> !capable(CAP_SETGID)) >>>>> return -EPERM; >>>> >>>> This is now incomprehensible because of the gid_mapping_possible >>>> thing. If you renamed gid_mapping_possible to >>>> userns_setgroup_allowed, then this could be added to the >>>> implementation, and this would all make sense (not to mention avoiding >>>> duplicating this thing). >>>> >>>>> @@ -826,6 +827,11 @@ static bool new_idmap_permitted(const struct file *file, >>>>> kuid_t uid = make_kuid(ns->parent, id); >>>>> if (uid_eq(uid, cred->euid)) >>>>> return true; >>>>> + } else if (cap_setid == CAP_SETGID) { >>>>> + kgid_t gid = make_kgid(ns->parent, id); >>>>> + if (!atomic_read(&ns->setgroups_allowed) && >>>>> + gid_eq(gid, cred->egid)) >>>>> + return true; >>>> >>>> I still don't see why egid is any better than fsgid here. >>> >>> Answered in my earlier response fsgid was a goof. >>> I can set any gid to my egid with my existing permissions. >>> Show me how I can do that with fsgid or fsuid and I will be happy to use >>> those. >> >> You can use your fsgid to make a setgid file, I think. But yes, point >> taken, although as mentioned in the other thread, I think it would be >> a lot clearer if it were a separate patch. > > Agreed. > >>>>> +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, >>>>> + size_t count, loff_t *ppos) >>>>> +{ >>>>> + struct seq_file *seq = file->private_data; >>>>> + struct user_namespace *ns = seq->private; >>>>> + char kbuf[3]; >>>>> + int setgroups_allowed; >>>>> + ssize_t ret; >>>>> + >>>>> + ret = -EPERM; >>>>> + if (!file_ns_capable(file, ns, CAP_SETGID)) >>>>> + goto out; >>>> >>>> CAP_SYS_ADMIN? This isn't setting a gid in the namespace; it's >>>> reconfiguring the namespace. >>> >>> Hmm. Maybe. It is an activity that is normally controlled by >>> CAP_SETGID. >>> >>> Frankly I think the entire split up of the capability model is almost >>> totally broken. But I think CAP_SETGID is a close approximation of the >>> right thing here. >> >> I agree that the cap model is screwy. But we use CAP_SYS_ADMIN for >> everything else that changes the overall behavior of a namespace. >> >> In any event, the only way it matters is for a non-ns owner in the >> parent ns with CAP_SETGID set but not CAP_SYS_ADMIN. I'd argue that >> CAP_SETGID should not be usable to make unrelated process' syscalls >> fail. > > That is a pretty decent argument. I will take a good stare at this > issue as I refresh these patches and see how close to perfection I can > make them. > >>>>> + /* Only allow a very narrow range of strings to be written */ >>>>> + ret = -EINVAL; >>>>> + if ((*ppos != 0) || (count >= sizeof(kbuf)) || (count < 1)) >>>>> + goto out; >>>>> + >>>>> + /* What was written? */ >>>>> + ret = -EFAULT; >>>>> + if (copy_from_user(kbuf, buf, count)) >>>>> + goto out; >>>>> + kbuf[count] = '\0'; >>>>> + >>>>> + /* What is being requested? */ >>>>> + ret = -EINVAL; >>>>> + if (kbuf[0] == '0') >>>>> + setgroups_allowed = 0; >>>>> + else if (kbuf[0] == '1') >>>>> + setgroups_allowed = 1; >>>>> + else >>>>> + goto out; >>>>> + >>>>> + /* Allow a trailing newline */ >>>>> + ret = -EINVAL; >>>>> + if ((count == 2) && (kbuf[1] != '\n')) >>>>> + goto out; >>>>> + >>>>> + >>>>> + if (setgroups_allowed) { >>>>> + ret = -EINVAL; >>>>> + if (atomic_read(&ns->setgroups_allowed) == 0) >>>>> + goto out; >>>>> + } else { >>>> >>>> I would disallow this if gid_map has been written in the interest of >>>> sanity. >>> >>> Not a chance. That is part of making this an independent knob. If >>> there is another reason for disabling setgroups you can flip this >>> knob even after mappings are established. >> >> Then you really want CAP_SYS_ADMIN, I think. >> >>> >>>>> + atomic_set(&ns->setgroups_allowed, 0); >>>>> + /* sigh memory barriers! */ >>>> >>>> I don't think that any barriers are needed. If you ever observe >>>> setgroups_allowed == 0, it will stay that way forever. >>> >>> Likely. In practice the code works today. >>> >>> But I need to review things closely to understand if there are barriers >>> needed. But especially since it is a write once knob we can get away >>> with a lot. >>> >> >> Yeah. >> >> For long-term use, I kind of like the flags approach I added in the >> other patch. It makes it easy to add more flags in the future. > > I thought a dedicated word might be easier. I don't know that it makes > much of a difference at this point. > >> In any event, I think the only barrier that's needed is when writing gid_map: >> >> atomic_read / test_bit to determine whether unpriv mappings are okay; >> smp_mb() or whatever the current _after_atomic thing is these days; >> write mapping; >> >> Although I'm not sure whether Linux supports any architectures that >> can violate causality in the way that barrier is there to prevent. > > The DEC Alpha at least used to be able to violate causality in ways > weird enough to confuse anyone, and the alpha still seems to be in the > tree. What keyed me in was the part in atomic_ops.txt that says these > things don't have barriers and you will probably need them, and I > remember spin locks tend to take care of all those issues for you. > > Anyway thanks for your barrier pointer. > My pointer was a bit off. Barriers synchronize with barriers; the check for whether setgroups is okay may need a barrier as well: static inline bool may_setgroups(whatever) { if (no groups are mapped) return false; smp_rmb() (or _before_atomic, maybe -- I don't know whether that's okay for atomic_read); if (setgroups is disallowed by option) return false; return true; } It would be bad if there were a window in which we'd see a group mapped but not see that setgroups is disallowed. --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings. 2014-12-02 23:17 ` Andy Lutomirski @ 2014-12-08 22:06 ` Eric W. Biederman 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman ` (5 more replies) 0 siblings, 6 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:06 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index aa312b0dc3ec..b99c862a2e3f 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *new_map) { - /* Allow mapping to your own filesystem ids */ + /* Don't allow mappings that would allow anything that wouldn't + * be allowed without the establishment of unprivileged mappings. + */ if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman @ 2014-12-08 22:07 ` Eric W. Biederman 2014-12-08 22:11 ` Andy Lutomirski 2014-12-08 22:17 ` Richard Weinberger 2014-12-08 22:07 ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman ` (4 subsequent siblings) 5 siblings, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:07 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/user_namespace.h | 9 +++++++++ kernel/groups.c | 7 ++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index e95372654f09..41cc26e5a350 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -37,6 +37,15 @@ struct user_namespace { extern struct user_namespace init_user_ns; +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) +{ + bool established; + smp_mb__before_atomic(); + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; + smp_mb__after_atomic(); + return established; +} + #ifdef CONFIG_USER_NS static inline struct user_namespace *get_user_ns(struct user_namespace *ns) diff --git a/kernel/groups.c b/kernel/groups.c index 02d8a251c476..e0335e44f76a 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -6,6 +6,7 @@ #include <linux/slab.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/user_namespace.h> #include <asm/uaccess.h> /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -217,7 +218,11 @@ bool may_setgroups(void) { struct user_namespace *user_ns = current_user_ns(); - return ns_capable(user_ns, CAP_SETGID); + /* It is not safe to use setgroups until a gid mapping in + * the user namespace has been established. + */ + return userns_gid_mappings_established(user_ns) && + ns_capable(user_ns, CAP_SETGID); } /* -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman @ 2014-12-08 22:11 ` Andy Lutomirski [not found] ` <87h9x5ok0h.fsf@x220.int.ebiederm.org> 2014-12-08 22:17 ` Richard Weinberger 1 sibling, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:11 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > setgroups is unique in not needing a valid mapping before it can be called, > in the case of setgroups(0, NULL) which drops all supplemental groups. > > The design of the user namespace assumes that CAP_SETGID can not actually > be used until a gid mapping is established. Therefore add a helper function > to see if the user namespace gid mapping has been established and call > that function in the setgroups permission check. > > This is part of the fix for CVE-2014-8989, being able to drop groups > without privilege using user namespaces. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > include/linux/user_namespace.h | 9 +++++++++ > kernel/groups.c | 7 ++++++- > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index e95372654f09..41cc26e5a350 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -37,6 +37,15 @@ struct user_namespace { > > extern struct user_namespace init_user_ns; > > +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) > +{ > + bool established; > + smp_mb__before_atomic(); > + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; > + smp_mb__after_atomic(); > + return established; > +} I don't think this works on all platforms. ACCESS_ONCE is not atomic in the smp_mb__before_atomic sense. > + > #ifdef CONFIG_USER_NS > > static inline struct user_namespace *get_user_ns(struct user_namespace *ns) > diff --git a/kernel/groups.c b/kernel/groups.c > index 02d8a251c476..e0335e44f76a 100644 > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -6,6 +6,7 @@ > #include <linux/slab.h> > #include <linux/security.h> > #include <linux/syscalls.h> > +#include <linux/user_namespace.h> > #include <asm/uaccess.h> > > /* init to 2 - one for init_task, one to ensure it is never freed */ > @@ -217,7 +218,11 @@ bool may_setgroups(void) > { > struct user_namespace *user_ns = current_user_ns(); > > - return ns_capable(user_ns, CAP_SETGID); > + /* It is not safe to use setgroups until a gid mapping in > + * the user namespace has been established. > + */ > + return userns_gid_mappings_established(user_ns) && > + ns_capable(user_ns, CAP_SETGID); > } > > /* > -- > 1.9.1 > --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <87h9x5ok0h.fsf@x220.int.ebiederm.org>]
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished [not found] ` <87h9x5ok0h.fsf@x220.int.ebiederm.org> @ 2014-12-08 22:33 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:33 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> >>> setgroups is unique in not needing a valid mapping before it can be called, >>> in the case of setgroups(0, NULL) which drops all supplemental groups. >>> >>> The design of the user namespace assumes that CAP_SETGID can not actually >>> be used until a gid mapping is established. Therefore add a helper function >>> to see if the user namespace gid mapping has been established and call >>> that function in the setgroups permission check. >>> >>> This is part of the fix for CVE-2014-8989, being able to drop groups >>> without privilege using user namespaces. >>> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>> --- >>> include/linux/user_namespace.h | 9 +++++++++ >>> kernel/groups.c | 7 ++++++- >>> 2 files changed, 15 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h >>> index e95372654f09..41cc26e5a350 100644 >>> --- a/include/linux/user_namespace.h >>> +++ b/include/linux/user_namespace.h >>> @@ -37,6 +37,15 @@ struct user_namespace { >>> >>> extern struct user_namespace init_user_ns; >>> >>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) >>> +{ >>> + bool established; >>> + smp_mb__before_atomic(); >>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; >>> + smp_mb__after_atomic(); >>> + return established; >>> +} >> >> I don't think this works on all platforms. ACCESS_ONCE is not atomic >> in the smp_mb__before_atomic sense. > > Documentation/atomic_ops.txt documents ACCESS_ONCE as being equivalent > to atomic_read() and atomic_set(). smp_mb__before_atomic and > smp_mb__after_atomic() are Documented as working with atomic_read and > atomic_set. Maybe it is a stretch to use them but it doesn't seem like > much of a stretch. I don't fully understand the design there. I think this is an attempt to work around the fact that test_bit is fully atomic on x86 but not elsewhere. > > Further at this point I don't know that any barriers are strictly > needed, beyond the ACCESS_ONCE. However since x86 does all of the > ordering in hardware that I need I am not going to find any bugs that > don't require a barrier. > > All I really want is the same level of barriers I would get if I used a > spin-lock protected data structure so I don't need to worry about > crazy smp issues that happen when the hardware decides it is safe to > reorder things. Use smp_rmb(), I think. It'll be obviously correct, and the performance impact really doesn't matter. Also, on platforms where this stuff matters, the barrier in smp_mb__whatever will be a full fence, whereas smp_rmb may be lighter weight. --Andy > > Eric > > >>> + >>> #ifdef CONFIG_USER_NS >>> >>> static inline struct user_namespace *get_user_ns(struct user_namespace *ns) >>> diff --git a/kernel/groups.c b/kernel/groups.c >>> index 02d8a251c476..e0335e44f76a 100644 >>> --- a/kernel/groups.c >>> +++ b/kernel/groups.c >>> @@ -6,6 +6,7 @@ >>> #include <linux/slab.h> >>> #include <linux/security.h> >>> #include <linux/syscalls.h> >>> +#include <linux/user_namespace.h> >>> #include <asm/uaccess.h> >>> >>> /* init to 2 - one for init_task, one to ensure it is never freed */ >>> @@ -217,7 +218,11 @@ bool may_setgroups(void) >>> { >>> struct user_namespace *user_ns = current_user_ns(); >>> >>> - return ns_capable(user_ns, CAP_SETGID); >>> + /* It is not safe to use setgroups until a gid mapping in >>> + * the user namespace has been established. >>> + */ >>> + return userns_gid_mappings_established(user_ns) && >>> + ns_capable(user_ns, CAP_SETGID); >>> } >>> >>> /* >>> -- >>> 1.9.1 >>> >> >> --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman 2014-12-08 22:11 ` Andy Lutomirski @ 2014-12-08 22:17 ` Richard Weinberger 2014-12-08 22:25 ` Andy Lutomirski 1 sibling, 1 reply; 79+ messages in thread From: Richard Weinberger @ 2014-12-08 22:17 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable Am 08.12.2014 um 23:07 schrieb Eric W. Biederman: > > setgroups is unique in not needing a valid mapping before it can be called, > in the case of setgroups(0, NULL) which drops all supplemental groups. > > The design of the user namespace assumes that CAP_SETGID can not actually > be used until a gid mapping is established. Therefore add a helper function > to see if the user namespace gid mapping has been established and call > that function in the setgroups permission check. > > This is part of the fix for CVE-2014-8989, being able to drop groups > without privilege using user namespaces. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > include/linux/user_namespace.h | 9 +++++++++ > kernel/groups.c | 7 ++++++- > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index e95372654f09..41cc26e5a350 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -37,6 +37,15 @@ struct user_namespace { > > extern struct user_namespace init_user_ns; > > +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) > +{ > + bool established; > + smp_mb__before_atomic(); > + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; > + smp_mb__after_atomic(); > + return established; > +} > + Maybe this is a stupid question, but why do we need all this magic around established = ... ? The purpose of this code is to check whether ns->gid_map.nr_extents != 0 in a lock-free manner? Thanks, //richard ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-08 22:17 ` Richard Weinberger @ 2014-12-08 22:25 ` Andy Lutomirski 2014-12-08 22:27 ` Richard Weinberger 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:25 UTC (permalink / raw) To: Richard Weinberger Cc: Eric W. Biederman, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote: > Am 08.12.2014 um 23:07 schrieb Eric W. Biederman: >> >> setgroups is unique in not needing a valid mapping before it can be called, >> in the case of setgroups(0, NULL) which drops all supplemental groups. >> >> The design of the user namespace assumes that CAP_SETGID can not actually >> be used until a gid mapping is established. Therefore add a helper function >> to see if the user namespace gid mapping has been established and call >> that function in the setgroups permission check. >> >> This is part of the fix for CVE-2014-8989, being able to drop groups >> without privilege using user namespaces. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> include/linux/user_namespace.h | 9 +++++++++ >> kernel/groups.c | 7 ++++++- >> 2 files changed, 15 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h >> index e95372654f09..41cc26e5a350 100644 >> --- a/include/linux/user_namespace.h >> +++ b/include/linux/user_namespace.h >> @@ -37,6 +37,15 @@ struct user_namespace { >> >> extern struct user_namespace init_user_ns; >> >> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) >> +{ >> + bool established; >> + smp_mb__before_atomic(); >> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; >> + smp_mb__after_atomic(); >> + return established; >> +} >> + > > Maybe this is a stupid question, but why do we need all this magic > around established = ... ? > The purpose of this code is to check whether ns->gid_map.nr_extents != 0 > in a lock-free manner? > See my other comment -- the ordering will matter at the end of the series. It might be nicer to do this differently: in may_setgroups, do: if (!userns_gid_mappings_established) return false; /* User code can start with setgroups allowed, disallow it, and then add a mapping. We need to prevent a race that could cause this function to return true. */ smp_rmb(); if (!userns_setgroups_allowed) return false; --Andy > Thanks, > //richard -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-08 22:25 ` Andy Lutomirski @ 2014-12-08 22:27 ` Richard Weinberger [not found] ` <874mt5ojfh.fsf@x220.int.ebiederm.org> 0 siblings, 1 reply; 79+ messages in thread From: Richard Weinberger @ 2014-12-08 22:27 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable Am 08.12.2014 um 23:25 schrieb Andy Lutomirski: > On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote: >> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman: >>> >>> setgroups is unique in not needing a valid mapping before it can be called, >>> in the case of setgroups(0, NULL) which drops all supplemental groups. >>> >>> The design of the user namespace assumes that CAP_SETGID can not actually >>> be used until a gid mapping is established. Therefore add a helper function >>> to see if the user namespace gid mapping has been established and call >>> that function in the setgroups permission check. >>> >>> This is part of the fix for CVE-2014-8989, being able to drop groups >>> without privilege using user namespaces. >>> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>> --- >>> include/linux/user_namespace.h | 9 +++++++++ >>> kernel/groups.c | 7 ++++++- >>> 2 files changed, 15 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h >>> index e95372654f09..41cc26e5a350 100644 >>> --- a/include/linux/user_namespace.h >>> +++ b/include/linux/user_namespace.h >>> @@ -37,6 +37,15 @@ struct user_namespace { >>> >>> extern struct user_namespace init_user_ns; >>> >>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) >>> +{ >>> + bool established; >>> + smp_mb__before_atomic(); >>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; >>> + smp_mb__after_atomic(); >>> + return established; >>> +} >>> + >> >> Maybe this is a stupid question, but why do we need all this magic >> around established = ... ? >> The purpose of this code is to check whether ns->gid_map.nr_extents != 0 >> in a lock-free manner? >> > > See my other comment -- the ordering will matter at the end of the series. But ns->gid_map.nr_extents is not atomic, it is a plain u32. This confuses me. Thanks, //richard ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <874mt5ojfh.fsf@x220.int.ebiederm.org>]
* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished [not found] ` <874mt5ojfh.fsf@x220.int.ebiederm.org> @ 2014-12-08 22:47 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:47 UTC (permalink / raw) To: Eric W. Biederman Cc: Richard Weinberger, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:39 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Richard Weinberger <richard@nod.at> writes: > >> Am 08.12.2014 um 23:25 schrieb Andy Lutomirski: >>> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard@nod.at> wrote: >>>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman: >>>>> >>>>> setgroups is unique in not needing a valid mapping before it can be called, >>>>> in the case of setgroups(0, NULL) which drops all supplemental groups. >>>>> >>>>> The design of the user namespace assumes that CAP_SETGID can not actually >>>>> be used until a gid mapping is established. Therefore add a helper function >>>>> to see if the user namespace gid mapping has been established and call >>>>> that function in the setgroups permission check. >>>>> >>>>> This is part of the fix for CVE-2014-8989, being able to drop groups >>>>> without privilege using user namespaces. >>>>> >>>>> Cc: stable@vger.kernel.org >>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>>>> --- >>>>> include/linux/user_namespace.h | 9 +++++++++ >>>>> kernel/groups.c | 7 ++++++- >>>>> 2 files changed, 15 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h >>>>> index e95372654f09..41cc26e5a350 100644 >>>>> --- a/include/linux/user_namespace.h >>>>> +++ b/include/linux/user_namespace.h >>>>> @@ -37,6 +37,15 @@ struct user_namespace { >>>>> >>>>> extern struct user_namespace init_user_ns; >>>>> >>>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns) >>>>> +{ >>>>> + bool established; >>>>> + smp_mb__before_atomic(); >>>>> + established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0; >>>>> + smp_mb__after_atomic(); >>>>> + return established; >>>>> +} >>>>> + >>>> >>>> Maybe this is a stupid question, but why do we need all this magic >>>> around established = ... ? >>>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0 >>>> in a lock-free manner? >>>> >>> >>> See my other comment -- the ordering will matter at the end of the series. >> >> But ns->gid_map.nr_extents is not atomic, it is a plain u32. >> This confuses me. > > Read Documentation/atomic_ops.txt a plain u32 is atomic by definiton. > I still don't understand why the helper changed to smp_mb__before_atomic. > Which is a little bit convoluted. However that is part of the of the > gid mapping path and I optimized that as far as I humanly could so that > calls like stat don't take a noticable slow donw. > > On this path we don't particularly care except that I am using an the > existing data structure. As an example, arm64 defines both smp_mb__before_atomic and smp_mb__after_atomic as smp_mb(), which is heavier then smp_rmb(), and there are two of them. So I still like the explicit smp_rmb() better. --Andy > > Eric > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman @ 2014-12-08 22:07 ` Eric W. Biederman 2014-12-08 22:08 ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman ` (3 subsequent siblings) 5 siblings, 0 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:07 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index b99c862a2e3f..8e7c87162171 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file, kuid_t uid = make_kuid(ns->parent, id); if (uid_eq(uid, file->f_cred->fsuid)) return true; - } else if (cap_setid == CAP_SETGID) { - kgid_t gid = make_kgid(ns->parent, id); - if (gid_eq(gid, file->f_cred->fsgid)) - return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman 2014-12-08 22:07 ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman @ 2014-12-08 22:08 ` Eric W. Biederman 2014-12-08 22:12 ` Andy Lutomirski 2014-12-08 22:10 ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman ` (2 subsequent siblings) 5 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:08 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 8e7c87162171..da1eeb927b21 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file, u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { kuid_t uid = make_kuid(ns->parent, id); - if (uid_eq(uid, file->f_cred->fsuid)) + if (uid_eq(uid, file->f_cred->euid)) return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping 2014-12-08 22:08 ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman @ 2014-12-08 22:12 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:12 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:08 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > setresuid allows the euid to be set to any of uid, euid, suid, and > fsuid. Therefor it is safe to allow an unprivileged user to map their > euid and use CAP_SETUID privileged with exactly that uid, as no new > credentials can be obtained. > > I can not find a combination of existing system calls that allows > setting uid, euid, suid, and fsuid from the fsuid making the previous > use of fsuid for allowing unprivileged mappings a bug. Right. > > This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > kernel/user_namespace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 8e7c87162171..da1eeb927b21 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file, > u32 id = new_map->extent[0].lower_first; > if (cap_setid == CAP_SETUID) { > kuid_t uid = make_kuid(ns->parent, id); > - if (uid_eq(uid, file->f_cred->fsuid)) > + if (uid_eq(uid, file->f_cred->euid)) > return true; > } > } > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (2 preceding siblings ...) 2014-12-08 22:08 ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman @ 2014-12-08 22:10 ` Eric W. Biederman 2014-12-08 22:15 ` Andy Lutomirski 2014-12-08 22:11 ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-08 22:14 ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman 5 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:10 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace reduces the set of credentials that must be verified can be obtained without privielge, making code verification simpler. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index da1eeb927b21..413f60fd5983 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *new_map) { + const struct cred *cred = file->f_cred; /* Don't allow mappings that would allow anything that wouldn't * be allowed without the establishment of unprivileged mappings. */ - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && + uid_eq(ns->owner, cred->euid)) { u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { kuid_t uid = make_kuid(ns->parent, id); - if (uid_eq(uid, file->f_cred->euid)) + if (uid_eq(uid, cred->euid)) return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings 2014-12-08 22:10 ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman @ 2014-12-08 22:15 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:15 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:10 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > If you did not create the user namespace and are allowed > to write to uid_map or gid_map you should already have the necessary > privilege in the parent user namespace to establish any mapping > you want so this will not affect userspace in practice. > > Limiting unprivileged uid mapping establishment to the creator of the > user namespace reduces the set of credentials that must be verified > can be obtained without privielge, making code verification simpler. > s/privielge/privilege/ But I still can't parse that sentence. The code itself is: Reviewed-by: Andy Lutomirski <luto@amacapital.net> > Limiting unprivileged gid mapping establishment (which is temporarily > absent) to the creator of the user namespace also ensures that the > combination of uid and gid can already be obtained without privilege. > > This is part of the fix for CVE-2014-8989. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > kernel/user_namespace.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index da1eeb927b21..413f60fd5983 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file, > struct user_namespace *ns, int cap_setid, > struct uid_gid_map *new_map) > { > + const struct cred *cred = file->f_cred; > /* Don't allow mappings that would allow anything that wouldn't > * be allowed without the establishment of unprivileged mappings. > */ > - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { > + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && > + uid_eq(ns->owner, cred->euid)) { > u32 id = new_map->extent[0].lower_first; > if (cap_setid == CAP_SETUID) { > kuid_t uid = make_kuid(ns->parent, id); > - if (uid_eq(uid, file->f_cred->euid)) > + if (uid_eq(uid, cred->euid)) > return true; > } > } > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (3 preceding siblings ...) 2014-12-08 22:10 ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman @ 2014-12-08 22:11 ` Eric W. Biederman 2014-12-08 22:21 ` Andy Lutomirski 2014-12-08 22:14 ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman 5 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:11 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of 0 means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of 1 means the segtoups system call is enabled. - Descedent user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not pass in a struct file so file_ns_capable is unusable. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/proc/base.c | 31 ++++++++++---- include/linux/user_namespace.h | 25 +++++++++++ kernel/groups.c | 1 + kernel/user.c | 1 + kernel/user_namespace.c | 97 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 147 insertions(+), 8 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 772efa45a452..4ebed9f01d97 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns, #endif /* CONFIG_TASK_IO_ACCOUNTING */ #ifdef CONFIG_USER_NS -static int proc_id_map_open(struct inode *inode, struct file *file, +static int proc_userns_open(struct inode *inode, struct file *file, const struct seq_operations *seq_ops) { struct user_namespace *ns = NULL; @@ -2418,7 +2418,7 @@ err: return ret; } -static int proc_id_map_release(struct inode *inode, struct file *file) +static int proc_userns_release(struct inode *inode, struct file *file) { struct seq_file *seq = file->private_data; struct user_namespace *ns = seq->private; @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file) static int proc_uid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_uid_seq_operations); + return proc_userns_open(inode, file, &proc_uid_seq_operations); } static int proc_gid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_gid_seq_operations); + return proc_userns_open(inode, file, &proc_gid_seq_operations); } static int proc_projid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_projid_seq_operations); + return proc_userns_open(inode, file, &proc_projid_seq_operations); } static const struct file_operations proc_uid_map_operations = { @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = { .write = proc_uid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_gid_map_operations = { @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = { .write = proc_gid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_projid_map_operations = { @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = { .write = proc_projid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, +}; + +static int proc_setgroups_open(struct inode *inode, struct file *file) +{ + return proc_userns_open(inode, file, &proc_setgroups_seq_operations); +} + +static const struct file_operations proc_setgroups_operations = { + .open = proc_setgroups_open, + .write = proc_setgroups_write, + .read = seq_read, + .llseek = seq_lseek, + .release = proc_userns_release, }; #endif /* CONFIG_USER_NS */ @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif #ifdef CONFIG_CHECKPOINT_RESTORE REG("timers", S_IRUGO, proc_timers_operations), @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif }; diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 41cc26e5a350..6451c401dcf6 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -17,6 +17,12 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ } extent[UID_GID_MAP_MAX_EXTENTS]; }; +enum user_namespace_flags { + USERNS_SETGROUPS_ALLOWED, +}; + +#define USERNS_INIT_FLAGS BIT(USERNS_SETGROUPS_ALLOWED) + struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; @@ -27,6 +33,7 @@ struct user_namespace { kuid_t owner; kgid_t group; unsigned int proc_inum; + unsigned long flags; /* Register of per-UID persistent keyrings for this namespace */ #ifdef CONFIG_PERSISTENT_KEYRINGS @@ -46,6 +53,22 @@ static inline bool userns_gid_mappings_established(const struct user_namespace * return established; } +static inline bool userns_setgroups_allowed(const struct user_namespace *ns) +{ + bool allowed; + smp_mb__before_atomic(); + allowed = test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags); + smp_mb__after_atomic(); + return allowed; +} + +static inline void userns_disable_setgroups(struct user_namespace *ns) +{ + smp_mb__before_atomic(); + clear_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags); + smp_mb__after_atomic(); +} + #ifdef CONFIG_USER_NS static inline struct user_namespace *get_user_ns(struct user_namespace *ns) @@ -69,9 +92,11 @@ struct seq_operations; extern const struct seq_operations proc_uid_seq_operations; extern const struct seq_operations proc_gid_seq_operations; extern const struct seq_operations proc_projid_seq_operations; +extern const struct seq_operations proc_setgroups_seq_operations; extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) diff --git a/kernel/groups.c b/kernel/groups.c index e0335e44f76a..2f136fda7c4d 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -222,6 +222,7 @@ bool may_setgroups(void) * the user namespace has been established. */ return userns_gid_mappings_established(user_ns) && + userns_setgroups_allowed(user_ns) && ns_capable(user_ns, CAP_SETGID); } diff --git a/kernel/user.c b/kernel/user.c index 4efa39350e44..2d09940c9632 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, + .flags = USERNS_INIT_FLAGS, #ifdef CONFIG_PERSISTENT_KEYRINGS .persistent_keyring_register_sem = __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 413f60fd5983..3d128f91ced3 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -98,6 +98,11 @@ int create_user_ns(struct cred *new) ns->level = parent_ns->level + 1; ns->owner = owner; ns->group = group; + ns->flags = USERNS_INIT_FLAGS; + + /* Copy USERNS_SETGROUPS_ALLOWED from the parent user namespace */ + if (!userns_setgroups_allowed(parent_ns)) + userns_disable_setgroups(ns); set_cred_user_ns(new, ns); @@ -841,6 +846,98 @@ static bool new_idmap_permitted(const struct file *file, return false; } +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) +{ + struct user_namespace *ns = seq->private; + + return (*ppos == 0) ? ns : NULL; +} + +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) +{ + ++*ppos; + return NULL; +} + +static void setgroups_m_stop(struct seq_file *seq, void *v) +{ +} + +static int setgroups_m_show(struct seq_file *seq, void *v) +{ + struct user_namespace *ns = seq->private; + + seq_printf(seq, "%s\n", + test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ? + "allow" : "deny"); + return 0; +} + +const struct seq_operations proc_setgroups_seq_operations = { + .start = setgroups_m_start, + .stop = setgroups_m_stop, + .next = setgroups_m_next, + .show = setgroups_m_show, +}; + +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct seq_file *seq = file->private_data; + struct user_namespace *ns = seq->private; + char kbuf[8], *pos; + bool setgroups_allowed; + ssize_t ret; + + ret = -EACCES; + if (!file_ns_capable(file, ns, CAP_SYS_ADMIN)) + goto out; + + /* Only allow a very narrow range of strings to be written */ + ret = -EINVAL; + if ((*ppos != 0) || (count >= sizeof(kbuf))) + goto out; + + /* What was written? */ + ret = -EFAULT; + if (copy_from_user(kbuf, buf, count)) + goto out; + kbuf[count] = '\0'; + pos = kbuf; + + /* What is being requested? */ + ret = -EINVAL; + if (strncmp(pos, "allow", 5) == 0) { + pos += 5; + setgroups_allowed = true; + } + else if (strncmp(pos, "deny", 4) == 0) { + pos += 4; + setgroups_allowed = false; + } + else + goto out; + + /* Verify there is not trailing junk on the line */ + pos = skip_spaces(pos); + if (*pos != '\0') + goto out; + + if (setgroups_allowed) { + ret = -EPERM; + if (!userns_setgroups_allowed(ns)) + goto out; + } else { + userns_disable_setgroups(ns); + } + + /* Report a successful write */ + *ppos = count; + ret = count; +out: + return ret; +} + static void *userns_get(struct task_struct *task) { struct user_namespace *user_ns; -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 22:11 ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-08 22:21 ` Andy Lutomirski 2014-12-08 22:44 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:21 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > - Expose the knob to user space through a proc file /proc/<pid>/setgroups > > A value of 0 means the setgroups system call is disabled in the "deny" > current processes user namespace and can not be enabled in the > future in this user namespace. > > A value of 1 means the segtoups system call is enabled. > "allow" > - Descedent user namespaces inherit the value of setgroups from s/Descedent/Descendent/ > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -222,6 +222,7 @@ bool may_setgroups(void) > * the user namespace has been established. > */ > return userns_gid_mappings_established(user_ns) && > + userns_setgroups_allowed(user_ns) && > ns_capable(user_ns, CAP_SETGID); > } Can you add a comment explaining the ordering? For example: We need to check for a gid mapping before checking setgroups_allowed because an unprivileged user can create a userns with setgroups allowed, then disallow setgroups and add a mapping. If we check in the opposite order, then we have a race: we could see that setgroups is allowed before the user clears the bit and then see that there is a gid mapping after the other thread is done. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 22:21 ` Andy Lutomirski @ 2014-12-08 22:44 ` Eric W. Biederman 2014-12-08 22:48 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:44 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >> >> A value of 0 means the setgroups system call is disabled in the > > "deny" > >> current processes user namespace and can not be enabled in the >> future in this user namespace. >> >> A value of 1 means the segtoups system call is enabled. >> > > "allow" > >> - Descedent user namespaces inherit the value of setgroups from > > s/Descedent/Descendent/ Bah. I updated everything but the changelog comment. >> --- a/kernel/groups.c >> +++ b/kernel/groups.c >> @@ -222,6 +222,7 @@ bool may_setgroups(void) >> * the user namespace has been established. >> */ >> return userns_gid_mappings_established(user_ns) && >> + userns_setgroups_allowed(user_ns) && >> ns_capable(user_ns, CAP_SETGID); >> } > > Can you add a comment explaining the ordering? For example: I need to think on what I can say to make it clear. Perhaps: /* Careful the order of these checks is important. */ > We need to check for a gid mapping before checking setgroups_allowed > because an unprivileged user can create a userns with setgroups > allowed, then disallow setgroups and add a mapping. If we check in > the opposite order, then we have a race: we could see that setgroups > is allowed before the user clears the bit and then see that there is a > gid mapping after the other thread is done. Since these are independent atomic variables yes that ordering issue seems to be the case. For me it was the natural ordering of the checks so I didn't even bother to think about what happens when you reorder them. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 22:44 ` Eric W. Biederman @ 2014-12-08 22:48 ` Andy Lutomirski 2014-12-08 23:30 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> >>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >>> >>> A value of 0 means the setgroups system call is disabled in the >> >> "deny" >> >>> current processes user namespace and can not be enabled in the >>> future in this user namespace. >>> >>> A value of 1 means the segtoups system call is enabled. >>> >> >> "allow" >> >>> - Descedent user namespaces inherit the value of setgroups from >> >> s/Descedent/Descendent/ > > Bah. I updated everything but the changelog comment. > >>> --- a/kernel/groups.c >>> +++ b/kernel/groups.c >>> @@ -222,6 +222,7 @@ bool may_setgroups(void) >>> * the user namespace has been established. >>> */ >>> return userns_gid_mappings_established(user_ns) && >>> + userns_setgroups_allowed(user_ns) && >>> ns_capable(user_ns, CAP_SETGID); >>> } >> >> Can you add a comment explaining the ordering? For example: > > I need to think on what I can say to make it clear. > Perhaps: /* Careful the order of these checks is important. */ > >> We need to check for a gid mapping before checking setgroups_allowed >> because an unprivileged user can create a userns with setgroups >> allowed, then disallow setgroups and add a mapping. If we check in >> the opposite order, then we have a race: we could see that setgroups >> is allowed before the user clears the bit and then see that there is a >> gid mapping after the other thread is done. > This text was actually my suggested comment text. If you put smp_rmb() in this function with a comment like that, then I think it will all make sense and be obviously correct (even with most of the other barriers removed). --Andy > Since these are independent atomic variables yes that ordering issue > seems to be the case. > > For me it was the natural ordering of the checks so I didn't even bother > to think about what happens when you reorder them. > > Eric -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 22:48 ` Andy Lutomirski @ 2014-12-08 23:30 ` Eric W. Biederman 2014-12-09 19:31 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 23:30 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>>> >>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups >>>> >>>> A value of 0 means the setgroups system call is disabled in the >>> >>> "deny" >>> >>>> current processes user namespace and can not be enabled in the >>>> future in this user namespace. >>>> >>>> A value of 1 means the segtoups system call is enabled. >>>> >>> >>> "allow" >>> >>>> - Descedent user namespaces inherit the value of setgroups from >>> >>> s/Descedent/Descendent/ >> >> Bah. I updated everything but the changelog comment. >> >>>> --- a/kernel/groups.c >>>> +++ b/kernel/groups.c >>>> @@ -222,6 +222,7 @@ bool may_setgroups(void) >>>> * the user namespace has been established. >>>> */ >>>> return userns_gid_mappings_established(user_ns) && >>>> + userns_setgroups_allowed(user_ns) && >>>> ns_capable(user_ns, CAP_SETGID); >>>> } >>> >>> Can you add a comment explaining the ordering? For example: >> >> I need to think on what I can say to make it clear. >> Perhaps: /* Careful the order of these checks is important. */ >> >>> We need to check for a gid mapping before checking setgroups_allowed >>> because an unprivileged user can create a userns with setgroups >>> allowed, then disallow setgroups and add a mapping. If we check in >>> the opposite order, then we have a race: we could see that setgroups >>> is allowed before the user clears the bit and then see that there is a >>> gid mapping after the other thread is done. >> > > This text was actually my suggested comment text. Now I see. > If you put smp_rmb() in this function with a comment like that, then I > think it will all make sense and be obviously correct (even with most > of the other barriers removed). Right. Given that we have to be careful when using these things anyway what I was hoping to achieve with the barriers appears impossible, and confusing so I will see about just adding barriers where we need them for real. Sigh. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-08 23:30 ` Eric W. Biederman @ 2014-12-09 19:31 ` Eric W. Biederman 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 19:31 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable ebiederm@xmission.com (Eric W. Biederman) writes: > Andy Lutomirski <luto@amacapital.net> writes: > >> >> This text was actually my suggested comment text. > > Now I see. > >> If you put smp_rmb() in this function with a comment like that, then I >> think it will all make sense and be obviously correct (even with most >> of the other barriers removed). > > Right. > > Given that we have to be careful when using these things anyway what > I was hoping to achieve with the barriers appears impossible, and > confusing so I will see about just adding barriers where we need them > for real. Sigh. Doh. The code has been entirely too clever. There are no need for atomics or other cleverness, I just need to generalize id_map_mutex. I knew that had to be a trivially correct way of handling this mess. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings. 2014-12-09 19:31 ` Eric W. Biederman @ 2014-12-09 20:36 ` Eric W. Biederman 2014-12-09 20:38 ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman ` (7 more replies) 0 siblings, 8 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:36 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index aa312b0dc3ec..b99c862a2e3f 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *new_map) { - /* Allow mapping to your own filesystem ids */ + /* Don't allow mappings that would allow anything that wouldn't + * be allowed without the establishment of unprivileged mappings. + */ if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman @ 2014-12-09 20:38 ` Eric W. Biederman 2014-12-09 22:49 ` Andy Lutomirski 2014-12-09 20:39 ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman ` (6 subsequent siblings) 7 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:38 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/user_namespace.h | 5 +++++ kernel/groups.c | 4 +++- kernel/user_namespace.c | 14 ++++++++++++++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index e95372654f09..8d493083486a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations; extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); +extern bool userns_may_setgroups(const struct user_namespace *ns); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) @@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns) { } +static inline bool userns_may_setgroups(const struct user_namespace *ns) +{ + return true; +} #endif #endif /* _LINUX_USER_H */ diff --git a/kernel/groups.c b/kernel/groups.c index 02d8a251c476..664411f171b5 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -6,6 +6,7 @@ #include <linux/slab.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/user_namespace.h> #include <asm/uaccess.h> /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -217,7 +218,8 @@ bool may_setgroups(void) { struct user_namespace *user_ns = current_user_ns(); - return ns_capable(user_ns, CAP_SETGID); + return ns_capable(user_ns, CAP_SETGID) && + userns_may_setgroups(user_ns); } /* diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index b99c862a2e3f..27c8dab48c07 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file, return false; } +bool userns_may_setgroups(const struct user_namespace *ns) +{ + bool allowed; + + mutex_lock(&id_map_mutex); + /* It is not safe to use setgroups until a gid mapping in + * the user namespace has been established. + */ + allowed = ns->gid_map.nr_extents != 0; + mutex_unlock(&id_map_mutex); + + return allowed; +} + static void *userns_get(struct task_struct *task) { struct user_namespace *user_ns; -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished 2014-12-09 20:38 ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman @ 2014-12-09 22:49 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-09 22:49 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 9, 2014 at 12:38 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > setgroups is unique in not needing a valid mapping before it can be called, > in the case of setgroups(0, NULL) which drops all supplemental groups. > > The design of the user namespace assumes that CAP_SETGID can not actually > be used until a gid mapping is established. Therefore add a helper function > to see if the user namespace gid mapping has been established and call > that function in the setgroups permission check. > > This is part of the fix for CVE-2014-8989, being able to drop groups > without privilege using user namespaces. Reviewed-by: Andy Lutomirski <luto@amacapital.net> > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > include/linux/user_namespace.h | 5 +++++ > kernel/groups.c | 4 +++- > kernel/user_namespace.c | 14 ++++++++++++++ > 3 files changed, 22 insertions(+), 1 deletion(-) > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index e95372654f09..8d493083486a 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations; > extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); > extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); > extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); > +extern bool userns_may_setgroups(const struct user_namespace *ns); > #else > > static inline struct user_namespace *get_user_ns(struct user_namespace *ns) > @@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns) > { > } > > +static inline bool userns_may_setgroups(const struct user_namespace *ns) > +{ > + return true; > +} > #endif > > #endif /* _LINUX_USER_H */ > diff --git a/kernel/groups.c b/kernel/groups.c > index 02d8a251c476..664411f171b5 100644 > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -6,6 +6,7 @@ > #include <linux/slab.h> > #include <linux/security.h> > #include <linux/syscalls.h> > +#include <linux/user_namespace.h> > #include <asm/uaccess.h> > > /* init to 2 - one for init_task, one to ensure it is never freed */ > @@ -217,7 +218,8 @@ bool may_setgroups(void) > { > struct user_namespace *user_ns = current_user_ns(); > > - return ns_capable(user_ns, CAP_SETGID); > + return ns_capable(user_ns, CAP_SETGID) && > + userns_may_setgroups(user_ns); > } > > /* > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index b99c862a2e3f..27c8dab48c07 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file, > return false; > } > > +bool userns_may_setgroups(const struct user_namespace *ns) > +{ > + bool allowed; > + > + mutex_lock(&id_map_mutex); > + /* It is not safe to use setgroups until a gid mapping in > + * the user namespace has been established. > + */ > + allowed = ns->gid_map.nr_extents != 0; > + mutex_unlock(&id_map_mutex); > + > + return allowed; > +} > + > static void *userns_get(struct task_struct *task) > { > struct user_namespace *user_ns; > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-09 20:38 ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman @ 2014-12-09 20:39 ` Eric W. Biederman 2014-12-09 23:00 ` Andy Lutomirski 2014-12-09 20:39 ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman ` (5 subsequent siblings) 7 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:39 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 27c8dab48c07..1ce6d67c07b7 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file, kuid_t uid = make_kuid(ns->parent, id); if (uid_eq(uid, file->f_cred->fsuid)) return true; - } else if (cap_setid == CAP_SETGID) { - kgid_t gid = make_kgid(ns->parent, id); - if (gid_eq(gid, file->f_cred->fsgid)) - return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings 2014-12-09 20:39 ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman @ 2014-12-09 23:00 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-09 23:00 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 9, 2014 at 12:39 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > As any gid mapping will allow and must allow for backwards > compatibility dropping groups don't allow any gid mappings to be > established without CAP_SETGID in the parent user namespace. > > For a small class of applications this change breaks userspace > and removes useful functionality. This small class of applications > includes tools/testing/selftests/mount/unprivilged-remount-test.c > > Most of the removed functionality will be added back with the addition > of a one way knob to disable setgroups. Once setgroups is disabled > setting the gid_map becomes as safe as setting the uid_map. > > For more common applications that set the uid_map and the gid_map > with privilege this change will have no affect. > Reviewed-by: Andy Lutomirski <luto@amacapital.net> > This is part of a fix for CVE-2014-8989. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > kernel/user_namespace.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 27c8dab48c07..1ce6d67c07b7 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file, > kuid_t uid = make_kuid(ns->parent, id); > if (uid_eq(uid, file->f_cred->fsuid)) > return true; > - } else if (cap_setid == CAP_SETGID) { > - kgid_t gid = make_kgid(ns->parent, id); > - if (gid_eq(gid, file->f_cred->fsgid)) > - return true; > } > } > > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-09 20:38 ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman 2014-12-09 20:39 ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman @ 2014-12-09 20:39 ` Eric W. Biederman 2014-12-09 20:41 ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman ` (4 subsequent siblings) 7 siblings, 0 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:39 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 1ce6d67c07b7..9451b12a9b6c 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file, u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { kuid_t uid = make_kuid(ns->parent, id); - if (uid_eq(uid, file->f_cred->fsuid)) + if (uid_eq(uid, file->f_cred->euid)) return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (2 preceding siblings ...) 2014-12-09 20:39 ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman @ 2014-12-09 20:41 ` Eric W. Biederman 2014-12-09 20:41 ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman ` (3 subsequent siblings) 7 siblings, 0 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:41 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 9451b12a9b6c..1e34de2fbd60 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *new_map) { + const struct cred *cred = file->f_cred; /* Don't allow mappings that would allow anything that wouldn't * be allowed without the establishment of unprivileged mappings. */ - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && + uid_eq(ns->owner, cred->euid)) { u32 id = new_map->extent[0].lower_first; if (cap_setid == CAP_SETUID) { kuid_t uid = make_kuid(ns->parent, id); - if (uid_eq(uid, file->f_cred->euid)) + if (uid_eq(uid, cred->euid)) return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (3 preceding siblings ...) 2014-12-09 20:41 ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman @ 2014-12-09 20:41 ` Eric W. Biederman 2014-12-09 22:49 ` Andy Lutomirski 2014-12-09 20:42 ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman ` (2 subsequent siblings) 7 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:41 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Generalize id_map_mutex so it can be used for more state of a user namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 1e34de2fbd60..44a555ac6104 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -24,6 +24,7 @@ #include <linux/fs_struct.h> static struct kmem_cache *user_ns_cachep __read_mostly; +static DEFINE_MUTEX(userns_state_mutex); static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, @@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map, return false; } - -static DEFINE_MUTEX(id_map_mutex); - static ssize_t map_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos, int cap_setid, @@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, ssize_t ret = -EINVAL; /* - * The id_map_mutex serializes all writes to any given map. + * The userns_state_mutex serializes all writes to any given map. * * Any map is only ever written once. * @@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, * order and smp_rmb() is guaranteed that we don't have crazy * architectures returning stale data. */ - mutex_lock(&id_map_mutex); + mutex_lock(&userns_state_mutex); ret = -EPERM; /* Only allow one successful write to the map */ @@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, *ppos = count; ret = count; out: - mutex_unlock(&id_map_mutex); + mutex_unlock(&userns_state_mutex); if (page) free_page(page); return ret; @@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns) { bool allowed; - mutex_lock(&id_map_mutex); + mutex_lock(&userns_state_mutex); /* It is not safe to use setgroups until a gid mapping in * the user namespace has been established. */ allowed = ns->gid_map.nr_extents != 0; - mutex_unlock(&id_map_mutex); + mutex_unlock(&userns_state_mutex); return allowed; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex 2014-12-09 20:41 ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman @ 2014-12-09 22:49 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-09 22:49 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 9, 2014 at 12:41 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Generalize id_map_mutex so it can be used for more state of a user namespace. Reviewed-by: Andy Lutomirski <luto@amacapital.net> > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > kernel/user_namespace.c | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 1e34de2fbd60..44a555ac6104 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -24,6 +24,7 @@ > #include <linux/fs_struct.h> > > static struct kmem_cache *user_ns_cachep __read_mostly; > +static DEFINE_MUTEX(userns_state_mutex); > > static bool new_idmap_permitted(const struct file *file, > struct user_namespace *ns, int cap_setid, > @@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map, > return false; > } > > - > -static DEFINE_MUTEX(id_map_mutex); > - > static ssize_t map_write(struct file *file, const char __user *buf, > size_t count, loff_t *ppos, > int cap_setid, > @@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, > ssize_t ret = -EINVAL; > > /* > - * The id_map_mutex serializes all writes to any given map. > + * The userns_state_mutex serializes all writes to any given map. > * > * Any map is only ever written once. > * > @@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, > * order and smp_rmb() is guaranteed that we don't have crazy > * architectures returning stale data. > */ > - mutex_lock(&id_map_mutex); > + mutex_lock(&userns_state_mutex); > > ret = -EPERM; > /* Only allow one successful write to the map */ > @@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, > *ppos = count; > ret = count; > out: > - mutex_unlock(&id_map_mutex); > + mutex_unlock(&userns_state_mutex); > if (page) > free_page(page); > return ret; > @@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns) > { > bool allowed; > > - mutex_lock(&id_map_mutex); > + mutex_lock(&userns_state_mutex); > /* It is not safe to use setgroups until a gid mapping in > * the user namespace has been established. > */ > allowed = ns->gid_map.nr_extents != 0; > - mutex_unlock(&id_map_mutex); > + mutex_unlock(&userns_state_mutex); > > return allowed; > } > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (4 preceding siblings ...) 2014-12-09 20:41 ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman @ 2014-12-09 20:42 ` Eric W. Biederman 2014-12-09 22:28 ` Andy Lutomirski 2014-12-09 20:43 ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman 2014-12-10 16:39 ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman 7 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:42 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not pass in a struct file so file_ns_capable is unusable. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/proc/base.c | 31 +++++++++---- include/linux/user_namespace.h | 7 +++ kernel/user.c | 1 + kernel/user_namespace.c | 103 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 134 insertions(+), 8 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 772efa45a452..4ebed9f01d97 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns, #endif /* CONFIG_TASK_IO_ACCOUNTING */ #ifdef CONFIG_USER_NS -static int proc_id_map_open(struct inode *inode, struct file *file, +static int proc_userns_open(struct inode *inode, struct file *file, const struct seq_operations *seq_ops) { struct user_namespace *ns = NULL; @@ -2418,7 +2418,7 @@ err: return ret; } -static int proc_id_map_release(struct inode *inode, struct file *file) +static int proc_userns_release(struct inode *inode, struct file *file) { struct seq_file *seq = file->private_data; struct user_namespace *ns = seq->private; @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file) static int proc_uid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_uid_seq_operations); + return proc_userns_open(inode, file, &proc_uid_seq_operations); } static int proc_gid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_gid_seq_operations); + return proc_userns_open(inode, file, &proc_gid_seq_operations); } static int proc_projid_map_open(struct inode *inode, struct file *file) { - return proc_id_map_open(inode, file, &proc_projid_seq_operations); + return proc_userns_open(inode, file, &proc_projid_seq_operations); } static const struct file_operations proc_uid_map_operations = { @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = { .write = proc_uid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_gid_map_operations = { @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = { .write = proc_gid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, }; static const struct file_operations proc_projid_map_operations = { @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = { .write = proc_projid_map_write, .read = seq_read, .llseek = seq_lseek, - .release = proc_id_map_release, + .release = proc_userns_release, +}; + +static int proc_setgroups_open(struct inode *inode, struct file *file) +{ + return proc_userns_open(inode, file, &proc_setgroups_seq_operations); +} + +static const struct file_operations proc_setgroups_operations = { + .open = proc_setgroups_open, + .write = proc_setgroups_write, + .read = seq_read, + .llseek = seq_lseek, + .release = proc_userns_release, }; #endif /* CONFIG_USER_NS */ @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif #ifdef CONFIG_CHECKPOINT_RESTORE REG("timers", S_IRUGO, proc_timers_operations), @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif }; diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 8d493083486a..feb0f4ec5573 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -17,6 +17,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ } extent[UID_GID_MAP_MAX_EXTENTS]; }; +#define USERNS_SETGROUPS_ALLOWED 1UL + +#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED + struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; @@ -27,6 +31,7 @@ struct user_namespace { kuid_t owner; kgid_t group; unsigned int proc_inum; + unsigned long flags; /* Register of per-UID persistent keyrings for this namespace */ #ifdef CONFIG_PERSISTENT_KEYRINGS @@ -60,9 +65,11 @@ struct seq_operations; extern const struct seq_operations proc_uid_seq_operations; extern const struct seq_operations proc_gid_seq_operations; extern const struct seq_operations proc_projid_seq_operations; +extern const struct seq_operations proc_setgroups_seq_operations; extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); extern bool userns_may_setgroups(const struct user_namespace *ns); #else diff --git a/kernel/user.c b/kernel/user.c index 4efa39350e44..2d09940c9632 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, + .flags = USERNS_INIT_FLAGS, #ifdef CONFIG_PERSISTENT_KEYRINGS .persistent_keyring_register_sem = __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 44a555ac6104..b507f9af7ff2 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -100,6 +100,11 @@ int create_user_ns(struct cred *new) ns->owner = owner; ns->group = group; + /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */ + mutex_lock(&userns_state_mutex); + ns->flags = parent_ns->flags; + mutex_unlock(&userns_state_mutex); + set_cred_user_ns(new, ns); #ifdef CONFIG_PERSISTENT_KEYRINGS @@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file, return false; } +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) +{ + struct user_namespace *ns = seq->private; + + return (*ppos == 0) ? ns : NULL; +} + +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) +{ + ++*ppos; + return NULL; +} + +static void setgroups_m_stop(struct seq_file *seq, void *v) +{ +} + +static int setgroups_m_show(struct seq_file *seq, void *v) +{ + struct user_namespace *ns = seq->private; + + seq_printf(seq, "%s\n", + test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ? + "allow" : "deny"); + return 0; +} + +const struct seq_operations proc_setgroups_seq_operations = { + .start = setgroups_m_start, + .stop = setgroups_m_stop, + .next = setgroups_m_next, + .show = setgroups_m_show, +}; + +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct seq_file *seq = file->private_data; + struct user_namespace *ns = seq->private; + char kbuf[8], *pos; + bool setgroups_allowed; + ssize_t ret; + + ret = -EACCES; + if (!file_ns_capable(file, ns, CAP_SYS_ADMIN)) + goto out; + + /* Only allow a very narrow range of strings to be written */ + ret = -EINVAL; + if ((*ppos != 0) || (count >= sizeof(kbuf))) + goto out; + + /* What was written? */ + ret = -EFAULT; + if (copy_from_user(kbuf, buf, count)) + goto out; + kbuf[count] = '\0'; + pos = kbuf; + + /* What is being requested? */ + ret = -EINVAL; + if (strncmp(pos, "allow", 5) == 0) { + pos += 5; + setgroups_allowed = true; + } + else if (strncmp(pos, "deny", 4) == 0) { + pos += 4; + setgroups_allowed = false; + } + else + goto out; + + /* Verify there is not trailing junk on the line */ + pos = skip_spaces(pos); + if (*pos != '\0') + goto out; + + mutex_lock(&userns_state_mutex); + if (setgroups_allowed) { + ret = -EPERM; + if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) { + mutex_unlock(&userns_state_mutex); + goto out; + } + } else { + ns->flags &= ~USERNS_SETGROUPS_ALLOWED; + } + mutex_unlock(&userns_state_mutex); + + /* Report a successful write */ + *ppos = count; + ret = count; +out: + return ret; +} + bool userns_may_setgroups(const struct user_namespace *ns) { bool allowed; @@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns) * the user namespace has been established. */ allowed = ns->gid_map.nr_extents != 0; + /* Is setgroups allowed? */ + allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED); mutex_unlock(&userns_state_mutex); return allowed; -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis 2014-12-09 20:42 ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-09 22:28 ` Andy Lutomirski [not found] ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com> 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-09 22:28 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > - Expose the knob to user space through a proc file /proc/<pid>/setgroups > > A value of "deny" means the setgroups system call is disabled in the > current processes user namespace and can not be enabled in the > future in this user namespace. > > A value of "allow" means the segtoups system call is enabled. > > - Descendant user namespaces inherit the value of setgroups from > their parents. > > - A proc file is used (instead of a sysctl) as sysctls > currently do not pass in a struct file so file_ns_capable > is unusable. Reviewed-by: Andy Lutomirski <luto@amacapital.net> But I still don't like the name "setgroups". People may look at that and have no clue what the scope of the setting is. And anyone who, as root, writes "deny" to /proc/self/setgroups, thinking that it acts on self, will be in for a surprise. --Andy > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/proc/base.c | 31 +++++++++---- > include/linux/user_namespace.h | 7 +++ > kernel/user.c | 1 + > kernel/user_namespace.c | 103 +++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 134 insertions(+), 8 deletions(-) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index 772efa45a452..4ebed9f01d97 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -2386,7 +2386,7 @@ static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns, > #endif /* CONFIG_TASK_IO_ACCOUNTING */ > > #ifdef CONFIG_USER_NS > -static int proc_id_map_open(struct inode *inode, struct file *file, > +static int proc_userns_open(struct inode *inode, struct file *file, > const struct seq_operations *seq_ops) > { > struct user_namespace *ns = NULL; > @@ -2418,7 +2418,7 @@ err: > return ret; > } > > -static int proc_id_map_release(struct inode *inode, struct file *file) > +static int proc_userns_release(struct inode *inode, struct file *file) > { > struct seq_file *seq = file->private_data; > struct user_namespace *ns = seq->private; > @@ -2428,17 +2428,17 @@ static int proc_id_map_release(struct inode *inode, struct file *file) > > static int proc_uid_map_open(struct inode *inode, struct file *file) > { > - return proc_id_map_open(inode, file, &proc_uid_seq_operations); > + return proc_userns_open(inode, file, &proc_uid_seq_operations); > } > > static int proc_gid_map_open(struct inode *inode, struct file *file) > { > - return proc_id_map_open(inode, file, &proc_gid_seq_operations); > + return proc_userns_open(inode, file, &proc_gid_seq_operations); > } > > static int proc_projid_map_open(struct inode *inode, struct file *file) > { > - return proc_id_map_open(inode, file, &proc_projid_seq_operations); > + return proc_userns_open(inode, file, &proc_projid_seq_operations); > } > > static const struct file_operations proc_uid_map_operations = { > @@ -2446,7 +2446,7 @@ static const struct file_operations proc_uid_map_operations = { > .write = proc_uid_map_write, > .read = seq_read, > .llseek = seq_lseek, > - .release = proc_id_map_release, > + .release = proc_userns_release, > }; > > static const struct file_operations proc_gid_map_operations = { > @@ -2454,7 +2454,7 @@ static const struct file_operations proc_gid_map_operations = { > .write = proc_gid_map_write, > .read = seq_read, > .llseek = seq_lseek, > - .release = proc_id_map_release, > + .release = proc_userns_release, > }; > > static const struct file_operations proc_projid_map_operations = { > @@ -2462,7 +2462,20 @@ static const struct file_operations proc_projid_map_operations = { > .write = proc_projid_map_write, > .read = seq_read, > .llseek = seq_lseek, > - .release = proc_id_map_release, > + .release = proc_userns_release, > +}; > + > +static int proc_setgroups_open(struct inode *inode, struct file *file) > +{ > + return proc_userns_open(inode, file, &proc_setgroups_seq_operations); > +} > + > +static const struct file_operations proc_setgroups_operations = { > + .open = proc_setgroups_open, > + .write = proc_setgroups_write, > + .read = seq_read, > + .llseek = seq_lseek, > + .release = proc_userns_release, > }; > #endif /* CONFIG_USER_NS */ > > @@ -2572,6 +2585,7 @@ static const struct pid_entry tgid_base_stuff[] = { > REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), > REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), > REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), > + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), > #endif > #ifdef CONFIG_CHECKPOINT_RESTORE > REG("timers", S_IRUGO, proc_timers_operations), > @@ -2913,6 +2927,7 @@ static const struct pid_entry tid_base_stuff[] = { > REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), > REG("gid_map", S_IRUGO|S_IWUSR, proc_gid_map_operations), > REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), > + REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), > #endif > }; > > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index 8d493083486a..feb0f4ec5573 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -17,6 +17,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ > } extent[UID_GID_MAP_MAX_EXTENTS]; > }; > > +#define USERNS_SETGROUPS_ALLOWED 1UL > + > +#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED > + > struct user_namespace { > struct uid_gid_map uid_map; > struct uid_gid_map gid_map; > @@ -27,6 +31,7 @@ struct user_namespace { > kuid_t owner; > kgid_t group; > unsigned int proc_inum; > + unsigned long flags; > > /* Register of per-UID persistent keyrings for this namespace */ > #ifdef CONFIG_PERSISTENT_KEYRINGS > @@ -60,9 +65,11 @@ struct seq_operations; > extern const struct seq_operations proc_uid_seq_operations; > extern const struct seq_operations proc_gid_seq_operations; > extern const struct seq_operations proc_projid_seq_operations; > +extern const struct seq_operations proc_setgroups_seq_operations; > extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); > extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); > extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); > +extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); > extern bool userns_may_setgroups(const struct user_namespace *ns); > #else > > diff --git a/kernel/user.c b/kernel/user.c > index 4efa39350e44..2d09940c9632 100644 > --- a/kernel/user.c > +++ b/kernel/user.c > @@ -51,6 +51,7 @@ struct user_namespace init_user_ns = { > .owner = GLOBAL_ROOT_UID, > .group = GLOBAL_ROOT_GID, > .proc_inum = PROC_USER_INIT_INO, > + .flags = USERNS_INIT_FLAGS, > #ifdef CONFIG_PERSISTENT_KEYRINGS > .persistent_keyring_register_sem = > __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 44a555ac6104..b507f9af7ff2 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -100,6 +100,11 @@ int create_user_ns(struct cred *new) > ns->owner = owner; > ns->group = group; > > + /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */ > + mutex_lock(&userns_state_mutex); > + ns->flags = parent_ns->flags; > + mutex_unlock(&userns_state_mutex); > + > set_cred_user_ns(new, ns); > > #ifdef CONFIG_PERSISTENT_KEYRINGS > @@ -839,6 +844,102 @@ static bool new_idmap_permitted(const struct file *file, > return false; > } > > +static void *setgroups_m_start(struct seq_file *seq, loff_t *ppos) > +{ > + struct user_namespace *ns = seq->private; > + > + return (*ppos == 0) ? ns : NULL; > +} > + > +static void *setgroups_m_next(struct seq_file *seq, void *v, loff_t *ppos) > +{ > + ++*ppos; > + return NULL; > +} > + > +static void setgroups_m_stop(struct seq_file *seq, void *v) > +{ > +} > + > +static int setgroups_m_show(struct seq_file *seq, void *v) > +{ > + struct user_namespace *ns = seq->private; > + > + seq_printf(seq, "%s\n", > + test_bit(USERNS_SETGROUPS_ALLOWED, &ns->flags) ? > + "allow" : "deny"); > + return 0; > +} > + > +const struct seq_operations proc_setgroups_seq_operations = { > + .start = setgroups_m_start, > + .stop = setgroups_m_stop, > + .next = setgroups_m_next, > + .show = setgroups_m_show, > +}; > + > +ssize_t proc_setgroups_write(struct file *file, const char __user *buf, > + size_t count, loff_t *ppos) > +{ > + struct seq_file *seq = file->private_data; > + struct user_namespace *ns = seq->private; > + char kbuf[8], *pos; > + bool setgroups_allowed; > + ssize_t ret; > + > + ret = -EACCES; > + if (!file_ns_capable(file, ns, CAP_SYS_ADMIN)) > + goto out; > + > + /* Only allow a very narrow range of strings to be written */ > + ret = -EINVAL; > + if ((*ppos != 0) || (count >= sizeof(kbuf))) > + goto out; > + > + /* What was written? */ > + ret = -EFAULT; > + if (copy_from_user(kbuf, buf, count)) > + goto out; > + kbuf[count] = '\0'; > + pos = kbuf; > + > + /* What is being requested? */ > + ret = -EINVAL; > + if (strncmp(pos, "allow", 5) == 0) { > + pos += 5; > + setgroups_allowed = true; > + } > + else if (strncmp(pos, "deny", 4) == 0) { > + pos += 4; > + setgroups_allowed = false; > + } > + else > + goto out; > + > + /* Verify there is not trailing junk on the line */ > + pos = skip_spaces(pos); > + if (*pos != '\0') > + goto out; > + > + mutex_lock(&userns_state_mutex); > + if (setgroups_allowed) { > + ret = -EPERM; > + if (!(ns->flags & USERNS_SETGROUPS_ALLOWED)) { > + mutex_unlock(&userns_state_mutex); > + goto out; > + } > + } else { > + ns->flags &= ~USERNS_SETGROUPS_ALLOWED; > + } > + mutex_unlock(&userns_state_mutex); > + > + /* Report a successful write */ > + *ppos = count; > + ret = count; > +out: > + return ret; > +} > + > bool userns_may_setgroups(const struct user_namespace *ns) > { > bool allowed; > @@ -848,6 +949,8 @@ bool userns_may_setgroups(const struct user_namespace *ns) > * the user namespace has been established. > */ > allowed = ns->gid_map.nr_extents != 0; > + /* Is setgroups allowed? */ > + allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED); > mutex_unlock(&userns_state_mutex); > > return allowed; > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com>]
* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis [not found] ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com> @ 2014-12-10 0:21 ` Andy Lutomirski [not found] ` <87wq5zf83t.fsf@x220.int.ebiederm.org> 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-10 0:21 UTC (permalink / raw) To: Eric W.Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <ebiederm@xmission.com> wrote: > > > On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <luto@amacapital.net> wrote: >>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman >><ebiederm@xmission.com> wrote: >>> >>> - Expose the knob to user space through a proc file >>/proc/<pid>/setgroups >>> >>> A value of "deny" means the setgroups system call is disabled in >>the >>> current processes user namespace and can not be enabled in the >>> future in this user namespace. >>> >>> A value of "allow" means the segtoups system call is enabled. >>> >>> - Descendant user namespaces inherit the value of setgroups from >>> their parents. >>> >>> - A proc file is used (instead of a sysctl) as sysctls >>> currently do not pass in a struct file so file_ns_capable >>> is unusable. >> >>Reviewed-by: Andy Lutomirski <luto@amacapital.net> >> >>But I still don't like the name "setgroups". People may look at that >>and have no clue what the scope of the setting is. And anyone who, as >>root, writes "deny" to /proc/self/setgroups, thinking that it acts on >>self, will be in for a surprise. > > True setgroups isn't perfect. Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl. But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed. > Would "userns_setgroups" be okay? --Andy > Eric -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <87wq5zf83t.fsf@x220.int.ebiederm.org>]
[parent not found: <87iohh3c9c.fsf@x220.int.ebiederm.org>]
* Re: [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis [not found] ` <87iohh3c9c.fsf@x220.int.ebiederm.org> @ 2014-12-12 1:30 ` Andy Lutomirski [not found] ` <8761dh3b7k.fsf_-_@x220.int.ebiederm.org> 1 sibling, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-12 1:30 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Thu, Dec 11, 2014 at 5:09 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > ebiederm@xmission.com (Eric W. Biederman) writes: > >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> On Tue, Dec 9, 2014 at 4:04 PM, Eric W.Biederman <ebiederm@xmission.com> wrote: >>>> >>>> >>>> On December 9, 2014 4:28:38 PM CST, Andy Lutomirski <luto@amacapital.net> wrote: >>>>>On Tue, Dec 9, 2014 at 12:42 PM, Eric W. Biederman >>>>><ebiederm@xmission.com> wrote: >>>>>> >>>>>> - Expose the knob to user space through a proc file >>>>>/proc/<pid>/setgroups >>>>>> >>>>>> A value of "deny" means the setgroups system call is disabled in >>>>>the >>>>>> current processes user namespace and can not be enabled in the >>>>>> future in this user namespace. >>>>>> >>>>>> A value of "allow" means the segtoups system call is enabled. >>>>>> >>>>>> - Descendant user namespaces inherit the value of setgroups from >>>>>> their parents. >>>>>> >>>>>> - A proc file is used (instead of a sysctl) as sysctls >>>>>> currently do not pass in a struct file so file_ns_capable >>>>>> is unusable. >>>>> >>>>>Reviewed-by: Andy Lutomirski <luto@amacapital.net> >>>>> >>>>>But I still don't like the name "setgroups". People may look at that >>>>>and have no clue what the scope of the setting is. And anyone who, as >>>>>root, writes "deny" to /proc/self/setgroups, thinking that it acts on >>>>>self, will be in for a surprise. >>>> >>>> True setgroups isn't perfect. Documenting it in a manpage may have to be enough. The only real improvement I can think of would be to make the setting a sysctl. But I think pursuing that approaches the point where perfection is the enemy of getting this problem fixed. >>>> >>> >>> Would "userns_setgroups" be okay? >> >> Maybe. >> >> I just played with this and this is a much bigger booby trap than I had >> realized. Disabling setgroups disables the possibility of logging in the >> future and since it is a one way switch the only way out is to reboot. >> >> Hooray our software checks the returns of setgroups. Booh. This is a >> really nasty knob to have anywhere. >> >> I need to think about this a little bit. Giving root the power to shoot >> himself in the foot is one thing. Giving root a loaded gun pointed at >> his foot with the hammer pulled back, and a sign that says I dare you to >> pull the trigger, seems like a bad idea. >> >> I think I need to reduce when that knob can be used. Grr. >> Back to the drawing board! > > I tried out a bunch of things and finally found a simple rule. Don't > allow setgroups to be disabled after setgroups has been enabled in a > user namespace. Or in practical terms don't allow setgroups to be > disabled after the gid_map has been set. > > Which in practice pretty nearly means that we are only allowing writes > to setgroups when it is a single process and it's eventual children that > can be affected. > > At which point I don't think a name change would make things any > clearer. The name change still helps the user to does: $ ls /proc/self "setgroups? what's that?" > > I have also updated the code to move the permission checks to open > where they belong (doh!). Patch follows. Will review and test. > > Eric > > > > > > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <8761dh3b7k.fsf_-_@x220.int.ebiederm.org>]
[parent not found: <878uicy1r9.fsf_-_@x220.int.ebiederm.org>]
* [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups [not found] ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org> @ 2014-12-12 21:54 ` Eric W. Biederman 2015-02-02 15:36 ` Michael Kerrisk (man-pages) 2014-12-12 21:54 ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman 1 sibling, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-12 21:54 UTC (permalink / raw) To: Michael Kerrisk-manpages Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- man5/proc.5 | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/man5/proc.5 b/man5/proc.5 index 96077d0dd195..d661e8cfeac9 100644 --- a/man5/proc.5 +++ b/man5/proc.5 @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated .\" Added in 2.6.9 .\" CONFIG_SCHEDSTATS .TP +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" +This file reports +.BR allow +if the setgroups system call is allowed in the current user namespace. +This file reports +.BR deny +if the setgroups system call is not allowed in the current user namespace. +This file may be written to with values of +.BR allow +and +.BR deny +before +.IR /proc/[pid]/gid_map +is written to (enabling setgroups) in a user namespace. +.TP .IR /proc/[pid]/smaps " (since Linux 2.6.14)" This file shows memory consumption for each of the process's mappings. (The -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2014-12-12 21:54 ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman @ 2015-02-02 15:36 ` Michael Kerrisk (man-pages) 2015-02-11 8:01 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-02 15:36 UTC (permalink / raw) To: Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski, Josh Triplett [Adding Josh to CC in case he has anything to add.] On 12/12/2014 10:54 PM, Eric W. Biederman wrote: > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> > --- > man5/proc.5 | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/man5/proc.5 b/man5/proc.5 > index 96077d0dd195..d661e8cfeac9 100644 > --- a/man5/proc.5 > +++ b/man5/proc.5 > @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated > .\" Added in 2.6.9 > .\" CONFIG_SCHEDSTATS > .TP > +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" > +This file reports > +.BR allow > +if the setgroups system call is allowed in the current user namespace. > +This file reports > +.BR deny > +if the setgroups system call is not allowed in the current user namespace. > +This file may be written to with values of > +.BR allow > +and > +.BR deny > +before > +.IR /proc/[pid]/gid_map > +is written to (enabling setgroups) in a user namespace. > +.TP > .IR /proc/[pid]/smaps " (since Linux 2.6.14)" > This file shows memory consumption for each of the process's mappings. > (The Hi Eric, Thanks for this patch. I applied it, and then tried to work in quite a few other details gleaned from the source code and commit message, and Jon Corbet's article at http://lwn.net/Articles/626665/. Could you please let me know if the following is correct: /proc/[pid]/setgroups (since Linux 3.19) This file displays the string "allow" if processes in the user namespace that contains the process pid are permitted to employ the setgroups(2) system call, and "deny" if setgroups(2) is not permitted in that user namespace. A privileged process (one with the CAP_SYS_ADMIN capa‐ bility in the namespace) may write either of the strings "allow" or "deny" to this file before writing a group ID mapping for this user namespace to the file /proc/[pid]/gid_map. Writing the string "deny" prevents any process in the user namespace from employing set‐ groups(2). The default value of this file in the initial user namespace is "allow". Once /proc/[pid]/gid_map has been written to (which has the effect of enabling setgroups(2) in the user names‐ pace), it is no longer possible to deny setgroups(2) by writing to /proc/[pid]/setgroups. A child user namespace inherits the /proc/[pid]/gid_map setting from its parent. If the setgroups file has the value "deny", then the setgroups(2) system call can't subsequently be reenabled (by writing "allow" to the file) in this user namespace. This restriction also propagates down to all child user namespaces of this user namespace. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2015-02-02 15:36 ` Michael Kerrisk (man-pages) @ 2015-02-11 8:01 ` Michael Kerrisk (man-pages) 2015-02-11 13:51 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-11 8:01 UTC (permalink / raw) To: Eric W. Biederman Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hi Eric, Ping! Cheers, Michael On 2 February 2015 at 16:36, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > [Adding Josh to CC in case he has anything to add.] > > On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >> >> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >> --- >> man5/proc.5 | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/man5/proc.5 b/man5/proc.5 >> index 96077d0dd195..d661e8cfeac9 100644 >> --- a/man5/proc.5 >> +++ b/man5/proc.5 >> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated >> .\" Added in 2.6.9 >> .\" CONFIG_SCHEDSTATS >> .TP >> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" >> +This file reports >> +.BR allow >> +if the setgroups system call is allowed in the current user namespace. >> +This file reports >> +.BR deny >> +if the setgroups system call is not allowed in the current user namespace. >> +This file may be written to with values of >> +.BR allow >> +and >> +.BR deny >> +before >> +.IR /proc/[pid]/gid_map >> +is written to (enabling setgroups) in a user namespace. >> +.TP >> .IR /proc/[pid]/smaps " (since Linux 2.6.14)" >> This file shows memory consumption for each of the process's mappings. >> (The > > Hi Eric, > > Thanks for this patch. I applied it, and then tried to work in > quite a few other details gleaned from the source code and commit > message, and Jon Corbet's article at http://lwn.net/Articles/626665/. > Could you please let me know if the following is correct: > > /proc/[pid]/setgroups (since Linux 3.19) > This file displays the string "allow" if processes in > the user namespace that contains the process pid are > permitted to employ the setgroups(2) system call, and > "deny" if setgroups(2) is not permitted in that user > namespace. > > A privileged process (one with the CAP_SYS_ADMIN capa‐ > bility in the namespace) may write either of the strings > "allow" or "deny" to this file before writing a group ID > mapping for this user namespace to the file > /proc/[pid]/gid_map. Writing the string "deny" prevents > any process in the user namespace from employing set‐ > groups(2). > > The default value of this file in the initial user > namespace is "allow". > > Once /proc/[pid]/gid_map has been written to (which has > the effect of enabling setgroups(2) in the user names‐ > pace), it is no longer possible to deny setgroups(2) by > writing to /proc/[pid]/setgroups. > > A child user namespace inherits the /proc/[pid]/gid_map > setting from its parent. > > If the setgroups file has the value "deny", then the > setgroups(2) system call can't subsequently be reenabled > (by writing "allow" to the file) in this user namespace. > This restriction also propagates down to all child user > namespaces of this user namespace. > > Thanks, > > Michael > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/ -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2015-02-11 8:01 ` Michael Kerrisk (man-pages) @ 2015-02-11 13:51 ` Eric W. Biederman 2015-02-12 13:53 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2015-02-11 13:51 UTC (permalink / raw) To: mtk.manpages Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > Hi Eric, > > Ping! > > Cheers, > > Michael My apologies. You description wasn't wrong but it may be a bit misleading, explanation below. You will have to figure out how to work that into your proposed text. > On 2 February 2015 at 16:36, Michael Kerrisk (man-pages) > <mtk.manpages@gmail.com> wrote: >> [Adding Josh to CC in case he has anything to add.] >> >> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>> >>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >>> --- >>> man5/proc.5 | 15 +++++++++++++++ >>> 1 file changed, 15 insertions(+) >>> >>> diff --git a/man5/proc.5 b/man5/proc.5 >>> index 96077d0dd195..d661e8cfeac9 100644 >>> --- a/man5/proc.5 >>> +++ b/man5/proc.5 >>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated >>> .\" Added in 2.6.9 >>> .\" CONFIG_SCHEDSTATS >>> .TP >>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" >>> +This file reports >>> +.BR allow >>> +if the setgroups system call is allowed in the current user namespace. >>> +This file reports >>> +.BR deny >>> +if the setgroups system call is not allowed in the current user namespace. >>> +This file may be written to with values of >>> +.BR allow >>> +and >>> +.BR deny >>> +before >>> +.IR /proc/[pid]/gid_map >>> +is written to (enabling setgroups) in a user namespace. >>> +.TP >>> .IR /proc/[pid]/smaps " (since Linux 2.6.14)" >>> This file shows memory consumption for each of the process's mappings. >>> (The >> >> Hi Eric, >> >> Thanks for this patch. I applied it, and then tried to work in >> quite a few other details gleaned from the source code and commit >> message, and Jon Corbet's article at http://lwn.net/Articles/626665/. >> Could you please let me know if the following is correct: It is close but it may be misleading. >> /proc/[pid]/setgroups (since Linux 3.19) >> This file displays the string "allow" if processes in >> the user namespace that contains the process pid are >> permitted to employ the setgroups(2) system call, and >> "deny" if setgroups(2) is not permitted in that user >> namespace. With the caveat that when gid_map is not set that setgroups is also not allowed. >> A privileged process (one with the CAP_SYS_ADMIN capa‐ >> bility in the namespace) may write either of the strings >> "allow" or "deny" to this file before writing a group ID >> mapping for this user namespace to the file >> /proc/[pid]/gid_map. Writing the string "deny" prevents >> any process in the user namespace from employing set‐ >> groups(2). Or more succintly. You are allowed to write to /proc/[pid]/setgroups when calling setgroups is not allowed because gid_map is unset. This ensures we do not have any transitions from a state where setgroups is allowed to a state where setgroups is denied. There are only transitions from setgroups not-allowed to setgroups allowed. >> The default value of this file in the initial user >> namespace is "allow". >> >> Once /proc/[pid]/gid_map has been written to (which has >> the effect of enabling setgroups(2) in the user names‐ >> pace), it is no longer possible to deny setgroups(2) by >> writing to /proc/[pid]/setgroups. >> >> A child user namespace inherits the /proc/[pid]/gid_map >> setting from its parent. >> >> If the setgroups file has the value "deny", then the >> setgroups(2) system call can't subsequently be reenabled >> (by writing "allow" to the file) in this user namespace. >> This restriction also propagates down to all child user >> namespaces of this user namespace. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2015-02-11 13:51 ` Eric W. Biederman @ 2015-02-12 13:53 ` Michael Kerrisk (man-pages) 2015-02-21 7:57 ` Michael Kerrisk (man-pages) 2015-03-03 11:39 ` Michael Kerrisk (man-pages) 0 siblings, 2 replies; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-12 13:53 UTC (permalink / raw) To: Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hello Eric, On 02/11/2015 02:51 PM, Eric W. Biederman wrote: > "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > >> Hi Eric, >> >> Ping! >> >> Cheers, >> >> Michael > > My apologies. You description wasn't wrong but it may be a bit > misleading, explanation below. You will have to figure out how to work > that into your proposed text. > >> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages) >> <mtk.manpages@gmail.com> wrote: >>> [Adding Josh to CC in case he has anything to add.] >>> >>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>>> >>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >>>> --- >>>> man5/proc.5 | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >>>> >>>> diff --git a/man5/proc.5 b/man5/proc.5 >>>> index 96077d0dd195..d661e8cfeac9 100644 >>>> --- a/man5/proc.5 >>>> +++ b/man5/proc.5 >>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated >>>> .\" Added in 2.6.9 >>>> .\" CONFIG_SCHEDSTATS >>>> .TP >>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" >>>> +This file reports >>>> +.BR allow >>>> +if the setgroups system call is allowed in the current user namespace. >>>> +This file reports >>>> +.BR deny >>>> +if the setgroups system call is not allowed in the current user namespace. >>>> +This file may be written to with values of >>>> +.BR allow >>>> +and >>>> +.BR deny >>>> +before >>>> +.IR /proc/[pid]/gid_map >>>> +is written to (enabling setgroups) in a user namespace. >>>> +.TP >>>> .IR /proc/[pid]/smaps " (since Linux 2.6.14)" >>>> This file shows memory consumption for each of the process's mappings. >>>> (The >>> >>> Hi Eric, >>> >>> Thanks for this patch. I applied it, and then tried to work in >>> quite a few other details gleaned from the source code and commit >>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/. >>> Could you please let me know if the following is correct: > > It is close but it may be misleading. > >>> /proc/[pid]/setgroups (since Linux 3.19) >>> This file displays the string "allow" if processes in >>> the user namespace that contains the process pid are >>> permitted to employ the setgroups(2) system call, and >>> "deny" if setgroups(2) is not permitted in that user >>> namespace. > > With the caveat that when gid_map is not set that setgroups is also not > allowed. Okay -- Iadded that point. >>> A privileged process (one with the CAP_SYS_ADMIN capa‐ >>> bility in the namespace) may write either of the strings >>> "allow" or "deny" to this file before writing a group ID >>> mapping for this user namespace to the file >>> /proc/[pid]/gid_map. Writing the string "deny" prevents >>> any process in the user namespace from employing set‐ >>> groups(2). > > Or more succintly. You are allowed to write to /proc/[pid]/setgroups > when calling setgroups is not allowed because gid_map is unset. This > ensures we do not have any transitions from a state where setgroups > is allowed to a state where setgroups is denied. There are only > transitions from setgroups not-allowed to setgroups allowed. And I've worked in the above point, rewording a bit along the way. So, how does the following look (only the first two paragraphs have changed)? /proc/[pid]/setgroups (since Linux 3.19) This file displays the string "allow" if processes in the user namespace that contains the process pid are permitted to employ the setgroups(2) system call, and "deny" if setgroups(2) is not permitted in that user namespace. (Note, however, that calls to setgroups(2) are also not permitted if /proc/[pid]/gid_map has not yet been set.) A privileged process (one with the CAP_SYS_ADMIN capa‐ bility in the namespace) may write either of the strings "allow" or "deny" to this file before writing a group ID mapping for this user namespace to the file /proc/[pid]/gid_map. Writing the string "deny" prevents any process in the user namespace from employing set‐ groups(2). In other words, it is permitted to write to /proc/[pid]/setgroups so long as calling setgroups(2) is not allowed because /proc/[pid]gid_map has not been set. This ensures that a process cannot transition from a state where setgroups(2) is allowed to a state where setgroups(2) is denied; a process can only trabsition from setgroups(2) being disallowed to setgroups(2) being allowed. The default value of this file in the initial user namespace is "allow". Once /proc/[pid]/gid_map has been written to (which has the effect of enabling setgroups(2) in the user names‐ pace), it is no longer possible to deny setgroups(2) by writing to /proc/[pid]/setgroups. A child user namespace inherits the /proc/[pid]/gid_map setting from its parent. If the setgroups file has the value "deny", then the setgroups(2) system call can't subsequently be reenabled (by writing "allow" to the file) in this user namespace. This restriction also propagates down to all child user namespaces of this user namespace. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2015-02-12 13:53 ` Michael Kerrisk (man-pages) @ 2015-02-21 7:57 ` Michael Kerrisk (man-pages) 2015-03-03 11:39 ` Michael Kerrisk (man-pages) 1 sibling, 0 replies; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-21 7:57 UTC (permalink / raw) To: Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hi Eric, Ping! Cheers, Michael On 02/12/2015 02:53 PM, Michael Kerrisk (man-pages) wrote: > Hello Eric, > > On 02/11/2015 02:51 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: >> >>> Hi Eric, >>> >>> Ping! >>> >>> Cheers, >>> >>> Michael >> >> My apologies. You description wasn't wrong but it may be a bit >> misleading, explanation below. You will have to figure out how to work >> that into your proposed text. >> >>> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages) >>> <mtk.manpages@gmail.com> wrote: >>>> [Adding Josh to CC in case he has anything to add.] >>>> >>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>>>> >>>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >>>>> --- >>>>> man5/proc.5 | 15 +++++++++++++++ >>>>> 1 file changed, 15 insertions(+) >>>>> >>>>> diff --git a/man5/proc.5 b/man5/proc.5 >>>>> index 96077d0dd195..d661e8cfeac9 100644 >>>>> --- a/man5/proc.5 >>>>> +++ b/man5/proc.5 >>>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated >>>>> .\" Added in 2.6.9 >>>>> .\" CONFIG_SCHEDSTATS >>>>> .TP >>>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" >>>>> +This file reports >>>>> +.BR allow >>>>> +if the setgroups system call is allowed in the current user namespace. >>>>> +This file reports >>>>> +.BR deny >>>>> +if the setgroups system call is not allowed in the current user namespace. >>>>> +This file may be written to with values of >>>>> +.BR allow >>>>> +and >>>>> +.BR deny >>>>> +before >>>>> +.IR /proc/[pid]/gid_map >>>>> +is written to (enabling setgroups) in a user namespace. >>>>> +.TP >>>>> .IR /proc/[pid]/smaps " (since Linux 2.6.14)" >>>>> This file shows memory consumption for each of the process's mappings. >>>>> (The >>>> >>>> Hi Eric, >>>> >>>> Thanks for this patch. I applied it, and then tried to work in >>>> quite a few other details gleaned from the source code and commit >>>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/. >>>> Could you please let me know if the following is correct: >> >> It is close but it may be misleading. >> >>>> /proc/[pid]/setgroups (since Linux 3.19) >>>> This file displays the string "allow" if processes in >>>> the user namespace that contains the process pid are >>>> permitted to employ the setgroups(2) system call, and >>>> "deny" if setgroups(2) is not permitted in that user >>>> namespace. >> >> With the caveat that when gid_map is not set that setgroups is also not >> allowed. > > Okay -- Iadded that point. > >>>> A privileged process (one with the CAP_SYS_ADMIN capa‐ >>>> bility in the namespace) may write either of the strings >>>> "allow" or "deny" to this file before writing a group ID >>>> mapping for this user namespace to the file >>>> /proc/[pid]/gid_map. Writing the string "deny" prevents >>>> any process in the user namespace from employing set‐ >>>> groups(2). >> >> Or more succintly. You are allowed to write to /proc/[pid]/setgroups >> when calling setgroups is not allowed because gid_map is unset. This >> ensures we do not have any transitions from a state where setgroups >> is allowed to a state where setgroups is denied. There are only >> transitions from setgroups not-allowed to setgroups allowed. > > And I've worked in the above point, rewording a bit along the way. > So, how does the following look (only the first two paragraphs have > changed)? > > /proc/[pid]/setgroups (since Linux 3.19) > This file displays the string "allow" if processes in > the user namespace that contains the process pid are > permitted to employ the setgroups(2) system call, and > "deny" if setgroups(2) is not permitted in that user > namespace. (Note, however, that calls to setgroups(2) > are also not permitted if /proc/[pid]/gid_map has not > yet been set.) > > A privileged process (one with the CAP_SYS_ADMIN capa‐ > bility in the namespace) may write either of the strings > "allow" or "deny" to this file before writing a group ID > mapping for this user namespace to the file > /proc/[pid]/gid_map. Writing the string "deny" prevents > any process in the user namespace from employing set‐ > groups(2). In other words, it is permitted to write to > /proc/[pid]/setgroups so long as calling setgroups(2) is > not allowed because /proc/[pid]gid_map has not been set. > This ensures that a process cannot transition from a > state where setgroups(2) is allowed to a state where > setgroups(2) is denied; a process can only trabsition > from setgroups(2) being disallowed to setgroups(2) being > allowed. > > The default value of this file in the initial user > namespace is "allow". > > Once /proc/[pid]/gid_map has been written to (which has > the effect of enabling setgroups(2) in the user names‐ > pace), it is no longer possible to deny setgroups(2) by > writing to /proc/[pid]/setgroups. > > A child user namespace inherits the /proc/[pid]/gid_map > setting from its parent. > > If the setgroups file has the value "deny", then the > setgroups(2) system call can't subsequently be reenabled > (by writing "allow" to the file) in this user namespace. > This restriction also propagates down to all child user > namespaces of this user namespace. > > Cheers, > > Michael > > > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups 2015-02-12 13:53 ` Michael Kerrisk (man-pages) 2015-02-21 7:57 ` Michael Kerrisk (man-pages) @ 2015-03-03 11:39 ` Michael Kerrisk (man-pages) 1 sibling, 0 replies; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-03-03 11:39 UTC (permalink / raw) To: Eric W. Biederman Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hi Eric Ping^2! Cheers, Michael On 12 February 2015 at 14:53, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > Hello Eric, > > On 02/11/2015 02:51 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: >> >>> Hi Eric, >>> >>> Ping! >>> >>> Cheers, >>> >>> Michael >> >> My apologies. You description wasn't wrong but it may be a bit >> misleading, explanation below. You will have to figure out how to work >> that into your proposed text. >> >>> On 2 February 2015 at 16:36, Michael Kerrisk (man-pages) >>> <mtk.manpages@gmail.com> wrote: >>>> [Adding Josh to CC in case he has anything to add.] >>>> >>>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>>>> >>>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> >>>>> --- >>>>> man5/proc.5 | 15 +++++++++++++++ >>>>> 1 file changed, 15 insertions(+) >>>>> >>>>> diff --git a/man5/proc.5 b/man5/proc.5 >>>>> index 96077d0dd195..d661e8cfeac9 100644 >>>>> --- a/man5/proc.5 >>>>> +++ b/man5/proc.5 >>>>> @@ -1097,6 +1097,21 @@ are not available if the main thread has already terminated >>>>> .\" Added in 2.6.9 >>>>> .\" CONFIG_SCHEDSTATS >>>>> .TP >>>>> +.IR /proc/[pid]/setgroups " (since Linux 3.19-rc1)" >>>>> +This file reports >>>>> +.BR allow >>>>> +if the setgroups system call is allowed in the current user namespace. >>>>> +This file reports >>>>> +.BR deny >>>>> +if the setgroups system call is not allowed in the current user namespace. >>>>> +This file may be written to with values of >>>>> +.BR allow >>>>> +and >>>>> +.BR deny >>>>> +before >>>>> +.IR /proc/[pid]/gid_map >>>>> +is written to (enabling setgroups) in a user namespace. >>>>> +.TP >>>>> .IR /proc/[pid]/smaps " (since Linux 2.6.14)" >>>>> This file shows memory consumption for each of the process's mappings. >>>>> (The >>>> >>>> Hi Eric, >>>> >>>> Thanks for this patch. I applied it, and then tried to work in >>>> quite a few other details gleaned from the source code and commit >>>> message, and Jon Corbet's article at http://lwn.net/Articles/626665/. >>>> Could you please let me know if the following is correct: >> >> It is close but it may be misleading. >> >>>> /proc/[pid]/setgroups (since Linux 3.19) >>>> This file displays the string "allow" if processes in >>>> the user namespace that contains the process pid are >>>> permitted to employ the setgroups(2) system call, and >>>> "deny" if setgroups(2) is not permitted in that user >>>> namespace. >> >> With the caveat that when gid_map is not set that setgroups is also not >> allowed. > > Okay -- Iadded that point. > >>>> A privileged process (one with the CAP_SYS_ADMIN capa‐ >>>> bility in the namespace) may write either of the strings >>>> "allow" or "deny" to this file before writing a group ID >>>> mapping for this user namespace to the file >>>> /proc/[pid]/gid_map. Writing the string "deny" prevents >>>> any process in the user namespace from employing set‐ >>>> groups(2). >> >> Or more succintly. You are allowed to write to /proc/[pid]/setgroups >> when calling setgroups is not allowed because gid_map is unset. This >> ensures we do not have any transitions from a state where setgroups >> is allowed to a state where setgroups is denied. There are only >> transitions from setgroups not-allowed to setgroups allowed. > > And I've worked in the above point, rewording a bit along the way. > So, how does the following look (only the first two paragraphs have > changed)? > > /proc/[pid]/setgroups (since Linux 3.19) > This file displays the string "allow" if processes in > the user namespace that contains the process pid are > permitted to employ the setgroups(2) system call, and > "deny" if setgroups(2) is not permitted in that user > namespace. (Note, however, that calls to setgroups(2) > are also not permitted if /proc/[pid]/gid_map has not > yet been set.) > > A privileged process (one with the CAP_SYS_ADMIN capa‐ > bility in the namespace) may write either of the strings > "allow" or "deny" to this file before writing a group ID > mapping for this user namespace to the file > /proc/[pid]/gid_map. Writing the string "deny" prevents > any process in the user namespace from employing set‐ > groups(2). In other words, it is permitted to write to > /proc/[pid]/setgroups so long as calling setgroups(2) is > not allowed because /proc/[pid]gid_map has not been set. > This ensures that a process cannot transition from a > state where setgroups(2) is allowed to a state where > setgroups(2) is denied; a process can only trabsition > from setgroups(2) being disallowed to setgroups(2) being > allowed. > > The default value of this file in the initial user > namespace is "allow". > > Once /proc/[pid]/gid_map has been written to (which has > the effect of enabling setgroups(2) in the user names‐ > pace), it is no longer possible to deny setgroups(2) by > writing to /proc/[pid]/setgroups. > > A child user namespace inherits the /proc/[pid]/gid_map > setting from its parent. > > If the setgroups file has the value "deny", then the > setgroups(2) system call can't subsequently be reenabled > (by writing "allow" to the file) in this user namespace. > This restriction also propagates down to all child user > namespaces of this user namespace. > > Cheers, > > Michael > > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/ -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups [not found] ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org> 2014-12-12 21:54 ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman @ 2014-12-12 21:54 ` Eric W. Biederman 2015-02-02 15:37 ` Michael Kerrisk (man-pages) 2015-02-02 21:31 ` Alban Crequy 1 sibling, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-12 21:54 UTC (permalink / raw) To: Michael Kerrisk-manpages Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Files with access permissions such as ---rwx---rwx give fewer permissions to their group then they do to everyone else. Which means dropping groups with setgroups(0, NULL) actually grants a process privileges. The uprivileged setting of gid_map turned out not to be safe after this change. Privilege setting of gid_map can be interpreted as meaning yes it is ok to drop groups. To prevent this problem and future problems user namespaces were changed in such a way as to guarantee a user can not obtain credentials without privilege they could not obtain without the help of user namespaces. This meant testing the effective user ID and not the filesystem user ID as setresuid and setregid allow setting any process uid or gid (except the supplemental groups) to the effective ID. Furthermore to preserve in some form the useful applications that have been setting gid_map without privilege the file /proc/[pid]/setgroups was added to allow disabling setgroups. With the setgroups system call permanently disabled in a user namespace it again becomes safe to allow writes to gid_map without privilege. Here is my meager attempt to update user_namespaces.7 to reflect these issues. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 index d76721d9a0a1..f8333a762308 100644 --- a/man7/user_namespaces.7 +++ b/man7/user_namespaces.7 @@ -533,11 +533,16 @@ One of the following is true: The data written to .I uid_map .RI ( gid_map ) -consists of a single line that maps the writing process's filesystem user ID +consists of a single line that maps the writing process's effective user ID (group ID) in the parent user namespace to a user ID (group ID) in the user namespace. -The usual case here is that this single line provides a mapping for user ID -of the process that created the namespace. +The writing process must have the same effective user ID as the process +that created the user namespace. +In the case of +.I gid_map +the +.I setgroups +file must have been written to earlier and disabled the setgroups system call. .IP * 3 The opening process has the .BR CAP_SETUID @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error .\" .\" ============================================================ .\" +.SS Interaction with system calls that change the uid or gid values +When in a user namespace where the +.I uid_map +or +.I gid_map +file has not been written the system calls that change user IDs +or group IDs respectively will fail. After the +.I uid_map +and +.I gid_map +file have been written only the mapped values may be used in +system calls that change user IDs and group IDs. + +For user IDs these system calls include +.BR setuid , +.BR setfsuid , +.BR setreuid , +and +.BR setresuid . + +For group IDs these system calls include +.BR setgid , +.BR setfsgid , +.BR setregid , +.BR setresgid , +and +.BR setgroups. + +Writing +.BR deny +to the +.I /proc/[pid]/setgroups +file before writing to +.I /proc/[pid]/gid_map +will permanently disable the setgroups system call in a user namespace +and allow writing to +.I /proc/[pid]/gid_map +without +.BR CAP_SETGID +in the parent user namespace. + .SS Unmapped user and group IDs .PP There are various places where an unmapped user ID (group ID) -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2014-12-12 21:54 ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman @ 2015-02-02 15:37 ` Michael Kerrisk (man-pages) 2015-02-11 8:02 ` Michael Kerrisk (man-pages) 2015-02-02 21:31 ` Alban Crequy 1 sibling, 1 reply; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-02 15:37 UTC (permalink / raw) To: Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hi Eric, Thanks for writing this up! On 12/12/2014 10:54 PM, Eric W. Biederman wrote: > > Files with access permissions such as ---rwx---rwx give fewer > permissions to their group then they do to everyone else. Which means > dropping groups with setgroups(0, NULL) actually grants a process > privileges. > > The uprivileged setting of gid_map turned out not to be safe after > this change. Privilege setting of gid_map can be interpreted as > meaning yes it is ok to drop groups. I had trouble to parse that sentence (and I'd like to make sure that the right sentence ends up in the commit message). Did you mean: "*Unprivileged* setting of gid_map can be interpreted as meaning yes it is ok to drop groups" ? Or something else? > To prevent this problem and future problems user namespaces were > changed in such a way as to guarantee a user can not obtain > credentials without privilege they could not obtain without the > help of user namespaces. > > This meant testing the effective user ID and not the filesystem user > ID as setresuid and setregid allow setting any process uid or gid > (except the supplemental groups) to the effective ID. > > Furthermore to preserve in some form the useful applications that have > been setting gid_map without privilege the file /proc/[pid]/setgroups > was added to allow disabling setgroups. With the setgroups system > call permanently disabled in a user namespace it again becomes safe to > allow writes to gid_map without privilege. > > Here is my meager attempt to update user_namespaces.7 to reflect these > issues. It looked pretty serviceable as patch, IMO. So, thanks again. I've applied, tweaking some wordings afterward, but changing nothing essential. See below for a question. > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 49 insertions(+), 3 deletions(-) > > diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 > index d76721d9a0a1..f8333a762308 100644 > --- a/man7/user_namespaces.7 > +++ b/man7/user_namespaces.7 > @@ -533,11 +533,16 @@ One of the following is true: > The data written to > .I uid_map > .RI ( gid_map ) > -consists of a single line that maps the writing process's filesystem user ID > +consists of a single line that maps the writing process's effective user ID > (group ID) in the parent user namespace to a user ID (group ID) > in the user namespace. > -The usual case here is that this single line provides a mapping for user ID > -of the process that created the namespace. > +The writing process must have the same effective user ID as the process > +that created the user namespace. > +In the case of > +.I gid_map > +the > +.I setgroups > +file must have been written to earlier and disabled the setgroups system call. > .IP * 3 > The opening process has the > .BR CAP_SETUID > @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error > .\" > .\" ============================================================ > .\" > +.SS Interaction with system calls that change the uid or gid values > +When in a user namespace where the > +.I uid_map > +or > +.I gid_map > +file has not been written the system calls that change user IDs > +or group IDs respectively will fail. After the > +.I uid_map > +and > +.I gid_map > +file have been written only the mapped values may be used in > +system calls that change user IDs and group IDs. > + > +For user IDs these system calls include > +.BR setuid , > +.BR setfsuid , > +.BR setreuid , > +and > +.BR setresuid . > + > +For group IDs these system calls include > +.BR setgid , > +.BR setfsgid , > +.BR setregid , > +.BR setresgid , > +and > +.BR setgroups. > + > +Writing > +.BR deny > +to the > +.I /proc/[pid]/setgroups > +file before writing to > +.I /proc/[pid]/gid_map > +will permanently disable the setgroups system call in a user namespace > +and allow writing to > +.I /proc/[pid]/gid_map > +without > +.BR CAP_SETGID > +in the parent user namespace. I just want to double check: you really did mean to write "*parent* namespace" above, right? Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2015-02-02 15:37 ` Michael Kerrisk (man-pages) @ 2015-02-11 8:02 ` Michael Kerrisk (man-pages) 2015-02-11 14:01 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-11 8:02 UTC (permalink / raw) To: Eric W. Biederman Cc: Michael Kerrisk, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hi Eric, Ping! Cheers, Michael On 2 February 2015 at 16:37, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > Hi Eric, > > Thanks for writing this up! > > On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >> >> Files with access permissions such as ---rwx---rwx give fewer >> permissions to their group then they do to everyone else. Which means >> dropping groups with setgroups(0, NULL) actually grants a process >> privileges. >> >> The uprivileged setting of gid_map turned out not to be safe after >> this change. Privilege setting of gid_map can be interpreted as >> meaning yes it is ok to drop groups. > > I had trouble to parse that sentence (and I'd like to make sure that > the right sentence ends up in the commit message). Did you mean: > > "*Unprivileged* setting of gid_map can be interpreted as meaning > yes it is ok to drop groups" > > ? > > Or something else? > >> To prevent this problem and future problems user namespaces were >> changed in such a way as to guarantee a user can not obtain >> credentials without privilege they could not obtain without the >> help of user namespaces. >> >> This meant testing the effective user ID and not the filesystem user >> ID as setresuid and setregid allow setting any process uid or gid >> (except the supplemental groups) to the effective ID. >> >> Furthermore to preserve in some form the useful applications that have >> been setting gid_map without privilege the file /proc/[pid]/setgroups >> was added to allow disabling setgroups. With the setgroups system >> call permanently disabled in a user namespace it again becomes safe to >> allow writes to gid_map without privilege. >> >> Here is my meager attempt to update user_namespaces.7 to reflect these >> issues. > > It looked pretty serviceable as patch, IMO. So, thanks again. I've applied, > tweaking some wordings afterward, but changing nothing essential. See below > for a question. > >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 49 insertions(+), 3 deletions(-) >> >> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 >> index d76721d9a0a1..f8333a762308 100644 >> --- a/man7/user_namespaces.7 >> +++ b/man7/user_namespaces.7 >> @@ -533,11 +533,16 @@ One of the following is true: >> The data written to >> .I uid_map >> .RI ( gid_map ) >> -consists of a single line that maps the writing process's filesystem user ID >> +consists of a single line that maps the writing process's effective user ID >> (group ID) in the parent user namespace to a user ID (group ID) >> in the user namespace. >> -The usual case here is that this single line provides a mapping for user ID >> -of the process that created the namespace. >> +The writing process must have the same effective user ID as the process >> +that created the user namespace. >> +In the case of >> +.I gid_map >> +the >> +.I setgroups >> +file must have been written to earlier and disabled the setgroups system call. >> .IP * 3 >> The opening process has the >> .BR CAP_SETUID >> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error >> .\" >> .\" ============================================================ >> .\" >> +.SS Interaction with system calls that change the uid or gid values >> +When in a user namespace where the >> +.I uid_map >> +or >> +.I gid_map >> +file has not been written the system calls that change user IDs >> +or group IDs respectively will fail. After the >> +.I uid_map >> +and >> +.I gid_map >> +file have been written only the mapped values may be used in >> +system calls that change user IDs and group IDs. >> + >> +For user IDs these system calls include >> +.BR setuid , >> +.BR setfsuid , >> +.BR setreuid , >> +and >> +.BR setresuid . >> + >> +For group IDs these system calls include >> +.BR setgid , >> +.BR setfsgid , >> +.BR setregid , >> +.BR setresgid , >> +and >> +.BR setgroups. >> + >> +Writing >> +.BR deny >> +to the >> +.I /proc/[pid]/setgroups >> +file before writing to >> +.I /proc/[pid]/gid_map >> +will permanently disable the setgroups system call in a user namespace >> +and allow writing to >> +.I /proc/[pid]/gid_map >> +without >> +.BR CAP_SETGID >> +in the parent user namespace. > > I just want to double check: you really did mean to write "*parent* namespace" > above, right? > > Thanks, > > Michael > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/ -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2015-02-11 8:02 ` Michael Kerrisk (man-pages) @ 2015-02-11 14:01 ` Eric W. Biederman 2015-02-12 10:11 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2015-02-11 14:01 UTC (permalink / raw) To: mtk.manpages Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > Hi Eric, > > Ping! > > Cheers, > > Michael > > > On 2 February 2015 at 16:37, Michael Kerrisk (man-pages) > <mtk.manpages@gmail.com> wrote: >> Hi Eric, >> >> Thanks for writing this up! >> >> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>> >>> Files with access permissions such as ---rwx---rwx give fewer >>> permissions to their group then they do to everyone else. Which means >>> dropping groups with setgroups(0, NULL) actually grants a process >>> privileges. >>> >>> The uprivileged setting of gid_map turned out not to be safe after ^^^^^^^^^^^ unprivileged -- typo fix >>> this change. Privilege setting of gid_map can be interpreted as >>> meaning yes it is ok to drop groups. >> >> I had trouble to parse that sentence (and I'd like to make sure that >> the right sentence ends up in the commit message). Did you mean: >> >> "*Unprivileged* setting of gid_map can be interpreted as meaning >> yes it is ok to drop groups" >> ? >> >> Or something else? I meant: Setting of gid_map with privilege has been clarified to mean that dropping groups is ok. This allows existing programs that set gid_map with privilege to work without changes. That is newgidmap continues to work unchanged. >>> To prevent this problem and future problems user namespaces were >>> changed in such a way as to guarantee a user can not obtain >>> credentials without privilege they could not obtain without the >>> help of user namespaces. >>> >>> This meant testing the effective user ID and not the filesystem user >>> ID as setresuid and setregid allow setting any process uid or gid >>> (except the supplemental groups) to the effective ID. >>> >>> Furthermore to preserve in some form the useful applications that have >>> been setting gid_map without privilege the file /proc/[pid]/setgroups >>> was added to allow disabling setgroups. With the setgroups system >>> call permanently disabled in a user namespace it again becomes safe to >>> allow writes to gid_map without privilege. >>> >>> Here is my meager attempt to update user_namespaces.7 to reflect these >>> issues. >> >> It looked pretty serviceable as patch, IMO. So, thanks again. I've applied, >> tweaking some wordings afterward, but changing nothing essential. See below >> for a question. >> >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>> --- >>> man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++--- >>> 1 file changed, 49 insertions(+), 3 deletions(-) >>> >>> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 >>> index d76721d9a0a1..f8333a762308 100644 >>> --- a/man7/user_namespaces.7 >>> +++ b/man7/user_namespaces.7 >>> @@ -533,11 +533,16 @@ One of the following is true: >>> The data written to >>> .I uid_map >>> .RI ( gid_map ) >>> -consists of a single line that maps the writing process's filesystem user ID >>> +consists of a single line that maps the writing process's effective user ID >>> (group ID) in the parent user namespace to a user ID (group ID) >>> in the user namespace. >>> -The usual case here is that this single line provides a mapping for user ID >>> -of the process that created the namespace. >>> +The writing process must have the same effective user ID as the process >>> +that created the user namespace. >>> +In the case of >>> +.I gid_map >>> +the >>> +.I setgroups >>> +file must have been written to earlier and disabled the setgroups system call. >>> .IP * 3 >>> The opening process has the >>> .BR CAP_SETUID >>> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error >>> .\" >>> .\" ============================================================ >>> .\" >>> +.SS Interaction with system calls that change the uid or gid values >>> +When in a user namespace where the >>> +.I uid_map >>> +or >>> +.I gid_map >>> +file has not been written the system calls that change user IDs >>> +or group IDs respectively will fail. After the >>> +.I uid_map >>> +and >>> +.I gid_map >>> +file have been written only the mapped values may be used in >>> +system calls that change user IDs and group IDs. >>> + >>> +For user IDs these system calls include >>> +.BR setuid , >>> +.BR setfsuid , >>> +.BR setreuid , >>> +and >>> +.BR setresuid . >>> + >>> +For group IDs these system calls include >>> +.BR setgid , >>> +.BR setfsgid , >>> +.BR setregid , >>> +.BR setresgid , >>> +and >>> +.BR setgroups. >>> + >>> +Writing >>> +.BR deny >>> +to the >>> +.I /proc/[pid]/setgroups >>> +file before writing to >>> +.I /proc/[pid]/gid_map >>> +will permanently disable the setgroups system call in a user namespace >>> +and allow writing to >>> +.I /proc/[pid]/gid_map >>> +without >>> +.BR CAP_SETGID >>> +in the parent user namespace. >> >> I just want to double check: you really did mean to write "*parent* namespace" >> above, right? Yes. At this point only privilege in the *parent* user namespace is meaningful, as applications in the new user namespace have all privileges. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2015-02-11 14:01 ` Eric W. Biederman @ 2015-02-12 10:11 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-02-12 10:11 UTC (permalink / raw) To: Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski On 02/11/2015 03:01 PM, Eric W. Biederman wrote: > "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > >> Hi Eric, >> >> Ping! >> >> Cheers, >> >> Michael >> >> >> On 2 February 2015 at 16:37, Michael Kerrisk (man-pages) >> <mtk.manpages@gmail.com> wrote: >>> Hi Eric, >>> >>> Thanks for writing this up! >>> >>> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>>> >>>> Files with access permissions such as ---rwx---rwx give fewer >>>> permissions to their group then they do to everyone else. Which means >>>> dropping groups with setgroups(0, NULL) actually grants a process >>>> privileges. >>>> >>>> The uprivileged setting of gid_map turned out not to be safe after > ^^^^^^^^^^^ > unprivileged -- typo fix Thanks for confirming. >>>> this change. Privilege setting of gid_map can be interpreted as >>>> meaning yes it is ok to drop groups. >>> >>> I had trouble to parse that sentence (and I'd like to make sure that >>> the right sentence ends up in the commit message). Did you mean: >>> >>> "*Unprivileged* setting of gid_map can be interpreted as meaning >>> yes it is ok to drop groups" >>> ? >>> >>> Or something else? > > > I meant: Setting of gid_map with privilege has been clarified to mean > that dropping groups is ok. This allows existing programs that set > gid_map with privilege to work without changes. That is newgidmap > continues to work unchanged. Thanks. I added that text to the changelog message. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2014-12-12 21:54 ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman 2015-02-02 15:37 ` Michael Kerrisk (man-pages) @ 2015-02-02 21:31 ` Alban Crequy 2015-03-04 14:00 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 79+ messages in thread From: Alban Crequy @ 2015-02-02 21:31 UTC (permalink / raw) To: Eric W. Biederman Cc: Michael Kerrisk-manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski Hello, Thanks for updating the man page. On 12 December 2014 at 22:54, Eric W. Biederman <ebiederm@xmission.com> wrote: (...) > Furthermore to preserve in some form the useful applications that have > been setting gid_map without privilege the file /proc/[pid]/setgroups > was added to allow disabling setgroups. With the setgroups system > call permanently disabled in a user namespace it again becomes safe to > allow writes to gid_map without privilege. > > Here is my meager attempt to update user_namespaces.7 to reflect these > issues. The program userns_child_exec.c in user_namespaces.7 should be updated to write in /proc/.../setgroups, near the line: /* Update the UID and GID maps in the child */ Otherwise, the example given in the manpage does not work: $ ./userns_child_exec -p -m -U -M '0 1000 1' -G '0 1000 1' bash Cheers, Alban ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups 2015-02-02 21:31 ` Alban Crequy @ 2015-03-04 14:00 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 79+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-03-04 14:00 UTC (permalink / raw) To: Alban Crequy, Eric W. Biederman Cc: mtk.manpages, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable, Andy Lutomirski On 02/02/2015 10:31 PM, Alban Crequy wrote: > Hello, > > Thanks for updating the man page. > > On 12 December 2014 at 22:54, Eric W. Biederman <ebiederm@xmission.com> wrote: > (...) >> Furthermore to preserve in some form the useful applications that have >> been setting gid_map without privilege the file /proc/[pid]/setgroups >> was added to allow disabling setgroups. With the setgroups system >> call permanently disabled in a user namespace it again becomes safe to >> allow writes to gid_map without privilege. >> >> Here is my meager attempt to update user_namespaces.7 to reflect these >> issues. > > The program userns_child_exec.c in user_namespaces.7 should be updated > to write in /proc/.../setgroups, near the line: > /* Update the UID and GID maps in the child */ > > Otherwise, the example given in the manpage does not work: > $ ./userns_child_exec -p -m -U -M '0 1000 1' -G '0 1000 1' bash Thanks, Alban. I've added code to the example to handle /proc/PID/setgroups (and tested). Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (5 preceding siblings ...) 2014-12-09 20:42 ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-09 20:43 ` Eric W. Biederman 2014-12-10 16:39 ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman 7 siblings, 0 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-09 20:43 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index b507f9af7ff2..3b29b9a52332 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -826,6 +826,11 @@ static bool new_idmap_permitted(const struct file *file, kuid_t uid = make_kuid(ns->parent, id); if (uid_eq(uid, cred->euid)) return true; + } else if (cap_setid == CAP_SETGID) { + kgid_t gid = make_kgid(ns->parent, id); + if (!(ns->flags & USERNS_SETGROUPS_ALLOWED) && + gid_eq(gid, cred->egid)) + return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* [CFT] Can I get some Tested-By's on this series? 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (6 preceding siblings ...) 2014-12-09 20:43 ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman @ 2014-12-10 16:39 ` Eric W. Biederman 2014-12-10 22:48 ` Serge Hallyn 2014-12-16 2:05 ` Andy Lutomirski 7 siblings, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-10 16:39 UTC (permalink / raw) To: Andy Lutomirski, Serge E. Hallyn, Richard Weinberger Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Kenton Varda, stable Will people please test these patches with their container project? These changes break container userspace (hopefully in a minimal way) if I could have that confirmed by testing I would really appreciate it. I really don't want to send out a bug fix that accidentally breaks userspace again. The only issue sort of under discussion is if there is a better name for /proc/<pid>/setgroups, and the name of the file will not affect the functionality of the patchset. With the code reviewed and written in simple obviously correct, easily reviewable ways I am hoping/planning to send this to Linus ASAP. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 16:39 ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman @ 2014-12-10 22:48 ` Serge Hallyn 2014-12-10 22:50 ` Richard Weinberger ` (2 more replies) 2014-12-16 2:05 ` Andy Lutomirski 1 sibling, 3 replies; 79+ messages in thread From: Serge Hallyn @ 2014-12-10 22:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Andy Lutomirski, Serge E. Hallyn, Richard Weinberger, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Quoting Eric W. Biederman (ebiederm@xmission.com): > > Will people please test these patches with their container project? > > These changes break container userspace (hopefully in a minimal way) if > I could have that confirmed by testing I would really appreciate it. I > really don't want to send out a bug fix that accidentally breaks > userspace again. > > The only issue sort of under discussion is if there is a better name for > /proc/<pid>/setgroups, and the name of the file will not affect the > functionality of the patchset. > > With the code reviewed and written in simple obviously correct, easily > reviewable ways I am hoping/planning to send this to Linus ASAP. > > Eric Is there a git tree we can clone? ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 22:48 ` Serge Hallyn @ 2014-12-10 22:50 ` Richard Weinberger 2014-12-10 23:19 ` Eric W. Biederman 2014-12-13 22:31 ` serge [not found] ` <87lhmcy2et.fsf@x220.int.ebiederm.org> 2 siblings, 1 reply; 79+ messages in thread From: Richard Weinberger @ 2014-12-10 22:50 UTC (permalink / raw) To: Serge Hallyn, Eric W. Biederman Cc: Andy Lutomirski, Serge E. Hallyn, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Am 10.12.2014 um 23:48 schrieb Serge Hallyn: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> >> Will people please test these patches with their container project? >> >> These changes break container userspace (hopefully in a minimal way) if >> I could have that confirmed by testing I would really appreciate it. I >> really don't want to send out a bug fix that accidentally breaks >> userspace again. >> >> The only issue sort of under discussion is if there is a better name for >> /proc/<pid>/setgroups, and the name of the file will not affect the >> functionality of the patchset. >> >> With the code reviewed and written in simple obviously correct, easily >> reviewable ways I am hoping/planning to send this to Linus ASAP. >> >> Eric > > Is there a git tree we can clone? I was about to ask that too. Hopefully I can tomorrow find some time for testing. Thanks, //richard ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 22:50 ` Richard Weinberger @ 2014-12-10 23:19 ` Eric W. Biederman 2014-12-11 19:27 ` Richard Weinberger 2014-12-12 6:56 ` Chen, Hanxiao 0 siblings, 2 replies; 79+ messages in thread From: Eric W. Biederman @ 2014-12-10 23:19 UTC (permalink / raw) To: Richard Weinberger Cc: Serge Hallyn, Andy Lutomirski, Serge E. Hallyn, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Richard Weinberger <richard@nod.at> writes: > Am 10.12.2014 um 23:48 schrieb Serge Hallyn: >> Quoting Eric W. Biederman (ebiederm@xmission.com): >>> >>> Will people please test these patches with their container project? >>> >>> These changes break container userspace (hopefully in a minimal way) if >>> I could have that confirmed by testing I would really appreciate it. I >>> really don't want to send out a bug fix that accidentally breaks >>> userspace again. >>> >>> The only issue sort of under discussion is if there is a better name for >>> /proc/<pid>/setgroups, and the name of the file will not affect the >>> functionality of the patchset. >>> >>> With the code reviewed and written in simple obviously correct, easily >>> reviewable ways I am hoping/planning to send this to Linus ASAP. >>> >>> Eric >> >> Is there a git tree we can clone? > > I was about to ask that too. > Hopefully I can tomorrow find some time for testing. git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing That holds my entire queue of fixes against 3.18-rc6 Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 23:19 ` Eric W. Biederman @ 2014-12-11 19:27 ` Richard Weinberger 2014-12-12 6:56 ` Chen, Hanxiao 1 sibling, 0 replies; 79+ messages in thread From: Richard Weinberger @ 2014-12-11 19:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge Hallyn, Andy Lutomirski, Serge E. Hallyn, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Am 11.12.2014 um 00:19 schrieb Eric W. Biederman: > Richard Weinberger <richard@nod.at> writes: > >> Am 10.12.2014 um 23:48 schrieb Serge Hallyn: >>> Quoting Eric W. Biederman (ebiederm@xmission.com): >>>> >>>> Will people please test these patches with their container project? >>>> >>>> These changes break container userspace (hopefully in a minimal way) if >>>> I could have that confirmed by testing I would really appreciate it. I >>>> really don't want to send out a bug fix that accidentally breaks >>>> userspace again. >>>> >>>> The only issue sort of under discussion is if there is a better name for >>>> /proc/<pid>/setgroups, and the name of the file will not affect the >>>> functionality of the patchset. >>>> >>>> With the code reviewed and written in simple obviously correct, easily >>>> reviewable ways I am hoping/planning to send this to Linus ASAP. >>>> >>>> Eric >>> >>> Is there a git tree we can clone? >> >> I was about to ask that too. >> Hopefully I can tomorrow find some time for testing. > > git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > > That holds my entire queue of fixes against 3.18-rc6 So far nothing broke on my libvirt-lxc test bed. :-) Tested with openSUSE 13.2 and libvirt 1.2.9. Tested-by: Richard Weinberger <richard@nod.at> Thanks, //richard ^ permalink raw reply [flat|nested] 79+ messages in thread
* RE: [CFT] Can I get some Tested-By's on this series? 2014-12-10 23:19 ` Eric W. Biederman 2014-12-11 19:27 ` Richard Weinberger @ 2014-12-12 6:56 ` Chen, Hanxiao 1 sibling, 0 replies; 79+ messages in thread From: Chen, Hanxiao @ 2014-12-12 6:56 UTC (permalink / raw) To: Eric W. Biederman, Richard Weinberger Cc: linux-man, Kees Cook, Linux API, Linux Containers, Serge Hallyn, Josh Triplett, stable, Andy Lutomirski, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton, linux-kernel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="gb2312", Size: 1968 bytes --] > -----Original Message----- > From: containers-bounces@lists.linux-foundation.org > [mailto:containers-bounces@lists.linux-foundation.org] On Behalf Of Eric W. > Biederman > Sent: Thursday, December 11, 2014 7:20 AM > To: Richard Weinberger > Cc: linux-man; Kees Cook; Linux API; Linux Containers; Serge Hallyn; Josh Triplett; > stable; Andy Lutomirski; Kenton Varda; LSM; Michael Kerrisk-manpages; Casey > Schaufler; Andrew Morton; linux-kernel@vger.kernel.org > Subject: Re: [CFT] Can I get some Tested-By's on this series? > > Richard Weinberger <richard@nod.at> writes: > > > Am 10.12.2014 um 23:48 schrieb Serge Hallyn: > >> Quoting Eric W. Biederman (ebiederm@xmission.com): > >>> > >>> Will people please test these patches with their container project? > >>> > >>> These changes break container userspace (hopefully in a minimal way) if > >>> I could have that confirmed by testing I would really appreciate it. I > >>> really don't want to send out a bug fix that accidentally breaks > >>> userspace again. > >>> > >>> The only issue sort of under discussion is if there is a better name for > >>> /proc/<pid>/setgroups, and the name of the file will not affect the > >>> functionality of the patchset. > >>> > >>> With the code reviewed and written in simple obviously correct, easily > >>> reviewable ways I am hoping/planning to send this to Linus ASAP. > >>> > >>> Eric > >> > >> Is there a git tree we can clone? > > > > I was about to ask that too. > > Hopefully I can tomorrow find some time for testing. > > git pull git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git > for-testing > > That holds my entire queue of fixes against 3.18-rc6 > Tested on Fedora20 with libvirt 1.2.11, works fine. Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Thanks, - Chen ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 22:48 ` Serge Hallyn 2014-12-10 22:50 ` Richard Weinberger @ 2014-12-13 22:31 ` serge [not found] ` <87lhmcy2et.fsf@x220.int.ebiederm.org> 2 siblings, 0 replies; 79+ messages in thread From: serge @ 2014-12-13 22:31 UTC (permalink / raw) To: Eric W. Biederman, Serge Hallyn Cc: Andy Lutomirski, Richard Weinberger, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton sorry, I've only been back from the road the days... Two tries at compiling have failed (infrastructure problems, not your set), hoping to fire of another build tonight.On 12/10/14 16:48 Serge Hallyn wrote: Quoting Eric W. Biederman (ebiederm@xmission.com): > > Will people please test these patches with their container project? > > These changes break container userspace (hopefully in a minimal way) if > I could have that confirmed by testing I would really appreciate it. I > really don't want to send out a bug fix that accidentally breaks > userspace again. > > The only issue sort of under discussion is if there is a better name for > /proc/<pid>/setgroups, and the name of the file will not affect the > functionality of the patchset. > > With the code reviewed and written in simple obviously correct, easily > reviewable ways I am hoping/planning to send this to Linus ASAP. > > Eric Is there a git tree we can clone? ^ permalink raw reply [flat|nested] 79+ messages in thread
[parent not found: <87lhmcy2et.fsf@x220.int.ebiederm.org>]
[parent not found: <20141212220840.GF22091@castiana.ipv6.teksavvy.com>]
[parent not found: <8761dgze56.fsf@x220.int.ebiederm.org>]
* Re: [CFT] Can I get some Tested-By's on this series? [not found] ` <8761dgze56.fsf@x220.int.ebiederm.org> @ 2014-12-15 19:38 ` Serge Hallyn 2014-12-15 20:11 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Serge Hallyn @ 2014-12-15 19:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Stéphane Graber, Richard Weinberger, Serge Hallyn, Andy Lutomirski, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Quoting Eric W. Biederman (ebiederm@xmission.com): > Stéphane Graber <stgraber@ubuntu.com> writes: > > > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote: > >> Serge Hallyn <serge.hallyn@ubuntu.com> writes: > >> > >> > Quoting Eric W. Biederman (ebiederm@xmission.com): > >> >> > >> >> Will people please test these patches with their container project? > >> >> > >> >> These changes break container userspace (hopefully in a minimal way) if > >> >> I could have that confirmed by testing I would really appreciate it. I > >> >> really don't want to send out a bug fix that accidentally breaks > >> >> userspace again. > >> >> > >> >> The only issue sort of under discussion is if there is a better name for > >> >> /proc/<pid>/setgroups, and the name of the file will not affect the > >> >> functionality of the patchset. > >> >> > >> >> With the code reviewed and written in simple obviously correct, easily > >> >> reviewable ways I am hoping/planning to send this to Linus ASAP. > >> >> > >> >> Eric > >> > > >> > Is there a git tree we can clone? > >> > >> Have either of you been able to check to see if any of my changes > >> affects lxc? > >> > >> I am trying to gauge how hard and how fast I should push to Linus. lxc > >> being the largest adopter of unprivileged user namespaces for general > >> purpose containers. > >> > >> I expect you just call newuidmap and newgidmap and don't actually care > >> about not being able to set gid_map without privilege. But I really > >> want to avoid pushing a security fix and then being surprised that > >> things like lxc break. > >> > >> Eric > > > > Hi Eric, > > > > I've unfortunately been pretty busy this week as I was (well, still am) > > travelling to South Africa for a meeting. I don't have a full kernel > > tree around here and a full git clone isn't really doable over the kind > > of Internet I've got here :) > > > > Hopefully Serge can give it a quick try, otherwise I should be able to > > do some tests on Tuesday when I'm back home. > > I thought Serge was going to but I haven't heard yet so I am prodding ;-) Ok, thanks - yes, unprivileged lxc is working fine with your kernels. Just to be sure I was testing the right thing I also tested using my unprivileged nsexec testcases, and they failed on setgroup/setgid as now expected, and succeeded there without your patches. thanks, -serge ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-15 19:38 ` Serge Hallyn @ 2014-12-15 20:11 ` Eric W. Biederman 2014-12-15 20:49 ` Serge Hallyn 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-15 20:11 UTC (permalink / raw) To: Serge Hallyn Cc: Stéphane Graber, Richard Weinberger, Andy Lutomirski, linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett, stable, linux-kernel, Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton Serge Hallyn <serge.hallyn@ubuntu.com> writes: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> Stéphane Graber <stgraber@ubuntu.com> writes: >> >> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote: >> >> Serge Hallyn <serge.hallyn@ubuntu.com> writes: >> >> >> >> > Quoting Eric W. Biederman (ebiederm@xmission.com): >> >> >> >> >> >> Will people please test these patches with their container project? >> >> >> >> >> >> These changes break container userspace (hopefully in a minimal way) if >> >> >> I could have that confirmed by testing I would really appreciate it. I >> >> >> really don't want to send out a bug fix that accidentally breaks >> >> >> userspace again. >> >> >> >> >> >> The only issue sort of under discussion is if there is a better name for >> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the >> >> >> functionality of the patchset. >> >> >> >> >> >> With the code reviewed and written in simple obviously correct, easily >> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP. >> >> >> >> >> >> Eric >> >> > >> >> > Is there a git tree we can clone? >> >> >> >> Have either of you been able to check to see if any of my changes >> >> affects lxc? >> >> >> >> I am trying to gauge how hard and how fast I should push to Linus. lxc >> >> being the largest adopter of unprivileged user namespaces for general >> >> purpose containers. >> >> >> >> I expect you just call newuidmap and newgidmap and don't actually care >> >> about not being able to set gid_map without privilege. But I really >> >> want to avoid pushing a security fix and then being surprised that >> >> things like lxc break. >> >> >> >> Eric >> > >> > Hi Eric, >> > >> > I've unfortunately been pretty busy this week as I was (well, still am) >> > travelling to South Africa for a meeting. I don't have a full kernel >> > tree around here and a full git clone isn't really doable over the kind >> > of Internet I've got here :) >> > >> > Hopefully Serge can give it a quick try, otherwise I should be able to >> > do some tests on Tuesday when I'm back home. >> >> I thought Serge was going to but I haven't heard yet so I am prodding ;-) > > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. Thanks. Serge unless you object will add your Tested-By to my pull message to Linus. Minor question do you runprivileged nsexec test cases test to see if the write to gid_map succeeds? I would have expected the gid_map write to fail before the setgroups setgid system calls came into play. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-15 20:11 ` Eric W. Biederman @ 2014-12-15 20:49 ` Serge Hallyn 0 siblings, 0 replies; 79+ messages in thread From: Serge Hallyn @ 2014-12-15 20:49 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-man, Kees Cook, Richard Weinberger, Linux Containers, Josh Triplett, stable, Andy Lutomirski, Kenton Varda, LSM, Michael Kerrisk-manpages, Linux API, Casey Schaufler, Andrew Morton, linux-kernel Quoting Eric W. Biederman (ebiederm@xmission.com): > Serge Hallyn <serge.hallyn@ubuntu.com> writes: > > > Quoting Eric W. Biederman (ebiederm@xmission.com): > >> Stéphane Graber <stgraber@ubuntu.com> writes: > >> > >> > On Fri, Dec 12, 2014 at 03:38:18PM -0600, Eric W. Biederman wrote: > >> >> Serge Hallyn <serge.hallyn@ubuntu.com> writes: > >> >> > >> >> > Quoting Eric W. Biederman (ebiederm@xmission.com): > >> >> >> > >> >> >> Will people please test these patches with their container project? > >> >> >> > >> >> >> These changes break container userspace (hopefully in a minimal way) if > >> >> >> I could have that confirmed by testing I would really appreciate it. I > >> >> >> really don't want to send out a bug fix that accidentally breaks > >> >> >> userspace again. > >> >> >> > >> >> >> The only issue sort of under discussion is if there is a better name for > >> >> >> /proc/<pid>/setgroups, and the name of the file will not affect the > >> >> >> functionality of the patchset. > >> >> >> > >> >> >> With the code reviewed and written in simple obviously correct, easily > >> >> >> reviewable ways I am hoping/planning to send this to Linus ASAP. > >> >> >> > >> >> >> Eric > >> >> > > >> >> > Is there a git tree we can clone? > >> >> > >> >> Have either of you been able to check to see if any of my changes > >> >> affects lxc? > >> >> > >> >> I am trying to gauge how hard and how fast I should push to Linus. lxc > >> >> being the largest adopter of unprivileged user namespaces for general > >> >> purpose containers. > >> >> > >> >> I expect you just call newuidmap and newgidmap and don't actually care > >> >> about not being able to set gid_map without privilege. But I really > >> >> want to avoid pushing a security fix and then being surprised that > >> >> things like lxc break. > >> >> > >> >> Eric > >> > > >> > Hi Eric, > >> > > >> > I've unfortunately been pretty busy this week as I was (well, still am) > >> > travelling to South Africa for a meeting. I don't have a full kernel > >> > tree around here and a full git clone isn't really doable over the kind > >> > of Internet I've got here :) > >> > > >> > Hopefully Serge can give it a quick try, otherwise I should be able to > >> > do some tests on Tuesday when I'm back home. > >> > >> I thought Serge was going to but I haven't heard yet so I am prodding ;-) > > > > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > > Just to be sure I was testing the right thing I also tested using > > my unprivileged nsexec testcases, and they failed on setgroup/setgid > > as now expected, and succeeded there without your patches. > > Thanks. > > Serge unless you object will add your Tested-By to my pull message to Linus. Sounds good. > Minor question do you runprivileged nsexec test cases test to see if the > write to gid_map succeeds? I would have expected the gid_map write to > fail before the setgroups setgid system calls came into play. Yes, I did that by hand, and it failed (with your kernel). -serge ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-10 16:39 ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman 2014-12-10 22:48 ` Serge Hallyn @ 2014-12-16 2:05 ` Andy Lutomirski 2014-12-16 9:23 ` Richard Weinberger 1 sibling, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-16 2:05 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Richard Weinberger, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Kenton Varda, stable On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Will people please test these patches with their container project? > > These changes break container userspace (hopefully in a minimal way) if > I could have that confirmed by testing I would really appreciate it. I > really don't want to send out a bug fix that accidentally breaks > userspace again. > > The only issue sort of under discussion is if there is a better name for > /proc/<pid>/setgroups, and the name of the file will not affect the > functionality of the patchset. > > With the code reviewed and written in simple obviously correct, easily > reviewable ways I am hoping/planning to send this to Linus ASAP. I tested this with Sandstorm. It breaks as is and it works if I add the setgroups thing. Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :( I still don't like the name "setgroups". --Andy > > Eric -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT] Can I get some Tested-By's on this series? 2014-12-16 2:05 ` Andy Lutomirski @ 2014-12-16 9:23 ` Richard Weinberger 0 siblings, 0 replies; 79+ messages in thread From: Richard Weinberger @ 2014-12-16 9:23 UTC (permalink / raw) To: Andy Lutomirski, Eric W. Biederman Cc: Serge E. Hallyn, Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Kenton Varda, stable Am 16.12.2014 um 03:05 schrieb Andy Lutomirski: > On Wed, Dec 10, 2014 at 8:39 AM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> Will people please test these patches with their container project? >> >> These changes break container userspace (hopefully in a minimal way) if >> I could have that confirmed by testing I would really appreciate it. I >> really don't want to send out a bug fix that accidentally breaks >> userspace again. >> >> The only issue sort of under discussion is if there is a better name for >> /proc/<pid>/setgroups, and the name of the file will not affect the >> functionality of the patchset. >> >> With the code reviewed and written in simple obviously correct, easily >> reviewable ways I am hoping/planning to send this to Linus ASAP. > > > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :( > > I still don't like the name "setgroups". I agree that the name is not optimal. But I don't have a counterproposal as my naming skills are miserable. Thanks, //richard ^ permalink raw reply [flat|nested] 79+ messages in thread
* [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman ` (4 preceding siblings ...) 2014-12-08 22:11 ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-08 22:14 ` Eric W. Biederman 2014-12-08 22:26 ` Andy Lutomirski 5 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-08 22:14 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivilege setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling of setgroups before they run. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/user_namespace.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 3d128f91ced3..459c7f647072 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file, kuid_t uid = make_kuid(ns->parent, id); if (uid_eq(uid, cred->euid)) return true; + } else if (cap_setid == CAP_SETGID) { + kgid_t gid = make_kgid(ns->parent, id); + if (!userns_setgroups_allowed(ns) && + gid_eq(gid, cred->egid)) + return true; } } -- 1.9.1 ^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled 2014-12-08 22:14 ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman @ 2014-12-08 22:26 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-08 22:26 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Mon, Dec 8, 2014 at 2:14 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Now that setgroups can be disabled and not reenabled, setting gid_map > without privielge can now be enabled when setgroups is disabled. > > This restores most of the functionality that was lost when unprivilege unprivileged. > setting of gid_map was removed. Applications that use this > functionality will need to check to see if they use setgroups or > init_groups, and if they don't they can be fixed by simply > disabling of setgroups before they run. "disabling setgroups before writing to gid_map"? The code is: Reviewed-by: Andy Lutomirski <luto@amacapital.net> > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > kernel/user_namespace.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 3d128f91ced3..459c7f647072 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file, > kuid_t uid = make_kuid(ns->parent, id); > if (uid_eq(uid, cred->euid)) > return true; > + } else if (cap_setid == CAP_SETGID) { > + kgid_t gid = make_kgid(ns->parent, id); > + if (!userns_setgroups_allowed(ns) && > + gid_eq(gid, cred->egid)) > + return true; > } > } > > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 20:25 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman 2014-12-02 20:28 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman @ 2014-12-02 20:58 ` Andy Lutomirski 2014-12-02 21:26 ` Eric W. Biederman 1 sibling, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 20:58 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Classic unix permission checks have an interesting feature, the group > permissions for a file can be set to less than the other permissions > on a file. Occassionally this is used deliberately to give a certain > group of users fewer permissions than the default. > > Overlooking negative groups has resulted in the permission checks for > setting up a group mapping in a user namespace to be too lax. Tighten > the permission checks in new_idmap_permitted to ensure that mapping > uids and gids into user namespaces without privilege will not result > in new combinations of credentials being available to the users. > > When setting mappings without privilege only the creator of the user > namespace is interesting as all other users that have CAP_SETUID over > the user namespace will also have CAP_SETUID over the user namespaces > parent. So the scope of the unprivileged check is reduced to just > the case where cred->euid is the namespace creator. > > For setting a uid mapping without privilege only euid is considered as > setresuid can set uid, suid and fsuid from euid without privielege > making any combination of uids possible with user namespaces already > possible without them. > > For now seeting a gid mapping without privilege is removed. The only > possible set of credentials that would be safe without a gid mapping > (egid without any supplementary groups) just doesn't happen in practice > so would simply lead to unused untested code. > > setgroups is modified to fail not only when the group ids do not > map but also when there are no gid mappings at all, preventing > setgroups(0, NULL) from succeeding when gid mappings have not been > established. > > For a small class of applications this change breaks userspace > and removes useful functionality. This small class of applications > includes tools/testing/selftests/mount/unprivileged-remount-test.c > > Most of the removed functionality will be added back with the > addition of a one way knob to disable setgroups. Once setgroups > is disabled setting the gid_map becomes as safe as setting the uid_map. > > For more common applications that set the uid_map and the gid_map with > privilege this change will have no effect on them. > > This should fix CVE-2014-8989. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > > +static inline bool gid_mapping_possible(const struct user_namespace *ns) > +{ > + return ns->gid_map.nr_extents != 0; > +} > + Can you rename this to userns_may_setgroups or something like that? To me, gid_mapping_possible sounds like you're allowed to map gids, which sounds like the opposite condition, and it doesn't explain what the point is. > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index aa312b0dc3ec..51d65b444951 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, > struct user_namespace *ns, int cap_setid, > struct uid_gid_map *new_map) > { > - /* Allow mapping to your own filesystem ids */ > - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { > + const struct cred *cred = file->f_cred; > + > + /* Allow a mapping without capabilities when allowing the root > + * of the user namespace capabilities restricted to that id > + * will not change the set of credentials available to that > + * user. > + */ > + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && > + uid_eq(ns->owner, cred->euid)) { What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) goto out; (except that I don't know why cap_valid(cap_setid) is checked -- this ought to be enforced for projid_map, too, right?) > u32 id = new_map->extent[0].lower_first; > if (cap_setid == CAP_SETUID) { > kuid_t uid = make_kuid(ns->parent, id); > - if (uid_eq(uid, file->f_cred->fsuid)) > - return true; > - } else if (cap_setid == CAP_SETGID) { > - kgid_t gid = make_kgid(ns->parent, id); > - if (gid_eq(gid, file->f_cred->fsgid)) > + if (uid_eq(uid, cred->euid)) Why'd you change this from fsuid to euid? --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 20:58 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski @ 2014-12-02 21:26 ` Eric W. Biederman 2014-12-02 22:09 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 21:26 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> Classic unix permission checks have an interesting feature, the group >> permissions for a file can be set to less than the other permissions >> on a file. Occassionally this is used deliberately to give a certain >> group of users fewer permissions than the default. >> >> Overlooking negative groups has resulted in the permission checks for >> setting up a group mapping in a user namespace to be too lax. Tighten >> the permission checks in new_idmap_permitted to ensure that mapping >> uids and gids into user namespaces without privilege will not result >> in new combinations of credentials being available to the users. >> >> When setting mappings without privilege only the creator of the user >> namespace is interesting as all other users that have CAP_SETUID over >> the user namespace will also have CAP_SETUID over the user namespaces >> parent. So the scope of the unprivileged check is reduced to just >> the case where cred->euid is the namespace creator. >> >> For setting a uid mapping without privilege only euid is considered as >> setresuid can set uid, suid and fsuid from euid without privielege >> making any combination of uids possible with user namespaces already >> possible without them. >> >> For now seeting a gid mapping without privilege is removed. The only >> possible set of credentials that would be safe without a gid mapping >> (egid without any supplementary groups) just doesn't happen in practice >> so would simply lead to unused untested code. >> >> setgroups is modified to fail not only when the group ids do not >> map but also when there are no gid mappings at all, preventing >> setgroups(0, NULL) from succeeding when gid mappings have not been >> established. >> >> For a small class of applications this change breaks userspace >> and removes useful functionality. This small class of applications >> includes tools/testing/selftests/mount/unprivileged-remount-test.c >> >> Most of the removed functionality will be added back with the >> addition of a one way knob to disable setgroups. Once setgroups >> is disabled setting the gid_map becomes as safe as setting the uid_map. >> >> For more common applications that set the uid_map and the gid_map with >> privilege this change will have no effect on them. >> >> This should fix CVE-2014-8989. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- > >> >> +static inline bool gid_mapping_possible(const struct user_namespace *ns) >> +{ >> + return ns->gid_map.nr_extents != 0; >> +} >> + > > Can you rename this to userns_may_setgroups or something like that? > To me, gid_mapping_possible sounds like you're allowed to map gids, > which sounds like the opposite condition, and it doesn't explain what > the point is. gid_mapping_established? What I mean to be testing if is if from_kgid and make_kgid will work because the gid mappings have been set. The userns knob for setgroups is a different test and is added in the next patch. And yes we really need both or the knob can start out as on, and we need to provent setgroups(0, NULL) before the user namespace is unshared. Although come to think about it probably makes sense to roll those two test into one function and call that inline function from the setgroups implementation. Anyway I will think about it and see what I can do to make it easily comprehensible. >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >> index aa312b0dc3ec..51d65b444951 100644 >> --- a/kernel/user_namespace.c >> +++ b/kernel/user_namespace.c >> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, >> struct user_namespace *ns, int cap_setid, >> struct uid_gid_map *new_map) >> { >> - /* Allow mapping to your own filesystem ids */ >> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { >> + const struct cred *cred = file->f_cred; >> + >> + /* Allow a mapping without capabilities when allowing the root >> + * of the user namespace capabilities restricted to that id >> + * will not change the set of credentials available to that >> + * user. >> + */ >> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && >> + uid_eq(ns->owner, cred->euid)) { > > What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: This means that the only user we attempt to set up unprivileged mappings for is the owner of the user namespace. Anyone else should already have capabilities in the parent user namespace or shouldn't be able to set the mapping at all. In practice it is a clarification to make it easier to think about the code. > if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) > goto out; > > (except that I don't know why cap_valid(cap_setid) is checked -- this > ought to be enforced for projid_map, too, right?) What to do with projid_map is entirely different discussion. In practice it is dead, and either XFS needs to be fixed to use it or that code needs to be removed. At the time I wrote it XFS did not require any privileges to set project ids. >> u32 id = new_map->extent[0].lower_first; >> if (cap_setid == CAP_SETUID) { >> kuid_t uid = make_kuid(ns->parent, id); >> - if (uid_eq(uid, file->f_cred->fsuid)) >> - return true; >> - } else if (cap_setid == CAP_SETGID) { >> - kgid_t gid = make_kgid(ns->parent, id); >> - if (gid_eq(gid, file->f_cred->fsgid)) >> + if (uid_eq(uid, cred->euid)) > > Why'd you change this from fsuid to euid? Because strangely enough I can set euid to any other uid with setresuid, but the same does not hold with fsuid. So strictly speaking fsuid was actually wrong before. In practice fsuid == euid so I don't think anyone will care. But I want very much to enforce the rule that user namespaces can't give you any credentials you couldn't get otherwise. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 21:26 ` Eric W. Biederman @ 2014-12-02 22:09 ` Andy Lutomirski 2014-12-02 22:48 ` Eric W. Biederman 0 siblings, 1 reply; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 22:09 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: >>> >>> Classic unix permission checks have an interesting feature, the group >>> permissions for a file can be set to less than the other permissions >>> on a file. Occassionally this is used deliberately to give a certain >>> group of users fewer permissions than the default. >>> >>> Overlooking negative groups has resulted in the permission checks for >>> setting up a group mapping in a user namespace to be too lax. Tighten >>> the permission checks in new_idmap_permitted to ensure that mapping >>> uids and gids into user namespaces without privilege will not result >>> in new combinations of credentials being available to the users. >>> >>> When setting mappings without privilege only the creator of the user >>> namespace is interesting as all other users that have CAP_SETUID over >>> the user namespace will also have CAP_SETUID over the user namespaces >>> parent. So the scope of the unprivileged check is reduced to just >>> the case where cred->euid is the namespace creator. >>> >>> For setting a uid mapping without privilege only euid is considered as >>> setresuid can set uid, suid and fsuid from euid without privielege >>> making any combination of uids possible with user namespaces already >>> possible without them. >>> >>> For now seeting a gid mapping without privilege is removed. The only >>> possible set of credentials that would be safe without a gid mapping >>> (egid without any supplementary groups) just doesn't happen in practice >>> so would simply lead to unused untested code. >>> >>> setgroups is modified to fail not only when the group ids do not >>> map but also when there are no gid mappings at all, preventing >>> setgroups(0, NULL) from succeeding when gid mappings have not been >>> established. >>> >>> For a small class of applications this change breaks userspace >>> and removes useful functionality. This small class of applications >>> includes tools/testing/selftests/mount/unprivileged-remount-test.c >>> >>> Most of the removed functionality will be added back with the >>> addition of a one way knob to disable setgroups. Once setgroups >>> is disabled setting the gid_map becomes as safe as setting the uid_map. >>> >>> For more common applications that set the uid_map and the gid_map with >>> privilege this change will have no effect on them. >>> >>> This should fix CVE-2014-8989. >>> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>> --- >> >>> >>> +static inline bool gid_mapping_possible(const struct user_namespace *ns) >>> +{ >>> + return ns->gid_map.nr_extents != 0; >>> +} >>> + >> >> Can you rename this to userns_may_setgroups or something like that? >> To me, gid_mapping_possible sounds like you're allowed to map gids, >> which sounds like the opposite condition, and it doesn't explain what >> the point is. > > gid_mapping_established? > > What I mean to be testing if is if from_kgid and make_kgid will work > because the gid mappings have been set. But why do you care whether from_kgid and make_kgid will work? If they fail, then they fail. I think that the point is that you're checking whether allowing setgroups to drop groups is safe, and that's only barely the same condition. > > The userns knob for setgroups is a different test and is added > in the next patch. And yes we really need both or the knob can > start out as on, and we need to provent setgroups(0, NULL) > before the user namespace is unshared. Do you mean before it's mapped? > > Although come to think about it probably makes sense to roll those two > test into one function and call that inline function from the setgroups > implementation. That's what I think, too. > > Anyway I will think about it and see what I can do to make it easily > comprehensible. > >>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >>> index aa312b0dc3ec..51d65b444951 100644 >>> --- a/kernel/user_namespace.c >>> +++ b/kernel/user_namespace.c >>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, >>> struct user_namespace *ns, int cap_setid, >>> struct uid_gid_map *new_map) >>> { >>> - /* Allow mapping to your own filesystem ids */ >>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { >>> + const struct cred *cred = file->f_cred; >>> + >>> + /* Allow a mapping without capabilities when allowing the root >>> + * of the user namespace capabilities restricted to that id >>> + * will not change the set of credentials available to that >>> + * user. >>> + */ >>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && >>> + uid_eq(ns->owner, cred->euid)) { >> >> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: > > This means that the only user we attempt to set up unprivileged mappings > for is the owner of the user namespace. Anyone else should already > have capabilities in the parent user namespace or shouldn't be able to > set the mapping at all. > > In practice it is a clarification to make it easier to think about the code. But why? I don't see why this check is necessary or why it's relevant to the current issue. > >> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) >> goto out; >> >> (except that I don't know why cap_valid(cap_setid) is checked -- this >> ought to be enforced for projid_map, too, right?) > > What to do with projid_map is entirely different discussion. In > practice it is dead, and either XFS needs to be fixed to use it > or that code needs to be removed. At the time I wrote it XFS > did not require any privileges to set project ids. > >>> u32 id = new_map->extent[0].lower_first; >>> if (cap_setid == CAP_SETUID) { >>> kuid_t uid = make_kuid(ns->parent, id); >>> - if (uid_eq(uid, file->f_cred->fsuid)) >>> - return true; >>> - } else if (cap_setid == CAP_SETGID) { >>> - kgid_t gid = make_kgid(ns->parent, id); >>> - if (gid_eq(gid, file->f_cred->fsgid)) >>> + if (uid_eq(uid, cred->euid)) >> >> Why'd you change this from fsuid to euid? > > Because strangely enough I can set euid to any other uid with > setresuid, but the same does not hold with fsuid. > > So strictly speaking fsuid was actually wrong before. In practice > fsuid == euid so I don't think anyone will care. But I want very much > to enforce the rule that user namespaces can't give you any credentials > you couldn't get otherwise. Fair enough. Want to split that into a separate patch, then? --Andy > > Eric -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 22:09 ` Andy Lutomirski @ 2014-12-02 22:48 ` Eric W. Biederman 2014-12-02 22:56 ` Andy Lutomirski 0 siblings, 1 reply; 79+ messages in thread From: Eric W. Biederman @ 2014-12-02 22:48 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable Andy Lutomirski <luto@amacapital.net> writes: > On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> Andy Lutomirski <luto@amacapital.net> writes: >> >>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman >>> <ebiederm@xmission.com> wrote: >>>> >>>> Classic unix permission checks have an interesting feature, the group >>>> permissions for a file can be set to less than the other permissions >>>> on a file. Occassionally this is used deliberately to give a certain >>>> group of users fewer permissions than the default. >>>> >>>> Overlooking negative groups has resulted in the permission checks for >>>> setting up a group mapping in a user namespace to be too lax. Tighten >>>> the permission checks in new_idmap_permitted to ensure that mapping >>>> uids and gids into user namespaces without privilege will not result >>>> in new combinations of credentials being available to the users. >>>> >>>> When setting mappings without privilege only the creator of the user >>>> namespace is interesting as all other users that have CAP_SETUID over >>>> the user namespace will also have CAP_SETUID over the user namespaces >>>> parent. So the scope of the unprivileged check is reduced to just >>>> the case where cred->euid is the namespace creator. >>>> >>>> For setting a uid mapping without privilege only euid is considered as >>>> setresuid can set uid, suid and fsuid from euid without privielege >>>> making any combination of uids possible with user namespaces already >>>> possible without them. >>>> >>>> For now seeting a gid mapping without privilege is removed. The only >>>> possible set of credentials that would be safe without a gid mapping >>>> (egid without any supplementary groups) just doesn't happen in practice >>>> so would simply lead to unused untested code. >>>> >>>> setgroups is modified to fail not only when the group ids do not >>>> map but also when there are no gid mappings at all, preventing >>>> setgroups(0, NULL) from succeeding when gid mappings have not been >>>> established. >>>> >>>> For a small class of applications this change breaks userspace >>>> and removes useful functionality. This small class of applications >>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c >>>> >>>> Most of the removed functionality will be added back with the >>>> addition of a one way knob to disable setgroups. Once setgroups >>>> is disabled setting the gid_map becomes as safe as setting the uid_map. >>>> >>>> For more common applications that set the uid_map and the gid_map with >>>> privilege this change will have no effect on them. >>>> >>>> This should fix CVE-2014-8989. >>>> >>>> Cc: stable@vger.kernel.org >>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>>> --- >>> >>>> >>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns) >>>> +{ >>>> + return ns->gid_map.nr_extents != 0; >>>> +} >>>> + >>> >>> Can you rename this to userns_may_setgroups or something like that? >>> To me, gid_mapping_possible sounds like you're allowed to map gids, >>> which sounds like the opposite condition, and it doesn't explain what >>> the point is. >> >> gid_mapping_established? >> >> What I mean to be testing if is if from_kgid and make_kgid will work >> because the gid mappings have been set. > > But why do you care whether from_kgid and make_kgid will work? If > they fail, then they fail. I think that the point is that you're > checking whether allowing setgroups to drop groups is safe, and that's > only barely the same condition. For all of the system calls to set or change uids and gids except setgroups it happens to fall out that if there are no mappings set the system calls fail. That is and was deliberate. However setgroups is weird because it allows the case of 0 mappings and to maintain the constraint that it fails when there are no mapping set (just like everything else) that requires an additional test. >> The userns knob for setgroups is a different test and is added >> in the next patch. And yes we really need both or the knob can >> start out as on, and we need to provent setgroups(0, NULL) >> before the user namespace is unshared. > > Do you mean before it's mapped? Right we need to prevent setgroups(0, NULL) before we set the gid mapping. >> Although come to think about it probably makes sense to roll those two >> test into one function and call that inline function from the setgroups >> implementation. > > That's what I think, too. > >> >> Anyway I will think about it and see what I can do to make it easily >> comprehensible. >> >>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >>>> index aa312b0dc3ec..51d65b444951 100644 >>>> --- a/kernel/user_namespace.c >>>> +++ b/kernel/user_namespace.c >>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, >>>> struct user_namespace *ns, int cap_setid, >>>> struct uid_gid_map *new_map) >>>> { >>>> - /* Allow mapping to your own filesystem ids */ >>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { >>>> + const struct cred *cred = file->f_cred; >>>> + >>>> + /* Allow a mapping without capabilities when allowing the root >>>> + * of the user namespace capabilities restricted to that id >>>> + * will not change the set of credentials available to that >>>> + * user. >>>> + */ >>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && >>>> + uid_eq(ns->owner, cred->euid)) { >>> >>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: >> >> This means that the only user we attempt to set up unprivileged mappings >> for is the owner of the user namespace. Anyone else should already >> have capabilities in the parent user namespace or shouldn't be able to >> set the mapping at all. >> >> In practice it is a clarification to make it easier to think about the code. > > But why? I don't see why this check is necessary or why it's relevant > to the current issue. My goal in this check is to guarantee that any combination of uids and gids in struct cred that you can obtain with mappings and a user namespace you can also obtain without privilege without a user namespace. What limiting euid to ns->owner does is it guarantees that when a user namespace is created without privilege root doesn't come along and set the mapping using the unprivileged path. That is confusing to think about and it is not necessary to support. With ns->owner == euid I have the guarantee especially with the gids that they wind up paired with a uid in struct cred that came from the same user. Either that or someone set one of the mappings with privilege. With ns->owner == euid I can verify all of these things pretty trivially. Without that check I don't have a clue how to verify the pairing between uids and gids in the unprivileged mapping. Does that make things clearer? >>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) >>> goto out; >>> >>> (except that I don't know why cap_valid(cap_setid) is checked -- this >>> ought to be enforced for projid_map, too, right?) >> >> What to do with projid_map is entirely different discussion. In >> practice it is dead, and either XFS needs to be fixed to use it >> or that code needs to be removed. At the time I wrote it XFS >> did not require any privileges to set project ids. >> >>>> u32 id = new_map->extent[0].lower_first; >>>> if (cap_setid == CAP_SETUID) { >>>> kuid_t uid = make_kuid(ns->parent, id); >>>> - if (uid_eq(uid, file->f_cred->fsuid)) >>>> - return true; >>>> - } else if (cap_setid == CAP_SETGID) { >>>> - kgid_t gid = make_kgid(ns->parent, id); >>>> - if (gid_eq(gid, file->f_cred->fsgid)) >>>> + if (uid_eq(uid, cred->euid)) >>> >>> Why'd you change this from fsuid to euid? >> >> Because strangely enough I can set euid to any other uid with >> setresuid, but the same does not hold with fsuid. >> >> So strictly speaking fsuid was actually wrong before. In practice >> fsuid == euid so I don't think anyone will care. But I want very much >> to enforce the rule that user namespaces can't give you any credentials >> you couldn't get otherwise. > > Fair enough. Want to split that into a separate patch, then? Strictly speaking it is part and parcel of the same thing but it probably makes sense to split it out and emphasise and explain the change. Eric ^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [CFT][PATCH 1/3] userns: Avoid problems with negative groups 2014-12-02 22:48 ` Eric W. Biederman @ 2014-12-02 22:56 ` Andy Lutomirski 0 siblings, 0 replies; 79+ messages in thread From: Andy Lutomirski @ 2014-12-02 22:56 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man, linux-kernel, LSM, Casey Schaufler, Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable On Tue, Dec 2, 2014 at 2:48 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> Andy Lutomirski <luto@amacapital.net> writes: >>> >>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman >>>> <ebiederm@xmission.com> wrote: >>>>> >>>>> Classic unix permission checks have an interesting feature, the group >>>>> permissions for a file can be set to less than the other permissions >>>>> on a file. Occassionally this is used deliberately to give a certain >>>>> group of users fewer permissions than the default. >>>>> >>>>> Overlooking negative groups has resulted in the permission checks for >>>>> setting up a group mapping in a user namespace to be too lax. Tighten >>>>> the permission checks in new_idmap_permitted to ensure that mapping >>>>> uids and gids into user namespaces without privilege will not result >>>>> in new combinations of credentials being available to the users. >>>>> >>>>> When setting mappings without privilege only the creator of the user >>>>> namespace is interesting as all other users that have CAP_SETUID over >>>>> the user namespace will also have CAP_SETUID over the user namespaces >>>>> parent. So the scope of the unprivileged check is reduced to just >>>>> the case where cred->euid is the namespace creator. >>>>> >>>>> For setting a uid mapping without privilege only euid is considered as >>>>> setresuid can set uid, suid and fsuid from euid without privielege >>>>> making any combination of uids possible with user namespaces already >>>>> possible without them. >>>>> >>>>> For now seeting a gid mapping without privilege is removed. The only >>>>> possible set of credentials that would be safe without a gid mapping >>>>> (egid without any supplementary groups) just doesn't happen in practice >>>>> so would simply lead to unused untested code. >>>>> >>>>> setgroups is modified to fail not only when the group ids do not >>>>> map but also when there are no gid mappings at all, preventing >>>>> setgroups(0, NULL) from succeeding when gid mappings have not been >>>>> established. >>>>> >>>>> For a small class of applications this change breaks userspace >>>>> and removes useful functionality. This small class of applications >>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c >>>>> >>>>> Most of the removed functionality will be added back with the >>>>> addition of a one way knob to disable setgroups. Once setgroups >>>>> is disabled setting the gid_map becomes as safe as setting the uid_map. >>>>> >>>>> For more common applications that set the uid_map and the gid_map with >>>>> privilege this change will have no effect on them. >>>>> >>>>> This should fix CVE-2014-8989. >>>>> >>>>> Cc: stable@vger.kernel.org >>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >>>>> --- >>>> >>>>> >>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns) >>>>> +{ >>>>> + return ns->gid_map.nr_extents != 0; >>>>> +} >>>>> + >>>> >>>> Can you rename this to userns_may_setgroups or something like that? >>>> To me, gid_mapping_possible sounds like you're allowed to map gids, >>>> which sounds like the opposite condition, and it doesn't explain what >>>> the point is. >>> >>> gid_mapping_established? >>> >>> What I mean to be testing if is if from_kgid and make_kgid will work >>> because the gid mappings have been set. >> >> But why do you care whether from_kgid and make_kgid will work? If >> they fail, then they fail. I think that the point is that you're >> checking whether allowing setgroups to drop groups is safe, and that's >> only barely the same condition. > > For all of the system calls to set or change uids and gids except > setgroups it happens to fall out that if there are no mappings set the > system calls fail. That is and was deliberate. However setgroups is > weird because it allows the case of 0 mappings and to maintain the > constraint that it fails when there are no mapping set (just like > everything else) that requires an additional test. > >>> The userns knob for setgroups is a different test and is added >>> in the next patch. And yes we really need both or the knob can >>> start out as on, and we need to provent setgroups(0, NULL) >>> before the user namespace is unshared. >> >> Do you mean before it's mapped? > > Right we need to prevent setgroups(0, NULL) before we set the gid > mapping. Fair enough. If you factor this into a separate inline helper, it might be worth adding a short comment to that effect. It could be as simple as: static inline bool whatever(whatever) { if (mapping is empty) return false; /* setgroups with a nonempty set requires a mapping; make sure that setgroups(0, NULL) does, too. */ ...; } > >>> Although come to think about it probably makes sense to roll those two >>> test into one function and call that inline function from the setgroups >>> implementation. >> >> That's what I think, too. >> >>> >>> Anyway I will think about it and see what I can do to make it easily >>> comprehensible. >>> >>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >>>>> index aa312b0dc3ec..51d65b444951 100644 >>>>> --- a/kernel/user_namespace.c >>>>> +++ b/kernel/user_namespace.c >>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, >>>>> struct user_namespace *ns, int cap_setid, >>>>> struct uid_gid_map *new_map) >>>>> { >>>>> - /* Allow mapping to your own filesystem ids */ >>>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { >>>>> + const struct cred *cred = file->f_cred; >>>>> + >>>>> + /* Allow a mapping without capabilities when allowing the root >>>>> + * of the user namespace capabilities restricted to that id >>>>> + * will not change the set of credentials available to that >>>>> + * user. >>>>> + */ >>>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && >>>>> + uid_eq(ns->owner, cred->euid)) { >>>> >>>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: >>> >>> This means that the only user we attempt to set up unprivileged mappings >>> for is the owner of the user namespace. Anyone else should already >>> have capabilities in the parent user namespace or shouldn't be able to >>> set the mapping at all. >>> >>> In practice it is a clarification to make it easier to think about the code. >> >> But why? I don't see why this check is necessary or why it's relevant >> to the current issue. > > My goal in this check is to guarantee that any combination of uids and > gids in struct cred that you can obtain with mappings and a user > namespace you can also obtain without privilege without a user > namespace. > > What limiting euid to ns->owner does is it guarantees that when a user > namespace is created without privilege root doesn't come along and set > the mapping using the unprivileged path. That is confusing to think > about and it is not necessary to support. > > With ns->owner == euid I have the guarantee especially with the gids > that they wind up paired with a uid in struct cred that came from the > same user. Either that or someone set one of the mappings with > privilege. > > With ns->owner == euid I can verify all of these things pretty > trivially. Without that check I don't have a clue how to verify > the pairing between uids and gids in the unprivileged mapping. > > Does that make things clearer? Yes. Thanks. It might pay to try to improve the comment. I understand it with this explanation but I didn't when I just read the comment. > >>>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) >>>> goto out; >>>> >>>> (except that I don't know why cap_valid(cap_setid) is checked -- this >>>> ought to be enforced for projid_map, too, right?) >>> >>> What to do with projid_map is entirely different discussion. In >>> practice it is dead, and either XFS needs to be fixed to use it >>> or that code needs to be removed. At the time I wrote it XFS >>> did not require any privileges to set project ids. >>> >>>>> u32 id = new_map->extent[0].lower_first; >>>>> if (cap_setid == CAP_SETUID) { >>>>> kuid_t uid = make_kuid(ns->parent, id); >>>>> - if (uid_eq(uid, file->f_cred->fsuid)) >>>>> - return true; >>>>> - } else if (cap_setid == CAP_SETGID) { >>>>> - kgid_t gid = make_kgid(ns->parent, id); >>>>> - if (gid_eq(gid, file->f_cred->fsgid)) >>>>> + if (uid_eq(uid, cred->euid)) >>>> >>>> Why'd you change this from fsuid to euid? >>> >>> Because strangely enough I can set euid to any other uid with >>> setresuid, but the same does not hold with fsuid. >>> >>> So strictly speaking fsuid was actually wrong before. In practice >>> fsuid == euid so I don't think anyone will care. But I want very much >>> to enforce the rule that user namespaces can't give you any credentials >>> you couldn't get otherwise. >> >> Fair enough. Want to split that into a separate patch, then? > > Strictly speaking it is part and parcel of the same thing but it > probably makes sense to split it out and emphasise and explain the > change. Sounds good. Anyway, time to do a combination of Real Work (tm) and dealing with the fact that I found a whole family of vulnerabilities of as-yet-unknown severity in arch/x86 this morning. --Andy ^ permalink raw reply [flat|nested] 79+ messages in thread
end of thread, other threads:[~2015-03-04 14:00 UTC | newest] Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-11-29 17:26 [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged Andy Lutomirski 2014-12-02 12:09 ` Eric W. Biederman 2014-12-02 18:53 ` Andy Lutomirski 2014-12-02 19:45 ` Eric W. Biederman 2014-12-02 20:13 ` Andy Lutomirski 2014-12-02 20:25 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Eric W. Biederman 2014-12-02 20:28 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-02 20:30 ` [CFT][PATCH 3/3] userns: Unbreak the unprivileged remount tests Eric W. Biederman 2014-12-02 21:05 ` [CFT][PATCH 2/3] userns: Add a knob to disable setgroups on a per user namespace basis Andy Lutomirski 2014-12-02 21:45 ` Eric W. Biederman 2014-12-02 22:17 ` Andy Lutomirski 2014-12-02 23:07 ` Eric W. Biederman 2014-12-02 23:17 ` Andy Lutomirski 2014-12-08 22:06 ` [CFT][PATCH 1/7] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-08 22:07 ` [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman 2014-12-08 22:11 ` Andy Lutomirski [not found] ` <87h9x5ok0h.fsf@x220.int.ebiederm.org> 2014-12-08 22:33 ` Andy Lutomirski 2014-12-08 22:17 ` Richard Weinberger 2014-12-08 22:25 ` Andy Lutomirski 2014-12-08 22:27 ` Richard Weinberger [not found] ` <874mt5ojfh.fsf@x220.int.ebiederm.org> 2014-12-08 22:47 ` Andy Lutomirski 2014-12-08 22:07 ` [CFT][PATCH 3/7] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman 2014-12-08 22:08 ` [CFT][PATCH 4/7] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman 2014-12-08 22:12 ` Andy Lutomirski 2014-12-08 22:10 ` [CFT][PATCH 5/7] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman 2014-12-08 22:15 ` Andy Lutomirski 2014-12-08 22:11 ` [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-08 22:21 ` Andy Lutomirski 2014-12-08 22:44 ` Eric W. Biederman 2014-12-08 22:48 ` Andy Lutomirski 2014-12-08 23:30 ` Eric W. Biederman 2014-12-09 19:31 ` Eric W. Biederman 2014-12-09 20:36 ` [CFT][PATCH 1/8] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman 2014-12-09 20:38 ` [CFT][PATCH 2/8] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman 2014-12-09 22:49 ` Andy Lutomirski 2014-12-09 20:39 ` [CFT][PATCH 3/8] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman 2014-12-09 23:00 ` Andy Lutomirski 2014-12-09 20:39 ` [CFT][PATCH 4/8] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman 2014-12-09 20:41 ` [CFT][PATCH 5/8] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman 2014-12-09 20:41 ` [CFT][PATCH 6/8] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman 2014-12-09 22:49 ` Andy Lutomirski 2014-12-09 20:42 ` [CFT][PATCH 7/8] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman 2014-12-09 22:28 ` Andy Lutomirski [not found] ` <971ad3f6-90fd-4e3f-916c-8988af3c826d@email.android.com> 2014-12-10 0:21 ` Andy Lutomirski [not found] ` <87wq5zf83t.fsf@x220.int.ebiederm.org> [not found] ` <87iohh3c9c.fsf@x220.int.ebiederm.org> 2014-12-12 1:30 ` Andy Lutomirski [not found] ` <8761dh3b7k.fsf_-_@x220.int.ebiederm.org> [not found] ` <878uicy1r9.fsf_-_@x220.int.ebiederm.org> 2014-12-12 21:54 ` [PATCH 1/2] proc.5: Document /proc/[pid]/setgroups Eric W. Biederman 2015-02-02 15:36 ` Michael Kerrisk (man-pages) 2015-02-11 8:01 ` Michael Kerrisk (man-pages) 2015-02-11 13:51 ` Eric W. Biederman 2015-02-12 13:53 ` Michael Kerrisk (man-pages) 2015-02-21 7:57 ` Michael Kerrisk (man-pages) 2015-03-03 11:39 ` Michael Kerrisk (man-pages) 2014-12-12 21:54 ` [PATCH 2/2] user_namespaces.7: Update the documention to reflect the fixes for negative groups Eric W. Biederman 2015-02-02 15:37 ` Michael Kerrisk (man-pages) 2015-02-11 8:02 ` Michael Kerrisk (man-pages) 2015-02-11 14:01 ` Eric W. Biederman 2015-02-12 10:11 ` Michael Kerrisk (man-pages) 2015-02-02 21:31 ` Alban Crequy 2015-03-04 14:00 ` Michael Kerrisk (man-pages) 2014-12-09 20:43 ` [CFT][PATCH 8/8] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman 2014-12-10 16:39 ` [CFT] Can I get some Tested-By's on this series? Eric W. Biederman 2014-12-10 22:48 ` Serge Hallyn 2014-12-10 22:50 ` Richard Weinberger 2014-12-10 23:19 ` Eric W. Biederman 2014-12-11 19:27 ` Richard Weinberger 2014-12-12 6:56 ` Chen, Hanxiao 2014-12-13 22:31 ` serge [not found] ` <87lhmcy2et.fsf@x220.int.ebiederm.org> [not found] ` <20141212220840.GF22091@castiana.ipv6.teksavvy.com> [not found] ` <8761dgze56.fsf@x220.int.ebiederm.org> 2014-12-15 19:38 ` Serge Hallyn 2014-12-15 20:11 ` Eric W. Biederman 2014-12-15 20:49 ` Serge Hallyn 2014-12-16 2:05 ` Andy Lutomirski 2014-12-16 9:23 ` Richard Weinberger 2014-12-08 22:14 ` [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman 2014-12-08 22:26 ` Andy Lutomirski 2014-12-02 20:58 ` [CFT][PATCH 1/3] userns: Avoid problems with negative groups Andy Lutomirski 2014-12-02 21:26 ` Eric W. Biederman 2014-12-02 22:09 ` Andy Lutomirski 2014-12-02 22:48 ` Eric W. Biederman 2014-12-02 22:56 ` Andy Lutomirski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).