All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups
@ 2021-05-10 13:00 Giuseppe Scrivano
  2021-05-10 13:00 ` [RFC PATCH 1/3] setgroups: " Giuseppe Scrivano
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-10 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: serge, dwalsh, christian.brauner, ebiederm

This series is based on some old patches I've been playing with some
years ago, but they were never sent to lkml as I was not sure about
their complexity/usefulness ratio.  It was recently reported by
another user that these patches are still useful[1] so I am submitting
the last version and see what other folks think about this feature.

Since the fix for CVE-2014-8989 in order to set a gids mapping for a
user namespace when the user namespace owner doesn't have CAP_SETGID
in its parent, it is necessary to first disable setgroups(2) through
/proc/PID/setgroups.

Setting up a user namespace with multiple IDs mapped into is usually
done through the privileged helpers newuidmap/newgidmap.
Since these helpers run either as setuid or with CAP_SET[U,G]ID file
capabilities, it is not necessary to disable setgroups(2) in the
created user namespace.  The user running in the user namespace can
use setgroups(2) and drop the additional groups that it had initially.

This is still an issue on systems where negative groups ACLs, i.e. the
group permissions are more restrictive than the entry for the other
categories, are used.  With such configuration, allowing setgroups(2)
would cause the same security vulnerability described by
CVE-2014-8989.

This patchset adds a new 'shadow' mode for the /proc/PID/setgroups
file.  It permits to safely enable setgroups also when negative groups
ACLs are used.
When the 'shadow' mode is written to /proc/PID/setgroups, the
current process groups are stored into the user namespace and they
will be silently added on each setgroups(2) call.  A child user
namespace won't be able to drop these groups anymore.

To fully take advantage of this feature, newgidmap will also need to
learn about the 'shadow' mode.  An idea is that when the system or the
user are using negative groups ACLs, then newgidmap needs to check
that /proc/PID/setgroups is set to 'deny' or 'shadow' before allowing a
mapping.  The configuration for negative groups ACLs can either be a
system wide or per user setting.

[1] https://lore.kernel.org/lkml/20210507133703.GB22450@mail.hallyn.com/T/#mc53e0fc80203b8209a8836d5861a267ce22c5c0f

Giuseppe Scrivano (3):
  setgroups: new mode 'shadow' for /proc/PID/setgroups
  getgroups: hide unknown groups
  proc: hide unknown groups in status

 fs/proc/array.c                |  12 +++-
 include/linux/cred.h           |   4 +-
 include/linux/user_namespace.h |  11 +++-
 kernel/groups.c                | 100 +++++++++++++++++++++------------
 kernel/uid16.c                 |  90 ++++++++++++++++++-----------
 kernel/user_namespace.c        |  34 +++++++++--
 6 files changed, 169 insertions(+), 82 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
@ 2021-05-10 13:00 ` Giuseppe Scrivano
  2021-05-15  1:51   ` Serge E. Hallyn
  2021-05-10 13:00 ` [RFC PATCH 2/3] getgroups: hide unknown groups Giuseppe Scrivano
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-10 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: serge, dwalsh, christian.brauner, ebiederm

add a new mode 'shadow' to the /proc/PID/setgroups file.

When the 'shadow' string is written to the file, the current process
groups are inherited from the process and stored into the user
namespace.  These groups will be silently added on each setgroups(2)
call.  A child user namespace won't be able to drop these groups
anymore.

It enables using setgroups(2) in user namespaces on systems where
negative ACLs are used.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---
 include/linux/cred.h           |  4 ++-
 include/linux/user_namespace.h | 11 +++++--
 kernel/groups.c                | 60 +++++++++++++++++++++++++---------
 kernel/uid16.c                 | 38 ++++++++++++++++-----
 kernel/user_namespace.c        | 34 +++++++++++++++----
 5 files changed, 114 insertions(+), 33 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 14971322e1a0..f3e631293532 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
 
 extern int set_current_groups(struct group_info *);
 extern void set_groups(struct cred *, struct group_info *);
-extern bool may_setgroups(void);
+extern bool may_setgroups(struct group_info **shadowed_groups);
+extern void add_shadowed_groups(struct group_info *group_info,
+				struct group_info *shadowed);
 extern void groups_sort(struct group_info *);
 #else
 static inline void groups_free(struct group_info *group_info)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 1d08dbbcfe32..cb003172de20 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
 };
 
 #define USERNS_SETGROUPS_ALLOWED 1UL
+#define USERNS_SETGROUPS_SHADOW  2UL
 
 #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
 
@@ -93,6 +94,9 @@ struct user_namespace {
 #endif
 	struct ucounts		*ucounts;
 	int ucount_max[UCOUNT_COUNTS];
+
+	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
+	struct group_info *shadow_group_info;
 } __randomize_layout;
 
 struct ucounts {
@@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
-extern bool userns_may_setgroups(const struct user_namespace *ns);
+extern bool userns_may_setgroups(const struct user_namespace *ns,
+				 struct group_info **shadowed_groups);
 extern bool in_userns(const struct user_namespace *ancestor,
 		       const struct user_namespace *child);
 extern bool current_in_userns(const struct user_namespace *target_ns);
@@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
 {
 }
 
-static inline bool userns_may_setgroups(const struct user_namespace *ns)
+static inline bool userns_may_setgroups(const struct user_namespace *ns,
+					struct group_info **shadowed_groups)
 {
+	*shadowed_groups = NULL;
 	return true;
 }
 
diff --git a/kernel/groups.c b/kernel/groups.c
index 787b381c7c00..f0c3b49da19e 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
 
 /* fill a group_info from a user-space array - it must be allocated already */
 static int groups_from_user(struct group_info *group_info,
-    gid_t __user *grouplist)
+			    gid_t __user *grouplist, int count)
 {
 	struct user_namespace *user_ns = current_user_ns();
 	int i;
-	unsigned int count = group_info->ngroups;
 
 	for (i = 0; i < count; i++) {
 		gid_t gid;
@@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
 	return i;
 }
 
-bool may_setgroups(void)
+bool may_setgroups(struct group_info **shadowed_groups)
 {
 	struct user_namespace *user_ns = current_user_ns();
 
 	return ns_capable_setid(user_ns, CAP_SETGID) &&
-		userns_may_setgroups(user_ns);
+		userns_may_setgroups(user_ns, shadowed_groups);
+}
+
+void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
+{
+	int i, j;
+
+	for (i = 0; i < shadowed->ngroups; i++) {
+		kgid_t kgid = shadowed->gid[i];
+
+		j = group_info->ngroups - i - 1;
+		group_info->gid[j] = kgid;
+	}
 }
 
 /*
@@ -184,27 +195,44 @@ bool may_setgroups(void)
 
 SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 {
-	struct group_info *group_info;
+	struct group_info *shadowed_groups = NULL;
+	struct group_info *group_info = NULL;
+	unsigned int arraysize = gidsetsize;
 	int retval;
 
-	if (!may_setgroups())
-		return -EPERM;
-	if ((unsigned)gidsetsize > NGROUPS_MAX)
+	if (arraysize > NGROUPS_MAX)
 		return -EINVAL;
 
-	group_info = groups_alloc(gidsetsize);
-	if (!group_info)
-		return -ENOMEM;
-	retval = groups_from_user(group_info, grouplist);
-	if (retval) {
-		put_group_info(group_info);
-		return retval;
+	if (!may_setgroups(&shadowed_groups))
+		return -EPERM;
+
+	if (shadowed_groups) {
+		retval = -EINVAL;
+		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
+			goto out;
+		arraysize += shadowed_groups->ngroups;
 	}
 
+	group_info = groups_alloc(arraysize);
+	retval = -ENOMEM;
+	if (!group_info)
+		goto out;
+
+	retval = groups_from_user(group_info, grouplist, gidsetsize);
+	if (retval)
+		goto out;
+
+	if (shadowed_groups)
+		add_shadowed_groups(group_info, shadowed_groups);
+
 	groups_sort(group_info);
 	retval = set_current_groups(group_info);
-	put_group_info(group_info);
 
+out:
+	if (group_info)
+		put_group_info(group_info);
+	if (shadowed_groups)
+		put_group_info(shadowed_groups);
 	return retval;
 }
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index af6925d8599b..cb1110f083ce 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
 }
 
 static int groups16_from_user(struct group_info *group_info,
-    old_gid_t __user *grouplist)
+			      old_gid_t __user *grouplist, int count)
 {
 	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
 	kgid_t kgid;
 
-	for (i = 0; i < group_info->ngroups; i++) {
+	for (i = 0; i < count; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
 
@@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 {
 	struct group_info *group_info;
+	struct group_info *shadowed_groups = NULL;
 	int retval;
+	unsigned int arraysize = gidsetsize;
 
-	if (!may_setgroups())
-		return -EPERM;
-	if ((unsigned)gidsetsize > NGROUPS_MAX)
+	if (arraysize > NGROUPS_MAX)
 		return -EINVAL;
 
-	group_info = groups_alloc(gidsetsize);
-	if (!group_info)
+	if (!may_setgroups(&shadowed_groups))
+		return -EPERM;
+
+	if (shadowed_groups) {
+		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
+			put_group_info(shadowed_groups);
+			return -EINVAL;
+		}
+		arraysize += shadowed_groups->ngroups;
+	}
+
+	group_info = groups_alloc(arraysize);
+	if (!group_info) {
+		if (shadowed_groups)
+			put_group_info(shadowed_groups);
 		return -ENOMEM;
-	retval = groups16_from_user(group_info, grouplist);
+	}
+
+	retval = groups16_from_user(group_info, grouplist, gidsetsize);
 	if (retval) {
+		if (shadowed_groups)
+			put_group_info(shadowed_groups);
 		put_group_info(group_info);
 		return retval;
 	}
 
+	if (shadowed_groups)
+		add_shadowed_groups(group_info, shadowed_groups);
+
 	groups_sort(group_info);
 	retval = set_current_groups(group_info);
 	put_group_info(group_info);
+	if (shadowed_groups)
+		put_group_info(shadowed_groups);
 
 	return retval;
 }
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 8d62863721b0..b1940b63f7ac 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
 		ns->ucount_max[i] = INT_MAX;
 	}
 	ns->ucounts = ucounts;
+	ns->shadow_group_info = get_current_groups();
 
 	/* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
 	mutex_lock(&userns_state_mutex);
@@ -177,6 +178,9 @@ static void free_user_ns(struct work_struct *work)
 	struct user_namespace *parent, *ns =
 		container_of(work, struct user_namespace, work);
 
+	if (ns->shadow_group_info)
+		put_group_info(ns->shadow_group_info);
+
 	do {
 		struct ucounts *ucounts = ns->ucounts;
 		parent = ns->parent;
@@ -1185,7 +1189,8 @@ int proc_setgroups_show(struct seq_file *seq, void *v)
 
 	seq_printf(seq, "%s\n",
 		   (userns_flags & USERNS_SETGROUPS_ALLOWED) ?
-		   "allow" : "deny");
+		   ((userns_flags & USERNS_SETGROUPS_SHADOW) ? "shadow" : "allow")
+		   : "deny");
 	return 0;
 }
 
@@ -1196,6 +1201,7 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
 	struct user_namespace *ns = seq->private;
 	char kbuf[8], *pos;
 	bool setgroups_allowed;
+	bool setgroups_shadow = false;
 	ssize_t ret;
 
 	/* Only allow a very narrow range of strings to be written */
@@ -1215,12 +1221,14 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
 	if (strncmp(pos, "allow", 5) == 0) {
 		pos += 5;
 		setgroups_allowed = true;
-	}
-	else if (strncmp(pos, "deny", 4) == 0) {
+	} else if (strncmp(pos, "shadow", 6) == 0) {
+		pos += 6;
+		setgroups_allowed = true;
+		setgroups_shadow = true;
+	} else if (strncmp(pos, "deny", 4) == 0) {
 		pos += 4;
 		setgroups_allowed = false;
-	}
-	else
+	} else
 		goto out;
 
 	/* Verify there is not trailing junk on the line */
@@ -1236,6 +1244,13 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
 		 */
 		if (!(ns->flags & USERNS_SETGROUPS_ALLOWED))
 			goto out_unlock;
+
+		if (setgroups_shadow)
+			ns->flags |= USERNS_SETGROUPS_SHADOW;
+		else {
+			put_group_info(ns->shadow_group_info);
+			ns->shadow_group_info = NULL;
+		}
 	} else {
 		/* Permanently disabling setgroups after setgroups has
 		 * been enabled by writing the gid_map is not allowed.
@@ -1256,9 +1271,11 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
 	goto out;
 }
 
-bool userns_may_setgroups(const struct user_namespace *ns)
+bool userns_may_setgroups(const struct user_namespace *ns,
+			 struct group_info **shadowed_groups)
 {
 	bool allowed;
+	struct group_info *shadowed = NULL;
 
 	mutex_lock(&userns_state_mutex);
 	/* It is not safe to use setgroups until a gid mapping in
@@ -1267,8 +1284,13 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 	allowed = ns->gid_map.nr_extents != 0;
 	/* Is setgroups allowed? */
 	allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
+	if (allowed && (ns->flags & USERNS_SETGROUPS_SHADOW))
+		shadowed = get_group_info(ns->shadow_group_info);
+
 	mutex_unlock(&userns_state_mutex);
 
+	*shadowed_groups = shadowed;
+
 	return allowed;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/3] getgroups: hide unknown groups
  2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
  2021-05-10 13:00 ` [RFC PATCH 1/3] setgroups: " Giuseppe Scrivano
@ 2021-05-10 13:00 ` Giuseppe Scrivano
  2021-05-15  1:21   ` Serge E. Hallyn
  2021-05-10 13:00 ` [RFC PATCH 3/3] proc: hide unknown groups in status Giuseppe Scrivano
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-10 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: serge, dwalsh, christian.brauner, ebiederm

do not copy to userspace the groups that are not known in the
namespace.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---
 kernel/groups.c | 40 +++++++++++++++++++--------------------
 kernel/uid16.c  | 50 ++++++++++++++++++++++++-------------------------
 2 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index f0c3b49da19e..97fa9faa7813 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -35,19 +35,30 @@ EXPORT_SYMBOL(groups_free);
 
 /* export the group_info to a user-space array */
 static int groups_to_user(gid_t __user *grouplist,
-			  const struct group_info *group_info)
+			  const struct group_info *group_info,
+			  int gidsetsize)
 {
 	struct user_namespace *user_ns = current_user_ns();
-	int i;
 	unsigned int count = group_info->ngroups;
+	int i, ngroups = 0;
 
 	for (i = 0; i < count; i++) {
 		gid_t gid;
-		gid = from_kgid_munged(user_ns, group_info->gid[i]);
-		if (put_user(gid, grouplist+i))
-			return -EFAULT;
+		gid = from_kgid(user_ns, group_info->gid[i]);
+		if (gid == (gid_t)-1) {
+			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+				continue;
+			gid = overflowgid;
+		}
+		if (gidsetsize) {
+			if (ngroups == gidsetsize)
+				return -EINVAL;
+			if (put_user(gid, grouplist + ngroups))
+				return -EFAULT;
+		}
+		ngroups++;
 	}
-	return 0;
+	return ngroups;
 }
 
 /* fill a group_info from a user-space array - it must be allocated already */
@@ -146,26 +157,13 @@ EXPORT_SYMBOL(set_current_groups);
 
 SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
 {
+	/* no need to grab task_lock here; it cannot change */
 	const struct cred *cred = current_cred();
-	int i;
 
 	if (gidsetsize < 0)
 		return -EINVAL;
 
-	/* no need to grab task_lock here; it cannot change */
-	i = cred->group_info->ngroups;
-	if (gidsetsize) {
-		if (i > gidsetsize) {
-			i = -EINVAL;
-			goto out;
-		}
-		if (groups_to_user(grouplist, cred->group_info)) {
-			i = -EFAULT;
-			goto out;
-		}
-	}
-out:
-	return i;
+	return groups_to_user(grouplist, cred->group_info, gidsetsize);
 }
 
 bool may_setgroups(struct group_info **shadowed_groups)
diff --git a/kernel/uid16.c b/kernel/uid16.c
index cb1110f083ce..6f2dc793b5f8 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -112,21 +112,33 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
 }
 
 static int groups16_to_user(old_gid_t __user *grouplist,
-    struct group_info *group_info)
+			    struct group_info *group_info,
+			    int gidsetsize)
 {
 	struct user_namespace *user_ns = current_user_ns();
-	int i;
-	old_gid_t group;
-	kgid_t kgid;
+	unsigned int count = group_info->ngroups;
+	int i, ngroups = 0;
 
-	for (i = 0; i < group_info->ngroups; i++) {
-		kgid = group_info->gid[i];
-		group = high2lowgid(from_kgid_munged(user_ns, kgid));
-		if (put_user(group, grouplist+i))
-			return -EFAULT;
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		old_gid_t group;
+
+		gid = from_kgid(user_ns, group_info->gid[i]);
+		if (gid == (gid_t)-1) {
+			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+				continue;
+			gid = overflowgid;
+		}
+		group = high2lowgid(gid);
+		if (gidsetsize) {
+			if (ngroups == gidsetsize)
+				return -EINVAL;
+			if (put_user(group, grouplist + ngroups))
+				return -EFAULT;
+		}
+		ngroups++;
 	}
-
-	return 0;
+	return ngroups;
 }
 
 static int groups16_from_user(struct group_info *group_info,
@@ -153,25 +165,13 @@ static int groups16_from_user(struct group_info *group_info,
 
 SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 {
+	/* no need to grab task_lock here; it cannot change */
 	const struct cred *cred = current_cred();
-	int i;
 
 	if (gidsetsize < 0)
 		return -EINVAL;
 
-	i = cred->group_info->ngroups;
-	if (gidsetsize) {
-		if (i > gidsetsize) {
-			i = -EINVAL;
-			goto out;
-		}
-		if (groups16_to_user(grouplist, cred->group_info)) {
-			i = -EFAULT;
-			goto out;
-		}
-	}
-out:
-	return i;
+	return groups16_to_user(grouplist, cred->group_info, gidsetsize);
 }
 
 SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/3] proc: hide unknown groups in status
  2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
  2021-05-10 13:00 ` [RFC PATCH 1/3] setgroups: " Giuseppe Scrivano
  2021-05-10 13:00 ` [RFC PATCH 2/3] getgroups: hide unknown groups Giuseppe Scrivano
@ 2021-05-10 13:00 ` Giuseppe Scrivano
  2021-05-15  1:24   ` Serge E. Hallyn
  2021-05-10 16:02 ` [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Snaipe
  2021-05-21 15:16 ` Eric W. Biederman
  4 siblings, 1 reply; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-10 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: serge, dwalsh, christian.brauner, ebiederm

when the "shadow" mode is enabled for the user namespace, do not copy
to userspace the groups that are not mapped.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---
 fs/proc/array.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 7ec59171f197..81dc733773d4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -202,9 +202,17 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 
 	seq_puts(m, "\nGroups:\t");
 	group_info = cred->group_info;
-	for (g = 0; g < group_info->ngroups; g++)
+	for (g = 0; g < group_info->ngroups; g++) {
+		gid_t gid = from_kgid(user_ns, group_info->gid[g]);
+
+		if (gid == (gid_t)-1) {
+			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
+				continue;
+			gid = overflowgid;
+		}
 		seq_put_decimal_ull(m, g ? " " : "",
-				from_kgid_munged(user_ns, group_info->gid[g]));
+				gid);
+	}
 	put_cred(cred);
 	/* Trailing space shouldn't have been added in the first place. */
 	seq_putc(m, ' ');
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups
  2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
                   ` (2 preceding siblings ...)
  2021-05-10 13:00 ` [RFC PATCH 3/3] proc: hide unknown groups in status Giuseppe Scrivano
@ 2021-05-10 16:02 ` Snaipe
  2021-05-21 15:16 ` Eric W. Biederman
  4 siblings, 0 replies; 18+ messages in thread
From: Snaipe @ 2021-05-10 16:02 UTC (permalink / raw)
  To: gscrivan; +Cc: christian.brauner, dwalsh, ebiederm, linux-kernel, serge, Snaipe

"Giuseppe Scrivano" <gscrivan@redhat.com> writes:
> This series is based on some old patches I've been playing with some
> years ago, but they were never sent to lkml as I was not sure about
> their complexity/usefulness ratio.  It was recently reported by
> another user that these patches are still useful[1] so I am submitting
> the last version and see what other folks think about this feature.

For context, the reason why these patches are useful to us is that our
use-case of user namespaces includes running executables within an
{u,g}id space controlled by the calling user, while still remaining in
the original root filesystem.

For example, we use user namespaces through one of our tools[1] as a
substitute to fakeroot in order to build software that would otherwise
need root permission to package, like sudo or ping, where setting the
setuid bit or more importantly file capabilities are necessary.

In these use-cases, still respecting the original group membership is
actually desired. Not just because of the negative-access permission
issues, but because it's possible to lose legitimate access to files
if using setgroups while holding membership of unmapped GIDs. This can
be very surprising behaviour, especially when, as an example, the caller
suddenly loses access to their current working directory after entering
a user namespace and using setgroups.

I've seen other solutions to the original problem mentioned, like
introducing a new sysctl to convey that the system does not use negative-
access permissions -- I believe these alternate solutions do not solve
my second point about losing legitimate access, while this patchset does.
I've tested an older version of these patches and they have all of the
desired properties:

    $ id
    uid=1000(snaipe) gid=1000(snaipe) groups=1000(snaipe),998(wheel)

    $ bst grep . /proc/self/uid_map /proc/self/gid_map /proc/self/setgroups
    /proc/self/uid_map:         0       1000          1
    /proc/self/uid_map:         1     100000      65536
    /proc/self/gid_map:         0       1000          1
    /proc/self/gid_map:         1     100000      65536
    /proc/self/setgroups:shadow

    $ ls -l
    total 8
    drwxr-xr-x 2 root wheel 4096 Apr 23 14:18 allowed
    drwx---r-x 2 root wheel 4096 Apr 23 14:18 denied

    $ bst sh -c 'id; ls allowed denied'
    uid=0(root) gid=0(root) groups=0(root)
    allowed:
    ls: cannot open directory 'denied': Permission denied

    $ bst --groups 1 sh -c 'id; ls allowed denied'
    uid=0(root) gid=0(root) groups=0(root),1(daemon)
    allowed:
    ls: cannot open directory 'denied': Permission denied

Ultimately, we want to make it safe to run our tool as an unprivileged user,
and while we're currently riding the status-quo of "safe-to-use-but-not-
if-you're-using-negative-permissions", having a way for us to do the right
thing -- without relying on the gotcha that a system administrator must
configure a system knob to make it safe -- is quite attractive.

[1]: https://github.com/aristanetworks/bst

-- 
Snaipe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/3] getgroups: hide unknown groups
  2021-05-10 13:00 ` [RFC PATCH 2/3] getgroups: hide unknown groups Giuseppe Scrivano
@ 2021-05-15  1:21   ` Serge E. Hallyn
  0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2021-05-15  1:21 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: linux-kernel, serge, dwalsh, christian.brauner, ebiederm

On Mon, May 10, 2021 at 03:00:10PM +0200, Giuseppe Scrivano wrote:
> do not copy to userspace the groups that are not known in the
> namespace.
> 
> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Reviewed-by: Serge Hallyn <serge@hallyn.com>

> ---
>  kernel/groups.c | 40 +++++++++++++++++++--------------------
>  kernel/uid16.c  | 50 ++++++++++++++++++++++++-------------------------
>  2 files changed, 44 insertions(+), 46 deletions(-)
> 
> diff --git a/kernel/groups.c b/kernel/groups.c
> index f0c3b49da19e..97fa9faa7813 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -35,19 +35,30 @@ EXPORT_SYMBOL(groups_free);
>  
>  /* export the group_info to a user-space array */
>  static int groups_to_user(gid_t __user *grouplist,
> -			  const struct group_info *group_info)
> +			  const struct group_info *group_info,
> +			  int gidsetsize)
>  {
>  	struct user_namespace *user_ns = current_user_ns();
> -	int i;
>  	unsigned int count = group_info->ngroups;
> +	int i, ngroups = 0;
>  
>  	for (i = 0; i < count; i++) {
>  		gid_t gid;
> -		gid = from_kgid_munged(user_ns, group_info->gid[i]);
> -		if (put_user(gid, grouplist+i))
> -			return -EFAULT;
> +		gid = from_kgid(user_ns, group_info->gid[i]);
> +		if (gid == (gid_t)-1) {
> +			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> +				continue;
> +			gid = overflowgid;
> +		}
> +		if (gidsetsize) {
> +			if (ngroups == gidsetsize)
> +				return -EINVAL;
> +			if (put_user(gid, grouplist + ngroups))
> +				return -EFAULT;
> +		}
> +		ngroups++;
>  	}
> -	return 0;
> +	return ngroups;
>  }
>  
>  /* fill a group_info from a user-space array - it must be allocated already */
> @@ -146,26 +157,13 @@ EXPORT_SYMBOL(set_current_groups);
>  
>  SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
>  {
> +	/* no need to grab task_lock here; it cannot change */
>  	const struct cred *cred = current_cred();
> -	int i;
>  
>  	if (gidsetsize < 0)
>  		return -EINVAL;
>  
> -	/* no need to grab task_lock here; it cannot change */
> -	i = cred->group_info->ngroups;
> -	if (gidsetsize) {
> -		if (i > gidsetsize) {
> -			i = -EINVAL;
> -			goto out;
> -		}
> -		if (groups_to_user(grouplist, cred->group_info)) {
> -			i = -EFAULT;
> -			goto out;
> -		}
> -	}
> -out:
> -	return i;
> +	return groups_to_user(grouplist, cred->group_info, gidsetsize);
>  }
>  
>  bool may_setgroups(struct group_info **shadowed_groups)
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index cb1110f083ce..6f2dc793b5f8 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -112,21 +112,33 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
>  }
>  
>  static int groups16_to_user(old_gid_t __user *grouplist,
> -    struct group_info *group_info)
> +			    struct group_info *group_info,
> +			    int gidsetsize)
>  {
>  	struct user_namespace *user_ns = current_user_ns();
> -	int i;
> -	old_gid_t group;
> -	kgid_t kgid;
> +	unsigned int count = group_info->ngroups;
> +	int i, ngroups = 0;
>  
> -	for (i = 0; i < group_info->ngroups; i++) {
> -		kgid = group_info->gid[i];
> -		group = high2lowgid(from_kgid_munged(user_ns, kgid));
> -		if (put_user(group, grouplist+i))
> -			return -EFAULT;
> +	for (i = 0; i < count; i++) {
> +		gid_t gid;
> +		old_gid_t group;
> +
> +		gid = from_kgid(user_ns, group_info->gid[i]);
> +		if (gid == (gid_t)-1) {
> +			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> +				continue;
> +			gid = overflowgid;
> +		}
> +		group = high2lowgid(gid);
> +		if (gidsetsize) {
> +			if (ngroups == gidsetsize)
> +				return -EINVAL;
> +			if (put_user(group, grouplist + ngroups))
> +				return -EFAULT;
> +		}
> +		ngroups++;
>  	}
> -
> -	return 0;
> +	return ngroups;
>  }
>  
>  static int groups16_from_user(struct group_info *group_info,
> @@ -153,25 +165,13 @@ static int groups16_from_user(struct group_info *group_info,
>  
>  SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  {
> +	/* no need to grab task_lock here; it cannot change */
>  	const struct cred *cred = current_cred();
> -	int i;
>  
>  	if (gidsetsize < 0)
>  		return -EINVAL;
>  
> -	i = cred->group_info->ngroups;
> -	if (gidsetsize) {
> -		if (i > gidsetsize) {
> -			i = -EINVAL;
> -			goto out;
> -		}
> -		if (groups16_to_user(grouplist, cred->group_info)) {
> -			i = -EFAULT;
> -			goto out;
> -		}
> -	}
> -out:
> -	return i;
> +	return groups16_to_user(grouplist, cred->group_info, gidsetsize);
>  }
>  
>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] proc: hide unknown groups in status
  2021-05-10 13:00 ` [RFC PATCH 3/3] proc: hide unknown groups in status Giuseppe Scrivano
@ 2021-05-15  1:24   ` Serge E. Hallyn
  0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2021-05-15  1:24 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: linux-kernel, serge, dwalsh, christian.brauner, ebiederm

On Mon, May 10, 2021 at 03:00:11PM +0200, Giuseppe Scrivano wrote:
> when the "shadow" mode is enabled for the user namespace, do not copy
> to userspace the groups that are not mapped.
> 
> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Reviewed-by: Serge Hallyn <serge@hallyn.com>

> ---
>  fs/proc/array.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 7ec59171f197..81dc733773d4 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -202,9 +202,17 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  
>  	seq_puts(m, "\nGroups:\t");
>  	group_info = cred->group_info;
> -	for (g = 0; g < group_info->ngroups; g++)
> +	for (g = 0; g < group_info->ngroups; g++) {
> +		gid_t gid = from_kgid(user_ns, group_info->gid[g]);
> +
> +		if (gid == (gid_t)-1) {
> +			if (user_ns->flags & USERNS_SETGROUPS_SHADOW)
> +				continue;
> +			gid = overflowgid;
> +		}
>  		seq_put_decimal_ull(m, g ? " " : "",
> -				from_kgid_munged(user_ns, group_info->gid[g]));
> +				gid);
> +	}
>  	put_cred(cred);
>  	/* Trailing space shouldn't have been added in the first place. */
>  	seq_putc(m, ' ');
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-10 13:00 ` [RFC PATCH 1/3] setgroups: " Giuseppe Scrivano
@ 2021-05-15  1:51   ` Serge E. Hallyn
  2021-05-17 13:30     ` Giuseppe Scrivano
  0 siblings, 1 reply; 18+ messages in thread
From: Serge E. Hallyn @ 2021-05-15  1:51 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: linux-kernel, serge, dwalsh, christian.brauner, ebiederm

On Mon, May 10, 2021 at 03:00:09PM +0200, Giuseppe Scrivano wrote:
> add a new mode 'shadow' to the /proc/PID/setgroups file.
> 
> When the 'shadow' string is written to the file, the current process
> groups are inherited from the process and stored into the user
> namespace.  These groups will be silently added on each setgroups(2)
> call.  A child user namespace won't be able to drop these groups
> anymore.
> 
> It enables using setgroups(2) in user namespaces on systems where
> negative ACLs are used.
> 
> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Thanks for re-sending.

Two closely related questions (and one comment) below.

> ---
>  include/linux/cred.h           |  4 ++-
>  include/linux/user_namespace.h | 11 +++++--
>  kernel/groups.c                | 60 +++++++++++++++++++++++++---------
>  kernel/uid16.c                 | 38 ++++++++++++++++-----
>  kernel/user_namespace.c        | 34 +++++++++++++++----
>  5 files changed, 114 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 14971322e1a0..f3e631293532 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
>  
>  extern int set_current_groups(struct group_info *);
>  extern void set_groups(struct cred *, struct group_info *);
> -extern bool may_setgroups(void);
> +extern bool may_setgroups(struct group_info **shadowed_groups);
> +extern void add_shadowed_groups(struct group_info *group_info,
> +				struct group_info *shadowed);
>  extern void groups_sort(struct group_info *);
>  #else
>  static inline void groups_free(struct group_info *group_info)
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 1d08dbbcfe32..cb003172de20 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
>  };
>  
>  #define USERNS_SETGROUPS_ALLOWED 1UL
> +#define USERNS_SETGROUPS_SHADOW  2UL
>  
>  #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
>  
> @@ -93,6 +94,9 @@ struct user_namespace {
>  #endif
>  	struct ucounts		*ucounts;
>  	int ucount_max[UCOUNT_COUNTS];
> +
> +	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
> +	struct group_info *shadow_group_info;
>  } __randomize_layout;
>  
>  struct ucounts {
> @@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
>  extern int proc_setgroups_show(struct seq_file *m, void *v);
> -extern bool userns_may_setgroups(const struct user_namespace *ns);

I realize there's not a single other comment in this file, but I
think userns_may_setgroups() could use a comment to explain what
shadowed_groups is.

> +extern bool userns_may_setgroups(const struct user_namespace *ns,
> +				 struct group_info **shadowed_groups);
>  extern bool in_userns(const struct user_namespace *ancestor,
>  		       const struct user_namespace *child);
>  extern bool current_in_userns(const struct user_namespace *target_ns);
> @@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
>  {
>  }
>  
> -static inline bool userns_may_setgroups(const struct user_namespace *ns)
> +static inline bool userns_may_setgroups(const struct user_namespace *ns,
> +					struct group_info **shadowed_groups)
>  {
> +	*shadowed_groups = NULL;
>  	return true;
>  }
>  
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 787b381c7c00..f0c3b49da19e 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
>  
>  /* fill a group_info from a user-space array - it must be allocated already */
>  static int groups_from_user(struct group_info *group_info,
> -    gid_t __user *grouplist)
> +			    gid_t __user *grouplist, int count)
>  {
>  	struct user_namespace *user_ns = current_user_ns();
>  	int i;
> -	unsigned int count = group_info->ngroups;
>  
>  	for (i = 0; i < count; i++) {
>  		gid_t gid;
> @@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
>  	return i;
>  }
>  
> -bool may_setgroups(void)
> +bool may_setgroups(struct group_info **shadowed_groups)
>  {
>  	struct user_namespace *user_ns = current_user_ns();
>  
>  	return ns_capable_setid(user_ns, CAP_SETGID) &&
> -		userns_may_setgroups(user_ns);
> +		userns_may_setgroups(user_ns, shadowed_groups);
> +}
> +
> +void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
> +{
> +	int i, j;
> +
> +	for (i = 0; i < shadowed->ngroups; i++) {
> +		kgid_t kgid = shadowed->gid[i];
> +
> +		j = group_info->ngroups - i - 1;
> +		group_info->gid[j] = kgid;
> +	}
>  }
>  
>  /*
> @@ -184,27 +195,44 @@ bool may_setgroups(void)
>  
>  SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>  {
> -	struct group_info *group_info;
> +	struct group_info *shadowed_groups = NULL;
> +	struct group_info *group_info = NULL;
> +	unsigned int arraysize = gidsetsize;
>  	int retval;
>  
> -	if (!may_setgroups())
> -		return -EPERM;
> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> +	if (arraysize > NGROUPS_MAX)
>  		return -EINVAL;
>  
> -	group_info = groups_alloc(gidsetsize);
> -	if (!group_info)
> -		return -ENOMEM;
> -	retval = groups_from_user(group_info, grouplist);
> -	if (retval) {
> -		put_group_info(group_info);
> -		return retval;
> +	if (!may_setgroups(&shadowed_groups))
> +		return -EPERM;
> +
> +	if (shadowed_groups) {
> +		retval = -EINVAL;
> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
> +			goto out;
> +		arraysize += shadowed_groups->ngroups;
>  	}
>  
> +	group_info = groups_alloc(arraysize);
> +	retval = -ENOMEM;
> +	if (!group_info)
> +		goto out;
> +
> +	retval = groups_from_user(group_info, grouplist, gidsetsize);
> +	if (retval)
> +		goto out;
> +
> +	if (shadowed_groups)
> +		add_shadowed_groups(group_info, shadowed_groups);
> +
>  	groups_sort(group_info);
>  	retval = set_current_groups(group_info);
> -	put_group_info(group_info);
>  
> +out:
> +	if (group_info)
> +		put_group_info(group_info);
> +	if (shadowed_groups)
> +		put_group_info(shadowed_groups);
>  	return retval;
>  }
>  
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index af6925d8599b..cb1110f083ce 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
>  }
>  
>  static int groups16_from_user(struct group_info *group_info,
> -    old_gid_t __user *grouplist)
> +			      old_gid_t __user *grouplist, int count)
>  {
>  	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	old_gid_t group;
>  	kgid_t kgid;
>  
> -	for (i = 0; i < group_info->ngroups; i++) {
> +	for (i = 0; i < count; i++) {
>  		if (get_user(group, grouplist+i))
>  			return  -EFAULT;
>  
> @@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  {
>  	struct group_info *group_info;
> +	struct group_info *shadowed_groups = NULL;
>  	int retval;
> +	unsigned int arraysize = gidsetsize;
>  
> -	if (!may_setgroups())
> -		return -EPERM;
> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> +	if (arraysize > NGROUPS_MAX)
>  		return -EINVAL;
>  
> -	group_info = groups_alloc(gidsetsize);
> -	if (!group_info)
> +	if (!may_setgroups(&shadowed_groups))
> +		return -EPERM;
> +
> +	if (shadowed_groups) {
> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
> +			put_group_info(shadowed_groups);
> +			return -EINVAL;
> +		}
> +		arraysize += shadowed_groups->ngroups;
> +	}
> +
> +	group_info = groups_alloc(arraysize);
> +	if (!group_info) {
> +		if (shadowed_groups)
> +			put_group_info(shadowed_groups);
>  		return -ENOMEM;
> -	retval = groups16_from_user(group_info, grouplist);
> +	}
> +
> +	retval = groups16_from_user(group_info, grouplist, gidsetsize);
>  	if (retval) {
> +		if (shadowed_groups)
> +			put_group_info(shadowed_groups);
>  		put_group_info(group_info);
>  		return retval;
>  	}
>  
> +	if (shadowed_groups)
> +		add_shadowed_groups(group_info, shadowed_groups);
> +
>  	groups_sort(group_info);
>  	retval = set_current_groups(group_info);
>  	put_group_info(group_info);
> +	if (shadowed_groups)
> +		put_group_info(shadowed_groups);
>  
>  	return retval;
>  }
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 8d62863721b0..b1940b63f7ac 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
>  		ns->ucount_max[i] = INT_MAX;
>  	}
>  	ns->ucounts = ucounts;
> +	ns->shadow_group_info = get_current_groups();

If userns u1 unshares u2 with shadow set, then when u2 unshares
u3, should u3 get the same shadowed set that u2 has, or should it
get all of u2's groups as u3's initial shadow set?

>  
>  	/* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
>  	mutex_lock(&userns_state_mutex);
> @@ -177,6 +178,9 @@ static void free_user_ns(struct work_struct *work)
>  	struct user_namespace *parent, *ns =
>  		container_of(work, struct user_namespace, work);
>  
> +	if (ns->shadow_group_info)
> +		put_group_info(ns->shadow_group_info);
> +
>  	do {
>  		struct ucounts *ucounts = ns->ucounts;
>  		parent = ns->parent;
> @@ -1185,7 +1189,8 @@ int proc_setgroups_show(struct seq_file *seq, void *v)
>  
>  	seq_printf(seq, "%s\n",
>  		   (userns_flags & USERNS_SETGROUPS_ALLOWED) ?
> -		   "allow" : "deny");
> +		   ((userns_flags & USERNS_SETGROUPS_SHADOW) ? "shadow" : "allow")
> +		   : "deny");
>  	return 0;
>  }
>  
> @@ -1196,6 +1201,7 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>  	struct user_namespace *ns = seq->private;
>  	char kbuf[8], *pos;
>  	bool setgroups_allowed;
> +	bool setgroups_shadow = false;
>  	ssize_t ret;
>  
>  	/* Only allow a very narrow range of strings to be written */
> @@ -1215,12 +1221,14 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>  	if (strncmp(pos, "allow", 5) == 0) {
>  		pos += 5;
>  		setgroups_allowed = true;
> -	}
> -	else if (strncmp(pos, "deny", 4) == 0) {
> +	} else if (strncmp(pos, "shadow", 6) == 0) {
> +		pos += 6;
> +		setgroups_allowed = true;
> +		setgroups_shadow = true;
> +	} else if (strncmp(pos, "deny", 4) == 0) {
>  		pos += 4;
>  		setgroups_allowed = false;
> -	}
> -	else
> +	} else
>  		goto out;
>  
>  	/* Verify there is not trailing junk on the line */
> @@ -1236,6 +1244,13 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>  		 */
>  		if (!(ns->flags & USERNS_SETGROUPS_ALLOWED))
>  			goto out_unlock;
> +
> +		if (setgroups_shadow)

I might be missing something here.  Once userns u1 has unshared
u2 with 'shadow' set, when u2 unshare u3, what here prevents
u3 from clearing shadow flag?

> +			ns->flags |= USERNS_SETGROUPS_SHADOW;
> +		else {
> +			put_group_info(ns->shadow_group_info);
> +			ns->shadow_group_info = NULL;
> +		}
>  	} else {
>  		/* Permanently disabling setgroups after setgroups has
>  		 * been enabled by writing the gid_map is not allowed.
> @@ -1256,9 +1271,11 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>  	goto out;
>  }
>  
> -bool userns_may_setgroups(const struct user_namespace *ns)
> +bool userns_may_setgroups(const struct user_namespace *ns,
> +			 struct group_info **shadowed_groups)
>  {
>  	bool allowed;
> +	struct group_info *shadowed = NULL;
>  
>  	mutex_lock(&userns_state_mutex);
>  	/* It is not safe to use setgroups until a gid mapping in
> @@ -1267,8 +1284,13 @@ bool userns_may_setgroups(const struct user_namespace *ns)
>  	allowed = ns->gid_map.nr_extents != 0;
>  	/* Is setgroups allowed? */
>  	allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
> +	if (allowed && (ns->flags & USERNS_SETGROUPS_SHADOW))
> +		shadowed = get_group_info(ns->shadow_group_info);
> +
>  	mutex_unlock(&userns_state_mutex);
>  
> +	*shadowed_groups = shadowed;
> +
>  	return allowed;
>  }
>  
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-15  1:51   ` Serge E. Hallyn
@ 2021-05-17 13:30     ` Giuseppe Scrivano
  2021-05-17 14:33       ` Christian Brauner
  2021-05-17 16:08       ` Serge E. Hallyn
  0 siblings, 2 replies; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-17 13:30 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: linux-kernel, dwalsh, christian.brauner, ebiederm

Hi Serge,

thanks for the review.

"Serge E. Hallyn" <serge@hallyn.com> writes:

> On Mon, May 10, 2021 at 03:00:09PM +0200, Giuseppe Scrivano wrote:
>> add a new mode 'shadow' to the /proc/PID/setgroups file.
>> 
>> When the 'shadow' string is written to the file, the current process
>> groups are inherited from the process and stored into the user
>> namespace.  These groups will be silently added on each setgroups(2)
>> call.  A child user namespace won't be able to drop these groups
>> anymore.
>> 
>> It enables using setgroups(2) in user namespaces on systems where
>> negative ACLs are used.
>> 
>> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>
> Thanks for re-sending.
>
> Two closely related questions (and one comment) below.
>
>> ---
>>  include/linux/cred.h           |  4 ++-
>>  include/linux/user_namespace.h | 11 +++++--
>>  kernel/groups.c                | 60 +++++++++++++++++++++++++---------
>>  kernel/uid16.c                 | 38 ++++++++++++++++-----
>>  kernel/user_namespace.c        | 34 +++++++++++++++----
>>  5 files changed, 114 insertions(+), 33 deletions(-)
>> 
>> diff --git a/include/linux/cred.h b/include/linux/cred.h
>> index 14971322e1a0..f3e631293532 100644
>> --- a/include/linux/cred.h
>> +++ b/include/linux/cred.h
>> @@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
>>  
>>  extern int set_current_groups(struct group_info *);
>>  extern void set_groups(struct cred *, struct group_info *);
>> -extern bool may_setgroups(void);
>> +extern bool may_setgroups(struct group_info **shadowed_groups);
>> +extern void add_shadowed_groups(struct group_info *group_info,
>> +				struct group_info *shadowed);
>>  extern void groups_sort(struct group_info *);
>>  #else
>>  static inline void groups_free(struct group_info *group_info)
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index 1d08dbbcfe32..cb003172de20 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
>>  };
>>  
>>  #define USERNS_SETGROUPS_ALLOWED 1UL
>> +#define USERNS_SETGROUPS_SHADOW  2UL
>>  
>>  #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
>>  
>> @@ -93,6 +94,9 @@ struct user_namespace {
>>  #endif
>>  	struct ucounts		*ucounts;
>>  	int ucount_max[UCOUNT_COUNTS];
>> +
>> +	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
>> +	struct group_info *shadow_group_info;
>>  } __randomize_layout;
>>  
>>  struct ucounts {
>> @@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
>>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
>>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
>>  extern int proc_setgroups_show(struct seq_file *m, void *v);
>> -extern bool userns_may_setgroups(const struct user_namespace *ns);
>
> I realize there's not a single other comment in this file, but I
> think userns_may_setgroups() could use a comment to explain what
> shadowed_groups is.

sure, I'll add some comments to the code.

>> +extern bool userns_may_setgroups(const struct user_namespace *ns,
>> +				 struct group_info **shadowed_groups);
>>  extern bool in_userns(const struct user_namespace *ancestor,
>>  		       const struct user_namespace *child);
>>  extern bool current_in_userns(const struct user_namespace *target_ns);
>> @@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
>>  {
>>  }
>>  
>> -static inline bool userns_may_setgroups(const struct user_namespace *ns)
>> +static inline bool userns_may_setgroups(const struct user_namespace *ns,
>> +					struct group_info **shadowed_groups)
>>  {
>> +	*shadowed_groups = NULL;
>>  	return true;
>>  }
>>  
>> diff --git a/kernel/groups.c b/kernel/groups.c
>> index 787b381c7c00..f0c3b49da19e 100644
>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
>>  
>>  /* fill a group_info from a user-space array - it must be allocated already */
>>  static int groups_from_user(struct group_info *group_info,
>> -    gid_t __user *grouplist)
>> +			    gid_t __user *grouplist, int count)
>>  {
>>  	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>> -	unsigned int count = group_info->ngroups;
>>  
>>  	for (i = 0; i < count; i++) {
>>  		gid_t gid;
>> @@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
>>  	return i;
>>  }
>>  
>> -bool may_setgroups(void)
>> +bool may_setgroups(struct group_info **shadowed_groups)
>>  {
>>  	struct user_namespace *user_ns = current_user_ns();
>>  
>>  	return ns_capable_setid(user_ns, CAP_SETGID) &&
>> -		userns_may_setgroups(user_ns);
>> +		userns_may_setgroups(user_ns, shadowed_groups);
>> +}
>> +
>> +void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
>> +{
>> +	int i, j;
>> +
>> +	for (i = 0; i < shadowed->ngroups; i++) {
>> +		kgid_t kgid = shadowed->gid[i];
>> +
>> +		j = group_info->ngroups - i - 1;
>> +		group_info->gid[j] = kgid;
>> +	}
>>  }
>>  
>>  /*
>> @@ -184,27 +195,44 @@ bool may_setgroups(void)
>>  
>>  SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>>  {
>> -	struct group_info *group_info;
>> +	struct group_info *shadowed_groups = NULL;
>> +	struct group_info *group_info = NULL;
>> +	unsigned int arraysize = gidsetsize;
>>  	int retval;
>>  
>> -	if (!may_setgroups())
>> -		return -EPERM;
>> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
>> +	if (arraysize > NGROUPS_MAX)
>>  		return -EINVAL;
>>  
>> -	group_info = groups_alloc(gidsetsize);
>> -	if (!group_info)
>> -		return -ENOMEM;
>> -	retval = groups_from_user(group_info, grouplist);
>> -	if (retval) {
>> -		put_group_info(group_info);
>> -		return retval;
>> +	if (!may_setgroups(&shadowed_groups))
>> +		return -EPERM;
>> +
>> +	if (shadowed_groups) {
>> +		retval = -EINVAL;
>> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
>> +			goto out;
>> +		arraysize += shadowed_groups->ngroups;
>>  	}
>>  
>> +	group_info = groups_alloc(arraysize);
>> +	retval = -ENOMEM;
>> +	if (!group_info)
>> +		goto out;
>> +
>> +	retval = groups_from_user(group_info, grouplist, gidsetsize);
>> +	if (retval)
>> +		goto out;
>> +
>> +	if (shadowed_groups)
>> +		add_shadowed_groups(group_info, shadowed_groups);
>> +
>>  	groups_sort(group_info);
>>  	retval = set_current_groups(group_info);
>> -	put_group_info(group_info);
>>  
>> +out:
>> +	if (group_info)
>> +		put_group_info(group_info);
>> +	if (shadowed_groups)
>> +		put_group_info(shadowed_groups);
>>  	return retval;
>>  }
>>  
>> diff --git a/kernel/uid16.c b/kernel/uid16.c
>> index af6925d8599b..cb1110f083ce 100644
>> --- a/kernel/uid16.c
>> +++ b/kernel/uid16.c
>> @@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
>>  }
>>  
>>  static int groups16_from_user(struct group_info *group_info,
>> -    old_gid_t __user *grouplist)
>> +			      old_gid_t __user *grouplist, int count)
>>  {
>>  	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>>  	old_gid_t group;
>>  	kgid_t kgid;
>>  
>> -	for (i = 0; i < group_info->ngroups; i++) {
>> +	for (i = 0; i < count; i++) {
>>  		if (get_user(group, grouplist+i))
>>  			return  -EFAULT;
>>  
>> @@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>>  {
>>  	struct group_info *group_info;
>> +	struct group_info *shadowed_groups = NULL;
>>  	int retval;
>> +	unsigned int arraysize = gidsetsize;
>>  
>> -	if (!may_setgroups())
>> -		return -EPERM;
>> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
>> +	if (arraysize > NGROUPS_MAX)
>>  		return -EINVAL;
>>  
>> -	group_info = groups_alloc(gidsetsize);
>> -	if (!group_info)
>> +	if (!may_setgroups(&shadowed_groups))
>> +		return -EPERM;
>> +
>> +	if (shadowed_groups) {
>> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
>> +			put_group_info(shadowed_groups);
>> +			return -EINVAL;
>> +		}
>> +		arraysize += shadowed_groups->ngroups;
>> +	}
>> +
>> +	group_info = groups_alloc(arraysize);
>> +	if (!group_info) {
>> +		if (shadowed_groups)
>> +			put_group_info(shadowed_groups);
>>  		return -ENOMEM;
>> -	retval = groups16_from_user(group_info, grouplist);
>> +	}
>> +
>> +	retval = groups16_from_user(group_info, grouplist, gidsetsize);
>>  	if (retval) {
>> +		if (shadowed_groups)
>> +			put_group_info(shadowed_groups);
>>  		put_group_info(group_info);
>>  		return retval;
>>  	}
>>  
>> +	if (shadowed_groups)
>> +		add_shadowed_groups(group_info, shadowed_groups);
>> +
>>  	groups_sort(group_info);
>>  	retval = set_current_groups(group_info);
>>  	put_group_info(group_info);
>> +	if (shadowed_groups)
>> +		put_group_info(shadowed_groups);
>>  
>>  	return retval;
>>  }
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 8d62863721b0..b1940b63f7ac 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
>>  		ns->ucount_max[i] = INT_MAX;
>>  	}
>>  	ns->ucounts = ucounts;
>> +	ns->shadow_group_info = get_current_groups();
>
> If userns u1 unshares u2 with shadow set, then when u2 unshares
> u3, should u3 get the same shadowed set that u2 has, or should it
> get all of u2's groups as u3's initial shadow set?

good question.  Thinking more of it, I think a reasonable interface is
to expect a child userns to inherit the same shadow groups as its parent
userns.  If "shadow" is written again to the /proc/PID/setgroups file
then it grows shadow groups set to include the ones the userns had at
creation time (which includes the parent shadow groups).  What do you
think of it?  I'll play more with this idea and see if it works.


>>  	/* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
>>  	mutex_lock(&userns_state_mutex);
>> @@ -177,6 +178,9 @@ static void free_user_ns(struct work_struct *work)
>>  	struct user_namespace *parent, *ns =
>>  		container_of(work, struct user_namespace, work);
>>  
>> +	if (ns->shadow_group_info)
>> +		put_group_info(ns->shadow_group_info);
>> +
>>  	do {
>>  		struct ucounts *ucounts = ns->ucounts;
>>  		parent = ns->parent;
>> @@ -1185,7 +1189,8 @@ int proc_setgroups_show(struct seq_file *seq, void *v)
>>  
>>  	seq_printf(seq, "%s\n",
>>  		   (userns_flags & USERNS_SETGROUPS_ALLOWED) ?
>> -		   "allow" : "deny");
>> +		   ((userns_flags & USERNS_SETGROUPS_SHADOW) ? "shadow" : "allow")
>> +		   : "deny");
>>  	return 0;
>>  }
>>  
>> @@ -1196,6 +1201,7 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>  	struct user_namespace *ns = seq->private;
>>  	char kbuf[8], *pos;
>>  	bool setgroups_allowed;
>> +	bool setgroups_shadow = false;
>>  	ssize_t ret;
>>  
>>  	/* Only allow a very narrow range of strings to be written */
>> @@ -1215,12 +1221,14 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>  	if (strncmp(pos, "allow", 5) == 0) {
>>  		pos += 5;
>>  		setgroups_allowed = true;
>> -	}
>> -	else if (strncmp(pos, "deny", 4) == 0) {
>> +	} else if (strncmp(pos, "shadow", 6) == 0) {
>> +		pos += 6;
>> +		setgroups_allowed = true;
>> +		setgroups_shadow = true;
>> +	} else if (strncmp(pos, "deny", 4) == 0) {
>>  		pos += 4;
>>  		setgroups_allowed = false;
>> -	}
>> -	else
>> +	} else
>>  		goto out;
>>  
>>  	/* Verify there is not trailing junk on the line */
>> @@ -1236,6 +1244,13 @@ ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
>>  		 */
>>  		if (!(ns->flags & USERNS_SETGROUPS_ALLOWED))
>>  			goto out_unlock;
>> +
>> +		if (setgroups_shadow)
>
> I might be missing something here.  Once userns u1 has unshared
> u2 with 'shadow' set, when u2 unshare u3, what here prevents
> u3 from clearing shadow flag?
>
>> +			ns->flags |= USERNS_SETGROUPS_SHADOW;

once it is set, the shadow flag cannot be cleared.  These flags will be
inherited by any child user namespace thus preventing a nested userns to
gain the capability to drop these groups.

Regards,
Giuseppe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 13:30     ` Giuseppe Scrivano
@ 2021-05-17 14:33       ` Christian Brauner
  2021-05-17 16:17         ` Serge E. Hallyn
                           ` (2 more replies)
  2021-05-17 16:08       ` Serge E. Hallyn
  1 sibling, 3 replies; 18+ messages in thread
From: Christian Brauner @ 2021-05-17 14:33 UTC (permalink / raw)
  To: Giuseppe Scrivano; +Cc: Serge E. Hallyn, linux-kernel, dwalsh, ebiederm

On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> Hi Serge,
> 
> thanks for the review.
> 
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > On Mon, May 10, 2021 at 03:00:09PM +0200, Giuseppe Scrivano wrote:
> >> add a new mode 'shadow' to the /proc/PID/setgroups file.
> >> 
> >> When the 'shadow' string is written to the file, the current process
> >> groups are inherited from the process and stored into the user
> >> namespace.  These groups will be silently added on each setgroups(2)
> >> call.  A child user namespace won't be able to drop these groups
> >> anymore.
> >> 
> >> It enables using setgroups(2) in user namespaces on systems where
> >> negative ACLs are used.
> >> 
> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> >
> > Thanks for re-sending.
> >
> > Two closely related questions (and one comment) below.
> >
> >> ---
> >>  include/linux/cred.h           |  4 ++-
> >>  include/linux/user_namespace.h | 11 +++++--
> >>  kernel/groups.c                | 60 +++++++++++++++++++++++++---------
> >>  kernel/uid16.c                 | 38 ++++++++++++++++-----
> >>  kernel/user_namespace.c        | 34 +++++++++++++++----
> >>  5 files changed, 114 insertions(+), 33 deletions(-)
> >> 
> >> diff --git a/include/linux/cred.h b/include/linux/cred.h
> >> index 14971322e1a0..f3e631293532 100644
> >> --- a/include/linux/cred.h
> >> +++ b/include/linux/cred.h
> >> @@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
> >>  
> >>  extern int set_current_groups(struct group_info *);
> >>  extern void set_groups(struct cred *, struct group_info *);
> >> -extern bool may_setgroups(void);
> >> +extern bool may_setgroups(struct group_info **shadowed_groups);
> >> +extern void add_shadowed_groups(struct group_info *group_info,
> >> +				struct group_info *shadowed);
> >>  extern void groups_sort(struct group_info *);
> >>  #else
> >>  static inline void groups_free(struct group_info *group_info)
> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> index 1d08dbbcfe32..cb003172de20 100644
> >> --- a/include/linux/user_namespace.h
> >> +++ b/include/linux/user_namespace.h
> >> @@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
> >>  };
> >>  
> >>  #define USERNS_SETGROUPS_ALLOWED 1UL
> >> +#define USERNS_SETGROUPS_SHADOW  2UL
> >>  
> >>  #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
> >>  
> >> @@ -93,6 +94,9 @@ struct user_namespace {
> >>  #endif
> >>  	struct ucounts		*ucounts;
> >>  	int ucount_max[UCOUNT_COUNTS];
> >> +
> >> +	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
> >> +	struct group_info *shadow_group_info;
> >>  } __randomize_layout;
> >>  
> >>  struct ucounts {
> >> @@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
> >>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> >>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
> >>  extern int proc_setgroups_show(struct seq_file *m, void *v);
> >> -extern bool userns_may_setgroups(const struct user_namespace *ns);
> >
> > I realize there's not a single other comment in this file, but I
> > think userns_may_setgroups() could use a comment to explain what
> > shadowed_groups is.
> 
> sure, I'll add some comments to the code.
> 
> >> +extern bool userns_may_setgroups(const struct user_namespace *ns,
> >> +				 struct group_info **shadowed_groups);
> >>  extern bool in_userns(const struct user_namespace *ancestor,
> >>  		       const struct user_namespace *child);
> >>  extern bool current_in_userns(const struct user_namespace *target_ns);
> >> @@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
> >>  {
> >>  }
> >>  
> >> -static inline bool userns_may_setgroups(const struct user_namespace *ns)
> >> +static inline bool userns_may_setgroups(const struct user_namespace *ns,
> >> +					struct group_info **shadowed_groups)
> >>  {
> >> +	*shadowed_groups = NULL;
> >>  	return true;
> >>  }
> >>  
> >> diff --git a/kernel/groups.c b/kernel/groups.c
> >> index 787b381c7c00..f0c3b49da19e 100644
> >> --- a/kernel/groups.c
> >> +++ b/kernel/groups.c
> >> @@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
> >>  
> >>  /* fill a group_info from a user-space array - it must be allocated already */
> >>  static int groups_from_user(struct group_info *group_info,
> >> -    gid_t __user *grouplist)
> >> +			    gid_t __user *grouplist, int count)
> >>  {
> >>  	struct user_namespace *user_ns = current_user_ns();
> >>  	int i;
> >> -	unsigned int count = group_info->ngroups;
> >>  
> >>  	for (i = 0; i < count; i++) {
> >>  		gid_t gid;
> >> @@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
> >>  	return i;
> >>  }
> >>  
> >> -bool may_setgroups(void)
> >> +bool may_setgroups(struct group_info **shadowed_groups)
> >>  {
> >>  	struct user_namespace *user_ns = current_user_ns();
> >>  
> >>  	return ns_capable_setid(user_ns, CAP_SETGID) &&
> >> -		userns_may_setgroups(user_ns);
> >> +		userns_may_setgroups(user_ns, shadowed_groups);
> >> +}
> >> +
> >> +void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
> >> +{
> >> +	int i, j;
> >> +
> >> +	for (i = 0; i < shadowed->ngroups; i++) {
> >> +		kgid_t kgid = shadowed->gid[i];
> >> +
> >> +		j = group_info->ngroups - i - 1;
> >> +		group_info->gid[j] = kgid;
> >> +	}
> >>  }
> >>  
> >>  /*
> >> @@ -184,27 +195,44 @@ bool may_setgroups(void)
> >>  
> >>  SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
> >>  {
> >> -	struct group_info *group_info;
> >> +	struct group_info *shadowed_groups = NULL;
> >> +	struct group_info *group_info = NULL;
> >> +	unsigned int arraysize = gidsetsize;
> >>  	int retval;
> >>  
> >> -	if (!may_setgroups())
> >> -		return -EPERM;
> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> >> +	if (arraysize > NGROUPS_MAX)
> >>  		return -EINVAL;
> >>  
> >> -	group_info = groups_alloc(gidsetsize);
> >> -	if (!group_info)
> >> -		return -ENOMEM;
> >> -	retval = groups_from_user(group_info, grouplist);
> >> -	if (retval) {
> >> -		put_group_info(group_info);
> >> -		return retval;
> >> +	if (!may_setgroups(&shadowed_groups))
> >> +		return -EPERM;
> >> +
> >> +	if (shadowed_groups) {
> >> +		retval = -EINVAL;
> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
> >> +			goto out;
> >> +		arraysize += shadowed_groups->ngroups;
> >>  	}
> >>  
> >> +	group_info = groups_alloc(arraysize);
> >> +	retval = -ENOMEM;
> >> +	if (!group_info)
> >> +		goto out;
> >> +
> >> +	retval = groups_from_user(group_info, grouplist, gidsetsize);
> >> +	if (retval)
> >> +		goto out;
> >> +
> >> +	if (shadowed_groups)
> >> +		add_shadowed_groups(group_info, shadowed_groups);
> >> +
> >>  	groups_sort(group_info);
> >>  	retval = set_current_groups(group_info);
> >> -	put_group_info(group_info);
> >>  
> >> +out:
> >> +	if (group_info)
> >> +		put_group_info(group_info);
> >> +	if (shadowed_groups)
> >> +		put_group_info(shadowed_groups);
> >>  	return retval;
> >>  }
> >>  
> >> diff --git a/kernel/uid16.c b/kernel/uid16.c
> >> index af6925d8599b..cb1110f083ce 100644
> >> --- a/kernel/uid16.c
> >> +++ b/kernel/uid16.c
> >> @@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
> >>  }
> >>  
> >>  static int groups16_from_user(struct group_info *group_info,
> >> -    old_gid_t __user *grouplist)
> >> +			      old_gid_t __user *grouplist, int count)
> >>  {
> >>  	struct user_namespace *user_ns = current_user_ns();
> >>  	int i;
> >>  	old_gid_t group;
> >>  	kgid_t kgid;
> >>  
> >> -	for (i = 0; i < group_info->ngroups; i++) {
> >> +	for (i = 0; i < count; i++) {
> >>  		if (get_user(group, grouplist+i))
> >>  			return  -EFAULT;
> >>  
> >> @@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> >>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> >>  {
> >>  	struct group_info *group_info;
> >> +	struct group_info *shadowed_groups = NULL;
> >>  	int retval;
> >> +	unsigned int arraysize = gidsetsize;
> >>  
> >> -	if (!may_setgroups())
> >> -		return -EPERM;
> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> >> +	if (arraysize > NGROUPS_MAX)
> >>  		return -EINVAL;
> >>  
> >> -	group_info = groups_alloc(gidsetsize);
> >> -	if (!group_info)
> >> +	if (!may_setgroups(&shadowed_groups))
> >> +		return -EPERM;
> >> +
> >> +	if (shadowed_groups) {
> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
> >> +			put_group_info(shadowed_groups);
> >> +			return -EINVAL;
> >> +		}
> >> +		arraysize += shadowed_groups->ngroups;
> >> +	}
> >> +
> >> +	group_info = groups_alloc(arraysize);
> >> +	if (!group_info) {
> >> +		if (shadowed_groups)
> >> +			put_group_info(shadowed_groups);
> >>  		return -ENOMEM;
> >> -	retval = groups16_from_user(group_info, grouplist);
> >> +	}
> >> +
> >> +	retval = groups16_from_user(group_info, grouplist, gidsetsize);
> >>  	if (retval) {
> >> +		if (shadowed_groups)
> >> +			put_group_info(shadowed_groups);
> >>  		put_group_info(group_info);
> >>  		return retval;
> >>  	}
> >>  
> >> +	if (shadowed_groups)
> >> +		add_shadowed_groups(group_info, shadowed_groups);
> >> +
> >>  	groups_sort(group_info);
> >>  	retval = set_current_groups(group_info);
> >>  	put_group_info(group_info);
> >> +	if (shadowed_groups)
> >> +		put_group_info(shadowed_groups);
> >>  
> >>  	return retval;
> >>  }
> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> index 8d62863721b0..b1940b63f7ac 100644
> >> --- a/kernel/user_namespace.c
> >> +++ b/kernel/user_namespace.c
> >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
> >>  		ns->ucount_max[i] = INT_MAX;
> >>  	}
> >>  	ns->ucounts = ucounts;
> >> +	ns->shadow_group_info = get_current_groups();
> >
> > If userns u1 unshares u2 with shadow set, then when u2 unshares
> > u3, should u3 get the same shadowed set that u2 has, or should it
> > get all of u2's groups as u3's initial shadow set?
> 
> good question.  Thinking more of it, I think a reasonable interface is
> to expect a child userns to inherit the same shadow groups as its parent
> userns.  If "shadow" is written again to the /proc/PID/setgroups file
> then it grows shadow groups set to include the ones the userns had at
> creation time (which includes the parent shadow groups).  What do you
> think of it?  I'll play more with this idea and see if it works.

So when I initially looked at that proposal I was neither "yay" or "nay"
since it seemed useful to people and it looked somewhat straightforward
to implement.

But I do have concerns now after seeing this. The whole
/proc/<pid>/setgroups API is terrible in the first place and causes even
more special-casing in container runtimes then there already is. But it
fixes a security issue so ok we'll live with it.

But I'm not happy about extending its format to include more options. I
really don't want the precedent of adding magic keywords into this file.

Which brings me to my second concern. I think starting to magically
inherit group ids isn't a great idea. It's got a lot of potential for
confusion.

The point Serge here made makes this pretty obvious imho. I don't think
introducing the complexities of magic group inheritance is something we
should do.

Alternative proposal, can we solve this in userspace instead?

As has been pointed out there is a solution to this problem already
which is to explicitly map those groups through, i.e. punch holes for
the groups to be inherited.

So can't we introduce a new mode for newgidmap by e.g. introducing
another /etc/setgroups file or something similar that can be configured
by the administrator. It could take options, e.g. "shadow=always" which
could mean "everyone must inherit their groups" so newgidmap will punch
holes for the caller's groups when writing the gid mapping. We could
also extend this by making newgidmap take a command line switch so it's
on a case-by-case basis.

This is even more flexible since you could extend the new /etc/setgroups
file to specify a list of groups that must always be preserved. I'd
rather see something like this rather than a magic inheritance switch.

Christian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 13:30     ` Giuseppe Scrivano
  2021-05-17 14:33       ` Christian Brauner
@ 2021-05-17 16:08       ` Serge E. Hallyn
  1 sibling, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2021-05-17 16:08 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: Serge E. Hallyn, linux-kernel, dwalsh, christian.brauner, ebiederm

On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> Hi Serge,
> 
> thanks for the review.
> 
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> index 8d62863721b0..b1940b63f7ac 100644
> >> --- a/kernel/user_namespace.c
> >> +++ b/kernel/user_namespace.c
> >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
> >>  		ns->ucount_max[i] = INT_MAX;
> >>  	}
> >>  	ns->ucounts = ucounts;
> >> +	ns->shadow_group_info = get_current_groups();
> >
> > If userns u1 unshares u2 with shadow set, then when u2 unshares
> > u3, should u3 get the same shadowed set that u2 has, or should it
> > get all of u2's groups as u3's initial shadow set?
> 
> good question.  Thinking more of it, I think a reasonable interface is
> to expect a child userns to inherit the same shadow groups as its parent
> userns.  If "shadow" is written again to the /proc/PID/setgroups file
> then it grows shadow groups set to include the ones the userns had at
> creation time (which includes the parent shadow groups).  What do you
> think of it?  I'll play more with this idea and see if it works.

That's what I was thinking would make the most sense.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 14:33       ` Christian Brauner
@ 2021-05-17 16:17         ` Serge E. Hallyn
  2021-05-18  9:32           ` Christian Brauner
  2021-05-17 19:00         ` Giuseppe Scrivano
  2021-05-21 14:03         ` Snaipe
  2 siblings, 1 reply; 18+ messages in thread
From: Serge E. Hallyn @ 2021-05-17 16:17 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Giuseppe Scrivano, Serge E. Hallyn, linux-kernel, dwalsh, ebiederm

On Mon, May 17, 2021 at 04:33:21PM +0200, Christian Brauner wrote:
> On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> > >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> > >> index 8d62863721b0..b1940b63f7ac 100644
> > >> --- a/kernel/user_namespace.c
> > >> +++ b/kernel/user_namespace.c
> > >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
> > >>  		ns->ucount_max[i] = INT_MAX;
> > >>  	}
> > >>  	ns->ucounts = ucounts;
> > >> +	ns->shadow_group_info = get_current_groups();
> > >
> > > If userns u1 unshares u2 with shadow set, then when u2 unshares
> > > u3, should u3 get the same shadowed set that u2 has, or should it
> > > get all of u2's groups as u3's initial shadow set?
> > 
> > good question.  Thinking more of it, I think a reasonable interface is
> > to expect a child userns to inherit the same shadow groups as its parent
> > userns.  If "shadow" is written again to the /proc/PID/setgroups file
> > then it grows shadow groups set to include the ones the userns had at
> > creation time (which includes the parent shadow groups).  What do you
> > think of it?  I'll play more with this idea and see if it works.
> 
> So when I initially looked at that proposal I was neither "yay" or "nay"
> since it seemed useful to people and it looked somewhat straightforward
> to implement.
> 
> But I do have concerns now after seeing this. The whole
> /proc/<pid>/setgroups API is terrible in the first place and causes even
> more special-casing in container runtimes then there already is. But it
> fixes a security issue so ok we'll live with it.
> 
> But I'm not happy about extending its format to include more options. I
> really don't want the precedent of adding magic keywords into this file.
> 
> Which brings me to my second concern. I think starting to magically
> inherit group ids isn't a great idea. It's got a lot of potential for
> confusion.
> 
> The point Serge here made makes this pretty obvious imho. I don't think
> introducing the complexities of magic group inheritance is something we
> should do.
> 
> Alternative proposal, can we solve this in userspace instead?
> 
> As has been pointed out there is a solution to this problem already
> which is to explicitly map those groups through, i.e. punch holes for
> the groups to be inherited.
> 
> So can't we introduce a new mode for newgidmap by e.g. introducing
> another /etc/setgroups file or something similar that can be configured
> by the administrator. It could take options, e.g. "shadow=always" which
> could mean "everyone must inherit their groups" so newgidmap will punch
> holes for the caller's groups when writing the gid mapping. We could
> also extend this by making newgidmap take a command line switch so it's
> on a case-by-case basis.
> 
> This is even more flexible since you could extend the new /etc/setgroups
> file to specify a list of groups that must always be preserved. I'd
> rather see something like this rather than a magic inheritance switch.
> 
> Christian

That sounds reasonable, but my concern is that admins currently using
groups to deny file access will need to take extra steps to maintain
that guarantee.  I think that falls under the category of a regression.
Unless we make 'shadow=always' the default.  But then *that* will regress
users who currently do not want that feature :)

Anyway, if we do go this route - or maybe a login.defs option
"ALLOW_UNPRIV_GROUPS_DROP" - perhaps we can also add a new /etc/subauxgroups
file (TODO find a better name) which admins who are in the know can use to say
"hallyn 2000" meaning "user hallyn cannot drop group 2000"

-serge

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 14:33       ` Christian Brauner
  2021-05-17 16:17         ` Serge E. Hallyn
@ 2021-05-17 19:00         ` Giuseppe Scrivano
  2021-05-18  9:16           ` Christian Brauner
  2021-05-21 14:03         ` Snaipe
  2 siblings, 1 reply; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-17 19:00 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Serge E. Hallyn, linux-kernel, dwalsh, ebiederm

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
>> Hi Serge,
>> 
>> thanks for the review.
>> 
>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>> 
>> > On Mon, May 10, 2021 at 03:00:09PM +0200, Giuseppe Scrivano wrote:
>> >> add a new mode 'shadow' to the /proc/PID/setgroups file.
>> >> 
>> >> When the 'shadow' string is written to the file, the current process
>> >> groups are inherited from the process and stored into the user
>> >> namespace.  These groups will be silently added on each setgroups(2)
>> >> call.  A child user namespace won't be able to drop these groups
>> >> anymore.
>> >> 
>> >> It enables using setgroups(2) in user namespaces on systems where
>> >> negative ACLs are used.
>> >> 
>> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>> >
>> > Thanks for re-sending.
>> >
>> > Two closely related questions (and one comment) below.
>> >
>> >> ---
>> >>  include/linux/cred.h           |  4 ++-
>> >>  include/linux/user_namespace.h | 11 +++++--
>> >>  kernel/groups.c                | 60 +++++++++++++++++++++++++---------
>> >>  kernel/uid16.c                 | 38 ++++++++++++++++-----
>> >>  kernel/user_namespace.c        | 34 +++++++++++++++----
>> >>  5 files changed, 114 insertions(+), 33 deletions(-)
>> >> 
>> >> diff --git a/include/linux/cred.h b/include/linux/cred.h
>> >> index 14971322e1a0..f3e631293532 100644
>> >> --- a/include/linux/cred.h
>> >> +++ b/include/linux/cred.h
>> >> @@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
>> >>  
>> >>  extern int set_current_groups(struct group_info *);
>> >>  extern void set_groups(struct cred *, struct group_info *);
>> >> -extern bool may_setgroups(void);
>> >> +extern bool may_setgroups(struct group_info **shadowed_groups);
>> >> +extern void add_shadowed_groups(struct group_info *group_info,
>> >> +				struct group_info *shadowed);
>> >>  extern void groups_sort(struct group_info *);
>> >>  #else
>> >>  static inline void groups_free(struct group_info *group_info)
>> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> >> index 1d08dbbcfe32..cb003172de20 100644
>> >> --- a/include/linux/user_namespace.h
>> >> +++ b/include/linux/user_namespace.h
>> >> @@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
>> >>  };
>> >>  
>> >>  #define USERNS_SETGROUPS_ALLOWED 1UL
>> >> +#define USERNS_SETGROUPS_SHADOW  2UL
>> >>  
>> >>  #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
>> >>  
>> >> @@ -93,6 +94,9 @@ struct user_namespace {
>> >>  #endif
>> >>  	struct ucounts		*ucounts;
>> >>  	int ucount_max[UCOUNT_COUNTS];
>> >> +
>> >> +	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
>> >> +	struct group_info *shadow_group_info;
>> >>  } __randomize_layout;
>> >>  
>> >>  struct ucounts {
>> >> @@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
>> >>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
>> >>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
>> >>  extern int proc_setgroups_show(struct seq_file *m, void *v);
>> >> -extern bool userns_may_setgroups(const struct user_namespace *ns);
>> >
>> > I realize there's not a single other comment in this file, but I
>> > think userns_may_setgroups() could use a comment to explain what
>> > shadowed_groups is.
>> 
>> sure, I'll add some comments to the code.
>> 
>> >> +extern bool userns_may_setgroups(const struct user_namespace *ns,
>> >> +				 struct group_info **shadowed_groups);
>> >>  extern bool in_userns(const struct user_namespace *ancestor,
>> >>  		       const struct user_namespace *child);
>> >>  extern bool current_in_userns(const struct user_namespace *target_ns);
>> >> @@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
>> >>  {
>> >>  }
>> >>  
>> >> -static inline bool userns_may_setgroups(const struct user_namespace *ns)
>> >> +static inline bool userns_may_setgroups(const struct user_namespace *ns,
>> >> +					struct group_info **shadowed_groups)
>> >>  {
>> >> +	*shadowed_groups = NULL;
>> >>  	return true;
>> >>  }
>> >>  
>> >> diff --git a/kernel/groups.c b/kernel/groups.c
>> >> index 787b381c7c00..f0c3b49da19e 100644
>> >> --- a/kernel/groups.c
>> >> +++ b/kernel/groups.c
>> >> @@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
>> >>  
>> >>  /* fill a group_info from a user-space array - it must be allocated already */
>> >>  static int groups_from_user(struct group_info *group_info,
>> >> -    gid_t __user *grouplist)
>> >> +			    gid_t __user *grouplist, int count)
>> >>  {
>> >>  	struct user_namespace *user_ns = current_user_ns();
>> >>  	int i;
>> >> -	unsigned int count = group_info->ngroups;
>> >>  
>> >>  	for (i = 0; i < count; i++) {
>> >>  		gid_t gid;
>> >> @@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
>> >>  	return i;
>> >>  }
>> >>  
>> >> -bool may_setgroups(void)
>> >> +bool may_setgroups(struct group_info **shadowed_groups)
>> >>  {
>> >>  	struct user_namespace *user_ns = current_user_ns();
>> >>  
>> >>  	return ns_capable_setid(user_ns, CAP_SETGID) &&
>> >> -		userns_may_setgroups(user_ns);
>> >> +		userns_may_setgroups(user_ns, shadowed_groups);
>> >> +}
>> >> +
>> >> +void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
>> >> +{
>> >> +	int i, j;
>> >> +
>> >> +	for (i = 0; i < shadowed->ngroups; i++) {
>> >> +		kgid_t kgid = shadowed->gid[i];
>> >> +
>> >> +		j = group_info->ngroups - i - 1;
>> >> +		group_info->gid[j] = kgid;
>> >> +	}
>> >>  }
>> >>  
>> >>  /*
>> >> @@ -184,27 +195,44 @@ bool may_setgroups(void)
>> >>  
>> >>  SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>> >>  {
>> >> -	struct group_info *group_info;
>> >> +	struct group_info *shadowed_groups = NULL;
>> >> +	struct group_info *group_info = NULL;
>> >> +	unsigned int arraysize = gidsetsize;
>> >>  	int retval;
>> >>  
>> >> -	if (!may_setgroups())
>> >> -		return -EPERM;
>> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
>> >> +	if (arraysize > NGROUPS_MAX)
>> >>  		return -EINVAL;
>> >>  
>> >> -	group_info = groups_alloc(gidsetsize);
>> >> -	if (!group_info)
>> >> -		return -ENOMEM;
>> >> -	retval = groups_from_user(group_info, grouplist);
>> >> -	if (retval) {
>> >> -		put_group_info(group_info);
>> >> -		return retval;
>> >> +	if (!may_setgroups(&shadowed_groups))
>> >> +		return -EPERM;
>> >> +
>> >> +	if (shadowed_groups) {
>> >> +		retval = -EINVAL;
>> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
>> >> +			goto out;
>> >> +		arraysize += shadowed_groups->ngroups;
>> >>  	}
>> >>  
>> >> +	group_info = groups_alloc(arraysize);
>> >> +	retval = -ENOMEM;
>> >> +	if (!group_info)
>> >> +		goto out;
>> >> +
>> >> +	retval = groups_from_user(group_info, grouplist, gidsetsize);
>> >> +	if (retval)
>> >> +		goto out;
>> >> +
>> >> +	if (shadowed_groups)
>> >> +		add_shadowed_groups(group_info, shadowed_groups);
>> >> +
>> >>  	groups_sort(group_info);
>> >>  	retval = set_current_groups(group_info);
>> >> -	put_group_info(group_info);
>> >>  
>> >> +out:
>> >> +	if (group_info)
>> >> +		put_group_info(group_info);
>> >> +	if (shadowed_groups)
>> >> +		put_group_info(shadowed_groups);
>> >>  	return retval;
>> >>  }
>> >>  
>> >> diff --git a/kernel/uid16.c b/kernel/uid16.c
>> >> index af6925d8599b..cb1110f083ce 100644
>> >> --- a/kernel/uid16.c
>> >> +++ b/kernel/uid16.c
>> >> @@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
>> >>  }
>> >>  
>> >>  static int groups16_from_user(struct group_info *group_info,
>> >> -    old_gid_t __user *grouplist)
>> >> +			      old_gid_t __user *grouplist, int count)
>> >>  {
>> >>  	struct user_namespace *user_ns = current_user_ns();
>> >>  	int i;
>> >>  	old_gid_t group;
>> >>  	kgid_t kgid;
>> >>  
>> >> -	for (i = 0; i < group_info->ngroups; i++) {
>> >> +	for (i = 0; i < count; i++) {
>> >>  		if (get_user(group, grouplist+i))
>> >>  			return  -EFAULT;
>> >>  
>> >> @@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>> >>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>> >>  {
>> >>  	struct group_info *group_info;
>> >> +	struct group_info *shadowed_groups = NULL;
>> >>  	int retval;
>> >> +	unsigned int arraysize = gidsetsize;
>> >>  
>> >> -	if (!may_setgroups())
>> >> -		return -EPERM;
>> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
>> >> +	if (arraysize > NGROUPS_MAX)
>> >>  		return -EINVAL;
>> >>  
>> >> -	group_info = groups_alloc(gidsetsize);
>> >> -	if (!group_info)
>> >> +	if (!may_setgroups(&shadowed_groups))
>> >> +		return -EPERM;
>> >> +
>> >> +	if (shadowed_groups) {
>> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
>> >> +			put_group_info(shadowed_groups);
>> >> +			return -EINVAL;
>> >> +		}
>> >> +		arraysize += shadowed_groups->ngroups;
>> >> +	}
>> >> +
>> >> +	group_info = groups_alloc(arraysize);
>> >> +	if (!group_info) {
>> >> +		if (shadowed_groups)
>> >> +			put_group_info(shadowed_groups);
>> >>  		return -ENOMEM;
>> >> -	retval = groups16_from_user(group_info, grouplist);
>> >> +	}
>> >> +
>> >> +	retval = groups16_from_user(group_info, grouplist, gidsetsize);
>> >>  	if (retval) {
>> >> +		if (shadowed_groups)
>> >> +			put_group_info(shadowed_groups);
>> >>  		put_group_info(group_info);
>> >>  		return retval;
>> >>  	}
>> >>  
>> >> +	if (shadowed_groups)
>> >> +		add_shadowed_groups(group_info, shadowed_groups);
>> >> +
>> >>  	groups_sort(group_info);
>> >>  	retval = set_current_groups(group_info);
>> >>  	put_group_info(group_info);
>> >> +	if (shadowed_groups)
>> >> +		put_group_info(shadowed_groups);
>> >>  
>> >>  	return retval;
>> >>  }
>> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> >> index 8d62863721b0..b1940b63f7ac 100644
>> >> --- a/kernel/user_namespace.c
>> >> +++ b/kernel/user_namespace.c
>> >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
>> >>  		ns->ucount_max[i] = INT_MAX;
>> >>  	}
>> >>  	ns->ucounts = ucounts;
>> >> +	ns->shadow_group_info = get_current_groups();
>> >
>> > If userns u1 unshares u2 with shadow set, then when u2 unshares
>> > u3, should u3 get the same shadowed set that u2 has, or should it
>> > get all of u2's groups as u3's initial shadow set?
>> 
>> good question.  Thinking more of it, I think a reasonable interface is
>> to expect a child userns to inherit the same shadow groups as its parent
>> userns.  If "shadow" is written again to the /proc/PID/setgroups file
>> then it grows shadow groups set to include the ones the userns had at
>> creation time (which includes the parent shadow groups).  What do you
>> think of it?  I'll play more with this idea and see if it works.
>
> So when I initially looked at that proposal I was neither "yay" or "nay"
> since it seemed useful to people and it looked somewhat straightforward
> to implement.
>
> But I do have concerns now after seeing this. The whole
> /proc/<pid>/setgroups API is terrible in the first place and causes even
> more special-casing in container runtimes then there already is. But it
> fixes a security issue so ok we'll live with it.
>
> But I'm not happy about extending its format to include more options. I
> really don't want the precedent of adding magic keywords into this file.
>
> Which brings me to my second concern. I think starting to magically
> inherit group ids isn't a great idea. It's got a lot of potential for
> confusion.
>
> The point Serge here made makes this pretty obvious imho. I don't think
> introducing the complexities of magic group inheritance is something we
> should do.
>
> Alternative proposal, can we solve this in userspace instead?
>
> As has been pointed out there is a solution to this problem already
> which is to explicitly map those groups through, i.e. punch holes for
> the groups to be inherited.

That is an useful feature and I think we should also have, but IMO it
solves a different problem.

It can be used by unprivileged users to maintain their additional gids
inside a user namespace through the setgid helper, but it won't be
enough to safely use setgroups when negative "groups" are involved.

> So can't we introduce a new mode for newgidmap by e.g. introducing
> another /etc/setgroups file or something similar that can be configured
> by the administrator. It could take options, e.g. "shadow=always" which
> could mean "everyone must inherit their groups" so newgidmap will punch
> holes for the caller's groups when writing the gid mapping. We could
> also extend this by making newgidmap take a command line switch so it's
> on a case-by-case basis.
>
> This is even more flexible since you could extend the new /etc/setgroups
> file to specify a list of groups that must always be preserved. I'd
> rather see something like this rather than a magic inheritance switch.

This is helpful and tracked upstream: https://github.com/shadow-maint/shadow/issues/135

But how can we enforce the shadow=always at runtime when setgroups is
enabled?

Thanks,
Giuseppe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 19:00         ` Giuseppe Scrivano
@ 2021-05-18  9:16           ` Christian Brauner
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2021-05-18  9:16 UTC (permalink / raw)
  To: Giuseppe Scrivano; +Cc: Serge E. Hallyn, linux-kernel, dwalsh, ebiederm

On Mon, May 17, 2021 at 09:00:58PM +0200, Giuseppe Scrivano wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> >> Hi Serge,
> >> 
> >> thanks for the review.
> >> 
> >> "Serge E. Hallyn" <serge@hallyn.com> writes:
> >> 
> >> > On Mon, May 10, 2021 at 03:00:09PM +0200, Giuseppe Scrivano wrote:
> >> >> add a new mode 'shadow' to the /proc/PID/setgroups file.
> >> >> 
> >> >> When the 'shadow' string is written to the file, the current process
> >> >> groups are inherited from the process and stored into the user
> >> >> namespace.  These groups will be silently added on each setgroups(2)
> >> >> call.  A child user namespace won't be able to drop these groups
> >> >> anymore.
> >> >> 
> >> >> It enables using setgroups(2) in user namespaces on systems where
> >> >> negative ACLs are used.
> >> >> 
> >> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> >> >
> >> > Thanks for re-sending.
> >> >
> >> > Two closely related questions (and one comment) below.
> >> >
> >> >> ---
> >> >>  include/linux/cred.h           |  4 ++-
> >> >>  include/linux/user_namespace.h | 11 +++++--
> >> >>  kernel/groups.c                | 60 +++++++++++++++++++++++++---------
> >> >>  kernel/uid16.c                 | 38 ++++++++++++++++-----
> >> >>  kernel/user_namespace.c        | 34 +++++++++++++++----
> >> >>  5 files changed, 114 insertions(+), 33 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/cred.h b/include/linux/cred.h
> >> >> index 14971322e1a0..f3e631293532 100644
> >> >> --- a/include/linux/cred.h
> >> >> +++ b/include/linux/cred.h
> >> >> @@ -63,7 +63,9 @@ extern int groups_search(const struct group_info *, kgid_t);
> >> >>  
> >> >>  extern int set_current_groups(struct group_info *);
> >> >>  extern void set_groups(struct cred *, struct group_info *);
> >> >> -extern bool may_setgroups(void);
> >> >> +extern bool may_setgroups(struct group_info **shadowed_groups);
> >> >> +extern void add_shadowed_groups(struct group_info *group_info,
> >> >> +				struct group_info *shadowed);
> >> >>  extern void groups_sort(struct group_info *);
> >> >>  #else
> >> >>  static inline void groups_free(struct group_info *group_info)
> >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> >> index 1d08dbbcfe32..cb003172de20 100644
> >> >> --- a/include/linux/user_namespace.h
> >> >> +++ b/include/linux/user_namespace.h
> >> >> @@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
> >> >>  };
> >> >>  
> >> >>  #define USERNS_SETGROUPS_ALLOWED 1UL
> >> >> +#define USERNS_SETGROUPS_SHADOW  2UL
> >> >>  
> >> >>  #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
> >> >>  
> >> >> @@ -93,6 +94,9 @@ struct user_namespace {
> >> >>  #endif
> >> >>  	struct ucounts		*ucounts;
> >> >>  	int ucount_max[UCOUNT_COUNTS];
> >> >> +
> >> >> +	/* Supplementary groups when setgroups "shadow" mode is enabled.   */
> >> >> +	struct group_info *shadow_group_info;
> >> >>  } __randomize_layout;
> >> >>  
> >> >>  struct ucounts {
> >> >> @@ -138,7 +142,8 @@ extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, lo
> >> >>  extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
> >> >>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
> >> >>  extern int proc_setgroups_show(struct seq_file *m, void *v);
> >> >> -extern bool userns_may_setgroups(const struct user_namespace *ns);
> >> >
> >> > I realize there's not a single other comment in this file, but I
> >> > think userns_may_setgroups() could use a comment to explain what
> >> > shadowed_groups is.
> >> 
> >> sure, I'll add some comments to the code.
> >> 
> >> >> +extern bool userns_may_setgroups(const struct user_namespace *ns,
> >> >> +				 struct group_info **shadowed_groups);
> >> >>  extern bool in_userns(const struct user_namespace *ancestor,
> >> >>  		       const struct user_namespace *child);
> >> >>  extern bool current_in_userns(const struct user_namespace *target_ns);
> >> >> @@ -167,8 +172,10 @@ static inline void put_user_ns(struct user_namespace *ns)
> >> >>  {
> >> >>  }
> >> >>  
> >> >> -static inline bool userns_may_setgroups(const struct user_namespace *ns)
> >> >> +static inline bool userns_may_setgroups(const struct user_namespace *ns,
> >> >> +					struct group_info **shadowed_groups)
> >> >>  {
> >> >> +	*shadowed_groups = NULL;
> >> >>  	return true;
> >> >>  }
> >> >>  
> >> >> diff --git a/kernel/groups.c b/kernel/groups.c
> >> >> index 787b381c7c00..f0c3b49da19e 100644
> >> >> --- a/kernel/groups.c
> >> >> +++ b/kernel/groups.c
> >> >> @@ -52,11 +52,10 @@ static int groups_to_user(gid_t __user *grouplist,
> >> >>  
> >> >>  /* fill a group_info from a user-space array - it must be allocated already */
> >> >>  static int groups_from_user(struct group_info *group_info,
> >> >> -    gid_t __user *grouplist)
> >> >> +			    gid_t __user *grouplist, int count)
> >> >>  {
> >> >>  	struct user_namespace *user_ns = current_user_ns();
> >> >>  	int i;
> >> >> -	unsigned int count = group_info->ngroups;
> >> >>  
> >> >>  	for (i = 0; i < count; i++) {
> >> >>  		gid_t gid;
> >> >> @@ -169,12 +168,24 @@ SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist)
> >> >>  	return i;
> >> >>  }
> >> >>  
> >> >> -bool may_setgroups(void)
> >> >> +bool may_setgroups(struct group_info **shadowed_groups)
> >> >>  {
> >> >>  	struct user_namespace *user_ns = current_user_ns();
> >> >>  
> >> >>  	return ns_capable_setid(user_ns, CAP_SETGID) &&
> >> >> -		userns_may_setgroups(user_ns);
> >> >> +		userns_may_setgroups(user_ns, shadowed_groups);
> >> >> +}
> >> >> +
> >> >> +void add_shadowed_groups(struct group_info *group_info, struct group_info *shadowed)
> >> >> +{
> >> >> +	int i, j;
> >> >> +
> >> >> +	for (i = 0; i < shadowed->ngroups; i++) {
> >> >> +		kgid_t kgid = shadowed->gid[i];
> >> >> +
> >> >> +		j = group_info->ngroups - i - 1;
> >> >> +		group_info->gid[j] = kgid;
> >> >> +	}
> >> >>  }
> >> >>  
> >> >>  /*
> >> >> @@ -184,27 +195,44 @@ bool may_setgroups(void)
> >> >>  
> >> >>  SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
> >> >>  {
> >> >> -	struct group_info *group_info;
> >> >> +	struct group_info *shadowed_groups = NULL;
> >> >> +	struct group_info *group_info = NULL;
> >> >> +	unsigned int arraysize = gidsetsize;
> >> >>  	int retval;
> >> >>  
> >> >> -	if (!may_setgroups())
> >> >> -		return -EPERM;
> >> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> >> >> +	if (arraysize > NGROUPS_MAX)
> >> >>  		return -EINVAL;
> >> >>  
> >> >> -	group_info = groups_alloc(gidsetsize);
> >> >> -	if (!group_info)
> >> >> -		return -ENOMEM;
> >> >> -	retval = groups_from_user(group_info, grouplist);
> >> >> -	if (retval) {
> >> >> -		put_group_info(group_info);
> >> >> -		return retval;
> >> >> +	if (!may_setgroups(&shadowed_groups))
> >> >> +		return -EPERM;
> >> >> +
> >> >> +	if (shadowed_groups) {
> >> >> +		retval = -EINVAL;
> >> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX)
> >> >> +			goto out;
> >> >> +		arraysize += shadowed_groups->ngroups;
> >> >>  	}
> >> >>  
> >> >> +	group_info = groups_alloc(arraysize);
> >> >> +	retval = -ENOMEM;
> >> >> +	if (!group_info)
> >> >> +		goto out;
> >> >> +
> >> >> +	retval = groups_from_user(group_info, grouplist, gidsetsize);
> >> >> +	if (retval)
> >> >> +		goto out;
> >> >> +
> >> >> +	if (shadowed_groups)
> >> >> +		add_shadowed_groups(group_info, shadowed_groups);
> >> >> +
> >> >>  	groups_sort(group_info);
> >> >>  	retval = set_current_groups(group_info);
> >> >> -	put_group_info(group_info);
> >> >>  
> >> >> +out:
> >> >> +	if (group_info)
> >> >> +		put_group_info(group_info);
> >> >> +	if (shadowed_groups)
> >> >> +		put_group_info(shadowed_groups);
> >> >>  	return retval;
> >> >>  }
> >> >>  
> >> >> diff --git a/kernel/uid16.c b/kernel/uid16.c
> >> >> index af6925d8599b..cb1110f083ce 100644
> >> >> --- a/kernel/uid16.c
> >> >> +++ b/kernel/uid16.c
> >> >> @@ -130,14 +130,14 @@ static int groups16_to_user(old_gid_t __user *grouplist,
> >> >>  }
> >> >>  
> >> >>  static int groups16_from_user(struct group_info *group_info,
> >> >> -    old_gid_t __user *grouplist)
> >> >> +			      old_gid_t __user *grouplist, int count)
> >> >>  {
> >> >>  	struct user_namespace *user_ns = current_user_ns();
> >> >>  	int i;
> >> >>  	old_gid_t group;
> >> >>  	kgid_t kgid;
> >> >>  
> >> >> -	for (i = 0; i < group_info->ngroups; i++) {
> >> >> +	for (i = 0; i < count; i++) {
> >> >>  		if (get_user(group, grouplist+i))
> >> >>  			return  -EFAULT;
> >> >>  
> >> >> @@ -177,25 +177,47 @@ SYSCALL_DEFINE2(getgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> >> >>  SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> >> >>  {
> >> >>  	struct group_info *group_info;
> >> >> +	struct group_info *shadowed_groups = NULL;
> >> >>  	int retval;
> >> >> +	unsigned int arraysize = gidsetsize;
> >> >>  
> >> >> -	if (!may_setgroups())
> >> >> -		return -EPERM;
> >> >> -	if ((unsigned)gidsetsize > NGROUPS_MAX)
> >> >> +	if (arraysize > NGROUPS_MAX)
> >> >>  		return -EINVAL;
> >> >>  
> >> >> -	group_info = groups_alloc(gidsetsize);
> >> >> -	if (!group_info)
> >> >> +	if (!may_setgroups(&shadowed_groups))
> >> >> +		return -EPERM;
> >> >> +
> >> >> +	if (shadowed_groups) {
> >> >> +		if (shadowed_groups->ngroups + gidsetsize > NGROUPS_MAX) {
> >> >> +			put_group_info(shadowed_groups);
> >> >> +			return -EINVAL;
> >> >> +		}
> >> >> +		arraysize += shadowed_groups->ngroups;
> >> >> +	}
> >> >> +
> >> >> +	group_info = groups_alloc(arraysize);
> >> >> +	if (!group_info) {
> >> >> +		if (shadowed_groups)
> >> >> +			put_group_info(shadowed_groups);
> >> >>  		return -ENOMEM;
> >> >> -	retval = groups16_from_user(group_info, grouplist);
> >> >> +	}
> >> >> +
> >> >> +	retval = groups16_from_user(group_info, grouplist, gidsetsize);
> >> >>  	if (retval) {
> >> >> +		if (shadowed_groups)
> >> >> +			put_group_info(shadowed_groups);
> >> >>  		put_group_info(group_info);
> >> >>  		return retval;
> >> >>  	}
> >> >>  
> >> >> +	if (shadowed_groups)
> >> >> +		add_shadowed_groups(group_info, shadowed_groups);
> >> >> +
> >> >>  	groups_sort(group_info);
> >> >>  	retval = set_current_groups(group_info);
> >> >>  	put_group_info(group_info);
> >> >> +	if (shadowed_groups)
> >> >> +		put_group_info(shadowed_groups);
> >> >>  
> >> >>  	return retval;
> >> >>  }
> >> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> >> index 8d62863721b0..b1940b63f7ac 100644
> >> >> --- a/kernel/user_namespace.c
> >> >> +++ b/kernel/user_namespace.c
> >> >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
> >> >>  		ns->ucount_max[i] = INT_MAX;
> >> >>  	}
> >> >>  	ns->ucounts = ucounts;
> >> >> +	ns->shadow_group_info = get_current_groups();
> >> >
> >> > If userns u1 unshares u2 with shadow set, then when u2 unshares
> >> > u3, should u3 get the same shadowed set that u2 has, or should it
> >> > get all of u2's groups as u3's initial shadow set?
> >> 
> >> good question.  Thinking more of it, I think a reasonable interface is
> >> to expect a child userns to inherit the same shadow groups as its parent
> >> userns.  If "shadow" is written again to the /proc/PID/setgroups file
> >> then it grows shadow groups set to include the ones the userns had at
> >> creation time (which includes the parent shadow groups).  What do you
> >> think of it?  I'll play more with this idea and see if it works.
> >
> > So when I initially looked at that proposal I was neither "yay" or "nay"
> > since it seemed useful to people and it looked somewhat straightforward
> > to implement.
> >
> > But I do have concerns now after seeing this. The whole
> > /proc/<pid>/setgroups API is terrible in the first place and causes even
> > more special-casing in container runtimes then there already is. But it
> > fixes a security issue so ok we'll live with it.
> >
> > But I'm not happy about extending its format to include more options. I
> > really don't want the precedent of adding magic keywords into this file.
> >
> > Which brings me to my second concern. I think starting to magically
> > inherit group ids isn't a great idea. It's got a lot of potential for
> > confusion.
> >
> > The point Serge here made makes this pretty obvious imho. I don't think
> > introducing the complexities of magic group inheritance is something we
> > should do.
> >
> > Alternative proposal, can we solve this in userspace instead?
> >
> > As has been pointed out there is a solution to this problem already
> > which is to explicitly map those groups through, i.e. punch holes for
> > the groups to be inherited.
> 
> That is an useful feature and I think we should also have, but IMO it
> solves a different problem.
> 
> It can be used by unprivileged users to maintain their additional gids
> inside a user namespace through the setgid helper, but it won't be
> enough to safely use setgroups when negative "groups" are involved.

Imho negative acls are a misconfiguration and as such a problem that
needs to be fixed in userspace. It's not that they are the default,
encouraged or widely used. Frankly, using negative acls as an argument
makes me more suspicious that this is the wrong way to go. :)
I'm very reluctant to see negative acls as a valid argument for
introducing more magic into the kernel to accommodate such userspace
configurations. It's not the kernel's job to fix this. Especially as we
did have that discussion in 2017 related to another bug, e.g. see:
https://bugs.launchpad.net/ubuntu/+source/shadow/+bug/1729357/comments/9

> 
> > So can't we introduce a new mode for newgidmap by e.g. introducing
> > another /etc/setgroups file or something similar that can be configured
> > by the administrator. It could take options, e.g. "shadow=always" which
> > could mean "everyone must inherit their groups" so newgidmap will punch
> > holes for the caller's groups when writing the gid mapping. We could
> > also extend this by making newgidmap take a command line switch so it's
> > on a case-by-case basis.
> >
> > This is even more flexible since you could extend the new /etc/setgroups
> > file to specify a list of groups that must always be preserved. I'd
> > rather see something like this rather than a magic inheritance switch.
> 
> This is helpful and tracked upstream: https://github.com/shadow-maint/shadow/issues/135
> 
> But how can we enforce the shadow=always at runtime when setgroups is
> enabled?

Enforcement only seems to be needed in the face of negative acls to
which the correct answer seems to me to not use them for container
users.
(But in general, this is a management problem similar to the selection of
id mappings and so isn't really different from reserving id ranges in
userspace across container runtimes.)

I also looked at Podman/crun and you do already have logic today to
handle setgroups inheritance under the
--group-add keep-groups
flag. Even featured on the project's website
https://github.com/containers/podman/blob/9788289f942ff7a79c214446047d0f943c66f409/troubleshooting.md#solution-18
and with a blog about it
https://www.redhat.com/sysadmin/supplemental-groups-podman-containers

That fixes most of your itch.
So this patch, apart from the accommodation of negative acls, is mostly
a convenience and in general I do really sympathize with that as I've
mentioned before to you.
But then there should be a good trade-off between cost and benefit. Not
just in terms of complexity for the kernel itself but also for api
useability and how understandable it is for userspace developers. And I
don't think we're hitting the mark with a change like this.

First, because it is solveable in userspace as --group-add clearly
shows. Second, because we're introducing this whole inheritance and
expansion mechanism alongside "shadow" in nested scenarios. And third,
if one looks at man user_namespaces and reads the section about
/etc/setgroups with the eye of a moderately experienced (container)
administrator it's already pretty confusing. Now we need to add yet
another section explaining that new "shadow" mode, how it relates to
"setgroups={allow,deny}" and to negative acls, and how it is inherited
and expanded when creating nested userns.
All this mostly motivated by a barely used concept with the potential to
muddy the semantics of user namespaces further. I generally don't like
to be so discouraging but I'm not at all convinced we want to do this.

Christian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 16:17         ` Serge E. Hallyn
@ 2021-05-18  9:32           ` Christian Brauner
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2021-05-18  9:32 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Giuseppe Scrivano, linux-kernel, dwalsh, ebiederm

On Mon, May 17, 2021 at 11:17:13AM -0500, Serge Hallyn wrote:
> On Mon, May 17, 2021 at 04:33:21PM +0200, Christian Brauner wrote:
> > On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> > > >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> > > >> index 8d62863721b0..b1940b63f7ac 100644
> > > >> --- a/kernel/user_namespace.c
> > > >> +++ b/kernel/user_namespace.c
> > > >> @@ -123,6 +123,7 @@ int create_user_ns(struct cred *new)
> > > >>  		ns->ucount_max[i] = INT_MAX;
> > > >>  	}
> > > >>  	ns->ucounts = ucounts;
> > > >> +	ns->shadow_group_info = get_current_groups();
> > > >
> > > > If userns u1 unshares u2 with shadow set, then when u2 unshares
> > > > u3, should u3 get the same shadowed set that u2 has, or should it
> > > > get all of u2's groups as u3's initial shadow set?
> > > 
> > > good question.  Thinking more of it, I think a reasonable interface is
> > > to expect a child userns to inherit the same shadow groups as its parent
> > > userns.  If "shadow" is written again to the /proc/PID/setgroups file
> > > then it grows shadow groups set to include the ones the userns had at
> > > creation time (which includes the parent shadow groups).  What do you
> > > think of it?  I'll play more with this idea and see if it works.
> > 
> > So when I initially looked at that proposal I was neither "yay" or "nay"
> > since it seemed useful to people and it looked somewhat straightforward
> > to implement.
> > 
> > But I do have concerns now after seeing this. The whole
> > /proc/<pid>/setgroups API is terrible in the first place and causes even
> > more special-casing in container runtimes then there already is. But it
> > fixes a security issue so ok we'll live with it.
> > 
> > But I'm not happy about extending its format to include more options. I
> > really don't want the precedent of adding magic keywords into this file.
> > 
> > Which brings me to my second concern. I think starting to magically
> > inherit group ids isn't a great idea. It's got a lot of potential for
> > confusion.
> > 
> > The point Serge here made makes this pretty obvious imho. I don't think
> > introducing the complexities of magic group inheritance is something we
> > should do.
> > 
> > Alternative proposal, can we solve this in userspace instead?
> > 
> > As has been pointed out there is a solution to this problem already
> > which is to explicitly map those groups through, i.e. punch holes for
> > the groups to be inherited.
> > 
> > So can't we introduce a new mode for newgidmap by e.g. introducing
> > another /etc/setgroups file or something similar that can be configured
> > by the administrator. It could take options, e.g. "shadow=always" which
> > could mean "everyone must inherit their groups" so newgidmap will punch
> > holes for the caller's groups when writing the gid mapping. We could
> > also extend this by making newgidmap take a command line switch so it's
> > on a case-by-case basis.
> > 
> > This is even more flexible since you could extend the new /etc/setgroups
> > file to specify a list of groups that must always be preserved. I'd
> > rather see something like this rather than a magic inheritance switch.
> > 
> > Christian
> 
> That sounds reasonable, but my concern is that admins currently using
> groups to deny file access will need to take extra steps to maintain
> that guarantee.  I think that falls under the category of a regression.
> Unless we make 'shadow=always' the default.  But then *that* will regress
> users who currently do not want that feature :)

I think this should simply be up to the administrator.

> 
> Anyway, if we do go this route - or maybe a login.defs option
> "ALLOW_UNPRIV_GROUPS_DROP" - perhaps we can also add a new /etc/subauxgroups
> file (TODO find a better name) which admins who are in the know can use to say
> "hallyn 2000" meaning "user hallyn cannot drop group 2000"

That sounds like something useful. Ideally integrated with the newly
added libsubid.

Christian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/3] setgroups: new mode 'shadow' for /proc/PID/setgroups
  2021-05-17 14:33       ` Christian Brauner
  2021-05-17 16:17         ` Serge E. Hallyn
  2021-05-17 19:00         ` Giuseppe Scrivano
@ 2021-05-21 14:03         ` Snaipe
  2 siblings, 0 replies; 18+ messages in thread
From: Snaipe @ 2021-05-21 14:03 UTC (permalink / raw)
  To: christian.brauner; +Cc: dwalsh, ebiederm, gscrivan, linux-kernel, serge, Snaipe

Christian Brauner <christian.brauner@ubuntu.com> writes:
> On Mon, May 17, 2021 at 03:30:16PM +0200, Giuseppe Scrivano wrote:
> > "Serge E. Hallyn" <serge@hallyn.com> writes:
> > > If userns u1 unshares u2 with shadow set, then when u2 unshares
> > > u3, should u3 get the same shadowed set that u2 has, or should it
> > > get all of u2's groups as u3's initial shadow set?
> > 
> > good question.  Thinking more of it, I think a reasonable interface is
> > to expect a child userns to inherit the same shadow groups as its parent
> > userns.  If "shadow" is written again to the /proc/PID/setgroups file
> > then it grows shadow groups set to include the ones the userns had at
> > creation time (which includes the parent shadow groups).  What do you
> > think of it?  I'll play more with this idea and see if it works.
> 
> So when I initially looked at that proposal I was neither "yay" or "nay"
> since it seemed useful to people and it looked somewhat straightforward
> to implement.
> 
> But I do have concerns now after seeing this. The whole
> /proc/<pid>/setgroups API is terrible in the first place and causes even
> more special-casing in container runtimes then there already is. But it
> fixes a security issue so ok we'll live with it.
> 
> But I'm not happy about extending its format to include more options. I
> really don't want the precedent of adding magic keywords into this file.
> 
> Which brings me to my second concern. I think starting to magically
> inherit group ids isn't a great idea. It's got a lot of potential for
> confusion.

To be fair, we already magically inherit group ids -- that's what not calling
setgroups after entering the userns and setting up the {u,g}id maps does.
The new shadow mode does not really cause inheritance, but it does provide
a way for userspace programs to keep these unmapped groups around after
a setgroups, which is currently impossible, and quite sad for the reasons
I outlined in my other email[1].

> The point Serge here made makes this pretty obvious imho. I don't think
> introducing the complexities of magic group inheritance is something we
> should do.
> 
> Alternative proposal, can we solve this in userspace instead?
> 
> As has been pointed out there is a solution to this problem already
> which is to explicitly map those groups through, i.e. punch holes for
> the groups to be inherited.

I don't think it really addresses the problem. If you explicitly map these
groups through, then it's still trivial for userspace programs to simply
drop them after the gid map is written. It just solves the problem of
having control over your additional groups in the userns, which, it turns
out, is already configurable by the system administrator via /etc/subgid:

    host$ id
    uid=1000(snaipe) gid=1000(snaipe) groupes=1000(snaipe),998(wheel)

    host$ cat /etc/subgid
    user:1000000:1100000
    user:998:1

    host$ unshare -fU --map-user=0 sh
    ns# echo $$
    3720498

    host$ newgidmap 3725680 0 1000 1 1 1000000 1100000 1100001 998 1

    ns# id
    uid=0(root) gid=0(root) groupes=0(root),65534(nobody),1100001

This is only fine if we're not interested in supporting both negative-
access permission bits and ACLs. It also means application writers and
system administrators cannot force unprivileged users to stay in their
groups even after giving them control of their own user namespace.

[1]: https://lore.kernel.org/lkml/20210510160233.2266000-1-snaipe@arista.com/

-- 
Snaipe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups
  2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
                   ` (3 preceding siblings ...)
  2021-05-10 16:02 ` [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Snaipe
@ 2021-05-21 15:16 ` Eric W. Biederman
  2021-05-24 13:41   ` Giuseppe Scrivano
  4 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2021-05-21 15:16 UTC (permalink / raw)
  To: Giuseppe Scrivano; +Cc: linux-kernel, serge, dwalsh, christian.brauner

Giuseppe Scrivano <gscrivan@redhat.com> writes:

> This series is based on some old patches I've been playing with some
> years ago, but they were never sent to lkml as I was not sure about
> their complexity/usefulness ratio.  It was recently reported by
> another user that these patches are still useful[1] so I am submitting
> the last version and see what other folks think about this feature.
>
> Since the fix for CVE-2014-8989 in order to set a gids mapping for a
> user namespace when the user namespace owner doesn't have CAP_SETGID
> in its parent, it is necessary to first disable setgroups(2) through
> /proc/PID/setgroups.
>
> Setting up a user namespace with multiple IDs mapped into is usually
> done through the privileged helpers newuidmap/newgidmap.
> Since these helpers run either as setuid or with CAP_SET[U,G]ID file
> capabilities, it is not necessary to disable setgroups(2) in the
> created user namespace.  The user running in the user namespace can
> use setgroups(2) and drop the additional groups that it had initially.
>
> This is still an issue on systems where negative groups ACLs, i.e. the
> group permissions are more restrictive than the entry for the other
> categories, are used.  With such configuration, allowing setgroups(2)
> would cause the same security vulnerability described by
> CVE-2014-8989.

Do you have any experience or any documentation about systems that are
using groups to deny access?

There are some deployments somewhere, but last I looked they were rare
enough that the intersection between systems using groups to deny access
and systems deploying containers could reasonably be assumed to be the
empty set?

Before we seriously consider merging a change like this I believe we
need some references to actual deployed systems.  As adding a feature
that is designed around a premise of a security model that people
are not using will likely lead to poor testing, poor review and
not enough feedback to get the rough edges off.


I suspect systems need to preserve some set of groups released from
the restriction of needing to preserve negative groups could result
in a very different result.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups
  2021-05-21 15:16 ` Eric W. Biederman
@ 2021-05-24 13:41   ` Giuseppe Scrivano
  0 siblings, 0 replies; 18+ messages in thread
From: Giuseppe Scrivano @ 2021-05-24 13:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, serge, dwalsh, christian.brauner, snaipe

ebiederm@xmission.com (Eric W. Biederman) writes:

> Giuseppe Scrivano <gscrivan@redhat.com> writes:
>
>> This series is based on some old patches I've been playing with some
>> years ago, but they were never sent to lkml as I was not sure about
>> their complexity/usefulness ratio.  It was recently reported by
>> another user that these patches are still useful[1] so I am submitting
>> the last version and see what other folks think about this feature.
>>
>> Since the fix for CVE-2014-8989 in order to set a gids mapping for a
>> user namespace when the user namespace owner doesn't have CAP_SETGID
>> in its parent, it is necessary to first disable setgroups(2) through
>> /proc/PID/setgroups.
>>
>> Setting up a user namespace with multiple IDs mapped into is usually
>> done through the privileged helpers newuidmap/newgidmap.
>> Since these helpers run either as setuid or with CAP_SET[U,G]ID file
>> capabilities, it is not necessary to disable setgroups(2) in the
>> created user namespace.  The user running in the user namespace can
>> use setgroups(2) and drop the additional groups that it had initially.
>>
>> This is still an issue on systems where negative groups ACLs, i.e. the
>> group permissions are more restrictive than the entry for the other
>> categories, are used.  With such configuration, allowing setgroups(2)
>> would cause the same security vulnerability described by
>> CVE-2014-8989.
>
> Do you have any experience or any documentation about systems that are
> using groups to deny access?
>
> There are some deployments somewhere, but last I looked they were rare
> enough that the intersection between systems using groups to deny access
> and systems deploying containers could reasonably be assumed to be the
> empty set?
>
> Before we seriously consider merging a change like this I believe we
> need some references to actual deployed systems.  As adding a feature
> that is designed around a premise of a security model that people
> are not using will likely lead to poor testing, poor review and
> not enough feedback to get the rough edges off.

Snaipe (added to CC) has raised this point some weeks ago.  Snaipe, do
you have any more information to share on what systems are using user
namespaces and deny access through groups?

Giuseppe


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-05-24 13:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-10 13:00 [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Giuseppe Scrivano
2021-05-10 13:00 ` [RFC PATCH 1/3] setgroups: " Giuseppe Scrivano
2021-05-15  1:51   ` Serge E. Hallyn
2021-05-17 13:30     ` Giuseppe Scrivano
2021-05-17 14:33       ` Christian Brauner
2021-05-17 16:17         ` Serge E. Hallyn
2021-05-18  9:32           ` Christian Brauner
2021-05-17 19:00         ` Giuseppe Scrivano
2021-05-18  9:16           ` Christian Brauner
2021-05-21 14:03         ` Snaipe
2021-05-17 16:08       ` Serge E. Hallyn
2021-05-10 13:00 ` [RFC PATCH 2/3] getgroups: hide unknown groups Giuseppe Scrivano
2021-05-15  1:21   ` Serge E. Hallyn
2021-05-10 13:00 ` [RFC PATCH 3/3] proc: hide unknown groups in status Giuseppe Scrivano
2021-05-15  1:24   ` Serge E. Hallyn
2021-05-10 16:02 ` [RFC PATCH 0/3] new mode 'shadow' for /proc/PID/setgroups Snaipe
2021-05-21 15:16 ` Eric W. Biederman
2021-05-24 13:41   ` Giuseppe Scrivano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.