linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH review 0/6] miscelaneous user namespace patches
@ 2013-01-26  2:15 Eric W. Biederman
  2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:15 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel


Now that I have done my worst to infect user space with some
basic tools for using user namespaces, this is my first round of patches
aimed at the 3.9 merge window.

This documents that if you care about limit resources you want
to configure the memory control group when user namespaces are
enabled.

This enables the user namespace root to mount devpts, ramfs and tmpfs.
Functionality that is needed for practical uses of the user namespace.

This includes my patch to enable more flexibility into the input
allowed in uid_map and gid_map.

 Documentation/namespaces/resource-control.txt |   10 ++++
 fs/devpts/inode.c                             |   18 +++++++
 fs/ramfs/inode.c                              |    1 +
 include/linux/user_namespace.h                |   10 ++--
 init/Kconfig                                  |    7 +++
 kernel/user.c                                 |    4 +-
 kernel/user_namespace.c                       |   62 +++++++++++++++++++------
 mm/shmem.c                                    |    2 +
 8 files changed, 92 insertions(+), 22 deletions(-)

Eric W. Biederman (6):
      userns: Avoid recursion in put_user_ns
      userns: Allow any uid or gid mappings that don't overlap.
      userns: Recommend use of memory control groups.
      userns: Allow the userns root to mount of devpts
      userns: Allow the userns root to mount ramfs.
      userns: Allow the userns root to mount tmpfs.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH review 1/6] userns: Avoid recursion in put_user_ns
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
@ 2013-01-26  2:19 ` Eric W. Biederman
  2013-01-26 20:58   ` Serge E. Hallyn
  2013-01-28 14:51   ` Vasily Kulikov
  2013-01-26  2:21 ` [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap Eric W. Biederman
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:19 UTC (permalink / raw)
  To: Linux Containers
  Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel, Vasily Kulikov


When freeing a deeply nested user namespace free_user_ns calls
put_user_ns on it's parent which may in turn call free_user_ns again.
When -fno-optimize-sibling-calls is passed to gcc one stack frame per
user namespace is left on the stack, potentially overflowing the
kernel stack.  CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls
so we can't count on gcc to optimize this code.

Remove struct kref and use a plain atomic_t.  Making the code more
flexible and easier to comprehend.  Make the loop in free_user_ns
explict to guarantee that the stack does not overflow with
CONFIG_FRAME_POINTER enabled.

I have tested this fix with a simple program that uses unshare to
create a deeply nested user namespace structure and then calls exit.
With 1000 nesteuser namespaces before this change running my test
program causes the kernel to die a horrible death.  With 10,000,000
nested user namespaces after this change my test program runs to
completion and causes no harm.

Pointed-out-by: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/user_namespace.h |   10 +++++-----
 kernel/user.c                  |    4 +---
 kernel/user_namespace.c        |   17 +++++++++--------
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b9bd2e6..4ce0093 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,7 +21,7 @@ struct user_namespace {
 	struct uid_gid_map	uid_map;
 	struct uid_gid_map	gid_map;
 	struct uid_gid_map	projid_map;
-	struct kref		kref;
+	atomic_t		count;
 	struct user_namespace	*parent;
 	kuid_t			owner;
 	kgid_t			group;
@@ -35,18 +35,18 @@ extern struct user_namespace init_user_ns;
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
 {
 	if (ns)
-		kref_get(&ns->kref);
+		atomic_inc(&ns->count);
 	return ns;
 }
 
 extern int create_user_ns(struct cred *new);
 extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred);
-extern void free_user_ns(struct kref *kref);
+extern void free_user_ns(struct user_namespace *ns);
 
 static inline void put_user_ns(struct user_namespace *ns)
 {
-	if (ns)
-		kref_put(&ns->kref, free_user_ns);
+	if (ns && atomic_dec_and_test(&ns->count))
+		free_user_ns(ns);
 }
 
 struct seq_operations;
diff --git a/kernel/user.c b/kernel/user.c
index 33acb5e..57ebfd4 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -47,9 +47,7 @@ struct user_namespace init_user_ns = {
 			.count = 4294967295U,
 		},
 	},
-	.kref = {
-		.refcount	= ATOMIC_INIT(3),
-	},
+	.count = ATOMIC_INIT(3),
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 2b042c4..24f8ec3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -78,7 +78,7 @@ int create_user_ns(struct cred *new)
 		return ret;
 	}
 
-	kref_init(&ns->kref);
+	atomic_set(&ns->count, 1);
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->owner = owner;
@@ -104,15 +104,16 @@ int unshare_userns(unsigned long unshare_flags, struct cred **new_cred)
 	return create_user_ns(cred);
 }
 
-void free_user_ns(struct kref *kref)
+void free_user_ns(struct user_namespace *ns)
 {
-	struct user_namespace *parent, *ns =
-		container_of(kref, struct user_namespace, kref);
+	struct user_namespace *parent;
 
-	parent = ns->parent;
-	proc_free_inum(ns->proc_inum);
-	kmem_cache_free(user_ns_cachep, ns);
-	put_user_ns(parent);
+	do {
+		parent = ns->parent;
+		proc_free_inum(ns->proc_inum);
+		kmem_cache_free(user_ns_cachep, ns);
+		ns = parent;
+	} while (atomic_dec_and_test(&parent->count));
 }
 EXPORT_SYMBOL(free_user_ns);
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
  2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
@ 2013-01-26  2:21 ` Eric W. Biederman
  2013-01-26 21:08   ` Serge E. Hallyn
  2013-01-28 14:28   ` Aristeu Rozanski
  2013-01-26  2:22 ` [PATCH review 3/6] userns: Recommend use of memory control groups Eric W. Biederman
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel, Aristeu Rozanski,
	Andrew Morton


When I initially wrote the code for /proc/<pid>/uid_map.  I was lazy
and avoided duplicate mappings by the simple expedient of ensuring the
first number in a new extent was greater than any number in the
previous extent.

Unfortunately that precludes a number of valid mappings, and someone
noticed and complained.  So use a simple check to ensure that ranges
in the mapping extents don't overlap.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c |   45 +++++++++++++++++++++++++++++++++++++++------
 1 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 24f8ec3..8b65083 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -520,6 +520,42 @@ struct seq_operations proc_projid_seq_operations = {
 	.show = projid_m_show,
 };
 
+static bool mappings_overlap(struct uid_gid_map *new_map, struct uid_gid_extent *extent)
+{
+	u32 upper_first, lower_first, upper_last, lower_last;
+	unsigned idx;
+
+	upper_first = extent->first;
+	lower_first = extent->lower_first;
+	upper_last = upper_first + extent->count - 1;
+	lower_last = lower_first + extent->count - 1;
+
+	for (idx = 0; idx < new_map->nr_extents; idx++) {
+		u32 prev_upper_first, prev_lower_first;
+		u32 prev_upper_last, prev_lower_last;
+		struct uid_gid_extent *prev;
+
+		prev = &new_map->extent[idx];
+
+		prev_upper_first = prev->first;
+		prev_lower_first = prev->lower_first;
+		prev_upper_last = prev_upper_first + prev->count - 1;
+		prev_lower_last = prev_lower_first + prev->count - 1;
+
+		/* Does the upper range intersect a previous extent? */
+		if ((prev_upper_first <= upper_last) &&
+		    (prev_upper_last >= upper_first))
+			return true;
+
+		/* Does the lower range intersect a previous extent? */
+		if ((prev_lower_first <= lower_last) &&
+		    (prev_lower_last >= lower_first))
+			return true;
+	}
+	return false;
+}
+
+
 static DEFINE_MUTEX(id_map_mutex);
 
 static ssize_t map_write(struct file *file, const char __user *buf,
@@ -532,7 +568,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	struct user_namespace *ns = seq->private;
 	struct uid_gid_map new_map;
 	unsigned idx;
-	struct uid_gid_extent *extent, *last = NULL;
+	struct uid_gid_extent *extent = NULL;
 	unsigned long page = 0;
 	char *kbuf, *pos, *next_line;
 	ssize_t ret = -EINVAL;
@@ -635,14 +671,11 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 		if ((extent->lower_first + extent->count) <= extent->lower_first)
 			goto out;
 
-		/* For now only accept extents that are strictly in order */
-		if (last &&
-		    (((last->first + last->count) > extent->first) ||
-		     ((last->lower_first + last->count) > extent->lower_first)))
+		/* Do the ranges in extent overlap any previous extents? */
+		if (mappings_overlap(&new_map, extent))
 			goto out;
 
 		new_map.nr_extents++;
-		last = extent;
 
 		/* Fail if the file contains too many extents */
 		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
  2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
  2013-01-26  2:21 ` [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap Eric W. Biederman
@ 2013-01-26  2:22 ` Eric W. Biederman
  2013-01-26 21:13   ` Serge E. Hallyn
  2013-01-28  7:37   ` Lord Glauber Costa of Sealand
  2013-01-26  2:23 ` [PATCH review 4/6] userns: Allow the userns root to mount of devpts Eric W. Biederman
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:22 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel


In the help text describing user namespaces recommend use of memory
control groups.  In many cases memory control groups are the only
mechanism there is to limit how much memory a user who can create
user namespaces can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/namespaces/resource-control.txt |   10 ++++++++++
 init/Kconfig                                  |    7 +++++++
 2 files changed, 17 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/resource-control.txt

diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
new file mode 100644
index 0000000..3d8178a
--- /dev/null
+++ b/Documentation/namespaces/resource-control.txt
@@ -0,0 +1,10 @@
+There are a lot of kinds of objects in the kernel that don't have
+individual limits or that have limits that are ineffective when a set
+of processes is allowed to switch user ids.  With user namespaces
+enabled in a kernel for people who don't trust their users or their
+users programs to play nice this problems becomes more acute.
+
+Therefore it is recommended that memory control groups be enabled in
+kernels that enable user namespaces, and it is further recommended
+that userspace configure memory control groups to limit how much
+memory users they don't trust to play nice can use.
diff --git a/init/Kconfig b/init/Kconfig
index 7d30240..c8c58bd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1035,6 +1035,13 @@ config USER_NS
 	help
 	  This allows containers, i.e. vservers, to use user namespaces
 	  to provide different user info for different servers.
+
+	  When user namespaces are enabled in the kernel it is
+	  recommended that the MEMCG and MEMCG_KMEM options also be
+	  enabled and that user-space use the memory control groups to
+	  limit the amount of memory a memory unprivileged users can
+	  use.
+
 	  If unsure, say N.
 
 config PID_NS
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH review 4/6] userns: Allow the userns root to mount of devpts
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
                   ` (2 preceding siblings ...)
  2013-01-26  2:22 ` [PATCH review 3/6] userns: Recommend use of memory control groups Eric W. Biederman
@ 2013-01-26  2:23 ` Eric W. Biederman
  2013-01-26 21:22   ` Serge E. Hallyn
  2013-01-26  2:26 ` [PATCH review 5/6] userns: Allow the userns root to mount ramfs Eric W. Biederman
  2013-01-26  2:26 ` [PATCH review 6/6] userns: Allow the userns root to mount tmpfs Eric W. Biederman
  5 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:23 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel


- The context in which devpts is mounted has no effect on the creation
  of ptys as the /dev/ptmx interface has been used by unprivileged
  users for many years.

- Only support unprivileged mounts in combination with the newinstance
  option to ensure that mounting of /dev/pts in a user namespace will
  not allow the options of an existing mount of devpts to be modified.

- Create /dev/pts/ptmx as the root user in the user namespace that
  mounts devpts so that it's permissions to be changed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/devpts/inode.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 472e6be..073d30b 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -243,6 +243,13 @@ static int mknod_ptmx(struct super_block *sb)
 	struct dentry *root = sb->s_root;
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
 	struct pts_mount_opts *opts = &fsi->mount_opts;
+	kuid_t root_uid;
+	kgid_t root_gid;
+
+	root_uid = make_kuid(current_user_ns(), 0);
+	root_gid = make_kgid(current_user_ns(), 0);
+	if (!uid_valid(root_uid) || !gid_valid(root_gid))
+		return -EINVAL;
 
 	mutex_lock(&root->d_inode->i_mutex);
 
@@ -273,6 +280,8 @@ static int mknod_ptmx(struct super_block *sb)
 
 	mode = S_IFCHR|opts->ptmxmode;
 	init_special_inode(inode, mode, MKDEV(TTYAUX_MAJOR, 2));
+	inode->i_uid = root_uid;
+	inode->i_gid = root_gid;
 
 	d_add(dentry, inode);
 
@@ -438,6 +447,12 @@ static struct dentry *devpts_mount(struct file_system_type *fs_type,
 	if (error)
 		return ERR_PTR(error);
 
+	/* Require newinstance for all user namespace mounts to ensure
+	 * the mount options are not changed.
+	 */
+	if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
+		return ERR_PTR(-EINVAL);
+
 	if (opts.newinstance)
 		s = sget(fs_type, NULL, set_anon_super, flags, NULL);
 	else
@@ -491,6 +506,9 @@ static struct file_system_type devpts_fs_type = {
 	.name		= "devpts",
 	.mount		= devpts_mount,
 	.kill_sb	= devpts_kill_sb,
+#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
+#endif
 };
 
 /*
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH review 5/6] userns: Allow the userns root to mount ramfs.
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
                   ` (3 preceding siblings ...)
  2013-01-26  2:23 ` [PATCH review 4/6] userns: Allow the userns root to mount of devpts Eric W. Biederman
@ 2013-01-26  2:26 ` Eric W. Biederman
  2013-01-26 21:29   ` Serge E. Hallyn
  2013-01-27 18:23   ` Serge E. Hallyn
  2013-01-26  2:26 ` [PATCH review 6/6] userns: Allow the userns root to mount tmpfs Eric W. Biederman
  5 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:26 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel


There is no backing store to ramfs and file creation
rules are the same as for any other filesystem so
it is semantically safe to allow unprivileged users
to mount it.

The memory control group successfully limits how much
memory ramfs can consume on any system that cares about
a user namespace root using ramfs to exhaust memory
the memory control group can be deployed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/ramfs/inode.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index eab8c09..c24f1e1 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -260,6 +260,7 @@ static struct file_system_type ramfs_fs_type = {
 	.name		= "ramfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= ramfs_kill_sb,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 static struct file_system_type rootfs_fs_type = {
 	.name		= "rootfs",
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH review 6/6] userns: Allow the userns root to mount tmpfs.
  2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
                   ` (4 preceding siblings ...)
  2013-01-26  2:26 ` [PATCH review 5/6] userns: Allow the userns root to mount ramfs Eric W. Biederman
@ 2013-01-26  2:26 ` Eric W. Biederman
  2013-01-27 18:23   ` Serge E. Hallyn
  2013-01-28  1:28   ` Gao feng
  5 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-26  2:26 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, linux-fsdevel


There is no backing store to tmpfs and file creation rules are the
same as for any other filesystem so it is semantically safe to allow
unprivileged users to mount it.  ramfs is safe for the same reasons so
allow either flavor of tmpfs to be mounted by a user namespace root
user.

The memory control group successfully limits how much memory tmpfs can
consume on any system that cares about a user namespace root using
tmpfs to exhaust memory the memory control group can be deployed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 mm/shmem.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5c90d84..197ca5e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= shmem_mount,
 	.kill_sb	= kill_litter_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init shmem_init(void)
@@ -2823,6 +2824,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= kill_litter_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init shmem_init(void)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH review 1/6] userns: Avoid recursion in put_user_ns
  2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
@ 2013-01-26 20:58   ` Serge E. Hallyn
  2013-01-28 14:51   ` Vasily Kulikov
  1 sibling, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-26 20:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel,
	Vasily Kulikov

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> When freeing a deeply nested user namespace free_user_ns calls
> put_user_ns on it's parent which may in turn call free_user_ns again.
> When -fno-optimize-sibling-calls is passed to gcc one stack frame per
> user namespace is left on the stack, potentially overflowing the
> kernel stack.  CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls
> so we can't count on gcc to optimize this code.
> 
> Remove struct kref and use a plain atomic_t.  Making the code more
> flexible and easier to comprehend.  Make the loop in free_user_ns
> explict to guarantee that the stack does not overflow with
> CONFIG_FRAME_POINTER enabled.
> 
> I have tested this fix with a simple program that uses unshare to
> create a deeply nested user namespace structure and then calls exit.
> With 1000 nesteuser namespaces before this change running my test
> program causes the kernel to die a horrible death.  With 10,000,000
> nested user namespaces after this change my test program runs to
> completion and causes no harm.
> 
> Pointed-out-by: Vasily Kulikov <segoon@openwall.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/user_namespace.h |   10 +++++-----
>  kernel/user.c                  |    4 +---
>  kernel/user_namespace.c        |   17 +++++++++--------
>  3 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index b9bd2e6..4ce0093 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -21,7 +21,7 @@ struct user_namespace {
>  	struct uid_gid_map	uid_map;
>  	struct uid_gid_map	gid_map;
>  	struct uid_gid_map	projid_map;
> -	struct kref		kref;
> +	atomic_t		count;
>  	struct user_namespace	*parent;
>  	kuid_t			owner;
>  	kgid_t			group;
> @@ -35,18 +35,18 @@ extern struct user_namespace init_user_ns;
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>  {
>  	if (ns)
> -		kref_get(&ns->kref);
> +		atomic_inc(&ns->count);
>  	return ns;
>  }
>  
>  extern int create_user_ns(struct cred *new);
>  extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred);
> -extern void free_user_ns(struct kref *kref);
> +extern void free_user_ns(struct user_namespace *ns);
>  
>  static inline void put_user_ns(struct user_namespace *ns)
>  {
> -	if (ns)
> -		kref_put(&ns->kref, free_user_ns);
> +	if (ns && atomic_dec_and_test(&ns->count))
> +		free_user_ns(ns);
>  }
>  
>  struct seq_operations;
> diff --git a/kernel/user.c b/kernel/user.c
> index 33acb5e..57ebfd4 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -47,9 +47,7 @@ struct user_namespace init_user_ns = {
>  			.count = 4294967295U,
>  		},
>  	},
> -	.kref = {
> -		.refcount	= ATOMIC_INIT(3),
> -	},
> +	.count = ATOMIC_INIT(3),
>  	.owner = GLOBAL_ROOT_UID,
>  	.group = GLOBAL_ROOT_GID,
>  	.proc_inum = PROC_USER_INIT_INO,
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 2b042c4..24f8ec3 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -78,7 +78,7 @@ int create_user_ns(struct cred *new)
>  		return ret;
>  	}
>  
> -	kref_init(&ns->kref);
> +	atomic_set(&ns->count, 1);
>  	/* Leave the new->user_ns reference with the new user namespace. */
>  	ns->parent = parent_ns;
>  	ns->owner = owner;
> @@ -104,15 +104,16 @@ int unshare_userns(unsigned long unshare_flags, struct cred **new_cred)
>  	return create_user_ns(cred);
>  }
>  
> -void free_user_ns(struct kref *kref)
> +void free_user_ns(struct user_namespace *ns)
>  {
> -	struct user_namespace *parent, *ns =
> -		container_of(kref, struct user_namespace, kref);
> +	struct user_namespace *parent;
>  
> -	parent = ns->parent;
> -	proc_free_inum(ns->proc_inum);
> -	kmem_cache_free(user_ns_cachep, ns);
> -	put_user_ns(parent);
> +	do {
> +		parent = ns->parent;
> +		proc_free_inum(ns->proc_inum);
> +		kmem_cache_free(user_ns_cachep, ns);
> +		ns = parent;
> +	} while (atomic_dec_and_test(&parent->count));
>  }
>  EXPORT_SYMBOL(free_user_ns);
>  
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
  2013-01-26  2:21 ` [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap Eric W. Biederman
@ 2013-01-26 21:08   ` Serge E. Hallyn
  2013-01-28 14:28   ` Aristeu Rozanski
  1 sibling, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-26 21:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel,
	Aristeu Rozanski, Andrew Morton

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> When I initially wrote the code for /proc/<pid>/uid_map.  I was lazy
> and avoided duplicate mappings by the simple expedient of ensuring the
> first number in a new extent was greater than any number in the
> previous extent.
> 
> Unfortunately that precludes a number of valid mappings, and someone
> noticed and complained.  So use a simple check to ensure that ranges
> in the mapping extents don't overlap.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/user_namespace.c |   45 +++++++++++++++++++++++++++++++++++++++------
>  1 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 24f8ec3..8b65083 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -520,6 +520,42 @@ struct seq_operations proc_projid_seq_operations = {
>  	.show = projid_m_show,
>  };
>  
> +static bool mappings_overlap(struct uid_gid_map *new_map, struct uid_gid_extent *extent)
> +{
> +	u32 upper_first, lower_first, upper_last, lower_last;
> +	unsigned idx;
> +
> +	upper_first = extent->first;
> +	lower_first = extent->lower_first;
> +	upper_last = upper_first + extent->count - 1;
> +	lower_last = lower_first + extent->count - 1;
> +
> +	for (idx = 0; idx < new_map->nr_extents; idx++) {
> +		u32 prev_upper_first, prev_lower_first;
> +		u32 prev_upper_last, prev_lower_last;
> +		struct uid_gid_extent *prev;
> +
> +		prev = &new_map->extent[idx];
> +
> +		prev_upper_first = prev->first;
> +		prev_lower_first = prev->lower_first;
> +		prev_upper_last = prev_upper_first + prev->count - 1;
> +		prev_lower_last = prev_lower_first + prev->count - 1;
> +
> +		/* Does the upper range intersect a previous extent? */
> +		if ((prev_upper_first <= upper_last) &&
> +		    (prev_upper_last >= upper_first))
> +			return true;
> +
> +		/* Does the lower range intersect a previous extent? */
> +		if ((prev_lower_first <= lower_last) &&
> +		    (prev_lower_last >= lower_first))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +
>  static DEFINE_MUTEX(id_map_mutex);
>  
>  static ssize_t map_write(struct file *file, const char __user *buf,
> @@ -532,7 +568,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	struct user_namespace *ns = seq->private;
>  	struct uid_gid_map new_map;
>  	unsigned idx;
> -	struct uid_gid_extent *extent, *last = NULL;
> +	struct uid_gid_extent *extent = NULL;
>  	unsigned long page = 0;
>  	char *kbuf, *pos, *next_line;
>  	ssize_t ret = -EINVAL;
> @@ -635,14 +671,11 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  		if ((extent->lower_first + extent->count) <= extent->lower_first)
>  			goto out;
>  
> -		/* For now only accept extents that are strictly in order */
> -		if (last &&
> -		    (((last->first + last->count) > extent->first) ||
> -		     ((last->lower_first + last->count) > extent->lower_first)))
> +		/* Do the ranges in extent overlap any previous extents? */
> +		if (mappings_overlap(&new_map, extent))
>  			goto out;
>  
>  		new_map.nr_extents++;
> -		last = extent;
>  
>  		/* Fail if the file contains too many extents */
>  		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-26  2:22 ` [PATCH review 3/6] userns: Recommend use of memory control groups Eric W. Biederman
@ 2013-01-26 21:13   ` Serge E. Hallyn
  2013-01-27  6:19     ` Eric W. Biederman
  2013-01-28  7:37   ` Lord Glauber Costa of Sealand
  1 sibling, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-26 21:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> In the help text describing user namespaces recommend use of memory
> control groups.  In many cases memory control groups are the only
> mechanism there is to limit how much memory a user who can create
> user namespaces can use.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

nit:

> ---
>  Documentation/namespaces/resource-control.txt |   10 ++++++++++
>  init/Kconfig                                  |    7 +++++++
>  2 files changed, 17 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/namespaces/resource-control.txt
> 
> diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
> new file mode 100644
> index 0000000..3d8178a
> --- /dev/null
> +++ b/Documentation/namespaces/resource-control.txt
> @@ -0,0 +1,10 @@
> +There are a lot of kinds of objects in the kernel that don't have
> +individual limits or that have limits that are ineffective when a set
> +of processes is allowed to switch user ids.  With user namespaces
> +enabled in a kernel for people who don't trust their users or their
> +users programs to play nice this problems becomes more acute.

users' programs

> +
> +Therefore it is recommended that memory control groups be enabled in
> +kernels that enable user namespaces, and it is further recommended
> +that userspace configure memory control groups to limit how much
> +memory users they don't trust to play nice can use.
> diff --git a/init/Kconfig b/init/Kconfig
> index 7d30240..c8c58bd 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1035,6 +1035,13 @@ config USER_NS
>  	help
>  	  This allows containers, i.e. vservers, to use user namespaces
>  	  to provide different user info for different servers.
> +
> +	  When user namespaces are enabled in the kernel it is
> +	  recommended that the MEMCG and MEMCG_KMEM options also be
> +	  enabled and that user-space use the memory control groups to
> +	  limit the amount of memory a memory unprivileged users can
> +	  use.
> +
>  	  If unsure, say N.
>  
>  config PID_NS
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 4/6] userns: Allow the userns root to mount of devpts
  2013-01-26  2:23 ` [PATCH review 4/6] userns: Allow the userns root to mount of devpts Eric W. Biederman
@ 2013-01-26 21:22   ` Serge E. Hallyn
  0 siblings, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-26 21:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> - The context in which devpts is mounted has no effect on the creation
>   of ptys as the /dev/ptmx interface has been used by unprivileged
>   users for many years.
> 
> - Only support unprivileged mounts in combination with the newinstance
>   option to ensure that mounting of /dev/pts in a user namespace will
>   not allow the options of an existing mount of devpts to be modified.
> 
> - Create /dev/pts/ptmx as the root user in the user namespace that
>   mounts devpts so that it's permissions to be changed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/devpts/inode.c |   18 ++++++++++++++++++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index 472e6be..073d30b 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -243,6 +243,13 @@ static int mknod_ptmx(struct super_block *sb)
>  	struct dentry *root = sb->s_root;
>  	struct pts_fs_info *fsi = DEVPTS_SB(sb);
>  	struct pts_mount_opts *opts = &fsi->mount_opts;
> +	kuid_t root_uid;
> +	kgid_t root_gid;
> +
> +	root_uid = make_kuid(current_user_ns(), 0);
> +	root_gid = make_kgid(current_user_ns(), 0);
> +	if (!uid_valid(root_uid) || !gid_valid(root_gid))
> +		return -EINVAL;
>  
>  	mutex_lock(&root->d_inode->i_mutex);
>  
> @@ -273,6 +280,8 @@ static int mknod_ptmx(struct super_block *sb)
>  
>  	mode = S_IFCHR|opts->ptmxmode;
>  	init_special_inode(inode, mode, MKDEV(TTYAUX_MAJOR, 2));
> +	inode->i_uid = root_uid;
> +	inode->i_gid = root_gid;
>  
>  	d_add(dentry, inode);
>  
> @@ -438,6 +447,12 @@ static struct dentry *devpts_mount(struct file_system_type *fs_type,
>  	if (error)
>  		return ERR_PTR(error);
>  
> +	/* Require newinstance for all user namespace mounts to ensure
> +	 * the mount options are not changed.
> +	 */
> +	if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
> +		return ERR_PTR(-EINVAL);
> +
>  	if (opts.newinstance)
>  		s = sget(fs_type, NULL, set_anon_super, flags, NULL);
>  	else
> @@ -491,6 +506,9 @@ static struct file_system_type devpts_fs_type = {
>  	.name		= "devpts",
>  	.mount		= devpts_mount,
>  	.kill_sb	= devpts_kill_sb,
> +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
> +#endif
>  };
>  
>  /*
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 5/6] userns: Allow the userns root to mount ramfs.
  2013-01-26  2:26 ` [PATCH review 5/6] userns: Allow the userns root to mount ramfs Eric W. Biederman
@ 2013-01-26 21:29   ` Serge E. Hallyn
  2013-01-27  6:09     ` Eric W. Biederman
  2013-01-27 18:23   ` Serge E. Hallyn
  1 sibling, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-26 21:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> There is no backing store to ramfs and file creation
> rules are the same as for any other filesystem so
> it is semantically safe to allow unprivileged users
> to mount it.
> 
> The memory control group successfully limits how much
> memory ramfs can consume on any system that cares about
> a user namespace root using ramfs to exhaust memory
> the memory control group can be deployed.

But that does mean that to avoid this new type of attack, when handed a
new kernel (i.e. by one's distro) one has to explicitly (know about and)
configure those limits.  The "your distro should do this for you"
argument doesn't seem right.  And I'd really prefer there not be
barriers to user namespaces being compiled in when there don't have to
be.

What was your thought on the suggestion to only allow FS_USERNS_MOUNT
mounts by users confined in a non-init memory cgroup?

Alternatively, what about a simple sysctl knob to turn on
FS_USERNS_MOUNTs?  Then if I've got no untrusted users I can just turn
that on without the system second-guessing me for not having extra
configuration...

> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/ramfs/inode.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index eab8c09..c24f1e1 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -260,6 +260,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  static struct file_system_type rootfs_fs_type = {
>  	.name		= "rootfs",
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 5/6] userns: Allow the userns root to mount ramfs.
  2013-01-26 21:29   ` Serge E. Hallyn
@ 2013-01-27  6:09     ` Eric W. Biederman
  2013-01-27 18:23       ` Serge E. Hallyn
  0 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-27  6:09 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers, linux-kernel, linux-fsdevel

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> 
>> There is no backing store to ramfs and file creation
>> rules are the same as for any other filesystem so
>> it is semantically safe to allow unprivileged users
>> to mount it.
>> 
>> The memory control group successfully limits how much
>> memory ramfs can consume on any system that cares about
>> a user namespace root using ramfs to exhaust memory
>> the memory control group can be deployed.
>
> But that does mean that to avoid this new type of attack, when handed a
> new kernel (i.e. by one's distro) one has to explicitly (know about and)
> configure those limits.  The "your distro should do this for you"
> argument doesn't seem right.  And I'd really prefer there not be
> barriers to user namespaces being compiled in when there don't have to
> be.

The thing is this really isn't a new type of attack.  There are a lot of
existing methods to exhaust memory with the default configuration on
most distros.  All this is is a new method to method to implement such
an attack.

Most distros allow a large number or processes and allow those processes
to consume a large if not unlimited amount of ram.

The OOM killer still will recover your system from a ramfs or a tmpfs
mounted in a mount namespace created with user namespace permissions.
It works because the OOM killer will kill all of the processes in the
mount namespace.  At which point all of the mounts have their reference
counts go to 0 the filesystems are unmounted.  When a ramfs or 
tmpfs is unmounted all of the files in a ramfs or tmpfs are freed.

On the flip side every resource has historically come with it's own new
knob.  The new knob in this case is memory control groups.  It isn't an
rlimit, and it isn't global limit tunable with a sysctl.  It is a much
more general knob than that.

> What was your thought on the suggestion to only allow FS_USERNS_MOUNT
> mounts by users confined in a non-init memory cgroup?

Over design.

But more than that there are a lot of other ways to get into trouble if
you don't enable memory control groups with user namespaces.   tmpfs is
just the first one I identified.

for (;;) unshare(CLONE_NEWUSER) is equally as bad, and if I look I can
find a bunch of others.

The practical fact is that allowing userspace to exhaust memory and get
the system into an OOM condition happens today.   There are lots of lots
of resources that it would take a lot of time to individually limit, or
put a knob on and even then we would miss some.  The memory control group
limits all of those now, and isn't particularly hard to configure.

So for the people who care I recommend using the tools that are
available now and work now the memory control group.

Personally I don't think distros care.

> Alternatively, what about a simple sysctl knob to turn on
> FS_USERNS_MOUNTs?  Then if I've got no untrusted users I can just turn
> that on without the system second-guessing me for not having extra
> configuration...

I suppose we could do something like what happens on terminals where
scheduler control groups are automatically created by the kernel.  Or
perhaps have an on/off sysctl knob for user namespaces themselves.  I
don't think anything more fine grained is worth it at this point.

Not that I will oppose more fine grained patches if someone writes else
writes them, I just don't see the bang for the buck.

I understand about not wanting to introduce limits on people enabling
user namespaces.  Most distro's don't appear to limit users memory today
so enabling user namespaces won't change anything.  For people who do
want to limit a users memory consumption it looks like all you need
to do is something like:

$ apt-get install cgroup-bin libcgroup1 libpam-cgroup

$ cat >> /etc/cgconfig <<EOF
group eric {
      perm {
		task {
			uid = root;
			gid = root;
		}
		admin {
			uid = root;
			gid = root;
		}
	}
	memory {
		memory.limit_in_bytes = 1073741824;
		memory.kmem.limit_in_bytes = 1073741824;
	}
}
mount {
	memory = /mnt/cgroups/memory;
}
EOF

$ cat >> /etc/cgrules <<EOF
eric		memory		eric/
EOF

So shrug.  The mechanisms that I am suggesting people use already exist,
and appear to have been present long enough to have made it into debian
stable release February of 2011.

My apologies for not having done that part of my homework earlier to
know that libpam-cgroup and friends are well established and 
have existed for quite a long time.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-26 21:13   ` Serge E. Hallyn
@ 2013-01-27  6:19     ` Eric W. Biederman
  0 siblings, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-27  6:19 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers, linux-kernel, linux-fsdevel

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> 
>> In the help text describing user namespaces recommend use of memory
>> control groups.  In many cases memory control groups are the only
>> mechanism there is to limit how much memory a user who can create
>> user namespaces can use.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

>
> nit:
>

I have fixed you nit and added the following text, so people know
have a clue where to look to configure cgroups in userspace.

diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
index 3d8178a..abc13c3 100644
--- a/Documentation/namespaces/resource-control.txt
+++ b/Documentation/namespaces/resource-control.txt
@@ -7,4 +7,8 @@ users programs to play nice this problems becomes more acute.
 Therefore it is recommended that memory control groups be enabled in
 kernels that enable user namespaces, and it is further recommended
 that userspace configure memory control groups to limit how much
-memory users they don't trust to play nice can use.
+memory user's they don't trust to play nice can use.
+
+Memory control groups can be configured by installing the libcgroup
+package present on most distros editing /etc/cgrules.conf,
+/etc/cgconfig.conf and setting up libpam-cgroup.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH review 5/6] userns: Allow the userns root to mount ramfs.
  2013-01-27  6:09     ` Eric W. Biederman
@ 2013-01-27 18:23       ` Serge E. Hallyn
  0 siblings, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-27 18:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Linux Containers, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> 
> >> There is no backing store to ramfs and file creation
> >> rules are the same as for any other filesystem so
> >> it is semantically safe to allow unprivileged users
> >> to mount it.
> >> 
> >> The memory control group successfully limits how much
> >> memory ramfs can consume on any system that cares about
> >> a user namespace root using ramfs to exhaust memory
> >> the memory control group can be deployed.
> >
> > But that does mean that to avoid this new type of attack, when handed a
> > new kernel (i.e. by one's distro) one has to explicitly (know about and)
> > configure those limits.  The "your distro should do this for you"
> > argument doesn't seem right.  And I'd really prefer there not be
> > barriers to user namespaces being compiled in when there don't have to
> > be.
> 
> The thing is this really isn't a new type of attack.  There are a lot of

Of course.

> existing methods to exhaust memory with the default configuration on
> most distros.  All this is is a new method to method to implement such
> an attack.

Right.

...

> > What was your thought on the suggestion to only allow FS_USERNS_MOUNT
> > mounts by users confined in a non-init memory cgroup?
> 
> Over design.

Ok.  Fair.

> So shrug.  The mechanisms that I am suggesting people use already exist,
> and appear to have been present long enough to have made it into debian
> stable release February of 2011.

Heh - right, libcgroup does have its problems, but I don't think there
are any problems with the pam module actually.  I'm meant to talk with
the debian maintainer for them soon, will test.

thanks,
-serge

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 5/6] userns: Allow the userns root to mount ramfs.
  2013-01-26  2:26 ` [PATCH review 5/6] userns: Allow the userns root to mount ramfs Eric W. Biederman
  2013-01-26 21:29   ` Serge E. Hallyn
@ 2013-01-27 18:23   ` Serge E. Hallyn
  1 sibling, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-27 18:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> There is no backing store to ramfs and file creation
> rules are the same as for any other filesystem so
> it is semantically safe to allow unprivileged users
> to mount it.
> 
> The memory control group successfully limits how much
> memory ramfs can consume on any system that cares about
> a user namespace root using ramfs to exhaust memory
> the memory control group can be deployed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/ramfs/inode.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index eab8c09..c24f1e1 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -260,6 +260,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  static struct file_system_type rootfs_fs_type = {
>  	.name		= "rootfs",
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 6/6] userns: Allow the userns root to mount tmpfs.
  2013-01-26  2:26 ` [PATCH review 6/6] userns: Allow the userns root to mount tmpfs Eric W. Biederman
@ 2013-01-27 18:23   ` Serge E. Hallyn
  2013-01-28  1:28   ` Gao feng
  1 sibling, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2013-01-27 18:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> There is no backing store to tmpfs and file creation rules are the
> same as for any other filesystem so it is semantically safe to allow
> unprivileged users to mount it.  ramfs is safe for the same reasons so
> allow either flavor of tmpfs to be mounted by a user namespace root
> user.
> 
> The memory control group successfully limits how much memory tmpfs can
> consume on any system that cares about a user namespace root using
> tmpfs to exhaust memory the memory control group can be deployed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  mm/shmem.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5c90d84..197ca5e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -2823,6 +2824,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 6/6] userns: Allow the userns root to mount tmpfs.
  2013-01-26  2:26 ` [PATCH review 6/6] userns: Allow the userns root to mount tmpfs Eric W. Biederman
  2013-01-27 18:23   ` Serge E. Hallyn
@ 2013-01-28  1:28   ` Gao feng
  1 sibling, 0 replies; 31+ messages in thread
From: Gao feng @ 2013-01-28  1:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

On 2013/01/26 10:26, Eric W. Biederman wrote:
> 
> There is no backing store to tmpfs and file creation rules are the
> same as for any other filesystem so it is semantically safe to allow
> unprivileged users to mount it.  ramfs is safe for the same reasons so
> allow either flavor of tmpfs to be mounted by a user namespace root
> user.
> 
> The memory control group successfully limits how much memory tmpfs can
> consume on any system that cares about a user namespace root using
> tmpfs to exhaust memory the memory control group can be deployed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---

useful to me,thanks Eric & Serge.

Acked-by: Gao feng <gaofeng@cn.fujitsu.com>

>  mm/shmem.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5c90d84..197ca5e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -2823,6 +2824,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-26  2:22 ` [PATCH review 3/6] userns: Recommend use of memory control groups Eric W. Biederman
  2013-01-26 21:13   ` Serge E. Hallyn
@ 2013-01-28  7:37   ` Lord Glauber Costa of Sealand
  2013-01-28  7:50     ` Lord Glauber Costa of Sealand
  2013-01-28  8:05     ` Eric W. Biederman
  1 sibling, 2 replies; 31+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28  7:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers, linux-fsdevel, linux-kernel

On 01/26/2013 06:22 AM, Eric W. Biederman wrote:
> 
> In the help text describing user namespaces recommend use of memory
> control groups.  In many cases memory control groups are the only
> mechanism there is to limit how much memory a user who can create
> user namespaces can use.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  Documentation/namespaces/resource-control.txt |   10 ++++++++++
>  init/Kconfig                                  |    7 +++++++
>  2 files changed, 17 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/namespaces/resource-control.txt
> 
> diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
> new file mode 100644
> index 0000000..3d8178a
> --- /dev/null
> +++ b/Documentation/namespaces/resource-control.txt
> @@ -0,0 +1,10 @@
> +There are a lot of kinds of objects in the kernel that don't have
> +individual limits or that have limits that are ineffective when a set
> +of processes is allowed to switch user ids.  With user namespaces
> +enabled in a kernel for people who don't trust their users or their
> +users programs to play nice this problems becomes more acute.
> +
> +Therefore it is recommended that memory control groups be enabled in
> +kernels that enable user namespaces, and it is further recommended
> +that userspace configure memory control groups to limit how much
> +memory users they don't trust to play nice can use.
> diff --git a/init/Kconfig b/init/Kconfig
> index 7d30240..c8c58bd 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1035,6 +1035,13 @@ config USER_NS
>  	help
>  	  This allows containers, i.e. vservers, to use user namespaces
>  	  to provide different user info for different servers.
> +
> +	  When user namespaces are enabled in the kernel it is
> +	  recommended that the MEMCG and MEMCG_KMEM options also be
> +	  enabled and that user-space use the memory control groups to
> +	  limit the amount of memory a memory unprivileged users can
> +	  use.
> +
>  	  If unsure, say N.

Since this becomes an official recommendation that people will likely
follow, are we really that much concerned about the types of abuses the
MEMCG_KMEM will prevent? Those are mostly metadata-based abuses users
could do in their own local disks without mounting anything extra (and
things that look like that)

Unless there is a specific concern here, shouldn't we say "... that the
MEMCG (and possibly MEMCG_KMEM) options..." ?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28  7:37   ` Lord Glauber Costa of Sealand
@ 2013-01-28  7:50     ` Lord Glauber Costa of Sealand
  2013-01-28  8:14       ` Eric W. Biederman
  2013-01-28  8:05     ` Eric W. Biederman
  1 sibling, 1 reply; 31+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28  7:50 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-fsdevel, Linux Containers, linux-kernel

On 01/28/2013 11:37 AM, Lord Glauber Costa of Sealand wrote:
> On 01/26/2013 06:22 AM, Eric W. Biederman wrote:
>>
>> In the help text describing user namespaces recommend use of memory
>> control groups.  In many cases memory control groups are the only
>> mechanism there is to limit how much memory a user who can create
>> user namespaces can use.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  Documentation/namespaces/resource-control.txt |   10 ++++++++++
>>  init/Kconfig                                  |    7 +++++++
>>  2 files changed, 17 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/namespaces/resource-control.txt
>>
>> diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
>> new file mode 100644
>> index 0000000..3d8178a
>> --- /dev/null
>> +++ b/Documentation/namespaces/resource-control.txt
>> @@ -0,0 +1,10 @@
>> +There are a lot of kinds of objects in the kernel that don't have
>> +individual limits or that have limits that are ineffective when a set
>> +of processes is allowed to switch user ids.  With user namespaces
>> +enabled in a kernel for people who don't trust their users or their
>> +users programs to play nice this problems becomes more acute.
>> +
>> +Therefore it is recommended that memory control groups be enabled in
>> +kernels that enable user namespaces, and it is further recommended
>> +that userspace configure memory control groups to limit how much
>> +memory users they don't trust to play nice can use.
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 7d30240..c8c58bd 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1035,6 +1035,13 @@ config USER_NS
>>  	help
>>  	  This allows containers, i.e. vservers, to use user namespaces
>>  	  to provide different user info for different servers.
>> +
>> +	  When user namespaces are enabled in the kernel it is
>> +	  recommended that the MEMCG and MEMCG_KMEM options also be
>> +	  enabled and that user-space use the memory control groups to
>> +	  limit the amount of memory a memory unprivileged users can
>> +	  use.
>> +
>>  	  If unsure, say N.
> 
> Since this becomes an official recommendation that people will likely
> follow, are we really that much concerned about the types of abuses the
> MEMCG_KMEM will prevent? Those are mostly metadata-based abuses users
> could do in their own local disks without mounting anything extra (and
> things that look like that)
> 
> Unless there is a specific concern here, shouldn't we say "... that the
> MEMCG (and possibly MEMCG_KMEM) options..." ?
> 
> 
I just saw in a later patch of yours that your concern here seems not
limited to backed ram by tmpfs, but with things like the internal
structures for userns , to avoid patterns in the form: 'for (;;)
unshare(...)'

Humm, it does seem sensible. The kernel memory controller aims to
prevent exactly things like that. But they all exist already before
userns: there are destructive patterns like that with sockets, dentries,
processes, and pretty much every other resource in the kernel. So
Although the recommendation per-se makes sense, I am wondering if it is
worth it to mention anything in the user_ns config?





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28  7:37   ` Lord Glauber Costa of Sealand
  2013-01-28  7:50     ` Lord Glauber Costa of Sealand
@ 2013-01-28  8:05     ` Eric W. Biederman
  1 sibling, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-28  8:05 UTC (permalink / raw)
  To: Lord Glauber Costa of Sealand
  Cc: Linux Containers, linux-fsdevel, linux-kernel

Lord Glauber Costa of Sealand <glommer@parallels.com> writes:

> On 01/26/2013 06:22 AM, Eric W. Biederman wrote:
>> 
>> In the help text describing user namespaces recommend use of memory
>> control groups.  In many cases memory control groups are the only
>> mechanism there is to limit how much memory a user who can create
>> user namespaces can use.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  Documentation/namespaces/resource-control.txt |   10 ++++++++++
>>  init/Kconfig                                  |    7 +++++++
>>  2 files changed, 17 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/namespaces/resource-control.txt
>> 
>> diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
>> new file mode 100644
>> index 0000000..3d8178a
>> --- /dev/null
>> +++ b/Documentation/namespaces/resource-control.txt
>> @@ -0,0 +1,10 @@
>> +There are a lot of kinds of objects in the kernel that don't have
>> +individual limits or that have limits that are ineffective when a set
>> +of processes is allowed to switch user ids.  With user namespaces
>> +enabled in a kernel for people who don't trust their users or their
>> +users programs to play nice this problems becomes more acute.
>> +
>> +Therefore it is recommended that memory control groups be enabled in
>> +kernels that enable user namespaces, and it is further recommended
>> +that userspace configure memory control groups to limit how much
>> +memory users they don't trust to play nice can use.
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 7d30240..c8c58bd 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1035,6 +1035,13 @@ config USER_NS
>>  	help
>>  	  This allows containers, i.e. vservers, to use user namespaces
>>  	  to provide different user info for different servers.
>> +
>> +	  When user namespaces are enabled in the kernel it is
>> +	  recommended that the MEMCG and MEMCG_KMEM options also be
>> +	  enabled and that user-space use the memory control groups to
>> +	  limit the amount of memory a memory unprivileged users can
>> +	  use.
>> +
>>  	  If unsure, say N.
>
> Since this becomes an official recommendation that people will likely
> follow, are we really that much concerned about the types of abuses the
> MEMCG_KMEM will prevent? Those are mostly metadata-based abuses users
> could do in their own local disks without mounting anything extra (and
> things that look like that)
>
> Unless there is a specific concern here, shouldn't we say "... that the
> MEMCG (and possibly MEMCG_KMEM) options..." ?

There are quite a few specific concerns.  The easiest to spot is
unshare(CLONE_NEWUSER), and the other namespaces.  Then there are
network devices.  Then there is I don't know what else.

Most distro's don't seem to care at all about limiting a users memory
so in that sense it is not a concern.

On the other hand for everyone who wants to limit a user's memory the
only way that is going to happen in a reasonable amount of
implementation time is with memory control groups, and slabs and kmalloc
are most definitely part of the memory needs to be limited.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28  7:50     ` Lord Glauber Costa of Sealand
@ 2013-01-28  8:14       ` Eric W. Biederman
  2013-01-28  8:22         ` Lord Glauber Costa of Sealand
  0 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-28  8:14 UTC (permalink / raw)
  To: Lord Glauber Costa of Sealand
  Cc: linux-fsdevel, Linux Containers, linux-kernel

Lord Glauber Costa of Sealand <glommer@parallels.com> writes:

> I just saw in a later patch of yours that your concern here seems not
> limited to backed ram by tmpfs, but with things like the internal
> structures for userns , to avoid patterns in the form: 'for (;;)
> unshare(...)'
>
> Humm, it does seem sensible. The kernel memory controller aims to
> prevent exactly things like that. But they all exist already before
> userns: there are destructive patterns like that with sockets, dentries,
> processes, and pretty much every other resource in the kernel. So
> Although the recommendation per-se makes sense, I am wondering if it is
> worth it to mention anything in the user_ns config?

The config might be overkill.  However I have already gotten bug reports
about there being no limits.

So someone needs to stop and connect the dots and say: "If you care this
is what you can do."  Especially since the familiar old limits that can
kind of sort of prevent memory abuses are not generally available with
user namespaces.

Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28  8:14       ` Eric W. Biederman
@ 2013-01-28  8:22         ` Lord Glauber Costa of Sealand
  2013-01-28 16:19           ` Eric W. Biederman
  0 siblings, 1 reply; 31+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28  8:22 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-fsdevel, Linux Containers, linux-kernel

On 01/28/2013 12:14 PM, Eric W. Biederman wrote:
> Lord Glauber Costa of Sealand <glommer@parallels.com> writes:
> 
>> I just saw in a later patch of yours that your concern here seems not
>> limited to backed ram by tmpfs, but with things like the internal
>> structures for userns , to avoid patterns in the form: 'for (;;)
>> unshare(...)'
>>
>> Humm, it does seem sensible. The kernel memory controller aims to
>> prevent exactly things like that. But they all exist already before
>> userns: there are destructive patterns like that with sockets, dentries,
>> processes, and pretty much every other resource in the kernel. So
>> Although the recommendation per-se makes sense, I am wondering if it is
>> worth it to mention anything in the user_ns config?
> 
> The config might be overkill.  However I have already gotten bug reports
> about there being no limits.
> 
> So someone needs to stop and connect the dots and say: 
Absolutely, and I am all for it

> "If you care this is what you can do." 

How about we say it, then?

The current text in quite cryptic in this aspect, in the sense that it
doesn't give enough information for standard people about what are the
problems involved.

Of course, maybe the Kconfig text is not the best place for having all
the info: but don't we have some place in Documentation/ where we could
put this, and then refer people there from Kconfig ?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
  2013-01-26  2:21 ` [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap Eric W. Biederman
  2013-01-26 21:08   ` Serge E. Hallyn
@ 2013-01-28 14:28   ` Aristeu Rozanski
  2013-01-28 14:41     ` Lord Glauber Costa of Sealand
  1 sibling, 1 reply; 31+ messages in thread
From: Aristeu Rozanski @ 2013-01-28 14:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel,
	Andrew Morton

On Fri, Jan 25, 2013 at 06:21:00PM -0800, Eric W. Biederman wrote:
> When I initially wrote the code for /proc/<pid>/uid_map.  I was lazy
> and avoided duplicate mappings by the simple expedient of ensuring the
> first number in a new extent was greater than any number in the
> previous extent.
> 
> Unfortunately that precludes a number of valid mappings, and someone
> noticed and complained.  So use a simple check to ensure that ranges
> in the mapping extents don't overlap.

Acked-by: Someone <aris@redhat.com>

> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/user_namespace.c |   45 +++++++++++++++++++++++++++++++++++++++------
>  1 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 24f8ec3..8b65083 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -520,6 +520,42 @@ struct seq_operations proc_projid_seq_operations = {
>  	.show = projid_m_show,
>  };
>  
> +static bool mappings_overlap(struct uid_gid_map *new_map, struct uid_gid_extent *extent)
> +{
> +	u32 upper_first, lower_first, upper_last, lower_last;
> +	unsigned idx;
> +
> +	upper_first = extent->first;
> +	lower_first = extent->lower_first;
> +	upper_last = upper_first + extent->count - 1;
> +	lower_last = lower_first + extent->count - 1;
> +
> +	for (idx = 0; idx < new_map->nr_extents; idx++) {
> +		u32 prev_upper_first, prev_lower_first;
> +		u32 prev_upper_last, prev_lower_last;
> +		struct uid_gid_extent *prev;
> +
> +		prev = &new_map->extent[idx];
> +
> +		prev_upper_first = prev->first;
> +		prev_lower_first = prev->lower_first;
> +		prev_upper_last = prev_upper_first + prev->count - 1;
> +		prev_lower_last = prev_lower_first + prev->count - 1;
> +
> +		/* Does the upper range intersect a previous extent? */
> +		if ((prev_upper_first <= upper_last) &&
> +		    (prev_upper_last >= upper_first))
> +			return true;
> +
> +		/* Does the lower range intersect a previous extent? */
> +		if ((prev_lower_first <= lower_last) &&
> +		    (prev_lower_last >= lower_first))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +
>  static DEFINE_MUTEX(id_map_mutex);
>  
>  static ssize_t map_write(struct file *file, const char __user *buf,
> @@ -532,7 +568,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	struct user_namespace *ns = seq->private;
>  	struct uid_gid_map new_map;
>  	unsigned idx;
> -	struct uid_gid_extent *extent, *last = NULL;
> +	struct uid_gid_extent *extent = NULL;
>  	unsigned long page = 0;
>  	char *kbuf, *pos, *next_line;
>  	ssize_t ret = -EINVAL;
> @@ -635,14 +671,11 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  		if ((extent->lower_first + extent->count) <= extent->lower_first)
>  			goto out;
>  
> -		/* For now only accept extents that are strictly in order */
> -		if (last &&
> -		    (((last->first + last->count) > extent->first) ||
> -		     ((last->lower_first + last->count) > extent->lower_first)))
> +		/* Do the ranges in extent overlap any previous extents? */
> +		if (mappings_overlap(&new_map, extent))
>  			goto out;
>  
>  		new_map.nr_extents++;
> -		last = extent;
>  
>  		/* Fail if the file contains too many extents */
>  		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
> -- 
> 1.7.5.4
> 

-- 
Aristeu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
  2013-01-28 14:28   ` Aristeu Rozanski
@ 2013-01-28 14:41     ` Lord Glauber Costa of Sealand
  2013-01-28 15:12       ` Aristeu Rozanski
  0 siblings, 1 reply; 31+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28 14:41 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Eric W. Biederman, linux-fsdevel, Linux Containers, linux-kernel,
	Andrew Morton

Hello Mr. Someone.

On 01/28/2013 06:28 PM, Aristeu Rozanski wrote:
> On Fri, Jan 25, 2013 at 06:21:00PM -0800, Eric W. Biederman wrote:
>> When I initially wrote the code for /proc/<pid>/uid_map.  I was lazy
>> and avoided duplicate mappings by the simple expedient of ensuring the
>> first number in a new extent was greater than any number in the
>> previous extent.
>>
>> Unfortunately that precludes a number of valid mappings, and someone
>> noticed and complained.  So use a simple check to ensure that ranges
>> in the mapping extents don't overlap.
> 
> Acked-by: Someone <aris@redhat.com>
> 

Documentation/SubmittingPatches:

"then you just add a line saying

        Signed-off-by: Random J Developer <random@developer.example.org>

using your real name (sorry, no pseudonyms or anonymous contributions.)"

I know how it feels, but that is how it goes. You'll have to change that.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 1/6] userns: Avoid recursion in put_user_ns
  2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
  2013-01-26 20:58   ` Serge E. Hallyn
@ 2013-01-28 14:51   ` Vasily Kulikov
  2013-01-28 16:34     ` Eric W. Biederman
  1 sibling, 1 reply; 31+ messages in thread
From: Vasily Kulikov @ 2013-01-28 14:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

On Fri, Jan 25, 2013 at 18:19 -0800, Eric W. Biederman wrote:
> 
> When freeing a deeply nested user namespace free_user_ns calls
> put_user_ns on it's parent which may in turn call free_user_ns again.
> When -fno-optimize-sibling-calls is passed to gcc one stack frame per
> user namespace is left on the stack, potentially overflowing the
> kernel stack.  CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls
> so we can't count on gcc to optimize this code.
> 
> Remove struct kref and use a plain atomic_t.  Making the code more
> flexible and easier to comprehend.  Make the loop in free_user_ns
> explict to guarantee that the stack does not overflow with
> CONFIG_FRAME_POINTER enabled.
> 
> I have tested this fix with a simple program that uses unshare to
> create a deeply nested user namespace structure and then calls exit.
> With 1000 nesteuser namespaces before this change running my test
> program causes the kernel to die a horrible death.  With 10,000,000
> nested user namespaces after this change my test program runs to
> completion and causes no harm.
> 
> Pointed-out-by: Vasily Kulikov <segoon@openwall.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Looks sane, thanks.

Acked-by: Vasily Kulikov <segoon@openwall.com>

The second bug I've noted in the same email (OOM) looks like should be
"fixed" by using memcg to limit kernel memory.  So, I'm fine with this
side of user_ns :)

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
  2013-01-28 14:41     ` Lord Glauber Costa of Sealand
@ 2013-01-28 15:12       ` Aristeu Rozanski
  0 siblings, 0 replies; 31+ messages in thread
From: Aristeu Rozanski @ 2013-01-28 15:12 UTC (permalink / raw)
  To: Lord Glauber Costa of Sealand
  Cc: Eric W. Biederman, linux-fsdevel, Linux Containers, linux-kernel,
	Andrew Morton

On Mon, Jan 28, 2013 at 06:41:39PM +0400, Lord Glauber Costa of Sealand wrote:
> Hello Mr. Someone.
> 
> On 01/28/2013 06:28 PM, Aristeu Rozanski wrote:
> > On Fri, Jan 25, 2013 at 06:21:00PM -0800, Eric W. Biederman wrote:
> >> When I initially wrote the code for /proc/<pid>/uid_map.  I was lazy
> >> and avoided duplicate mappings by the simple expedient of ensuring the
> >> first number in a new extent was greater than any number in the
> >> previous extent.
> >>
> >> Unfortunately that precludes a number of valid mappings, and someone
> >> noticed and complained.  So use a simple check to ensure that ranges
> >> in the mapping extents don't overlap.
> > 
> > Acked-by: Someone <aris@redhat.com>
> > 
> 
> Documentation/SubmittingPatches:
> 
> "then you just add a line saying
> 
>         Signed-off-by: Random J Developer <random@developer.example.org>
> 
> using your real name (sorry, no pseudonyms or anonymous contributions.)"
> 
> I know how it feels, but that is how it goes. You'll have to change that.

oh well, another trip to the social security office and DMV.

-- 
Aristeu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28  8:22         ` Lord Glauber Costa of Sealand
@ 2013-01-28 16:19           ` Eric W. Biederman
  2013-01-28 16:37             ` Lord Glauber Costa of Sealand
  0 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-28 16:19 UTC (permalink / raw)
  To: Lord Glauber Costa of Sealand
  Cc: linux-fsdevel, Linux Containers, linux-kernel

Lord Glauber Costa of Sealand <glommer@parallels.com> writes:

> On 01/28/2013 12:14 PM, Eric W. Biederman wrote:
>> Lord Glauber Costa of Sealand <glommer@parallels.com> writes:
>> 
>>> I just saw in a later patch of yours that your concern here seems not
>>> limited to backed ram by tmpfs, but with things like the internal
>>> structures for userns , to avoid patterns in the form: 'for (;;)
>>> unshare(...)'
>>>
>>> Humm, it does seem sensible. The kernel memory controller aims to
>>> prevent exactly things like that. But they all exist already before
>>> userns: there are destructive patterns like that with sockets, dentries,
>>> processes, and pretty much every other resource in the kernel. So
>>> Although the recommendation per-se makes sense, I am wondering if it is
>>> worth it to mention anything in the user_ns config?
>> 
>> The config might be overkill.  However I have already gotten bug reports
>> about there being no limits.
>> 
>> So someone needs to stop and connect the dots and say: 
> Absolutely, and I am all for it
>
>> "If you care this is what you can do." 
>
> How about we say it, then?
>
> The current text in quite cryptic in this aspect, in the sense that it
> doesn't give enough information for standard people about what are the
> problems involved.
>
> Of course, maybe the Kconfig text is not the best place for having all
> the info: but don't we have some place in Documentation/ where we could
> put this, and then refer people there from Kconfig ?

At this point I have written the best text I can.

Please feel free to look at my tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next

and send me an patch on top of that to improve the wording.

At this point I have done my best to connect the dots for people who
care, that the memory control group is what they need to limit what
people can do with user namespaces.

My hope is that there is at least a passing mention in the next user
namespace article on lwn.

For two pieces of software that were designed to complement each other
I find it a bit surprising how many people (including myself) need the
connection made that memory control groups and user namespaces should go
together.

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 1/6] userns: Avoid recursion in put_user_ns
  2013-01-28 14:51   ` Vasily Kulikov
@ 2013-01-28 16:34     ` Eric W. Biederman
  0 siblings, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-28 16:34 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, linux-fsdevel

Vasily Kulikov <segoon@openwall.com> writes:

> Acked-by: Vasily Kulikov <segoon@openwall.com>
>
> The second bug I've noted in the same email (OOM) looks like should be
> "fixed" by using memcg to limit kernel memory.  So, I'm fine with this
> side of user_ns :)

Good.  That is what it looked like from here.

You pointed out one or two other things that I am still thinking about.

Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28 16:19           ` Eric W. Biederman
@ 2013-01-28 16:37             ` Lord Glauber Costa of Sealand
  2013-01-28 17:18               ` Eric W. Biederman
  0 siblings, 1 reply; 31+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-28 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel, Linux Containers, linux-kernel, Michael Kerrisk

On 01/28/2013 08:19 PM, Eric W. Biederman wrote:
> Lord Glauber Costa of Sealand <glommer@parallels.com> writes:
> 
>> On 01/28/2013 12:14 PM, Eric W. Biederman wrote:
>>> Lord Glauber Costa of Sealand <glommer@parallels.com> writes:
>>>
>>>> I just saw in a later patch of yours that your concern here seems not
>>>> limited to backed ram by tmpfs, but with things like the internal
>>>> structures for userns , to avoid patterns in the form: 'for (;;)
>>>> unshare(...)'
>>>>
>>>> Humm, it does seem sensible. The kernel memory controller aims to
>>>> prevent exactly things like that. But they all exist already before
>>>> userns: there are destructive patterns like that with sockets, dentries,
>>>> processes, and pretty much every other resource in the kernel. So
>>>> Although the recommendation per-se makes sense, I am wondering if it is
>>>> worth it to mention anything in the user_ns config?
>>>
>>> The config might be overkill.  However I have already gotten bug reports
>>> about there being no limits.
>>>
>>> So someone needs to stop and connect the dots and say: 
>> Absolutely, and I am all for it
>>
>>> "If you care this is what you can do." 
>>
>> How about we say it, then?
>>
>> The current text in quite cryptic in this aspect, in the sense that it
>> doesn't give enough information for standard people about what are the
>> problems involved.
>>
>> Of course, maybe the Kconfig text is not the best place for having all
>> the info: but don't we have some place in Documentation/ where we could
>> put this, and then refer people there from Kconfig ?
> 
> At this point I have written the best text I can.
> 
> Please feel free to look at my tree at:
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next
> 
Will do soon, thanks for your effort.

> and send me an patch on top of that to improve the wording.
> 
> At this point I have done my best to connect the dots for people who
> care, that the memory control group is what they need to limit what
> people can do with user namespaces.
> 
> My hope is that there is at least a passing mention in the next user
> namespace article on lwn.
It would definitely be helpful. Let's hope someone from there is reading! =)

> 
> For two pieces of software that were designed to complement each other
> I find it a bit surprising how many people (including myself) need the
> connection made that memory control groups and user namespaces should go
> together.

Well, I've manifested many times in here that I am less than satisfied
about the fact that the connection between namespaces and cgroups are so
loose. There are many situations, like virtualizing the proc files and
friends where I believe we could benefit from having the information
about whether or not cgroups and namespaces are used at the same time.

But since after considering a lot of alternatives, I could never come up
with one that were really clean,  I guess just communicating it
extensively is the best we can do so far.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH review 3/6] userns: Recommend use of memory control groups.
  2013-01-28 16:37             ` Lord Glauber Costa of Sealand
@ 2013-01-28 17:18               ` Eric W. Biederman
  0 siblings, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2013-01-28 17:18 UTC (permalink / raw)
  To: Lord Glauber Costa of Sealand
  Cc: linux-fsdevel, Linux Containers, linux-kernel, Michael Kerrisk

Lord Glauber Costa of Sealand <glommer@parallels.com> writes:

>> For two pieces of software that were designed to complement each other
>> I find it a bit surprising how many people (including myself) need the
>> connection made that memory control groups and user namespaces should go
>> together.
>
> Well, I've manifested many times in here that I am less than satisfied
> about the fact that the connection between namespaces and cgroups are so
> loose. There are many situations, like virtualizing the proc files and
> friends where I believe we could benefit from having the information
> about whether or not cgroups and namespaces are used at the same time.

Certainly there are issues where there are not good ways for
applications to discover how much memory or how many cpus are available
for the application to run on.  And applications resort to looking at
how much memory or how many cpus are in the box.

The only way I can think of to make things better is to provide good
alternatives that people can look at and make limiting applications
common enough that the bugs start getting fixed.  And possibly doing the
work to fix a common library or two.

Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2013-01-28 17:18 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-26  2:15 [PATCH review 0/6] miscelaneous user namespace patches Eric W. Biederman
2013-01-26  2:19 ` [PATCH review 1/6] userns: Avoid recursion in put_user_ns Eric W. Biederman
2013-01-26 20:58   ` Serge E. Hallyn
2013-01-28 14:51   ` Vasily Kulikov
2013-01-28 16:34     ` Eric W. Biederman
2013-01-26  2:21 ` [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap Eric W. Biederman
2013-01-26 21:08   ` Serge E. Hallyn
2013-01-28 14:28   ` Aristeu Rozanski
2013-01-28 14:41     ` Lord Glauber Costa of Sealand
2013-01-28 15:12       ` Aristeu Rozanski
2013-01-26  2:22 ` [PATCH review 3/6] userns: Recommend use of memory control groups Eric W. Biederman
2013-01-26 21:13   ` Serge E. Hallyn
2013-01-27  6:19     ` Eric W. Biederman
2013-01-28  7:37   ` Lord Glauber Costa of Sealand
2013-01-28  7:50     ` Lord Glauber Costa of Sealand
2013-01-28  8:14       ` Eric W. Biederman
2013-01-28  8:22         ` Lord Glauber Costa of Sealand
2013-01-28 16:19           ` Eric W. Biederman
2013-01-28 16:37             ` Lord Glauber Costa of Sealand
2013-01-28 17:18               ` Eric W. Biederman
2013-01-28  8:05     ` Eric W. Biederman
2013-01-26  2:23 ` [PATCH review 4/6] userns: Allow the userns root to mount of devpts Eric W. Biederman
2013-01-26 21:22   ` Serge E. Hallyn
2013-01-26  2:26 ` [PATCH review 5/6] userns: Allow the userns root to mount ramfs Eric W. Biederman
2013-01-26 21:29   ` Serge E. Hallyn
2013-01-27  6:09     ` Eric W. Biederman
2013-01-27 18:23       ` Serge E. Hallyn
2013-01-27 18:23   ` Serge E. Hallyn
2013-01-26  2:26 ` [PATCH review 6/6] userns: Allow the userns root to mount tmpfs Eric W. Biederman
2013-01-27 18:23   ` Serge E. Hallyn
2013-01-28  1:28   ` Gao feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).