linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Namespace contrained helper execution
@ 2014-11-25  1:07 Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 1/4] vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install() Ian Kent
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25  1:07 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: J. Bruce Fields, Oleg Nesterov, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

Hi all,

Some time ago an attempt was made to update call_usermodehelper()
to execute within it's namespace.

Comments at the time were basically that the approach didn't go
nearly far enough to constrain the process.

This series attempts to remedy that by taking care to create an
appropriate namespace environment then switch to it and setup
fs_struct for path walking prior to the user mode helper thread
runner calling do_execve().

Please review and comment on the patch series.
Ian

---

Benjamin Coddington (1):
      KEYS: exec request-key within the requesting task's namespace

Ian Kent (3):
      vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install()
      nsproxy - make create_new_namespaces() non-static
      kmod - add call_usermodehelper_ns() helper


 fs/namespace.c              |   41 ++++++++++++++++++++++++++++-----------
 include/linux/kmod.h        |   17 ++++++++++++++++
 include/linux/mount.h       |    1 +
 include/linux/nsproxy.h     |    3 +++
 kernel/kmod.c               |   39 +++++++++++++++++++++++++++++++++++++
 kernel/nsproxy.c            |    2 +-
 security/keys/request_key.c |   45 +++++++++++++++++++++++++++++++++++++------
 7 files changed, 129 insertions(+), 19 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 1/4] vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install()
  2014-11-25  1:07 [RFC PATCH 0/4] Namespace contrained helper execution Ian Kent
@ 2014-11-25  1:07 ` Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 2/4] nsproxy - make create_new_namespaces() non-static Ian Kent
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25  1:07 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: J. Bruce Fields, Oleg Nesterov, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

Some users of kmod.c's usermodehelper need to setup their fs_struct
root and pwd based on the callers namespaces after the kernel thread
runner has created a new process but before do_execve() is called.

Break out the fs_struct portion of mntns_install so it can be used
for this purpose.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/namespace.c        |   41 +++++++++++++++++++++++++++++------------
 include/linux/mount.h |    1 +
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5b66b2b..3cdbb9e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3165,11 +3165,38 @@ static void mntns_put(void *ns)
 	put_mnt_ns(ns);
 }
 
+/*
+ * Set fs root and pwd for a subsequent call to do_execve().
+ *
+ * This assumes that an nsproxy has been created within the
+ * namespace context that is the target for the exec so that
+ * mnt_ns is already set. Additionally, since the nsproxy is
+ * created within the requesting namespace context the security
+ * checks of mntns_install() aren't required.
+ */
+void mntns_setfs(struct mnt_namespace *mnt_ns)
+{
+	struct fs_struct *fs = current->fs;
+	struct path root;
+
+	/* Find the root */
+	root.mnt    = &mnt_ns->root->mnt;
+	root.dentry = mnt_ns->root->mnt.mnt_root;
+	path_get(&root);
+	while(d_mountpoint(root.dentry) && follow_down_one(&root))
+		;
+
+	/* Update the pwd and root */
+	set_fs_pwd(fs, &root);
+	set_fs_root(fs, &root);
+
+	path_put(&root);
+}
+
 static int mntns_install(struct nsproxy *nsproxy, void *ns)
 {
 	struct fs_struct *fs = current->fs;
 	struct mnt_namespace *mnt_ns = ns;
-	struct path root;
 
 	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
@@ -3183,18 +3210,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
 	put_mnt_ns(nsproxy->mnt_ns);
 	nsproxy->mnt_ns = mnt_ns;
 
-	/* Find the root */
-	root.mnt    = &mnt_ns->root->mnt;
-	root.dentry = mnt_ns->root->mnt.mnt_root;
-	path_get(&root);
-	while(d_mountpoint(root.dentry) && follow_down_one(&root))
-		;
-
-	/* Update the pwd and root */
-	set_fs_pwd(fs, &root);
-	set_fs_root(fs, &root);
+	mntns_setfs(mnt_ns);
 
-	path_put(&root);
 	return 0;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c2c561d..a9f6548 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -80,6 +80,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+void mntns_setfs(struct mnt_namespace *mnt_ns);
 
 struct path;
 extern struct vfsmount *clone_private_mount(struct path *path);


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 2/4] nsproxy - make create_new_namespaces() non-static
  2014-11-25  1:07 [RFC PATCH 0/4] Namespace contrained helper execution Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 1/4] vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install() Ian Kent
@ 2014-11-25  1:07 ` Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 4/4] KEYS: exec request-key within the requesting task's namespace Ian Kent
  3 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25  1:07 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: J. Bruce Fields, Oleg Nesterov, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

create_new_namespaces() will be needed by call_usermodehelper_ns()
and call_usermodehelper_keys() for namespace restricted execution.

Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/nsproxy.h |    3 +++
 kernel/nsproxy.c        |    2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 35fa08f..dfe7dda 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -62,6 +62,9 @@ extern struct nsproxy init_nsproxy;
  *
  */
 
+struct nsproxy *create_new_namespaces(unsigned long flags,
+	struct task_struct *tsk, struct user_namespace *user_ns,
+	struct fs_struct *new_fs);
 int copy_namespaces(unsigned long flags, struct task_struct *tsk);
 void exit_task_namespaces(struct task_struct *tsk);
 void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index ef42d0a..dcfbb13 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -56,7 +56,7 @@ static inline struct nsproxy *create_nsproxy(void)
  * Return the newly created nsproxy.  Do not attach this to the task,
  * leave it to the caller to do proper locking and attach it to task.
  */
-static struct nsproxy *create_new_namespaces(unsigned long flags,
+struct nsproxy *create_new_namespaces(unsigned long flags,
 	struct task_struct *tsk, struct user_namespace *user_ns,
 	struct fs_struct *new_fs)
 {


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25  1:07 [RFC PATCH 0/4] Namespace contrained helper execution Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 1/4] vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install() Ian Kent
  2014-11-25  1:07 ` [RFC PATCH 2/4] nsproxy - make create_new_namespaces() non-static Ian Kent
@ 2014-11-25  1:07 ` Ian Kent
  2014-11-25 21:52   ` Oleg Nesterov
  2014-11-25  1:07 ` [RFC PATCH 4/4] KEYS: exec request-key within the requesting task's namespace Ian Kent
  3 siblings, 1 reply; 27+ messages in thread
From: Ian Kent @ 2014-11-25  1:07 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: J. Bruce Fields, Oleg Nesterov, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

The call_usermodehelper() function executes all binaries in the
global "init" root context. This doesn't allow a binary to be run
within the callers namespace (aka. a container). So create a new
function call_usermodehelper_ns() to do this.

Both containerized NFS client and NFS server need the ability to
execute a binary within their container. To do this create a new
nsproxy within the callers' context so it can be used for setup
prior to calling do_execve() from the user mode helper thread
runner.

Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/kmod.h |   17 +++++++++++++++++
 kernel/kmod.c        |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 0555cc6..fd5509a 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -69,6 +69,23 @@ struct subprocess_info {
 extern int
 call_usermodehelper(char *path, char **argv, char **envp, int wait);
 
+#if !defined(CONFIG_PROC_FS) || !defined(CONFIG_NAMESPACES)
+inline struct nsproxy *umh_open_ns(void)
+{
+	return NULL;
+}
+
+inline int
+call_usermodehelper_ns(char *path, char **argv, char **envp, int wait)
+{
+	return -ENOTSUP;
+}
+#else
+extern struct nsproxy *umh_open_ns(void);
+extern int
+call_usermodehelper_ns(char *path, char **argv, char **envp, int wait);
+#endif
+
 extern struct subprocess_info *
 call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t gfp_mask,
 			  int (*init)(struct subprocess_info *info, struct cred *new),
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 80f7a6d..0ddcfbb 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -39,6 +39,7 @@
 #include <linux/rwsem.h>
 #include <linux/ptrace.h>
 #include <linux/async.h>
+#include <linux/mount.h>
 #include <asm/uaccess.h>
 
 #include <trace/events/module.h>
@@ -642,6 +643,44 @@ int call_usermodehelper(char *path, char **argv, char **envp, int wait)
 }
 EXPORT_SYMBOL(call_usermodehelper);
 
+#if defined(CONFIG_PROC_FS) && defined(CONFIG_NAMESPACES)
+static int umh_set_ns(struct subprocess_info *info, struct cred *new)
+{
+	struct nsproxy *ns = info->data;
+
+	mntns_setfs(ns->mnt_ns);
+	switch_task_namespaces(current, ns);
+	return 0;
+}
+
+struct nsproxy *umh_open_ns(void)
+{
+	return create_new_namespaces(0, current, current_user_ns(), current->fs);
+}
+
+/* Call a usermode helper to execute within current namespace. */
+int call_usermodehelper_ns(char *path, char **argv, char **envp, int wait)
+{
+	struct subprocess_info *info;
+	struct nsproxy *ns;
+	gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
+
+	ns = umh_open_ns();
+	if (IS_ERR(ns))
+		return PTR_ERR(ns);
+
+	info = call_usermodehelper_setup(path, argv, envp,
+					 gfp_mask, umh_set_ns, NULL, ns);
+	if (!info) {
+		free_nsproxy(ns);
+		return -ENOMEM;
+	}
+
+	return call_usermodehelper_exec(info, wait);
+}
+EXPORT_SYMBOL(call_usermodehelper_ns);
+#endif
+
 static int proc_cap_handler(struct ctl_table *table, int write,
 			 void __user *buffer, size_t *lenp, loff_t *ppos)
 {


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 4/4] KEYS: exec request-key within the requesting task's namespace
  2014-11-25  1:07 [RFC PATCH 0/4] Namespace contrained helper execution Ian Kent
                   ` (2 preceding siblings ...)
  2014-11-25  1:07 ` [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper Ian Kent
@ 2014-11-25  1:07 ` Ian Kent
  3 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25  1:07 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: J. Bruce Fields, Oleg Nesterov, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

From: Benjamin Coddington <bcodding@redhat.com>

Copy the current task's namespaces into the request-key userspace helper to
restrict contained processes from executing key instantiation processes
outside their containers.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 security/keys/request_key.c |   45 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index bb4337c..b03feec 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -17,6 +17,8 @@
 #include <linux/err.h>
 #include <linux/keyctl.h>
 #include <linux/slab.h>
+#include <linux/nsproxy.h>
+#include <linux/mount.h>
 #include "internal.h"
 
 #define key_negative_timeout	60	/* default timeout on a negative key's existence */
@@ -46,6 +48,11 @@ void complete_request_key(struct key_construction *cons, int error)
 }
 EXPORT_SYMBOL(complete_request_key);
 
+struct request_key_info {
+	struct key	*keyring;
+	struct nsproxy	*nsproxy;
+};
+
 /*
  * Initialise a usermode helper that is going to have a specific session
  * keyring.
@@ -55,9 +62,14 @@ EXPORT_SYMBOL(complete_request_key);
  */
 static int umh_keys_init(struct subprocess_info *info, struct cred *cred)
 {
-	struct key *keyring = info->data;
+	struct request_key_info *rki = info->data;
+	struct nsproxy *ns = rki->nsproxy;
 
-	return install_session_keyring_to_cred(cred, keyring);
+	if (ns) {
+		mntns_setfs(ns->mnt_ns);
+		switch_task_namespaces(current, ns);
+	}
+	return install_session_keyring_to_cred(cred, rki->keyring);
 }
 
 /*
@@ -65,8 +77,9 @@ static int umh_keys_init(struct subprocess_info *info, struct cred *cred)
  */
 static void umh_keys_cleanup(struct subprocess_info *info)
 {
-	struct key *keyring = info->data;
-	key_put(keyring);
+	struct request_key_info *rki = info->data;
+	key_put(rki->keyring);
+	kfree(rki);
 }
 
 /*
@@ -76,12 +89,32 @@ static int call_usermodehelper_keys(char *path, char **argv, char **envp,
 					struct key *session_keyring, int wait)
 {
 	struct subprocess_info *info;
+	struct request_key_info *rki;
+	struct nsproxy *ns;
+
+	ns = umh_open_ns();
+	if (ns && IS_ERR(ns))
+		return PTR_ERR(ns);
+
+	rki = kmalloc(sizeof(*rki), GFP_KERNEL);
+	if (!rki) {
+		if (ns)
+			free_nsproxy(ns);
+		return -ENOMEM;
+	}
+
+	rki->keyring = session_keyring;
+	rki->nsproxy = ns;
 
 	info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
 					  umh_keys_init, umh_keys_cleanup,
-					  session_keyring);
-	if (!info)
+					  rki);
+	if (!info) {
+		if (ns)
+			free_nsproxy(ns);
+		kfree(rki);
 		return -ENOMEM;
+	}
 
 	key_get(session_keyring);
 	return call_usermodehelper_exec(info, wait);


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25  1:07 ` [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper Ian Kent
@ 2014-11-25 21:52   ` Oleg Nesterov
  2014-11-25 22:06     ` Oleg Nesterov
  2014-11-25 22:36     ` Ian Kent
  0 siblings, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2014-11-25 21:52 UTC (permalink / raw)
  To: Ian Kent
  Cc: Kernel Mailing List, J. Bruce Fields, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

Let me first apologize, I didn't actually read this series yet.

But I have to admit that so far I do not like this approach...
probably I am biased.

On 11/25, Ian Kent wrote:
>
> The call_usermodehelper() function executes all binaries in the
> global "init" root context. This doesn't allow a binary to be run
> within the callers namespace (aka. a container).

Please see below.

> Both containerized NFS client and NFS server need the ability to
> execute a binary within their container. To do this create a new
> nsproxy within the callers' context so it can be used for setup
> prior to calling do_execve() from the user mode helper thread
> runner.

and probably we also need this for coredump helpers, we want them
to be per-namespace.

> +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> +{
> +	struct nsproxy *ns = info->data;
> +
> +	mntns_setfs(ns->mnt_ns);

Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
Let me remind about the coredump. The dumping task can cloned with
CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
this enough.


> +	switch_task_namespaces(current, ns);

This doesn't look sane because this won't switch task_active_pid_ns().

And this reminds me another discussion, please look at
http://marc.info/?l=linux-kernel&m=138479570926192

Once again, this is just an idea to provoke more discussion. I am starting
to think that perhaps we need pid_ns->umh_helper (init by default). And
PR_SET_NS_UMH_HELPER.

Not sure.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 21:52   ` Oleg Nesterov
@ 2014-11-25 22:06     ` Oleg Nesterov
  2014-11-25 22:23       ` Eric W. Biederman
                         ` (2 more replies)
  2014-11-25 22:36     ` Ian Kent
  1 sibling, 3 replies; 27+ messages in thread
From: Oleg Nesterov @ 2014-11-25 22:06 UTC (permalink / raw)
  To: Ian Kent
  Cc: Kernel Mailing List, J. Bruce Fields, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

On 11/25, Oleg Nesterov wrote:
>
> Let me first apologize, I didn't actually read this series yet.
>
> But I have to admit that so far I do not like this approach...
> probably I am biased.

Yes.

And I have another concern... this is mostly a feeling, I can be
easily wrong but:

> On 11/25, Ian Kent wrote:
> >
> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> > +{
> > +	struct nsproxy *ns = info->data;
> > +
> > +	mntns_setfs(ns->mnt_ns);
>
> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> Let me remind about the coredump. The dumping task can cloned with
> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> this enough.

And otoh. If we actually want to use the caller's mnt_ns/namespaces we
could simply fork/reparent a child which will do execve ?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:06     ` Oleg Nesterov
@ 2014-11-25 22:23       ` Eric W. Biederman
  2014-11-25 23:07         ` Ian Kent
  2014-11-25 23:14       ` Ian Kent
  2014-11-26 11:46       ` David Howells
  2 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2014-11-25 22:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ian Kent, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

Oleg Nesterov <oleg@redhat.com> writes:

> On 11/25, Oleg Nesterov wrote:
>>
>> Let me first apologize, I didn't actually read this series yet.
>>
>> But I have to admit that so far I do not like this approach...
>> probably I am biased.
>
> Yes.
>
> And I have another concern... this is mostly a feeling, I can be
> easily wrong but:
>
>> On 11/25, Ian Kent wrote:
>> >
>> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
>> > +{
>> > +	struct nsproxy *ns = info->data;
>> > +
>> > +	mntns_setfs(ns->mnt_ns);
>>
>> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
>> Let me remind about the coredump. The dumping task can cloned with
>> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
>> this enough.
>
> And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> could simply fork/reparent a child which will do execve ?

That would certainly be a better approach, and roughly equivalent to
what exists here.  That would even ensure we remain in the proper
cgroups, and lsm context.

The practical problem with the approach presented here is that I can
hijack any user mode helper I wish, and make it run in any executable I
wish as the global root user.

Ian if we were to merge this I believe you would win the award for
easiest path to a root shell.

Eric


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 21:52   ` Oleg Nesterov
  2014-11-25 22:06     ` Oleg Nesterov
@ 2014-11-25 22:36     ` Ian Kent
  2014-11-25 23:27       ` Eric W. Biederman
  2014-11-27  1:30       ` Oleg Nesterov
  1 sibling, 2 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25 22:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kernel Mailing List, J. Bruce Fields, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

On Tue, 2014-11-25 at 22:52 +0100, Oleg Nesterov wrote:
> Let me first apologize, I didn't actually read this series yet.
> 
> But I have to admit that so far I do not like this approach...
> probably I am biased.

Oleg, thanks for your comments.

> 
> On 11/25, Ian Kent wrote:
> >
> > The call_usermodehelper() function executes all binaries in the
> > global "init" root context. This doesn't allow a binary to be run
> > within the callers namespace (aka. a container).
> 
> Please see below.
> 
> > Both containerized NFS client and NFS server need the ability to
> > execute a binary within their container. To do this create a new
> > nsproxy within the callers' context so it can be used for setup
> > prior to calling do_execve() from the user mode helper thread
> > runner.
> 
> and probably we also need this for coredump helpers, we want them
> to be per-namespace.

To save me some time could you point me to some of the related code
please. I don't normally play in that area.

> 
> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> > +{
> > +	struct nsproxy *ns = info->data;
> > +
> > +	mntns_setfs(ns->mnt_ns);
> 
> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> Let me remind about the coredump. The dumping task can cloned with
> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> this enough.
> 
> 
> > +	switch_task_namespaces(current, ns);
> 
> This doesn't look sane because this won't switch task_active_pid_ns().

I wondered about that too but I didn't design the open()/setns()
interface and TBH I've been wondering how he hell it is supposed to work
because of exactly this.

The statement amounts to saying that the
fd = open(/proc/<pid in target namespace>/ns/mnt);
setns(fd);
won't set the namespace properly but the documentation I've seen so far
(there's probably more that I need to see, I'll look further) implies
this is sufficient.

How does one correctly set the namespace in user space since each of
the /proc/<pid>/ns/<namespace> will use a slightly different
proc_ns_operations install function? 

Are we saying that, for example, if open(/proc/<pid>/ns/pid)/setns() is
used then the process must not do path lookups if it expects them to be
within the namespace and restrict itself to pid related system calls
only and so on for each of the other namespaces?

Or is it assumed that userspace will do
open(/proc/<pid>/ns/<namespace>)/setns()/close() every time it makes
systems calls that rely on a specific type of namespace?

> 
> And this reminds me another discussion, please look at
> http://marc.info/?l=linux-kernel&m=138479570926192
> 
> Once again, this is just an idea to provoke more discussion. I am starting
> to think that perhaps we need pid_ns->umh_helper (init by default). And
> PR_SET_NS_UMH_HELPER.

Yeah, I'll need to digest that for a while.

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:23       ` Eric W. Biederman
@ 2014-11-25 23:07         ` Ian Kent
  2014-11-25 23:19           ` Eric W. Biederman
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Kent @ 2014-11-25 23:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> Oleg Nesterov <oleg@redhat.com> writes:
> 
> > On 11/25, Oleg Nesterov wrote:
> >>
> >> Let me first apologize, I didn't actually read this series yet.
> >>
> >> But I have to admit that so far I do not like this approach...
> >> probably I am biased.
> >
> > Yes.
> >
> > And I have another concern... this is mostly a feeling, I can be
> > easily wrong but:
> >
> >> On 11/25, Ian Kent wrote:
> >> >
> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> >> > +{
> >> > +	struct nsproxy *ns = info->data;
> >> > +
> >> > +	mntns_setfs(ns->mnt_ns);
> >>
> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> >> Let me remind about the coredump. The dumping task can cloned with
> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> >> this enough.
> >
> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> > could simply fork/reparent a child which will do execve ?
> 
> That would certainly be a better approach, and roughly equivalent to
> what exists here.  That would even ensure we remain in the proper
> cgroups, and lsm context.
> 
> The practical problem with the approach presented here is that I can
> hijack any user mode helper I wish, and make it run in any executable I
> wish as the global root user.
> 
> Ian if we were to merge this I believe you would win the award for
> easiest path to a root shell.

LOL, OK, so there's a problem with this.

But, how should a user mode helper execute within a namespace (or more
specifically within a container)?

Suppose a user mode helper program scans through the pid list and
somehow picks the correct process pid and then does an
open()/setns()/execve().

Does that then satisfy the requirements?
What needs to be done to safely do that in kernel?

The other approach I've considered is doing a full open()/setns() in
kernel (since the caller already knows its pid) but it sounds like
that's not right either.

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:06     ` Oleg Nesterov
  2014-11-25 22:23       ` Eric W. Biederman
@ 2014-11-25 23:14       ` Ian Kent
  2014-11-26 11:46       ` David Howells
  2 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25 23:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kernel Mailing List, J. Bruce Fields, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

On Tue, 2014-11-25 at 23:06 +0100, Oleg Nesterov wrote:
> On 11/25, Oleg Nesterov wrote:
> >
> > Let me first apologize, I didn't actually read this series yet.
> >
> > But I have to admit that so far I do not like this approach...
> > probably I am biased.
> 
> Yes.
> 
> And I have another concern... this is mostly a feeling, I can be
> easily wrong but:
> 
> > On 11/25, Ian Kent wrote:
> > >
> > > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> > > +{
> > > +	struct nsproxy *ns = info->data;
> > > +
> > > +	mntns_setfs(ns->mnt_ns);
> >
> > Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> > Let me remind about the coredump. The dumping task can cloned with
> > CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> > this enough.
> 
> And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> could simply fork/reparent a child which will do execve ?

Are you saying that the user space program should be modified to do
this?

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 23:07         ` Ian Kent
@ 2014-11-25 23:19           ` Eric W. Biederman
  2014-11-25 23:50             ` Ian Kent
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2014-11-25 23:19 UTC (permalink / raw)
  To: Ian Kent
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

Ian Kent <ikent@redhat.com> writes:

> On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
>> Oleg Nesterov <oleg@redhat.com> writes:
>> 
>> > On 11/25, Oleg Nesterov wrote:
>> >>
>> >> Let me first apologize, I didn't actually read this series yet.
>> >>
>> >> But I have to admit that so far I do not like this approach...
>> >> probably I am biased.
>> >
>> > Yes.
>> >
>> > And I have another concern... this is mostly a feeling, I can be
>> > easily wrong but:
>> >
>> >> On 11/25, Ian Kent wrote:
>> >> >
>> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
>> >> > +{
>> >> > +	struct nsproxy *ns = info->data;
>> >> > +
>> >> > +	mntns_setfs(ns->mnt_ns);
>> >>
>> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
>> >> Let me remind about the coredump. The dumping task can cloned with
>> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
>> >> this enough.
>> >
>> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
>> > could simply fork/reparent a child which will do execve ?
>> 
>> That would certainly be a better approach, and roughly equivalent to
>> what exists here.  That would even ensure we remain in the proper
>> cgroups, and lsm context.
>> 
>> The practical problem with the approach presented here is that I can
>> hijack any user mode helper I wish, and make it run in any executable I
>> wish as the global root user.
>> 
>> Ian if we were to merge this I believe you would win the award for
>> easiest path to a root shell.
>
> LOL, OK, so there's a problem with this.
>
> But, how should a user mode helper execute within a namespace (or more
> specifically within a container)?
>
> Suppose a user mode helper program scans through the pid list and
> somehow picks the correct process pid and then does an
> open()/setns()/execve().
>
> Does that then satisfy the requirements?
> What needs to be done to safely do that in kernel?
>
> The other approach I've considered is doing a full open()/setns() in
> kernel (since the caller already knows its pid) but it sounds like
> that's not right either.

The approach we agreed upon with the core dump helper was to provide
enough information that userspace could figure out what was the
appropriate policy and call nsenter/setns.

The only sane approach I can think of in the context of nfs is to fork
a kernel thread at mount time that has all of the appropriate context
because it was captured from the privileged mounting process, and use
that kernel as the equivalent of kthreadd.

There may be some intermediate ground where we capture things or we use
the init process of the pid namespace (captured at mount time) as our
template/reference process.

If we are going to set this stuff up in the kernel we need a reference
process that we can create children of because what is possible with
respect to containers keeps changing, and it is extremely error prone to
figure out what all othe crazy little bits are, and to update everything
every time someone tweaks the kernel's capabilities.  We have kthreadd
because it was too error prone to scrub a userspace thread of all of the
userspace bits and make it the equivalent of what kthreadd is today.

Of course it is also rather nice to have something to hang everything
else on.

In summary we need a reference struct task that is all setup properly
so that we can create an appropriate kernel thread.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:36     ` Ian Kent
@ 2014-11-25 23:27       ` Eric W. Biederman
  2014-11-28  0:19         ` Ian Kent
  2014-11-27  1:30       ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2014-11-25 23:27 UTC (permalink / raw)
  To: Ian Kent
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

Ian Kent <ikent@redhat.com> writes:

> On Tue, 2014-11-25 at 22:52 +0100, Oleg Nesterov wrote:
>> Let me first apologize, I didn't actually read this series yet.
>> 
>> But I have to admit that so far I do not like this approach...
>> probably I am biased.
>
> Oleg, thanks for your comments.
>
>> 
>> On 11/25, Ian Kent wrote:
>> >
>> > The call_usermodehelper() function executes all binaries in the
>> > global "init" root context. This doesn't allow a binary to be run
>> > within the callers namespace (aka. a container).
>> 
>> Please see below.
>> 
>> > Both containerized NFS client and NFS server need the ability to
>> > execute a binary within their container. To do this create a new
>> > nsproxy within the callers' context so it can be used for setup
>> > prior to calling do_execve() from the user mode helper thread
>> > runner.
>> 
>> and probably we also need this for coredump helpers, we want them
>> to be per-namespace.
>
> To save me some time could you point me to some of the related code
> please. I don't normally play in that area.
>
>> 
>> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
>> > +{
>> > +	struct nsproxy *ns = info->data;
>> > +
>> > +	mntns_setfs(ns->mnt_ns);
>> 
>> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
>> Let me remind about the coredump. The dumping task can cloned with
>> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
>> this enough.
>> 
>> 
>> > +	switch_task_namespaces(current, ns);
>> 
>> This doesn't look sane because this won't switch task_active_pid_ns().
>
> I wondered about that too but I didn't design the open()/setns()
> interface and TBH I've been wondering how he hell it is supposed to work
> because of exactly this.
>
> The statement amounts to saying that the
> fd = open(/proc/<pid in target namespace>/ns/mnt);
> setns(fd);
> won't set the namespace properly but the documentation I've seen so far
> (there's probably more that I need to see, I'll look further) implies
> this is sufficient.

It is but it is a bit peculiar.

> How does one correctly set the namespace in user space since each of
> the /proc/<pid>/ns/<namespace> will use a slightly different
> proc_ns_operations install function? 
>
> Are we saying that, for example, if open(/proc/<pid>/ns/pid)/setns() is
> used then the process must not do path lookups if it expects them to be
> within the namespace and restrict itself to pid related system calls
> only and so on for each of the other namespaces?

In userspace you can only set the pid namespace for new children.  You
can never change your own pid namespace.  Because actually changing a
processes pid is too nasty to contemplate, or implement and because in a
login daemon context having your first child be the initial process of
the pid namespace is actually what is desirable.

> Or is it assumed that userspace will do
> open(/proc/<pid>/ns/<namespace>)/setns()/close() every time it makes
> systems calls that rely on a specific type of namespace?

setns is designed to be the exception, rather thant something you need
to do every time.

But nsproxy is not the one true source of namespaces, nsproxy is simply
a convinient place so we don't bloat struct task.  The primary reference
for the pid namespace is in a struct pid, what is in nsproxy is just
which pid namespace children will be created in.  The user namespace
reference comes from struct cred.

>> And this reminds me another discussion, please look at
>> http://marc.info/?l=linux-kernel&m=138479570926192
>> 
>> Once again, this is just an idea to provoke more discussion. I am starting
>> to think that perhaps we need pid_ns->umh_helper (init by default). And
>> PR_SET_NS_UMH_HELPER.
>
> Yeah, I'll need to digest that for a while.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 23:19           ` Eric W. Biederman
@ 2014-11-25 23:50             ` Ian Kent
  2014-11-26  0:44               ` Ian Kent
  2014-11-26  1:38               ` Eric W. Biederman
  0 siblings, 2 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-25 23:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
> Ian Kent <ikent@redhat.com> writes:
> 
> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> >> Oleg Nesterov <oleg@redhat.com> writes:
> >> 
> >> > On 11/25, Oleg Nesterov wrote:
> >> >>
> >> >> Let me first apologize, I didn't actually read this series yet.
> >> >>
> >> >> But I have to admit that so far I do not like this approach...
> >> >> probably I am biased.
> >> >
> >> > Yes.
> >> >
> >> > And I have another concern... this is mostly a feeling, I can be
> >> > easily wrong but:
> >> >
> >> >> On 11/25, Ian Kent wrote:
> >> >> >
> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> >> >> > +{
> >> >> > +	struct nsproxy *ns = info->data;
> >> >> > +
> >> >> > +	mntns_setfs(ns->mnt_ns);
> >> >>
> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> >> >> Let me remind about the coredump. The dumping task can cloned with
> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> >> >> this enough.
> >> >
> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> >> > could simply fork/reparent a child which will do execve ?
> >> 
> >> That would certainly be a better approach, and roughly equivalent to
> >> what exists here.  That would even ensure we remain in the proper
> >> cgroups, and lsm context.
> >> 
> >> The practical problem with the approach presented here is that I can
> >> hijack any user mode helper I wish, and make it run in any executable I
> >> wish as the global root user.
> >> 
> >> Ian if we were to merge this I believe you would win the award for
> >> easiest path to a root shell.
> >
> > LOL, OK, so there's a problem with this.
> >
> > But, how should a user mode helper execute within a namespace (or more
> > specifically within a container)?
> >
> > Suppose a user mode helper program scans through the pid list and
> > somehow picks the correct process pid and then does an
> > open()/setns()/execve().
> >
> > Does that then satisfy the requirements?
> > What needs to be done to safely do that in kernel?
> >
> > The other approach I've considered is doing a full open()/setns() in
> > kernel (since the caller already knows its pid) but it sounds like
> > that's not right either.
> 
> The approach we agreed upon with the core dump helper was to provide
> enough information that userspace could figure out what was the
> appropriate policy and call nsenter/setns.

Your recommending I have a look at the core dump helper, that's fine,
I'll do that.

> 
> The only sane approach I can think of in the context of nfs is to fork
> a kernel thread at mount time that has all of the appropriate context
> because it was captured from the privileged mounting process, and use
> that kernel as the equivalent of kthreadd.
> 
> There may be some intermediate ground where we capture things or we use
> the init process of the pid namespace (captured at mount time) as our
> template/reference process.
> 
> If we are going to set this stuff up in the kernel we need a reference
> process that we can create children of because what is possible with
> respect to containers keeps changing, and it is extremely error prone to
> figure out what all othe crazy little bits are, and to update everything
> every time someone tweaks the kernel's capabilities.  We have kthreadd
> because it was too error prone to scrub a userspace thread of all of the
> userspace bits and make it the equivalent of what kthreadd is today.
> 
> Of course it is also rather nice to have something to hang everything
> else on.
> 
> In summary we need a reference struct task that is all setup properly
> so that we can create an appropriate kernel thread.

I'm having trouble understanding what your getting at here but I'm not
that sharp so bear with me.

When call_usermodehelper() is called it's called from a process that is
within the context within which the execution is required.

So what information do we not have available for setup?

Are you saying that the problem is that when the user mode helper run
thread is invoked we don't have the information available that was
present when call_usermodehelper() was called and that's where the
challenge lies?

Ian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 23:50             ` Ian Kent
@ 2014-11-26  0:44               ` Ian Kent
  2014-11-26  1:38               ` Eric W. Biederman
  1 sibling, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-26  0:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

On Wed, 2014-11-26 at 07:50 +0800, Ian Kent wrote:
> > 
> > If we are going to set this stuff up in the kernel we need a reference
> > process that we can create children of because what is possible with
> > respect to containers keeps changing, and it is extremely error prone to
> > figure out what all othe crazy little bits are, and to update everything
> > every time someone tweaks the kernel's capabilities.  We have kthreadd
> > because it was too error prone to scrub a userspace thread of all of the
> > userspace bits and make it the equivalent of what kthreadd is today.
> > 
> > Of course it is also rather nice to have something to hang everything
> > else on.
> > 
> > In summary we need a reference struct task that is all setup properly
> > so that we can create an appropriate kernel thread.
> 
> I'm having trouble understanding what your getting at here but I'm not
> that sharp so bear with me.
> 
> When call_usermodehelper() is called it's called from a process that is
> within the context within which the execution is required.

Umm .. OK, that's probably not quite right either ....

For nfsd I think it's OK but for nfs clients the context is probably
that of the caller ....

Whereas the helper to get a key info maybe does need to be called in the
context of the caller .....

> 
> So what information do we not have available for setup?
> 
> Are you saying that the problem is that when the user mode helper run
> thread is invoked we don't have the information available that was
> present when call_usermodehelper() was called and that's where the
> challenge lies?
> 
> Ian
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 23:50             ` Ian Kent
  2014-11-26  0:44               ` Ian Kent
@ 2014-11-26  1:38               ` Eric W. Biederman
  2014-12-01 21:56                 ` Benjamin Coddington
  1 sibling, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2014-11-26  1:38 UTC (permalink / raw)
  To: Ian Kent
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

Ian Kent <ikent@redhat.com> writes:

> On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
>> Ian Kent <ikent@redhat.com> writes:
>> 
>> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
>> >> Oleg Nesterov <oleg@redhat.com> writes:
>> >> 
>> >> > On 11/25, Oleg Nesterov wrote:
>> >> >>
>> >> >> Let me first apologize, I didn't actually read this series yet.
>> >> >>
>> >> >> But I have to admit that so far I do not like this approach...
>> >> >> probably I am biased.
>> >> >
>> >> > Yes.
>> >> >
>> >> > And I have another concern... this is mostly a feeling, I can be
>> >> > easily wrong but:
>> >> >
>> >> >> On 11/25, Ian Kent wrote:
>> >> >> >
>> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
>> >> >> > +{
>> >> >> > +	struct nsproxy *ns = info->data;
>> >> >> > +
>> >> >> > +	mntns_setfs(ns->mnt_ns);
>> >> >>
>> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
>> >> >> Let me remind about the coredump. The dumping task can cloned with
>> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
>> >> >> this enough.
>> >> >
>> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
>> >> > could simply fork/reparent a child which will do execve ?
>> >> 
>> >> That would certainly be a better approach, and roughly equivalent to
>> >> what exists here.  That would even ensure we remain in the proper
>> >> cgroups, and lsm context.
>> >> 
>> >> The practical problem with the approach presented here is that I can
>> >> hijack any user mode helper I wish, and make it run in any executable I
>> >> wish as the global root user.
>> >> 
>> >> Ian if we were to merge this I believe you would win the award for
>> >> easiest path to a root shell.
>> >
>> > LOL, OK, so there's a problem with this.
>> >
>> > But, how should a user mode helper execute within a namespace (or more
>> > specifically within a container)?
>> >
>> > Suppose a user mode helper program scans through the pid list and
>> > somehow picks the correct process pid and then does an
>> > open()/setns()/execve().
>> >
>> > Does that then satisfy the requirements?
>> > What needs to be done to safely do that in kernel?
>> >
>> > The other approach I've considered is doing a full open()/setns() in
>> > kernel (since the caller already knows its pid) but it sounds like
>> > that's not right either.
>> 
>> The approach we agreed upon with the core dump helper was to provide
>> enough information that userspace could figure out what was the
>> appropriate policy and call nsenter/setns.
>
> Your recommending I have a look at the core dump helper, that's fine,
> I'll do that.

I am just describing it because it came up.  Core dumps are a much
easier case than nfs.

Frankly if we can figure out how to run the user mode helpers from the
kernel with an appropriate context and not involve userspace I think
that will be better for everyone, as it involves fewer moving parts at
the end of the day.

>> The only sane approach I can think of in the context of nfs is to fork
>> a kernel thread at mount time that has all of the appropriate context
>> because it was captured from the privileged mounting process, and use
>> that kernel as the equivalent of kthreadd.
>> 
>> There may be some intermediate ground where we capture things or we use
>> the init process of the pid namespace (captured at mount time) as our
>> template/reference process.
>> 
>> If we are going to set this stuff up in the kernel we need a reference
>> process that we can create children of because what is possible with
>> respect to containers keeps changing, and it is extremely error prone to
>> figure out what all othe crazy little bits are, and to update everything
>> every time someone tweaks the kernel's capabilities.  We have kthreadd
>> because it was too error prone to scrub a userspace thread of all of the
>> userspace bits and make it the equivalent of what kthreadd is today.
>> 
>> Of course it is also rather nice to have something to hang everything
>> else on.
>> 
>> In summary we need a reference struct task that is all setup properly
>> so that we can create an appropriate kernel thread.
>
> I'm having trouble understanding what your getting at here but I'm not
> that sharp so bear with me.
>
> When call_usermodehelper() is called it's called from a process that is
> within the context within which the execution is required.

No it is not.  That is precisely why we have call_usermodehelper instead
of just forking and exec'ing something.  The context which triggers the
call can be completely different from where you want to run.

> So what information do we not have available for setup?
>
> Are you saying that the problem is that when the user mode helper run
> thread is invoked we don't have the information available that was
> present when call_usermodehelper() was called and that's where the
> challenge lies?

That is part of it.

However in the context of nfs the correct context is determined at mount
time, and so when call_usermodehelper() is invoked the only link to that
correct context that we have is the nfs super block.

In a pathological case a userspace application can create a new user
namespace, and a new mount namespace and completely rearrange the
mounts, and deliberately trigger an nfs user mode helper.  If we were to
use that applications context it could control which userspace
application the kernel invoked.

So deeply and fundamentally for the case of nfs you need to capture the
context at nfs mount time.  The easiest way would be to fork a kernel
thread when nfs is mounted, that captures the entire context.

We might be able to do something a little more sophisticated and a
little less resource intensive, like pick on the init process that
exists when nfs is mounted.

Those are the general parameters.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:06     ` Oleg Nesterov
  2014-11-25 22:23       ` Eric W. Biederman
  2014-11-25 23:14       ` Ian Kent
@ 2014-11-26 11:46       ` David Howells
  2014-11-26 15:00         ` Eric W. Biederman
  2 siblings, 1 reply; 27+ messages in thread
From: David Howells @ 2014-11-26 11:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: dhowells, Oleg Nesterov, Ian Kent, Kernel Mailing List,
	J. Bruce Fields, Stanislav Kinsbursky, Trond Myklebust,
	Benjamin Coddington, Al Viro

Eric W. Biederman <ebiederm@xmission.com> wrote:

> Ian if we were to merge this I believe you would win the award for
> easiest path to a root shell.

Is there any particular reason the upcalled program has to be run as root?
Could the kernel not run it as something else - perhaps the caller's UID,GID
or even something anonymous?

Also, call_sbin_request_key() could be given a parameter to call something
other than /sbin/request-key, and key_type::request_key could be used.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-26 11:46       ` David Howells
@ 2014-11-26 15:00         ` Eric W. Biederman
  2014-11-26 22:57           ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2014-11-26 15:00 UTC (permalink / raw)
  To: David Howells
  Cc: Oleg Nesterov, Ian Kent, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, Benjamin Coddington,
	Al Viro

David Howells <dhowells@redhat.com> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ian if we were to merge this I believe you would win the award for
>> easiest path to a root shell.
>
> Is there any particular reason the upcalled program has to be run as root?
> Could the kernel not run it as something else - perhaps the caller's UID,GID
> or even something anonymous?
>
> Also, call_sbin_request_key() could be given a parameter to call something
> other than /sbin/request-key, and key_type::request_key could be used.

Fundamentally the upcall needs to happen with enough privileges to do
the job, and that means running in practice running as root in the
appropriate context.  If we didn't need to gain privileges we wouldn't
need an upcall.

In the code I was critisizing struct cred was not being changed because
of what I believe was an ignorance of what task->nsproxy was about and
is for.

It is straight forward to save off a for a kernel thread from the
process calling mount and make it responsible for the upcall and use
that as the parent for all of the containerized upcalls, and we could
easily run with that threads permissions.

We can't use the context of the triggering user but we instead need to
use the context of the mounter of the filesystem.  As otherwise the
triggering user can control what is /sbin/request-key and cause problems
that way.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-26 15:00         ` Eric W. Biederman
@ 2014-11-26 22:57           ` J. Bruce Fields
  0 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2014-11-26 22:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Howells, Oleg Nesterov, Ian Kent, Kernel Mailing List,
	Stanislav Kinsbursky, Trond Myklebust, Benjamin Coddington,
	Al Viro

On Wed, Nov 26, 2014 at 09:00:11AM -0600, Eric W. Biederman wrote:
> David Howells <dhowells@redhat.com> writes:
> 
> > Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> Ian if we were to merge this I believe you would win the award for
> >> easiest path to a root shell.
> >
> > Is there any particular reason the upcalled program has to be run as root?
> > Could the kernel not run it as something else - perhaps the caller's UID,GID
> > or even something anonymous?
> >
> > Also, call_sbin_request_key() could be given a parameter to call something
> > other than /sbin/request-key, and key_type::request_key could be used.
> 
> Fundamentally the upcall needs to happen with enough privileges to do
> the job, and that means running in practice running as root in the
> appropriate context.  If we didn't need to gain privileges we wouldn't
> need an upcall.
> 
> In the code I was critisizing struct cred was not being changed because
> of what I believe was an ignorance of what task->nsproxy was about and
> is for.
> 
> It is straight forward to save off a for a kernel thread from the
> process calling mount and make it responsible for the upcall and use
> that as the parent for all of the containerized upcalls, and we could
> easily run with that threads permissions.
> 
> We can't use the context of the triggering user but we instead need to
> use the context of the mounter of the filesystem.  As otherwise the
> triggering user can control what is /sbin/request-key and cause problems
> that way.

That makes sense to me.  There are three cases I think we care about:

	- client idmapper upcall: translates between on-the-wire
	  "user@domain" names and local uid's/gid's.  The results may be
	  cached and used by other users, and you certainly don't want
	  the user who happened to trigger the upcall by doing a "ls" to
	  poison the cache with weird values.

	- server-side state recovery call: used to record some
	  information about which clients hold active state.  There's no
	  "mount" on the server side, but "root in the appropriate
	  context" still sounds right--we probably want to save the
	  context of whoever started the server.  One extra thread per
	  server would certainly be no big deal, so if that's how we
	  need to save the context, fine.

	- client request-key call: this isn't actually implemented yet
	  and I don't feel like I understand it.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 22:36     ` Ian Kent
  2014-11-25 23:27       ` Eric W. Biederman
@ 2014-11-27  1:30       ` Oleg Nesterov
  1 sibling, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2014-11-27  1:30 UTC (permalink / raw)
  To: Ian Kent
  Cc: Kernel Mailing List, J. Bruce Fields, Stanislav Kinsbursky,
	Trond Myklebust, David Howells, Benjamin Coddington, Al Viro,
	Eric W. Biederman

I didn't have time to follow this thread today, will try tomorrow.
Perhaps this was already answered...

On 11/26, Ian Kent wrote:
>
> On Tue, 2014-11-25 at 22:52 +0100, Oleg Nesterov wrote:
> >
> > and probably we also need this for coredump helpers, we want them
> > to be per-namespace.
>
> To save me some time could you point me to some of the related code
> please. I don't normally play in that area.

See call_usermodehelper_*() in do_coredump(). This has the same problems
(and just in case, of course other problems, starting from the fact that
core_pattern is global). We need the right root to find the binary, etc.

> > > +	switch_task_namespaces(current, ns);
> >
> > This doesn't look sane because this won't switch task_active_pid_ns().
>
> I wondered about that too but I didn't design the open()/setns()

No, I don't think we should use setns() in this case...

> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> > could simply fork/reparent a child which will do execve ?
>
> Are you saying that the user space program should be modified to do
> this?

No, no, I meant that the kernel could do this (yes, not that trivial)
on behalf of the caller's process.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-25 23:27       ` Eric W. Biederman
@ 2014-11-28  0:19         ` Ian Kent
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-11-28  0:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells,
	Benjamin Coddington, Al Viro

On Tue, 2014-11-25 at 17:27 -0600, Eric W. Biederman wrote:
> 
> > How does one correctly set the namespace in user space since each of
> > the /proc/<pid>/ns/<namespace> will use a slightly different
> > proc_ns_operations install function? 
> >
> > Are we saying that, for example, if open(/proc/<pid>/ns/pid)/setns() is
> > used then the process must not do path lookups if it expects them to be
> > within the namespace and restrict itself to pid related system calls
> > only and so on for each of the other namespaces?
> 
> In userspace you can only set the pid namespace for new children.  You
> can never change your own pid namespace.  Because actually changing a
> processes pid is too nasty to contemplate, or implement and because in a
> login daemon context having your first child be the initial process of
> the pid namespace is actually what is desirable.
> 

Maybe using the pid namespace was a bad example but now it seems I don't
understand the purpose of /proc/<pid>/ns/pid with the use of setns()
either.

I wasn't thinking of changing the process pid here at all, as you say
from the kernel POV that must not happen, I was thinking of changing an
execed userspace process view of pids.

Assuming that <pid> is valid within a callers namespace (and I believe
that's checked along the way) doesn't using this allow the created
userspace process to see pids in the view of the given namespace? Isn't
that the intended use of this?

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-11-26  1:38               ` Eric W. Biederman
@ 2014-12-01 21:56                 ` Benjamin Coddington
  2014-12-02 23:33                   ` Ian Kent
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Coddington @ 2014-12-01 21:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ian Kent, Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells, Al Viro

n Tue, 25 Nov 2014, Eric W. Biederman wrote:
Hi,

> Ian Kent <ikent@redhat.com> writes:
>
> > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
> >> Ian Kent <ikent@redhat.com> writes:
> >>
> >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> >> >> Oleg Nesterov <oleg@redhat.com> writes:
> >> >>
> >> >> > On 11/25, Oleg Nesterov wrote:
> >> >> >>
> >> >> >> Let me first apologize, I didn't actually read this series yet.
> >> >> >>
> >> >> >> But I have to admit that so far I do not like this approach...
> >> >> >> probably I am biased.
> >> >> >
> >> >> > Yes.
> >> >> >
> >> >> > And I have another concern... this is mostly a feeling, I can be
> >> >> > easily wrong but:
> >> >> >
> >> >> >> On 11/25, Ian Kent wrote:
> >> >> >> >
> >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> >> >> >> > +{
> >> >> >> > +	struct nsproxy *ns = info->data;
> >> >> >> > +
> >> >> >> > +	mntns_setfs(ns->mnt_ns);
> >> >> >>
> >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> >> >> >> Let me remind about the coredump. The dumping task can cloned with
> >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> >> >> >> this enough.
> >> >> >
> >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> >> >> > could simply fork/reparent a child which will do execve ?
> >> >>
> >> >> That would certainly be a better approach, and roughly equivalent to
> >> >> what exists here.  That would even ensure we remain in the proper
> >> >> cgroups, and lsm context.
> >> >>
> >> >> The practical problem with the approach presented here is that I can
> >> >> hijack any user mode helper I wish, and make it run in any executable I
> >> >> wish as the global root user.
> >> >>
> >> >> Ian if we were to merge this I believe you would win the award for
> >> >> easiest path to a root shell.
> >> >
> >> > LOL, OK, so there's a problem with this.
> >> >
> >> > But, how should a user mode helper execute within a namespace (or more
> >> > specifically within a container)?
> >> >
> >> > Suppose a user mode helper program scans through the pid list and
> >> > somehow picks the correct process pid and then does an
> >> > open()/setns()/execve().
> >> >
> >> > Does that then satisfy the requirements?
> >> > What needs to be done to safely do that in kernel?
> >> >
> >> > The other approach I've considered is doing a full open()/setns() in
> >> > kernel (since the caller already knows its pid) but it sounds like
> >> > that's not right either.
> >>
> >> The approach we agreed upon with the core dump helper was to provide
> >> enough information that userspace could figure out what was the
> >> appropriate policy and call nsenter/setns.
> >
> > Your recommending I have a look at the core dump helper, that's fine,
> > I'll do that.
>
> I am just describing it because it came up.  Core dumps are a much
> easier case than nfs.
>
> Frankly if we can figure out how to run the user mode helpers from the
> kernel with an appropriate context and not involve userspace I think
> that will be better for everyone, as it involves fewer moving parts at
> the end of the day.
>
> >> The only sane approach I can think of in the context of nfs is to fork
> >> a kernel thread at mount time that has all of the appropriate context
> >> because it was captured from the privileged mounting process, and use
> >> that kernel as the equivalent of kthreadd.
> >>
> >> There may be some intermediate ground where we capture things or we use
> >> the init process of the pid namespace (captured at mount time) as our
> >> template/reference process.
> >>
> >> If we are going to set this stuff up in the kernel we need a reference
> >> process that we can create children of because what is possible with
> >> respect to containers keeps changing, and it is extremely error prone to
> >> figure out what all othe crazy little bits are, and to update everything
> >> every time someone tweaks the kernel's capabilities.  We have kthreadd
> >> because it was too error prone to scrub a userspace thread of all of the
> >> userspace bits and make it the equivalent of what kthreadd is today.
> >>
> >> Of course it is also rather nice to have something to hang everything
> >> else on.
> >>
> >> In summary we need a reference struct task that is all setup properly
> >> so that we can create an appropriate kernel thread.
> >
> > I'm having trouble understanding what your getting at here but I'm not
> > that sharp so bear with me.
> >
> > When call_usermodehelper() is called it's called from a process that is
> > within the context within which the execution is required.
>
> No it is not.  That is precisely why we have call_usermodehelper instead
> of just forking and exec'ing something.  The context which triggers the
> call can be completely different from where you want to run.
>
> > So what information do we not have available for setup?
> >
> > Are you saying that the problem is that when the user mode helper run
> > thread is invoked we don't have the information available that was
> > present when call_usermodehelper() was called and that's where the
> > challenge lies?
>
> That is part of it.
>
> However in the context of nfs the correct context is determined at mount
> time, and so when call_usermodehelper() is invoked the only link to that
> correct context that we have is the nfs super block.
>
> In a pathological case a userspace application can create a new user
> namespace, and a new mount namespace and completely rearrange the
> mounts, and deliberately trigger an nfs user mode helper.  If we were to
> use that applications context it could control which userspace
> application the kernel invoked.

Then it seems it would never be safe to try to execve /sbin/request-key
within the calling process' mnt_ns, so if we want to do some work
within that namespace the transition to it needs to happen after the
upcall, after /sbin/request-key calls the appropriate helper.

And instead of transitioning into this context before execve, instead
pass it along to allow the userspace helper to do setns()..


> So deeply and fundamentally for the case of nfs you need to capture the
> context at nfs mount time.  The easiest way would be to fork a kernel
> thread when nfs is mounted, that captures the entire context.
>
> We might be able to do something a little more sophisticated and a
> little less resource intensive, like pick on the init process that
> exists when nfs is mounted.

The only namespace that needs to be preserved from mount time would be
the mount namespace which can be pulled from the mount itself and passed
along in the key request.

Or if we tracked the mount namespace when creating a user
namespace, we could climb out of the mount namespaces created within
user namespaces by following them out.  Does that sound sane, or
completely nuts?

> Those are the general parameters.

It does seem very expensive to keep a thread around for every mount; I'm
still trying to find a way around it..

Ben

>
> Eric
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-12-01 21:56                 ` Benjamin Coddington
@ 2014-12-02 23:33                   ` Ian Kent
  2014-12-03 16:49                     ` Eric W. Biederman
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Kent @ 2014-12-02 23:33 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Eric W. Biederman, Oleg Nesterov, Kernel Mailing List,
	J. Bruce Fields, Stanislav Kinsbursky, Trond Myklebust,
	David Howells, Al Viro

On Mon, 2014-12-01 at 16:56 -0500, Benjamin Coddington wrote:
> n Tue, 25 Nov 2014, Eric W. Biederman wrote:
> Hi,
> 
> > Ian Kent <ikent@redhat.com> writes:
> >
> > > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
> > >> Ian Kent <ikent@redhat.com> writes:
> > >>
> > >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> > >> >> Oleg Nesterov <oleg@redhat.com> writes:
> > >> >>
> > >> >> > On 11/25, Oleg Nesterov wrote:
> > >> >> >>
> > >> >> >> Let me first apologize, I didn't actually read this series yet.
> > >> >> >>
> > >> >> >> But I have to admit that so far I do not like this approach...
> > >> >> >> probably I am biased.
> > >> >> >
> > >> >> > Yes.
> > >> >> >
> > >> >> > And I have another concern... this is mostly a feeling, I can be
> > >> >> > easily wrong but:
> > >> >> >
> > >> >> >> On 11/25, Ian Kent wrote:
> > >> >> >> >
> > >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> > >> >> >> > +{
> > >> >> >> > +	struct nsproxy *ns = info->data;
> > >> >> >> > +
> > >> >> >> > +	mntns_setfs(ns->mnt_ns);
> > >> >> >>
> > >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> > >> >> >> Let me remind about the coredump. The dumping task can cloned with
> > >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> > >> >> >> this enough.
> > >> >> >
> > >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> > >> >> > could simply fork/reparent a child which will do execve ?
> > >> >>
> > >> >> That would certainly be a better approach, and roughly equivalent to
> > >> >> what exists here.  That would even ensure we remain in the proper
> > >> >> cgroups, and lsm context.
> > >> >>
> > >> >> The practical problem with the approach presented here is that I can
> > >> >> hijack any user mode helper I wish, and make it run in any executable I
> > >> >> wish as the global root user.
> > >> >>
> > >> >> Ian if we were to merge this I believe you would win the award for
> > >> >> easiest path to a root shell.
> > >> >
> > >> > LOL, OK, so there's a problem with this.
> > >> >
> > >> > But, how should a user mode helper execute within a namespace (or more
> > >> > specifically within a container)?
> > >> >
> > >> > Suppose a user mode helper program scans through the pid list and
> > >> > somehow picks the correct process pid and then does an
> > >> > open()/setns()/execve().
> > >> >
> > >> > Does that then satisfy the requirements?
> > >> > What needs to be done to safely do that in kernel?
> > >> >
> > >> > The other approach I've considered is doing a full open()/setns() in
> > >> > kernel (since the caller already knows its pid) but it sounds like
> > >> > that's not right either.
> > >>
> > >> The approach we agreed upon with the core dump helper was to provide
> > >> enough information that userspace could figure out what was the
> > >> appropriate policy and call nsenter/setns.
> > >
> > > Your recommending I have a look at the core dump helper, that's fine,
> > > I'll do that.
> >
> > I am just describing it because it came up.  Core dumps are a much
> > easier case than nfs.
> >
> > Frankly if we can figure out how to run the user mode helpers from the
> > kernel with an appropriate context and not involve userspace I think
> > that will be better for everyone, as it involves fewer moving parts at
> > the end of the day.
> >
> > >> The only sane approach I can think of in the context of nfs is to fork
> > >> a kernel thread at mount time that has all of the appropriate context
> > >> because it was captured from the privileged mounting process, and use
> > >> that kernel as the equivalent of kthreadd.
> > >>
> > >> There may be some intermediate ground where we capture things or we use
> > >> the init process of the pid namespace (captured at mount time) as our
> > >> template/reference process.
> > >>
> > >> If we are going to set this stuff up in the kernel we need a reference
> > >> process that we can create children of because what is possible with
> > >> respect to containers keeps changing, and it is extremely error prone to
> > >> figure out what all othe crazy little bits are, and to update everything
> > >> every time someone tweaks the kernel's capabilities.  We have kthreadd
> > >> because it was too error prone to scrub a userspace thread of all of the
> > >> userspace bits and make it the equivalent of what kthreadd is today.
> > >>
> > >> Of course it is also rather nice to have something to hang everything
> > >> else on.
> > >>
> > >> In summary we need a reference struct task that is all setup properly
> > >> so that we can create an appropriate kernel thread.
> > >
> > > I'm having trouble understanding what your getting at here but I'm not
> > > that sharp so bear with me.
> > >
> > > When call_usermodehelper() is called it's called from a process that is
> > > within the context within which the execution is required.
> >
> > No it is not.  That is precisely why we have call_usermodehelper instead
> > of just forking and exec'ing something.  The context which triggers the
> > call can be completely different from where you want to run.
> >
> > > So what information do we not have available for setup?
> > >
> > > Are you saying that the problem is that when the user mode helper run
> > > thread is invoked we don't have the information available that was
> > > present when call_usermodehelper() was called and that's where the
> > > challenge lies?
> >
> > That is part of it.
> >
> > However in the context of nfs the correct context is determined at mount
> > time, and so when call_usermodehelper() is invoked the only link to that
> > correct context that we have is the nfs super block.
> >
> > In a pathological case a userspace application can create a new user
> > namespace, and a new mount namespace and completely rearrange the
> > mounts, and deliberately trigger an nfs user mode helper.  If we were to
> > use that applications context it could control which userspace
> > application the kernel invoked.
> 
> Then it seems it would never be safe to try to execve /sbin/request-key
> within the calling process' mnt_ns, so if we want to do some work
> within that namespace the transition to it needs to happen after the
> upcall, after /sbin/request-key calls the appropriate helper.
> 
> And instead of transitioning into this context before execve, instead
> pass it along to allow the userspace helper to do setns()..

I still don't see the difference between, given an appropriate pid,
using the open()/setns() mechanism in kernel similar to nsenter(1). 

In that case the setup is done before the first exec rather than between
the first and second exec (the one in nsenter). Our other approach
essentially did that but (incorrectly) used the current process pid and
didn't deal with all the namespace types as nsenter does.

Isn't the real problem getting hold of an appropriate pid to use for the
open/setns that belongs to a process that's still running?
What else is needed, Eric?

Not having looked at how core dumping works before, I'm having
difficulty working out what gets executed in user space so I can't be
sure that what I'm saying is accurate, can someone spell this out for me
please?

> 
> 
> > So deeply and fundamentally for the case of nfs you need to capture the
> > context at nfs mount time.  The easiest way would be to fork a kernel
> > thread when nfs is mounted, that captures the entire context.
> >
> > We might be able to do something a little more sophisticated and a
> > little less resource intensive, like pick on the init process that
> > exists when nfs is mounted.
> 
> The only namespace that needs to be preserved from mount time would be
> the mount namespace which can be pulled from the mount itself and passed
> along in the key request.

Sounds simple but isn't it the case we don't know and shouldn't have to
care about what system calls the helper uses since the environment
should "see" what it's supposed to?

> 
> Or if we tracked the mount namespace when creating a user
> namespace, we could climb out of the mount namespaces created within
> user namespaces by following them out.  Does that sound sane, or
> completely nuts?
> 
> > Those are the general parameters.
> 
> It does seem very expensive to keep a thread around for every mount; I'm
> still trying to find a way around it..

Yeah, that's not such a good idea.

Several hundred mounts is going to create a big mess for system
administrators, not to mention the overhead. It's right up there with
symlinking /etc/mtab to /proc/self/mounts at sites with large direct
mount maps.

Ian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-12-02 23:33                   ` Ian Kent
@ 2014-12-03 16:49                     ` Eric W. Biederman
  2014-12-03 18:14                       ` Benjamin Coddington
                                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eric W. Biederman @ 2014-12-03 16:49 UTC (permalink / raw)
  To: Ian Kent
  Cc: Benjamin Coddington, Oleg Nesterov, Kernel Mailing List,
	J. Bruce Fields, Stanislav Kinsbursky, Trond Myklebust,
	David Howells, Al Viro

Ian Kent <ikent@redhat.com> writes:

> On Mon, 2014-12-01 at 16:56 -0500, Benjamin Coddington wrote:
>> n Tue, 25 Nov 2014, Eric W. Biederman wrote:
>> Hi,
>> 
>> > Ian Kent <ikent@redhat.com> writes:
>> >
>> > > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
>> > >> Ian Kent <ikent@redhat.com> writes:
>> > >>
>> > >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
>> > >> >> Oleg Nesterov <oleg@redhat.com> writes:
>> > >> >>
>> > >> >> > On 11/25, Oleg Nesterov wrote:
>> > >> >> >>
>> > >> >> >> Let me first apologize, I didn't actually read this series yet.
>> > >> >> >>
>> > >> >> >> But I have to admit that so far I do not like this approach...
>> > >> >> >> probably I am biased.
>> > >> >> >
>> > >> >> > Yes.
>> > >> >> >
>> > >> >> > And I have another concern... this is mostly a feeling, I can be
>> > >> >> > easily wrong but:
>> > >> >> >
>> > >> >> >> On 11/25, Ian Kent wrote:
>> > >> >> >> >
>> > >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
>> > >> >> >> > +{
>> > >> >> >> > +	struct nsproxy *ns = info->data;
>> > >> >> >> > +
>> > >> >> >> > +	mntns_setfs(ns->mnt_ns);
>> > >> >> >>
>> > >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
>> > >> >> >> Let me remind about the coredump. The dumping task can cloned with
>> > >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
>> > >> >> >> this enough.
>> > >> >> >
>> > >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
>> > >> >> > could simply fork/reparent a child which will do execve ?
>> > >> >>
>> > >> >> That would certainly be a better approach, and roughly equivalent to
>> > >> >> what exists here.  That would even ensure we remain in the proper
>> > >> >> cgroups, and lsm context.
>> > >> >>
>> > >> >> The practical problem with the approach presented here is that I can
>> > >> >> hijack any user mode helper I wish, and make it run in any executable I
>> > >> >> wish as the global root user.
>> > >> >>
>> > >> >> Ian if we were to merge this I believe you would win the award for
>> > >> >> easiest path to a root shell.
>> > >> >
>> > >> > LOL, OK, so there's a problem with this.
>> > >> >
>> > >> > But, how should a user mode helper execute within a namespace (or more
>> > >> > specifically within a container)?
>> > >> >
>> > >> > Suppose a user mode helper program scans through the pid list and
>> > >> > somehow picks the correct process pid and then does an
>> > >> > open()/setns()/execve().
>> > >> >
>> > >> > Does that then satisfy the requirements?
>> > >> > What needs to be done to safely do that in kernel?
>> > >> >
>> > >> > The other approach I've considered is doing a full open()/setns() in
>> > >> > kernel (since the caller already knows its pid) but it sounds like
>> > >> > that's not right either.
>> > >>
>> > >> The approach we agreed upon with the core dump helper was to provide
>> > >> enough information that userspace could figure out what was the
>> > >> appropriate policy and call nsenter/setns.
>> > >
>> > > Your recommending I have a look at the core dump helper, that's fine,
>> > > I'll do that.
>> >
>> > I am just describing it because it came up.  Core dumps are a much
>> > easier case than nfs.
>> >
>> > Frankly if we can figure out how to run the user mode helpers from the
>> > kernel with an appropriate context and not involve userspace I think
>> > that will be better for everyone, as it involves fewer moving parts at
>> > the end of the day.
>> >
>> > >> The only sane approach I can think of in the context of nfs is to fork
>> > >> a kernel thread at mount time that has all of the appropriate context
>> > >> because it was captured from the privileged mounting process, and use
>> > >> that kernel as the equivalent of kthreadd.
>> > >>
>> > >> There may be some intermediate ground where we capture things or we use
>> > >> the init process of the pid namespace (captured at mount time) as our
>> > >> template/reference process.
>> > >>
>> > >> If we are going to set this stuff up in the kernel we need a reference
>> > >> process that we can create children of because what is possible with
>> > >> respect to containers keeps changing, and it is extremely error prone to
>> > >> figure out what all othe crazy little bits are, and to update everything
>> > >> every time someone tweaks the kernel's capabilities.  We have kthreadd
>> > >> because it was too error prone to scrub a userspace thread of all of the
>> > >> userspace bits and make it the equivalent of what kthreadd is today.
>> > >>
>> > >> Of course it is also rather nice to have something to hang everything
>> > >> else on.
>> > >>
>> > >> In summary we need a reference struct task that is all setup properly
>> > >> so that we can create an appropriate kernel thread.
>> > >
>> > > I'm having trouble understanding what your getting at here but I'm not
>> > > that sharp so bear with me.
>> > >
>> > > When call_usermodehelper() is called it's called from a process that is
>> > > within the context within which the execution is required.
>> >
>> > No it is not.  That is precisely why we have call_usermodehelper instead
>> > of just forking and exec'ing something.  The context which triggers the
>> > call can be completely different from where you want to run.
>> >
>> > > So what information do we not have available for setup?
>> > >
>> > > Are you saying that the problem is that when the user mode helper run
>> > > thread is invoked we don't have the information available that was
>> > > present when call_usermodehelper() was called and that's where the
>> > > challenge lies?
>> >
>> > That is part of it.
>> >
>> > However in the context of nfs the correct context is determined at mount
>> > time, and so when call_usermodehelper() is invoked the only link to that
>> > correct context that we have is the nfs super block.
>> >
>> > In a pathological case a userspace application can create a new user
>> > namespace, and a new mount namespace and completely rearrange the
>> > mounts, and deliberately trigger an nfs user mode helper.  If we were to
>> > use that applications context it could control which userspace
>> > application the kernel invoked.
>> 
>> Then it seems it would never be safe to try to execve /sbin/request-key
>> within the calling process' mnt_ns, so if we want to do some work
>> within that namespace the transition to it needs to happen after the
>> upcall, after /sbin/request-key calls the appropriate helper.

No.  We never want to do work in the calling process's mnt_ns.
We want to work in the mounters mnt_ns which given mount propagation
can be something completely and totally different.

>> And instead of transitioning into this context before execve, instead
>> pass it along to allow the userspace helper to do setns()..
>
> I still don't see the difference between, given an appropriate pid,
> using the open()/setns() mechanism in kernel similar to nsenter(1). 
>
> In that case the setup is done before the first exec rather than between
> the first and second exec (the one in nsenter). Our other approach
> essentially did that but (incorrectly) used the current process pid and
> didn't deal with all the namespace types as nsenter does.
>
> Isn't the real problem getting hold of an appropriate pid to use for the
> open/setns that belongs to a process that's still running?
> What else is needed, Eric?

Essentially that is the real problem.  Getting ahold of the context of
the container in which a filesystem was mounted.

Now it isn't just namespaces but also cgroups and uids and gids and
capabilities and process groups and sessions.  There are a whole host of
little things that need to be right to run a user mode helper, and
in particular to run a user mode helper in a ``container''.

> Not having looked at how core dumping works before, I'm having
> difficulty working out what gets executed in user space so I can't be
> sure that what I'm saying is accurate, can someone spell this out for me
> please?

In that case what happens is that the normal user mode helper process
runs started by kthreadd and if %P is specified that processes knows the
pid in the initial pid namespace.  Which is good enough to use nsenter
and otherwise figure out the appropriate container from the process that
is dumping core.

It is easier with core dumping as we don't have the disconnect in
contexts that we do with filesystems.

>> > So deeply and fundamentally for the case of nfs you need to capture the
>> > context at nfs mount time.  The easiest way would be to fork a kernel
>> > thread when nfs is mounted, that captures the entire context.
>> >
>> > We might be able to do something a little more sophisticated and a
>> > little less resource intensive, like pick on the init process that
>> > exists when nfs is mounted.
>> 
>> The only namespace that needs to be preserved from mount time would be
>> the mount namespace which can be pulled from the mount itself and passed
>> along in the key request.

We absolutely can not pull the mount from the mount namespace.  Mount
propagtion makes that impossible.

> Sounds simple but isn't it the case we don't know and shouldn't have to
> care about what system calls the helper uses since the environment
> should "see" what it's supposed to?

The tricky bit is setting the environment correctly.

>> Or if we tracked the mount namespace when creating a user
>> namespace, we could climb out of the mount namespaces created within
>> user namespaces by following them out.  Does that sound sane, or
>> completely nuts?

For what we have today (where the global root user has to mount the nfs
filesystem) tracking the mount namespace at the time nfs is mounted is
close to enough.  However if we are going to work this out we need to
solve this for an nfs where we are brave enough to set
 ".fs_flags = FS_USERNS_MOUNT". 

The NFS code base is nearly at the point where we can reasonably talk
about supporting unprivileged mounts.

>> > Those are the general parameters.
>> 
>> It does seem very expensive to keep a thread around for every mount; I'm
>> still trying to find a way around it..
>
> Yeah, that's not such a good idea.
>
> Several hundred mounts is going to create a big mess for system
> administrators, not to mention the overhead. It's right up there with
> symlinking /etc/mtab to /proc/self/mounts at sites with large direct
> mount maps.

A thread will cost you maybe 40k.  In the grand scheme of things that
really isn't a lot.  I agree that it would be nice to do better than
one thread per mount.   But I start with a thread as a reference point
as that is the trivial way to get things correct.

To relax it from a thread per mount we need definitions that we can
explain to folks in userspace about when and where user mode helpers
will run.

I think something like Oleg's proposal to use the init process of a pid
namespace as our reference is worth exploring.  We certainly don't match
the mounters exact environment when running user mode helpers today, so
there is leeway for some relaxation without breaking userspace.

There are funny issues without forking a thread to run the user mode
helpers, such as what happens if all of the processes in the container
exit what should happen?  Remember that mount propagation allows the
filesystem to propagate into different mount namespaces and bind mounts
allow the filesystem to show up in different places in the those mount
namespaces.  

If the container has exited should the filesystem remain functional?
Should the kernel keep pieces of the container alive to run user mode
helpers?

Eric


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-12-03 16:49                     ` Eric W. Biederman
@ 2014-12-03 18:14                       ` Benjamin Coddington
  2014-12-03 22:53                       ` Ian Kent
  2014-12-03 23:34                       ` Ian Kent
  2 siblings, 0 replies; 27+ messages in thread
From: Benjamin Coddington @ 2014-12-03 18:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ian Kent, Oleg Nesterov, Kernel Mailing List, J. Bruce Fields,
	Stanislav Kinsbursky, Trond Myklebust, David Howells, Al Viro



On Wed, 3 Dec 2014, Eric W. Biederman wrote:

> Ian Kent <ikent@redhat.com> writes:
>
> > On Mon, 2014-12-01 at 16:56 -0500, Benjamin Coddington wrote:
> >> n Tue, 25 Nov 2014, Eric W. Biederman wrote:
> >> Hi,
> >>
> >> > Ian Kent <ikent@redhat.com> writes:
> >> >
> >> > > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
> >> > >> Ian Kent <ikent@redhat.com> writes:
> >> > >>
> >> > >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> >> > >> >> Oleg Nesterov <oleg@redhat.com> writes:
> >> > >> >>
> >> > >> >> > On 11/25, Oleg Nesterov wrote:
> >> > >> >> >>
> >> > >> >> >> Let me first apologize, I didn't actually read this series yet.
> >> > >> >> >>
> >> > >> >> >> But I have to admit that so far I do not like this approach...
> >> > >> >> >> probably I am biased.
> >> > >> >> >
> >> > >> >> > Yes.
> >> > >> >> >
> >> > >> >> > And I have another concern... this is mostly a feeling, I can be
> >> > >> >> > easily wrong but:
> >> > >> >> >
> >> > >> >> >> On 11/25, Ian Kent wrote:
> >> > >> >> >> >
> >> > >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> >> > >> >> >> > +{
> >> > >> >> >> > +	struct nsproxy *ns = info->data;
> >> > >> >> >> > +
> >> > >> >> >> > +	mntns_setfs(ns->mnt_ns);
> >> > >> >> >>
> >> > >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> >> > >> >> >> Let me remind about the coredump. The dumping task can cloned with
> >> > >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> >> > >> >> >> this enough.
> >> > >> >> >
> >> > >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> >> > >> >> > could simply fork/reparent a child which will do execve ?
> >> > >> >>
> >> > >> >> That would certainly be a better approach, and roughly equivalent to
> >> > >> >> what exists here.  That would even ensure we remain in the proper
> >> > >> >> cgroups, and lsm context.
> >> > >> >>
> >> > >> >> The practical problem with the approach presented here is that I can
> >> > >> >> hijack any user mode helper I wish, and make it run in any executable I
> >> > >> >> wish as the global root user.
> >> > >> >>
> >> > >> >> Ian if we were to merge this I believe you would win the award for
> >> > >> >> easiest path to a root shell.
> >> > >> >
> >> > >> > LOL, OK, so there's a problem with this.
> >> > >> >
> >> > >> > But, how should a user mode helper execute within a namespace (or more
> >> > >> > specifically within a container)?
> >> > >> >
> >> > >> > Suppose a user mode helper program scans through the pid list and
> >> > >> > somehow picks the correct process pid and then does an
> >> > >> > open()/setns()/execve().
> >> > >> >
> >> > >> > Does that then satisfy the requirements?
> >> > >> > What needs to be done to safely do that in kernel?
> >> > >> >
> >> > >> > The other approach I've considered is doing a full open()/setns() in
> >> > >> > kernel (since the caller already knows its pid) but it sounds like
> >> > >> > that's not right either.
> >> > >>
> >> > >> The approach we agreed upon with the core dump helper was to provide
> >> > >> enough information that userspace could figure out what was the
> >> > >> appropriate policy and call nsenter/setns.
> >> > >
> >> > > Your recommending I have a look at the core dump helper, that's fine,
> >> > > I'll do that.
> >> >
> >> > I am just describing it because it came up.  Core dumps are a much
> >> > easier case than nfs.
> >> >
> >> > Frankly if we can figure out how to run the user mode helpers from the
> >> > kernel with an appropriate context and not involve userspace I think
> >> > that will be better for everyone, as it involves fewer moving parts at
> >> > the end of the day.
> >> >
> >> > >> The only sane approach I can think of in the context of nfs is to fork
> >> > >> a kernel thread at mount time that has all of the appropriate context
> >> > >> because it was captured from the privileged mounting process, and use
> >> > >> that kernel as the equivalent of kthreadd.
> >> > >>
> >> > >> There may be some intermediate ground where we capture things or we use
> >> > >> the init process of the pid namespace (captured at mount time) as our
> >> > >> template/reference process.
> >> > >>
> >> > >> If we are going to set this stuff up in the kernel we need a reference
> >> > >> process that we can create children of because what is possible with
> >> > >> respect to containers keeps changing, and it is extremely error prone to
> >> > >> figure out what all othe crazy little bits are, and to update everything
> >> > >> every time someone tweaks the kernel's capabilities.  We have kthreadd
> >> > >> because it was too error prone to scrub a userspace thread of all of the
> >> > >> userspace bits and make it the equivalent of what kthreadd is today.
> >> > >>
> >> > >> Of course it is also rather nice to have something to hang everything
> >> > >> else on.
> >> > >>
> >> > >> In summary we need a reference struct task that is all setup properly
> >> > >> so that we can create an appropriate kernel thread.
> >> > >
> >> > > I'm having trouble understanding what your getting at here but I'm not
> >> > > that sharp so bear with me.
> >> > >
> >> > > When call_usermodehelper() is called it's called from a process that is
> >> > > within the context within which the execution is required.
> >> >
> >> > No it is not.  That is precisely why we have call_usermodehelper instead
> >> > of just forking and exec'ing something.  The context which triggers the
> >> > call can be completely different from where you want to run.
> >> >
> >> > > So what information do we not have available for setup?
> >> > >
> >> > > Are you saying that the problem is that when the user mode helper run
> >> > > thread is invoked we don't have the information available that was
> >> > > present when call_usermodehelper() was called and that's where the
> >> > > challenge lies?
> >> >
> >> > That is part of it.
> >> >
> >> > However in the context of nfs the correct context is determined at mount
> >> > time, and so when call_usermodehelper() is invoked the only link to that
> >> > correct context that we have is the nfs super block.
> >> >
> >> > In a pathological case a userspace application can create a new user
> >> > namespace, and a new mount namespace and completely rearrange the
> >> > mounts, and deliberately trigger an nfs user mode helper.  If we were to
> >> > use that applications context it could control which userspace
> >> > application the kernel invoked.
> >>
> >> Then it seems it would never be safe to try to execve /sbin/request-key
> >> within the calling process' mnt_ns, so if we want to do some work
> >> within that namespace the transition to it needs to happen after the
> >> upcall, after /sbin/request-key calls the appropriate helper.
>
> No.  We never want to do work in the calling process's mnt_ns.
> We want to work in the mounters mnt_ns which given mount propagation
> can be something completely and totally different.
>
> >> And instead of transitioning into this context before execve, instead
> >> pass it along to allow the userspace helper to do setns()..
> >
> > I still don't see the difference between, given an appropriate pid,
> > using the open()/setns() mechanism in kernel similar to nsenter(1).
> >
> > In that case the setup is done before the first exec rather than between
> > the first and second exec (the one in nsenter). Our other approach
> > essentially did that but (incorrectly) used the current process pid and
> > didn't deal with all the namespace types as nsenter does.
> >
> > Isn't the real problem getting hold of an appropriate pid to use for the
> > open/setns that belongs to a process that's still running?
> > What else is needed, Eric?
>
> Essentially that is the real problem.  Getting ahold of the context of
> the container in which a filesystem was mounted.
>
> Now it isn't just namespaces but also cgroups and uids and gids and
> capabilities and process groups and sessions.  There are a whole host of
> little things that need to be right to run a user mode helper, and
> in particular to run a user mode helper in a ``container''.
>
> > Not having looked at how core dumping works before, I'm having
> > difficulty working out what gets executed in user space so I can't be
> > sure that what I'm saying is accurate, can someone spell this out for me
> > please?
>
> In that case what happens is that the normal user mode helper process
> runs started by kthreadd and if %P is specified that processes knows the
> pid in the initial pid namespace.  Which is good enough to use nsenter
> and otherwise figure out the appropriate container from the process that
> is dumping core.
>
> It is easier with core dumping as we don't have the disconnect in
> contexts that we do with filesystems.
>
> >> > So deeply and fundamentally for the case of nfs you need to capture the
> >> > context at nfs mount time.  The easiest way would be to fork a kernel
> >> > thread when nfs is mounted, that captures the entire context.
> >> >
> >> > We might be able to do something a little more sophisticated and a
> >> > little less resource intensive, like pick on the init process that
> >> > exists when nfs is mounted.
> >>
> >> The only namespace that needs to be preserved from mount time would be
> >> the mount namespace which can be pulled from the mount itself and passed
> >> along in the key request.
>
> We absolutely can not pull the mount from the mount namespace.  Mount
> propagtion makes that impossible.
>
> > Sounds simple but isn't it the case we don't know and shouldn't have to
> > care about what system calls the helper uses since the environment
> > should "see" what it's supposed to?
>
> The tricky bit is setting the environment correctly.
>
> >> Or if we tracked the mount namespace when creating a user
> >> namespace, we could climb out of the mount namespaces created within
> >> user namespaces by following them out.  Does that sound sane, or
> >> completely nuts?
>
> For what we have today (where the global root user has to mount the nfs
> filesystem) tracking the mount namespace at the time nfs is mounted is
> close to enough.  However if we are going to work this out we need to
> solve this for an nfs where we are brave enough to set
>  ".fs_flags = FS_USERNS_MOUNT".
>
> The NFS code base is nearly at the point where we can reasonably talk
> about supporting unprivileged mounts.
>
> >> > Those are the general parameters.
> >>
> >> It does seem very expensive to keep a thread around for every mount; I'm
> >> still trying to find a way around it..
> >
> > Yeah, that's not such a good idea.
> >
> > Several hundred mounts is going to create a big mess for system
> > administrators, not to mention the overhead. It's right up there with
> > symlinking /etc/mtab to /proc/self/mounts at sites with large direct
> > mount maps.
>
> A thread will cost you maybe 40k.  In the grand scheme of things that
> really isn't a lot.  I agree that it would be nice to do better than
> one thread per mount.   But I start with a thread as a reference point
> as that is the trivial way to get things correct.
>
> To relax it from a thread per mount we need definitions that we can
> explain to folks in userspace about when and where user mode helpers
> will run.
>
> I think something like Oleg's proposal to use the init process of a pid
> namespace as our reference is worth exploring.  We certainly don't match
> the mounters exact environment when running user mode helpers today, so
> there is leeway for some relaxation without breaking userspace.

That might look like a "khelper" workqueue for each pid namespace, and the
user of call_usermodehelper() would specify that the current pid namespace
helper should or should not be used.

Oleg also mentioned only using (or creating) the namespace helper if
requested via prctl().  I can imagine that some users of namespaces would
want to keep calls to userspace helpers within the global root space.

If the helper would only be created by request, that takes away the overhead
of thread-per-mount or thread-per-pid-namespace a bit.

As you mentioned earlier, it is a good idea to think about providing
userspace with a clear understanding of how and where user mode helpers run.
This approach allows userspace to think about them as being tied to the
first process in each container.

Ben

> There are funny issues without forking a thread to run the user mode
> helpers, such as what happens if all of the processes in the container
> exit what should happen?  Remember that mount propagation allows the
> filesystem to propagate into different mount namespaces and bind mounts
> allow the filesystem to show up in different places in the those mount
> namespaces.
>
> If the container has exited should the filesystem remain functional?
> Should the kernel keep pieces of the container alive to run user mode
> helpers?
>
> Eric
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-12-03 16:49                     ` Eric W. Biederman
  2014-12-03 18:14                       ` Benjamin Coddington
@ 2014-12-03 22:53                       ` Ian Kent
  2014-12-03 23:34                       ` Ian Kent
  2 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-12-03 22:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Coddington, Oleg Nesterov, Kernel Mailing List,
	J. Bruce Fields, Stanislav Kinsbursky, Trond Myklebust,
	David Howells, Al Viro

On Wed, 2014-12-03 at 10:49 -0600, Eric W. Biederman wrote:
> 
> >> > Those are the general parameters.
> >> 
> >> It does seem very expensive to keep a thread around for every mount; I'm
> >> still trying to find a way around it..
> >
> > Yeah, that's not such a good idea.
> >
> > Several hundred mounts is going to create a big mess for system
> > administrators, not to mention the overhead. It's right up there with
> > symlinking /etc/mtab to /proc/self/mounts at sites with large direct
> > mount maps.
> 
> A thread will cost you maybe 40k.  In the grand scheme of things that
> really isn't a lot.  I agree that it would be nice to do better than
> one thread per mount.   But I start with a thread as a reference point
> as that is the trivial way to get things correct.

Sure it has merit because it's straight forward but my criticism isn't
about overhead. It's about system administrators annoyance and
frustration level when they're trying get things done.

For example, with the symlinking of the mount table and even just a few
hundred direct automounts, listing the mount table fills the screen with
tombs of stuff that really should be hidden (and only listed if
requested). That just gets in the way when you're trying desperately to
resolve some urgent problem.

There is a resource issue as well since so many site administration
applications will read the table and, as the number of entries gets
larger, so does the time these things take to run. Not having to deal
with this on a day to day basis tends to make us forget about these
little side issues but I think they are important. 

Point is, for process listings the issue is almost exactly the same.

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper
  2014-12-03 16:49                     ` Eric W. Biederman
  2014-12-03 18:14                       ` Benjamin Coddington
  2014-12-03 22:53                       ` Ian Kent
@ 2014-12-03 23:34                       ` Ian Kent
  2 siblings, 0 replies; 27+ messages in thread
From: Ian Kent @ 2014-12-03 23:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Coddington, Oleg Nesterov, Kernel Mailing List,
	J. Bruce Fields, Stanislav Kinsbursky, Trond Myklebust,
	David Howells, Al Viro

On Wed, 2014-12-03 at 10:49 -0600, Eric W. Biederman wrote:
> Ian Kent <ikent@redhat.com> writes:
> 
> > On Mon, 2014-12-01 at 16:56 -0500, Benjamin Coddington wrote:
> >> n Tue, 25 Nov 2014, Eric W. Biederman wrote:
> >> Hi,
> >> 
> >> > Ian Kent <ikent@redhat.com> writes:
> >> >
> >> > > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote:
> >> > >> Ian Kent <ikent@redhat.com> writes:
> >> > >>
> >> > >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote:
> >> > >> >> Oleg Nesterov <oleg@redhat.com> writes:
> >> > >> >>
> >> > >> >> > On 11/25, Oleg Nesterov wrote:
> >> > >> >> >>
> >> > >> >> >> Let me first apologize, I didn't actually read this series yet.
> >> > >> >> >>
> >> > >> >> >> But I have to admit that so far I do not like this approach...
> >> > >> >> >> probably I am biased.
> >> > >> >> >
> >> > >> >> > Yes.
> >> > >> >> >
> >> > >> >> > And I have another concern... this is mostly a feeling, I can be
> >> > >> >> > easily wrong but:
> >> > >> >> >
> >> > >> >> >> On 11/25, Ian Kent wrote:
> >> > >> >> >> >
> >> > >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new)
> >> > >> >> >> > +{
> >> > >> >> >> > +	struct nsproxy *ns = info->data;
> >> > >> >> >> > +
> >> > >> >> >> > +	mntns_setfs(ns->mnt_ns);
> >> > >> >> >>
> >> > >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns.
> >> > >> >> >> Let me remind about the coredump. The dumping task can cloned with
> >> > >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand
> >> > >> >> >> this enough.
> >> > >> >> >
> >> > >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we
> >> > >> >> > could simply fork/reparent a child which will do execve ?
> >> > >> >>
> >> > >> >> That would certainly be a better approach, and roughly equivalent to
> >> > >> >> what exists here.  That would even ensure we remain in the proper
> >> > >> >> cgroups, and lsm context.
> >> > >> >>
> >> > >> >> The practical problem with the approach presented here is that I can
> >> > >> >> hijack any user mode helper I wish, and make it run in any executable I
> >> > >> >> wish as the global root user.
> >> > >> >>
> >> > >> >> Ian if we were to merge this I believe you would win the award for
> >> > >> >> easiest path to a root shell.
> >> > >> >
> >> > >> > LOL, OK, so there's a problem with this.
> >> > >> >
> >> > >> > But, how should a user mode helper execute within a namespace (or more
> >> > >> > specifically within a container)?
> >> > >> >
> >> > >> > Suppose a user mode helper program scans through the pid list and
> >> > >> > somehow picks the correct process pid and then does an
> >> > >> > open()/setns()/execve().
> >> > >> >
> >> > >> > Does that then satisfy the requirements?
> >> > >> > What needs to be done to safely do that in kernel?
> >> > >> >
> >> > >> > The other approach I've considered is doing a full open()/setns() in
> >> > >> > kernel (since the caller already knows its pid) but it sounds like
> >> > >> > that's not right either.
> >> > >>
> >> > >> The approach we agreed upon with the core dump helper was to provide
> >> > >> enough information that userspace could figure out what was the
> >> > >> appropriate policy and call nsenter/setns.
> >> > >
> >> > > Your recommending I have a look at the core dump helper, that's fine,
> >> > > I'll do that.
> >> >
> >> > I am just describing it because it came up.  Core dumps are a much
> >> > easier case than nfs.
> >> >
> >> > Frankly if we can figure out how to run the user mode helpers from the
> >> > kernel with an appropriate context and not involve userspace I think
> >> > that will be better for everyone, as it involves fewer moving parts at
> >> > the end of the day.
> >> >
> >> > >> The only sane approach I can think of in the context of nfs is to fork
> >> > >> a kernel thread at mount time that has all of the appropriate context
> >> > >> because it was captured from the privileged mounting process, and use
> >> > >> that kernel as the equivalent of kthreadd.
> >> > >>
> >> > >> There may be some intermediate ground where we capture things or we use
> >> > >> the init process of the pid namespace (captured at mount time) as our
> >> > >> template/reference process.
> >> > >>
> >> > >> If we are going to set this stuff up in the kernel we need a reference
> >> > >> process that we can create children of because what is possible with
> >> > >> respect to containers keeps changing, and it is extremely error prone to
> >> > >> figure out what all othe crazy little bits are, and to update everything
> >> > >> every time someone tweaks the kernel's capabilities.  We have kthreadd
> >> > >> because it was too error prone to scrub a userspace thread of all of the
> >> > >> userspace bits and make it the equivalent of what kthreadd is today.
> >> > >>
> >> > >> Of course it is also rather nice to have something to hang everything
> >> > >> else on.
> >> > >>
> >> > >> In summary we need a reference struct task that is all setup properly
> >> > >> so that we can create an appropriate kernel thread.
> >> > >
> >> > > I'm having trouble understanding what your getting at here but I'm not
> >> > > that sharp so bear with me.
> >> > >
> >> > > When call_usermodehelper() is called it's called from a process that is
> >> > > within the context within which the execution is required.
> >> >
> >> > No it is not.  That is precisely why we have call_usermodehelper instead
> >> > of just forking and exec'ing something.  The context which triggers the
> >> > call can be completely different from where you want to run.
> >> >
> >> > > So what information do we not have available for setup?
> >> > >
> >> > > Are you saying that the problem is that when the user mode helper run
> >> > > thread is invoked we don't have the information available that was
> >> > > present when call_usermodehelper() was called and that's where the
> >> > > challenge lies?
> >> >
> >> > That is part of it.
> >> >
> >> > However in the context of nfs the correct context is determined at mount
> >> > time, and so when call_usermodehelper() is invoked the only link to that
> >> > correct context that we have is the nfs super block.
> >> >
> >> > In a pathological case a userspace application can create a new user
> >> > namespace, and a new mount namespace and completely rearrange the
> >> > mounts, and deliberately trigger an nfs user mode helper.  If we were to
> >> > use that applications context it could control which userspace
> >> > application the kernel invoked.
> >> 
> >> Then it seems it would never be safe to try to execve /sbin/request-key
> >> within the calling process' mnt_ns, so if we want to do some work
> >> within that namespace the transition to it needs to happen after the
> >> upcall, after /sbin/request-key calls the appropriate helper.
> 
> No.  We never want to do work in the calling process's mnt_ns.
> We want to work in the mounters mnt_ns which given mount propagation
> can be something completely and totally different.
> 
> >> And instead of transitioning into this context before execve, instead
> >> pass it along to allow the userspace helper to do setns()..
> >
> > I still don't see the difference between, given an appropriate pid,
> > using the open()/setns() mechanism in kernel similar to nsenter(1). 
> >
> > In that case the setup is done before the first exec rather than between
> > the first and second exec (the one in nsenter). Our other approach
> > essentially did that but (incorrectly) used the current process pid and
> > didn't deal with all the namespace types as nsenter does.
> >
> > Isn't the real problem getting hold of an appropriate pid to use for the
> > open/setns that belongs to a process that's still running?
> > What else is needed, Eric?
> 
> Essentially that is the real problem.  Getting ahold of the context of
> the container in which a filesystem was mounted.
> 
> Now it isn't just namespaces but also cgroups and uids and gids and
> capabilities and process groups and sessions.  There are a whole host of
> little things that need to be right to run a user mode helper, and
> in particular to run a user mode helper in a ``container''.

And I think this is where the confusion starts because ....

As it stands (whether executing within a container or not) the mounting
process is mostly long gone when these calls are made and they are
executed from the root init context.

So doesn't adding a new init context, for example the init context of a
container, then using that for the template amount to the same thing
within the container?

If we accept that then the open/setns is "fairly" straight forward.

TBH I'm struggling to see where would be different.

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-12-03 23:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-25  1:07 [RFC PATCH 0/4] Namespace contrained helper execution Ian Kent
2014-11-25  1:07 ` [RFC PATCH 1/4] vfs - fs/namespaces.c: break out mntns_setfs() from mntns_install() Ian Kent
2014-11-25  1:07 ` [RFC PATCH 2/4] nsproxy - make create_new_namespaces() non-static Ian Kent
2014-11-25  1:07 ` [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper Ian Kent
2014-11-25 21:52   ` Oleg Nesterov
2014-11-25 22:06     ` Oleg Nesterov
2014-11-25 22:23       ` Eric W. Biederman
2014-11-25 23:07         ` Ian Kent
2014-11-25 23:19           ` Eric W. Biederman
2014-11-25 23:50             ` Ian Kent
2014-11-26  0:44               ` Ian Kent
2014-11-26  1:38               ` Eric W. Biederman
2014-12-01 21:56                 ` Benjamin Coddington
2014-12-02 23:33                   ` Ian Kent
2014-12-03 16:49                     ` Eric W. Biederman
2014-12-03 18:14                       ` Benjamin Coddington
2014-12-03 22:53                       ` Ian Kent
2014-12-03 23:34                       ` Ian Kent
2014-11-25 23:14       ` Ian Kent
2014-11-26 11:46       ` David Howells
2014-11-26 15:00         ` Eric W. Biederman
2014-11-26 22:57           ` J. Bruce Fields
2014-11-25 22:36     ` Ian Kent
2014-11-25 23:27       ` Eric W. Biederman
2014-11-28  0:19         ` Ian Kent
2014-11-27  1:30       ` Oleg Nesterov
2014-11-25  1:07 ` [RFC PATCH 4/4] KEYS: exec request-key within the requesting task's namespace Ian Kent

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).