All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container
@ 2016-03-31 14:54 ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:54 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-fsdevel
  Cc: Jan Kara, Al Viro

Changes in v3:
 * Collected acks from Jan Kara and Serge Hallyn
 * Fixed doc spelling errors
 * fput -> put_filp in error path

Changes in v2:

 * Use oldfile f_creds for the vfs_open credentials in
   filp_clone_open()
 * remove spurious fput() after filp_close()
 * Add documentation for the F option

There's also now a git tree at

https://git.kernel.org/cgit/linux/kernel/git/jejb/binfmt_misc.git

Original cover letter:

I've recently been playing a lot with architecture emulation containers
using qemu, mainly so I can build and test the arm secure boot
toolchain on my x86 laptop (not having any arm server hardware).

The process for bringing up architecture emulation containers using
qemu-user is very cumbersome because the emulation has to be installed
within the container.  This is bad for several reasons, firstly because
it contaminates the container image to have an emulator sitting inside
it.  Secondly it means that all orchestration systems have to be
explicitly modified to work with non-native architectures and thirdly
it means that the consumer of the container can accidentally destroy
the emulation and thus permanently crash the container.

The fix for this is to add a mode to binfmt_misc where the emulation
can be permanently installed from the current mount namespace and where
it survives both chroot and changes of mount namespace, effectively
allowing the container to run using the emulator, but where the
emulator itself isn't present within the container.

There are a couple of downsides to this, firstly the mapping of the
interpreter is accessible inside the container even if the actual file
isn't (I don't think this is really a security problem) and secondly,
to update the emulator package, you now have to remove the emulation
update and reinstall it, but this should be easy to do with packaging
scripts.  Finally, the emulator must be static otherwise the container
would crash when the elf binary format attempted to run the elf
interpreter.

I'm not really a regular FS developer, so I'd appreciate FS people
taking a look at the new filp_clone_open() API and whether I got
everything correct.

Thanks,

James

---
James Bottomley (3):
  fs: add filp_clone_open API
  binfmt_misc: add persistent opened binary handler for containers
  binfmt_misc: add F option description to documentation

 Documentation/binfmt_misc.txt |  7 +++++++
 fs/binfmt_misc.c              | 41 +++++++++++++++++++++++++++++++++++++++--
 fs/internal.h                 |  1 +
 fs/open.c                     | 20 ++++++++++++++++++++
 4 files changed, 67 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Patch v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container
@ 2016-03-31 14:54 ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:54 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

Changes in v3:
 * Collected acks from Jan Kara and Serge Hallyn
 * Fixed doc spelling errors
 * fput -> put_filp in error path

Changes in v2:

 * Use oldfile f_creds for the vfs_open credentials in
   filp_clone_open()
 * remove spurious fput() after filp_close()
 * Add documentation for the F option

There's also now a git tree at

https://git.kernel.org/cgit/linux/kernel/git/jejb/binfmt_misc.git

Original cover letter:

I've recently been playing a lot with architecture emulation containers
using qemu, mainly so I can build and test the arm secure boot
toolchain on my x86 laptop (not having any arm server hardware).

The process for bringing up architecture emulation containers using
qemu-user is very cumbersome because the emulation has to be installed
within the container.  This is bad for several reasons, firstly because
it contaminates the container image to have an emulator sitting inside
it.  Secondly it means that all orchestration systems have to be
explicitly modified to work with non-native architectures and thirdly
it means that the consumer of the container can accidentally destroy
the emulation and thus permanently crash the container.

The fix for this is to add a mode to binfmt_misc where the emulation
can be permanently installed from the current mount namespace and where
it survives both chroot and changes of mount namespace, effectively
allowing the container to run using the emulator, but where the
emulator itself isn't present within the container.

There are a couple of downsides to this, firstly the mapping of the
interpreter is accessible inside the container even if the actual file
isn't (I don't think this is really a security problem) and secondly,
to update the emulator package, you now have to remove the emulation
update and reinstall it, but this should be easy to do with packaging
scripts.  Finally, the emulator must be static otherwise the container
would crash when the elf binary format attempted to run the elf
interpreter.

I'm not really a regular FS developer, so I'd appreciate FS people
taking a look at the new filp_clone_open() API and whether I got
everything correct.

Thanks,

James

---
James Bottomley (3):
  fs: add filp_clone_open API
  binfmt_misc: add persistent opened binary handler for containers
  binfmt_misc: add F option description to documentation

 Documentation/binfmt_misc.txt |  7 +++++++
 fs/binfmt_misc.c              | 41 +++++++++++++++++++++++++++++++++++++++--
 fs/internal.h                 |  1 +
 fs/open.c                     | 20 ++++++++++++++++++++
 4 files changed, 67 insertions(+), 2 deletions(-)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Patch v3 1/3] fs: add filp_clone_open API
  2016-03-31 14:54 ` James Bottomley
@ 2016-03-31 14:55     ` James Bottomley
  -1 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:55 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-fsdevel
  Cc: Jan Kara, Al Viro

I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler.  I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem.  The use case will be a persistent binfmt_misc handler.

Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
---
 fs/internal.h |  1 +
 fs/open.c     | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index b71deee..c8ca0c9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,6 +108,7 @@ extern long do_handle_open(int mountdirfd,
 			   struct file_handle __user *ufh, int open_flag);
 extern int open_check_o_direct(struct file *f);
 extern int vfs_open(const struct path *, struct file *, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
 
 /*
  * inode.c
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..bfe6f2b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1002,6 +1002,26 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 }
 EXPORT_SYMBOL(file_open_root);
 
+struct file *filp_clone_open(struct file *oldfile)
+{
+	struct file *file;
+	int retval;
+
+	file = get_empty_filp();
+	if (IS_ERR(file))
+		return file;
+
+	file->f_flags = oldfile->f_flags;
+	retval = vfs_open(&oldfile->f_path, file, oldfile->f_cred);
+	if (retval) {
+		put_filp(file);
+		return ERR_PTR(retval);
+	}
+
+	return file;
+}
+EXPORT_SYMBOL(filp_clone_open);
+
 long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 {
 	struct open_flags op;
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Patch v3 1/3] fs: add filp_clone_open API
@ 2016-03-31 14:55     ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:55 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler.  I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem.  The use case will be a persistent binfmt_misc handler.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/internal.h |  1 +
 fs/open.c     | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index b71deee..c8ca0c9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,6 +108,7 @@ extern long do_handle_open(int mountdirfd,
 			   struct file_handle __user *ufh, int open_flag);
 extern int open_check_o_direct(struct file *f);
 extern int vfs_open(const struct path *, struct file *, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
 
 /*
  * inode.c
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..bfe6f2b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1002,6 +1002,26 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 }
 EXPORT_SYMBOL(file_open_root);
 
+struct file *filp_clone_open(struct file *oldfile)
+{
+	struct file *file;
+	int retval;
+
+	file = get_empty_filp();
+	if (IS_ERR(file))
+		return file;
+
+	file->f_flags = oldfile->f_flags;
+	retval = vfs_open(&oldfile->f_path, file, oldfile->f_cred);
+	if (retval) {
+		put_filp(file);
+		return ERR_PTR(retval);
+	}
+
+	return file;
+}
+EXPORT_SYMBOL(filp_clone_open);
+
 long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 {
 	struct open_flags op;
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Patch v3 2/3] binfmt_misc: add persistent opened binary handler for containers
  2016-03-31 14:54 ` James Bottomley
@ 2016-03-31 14:56     ` James Bottomley
  -1 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:56 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-fsdevel
  Cc: Jan Kara, Al Viro

This patch adds a new flag 'F' to the binfmt handlers.  If you pass in
'F' the binary that runs the emulation will be opened immediately and
in future, will be cloned from the open file.

The net effect is that the handler survives both changeroots and mount
namespace changes, making it easy to work with foreign architecture
containers without contaminating the container image with the
emulator.

Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/binfmt_misc.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 3a3ced7..8a108c4 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -26,6 +26,8 @@
 #include <linux/fs.h>
 #include <linux/uaccess.h>
 
+#include "internal.h"
+
 #ifdef DEBUG
 # define USE_DEBUG 1
 #else
@@ -43,6 +45,7 @@ enum {Enabled, Magic};
 #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
 #define MISC_FMT_OPEN_BINARY (1 << 30)
 #define MISC_FMT_CREDENTIALS (1 << 29)
+#define MISC_FMT_OPEN_FILE (1 << 28)
 
 typedef struct {
 	struct list_head list;
@@ -54,6 +57,7 @@ typedef struct {
 	char *interpreter;		/* filename of interpreter */
 	char *name;
 	struct dentry *dentry;
+	struct file *interp_file;
 } Node;
 
 static DEFINE_RWLOCK(entries_lock);
@@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto error;
 
-	interp_file = open_exec(iname);
+	if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
+		interp_file = filp_clone_open(fmt->interp_file);
+		if (!IS_ERR(interp_file))
+			deny_write_access(interp_file);
+	} else {
+		interp_file = open_exec(iname);
+	}
 	retval = PTR_ERR(interp_file);
 	if (IS_ERR(interp_file))
 		goto error;
@@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
 			e->flags |= (MISC_FMT_CREDENTIALS |
 					MISC_FMT_OPEN_BINARY);
 			break;
+		case 'F':
+			pr_debug("register: flag: F: open interpreter file now\n");
+			p++;
+			e->flags |= MISC_FMT_OPEN_FILE;
+			break;
 		default:
 			cont = 0;
 		}
@@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
 		*dp++ = 'O';
 	if (e->flags & MISC_FMT_CREDENTIALS)
 		*dp++ = 'C';
+	if (e->flags & MISC_FMT_OPEN_FILE)
+		*dp++ = 'F';
 	*dp++ = '\n';
 
 	if (!test_bit(Magic, &e->flags)) {
@@ -590,6 +607,11 @@ static void kill_node(Node *e)
 	}
 	write_unlock(&entries_lock);
 
+	if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
+		filp_close(e->interp_file, NULL);
+		e->interp_file = NULL;
+	}
+
 	if (dentry) {
 		drop_nlink(d_inode(dentry));
 		d_drop(dentry);
@@ -698,6 +720,21 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
 		goto out2;
 	}
 
+	if (e->flags & MISC_FMT_OPEN_FILE) {
+		struct file *f;
+
+		f = open_exec(e->interpreter);
+		if (IS_ERR(f)) {
+			err = PTR_ERR(f);
+			pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
+			simple_release_fs(&bm_mnt, &entry_count);
+			iput(inode);
+			inode = NULL;
+			goto out2;
+		}
+		e->interp_file = f;
+	}
+
 	e->dentry = dget(dentry);
 	inode->i_private = e;
 	inode->i_fop = &bm_entry_operations;
@@ -716,7 +753,7 @@ out:
 
 	if (err) {
 		kfree(e);
-		return -EINVAL;
+		return err;
 	}
 	return count;
 }
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Patch v3 2/3] binfmt_misc: add persistent opened binary handler for containers
@ 2016-03-31 14:56     ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:56 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

This patch adds a new flag 'F' to the binfmt handlers.  If you pass in
'F' the binary that runs the emulation will be opened immediately and
in future, will be cloned from the open file.

The net effect is that the handler survives both changeroots and mount
namespace changes, making it easy to work with foreign architecture
containers without contaminating the container image with the
emulator.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/binfmt_misc.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 3a3ced7..8a108c4 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -26,6 +26,8 @@
 #include <linux/fs.h>
 #include <linux/uaccess.h>
 
+#include "internal.h"
+
 #ifdef DEBUG
 # define USE_DEBUG 1
 #else
@@ -43,6 +45,7 @@ enum {Enabled, Magic};
 #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
 #define MISC_FMT_OPEN_BINARY (1 << 30)
 #define MISC_FMT_CREDENTIALS (1 << 29)
+#define MISC_FMT_OPEN_FILE (1 << 28)
 
 typedef struct {
 	struct list_head list;
@@ -54,6 +57,7 @@ typedef struct {
 	char *interpreter;		/* filename of interpreter */
 	char *name;
 	struct dentry *dentry;
+	struct file *interp_file;
 } Node;
 
 static DEFINE_RWLOCK(entries_lock);
@@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto error;
 
-	interp_file = open_exec(iname);
+	if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
+		interp_file = filp_clone_open(fmt->interp_file);
+		if (!IS_ERR(interp_file))
+			deny_write_access(interp_file);
+	} else {
+		interp_file = open_exec(iname);
+	}
 	retval = PTR_ERR(interp_file);
 	if (IS_ERR(interp_file))
 		goto error;
@@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
 			e->flags |= (MISC_FMT_CREDENTIALS |
 					MISC_FMT_OPEN_BINARY);
 			break;
+		case 'F':
+			pr_debug("register: flag: F: open interpreter file now\n");
+			p++;
+			e->flags |= MISC_FMT_OPEN_FILE;
+			break;
 		default:
 			cont = 0;
 		}
@@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
 		*dp++ = 'O';
 	if (e->flags & MISC_FMT_CREDENTIALS)
 		*dp++ = 'C';
+	if (e->flags & MISC_FMT_OPEN_FILE)
+		*dp++ = 'F';
 	*dp++ = '\n';
 
 	if (!test_bit(Magic, &e->flags)) {
@@ -590,6 +607,11 @@ static void kill_node(Node *e)
 	}
 	write_unlock(&entries_lock);
 
+	if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
+		filp_close(e->interp_file, NULL);
+		e->interp_file = NULL;
+	}
+
 	if (dentry) {
 		drop_nlink(d_inode(dentry));
 		d_drop(dentry);
@@ -698,6 +720,21 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
 		goto out2;
 	}
 
+	if (e->flags & MISC_FMT_OPEN_FILE) {
+		struct file *f;
+
+		f = open_exec(e->interpreter);
+		if (IS_ERR(f)) {
+			err = PTR_ERR(f);
+			pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
+			simple_release_fs(&bm_mnt, &entry_count);
+			iput(inode);
+			inode = NULL;
+			goto out2;
+		}
+		e->interp_file = f;
+	}
+
 	e->dentry = dget(dentry);
 	inode->i_private = e;
 	inode->i_fop = &bm_entry_operations;
@@ -716,7 +753,7 @@ out:
 
 	if (err) {
 		kfree(e);
-		return -EINVAL;
+		return err;
 	}
 	return count;
 }
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Patch v3 3/3] binfmt_misc: add F option description to documentation
  2016-03-31 14:54 ` James Bottomley
@ 2016-03-31 14:57     ` James Bottomley
  -1 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:57 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-fsdevel
  Cc: Jan Kara, Al Viro

Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
---
 Documentation/binfmt_misc.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt
index 6b1de70..ec83bbc 100644
--- a/Documentation/binfmt_misc.txt
+++ b/Documentation/binfmt_misc.txt
@@ -66,6 +66,13 @@ Here is what the fields mean:
             This feature should be used with care as the interpreter
             will run with root permissions when a setuid binary owned by root
             is run with binfmt_misc.
+      'F' - fix binary.  The usual behaviour of binfmt_misc is to spawn the
+      	    binary lazily when the misc format file is invoked.  However,
+	    this doesn't work very well in the face of mount namespaces and
+	    changeroots, so the F mode opens the binary as soon as the
+	    emulation is installed and uses the opened image to spawn the
+	    emulator, meaning it is always available once installed,
+	    regardless of how the environment changes.
 
 
 There are some restrictions:
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Patch v3 3/3] binfmt_misc: add F option description to documentation
@ 2016-03-31 14:57     ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2016-03-31 14:57 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 Documentation/binfmt_misc.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt
index 6b1de70..ec83bbc 100644
--- a/Documentation/binfmt_misc.txt
+++ b/Documentation/binfmt_misc.txt
@@ -66,6 +66,13 @@ Here is what the fields mean:
             This feature should be used with care as the interpreter
             will run with root permissions when a setuid binary owned by root
             is run with binfmt_misc.
+      'F' - fix binary.  The usual behaviour of binfmt_misc is to spawn the
+      	    binary lazily when the misc format file is invoked.  However,
+	    this doesn't work very well in the face of mount namespaces and
+	    changeroots, so the F mode opens the binary as soon as the
+	    emulation is installed and uses the opened image to spawn the
+	    emulator, meaning it is always available once installed,
+	    regardless of how the environment changes.
 
 
 There are some restrictions:
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Patch v3 2/3] binfmt_misc: add persistent opened binary handler for containers
       [not found]     ` <1459436179.2958.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-04-13 21:02       ` Matthew Helsley
       [not found]         ` <CA+RrjuV3VOjLRMm_bnk-9CKF_3NqX6DVNBgAUVpxW0+NE0bLHg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Helsley @ 2016-04-13 21:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-fsdevel,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jan Kara,
	Al Viro

With the current model it's possible to upgrade the emulator binary and
have new exec()s utilize the upgraded binary. I think this flag would
prevent that from happening until userspace removed the binfmt_misc entry
and re-added it. That creates a race where the entry would be missing for a
time and exec() could fail unexpectedly. It looks like the register path in
binfmt_misc fails if the entry already exists so there's no race-free way
to change or replace a registered binfmt_misc entry.

If binfmt_misc could hold a reference to the mount namespace used when
registering the handler then we wouldn't need to hold the filp reference on
the emulator binary and upgrading would not be affected. Then modifying the
entry would never be necessary AFAICS. I think that *might* be better so
long there's no circular reference between the mount namespace reference
and the binfmt_misc procfs code that I haven't accounted for.

Alternately, perhaps you could keep the filp and add a way to
modify/replace binfmt_misc entries without the race.

Cheers,
    -Matt Helsley

On Thu, Mar 31, 2016 at 7:56 AM, James Bottomley <
James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote:

> This patch adds a new flag 'F' to the binfmt handlers.  If you pass in
> 'F' the binary that runs the emulation will be opened immediately and
> in future, will be cloned from the open file.
>
> The net effect is that the handler survives both changeroots and mount
> namespace changes, making it easy to work with foreign architecture
> containers without contaminating the container image with the
> emulator.
>
> Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
>  fs/binfmt_misc.c | 41 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index 3a3ced7..8a108c4 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -26,6 +26,8 @@
>  #include <linux/fs.h>
>  #include <linux/uaccess.h>
>
> +#include "internal.h"
> +
>  #ifdef DEBUG
>  # define USE_DEBUG 1
>  #else
> @@ -43,6 +45,7 @@ enum {Enabled, Magic};
>  #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
>  #define MISC_FMT_OPEN_BINARY (1 << 30)
>  #define MISC_FMT_CREDENTIALS (1 << 29)
> +#define MISC_FMT_OPEN_FILE (1 << 28)
>
>  typedef struct {
>         struct list_head list;
> @@ -54,6 +57,7 @@ typedef struct {
>         char *interpreter;              /* filename of interpreter */
>         char *name;
>         struct dentry *dentry;
> +       struct file *interp_file;
>  } Node;
>
>  static DEFINE_RWLOCK(entries_lock);
> @@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
>         if (retval < 0)
>                 goto error;
>
> -       interp_file = open_exec(iname);
> +       if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
> +               interp_file = filp_clone_open(fmt->interp_file);
> +               if (!IS_ERR(interp_file))
> +                       deny_write_access(interp_file);
> +       } else {
> +               interp_file = open_exec(iname);
> +       }
>         retval = PTR_ERR(interp_file);
>         if (IS_ERR(interp_file))
>                 goto error;
> @@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
>                         e->flags |= (MISC_FMT_CREDENTIALS |
>                                         MISC_FMT_OPEN_BINARY);
>                         break;
> +               case 'F':
> +                       pr_debug("register: flag: F: open interpreter file
> now\n");
> +                       p++;
> +                       e->flags |= MISC_FMT_OPEN_FILE;
> +                       break;
>                 default:
>                         cont = 0;
>                 }
> @@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
>                 *dp++ = 'O';
>         if (e->flags & MISC_FMT_CREDENTIALS)
>                 *dp++ = 'C';
> +       if (e->flags & MISC_FMT_OPEN_FILE)
> +               *dp++ = 'F';
>         *dp++ = '\n';
>
>         if (!test_bit(Magic, &e->flags)) {
> @@ -590,6 +607,11 @@ static void kill_node(Node *e)
>         }
>         write_unlock(&entries_lock);
>
> +       if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
> +               filp_close(e->interp_file, NULL);
> +               e->interp_file = NULL;
> +       }
> +
>         if (dentry) {
>                 drop_nlink(d_inode(dentry));
>                 d_drop(dentry);
> @@ -698,6 +720,21 @@ static ssize_t bm_register_write(struct file *file,
> const char __user *buffer,
>                 goto out2;
>         }
>
> +       if (e->flags & MISC_FMT_OPEN_FILE) {
> +               struct file *f;
> +
> +               f = open_exec(e->interpreter);
> +               if (IS_ERR(f)) {
> +                       err = PTR_ERR(f);
> +                       pr_notice("register: failed to install interpreter
> file %s\n", e->interpreter);
> +                       simple_release_fs(&bm_mnt, &entry_count);
> +                       iput(inode);
> +                       inode = NULL;
> +                       goto out2;
> +               }
> +               e->interp_file = f;
> +       }
> +
>         e->dentry = dget(dentry);
>         inode->i_private = e;
>         inode->i_fop = &bm_entry_operations;
> @@ -716,7 +753,7 @@ out:
>
>         if (err) {
>                 kfree(e);
> -               return -EINVAL;
> +               return err;
>         }
>         return count;
>  }
> --
> 2.6.2
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch v3 2/3] binfmt_misc: add persistent opened binary handler for containers
       [not found]         ` <CA+RrjuV3VOjLRMm_bnk-9CKF_3NqX6DVNBgAUVpxW0+NE0bLHg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-04-14 22:33           ` johnny frazier
  0 siblings, 0 replies; 10+ messages in thread
From: johnny frazier @ 2016-04-14 22:33 UTC (permalink / raw)
  To: Matthew Helsley
  Cc: linux-fsdevel,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jan Kara,
	Al Viro

Y'all need to stay the f*** out of my business
On Apr 13, 2016 4:02 PM, "Matthew Helsley" <matt.helsley-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> With the current model it's possible to upgrade the emulator binary and
> have new exec()s utilize the upgraded binary. I think this flag would
> prevent that from happening until userspace removed the binfmt_misc entry
> and re-added it. That creates a race where the entry would be missing for a
> time and exec() could fail unexpectedly. It looks like the register path in
> binfmt_misc fails if the entry already exists so there's no race-free way
> to change or replace a registered binfmt_misc entry.
>
> If binfmt_misc could hold a reference to the mount namespace used when
> registering the handler then we wouldn't need to hold the filp reference on
> the emulator binary and upgrading would not be affected. Then modifying the
> entry would never be necessary AFAICS. I think that *might* be better so
> long there's no circular reference between the mount namespace reference
> and the binfmt_misc procfs code that I haven't accounted for.
>
> Alternately, perhaps you could keep the filp and add a way to
> modify/replace binfmt_misc entries without the race.
>
> Cheers,
>     -Matt Helsley
>
> On Thu, Mar 31, 2016 at 7:56 AM, James Bottomley <
> James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote:
>
> > This patch adds a new flag 'F' to the binfmt handlers.  If you pass in
> > 'F' the binary that runs the emulation will be opened immediately and
> > in future, will be cloned from the open file.
> >
> > The net effect is that the handler survives both changeroots and mount
> > namespace changes, making it easy to work with foreign architecture
> > containers without contaminating the container image with the
> > emulator.
> >
> > Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> > Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > ---
> >  fs/binfmt_misc.c | 41 +++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> > index 3a3ced7..8a108c4 100644
> > --- a/fs/binfmt_misc.c
> > +++ b/fs/binfmt_misc.c
> > @@ -26,6 +26,8 @@
> >  #include <linux/fs.h>
> >  #include <linux/uaccess.h>
> >
> > +#include "internal.h"
> > +
> >  #ifdef DEBUG
> >  # define USE_DEBUG 1
> >  #else
> > @@ -43,6 +45,7 @@ enum {Enabled, Magic};
> >  #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
> >  #define MISC_FMT_OPEN_BINARY (1 << 30)
> >  #define MISC_FMT_CREDENTIALS (1 << 29)
> > +#define MISC_FMT_OPEN_FILE (1 << 28)
> >
> >  typedef struct {
> >         struct list_head list;
> > @@ -54,6 +57,7 @@ typedef struct {
> >         char *interpreter;              /* filename of interpreter */
> >         char *name;
> >         struct dentry *dentry;
> > +       struct file *interp_file;
> >  } Node;
> >
> >  static DEFINE_RWLOCK(entries_lock);
> > @@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm
> *bprm)
> >         if (retval < 0)
> >                 goto error;
> >
> > -       interp_file = open_exec(iname);
> > +       if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
> > +               interp_file = filp_clone_open(fmt->interp_file);
> > +               if (!IS_ERR(interp_file))
> > +                       deny_write_access(interp_file);
> > +       } else {
> > +               interp_file = open_exec(iname);
> > +       }
> >         retval = PTR_ERR(interp_file);
> >         if (IS_ERR(interp_file))
> >                 goto error;
> > @@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
> >                         e->flags |= (MISC_FMT_CREDENTIALS |
> >                                         MISC_FMT_OPEN_BINARY);
> >                         break;
> > +               case 'F':
> > +                       pr_debug("register: flag: F: open interpreter
> file
> > now\n");
> > +                       p++;
> > +                       e->flags |= MISC_FMT_OPEN_FILE;
> > +                       break;
> >                 default:
> >                         cont = 0;
> >                 }
> > @@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
> >                 *dp++ = 'O';
> >         if (e->flags & MISC_FMT_CREDENTIALS)
> >                 *dp++ = 'C';
> > +       if (e->flags & MISC_FMT_OPEN_FILE)
> > +               *dp++ = 'F';
> >         *dp++ = '\n';
> >
> >         if (!test_bit(Magic, &e->flags)) {
> > @@ -590,6 +607,11 @@ static void kill_node(Node *e)
> >         }
> >         write_unlock(&entries_lock);
> >
> > +       if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
> > +               filp_close(e->interp_file, NULL);
> > +               e->interp_file = NULL;
> > +       }
> > +
> >         if (dentry) {
> >                 drop_nlink(d_inode(dentry));
> >                 d_drop(dentry);
> > @@ -698,6 +720,21 @@ static ssize_t bm_register_write(struct file *file,
> > const char __user *buffer,
> >                 goto out2;
> >         }
> >
> > +       if (e->flags & MISC_FMT_OPEN_FILE) {
> > +               struct file *f;
> > +
> > +               f = open_exec(e->interpreter);
> > +               if (IS_ERR(f)) {
> > +                       err = PTR_ERR(f);
> > +                       pr_notice("register: failed to install
> interpreter
> > file %s\n", e->interpreter);
> > +                       simple_release_fs(&bm_mnt, &entry_count);
> > +                       iput(inode);
> > +                       inode = NULL;
> > +                       goto out2;
> > +               }
> > +               e->interp_file = f;
> > +       }
> > +
> >         e->dentry = dget(dentry);
> >         inode->i_private = e;
> >         inode->i_fop = &bm_entry_operations;
> > @@ -716,7 +753,7 @@ out:
> >
> >         if (err) {
> >                 kfree(e);
> > -               return -EINVAL;
> > +               return err;
> >         }
> >         return count;
> >  }
> > --
> > 2.6.2
> >
> > _______________________________________________
> > Containers mailing list
> > Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linuxfoundation.org/mailman/listinfo/containers
> >
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-14 22:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31 14:54 [Patch v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
2016-03-31 14:54 ` James Bottomley
     [not found] ` <1459436046.2958.21.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-03-31 14:55   ` [Patch v3 1/3] fs: add filp_clone_open API James Bottomley
2016-03-31 14:55     ` James Bottomley
2016-03-31 14:56   ` [Patch v3 2/3] binfmt_misc: add persistent opened binary handler for containers James Bottomley
2016-03-31 14:56     ` James Bottomley
     [not found]     ` <1459436179.2958.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-04-13 21:02       ` Matthew Helsley
     [not found]         ` <CA+RrjuV3VOjLRMm_bnk-9CKF_3NqX6DVNBgAUVpxW0+NE0bLHg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-14 22:33           ` johnny frazier
2016-03-31 14:57   ` [Patch v3 3/3] binfmt_misc: add F option description to documentation James Bottomley
2016-03-31 14:57     ` James Bottomley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.