* [PATCH v7 0/1] ns: introduce binfmt_misc namespace @ 2019-11-07 14:03 Laurent Vivier 2019-11-07 14:03 ` [PATCH v7 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier 0 siblings, 1 reply; 10+ messages in thread From: Laurent Vivier @ 2019-11-07 14:03 UTC (permalink / raw) To: linux-kernel Cc: Dmitry Safonov, Henning Schild, linux-fsdevel, James Bottomley, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka, Laurent Vivier v7: Use the new mount API Replace static struct dentry *bm_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { struct user_namespace *ns = current_user_ns(); return mount_ns(fs_type, flags, data, ns, ns, bm_fill_super); } by static void bm_free(struct fs_context *fc) { if (fc->s_fs_info) put_user_ns(fc->s_fs_info); } static int bm_get_tree(struct fs_context *fc) { return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns)); } static const struct fs_context_operations bm_context_ops = { .free = bm_free, .get_tree = bm_get_tree, }; static int bm_init_fs_context(struct fs_context *fc) { fc->ops = &bm_context_ops; return 0; } v6: Return &init_binfmt_ns instead of NULL in binfmt_ns() This should never happen, but to stay safe return a value we can use. change subject from "RFC" to "PATCH" v5: Use READ_ONCE()/WRITE_ONCE() move mount pointer struct init to bm_fill_super() and add smp_wmb() remove useless NULL value init add WARN_ON_ONCE() v4: first user namespace is initialized with &init_binfmt_ns, all new user namespaces are initialized with a NULL and use the one of the first parent that is not NULL. The pointer is initialized to a valid value the first time the binfmt_misc fs is mounted in the current user namespace. This allows to not change the way it was working before: new ns inherits values from its parent, and if parent value is modified (or parent creates its own binfmt entry by mounting the fs) child inherits it (unless it has itself mounted the fs). v3: create a structure to store binfmt_misc data, add a pointer to this structure in the user_namespace structure, in init_user_ns structure this pointer points to an init_binfmt_ns structure. And all new user namespaces point to this init structure. A new binfmt namespace structure is allocated if the binfmt_misc filesystem is mounted in a user namespace that is not the initial one but its binfmt namespace pointer points to the initial one. add override_creds()/revert_creds() around open_exec() in bm_register_write() v2: no new namespace, binfmt_misc data are now part of the mount namespace I put this in mount namespace instead of user namespace because the mount namespace is already needed and I don't want to force to have the user namespace for that. As this is a filesystem, it seems logic to have it here. This allows to define a new interpreter for each new container. But the main goal is to be able to chroot to a directory using a binfmt_misc interpreter without being root. I have a modified version of unshare at: https://github.com/vivier/util-linux.git branch unshare-chroot with some new options to unshare binfmt_misc namespace and to chroot to a directory. If you have a directory /chroot/powerpc/jessie containing debian for powerpc binaries and a qemu-ppc interpreter, you can do for instance: $ uname -a Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux $ ./unshare --map-root-user --fork --pid \ --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/qemu-ppc:OC" \ --root=/chroot/powerpc/jessie /bin/bash -l # uname -a Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 ppc GNU/Linux # id uid=0(root) gid=0(root) groups=0(root),65534(nogroup) # ls -l total 5940 drwxr-xr-x. 2 nobody nogroup 4096 Aug 12 00:58 bin drwxr-xr-x. 2 nobody nogroup 4096 Jun 17 20:26 boot drwxr-xr-x. 4 nobody nogroup 4096 Aug 12 00:08 dev drwxr-xr-x. 42 nobody nogroup 4096 Sep 28 07:25 etc drwxr-xr-x. 3 nobody nogroup 4096 Sep 28 07:25 home drwxr-xr-x. 9 nobody nogroup 4096 Aug 12 00:58 lib drwxr-xr-x. 2 nobody nogroup 4096 Aug 12 00:08 media drwxr-xr-x. 2 nobody nogroup 4096 Aug 12 00:08 mnt drwxr-xr-x. 3 nobody nogroup 4096 Aug 12 13:09 opt dr-xr-xr-x. 143 nobody nogroup 0 Sep 30 23:02 proc -rwxr-xr-x. 1 nobody nogroup 6009712 Sep 28 07:22 qemu-ppc drwx------. 3 nobody nogroup 4096 Aug 12 12:54 root drwxr-xr-x. 3 nobody nogroup 4096 Aug 12 00:08 run drwxr-xr-x. 2 nobody nogroup 4096 Aug 12 00:58 sbin drwxr-xr-x. 2 nobody nogroup 4096 Aug 12 00:08 srv drwxr-xr-x. 2 nobody nogroup 4096 Apr 6 2015 sys drwxrwxrwt. 2 nobody nogroup 4096 Sep 28 10:31 tmp drwxr-xr-x. 10 nobody nogroup 4096 Aug 12 00:08 usr drwxr-xr-x. 11 nobody nogroup 4096 Aug 12 00:08 var If you want to use the qemu binary provided by your distro, you can use --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/bin/qemu-ppc-static:OCF" With the 'F' flag, qemu-ppc-static will be then loaded from the main root filesystem before switching to the chroot. Another example is to use the 'P' flag in one chroot and not in another one (useful in a test environment to test different configurations of the same interpreter): ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-noargv0:OCF" --root=/chroot/powerpc/jessie /bin/bash -l root@localhost:/# sh -c 'echo $0' /bin/sh ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-argv0:OCFP" --root=/chroot/powerpc/jessie /bin/bash -l root@localhost:/# sh -c 'echo $0' sh Laurent Vivier (1): ns: add binfmt_misc to the user namespace fs/binfmt_misc.c | 115 +++++++++++++++++++++++++-------- include/linux/user_namespace.h | 15 +++++ kernel/user.c | 14 ++++ kernel/user_namespace.c | 3 + 4 files changed, 119 insertions(+), 28 deletions(-) -- 2.21.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-11-07 14:03 [PATCH v7 0/1] ns: introduce binfmt_misc namespace Laurent Vivier @ 2019-11-07 14:03 ` Laurent Vivier 2019-12-03 9:44 ` Laurent Vivier 2019-12-03 15:40 ` Christian Brauner 0 siblings, 2 replies; 10+ messages in thread From: Laurent Vivier @ 2019-11-07 14:03 UTC (permalink / raw) To: linux-kernel Cc: Dmitry Safonov, Henning Schild, linux-fsdevel, James Bottomley, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka, Laurent Vivier This patch allows to have a different binfmt_misc configuration for each new user namespace. By default, the binfmt_misc configuration is the one of the previous level, but if the binfmt_misc filesystem is mounted in the new namespace a new empty binfmt instance is created and used in this namespace. For instance, using "unshare" we can start a chroot of another architecture and configure the binfmt_misc interpreter without being root to run the binaries in this chroot. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Acked-by: Andrei Vagin <avagin@gmail.com> --- fs/binfmt_misc.c | 115 +++++++++++++++++++++++++-------- include/linux/user_namespace.h | 15 +++++ kernel/user.c | 14 ++++ kernel/user_namespace.c | 3 + 4 files changed, 119 insertions(+), 28 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index cdb45829354d..ba5f0d2ade96 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -40,9 +40,6 @@ enum { VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ }; -static LIST_HEAD(entries); -static int enabled = 1; - enum {Enabled, Magic}; #define MISC_FMT_PRESERVE_ARGV0 (1 << 31) #define MISC_FMT_OPEN_BINARY (1 << 30) @@ -62,10 +59,7 @@ typedef struct { struct file *interp_file; } Node; -static DEFINE_RWLOCK(entries_lock); static struct file_system_type bm_fs_type; -static struct vfsmount *bm_mnt; -static int entry_count; /* * Max length of the register string. Determined by: @@ -82,18 +76,37 @@ static int entry_count; */ #define MAX_REGISTER_LENGTH 1920 +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns) +{ + struct binfmt_namespace *b_ns; + + while (ns) { + b_ns = READ_ONCE(ns->binfmt_ns); + if (b_ns) + return b_ns; + ns = ns->parent; + } + /* as the first user namespace is initialized with + * &init_binfmt_ns we should never come here + * but we try to stay safe by logging a warning + * and returning a sane value + */ + WARN_ON_ONCE(1); + return &init_binfmt_ns; +} + /* * Check if we support the binfmt * if we do, return the node, else NULL * locking is done in load_misc_binary */ -static Node *check_file(struct linux_binprm *bprm) +static Node *check_file(struct binfmt_namespace *ns, struct linux_binprm *bprm) { char *p = strrchr(bprm->interp, '.'); struct list_head *l; /* Walk all the registered handlers. */ - list_for_each(l, &entries) { + list_for_each(l, &ns->entries) { Node *e = list_entry(l, Node, list); char *s; int j; @@ -135,17 +148,18 @@ static int load_misc_binary(struct linux_binprm *bprm) struct file *interp_file = NULL; int retval; int fd_binary = -1; + struct binfmt_namespace *ns = binfmt_ns(current_user_ns()); retval = -ENOEXEC; - if (!enabled) + if (!ns->enabled) return retval; /* to keep locking time low, we copy the interpreter string */ - read_lock(&entries_lock); - fmt = check_file(bprm); + read_lock(&ns->entries_lock); + fmt = check_file(ns, bprm); if (fmt) dget(fmt->dentry); - read_unlock(&entries_lock); + read_unlock(&ns->entries_lock); if (!fmt) return retval; @@ -611,19 +625,19 @@ static void bm_evict_inode(struct inode *inode) kfree(e); } -static void kill_node(Node *e) +static void kill_node(struct binfmt_namespace *ns, Node *e) { struct dentry *dentry; - write_lock(&entries_lock); + write_lock(&ns->entries_lock); list_del_init(&e->list); - write_unlock(&entries_lock); + write_unlock(&ns->entries_lock); dentry = e->dentry; drop_nlink(d_inode(dentry)); d_drop(dentry); dput(dentry); - simple_release_fs(&bm_mnt, &entry_count); + simple_release_fs(&ns->bm_mnt, &ns->entry_count); } /* /<entry> */ @@ -653,6 +667,9 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, struct dentry *root; Node *e = file_inode(file)->i_private; int res = parse_command(buffer, count); + struct binfmt_namespace *ns; + + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); switch (res) { case 1: @@ -669,7 +686,7 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, inode_lock(d_inode(root)); if (!list_empty(&e->list)) - kill_node(e); + kill_node(ns, e); inode_unlock(d_inode(root)); break; @@ -695,6 +712,7 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, struct inode *inode; struct super_block *sb = file_inode(file)->i_sb; struct dentry *root = sb->s_root, *dentry; + struct binfmt_namespace *ns; int err = 0; e = create_entry(buffer, count); @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, if (!inode) goto out2; - err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count); + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); + err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt, + &ns->entry_count); if (err) { iput(inode); inode = NULL; @@ -727,12 +747,16 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, if (e->flags & MISC_FMT_OPEN_FILE) { struct file *f; + const struct cred *old_cred; + old_cred = override_creds(file->f_cred); f = open_exec(e->interpreter); + revert_creds(old_cred); if (IS_ERR(f)) { err = PTR_ERR(f); pr_notice("register: failed to install interpreter file %s\n", e->interpreter); - simple_release_fs(&bm_mnt, &entry_count); + simple_release_fs(&ns->bm_mnt, + &ns->entry_count); iput(inode); inode = NULL; goto out2; @@ -745,9 +769,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, inode->i_fop = &bm_entry_operations; d_instantiate(dentry, inode); - write_lock(&entries_lock); - list_add(&e->list, &entries); - write_unlock(&entries_lock); + write_lock(&ns->entries_lock); + list_add(&e->list, &ns->entries); + write_unlock(&ns->entries_lock); err = 0; out2: @@ -772,7 +796,9 @@ static const struct file_operations bm_register_operations = { static ssize_t bm_status_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) { - char *s = enabled ? "enabled\n" : "disabled\n"; + struct binfmt_namespace *ns = + binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); + char *s = ns->enabled ? "enabled\n" : "disabled\n"; return simple_read_from_buffer(buf, nbytes, ppos, s, strlen(s)); } @@ -780,25 +806,28 @@ bm_status_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) static ssize_t bm_status_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { + struct binfmt_namespace *ns; int res = parse_command(buffer, count); struct dentry *root; + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); switch (res) { case 1: /* Disable all handlers. */ - enabled = 0; + ns->enabled = 0; break; case 2: /* Enable all handlers. */ - enabled = 1; + ns->enabled = 1; break; case 3: /* Delete all handlers. */ root = file_inode(file)->i_sb->s_root; inode_lock(d_inode(root)); - while (!list_empty(&entries)) - kill_node(list_first_entry(&entries, Node, list)); + while (!list_empty(&ns->entries)) + kill_node(ns, list_first_entry(&ns->entries, + Node, list)); inode_unlock(d_inode(root)); break; @@ -825,24 +854,53 @@ static const struct super_operations s_ops = { static int bm_fill_super(struct super_block *sb, struct fs_context *fc) { int err; + struct user_namespace *ns = sb->s_user_ns; static const struct tree_descr bm_files[] = { [2] = {"status", &bm_status_operations, S_IWUSR|S_IRUGO}, [3] = {"register", &bm_register_operations, S_IWUSR}, /* last one */ {""} }; + /* create a new binfmt namespace + * if we are not in the first user namespace + * but the binfmt namespace is the first one + */ + if (READ_ONCE(ns->binfmt_ns) == NULL) { + struct binfmt_namespace *new_ns; + + new_ns = kmalloc(sizeof(struct binfmt_namespace), + GFP_KERNEL); + if (new_ns == NULL) + return -ENOMEM; + INIT_LIST_HEAD(&new_ns->entries); + new_ns->enabled = 1; + rwlock_init(&new_ns->entries_lock); + new_ns->bm_mnt = NULL; + new_ns->entry_count = 0; + /* ensure new_ns is completely initialized before sharing it */ + smp_wmb(); + WRITE_ONCE(ns->binfmt_ns, new_ns); + } + err = simple_fill_super(sb, BINFMTFS_MAGIC, bm_files); if (!err) sb->s_op = &s_ops; return err; } +static void bm_free(struct fs_context *fc) +{ + if (fc->s_fs_info) + put_user_ns(fc->s_fs_info); +} + static int bm_get_tree(struct fs_context *fc) { - return get_tree_single(fc, bm_fill_super); + return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns)); } static const struct fs_context_operations bm_context_ops = { + .free = bm_free, .get_tree = bm_get_tree, }; @@ -861,6 +919,7 @@ static struct file_system_type bm_fs_type = { .owner = THIS_MODULE, .name = "binfmt_misc", .init_fs_context = bm_init_fs_context, + .fs_flags = FS_USERNS_MOUNT, .kill_sb = kill_litter_super, }; MODULE_ALIAS_FS("binfmt_misc"); diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index fb9f4f799554..16e6f3a97a01 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -52,6 +52,18 @@ enum ucount_type { UCOUNT_COUNTS, }; +#if IS_ENABLED(CONFIG_BINFMT_MISC) +struct binfmt_namespace { + struct list_head entries; + rwlock_t entries_lock; + int enabled; + struct vfsmount *bm_mnt; + int entry_count; +} __randomize_layout; + +extern struct binfmt_namespace init_binfmt_ns; +#endif + struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; @@ -86,6 +98,9 @@ struct user_namespace { #endif struct ucounts *ucounts; int ucount_max[UCOUNT_COUNTS]; +#if IS_ENABLED(CONFIG_BINFMT_MISC) + struct binfmt_namespace *binfmt_ns; +#endif } __randomize_layout; struct ucounts { diff --git a/kernel/user.c b/kernel/user.c index 5235d7f49982..092b2b4d47a6 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -20,6 +20,17 @@ #include <linux/user_namespace.h> #include <linux/proc_ns.h> +#if IS_ENABLED(CONFIG_BINFMT_MISC) +struct binfmt_namespace init_binfmt_ns = { + .entries = LIST_HEAD_INIT(init_binfmt_ns.entries), + .enabled = 1, + .entries_lock = __RW_LOCK_UNLOCKED(init_binfmt_ns.entries_lock), + .bm_mnt = NULL, + .entry_count = 0, +}; +EXPORT_SYMBOL_GPL(init_binfmt_ns); +#endif + /* * userns count is 1 for root user, 1 for init_uts_ns, * and 1 for... ? @@ -67,6 +78,9 @@ struct user_namespace init_user_ns = { .keyring_name_list = LIST_HEAD_INIT(init_user_ns.keyring_name_list), .keyring_sem = __RWSEM_INITIALIZER(init_user_ns.keyring_sem), #endif +#if IS_ENABLED(CONFIG_BINFMT_MISC) + .binfmt_ns = &init_binfmt_ns, +#endif }; EXPORT_SYMBOL_GPL(init_user_ns); diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 8eadadc478f9..f42c32269e20 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -191,6 +191,9 @@ static void free_user_ns(struct work_struct *work) kfree(ns->projid_map.forward); kfree(ns->projid_map.reverse); } +#if IS_ENABLED(CONFIG_BINFMT_MISC) + kfree(ns->binfmt_ns); +#endif retire_userns_sysctls(ns); key_free_user_ns(ns); ns_free_inum(&ns->ns); -- 2.21.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-11-07 14:03 ` [PATCH v7 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier @ 2019-12-03 9:44 ` Laurent Vivier 2019-12-13 17:51 ` Henning Schild 2019-12-03 15:40 ` Christian Brauner 1 sibling, 1 reply; 10+ messages in thread From: Laurent Vivier @ 2019-12-03 9:44 UTC (permalink / raw) To: linux-kernel Cc: Dmitry Safonov, Henning Schild, linux-fsdevel, James Bottomley, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka Ping Thanks, Laurent Le 07/11/2019 à 15:03, Laurent Vivier a écrit : > This patch allows to have a different binfmt_misc configuration > for each new user namespace. By default, the binfmt_misc configuration > is the one of the previous level, but if the binfmt_misc filesystem is > mounted in the new namespace a new empty binfmt instance is created and > used in this namespace. > > For instance, using "unshare" we can start a chroot of another > architecture and configure the binfmt_misc interpreter without being root > to run the binaries in this chroot. > > Signed-off-by: Laurent Vivier <laurent@vivier.eu> > Acked-by: Andrei Vagin <avagin@gmail.com> > --- > fs/binfmt_misc.c | 115 +++++++++++++++++++++++++-------- > include/linux/user_namespace.h | 15 +++++ > kernel/user.c | 14 ++++ > kernel/user_namespace.c | 3 + > 4 files changed, 119 insertions(+), 28 deletions(-) > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > index cdb45829354d..ba5f0d2ade96 100644 > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -40,9 +40,6 @@ enum { > VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ > }; > > -static LIST_HEAD(entries); > -static int enabled = 1; > - > enum {Enabled, Magic}; > #define MISC_FMT_PRESERVE_ARGV0 (1 << 31) > #define MISC_FMT_OPEN_BINARY (1 << 30) > @@ -62,10 +59,7 @@ typedef struct { > struct file *interp_file; > } Node; > > -static DEFINE_RWLOCK(entries_lock); > static struct file_system_type bm_fs_type; > -static struct vfsmount *bm_mnt; > -static int entry_count; > > /* > * Max length of the register string. Determined by: > @@ -82,18 +76,37 @@ static int entry_count; > */ > #define MAX_REGISTER_LENGTH 1920 > > +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns) > +{ > + struct binfmt_namespace *b_ns; > + > + while (ns) { > + b_ns = READ_ONCE(ns->binfmt_ns); > + if (b_ns) > + return b_ns; > + ns = ns->parent; > + } > + /* as the first user namespace is initialized with > + * &init_binfmt_ns we should never come here > + * but we try to stay safe by logging a warning > + * and returning a sane value > + */ > + WARN_ON_ONCE(1); > + return &init_binfmt_ns; > +} > + > /* > * Check if we support the binfmt > * if we do, return the node, else NULL > * locking is done in load_misc_binary > */ > -static Node *check_file(struct linux_binprm *bprm) > +static Node *check_file(struct binfmt_namespace *ns, struct linux_binprm *bprm) > { > char *p = strrchr(bprm->interp, '.'); > struct list_head *l; > > /* Walk all the registered handlers. */ > - list_for_each(l, &entries) { > + list_for_each(l, &ns->entries) { > Node *e = list_entry(l, Node, list); > char *s; > int j; > @@ -135,17 +148,18 @@ static int load_misc_binary(struct linux_binprm *bprm) > struct file *interp_file = NULL; > int retval; > int fd_binary = -1; > + struct binfmt_namespace *ns = binfmt_ns(current_user_ns()); > > retval = -ENOEXEC; > - if (!enabled) > + if (!ns->enabled) > return retval; > > /* to keep locking time low, we copy the interpreter string */ > - read_lock(&entries_lock); > - fmt = check_file(bprm); > + read_lock(&ns->entries_lock); > + fmt = check_file(ns, bprm); > if (fmt) > dget(fmt->dentry); > - read_unlock(&entries_lock); > + read_unlock(&ns->entries_lock); > if (!fmt) > return retval; > > @@ -611,19 +625,19 @@ static void bm_evict_inode(struct inode *inode) > kfree(e); > } > > -static void kill_node(Node *e) > +static void kill_node(struct binfmt_namespace *ns, Node *e) > { > struct dentry *dentry; > > - write_lock(&entries_lock); > + write_lock(&ns->entries_lock); > list_del_init(&e->list); > - write_unlock(&entries_lock); > + write_unlock(&ns->entries_lock); > > dentry = e->dentry; > drop_nlink(d_inode(dentry)); > d_drop(dentry); > dput(dentry); > - simple_release_fs(&bm_mnt, &entry_count); > + simple_release_fs(&ns->bm_mnt, &ns->entry_count); > } > > /* /<entry> */ > @@ -653,6 +667,9 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, > struct dentry *root; > Node *e = file_inode(file)->i_private; > int res = parse_command(buffer, count); > + struct binfmt_namespace *ns; > + > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > > switch (res) { > case 1: > @@ -669,7 +686,7 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, > inode_lock(d_inode(root)); > > if (!list_empty(&e->list)) > - kill_node(e); > + kill_node(ns, e); > > inode_unlock(d_inode(root)); > break; > @@ -695,6 +712,7 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, > struct inode *inode; > struct super_block *sb = file_inode(file)->i_sb; > struct dentry *root = sb->s_root, *dentry; > + struct binfmt_namespace *ns; > int err = 0; > > e = create_entry(buffer, count); > @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, > if (!inode) > goto out2; > > - err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count); > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > + err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt, > + &ns->entry_count); > if (err) { > iput(inode); > inode = NULL; > @@ -727,12 +747,16 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, > > if (e->flags & MISC_FMT_OPEN_FILE) { > struct file *f; > + const struct cred *old_cred; > > + old_cred = override_creds(file->f_cred); > f = open_exec(e->interpreter); > + revert_creds(old_cred); > if (IS_ERR(f)) { > err = PTR_ERR(f); > pr_notice("register: failed to install interpreter file %s\n", e->interpreter); > - simple_release_fs(&bm_mnt, &entry_count); > + simple_release_fs(&ns->bm_mnt, > + &ns->entry_count); > iput(inode); > inode = NULL; > goto out2; > @@ -745,9 +769,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer, > inode->i_fop = &bm_entry_operations; > > d_instantiate(dentry, inode); > - write_lock(&entries_lock); > - list_add(&e->list, &entries); > - write_unlock(&entries_lock); > + write_lock(&ns->entries_lock); > + list_add(&e->list, &ns->entries); > + write_unlock(&ns->entries_lock); > > err = 0; > out2: > @@ -772,7 +796,9 @@ static const struct file_operations bm_register_operations = { > static ssize_t > bm_status_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) > { > - char *s = enabled ? "enabled\n" : "disabled\n"; > + struct binfmt_namespace *ns = > + binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > + char *s = ns->enabled ? "enabled\n" : "disabled\n"; > > return simple_read_from_buffer(buf, nbytes, ppos, s, strlen(s)); > } > @@ -780,25 +806,28 @@ bm_status_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) > static ssize_t bm_status_write(struct file *file, const char __user *buffer, > size_t count, loff_t *ppos) > { > + struct binfmt_namespace *ns; > int res = parse_command(buffer, count); > struct dentry *root; > > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > switch (res) { > case 1: > /* Disable all handlers. */ > - enabled = 0; > + ns->enabled = 0; > break; > case 2: > /* Enable all handlers. */ > - enabled = 1; > + ns->enabled = 1; > break; > case 3: > /* Delete all handlers. */ > root = file_inode(file)->i_sb->s_root; > inode_lock(d_inode(root)); > > - while (!list_empty(&entries)) > - kill_node(list_first_entry(&entries, Node, list)); > + while (!list_empty(&ns->entries)) > + kill_node(ns, list_first_entry(&ns->entries, > + Node, list)); > > inode_unlock(d_inode(root)); > break; > @@ -825,24 +854,53 @@ static const struct super_operations s_ops = { > static int bm_fill_super(struct super_block *sb, struct fs_context *fc) > { > int err; > + struct user_namespace *ns = sb->s_user_ns; > static const struct tree_descr bm_files[] = { > [2] = {"status", &bm_status_operations, S_IWUSR|S_IRUGO}, > [3] = {"register", &bm_register_operations, S_IWUSR}, > /* last one */ {""} > }; > > + /* create a new binfmt namespace > + * if we are not in the first user namespace > + * but the binfmt namespace is the first one > + */ > + if (READ_ONCE(ns->binfmt_ns) == NULL) { > + struct binfmt_namespace *new_ns; > + > + new_ns = kmalloc(sizeof(struct binfmt_namespace), > + GFP_KERNEL); > + if (new_ns == NULL) > + return -ENOMEM; > + INIT_LIST_HEAD(&new_ns->entries); > + new_ns->enabled = 1; > + rwlock_init(&new_ns->entries_lock); > + new_ns->bm_mnt = NULL; > + new_ns->entry_count = 0; > + /* ensure new_ns is completely initialized before sharing it */ > + smp_wmb(); > + WRITE_ONCE(ns->binfmt_ns, new_ns); > + } > + > err = simple_fill_super(sb, BINFMTFS_MAGIC, bm_files); > if (!err) > sb->s_op = &s_ops; > return err; > } > > +static void bm_free(struct fs_context *fc) > +{ > + if (fc->s_fs_info) > + put_user_ns(fc->s_fs_info); > +} > + > static int bm_get_tree(struct fs_context *fc) > { > - return get_tree_single(fc, bm_fill_super); > + return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns)); > } > > static const struct fs_context_operations bm_context_ops = { > + .free = bm_free, > .get_tree = bm_get_tree, > }; > > @@ -861,6 +919,7 @@ static struct file_system_type bm_fs_type = { > .owner = THIS_MODULE, > .name = "binfmt_misc", > .init_fs_context = bm_init_fs_context, > + .fs_flags = FS_USERNS_MOUNT, > .kill_sb = kill_litter_super, > }; > MODULE_ALIAS_FS("binfmt_misc"); > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index fb9f4f799554..16e6f3a97a01 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -52,6 +52,18 @@ enum ucount_type { > UCOUNT_COUNTS, > }; > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > +struct binfmt_namespace { > + struct list_head entries; > + rwlock_t entries_lock; > + int enabled; > + struct vfsmount *bm_mnt; > + int entry_count; > +} __randomize_layout; > + > +extern struct binfmt_namespace init_binfmt_ns; > +#endif > + > struct user_namespace { > struct uid_gid_map uid_map; > struct uid_gid_map gid_map; > @@ -86,6 +98,9 @@ struct user_namespace { > #endif > struct ucounts *ucounts; > int ucount_max[UCOUNT_COUNTS]; > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > + struct binfmt_namespace *binfmt_ns; > +#endif > } __randomize_layout; > > struct ucounts { > diff --git a/kernel/user.c b/kernel/user.c > index 5235d7f49982..092b2b4d47a6 100644 > --- a/kernel/user.c > +++ b/kernel/user.c > @@ -20,6 +20,17 @@ > #include <linux/user_namespace.h> > #include <linux/proc_ns.h> > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > +struct binfmt_namespace init_binfmt_ns = { > + .entries = LIST_HEAD_INIT(init_binfmt_ns.entries), > + .enabled = 1, > + .entries_lock = __RW_LOCK_UNLOCKED(init_binfmt_ns.entries_lock), > + .bm_mnt = NULL, > + .entry_count = 0, > +}; > +EXPORT_SYMBOL_GPL(init_binfmt_ns); > +#endif > + > /* > * userns count is 1 for root user, 1 for init_uts_ns, > * and 1 for... ? > @@ -67,6 +78,9 @@ struct user_namespace init_user_ns = { > .keyring_name_list = LIST_HEAD_INIT(init_user_ns.keyring_name_list), > .keyring_sem = __RWSEM_INITIALIZER(init_user_ns.keyring_sem), > #endif > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > + .binfmt_ns = &init_binfmt_ns, > +#endif > }; > EXPORT_SYMBOL_GPL(init_user_ns); > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 8eadadc478f9..f42c32269e20 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -191,6 +191,9 @@ static void free_user_ns(struct work_struct *work) > kfree(ns->projid_map.forward); > kfree(ns->projid_map.reverse); > } > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > + kfree(ns->binfmt_ns); > +#endif > retire_userns_sysctls(ns); > key_free_user_ns(ns); > ns_free_inum(&ns->ns); > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-03 9:44 ` Laurent Vivier @ 2019-12-13 17:51 ` Henning Schild 2019-12-13 19:59 ` James Bottomley 0 siblings, 1 reply; 10+ messages in thread From: Henning Schild @ 2019-12-13 17:51 UTC (permalink / raw) To: Laurent Vivier Cc: linux-kernel, Dmitry Safonov, linux-fsdevel, James Bottomley, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka Hi all, that is a very useful contribution, which will hopefully be considered. Tested-by: Henning Schild <henning.schild@siemens.com> Having a private binfmt_misc allows having different interpreters (paths to them) for the same format in each namespace. Imagine chroots of different architectures where the static qemu is in a different place i.e. /usr/bin/qemu-arm-static, vs /sbin/qemu-arm-static, or /home/dev/qemu/qemu-arm-static And it also eventually allows setting up binfmt_misc for a chroot without requiring root. Henning Am Tue, 3 Dec 2019 10:44:44 +0100 schrieb Laurent Vivier <laurent@vivier.eu>: > Ping > > Thanks, > Laurent > > Le 07/11/2019 à 15:03, Laurent Vivier a écrit : > > This patch allows to have a different binfmt_misc configuration > > for each new user namespace. By default, the binfmt_misc > > configuration is the one of the previous level, but if the > > binfmt_misc filesystem is mounted in the new namespace a new empty > > binfmt instance is created and used in this namespace. > > > > For instance, using "unshare" we can start a chroot of another > > architecture and configure the binfmt_misc interpreter without > > being root to run the binaries in this chroot. > > > > Signed-off-by: Laurent Vivier <laurent@vivier.eu> > > Acked-by: Andrei Vagin <avagin@gmail.com> > > --- > > fs/binfmt_misc.c | 115 > > +++++++++++++++++++++++++-------- include/linux/user_namespace.h | > > 15 +++++ kernel/user.c | 14 ++++ > > kernel/user_namespace.c | 3 + > > 4 files changed, 119 insertions(+), 28 deletions(-) > > > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > > index cdb45829354d..ba5f0d2ade96 100644 > > --- a/fs/binfmt_misc.c > > +++ b/fs/binfmt_misc.c > > @@ -40,9 +40,6 @@ enum { > > VERBOSE_STATUS = 1 /* make it zero to save 400 bytes > > kernel memory */ }; > > > > -static LIST_HEAD(entries); > > -static int enabled = 1; > > - > > enum {Enabled, Magic}; > > #define MISC_FMT_PRESERVE_ARGV0 (1 << 31) > > #define MISC_FMT_OPEN_BINARY (1 << 30) > > @@ -62,10 +59,7 @@ typedef struct { > > struct file *interp_file; > > } Node; > > > > -static DEFINE_RWLOCK(entries_lock); > > static struct file_system_type bm_fs_type; > > -static struct vfsmount *bm_mnt; > > -static int entry_count; > > > > /* > > * Max length of the register string. Determined by: > > @@ -82,18 +76,37 @@ static int entry_count; > > */ > > #define MAX_REGISTER_LENGTH 1920 > > > > +static struct binfmt_namespace *binfmt_ns(struct user_namespace > > *ns) +{ > > + struct binfmt_namespace *b_ns; > > + > > + while (ns) { > > + b_ns = READ_ONCE(ns->binfmt_ns); > > + if (b_ns) > > + return b_ns; > > + ns = ns->parent; > > + } > > + /* as the first user namespace is initialized with > > + * &init_binfmt_ns we should never come here > > + * but we try to stay safe by logging a warning > > + * and returning a sane value > > + */ > > + WARN_ON_ONCE(1); > > + return &init_binfmt_ns; > > +} > > + > > /* > > * Check if we support the binfmt > > * if we do, return the node, else NULL > > * locking is done in load_misc_binary > > */ > > -static Node *check_file(struct linux_binprm *bprm) > > +static Node *check_file(struct binfmt_namespace *ns, struct > > linux_binprm *bprm) { > > char *p = strrchr(bprm->interp, '.'); > > struct list_head *l; > > > > /* Walk all the registered handlers. */ > > - list_for_each(l, &entries) { > > + list_for_each(l, &ns->entries) { > > Node *e = list_entry(l, Node, list); > > char *s; > > int j; > > @@ -135,17 +148,18 @@ static int load_misc_binary(struct > > linux_binprm *bprm) struct file *interp_file = NULL; > > int retval; > > int fd_binary = -1; > > + struct binfmt_namespace *ns = binfmt_ns(current_user_ns()); > > > > retval = -ENOEXEC; > > - if (!enabled) > > + if (!ns->enabled) > > return retval; > > > > /* to keep locking time low, we copy the interpreter > > string */ > > - read_lock(&entries_lock); > > - fmt = check_file(bprm); > > + read_lock(&ns->entries_lock); > > + fmt = check_file(ns, bprm); > > if (fmt) > > dget(fmt->dentry); > > - read_unlock(&entries_lock); > > + read_unlock(&ns->entries_lock); > > if (!fmt) > > return retval; > > > > @@ -611,19 +625,19 @@ static void bm_evict_inode(struct inode > > *inode) kfree(e); > > } > > > > -static void kill_node(Node *e) > > +static void kill_node(struct binfmt_namespace *ns, Node *e) > > { > > struct dentry *dentry; > > > > - write_lock(&entries_lock); > > + write_lock(&ns->entries_lock); > > list_del_init(&e->list); > > - write_unlock(&entries_lock); > > + write_unlock(&ns->entries_lock); > > > > dentry = e->dentry; > > drop_nlink(d_inode(dentry)); > > d_drop(dentry); > > dput(dentry); > > - simple_release_fs(&bm_mnt, &entry_count); > > + simple_release_fs(&ns->bm_mnt, &ns->entry_count); > > } > > > > /* /<entry> */ > > @@ -653,6 +667,9 @@ static ssize_t bm_entry_write(struct file > > *file, const char __user *buffer, struct dentry *root; > > Node *e = file_inode(file)->i_private; > > int res = parse_command(buffer, count); > > + struct binfmt_namespace *ns; > > + > > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > > > > switch (res) { > > case 1: > > @@ -669,7 +686,7 @@ static ssize_t bm_entry_write(struct file > > *file, const char __user *buffer, inode_lock(d_inode(root)); > > > > if (!list_empty(&e->list)) > > - kill_node(e); > > + kill_node(ns, e); > > > > inode_unlock(d_inode(root)); > > break; > > @@ -695,6 +712,7 @@ static ssize_t bm_register_write(struct file > > *file, const char __user *buffer, struct inode *inode; > > struct super_block *sb = file_inode(file)->i_sb; > > struct dentry *root = sb->s_root, *dentry; > > + struct binfmt_namespace *ns; > > int err = 0; > > > > e = create_entry(buffer, count); > > @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file > > *file, const char __user *buffer, if (!inode) > > goto out2; > > > > - err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count); > > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > > + err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt, > > + &ns->entry_count); > > if (err) { > > iput(inode); > > inode = NULL; > > @@ -727,12 +747,16 @@ static ssize_t bm_register_write(struct file > > *file, const char __user *buffer, > > if (e->flags & MISC_FMT_OPEN_FILE) { > > struct file *f; > > + const struct cred *old_cred; > > > > + old_cred = override_creds(file->f_cred); > > f = open_exec(e->interpreter); > > + revert_creds(old_cred); > > if (IS_ERR(f)) { > > err = PTR_ERR(f); > > pr_notice("register: failed to install > > interpreter file %s\n", e->interpreter); > > - simple_release_fs(&bm_mnt, &entry_count); > > + simple_release_fs(&ns->bm_mnt, > > + &ns->entry_count); > > iput(inode); > > inode = NULL; > > goto out2; > > @@ -745,9 +769,9 @@ static ssize_t bm_register_write(struct file > > *file, const char __user *buffer, inode->i_fop = > > &bm_entry_operations; > > d_instantiate(dentry, inode); > > - write_lock(&entries_lock); > > - list_add(&e->list, &entries); > > - write_unlock(&entries_lock); > > + write_lock(&ns->entries_lock); > > + list_add(&e->list, &ns->entries); > > + write_unlock(&ns->entries_lock); > > > > err = 0; > > out2: > > @@ -772,7 +796,9 @@ static const struct file_operations > > bm_register_operations = { static ssize_t > > bm_status_read(struct file *file, char __user *buf, size_t nbytes, > > loff_t *ppos) { > > - char *s = enabled ? "enabled\n" : "disabled\n"; > > + struct binfmt_namespace *ns = > > + > > binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > > + char *s = ns->enabled ? "enabled\n" : "disabled\n"; > > > > return simple_read_from_buffer(buf, nbytes, ppos, s, > > strlen(s)); } > > @@ -780,25 +806,28 @@ bm_status_read(struct file *file, char __user > > *buf, size_t nbytes, loff_t *ppos) static ssize_t > > bm_status_write(struct file *file, const char __user *buffer, > > size_t count, loff_t *ppos) { > > + struct binfmt_namespace *ns; > > int res = parse_command(buffer, count); > > struct dentry *root; > > > > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); > > switch (res) { > > case 1: > > /* Disable all handlers. */ > > - enabled = 0; > > + ns->enabled = 0; > > break; > > case 2: > > /* Enable all handlers. */ > > - enabled = 1; > > + ns->enabled = 1; > > break; > > case 3: > > /* Delete all handlers. */ > > root = file_inode(file)->i_sb->s_root; > > inode_lock(d_inode(root)); > > > > - while (!list_empty(&entries)) > > - kill_node(list_first_entry(&entries, Node, > > list)); > > + while (!list_empty(&ns->entries)) > > + kill_node(ns, > > list_first_entry(&ns->entries, > > + Node, > > list)); > > inode_unlock(d_inode(root)); > > break; > > @@ -825,24 +854,53 @@ static const struct super_operations s_ops = { > > static int bm_fill_super(struct super_block *sb, struct fs_context > > *fc) { > > int err; > > + struct user_namespace *ns = sb->s_user_ns; > > static const struct tree_descr bm_files[] = { > > [2] = {"status", &bm_status_operations, > > S_IWUSR|S_IRUGO}, [3] = {"register", &bm_register_operations, > > S_IWUSR}, /* last one */ {""} > > }; > > > > + /* create a new binfmt namespace > > + * if we are not in the first user namespace > > + * but the binfmt namespace is the first one > > + */ > > + if (READ_ONCE(ns->binfmt_ns) == NULL) { > > + struct binfmt_namespace *new_ns; > > + > > + new_ns = kmalloc(sizeof(struct binfmt_namespace), > > + GFP_KERNEL); > > + if (new_ns == NULL) > > + return -ENOMEM; > > + INIT_LIST_HEAD(&new_ns->entries); > > + new_ns->enabled = 1; > > + rwlock_init(&new_ns->entries_lock); > > + new_ns->bm_mnt = NULL; > > + new_ns->entry_count = 0; > > + /* ensure new_ns is completely initialized before > > sharing it */ > > + smp_wmb(); > > + WRITE_ONCE(ns->binfmt_ns, new_ns); > > + } > > + > > err = simple_fill_super(sb, BINFMTFS_MAGIC, bm_files); > > if (!err) > > sb->s_op = &s_ops; > > return err; > > } > > > > +static void bm_free(struct fs_context *fc) > > +{ > > + if (fc->s_fs_info) > > + put_user_ns(fc->s_fs_info); > > +} > > + > > static int bm_get_tree(struct fs_context *fc) > > { > > - return get_tree_single(fc, bm_fill_super); > > + return get_tree_keyed(fc, bm_fill_super, > > get_user_ns(fc->user_ns)); } > > > > static const struct fs_context_operations bm_context_ops = { > > + .free = bm_free, > > .get_tree = bm_get_tree, > > }; > > > > @@ -861,6 +919,7 @@ static struct file_system_type bm_fs_type = { > > .owner = THIS_MODULE, > > .name = "binfmt_misc", > > .init_fs_context = bm_init_fs_context, > > + .fs_flags = FS_USERNS_MOUNT, > > .kill_sb = kill_litter_super, > > }; > > MODULE_ALIAS_FS("binfmt_misc"); > > diff --git a/include/linux/user_namespace.h > > b/include/linux/user_namespace.h index fb9f4f799554..16e6f3a97a01 > > 100644 --- a/include/linux/user_namespace.h > > +++ b/include/linux/user_namespace.h > > @@ -52,6 +52,18 @@ enum ucount_type { > > UCOUNT_COUNTS, > > }; > > > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > +struct binfmt_namespace { > > + struct list_head entries; > > + rwlock_t entries_lock; > > + int enabled; > > + struct vfsmount *bm_mnt; > > + int entry_count; > > +} __randomize_layout; > > + > > +extern struct binfmt_namespace init_binfmt_ns; > > +#endif > > + > > struct user_namespace { > > struct uid_gid_map uid_map; > > struct uid_gid_map gid_map; > > @@ -86,6 +98,9 @@ struct user_namespace { > > #endif > > struct ucounts *ucounts; > > int ucount_max[UCOUNT_COUNTS]; > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > + struct binfmt_namespace *binfmt_ns; > > +#endif > > } __randomize_layout; > > > > struct ucounts { > > diff --git a/kernel/user.c b/kernel/user.c > > index 5235d7f49982..092b2b4d47a6 100644 > > --- a/kernel/user.c > > +++ b/kernel/user.c > > @@ -20,6 +20,17 @@ > > #include <linux/user_namespace.h> > > #include <linux/proc_ns.h> > > > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > +struct binfmt_namespace init_binfmt_ns = { > > + .entries = LIST_HEAD_INIT(init_binfmt_ns.entries), > > + .enabled = 1, > > + .entries_lock = > > __RW_LOCK_UNLOCKED(init_binfmt_ns.entries_lock), > > + .bm_mnt = NULL, > > + .entry_count = 0, > > +}; > > +EXPORT_SYMBOL_GPL(init_binfmt_ns); > > +#endif > > + > > /* > > * userns count is 1 for root user, 1 for init_uts_ns, > > * and 1 for... ? > > @@ -67,6 +78,9 @@ struct user_namespace init_user_ns = { > > .keyring_name_list = > > LIST_HEAD_INIT(init_user_ns.keyring_name_list), .keyring_sem = > > __RWSEM_INITIALIZER(init_user_ns.keyring_sem), #endif > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > + .binfmt_ns = &init_binfmt_ns, > > +#endif > > }; > > EXPORT_SYMBOL_GPL(init_user_ns); > > > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > > index 8eadadc478f9..f42c32269e20 100644 > > --- a/kernel/user_namespace.c > > +++ b/kernel/user_namespace.c > > @@ -191,6 +191,9 @@ static void free_user_ns(struct work_struct > > *work) kfree(ns->projid_map.forward); > > kfree(ns->projid_map.reverse); > > } > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > + kfree(ns->binfmt_ns); > > +#endif > > retire_userns_sysctls(ns); > > key_free_user_ns(ns); > > ns_free_inum(&ns->ns); > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-13 17:51 ` Henning Schild @ 2019-12-13 19:59 ` James Bottomley 2019-12-14 11:34 ` Laurent Vivier 0 siblings, 1 reply; 10+ messages in thread From: James Bottomley @ 2019-12-13 19:59 UTC (permalink / raw) To: Henning Schild, Laurent Vivier Cc: linux-kernel, Dmitry Safonov, linux-fsdevel, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka On Fri, 2019-12-13 at 18:51 +0100, Henning Schild wrote: > Hi all, > > that is a very useful contribution, which will hopefully be > considered. I'm technically the maintainer on the you touched it last you own it basis, so if Christian's concerns get addressed I'll shepherd it upstream. James ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-13 19:59 ` James Bottomley @ 2019-12-14 11:34 ` Laurent Vivier 2019-12-14 12:32 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 10+ messages in thread From: Laurent Vivier @ 2019-12-14 11:34 UTC (permalink / raw) To: James Bottomley, Henning Schild Cc: linux-kernel, Dmitry Safonov, linux-fsdevel, Eric Biederman, linux-api, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, containers, Alexander Viro, Jan Kiszka Le 13/12/2019 à 20:59, James Bottomley a écrit : > On Fri, 2019-12-13 at 18:51 +0100, Henning Schild wrote: >> Hi all, >> >> that is a very useful contribution, which will hopefully be >> considered. > > I'm technically the maintainer on the you touched it last you own it > basis, so if Christian's concerns get addressed I'll shepherd it > upstream. Thank you. I update this in the next days and re-send the patch. Thanks, Laurent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-14 11:34 ` Laurent Vivier @ 2019-12-14 12:32 ` Michael Kerrisk (man-pages) 2019-12-16 9:13 ` Laurent Vivier 0 siblings, 1 reply; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-14 12:32 UTC (permalink / raw) To: Laurent Vivier Cc: James Bottomley, Henning Schild, lkml, Dmitry Safonov, linux-fsdevel, Eric Biederman, Linux API, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, Containers, Alexander Viro, Jan Kiszka Hello Laurent, On Sat, 14 Dec 2019 at 12:35, Laurent Vivier <laurent@vivier.eu> wrote: > > Le 13/12/2019 à 20:59, James Bottomley a écrit : > > On Fri, 2019-12-13 at 18:51 +0100, Henning Schild wrote: > >> Hi all, > >> > >> that is a very useful contribution, which will hopefully be > >> considered. > > > > I'm technically the maintainer on the you touched it last you own it > > basis, so if Christian's concerns get addressed I'll shepherd it > > upstream. > > Thank you. > > I update this in the next days and re-send the patch. Would you also be so kind as to craft a patch for the user_namespaces(7) manual page describing the changes (sent to me, linux-man@vger.kernel.org, and the other parties already in CC)? If you do not have the time to familiarize yourself with groff/man markup, a patch that uses plain text is fine; I can handle the formatting. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-14 12:32 ` Michael Kerrisk (man-pages) @ 2019-12-16 9:13 ` Laurent Vivier 2019-12-16 15:34 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 10+ messages in thread From: Laurent Vivier @ 2019-12-16 9:13 UTC (permalink / raw) To: mtk.manpages Cc: James Bottomley, Henning Schild, lkml, Dmitry Safonov, linux-fsdevel, Eric Biederman, Linux API, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, Containers, Alexander Viro, Jan Kiszka Le 14/12/2019 à 13:32, Michael Kerrisk (man-pages) a écrit : > Hello Laurent, > > On Sat, 14 Dec 2019 at 12:35, Laurent Vivier <laurent@vivier.eu> wrote: >> >> Le 13/12/2019 à 20:59, James Bottomley a écrit : >>> On Fri, 2019-12-13 at 18:51 +0100, Henning Schild wrote: >>>> Hi all, >>>> >>>> that is a very useful contribution, which will hopefully be >>>> considered. >>> >>> I'm technically the maintainer on the you touched it last you own it >>> basis, so if Christian's concerns get addressed I'll shepherd it >>> upstream. >> >> Thank you. >> >> I update this in the next days and re-send the patch. > > Would you also be so kind as to craft a patch for the > user_namespaces(7) manual page describing the changes (sent to me, > linux-man@vger.kernel.org, and the other parties already in CC)? > > If you do not have the time to familiarize yourself with groff/man > markup, a patch that uses plain text is fine; I can handle the > formatting. I will send a patch for the user_namespaces(7) manual. Thanks, Laurent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-12-16 9:13 ` Laurent Vivier @ 2019-12-16 15:34 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-16 15:34 UTC (permalink / raw) To: Laurent Vivier Cc: James Bottomley, Henning Schild, lkml, Dmitry Safonov, linux-fsdevel, Eric Biederman, Linux API, Andrei Vagin, Cédric Le Goater, Greg Kurz, Jann Horn, Containers, Alexander Viro, Jan Kiszka Hi Laurent, On Mon, 16 Dec 2019 at 10:13, Laurent Vivier <laurent@vivier.eu> wrote: > > Le 14/12/2019 à 13:32, Michael Kerrisk (man-pages) a écrit : > > Hello Laurent, > > > > On Sat, 14 Dec 2019 at 12:35, Laurent Vivier <laurent@vivier.eu> wrote: > >> > >> Le 13/12/2019 à 20:59, James Bottomley a écrit : > >>> On Fri, 2019-12-13 at 18:51 +0100, Henning Schild wrote: > >>>> Hi all, > >>>> > >>>> that is a very useful contribution, which will hopefully be > >>>> considered. > >>> > >>> I'm technically the maintainer on the you touched it last you own it > >>> basis, so if Christian's concerns get addressed I'll shepherd it > >>> upstream. > >> > >> Thank you. > >> > >> I update this in the next days and re-send the patch. > > > > Would you also be so kind as to craft a patch for the > > user_namespaces(7) manual page describing the changes (sent to me, > > linux-man@vger.kernel.org, and the other parties already in CC)? > > > > If you do not have the time to familiarize yourself with groff/man > > markup, a patch that uses plain text is fine; I can handle the > > formatting. > > I will send a patch for the user_namespaces(7) manual. Thanks! Could you send that in parallel with (each of) the patch iterations please (rather than after the feature has been merged). Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace 2019-11-07 14:03 ` [PATCH v7 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier 2019-12-03 9:44 ` Laurent Vivier @ 2019-12-03 15:40 ` Christian Brauner 1 sibling, 0 replies; 10+ messages in thread From: Christian Brauner @ 2019-12-03 15:40 UTC (permalink / raw) To: Laurent Vivier Cc: linux-kernel, Henning Schild, Dmitry Safonov, linux-api, containers, Jann Horn, Greg Kurz, Alexander Viro, James Bottomley, Eric Biederman, Jan Kiszka, linux-fsdevel, Cédric Le Goater, dhowells On Thu, Nov 07, 2019 at 03:03:04PM +0100, Laurent Vivier wrote: > This patch allows to have a different binfmt_misc configuration > for each new user namespace. By default, the binfmt_misc configuration > is the one of the previous level, but if the binfmt_misc filesystem is > mounted in the new namespace a new empty binfmt instance is created and > used in this namespace. > > For instance, using "unshare" we can start a chroot of another > architecture and configure the binfmt_misc interpreter without being root > to run the binaries in this chroot. > > Signed-off-by: Laurent Vivier <laurent@vivier.eu> > Acked-by: Andrei Vagin <avagin@gmail.com> > --- > fs/binfmt_misc.c | 115 +++++++++++++++++++++++++-------- > include/linux/user_namespace.h | 15 +++++ > kernel/user.c | 14 ++++ > kernel/user_namespace.c | 3 + > 4 files changed, 119 insertions(+), 28 deletions(-) > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > index cdb45829354d..ba5f0d2ade96 100644 > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -40,9 +40,6 @@ enum { > VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ > }; > > -static LIST_HEAD(entries); > -static int enabled = 1; > - > enum {Enabled, Magic}; > #define MISC_FMT_PRESERVE_ARGV0 (1 << 31) > #define MISC_FMT_OPEN_BINARY (1 << 30) > @@ -62,10 +59,7 @@ typedef struct { > struct file *interp_file; > } Node; > > -static DEFINE_RWLOCK(entries_lock); > static struct file_system_type bm_fs_type; > -static struct vfsmount *bm_mnt; > -static int entry_count; > > /* > * Max length of the register string. Determined by: > @@ -82,18 +76,37 @@ static int entry_count; > */ > #define MAX_REGISTER_LENGTH 1920 > > +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns) > +{ > + struct binfmt_namespace *b_ns; > + > + while (ns) { > + b_ns = READ_ONCE(ns->binfmt_ns); > + if (b_ns) > + return b_ns; > + ns = ns->parent; > + } > + /* as the first user namespace is initialized with > + * &init_binfmt_ns we should never come here > + * but we try to stay safe by logging a warning > + * and returning a sane value > + */ > + WARN_ON_ONCE(1); > + return &init_binfmt_ns; > +} > + > /* > * Check if we support the binfmt > * if we do, return the node, else NULL > * locking is done in load_misc_binary > */ > -static Node *check_file(struct linux_binprm *bprm) > +static Node *check_file(struct binfmt_namespace *ns, struct linux_binprm *bprm) > { > char *p = strrchr(bprm->interp, '.'); > struct list_head *l; > > /* Walk all the registered handlers. */ > - list_for_each(l, &entries) { > + list_for_each(l, &ns->entries) { > Node *e = list_entry(l, Node, list); > char *s; > int j; > @@ -135,17 +148,18 @@ static int load_misc_binary(struct linux_binprm *bprm) > struct file *interp_file = NULL; > int retval; > int fd_binary = -1; > + struct binfmt_namespace *ns = binfmt_ns(current_user_ns()); > > retval = -ENOEXEC; > - if (!enabled) > + if (!ns->enabled) > return retval; > > /* to keep locking time low, we copy the interpreter string */ > - read_lock(&entries_lock); > - fmt = check_file(bprm); > + read_lock(&ns->entries_lock); > + fmt = check_file(ns, bprm); > if (fmt) > dget(fmt->dentry); > - read_unlock(&entries_lock); > + read_unlock(&ns->entries_lock); > if (!fmt) > return retval; > > @@ -611,19 +625,19 @@ static void bm_evict_inode(struct inode *inode) > kfree(e); > } > > -static void kill_node(Node *e) > +static void kill_node(struct binfmt_namespace *ns, Node *e) > { > struct dentry *dentry; > > - write_lock(&entries_lock); > + write_lock(&ns->entries_lock); > list_del_init(&e->list); > - write_unlock(&entries_lock); > + write_unlock(&ns->entries_lock); > > dentry = e->dentry; > drop_nlink(d_inode(dentry)); > d_drop(dentry); > dput(dentry); > - simple_release_fs(&bm_mnt, &entry_count); > + simple_release_fs(&ns->bm_mnt, &ns->entry_count); > } > > /* /<entry> */ > @@ -653,6 +667,9 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, > struct dentry *root; > Node *e = file_inode(file)->i_private; > int res = parse_command(buffer, count); > + struct binfmt_namespace *ns; > + > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); Sorry for being so late to the party. The naked dereferences here are not very pretty and also likely mean you access the wrong dentry when you do (weird but possible) things like: mount -t overlay overlay -o lowerdir=/proc/sys/fs/binfmt_misc,workdir=/work,upperdir=/upper /merged (which might already cause trouble in other parts of this code) so I think it's better if you replace file->f_path.dentry->d_sb->s_user_ns with file_dentry(file)->d_sb->s_user_ns in places where you do naked dereferences now. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-12-16 15:34 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-07 14:03 [PATCH v7 0/1] ns: introduce binfmt_misc namespace Laurent Vivier 2019-11-07 14:03 ` [PATCH v7 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier 2019-12-03 9:44 ` Laurent Vivier 2019-12-13 17:51 ` Henning Schild 2019-12-13 19:59 ` James Bottomley 2019-12-14 11:34 ` Laurent Vivier 2019-12-14 12:32 ` Michael Kerrisk (man-pages) 2019-12-16 9:13 ` Laurent Vivier 2019-12-16 15:34 ` Michael Kerrisk (man-pages) 2019-12-03 15:40 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).