From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ot1-f67.google.com ([209.85.210.67]:35090 "EHLO mail-ot1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727181AbeJHSih (ORCPT ); Mon, 8 Oct 2018 14:38:37 -0400
Received: by mail-ot1-f67.google.com with SMTP id j9-v6so19245251otl.2 for ; Mon, 08 Oct 2018 04:27:21 -0700 (PDT)
MIME-Version: 1.0
References: <20181006193546.29340-1-laurent@vivier.eu> <20181006193546.29340-2-laurent@vivier.eu>
In-Reply-To: <20181006193546.29340-2-laurent@vivier.eu>
From: Jann Horn
Date: Mon, 8 Oct 2018 13:26:54 +0200
Message-ID:
Subject: Re: [RFC v4 1/1] ns: add binfmt_misc to the user namespace
To: Laurent Vivier
Cc: kernel list, avagin@gmail.com, linux-fsdevel@vger.kernel.org, "Eric W. Biederman", Linux API, dima@arista.com, containers@lists.linux-foundation.org, Al Viro, James Bottomley
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Sat, Oct 6, 2018 at 9:36 PM Laurent Vivier wrote:
> This patch allows to have a different binfmt_misc configuration
> for each new user namespace. By default, the binfmt_misc configuration
> is the one of the previous level, but if the binfmt_misc filesystem is
> mounted in the new namespace a new empty binfmt instance is created and
> used in this namespace.
>
> For instance, using "unshare" we can start a chroot of an another
> architecture and configure the binfmt_misc interpreter without being root
> to run the binaries in this chroot.
>
> Signed-off-by: Laurent Vivier
> ---
[...]
> +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns)
> +{
> +	while (ns) {
> +		if (ns->binfmt_ns)
> +			return ns->binfmt_ns;
> +		ns = ns->parent;
> +	}
> +	return NULL;
> +}

If the value being read can change under you, please use READ_ONCE().

Also: That "return NULL" can never happen, right? You should probably
at least put a WARN(...) in there.

[...]
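For illustration, the READ_ONCE() and WARN suggestion for binfmt_ns() could
look roughly like the following. This is an untested userspace sketch, not
the patch itself: the struct definitions are cut down to the two fields the
loop touches, READ_ONCE() is a stand-in macro built on a volatile cast
(GCC/Clang __typeof__), and the fprintf() stands in for the kernel's
WARN(...). It also assumes the initial user namespace always carries a
binfmt namespace, which is what would make the final return unreachable.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's READ_ONCE(); assumption for this sketch. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Cut-down stand-ins for the kernel structures (hypothetical layout). */
struct binfmt_namespace {
	int enabled;
};

struct user_namespace {
	struct user_namespace *parent;
	struct binfmt_namespace *binfmt_ns;
};

static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns)
{
	while (ns) {
		/* Load the pointer once so the test and the return see the same value. */
		struct binfmt_namespace *b = READ_ONCE(ns->binfmt_ns);

		if (b)
			return b;
		ns = ns->parent;
	}
	/* Stand-in for WARN(...): reaching this point would indicate a bug. */
	fprintf(stderr, "binfmt_ns: no binfmt namespace in any ancestor\n");
	return NULL;
}
```

The point of the local variable is that ns->binfmt_ns is read a single time
per iteration; the original loop reads it twice, once for the check and once
for the return.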
> @@ -838,7 +858,29 @@ static int bm_fill_super(struct super_block *sb, void *data, int silent)
>  static struct dentry *bm_mount(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data)
>  {
> -	return mount_single(fs_type, flags, data, bm_fill_super);
> +	struct user_namespace *ns = current_user_ns();
> +
> +	/* create a new binfmt namespace
> +	 * if we are not in the first user namespace
> +	 * but the binfmt namespace is the first one
> +	 */
> +	if (ns->binfmt_ns == NULL) {
> +		struct binfmt_namespace *new_ns;
> +
> +		new_ns = kmalloc(sizeof(struct binfmt_namespace),
> +				 GFP_KERNEL);
> +		if (new_ns == NULL)
> +			return ERR_PTR(-ENOMEM);
> +		INIT_LIST_HEAD(&new_ns->entries);
> +		new_ns->enabled = 1;
> +		rwlock_init(&new_ns->entries_lock);
> +		new_ns->bm_mnt = NULL;
> +		new_ns->entry_count = 0;
> +		ns->binfmt_ns = new_ns;

What happens if someone mounts two instances of the binfmt_misc
filesystem at the same time? Would you end up creating two binfmt
namespaces, one of which would never be freed again?

> +	}
> +
> +	return mount_ns(fs_type, flags, data, ns, ns,
> +			bm_fill_super);
>  }
[...]
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index e5222b5fb4fe..da4950282ea1 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -140,6 +140,10 @@ int create_user_ns(struct cred *new)
>  	if (!setup_userns_sysctls(ns))
>  		goto fail_keyring;
>
> +#if IS_ENABLED(CONFIG_BINFMT_MISC)
> +	ns->binfmt_ns = NULL;
> +#endif

Isn't this unnecessary? The namespace is allocated with all fields zeroed:

	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
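
To make the concurrent-mount concern in bm_mount() concrete, here is one
possible shape of a fix, as an untested userspace sketch rather than a
proposed patch. It publishes the new binfmt namespace with a
compare-and-swap so that only one of two racing mounters installs its
allocation and the loser frees its copy. C11 atomics stand in for the
kernel's cmpxchg(), calloc()/free() stand in for the kernel allocator, and
get_or_create_binfmt_ns() and the cut-down structs are hypothetical names
for illustration only. Taking a lock around the check and the assignment
would be another way to get the same guarantee.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Cut-down stand-ins for the kernel structures (hypothetical layout). */
struct binfmt_namespace {
	int enabled;
	int entry_count;
};

struct user_namespace {
	_Atomic(struct binfmt_namespace *) binfmt_ns;
};

/*
 * Return the binfmt namespace attached to @ns, creating it if needed.
 * At most one allocation is ever installed; returns NULL only when the
 * allocation fails.
 */
static struct binfmt_namespace *
get_or_create_binfmt_ns(struct user_namespace *ns)
{
	struct binfmt_namespace *cur = atomic_load(&ns->binfmt_ns);
	struct binfmt_namespace *expected = NULL;
	struct binfmt_namespace *new_ns;

	if (cur)
		return cur;

	new_ns = calloc(1, sizeof(*new_ns));
	if (!new_ns)
		return NULL;
	new_ns->enabled = 1;

	/* Install new_ns only if the slot is still NULL. */
	if (atomic_compare_exchange_strong(&ns->binfmt_ns, &expected, new_ns))
		return new_ns;

	/* Lost the race: drop our copy, use the one already installed. */
	free(new_ns);
	return expected;
}
```

With the plain "if (ns->binfmt_ns == NULL) { ...; ns->binfmt_ns = new_ns; }"
sequence from the patch, two mounters can both pass the NULL check, and the
second assignment overwrites the first pointer without freeing it.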