All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Laurent Vivier <laurent@vivier.eu>
Cc: "kernel list" <linux-kernel@vger.kernel.org>,
	"Greg Kurz" <groug@kaod.org>, "Andrei Vagin" <avagin@gmail.com>,
	"Linux API" <linux-api@vger.kernel.org>,
	"Dmitry Safonov" <dima@arista.com>,
	"James Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Linux Containers" <containers@lists.linux-foundation.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Eric Biederman" <ebiederm@xmission.com>,
	"Henning Schild" <henning.schild@siemens.com>,
	"Cédric Le Goater" <clg@kaod.org>
Subject: Re: [PATCH v8 1/1] ns: add binfmt_misc to the user namespace
Date: Mon, 16 Dec 2019 23:53:54 +0100	[thread overview]
Message-ID: <CAG48ez2YE33KiuhnHa=cq_DymqWLAv9CyeD3BOrjsStKfb_dBQ@mail.gmail.com> (raw)
In-Reply-To: <15d270a6-2264-adc5-3f56-fdb8b67ad267@vivier.eu>

On Mon, Dec 16, 2019 at 9:05 PM Laurent Vivier <laurent@vivier.eu> wrote:
> Le 16/12/2019 à 20:08, Jann Horn a écrit :
> > On Mon, Dec 16, 2019 at 10:12 AM Laurent Vivier <laurent@vivier.eu> wrote:
> >> This patch allows to have a different binfmt_misc configuration
> >> for each new user namespace. By default, the binfmt_misc configuration
> >> is the one of the previous level, but if the binfmt_misc filesystem is
> >> mounted in the new namespace a new empty binfmt instance is created and
> >> used in this namespace.
> >>
> >> For instance, using "unshare" we can start a chroot of another
> >> architecture and configure the binfmt_misc interpreter without being root
> >> to run the binaries in this chroot.
> >
> > How do you ensure that when userspace is no longer using the user
> > namespace and mount namespace, the entries and the binfmt_misc
> > superblock are deleted? As far as I can tell from looking at the code,
> > at the moment, if I create a user namespace+mount namespace, mount
> > binfmt_misc in there, register a file format and then let all
> > processes inside the namespaces exit, the binfmt_misc mount will be
> > kept alive by the simple_pin_fs() stuff, and the binfmt_misc entries
> > will also stay in memory.
> >
> > [...]
>
> Do you have any idea how I can fix this issue?

I think the easiest way (keeping in mind that we want to avoid having
to fiddle around with reference loops, where e.g. interpreter
executable files opened by binfmt_misc have references back to the
user namespace through ->f_cred) would be to add a new patch in front
of this one that changes the semantics such that when binfmt_misc is
unmounted, all the existing format registrations are deleted. That's
probably also nicer from the perspective of inspectability. It could
in theory break stuff, but I think that's probably somewhat unlikely.
Still, it'd be an API change, and therefore you should CC linux-api@
on such a change.

> >> @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
> >>         if (!inode)
> >>                 goto out2;
> >>
> >> -       err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
> >> +       ns = binfmt_ns(file_dentry(file)->d_sb->s_user_ns);
> >> +       err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt,
> >> +                           &ns->entry_count);
> >
> > When you call simple_pin_fs() here, the user namespace of `current`
> > and the user namespace of the superblock are not necessarily related.
> > So simple_pin_fs() may end up taking a reference on the mountpoint for
> > a user namespace that has nothing to do with the namespace for which
> > an entry is being created.
>
> Do you have any idea how I can fix this issue?

If you fix the memory leak the way I suggested, this wouldn't be a
problem anymore either.

> >> +static void bm_free(struct fs_context *fc)
> >> +{
> >> +       if (fc->s_fs_info)
> >> +               put_user_ns(fc->s_fs_info);
> >> +}
> >
> > Silly question: Why the "if"? Can you ever reach this with fc->s_fs_info==NULL?
>
> So I understand the if is unnecessary and I will remove it.

Your code was actually exactly right, I didn't understand how
fc->s_fs_info works.

> >>  static int bm_get_tree(struct fs_context *fc)
> >>  {
> >> -       return get_tree_single(fc, bm_fill_super);
> >> +       return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns));
> >
> > get_user_ns() increments the refcount of the namespace, but in the
> > case where a binfmt_misc mount already exists, that refcount is never
> > dropped, right? That would be a security bug, since an attacker could
> > overflow the refcount of the user namespace and then trigger a UAF.
> > (And the refcount hardening won't catch it because user namespaces
> > still use raw atomics instead of refcount_t.)
>
> Do you have any idea how I can fix this issue?

Ah, this was actually fine. I missed that get_tree_keyed() stashes
that pointer in fc->s_fs_info.

> >> +#if IS_ENABLED(CONFIG_BINFMT_MISC)
> >
> > Nit: Isn't this kind of check normally written as "#ifdef"?
> >
>
> What is the difference?

As explained in Documentation/process/coding-style.rst and the
relevant header, IS_ENABLED() is for inline use in C expressions.

  reply	other threads:[~2019-12-16 22:54 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-16  9:12 [PATCH v8 0/1] ns: introduce binfmt_misc namespace Laurent Vivier
2019-12-16  9:12 ` [PATCH v8 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier
2019-12-16 19:08   ` Jann Horn
2019-12-16 20:05     ` Laurent Vivier
2019-12-16 22:53       ` Jann Horn [this message]
2021-01-08  8:22   ` Jan Kiszka
2021-01-08  8:22     ` Jan Kiszka
2021-01-18 19:51     ` Laurent Vivier
2021-01-18 19:51       ` Laurent Vivier
2023-06-30  8:38       ` Norbert Lange
2023-06-30  8:52         ` Laurent Vivier
2023-06-30  9:06           ` Christian Brauner
2023-07-12 19:40             ` Kees Cook
2023-09-06 10:28               ` Norbert Lange
2023-10-11  0:36                 ` Kees Cook
2019-12-16  9:46 ` [PATCH v8 0/1] ns: introduce binfmt_misc namespace Christian Brauner
2019-12-16  9:53   ` Laurent Vivier
2019-12-16 10:06     ` Christian Brauner
2019-12-16 10:08       ` Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez2YE33KiuhnHa=cq_DymqWLAv9CyeD3BOrjsStKfb_dBQ@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=avagin@gmail.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=clg@kaod.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dima@arista.com \
    --cc=ebiederm@xmission.com \
    --cc=groug@kaod.org \
    --cc=henning.schild@siemens.com \
    --cc=jan.kiszka@siemens.com \
    --cc=laurent@vivier.eu \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.