From: Laurent Vivier <laurent@vivier.eu>
To: Jann Horn <jannh@google.com>
Cc: ktkhai@virtuozzo.com, kernel list <linux-kernel@vger.kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	dima@arista.com, Linux API <linux-api@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, Andrei Vagin <avagin@gmail.com>,
	containers@lists.linux-foundation.org
Subject: Re: [RFC v5 1/1] ns: add binfmt_misc to the user namespace
Date: Tue, 9 Oct 2018 18:57:41 +0200
Message-ID: <795fea1a-97bd-87f5-6a14-6d47a4e42f74@vivier.eu>
In-Reply-To: <CAG48ez0pAxC+yetPfM+XyS8kfqA4hw1hxjKG55gbF-4qcNWUiA@mail.gmail.com>

On 09/10/2018 at 18:53, Jann Horn wrote:
> On Tue, Oct 9, 2018 at 6:45 PM Laurent Vivier <laurent@vivier.eu> wrote:
>> On 09/10/2018 at 18:15, Kirill Tkhai wrote:
>>> On 09.10.2018 13:37, Laurent Vivier wrote:
>>>> This patch allows each new user namespace to have its own binfmt_misc
>>>> configuration. By default, a namespace uses the binfmt_misc configuration
>>>> of its parent level, but if the binfmt_misc filesystem is mounted in the
>>>> new namespace, a new empty binfmt instance is created and used in this
>>>> namespace.
>>>>
>>>> For instance, using "unshare" we can start a chroot of another
>>>> architecture and configure the binfmt_misc interpreter without being root
>>>> to run the binaries in this chroot.
>>>>
>>>> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
>>>> ---
>>>>  fs/binfmt_misc.c               | 106 ++++++++++++++++++++++++---------
>>>>  include/linux/user_namespace.h |  13 ++++
>>>>  kernel/user.c                  |  13 ++++
>>>>  kernel/user_namespace.c        |   3 +
>>>>  4 files changed, 107 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
>>>> index aa4a7a23ff99..1e0029d097d9 100644
>>>> --- a/fs/binfmt_misc.c
>>>> +++ b/fs/binfmt_misc.c
>> ...
>>>> @@ -80,18 +74,32 @@ static int entry_count;
>>>>   */
>>>>  #define MAX_REGISTER_LENGTH 1920
>>>>
>>>> +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns)
>>>> +{
>>>> +    struct binfmt_namespace *b_ns;
>>>> +
>>>> +    while (ns) {
>>>> +            b_ns = READ_ONCE(ns->binfmt_ns);
>>>> +            if (b_ns)
>>>> +                    return b_ns;
>>>> +            ns = ns->parent;
>>>> +    }
>>>> +    WARN_ON_ONCE(1);
>>>> +    return NULL;
>>>> +}
>>>> +
>> ...
>>>> @@ -823,12 +847,34 @@ static const struct super_operations s_ops = {
>>>>  static int bm_fill_super(struct super_block *sb, void *data, int silent)
>>>>  {
>>>>      int err;
>>>> +    struct user_namespace *ns = sb->s_user_ns;
>>>>      static const struct tree_descr bm_files[] = {
>>>>              [2] = {"status", &bm_status_operations, S_IWUSR|S_IRUGO},
>>>>              [3] = {"register", &bm_register_operations, S_IWUSR},
>>>>              /* last one */ {""}
>>>>      };
>>>>
>>>> +    /* create a new binfmt namespace
>>>> +     * if we are not in the first user namespace
>>>> +     * but the binfmt namespace is the first one
>>>> +     */
>>>> +    if (READ_ONCE(ns->binfmt_ns) == NULL) {
>>>> +            struct binfmt_namespace *new_ns;
>>>> +
>>>> +            new_ns = kmalloc(sizeof(struct binfmt_namespace),
>>>> +                             GFP_KERNEL);
>>>> +            if (new_ns == NULL)
>>>> +                    return -ENOMEM;
>>>> +            INIT_LIST_HEAD(&new_ns->entries);
>>>> +            new_ns->enabled = 1;
>>>> +            rwlock_init(&new_ns->entries_lock);
>>>> +            new_ns->bm_mnt = NULL;
>>>> +            new_ns->entry_count = 0;
>>>> +            /* ensure new_ns is completely initialized before sharing it */
>>>> +            smp_wmb();
>>>
>>> (I haven't dug into the patch logic; this is just a small remark about the barrier from a quick look.)
>>> smp_wmb() makes no sense without a paired smp_rmb() on the read side. Possibly
>>> you want something like the following in the read hunk:
>>>
>>> +             b_ns = READ_ONCE(ns->binfmt_ns);
>>> +             if (b_ns) {
>>> +                     smp_rmb();
>>> +                     return b_ns;
>>> +             }
>>>
>>>
>>
>> The write barrier is here to ensure the structure is fully written
>> before we set the pointer.
>>
>> I don't understand how a read barrier can change anything at this level;
>> IMHO the WRITE_ONCE()/READ_ONCE() pair should be enough to ensure that
>> both the pointer and the structure it points to are correctly initialized
>> when we read the pointer back.
>>
>> I think the pointer itself acts as the "barrier" for accessing the memory
>> that was modified before it was set.
> 
> Things don't work that way on Alpha, but that's why READ_ONCE()
> includes an smp_read_barrier_depends():
> 
> #define __READ_ONCE(x, check)                                           \
> ({                                                                      \
>         union { typeof(x) __val; char __c[1]; } __u;                    \
>         if (check)                                                      \
>                 __read_once_size(&(x), __u.__c, sizeof(x));             \
>         else                                                            \
>                 __read_once_size_nocheck(&(x), __u.__c, sizeof(x));     \
>         smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
>         __u.__val;                                                      \
> })
> #define READ_ONCE(x) __READ_ONCE(x, 1)
> 
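For readers less familiar with these kernel primitives, the pattern under discussion (fully initialize a structure, publish a pointer to it, then have a reader load the pointer and dereference it) can be sketched in userspace C11. This is only an analogy, not kernel code: `struct binfmt_ns_demo`, `shared_ns`, `publish_demo_ns()` and `demo_ns_enabled()` are hypothetical names. A release store plays the role of (and is slightly stronger than) `smp_wmb()` followed by `WRITE_ONCE()`, and a `memory_order_consume` load plays the role of the dependency ordering that `READ_ONCE()` provides; current GCC and Clang implement consume as the stronger acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct binfmt_namespace. */
struct binfmt_ns_demo {
    int enabled;
    int entry_count;
};

/* Shared pointer that readers look up (stand-in for ns->binfmt_ns). */
static _Atomic(struct binfmt_ns_demo *) shared_ns = NULL;

/* Writer: initialize every field first, then publish with a release store,
 * so a reader that sees the pointer also sees the initialized fields. */
static struct binfmt_ns_demo *publish_demo_ns(void)
{
    struct binfmt_ns_demo *n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;
    n->enabled = 1;
    n->entry_count = 0;
    atomic_store_explicit(&shared_ns, n, memory_order_release);
    return n;
}

/* Reader: dependency-ordered load of the pointer, then a dereference.
 * Returns -1 if nothing has been published yet. */
static int demo_ns_enabled(void)
{
    struct binfmt_ns_demo *n =
        atomic_load_explicit(&shared_ns, memory_order_consume);
    if (n == NULL)
        return -1;
    return n->enabled;
}
```

The reader only touches `enabled` through the pointer it has just loaded, so the ordering it relies on comes from that load itself; the sketch contains no separate read barrier.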

So my questions are:

- do we need an smp_wmb() barrier if we use READ_ONCE() and WRITE_ONCE()?

- if we need an smp_wmb() barrier, do we also need an smp_rmb() barrier,
given that the data we want to "protect" is only reached through the pointer?
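A minimal sketch of the usual pairing, as generally described in the kernel's memory-barriers documentation, assuming that description applies here. WRITE_ONCE() alone does not order the earlier initialization stores, so the publishing store needs either `smp_wmb()` before it or `smp_store_release()`. On the read side it pairs with an ordered load of the pointer. For a pointer that is then dereferenced, that is the address dependency `READ_ONCE()` enforces (or `smp_load_acquire()`), rather than a separate `smp_rmb()`. The userspace C11 model below mirrors the patch's parent-walking lookup and first-mount creation. `struct uns`, `struct bns`, `lookup_bns()` and `mount_here()` are hypothetical names, and the model assumes concurrent mounts of the same namespace are serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of struct binfmt_namespace. */
struct bns {
    int enabled;
    int entry_count;
};

/* Hypothetical model of struct user_namespace. */
struct uns {
    _Atomic(struct bns *) binfmt_ns; /* NULL until binfmt_misc is mounted here */
    struct uns *parent;              /* immutable after creation */
};

/* Like binfmt_ns() in the patch: walk up until some level owns an instance.
 * The acquire load pairs with the release store in mount_here(). */
static struct bns *lookup_bns(struct uns *ns)
{
    while (ns) {
        struct bns *b =
            atomic_load_explicit(&ns->binfmt_ns, memory_order_acquire);
        if (b)
            return b;
        ns = ns->parent;
    }
    return NULL;
}

/* Like the bm_fill_super() hunk: create an empty instance on first mount.
 * Assumes the caller serializes mounts of the same namespace. */
static struct bns *mount_here(struct uns *ns)
{
    struct bns *b = atomic_load_explicit(&ns->binfmt_ns, memory_order_acquire);
    if (b)
        return b;
    b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->enabled = 1;
    b->entry_count = 0;
    /* Release store: plays the role of smp_wmb() followed by WRITE_ONCE(). */
    atomic_store_explicit(&ns->binfmt_ns, b, memory_order_release);
    return b;
}
```

Until the child mounts binfmt_misc, its lookups resolve to the parent's instance. After the mount, they resolve to its own empty instance. The acquire loads in the sketch are stronger than needed; the kernel-side equivalent of the lookup could rely on `READ_ONCE()` alone.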

Thanks,
Laurent


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=795fea1a-97bd-87f5-6a14-6d47a4e42f74@vivier.eu \
    --to=laurent@vivier.eu \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=avagin@gmail.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=dima@arista.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.