linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	raven@themaw.net, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-block@vger.kernel.org, keyrings@vger.kernel.org,
	linux-security-module <linux-security-module@vger.kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/7] General notification queue with user mmap()'able ring buffer
Date: Wed, 29 May 2019 01:16:26 +0200	[thread overview]
Message-ID: <CAG48ez3W=Og7_9P9X6fatkCw4Zwbz-N35wRGFFNK8q9k0qL5iw@mail.gmail.com> (raw)
In-Reply-To: <11466.1559082515@warthog.procyon.org.uk>

On Wed, May 29, 2019 at 12:28 AM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
> > I don't see you setting any special properties on the VMA that would
> > prevent userspace from extending its size via mremap() - no
> > VM_DONTEXPAND or VM_PFNMAP. So I think you might get an out-of-bounds
> > access here?
>
> Should I just set VM_DONTEXPAND in watch_queue_mmap()?  Like so:
>
>         vma->vm_flags |= VM_DONTEXPAND;

Yeah, that should work.

> > > +static void watch_queue_map_pages(struct vm_fault *vmf,
> > > +                                 pgoff_t start_pgoff, pgoff_t end_pgoff)
> > ...
> >
> > Same as above.
>
> Same solution as above?  Or do I need ot check start/end_pgoff too?

Same solution.

> > > +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma)
> > > +{
> > > +       struct watch_queue *wqueue = file->private_data;
> > > +
> > > +       if (vma->vm_pgoff != 0 ||
> > > +           vma->vm_end - vma->vm_start > wqueue->nr_pages * PAGE_SIZE ||
> > > +           !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED)))
> > > +               return -EINVAL;
> >
> > This thing should probably have locking against concurrent
> > watch_queue_set_size()?
>
> Yeah.
>
>         static int watch_queue_mmap(struct file *file,
>                                     struct vm_area_struct *vma)
>         {
>                 struct watch_queue *wqueue = file->private_data;
>                 struct inode *inode = file_inode(file);
>                 u8 nr_pages;
>
>                 inode_lock(inode);
>                 nr_pages = wqueue->nr_pages;
>                 inode_unlock(inode);
>
>                 if (nr_pages == 0 ||
>                 ...
>                         return -EINVAL;

Looks reasonable.

> > > +       smp_store_release(&buf->meta.head, len);
> >
> > Why is this an smp_store_release()? The entire buffer should not be visible to
> > userspace before this setup is complete, right?
>
> Yes - if I put the above locking in the mmap handler.  OTOH, it's a one-off
> barrier...
>
> > > +               if (wqueue->buffer)
> > > +                       return -EBUSY;
> >
> > The preceding check occurs without any locks held and therefore has no
> > reliable effect. It should probably be moved below the
> > inode_lock(...).
>
> Yeah, it can race.  I'll move it into watch_queue_set_size().

Sounds good.

> > > +static void free_watch(struct rcu_head *rcu)
> > > +{
> > > +       struct watch *watch = container_of(rcu, struct watch, rcu);
> > > +
> > > +       put_watch_queue(rcu_access_pointer(watch->queue));
> >
> > This should be rcu_dereference_protected(..., 1).
>
> That shouldn't be necessary.  rcu_access_pointer()'s description says:
>
>  * It is also permissible to use rcu_access_pointer() when read-side
>  * access to the pointer was removed at least one grace period ago, as
>  * is the case in the context of the RCU callback that is freeing up
>  * the data, ...
>
> It's in an rcu callback function, so accessing the __rcu pointers in the RCU'd
> struct should be fine with rcu_access_pointer().

Aah, whoops, you're right, I missed that paragraph in the
documentation of rcu_access_pointer().

> > > +       /* We don't need the watch list lock for the next bit as RCU is
> > > +        * protecting everything from being deallocated.
> >
> > Does "everything" mean "the wqueue" or more than that?
>
> Actually, just 'wqueue' and its buffer.  'watch' is held by us once we've
> dequeued it as we now own the ref 'wlist' had on it.  'wlist' and 'wq' must be
> pinned by the caller.
>
> > > +                       if (release) {
> > > +                               if (wlist->release_watch) {
> > > +                                       rcu_read_unlock();
> > > +                                       /* This might need to call dput(), so
> > > +                                        * we have to drop all the locks.
> > > +                                        */
> > > +                                       wlist->release_watch(wlist, watch);
> >
> > How are you holding a reference to `wlist` here? You got the reference through
> > rcu_dereference(), you've dropped the RCU read lock, and I don't see anything
> > that stabilizes the reference.
>
> The watch record must hold a ref on the watched object if the watch_list has a
> ->release_watch() method.  In the code snippet above, the watch record now
> belongs to us because we unlinked it under the wlist->lock some lines prior.

Ah, of course.

> However, you raise a good point, and I think the thing to do is to cache
> ->release_watch from it and not pass wlist into (*release_watch)().  We don't
> need to concern ourselves with cleaning up *wlist as it will be cleaned up
> when the target object is removed.
>
> Keyrings don't have a ->release_watch method and neither does the block-layer
> notification stuff.
>
> > > +       if (wqueue->pages && wqueue->pages[0])
> > > +               WARN_ON(page_ref_count(wqueue->pages[0]) != 1);
> >
> > Is there a reason why there couldn't still be references to the pages
> > from get_user_pages()/get_user_pages_fast()?
>
> I'm not sure.  I'm not sure what to do if there are.  What do you suggest?

I would use put_page() instead of manually freeing it; I think that
should be enough? I'm not entirely sure though.

> > > +       n->info &= (WATCH_INFO_LENGTH | WATCH_INFO_TYPE_FLAGS | WATCH_INFO_ID);
> >
> > Should the non-atomic modification of n->info
>
> n's an unpublished copy of some userspace data that's local to the function
> instance.  There shouldn't be any way to race with it at this point.

Ah, derp. Yes.

  reply	other threads:[~2019-05-28 23:16 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-28 16:01 [RFC][PATCH 0/7] Mount, FS, Block and Keyrings notifications David Howells
2019-05-28 16:01 ` [PATCH 1/7] General notification queue with user mmap()'able ring buffer David Howells
2019-05-28 16:26   ` Greg KH
2019-05-28 17:30   ` David Howells
2019-05-28 23:12     ` Greg KH
2019-05-29 16:06     ` David Howells
2019-05-29 17:46       ` Jann Horn
2019-05-29 21:02       ` David Howells
2019-05-31 11:14         ` Peter Zijlstra
2019-05-31 12:02         ` David Howells
2019-05-31 13:26           ` Peter Zijlstra
2019-05-31 14:20           ` David Howells
2019-05-31 16:44             ` Peter Zijlstra
2019-05-31 17:12             ` David Howells
2019-06-17 16:24               ` Peter Zijlstra
2019-05-29 23:09       ` Greg KH
2019-05-29 23:11       ` Greg KH
2019-05-30  9:50         ` Andrea Parri
2019-05-31  8:35           ` Peter Zijlstra
2019-05-31  8:47       ` Peter Zijlstra
2019-05-31 12:42       ` David Howells
2019-05-31 14:55       ` David Howells
2019-05-28 19:14   ` Jann Horn
2019-05-28 22:28   ` David Howells
2019-05-28 23:16     ` Jann Horn [this message]
2019-05-28 16:02 ` [PATCH 2/7] keys: Add a notification facility David Howells
2019-05-28 16:02 ` [PATCH 3/7] vfs: Add a mount-notification facility David Howells
2019-05-28 20:06   ` Jann Horn
2019-05-28 23:04   ` David Howells
2019-05-28 23:23     ` Jann Horn
2019-05-29 11:16     ` David Howells
2019-05-28 23:08   ` David Howells
2019-05-29 10:55   ` David Howells
2019-05-29 11:00   ` David Howells
2019-05-29 15:53     ` Casey Schaufler
2019-05-29 16:12       ` Jann Horn
2019-05-29 17:04         ` Casey Schaufler
2019-06-03 16:30         ` David Howells
2019-05-29 17:13       ` Andy Lutomirski
2019-05-29 17:46         ` Casey Schaufler
2019-05-29 18:11           ` Jann Horn
2019-05-29 19:28             ` Casey Schaufler
2019-05-29 19:47               ` Jann Horn
2019-05-29 20:50                 ` Casey Schaufler
2019-05-29 23:12           ` Andy Lutomirski
2019-05-29 23:56             ` Casey Schaufler
2019-05-28 16:02 ` [PATCH 4/7] vfs: Add superblock notifications David Howells
2019-05-28 20:27   ` Jann Horn
2019-05-29 12:58   ` David Howells
2019-05-29 14:16     ` Jann Horn
2019-05-28 16:02 ` [PATCH 5/7] fsinfo: Export superblock notification counter David Howells
2019-05-28 16:02 ` [PATCH 6/7] block: Add block layer notifications David Howells
2019-05-28 20:37   ` Jann Horn
2019-05-28 16:02 ` [PATCH 7/7] Add sample notification program David Howells
2019-05-28 23:58 ` [RFC][PATCH 0/7] Mount, FS, Block and Keyrings notifications Greg KH
2019-05-29  6:33 ` Amir Goldstein
2019-05-29 14:25   ` Jan Kara
2019-05-29 15:10     ` Greg KH
2019-05-29 15:53     ` Amir Goldstein
2019-05-30 11:00       ` Jan Kara
2019-06-04 12:33     ` David Howells
2019-05-29  6:45 ` David Howells
2019-05-29  7:40   ` Amir Goldstein
2019-05-29  9:09 ` David Howells
2019-05-29 15:41   ` Casey Schaufler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez3W=Og7_9P9X6fatkCw4Zwbz-N35wRGFFNK8q9k0qL5iw@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=dhowells@redhat.com \
    --cc=keyrings@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).