From: Jerome Glisse <jglisse@redhat.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org, linux-mm@kvack.org,
	Ralph Campbell <rcampbell@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>
Subject: Re: [RFC PATCH 00/11] mm/hmm: Various revisions from a locking/code review
Date: Fri, 24 May 2019 14:46:08 -0400
Message-ID: <20190524184608.GE3346@redhat.com>
In-Reply-To: <20190524183225.GI16845@ziepe.ca>

On Fri, May 24, 2019 at 03:32:25PM -0300, Jason Gunthorpe wrote:
> On Fri, May 24, 2019 at 02:03:22PM -0400, Jerome Glisse wrote:
> > On Fri, May 24, 2019 at 02:52:03PM -0300, Jason Gunthorpe wrote:
> > > On Fri, May 24, 2019 at 01:01:49PM -0400, Jerome Glisse wrote:
> > > > On Fri, May 24, 2019 at 01:59:31PM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, May 24, 2019 at 12:49:02PM -0400, Jerome Glisse wrote:
> > > > > > On Fri, May 24, 2019 at 11:36:49AM -0300, Jason Gunthorpe wrote:
> > > > > > > On Thu, May 23, 2019 at 12:34:25PM -0300, Jason Gunthorpe wrote:
> > > > > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > > > > 
> > > > > > > > > This patch series arose out of discussions with Jerome when looking at the
> > > > > > > > ODP changes, particularly informed by use after free races we have already
> > > > > > > > found and fixed in the ODP code (thanks to syzkaller) working with mmu
> > > > > > > > notifiers, and the discussion with Ralph on how to resolve the lifetime model.
> > > > > > > 
> > > > > > > So the last big difference with ODP's flow is how 'range->valid'
> > > > > > > works.
> > > > > > > 
> > > > > > > In ODP this was done using the rwsem umem->umem_rwsem which is
> > > > > > > obtained for read in invalidate_start and released in invalidate_end.
> > > > > > > 
> > > > > > > Then any other threads that wish to only work on a umem which is not
> > > > > > > undergoing invalidation will obtain the write side of the lock, and
> > > > > > > within that lock's critical section the virtual address range is known
> > > > > > > to not be invalidating.
> > > > > > > 
> > > > > > > I cannot understand how hmm gets to the same approach. It has
> > > > > > > range->valid, but it is not locked by anything that I can see, so when
> > > > > > > we test it in places like hmm_range_fault it seems useless..
> > > > > > > 
> > > > > > > Jerome, how does this work?
> > > > > > > 
> > > > > > > I have a feeling we should copy the approach from ODP and use an
> > > > > > > actual lock here.
> > > > > > 
> > > > > > > range->valid is used to bail out early in hmm_range_fault() if an
> > > > > > > invalidation is happening, to avoid doing useless work. The
> > > > > > > synchronization is explained in the documentation:
> > > > > 
> > > > > That just says the hmm APIs handle locking. I asked how the apis
> > > > > implement that locking internally.
> > > > > 
> > > > > Are you trying to say that if I do this, hmm will still work completely
> > > > > correctly?
> > > > 
> > > > > Yes, it will keep working correctly. You would just be doing potentially
> > > > > useless work.
> > > 
> > > I don't see how it works correctly.
> > > 
> > > Apply the comment out patch I showed and this trivially happens:
> > > 
> > >       CPU0                                               CPU1
> > >   hmm_invalidate_start()
> > >     ops->sync_cpu_device_pagetables()
> > >       device_lock()
> > >        // Wipe out page tables in device, enable faulting
> > >       device_unlock()
> > > 
> > >                                                        DEVICE PAGE FAULT
> > >                                                        device_lock()
> > >                                                        hmm_range_register()
> > >                                                        hmm_range_dma_map()
> > >                                                        device_unlock()
> > >   hmm_invalidate_end()
> > 
> > > No, in the above scenario hmm_range_register() will not mark the range
> > > as valid, thus the driver will bail out after taking its lock and
> > > checking the range->valid value.
> 
> I see your confusion, I only asked about removing valid from hmm.c,
> not the unlocked use of valid in your hmm.rst example. My mistake,
> sorry for being unclear.

No, I did understand properly, and it is fine to remove all the valid
checks within hmm_range_fault() or hmm_range_snapshot(); nothing bad
will come of that.
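
To make that concrete, here is a minimal userspace sketch in C (not kernel
code). All demo_* names are hypothetical: demo_snapshot() stands in for
hmm_range_fault()/hmm_range_snapshot(), and demo_commit_locked() stands in
for the driver step that runs with its update lock held. The assumption
being illustrated is that the early check only saves work, while the check
made under the driver lock decides whether the result is used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real structures carry far more state. */
struct demo_range {
	bool valid;          /* cleared while an invalidation covers the range */
	unsigned long pfn;   /* result of the snapshot/fault step */
};

struct demo_driver {
	bool has_pte;             /* true once the device page table is populated */
	unsigned long device_pte;
};

/*
 * Snapshot step. early_check models the optional test of range->valid;
 * with it disabled the function does the work regardless of validity.
 */
static int demo_snapshot(struct demo_range *r, bool early_check)
{
	assert(r);
	if (early_check && !r->valid)
		return -EAGAIN;   /* bail out before doing any work */
	r->pfn = 0x1234;          /* arbitrary value for the sketch */
	return 0;
}

/*
 * Commit step. The caller is assumed to hold the driver update lock.
 * The snapshot result is dropped if the range is not valid.
 */
static int demo_commit_locked(struct demo_driver *d, const struct demo_range *r)
{
	assert(d && r);
	if (!r->valid)
		return -EAGAIN;   /* caller retries: "goto again" */
	d->device_pte = r->pfn;
	d->has_pte = true;
	return 0;
}
```

With early_check disabled, demo_snapshot() succeeds on an invalid range,
but demo_commit_locked() still returns -EAGAIN and the device page table
is left untouched. That is the "useless work" case.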

> 
> Here is the big 3 CPU ladder diagram that shows how 'valid' does not
> work:
> 
>        CPU0                                               CPU1                                          CPU2
>                                                         DEVICE PAGE FAULT
>                                                         range = hmm_range_register()
>
>   // Overlaps with range
>   hmm_invalidate_start()
>     range->valid = false
>     ops->sync_cpu_device_pagetables()
>       take_lock(driver->update);
>        // Wipe out page tables in device, enable faulting
>       release_lock(driver->update);
>                                                                                                    // Does not overlap with range
>                                                                                                    hmm_invalidate_start()
>                                                                                                    hmm_invalidate_end()
>                                                                                                        list_for_each
>                                                                                                            range->valid =  true

                                                                                                             ^
No, this cannot happen, because CPU0 still has an invalidate_range in
progress and thus hmm->notifiers > 0, so hmm_invalidate_range_end() will
not set range->valid to true.
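
As a rough illustration of the counter argument above, here is a
single-threaded userspace model in C. It is a sketch, not the hmm.c code:
the model_* names and struct layouts are hypothetical, and it ignores the
locking the real code needs. It only encodes the rule described in this
mail: every invalidate start bumps a counter, and ranges are marked valid
again only by the invalidate end that brings the counter back to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_range {
	unsigned long start;   /* inclusive */
	unsigned long end;     /* exclusive */
	bool valid;
};

struct model_hmm {
	int notifiers;               /* invalidations currently in progress */
	struct model_range *ranges;  /* registered ranges */
	size_t nranges;
};

static bool model_overlaps(const struct model_range *r,
			   unsigned long start, unsigned long end)
{
	return start < r->end && end > r->start;
}

/* A range registered during an invalidation starts out not valid. */
static void model_range_register(struct model_hmm *hmm, struct model_range *r)
{
	r->valid = (hmm->notifiers == 0);
}

static void model_invalidate_start(struct model_hmm *hmm,
				   unsigned long start, unsigned long end)
{
	hmm->notifiers++;
	for (size_t i = 0; i < hmm->nranges; i++)
		if (model_overlaps(&hmm->ranges[i], start, end))
			hmm->ranges[i].valid = false;
}

static void model_invalidate_end(struct model_hmm *hmm)
{
	assert(hmm->notifiers > 0);
	if (--hmm->notifiers > 0)
		return;   /* another invalidation is still running */
	for (size_t i = 0; i < hmm->nranges; i++)
		hmm->ranges[i].valid = true;
}
```

Replaying the ladder on this model: CPU0's overlapping start clears valid
and leaves notifiers at 1; CPU2's non-overlapping start/end pair takes the
counter to 2 and back to 1, so valid stays false until CPU0's own end
brings the counter to 0.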

>
>
>                                                        device_lock()
>                                                        // Note range->valid = true now
>                                                        hmm_range_snapshot(&range);
>                                                        take_lock(driver->update);
>                                                        if (!hmm_range_valid(&range))
>                                                            goto again
>                                                        ESTABLISH SPTES
>                                                        device_unlock()
>   hmm_invalidate_end()
> 
> 
> And I can make this more complicated (ie overlapping parallel
> invalidates, etc) and show any 'bool' valid cannot work.

It does work. If you want, I can remove the range->valid = true from
hmm_invalidate_range_end() and move it into hmm_range_wait_until_valid(),
i.e. modify the hmm_range_wait_until_valid() logic; this might look
cleaner.

> > > The mmu notifier spec says:
> > > 
> > >  	 * Invalidation of multiple concurrent ranges may be
> > > 	 * optionally permitted by the driver. Either way the
> > > 	 * establishment of sptes is forbidden in the range passed to
> > > 	 * invalidate_range_begin/end for the whole duration of the
> > > 	 * invalidate_range_begin/end critical section.
> > > 
> > > And I understand "establishment of sptes is forbidden" means
> > > "hmm_range_dma_map() must fail with EAGAIN".
> > 
> > No, it means that a secondary page table entry (SPTE) must not
> > materialize, thus what hmm_range_dma_map() is doing is fine and safe
> > as long as the driver does not use the result to populate the device
> > page table if there was an invalidation for the range.
> 
> Okay, so we agree, if there is an invalidate_start/end critical region
> then it is OK to *call* hmm_range_dma_map(), however the driver must
> not *use* the result, and you are expecting this bit:
> 
>       take_lock(driver->update);
>       if (!hmm_range_valid(&range)) {
>          goto again
> 
> In your hmm.rst to prevent the pfns from being used by the driver?
> 
> I think the above ladder shows that hmm_range_valid can return true
> during an invalidate_start/end critical region, so this is a problem.
>
> I still think the best solution is to move device_lock() into mirror
> and have hmm manage it for the driver as ODP does. It is certainly the
> simplest solution to understand.

It is inefficient and would block forward progress by mm code for longer
than needed.

Cheers,
Jérôme

