On 17.10.2019 at 18:26, "Yang, Philip" wrote:
> On 2019-10-17 4:54 a.m., Christian König wrote:
>> On 16.10.19 at 18:04, Jason Gunthorpe wrote:
>>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>>> On 15.10.19 at 20:12, Jason Gunthorpe wrote:
>>>>> From: Jason Gunthorpe
>>>>>
>>>>> 8 of the mmu_notifier-using drivers (i915_gem, radeon_mn, umem_odp,
>>>>> hfi1, scif_dma, vhost, gntdev, hmm) are using a common pattern where
>>>>> they only use invalidate_range_start/end and immediately check the
>>>>> invalidating range against some driver data structure to tell if the
>>>>> driver is interested. Half of them use an interval_tree, the others
>>>>> are simple linear search lists.
>>>>>
>>>>> Of the ones I checked they largely seem to have various kinds of
>>>>> races, bugs and poor implementation. This is a result of the
>>>>> complexity in how the notifier interacts with get_user_pages(). It
>>>>> is extremely difficult to use it correctly.
>>>>>
>>>>> Consolidate all of this code together into the core mmu_notifier and
>>>>> provide a locking scheme similar to hmm_mirror that allows the user
>>>>> to safely use get_user_pages() and reliably know if the page list
>>>>> still matches the mm.
>>>> That sounds really good, but could you outline for a moment how that
>>>> is achieved?
>>> It uses the same basic scheme as hmm and rdma odp, outlined in the
>>> revisions to hmm.rst later on.
>>>
>>> Basically,
>>>
>>>     seq = mmu_range_read_begin(&mrn);
>>>
>>>     // This is a speculative region
>>>     .. get_user_pages()/hmm_range_fault() ..
>>
>> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
>> see outdated page table information?
>>
>> In other words, how is the following race prevented:
>>
>> CPU A                              CPU B
>> invalidate_range_start()
>>                                    mmu_range_read_begin()
>>                                    get_user_pages()/hmm_range_fault()
>> Updating the ptes
>> invalidate_range_end()
>>
>> I mean get_user_pages() tries to circumvent this issue by grabbing a
>> reference to the pages in question, but that isn't sufficient for the
>> SVM use case.
>>
>> That's the reason why we had this horrible solution with an r/w lock
>> and a linked list of BOs in an interval tree.
>>
>> Regards,
>> Christian.
>
> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> called while holding mm->mmap_sem, so they are always serialized.

Not even remotely. For calling get_user_pages()/hmm_range_fault() you
only need to hold the mmap_sem in read mode.

And IIRC invalidate_range_start() is sometimes called without holding
the mmap_sem at all.

So again, how are they serialized?

Regards,
Christian.

> Philip
>>
>>>     // Result cannot be dereferenced
>>>
>>>     take_lock(driver->update);
>>>     if (mmu_range_read_retry(&mrn, range.notifier_seq)) {
>>>         // collision! The results are not correct
>>>         goto again
>>>     }
>>>
>>>     // no collision, and now under lock. Now we can de-reference the
>>>     // pages/etc
>>>     // program HW
>>>     // Now the invalidate callback is responsible to synchronize
>>>     // against changes
>>>     unlock(driver->update)
>>>
>>> Basically, anything that was using hmm_mirror correctly transitions
>>> over fairly trivially, just with the modification to store a sequence
>>> number to close that race described in the hmm commit.
>>>
>>> For something like AMD gpu I expect it to transition to use dma_fence
>>> from the notifier for coherency right before it unlocks driver->update.
>>>
>>> Jason
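
To make the retry pattern above concrete, and to spell out how the
CPU A/CPU B race is closed on the driver side, here is a minimal
sketch of both halves of the scheme. It borrows the names from
Jason's example (struct mmu_range_notifier, mmu_range_read_begin(),
mmu_range_read_retry()); the invalidate callback signature and all
drv_* types and helpers are invented for illustration and are not
taken from the patch.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

#define DRV_NPAGES 16   /* arbitrary size for the example */

struct drv_ctx {
	struct mmu_range_notifier mrn;
	struct mutex update;             /* the "driver->update" lock */
	struct page *pages[DRV_NPAGES];
};

/*
 * Invalidation side, called from invalidate_range_start(): by the time
 * this runs the core has already bumped the notifier sequence, so a
 * concurrent mmu_range_read_retry() reports a collision. Taking
 * ctx->update here additionally serializes against a reader that has
 * already passed its retry check and is programming the hardware.
 */
static bool drv_invalidate(struct mmu_range_notifier *mrn,
			   const struct mmu_notifier_range *range)
{
	struct drv_ctx *ctx = container_of(mrn, struct drv_ctx, mrn);

	mutex_lock(&ctx->update);
	drv_stop_hw_access(ctx);	/* hypothetical: fence/quiesce HW */
	mutex_unlock(&ctx->update);
	return true;			/* assume we may always block */
}

/* Reader side: the retry loop from the example above, spelled out. */
static int drv_map_user_pages(struct drv_ctx *ctx, unsigned long start)
{
	unsigned long seq;
	int ret;

again:
	seq = mmu_range_read_begin(&ctx->mrn);

	/* Speculative region: the result may already be stale. */
	ret = get_user_pages_fast(start, DRV_NPAGES, FOLL_WRITE,
				  ctx->pages);
	if (ret < 0)
		return ret;

	mutex_lock(&ctx->update);
	if (mmu_range_read_retry(&ctx->mrn, seq)) {
		/* An invalidation ran in the window: drop and restart. */
		mutex_unlock(&ctx->update);
		while (ret > 0)
			put_page(ctx->pages[--ret]);
		goto again;
	}

	/*
	 * No collision and the lock is held: the page list matches the
	 * mm, so it is safe to dereference it and program the hardware.
	 * From here on drv_invalidate() synchronizes against changes.
	 */
	drv_program_hw(ctx);		/* hypothetical */
	mutex_unlock(&ctx->update);
	return 0;
}

Under this scheme the race is closed from both directions: if the
invalidation is already running, the sequence CPU B sampled is stale
and mmu_range_read_retry() rejects it, forcing a restart; if CPU B
passes its retry check first, it holds driver->update, so
drv_invalidate() blocks on that same lock until the hardware
programming is finished.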