On 17.10.2019 at 18:26, "Yang, Philip" wrote:
> On 2019-10-17 4:54 a.m., Christian König wrote:
>> On 16.10.19 at 18:04, Jason Gunthorpe wrote:
>>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>>> On 15.10.19 at 20:12, Jason Gunthorpe wrote:
>>>>> From: Jason Gunthorpe
>>>>>
>>>>> 8 of the mmu_notifier-using drivers (i915_gem, radeon_mn, umem_odp,
>>>>> hfi1, scif_dma, vhost, gntdev, hmm) are using a common pattern where
>>>>> they only use invalidate_range_start/end and immediately check the
>>>>> invalidating range against some driver data structure to tell if the
>>>>> driver is interested. Half of them use an interval_tree, the others
>>>>> are simple linear search lists.
>>>>>
>>>>> Of the ones I checked they largely seem to have various kinds of
>>>>> races, bugs and poor implementation. This is a result of the
>>>>> complexity in how the notifier interacts with get_user_pages(). It
>>>>> is extremely difficult to use it correctly.
>>>>>
>>>>> Consolidate all of this code together into the core mmu_notifier and
>>>>> provide a locking scheme similar to hmm_mirror that allows the user
>>>>> to safely use get_user_pages() and reliably know if the page list
>>>>> still matches the mm.
>>>> That sounds really good, but could you outline for a moment how that
>>>> is achieved?
>>> It uses the same basic scheme as hmm and rdma odp, outlined in the
>>> revisions to hmm.rst later on.
>>>
>>> Basically,
>>>
>>>     seq = mmu_range_read_begin(&mrn);
>>>
>>>     // This is a speculative region
>>>     .. get_user_pages()/hmm_range_fault() ..
>>
>> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
>> see outdated page table information?
>>
>> In other words, how is the following race prevented:
>>
>> CPU A                              CPU B
>> invalidate_range_start()
>>                                    mmu_range_read_begin()
>>                                    get_user_pages()/hmm_range_fault()
>> Updating the ptes
>> invalidate_range_end()
>>
>> I mean get_user_pages() tries to circumvent this issue by grabbing a
>> reference to the pages in question, but that isn't sufficient for the
>> SVM use case.
>>
>> That's the reason why we had this horrible solution with an r/w lock
>> and a linked list of BOs in an interval tree.
>>
>> Regards,
>> Christian.
>
> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> called while holding mm->mmap_sem, so they are always serialized.

Not even remotely. For calling get_user_pages()/hmm_range_fault() you
only need to hold the mmap_sem in read mode.

And IIRC invalidate_range_start() is sometimes called without holding
the mmap_sem at all.

So again, how are they serialized?

Regards,
Christian.

> Philip
>>
>>>     // Result cannot be dereferenced
>>>
>>>     take_lock(driver->update);
>>>     if (mmu_range_read_retry(&mrn, range.notifier_seq)) {
>>>         // collision! The results are not correct
>>>         goto again
>>>     }
>>>
>>>     // no collision, and now under lock. Now we can de-reference the
>>>     // pages/etc
>>>     // program HW
>>>     // Now the invalidate callback is responsible to synchronize
>>>     // against changes
>>>     unlock(driver->update)
>>>
>>> Basically, anything that was using hmm_mirror correctly transitions
>>> over fairly trivially, just with the modification to store a sequence
>>> number to close that race described in the hmm commit.
>>>
>>> For something like AMD gpu I expect it to transition to use dma_fence
>>> from the notifier for coherency right before it unlocks driver->update.
>>>
>>> Jason
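
To make the retry pattern above concrete, and to spell out how the
CPU A/CPU B race is closed on the driver side, here is a minimal
sketch of both halves of the scheme. It borrows the names from
Jason's example (struct mmu_range_notifier, mmu_range_read_begin(),
mmu_range_read_retry()); the invalidate callback signature and all
drv_* types and helpers are invented for illustration and are not
taken from the patch.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

#define DRV_NPAGES 16   /* arbitrary size for the example */

struct drv_ctx {
	struct mmu_range_notifier mrn;
	struct mutex update;             /* the "driver->update" lock */
	struct page *pages[DRV_NPAGES];
};

/*
 * Invalidation side, called from invalidate_range_start(): by the time
 * this runs the core has already bumped the notifier sequence, so a
 * concurrent mmu_range_read_retry() reports a collision. Taking
 * ctx->update here additionally serializes against a reader that has
 * already passed its retry check and is programming the hardware.
 */
static bool drv_invalidate(struct mmu_range_notifier *mrn,
			   const struct mmu_notifier_range *range)
{
	struct drv_ctx *ctx = container_of(mrn, struct drv_ctx, mrn);

	mutex_lock(&ctx->update);
	drv_stop_hw_access(ctx);	/* hypothetical: fence/quiesce HW */
	mutex_unlock(&ctx->update);
	return true;			/* assume we may always block */
}

/* Reader side: the retry loop from the example above, spelled out. */
static int drv_map_user_pages(struct drv_ctx *ctx, unsigned long start)
{
	unsigned long seq;
	int ret;

again:
	seq = mmu_range_read_begin(&ctx->mrn);

	/* Speculative region: the result may already be stale. */
	ret = get_user_pages_fast(start, DRV_NPAGES, FOLL_WRITE,
				  ctx->pages);
	if (ret < 0)
		return ret;

	mutex_lock(&ctx->update);
	if (mmu_range_read_retry(&ctx->mrn, seq)) {
		/* An invalidation ran in the window: drop and restart. */
		mutex_unlock(&ctx->update);
		while (ret > 0)
			put_page(ctx->pages[--ret]);
		goto again;
	}

	/*
	 * No collision and the lock is held: the page list matches the
	 * mm, so it is safe to dereference it and program the hardware.
	 * From here on drv_invalidate() synchronizes against changes.
	 */
	drv_program_hw(ctx);		/* hypothetical */
	mutex_unlock(&ctx->update);
	return 0;
}

Under this scheme the race is closed from both directions: if the
invalidation is already running, the sequence CPU B sampled is stale
and mmu_range_read_retry() rejects it, forcing a restart; if CPU B
passes its retry check first, it holds driver->update, so
drv_invalidate() blocks on that same lock until the hardware
programming is finished.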