Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

From: "Koenig, Christian" <Christian.Koenig@amd.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Yang, Philip" <Philip.Yang@amd.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Jerome Glisse <jglisse@redhat.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Ben Skeggs <bskeggs@redhat.com>
Subject: Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking
Date: Mon, 21 Oct 2019 14:28:46 +0000
Message-ID: <e07092c3-8ccd-9814-835c-6c462017aff8@amd.com>
In-Reply-To: <20191021135744.GA25164@mellanox.com>

On 21.10.19 at 15:57, Jason Gunthorpe wrote:
> On Sun, Oct 20, 2019 at 02:21:42PM +0000, Koenig, Christian wrote:
>> On 18.10.19 at 22:36, Jason Gunthorpe wrote:
>>> On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote:
>>> [SNIP]
>>>    
>>>> So again how are they serialized?
>>> The 'driver lock' thing does it, read the hmm documentation, the hmm
>>> approach is basically the only approach that was correct of all the
>>> drivers..
>> Well that's what I did, but what HMM does still doesn't look correct
>> to me.
> It has a bug, but the basic flow seems to work.
>
> https://patchwork.kernel.org/patch/11191

Maybe wrong link? That link looks like an unrelated discussion on kernel 
image relocation.

>>> So long as the 'driver lock' is held the range cannot become
>>> invalidated as the 'driver lock' prevents progress of invalidation.
>> Correct, but the problem is it doesn't wait for ongoing operations to
>> complete.
>>
>> See I'm talking about the following case:
>>
>> Thread A              Thread B
>> invalidate_range_start()
>>                       mmu_range_read_begin()
>>                       get_user_pages()/hmm_range_fault()
>>                       grab_driver_lock()
>> Updating the ptes
>> invalidate_range_end()
>>
>> As far as I can see in invalidate_range_start() the driver lock is taken
>> to make sure that we can't start any invalidation while the driver is
>> using the pages for a command submission.
> Again, this uses the seqlock like scheme *and* the driver lock.
>
> In this case after grab_driver_lock() mmu_range_read_retry() will
> return false if Thread A has progressed to 'updating the ptes'.
>
> For instance here is how the concurrency resolves for retry:
>
>         CPU1                                CPU2
>                                    seq = mmu_range_read_begin()
> invalidate_range_start()
>    invalidate_seq++

How that was ordered was what was confusing me. But I've read up on the
code in mmu_range_read_begin() and found the lines I was looking for:

+    if (is_invalidating)
+        wait_event(mmn_mm->wq,
+               READ_ONCE(mmn_mm->invalidate_seq) != seq);
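
To make that ordering concrete, here is a minimal user-space sketch of the
idea, not the code from the patch. It assumes a single writer, and every
name in it (struct inv_state, inv_start(), inv_end(), read_begin(),
read_retry()) is hypothetical. A pthread condition variable stands in for
wait_event(), and the reader simply waits while the sequence is odd instead
of comparing against a per-range value as the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the notifier state, not the kernel structure. */
struct inv_state {
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* plays the role of the wait queue */
	unsigned long seq;	/* odd while an invalidation is running */
};

static void inv_state_init(struct inv_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->wq, NULL);
	s->seq = 0;
}

/* Write side, roughly where invalidate_range_start() would sit. */
static void inv_start(struct inv_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert((s->seq & 1) == 0);	/* single writer in this sketch */
	s->seq++;			/* now odd: readers must wait */
	pthread_mutex_unlock(&s->lock);
}

/* Write side, roughly where invalidate_range_end() would sit. */
static void inv_end(struct inv_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->seq & 1);
	s->seq++;			/* even again: data is stable */
	pthread_cond_broadcast(&s->wq);	/* wake sleeping readers */
	pthread_mutex_unlock(&s->lock);
}

/* Read side: sleep while an invalidation is running, then sample seq. */
static unsigned long read_begin(struct inv_state *s)
{
	unsigned long seq;

	pthread_mutex_lock(&s->lock);
	while (s->seq & 1)
		pthread_cond_wait(&s->wq, &s->lock);
	seq = s->seq;
	pthread_mutex_unlock(&s->lock);
	return seq;
}

/* Read side: true if any invalidation started since read_begin(). */
static bool read_retry(struct inv_state *s, unsigned long seq)
{
	bool changed;

	pthread_mutex_lock(&s->lock);
	changed = (s->seq != seq);
	pthread_mutex_unlock(&s->lock);
	return changed;
}
```

A reader would call read_begin(), fault in the pages, take its driver lock
and then call read_retry(). If that returns true it drops the lock and
starts over, because an invalidation began somewhere in between.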

[SNIP]

> For the above I've simplified the mechanics of the invalidate_seq, you
> need to look through the patch to see how it actually works.

Yea, that you also allow multiple write sides is pretty neat.

>> Well we don't update the seqlock after the update to the protected data
>> structure (the page table) happened, but rather before that.
> ??? This is what mn_itree_inv_end() does, it is called by
> invalidate_range_end
>
>> That doesn't look like the normal pattern for a seqlock to me and as far
>> as I can see that is quite a bug in the HMM design/logic.
> Well, hmm has a bug because it doesn't use a seqlock pattern, see the
> above URL.
>
> One of the motivations for this work is to squash that bug by adding a
> seqlock like pattern. But the basic hmm flow and collision-retry
> approach seems sound.
>
> Do you see a problem with this patch?

No, not any more.

Essentially you are doing the same thing I've tried to do with the 
original amdgpu implementation. The difference is that you don't try to 
use a per range sequence (which is a good idea, we never got that fully 
working) and you allow multiple writers at the same time.
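
As a rough illustration of the "multiple writers, one sequence" part, the
following user-space sketch shows how I understand the bookkeeping. It is
an assumption-laden simplification and not the patch itself: struct
multi_inv and the multi_inv_*() helpers are made-up names, and a plain
mutex replaces whatever locking and barriers the real code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared by all ranges, not the kernel structure. */
struct multi_inv {
	pthread_mutex_t lock;
	unsigned int active;	/* invalidations currently in flight */
	unsigned long seq;	/* odd while active != 0 */
};

static void multi_inv_init(struct multi_inv *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->active = 0;
	m->seq = 0;
}

/* Any number of writers may be between start and end at the same time. */
static void multi_inv_start(struct multi_inv *m)
{
	pthread_mutex_lock(&m->lock);
	m->active++;
	m->seq |= 1;	/* first writer makes it odd, later ones keep it odd */
	pthread_mutex_unlock(&m->lock);
}

static void multi_inv_end(struct multi_inv *m)
{
	pthread_mutex_lock(&m->lock);
	assert(m->active > 0);
	if (--m->active == 0)
		m->seq++;	/* last writer out makes it even again */
	pthread_mutex_unlock(&m->lock);
}

/* True while at least one invalidation is still running. */
static bool multi_inv_in_progress(struct multi_inv *m)
{
	bool busy;

	pthread_mutex_lock(&m->lock);
	busy = (m->seq & 1) != 0;
	pthread_mutex_unlock(&m->lock);
	return busy;
}
```

The point of the counter is that overlapping invalidations collapse into
one odd period of the sequence, so readers only ever see "stable" once the
last writer has finished.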

Feel free to add an Acked-by: Christian König
<christian.koenig@amd.com> on patch #2, but you are still doing a bunch of
things in there which are way beyond my understanding (e.g. where are
all the SMP barriers?).

Cheers,
Christian.

>
> Jason