Re: [PATCH] mm: gup: fix the fast GUP race against THP collapse

From: David Hildenbrand <david@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Yang Shi <shy828301@gmail.com>,
	peterx@redhat.com, kirill.shutemov@linux.intel.com,
	jgg@nvidia.com, hughd@google.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: gup: fix the fast GUP race against THP collapse
Date: Tue, 6 Sep 2022 14:50:46 +0200	[thread overview]
Message-ID: <30e2f5e1-a2de-0036-6242-b6f7021a8692@redhat.com> (raw)
In-Reply-To: <c38494f0-1bc5-61e4-8459-be9160029539@nvidia.com>

> OK, I believe you're referring to this:
> 
> 	folio = try_grab_folio(page, 1, flags);
> 
> just earlier in gup_pte_range(). Yes that's true...but it's hidden, which
> is unfortunate. Maybe a comment could help.
> 

Most certainly.

>>
>> If we still intend to change that code, we should fixup all GUP-fast
>> functions in a similar way. But again, I don't think we need a change here.
>>
> 
> It's really rough, having to play this hide-and-seek game of "who did
> the memory barrier". And I'm tempted to suggest adding READ_ONCE() to
> any and all reads of the page table entries, just to help stay out of
> trouble. It's a visual reminder that page table reads are always a
> lockless read and are inherently volatile.
> 
> Of course, I realize that adding extra READ_ONCE() calls is not a good
> thing. It might be a performance hit, although, again, these are
> volatile reads by nature, so you probably had a membar anyway.
> 
> And looking in reverse, there are actually a number of places here where
> we could probably get away with *removing* READ_ONCE()!
> 
> Overall, I would be inclined to load up on READ_ONCE() calls, yes. But I
> sort of expect to be overridden on that, due to potential performance
> concerns, and that's reasonable.
> 
> At a minimum we should add a few short comments about what memory
> barriers are used, and why we don't need a READ_ONCE() or something
> stronger when reading the pte.

Adding more unnecessary memory barriers doesn't necessarily improve the 
situation.

Messing with memory barriers is and remains absolutely disgusting.

IMHO, only clear documentation and ASCII art can keep it somehow 
maintainable for human beings.

> 
> 
>>
>>>> -	 * After this gup_fast can't run anymore. This also removes
>>>> -	 * any huge TLB entry from the CPU so we won't allow
>>>> -	 * huge and small TLB entries for the same virtual address
>>>> -	 * to avoid the risk of CPU bugs in that area.
>>>> +	 * This removes any huge TLB entry from the CPU so we won't allow
>>>> +	 * huge and small TLB entries for the same virtual address to
>>>> +	 * avoid the risk of CPU bugs in that area.
>>>> +	 *
>>>> +	 * Parallel fast GUP is fine since fast GUP will back off when
>>>> +	 * it detects PMD is changed.
>>>>    	 */
>>>>    	_pmd = pmdp_collapse_flush(vma, address, pmd);
>>>
>>> To follow up on David Hildenbrand's note about this in the nearby thread...
>>> I'm also not sure if pmdp_collapse_flush() implies a memory barrier on
>>> all arches. It definitely does do an atomic op with a return value on x86,
>>> but that's just one arch.
>>>
>>
>> I think a ptep/pmdp clear + TLB flush really has to imply a memory
>> barrier, otherwise TLB flushing code might easily mess up with
>> surrounding code. But we should better double-check.
> 
> Let's document the function as such, once it's verified: "This is a
> guaranteed memory barrier".

Yes. Hopefully it indeed is on all architectures :)

-- 
Thanks,

David / dhildenb