From: "Thomas Hellström (Intel)" <firstname.lastname@example.org>
To: Jason Gunthorpe <email@example.com>
Cc: Dave Hansen <firstname.lastname@example.org>,
"Williams, Dan J" <email@example.com>,
Nick Piggin <firstname.lastname@example.org>
Subject: Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages
Date: Fri, 26 Mar 2021 13:33:29 +0100
Message-ID: <email@example.com>
On 3/26/21 12:46 PM, Jason Gunthorpe wrote:
> On Fri, Mar 26, 2021 at 10:08:09AM +0100, Thomas Hellström (Intel) wrote:
>> On 3/25/21 7:24 PM, Jason Gunthorpe wrote:
>>> On Thu, Mar 25, 2021 at 07:13:33PM +0100, Thomas Hellström (Intel) wrote:
>>>> On 3/25/21 6:55 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Mar 25, 2021 at 06:51:26PM +0100, Thomas Hellström (Intel) wrote:
>>>>>> On 3/24/21 9:25 PM, Dave Hansen wrote:
>>>>>>> On 3/24/21 1:22 PM, Thomas Hellström (Intel) wrote:
>>>>>>>>> We also have not been careful at *all* about how _PAGE_BIT_SOFTW* are
>>>>>>>>> used. It's quite possible we can encode another use even in the
>>>>>>>>> existing bits.
>>>>>>>>> Personally, I'd just try:
>>>>>>>>> #define _PAGE_BIT_SOFTW5 57 /* available for programmer */
>>>>>>>> OK, I'll follow your advice here. FWIW I grepped for SW1 and it seems
>>>>>>>> used in a selftest, but only for PTEs AFAICT.
>>>>>>>> Oh, and we don't care about 32-bit much anymore?
>>>>>>> On x86, we have 64-bit PTEs when running 32-bit kernels if PAE is
>>>>>>> enabled. IOW, we can handle the majority of 32-bit CPUs out there.
>>>>>>> But, yeah, we don't care about 32-bit. :)
>>>>>> Actually it makes some sense to use SW1, to make it end up in the same dword
>>>>>> as the PSE bit, as from what I can tell, reading of a 64-bit pmd_t on 32-bit
>>>>>> PAE is not atomic, so in theory a huge pmd could be modified while reading
>>>>>> the pmd_t making the dwords inconsistent.... How does that work with fast
>>>>>> gup anyway?
>>>>> It loops to get an atomic 64 bit value if the arch can't provide an
>>>>> atomic 64 bit load
>>>> Hmm, ok, I see a READ_ONCE() in gup_pmd_range(), and then the resulting pmd
>>>> is dereferenced either in try_grab_compound_head() or __gup_device_huge(),
>>>> before the pmd is compared to the value the pointer is currently pointing
>>>> to. Couldn't those dereferences be on invalid pointers?
>>> Uhhhhh.. That does look questionable, yes. Unless there is some tricky
>>> reason why a 64 bit pmd entry on a 32 bit arch either can't exist or
>>> has a stable upper 32 bits..
>>> The pte does it with ptep_get_lockless(), we probably need the same
>>> for the other levels too instead of open coding a READ_ONCE?
>> TBH, ptep_get_lockless() also looks a bit fishy. It says
>> "it will not switch to a completely different present page without a TLB
>> flush in between".
>> What if the following happens:
>> processor 1: Reads lower dword of PTE.
>> processor 2: Zaps PTE. Gets stuck waiting to do TLB flush
>> processor 1: Reads upper dword of PTE, which is now zero.
>> processor 3: Hits a TLB miss, reads an unpopulated PTE and faults in a new
>> PTE value which happens to be the same as the original one before the zap.
>> processor 1: Reads the newly faulted in lower dword, compares to the old
>> one, gives an OK and returns a bogus PTE.
> So you are saying that while the zap will wait for the TLB flush to
> globally finish once it gets started any other processor can still
> write to the pte?
> I can't think of any serialization that would cause fault to wait for
> the zap/TLB flush, especially if the zap comes from the address_space
> and doesn't hold the mmap lock.
I might of course be completely wrong, but it seems there is an
assumption that all potentially affected processors hold a valid TLB
entry for the PTE. In that case the fault would not happen (unless, of
course, the TLB flush completes on some processors before getting
stuck on the local_irq_disable() on processor 1).
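To make the interleaving I described above concrete, here is a minimal
userspace sketch. It is not kernel code: struct model_pte, the race_hook
callback and the ORIG_* values are all made up for illustration, and the
read loop only mirrors the low / high / re-check-low pattern that
ptep_get_lockless() uses when 64-bit PTE loads are not atomic. The memory
barriers are left out because the "other processors" are replayed
deterministically from a single thread.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: a 64-bit PTE stored as two 32-bit halves (32-bit PAE). */
struct model_pte {
	volatile uint32_t low;
	volatile uint32_t high;
};

#define ORIG_LOW  0x12345067u	/* arbitrary illustrative values */
#define ORIG_HIGH 0x00000001u

/* Hypothetical hook: injects what the other processors do between reads. */
typedef void (*race_hook)(struct model_pte *pte, int step);

/* Read low, read high, then retry unless low is unchanged. */
static uint64_t model_get_lockless(struct model_pte *pte, race_hook hook)
{
	uint32_t low, high;

	do {
		low = pte->low;
		if (hook)
			hook(pte, 0);	/* window after the low read */
		high = pte->high;
		if (hook)
			hook(pte, 1);	/* window after the high read */
	} while (low != pte->low);

	return ((uint64_t)high << 32) | low;
}

static void zap_then_refault(struct model_pte *pte, int step)
{
	assert(step == 0 || step == 1);
	if (step == 0) {
		/* processor 2: zap clears the entry, TLB flush still pending */
		pte->low = 0;
		pte->high = 0;
	} else {
		/* processor 3: fault installs the same value as before the zap */
		pte->high = ORIG_HIGH;
		pte->low = ORIG_LOW;
	}
}

/* Replays the sequence: processor 1 reads while 2 zaps and 3 refaults. */
static uint64_t replay_race(void)
{
	struct model_pte pte = { .low = ORIG_LOW, .high = ORIG_HIGH };

	return model_get_lockless(&pte, zap_then_refault);
}
```

In this model replay_race() passes the re-check because the low dword
matches again, yet it returns the old low dword combined with a zero
high dword, a value the PTE never actually held. Whether real hardware
and the zap/fault paths allow this ordering is exactly the open question.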
+CC: Nick Piggin
Seems like Nick Piggin is the original author of the comment. Perhaps he
can clarify a bit.
> Seems worth bringing up in a bigger thread, maybe someone else knows?