linux-mm.kvack.org archive mirror
From: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"christian.koenig@amd.com" <christian.koenig@amd.com>,
	"airlied@linux.ie" <airlied@linux.ie>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages
Date: Fri, 26 Mar 2021 13:33:29 +0100
Message-ID: <a30e4116-4d51-9143-3195-791fc1e70e87@shipmail.org>
In-Reply-To: <20210326114654.GL2356281@nvidia.com>


On 3/26/21 12:46 PM, Jason Gunthorpe wrote:
> On Fri, Mar 26, 2021 at 10:08:09AM +0100, Thomas Hellström (Intel) wrote:
>> On 3/25/21 7:24 PM, Jason Gunthorpe wrote:
>>> On Thu, Mar 25, 2021 at 07:13:33PM +0100, Thomas Hellström (Intel) wrote:
>>>> On 3/25/21 6:55 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Mar 25, 2021 at 06:51:26PM +0100, Thomas Hellström (Intel) wrote:
>>>>>> On 3/24/21 9:25 PM, Dave Hansen wrote:
>>>>>>> On 3/24/21 1:22 PM, Thomas Hellström (Intel) wrote:
>>>>>>>>> We also have not been careful at *all* about how _PAGE_BIT_SOFTW* are
>>>>>>>>> used.  It's quite possible we can encode another use even in the
>>>>>>>>> existing bits.
>>>>>>>>>
>>>>>>>>> Personally, I'd just try:
>>>>>>>>>
>>>>>>>>> #define _PAGE_BIT_SOFTW5        57      /* available for programmer */
>>>>>>>>>
>>>>>>>> OK, I'll follow your advise here. FWIW I grepped for SW1 and it seems
>>>>>>>> used in a selftest, but only for PTEs AFAICT.
>>>>>>>>
>>>>>>>> Oh, and we don't care about 32-bit much anymore?
>>>>>>> On x86, we have 64-bit PTEs when running 32-bit kernels if PAE is
>>>>>>> enabled.  IOW, we can handle the majority of 32-bit CPUs out there.
>>>>>>>
>>>>>>> But, yeah, we don't care about 32-bit. :)
>>>>>> Hmm,
>>>>>>
>>>>>> Actually it makes some sense to use SW1, to make it end up in the same dword
>>>>>> as the PSE bit, as from what I can tell, reading of a 64-bit pmd_t on 32-bit
>>>>>> PAE is not atomic, so in theory a huge pmd could be modified while reading
>>>>>> the pmd_t making the dwords inconsistent.... How does that work with fast
>>>>>> gup anyway?
>>>>> It loops to get an atomic 64 bit value if the arch can't provide an
>>>>> atomic 64 bit load
>>>> Hmm, ok, I see a READ_ONCE() in gup_pmd_range(), and then the resulting pmd
>>>> is dereferenced either in try_grab_compound_head() or __gup_device_huge(),
>>>> before the pmd is compared to the value the pointer is currently pointing
>>>> to. Couldn't those dereferences be on invalid pointers?
>>> Uhhhhh.. That does look questionable, yes. Unless there is some tricky
>>> reason why a 64 bit pmd entry on a 32 bit arch either can't exist or
>>> has a stable upper 32 bits..
>>>
>>> The pte does it with ptep_get_lockless(), we probably need the same
>>> for the other levels too instead of open coding a READ_ONCE?
>>>
>>> Jason
>> TBH, ptep_get_lockless() also looks a bit fishy. it says
>> "it will not switch to a completely different present page without a TLB
>> flush in between".
>>
>> What if the following happens:
>>
>> processor 1: Reads lower dword of PTE.
>> processor 2: Zaps PTE. Gets stuck waiting to do TLB flush
>> processor 1: Reads upper dword of PTE, which is now zero.
>> processor 3: Hits a TLB miss, reads an unpopulated PTE and faults in a new
>> PTE value which happens to be the same as the original one before the zap.
>> processor 1: Reads the newly faulted in lower dword, compares to the old
>> one, gives an OK and returns a bogus PTE.
> So you are saying that while the zap will wait for the TLB flush to
> globally finish once it gets started any other processor can still
> write to the pte?
>
> I can't think of any serialization that would cause fault to wait for
> the zap/TLB flush, especially if the zap comes from the address_space
> and doesn't hold the mmap lock.

I might of course be completely wrong, but it seems there is an
assumption that all potentially affected processors have a valid TLB
entry for the PTE. In that case the fault would not happen (unless, of
course, the TLB flush completes on some processors before getting
stuck on the local_irq_disable() on processor 1).
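
To make the interleaving quoted above concrete, here is a minimal
user-space sketch of it. It is not kernel code: struct split_pte,
read_split_pte(), the two hook callbacks and the example PTE value are
all hypothetical names and numbers made up for illustration, and the
read loop is only modelled on the low/high/low pattern that
ptep_get_lockless() appears to use on 32-bit PAE. The hooks stand in
for "another processor runs here", so the whole thing is deterministic
and single-threaded, and memory barriers are left out for that reason.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit PTE stored as two 32-bit dwords. */
struct split_pte {
	volatile uint32_t low;
	volatile uint32_t high;
};

/* Example value only: a set bit in the high dword plus a pfn and flags in the low one. */
#define ORIG_LOW  0x12345067u
#define ORIG_HIGH 0x80000000u

typedef void (*hook_fn)(struct split_pte *p);

/*
 * Read low, then high, then re-check low, and retry if low changed.
 * The hooks simulate other processors acting between the individual loads.
 */
static uint64_t read_split_pte(struct split_pte *p,
			       hook_fn after_low, hook_fn after_high)
{
	uint32_t low, high;

	do {
		low = p->low;
		if (after_low)
			after_low(p);	/* "processor 2" runs here */
		high = p->high;
		if (after_high)
			after_high(p);	/* "processor 3" runs here */
	} while (low != p->low);

	return ((uint64_t)high << 32) | low;
}

/* Processor 2: zaps the PTE (both dwords cleared). */
static void zap(struct split_pte *p)
{
	p->low = 0;
	p->high = 0;
}

/* Processor 3: faults in a PTE that happens to equal the original one. */
static void refault_same_value(struct split_pte *p)
{
	p->high = ORIG_HIGH;
	p->low = ORIG_LOW;
}

/* Replays the interleaving: the re-check of low passes, but high is stale. */
static uint64_t demo_torn_read(void)
{
	struct split_pte pte = { ORIG_LOW, ORIG_HIGH };

	return read_split_pte(&pte, zap, refault_same_value);
}
```

In this model demo_torn_read() returns the original low dword combined
with a zero high dword, a value that was never stored in the PTE as a
whole, even though the final comparison of the low dword succeeds.
Whether the real kernel can actually reach this interleaving depends on
exactly the TLB-flush and IRQ-disable ordering questioned above.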

+CC: Nick Piggin

Seems like Nick Piggin is the original author of the comment. Perhaps he
can clarify a bit.

/Thomas


>
> Seems worth bringing up in a bigger thread, maybe someone else knows?
>
> Jason

