linux-kernel.vger.kernel.org archive mirror
From: Zi Yan <ziy@nvidia.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm: Fix race between __split_huge_pmd_locked() and GUP-fast
Date: Mon, 29 Apr 2024 12:02:59 -0400
Message-ID: <FF2DE3E1-0A11-4DAC-9176-969B7DCAE3A1@nvidia.com>
In-Reply-To: <99384a25-9ff5-43c9-b09d-5a048c456d02@arm.com>

On 29 Apr 2024, at 11:34, Ryan Roberts wrote:

> On 29/04/2024 15:45, Zi Yan wrote:
>> On 29 Apr 2024, at 5:29, Ryan Roberts wrote:
>>
>>> On 27/04/2024 20:11, John Hubbard wrote:
>>>> On 4/27/24 8:14 AM, Zi Yan wrote:
>>>>> On 27 Apr 2024, at 0:41, John Hubbard wrote:
>>>>>> On 4/25/24 10:07 AM, Ryan Roberts wrote:
>>>>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>>>>> unconditionally on the pmdp and only determines if it is present or not
>>>>>>> based on the returned old pmd. This is a problem for the migration entry
>>>>>>> case because pmd_mkinvalid(), called by pmdp_invalidate() must only be
>>>>>>> called for a present pmd.
>>>>>>>
>>>>>>> On arm64 at least, pmd_mkinvalid() will mark the pmd such that any
>>>>>>> future call to pmd_present() will return true. And therefore any
>>>>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>>>>> and start interpreting the fields as if it were present, leading to
>>>>>>> BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.
>>>>>>> I suspect the same is possible on other architectures.
>>>>>>>
>>>>>>> Fix this by only calling pmdp_invalidate() for a present pmd. And for
>>>>>>
>>>>>> Yes, this seems like a good design decision (after reading through the
>>>>>> discussion that you all had in the other threads).
>>>>>
>>>>> This will only be good for arm64 and does not prevent other arch developers
>>>>> from writing code that breaks arm64, since only arm64's pmd_mkinvalid() can
>>>>> turn a swap entry into a pmd_present() entry.
>>>>
>>>> Well, let's characterize it in a bit more detail, then:
>>>
>>> Hi All,
>>>
>>> Thanks for all the feedback! I had thought that this patch would be entirely
>>> uncontroversial - obviously I was wrong :)
>>>
>>> I've read all the emails, and am trying to summarize a way forward here...
>>>
>>>>
>>>> 1) This patch will make things better for arm64. That's important!
>>>>
>>>> 2) Equally important, this patch does not make anything worse for
>>>>    other CPU arches.
>>>>
>>>> 3) This patch represents a new design constraint on the CPU arch
>>>>    layer, and thus requires documentation and whatever enforcement
>>>>    we can provide, in order to keep future code out of trouble.
>>>
>>> I know it's only semantics, but I don't view this as a new design constraint. I
>>> see it as an existing constraint that was previously being violated, and this
>>> patch aims to fix that. The generic version of pmdp_invalidate() unconditionally
>>> does a tlb invalidation on the address range covered by the pmd. That makes no
>>> sense unless the pmd was previously present. So my conclusion is that the
>>> function only expects to be called for present pmds.
>>>
>>> Additionally Documentation/mm/arch_pgtable_helpers.rst already says this:
>>>
>>> "
>>> | pmd_mkinvalid             | Invalidates a mapped PMD [1]                     |
>>> "
>>>
>>> I read "mapped" to be a synonym for "present". So I think it's already
>>> documented. Happy to explicitly change "mapped" to "present" though, if it helps?
>>>
>>> Finally, [1] which is linked from Documentation/mm/arch_pgtable_helpers.rst,
>>> also implies this constraint, although it doesn't explicitly say it.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/
>>>
>>>>
>>>> 3.a) See the VM_WARN_ON() hunks below.
>>>
>>> It sounds like everybody would be happy if I sprinkle these into the arches that
>>> override pmdp_invalidate[_ad]()? There are 3 arches that have their own version
>>> of pmdp_invalidate(); powerpc, s390 and sparc. And 1 that has its own version of
>>> pmdp_invalidate_ad(); x86. I'll add them in all of those.
>>>
>>> I'll use VM_WARN_ON_ONCE() as suggested by John.
>>>
>>> I'd rather not put it directly into pmd_mkinvalid() since that would set a
>>> precedent for adding them absolutely everywhere. (e.g. pte_mkdirty(), ...).
>>
>> I understand your concern here. I assume you also understand the potential issue
>> with this, namely it does not prevent one from using pmd_mkinvalid() improperly
>> and causing a bug and the bug might only appear on arm64.
>
> Are you saying that arm64 is the *only* arch where pmd_mkinvalid() can turn a
> swap pte into a present pte? I hadn't appreciated that; in your first reply to

Yes.

> this patch you said "I notice that x86, risc-v, mips behave the same" - I
> thought you were saying they behaved the same as arm64, but on re-reading, I
> think I've taken that out of context.
>
> But in spite of that, it still remains my view that making arm64's
> pmd_mkinvalid() robust to non-present ptes is not the right fix - or at least
> not sufficient on its own. That change on its own would still result in issuing
> a TLBI for the non-present pte from pmdp_invalidate(). That's not a correctness
> issue, but certainly could be a performance concern.

I agree with you that using pmd_mkinvalid() on non-present entries does not make
sense, but there is no easy way of enforcing it to prevent anyone doing that. And
if people do it and they are not working or testing on arm64, they can break arm64
without noticing it. It becomes arm64's burden to watch out for this potential
break all the time.
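
To make the failure mode concrete, here is a minimal userspace sketch of it.
It is a toy model, not kernel code: the toy_ names and the bit positions are
made up for illustration, and it only assumes what was described earlier in
the thread, namely that arm64's pmd_mkinvalid() marks the entry in a way that
makes pmd_present() return true afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these bit positions are hypothetical, not the arm64 layout. */
typedef uint64_t pmd_t;

#define TOY_PMD_VALID           (1ULL << 0) /* hardware "valid" bit */
#define TOY_PMD_PRESENT_INVALID (1ULL << 1) /* software "present but invalid" marker */
#define TOY_SWAP_PAYLOAD_SHIFT  8           /* swap/migration payload lives above the flags */

/* Present means: hardware-valid, or deliberately invalidated while still present. */
static inline bool toy_pmd_present(pmd_t pmd)
{
	return (pmd & (TOY_PMD_VALID | TOY_PMD_PRESENT_INVALID)) != 0;
}

/* Sets the marker and clears the valid bit without checking what kind of entry it is. */
static inline pmd_t toy_pmd_mkinvalid(pmd_t pmd)
{
	pmd |= TOY_PMD_PRESENT_INVALID;
	pmd &= ~TOY_PMD_VALID;
	return pmd;
}

/* A non-present (migration-style) entry: payload only, both flag bits clear. */
static inline pmd_t toy_make_migration_entry(uint64_t payload)
{
	return payload << TOY_SWAP_PAYLOAD_SHIFT;
}

/* Returns true if a non-present entry looks present after toy_pmd_mkinvalid(). */
static inline bool toy_hazard_reproduces(void)
{
	pmd_t swp = toy_make_migration_entry(0x1234);

	assert(!toy_pmd_present(swp)); /* starts out non-present */
	return toy_pmd_present(toy_pmd_mkinvalid(swp));
}
```

In this model toy_hazard_reproduces() returns true: the payload bits of the
migration entry are unchanged, but a lockless walker that trusts
toy_pmd_present() would now interpret them as if the entry were present.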

Yes, the TLB invalidation in pmdp_invalidate() should be avoided for non-present
entries to avoid the performance loss. It is a separate issue from the
pmd_mkinvalid() correctness issue. Thank you for pointing this out explicitly.

>
> I think it's much better to have the design constraint that pmd_mkinvalid(),
> pmdp_invalidate() and pmdp_invalidate_ad() can only be called for present ptes.
> And I think the combination of WARNs and docs that we've discussed should be
> enough to allay your concerns about introduction of arm64-specific bugs.

Yes. I also understand that putting a WARN in pmd_mkinvalid() might not be desirable.
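
For reference, the combination discussed above (callers check presence first,
and the pmdp_invalidate() implementations warn once if handed a non-present
pmd) could look roughly like the following userspace sketch. All toy_ names
and bit positions are hypothetical, toy_warn_on_once() is only a stand-in for
VM_WARN_ON_ONCE(), and leaving a non-present entry untouched is an assumption
of this sketch, not a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: bit positions are hypothetical, not any real arch layout. */
typedef uint64_t pmd_t;

#define TOY_PMD_VALID           (1ULL << 0)
#define TOY_PMD_PRESENT_INVALID (1ULL << 1)

static int toy_warn_count; /* number of warnings actually printed */

static inline bool toy_pmd_present(pmd_t pmd)
{
	return (pmd & (TOY_PMD_VALID | TOY_PMD_PRESENT_INVALID)) != 0;
}

static inline pmd_t toy_pmd_mkinvalid(pmd_t pmd)
{
	return (pmd | TOY_PMD_PRESENT_INVALID) & ~TOY_PMD_VALID;
}

/* Stand-in for VM_WARN_ON_ONCE(): prints at most once, returns the condition. */
static inline bool toy_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Model of pmdp_invalidate(): warns on misuse, returns the old value. */
static inline pmd_t toy_pmdp_invalidate(pmd_t *pmdp)
{
	pmd_t old = *pmdp;

	toy_warn_on_once(!toy_pmd_present(old),
			 "pmdp_invalidate() called on a non-present pmd");
	*pmdp = toy_pmd_mkinvalid(old);
	return old;
}

/* Caller pattern: only invalidate a present pmd, otherwise just read it. */
static inline pmd_t toy_split_read_old(pmd_t *pmdp)
{
	if (toy_pmd_present(*pmdp))
		return toy_pmdp_invalidate(pmdp);
	return *pmdp; /* assumption: non-present entry is left untouched */
}
```

With this shape, a well-behaved caller never triggers the warning, and a
caller that skips the presence check gets flagged on every arch that carries
the check, not only on arm64.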

>
>>
>>>
>>>>
>>>> 3.b) I like the new design constraint, because it is reasonable and
>>>>      clearly understandable: don't invalidate a non-present page
>>>>      table entry.
>>>>
>>>> I do wonder if there is somewhere else that this should be documented?
>>>
>>> If I change:
>>>
>>> "
>>> | pmd_mkinvalid             | Invalidates a mapped PMD [1]                     |
>>> "
>>>
>>> To:
>>>
>>> "
>>> | pmd_mkinvalid             | Invalidates a present PMD; do not call for       |
>>> |                           | non-present pmd [1]                              |
>>> "
>>>
>>> Is that sufficient? (I'll do the same for pud_mkinvalid() too.)
>>
>> Sounds good to me.
>>
>> Also, if you move pmdp_invalidate(), please move the big comment with it to
>> avoid confusion. Thanks.
>
> Yes good spot, I'll move it.
>
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi
