From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Shivansh Vij <shivanshvij@outlook.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
Date: Mon, 29 Apr 2024 14:23:35 +0100	[thread overview]
Message-ID: <3ee07020-74d9-4f13-a3d0-4924a1aa69c6@arm.com> (raw)
In-Reply-To: <e946c510-9ba3-4d7b-9561-5ded86086df0@arm.com>

On 29/04/2024 14:01, Ryan Roberts wrote:
> On 29/04/2024 13:38, Catalin Marinas wrote:
>> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
>>> On 26/04/2024 15:48, Catalin Marinas wrote:
>>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
>>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
>>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is
>>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
>>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or
>>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
>>>>
>>>> I think this makes sense, maybe rename the above to PTE_PRESENT_INVALID
>>>> and use it for both ptes and pmds.
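A minimal userspace sketch of the single "present when valid is clear" bit, assuming a 64-bit descriptor modelled as a plain integer. PTE_VALID at bit 0 matches the arm64 layout. The position of PTE_PRESENT_INVALID and the model_* helper names are hypothetical, chosen only for illustration, and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Architectural valid bit. */
#define PTE_VALID           ((pteval_t)1 << 0)
/* Hypothetical SW bit position; only meaningful when PTE_VALID is clear. */
#define PTE_PRESENT_INVALID ((pteval_t)1 << 59)

/* Present if the HW can use it, or if SW marked it present-but-invalid. */
static inline bool model_pte_present(pteval_t pte)
{
	return (pte & PTE_VALID) || (pte & PTE_PRESENT_INVALID);
}

/* True only for entries the HW ignores but SW still treats as present. */
static inline bool model_pte_present_invalid(pteval_t pte)
{
	return (pte & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID;
}

/* Hide a present entry from the HW without making it non-present for SW. */
static inline pteval_t model_pte_mkinvalid(pteval_t pte)
{
	return (pte & ~PTE_VALID) | PTE_PRESENT_INVALID;
}
```

With one shared bit, any further distinction (such as PROT_NONE versus a transiently invalidated pmd) only needs interpreting when model_pte_present_invalid() is true.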
>>>
>>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
>>> core-mm so will now fix that before I can validate my change. see
>>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
>>>
>>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
>>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_PRESENT_INVALID=1)
>>> with both PTE_WRITE=0 and PTE_RDONLY=0.
>>>
>>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
>>> modification", this is not a problem as the pte is invalid, so the HW doesn't
>>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
>>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
>>> that we now repurpose for PROT_NONE.
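The PROT_NONE encoding proposed above (present-invalid, with PTE_WRITE=0 and PTE_RDONLY=0) could be checked roughly as follows. This is a userspace sketch, not kernel code: PTE_VALID (bit 0), PTE_RDONLY (AP[2], bit 7) and PTE_WRITE (aliased to DBM, bit 51) follow the arm64 layout, while the PTE_PRESENT_INVALID position and the model_* name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define PTE_VALID           ((pteval_t)1 << 0)
#define PTE_RDONLY          ((pteval_t)1 << 7)  /* AP[2] */
#define PTE_WRITE           ((pteval_t)1 << 51) /* DBM; SW uses it as the write permission */
#define PTE_PRESENT_INVALID ((pteval_t)1 << 59) /* hypothetical SW bit position */

/*
 * PROT_NONE: invalid for the HW, present for SW, with PTE_WRITE=0 and
 * PTE_RDONLY=0 (a previously unused combination, repurposed here).
 */
static inline bool model_pte_protnone(pteval_t pte)
{
	const pteval_t mask = PTE_VALID | PTE_PRESENT_INVALID |
			      PTE_WRITE | PTE_RDONLY;

	return (pte & mask) == PTE_PRESENT_INVALID;
}
```

A present-invalid entry that still carries either PTE_RDONLY or PTE_WRITE is not treated as PROT_NONE under this scheme.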
>>
>> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
>> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination
>> for PAGE_NONE (bar the kernel mappings).
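The alternative keeps the PAGE_NONE permission bits and keys off PTE_USER=0 with PTE_UXN=1; an execute-only user mapping, for instance, would have PTE_UXN clear, so it does not collide. A userspace sketch, assuming PTE_USER (AP[1], bit 6), PTE_RDONLY (bit 7) and PTE_UXN (bit 54) from the arm64 layout, plus a hypothetical PTE_PRESENT_INVALID position and hypothetical model_* names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define PTE_VALID           ((pteval_t)1 << 0)
#define PTE_USER            ((pteval_t)1 << 6)  /* AP[1] */
#define PTE_RDONLY          ((pteval_t)1 << 7)  /* AP[2] */
#define PTE_UXN             ((pteval_t)1 << 54)
#define PTE_PRESENT_INVALID ((pteval_t)1 << 59) /* hypothetical SW bit position */

static inline bool model_pte_present_invalid(pteval_t pte)
{
	return (pte & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID;
}

/* PROT_NONE: present-invalid, no user access (USER=0), no user exec (UXN=1). */
static inline bool model_pte_protnone(pteval_t pte)
{
	return model_pte_present_invalid(pte) &&
	       (pte & (PTE_USER | PTE_UXN)) == PTE_UXN;
}
```

Note that PTE_RDONLY plays no part in the check, so PAGE_NONE can keep it set as it does today.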
> 
> Yes I guess that works. I personally prefer my proposal because it is more
> intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But
> if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is
> preferable, then I'll go with that.

Ignore this - I looked at your proposed approach and agree it's better. I'll use
`PTE_USER|PTE_UXN==0b01`. Posting shortly...

> 
>>
>> For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
>> means pte_protnone(). For pmds, however, we can end up with
>> pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
>> permissions encoded into a valid pmd. That's where a dedicated
>> PTE_PROT_NONE bit helped.
> 
> Yes agreed.
> 
>>
>> Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
>> first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
>> the pmd is present, it goes and checks pmd_protnone() which returns
>> true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help
>> but it looks fragile to rely on them.
>>
>> So I think for protnone we need to check some other bits (like USER and
>> UXN) in addition to PTE_PRESENT_INVALID.
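The pmd hazard can be shown with a userspace model: after a hypothetical model_pmd_mkinvalid() on a valid, user-accessible, writable pmd, a check that looks only at PTE_PRESENT_INVALID reports protnone, while one that also requires PTE_USER=0 and PTE_UXN=1 does not. Bit positions for PTE_VALID, PTE_USER, PTE_WRITE (DBM) and PTE_UXN follow arm64; the PTE_PRESENT_INVALID position and all helper names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pmdval_t;

#define PTE_VALID           ((pmdval_t)1 << 0)
#define PTE_USER            ((pmdval_t)1 << 6)  /* AP[1] */
#define PTE_WRITE           ((pmdval_t)1 << 51) /* DBM */
#define PTE_UXN             ((pmdval_t)1 << 54)
#define PTE_PRESENT_INVALID ((pmdval_t)1 << 59) /* hypothetical SW bit position */

/* First step of a split: hide the pmd from the HW but keep it present. */
static inline pmdval_t model_pmd_mkinvalid(pmdval_t pmd)
{
	return (pmd & ~PTE_VALID) | PTE_PRESENT_INVALID;
}

static inline bool model_pmd_present_invalid(pmdval_t pmd)
{
	return (pmd & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID;
}

/* Too weak: every invalidated pmd looks like PROT_NONE. */
static inline bool naive_pmd_protnone(pmdval_t pmd)
{
	return model_pmd_present_invalid(pmd);
}

/* Also require the PAGE_NONE-only permission encoding (USER=0, UXN=1). */
static inline bool model_pmd_protnone(pmdval_t pmd)
{
	return model_pmd_present_invalid(pmd) &&
	       (pmd & (PTE_USER | PTE_UXN)) == PTE_UXN;
}
```

In the race described above, the faulting CPU would take the NUMA path only with the naive check.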
> 
> Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that
> purpose, I think?
> 
>>
>>> This will subtly change behaviour in an edge case though. Imagine:
>>>
>>> pte_t pte;
>>>
>>> pte = pte_modify(pte, PAGE_NONE);
>>> pte = pte_mkwrite_novma(pte);
>>> WARN_ON(pte_protnone(pte));
>>>
>>> Should that warning fire or not? Previously, because we had a dedicated bit for
>>> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
>>> it's more intuitive if it doesn't fire. Regardless there is no core code that
>>> ever does this. Once you have a protnone pte, it's terminal; nothing ever
>>> modifies it with these helpers AFAICS.
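The edge case can be modelled by comparing the two encodings. This sketch simplifies pte_mkwrite_novma() to only setting PTE_WRITE (model_pte_mkwrite() is hypothetical) and places the old dedicated PROT_NONE bit and PTE_PRESENT_INVALID at illustrative positions. With a dedicated bit, the WARN_ON condition stays true after write permission is added; with the PTE_WRITE=0/PTE_RDONLY=0 encoding it becomes false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define PTE_VALID           ((pteval_t)1 << 0)
#define PTE_RDONLY          ((pteval_t)1 << 7)  /* AP[2] */
#define PTE_WRITE           ((pteval_t)1 << 51) /* DBM */
#define PTE_PROT_NONE_OLD   ((pteval_t)1 << 58) /* illustrative: old dedicated SW bit */
#define PTE_PRESENT_INVALID ((pteval_t)1 << 59) /* hypothetical SW bit position */

/* Simplified: grants write permission and touches nothing else. */
static inline pteval_t model_pte_mkwrite(pteval_t pte)
{
	return pte | PTE_WRITE;
}

/* Old scheme: PROT_NONE is its own bit, unaffected by permission changes. */
static inline bool protnone_dedicated_bit(pteval_t pte)
{
	return (pte & (PTE_VALID | PTE_PROT_NONE_OLD)) == PTE_PROT_NONE_OLD;
}

/* Proposed scheme: PROT_NONE is present-invalid with WRITE=0 and RDONLY=0. */
static inline bool protnone_rw_encoding(pteval_t pte)
{
	const pteval_t mask = PTE_VALID | PTE_PRESENT_INVALID |
			      PTE_WRITE | PTE_RDONLY;

	return (pte & mask) == PTE_PRESENT_INVALID;
}
```

Both outcomes are defensible, which matches the observation that other architectures differ here.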
>>
>> I don't think any core code should try to make a PAGE_NONE pte
>> writeable.
> 
> I looked at some other arches; some (at least alpha and hexagon) will not fire
> this warning because they have R and W bits and 0b00 means NONE. Others (x86)
> will fire it because they have an explicit NONE bit and don't remove it on
> permission change. So I conclude it's UB and fine to do either.
> 
>>
>>> Personally I think this is a nice tidy up that saves a SW bit in both present
>>> and swap ptes. What do you think? (I'll just post the series if it's easier to
>>> provide feedback in that context).
>>
>> It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
>> it doesn't affect the pmd case I mentioned above.
>>
>>>>> But there is a problem with this: __split_huge_pmd_locked() calls
>>>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
>>>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
>>>>> but I was trying to avoid the whole thing unravelling so didn't pursue it.
>>>>
>>>> Maybe what's wrong is the arm64 implementation setting this bit on a
>>>> swap/migration pmd (though we could handle this in the core code as
>>>> well, it depends what the other architectures do). The only check for
>>>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
>>>> into the pmd_present() check. I think it is currently broken as
>>>> pmd_present() can return true for a swap pmd after pmd_mkinvalid().
>>>
>>> I've posted a fix here:
>>> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>
>>> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.
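The swap-pmd problem, and the position that callers should not invalidate non-present entries, can be sketched in userspace. The swap layout here (an offset stored from bit 12, with PTE_VALID and PTE_PRESENT_INVALID clear) and every model_* helper, including the guarded model_try_pmd_mkinvalid(), are hypothetical; they illustrate the idea and do not reproduce the posted fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pmdval_t;

#define PTE_VALID           ((pmdval_t)1 << 0)
#define PTE_PRESENT_INVALID ((pmdval_t)1 << 59) /* hypothetical SW bit position */

/* Hypothetical swap entry: offset in bits 12..35, VALID and PRESENT_INVALID clear. */
static inline pmdval_t model_mk_swap_pmd(uint64_t offset)
{
	return (offset & 0xffffffULL) << 12;
}

static inline bool model_pmd_present(pmdval_t pmd)
{
	return (pmd & PTE_VALID) || (pmd & PTE_PRESENT_INVALID);
}

/* Unconditional invalidate: makes a swap pmd look present (the bug). */
static inline pmdval_t model_pmd_mkinvalid(pmdval_t pmd)
{
	return (pmd & ~PTE_VALID) | PTE_PRESENT_INVALID;
}

/* Guarded variant: refuse to invalidate an entry that is not present. */
static inline bool model_try_pmd_mkinvalid(pmdval_t *pmd)
{
	if (!model_pmd_present(*pmd))
		return false;
	*pmd = model_pmd_mkinvalid(*pmd);
	return true;
}
```

Wherever the check ends up living, the effect is the same: a swap or migration pmd never acquires the present-invalid marker.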
>>
>> I agree, thanks.
>>
> 



Thread overview: 21+ messages
2024-04-24 11:10 [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
2024-04-24 16:43   ` Catalin Marinas
2024-04-25  8:40     ` Ryan Roberts
2024-04-25  9:16   ` David Hildenbrand
2024-04-25 10:29     ` Ryan Roberts
2024-04-25 10:37       ` Ryan Roberts
2024-04-26 14:48         ` Catalin Marinas
2024-04-29 10:04           ` Ryan Roberts
2024-04-29 12:38             ` Catalin Marinas
2024-04-29 13:01               ` Ryan Roberts
2024-04-29 13:23                 ` Ryan Roberts [this message]
2024-04-29 14:18                   ` Catalin Marinas
2024-04-29 15:04                     ` Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts
2024-04-24 11:57   ` Peter Xu
2024-04-24 12:51     ` Ryan Roberts
2024-04-26 13:17     ` Ryan Roberts
2024-04-26 13:54       ` Peter Xu
2024-04-29  9:39         ` Ryan Roberts
2024-04-24 16:46   ` Catalin Marinas
