From: Nadav Amit <namit@vmware.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Xu <peterx@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>, Will Deacon <will@kernel.org>,
	Yu Zhao <yuzhao@google.com>, Nick Piggin <npiggin@gmail.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Tue, 26 Oct 2021 16:47:56 +0000
Message-ID: <E38AEB97-DE1B-4C91-A959-132EC24812AE@vmware.com>
In-Reply-To: <c415820a-aebb-265c-7f47-e048ee429102@intel.com>

> On Oct 26, 2021, at 9:06 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/21/21 5:21 AM, Nadav Amit wrote:
>> The first TLB flush is only necessary to prevent the dirty bit (and with
>> a lesser importance the access bit) from changing while the PTE is
>> modified. However, this is not necessary as the x86 CPUs set the
>> dirty-bit atomically with an additional check that the PTE is (still)
>> present. One caveat is Intel's Knights Landing that has a bug and does
>> not do so.
>
> First, did I miss the check in this patch for X86_BUG_PTE_LEAK? I don't
> see it anywhere.

No, it is me who missed it. It should have been in pmdp_invalidate_ad():

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..f14f64cc17b5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,6 +780,30 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }

+/*
+ * pmdp_invalidate_ad() - prevents the access and dirty bits from being
+ * further updated by the CPU.
+ *
+ * Returns the original PMD.
+ *
+ * During an access to a page, x86 CPUs set the dirty and access bits
+ * atomically with an additional check of the present bit. Therefore, it is
+ * possible to avoid the TLB flush if we change the PTE atomically, as
+ * pmdp_establish does.
+ *
+ * We do not make this optimization on certain CPUs that have a bug that
+ * violates this behavior (specifically Knights Landing).
+ */
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+
+	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
+		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
+}

>>  - * pmdp_invalidate() is required to make sure we don't miss
>>  - * dirty/young flags set by hardware.
>
> This got me thinking... In here:
>
>> https://lore.kernel.org/lkml/20160708001909.FB2443E2@viggo.jf.intel.com/
>
> I wrote:
>
>> These bits are truly "stray". In the case of the Dirty bit, the
>> thread associated with the stray set was *not* allowed to write to
>> the page. This means that we do not have to launder the bit(s); we
>> can simply ignore them.
>
> Is the goal of your proposed patch here to ensure that the dirty bit is
> not set at *all*? Or, is it to ensure that a dirty bit which we need to
> *launder* is never set?

At *all*.

Err... I remembered from our previous discussions that the dirty bit
cannot be set once the R/W bit is cleared atomically. But going back to
the SDM, I see the (relatively new?)
note:

"If software on one logical processor writes to a page while software on
another logical processor concurrently clears the R/W flag in the
paging-structure entry that maps the page, execution on some processors may
result in the entry's dirty flag being set (due to the write on the first
logical processor) and the entry's R/W flag being clear (due to the update
to the entry on the second logical processor). This will never occur on a
processor that supports control-flow enforcement technology (CET)."

So I guess that this optimization can only be enabled when CET is
enabled. :(
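If the no-flush path really does have to be gated on CET, the check could be
folded into the same function. The following is a rough C-like sketch only,
not a tested patch; in particular, the CET feature-flag name below is an
assumption for illustration, not the actual kernel identifier.

```
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
			 pmd_t *pmdp)
{
	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/*
	 * Without CET, per the SDM note, a concurrent write may still set
	 * the dirty bit after R/W is cleared, so the flush cannot be
	 * elided. X86_FEATURE_CET is a placeholder name here.
	 */
	if (cpu_feature_enabled(X86_BUG_PTE_LEAK) ||
	    !cpu_feature_enabled(X86_FEATURE_CET))
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;
}
```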
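For readers following along: the present-bit check that makes the non-flushing
xchg in pmdp_establish() safe can be modeled in user-space C. This is an
illustrative sketch only, not kernel code; the bit positions and helper names
are simplified stand-ins for the real page-walk logic.

```c
/*
 * User-space model of the interaction between the CPU's A/D-bit setter
 * and a software invalidation that atomically replaces the entry.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)
#define PTE_DIRTY   (1ULL << 6)

/*
 * Models the CPU page walker: set the dirty bit with a cmpxchg that
 * re-checks the present bit on every attempt, so a non-present entry
 * can never pick up a stray dirty bit.
 */
static bool hw_set_dirty(uint64_t *pte)
{
	uint64_t old = __atomic_load_n(pte, __ATOMIC_RELAXED);

	for (;;) {
		if (!(old & PTE_PRESENT))
			return false;	/* entry gone: dirty bit not set */
		if (__atomic_compare_exchange_n(pte, &old, old | PTE_DIRTY,
						false, __ATOMIC_RELAXED,
						__ATOMIC_RELAXED))
			return true;
		/* cmpxchg failed: 'old' now holds the current value; retry */
	}
}

/*
 * Models pmdp_establish(): atomically swap in a non-present copy of the
 * entry and return the old value. Any dirty bit set before the xchg is
 * captured in the return value; none can appear afterwards, so no
 * pre-flush is needed to "freeze" the A/D bits.
 */
static uint64_t invalidate(uint64_t *pte)
{
	uint64_t cur = __atomic_load_n(pte, __ATOMIC_RELAXED);

	return __atomic_exchange_n(pte, cur & ~PTE_PRESENT, __ATOMIC_RELAXED);
}
```

After invalidate(), the simulated walker refuses to set the dirty bit, which
is exactly the property the patch relies on (absent the Knights Landing
erratum).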