From: Nadav Amit <namit@vmware.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Tue, 26 Oct 2021 16:47:56 +0000
Message-ID: <E38AEB97-DE1B-4C91-A959-132EC24812AE@vmware.com>
In-Reply-To: <c415820a-aebb-265c-7f47-e048ee429102@intel.com>



> On Oct 26, 2021, at 9:06 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 10/21/21 5:21 AM, Nadav Amit wrote:
>> The first TLB flush is only necessary to prevent the dirty bit (and with
>> a lesser importance the access bit) from changing while the PTE is
>> modified. However, this is not necessary as the x86 CPUs set the
>> dirty-bit atomically with an additional check that the PTE is (still)
>> present. One caveat is Intel's Knights Landing that has a bug and does
>> not do so.
> 
> First, did I miss the check in this patch for X86_BUG_PTE_LEAK?  I don't
> see it anywhere.

No, it is me who missed it. It should have been in pmdp_invalidate_ad():

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..f14f64cc17b5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,6 +780,30 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * pmdp_invalidate_ad() - prevents the access and dirty bits from being
+ * further updated by the CPU.
+ *
+ * Returns the original PMD.
+ *
+ * On an access to a page, x86 CPUs set the dirty and access bits atomically
+ * with an additional check that the entry is (still) present. Therefore, the
+ * TLB flush can be avoided when we change the PMD atomically, as
+ * pmdp_establish() does.
+ *
+ * We do not use this optimization on CPUs that have a bug violating this
+ * behavior (specifically Knights Landing).
+ */
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+
+	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
+		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
+}
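
For context: the reason no flush is needed on CPUs without the bug is that
pmdp_establish() replaces the entry with a single atomic exchange, so the
CPU's atomic A/D-bit update sees either the old (present) entry or the new
(non-present) one, never a mix. On x86 it is roughly the following
(paraphrasing the current kernel source, not part of this patch):

	static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
			unsigned long address, pmd_t *pmdp, pmd_t pmd)
	{
		if (IS_ENABLED(CONFIG_SMP)) {
			/* Atomic exchange; the old A/D bits come back with it */
			return xchg(pmdp, pmd);
		} else {
			/* No other CPU can race with us on !SMP */
			pmd_t old = *pmdp;
			WRITE_ONCE(*pmdp, pmd);
			return old;
		}
	}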

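And for the record: X86_BUG_PTE_LEAK itself is set by patch 1/5 of this
series during CPU identification. A minimal sketch of that logic follows;
the helper name is my shorthand, not necessarily what the patch uses:

	/* arch/x86/kernel/cpu/intel.c: mark Knights Landing as affected */
	static void probe_pte_leak(struct cpuinfo_x86 *c)
	{
		/* KNL may set A/D bits without re-checking the present bit */
		if (c->x86_model == INTEL_FAM6_XEON_PHI_KNL)
			set_cpu_bug(c, X86_BUG_PTE_LEAK);
	}
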
> 
>> -	 * pmdp_invalidate() is required to make sure we don't miss
>> -	 * dirty/young flags set by hardware.
> 
> This got me thinking...  In here:
> 
>> https://lore.kernel.org/lkml/20160708001909.FB2443E2@viggo.jf.intel.com/
> 
> I wrote:
> 
>> These bits are truly "stray".  In the case of the Dirty bit, the
>> thread associated with the stray set was *not* allowed to write to
>> the page.  This means that we do not have to launder the bit(s); we
>> can simply ignore them.
> 
> Is the goal of your proposed patch here to ensure that the dirty bit is
> not set at *all*?  Or, is it to ensure that a dirty bit which we need to
> *launder* is never set?

At *all*.

Err… I remembered from our previous discussions that the dirty bit cannot
be set once the R/W bit has been atomically cleared. But going back to the
SDM, I see this (relatively new?) note:

"If software on one logical processor writes to a page while software on
 another logical processor concurrently clears the R/W flag in the
 paging-structure entry that maps the page, execution on some processors may
 result in the entry’s dirty flag being set (due to the write on the first
 logical processor) and the entry’s R/W flag being clear (due to the update
 to the entry on the second logical processor). This will never occur on a
 processor that supports control-flow enforcement technology (CET)."

So I guess that this optimization can only be enabled on processors that
support CET.
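
Something along these lines, then: flush unless the CPU supports CET. A
sketch, assuming X86_FEATURE_SHSTK is an acceptable proxy for "supports
CET" (there is no single CET feature bit; KNL has no CET, so this also
subsumes the X86_BUG_PTE_LEAK check):

	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/* The SDM only guarantees atomic A/D updates on CET-capable CPUs */
	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;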

:(



Thread overview: 34+ messages
2021-10-21 12:21 [PATCH v2 0/5] mm/mprotect: avoid unnecessary TLB flushes Nadav Amit
2021-10-21 12:21 ` [PATCH v2 1/5] x86: Detection of Knights Landing A/D leak Nadav Amit
2021-10-26 15:54   ` Dave Hansen
2021-10-26 15:57     ` Nadav Amit
2021-10-21 12:21 ` [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd() Nadav Amit
2021-10-25 10:52   ` Peter Zijlstra
2021-10-25 16:29     ` Nadav Amit
2021-10-26 16:06   ` Dave Hansen
2021-10-26 16:47     ` Nadav Amit [this message]
2021-10-26 16:53       ` Nadav Amit
2021-10-26 17:44       ` Nadav Amit
2021-10-26 18:44         ` Dave Hansen
2021-10-26 19:06           ` Nadav Amit
2021-10-26 19:40             ` Dave Hansen
2021-10-26 20:07               ` Nadav Amit
2021-10-26 20:47                 ` Dave Hansen
2021-10-21 12:21 ` [PATCH v2 3/5] x86/mm: check exec permissions on fault Nadav Amit
2021-10-25 10:59   ` Peter Zijlstra
2021-10-25 11:13     ` Andrew Cooper
2021-10-25 14:23     ` Dave Hansen
2021-10-25 14:20   ` Dave Hansen
2021-10-25 16:19     ` Nadav Amit
2021-10-25 17:45       ` Dave Hansen
2021-10-25 17:51         ` Nadav Amit
2021-10-25 18:00           ` Dave Hansen
2021-10-21 12:21 ` [PATCH v2 4/5] mm/mprotect: use mmu_gather Nadav Amit
2021-10-21 12:21 ` [PATCH v2 5/5] mm/mprotect: do not flush on permission promotion Nadav Amit
2021-10-25 11:12   ` Peter Zijlstra
2021-10-25 16:27     ` Nadav Amit
2021-10-22  3:04 ` [PATCH v2 0/5] mm/mprotect: avoid unnecessary TLB flushes Andrew Morton
2021-10-22 21:58   ` Nadav Amit
2021-10-26 16:09     ` Dave Hansen
2021-10-25 10:50   ` Peter Zijlstra
2021-10-25 16:42     ` Nadav Amit
