linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
@ 2018-08-30 21:45 Tony Luck
  2018-08-31  1:29 ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Luck @ 2018-08-30 21:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, linux-edac, linux-kernel, x86, Dan Williams,
	Dave Jiang

The trick with flipping bit 63 to avoid loading the address of the
1:1 mapping of the poisoned page while we update the 1:1 map used
to work when we wanted to unmap the page. But it falls down horribly
when we try to directly set the page as uncacheable.

The problem is that when we change the cache mode to uncachable we
try to flush the page from the cache. But the decoy address is
non-canonical, and the CLFLUSH instruction throws a #GP fault.

Fix is to move one step at a time. First mark the page not present
(using the decoy address). Then it is safe to use the actual address
of the 1:1 mapping to mark it "uc", and finally as present.

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Maybe this is horrible. Other suggestions gratefully received.

 arch/x86/include/asm/set_memory.h | 23 +++++++++++++++++++++--
 arch/x86/mm/pageattr.c            |  5 +++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 07a25753e85c..e876860988bf 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -43,6 +43,7 @@ int set_memory_wc(unsigned long addr, int numpages);
 int set_memory_wt(unsigned long addr, int numpages);
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
@@ -111,9 +112,27 @@ static inline int set_mce_nospec(unsigned long pfn)
 	 */
 	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
 
-	rc = set_memory_uc(decoy_addr, 1);
-	if (rc)
+	rc = set_memory_np(decoy_addr, 1);
+	if (rc) {
 		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+		return rc;
+	}
+
+	native_cpuid_eax(0);
+
+	/* Now safe to use the virtual address in the 1:1 map */
+	rc = set_memory_uc((unsigned long)pfn_to_kaddr(pfn), 1);
+	if (rc) {
+		pr_warn("Could not set pfn=0x%lx uncacheable in 1:1 map\n", pfn);
+		return rc;
+	}
+
+	rc = set_memory_p((unsigned long)pfn_to_kaddr(pfn), 1);
+	if (rc) {
+		pr_warn("Could not remap pfn=0x%lx uncacheable in 1:1 map\n", pfn);
+		return rc;
+	}
+
 	return rc;
 }
 #define set_mce_nospec set_mce_nospec
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8d6c34fe49be..87400351c5a0 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1776,6 +1776,11 @@ int set_memory_np(unsigned long addr, int numpages)
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
 }
 
+int set_memory_p(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
 int set_memory_np_noalias(unsigned long addr, int numpages)
 {
 	int cpa_flags = CPA_NO_CHECK_ALIAS;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-30 21:45 [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault Tony Luck
@ 2018-08-31  1:29 ` Linus Torvalds
  2018-08-31  1:48   ` Tony Luck
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-08-31  1:29 UTC (permalink / raw)
  To: Tony Luck
  Cc: Thomas Gleixner, Ingo Molnar, Peter Anvin, Borislav Petkov,
	linux-edac, Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

On Thu, Aug 30, 2018 at 2:45 PM Tony Luck <tony.luck@intel.com> wrote:
>
> Fix is to move one step at a time. First mark the page not present
> (using the decoy address). Then it is safe to use the actual address
> of the 1:1 mapping to mark it "uc", and finally as present.

Can't we do it in one step, but make sure that he clflush gets the real address?

              Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31  1:29 ` Linus Torvalds
@ 2018-08-31  1:48   ` Tony Luck
  2018-08-31  4:25     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Luck @ 2018-08-31  1:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Linux Edac Mailing List, Linux Kernel Mailing List, X86-ML,
	Dan Williams, dave.jiang

On Thu, Aug 30, 2018 at 6:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 30, 2018 at 2:45 PM Tony Luck <tony.luck@intel.com> wrote:
> >
> > Fix is to move one step at a time. First mark the page not present
> > (using the decoy address). Then it is safe to use the actual address
> > of the 1:1 mapping to mark it "uc", and finally as present.
>
> Can't we do it in one step, but make sure that he clflush gets the real address?
>

I'd like to, but I'd need to have a way to mark the address as needing fixing as
it gets passed from set_memory_uc() to _set_memory_uc() to
change_page_attr_set()
to change_page_attr_set_clear()

Just checking "do we have a non-canonical address" at the bottom of that
call stack and flipping bit 63 back on again seems like a bad idea. But adding
extra flag arguments is majorly ugly too.

-Tony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31  1:48   ` Tony Luck
@ 2018-08-31  4:25     ` Linus Torvalds
  2018-08-31 15:35       ` Thomas Gleixner
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-08-31  4:25 UTC (permalink / raw)
  To: Tony Luck
  Cc: Thomas Gleixner, Ingo Molnar, Peter Anvin, Borislav Petkov,
	linux-edac, Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

On Thu, Aug 30, 2018 at 6:49 PM Tony Luck <tony.luck@intel.com> wrote:
>
> Just checking "do we have a non-canonical address" at the bottom of that
> call stack and flipping bit 63 back on again seems like a bad idea.

You could literally do something like

    /* Make it canonical in case we flipped the high bit */
    addr = (long)(addr<<1)>>1;

in the call to clflush and it magically does the right thing.

Pretty? No. But with a big comment about what is going on and why it's
done, I think it's prettier than your much bigger patch.

I dunno. It does strike me as a bit hacky, but I'd rather have a
*small*  one-liner hack that generates two instructions, than add a
complex hack that modifies the page tables three times and has a
serializing instruction in it.

Both are subtle fixes for a subtle issue, but one seems pretty
harmless in comparison.

Hmm?

But I'll bow to the x86 maintainers.

             Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31  4:25     ` Linus Torvalds
@ 2018-08-31 15:35       ` Thomas Gleixner
  2018-08-31 16:55       ` [PATCH V2] " Luck, Tony
  2018-09-10 13:52       ` [PATCH] " David Laight
  2 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2018-08-31 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Ingo Molnar, Peter Anvin, Borislav Petkov, linux-edac,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

On Thu, 30 Aug 2018, Linus Torvalds wrote:
> On Thu, Aug 30, 2018 at 6:49 PM Tony Luck <tony.luck@intel.com> wrote:
> >
> > Just checking "do we have a non-canonical address" at the bottom of that
> > call stack and flipping bit 63 back on again seems like a bad idea.
> 
> You could literally do something like
> 
>     /* Make it canonical in case we flipped the high bit */
>     addr = (long)(addr<<1)>>1;
> 
> in the call to clflush and it magically does the right thing.
> 
> Pretty? No. But with a big comment about what is going on and why it's
> done, I think it's prettier than your much bigger patch.
> 
> I dunno. It does strike me as a bit hacky, but I'd rather have a
> *small*  one-liner hack that generates two instructions, than add a
> complex hack that modifies the page tables three times and has a
> serializing instruction in it.
> 
> Both are subtle fixes for a subtle issue, but one seems pretty
> harmless in comparison.
> 
> Hmm?
> 
> But I'll bow to the x86 maintainers.

The above is fugly, but it has the charm of simplicity and I assume it's
going to be useful for other places as well. With a big fat comment WHY we
are doing it it's not that horrible. We have all the other L1TF places
where we fiddle with bits in non-obvious ways, so having another instance
of magic bit fiddling is not that big of a problem.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V2] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31  4:25     ` Linus Torvalds
  2018-08-31 15:35       ` Thomas Gleixner
@ 2018-08-31 16:55       ` Luck, Tony
  2018-08-31 16:57         ` Linus Torvalds
  2018-09-01 13:03         ` [tip:x86/urgent] " tip-bot for LuckTony
  2018-09-10 13:52       ` [PATCH] " David Laight
  2 siblings, 2 replies; 9+ messages in thread
From: Luck, Tony @ 2018-08-31 16:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, Peter Anvin, Borislav Petkov,
	linux-edac, Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

The trick with flipping bit 63 to avoid loading the address of the
1:1 mapping of the poisoned page while we update the 1:1 map used
to work when we wanted to unmap the page. But it falls down horribly
when we try to directly set the page as uncacheable.

The problem is that when we change the cache mode to uncachable we
try to flush the page from the cache. But the decoy address is
non-canonical, and the CLFLUSH instruction throws a #GP fault.

Add code to change_page_attr_set_clr() to fix the address
before calling flush.

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

The magic code to make address canonical would mess up 32-bit (which
doesn't play these games).  So I made an inline function. I've put it
in pageattr.c as I don't expect anywhere else to need this. If the #ifdef
in a ".c" file offends too badly it can move to some ".h" file.

 arch/x86/mm/pageattr.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8d6c34fe49be..51a5a69ecac9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1420,6 +1420,29 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
 	return 0;
 }
 
+/*
+ * Machine check recovery code needs to change cache mode of poisoned
+ * pages to UC to avoid speculative access logging another error. But
+ * passing the address of the 1:1 mapping to set_memory_uc() is a fine
+ * way to encourage a speculative access. So we cheat and flip the top
+ * bit of the address. This works fine for the code that updates the
+ * page tables. But at the end of the process we need to flush the cache
+ * and the non-canonical address causes a #GP fault when used by the
+ * CLFLUSH instruction.
+ *
+ * But in the common case we already have a canonical address. This code
+ * will fix the top bit if needed and is a no-op otherwise.
+ */
+static inline unsigned long make_addr_canonical_again(unsigned long addr)
+{
+#ifdef CONFIG_X86_64
+	return (long)(addr << 1) >> 1;
+#else
+	return addr;
+#endif
+}
+
+
 static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 				    pgprot_t mask_set, pgprot_t mask_clr,
 				    int force_split, int in_flag,
@@ -1465,7 +1488,7 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 		 * Save address for cache flush. *addr is modified in the call
 		 * to __change_page_attr_set_clr() below.
 		 */
-		baddr = *addr;
+		baddr = make_addr_canonical_again(*addr);
 	}
 
 	/* Must avoid aliasing mappings in the highmem code */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH V2] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31 16:55       ` [PATCH V2] " Luck, Tony
@ 2018-08-31 16:57         ` Linus Torvalds
  2018-09-01 13:03         ` [tip:x86/urgent] " tip-bot for LuckTony
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-08-31 16:57 UTC (permalink / raw)
  To: Tony Luck
  Cc: Thomas Gleixner, Ingo Molnar, Peter Anvin, Borislav Petkov,
	linux-edac, Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

Ack. This looks good with the big comment and explanation.

             Linus

On Fri, Aug 31, 2018 at 9:55 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> The trick with flipping bit 63 to avoid loading the address of the
> 1:1 mapping of the poisoned page while we update the 1:1 map used
> to work when we wanted to unmap the page. But it falls down horribly
> when we try to directly set the page as uncacheable.
> [ ... ]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip:x86/urgent] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31 16:55       ` [PATCH V2] " Luck, Tony
  2018-08-31 16:57         ` Linus Torvalds
@ 2018-09-01 13:03         ` tip-bot for LuckTony
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for LuckTony @ 2018-09-01 13:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.jiang, hpa, bp, mingo, linux-kernel, tglx, tony.luck,
	dan.j.williams, linux-edac, torvalds

Commit-ID:  c7486104a5ce7e8763e3cb5157bba8d0f1468d87
Gitweb:     https://git.kernel.org/tip/c7486104a5ce7e8763e3cb5157bba8d0f1468d87
Author:     LuckTony <tony.luck@intel.com>
AuthorDate: Fri, 31 Aug 2018 09:55:06 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 1 Sep 2018 14:59:19 +0200

x86/mce: Fix set_mce_nospec() to avoid #GP fault

The trick with flipping bit 63 to avoid loading the address of the 1:1
mapping of the poisoned page while the 1:1 map is updated used to work when
unmapping the page. But it falls down horribly when attempting to directly
set the page as uncacheable.

The problem is that when the cache mode is changed to uncachable, the pages
needs to be flushed from the cache first. But the decoy address is
non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a
#GP fault.

Add code to change_page_attr_set_clr() to fix the address before calling
flush.

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk

---
 arch/x86/mm/pageattr.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8d6c34fe49be..51a5a69ecac9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1420,6 +1420,29 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
 	return 0;
 }
 
+/*
+ * Machine check recovery code needs to change cache mode of poisoned
+ * pages to UC to avoid speculative access logging another error. But
+ * passing the address of the 1:1 mapping to set_memory_uc() is a fine
+ * way to encourage a speculative access. So we cheat and flip the top
+ * bit of the address. This works fine for the code that updates the
+ * page tables. But at the end of the process we need to flush the cache
+ * and the non-canonical address causes a #GP fault when used by the
+ * CLFLUSH instruction.
+ *
+ * But in the common case we already have a canonical address. This code
+ * will fix the top bit if needed and is a no-op otherwise.
+ */
+static inline unsigned long make_addr_canonical_again(unsigned long addr)
+{
+#ifdef CONFIG_X86_64
+	return (long)(addr << 1) >> 1;
+#else
+	return addr;
+#endif
+}
+
+
 static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 				    pgprot_t mask_set, pgprot_t mask_clr,
 				    int force_split, int in_flag,
@@ -1465,7 +1488,7 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 		 * Save address for cache flush. *addr is modified in the call
 		 * to __change_page_attr_set_clr() below.
 		 */
-		baddr = *addr;
+		baddr = make_addr_canonical_again(*addr);
 	}
 
 	/* Must avoid aliasing mappings in the highmem code */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault
  2018-08-31  4:25     ` Linus Torvalds
  2018-08-31 15:35       ` Thomas Gleixner
  2018-08-31 16:55       ` [PATCH V2] " Luck, Tony
@ 2018-09-10 13:52       ` David Laight
  2 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2018-09-10 13:52 UTC (permalink / raw)
  To: 'Linus Torvalds', Tony Luck
  Cc: Thomas Gleixner, Ingo Molnar, Peter Anvin, Borislav Petkov,
	linux-edac, Linux Kernel Mailing List, the arch/x86 maintainers,
	Dan Williams, Dave Jiang

From: Linus Torvalds
> ...
> You could literally do something like
> 
>     /* Make it canonical in case we flipped the high bit */
>     addr = (long)(addr<<1)>>1;

Isn't it safer to use a mask and let the compiler decide if two
shifts are a good implementation?

	addr &= ~HIGH_MAGIC_BIT;

ISTR fixing a bug in gld where it had (foo << 16) >> 16 instead
of foo & 0xffff. Didn't work well on 64bit.

(I may need to recompile that old gcc/gld again.
Failed miserably last time I tried.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-09-10 13:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30 21:45 [PATCH] x86/mce: Fix set_mce_nospec() to avoid #GP fault Tony Luck
2018-08-31  1:29 ` Linus Torvalds
2018-08-31  1:48   ` Tony Luck
2018-08-31  4:25     ` Linus Torvalds
2018-08-31 15:35       ` Thomas Gleixner
2018-08-31 16:55       ` [PATCH V2] " Luck, Tony
2018-08-31 16:57         ` Linus Torvalds
2018-09-01 13:03         ` [tip:x86/urgent] " tip-bot for LuckTony
2018-09-10 13:52       ` [PATCH] " David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).