* [PATCH 0/4] More PCID fixes
@ 2017-09-17 16:03 Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
  To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski

This series fixes a 32-bit boot warning, a 32-bit boot oddity that seems
asymptomatic right now, and a potential userspace corruption issue
across EFI calls if PCID is enabled.

With this series applied, the only remaining issue I'm aware of is
the paging-structure cache laziness problem.

I've tested SMP boot, suspend/resume, hotplug/unplug, and hibernate
on a 32-bit VM, a 64-bit VM, and a 64-bit laptop.  (Except hibernate
on the laptop -- my laptop can't resume from hibernation on *any*
kernel due to a userspace (systemd? Fedora overall integration?)
issue.)

Andy Lutomirski (4):
  x86/mm: Factor out CR3-building code
  x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
  x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
  x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs

 arch/x86/include/asm/mmu_context.h | 32 ++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/bugs.c         |  8 --------
 arch/x86/kernel/cpu/common.c       |  8 ++++++++
 arch/x86/kernel/smpboot.c          | 13 +++++++------
 arch/x86/mm/tlb.c                  | 11 +++++------
 5 files changed, 48 insertions(+), 24 deletions(-)

-- 
2.13.5

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] x86/mm: Factor out CR3-building code
  2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
  2017-09-17 18:18   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
  To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski, Tom Lendacky

Currently, the code that assembles a value to load into CR3 is
open-coded everywhere.  Factor it out into helpers build_cr3() and
build_cr3_noflush().

This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems.  No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.

Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 15 +++++++++++----
 arch/x86/mm/tlb.c                  | 11 +++++------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 7ae318c340d9..a999ba6b721f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,6 +286,15 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
 
+static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
+{
+	return __sme_pa(mm->pgd) | asid;
+}
+
+static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
+{
+	return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+}
 
 /*
  * This can be used from process context to figure out what the value of
@@ -296,10 +305,8 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
  */
 static inline unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
-
-	if (static_cpu_has(X86_FEATURE_PCID))
-		cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ab3821f9e26..93fe97cce581 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -126,8 +126,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() !=
-			 (__sme_pa(real_prev->pgd) | prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -172,7 +171,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			 */
 			this_cpu_write(cpu_tlbstate.ctxs[prev_asid].tlb_gen,
 				       next_tlb_gen);
-			write_cr3(__sme_pa(next->pgd) | prev_asid);
+			write_cr3(build_cr3(next, prev_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
 					TLB_FLUSH_ALL);
 		}
@@ -216,12 +215,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		if (need_flush) {
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-			write_cr3(__sme_pa(next->pgd) | new_asid);
+			write_cr3(build_cr3(next, new_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
 					TLB_FLUSH_ALL);
 		} else {
 			/* The new ASID is already up to date. */
-			write_cr3(__sme_pa(next->pgd) | new_asid | CR3_NOFLUSH);
+			write_cr3(build_cr3_noflush(next, new_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 		}
 
@@ -265,7 +264,7 @@ void initialize_tlbstate_and_flush(void)
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
 	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(cr3 & ~CR3_PCID_MASK);
+	write_cr3(build_cr3(mm, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
-- 
2.13.5


* [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
  2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
  2017-09-17 18:19   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
  3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
  To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski

Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.

There were some bugs like this in the past, and I think there's
still one in current kernels.  In particular, we have a number of
ASID-unaware code paths that save CR3, write some special value, and
then restore CR3.  This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed.  This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0.  We could potentially see corruption
when switching back to ASID 0.  In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this.  (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index a999ba6b721f..9001f7a52216 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,14 +286,31 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
 
+/*
+ * If PCID is on, ASID-aware code paths put the ASID + 1 into the PCID
+ * bits.  This serves two purposes.  It prevents a nasty situation in
+ * which PCID-unaware code saves CR3, loads some other value (with PCID
+ * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
+ * the saved ASID was nonzero.  It also means that any bugs involving
+ * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
+ * deterministically.
+ */
+
 static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
 {
-	return __sme_pa(mm->pgd) | asid;
+	if (static_cpu_has(X86_FEATURE_PCID)) {
+		VM_WARN_ON_ONCE(asid > 4094);
+		return __sme_pa(mm->pgd) | (asid + 1);
+	} else {
+		VM_WARN_ON_ONCE(asid != 0);
+		return __sme_pa(mm->pgd);
+	}
 }
 
 static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
 {
-	return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+	VM_WARN_ON_ONCE(asid > 4094);
+	return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
 }
 
 /*
-- 
2.13.5


* [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
  2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
  2017-09-17 18:19   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
  3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
  To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski

Otherwise we might have the PCID feature bit set during cpu_init().

This is just for robustness.  I haven't seen any actual bugs here.

Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c   | 8 --------
 arch/x86/kernel/cpu/common.c | 8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index db684880d74a..0af86d9242da 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -21,14 +21,6 @@
 
 void __init check_bugs(void)
 {
-#ifdef CONFIG_X86_32
-	/*
-	 * Regardless of whether PCID is enumerated, the SDM says
-	 * that it can't be enabled in 32-bit mode.
-	 */
-	setup_clear_cpu_cap(X86_FEATURE_PCID);
-#endif
-
 	identify_boot_cpu();
 
 	if (!IS_ENABLED(CONFIG_SMP)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 775f10100d7f..c9176bae7fd8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -904,6 +904,14 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 	fpu__init_system(c);
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Regardless of whether PCID is enumerated, the SDM says
+	 * that it can't be enabled in 32-bit mode.
+	 */
+	setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
 }
 
 void __init early_cpu_init(void)
-- 
2.13.5


* [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
  2017-09-17 18:19   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
  To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski, Borislav Petkov

For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3.  Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3.  This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.

Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.

Cc: Borislav Petkov <bp@alien8.de>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/smpboot.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0854ff169274..ad59edd84de7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,12 +232,6 @@ static void notrace start_secondary(void *unused)
 	 */
 	if (boot_cpu_has(X86_FEATURE_PCID))
 		__write_cr4(__read_cr4() | X86_CR4_PCIDE);
-	cpu_init();
-	x86_cpuinit.early_percpu_clock_init();
-	preempt_disable();
-	smp_callin();
-
-	enable_start_cpu0 = 0;
 
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
@@ -245,6 +239,13 @@ static void notrace start_secondary(void *unused)
 	__flush_tlb_all();
 #endif
 
+	cpu_init();
+	x86_cpuinit.early_percpu_clock_init();
+	preempt_disable();
+	smp_callin();
+
+	enable_start_cpu0 = 0;
+
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/*
-- 
2.13.5


* [tip:x86/urgent] x86/mm: Factor out CR3-building code
  2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
@ 2017-09-17 18:18   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, torvalds, tglx, bpetkov, mingo, linux-kernel,
	Thomas.Lendacky, hpa, luto

Commit-ID:  47061a24e2ee5bd8a40d473d47a5bd823fa0081f
Gitweb:     http://git.kernel.org/tip/47061a24e2ee5bd8a40d473d47a5bd823fa0081f
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:48 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200

x86/mm: Factor out CR3-building code

Currently, the code that assembles a value to load into CR3 is
open-coded everywhere.  Factor it out into helpers build_cr3() and
build_cr3_noflush().

This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems.  No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 15 +++++++++++----
 arch/x86/mm/tlb.c                  | 11 +++++------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 7ae318c..a999ba6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,6 +286,15 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
 
+static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
+{
+	return __sme_pa(mm->pgd) | asid;
+}
+
+static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
+{
+	return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+}
 
 /*
  * This can be used from process context to figure out what the value of
@@ -296,10 +305,8 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
  */
 static inline unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
-
-	if (static_cpu_has(X86_FEATURE_PCID))
-		cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ab3821..93fe97c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -126,8 +126,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() !=
-			 (__sme_pa(real_prev->pgd) | prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -172,7 +171,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			 */
 			this_cpu_write(cpu_tlbstate.ctxs[prev_asid].tlb_gen,
 				       next_tlb_gen);
-			write_cr3(__sme_pa(next->pgd) | prev_asid);
+			write_cr3(build_cr3(next, prev_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
 					TLB_FLUSH_ALL);
 		}
@@ -216,12 +215,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		if (need_flush) {
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 			this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-			write_cr3(__sme_pa(next->pgd) | new_asid);
+			write_cr3(build_cr3(next, new_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
 					TLB_FLUSH_ALL);
 		} else {
 			/* The new ASID is already up to date. */
-			write_cr3(__sme_pa(next->pgd) | new_asid | CR3_NOFLUSH);
+			write_cr3(build_cr3_noflush(next, new_asid));
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 		}
 
@@ -265,7 +264,7 @@ void initialize_tlbstate_and_flush(void)
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
 	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(cr3 & ~CR3_PCID_MASK);
+	write_cr3(build_cr3(mm, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);


* [tip:x86/urgent] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
  2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
@ 2017-09-17 18:19   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, tglx, mingo, linux-kernel, hpa, bpetkov, luto, torvalds

Commit-ID:  52a2af400c1075219b3f0ce5c96fc961da44018a
Gitweb:     http://git.kernel.org/tip/52a2af400c1075219b3f0ce5c96fc961da44018a
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:49 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200

x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code

Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.

There were some bugs like this in the past, and I think there's
still one in current kernels.  In particular, we have a number of
ASID-unaware code paths that save CR3, write some special value, and
then restore CR3.  This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed.  This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0.  We could potentially see corruption
when switching back to ASID 0.  In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this.  (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index a999ba6..c120b5d 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,14 +286,31 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
 
+/*
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID
+ * bits.  This serves two purposes.  It prevents a nasty situation in
+ * which PCID-unaware code saves CR3, loads some other value (with PCID
+ * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
+ * the saved ASID was nonzero.  It also means that any bugs involving
+ * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
+ * deterministically.
+ */
+
 static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
 {
-	return __sme_pa(mm->pgd) | asid;
+	if (static_cpu_has(X86_FEATURE_PCID)) {
+		VM_WARN_ON_ONCE(asid > 4094);
+		return __sme_pa(mm->pgd) | (asid + 1);
+	} else {
+		VM_WARN_ON_ONCE(asid != 0);
+		return __sme_pa(mm->pgd);
+	}
 }
 
 static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
 {
-	return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+	VM_WARN_ON_ONCE(asid > 4094);
+	return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
 }
 
 /*


* [tip:x86/urgent] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
  2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
@ 2017-09-17 18:19   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, peterz, mingo, hpa, torvalds, bpetkov, luto, tglx

Commit-ID:  b8b7abaed7a49b350f8ba659ddc264b04931d581
Gitweb:     http://git.kernel.org/tip/b8b7abaed7a49b350f8ba659ddc264b04931d581
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:50 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200

x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier

Otherwise we might have the PCID feature bit set during cpu_init().

This is just for robustness.  I haven't seen any actual bugs here.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c   | 8 --------
 arch/x86/kernel/cpu/common.c | 8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index db68488..0af86d9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -21,14 +21,6 @@
 
 void __init check_bugs(void)
 {
-#ifdef CONFIG_X86_32
-	/*
-	 * Regardless of whether PCID is enumerated, the SDM says
-	 * that it can't be enabled in 32-bit mode.
-	 */
-	setup_clear_cpu_cap(X86_FEATURE_PCID);
-#endif
-
 	identify_boot_cpu();
 
 	if (!IS_ENABLED(CONFIG_SMP)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 775f101..c9176ba 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -904,6 +904,14 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 	fpu__init_system(c);
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Regardless of whether PCID is enumerated, the SDM says
+	 * that it can't be enabled in 32-bit mode.
+	 */
+	setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
 }
 
 void __init early_cpu_init(void)


* [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
@ 2017-09-17 18:19   ` tip-bot for Andy Lutomirski
  2017-09-18  8:17     ` Paul Menzel
  2017-09-20 12:52     ` Pavel Machek
  0 siblings, 2 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bpetkov, peterz, tglx, bp, mingo, torvalds, pmenzel,
	linux-kernel, hpa, pavel, luto

Commit-ID:  4ba55e65f471d011d3ba2ac2022180ea0877d68e
Gitweb:     http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:09 +0200

x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs

For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3.  Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3.  This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.

Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0854ff1..ad59edd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,12 +232,6 @@ static void notrace start_secondary(void *unused)
 	 */
 	if (boot_cpu_has(X86_FEATURE_PCID))
 		__write_cr4(__read_cr4() | X86_CR4_PCIDE);
-	cpu_init();
-	x86_cpuinit.early_percpu_clock_init();
-	preempt_disable();
-	smp_callin();
-
-	enable_start_cpu0 = 0;
 
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
@@ -245,6 +239,13 @@ static void notrace start_secondary(void *unused)
 	__flush_tlb_all();
 #endif
 
+	cpu_init();
+	x86_cpuinit.early_percpu_clock_init();
+	preempt_disable();
+	smp_callin();
+
+	enable_start_cpu0 = 0;
+
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/*


* Re: [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  2017-09-17 18:19   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
@ 2017-09-18  8:17     ` Paul Menzel
  2017-09-20 12:52     ` Pavel Machek
  1 sibling, 0 replies; 11+ messages in thread
From: Paul Menzel @ 2017-09-18  8:17 UTC (permalink / raw)
  To: mingo, bpetkov, peterz, bp, tglx, pavel, hpa, luto, torvalds,
	linux-kernel, linux-tip-commits

Dear Andy,


On 09/17/17 20:19, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Gitweb:     http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Author:     Andy Lutomirski <luto@kernel.org>
> AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 17 Sep 2017 18:59:09 +0200
> 
> x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
> 
> For unknown historical reasons (i.e. Borislav doesn't recall),
> 32-bit kernels invoke cpu_init() on secondary CPUs with
> initial_page_table loaded into CR3.  Then they set
> current->active_mm to &init_mm and call enter_lazy_tlb() before
> fixing CR3.

If this should be changed, I’d be happy to help.

> This means that the x86 TLB code gets invoked while CR3
> is inconsistent, and, with the improved PCID sanity checks I added,
> we warn.
> 
> Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
> 
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bpetkov@suse.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
> Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org

This should use the HTTPS protocol.

> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>   arch/x86/kernel/smpboot.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)

[…]

Thank you for the fix.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Lenovo X60t)


Kind regards,

Paul


* Re: [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  2017-09-17 18:19   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2017-09-18  8:17     ` Paul Menzel
@ 2017-09-20 12:52     ` Pavel Machek
  1 sibling, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2017-09-20 12:52 UTC (permalink / raw)
  To: mingo, bpetkov, peterz, bp, tglx, hpa, luto, torvalds, pmenzel,
	linux-kernel
  Cc: linux-tip-commits


On Sun 2017-09-17 11:19:53, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Gitweb:     http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Author:     Andy Lutomirski <luto@kernel.org>
> AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 17 Sep 2017 18:59:09 +0200
> 
> x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
> 
> For unknown historical reasons (i.e. Borislav doesn't recall),
> 32-bit kernels invoke cpu_init() on secondary CPUs with
> initial_page_table loaded into CR3.  Then they set
> current->active_mm to &init_mm and call enter_lazy_tlb() before
> fixing CR3.  This means that the x86 TLB code gets invoked while CR3
> is inconsistent, and, with the improved PCID sanity checks I added,
> we warn.
> 
> Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
> 
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Pavel Machek <pavel@ucw.cz>

4.14.0-rc1-next-20170919 does not produce the warning during bootup.

									Pavel

Tested-by: Pavel Machek <pavel@ucw.cz>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



