* [PATCH 0/4] More PCID fixes
@ 2017-09-17 16:03 Andy Lutomirski
2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski
This fixes a 32-bit boot warning, a 32-bit boot oddity that seems
unsymtomatic right now, and a potential userspace corruption issue
across EFI calls if PCID is enabled.
With this series applied, the only remaining issue I'm aware of is
the paging-structure cache laziness problem.
I've tested SMP boot, suspend/resume, hotplug/unplug, and hibernate
on a 32-bit VM, a 64-bit VM, and a 64-bit laptop. (Except hibernate
on the laptop -- my laptop can't resume from hibernation on *any*
kernel due to a userspace (systemd? Fedora overall integration?)
issue.)
Andy Lutomirski (4):
x86/mm: Factor out CR3-building code
x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
arch/x86/include/asm/mmu_context.h | 32 ++++++++++++++++++++++++++++----
arch/x86/kernel/cpu/bugs.c | 8 --------
arch/x86/kernel/cpu/common.c | 8 ++++++++
arch/x86/kernel/smpboot.c | 13 +++++++------
arch/x86/mm/tlb.c | 11 +++++------
5 files changed, 48 insertions(+), 24 deletions(-)
--
2.13.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] x86/mm: Factor out CR3-building code
2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
2017-09-17 18:18 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski, Tom Lendacky
Current, the code that assembles a value to load into CR3 is
open-coded everywhere. Factor it out into helpers build_cr3() and
build_cr3_noflush().
This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems. No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/mmu_context.h | 15 +++++++++++----
arch/x86/mm/tlb.c | 11 +++++------
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 7ae318c340d9..a999ba6b721f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,6 +286,15 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
return __pkru_allows_pkey(vma_pkey(vma), write);
}
+static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
+{
+ return __sme_pa(mm->pgd) | asid;
+}
+
+static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
+{
+ return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+}
/*
* This can be used from process context to figure out what the value of
@@ -296,10 +305,8 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
*/
static inline unsigned long __get_current_cr3_fast(void)
{
- unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
-
- if (static_cpu_has(X86_FEATURE_PCID))
- cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+ unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+ this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/* For now, be very restrictive about when this can be called. */
VM_WARN_ON(in_nmi() || preemptible());
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ab3821f9e26..93fe97cce581 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -126,8 +126,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* isn't free.
*/
#ifdef CONFIG_DEBUG_VM
- if (WARN_ON_ONCE(__read_cr3() !=
- (__sme_pa(real_prev->pgd) | prev_asid))) {
+ if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
/*
* If we were to BUG here, we'd be very likely to kill
* the system so hard that we don't see the call trace.
@@ -172,7 +171,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
*/
this_cpu_write(cpu_tlbstate.ctxs[prev_asid].tlb_gen,
next_tlb_gen);
- write_cr3(__sme_pa(next->pgd) | prev_asid);
+ write_cr3(build_cr3(next, prev_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
TLB_FLUSH_ALL);
}
@@ -216,12 +215,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(__sme_pa(next->pgd) | new_asid);
+ write_cr3(build_cr3(next, new_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(__sme_pa(next->pgd) | new_asid | CR3_NOFLUSH);
+ write_cr3(build_cr3_noflush(next, new_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
}
@@ -265,7 +264,7 @@ void initialize_tlbstate_and_flush(void)
!(cr4_read_shadow() & X86_CR4_PCIDE));
/* Force ASID 0 and force a TLB flush. */
- write_cr3(cr3 & ~CR3_PCID_MASK);
+ write_cr3(build_cr3(mm, 0));
/* Reinitialize tlbstate. */
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
--
2.13.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski
Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.
There were some bugs like this in the past, and I think there's
still one in current kernels. In particular, we have a number of
ASID-unware code paths that save CR3, write some special value, and
then restore CR3. This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed. This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0. We could potentially see corruption
when switching back to ASID 0. In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this. (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/mmu_context.h | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index a999ba6b721f..9001f7a52216 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,14 +286,31 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
return __pkru_allows_pkey(vma_pkey(vma), write);
}
+/*
+ * If PCID is on, ASID-aware code paths put the ASID + 1 into the PCID
+ * bits. This serves two purposes. It prevents a nasty situation in
+ * which PCID-unaware code saves CR3, loads some other value (with PCID
+ * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
+ * the saved ASID was nonzero. It also means that any bugs involving
+ * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
+ * deterministically.
+ */
+
static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
{
- return __sme_pa(mm->pgd) | asid;
+ if (static_cpu_has(X86_FEATURE_PCID)) {
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(mm->pgd) | (asid + 1);
+ } else {
+ VM_WARN_ON_ONCE(asid != 0);
+ return __sme_pa(mm->pgd);
+ }
}
static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
{
- return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
}
/*
--
2.13.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski
Otherwise we might have the PCID feature bit set during cpu_init().
This is just for robustness. I haven't seen any actual bugs here.
Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/cpu/bugs.c | 8 --------
arch/x86/kernel/cpu/common.c | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index db684880d74a..0af86d9242da 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -21,14 +21,6 @@
void __init check_bugs(void)
{
-#ifdef CONFIG_X86_32
- /*
- * Regardless of whether PCID is enumerated, the SDM says
- * that it can't be enabled in 32-bit mode.
- */
- setup_clear_cpu_cap(X86_FEATURE_PCID);
-#endif
-
identify_boot_cpu();
if (!IS_ENABLED(CONFIG_SMP)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 775f10100d7f..c9176bae7fd8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -904,6 +904,14 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
fpu__init_system(c);
+
+#ifdef CONFIG_X86_32
+ /*
+ * Regardless of whether PCID is enumerated, the SDM says
+ * that it can't be enabled in 32-bit mode.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
}
void __init early_cpu_init(void)
--
2.13.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
` (2 preceding siblings ...)
2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
@ 2017-09-17 16:03 ` Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
3 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-09-17 16:03 UTC (permalink / raw)
To: X86 ML; +Cc: Borislav Petkov, linux-kernel, Andy Lutomirski, Borislav Petkov
For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
intiial_page_table loaded into CR3. Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3. This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.
Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
Cc: Borislav Petkov <bp@alien8.de>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/smpboot.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0854ff169274..ad59edd84de7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,12 +232,6 @@ static void notrace start_secondary(void *unused)
*/
if (boot_cpu_has(X86_FEATURE_PCID))
__write_cr4(__read_cr4() | X86_CR4_PCIDE);
- cpu_init();
- x86_cpuinit.early_percpu_clock_init();
- preempt_disable();
- smp_callin();
-
- enable_start_cpu0 = 0;
#ifdef CONFIG_X86_32
/* switch away from the initial page table */
@@ -245,6 +239,13 @@ static void notrace start_secondary(void *unused)
__flush_tlb_all();
#endif
+ cpu_init();
+ x86_cpuinit.early_percpu_clock_init();
+ preempt_disable();
+ smp_callin();
+
+ enable_start_cpu0 = 0;
+
/* otherwise gcc will move up smp_processor_id before the cpu_init */
barrier();
/*
--
2.13.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [tip:x86/urgent] x86/mm: Factor out CR3-building code
2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
@ 2017-09-17 18:18 ` tip-bot for Andy Lutomirski
0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, torvalds, tglx, bpetkov, mingo, linux-kernel,
Thomas.Lendacky, hpa, luto
Commit-ID: 47061a24e2ee5bd8a40d473d47a5bd823fa0081f
Gitweb: http://git.kernel.org/tip/47061a24e2ee5bd8a40d473d47a5bd823fa0081f
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:48 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200
x86/mm: Factor out CR3-building code
Current, the code that assembles a value to load into CR3 is
open-coded everywhere. Factor it out into helpers build_cr3() and
build_cr3_noflush().
This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems. No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/mmu_context.h | 15 +++++++++++----
arch/x86/mm/tlb.c | 11 +++++------
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 7ae318c..a999ba6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,6 +286,15 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
return __pkru_allows_pkey(vma_pkey(vma), write);
}
+static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
+{
+ return __sme_pa(mm->pgd) | asid;
+}
+
+static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
+{
+ return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+}
/*
* This can be used from process context to figure out what the value of
@@ -296,10 +305,8 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
*/
static inline unsigned long __get_current_cr3_fast(void)
{
- unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
-
- if (static_cpu_has(X86_FEATURE_PCID))
- cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+ unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+ this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/* For now, be very restrictive about when this can be called. */
VM_WARN_ON(in_nmi() || preemptible());
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ab3821..93fe97c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -126,8 +126,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* isn't free.
*/
#ifdef CONFIG_DEBUG_VM
- if (WARN_ON_ONCE(__read_cr3() !=
- (__sme_pa(real_prev->pgd) | prev_asid))) {
+ if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
/*
* If we were to BUG here, we'd be very likely to kill
* the system so hard that we don't see the call trace.
@@ -172,7 +171,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
*/
this_cpu_write(cpu_tlbstate.ctxs[prev_asid].tlb_gen,
next_tlb_gen);
- write_cr3(__sme_pa(next->pgd) | prev_asid);
+ write_cr3(build_cr3(next, prev_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
TLB_FLUSH_ALL);
}
@@ -216,12 +215,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(__sme_pa(next->pgd) | new_asid);
+ write_cr3(build_cr3(next, new_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(__sme_pa(next->pgd) | new_asid | CR3_NOFLUSH);
+ write_cr3(build_cr3_noflush(next, new_asid));
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
}
@@ -265,7 +264,7 @@ void initialize_tlbstate_and_flush(void)
!(cr4_read_shadow() & X86_CR4_PCIDE));
/* Force ASID 0 and force a TLB flush. */
- write_cr3(cr3 & ~CR3_PCID_MASK);
+ write_cr3(build_cr3(mm, 0));
/* Reinitialize tlbstate. */
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [tip:x86/urgent] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
@ 2017-09-17 18:19 ` tip-bot for Andy Lutomirski
0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, tglx, mingo, linux-kernel, hpa, bpetkov, luto, torvalds
Commit-ID: 52a2af400c1075219b3f0ce5c96fc961da44018a
Gitweb: http://git.kernel.org/tip/52a2af400c1075219b3f0ce5c96fc961da44018a
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:49 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200
x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.
There were some bugs like this in the past, and I think there's
still one in current kernels. In particular, we have a number of
ASID-unware code paths that save CR3, write some special value, and
then restore CR3. This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed. This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0. We could potentially see corruption
when switching back to ASID 0. In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this. (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/mmu_context.h | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index a999ba6..c120b5d 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -286,14 +286,31 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
return __pkru_allows_pkey(vma_pkey(vma), write);
}
+/*
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID
+ * bits. This serves two purposes. It prevents a nasty situation in
+ * which PCID-unaware code saves CR3, loads some other value (with PCID
+ * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
+ * the saved ASID was nonzero. It also means that any bugs involving
+ * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
+ * deterministically.
+ */
+
static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
{
- return __sme_pa(mm->pgd) | asid;
+ if (static_cpu_has(X86_FEATURE_PCID)) {
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(mm->pgd) | (asid + 1);
+ } else {
+ VM_WARN_ON_ONCE(asid != 0);
+ return __sme_pa(mm->pgd);
+ }
}
static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
{
- return __sme_pa(mm->pgd) | asid | CR3_NOFLUSH;
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
}
/*
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [tip:x86/urgent] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
@ 2017-09-17 18:19 ` tip-bot for Andy Lutomirski
0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, peterz, mingo, hpa, torvalds, bpetkov, luto, tglx
Commit-ID: b8b7abaed7a49b350f8ba659ddc264b04931d581
Gitweb: http://git.kernel.org/tip/b8b7abaed7a49b350f8ba659ddc264b04931d581
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:50 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:08 +0200
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
Otherwise we might have the PCID feature bit set during cpu_init().
This is just for robustness. I haven't seen any actual bugs here.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/bugs.c | 8 --------
arch/x86/kernel/cpu/common.c | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index db68488..0af86d9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -21,14 +21,6 @@
void __init check_bugs(void)
{
-#ifdef CONFIG_X86_32
- /*
- * Regardless of whether PCID is enumerated, the SDM says
- * that it can't be enabled in 32-bit mode.
- */
- setup_clear_cpu_cap(X86_FEATURE_PCID);
-#endif
-
identify_boot_cpu();
if (!IS_ENABLED(CONFIG_SMP)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 775f101..c9176ba 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -904,6 +904,14 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
fpu__init_system(c);
+
+#ifdef CONFIG_X86_32
+ /*
+ * Regardless of whether PCID is enumerated, the SDM says
+ * that it can't be enabled in 32-bit mode.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
}
void __init early_cpu_init(void)
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
@ 2017-09-17 18:19 ` tip-bot for Andy Lutomirski
2017-09-18 8:17 ` Paul Menzel
2017-09-20 12:52 ` Pavel Machek
0 siblings, 2 replies; 11+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-09-17 18:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: bpetkov, peterz, tglx, bp, mingo, torvalds, pmenzel,
linux-kernel, hpa, pavel, luto
Commit-ID: 4ba55e65f471d011d3ba2ac2022180ea0877d68e
Gitweb: http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 17 Sep 2017 18:59:09 +0200
x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3. Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3. This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.
Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/smpboot.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0854ff1..ad59edd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,12 +232,6 @@ static void notrace start_secondary(void *unused)
*/
if (boot_cpu_has(X86_FEATURE_PCID))
__write_cr4(__read_cr4() | X86_CR4_PCIDE);
- cpu_init();
- x86_cpuinit.early_percpu_clock_init();
- preempt_disable();
- smp_callin();
-
- enable_start_cpu0 = 0;
#ifdef CONFIG_X86_32
/* switch away from the initial page table */
@@ -245,6 +239,13 @@ static void notrace start_secondary(void *unused)
__flush_tlb_all();
#endif
+ cpu_init();
+ x86_cpuinit.early_percpu_clock_init();
+ preempt_disable();
+ smp_callin();
+
+ enable_start_cpu0 = 0;
+
/* otherwise gcc will move up smp_processor_id before the cpu_init */
barrier();
/*
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
@ 2017-09-18 8:17 ` Paul Menzel
2017-09-20 12:52 ` Pavel Machek
1 sibling, 0 replies; 11+ messages in thread
From: Paul Menzel @ 2017-09-18 8:17 UTC (permalink / raw)
To: mingo, bpetkov, peterz, bp, tglx, pavel, hpa, luto, torvalds,
linux-kernel, linux-tip-commits
Dear Andy,
On 09/17/17 20:19, tip-bot for Andy Lutomirski wrote:
> Commit-ID: 4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Gitweb: http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Author: Andy Lutomirski <luto@kernel.org>
> AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 17 Sep 2017 18:59:09 +0200
>
> x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
>
> For unknown historical reasons (i.e. Borislav doesn't recall),
> 32-bit kernels invoke cpu_init() on secondary CPUs with
> initial_page_table loaded into CR3. Then they set
> current->active_mm to &init_mm and call enter_lazy_tlb() before
> fixing CR3.
If this should be changed, I’d be happy to help.
> This means that the x86 TLB code gets invoked while CR3
> is inconsistent, and, with the improved PCID sanity checks I added,
> we warn.
>
> Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Borislav Petkov <bpetkov@suse.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume")
> Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
This should use the HTTPS protocol.
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
> arch/x86/kernel/smpboot.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
[…]
Thank you for the fix.
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Lenovo X60t)
Kind regards,
Paul
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:x86/urgent] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-18 8:17 ` Paul Menzel
@ 2017-09-20 12:52 ` Pavel Machek
1 sibling, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2017-09-20 12:52 UTC (permalink / raw)
To: mingo, bpetkov, peterz, bp, tglx, hpa, luto, torvalds, pmenzel,
linux-kernel
Cc: linux-tip-commits
[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]
On Sun 2017-09-17 11:19:53, tip-bot for Andy Lutomirski wrote:
> Commit-ID: 4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Gitweb: http://git.kernel.org/tip/4ba55e65f471d011d3ba2ac2022180ea0877d68e
> Author: Andy Lutomirski <luto@kernel.org>
> AuthorDate: Sun, 17 Sep 2017 09:03:51 -0700
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 17 Sep 2017 18:59:09 +0200
>
> x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
>
> For unknown historical reasons (i.e. Borislav doesn't recall),
> 32-bit kernels invoke cpu_init() on secondary CPUs with
> initial_page_table loaded into CR3. Then they set
> current->active_mm to &init_mm and call enter_lazy_tlb() before
> fixing CR3. This means that the x86 TLB code gets invoked while CR3
> is inconsistent, and, with the improved PCID sanity checks I added,
> we warn.
>
> Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Pavel Machek <pavel@ucw.cz>
4.14.0-rc1-next-20170919 does not produce the warning during bootup.
Pavel
Tested-by: Pavel Machek <pavel@ucw.cz>
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-09-20 12:53 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-17 16:03 [PATCH 0/4] More PCID fixes Andy Lutomirski
2017-09-17 16:03 ` [PATCH 1/4] x86/mm: Factor out CR3-building code Andy Lutomirski
2017-09-17 18:18 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 2/4] x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 3/4] x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-17 16:03 ` [PATCH 4/4] x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs Andy Lutomirski
2017-09-17 18:19 ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-09-18 8:17 ` Paul Menzel
2017-09-20 12:52 ` Pavel Machek
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.