linux-kernel.vger.kernel.org archive mirror
* [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
@ 2020-04-19 20:31 Thomas Gleixner
  2020-04-19 20:31 ` [patch 01/15] x86/tlb: Uninline __get_current_cr3_fast() Thomas Gleixner
                   ` (18 more replies)
  0 siblings, 19 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

The per-CPU tlbstate contains sensitive information which should really
only be accessible in core code. It is exported to modules because some inline
functions which are required by KVM need access to it.

The following series creates regular exported functions for the few things
which are needed by KVM and hides the struct definition and some low level
helpers from modules.
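
The pattern used throughout the series is to turn header inlines which
dereference cpu_tlbstate into out of line functions in the core mm/cpu
code so that only the function symbol is exported, not the data. A
simplified sketch (not the literal patch content):

  /* Before: inline in <asm/tlbflush.h>, forces the export of cpu_tlbstate */
  static inline unsigned long cr4_read_shadow(void)
  {
          return this_cpu_read(cpu_tlbstate.cr4);
  }

  /* After: out of line in core code, only the accessor is exported */
  unsigned long cr4_read_shadow(void)
  {
          return this_cpu_read(cpu_tlbstate.cr4);
  }
  EXPORT_SYMBOL_GPL(cr4_read_shadow);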

The series is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel x86/tlb

Thanks,

	tglx

8<-----------------
 arch/x86/events/core.c             |   11 
 arch/x86/include/asm/mmu_context.h |   88 -------
 arch/x86/include/asm/paravirt.h    |   12 
 arch/x86/include/asm/pgtable_32.h  |    2 
 arch/x86/include/asm/tlbflush.h    |  454 ++++---------------------------------
 arch/x86/kernel/alternative.c      |   55 ++++
 arch/x86/kernel/cpu/common.c       |   25 +-
 arch/x86/kernel/cpu/mtrr/generic.c |    4 
 arch/x86/kernel/paravirt.c         |   21 -
 arch/x86/kernel/process.c          |   11 
 arch/x86/mm/init.c                 |   14 +
 arch/x86/mm/init_64.c              |    2 
 arch/x86/mm/ioremap.c              |    2 
 arch/x86/mm/kmmio.c                |    2 
 arch/x86/mm/mem_encrypt.c          |    2 
 arch/x86/mm/pat/set_memory.c       |    2 
 arch/x86/mm/pgtable.c              |    8 
 arch/x86/mm/pgtable_32.c           |    2 
 arch/x86/mm/tlb.c                  |  367 +++++++++++++++++++++++++++++
 arch/x86/platform/uv/tlb_uv.c      |    2 
 drivers/xen/privcmd.c              |    1 
 21 files changed, 556 insertions(+), 531 deletions(-)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 01/15] x86/tlb: Uninline __get_current_cr3_fast()
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 02/15] x86/cpu: Uninline CR4 accessors Thomas Gleixner
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

In preparation for unexporting cpu_tlbstate, move __get_current_cr3_fast()
into the x86 TLB management code.

No functional change.
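
For illustration only (not part of the patch), the intended usage pattern
follows the comment in the code: a caller such as KVM snapshots CR3
cheaply, temporarily switches it and restores the snapshot afterwards.
The middle step is a placeholder:

  /* Snapshot the current CR3 without a slow __read_cr3() */
  unsigned long cr3 = __get_current_cr3_fast();

  /* ... temporarily run with a different CR3, e.g. for a guest entry ... */

  /* Restore the previous CR3 */
  write_cr3(cr3);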

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/mmu_context.h |   19 +------------------
 arch/x86/mm/tlb.c                  |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -225,24 +225,7 @@ static inline bool arch_vma_access_permi
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
 
-/*
- * This can be used from process context to figure out what the value of
- * CR3 is without needing to do a (slow) __read_cr3().
- *
- * It's intended to be used for code like KVM that sneakily changes CR3
- * and needs to restore it.  It needs to be used very carefully.
- */
-static inline unsigned long __get_current_cr3_fast(void)
-{
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
-
-	/* For now, be very restrictive about when this can be called. */
-	VM_WARN_ON(in_nmi() || preemptible());
-
-	VM_BUG_ON(cr3 != __read_cr3());
-	return cr3;
-}
+unsigned long __get_current_cr3_fast(void);
 
 typedef struct {
 	struct mm_struct *mm;
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -844,6 +844,26 @@ void flush_tlb_kernel_range(unsigned lon
 }
 
 /*
+ * This can be used from process context to figure out what the value of
+ * CR3 is without needing to do a (slow) __read_cr3().
+ *
+ * It's intended to be used for code like KVM that sneakily changes CR3
+ * and needs to restore it.  It needs to be used very carefully.
+ */
+unsigned long __get_current_cr3_fast(void)
+{
+	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+
+	/* For now, be very restrictive about when this can be called. */
+	VM_WARN_ON(in_nmi() || preemptible());
+
+	VM_BUG_ON(cr3 != __read_cr3());
+	return cr3;
+}
+EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
+
+/*
  * arch_tlbbatch_flush() performs a full TLB flush regardless of the active mm.
  * This means that the 'struct flush_tlb_info' that describes which mappings to
  * flush is actually fixed. We therefore set a single fixed struct and use it in


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 02/15] x86/cpu: Uninline CR4 accessors
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
  2020-04-19 20:31 ` [patch 01/15] x86/tlb: Uninline __get_current_cr3_fast() Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-20  9:01   ` Christoph Hellwig
  2020-04-19 20:31 ` [patch 03/15] x86/cr4: Sanitize CR4.PCE update Thomas Gleixner
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache is
located there.

In preparation for unexporting cpu_tlbstate, create a builtin function for
manipulating CR4 and rework the various helpers to use it.

Export native_write_cr4() only when CONFIG_LKDTM=m.

No functional change.
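
As a rough usage sketch (not taken from this patch), callers which
previously open coded the shadow read-modify-write now go through the new
out of line helpers:

  /* Update CR4 bits with interrupts already disabled */
  cr4_update_irqsoff(X86_CR4_PCE, 0);   /* set PCE */
  cr4_update_irqsoff(0, X86_CR4_PCE);   /* clear PCE */

  /* Read the cached CR4 value without accessing the register */
  unsigned long cr4 = cr4_read_shadow();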

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |   36 +++++-------------------------------
 arch/x86/kernel/cpu/common.c    |   25 ++++++++++++++++++++++++-
 arch/x86/kernel/process.c       |   11 +++++++++++
 3 files changed, 40 insertions(+), 32 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -276,37 +276,25 @@ static inline bool nmi_uaccess_okay(void
 
 #define nmi_uaccess_okay nmi_uaccess_okay
 
+void cr4_update_irqsoff(unsigned long set, unsigned long clear);
+unsigned long cr4_read_shadow(void);
+
 /* Initialize cr4 shadow for this CPU. */
 static inline void cr4_init_shadow(void)
 {
 	this_cpu_write(cpu_tlbstate.cr4, __read_cr4());
 }
 
-static inline void __cr4_set(unsigned long cr4)
-{
-	lockdep_assert_irqs_disabled();
-	this_cpu_write(cpu_tlbstate.cr4, cr4);
-	__write_cr4(cr4);
-}
-
 /* Set in this cpu's CR4. */
 static inline void cr4_set_bits_irqsoff(unsigned long mask)
 {
-	unsigned long cr4;
-
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	if ((cr4 | mask) != cr4)
-		__cr4_set(cr4 | mask);
+	cr4_update_irqsoff(mask, 0);
 }
 
 /* Clear in this cpu's CR4. */
 static inline void cr4_clear_bits_irqsoff(unsigned long mask)
 {
-	unsigned long cr4;
-
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	if ((cr4 & ~mask) != cr4)
-		__cr4_set(cr4 & ~mask);
+	cr4_update_irqsoff(0, mask);
 }
 
 /* Set in this cpu's CR4. */
@@ -329,20 +317,6 @@ static inline void cr4_clear_bits(unsign
 	local_irq_restore(flags);
 }
 
-static inline void cr4_toggle_bits_irqsoff(unsigned long mask)
-{
-	unsigned long cr4;
-
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	__cr4_set(cr4 ^ mask);
-}
-
-/* Read the CR4 shadow. */
-static inline unsigned long cr4_read_shadow(void)
-{
-	return this_cpu_read(cpu_tlbstate.cr4);
-}
-
 /*
  * Mark all other ASIDs as invalid, preserves the current.
  */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -387,7 +387,30 @@ void native_write_cr4(unsigned long val)
 			  bits_missing);
 	}
 }
-EXPORT_SYMBOL(native_write_cr4);
+#if IS_MODULE(CONFIG_LKDTM)
+EXPORT_SYMBOL_GPL(native_write_cr4);
+#endif
+
+void cr4_update_irqsoff(unsigned long set, unsigned long clear)
+{
+	unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);
+
+	lockdep_assert_irqs_disabled();
+
+	newval = (cr4 & ~clear) | set;
+	if (newval != cr4) {
+		this_cpu_write(cpu_tlbstate.cr4, newval);
+		__write_cr4(newval);
+	}
+}
+EXPORT_SYMBOL(cr4_update_irqsoff);
+
+/* Read the CR4 shadow. */
+unsigned long cr4_read_shadow(void)
+{
+	return this_cpu_read(cpu_tlbstate.cr4);
+}
+EXPORT_SYMBOL_GPL(cr4_read_shadow);
 
 void cr4_init(void)
 {
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -612,6 +612,17 @@ void speculation_ctrl_update_current(voi
 	preempt_enable();
 }
 
+static inline void cr4_toggle_bits_irqsoff(unsigned long mask)
+{
+	unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);
+
+	newval = cr4 ^ mask;
+	if (newval != cr4) {
+		this_cpu_write(cpu_tlbstate.cr4, newval);
+		__write_cr4(newval);
+	}
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	unsigned long tifp, tifn;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 03/15] x86/cr4: Sanitize CR4.PCE update
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
  2020-04-19 20:31 ` [patch 01/15] x86/tlb: Uninline __get_current_cr3_fast() Thomas Gleixner
  2020-04-19 20:31 ` [patch 02/15] x86/cpu: Uninline CR4 accessors Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 04/15] x86/alternatives: Move temporary_mm helpers into C Thomas Gleixner
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

load_mm_cr4_irqsoff() is really a strange name for a function which has
only one purpose: Update the CR4.PCE bit depending on the perf state.

Rename it to cr4_update_pce_mm(), move it into the tlb code and provide a
function which can be invoked by the perf smp function calls.

Another step to remove exposure of cpu_tlbstate.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/events/core.c             |   11 +++--------
 arch/x86/include/asm/mmu_context.h |   14 +-------------
 arch/x86/mm/tlb.c                  |   22 +++++++++++++++++++++-
 3 files changed, 25 insertions(+), 22 deletions(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2162,11 +2162,6 @@ static int x86_pmu_event_init(struct per
 	return err;
 }
 
-static void refresh_pce(void *ignored)
-{
-	load_mm_cr4_irqsoff(this_cpu_read(cpu_tlbstate.loaded_mm));
-}
-
 static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 {
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
@@ -2185,7 +2180,7 @@ static void x86_pmu_event_mapped(struct
 	lockdep_assert_held_write(&mm->mmap_sem);
 
 	if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
-		on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
+		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
 }
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
@@ -2195,7 +2190,7 @@ static void x86_pmu_event_unmapped(struc
 		return;
 
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
-		on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
+		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
 }
 
 static int x86_pmu_event_idx(struct perf_event *event)
@@ -2253,7 +2248,7 @@ static ssize_t set_attr_rdpmc(struct dev
 		else if (x86_pmu.attr_rdpmc == 2)
 			static_branch_dec(&rdpmc_always_available_key);
 
-		on_each_cpu(refresh_pce, NULL, 1);
+		on_each_cpu(cr4_update_pce, NULL, 1);
 		x86_pmu.attr_rdpmc = val;
 	}
 
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -24,21 +24,9 @@ static inline void paravirt_activate_mm(
 #endif	/* !CONFIG_PARAVIRT_XXL */
 
 #ifdef CONFIG_PERF_EVENTS
-
 DECLARE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DECLARE_STATIC_KEY_FALSE(rdpmc_always_available_key);
-
-static inline void load_mm_cr4_irqsoff(struct mm_struct *mm)
-{
-	if (static_branch_unlikely(&rdpmc_always_available_key) ||
-	    (!static_branch_unlikely(&rdpmc_never_available_key) &&
-	     atomic_read(&mm->context.perf_rdpmc_allowed)))
-		cr4_set_bits_irqsoff(X86_CR4_PCE);
-	else
-		cr4_clear_bits_irqsoff(X86_CR4_PCE);
-}
-#else
-static inline void load_mm_cr4_irqsoff(struct mm_struct *mm) {}
+void cr4_update_pce(void *ignored);
 #endif
 
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -272,6 +272,26 @@ static void cond_ibpb(struct task_struct
 	}
 }
 
+#ifdef CONFIG_PERF_EVENTS
+static inline void cr4_update_pce_mm(struct mm_struct *mm)
+{
+	if (static_branch_unlikely(&rdpmc_always_available_key) ||
+	    (!static_branch_unlikely(&rdpmc_never_available_key) &&
+	     atomic_read(&mm->context.perf_rdpmc_allowed)))
+		cr4_set_bits_irqsoff(X86_CR4_PCE);
+	else
+		cr4_clear_bits_irqsoff(X86_CR4_PCE);
+}
+
+void cr4_update_pce(void *ignored)
+{
+	cr4_update_pce_mm(this_cpu_read(cpu_tlbstate.loaded_mm));
+}
+
+#else
+static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
+#endif
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -440,7 +460,7 @@ void switch_mm_irqs_off(struct mm_struct
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 
 	if (next != real_prev) {
-		load_mm_cr4_irqsoff(next);
+		cr4_update_pce_mm(next);
 		switch_ldt(real_prev, next);
 	}
 }


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 04/15] x86/alternatives: Move temporary_mm helpers into C
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-04-19 20:31 ` [patch 03/15] x86/cr4: Sanitize CR4.PCE update Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 05/15] x86/tlb: Move __flush_tlb() out of line Thomas Gleixner
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

The only user of these inlines is the text poke code, and they must not be
exposed to the world.

No functional change.
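
For reference, the usage pattern in the text poke code looks roughly like
this (the actual write through the temporary mapping is elided):

  temp_mm_state_t prev;
  unsigned long flags;

  local_irq_save(flags);
  prev = use_temporary_mm(poking_mm);   /* switch to the isolated poking mm */

  /* ... write the instruction bytes through the temporary mapping ... */

  unuse_temporary_mm(prev);             /* switch back, restore breakpoints */
  local_irq_restore(flags);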

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/mmu_context.h |   55 -------------------------------------
 arch/x86/kernel/alternative.c      |   55 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 55 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -215,59 +215,4 @@ static inline bool arch_vma_access_permi
 
 unsigned long __get_current_cr3_fast(void);
 
-typedef struct {
-	struct mm_struct *mm;
-} temp_mm_state_t;
-
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
- */
-static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
-{
-	temp_mm_state_t temp_state;
-
-	lockdep_assert_irqs_disabled();
-	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, mm, current);
-
-	/*
-	 * If breakpoints are enabled, disable them while the temporary mm is
-	 * used. Userspace might set up watchpoints on addresses that are used
-	 * in the temporary mm, which would lead to wrong signals being sent or
-	 * crashes.
-	 *
-	 * Note that breakpoints are not disabled selectively, which also causes
-	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
-	 * undesirable, but still seems reasonable as the code that runs in the
-	 * temporary mm should be short.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_disable();
-
-	return temp_state;
-}
-
-static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
-{
-	lockdep_assert_irqs_disabled();
-	switch_mm_irqs_off(NULL, prev_state.mm, current);
-
-	/*
-	 * Restore the breakpoints if they were disabled before the temporary mm
-	 * was loaded.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
-}
-
 #endif /* _ASM_X86_MMU_CONTEXT_H */
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -783,6 +783,61 @@ void __init_or_module text_poke_early(vo
 	}
 }
 
+typedef struct {
+	struct mm_struct *mm;
+} temp_mm_state_t;
+
+/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ *          harden security IRQs must be disabled while the temporary mm is
+ *          loaded, thereby preventing interrupt handler bugs from overriding
+ *          the kernel memory protection.
+ */
+static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
+{
+	temp_mm_state_t temp_state;
+
+	lockdep_assert_irqs_disabled();
+	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, mm, current);
+
+	/*
+	 * If breakpoints are enabled, disable them while the temporary mm is
+	 * used. Userspace might set up watchpoints on addresses that are used
+	 * in the temporary mm, which would lead to wrong signals being sent or
+	 * crashes.
+	 *
+	 * Note that breakpoints are not disabled selectively, which also causes
+	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
+	 * undesirable, but still seems reasonable as the code that runs in the
+	 * temporary mm should be short.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_disable();
+
+	return temp_state;
+}
+
+static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
+{
+	lockdep_assert_irqs_disabled();
+	switch_mm_irqs_off(NULL, prev_state.mm, current);
+
+	/*
+	 * Restore the breakpoints if they were disabled before the temporary mm
+	 * was loaded.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();
+}
+
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-04-19 20:31 ` [patch 04/15] x86/alternatives: Move temporary_mm helpers into C Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-20 13:48   ` Tom Lendacky
  2020-04-19 20:31 ` [patch 06/15] x86/tlb: Move __flush_tlb_global() " Thomas Gleixner
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various TLB flush functions need access to cpu_tlbstate. As a first
step move __flush_tlb() out of line and hide the native function. The
latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.
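
Condensed from the hunks below: with CONFIG_PARAVIRT the common
flush_tlb_local() entry point goes through pv_ops, without it
__flush_tlb_local is simply #defined to the native function, which can
then be static:

  /* arch/x86/mm/tlb.c (condensed) */
  #ifdef CONFIG_PARAVIRT
  # define STATIC_NOPV
  #else
  # define STATIC_NOPV          static
  # define __flush_tlb_local    native_flush_tlb_local
  #endif

  STATIC_NOPV void native_flush_tlb_local(void)
  {
          /* invalidate the user ASID and reload CR3 */
  }

  void flush_tlb_local(void)
  {
          __flush_tlb_local();  /* pv_ops indirection or direct native call */
  }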

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h    |    4 +++-
 arch/x86/include/asm/tlbflush.h    |   29 +++++------------------------
 arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
 arch/x86/kernel/paravirt.c         |    7 +------
 arch/x86/mm/mem_encrypt.c          |    2 +-
 arch/x86/mm/tlb.c                  |   33 ++++++++++++++++++++++++++++++++-
 arch/x86/platform/uv/tlb_uv.c      |    2 +-
 7 files changed, 45 insertions(+), 36 deletions(-)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -47,7 +47,9 @@ static inline void slow_down_io(void)
 #endif
 }
 
-static inline void __flush_tlb(void)
+void native_flush_tlb_local(void);
+
+static inline void __flush_tlb_local(void)
 {
 	PVOP_VCALL0(mmu.flush_tlb_user);
 }
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
 	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
+void flush_tlb_local(void);
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-#define __flush_tlb() __native_flush_tlb()
-#define __flush_tlb_global() __native_flush_tlb_global()
-#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
+#define __flush_tlb_global()		__native_flush_tlb_global()
+#define __flush_tlb_one_user(addr)	__native_flush_tlb_one_user(addr)
 #endif
 
 struct tlb_context {
@@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
 }
 
 /*
- * flush the entire current user mapping
- */
-static inline void __native_flush_tlb(void)
-{
-	/*
-	 * Preemption or interrupts must be disabled to protect the access
-	 * to the per CPU variable and to prevent being preempted between
-	 * read_cr3() and write_cr3().
-	 */
-	WARN_ON_ONCE(preemptible());
-
-	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
-
-	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
-	native_write_cr3(__native_read_cr3());
-}
-
-/*
  * flush everything
  */
 static inline void __native_flush_tlb_global(void)
@@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
 		/*
 		 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
 		 */
-		__flush_tlb();
+		flush_tlb_local();
 	}
 }
 
@@ -537,8 +520,6 @@ struct flush_tlb_info {
 	bool			freed_tables;
 };
 
-#define local_flush_tlb() __flush_tlb()
-
 #define flush_tlb_mm(mm)						\
 		flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
 
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
 
 	/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
 	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
-	__flush_tlb();
+	flush_tlb_local();
 
 	/* Save MTRR state */
 	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
@@ -778,7 +778,7 @@ static void post_set(void) __releases(se
 {
 	/* Flush TLBs (no need to flush caches - they are disabled) */
 	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
-	__flush_tlb();
+	flush_tlb_local();
 
 	/* Intel (P6) standard MTRRs */
 	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
 	return insn_len;
 }
 
-static void native_flush_tlb(void)
-{
-	__native_flush_tlb();
-}
-
 /*
  * Global pages have to be flushed a bit differently. Not a real
  * performance problem because this does not happen often.
@@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
 #endif /* CONFIG_PARAVIRT_XXL */
 
 	/* Mmu ops. */
-	.mmu.flush_tlb_user	= native_flush_tlb,
+	.mmu.flush_tlb_user	= native_flush_tlb_local,
 	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
 	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
 	.mmu.flush_tlb_others	= native_flush_tlb_others,
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
 		size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
 	} while (size);
 
-	__native_flush_tlb();
+	flush_tlb_local();
 }
 
 void __init sme_unmap_bootdata(char *real_mode_data)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -18,6 +18,13 @@
 
 #include "mm_internal.h"
 
+#ifdef CONFIG_PARAVIRT
+# define STATIC_NOPV
+#else
+# define STATIC_NOPV			static
+# define __flush_tlb_local		native_flush_tlb_local
+#endif
+
 /*
  *	TLB flushing, formerly SMP-only
  *		c/o Linus Torvalds.
@@ -645,7 +652,7 @@ static void flush_tlb_func_common(const
 		trace_tlb_flush(reason, nr_invalidate);
 	} else {
 		/* Full flush. */
-		local_flush_tlb();
+		flush_tlb_local();
 		if (local)
 			count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
 		trace_tlb_flush(reason, TLB_FLUSH_ALL);
@@ -884,6 +891,30 @@ unsigned long __get_current_cr3_fast(voi
 EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
 
 /*
+ * Flush the entire current user mapping
+ */
+STATIC_NOPV void native_flush_tlb_local(void)
+{
+	/*
+	 * Preemption or interrupts must be disabled to protect the access
+	 * to the per CPU variable and to prevent being preempted between
+	 * read_cr3() and write_cr3().
+	 */
+	WARN_ON_ONCE(preemptible());
+
+	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+
+	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
+	native_write_cr3(__native_read_cr3());
+}
+
+void flush_tlb_local(void)
+{
+	__flush_tlb_local();
+}
+EXPORT_SYMBOL_GPL(flush_tlb_local);
+
+/*
  * arch_tlbbatch_flush() performs a full TLB flush regardless of the active mm.
  * This means that the 'struct flush_tlb_info' that describes which mappings to
  * flush is actually fixed. We therefore set a single fixed struct and use it in
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -293,7 +293,7 @@ static void bau_process_message(struct m
 	 * This must be a normal message, or retry of a normal message
 	 */
 	if (msg->address == TLB_FLUSH_ALL) {
-		local_flush_tlb();
+		flush_tlb_local();
 		stat->d_alltlb++;
 	} else {
 		__flush_tlb_one_user(msg->address);


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 06/15] x86/tlb: Move __flush_tlb_global() out of line
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (4 preceding siblings ...)
  2020-04-19 20:31 ` [patch 05/15] x86/tlb: Move __flush_tlb() out of line Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 07/15] x86/tlb: Move __flush_tlb_one_user() " Thomas Gleixner
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Juergen Gross, Thomas Lendacky,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various TLB flush functions need access to cpu_tlbstate. As a second step
move __flush_tlb_global() out of line and hide the native function. The
latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h |    1 
 arch/x86/include/asm/tlbflush.h |   38 +------------------------------------
 arch/x86/kernel/paravirt.c      |    9 --------
 arch/x86/mm/tlb.c               |   41 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 44 insertions(+), 45 deletions(-)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -48,6 +48,7 @@ static inline void slow_down_io(void)
 }
 
 void native_flush_tlb_local(void);
+void native_flush_tlb_global(void);
 
 static inline void __flush_tlb_local(void)
 {
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -141,11 +141,11 @@ static inline unsigned long build_cr3_no
 }
 
 void flush_tlb_local(void);
+void flush_tlb_global(void);
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-#define __flush_tlb_global()		__native_flush_tlb_global()
 #define __flush_tlb_one_user(addr)	__native_flush_tlb_one_user(addr)
 #endif
 
@@ -372,40 +372,6 @@ static inline void invalidate_user_asid(
 }
 
 /*
- * flush everything
- */
-static inline void __native_flush_tlb_global(void)
-{
-	unsigned long cr4, flags;
-
-	if (static_cpu_has(X86_FEATURE_INVPCID)) {
-		/*
-		 * Using INVPCID is considerably faster than a pair of writes
-		 * to CR4 sandwiched inside an IRQ flag save/restore.
-		 *
-		 * Note, this works with CR4.PCIDE=0 or 1.
-		 */
-		invpcid_flush_all();
-		return;
-	}
-
-	/*
-	 * Read-modify-write to CR4 - protect it from preemption and
-	 * from interrupts. (Use the raw variant because this code can
-	 * be called from deep inside debugging code.)
-	 */
-	raw_local_irq_save(flags);
-
-	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	/* toggle PGE */
-	native_write_cr4(cr4 ^ X86_CR4_PGE);
-	/* write old PGE again and flush TLBs */
-	native_write_cr4(cr4);
-
-	raw_local_irq_restore(flags);
-}
-
-/*
  * flush one page in the user mapping
  */
 static inline void __native_flush_tlb_one_user(unsigned long addr)
@@ -439,7 +405,7 @@ static inline void __flush_tlb_all(void)
 	VM_WARN_ON_ONCE(preemptible());
 
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
-		__flush_tlb_global();
+		flush_tlb_global();
 	} else {
 		/*
 		 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -160,15 +160,6 @@ unsigned paravirt_patch_insns(void *insn
 	return insn_len;
 }
 
-/*
- * Global pages have to be flushed a bit differently. Not a real
- * performance problem because this does not happen often.
- */
-static void native_flush_tlb_global(void)
-{
-	__native_flush_tlb_global();
-}
-
 static void native_flush_tlb_one_user(unsigned long addr)
 {
 	__native_flush_tlb_one_user(addr);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -23,6 +23,7 @@
 #else
 # define STATIC_NOPV			static
 # define __flush_tlb_local		native_flush_tlb_local
+# define __flush_tlb_global		native_flush_tlb_global
 #endif
 
 /*
@@ -891,6 +892,46 @@ unsigned long __get_current_cr3_fast(voi
 EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
 
 /*
+ * Flush everything
+ */
+STATIC_NOPV void native_flush_tlb_global(void)
+{
+	unsigned long cr4, flags;
+
+	if (static_cpu_has(X86_FEATURE_INVPCID)) {
+		/*
+		 * Using INVPCID is considerably faster than a pair of writes
+		 * to CR4 sandwiched inside an IRQ flag save/restore.
+		 *
+		 * Note, this works with CR4.PCIDE=0 or 1.
+		 */
+		invpcid_flush_all();
+		return;
+	}
+
+	/*
+	 * Read-modify-write to CR4 - protect it from preemption and
+	 * from interrupts. (Use the raw variant because this code can
+	 * be called from deep inside debugging code.)
+	 */
+	raw_local_irq_save(flags);
+
+	cr4 = this_cpu_read(cpu_tlbstate.cr4);
+	/* toggle PGE */
+	native_write_cr4(cr4 ^ X86_CR4_PGE);
+	/* write old PGE again and flush TLBs */
+	native_write_cr4(cr4);
+
+	raw_local_irq_restore(flags);
+}
+
+void flush_tlb_global(void)
+{
+	__flush_tlb_global();
+}
+EXPORT_SYMBOL_GPL(flush_tlb_global);
+
+/*
  * Flush the entire current user mapping
  */
 STATIC_NOPV void native_flush_tlb_local(void)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 07/15] x86/tlb: Move __flush_tlb_one_user() out of line
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (5 preceding siblings ...)
  2020-04-19 20:31 ` [patch 06/15] x86/tlb: Move __flush_tlb_global() " Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 08/15] x86/tlb: Move __flush_tlb_one_kernel() " Thomas Gleixner
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Juergen Gross, Thomas Lendacky,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various TLB flush functions need access to cpu_tlbstate. As a third step
move __flush_tlb_one_user() out of line and hide the native function. The
latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h |    1 
 arch/x86/include/asm/tlbflush.h |   53 +--------------------------------------
 arch/x86/kernel/paravirt.c      |    5 ---
 arch/x86/mm/tlb.c               |   54 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 56 deletions(-)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -49,6 +49,7 @@ static inline void slow_down_io(void)
 
 void native_flush_tlb_local(void);
 void native_flush_tlb_global(void);
+void native_flush_tlb_one_user(unsigned long addr);
 
 static inline void __flush_tlb_local(void)
 {
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -142,11 +142,10 @@ static inline unsigned long build_cr3_no
 
 void flush_tlb_local(void);
 void flush_tlb_global(void);
+void flush_tlb_one_user(unsigned long addr);
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
-#else
-#define __flush_tlb_one_user(addr)	__native_flush_tlb_one_user(addr)
 #endif
 
 struct tlb_context {
@@ -346,54 +345,6 @@ static inline void cr4_set_bits_and_upda
 extern void initialize_tlbstate_and_flush(void);
 
 /*
- * Given an ASID, flush the corresponding user ASID.  We can delay this
- * until the next time we switch to it.
- *
- * See SWITCH_TO_USER_CR3.
- */
-static inline void invalidate_user_asid(u16 asid)
-{
-	/* There is no user ASID if address space separation is off */
-	if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
-		return;
-
-	/*
-	 * We only have a single ASID if PCID is off and the CR3
-	 * write will have flushed it.
-	 */
-	if (!cpu_feature_enabled(X86_FEATURE_PCID))
-		return;
-
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return;
-
-	__set_bit(kern_pcid(asid),
-		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
-}
-
-/*
- * flush one page in the user mapping
- */
-static inline void __native_flush_tlb_one_user(unsigned long addr)
-{
-	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-
-	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
-
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return;
-
-	/*
-	 * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
-	 * Just use invalidate_user_asid() in case we are called early.
-	 */
-	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
-		invalidate_user_asid(loaded_mm_asid);
-	else
-		invpcid_flush_one(user_pcid(loaded_mm_asid), addr);
-}
-
-/*
  * flush everything
  */
 static inline void __flush_tlb_all(void)
@@ -432,7 +383,7 @@ static inline void __flush_tlb_one_kerne
 	 * kernel address space and for its usermode counterpart, but it does
 	 * not flush it for other address spaces.
 	 */
-	__flush_tlb_one_user(addr);
+	flush_tlb_one_user(addr);
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
 	return insn_len;
 }
 
-static void native_flush_tlb_one_user(unsigned long addr)
-{
-	__native_flush_tlb_one_user(addr);
-}
-
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -24,6 +24,7 @@
 # define STATIC_NOPV			static
 # define __flush_tlb_local		native_flush_tlb_local
 # define __flush_tlb_global		native_flush_tlb_global
+# define __flush_tlb_one_user(addr)	native_flush_tlb_one_user(addr)
 #endif
 
 /*
@@ -118,6 +119,32 @@ static void choose_new_asid(struct mm_st
 	*need_flush = true;
 }
 
+/*
+ * Given an ASID, flush the corresponding user ASID.  We can delay this
+ * until the next time we switch to it.
+ *
+ * See SWITCH_TO_USER_CR3.
+ */
+static inline void invalidate_user_asid(u16 asid)
+{
+	/* There is no user ASID if address space separation is off */
+	if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+		return;
+
+	/*
+	 * We only have a single ASID if PCID is off and the CR3
+	 * write will have flushed it.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_PCID))
+		return;
+
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	__set_bit(kern_pcid(asid),
+		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
+}
+
 static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
 {
 	unsigned long new_mm_cr3;
@@ -892,6 +919,33 @@ unsigned long __get_current_cr3_fast(voi
 EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
 
 /*
+ * Flush one page in the user mapping
+ */
+STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr)
+{
+	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
+	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	/*
+	 * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
+	 * Just use invalidate_user_asid() in case we are called early.
+	 */
+	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
+		invalidate_user_asid(loaded_mm_asid);
+	else
+		invpcid_flush_one(user_pcid(loaded_mm_asid), addr);
+}
+
+void flush_tlb_one_user(unsigned long addr)
+{
+	__flush_tlb_one_user(addr);
+}
+
+/*
  * Flush everything
  */
 STATIC_NOPV void native_flush_tlb_global(void)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 08/15] x86/tlb: Move __flush_tlb_one_kernel() out of line
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (6 preceding siblings ...)
  2020-04-19 20:31 ` [patch 07/15] x86/tlb: Move __flush_tlb_one_user() " Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 09/15] x86/tlb: Move flush_tlb_others() " Thomas Gleixner
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various TLB flush functions need access to cpu_tlbstate. As a fourth step
move __flush_tlb_one_kernel() out of line and hide the native function. The
latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/pgtable_32.h |    2 -
 arch/x86/include/asm/tlbflush.h   |   41 --------------------------------------
 arch/x86/mm/init_64.c             |    2 -
 arch/x86/mm/ioremap.c             |    2 -
 arch/x86/mm/kmmio.c               |    2 -
 arch/x86/mm/pat/set_memory.c      |    2 -
 arch/x86/mm/pgtable_32.c          |    2 -
 arch/x86/mm/tlb.c                 |   34 ++++++++++++++++++++++++++++++-
 8 files changed, 40 insertions(+), 47 deletions(-)

--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -60,7 +60,7 @@ void sync_initial_page_table(void);
 #define kpte_clear_flush(ptep, vaddr)		\
 do {						\
 	pte_clear(&init_mm, (vaddr), (ptep));	\
-	__flush_tlb_one_kernel((vaddr));		\
+	flush_tlb_one_kernel((vaddr));		\
 } while (0)
 
 #endif /* !__ASSEMBLY__ */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -143,6 +143,7 @@ static inline unsigned long build_cr3_no
 void flush_tlb_local(void);
 void flush_tlb_global(void);
 void flush_tlb_one_user(unsigned long addr);
+void flush_tlb_one_kernel(unsigned long addr);
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
@@ -318,14 +319,6 @@ static inline void cr4_clear_bits(unsign
 }
 
 /*
- * Mark all other ASIDs as invalid, preserves the current.
- */
-static inline void invalidate_other_asid(void)
-{
-	this_cpu_write(cpu_tlbstate.invalidate_other, true);
-}
-
-/*
  * Save some of cr4 feature set we're using (e.g.  Pentium 4MB
  * enable and PPro Global page enable), so that any CPU's that boot
  * up after us can get the correct flags.  This should only be used
@@ -365,38 +358,6 @@ static inline void __flush_tlb_all(void)
 	}
 }
 
-/*
- * flush one page in the kernel mapping
- */
-static inline void __flush_tlb_one_kernel(unsigned long addr)
-{
-	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
-
-	/*
-	 * If PTI is off, then __flush_tlb_one_user() is just INVLPG or its
-	 * paravirt equivalent.  Even with PCID, this is sufficient: we only
-	 * use PCID if we also use global PTEs for the kernel mapping, and
-	 * INVLPG flushes global translations across all address spaces.
-	 *
-	 * If PTI is on, then the kernel is mapped with non-global PTEs, and
-	 * __flush_tlb_one_user() will flush the given address for the current
-	 * kernel address space and for its usermode counterpart, but it does
-	 * not flush it for other address spaces.
-	 */
-	flush_tlb_one_user(addr);
-
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return;
-
-	/*
-	 * See above.  We need to propagate the flush to all other address
-	 * spaces.  In principle, we only need to propagate it to kernelmode
-	 * address spaces, but the extra bookkeeping we would need is not
-	 * worth it.
-	 */
-	invalidate_other_asid();
-}
-
 #define TLB_FLUSH_ALL	-1UL
 
 /*
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -298,7 +298,7 @@ static void __set_pte_vaddr(pud_t *pud,
 	 * It's enough to flush this one mapping.
 	 * (PGE mappings get flushed as well)
 	 */
-	__flush_tlb_one_kernel(vaddr);
+	flush_tlb_one_kernel(vaddr);
 }
 
 void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte)
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -889,5 +889,5 @@ void __init __early_set_fixmap(enum fixe
 		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
 	else
 		pte_clear(&init_mm, addr, pte);
-	__flush_tlb_one_kernel(addr);
+	flush_tlb_one_kernel(addr);
 }
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -173,7 +173,7 @@ static int clear_page_presence(struct km
 		return -1;
 	}
 
-	__flush_tlb_one_kernel(f->addr);
+	flush_tlb_one_kernel(f->addr);
 	return 0;
 }
 
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -340,7 +340,7 @@ static void __cpa_flush_tlb(void *data)
 	unsigned int i;
 
 	for (i = 0; i < cpa->numpages; i++)
-		__flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
+		flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
 }
 
 static void cpa_flush(struct cpa_data *data, int cache)
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -64,7 +64,7 @@ void set_pte_vaddr(unsigned long vaddr,
 	 * It's enough to flush this one mapping.
 	 * (PGE mappings get flushed as well)
 	 */
-	__flush_tlb_one_kernel(vaddr);
+	flush_tlb_one_kernel(vaddr);
 }
 
 unsigned long __FIXADDR_TOP = 0xfffff000;
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -876,7 +876,7 @@ static void do_kernel_range_flush(void *
 
 	/* flush range by one by one 'invlpg' */
 	for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
-		__flush_tlb_one_kernel(addr);
+		flush_tlb_one_kernel(addr);
 }
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
@@ -919,6 +919,38 @@ unsigned long __get_current_cr3_fast(voi
 EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
 
 /*
+ * Flush one page in the kernel mapping
+ */
+void flush_tlb_one_kernel(unsigned long addr)
+{
+	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
+
+	/*
+	 * If PTI is off, then __flush_tlb_one_user() is just INVLPG or its
+	 * paravirt equivalent.  Even with PCID, this is sufficient: we only
+	 * use PCID if we also use global PTEs for the kernel mapping, and
+	 * INVLPG flushes global translations across all address spaces.
+	 *
+	 * If PTI is on, then the kernel is mapped with non-global PTEs, and
+	 * __flush_tlb_one_user() will flush the given address for the current
+	 * kernel address space and for its usermode counterpart, but it does
+	 * not flush it for other address spaces.
+	 */
+	flush_tlb_one_user(addr);
+
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	/*
+	 * See above.  We need to propagate the flush to all other address
+	 * spaces.  In principle, we only need to propagate it to kernelmode
+	 * address spaces, but the extra bookkeeping we would need is not
+	 * worth it.
+	 */
+	this_cpu_write(cpu_tlbstate.invalidate_other, true);
+}
+
+/*
  * Flush one page in the user mapping
  */
 STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 09/15] x86/tlb: Move flush_tlb_others() out of line
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (7 preceding siblings ...)
  2020-04-19 20:31 ` [patch 08/15] x86/tlb: Move __flush_tlb_one_kernel() " Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 10/15] x86/tlb: Move paravirt_tlb_remove_table() to the usage site Thomas Gleixner
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

The various TLB flush functions need access to cpu_tlbstate. As the last step
move __flush_tlb_others() out of line and hide the native function. The
latter can be static when CONFIG_PARAVIRT is disabled.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/paravirt.h |    6 ++++--
 arch/x86/include/asm/tlbflush.h |   10 ++++------
 arch/x86/mm/tlb.c               |   11 +++++++++--
 3 files changed, 17 insertions(+), 10 deletions(-)


--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -50,6 +50,8 @@ static inline void slow_down_io(void)
 void native_flush_tlb_local(void);
 void native_flush_tlb_global(void);
 void native_flush_tlb_one_user(unsigned long addr);
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     const struct flush_tlb_info *info);
 
 static inline void __flush_tlb_local(void)
 {
@@ -66,8 +68,8 @@ static inline void __flush_tlb_one_user(
 	PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
 }
 
-static inline void flush_tlb_others(const struct cpumask *cpumask,
-				    const struct flush_tlb_info *info)
+static inline void __flush_tlb_others(const struct cpumask *cpumask,
+				      const struct flush_tlb_info *info)
 {
 	PVOP_VCALL2(mmu.flush_tlb_others, cpumask, info);
 }
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -140,10 +140,14 @@ static inline unsigned long build_cr3_no
 	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
+struct flush_tlb_info;
+
 void flush_tlb_local(void);
 void flush_tlb_global(void);
 void flush_tlb_one_user(unsigned long addr);
 void flush_tlb_one_kernel(unsigned long addr);
+void flush_tlb_others(const struct cpumask *cpumask,
+		      const struct flush_tlb_info *info);
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
@@ -418,9 +422,6 @@ static inline void flush_tlb_page(struct
 	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
 }
 
-void native_flush_tlb_others(const struct cpumask *cpumask,
-			     const struct flush_tlb_info *info);
-
 static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 {
 	/*
@@ -442,9 +443,6 @@ static inline void arch_tlbbatch_add_mm(
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
 #ifndef CONFIG_PARAVIRT
-#define flush_tlb_others(mask, info)	\
-	native_flush_tlb_others(mask, info)
-
 #define paravirt_tlb_remove_table(tlb, page) \
 	tlb_remove_page(tlb, (void *)(page))
 #endif
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -25,6 +25,7 @@
 # define __flush_tlb_local		native_flush_tlb_local
 # define __flush_tlb_global		native_flush_tlb_global
 # define __flush_tlb_one_user(addr)	native_flush_tlb_one_user(addr)
+# define __flush_tlb_others(msk, info)	native_flush_tlb_others(msk, info)
 #endif
 
 /*
@@ -715,8 +716,8 @@ static bool tlb_is_not_lazy(int cpu, voi
 	return !per_cpu(cpu_tlbstate.is_lazy, cpu);
 }
 
-void native_flush_tlb_others(const struct cpumask *cpumask,
-			     const struct flush_tlb_info *info)
+STATIC_NOPV void native_flush_tlb_others(const struct cpumask *cpumask,
+					 const struct flush_tlb_info *info)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
 	if (info->end == TLB_FLUSH_ALL)
@@ -766,6 +767,12 @@ void native_flush_tlb_others(const struc
 				(void *)info, 1, cpumask);
 }
 
+void flush_tlb_others(const struct cpumask *cpumask,
+		      const struct flush_tlb_info *info)
+{
+	__flush_tlb_others(cpumask, info);
+}
+
 /*
  * See Documentation/x86/tlb.rst for details.  We choose 33
  * because it is large enough to cover the vast majority (at


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 10/15] x86/tlb: Move paravirt_tlb_remove_table() to the usage site
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (8 preceding siblings ...)
  2020-04-19 20:31 ` [patch 09/15] x86/tlb: Move flush_tlb_others() " Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 11/15] x86/tlb: Move cr4_set_bits_and_update_boot() " Thomas Gleixner
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

Move it to where the only user is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |    5 -----
 arch/x86/mm/pgtable.c           |    8 ++++++++
 2 files changed, 8 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -442,9 +442,4 @@ static inline void arch_tlbbatch_add_mm(
 
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
-#ifndef CONFIG_PARAVIRT
-#define paravirt_tlb_remove_table(tlb, page) \
-	tlb_remove_page(tlb, (void *)(page))
-#endif
-
 #endif /* _ASM_X86_TLBFLUSH_H */
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,6 +19,14 @@ EXPORT_SYMBOL(physical_mask);
 #define PGTABLE_HIGHMEM 0
 #endif
 
+#ifndef CONFIG_PARAVIRT
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_page(tlb, table);
+}
+#endif
+
 gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 11/15] x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (9 preceding siblings ...)
  2020-04-19 20:31 ` [patch 10/15] x86/tlb: Move paravirt_tlb_remove_table() to the usage site Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 12/15] x86/tlb: Uninline nmi_uaccess_okay() Thomas Gleixner
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

No point in having this exposed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |   14 --------------
 arch/x86/mm/init.c              |   13 +++++++++++++
 2 files changed, 13 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -332,23 +332,9 @@ static inline void cr4_clear_bits(unsign
 	local_irq_restore(flags);
 }
 
-/*
- * Save some of cr4 feature set we're using (e.g.  Pentium 4MB
- * enable and PPro Global page enable), so that any CPU's that boot
- * up after us can get the correct flags.  This should only be used
- * during boot on the boot cpu.
- */
 extern unsigned long mmu_cr4_features;
 extern u32 *trampoline_cr4_features;
 
-static inline void cr4_set_bits_and_update_boot(unsigned long mask)
-{
-	mmu_cr4_features |= mask;
-	if (trampoline_cr4_features)
-		*trampoline_cr4_features = mmu_cr4_features;
-	cr4_set_bits(mask);
-}
-
 extern void initialize_tlbstate_and_flush(void);
 
 /*
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -172,6 +172,19 @@ struct map_range {
 
 static int page_size_mask;
 
+/*
+ * Save some of cr4 feature set we're using (e.g.  Pentium 4MB
+ * enable and PPro Global page enable), so that any CPU's that boot
+ * up after us can get the correct flags. Invoked on the boot CPU.
+ */
+static inline void cr4_set_bits_and_update_boot(unsigned long mask)
+{
+	mmu_cr4_features |= mask;
+	if (trampoline_cr4_features)
+		*trampoline_cr4_features = mmu_cr4_features;
+	cr4_set_bits(mask);
+}
+
 static void __init probe_page_size_mask(void)
 {
 	/*


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 12/15] x86/tlb: Uninline nmi_uaccess_okay()
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (10 preceding siblings ...)
  2020-04-19 20:31 ` [patch 11/15] x86/tlb: Move cr4_set_bits_and_update_boot() " Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 13/15] x86/tlb: Move PCID helpers where they are used Thomas Gleixner
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

cpu_tlbstate is exported because various TLB related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well contained kernel functions and not be directly exposed to
modules.

nmi_uaccess_okay() is the last inline function which requires access to
cpu_tlbstate. Move it into the TLB code.

No functional change.
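
A usage sketch for illustration (not part of the patch): code running in
NMI context checks the predicate before touching user memory:

  /* Illustrative only: bail out of a user memory access from NMI context */
  if (!nmi_uaccess_okay())
          return 0;

  /* Safe to use copy_from_user_nmi() or walk the user stack from here */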

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |   33 +--------------------------------
 arch/x86/mm/tlb.c               |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 32 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -247,38 +247,7 @@ struct tlb_state {
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
-/*
- * Blindly accessing user memory from NMI context can be dangerous
- * if we're in the middle of switching the current user task or
- * switching the loaded mm.  It can also be dangerous if we
- * interrupted some kernel code that was temporarily using a
- * different mm.
- */
-static inline bool nmi_uaccess_okay(void)
-{
-	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	struct mm_struct *current_mm = current->mm;
-
-	VM_WARN_ON_ONCE(!loaded_mm);
-
-	/*
-	 * The condition we want to check is
-	 * current_mm->pgd == __va(read_cr3_pa()).  This may be slow, though,
-	 * if we're running in a VM with shadow paging, and nmi_uaccess_okay()
-	 * is supposed to be reasonably fast.
-	 *
-	 * Instead, we check the almost equivalent but somewhat conservative
-	 * condition below, and we rely on the fact that switch_mm_irqs_off()
-	 * sets loaded_mm to LOADED_MM_SWITCHING before writing to CR3.
-	 */
-	if (loaded_mm != current_mm)
-		return false;
-
-	VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa()));
-
-	return true;
-}
-
+bool nmi_uaccess_okay(void);
 #define nmi_uaccess_okay nmi_uaccess_okay
 
 void cr4_update_irqsoff(unsigned long set, unsigned long clear);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1079,6 +1079,38 @@ void arch_tlbbatch_flush(struct arch_tlb
 	put_cpu();
 }
 
+/*
+ * Blindly accessing user memory from NMI context can be dangerous
+ * if we're in the middle of switching the current user task or
+ * switching the loaded mm.  It can also be dangerous if we
+ * interrupted some kernel code that was temporarily using a
+ * different mm.
+ */
+bool nmi_uaccess_okay(void)
+{
+	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	struct mm_struct *current_mm = current->mm;
+
+	VM_WARN_ON_ONCE(!loaded_mm);
+
+	/*
+	 * The condition we want to check is
+	 * current_mm->pgd == __va(read_cr3_pa()).  This may be slow, though,
+	 * if we're running in a VM with shadow paging, and nmi_uaccess_okay()
+	 * is supposed to be reasonably fast.
+	 *
+	 * Instead, we check the almost equivalent but somewhat conservative
+	 * condition below, and we rely on the fact that switch_mm_irqs_off()
+	 * sets loaded_mm to LOADED_MM_SWITCHING before writing to CR3.
+	 */
+	if (loaded_mm != current_mm)
+		return false;
+
+	VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa()));
+
+	return true;
+}
+
 static ssize_t tlbflush_read_file(struct file *file, char __user *user_buf,
 			     size_t count, loff_t *ppos)
 {


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 13/15] x86/tlb: Move PCID helpers where they are used
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (11 preceding siblings ...)
  2020-04-19 20:31 ` [patch 12/15] x86/tlb: Uninline nmi_uaccess_okay() Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 14/15] xen/privcmd: Remove unneeded asm/tlb.h include Thomas Gleixner
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

Aside from the fact that they are only used in the TLB code, having the
comment close to the actual implementation makes a lot of sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |  133 +---------------------------------------
 arch/x86/mm/tlb.c               |  120 ++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 127 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,133 +13,6 @@
 #include <asm/pti.h>
 #include <asm/processor-flags.h>
 
-/*
- * The x86 feature is called PCID (Process Context IDentifier). It is similar
- * to what is traditionally called ASID on the RISC processors.
- *
- * We don't use the traditional ASID implementation, where each process/mm gets
- * its own ASID and flush/restart when we run out of ASID space.
- *
- * Instead we have a small per-cpu array of ASIDs and cache the last few mm's
- * that came by on this CPU, allowing cheaper switch_mm between processes on
- * this CPU.
- *
- * We end up with different spaces for different things. To avoid confusion we
- * use different names for each of them:
- *
- * ASID  - [0, TLB_NR_DYN_ASIDS-1]
- *         the canonical identifier for an mm
- *
- * kPCID - [1, TLB_NR_DYN_ASIDS]
- *         the value we write into the PCID part of CR3; corresponds to the
- *         ASID+1, because PCID 0 is special.
- *
- * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
- *         for KPTI each mm has two address spaces and thus needs two
- *         PCID values, but we can still do with a single ASID denomination
- *         for each mm. Corresponds to kPCID + 2048.
- *
- */
-
-/* There are 12 bits of space for ASIDS in CR3 */
-#define CR3_HW_ASID_BITS		12
-
-/*
- * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
- * user/kernel switches
- */
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define PTI_CONSUMED_PCID_BITS	1
-#else
-# define PTI_CONSUMED_PCID_BITS	0
-#endif
-
-#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS)
-
-/*
- * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid.  -1 below to account
- * for them being zero-based.  Another -1 is because PCID 0 is reserved for
- * use by non-PCID-aware users.
- */
-#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
-
-/*
- * 6 because 6 should be plenty and struct tlb_state will fit in two cache
- * lines.
- */
-#define TLB_NR_DYN_ASIDS	6
-
-/*
- * Given @asid, compute kPCID
- */
-static inline u16 kern_pcid(u16 asid)
-{
-	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	/*
-	 * Make sure that the dynamic ASID space does not confict with the
-	 * bit we are using to switch between user and kernel ASIDs.
-	 */
-	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_PCID_USER_BIT));
-
-	/*
-	 * The ASID being passed in here should have respected the
-	 * MAX_ASID_AVAILABLE and thus never have the switch bit set.
-	 */
-	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT));
-#endif
-	/*
-	 * The dynamically-assigned ASIDs that get passed in are small
-	 * (<TLB_NR_DYN_ASIDS).  They never have the high switch bit set,
-	 * so do not bother to clear it.
-	 *
-	 * If PCID is on, ASID-aware code paths put the ASID+1 into the
-	 * PCID bits.  This serves two purposes.  It prevents a nasty
-	 * situation in which PCID-unaware code saves CR3, loads some other
-	 * value (with PCID == 0), and then restores CR3, thus corrupting
-	 * the TLB for ASID 0 if the saved ASID was nonzero.  It also means
-	 * that any bugs involving loading a PCID-enabled CR3 with
-	 * CR4.PCIDE off will trigger deterministically.
-	 */
-	return asid + 1;
-}
-
-/*
- * Given @asid, compute uPCID
- */
-static inline u16 user_pcid(u16 asid)
-{
-	u16 ret = kern_pcid(asid);
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
-#endif
-	return ret;
-}
-
-struct pgd_t;
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
-{
-	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
-	} else {
-		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
-	}
-}
-
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
-{
-	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-	/*
-	 * Use boot_cpu_has() instead of this_cpu_has() as this function
-	 * might be called during early boot. This should work even after
-	 * boot because all CPU's the have same capabilities:
-	 */
-	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
-}
-
 struct flush_tlb_info;
 
 void flush_tlb_local(void);
@@ -153,6 +26,12 @@ void flush_tlb_others(const struct cpuma
 #include <asm/paravirt.h>
 #endif
 
+/*
+ * 6 because 6 should be plenty and struct tlb_state will fit in two cache
+ * lines.
+ */
+#define TLB_NR_DYN_ASIDS	6
+
 struct tlb_context {
 	u64 ctx_id;
 	u64 tlb_gen;
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -49,6 +49,126 @@
 #define LAST_USER_MM_IBPB	0x1UL
 
 /*
+ * The x86 feature is called PCID (Process Context IDentifier). It is similar
+ * to what is traditionally called ASID on the RISC processors.
+ *
+ * We don't use the traditional ASID implementation, where each process/mm gets
+ * its own ASID and flush/restart when we run out of ASID space.
+ *
+ * Instead we have a small per-cpu array of ASIDs and cache the last few mm's
+ * that came by on this CPU, allowing cheaper switch_mm between processes on
+ * this CPU.
+ *
+ * We end up with different spaces for different things. To avoid confusion we
+ * use different names for each of them:
+ *
+ * ASID  - [0, TLB_NR_DYN_ASIDS-1]
+ *         the canonical identifier for an mm
+ *
+ * kPCID - [1, TLB_NR_DYN_ASIDS]
+ *         the value we write into the PCID part of CR3; corresponds to the
+ *         ASID+1, because PCID 0 is special.
+ *
+ * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
+ *         for KPTI each mm has two address spaces and thus needs two
+ *         PCID values, but we can still do with a single ASID denomination
+ *         for each mm. Corresponds to kPCID + 2048.
+ *
+ */
+
+/* There are 12 bits of space for ASIDS in CR3 */
+#define CR3_HW_ASID_BITS		12
+
+/*
+ * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
+ * user/kernel switches
+ */
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define PTI_CONSUMED_PCID_BITS	1
+#else
+# define PTI_CONSUMED_PCID_BITS	0
+#endif
+
+#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS)
+
+/*
+ * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid.  -1 below to account
+ * for them being zero-based.  Another -1 is because PCID 0 is reserved for
+ * use by non-PCID-aware users.
+ */
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
+
+/*
+ * Given @asid, compute kPCID
+ */
+static inline u16 kern_pcid(u16 asid)
+{
+	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	/*
+	 * Make sure that the dynamic ASID space does not confict with the
+	 * bit we are using to switch between user and kernel ASIDs.
+	 */
+	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_PCID_USER_BIT));
+
+	/*
+	 * The ASID being passed in here should have respected the
+	 * MAX_ASID_AVAILABLE and thus never have the switch bit set.
+	 */
+	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT));
+#endif
+	/*
+	 * The dynamically-assigned ASIDs that get passed in are small
+	 * (<TLB_NR_DYN_ASIDS).  They never have the high switch bit set,
+	 * so do not bother to clear it.
+	 *
+	 * If PCID is on, ASID-aware code paths put the ASID+1 into the
+	 * PCID bits.  This serves two purposes.  It prevents a nasty
+	 * situation in which PCID-unaware code saves CR3, loads some other
+	 * value (with PCID == 0), and then restores CR3, thus corrupting
+	 * the TLB for ASID 0 if the saved ASID was nonzero.  It also means
+	 * that any bugs involving loading a PCID-enabled CR3 with
+	 * CR4.PCIDE off will trigger deterministically.
+	 */
+	return asid + 1;
+}
+
+/*
+ * Given @asid, compute uPCID
+ */
+static inline u16 user_pcid(u16 asid)
+{
+	u16 ret = kern_pcid(asid);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
+#endif
+	return ret;
+}
+
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+{
+	if (static_cpu_has(X86_FEATURE_PCID)) {
+		return __sme_pa(pgd) | kern_pcid(asid);
+	} else {
+		VM_WARN_ON_ONCE(asid != 0);
+		return __sme_pa(pgd);
+	}
+}
+
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+{
+	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+	/*
+	 * Use boot_cpu_has() instead of this_cpu_has() as this function
+	 * might be called during early boot. This should work even after
+	 * boot because all CPU's the have same capabilities:
+	 */
+	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
+	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+}
+
+/*
  * We get here when we do something requiring a TLB invalidation
  * but could not go invalidate all of the contexts.  We do the
  * necessary invalidation by clearing out the 'ctx_id' which


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 14/15] xen/privcmd: Remove unneeded asm/tlb.h include
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (12 preceding siblings ...)
  2020-04-19 20:31 ` [patch 13/15] x86/tlb: Move PCID helpers where they are used Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-19 20:31 ` [patch 15/15] x86/tlb: Restrict access to tlbstate Thomas Gleixner
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Boris Ostrovsky, Juergen Gross,
	Thomas Lendacky

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
---
 drivers/xen/privcmd.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -27,7 +27,6 @@
 
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
-#include <asm/tlb.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [patch 15/15] x86/tlb: Restrict access to tlbstate
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (13 preceding siblings ...)
  2020-04-19 20:31 ` [patch 14/15] xen/privcmd: Remove unneeded asm/tlb.h include Thomas Gleixner
@ 2020-04-19 20:31 ` Thomas Gleixner
  2020-04-20  9:20 ` [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Christoph Hellwig
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-19 20:31 UTC (permalink / raw)
  To: LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky

Hide tlbstate, flush_tlb_info and related helpers when tlbflush.h is
included from a module. Modules have absolutely no business with these
internals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |  136 ++++++++++++++++++++--------------------
 arch/x86/mm/init.c              |    1 
 2 files changed, 69 insertions(+), 68 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,20 +13,69 @@
 #include <asm/pti.h>
 #include <asm/processor-flags.h>
 
-struct flush_tlb_info;
-
 void flush_tlb_local(void);
 void flush_tlb_global(void);
-void flush_tlb_one_user(unsigned long addr);
-void flush_tlb_one_kernel(unsigned long addr);
-void flush_tlb_others(const struct cpumask *cpumask,
-		      const struct flush_tlb_info *info);
 
-#ifdef CONFIG_PARAVIRT
-#include <asm/paravirt.h>
-#endif
+#define TLB_FLUSH_ALL	-1UL
 
 /*
+ * flush everything
+ */
+static inline void __flush_tlb_all(void)
+{
+	/*
+	 * This is to catch users with enabled preemption and the PGE feature
+	 * and don't trigger the warning in __native_flush_tlb().
+	 */
+	VM_WARN_ON_ONCE(preemptible());
+
+	if (boot_cpu_has(X86_FEATURE_PGE)) {
+		flush_tlb_global();
+	} else {
+		/*
+		 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
+		 */
+		flush_tlb_local();
+	}
+}
+
+void cr4_update_irqsoff(unsigned long set, unsigned long clear);
+unsigned long cr4_read_shadow(void);
+
+/* Set in this cpu's CR4. */
+static inline void cr4_set_bits_irqsoff(unsigned long mask)
+{
+	cr4_update_irqsoff(mask, 0);
+}
+
+/* Clear in this cpu's CR4. */
+static inline void cr4_clear_bits_irqsoff(unsigned long mask)
+{
+	cr4_update_irqsoff(0, mask);
+}
+
+/* Set in this cpu's CR4. */
+static inline void cr4_set_bits(unsigned long mask)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	cr4_set_bits_irqsoff(mask);
+	local_irq_restore(flags);
+}
+
+/* Clear in this cpu's CR4. */
+static inline void cr4_clear_bits(unsigned long mask)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	cr4_clear_bits_irqsoff(mask);
+	local_irq_restore(flags);
+}
+
+#ifndef MODULE
+/*
  * 6 because 6 should be plenty and struct tlb_state will fit in two cache
  * lines.
  */
@@ -129,76 +178,18 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct tl
 bool nmi_uaccess_okay(void);
 #define nmi_uaccess_okay nmi_uaccess_okay
 
-void cr4_update_irqsoff(unsigned long set, unsigned long clear);
-unsigned long cr4_read_shadow(void);
-
 /* Initialize cr4 shadow for this CPU. */
 static inline void cr4_init_shadow(void)
 {
 	this_cpu_write(cpu_tlbstate.cr4, __read_cr4());
 }
 
-/* Set in this cpu's CR4. */
-static inline void cr4_set_bits_irqsoff(unsigned long mask)
-{
-	cr4_update_irqsoff(mask, 0);
-}
-
-/* Clear in this cpu's CR4. */
-static inline void cr4_clear_bits_irqsoff(unsigned long mask)
-{
-	cr4_update_irqsoff(0, mask);
-}
-
-/* Set in this cpu's CR4. */
-static inline void cr4_set_bits(unsigned long mask)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	cr4_set_bits_irqsoff(mask);
-	local_irq_restore(flags);
-}
-
-/* Clear in this cpu's CR4. */
-static inline void cr4_clear_bits(unsigned long mask)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	cr4_clear_bits_irqsoff(mask);
-	local_irq_restore(flags);
-}
-
 extern unsigned long mmu_cr4_features;
 extern u32 *trampoline_cr4_features;
 
 extern void initialize_tlbstate_and_flush(void);
 
 /*
- * flush everything
- */
-static inline void __flush_tlb_all(void)
-{
-	/*
-	 * This is to catch users with enabled preemption and the PGE feature
-	 * and don't trigger the warning in __native_flush_tlb().
-	 */
-	VM_WARN_ON_ONCE(preemptible());
-
-	if (boot_cpu_has(X86_FEATURE_PGE)) {
-		flush_tlb_global();
-	} else {
-		/*
-		 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
-		 */
-		flush_tlb_local();
-	}
-}
-
-#define TLB_FLUSH_ALL	-1UL
-
-/*
  * TLB flushing:
  *
  *  - flush_tlb_all() flushes all processes TLBs
@@ -236,6 +227,15 @@ struct flush_tlb_info {
 	bool			freed_tables;
 };
 
+void flush_tlb_one_user(unsigned long addr);
+void flush_tlb_one_kernel(unsigned long addr);
+void flush_tlb_others(const struct cpumask *cpumask,
+		      const struct flush_tlb_info *info);
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#endif
+
 #define flush_tlb_mm(mm)						\
 		flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
 
@@ -276,4 +276,6 @@ static inline void arch_tlbbatch_add_mm(
 
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
+#endif /* !MODULE */
+
 #endif /* _ASM_X86_TLBFLUSH_H */
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -970,7 +970,6 @@ void __init zone_sizes_init(void)
 	.next_asid = 1,
 	.cr4 = ~0UL,	/* fail hard if we screw up cr4 shadow initialization */
 };
-EXPORT_PER_CPU_SYMBOL(cpu_tlbstate);
 
 void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
 {


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 02/15] x86/cpu: Uninline CR4 accessors
  2020-04-19 20:31 ` [patch 02/15] x86/cpu: Uninline CR4 accessors Thomas Gleixner
@ 2020-04-20  9:01   ` Christoph Hellwig
  2020-04-20  9:34     ` Borislav Petkov
  2020-04-20 17:25     ` Thomas Gleixner
  0 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2020-04-20  9:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -387,7 +387,30 @@ void native_write_cr4(unsigned long val)
>  			  bits_missing);
>  	}
>  }
> -EXPORT_SYMBOL(native_write_cr4);
> +#if IS_MODULE(CONFIG_LKDTM)
> +EXPORT_SYMBOL_GPL(native_write_cr4);
> +#endif

While this is better than what we had before, we really need to have
a discussion on lkdtm - it needs a lot of crap that otherwise wouldn't
be exported, and I'm really worried about people enabling it and thus
adding exports even if they are conditional.  Can we force the code
to be built in and require a boot option for it to be activated?
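
Something along these lines is what I have in mind; completely untested,
with a made-up parameter name and init hook, just to illustrate the
built-in plus boot-option idea:

	#include <linux/init.h>
	#include <linux/module.h>

	/* Off by default; "lkdtm.enable=1" on the command line turns it on. */
	static bool enable;
	module_param(enable, bool, 0444);
	MODULE_PARM_DESC(enable, "Activate the crash test points (default: off)");

	static int __init lkdtm_builtin_init(void)
	{
		/* Built in, but dormant unless explicitly enabled at boot. */
		if (!enable)
			return -EPERM;

		/* Register the debugfs crash points here, as the module init does now. */
		return 0;
	}
	late_initcall(lkdtm_builtin_init);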

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (14 preceding siblings ...)
  2020-04-19 20:31 ` [patch 15/15] x86/tlb: Restrict access to tlbstate Thomas Gleixner
@ 2020-04-20  9:20 ` Christoph Hellwig
  2020-04-20 16:58   ` Alexandre Chartre
  2020-04-20 17:27   ` Thomas Gleixner
  2020-04-20 10:25 ` Peter Zijlstra
                   ` (2 subsequent siblings)
  18 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2020-04-20  9:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

Just looking over some exports at the end of the series (and thus
ignoring bisection issues):

 - Is there any good reason to keep __flush_tlb_all inline vs moving it
   out of line and killing the flush_tlb_local and flush_tlb_global exports?
   Also there is just a single modular user in SVM; I wonder if there is
   any way to get rid of that one as well.
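
   FWIW, a minimal sketch of what that could look like (not a patch, just
   to illustrate the suggestion): move the current inline more or less
   verbatim into arch/x86/mm/tlb.c and export only that:

	void __flush_tlb_all(void)
	{
		/* Callers must have preemption disabled. */
		VM_WARN_ON_ONCE(preemptible());

		if (boot_cpu_has(X86_FEATURE_PGE)) {
			flush_tlb_global();
		} else {
			/* !PGE -> !PCID (setup_pcid()), thus every flush is total. */
			flush_tlb_local();
		}
	}
	EXPORT_SYMBOL_GPL(__flush_tlb_all);

   flush_tlb_local() and flush_tlb_global() could then lose their exports.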

Also I think cpu_tlbstate itself could be marked static in tlb.c with
a few more changes; I wonder if it would be worth it?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 02/15] x86/cpu: Uninline CR4 accessors
  2020-04-20  9:01   ` Christoph Hellwig
@ 2020-04-20  9:34     ` Borislav Petkov
  2020-04-20 17:25     ` Thomas Gleixner
  1 sibling, 0 replies; 34+ messages in thread
From: Borislav Petkov @ 2020-04-20  9:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, LKML, x86, Kees Cook, Paolo Bonzini,
	Thomas Lendacky, Juergen Gross, Boris Ostrovsky

On Mon, Apr 20, 2020 at 02:01:02AM -0700, Christoph Hellwig wrote:
> While this is better than what we had before we really need to have
> a discussion on lkdtm - it needs a lot of crap that otherwise wouldn't
> be exported, and I'm really worried about people enabling it and thus
> adding exports even if they are conditional.

Thought the same too, while looking at that. It is fine and dandy that
it injects all kinds of crap into a running kernel but not at the price
of exporting such internal interfaces.

> Can we force the code to be built in require a boot option for it to
> be activated?

Yes please.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (15 preceding siblings ...)
  2020-04-20  9:20 ` [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Christoph Hellwig
@ 2020-04-20 10:25 ` Peter Zijlstra
  2020-04-20 16:33 ` Alexandre Chartre
  2020-04-21 17:10 ` Andy Lutomirski
  18 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2020-04-20 10:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

On Sun, Apr 19, 2020 at 10:31:37PM +0200, Thomas Gleixner wrote:
> The per-CPU tlbstate contains sensitive information which should be really
> only accessible in core code. It is exported to modules because some inline
> functions which are required by KVM need access to it.
> 
> The following series creates regular exported functions for the few things
> which are needed by KVM and hides the struct definition and some low level
> helpers from modules.
> 
> The series is also available from git:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel x86/tlb
> 

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-19 20:31 ` [patch 05/15] x86/tlb: Move __flush_tlb() out of line Thomas Gleixner
@ 2020-04-20 13:48   ` Tom Lendacky
  2020-04-20 14:03     ` Jürgen Groß
  0 siblings, 1 reply; 34+ messages in thread
From: Tom Lendacky @ 2020-04-20 13:48 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Juergen Gross, Boris Ostrovsky

On 4/19/20 3:31 PM, Thomas Gleixner wrote:
> cpu_tlbstate is exported because various TLB related functions need access
> to it, but cpu_tlbstate is sensitive information which should only be
> accessed by well contained kernel functions and not be directly exposed to
> modules.
> 
> The various TLB flush functions need access to cpu_tlbstate. As a first
> step move __flush_tlb() out of line and hide the native function. The
> latter can be static when CONFIG_PARAVIRT is disabled.
> 
> Consolidate the name space while at it and remove the pointless extra
> wrapper in the paravirt code.
> 
> No functional change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   arch/x86/include/asm/paravirt.h    |    4 +++-
>   arch/x86/include/asm/tlbflush.h    |   29 +++++------------------------
>   arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
>   arch/x86/kernel/paravirt.c         |    7 +------
>   arch/x86/mm/mem_encrypt.c          |    2 +-
>   arch/x86/mm/tlb.c                  |   33 ++++++++++++++++++++++++++++++++-
>   arch/x86/platform/uv/tlb_uv.c      |    2 +-
>   7 files changed, 45 insertions(+), 36 deletions(-)
> 
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -47,7 +47,9 @@ static inline void slow_down_io(void)
>   #endif
>   }
>   
> -static inline void __flush_tlb(void)
> +void native_flush_tlb_local(void);
> +
> +static inline void __flush_tlb_local(void)
>   {
>   	PVOP_VCALL0(mmu.flush_tlb_user);
>   }
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
>   	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
>   }
>   
> +void flush_tlb_local(void);
> +
>   #ifdef CONFIG_PARAVIRT
>   #include <asm/paravirt.h>
>   #else
> -#define __flush_tlb() __native_flush_tlb()
> -#define __flush_tlb_global() __native_flush_tlb_global()
> -#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
> +#define __flush_tlb_global()		__native_flush_tlb_global()
> +#define __flush_tlb_one_user(addr)	__native_flush_tlb_one_user(addr)
>   #endif
>   
>   struct tlb_context {
> @@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
>   }
>   
>   /*
> - * flush the entire current user mapping
> - */
> -static inline void __native_flush_tlb(void)
> -{
> -	/*
> -	 * Preemption or interrupts must be disabled to protect the access
> -	 * to the per CPU variable and to prevent being preempted between
> -	 * read_cr3() and write_cr3().
> -	 */
> -	WARN_ON_ONCE(preemptible());
> -
> -	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
> -
> -	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
> -	native_write_cr3(__native_read_cr3());
> -}
> -
> -/*
>    * flush everything
>    */
>   static inline void __native_flush_tlb_global(void)
> @@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
>   		/*
>   		 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
>   		 */
> -		__flush_tlb();
> +		flush_tlb_local();
>   	}
>   }
>   
> @@ -537,8 +520,6 @@ struct flush_tlb_info {
>   	bool			freed_tables;
>   };
>   
> -#define local_flush_tlb() __flush_tlb()
> -
>   #define flush_tlb_mm(mm)						\
>   		flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
>   
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
>   
>   	/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
>   	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
> -	__flush_tlb();
> +	flush_tlb_local();
>   
>   	/* Save MTRR state */
>   	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
> @@ -778,7 +778,7 @@ static void post_set(void) __releases(se
>   {
>   	/* Flush TLBs (no need to flush caches - they are disabled) */
>   	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
> -	__flush_tlb();
> +	flush_tlb_local();
>   
>   	/* Intel (P6) standard MTRRs */
>   	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
>   	return insn_len;
>   }
>   
> -static void native_flush_tlb(void)
> -{
> -	__native_flush_tlb();
> -}
> -
>   /*
>    * Global pages have to be flushed a bit differently. Not a real
>    * performance problem because this does not happen often.
> @@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
>   #endif /* CONFIG_PARAVIRT_XXL */
>   
>   	/* Mmu ops. */
> -	.mmu.flush_tlb_user	= native_flush_tlb,
> +	.mmu.flush_tlb_user	= native_flush_tlb_local,
>   	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
>   	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
>   	.mmu.flush_tlb_others	= native_flush_tlb_others,
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
>   		size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
>   	} while (size);
>   
> -	__native_flush_tlb();
> +	flush_tlb_local();

This invoked __native_flush_tlb() because of how early it is called: the 
paravirt ops support isn't set up yet at that point, so the code crashes 
unless the native version is invoked directly. So this needs a "native" 
version of the TLB flush to invoke.

Thanks,
Tom

>   }
>   
>   void __init sme_unmap_bootdata(char *real_mode_data)
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -18,6 +18,13 @@
>   
>   #include "mm_internal.h"
>   
> +#ifdef CONFIG_PARAVIRT
> +# define STATIC_NOPV
> +#else
> +# define STATIC_NOPV			static
> +# define __flush_tlb_local		native_flush_tlb_local
> +#endif
> +
>   /*
>    *	TLB flushing, formerly SMP-only
>    *		c/o Linus Torvalds.
> @@ -645,7 +652,7 @@ static void flush_tlb_func_common(const
>   		trace_tlb_flush(reason, nr_invalidate);
>   	} else {
>   		/* Full flush. */
> -		local_flush_tlb();
> +		flush_tlb_local();
>   		if (local)
>   			count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>   		trace_tlb_flush(reason, TLB_FLUSH_ALL);
> @@ -884,6 +891,30 @@ unsigned long __get_current_cr3_fast(voi
>   EXPORT_SYMBOL_GPL(__get_current_cr3_fast);
>   
>   /*
> + * Flush the entire current user mapping
> + */
> +STATIC_NOPV void native_flush_tlb_local(void)
> +{
> +	/*
> +	 * Preemption or interrupts must be disabled to protect the access
> +	 * to the per CPU variable and to prevent being preempted between
> +	 * read_cr3() and write_cr3().
> +	 */
> +	WARN_ON_ONCE(preemptible());
> +
> +	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
> +
> +	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
> +	native_write_cr3(__native_read_cr3());
> +}
> +
> +void flush_tlb_local(void)
> +{
> +	__flush_tlb_local();
> +}
> +EXPORT_SYMBOL_GPL(flush_tlb_local);
> +
> +/*
>    * arch_tlbbatch_flush() performs a full TLB flush regardless of the active mm.
>    * This means that the 'struct flush_tlb_info' that describes which mappings to
>    * flush is actually fixed. We therefore set a single fixed struct and use it in
> --- a/arch/x86/platform/uv/tlb_uv.c
> +++ b/arch/x86/platform/uv/tlb_uv.c
> @@ -293,7 +293,7 @@ static void bau_process_message(struct m
>   	 * This must be a normal message, or retry of a normal message
>   	 */
>   	if (msg->address == TLB_FLUSH_ALL) {
> -		local_flush_tlb();
> +		flush_tlb_local();
>   		stat->d_alltlb++;
>   	} else {
>   		__flush_tlb_one_user(msg->address);
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-20 13:48   ` Tom Lendacky
@ 2020-04-20 14:03     ` Jürgen Groß
  2020-04-20 14:26       ` Tom Lendacky
  0 siblings, 1 reply; 34+ messages in thread
From: Jürgen Groß @ 2020-04-20 14:03 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Boris Ostrovsky

On 20.04.20 15:48, Tom Lendacky wrote:
> On 4/19/20 3:31 PM, Thomas Gleixner wrote:
>> cpu_tlbstate is exported because various TLB related functions need 
>> access
>> to it, but cpu_tlbstate is sensitive information which should only be
>> accessed by well contained kernel functions and not be directly 
>> exposed to
>> modules.
>>
>> The various TLB flush functions need access to cpu_tlbstate. As a first
>> step move __flush_tlb() out of line and hide the native function. The
>> latter can be static when CONFIG_PARAVIRT is disabled.
>>
>> Consolidate the name space while at it and remove the pointless extra
>> wrapper in the paravirt code.
>>
>> No functional change.
>>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/include/asm/paravirt.h    |    4 +++-
>>   arch/x86/include/asm/tlbflush.h    |   29 +++++------------------------
>>   arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
>>   arch/x86/kernel/paravirt.c         |    7 +------
>>   arch/x86/mm/mem_encrypt.c          |    2 +-
>>   arch/x86/mm/tlb.c                  |   33 
>> ++++++++++++++++++++++++++++++++-
>>   arch/x86/platform/uv/tlb_uv.c      |    2 +-
>>   7 files changed, 45 insertions(+), 36 deletions(-)
>>
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -47,7 +47,9 @@ static inline void slow_down_io(void)
>>   #endif
>>   }
>> -static inline void __flush_tlb(void)
>> +void native_flush_tlb_local(void);
>> +
>> +static inline void __flush_tlb_local(void)
>>   {
>>       PVOP_VCALL0(mmu.flush_tlb_user);
>>   }
>> --- a/arch/x86/include/asm/tlbflush.h
>> +++ b/arch/x86/include/asm/tlbflush.h
>> @@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
>>       return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
>>   }
>> +void flush_tlb_local(void);
>> +
>>   #ifdef CONFIG_PARAVIRT
>>   #include <asm/paravirt.h>
>>   #else
>> -#define __flush_tlb() __native_flush_tlb()
>> -#define __flush_tlb_global() __native_flush_tlb_global()
>> -#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
>> +#define __flush_tlb_global()        __native_flush_tlb_global()
>> +#define __flush_tlb_one_user(addr)    __native_flush_tlb_one_user(addr)
>>   #endif
>>   struct tlb_context {
>> @@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
>>   }
>>   /*
>> - * flush the entire current user mapping
>> - */
>> -static inline void __native_flush_tlb(void)
>> -{
>> -    /*
>> -     * Preemption or interrupts must be disabled to protect the access
>> -     * to the per CPU variable and to prevent being preempted between
>> -     * read_cr3() and write_cr3().
>> -     */
>> -    WARN_ON_ONCE(preemptible());
>> -
>> -    invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
>> -
>> -    /* If current->mm == NULL then the read_cr3() "borrows" an mm */
>> -    native_write_cr3(__native_read_cr3());
>> -}
>> -
>> -/*
>>    * flush everything
>>    */
>>   static inline void __native_flush_tlb_global(void)
>> @@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
>>           /*
>>            * !PGE -> !PCID (setup_pcid()), thus every flush is total.
>>            */
>> -        __flush_tlb();
>> +        flush_tlb_local();
>>       }
>>   }
>> @@ -537,8 +520,6 @@ struct flush_tlb_info {
>>       bool            freed_tables;
>>   };
>> -#define local_flush_tlb() __flush_tlb()
>> -
>>   #define flush_tlb_mm(mm)                        \
>>           flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
>>       /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>> -    __flush_tlb();
>> +    flush_tlb_local();
>>       /* Save MTRR state */
>>       rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>> @@ -778,7 +778,7 @@ static void post_set(void) __releases(se
>>   {
>>       /* Flush TLBs (no need to flush caches - they are disabled) */
>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>> -    __flush_tlb();
>> +    flush_tlb_local();
>>       /* Intel (P6) standard MTRRs */
>>       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
>>       return insn_len;
>>   }
>> -static void native_flush_tlb(void)
>> -{
>> -    __native_flush_tlb();
>> -}
>> -
>>   /*
>>    * Global pages have to be flushed a bit differently. Not a real
>>    * performance problem because this does not happen often.
>> @@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
>>   #endif /* CONFIG_PARAVIRT_XXL */
>>       /* Mmu ops. */
>> -    .mmu.flush_tlb_user    = native_flush_tlb,
>> +    .mmu.flush_tlb_user    = native_flush_tlb_local,
>>       .mmu.flush_tlb_kernel    = native_flush_tlb_global,
>>       .mmu.flush_tlb_one_user    = native_flush_tlb_one_user,
>>       .mmu.flush_tlb_others    = native_flush_tlb_others,
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
>>           size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
>>       } while (size);
>> -    __native_flush_tlb();
>> +    flush_tlb_local();
> 
> This invoked __native_flush_tlb() because of how early it is called and 
> the paravirt ops support isn't set up yet, resulting in a crash if not 
> invoking the native version directly. So this needs a "native" version 
> of the tlb flush to invoke.

I don't think this is still true. With my rework of pvops to have all
functions in one struct which is statically initialized, everything
should work from the time the kernel is mapped.

In case it doesn't, there is something very wrong IMO.


Juergen

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-20 14:03     ` Jürgen Groß
@ 2020-04-20 14:26       ` Tom Lendacky
  2020-04-20 14:38         ` Jürgen Groß
  0 siblings, 1 reply; 34+ messages in thread
From: Tom Lendacky @ 2020-04-20 14:26 UTC (permalink / raw)
  To: Jürgen Groß, Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Boris Ostrovsky

On 4/20/20 9:03 AM, Jürgen Groß wrote:
> On 20.04.20 15:48, Tom Lendacky wrote:
>> On 4/19/20 3:31 PM, Thomas Gleixner wrote:
>>> cpu_tlbstate is exported because various TLB related functions need access
>>> to it, but cpu_tlbstate is sensitive information which should only be
>>> accessed by well contained kernel functions and not be directly exposed to
>>> modules.
>>>
>>> The various TLB flush functions need access to cpu_tlbstate. As a first
>>> step move __flush_tlb() out of line and hide the native function. The
>>> latter can be static when CONFIG_PARAVIRT is disabled.
>>>
>>> Consolidate the name space while at it and remove the pointless extra
>>> wrapper in the paravirt code.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
>>> Cc: Juergen Gross <jgross@suse.com>
>>> ---
>>>   arch/x86/include/asm/paravirt.h    |    4 +++-
>>>   arch/x86/include/asm/tlbflush.h    |   29 +++++------------------------
>>>   arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
>>>   arch/x86/kernel/paravirt.c         |    7 +------
>>>   arch/x86/mm/mem_encrypt.c          |    2 +-
>>>   arch/x86/mm/tlb.c                  |   33 
>>> ++++++++++++++++++++++++++++++++-
>>>   arch/x86/platform/uv/tlb_uv.c      |    2 +-
>>>   7 files changed, 45 insertions(+), 36 deletions(-)
>>>
>>> --- a/arch/x86/include/asm/paravirt.h
>>> +++ b/arch/x86/include/asm/paravirt.h
>>> @@ -47,7 +47,9 @@ static inline void slow_down_io(void)
>>>   #endif
>>>   }
>>> -static inline void __flush_tlb(void)
>>> +void native_flush_tlb_local(void);
>>> +
>>> +static inline void __flush_tlb_local(void)
>>>   {
>>>       PVOP_VCALL0(mmu.flush_tlb_user);
>>>   }
>>> --- a/arch/x86/include/asm/tlbflush.h
>>> +++ b/arch/x86/include/asm/tlbflush.h
>>> @@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
>>>       return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
>>>   }
>>> +void flush_tlb_local(void);
>>> +
>>>   #ifdef CONFIG_PARAVIRT
>>>   #include <asm/paravirt.h>
>>>   #else
>>> -#define __flush_tlb() __native_flush_tlb()
>>> -#define __flush_tlb_global() __native_flush_tlb_global()
>>> -#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
>>> +#define __flush_tlb_global()        __native_flush_tlb_global()
>>> +#define __flush_tlb_one_user(addr)    __native_flush_tlb_one_user(addr)
>>>   #endif
>>>   struct tlb_context {
>>> @@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
>>>   }
>>>   /*
>>> - * flush the entire current user mapping
>>> - */
>>> -static inline void __native_flush_tlb(void)
>>> -{
>>> -    /*
>>> -     * Preemption or interrupts must be disabled to protect the access
>>> -     * to the per CPU variable and to prevent being preempted between
>>> -     * read_cr3() and write_cr3().
>>> -     */
>>> -    WARN_ON_ONCE(preemptible());
>>> -
>>> -    invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
>>> -
>>> -    /* If current->mm == NULL then the read_cr3() "borrows" an mm */
>>> -    native_write_cr3(__native_read_cr3());
>>> -}
>>> -
>>> -/*
>>>    * flush everything
>>>    */
>>>   static inline void __native_flush_tlb_global(void)
>>> @@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
>>>           /*
>>>            * !PGE -> !PCID (setup_pcid()), thus every flush is total.
>>>            */
>>> -        __flush_tlb();
>>> +        flush_tlb_local();
>>>       }
>>>   }
>>> @@ -537,8 +520,6 @@ struct flush_tlb_info {
>>>       bool            freed_tables;
>>>   };
>>> -#define local_flush_tlb() __flush_tlb()
>>> -
>>>   #define flush_tlb_mm(mm)                        \
>>>           flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
>>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>>> @@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
>>>       /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>> -    __flush_tlb();
>>> +    flush_tlb_local();
>>>       /* Save MTRR state */
>>>       rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>> @@ -778,7 +778,7 @@ static void post_set(void) __releases(se
>>>   {
>>>       /* Flush TLBs (no need to flush caches - they are disabled) */
>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>> -    __flush_tlb();
>>> +    flush_tlb_local();
>>>       /* Intel (P6) standard MTRRs */
>>>       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>> --- a/arch/x86/kernel/paravirt.c
>>> +++ b/arch/x86/kernel/paravirt.c
>>> @@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
>>>       return insn_len;
>>>   }
>>> -static void native_flush_tlb(void)
>>> -{
>>> -    __native_flush_tlb();
>>> -}
>>> -
>>>   /*
>>>    * Global pages have to be flushed a bit differently. Not a real
>>>    * performance problem because this does not happen often.
>>> @@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
>>>   #endif /* CONFIG_PARAVIRT_XXL */
>>>       /* Mmu ops. */
>>> -    .mmu.flush_tlb_user    = native_flush_tlb,
>>> +    .mmu.flush_tlb_user    = native_flush_tlb_local,
>>>       .mmu.flush_tlb_kernel    = native_flush_tlb_global,
>>>       .mmu.flush_tlb_one_user    = native_flush_tlb_one_user,
>>>       .mmu.flush_tlb_others    = native_flush_tlb_others,
>>> --- a/arch/x86/mm/mem_encrypt.c
>>> +++ b/arch/x86/mm/mem_encrypt.c
>>> @@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
>>>           size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
>>>       } while (size);
>>> -    __native_flush_tlb();
>>> +    flush_tlb_local();
>>
>> This invoked __native_flush_tlb() because of how early it is called and 
>> the paravirt ops support isn't set up yet, resulting in a crash if not 
>> invoking the native version directly. So this needs a "native" version 
>> of the tlb flush to invoke.
> 
> I don't think this is still true. With my rework of pvops to have all
> functions in one struct which is initialized statically initially
> everything should work from the time the kernel is mapped.
> 
> In case it doesn't there is something very wrong IMO.

The memory encryption support was implemented in 4.14, so it's quite 
possible that this isn't an issue now. I'll test out the patch and verify 
it. What release did your pvops rework land in?

Thanks,
Tom

> 
> 
> Juergen

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-20 14:26       ` Tom Lendacky
@ 2020-04-20 14:38         ` Jürgen Groß
  2020-04-20 18:30           ` Tom Lendacky
  0 siblings, 1 reply; 34+ messages in thread
From: Jürgen Groß @ 2020-04-20 14:38 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Boris Ostrovsky

On 20.04.20 16:26, Tom Lendacky wrote:
> On 4/20/20 9:03 AM, Jürgen Groß wrote:
>> On 20.04.20 15:48, Tom Lendacky wrote:
>>> On 4/19/20 3:31 PM, Thomas Gleixner wrote:
>>>> cpu_tlbstate is exported because various TLB related functions need 
>>>> access
>>>> to it, but cpu_tlbstate is sensitive information which should only be
>>>> accessed by well contained kernel functions and not be directly 
>>>> exposed to
>>>> modules.
>>>>
>>>> The various TLB flush functions need access to cpu_tlbstate. As a first
>>>> step move __flush_tlb() out of line and hide the native function. The
>>>> latter can be static when CONFIG_PARAVIRT is disabled.
>>>>
>>>> Consolidate the name space while at it and remove the pointless extra
>>>> wrapper in the paravirt code.
>>>>
>>>> No functional change.
>>>>
>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
>>>> Cc: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>   arch/x86/include/asm/paravirt.h    |    4 +++-
>>>>   arch/x86/include/asm/tlbflush.h    |   29 
>>>> +++++------------------------
>>>>   arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
>>>>   arch/x86/kernel/paravirt.c         |    7 +------
>>>>   arch/x86/mm/mem_encrypt.c          |    2 +-
>>>>   arch/x86/mm/tlb.c                  |   33 
>>>> ++++++++++++++++++++++++++++++++-
>>>>   arch/x86/platform/uv/tlb_uv.c      |    2 +-
>>>>   7 files changed, 45 insertions(+), 36 deletions(-)
>>>>
>>>> --- a/arch/x86/include/asm/paravirt.h
>>>> +++ b/arch/x86/include/asm/paravirt.h
>>>> @@ -47,7 +47,9 @@ static inline void slow_down_io(void)
>>>>   #endif
>>>>   }
>>>> -static inline void __flush_tlb(void)
>>>> +void native_flush_tlb_local(void);
>>>> +
>>>> +static inline void __flush_tlb_local(void)
>>>>   {
>>>>       PVOP_VCALL0(mmu.flush_tlb_user);
>>>>   }
>>>> --- a/arch/x86/include/asm/tlbflush.h
>>>> +++ b/arch/x86/include/asm/tlbflush.h
>>>> @@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
>>>>       return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
>>>>   }
>>>> +void flush_tlb_local(void);
>>>> +
>>>>   #ifdef CONFIG_PARAVIRT
>>>>   #include <asm/paravirt.h>
>>>>   #else
>>>> -#define __flush_tlb() __native_flush_tlb()
>>>> -#define __flush_tlb_global() __native_flush_tlb_global()
>>>> -#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
>>>> +#define __flush_tlb_global()        __native_flush_tlb_global()
>>>> +#define __flush_tlb_one_user(addr)    
>>>> __native_flush_tlb_one_user(addr)
>>>>   #endif
>>>>   struct tlb_context {
>>>> @@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
>>>>   }
>>>>   /*
>>>> - * flush the entire current user mapping
>>>> - */
>>>> -static inline void __native_flush_tlb(void)
>>>> -{
>>>> -    /*
>>>> -     * Preemption or interrupts must be disabled to protect the access
>>>> -     * to the per CPU variable and to prevent being preempted between
>>>> -     * read_cr3() and write_cr3().
>>>> -     */
>>>> -    WARN_ON_ONCE(preemptible());
>>>> -
>>>> -    invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
>>>> -
>>>> -    /* If current->mm == NULL then the read_cr3() "borrows" an mm */
>>>> -    native_write_cr3(__native_read_cr3());
>>>> -}
>>>> -
>>>> -/*
>>>>    * flush everything
>>>>    */
>>>>   static inline void __native_flush_tlb_global(void)
>>>> @@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
>>>>           /*
>>>>            * !PGE -> !PCID (setup_pcid()), thus every flush is total.
>>>>            */
>>>> -        __flush_tlb();
>>>> +        flush_tlb_local();
>>>>       }
>>>>   }
>>>> @@ -537,8 +520,6 @@ struct flush_tlb_info {
>>>>       bool            freed_tables;
>>>>   };
>>>> -#define local_flush_tlb() __flush_tlb()
>>>> -
>>>>   #define flush_tlb_mm(mm)                        \
>>>>           flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
>>>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>>>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>>>> @@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
>>>>       /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
>>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>>> -    __flush_tlb();
>>>> +    flush_tlb_local();
>>>>       /* Save MTRR state */
>>>>       rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>>> @@ -778,7 +778,7 @@ static void post_set(void) __releases(se
>>>>   {
>>>>       /* Flush TLBs (no need to flush caches - they are disabled) */
>>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>>> -    __flush_tlb();
>>>> +    flush_tlb_local();
>>>>       /* Intel (P6) standard MTRRs */
>>>>       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>>> --- a/arch/x86/kernel/paravirt.c
>>>> +++ b/arch/x86/kernel/paravirt.c
>>>> @@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
>>>>       return insn_len;
>>>>   }
>>>> -static void native_flush_tlb(void)
>>>> -{
>>>> -    __native_flush_tlb();
>>>> -}
>>>> -
>>>>   /*
>>>>    * Global pages have to be flushed a bit differently. Not a real
>>>>    * performance problem because this does not happen often.
>>>> @@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
>>>>   #endif /* CONFIG_PARAVIRT_XXL */
>>>>       /* Mmu ops. */
>>>> -    .mmu.flush_tlb_user    = native_flush_tlb,
>>>> +    .mmu.flush_tlb_user    = native_flush_tlb_local,
>>>>       .mmu.flush_tlb_kernel    = native_flush_tlb_global,
>>>>       .mmu.flush_tlb_one_user    = native_flush_tlb_one_user,
>>>>       .mmu.flush_tlb_others    = native_flush_tlb_others,
>>>> --- a/arch/x86/mm/mem_encrypt.c
>>>> +++ b/arch/x86/mm/mem_encrypt.c
>>>> @@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
>>>>           size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
>>>>       } while (size);
>>>> -    __native_flush_tlb();
>>>> +    flush_tlb_local();
>>>
>>> This invoked __native_flush_tlb() because of how early it is called 
>>> and the paravirt ops support isn't set up yet, resulting in a crash 
>>> if not invoking the native version directly. So this needs a "native" 
>>> version of the tlb flush to invoke.
>>
>> I don't think this is still true. With my rework of pvops to have all
>> functions in one struct which is initialized statically initially
>> everything should work from the time the kernel is mapped.
>>
>> In case it doesn't there is something very wrong IMO.
> 
> The memory encryption support was implemented in 4.14, so it's quite 
> possible that this isn't an issue now. I'll test out the patch and 
> verify it. What release did your pvops rework land in?

4.20.


Juergen


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (16 preceding siblings ...)
  2020-04-20 10:25 ` Peter Zijlstra
@ 2020-04-20 16:33 ` Alexandre Chartre
  2020-04-21 17:10 ` Andy Lutomirski
  18 siblings, 0 replies; 34+ messages in thread
From: Alexandre Chartre @ 2020-04-20 16:33 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Thomas Lendacky, Juergen Gross,
	Boris Ostrovsky



On 4/19/20 10:31 PM, Thomas Gleixner wrote:
> The per-CPU tlbstate contains sensitive information which should be really
> only accessible in core code. It is exported to modules because some inline
> functions which are required by KVM need access to it.
> 
> The following series creates regular exported functions for the few things
> which are needed by KVM and hides the struct definition and some low level
> helpers from modules.
> 
> The series is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel x86/tlb
> 
> Thanks,
> 
> 	tglx
> 

Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>

For all patches.

alex.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-20  9:20 ` [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Christoph Hellwig
@ 2020-04-20 16:58   ` Alexandre Chartre
  2020-04-20 20:08     ` Thomas Gleixner
  2020-04-20 17:27   ` Thomas Gleixner
  1 sibling, 1 reply; 34+ messages in thread
From: Alexandre Chartre @ 2020-04-20 16:58 UTC (permalink / raw)
  To: Christoph Hellwig, Thomas Gleixner
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky


On 4/20/20 11:20 AM, Christoph Hellwig wrote:
> Just looking over some exports at the end of the series (and thus
> ignoring bisection issues):
> 
>   - Is there any good reason to keep __flush_tlb_all inline vs moving it
>     out of line and kill the flush_tlb_local and flush_tlb_global exports.
>     Also there is just a single modular users in SVM, I wonder if there is
>     any way to get rid of that one as well.
> 
> Also I think cpu_tlbstate itself could be marked static in tlb.c with
> a few more changes, I wonder if would be worth it?
> 

For Address Space Isolation (ASI), I was planning on storing the ASI session
in cpu_tlbstate (https://lkml.org/lkml/2020/2/26/699), as the ASI session
then provides the TLB flushing information based on the ASI used. In that case,
I would need cpu_tlbstate to be non-static. Otherwise I can have my own percpu
asi_session structures, but using cpu_tlbstate seemed more appropriate to me.
This is open for discussion; for now, I am waiting for the further changes that
tglx is making before rebasing ASI.

alex.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 02/15] x86/cpu: Uninline CR4 accessors
  2020-04-20  9:01   ` Christoph Hellwig
  2020-04-20  9:34     ` Borislav Petkov
@ 2020-04-20 17:25     ` Thomas Gleixner
  1 sibling, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-20 17:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

Christoph Hellwig <hch@infradead.org> writes:
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -387,7 +387,30 @@ void native_write_cr4(unsigned long val)
>>  			  bits_missing);
>>  	}
>>  }
>> -EXPORT_SYMBOL(native_write_cr4);
>> +#if IS_MODULE(CONFIG_LKDTM)
>> +EXPORT_SYMBOL_GPL(native_write_cr4);
>> +#endif
>
> While this is better than what we had before we really need to have
> a discussion on lkdtm - it needs a lot of crap that otherwise wouldn't
> be exported, and I'm really worried about people enabling it and thus
> adding exports even if they are conditional.  Can we force the code
> to be built in require a boot option for it to be activated?

I can live with that :)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-20  9:20 ` [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Christoph Hellwig
  2020-04-20 16:58   ` Alexandre Chartre
@ 2020-04-20 17:27   ` Thomas Gleixner
  2020-04-21  8:09     ` Sean Christopherson
  1 sibling, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-20 17:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

Christoph Hellwig <hch@infradead.org> writes:
> Just looking over some exports at the end of the series (and thus
> ignoring bisection issues):
>
>  - Is there any good reason to keep __flush_tlb_all inline vs moving it
>    out of line and killing the flush_tlb_local and flush_tlb_global exports?
>    Also there is just a single modular user in SVM; I wonder if there is
>    any way to get rid of that one as well.

I'll have a look again.
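
For illustration, a rough sketch of what that suggestion might look like
(nothing decided here; the body just mirrors the current inline helper):

  /* arch/x86/mm/tlb.c (sketch): one exported entry point instead of
   * exporting both flush_tlb_local() and flush_tlb_global(). */
  void __flush_tlb_all(void)
  {
  	if (boot_cpu_has(X86_FEATURE_PGE)) {
  		/* Flush everything, including global translations. */
  		flush_tlb_global();
  	} else {
  		/* !PGE -> !PCID (setup_pcid()), thus every flush is total. */
  		flush_tlb_local();
  	}
  }
  EXPORT_SYMBOL_GPL(__flush_tlb_all);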

> Also I think cpu_tlbstate itself could be marked static in tlb.c with
> a few more changes; I wonder if it would be worth it?

Unfortunately it can't. We need it in the low-level entry code due to PTI,
and solving that would be messy surgery.
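
For reference, the PTI user/kernel CR3 switch in the entry code reaches into
the per-CPU variable directly, roughly along these lines (illustrative, not a
verbatim quote of arch/x86/entry/calling.h):

  /* The asm entry code needs the per-CPU PCID flush mask for the CR3
   * switch, so the symbol itself has to stay visible outside of
   * arch/x86/mm/tlb.c even once all module exports are gone. */
  #define THIS_CPU_user_pcid_flush_mask	\
  	PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask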

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 05/15] x86/tlb: Move __flush_tlb() out of line
  2020-04-20 14:38         ` Jürgen Groß
@ 2020-04-20 18:30           ` Tom Lendacky
  0 siblings, 0 replies; 34+ messages in thread
From: Tom Lendacky @ 2020-04-20 18:30 UTC (permalink / raw)
  To: Jürgen Groß, Thomas Gleixner, LKML
  Cc: x86, Kees Cook, Paolo Bonzini, Boris Ostrovsky

On 4/20/20 9:38 AM, Jürgen Groß wrote:
> On 20.04.20 16:26, Tom Lendacky wrote:
>> On 4/20/20 9:03 AM, Jürgen Groß wrote:
>>> On 20.04.20 15:48, Tom Lendacky wrote:
>>>> On 4/19/20 3:31 PM, Thomas Gleixner wrote:
>>>>> cpu_tlbstate is exported because various TLB related functions need 
>>>>> access
>>>>> to it, but cpu_tlbstate is sensitive information which should only be
>>>>> accessed by well contained kernel functions and not be directly 
>>>>> exposed to
>>>>> modules.
>>>>>
>>>>> The various TLB flush functions need access to cpu_tlbstate. As a first
>>>>> step move __flush_tlb() out of line and hide the native function. The
>>>>> latter can be static when CONFIG_PARAVIRT is disabled.
>>>>>
>>>>> Consolidate the name space while at it and remove the pointless extra
>>>>> wrapper in the paravirt code.
>>>>>
>>>>> No functional change.
>>>>>
>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>> ---
>>>>>   arch/x86/include/asm/paravirt.h    |    4 +++-
>>>>>   arch/x86/include/asm/tlbflush.h    |   29 
>>>>> +++++------------------------
>>>>>   arch/x86/kernel/cpu/mtrr/generic.c |    4 ++--
>>>>>   arch/x86/kernel/paravirt.c         |    7 +------
>>>>>   arch/x86/mm/mem_encrypt.c          |    2 +-
>>>>>   arch/x86/mm/tlb.c                  |   33 
>>>>> ++++++++++++++++++++++++++++++++-
>>>>>   arch/x86/platform/uv/tlb_uv.c      |    2 +-
>>>>>   7 files changed, 45 insertions(+), 36 deletions(-)
>>>>>
>>>>> --- a/arch/x86/include/asm/paravirt.h
>>>>> +++ b/arch/x86/include/asm/paravirt.h
>>>>> @@ -47,7 +47,9 @@ static inline void slow_down_io(void)
>>>>>   #endif
>>>>>   }
>>>>> -static inline void __flush_tlb(void)
>>>>> +void native_flush_tlb_local(void);
>>>>> +
>>>>> +static inline void __flush_tlb_local(void)
>>>>>   {
>>>>>       PVOP_VCALL0(mmu.flush_tlb_user);
>>>>>   }
>>>>> --- a/arch/x86/include/asm/tlbflush.h
>>>>> +++ b/arch/x86/include/asm/tlbflush.h
>>>>> @@ -140,12 +140,13 @@ static inline unsigned long build_cr3_no
>>>>>       return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
>>>>>   }
>>>>> +void flush_tlb_local(void);
>>>>> +
>>>>>   #ifdef CONFIG_PARAVIRT
>>>>>   #include <asm/paravirt.h>
>>>>>   #else
>>>>> -#define __flush_tlb() __native_flush_tlb()
>>>>> -#define __flush_tlb_global() __native_flush_tlb_global()
>>>>> -#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
>>>>> +#define __flush_tlb_global()        __native_flush_tlb_global()
>>>>> +#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
>>>>>   #endif
>>>>>   struct tlb_context {
>>>>> @@ -371,24 +372,6 @@ static inline void invalidate_user_asid(
>>>>>   }
>>>>>   /*
>>>>> - * flush the entire current user mapping
>>>>> - */
>>>>> -static inline void __native_flush_tlb(void)
>>>>> -{
>>>>> -    /*
>>>>> -     * Preemption or interrupts must be disabled to protect the access
>>>>> -     * to the per CPU variable and to prevent being preempted between
>>>>> -     * read_cr3() and write_cr3().
>>>>> -     */
>>>>> -    WARN_ON_ONCE(preemptible());
>>>>> -
>>>>> -    invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
>>>>> -
>>>>> -    /* If current->mm == NULL then the read_cr3() "borrows" an mm */
>>>>> -    native_write_cr3(__native_read_cr3());
>>>>> -}
>>>>> -
>>>>> -/*
>>>>>    * flush everything
>>>>>    */
>>>>>   static inline void __native_flush_tlb_global(void)
>>>>> @@ -461,7 +444,7 @@ static inline void __flush_tlb_all(void)
>>>>>           /*
>>>>>            * !PGE -> !PCID (setup_pcid()), thus every flush is total.
>>>>>            */
>>>>> -        __flush_tlb();
>>>>> +        flush_tlb_local();
>>>>>       }
>>>>>   }
>>>>> @@ -537,8 +520,6 @@ struct flush_tlb_info {
>>>>>       bool            freed_tables;
>>>>>   };
>>>>> -#define local_flush_tlb() __flush_tlb()
>>>>> -
>>>>>   #define flush_tlb_mm(mm)                        \
>>>>>           flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
>>>>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>>>>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>>>>> @@ -761,7 +761,7 @@ static void prepare_set(void) __acquires
>>>>>       /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
>>>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>>>> -    __flush_tlb();
>>>>> +    flush_tlb_local();
>>>>>       /* Save MTRR state */
>>>>>       rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>>>> @@ -778,7 +778,7 @@ static void post_set(void) __releases(se
>>>>>   {
>>>>>       /* Flush TLBs (no need to flush caches - they are disabled) */
>>>>>       count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>>>>> -    __flush_tlb();
>>>>> +    flush_tlb_local();
>>>>>       /* Intel (P6) standard MTRRs */
>>>>>       mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>>>>> --- a/arch/x86/kernel/paravirt.c
>>>>> +++ b/arch/x86/kernel/paravirt.c
>>>>> @@ -160,11 +160,6 @@ unsigned paravirt_patch_insns(void *insn
>>>>>       return insn_len;
>>>>>   }
>>>>> -static void native_flush_tlb(void)
>>>>> -{
>>>>> -    __native_flush_tlb();
>>>>> -}
>>>>> -
>>>>>   /*
>>>>>    * Global pages have to be flushed a bit differently. Not a real
>>>>>    * performance problem because this does not happen often.
>>>>> @@ -359,7 +354,7 @@ struct paravirt_patch_template pv_ops =
>>>>>   #endif /* CONFIG_PARAVIRT_XXL */
>>>>>       /* Mmu ops. */
>>>>> -    .mmu.flush_tlb_user    = native_flush_tlb,
>>>>> +    .mmu.flush_tlb_user    = native_flush_tlb_local,
>>>>>       .mmu.flush_tlb_kernel    = native_flush_tlb_global,
>>>>>       .mmu.flush_tlb_one_user    = native_flush_tlb_one_user,
>>>>>       .mmu.flush_tlb_others    = native_flush_tlb_others,
>>>>> --- a/arch/x86/mm/mem_encrypt.c
>>>>> +++ b/arch/x86/mm/mem_encrypt.c
>>>>> @@ -134,7 +134,7 @@ static void __init __sme_early_map_unmap
>>>>>           size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
>>>>>       } while (size);
>>>>> -    __native_flush_tlb();
>>>>> +    flush_tlb_local();
>>>>
>>>> This invoked __native_flush_tlb() because of how early it is called:
>>>> the paravirt ops support isn't set up yet, resulting in a crash if the
>>>> native version isn't invoked directly. So this needs a "native" version
>>>> of the TLB flush to invoke.
>>>
>>> I don't think this is still true. With my rework of pvops to have all
>>> functions in one struct which is statically initialized, everything
>>> should work from the time the kernel is mapped.
>>>
>>> In case it doesn't there is something very wrong IMO.
>>
>> The memory encryption support was implemented in 4.14, so it's quite 
>> possible that this isn't an issue now. I'll test out the patch and 
>> verify it. What release did your pvops rework land in?
> 
> 4.20.

Ok, my (limited) testing appears good at this point. I see a couple of 
calls to ftrace (__fentry__) that just return before invoking the native 
version of the TLB flushing function.

I don't remember what the original problem was that required the call to
__native_flush_tlb(); maybe it was some overall kernel instrumentation that,
once removed via the Makefile, took care of it. But it appears OK now, so
disregard my original comment.

Thanks,
Tom

> 
> 
> Juergen
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-20 16:58   ` Alexandre Chartre
@ 2020-04-20 20:08     ` Thomas Gleixner
  0 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-20 20:08 UTC (permalink / raw)
  To: Alexandre Chartre, Christoph Hellwig
  Cc: LKML, x86, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

Alexandre Chartre <alexandre.chartre@oracle.com> writes:
> On 4/20/20 11:20 AM, Christoph Hellwig wrote:
>> Just looking over some exports at the end of the series (and thus
>> ignoring bisection issues):
>> 
>>   - Is there any good reason to keep __flush_tlb_all inline vs moving it
>>     out of line and killing the flush_tlb_local and flush_tlb_global exports?
>>     Also there is just a single modular user in SVM; I wonder if there is
>>     any way to get rid of that one as well.
>> 
>> Also I think cpu_tlbstate itself could be marked static in tlb.c with
>> a few more changes; I wonder if it would be worth it?
>> 
> For Address Space Isolation (ASI), I was planning on storing the ASI session
> into cpu_tlbstate (https://lkml.org/lkml/2020/2/26/699) as the ASI session
> then provides the TLB flushing information based on the ASI used. In that case,
> I would need cpu_tlbstate to be non-static. Otherwise I can have my own percpu
> asi_session structures, but using cpu_tlbstate seemed more appropriate to me.
> This is open for discussion; for now, I am waiting for more changes that tglx
> is making before rebasing ASI.

Even in that case we could restrict the availability to arch/x86/mm/,
which would still make it available for ASI.
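
A minimal sketch of that restriction, assuming the declaration simply moves
from the globally visible header into a header local to arch/x86/mm/
(mm_internal.h is only an assumption here):

  /* arch/x86/mm/mm_internal.h (sketch): only mm code, including a future
   * ASI implementation living there, sees the declaration.  The definition
   * in tlb.c stays non-static because the PTI entry code still references
   * the symbol. */
  DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);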

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-20 17:27   ` Thomas Gleixner
@ 2020-04-21  8:09     ` Sean Christopherson
  2020-04-21  9:09       ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2020-04-21  8:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, LKML, x86, Kees Cook, Paolo Bonzini,
	Thomas Lendacky, Juergen Gross, Boris Ostrovsky

On Mon, Apr 20, 2020 at 07:27:06PM +0200, Thomas Gleixner wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> > Just looking over some exports at the end of the series (and thus
> > ignoring bisection issues):
> >
> >  - Is there any good reason to keep __flush_tlb_all inline vs moving it
> >    out of line and killing the flush_tlb_local and flush_tlb_global exports?
> >    Also there is just a single modular user in SVM; I wonder if there is
> >    any way to get rid of that one as well.
> 
> I'll have a look again.

Regarding the SVM case, the only usage is for a TLB erratum.  At a glance,
svm_init_erratum_383() and is_erratum_383() don't use any KVM hooks, i.e. I
don't see anything that would prevent moving those to .../kernel/cpu/amd.c.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-21  8:09     ` Sean Christopherson
@ 2020-04-21  9:09       ` Thomas Gleixner
  2020-04-22  0:42         ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2020-04-21  9:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Christoph Hellwig, LKML, x86, Kees Cook, Paolo Bonzini,
	Thomas Lendacky, Juergen Gross, Boris Ostrovsky

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Apr 20, 2020 at 07:27:06PM +0200, Thomas Gleixner wrote:
>> Christoph Hellwig <hch@infradead.org> writes:
>> > Just looking over some exports at the end of the series (and thus
>> > ignoring bisection issues):
>> >
>> >  - Is there any good reason to keep __flush_tlb_all inline vs moving it
>> >    out of line and killing the flush_tlb_local and flush_tlb_global exports?
>> >    Also there is just a single modular user in SVM; I wonder if there is
>> >    any way to get rid of that one as well.
>> 
>> I'll have a look again.
>
> Regarding the SVM case, the only usage is for a TLB erratum.  At a glance,
> svm_init_erratum_383() and is_erratum_383() don't use any KVM hooks, i.e. I
> don't see anything that would prevent moving those to .../kernel/cpu/amd.c.

Right, but that would trade one export for two SVM errata-specific
exports. Not really a win.
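
Concretely, a hypothetical sketch of that trade, assuming the helpers moved
to built-in arch/x86/kernel/cpu/amd.c while kvm-amd stays modular:

  /* arch/x86/kernel/cpu/amd.c (sketch): the modular kvm-amd would then
   * need these two exports instead of the single TLB flush export it
   * uses today. */
  EXPORT_SYMBOL_GPL(svm_init_erratum_383);
  EXPORT_SYMBOL_GPL(is_erratum_383);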

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
                   ` (17 preceding siblings ...)
  2020-04-20 16:33 ` Alexandre Chartre
@ 2020-04-21 17:10 ` Andy Lutomirski
  18 siblings, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2020-04-21 17:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Kees Cook, Paolo Bonzini, Thomas Lendacky,
	Juergen Gross, Boris Ostrovsky

On Sun, Apr 19, 2020 at 1:36 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> The per-CPU tlbstate contains sensitive information which should be really
> only accessible in core code. It is exported to modules because some inline
> functions which are required by KVM need access to it.
>
> The following series creates regular exported functions for the few things
> which are needed by KVM and hides the struct definition and some low level
> helpers from modules.
>
> The series is also available from git:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel x86/tlb
>
> Thanks,

The whole series is Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [patch 00/15] x86/tlb: Unexport per-CPU tlbstate
  2020-04-21  9:09       ` Thomas Gleixner
@ 2020-04-22  0:42         ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2020-04-22  0:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, LKML, x86, Kees Cook, Paolo Bonzini,
	Thomas Lendacky, Juergen Gross, Boris Ostrovsky

On Tue, Apr 21, 2020 at 11:09:07AM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > On Mon, Apr 20, 2020 at 07:27:06PM +0200, Thomas Gleixner wrote:
> >> Christoph Hellwig <hch@infradead.org> writes:
> >> > Just looking over some exports at the end of the series (and thus
> >> > ignoring bisection issues):
> >> >
> >> >  - Is there any good reason to keep __flush_tlb_all inline vs moving it
> >> >    out of line and killing the flush_tlb_local and flush_tlb_global exports?
> >> >    Also there is just a single modular user in SVM; I wonder if there is
> >> >    any way to get rid of that one as well.
> >> 
> >> I'll have a look again.
> >
> > Regarding the SVM case, the only usage is for a TLB erratum.  At a glance,
> > svm_init_erratum_383() and is_erratum_383() don't use any KVM hooks, i.e. I
> > don't see anything that would prevent moving those to .../kernel/cpu/amd.c.
> 
> Right, but that would trade one export for two SVM errata-specific
> exports. Not really a win.

True.  I was thinking the svm_init_erratum_383() call could be moved to
init_amd(), assuming the MSR's magic bit 47 is "safe" outside of SVM being
enabled, but that's probably not worth risking breakage unless you really
want to hide __flush_tlb_all().

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2020-04-22  0:42 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-19 20:31 [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Thomas Gleixner
2020-04-19 20:31 ` [patch 01/15] x86/tlb: Uninline __get_current_cr3_fast() Thomas Gleixner
2020-04-19 20:31 ` [patch 02/15] x86/cpu: Uninline CR4 accessors Thomas Gleixner
2020-04-20  9:01   ` Christoph Hellwig
2020-04-20  9:34     ` Borislav Petkov
2020-04-20 17:25     ` Thomas Gleixner
2020-04-19 20:31 ` [patch 03/15] x86/cr4: Sanitize CR4.PCE update Thomas Gleixner
2020-04-19 20:31 ` [patch 04/15] x86/alternatives: Move temporary_mm helpers into C Thomas Gleixner
2020-04-19 20:31 ` [patch 05/15] x86/tlb: Move __flush_tlb() out of line Thomas Gleixner
2020-04-20 13:48   ` Tom Lendacky
2020-04-20 14:03     ` Jürgen Groß
2020-04-20 14:26       ` Tom Lendacky
2020-04-20 14:38         ` Jürgen Groß
2020-04-20 18:30           ` Tom Lendacky
2020-04-19 20:31 ` [patch 06/15] x86/tlb: Move __flush_tlb_global() " Thomas Gleixner
2020-04-19 20:31 ` [patch 07/15] x86/tlb: Move __flush_tlb_one_user() " Thomas Gleixner
2020-04-19 20:31 ` [patch 08/15] x86/tlb: Move __flush_tlb_one_kernel() " Thomas Gleixner
2020-04-19 20:31 ` [patch 09/15] x86/tlb: Move flush_tlb_others() " Thomas Gleixner
2020-04-19 20:31 ` [patch 10/15] x86/tlb: Move paravirt_tlb_remove_table() to the usage site Thomas Gleixner
2020-04-19 20:31 ` [patch 11/15] x86/tlb: Move cr4_set_bits_and_update_boot() " Thomas Gleixner
2020-04-19 20:31 ` [patch 12/15] x86/tlb: Uninline nmi_uaccess_okay() Thomas Gleixner
2020-04-19 20:31 ` [patch 13/15] x86/tlb: Move PCID helpers where they are used Thomas Gleixner
2020-04-19 20:31 ` [patch 14/15] xen/privcmd: Remove unneeded asm/tlb.h include Thomas Gleixner
2020-04-19 20:31 ` [patch 15/15] x86/tlb: Restrict access to tlbstate Thomas Gleixner
2020-04-20  9:20 ` [patch 00/15] x86/tlb: Unexport per-CPU tlbstate Christoph Hellwig
2020-04-20 16:58   ` Alexandre Chartre
2020-04-20 20:08     ` Thomas Gleixner
2020-04-20 17:27   ` Thomas Gleixner
2020-04-21  8:09     ` Sean Christopherson
2020-04-21  9:09       ` Thomas Gleixner
2020-04-22  0:42         ` Sean Christopherson
2020-04-20 10:25 ` Peter Zijlstra
2020-04-20 16:33 ` Alexandre Chartre
2020-04-21 17:10 ` Andy Lutomirski
