kvm.vger.kernel.org archive mirror
* [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections
@ 2020-07-08 19:51 Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 1/7] x86/kvm: Move context tracking where it belongs Thomas Gleixner
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

Folks,

this is a rebased and adapted version of the original series, which is
available here:

     https://lore.kernel.org/r/20200519203128.773151484@linutronix.de
 
It deals with the RCU and context tracking state and the protection against
instrumentation in sensitive places (a condensed sketch of the resulting
shape follows the list):

  - Placing the guest_enter/exit() calls at the correct spots

  - Moving the sensitive VMENTER/EXIT code into the non-instrumentable code
    section

  - Fixing up the tracing code to comply with the non-instrumentation rules

  - Using native functions to access CR2 and the GS base MSR in the critical
    code paths to prevent them from being instrumented
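
For illustration, this is roughly the shape the series converges on
(condensed from patch 4/7, VMX side; SVM is analogous; a sketch, not the
literal final code):

    static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
                                            struct vcpu_vmx *vmx)
    {
            /* Instrumentable work is explicitly bracketed for objtool */
            instrumentation_begin();
            trace_hardirqs_on_prepare();
            lockdep_hardirqs_on_prepare(CALLER_ADDR0);
            instrumentation_end();

            guest_enter_irqoff();           /* context tracking, RCU off */
            lockdep_hardirqs_on(CALLER_ADDR0);

            /* ... non-instrumentable code, the actual VMENTER/VMEXIT ... */

            lockdep_hardirqs_off(CALLER_ADDR0);
            guest_exit_irqoff();            /* host context, RCU back on */

            instrumentation_begin();
            trace_hardirqs_off_finish();
            instrumentation_end();
    }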

Thanks,

	tglx


* [patch V2 1/7] x86/kvm: Move context tracking where it belongs
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 2/7] x86/kvm/vmx: Add hardirq tracing to guest enter/exit Thomas Gleixner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

Context tracking for KVM happens way too early in the vcpu_run()
code. Anything between guest_enter_irqoff() and guest_exit_irqoff() cannot
use RCU and must also not be instrumented.

The current placement covers way too much code. Move it closer to the
actual vmenter/exit code.
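
As an illustration (condensed and paraphrased, not a literal quote of the
current code), the old placement put e.g. the FPU state handling and the
speculation control setup inside the RCU-off region:

    /* Old: in vcpu_enter_guest() */
    trace_kvm_entry(vcpu->vcpu_id);
    guest_enter_irqoff();                   /* RCU off from here on */

    fpregs_assert_state_consistent();       /* runs with RCU off */
    ...
    /* much later, in the vendor code: */
    x86_spec_ctrl_set_guest(...);           /* can take locks; lockdep
                                             * needs RCU */
    __svm_vcpu_run(...);                    /* the actual VMENTER */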

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/svm.c |   16 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c |   10 ++++++++++
 arch/x86/kvm/x86.c     |    2 --
 3 files changed, 26 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3399,6 +3399,14 @@ static __no_kcsan fastpath_t svm_vcpu_ru
 	 */
 	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
+	/*
+	 * Tell context tracking that this CPU is about to enter guest
+	 * mode. This has to be after x86_spec_ctrl_set_guest() because
+	 * that can take locks (lockdep needs RCU) and calls into world and
+	 * some more.
+	 */
+	guest_enter_irqoff();
+
 	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
 
 #ifdef CONFIG_X86_64
@@ -3409,6 +3417,14 @@ static __no_kcsan fastpath_t svm_vcpu_ru
 	loadsegment(gs, svm->host.gs);
 #endif
 #endif
+	/*
+	 * Tell context tracking that this CPU is back.
+	 *
+	 * This needs to be done before the below as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
+	 * into world and some more.
+	 */
+	guest_exit_irqoff();
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6728,6 +6728,11 @@ static fastpath_t vmx_vcpu_run(struct kv
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
+	/*
+	 * Tell context tracking that this CPU is about to enter guest mode.
+	 */
+	guest_enter_irqoff();
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6743,6 +6748,11 @@ static fastpath_t vmx_vcpu_run(struct kv
 	vcpu->arch.cr2 = read_cr2();
 
 	/*
+	 * Tell context tracking that this CPU is back.
+	 */
+	guest_exit_irqoff();
+
+	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
 	 * SPEC_CTRL MSR it may have left it on; save the value and
 	 * turn it off. This is much more efficient than blindly adding
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8501,7 +8501,6 @@ static int vcpu_enter_guest(struct kvm_v
 	}
 
 	trace_kvm_entry(vcpu->vcpu_id);
-	guest_enter_irqoff();
 
 	fpregs_assert_state_consistent();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
@@ -8563,7 +8562,6 @@ static int vcpu_enter_guest(struct kvm_v
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
 
-	guest_exit_irqoff();
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
 		if (delta != S64_MIN) {



* [patch V2 2/7] x86/kvm/vmx: Add hardirq tracing to guest enter/exit
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 1/7] x86/kvm: Move context tracking where it belongs Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 3/7] x86/kvm/svm: Add hardirq tracing on " Thomas Gleixner
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

Entering guest mode is more or less the same as returning to user
space. From an instrumentation point of view both leave kernel mode, and the
transition to guest or user mode reenables interrupts on the host. In user
mode an interrupt is served directly; in guest mode it causes a VM exit
which then handles or reinjects the interrupt.

The transition from guest or user mode back to kernel mode disables
interrupts, which needs to be recorded in the instrumentation state so that
the correct state can be restored on the next transition.

This matters for e.g. latency analysis, because otherwise the execution
time in guest or user mode would wrongly be accounted as time with
interrupts disabled and could trigger false positives.

Add hardirq tracing to guest enter/exit functions in the same way as it
is done in the user mode enter/exit code, respecting the RCU requirements.
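
A condensed sketch of the resulting enter sequence (the full hunk is
below). The split mirrors exit_to_user_mode(): the tracepoint work needs
RCU and therefore has to run before context tracking switches to guest
mode, while the final lockdep update is safe afterwards:

    trace_hardirqs_on_prepare();            /* tracing: needs RCU */
    lockdep_hardirqs_on_prepare(CALLER_ADDR0);
    guest_enter_irqoff();                   /* context tracking / RCU */
    lockdep_hardirqs_on(CALLER_ADDR0);      /* safe without RCU */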

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>



---
 arch/x86/kvm/vmx/vmx.c |   27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6729,9 +6729,21 @@ static fastpath_t vmx_vcpu_run(struct kv
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
 	/*
-	 * Tell context tracking that this CPU is about to enter guest mode.
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * interrupts disabled when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * This ensures that e.g. latency analysis on the host observes
+	 * guest mode as interrupt enabled.
+	 *
+	 * guest_enter_irqoff() informs context tracking about the
+	 * transition to guest mode and if enabled adjusts RCU state
+	 * accordingly.
 	 */
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6748,9 +6760,20 @@ static fastpath_t vmx_vcpu_run(struct kv
 	vcpu->arch.cr2 = read_cr2();
 
 	/*
-	 * Tell context tracking that this CPU is back.
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on' as recorded before entering guest mode.
+	 * Same as enter_from_user_mode().
+	 *
+	 * guest_exit_irqoff() restores host context and reinstates RCU if
+	 * enabled and required.
+	 *
+	 * This needs to be done before the below as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
+	 * into world and some more.
 	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
 	guest_exit_irqoff();
+	trace_hardirqs_off_finish();
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the



* [patch V2 3/7] x86/kvm/svm: Add hardirq tracing on guest enter/exit
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 1/7] x86/kvm: Move context tracking where it belongs Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 2/7] x86/kvm/vmx: Add hardirq tracing to guest enter/exit Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 4/7] x86/kvm/vmx: Move guest enter/exit into .noinstr.text Thomas Gleixner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

Entering guest mode is more or less the same as returning to user
space. From an instrumentation point of view both leave kernel mode, and the
transition to guest or user mode reenables interrupts on the host. In user
mode an interrupt is served directly; in guest mode it causes a VM exit
which then handles or reinjects the interrupt.

The transition from guest or user mode back to kernel mode disables
interrupts, which needs to be recorded in the instrumentation state so that
the correct state can be restored on the next transition.

This matters for e.g. latency analysis, because otherwise the execution
time in guest or user mode would wrongly be accounted as time with
interrupts disabled and could trigger false positives.

Add hardirq tracing to guest enter/exit functions in the same way as it
is done in the user mode enter/exit code, respecting the RCU requirements.
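
On the exit side the ordering is inverted, mirroring enter_from_user_mode():
lockdep is told first that interrupts are off, then RCU/context tracking is
restored, and only then can the tracepoint run (condensed from the hunk
below):

    lockdep_hardirqs_off(CALLER_ADDR0);     /* safe without RCU */
    guest_exit_irqoff();                    /* restore host context / RCU */
    trace_hardirqs_off_finish();            /* tracing: needs RCU, so after
                                             * guest_exit_irqoff() */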

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>



---
 arch/x86/kvm/svm/svm.c |   27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3400,12 +3400,21 @@ static __no_kcsan fastpath_t svm_vcpu_ru
 	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
 	/*
-	 * Tell context tracking that this CPU is about to enter guest
-	 * mode. This has to be after x86_spec_ctrl_set_guest() because
-	 * that can take locks (lockdep needs RCU) and calls into world and
-	 * some more.
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * interrupts disabled when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * This ensures that e.g. latency analysis on the host observes
+	 * guest mode as interrupt enabled.
+	 *
+	 * guest_enter_irqoff() informs context tracking about the
+	 * transition to guest mode and if enabled adjusts RCU state
+	 * accordingly.
 	 */
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
 
 	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
 
@@ -3417,14 +3426,22 @@ static __no_kcsan fastpath_t svm_vcpu_ru
 	loadsegment(gs, svm->host.gs);
 #endif
 #endif
+
 	/*
-	 * Tell context tracking that this CPU is back.
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on' as recorded before entering guest mode.
+	 * Same as enter_from_user_mode().
+	 *
+	 * guest_exit_irqoff() restores host context and reinstates RCU if
+	 * enabled and required.
 	 *
 	 * This needs to be done before the below as native_read_msr()
 	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
 	 * into world and some more.
 	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
 	guest_exit_irqoff();
+	trace_hardirqs_off_finish();
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the



* [patch V2 4/7] x86/kvm/vmx: Move guest enter/exit into .noinstr.text
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-07-08 19:51 ` [patch V2 3/7] x86/kvm/svm: Add hardirq tracing on " Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 5/7] x86/kvm/svm: " Thomas Gleixner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

Move the functions which are inside the RCU-off region into the
non-instrumentable text section.
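
For reference, a rough sketch of the mechanics (paraphrased, not the
literal kernel source): noinstr places the function into .noinstr.text and
disables tracing, while instrumentation_begin()/end() emit entries into the
.discard.instr_begin/.discard.instr_end sections (the same annotation done
by hand in the kvm_host.h hunk below), which objtool checks against calls
out of .noinstr.text:

    /* roughly: */
    #define noinstr                                         \
            noinline notrace                                \
            __attribute__((__section__(".noinstr.text")))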

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>



---
 arch/x86/include/asm/hardirq.h  |    4 -
 arch/x86/include/asm/kvm_host.h |    8 ++
 arch/x86/kvm/vmx/ops.h          |    4 +
 arch/x86/kvm/vmx/vmenter.S      |    5 +
 arch/x86/kvm/vmx/vmx.c          |  111 ++++++++++++++++++++++------------------
 arch/x86/kvm/x86.c              |    2 
 6 files changed, 81 insertions(+), 53 deletions(-)

--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -67,12 +67,12 @@ static inline void kvm_set_cpu_l1tf_flus
 	__this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);
 }
 
-static inline void kvm_clear_cpu_l1tf_flush_l1d(void)
+static __always_inline void kvm_clear_cpu_l1tf_flush_l1d(void)
 {
 	__this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 0);
 }
 
-static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
+static __always_inline bool kvm_get_cpu_l1tf_flush_l1d(void)
 {
 	return __this_cpu_read(irq_stat.kvm_cpu_l1tf_flush_l1d);
 }
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1636,7 +1636,15 @@ asmlinkage void kvm_spurious_fault(void)
 	insn "\n\t"							\
 	"jmp	668f \n\t"						\
 	"667: \n\t"							\
+	"1: \n\t"							\
+	".pushsection .discard.instr_begin \n\t"			\
+	".long 1b - . \n\t"						\
+	".popsection \n\t"						\
 	"call	kvm_spurious_fault \n\t"				\
+	"1: \n\t"							\
+	".pushsection .discard.instr_end \n\t"				\
+	".long 1b - . \n\t"						\
+	".popsection \n\t"						\
 	"668: \n\t"							\
 	_ASM_EXTABLE(666b, 667b)
 
--- a/arch/x86/kvm/vmx/ops.h
+++ b/arch/x86/kvm/vmx/ops.h
@@ -146,7 +146,9 @@ do {									\
 			  : : op1 : "cc" : error, fault);		\
 	return;								\
 error:									\
+	instrumentation_begin();					\
 	insn##_error(error_args);					\
+	instrumentation_end();						\
 	return;								\
 fault:									\
 	kvm_spurious_fault();						\
@@ -161,7 +163,9 @@ do {									\
 			  : : op1, op2 : "cc" : error, fault);		\
 	return;								\
 error:									\
+	instrumentation_begin();					\
 	insn##_error(error_args);					\
+	instrumentation_end();						\
 	return;								\
 fault:									\
 	kvm_spurious_fault();						\
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -27,7 +27,7 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
-	.text
+.section .noinstr.text, "ax"
 
 /**
  * vmx_vmenter - VM-Enter the current loaded VMCS
@@ -234,6 +234,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	jmp 1b
 SYM_FUNC_END(__vmx_vcpu_run)
 
+
+.section .text, "ax"
+
 /**
  * vmread_error_trampoline - Trampoline from inline asm to vmread_error()
  * @field:	VMCS field encoding that failed
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6113,7 +6113,7 @@ static int vmx_handle_exit(struct kvm_vc
  * information but as all relevant affected CPUs have 32KiB L1D cache size
  * there is no point in doing so.
  */
-static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
 
@@ -6146,7 +6146,7 @@ static void vmx_l1d_flush(struct kvm_vcp
 	vcpu->stat.l1d_flush++;
 
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
 		return;
 	}
 
@@ -6632,7 +6632,7 @@ static void vmx_update_hv_timer(struct k
 	}
 }
 
-void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
+void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 {
 	if (unlikely(host_rsp != vmx->loaded_vmcs->host_state.rsp)) {
 		vmx->loaded_vmcs->host_state.rsp = host_rsp;
@@ -6654,6 +6654,63 @@ static fastpath_t vmx_exit_handlers_fast
 
 bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs, bool launched);
 
+static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
+					struct vcpu_vmx *vmx)
+{
+	/*
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * interrupts disabled when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * This ensures that e.g. latency analysis on the host observes
+	 * guest mode as interrupt enabled.
+	 *
+	 * guest_enter_irqoff() informs context tracking about the
+	 * transition to guest mode and if enabled adjusts RCU state
+	 * accordingly.
+	 */
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+	instrumentation_end();
+
+	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+
+	/* L1D Flush includes CPU buffer clear to mitigate MDS */
+	if (static_branch_unlikely(&vmx_l1d_should_flush))
+		vmx_l1d_flush(vcpu);
+	else if (static_branch_unlikely(&mds_user_clear))
+		mds_clear_cpu_buffers();
+
+	if (vcpu->arch.cr2 != read_cr2())
+		write_cr2(vcpu->arch.cr2);
+
+	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
+				   vmx->loaded_vmcs->launched);
+
+	vcpu->arch.cr2 = read_cr2();
+
+	/*
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on' as recorded before entering guest mode.
+	 * Same as enter_from_user_mode().
+	 *
+	 * guest_exit_irqoff() restores host context and reinstates RCU if
+	 * enabled and required.
+	 *
+	 * This needs to be done before the below as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
+	 * into world and some more.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	guest_exit_irqoff();
+
+	instrumentation_begin();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	fastpath_t exit_fastpath;
@@ -6728,52 +6785,8 @@ static fastpath_t vmx_vcpu_run(struct kv
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-
-	/* L1D Flush includes CPU buffer clear to mitigate MDS */
-	if (static_branch_unlikely(&vmx_l1d_should_flush))
-		vmx_l1d_flush(vcpu);
-	else if (static_branch_unlikely(&mds_user_clear))
-		mds_clear_cpu_buffers();
-
-	if (vcpu->arch.cr2 != read_cr2())
-		write_cr2(vcpu->arch.cr2);
-
-	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
-				   vmx->loaded_vmcs->launched);
-
-	vcpu->arch.cr2 = read_cr2();
-
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * guest_exit_irqoff() restores host context and reinstates RCU if
-	 * enabled and required.
-	 *
-	 * This needs to be done before the below as native_read_msr()
-	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
-	 * into world and some more.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
-	trace_hardirqs_off_finish();
+	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
+	vmx_vcpu_enter_exit(vcpu, vmx);
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -379,7 +379,7 @@ int kvm_set_apic_base(struct kvm_vcpu *v
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
 
-asmlinkage __visible void kvm_spurious_fault(void)
+asmlinkage __visible noinstr void kvm_spurious_fault(void)
 {
 	/* Fault while not rebooting.  We want the trace. */
 	BUG_ON(!kvm_rebooting);



* [patch V2 5/7] x86/kvm/svm: Move guest enter/exit into .noinstr.text
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-07-08 19:51 ` [patch V2 4/7] x86/kvm/vmx: Move guest enter/exit into .noinstr.text Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:51 ` [patch V2 6/7] x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS Thomas Gleixner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

Move the functions which are inside the RCU-off region into the
non-instrumentable text section.
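
The C side is the same pattern as the VMX variant in patch 4/7. The
additional wrinkle is the assembly entry code: there is no noinstr
attribute in assembly, so the section has to be selected directly, as in
the vmenter.S hunk below:

    .section .noinstr.text, "ax"

    SYM_FUNC_START(__svm_vcpu_run)
            /* ... */
    SYM_FUNC_END(__svm_vcpu_run)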

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

---
 arch/x86/kvm/svm/svm.c     |   98 +++++++++++++++++++++++++--------------------
 arch/x86/kvm/svm/vmenter.S |    2 
 2 files changed, 56 insertions(+), 44 deletions(-)

--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3344,6 +3344,60 @@ static fastpath_t svm_exit_handlers_fast
 
 void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
 
+static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
+					struct vcpu_svm *svm)
+{
+	/*
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * interrupts disabled when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * This ensures that e.g. latency analysis on the host observes
+	 * guest mode as interrupt enabled.
+	 *
+	 * guest_enter_irqoff() informs context tracking about the
+	 * transition to guest mode and if enabled adjusts RCU state
+	 * accordingly.
+	 */
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+	instrumentation_end();
+
+	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+
+	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
+
+#ifdef CONFIG_X86_64
+	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+#else
+	loadsegment(fs, svm->host.fs);
+#ifndef CONFIG_X86_32_LAZY_GS
+	loadsegment(gs, svm->host.gs);
+#endif
+#endif
+
+	/*
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on' as recorded before entering guest mode.
+	 * Same as enter_from_user_mode().
+	 *
+	 * guest_exit_irqoff() restores host context and reinstates RCU if
+	 * enabled and required.
+	 *
+	 * This needs to be done before the below as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
+	 * into world and some more.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	guest_exit_irqoff();
+
+	instrumentation_begin();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
 static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	fastpath_t exit_fastpath;
@@ -3399,49 +3453,7 @@ static __no_kcsan fastpath_t svm_vcpu_ru
 	 */
 	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-
-	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
-
-#ifdef CONFIG_X86_64
-	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-	loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-	loadsegment(gs, svm->host.gs);
-#endif
-#endif
-
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * guest_exit_irqoff() restores host context and reinstates RCU if
-	 * enabled and required.
-	 *
-	 * This needs to be done before the below as native_read_msr()
-	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
-	 * into world and some more.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
-	trace_hardirqs_off_finish();
+	svm_vcpu_enter_exit(vcpu, svm);
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -27,7 +27,7 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
-	.text
+.section .noinstr.text, "ax"
 
 /**
  * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode



* [patch V2 6/7] x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
                   ` (4 preceding siblings ...)
  2020-07-08 19:51 ` [patch V2 5/7] x86/kvm/svm: " Thomas Gleixner
@ 2020-07-08 19:51 ` Thomas Gleixner
  2020-07-08 19:52 ` [patch V2 7/7] x86/kvm/vmx: Use native read/write_cr2() Thomas Gleixner
  2020-07-08 20:28 ` [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Paolo Bonzini
  7 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Juergen Gross, Alexandre Chartre,
	Peter Zijlstra

From: Thomas Gleixner <tglx@linutronix.de>

On guest exit MSR_GS_BASE contains whatever the guest wrote to it and the
first action after returning from the ASM code is to set it to the host
kernel value. This uses wrmsrl(), which is interesting, to say the least.

wrmsrl() is either using native_write_msr() or the paravirt variant. The
XEN_PV code is uninteresting as nested SVM in a XEN_PV guest does not work.

But native_write_msr() can be placed out of line by the compiler, especially
when paravirtualization is enabled in the kernel configuration. The
function is marked notrace, but it can still be probed if
CONFIG_KPROBE_EVENTS_ON_NOTRACE is enabled.

That would be a fatal problem as kprobe events use per-CPU variables which
are GS based and would be accessed with the guest GS value. Depending on the
GS value this would either explode in colorful ways or lead to completely
undebuggable data corruption.

Aside from that, native_write_msr() contains a tracepoint which objtool
complains about, as it is invoked from the noinstr section.

As this code cannot run inside a XEN_PV guest, there is no point in using
wrmsrl(). Use native_wrmsrl() instead, which is just a plain native WRMSR
with no tracing or anything else attached.
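
For comparison, a rough sketch of the two paths (paraphrased from
arch/x86/include/asm/msr.h, not the literal source):

    /*
     * wrmsrl(): paravirt indirection or native_write_msr(). The latter
     * is notrace, but contains a write_msr tracepoint and can still be
     * kprobed with CONFIG_KPROBE_EVENTS_ON_NOTRACE=y.
     */
    wrmsrl(MSR_GS_BASE, svm->host.gs_base);

    /* native_wrmsrl(msr, val): roughly just the bare instruction */
    asm volatile("wrmsr" : :
                 "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));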

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kvm/svm/svm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3370,7 +3370,7 @@ static noinstr void svm_vcpu_enter_exit(
 	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
 
 #ifdef CONFIG_X86_64
-	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+	native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
 	loadsegment(fs, svm->host.fs);
 #ifndef CONFIG_X86_32_LAZY_GS



* [patch V2 7/7] x86/kvm/vmx: Use native read/write_cr2()
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
                   ` (5 preceding siblings ...)
  2020-07-08 19:51 ` [patch V2 6/7] x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS Thomas Gleixner
@ 2020-07-08 19:52 ` Thomas Gleixner
  2020-07-09  4:13   ` Jürgen Groß
  2020-07-08 20:28 ` [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Paolo Bonzini
  7 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2020-07-08 19:52 UTC (permalink / raw)
  To: LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra,
	Juergen Gross

From: Thomas Gleixner <tglx@linutronix.de>

read/write_cr2() go through the paravirt XXL indirection, but nested VMX in
a XEN_PV guest is not supported.

Use the native variants.
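
The native variants are (roughly, paraphrased from
arch/x86/include/asm/special_insns.h) just the raw CR2 moves with no
pv_ops indirection:

    static __always_inline unsigned long native_read_cr2(void)
    {
            unsigned long val;

            asm volatile("mov %%cr2, %0" : "=r" (val) : : "memory");
            return val;
    }

    static __always_inline void native_write_cr2(unsigned long val)
    {
            asm volatile("mov %0, %%cr2" : : "r" (val) : "memory");
    }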

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kvm/vmx/vmx.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6683,13 +6683,13 @@ static noinstr void vmx_vcpu_enter_exit(
 	else if (static_branch_unlikely(&mds_user_clear))
 		mds_clear_cpu_buffers();
 
-	if (vcpu->arch.cr2 != read_cr2())
-		write_cr2(vcpu->arch.cr2);
+	if (vcpu->arch.cr2 != native_read_cr2())
+		native_write_cr2(vcpu->arch.cr2);
 
 	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
 				   vmx->loaded_vmcs->launched);
 
-	vcpu->arch.cr2 = read_cr2();
+	vcpu->arch.cr2 = native_read_cr2();
 
 	/*
 	 * VMEXIT disables interrupts (host state), but tracing and lockdep



* Re: [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections
  2020-07-08 19:51 [patch V2 0/7] x86/kvm: RCU/context tracking and instrumentation protections Thomas Gleixner
                   ` (6 preceding siblings ...)
  2020-07-08 19:52 ` [patch V2 7/7] x86/kvm/vmx: Use native read/write_cr2() Thomas Gleixner
@ 2020-07-08 20:28 ` Paolo Bonzini
  7 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2020-07-08 20:28 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, kvm, Alexandre Chartre, Peter Zijlstra, Juergen Gross

On 08/07/20 21:51, Thomas Gleixner wrote:
> this is a rebased and adapted version of the original series, which is
> available here:
> 
>      https://lore.kernel.org/r/20200519203128.773151484@linutronix.de
> 
> [...]

Queued, thanks.

Paolo



* Re: [patch V2 7/7] x86/kvm/vmx: Use native read/write_cr2()
  2020-07-08 19:52 ` [patch V2 7/7] x86/kvm/vmx: Use native read/write_cr2() Thomas Gleixner
@ 2020-07-09  4:13   ` Jürgen Groß
  0 siblings, 0 replies; 10+ messages in thread
From: Jürgen Groß @ 2020-07-09  4:13 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Paolo Bonzini, kvm, Alexandre Chartre, Peter Zijlstra

On 08.07.20 21:52, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> read/write_cr2() go throuh the paravirt XXL indirection, but nested VMX in
> a XEN_PV guest is not supported.
> 
> Use the native variants.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

