All of lore.kernel.org
 help / color / mirror / Atom feed
* [MODERATED] [RFC PATCH 1/6] kvm: handle host mode irqs 1
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
@ 2018-07-21 20:16 ` Nicolai Stange
  2018-07-21 20:25 ` [MODERATED] [RFC PATCH 2/6] kvm: handle host mode irqs 2 Nicolai Stange
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-21 20:16 UTC (permalink / raw)
  To: speck

vmx_l1d_flush() gets invoked only if ->l1tf_flush_l1d is true. There's
no point in setting ->l1tf_flush_l1d to true from there again.

Don't do that.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/kvm/vmx.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b4b8e8cb4a7e..695e161ffb36 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9691,15 +9691,15 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 	/*
 	 * This code is only executed when the the flush mode is 'cond' or
 	 * 'always'
-	 *
-	 * If 'flush always', keep the flush bit set, otherwise clear
-	 * it. The flush bit gets set again either from vcpu_run() or from
-	 * one of the unsafe VMEXIT handlers.
 	 */
-	if (static_branch_unlikely(&vmx_l1d_flush_always))
-		vcpu->arch.l1tf_flush_l1d = true;
-	else
+	if (!static_branch_unlikely(&vmx_l1d_flush_always)) {
+		/*
+		 * Clear the flush bit, it gets set again either from
+		 * vcpu_run() or from one of the unsafe VMEXIT
+		 * handlers.
+		 */
 		vcpu->arch.l1tf_flush_l1d = false;
+	}
 
 	vcpu->stat.l1d_flush++;
 
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 2/6] kvm: handle host mode irqs 2
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
  2018-07-21 20:16 ` [MODERATED] [RFC PATCH 1/6] kvm: handle host mode irqs 1 Nicolai Stange
@ 2018-07-21 20:25 ` Nicolai Stange
  2018-07-21 20:35 ` [MODERATED] [RFC PATCH 3/6] kvm: handle host mode irqs 3 Nicolai Stange
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-21 20:25 UTC (permalink / raw)
  To: speck

The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".

The "conditional" mode's implementation tends to require more
sophisticated logic than the "always" mode.

Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.

There is no change in functionality.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/kvm/vmx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 695e161ffb36..5139738cc5a9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -189,7 +189,7 @@ module_param(ple_window_max, uint, 0444);
 extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
-static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always);
+static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
 static DEFINE_MUTEX(vmx_l1d_flush_mutex);
 
 /* Storage for pre module init parameter parsing */
@@ -263,10 +263,10 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 	else
 		static_branch_disable(&vmx_l1d_should_flush);
 
-	if (l1tf == VMENTER_L1D_FLUSH_ALWAYS)
-		static_branch_enable(&vmx_l1d_flush_always);
+	if (l1tf == VMENTER_L1D_FLUSH_COND)
+		static_branch_enable(&vmx_l1d_flush_cond);
 	else
-		static_branch_disable(&vmx_l1d_flush_always);
+		static_branch_disable(&vmx_l1d_flush_cond);
 	return 0;
 }
 
@@ -9692,7 +9692,7 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 	 * This code is only executed when the the flush mode is 'cond' or
 	 * 'always'
 	 */
-	if (!static_branch_unlikely(&vmx_l1d_flush_always)) {
+	if (static_branch_likely(&vmx_l1d_flush_cond)) {
 		/*
 		 * Clear the flush bit, it gets set again either from
 		 * vcpu_run() or from one of the unsafe VMEXIT
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 3/6] kvm: handle host mode irqs 3
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
  2018-07-21 20:16 ` [MODERATED] [RFC PATCH 1/6] kvm: handle host mode irqs 1 Nicolai Stange
  2018-07-21 20:25 ` [MODERATED] [RFC PATCH 2/6] kvm: handle host mode irqs 2 Nicolai Stange
@ 2018-07-21 20:35 ` Nicolai Stange
  2018-07-22  9:35 ` [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4 Nicolai Stange
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-21 20:35 UTC (permalink / raw)
  To: speck

Currently, vmx_vcpu_run() checks if ->l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.

This test is unncessary for the "always flush L1d" mode.

Move the check to vmx_l1d_flush()'s conditional mode code path.

Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
  extra function call.
- This change inverts the (static) branch prediction, but there hadn't been
  any explicit likely()/unlikely() annotations before and so I decided
  to leave it as is.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/kvm/vmx.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5139738cc5a9..ec955b870756 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9693,12 +9693,16 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 	 * 'always'
 	 */
 	if (static_branch_likely(&vmx_l1d_flush_cond)) {
+		bool flush_l1d = vcpu->arch.l1tf_flush_l1d;
+
 		/*
 		 * Clear the flush bit, it gets set again either from
 		 * vcpu_run() or from one of the unsafe VMEXIT
 		 * handlers.
 		 */
 		vcpu->arch.l1tf_flush_l1d = false;
+		if (!flush_l1d)
+			return;
 	}
 
 	vcpu->stat.l1d_flush++;
@@ -10228,10 +10232,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	evmcs_rsp = static_branch_unlikely(&enable_evmcs) ?
 		(unsigned long)&current_evmcs->host_rsp : 0;
 
-	if (static_branch_unlikely(&vmx_l1d_should_flush)) {
-		if (vcpu->arch.l1tf_flush_l1d)
-			vmx_l1d_flush(vcpu);
-	}
+	if (static_branch_unlikely(&vmx_l1d_should_flush))
+		vmx_l1d_flush(vcpu);
 
 	asm(
 		/* Store host registers */
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
                   ` (2 preceding siblings ...)
  2018-07-21 20:35 ` [MODERATED] [RFC PATCH 3/6] kvm: handle host mode irqs 3 Nicolai Stange
@ 2018-07-22  9:35 ` Nicolai Stange
  2018-07-23 15:40   ` [MODERATED] " Andi Kleen
  2018-07-24 15:21   ` Peter Zijlstra
  2018-07-22 11:06 ` [MODERATED] [RFC PATCH 5/6] kvm: handle host mode irqs 5 Nicolai Stange
  2018-07-22 11:38 ` [MODERATED] [RFC PATCH 6/6] kvm: handle host mode irqs 6 Nicolai Stange
  5 siblings, 2 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-22  9:35 UTC (permalink / raw)
  To: speck

Part of the L1TF mitigation for vmx includes flushing the L1d cache upon
VMENTRY.

L1d flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.

If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.

As it stands, external interrupts taken while a guest is running, i.e.
inbetween a VMENTER/VMEXIT pair, will cause L1d flushes.

However, external interrupts delivered "normally" after the VMEXIT and
before the next VMENTER don't.

Part of the reason is that interrupts taken from host mode don't leave any
traces and thus, the vmx code can't tell if an interrupt has happened since
the last VMEXIT, even if it wanted to.

Implement "kernel mode irq tracking" to address this.

Introduce a per-cpu generation count, kernel_mode_irq_gen. Its semantics
are defined as follows: if its values as read at two points in time compare
equal, then an interrupt has not happened inbetween while in kernel mode
on that CPU.

Let the 32 and 64 bit interrupt specific entry code increment this
generation counter. Note that the increment is done unconditionally to
safe a branch. This doesn't matter though because nobody is watching
anyway when interrupts are taken from userspace.

NMIs are not tracked -- non-racy usage of kernel_mode_irq_gen requires
interrupts to be off.

Exceptions are not tracked -- VM exit handlers that can potentially trap
should be made to cause L1d flushes by themselves already.

Finally, introduce the X86_TRACK_KERNEL_MODE_IRQS Kconfig option to
enable all of the above. Make it default to disabled. KVM_INTEL will select
it in a later patch.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/Kconfig                 |  3 +++
 arch/x86/entry/entry_32.S        | 12 ++++++++++++
 arch/x86/entry/entry_64.S        | 12 ++++++++++++
 arch/x86/include/asm/processor.h |  4 ++++
 arch/x86/kernel/cpu/common.c     |  5 +++++
 5 files changed, 36 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7a34fdf8daf0..4704d23bfcdb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2935,6 +2935,9 @@ config X86_DMA_REMAP
 	bool
 	depends on STA2X11
 
+config X86_TRACK_KERNEL_MODE_IRQS
+	bool
+
 config HAVE_GENERIC_GUP
 	def_bool y
 
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2582881d19ce..aeb3eb142983 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -663,6 +663,16 @@ ENTRY(irq_entries_start)
     .endr
 END(irq_entries_start)
 
+.macro TRACK_KERNEL_MODE_IRQ
+#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
+	/*
+	 * Increment the kernel mode irq generation count even if
+	 * coming from userspace -- nobody will care.
+	 */
+	incl	PER_CPU_VAR(kernel_mode_irq_gen)
+#endif
+.endm
+
 /*
  * the CPU automatically disables interrupts when executing an IRQ vector,
  * so IRQ-flags tracing has to follow that:
@@ -674,6 +684,7 @@ common_interrupt:
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
+	TRACK_KERNEL_MODE_IRQ
 	movl	%esp, %eax
 	call	do_IRQ
 	jmp	ret_from_intr
@@ -686,6 +697,7 @@ ENTRY(name)				\
 	SAVE_ALL;			\
 	ENCODE_FRAME_POINTER;		\
 	TRACE_IRQS_OFF			\
+	TRACK_KERNEL_MODE_IRQ;		\
 	movl	%esp, %eax;		\
 	call	fn;			\
 	jmp	ret_from_intr;		\
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 73a522d53b53..0f8be73cdb26 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -543,6 +543,16 @@ END(irq_entries_start)
 	decl	PER_CPU_VAR(irq_count)
 .endm
 
+.macro TRACK_KERNEL_MODE_IRQ
+#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
+	/*
+	 * Increment the kernel mode irq generation count even if
+	 * coming from userspace -- nobody will care.
+	 */
+	incl	PER_CPU_VAR(kernel_mode_irq_gen)
+#endif
+.endm
+
 /*
  * Interrupt entry helper function.
  *
@@ -624,6 +634,8 @@ ENTRY(interrupt_entry)
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
+	TRACK_KERNEL_MODE_IRQ
+
 	ret
 END(interrupt_entry)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 79e409974ccc..f0823c667f4d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -441,6 +441,10 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif	/* X86_64 */
 
+#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
+DECLARE_PER_CPU(unsigned int, kernel_mode_irq_gen);
+#endif
+
 extern unsigned int fpu_kernel_xstate_size;
 extern unsigned int fpu_user_xstate_size;
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0768492f4687..274a135c6ca9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1619,6 +1619,11 @@ DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 
 #endif	/* CONFIG_X86_64 */
 
+#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
+DEFINE_PER_CPU(unsigned int, kernel_mode_irq_gen);
+EXPORT_PER_CPU_SYMBOL_GPL(kernel_mode_irq_gen);
+#endif
+
 /*
  * Clear all 6 debug registers:
  */
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 5/6] kvm: handle host mode irqs 5
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
                   ` (3 preceding siblings ...)
  2018-07-22  9:35 ` [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4 Nicolai Stange
@ 2018-07-22 11:06 ` Nicolai Stange
  2018-07-22 11:38 ` [MODERATED] [RFC PATCH 6/6] kvm: handle host mode irqs 6 Nicolai Stange
  5 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-22 11:06 UTC (permalink / raw)
  To: speck

As part of its L1TF mitigation in conditional mode, the vmx code flushes
the L1d cache on VMENTRY if the preceeding VMEXIT was caused by an external
interrupt, for example.

This covers interrupts issued in the first part of a
VMENTER->VMEXIT->VMENTER cycle. However, those taken in the second part,
i.e. while in host mode, are not being handled yet.

Use the recently added kernel_mode_irq_gen generation counter to detect
those. Make it available by selecting X86_TRACK_KERNEL_MODE_IRQS from the
KVM_INTEL Kconfig option.

Introduce a new field, ->last_kernel_mode_irq_gen, to struct kvm_vcpu_arch.
Store a snapshot of kernel_mode_irq_gen to it right after VMEXIT when
interrupts are still disabled.

Let the VMENTER conditional flushing logic in vmx_l1d_flush() compare that
snapshot against the current value and cause a L1d flush if they don't
match. Note that interrupts are disabled again when vmx_l1d_flush() runs
and this extends up to the actual VM entry.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/vmx.c              | 20 +++++++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 57d418061c55..3831f0eada4b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,6 +716,7 @@ struct kvm_vcpu_arch {
 
 	/* Flush the L1 Data cache for L1TF mitigation on VMENTER */
 	bool l1tf_flush_l1d;
+	unsigned int last_kernel_mode_irq_gen;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 92fd433c50b9..78372b792927 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_INTEL
 	depends on KVM
 	# for perf_guest_get_msrs():
 	depends on CPU_SUP_INTEL
+	select X86_TRACK_KERNEL_MODE_IRQS
 	---help---
 	  Provides support for KVM on Intel processors equipped with the VT
 	  extensions.
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ec955b870756..3db17606c418 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9701,8 +9701,23 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 		 * handlers.
 		 */
 		vcpu->arch.l1tf_flush_l1d = false;
-		if (!flush_l1d)
+
+		/*
+		 * Flush the L1d if
+		 *  - an unsafe VMEXIT handler had been executed or
+		 *  - an interrupt handler had been run after the last
+		 *    VMEXIT, either directly or indirectly from
+		 *    vmx_handle_external_intr().
+		 *
+		 * Note that if we're on a different CPU than the one
+		 * where ->last_kernel_mode_irq_gen was taken, then
+		 * flush_l1d is set anyway.
+		 */
+		if (!flush_l1d &&
+		    (vcpu->arch.last_kernel_mode_irq_gen ==
+		     *this_cpu_ptr(&kernel_mode_irq_gen))) {
 			return;
+		}
 	}
 
 	vcpu->stat.l1d_flush++;
@@ -10435,6 +10450,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx_complete_atomic_exit(vmx);
 	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
+
+	vcpu->arch.last_kernel_mode_irq_gen =
+		*this_cpu_ptr(&kernel_mode_irq_gen);
 }
 STACK_FRAME_NON_STANDARD(vmx_vcpu_run);
 
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 6/6] kvm: handle host mode irqs 6
  2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
                   ` (4 preceding siblings ...)
  2018-07-22 11:06 ` [MODERATED] [RFC PATCH 5/6] kvm: handle host mode irqs 5 Nicolai Stange
@ 2018-07-22 11:38 ` Nicolai Stange
  5 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-22 11:38 UTC (permalink / raw)
  To: speck

For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the entry code through the host's IDT.

It follows that these interrupts get accounted for in the
->kernel_mode_irq_gen generation counter. The subsequently executed
vmx_l1d_flush() will thus be aware that some interrupts have happened and
conduct a L1d flush anyway.

Setting ->l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/kvm/vmx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3db17606c418..3c8da75518b4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9989,7 +9989,6 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 			[ss]"i"(__KERNEL_DS),
 			[cs]"i"(__KERNEL_CS)
 			);
-		vcpu->arch.l1tf_flush_l1d = true;
 	}
 }
 STACK_FRAME_NON_STANDARD(vmx_handle_external_intr);
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0
@ 2018-07-23  9:54 Nicolai Stange
  2018-07-21 20:16 ` [MODERATED] [RFC PATCH 1/6] kvm: handle host mode irqs 1 Nicolai Stange
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-23  9:54 UTC (permalink / raw)
  To: speck

Hi all,

there had been some discussion on whether or not IRQs that happened between
vmexit and the next vmenter should trigger L1d flushes, c.f. [1] and [2].

This series implements what Paolo suggested at [2]:

> handle at least the common case of softirqs by adding some kind of
> "# of times softirqs were run" percpu variable.

I chose to track irqs instead of softirqs only.

[1/6] - [3/6] are preparatory
[4/6] implements the "# of time irqs were run" percpu variable
[5/6] makes vmx issue L1d flushes if that count has changed
[6/6] is a followup cleanup to that

Each patch has been compile-tested on i686 and x86_64. However, only the
latter received some runtime testing.

Thanks,

Nicolai

[1] dc14f2c1-f69c-e690-dad2-8b4f1b772ee3@suse.de
[2] 3270ad99-2008-ccea-8e0a-d3fbf7fc4e50@redhat.com


Nicolai Stange (6):
  x86/KVM/VMX: don't set ->l1tf_flush_l1d to true from vmx_l1d_flush()
  x86/KVM/VMX: replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
  x86/KVM/VMX: move the ->l1tf_flush_l1d test to vmx_l1d_flush()
  x86: implement kernel mode irq tracking
  x86/KVM/VMX: flush L1d upon interrupt after VMEXIT
  x86/KVM/VMX: don't set ->l1tf_flush_l1d from
    vmx_handle_external_intr()

 arch/x86/Kconfig                 |  3 +++
 arch/x86/entry/entry_32.S        | 12 ++++++++++
 arch/x86/entry/entry_64.S        | 12 ++++++++++
 arch/x86/include/asm/kvm_host.h  |  1 +
 arch/x86/include/asm/processor.h |  4 ++++
 arch/x86/kernel/cpu/common.c     |  5 ++++
 arch/x86/kvm/Kconfig             |  1 +
 arch/x86/kvm/vmx.c               | 51 +++++++++++++++++++++++++++-------------
 8 files changed, 73 insertions(+), 16 deletions(-)

-- 
2.13.7

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-22  9:35 ` [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4 Nicolai Stange
@ 2018-07-23 15:40   ` Andi Kleen
  2018-07-24  5:58     ` Nicolai Stange
  2018-07-24 15:21   ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2018-07-23 15:40 UTC (permalink / raw)
  To: speck

> +.macro TRACK_KERNEL_MODE_IRQ
> +#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
> +	/*
> +	 * Increment the kernel mode irq generation count even if
> +	 * coming from userspace -- nobody will care.
> +	 */
> +	incl	PER_CPU_VAR(kernel_mode_irq_gen)
> +#endif
> +.endm

You can put it into irq_enter/exit, no need to mess with assembler code.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-23 15:40   ` [MODERATED] " Andi Kleen
@ 2018-07-24  5:58     ` Nicolai Stange
  2018-07-24 14:12       ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolai Stange @ 2018-07-24  5:58 UTC (permalink / raw)
  To: speck

speck for Andi Kleen  <speck@linutronix.de> writes:

>> +.macro TRACK_KERNEL_MODE_IRQ
>> +#ifdef CONFIG_X86_TRACK_KERNEL_MODE_IRQS
>> +	/*
>> +	 * Increment the kernel mode irq generation count even if
>> +	 * coming from userspace -- nobody will care.
>> +	 */
>> +	incl	PER_CPU_VAR(kernel_mode_irq_gen)
>> +#endif
>> +.endm
>
> You can put it into irq_enter/exit, no need to mess with assembler code.

The reason I put it there was to avoid touching anything outside of
arch/x86.

In addition,
- uv_bau_message_interrupt() doesn't call irq_enter() at all,
- smp_reschedule_interrupt() calls it conditionally

But I'm not sure it matters much and either way is fine by me.

Thanks,

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-24  5:58     ` Nicolai Stange
@ 2018-07-24 14:12       ` Andi Kleen
  2018-07-24 14:39         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2018-07-24 14:12 UTC (permalink / raw)
  To: speck

> > You can put it into irq_enter/exit, no need to mess with assembler code.
> 
> The reason I put it there was to avoid touching anything outside of
> arch/x86.
> 
> In addition,
> - uv_bau_message_interrupt() doesn't call irq_enter() at all,
> - smp_reschedule_interrupt() calls it conditionally
> 
> But I'm not sure it matters much and either way is fine by me.

Upstream normally prefers C changes

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-24 14:12       ` Andi Kleen
@ 2018-07-24 14:39         ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2018-07-24 14:39 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

On 24/07/2018 16:12, speck for Andi Kleen wrote:
>>> You can put it into irq_enter/exit, no need to mess with assembler code.
>>
>> The reason I put it there was to avoid touching anything outside of
>> arch/x86.
>>
>> In addition,
>> - uv_bau_message_interrupt() doesn't call irq_enter() at all,
>> - smp_reschedule_interrupt() calls it conditionally
>>
>> But I'm not sure it matters much and either way is fine by me.
> 
> Upstream normally prefers C changes

I agree with Nicolai that this seems more like an x86-specific change.

As an alternative to the generation count, the latched flag is also
possible (KVM writes 0 while interrupts are disabled, irq entry writes
1, KVM checks for 0 or 1).  The advantage is that it would be a smaller
change when Andi posts his code for SMT-safe vmexits.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-22  9:35 ` [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4 Nicolai Stange
  2018-07-23 15:40   ` [MODERATED] " Andi Kleen
@ 2018-07-24 15:21   ` Peter Zijlstra
  2018-07-25 11:45     ` Nicolai Stange
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2018-07-24 15:21 UTC (permalink / raw)
  To: speck

On Sun, Jul 22, 2018 at 11:35:38AM +0200, speck for Nicolai Stange wrote:

> Implement "kernel mode irq tracking" to address this.
> 
> Introduce a per-cpu generation count, kernel_mode_irq_gen. Its semantics
> are defined as follows: if its values as read at two points in time compare
> equal, then an interrupt has not happened inbetween while in kernel mode
> on that CPU.
> 
> Let the 32 and 64 bit interrupt specific entry code increment this
> generation counter. Note that the increment is done unconditionally to
> safe a branch. This doesn't matter though because nobody is watching
> anyway when interrupts are taken from userspace.
> 
> NMIs are not tracked -- non-racy usage of kernel_mode_irq_gen requires
> interrupts to be off.

Should this not be part of irq_cpustat_t ? And have you looked at how
much is already covered by kstat_incr_irqs_this_cpu() ? Also note that
struct kernel_stat already includes a count vector for all softirqs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-24 15:21   ` Peter Zijlstra
@ 2018-07-25 11:45     ` Nicolai Stange
  2018-07-27  7:45       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolai Stange @ 2018-07-25 11:45 UTC (permalink / raw)
  To: speck

speck for Peter Zijlstra  <speck@linutronix.de> writes:

> On Sun, Jul 22, 2018 at 11:35:38AM +0200, speck for Nicolai Stange wrote:
>
>> Implement "kernel mode irq tracking" to address this.
>> 
>> Introduce a per-cpu generation count, kernel_mode_irq_gen. Its semantics
>> are defined as follows: if its values as read at two points in time compare
>> equal, then an interrupt has not happened inbetween while in kernel mode
>> on that CPU.
>> 
>> Let the 32 and 64 bit interrupt specific entry code increment this
>> generation counter. Note that the increment is done unconditionally to
>> safe a branch. This doesn't matter though because nobody is watching
>> anyway when interrupts are taken from userspace.
>> 
>> NMIs are not tracked -- non-racy usage of kernel_mode_irq_gen requires
>> interrupts to be off.
>
> Should this not be part of irq_cpustat_t ? And have you looked at how
> much is already covered by kstat_incr_irqs_this_cpu() ? Also note that
> struct kernel_stat already includes a count vector for all softirqs.

kstat_incr_irqs_this_cpu() covers anything coming through do_IRQ(),
provided that irq_domain implementations call it from their associated
struct irq_desc ->handle_irq(). Those I checked do, but there's no
guarantee AFAICT.

Anyway, if we were to reuse the existing stats you suggested above, then
- either softirqs only or
- softirqs + do_IRQ()
could get made to trigger L1d flushes.

I'd still opt for also including the non-do_IRQ() irqs though, because
- The number of additional L1d flushes would be limited: Paolo
  mentioned somewhere that the expected time between vmexit -> vmenter
  where ->l1tf_flush_l1d would not have been set already is ~2500 clock
  cycles.
- The non-do_IRQ() irqs wouldn't have to get audited for potentially
  pulling in any sensitive data.

So if nobody objects, I'll go with what Paolo proposed at [1]: replace
the 'kernel_mode_irq_gen' counter with a latch flag and set that for any
irq (but NMIs for a start).

Thanks,

Nicolai


[1] 25bdef0a-d65a-a8f8-020c-f9a2056be76f@redhat.com

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-25 11:45     ` Nicolai Stange
@ 2018-07-27  7:45       ` Peter Zijlstra
  2018-07-27  9:17         ` Nicolai Stange
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2018-07-27  7:45 UTC (permalink / raw)
  To: speck

On Wed, Jul 25, 2018 at 01:45:32PM +0200, speck for Nicolai Stange wrote:
> So if nobody objects, I'll go with what Paolo proposed at [1]: replace
> the 'kernel_mode_irq_gen' counter with a latch flag and set that for any
> irq (but NMIs for a start).

Well, the reason I suggested the other counters, is to avoid adding yet
another dirty cacheline to IRQs. Since we're already touching those
fields for most interrupts anyway.. might as well extend them to cover
what we need instead of adding more dirty lines.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-27  7:45       ` Peter Zijlstra
@ 2018-07-27  9:17         ` Nicolai Stange
  2018-07-27  9:55           ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolai Stange @ 2018-07-27  9:17 UTC (permalink / raw)
  To: speck

speck for Peter Zijlstra  <speck@linutronix.de> writes:

> On Wed, Jul 25, 2018 at 01:45:32PM +0200, speck for Nicolai Stange wrote:
>> So if nobody objects, I'll go with what Paolo proposed at [1]: replace
>> the 'kernel_mode_irq_gen' counter with a latch flag and set that for any
>> irq (but NMIs for a start).
>
> Well, the reason I suggested the other counters, is to avoid adding yet
> another dirty cacheline to IRQs. Since we're already touching those
> fields for most interrupts anyway.. might as well extend them to cover
> what we need instead of adding more dirty lines.
>

Ah I see, that makes sense of course.

I think there are two possible ways forward:

1.) Squeeze that new "kvm_cpu_l1tf_flush_l1d" latch flag into
    irq_cpustat_t. irq_cpustat_t is arch-specific and there
    is some room in its first cacheline, for example inserting
    up to four bytes right after ->irq_tlb_count wouldn't move
    anything critical out.

2.) Make the non-do_IRQ() irqs, i.e. smp_apic_timer_interrupt() & Co,
    call kstat_incr_irqs_this_cpu() and adapt arch_irq_stat_cpu()
    accordingly. Let vmx use struct kernel_stat ->irqs_sum as a
    irq generation counter as before.

Pros 1.)
- Easy to implement.
- Doesn't rely on irq_domain drivers.

Cons 1.)
- Extra store from do_IRQ() irqs in comparison to 2.)
- That "kvm_cpu_l1tf_flush_l1d" flag doesn't really qualify
  as an "IRQ stat" and it might be surprising to have it there.

Pros 2.)
- Might be cleaner, i.e. no random KVM specific flag in irq_cpustat_t.
- Can perhaps be reused in non-KVM contexts, should there ever be a need
  for it.

Cons 2.)
- Slightly more implementation work, i.e. fiddling with
  arch_irq_stat_cpu().
- Relies on irq_domain drivers in that whatever they've installed
  at struct irq_desc ->handle_irq() does a kstat_incr_irqs_this_cpu().

I'm undecided. Thoughts?

Thanks,

Nicolai

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-27  9:17         ` Nicolai Stange
@ 2018-07-27  9:55           ` Paolo Bonzini
  2018-07-29 20:00             ` Nicolai Stange
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2018-07-27  9:55 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2623 bytes --]

On 27/07/2018 11:17, speck for Nicolai Stange wrote:
> speck for Peter Zijlstra  <speck@linutronix.de> writes:
> 
>> On Wed, Jul 25, 2018 at 01:45:32PM +0200, speck for Nicolai Stange wrote:
>>> So if nobody objects, I'll go with what Paolo proposed at [1]: replace
>>> the 'kernel_mode_irq_gen' counter with a latch flag and set that for any
>>> irq (but NMIs for a start).
>>
>> Well, the reason I suggested the other counters, is to avoid adding yet
>> another dirty cacheline to IRQs. Since we're already touching those
>> fields for most interrupts anyway.. might as well extend them to cover
>> what we need instead of adding more dirty lines.
>>
> 
> Ah I see, that makes sense of course.
> 
> I think there are two possible ways forward:
> 
> 1.) Squeeze that new "kvm_cpu_l1tf_flush_l1d" latch flag into
>     irq_cpustat_t. irq_cpustat_t is arch-specific and there
>     is some room in its first cacheline, for example inserting
>     up to four bytes right after ->irq_tlb_count wouldn't move
>     anything critical out.
> 
> 2.) Make the non-do_IRQ() irqs, i.e. smp_apic_timer_interrupt() & Co,
>     call kstat_incr_irqs_this_cpu() and adapt arch_irq_stat_cpu()
>     accordingly. Let vmx use struct kernel_stat ->irqs_sum as a
>     irq generation counter as before.
> 
> Pros 1.)
> - Easy to implement.
> - Doesn't rely on irq_domain drivers.
> 
> Cons 1.)
> - Extra store from do_IRQ() irqs in comparison to 2.)
> - That "kvm_cpu_l1tf_flush_l1d" flag doesn't really qualify
>   as an "IRQ stat" and it might be surprising to have it there.

There's precedent for this, for example __softirq_pending.  In fact you
could reduce __softirq_pending to an u16 and not increase the size of
irq_cpustat_t, since there are <16 softirqs.

> - Can perhaps be reused in non-KVM contexts, should there ever be a need
>   for it.

If there was a need for it in other contexts, you could also have more
than one bit in the latch.  For example bit 0 is KVM, bit 1 is for
RandomOtherThing; then do_IRQ() sets all bits and each subsystem only
clears its own.

Thanks,

Paolo

> 
> Pros 2.)
> - Might be cleaner, i.e. no random KVM specific flag in irq_cpustat_t.
> - Can perhaps be reused in non-KVM contexts, should there ever be a need
>   for it.
> 
> Cons 2.)
> - Slightly more implementation work, i.e. fiddling with
>   arch_irq_stat_cpu().
> - Relies on irq_domain drivers in that whatever they've installed
>   at struct irq_desc ->handle_irq() does a kstat_incr_irqs_this_cpu().
> 
> I'm undecided. Thoughts?
> 
> Thanks,
> 
> Nicolai
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [MODERATED] Re: [RFC PATCH 4/6] kvm: handle host mode irqs 4
  2018-07-27  9:55           ` Paolo Bonzini
@ 2018-07-29 20:00             ` Nicolai Stange
  0 siblings, 0 replies; 17+ messages in thread
From: Nicolai Stange @ 2018-07-29 20:00 UTC (permalink / raw)
  To: speck

speck for Paolo Bonzini  <speck@linutronix.de> writes:

> On 27/07/2018 11:17, speck for Nicolai Stange wrote:
>> speck for Peter Zijlstra  <speck@linutronix.de> writes:
>> 
>>> On Wed, Jul 25, 2018 at 01:45:32PM +0200, speck for Nicolai Stange wrote:
>>>> So if nobody objects, I'll go with what Paolo proposed at [1]: replace
>>>> the 'kernel_mode_irq_gen' counter with a latch flag and set that for any
>>>> irq (but NMIs for a start).
>>>
>>> Well, the reason I suggested the other counters, is to avoid adding yet
>>> another dirty cacheline to IRQs. Since we're already touching those
>>> fields for most interrupts anyway.. might as well extend them to cover
>>> what we need instead of adding more dirty lines.
>>>
>> 
>> Ah I see, that makes sense of course.
>> 
>> I think there are two possible ways forward:
>> 
>> 1.) Squeeze that new "kvm_cpu_l1tf_flush_l1d" latch flag into
>>     irq_cpustat_t. irq_cpustat_t is arch-specific and there
>>     is some room in its first cacheline, for example inserting
>>     up to four bytes right after ->irq_tlb_count wouldn't move
>>     anything critical out.
>> 
>> 2.) Make the non-do_IRQ() irqs, i.e. smp_apic_timer_interrupt() & Co,
>>     call kstat_incr_irqs_this_cpu() and adapt arch_irq_stat_cpu()
>>     accordingly. Let vmx use struct kernel_stat ->irqs_sum as a
>>     irq generation counter as before.
>> 
>> Pros 1.)
>> - Easy to implement.
>> - Doesn't rely on irq_domain drivers.
>> 
>> Cons 1.)
>> - Extra store from do_IRQ() irqs in comparison to 2.)
>> - That "kvm_cpu_l1tf_flush_l1d" flag doesn't really qualify
>>   as an "IRQ stat" and it might be surprising to have it there.
>
> There's precedent for this, for example __softirq_pending.  In fact you
> could reduce __softirq_pending to an u16 and not increase the size of
> irq_cpustat_t, since there are <16 softirqs.

Perfect, thanks!

Nicolai

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-07-29 20:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-23  9:54 [MODERATED] [RFC PATCH 0/6] kvm: handle host mode irqs 0 Nicolai Stange
2018-07-21 20:16 ` [MODERATED] [RFC PATCH 1/6] kvm: handle host mode irqs 1 Nicolai Stange
2018-07-21 20:25 ` [MODERATED] [RFC PATCH 2/6] kvm: handle host mode irqs 2 Nicolai Stange
2018-07-21 20:35 ` [MODERATED] [RFC PATCH 3/6] kvm: handle host mode irqs 3 Nicolai Stange
2018-07-22  9:35 ` [MODERATED] [RFC PATCH 4/6] kvm: handle host mode irqs 4 Nicolai Stange
2018-07-23 15:40   ` [MODERATED] " Andi Kleen
2018-07-24  5:58     ` Nicolai Stange
2018-07-24 14:12       ` Andi Kleen
2018-07-24 14:39         ` Paolo Bonzini
2018-07-24 15:21   ` Peter Zijlstra
2018-07-25 11:45     ` Nicolai Stange
2018-07-27  7:45       ` Peter Zijlstra
2018-07-27  9:17         ` Nicolai Stange
2018-07-27  9:55           ` Paolo Bonzini
2018-07-29 20:00             ` Nicolai Stange
2018-07-22 11:06 ` [MODERATED] [RFC PATCH 5/6] kvm: handle host mode irqs 5 Nicolai Stange
2018-07-22 11:38 ` [MODERATED] [RFC PATCH 6/6] kvm: handle host mode irqs 6 Nicolai Stange

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.