stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning
@ 2022-02-09 18:56 Sasha Levin
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Sasha Levin
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hou Wenlong, Sean Christopherson, Paolo Bonzini, Sasha Levin, kvm

From: Hou Wenlong <houwenlong93@linux.alibaba.com>

[ Upstream commit 6a0c61703e3a5d67845a4b275e1d9d7bc1b5aad7 ]

Fix the following false positive warning:
 =============================
 WARNING: suspicious RCU usage
 5.16.0-rc4+ #57 Not tainted
 -----------------------------
 arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

 stack backtrace:
 CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/eventfd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 2ad013b8bde96..59b1dd4a549ee 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -463,8 +463,8 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
 	idx = srcu_read_lock(&kvm->irq_srcu);
 	gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
 	if (gsi != -1)
-		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
-					 link)
+		hlist_for_each_entry_srcu(kian, &kvm->irq_ack_notifier_list,
+					  link, srcu_read_lock_held(&kvm->irq_srcu))
 			if (kian->gsi == gsi) {
 				srcu_read_unlock(&kvm->irq_srcu, idx);
 				return true;
@@ -480,8 +480,8 @@ void kvm_notify_acked_gsi(struct kvm *kvm, int gsi)
 {
 	struct kvm_irq_ack_notifier *kian;
 
-	hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
-				 link)
+	hlist_for_each_entry_srcu(kian, &kvm->irq_ack_notifier_list,
+				  link, srcu_read_lock_held(&kvm->irq_srcu))
 		if (kian->gsi == gsi)
 			kian->irq_acked(kian);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:42   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Sasha Levin
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, kvm

From: Vitaly Kuznetsov <vkuznets@redhat.com>

[ Upstream commit 7a601e2cf61558dfd534a9ecaad09f5853ad8204 ]

Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.

Note, none of the currently existing Windows/Hyper-V versions are known
to enable 'save VMX-preemption timer value' when eVMCS is in use, the
change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/evmcs.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 6255fa7167720..8d70f9aea94bc 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -59,7 +59,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
 	 SECONDARY_EXEC_SHADOW_VMCS |					\
 	 SECONDARY_EXEC_TSC_SCALING |					\
 	 SECONDARY_EXEC_PAUSE_LOOP_EXITING)
-#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
+#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL					\
+	(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |				\
+	 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
 #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:42   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 Sasha Levin
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, kvm

From: Vitaly Kuznetsov <vkuznets@redhat.com>

[ Upstream commit f80ae0ef089a09e8c18da43a382c3caac9a424a7 ]

Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
on 'true' MSR data.

Note, none of the currently existing Windows/Hyper-V versions are known
to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/evmcs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 09fac0ddac8bd..87e3dc10edf40 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -361,6 +361,7 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
 	case MSR_IA32_VMX_PROCBASED_CTLS2:
 		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
 		break;
+	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
 	case MSR_IA32_VMX_PINBASED_CTLS:
 		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
 		break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Sasha Levin
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:33   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode Sasha Levin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sean Christopherson, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, kvm

From: Sean Christopherson <seanjc@google.com>

[ Upstream commit d6e656cd266cdcc95abd372c7faef05bee271d1a ]

WARN if KVM attempts to allocate a shadow VMCS for vmcs02.  KVM emulates
VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
a "real" shadow VMCS for L2.

The previous code WARNed but continued anyway with the allocation,
presumably in an attempt to avoid NULL pointer dereference.
However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
indeed the sole caller does:

	if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
		goto out_shadow_vmcs;

which makes it not a useful attempt.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220527.2093146-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/nested.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c605c2c01394b..9cd68e1fcf602 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4827,18 +4827,20 @@ static struct vmcs *alloc_shadow_vmcs(struct kvm_vcpu *vcpu)
 	struct loaded_vmcs *loaded_vmcs = vmx->loaded_vmcs;
 
 	/*
-	 * We should allocate a shadow vmcs for vmcs01 only when L1
-	 * executes VMXON and free it when L1 executes VMXOFF.
-	 * As it is invalid to execute VMXON twice, we shouldn't reach
-	 * here when vmcs01 already have an allocated shadow vmcs.
+	 * KVM allocates a shadow VMCS only when L1 executes VMXON and frees it
+	 * when L1 executes VMXOFF or the vCPU is forced out of nested
+	 * operation.  VMXON faults if the CPU is already post-VMXON, so it
+	 * should be impossible to already have an allocated shadow VMCS.  KVM
+	 * doesn't support virtualization of VMCS shadowing, so vmcs01 should
+	 * always be the loaded VMCS.
 	 */
-	WARN_ON(loaded_vmcs == &vmx->vmcs01 && loaded_vmcs->shadow_vmcs);
+	if (WARN_ON(loaded_vmcs != &vmx->vmcs01 || loaded_vmcs->shadow_vmcs))
+		return loaded_vmcs->shadow_vmcs;
+
+	loaded_vmcs->shadow_vmcs = alloc_vmcs(true);
+	if (loaded_vmcs->shadow_vmcs)
+		vmcs_clear(loaded_vmcs->shadow_vmcs);
 
-	if (!loaded_vmcs->shadow_vmcs) {
-		loaded_vmcs->shadow_vmcs = alloc_vmcs(true);
-		if (loaded_vmcs->shadow_vmcs)
-			vmcs_clear(loaded_vmcs->shadow_vmcs);
-	}
 	return loaded_vmcs->shadow_vmcs;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
                   ` (2 preceding siblings ...)
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:33   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support Sasha Levin
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sean Christopherson, Liam Merwick, Paolo Bonzini, Sasha Levin,
	tglx, mingo, bp, dave.hansen, x86, kvm

From: Sean Christopherson <seanjc@google.com>

[ Upstream commit cdf85e0c5dc766fc7fc779466280e454a6d04f87 ]

Inject a #GP instead of synthesizing triple fault to try to avoid killing
the guest if emulation of an SEV guest fails due to encountering the SMAP
erratum.  The injected #GP may still be fatal to the guest, e.g. if the
userspace process is providing critical functionality, but KVM should
make every attempt to keep the guest alive.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/svm/svm.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3efada37272c0..d6a4acaa65742 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4530,7 +4530,21 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int i
 	is_user = svm_get_cpl(vcpu) == 3;
 	if (smap && (!smep || is_user)) {
 		pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
-		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+		/*
+		 * If the fault occurred in userspace, arbitrarily inject #GP
+		 * to avoid killing the guest and to hopefully avoid confusing
+		 * the guest kernel too much, e.g. injecting #PF would not be
+		 * coherent with respect to the guest's page tables.  Request
+		 * triple fault if the fault occurred in the kernel as there's
+		 * no fault that KVM can inject without confusing the guest.
+		 * In practice, the triple fault is moot as no sane SEV kernel
+		 * will execute from user memory while also running with SMAP=1.
+		 */
+		if (is_user)
+			kvm_inject_gp(vcpu, 0);
+		else
+			kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 	}
 
 	return false;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
                   ` (3 preceding siblings ...)
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:33   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sasha Levin
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sean Christopherson, Liam Merwick, Paolo Bonzini, Sasha Levin,
	tglx, mingo, bp, dave.hansen, x86, kvm

From: Sean Christopherson <seanjc@google.com>

[ Upstream commit c532f2903b69b775d27016511fbe29a14a098f95 ]

Add a sanity check on DECODEASSIST being support if SEV is supported, as
KVM cannot read guest private memory and thus relies on the CPU to
provide the instruction byte stream on #NPF for emulation.  The intent of
the check is to document the dependency, it should never fail in practice
as producing hardware that supports SEV but not DECODEASSISTS would be
non-sensical.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/svm/sev.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index be28831412209..932afd713a02c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2099,8 +2099,13 @@ void __init sev_hardware_setup(void)
 	if (!sev_enabled || !npt_enabled)
 		goto out;
 
-	/* Does the CPU support SEV? */
-	if (!boot_cpu_has(X86_FEATURE_SEV))
+	/*
+	 * SEV must obviously be supported in hardware.  Sanity check that the
+	 * CPU supports decode assists, which is mandatory for SEV guests to
+	 * support instruction emulation.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_SEV) ||
+	    WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_DECODEASSISTS)))
 		goto out;
 
 	/* Retrieve SEV CPUID information */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
                   ` (4 preceding siblings ...)
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:42   ` Paolo Bonzini
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID Sasha Levin
  2022-02-10 16:41 ` [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Paolo Bonzini
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sean Christopherson, David Woodhouse, Alexander Graf,
	Paolo Bonzini, Sasha Levin, tglx, mingo, bp, dave.hansen, x86,
	kvm

From: Sean Christopherson <seanjc@google.com>

[ Upstream commit b9bed78e2fa9571b7c983b20666efa0009030c71 ]

Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
MOVSS blocking is active.  Setting the flag is necessary to make VM-Entry
consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
set and STI/MOVSS blocking is true, then the previous instruction must
have been STI or MOV/POP, and therefore a single-step #DB must be pending
since the RFLAGS.TF cannot have been set by the previous instruction,
i.e. the one instruction delay after setting RFLAGS.TF must have already
expired.

Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
when recording guest state as part of a VM-Exit, but #DB VM-Exits
intentionally do not treat the #DB as "guest state" as interception of
the #DB effectively makes the #DB host-owned, thus KVM needs to manually
set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.

Note, although this bug can be triggered by guest userspace, doing so
requires IOPL=3, and guest userspace running with IOPL=3 has full access
to all I/O ports (from the guest's perspective) and can crash/reboot the
guest any number of ways.  IOPL=3 is required because STI blocking kicks
in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.

MOVSS blocking can be initiated by userspace, but can be coincident with
a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
is problematic only for CPL0 (and only if the guest is crazy enough to
access a DR in a MOVSS shadow).  All other sources of #DBs are either
suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
are mutually exclusive with MOVSS blocking (T-bit task switch), or are
already handled by KVM (ICEBP, a.k.a. INT1).

This bug was originally found by running tests[1] created for XSA-308[2].
Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
presumably why the Xen bug was deemed to be an exploitable DOS from guest
userspace.  KVM already handles ICEBP by skipping the ICEBP instruction
and thus clears MOVSS blocking as a side effect of its "emulation".

[1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
[2] https://xenbits.xen.org/xsa/advisory-308.html

Reported-by: David Woodhouse <dwmw2@infradead.org>
Reported-by: Alexander Graf <graf@amazon.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220120000624.655815-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7f4e6f625abcf..fe4a36c984460 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4811,8 +4811,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		dr6 = vmx_get_exit_qual(vcpu);
 		if (!(vcpu->guest_debug &
 		      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
+			/*
+			 * If the #DB was due to ICEBP, a.k.a. INT1, skip the
+			 * instruction.  ICEBP generates a trap-like #DB, but
+			 * despite its interception control being tied to #DB,
+			 * is an instruction intercept, i.e. the VM-Exit occurs
+			 * on the ICEBP itself.  Note, skipping ICEBP also
+			 * clears STI and MOVSS blocking.
+			 *
+			 * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
+			 * if single-step is enabled in RFLAGS and STI or MOVSS
+			 * blocking is active, as the CPU doesn't set the bit
+			 * on VM-Exit due to #DB interception.  VM-Entry has a
+			 * consistency check that a single-step #DB is pending
+			 * in this scenario as the previous instruction cannot
+			 * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV
+			 * don't modify RFLAGS), therefore the one instruction
+			 * delay when activating single-step breakpoints must
+			 * have already expired.  Note, the CPU sets/clears BS
+			 * as appropriate for all other VM-Exits types.
+			 */
 			if (is_icebp(intr_info))
 				WARN_ON(!skip_emulated_instruction(vcpu));
+			else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
+				 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+				  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
+				vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+					    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
 
 			kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
 			return 1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
                   ` (5 preceding siblings ...)
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sasha Levin
@ 2022-02-09 18:56 ` Sasha Levin
  2022-02-10 16:42   ` Paolo Bonzini
  2022-02-10 16:41 ` [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Paolo Bonzini
  7 siblings, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2022-02-09 18:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jim Mattson, Aaron Lewis, Paolo Bonzini, Sasha Levin, tglx,
	mingo, bp, dave.hansen, x86, kvm

From: Jim Mattson <jmattson@google.com>

[ Upstream commit e3bcfda012edd3564e12551b212afbd2521a1f68 ]

CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and
CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature"
bits. Unlike most of the other CPUID feature bits, these bits are
clear if the features are present and set if the features are not
present. These bits should be reported in KVM_GET_SUPPORTED_CPUID,
because if these bits are set on hardware, they cannot be cleared in
the guest CPUID. Doing so would claim guest support for a feature that
the hardware doesn't support and that can't be efficiently emulated.

Of course, any software (e.g WIN87EM.DLL) expecting these features to
be present likely predates these CPUID feature bits and therefore
doesn't know to check for them anyway.

Aaron Lewis added the corresponding X86_FEATURE macros in
commit cbb99c0f5887 ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and
ZERO_FCS_FDS"), with the intention of reporting these bits in
KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on
the kvm list.

Opportunistically reordered the CPUID_7_0_EBX capability bits from
least to most significant.

Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220204001348.2844660-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/cpuid.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index add8f58d686e3..bf18679757c70 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -532,12 +532,13 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_mask(CPUID_7_0_EBX,
-		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-		F(BMI2) | F(ERMS) | F(INVPCID) | F(RTM) | 0 /*MPX*/ | F(RDSEED) |
-		F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
-		F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
-		F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | 0 /*INTEL_PT*/
-	);
+		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
+		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
+		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
+		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
+		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
+		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
+		F(AVX512VL));
 
 	kvm_cpu_cap_mask(CPUID_7_ECX,
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 Sasha Levin
@ 2022-02-10 16:33   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:33 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Sean Christopherson, tglx, mingo, bp, dave.hansen, x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> [ Upstream commit d6e656cd266cdcc95abd372c7faef05bee271d1a ]
> 
> WARN if KVM attempts to allocate a shadow VMCS for vmcs02.  KVM emulates
> VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
> a "real" shadow VMCS for L2.
> 
> The previous code WARNed but continued anyway with the allocation,
> presumably in an attempt to avoid NULL pointer dereference.
> However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
> indeed the sole caller does:
> 
> 	if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
> 		goto out_shadow_vmcs;
> 
> which makes it not a useful attempt.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-Id: <20220125220527.2093146-1-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/vmx/nested.c | 22 ++++++++++++----------
>   1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index c605c2c01394b..9cd68e1fcf602 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4827,18 +4827,20 @@ static struct vmcs *alloc_shadow_vmcs(struct kvm_vcpu *vcpu)
>   	struct loaded_vmcs *loaded_vmcs = vmx->loaded_vmcs;
>   
>   	/*
> -	 * We should allocate a shadow vmcs for vmcs01 only when L1
> -	 * executes VMXON and free it when L1 executes VMXOFF.
> -	 * As it is invalid to execute VMXON twice, we shouldn't reach
> -	 * here when vmcs01 already have an allocated shadow vmcs.
> +	 * KVM allocates a shadow VMCS only when L1 executes VMXON and frees it
> +	 * when L1 executes VMXOFF or the vCPU is forced out of nested
> +	 * operation.  VMXON faults if the CPU is already post-VMXON, so it
> +	 * should be impossible to already have an allocated shadow VMCS.  KVM
> +	 * doesn't support virtualization of VMCS shadowing, so vmcs01 should
> +	 * always be the loaded VMCS.
>   	 */
> -	WARN_ON(loaded_vmcs == &vmx->vmcs01 && loaded_vmcs->shadow_vmcs);
> +	if (WARN_ON(loaded_vmcs != &vmx->vmcs01 || loaded_vmcs->shadow_vmcs))
> +		return loaded_vmcs->shadow_vmcs;
> +
> +	loaded_vmcs->shadow_vmcs = alloc_vmcs(true);
> +	if (loaded_vmcs->shadow_vmcs)
> +		vmcs_clear(loaded_vmcs->shadow_vmcs);
>   
> -	if (!loaded_vmcs->shadow_vmcs) {
> -		loaded_vmcs->shadow_vmcs = alloc_vmcs(true);
> -		if (loaded_vmcs->shadow_vmcs)
> -			vmcs_clear(loaded_vmcs->shadow_vmcs);
> -	}
>   	return loaded_vmcs->shadow_vmcs;
>   }
>   

NACK, it's just extra care but not particularly useful.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode Sasha Levin
@ 2022-02-10 16:33   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:33 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Sean Christopherson, Liam Merwick, tglx, mingo, bp, dave.hansen,
	x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> [ Upstream commit cdf85e0c5dc766fc7fc779466280e454a6d04f87 ]
> 
> Inject a #GP instead of synthesizing triple fault to try to avoid killing
> the guest if emulation of an SEV guest fails due to encountering the SMAP
> erratum.  The injected #GP may still be fatal to the guest, e.g. if the
> userspace process is providing critical functionality, but KVM should
> make every attempt to keep the guest alive.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
> Message-Id: <20220120010719.711476-10-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/svm/svm.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3efada37272c0..d6a4acaa65742 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4530,7 +4530,21 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int i
>   	is_user = svm_get_cpl(vcpu) == 3;
>   	if (smap && (!smep || is_user)) {
>   		pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
> -		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +
> +		/*
> +		 * If the fault occurred in userspace, arbitrarily inject #GP
> +		 * to avoid killing the guest and to hopefully avoid confusing
> +		 * the guest kernel too much, e.g. injecting #PF would not be
> +		 * coherent with respect to the guest's page tables.  Request
> +		 * triple fault if the fault occurred in the kernel as there's
> +		 * no fault that KVM can inject without confusing the guest.
> +		 * In practice, the triple fault is moot as no sane SEV kernel
> +		 * will execute from user memory while also running with SMAP=1.
> +		 */
> +		if (is_user)
> +			kvm_inject_gp(vcpu, 0);
> +		else
> +			kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>   	}
>   
>   	return false;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support Sasha Levin
@ 2022-02-10 16:33   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:33 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Sean Christopherson, Liam Merwick, tglx, mingo, bp, dave.hansen,
	x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> [ Upstream commit c532f2903b69b775d27016511fbe29a14a098f95 ]
> 
> Add a sanity check on DECODEASSIST being support if SEV is supported, as
> KVM cannot read guest private memory and thus relies on the CPU to
> provide the instruction byte stream on #NPF for emulation.  The intent of
> the check is to document the dependency, it should never fail in practice
> as producing hardware that supports SEV but not DECODEASSISTS would be
> non-sensical.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
> Message-Id: <20220120010719.711476-5-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/svm/sev.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index be28831412209..932afd713a02c 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2099,8 +2099,13 @@ void __init sev_hardware_setup(void)
>   	if (!sev_enabled || !npt_enabled)
>   		goto out;
>   
> -	/* Does the CPU support SEV? */
> -	if (!boot_cpu_has(X86_FEATURE_SEV))
> +	/*
> +	 * SEV must obviously be supported in hardware.  Sanity check that the
> +	 * CPU supports decode assists, which is mandatory for SEV guests to
> +	 * support instruction emulation.
> +	 */
> +	if (!boot_cpu_has(X86_FEATURE_SEV) ||
> +	    WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_DECODEASSISTS)))
>   		goto out;
>   
>   	/* Retrieve SEV CPUID information */

NACK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning
  2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
                   ` (6 preceding siblings ...)
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID Sasha Levin
@ 2022-02-10 16:41 ` Paolo Bonzini
  2022-02-12 22:20   ` Sasha Levin
  7 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:41 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable; +Cc: Hou Wenlong, Sean Christopherson, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Hou Wenlong <houwenlong93@linux.alibaba.com>
> 
> [ Upstream commit 6a0c61703e3a5d67845a4b275e1d9d7bc1b5aad7 ]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Fix the following false positive warning:
>   =============================
>   WARNING: suspicious RCU usage
>   5.16.0-rc4+ #57 Not tainted
>   -----------------------------
>   arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!
> 
>   other info that might help us debug this:
> 
>   rcu_scheduler_active = 2, debug_locks = 1
>   3 locks held by fc_vcpu 0/330:
>    #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
>    #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
>    #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]
> 
>   stack backtrace:
>   CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x44/0x57
>    kvm_notify_acked_gsi+0x6b/0x70 [kvm]
>    kvm_notify_acked_irq+0x8d/0x180 [kvm]
>    kvm_ioapic_update_eoi+0x92/0x240 [kvm]
>    kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
>    handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
>    vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
>    vcpu_enter_guest+0x66e/0x1860 [kvm]
>    kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
>    kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
>    __x64_sys_ioctl+0x89/0xc0
>    do_syscall_64+0x3a/0x90
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
> kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
> kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
> a false positive warning. So use hlist_for_each_entry_srcu() instead of
> hlist_for_each_entry_rcu().
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   virt/kvm/eventfd.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2ad013b8bde96..59b1dd4a549ee 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -463,8 +463,8 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>   	idx = srcu_read_lock(&kvm->irq_srcu);
>   	gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
>   	if (gsi != -1)
> -		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
> -					 link)
> +		hlist_for_each_entry_srcu(kian, &kvm->irq_ack_notifier_list,
> +					  link, srcu_read_lock_held(&kvm->irq_srcu))
>   			if (kian->gsi == gsi) {
>   				srcu_read_unlock(&kvm->irq_srcu, idx);
>   				return true;
> @@ -480,8 +480,8 @@ void kvm_notify_acked_gsi(struct kvm *kvm, int gsi)
>   {
>   	struct kvm_irq_ack_notifier *kian;
>   
> -	hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
> -				 link)
> +	hlist_for_each_entry_srcu(kian, &kvm->irq_ack_notifier_list,
> +				  link, srcu_read_lock_held(&kvm->irq_srcu))
>   		if (kian->gsi == gsi)
>   			kian->irq_acked(kian);
>   }


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID Sasha Levin
@ 2022-02-10 16:42   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:42 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Jim Mattson, Aaron Lewis, tglx, mingo, bp, dave.hansen, x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Jim Mattson <jmattson@google.com>
> 
> [ Upstream commit e3bcfda012edd3564e12551b212afbd2521a1f68 ]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and
> CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature"
> bits. Unlike most of the other CPUID feature bits, these bits are
> clear if the features are present and set if the features are not
> present. These bits should be reported in KVM_GET_SUPPORTED_CPUID,
> because if these bits are set on hardware, they cannot be cleared in
> the guest CPUID. Doing so would claim guest support for a feature that
> the hardware doesn't support and that can't be efficiently emulated.
> 
> Of course, any software (e.g WIN87EM.DLL) expecting these features to
> be present likely predates these CPUID feature bits and therefore
> doesn't know to check for them anyway.
> 
> Aaron Lewis added the corresponding X86_FEATURE macros in
> commit cbb99c0f5887 ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and
> ZERO_FCS_FDS"), with the intention of reporting these bits in
> KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on
> the kvm list.
> 
> Opportunistically reordered the CPUID_7_0_EBX capability bits from
> least to most significant.
> 
> Cc: Aaron Lewis <aaronlewis@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Message-Id: <20220204001348.2844660-1-jmattson@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/cpuid.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index add8f58d686e3..bf18679757c70 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -532,12 +532,13 @@ void kvm_set_cpu_caps(void)
>   	);
>   
>   	kvm_cpu_cap_mask(CPUID_7_0_EBX,
> -		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -		F(BMI2) | F(ERMS) | F(INVPCID) | F(RTM) | 0 /*MPX*/ | F(RDSEED) |
> -		F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
> -		F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
> -		F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | 0 /*INTEL_PT*/
> -	);
> +		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
> +		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
> +		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
> +		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
> +		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
> +		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
> +		F(AVX512VL));
>   
>   	kvm_cpu_cap_mask(CPUID_7_ECX,
>   		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Sasha Levin
@ 2022-02-10 16:42   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:42 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Vitaly Kuznetsov, tglx, mingo, bp, dave.hansen, x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> [ Upstream commit 7a601e2cf61558dfd534a9ecaad09f5853ad8204 ]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
> PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
> sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.
> 
> Note, none of the currently existing Windows/Hyper-V versions are known
> to enable 'save VMX-preemption timer value' when eVMCS is in use, the
> change is aimed at making the filtering future proof.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/vmx/evmcs.h | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 6255fa7167720..8d70f9aea94bc 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -59,7 +59,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
>   	 SECONDARY_EXEC_SHADOW_VMCS |					\
>   	 SECONDARY_EXEC_TSC_SCALING |					\
>   	 SECONDARY_EXEC_PAUSE_LOOP_EXITING)
> -#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
> +#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL					\
> +	(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |				\
> +	 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
>   #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
>   #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
>   


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Sasha Levin
@ 2022-02-10 16:42   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:42 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Vitaly Kuznetsov, tglx, mingo, bp, dave.hansen, x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> [ Upstream commit f80ae0ef089a09e8c18da43a382c3caac9a424a7 ]
> 
> Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
> MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
> MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
> MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
> on 'true' MSR data.
> 
> Note, none of the currently existing Windows/Hyper-V versions are known
> to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
> is aimed at making the filtering future proof.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/vmx/evmcs.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 09fac0ddac8bd..87e3dc10edf40 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -361,6 +361,7 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>   	case MSR_IA32_VMX_PROCBASED_CTLS2:
>   		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
>   		break;
> +	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
>   	case MSR_IA32_VMX_PINBASED_CTLS:
>   		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
>   		break;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
  2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sasha Levin
@ 2022-02-10 16:42   ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2022-02-10 16:42 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Sean Christopherson, David Woodhouse, Alexander Graf, tglx,
	mingo, bp, dave.hansen, x86, kvm

On 2/9/22 19:56, Sasha Levin wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> [ Upstream commit b9bed78e2fa9571b7c983b20666efa0009030c71 ]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> 
> Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
> breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
> MOVSS blocking is active.  Setting the flag is necessary to make VM-Entry
> consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
> set and STI/MOVSS blocking is true, then the previous instruction must
> have been STI or MOV/POP, and therefore a single-step #DB must be pending
> since the RFLAGS.TF cannot have been set by the previous instruction,
> i.e. the one instruction delay after setting RFLAGS.TF must have already
> expired.
> 
> Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
> when recording guest state as part of a VM-Exit, but #DB VM-Exits
> intentionally do not treat the #DB as "guest state" as interception of
> the #DB effectively makes the #DB host-owned, thus KVM needs to manually
> set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
> 
> Note, although this bug can be triggered by guest userspace, doing so
> requires IOPL=3, and guest userspace running with IOPL=3 has full access
> to all I/O ports (from the guest's perspective) and can crash/reboot the
> guest any number of ways.  IOPL=3 is required because STI blocking kicks
> in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
> takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
> 
> MOVSS blocking can be initiated by userspace, but can be coincident with
> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
> executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
> is problematic only for CPL0 (and only if the guest is crazy enough to
> access a DR in a MOVSS shadow).  All other sources of #DBs are either
> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
> are mutually exclusive with MOVSS blocking (T-bit task switch), or are
> already handled by KVM (ICEBP, a.k.a. INT1).
> 
> This bug was originally found by running tests[1] created for XSA-308[2].
> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
> presumably why the Xen bug was deemed to be an exploitable DOS from guest
> userspace.  KVM already handles ICEBP by skipping the ICEBP instruction
> and thus clears MOVSS blocking as a side effect of its "emulation".
> 
> [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
> [2] https://xenbits.xen.org/xsa/advisory-308.html
> 
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Reported-by: Alexander Graf <graf@amazon.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-Id: <20220120000624.655815-1-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7f4e6f625abcf..fe4a36c984460 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4811,8 +4811,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>   		dr6 = vmx_get_exit_qual(vcpu);
>   		if (!(vcpu->guest_debug &
>   		      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
> +			/*
> +			 * If the #DB was due to ICEBP, a.k.a. INT1, skip the
> +			 * instruction.  ICEBP generates a trap-like #DB, but
> +			 * despite its interception control being tied to #DB,
> +			 * is an instruction intercept, i.e. the VM-Exit occurs
> +			 * on the ICEBP itself.  Note, skipping ICEBP also
> +			 * clears STI and MOVSS blocking.
> +			 *
> +			 * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
> +			 * if single-step is enabled in RFLAGS and STI or MOVSS
> +			 * blocking is active, as the CPU doesn't set the bit
> +			 * on VM-Exit due to #DB interception.  VM-Entry has a
> +			 * consistency check that a single-step #DB is pending
> +			 * in this scenario as the previous instruction cannot
> +			 * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV
> +			 * don't modify RFLAGS), therefore the one instruction
> +			 * delay when activating single-step breakpoints must
> +			 * have already expired.  Note, the CPU sets/clears BS
> +			 * as appropriate for all other VM-Exits types.
> +			 */
>   			if (is_icebp(intr_info))
>   				WARN_ON(!skip_emulated_instruction(vcpu));
> +			else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> +				 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> +				  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
> +				vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> +					    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
>   
>   			kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
>   			return 1;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning
  2022-02-10 16:41 ` [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Paolo Bonzini
@ 2022-02-12 22:20   ` Sasha Levin
  0 siblings, 0 replies; 17+ messages in thread
From: Sasha Levin @ 2022-02-12 22:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, stable, Hou Wenlong, Sean Christopherson, kvm

On Thu, Feb 10, 2022 at 05:41:03PM +0100, Paolo Bonzini wrote:
>On 2/9/22 19:56, Sasha Levin wrote:
>>From: Hou Wenlong <houwenlong93@linux.alibaba.com>
>>
>>[ Upstream commit 6a0c61703e3a5d67845a4b275e1d9d7bc1b5aad7 ]
>
>Acked-by: Paolo Bonzini <pbonzini@redhat.com>

I've queued up all the ones you've acked, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-02-12 22:20 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 18:56 [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Sasha Levin
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 2/8] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Sasha Levin
2022-02-10 16:42   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 3/8] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Sasha Levin
2022-02-10 16:42   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 4/8] KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 Sasha Levin
2022-02-10 16:33   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 5/8] KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode Sasha Levin
2022-02-10 16:33   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 6/8] KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support Sasha Levin
2022-02-10 16:33   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 7/8] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sasha Levin
2022-02-10 16:42   ` Paolo Bonzini
2022-02-09 18:56 ` [PATCH MANUALSEL 5.16 8/8] KVM: x86: Report deprecated x87 features in supported CPUID Sasha Levin
2022-02-10 16:42   ` Paolo Bonzini
2022-02-10 16:41 ` [PATCH MANUALSEL 5.16 1/8] KVM: eventfd: Fix false positive RCU usage warning Paolo Bonzini
2022-02-12 22:20   ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).