All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1
@ 2020-02-20 17:22 Vitaly Kuznetsov
  2020-02-20 17:22 ` [PATCH RFC 1/2] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled Vitaly Kuznetsov
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-20 17:22 UTC (permalink / raw)
  To: Jim Mattson, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, linux-kernel

It was found that fine-grained VMX feature enablement in QEMU is broken
when combined with SynIC:

    qemu-system-x86_64 -machine q35,accel=kvm -cpu host,hv_vpindex,hv_synic -smp 2 -m 16384 -vnc :0
    qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016
    qemu-system-x86_64: <...>: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
    Aborted

QEMU thread: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04838.html

Turns out, this is a KVM issue: when SynIC is enabled, PIN_BASED_POSTED_INTR
gets filtered out from VMX MSRs for all newly created (but not existent!)
vCPUS. Patch1 addresses this. Also, apicv disablement for L1 doesn't seem
to disable it for L2 (at least on CPU0) so unless there's a good reason
to not allow this we need to make it work. PATCH2, suggested by Paolo,
is supposed to do the job.

RFC: I looked at the code and ran some tests and nothing suspicious popped
out, however, I'm still not convinced this is a good idea to have apicv
enabled for L2 when it's disabled for L1... Also, we may prefer to merge
or re-order these two patches.

Vitaly Kuznetsov (2):
  KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only
    when apicv is globally disabled
  KVM: nVMX: handle nested posted interrupts when apicv is disabled for
    L1

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            |  5 +----
 arch/x86/kvm/svm.c              |  7 ++++++-
 arch/x86/kvm/vmx/capabilities.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  5 ++---
 arch/x86/kvm/vmx/nested.h       |  3 +--
 arch/x86/kvm/vmx/vmx.c          | 23 +++++++++++++----------
 7 files changed, 25 insertions(+), 21 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH RFC 1/2] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled
  2020-02-20 17:22 [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1 Vitaly Kuznetsov
@ 2020-02-20 17:22 ` Vitaly Kuznetsov
  2020-02-20 17:22 ` [PATCH RFC 2/2] KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-20 17:22 UTC (permalink / raw)
  To: Jim Mattson, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, linux-kernel

When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:

$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016

This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.

L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/capabilities.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  5 ++---
 arch/x86/kvm/vmx/nested.h       |  3 +--
 arch/x86/kvm/vmx/vmx.c          | 10 ++++------
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 283bdb7071af..f486e2606247 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
 extern bool __read_mostly enable_unrestricted_guest;
 extern bool __read_mostly enable_ept_ad_bits;
 extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_apicv;
 extern int __read_mostly pt_mode;
 
 #define PT_MODE_SYSTEM		0
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3589cd3c0fcc..d0462a035505 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5949,8 +5949,7 @@ void nested_vmx_set_vmcs_shadowing_bitmap(void)
  * bit in the high half is on if the corresponding bit in the control field
  * may be on. See also vmx_control_verify().
  */
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
-				bool apicv)
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 {
 	/*
 	 * Note that as a general rule, the high half of the MSRs (bits in
@@ -5977,7 +5976,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
 		PIN_BASED_EXT_INTR_MASK |
 		PIN_BASED_NMI_EXITING |
 		PIN_BASED_VIRTUAL_NMIS |
-		(apicv ? PIN_BASED_POSTED_INTR : 0);
+		(enable_apicv ? PIN_BASED_POSTED_INTR : 0);
 	msrs->pinbased_ctls_high |=
 		PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
 		PIN_BASED_VMX_PREEMPTION_TIMER;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..1c5fbff45d69 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,8 +17,7 @@ enum nvmx_vmentry_status {
 };
 
 void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
-				bool apicv);
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
 void nested_vmx_hardware_unsetup(void);
 __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
 void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3be25ecae145..946c74e661e2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -95,7 +95,7 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
-static bool __read_mostly enable_apicv = 1;
+bool __read_mostly enable_apicv = 1;
 module_param(enable_apicv, bool, S_IRUGO);
 
 /*
@@ -6757,8 +6757,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 
 	if (nested)
 		nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
-					   vmx_capability.ept,
-					   kvm_vcpu_apicv_active(vcpu));
+					   vmx_capability.ept);
 	else
 		memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
 
@@ -6839,8 +6838,7 @@ static int __init vmx_check_processor_compat(void)
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
 		return -EIO;
 	if (nested)
-		nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept,
-					   enable_apicv);
+		nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
 	if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
 		printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
 				smp_processor_id());
@@ -7702,7 +7700,7 @@ static __init int hardware_setup(void)
 
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
-					   vmx_capability.ept, enable_apicv);
+					   vmx_capability.ept);
 
 		r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
 		if (r)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH RFC 2/2] KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
  2020-02-20 17:22 [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1 Vitaly Kuznetsov
  2020-02-20 17:22 ` [PATCH RFC 1/2] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled Vitaly Kuznetsov
@ 2020-02-20 17:22 ` Vitaly Kuznetsov
  2020-02-20 17:33 ` [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement " Sean Christopherson
  2020-02-21 16:50 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-20 17:22 UTC (permalink / raw)
  To: Jim Mattson, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, linux-kernel

Even when APICv is disabled for L1 it can (and, actually, is) still
available for L2, this means we need to always call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            |  5 +----
 arch/x86/kvm/svm.c              |  7 ++++++-
 arch/x86/kvm/vmx/vmx.c          | 13 +++++++++----
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 40a0c0fd95ca..a84e8c5acda8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1146,7 +1146,7 @@ struct kvm_x86_ops {
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
-	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+	int (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afcd30d44cbb..cc8ee8125712 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1046,11 +1046,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 						       apic->regs + APIC_TMR);
 		}
 
-		if (vcpu->arch.apicv_active)
-			kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
-		else {
+		if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) {
 			kvm_lapic_set_irr(vector, apic);
-
 			kvm_make_request(KVM_REQ_EVENT, vcpu);
 			kvm_vcpu_kick(vcpu);
 		}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bef0ba35f121..970a3ab2e926 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5255,8 +5255,11 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	return;
 }
 
-static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+static int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
+	if (!vcpu->arch.apicv_active)
+		return -1;
+
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
 	smp_mb__after_atomic();
 
@@ -5268,6 +5271,8 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		put_cpu();
 	} else
 		kvm_vcpu_wake_up(vcpu);
+
+	return 0;
 }
 
 static bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 946c74e661e2..92a14ea4be69 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3818,24 +3818,29 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
  * 2. If target vcpu isn't running(root mode), kick it to pick up the
  * interrupt from PIR in next vmentry.
  */
-static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
+static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int r;
 
 	r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
 	if (!r)
-		return;
+		return 0;
+
+	if (!vcpu->arch.apicv_active)
+		return -1;
 
 	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
-		return;
+		return 0;
 
 	/* If a previous notification has sent the IPI, nothing to do.  */
 	if (pi_test_and_set_on(&vmx->pi_desc))
-		return;
+		return 0;
 
 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
 		kvm_vcpu_kick(vcpu);
+
+	return 0;
 }
 
 /*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1
  2020-02-20 17:22 [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1 Vitaly Kuznetsov
  2020-02-20 17:22 ` [PATCH RFC 1/2] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled Vitaly Kuznetsov
  2020-02-20 17:22 ` [PATCH RFC 2/2] KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 Vitaly Kuznetsov
@ 2020-02-20 17:33 ` Sean Christopherson
  2020-02-21 16:50 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Sean Christopherson @ 2020-02-20 17:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Jim Mattson, kvm, Paolo Bonzini, Wanpeng Li, linux-kernel

On Thu, Feb 20, 2020 at 06:22:03PM +0100, Vitaly Kuznetsov wrote:
> It was found that fine-grained VMX feature enablement in QEMU is broken
> when combined with SynIC:
> 
>     qemu-system-x86_64 -machine q35,accel=kvm -cpu host,hv_vpindex,hv_synic -smp 2 -m 16384 -vnc :0
>     qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016
>     qemu-system-x86_64: <...>: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
>     Aborted
> 
> QEMU thread: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04838.html
> 
> Turns out, this is a KVM issue: when SynIC is enabled, PIN_BASED_POSTED_INTR
> gets filtered out from VMX MSRs for all newly created (but not existent!)
> vCPUS. Patch1 addresses this. Also, apicv disablement for L1 doesn't seem
> to disable it for L2 (at least on CPU0) so unless there's a good reason
> to not allow this we need to make it work. PATCH2, suggested by Paolo,
> is supposed to do the job.
> 
> RFC: I looked at the code and ran some tests and nothing suspicious popped
> out, however, I'm still not convinced this is a good idea to have apicv
> enabled for L2 when it's disabled for L1...

Eh, if it works...  IMO, enabling apicv for both L1 and L2 is far more
mindbending.

> Also, we may prefer to merge or re-order these two patches.

It does seem like patch 2 should be applied first.  

> Vitaly Kuznetsov (2):
>   KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only
>     when apicv is globally disabled
>   KVM: nVMX: handle nested posted interrupts when apicv is disabled for
>     L1
> 
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/lapic.c            |  5 +----
>  arch/x86/kvm/svm.c              |  7 ++++++-
>  arch/x86/kvm/vmx/capabilities.h |  1 +
>  arch/x86/kvm/vmx/nested.c       |  5 ++---
>  arch/x86/kvm/vmx/nested.h       |  3 +--
>  arch/x86/kvm/vmx/vmx.c          | 23 +++++++++++++----------
>  7 files changed, 25 insertions(+), 21 deletions(-)
> 
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1
  2020-02-20 17:22 [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1 Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2020-02-20 17:33 ` [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement " Sean Christopherson
@ 2020-02-21 16:50 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2020-02-21 16:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Jim Mattson, kvm
  Cc: Sean Christopherson, Wanpeng Li, linux-kernel

On 20/02/20 18:22, Vitaly Kuznetsov wrote:
> 
> RFC: I looked at the code and ran some tests and nothing suspicious popped
> out, however, I'm still not convinced this is a good idea to have apicv
> enabled for L2 when it's disabled for L1... Also, we may prefer to merge
> or re-order these two patches.

I swapped the patches and queued them.  The basic observation is that
APICv is only about virtualizing the APIC, without any interaction with
the hypervisor's APIC apart from the IPI path.  So if L1 turns it on it
wants L1 and L2's APICs to be completely independent, and SynIC is
completely irrelevant.  All that matters is again whether the IPI path
works, and that is what patch 2 fixes.

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-21 16:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-20 17:22 [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement for L1 Vitaly Kuznetsov
2020-02-20 17:22 ` [PATCH RFC 1/2] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled Vitaly Kuznetsov
2020-02-20 17:22 ` [PATCH RFC 2/2] KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 Vitaly Kuznetsov
2020-02-20 17:33 ` [PATCH RFC 0/2] KVM: nVMX: fix apicv disablement " Sean Christopherson
2020-02-21 16:50 ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.