[PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts
@ 2017-12-05  8:16 Liran Alon
  2017-12-05  8:16 ` [PATCH v2 1/5] KVM: nVMX: Remove pi_pending as signal to process nested posted-interrupts Liran Alon
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: jmattson, wanpeng.li, idan.brown

Hi,

This series aims to fix multiple issues with nested-posted-interrupts.

The first patch removes a per vCPU flag called pi_pending which
is used to signal KVM that it should emulate nested-posted-interrupt
dispatching on next resume of L2. However, this flag is unnecessary as
it has the exact same meaning as vmx->nested.pi_desc->control ON bit.

The second patch fixes an issue of not re-evaluating what should be done
with a new L1 pending interrupt that was discovered by syncing PIR to
IRR just before resuming L2 guest. For example, this pending L1 event
should in most cases result in exiting from L2 to L1 on
external-interrupt. But currently, we will just continue resuming L2
which is wrong.

The third patch clean-up & fix handling of directly injecting a L1
interrupt to L2 when L1 don't intercept external-interrupts. The
current handling of this case doesn't correctly consider the LAPIC TPR
and don't update it's IRR & ISR after injecting the interrupt to L2.
Fix this by using standard interrupt-injection code-path in this
scenario as-well.

The fourth patch fix multiple race-condition issues in sending &
dispatching nested-posted-interrupts. The patch fixes these issues by
checking if there is pending nested-posted-interrupts before each
vmentry and if yes, using self-IPI to make hardware dispatch them
instead of emulating behavior in software.

The fifth patch fixes a bug of not waking up a halted L2 when L1 sends
it a nested-posted-interrupt and L1 doesn't intercept HLT.

Regards,
-Liran

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] KVM: nVMX: Remove pi_pending as signal to process nested posted-interrupts
  2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
@ 2017-12-05  8:16 ` Liran Alon
  2017-12-05  8:16 ` [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt Liran Alon
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

When L1 wants to send a posted-interrupt to another L1 CPU which runs
L2, it does the following operations:
1. Sets the relevant bit in vmx->nested.pi_desc PIR.
2. Set ON bit in vmx->nested.pi_desc->control field.
3. Sends an IPI to dest L1 CPU by writing the the posted-interrupt
   notification-vector into the LAPIC ICR.

Step (3) will exit to L0 on APIC_WRITE which will eventually reach
vmx_deliver_nested_posted_interrupt(). If dest L0 CPU is in guest,
then a physical IPI of notification-vector will be sent which will
trigger evaluation of posted-interrupts and clear
vmx->nested.pi_desc->control ON bit.
Otherwise (or if dest L0 CPU exited from guest just before sending the
physical IPI), the nested-posted-interrupts will be evaluated on next
vmentry by vmx_complete_nested_posted_interrupt().

In order for vmx_complete_nested_posted_interrupt() to know if it
should do any work, the flag vmx->nested.pi_pending was used which
was set by vmx_deliver_nested_posted_interrupt().

However, this seems unnecessary because if posted-interrupts was not
processed yet, vmx->nested.pi_desc->control ON bit should still be
set. Therefore, it should suffice to use it in order to know if work
should be done.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 714a0673ec3c..f5074ec5701b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -453,7 +453,6 @@ struct nested_vmx {
 	struct page *virtual_apic_page;
 	struct page *pi_desc_page;
 	struct pi_desc *pi_desc;
-	bool pi_pending;
 	u16 posted_intr_nv;

 	unsigned long *msr_bitmap;
@@ -5050,10 +5049,9 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	void *vapic_page;
 	u16 status;

-	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
+	if (!vmx->nested.pi_desc)
 		return;

-	vmx->nested.pi_pending = false;
 	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
 		return;

@@ -5126,7 +5124,6 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 		 * If a posted intr is not recognized by hardware,
 		 * we will accomplish it in the next vmentry.
 		 */
-		vmx->nested.pi_pending = true;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		return 0;
 	}
@@ -10488,7 +10485,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	/* Posted interrupts setting is only taken from vmcs12.  */
 	if (nested_cpu_has_posted_intr(vmcs12)) {
 		vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
-		vmx->nested.pi_pending = false;
 		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_NESTED_VECTOR);
 	} else {
 		exec_control &= ~PIN_BASED_POSTED_INTR;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt
  2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
  2017-12-05  8:16 ` [PATCH v2 1/5] KVM: nVMX: Remove pi_pending as signal to process nested posted-interrupts Liran Alon
@ 2017-12-05  8:16 ` Liran Alon
  2017-12-06 18:52   ` Radim Krčmář
  2017-12-05  8:16 ` [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts Liran Alon
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

In case posted-interrupt was delivered to CPU while it is in host
(outside guest), then posted-interrupt delivery will be done by
calling sync_pir_to_irr() at vmentry after interrupts are disabled.

sync_pir_to_irr() will check if vmx->pi_desc.control ON bit and if
set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to
ensure virtual-interrupt-delivery will dispatch interrupt to guest.

However, it is possible that L1 will receive a posted-interrupt while
CPU runs at host and is about to enter L2. In this case, the call to
sync_pir_to_irr() will indeed update the L1's APIC IRR but
vcpu_enter_guest() will then just resume into L2 guest without
re-evaluating if it should exit from L2 to L1 as a result of this
new pending L1 event.

To address this case, if sync_pir_to_irr() has a new L1 injectable
interrupt and CPU is running L2, we set KVM_REQ_EVENT.
This will cause vcpu_enter_guest() to run another iteration of
evaluating pending KVM requests and will therefore consume
KVM_REQ_EVENT which will make sure to call check_nested_events() which
will handle the pending L1 event properly.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f5074ec5701b..47bbb8b691e8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int prev_max_irr;
 	int max_irr;

 	WARN_ON(!vcpu->arch.apicv_active);
+
+	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
 	if (pi_test_on(&vmx->pi_desc)) {
 		pi_clear_on(&vmx->pi_desc);
+
 		/*
 		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
 		 * But on x86 this is just a compiler barrier anyway.
 		 */
 		smp_mb__after_atomic();
 		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+
+		/*
+		 * If we are running L2 and L1 has a new pending interrupt
+		 * which can be injected, we should re-evaluate
+		 * what should be done with this new L1 interrupt.
+		 */
+		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
 	} else {
-		max_irr = kvm_lapic_find_highest_irr(vcpu);
+		max_irr = prev_max_irr;
 	}
+
 	vmx_hwapic_irr_update(vcpu, max_irr);
 	return max_irr;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts
  2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
  2017-12-05  8:16 ` [PATCH v2 1/5] KVM: nVMX: Remove pi_pending as signal to process nested posted-interrupts Liran Alon
  2017-12-05  8:16 ` [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt Liran Alon
@ 2017-12-05  8:16 ` Liran Alon
  2017-12-06 20:20   ` Radim Krčmář
  2017-12-05  8:16 ` [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
  2017-12-05  8:16 ` [PATCH v2 5/5] KVM: nVMX: Wake halted L2 on nested posted-interrupt Liran Alon
  4 siblings, 1 reply; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
which calls vmx_hwapic_irr_update() to update RVI.
Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
when CPU is running L2 and L1 don't intercept external-interrupts.
In that case, code injects interrupt directly into L2 instead of
updating RVI.

Besides being hacky (wouldn't expect function updating RVI to also
inject interrupt), it also doesn't handle this case correctly.
The code contains several issues:
1. When code calls kvm_queue_interrupt() it just passes it max_irr which
represents the highest IRR currently pending in L1 LAPIC.
This is problematic as interrupt was injected to guest but it's bit is
still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
max_irr priority. It just checks if interrupts are enabled in guest with
vmx_interrupt_allowed().

To fix the above issues:
1. Simplify vmx_hwapic_irr_update() to just update RVI.
Note that this shouldn't happen when CPU is running L2
(See comment in code).
2. Since now vmx_hwapic_irr_update() only does logic for L1
virtual-interrupt-delivery, inject_pending_event() should be the
one responsible for injecting the interrupt directly into L2.
Therefore, change kvm_cpu_has_injectable_intr() to check L1
LAPIC when CPU is running L2.

This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
L1 pending events when running L2 and L1 got posted-interrupt")
made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
injectable interrupt.

Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
injection")

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/irq.c |  2 +-
 arch/x86/kvm/vmx.c | 32 ++++++++++++--------------------
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 5c24811e8b0b..f171051eecf3 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -79,7 +79,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
 	if (kvm_cpu_has_extint(v))
 		return 1;
 
-	if (kvm_vcpu_apicv_active(v))
+	if (!is_guest_mode(v) && kvm_vcpu_apicv_active(v))
 		return 0;
 
 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 47bbb8b691e8..bbc023fb9ef1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9002,30 +9002,18 @@ static void vmx_set_rvi(int vector)
 
 static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 {
-	if (!is_guest_mode(vcpu)) {
-		vmx_set_rvi(max_irr);
-		return;
-	}
-
-	if (max_irr == -1)
-		return;
-
 	/*
-	 * In guest mode.  If a vmexit is needed, vmx_check_nested_events
-	 * handles it.
+	 * When running L2, updating RVI is only relevant when
+	 * vmcs12 virtual-interrupt-delivery enabled.
+	 * However, it can be enabled only when L1 also
+	 * intercepts external-interrupts and in that case
+	 * we should not update vmcs02 RVI but instead intercept
+	 * interrupt. Therefore, do nothing when running L2.
 	 */
-	if (nested_exit_on_intr(vcpu))
+	if ((max_irr == -1) || is_guest_mode(vcpu))
 		return;
 
-	/*
-	 * Else, fall back to pre-APICv interrupt injection since L2
-	 * is run without virtual interrupt delivery.
-	 */
-	if (!kvm_event_needs_reinjection(vcpu) &&
-	    vmx_interrupt_allowed(vcpu)) {
-		kvm_queue_interrupt(vcpu, max_irr, false);
-		vmx_inject_irq(vcpu);
-	}
+	vmx_set_rvi(max_irr);
 }
 
 static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
@@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 		 * If we are running L2 and L1 has a new pending interrupt
 		 * which can be injected, we should re-evaluate
 		 * what should be done with this new L1 interrupt.
+		 * If L1 intercepts external-interrupts, we should
+		 * exit from L2 to L1. Otherwise, interrupt should be
+		 * delivered directly to L2.
+		 * These cases are handled correctly by KVM_REQ_EVENT.
 		 */
 		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
 			kvm_make_request(KVM_REQ_EVENT, vcpu);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
                   ` (2 preceding siblings ...)
  2017-12-05  8:16 ` [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts Liran Alon
@ 2017-12-05  8:16 ` Liran Alon
  2017-12-05  8:36   ` Wincy Van
  2017-12-06 20:45   ` Radim Krčmář
  2017-12-05  8:16 ` [PATCH v2 5/5] KVM: nVMX: Wake halted L2 on nested posted-interrupt Liran Alon
  4 siblings, 2 replies; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

When L1 wants to send a posted-interrupt to another L1 CPU running L2,
it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in
vmx->nested.pi_desc->control. Then it attempts to send a
notification-vector IPI to dest L1 CPU.
This attempt to send IPI will exit to L0 which will reach
vmx_deliver_nested_posted_interrupt() which does the following:
1. If dest L0 CPU is currently running guest (vcpu->mode ==
IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector.
2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest
L0 CPU exits from guest to host and therefore doesn't recognize physical
IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next
vmentry which will call vmx_check_nested_events() which should call
(in theory) vmx_complete_nested_posted_interrupt(). That function
should see vmx->nested.pi_desc->control ON bit is set and therefore
"emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1
virtual-apic-page & update vmcs02 RVI).

The above logic regarding nested-posted-interrupts contains multiple
issues:

A) Race-condition:
On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE
but before sending physical IPI, the dest CPU will exit to host.
Therefore, physical IPI could be received at host which it's hanlder
does nothing. In addition, assume that dest CPU passes the checks
for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore,
dest CPU will resume L2 without evaluating nested-posted-interrupts.

B) vmx_complete_nested_posted_interrupt() is not always called
when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug
that vmx_check_nested_events() could exit from L2 to L1 before calling
vmx_complete_nested_posted_interrupt(). Therefore, on next resume of
L1 into L2, nested-posted-interrupts won't be evaluated even though L0
resume L2 (We may resume L2 without having KVM_REQ_EVENT set).

This commit removes nested-posted-interrupts processing from
check_nested_events() and instead makes sure to process
nested-posted-interrupts on vmentry after interrupts
disabled. Processing of nested-posted-interrupts is delegated
to hardware by issueing a self-IPI of relevant notification-vector
which will be delivered to CPU when CPU is in guest.

* Bug (A) is solved by the fact that processing of
nested-posted-interrupts is not dependent on KVM_REQ_EVENT and
happens before every vmentry to L2.
* Bug (B) is now trivially solved by processing
nested-posted-interrupts before each vmentry to L2 guest.

An alternative could have been to just call
vmx_complete_nested_posted_interrupt() at this call-site. However, we
have decided to go with this approach because:
1. It would require modifying vmx_complete_nested_posted_interrupt()
to be able to work with interrupts disabled (not-trivial).
2. We preffer to avoid software-emulation of hardware behavior if it
is possible.

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Co-authored-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c              | 57 ++++++++++-------------------------------
 arch/x86/kvm/x86.c              | 14 +++++++---
 3 files changed, 25 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1bfb99770c34..dc0affe69903 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -980,6 +980,7 @@ struct kvm_x86_ops {
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+	void (*complete_nested_posted_interrupt)(struct kvm_vcpu *vcpu);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bbc023fb9ef1..517822f94158 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -534,12 +534,6 @@ static bool pi_test_and_set_on(struct pi_desc *pi_desc)
 			(unsigned long *)&pi_desc->control);
 }
 
-static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
-{
-	return test_and_clear_bit(POSTED_INTR_ON,
-			(unsigned long *)&pi_desc->control);
-}
-
 static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
@@ -5041,37 +5035,6 @@ static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
 	}
 }
 
-
-static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	int max_irr;
-	void *vapic_page;
-	u16 status;
-
-	if (!vmx->nested.pi_desc)
-		return;
-
-	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
-		return;
-
-	max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
-	if (max_irr != 256) {
-		vapic_page = kmap(vmx->nested.virtual_apic_page);
-		__kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page);
-		kunmap(vmx->nested.virtual_apic_page);
-
-		status = vmcs_read16(GUEST_INTR_STATUS);
-		if ((u8)max_irr > ((u8)status & 0xff)) {
-			status &= ~0xff;
-			status |= (u8)max_irr;
-			vmcs_write16(GUEST_INTR_STATUS, status);
-		}
-	}
-
-	nested_mark_vmcs12_pages_dirty(vcpu);
-}
-
 static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
 						     bool nested)
 {
@@ -5120,11 +5083,6 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 	    vector == vmx->nested.posted_intr_nv) {
 		/* the PIR and ON have been set by L1. */
 		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
-		/*
-		 * If a posted intr is not recognized by hardware,
-		 * we will accomplish it in the next vmentry.
-		 */
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		return 0;
 	}
 	return -1;
@@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 		kvm_vcpu_kick(vcpu);
 }
 
+static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	WARN_ON(!vcpu->arch.apicv_active);
+
+	if (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
+	    pi_test_on(vmx->nested.pi_desc))
+		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -6823,6 +6792,7 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_apicv()) {
 		enable_apicv = 0;
 		kvm_x86_ops->sync_pir_to_irr = NULL;
+		kvm_x86_ops->complete_nested_posted_interrupt = NULL;
 	}
 
 	if (cpu_has_vmx_tsc_scaling()) {
@@ -11144,7 +11114,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	vmx_complete_nested_posted_interrupt(vcpu);
 	return 0;
 }
 
@@ -12136,6 +12105,8 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+	.complete_nested_posted_interrupt =
+		vmx_complete_nested_posted_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 34c85aa2e2d1..6fb58ab9a85c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	smp_mb__after_srcu_read_unlock();
 
 	/*
-	 * This handles the case where a posted interrupt was
-	 * notified with kvm_vcpu_kick.
+	 * In case guest got the posted-interrupt notification
+	 * vector while running in host, we need to make sure
+	 * it arrives to guest.
+	 * For L1 posted-interrupts, we manually sync PIR to IRR.
+	 * For L2 posted-interrupts, we send notification-vector
+	 * again by self IPI such that it will now be received in guest.
 	 */
-	if (kvm_lapic_enabled(vcpu)) {
-		if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
+	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) {
+		if (kvm_x86_ops->sync_pir_to_irr)
 			kvm_x86_ops->sync_pir_to_irr(vcpu);
+		if (kvm_x86_ops->complete_nested_posted_interrupt)
+			kvm_x86_ops->complete_nested_posted_interrupt(vcpu);
 	}
 
 	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] KVM: nVMX: Wake halted L2 on nested posted-interrupt
  2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
                   ` (3 preceding siblings ...)
  2017-12-05  8:16 ` [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
@ 2017-12-05  8:16 ` Liran Alon
  4 siblings, 0 replies; 17+ messages in thread
From: Liran Alon @ 2017-12-05  8:16 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Konrad Rzeszutek Wilk

If L1 don't intercept L2 HLT (doesn't set CPU_BASED_HLT_EXITING),
then when L2 executes HLT instruction, KVM will block vCPU from
further execution (just like what happens when L1 executes HLT).

Therefore, when some CPU sends nested-posted-interrupts to L2 there
are 2 important cases to handle:

1. vmx_deliver_nested_posted_interrupt() note that
vcpu->mode != IN_GUEST_MODE and therefore doesn't send a physical IPI.
Because the dest vCPU could be blocked by HLT, we should kick it.

2. vmx_deliver_nested_posted_interrupt() sees that
vcpu->mode == IN_GUEST_MODE and therefore sends a physical IPI but
before it sends the physical IPI, the dest CPU executing L2 executes
HLT which caused the dest vCPU to be blocked. Therefore, the physical
IPI will be received at host and it's handler should make sure to
unblock the vCPU.

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt
processing")

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kernel/irq.c | 1 +
 arch/x86/kvm/vmx.c    | 9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 49cfd9fe7589..48c5e4a49279 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -326,6 +326,7 @@ __visible void smp_kvm_posted_intr_nested_ipi(struct pt_regs *regs)
 
 	entering_ack_irq();
 	inc_irq_stat(kvm_posted_intr_nested_ipis);
+	kvm_posted_intr_wakeup_handler();
 	exiting_irq();
 	set_irq_regs(old_regs);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 517822f94158..dcbc4ce5a32a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5082,7 +5082,8 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 	if (is_guest_mode(vcpu) &&
 	    vector == vmx->nested.posted_intr_nv) {
 		/* the PIR and ON have been set by L1. */
-		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
+		if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
+			kvm_vcpu_kick(vcpu);
 		return 0;
 	}
 	return -1;
@@ -6680,9 +6681,11 @@ static void wakeup_handler(void)
 	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
 			blocked_vcpu_list) {
-		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-		if (pi_test_on(pi_desc) == 1)
+		if ((pi_test_on(&vmx->pi_desc) == 1) ||
+		    (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
+		     (pi_test_on(vmx->nested.pi_desc) == 1)))
 			kvm_vcpu_kick(vcpu);
 	}
 	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-05  8:16 ` [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
@ 2017-12-05  8:36   ` Wincy Van
  2017-12-06 20:45   ` Radim Krčmář
  1 sibling, 0 replies; 17+ messages in thread
From: Wincy Van @ 2017-12-05  8:36 UTC (permalink / raw)
  To: Liran Alon
  Cc: Paolo Bonzini, Radim Krčmář,
	kvm, Jim Mattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

Good idea !
Using self-ipi is much more simple and efficient than emulation.

On Tue, Dec 5, 2017 at 4:16 PM, Liran Alon <liran.alon@oracle.com> wrote:
> When L1 wants to send a posted-interrupt to another L1 CPU running L2,
> it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in
> vmx->nested.pi_desc->control. Then it attempts to send a
> notification-vector IPI to dest L1 CPU.
> This attempt to send IPI will exit to L0 which will reach
> vmx_deliver_nested_posted_interrupt() which does the following:
> 1. If dest L0 CPU is currently running guest (vcpu->mode ==
> IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector.
> 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest
> L0 CPU exits from guest to host and therefore doesn't recognize physical
> IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next
> vmentry which will call vmx_check_nested_events() which should call
> (in theory) vmx_complete_nested_posted_interrupt(). That function
> should see vmx->nested.pi_desc->control ON bit is set and therefore
> "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1
> virtual-apic-page & update vmcs02 RVI).
>
> The above logic regarding nested-posted-interrupts contains multiple
> issues:
>
> A) Race-condition:
> On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE
> but before sending physical IPI, the dest CPU will exit to host.
> Therefore, physical IPI could be received at host which it's hanlder
> does nothing. In addition, assume that dest CPU passes the checks
> for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore,
> dest CPU will resume L2 without evaluating nested-posted-interrupts.
>
> B) vmx_complete_nested_posted_interrupt() is not always called
> when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug
> that vmx_check_nested_events() could exit from L2 to L1 before calling
> vmx_complete_nested_posted_interrupt(). Therefore, on next resume of
> L1 into L2, nested-posted-interrupts won't be evaluated even though L0
> resume L2 (We may resume L2 without having KVM_REQ_EVENT set).
>
> This commit removes nested-posted-interrupts processing from
> check_nested_events() and instead makes sure to process
> nested-posted-interrupts on vmentry after interrupts
> disabled. Processing of nested-posted-interrupts is delegated
> to hardware by issueing a self-IPI of relevant notification-vector
> which will be delivered to CPU when CPU is in guest.
>
> * Bug (A) is solved by the fact that processing of
> nested-posted-interrupts is not dependent on KVM_REQ_EVENT and
> happens before every vmentry to L2.
> * Bug (B) is now trivially solved by processing
> nested-posted-interrupts before each vmentry to L2 guest.
>
> An alternative could have been to just call
> vmx_complete_nested_posted_interrupt() at this call-site. However, we
> have decided to go with this approach because:
> 1. It would require modifying vmx_complete_nested_posted_interrupt()
> to be able to work with interrupts disabled (not-trivial).
> 2. We preffer to avoid software-emulation of hardware behavior if it
> is possible.
>
> Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Co-authored-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/vmx.c              | 57 ++++++++++-------------------------------
>  arch/x86/kvm/x86.c              | 14 +++++++---
>  3 files changed, 25 insertions(+), 47 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1bfb99770c34..dc0affe69903 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -980,6 +980,7 @@ struct kvm_x86_ops {
>         void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>         void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
>         void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> +       void (*complete_nested_posted_interrupt)(struct kvm_vcpu *vcpu);
>         int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
>         int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>         int (*get_tdp_level)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index bbc023fb9ef1..517822f94158 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -534,12 +534,6 @@ static bool pi_test_and_set_on(struct pi_desc *pi_desc)
>                         (unsigned long *)&pi_desc->control);
>  }
>
> -static bool pi_test_and_clear_on(struct pi_desc *pi_desc)
> -{
> -       return test_and_clear_bit(POSTED_INTR_ON,
> -                       (unsigned long *)&pi_desc->control);
> -}
> -
>  static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
>  {
>         return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
> @@ -5041,37 +5035,6 @@ static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
>         }
>  }
>
> -
> -static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
> -{
> -       struct vcpu_vmx *vmx = to_vmx(vcpu);
> -       int max_irr;
> -       void *vapic_page;
> -       u16 status;
> -
> -       if (!vmx->nested.pi_desc)
> -               return;
> -
> -       if (!pi_test_and_clear_on(vmx->nested.pi_desc))
> -               return;
> -
> -       max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
> -       if (max_irr != 256) {
> -               vapic_page = kmap(vmx->nested.virtual_apic_page);
> -               __kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page);
> -               kunmap(vmx->nested.virtual_apic_page);
> -
> -               status = vmcs_read16(GUEST_INTR_STATUS);
> -               if ((u8)max_irr > ((u8)status & 0xff)) {
> -                       status &= ~0xff;
> -                       status |= (u8)max_irr;
> -                       vmcs_write16(GUEST_INTR_STATUS, status);
> -               }
> -       }
> -
> -       nested_mark_vmcs12_pages_dirty(vcpu);
> -}
> -
>  static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
>                                                      bool nested)
>  {
> @@ -5120,11 +5083,6 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
>             vector == vmx->nested.posted_intr_nv) {
>                 /* the PIR and ON have been set by L1. */
>                 kvm_vcpu_trigger_posted_interrupt(vcpu, true);
> -               /*
> -                * If a posted intr is not recognized by hardware,
> -                * we will accomplish it in the next vmentry.
> -                */
> -               kvm_make_request(KVM_REQ_EVENT, vcpu);
>                 return 0;
>         }
>         return -1;
> @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>                 kvm_vcpu_kick(vcpu);
>  }
>
> +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
> +{
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +       WARN_ON(!vcpu->arch.apicv_active);
> +
> +       if (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
> +           pi_test_on(vmx->nested.pi_desc))
> +               kvm_vcpu_trigger_posted_interrupt(vcpu, true);
> +}
> +
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> @@ -6823,6 +6792,7 @@ static __init int hardware_setup(void)
>         if (!cpu_has_vmx_apicv()) {
>                 enable_apicv = 0;
>                 kvm_x86_ops->sync_pir_to_irr = NULL;
> +               kvm_x86_ops->complete_nested_posted_interrupt = NULL;
>         }
>
>         if (cpu_has_vmx_tsc_scaling()) {
> @@ -11144,7 +11114,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>                 return 0;
>         }
>
> -       vmx_complete_nested_posted_interrupt(vcpu);
>         return 0;
>  }
>
> @@ -12136,6 +12105,8 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
>         .hwapic_isr_update = vmx_hwapic_isr_update,
>         .sync_pir_to_irr = vmx_sync_pir_to_irr,
>         .deliver_posted_interrupt = vmx_deliver_posted_interrupt,
> +       .complete_nested_posted_interrupt =
> +               vmx_complete_nested_posted_interrupt,
>
>         .set_tss_addr = vmx_set_tss_addr,
>         .get_tdp_level = get_ept_level,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 34c85aa2e2d1..6fb58ab9a85c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         smp_mb__after_srcu_read_unlock();
>
>         /*
> -        * This handles the case where a posted interrupt was
> -        * notified with kvm_vcpu_kick.
> +        * In case guest got the posted-interrupt notification
> +        * vector while running in host, we need to make sure
> +        * it arrives to guest.
> +        * For L1 posted-interrupts, we manually sync PIR to IRR.
> +        * For L2 posted-interrupts, we send notification-vector
> +        * again by self IPI such that it will now be received in guest.
>          */
> -       if (kvm_lapic_enabled(vcpu)) {
> -               if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
> +       if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) {
> +               if (kvm_x86_ops->sync_pir_to_irr)
>                         kvm_x86_ops->sync_pir_to_irr(vcpu);
> +               if (kvm_x86_ops->complete_nested_posted_interrupt)
> +                       kvm_x86_ops->complete_nested_posted_interrupt(vcpu);
>         }
>
>         if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt
  2017-12-05  8:16 ` [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt Liran Alon
@ 2017-12-06 18:52   ` Radim Krčmář
  2017-12-07  2:29     ` Liran Alon
  2017-12-11 22:53     ` Paolo Bonzini
  0 siblings, 2 replies; 17+ messages in thread
From: Radim Krčmář @ 2017-12-06 18:52 UTC (permalink / raw)
  To: Liran Alon
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

2017-12-05 10:16+0200, Liran Alon:
> In case posted-interrupt was delivered to CPU while it is in host
> (outside guest), then posted-interrupt delivery will be done by
> calling sync_pir_to_irr() at vmentry after interrupts are disabled.
> 
> sync_pir_to_irr() will check if vmx->pi_desc.control ON bit and if
> set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to
> ensure virtual-interrupt-delivery will dispatch interrupt to guest.
> 
> However, it is possible that L1 will receive a posted-interrupt while
> CPU runs at host and is about to enter L2. In this case, the call to
> sync_pir_to_irr() will indeed update the L1's APIC IRR but
> vcpu_enter_guest() will then just resume into L2 guest without
> re-evaluating if it should exit from L2 to L1 as a result of this
> new pending L1 event.
> 
> To address this case, if sync_pir_to_irr() has a new L1 injectable
> interrupt and CPU is running L2, we set KVM_REQ_EVENT.
> This will cause vcpu_enter_guest() to run another iteration of
> evaluating pending KVM requests and will therefore consume
> KVM_REQ_EVENT which will make sure to call check_nested_events() which
> will handle the pending L1 event properly.
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/kvm/vmx.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f5074ec5701b..47bbb8b691e8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int prev_max_irr;
>  	int max_irr;
>  
>  	WARN_ON(!vcpu->arch.apicv_active);
> +
> +	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
>  	if (pi_test_on(&vmx->pi_desc)) {
>  		pi_clear_on(&vmx->pi_desc);
> +
>  		/*
>  		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>  		 * But on x86 this is just a compiler barrier anyway.
>  		 */
>  		smp_mb__after_atomic();
>  		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);

I think the optimization (partly livelock protection) is not worth the
overhead of two IRR scans for non-nested guests.  Please make
kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

> +
> +		/*
> +		 * If we are running L2 and L1 has a new pending interrupt
> +		 * which can be injected, we should re-evaluate
> +		 * what should be done with this new L1 interrupt.
> +		 */
> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);

We don't need anything from KVM_REQ_EVENT and only use it to abort the
VM entry, kvm_vcpu_exiting_guest_mode() is better for that.

>  	} else {
> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
> +		max_irr = prev_max_irr;
>  	}
> +
>  	vmx_hwapic_irr_update(vcpu, max_irr);

We also should just inject the interrupt if L2 is run without
nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts
  2017-12-05  8:16 ` [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts Liran Alon
@ 2017-12-06 20:20   ` Radim Krčmář
  2017-12-07 11:19     ` Liran Alon
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-12-06 20:20 UTC (permalink / raw)
  To: Liran Alon
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

2017-12-05 10:16+0200, Liran Alon:
> Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
> which calls vmx_hwapic_irr_update() to update RVI.
> Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
> when CPU is running L2 and L1 don't intercept external-interrupts.
> In that case, code injects interrupt directly into L2 instead of
> updating RVI.
> 
> Besides being hacky (wouldn't expect function updating RVI to also
> inject interrupt), it also doesn't handle this case correctly.
> The code contains several issues:
> 1. When code calls kvm_queue_interrupt() it just passes it max_irr which
> represents the highest IRR currently pending in L1 LAPIC.
> This is problematic as interrupt was injected to guest but it's bit is
> still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
> 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
> max_irr priority. It just checks if interrupts are enabled in guest with
> vmx_interrupt_allowed().

Good catch.

> To fix the above issues:
> 1. Simplify vmx_hwapic_irr_update() to just update RVI.
> Note that this shouldn't happen when CPU is running L2
> (See comment in code).
> 2. Since now vmx_hwapic_irr_update() only does logic for L1
> virtual-interrupt-delivery, inject_pending_event() should be the
> one responsible for injecting the interrupt directly into L2.
> Therefore, change kvm_cpu_has_injectable_intr() to check L1
> LAPIC when CPU is running L2.
> 
> This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
> L1 pending events when running L2 and L1 got posted-interrupt")
> made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
> injectable interrupt.

Right, that answers my question from the first patch.

> Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
> injection")
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 5c24811e8b0b..f171051eecf3 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -79,7 +79,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
>  	if (kvm_cpu_has_extint(v))
>  		return 1;
>  
> -	if (kvm_vcpu_apicv_active(v))
> +	if (!is_guest_mode(v) && kvm_vcpu_apicv_active(v))
>  		return 0;
>  
>  	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 47bbb8b691e8..bbc023fb9ef1 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9002,30 +9002,18 @@ static void vmx_set_rvi(int vector)
>  
>  static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>  {
> -	if (!is_guest_mode(vcpu)) {
> -		vmx_set_rvi(max_irr);
> -		return;
> -	}
> -
> -	if (max_irr == -1)
> -		return;
> -
>  	/*
> -	 * In guest mode.  If a vmexit is needed, vmx_check_nested_events
> -	 * handles it.
> +	 * When running L2, updating RVI is only relevant when
> +	 * vmcs12 virtual-interrupt-delivery enabled.
> +	 * However, it can be enabled only when L1 also
> +	 * intercepts external-interrupts and in that case
> +	 * we should not update vmcs02 RVI but instead intercept
> +	 * interrupt. Therefore, do nothing when running L2.
>  	 */
> -	if (nested_exit_on_intr(vcpu))
> +	if ((max_irr == -1) || is_guest_mode(vcpu))
>  		return;

The logic should remain

if (!is_guest_mode(vcpu))
	vmx_set_rvi(max_irr);

because we had a bug where RVI was not cleared on reset and I don't
think we fixed it elsewhere.  4114c27d450b ("KVM: x86: reset RVI upon
system reset")

>  
> -	/*
> -	 * Else, fall back to pre-APICv interrupt injection since L2
> -	 * is run without virtual interrupt delivery.
> -	 */
> -	if (!kvm_event_needs_reinjection(vcpu) &&
> -	    vmx_interrupt_allowed(vcpu)) {
> -		kvm_queue_interrupt(vcpu, max_irr, false);
> -		vmx_inject_irq(vcpu);
> -	}
> +	vmx_set_rvi(max_irr);
>  }
>  
>  static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  		 * If we are running L2 and L1 has a new pending interrupt
>  		 * which can be injected, we should re-evaluate
>  		 * what should be done with this new L1 interrupt.
> +		 * If L1 intercepts external-interrupts, we should
> +		 * exit from L2 to L1. Otherwise, interrupt should be
> +		 * delivered directly to L2.
> +		 * These cases are handled correctly by KVM_REQ_EVENT.
>  		 */
>  		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>  			kvm_make_request(KVM_REQ_EVENT, vcpu);


KVM_REQ_EVENT is not needed when we are intercepting the interrupt,
because we just want to run vmx_check_nested_events(), but it is needed
when delivering L1's interrupt inside L2.

When vmcs12 doesn't intercept external interrupts, I'm thinking we could
pass virtual interrupts from vmcs01 and use RVI normally, as well as
delivering L1 posted interrupts to L2 without a VM exit.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-05  8:16 ` [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
  2017-12-05  8:36   ` Wincy Van
@ 2017-12-06 20:45   ` Radim Krčmář
  2017-12-07 11:33     ` Liran Alon
  1 sibling, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-12-06 20:45 UTC (permalink / raw)
  To: Liran Alon
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

2017-12-05 10:16+0200, Liran Alon:
> When L1 wants to send a posted-interrupt to another L1 CPU running L2,
> it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in
> vmx->nested.pi_desc->control. Then it attempts to send a
> notification-vector IPI to dest L1 CPU.
> This attempt to send IPI will exit to L0 which will reach
> vmx_deliver_nested_posted_interrupt() which does the following:
> 1. If dest L0 CPU is currently running guest (vcpu->mode ==
> IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector.
> 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest
> L0 CPU exits from guest to host and therefore doesn't recognize physical
> IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next
> vmentry which will call vmx_check_nested_events() which should call
> (in theory) vmx_complete_nested_posted_interrupt(). That function
> should see vmx->nested.pi_desc->control ON bit is set and therefore
> "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1
> virtual-apic-page & update vmcs02 RVI).
> 
> The above logic regarding nested-posted-interrupts contains multiple
> issues:
> 
> A) Race-condition:
> On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE
> but before sending physical IPI, the dest CPU will exit to host.
> Therefore, physical IPI could be received at host which it's hanlder
> does nothing. In addition, assume that dest CPU passes the checks
> for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore,
> dest CPU will resume L2 without evaluating nested-posted-interrupts.
> 
> B) vmx_complete_nested_posted_interrupt() is not always called
> when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug
> that vmx_check_nested_events() could exit from L2 to L1 before calling
> vmx_complete_nested_posted_interrupt(). Therefore, on next resume of
> L1 into L2, nested-posted-interrupts won't be evaluated even though L0
> resume L2 (We may resume L2 without having KVM_REQ_EVENT set).
> 
> This commit removes nested-posted-interrupts processing from
> check_nested_events() and instead makes sure to process
> nested-posted-interrupts on vmentry after interrupts
> disabled. Processing of nested-posted-interrupts is delegated
> to hardware by issueing a self-IPI of relevant notification-vector
> which will be delivered to CPU when CPU is in guest.
> 
> * Bug (A) is solved by the fact that processing of
> nested-posted-interrupts is not dependent on KVM_REQ_EVENT and
> happens before every vmentry to L2.
> * Bug (B) is now trivially solved by processing
> nested-posted-interrupts before each vmentry to L2 guest.
> 
> An alternative could have been to just call
> vmx_complete_nested_posted_interrupt() at this call-site. However, we
> have decided to go with this approach because:
> 1. It would require modifying vmx_complete_nested_posted_interrupt()
> to be able to work with interrupts disabled (not-trivial).
> 2. We preffer to avoid software-emulation of hardware behavior if it
> is possible.

Nice solution!  Self-IPI was slower on non-nested, but even if it is
slower here, the code saving is definitely worth it.

> Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Co-authored-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  		kvm_vcpu_kick(vcpu);
>  }
>  
> +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	WARN_ON(!vcpu->arch.apicv_active);
> +
> +	if (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
> +	    pi_test_on(vmx->nested.pi_desc))
> +		kvm_vcpu_trigger_posted_interrupt(vcpu, true);

We're always sending to self, so the function would be better with
apic->send_IPI_self() as it might be accelerated by hardware.

> +}
> +
>  /*
>   * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>   * will not change in the lifetime of the guest.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	smp_mb__after_srcu_read_unlock();
>  
>  	/*
> -	 * This handles the case where a posted interrupt was
> -	 * notified with kvm_vcpu_kick.
> +	 * In case guest got the posted-interrupt notification
> +	 * vector while running in host, we need to make sure
> +	 * it arrives to guest.
> +	 * For L1 posted-interrupts, we manually sync PIR to IRR.
> +	 * For L2 posted-interrupts, we send notification-vector
> +	 * again by self IPI such that it will now be received in guest.
>  	 */
> -	if (kvm_lapic_enabled(vcpu)) {
> -		if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
> +	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) {
> +		if (kvm_x86_ops->sync_pir_to_irr)
>  			kvm_x86_ops->sync_pir_to_irr(vcpu);
> +		if (kvm_x86_ops->complete_nested_posted_interrupt)
> +			kvm_x86_ops->complete_nested_posted_interrupt(vcpu);

I think that vcpu->arch.apicv_active is enough to guarantee presence of
both sync_pir_to_irr and complete_nested_posted_interrupt,

and considering this is a hot path, I'd check is_guest_mode(vcpu) before
calling the second function.

(Saving the function call with a new x86_op for those two seems viable
 too, but would need deeper analysis and benchmarking as it is putting
 more code into hot cache ...)

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt
  2017-12-06 18:52   ` Radim Krčmář
@ 2017-12-07  2:29     ` Liran Alon
  2017-12-11 22:53     ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Liran Alon @ 2017-12-07  2:29 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk



On 06/12/17 20:52, Radim Krčmář wrote:
> 2017-12-05 10:16+0200, Liran Alon:
>> In case posted-interrupt was delivered to CPU while it is in host
>> (outside guest), then posted-interrupt delivery will be done by
>> calling sync_pir_to_irr() at vmentry after interrupts are disabled.
>>
>> sync_pir_to_irr() will check if vmx->pi_desc.control ON bit and if
>> set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to
>> ensure virtual-interrupt-delivery will dispatch interrupt to guest.
>>
>> However, it is possible that L1 will receive a posted-interrupt while
>> CPU runs at host and is about to enter L2. In this case, the call to
>> sync_pir_to_irr() will indeed update the L1's APIC IRR but
>> vcpu_enter_guest() will then just resume into L2 guest without
>> re-evaluating if it should exit from L2 to L1 as a result of this
>> new pending L1 event.
>>
>> To address this case, if sync_pir_to_irr() has a new L1 injectable
>> interrupt and CPU is running L2, we set KVM_REQ_EVENT.
>> This will cause vcpu_enter_guest() to run another iteration of
>> evaluating pending KVM requests and will therefore consume
>> KVM_REQ_EVENT which will make sure to call check_nested_events() which
>> will handle the pending L1 event properly.
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>   arch/x86/kvm/vmx.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f5074ec5701b..47bbb8b691e8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9031,20 +9031,33 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>>   static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +	int prev_max_irr;
>>   	int max_irr;
>>
>>   	WARN_ON(!vcpu->arch.apicv_active);
>> +
>> +	prev_max_irr = kvm_lapic_find_highest_irr(vcpu);
>>   	if (pi_test_on(&vmx->pi_desc)) {
>>   		pi_clear_on(&vmx->pi_desc);
>> +
>>   		/*
>>   		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>>   		 * But on x86 this is just a compiler barrier anyway.
>>   		 */
>>   		smp_mb__after_atomic();
>>   		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
>
> I think the optimization (partly livelock protection) is not worth the
> overhead of two IRR scans for non-nested guests.  Please make
> kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

OK. I will modify kvm_apic_update_irr().

>
>> +
>> +		/*
>> +		 * If we are running L2 and L1 has a new pending interrupt
>> +		 * which can be injected, we should re-evaluate
>> +		 * what should be done with this new L1 interrupt.
>> +		 */
>> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> We don't need anything from KVM_REQ_EVENT and only use it to abort the
> VM entry, kvm_vcpu_exiting_guest_mode() is better for that.

Yes you are right. I will change to kvm_vcpu_exiting_guest_mode().

>
>>   	} else {
>> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
>> +		max_irr = prev_max_irr;
>>   	}
>> +
>>   	vmx_hwapic_irr_update(vcpu, max_irr);
>
> We also should just inject the interrupt if L2 is run without
> nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update?
See next patch in series :)

>
> Thanks.
>

Regards,
-Liran

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts
  2017-12-06 20:20   ` Radim Krčmář
@ 2017-12-07 11:19     ` Liran Alon
  2017-12-07 16:41       ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Liran Alon @ 2017-12-07 11:19 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk



On 06/12/17 22:20, Radim Krčmář wrote:
> 2017-12-05 10:16+0200, Liran Alon:
>> Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
>> which calls vmx_hwapic_irr_update() to update RVI.
>> Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
>> when CPU is running L2 and L1 don't intercept external-interrupts.
>> In that case, code injects interrupt directly into L2 instead of
>> updating RVI.
>>
>> Besides being hacky (wouldn't expect function updating RVI to also
>> inject interrupt), it also doesn't handle this case correctly.
>> The code contains several issues:
>> 1. When code calls kvm_queue_interrupt() it just passes it max_irr which
>> represents the highest IRR currently pending in L1 LAPIC.
>> This is problematic as interrupt was injected to guest but it's bit is
>> still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
>> 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
>> max_irr priority. It just checks if interrupts are enabled in guest with
>> vmx_interrupt_allowed().
>
> Good catch.
>
>> To fix the above issues:
>> 1. Simplify vmx_hwapic_irr_update() to just update RVI.
>> Note that this shouldn't happen when CPU is running L2
>> (See comment in code).
>> 2. Since now vmx_hwapic_irr_update() only does logic for L1
>> virtual-interrupt-delivery, inject_pending_event() should be the
>> one responsible for injecting the interrupt directly into L2.
>> Therefore, change kvm_cpu_has_injectable_intr() to check L1
>> LAPIC when CPU is running L2.
>>
>> This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
>> L1 pending events when running L2 and L1 got posted-interrupt")
>> made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
>> injectable interrupt.
>
> Right, that answers my question from the first patch.
>
>> Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
>> injection")
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
>> index 5c24811e8b0b..f171051eecf3 100644
>> --- a/arch/x86/kvm/irq.c
>> +++ b/arch/x86/kvm/irq.c
>> @@ -79,7 +79,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
>>   	if (kvm_cpu_has_extint(v))
>>   		return 1;
>>
>> -	if (kvm_vcpu_apicv_active(v))
>> +	if (!is_guest_mode(v) && kvm_vcpu_apicv_active(v))
>>   		return 0;
>>
>>   	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 47bbb8b691e8..bbc023fb9ef1 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9002,30 +9002,18 @@ static void vmx_set_rvi(int vector)
>>
>>   static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>>   {
>> -	if (!is_guest_mode(vcpu)) {
>> -		vmx_set_rvi(max_irr);
>> -		return;
>> -	}
>> -
>> -	if (max_irr == -1)
>> -		return;
>> -
>>   	/*
>> -	 * In guest mode.  If a vmexit is needed, vmx_check_nested_events
>> -	 * handles it.
>> +	 * When running L2, updating RVI is only relevant when
>> +	 * vmcs12 virtual-interrupt-delivery enabled.
>> +	 * However, it can be enabled only when L1 also
>> +	 * intercepts external-interrupts and in that case
>> +	 * we should not update vmcs02 RVI but instead intercept
>> +	 * interrupt. Therefore, do nothing when running L2.
>>   	 */
>> -	if (nested_exit_on_intr(vcpu))
>> +	if ((max_irr == -1) || is_guest_mode(vcpu))
>>   		return;
>
> The logic should remain
>
> if (!is_guest_mode(vcpu))
> 	vmx_set_rvi(max_irr);
>
> because we had a bug where RVI was not cleared on reset and I don't
> think we fixed it elsewhere.  4114c27d450b ("KVM: x86: reset RVI upon
> system reset")

I see. You are right. I wasn't aware of that commit. Will change.

>
>>
>> -	/*
>> -	 * Else, fall back to pre-APICv interrupt injection since L2
>> -	 * is run without virtual interrupt delivery.
>> -	 */
>> -	if (!kvm_event_needs_reinjection(vcpu) &&
>> -	    vmx_interrupt_allowed(vcpu)) {
>> -		kvm_queue_interrupt(vcpu, max_irr, false);
>> -		vmx_inject_irq(vcpu);
>> -	}
>> +	vmx_set_rvi(max_irr);
>>   }
>>
>>   static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>> @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>   		 * If we are running L2 and L1 has a new pending interrupt
>>   		 * which can be injected, we should re-evaluate
>>   		 * what should be done with this new L1 interrupt.
>> +		 * If L1 intercepts external-interrupts, we should
>> +		 * exit from L2 to L1. Otherwise, interrupt should be
>> +		 * delivered directly to L2.
>> +		 * These cases are handled correctly by KVM_REQ_EVENT.
>>   		 */
>>   		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>>   			kvm_make_request(KVM_REQ_EVENT, vcpu);
>
>
> KVM_REQ_EVENT is not needed when we are intercepting the interrupt,
> because we just want to run vmx_check_nested_events(), but it is needed
> when delivering L1's interrupt inside L2.

OK. If vmcs12 intercept external interrupts, I will just call 
kvm_vcpu_exiting_guest_mode(). Otherwise, I will use KVM_REQ_EVENT.

>
> When vmcs12 doesn't intercept external interrupts, I'm thinking we could
> pass virtual interrupts from vmcs01 and use RVI normally, as well as
> delivering L1 posted interrupts to L2 without a VM exit.
>

Maybe I am missing something but I am not sure this is possible.

As the comment I added to vmx_hwapic_irr_update() specifies,
virtual-interrupt-delivery cannot be enabled if exit on 
external-interrupts is disabled. This can be seen in the following code 
in nested_vmx_check_apicv_controls():
/*
* If virtual interrupt delivery is enabled,
* we must exit on external interrupts.
*/
if (nested_cpu_has_vid(vmcs12) &&
     !nested_exit_on_intr(vcpu))
         return -EINVAL;

Therefore, in case vmcs12 disables exit on
external-interrupts then vmcs12 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 
must be cleared. As prepare_vmcs02() currently takes value of 
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY as-is from vmcs12 to vmcs02, it 
means vmcs02 doesn't have virtual-interrupt-delivery in this case. 
Therefore, in this case, we should not update vmcs02 RVI.

Regards,
-Liran

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-06 20:45   ` Radim Krčmář
@ 2017-12-07 11:33     ` Liran Alon
  2017-12-07 16:26       ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Liran Alon @ 2017-12-07 11:33 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk



On 06/12/17 22:45, Radim Krčmář wrote:
> 2017-12-05 10:16+0200, Liran Alon:
>> When L1 wants to send a posted-interrupt to another L1 CPU running L2,
>> it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in
>> vmx->nested.pi_desc->control. Then it attempts to send a
>> notification-vector IPI to dest L1 CPU.
>> This attempt to send IPI will exit to L0 which will reach
>> vmx_deliver_nested_posted_interrupt() which does the following:
>> 1. If dest L0 CPU is currently running guest (vcpu->mode ==
>> IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector.
>> 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest
>> L0 CPU exits from guest to host and therefore doesn't recognize physical
>> IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next
>> vmentry which will call vmx_check_nested_events() which should call
>> (in theory) vmx_complete_nested_posted_interrupt(). That function
>> should see vmx->nested.pi_desc->control ON bit is set and therefore
>> "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1
>> virtual-apic-page & update vmcs02 RVI).
>>
>> The above logic regarding nested-posted-interrupts contains multiple
>> issues:
>>
>> A) Race-condition:
>> On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE
>> but before sending physical IPI, the dest CPU will exit to host.
>> Therefore, physical IPI could be received at host which it's hanlder
>> does nothing. In addition, assume that dest CPU passes the checks
>> for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore,
>> dest CPU will resume L2 without evaluating nested-posted-interrupts.
>>
>> B) vmx_complete_nested_posted_interrupt() is not always called
>> when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug
>> that vmx_check_nested_events() could exit from L2 to L1 before calling
>> vmx_complete_nested_posted_interrupt(). Therefore, on next resume of
>> L1 into L2, nested-posted-interrupts won't be evaluated even though L0
>> resume L2 (We may resume L2 without having KVM_REQ_EVENT set).
>>
>> This commit removes nested-posted-interrupts processing from
>> check_nested_events() and instead makes sure to process
>> nested-posted-interrupts on vmentry after interrupts
>> disabled. Processing of nested-posted-interrupts is delegated
>> to hardware by issueing a self-IPI of relevant notification-vector
>> which will be delivered to CPU when CPU is in guest.
>>
>> * Bug (A) is solved by the fact that processing of
>> nested-posted-interrupts is not dependent on KVM_REQ_EVENT and
>> happens before every vmentry to L2.
>> * Bug (B) is now trivially solved by processing
>> nested-posted-interrupts before each vmentry to L2 guest.
>>
>> An alternative could have been to just call
>> vmx_complete_nested_posted_interrupt() at this call-site. However, we
>> have decided to go with this approach because:
>> 1. It would require modifying vmx_complete_nested_posted_interrupt()
>> to be able to work with interrupts disabled (not-trivial).
>> 2. We preffer to avoid software-emulation of hardware behavior if it
>> is possible.
>
> Nice solution!  Self-IPI was slower on non-nested, but even if it is
> slower here, the code saving is definitely worth it.
>
>> Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Co-authored-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>>   		kvm_vcpu_kick(vcpu);
>>   }
>>
>> +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	WARN_ON(!vcpu->arch.apicv_active);
>> +
>> +	if (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
>> +	    pi_test_on(vmx->nested.pi_desc))
>> +		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
>
> We're always sending to self, so the function would be better with
> apic->send_IPI_self() as it might be accelerated by hardware.
>

Agreed. Will change.

>> +}
>> +
>>   /*
>>    * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>>    * will not change in the lifetime of the guest.
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   	smp_mb__after_srcu_read_unlock();
>>
>>   	/*
>> -	 * This handles the case where a posted interrupt was
>> -	 * notified with kvm_vcpu_kick.
>> +	 * In case guest got the posted-interrupt notification
>> +	 * vector while running in host, we need to make sure
>> +	 * it arrives to guest.
>> +	 * For L1 posted-interrupts, we manually sync PIR to IRR.
>> +	 * For L2 posted-interrupts, we send notification-vector
>> +	 * again by self IPI such that it will now be received in guest.
>>   	 */
>> -	if (kvm_lapic_enabled(vcpu)) {
>> -		if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
>> +	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) {
>> +		if (kvm_x86_ops->sync_pir_to_irr)
>>   			kvm_x86_ops->sync_pir_to_irr(vcpu);
>> +		if (kvm_x86_ops->complete_nested_posted_interrupt)
>> +			kvm_x86_ops->complete_nested_posted_interrupt(vcpu);
>
> I think that vcpu->arch.apicv_active is enough to guarantee presence of
> both sync_pir_to_irr and complete_nested_posted_interrupt,

I am not sure you can remove the check for kvm_lapic_enabled().

apicv_active can be true even when lapic_in_kernel()==false.
apicv_active is only dependent on if physical CPU supports APICv related 
features & enable_apicv module parameter is 1.

However, sync_pir_to_irr() assumes that lapic_in_kernel()==true because
it reads & updates data in in-kernel LAPIC IRR register. So I think at 
least it doesn't makes sense to invoke sync_pir_to_irr() in case 
kvm_lapic_enabled()==false.

But I do agree that complete_nested_posted_interrupt() should be invoked 
if just apicv_active==true. Because the LAPIC that L1 emulate for L2 can 
be enabled and implemented in-kernel even if the LAPIC that L0 emulate 
for L1 is not currently enabled. Therefore, L1 can still send 
posted-interrupts for it's L2 guest in this case.

>
> and considering this is a hot path, I'd check is_guest_mode(vcpu) before
> calling the second function.
>
> (Saving the function call with a new x86_op for those two seems viable
>   too, but would need deeper analysis and benchmarking as it is putting
>   more code into hot cache ...)
>

OK. Will change.

Regards,
-Liran

> Thanks.
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-07 11:33     ` Liran Alon
@ 2017-12-07 16:26       ` Radim Krčmář
  2017-12-11 22:57         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-12-07 16:26 UTC (permalink / raw)
  To: Liran Alon
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

2017-12-07 13:33+0200, Liran Alon:
> 
> 
> On 06/12/17 22:45, Radim Krčmář wrote:
> > 2017-12-05 10:16+0200, Liran Alon:
> > > When L1 wants to send a posted-interrupt to another L1 CPU running L2,
> > > it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in
> > > vmx->nested.pi_desc->control. Then it attempts to send a
> > > notification-vector IPI to dest L1 CPU.
> > > This attempt to send IPI will exit to L0 which will reach
> > > vmx_deliver_nested_posted_interrupt() which does the following:
> > > 1. If dest L0 CPU is currently running guest (vcpu->mode ==
> > > IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector.
> > > 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest
> > > L0 CPU exits from guest to host and therefore doesn't recognize physical
> > > IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next
> > > vmentry which will call vmx_check_nested_events() which should call
> > > (in theory) vmx_complete_nested_posted_interrupt(). That function
> > > should see vmx->nested.pi_desc->control ON bit is set and therefore
> > > "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1
> > > virtual-apic-page & update vmcs02 RVI).
> > > 
> > > The above logic regarding nested-posted-interrupts contains multiple
> > > issues:
> > > 
> > > A) Race-condition:
> > > On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE
> > > but before sending physical IPI, the dest CPU will exit to host.
> > > Therefore, physical IPI could be received at host which it's hanlder
> > > does nothing. In addition, assume that dest CPU passes the checks
> > > for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore,
> > > dest CPU will resume L2 without evaluating nested-posted-interrupts.
> > > 
> > > B) vmx_complete_nested_posted_interrupt() is not always called
> > > when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug
> > > that vmx_check_nested_events() could exit from L2 to L1 before calling
> > > vmx_complete_nested_posted_interrupt(). Therefore, on next resume of
> > > L1 into L2, nested-posted-interrupts won't be evaluated even though L0
> > > resume L2 (We may resume L2 without having KVM_REQ_EVENT set).
> > > 
> > > This commit removes nested-posted-interrupts processing from
> > > check_nested_events() and instead makes sure to process
> > > nested-posted-interrupts on vmentry after interrupts
> > > disabled. Processing of nested-posted-interrupts is delegated
> > > to hardware by issueing a self-IPI of relevant notification-vector
> > > which will be delivered to CPU when CPU is in guest.
> > > 
> > > * Bug (A) is solved by the fact that processing of
> > > nested-posted-interrupts is not dependent on KVM_REQ_EVENT and
> > > happens before every vmentry to L2.
> > > * Bug (B) is now trivially solved by processing
> > > nested-posted-interrupts before each vmentry to L2 guest.
> > > 
> > > An alternative could have been to just call
> > > vmx_complete_nested_posted_interrupt() at this call-site. However, we
> > > have decided to go with this approach because:
> > > 1. It would require modifying vmx_complete_nested_posted_interrupt()
> > > to be able to work with interrupts disabled (not-trivial).
> > > 2. We preffer to avoid software-emulation of hardware behavior if it
> > > is possible.
> > 
> > Nice solution!  Self-IPI was slower on non-nested, but even if it is
> > slower here, the code saving is definitely worth it.
> > 
> > > Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
> > > 
> > > Signed-off-by: Liran Alon <liran.alon@oracle.com>
> > > Co-authored-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> > > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> > >   		kvm_vcpu_kick(vcpu);
> > >   }
> > > 
> > > +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > +
> > > +	WARN_ON(!vcpu->arch.apicv_active);
> > > +
> > > +	if (is_guest_mode(vcpu) && vmx->nested.pi_desc &&
> > > +	    pi_test_on(vmx->nested.pi_desc))
> > > +		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
> > 
> > We're always sending to self, so the function would be better with
> > apic->send_IPI_self() as it might be accelerated by hardware.
> > 
> 
> Agreed. Will change.
> 
> > > +}
> > > +
> > >   /*
> > >    * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> > >    * will not change in the lifetime of the guest.
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >   	smp_mb__after_srcu_read_unlock();
> > > 
> > >   	/*
> > > -	 * This handles the case where a posted interrupt was
> > > -	 * notified with kvm_vcpu_kick.
> > > +	 * In case guest got the posted-interrupt notification
> > > +	 * vector while running in host, we need to make sure
> > > +	 * it arrives to guest.
> > > +	 * For L1 posted-interrupts, we manually sync PIR to IRR.
> > > +	 * For L2 posted-interrupts, we send notification-vector
> > > +	 * again by self IPI such that it will now be received in guest.
> > >   	 */
> > > -	if (kvm_lapic_enabled(vcpu)) {
> > > -		if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
> > > +	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) {
> > > +		if (kvm_x86_ops->sync_pir_to_irr)
> > >   			kvm_x86_ops->sync_pir_to_irr(vcpu);
> > > +		if (kvm_x86_ops->complete_nested_posted_interrupt)
> > > +			kvm_x86_ops->complete_nested_posted_interrupt(vcpu);
> > 
> > I think that vcpu->arch.apicv_active is enough to guarantee presence of
> > both sync_pir_to_irr and complete_nested_posted_interrupt,
> 
> I am not sure you can remove the check for kvm_lapic_enabled().
> 
> apicv_active can be true even when lapic_in_kernel()==false.
> apicv_active is only dependent on if physical CPU supports APICv related
> features & enable_apicv module parameter is 1.
> 
> However, sync_pir_to_irr() assumes that lapic_in_kernel()==true because
> it reads & updates data in in-kernel LAPIC IRR register. So I think at least
> it doesn't makes sense to invoke sync_pir_to_irr() in case
> kvm_lapic_enabled()==false.

I agree.

> But I do agree that complete_nested_posted_interrupt() should be invoked if
> just apicv_active==true. Because the LAPIC that L1 emulate for L2 can be
> enabled and implemented in-kernel even if the LAPIC that L0 emulate for L1
> is not currently enabled. Therefore, L1 can still send posted-interrupts for
> it's L2 guest in this case.

I think that posted interrupts depend on enabled harware LAPIC (with
nested the lapic L0 provides), so if L1 can't send or receive
interrupts, the posted interrupts wouldn't work.

We emulate posted interrupts "incorrectly" by looking at PIR and
injecting even if there was no notification vector, but we don't really
have to do anything if there can be no notification vector coming to L1.

Keeping the 'kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active' for
both is fine by me, I just didn't see a reason to check for presence of
sync_pir_to_irr and complete_nested_posted_interrupt, because if they
are NULL, then vcpu->arch.apicv_active must be false.

> > 
> > and considering this is a hot path, I'd check is_guest_mode(vcpu) before
> > calling the second function.
> > 
> > (Saving the function call with a new x86_op for those two seems viable
> >   too, but would need deeper analysis and benchmarking as it is putting
> >   more code into hot cache ...)
> > 
> 
> OK. Will change.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts
  2017-12-07 11:19     ` Liran Alon
@ 2017-12-07 16:41       ` Radim Krčmář
  0 siblings, 0 replies; 17+ messages in thread
From: Radim Krčmář @ 2017-12-07 16:41 UTC (permalink / raw)
  To: Liran Alon
  Cc: pbonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

2017-12-07 13:19+0200, Liran Alon:
> On 06/12/17 22:20, Radim Krčmář wrote:
> > 2017-12-05 10:16+0200, Liran Alon:
> > > Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
> > > which calls vmx_hwapic_irr_update() to update RVI.
> > > Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
> > > when CPU is running L2 and L1 don't intercept external-interrupts.
> > > In that case, code injects interrupt directly into L2 instead of
> > > updating RVI.
> > > 
> > > Besides being hacky (wouldn't expect function updating RVI to also
> > > inject interrupt), it also doesn't handle this case correctly.
> > > The code contains several issues:
> > > 1. When code calls kvm_queue_interrupt() it just passes it max_irr which
> > > represents the highest IRR currently pending in L1 LAPIC.
> > > This is problematic as interrupt was injected to guest but it's bit is
> > > still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
> > > 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
> > > max_irr priority. It just checks if interrupts are enabled in guest with
> > > vmx_interrupt_allowed().
> > 
> > Good catch.
> > 
> > > To fix the above issues:
> > > 1. Simplify vmx_hwapic_irr_update() to just update RVI.
> > > Note that this shouldn't happen when CPU is running L2
> > > (See comment in code).
> > > 2. Since now vmx_hwapic_irr_update() only does logic for L1
> > > virtual-interrupt-delivery, inject_pending_event() should be the
> > > one responsible for injecting the interrupt directly into L2.
> > > Therefore, change kvm_cpu_has_injectable_intr() to check L1
> > > LAPIC when CPU is running L2.
> > > 
> > > This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
> > > L1 pending events when running L2 and L1 got posted-interrupt")
> > > made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
> > > injectable interrupt.
> > 
> > Right, that answers my question from the first patch.
> > 
> > > Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
> > > injection")
> > > 
> > > Signed-off-by: Liran Alon <liran.alon@oracle.com>
> > > Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> > > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > >   		 * If we are running L2 and L1 has a new pending interrupt
> > >   		 * which can be injected, we should re-evaluate
> > >   		 * what should be done with this new L1 interrupt.
> > > +		 * If L1 intercepts external-interrupts, we should
> > > +		 * exit from L2 to L1. Otherwise, interrupt should be
> > > +		 * delivered directly to L2.
> > > +		 * These cases are handled correctly by KVM_REQ_EVENT.
> > >   		 */
> > >   		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> > >   			kvm_make_request(KVM_REQ_EVENT, vcpu);
> > 
> > 
> > KVM_REQ_EVENT is not needed when we are intercepting the interrupt,
> > because we just want to run vmx_check_nested_events(), but it is needed
> > when delivering L1's interrupt inside L2.
> 
> OK. If vmcs12 intercept external interrupts, I will just call
> kvm_vcpu_exiting_guest_mode(). Otherwise, I will use KVM_REQ_EVENT.
> 
> > 
> > When vmcs12 doesn't intercept external interrupts, I'm thinking we could
> > pass virtual interrupts from vmcs01 and use RVI normally, as well as
> > delivering L1 posted interrupts to L2 without a VM exit.
> > 
> 
> Maybe I am missing something but I am not sure this is possible.
> 
> As the comment I added to vmx_hwapic_irr_update() specifies,
> virtual-interrupt-delivery cannot be enabled if exit on external-interrupts
> is disabled. This can be seen in the following code in
> nested_vmx_check_apicv_controls():
> /*
> * If virtual interrupt delivery is enabled,
> * we must exit on external interrupts.
> */
> if (nested_cpu_has_vid(vmcs12) &&
>     !nested_exit_on_intr(vcpu))
>         return -EINVAL;
> 
> Therefore, in case vmcs12 disables exit on
> external-interrupts then vmcs12 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY must be
> cleared. As prepare_vmcs02() currently takes value of
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY as-is from vmcs12 to vmcs02, it means
> vmcs02 doesn't have virtual-interrupt-delivery in this case. Therefore, in
> this case, we should not update vmcs02 RVI.

Right, vmcs12 can't configure it, but we already have
PIN_BASED_EXT_INTR_MASK in vmcs02, so we'd also enable
VIRTUAL_INTR_DELIVERY in vmcs02 if we detect that vmcs12 disables
PIN_BASED_EXT_INTR_MASK.  And because vmcs12 can't use it, we're free to
configure vmcs02 with data from vmcs01, so interrupt delivery would
behave as if there was no nesting.
(Dislaimer: I haven't verified that it would actually work.)

For now, I think that throwing KVM_REQ_EVENT in this case is perfectly
acceptable and we can optimize later if we notice that it is being
widely used.  (Windows with device guard come to mind.)

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt
  2017-12-06 18:52   ` Radim Krčmář
  2017-12-07  2:29     ` Liran Alon
@ 2017-12-11 22:53     ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2017-12-11 22:53 UTC (permalink / raw)
  To: Radim Krčmář, Liran Alon
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

On 06/12/2017 19:52, Radim Krčmář wrote:
>>  		smp_mb__after_atomic();
>>  		max_irr = kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
> I think the optimization (partly livelock protection) is not worth the
> overhead of two IRR scans for non-nested guests.  Please make
> kvm_apic_update_irr() return both prev_max_irr and max_irr in one pass.

You could also return max_irr in an int*, and give the function a "bool"
return type for max_irr > prev_max_irr.  That is more efficient because
you can do the check in the "if (pir_val)" conditional of
__kvm_apic_update_irr.

Paolo

>> +
>> +		/*
>> +		 * If we are running L2 and L1 has a new pending interrupt
>> +		 * which can be injected, we should re-evaluate
>> +		 * what should be done with this new L1 interrupt.
>> +		 */
>> +		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
>> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
> We don't need anything from KVM_REQ_EVENT and only use it to abort the
> VM entry, kvm_vcpu_exiting_guest_mode() is better for that.
> 
>>  	} else {
>> -		max_irr = kvm_lapic_find_highest_irr(vcpu);
>> +		max_irr = prev_max_irr;
>>  	}
>> +
>>  	vmx_hwapic_irr_update(vcpu, max_irr);
> We also should just inject the interrupt if L2 is run without
> nested_exit_on_intr(), maybe reusing the check in vmx_hwapic_irr_update?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled
  2017-12-07 16:26       ` Radim Krčmář
@ 2017-12-11 22:57         ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2017-12-11 22:57 UTC (permalink / raw)
  To: Radim Krčmář, Liran Alon
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan,
	Konrad Rzeszutek Wilk

On 07/12/2017 17:26, Radim Krčmář wrote:
> 
> Keeping the 'kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active' for
> both is fine by me, I just didn't see a reason to check for presence of
> sync_pir_to_irr and complete_nested_posted_interrupt, because if they
> are NULL, then vcpu->arch.apicv_active must be false.

Actually AMD doesn't have them, does it?  But adding dummies would be
fine for me.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-12-11 22:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-05  8:16 [PATCH v2 0/5]: KVM: nVMX: Fix multiple issues with nested-posted-interrupts Liran Alon
2017-12-05  8:16 ` [PATCH v2 1/5] KVM: nVMX: Remove pi_pending as signal to process nested posted-interrupts Liran Alon
2017-12-05  8:16 ` [PATCH v2 2/5] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt Liran Alon
2017-12-06 18:52   ` Radim Krčmář
2017-12-07  2:29     ` Liran Alon
2017-12-11 22:53     ` Paolo Bonzini
2017-12-05  8:16 ` [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts Liran Alon
2017-12-06 20:20   ` Radim Krčmář
2017-12-07 11:19     ` Liran Alon
2017-12-07 16:41       ` Radim Krčmář
2017-12-05  8:16 ` [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
2017-12-05  8:36   ` Wincy Van
2017-12-06 20:45   ` Radim Krčmář
2017-12-07 11:33     ` Liran Alon
2017-12-07 16:26       ` Radim Krčmář
2017-12-11 22:57         ` Paolo Bonzini
2017-12-05  8:16 ` [PATCH v2 5/5] KVM: nVMX: Wake halted L2 on nested posted-interrupt Liran Alon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.