* [PATCH 0/4] KVM: nVMX: Fix pending event injection on nested scenarios
@ 2017-11-05 14:07 Liran Alon
From: Liran Alon @ 2017-11-05 14:07 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: jmattson, idan.brown

Hi,

This series of patches aims to fix multiple related issues with how
pending event injection works on nVMX.

The first patch fixes a simple error in the return value of
vmx_check_nested_events(). The next patch relies on it to correctly
determine when an immediate exit from L2 to L1 is required.

The second patch fixes a critical bug which caused L1 to miss an IPI
if it was sent while the destination CPU had exited from L2 to L0 due
to event delivery.

The third patch removes a now redundant signaling of KVM_REQ_EVENT.
This actually masked the issue fixed in the previous patch.

The fourth patch fixes an issue of not always syncing PIR to the L1
Virtual-APIC-Page when KVM_REQ_EVENT is signaled. This patch relies on
vmx_check_nested_events() always being called when KVM_REQ_EVENT is
set, which holds as of the second patch.

Thanks,
-Liran Alon.


* [PATCH 1/4] KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
@ 2017-11-05 14:07 ` Liran Alon
From: Liran Alon @ 2017-11-05 14:07 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, idan.brown, Liran Alon, Konrad Rzeszutek Wilk

vmx_check_nested_events() should return -EBUSY only in case there is a
pending L1 event which requires a VMExit from L2 to L1, but such a
VMExit is currently blocked. Such VMExits are blocked either because
nested_run_pending=1 or because an event was reinjected to L2.
vmx_check_nested_events() should return 0 in case there are no pending
L1 events which require a VMExit from L2 to L1, or if such a VMExit
was already performed internally.

However, the upstream commit which introduced blocking in case an
event was reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix
pending events injection")) contains a bug: it returns -EBUSY even if
there are no pending L1 events which require a VMExit from L2 to L1.

This commit fixes this issue.

Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 95a01609d7ee..ec3bf833a1bc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11034,13 +11034,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long exit_qual;
-
-	if (kvm_event_needs_reinjection(vcpu))
-		return -EBUSY;
+	bool block_nested_events =
+	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 
 	if (vcpu->arch.exception.pending &&
 		nested_vmx_check_exception(vcpu, &exit_qual)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
 		vcpu->arch.exception.pending = false;
@@ -11049,14 +11048,14 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 
 	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
 	    vmx->nested.preemption_timer_expired) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
 		return 0;
 	}
 
 	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
@@ -11072,7 +11071,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 
 	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
 	    nested_exit_on_intr(vcpu)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
 		return 0;
-- 
1.9.1
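
For clarity, here is a minimal, self-contained userspace model of the
post-patch return-value contract. This is only a sketch: the toy_*
names are hypothetical stand-ins for the real kvm_vcpu/vcpu_vmx state,
and only the control flow mirrors the diff above.

#include <errno.h>
#include <stdio.h>
#include <stdbool.h>

struct toy_vcpu {
	bool nested_run_pending;  /* VMRESUME to L2 still in flight */
	bool event_reinjected;    /* event re-queued to L2 via IDT-vectoring info */
	bool l1_event_pending;    /* L1 event requiring an L2->L1 VMExit */
};

static int toy_check_nested_events(struct toy_vcpu *v)
{
	bool block_nested_events =
		v->nested_run_pending || v->event_reinjected;

	if (v->l1_event_pending) {
		if (block_nested_events)
			return -EBUSY;         /* exit needed but blocked */
		v->l1_event_pending = false;   /* VMExit emulated internally */
		return 0;
	}

	/* No pending L1 event: the pre-patch code wrongly returned -EBUSY
	 * here whenever an event had been reinjected. */
	return 0;
}

int main(void)
{
	struct toy_vcpu v = { .event_reinjected = true };

	printf("%d\n", toy_check_nested_events(&v));  /* 0, not -EBUSY */
	v.l1_event_pending = true;
	printf("%d\n", toy_check_nested_events(&v));  /* -16 (-EBUSY) */
	return 0;
}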


* [PATCH 2/4] KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
@ 2017-11-05 14:07 ` Liran Alon
From: Liran Alon @ 2017-11-05 14:07 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, idan.brown, Liran Alon, Konrad Rzeszutek Wilk

In case L2 exits to L0 during event delivery, vmcs02 is filled with
IDT-vectoring info, which vmx_complete_interrupts() makes sure to
reinject before the next resume of L2.

While handling the VMExit in L0, an IPI could be sent by another L1
vCPU to the L1 vCPU which is currently running L2 and has exited to L0.

When L0 reaches vcpu_enter_guest() and calls inject_pending_event(),
it notes that a previous event was reinjected to L2 (via IDT-vectoring
info) and therefore does not check whether there are pending L1 events
which require an exit from L2 to L1. Thus, L0 enters L2 without an
immediate VMExit even though there are pending L1 events!

This commit fixes the issue by making sure to check for pending L1
events even if a previous event was reinjected to L2, and by bailing
out of inject_pending_event() before evaluating a new pending event in
case an event was already reinjected.

The bug was observed with the following setup:
* L0 is a 64-CPU machine which runs KVM.
* L1 is a 16-CPU machine which runs KVM.
* L0 & L1 run with APICv disabled. (Also reproduced with APICv
enabled, but the info below is easier to analyze with APICv disabled.)
* L1 runs a 16-CPU L2 Windows Server 2012 R2 guest.
During L2 boot, L1 hangs completely, and analyzing the hang reveals
that one L1 vCPU is holding KVM's mmu_lock and is waiting forever on
an IPI that it sent to another L1 vCPU, while all other L1 vCPUs are
attempting to grab mmu_lock. Therefore, all L1 vCPUs are stuck forever
(as L1 runs with kernel preemption disabled).

Observing /sys/kernel/debug/tracing/trace_pipe reveals the following
series of events:
(1) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip:
0xfffff802c5dca82f reason: EPT_VIOLATION ext_inf1: 0x0000000000000182
ext_inf2: 0x00000000800000d2 ext_int: 0x00000000 ext_int_err: 0x00000000
(2) qemu-system-x86-19054 [028] kvm_apic_accept_irq: apicid f
vec 252 (Fixed|edge)
(3) qemu-system-x86-19066 [030] kvm_inj_virq: irq 210
(4) qemu-system-x86-19066 [030] kvm_entry: vcpu 15
(5) qemu-system-x86-19066 [030] kvm_exit: reason EPT_VIOLATION
rip 0xffffe00069202690 info 83 0
(6) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip:
0xffffe00069202690 reason: EPT_VIOLATION ext_inf1: 0x0000000000000083
ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
(7) qemu-system-x86-19066 [030] kvm_nested_vmexit_inject: reason:
EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000
ext_int: 0x00000000 ext_int_err: 0x00000000
(8) qemu-system-x86-19066 [030] kvm_entry: vcpu 15

Which can be analyzed as follows:
(1) L2 exits to L0 on EPT_VIOLATION during delivery of vector 0xd2.
Therefore, vmx_complete_interrupts() will set KVM_REQ_EVENT and
reinject a pending interrupt of 0xd2.
(2) L1 sends an IPI of vector 0xfc (CALL_FUNCTION_VECTOR) to
destination vCPU 15. This sets the relevant bit in the LAPIC's IRR and
sets KVM_REQ_EVENT.
(3) L0 reaches vcpu_enter_guest(), which calls inject_pending_event().
It notes that interrupt 0xd2 was reinjected and therefore calls
vmx_inject_irq() and returns, without checking for pending L1 events!
Note that at this point, KVM_REQ_EVENT had already been cleared by
vcpu_enter_guest() before it called inject_pending_event().
(4) L0 resumes L2 without an immediate exit even though there is a
pending L1 event (the IPI pending in the LAPIC's IRR).

We have already reached the buggy scenario, but the events can be
analyzed further:
(5+6) L2 exits to L0 on EPT_VIOLATION, this time not during event
delivery.
(7) L0 decides to forward the VMExit to L1 for further handling.
(8) L0 resumes into L1. Note that because KVM_REQ_EVENT is cleared,
the LAPIC's IRR is not examined, and therefore the IPI is still not
delivered to L1!

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/x86.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..41f199287edf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6379,33 +6379,38 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 	int r;
 
 	/* try to reinject previous events if any */
-	if (vcpu->arch.exception.injected) {
+	if (vcpu->arch.exception.injected)
 		kvm_x86_ops->queue_exception(vcpu);
-		return 0;
-	}
-
 	/*
 	 * Exceptions must be injected immediately, or the exception
 	 * frame will have the address of the NMI or interrupt handler.
 	 */
-	if (!vcpu->arch.exception.pending) {
-		if (vcpu->arch.nmi_injected) {
+	else if (!vcpu->arch.exception.pending) {
+		if (vcpu->arch.nmi_injected)
 			kvm_x86_ops->set_nmi(vcpu);
-			return 0;
-		}
-
-		if (vcpu->arch.interrupt.pending) {
+		else if (vcpu->arch.interrupt.pending)
 			kvm_x86_ops->set_irq(vcpu);
-			return 0;
-		}
 	}
 
+	/*
+	 * Call check_nested_events() even if we reinjected a previous event
+	 * in order for caller to determine if it should require immediate-exit
+	 * from L2 to L1 due to pending L1 events which require exit
+	 * from L2 to L1.
+	 */
 	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
 		r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
 		if (r != 0)
 			return r;
 	}
 
+	/*
+	 * If we reinjected a previous event,
+	 * don't consider a new pending event
+	 */
+	if (kvm_event_needs_reinjection(vcpu))
+		return 0;
+
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
 		trace_kvm_inj_exception(vcpu->arch.exception.nr,
-- 
1.9.1
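
To make the reordering concrete, below is a self-contained toy model
of the new inject_pending_event() flow. The toy_* names are
hypothetical; only the ordering of the three steps (reinject, check
nested events, maybe inject a new event) mirrors the diff above.

#include <errno.h>
#include <stdio.h>
#include <stdbool.h>

struct toy_vcpu {
	bool guest_mode;         /* vCPU is currently running L2 */
	bool event_reinjected;   /* previous event re-queued to L2 */
	bool l1_event_pending;   /* L1 event requiring an L2->L1 exit */
	bool new_event_pending;  /* fresh event waiting to be injected */
};

static int toy_check_nested_events(struct toy_vcpu *v)
{
	if (v->l1_event_pending && v->event_reinjected)
		return -EBUSY;        /* blocked: request immediate exit */
	v->l1_event_pending = false;  /* emulate the L2->L1 VMExit */
	return 0;
}

static int toy_inject_pending_event(struct toy_vcpu *v)
{
	if (v->event_reinjected)
		puts("reinject previous event into L2");

	/* Key change: nested events are checked even after a reinjection. */
	if (v->guest_mode) {
		int r = toy_check_nested_events(v);
		if (r != 0)
			return r;     /* -EBUSY -> immediate exit from L2 */
	}

	/* ...but a reinjected event still suppresses any new event. */
	if (v->event_reinjected)
		return 0;

	if (v->new_event_pending)
		puts("inject new pending event");
	return 0;
}

int main(void)
{
	/* The buggy scenario from the commit message: a reinjected event
	 * plus a pending L1 IPI now yields -EBUSY instead of entering L2. */
	struct toy_vcpu v = {
		.guest_mode = true,
		.event_reinjected = true,
		.l1_event_pending = true,
	};
	printf("r = %d\n", toy_inject_pending_event(&v));  /* r = -16 */
	return 0;
}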


* [PATCH 3/4] KVM: nVMX: Optimization: Don't set KVM_REQ_EVENT when VMExit with nested_run_pending
@ 2017-11-05 14:07 ` Liran Alon
From: Liran Alon @ 2017-11-05 14:07 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, idan.brown, Liran Alon, Konrad Rzeszutek Wilk

When a vCPU runs L2 and there is a pending event that requires an exit
from L2 to L1 while nested_run_pending=1, vcpu_enter_guest() will
request an immediate exit from L2 (see req_immediate_exit).

Since handling of req_immediate_exit now also makes sure to set
KVM_REQ_EVENT, there is no need to also set it in vmx_vcpu_run() when
nested_run_pending=1.

This optimizes the case where VMRESUME was executed by L1 to enter L2
and there are no pending events that require an exit from L2 to L1.
Previously, KVM_REQ_EVENT would have been set unnecessarily.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ec3bf833a1bc..c440df4a1604 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9442,14 +9442,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 			__write_pkru(vmx->host_pkru);
 	}
 
-	/*
-	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
-	 * we did not inject a still-pending event to L1 now because of
-	 * nested_run_pending, we need to re-enable this bit.
-	 */
-	if (vmx->nested.nested_run_pending)
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-
 	vmx->nested.nested_run_pending = 0;
 	vmx->idt_vectoring_info = 0;
 
-- 
1.9.1
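
Schematically, the req_immediate_exit handling that the commit message
relies on behaves like the toy model below (hypothetical names; the
real logic lives in vcpu_enter_guest(), which sets KVM_REQ_EVENT
whenever it forces an immediate exit):

#include <stdio.h>
#include <stdbool.h>

static bool kvm_req_event;  /* stands in for KVM_REQ_EVENT */

static void toy_enter_guest(bool req_immediate_exit)
{
	if (req_immediate_exit)
		kvm_req_event = true;  /* re-evaluate events right after entry */
	printf("enter L2%s\n",
	       req_immediate_exit ? " (forced immediate exit)" : "");
}

int main(void)
{
	/* The case this patch optimizes: plain VMRESUME into L2, no
	 * pending L1 event, so no immediate exit is requested and no
	 * spurious KVM_REQ_EVENT is left set. */
	toy_enter_guest(false);
	printf("KVM_REQ_EVENT = %d\n", kvm_req_event);  /* 0 */

	/* Pending L1 event while nested_run_pending=1: the caller
	 * requested an immediate exit, and that path already sets
	 * KVM_REQ_EVENT, making the removed code redundant. */
	toy_enter_guest(true);
	printf("KVM_REQ_EVENT = %d\n", kvm_req_event);  /* 1 */
	return 0;
}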


* [PATCH 4/4] KVM: nVMX: APICv: Always sync PIR to Virtual-APIC-Page on processing KVM_REQ_EVENT
@ 2017-11-05 14:07 ` Liran Alon
From: Liran Alon @ 2017-11-05 14:07 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, idan.brown, Liran Alon, Konrad Rzeszutek Wilk

Consider the case where L2 exits to L0 during event delivery. In this
case, vmx_complete_interrupts() will see that the IDT-vectoring info
is valid and therefore update the KVM structures for event reinjection
on the next L2 resume.

Assume that before L0 reaches vcpu_enter_guest(), another L1 CPU sends
an IPI via virtual posted interrupts. That CPU will write a new vector
to the destination's nested.pi_desc->pir and then trigger an IPI with
vector nested.posted_intr_nv. This will reach
vmx_deliver_nested_posted_interrupt(), which won't send a physical IPI
(as vcpu->mode != IN_GUEST_MODE) but will instead just set
nested.pi_pending=true and set KVM_REQ_EVENT.

When the destination CPU reaches vcpu_enter_guest(), it will consume
the KVM_REQ_EVENT and call inject_pending_event(), which will call
check_nested_events(). However, because there is an event for
reinjection to L2, vmx_check_nested_events() will return before
calling vmx_complete_nested_posted_interrupt(), and therefore neither
the L1 virtual-APIC page nor vmcs02's RVI is updated.

Assume that at this point we exit L2 and some L1 interrupt is raised
afterwards (for example, another L1 CPU IPI). We will again reach
vcpu_enter_guest() and call check_nested_events(), which will exit
from L2 to L1 due to the pending interrupt and return, again without
calling vmx_complete_nested_posted_interrupt()! At this point
KVM_REQ_EVENT has already been consumed and therefore cleared.

When L1 again VMRESUMEs into L2, it will run L2 without an updated
virtual-APIC page, with a stale RVI and with PIR.ON set, which is of
course a bug...

Fix this entire complex issue by simply making
vmx_check_nested_events() always call
vmx_complete_nested_posted_interrupt().

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c440df4a1604..d1981620c13a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11029,6 +11029,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 	bool block_nested_events =
 	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 
+	vmx_complete_nested_posted_interrupt(vcpu);
+
 	if (vcpu->arch.exception.pending &&
 		nested_vmx_check_exception(vcpu, &exit_qual)) {
 		if (block_nested_events)
@@ -11069,7 +11071,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	vmx_complete_nested_posted_interrupt(vcpu);
 	return 0;
 }
 
-- 
1.9.1
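
A toy model of why moving the call matters (hypothetical toy_* names;
the real function is vmx_complete_nested_posted_interrupt() in
arch/x86/kvm/vmx.c): once the PIR sync runs before any early return, a
blocked event evaluation can no longer leave PIR.ON set with a stale
virtual-APIC-page IRR and RVI.

#include <errno.h>
#include <stdio.h>
#include <stdbool.h>

struct toy_vcpu {
	bool pi_pending;     /* nested.pi_pending */
	bool virr_in_sync;   /* virtual-APIC-page IRR / vmcs02 RVI updated */
	bool block_events;   /* reinjection or nested_run_pending */
	bool l1_event_pending;
};

static void toy_complete_nested_posted_interrupt(struct toy_vcpu *v)
{
	if (v->pi_pending) {
		v->pi_pending = false;
		v->virr_in_sync = true;  /* PIR ORed into vIRR, RVI raised */
	}
}

static int toy_check_nested_events(struct toy_vcpu *v)
{
	toy_complete_nested_posted_interrupt(v);  /* now runs first */

	if (v->l1_event_pending && v->block_events)
		return -EBUSY;  /* early return no longer skips the sync */
	return 0;
}

int main(void)
{
	struct toy_vcpu v = {
		.pi_pending = true,
		.block_events = true,
		.l1_event_pending = true,
	};
	toy_check_nested_events(&v);
	printf("virr_in_sync = %d\n", v.virr_in_sync);  /* 1 */
	return 0;
}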


* Re: [PATCH 0/4] KVM: nVMX: Fix pending event injection on nested scenarios
@ 2017-11-07 14:25 ` Paolo Bonzini
From: Paolo Bonzini @ 2017-11-07 14:25 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm; +Cc: jmattson, idan.brown

On 05/11/2017 15:07, Liran Alon wrote:
> Hi,
> 
> This series of patches aims to fix multiple related issues with how
> pending event injection works on nVMX.
> 
> The first patch fixes a simple error in the return value of
> vmx_check_nested_events(). The next patch relies on it to correctly
> determine when an immediate exit from L2 to L1 is required.
> 
> The second patch fixes a critical bug which caused L1 to miss an IPI
> if it was sent while the destination CPU had exited from L2 to L0 due
> to event delivery.
> 
> The third patch removes a now redundant signaling of KVM_REQ_EVENT.
> This actually masked the issue fixed in the previous patch.
> 
> The fourth patch fixes an issue of not always syncing PIR to the L1
> Virtual-APIC-Page when KVM_REQ_EVENT is signaled. This patch relies on
> vmx_check_nested_events() always being called when KVM_REQ_EVENT is
> set, which holds as of the second patch.

With all the discussions on the other series, I didn't reply here.  I
haven't commented yet because I want to see first of all whether we have
coverage in kvm-unit-tests of the issue that the first two patches fix
(i.e., does something break in kvm-unit-tests if I only apply patch 3).

Also, I had some incomplete work that eliminates
vmx_inject_page_fault_nested (which IMO is only papering over bugs,
because anything it does should be covered in theory by
nested_vmx_inject_exception_vmexit).  I'm curious if patches 1-2 help
there too.

However, all this is going to be work for 4.16.

Paolo


* Re: [PATCH 0/4] KVM: nVMX: Fix pending event injection on nested scenarios
@ 2017-11-08  0:40   ` Liran Alon
From: Liran Alon @ 2017-11-08  0:40 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: jmattson, idan.brown, wanpeng.li, bsd



On 07/11/17 16:25, Paolo Bonzini wrote:
> On 05/11/2017 15:07, Liran Alon wrote:
>> Hi,
>>
>> This series of patches aims to fix multiple related issues with how
>> pending event injection works on nVMX.
>>
>> The first patch fixes a simple error in the return value of
>> vmx_check_nested_events(). The next patch relies on it to correctly
>> determine when an immediate exit from L2 to L1 is required.
>>
>> The second patch fixes a critical bug which caused L1 to miss an IPI
>> if it was sent while the destination CPU had exited from L2 to L0 due
>> to event delivery.
>>
>> The third patch removes a now redundant signaling of KVM_REQ_EVENT.
>> This actually masked the issue fixed in the previous patch.
>>
>> The fourth patch fixes an issue of not always syncing PIR to the L1
>> Virtual-APIC-Page when KVM_REQ_EVENT is signaled. This patch relies on
>> vmx_check_nested_events() always being called when KVM_REQ_EVENT is
>> set, which holds as of the second patch.
>
> With all the discussions on the other series, I didn't reply here.  I
> haven't commented yet because I want to see first of all whether we have
> coverage in kvm-unit-tests of the issue that the first two patches fix
> (i.e., does something break in kvm-unit-tests if I only apply patch 3).
I don't think there is a kvm-unit-test which covers this: looking at 
x86/unittests.cfg at all the tests which have an "smp" parameter, it 
seems none of them relates to nVMX. Because the nature of the bug 
requires at least 2 L1 CPUs, I think this is not covered.
I actually have a couple of additional patches that relate to 
race conditions in nVMX. I will send them soon.

I thought about writing a unit-test for this issue, but it is a complex 
race condition and every test I thought about would be probabilistic: 
you need an L1 IPI to be sent to the destination CPU exactly at the 
moment it exits from L2 to L0 due to event delivery, but not because of 
L1 (you want the VMExit not to be forwarded from L0 to L1 but rather to 
resume again into L2; for example, by making sure that the L2 IDT 
descriptor is unmapped in nEPT but mapped correctly in the L1 EPT).

I would be happy for ideas on how to make such a unit-test 
non-probabilistic (besides repeating it for a large number of 
iterations in the hope that the chance of not reproducing is very 
small).

The unit-test I thought about goes something like this (see the sketch 
below):
1. CPU A sends X L1 IPIs to CPU B. The IPI handler on CPU B increments 
a variable which counts how many times the IPI handler actually ran on 
the destination CPU. Before sending each IPI, CPU A busy-loops until it 
sees the variable incremented (an L1 IPI ack).
2. CPU B runs a loop of Y iterations of the following:
a) Set up an L1 EPT that maps memory 1-to-1, including the memory that 
will be used for the L2 IDT.
b) Before entering L2, execute INVEPT on the L2 IDT memory so that 
L0's nEPT removes its mapping for it (as done for the shadow MMU when 
running the "invlpg" instruction).
c) Enter L2 and execute an "int $x" instruction. This should exit from 
L2 to L0 due to event delivery. After L2's interrupt handler returns, 
exit to L1 (by using vmcall) and repeat from step a.
>
> Also, I had some incomplete work that eliminates
> vmx_inject_page_fault_nested (which IMO is only papering over bugs,
> because anything it does should be covered in theory by
> nested_vmx_inject_exception_vmexit).  I'm curious if patches 1-2 help
> there too.
I'm probably missing something but I fail to see how patches 1-2 could 
help there. Can you explain?
>
> However, all this is going to be work for 4.16.
I am sorry for the long response, but I think a bit of context here 
will help the discussion a lot.

Oracle Ravello (where I work) is a cloud provider running on top of 
other cloud providers. When we run on top of the Oracle OCI cloud, we 
run customer VMs using KVM nested virtualization.
The issue at hand was discovered in *real production* use-cases and 
was rare until we managed to find a simple single use-case which 
reproduces ~100% of the time (I described the setup which reproduces 
it in the commit message in the hope that others can reproduce it 
as well).
After the issue was reproduced, we were able to pinpoint the root 
cause and verify that the fix works (as it reproduced ~100% of the 
time). It was also clear from the KVM trace-pipe that this was indeed 
the issue.

In addition, in inject_pending_event() one can see a comment referring 
to https://lkml.org/lkml/2014/7/2/60.
Following the thread, one can see that an attempt was made to only set 
KVM_REQ_EVENT when an L1 event was actually blocked, instead of when 
nested_run_pending=true. However, Bandan Das then noted that this 
change caused L1 to get stuck in smp_call_function_many(). In 
addition, looking at the bug discussion that thread claims to fix 
(https://bugzilla.kernel.org/show_bug.cgi?id=72381), one can see in 
the attached L1 serial log the exact stack traces that we saw in the 
issue I fixed: all L1 vCPUs attempting to grab mmu_lock while one CPU 
is holding mmu_lock and is stuck waiting for an ACK on some IPI.
At the end of the patch discussion, it was decided to fix the issue 
(correctly, I must say) by adding an extra call to 
check_nested_events() in inject_pending_event(). However, the original 
setting of KVM_REQ_EVENT when nested_run_pending=true remained, as 
no one in the thread was able to pinpoint the root cause of that bug.

I believe that my commit solves exactly the bug mentioned there by 
Bandan, and therefore the third patch in this series now removes the 
setting of KVM_REQ_EVENT on nested_run_pending=true from the code.

Adding to CC the guys who worked on the above-mentioned patch. They 
probably have smart things to say :)
(+Wanpeng Li, +Bandan Das)
>
> Paolo
>


* Re: [PATCH 1/4] KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
@ 2017-11-10 21:44   ` Radim Krčmář
From: Radim Krčmář @ 2017-11-10 21:44 UTC (permalink / raw)
  To: Liran Alon; +Cc: pbonzini, kvm, jmattson, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 16:07+0200, Liran Alon:
> vmx_check_nested_events() should return -EBUSY only in case there is a
> pending L1 event which requires a VMExit from L2 to L1, but such a
> VMExit is currently blocked. Such VMExits are blocked either because
> nested_run_pending=1 or because an event was reinjected to L2.
> vmx_check_nested_events() should return 0 in case there are no pending
> L1 events which require a VMExit from L2 to L1, or if such a VMExit
> was already performed internally.
> 
> However, the upstream commit which introduced blocking in case an
> event was reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix
> pending events injection")) contains a bug: it returns -EBUSY even if
> there are no pending L1 events which require a VMExit from L2 to L1.
> 
> This commit fixes this issue.
> 
> Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection")
> 
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---

Applied, thanks.


* Re: [PATCH 2/4] KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
@ 2017-11-10 23:26   ` Paolo Bonzini
From: Paolo Bonzini @ 2017-11-10 23:26 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm; +Cc: jmattson, idan.brown, Konrad Rzeszutek Wilk

On 05/11/2017 15:07, Liran Alon wrote:
> +	/*
> +	 * If we reinjected a previous event,
> +	 * don't consider a new pending event
> +	 */
> +	if (kvm_event_needs_reinjection(vcpu))
> +		return 0;
> +

Could you end up with

                        WARN_ON(vcpu->arch.exception.pending);

in vcpu_enter_guest after returning 0 here?

Maybe it would be safer to return a non-zero value so that the caller
sets req_immediate_exit = true.  But I haven't really thought through
the consequences.

Thanks,

Paolo


* Re: [PATCH 2/4] KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
@ 2017-11-11  1:44     ` Liran Alon
From: Liran Alon @ 2017-11-11  1:44 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: jmattson, idan.brown, Konrad Rzeszutek Wilk



On 11/11/17 01:26, Paolo Bonzini wrote:
> On 05/11/2017 15:07, Liran Alon wrote:
>> +	/*
>> +	 * If we reinjected a previous event,
>> +	 * don't consider a new pending event
>> +	 */
>> +	if (kvm_event_needs_reinjection(vcpu))
>> +		return 0;
>> +
>
> Could you end up with
>
>                          WARN_ON(vcpu->arch.exception.pending);
>
> in vcpu_enter_guest after returning 0 here?
>
> Maybe it would be safer to return a non-zero value so that the caller
> sets req_immediate_exit = true.  But I haven't really thought through
> the consequences.

The only difference before and after this patch *should* have been 
that now, if L1 has a pending event (as determined by 
vmx_check_nested_events()), a value of -EBUSY is returned and an 
immediate exit is requested, even if a reinjection had occurred.
If that is not the case, the previous code and this code should return 
0 in exactly the same cases.

*However*, if exception.pending=true and 
nmi_injected/interrupt.pending=true, then the previous code would have 
continued through inject_pending_event() while this code returns too 
soon, indeed triggering the above-mentioned warning.

Therefore I think you have found here a bug that was missed in the 
review chain for some reason and wasn't observed in tests... I will 
investigate why the tests didn't catch this. Sorry for this.
It seems that this patch's approach would have worked on a version 
before commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has 
not yet been injected") but breaks afterwards...

I will fix and re-run all tests.

Thanks,

-Liran

>
> Thanks,
>
> Paolo
>


* Re: [PATCH 2/4] KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
@ 2017-11-13  8:38       ` Paolo Bonzini
From: Paolo Bonzini @ 2017-11-13  8:38 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm; +Cc: jmattson, idan.brown, Konrad Rzeszutek Wilk

On 11/11/2017 02:44, Liran Alon wrote:
> 
> Therefore I think you have found here a bug that was missed in the
> review chain for some reason and wasn't observed in tests... I will
> investigate why the tests didn't catch this. Sorry for this.

Probably because the exception.pending case is very rare right now.  It
will become more common however if I can get rid of
kvm_inject_page_fault_nested.  This more or less also answers your
question as to why this patch is related to the removal of
kvm_inject_page_fault_nested.

Thanks,

Paolo

