* [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
@ 2017-11-09 18:27 Liran Alon
  2017-11-10 17:06 ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Liran Alon @ 2017-11-09 18:27 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan

Consider the following scenario:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. CPU B is currently executing L2 guest.
3. vmx_deliver_nested_posted_interrupt() calls
kvm_vcpu_trigger_posted_interrupt() which will note that
vcpu->mode == IN_GUEST_MODE.
4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
IPI, CPU B exits from L2 to L0 during event-delivery
(valid IDT-vectoring-info).
5. CPU A now sends the physical IPI. The IPI is received in the host and
its handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
CPU B continues to run in L0 and reaches vcpu_enter_guest(). As
KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
the L2 guest.
7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
consumed at the next L2 entry!

Another scenario to consider:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. Assume that before CPU A calls kvm_vcpu_trigger_posted_interrupt(),
CPU B is at L0 and is about to resume into L2. Further assume that it is
in vcpu_enter_guest() after the check for KVM_REQ_EVENT.
3. At this point, CPU A calls kvm_vcpu_trigger_posted_interrupt(), which
will note that vcpu->mode != IN_GUEST_MODE, do nothing and return false.
CPU A then sets pi_pending=true and KVM_REQ_EVENT.
4. Now CPU B continues and resumes into the L2 guest without processing
the posted-interrupt until the next L2 entry!

To fix both issues, we just need to change
vmx_deliver_nested_posted_interrupt() to set pi_pending=true and
KVM_REQ_EVENT before calling kvm_vcpu_trigger_posted_interrupt().

It fixes the first scenario by changing step (6): CPU B will now note
that KVM_REQ_EVENT and pi_pending=true are set and therefore process the
nested posted-interrupt.

It fixes the second scenario in one of two ways:
1. If kvm_vcpu_trigger_posted_interrupt() is called after CPU B has changed
vcpu->mode to IN_GUEST_MODE, the physical IPI will be sent and will be
received when the CPU resumes into L2.
2. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B hasn't yet
changed vcpu->mode to IN_GUEST_MODE, then after CPU B changes vcpu->mode
it will call kvm_request_pending(), which will return true and therefore
force another round of vcpu_enter_guest(); that round will note that
KVM_REQ_EVENT and pi_pending=true are set and therefore process the nested
posted-interrupt.

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 95a01609d7ee..07cb2be269b1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5114,14 +5114,14 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 
 	if (is_guest_mode(vcpu) &&
 	    vector == vmx->nested.posted_intr_nv) {
-		/* the PIR and ON have been set by L1. */
-		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
 		/*
 		 * If a posted intr is not recognized by hardware,
 		 * we will accomplish it in the next vmentry.
 		 */
 		vmx->nested.pi_pending = true;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		/* the PIR and ON have been set by L1. */
+		kvm_vcpu_trigger_posted_interrupt(vcpu, true);
 		return 0;
 	}
 	return -1;
-- 
1.9.1


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-09 18:27 [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 Liran Alon
@ 2017-11-10 17:06 ` Paolo Bonzini
  2017-11-10 18:06   ` Radim Krčmář
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-11-10 17:06 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 09/11/2017 19:27, Liran Alon wrote:
> Consider the following scenario:
> 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
> to CPU B via virtual posted-interrupt mechanism.
> 2. CPU B is currently executing L2 guest.
> 3. vmx_deliver_nested_posted_interrupt() calls
> kvm_vcpu_trigger_posted_interrupt() which will note that
> vcpu->mode == IN_GUEST_MODE.
> 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
> IPI, CPU B exits from L2 to L0 during event-delivery
> (valid IDT-vectoring-info).
> 5. CPU A now sends the physical IPI. The IPI is received in host and
> it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
> 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
> CPU B continues to run in L0 and reach vcpu_enter_guest(). As
> KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
> L2 guest.
> 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
> it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
> consumed at next L2 entry!

The bug is real (great debugging!) but I think the fix is wrong.  The
basic issue is that we're not kicking the VCPU, so this should also fix it:

	/* the PIR and ON have been set by L1. */
	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
		/*
		 * If a posted intr is not recognized by hardware,
		 * we will accomplish it in the next vmentry.
		 */
		vmx->nested.pi_pending = true;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
	}

See the comments around the setting of IN_GUEST_MODE, introduced by
commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv
interrupt injection", 2017-02-15).

Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the
ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases.

> Another scenario to consider:
> 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
> to CPU B via virtual posted-interrupt mechanism.
> 2. Assume that before CPU A calls kvm_vcpu_trigger_posted_interrupt(),
> CPU B is at L0 and is about to resume into L2. Further assume that it is
> in vcpu_enter_guest() after check for KVM_REQ_EVENT.
> 3. At this point, CPU A calls kvm_vcpu_trigger_posted_interrupt() which
> will note that vcpu->mode != IN_GUEST_MODE. Therefore, do nothing and
> return false. Then, will set pi_pending=true and KVM_REQ_EVENT.
> 4. Now CPU B continue and resumes into L2 guest without processing
> the posted-interrupt until next L2 entry!

Adding a kick should fix this as well.

Paolo


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 17:06 ` Paolo Bonzini
@ 2017-11-10 18:06   ` Radim Krčmář
  2017-11-10 20:40     ` Liran Alon
  2017-11-10 21:37     ` Paolo Bonzini
  0 siblings, 2 replies; 15+ messages in thread
From: Radim Krčmář @ 2017-11-10 18:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liran Alon, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

2017-11-10 18:06+0100, Paolo Bonzini:
> On 09/11/2017 19:27, Liran Alon wrote:
> > Consider the following scenario:
> > 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
> > to CPU B via virtual posted-interrupt mechanism.
> > 2. CPU B is currently executing L2 guest.
> > 3. vmx_deliver_nested_posted_interrupt() calls
> > kvm_vcpu_trigger_posted_interrupt() which will note that
> > vcpu->mode == IN_GUEST_MODE.
> > 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
> > IPI, CPU B exits from L2 to L0 during event-delivery
> > (valid IDT-vectoring-info).
> > 5. CPU A now sends the physical IPI. The IPI is received in host and
> > it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
> > 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
> > CPU B continues to run in L0 and reach vcpu_enter_guest(). As
> > KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
> > L2 guest.
> > 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
> > it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
> > consumed at next L2 entry!
> 
> The bug is real (great debugging!) but I think the fix is wrong.  The
> basic issue is that we're not kicking the VCPU, so this should also fix it:
> 
> 	/* the PIR and ON have been set by L1. */
> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {

This would still fail on the exiting case.

If one VCPU was just after a VM exit, then the sender would see it
IN_GUEST_MODE, send the posted notification and return true, but the
notification would do nothing and we didn't set vmx->nested.pi_pending =
true, so the interrupt would not get handled until the next posted
notification or nested VM entry.

> 		/*
> 		 * If a posted intr is not recognized by hardware,
> 		 * we will accomplish it in the next vmentry.
> 		 */
> 		vmx->nested.pi_pending = true;
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 		kvm_vcpu_kick(vcpu);
> 	}
> 
> See the comments around the setting of IN_GUEST_MODE, introduced by
> commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv
> interrupt injection", 2017-02-15).
> 
> Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the
> ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases.

I think the fix is correct, but the code is using very awkward
synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT
on top.
If we checked vmx->nested.pi_pending after cli, we could get rid of
KVM_REQ_EVENT and if we want to support nested posted interrupts from
PCI devices, we should explicitly check for pi.on during VM entry.
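
Roughly, the pi_pending-after-cli part could look something like this
(untested sketch; the function name and exact call site are only
illustrative):

	/*
	 * Untested sketch: recheck nested.pi_pending on the VM entry path,
	 * after interrupts are disabled, instead of relying on KVM_REQ_EVENT.
	 */
	static void vmx_check_nested_pi_on_entry(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		if (is_guest_mode(vcpu) && vmx->nested.pi_pending)
			vmx_complete_nested_posted_interrupt(vcpu);
	}

	/*
	 * ... invoked from vcpu_enter_guest() after local_irq_disable() and
	 * vcpu->mode = IN_GUEST_MODE, e.g. next to sync_pir_to_irr(), so a
	 * notification that raced with a VM exit is not lost.
	 */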

Kicks do not seem necessary as the only case where we need the kick is
the same case where kvm_vcpu_trigger_posted_interrupt() should just
deliver the interrupt.


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 18:06   ` Radim Krčmář
@ 2017-11-10 20:40     ` Liran Alon
  2017-11-10 21:24       ` Radim Krčmář
  2017-11-10 21:37     ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Liran Alon @ 2017-11-10 20:40 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 10/11/17 20:06, Radim Krčmář wrote:
> 2017-11-10 18:06+0100, Paolo Bonzini:
>> On 09/11/2017 19:27, Liran Alon wrote:
>>> Consider the following scenario:
>>> 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
>>> to CPU B via virtual posted-interrupt mechanism.
>>> 2. CPU B is currently executing L2 guest.
>>> 3. vmx_deliver_nested_posted_interrupt() calls
>>> kvm_vcpu_trigger_posted_interrupt() which will note that
>>> vcpu->mode == IN_GUEST_MODE.
>>> 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
>>> IPI, CPU B exits from L2 to L0 during event-delivery
>>> (valid IDT-vectoring-info).
>>> 5. CPU A now sends the physical IPI. The IPI is received in host and
>>> it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
>>> 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
>>> CPU B continues to run in L0 and reach vcpu_enter_guest(). As
>>> KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
>>> L2 guest.
>>> 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
>>> it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
>>> consumed at next L2 entry!
>>
>> The bug is real (great debugging!) but I think the fix is wrong.  The
>> basic issue is that we're not kicking the VCPU, so this should also fix it:
>>
>> 	/* the PIR and ON have been set by L1. */
>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>
> This would still fail on the exiting case.
>
> If one VCPU was just after a VM exit, then the sender would see it
> IN_GUEST_MODE, send the posted notification and return true, but the
> notification would do nothing and we didn't set vmx->nested.pi_pending =
> true, so the interrupt would not get handled until the next posted
> notification or nested VM entry.
>
I agree. I was about to write to Paolo the exact same thing.
>> 		/*
>> 		 * If a posted intr is not recognized by hardware,
>> 		 * we will accomplish it in the next vmentry.
>> 		 */
>> 		vmx->nested.pi_pending = true;
>> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
>> 		kvm_vcpu_kick(vcpu);
>> 	}
>>
>> See the comments around the setting of IN_GUEST_MODE, introduced by
>> commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv
>> interrupt injection", 2017-02-15).
>>
>> Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the
>> ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases.
>
> I think the fix is correct, but the code is using very awkward
> synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT
> on top.
> If we checked vmx->nested.pi_pending after cli, we could get rid of
> KVM_REQ_EVENT and if we want to support nested posted interrupts from
> PCI devices, we should explicitly check for pi.on during VM entry.
I like this suggestion. If I understand correctly, these are the changes
I will make for v2 of this patch:

1. I will change vmx_deliver_nested_posted_interrupt() to first set
pi_pending=true and then call kvm_vcpu_trigger_posted_interrupt(). No
need to set KVM_REQ_EVENT and no need to check the
kvm_vcpu_trigger_posted_interrupt() return value (rough sketch below).

2. I will create a kvm_x86_ops->complete_nested_posted_interrupt() that
will be called from vcpu_enter_guest() just after sync_pir_to_irr() is
called, but outside the "if" for kvm_lapic_enabled().

3. I will remove the call to vmx_complete_nested_posted_interrupt() from
vmx_check_nested_events(). That call-site had a bug anyway which I fixed
in another patch-series I posted here. This change will make that patch
(not the series) redundant. See the redundant patch here:
https://marc.info/?l=kvm&m=150989090007281&w=2

I will prepare a v2 patch based on this suggestion and send it.
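
For change (1), I mean something like the following (untested sketch,
just to make sure we mean the same thing; the final v2 may differ):

	/*
	 * Untested sketch of change (1): publish pi_pending before the
	 * trigger; no KVM_REQ_EVENT and no check of the trigger's return
	 * value.  The pending bit is then rechecked on the VM entry path
	 * per change (2).
	 */
	static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
						       int vector)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		if (is_guest_mode(vcpu) &&
		    vector == vmx->nested.posted_intr_nv) {
			/*
			 * If the notification IPI is not recognized by
			 * hardware, the posted interrupt will be completed
			 * on the next VM entry via
			 * complete_nested_posted_interrupt().
			 */
			vmx->nested.pi_pending = true;
			/* The PIR and ON have been set by L1. */
			kvm_vcpu_trigger_posted_interrupt(vcpu, true);
			return 0;
		}
		return -1;
	}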

Thanks!
-Liran
>
> Kicks do not seem necessary as the only case where we need the kick is
> the same case where kvm_vcpu_trigger_posted_interrupt() should just
> deliver the interrupt.
>


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 20:40     ` Liran Alon
@ 2017-11-10 21:24       ` Radim Krčmář
  2017-11-10 21:30         ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Radim Krčmář @ 2017-11-10 21:24 UTC (permalink / raw)
  To: Liran Alon
  Cc: Paolo Bonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

2017-11-10 22:40+0200, Liran Alon:
> On 10/11/17 20:06, Radim Krčmář wrote:
> > 2017-11-10 18:06+0100, Paolo Bonzini:
> > > On 09/11/2017 19:27, Liran Alon wrote:
> > > 		/*
> > > 		 * If a posted intr is not recognized by hardware,
> > > 		 * we will accomplish it in the next vmentry.
> > > 		 */
> > > 		vmx->nested.pi_pending = true;
> > > 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > > 		kvm_vcpu_kick(vcpu);
> > > 	}
> > > 
> > > See the comments around the setting of IN_GUEST_MODE, introduced by
> > > commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv
> > > interrupt injection", 2017-02-15).
> > > 
> > > Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the
> > > ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases.
> > 
> > I think the fix is correct, but the code is using very awkward
> > synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT
> > on top.
> > If we checked vmx->nested.pi_pending after cli, we could get rid of
> > KVM_REQ_EVENT and if we want to support nested posted interrupts from
> > PCI devices, we should explicitly check for pi.on during VM entry.
> I like this suggestion. If I understand correctly, these are the changes I
> will do for v2 of this patch:
> 
> 1. I Will change vmx_deliver_nested_posted_interrupt() to first set
> pi_pending=true and then call kvm_vcpu_trigger_posted_interrupt(). No need
> to set KVM_REQ_EVENT and no need to check
> kvm_vcpu_trigger_posted_interrupt() ret-val.
> 
> 2. I will create a kvm_x86_ops->complete_nested_posted_interrupt() that will
> be called from vcpu_enter_guest() just after sync_pir_to_irr() is called but
> outside the "if" for kvm_lapic_enabled().

I think that we can't have nested posted interrupts without
kvm_lapic_enabled() or kvm_vcpu_apicv_active(), so we could add nested
handling to sync_pir_to_irr and save x86 op.

vmx_complete_nested_posted_interrupt will need some changes as it is not
called with disabled interrupts -- kmap() and probably also
nested_mark_vmcs12_pages_dirty() should be done outside;  I guess that
other code is already taking care of both, but better check.

> 3. I will remove call to vmx_complete_nested_posted_interrupt() from
> vmx_check_nested_events(). That call-site had a bug anyway that I fixed in
> another patch-series I posted here. This change will make that patch (not
> the series) redundant. See redundant patch here:
> https://marc.info/?l=kvm&m=150989090007281&w=2

I'll ignore that patch,

> I will prepare a v2 patch by this suggestion and send it.

thanks.


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 21:24       ` Radim Krčmář
@ 2017-11-10 21:30         ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-11-10 21:30 UTC (permalink / raw)
  To: Radim Krčmář, Liran Alon
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 10/11/2017 22:24, Radim Krčmář wrote:
> I think that we can't have nested posted interrupts without
> kvm_lapic_enabled() or kvm_vcpu_apicv_active(), so we could add nested
> handling to sync_pir_to_irr and save x86 op.

It could be just a single kvm_x86_ops->complete_posted_interrupts()
callback, since there are other callers of sync_pir_to_irr (and those
want it to do host LAPIC processing only).
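
In other words, something like this at the vcpu_enter_guest() call site
(just a sketch of the idea; the callback name is only the one suggested
above):

	/*
	 * Rough idea: a single callback that does the host PIR->IRR sync
	 * and also completes a pending nested posted interrupt, while the
	 * other sync_pir_to_irr() callers keep doing host LAPIC processing
	 * only.
	 */
	kvm_x86_ops->complete_posted_interrupts(vcpu);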

Paolo

> vmx_complete_nested_posted_interrupt will need some changes as it is not
> called with disabled interrupts -- kmap() and probably also
> nested_mark_vmcs12_pages_dirty() should be done outside;  I guess that
> other code is already taking care of both, but better check.


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 18:06   ` Radim Krčmář
  2017-11-10 20:40     ` Liran Alon
@ 2017-11-10 21:37     ` Paolo Bonzini
  2017-11-10 22:30       ` Radim Krčmář
  2017-11-10 22:37       ` Liran Alon
  1 sibling, 2 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-11-10 21:37 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Liran Alon, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 10/11/2017 19:06, Radim Krčmář wrote:
>> 	/* the PIR and ON have been set by L1. */
>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> This would still fail on the exiting case.
> 
> If one VCPU was just after a VM exit, then the sender would see it
> IN_GUEST_MODE, send the posted notification and return true, but the
> notification would do nothing

It would cause *something*---a vmexit because the vector doesn't match
the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
invoked from vmx_handle_external_intr.

Could we detect the vector in vmx_handle_external_intr and set
pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...

Thanks,

Paolo

> and we didn't set vmx->nested.pi_pending =
> true, so the interrupt would not get handled until the next posted
> notification or nested VM entry.
> 


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 21:37     ` Paolo Bonzini
@ 2017-11-10 22:30       ` Radim Krčmář
  2017-11-10 22:47         ` Liran Alon
  2017-11-10 22:37       ` Liran Alon
  1 sibling, 1 reply; 15+ messages in thread
From: Radim Krčmář @ 2017-11-10 22:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liran Alon, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

2017-11-10 22:37+0100, Paolo Bonzini:
> On 10/11/2017 19:06, Radim Krčmář wrote:
> >> 	/* the PIR and ON have been set by L1. */
> >> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> > This would still fail on the exiting case.
> > 
> > If one VCPU was just after a VM exit, then the sender would see it
> > IN_GUEST_MODE, send the posted notification and return true, but the
> > notification would do nothing
> 
> It would cause *something*---a vmexit because the vector doesn't match
> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
> invoked from vmx_handle_external_intr.
> 
> Could we detect the vector in vmx_handle_external_intr and set
> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...

I think it is a trade-off.

We could call KVM from smp_kvm_posted_intr_nested_ipi(), which would
handle the case when the notification arrives after
vmx_handle_external_intr().

It doesn't help performance, because we'd have to avoid a race on VM entry by
possibly needlessly kicking the guest after seeing that it went from
OUTSIDE_GUEST_MODE to IN_GUEST_MODE while we were setting the pending
bit.

But the behavior is slightly better because we can't be scanning PIR
twice for one notification. (If the notification was handled directly by
guest and then also by KVM due to the unconditionally set pending bit.)

Well, I'd better think about it with a fresh mind ...


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 21:37     ` Paolo Bonzini
  2017-11-10 22:30       ` Radim Krčmář
@ 2017-11-10 22:37       ` Liran Alon
  2017-11-16 17:37         ` Radim Krčmář
  1 sibling, 1 reply; 15+ messages in thread
From: Liran Alon @ 2017-11-10 22:37 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 10/11/17 23:37, Paolo Bonzini wrote:
> On 10/11/2017 19:06, Radim Krčmář wrote:
>>> 	/* the PIR and ON have been set by L1. */
>>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>> This would still fail on the exiting case.
>>
>> If one VCPU was just after a VM exit, then the sender would see it
>> IN_GUEST_MODE, send the posted notification and return true, but the
>> notification would do nothing
>
> It would cause *something*---a vmexit because the vector doesn't match
> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
> invoked from vmx_handle_external_intr.
>
> Could we detect the vector in vmx_handle_external_intr and set
> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
>

I actually thought about that before writing this patch, but found an
issue with this approach (which doesn't exist in this v1 patch or in
Radim's suggestion for v2):

Consider the case where the sender sees vcpu->mode==IN_GUEST_MODE and,
before it sends the physical IPI, the dest CPU exits from the guest and
continues in L0 all the way into vcpu_enter_guest(), past the point where
it checks KVM_REQ_EVENT but before it disables interrupts. Then the
sender sends the physical IPI, which is received in host context and
therefore runs smp_kvm_posted_intr_nested_ipi(), which sets KVM_REQ_EVENT
& pi_pending=true. But without Radim's suggestion of checking pi_pending
after interrupts are disabled, this is too late: the dest CPU will not
check these again until the next exit from the L2 guest.

I hope I didn't misunderstand something here :)

> Thanks,
>
> Paolo
>
>> and we didn't set vmx->nested.pi_pending =
>> true, so the interrupt would not get handled until the next posted
>> notification or nested VM entry.
>>
>


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 22:30       ` Radim Krčmář
@ 2017-11-10 22:47         ` Liran Alon
  2017-11-10 22:51           ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Liran Alon @ 2017-11-10 22:47 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 11/11/17 00:30, Radim Krčmář wrote:
> 2017-11-10 22:37+0100, Paolo Bonzini:
>> On 10/11/2017 19:06, Radim Krčmář wrote:
>>>> 	/* the PIR and ON have been set by L1. */
>>>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>>> This would still fail on the exiting case.
>>>
>>> If one VCPU was just after a VM exit, then the sender would see it
>>> IN_GUEST_MODE, send the posted notification and return true, but the
>>> notification would do nothing
>>
>> It would cause *something*---a vmexit because the vector doesn't match
>> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
>> invoked from vmx_handle_external_intr.
>>
>> Could we detect the vector in vmx_handle_external_intr and set
>> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
>> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
>
> I think it is a trade-off.
>
> We could call KVM from smp_kvm_posted_intr_nested_ipi(), which would
> handle the case when the notification arrives after
> vmx_handle_external_intr().
>
> It doesn't help performance, because we'd have to avoid a race on VM entry by
> possibly needlessly kicking the guest after seeing that it went from
> OUTSIDE_GUEST_MODE to IN_GUEST_MODE while we were setting the pending
> bit.
>
> But the behavior is slightly better because we can't be scanning PIR
> twice for one notification. (If the notification was handled directly by
> guest and then also by KVM due to the unconditionally set pending bit.)
>
> Well, I better think about it with fresh mind ...
>

If the notification was handled directly by the guest, the CPU is supposed
to clear the POSTED_INTR_ON bit in pi_desc->control (bit 256 - Outstanding
Notification).

In that case, even though vmx_complete_nested_posted_interrupt() will be
called on the next VM entry, it will just set pi_pending=false and do
nothing because of:
if (!pi_test_and_clear_on(vmx->nested.pi_desc))
     return;

Therefore, there should be no harm in unconditionally setting the
pi_pending bit, and I think Radim's original suggestion should still work
well.

-Liran


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 22:47         ` Liran Alon
@ 2017-11-10 22:51           ` Paolo Bonzini
  2017-11-10 22:59             ` Liran Alon
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-11-10 22:51 UTC (permalink / raw)
  To: Liran Alon
  Cc: Radim Krčmář,
	kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan


Liran Alon <LIRAN.ALON@ORACLE.COM> wrote:
> On 10/11/17 23:37, Paolo Bonzini wrote:
> > On 10/11/2017 19:06, Radim Krčmář wrote:
> >>> 	/* the PIR and ON have been set by L1. */
> >>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> >> This would still fail on the exiting case.
> >>
> >> If one VCPU was just after a VM exit, then the sender would see it
> >> IN_GUEST_MODE, send the posted notification and return true, but the
> >> notification would do nothing
> >
> > It would cause *something*---a vmexit because the vector doesn't match
> > the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
> > invoked from vmx_handle_external_intr.
> >
> > Could we detect the vector in vmx_handle_external_intr and set
> > pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
> > smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
> 
> Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it
> sends the physical IPI, dest CPU exits from guest and continues in L0
> all the way until vcpu_enter_guest() and pass the part it checks for
> KVM_REQ_EVENT but before it disables interrupts. Then sender sends the
> physical IPI which is received in host-context and therefore runs
> smp_kvm_posted_intr_nested_ipi() which sets KVM_REQ_EVENT &
> pi_pending=true. But without Radim's suggestion of checking pi_pending
> after interrupts disabled, this is too late as dest CPU will not check
> these again until next exit from L2 guest.

The destination CPU checks vcpu->requests before entering the guest, so this
case would be handled fine (see Documentation/virtual/kvm/vcpu-requests.rst
for an explanation of IN_GUEST_MODE, vcpu->requests, and PIR.ON).
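
A simplified paraphrase of the relevant ordering in vcpu_enter_guest()
(not the verbatim code):

	local_irq_disable();
	vcpu->mode = IN_GUEST_MODE;

	/*
	 * ->mode is set before ->requests is (re)checked; the real code
	 * uses smp_mb__after_srcu_read_unlock() here.  This pairs with the
	 * barrier on the sender side: either the sender sees IN_GUEST_MODE
	 * and sends the notification IPI, or the check below sees
	 * KVM_REQ_EVENT and we bail out before entering L2.
	 */
	smp_mb();

	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		local_irq_enable();
		/* go around again; the pending request/event gets processed */
	}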

Thanks,

Paolo

> I hope I didn't misunderstand something here :)


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 22:51           ` Paolo Bonzini
@ 2017-11-10 22:59             ` Liran Alon
  0 siblings, 0 replies; 15+ messages in thread
From: Liran Alon @ 2017-11-10 22:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 11/11/17 00:51, Paolo Bonzini wrote:
>
> Liran Alon <LIRAN.ALON@ORACLE.COM> wrote:
>> On 10/11/17 23:37, Paolo Bonzini wrote:
>>> On 10/11/2017 19:06, Radim Krčmář wrote:
>>>>> 	/* the PIR and ON have been set by L1. */
>>>>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>>>> This would still fail on the exiting case.
>>>>
>>>> If one VCPU was just after a VM exit, then the sender would see it
>>>> IN_GUEST_MODE, send the posted notification and return true, but the
>>>> notification would do nothing
>>>
>>> It would cause *something*---a vmexit because the vector doesn't match
>>> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
>>> invoked from vmx_handle_external_intr.
>>>
>>> Could we detect the vector in vmx_handle_external_intr and set
>>> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
>>> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
>>
>> Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it
>> sends the physical IPI, dest CPU exits from guest and continues in L0
>> all the way until vcpu_enter_guest() and pass the part it checks for
>> KVM_REQ_EVENT but before it disables interrupts. Then sender sends the
>> physical IPI which is received in host-context and therefore runs
>> smp_kvm_posted_intr_nested_ipi() which sets KVM_REQ_EVENT &
>> pi_pending=true. But without Radim's suggestion of checking pi_pending
>> after interrupts disabled, this is too late as dest CPU will not check
>> these again until next exit from L2 guest.
>
> The destination CPU checks vcpu->requests before entering the guest, so this
> case would be handled fine (see Documentation/virtual/kvm/vcpu-requests.rst
> for an explanation of IN_GUEST_MODE, vcpu->requests, and PIR.ON).
>

Oh I see. You are right. My bad.
So I think your suggestion also works and may be simpler, but I will
think about it further before sending a v2.

Thanks for clarifying this.
-Liran

> Thanks,
>
> Paolo
>
>> I hope I didn't misunderstand something here :)


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-10 22:37       ` Liran Alon
@ 2017-11-16 17:37         ` Radim Krčmář
  2017-11-16 18:36           ` Liran Alon
  0 siblings, 1 reply; 15+ messages in thread
From: Radim Krčmář @ 2017-11-16 17:37 UTC (permalink / raw)
  To: Liran Alon
  Cc: Paolo Bonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

2017-11-11 00:37+0200, Liran Alon:
> On 10/11/17 23:37, Paolo Bonzini wrote:
> > On 10/11/2017 19:06, Radim Krčmář wrote:
> > > > 	/* the PIR and ON have been set by L1. */
> > > > 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> > > This would still fail on the exiting case.
> > > 
> > > If one VCPU was just after a VM exit, then the sender would see it
> > > IN_GUEST_MODE, send the posted notification and return true, but the
> > > notification would do nothing
> > 
> > It would cause *something*---a vmexit because the vector doesn't match
> > the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
> > invoked from vmx_handle_external_intr.
> > 
> > Could we detect the vector in vmx_handle_external_intr and set
> > pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
> > smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
> > 
> 
> I have actually thought about it before writing this patch. But have found
> an issue with this approach (which doesn't exist in this v1 patch and in
> Radim's suggestion for v2):
> 
> Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it sends
> the physical IPI, dest CPU exits from guest and continues in L0 all the way
> until vcpu_enter_guest() and pass the part it checks for KVM_REQ_EVENT but
> before it disables interrupts. Then sender sends the physical IPI which is
> received in host-context and therefore runs smp_kvm_posted_intr_nested_ipi()
> which sets KVM_REQ_EVENT & pi_pending=true. But without Radim's suggestion
> of checking pi_pending after interrupts disabled, this is too late as dest
> CPU will not check these again until next exit from L2 guest.
> 
> I hope I didn't misunderstand something here :)

kvm_request_pending() would notice KVM_REQ_EVENT and force the VM entry
to restart, so that wouldn't be a problem.

I just realized another complication, though: when the sender looks at
IN_GUEST_MODE and before it sends the IPI, the destination can reschedule
to a different VCPU => the IPI handler cannot use the 'current VCPU' and
we would have to build, for every PCPU, a list of VCPUs potentially
awaiting the notification vector, which makes it strictly worse than just
looking directly at the ON bit, IMO.


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-16 17:37         ` Radim Krčmář
@ 2017-11-16 18:36           ` Liran Alon
  2017-11-16 19:47             ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Liran Alon @ 2017-11-16 18:36 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 16/11/17 19:37, Radim Krčmář wrote:
> 2017-11-11 00:37+0200, Liran Alon:
>> On 10/11/17 23:37, Paolo Bonzini wrote:
>>> On 10/11/2017 19:06, Radim Krčmář wrote:
>>>>> 	/* the PIR and ON have been set by L1. */
>>>>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>>>> This would still fail on the exiting case.
>>>>
>>>> If one VCPU was just after a VM exit, then the sender would see it
>>>> IN_GUEST_MODE, send the posted notification and return true, but the
>>>> notification would do nothing
>>>
>>> It would cause *something*---a vmexit because the vector doesn't match
>>> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
>>> invoked from vmx_handle_external_intr.
>>>
>>> Could we detect the vector in vmx_handle_external_intr and set
>>> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
>>> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
>>>
>>
>> I have actually thought about it before writing this patch. But have found
>> an issue with this approach (which doesn't exist in this v1 patch and in
>> Radim's suggestion for v2):
>>
>> Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it sends
>> the physical IPI, dest CPU exits from guest and continues in L0 all the way
>> until vcpu_enter_guest() and pass the part it checks for KVM_REQ_EVENT but
>> before it disables interrupts. Then sender sends the physical IPI which is
>> received in host-context and therefore runs smp_kvm_posted_intr_nested_ipi()
>> which sets KVM_REQ_EVENT & pi_pending=true. But without Radim's suggestion
>> of checking pi_pending after interrupts disabled, this is too late as dest
>> CPU will not check these again until next exit from L2 guest.
>>
>> I hope I didn't misunderstand something here :)
>
> kvm_request_pending() would notice KVM_REQ_EVENT and forces the VM entry
> to restart, so that wouldn't be a problem.
>
> I just realized another complication, though: when the sender looks at
> IN_GUEST_MODE and before it sends IPI, the destination can reschedule to
> a different VCPU => the IPI handler cannot use the 'current VCPU' and we
> have to build a list of VCPUs potentially awaiting notification vector
> for every PCPU, which makes it strictly worse than just looking directly
> at the ON bit, IMO.
>

It's funny because I actually tried to implement Paolo's suggestion
first and noticed this exact issue when implementing the IPI handler on
the host side.

Therefore, I decided to go with Radim's suggestion. This also makes
nested posted-interrupt handling more symmetrical to non-nested
posted-interrupt handling.

I will post the v2 series here soon, after I complete more internal
review & tests on the series.

BTW, I was wondering why we really need vmx->nested.pi_pending at all.
Can't we just rely on the vmx->nested.pi_desc ON bit, which will be
checked at the start of vmx_complete_nested_posted_interrupt()?

Thanks.
-Liran


* Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
  2017-11-16 18:36           ` Liran Alon
@ 2017-11-16 19:47             ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-11-16 19:47 UTC (permalink / raw)
  To: Liran Alon, Radim Krčmář
  Cc: kvm, jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 16/11/2017 19:36, Liran Alon wrote:
> BTW, I was wondering why do we really need the vmx->nested.pi_pending.
> Can't we just rely on vmx->nested.pi_desc ON bit that will be checked at
> start of vmx_complete_nested_posted_interrupt()?

Probably not, the ON bit should be enough.

Paolo

