linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] KVM: VMX: Posted Interrupts fixes
@ 2019-11-11 17:20 Joao Martins
  2019-11-11 17:20 ` [PATCH v2 1/3] KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts Joao Martins
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Joao Martins @ 2019-11-11 17:20 UTC (permalink / raw)
  To: kvm
  Cc: Joao Martins, linux-kernel, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

Hey,

This mini-series addresses two bugs plus a small cleanup: 

 1) Considering PID.PIR as opposed to just PID.ON for pending interrupt checks
    in the direct yield code path;
 
 2) Not changing NDST in vmx_vcpu_pi_load(), which addresses a regression. 
    Prior to this, we would observe Win2K12 guests hanging momentarily.
 
 3) Small cleanup to streamline PIR checks with a common helper.

The cleanup is done last to simplify backports.

	Joao

v2:
* Fixed missing Tb/Rb tags;
* Fixed broken Sob chains (on first two patches)
* Only test PID.PIR if PID.SN=1 (third patch)

Joao Martins (3):
  KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
  KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
  KVM: VMX: Introduce pi_is_pir_empty() helper

 arch/x86/kvm/vmx/vmx.c | 21 +++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h | 11 +++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/3] KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
  2019-11-11 17:20 [PATCH v2 0/3] KVM: VMX: Posted Interrupts fixes Joao Martins
@ 2019-11-11 17:20 ` Joao Martins
  2019-11-11 17:20 ` [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU Joao Martins
  2019-11-11 17:20 ` [PATCH v2 3/3] KVM: VMX: Introduce pi_is_pir_empty() helper Joao Martins
  2 siblings, 0 replies; 6+ messages in thread
From: Joao Martins @ 2019-11-11 17:20 UTC (permalink / raw)
  To: kvm
  Cc: Joao Martins, linux-kernel, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

Commit 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
introduced vmx_dy_apicv_has_pending_interrupt() in order to determine
if a vCPU have a pending posted interrupt. This routine is used by
kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule
on pCPU instead of a vCPU doing busy loop.

vmx_dy_apicv_has_pending_interrupt() determines if a
vCPU has a pending posted interrupt solely based on PID.ON. However,
when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause
raised posted interrupts to only set bit in PID.PIR without setting
PID.ON (and without sending notification vector), as depicted in VT-d
manual section 5.2.3 "Interrupt-Posting Hardware Operation".

Therefore, checking PID.ON is insufficient to determine if a vCPU has
pending posted interrupts and instead we should also check if there is
some bit set on PID.PIR if PID.SN=1.

Fixes: 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
v2:
* Fixed broken Sob Chain;
* Add missing tags;
* Only test PID.PIR if PID.SN=1 and reflect that in commit message last
paragraph;
---
 arch/x86/kvm/vmx/vmx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 31ce6bc2c371..4c7d2935f7ec 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6141,7 +6141,11 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 
 static bool vmx_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
-	return pi_test_on(vcpu_to_pi_desc(vcpu));
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	return pi_test_on(pi_desc) ||
+		(pi_test_sn(pi_desc) &&
+		!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS));
 }
 
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
  2019-11-11 17:20 [PATCH v2 0/3] KVM: VMX: Posted Interrupts fixes Joao Martins
  2019-11-11 17:20 ` [PATCH v2 1/3] KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts Joao Martins
@ 2019-11-11 17:20 ` Joao Martins
  2019-11-19 11:36   ` Wanpeng Li
  2019-11-11 17:20 ` [PATCH v2 3/3] KVM: VMX: Introduce pi_is_pir_empty() helper Joao Martins
  2 siblings, 1 reply; 6+ messages in thread
From: Joao Martins @ 2019-11-11 17:20 UTC (permalink / raw)
  To: kvm
  Cc: Joao Martins, linux-kernel, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
(wakeup_handler()) is responsible to kick (unblock) any vCPU on that
linked list that now has pending posted interrupts.

While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
PID.SN but will also *overwrite PID.NDST to this different pCPU*.
Instead of keeping it with original pCPU which vCPU had entered block
phase on.

This results in an issue because when a posted interrupt is delivered, as
the wakeup_handler() will be executed and fail to find blocked vCPU on
its per pCPU linked list of all vCPUs that are blocked on this pCPU.
Which is due to the vCPU being placed on a *different* per pCPU
linked list i.e. the original pCPU in which it entered block phase.

The regression is introduced by commit c112b5f50232 ("KVM: x86:
Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
avoiding changing PID.NDST when loading a blocked vCPU.

Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Tested-by: Nathan Ni <nathan.ni@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
v2:
* Fixed wrong Sob chain;
* Add missing Tb;
---
 arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  6 ++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4c7d2935f7ec..ccd06fdfbb76 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1274,6 +1274,18 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
 		return;
 
+	/*
+	 * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
+	 * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
+	 * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
+	 * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up
+	 * correctly.
+	 */
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
+		pi_clear_sn(pi_desc);
+		goto after_clear_sn;
+	}
+
 	/* The full case.  */
 	do {
 		old.control = new.control = pi_desc->control;
@@ -1289,6 +1301,8 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	} while (cmpxchg64(&pi_desc->control, old.control,
 			   new.control) != old.control);
 
+after_clear_sn:
+
 	/*
 	 * Clear SN before reading the bitmap.  The VT-d firmware
 	 * writes the bitmap and reads SN atomically (5.2.3 in the
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bee16687dc0b..1e32ab54fc2d 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -373,6 +373,12 @@ static inline void pi_clear_on(struct pi_desc *pi_desc)
 		(unsigned long *)&pi_desc->control);
 }
 
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	clear_bit(POSTED_INTR_SN,
+		(unsigned long *)&pi_desc->control);
+}
+
 static inline int pi_test_on(struct pi_desc *pi_desc)
 {
 	return test_bit(POSTED_INTR_ON,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/3] KVM: VMX: Introduce pi_is_pir_empty() helper
  2019-11-11 17:20 [PATCH v2 0/3] KVM: VMX: Posted Interrupts fixes Joao Martins
  2019-11-11 17:20 ` [PATCH v2 1/3] KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts Joao Martins
  2019-11-11 17:20 ` [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU Joao Martins
@ 2019-11-11 17:20 ` Joao Martins
  2 siblings, 0 replies; 6+ messages in thread
From: Joao Martins @ 2019-11-11 17:20 UTC (permalink / raw)
  To: kvm
  Cc: Joao Martins, linux-kernel, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

Streamline the PID.PIR check and change its call sites to use
the newly added helper.

Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 ++---
 arch/x86/kvm/vmx/vmx.h | 5 +++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ccd06fdfbb76..74048629f0d6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1311,7 +1311,7 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	 */
 	smp_mb__after_atomic();
 
-	if (!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS))
+	if (!pi_is_pir_empty(pi_desc))
 		pi_set_on(pi_desc);
 }
 
@@ -6158,8 +6158,7 @@ static bool vmx_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
 	return pi_test_on(pi_desc) ||
-		(pi_test_sn(pi_desc) &&
-		!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS));
+		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
 static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1e32ab54fc2d..5a0f34b1e226 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -355,6 +355,11 @@ static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static inline bool pi_is_pir_empty(struct pi_desc *pi_desc)
+{
+	return bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS);
+}
+
 static inline void pi_set_sn(struct pi_desc *pi_desc)
 {
 	set_bit(POSTED_INTR_SN,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
  2019-11-11 17:20 ` [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU Joao Martins
@ 2019-11-19 11:36   ` Wanpeng Li
  2019-11-19 14:44     ` Joao Martins
  0 siblings, 1 reply; 6+ messages in thread
From: Wanpeng Li @ 2019-11-19 11:36 UTC (permalink / raw)
  To: Joao Martins
  Cc: kvm, LKML, Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

On Tue, 12 Nov 2019 at 01:23, Joao Martins <joao.m.martins@oracle.com> wrote:
>
> While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
> will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be

How can this happen? See the prepare_to_swait_exlusive() in
kvm_vcpu_block(), the task will be set in TASK_INTERRUPTIBLE state,
kvm_sched_out will set vcpu->preempted to true iff current->state ==
TASK_RUNNING.

    Wanpeng

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
  2019-11-19 11:36   ` Wanpeng Li
@ 2019-11-19 14:44     ` Joao Martins
  0 siblings, 0 replies; 6+ messages in thread
From: Joao Martins @ 2019-11-19 14:44 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kvm, LKML, Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Liran Alon, Jag Raman

On 11/19/19 11:36 AM, Wanpeng Li wrote:
> On Tue, 12 Nov 2019 at 01:23, Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
>> will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
> 
> How can this happen? See the prepare_to_swait_exlusive() in
> kvm_vcpu_block(), the task will be set in TASK_INTERRUPTIBLE state,
> kvm_sched_out will set vcpu->preempted to true iff current->state ==
> TASK_RUNNING.

You're right.

But preemption (thus setting PID.SN) can still happen before vcpu_block(), or
otherwise scheduled out if we are already in vcpu_block() (with task in
TASK_INTERRUPTIBLE). The right term we should have written in that sentence
above would have been 'scheduled out' and drop the vcpu_block mention and it
would encompass both cases. A better sentence would perhaps be:

"While vCPU is blocked (or about to block) it may be scheduled out which will
cause vmx_vcpu_pi_put() to be called."

But setting or not preempted/PID.SN doesn't change the rest and was mentioned
for completeness.

	Joao

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-19 14:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-11 17:20 [PATCH v2 0/3] KVM: VMX: Posted Interrupts fixes Joao Martins
2019-11-11 17:20 ` [PATCH v2 1/3] KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts Joao Martins
2019-11-11 17:20 ` [PATCH v2 2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU Joao Martins
2019-11-19 11:36   ` Wanpeng Li
2019-11-19 14:44     ` Joao Martins
2019-11-11 17:20 ` [PATCH v2 3/3] KVM: VMX: Introduce pi_is_pir_empty() helper Joao Martins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).