linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running
@ 2023-08-08 23:31 Sean Christopherson
  2023-08-08 23:31 ` [PATCH 1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-08 23:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, dengqiao . joey, Alejandro Jimenez,
	Joao Martins, Maxim Levitsky, Suravee Suthikulpanit

Fix a bug where KVM doesn't set the pCPU affinity for running vCPUs when
updating IRTE routing.  Not setting the pCPU means the IOMMU will signal
the wrong pCPU's doorbell until the vCPU goes through a put+load cycle.

I waffled for far too long between making this one patch or two.  Moving
the lock doesn't make all that much sense as a standalone patch, but in the
end, I decided that isolating the locking change would be useful in the
unlikely event that it breaks something.  If anyone feels strongly about
making this a single patch, I have no objection to squashing these together.

Sean Christopherson (2):
  KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID
    entry
  KVM: SVM: Set target pCPU during IRTE update if target vCPU is running

 arch/x86/kvm/svm/avic.c | 59 +++++++++++++++++++++++++++++++++++------
 1 file changed, 51 insertions(+), 8 deletions(-)


base-commit: 240f736891887939571854bd6d734b6c9291f22e
-- 
2.41.0.640.ga95def55d0-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
  2023-08-08 23:31 [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running Sean Christopherson
@ 2023-08-08 23:31 ` Sean Christopherson
  2023-08-08 23:31 ` [PATCH 2/2] KVM: SVM: Set target pCPU during IRTE update if target vCPU is running Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-08 23:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, dengqiao . joey, Alejandro Jimenez,
	Joao Martins, Maxim Levitsky, Suravee Suthikulpanit

Hoist the acquisition of ir_list_lock from avic_update_iommu_vcpu_affinity()
to its two callers, avic_vcpu_load() and avic_vcpu_put(), specifically to
encapsulate the write to the vCPU's entry in the AVIC Physical ID table.
This will allow a future fix to pull information from the Physical ID entry
when updating the IRTE, without potentially consuming stale information,
i.e. without racing with the vCPU being (un)loaded.

Add a comment to call out that ir_list_lock does NOT protect against
multiple writers, specifically that reading the Physical ID entry in
avic_vcpu_put() outside of the lock is safe.

To preserve some semblance of independence from ir_list_lock, keep the
READ_ONCE() in avic_vcpu_load() even though acquiring the spinlock
effectively ensures the load(s) will be generated after acquiring the
lock.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cfc8ab773025..8e041b215ddb 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -986,10 +986,11 @@ static inline int
 avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 {
 	int ret = 0;
-	unsigned long flags;
 	struct amd_svm_iommu_ir *ir;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	lockdep_assert_held(&svm->ir_list_lock);
+
 	if (!kvm_arch_has_assigned_device(vcpu->kvm))
 		return 0;
 
@@ -997,19 +998,15 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 	 * Here, we go through the per-vcpu ir_list to update all existing
 	 * interrupt remapping table entry targeting this vcpu.
 	 */
-	spin_lock_irqsave(&svm->ir_list_lock, flags);
-
 	if (list_empty(&svm->ir_list))
-		goto out;
+		return 0;
 
 	list_for_each_entry(ir, &svm->ir_list, node) {
 		ret = amd_iommu_update_ga(cpu, r, ir->data);
 		if (ret)
-			break;
+			return ret;
 	}
-out:
-	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
-	return ret;
+	return 0;
 }
 
 void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -1017,6 +1014,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	u64 entry;
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long flags;
 
 	lockdep_assert_preemption_disabled();
 
@@ -1033,6 +1031,8 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_vcpu_is_blocking(vcpu))
 		return;
 
+	spin_lock_irqsave(&svm->ir_list_lock, flags);
+
 	entry = READ_ONCE(*(svm->avic_physical_id_cache));
 	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
 
@@ -1042,25 +1042,40 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
+
+	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 }
 
 void avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	u64 entry;
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long flags;
 
 	lockdep_assert_preemption_disabled();
 
+	/*
+	 * Note, reading the Physical ID entry outside of ir_list_lock is safe
+	 * as only the pCPU that has loaded (or is loading) the vCPU is allowed
+	 * to modify the entry, and preemption is disabled.  I.e. the vCPU
+	 * can't be scheduled out and thus avic_vcpu_{put,load}() can't run
+	 * recursively.
+	 */
 	entry = READ_ONCE(*(svm->avic_physical_id_cache));
 
 	/* Nothing to do if IsRunning == '0' due to vCPU blocking. */
 	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
 		return;
 
+	spin_lock_irqsave(&svm->ir_list_lock, flags);
+
 	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
 
 	entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+
+	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+
 }
 
 void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
-- 
2.41.0.640.ga95def55d0-goog



* [PATCH 2/2] KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
  2023-08-08 23:31 [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running Sean Christopherson
  2023-08-08 23:31 ` [PATCH 1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry Sean Christopherson
@ 2023-08-08 23:31 ` Sean Christopherson
  2023-08-09 10:30 ` [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if " Joao Martins
  2023-08-18  0:12 ` Sean Christopherson
  3 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-08 23:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, dengqiao . joey, Alejandro Jimenez,
	Joao Martins, Maxim Levitsky, Suravee Suthikulpanit

Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU.  KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delivered to the wrong pCPU until the vCPU goes through a
put+load cycle (which might very well "never" happen for the lifetime of
the VM), ultimately resulting in lost IRQs in the guest.

To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock.  Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.

Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey@bytedance.com>
Cc: stable@vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8e041b215ddb..2092db892d7d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -791,6 +791,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 	int ret = 0;
 	unsigned long flags;
 	struct amd_svm_iommu_ir *ir;
+	u64 entry;
 
 	/**
 	 * In some cases, the existing irte is updated and re-set,
@@ -824,6 +825,18 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 	ir->data = pi->ir_data;
 
 	spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+	/*
+	 * Update the target pCPU for IOMMU doorbells if the vCPU is running.
+	 * If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
+	 * will update the pCPU info when the vCPU is awakened and/or
+	 * scheduled in.
+	 * See also avic_vcpu_load().
+	 */
+	entry = READ_ONCE(*(svm->avic_physical_id_cache));
+	if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+		amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
+				    true, pi->ir_data);
+
 	list_add(&ir->node, &svm->ir_list);
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 out:
@@ -1031,6 +1044,13 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_vcpu_is_blocking(vcpu))
 		return;
 
+	/*
+	 * Grab the per-vCPU interrupt remapping lock even if the VM doesn't
+	 * _currently_ have assigned devices, as that can change.  Holding
+	 * ir_list_lock ensures that either svm_ir_list_add() will consume
+	 * up-to-date entry information, or that this task will wait until
+	 * svm_ir_list_add() completes to set the new target pCPU.
+	 */
 	spin_lock_irqsave(&svm->ir_list_lock, flags);
 
 	entry = READ_ONCE(*(svm->avic_physical_id_cache));
@@ -1067,6 +1087,14 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
 		return;
 
+	/*
+	 * Take and hold the per-vCPU interrupt remapping lock while updating
+	 * the Physical ID entry even though the lock doesn't protect against
+	 * multiple writers (see above).  Holding ir_list_lock ensures that
+	 * either svm_ir_list_add() will consume up-to-date entry information,
+	 * or that this task will wait until svm_ir_list_add() completes to
+	 * mark the vCPU as not running.
+	 */
 	spin_lock_irqsave(&svm->ir_list_lock, flags);
 
 	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
-- 
2.41.0.640.ga95def55d0-goog



* Re: [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running
  2023-08-08 23:31 [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running Sean Christopherson
  2023-08-08 23:31 ` [PATCH 1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry Sean Christopherson
  2023-08-08 23:31 ` [PATCH 2/2] KVM: SVM: Set target pCPU during IRTE update if target vCPU is running Sean Christopherson
@ 2023-08-09 10:30 ` Joao Martins
  2023-08-09 14:23   ` Sean Christopherson
  2023-08-18  0:12 ` Sean Christopherson
  3 siblings, 1 reply; 7+ messages in thread
From: Joao Martins @ 2023-08-09 10:30 UTC (permalink / raw)
  To: Alejandro Jimenez, Sean Christopherson
  Cc: kvm, linux-kernel, dengqiao . joey, Maxim Levitsky,
	Suravee Suthikulpanit, Paolo Bonzini

On 09/08/2023 00:31, Sean Christopherson wrote:
> Fix a bug where KVM doesn't set the pCPU affinity for running vCPUs when
> updating IRTE routing.  Not setting the pCPU means the IOMMU will signal
> the wrong pCPU's doorbell until the vCPU goes through a put+load cycle.
> 

Or also framed as an inefficiency that we depend on the GALog (for a running
vCPU) for interrupt delivery until the put+load cycle happens. I don't think I
ever reproduced the missed interrupt case in our stress testing.

> I waffled for far too long between making this one patch or two.  Moving
> the lock doesn't make all that much sense as a standalone patch, but in the
> end, I decided that isolating the locking change would be useful in the
> unlikely event that it breaks something.  If anyone feels strongly about
> making this a single patch, I have no objection to squashing these together.
> 
IMHO, as two patches looks better;

For what is worth:

	Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

I think Alejandro had reported his testing as successful here:

https://lore.kernel.org/kvm/caefe41b-2736-3df9-b5cd-b81fc4c30ff0@oracle.com/

OTOH, he didn't give the Tested-by explicitly


* Re: [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running
  2023-08-09 10:30 ` [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if " Joao Martins
@ 2023-08-09 14:23   ` Sean Christopherson
  2023-08-09 14:58     ` Alejandro Jimenez
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2023-08-09 14:23 UTC (permalink / raw)
  To: Joao Martins
  Cc: Alejandro Jimenez, kvm, linux-kernel, dengqiao . joey,
	Maxim Levitsky, Suravee Suthikulpanit, Paolo Bonzini

On Wed, Aug 09, 2023, Joao Martins wrote:
> On 09/08/2023 00:31, Sean Christopherson wrote:
> > Fix a bug where KVM doesn't set the pCPU affinity for running vCPUs when
> > updating IRTE routing.  Not setting the pCPU means the IOMMU will signal
> > the wrong pCPU's doorbell until the vCPU goes through a put+load cycle.
> > 
> 
> Or also framed as an inefficiency that we depend on the GALog (for a running
> vCPU) for interrupt delivery until the put+load cycle happens. I don't think I
> ever reproduced the missed interrupt case in our stress testing.

Ah, I'll reword the changelog in patch 2 if this only delays the interrupt instead
of dropping it entirely.

> > I waffled for far too long between making this one patch or two.  Moving
> > the lock doesn't make all that much sense as a standalone patch, but in the
> > end, I decided that isolating the locking change would be useful in the
> > unlikely event that it breaks something.  If anyone feels strongly about
> > making this a single patch, I have no objection to squashing these together.
> > 
> IMHO, as two patches looks better;
> 
> For what is worth:
> 
> 	Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> 
> I think Alejandro had reported his testing as successful here:
> 
> https://lore.kernel.org/kvm/caefe41b-2736-3df9-b5cd-b81fc4c30ff0@oracle.com/
> 
> OTOH, he didn't give the Tested-by explicitly

Yeah, I almost asked for a Tested-by, but figured it would be just as easy to
post the patches.


* Re: [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running
  2023-08-09 14:23   ` Sean Christopherson
@ 2023-08-09 14:58     ` Alejandro Jimenez
  0 siblings, 0 replies; 7+ messages in thread
From: Alejandro Jimenez @ 2023-08-09 14:58 UTC (permalink / raw)
  To: Sean Christopherson, Joao Martins
  Cc: kvm, linux-kernel, dengqiao . joey, Maxim Levitsky,
	Suravee Suthikulpanit, Paolo Bonzini



On 8/9/23 10:23, Sean Christopherson wrote:
> On Wed, Aug 09, 2023, Joao Martins wrote:
>> On 09/08/2023 00:31, Sean Christopherson wrote:
>>> Fix a bug where KVM doesn't set the pCPU affinity for running vCPUs when
>>> updating IRTE routing.  Not setting the pCPU means the IOMMU will signal
>>> the wrong pCPU's doorbell until the vCPU goes through a put+load cycle.
>>>
>>
>> Or also framed as an inefficiency that we depend on the GALog (for a running
>> vCPU) for interrupt delivery until the put+load cycle happens. I don't think I
>> ever reproduced the missed interrupt case in our stress testing.

Right, I was never able to see any dropped interrupts when testing the baseline host kernel with "idle=poll" on my guest,
though I didn't reproduce Dengqiao's setup exactly, e.g. they imply using isolcpus in the host kernel params.

> 
> Ah, I'll reword the changelog in patch 2 if this only delays the interrupt instead
> of dropping it entirely.
> 
>>> I waffled for far too long between making this one patch or two.  Moving
>>> the lock doesn't make all that much sense as a standalone patch, but in the
>>> end, I decided that isolating the locking change would be useful in the
>>> unlikely event that it breaks something.  If anyone feels strongly about
>>> making this a single patch, I have no objection to squashing these together.
>>>
>> IMHO, as two patches looks better;
>>
>> For what is worth:
>>
>> 	Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
>>
>> I think Alejandro had reported his testing as successful here:
>>
>> https://lore.kernel.org/kvm/caefe41b-2736-3df9-b5cd-b81fc4c30ff0@oracle.com/
>>
>> OTOH, he didn't give the Tested-by explicitly
> 
> Yeah, I almost asked for a Tested-by, but figured it would be just as easy to
> post the patches.

I was hoping to find more time to test with other configs (i.e. more closely matching the original environment).
That being said, besides the positive results from the validation script mentioned earlier, I have been using the
patched kernel to launch guests in my setup for quite some time now without encountering any issues. From my side:

Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>


* Re: [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running
  2023-08-08 23:31 [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running Sean Christopherson
                   ` (2 preceding siblings ...)
  2023-08-09 10:30 ` [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if " Joao Martins
@ 2023-08-18  0:12 ` Sean Christopherson
  3 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2023-08-18  0:12 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, dengqiao . joey, Alejandro Jimenez,
	Joao Martins, Maxim Levitsky, Suravee Suthikulpanit

On Tue, 08 Aug 2023 16:31:30 -0700, Sean Christopherson wrote:
> Fix a bug where KVM doesn't set the pCPU affinity for running vCPUs when
> updating IRTE routing.  Not setting the pCPU means the IOMMU will signal
> the wrong pCPU's doorbell until the vCPU goes through a put+load cycle.
> 
> I waffled for far too long between making this one patch or two.  Moving
> the lock doesn't make all that much sense as a standalone patch, but in the
> end, I decided that isolating the locking change would be useful in the
> unlikely event that it breaks something.  If anyone feels strongly about
> making this a single patch, I have no objection to squashing these together.
> 
> [...]

Applied to kvm-x86 svm, with a massaged changelog in patch 2 to state that the
bug "only" delays IRQs.

[1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
      https://github.com/kvm-x86/linux/commit/4c08e737f056
[2/2] KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
      https://github.com/kvm-x86/linux/commit/f3cebc75e742

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes


end of thread

Thread overview: 7+ messages
2023-08-08 23:31 [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if vCPU is running Sean Christopherson
2023-08-08 23:31 ` [PATCH 1/2] KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry Sean Christopherson
2023-08-08 23:31 ` [PATCH 2/2] KVM: SVM: Set target pCPU during IRTE update if target vCPU is running Sean Christopherson
2023-08-09 10:30 ` [PATCH 0/2] KVM: SVM: Set pCPU during IRTE update if " Joao Martins
2023-08-09 14:23   ` Sean Christopherson
2023-08-09 14:58     ` Alejandro Jimenez
2023-08-18  0:12 ` Sean Christopherson
