* [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Yan Zhao @ 2023-03-13 11:10 UTC
  To: kvm, linux-kernel; +Cc: seanjc, pbonzini, Yan Zhao

Split the single per-CPU lock protecting the wakeup_list into a sched_in
lock and a sched_out lock to break the possible circular locking
dependency reported by lockdep.

Lockdep reports a "possible circular locking dependency detected"
warning.

Chain exists of:
   &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)

  Possible unsafe locking scenario:

        CPU0                CPU1
        ----                ----
   lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
                            lock(&rq->__lock);
                            lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
   lock(&p->pi_lock);

  *** DEADLOCK ***

In path irq,
        sysvec_kvm_posted_intr_wakeup_ipi() --> pi_wakeup_handler()
        --> kvm_vcpu_wake_up() --> try_to_wake_up(),
the lock order is
        &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.

In path sched_out,
        vcpu_block() --> schedule() --> kvm_sched_out() --> vmx_vcpu_put()
        --> vmx_vcpu_pi_put() --> pi_enable_wakeup_handler(),
the lock order is
        &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu).

However, path irq and path sched_out cannot actually race, because path
irq runs in interrupt context while path sched_out runs with interrupts
disabled on the same pCPU as path irq.

Since path sched_out is the very path that teaches lockdep the lock
ordering &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu), it is
desirable for path irq not to take the same per-CPU lock as path
sched_out.

So, in this patch, the single wakeup_list lock is divided into a sched_in
lock and a sched_out lock.
- "path sched_out": adds the vCPU to the pCPU's wakeup list (IRQs disabled).
              It takes the sched_out lock.

- "path irq": reads the pCPU's wakeup list (IRQ context, running on the
              same pCPU as "path sched_out").
              It takes only the sched_in lock.

- "path sched_in": deletes the vCPU from the wakeup list of its _previous_
                  pCPU (IRQs disabled, running on the same or a different
                  pCPU as "path irq").
                  It takes both the sched_in and sched_out locks, as it
                  can race with the other two paths.

The lock orderings after this patch are:
- &p->pi_lock --> &rq->__lock -->
  &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
- &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) -->
  &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
- &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) --> &p->pi_lock

Currently, &rq->__lock is not held in "path sched_in".
However, if in future "path sched_in" takes &p->pi_lock or &rq->__lock,
lockdep is able to detect and warn in that case.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: path sched_out and path irq do not race, path sched_in does not
take &rq->__lock]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
v3:
Removed unnecessary blank-line changes and deleted an improper
description.

v2:
Switched from RCU to two raw_spin_locks, as RCU may not let the irq
handler observe the list removal in a timely manner. (Sean)
https://lore.kernel.org/all/20230313094753.8345-1-yan.y.zhao@intel.com/

v1:
https://lore.kernel.org/all/20230310155955.29652-1-yan.y.zhao@intel.com/
---
 arch/x86/kvm/vmx/posted_intr.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 94c38bea60e7..f4a5fcb751c1 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -23,13 +23,19 @@
  */
 static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
 /*
- * Protect the per-CPU list with a per-CPU spinlock to handle task migration.
+ * Protect the per-CPU list with two per-CPU spinlocks to handle task migration.
+ * IRQs must be disabled when taking the two locks, otherwise deadlock will
+ * occur if a wakeup IRQ arrives and attempts to acquire the locks.
+ * The ->sched_out() path prior to a vCPU blocking takes the "out lock", which
+ * is never taken by the wakeup IRQ handler running on the same pCPU as the
+ * ->sched_out() path.
  * When a blocking vCPU is awakened _and_ migrated to a different pCPU, the
  * ->sched_in() path will need to take the vCPU off the list of the _previous_
- * CPU.  IRQs must be disabled when taking this lock, otherwise deadlock will
- * occur if a wakeup IRQ arrives and attempts to acquire the lock.
+ * CPU.  It takes both the "in lock" and the "out lock", as it can race with
+ * both other paths on the _previous_ CPU's list.
  */
-static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock_in);
+static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock_out);
 
 static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 {
@@ -89,9 +95,11 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	 * current pCPU if the task was migrated.
 	 */
 	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
-		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock_in, vcpu->cpu));
+		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock_out, vcpu->cpu));
 		list_del(&vmx->pi_wakeup_list);
-		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock_out, vcpu->cpu));
+		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock_in, vcpu->cpu));
 	}
 
 	dest = cpu_physical_id(cpu);
@@ -152,10 +160,10 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
 
 	local_irq_save(flags);
 
-	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock_out, vcpu->cpu));
 	list_add_tail(&vmx->pi_wakeup_list,
 		      &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
-	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock_out, vcpu->cpu));
 
 	WARN(pi_desc->sn, "PI descriptor SN field set before blocking");
 
@@ -219,7 +227,7 @@ void pi_wakeup_handler(void)
 {
 	int cpu = smp_processor_id();
 	struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
-	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
+	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu);
 	struct vcpu_vmx *vmx;
 
 	raw_spin_lock(spinlock);
@@ -234,7 +242,8 @@ void pi_wakeup_handler(void)
 void __init pi_init_cpu(int cpu)
 {
 	INIT_LIST_HEAD(&per_cpu(wakeup_vcpus_on_cpu, cpu));
-	raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu));
+	raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu));
 }
 
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu)

base-commit: 89400df96a7570b651404bbc3b7afe627c52a192
-- 
2.17.1



* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Sean Christopherson @ 2023-03-24 23:13 UTC
  To: Yan Zhao; +Cc: kvm, linux-kernel, pbonzini

On Mon, Mar 13, 2023, Yan Zhao wrote:
> The lock orderings after this patch are:
> - &p->pi_lock --> &rq->__lock -->
>   &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
> - &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) -->
>   &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
> - &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) --> &p->pi_lock
> 
> Currently, &rq->__lock is not held in "path sched_in".
> However, if in future "path sched_in" takes &p->pi_lock or &rq->__lock,
> lockdep is able to detect and warn in that case.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> [sean: path sched_out and path irq do not race, path sched_in does not
> take &rq->__lock]

But there's no actual deadlock, right?  I have zero interest in fixing a lockdep
false positive by making functional changes to KVM.  I am definitely open to making
changes to somehow let lockdep know what's going on, but complicating KVM's actual
functionality is too much.


* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Yan Zhao @ 2023-03-29  1:53 UTC
  To: Sean Christopherson; +Cc: kvm, linux-kernel, pbonzini

On Fri, Mar 24, 2023 at 04:13:37PM -0700, Sean Christopherson wrote:
> On Mon, Mar 13, 2023, Yan Zhao wrote:
> > The lock orderings after this patch are:
> > - &p->pi_lock --> &rq->__lock -->
> >   &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
> > - &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) -->
> >   &per_cpu(wakeup_vcpus_on_cpu_lock_out, cpu)
> > - &per_cpu(wakeup_vcpus_on_cpu_lock_in, cpu) --> &p->pi_lock
> > 
> > Currently, &rq->__lock is not held in "path sched_in".
> > However, if in future "path sched_in" takes &p->pi_lock or &rq->__lock,
> > lockdep is able to detect and warn in that case.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > [sean: path sched_out and path irq do not race, path sched_in does not
> > take &rq->__lock]
> 
> But there's no actual deadlock, right?  I have zero interest in fixing a lockdep
> false positive by making functional changes to KVM.  I am definitely open to making
> changes to somehow let lockdep know what's going on, but complicating KVM's actual
> functionality is too much.
Yes, there's no actual deadlock currently.

But without fixing this issue, debug_locks will be set to false and the
messages below will be printed. Lockdep is then turned off, and other
lock checks such as lockdep_assert_held() will not print a warning even
when they are obviously violated.

[  118.873499] ======================================================
[  118.880413] WARNING: possible circular locking dependency detected
[  118.887325] 6.2.0-rc5+ #600 Not tainted
[  118.891613] ------------------------------------------------------
[  118.898519] swapper/5/0 is trying to acquire lock:
[  118.903869] ffff88810f5cac90 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0xbb/0x510
[  118.912624]
[  118.912624] but task is already holding lock:
[  118.919138] ffff88885f7fdab8 (&per_cpu(wakeup_vcpus_on_cpu_lock, cpu)){-...}-{2:2}, at: pi_wakeup_handler+0x36/0x80 [kvm_intel]
[  118.931962]
[  118.931962] which lock already depends on the new lock.
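
As a minimal illustration (a sketch, not code from this patch; "demo_lock"
and demo() are made-up names): lockdep_assert_held() is roughly
WARN_ON(debug_locks && !lockdep_is_held(lock)), so once the splat above has
cleared debug_locks, even a blatant violation goes unreported.

#include <linux/lockdep.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static void demo(void)
{
        /*
         * demo_lock is obviously not held here, yet no warning fires,
         * because the earlier circular-dependency splat already cleared
         * debug_locks and thereby disabled all further lockdep checks.
         */
        lockdep_assert_held(&demo_lock);
}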

Any suggestion?


* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Paolo Bonzini @ 2023-03-29 11:51 UTC
  To: Yan Zhao, Sean Christopherson; +Cc: kvm, linux-kernel

On 3/29/23 03:53, Yan Zhao wrote:
> Yes, there's no actual deadlock currently.
> 
> But without fixing this issue, debug_locks will be set to false and the
> messages below will be printed. Lockdep is then turned off, and other
> lock checks such as lockdep_assert_held() will not print a warning even
> when they are obviously violated.

Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to 
the sched_out path?

Paolo



* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Yan Zhao @ 2023-03-30  9:56 UTC
  To: Paolo Bonzini; +Cc: Sean Christopherson, kvm, linux-kernel

On Wed, Mar 29, 2023 at 01:51:23PM +0200, Paolo Bonzini wrote:
> On 3/29/23 03:53, Yan Zhao wrote:
> > Yes, there's no actual deadlock currently.
> > 
> > But without fixing this issue, debug_locks will be set to false and the
> > messages below will be printed. Lockdep is then turned off, and other
> > lock checks such as lockdep_assert_held() will not print a warning even
> > when they are obviously violated.
> 
> Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to the
> sched_out path?

Yes, thanks for the suggestion!
This can avoid the "possible circular locking dependency" warning.

I tried it like this:
- in sched_out path:
  raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);

- in irq and sched_in paths:
  raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));

But I have a concern:
If the sched_in path removes vcpu A from the wakeup list of its previous
pcpu A, and at the same time the sched_out path adds vcpu B to the wakeup
list of pcpu A, then the sched_in and sched_out paths race for the same
lock and thus should share the same subclass.
But if the sched_in path holds only subclass 0 and the sched_out path
holds subclass 1, then lockdep would not warn of a "possible circular
locking dependency" if someone made a change like the one below in the
sched_in path.

if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
            raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
            list_del(&vmx->pi_wakeup_list);
+            raw_spin_lock(&current->pi_lock);
+            raw_spin_unlock(&current->pi_lock);
            raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
}

While with v3 of this patch (sched_in path holds both out_lock and in_lock),
lockdep is still able to warn about this issue.


Thanks
Yan


* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Sean Christopherson @ 2023-03-30 18:14 UTC
  To: Yan Zhao; +Cc: Paolo Bonzini, kvm, linux-kernel

On Thu, Mar 30, 2023, Yan Zhao wrote:
> On Wed, Mar 29, 2023 at 01:51:23PM +0200, Paolo Bonzini wrote:
> > On 3/29/23 03:53, Yan Zhao wrote:
> > > Yes, there's no actual deadlock currently.
> > > 
> > > But without fixing this issue, debug_locks will be set to false and the
> > > messages below will be printed. Lockdep is then turned off, and other
> > > lock checks such as lockdep_assert_held() will not print a warning even
> > > when they are obviously violated.
> > 
> > Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to the
> > sched_out path?
> 
> Yes, thanks for the suggestion!
> This can avoid the "possible circular locking dependency" warning.
> 
> I tried it like this:
> - in sched_out path:
>   raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
> 
> - in irq and sched_in paths:
>   raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> 
> But I have a concern:
> If the sched_in path removes vcpu A from the wakeup list of its previous
> pcpu A, and at the same time the sched_out path adds vcpu B to the wakeup
> list of pcpu A, then the sched_in and sched_out paths race for the same
> lock and thus should share the same subclass.
> But if the sched_in path holds only subclass 0 and the sched_out path
> holds subclass 1, then lockdep would not warn of a "possible circular
> locking dependency" if someone made a change like the one below in the
> sched_in path.
> 
> if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
>             raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>             list_del(&vmx->pi_wakeup_list);
> +            raw_spin_lock(&current->pi_lock);
> +            raw_spin_unlock(&current->pi_lock);
>             raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> }
> 
> While with v3 of this patch (sched_in path holds both out_lock and in_lock),
> lockdep is still able to warn about this issue.

Couldn't we just add a manual assertion?  That'd also be a good location for a
comment to document all of this, and to clarify that current->pi_lock is a
completely different lock that has nothing to do with posted interrupts.

It's not foolproof, but any patches that substantially touch this code need a
ton of scrutiny as the scheduling interactions are gnarly, i.e. IMO a deadlock
bug sneaking in is highly unlikely.

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 94c38bea60e7..19325a10e42f 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -90,6 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
         */
        if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
                raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+               lockdep_assert_not_held(&current->pi_lock);
                list_del(&vmx->pi_wakeup_list);
                raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
        }


* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Yan Zhao @ 2023-03-31  0:06 UTC
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

On Thu, Mar 30, 2023 at 11:14:27AM -0700, Sean Christopherson wrote:
> On Thu, Mar 30, 2023, Yan Zhao wrote:
> > On Wed, Mar 29, 2023 at 01:51:23PM +0200, Paolo Bonzini wrote:
> > > On 3/29/23 03:53, Yan Zhao wrote:
> > > > Yes, there's no actual deadlock currently.
> > > > 
> > > > But without fixing this issue, debug_locks will be set to false and the
> > > > messages below will be printed. Lockdep is then turned off, and other
> > > > lock checks such as lockdep_assert_held() will not print a warning even
> > > > when they are obviously violated.
> > > 
> > > Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to the
> > > sched_out path?
> > 
> > Yes, thanks for the suggestion!
> > This can avoid the "possible circular locking dependency" warning.
> > 
> > I tried it like this:
> > - in sched_out path:
> >   raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
> > 
> > - in irq and sched_in paths:
> >   raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > 
> > But I have a concern:
> > If the sched_in path removes vcpu A from the wakeup list of its previous
> > pcpu A, and at the same time the sched_out path adds vcpu B to the wakeup
> > list of pcpu A, then the sched_in and sched_out paths race for the same
> > lock and thus should share the same subclass.
> > But if the sched_in path holds only subclass 0 and the sched_out path
> > holds subclass 1, then lockdep would not warn of a "possible circular
> > locking dependency" if someone made a change like the one below in the
> > sched_in path.
> > 
> > if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> >             raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> >             list_del(&vmx->pi_wakeup_list);
> > +            raw_spin_lock(&current->pi_lock);
> > +            raw_spin_unlock(&current->pi_lock);
> >             raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > }
> > 
> > While with v3 of this patch (sched_in path holds both out_lock and in_lock),
> > lockdep is still able to warn about this issue.
> 
> Couldn't we just add a manual assertion?  That'd also be a good location for a
> comment to document all of this, and to clarify that current->pi_lock is a
> completely different lock that has nothing to do with posted interrupts.
> 
> It's not foolproof, but any patches that substantially touch this code need a
> ton of scrutiny as the scheduling interactions are gnarly, i.e. IMO a deadlock
> bug sneaking in is highly unlikely.
> 
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 94c38bea60e7..19325a10e42f 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -90,6 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>          */
>         if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
>                 raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> +               lockdep_assert_not_held(&current->pi_lock);
>                 list_del(&vmx->pi_wakeup_list);
>                 raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>         }
Hmm... no. It's not that "current->pi_lock" cannot be held, it's about
the lock ordering.
In the sched_out thread, the lock ordering is
"current->pi_lock" --> "rq->__lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)",
so in the sched_in thread, if the lock ordering is
"per_cpu(wakeup_vcpus_on_cpu_lock, cpu)" --> "current->pi_lock",
a circular locking dependency arises,
while if the lock ordering in the sched_in thread is
"current->pi_lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)",
it's fine!
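
In code, the two orderings above look like this (sketch only; "cpu" stands
for the previous pCPU):

        /* bad in the sched_in thread: circular with the sched_out chain */
        raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
        raw_spin_lock(&current->pi_lock);
        raw_spin_unlock(&current->pi_lock);
        raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));

        /* fine in the sched_in thread: same direction as sched_out's chain */
        raw_spin_lock(&current->pi_lock);
        raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
        raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
        raw_spin_unlock(&current->pi_lock);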

If the sched_out and sched_in threads really should hold the same
subclass of the lock, we can't fool lockdep just to shut it up.
And we may not be able to list or document all the potential locks that
cannot be held inside "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)", right?

BTW, could you tell me why you think v3 complicates KVM's functionality?
It just splits a single lock into two sub-locks, and lets the irq path
take only the in_lock and the sched_out path take only the out_lock,
while the sched_in path takes both the in_lock and the out_lock.
IMHO, it does not make any functional change to the KVM code.
Maybe it's because the commit message is not well written and gave people
the wrong impression that the logic changes a lot?


Thanks
Yan





* Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Sean Christopherson @ 2023-04-10 17:30 UTC
  To: Yan Zhao; +Cc: Paolo Bonzini, kvm, linux-kernel

On Fri, Mar 31, 2023, Yan Zhao wrote:
> On Thu, Mar 30, 2023 at 11:14:27AM -0700, Sean Christopherson wrote:
> > On Thu, Mar 30, 2023, Yan Zhao wrote:
> > > While with v3 of this patch (sched_in path holds both out_lock and in_lock),
> > > lockdep is still able to warn about this issue.
> > 
> > Couldn't we just add a manual assertion?  That'd also be a good location for a
> > comment to document all of this, and to clarify that current->pi_lock is a
> > completely different lock that has nothing to do with posted interrupts.
> > 
> > It's not foolproof, but any patches that substantially touch this code need a
> > ton of scrutiny as the scheduling interactions are gnarly, i.e. IMO a deadlock
> > bug sneaking in is highly unlikely.
> > 
> > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > index 94c38bea60e7..19325a10e42f 100644
> > --- a/arch/x86/kvm/vmx/posted_intr.c
> > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > @@ -90,6 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> >          */
> >         if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> >                 raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > +               lockdep_assert_not_held(&current->pi_lock);
> >                 list_del(&vmx->pi_wakeup_list);
> >                 raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> >         }
> Hmm... no. It's not that "current->pi_lock" cannot be held, it's about
> the lock ordering.
> In the sched_out thread, the lock ordering is
> "current->pi_lock" --> "rq->__lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)",
> so in the sched_in thread, if the lock ordering is
> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)" --> "current->pi_lock",
> a circular locking dependency arises,
> while if the lock ordering in the sched_in thread is
> "current->pi_lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)",
> it's fine!

Right, but IIUC, neither ordering happens today.  In other words, the lockdep
assertion isn't defining a hard rule, rather it's enforcing an assumption that KVM
relies on to avoid a potential deadlock.

> If the sched_out and sched_in threads really should hold the same
> subclass of the lock, we can't fool lockdep just to shut it up.
> And we may not be able to list or document all the potential locks that
> cannot be held inside "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)", right?

Eh, IMO this is a non-issue.  It's a raw_spin_lock() in an IRQs-disabled section
that wraps a single line of benign code.  If it's really concerning, we could add
a scary comment like this.

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 94c38bea60e7..a7ec0371aeca 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -87,6 +87,12 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
         * If the vCPU was waiting for wakeup, remove the vCPU from the wakeup
         * list of the _previous_ pCPU, which will not be the same as the
         * current pCPU if the task was migrated.
+        *
+        * Stating the obvious, do not under any circumstance let this path
+        * acquire a different lock while holding the per-CPU lock.  To avoid
+        * false positives in lockdep, KVM uses different lockdep subclasses for
+        * vmx_vcpu_pi_put() vs vmx_vcpu_pi_load(), i.e. lockdep may not be able
+        * to detect circular dependencies and other issues.
         */
        if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
                raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 
> BTW, could you tell me why you think v3 complicates KVM's functionality?
> It just splits a single lock into two sub-locks, and lets the irq path

Heh, "just".  

> take only the in_lock and the sched_out path take only the out_lock,
> while the sched_in path takes both the in_lock and the out_lock.
> IMHO, it does not make any functional change to the KVM code.
> Maybe it's because the commit message is not well written and gave people
> the wrong impression that the logic changes a lot?

No, this is not a problem that can be solved by any changelog.  My very strong
objection to having two separate locks is that when reading the code, it's not
remotely obvious why two locks are needed, or that the code is correct.  Adding
copious amounts of documentation/comments can help, but it'll never fully solve
the problem that having two locks simply isn't intuitive/straightforward.

