On Fri, 2023-01-13 at 10:20 +0100, Paolo Bonzini wrote:
> On 1/13/23 08:18, Boqun Feng wrote:
> > On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > > > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > > > >
> > > > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > > > +  following is broken::
> > > > > +
> > > > > +      srcu_read_lock(&kvm->srcu);
> > > > > +      mutex_lock(&kvm->slots_lock);
> > > > > +
> > > >
> > > > "Don't tell me. Tell lockdep!"
> > > >
> > > > Did we conclude in
> > > > https://lore.kernel.org/kvm/122f38e724aae9ae8ab474233da1ba19760c20d2.camel@infradead.org/
> > > > that lockdep *could* be clever enough to catch a violation of this rule
> > > > by itself?
> > > >
> > > > The general case of the rule would be that 'if mutex A is taken in a
> > > > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > > > held shall be verboten'. And vice versa.
> > > >
> > > > If we can make lockdep catch it automatically, yay!
> > >
> > > Unfortunately, lockdep needs to see a writer to complain, and that patch
> > > just adds a reader.  And adding that writer would make lockdep complain
> > > about things that are perfectly fine.  It should be possible to make
> > > lockdep catch this sort of thing, but from what I can see, doing so
> > > requires modifications to lockdep itself.
> >
> > Please see if the following patchset works:
> >
> >         https://lore.kernel.org/lkml/20230113065955.815667-1-boqun.feng@gmail.com
> >
> > "I have been called. I must answer. Always." ;-)
>
> Amazing! Thank you, Boqun!
>
> It's missing an important testcase; if it passes (does not warn), then
> it should work:

I think it does.
I started with kvm/master from
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=master so that
lockdep didn't find other things to complain about first. I then:

 • Dropped the last two commits, putting us back to using kvm->lock and
   removing the dummy mutex lock that would have told lockdep that it's
   a (different) problem.
 • Added Boqun's three commits.
 • Reverted a79b53aa so that the synchronize_srcu() deadlock returns.
 • Couldn't reproduce with xen_shinfo_test, so added Michal's test from
   https://lore.kernel.org/kvm/15599980-bd2e-b6c2-1479-e1eef02da0b5@rbox.co/

The resulting tree is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-srcu-lockdep

Now when I run tools/testing/selftests/kvm/x86_64/deadlocks_test I do
see lockdep complain about it (shown below; I have a cosmetic
nit/request to make).

If I restore the evtchn_reset fix then it also complains about
kvm_xen_set_evtchn() vs. kvm_xen_kvm_set_attr(), and if I restore the
xen_lock fix from the tip of kvm/master then Michal's deadlocks_test
passes and there are no complaints.

So everything seems to be working as it should... *except* that I don't
quite understand why xen_shinfo_test didn't trigger the warning.
Michal, I guess you already worked that out when you came up with your
deadlocks_test instead... is there something we should add to
xen_shinfo_test that would mean it *would* have triggered?

I even tried this:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (init_srcu_struct(&kvm->irq_srcu))
 		goto out_err_no_irq_srcu;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
+	 * from a SRCU read section on kvm->srcu.
+	 */
+	mutex_lock(&kvm->lock);
+	synchronize_srcu(&kvm->srcu);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	refcount_set(&kvm->users_count, 1);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		for (j = 0; j < 2; j++) {

> [ 845.474169] ======================================================
> [ 845.474170] WARNING: possible circular locking dependency detected
> [ 845.474172] 6.2.0-rc3+ #1025 Tainted: G            E
> [ 845.474175] ------------------------------------------------------
> [ 845.474176] deadlocks_test/22767 is trying to acquire lock:
> [ 845.474178] ffffc9000ba4b868 (&kvm->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x170
> [ 845.474192]
>                but task is already holding lock:
> [ 845.474194] ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.474319]
>                which lock already depends on the new lock.

So the above part is good, and clearly tells me it was synchronize_srcu().

> [ 845.474320]
>                the existing dependency chain (in reverse order) is:
> [ 845.474322]
>                -> #1 (&kvm->lock){+.+.}-{3:3}:
> [ 845.474327]        __lock_acquire+0x4b4/0x940
> [ 845.474333]        lock_acquire.part.0+0xa8/0x210
> [ 845.474337]        __mutex_lock+0x94/0x920
> [ 845.474344]        kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
> [ 845.474437]        kvm_xen_inject_timer_irqs+0x79/0xa0 [kvm]
> [ 845.474529]        vcpu_run+0x20c/0x450 [kvm]
> [ 845.474618]        kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
> [ 845.474707]        kvm_vcpu_ioctl+0x279/0x700 [kvm]
> [ 845.474783]        __x64_sys_ioctl+0x8a/0xc0
> [ 845.474787]        do_syscall_64+0x3b/0x90
> [ 845.474796]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.474804]
>                -> #0 (&kvm->srcu){.+.+}-{0:0}:
> [ 845.474809]        check_prev_add+0x8f/0xc20
> [ 845.474812]        validate_chain+0x3ba/0x450
> [ 845.474814]        __lock_acquire+0x4b4/0x940
> [ 845.474817]        lock_sync+0x99/0x110
> [ 845.474820]        __synchronize_srcu+0x4d/0x170
> [ 845.474824]        kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.474907]        kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.474997]        kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475075]        __x64_sys_ioctl+0x8a/0xc0
> [ 845.475079]        do_syscall_64+0x3b/0x90
> [ 845.475084]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475089]
>                other info that might help us debug this:
>
> [ 845.475091]  Possible unsafe locking scenario:
>
> [ 845.475092]        CPU0                    CPU1
> [ 845.475093]        ----                    ----
> [ 845.475095]   lock(&kvm->lock);
> [ 845.475098]                                lock(&kvm->srcu);
> [ 845.475101]                                lock(&kvm->lock);
> [ 845.475103]   lock(&kvm->srcu);
> [ 845.475106]
>                 *** DEADLOCK ***

But is there any chance the above could say 'synchronize_srcu' and
'srcu_read_lock' in the appropriate places?

> [ 845.475108] 1 lock held by deadlocks_test/22767:
> [ 845.475110]  #0: ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.475200]
>                stack backtrace:
> [ 845.475202] CPU: 10 PID: 22767 Comm: deadlocks_test Tainted: G            E      6.2.0-rc3+ #1025
> [ 845.475206] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> [ 845.475208] Call Trace:
> [ 845.475210]
> [ 845.475214]  dump_stack_lvl+0x56/0x73
> [ 845.475221]  check_noncircular+0x102/0x120
> [ 845.475229]  ? check_noncircular+0x7f/0x120
> [ 845.475236]  check_prev_add+0x8f/0xc20
> [ 845.475239]  ? add_chain_cache+0x10b/0x2d0
> [ 845.475244]  validate_chain+0x3ba/0x450
> [ 845.475249]  __lock_acquire+0x4b4/0x940
> [ 845.475253]  ? __synchronize_srcu+0x5/0x170
> [ 845.475258]  lock_sync+0x99/0x110
> [ 845.475261]  ? __synchronize_srcu+0x5/0x170
> [ 845.475265]  __synchronize_srcu+0x4d/0x170
> [ 845.475269]  ? mark_held_locks+0x49/0x80
> [ 845.475272]  ? _raw_spin_unlock_irqrestore+0x2d/0x60
> [ 845.475278]  ? __pfx_read_tsc+0x10/0x10
> [ 845.475286]  ? ktime_get_mono_fast_ns+0x3d/0x90
> [ 845.475292]  kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.475380]  kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.475472]  ? __lock_acquire+0x4b4/0x940
> [ 845.475485]  kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475566]  ? lockdep_unregister_key+0x76/0x110
> [ 845.475575]  __x64_sys_ioctl+0x8a/0xc0
> [ 845.475579]  do_syscall_64+0x3b/0x90
> [ 845.475586]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475591] RIP: 0033:0x7f79de23fd1b
> [ 845.475595] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
> [ 845.475598] RSP: 002b:00007f79ddff7c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 845.475602] RAX: ffffffffffffffda RBX: 00007f79ddff8640 RCX: 00007f79de23fd1b
> [ 845.475605] RDX: 00007f79ddff7ca0 RSI: 000000004188aec6 RDI: 0000000000000004
> [ 845.475607] RBP: 00007f79ddff85c0 R08: 0000000000000000 R09: 00007fffceb1ff2f
> [ 845.475609] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f79ddff7ca0
> [ 845.475611] R13: 0000000001c322a0 R14: 00007f79de2a05f0 R15: 0000000000000000
> [ 845.475617]