On Fri, 2023-01-13 at 10:20 +0100, Paolo Bonzini wrote:
> On 1/13/23 08:18, Boqun Feng wrote:
> > On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > > > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > > > >
> > > > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > > > +  following is broken::
> > > > > +
> > > > > +      srcu_read_lock(&kvm->srcu);
> > > > > +      mutex_lock(&kvm->slots_lock);
> > > > > +
> > > >
> > > > "Don't tell me. Tell lockdep!"
> > > >
> > > > Did we conclude in
> > > > https://lore.kernel.org/kvm/122f38e724aae9ae8ab474233da1ba19760c20d2.camel@infradead.org/
> > > > that lockdep *could* be clever enough to catch a violation of this rule
> > > > by itself?
> > > >
> > > > The general case of the rule would be that 'if mutex A is taken in a
> > > > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > > > held shall be verboten'. And vice versa.
> > > >
> > > > If we can make lockdep catch it automatically, yay!
> > >
> > > Unfortunately, lockdep needs to see a writer to complain, and that patch
> > > just adds a reader.  And adding that writer would make lockdep complain
> > > about things that are perfectly fine.  It should be possible to make
> > > lockdep catch this sort of thing, but from what I can see, doing so
> > > requires modifications to lockdep itself.
> >
> > Please see if the following patchset works:
> >
> >         https://lore.kernel.org/lkml/20230113065955.815667-1-boqun.feng@gmail.com
> >
> > "I have been called. I must answer. Always." ;-)
>
> Amazing! Thank you, Boqun!
>
> It's missing an important testcase; if it passes (does not warn), then
> it should work:

I think it does.
I started with kvm/master from
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=master so that
lockdep didn't find other things to complain about first. I then:

 • Dropped the last two commits, putting us back to using kvm->lock and
   removing the dummy mutex lock that would have told lockdep that it's
   a (different) problem.
 • Added Boqun's three commits.
 • Reverted a79b53aa so that the synchronize_srcu() deadlock returns.
 • Couldn't reproduce with xen_shinfo_test, so added Michal's test from
   https://lore.kernel.org/kvm/15599980-bd2e-b6c2-1479-e1eef02da0b5@rbox.co/

The resulting tree is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-srcu-lockdep

Now when I run tools/testing/selftests/kvm/x86_64/deadlocks_test I do
see lockdep complain about it (shown below; I have a cosmetic
nit/request to make).

If I restore the evtchn_reset fix then it also complains about
kvm_xen_set_evtchn() vs. kvm_xen_kvm_set_attr(), and if I restore the
xen_lock fix from the tip of kvm/master then Michal's deadlocks_test
passes and there are no complaints.

So everything seems to be working as it should... *except* that I don't
quite understand why xen_shinfo_test didn't trigger the warning.
Michal, I guess you already worked that out when you came up with your
deadlocks_test instead... is there something we should add to
xen_shinfo_test that would mean it *would* have triggered?

I even tried this:

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (init_srcu_struct(&kvm->irq_srcu))
 		goto out_err_no_irq_srcu;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
+	 * from a SRCU read section on kvm->srcu.
+	 */
+	mutex_lock(&kvm->lock);
+	synchronize_srcu(&kvm->srcu);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	refcount_set(&kvm->users_count, 1);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		for (j = 0; j < 2; j++) {

> [ 845.474169] ======================================================
> [ 845.474170] WARNING: possible circular locking dependency detected
> [ 845.474172] 6.2.0-rc3+ #1025 Tainted: G            E
> [ 845.474175] ------------------------------------------------------
> [ 845.474176] deadlocks_test/22767 is trying to acquire lock:
> [ 845.474178] ffffc9000ba4b868 (&kvm->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x170
> [ 845.474192]
>                but task is already holding lock:
> [ 845.474194] ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.474319]
>                which lock already depends on the new lock.

So the above part is good, and clearly tells me it was synchronize_srcu().

> [ 845.474320]
>                the existing dependency chain (in reverse order) is:
> [ 845.474322]
>                -> #1 (&kvm->lock){+.+.}-{3:3}:
> [ 845.474327]        __lock_acquire+0x4b4/0x940
> [ 845.474333]        lock_acquire.part.0+0xa8/0x210
> [ 845.474337]        __mutex_lock+0x94/0x920
> [ 845.474344]        kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
> [ 845.474437]        kvm_xen_inject_timer_irqs+0x79/0xa0 [kvm]
> [ 845.474529]        vcpu_run+0x20c/0x450 [kvm]
> [ 845.474618]        kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
> [ 845.474707]        kvm_vcpu_ioctl+0x279/0x700 [kvm]
> [ 845.474783]        __x64_sys_ioctl+0x8a/0xc0
> [ 845.474787]        do_syscall_64+0x3b/0x90
> [ 845.474796]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.474804]
>                -> #0 (&kvm->srcu){.+.+}-{0:0}:
> [ 845.474809]        check_prev_add+0x8f/0xc20
> [ 845.474812]        validate_chain+0x3ba/0x450
> [ 845.474814]        __lock_acquire+0x4b4/0x940
> [ 845.474817]        lock_sync+0x99/0x110
> [ 845.474820]        __synchronize_srcu+0x4d/0x170
> [ 845.474824]        kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.474907]        kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.474997]        kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475075]        __x64_sys_ioctl+0x8a/0xc0
> [ 845.475079]        do_syscall_64+0x3b/0x90
> [ 845.475084]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475089]
>                other info that might help us debug this:
>
> [ 845.475091]  Possible unsafe locking scenario:
>
> [ 845.475092]        CPU0                    CPU1
> [ 845.475093]        ----                    ----
> [ 845.475095]   lock(&kvm->lock);
> [ 845.475098]                                lock(&kvm->srcu);
> [ 845.475101]                                lock(&kvm->lock);
> [ 845.475103]   lock(&kvm->srcu);
> [ 845.475106]
>                 *** DEADLOCK ***

But is there any chance the above could say 'synchronize_srcu' and
'srcu_read_lock' in the appropriate places?

> [ 845.475108] 1 lock held by deadlocks_test/22767:
> [ 845.475110]  #0: ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> [ 845.475200]
>                stack backtrace:
> [ 845.475202] CPU: 10 PID: 22767 Comm: deadlocks_test Tainted: G            E      6.2.0-rc3+ #1025
> [ 845.475206] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> [ 845.475208] Call Trace:
> [ 845.475210]
> [ 845.475214]  dump_stack_lvl+0x56/0x73
> [ 845.475221]  check_noncircular+0x102/0x120
> [ 845.475229]  ? check_noncircular+0x7f/0x120
> [ 845.475236]  check_prev_add+0x8f/0xc20
> [ 845.475239]  ? add_chain_cache+0x10b/0x2d0
> [ 845.475244]  validate_chain+0x3ba/0x450
> [ 845.475249]  __lock_acquire+0x4b4/0x940
> [ 845.475253]  ? __synchronize_srcu+0x5/0x170
> [ 845.475258]  lock_sync+0x99/0x110
> [ 845.475261]  ? __synchronize_srcu+0x5/0x170
> [ 845.475265]  __synchronize_srcu+0x4d/0x170
> [ 845.475269]  ? mark_held_locks+0x49/0x80
> [ 845.475272]  ? _raw_spin_unlock_irqrestore+0x2d/0x60
> [ 845.475278]  ? __pfx_read_tsc+0x10/0x10
> [ 845.475286]  ? ktime_get_mono_fast_ns+0x3d/0x90
> [ 845.475292]  kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> [ 845.475380]  kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> [ 845.475472]  ? __lock_acquire+0x4b4/0x940
> [ 845.475485]  kvm_vm_ioctl+0x5ca/0x800 [kvm]
> [ 845.475566]  ? lockdep_unregister_key+0x76/0x110
> [ 845.475575]  __x64_sys_ioctl+0x8a/0xc0
> [ 845.475579]  do_syscall_64+0x3b/0x90
> [ 845.475586]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 845.475591] RIP: 0033:0x7f79de23fd1b
> [ 845.475595] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
> [ 845.475598] RSP: 002b:00007f79ddff7c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 845.475602] RAX: ffffffffffffffda RBX: 00007f79ddff8640 RCX: 00007f79de23fd1b
> [ 845.475605] RDX: 00007f79ddff7ca0 RSI: 000000004188aec6 RDI: 0000000000000004
> [ 845.475607] RBP: 00007f79ddff85c0 R08: 0000000000000000 R09: 00007fffceb1ff2f
> [ 845.475609] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f79ddff7ca0
> [ 845.475611] R13: 0000000001c322a0 R14: 00007f79de2a05f0 R15: 0000000000000000
> [ 845.475617]