From: David Woodhouse <dwmw2@infradead.org>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	seanjc@google.com, Joel Fernandes <joel@joelfernandes.org>,
	Matthew Wilcox <willy@infradead.org>,
	Josh Triplett <josh@joshtriplett.org>,
	rcu@vger.kernel.org, Michal Luczaj <mhal@rbox.co>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] Documentation: kvm: fix SRCU locking order docs
Date: Fri, 13 Jan 2023 11:03:02 +0000
Message-ID: <41fad1dc90c2bef4f2f17f1495c2f85105707d9f.camel@infradead.org>
In-Reply-To: <023e131b3de80c4bc2b6711804a4769466b90c6f.camel@infradead.org>


On Fri, 2023-01-13 at 10:33 +0000, David Woodhouse wrote:
> 
> So everything seems to be working as it should... *except* for the fact
> that I don't quite understand why xen_shinfo_test didn't trigger the
> warning. Michal, I guess you already worked that out when you came up
> with your deadlock-test instead... is there something we should add to
> xen_shinfo_test that would mean it *would* have triggered?

Got it. The warning only fires when kvm_xen_set_evtchn() takes the
slow path, i.e. when kvm_xen_set_evtchn_fast() fails. I'm not entirely
sure why that happens in your deadlock_test, but I can make it happen
in xen_shinfo_test just by invalidating the GPC with a memslot change:


--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -29,6 +29,9 @@
 #define DUMMY_REGION_GPA       (SHINFO_REGION_GPA + (3 * PAGE_SIZE))
 #define DUMMY_REGION_SLOT      11
 
+#define DUMMY_REGION_GPA_2     (SHINFO_REGION_GPA + (4 * PAGE_SIZE))
+#define DUMMY_REGION_SLOT_2    12
+
 #define SHINFO_ADDR    (SHINFO_REGION_GPA)
 #define VCPU_INFO_ADDR (SHINFO_REGION_GPA + 0x40)
 #define PVTIME_ADDR    (SHINFO_REGION_GPA + PAGE_SIZE)
@@ -765,6 +768,8 @@ int main(int argc, char *argv[])
 
                                if (verbose)
                                        printf("Testing guest EVTCHNOP_send direct to evtchn\n");
+                               vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+                                                           DUMMY_REGION_GPA_2, DUMMY_REGION_SLOT_2, 1, 0);
                                evtchn_irq_expected = true;
                                alarm(1);
                                break;
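
To spell out why the memslot change matters: adding a slot invalidates
the shinfo gfn_to_pfn_cache, so kvm_xen_set_evtchn_fast() fails and
the slow path runs. A simplified sketch of the shape of
kvm_xen_set_evtchn() (not the actual kernel code;
kvm_xen_refresh_and_retry() is a hypothetical stand-in for the real
refresh/retry loop):

int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
{
	/* Fast path: succeeds only while the shinfo GPC is valid. */
	int rc = kvm_xen_set_evtchn_fast(xe, kvm);
	if (rc != -EWOULDBLOCK)
		return rc;

	/*
	 * Slow path: take kvm->lock to refresh the cache and retry.
	 * From a vCPU hypercall we are already inside a kvm->srcu
	 * read section, so this is the kvm->srcu -> kvm->lock
	 * ordering that lockdep warns about below.
	 */
	mutex_lock(&kvm->lock);
	rc = kvm_xen_refresh_and_retry(xe, kvm); /* hypothetical helper */
	mutex_unlock(&kvm->lock);

	return rc;
}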



I did also need the trick below. I'll send patches properly, keeping
the fast path test and *adding* the slow one, instead of just changing
it as above.

I also verified that with the evtchn_reset deadlock fix put back, plus
the separate xen_lock which is currently at the tip of kvm/master, it
all works without complaints (or deadlocks).
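
For context, the xen_lock fix gives the Xen code its own leaf mutex,
so nothing running inside a kvm->srcu read section ever needs to take
kvm->lock. A rough sketch of the idea (not the actual patch):

	struct kvm_xen {
		/* Leaf lock: kvm->lock is never taken under it. */
		struct mutex xen_lock;
		/* ... shinfo cache, event channel ports, etc. ... */
	};

	/* In the slow path, instead of mutex_lock(&kvm->lock): */
	mutex_lock(&kvm->arch.xen.xen_lock);
	/* ... refresh the gfn_to_pfn_cache and retry delivery ... */
	mutex_unlock(&kvm->arch.xen.xen_lock);

Since kvm->lock is never taken while holding xen_lock, lockdep sees no
cycle.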

>  I even tried this:
> 
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>         if (init_srcu_struct(&kvm->irq_srcu))
>                 goto out_err_no_irq_srcu;
>  
> +#ifdef CONFIG_LOCKDEP
> +       /*
> +        * Ensure lockdep knows that it's not permitted to lock kvm->lock
> +        * from a SRCU read section on kvm->srcu.
> +        */
> +       mutex_lock(&kvm->lock);
> +       synchronize_srcu(&kvm->srcu);
> +       mutex_unlock(&kvm->lock);
> +#endif
> +
>         refcount_set(&kvm->users_count, 1);
>         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>                 for (j = 0; j < 2; j++) {
> 
> 
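
For anyone following along: taking kvm->lock and then calling
synchronize_srcu(&kvm->srcu) under it makes lockdep record a
kvm->lock -> kvm->srcu dependency at VM creation time (the -> #1 chain
through kvm_create_vm() below). When the vCPU later tries to take
kvm->lock from inside a kvm->srcu read section, that closes the cycle
and produces the splat: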

[   91.866348] ======================================================
[   91.866349] WARNING: possible circular locking dependency detected
[   91.866351] 6.2.0-rc3+ #1025 Tainted: G           OE     
[   91.866353] ------------------------------------------------------
[   91.866354] xen_shinfo_test/2938 is trying to acquire lock:
[   91.866356] ffffc9000179e3c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.866453] 
               but task is already holding lock:
[   91.866454] ffffc900017a7868 (&kvm->srcu){.+.+}-{0:0}, at: vcpu_enter_guest.constprop.0+0xa89/0x1270 [kvm]
[   91.866527] 
               which lock already depends on the new lock.

[   91.866528] 
               the existing dependency chain (in reverse order) is:
[   91.866529] 
               -> #1 (&kvm->srcu){.+.+}-{0:0}:
[   91.866532]        __lock_acquire+0x4b4/0x940
[   91.866537]        lock_sync+0x99/0x110
[   91.866540]        __synchronize_srcu+0x4d/0x170
[   91.866543]        kvm_create_vm+0x271/0x6e0 [kvm]
[   91.866621]        kvm_dev_ioctl+0x102/0x1c0 [kvm]
[   91.866694]        __x64_sys_ioctl+0x8a/0xc0
[   91.866697]        do_syscall_64+0x3b/0x90
[   91.866701]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   91.866707] 
               -> #0 (&kvm->lock){+.+.}-{3:3}:
[   91.866710]        check_prev_add+0x8f/0xc20
[   91.866712]        validate_chain+0x3ba/0x450
[   91.866714]        __lock_acquire+0x4b4/0x940
[   91.866716]        lock_acquire.part.0+0xa8/0x210
[   91.866717]        __mutex_lock+0x94/0x920
[   91.866721]        kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.866790]        kvm_xen_hcall_evtchn_send.constprop.0+0x138/0x1c0 [kvm]
[   91.866869]        kvm_xen_hypercall+0x475/0x5a0 [kvm]
[   91.866938]        vmx_handle_exit+0xe/0x50 [kvm_intel]
[   91.866955]        vcpu_enter_guest.constprop.0+0xb08/0x1270 [kvm]
[   91.867034]        vcpu_run+0x1bd/0x450 [kvm]
[   91.867100]        kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
[   91.867167]        kvm_vcpu_ioctl+0x279/0x700 [kvm]
[   91.867229]        __x64_sys_ioctl+0x8a/0xc0
[   91.867231]        do_syscall_64+0x3b/0x90
[   91.867235]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   91.867238] 
               other info that might help us debug this:

[   91.867239]  Possible unsafe locking scenario:

[   91.867240]        CPU0                    CPU1
[   91.867241]        ----                    ----
[   91.867242]   lock(&kvm->srcu);
[   91.867244]                                lock(&kvm->lock);
[   91.867246]                                lock(&kvm->srcu);
[   91.867248]   lock(&kvm->lock);
[   91.867249] 
                *** DEADLOCK ***

[   91.867250] 2 locks held by xen_shinfo_test/2938:
[   91.867252]  #0: ffff88815a8800b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x700 [kvm]
[   91.867318]  #1: ffffc900017a7868 (&kvm->srcu){.+.+}-{0:0}, at: vcpu_enter_guest.constprop.0+0xa89/0x1270 [kvm]
[   91.867387] 
               stack backtrace:
[   91.867389] CPU: 26 PID: 2938 Comm: xen_shinfo_test Tainted: G           OE      6.2.0-rc3+ #1025
[   91.867392] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[   91.867394] Call Trace:
[   91.867395]  <TASK>
[   91.867398]  dump_stack_lvl+0x56/0x73
[   91.867403]  check_noncircular+0x102/0x120
[   91.867409]  check_prev_add+0x8f/0xc20
[   91.867411]  ? add_chain_cache+0x10b/0x2d0
[   91.867415]  validate_chain+0x3ba/0x450
[   91.867418]  __lock_acquire+0x4b4/0x940
[   91.867421]  lock_acquire.part.0+0xa8/0x210
[   91.867424]  ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.867494]  ? rcu_read_lock_sched_held+0x43/0x70
[   91.867498]  ? lock_acquire+0x102/0x140
[   91.867501]  __mutex_lock+0x94/0x920
[   91.867505]  ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.867574]  ? find_held_lock+0x2b/0x80
[   91.867578]  ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.867647]  ? __lock_release+0x5f/0x170
[   91.867652]  ? kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.867721]  kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
[   91.867791]  kvm_xen_hcall_evtchn_send.constprop.0+0x138/0x1c0 [kvm]
[   91.867875]  kvm_xen_hypercall+0x475/0x5a0 [kvm]
[   91.867947]  ? rcu_read_lock_sched_held+0x43/0x70
[   91.867952]  vmx_handle_exit+0xe/0x50 [kvm_intel]
[   91.867966]  vcpu_enter_guest.constprop.0+0xb08/0x1270 [kvm]
[   91.868046]  ? lock_acquire.part.0+0xa8/0x210
[   91.868050]  ? vcpu_run+0x1bd/0x450 [kvm]
[   91.868117]  ? lock_acquire+0x102/0x140
[   91.868121]  vcpu_run+0x1bd/0x450 [kvm]
[   91.868189]  kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
[   91.868257]  kvm_vcpu_ioctl+0x279/0x700 [kvm]
[   91.868322]  ? get_cpu_entry_area+0xb/0x30
[   91.868327]  ? _raw_spin_unlock_irq+0x34/0x50
[   91.868330]  ? do_setitimer+0x190/0x1e0
[   91.868335]  __x64_sys_ioctl+0x8a/0xc0
[   91.868338]  do_syscall_64+0x3b/0x90
[   91.868341]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   91.868345] RIP: 0033:0x7f313103fd1b
[   91.868348] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
[   91.868350] RSP: 002b:00007ffcdc02dba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   91.868353] RAX: ffffffffffffffda RBX: 00007f31313d2000 RCX: 00007f313103fd1b
[   91.868355] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
[   91.868356] RBP: 00007f313139a6c0 R08: 000000000065a2f0 R09: 0000000000000000
[   91.868357] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000065c800
[   91.868359] R13: 000000000000000a R14: 00007f31313d0ff1 R15: 000000000065a2a0
[   91.868363]  </TASK>


