Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

* Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
@ 2020-11-18 23:04 Heiner Kallweit
  2020-11-19 13:02 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Heiner Kallweit @ 2020-11-18 23:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Linux Kernel Mailing List

Just got the following when running perf.

[  648.247718] ======================================================
[  648.247725] WARNING: possible circular locking dependency detected
[  648.247734] 5.10.0-rc4-next-20201118+ #1 Not tainted
[  648.247740] ------------------------------------------------------
[  648.247748] perf/19761 is trying to acquire lock:
[  648.247755] ffffa00200abad18 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x2f/0x80
[  648.247777]
               but task is already holding lock:
[  648.247785] ffffa0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
[  648.247801]
               which lock already depends on the new lock.

[  648.247810]
               the existing dependency chain (in reverse order) is:
[  648.247818]
               -> #5 (&cpuctx_mutex){+.+.}-{3:3}:
[  648.247834]        __mutex_lock+0x88/0x900
[  648.247840]        mutex_lock_nested+0x16/0x20
[  648.247848]        perf_event_init_cpu+0x89/0x140
[  648.247857]        perf_event_init+0x172/0x1a0
[  648.247864]        start_kernel+0x655/0x7de
[  648.247871]        x86_64_start_reservations+0x24/0x26
[  648.247878]        x86_64_start_kernel+0x70/0x74
[  648.247887]        secondary_startup_64_no_verify+0xb0/0xbb
[  648.247894]
               -> #4 (pmus_lock){+.+.}-{3:3}:
[  648.247907]        __mutex_lock+0x88/0x900
[  648.247914]        mutex_lock_nested+0x16/0x20
[  648.247921]        perf_event_init_cpu+0x52/0x140
[  648.247929]        cpuhp_invoke_callback+0xa4/0x810
[  648.247937]        _cpu_up+0xaa/0x150
[  648.247943]        cpu_up+0x79/0x90
[  648.247949]        bringup_nonboot_cpus+0x4d/0x60
[  648.247958]        smp_init+0x25/0x65
[  648.247964]        kernel_init_freeable+0x144/0x267
[  648.247972]        kernel_init+0x9/0xf8
[  648.247978]        ret_from_fork+0x22/0x30
[  648.247984]
               -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  648.247998]        cpus_read_lock+0x38/0xb0
[  648.248006]        stop_machine+0x18/0x40
[  648.248075]        bxt_vtd_ggtt_insert_entries__BKL+0x37/0x50 [i915]
[  648.248129]        ggtt_bind_vma+0x43/0x60 [i915]
[  648.248192]        __vma_bind+0x38/0x40 [i915]
[  648.248242]        fence_work+0x21/0xac [i915]
[  648.248292]        fence_notify+0x95/0x134 [i915]
[  648.248342]        __i915_sw_fence_complete+0x3b/0x1d0 [i915]
[  648.248394]        i915_sw_fence_commit+0x12/0x20 [i915]
[  648.248458]        i915_vma_pin_ww+0x25c/0x8c0 [i915]
[  648.248520]        i915_ggtt_pin+0x52/0xf0 [i915]
[  648.248576]        intel_ring_pin+0x5b/0x110 [i915]
[  648.248628]        __intel_context_do_pin_ww+0xd3/0x510 [i915]
[  648.248681]        __intel_context_do_pin+0x55/0x90 [i915]
[  648.248734]        intel_engines_init+0x43d/0x570 [i915]
[  648.248787]        intel_gt_init+0x119/0x2d0 [i915]
[  648.248848]        i915_gem_init+0x133/0x1c0 [i915]
[  648.248895]        i915_driver_probe+0x68d/0xc90 [i915]
[  648.248943]        i915_pci_probe+0x45/0x120 [i915]
[  648.248952]        pci_device_probe+0xd8/0x150
[  648.248960]        really_probe+0x259/0x460
[  648.248967]        driver_probe_device+0x50/0xb0
[  648.248973]        device_driver_attach+0xad/0xc0
[  648.248980]        __driver_attach+0x75/0x110
[  648.248988]        bus_for_each_dev+0x7c/0xc0
[  648.248995]        driver_attach+0x19/0x20
[  648.249001]        bus_add_driver+0x117/0x1c0
[  648.249008]        driver_register+0x8c/0xe0
[  648.249015]        __pci_register_driver+0x6e/0x80
[  648.249022]        0xffffffffc0a5c061
[  648.249028]        do_one_initcall+0x5a/0x2c0
[  648.249036]        do_init_module+0x5d/0x240
[  648.249043]        load_module+0x2367/0x2710
[  648.249049]        __do_sys_finit_module+0xb6/0xf0
[  648.249056]        __x64_sys_finit_module+0x15/0x20
[  648.249064]        do_syscall_64+0x38/0x50
[  648.249071]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  648.249078]
               -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[  648.249093]        __ww_mutex_lock.constprop.0+0xac/0x1090
[  648.249100]        ww_mutex_lock+0x3d/0xa0
[  648.249108]        dma_resv_lockdep+0x141/0x281
[  648.249114]        do_one_initcall+0x5a/0x2c0
[  648.249121]        kernel_init_freeable+0x220/0x267
[  648.249129]        kernel_init+0x9/0xf8
[  648.249135]        ret_from_fork+0x22/0x30
[  648.249140]
               -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[  648.249155]        dma_resv_lockdep+0x115/0x281
[  648.249162]        do_one_initcall+0x5a/0x2c0
[  648.249168]        kernel_init_freeable+0x220/0x267
[  648.249176]        kernel_init+0x9/0xf8
[  648.249182]        ret_from_fork+0x22/0x30
[  648.249188]
               -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
[  648.249203]        __lock_acquire+0x125d/0x2160
[  648.249210]        lock_acquire+0x137/0x3e0
[  648.249217]        __might_fault+0x59/0x80
[  648.249223]        perf_copy_attr+0x35/0x340
[  648.249230]        _perf_ioctl+0x3e1/0xd40
[  648.249237]        perf_ioctl+0x34/0x60
[  648.249245]        __x64_sys_ioctl+0x8c/0xb0
[  648.249252]        do_syscall_64+0x38/0x50
[  648.249259]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  648.249265]
               other info that might help us debug this:

[  648.249277] Chain exists of:
                 &mm->mmap_lock#2 --> pmus_lock --> &cpuctx_mutex

[  648.249295]  Possible unsafe locking scenario:

[  648.249302]        CPU0                    CPU1
[  648.249308]        ----                    ----
[  648.249314]   lock(&cpuctx_mutex);
[  648.249321]                                lock(pmus_lock);
[  648.249330]                                lock(&cpuctx_mutex);
[  648.249339]   lock(&mm->mmap_lock#2);
[  648.249347]
                *** DEADLOCK ***

[  648.249355] 1 lock held by perf/19761:
[  648.249360]  #0: ffffa0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
[  648.249377]
               stack backtrace:
[  648.249386] CPU: 0 PID: 19761 Comm: perf Not tainted 5.10.0-rc4-next-20201118+ #1
[  648.249395] Hardware name: NA ZBOX-CI327NANO-GS-01/ZBOX-CI327NANO-GS-01, BIOS 5.12 04/28/2020
[  648.249405] Call Trace:
[  648.249413]  dump_stack+0x7d/0x9f
[  648.249420]  print_circular_bug.cold+0x13c/0x141
[  648.249428]  check_noncircular+0xf1/0x110
[  648.249435]  __lock_acquire+0x125d/0x2160
[  648.249442]  lock_acquire+0x137/0x3e0
[  648.249449]  ? __might_fault+0x2f/0x80
[  648.249456]  __might_fault+0x59/0x80
[  648.249463]  ? __might_fault+0x2f/0x80
[  648.249469]  perf_copy_attr+0x35/0x340
[  648.249476]  _perf_ioctl+0x3e1/0xd40
[  648.249482]  ? __mutex_lock+0x88/0x900
[  648.249489]  ? perf_event_ctx_lock_nested+0xd8/0x1f0
[  648.249497]  ? perf_event_ctx_lock_nested+0x1a/0x1f0
[  648.249504]  ? mutex_lock_nested+0x16/0x20
[  648.249511]  perf_ioctl+0x34/0x60
[  648.249518]  __x64_sys_ioctl+0x8c/0xb0
[  648.249524]  do_syscall_64+0x38/0x50
[  648.249531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  648.249539] RIP: 0033:0x7f68dd378f6b
[  648.249548] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  648.249566] RSP: 002b:00007ffd6c018c58 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  648.249577] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f68dd378f6b
[  648.249586] RDX: 00007ffd6c018c90 RSI: 000000004008240b RDI: 0000000000000003
[  648.249594] RBP: 00007ffd6c018d40 R08: 0000000000000008 R09: 0000003000000018
[  648.249602] R10: fffffffffffff8f8 R11: 0000000000000202 R12: 0000000000000003
[  648.249611] R13: 00007ffd6c018c90 R14: 0000007800000005 R15: 000080056c01a5a0

^ permalink raw reply	[flat|nested] 6+ messages in thread