From: David Ward <david.ward@gatech.edu>
To: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>,
	Harry Wentland <harry.wentland@amd.com>,
	Leo Li <sunpeng.li@amd.com>,
	Alexander Deucher <Alexander.Deucher@amd.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Subject: Lockdep bug during hdcp_create_workqueue() (Was: BUG: key ffff8b521bda9148 has not been registered!)
Date: Tue, 4 May 2021 07:58:24 -0400	[thread overview]
Message-ID: <44b4dbe2-a808-9788-7a4f-dfd628a93256@gatech.edu> (raw)
In-Reply-To: <CABXGCsPZvfsUBiMr5fdQYRf26bMchN=UL8oXojgoVbWtwJhXjQ@mail.gmail.com>


On 1/9/21 7:42 AM, Mikhail Gavrilov wrote:
> Hi folks!
> I started to see this message on every boot after replacing the Radeon VII with a 6900XT.
>
> <...>
>
> [    6.333672] [drm] REG_WAIT timeout 1us * 100000 tries -
> mpc2_assert_idle_mpcc line:480
> [    6.335258] BUG: key ffff8b521bda9148 has not been registered!
> [    6.335271] ------------[ cut here ]------------
> [    6.335273] DEBUG_LOCKS_WARN_ON(1)
> [    6.335279] WARNING: CPU: 18 PID: 525 at
> kernel/locking/lockdep.c:4618 lockdep_init_map_waits+0x18b/0x210
> [    6.335284] Modules linked in: fjes(-) amdgpu(+) iommu_v2 gpu_sched
> ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm
> ghash_clmulni_intel ccp igb nvme nvme_core dca i2c_algo_bit wmi
> pinctrl_amd fuse
> [    6.335298] CPU: 18 PID: 525 Comm: systemd-udevd Not tainted
> 5.10.0-0.rc6.20201204git34816d20f173.92.fc34.x86_64 #1
> [    6.335302] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> [    6.335306] RIP: 0010:lockdep_init_map_waits+0x18b/0x210
> [    6.335309] Code: 00 85 c0 0f 84 75 ff ff ff 8b 3d 18 c4 f1 01 85
> ff 0f 85 67 ff ff ff 48 c7 c6 68 43 60 97 48 c7 c7 1d 90 5a 97 e8 70
> 1f b6 00 <0f> 0b e9 4d ff ff ff e8 19 59 bc 00 85 c0 74 21 44 8b 1d e6
> c3 f1
> [    6.335315] RSP: 0018:ffff9e5a013d3910 EFLAGS: 00010282
> [    6.335317] RAX: 0000000000000016 RBX: ffffffff97247d80 RCX: ffff8b5908fdb238
> [    6.335320] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8b5908fdb230
> [    6.335322] RBP: ffff8b520e2a7978 R08: 0000000000000000 R09: 0000000000000000
> [    6.335325] R10: ffff9e5a013d3740 R11: ffff8b592e2fffe8 R12: ffff8b521bda9148
> [    6.335327] R13: 0000000000000000 R14: ffff8b521bc30330 R15: ffff8b521bc30330
> [    6.335330] FS:  00007fe019eb9140(0000) GS:ffff8b5908e00000(0000)
> knlGS:0000000000000000
> [    6.335333] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.335336] CR2: 00007fe018f5e000 CR3: 00000001142ee000 CR4: 0000000000350ee0
> [    6.335338] Call Trace:
> [    6.335342]  __kernfs_create_file+0x7b/0x100
> [    6.335344]  sysfs_add_file_mode_ns+0xa3/0x190
> [    6.335347]  sysfs_create_bin_file+0x50/0x70
> [    6.335428]  hdcp_create_workqueue+0x3bd/0x410 [amdgpu]
> [    6.335499]  amdgpu_dm_init.isra.0.cold+0x136/0x126d [amdgpu]
> [    6.335570]  ? psp_set_srm+0xb0/0xb0 [amdgpu]
> [    6.335637]  ? hdcp_update_display+0x1f0/0x1f0 [amdgpu]
> [    6.335641]  ? dev_printk_emit+0x3e/0x40
> [    6.335709]  dm_hw_init+0xe/0x20 [amdgpu]
> [    6.335776]  amdgpu_device_init.cold+0x18c3/0x1bbc [amdgpu]
> [    6.335781]  ? pci_bus_read_config_word+0x39/0x50
> [    6.335831]  amdgpu_driver_load_kms+0x2b/0x1f0 [amdgpu]
> [    6.335879]  amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
> [    6.335889]  local_pci_probe+0x42/0x80
> [    6.335891]  pci_device_probe+0xd9/0x1a0
> [    6.335896]  really_probe+0x205/0x460
> [    6.335898]  driver_probe_device+0xe1/0x150
> [    6.335901]  device_driver_attach+0xa8/0xb0
> [    6.335904]  __driver_attach+0x8c/0x150
> [    6.335907]  ? device_driver_attach+0xb0/0xb0
> [    6.335909]  ? device_driver_attach+0xb0/0xb0
> [    6.335911]  bus_for_each_dev+0x67/0x90
> [    6.335914]  bus_add_driver+0x12e/0x1f0
> [    6.335917]  driver_register+0x8b/0xe0
> [    6.335919]  ? 0xffffffffc0e4c000
> [    6.335922]  do_one_initcall+0x67/0x320
> [    6.335925]  ? rcu_read_lock_sched_held+0x3f/0x80
> [    6.335928]  ? trace_kmalloc+0xb2/0xe0
> [    6.335930]  ? kmem_cache_alloc_trace+0x157/0x270
> [    6.335934]  do_init_module+0x5c/0x260
> [    6.335936]  __do_sys_init_module+0x13d/0x1a0
> [    6.335940]  do_syscall_64+0x33/0x40
> [    6.335943]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [    6.335945] RIP: 0033:0x7fe01aab2efe
> [    6.335948] Code: 48 8b 0d 7d 1f 0c 00 f7 d8 64 89 01 48 83 c8 ff
> c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4a 1f 0c 00 f7 d8 64 89
> 01 48
> [    6.335953] RSP: 002b:00007ffdf4879928 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000af
> [    6.335957] RAX: ffffffffffffffda RBX: 00005636774ad820 RCX: 00007fe01aab2efe
> [    6.335959] RDX: 00005636774856e0 RSI: 0000000000b4f95e RDI: 00007fe01840f010
> [    6.335962] RBP: 00007fe01840f010 R08: 000056367748bd30 R09: 0000000000b4f970
> [    6.335964] R10: 00005633142fc82b R11: 0000000000000246 R12: 00005636774856e0
> [    6.335967] R13: 00005636774d22d0 R14: 0000000000000000 R15: 00005636774a1d80
> [    6.335971] irq event stamp: 343839
> [    6.335973] hardirqs last  enabled at (343839):
> [<ffffffff96162861>] console_unlock+0x511/0x640
> [    6.335977] hardirqs last disabled at (343838):
> [<ffffffff961627c8>] console_unlock+0x478/0x640
> [    6.335981] softirqs last  enabled at (343730):
> [<ffffffff96e01112>] asm_call_irq_on_stack+0x12/0x20
> [    6.335984] softirqs last disabled at (343657):
> [<ffffffff96e01112>] asm_call_irq_on_stack+0x12/0x20
> [    6.335987] ---[ end trace a4445e953bea9224 ]---

Another user I am helping is seeing this bug, with a very similar stack 
trace, in v5.12 (vanilla build) on different hardware.


> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux lockdep_init_map_waits+0x18b
> lockdep_init_map_waits+0x18b/0x210:
> lockdep_init_map_waits at kernel/locking/lockdep.c:4618 (discriminator 7)
>
> $ git blame -L 4613,4623 kernel/locking/lockdep.c

I assume the issue is not actually in the lockdep code itself, but more 
likely in the amdgpu / amd display code that ultimately calls it.
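
A quick way to sanity-check that assumption is to look at the key address lockdep printed. Lockdep only accepts a lock class key that is a static object (kernel image or module data) or one registered with lockdep_register_key(). The Python sketch below is a rough heuristic and not part of the original report: it assumes an x86_64 kernel, where the kernel image and modules are mapped at or above 0xffffffff80000000, and the helper names (extract_key, looks_static) are made up for illustration.

```python
import re

# Assumption: x86_64 layout, where the kernel image and loadable modules
# are mapped in the top 2 GiB of the virtual address space.
KERNEL_IMAGE_BASE = 0xFFFFFFFF80000000

# Matches the message lockdep prints for a key it does not recognise.
KEY_RE = re.compile(r"BUG: key ([0-9a-f]{16}) has not been registered!")


def extract_key(log_line):
    """Return the lock class key address from a lockdep BUG line, or None."""
    m = KEY_RE.search(log_line)
    return int(m.group(1), 16) if m else None


def looks_static(addr):
    """Heuristic only: True if addr lies in the kernel image/module region."""
    return addr >= KERNEL_IMAGE_BASE


line = "[    6.335258] BUG: key ffff8b521bda9148 has not been registered!"
key = extract_key(line)
print(hex(key), "static-looking" if looks_static(key) else "not static-looking")
```

For the key in the quoted trace, this reports an address outside the kernel image and module region, and the same value shows up in R12. That would be consistent with a key embedded in a dynamically allocated structure, which points at the caller rather than at lockdep itself.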

Using scripts/decode_stacktrace.sh, the stack trace for v5.12 reads like 
this:

[ 12.817369] Call Trace:
[ 12.819991] __kernfs_create_file (fs/kernfs/file.c:998)
[ 12.824581] sysfs_add_file_mode_ns (fs/sysfs/file.c:324)
[ 12.829334] ? init_timer_key (kernel/time/timer.c:816)
[ 12.833527] sysfs_create_bin_file (fs/sysfs/file.c:558)
[ 12.838115] hdcp_create_workqueue (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:648) amdgpu
[ 12.843964] amdgpu_dm_init.isra.0.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1481) amdgpu
[ 12.850303] ? lock_acquire (kernel/locking/lockdep.c:437 kernel/locking/lockdep.c:5513 kernel/locking/lockdep.c:5476)
[ 12.854303] ? lock_is_held_type (kernel/locking/lockdep.c:5254 kernel/locking/lockdep.c:5550)
[ 12.858779] ? smum_send_msg_to_smc_with_parameter (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smumgr.c:169) amdgpu
[ 12.865954] ? find_held_lock (kernel/locking/lockdep.c:5004)
[ 12.870065] ? smum_send_msg_to_smc_with_parameter (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smumgr.c:169) amdgpu
[ 12.877258] ? psp_set_srm (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:396) amdgpu
[ 12.882124] ? hdcp_update_display (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:431) amdgpu
[ 12.887914] ? arch_jump_label_transform (arch/x86/kernel/jump_label.c:99)
[ 12.892945] ? sched_clock_cpu (kernel/sched/clock.c:371)
[ 12.897051] dm_hw_init (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1712) amdgpu
[ 12.901526] amdgpu_device_init.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:295) amdgpu
[ 12.907703] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:658 ./arch/x86/include/asm/irqflags.h:145 ./include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 12.913015] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:157 (discriminator 6)) amdgpu
[ 12.918773] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1221) amdgpu

Line numbers above correspond to the v5.12 tag in Linus's tree (this is 
a vanilla kernel).
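
To pick out the first driver frame from a decoded trace like the one above, something along these lines could be used. It is only a sketch: it assumes each frame has been joined onto a single line in the form "[ time] function (file:line) module", it skips the "?" frames that the unwinder marks as unreliable, and first_module_frame is a hypothetical helper name.

```python
import re

# One decoded frame: timestamp, optional "? " marker, function name,
# source location(s) in parentheses, and an optional module name.
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<unreliable>\? )?(?P<func>\S+) "
    r"\((?P<loc>.*)\)(?: (?P<module>\S+))?$"
)


def first_module_frame(lines, module):
    """Return (function, location) of the first reliable frame in `module`."""
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m and not m.group("unreliable") and m.group("module") == module:
            return m.group("func"), m.group("loc")
    return None


trace = [
    "[ 12.819991] __kernfs_create_file (fs/kernfs/file.c:998)",
    "[ 12.824581] sysfs_add_file_mode_ns (fs/sysfs/file.c:324)",
    "[ 12.829334] ? init_timer_key (kernel/time/timer.c:816)",
    "[ 12.833527] sysfs_create_bin_file (fs/sysfs/file.c:558)",
    "[ 12.838115] hdcp_create_workqueue (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_hdcp.c:648) amdgpu",
    "[ 12.843964] amdgpu_dm_init.isra.0.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1481) amdgpu",
]
print(first_module_frame(trace, "amdgpu"))
```

On the frames shown here, the first reliable amdgpu frame is hdcp_create_workqueue at amdgpu_dm_hdcp.c:648, which is the caller of sysfs_create_bin_file.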


> Who can help fix this?
>
> Full kernel logs are here: https://pastebin.com/d2Nq01SX

I created an issue for this bug before I found this e-mail:
https://gitlab.freedesktop.org/drm/amd/-/issues/1586

The full kernel logs for v5.12 are posted there.


Thank you,

David



Thread overview: 5+ messages
2021-01-09 12:42 BUG: key ffff8b521bda9148 has not been registered! Mikhail Gavrilov
2021-01-09 12:42 ` Mikhail Gavrilov
2021-05-04 11:58 ` David Ward [this message]
2021-05-10  9:30   ` [PATCH] drm/amd/display: Initialize attribute for hdcp_srm sysfs file David Ward
2021-05-10 21:24     ` Alex Deucher
