From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: DRI Development <dri-devel@lists.freedesktop.org>
Cc: "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org,
"Daniel Vetter" <daniel.vetter@ffwll.ch>,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
"Chris Wilson" <chris@chris-wilson.co.uk>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Daniel Vetter" <daniel.vetter@intel.com>
Subject: [PATCH 17/18] drm/amdgpu: gpu recovery does full modesets
Date: Thu, 4 Jun 2020 10:12:23 +0200
Message-ID: <20200604081224.863494-18-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
...
I think it's time to stop this little exercise.
The lockdep splat, for the record:
[ 132.583381] ======================================================
[ 132.584091] WARNING: possible circular locking dependency detected
[ 132.584775] 5.7.0-rc3+ #346 Tainted: G W
[ 132.585461] ------------------------------------------------------
[ 132.586184] kworker/2:3/865 is trying to acquire lock:
[ 132.586857] ffffc90000677c70 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[ 132.587569]
but task is already holding lock:
[ 132.589044] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched]
[ 132.589803]
which lock already depends on the new lock.
[ 132.592009]
the existing dependency chain (in reverse order) is:
[ 132.593507]
-> #2 (dma_fence_map){++++}-{0:0}:
[ 132.595019] dma_fence_begin_signalling+0x50/0x60
[ 132.595767] drm_atomic_helper_commit+0xa1/0x180 [drm_kms_helper]
[ 132.596567] drm_client_modeset_commit_atomic+0x1ea/0x250 [drm]
[ 132.597420] drm_client_modeset_commit_locked+0x55/0x190 [drm]
[ 132.598178] drm_client_modeset_commit+0x24/0x40 [drm]
[ 132.598948] drm_fb_helper_restore_fbdev_mode_unlocked+0x4b/0xa0 [drm_kms_helper]
[ 132.599738] drm_fb_helper_set_par+0x30/0x40 [drm_kms_helper]
[ 132.600539] fbcon_init+0x2e8/0x660
[ 132.601344] visual_init+0xce/0x130
[ 132.602156] do_bind_con_driver+0x1bc/0x2b0
[ 132.602970] do_take_over_console+0x115/0x180
[ 132.603763] do_fbcon_takeover+0x58/0xb0
[ 132.604564] register_framebuffer+0x1ee/0x300
[ 132.605369] __drm_fb_helper_initial_config_and_unlock+0x36e/0x520 [drm_kms_helper]
[ 132.606187] amdgpu_fbdev_init+0xb3/0xf0 [amdgpu]
[ 132.607032] amdgpu_device_init.cold+0xe90/0x1677 [amdgpu]
[ 132.607862] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
[ 132.608697] amdgpu_pci_probe+0xf7/0x180 [amdgpu]
[ 132.609511] local_pci_probe+0x42/0x80
[ 132.610324] pci_device_probe+0x104/0x1a0
[ 132.611130] really_probe+0x147/0x3c0
[ 132.611939] driver_probe_device+0xb6/0x100
[ 132.612766] device_driver_attach+0x53/0x60
[ 132.613593] __driver_attach+0x8c/0x150
[ 132.614419] bus_for_each_dev+0x7b/0xc0
[ 132.615249] bus_add_driver+0x14c/0x1f0
[ 132.616071] driver_register+0x6c/0xc0
[ 132.616902] do_one_initcall+0x5d/0x2f0
[ 132.617731] do_init_module+0x5c/0x230
[ 132.618560] load_module+0x2981/0x2bc0
[ 132.619391] __do_sys_finit_module+0xaa/0x110
[ 132.620228] do_syscall_64+0x5a/0x250
[ 132.621064] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 132.621903]
-> #1 (crtc_ww_class_mutex){+.+.}-{3:3}:
[ 132.623587] __ww_mutex_lock.constprop.0+0xcc/0x10c0
[ 132.624448] ww_mutex_lock+0x43/0xb0
[ 132.625315] drm_modeset_lock+0x44/0x120 [drm]
[ 132.626184] drmm_mode_config_init+0x2db/0x8b0 [drm]
[ 132.627098] amdgpu_device_init.cold+0xbd1/0x1677 [amdgpu]
[ 132.628007] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
[ 132.628920] amdgpu_pci_probe+0xf7/0x180 [amdgpu]
[ 132.629804] local_pci_probe+0x42/0x80
[ 132.630690] pci_device_probe+0x104/0x1a0
[ 132.631583] really_probe+0x147/0x3c0
[ 132.632479] driver_probe_device+0xb6/0x100
[ 132.633379] device_driver_attach+0x53/0x60
[ 132.634275] __driver_attach+0x8c/0x150
[ 132.635170] bus_for_each_dev+0x7b/0xc0
[ 132.636069] bus_add_driver+0x14c/0x1f0
[ 132.636974] driver_register+0x6c/0xc0
[ 132.637870] do_one_initcall+0x5d/0x2f0
[ 132.638765] do_init_module+0x5c/0x230
[ 132.639654] load_module+0x2981/0x2bc0
[ 132.640522] __do_sys_finit_module+0xaa/0x110
[ 132.641372] do_syscall_64+0x5a/0x250
[ 132.642203] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 132.643022]
-> #0 (crtc_ww_class_acquire){+.+.}-{0:0}:
[ 132.644643] __lock_acquire+0x1241/0x23f0
[ 132.645469] lock_acquire+0xad/0x370
[ 132.646274] drm_modeset_acquire_init+0xd2/0x100 [drm]
[ 132.647071] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[ 132.647902] dm_suspend+0x1c/0x60 [amdgpu]
[ 132.648698] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
[ 132.649498] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 132.650300] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu]
[ 132.651084] amdgpu_job_timedout+0xfb/0x150 [amdgpu]
[ 132.651825] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched]
[ 132.652594] process_one_work+0x23c/0x580
[ 132.653402] worker_thread+0x50/0x3b0
[ 132.654139] kthread+0x12e/0x150
[ 132.654868] ret_from_fork+0x27/0x50
[ 132.655598]
other info that might help us debug this:
[ 132.657739] Chain exists of:
crtc_ww_class_acquire --> crtc_ww_class_mutex --> dma_fence_map
[ 132.659877] Possible unsafe locking scenario:
[ 132.661416]        CPU0                    CPU1
[ 132.662126]        ----                    ----
[ 132.662847]   lock(dma_fence_map);
[ 132.663574]                                lock(crtc_ww_class_mutex);
[ 132.664319]                                lock(dma_fence_map);
[ 132.665063]   lock(crtc_ww_class_acquire);
[ 132.665799]
*** DEADLOCK ***
[ 132.667965] 4 locks held by kworker/2:3/865:
[ 132.668701] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580
[ 132.669462] #1: ffffc90000677e58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580
[ 132.670242] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched]
[ 132.671039] #3: ffff8887b84a1748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x59e/0xe64 [amdgpu]
[ 132.671902]
stack backtrace:
[ 132.673515] CPU: 2 PID: 865 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346
[ 132.674347] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018
[ 132.675194] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 132.676046] Call Trace:
[ 132.676897] dump_stack+0x8f/0xd0
[ 132.677748] check_noncircular+0x162/0x180
[ 132.678604] ? stack_trace_save+0x4b/0x70
[ 132.679459] __lock_acquire+0x1241/0x23f0
[ 132.680311] lock_acquire+0xad/0x370
[ 132.681163] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[ 132.682021] ? cpumask_next+0x16/0x20
[ 132.682880] ? module_assert_mutex_or_preempt+0x14/0x40
[ 132.683737] ? __module_address+0x28/0xf0
[ 132.684601] drm_modeset_acquire_init+0xd2/0x100 [drm]
[ 132.685466] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[ 132.686335] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[ 132.687255] dm_suspend+0x1c/0x60 [amdgpu]
[ 132.688152] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
[ 132.689057] ? amdgpu_fence_process+0x4c/0x150 [amdgpu]
[ 132.689963] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 132.690893] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu]
[ 132.691818] amdgpu_job_timedout+0xfb/0x150 [amdgpu]
[ 132.692707] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched]
[ 132.693597] process_one_work+0x23c/0x580
[ 132.694487] worker_thread+0x50/0x3b0
[ 132.695373] ? process_one_work+0x580/0x580
[ 132.696264] kthread+0x12e/0x150
[ 132.697154] ? kthread_create_worker_on_cpu+0x70/0x70
[ 132.698057] ret_from_fork+0x27/0x50
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4c4492de670c..3ea4b9258fb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2441,6 +2441,14 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
/* displays are handled separately */
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
/* XXX handle errors */
+
+ /*
+ * This is dm_suspend, which grabs modeset locks, and
+ * that's a pretty good inversion against dma_fence_signal,
+ * which gpu recovery is supposed to guarantee.
+ *
+ * Don't ask me how to fix this.
+ */
r = adev->ip_blocks[i].version->funcs->suspend(adev);
/* XXX handle errors */
if (r) {
--
2.26.2