All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 109692] deadlock occurs during GPU reset
@ 2019-02-20 17:30 bugzilla-daemon
  2019-02-20 19:27 ` bugzilla-daemon
                   ` (43 more replies)
  0 siblings, 44 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 17:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 8239 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

            Bug ID: 109692
           Summary: deadlock occurs during GPU reset
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mikhail.v.gavrilov@gmail.com

Created attachment 143419
  --> https://bugs.freedesktop.org/attachment.cgi?id=143419&action=edit
dmesg

Steps for reproduce:
1. $ git clone git://people.freedesktop.org/~agd5f/linux -b
amd-staging-drm-next
2. $ make bzImage && make module
3. # make modules_install && make install
4. Launch "Shadow of the Tomb Raider"
--- Here GPU hung occurs ---
and after few time 
--- Here start GPU reset ---
--- Here Deadlock occurs ---

[  291.746741] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0
ring:158 vmid:7 pasid:32774, for process SOTTR.exe pid 5250 thread SOTTR.exe
pid 5250)
[  291.746750] amdgpu 0000:0b:00.0:   in page starting at address
0x0000000000002000 from 27
[  291.746754] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C
[  297.135183] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[  302.255032] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[  302.265813] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=13292, emitted seq=13293
[  302.265950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250
[  302.265974] amdgpu 0000:0b:00.0: GPU reset begin!

[  302.266337] ======================================================
[  302.266338] WARNING: possible circular locking dependency detected
[  302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G         C       
[  302.266341] ------------------------------------------------------
[  302.266343] kworker/5:2/871 is trying to acquire lock:
[  302.266345] 000000000abbb16a (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[  302.266352] 
               but task is already holding lock:
[  302.266353] 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266358] 
               which lock already depends on the new lock.

[  302.266360] 
               the existing dependency chain (in reverse order) is:
[  302.266361] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[  302.266366]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[  302.266368]        dma_fence_signal+0x111/0x1a0
[  302.266414]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[  302.266470]        sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
[  302.266523]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[  302.266576]        amdgpu_ih_process+0x84/0xf0 [amdgpu]
[  302.266628]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[  302.266632]        __handle_irq_event_percpu+0x3f/0x290
[  302.266635]        handle_irq_event_percpu+0x31/0x80
[  302.266637]        handle_irq_event+0x34/0x51
[  302.266639]        handle_edge_irq+0x7c/0x1a0
[  302.266643]        handle_irq+0xbf/0x100
[  302.266646]        do_IRQ+0x61/0x120
[  302.266648]        ret_from_intr+0x0/0x22
[  302.266651]        cpuidle_enter_state+0xbf/0x470
[  302.266654]        do_idle+0x1ec/0x280
[  302.266657]        cpu_startup_entry+0x19/0x20
[  302.266660]        start_secondary+0x1b3/0x200
[  302.266663]        secondary_startup_64+0xa4/0xb0
[  302.266664] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[  302.266668]        _raw_spin_lock_irqsave+0x49/0x83
[  302.266670]        dma_fence_remove_callback+0x1a/0x60
[  302.266673]        drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.266717]        amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.266761]        amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.266822]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.266827]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.266831]        process_one_work+0x272/0x5d0
[  302.266834]        worker_thread+0x50/0x3b0
[  302.266836]        kthread+0x108/0x140
[  302.266839]        ret_from_fork+0x27/0x50
[  302.266840] 
               other info that might help us debug this:

[  302.266841]  Possible unsafe locking scenario:

[  302.266842]        CPU0                    CPU1
[  302.266843]        ----                    ----
[  302.266844]   lock(&(&sched->job_list_lock)->rlock);
[  302.266846]                               
lock(&(&ring->fence_drv.lock)->rlock);
[  302.266847]                               
lock(&(&sched->job_list_lock)->rlock);
[  302.266849]   lock(&(&ring->fence_drv.lock)->rlock);
[  302.266850] 
                *** DEADLOCK ***

[  302.266852] 5 locks held by kworker/5:2/871:
[  302.266853]  #0: 00000000d133fb6e ((wq_completion)"events"){+.+.}, at:
process_one_work+0x1e9/0x5d0
[  302.266857]  #1: 000000008a5c3f7e
((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
process_one_work+0x1e9/0x5d0
[  302.266862]  #2: 00000000b9b2c76f (&adev->lock_reset){+.+.}, at:
amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
[  302.266908]  #3: 00000000ac637728 (&dqm->lock_hidden){+.+.}, at:
kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[  302.266965]  #4: 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.},
at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266971] 
               stack backtrace:
[  302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G         C       
5.0.0-rc1-drm-next-kernel+ #1
[  302.266976] Hardware name: System manufacturer System Product Name/ROG STRIX
X470-I GAMING, BIOS 1103 11/16/2018
[  302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  302.266982] Call Trace:
[  302.266987]  dump_stack+0x85/0xc0
[  302.266991]  print_circular_bug.isra.0.cold+0x15c/0x195
[  302.266994]  __lock_acquire+0x134c/0x1660
[  302.266998]  ? add_lock_to_list.isra.0+0x67/0xb0
[  302.267003]  lock_acquire+0xa2/0x1b0
[  302.267006]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267011]  _raw_spin_lock_irqsave+0x49/0x83
[  302.267013]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267016]  dma_fence_remove_callback+0x1a/0x60
[  302.267020]  drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.267065]  amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.267110]  amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.267173]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.267178]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.267183]  process_one_work+0x272/0x5d0
[  302.267188]  worker_thread+0x50/0x3b0
[  302.267191]  kthread+0x108/0x140
[  302.267194]  ? process_one_work+0x5d0/0x5d0
[  302.267196]  ? kthread_park+0x90/0x90
[  302.267199]  ret_from_fork+0x27/0x50
[  302.692194] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[  302.692234] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  302.768931] amdgpu 0000:0b:00.0: GPU BACO reset
[  303.278874] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[  303.279006] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[  303.279072] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  303.279234] [drm] PSP is resuming...
[  303.426601] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[  303.572227] [drm] UVD and UVD ENC initialized successfully.
[  303.687727] [drm] VCE initialized successfully.
[  303.689585] [drm] recover vram bo from shadow start
[  303.722757] [drm] recover vram bo from shadow done
[  303.722761] [drm] Skip scheduling IBs!
[  303.722791] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
[  303.722811] [drm] Skip scheduling IBs!
[  303.722838] [drm] Skip scheduling IBs!
[  303.722846] [drm] Skip scheduling IBs!
[  303.722854] [drm] Skip scheduling IBs!
[  303.722863] [drm] Skip scheduling IBs!
[  303.722871] [drm] Skip scheduling IBs!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 9817 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2019-11-19  9:14 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
2019-02-20 19:27 ` bugzilla-daemon
2019-02-20 19:41 ` bugzilla-daemon
2019-02-20 21:01 ` bugzilla-daemon
2019-02-20 21:01 ` bugzilla-daemon
2019-02-20 21:02 ` bugzilla-daemon
2019-02-20 21:06 ` bugzilla-daemon
2019-02-21 11:46 ` bugzilla-daemon
2019-02-21 14:23 ` bugzilla-daemon
2019-02-21 17:23 ` bugzilla-daemon
2019-02-26 10:42 ` bugzilla-daemon
2019-02-26 10:43 ` bugzilla-daemon
2019-02-26 15:17 ` bugzilla-daemon
2019-02-26 15:20 ` bugzilla-daemon
2019-02-27  4:24 ` bugzilla-daemon
2019-02-27  4:25 ` bugzilla-daemon
2019-02-27  9:23 ` bugzilla-daemon
2019-02-28 16:43 ` bugzilla-daemon
2019-02-28 16:45 ` bugzilla-daemon
2019-04-05 20:41 ` bugzilla-daemon
2019-04-05 20:51 ` bugzilla-daemon
2019-04-06 14:44 ` bugzilla-daemon
2019-04-06 14:48 ` bugzilla-daemon
2019-04-08 14:46 ` bugzilla-daemon
2019-04-08 14:48 ` bugzilla-daemon
2019-04-08 21:28 ` bugzilla-daemon
2019-04-09  1:51 ` bugzilla-daemon
2019-04-09  3:15 ` bugzilla-daemon
2019-04-09  3:16 ` bugzilla-daemon
2019-04-09 16:34 ` bugzilla-daemon
2019-04-10  3:41 ` bugzilla-daemon
2019-04-10 14:25 ` bugzilla-daemon
2019-04-10 19:01 ` bugzilla-daemon
2019-04-10 20:11 ` bugzilla-daemon
2019-04-10 20:14 ` bugzilla-daemon
2019-04-11  3:53 ` bugzilla-daemon
2019-04-11  4:03 ` bugzilla-daemon
2019-04-11 14:25 ` bugzilla-daemon
2019-04-11 14:41 ` bugzilla-daemon
2019-04-11 14:52 ` bugzilla-daemon
2019-04-11 16:32 ` bugzilla-daemon
2019-04-11 16:42 ` bugzilla-daemon
2019-05-12 20:58 ` bugzilla-daemon
2019-05-12 21:00 ` bugzilla-daemon
2019-11-19  9:14 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.