All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 109692] deadlock occurs during GPU reset
@ 2019-02-20 17:30 bugzilla-daemon
  2019-02-20 19:27 ` bugzilla-daemon
                   ` (43 more replies)
  0 siblings, 44 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 17:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 8239 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

            Bug ID: 109692
           Summary: deadlock occurs during GPU reset
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mikhail.v.gavrilov@gmail.com

Created attachment 143419
  --> https://bugs.freedesktop.org/attachment.cgi?id=143419&action=edit
dmesg

Steps for reproduce:
1. $ git clone git://people.freedesktop.org/~agd5f/linux -b
amd-staging-drm-next
2. $ make bzImage && make module
3. # make modules_install && make install
4. Launch "Shadow of the Tomb Raider"
--- Here GPU hung occurs ---
and after few time 
--- Here start GPU reset ---
--- Here Deadlock occurs ---

[  291.746741] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0
ring:158 vmid:7 pasid:32774, for process SOTTR.exe pid 5250 thread SOTTR.exe
pid 5250)
[  291.746750] amdgpu 0000:0b:00.0:   in page starting at address
0x0000000000002000 from 27
[  291.746754] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C
[  297.135183] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[  302.255032] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[  302.265813] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=13292, emitted seq=13293
[  302.265950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250
[  302.265974] amdgpu 0000:0b:00.0: GPU reset begin!

[  302.266337] ======================================================
[  302.266338] WARNING: possible circular locking dependency detected
[  302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G         C       
[  302.266341] ------------------------------------------------------
[  302.266343] kworker/5:2/871 is trying to acquire lock:
[  302.266345] 000000000abbb16a (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[  302.266352] 
               but task is already holding lock:
[  302.266353] 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266358] 
               which lock already depends on the new lock.

[  302.266360] 
               the existing dependency chain (in reverse order) is:
[  302.266361] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[  302.266366]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[  302.266368]        dma_fence_signal+0x111/0x1a0
[  302.266414]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[  302.266470]        sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
[  302.266523]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[  302.266576]        amdgpu_ih_process+0x84/0xf0 [amdgpu]
[  302.266628]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[  302.266632]        __handle_irq_event_percpu+0x3f/0x290
[  302.266635]        handle_irq_event_percpu+0x31/0x80
[  302.266637]        handle_irq_event+0x34/0x51
[  302.266639]        handle_edge_irq+0x7c/0x1a0
[  302.266643]        handle_irq+0xbf/0x100
[  302.266646]        do_IRQ+0x61/0x120
[  302.266648]        ret_from_intr+0x0/0x22
[  302.266651]        cpuidle_enter_state+0xbf/0x470
[  302.266654]        do_idle+0x1ec/0x280
[  302.266657]        cpu_startup_entry+0x19/0x20
[  302.266660]        start_secondary+0x1b3/0x200
[  302.266663]        secondary_startup_64+0xa4/0xb0
[  302.266664] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[  302.266668]        _raw_spin_lock_irqsave+0x49/0x83
[  302.266670]        dma_fence_remove_callback+0x1a/0x60
[  302.266673]        drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.266717]        amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.266761]        amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.266822]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.266827]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.266831]        process_one_work+0x272/0x5d0
[  302.266834]        worker_thread+0x50/0x3b0
[  302.266836]        kthread+0x108/0x140
[  302.266839]        ret_from_fork+0x27/0x50
[  302.266840] 
               other info that might help us debug this:

[  302.266841]  Possible unsafe locking scenario:

[  302.266842]        CPU0                    CPU1
[  302.266843]        ----                    ----
[  302.266844]   lock(&(&sched->job_list_lock)->rlock);
[  302.266846]                               
lock(&(&ring->fence_drv.lock)->rlock);
[  302.266847]                               
lock(&(&sched->job_list_lock)->rlock);
[  302.266849]   lock(&(&ring->fence_drv.lock)->rlock);
[  302.266850] 
                *** DEADLOCK ***

[  302.266852] 5 locks held by kworker/5:2/871:
[  302.266853]  #0: 00000000d133fb6e ((wq_completion)"events"){+.+.}, at:
process_one_work+0x1e9/0x5d0
[  302.266857]  #1: 000000008a5c3f7e
((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
process_one_work+0x1e9/0x5d0
[  302.266862]  #2: 00000000b9b2c76f (&adev->lock_reset){+.+.}, at:
amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
[  302.266908]  #3: 00000000ac637728 (&dqm->lock_hidden){+.+.}, at:
kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[  302.266965]  #4: 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.},
at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266971] 
               stack backtrace:
[  302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G         C       
5.0.0-rc1-drm-next-kernel+ #1
[  302.266976] Hardware name: System manufacturer System Product Name/ROG STRIX
X470-I GAMING, BIOS 1103 11/16/2018
[  302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  302.266982] Call Trace:
[  302.266987]  dump_stack+0x85/0xc0
[  302.266991]  print_circular_bug.isra.0.cold+0x15c/0x195
[  302.266994]  __lock_acquire+0x134c/0x1660
[  302.266998]  ? add_lock_to_list.isra.0+0x67/0xb0
[  302.267003]  lock_acquire+0xa2/0x1b0
[  302.267006]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267011]  _raw_spin_lock_irqsave+0x49/0x83
[  302.267013]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267016]  dma_fence_remove_callback+0x1a/0x60
[  302.267020]  drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.267065]  amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.267110]  amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.267173]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.267178]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.267183]  process_one_work+0x272/0x5d0
[  302.267188]  worker_thread+0x50/0x3b0
[  302.267191]  kthread+0x108/0x140
[  302.267194]  ? process_one_work+0x5d0/0x5d0
[  302.267196]  ? kthread_park+0x90/0x90
[  302.267199]  ret_from_fork+0x27/0x50
[  302.692194] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[  302.692234] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  302.768931] amdgpu 0000:0b:00.0: GPU BACO reset
[  303.278874] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[  303.279006] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[  303.279072] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  303.279234] [drm] PSP is resuming...
[  303.426601] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[  303.572227] [drm] UVD and UVD ENC initialized successfully.
[  303.687727] [drm] VCE initialized successfully.
[  303.689585] [drm] recover vram bo from shadow start
[  303.722757] [drm] recover vram bo from shadow done
[  303.722761] [drm] Skip scheduling IBs!
[  303.722791] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
[  303.722811] [drm] Skip scheduling IBs!
[  303.722838] [drm] Skip scheduling IBs!
[  303.722846] [drm] Skip scheduling IBs!
[  303.722854] [drm] Skip scheduling IBs!
[  303.722863] [drm] Skip scheduling IBs!
[  303.722871] [drm] Skip scheduling IBs!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 9817 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
@ 2019-02-20 19:27 ` bugzilla-daemon
  2019-02-20 19:41 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 19:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 285 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #1 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Thanks, I will try to find time sometime soon to investigate this new deadlock.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1045 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
  2019-02-20 19:27 ` bugzilla-daemon
@ 2019-02-20 19:41 ` bugzilla-daemon
  2019-02-20 21:01 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 19:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 268 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #2 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Is this deadlock happens to you always now or only sometimes ?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1028 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
  2019-02-20 19:27 ` bugzilla-daemon
  2019-02-20 19:41 ` bugzilla-daemon
@ 2019-02-20 21:01 ` bugzilla-daemon
  2019-02-20 21:01 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 21:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 306 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #3 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143422
  --> https://bugs.freedesktop.org/attachment.cgi?id=143422&action=edit
yet another dmesg

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1179 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (2 preceding siblings ...)
  2019-02-20 21:01 ` bugzilla-daemon
@ 2019-02-20 21:01 ` bugzilla-daemon
  2019-02-20 21:02 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 21:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 306 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #4 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143423
  --> https://bugs.freedesktop.org/attachment.cgi?id=143423&action=edit
yet another dmesg

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1179 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (3 preceding siblings ...)
  2019-02-20 21:01 ` bugzilla-daemon
@ 2019-02-20 21:02 ` bugzilla-daemon
  2019-02-20 21:06 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 21:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 341 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #5 from mikhail.v.gavrilov@gmail.com ---
(In reply to Andrey Grodzovsky from comment #2)
> Is this deadlock happens to you always now or only sometimes ?

Looks like deadlock happens always.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1159 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (4 preceding siblings ...)
  2019-02-20 21:02 ` bugzilla-daemon
@ 2019-02-20 21:06 ` bugzilla-daemon
  2019-02-21 11:46 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-20 21:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 380 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #6 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Andrey Grodzovsky from comment #2)
> Is this deadlock happens to you always now or only sometimes ?

I think i see the problem but not sure yet how to fix it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1211 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (5 preceding siblings ...)
  2019-02-20 21:06 ` bugzilla-daemon
@ 2019-02-21 11:46 ` bugzilla-daemon
  2019-02-21 14:23 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-21 11:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 852 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #7 from Tom St Denis <tom.stdenis@amd.com> ---
Hi from Reddit!

I don't see the attached umr output in the bugzilla report.  Though it looks
like Andrey (who is on my team) is looking at it.

It looks like an unhandled (or just bad) VM fault is locking up the GFX ring
and then a reset occurs which then brings up your lockdep.

So it's really two bugs (or 1 and a half)

1.  GPU hang due to VM fault
2.  GPU reset not working properly

The VM fault shouldn't lock up the GPU though even if it's the applications
fault (which does happen).  

The VM fault could be because of a bug in the kernel processing a CS, could be
from the UMD (mesa/etc), or could be from the application itself.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1601 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (6 preceding siblings ...)
  2019-02-21 11:46 ` bugzilla-daemon
@ 2019-02-21 14:23 ` bugzilla-daemon
  2019-02-21 17:23 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-21 14:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 404 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #8 from Tom St Denis <tom.stdenis@amd.com> ---
Just a follow up.  If you "chmod +s" umr as a non-root user then even if you
sudo it it won't have permissions to read the debugfs files.

Make sure you "sudo chmod +s" the binary or simply just "sudo umr".

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1183 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (7 preceding siblings ...)
  2019-02-21 14:23 ` bugzilla-daemon
@ 2019-02-21 17:23 ` bugzilla-daemon
  2019-02-26 10:42 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-21 17:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1959 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christian.koenig@amd.com

--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Adding Christian to consult on solution to the problem:

The deadlock happens because we lock fence->lock from dma_fence_signal followed
by sched->job_list_lock from drm_sched_process_job on one hand while on the
other do reverse order locking in drm_sched_stop, locking first
sched->job_list_lock and then fence->lock when trying to remove the cb from
dma_fence_remove_callback.

I see 2 possible solutions - 

1) Change ring_mirror_list to lock-less list (e.g. RCU list), problem is I
don't think there is a readily available implementation of doubly linked
lockless list, I didn't do to much search though... 

2) Go back to removing a job from  ring_mirror_list in drm_sched_job_finish -
as I see it for this to work we need to add a wait_queue (signal) to
drm_sched_job which will signal AFTER job is removed from ring_mirror_list in
drm_sched_job_finish, then in drm_sched_stop when iterating over
ring_mirror_list you build a new list of all jobs in progress (for them
dma_fence_remove_callback returns false) and then you wait on all theirs
wait_queues - then you are sure they removed themselves from ring_mirror_list
and you can proceed. 
Specific to AMDGPU  - To insure the jobs are not being freed in
drm_sched_job_finish while we process them in drm_sched_stop it's not enough to
wait to do  cancel_delayed_work_sync(&sched->work_tdr) for this scheduler but
also for all the other schedulers in the device and even in the hive for XGMI
use case.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3278 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (8 preceding siblings ...)
  2019-02-21 17:23 ` bugzilla-daemon
@ 2019-02-26 10:42 ` bugzilla-daemon
  2019-02-26 10:43 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-26 10:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2449 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #10 from mikhail.v.gavrilov@gmail.com ---
Even without reproducing GPU hang in kernel log I found "suspicious RCU usage"
and some errors.

[drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* ctx 000000002caf7aed is still
alive
[drm:amdgpu_ctx_mgr_fini [amdgpu]] *ERROR* ctx 000000002caf7aed is still alive

=============================
WARNING: suspicious RCU usage
5.0.0-rc1-drm-next-kernel+ #1 Tainted: G         C       
-----------------------------
include/linux/rcupdate.h:280 Illegal context switch in RCU read-side critical
section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
3 locks held by CrashBandicootN/26312:
 #0: 00000000eb680bad (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x4d/0x60
 #1: 00000000b3a3c406 (&p->lock){+.+.}, at: seq_read+0x38/0x410
 #2: 000000007c893f05 (rcu_read_lock){....}, at: dev_seq_start+0x5/0x100

stack backtrace:
CPU: 8 PID: 26312 Comm: CrashBandicootN Tainted: G         C       
5.0.0-rc1-drm-next-kernel+ #1
Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING,
BIOS 1103 11/16/2018
Call Trace:
 dump_stack+0x85/0xc0
 ___might_sleep+0x100/0x180
 __mutex_lock+0x61/0x930
 ? igb_get_stats64+0x29/0x80 [igb]
 ? seq_vprintf+0x33/0x50
 ? igb_get_stats64+0x29/0x80 [igb]
 igb_get_stats64+0x29/0x80 [igb]
 dev_get_stats+0x5c/0xc0
 dev_seq_printf_stats+0x33/0xe0
 dev_seq_show+0x10/0x30
 seq_read+0x2fa/0x410
 proc_reg_read+0x3c/0x60
 __vfs_read+0x37/0x1b0
 vfs_read+0xb2/0x170
 ksys_read+0x52/0xc0
 do_syscall_64+0x5c/0xa0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f2188d8934c
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 79 c9 01 00 48 8b 54
24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 30
44 89 c7 48 89 44 24 08 e8 af c9 01 00 48
RSP: 002b:000000000023f010 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000007d11b6d0 RCX: 00007f2188d8934c
RDX: 0000000000000400 RSI: 000000007d0dd4f0 RDI: 000000000000007b
RBP: 0000000000000d68 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f2188621c40 R11: 0000000000000246 R12: 00007f2188e59740
R13: 00007f2188e5a340 R14: 00000000000001ff R15: 000000007d11b6d0

This only occures when I use "amd-staging-drm-next".

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3237 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (9 preceding siblings ...)
  2019-02-26 10:42 ` bugzilla-daemon
@ 2019-02-26 10:43 ` bugzilla-daemon
  2019-02-26 15:17 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-26 10:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 352 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #11 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143476
  --> https://bugs.freedesktop.org/attachment.cgi?id=143476&action=edit
[drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* - kernel log

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1316 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (10 preceding siblings ...)
  2019-02-26 10:43 ` bugzilla-daemon
@ 2019-02-26 15:17 ` bugzilla-daemon
  2019-02-26 15:20 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-26 15:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 384 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #12 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Thanks for more info. 
Please be patient on this deadlock bug since the fix is not trivial and also I
am currently working on other pressing issues - so this might take a bit.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1145 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (11 preceding siblings ...)
  2019-02-26 15:17 ` bugzilla-daemon
@ 2019-02-26 15:20 ` bugzilla-daemon
  2019-02-27  4:24 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-26 15:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 499 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #13 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Also, in the meanwhile try reverting this 2 patches and see if it helps - 

3741540 Andrey Grodzovsky   5 weeks ago    drm/sched: Rework HW fence
processing.   
222b5f0 Andrey Grodzovsky   5 weeks ago    drm/sched: Refactor ring mirror list
handling.  

They are causing this deadlock.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1260 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (12 preceding siblings ...)
  2019-02-26 15:20 ` bugzilla-daemon
@ 2019-02-27  4:24 ` bugzilla-daemon
  2019-02-27  4:25 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-27  4:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 257 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #14 from mikhail.v.gavrilov@gmail.com ---
Removing commits 3741540 and 222b5f0 not fix "suspicious RCU usage"

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1015 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (13 preceding siblings ...)
  2019-02-27  4:24 ` bugzilla-daemon
@ 2019-02-27  4:25 ` bugzilla-daemon
  2019-02-27  9:23 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-27  4:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 357 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #15 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143485
  --> https://bugs.freedesktop.org/attachment.cgi?id=143485&action=edit
"suspicious RCU usage" the commits  3741540 and 222b5f0 is reverted

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1361 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (14 preceding siblings ...)
  2019-02-27  4:25 ` bugzilla-daemon
@ 2019-02-27  9:23 ` bugzilla-daemon
  2019-02-28 16:43 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-27  9:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 336 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #16 from Michel Dänzer <michel@daenzer.net> ---
The "suspicious RCU usage" looks related to the igb ethernet driver, not
amdgpu. Quite likely it's already fixed in later 5.0-rc releases.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1097 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (15 preceding siblings ...)
  2019-02-27  9:23 ` bugzilla-daemon
@ 2019-02-28 16:43 ` bugzilla-daemon
  2019-02-28 16:45 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-28 16:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4214 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #17 from mikhail.v.gavrilov@gmail.com ---
(In reply to Michel Dänzer from comment #16)
> The "suspicious RCU usage" looks related to the igb ethernet driver, not
> amdgpu. Quite likely it's already fixed in later 5.0-rc releases.

Thank Michel, I knew, but I forgot about it.
This can be fixed by revert commit 59361316afcb08569af21e1af83e89c7051c055a,
but kernel in this branch even after it still much buggy than from Linus tree.
For example when I launch Assassin Creed Odyssey errors begin out related to
file system after then the computer hangs.

[44188.533349] ------------[ cut here ]------------
[44188.533840] kernel BUG at fs/inode.c:515!
[44188.533906] invalid opcode: 0000 [#1] SMP NOPTI
[44188.533909] CPU: 2 PID: 114 Comm: kswapd0 Tainted: G         C       
5.0.0-rc1-drm-next-kernel-3+ #3
[44188.533910] Hardware name: System manufacturer System Product Name/ROG STRIX
X470-I GAMING, BIOS 1103 11/16/2018
[44188.533915] RIP: 0010:clear_inode+0x7c/0x90
[44188.533917] Code: 2b a8 40 75 29 48 8b 83 e8 01 00 00 48 8d 93 e8 01 00 00
48 39 c2 75 18 48 c7 83 e0 00 00 00 60 00 00 00 5b 5d c3 0f 0b 0f 0b <0f> 0b 0f
0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f
[44188.533918] RSP: 0018:ffffb577c3effae8 EFLAGS: 00010216
[44188.533920] RAX: ff4d207cff51227d RBX: ffff8e4f4f12f208 RCX:
0000000000000007
[44188.533922] RDX: ffff8e4f4f12f5c8 RSI: ffff8e561c5ec020 RDI:
ffff8e561c5eb2c0
[44188.533923] RBP: ffff8e4f4f12f440 R08: 0000283071cf739b R09:
0000000000000000
[44188.533924] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffffff9b042f00
[44188.533925] R13: ffff8e4f4f120780 R14: ffff8e4f4f120810 R15:
ffff8e4f4f12f208
[44188.533927] FS:  0000000000000000(0000) GS:ffff8e563ce00000(0000)
knlGS:0000000000000000
[44188.533928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44188.533929] CR2: 00007fcc2acc75d0 CR3: 00000005437e0000 CR4:
00000000003406e0
[44188.533930] Call Trace:
[44188.533935]  proc_evict_inode+0x1d/0x50
[44188.533937]  evict+0xd0/0x1a0
[44188.533940]  __dentry_kill+0xdf/0x190
[44188.533942]  shrink_dentry_list+0x11c/0x330
[44188.533944]  prune_dcache_sb+0x53/0x80
[44188.533947]  super_cache_scan+0xf7/0x1b0
[44188.533951]  do_shrink_slab+0x18a/0x3d0
[44188.533954]  shrink_slab+0x170/0x2c0
[44188.533957]  shrink_node+0x177/0x460
[44188.533960]  balance_pgdat+0x2d5/0x560
[44188.533965]  kswapd+0x25b/0x540
[44188.533968]  ? finish_wait+0x90/0x90
[44188.533972]  kthread+0x108/0x140
[44188.533973]  ? balance_pgdat+0x560/0x560
[44188.533975]  ? kthread_park+0x90/0x90
[44188.533978]  ret_from_fork+0x27/0x50
[44188.533981] Modules linked in: snd_seq_dummy uinput fuse rfcomm xt_CHECKSUM
ipt_MASQUERADE tun bridge stp llc devlink nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc
vfat fat arc4 r8822be(C) edac_mce_amd mac80211 kvm_amd kvm uvcvideo irqbypass
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_codec_realtek
videobuf2_common snd_hda_codec_generic ledtrig_audio videodev
snd_hda_codec_hdmi snd_usb_audio joydev snd_hda_intel cfg80211 snd_hda_codec
media snd_usbmidi_lib btusb snd_hda_core snd_rawmidi btrtl snd_seq btbcm
btintel crct10dif_pclmul snd_hwdep bluetooth snd_seq_device crc32_pclmul
eeepc_wmi asus_wmi snd_pcm sparse_keymap video
[44188.534011]  wmi_bmof ghash_clmulni_intel snd_timer snd ccp ecdh_generic
sp5100_tco soundcore rfkill i2c_piix4 k10temp pcspkr pcc_cpufreq gpio_amdpt
gpio_generic acpi_cpufreq binfmt_misc hid_sony amdgpu ff_memless
hid_logitech_hidpp chash amd_iommu_v2 gpu_sched ttm drm_kms_helper drm
crc32c_intel igb nvme dca i2c_algo_bit hid_logitech_dj nvme_core wmi
pinctrl_amd
[44188.534028] ---[ end trace 85933fec4c138d6a ]---

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 5053 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (16 preceding siblings ...)
  2019-02-28 16:43 ` bugzilla-daemon
@ 2019-02-28 16:45 ` bugzilla-daemon
  2019-04-05 20:41 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-02-28 16:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 364 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #18 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143497
  --> https://bugs.freedesktop.org/attachment.cgi?id=143497&action=edit
"amd-staging-drm-next" + commits  3741540, 222b5f0 and 5936131 is reverted

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1382 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (17 preceding siblings ...)
  2019-02-28 16:45 ` bugzilla-daemon
@ 2019-04-05 20:41 ` bugzilla-daemon
  2019-04-05 20:51 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-05 20:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 354 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #19 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 143882
  --> https://bugs.freedesktop.org/attachment.cgi?id=143882&action=edit
0001-drm-scheduler-rework-job-destruction.patch

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1387 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (18 preceding siblings ...)
  2019-04-05 20:41 ` bugzilla-daemon
@ 2019-04-05 20:51 ` bugzilla-daemon
  2019-04-06 14:44 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-05 20:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 537 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #20 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 143883
  --> https://bugs.freedesktop.org/attachment.cgi?id=143883&action=edit
0002-drm-sched-Adapt-drivers-to-new-job-destruction-logic.patch

Mikhail, attached 2 patches should help avoid the deadlock you see. Please
apply them on top of latest
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1674 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (19 preceding siblings ...)
  2019-04-05 20:51 ` bugzilla-daemon
@ 2019-04-06 14:44 ` bugzilla-daemon
  2019-04-06 14:48 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-06 14:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 329 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #21 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143886
  --> https://bugs.freedesktop.org/attachment.cgi?id=143886&action=edit
dmesg from patched kernel [0001 + 0002]

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1247 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (20 preceding siblings ...)
  2019-04-06 14:44 ` bugzilla-daemon
@ 2019-04-06 14:48 ` bugzilla-daemon
  2019-04-08 14:46 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-06 14:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 342 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #22 from mikhail.v.gavrilov@gmail.com ---
GPU reset not working again.
In dmesg seen only message "GPU reset begin!", but  keys Alt - F3 .. F7 no
working and later "init 6" couldn't reboot PC.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1110 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (21 preceding siblings ...)
  2019-04-06 14:48 ` bugzilla-daemon
@ 2019-04-08 14:46 ` bugzilla-daemon
  2019-04-08 14:48 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-08 14:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 548 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #23 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 143897
  --> https://bugs.freedesktop.org/attachment.cgi?id=143897&action=edit
[PATCH] drm_amd_display: wait for fence without holding reservation lock.eml

This looks again like deadlock of GPU reset with display waiting on scanout FB
fences in display code. Please manually apply the attached patch and test
again.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1639 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (22 preceding siblings ...)
  2019-04-08 14:46 ` bugzilla-daemon
@ 2019-04-08 14:48 ` bugzilla-daemon
  2019-04-08 21:28 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-08 14:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 401 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #24 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 143898
  --> https://bugs.freedesktop.org/attachment.cgi?id=143898&action=edit
[PATCH] drm_amd_display: wait for fence without holding reservation lock.patch

Reattaching.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1496 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (23 preceding siblings ...)
  2019-04-08 14:48 ` bugzilla-daemon
@ 2019-04-08 21:28 ` bugzilla-daemon
  2019-04-09  1:51 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-08 21:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 494 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #25 from Dieter Nützel <Dieter@nuetzel-hh.de> ---
(In reply to Andrey Grodzovsky from comment #24)
> Created attachment 143898 [details] [review]
> [PATCH] drm_amd_display: wait for fence without holding reservation
> lock.patch
> 
> Reattaching.

Hello Andrey,

is this one needed even on your sent triple pack (dri-devel and amd-gfx)?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1728 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (24 preceding siblings ...)
  2019-04-08 21:28 ` bugzilla-daemon
@ 2019-04-09  1:51 ` bugzilla-daemon
  2019-04-09  3:15 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-09  1:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 787 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #26 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Dieter Nützel from comment #25)
> (In reply to Andrey Grodzovsky from comment #24)
> > Created attachment 143898 [details] [review] [review]
> > [PATCH] drm_amd_display: wait for fence without holding reservation
> > lock.patch
> > 
> > Reattaching.
> 
> Hello Andrey,
> 
> is this one needed even on your sent triple pack (dri-devel and amd-gfx)?

If by triple pack you mean the 3 patches I sent out today then yes, why not.
This patch was sent a while ago and also reviewed-by but I don't see it in the
latest code so it probably was never submitted.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2100 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (25 preceding siblings ...)
  2019-04-09  1:51 ` bugzilla-daemon
@ 2019-04-09  3:15 ` bugzilla-daemon
  2019-04-09  3:16 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-09  3:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 333 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #27 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143904
  --> https://bugs.freedesktop.org/attachment.cgi?id=143904&action=edit
dmesg from patched kernel [0001 + 0002 + 3]

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1259 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (26 preceding siblings ...)
  2019-04-09  3:15 ` bugzilla-daemon
@ 2019-04-09  3:16 ` bugzilla-daemon
  2019-04-09 16:34 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-09  3:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 218 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #28 from mikhail.v.gavrilov@gmail.com ---
And again deadlock occurred.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 966 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (27 preceding siblings ...)
  2019-04-09  3:16 ` bugzilla-daemon
@ 2019-04-09 16:34 ` bugzilla-daemon
  2019-04-10  3:41 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-09 16:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 498 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #29 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mikhail.v.gavrilov from comment #28)
> And again deadlock occurred.

Yea, looks like another annoying regression. Try first of all to revert all the
3 patches i uploaded here and see if you are back to the original deadlock or
this issue is still there masking our problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1331 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (28 preceding siblings ...)
  2019-04-09 16:34 ` bugzilla-daemon
@ 2019-04-10  3:41 ` bugzilla-daemon
  2019-04-10 14:25 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-10  3:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 383 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #30 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143908
  --> https://bugs.freedesktop.org/attachment.cgi?id=143908&action=edit
dmesg without patches

Without patches backtrace are looks like initial deadlock backtrace.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1265 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (29 preceding siblings ...)
  2019-04-10  3:41 ` bugzilla-daemon
@ 2019-04-10 14:25 ` bugzilla-daemon
  2019-04-10 19:01 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-10 14:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 478 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #31 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mikhail.v.gavrilov from comment #30)
> Created attachment 143908 [details]
> dmesg without patches
> 
> Without patches backtrace are looks like initial deadlock backtrace.

OK, seems the patches somehow break display engine reset. Let me look into it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1517 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (30 preceding siblings ...)
  2019-04-10 14:25 ` bugzilla-daemon
@ 2019-04-10 19:01 ` bugzilla-daemon
  2019-04-10 20:11 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-10 19:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 792 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #32 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mikhail.v.gavrilov from comment #30)
> Created attachment 143908 [details]
> dmesg without patches
> 
> Without patches backtrace are looks like initial deadlock backtrace.

Can you please apply back the first 2 patches and then add the following - 

Install trace-cmd utility
Type sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e
"amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
to enable kernel event tracing log

After hang happens 

cd /sys/kernel/debug/tracing && cat trace > event_dump, upload this dump here.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1872 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (31 preceding siblings ...)
  2019-04-10 19:01 ` bugzilla-daemon
@ 2019-04-10 20:11 ` bugzilla-daemon
  2019-04-10 20:14 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-10 20:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 435 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #33 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Another thing - I noted a regression in existing driver code - can you manually
apply this patch and test ?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7c8930d9e8b188caa53705883373c49f95fd284

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1329 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (32 preceding siblings ...)
  2019-04-10 20:11 ` bugzilla-daemon
@ 2019-04-10 20:14 ` bugzilla-daemon
  2019-04-11  3:53 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-10 20:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 717 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #34 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Andrey Grodzovsky from comment #33)
> Another thing - I noted a regression in existing driver code - can you
> manually apply this patch and test ?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=f7c8930d9e8b188caa53705883373c49f95fd284

Just to be clear - apply the first 2 patches and this one on top 
So
0001-drm-scheduler-rework-job-destruction.patch
0002-drm-sched-Adapt-drivers-to-new-job-destruction-logic.patch
And then the patch from the link.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1648 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (33 preceding siblings ...)
  2019-04-10 20:14 ` bugzilla-daemon
@ 2019-04-11  3:53 ` bugzilla-daemon
  2019-04-11  4:03 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11  3:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 830 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #35 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143927
  --> https://bugs.freedesktop.org/attachment.cgi?id=143927&action=edit
dmesg (with successfull GPU reset after applying patches from Comment 34)

Hooray, it's a win! Now GPU successfully reseted without deadlock.

1. When these patches will landed in the mainline?

2. Is it possible to reduce the amount messages (really better not showing this
messages)  in terminal?
Because it's very hart to kill `gnome-session` process when in console appears
new messages constantly like this: "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
Failed to initialize parser -125!"

This flooding message is real headache.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1867 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (34 preceding siblings ...)
  2019-04-11  3:53 ` bugzilla-daemon
@ 2019-04-11  4:03 ` bugzilla-daemon
  2019-04-11 14:25 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11  4:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 406 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #36 from mikhail.v.gavrilov@gmail.com ---
Created attachment 143928
  --> https://bugs.freedesktop.org/attachment.cgi?id=143928&action=edit
Demonstration how message "drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!" get a real pain

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1506 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (35 preceding siblings ...)
  2019-04-11  4:03 ` bugzilla-daemon
@ 2019-04-11 14:25 ` bugzilla-daemon
  2019-04-11 14:41 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11 14:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 644 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #37 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mikhail.v.gavrilov from comment #36)
> Created attachment 143928 [details]
> Demonstration how message "drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
> initialize parser -125!" get a real pain

Those messages are actually expected for this particular  use case, i might
consider switching them to info.

I am working on completing  review and pushing those patches into our upstream
branch.

Thanks for testing.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1898 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (36 preceding siblings ...)
  2019-04-11 14:25 ` bugzilla-daemon
@ 2019-04-11 14:41 ` bugzilla-daemon
  2019-04-11 14:52 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11 14:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 432 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #38 from Michel Dänzer <michel@daenzer.net> ---
(In reply to Andrey Grodzovsky from comment #37)
> Those messages are actually expected for this particular  use case, i might
> consider switching them to info.

I'd argue they should be debug, so they don't spam dmesg by default.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1258 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (37 preceding siblings ...)
  2019-04-11 14:41 ` bugzilla-daemon
@ 2019-04-11 14:52 ` bugzilla-daemon
  2019-04-11 16:32 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11 14:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 494 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #39 from mikhail.v.gavrilov@gmail.com ---
(In reply to Michel Dänzer from comment #38)
> 
> I'd argue they should be debug, so they don't spam dmesg by default.

Let this messages stay in kernel log (dmesg), but I don't want see they when I
enter any command in terminal. They makes the process entering commands in
terminal very annoying.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1317 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (38 preceding siblings ...)
  2019-04-11 14:52 ` bugzilla-daemon
@ 2019-04-11 16:32 ` bugzilla-daemon
  2019-04-11 16:42 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11 16:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 613 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #40 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Michel Dänzer from comment #38)
> (In reply to Andrey Grodzovsky from comment #37)
> > Those messages are actually expected for this particular  use case, i might
> > consider switching them to info.
> 
> I'd argue they should be debug, so they don't spam dmesg by default.

Actually since in non reset case this is an actual error maybe i just print it
this happens NOT during reset.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1505 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (39 preceding siblings ...)
  2019-04-11 16:32 ` bugzilla-daemon
@ 2019-04-11 16:42 ` bugzilla-daemon
  2019-05-12 20:58 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-04-11 16:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 479 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #41 from Michel Dänzer <michel@daenzer.net> ---
(In reply to Andrey Grodzovsky from comment #40)
> Actually since in non reset case this is an actual error maybe i just print
> it this happens NOT during reset.

Buggy / malicious userspace can probably trigger an error here, allowing it to
spam dmesg, so it should be debug.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1305 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (40 preceding siblings ...)
  2019-04-11 16:42 ` bugzilla-daemon
@ 2019-05-12 20:58 ` bugzilla-daemon
  2019-05-12 21:00 ` bugzilla-daemon
  2019-11-19  9:14 ` bugzilla-daemon
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-05-12 20:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 7637 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #42 from mikhail.v.gavrilov@gmail.com ---
Andrey, could you look on another GPU reset issue?

[18735.255511] sony 0003:054C:09CC.0008: input,hidraw4: USB HID v81.11 Gamepad
[Sony Interactive Entertainment Wireless Controller] on
usb-0000:0c:00.3-1.4.4/input3
[18735.340415] usbcore: registered new interface driver snd-usb-audio
[18742.241131] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page0 timeout,
signaled seq=201025, emitted seq=201027
[18742.241141] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[18742.241196] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[18742.241200] amdgpu 0000:0b:00.0: GPU reset begin!
[18742.251116] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=2264508, emitted seq=2264510
[18742.251153] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process chrome pid 17912 thread chrome:cs0 pid 17915
[18742.251156] amdgpu 0000:0b:00.0: GPU reset begin!
[18742.754134] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=15388, emitted seq=15388
[18742.754203] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[18742.754207] amdgpu 0000:0b:00.0: GPU reset begin!
[18751.968977] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout,
signaled seq=346, emitted seq=346
[18751.969042] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process chrome pid 3587 thread chrome:cs0 pid 3604
[18751.969047] amdgpu 0000:0b:00.0: GPU reset begin!
[18753.504894] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[18763.744722] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:45:plane-5] flip_done timed out
[18763.744758] amdgpu: [powerplay] Failed to send message 0x47, response
0xffffffff
[18763.744770] amdgpu: [powerplay] Failed to send message 0x28, response
0xffffffff
[18763.744772] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min
uclk failed!
[18763.744784] amdgpu: [powerplay] Failed to send message 0x28, response
0xffffffff
[18763.744785] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[18763.744796] amdgpu: [powerplay] Failed to send message 0x28, response
0xffffffff
[18763.744798] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[18763.744809] amdgpu: [powerplay] Failed to send message 0x26, response
0xffffffff
[18763.744810] amdgpu: [powerplay] Failed to set soft min gfxclk !
[18763.744811] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[18763.779547] [drm] REG_WAIT timeout 10us * 3000 tries -
dce110_stream_encoder_dp_blank line:950
[18763.779651] WARNING: CPU: 4 PID: 25404 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:277
generic_reg_wait.cold+0x29/0x30 [amdgpu]
[18763.779652] Modules linked in: snd_usb_audio hid_sony ff_memless
snd_usbmidi_lib snd_rawmidi snd_seq_dummy uinput fuse rfcomm cmac
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables bnep sunrpc vfat fat arc4 joydev r8822be(C)
snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio
eeepc_wmi snd_hda_codec_hdmi asus_wmi kvm_amd sparse_keymap snd_hda_intel video
wmi_bmof mac80211 snd_hda_codec kvm snd_hda_core snd_hwdep snd_seq btusb btrtl
snd_seq_device btbcm irqbypass btintel bluetooth snd_pcm crct10dif_pclmul
cfg80211 crc32_pclmul snd_timer ghash_clmulni_intel snd sp5100_tco ecdh_generic
soundcore i2c_piix4 k10temp rfkill ccp
[18763.779676]  gpio_amdpt gpio_generic pcc_cpufreq acpi_cpufreq xfs libcrc32c
amdgpu chash gpu_sched amd_iommu_v2 ttm drm_kms_helper igb drm nvme dca
i2c_algo_bit crc32c_intel nvme_core wmi pinctrl_amd
[18763.779687] CPU: 4 PID: 25404 Comm: kworker/4:0 Tainted: G         C       
5.1.0-1.fc31.x86_64 #1
[18763.779689] Hardware name: System manufacturer System Product Name/ROG STRIX
X470-I GAMING, BIOS 2202 04/11/2019
[18763.779694] Workqueue: events drm_sched_job_timedout [gpu_sched]
[18763.779764] RIP: 0010:generic_reg_wait.cold+0x29/0x30 [amdgpu]
[18763.779766] Code: ff 44 8b 44 24 68 48 8b 4c 24 60 48 c7 c7 10 c0 8b c0 8b
54 24 58 8b 34 24 e8 b0 59 92 d0 41 83 7c 24 20 01 0f 84 9b 78 fe ff <0f> 0b e9
94 78 fe ff e8 9a 4d ed ff 48 c7 c7 00 b0 95 c0 e8 de 5f
[18763.779767] RSP: 0018:ffff9b5320bb7720 EFLAGS: 00010297
[18763.779769] RAX: 0000000000000052 RBX: 0000000000000bb9 RCX:
0000000000000000
[18763.779771] RDX: 0000000000000000 RSI: ffff8d40fd9168c8 RDI:
ffff8d40fd9168c8
[18763.779772] RBP: 00000000000050e2 R08: ffff8d40fd9168c8 R09:
000000000000053d
[18763.779773] R10: ffff8d411e39d7d0 R11: ffff9b5320bb75d5 R12:
ffff8d40f97ca580
[18763.779774] R13: 00000000000050e2 R14: 00000000ffffffff R15:
00000000ffffffff
[18763.779776] FS:  0000000000000000(0000) GS:ffff8d40fd900000(0000)
knlGS:0000000000000000
[18763.779778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18763.779779] CR2: 00000c0d6e33c000 CR3: 000000027ed00000 CR4:
00000000003406e0
[18763.779780] Call Trace:
[18763.779854]  dce110_stream_encoder_dp_blank+0x159/0x2e0 [amdgpu]
[18763.779921]  core_link_disable_stream+0x42/0x270 [amdgpu]
[18763.779987]  dce110_reset_hw_ctx_wrap+0xca/0x1f0 [amdgpu]
[18763.780053]  dce110_apply_ctx_to_hw+0x4a/0x490 [amdgpu]
[18763.780100]  ? amdgpu_pm_compute_clocks+0xb9/0x5e0 [amdgpu]
[18763.780168]  ? dm_pp_apply_display_requirements+0x19e/0x1b0 [amdgpu]
[18763.780232]  dc_commit_state+0x26b/0x570 [amdgpu]
[18763.780236]  ? vsnprintf+0x3aa/0x4f0
[18763.780304]  amdgpu_dm_atomic_commit_tail+0x3e2/0x1980 [amdgpu]
[18763.780308]  ? vt_console_print+0x74/0x3f0
[18763.780311]  ? up+0x12/0x60
[18763.780314]  ? irq_work_queue+0x91/0xa0
[18763.780316]  ? wake_up_klogd+0x30/0x40
[18763.780318]  ? vprintk_emit+0x1ef/0x250
[18763.780320]  ? printk+0x58/0x6f
[18763.780328]  ? drm_atomic_helper_wait_for_dependencies+0x1e4/0x1f0
[drm_kms_helper]
[18763.780341]  ? drm_err+0x72/0x90 [drm]
[18763.780349]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[18763.780356]  commit_tail+0x3c/0x70 [drm_kms_helper]
[18763.780364]  drm_atomic_helper_commit+0x108/0x110 [drm_kms_helper]
[18763.780371]  drm_atomic_helper_disable_all+0x144/0x160 [drm_kms_helper]
[18763.780378]  drm_atomic_helper_suspend+0x60/0xf0 [drm_kms_helper]
[18763.780445]  dm_suspend+0x1c/0x60 [amdgpu]
[18763.780489]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[18763.780532]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[18763.780601]  amdgpu_device_pre_asic_reset+0x1f4/0x209 [amdgpu]
[18763.780668]  amdgpu_device_gpu_recover+0x77/0x731 [amdgpu]
[18763.780730]  amdgpu_job_timedout+0xf7/0x120 [amdgpu]
[18763.780733]  drm_sched_job_timedout+0x3a/0x70 [gpu_sched]
[18763.780737]  process_one_work+0x19d/0x380
[18763.780739]  worker_thread+0x50/0x3b0
[18763.780742]  kthread+0xfb/0x130
[18763.780744]  ? process_one_work+0x380/0x380
[18763.780745]  ? kthread_park+0x90/0x90
[18763.780748]  ret_from_fork+0x22/0x40
[18763.780751] ---[ end trace 1b9ec5613589027e ]---

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 8391 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (41 preceding siblings ...)
  2019-05-12 20:58 ` bugzilla-daemon
@ 2019-05-12 21:00 ` bugzilla-daemon
  2019-11-19  9:14 ` bugzilla-daemon
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-05-12 21:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 322 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

--- Comment #43 from mikhail.v.gavrilov@gmail.com ---
Created attachment 144242
  --> https://bugs.freedesktop.org/attachment.cgi?id=144242&action=edit
dmesg of another GPU reset issue

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1226 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug 109692] deadlock occurs during GPU reset
  2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
                   ` (42 preceding siblings ...)
  2019-05-12 21:00 ` bugzilla-daemon
@ 2019-11-19  9:14 ` bugzilla-daemon
  43 siblings, 0 replies; 45+ messages in thread
From: bugzilla-daemon @ 2019-11-19  9:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 806 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109692

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #44 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/702.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2322 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2019-11-19  9:14 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 17:30 [Bug 109692] deadlock occurs during GPU reset bugzilla-daemon
2019-02-20 19:27 ` bugzilla-daemon
2019-02-20 19:41 ` bugzilla-daemon
2019-02-20 21:01 ` bugzilla-daemon
2019-02-20 21:01 ` bugzilla-daemon
2019-02-20 21:02 ` bugzilla-daemon
2019-02-20 21:06 ` bugzilla-daemon
2019-02-21 11:46 ` bugzilla-daemon
2019-02-21 14:23 ` bugzilla-daemon
2019-02-21 17:23 ` bugzilla-daemon
2019-02-26 10:42 ` bugzilla-daemon
2019-02-26 10:43 ` bugzilla-daemon
2019-02-26 15:17 ` bugzilla-daemon
2019-02-26 15:20 ` bugzilla-daemon
2019-02-27  4:24 ` bugzilla-daemon
2019-02-27  4:25 ` bugzilla-daemon
2019-02-27  9:23 ` bugzilla-daemon
2019-02-28 16:43 ` bugzilla-daemon
2019-02-28 16:45 ` bugzilla-daemon
2019-04-05 20:41 ` bugzilla-daemon
2019-04-05 20:51 ` bugzilla-daemon
2019-04-06 14:44 ` bugzilla-daemon
2019-04-06 14:48 ` bugzilla-daemon
2019-04-08 14:46 ` bugzilla-daemon
2019-04-08 14:48 ` bugzilla-daemon
2019-04-08 21:28 ` bugzilla-daemon
2019-04-09  1:51 ` bugzilla-daemon
2019-04-09  3:15 ` bugzilla-daemon
2019-04-09  3:16 ` bugzilla-daemon
2019-04-09 16:34 ` bugzilla-daemon
2019-04-10  3:41 ` bugzilla-daemon
2019-04-10 14:25 ` bugzilla-daemon
2019-04-10 19:01 ` bugzilla-daemon
2019-04-10 20:11 ` bugzilla-daemon
2019-04-10 20:14 ` bugzilla-daemon
2019-04-11  3:53 ` bugzilla-daemon
2019-04-11  4:03 ` bugzilla-daemon
2019-04-11 14:25 ` bugzilla-daemon
2019-04-11 14:41 ` bugzilla-daemon
2019-04-11 14:52 ` bugzilla-daemon
2019-04-11 16:32 ` bugzilla-daemon
2019-04-11 16:42 ` bugzilla-daemon
2019-05-12 20:58 ` bugzilla-daemon
2019-05-12 21:00 ` bugzilla-daemon
2019-11-19  9:14 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.