* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-24 17:26 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=110509

            Bug ID: 110509
           Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
                    timeout
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/Gallium/radeonsi
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: James.Dutton@gmail.com
        QA Contact: dri-devel@lists.freedesktop.org

AMD Vega 56 fails to reset:
[  188.771043] Evicting PASID 32782 queues
[  188.782094] Restoring PASID 32782 queues
[  214.563362] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19285, emitted seq=19287
[  214.563432] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACOdyssey.exe pid 3761 thread ACOdyssey.exe pid 3761
[  214.563439] amdgpu 0000:43:00.0: GPU reset begin!
[  214.563445] Evicting PASID 32782 queues
[  224.793032] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-0] hw_done or flip_done timed out


How do I go about diagnosing this problem?
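One first diagnostic step is reading the two sequence numbers in the ring timeout message. As far as we understand the message, "signaled seq" is the last fence the gfx ring completed and "emitted seq" is the last one submitted to it. Here that means two submissions (19287 - 19285) were still outstanding when the scheduler timed out. The C sketch below extracts both values from a log line. The function names are hypothetical, and the sketch assumes exactly the message format shown above and no sequence-number wraparound.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pull the fence sequence numbers out of a
 * "ring ... timeout, signaled seq=..., emitted seq=..." message.
 * Returns 0 on success, -1 if either field is missing. */
static int parse_ring_timeout(const char *line,
                              unsigned long *signaled, unsigned long *emitted)
{
    const char *s, *e;

    assert(line != NULL && signaled != NULL && emitted != NULL);
    s = strstr(line, "signaled seq=");
    e = strstr(line, "emitted seq=");
    if (s == NULL || e == NULL)
        return -1;
    if (sscanf(s, "signaled seq=%lu", signaled) != 1 ||
        sscanf(e, "emitted seq=%lu", emitted) != 1)
        return -1;
    return 0;
}

/* Fences emitted to the ring that had not signaled yet.
 * Assumes emitted >= signaled, i.e. no sequence-number wraparound. */
static unsigned long outstanding_fences(unsigned long signaled,
                                        unsigned long emitted)
{
    return emitted - signaled;
}
```

When given the [  214.563362] line above, parse_ring_timeout() yields signaled = 19285 and emitted = 19287, so outstanding_fences() returns 2.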

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-24 17:31 UTC
  To: dri-devel

--- Comment #1 from James.Dutton@gmail.com ---
Created attachment 144084
  --> https://bugs.freedesktop.org/attachment.cgi?id=144084&action=edit
./umr -O bits -r *.*.mmGRBM_STATUS

Output captured while the GPU was failing to reset.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-24 17:32 UTC
  To: dri-devel

--- Comment #2 from James.Dutton@gmail.com ---
Created attachment 144085
  --> https://bugs.freedesktop.org/attachment.cgi?id=144085&action=edit
/usr/src/umr/build/src/app/umr -wa

Output of the wave dump.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-24 17:33 UTC
  To: dri-devel

--- Comment #3 from James.Dutton@gmail.com ---
Created attachment 144086
  --> https://bugs.freedesktop.org/attachment.cgi?id=144086&action=edit
dmesg

dmesg during reset.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-24 17:35 UTC
  To: dri-devel

James.Dutton@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144086|0                           |1
        is obsolete|                            |

--- Comment #4 from James.Dutton@gmail.com ---
Created attachment 144087
  --> https://bugs.freedesktop.org/attachment.cgi?id=144087&action=edit
dmesg

dmesg

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-28 15:42 UTC
  To: dri-devel

--- Comment #5 from James.Dutton@gmail.com ---
This happens when trying to play games under Wine and DXVK.
It used to work, but the latest Mesa git fails.
Games that fail are:
Assassin's Creed Odyssey
Devil May Cry 5

Both games get through the title sequences, but fail when they reach actual
gameplay. The GPU hangs and tries to reset, but fails to reset.

So there are two problems:
1) Why does it hang in the first place?
2) Why does it fail to recover and reset itself?

I can ssh into the PC.
poweroff   <-  Attempts to power off but never actually reaches the off state.
echo b > /proc/sysrq-trigger    <-  Reboots the box, and everything is then OK
again, as long as one does not try to play a game.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-29 13:41 UTC
  To: dri-devel

--- Comment #6 from James.Dutton@gmail.com ---
I think I have found the problem.
[  657.526313] amdgpu 0000:43:00.0: GPU reset begin!
[  657.526318] Evicting PASID 32782 queues
[  667.756000] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-0] hw_done or flip_done timed out


The intention is to do a GPU reset, but the code just tries to suspend the GPU.
Part of the suspend does this:

Apr 29 14:29:19 thread kernel: [  363.445607] INFO: task kworker/u258:0:55 blocked for more than 120 seconds.
Apr 29 14:29:19 thread kernel: [  363.445612]       Not tainted 5.0.10-dirty #26
Apr 29 14:29:19 thread kernel: [  363.445613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 14:29:19 thread kernel: [  363.445615] kworker/u258:0  D    0    55      2 0x80000000
Apr 29 14:29:19 thread kernel: [  363.445628] Workqueue: events_unbound commit_work [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445629] Call Trace:
Apr 29 14:29:19 thread kernel: [  363.445635]  __schedule+0x2c0/0x880
Apr 29 14:29:19 thread kernel: [  363.445637]  schedule+0x2c/0x70
Apr 29 14:29:19 thread kernel: [  363.445639]  schedule_timeout+0x1db/0x360
Apr 29 14:29:19 thread kernel: [  363.445641]  ? update_load_avg+0x8b/0x590
Apr 29 14:29:19 thread kernel: [  363.445645]  dma_fence_default_wait+0x1eb/0x270
Apr 29 14:29:19 thread kernel: [  363.445647]  ? dma_fence_release+0xa0/0xa0
Apr 29 14:29:19 thread kernel: [  363.445649]  dma_fence_wait_timeout+0xfd/0x110
Apr 29 14:29:19 thread kernel: [  363.445651]  reservation_object_wait_timeout_rcu+0x17d/0x370
Apr 29 14:29:19 thread kernel: [  363.445710]  amdgpu_dm_do_flip+0x14a/0x4a0 [amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445767]  amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445820]  ? amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445828]  commit_tail+0x42/0x70 [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445835]  commit_work+0x12/0x20 [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445838]  process_one_work+0x1fd/0x400
Apr 29 14:29:19 thread kernel: [  363.445840]  worker_thread+0x34/0x410
Apr 29 14:29:19 thread kernel: [  363.445841]  kthread+0x121/0x140
Apr 29 14:29:19 thread kernel: [  363.445843]  ? process_one_work+0x400/0x400
Apr 29 14:29:19 thread kernel: [  363.445844]  ? kthread_park+0x90/0x90
Apr 29 14:29:19 thread kernel: [  363.445847]  ret_from_fork+0x22/0x40


So amdgpu_dm_do_flip() is the part that hangs.
If the GPU needs to be reset because part of it has hung, attempting a "flip" is
unlikely to work.
It fails/hangs during "suspend of IP block <dm>" in
amdgpu_device_ip_suspend_phase1().

I would suggest writing code that actually tries to reset the GPU, instead of
trying to suspend it while the GPU is hung.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-29 18:30 UTC
  To: dri-devel

--- Comment #7 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to James.Dutton from comment #6)
> 
> I would suggest creating code that actually tries to reset the GPU, instead
> of trying to suspend it while GPU is hung.

That is part of the GPU reset sequence.  We need to attempt to stop the engines
before resetting the GPU.  That is what the suspend code does.  Not all of the
engines are necessarily hung so you need to stop and drain them properly.
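The ordering described here (stop and drain every engine, reset, then bring everything back) can be sketched as a toy state model. All names are hypothetical, and this is not the actual amdgpu_device_gpu_recover() logic. In this model a hung block cannot be drained, so its suspend failure is counted instead of aborting the sequence, and the full reset then clears it. Stopping blocks in reverse order is an assumption borrowed from the usual init/teardown convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_state { TOY_RUNNING, TOY_HUNG, TOY_STOPPED };

struct toy_ip_block {
    const char *name;
    enum toy_state state;
};

/* Stop and drain one block; a hung block cannot be drained in this model. */
static int toy_ip_suspend(struct toy_ip_block *b)
{
    if (b->state == TOY_HUNG)
        return -EBUSY;
    b->state = TOY_STOPPED;
    return 0;
}

/* Returns how many blocks failed to suspend; the reset goes ahead anyway. */
static int toy_gpu_recover(struct toy_ip_block *blocks, size_t n)
{
    int suspend_failures = 0;

    assert(blocks != NULL || n == 0);

    /* 1. Stop every block, last to first, counting (not acting on) failures. */
    for (size_t i = n; i-- > 0;)
        if (toy_ip_suspend(&blocks[i]) != 0)
            suspend_failures++;

    /* 2. Full ASIC reset: in this model it leaves every block stopped. */
    for (size_t i = 0; i < n; i++)
        blocks[i].state = TOY_STOPPED;

    /* 3. Resume every block, first to last. */
    for (size_t i = 0; i < n; i++)
        blocks[i].state = TOY_RUNNING;

    return suspend_failures;
}
```

With a hung gfx block and running dm and sdma blocks, toy_gpu_recover() returns 1 and leaves all three running. The model only tolerates a suspend that *fails*; a suspend that blocks forever, like the dm deadlock in the trace above, never reaches step 2.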

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-29 22:41 UTC
  To: dri-devel

--- Comment #8 from James.Dutton@gmail.com ---
Thank you for the feedback.
Is there a data sheet somewhere that might help me work out a fix for this?
What I would like is:
1) A way to scan all the engines and detect which ones have hung.
2) A way to intentionally halt an engine and tidy up, so that the modprobe,
rmmod, modprobe sequence works.
3) Data sheet details on how to un-hang each engine,
specifically, in this case, the IP block <dm>.

Maybe that is not possible and (as I think you are hinting) one cannot
reset an individual IP block, so the approach is to suspend the card, do
a full reset of the entire card, and then resume.

I think a different suspend process would be better.
We have a for_each loop in the suspend code. Its output should not be a
single error code, but an array recording, for each engine, its current
state (running/hung), its intended state, and whether reaching the intended
state worked or failed. As the for_each loop runs, it could compare the
current and intended states, attempt to reach the intended state, and report
an error code for each engine. The transition code could then differ
depending on the current -> intended transition,
i.e. the code for running -> suspended can differ from the code for hung ->
suspended. The code already needs to know which engines are enabled/disabled
(Vega 56 vs Vega 64).

I can hang this IP block <dm> at will. I have 2 games that hang it within
seconds of starting.
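The per-engine report proposed above could look roughly like this. It is only a sketch of the proposal, not existing amdgpu code. The enum, struct, and handler names are hypothetical, and the hung-engine handler always fails, so the example shows one failure being recorded per engine instead of aborting the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum engine_state { ENGINE_RUNNING, ENGINE_HUNG, ENGINE_SUSPENDED };

/* One entry per engine: where it is, where we want it, and the outcome. */
struct engine_report {
    const char *name;
    enum engine_state current;
    enum engine_state intended;
    int error; /* 0 on success, negative errno on failure */
};

/* Hypothetical transition handlers, one per (current -> intended) pair. */
static int stop_running(struct engine_report *e)
{
    e->current = ENGINE_SUSPENDED;
    return 0;
}

static int stop_hung(struct engine_report *e)
{
    (void)e; /* placeholder: no hung-engine recovery is modelled */
    return -EBUSY;
}

/* Drive every engine towards its intended state; return how many failed. */
static size_t suspend_all(struct engine_report *engines, size_t n)
{
    size_t failed = 0;

    assert(engines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct engine_report *e = &engines[i];

        if (e->current == e->intended) {
            e->error = 0;
            continue;
        }
        if (e->current == ENGINE_RUNNING && e->intended == ENGINE_SUSPENDED)
            e->error = stop_running(e);
        else if (e->current == ENGINE_HUNG && e->intended == ENGINE_SUSPENDED)
            e->error = stop_hung(e);
        else
            e->error = -EINVAL; /* transition not handled in this sketch */
        if (e->error != 0)
            failed++;
    }
    return failed;
}
```

After suspend_all() returns, the caller can inspect each entry's error field rather than a single combined error code.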

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 01:26 UTC
  To: dri-devel

--- Comment #9 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to James.Dutton from comment #8)
> Thank you for the feedback.
> Is there a data sheet somewhere that might help me work out a fix for this.
> What I would like is:
> 1) A way to scan all the engines and detect which ones have hung.

If the GPU scheduler for a queue on a particular engine times out, you can be
pretty sure the engine has hung.  At that point you can check the current busy
status for the block (IP is_idle() callback).

> 2) A way to intentionally halt an engine and tidy up. So that the modprobe,
> rmmod, modprobe scenario works. 

hw_fini() IP callback.

> 3) data sheet details regarding how to un-hang each engine.
> Specifically, in this case the IP block <dm>.

Each IP has a soft reset (implemented via the IP soft_reset() callback), but
depending on the hang, in some cases, you may have to do a full GPU reset to
recover.  This is not a hw hang, it's a sw deadlock.  

> 
> Maybe that is not possible, and (I think you are hinting at it), one cannot
> reset an individual IP block. So the approach is to suspend the card, and
> then do a full reset of the entire card, then resume.

All ASICs support a full GPU reset, which is implemented via the SoC-level
amdgpu_asic_funcs reset() callback.
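The answers above suggest a decision along these lines: check each block's busy status, use per-block soft resets where they suffice, and fall back to a full ASIC reset otherwise. The sketch below models only that decision. The struct and function names are hypothetical, and `soft_reset_ok` stands in for knowledge the real driver would have to obtain from the hardware. It is not the logic in amdgpu_device_gpu_recover().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one IP block as seen by a recovery path. */
struct toy_block {
    const char *name;
    bool idle;          /* what an is_idle()-style check would report */
    bool soft_reset_ok; /* whether a per-block soft reset would clear it */
};

enum recovery { RECOVER_NONE, RECOVER_SOFT, RECOVER_FULL };

/* Nothing to do if every block is idle; soft resets if they cover every
 * busy block; otherwise a full ASIC reset. */
static enum recovery choose_recovery(const struct toy_block *b, size_t n)
{
    bool any_busy = false;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].idle)
            continue;
        any_busy = true;
        if (!b[i].soft_reset_ok)
            return RECOVER_FULL;
    }
    return any_busy ? RECOVER_SOFT : RECOVER_NONE;
}
```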

> 
> I think a different suspend process would be better.
> We have a for_each within the suspend code. The output of that code should
> not be a single error code, but instead an array indicating the current
> state of each engine (running/hung), the intended state and status of
> whether the intention worked or failed. If the loop through the for_each, it
> could compare the current state and intended state, and attempt to reach the
> intended state, and report an error code for each engine. Then the code to
> achieve the transition can been different depending on the current ->
> intended transition.
> i.e. code for running -> suspended, can be different than code for hung ->
> suspended. The code already needs to know which engines are enabled/disabled
> (Vega 56 vs Vega 64)

We don't really care if the suspend fails or not.  See
amdgpu_device_gpu_recover() for the full sequence.

> 
> I can hang this IP block <dm> at will. I have 2 games that hang it within
> seconds of starting.

There was a deadlock in the dm code which has been fixed.  Please try a newer
code base, e.g.:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-5.2-wip

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 10:40 UTC
  To: dri-devel

--- Comment #10 from James.Dutton@gmail.com ---
Created attachment 144118
  --> https://bugs.freedesktop.org/attachment.cgi?id=144118&action=edit
dmesg with drm-next-5.2-wip

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 10:44 UTC
  To: dri-devel

--- Comment #11 from James.Dutton@gmail.com ---
I tried with drm-next-5.2-wip.

It does not hang any more, but I have a new error now.

It is better, in the sense that I can now reboot the system normally instead of
resorting to echo b > /proc/sysrq-trigger.

[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

After the GPU reset, the screen is corrupted.
Via ssh, I can run "service gdm stop" and then "service gdm start", and I then
get a working login screen (the mouse moves and I can type in a password).
I cannot actually log in, though, because X fails: the desktop never appears
and it returns to the login greeter screen.

I will try to get more details when I have time later.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 14:22 UTC
  To: dri-devel

--- Comment #12 from James.Dutton@gmail.com ---

The error comes from this code in amdgpu_cs.c (around line 232),
in function amdgpu_cs_parser_init():
        if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
                ret = -ECANCELED;
                goto free_all_kdata;
        }

So, I guess those values need to be fixed up somewhere in the GPU reset.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 14:26 UTC
  To: dri-devel

--- Comment #13 from Michel Dänzer <michel@daenzer.net> ---
(In reply to James.Dutton from comment #12)
> In function: amdgpu_cs_parser_init:
>         if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
>                 ret = -ECANCELED;
>                 goto free_all_kdata;
>         }
> 
> So, I guess, somewhere is the gpu reset, those values need to be fixed up.

It means the VRAM contents were lost during the GPU reset, so any existing
userspace contexts are invalid and need to be re-created (which at this point
boils down to restarting any processes using the GPU for rendering).

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-04-30 14:43 UTC
  To: dri-devel

--- Comment #14 from James.Dutton@gmail.com ---
I stopped gdm and killed any remaining X processes.
When I then start gdm and log in, it works and displays the desktop.

Previously, I was leaving one of the X processes running.

So I think this (drm-next-5.2-wip) has fixed this bug.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-08-13 20:56 UTC
  To: dri-devel

Alessandro <lifeisfoo@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lifeisfoo@gmail.com

--- Comment #15 from Alessandro <lifeisfoo@gmail.com> ---
Created attachment 145050
  --> https://bugs.freedesktop.org/attachment.cgi?id=145050&action=edit
dmsg drm amdgpu

I'm facing the same issue with 5.2.x and 5.3-rc4 kernel and a Radeon RX 580.

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-08-13 21:20 UTC
  To: dri-devel

Alessandro <lifeisfoo@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #145050|dmsg drm amdgpu             |dmsg drm amdgpu linux
        description|                            |5.3-rc4 from ubuntu ppa

* [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
From: bugzilla-daemon @ 2019-09-25 18:49 UTC
  To: dri-devel

GitLab Migration User <gitlab-migration@fdo.invalid> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #16 from GitLab Migration User <gitlab-migration@fdo.invalid> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1389.
