All of lore.kernel.org
 help / color / mirror / Atom feed
From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
Date: Mon, 29 Apr 2019 13:41:31 +0000	[thread overview]
Message-ID: <bug-110509-502-Zw3XXlfltL@http.bugs.freedesktop.org/> (raw)
In-Reply-To: <bug-110509-502@http.bugs.freedesktop.org/>


[-- Attachment #1.1: Type: text/plain, Size: 3056 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110509

--- Comment #6 from James.Dutton@gmail.com ---
I think I have found the problem.
[  657.526313] amdgpu 0000:43:00.0: GPU reset begin!
[  657.526318] Evicting PASID 32782 queues
[  667.756000] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-0]
hw_done or flip_done timed out


The intention is to do a GPU reset, but the implementation in the code is just
to try and do a suspend.
Part of the suspend does this:

Apr 29 14:29:19 thread kernel: [  363.445607] INFO: task kworker/u258:0:55
blocked for more than 120 seconds.
Apr 29 14:29:19 thread kernel: [  363.445612]       Not tainted 5.0.10-dirty
#26
Apr 29 14:29:19 thread kernel: [  363.445613] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 14:29:19 thread kernel: [  363.445615] kworker/u258:0  D    0    55     
2 0x80000000
Apr 29 14:29:19 thread kernel: [  363.445628] Workqueue: events_unbound
commit_work [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445629] Call Trace:
Apr 29 14:29:19 thread kernel: [  363.445635]  __schedule+0x2c0/0x880
Apr 29 14:29:19 thread kernel: [  363.445637]  schedule+0x2c/0x70
Apr 29 14:29:19 thread kernel: [  363.445639]  schedule_timeout+0x1db/0x360
Apr 29 14:29:19 thread kernel: [  363.445641]  ? update_load_avg+0x8b/0x590
Apr 29 14:29:19 thread kernel: [  363.445645] 
dma_fence_default_wait+0x1eb/0x270
Apr 29 14:29:19 thread kernel: [  363.445647]  ? dma_fence_release+0xa0/0xa0
Apr 29 14:29:19 thread kernel: [  363.445649] 
dma_fence_wait_timeout+0xfd/0x110
Apr 29 14:29:19 thread kernel: [  363.445651] 
reservation_object_wait_timeout_rcu+0x17d/0x370
Apr 29 14:29:19 thread kernel: [  363.445710]  amdgpu_dm_do_flip+0x14a/0x4a0
[amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445767] 
amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445820]  ?
amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [  363.445828]  commit_tail+0x42/0x70
[drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445835]  commit_work+0x12/0x20
[drm_kms_helper]
Apr 29 14:29:19 thread kernel: [  363.445838]  process_one_work+0x1fd/0x400
Apr 29 14:29:19 thread kernel: [  363.445840]  worker_thread+0x34/0x410
Apr 29 14:29:19 thread kernel: [  363.445841]  kthread+0x121/0x140
Apr 29 14:29:19 thread kernel: [  363.445843]  ? process_one_work+0x400/0x400
Apr 29 14:29:19 thread kernel: [  363.445844]  ? kthread_park+0x90/0x90
Apr 29 14:29:19 thread kernel: [  363.445847]  ret_from_fork+0x22/0x40


So, amggpu_dm_do_flip()  is the bit that hangs.
If the GPU needs to be reset because some of it has hung, trying a "flip" is
unlikely to work.
It is failing/hanging when doing "suspend of IP block <dm>" in
amdgpu_device_ip_suspend_phase1().

I would suggest creating code that actually tries to reset the GPU, instead of
trying to suspend it while GPU is hung.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3884 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  parent reply	other threads:[~2019-04-29 13:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-24 17:26 [Bug 110509] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout bugzilla-daemon
2019-04-24 17:31 ` bugzilla-daemon
2019-04-24 17:32 ` bugzilla-daemon
2019-04-24 17:33 ` bugzilla-daemon
2019-04-24 17:35 ` bugzilla-daemon
2019-04-28 15:42 ` bugzilla-daemon
2019-04-29 13:41 ` bugzilla-daemon [this message]
2019-04-29 18:30 ` bugzilla-daemon
2019-04-29 22:41 ` bugzilla-daemon
2019-04-30  1:26 ` bugzilla-daemon
2019-04-30 10:40 ` bugzilla-daemon
2019-04-30 10:44 ` bugzilla-daemon
2019-04-30 14:22 ` bugzilla-daemon
2019-04-30 14:26 ` bugzilla-daemon
2019-04-30 14:43 ` bugzilla-daemon
2019-08-13 20:56 ` bugzilla-daemon
2019-08-13 21:20 ` bugzilla-daemon
2019-09-25 18:49 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-110509-502-Zw3XXlfltL@http.bugs.freedesktop.org/ \
    --to=bugzilla-daemon@freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.