From: "Koenig, Christian" <Christian.Koenig@amd.com>
To: Eric Anholt <eric@anholt.net>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Nayan Deshmukh <nayan26deshmukh@gmail.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH 1/2] Revert "drm/sched: fix timeout handling v2"
Date: Thu, 8 Nov 2018 16:48:48 +0000
Message-ID: <da2b3215-9904-b7f6-4f45-4e5e4242fa12@amd.com>
In-Reply-To: <875zx7o82m.fsf@anholt.net>

Am 08.11.18 um 17:19 schrieb Eric Anholt:
> "Koenig, Christian" <Christian.Koenig@amd.com> writes:
>
>> Am 08.11.18 um 17:04 schrieb Eric Anholt:
>>> This reverts commit 0efd2d2f68cd5dbddf4ecd974c33133257d16a8e. Fixes
>>> this failure in V3D GPU reset:
>>>
>>> [ 1418.227796] Unable to handle kernel NULL pointer dereference at virtual address 00000018
>>> [ 1418.235947] pgd = dc4c55ca
>>> [ 1418.238695] [00000018] *pgd=80000040004003, *pmd=00000000
>>> [ 1418.244132] Internal error: Oops: 206 [#1] SMP ARM
>>> [ 1418.248934] Modules linked in:
>>> [ 1418.252001] CPU: 0 PID: 10253 Comm: kworker/0:0 Not tainted 4.19.0-rc6+ #486
>>> [ 1418.259058] Hardware name: Broadcom STB (Flattened Device Tree)
>>> [ 1418.265002] Workqueue: events drm_sched_job_timedout
>>> [ 1418.269986] PC is at dma_fence_remove_callback+0x8/0x50
>>> [ 1418.275218] LR is at drm_sched_job_timedout+0x4c/0x118
>>> ...
>>> [ 1418.415891] [<c086b754>] (dma_fence_remove_callback) from [<c06e7e6c>] (drm_sched_job_timedout+0x4c/0x118)
>>> [ 1418.425571] [<c06e7e6c>] (drm_sched_job_timedout) from [<c0242500>] (process_one_work+0x2c8/0x7bc)
>>> [ 1418.434552] [<c0242500>] (process_one_work) from [<c0242a38>] (worker_thread+0x44/0x590)
>>> [ 1418.442663] [<c0242a38>] (worker_thread) from [<c0249b10>] (kthread+0x160/0x168)
>>> [ 1418.450076] [<c0249b10>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28)
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Signed-off-by: Eric Anholt <eric@anholt.net>
>> Well NAK. The problem here is that fence->parent is NULL which is most
>> likely caused by an issue somewhere else.
>>
>> We could easily work around that with an extra NULL check, but reverting
>> the patch would break GPU recovery again.
> My GPU recovery works with the revert and reliably doesn't work without
> it, so my idea of "break GPU recovery" is the opposite of yours. Can
> you help figure out what in this change broke my driver?

The problem is here:

> -	list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) {
> -		struct drm_sched_fence *fence = job->s_fence;
> -
> -		if (!dma_fence_remove_callback(fence->parent, &fence->cb))
> -			goto already_signaled;

dma_fence_remove_callback() will fault if fence->parent is NULL. A
simple "if (!fence->parent) continue;" should be enough to work around that.

But I'm not sure how exactly fence->parent became NULL in the first place.

Going to double check the code once more,
Christian.
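
For illustration, the suggested workaround can be sketched in user space roughly as follows. This is not the scheduler code: the mock_* types, mock_remove_callback() and detach_callbacks() are hypothetical stand-ins for struct drm_sched_fence, dma_fence_remove_callback() and the loop in drm_sched_job_timedout(), and an array replaces ring_mirror_list. It only shows where the NULL check would sit and that jobs without a parent fence get skipped instead of being dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct mock_fence {
	bool signaled;
};

struct mock_fence_cb {
	bool installed;
};

struct mock_sched_fence {
	struct mock_fence *parent;	/* may be NULL, as in the reported oops */
	struct mock_fence_cb cb;
};

struct mock_job {
	struct mock_sched_fence *s_fence;
};

/*
 * Mock of dma_fence_remove_callback(): it touches the fence, so a NULL
 * fence would fault. Returns true if the callback was removed, false if
 * the fence has already signaled (or no callback was installed).
 */
static bool mock_remove_callback(struct mock_fence *fence,
				 struct mock_fence_cb *cb)
{
	assert(fence != NULL);
	if (fence->signaled || !cb->installed)
		return false;
	cb->installed = false;
	return true;
}

/*
 * Walk the jobs newest-first and detach their hardware fence callbacks.
 * Returns the index of the first job whose fence had already signaled
 * (the "already_signaled" case in the quoted hunk), or -1 if none.
 */
static int detach_callbacks(struct mock_job *jobs, int count)
{
	for (int i = count - 1; i >= 0; i--) {
		struct mock_sched_fence *fence = jobs[i].s_fence;

		/* The proposed workaround: skip jobs without a parent fence. */
		if (!fence->parent)
			continue;

		if (!mock_remove_callback(fence->parent, &fence->cb))
			return i;
	}
	return -1;
}
```

Note that the check only avoids the fault; it does not explain why fence->parent is NULL at this point.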