linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	"sumit.semwal@linaro.org" <sumit.semwal@linaro.org>
Subject: Re: [PATCH] dma-buf: Fix missing excl fence waiting
Date: Tue, 25 Feb 2020 20:11:51 +0100	[thread overview]
Message-ID: <484ce316-55f2-b85e-a1e4-730db94f3fe3@amd.com> (raw)
In-Reply-To: <20200225172355.GO2363188@phenom.ffwll.local>

Am 25.02.20 um 18:23 schrieb Daniel Vetter:
> On Sun, Feb 23, 2020 at 01:04:15PM +0100, Christian König wrote:
>> Am 23.02.20 um 12:56 schrieb Pan, Xinhui:
>>> If shared fence list is not empty, even we want to test all fences, excl fence is ignored.
>>> That is abviously wrong, so fix it.
>> Yeah that is a known issue and I completely agree with you, but other
>> disagree.
>>
>> See the shared fences are meant to depend on the exclusive fence. So all
>> shared fences must finish only after the exclusive one has finished as well.
>>
>> The problem now is that for error handling this isn't necessary true. In
>> other words when a shared fence completes with an error it is perfectly
>> possible that he does this before the exclusive fence is finished.
>>
>> I'm trying to convince Daniel that this is a problem for years :)
> I thought the consensus is that reasonable gpu schedulers and gpu reset
> code should try to make really, really sure it only completes stuff in
> sequence? That's at least my take away from the syncobj timeline
> discussion, where you convinced me we shouldn't just crash&burn.
>
> I think as long as your scheduler is competent and your gpu reset tries to
> limit damage (i.e. kill offending ctx terminally, mark everything else
> that didn't complete for re-running) we should end up with everything
> completing in sequence. I guess if you do kill a lot more stuff, then
> you'd have to push these through your scheduler as dummy jobs, i.e. they
> still wait for their dependencies, but then all they do is set the
> dma_fence error and complete it. Maybe something the common scheduler
> could do.

Yes, that's exactly how we currently implement it. But I still think 
that this is not necessary the best approach :)

Anyway Xinhui's problem turned out to be deeper. We somehow add an old 
stale fence to the dma_resv object sometimes and that can result in 
quite a bunch of problems.

I'm currently trying to hunt down what's going wrong here in more detail.

Regards,
Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>    drivers/dma-buf/dma-resv.c | 9 +++++----
>>>    1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>>> index 4264e64788c4..44dc64c547c6 100644
>>> --- a/drivers/dma-buf/dma-resv.c
>>> +++ b/drivers/dma-buf/dma-resv.c
>>> @@ -632,14 +632,14 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
>>>     */
>>>    bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
>>>    {
>>> -	unsigned seq, shared_count;
>>> +	unsigned int seq, shared_count, left;
>>>    	int ret;
>>>    	rcu_read_lock();
>>>    retry:
>>>    	ret = true;
>>>    	shared_count = 0;
>>> -	seq = read_seqcount_begin(&obj->seq);
>>> +	left = seq = read_seqcount_begin(&obj->seq);
>>>    	if (test_all) {
>>>    		unsigned i;
>>> @@ -647,7 +647,7 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
>>>    		struct dma_resv_list *fobj = rcu_dereference(obj->fence);
>>>    		if (fobj)
>>> -			shared_count = fobj->shared_count;
>>> +			left = shared_count = fobj->shared_count;
>>>    		for (i = 0; i < shared_count; ++i) {
>>>    			struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
>>> @@ -657,13 +657,14 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
>>>    				goto retry;
>>>    			else if (!ret)
>>>    				break;
>>> +			left--;
>>>    		}
>>>    		if (read_seqcount_retry(&obj->seq, seq))
>>>    			goto retry;
>>>    	}
>>> -	if (!shared_count) {
>>> +	if (!left) {
>>>    		struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
>>>    		if (fence_excl) {


  reply	other threads:[~2020-02-25 19:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-23 11:56 [PATCH] dma-buf: Fix missing excl fence waiting Pan, Xinhui
2020-02-23 12:00 ` Pan, Xinhui
2020-02-23 12:04 ` Christian König
2020-02-23 12:21   ` Pan, Xinhui
2020-02-23 12:12     ` Christian König
2020-02-25 17:23   ` Daniel Vetter
2020-02-25 19:11     ` Christian König [this message]
2020-02-28  5:45       ` Pan, Xinhui
2020-02-28  6:08         ` Pan, Xinhui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=484ce316-55f2-b85e-a1e4-730db94f3fe3@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).