All of lore.kernel.org
 help / color / mirror / Atom feed
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Chris Wilson <chris.p.wilson@intel.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [Intel-gfx] [PATCH v3 1/2] drm/i915: Fix negative value passed as remaining time
Date: Wed, 23 Nov 2022 12:29:19 +0100	[thread overview]
Message-ID: <1933091.usQuhbGJ8B@jkrzyszt-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <d05b3556-19c7-6b50-8f23-0d17428dc741@linux.intel.com>

On Tuesday, 22 November 2022 11:41:29 CET Tvrtko Ursulin wrote:
> 
> On 21/11/2022 23:19, Janusz Krzysztofik wrote:
> > Hi Andrzej,
> > 
> > Thanks for providing your R-b.
> > 
> > On Monday, 21 November 2022 18:40:51 CET Andrzej Hajda wrote:
> >> On 21.11.2022 15:56, Janusz Krzysztofik wrote:
> >>> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> >>> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> >>> extra argument 'remaining_timeout', intended for passing back unconsumed
> >>> portion of requested timeout when 0 (success) is returned.  However, when
> >>> request retirement happens to succeed despite an error returned by a call
> >>> to dma_fence_wait_timeout(), that error code (a negative value) is passed
> >>> back instead of remaining time.  If we then pass that negative value
> >>> forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
> >>> will be triggered.
> 
> Right, AFAICT a GEM_BUG_ON in debug builds, but in production builds 
> negative timeout will get passed along all the way to schedule_timeout 
> where error and call trace will be dumped. So fix appears warranted indeed.
> 
> >>> If request retirement succeeds but an error code is passed back via
> >>> remaininig_timeout, we may have no clue on how much of the initial timeout
> >>> might have been left for spending it on waiting for GuC to become idle.
> >>> OTOH, since all pending requests have been successfully retired, that
> >>> error code has been already ignored by intel_gt_retire_requests_timeout(),
> >>> then we shouldn't fail.
> >>>
> >>> Assume no more time has been left on error and pass 0 timeout value to
> >>> intel_uc_wait_for_idle() to give it a chance to return success if GuC is
> >>> already idle.
> >>>
> >>> v3: Don't fail on any error passed back via remaining_timeout.
> >>>
> >>> v2: Fix the issue on the caller side, not the provider.
> >>>
> >>> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> > with GuC")
> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >>> Cc: stable@vger.kernel.org # v5.15+
> >>
> >> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > 
> > While still open for comments from others, I'm now looking for potential
> > committer.
> 
> Both patches are considered good to go?

Yes, I hope so.  The actual diff of 2/2 v3 has received R-b from Andrzej still 
under discussion of v2, then I decided to add that tag to v3.

Andrzej, can you please respond to 2/2 v3 and confirm your R-b applies?

Thanks,
Janusz

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Thanks,
> > Janusz
> > 
> > 
> >>
> >> Regards
> >> Andrzej
> >>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
> >>>    1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/
> > intel_gt.c
> >>> index b5ad9caa55372..7ef0edb2e37cd 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long
> > timeout)
> >>>    			return -EINTR;
> >>>    	}
> >>>    
> >>> -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> >>> -							
> > remaining_timeout);
> >>> +	if (timeout)
> >>> +		return timeout;
> >>> +
> >>> +	if (remaining_timeout < 0)
> >>> +		remaining_timeout = 0;
> >>> +
> >>> +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
> >>>    }
> >>>    
> >>>    int intel_gt_init(struct intel_gt *gt)
> >>
> >>
> > 
> > 
> > 
> > 
> 





WARNING: multiple messages have this Message-ID (diff)
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Chris Wilson <chris.p.wilson@intel.com>,
	Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [Intel-gfx] [PATCH v3 1/2] drm/i915: Fix negative value passed as remaining time
Date: Wed, 23 Nov 2022 12:29:19 +0100	[thread overview]
Message-ID: <1933091.usQuhbGJ8B@jkrzyszt-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <d05b3556-19c7-6b50-8f23-0d17428dc741@linux.intel.com>

On Tuesday, 22 November 2022 11:41:29 CET Tvrtko Ursulin wrote:
> 
> On 21/11/2022 23:19, Janusz Krzysztofik wrote:
> > Hi Andrzej,
> > 
> > Thanks for providing your R-b.
> > 
> > On Monday, 21 November 2022 18:40:51 CET Andrzej Hajda wrote:
> >> On 21.11.2022 15:56, Janusz Krzysztofik wrote:
> >>> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> >>> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> >>> extra argument 'remaining_timeout', intended for passing back unconsumed
> >>> portion of requested timeout when 0 (success) is returned.  However, when
> >>> request retirement happens to succeed despite an error returned by a call
> >>> to dma_fence_wait_timeout(), that error code (a negative value) is passed
> >>> back instead of remaining time.  If we then pass that negative value
> >>> forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
> >>> will be triggered.
> 
> Right, AFAICT a GEM_BUG_ON in debug builds, but in production builds 
> negative timeout will get passed along all the way to schedule_timeout 
> where error and call trace will be dumped. So fix appears warranted indeed.
> 
> >>> If request retirement succeeds but an error code is passed back via
> >>> remaininig_timeout, we may have no clue on how much of the initial timeout
> >>> might have been left for spending it on waiting for GuC to become idle.
> >>> OTOH, since all pending requests have been successfully retired, that
> >>> error code has been already ignored by intel_gt_retire_requests_timeout(),
> >>> then we shouldn't fail.
> >>>
> >>> Assume no more time has been left on error and pass 0 timeout value to
> >>> intel_uc_wait_for_idle() to give it a chance to return success if GuC is
> >>> already idle.
> >>>
> >>> v3: Don't fail on any error passed back via remaining_timeout.
> >>>
> >>> v2: Fix the issue on the caller side, not the provider.
> >>>
> >>> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> > with GuC")
> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >>> Cc: stable@vger.kernel.org # v5.15+
> >>
> >> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > 
> > While still open for comments from others, I'm now looking for potential
> > committer.
> 
> Both patches are considered good to go?

Yes, I hope so.  The actual diff of 2/2 v3 has received R-b from Andrzej still 
under discussion of v2, then I decided to add that tag to v3.

Andrzej, can you please respond to 2/2 v3 and confirm your R-b applies?

Thanks,
Janusz

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Thanks,
> > Janusz
> > 
> > 
> >>
> >> Regards
> >> Andrzej
> >>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
> >>>    1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/
> > intel_gt.c
> >>> index b5ad9caa55372..7ef0edb2e37cd 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long
> > timeout)
> >>>    			return -EINTR;
> >>>    	}
> >>>    
> >>> -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> >>> -							
> > remaining_timeout);
> >>> +	if (timeout)
> >>> +		return timeout;
> >>> +
> >>> +	if (remaining_timeout < 0)
> >>> +		remaining_timeout = 0;
> >>> +
> >>> +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
> >>>    }
> >>>    
> >>>    int intel_gt_init(struct intel_gt *gt)
> >>
> >>
> > 
> > 
> > 
> > 
> 





  reply	other threads:[~2022-11-23 11:29 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21 14:56 [PATCH v3 0/2] drm/i915: Fix timeout handling when retiring requests Janusz Krzysztofik
2022-11-21 14:56 ` [Intel-gfx] " Janusz Krzysztofik
2022-11-21 14:56 ` [PATCH v3 1/2] drm/i915: Fix negative value passed as remaining time Janusz Krzysztofik
2022-11-21 14:56   ` [Intel-gfx] " Janusz Krzysztofik
2022-11-21 17:40   ` Andrzej Hajda
2022-11-21 17:40     ` Andrzej Hajda
2022-11-21 23:19     ` Janusz Krzysztofik
2022-11-21 23:19       ` Janusz Krzysztofik
2022-11-22 10:41       ` Tvrtko Ursulin
2022-11-22 10:41         ` Tvrtko Ursulin
2022-11-23 11:29         ` Janusz Krzysztofik [this message]
2022-11-23 11:29           ` Janusz Krzysztofik
2022-11-21 14:56 ` [PATCH v3 2/2] drm/i915: Never return 0 if not all requests retired Janusz Krzysztofik
2022-11-21 14:56   ` [Intel-gfx] " Janusz Krzysztofik
2022-11-22 10:50   ` Tvrtko Ursulin
2022-11-22 10:50     ` [Intel-gfx] " Tvrtko Ursulin
2022-11-23  9:28     ` Janusz Krzysztofik
2022-11-23  9:28       ` [Intel-gfx] " Janusz Krzysztofik
2022-11-23 12:57       ` Tvrtko Ursulin
2022-11-23 12:57         ` [Intel-gfx] " Tvrtko Ursulin
2022-11-23 16:21         ` Janusz Krzysztofik
2022-11-23 16:21           ` [Intel-gfx] " Janusz Krzysztofik
2022-11-24 12:27           ` Tvrtko Ursulin
2022-11-24 12:27             ` [Intel-gfx] " Tvrtko Ursulin
2022-11-23 15:42   ` Andrzej Hajda
2022-11-23 15:42     ` [Intel-gfx] " Andrzej Hajda
2022-11-21 15:37 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for drm/i915: Fix timeout handling when retiring requests (rev3) Patchwork
2022-11-21 15:56 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-11-21 18:03 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1933091.usQuhbGJ8B@jkrzyszt-mobl1.ger.corp.intel.com \
    --to=janusz.krzysztofik@linux.intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=chris.p.wilson@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=nirmoy.das@intel.com \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.