All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request
Date: Wed, 8 Jun 2016 10:42:58 +0200	[thread overview]
Message-ID: <20160608084258.GM3363@phenom.ffwll.local> (raw)
In-Reply-To: <1464970133-29859-3-git-send-email-chris@chris-wilson.co.uk>

On Fri, Jun 03, 2016 at 05:08:34PM +0100, Chris Wilson wrote:
> We can forgo queuing the hangcheck from the start of every request to
> until we wait upon a request. This reduces the overhead of every
> request, but may increase the latency of detecting a hang. Howeever, if
> nothing every waits upon a hang, did it ever hang? It also improves the
> robustness of the wait-request by ensuring that the hangchecker is
> indeed running before we sleep indefinitely (and thereby ensuring that
> we never actually sleep forever waiting for a dead GPU).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I think this will run into TDR patches, where we want a super-low-latency
hangcheck in some cases. But then I think that's implemented by wrapping
the batch in some special cs commands to insta-kill the engine if the
timeout expired, so probably not a big problem. Still worth it to
double-check with Mika I'd say.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
>  drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
>  2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c7a67a7412cd..03256f096ab6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(dev_priv);
> +
>  		timer.function = NULL;
>  		if (timeout || missed_irq(dev_priv, engine)) {
>  			unsigned long expire;
> @@ -2674,8 +2677,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	/* Not allowed to fail! */
>  	WARN(ret, "emit|add_request failed: %d!\n", ret);
>  
> -	i915_queue_hangcheck(engine->i915);
> -
>  	queue_delayed_work(dev_priv->wq,
>  			   &dev_priv->mm.retire_work,
>  			   round_jiffies_up_relative(HZ));
> @@ -3019,8 +3020,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
>  
>  	if (idle)
>  		mod_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.idle_work,
> -				   msecs_to_jiffies(100));
> +				 &dev_priv->mm.idle_work,
> +				 msecs_to_jiffies(100));
>  
>  	return idle;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 5c7378374ae6..1303d7c034d3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3134,10 +3134,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>  
>  	for_each_engine_id(engine, dev_priv, id) {
> +		bool busy = waitqueue_active(&engine->irq_queue);
>  		u64 acthd;
>  		u32 seqno;
>  		unsigned user_interrupts;
> -		bool busy = true;
>  
>  		semaphore_clear_deadlocks(dev_priv);
>  
> @@ -3160,12 +3160,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		if (engine->hangcheck.seqno == seqno) {
>  			if (ring_idle(engine, seqno)) {
>  				engine->hangcheck.action = HANGCHECK_IDLE;
> -				if (waitqueue_active(&engine->irq_queue)) {
> +				if (busy) {
>  					/* Safeguard against driver failure */
>  					user_interrupts = kick_waiters(engine);
>  					engine->hangcheck.score += BUSY;
> -				} else
> -					busy = false;
> +				}
>  			} else {
>  				/* We always increment the hangcheck score
>  				 * if the ring is busy and still processing
> @@ -3239,9 +3238,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		goto out;
>  	}
>  
> +	/* Reset timer in case GPU hangs without another request being added */
>  	if (busy_count)
> -		/* Reset timer case chip hangs without another request
> -		 * being added */
>  		i915_queue_hangcheck(dev_priv);
>  
>  out:
> -- 
> 2.8.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-06-08  8:43 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03 16:08 Breadcrumbs, again Chris Wilson
2016-06-03 16:08 ` [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-06-03 16:08 ` [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
2016-06-08  8:42   ` Daniel Vetter [this message]
2016-06-08  9:13     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-06-06 12:52   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 04/21] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-06-03 16:08 ` [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
2016-06-06 13:00   ` Tvrtko Ursulin
2016-06-07 12:11     ` Arun Siluvery
2016-06-03 16:08 ` [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-06-06 13:58   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 07/21] drm/i915: Spin after waking up for an interrupt Chris Wilson
2016-06-06 14:39   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-06-06 14:55   ` Tvrtko Ursulin
2016-06-08  9:24     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
2016-06-06 15:03   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 10/21] drm/i915: Allocate scratch page from stolen Chris Wilson
2016-06-06 15:05   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
2016-06-06 15:09   ` Tvrtko Ursulin
2016-06-08  9:27     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
2016-06-03 16:08 ` [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
2016-06-06 15:10   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-06-06 15:34   ` Tvrtko Ursulin
2016-06-08  9:35     ` Chris Wilson
2016-06-08  9:57       ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-06-08  8:54   ` Daniel Vetter
2016-06-03 16:08 ` [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
2016-06-06 13:50   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-06-07 12:04   ` Tvrtko Ursulin
2016-06-08  9:48     ` Chris Wilson
2016-06-08 10:16       ` Tvrtko Ursulin
2016-06-08 11:24         ` Chris Wilson
2016-06-08 11:47           ` Tvrtko Ursulin
2016-06-08 12:34             ` Chris Wilson
2016-06-08 12:44               ` Tvrtko Ursulin
2016-06-08 13:47                 ` Chris Wilson
2016-06-03 16:08 ` [PATCH 18/21] drm/i915: Embed signaling node into the GEM request Chris Wilson
2016-06-07 12:31   ` Tvrtko Ursulin
2016-06-08  9:54     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-06-07 12:46   ` Tvrtko Ursulin
2016-06-08 10:01     ` Chris Wilson
2016-06-08 10:18       ` Tvrtko Ursulin
2016-06-08 11:10         ` Chris Wilson
2016-06-08 11:49           ` Tvrtko Ursulin
2016-06-08 12:54             ` Chris Wilson
2016-06-03 16:08 ` [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
2016-06-07 12:50   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
2016-06-07 12:51   ` Tvrtko Ursulin
2016-06-03 16:35 ` ✗ Ro.CI.BAT: failure for series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160608084258.GM3363@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.