All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Michel Thierry <michel.thierry@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v5 15/18] drm/i915: Watchdog timeout: IRQ handler for gen8+
Date: Sat, 25 Mar 2017 09:26:52 +0000	[thread overview]
Message-ID: <20170325092652.GF25768@nuc-i3427.alporthouse.com> (raw)
In-Reply-To: <20170325013010.36244-16-michel.thierry@intel.com>

On Fri, Mar 24, 2017 at 06:30:07PM -0700, Michel Thierry wrote:
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 87e76ef589b1..d484cbc561eb 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1369,6 +1369,10 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
>  
>  	if (tasklet)
>  		tasklet_hi_schedule(&engine->irq_tasklet);
> +
> +	if (iir & (GT_GEN8_WATCHDOG_INTERRUPT << test_shift)) {
> +		tasklet_hi_schedule(&engine->watchdog_tasklet);

We don't need to set this as high, we definitely do want to process the
live engines first and so some small latency in detecting the reset is
no deal breaker. We probably don't even want to use a tasklet? (Or
actually we do!)

>  static void hangcheck_declare_hang(struct drm_i915_private *i915,
>  				   unsigned int hung,
> -				   unsigned int stuck)
> +				   unsigned int stuck,
> +				   unsigned int watchdog)

That's a very interesting question as to whether we want to use the very
heavy hangcheck and capture machine at all for the watchdog.

> +#define GEN8_WATCHDOG_1000US 0x2ee0 //XXX: Temp, replace with helper function
> +static void gen8_watchdog_irq_handler(unsigned long data)
> +{
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	u32 watchdog_disable, current_seqno;
> +
> +	intel_uncore_forcewake_get(dev_priv, engine->fw_domains);
> +
> +	if (engine->id == RCS)
> +		watchdog_disable = GEN8_RCS_WATCHDOG_DISABLE;
> +	else
> +		watchdog_disable = GEN8_XCS_WATCHDOG_DISABLE;
> +
> +	/* Stop the counter to prevent further timeout interrupts */
> +	I915_WRITE_FW(RING_CNTR(engine->mmio_base), watchdog_disable);
> +
> +	/* false-positive, request completed after the timer expired */

False optimism in spotting the false positive. engine_is_idle() means
all requests not the interesting one. Since you are using seqno, just
reject when seqno == intel_engine_last_submit().

> +	if (intel_engine_is_idle(engine))
> +		goto fw_put;
> +
> +	current_seqno = intel_engine_get_seqno(engine);
> +	if (engine->hangcheck.last_watchdog_seqno == current_seqno) {

Or you could just reset the engine directly, once we rid it of that
pesky mutex (which is done in all but name already). Doing that from
inside the tasklet has some advantages -- we don't need to disable the
execlists/guc tasklet.

> +		/* Make sure the active request will be marked as guilty */
> +		engine->hangcheck.stalled = true;
> +		engine->hangcheck.seqno = intel_engine_get_seqno(engine);
> +
> +		/* And try to run the hangcheck_work as soon as possible */
> +		set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
> +		queue_delayed_work(system_long_wq,
> +				   &dev_priv->gpu_error.hangcheck_work, 0);
> +	} else {
> +		engine->hangcheck.last_watchdog_seqno = current_seqno;
> +		/* Re-start the counter, if really hung, it will expire again */
> +		I915_WRITE_FW(RING_THRESH(engine->mmio_base), GEN8_WATCHDOG_1000US);
> +		I915_WRITE_FW(RING_CNTR(engine->mmio_base), GEN8_WATCHDOG_ENABLE);
> +	}
> +
> +fw_put:
> +	intel_uncore_forcewake_put(dev_priv, engine->fw_domains);
> +}

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index e8faf2c34c97..fffe69f5aed2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -128,6 +128,7 @@ struct intel_instdone {
>  struct intel_engine_hangcheck {
>  	u64 acthd;
>  	u32 seqno;
> +	u32 last_watchdog_seqno;

Just watchdog will be enough for its meaning to be clear.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-03-25  9:26 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-25  1:29 [PATCH v5 00/18] Gen8+ engine-reset Michel Thierry
2017-03-25  1:29 ` [PATCH v5 01/18] drm/i915: Fix stale comment about I915_RESET_IN_PROGRESS flag Michel Thierry
2017-03-25  1:29 ` [PATCH v5 02/18] drm/i915: Rename gen8_(un)request_engine_reset to gen8_(un)request_reset_engine Michel Thierry
2017-03-25  9:01   ` Chris Wilson
2017-03-27 20:35     ` Michel Thierry
2017-03-25  1:29 ` [PATCH v5 03/18] drm/i915: Update i915.reset to handle engine resets Michel Thierry
2017-03-25  1:29 ` [PATCH v5 04/18] drm/i915/tdr: Modify error handler for per engine hang recovery Michel Thierry
2017-03-25  9:10   ` Chris Wilson
2017-03-25  1:29 ` [PATCH v5 05/18] drm/i915/tdr: Add support for per engine reset recovery Michel Thierry
2017-03-25  1:29 ` [PATCH v5 06/18] drm/i915: Skip reset request if there is one already Michel Thierry
2017-03-25  1:29 ` [PATCH v5 07/18] drm/i915/tdr: Add engine reset count to error state Michel Thierry
2017-03-25  1:30 ` [PATCH v5 08/18] drm/i915/tdr: Export per-engine reset count info to debugfs Michel Thierry
2017-03-25  1:30 ` [PATCH v5 09/18] drm/i915/tdr: Enable Engine reset and recovery support Michel Thierry
2017-03-25  1:30 ` [PATCH v5 10/18] drm/i915: Add engine reset count in get-reset-stats ioctl Michel Thierry
2017-03-25  1:30 ` [PATCH v5 11/18] drm/i915/selftests: reset engine self tests Michel Thierry
2017-03-25  9:50   ` Chris Wilson
2017-03-25  1:30 ` [PATCH v5 12/18] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
2017-04-14  0:16   ` Daniele Ceraolo Spurio
2017-04-14  0:30     ` Michel Thierry
2017-03-25  1:30 ` [PATCH v5 13/18] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-03-25  1:30 ` [PATCH v5 14/18] drm/i915: Watchdog timeout: Pass GuC shared data structure during param load Michel Thierry
2017-03-25  9:15   ` Chris Wilson
2017-04-17 21:28   ` Daniele Ceraolo Spurio
2017-04-17 22:31     ` Michel Thierry
2017-03-25  1:30 ` [PATCH v5 15/18] drm/i915: Watchdog timeout: IRQ handler for gen8+ Michel Thierry
2017-03-25  9:26   ` Chris Wilson [this message]
2017-03-27 21:48     ` Michel Thierry
2017-03-28  8:34       ` Chris Wilson
2017-03-25  1:30 ` [PATCH v5 16/18] drm/i915: Watchdog timeout: Ringbuffer command emission " Michel Thierry
2017-03-25  9:46   ` Chris Wilson
2017-03-25  1:30 ` [PATCH v5 17/18] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout Michel Thierry
2017-03-25  9:38   ` Chris Wilson
2017-03-28  1:03     ` Michel Thierry
2017-03-28  8:31       ` Chris Wilson
2017-03-28  8:39         ` Chris Wilson
2017-04-14 16:05   ` Daniele Ceraolo Spurio
2017-04-14 16:16     ` Chris Wilson
2017-04-14 16:47     ` Michel Thierry
2017-03-25  1:30 ` [PATCH v5 18/18] drm/i915: Watchdog timeout: Export media reset count from GuC to debugfs Michel Thierry
2017-03-25  9:12   ` Chris Wilson
2017-03-25  1:50 ` ✗ Fi.CI.BAT: failure for Gen8+ engine-reset Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170325092652.GF25768@nuc-i3427.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=michel.thierry@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.