From: Chris Wilson <chris@chris-wilson.co.uk>
To: Carlos Santa <carlos.santa@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: Michel Thierry <michel.thierry@intel.com>
Subject: Re: drm/i915: Watchdog timeout: IRQ handler for gen8+
Date: Mon, 07 Jan 2019 12:16:37 +0000
Message-ID: <154686339753.27300.16261214859672560976@skylake-alporthouse-com>
In-Reply-To: <e33871bc-ef83-6f29-2f7a-6ab09339140d@linux.intel.com>

Quoting Tvrtko Ursulin (2019-01-07 11:58:13)
> 
> Hi,
> 
> This series has not been recognized by Patchwork as such, nor are the 
> patches numbered. Have you used git format-patch -<N> --cover-letter and 
> git send-email to send it out?
> 
> Rest inline.
> 
> On 05/01/2019 02:39, Carlos Santa wrote:
> > +static void gen8_watchdog_irq_handler(unsigned long data)
> > +{
> > +     struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> > +     struct drm_i915_private *dev_priv = engine->i915;
> > +     enum forcewake_domains fw_domains;
> > +     u32 current_seqno;
> > +
> > +     switch (engine->id) {
> > +     default:
> > +             MISSING_CASE(engine->id);
> > +             /* fall through */
> > +     case RCS:
> > +             fw_domains = FORCEWAKE_RENDER;
> > +             break;
> > +     case VCS:
> > +     case VCS2:
> > +     case VECS:
> > +             fw_domains = FORCEWAKE_MEDIA;
> > +             break;
> > +     }
> > +
> > +     intel_uncore_forcewake_get(dev_priv, fw_domains);
> 
> I'd be tempted to drop this and just use I915_WRITE. It doesn't feel 
> like there is any performance to be gained with it and it embeds too 
> much knowledge here.

No, no, no. Let's not reintroduce a fw inside irq context on a frequent
timer again.

Rule of thumb for fw_get latency:
gen6+: 10us to 50ms.
gen8+: 10us to 500us.

And then we don't release fw for 1ms after the fw_put. So we basically
prevent GT powersaving while the watchdog is active. I hope that is an
unintended consequence.
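
To put rough numbers on that, here is a back-of-the-envelope model (a
sketch, not driver code). The function name and its parameters are made
up for illustration; the only figure taken from above is the 1ms hold
after fw_put. It assumes the timer fires at a fixed period and that each
firing keeps the GT awake for the fw_get latency plus the hold time.

```c
#include <assert.h>

/*
 * Hypothetical model: fraction of wall time the GT is held awake when a
 * periodic timer takes forcewake on every tick.
 *
 *   period_us: interval between timer ticks (assumed fixed)
 *   get_us:    time spent acquiring forcewake
 *   hold_us:   time forcewake stays held after the put (1000us above)
 */
static double fw_awake_fraction(double period_us, double get_us,
				double hold_us)
{
	assert(period_us > 0.0);

	double awake_us = get_us + hold_us;

	/* Once the awake window covers the period, the GT never sleeps. */
	if (awake_us >= period_us)
		return 1.0;

	return awake_us / period_us;
}
```

With a 1ms period and the 1ms hold, fw_awake_fraction(1000.0, 10.0,
1000.0) comes out as 1.0, i.e. no powersaving at all. Even at a 4ms
period the GT is still held awake a quarter of the time before counting
the fw_get latency itself.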

The fw_get will be required if we actually hang, but for the timer
check, we should be able to do without.
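
Roughly what I mean, as a standalone mock rather than a patch. Every
name here (mock_engine, mock_watchdog_check, the fields) is hypothetical.
It assumes the current seqno can be read from coherent memory without
touching MMIO, and it keeps the two-expiry comparison from the quoted
handler. Forcewake is only counted on the path that has already decided
the engine is stuck.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the engine state the handler looks at. */
struct mock_engine {
	uint32_t hws_seqno;      /* seqno the GPU wrote to memory */
	uint32_t last_submit;    /* last seqno queued to the engine */
	uint32_t watchdog_seqno; /* seqno seen at the previous expiry */
	unsigned int fw_gets;    /* how often we paid for forcewake */
};

/* Returns true if the engine looks hung and needs a reset. */
static bool mock_watchdog_check(struct mock_engine *e)
{
	/* Plain memory read: no forcewake needed for the common case. */
	uint32_t current_seqno = e->hws_seqno;

	/* Did the request complete after the timer expired? */
	if (current_seqno == e->last_submit)
		return false;

	/* First expiry on this seqno: remember it and wait for the next. */
	if (e->watchdog_seqno != current_seqno) {
		e->watchdog_seqno = current_seqno;
		return false;
	}

	/* No progress since the last expiry: only now take forcewake. */
	e->fw_gets++;
	return true;
}
```

In this model an idle or progressing engine never touches forcewake;
fw_gets only increments on the second expiry with an unchanged seqno.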

And while on the topic of the timer irq, it should be forcibly cleared
alongside intel_engine_park, so that we ensure it is not raised while
the device/driver is supposed to be asleep. Or something to that effect.
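
As a toy illustration of the ordering only (all names are hypothetical,
and this says nothing about which registers are involved): parking
disables the watchdog irq and drops anything pending, so that a late
expiry cannot be delivered while the engine is asleep.

```c
#include <stdbool.h>

/* Hypothetical model of the watchdog irq state for one engine. */
struct mock_wd_engine {
	bool parked;
	bool wd_irq_enabled;
	bool wd_irq_pending;
};

static void mock_wd_unpark(struct mock_wd_engine *e)
{
	e->parked = false;
	e->wd_irq_enabled = true;
}

static void mock_wd_park(struct mock_wd_engine *e)
{
	/* Disable first so nothing new arrives, then clear what is left. */
	e->wd_irq_enabled = false;
	e->wd_irq_pending = false;
	e->parked = true;
}

/* Models the timer expiring; returns true if an irq was latched. */
static bool mock_wd_expire(struct mock_wd_engine *e)
{
	if (!e->wd_irq_enabled)
		return false;

	e->wd_irq_pending = true;
	return true;
}
```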

> > +     current_seqno = intel_engine_get_seqno(engine);
> > +
> > +     /* did the request complete after the timer expired? */
> > +     if (intel_engine_last_submit(engine) == current_seqno)
> > +             goto fw_put;
> > +
> > +     if (engine->hangcheck.watchdog == current_seqno) {
> > +             /* Make sure the active request will be marked as guilty */
> > +             engine->hangcheck.stalled = true;
> > +             engine->hangcheck.acthd = intel_engine_get_active_head(engine);
> > +             engine->hangcheck.seqno = current_seqno;
> > +
> > +             /* And try to run the hangcheck_work as soon as possible */
> > +             set_bit(I915_RESET_WATCHDOG, &dev_priv->gpu_error.flags);
> > +             queue_delayed_work(system_long_wq,
> > +                                &dev_priv->gpu_error.hangcheck_work,
> > +                                round_jiffies_up_relative(HZ));
> > +     } else {
> > +             engine->hangcheck.watchdog = current_seqno;
> 
> The logic above potentially handles my previous question? Could be if 
> batch 2 hangs. But..

Also, DO NOT USE HANGCHECK for this. The whole design was to be able to
do the engine reset right away. (GuC can't do that right now, but that's
known to be broken.)

As an aside, we have to rewrite this entire logic anyway, as the engine
seqno and global_seqno are obsolete.
-Chris