All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/10] drm/i915: Replace hangcheck by heartbeats
Date: Fri, 11 Oct 2019 16:06:03 +0100	[thread overview]
Message-ID: <157080636308.31572.4047266278658870717@skylake-alporthouse-com> (raw)
In-Reply-To: <e8b09683-fe0b-2e55-6b34-62b6c3fe2a30@linux.intel.com>

Quoting Tvrtko Ursulin (2019-10-11 15:24:21)
> 
> On 10/10/2019 08:14, Chris Wilson wrote:
> > +config DRM_I915_HEARTBEAT_INTERVAL
> > +     int "Interval between heartbeat pulses (ms)"
> > +     default 2500 # milliseconds
> > +     help
> > +       While active the driver uses a periodic request, a heartbeat, to
> > +       check the wellness of the GPU and to regularly flush state changes
> > +       (idle barriers).
> 
> Should this be somehow reworded to be more end user friendly? My idea, 
> may need to be corrected for bad English:

End user friendly. Sure, but that means I didn't hide this well enough
;)
 
> The driver sends a periodic heartbeat down all active GT engines to 
> check the health of the GPU and undertake regular house-keeping of 
> internal driver state.
> 
> Main points from the user perspective: "request" - whaat? "idle 
> barriers" - ditto. "Wellness" - a bit unusual in this context, no?

> > +static void heartbeat(struct work_struct *wrk)
> > +{
> > +     struct i915_sched_attr attr = {
> > +             .priority = I915_USER_PRIORITY(I915_PRIORITY_MIN),
> 
> You were saying it's better to start from zero, right?

The first bump. Starting at lowest, means run when first idle. Then we
jump to 0 and be scheduled like any other normal user.

> > +     };
> > +     struct intel_engine_cs *engine =
> > +             container_of(wrk, typeof(*engine), heartbeat.work.work);
> > +     struct intel_context *ce = engine->kernel_context;
> > +     struct i915_request *rq;
> > +
> > +     if (!intel_engine_pm_get_if_awake(engine))
> > +             return;
> > +
> > +     rq = engine->heartbeat.systole;
> > +     if (rq && i915_request_completed(rq)) {
> > +             i915_request_put(rq);
> > +             engine->heartbeat.systole = NULL;
> > +     }
> > +
> > +     if (intel_gt_is_wedged(engine->gt))
> > +             goto out;
> > +
> > +     if (engine->heartbeat.systole) {
> > +             if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) {
> > +                     struct drm_printer p = drm_debug_printer(__func__);
> > +
> > +                     intel_engine_dump(engine, &p,
> > +                                       "%s heartbeat not ticking\n",
> > +                                       engine->name);
> 
> This could perhaps be better only when we have reached a higher priority 
> attempt. Okay it's under DEBUG_GEM but still, not sure there is value in 
> being so panicky if for any reason preemption does not work. Heartbeat 
> does not depend on preemption as far as I could spot, right?

The challenge is evident by the else path where we immediately reset.
If we cause a preemption event from the heartbeat (even strictly at min
prio we could cause a timeslice to expire) it is useful to have the
debug in dmesg (as in CI we don't get error-state very often).

Yes, I've tried trimming it to only on the vital paths, but so far
haven't found a satisfactory means.

To make me happy I think I need to push it down into the reset routines
themselves. Hmm. Except those we definitely don't want dmesg spam as
they get runs 10s of thousands times during CI.

It'll do for now. I'm sure we'll get tired of it and find it a new home.

> > +static struct kobj_attribute heartbeat_interval_attr =
> > +__ATTR(heartbeat_interval_ms, 0600, heartbeat_interval_show, heartbeat_interval_store);
> >   
> >   static void kobj_engine_release(struct kobject *kobj)
> >   {
> > @@ -115,6 +141,9 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
> >               &class_attr.attr,
> >               &inst_attr.attr,
> >               &mmio_attr.attr,
> > +#if CONFIG_DRM_I915_HEARTBEAT_INTERVAL
> > +             &heartbeat_interval_attr.attr,
> > +#endif
> 
> Presumably compiler is happy (or the linker) with only this part getting 
> the #ifdef treatment? (The show/store functions above don't have it.)

Yup, it's not annoying enough to complain about the dead globals. Although
it should be more than smart enough to remove them.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-10-11 15:06 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-10  7:14 [PATCH 01/10] drm/i915: Note the addition of timeslicing to the pretend scheduler Chris Wilson
2019-10-10  7:14 ` [PATCH 02/10] drm/i915/execlists: Leave tell-tales as to why pending[] is bad Chris Wilson
2019-10-11  8:39   ` Tvrtko Ursulin
2019-10-10  7:14 ` [PATCH 03/10] drm/i915: Expose engine properties via sysfs Chris Wilson
2019-10-11  8:44   ` Tvrtko Ursulin
2019-10-11  8:49     ` Chris Wilson
2019-10-11  9:04       ` Tvrtko Ursulin
2019-10-11  9:40   ` [PATCH v2] " Chris Wilson
2019-10-10  7:14 ` [PATCH 04/10] drm/i915/execlists: Force preemption Chris Wilson
2019-10-10  7:14 ` [PATCH 05/10] drm/i915: Mark up "sentinel" requests Chris Wilson
2019-10-11  8:45   ` Tvrtko Ursulin
2019-10-10  7:14 ` [PATCH 06/10] drm/i915/gt: Introduce barrier pulses along engines Chris Wilson
2019-10-11  9:11   ` Tvrtko Ursulin
2019-10-11  9:52     ` Chris Wilson
2019-10-10  7:14 ` [PATCH 07/10] drm/i915/execlists: Cancel banned contexts on schedule-out Chris Wilson
2019-10-11  9:47   ` Tvrtko Ursulin
2019-10-11 10:03     ` Chris Wilson
2019-10-11 10:15     ` Chris Wilson
2019-10-11 10:40       ` Chris Wilson
2019-10-11 11:16   ` [PATCH v2] " Chris Wilson
2019-10-11 13:10     ` Tvrtko Ursulin
2019-10-11 14:10       ` Chris Wilson
2019-10-10  7:14 ` [PATCH 08/10] drm/i915: Cancel non-persistent contexts on close Chris Wilson
2019-10-11 13:55   ` Tvrtko Ursulin
2019-10-11 14:22     ` Chris Wilson
2019-10-11 15:41       ` Chris Wilson
2019-10-10  7:14 ` [PATCH 09/10] drm/i915: Replace hangcheck by heartbeats Chris Wilson
2019-10-11 14:24   ` Tvrtko Ursulin
2019-10-11 15:06     ` Chris Wilson [this message]
2019-10-10  7:14 ` [PATCH 10/10] drm/i915: Flush idle barriers when waiting Chris Wilson
2019-10-11 14:56   ` Tvrtko Ursulin
2019-10-11 15:11     ` Chris Wilson
2019-10-14 13:08       ` Tvrtko Ursulin
2019-10-14 13:38         ` Chris Wilson
2019-10-23 15:33         ` Chris Wilson
2019-10-23 15:33           ` [Intel-gfx] " Chris Wilson
2019-10-10  8:18 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/10] drm/i915: Note the addition of timeslicing to the pretend scheduler Patchwork
2019-10-10  8:42 ` ✓ Fi.CI.BAT: success " Patchwork
2019-10-10 16:19 ` ✗ Fi.CI.IGT: failure " Patchwork
2019-10-11  8:16 ` [PATCH 01/10] " Tvrtko Ursulin
2019-10-11  9:49 ` ✗ Fi.CI.BUILD: failure for series starting with [01/10] drm/i915: Note the addition of timeslicing to the pretend scheduler (rev2) Patchwork
2019-10-11 11:39 ` ✗ Fi.CI.BUILD: failure for series starting with [01/10] drm/i915: Note the addition of timeslicing to the pretend scheduler (rev3) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=157080636308.31572.4047266278658870717@skylake-alporthouse-com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.