All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Carlos Santa <carlos.santa@intel.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org
Cc: Michel Thierry <michel.thierry@intel.com>
Subject: Re: [PATCH v4 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+
Date: Tue, 19 Mar 2019 12:39:53 +0000	[thread overview]
Message-ID: <25e7c81e-18fc-0676-dd27-d0a7823aaa41@linux.intel.com> (raw)
In-Reply-To: <1271220339df25098eb6305051972c80c125bc52.camel@intel.com>


On 18/03/2019 00:15, Carlos Santa wrote:
> On Mon, 2019-03-11 at 10:39 +0000, Tvrtko Ursulin wrote:
>> On 08/03/2019 03:16, Carlos Santa wrote:
>>> On Fri, 2019-03-01 at 09:36 +0000, Chris Wilson wrote:
>>>>>
>>>>
>>>> Quoting Carlos Santa (2019-02-21 02:58:16)
>>>>> +#define GEN8_WATCHDOG_1000US(dev_priv)
>>>>> watchdog_to_clock_counts(dev_priv, 1000)
>>>>> +static void gen8_watchdog_irq_handler(unsigned long data)
>>>>> +{
>>>>> +       struct intel_engine_cs *engine = (struct
>>>>> intel_engine_cs
>>>>> *)data;
>>>>> +       struct drm_i915_private *dev_priv = engine->i915;
>>>>> +       unsigned int hung = 0;
>>>>> +       u32 current_seqno=0;
>>>>> +       char msg[80];
>>>>> +       unsigned int tmp;
>>>>> +       int len;
>>>>> +
>>>>> +       /* Stop the counter to prevent further timeout
>>>>> interrupts
>>>>> */
>>>>> +       I915_WRITE_FW(RING_CNTR(engine->mmio_base),
>>>>> get_watchdog_disable(engine));
>>>>> +
>>>>> +       /* Read the heartbeat seqno once again to check if we
>>>>> are
>>>>> stuck? */
>>>>> +       current_seqno =
>>>>> intel_engine_get_hangcheck_seqno(engine);
>>>>
>>>> I have said this before, but this doesn't exist either, it's just
>>>> a
>>>> temporary glitch in the matrix.
>>>>
>>>
>>> Chris, Tvrtko, I need some guidance on how to find the quilty seqno
>>> during a hang, can you please advice here what to do?
>>
>> When an interrupt fires you need to ascertain whether the same
>> request
>> which enabled the watchdog is running, correct?
>>
>> So I think you would need this, with a disclaimer that I haven't
>> thought
>> about the details really:
>>
>> 1. Take a reference to timeline hwsp when setting up the watchdog for
>> a
>> request.
>>
>> 2. Store the initial seqno associated with this request.
>>
>> 3. Force enable user interrupts.
>>
>> 4. When timeout fires, inspect the HWSP seqno to see if the request
>> completed or not.
>>
>> 5. Reset the engine if not completed.
>>
>> 6. Put the timeline/hwsp reference.
> 
> 
> static int gen8_emit_bb_start(struct i915_request *rq,
> 							u64 offset, u32
> len,
> 							const unsigned
> int flags)
> {
> 	struct i915_timeline *tl;
> 	u32 seqno;
> 
> 	if (enable_watchdog) {
> 		/* Start watchdog timer */
> 		cs = gen8_emit_start_watchdog(rq, cs);
> 		tl = ce->ring->timeline;
> 		i915_timeline_get_seqno(tl, rq, &seqno);
> 		/*Store initial hwsp seqno associated with this request
> 		engine->watchdog_hwsp_seqno = tl->hwsp_seqno;

You should not need to allocate a new seqno and also having something 
stored per engine does not make clear how will you solve out of order.

Maybe you just set up the timer, then lets see below..

Also, are you not trying to do the software implementation to start with?

> 	}
> 
> }
> 
> static void gen8_watchdog_tasklet(unsigned long data)
> {
> 		struct i915_request *rq;
> 
> 		rq = intel_engine_find_active_request(engine);
> 
> 		/* Inspect the watchdog seqno once again for
> completion? */
> 		if (!i915_seqno_passed(engine->watchdog_hwsp_seqno, rq-
>> fence.seqno)) {
> 			//Reset Engine
> 		}
> }

What happens if you simply reset without checking anything? You know hw 
timer wouldn't have fired if the context wasn't running, correct?

(Ignoring the race condition between interrupt raised -> hw interrupt 
delivered -> serviced -> tasklet scheduled -> tasklet running. Which may 
mean request has completed in the meantime and you reset the engine for 
nothing. But this is probably not 100% solvable.)

Regards,

Tvrtko

> Tvrtko, is the above acceptable to inspect whether the seqno has
> completed?
> 
> I noticed there's a helper function i915_request_completed(struct
> i915_request *rq) but it will require me to modify it in order to pass
> 2 different seqnos.
> 
> Regards,
> Carlos
> 
>>
>> If the user interrupt fires with the request completed cancel the
>> above
>> operations.
>>
>> There could be an inherent race between inspecting the seqno and
>> deciding to reset. Not sure at the moment what to do. Maybe just call
>> it
>> bad luck?
>>
>> I also think for the software implementation you need to force no
>> request coalescing for contexts with timeout set. Because you want
>> to
>> have 100% defined borders for request in and out - since the timeout
>> is
>> defined per request.
>>
>> In this case you don't need the user interrupt for the trailing edge
>> signal but can use context complete. Maybe putting hooks into
>> context_in/out in intel_lrc.c would work under these circumstances.
>>
>> Also if preempted you need to cancel the timer setup and store
>> elapsed
>> execution time.
>>
>> Or it may make sense to just disable preemption for these contexts.
>> Otherwise there is no point in trying to mandate the timeout?
>>
>> But it is also kind of bad since non-privileged contexts can make
>> themselves non-preemptable by setting the watchdog timeout.
>>
>> Maybe as a compromise we need to automatically apply an elevated
>> priority level, but not as high to be completely non-preemptable.
>> Sounds
>> like a hard question.
>>
>> Regards,
>>
>> Tvrtko
> 
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-03-19 12:39 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-21  2:58 [PATCH v4 0/5] GEN8+ GPU Watchdog Reset Support Carlos Santa
2019-02-21  2:58 ` [PATCH v4 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl Carlos Santa
2019-02-25 13:34   ` Tvrtko Ursulin
2019-03-06 23:08     ` Carlos Santa
2019-03-07  7:27       ` Tvrtko Ursulin
2019-02-21  2:58 ` [PATCH v4 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+ Carlos Santa
2019-02-28 17:38   ` Tvrtko Ursulin
2019-03-01  1:51     ` Carlos Santa
2019-03-01  9:36   ` Chris Wilson
2019-03-02  2:08     ` Carlos Santa
2019-03-08  3:16     ` Carlos Santa
2019-03-11 10:39       ` Tvrtko Ursulin
2019-03-18  0:15         ` Carlos Santa
2019-03-19 12:39           ` Tvrtko Ursulin [this message]
2019-03-19 12:46             ` Tvrtko Ursulin
2019-03-19 17:52               ` Carlos Santa
2019-02-21  2:58 ` [PATCH v4 3/5] drm/i915: Watchdog timeout: Ringbuffer command emission " Carlos Santa
2019-02-21  2:58 ` [PATCH v4 4/5] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout Carlos Santa
2019-02-28 17:22   ` Tvrtko Ursulin
2019-02-21  2:58 ` [PATCH v4 5/5] drm/i915: Watchdog timeout: Include threshold value in error state Carlos Santa
2019-02-21  2:58 ` drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Carlos Santa
2019-02-21  3:24 ` ✗ Fi.CI.BAT: failure for drm/i915: Replace global_seqno with a hangcheck heartbeat seqno (rev3) Patchwork
2019-03-11 11:54 ` [PATCH v4 0/5] GEN8+ GPU Watchdog Reset Support Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25e7c81e-18fc-0676-dd27-d0a7823aaa41@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=carlos.santa@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=michel.thierry@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.