From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 19/24] drm/i915/selftests: Be a little more lenient for reset workers
Date: Fri, 28 Feb 2020 17:38:42 +0200
Message-ID: <87wo867md9.fsf@gaia.fi.intel.com>
In-Reply-To: <20200228082330.2411941-19-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Give the reset worker a kick before losing help when waiting for hang
> recovery, as the CPU scheduler is a little unreliable.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/selftest_lrc.c | 74 ++++++++++++++++++--------
>  1 file changed, 52 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 95da6b880e3f..af5b3da6d894 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -90,6 +90,48 @@ static int wait_for_submit(struct intel_engine_cs *engine,
>  	return -ETIME;
>  }
>  
> +static int wait_for_reset(struct intel_engine_cs *engine,
> +			  struct i915_request *rq,
> +			  unsigned long timeout)
> +{
> +	timeout += jiffies;
> +	do {
> +		cond_resched();
> +		intel_engine_flush_submission(engine);
> +
> +		if (READ_ONCE(engine->execlists.pending[0]))
> +			continue;
> +
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		if (READ_ONCE(rq->fence.error))
> +			break;
> +	} while (time_before(jiffies, timeout));
> +
> +	flush_scheduled_work();
> +
> +	if (rq->fence.error != -EIO) {
> +		pr_err("%s: hanging request %llx:%lld not reset\n",
> +		       engine->name,
> +		       rq->fence.context,
> +		       rq->fence.seqno);
> +		return -EINVAL;
> +	}
> +
> +	/* Give the request a jiffie to complete after flushing the worker */
> +	if (i915_request_wait(rq, 0,
> +			      max(0l, (long)(timeout - jiffies)) + 1) < 0) {
> +		pr_err("%s: hanging request %llx:%lld did not complete\n",
> +		       engine->name,
> +		       rq->fence.context,
> +		       rq->fence.seqno);
> +		return -ETIME;
> +	}
> +
> +	return 0;
> +}
> +
>  static int live_sanitycheck(void *arg)
>  {
>  	struct intel_gt *gt = arg;
> @@ -1805,14 +1847,9 @@ static int __cancel_active0(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> -		err = -EIO;
> -		goto out;
> -	}
> -
> -	if (rq->fence.error != -EIO) {
> -		pr_err("Cancelled inflight0 request did not report -EIO\n");
> -		err = -EINVAL;
> +	err = wait_for_reset(arg->engine, rq, HZ / 2);
> +	if (err) {
> +		pr_err("Cancelled inflight0 request did not reset\n");
>  		goto out;
>  	}
>  
> @@ -1870,10 +1907,9 @@ static int __cancel_active1(struct live_preempt_cancel *arg)
>  		goto out;
>  
>  	igt_spinner_end(&arg->a.spin);
> -	if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
> -		err = -EIO;
> +	err = wait_for_reset(arg->engine, rq[1], HZ / 2);
> +	if (err)
>  		goto out;
> -	}
>  
>  	if (rq[0]->fence.error != 0) {
>  		pr_err("Normal inflight0 request did not complete\n");
> @@ -1953,10 +1989,9 @@ static int __cancel_queued(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq[2], 0, HZ / 5) < 0) {
> -		err = -EIO;
> +	err = wait_for_reset(arg->engine, rq[2], HZ / 2);
> +	if (err)
>  		goto out;
> -	}
>  
>  	if (rq[0]->fence.error != -EIO) {
>  		pr_err("Cancelled inflight0 request did not report -EIO\n");
> @@ -2014,14 +2049,9 @@ static int __cancel_hostile(struct live_preempt_cancel *arg)
>  	if (err)
>  		goto out;
>  
> -	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> -		err = -EIO;
> -		goto out;
> -	}
> -
> -	if (rq->fence.error != -EIO) {
> -		pr_err("Cancelled inflight0 request did not report -EIO\n");
> -		err = -EINVAL;
> +	err = wait_for_reset(arg->engine, rq, HZ / 2);
> +	if (err) {
> +		pr_err("Cancelled inflight0 request did not reset\n");
>  		goto out;
>  	}
>  
> -- 
> 2.25.1
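
For readers following the deadline handling in wait_for_reset() above: the
loop relies on a wraparound-safe tick comparison, and the final wait clamps
the remaining time at zero before adding one tick of grace. A minimal
userspace sketch of those two pieces of arithmetic follows. The names
ticks_t, ticks_before() and ticks_left_plus_grace() are hypothetical
stand-ins chosen for illustration; they are not the kernel's jiffies helpers.

```c
#include <limits.h>

/* Hypothetical userspace stand-in for a free-running tick counter. */
typedef unsigned long ticks_t;

/*
 * True if a is earlier than b, even when the counter has wrapped.
 * Assumes the usual two's-complement conversion from unsigned long
 * to long, and that a and b are less than half the range apart.
 */
static int ticks_before(ticks_t a, ticks_t b)
{
	return (long)(a - b) < 0;
}

/*
 * Ticks left until the deadline, clamped at zero, plus one tick of
 * grace, mirroring the max(0l, (long)(timeout - jiffies)) + 1
 * expression in the quoted patch.
 */
static long ticks_left_plus_grace(ticks_t deadline, ticks_t now)
{
	long left = (long)(deadline - now);

	return (left > 0 ? left : 0) + 1;
}
```

With a counter about to wrap (now = ULONG_MAX - 2, deadline = now + 5),
ticks_before(now, deadline) is still true, and a deadline that has already
passed yields a wait of exactly one tick.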
