From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/i915: Force reset on unready engine
Date: Mon, 13 Aug 2018 12:58:10 +0300
Message-ID: <87mutq4mh9.fsf@gaia.fi.intel.com>
In-Reply-To: <153414914466.23925.2644394062176644872@skylake-alporthouse-com>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-08-13 09:18:07)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Quoting Mika Kuoppala (2018-08-10 15:00:36)
>> >> If engine reports that it is not ready for reset, we
>> >> give up. Evidence shows that forcing a per engine reset
>> >> on an engine which is not reporting to be ready for reset,
>> >> can bring it back into a working order. There is risk that
>> >> we corrupt the context image currently executing on that
>> >> engine. But that is a risk worth taking as if we unblock
>> >> the engine, we prevent a whole device wedging in a case
>> >> of full gpu reset.
>> >> 
>> >> Reset individual engine even if it reports that it is not
>> >> prepared for reset, but only if we aim for full gpu reset
>> >> and not on first reset attempt.
>> >> 
>> >> v2: force reset only on later attempts, readability (Chris)
>> >> 
>> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/intel_uncore.c | 49 +++++++++++++++++++++++------
>> >>  1 file changed, 39 insertions(+), 10 deletions(-)
>> >> 
>> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> >> index 027d14574bfa..d24026839b17 100644
>> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> >> @@ -2099,9 +2099,6 @@ static int gen8_reset_engine_start(struct intel_engine_cs *engine)
>> >>                                            RESET_CTL_READY_TO_RESET,
>> >>                                            700, 0,
>> >>                                            NULL);
>> >> -       if (ret)
>> >> -               DRM_ERROR("%s: reset request timeout\n", engine->name);
>> >> -
>> >>         return ret;
>> >>  }
>> >>  
>> >> @@ -2113,6 +2110,42 @@ static void gen8_reset_engine_cancel(struct intel_engine_cs *engine)
>> >>                       _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
>> >>  }
>> >>  
>> >> +static int reset_engines(struct drm_i915_private *i915,
>> >> +                        unsigned int engine_mask,
>> >> +                        unsigned int retry)
>> >> +{
>> >> +       if (INTEL_GEN(i915) >= 11)
>> >> +               return gen11_reset_engines(i915, engine_mask);
>> >> +       else
>> >> +               return gen6_reset_engines(i915, engine_mask, retry);
>> >> +}
>> >> +
>> >> +static int gen8_prepare_engine_for_reset(struct intel_engine_cs *engine,
>> >> +                                        unsigned int retry)
>> >> +{
>> >> +       const bool force_reset_on_non_ready = retry >= 1;
>> >> +       int ret;
>> >> +
>> >> +       ret = gen8_reset_engine_start(engine);
>> >> +
>> >> +       if (ret && force_reset_on_non_ready) {
>> >> +               /*
>> >> +                * Try to unblock a single non-ready engine by risking
>> >> +                * context corruption.
>> >> +                */
>> >> +               ret = reset_engines(engine->i915,
>> >> +                                   intel_engine_flag(engine),
>> >> +                                   retry);
>> >> +               if (!ret)
>> >> +                       ret = gen8_reset_engine_start(engine);
>> >> +
>> >> +               DRM_ERROR("%s: reset request timeout, forcing reset (%d)\n",
>> >> +                         engine->name, ret);
>> >
>> > This looks dubious now ;)
>> >
>> > After the force you then do a reset in the caller. Twice the reset for
>> > twice the unpreparedness.
>> 
>> It is intentional. First we make the engine unstuck and then do
>> a full cycle with ready to reset involved. I don't know if
>> it really matters tho. It could be that the engine is already
>> in pristine condition after first.
>
> Looks extremely weird way of going about it. If you want to do a double
> reset after the first try fails, try

The crux is the difference between failing to reset and failing to
prepare for reset. It is the ready-for-reset dance not working that we
are trying to work around by doing an extra per-engine reset. So trying
again at the higher level, with the same preconditions, would not help.
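
To make the distinction concrete, here is a toy model in plain C. It is
only a sketch and not driver code: every toy_* name is made up for
illustration, and it assumes that a forced per-engine reset really does
unblock the ready-for-reset handshake, which on hardware is only what
the evidence so far suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct toy_engine {
	bool stuck;          /* engine never acks the ready-to-reset request */
	unsigned int resets; /* per-engine resets issued so far */
};

/* Models the ready-for-reset handshake: times out while the engine is stuck. */
static int toy_request_ready(const struct toy_engine *e)
{
	return e->stuck ? -ETIMEDOUT : 0;
}

/* Models a per-engine reset issued without waiting for the handshake. */
static int toy_reset_engine(struct toy_engine *e)
{
	e->stuck = false; /* assumption: the forced reset unblocks the engine */
	e->resets++;
	return 0;
}

/* Same shape as gen8_prepare_engine_for_reset() in the patch above. */
static int toy_prepare_for_reset(struct toy_engine *e, unsigned int retry)
{
	const bool force_reset_on_non_ready = retry >= 1;
	int ret;

	assert(e);

	ret = toy_request_ready(e);
	if (ret && force_reset_on_non_ready) {
		/* Unblock first, then redo the handshake. */
		ret = toy_reset_engine(e);
		if (!ret)
			ret = toy_request_ready(e);
	}

	return ret;
}

/* Retrying from the caller without forcing: preconditions never change. */
static int toy_retry_without_force(struct toy_engine *e, unsigned int tries)
{
	int ret = -ETIMEDOUT;
	unsigned int i;

	for (i = 0; i < tries; i++) {
		ret = toy_prepare_for_reset(e, 0);
		if (!ret)
			break;
	}

	return ret;
}
```

With a stuck engine, toy_retry_without_force() keeps returning
-ETIMEDOUT however many times it is called, since nothing changes
between attempts. toy_prepare_for_reset() with retry >= 1 issues one
forced reset and then the handshake goes through.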

-Mika

>
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 4e5826045cbf..6fe137d7d455 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -450,6 +450,8 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>                         GEM_TRACE("engine_mask=%x\n", engine_mask);
>                         preempt_disable();
>                         ret = reset(i915, engine_mask);
> +                       if (retry > 0 && !ret) /* Double check reset worked */
> +                               ret = reset(i915, engine_mask);
>                         preempt_enable();
>                 }
>                 if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
>
> as a separate patch.
>
> P.S. you really need to review the i915_reset.c patch ;)
> -Chris

