* [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
@ 2018-07-05 15:02 Chris Wilson
  2018-07-05 19:29 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Chris Wilson @ 2018-07-05 15:02 UTC (permalink / raw)
  To: intel-gfx

If the GPU is irrecoverably wedged on startup, it means that it failed
on initialisation and we have already tried to reset it but failed. We
can ignore all further testing, as it is already dead. Failing early
prevents us from slowly failing in our endeavours later and timing out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index fe7d3190ebfe..fca073c96c2d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return -EIO; /* we're long past hope of a successful reset */
+
 	intel_runtime_pm_get(i915);
 	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
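
The order of the two guards in the patch above can be sketched outside the
kernel. This is only an illustration under stated assumptions: struct
mock_gpu, mock_run_hang_tests() and mock_hangcheck_live_selftests() are
hypothetical stand-ins invented here, not i915 code. The two flags model
intel_has_gpu_reset() and i915_terminally_wedged() respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not struct drm_i915_private. */
struct mock_gpu {
	bool has_reset;         /* models intel_has_gpu_reset() */
	bool terminally_wedged; /* models i915_terminally_wedged() */
	int tests_run;          /* how many times the slow subtests were entered */
};

/* Placeholder for the real (slow) hang and reset subtests. */
static int mock_run_hang_tests(struct mock_gpu *gpu)
{
	gpu->tests_run++;
	return 0;
}

/* Mirrors the guard order of the patched function: skip, then fail early. */
int mock_hangcheck_live_selftests(struct mock_gpu *gpu)
{
	assert(gpu != NULL);

	if (!gpu->has_reset)
		return 0; /* no reset support: nothing to test, report success */

	if (gpu->terminally_wedged)
		return -EIO; /* already dead: fail before any subtest can time out */

	return mock_run_hang_tests(gpu);
}
```

With both flags set the sketch returns -EIO without entering the subtests.
Without reset support it returns 0 even if the wedged flag is set, because
the reset-capability check comes first, as in the patch.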


* ✓ Fi.CI.BAT: success for drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
  2018-07-05 15:02 [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged Chris Wilson
@ 2018-07-05 19:29 ` Patchwork
  2018-07-05 20:44 ` [PATCH] " Rodrigo Vivi
  2018-07-06  5:39 ` ✓ Fi.CI.IGT: success for " Patchwork
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-07-05 19:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
URL   : https://patchwork.freedesktop.org/series/46011/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4436 -> Patchwork_9548 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/46011/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9548:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_suspend@basic-s4-devices:
      {fi-kbl-8809g}:     INCOMPLETE -> DMESG-WARN

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).



== Participating hosts (47 -> 42) ==

  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4436 -> Patchwork_9548

  CI_DRM_4436: 11e3f175832e2a194d75db14297c60ac6349a221 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4539: 8b3cc74c6911e9b2835fe6e160f84bae463a70ef @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9548: bb96129fe5b31ece1d51e3fb57b3e3368abce234 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

bb96129fe5b3 drm/i915/selftests: Fail hangcheck testing if the GPU is wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9548/issues.html

* Re: [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
  2018-07-05 15:02 [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged Chris Wilson
  2018-07-05 19:29 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-07-05 20:44 ` Rodrigo Vivi
  2018-07-06  6:37   ` Chris Wilson
  2018-07-06  5:39 ` ✓ Fi.CI.IGT: success for " Patchwork
  2 siblings, 1 reply; 5+ messages in thread
From: Rodrigo Vivi @ 2018-07-05 20:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> If the GPU is irrecoverably wedged on startup, it means that it failed
> on initialisation and we have already tried to reset it but failed. We
> can ignore all further testing, as it is already dead. Failing early
> prevents us from slowly failing in our endeavours later and timing out.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index fe7d3190ebfe..fca073c96c2d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>  	if (!intel_has_gpu_reset(i915))
>  		return 0;
>  
> +	if (i915_terminally_wedged(&i915->gpu_error))
> +		return -EIO; /* we're long past hope of a successful reset */
> +

Maybe -ENOTRECOVERABLE ?

Anyways

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


>  	intel_runtime_pm_get(i915);
>  	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
>  
> -- 
> 2.18.0
> 

* ✓ Fi.CI.IGT: success for drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
  2018-07-05 15:02 [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged Chris Wilson
  2018-07-05 19:29 ` ✓ Fi.CI.BAT: success for " Patchwork
  2018-07-05 20:44 ` [PATCH] " Rodrigo Vivi
@ 2018-07-06  5:39 ` Patchwork
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-07-06  5:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
URL   : https://patchwork.freedesktop.org/series/46011/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4436_full -> Patchwork_9548_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9548_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9548_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9548_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-bsd1:
      shard-kbl:          PASS -> SKIP +2

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          SKIP -> PASS +1

    
== Known issues ==

  Here are the changes found in Patchwork_9548_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_gtt:
      shard-kbl:          PASS -> INCOMPLETE (fdo#107127, fdo#103665)

    igt@gem_exec_reloc@basic-wc-gtt-active:
      shard-snb:          PASS -> INCOMPLETE (fdo#105411)

    igt@gem_exec_schedule@pi-ringfull-bsd:
      shard-apl:          NOTRUN -> FAIL (fdo#103158)

    igt@kms_flip@2x-blocking-wf_vblank:
      shard-glk:          PASS -> FAIL (fdo#103928)

    igt@kms_flip@2x-plain-flip-ts-check-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip@dpms-vs-vblank-race-interruptible:
      shard-glk:          PASS -> FAIL (fdo#103060)

    igt@kms_flip@flip-vs-expired-vblank:
      shard-glk:          PASS -> FAIL (fdo#102887, fdo#105363)

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#105363)

    igt@kms_flip_tiling@flip-x-tiled:
      shard-glk:          PASS -> FAIL (fdo#103822)

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)
      shard-kbl:          PASS -> FAIL (fdo#99912)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gtt:
      shard-apl:          INCOMPLETE (fdo#103927, fdo#107127) -> PASS

    igt@drv_suspend@shrink:
      shard-apl:          INCOMPLETE (fdo#103927) -> PASS

    igt@gem_exec_big:
      shard-hsw:          INCOMPLETE (fdo#103540) -> PASS

    igt@kms_flip@2x-flip-vs-blocking-wf-vblank:
      shard-glk:          FAIL (fdo#100368) -> PASS +1

    igt@kms_flip@2x-flip-vs-expired-vblank-interruptible:
      shard-glk:          FAIL (fdo#102887) -> PASS

    igt@kms_flip@modeset-vs-vblank-race:
      shard-hsw:          FAIL (fdo#103060) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
  fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#107127 https://bugs.freedesktop.org/show_bug.cgi?id=107127
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4436 -> Patchwork_9548

  CI_DRM_4436: 11e3f175832e2a194d75db14297c60ac6349a221 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4539: 8b3cc74c6911e9b2835fe6e160f84bae463a70ef @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9548: bb96129fe5b31ece1d51e3fb57b3e3368abce234 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9548/shards.html

* Re: [PATCH] drm/i915/selftests: Fail hangcheck testing if the GPU is wedged
  2018-07-05 20:44 ` [PATCH] " Rodrigo Vivi
@ 2018-07-06  6:37   ` Chris Wilson
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2018-07-06  6:37 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

Quoting Rodrigo Vivi (2018-07-05 21:44:56)
> On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> > If the GPU is irrecoverably wedged on startup, it means that it failed
> > on initialisation and we have already tried to reset it but failed. We
> > can ignore all further testing, as it is already dead. Failing early
> > prevents us from slowly failing in our endeavours later and timing out.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index fe7d3190ebfe..fca073c96c2d 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> >       if (!intel_has_gpu_reset(i915))
> >               return 0;
> >  
> > +     if (i915_terminally_wedged(&i915->gpu_error))
> > +             return -EIO; /* we're long past hope of a successful reset */
> > +
> 
> Maybe -ENOTRECOVERABLE ?

Interesting choice. Our convention so far has been -EIO for losing state
due to a GPU hang, but an extra flavour for when we wedge the driver?

Hmm, fence->error needs to remain -EIO (differentiating between reset
and wedge for userspace seems to convey no more information imo), and
we've already baked
	if (i915_terminally_wedged(&i915->gpu_error))
		return -EIO;
into the ABI for the points of interest.

Sadly it is too late; I don't think we can pick another errno for the
cases where it actually matters.
-Chris
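
The ABI concern can be illustrated with a sketch of how a userspace client
might key off the error code. Everything here is an assumption made for
illustration: enum gpu_outcome and classify_gpu_result() are hypothetical
and not taken from any real client or from the i915 uAPI. The only facts
relied on are that EIO and ENOTRECOVERABLE are distinct errno values in
<errno.h> on Linux.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification a client might apply to a kernel return value. */
enum gpu_outcome {
	GPU_OK,          /* call succeeded */
	GPU_STATE_LOST,  /* -EIO: GPU hang or wedge, context state is gone */
	GPU_OTHER_ERROR  /* anything else: handled as a generic failure */
};

/* ret follows the kernel convention: 0 on success, negative errno on error. */
enum gpu_outcome classify_gpu_result(int ret)
{
	assert(ret <= 0);

	if (ret == 0)
		return GPU_OK;
	if (ret == -EIO)
		return GPU_STATE_LOST;

	/* A client written against -EIO would not recognise a new errno here. */
	return GPU_OTHER_ERROR;
}
```

In this sketch a client that only knows -EIO would send -ENOTRECOVERABLE
down the generic error path. That is the kind of behaviour change that makes
switching errno risky once the value is part of the ABI.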