All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/gt: Only unwedge if we can reset first
@ 2019-09-13  7:47 Chris Wilson
  2019-09-13  9:35 ` ✗ Fi.CI.BAT: failure for " Patchwork
  2019-09-13 16:03 ` [PATCH] " Ville Syrjälä
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Wilson @ 2019-09-13  7:47 UTC (permalink / raw)
  To: intel-gfx

Unwedging the GPU requires a successful GPU reset before we restore the
default submission, or else we may see residual context switch events
that we were not expecting.

v2: Pull in the special-case reset_clobbers_display, and explain why it
should be safe in the context of unwedging.

Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 30 ++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index ee52947eb31d..d3b1cdafd4c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -7,6 +7,7 @@
 #include <linux/sched/mm.h>
 #include <linux/stop_machine.h>
 
+#include "display/intel_display.h"
 #include "display/intel_display_types.h"
 #include "display/intel_overlay.h"
 
@@ -729,6 +730,28 @@ static void nop_submit_request(struct i915_request *request)
 	intel_engine_queue_breadcrumbs(engine);
 }
 
+static bool reset_clobbers_display(struct drm_i915_private *i915)
+{
+	struct intel_crtc *crtc;
+
+	if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
+		return false;
+
+	/*
+	 * While this appears racy, we should only be inspecting the display
+	 * state at runtime from inside a GPU reset, which will be serialized
+	 * with modesets on affected machines. For a full device reset,
+	 * we should already have cleared the active CRTC state in
+	 * intel_prepare_reset().
+	 */
+	for_each_intel_crtc(&i915->drm, crtc) {
+		if (crtc->active)
+			return true;
+	}
+
+	return false;
+}
+
 static void __intel_gt_set_wedged(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
@@ -793,6 +816,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl;
 	unsigned long flags;
+	bool ok;
 
 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		return true;
@@ -838,7 +862,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	}
 	spin_unlock_irqrestore(&timelines->lock, flags);
 
-	intel_gt_sanitize(gt, false);
+	ok = false;
+	if (!reset_clobbers_display(gt->i915))
+		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+	if (!ok)
+		return false;
 
 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from
-- 
2.23.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915/gt: Only unwedge if we can reset first
  2019-09-13  7:47 [PATCH] drm/i915/gt: Only unwedge if we can reset first Chris Wilson
@ 2019-09-13  9:35 ` Patchwork
  2019-09-13 10:06   ` Chris Wilson
  2019-09-13 16:03 ` [PATCH] " Ville Syrjälä
  1 sibling, 1 reply; 7+ messages in thread
From: Patchwork @ 2019-09-13  9:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gt: Only unwedge if we can reset first
URL   : https://patchwork.freedesktop.org/series/66637/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_6887 -> Patchwork_14395
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_14395 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_14395, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14395:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@reload-with-fault-injection:
    - fi-blb-e6850:       NOTRUN -> [DMESG-WARN][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-blb-e6850/igt@i915_module_load@reload-with-fault-injection.html
    - fi-bwr-2160:        [PASS][2] -> [DMESG-WARN][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-bwr-2160/igt@i915_module_load@reload-with-fault-injection.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-bwr-2160/igt@i915_module_load@reload-with-fault-injection.html
    - fi-pnv-d510:        [PASS][4] -> [DMESG-WARN][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-pnv-d510/igt@i915_module_load@reload-with-fault-injection.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-pnv-d510/igt@i915_module_load@reload-with-fault-injection.html
    - fi-gdg-551:         [PASS][6] -> [DMESG-WARN][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-gdg-551/igt@i915_module_load@reload-with-fault-injection.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-gdg-551/igt@i915_module_load@reload-with-fault-injection.html

  * igt@runner@aborted:
    - fi-pnv-d510:        NOTRUN -> [FAIL][8]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-pnv-d510/igt@runner@aborted.html
    - fi-gdg-551:         NOTRUN -> [FAIL][9]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-gdg-551/igt@runner@aborted.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_ctx_create@basic-files:
    - {fi-tgl-u2}:        [PASS][10] -> [INCOMPLETE][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-tgl-u2/igt@gem_ctx_create@basic-files.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-tgl-u2/igt@gem_ctx_create@basic-files.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - {fi-tgl-u}:         NOTRUN -> [SKIP][12]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-tgl-u/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_pm_rpm@basic-rte:
    - {fi-tgl-u}:         NOTRUN -> [FAIL][13]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-tgl-u/igt@i915_pm_rpm@basic-rte.html

  
Known issues
------------

  Here are the changes found in Patchwork_14395 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_reset:
    - fi-icl-u2:          [PASS][14] -> [INCOMPLETE][15] ([fdo#107713])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-icl-u2/igt@i915_selftest@live_reset.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-icl-u2/igt@i915_selftest@live_reset.html

  * igt@i915_selftest@live_workarounds:
    - fi-bsw-kefka:       [PASS][16] -> [DMESG-WARN][17] ([fdo#111373])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-bsw-kefka/igt@i915_selftest@live_workarounds.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-bsw-kefka/igt@i915_selftest@live_workarounds.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [PASS][18] -> [FAIL][19] ([fdo#111407])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@prime_busy@basic-before-default:
    - fi-icl-u3:          [PASS][20] -> [DMESG-WARN][21] ([fdo#107724])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-icl-u3/igt@prime_busy@basic-before-default.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-icl-u3/igt@prime_busy@basic-before-default.html

  
#### Possible fixes ####

  * igt@gem_close_race@basic-process:
    - fi-icl-u3:          [DMESG-WARN][22] ([fdo#107724]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-icl-u3/igt@gem_close_race@basic-process.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-icl-u3/igt@gem_close_race@basic-process.html

  * igt@gem_ctx_switch@legacy-render:
    - fi-bxt-dsi:         [INCOMPLETE][24] ([fdo#103927] / [fdo#111381]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-bxt-dsi/igt@gem_ctx_switch@legacy-render.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-bxt-dsi/igt@gem_ctx_switch@legacy-render.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       [INCOMPLETE][26] ([fdo#107718]) -> [PASS][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_sync@basic-store-all:
    - {fi-tgl-u}:         [INCOMPLETE][28] ([fdo#111647]) -> [PASS][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-tgl-u/igt@gem_sync@basic-store-all.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-tgl-u/igt@gem_sync@basic-store-all.html

  * igt@i915_selftest@live_hangcheck:
    - fi-icl-u3:          [INCOMPLETE][30] ([fdo#107713] / [fdo#108569]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-icl-u3/igt@i915_selftest@live_hangcheck.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-icl-u3/igt@i915_selftest@live_hangcheck.html

  
#### Warnings ####

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-cml-u2:          [FAIL][32] ([fdo#108767] / [fdo#109380]) -> [FAIL][33] ([fdo#111624])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-cml-u2/igt@kms_chamelium@common-hpd-after-suspend.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-cml-u2/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-cml-u2:          [FAIL][34] ([fdo#108767]) -> [FAIL][35] ([fdo#111624]) +7 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-cml-u2/igt@kms_chamelium@dp-hpd-fast.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-cml-u2/igt@kms_chamelium@dp-hpd-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108767]: https://bugs.freedesktop.org/show_bug.cgi?id=108767
  [fdo#109380]: https://bugs.freedesktop.org/show_bug.cgi?id=109380
  [fdo#111373]: https://bugs.freedesktop.org/show_bug.cgi?id=111373
  [fdo#111381]: https://bugs.freedesktop.org/show_bug.cgi?id=111381
  [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
  [fdo#111624]: https://bugs.freedesktop.org/show_bug.cgi?id=111624
  [fdo#111647]: https://bugs.freedesktop.org/show_bug.cgi?id=111647


Participating hosts (54 -> 47)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-icl-y fi-byt-clapper 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6887 -> Patchwork_14395

  CI-20190529: 20190529
  CI_DRM_6887: a692cb67e2b19f95f6a0f543f7ad15140fe46c30 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5178: efb4539494d94f03374874d3b61bd04ef3802aaa @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14395: 09b68c87eb77318b14f9c2875ce5a54fdac79156 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

09b68c87eb77 drm/i915/gt: Only unwedge if we can reset first

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for drm/i915/gt: Only unwedge if we can reset first
  2019-09-13  9:35 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2019-09-13 10:06   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2019-09-13 10:06 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Quoting Patchwork (2019-09-13 10:35:06)
> == Series Details ==
> 
> Series: drm/i915/gt: Only unwedge if we can reset first
> URL   : https://patchwork.freedesktop.org/series/66637/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_6887 -> Patchwork_14395
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_14395 absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_14395, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_14395:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@i915_module_load@reload-with-fault-injection:
>     - fi-blb-e6850:       NOTRUN -> [DMESG-WARN][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-blb-e6850/igt@i915_module_load@reload-with-fault-injection.html
>     - fi-bwr-2160:        [PASS][2] -> [DMESG-WARN][3]
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-bwr-2160/igt@i915_module_load@reload-with-fault-injection.html
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-bwr-2160/igt@i915_module_load@reload-with-fault-injection.html
>     - fi-pnv-d510:        [PASS][4] -> [DMESG-WARN][5]
>    [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-pnv-d510/igt@i915_module_load@reload-with-fault-injection.html
>    [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-pnv-d510/igt@i915_module_load@reload-with-fault-injection.html
>     - fi-gdg-551:         [PASS][6] -> [DMESG-WARN][7]
>    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6887/fi-gdg-551/igt@i915_module_load@reload-with-fault-injection.html
>    [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14395/fi-gdg-551/igt@i915_module_load@reload-with-fault-injection.html

How to ruin the party!
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/gt: Only unwedge if we can reset first
  2019-09-13  7:47 [PATCH] drm/i915/gt: Only unwedge if we can reset first Chris Wilson
  2019-09-13  9:35 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2019-09-13 16:03 ` Ville Syrjälä
  2019-09-13 16:13   ` Chris Wilson
  1 sibling, 1 reply; 7+ messages in thread
From: Ville Syrjälä @ 2019-09-13 16:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Sep 13, 2019 at 08:47:20AM +0100, Chris Wilson wrote:
> Unwedging the GPU requires a successful GPU reset before we restore the
> default submission, or else we may see residual context switch events
> that we were not expecting.
> 
> v2: Pull in the special-case reset_clobbers_display, and explain why it
> should be safe in the context of unwedging.
> 
> Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 30 ++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index ee52947eb31d..d3b1cdafd4c2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -7,6 +7,7 @@
>  #include <linux/sched/mm.h>
>  #include <linux/stop_machine.h>
>  
> +#include "display/intel_display.h"
>  #include "display/intel_display_types.h"
>  #include "display/intel_overlay.h"
>  
> @@ -729,6 +730,28 @@ static void nop_submit_request(struct i915_request *request)
>  	intel_engine_queue_breadcrumbs(engine);
>  }
>  
> +static bool reset_clobbers_display(struct drm_i915_private *i915)
> +{
> +	struct intel_crtc *crtc;
> +
> +	if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
> +		return false;
> +
> +	/*
> +	 * While this appears racy, we should only be inspecting the display
> +	 * state at runtime from inside a GPU reset, which will be serialized
> +	 * with modesets on affected machines. For a full device reset,
> +	 * we should already have cleared the active CRTC state in
> +	 * intel_prepare_reset().
> +	 */
> +	for_each_intel_crtc(&i915->drm, crtc) {
> +		if (crtc->active)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  static void __intel_gt_set_wedged(struct intel_gt *gt)
>  {
>  	struct intel_engine_cs *engine;
> @@ -793,6 +816,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
>  	struct intel_gt_timelines *timelines = &gt->timelines;
>  	struct intel_timeline *tl;
>  	unsigned long flags;
> +	bool ok;
>  
>  	if (!test_bit(I915_WEDGED, &gt->reset.flags))
>  		return true;
> @@ -838,7 +862,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
>  	}
>  	spin_unlock_irqrestore(&timelines->lock, flags);
>  
> -	intel_gt_sanitize(gt, false);
> +	ok = false;
> +	if (!reset_clobbers_display(gt->i915))
> +		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;

/me re-reading a lot of the reset code..

I'm rather confused about the whole reset flow. We now do a reset here,
but then we still do another one later on? Except for i915_gem_sanitize(),
which gets called during probe and resume so only does the single reset
I guess.

Hopefully we can't be marked as wedged during probe because I think
this gets called before crtc->active is populated so we'd just do the
reset anyway.

As for the resume cases, I think the display should be off already
when this gets called.

So I guess I'm not really sure what this check is meant to do for us.

> +	if (!ok)
> +		return false;
>  
>  	/*
>  	 * Undo nop_submit_request. We prevent all new i915 requests from
> -- 
> 2.23.0

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/gt: Only unwedge if we can reset first
  2019-09-13 16:03 ` [PATCH] " Ville Syrjälä
@ 2019-09-13 16:13   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2019-09-13 16:13 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Quoting Ville Syrjälä (2019-09-13 17:03:34)
> On Fri, Sep 13, 2019 at 08:47:20AM +0100, Chris Wilson wrote:
> > Unwedging the GPU requires a successful GPU reset before we restore the
> > default submission, or else we may see residual context switch events
> > that we were not expecting.
> > 
> > v2: Pull in the special-case reset_clobbers_display, and explain why it
> > should be safe in the context of unwedging.
> > 
> > Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_reset.c | 30 ++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index ee52947eb31d..d3b1cdafd4c2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/sched/mm.h>
> >  #include <linux/stop_machine.h>
> >  
> > +#include "display/intel_display.h"
> >  #include "display/intel_display_types.h"
> >  #include "display/intel_overlay.h"
> >  
> > @@ -729,6 +730,28 @@ static void nop_submit_request(struct i915_request *request)
> >       intel_engine_queue_breadcrumbs(engine);
> >  }
> >  
> > +static bool reset_clobbers_display(struct drm_i915_private *i915)
> > +{
> > +     struct intel_crtc *crtc;
> > +
> > +     if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
> > +             return false;
> > +
> > +     /*
> > +      * While this appears racy, we should only be inspecting the display
> > +      * state at runtime from inside a GPU reset, which will be serialized
> > +      * with modesets on affected machines. For a full device reset,
> > +      * we should already have cleared the active CRTC state in
> > +      * intel_prepare_reset().
> > +      */
> > +     for_each_intel_crtc(&i915->drm, crtc) {
> > +             if (crtc->active)
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  static void __intel_gt_set_wedged(struct intel_gt *gt)
> >  {
> >       struct intel_engine_cs *engine;
> > @@ -793,6 +816,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
> >       struct intel_gt_timelines *timelines = &gt->timelines;
> >       struct intel_timeline *tl;
> >       unsigned long flags;
> > +     bool ok;
> >  
> >       if (!test_bit(I915_WEDGED, &gt->reset.flags))
> >               return true;
> > @@ -838,7 +862,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
> >       }
> >       spin_unlock_irqrestore(&timelines->lock, flags);
> >  
> > -     intel_gt_sanitize(gt, false);
> > +     ok = false;
> > +     if (!reset_clobbers_display(gt->i915))
> > +             ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
> 
> /me re-reading a lot of the reset code..
> 
> I'm rather confused about the whole reset flow. We now do a reset here,
> but then we still do another one later on? Except for i915_gem_sanitize(),
> which gets called during probe and resume so only does the single reset
> I guess.

The idea was to keep wedging as a separate process (since that gives
nice symmetry because set-wedge unset-wedge). We can merge it so
that we only do the single reset, but we still should keep the request
flushing in place, or we need to play games with rcu barriers again.
 
> Hopefully we can't be marked as wedged during probe because I think
> this gets called before crtc->active is populated so we'd just do the
> reset anyway.

I was hoping the crtcs would be off at that point... They weren't.

> As for the resume cases, I think the display should be off already
> when this gets called.
> 
> So I guess I'm not really sure what this check is meant to do for us.

I gave up,

-       intel_gt_sanitize(gt, false);
+       /* We must reset pending GPU events before restoring our submission */
+       ok = !HAS_EXECLISTS(gt->i915);
+       if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+               ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+       if (!ok)
+               return false;

-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915/gt: Only unwedge if we can reset first
  2019-09-27 16:03 Chris Wilson
@ 2019-10-01  8:23 ` Janusz Krzysztofik
  0 siblings, 0 replies; 7+ messages in thread
From: Janusz Krzysztofik @ 2019-10-01  8:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

On Friday, September 27, 2019 6:03:35 PM CEST Chris Wilson wrote:
> Unwedging the GPU requires a successful GPU reset before we restore the
> default submission, or else we may see residual context switch events
> that we were not expecting.
> 
> v2: Pull in the special-case reset_clobbers_display, and explain why it
> should be safe in the context of unwedging.
> 
> v3: Just forget all about resets before unwedging if it will clobber the
> display; risk it all.

AFAICU, the risk we take is, when running on hardware with support for 
execlists, if reset clobbers the display we never unwedge, even if maybe we 
could.  On the other hand, when running on hardware which doesn't support 
execlists, we optimistically unwedge unless we can try the reset and it fails.

Assuming the issue we are trying to fix here is specific to execlists mode, 
that seems to be a safe choice.

Thanks,
Janusz

> Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v1
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/
gt/intel_reset.c
> index d08226f5bea5..11781a626f75 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -807,6 +807,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
>  	struct intel_gt_timelines *timelines = &gt->timelines;
>  	struct intel_timeline *tl;
>  	unsigned long flags;
> +	bool ok;
>  
>  	if (!test_bit(I915_WEDGED, &gt->reset.flags))
>  		return true;
> @@ -853,7 +854,12 @@ static bool __intel_gt_unset_wedged(struct intel_gt 
*gt)
>  	}
>  	spin_unlock_irqrestore(&timelines->lock, flags);
>  
> -	intel_gt_sanitize(gt, false);
> +	/* We must reset pending GPU events before restoring our submission 
*/
> +	ok = !HAS_EXECLISTS(gt->i915);
> +	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
> +		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
> +	if (!ok)
> +		return false;
>  
>  	/*
>  	 * Undo nop_submit_request. We prevent all new i915 requests from
> 




_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] drm/i915/gt: Only unwedge if we can reset first
@ 2019-09-27 16:03 Chris Wilson
  2019-10-01  8:23 ` Janusz Krzysztofik
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2019-09-27 16:03 UTC (permalink / raw)
  To: intel-gfx

Unwedging the GPU requires a successful GPU reset before we restore the
default submission, or else we may see residual context switch events
that we were not expecting.

v2: Pull in the special-case reset_clobbers_display, and explain why it
should be safe in the context of unwedging.

v3: Just forget all about resets before unwedging if it will clobber the
display; risk it all.

Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v1
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index d08226f5bea5..11781a626f75 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -807,6 +807,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl;
 	unsigned long flags;
+	bool ok;
 
 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		return true;
@@ -853,7 +854,12 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	}
 	spin_unlock_irqrestore(&timelines->lock, flags);
 
-	intel_gt_sanitize(gt, false);
+	/* We must reset pending GPU events before restoring our submission */
+	ok = !HAS_EXECLISTS(gt->i915);
+	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+	if (!ok)
+		return false;
 
 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from
-- 
2.23.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-10-01  8:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13  7:47 [PATCH] drm/i915/gt: Only unwedge if we can reset first Chris Wilson
2019-09-13  9:35 ` ✗ Fi.CI.BAT: failure for " Patchwork
2019-09-13 10:06   ` Chris Wilson
2019-09-13 16:03 ` [PATCH] " Ville Syrjälä
2019-09-13 16:13   ` Chris Wilson
2019-09-27 16:03 Chris Wilson
2019-10-01  8:23 ` Janusz Krzysztofik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.