All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Don't unwedge if reset is disabled
@ 2019-09-05  9:09 Janusz Krzysztofik
  2019-09-05  9:43 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Janusz Krzysztofik @ 2019-09-05  9:09 UTC (permalink / raw)
  To: Chris Wilson, Daniele Ceraolo Spurio, Tvrtko Ursulin; +Cc: intel-gfx

When trying to reset a device with reset capability disabled or not
supported while rings are full of requests, it has been observed when
running in execlists submission mode that command stream buffer tail
tends to be incremented by apparently still running GPU regardless of
all requests being already cancelled and command stream buffer pointers
reset.  As a result, kernel panic on NULL pointer dereference occurs
when a trace_ports() helper is called with command stream buffer tail
incremented but request pointers being NULL during final
__intel_gt_set_wedged() operation called from intel_gt_reset().

Skip actual reset procedure if reset is disabled or not supported.

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index b9d84d52e986..d75da124e280 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -932,25 +932,35 @@ void intel_gt_reset(struct intel_gt *gt,
 	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
 	mutex_lock(&gt->reset.mutex);
 
-	/* Clear any previous failed attempts at recovery. Time to try again. */
-	if (!__intel_gt_unset_wedged(gt))
-		goto unlock;
-
 	if (reason)
 		dev_notice(gt->i915->drm.dev,
 			   "Resetting chip for %s\n", reason);
-	atomic_inc(&gt->i915->gpu_error.reset_count);
-
-	awake = reset_prepare(gt);
 
 	if (!intel_has_gpu_reset(gt->i915)) {
 		if (i915_modparams.reset)
 			dev_err(gt->i915->drm.dev, "GPU reset not supported\n");
 		else
 			DRM_DEBUG_DRIVER("GPU reset disabled\n");
-		goto error;
+
+		/*
+		 * Don't unwedge if reset is disabled or not supported
+		 * because we can't guarantee what the hardware status is.
+		 */
+		if (intel_gt_is_wedged(gt))
+			goto unlock;
 	}
 
+	/* Clear any previous failed attempts at recovery. Time to try again. */
+	if (!__intel_gt_unset_wedged(gt))
+		goto unlock;
+
+	atomic_inc(&gt->i915->gpu_error.reset_count);
+
+	awake = reset_prepare(gt);
+
+	if (!intel_has_gpu_reset(gt->i915))
+		goto error;
+
 	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
 		intel_runtime_pm_disable_interrupts(gt->i915);
 
-- 
2.21.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Don't unwedge if reset is disabled
  2019-09-05  9:09 [PATCH] drm/i915: Don't unwedge if reset is disabled Janusz Krzysztofik
@ 2019-09-05  9:43 ` Patchwork
  2019-09-05 10:50 ` ✓ Fi.CI.IGT: " Patchwork
  2019-09-06 22:28 ` [PATCH] " Daniele Ceraolo Spurio
  2 siblings, 0 replies; 9+ messages in thread
From: Patchwork @ 2019-09-05  9:43 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Don't unwedge if reset is disabled
URL   : https://patchwork.freedesktop.org/series/66264/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6838 -> Patchwork_14281
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/

Known issues
------------

  Here are the changes found in Patchwork_14281 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       [PASS][1] -> [INCOMPLETE][2] ([fdo#107718])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-skl-lmem:        [PASS][3] -> [INCOMPLETE][4] ([fdo#104108])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-skl-lmem/igt@gem_exec_suspend@basic-s4-devices.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-skl-lmem/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@gem_flink_basic@bad-flink:
    - fi-icl-u3:          [PASS][5] -> [DMESG-WARN][6] ([fdo#107724])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-icl-u3/igt@gem_flink_basic@bad-flink.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-icl-u3/igt@gem_flink_basic@bad-flink.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-icl-u2:          [PASS][7] -> [FAIL][8] ([fdo#109483] / [fdo#109635 ])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-icl-u2/igt@kms_chamelium@dp-edid-read.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-icl-u2/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [PASS][9] -> [FAIL][10] ([fdo#111407])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  
#### Possible fixes ####

  * igt@gem_mmap_gtt@basic:
    - fi-glk-dsi:         [INCOMPLETE][11] ([fdo#103359] / [k.org#198133]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-glk-dsi/igt@gem_mmap_gtt@basic.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-glk-dsi/igt@gem_mmap_gtt@basic.html

  * igt@kms_busy@basic-flip-c:
    - fi-skl-6770hq:      [SKIP][13] ([fdo#109271] / [fdo#109278]) -> [PASS][14] +2 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-kbl-7500u:       [WARN][15] ([fdo#109483]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-kbl-7500u/igt@kms_chamelium@dp-edid-read.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-kbl-7500u/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_flip@basic-flip-vs-dpms:
    - fi-skl-6770hq:      [SKIP][17] ([fdo#109271]) -> [PASS][18] +23 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       [DMESG-WARN][19] ([fdo#102614]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html

  * igt@vgem_basic@debugfs:
    - fi-icl-u3:          [DMESG-WARN][21] ([fdo#107724]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/fi-icl-u3/igt@vgem_basic@debugfs.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/fi-icl-u3/igt@vgem_basic@debugfs.html

  
  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109483]: https://bugs.freedesktop.org/show_bug.cgi?id=109483
  [fdo#109635 ]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 
  [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (53 -> 46)
------------------------------

  Additional (1): fi-pnv-d510 
  Missing    (8): fi-ilk-m540 fi-tgl-u fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6838 -> Patchwork_14281

  CI-20190529: 20190529
  CI_DRM_6838: 8e907b7591b620dba402c7ada493a31ca0320c99 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5171: 1911564805fe454919e8a5846534a0c1ef376a33 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14281: 24eb771e9f678da87d98f27586f093ee97be08f2 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

24eb771e9f67 drm/i915: Don't unwedge if reset is disabled

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915: Don't unwedge if reset is disabled
  2019-09-05  9:09 [PATCH] drm/i915: Don't unwedge if reset is disabled Janusz Krzysztofik
  2019-09-05  9:43 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2019-09-05 10:50 ` Patchwork
  2019-09-06 22:28 ` [PATCH] " Daniele Ceraolo Spurio
  2 siblings, 0 replies; 9+ messages in thread
From: Patchwork @ 2019-09-05 10:50 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Don't unwedge if reset is disabled
URL   : https://patchwork.freedesktop.org/series/66264/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6838_full -> Patchwork_14281_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_14281_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_schedule@in-order-bsd:
    - shard-iclb:         [PASS][1] -> [SKIP][2] ([fdo#111325]) +6 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb5/igt@gem_exec_schedule@in-order-bsd.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb2/igt@gem_exec_schedule@in-order-bsd.html

  * igt@gem_exec_schedule@pi-ringfull-blt:
    - shard-apl:          [PASS][3] -> [FAIL][4] ([fdo#111547])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl4/igt@gem_exec_schedule@pi-ringfull-blt.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl8/igt@gem_exec_schedule@pi-ringfull-blt.html

  * igt@gem_exec_schedule@promotion-bsd1:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([fdo#109276]) +11 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb4/igt@gem_exec_schedule@promotion-bsd1.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb8/igt@gem_exec_schedule@promotion-bsd1.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - shard-apl:          [PASS][7] -> [FAIL][8] ([fdo#111550])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl8/igt@gem_exec_suspend@basic-s4-devices.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl8/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@gem_mocs_settings@mocs-rc6-ctx-dirty-render:
    - shard-apl:          [PASS][9] -> [SKIP][10] ([fdo#109271]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl8/igt@gem_mocs_settings@mocs-rc6-ctx-dirty-render.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl8/igt@gem_mocs_settings@mocs-rc6-ctx-dirty-render.html

  * igt@gem_softpin@softpin:
    - shard-apl:          [PASS][11] -> [INCOMPLETE][12] ([fdo#103927])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl1/igt@gem_softpin@softpin.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl2/igt@gem_softpin@softpin.html

  * igt@i915_pm_rc6_residency@rc6-accuracy:
    - shard-kbl:          [PASS][13] -> [SKIP][14] ([fdo#109271])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-kbl6/igt@i915_pm_rc6_residency@rc6-accuracy.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-kbl3/igt@i915_pm_rc6_residency@rc6-accuracy.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress:
    - shard-hsw:          [PASS][15] -> [FAIL][16] ([fdo#111548]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw5/igt@i915_pm_rpm@modeset-non-lpsp-stress.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw5/igt@i915_pm_rpm@modeset-non-lpsp-stress.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-apl:          [PASS][17] -> [DMESG-WARN][18] ([fdo#108566]) +5 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl1/igt@kms_flip@flip-vs-suspend-interruptible.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl2/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render:
    - shard-iclb:         [PASS][19] -> [FAIL][20] ([fdo#103167]) +2 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb5/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt:
    - shard-skl:          [PASS][21] -> [FAIL][22] ([fdo#103167])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl8/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         [PASS][23] -> [SKIP][24] ([fdo#109441]) +1 similar issue
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb7/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_setmode@basic:
    - shard-hsw:          [PASS][25] -> [FAIL][26] ([fdo#99912])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw7/igt@kms_setmode@basic.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw4/igt@kms_setmode@basic.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-apl:          [PASS][27] -> [FAIL][28] ([fdo#103375])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl8/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl8/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@kms_vblank@pipe-b-ts-continuation-suspend:
    - shard-hsw:          [PASS][29] -> [FAIL][30] ([fdo#103375]) +1 similar issue
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw5/igt@kms_vblank@pipe-b-ts-continuation-suspend.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw5/igt@kms_vblank@pipe-b-ts-continuation-suspend.html

  * igt@perf_pmu@other-read-0:
    - shard-apl:          [PASS][31] -> [FAIL][32] ([fdo#111545]) +7 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl8/igt@perf_pmu@other-read-0.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl8/igt@perf_pmu@other-read-0.html

  
#### Possible fixes ####

  * igt@gem_ctx_shared@exec-single-timeline-bsd:
    - shard-iclb:         [SKIP][33] ([fdo#110841]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb1/igt@gem_ctx_shared@exec-single-timeline-bsd.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb6/igt@gem_ctx_shared@exec-single-timeline-bsd.html

  * igt@gem_exec_reloc@basic-gtt-active:
    - shard-skl:          [DMESG-WARN][35] ([fdo#106107]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl5/igt@gem_exec_reloc@basic-gtt-active.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl5/igt@gem_exec_reloc@basic-gtt-active.html

  * igt@gem_exec_schedule@independent-bsd2:
    - shard-iclb:         [SKIP][37] ([fdo#109276]) -> [PASS][38] +15 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb7/igt@gem_exec_schedule@independent-bsd2.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb2/igt@gem_exec_schedule@independent-bsd2.html

  * igt@gem_exec_schedule@preempt-other-chain-bsd:
    - shard-iclb:         [SKIP][39] ([fdo#111325]) -> [PASS][40] +4 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb2/igt@gem_exec_schedule@preempt-other-chain-bsd.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb7/igt@gem_exec_schedule@preempt-other-chain-bsd.html

  * igt@gem_request_retire@retire-vma-not-inactive:
    - shard-apl:          [INCOMPLETE][41] ([fdo#103927]) -> [PASS][42] +3 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl6/igt@gem_request_retire@retire-vma-not-inactive.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl5/igt@gem_request_retire@retire-vma-not-inactive.html

  * igt@gem_softpin@noreloc-s3:
    - shard-skl:          [INCOMPLETE][43] ([fdo#104108] / [fdo#107773]) -> [PASS][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl1/igt@gem_softpin@noreloc-s3.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl3/igt@gem_softpin@noreloc-s3.html

  * igt@i915_pm_rc6_residency@rc6-accuracy:
    - shard-snb:          [SKIP][45] ([fdo#109271]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-snb4/igt@i915_pm_rc6_residency@rc6-accuracy.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-snb6/igt@i915_pm_rc6_residency@rc6-accuracy.html

  * igt@i915_pm_rpm@system-suspend:
    - shard-hsw:          [FAIL][47] ([fdo#111548]) -> [PASS][48] +4 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw2/igt@i915_pm_rpm@system-suspend.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw4/igt@i915_pm_rpm@system-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-skl:          [FAIL][49] ([fdo#105363]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-apl:          [DMESG-WARN][51] ([fdo#108566]) -> [PASS][52] +3 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-apl5/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-apl1/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite:
    - shard-iclb:         [FAIL][53] ([fdo#103167]) -> [PASS][54] +2 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb5/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-hsw:          [FAIL][55] ([fdo#103375]) -> [PASS][56] +6 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw2/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][57] ([fdo#108145] / [fdo#110403]) -> [PASS][58]
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl3/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_plane_cursor@pipe-c-viewport-size-256:
    - shard-iclb:         [INCOMPLETE][59] ([fdo#107713]) -> [PASS][60] +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb1/igt@kms_plane_cursor@pipe-c-viewport-size-256.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb5/igt@kms_plane_cursor@pipe-c-viewport-size-256.html

  * igt@kms_psr@psr2_sprite_mmap_gtt:
    - shard-iclb:         [SKIP][61] ([fdo#109441]) -> [PASS][62] +1 similar issue
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb7/igt@kms_psr@psr2_sprite_mmap_gtt.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb2/igt@kms_psr@psr2_sprite_mmap_gtt.html

  * igt@perf@blocking:
    - shard-skl:          [FAIL][63] ([fdo#110728]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-skl7/igt@perf@blocking.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-skl7/igt@perf@blocking.html

  
#### Warnings ####

  * igt@gem_ctx_isolation@vcs1-nonpriv:
    - shard-iclb:         [FAIL][65] ([fdo#111329]) -> [SKIP][66] ([fdo#109276])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb2/igt@gem_ctx_isolation@vcs1-nonpriv.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb6/igt@gem_ctx_isolation@vcs1-nonpriv.html

  * igt@gem_mocs_settings@mocs-rc6-bsd2:
    - shard-iclb:         [SKIP][67] ([fdo#109276]) -> [FAIL][68] ([fdo#111330])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-iclb8/igt@gem_mocs_settings@mocs-rc6-bsd2.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-iclb4/igt@gem_mocs_settings@mocs-rc6-bsd2.html

  * igt@i915_pm_rpm@gem-execbuf-stress-pc8:
    - shard-hsw:          [SKIP][69] ([fdo#109271]) -> [FAIL][70] ([fdo#111548]) +2 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw5/igt@i915_pm_rpm@gem-execbuf-stress-pc8.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw5/igt@i915_pm_rpm@gem-execbuf-stress-pc8.html

  * igt@i915_pm_rpm@modeset-lpsp:
    - shard-hsw:          [FAIL][71] ([fdo#111548]) -> [SKIP][72] ([fdo#109271])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6838/shard-hsw2/igt@i915_pm_rpm@modeset-lpsp.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/shard-hsw6/igt@i915_pm_rpm@modeset-lpsp.html

  
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#106107]: https://bugs.freedesktop.org/show_bug.cgi?id=106107
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107773]: https://bugs.freedesktop.org/show_bug.cgi?id=107773
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110403]: https://bugs.freedesktop.org/show_bug.cgi?id=110403
  [fdo#110728]: https://bugs.freedesktop.org/show_bug.cgi?id=110728
  [fdo#110841]: https://bugs.freedesktop.org/show_bug.cgi?id=110841
  [fdo#111325]: https://bugs.freedesktop.org/show_bug.cgi?id=111325
  [fdo#111329]: https://bugs.freedesktop.org/show_bug.cgi?id=111329
  [fdo#111330]: https://bugs.freedesktop.org/show_bug.cgi?id=111330
  [fdo#111545]: https://bugs.freedesktop.org/show_bug.cgi?id=111545
  [fdo#111547]: https://bugs.freedesktop.org/show_bug.cgi?id=111547
  [fdo#111548]: https://bugs.freedesktop.org/show_bug.cgi?id=111548
  [fdo#111550]: https://bugs.freedesktop.org/show_bug.cgi?id=111550
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6838 -> Patchwork_14281

  CI-20190529: 20190529
  CI_DRM_6838: 8e907b7591b620dba402c7ada493a31ca0320c99 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5171: 1911564805fe454919e8a5846534a0c1ef376a33 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14281: 24eb771e9f678da87d98f27586f093ee97be08f2 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14281/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-05  9:09 [PATCH] drm/i915: Don't unwedge if reset is disabled Janusz Krzysztofik
  2019-09-05  9:43 ` ✓ Fi.CI.BAT: success for " Patchwork
  2019-09-05 10:50 ` ✓ Fi.CI.IGT: " Patchwork
@ 2019-09-06 22:28 ` Daniele Ceraolo Spurio
  2019-09-07  8:39   ` Chris Wilson
  2 siblings, 1 reply; 9+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-09-06 22:28 UTC (permalink / raw)
  To: Janusz Krzysztofik, Chris Wilson, Tvrtko Ursulin; +Cc: intel-gfx



On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> When trying to reset a device with reset capability disabled or not
> supported while rings are full of requests, it has been observed when
> running in execlists submission mode that command stream buffer tail
> tends to be incremented by apparently still running GPU regardless of
> all requests being already cancelled and command stream buffer pointers
> reset.  As a result, kernel panic on NULL pointer dereference occurs
> when a trace_ports() helper is called with command stream buffer tail
> incremented but request pointers being NULL during final
> __intel_gt_set_wedged() operation called from intel_gt_reset().
> 
> Skip actual reset procedure if reset is disabled or not supported.

This last sentence is a bit confusing. You're not skipping the reset 
procedure, you're skipping the attempt of unwedging and resetting again 
after a reset & wedge already happened.

> 
> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 26 ++++++++++++++++++--------
>   1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index b9d84d52e986..d75da124e280 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -932,25 +932,35 @@ void intel_gt_reset(struct intel_gt *gt,
>   	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
>   	mutex_lock(&gt->reset.mutex);
>   
> -	/* Clear any previous failed attempts at recovery. Time to try again. */
> -	if (!__intel_gt_unset_wedged(gt))
> -		goto unlock;
> -

Since you're anyway checking the wedged status and reset support 
multiple times, wouldn't it have been better to just add a single check 
at the beginning? e.g.

	/* we can't recover a wedged GT without reset */
	if (!intel_has_gpu_reset(gt->i915) && intel_gt_is_wedged(gt))
		goto unlock;

Daniele

>   	if (reason)
>   		dev_notice(gt->i915->drm.dev,
>   			   "Resetting chip for %s\n", reason);
> -	atomic_inc(&gt->i915->gpu_error.reset_count);
> -
> -	awake = reset_prepare(gt);
>   
>   	if (!intel_has_gpu_reset(gt->i915)) {
>   		if (i915_modparams.reset)
>   			dev_err(gt->i915->drm.dev, "GPU reset not supported\n");
>   		else
>   			DRM_DEBUG_DRIVER("GPU reset disabled\n");
> -		goto error;
> +
> +		/*
> +		 * Don't unwedge if reset is disabled or not supported
> +		 * because we can't guarantee what the hardware status is.
> +		 */
> +		if (intel_gt_is_wedged(gt))
> +			goto unlock;
>   	}
>   
> +	/* Clear any previous failed attempts at recovery. Time to try again. */
> +	if (!__intel_gt_unset_wedged(gt))
> +		goto unlock;
> +
> +	atomic_inc(&gt->i915->gpu_error.reset_count);
> +
> +	awake = reset_prepare(gt);
> +
> +	if (!intel_has_gpu_reset(gt->i915))
> +		goto error;
> +
>   	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
>   		intel_runtime_pm_disable_interrupts(gt->i915);
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-06 22:28 ` [PATCH] " Daniele Ceraolo Spurio
@ 2019-09-07  8:39   ` Chris Wilson
  2019-09-09 16:27     ` Daniele Ceraolo Spurio
  2019-09-09 21:48     ` Chris Wilson
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2019-09-07  8:39 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Janusz Krzysztofik, Tvrtko Ursulin; +Cc: intel-gfx

Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> 
> 
> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> > When trying to reset a device with reset capability disabled or not
> > supported while rings are full of requests, it has been observed when
> > running in execlists submission mode that command stream buffer tail
> > tends to be incremented by apparently still running GPU regardless of
> > all requests being already cancelled and command stream buffer pointers
> > reset.  As a result, kernel panic on NULL pointer dereference occurs
> > when a trace_ports() helper is called with command stream buffer tail
> > incremented but request pointers being NULL during final
> > __intel_gt_set_wedged() operation called from intel_gt_reset().
> > 
> > Skip actual reset procedure if reset is disabled or not supported.
> 
> This last sentence is a bit confusing. You're not skipping the reset 
> procedure, you're skipping the attempt of unwedging and resetting again 
> after a reset & wedge already happened.

Loss of email over the last week, so jumping in at the end. My gut
response is that this is still just papering over the bug, as what you
say above makes no sense.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-07  8:39   ` Chris Wilson
@ 2019-09-09 16:27     ` Daniele Ceraolo Spurio
  2019-09-09 21:45       ` Chris Wilson
  2019-09-09 21:48     ` Chris Wilson
  1 sibling, 1 reply; 9+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-09-09 16:27 UTC (permalink / raw)
  To: Chris Wilson, Janusz Krzysztofik, Tvrtko Ursulin; +Cc: intel-gfx



On 9/7/19 1:39 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
>>
>>
>> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
>>> When trying to reset a device with reset capability disabled or not
>>> supported while rings are full of requests, it has been observed when
>>> running in execlists submission mode that command stream buffer tail
>>> tends to be incremented by apparently still running GPU regardless of
>>> all requests being already cancelled and command stream buffer pointers
>>> reset.  As a result, kernel panic on NULL pointer dereference occurs
>>> when a trace_ports() helper is called with command stream buffer tail
>>> incremented but request pointers being NULL during final
>>> __intel_gt_set_wedged() operation called from intel_gt_reset().
>>>
>>> Skip actual reset procedure if reset is disabled or not supported.
>>
>> This last sentence is a bit confusing. You're not skipping the reset
>> procedure, you're skipping the attempt of unwedging and resetting again
>> after a reset & wedge already happened.
> 
> Loss of email over the last week, so jumping in at the end. My gut
> response is that this is still just papering over the bug, as what you
> say above makes no sense.
> -Chris
> 

The issue here is that if we don't reset the HW when we wedge, whatever 
was running on the engines might complete at any point after that, which 
generates an unexpected post-wedge CSB event that we don't handle 
gracefully when we unwedge. The CSB event might arrive at any time (even 
after the unwedge) or cause weird behavior on the first re-submission, 
so trying to handle it is not worth the effort IMO since having reset 
disabled is a debug-only use-case.

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-09 16:27     ` Daniele Ceraolo Spurio
@ 2019-09-09 21:45       ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2019-09-09 21:45 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Janusz Krzysztofik, Tvrtko Ursulin; +Cc: intel-gfx

Quoting Daniele Ceraolo Spurio (2019-09-09 17:27:47)
> 
> 
> On 9/7/19 1:39 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> >>
> >>
> >> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> >>> When trying to reset a device with reset capability disabled or not
> >>> supported while rings are full of requests, it has been observed when
> >>> running in execlists submission mode that command stream buffer tail
> >>> tends to be incremented by apparently still running GPU regardless of
> >>> all requests being already cancelled and command stream buffer pointers
> >>> reset.  As a result, kernel panic on NULL pointer dereference occurs
> >>> when a trace_ports() helper is called with command stream buffer tail
> >>> incremented but request pointers being NULL during final
> >>> __intel_gt_set_wedged() operation called from intel_gt_reset().
> >>>
> >>> Skip actual reset procedure if reset is disabled or not supported.
> >>
> >> This last sentence is a bit confusing. You're not skipping the reset
> >> procedure, you're skipping the attempt of unwedging and resetting again
> >> after a reset & wedge already happened.
> > 
> > Loss of email over the last week, so jumping in at the end. My gut
> > response is that this is still just papering over the bug, as what you
> > say above makes no sense.
> > -Chris
> > 
> 
> The issue here is that if we don't reset the HW when we wedge, whatever 
> was running on the engines might complete at any point after that, which 
> generates an unexpected post-wedge CSB event that we don't handle 
> gracefully when we unwedge.

Indeed, until we call reset_default_submission all those unexpected
interrupts are redirected to the nop_submission_tasklet.

I think it should be more along the lines of

 	struct intel_timeline *tl;
 	unsigned long flags;
+	bool ok;

 	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		return true;
@@ -838,7 +839,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	}
 	spin_unlock_irqrestore(&timelines->lock, flags);

-	intel_gt_sanitize(gt, false);
+	ok = false;
+	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+	if (!ok)
+		return false;

 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from

For bonus points, gpu_reset_clobbers_display should take into account
whether the display is active.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-07  8:39   ` Chris Wilson
  2019-09-09 16:27     ` Daniele Ceraolo Spurio
@ 2019-09-09 21:48     ` Chris Wilson
  2019-09-13 14:20       ` Janusz Krzysztofik
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2019-09-09 21:48 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Janusz Krzysztofik, Tvrtko Ursulin; +Cc: intel-gfx

Quoting Chris Wilson (2019-09-07 09:39:52)
> Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> > 
> > 
> > On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> > > When trying to reset a device with reset capability disabled or not
> > > supported while rings are full of requests, it has been observed when
> > > running in execlists submission mode that command stream buffer tail
> > > tends to be incremented by apparently still running GPU regardless of
> > > all requests being already cancelled and command stream buffer pointers
> > > reset.  As a result, kernel panic on NULL pointer dereference occurs
> > > when a trace_ports() helper is called with command stream buffer tail
> > > incremented but request pointers being NULL during final
> > > __intel_gt_set_wedged() operation called from intel_gt_reset().
> > > 
> > > Skip actual reset procedure if reset is disabled or not supported.
> > 
> > This last sentence is a bit confusing. You're not skipping the reset 
> > procedure, you're skipping the attempt of unwedging and resetting again 
> > after a reset & wedge already happened.
> 
> Loss of email over the last week, so jumping in at the end. My gut
> response is that this is still just papering over the bug, as what you
> say above makes no sense.

So my gut response was to the run on sentence, when all you needed to
say that without a successful reset prior to calling
reset_default_submission, the engine may still generate CS events out of
the blue. And I think the patch should be written to require the
successful reset.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Don't unwedge if reset is disabled
  2019-09-09 21:48     ` Chris Wilson
@ 2019-09-13 14:20       ` Janusz Krzysztofik
  0 siblings, 0 replies; 9+ messages in thread
From: Janusz Krzysztofik @ 2019-09-13 14:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Monday, September 9, 2019 11:48:42 PM CEST Chris Wilson wrote:
> Quoting Chris Wilson (2019-09-07 09:39:52)
> > Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05)
> > > 
> > > 
> > > On 9/5/19 2:09 AM, Janusz Krzysztofik wrote:
> > > > When trying to reset a device with reset capability disabled or not
> > > > supported while rings are full of requests, it has been observed when
> > > > running in execlists submission mode that command stream buffer tail
> > > > tends to be incremented by apparently still running GPU regardless of
> > > > all requests being already cancelled and command stream buffer 
pointers
> > > > reset.  As a result, kernel panic on NULL pointer dereference occurs
> > > > when a trace_ports() helper is called with command stream buffer tail
> > > > incremented but request pointers being NULL during final
> > > > __intel_gt_set_wedged() operation called from intel_gt_reset().
> > > > 
> > > > Skip actual reset procedure if reset is disabled or not supported.
> > > 
> > > This last sentence is a bit confusing. You're not skipping the reset 
> > > procedure, you're skipping the attempt of unwedging and resetting again 
> > > after a reset & wedge already happened.
> > 
> > Loss of email over the last week, so jumping in at the end. My gut
> > response is that this is still just papering over the bug, as what you
> > say above makes no sense.
> 
> So my gut response was to the run on sentence, when all you needed to
> say that without a successful reset prior to calling
> reset_default_submission, the engine may still generate CS events out of
> the blue. And I think the patch should be written to require the
> successful reset.

You are right, successful reset seems the only safe protection.

But anyway, while digging deeper waiting for your clarification of that gut 
respone ;-) , I've discovered that symptoms from which the issue can be 
predicted may be sometimes observed during reset_prepere() as failing 
intel_engine_stop_cs().  Checking for that failure alone may be too weak as it 
can probably happen to succeed regardless of the uncertain hardware status, 
but anyway, what do you think about modifying reset_prepare() so it may fail 
with an error propagated from functions it calls, then calling reset_prepare() 
at the beginning of intel_gt_reset() and skiping over 
__intel_gt_unset_wedgede() and further steps (do_reset(), ..., reset_finish()) 
if reset_prepare() fails?  Wouldn't that be a useful additional layer of 
protection?

If you think the idea is worth of being considered, please have a look at my 
first attempt sent to trybot already before your explanation arrived:
https://patchwork.freedesktop.org/patch/329840/?series=66447&rev=1
(don't complain on its commit message making no sense, please ;-) ).

Thanks,
Janusz

> -Chris
> 




_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-09-13 14:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-05  9:09 [PATCH] drm/i915: Don't unwedge if reset is disabled Janusz Krzysztofik
2019-09-05  9:43 ` ✓ Fi.CI.BAT: success for " Patchwork
2019-09-05 10:50 ` ✓ Fi.CI.IGT: " Patchwork
2019-09-06 22:28 ` [PATCH] " Daniele Ceraolo Spurio
2019-09-07  8:39   ` Chris Wilson
2019-09-09 16:27     ` Daniele Ceraolo Spurio
2019-09-09 21:45       ` Chris Wilson
2019-09-09 21:48     ` Chris Wilson
2019-09-13 14:20       ` Janusz Krzysztofik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.