* [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
@ 2022-11-28 16:52 Andrzej Hajda
  2022-11-28 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrzej Hajda @ 2022-11-28 16:52 UTC
  To: intel-gfx; +Cc: Andrzej Hajda, Rodrigo Vivi

When a context is exiting, preempt_timeout_ms is used as the timeout, but
since the introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it can grow to
7.5 seconds. The heartbeat kicks in earlier, but still only after 2.5s.
Cap the timeout for exiting contexts at CONFIG_DRM_I915_PREEMPT_TIMEOUT.

Fixes: d7a8680ec9fb ("drm/i915: Improve long running compute w/a for GuC submission")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
Hi all,

I am not sure what the expected solution is here, or whether my patch
actually reverts the intent of commit d7a8680ec9fb. Feel free to propose
something better.
Another alternative would be to increase the timeout in the IGT tests, but
I am not sure that is a good direction.

Regards
Andrzej
---
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 49a8f10d76c77b..bbbbcd9e00f947 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1248,6 +1248,10 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
 	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
 	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
 		return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;
+	else if (unlikely(intel_context_is_exiting(rq->context)))
+		return min_t(unsigned long,
+			     READ_ONCE(engine->props.preempt_timeout_ms),
+			     CONFIG_DRM_I915_PREEMPT_TIMEOUT);
 
 	return READ_ONCE(engine->props.preempt_timeout_ms);
 }
-- 
2.34.1
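The branch order in the patched active_preempt_timeout() can be modelled outside the kernel to see which timeout each kind of request ends up with. The following is a minimal user-space sketch under stated assumptions, not the driver code: struct model_request, model_preempt_timeout() and the *_MS constants are hypothetical names; the 1 ms banned timeout and the 640 ms default preempt timeout are assumed example values, not read from a real Kconfig; 7500 ms is the compute value quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed example values (hypothetical stand-ins, not read from Kconfig). */
#define BANNED_PREEMPT_TIMEOUT_MS  1UL    /* for INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS */
#define DEFAULT_PREEMPT_TIMEOUT_MS 640UL  /* for CONFIG_DRM_I915_PREEMPT_TIMEOUT */
#define COMPUTE_PREEMPT_TIMEOUT_MS 7500UL /* 7.5 s, as quoted in the commit message */

/* Hypothetical flattened view of the state the kernel function checks. */
struct model_request {
	bool banned;  /* intel_context_is_banned() || bad_request() */
	bool exiting; /* intel_context_is_exiting() */
};

static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Same branch order as the patched active_preempt_timeout():
 * a banned/bad request gets the fast-reset timeout, an exiting one is
 * capped at the default, and everything else keeps the engine's
 * (possibly raised) timeout unchanged.
 */
unsigned long model_preempt_timeout(const struct model_request *rq,
				    unsigned long engine_timeout_ms)
{
	assert(rq != NULL);

	if (rq->banned)
		return BANNED_PREEMPT_TIMEOUT_MS;
	else if (rq->exiting)
		return min_ul(engine_timeout_ms, DEFAULT_PREEMPT_TIMEOUT_MS);

	return engine_timeout_ms;
}
```

With an engine timeout of COMPUTE_PREEMPT_TIMEOUT_MS, an exiting request gets 640 ms in this model while a running one keeps 7500 ms. An engine already configured below the default (say 100 ms) keeps its own value for exiting contexts, which is why the patch takes a minimum rather than returning the Kconfig value outright.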



* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: fix exiting context timeout calculation
  2022-11-28 16:52 [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation Andrzej Hajda
@ 2022-11-28 18:06 ` Patchwork
  2022-11-28 18:28 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2022-11-28 18:06 UTC
  To: Andrzej Hajda; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: fix exiting context timeout calculation
URL   : https://patchwork.freedesktop.org/series/111402/
State : warning

== Summary ==

Error: dim checkpatch failed
a049dd461775 drm/i915: fix exiting context timeout calculation
-:10: WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: d7a8680ec9fb ("drm/i915: Improve long running compute w/a for GuC submission")'
#10: 
Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")

total: 0 errors, 1 warnings, 0 checks, 10 lines checked
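The BAD_FIXES_TAG warning asks for the first 12 characters of the sha1 followed by the quoted commit title. A minimal sketch of that formatting rule, assuming a hypothetical helper format_fixes_tag() (it is not part of checkpatch or git):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a Fixes: trailer in the style checkpatch
 * asks for, i.e. the first 12 characters of the sha1 followed by the
 * quoted commit title. Returns the snprintf() result, or -1 if the sha1
 * has fewer than 12 characters.
 */
int format_fixes_tag(char *buf, size_t len, const char *sha1, const char *title)
{
	assert(buf && sha1 && title);

	if (strlen(sha1) < 12)
		return -1;

	/* "%.12s" prints at most 12 characters of the sha1. */
	return snprintf(buf, len, "Fixes: %.12s (\"%s\")", sha1, title);
}
```

Fed the 14-character sha1 from the patch, this yields the exact line checkpatch suggests. In practice the same trailer is usually produced by git itself, e.g. `git log -1 --abbrev=12 --format='Fixes: %h ("%s")' <commit>`.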




* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: fix exiting context timeout calculation
  2022-11-28 16:52 [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation Andrzej Hajda
  2022-11-28 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2022-11-28 18:28 ` Patchwork
  2022-11-29  0:32 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  2022-11-29  8:43 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2022-11-28 18:28 UTC
  To: Andrzej Hajda; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 10626 bytes --]

== Series Details ==

Series: drm/i915: fix exiting context timeout calculation
URL   : https://patchwork.freedesktop.org/series/111402/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12439 -> Patchwork_111402v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/index.html

Participating hosts (31 -> 35)
------------------------------

  Additional (4): fi-ilk-650 fi-jsl-1 fi-cfl-8700k bat-dg1-6 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_111402v1:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-dp-2:
    - {bat-dg2-11}:       [PASS][1] -> [FAIL][2] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/bat-dg2-11/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-dp-2.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg2-11/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-dp-2.html

  
Known issues
------------

  Here are the changes found in Patchwork_111402v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][3] ([fdo#109271] / [i915#2190])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-cfl-8700k/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#4613]) +3 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-cfl-8700k/igt@gem_lmem_swapping@basic.html

  * igt@gem_mmap@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][5] ([i915#4083])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@gem_mmap@basic.html

  * igt@gem_render_tiled_blits@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][6] ([i915#4079]) +1 similar issue
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@gem_render_tiled_blits@basic.html

  * igt@gem_tiled_fence_blits@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][7] ([i915#4077]) +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@gem_tiled_fence_blits@basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - bat-dg1-6:          NOTRUN -> [SKIP][8] ([i915#7561])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-ilk-650:         NOTRUN -> [SKIP][9] ([fdo#109271]) +20 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-ilk-650/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg1-6:          NOTRUN -> [SKIP][10] ([i915#6621])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@i915_pm_rps@basic-api.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg1-6:          NOTRUN -> [SKIP][11] ([i915#4215])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
    - bat-dg1-6:          NOTRUN -> [SKIP][12] ([i915#4212]) +7 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_addfb_basic@tile-pitch-mismatch.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][13] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-cfl-8700k/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-ilk-650:         NOTRUN -> [SKIP][14] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-ilk-650/igt@kms_chamelium@dp-hpd-fast.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - bat-dg1-6:          NOTRUN -> [SKIP][15] ([fdo#111827]) +8 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
    - fi-cfl-8700k:       NOTRUN -> [SKIP][16] ([fdo#109271]) +9 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/fi-cfl-8700k/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html
    - bat-dg1-6:          NOTRUN -> [SKIP][17] ([i915#4103] / [i915#4213])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_cursor_legacy@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-dg1-6:          NOTRUN -> [SKIP][18] ([fdo#109285])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_psr@sprite_plane_onoff:
    - bat-dg1-6:          NOTRUN -> [SKIP][19] ([i915#1072] / [i915#4078]) +3 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_psr@sprite_plane_onoff.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg1-6:          NOTRUN -> [SKIP][20] ([i915#3555])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-gtt:
    - bat-dg1-6:          NOTRUN -> [SKIP][21] ([i915#3708] / [i915#4077]) +1 similar issue
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@prime_vgem@basic-gtt.html

  * igt@prime_vgem@basic-userptr:
    - bat-dg1-6:          NOTRUN -> [SKIP][22] ([i915#3708] / [i915#4873])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@prime_vgem@basic-userptr.html

  * igt@prime_vgem@basic-write:
    - bat-dg1-6:          NOTRUN -> [SKIP][23] ([i915#3708]) +3 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-dg1-6/igt@prime_vgem@basic-write.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - {bat-rplp-1}:       [DMESG-WARN][24] ([i915#2867]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/bat-rplp-1/igt@gem_exec_suspend@basic-s3@smem.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-rplp-1/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_pm_rpm@module-reload:
    - {bat-rpls-2}:       [WARN][26] ([i915#7346]) -> [PASS][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/bat-rpls-2/igt@i915_pm_rpm@module-reload.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-rpls-2/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@reset:
    - {bat-rpls-2}:       [DMESG-FAIL][28] ([i915#4983]) -> [PASS][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/bat-rpls-2/igt@i915_selftest@live@reset.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-rpls-2/igt@i915_selftest@live@reset.html

  * igt@i915_selftest@live@slpc:
    - {bat-adln-1}:       [DMESG-FAIL][30] ([i915#6997]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/bat-adln-1/igt@i915_selftest@live@slpc.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/bat-adln-1/igt@i915_selftest@live@slpc.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6434]: https://gitlab.freedesktop.org/drm/intel/issues/6434
  [i915#6559]: https://gitlab.freedesktop.org/drm/intel/issues/6559
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6816]: https://gitlab.freedesktop.org/drm/intel/issues/6816
  [i915#6818]: https://gitlab.freedesktop.org/drm/intel/issues/6818
  [i915#6997]: https://gitlab.freedesktop.org/drm/intel/issues/6997
  [i915#7346]: https://gitlab.freedesktop.org/drm/intel/issues/7346
  [i915#7456]: https://gitlab.freedesktop.org/drm/intel/issues/7456
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561


Build changes
-------------

  * Linux: CI_DRM_12439 -> Patchwork_111402v1

  CI-20190529: 20190529
  CI_DRM_12439: 1e78c0412b6cc27f0b0e3773377011966757ac38 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7072: 69ba7163475925cdc69aebbdfa0e87453ae165c7 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_111402v1: 1e78c0412b6cc27f0b0e3773377011966757ac38 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

7b97ea68ca83 drm/i915: fix exiting context timeout calculation

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/index.html

[-- Attachment #2: Type: text/html, Size: 11868 bytes --]


* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: fix exiting context timeout calculation
  2022-11-28 16:52 [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation Andrzej Hajda
  2022-11-28 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
  2022-11-28 18:28 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2022-11-29  0:32 ` Patchwork
  2022-11-29  8:43 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2022-11-29  0:32 UTC
  To: Andrzej Hajda; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 31659 bytes --]

== Series Details ==

Series: drm/i915: fix exiting context timeout calculation
URL   : https://patchwork.freedesktop.org/series/111402/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12439_full -> Patchwork_111402v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_111402v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_111402v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_111402v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_capture@pi@vecs0:
    - shard-skl:          NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@gem_exec_capture@pi@vecs0.html

  
Known issues
------------

  Here are the changes found in Patchwork_111402v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-iclb:         NOTRUN -> [SKIP][2] ([i915#4525])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_balancer@parallel-keep-in-fence:
    - shard-iclb:         [PASS][3] -> [SKIP][4] ([i915#4525]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb1/igt@gem_exec_balancer@parallel-keep-in-fence.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@gem_exec_balancer@parallel-keep-in-fence.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb:         [PASS][5] -> [FAIL][6] ([i915#2842]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-tglb1/igt@gem_exec_fair@basic-flow@rcs0.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-tglb2/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-apl:          [PASS][7] -> [FAIL][8] ([i915#2842])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl3/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-skl:          NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4613]) +2 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_pxp@fail-invalid-protected-context:
    - shard-iclb:         NOTRUN -> [SKIP][10] ([i915#4270])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@gem_pxp@fail-invalid-protected-context.html

  * igt@gem_userptr_blits@unsync-overlap:
    - shard-skl:          NOTRUN -> [SKIP][11] ([fdo#109271]) +195 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl7/igt@gem_userptr_blits@unsync-overlap.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-apl:          [PASS][12] -> [DMESG-WARN][13] ([i915#5566] / [i915#716])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl7/igt@gen9_exec_parse@allowed-single.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl6/igt@gen9_exec_parse@allowed-single.html

  * igt@gen9_exec_parse@bb-start-cmd:
    - shard-iclb:         NOTRUN -> [SKIP][14] ([i915#2856])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@gen9_exec_parse@bb-start-cmd.html

  * igt@kms_big_fb@4-tiled-8bpp-rotate-90:
    - shard-iclb:         NOTRUN -> [SKIP][15] ([i915#5286])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_big_fb@4-tiled-8bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip:
    - shard-iclb:         [PASS][16] -> [DMESG-FAIL][17] ([i915#5138])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb2/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][18] ([i915#3763])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-4_tiled_dg2_mc_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][19] ([fdo#109278]) +5 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-4_tiled_dg2_mc_ccs.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs:
    - shard-skl:          NOTRUN -> [SKIP][20] ([fdo#109271] / [i915#3886]) +8 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl6/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@dp-hpd:
    - shard-skl:          NOTRUN -> [SKIP][21] ([fdo#109271] / [fdo#111827]) +10 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@kms_chamelium@dp-hpd.html

  * igt@kms_chamelium@vga-hpd-enable-disable-mode:
    - shard-iclb:         NOTRUN -> [SKIP][22] ([fdo#109284] / [fdo#111827])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_chamelium@vga-hpd-enable-disable-mode.html

  * igt@kms_flip@plain-flip-fb-recreate@b-edp1:
    - shard-skl:          NOTRUN -> [FAIL][23] ([i915#2122])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@kms_flip@plain-flip-fb-recreate@b-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][24] ([i915#2587] / [i915#2672]) +9 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb6/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][25] ([i915#2672]) +3 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-fullscreen:
    - shard-iclb:         NOTRUN -> [SKIP][26] ([fdo#109280]) +2 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-fullscreen.html

  * igt@kms_plane_scaling@plane-scaler-with-clipping-clamping-pixel-formats@pipe-b-edp-1:
    - shard-iclb:         [PASS][27] -> [SKIP][28] ([i915#5176]) +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb1/igt@kms_plane_scaling@plane-scaler-with-clipping-clamping-pixel-formats@pipe-b-edp-1.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb3/igt@kms_plane_scaling@plane-scaler-with-clipping-clamping-pixel-formats@pipe-b-edp-1.html

  * igt@kms_prime@basic-crc-hybrid:
    - shard-iclb:         NOTRUN -> [SKIP][29] ([i915#6524])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_prime@basic-crc-hybrid.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area:
    - shard-skl:          NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#658]) +4 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl1/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html

  * igt@kms_psr2_su@page_flip-xrgb8888:
    - shard-iclb:         NOTRUN -> [SKIP][31] ([fdo#109642] / [fdo#111068] / [i915#658])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_psr2_su@page_flip-xrgb8888.html

  * igt@kms_psr@psr2_primary_mmap_gtt:
    - shard-iclb:         NOTRUN -> [SKIP][32] ([fdo#109441]) +1 similar issue
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_psr@psr2_primary_mmap_gtt.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [PASS][33] -> [SKIP][34] ([fdo#109441]) +4 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-iclb:         [PASS][35] -> [SKIP][36] ([i915#5519])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb5/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb8/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_setmode@invalid-clone-single-crtc-stealing:
    - shard-iclb:         NOTRUN -> [SKIP][37] ([i915#3555]) +1 similar issue
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@kms_setmode@invalid-clone-single-crtc-stealing.html

  * igt@kms_vblank@pipe-d-wait-idle:
    - shard-skl:          NOTRUN -> [SKIP][38] ([fdo#109271] / [i915#533])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl1/igt@kms_vblank@pipe-d-wait-idle.html

  * igt@perf@non-zero-reason:
    - shard-skl:          NOTRUN -> [TIMEOUT][39] ([i915#6943])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@perf@non-zero-reason.html

  * igt@syncobj_timeline@reset-during-wait-for-submit:
    - shard-skl:          NOTRUN -> [DMESG-WARN][40] ([i915#1982])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl6/igt@syncobj_timeline@reset-during-wait-for-submit.html

  * igt@sysfs_clients@busy:
    - shard-skl:          NOTRUN -> [SKIP][41] ([fdo#109271] / [i915#2994]) +1 similar issue
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl4/igt@sysfs_clients@busy.html

  
#### Possible fixes ####

  * igt@feature_discovery@psr2:
    - {shard-rkl}:        [SKIP][42] ([i915#658]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-3/igt@feature_discovery@psr2.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@feature_discovery@psr2.html

  * igt@gem_ctx_exec@basic-nohangcheck:
    - {shard-rkl}:        [FAIL][44] ([i915#6268]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-4/igt@gem_ctx_exec@basic-nohangcheck.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-4/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_ctx_persistence@many-contexts:
    - shard-tglb:         [FAIL][46] ([i915#2410]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-tglb8/igt@gem_ctx_persistence@many-contexts.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-tglb3/igt@gem_ctx_persistence@many-contexts.html

  * igt@gem_ctx_persistence@smoketest:
    - {shard-rkl}:        [FAIL][48] ([i915#5099]) -> [PASS][49]
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-6/igt@gem_ctx_persistence@smoketest.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@gem_ctx_persistence@smoketest.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-apl:          [FAIL][50] ([i915#2842]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl6/igt@gem_exec_fair@basic-none-solo@rcs0.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl2/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - {shard-rkl}:        [FAIL][52] ([i915#2842]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-5/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-5/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-skl:          [DMESG-WARN][54] ([i915#5566] / [i915#716]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-skl4/igt@gen9_exec_parse@allowed-all.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl9/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         [SKIP][56] ([i915#4281]) -> [PASS][57]
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb3/igt@i915_pm_dc@dc9-dpms.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb6/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rc6_residency@rc6-idle@rcs0:
    - {shard-dg1}:        [FAIL][58] ([i915#3591]) -> [PASS][59]
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-dg1-13/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-dg1-13/igt@i915_pm_rc6_residency@rc6-idle@rcs0.html

  * igt@i915_pm_rpm@drm-resources-equal:
    - {shard-rkl}:        [SKIP][60] ([fdo#109308]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-1/igt@i915_pm_rpm@drm-resources-equal.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@i915_pm_rpm@drm-resources-equal.html

  * igt@i915_pm_rpm@modeset-lpsp-stress-no-wait:
    - {shard-rkl}:        [SKIP][62] ([i915#1397]) -> [PASS][63]
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-1/igt@i915_pm_rpm@modeset-lpsp-stress-no-wait.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@i915_pm_rpm@modeset-lpsp-stress-no-wait.html

  * igt@i915_pm_rps@engine-order:
    - shard-apl:          [FAIL][64] ([i915#6537]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl8/igt@i915_pm_rps@engine-order.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl6/igt@i915_pm_rps@engine-order.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - shard-skl:          [DMESG-WARN][66] ([i915#1982]) -> [PASS][67] +1 similar issue
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-skl3/igt@i915_suspend@basic-s2idle-without-i915.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl7/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@i915_suspend@basic-s3-without-i915:
    - shard-iclb:         [INCOMPLETE][68] -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb3/igt@i915_suspend@basic-s3-without-i915.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb5/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_cursor_legacy@cursor-vs-flip@atomic-transitions-varying-size:
    - shard-iclb:         [FAIL][70] ([i915#5072]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb7/igt@kms_cursor_legacy@cursor-vs-flip@atomic-transitions-varying-size.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb3/igt@kms_cursor_legacy@cursor-vs-flip@atomic-transitions-varying-size.html

  * igt@kms_flip@flip-vs-blocking-wf-vblank@a-edp1:
    - shard-skl:          [FAIL][72] ([i915#2122]) -> [PASS][73] +1 similar issue
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-skl3/igt@kms_flip@flip-vs-blocking-wf-vblank@a-edp1.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl7/igt@kms_flip@flip-vs-blocking-wf-vblank@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc:
    - {shard-rkl}:        [SKIP][74] ([i915#1849] / [i915#4098]) -> [PASS][75] +5 similar issues
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-3/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary:
    - shard-iclb:         [FAIL][76] ([i915#2546]) -> [PASS][77]
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb2/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html

  * igt@kms_hdmi_inject@inject-audio:
    - {shard-rkl}:        [SKIP][78] ([i915#433]) -> [PASS][79]
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-3/igt@kms_hdmi_inject@inject-audio.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-1/igt@kms_hdmi_inject@inject-audio.html

  * igt@kms_plane@plane-position-hole-dpms@pipe-b-planes:
    - {shard-rkl}:        [SKIP][80] ([i915#3558]) -> [PASS][81] +1 similar issue
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-1/igt@kms_plane@plane-position-hole-dpms@pipe-b-planes.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@kms_plane@plane-position-hole-dpms@pipe-b-planes.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5@pipe-a-edp-1:
    - shard-iclb:         [SKIP][82] ([i915#5235]) -> [PASS][83] +2 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb2/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5@pipe-a-edp-1.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5@pipe-a-edp-1.html

  * igt@kms_psr@primary_render:
    - {shard-rkl}:        [SKIP][84] ([i915#1072]) -> [PASS][85] +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-3/igt@kms_psr@primary_render.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@kms_psr@primary_render.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-tglb:         [SKIP][86] ([i915#5519]) -> [PASS][87]
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-tglb2/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-tglb6/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_rotation_crc@exhaust-fences:
    - {shard-rkl}:        [SKIP][88] ([i915#1845] / [i915#4098]) -> [PASS][89] +10 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-1/igt@kms_rotation_crc@exhaust-fences.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@kms_rotation_crc@exhaust-fences.html

  * igt@kms_universal_plane@cursor-fb-leak-pipe-b:
    - {shard-rkl}:        [SKIP][90] ([i915#1845] / [i915#4070] / [i915#4098]) -> [PASS][91] +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-rkl-1/igt@kms_universal_plane@cursor-fb-leak-pipe-b.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-rkl-6/igt@kms_universal_plane@cursor-fb-leak-pipe-b.html

  
#### Warnings ####

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-iclb:         [SKIP][92] ([i915#658]) -> [SKIP][93] ([i915#588])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb6/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb2/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-c-edp-1:
    - shard-skl:          [FAIL][94] ([i915#4573]) -> [DMESG-FAIL][95] ([IGT#6])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-skl7/igt@kms_plane_alpha_blend@alpha-basic@pipe-c-edp-1.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-skl7/igt@kms_plane_alpha_blend@alpha-basic@pipe-c-edp-1.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-sf:
    - shard-iclb:         [SKIP][96] ([i915#658]) -> [SKIP][97] ([i915#2920]) +1 similar issue
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb6/igt@kms_psr2_sf@cursor-plane-move-continuous-sf.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-sf.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-iclb:         [SKIP][98] ([fdo#111068] / [i915#658]) -> [SKIP][99] ([i915#2920]) +1 similar issue
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb6/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-big-fb:
    - shard-iclb:         [SKIP][100] ([i915#2920]) -> [SKIP][101] ([i915#658])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-iclb2/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-big-fb.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-iclb1/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-big-fb.html

  * igt@runner@aborted:
    - shard-apl:          ([FAIL][102], [FAIL][103]) ([i915#3002] / [i915#4312]) -> ([FAIL][104], [FAIL][105], [FAIL][106]) ([fdo#109271] / [i915#3002] / [i915#4312])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl8/igt@runner@aborted.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12439/shard-apl1/igt@runner@aborted.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl6/igt@runner@aborted.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl3/igt@runner@aborted.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/shard-apl6/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#6]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/6
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112054]: https://bugs.freedesktop.org/show_bug.cgi?id=112054
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1722]: https://gitlab.freedesktop.org/drm/intel/issues/1722
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2410]: https://gitlab.freedesktop.org/drm/intel/issues/2410
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2546]: https://gitlab.freedesktop.org/drm/intel/issues/2546
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3763]: https://gitlab.freedesktop.org/drm/intel/issues/3763
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3938]: https://gitlab.freedesktop.org/drm/intel/issues/3938
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#433]: https://gitlab.freedesktop.org/drm/intel/issues/433
  [i915#4349]: https://gitlab.freedesktop.org/drm/intel/issues/4349
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#4573]: https://gitlab.freedesktop.org/drm/intel/issues/4573
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4771]: https://gitlab.freedesktop.org/drm/intel/issues/4771
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4854]: https://gitlab.freedesktop.org/drm/intel/issues/4854
  [i915#4859]: https://gitlab.freedesktop.org/drm/intel/issues/4859
  [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880
  [i915#5072]: https://gitlab.freedesktop.org/drm/intel/issues/5072
  [i915#5099]: https://gitlab.freedesktop.org/drm/intel/issues/5099
  [i915#5138]: https://gitlab.freedesktop.org/drm/intel/issues/5138
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5519]: https://gitlab.freedesktop.org/drm/intel/issues/5519
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#588]: https://gitlab.freedesktop.org/drm/intel/issues/588
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6497]: https://gitlab.freedesktop.org/drm/intel/issues/6497
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#6537]: https://gitlab.freedesktop.org/drm/intel/issues/6537
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768
  [i915#6943]: https://gitlab.freedesktop.org/drm/intel/issues/6943
  [i915#6946]: https://gitlab.freedesktop.org/drm/intel/issues/6946
  [i915#7037]: https://gitlab.freedesktop.org/drm/intel/issues/7037
  [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561


Build changes
-------------

  * Linux: CI_DRM_12439 -> Patchwork_111402v1

  CI-20190529: 20190529
  CI_DRM_12439: 1e78c0412b6cc27f0b0e3773377011966757ac38 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7072: 69ba7163475925cdc69aebbdfa0e87453ae165c7 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_111402v1: 1e78c0412b6cc27f0b0e3773377011966757ac38 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111402v1/index.html

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-11-28 16:52 [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation Andrzej Hajda
                   ` (2 preceding siblings ...)
  2022-11-29  0:32 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
@ 2022-11-29  8:43 ` Tvrtko Ursulin
  2022-12-01  0:22   ` John Harrison
  3 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2022-11-29  8:43 UTC (permalink / raw)
  To: Andrzej Hajda, intel-gfx; +Cc: Rodrigo Vivi



On 28/11/2022 16:52, Andrzej Hajda wrote:
> In case context is exiting preempt_timeout_ms is used for timeout,
> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
> 
> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> ---
> Hi all,
> 
> I am not sure what is expected solution here, and if my patch does not
> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
> something better.
> Other alternative would be to increase t/o in IGT tests, but I am not sure
> if this is good direction.

Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: Make
the heartbeat play nice with long pre-emption timeouts") that actually
breaks things? (If IGT modifies the preempt timeout, the heartbeat
extension will not work as intended.)

If so, I think we agreed during review that this was a weakness which
needs to be addressed, but I would need to re-read the old threads to
remember what the plan was. Whatever it was, it may now be time to
continue with those improvements.

Regards,

Tvrtko

> 
> Regards
> Andrzej
> ---
>   drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 49a8f10d76c77b..bbbbcd9e00f947 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1248,6 +1248,10 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
>   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
>   	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
>   		return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;
> +	else if (unlikely(intel_context_is_exiting(rq->context)))
> +		return min_t(typeof(unsigned long),
> +			     READ_ONCE(engine->props.preempt_timeout_ms),
> +			     CONFIG_DRM_I915_PREEMPT_TIMEOUT);
>   
>   	return READ_ONCE(engine->props.preempt_timeout_ms);
>   }

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-11-29  8:43 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
@ 2022-12-01  0:22   ` John Harrison
  2022-12-01 10:28     ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: John Harrison @ 2022-12-01  0:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, Andrzej Hajda, intel-gfx; +Cc: Rodrigo Vivi

On 11/29/2022 00:43, Tvrtko Ursulin wrote:
> On 28/11/2022 16:52, Andrzej Hajda wrote:
>> In case context is exiting preempt_timeout_ms is used for timeout,
>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>
>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> ---
>> Hi all,
>>
>> I am not sure what is expected solution here, and if my patch does not
>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>> something better.
>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>> if this is good direction.
>
> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
> Make the heartbeat play nice with long pre-emption timeouts") that 
> actually breaks things? (If IGT modifies the preempt timeout the 
> heartbeat extension will not work as intended.)
>
> If so, I think we agreed during review that was a weakness which needs 
> to be addressed, but I would need to re-read the old threads to 
> remember what was the plan. Regardless what it was it may be time is 
> now to continue with those improvements.
>
What is the actual issue? Just that closing contexts are taking forever
to actually close? That would be the whole point of the
'context_is_exiting' patch, which I still totally disagree with.

If the context is being closed 'gracefully' and it is intended that it
should be allowed time to pre-empt without being killed via an engine
reset, then the 7.5s delay is required. That is the officially agreed-upon
timeout to allow compute-capable contexts to reach a pre-emption point
before they are killed. If an IGT is failing because it enforces a
shorter timeout, then the IGT needs to be updated to account for the fact
that i915 has to support slow compute workloads.

If the context is being closed 'forcefully' and should be killed
immediately, then you should be using the 'BANNED_PREEMPT_TIMEOUT' value,
not the sysfs/config value.
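A rough C sketch of the two closing paths contrasted above, modelled on the active_preempt_timeout() hunk from the patch at the top of the thread. It is a simplification under stated assumptions, not driver code: `struct ctx_state`, `pick_preempt_timeout_ms()` and the `clamp_exiting` switch are hypothetical, and the millisecond values (1 for the banned timeout, 640 for CONFIG_DRM_I915_PREEMPT_TIMEOUT, 7500 for CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE) are assumed defaults that should be checked against the tree.

```c
#include <stdbool.h>

/* Assumed values in milliseconds; verify against Kconfig.profile and intel_context.h. */
#define BANNED_PREEMPT_TIMEOUT_MS   1UL    /* forceful close: reset almost at once */
#define DEFAULT_PREEMPT_TIMEOUT_MS  640UL  /* assumed CONFIG_DRM_I915_PREEMPT_TIMEOUT */
#define COMPUTE_PREEMPT_TIMEOUT_MS  7500UL /* assumed CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE */

/* Hypothetical summary of the context owning the active request. */
struct ctx_state {
	bool banned;  /* context banned, or request marked bad */
	bool exiting; /* context is exiting but not banned */
};

/*
 * Simplified model of the execlists timeout selection: banned contexts
 * get the tiny timeout, everything else uses the engine's value. With
 * clamp_exiting set, exiting contexts are additionally limited to the
 * non-compute default, which is what the proposed patch does.
 */
static unsigned long pick_preempt_timeout_ms(const struct ctx_state *ctx,
					     unsigned long engine_timeout_ms,
					     bool clamp_exiting)
{
	if (ctx->banned)
		return BANNED_PREEMPT_TIMEOUT_MS;

	if (clamp_exiting && ctx->exiting)
		return engine_timeout_ms < DEFAULT_PREEMPT_TIMEOUT_MS ?
		       engine_timeout_ms : DEFAULT_PREEMPT_TIMEOUT_MS;

	return engine_timeout_ms;
}
```

With `clamp_exiting` false, an exiting but not banned context on a 7.5 s engine still gets the full 7.5 s, which is the behaviour argued for above; setting it reproduces the clamp from the patch and drops the wait to 640 ms. Only a banned context takes the 1 ms path in both cases.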

Regarding heartbeats...

The heartbeat period is 2.5s. But there are up to five heartbeat periods 
between the heartbeat starting and it declaring a hang. The patch you 
mention also introduced a check on the pre-emption timeout when the last 
period starts. If the pre-emption timeout is longer than the heartbeat 
period then the last period is extended to guarantee that a full 
pre-emption time is granted before declaring the hang.
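As a back-of-the-envelope check of the timing described above, the sketch below models the time from the first heartbeat pulse to a declared hang. It is an assumption-laden model, not the driver's next_heartbeat(): the helper name `heartbeat_hang_ms()` is hypothetical, "up to five periods" is taken as exactly five, and "extended" is taken to mean the last period becomes max(interval, preempt timeout).

```c
/*
 * Simplified model of heartbeat escalation: a hang is declared after
 * `periods` heartbeat intervals, and only the final interval is
 * stretched to at least the preempt timeout so that a full pre-emption
 * window is granted. All values are in milliseconds.
 */
static unsigned long heartbeat_hang_ms(unsigned long interval_ms,
				       unsigned long preempt_timeout_ms,
				       unsigned int periods)
{
	unsigned long last;

	if (periods == 0)
		return 0;

	/* Extend the final period only if it is shorter than the preempt timeout. */
	last = preempt_timeout_ms > interval_ms ? preempt_timeout_ms : interval_ms;

	return (unsigned long)(periods - 1) * interval_ms + last;
}
```

Under these assumptions, a 2.5 s interval gives 5 x 2.5 s = 12.5 s when the preempt timeout is short, and 4 x 2.5 s + 7.5 s = 17.5 s when the preempt timeout is 7.5 s.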

Are you saying that a heartbeat timeout is occurring and killing the 
system? Or are you just worried that something doesn't align correctly?

John.

> Regards,
>
> Tvrtko
>
>>
>> Regards
>> Andrzej
>> ---
>>   drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> index 49a8f10d76c77b..bbbbcd9e00f947 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> @@ -1248,6 +1248,10 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
>>   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
>>   	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
>>   		return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;
>> +	else if (unlikely(intel_context_is_exiting(rq->context)))
>> +		return min_t(typeof(unsigned long),
>> +			     READ_ONCE(engine->props.preempt_timeout_ms),
>> +			     CONFIG_DRM_I915_PREEMPT_TIMEOUT);
>>   
>>   	return READ_ONCE(engine->props.preempt_timeout_ms);
>>   }

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-12-01  0:22   ` John Harrison
@ 2022-12-01 10:28     ` Tvrtko Ursulin
  2022-12-01 16:36       ` Andrzej Hajda
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2022-12-01 10:28 UTC (permalink / raw)
  To: John Harrison, Andrzej Hajda, intel-gfx; +Cc: Rodrigo Vivi


On 01/12/2022 00:22, John Harrison wrote:
> On 11/29/2022 00:43, Tvrtko Ursulin wrote:
>> On 28/11/2022 16:52, Andrzej Hajda wrote:
>>> In case context is exiting preempt_timeout_ms is used for timeout,
>>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>>
>>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> ---
>>> Hi all,
>>>
>>> I am not sure what is expected solution here, and if my patch does not
>>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>>> something better.
>>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>>> if this is good direction.
>>
>> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
>> Make the heartbeat play nice with long pre-emption timeouts") that 
>> actually breaks things? (If IGT modifies the preempt timeout the 
>> heartbeat extension will not work as intended.)
>>
>> If so, I think we agreed during review that was a weakness which needs 
>> to be addressed, but I would need to re-read the old threads to 
>> remember what was the plan. Regardless what it was it may be time is 
>> now to continue with those improvements.
>>
> What is the actual issue? Just that closing contexts are taking forever 
> to actually close? That would be the whole point of the 
> 'context_is_exiting' patch. Which I still totally disagree with.
> 
> If the context is being closed 'gracefully' and it is intended that it 
> should be allowed time to pre-empt without being killed via an engine 
> reset then the 7.5s delay is required. That is the officially agreed 
> upon timeout to allow compute capable contexts to reach a pre-emption 
> point before they should be killed. If an IGT is failing because it 
> enforces a shorter timeout then the IGT needs to be updated to account 
> for the fact that i915 has to support slow compute workloads.
> 
> If the context is being closed 'forcefully' and should be killed 
> immediately then you should be using the 'BANNED_PREEMPT_TIMEOUT' value 
> not the sysfs/config value.
> 
> Regarding heartbeats...
> 
> The heartbeat period is 2.5s. But there are up to five heartbeat periods 
> between the heartbeat starting and it declaring a hang. The patch you 
> mention also introduced a check on the pre-emption timeout when the last 
> period starts. If the pre-emption timeout is longer than the heartbeat 
> period then the last period is extended to guarantee that a full 
> pre-emption time is granted before declaring the hang.
> 
> Are you saying that a heartbeat timeout is occurring and killing the 
> system? Or are you just worried that something doesn't align correctly?

I'll leave this to Andrzej, since I am not the one debugging this. I just
glanced over the IGT and saw that there is code in there which sets both
the preempt timeout and the heartbeat interval to non-default values. And
then I remembered this:

next_heartbeat():
...
         /*
          * FIXME: The final period extension is disabled if the period has been
          * modified from the default. This is to prevent issues with certain
          * selftests which override the value and expect specific behaviour.
          * Once the selftests have been updated to either cope with variable
          * heartbeat periods (or to override the pre-emption timeout as well,
          * or just to add a selftest specific override of the extension), the
          * generic override can be removed.
          */
         if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
             delay == engine->defaults.heartbeat_interval_ms) {

That would then not do the right thing with the last heartbeat pulse
extension, if the IGT relies on it. I don't know; I am just pointing it
out so someone can check whether this FIXME needs to be prioritised.
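A rough model of why an IGT that changes the heartbeat interval could lose the extension: in the quoted condition, the final period is only stretched when the request has reached barrier priority and the interval still equals the engine default. The function name `final_period_ms()` and the "stretch to the preempt timeout" rule are assumptions for illustration; the real code may compute the extended delay differently.

```c
#include <stdbool.h>

/*
 * Model of the FIXME gate: the final-period extension applies only at
 * barrier priority and only while the heartbeat interval equals the
 * engine default. All values are in milliseconds.
 */
static unsigned long final_period_ms(unsigned long interval_ms,
				     unsigned long default_interval_ms,
				     unsigned long preempt_timeout_ms,
				     bool at_barrier_priority)
{
	bool extend = at_barrier_priority &&
		      interval_ms == default_interval_ms;

	/* Assumed rule: stretch the last period up to the preempt timeout. */
	if (extend && preempt_timeout_ms > interval_ms)
		return preempt_timeout_ms;

	return interval_ms;
}
```

So with a default 2.5 s interval and a 7.5 s preempt timeout the last period grows to 7.5 s, but if a test sets the interval to, say, 1 s, the gate fails and the last period stays at 1 s.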

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-12-01 10:28     ` Tvrtko Ursulin
@ 2022-12-01 16:36       ` Andrzej Hajda
  2022-12-02  9:14         ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: Andrzej Hajda @ 2022-12-01 16:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, John Harrison, intel-gfx; +Cc: Rodrigo Vivi

On 01.12.2022 11:28, Tvrtko Ursulin wrote:
> 
> On 01/12/2022 00:22, John Harrison wrote:
>> On 11/29/2022 00:43, Tvrtko Ursulin wrote:
>>> On 28/11/2022 16:52, Andrzej Hajda wrote:
>>>> In case context is exiting preempt_timeout_ms is used for timeout,
>>>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>>>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>>>
>>>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> ---
>>>> Hi all,
>>>>
>>>> I am not sure what is expected solution here, and if my patch does not
>>>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>>>> something better.
>>>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>>>> if this is good direction.
>>>
>>> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
>>> Make the heartbeat play nice with long pre-emption timeouts") that 
>>> actually breaks things? (If IGT modifies the preempt timeout the 
>>> heartbeat extension will not work as intended.)
>>>
>>> If so, I think we agreed during review that was a weakness which 
>>> needs to be addressed, but I would need to re-read the old threads to 
>>> remember what was the plan. Regardless what it was it may be time is 
>>> now to continue with those improvements.
>>>
>> What is the actual issue? Just that closing contexts are taking 
>> forever to actually close? That would be the whole point of the 
>> 'context_is_exiting' patch. Which I still totally disagree with.
>>
>> If the context is being closed 'gracefully' and it is intended that it 
>> should be allowed time to pre-empt without being killed via an engine 
>> reset then the 7.5s delay is required. That is the officially agreed 
>> upon timeout to allow compute capable contexts to reach a pre-emption 
>> point before they should be killed. If an IGT is failing because it 
>> enforces a shorter timeout then the IGT needs to be updated to account 
>> for the fact that i915 has to support slow compute workloads.
>>
>> If the context is being closed 'forcefully' and should be killed 
>> immediately then you should be using the 'BANNED_PREEMPT_TIMEOUT' 
>> value not the sysfs/config value.
>>
>> Regarding heartbeats...
>>
>> The heartbeat period is 2.5s. But there are up to five heartbeat 
>> periods between the heartbeat starting and it declaring a hang. The 
>> patch you mention also introduced a check on the pre-emption timeout 
>> when the last period starts. If the pre-emption timeout is longer than 
>> the heartbeat period then the last period is extended to guarantee 
>> that a full pre-emption time is granted before declaring the hang.
>>
>> Are you saying that a heartbeat timeout is occurring and killing the 
>> system? Or are you just worried that something doesn't align correctly?
> 
> I leave this to Andrzej since I am not the one debugging this. I just 
> glanced over the IGT and saw that there's code in there which sets both 
> the preempt timeout and heartbeat interval to non-default values. And 
> then I remembered this:

The test is gem_ctx_persistence@many-contexts. It does not modify the sysfs
timeouts, but it assumes 1 s is enough to wait for an exiting context
(without preemption). It works on bcs, vcs and vecs, but fails on rcs,
since rcs has its timeout set to 7.5 s (by the way, it works with GuC
submission enabled). That seemed somewhat inconsistent to me, but if this
is how it should work, I will just adjust the test.
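The arithmetic behind this failure can be written out as a tiny check. This sketch assumes, as described above, that an exiting non-preemptible context is only removed once the engine's preempt timeout expires, and it ignores heartbeat and reset latency; the per-engine table, the 640 ms value for the non-compute engines and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-engine preempt timeouts in milliseconds (assumed defaults). */
struct engine_timeout {
	const char *name;
	unsigned long preempt_timeout_ms;
};

static const struct engine_timeout engines[] = {
	{ "bcs", 640 }, { "vcs", 640 }, { "vecs", 640 },
	{ "rcs", 7500 }, /* 7.5 s, as reported for rcs in this thread */
};

/* Look up an engine's preempt timeout by name; returns 0 if unknown. */
static unsigned long engine_preempt_timeout_ms(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(engines) / sizeof(engines[0]); i++)
		if (strcmp(engines[i].name, name) == 0)
			return engines[i].preempt_timeout_ms;
	return 0;
}

/* True if the context is reset before the test stops waiting. */
static bool fits_wait_budget(unsigned long preempt_timeout_ms,
			     unsigned long budget_ms)
{
	return preempt_timeout_ms <= budget_ms;
}
```

With the test's 1000 ms budget, the 640 ms engines fit and rcs at 7500 ms does not, which is the bcs/vcs/vecs versus rcs split reported above.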

Regards
Andrzej


> 
> next_heartbeat():
> ...
>          /*
>           * FIXME: The final period extension is disabled if the period has been
>           * modified from the default. This is to prevent issues with certain
>           * selftests which override the value and expect specific behaviour.
>           * Once the selftests have been updated to either cope with variable
>           * heartbeat periods (or to override the pre-emption timeout as well,
>           * or just to add a selftest specific override of the extension), the
>           * generic override can be removed.
>           */
>          if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
>              delay == engine->defaults.heartbeat_interval_ms) {
> 
> Which then wouldn't dtrt with last heartbeat pulse extensions, if the 
> IGT would be relying on that. Don't know, just pointing out to check and 
> see if this FIXME needs to be prioritised.
> 
> Regards,
> 
> Tvrtko

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-12-01 16:36       ` Andrzej Hajda
@ 2022-12-02  9:14         ` Tvrtko Ursulin
  2022-12-02 12:19           ` Andrzej Hajda
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2022-12-02  9:14 UTC (permalink / raw)
  To: Andrzej Hajda, John Harrison, intel-gfx; +Cc: Rodrigo Vivi


On 01/12/2022 16:36, Andrzej Hajda wrote:
> On 01.12.2022 11:28, Tvrtko Ursulin wrote:
>>
>> On 01/12/2022 00:22, John Harrison wrote:
>>> On 11/29/2022 00:43, Tvrtko Ursulin wrote:
>>>> On 28/11/2022 16:52, Andrzej Hajda wrote:
>>>>> In case context is exiting preempt_timeout_ms is used for timeout,
>>>>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>>>>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>>>>
>>>>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>> ---
>>>>> Hi all,
>>>>>
>>>>> I am not sure what is expected solution here, and if my patch does not
>>>>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>>>>> something better.
>>>>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>>>>> if this is good direction.
>>>>
>>>> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
>>>> Make the heartbeat play nice with long pre-emption timeouts") that 
>>>> actually breaks things? (If IGT modifies the preempt timeout the 
>>>> heartbeat extension will not work as intended.)
>>>>
>>>> If so, I think we agreed during review that was a weakness which 
>>>> needs to be addressed, but I would need to re-read the old threads 
>>>> to remember what was the plan. Regardless what it was it may be time 
>>>> is now to continue with those improvements.
>>>>
>>> What is the actual issue? Just that closing contexts are taking 
>>> forever to actually close? That would be the whole point of the 
>>> 'context_is_exiting' patch. Which I still totally disagree with.
>>>
>>> If the context is being closed 'gracefully' and it is intended that 
>>> it should be allowed time to pre-empt without being killed via an 
>>> engine reset then the 7.5s delay is required. That is the officially 
>>> agreed upon timeout to allow compute capable contexts to reach a 
>>> pre-emption point before they should be killed. If an IGT is failing 
>>> because it enforces a shorter timeout then the IGT needs to be 
>>> updated to account for the fact that i915 has to support slow compute 
>>> workloads.
>>>
>>> If the context is being closed 'forcefully' and should be killed 
>>> immediately then you should be using the 'BANNED_PREEMPT_TIMEOUT' 
>>> value not the sysfs/config value.
>>>
>>> Regarding heartbeats...
>>>
>>> The heartbeat period is 2.5s. But there are up to five heartbeat 
>>> periods between the heartbeat starting and it declaring a hang. The 
>>> patch you mention also introduced a check on the pre-emption timeout 
>>> when the last period starts. If the pre-emption timeout is longer 
>>> than the heartbeat period then the last period is extended to 
>>> guarantee that a full pre-emption time is granted before declaring 
>>> the hang.
>>>
>>> Are you saying that a heartbeat timeout is occurring and killing the 
>>> system? Or are you just worried that something doesn't align correctly?
>>
>> I leave this to Andrzej since I am not the one debugging this. I just 
>> glanced over the IGT and saw that there's code in there which sets 
>> both the preempt timeout and heartbeat interval to non-default values. 
>> And then I remembered this:
> 
> The test is gem_ctx_persistence@many-contexts. It does not modify sysfs 
> timeouts, but it assumes 1sec is enough to wait for exiting context 
> (no-preemption). It works with bcs, vcs, vecs, but fails on rcs since it has
> timeout set to 7.5sec (btw it works with GuC submissions enabled). It
> seemed to me somehow inconsistent, but if this is how it should work
> I will just adjust the test.

This looks odd then. That test is using non-preemptible spinners and
AFAICT it keeps submitting them for 30s, across all engines, and then it
stops and waits for one second for all of them to exit.

With the 7.5s preempt timeout I'd expect the test to fail with both GuC
and execlists.

What should happen is that every context is marked as "exiting" and is 
revoked. On the next scheduling event they would all be dropped.

So I have two questions. First, how did the increase of the preempt
timeout to 7.5s pass CI? Is the failure sporadic, for instance?

Second, you are saying the test always passes with GuC. How does GuC
manage to revoke a non-preemptible spinner in less than one second if
the preempt timeout is 7.5s? Colour me confused.

Anyway, those questions are secondary. I think the fix here is pretty
obviously for many_contexts() to fetch the preempt timeout from sysfs and
allow for that much time (plus a safety factor), using the longest timeout
across all engines, since all of them are submitted to.

Regards,

Tvrtko

> 
> Regards
> Andrzej
> 
> 
>>
>> next_heartbeat():
>> ...
>>          /*
>>           * FIXME: The final period extension is disabled if the period has been
>>           * modified from the default. This is to prevent issues with certain
>>           * selftests which override the value and expect specific behaviour.
>>           * Once the selftests have been updated to either cope with variable
>>           * heartbeat periods (or to override the pre-emption timeout as well,
>>           * or just to add a selftest specific override of the extension), the
>>           * generic override can be removed.
>>           */
>>          if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
>>              delay == engine->defaults.heartbeat_interval_ms) {
>>
>> Which then wouldn't dtrt with last heartbeat pulse extensions, if the 
>> IGT would be relying on that. Don't know, just pointing out to check 
>> and see if this FIXME needs to be prioritised.
>>
>> Regards,
>>
>> Tvrtko
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-12-02  9:14         ` Tvrtko Ursulin
@ 2022-12-02 12:19           ` Andrzej Hajda
  2022-12-02 13:13             ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: Andrzej Hajda @ 2022-12-02 12:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, John Harrison, intel-gfx; +Cc: Rodrigo Vivi



On 02.12.2022 10:14, Tvrtko Ursulin wrote:
>
> On 01/12/2022 16:36, Andrzej Hajda wrote:
>> On 01.12.2022 11:28, Tvrtko Ursulin wrote:
>>>
>>> On 01/12/2022 00:22, John Harrison wrote:
>>>> On 11/29/2022 00:43, Tvrtko Ursulin wrote:
>>>>> On 28/11/2022 16:52, Andrzej Hajda wrote:
>>>>>> In case context is exiting preempt_timeout_ms is used for timeout,
>>>>>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>>>>>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>>>>>
>>>>>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>>>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>>> ---
>>>>>> Hi all,
>>>>>>
>>>>>> I am not sure what is expected solution here, and if my patch does not
>>>>>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>>>>>> something better.
>>>>>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>>>>>> if this is good direction.
>>>>>
>>>>> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
>>>>> Make the heartbeat play nice with long pre-emption timeouts") that 
>>>>> actually breaks things? (If IGT modifies the preempt timeout the 
>>>>> heartbeat extension will not work as intended.)
>>>>>
>>>>> If so, I think we agreed during review that was a weakness which 
>>>>> needs to be addressed, but I would need to re-read the old threads 
>>>>> to remember what was the plan. Regardless what it was it may be 
>>>>> time is now to continue with those improvements.
>>>>>
>>>> What is the actual issue? Just that closing contexts are taking 
>>>> forever to actually close? That would be the whole point of the 
>>>> 'context_is_exiting' patch. Which I still totally disagree with.
>>>>
>>>> If the context is being closed 'gracefully' and it is intended that 
>>>> it should be allowed time to pre-empt without being killed via an 
>>>> engine reset then the 7.5s delay is required. That is the 
>>>> officially agreed upon timeout to allow compute capable contexts to 
>>>> reach a pre-emption point before they should be killed. If an IGT 
>>>> is failing because it enforces a shorter timeout then the IGT needs 
>>>> to be updated to account for the fact that i915 has to support slow 
>>>> compute workloads.
>>>>
>>>> If the context is being closed 'forcefully' and should be killed 
>>>> immediately then you should be using the 'BANNED_PREEMPT_TIMEOUT' 
>>>> value not the sysfs/config value.
>>>>
>>>> Regarding heartbeats...
>>>>
>>>> The heartbeat period is 2.5s. But there are up to five heartbeat 
>>>> periods between the heartbeat starting and it declaring a hang. The 
>>>> patch you mention also introduced a check on the pre-emption 
>>>> timeout when the last period starts. If the pre-emption timeout is 
>>>> longer than the heartbeat period then the last period is extended 
>>>> to guarantee that a full pre-emption time is granted before 
>>>> declaring the hang.
>>>>
>>>> Are you saying that a heartbeat timeout is occurring and killing 
>>>> the system? Or are you just worried that something doesn't align 
>>>> correctly?
>>>
>>> I leave this to Andrzej since I am not the one debugging this. I 
>>> just glanced over the IGT and saw that there's code in there which 
>>> sets both the preempt timeout and heartbeat interval to non-default 
>>> values. And then I remembered this:
>>
>> The test is gem_ctx_persistence@many-contexts. It does not modify 
>> sysfs timeouts, but it assumes 1sec is enough to wait for exiting 
>> context (no-preemption). It works with bcs, vcs, vecs, but fails on 
>> rcs since it has timeout set to 7.5sec (btw it works with GuC submissions
>> enabled). It
>> seemed to me somehow inconsistent, but if this is how it should work
>> I will just adjust the test.
>
> This looks odd then. That test is using non-preemptable spinners and 
> AFAICT it keeps submitting them for 30s, across all engines, and then 
> it stops and waits for one second for all of them to exit.
>
> With the 7.5 preempt timeout I'd expect test should fail both with GuC 
> and execlists.

OK, my claim about it working with GuC was not verified well enough; it
was based on just one test machine.

>
> What should happen is that every context is marked as "exiting" and is 
> revoked. On the next scheduling event they would all be dropped.
>
> So I think two questions - how did increase of preempt timeout to 7.5s 
> pass CI - is the failure sporadic for instance?

After some data mining in cibuglog over the last month, I can say the
results are mostly consistent per machine:
On most machines it always passes.
It always fails on shard-tgl*, shard-rkl-{1,2,3,4,6} (but always passes on
shard-rkl-5), fi-adl-ddr5, fi-kbl-soraka and fi-rkl-11600.
On re-dg2-{11,12,15} the results are inconsistent: some runs pass, some fail.

>
> Second question - you are saying with GuC test always passes - how 
> does GuC manages to revoke a non-preemptible spinner in less than one 
> second if preempt timeout is 7.5s.. colour me confused.
>
> Anyway those questions are secondary.. Fix here I think pretty 
> obviously is for many_contexts() to fetch the preempt timeout from 
> sysfs and allow for that much time (plus a safety factor). Use the 
> longest timeout between all engines since all are submitted to.

After increasing the test's wait to 10 seconds, the issue disappeared on
the two RIL machines used for testing, but I will post the patch to
try-bot to check other machines as well.

One more thing, to be sure: as I understand it, a reset due to a stopped
heartbeat should not happen with 7.5s preemption timeouts if the test does
not adjust any timeouts? If so, then something is wrong anyway.
See sample logs from dg2 showing what happens: pass [1], fail [2].
In both cases there are 22 "heartbeat * not ticking" log lines, all on
engines with 7.5s preemption timeouts (rcs, ccs).

[1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12449/re-dg2-12/igt@gem_ctx_persistence@many-contexts.html
[2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12457/re-dg2-12/igt@gem_ctx_persistence@many-contexts.html
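To reproduce the count of "heartbeat * not ticking" lines from a saved log, a minimal C sketch. The substring matching rule is an assumption based on the log excerpts in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Matching rule (assumed from the excerpts in this thread): a line counts
 * if it contains both "heartbeat" and "not ticking".
 */
bool is_not_ticking_line(const char *line)
{
	assert(line != NULL);
	return strstr(line, "heartbeat") != NULL &&
	       strstr(line, "not ticking") != NULL;
}

/*
 * Count matching lines in a log stream. Lines longer than the buffer are
 * read in pieces and each piece is checked on its own, which is good
 * enough for dmesg-style logs.
 */
unsigned int count_not_ticking(FILE *log)
{
	char line[1024];
	unsigned int count = 0;

	assert(log != NULL);
	while (fgets(line, sizeof(line), log) != NULL)
		if (is_not_ticking_line(line))
			count++;

	return count;
}
```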

Regards
Andrzej


>
> Regards,
>
> Tvrtko
>
>>
>> Regards
>> Andrzej
>>
>>
>>>
>>> next_heartbeat():
>>> ...
>>>          /*
>>>           * FIXME: The final period extension is disabled if the period has been
>>>           * modified from the default. This is to prevent issues with certain
>>>           * selftests which override the value and expect specific behaviour.
>>>           * Once the selftests have been updated to either cope with variable
>>>           * heartbeat periods (or to override the pre-emption timeout as well,
>>>           * or just to add a selftest specific override of the extension), the
>>>           * generic override can be removed.
>>>           */
>>>          if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
>>>              delay == engine->defaults.heartbeat_interval_ms) {
>>>
>>> Which then wouldn't dtrt with last heartbeat pulse extensions, if 
>>> the IGT would be relying on that. Don't know, just pointing out to 
>>> check and see if this FIXME needs to be prioritised.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation
  2022-12-02 12:19           ` Andrzej Hajda
@ 2022-12-02 13:13             ` Tvrtko Ursulin
  0 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2022-12-02 13:13 UTC (permalink / raw)
  To: Andrzej Hajda, John Harrison, intel-gfx; +Cc: Rodrigo Vivi


On 02/12/2022 12:19, Andrzej Hajda wrote:
> On 02.12.2022 10:14, Tvrtko Ursulin wrote:
>>
>> On 01/12/2022 16:36, Andrzej Hajda wrote:
>>> On 01.12.2022 11:28, Tvrtko Ursulin wrote:
>>>>
>>>> On 01/12/2022 00:22, John Harrison wrote:
>>>>> On 11/29/2022 00:43, Tvrtko Ursulin wrote:
>>>>>> On 28/11/2022 16:52, Andrzej Hajda wrote:
>>>>>>> In case context is exiting preempt_timeout_ms is used for timeout,
>>>>>>> but since introduction of DRM_I915_PREEMPT_TIMEOUT_COMPUTE it increases
>>>>>>> to 7.5 seconds. Heartbeat occurs earlier but it is still 2.5s.
>>>>>>>
>>>>>>> Fixes: d7a8680ec9fb21 ("drm/i915: Improve long running compute w/a for GuC submission")
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2410
>>>>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>>>> ---
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am not sure what is expected solution here, and if my patch does not
>>>>>>> actually reverts intentions of patch d7a8680ec9fb21. Feel free to propose
>>>>>>> something better.
>>>>>>> Other alternative would be to increase t/o in IGT tests, but I am not sure
>>>>>>> if this is good direction.
>>>>>>
>>>>>> Is it the hack with the FIXME marker from 47daf84a8bfb ("drm/i915: 
>>>>>> Make the heartbeat play nice with long pre-emption timeouts") that 
>>>>>> actually breaks things? (If IGT modifies the preempt timeout the 
>>>>>> heartbeat extension will not work as intended.)
>>>>>>
>>>>>> If so, I think we agreed during review that was a weakness which 
>>>>>> needs to be addressed, but I would need to re-read the old threads 
>>>>>> to remember what was the plan. Regardless what it was it may be 
>>>>>> time is now to continue with those improvements.
>>>>>>
>>>>> What is the actual issue? Just that closing contexts are taking 
>>>>> forever to actually close? That would be the whole point of the 
>>>>> 'context_is_exiting' patch. Which I still totally disagree with.
>>>>>
>>>>> If the context is being closed 'gracefully' and it is intended that 
>>>>> it should be allowed time to pre-empt without being killed via an 
>>>>> engine reset then the 7.5s delay is required. That is the 
>>>>> officially agreed upon timeout to allow compute capable contexts to 
>>>>> reach a pre-emption point before they should be killed. If an IGT 
>>>>> is failing because it enforces a shorter timeout then the IGT needs 
>>>>> to be updated to account for the fact that i915 has to support slow 
>>>>> compute workloads.
>>>>>
>>>>> If the context is being closed 'forcefully' and should be killed 
>>>>> immediately then you should be using the 'BANNED_PREEMPT_TIMEOUT' 
>>>>> value not the sysfs/config value.
>>>>>
>>>>> Regarding heartbeats...
>>>>>
>>>>> The heartbeat period is 2.5s. But there are up to five heartbeat 
>>>>> periods between the heartbeat starting and it declaring a hang. The 
>>>>> patch you mention also introduced a check on the pre-emption 
>>>>> timeout when the last period starts. If the pre-emption timeout is 
>>>>> longer than the heartbeat period then the last period is extended 
>>>>> to guarantee that a full pre-emption time is granted before 
>>>>> declaring the hang.
>>>>>
>>>>> Are you saying that a heartbeat timeout is occurring and killing 
>>>>> the system? Or are you just worried that something doesn't align 
>>>>> correctly?
>>>>
>>>> I leave this to Andrzej since I am not the one debugging this. I 
>>>> just glanced over the IGT and saw that there's code in there which 
>>>> sets both the preempt timeout and heartbeat interval to non-default 
>>>> values. And then I remembered this:
>>>
>>> The test is gem_ctx_persistence@many-contexts. It does not modify 
>>> sysfs timeouts, but it assumes 1sec is enough to wait for exiting 
>>> context (no-preemption). It works with bcs, vcs, vecs, but fails on 
>>> rcs since it has timeout set to 7.5sec (btw it works with GuC
>>> submissions enabled). It
>>> seemed to me somehow inconsistent, but if this is how it should work
>>> I will just adjust the test.
>>
>> This looks odd then. That test is using non-preemptable spinners and 
>> AFAICT it keeps submitting them for 30s, across all engines, and then 
>> it stops and waits for one second for all of them to exit.
>>
>> With the 7.5 preempt timeout I'd expect test should fail both with GuC 
>> and execlists.
> 
> OK, my claim about working with GuC was not verified enough, just one 
> testing machine.
> 
>>
>> What should happen is that every context is marked as "exiting" and is 
>> revoked. On the next scheduling event they would all be dropped.
>>
>> So I think two questions - how did increase of preempt timeout to 7.5s 
>> pass CI - is the failure sporadic for instance?
> 
> After some data mining on cibuglog from last month I can say results are 
> mostly consistent per machine.
> On most machines it always passes.
> Always fails on shard-tgl*, shard-rkl-{1,2,3,4,6} (but on shard-rkl-5 it 
> always passes), fi-adl-ddr5, fi-kbl-soraka, fi-rkl-11600.
> On re-dg2-{11,12,15} results are inconsistent - some passes, some fails.
> 
>>
>> Second question - you are saying with GuC test always passes - how 
>> does GuC manages to revoke a non-preemptible spinner in less than one 
>> second if preempt timeout is 7.5s.. colour me confused.
>>
>> Anyway those questions are secondary.. Fix here I think pretty 
>> obviously is for many_contexts() to fetch the preempt timeout from 
>> sysfs and allow for that much time (plus a safety factor). Use the 
>> longest timeout between all engines since all are submitted to.
> 
> With increasing to 10 seconds the issue disappeared on two RIL machines 
> used for tests, but I will post the patch on try-bot check other 
> machines as well.
> 
> One more thing, to be sure. As I understand reset due to stopped 
> heartbeat, should not happen for 7.5sec preemption timeouts, if test do 
> not adjust any timeouts? If yes then there is sth wrong anyway.
> See sample logs from dg2 showing what happens: pass[1], fail[2].
> In both cases there is 22 "heartbeat * not ticking" log lines, all on 
> 7.5s preemption_timeouts (rcs, ccs).
> 
> [1]: 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12449/re-dg2-12/igt@gem_ctx_persistence@many-contexts.html

<6> [278.652441] [IGT] gem_ctx_persistence: executing
<7> [278.663699] i915 0000:03:00.0: [drm:i915_gem_open [i915]]
<7> [278.664362] i915 0000:03:00.0: [drm:i915_drop_caches_set [i915]] Dropping caches: 0x0000005c [0x0000005c]
<7> [278.664496] i915 0000:03:00.0: [drm:i915_gem_open [i915]]
<7> [278.664992] i915 0000:03:00.0: [drm:i915_gem_open [i915]]
<6> [278.670968] [IGT] gem_ctx_persistence: starting subtest many-contexts
<7> [278.671164] [drm:eb_lookup_vmas [i915]] EINVAL at eb_validate_vma:505
<7> [278.686769] i915 0000:03:00.0: [drm:i915_drop_caches_set [i915]] Dropping caches: 0x000001dc [0x000001dc]
<6> [279.367025] i915 0000:03:00.0: [drm] Ignoring context reset notification of exiting context 0x100C on bcs0
<6> [279.368863] i915 0000:03:00.0: [drm] Ignoring context reset notification of exiting context 0x100D on vcs0
<6> [279.370813] i915 0000:03:00.0: [drm] Ignoring context reset notification of exiting context 0x100E on vcs1
<6> [279.373360] i915 0000:03:00.0: [drm] Ignoring context reset notification of exiting context 0x100F on vecs0
<6> [279.376086] i915 0000:03:00.0: [drm] Ignoring context reset notification of exiting context 0x1010 on vecs1
<7> [281.964427] heartbeat ccs3 heartbeat {seqno:e:190, prio:2147483646} not ticking
<7> [281.964449] heartbeat 	Awake? 2
<7> [281.964457] heartbeat 	Barriers?: no
<7> [281.964465] heartbeat 	Latency: 0us
<7> [281.964499] heartbeat 	Runtime: 292051ms
<7> [281.964507] heartbeat 	Forcewake: 0 domains, 0 active
<7> [281.964516] heartbeat 	Heartbeat: 3224 ms ago
<7> [281.964525] heartbeat 	Reset count: 0 (global 28)
<7> [281.964533] heartbeat 	Properties:
<7> [281.964539] heartbeat 		heartbeat_interval_ms: 2500 [default 2500]
<7> [281.964548] heartbeat 		max_busywait_duration_ns: 8000 [default 8000]
<7> [281.964556] heartbeat 		preempt_timeout_ms: 7500 [default 7500]
<7> [281.964564] heartbeat 		stop_timeout_ms: 100 [default 100]
<7> [281.964571] heartbeat 		timeslice_duration_ms: 1 [default 1]
<7> [281.964580] heartbeat 	Requests:
<7> [281.964663] heartbeat 		active in queueE e:190*  prio=2147483646 @ 3224ms: [i915]
<7> [281.964677] heartbeat 		ring->start:  0xfecb0000
<7> [281.964685] heartbeat 		ring->head:   0x000008a0
<7> [281.964692] heartbeat 		ring->tail:   0x00000758
<7> [281.964698] heartbeat 		ring->emit:   0x00000760
<7> [281.964705] heartbeat 		ring->space:  0x00000100

This indeed looks super strange: the heartbeat is at max prio and reported as not ticking 3224ms after it was first created. No idea why. One for the GuC experts, if GuC is the only backend where this happens.

Well then I looked at the failures on TGL you mention above:

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7078/shard-tglb3/igt@gem_ctx_persistence@many-contexts.html

<6> [331.908547] [IGT] gem_ctx_persistence: executing
<7> [331.910926] i915 0000:00:02.0: [drm:i915_gem_open [i915]]
<7> [331.911475] i915 0000:00:02.0: [drm:i915_drop_caches_set [i915]] Dropping caches: 0x0000005c [0x0000005c]
<7> [331.911608] i915 0000:00:02.0: [drm:i915_gem_open [i915]]
<7> [331.911924] i915 0000:00:02.0: [drm:i915_gem_open [i915]]
<6> [331.915834] [IGT] gem_ctx_persistence: starting subtest many-contexts
...
<7> [335.249250] heartbeat rcs0 heartbeat {seqno:5:76, prio:2147483646} not ticking
<7> [335.249269] heartbeat 	Awake? 1338
<7> [335.249273] heartbeat 	Barriers?: no
<7> [335.249277] heartbeat 	Latency: 243us
<7> [335.249291] heartbeat 	Runtime: 10531ms
<7> [335.249294] heartbeat 	Forcewake: 0 domains, 0 active
<7> [335.249297] heartbeat 	Heartbeat: 3307 ms ago
<7> [335.249305] heartbeat 	Reset count: 0 (global 0)
<7> [335.249308] heartbeat 	Properties:
<7> [335.249310] heartbeat 		heartbeat_interval_ms: 2500 [default 2500]
<7> [335.249314] heartbeat 		max_busywait_duration_ns: 8000 [default 8000]
<7> [335.249318] heartbeat 		preempt_timeout_ms: 7500 [default 7500]
<7> [335.249322] heartbeat 		stop_timeout_ms: 100 [default 100]
<7> [335.249326] heartbeat 		timeslice_duration_ms: 1 [default 1]
<7> [335.249336] heartbeat 	Requests:
<7> [335.249387] heartbeat 		hungR 13a:2*  prio=0 @ 3308ms: gem_ctx_persist<1773>
<7> [335.249393] heartbeat 		ring->start:  0x0149f000
<7> [335.249395] heartbeat 		ring->head:   0x00000000
<7> [335.249398] heartbeat 		ring->tail:   0x000000b0
<7> [335.249402] heartbeat 		ring->emit:   0x000000b8
<7> [335.249404] heartbeat 		ring->space:  0x00003f08
<7> [335.249407] heartbeat 		ring->hwsp:   0xfedc4000

Same thing. So either something is totally broken or I totally forgot how things are supposed to work.

There shouldn't be a "heartbeat not ticking" report until 4x heartbeat interval + preempt timeout has elapsed. And the dump nicely shows the current engine values for those, so it makes no sense.
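That bound can be written down as plain arithmetic. A minimal sketch, assuming the model John describes earlier in the thread (up to five heartbeat periods, the last one extended to the preempt timeout when that is longer) plus the FIXME condition in next_heartbeat(); the helper name is hypothetical and the real check's priority condition is ignored.

```c
#include <assert.h>

/*
 * Hypothetical model of the heartbeat escalation described in this thread
 * (not the i915 implementation):
 *  - up to `periods` heartbeat intervals pass before a hang is declared;
 *  - the final period is stretched to the preempt timeout when that is
 *    longer, but only while the interval is still at its default value
 *    (the FIXME in next_heartbeat()). The real check also requires the
 *    pulse to be at barrier priority, which is ignored here.
 * Returns the earliest time, in ms after the first pulse, at which this
 * model would report a hang.
 */
unsigned int min_ms_before_hang(unsigned int interval_ms,
				unsigned int default_interval_ms,
				unsigned int preempt_timeout_ms,
				unsigned int periods)
{
	unsigned int last = interval_ms;

	assert(periods >= 1);
	if (interval_ms == default_interval_ms &&
	    preempt_timeout_ms > interval_ms)
		last = preempt_timeout_ms;

	return (periods - 1) * interval_ms + last;
}
```

With the values printed in both dumps (2500 ms interval and 7500 ms preempt timeout, both at their defaults), this gives 17500 ms, far beyond the 3224 ms and 3307 ms reported.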

Was there something left running before the test started? But drop_caches was done at test start...

Regards,

Tvrtko

> [2]: 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12457/re-dg2-12/igt@gem_ctx_persistence@many-contexts.html
> 
> Regards
> Andrzej
> 
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Regards
>>> Andrzej
>>>
>>>
>>>>
>>>> next_heartbeat():
>>>> ...
>>>>          /*
>>>>           * FIXME: The final period extension is disabled if the period has been
>>>>           * modified from the default. This is to prevent issues with certain
>>>>           * selftests which override the value and expect specific behaviour.
>>>>           * Once the selftests have been updated to either cope with variable
>>>>           * heartbeat periods (or to override the pre-emption timeout as well,
>>>>           * or just to add a selftest specific override of the extension), the
>>>>           * generic override can be removed.
>>>>           */
>>>>          if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
>>>>              delay == engine->defaults.heartbeat_interval_ms) {
>>>>
>>>> Which then wouldn't dtrt with last heartbeat pulse extensions, if 
>>>> the IGT would be relying on that. Don't know, just pointing out to 
>>>> check and see if this FIXME needs to be prioritised.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-12-02 13:13 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-28 16:52 [Intel-gfx] [PATCH] drm/i915: fix exiting context timeout calculation Andrzej Hajda
2022-11-28 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-11-28 18:28 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-11-29  0:32 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-11-29  8:43 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
2022-12-01  0:22   ` John Harrison
2022-12-01 10:28     ` Tvrtko Ursulin
2022-12-01 16:36       ` Andrzej Hajda
2022-12-02  9:14         ` Tvrtko Ursulin
2022-12-02 12:19           ` Andrzej Hajda
2022-12-02 13:13             ` Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.