All of lore.kernel.org
 help / color / mirror / Atom feed
* [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
@ 2021-11-26  1:57 Mastan Katragadda
  2021-11-26  2:42 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Mastan Katragadda @ 2021-11-26  1:57 UTC (permalink / raw)
  To: igt-dev, tejaskumarx.surendrakumar.upadhyay; +Cc: mastanx.katragadda

This a known failure when running igt@gem_exec_balancer@bonded-(dual|pair|sync)
tests with GuC submission.The hang is expected with GuC submission since the
test was written to expect execlist scheduling hence added skip if Guc
submission enabled.

Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
---
 tests/i915/gem_exec_balancer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
index cc07a5a9..d58734ab 100644
--- a/tests/i915/gem_exec_balancer.c
+++ b/tests/i915/gem_exec_balancer.c
@@ -3320,6 +3320,7 @@ igt_main
 
 	igt_subtest_group {
 		igt_fixture {
+			igt_require(!gem_using_guc_submission(i915));
 			intel_allocator_multiprocess_start();
 		}
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-26  1:57 [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission Mastan Katragadda
@ 2021-11-26  2:42 ` Patchwork
  2021-11-26  8:46 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
  2021-11-29 10:30 ` [igt-dev] [i-g-t] " Tvrtko Ursulin
  2 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2021-11-26  2:42 UTC (permalink / raw)
  To: Mastan Katragadda; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 7268 bytes --]

== Series Details ==

Series: tests/i915/exec_balancer: Added Skip Guc Submission
URL   : https://patchwork.freedesktop.org/series/97304/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10929 -> IGTPW_6438
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/index.html

Participating hosts (42 -> 33)
------------------------------

  Additional (1): fi-tgl-u2 
  Missing    (10): fi-bxt-dsi bat-dg1-6 bat-dg1-5 fi-icl-u2 fi-bsw-cyan bat-adlp-6 bat-adlp-4 fi-bdw-samus bat-jsl-2 bat-jsl-1 

Known issues
------------

  Here are the changes found in IGTPW_6438 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-gfx:
    - fi-skl-6700k2:      NOTRUN -> [SKIP][1] ([fdo#109271]) +26 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@amdgpu/amd_basic@cs-gfx.html

  * igt@amdgpu/amd_basic@semaphore:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][2] ([fdo#109271]) +31 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-bdw-5557u/igt@amdgpu/amd_basic@semaphore.html

  * igt@core_hotunplug@unbind-rebind:
    - fi-tgl-u2:          NOTRUN -> [INCOMPLETE][3] ([i915#4006])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@core_hotunplug@unbind-rebind.html

  * igt@gem_huc_copy@huc-copy:
    - fi-tgl-u2:          NOTRUN -> [SKIP][4] ([i915#2190])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@gem_huc_copy@huc-copy.html
    - fi-skl-6700k2:      NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-skl-6700k2:      NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@gem_lmem_swapping@random-engines.html

  * igt@gem_lmem_swapping@verify-random:
    - fi-tgl-u2:          NOTRUN -> [SKIP][7] ([i915#4613]) +3 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@gem_lmem_swapping@verify-random.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [PASS][8] -> [INCOMPLETE][9] ([i915#3921])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][10] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-bdw-5557u/igt@kms_chamelium@dp-crc-fast.html

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-tgl-u2:          NOTRUN -> [SKIP][11] ([fdo#109284] / [fdo#111827]) +8 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@kms_chamelium@dp-hpd-fast.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-skl-6700k2:      NOTRUN -> [SKIP][12] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-tgl-u2:          NOTRUN -> [SKIP][13] ([i915#4103]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-tgl-u2:          NOTRUN -> [SKIP][14] ([fdo#109285])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-skl-6700k2:      NOTRUN -> [SKIP][15] ([fdo#109271] / [i915#533])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_psr@primary_page_flip:
    - fi-skl-6600u:       [PASS][16] -> [FAIL][17] ([i915#4547])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/fi-skl-6600u/igt@kms_psr@primary_page_flip.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6600u/igt@kms_psr@primary_page_flip.html

  * igt@prime_vgem@basic-userptr:
    - fi-tgl-u2:          NOTRUN -> [SKIP][18] ([i915#3301])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-skl-6600u:       NOTRUN -> [FAIL][19] ([i915#3363] / [i915#4312])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6600u/igt@runner@aborted.html
    - fi-tgl-u2:          NOTRUN -> [FAIL][20] ([i915#1602] / [i915#2722] / [i915#4312])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-tgl-u2/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-skl-6700k2:      [INCOMPLETE][21] ([i915#198]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/fi-skl-6700k2/igt@gem_exec_suspend@basic-s3.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/fi-skl-6700k2/igt@gem_exec_suspend@basic-s3.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2722]: https://gitlab.freedesktop.org/drm/intel/issues/2722
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3363]: https://gitlab.freedesktop.org/drm/intel/issues/3363
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4006]: https://gitlab.freedesktop.org/drm/intel/issues/4006
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_6291 -> IGTPW_6438

  CI-20190529: 20190529
  CI_DRM_10929: 59e636eb65235e3b6220ecf1b19da2a6dca76160 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6438: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/index.html
  IGT_6291: 9ff3844d8c1fee8d8736d888f16223c4789fb69f @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/index.html

[-- Attachment #2: Type: text/html, Size: 8986 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: failure for tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-26  1:57 [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission Mastan Katragadda
  2021-11-26  2:42 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
@ 2021-11-26  8:46 ` Patchwork
  2021-11-29 10:30 ` [igt-dev] [i-g-t] " Tvrtko Ursulin
  2 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2021-11-26  8:46 UTC (permalink / raw)
  To: Mastan Katragadda; +Cc: igt-dev

[-- Attachment #1: Type: text/plain, Size: 30269 bytes --]

== Series Details ==

Series: tests/i915/exec_balancer: Added Skip Guc Submission
URL   : https://patchwork.freedesktop.org/series/97304/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10929_full -> IGTPW_6438_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_6438_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_6438_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/index.html

Participating hosts (11 -> 7)
------------------------------

  Missing    (4): pig-skl-6260u pig-kbl-iris shard-rkl pig-glk-j5005 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_6438_full:

### IGT changes ###

#### Possible regressions ####

  * igt@prime_self_import@export-vs-gem_close-race:
    - shard-tglb:         NOTRUN -> [FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb5/igt@prime_self_import@export-vs-gem_close-race.html

  
Known issues
------------

  Here are the changes found in IGTPW_6438_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_persistence@legacy-engines-mixed-process:
    - shard-snb:          NOTRUN -> [SKIP][2] ([fdo#109271] / [i915#1099]) +2 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-snb4/igt@gem_ctx_persistence@legacy-engines-mixed-process.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [PASS][3] -> [TIMEOUT][4] ([i915#3063] / [i915#3648])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-tglb3/igt@gem_eio@unwedge-stress.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-tglb:         NOTRUN -> [SKIP][5] ([i915#4525])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@gem_exec_balancer@parallel-contexts.html
    - shard-iclb:         NOTRUN -> [SKIP][6] ([i915#4525])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb5/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_capture@pi@rcs0:
    - shard-tglb:         [PASS][7] -> [INCOMPLETE][8] ([i915#3371])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-tglb6/igt@gem_exec_capture@pi@rcs0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb3/igt@gem_exec_capture@pi@rcs0.html

  * igt@gem_exec_capture@pi@vecs0:
    - shard-iclb:         [PASS][9] -> [INCOMPLETE][10] ([i915#3371])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-iclb5/igt@gem_exec_capture@pi@vecs0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb7/igt@gem_exec_capture@pi@vecs0.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-tglb:         NOTRUN -> [FAIL][11] ([i915#2842]) +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb8/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-iclb:         NOTRUN -> [FAIL][12] ([i915#2842]) +1 similar issue
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb7/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-kbl:          [PASS][13] -> [FAIL][14] ([i915#2842])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-kbl7/igt@gem_exec_fair@basic-pace@vecs0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl4/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [PASS][15] -> [FAIL][16] ([i915#2842]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-glk8/igt@gem_exec_fair@basic-throttle@rcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk1/igt@gem_exec_fair@basic-throttle@rcs0.html
    - shard-iclb:         [PASS][17] -> [FAIL][18] ([i915#2849])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-iclb5/igt@gem_exec_fair@basic-throttle@rcs0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb2/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_suspend@basic-s3:
    - shard-tglb:         [PASS][19] -> [INCOMPLETE][20] ([i915#456])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-tglb8/igt@gem_exec_suspend@basic-s3.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb7/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_lmem_swapping@random-engines:
    - shard-iclb:         NOTRUN -> [SKIP][21] ([i915#4613]) +1 similar issue
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb5/igt@gem_lmem_swapping@random-engines.html
    - shard-apl:          NOTRUN -> [SKIP][22] ([fdo#109271] / [i915#4613]) +1 similar issue
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl7/igt@gem_lmem_swapping@random-engines.html
    - shard-glk:          NOTRUN -> [SKIP][23] ([fdo#109271] / [i915#4613]) +1 similar issue
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk5/igt@gem_lmem_swapping@random-engines.html
    - shard-kbl:          NOTRUN -> [SKIP][24] ([fdo#109271] / [i915#4613]) +3 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl3/igt@gem_lmem_swapping@random-engines.html

  * igt@gem_lmem_swapping@verify-random:
    - shard-tglb:         NOTRUN -> [SKIP][25] ([i915#4613]) +1 similar issue
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb5/igt@gem_lmem_swapping@verify-random.html

  * igt@gem_pread@exhaustion:
    - shard-snb:          NOTRUN -> [WARN][26] ([i915#2658])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-snb4/igt@gem_pread@exhaustion.html

  * igt@gem_pxp@verify-pxp-key-change-after-suspend-resume:
    - shard-tglb:         NOTRUN -> [SKIP][27] ([i915#4270])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb8/igt@gem_pxp@verify-pxp-key-change-after-suspend-resume.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled:
    - shard-iclb:         NOTRUN -> [SKIP][28] ([i915#768])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl:          NOTRUN -> [DMESG-WARN][29] ([i915#3002])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@gem_userptr_blits@input-checking.html

  * igt@gem_userptr_blits@readonly-unsync:
    - shard-tglb:         NOTRUN -> [SKIP][30] ([i915#3297])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@gem_userptr_blits@readonly-unsync.html
    - shard-iclb:         NOTRUN -> [SKIP][31] ([i915#3297])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@gem_userptr_blits@readonly-unsync.html

  * igt@gen9_exec_parse@bb-large:
    - shard-tglb:         NOTRUN -> [SKIP][32] ([i915#2856])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb5/igt@gen9_exec_parse@bb-large.html
    - shard-apl:          [PASS][33] -> [TIMEOUT][34] ([i915#4639])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl1/igt@gen9_exec_parse@bb-large.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@gen9_exec_parse@bb-large.html

  * igt@gen9_exec_parse@bb-start-far:
    - shard-iclb:         NOTRUN -> [SKIP][35] ([i915#2856])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@gen9_exec_parse@bb-start-far.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-iclb:         [PASS][36] -> [FAIL][37] ([i915#454]) +1 similar issue
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-iclb4/igt@i915_pm_dc@dc6-dpms.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_rc6_residency@rc6-fence:
    - shard-tglb:         NOTRUN -> [WARN][38] ([i915#2681] / [i915#2684])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb2/igt@i915_pm_rc6_residency@rc6-fence.html
    - shard-iclb:         NOTRUN -> [WARN][39] ([i915#2684])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb8/igt@i915_pm_rc6_residency@rc6-fence.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress-no-wait:
    - shard-tglb:         NOTRUN -> [SKIP][40] ([fdo#111644] / [i915#1397] / [i915#2411])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@i915_pm_rpm@modeset-non-lpsp-stress-no-wait.html
    - shard-iclb:         NOTRUN -> [SKIP][41] ([fdo#110892])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb5/igt@i915_pm_rpm@modeset-non-lpsp-stress-no-wait.html

  * igt@i915_suspend@debugfs-reader:
    - shard-kbl:          [PASS][42] -> [DMESG-WARN][43] ([i915#180]) +1 similar issue
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-kbl7/igt@i915_suspend@debugfs-reader.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl1/igt@i915_suspend@debugfs-reader.html

  * igt@i915_suspend@fence-restore-untiled:
    - shard-apl:          [PASS][44] -> [DMESG-WARN][45] ([i915#180]) +2 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl2/igt@i915_suspend@fence-restore-untiled.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@i915_suspend@fence-restore-untiled.html

  * igt@kms_atomic@plane-primary-overlay-mutable-zpos:
    - shard-tglb:         NOTRUN -> [SKIP][46] ([i915#404])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@kms_atomic@plane-primary-overlay-mutable-zpos.html

  * igt@kms_big_fb@linear-16bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][47] ([fdo#111614])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb8/igt@kms_big_fb@linear-16bpp-rotate-270.html
    - shard-iclb:         NOTRUN -> [SKIP][48] ([fdo#110725] / [fdo#111614])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@kms_big_fb@linear-16bpp-rotate-270.html

  * igt@kms_big_fb@linear-32bpp-rotate-180:
    - shard-glk:          [PASS][49] -> [DMESG-WARN][50] ([i915#118])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-glk1/igt@kms_big_fb@linear-32bpp-rotate-180.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk9/igt@kms_big_fb@linear-32bpp-rotate-180.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-kbl:          NOTRUN -> [SKIP][51] ([fdo#109271] / [i915#3777]) +1 similar issue
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl7/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip:
    - shard-snb:          NOTRUN -> [SKIP][52] ([fdo#109271]) +114 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-snb6/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][53] ([fdo#111615]) +3 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb5/igt@kms_big_fb@yf-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip:
    - shard-iclb:         NOTRUN -> [SKIP][54] ([fdo#110723])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb7/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip.html

  * igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][55] ([i915#3689] / [i915#3886]) +5 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb3/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][56] ([i915#3689]) +1 similar issue
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb3/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][57] ([fdo#109271] / [i915#3886]) +4 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk7/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109278] / [i915#3886]) +4 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html
    - shard-apl:          NOTRUN -> [SKIP][59] ([fdo#109271] / [i915#3886]) +6 similar issues
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl6/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-random-ccs-data-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][60] ([fdo#111615] / [i915#3689]) +3 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@kms_ccs@pipe-a-random-ccs-data-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-c-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-kbl:          NOTRUN -> [SKIP][61] ([fdo#109271] / [i915#3886]) +5 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl1/igt@kms_ccs@pipe-c-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-d-ccs-on-another-bo-yf_tiled_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][62] ([fdo#109278]) +15 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@kms_ccs@pipe-d-ccs-on-another-bo-yf_tiled_ccs.html

  * igt@kms_cdclk@mode-transition:
    - shard-apl:          NOTRUN -> [SKIP][63] ([fdo#109271]) +101 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@kms_cdclk@mode-transition.html
    - shard-tglb:         NOTRUN -> [SKIP][64] ([i915#3742])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb5/igt@kms_cdclk@mode-transition.html

  * igt@kms_chamelium@hdmi-hpd:
    - shard-glk:          NOTRUN -> [SKIP][65] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk3/igt@kms_chamelium@hdmi-hpd.html
    - shard-tglb:         NOTRUN -> [SKIP][66] ([fdo#109284] / [fdo#111827]) +7 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@kms_chamelium@hdmi-hpd.html

  * igt@kms_chamelium@hdmi-hpd-enable-disable-mode:
    - shard-iclb:         NOTRUN -> [SKIP][67] ([fdo#109284] / [fdo#111827]) +4 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb2/igt@kms_chamelium@hdmi-hpd-enable-disable-mode.html

  * igt@kms_chamelium@hdmi-mode-timings:
    - shard-snb:          NOTRUN -> [SKIP][68] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-snb5/igt@kms_chamelium@hdmi-mode-timings.html

  * igt@kms_chamelium@vga-hpd-after-suspend:
    - shard-apl:          NOTRUN -> [SKIP][69] ([fdo#109271] / [fdo#111827]) +6 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl2/igt@kms_chamelium@vga-hpd-after-suspend.html

  * igt@kms_color_chamelium@pipe-a-ctm-blue-to-red:
    - shard-kbl:          NOTRUN -> [SKIP][70] ([fdo#109271] / [fdo#111827]) +15 similar issues
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl7/igt@kms_color_chamelium@pipe-a-ctm-blue-to-red.html

  * igt@kms_content_protection@lic:
    - shard-tglb:         NOTRUN -> [SKIP][71] ([fdo#111828])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb2/igt@kms_content_protection@lic.html
    - shard-kbl:          NOTRUN -> [TIMEOUT][72] ([i915#1319])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl4/igt@kms_content_protection@lic.html

  * igt@kms_cursor_crc@pipe-a-cursor-32x10-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][73] ([i915#3359]) +3 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@kms_cursor_crc@pipe-a-cursor-32x10-offscreen.html

  * igt@kms_cursor_crc@pipe-a-cursor-32x32-random:
    - shard-tglb:         NOTRUN -> [SKIP][74] ([i915#3319])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb7/igt@kms_cursor_crc@pipe-a-cursor-32x32-random.html

  * igt@kms_cursor_crc@pipe-a-cursor-64x21-onscreen:
    - shard-apl:          [PASS][75] -> [DMESG-WARN][76] ([i915#165] / [i915#62]) +7 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl8/igt@kms_cursor_crc@pipe-a-cursor-64x21-onscreen.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl4/igt@kms_cursor_crc@pipe-a-cursor-64x21-onscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-128x42-offscreen:
    - shard-apl:          NOTRUN -> [DMESG-WARN][77] ([i915#165] / [i915#62]) +1 similar issue
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl4/igt@kms_cursor_crc@pipe-c-cursor-128x42-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x170-onscreen:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#109278] / [fdo#109279])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb4/igt@kms_cursor_crc@pipe-c-cursor-512x170-onscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-256x256-onscreen:
    - shard-kbl:          NOTRUN -> [SKIP][79] ([fdo#109271]) +147 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl7/igt@kms_cursor_crc@pipe-d-cursor-256x256-onscreen.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][80] ([fdo#109279] / [i915#3359]) +3 similar issues
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb2/igt@kms_cursor_crc@pipe-d-cursor-512x512-onscreen.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
    - shard-iclb:         NOTRUN -> [SKIP][81] ([fdo#109274] / [fdo#109278]) +1 similar issue
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html

  * igt@kms_dp_tiled_display@basic-test-pattern:
    - shard-tglb:         NOTRUN -> [SKIP][82] ([i915#426])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb8/igt@kms_dp_tiled_display@basic-test-pattern.html

  * igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium:
    - shard-tglb:         NOTRUN -> [SKIP][83] ([i915#3528])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb2/igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium.html
    - shard-iclb:         NOTRUN -> [SKIP][84] ([i915#3528])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb8/igt@kms_dp_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][85] -> [FAIL][86] ([i915#79])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-glk4/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk1/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-flip-vs-fences:
    - shard-iclb:         NOTRUN -> [SKIP][87] ([fdo#109274])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@kms_flip@2x-flip-vs-fences.html

  * igt@kms_flip@flip-vs-expired-vblank@a-dp1:
    - shard-apl:          [PASS][88] -> [DMESG-WARN][89] ([i915#1982] / [i915#62])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl6/igt@kms_flip@flip-vs-expired-vblank@a-dp1.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl4/igt@kms_flip@flip-vs-expired-vblank@a-dp1.html

  * igt@kms_flip@flip-vs-expired-vblank@c-dp1:
    - shard-apl:          [PASS][90] -> [DMESG-WARN][91] ([i915#62]) +12 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl6/igt@kms_flip@flip-vs-expired-vblank@c-dp1.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl4/igt@kms_flip@flip-vs-expired-vblank@c-dp1.html

  * igt@kms_flip@flip-vs-suspend@c-dp1:
    - shard-kbl:          [PASS][92] -> [INCOMPLETE][93] ([i915#636])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-kbl3/igt@kms_flip@flip-vs-suspend@c-dp1.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl3/igt@kms_flip@flip-vs-suspend@c-dp1.html

  * igt@kms_flip@plain-flip-ts-check-interruptible@c-hdmi-a1:
    - shard-glk:          [PASS][94] -> [FAIL][95] ([i915#2122])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-glk4/igt@kms_flip@plain-flip-ts-check-interruptible@c-hdmi-a1.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk4/igt@kms_flip@plain-flip-ts-check-interruptible@c-hdmi-a1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs:
    - shard-kbl:          NOTRUN -> [SKIP][96] ([fdo#109271] / [i915#2672])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl3/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile:
    - shard-tglb:         NOTRUN -> [SKIP][97] ([i915#2587])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html
    - shard-iclb:         NOTRUN -> [SKIP][98] ([i915#3701])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb2/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
    - shard-iclb:         NOTRUN -> [SKIP][99] ([fdo#109280]) +9 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-tglb:         [PASS][100] -> [INCOMPLETE][101] ([i915#2411] / [i915#2828] / [i915#456])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-tglb5/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-indfb-draw-mmap-cpu:
    - shard-tglb:         NOTRUN -> [SKIP][102] ([fdo#111825]) +21 similar issues
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb8/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-indfb-draw-mmap-cpu.html

  * igt@kms_hdr@static-swap:
    - shard-tglb:         NOTRUN -> [SKIP][103] ([i915#1187])
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb6/igt@kms_hdr@static-swap.html
    - shard-iclb:         NOTRUN -> [SKIP][104] ([i915#1187])
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb3/igt@kms_hdr@static-swap.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-d:
    - shard-kbl:          NOTRUN -> [SKIP][105] ([fdo#109271] / [i915#533]) +1 similar issue
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl4/igt@kms_pipe_crc_basic@read-crc-pipe-d.html
    - shard-glk:          NOTRUN -> [SKIP][106] ([fdo#109271] / [i915#533])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk5/igt@kms_pipe_crc_basic@read-crc-pipe-d.html
    - shard-apl:          NOTRUN -> [SKIP][107] ([fdo#109271] / [i915#533]) +1 similar issue
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl3/igt@kms_pipe_crc_basic@read-crc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    - shard-apl:          NOTRUN -> [FAIL][108] ([i915#265])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl2/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-7efc:
    - shard-kbl:          NOTRUN -> [FAIL][109] ([fdo#108145] / [i915#265])
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl1/igt@kms_plane_alpha_blend@pipe-c-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-apl:          NOTRUN -> [FAIL][110] ([fdo#108145] / [i915#265])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@kms_plane_alpha_blend@pipe-c-alpha-basic.html

  * igt@kms_plane_cursor@pipe-c-primary-size-64:
    - shard-apl:          NOTRUN -> [DMESG-WARN][111] ([i915#62])
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl4/igt@kms_plane_cursor@pipe-c-primary-size-64.html

  * igt@kms_plane_cursor@pipe-d-viewport-size-256:
    - shard-glk:          NOTRUN -> [SKIP][112] ([fdo#109271]) +47 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk9/igt@kms_plane_cursor@pipe-d-viewport-size-256.html

  * igt@kms_plane_lowres@pipe-b-tiling-x:
    - shard-iclb:         NOTRUN -> [SKIP][113] ([i915#3536])
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@kms_plane_lowres@pipe-b-tiling-x.html

  * igt@kms_plane_lowres@pipe-c-tiling-x:
    - shard-tglb:         NOTRUN -> [SKIP][114] ([i915#3536]) +1 similar issue
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb7/igt@kms_plane_lowres@pipe-c-tiling-x.html

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
    - shard-tglb:         NOTRUN -> [SKIP][115] ([fdo#111615] / [fdo#112054])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@kms_plane_multiple@atomic-pipe-b-tiling-yf.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1:
    - shard-iclb:         NOTRUN -> [SKIP][116] ([i915#658])
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb6/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1.html
    - shard-apl:          NOTRUN -> [SKIP][117] ([fdo#109271] / [i915#658]) +1 similar issue
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl6/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1.html
    - shard-glk:          NOTRUN -> [SKIP][118] ([fdo#109271] / [i915#658])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-glk7/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-1.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-0:
    - shard-tglb:         NOTRUN -> [SKIP][119] ([i915#2920]) +2 similar issues
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb2/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-1:
    - shard-kbl:          NOTRUN -> [SKIP][120] ([fdo#109271] / [i915#658]) +1 similar issue
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-kbl4/igt@kms_psr2_sf@plane-move-sf-dmg-area-1.html

  * igt@kms_psr2_su@frontbuffer:
    - shard-iclb:         [PASS][121] -> [SKIP][122] ([fdo#109642] / [fdo#111068] / [i915#658])
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-iclb2/igt@kms_psr2_su@frontbuffer.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb4/igt@kms_psr2_su@frontbuffer.html

  * igt@kms_psr@psr2_no_drrs:
    - shard-iclb:         NOTRUN -> [SKIP][123] ([fdo#109441])
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb7/igt@kms_psr@psr2_no_drrs.html
    - shard-tglb:         NOTRUN -> [FAIL][124] ([i915#132] / [i915#3467])
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb3/igt@kms_psr@psr2_no_drrs.html

  * igt@kms_psr@psr2_primary_blt:
    - shard-iclb:         [PASS][125] -> [SKIP][126] ([fdo#109441]) +1 similar issue
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-iclb2/igt@kms_psr@psr2_primary_blt.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb5/igt@kms_psr@psr2_primary_blt.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-apl:          [PASS][127] -> [DMESG-WARN][128] ([i915#180] / [i915#295])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10929/shard-apl4/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl1/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@kms_vrr@flip-basic:
    - shard-tglb:         NOTRUN -> [SKIP][129] ([fdo#109502])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@kms_vrr@flip-basic.html
    - shard-iclb:         NOTRUN -> [SKIP][130] ([fdo#109502])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb4/igt@kms_vrr@flip-basic.html

  * igt@kms_writeback@writeback-check-output:
    - shard-apl:          NOTRUN -> [SKIP][131] ([fdo#109271] / [i915#2437])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-apl8/igt@kms_writeback@writeback-check-output.html

  * igt@nouveau_crc@pipe-b-source-outp-inactive:
    - shard-iclb:         NOTRUN -> [SKIP][132] ([i915#2530])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb4/igt@nouveau_crc@pipe-b-source-outp-inactive.html
    - shard-tglb:         NOTRUN -> [SKIP][133] ([i915#2530])
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-tglb1/igt@nouveau_crc@pipe-b-source-outp-inactive.html

  * igt@prime_nv_api@i915_nv_reimport_twice_check_flink_name:
    - shard-iclb:         NOTRUN -> [SKIP][134] ([fdo#109291]) +1 similar issue
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/shard-iclb4/igt@prime_nv_api@i915_nv_re

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6438/index.html

[-- Attachment #2: Type: text/html, Size: 33785 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-26  1:57 [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission Mastan Katragadda
  2021-11-26  2:42 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
  2021-11-26  8:46 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
@ 2021-11-29 10:30 ` Tvrtko Ursulin
  2021-11-29 10:58   ` Katragadda, MastanX
  2 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-11-29 10:30 UTC (permalink / raw)
  To: Mastan Katragadda, igt-dev, tejaskumarx.surendrakumar.upadhyay


On 26/11/2021 01:57, Mastan Katragadda wrote:
> This a known failure when running igt@gem_exec_balancer@bonded-(dual|pair|sync)
> tests with GuC submission.The hang is expected with GuC submission since the
> test was written to expect execlist scheduling hence added skip if Guc
> submission enabled.

Looking at the test I don't see anything backend specific. Can you 
provide an explanation of how and why it hangs?

Regards,

Tvrtko

> Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
> ---
>   tests/i915/gem_exec_balancer.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
> index cc07a5a9..d58734ab 100644
> --- a/tests/i915/gem_exec_balancer.c
> +++ b/tests/i915/gem_exec_balancer.c
> @@ -3320,6 +3320,7 @@ igt_main
>   
>   	igt_subtest_group {
>   		igt_fixture {
> +			igt_require(!gem_using_guc_submission(i915));
>   			intel_allocator_multiprocess_start();
>   		}
>   
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-29 10:30 ` [igt-dev] [i-g-t] " Tvrtko Ursulin
@ 2021-11-29 10:58   ` Katragadda, MastanX
  2021-11-29 11:15     ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Katragadda, MastanX @ 2021-11-29 10:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev, Surendrakumar Upadhyay, TejaskumarX

Hi   Tvrtko Ursulin,
             Based on following backend information added skip on guc enabled platforms.    

basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.

Test is expected to fail with GuC submission, skip it in CI.

Regards,
Mastan

-----Original Message-----
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> 
Sent: 29 November 2021 16:01
To: Katragadda, MastanX <mastanx.katragadda@intel.com>; igt-dev@lists.freedesktop.org; Surendrakumar Upadhyay, TejaskumarX <tejaskumarx.surendrakumar.upadhyay@intel.com>
Subject: Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission


On 26/11/2021 01:57, Mastan Katragadda wrote:
> This a known failure when running 
> igt@gem_exec_balancer@bonded-(dual|pair|sync)
> tests with GuC submission.The hang is expected with GuC submission 
> since the test was written to expect execlist scheduling hence added 
> skip if Guc submission enabled.

Looking at the test I don't see anything backend specific. Can you provide an explanation of how and why it hangs?

Regards,

Tvrtko

> Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
> ---
>   tests/i915/gem_exec_balancer.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/i915/gem_exec_balancer.c 
> b/tests/i915/gem_exec_balancer.c index cc07a5a9..d58734ab 100644
> --- a/tests/i915/gem_exec_balancer.c
> +++ b/tests/i915/gem_exec_balancer.c
> @@ -3320,6 +3320,7 @@ igt_main
>   
>   	igt_subtest_group {
>   		igt_fixture {
> +			igt_require(!gem_using_guc_submission(i915));
>   			intel_allocator_multiprocess_start();
>   		}
>   
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-29 10:58   ` Katragadda, MastanX
@ 2021-11-29 11:15     ` Tvrtko Ursulin
  2021-11-30 16:48       ` Matthew Brost
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-11-29 11:15 UTC (permalink / raw)
  To: Katragadda, MastanX, igt-dev, Surendrakumar Upadhyay, TejaskumarX


On 29/11/2021 10:58, Katragadda, MastanX wrote:
> Hi   Tvrtko Ursulin,
>               Based on following backend information added skip on guc enabled platforms.
> 
> basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
> 
> Test is expected to fail with GuC submission, skip it in CI.

Does "blocks the rest of the queue" mean unrelated contexts submitted 
against the same engine class?

If so then it would be a DoS vector and the "user _should_ always" would 
not be sufficient.

Or if the blockage is localised to a single context then it might be 
fine (but unfortunate) if on top we chose to disallow submission to 
non-virtual indices in the engine map (in case of GuC)? If the firmware 
bug is not getting fixed that is. I may be on the wrong track here since 
I am not 100% certain I figured out why it exactly gets stuck.

Because, looking at the bonded-pair to start with, if the test is 
emitting a pair of request on the same context, spinner first, then a 
another one with a semaphore dependency I am not sure why it hangs. When 
the spinner switches out after time slice expires the second request 
should run, cancel the spinner and exit. At which point they are both 
complete.

Regards,

Tvrtko

> Regards,
> Mastan
> 
> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: 29 November 2021 16:01
> To: Katragadda, MastanX <mastanx.katragadda@intel.com>; igt-dev@lists.freedesktop.org; Surendrakumar Upadhyay, TejaskumarX <tejaskumarx.surendrakumar.upadhyay@intel.com>
> Subject: Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
> 
> 
> On 26/11/2021 01:57, Mastan Katragadda wrote:
>> This a known failure when running
>> igt@gem_exec_balancer@bonded-(dual|pair|sync)
>> tests with GuC submission.The hang is expected with GuC submission
>> since the test was written to expect execlist scheduling hence added
>> skip if Guc submission enabled.
> 
> Looking at the test I don't see anything backend specific. Can you provide an explanation of how and why it hangs?
> 
> Regards,
> 
> Tvrtko
> 
>> Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
>> ---
>>    tests/i915/gem_exec_balancer.c | 1 +
>>    1 file changed, 1 insertion(+)
>>
>> diff --git a/tests/i915/gem_exec_balancer.c
>> b/tests/i915/gem_exec_balancer.c index cc07a5a9..d58734ab 100644
>> --- a/tests/i915/gem_exec_balancer.c
>> +++ b/tests/i915/gem_exec_balancer.c
>> @@ -3320,6 +3320,7 @@ igt_main
>>    
>>    	igt_subtest_group {
>>    		igt_fixture {
>> +			igt_require(!gem_using_guc_submission(i915));
>>    			intel_allocator_multiprocess_start();
>>    		}
>>    
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-29 11:15     ` Tvrtko Ursulin
@ 2021-11-30 16:48       ` Matthew Brost
  2021-12-01  9:46         ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Brost @ 2021-11-30 16:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev, Katragadda, MastanX

On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
> 
> On 29/11/2021 10:58, Katragadda, MastanX wrote:
> > Hi   Tvrtko Ursulin,
> >               Based on following backend information added skip on guc enabled platforms.
> > 
> > basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
> > 
> > Test is expected to fail with GuC submission, skip it in CI.
> 
> Does "blocks the rest of the queue" mean unrelated contexts submitted
> against the same engine class?
> 

Yes.

> If so then it would be a DoS vector and the "user _should_ always" would not
> be sufficient.
>

This is a DoS vector but there are about a million others and this no
worse than most. 

> Or if the blockage is localised to a single context then it might be fine
> (but unfortunate) if on top we chose to disallow submission to non-virtual
> indices in the engine map (in case of GuC)? If the firmware bug is not

We discussed this we basically land on if the UMDs want to do something
stupid, let them as with all the other DoS in the i915. We likely can't
disable non-virtual submission as some UMDs want to explictly place some
contexts on certain engines (compute). In that case the UMD has to be
smart enough to not submit contexts in a way to expose this scheduling
quirk.

> getting fixed that is. I may be on the wrong track here since I am not 100%

Just because the GuC scheduling doesn't work like execlists doesn't mean
that it is bug. Checking for the HoQ only is done for a good reason -
performance a uC can do everything and anything like execlists as it is
single low power CPU controling the scheduling of an entire GT. We can't
protect against everything a user can do that is stupid.

FWIW we do have an open task assigned to the GuC team to allow a KMD
configurable search depth when the queue can't be scheduled. No idea
when this is going to get implemented.

> certain I figured out why it exactly gets stuck.
> 
> Because, looking at the bonded-pair to start with, if the test is emitting a
> pair of request on the same context, spinner first, then a another one with
> a semaphore dependency I am not sure why it hangs. When the spinner switches
> out after time slice expires the second request should run, cancel the
> spinner and exit. At which point they are both complete.
>

I think only the MT version of bonded-pair hangs but it has been a while
since I looked at this.

IMO this fix is 100% correct as this is a known, tracked issue. It was
agreed upon (arch, i915, GuC team) that we just skip these tests with
GuC submission.

Matt

> Regards,
> 
> Tvrtko
> 
> > Regards,
> > Mastan
> > 
> > -----Original Message-----
> > From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Sent: 29 November 2021 16:01
> > To: Katragadda, MastanX <mastanx.katragadda@intel.com>; igt-dev@lists.freedesktop.org; Surendrakumar Upadhyay, TejaskumarX <tejaskumarx.surendrakumar.upadhyay@intel.com>
> > Subject: Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
> > 
> > 
> > On 26/11/2021 01:57, Mastan Katragadda wrote:
> > > This a known failure when running
> > > igt@gem_exec_balancer@bonded-(dual|pair|sync)
> > > tests with GuC submission.The hang is expected with GuC submission
> > > since the test was written to expect execlist scheduling hence added
> > > skip if Guc submission enabled.
> > 
> > Looking at the test I don't see anything backend specific. Can you provide an explanation of how and why it hangs?
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
> > > ---
> > >    tests/i915/gem_exec_balancer.c | 1 +
> > >    1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/tests/i915/gem_exec_balancer.c
> > > b/tests/i915/gem_exec_balancer.c index cc07a5a9..d58734ab 100644
> > > --- a/tests/i915/gem_exec_balancer.c
> > > +++ b/tests/i915/gem_exec_balancer.c
> > > @@ -3320,6 +3320,7 @@ igt_main
> > >    	igt_subtest_group {
> > >    		igt_fixture {
> > > +			igt_require(!gem_using_guc_submission(i915));
> > >    			intel_allocator_multiprocess_start();
> > >    		}
> > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-11-30 16:48       ` Matthew Brost
@ 2021-12-01  9:46         ` Tvrtko Ursulin
  2021-12-01 11:36           ` Radoslaw Szwichtenberg
  2021-12-01 23:42           ` Matthew Brost
  0 siblings, 2 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-01  9:46 UTC (permalink / raw)
  To: Matthew Brost; +Cc: igt-dev, Katragadda, MastanX


On 30/11/2021 16:48, Matthew Brost wrote:
> On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
>>
>> On 29/11/2021 10:58, Katragadda, MastanX wrote:
>>> Hi   Tvrtko Ursulin,
>>>                Based on following backend information added skip on guc enabled platforms.
>>>
>>> basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
>>>
>>> Test is expected to fail with GuC submission, skip it in CI.
>>
>> Does "blocks the rest of the queue" mean unrelated contexts submitted
>> against the same engine class?
>>
> 
> Yes.
> 
>> If so then it would be a DoS vector and the "user _should_ always" would not
>> be sufficient.
>>
> 
> This is a DoS vector but there are about a million others and this no
> worse than most.

Adding new ones should not be taken lightly.

>> Or if the blockage is localised to a single context then it might be fine
>> (but unfortunate) if on top we chose to disallow submission to non-virtual
>> indices in the engine map (in case of GuC)? If the firmware bug is not
> 
> We discussed this we basically land on if the UMDs want to do something
> stupid, let them as with all the other DoS in the i915. We likely can't
> disable non-virtual submission as some UMDs want to explictly place some
> contexts on certain engines (compute). In that case the UMD has to be
> smart enough to not submit contexts in a way to expose this scheduling
> quirk.

"Quirk"... :) Why it happens though? Is the firmware scheduling with GEM 
context granularity, and not intel_context, and that is the disconnect?

>> getting fixed that is. I may be on the wrong track here since I am not 100%
> 
> Just because the GuC scheduling doesn't work like execlists doesn't mean
> that it is bug. Checking for the HoQ only is done for a good reason -
> performance a uC can do everything and anything like execlists as it is
> single low power CPU controling the scheduling of an entire GT. We can't
> protect against everything a user can do that is stupid.
> 
> FWIW we do have an open task assigned to the GuC team to allow a KMD
> configurable search depth when the queue can't be scheduled. No idea
> when this is going to get implemented.

This maybe too specific. Question rather could be why is something not 
runnable put at the head of the queue by the firmware. Sounds like a 
plain bug to me.

>> certain I figured out why it exactly gets stuck.
>>
>> Because, looking at the bonded-pair to start with, if the test is emitting a
>> pair of request on the same context, spinner first, then a another one with
>> a semaphore dependency I am not sure why it hangs. When the spinner switches
>> out after time slice expires the second request should run, cancel the
>> spinner and exit. At which point they are both complete.
>>
> 
> I think only the MT version of bonded-pair hangs but it has been a while
> since I looked at this.
> 
> IMO this fix is 100% correct as this is a known, tracked issue. It was
> agreed upon (arch, i915, GuC team) that we just skip these tests with
> GuC submission.

I915 team is here on upstream as well.

Record those acks publicly would be my ask. Unless some security by 
obscurity is happening here? Until then from me it is a soft nack to 
keep disabling tests which show genuine weaknesses in GuC mode. Soft 
until we get a public record of exactly what is broken and in what 
circumstances, acked by architects publicly as you say they acked it 
somewhere. Commit message devoid of detail is not good enough.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01  9:46         ` Tvrtko Ursulin
@ 2021-12-01 11:36           ` Radoslaw Szwichtenberg
  2021-12-01 11:46             ` Daniel Vetter
  2021-12-01 23:42           ` Matthew Brost
  1 sibling, 1 reply; 23+ messages in thread
From: Radoslaw Szwichtenberg @ 2021-12-01 11:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost; +Cc: igt-dev, Katragadda, MastanX

On 01/12/2021 10:46, Tvrtko Ursulin wrote:
> 
> On 30/11/2021 16:48, Matthew Brost wrote:
>>
>> IMO this fix is 100% correct as this is a known, tracked issue. It was
>> agreed upon (arch, i915, GuC team) that we just skip these tests with
>> GuC submission.
This does not look like a fix to me - you just disable test to hide the 
result. If this issue is recorded with a bug, is tracked - why cant we 
just let this test fail till we get this issue fixed?
> 
> I915 team is here on upstream as well.
> 
> Record those acks publicly would be my ask. Unless some security by 
> obscurity is happening here? Until then from me it is a soft nack to 
> keep disabling tests which show genuine weaknesses in GuC mode. Soft 
> until we get a public record of exactly what is broken and in what 
> circumstances, acked by architects publicly as you say they acked it 
> somewhere. Commit message devoid of detail is not good enough.
This should be most probably documented in the bug, right? Here we 
should just keep the test as is till the issue is fixed. I don't see how 
docummenting an issue would enable us to just disable the test.

Radek

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 11:36           ` Radoslaw Szwichtenberg
@ 2021-12-01 11:46             ` Daniel Vetter
  2021-12-01 12:14               ` Tvrtko Ursulin
  2021-12-01 23:57               ` Matthew Brost
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel Vetter @ 2021-12-01 11:46 UTC (permalink / raw)
  To: Radoslaw Szwichtenberg; +Cc: igt-dev, Katragadda, MastanX

On Wed, Dec 1, 2021 at 12:37 PM Radoslaw Szwichtenberg
<radoslaw.szwichtenberg@intel.com> wrote:
> On 01/12/2021 10:46, Tvrtko Ursulin wrote:
> > On 30/11/2021 16:48, Matthew Brost wrote:
> >>
> >> IMO this fix is 100% correct as this is a known, tracked issue. It was
> >> agreed upon (arch, i915, GuC team) that we just skip these tests with
> >> GuC submission.
> This does not look like a fix to me - you just disable test to hide the
> result. If this issue is recorded with a bug, is tracked - why cant we
> just let this test fail till we get this issue fixed?

This is correct in general, but sadly not for gem igts and selftests.
The state of our validation suite is screwed up enough that
unfortunately the safe starting point for failing tests is that the
test is simply wrong, or too much just validating implementation
details of the current platform/driver, while not actually validating
stuff that should be tested for.

> > I915 team is here on upstream as well.
> >
> > Record those acks publicly would be my ask. Unless some security by
> > obscurity is happening here? Until then from me it is a soft nack to
> > keep disabling tests which show genuine weaknesses in GuC mode. Soft
> > until we get a public record of exactly what is broken and in what
> > circumstances, acked by architects publicly as you say they acked it
> > somewhere. Commit message devoid of detail is not good enough.
> This should be most probably documented in the bug, right? Here we
> should just keep the test as is till the issue is fixed. I don't see how
> docummenting an issue would enable us to just disable the test.

Sadly the situation is bad enough that I'm tempted to just drop a few
thousand Acked-by: me tags in this thread for any case where a
questionable testcase gets in the way. Unless someone can proof that
it's a POR architectural requirement we're validating here.

I do agree though that really we should just delete such tests
outright, not hide the mess on each platform individually.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 11:46             ` Daniel Vetter
@ 2021-12-01 12:14               ` Tvrtko Ursulin
  2021-12-01 23:57               ` Matthew Brost
  1 sibling, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-01 12:14 UTC (permalink / raw)
  To: Daniel Vetter, Radoslaw Szwichtenberg; +Cc: igt-dev, Katragadda, MastanX


On 01/12/2021 11:46, Daniel Vetter wrote:
> On Wed, Dec 1, 2021 at 12:37 PM Radoslaw Szwichtenberg
> <radoslaw.szwichtenberg@intel.com> wrote:
>> On 01/12/2021 10:46, Tvrtko Ursulin wrote:
>>> On 30/11/2021 16:48, Matthew Brost wrote:
>>>>
>>>> IMO this fix is 100% correct as this is a known, tracked issue. It was
>>>> agreed upon (arch, i915, GuC team) that we just skip these tests with
>>>> GuC submission.
>> This does not look like a fix to me - you just disable test to hide the
>> result. If this issue is recorded with a bug, is tracked - why cant we
>> just let this test fail till we get this issue fixed?
> 
> This is correct in general, but sadly not for gem igts and selftests.
> The state of our validation suite is screwed up enough that
> unfortunately the safe starting point for failing tests is that the
> test is simply wrong, or too much just validating implementation
> details of the current platform/driver, while not actually validating
> stuff that should be tested for.
> 
>>> I915 team is here on upstream as well.
>>>
>>> Record those acks publicly would be my ask. Unless some security by
>>> obscurity is happening here? Until then from me it is a soft nack to
>>> keep disabling tests which show genuine weaknesses in GuC mode. Soft
>>> until we get a public record of exactly what is broken and in what
>>> circumstances, acked by architects publicly as you say they acked it
>>> somewhere. Commit message devoid of detail is not good enough.
>> This should be most probably documented in the bug, right? Here we
>> should just keep the test as is till the issue is fixed. I don't see how
>> docummenting an issue would enable us to just disable the test.
> 
> Sadly the situation is bad enough that I'm tempted to just drop a few
> thousand Acked-by: me tags in this thread for any case where a
> questionable testcase gets in the way. Unless someone can proof that
> it's a POR architectural requirement we're validating here.
> 
> I do agree though that really we should just delete such tests
> outright, not hide the mess on each platform individually.

One ack is enough, thousands shouldn't be needed. :) But, since the test 
by accident showed how GuC firmware can get apparently completely 
blocked and confused by innocent userspace operations, please apply that 
ack against something with a proper commit message.

Statements such as "it's just one more DoS", "agreed by the i915 team" 
(where?) and commit message devoid of detail are not at the standard you 
yourself are otherwise advocating.

And on the technical level I really would like to know why and how GuC 
ends up with a non-runnable item stuck at the top of it's scheduling 
queue. AKA being able to understand the issue fully.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01  9:46         ` Tvrtko Ursulin
  2021-12-01 11:36           ` Radoslaw Szwichtenberg
@ 2021-12-01 23:42           ` Matthew Brost
  2021-12-02  9:19             ` Petri Latvala
                               ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Matthew Brost @ 2021-12-01 23:42 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev, Katragadda, MastanX

On Wed, Dec 01, 2021 at 09:46:02AM +0000, Tvrtko Ursulin wrote:
> 
> On 30/11/2021 16:48, Matthew Brost wrote:
> > On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 29/11/2021 10:58, Katragadda, MastanX wrote:
> > > > Hi   Tvrtko Ursulin,
> > > >                Based on following backend information added skip on guc enabled platforms.
> > > > 
> > > > basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
> > > > 
> > > > Test is expected to fail with GuC submission, skip it in CI.
> > > 
> > > Does "blocks the rest of the queue" mean unrelated contexts submitted
> > > against the same engine class?
> > > 
> > 
> > Yes.
> > 
> > > If so then it would be a DoS vector and the "user _should_ always" would not
> > > be sufficient.
> > > 
> > 
> > This is a DoS vector but there are about a million others and this no
> > worse than most.
> 
> Adding new ones should not be taken lightly.
> 
> > > Or if the blockage is localised to a single context then it might be fine
> > > (but unfortunate) if on top we chose to disallow submission to non-virtual
> > > indices in the engine map (in case of GuC)? If the firmware bug is not
> > 
> > We discussed this we basically land on if the UMDs want to do something
> > stupid, let them as with all the other DoS in the i915. We likely can't
> > disable non-virtual submission as some UMDs want to explictly place some
> > contexts on certain engines (compute). In that case the UMD has to be
> > smart enough to not submit contexts in a way to expose this scheduling
> > quirk.
> 
> "Quirk"... :) Why it happens though? Is the firmware scheduling with GEM
> context granularity, and not intel_context, and that is the disconnect?
> 

The firmware has no idea of GEM contexts or requests only contexts (a LRC).

The scheduling queue is per class, if the HoQ can't be scheduled it
blocks scheduling for that queue until the HoQ can be scheduled. If the
user is using VEs then the HoQ can't block unless all engines within the
class are full. This all goes back to making VEs optional to the user,
once we did that we have exposed ourselves to dump users.

Again this not a bug in the GuC firmware, this is by design and was SoB
by multiple architects. The i915 team doesn't control the GuC firmware
arch BTW. We can ask for changes, but at the end of the day we can't
force any changes without SoB from multiple architects, on multiple
teams.

The current proposal is add a per class KMD controlled policy which
controls the search depth of the queue to find a context which can be
scheduled (e.g. search so many entires past of the HoQ). It is likely to
get implemented.

> > > getting fixed that is. I may be on the wrong track here since I am not 100%
> > 
> > Just because the GuC scheduling doesn't work like execlists doesn't mean
> > that it is bug. Checking for the HoQ only is done for a good reason -
> > performance a uC can do everything and anything like execlists as it is
> > single low power CPU controling the scheduling of an entire GT. We can't
> > protect against everything a user can do that is stupid.
> > 
> > FWIW we do have an open task assigned to the GuC team to allow a KMD
> > configurable search depth when the queue can't be scheduled. No idea
> > when this is going to get implemented.
> 
> This maybe too specific. Question rather could be why is something not
> runnable put at the head of the queue by the firmware. Sounds like a plain
> bug to me.
> 

See above. As far as I'm concerned this is not a bug.

Also see Daniel's comments, this is basically a broken test which
assumes certain scheduler behavior. It is not testing ABI or a POR
arch requirement.

> > > certain I figured out why it exactly gets stuck.
> > > 
> > > Because, looking at the bonded-pair to start with, if the test is emitting a
> > > pair of request on the same context, spinner first, then a another one with
> > > a semaphore dependency I am not sure why it hangs. When the spinner switches
> > > out after time slice expires the second request should run, cancel the
> > > spinner and exit. At which point they are both complete.
> > > 
> > 
> > I think only the MT version of bonded-pair hangs but it has been a while
> > since I looked at this.
> > 
> > IMO this fix is 100% correct as this is a known, tracked issue. It was
> > agreed upon (arch, i915, GuC team) that we just skip these tests with
> > GuC submission.
> 
> I915 team is here on upstream as well.
> 
> Record those acks publicly would be my ask. Unless some security by
> obscurity is happening here? Until then from me it is a soft nack to keep
> disabling tests which show genuine weaknesses in GuC mode. Soft until we get
> a public record of exactly what is broken and in what circumstances, acked
> by architects publicly as you say they acked it somewhere. Commit message
> devoid of detail is not good enough.

I'm fine with not fixing this but you'd have to ask the CI if they can
live with this. If they can't (it crashes CI runs) then we have to skip
this within the test.

Matt

> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 11:46             ` Daniel Vetter
  2021-12-01 12:14               ` Tvrtko Ursulin
@ 2021-12-01 23:57               ` Matthew Brost
  1 sibling, 0 replies; 23+ messages in thread
From: Matthew Brost @ 2021-12-01 23:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: igt-dev, Katragadda, MastanX

On Wed, Dec 01, 2021 at 12:46:06PM +0100, Daniel Vetter wrote:
> On Wed, Dec 1, 2021 at 12:37 PM Radoslaw Szwichtenberg
> <radoslaw.szwichtenberg@intel.com> wrote:
> > On 01/12/2021 10:46, Tvrtko Ursulin wrote:
> > > On 30/11/2021 16:48, Matthew Brost wrote:
> > >>
> > >> IMO this fix is 100% correct as this is a known, tracked issue. It was
> > >> agreed upon (arch, i915, GuC team) that we just skip these tests with
> > >> GuC submission.
> > This does not look like a fix to me - you just disable test to hide the
> > result. If this issue is recorded with a bug, is tracked - why cant we
> > just let this test fail till we get this issue fixed?
> 
> This is correct in general, but sadly not for gem igts and selftests.
> The state of our validation suite is screwed up enough that
> unfortunately the safe starting point for failing tests is that the
> test is simply wrong, or too much just validating implementation
> details of the current platform/driver, while not actually validating
> stuff that should be tested for.
> 

That is exactly what this test is doing - IMO it is validating details
of backend scheduler not ABI nor a POR arch requirement. 

> > > I915 team is here on upstream as well.
> > >
> > > Record those acks publicly would be my ask. Unless some security by
> > > obscurity is happening here? Until then from me it is a soft nack to
> > > keep disabling tests which show genuine weaknesses in GuC mode. Soft
> > > until we get a public record of exactly what is broken and in what
> > > circumstances, acked by architects publicly as you say they acked it
> > > somewhere. Commit message devoid of detail is not good enough.
> > This should be most probably documented in the bug, right? Here we
> > should just keep the test as is till the issue is fixed. I don't see how
> > docummenting an issue would enable us to just disable the test.
> 
> Sadly the situation is bad enough that I'm tempted to just drop a few
> thousand Acked-by: me tags in this thread for any case where a
> questionable testcase gets in the way. Unless someone can proof that
> it's a POR architectural requirement we're validating here.
>

Agree our selftests / IGTs are a complete mess with many questionable
test cases. You wouldn't believe the amount of time / effort we waste
looking into these.

> I do agree though that really we should just delete such tests
> outright, not hide the mess on each platform individually.

Agree likely should do an audit of our tests and delete some of them.

Matt

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 23:42           ` Matthew Brost
@ 2021-12-02  9:19             ` Petri Latvala
  2021-12-02  9:20             ` Tvrtko Ursulin
  2021-12-03 19:36             ` Matthew Brost
  2 siblings, 0 replies; 23+ messages in thread
From: Petri Latvala @ 2021-12-02  9:19 UTC (permalink / raw)
  To: Matthew Brost; +Cc: igt-dev, Katragadda, MastanX

On Wed, Dec 01, 2021 at 03:42:34PM -0800, Matthew Brost wrote:
> On Wed, Dec 01, 2021 at 09:46:02AM +0000, Tvrtko Ursulin wrote:
> > 
> > On 30/11/2021 16:48, Matthew Brost wrote:
> > > On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 29/11/2021 10:58, Katragadda, MastanX wrote:
> > > > > Hi   Tvrtko Ursulin,
> > > > >                Based on following backend information added skip on guc enabled platforms.
> > > > > 
> > > > > basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
> > > > > 
> > > > > Test is expected to fail with GuC submission, skip it in CI.
> > > > 
> > > > Does "blocks the rest of the queue" mean unrelated contexts submitted
> > > > against the same engine class?
> > > > 
> > > 
> > > Yes.
> > > 
> > > > If so then it would be a DoS vector and the "user _should_ always" would not
> > > > be sufficient.
> > > > 
> > > 
> > > This is a DoS vector but there are about a million others and this no
> > > worse than most.
> > 
> > Adding new ones should not be taken lightly.
> > 
> > > > Or if the blockage is localised to a single context then it might be fine
> > > > (but unfortunate) if on top we chose to disallow submission to non-virtual
> > > > indices in the engine map (in case of GuC)? If the firmware bug is not
> > > 
> > > We discussed this we basically land on if the UMDs want to do something
> > > stupid, let them as with all the other DoS in the i915. We likely can't
> > > disable non-virtual submission as some UMDs want to explictly place some
> > > contexts on certain engines (compute). In that case the UMD has to be
> > > smart enough to not submit contexts in a way to expose this scheduling
> > > quirk.
> > 
> > "Quirk"... :) Why it happens though? Is the firmware scheduling with GEM
> > context granularity, and not intel_context, and that is the disconnect?
> > 
> 
> The firmware has no idea of GEM contexts or requests only contexts (a LRC).
> 
> The scheduling queue is per class, if the HoQ can't be scheduled it
> blocks scheduling for that queue until the HoQ can be scheduled. If the
> user is using VEs then the HoQ can't block unless all engines within the
> class are full. This all goes back to making VEs optional to the user,
> once we did that we have exposed ourselves to dump users.
> 
> Again this not a bug in the GuC firmware, this is by design and was SoB
> by multiple architects. The i915 team doesn't control the GuC firmware
> arch BTW. We can ask for changes, but at the end of the day we can't
> force any changes without SoB from multiple architects, on multiple
> teams.
> 
> The current proposal is add a per class KMD controlled policy which
> controls the search depth of the queue to find a context which can be
> scheduled (e.g. search so many entires past of the HoQ). It is likely to
> get implemented.
> 
> > > > getting fixed that is. I may be on the wrong track here since I am not 100%
> > > 
> > > Just because the GuC scheduling doesn't work like execlists doesn't mean
> > > that it is bug. Checking for the HoQ only is done for a good reason -
> > > performance a uC can do everything and anything like execlists as it is
> > > single low power CPU controling the scheduling of an entire GT. We can't
> > > protect against everything a user can do that is stupid.
> > > 
> > > FWIW we do have an open task assigned to the GuC team to allow a KMD
> > > configurable search depth when the queue can't be scheduled. No idea
> > > when this is going to get implemented.
> > 
> > This maybe too specific. Question rather could be why is something not
> > runnable put at the head of the queue by the firmware. Sounds like a plain
> > bug to me.
> > 
> 
> See above. As far as I'm concerned this is not a bug.
> 
> Also see Daniel's comments, this is basically a broken test which
> assumes certain scheduler behavior. It is not testing ABI or a POR
> arch requirement.
> 
> > > > certain I figured out why it exactly gets stuck.
> > > > 
> > > > Because, looking at the bonded-pair to start with, if the test is emitting a
> > > > pair of request on the same context, spinner first, then a another one with
> > > > a semaphore dependency I am not sure why it hangs. When the spinner switches
> > > > out after time slice expires the second request should run, cancel the
> > > > spinner and exit. At which point they are both complete.
> > > > 
> > > 
> > > I think only the MT version of bonded-pair hangs but it has been a while
> > > since I looked at this.
> > > 
> > > IMO this fix is 100% correct as this is a known, tracked issue. It was
> > > agreed upon (arch, i915, GuC team) that we just skip these tests with
> > > GuC submission.
> > 
> > I915 team is here on upstream as well.
> > 
> > Record those acks publicly would be my ask. Unless some security by
> > obscurity is happening here? Until then from me it is a soft nack to keep
> > disabling tests which show genuine weaknesses in GuC mode. Soft until we get
> > a public record of exactly what is broken and in what circumstances, acked
> > by architects publicly as you say they acked it somewhere. Commit message
> > devoid of detail is not good enough.
> 
> I'm fine with not fixing this but you'd have to ask the CI if they can
> live with this. If they can't (it crashes CI runs) then we have to skip
> this within the test.

It's https://gitlab.freedesktop.org/drm/intel/-/issues/4362 right?

That's a fairly well-behaving failure that gets recovered fine for the
next test. CI is fine with having the failure as is.


-- 
Petri Latvala

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 23:42           ` Matthew Brost
  2021-12-02  9:19             ` Petri Latvala
@ 2021-12-02  9:20             ` Tvrtko Ursulin
  2021-12-03 19:36             ` Matthew Brost
  2 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-02  9:20 UTC (permalink / raw)
  To: Matthew Brost; +Cc: igt-dev, Katragadda, MastanX


On 01/12/2021 23:42, Matthew Brost wrote:
> On Wed, Dec 01, 2021 at 09:46:02AM +0000, Tvrtko Ursulin wrote:
>>
>> On 30/11/2021 16:48, Matthew Brost wrote:
>>> On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 29/11/2021 10:58, Katragadda, MastanX wrote:
>>>>> Hi   Tvrtko Ursulin,
>>>>>                 Based on following backend information added skip on guc enabled platforms.
>>>>>
>>>>> basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
>>>>>
>>>>> Test is expected to fail with GuC submission, skip it in CI.
>>>>
>>>> Does "blocks the rest of the queue" mean unrelated contexts submitted
>>>> against the same engine class?
>>>>
>>>
>>> Yes.
>>>
>>>> If so then it would be a DoS vector and the "user _should_ always" would not
>>>> be sufficient.
>>>>
>>>
>>> This is a DoS vector but there are about a million others and this no
>>> worse than most.
>>
>> Adding new ones should not be taken lightly.
>>
>>>> Or if the blockage is localised to a single context then it might be fine
>>>> (but unfortunate) if on top we chose to disallow submission to non-virtual
>>>> indices in the engine map (in case of GuC)? If the firmware bug is not
>>>
>>> We discussed this we basically land on if the UMDs want to do something
>>> stupid, let them as with all the other DoS in the i915. We likely can't
>>> disable non-virtual submission as some UMDs want to explictly place some
>>> contexts on certain engines (compute). In that case the UMD has to be
>>> smart enough to not submit contexts in a way to expose this scheduling
>>> quirk.
>>
>> "Quirk"... :) Why it happens though? Is the firmware scheduling with GEM
>> context granularity, and not intel_context, and that is the disconnect?
>>
> 
> The firmware has no idea of GEM contexts or requests only contexts (a LRC).
> 
> The scheduling queue is per class, if the HoQ can't be scheduled it
> blocks scheduling for that queue until the HoQ can be scheduled. If the
> user is using VEs then the HoQ can't block unless all engines within the
> class are full. This all goes back to making VEs optional to the user,
> once we did that we have exposed ourselves to dump users.
> 
> Again this not a bug in the GuC firmware, this is by design and was SoB
> by multiple architects. The i915 team doesn't control the GuC firmware
> arch BTW. We can ask for changes, but at the end of the day we can't
> force any changes without SoB from multiple architects, on multiple
> teams.
> 
> The current proposal is add a per class KMD controlled policy which
> controls the search depth of the queue to find a context which can be
> scheduled (e.g. search so many entires past of the HoQ). It is likely to
> get implemented.

That's all fine and dandy as an explanation for someone who knows how 
GuC firmware works but is not what I am asking for.

Please write a commit message for this removal which explains which 
currently possible submission scenarios cannot be handled by GuC.

You can then expand into why, and GuC FW implementation details if you 
want, but is not the primary ask.

Start with the particular example of the bonded-pair subtest. As far as 
I can see it emits two requests on two contexts. First spins until 
second does a SDW. Second has a semaphore wait on first making SDW. So 
execution flow should end up like this:

  ctxA rq1 sdw1
  ctxA rq1 spin
<timeslice switches it out>
  ctxB rq1 semaphore wait - satisfied due sdw1
  ctxB rq1 sdw2
  ctxB rq1 completed
<completed - context switch>
  ctxA rq1 completed due sdw2

Please either explain that I didn't get what the test is doing, or if I 
have, how GuC manages to not cope with it. Like what does GuC sees as 
"head of the queue" in this case and why it is not schedulable.

When the root cause is explained what is needed is to come up with all 
possible entry points into this failure mode.

>>>> getting fixed that is. I may be on the wrong track here since I am not 100%
>>>
>>> Just because the GuC scheduling doesn't work like execlists doesn't mean
>>> that it is bug. Checking for the HoQ only is done for a good reason -
>>> performance a uC can do everything and anything like execlists as it is
>>> single low power CPU controling the scheduling of an entire GT. We can't
>>> protect against everything a user can do that is stupid.
>>>
>>> FWIW we do have an open task assigned to the GuC team to allow a KMD
>>> configurable search depth when the queue can't be scheduled. No idea
>>> when this is going to get implemented.
>>
>> This maybe too specific. Question rather could be why is something not
>> runnable put at the head of the queue by the firmware. Sounds like a plain
>> bug to me.
>>
> 
> See above. As far as I'm concerned this is not a bug.
> 
> Also see Daniel's comments, this is basically a broken test which
> assumes certain scheduler behavior. It is not testing ABI or a POR
> arch requirement.
> 
>>>> certain I figured out why it exactly gets stuck.
>>>>
>>>> Because, looking at the bonded-pair to start with, if the test is emitting a
>>>> pair of request on the same context, spinner first, then a another one with
>>>> a semaphore dependency I am not sure why it hangs. When the spinner switches
>>>> out after time slice expires the second request should run, cancel the
>>>> spinner and exit. At which point they are both complete.
>>>>
>>>
>>> I think only the MT version of bonded-pair hangs but it has been a while
>>> since I looked at this.
>>>
>>> IMO this fix is 100% correct as this is a known, tracked issue. It was
>>> agreed upon (arch, i915, GuC team) that we just skip these tests with
>>> GuC submission.
>>
>> I915 team is here on upstream as well.
>>
>> Record those acks publicly would be my ask. Unless some security by
>> obscurity is happening here? Until then from me it is a soft nack to keep
>> disabling tests which show genuine weaknesses in GuC mode. Soft until we get
>> a public record of exactly what is broken and in what circumstances, acked
>> by architects publicly as you say they acked it somewhere. Commit message
>> devoid of detail is not good enough.
> 
> I'm fine with not fixing this but you'd have to ask the CI if they can
> live with this. If they can't (it crashes CI runs) then we have to skip
> this within the test.

That's not what I said. I said we need a proper commit message to go 
with adding skips (or removals) so it is recorded what has been swept 
under the carpet. Only then acks make sense. Acks against a commit 
message which says nothing mean nothing in terms of accountability and 
process.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-01 23:42           ` Matthew Brost
  2021-12-02  9:19             ` Petri Latvala
  2021-12-02  9:20             ` Tvrtko Ursulin
@ 2021-12-03 19:36             ` Matthew Brost
  2 siblings, 0 replies; 23+ messages in thread
From: Matthew Brost @ 2021-12-03 19:36 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev, Katragadda, MastanX

On Wed, Dec 01, 2021 at 03:42:34PM -0800, Matthew Brost wrote:
> On Wed, Dec 01, 2021 at 09:46:02AM +0000, Tvrtko Ursulin wrote:
> > 
> > On 30/11/2021 16:48, Matthew Brost wrote:
> > > On Mon, Nov 29, 2021 at 11:15:22AM +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 29/11/2021 10:58, Katragadda, MastanX wrote:
> > > > > Hi   Tvrtko Ursulin,
> > > > >                Based on following backend information added skip on guc enabled platforms.
> > > > > 
> > > > > basically there is a quirk in GuC scheduling where if a context reaches the head of the queue and can't be scheduled it blocks the rest of the queue. A queue is an engine class. So in this case if the user submits to VCS0, VCS0, then VCS1 and the first submission to VCS0 is spinner the VCS1 submission is blocked. This more or less is exactly what this test is doing, thus it hangs. We have a request with the GuC firmware team to be able to tweak this head of queue blocking behaviour but don't expect to land anytime soon. Also in the real world this isn't an issue as the user should always be using VEs which should never block the head of the queue unless all the engines within the class are busy.
> > > > > 
> > > > > Test is expected to fail with GuC submission, skip it in CI.
> > > > 
> > > > Does "blocks the rest of the queue" mean unrelated contexts submitted
> > > > against the same engine class?
> > > > 
> > > 
> > > Yes.
> > > 
> > > > If so then it would be a DoS vector and the "user _should_ always" would not
> > > > be sufficient.
> > > > 
> > > 
> > > This is a DoS vector but there are about a million others and this no
> > > worse than most.
> > 
> > Adding new ones should not be taken lightly.

I have used the wrong terminology here / didn't mean we shouldn't take
DoS seriously. What I really meant to say there are million ways a
malicious user can temporarily deny other users service (e.g. just
create 1000s of contexts with hang batches to all engines), in both
cases we have protections to eventually detect a bad user (hangcheck)
and ban it unblocking other users. IMO a user using physical engines
when it shouldn't and user intentionally hogging engines with hanging
batches are basically the same thing - bad user software. This is
non-issue for well behavied UMDs, there is not a hard DoS vector exposed
here, nor is there a new temporary DoS vector exposed that didn't
already exist.

Matt

> > 
> > > > Or if the blockage is localised to a single context then it might be fine
> > > > (but unfortunate) if on top we chose to disallow submission to non-virtual
> > > > indices in the engine map (in case of GuC)? If the firmware bug is not
> > > 
> > > We discussed this we basically land on if the UMDs want to do something
> > > stupid, let them as with all the other DoS in the i915. We likely can't
> > > disable non-virtual submission as some UMDs want to explictly place some
> > > contexts on certain engines (compute). In that case the UMD has to be
> > > smart enough to not submit contexts in a way to expose this scheduling
> > > quirk.
> > 
> > "Quirk"... :) Why it happens though? Is the firmware scheduling with GEM
> > context granularity, and not intel_context, and that is the disconnect?
> > 
> 
> The firmware has no idea of GEM contexts or requests only contexts (a LRC).
> 
> The scheduling queue is per class, if the HoQ can't be scheduled it
> blocks scheduling for that queue until the HoQ can be scheduled. If the
> user is using VEs then the HoQ can't block unless all engines within the
> class are full. This all goes back to making VEs optional to the user,
> once we did that we have exposed ourselves to dump users.
> 
> Again this not a bug in the GuC firmware, this is by design and was SoB
> by multiple architects. The i915 team doesn't control the GuC firmware
> arch BTW. We can ask for changes, but at the end of the day we can't
> force any changes without SoB from multiple architects, on multiple
> teams.
> 
> The current proposal is add a per class KMD controlled policy which
> controls the search depth of the queue to find a context which can be
> scheduled (e.g. search so many entires past of the HoQ). It is likely to
> get implemented.
> 
> > > > getting fixed that is. I may be on the wrong track here since I am not 100%
> > > 
> > > Just because the GuC scheduling doesn't work like execlists doesn't mean
> > > that it is bug. Checking for the HoQ only is done for a good reason -
> > > performance a uC can do everything and anything like execlists as it is
> > > single low power CPU controling the scheduling of an entire GT. We can't
> > > protect against everything a user can do that is stupid.
> > > 
> > > FWIW we do have an open task assigned to the GuC team to allow a KMD
> > > configurable search depth when the queue can't be scheduled. No idea
> > > when this is going to get implemented.
> > 
> > This maybe too specific. Question rather could be why is something not
> > runnable put at the head of the queue by the firmware. Sounds like a plain
> > bug to me.
> > 
> 
> See above. As far as I'm concerned this is not a bug.
> 
> Also see Daniel's comments, this is basically a broken test which
> assumes certain scheduler behavior. It is not testing ABI or a POR
> arch requirement.
> 
> > > > certain I figured out why it exactly gets stuck.
> > > > 
> > > > Because, looking at the bonded-pair to start with, if the test is emitting a
> > > > pair of request on the same context, spinner first, then a another one with
> > > > a semaphore dependency I am not sure why it hangs. When the spinner switches
> > > > out after time slice expires the second request should run, cancel the
> > > > spinner and exit. At which point they are both complete.
> > > > 
> > > 
> > > I think only the MT version of bonded-pair hangs but it has been a while
> > > since I looked at this.
> > > 
> > > IMO this fix is 100% correct as this is a known, tracked issue. It was
> > > agreed upon (arch, i915, GuC team) that we just skip these tests with
> > > GuC submission.
> > 
> > I915 team is here on upstream as well.
> > 
> > Record those acks publicly would be my ask. Unless some security by
> > obscurity is happening here? Until then from me it is a soft nack to keep
> > disabling tests which show genuine weaknesses in GuC mode. Soft until we get
> > a public record of exactly what is broken and in what circumstances, acked
> > by architects publicly as you say they acked it somewhere. Commit message
> > devoid of detail is not good enough.
> 
> I'm fine with not fixing this but you'd have to ask the CI if they can
> live with this. If they can't (it crashes CI runs) then we have to skip
> this within the test.
> 
> Matt
> 
> > 
> > Regards,
> > 
> > Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-23 11:08       ` Tvrtko Ursulin
@ 2021-12-23 11:16         ` Surendrakumar Upadhyay, TejaskumarX
  0 siblings, 0 replies; 23+ messages in thread
From: Surendrakumar Upadhyay, TejaskumarX @ 2021-12-23 11:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Harrison, John C, Katragadda, MastanX, igt-dev
  Cc: Bloomfield, Jon, Daniel Vetter



> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: 23 December 2021 16:38
> To: Harrison, John C <john.c.harrison@intel.com>; Katragadda, MastanX
> <mastanx.katragadda@intel.com>; igt-dev@lists.freedesktop.org;
> Surendrakumar Upadhyay, TejaskumarX
> <tejaskumarx.surendrakumar.upadhyay@intel.com>
> Cc: Bloomfield, Jon <jon.bloomfield@intel.com>; Daniel Vetter
> <daniel@ffwll.ch>
> Subject: Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc
> Submission
> 
> 
> On 22/12/2021 21:02, John Harrison wrote:
> > On 12/22/2021 01:30, Tvrtko Ursulin wrote:
> >> I see this has been merged despite my complaint that the commit
> >> message is not transparent enough and without public arch level acks.
> > I would ack a revert if you want to post it...
> 
> Revert and re-submit with a better/more complete explanation would be my
> preferred choice yes.

+ Matthew

Ok Mastan please revert the change and post again with complete explanation.

Thanks,
Tejas
> 
> >> On 10/12/2021 10:02, Tvrtko Ursulin wrote:
> >>>
> >>> Old commit message:
> >>>
> >>> """
> >>> This a known failure when running
> >>> igt@gem_exec_balancer@bonded-(dual|pair|sync)
> >>> tests with GuC submission.The hang is expected with GuC submission
> >>> since the test was written to expect execlist scheduling hence added
> >>> skip if Guc submission enabled.
> >>> """
> >>>
> >>> New commit message:
> >>>
> >>> On 09/12/2021 08:20, Mastan Katragadda wrote:
> >>>> This test makes assumptions about the backend scheduling algorithm
> >>>> that a real world UMD would never assume. This test is not testing
> >>>> a UMD-KMD contract, rather a specific backend.It is an invalid test
> >>>> case thus we are skipping.
> >>>>
> >>>> Changes Since v1:
> >>>>               -Updated commit message
> >>>
> >>> Well that hasn't been really improved at all an is still totally
> >>> opaque. No mention whether it is all or some subtests, no mention of
> >>> what assumptions, what exact usage of the ABI is discouraged, not
> >>> much more than pure rewording.
> >>>
> >>> No desire to split into passing and non passing tests and only skip
> >>> latter group?
> > It's not clear from the patch itself, but looking at the code this
> > change appears to only affect a specific set of subtests. Those being
> > 'sliced' and 'bonded-(pair|dual|sync)'. It's not actually disabling
> > the entire gem_exec_balancer suite. That is really not clear from the
> > subject or commit description, though. And I totally agree that it
> > should be documenting a) what the test is doing and b) why that
> > behaviour is both backend specific and unreasonable for a UMD.
> >
> > My understanding is that gem_exec_balancer had already been updated to
> > support the new parallel extension as well as the old bonding extension.
> > So any subtest with 'parallel' as the prefix is supposed to work with
> > GuC submission. Having said that, there are odd other subtests that
> > are not obviously related to either bonded nor parallel submission
> > (such as persistence, noheartbeat, nohangcheck) which I recall do rely
> > on execlist backend implementations and therefore have issues when
> > using GuC submission.
> >
> > I totally agree that we should not be blanket disabling an entire test
> > suite or even sub-group with such a generic stated reason. The bonded
> > tests (and parallel tests) are meant to check for the presence of the
> > appropriate API and skip if not available. They should not be skipping
> > based on arbitrary assumptions about the platform.
> >
> >
> >>>
> >>> Or to add a twist, what about making the specific subtests which
> >>> hang with GuC, expect to hang and therefore pass?
> > Seems unnecessarily complicated. Why require that something is broken?
> > If a test is specific to a backend implementation then the behaviour
> > is undefined on any other implementation. It may hang, it may pass, it
> > may be random each time you run it. If a test is not relevant to a
> > given platform, I would say that it should just be skipped.
> >
> >>> That would a) document the failing corner case of the ABI and more
> >>> importantly b) keep exercising the i915 code paths which sit between
> >>> the uAPI and the GuC firmware?
> > You can't exercise code paths which don't exist. Bonding is
> > fundamentally an i915 only implementation. Persistence is partially
> > implemented inside the execlist backend. So you can't get to the low
> > level code of either when GuC submission is enabled no matter what
> > hideousness you put in the test.
> 
> Sticking to bonding-* subtests..
> 
> Catch here is that even though subtests in question have bonding in their
> names they are not using the bonding extension!
> 
> They are simply using the submit fence with plain virtual engines. And I am
> told submit fence cannot be disabled on the uAPI level (even on GuC
> submission platforms).
> 
> So my concern is for code coverage of using the two in combination. I don't
> think it's good just to not have any (coverage), even if the statement is the
> two /shouldn't/ be used together. If uapi _can_ be used together we have to
> see what happens in the driver if it is. In order to avoid nasty surprises if
> there was some unused code that bit rotted in a dangerous way.
> 
> That was the first thing I asked to ack in clear explicit words.
> 
> Second part what the commit message should say is which of the actual
> disabled subtests fail with GuC. Matt was saying about unpreemptable
> spinners, but not all subtests use those. So perhaps splitting into a group
> which passes and which is a problem for GuC would work and keep the
> coverage.
> 
> >>> If that is not acceptable, and I do acknowledge to the cost to CI
> >>> time, then architects should ack and in my mind the ack should mean
> >>> one of the two things. Either:
> >>>
> >>>   1) We know there is coverage for the same i915 code paths
> >>> elsewhere in IGT. Or:
> > That exists when running on older platforms that support the feature
> > being tested.
> >
> >>>
> >>>   2) We are willingly dropping coverage for these i915 code paths on
> >>> GuC platforms with the risk that the code might unknowingly bit rot
> >>> and cause more serious security issues down the line.
> > But we don't care if the code rots for platforms it does not support
> > because it doesn't support them.
> 
> Hopefully I managed to explain this in the previous paragraph. If we would
> be able to say ENODEV that would be ideal, but we are not, so "doesn't
> support" in this case is more like "please don't use this". I think we need to
> know what happens if someone does use it.
> 
> >>
> >> And no acks to the effect of either 1) or 2) were given. So git
> >> history will not show what is this allegedly backend specific
> >> behavior, neither it will clarify which parts of the uapi are stopped
> >> being tested in GuC mode.
> > I agree entirely with this. And would go further to say that the test
> > code itself should be documenting what it is doing, what it is trying to
> > test and why it things that is not applicable to a given platform if it
> > is going to skip that platform.
> 
> Oh yes, understanding a lot of our tests is such a time sink.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-22 21:02     ` John Harrison
@ 2021-12-23 11:08       ` Tvrtko Ursulin
  2021-12-23 11:16         ` Surendrakumar Upadhyay, TejaskumarX
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-23 11:08 UTC (permalink / raw)
  To: John Harrison, Mastan Katragadda, igt-dev,
	tejaskumarx.surendrakumar.upadhyay
  Cc: Bloomfield, Jon, Daniel Vetter


On 22/12/2021 21:02, John Harrison wrote:
> On 12/22/2021 01:30, Tvrtko Ursulin wrote:
>> I see this has been merged despite my complaint that the commit 
>> message is not transparent enough and without public arch level acks.
> I would ack a revert if you want to post it...

Revert and re-submit with a better/more complete explanation would be my 
preferred choice yes.

>> On 10/12/2021 10:02, Tvrtko Ursulin wrote:
>>>
>>> Old commit message:
>>>
>>> """
>>> This a known failure when running 
>>> igt@gem_exec_balancer@bonded-(dual|pair|sync)
>>> tests with GuC submission.The hang is expected with GuC submission 
>>> since the
>>> test was written to expect execlist scheduling hence added skip if Guc
>>> submission enabled.
>>> """
>>>
>>> New commit message:
>>>
>>> On 09/12/2021 08:20, Mastan Katragadda wrote:
>>>> This test makes assumptions about the backend scheduling algorithm that
>>>> a real world UMD would never assume. This test is not testing a UMD-KMD
>>>> contract, rather a specific backend.It is an invalid test case thus we
>>>> are skipping.
>>>>
>>>> Changes Since v1:
>>>>               -Updated commit message
>>>
>>> Well that hasn't been really improved at all an is still totally 
>>> opaque. No mention whether it is all or some subtests, no mention of 
>>> what assumptions, what exact usage of the ABI is discouraged, not 
>>> much more than pure rewording.
>>>
>>> No desire to split into passing and non passing tests and only skip 
>>> latter group?
> It's not clear from the patch itself, but looking at the code this 
> change appears to only affect a specific set of subtests. Those being 
> 'sliced' and 'bonded-(pair|dual|sync)'. It's not actually disabling the 
> entire gem_exec_balancer suite. That is really not clear from the 
> subject or commit description, though. And I totally agree that it 
> should be documenting a) what the test is doing and b) why that 
> behaviour is both backend specific and unreasonable for a UMD.
> 
> My understanding is that gem_exec_balancer had already been updated to 
> support the new parallel extension as well as the old bonding extension. 
> So any subtest with 'parallel' as the prefix is supposed to work with 
> GuC submission. Having said that, there are odd other subtests that are 
> not obviously related to either bonded nor parallel submission (such as 
> persistence, noheartbeat, nohangcheck) which I recall do rely on 
> execlist backend implementations and therefore have issues when using 
> GuC submission.
> 
> I totally agree that we should not be blanket disabling an entire test 
> suite or even sub-group with such a generic stated reason. The bonded 
> tests (and parallel tests) are meant to check for the presence of the 
> appropriate API and skip if not available. They should not be skipping 
> based on arbitrary assumptions about the platform.
> 
> 
>>>
>>> Or to add a twist, what about making the specific subtests which hang 
>>> with GuC, expect to hang and therefore pass?
> Seems unnecessarily complicated. Why require that something is broken? 
> If a test is specific to a backend implementation then the behaviour is 
> undefined on any other implementation. It may hang, it may pass, it may 
> be random each time you run it. If a test is not relevant to a given 
> platform, I would say that it should just be skipped.
> 
>>> That would a) document the failing corner case of the ABI and more 
>>> importantly b) keep exercising the i915 code paths which sit between 
>>> the uAPI and the GuC firmware?
> You can't exercise code paths which don't exist. Bonding is 
> fundamentally an i915 only implementation. Persistence is partially 
> implemented inside the execlist backend. So you can't get to the low 
> level code of either when GuC submission is enabled no matter what 
> hideousness you put in the test.

Sticking to bonding-* subtests..

Catch here is that even though subtests in question have bonding in 
their names they are not using the bonding extension!

They are simply using the submit fence with plain virtual engines. And I 
am told submit fence cannot be disabled on the uAPI level (even on GuC 
submission platforms).

So my concern is for code coverage of using the two in combination. I 
don't think it's good just to not have any (coverage), even if the 
statement is the two /shouldn't/ be used together. If uapi _can_ be used 
together we have to see what happens in the driver if it is. In order to 
avoid nasty surprises if there was some unused code that bit rotted in a 
dangerous way.

That was the first thing I asked to ack in clear explicit words.

Second part what the commit message should say is which of the actual 
disabled subtests fail with GuC. Matt was saying about unpreemptable 
spinners, but not all subtests use those. So perhaps splitting into a 
group which passes and which is a problem for GuC would work and keep 
the coverage.

>>> If that is not acceptable, and I do acknowledge to the cost to CI 
>>> time, then architects should ack and in my mind the ack should mean 
>>> one of the two things. Either:
>>>
>>>   1) We know there is coverage for the same i915 code paths elsewhere 
>>> in IGT. Or:
> That exists when running on older platforms that support the feature 
> being tested.
> 
>>>
>>>   2) We are willingly dropping coverage for these i915 code paths on 
>>> GuC platforms with the risk that the code might unknowingly bit rot 
>>> and cause more serious security issues down the line.
> But we don't care if the code rots for platforms it does not support 
> because it doesn't support them.

Hopefully I managed to explain this in the previous paragraph. If we 
would be able to say ENODEV that would be ideal, but we are not, so 
"doesn't support" in this case is more like "please don't use this". I 
think we need to know what happens if someone does use it.

>>
>> And no acks to the effect of either 1) or 2) were given. So git 
>> history will not show what is this allegedly backend specific 
>> behavior, neither it will clarify which parts of the uapi are stopped 
>> being tested in GuC mode.
> I agree entirely with this. And would go further to say that the test 
> code itself should be documenting what it is doing, what it is trying to 
> test and why it things that is not applicable to a given platform if it 
> is going to skip that platform.

Oh yes, understanding a lot of our tests is such a time sink.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-22  9:30   ` Tvrtko Ursulin
@ 2021-12-22 21:02     ` John Harrison
  2021-12-23 11:08       ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: John Harrison @ 2021-12-22 21:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, Mastan Katragadda, igt-dev,
	tejaskumarx.surendrakumar.upadhyay
  Cc: Bloomfield, Jon, Daniel Vetter

On 12/22/2021 01:30, Tvrtko Ursulin wrote:
> I see this has been merged despite my complaint that the commit 
> message is not transparent enough and without public arch level acks.
I would ack a revert if you want to post it...

>
> On 10/12/2021 10:02, Tvrtko Ursulin wrote:
>>
>> Old commit message:
>>
>> """
>> This a known failure when running 
>> igt@gem_exec_balancer@bonded-(dual|pair|sync)
>> tests with GuC submission.The hang is expected with GuC submission 
>> since the
>> test was written to expect execlist scheduling hence added skip if Guc
>> submission enabled.
>> """
>>
>> New commit message:
>>
>> On 09/12/2021 08:20, Mastan Katragadda wrote:
>>> This test makes assumptions about the backend scheduling algorithm that
>>> a real world UMD would never assume. This test is not testing a UMD-KMD
>>> contract, rather a specific backend.It is an invalid test case thus we
>>> are skipping.
>>>
>>> Changes Since v1:
>>>               -Updated commit message
>>
>> Well that hasn't been really improved at all an is still totally 
>> opaque. No mention whether it is all or some subtests, no mention of 
>> what assumptions, what exact usage of the ABI is discouraged, not 
>> much more than pure rewording.
>>
>> No desire to split into passing and non passing tests and only skip 
>> latter group?
It's not clear from the patch itself, but looking at the code this 
change appears to only affect a specific set of subtests. Those being 
'sliced' and 'bonded-(pair|dual|sync)'. It's not actually disabling the 
entire gem_exec_balancer suite. That is really not clear from the 
subject or commit description, though. And I totally agree that it 
should be documenting a) what the test is doing and b) why that 
behaviour is both backend specific and unreasonable for a UMD.

My understanding is that gem_exec_balancer had already been updated to 
support the new parallel extension as well as the old bonding extension. 
So any subtest with 'parallel' as the prefix is supposed to work with 
GuC submission. Having said that, there are odd other subtests that are 
not obviously related to either bonded nor parallel submission (such as 
persistence, noheartbeat, nohangcheck) which I recall do rely on 
execlist backend implementations and therefore have issues when using 
GuC submission.

I totally agree that we should not be blanket disabling an entire test 
suite or even sub-group with such a generic stated reason. The bonded 
tests (and parallel tests) are meant to check for the presence of the 
appropriate API and skip if not available. They should not be skipping 
based on arbitrary assumptions about the platform.


>>
>> Or to add a twist, what about making the specific subtests which hang 
>> with GuC, expect to hang and therefore pass?
Seems unnecessarily complicated. Why require that something is broken? 
If a test is specific to a backend implementation then the behaviour is 
undefined on any other implementation. It may hang, it may pass, it may 
be random each time you run it. If a test is not relevant to a given 
platform, I would say that it should just be skipped.

>> That would a) document the failing corner case of the ABI and more 
>> importantly b) keep exercising the i915 code paths which sit between 
>> the uAPI and the GuC firmware?
You can't exercise code paths which don't exist. Bonding is 
fundamentally an i915 only implementation. Persistence is partially 
implemented inside the execlist backend. So you can't get to the low 
level code of either when GuC submission is enabled no matter what 
hideousness you put in the test.

>>
>> If that is not acceptable, and I do acknowledge to the cost to CI 
>> time, then architects should ack and in my mind the ack should mean 
>> one of the two things. Either:
>>
>>   1) We know there is coverage for the same i915 code paths elsewhere 
>> in IGT. Or:
That exists when running on older platforms that support the feature 
being tested.

>>
>>   2) We are willingly dropping coverage for these i915 code paths on 
>> GuC platforms with the risk that the code might unknowingly bit rot 
>> and cause more serious security issues down the line.
But we don't care if the code rots for platforms it does not support 
because it doesn't support them.

>
> And no acks to the effect of either 1) or 2) were given. So git 
> history will not show what is this allegedly backend specific 
> behavior, neither it will clarify which parts of the uapi are stopped 
> being tested in GuC mode.
I agree entirely with this. And would go further to say that the test 
code itself should be documenting what it is doing, what it is trying to 
test and why it things that is not applicable to a given platform if it 
is going to skip that platform.

John.


>
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-10 10:02 ` Tvrtko Ursulin
@ 2021-12-22  9:30   ` Tvrtko Ursulin
  2021-12-22 21:02     ` John Harrison
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-22  9:30 UTC (permalink / raw)
  To: Mastan Katragadda, igt-dev, tejaskumarx.surendrakumar.upadhyay
  Cc: Bloomfield, Jon, Daniel Vetter


I see this has been merged despite my complaint that the commit message 
is not transparent enough and without public arch level acks.

On 10/12/2021 10:02, Tvrtko Ursulin wrote:
> 
> Old commit message:
> 
> """
> This a known failure when running 
> igt@gem_exec_balancer@bonded-(dual|pair|sync)
> tests with GuC submission.The hang is expected with GuC submission since 
> the
> test was written to expect execlist scheduling hence added skip if Guc
> submission enabled.
> """
> 
> New commit message:
> 
> On 09/12/2021 08:20, Mastan Katragadda wrote:
>> This test makes assumptions about the backend scheduling algorithm that
>> a real world UMD would never assume. This test is not testing a UMD-KMD
>> contract, rather a specific backend.It is an invalid test case thus we
>> are skipping.
>>
>> Changes Since v1:
>>               -Updated commit message
> 
> Well that hasn't been really improved at all an is still totally opaque. 
> No mention whether it is all or some subtests, no mention of what 
> assumptions, what exact usage of the ABI is discouraged, not much more 
> than pure rewording.
> 
> No desire to split into passing and non passing tests and only skip 
> latter group?
> 
> Or to add a twist, what about making the specific subtests which hang 
> with GuC, expect to hang and therefore pass?
> 
> That would a) document the failing corner case of the ABI and more 
> importantly b) keep exercising the i915 code paths which sit between the 
> uAPI and the GuC firmware?
> 
> If that is not acceptable, and I do acknowledge to the cost to CI time, 
> then architects should ack and in my mind the ack should mean one of the 
> two things. Either:
> 
>   1) We know there is coverage for the same i915 code paths elsewhere in 
> IGT. Or:
> 
>   2) We are willingly dropping coverage for these i915 code paths on GuC 
> platforms with the risk that the code might unknowingly bit rot and 
> cause more serious security issues down the line.

And no acks to the effect of either 1) or 2) were given. So git history 
will not show what is this allegedly backend specific behavior, neither 
it will clarify which parts of the uapi are stopped being tested in GuC 
mode.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-09  8:20 Mastan Katragadda
  2021-12-10  3:20 ` Matthew Brost
@ 2021-12-10 10:02 ` Tvrtko Ursulin
  2021-12-22  9:30   ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2021-12-10 10:02 UTC (permalink / raw)
  To: Mastan Katragadda, igt-dev, tejaskumarx.surendrakumar.upadhyay
  Cc: Bloomfield, Jon


Old commit message:

"""
This a known failure when running 
igt@gem_exec_balancer@bonded-(dual|pair|sync)
tests with GuC submission.The hang is expected with GuC submission since the
test was written to expect execlist scheduling hence added skip if Guc
submission enabled.
"""

New commit message:

On 09/12/2021 08:20, Mastan Katragadda wrote:
> This test makes assumptions about the backend scheduling algorithm that
> a real world UMD would never assume. This test is not testing a UMD-KMD
> contract, rather a specific backend.It is an invalid test case thus we
> are skipping.
> 
> Changes Since v1:
>               -Updated commit message

Well that hasn't been really improved at all an is still totally opaque. 
No mention whether it is all or some subtests, no mention of what 
assumptions, what exact usage of the ABI is discouraged, not much more 
than pure rewording.

No desire to split into passing and non passing tests and only skip 
latter group?

Or to add a twist, what about making the specific subtests which hang 
with GuC, expect to hang and therefore pass?

That would a) document the failing corner case of the ABI and more 
importantly b) keep exercising the i915 code paths which sit between the 
uAPI and the GuC firmware?

If that is not acceptable, and I do acknowledge to the cost to CI time, 
then architects should ack and in my mind the ack should mean one of the 
two things. Either:

  1) We know there is coverage for the same i915 code paths elsewhere in 
IGT. Or:

  2) We are willingly dropping coverage for these i915 code paths on GuC 
platforms with the risk that the code might unknowingly bit rot and 
cause more serious security issues down the line.

Regards,

Tvrtko

> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
> ---
>   tests/i915/gem_exec_balancer.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
> index cc07a5a9..d58734ab 100644
> --- a/tests/i915/gem_exec_balancer.c
> +++ b/tests/i915/gem_exec_balancer.c
> @@ -3320,6 +3320,7 @@ igt_main
>   
>   	igt_subtest_group {
>   		igt_fixture {
> +			igt_require(!gem_using_guc_submission(i915));
>   			intel_allocator_multiprocess_start();
>   		}
>   
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
  2021-12-09  8:20 Mastan Katragadda
@ 2021-12-10  3:20 ` Matthew Brost
  2021-12-10 10:02 ` Tvrtko Ursulin
  1 sibling, 0 replies; 23+ messages in thread
From: Matthew Brost @ 2021-12-10  3:20 UTC (permalink / raw)
  To: Mastan Katragadda; +Cc: igt-dev

On Thu, Dec 09, 2021 at 01:50:53PM +0530, Mastan Katragadda wrote:
> This test makes assumptions about the backend scheduling algorithm that
> a real world UMD would never assume. This test is not testing a UMD-KMD
> contract, rather a specific backend.It is an invalid test case thus we
> are skipping.
> 
> Changes Since v1:
>              -Updated commit message
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  tests/i915/gem_exec_balancer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
> index cc07a5a9..d58734ab 100644
> --- a/tests/i915/gem_exec_balancer.c
> +++ b/tests/i915/gem_exec_balancer.c
> @@ -3320,6 +3320,7 @@ igt_main
>  
>  	igt_subtest_group {
>  		igt_fixture {
> +			igt_require(!gem_using_guc_submission(i915));
>  			intel_allocator_multiprocess_start();
>  		}
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission
@ 2021-12-09  8:20 Mastan Katragadda
  2021-12-10  3:20 ` Matthew Brost
  2021-12-10 10:02 ` Tvrtko Ursulin
  0 siblings, 2 replies; 23+ messages in thread
From: Mastan Katragadda @ 2021-12-09  8:20 UTC (permalink / raw)
  To: igt-dev, tejaskumarx.surendrakumar.upadhyay; +Cc: mastanx.katragadda

This test makes assumptions about the backend scheduling algorithm that
a real world UMD would never assume. This test is not testing a UMD-KMD
contract, rather a specific backend.It is an invalid test case thus we
are skipping.

Changes Since v1:
             -Updated commit message

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Mastan Katragadda <mastanx.katragadda@intel.com>
---
 tests/i915/gem_exec_balancer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
index cc07a5a9..d58734ab 100644
--- a/tests/i915/gem_exec_balancer.c
+++ b/tests/i915/gem_exec_balancer.c
@@ -3320,6 +3320,7 @@ igt_main
 
 	igt_subtest_group {
 		igt_fixture {
+			igt_require(!gem_using_guc_submission(i915));
 			intel_allocator_multiprocess_start();
 		}
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-12-23 11:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-26  1:57 [igt-dev] [i-g-t] tests/i915/exec_balancer: Added Skip Guc Submission Mastan Katragadda
2021-11-26  2:42 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
2021-11-26  8:46 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
2021-11-29 10:30 ` [igt-dev] [i-g-t] " Tvrtko Ursulin
2021-11-29 10:58   ` Katragadda, MastanX
2021-11-29 11:15     ` Tvrtko Ursulin
2021-11-30 16:48       ` Matthew Brost
2021-12-01  9:46         ` Tvrtko Ursulin
2021-12-01 11:36           ` Radoslaw Szwichtenberg
2021-12-01 11:46             ` Daniel Vetter
2021-12-01 12:14               ` Tvrtko Ursulin
2021-12-01 23:57               ` Matthew Brost
2021-12-01 23:42           ` Matthew Brost
2021-12-02  9:19             ` Petri Latvala
2021-12-02  9:20             ` Tvrtko Ursulin
2021-12-03 19:36             ` Matthew Brost
2021-12-09  8:20 Mastan Katragadda
2021-12-10  3:20 ` Matthew Brost
2021-12-10 10:02 ` Tvrtko Ursulin
2021-12-22  9:30   ` Tvrtko Ursulin
2021-12-22 21:02     ` John Harrison
2021-12-23 11:08       ` Tvrtko Ursulin
2021-12-23 11:16         ` Surendrakumar Upadhyay, TejaskumarX

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.