✗ CI.Patch_applied: failure for tests/xe_exec_threads: Make hang tests reset domain aware


* ✗ CI.Patch_applied: failure for tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
@ 2024-04-02 12:15 ` Patchwork
  2024-04-02 15:42 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-04-02 12:15 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: intel-xe

== Series Details ==

Series: tests/xe_exec_threads: Make hang tests reset domain aware
URL   : https://patchwork.freedesktop.org/series/131937/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: f54ea7473cd1 drm-tip: 2024y-04m-02d-09h-16m-22s UTC integration manifest
=== git am output follows ===
error: tests/intel/xe_exec_threads.c: does not exist in index
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: tests/xe_exec_threads: Make hang tests reset domain aware
Patch failed at 0001 tests/xe_exec_threads: Make hang tests reset domain aware
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
@ 2024-04-02 12:22 Tejas Upadhyay
  2024-04-02 12:15 ` ✗ CI.Patch_applied: failure for " Patchwork
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Tejas Upadhyay @ 2024-04-02 12:22 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-xe, Matthew Brost, Tejas Upadhyay

RCS and CCS are dependent engines because they share a reset
domain. Whenever there is a reset from CCS, all the exec queues
running on RCS are victimised, mainly on Lunarlake.

Let's skip parallel execution on CCS with RCS.

This helps fix the following errors:
1. Test assertion failure function test_legacy_mode, file, Failed assertion: data[i].data == 0xc0ffee

2. Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
index 8083980f9..31af61dc9 100644
--- a/tests/intel/xe_exec_threads.c
+++ b/tests/intel/xe_exec_threads.c
@@ -710,6 +710,17 @@ static void *thread(void *data)
 	return NULL;
 }
 
+static bool is_engine_contexts_victimized(int fd, unsigned int flags)
+{
+	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
+		return false;
+
+	if (flags & HANG)
+		return true;
+
+	return false;
+}
+
 /**
  * SUBTEST: threads-%s
  * Description: Run threads %arg[1] test with multi threads
@@ -955,9 +966,13 @@ static void threads(int fd, int flags)
 	bool go = false;
 	int n_threads = 0;
 	int gt;
+	bool has_rcs = false;
 
-	xe_for_each_engine(fd, hwe)
+	xe_for_each_engine(fd, hwe) {
+		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
+			has_rcs = true;
 		++n_engines;
+	}
 
 	if (flags & BALANCER) {
 		xe_for_each_gt(fd, gt)
@@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
 	}
 
 	xe_for_each_engine(fd, hwe) {
+		/* RCS/CCS sharing reset domain hence dependent engines.
+		 * When CCS is doing reset, all the contexts of RCS are
+		 * victimized, so skip the compute engine avoiding
+		 * parallel execution with RCS
+		 */
+		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
+		    is_engine_contexts_victimized(fd, flags))
+			continue;
+
 		threads_data[i].mutex = &mutex;
 		threads_data[i].cond = &cond;
 #define ADDRESS_SHIFT	39
-- 
2.25.1



* ✓ Fi.CI.BAT: success for tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
  2024-04-02 12:15 ` ✗ CI.Patch_applied: failure for " Patchwork
@ 2024-04-02 15:42 ` Patchwork
  2024-04-02 16:36 ` ✓ CI.xeBAT: " Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-04-02 15:42 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: igt-dev


== Series Details ==

Series: tests/xe_exec_threads: Make hang tests reset domain aware
URL   : https://patchwork.freedesktop.org/series/131938/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_14516 -> IGTPW_10965
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html

Participating hosts (37 -> 37)
------------------------------

  Additional (1): fi-kbl-8809g 
  Missing    (1): fi-snb-2520m 

Known issues
------------

  Here are the changes found in IGTPW_10965 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_force_connector_basic@force-edid:
    - bat-dg2-8:          [PASS][1] -> [INCOMPLETE][2] ([i915#10583] / [i915#2295])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/bat-dg2-8/igt@kms_force_connector_basic@force-edid.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/bat-dg2-8/igt@kms_force_connector_basic@force-edid.html
    - bat-dg2-9:          [PASS][3] -> [INCOMPLETE][4] ([i915#10583] / [i915#2295])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/bat-dg2-9/igt@kms_force_connector_basic@force-edid.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/bat-dg2-9/igt@kms_force_connector_basic@force-edid.html

  * igt@runner@aborted:
    - fi-kbl-8809g:       NOTRUN -> [FAIL][5] ([i915#4991])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/fi-kbl-8809g/igt@runner@aborted.html

  
  [i915#10583]: https://gitlab.freedesktop.org/drm/intel/issues/10583
  [i915#2295]: https://gitlab.freedesktop.org/drm/intel/issues/2295
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7796 -> IGTPW_10965

  CI-20190529: 20190529
  CI_DRM_14516: 5100fcc57dc5d45b246a0aeb068f4f8062d29b09 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_10965: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html
  IGT_7796: 2cfed18f6aa776c1593d7cc328d23225dd61bdf9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html



* ✓ CI.xeBAT: success for tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
  2024-04-02 12:15 ` ✗ CI.Patch_applied: failure for " Patchwork
  2024-04-02 15:42 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2024-04-02 16:36 ` Patchwork
  2024-04-02 19:40 ` [PATCH V2 i-g-t] " Matt Roper
  2024-04-03  0:30 ` ✗ Fi.CI.IGT: failure for " Patchwork
  4 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-04-02 16:36 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: igt-dev


== Series Details ==

Series: tests/xe_exec_threads: Make hang tests reset domain aware
URL   : https://patchwork.freedesktop.org/series/131938/
State : success

== Summary ==

CI Bug Log - changes from XEIGT_7796_BAT -> XEIGTPW_10965_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (5 -> 5)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in XEIGTPW_10965_BAT that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
    - bat-pvc-2:          NOTRUN -> [SKIP][1] ([i915#6077]) +30 other tests skip
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_addfb_basic@addfb25-x-tiled-legacy.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-atomic:
    - bat-pvc-2:          NOTRUN -> [SKIP][2] ([Intel XE#1024] / [Intel XE#782]) +5 other tests skip
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html

  * igt@kms_dsc@dsc-basic:
    - bat-pvc-2:          NOTRUN -> [SKIP][3] ([Intel XE#1024] / [Intel XE#784])
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_dsc@dsc-basic.html

  * igt@kms_flip@basic-flip-vs-wf_vblank:
    - bat-pvc-2:          NOTRUN -> [SKIP][4] ([Intel XE#1024] / [Intel XE#947]) +3 other tests skip
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_flip@basic-flip-vs-wf_vblank.html

  * igt@kms_force_connector_basic@force-connector-state:
    - bat-pvc-2:          NOTRUN -> [SKIP][5] ([Intel XE#540]) +3 other tests skip
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_frontbuffer_tracking@basic:
    - bat-pvc-2:          NOTRUN -> [SKIP][6] ([Intel XE#1024] / [Intel XE#783])
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@nonblocking-crc:
    - bat-pvc-2:          NOTRUN -> [SKIP][7] ([Intel XE#829]) +6 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_pipe_crc_basic@nonblocking-crc.html

  * igt@kms_prop_blob@basic:
    - bat-pvc-2:          NOTRUN -> [SKIP][8] ([Intel XE#780])
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_prop_blob@basic.html

  * igt@kms_psr@psr-cursor-plane-move:
    - bat-pvc-2:          NOTRUN -> [SKIP][9] ([Intel XE#1024]) +2 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@kms_psr@psr-cursor-plane-move.html

  * igt@xe_gt_freq@freq_range_idle:
    - bat-pvc-2:          NOTRUN -> [SKIP][10] ([Intel XE#1021]) +1 other test skip
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_gt_freq@freq_range_idle.html

  * igt@xe_huc_copy@huc_copy:
    - bat-pvc-2:          NOTRUN -> [SKIP][11] ([Intel XE#255])
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_huc_copy@huc_copy.html

  * igt@xe_intel_bb@render:
    - bat-pvc-2:          NOTRUN -> [SKIP][12] ([Intel XE#532])
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_intel_bb@render.html

  * igt@xe_pat@pat-index-xe2:
    - bat-pvc-2:          NOTRUN -> [SKIP][13] ([Intel XE#977]) +1 other test skip
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_pat@pat-index-xe2.html

  * igt@xe_pat@pat-index-xehpc@render:
    - bat-pvc-2:          NOTRUN -> [SKIP][14] ([Intel XE#976])
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_pat@pat-index-xehpc@render.html

  * igt@xe_pat@pat-index-xelpg:
    - bat-pvc-2:          NOTRUN -> [SKIP][15] ([Intel XE#979])
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_pat@pat-index-xelpg.html

  * igt@xe_pm_residency@gt-c6-on-idle:
    - bat-pvc-2:          NOTRUN -> [SKIP][16] ([Intel XE#531])
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/bat-pvc-2/igt@xe_pm_residency@gt-c6-on-idle.html

  
  [Intel XE#1021]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1021
  [Intel XE#1024]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1024
  [Intel XE#255]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/255
  [Intel XE#531]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/531
  [Intel XE#532]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/532
  [Intel XE#540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/540
  [Intel XE#780]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/780
  [Intel XE#782]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/782
  [Intel XE#783]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/783
  [Intel XE#784]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/784
  [Intel XE#829]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/829
  [Intel XE#947]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/947
  [Intel XE#976]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/976
  [Intel XE#977]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/977
  [Intel XE#979]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/979
  [i915#6077]: https://gitlab.freedesktop.org/drm/intel/issues/6077


Build changes
-------------

  * IGT: IGT_7796 -> IGTPW_10965
  * Linux: xe-1025-f54ea7473cd118eb39978f2e946b17558b5ff46d -> xe-1026-5100fcc57dc5d45b246a0aeb068f4f8062d29b09

  IGTPW_10965: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html
  IGT_7796: 2cfed18f6aa776c1593d7cc328d23225dd61bdf9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-1025-f54ea7473cd118eb39978f2e946b17558b5ff46d: f54ea7473cd118eb39978f2e946b17558b5ff46d
  xe-1026-5100fcc57dc5d45b246a0aeb068f4f8062d29b09: 5100fcc57dc5d45b246a0aeb068f4f8062d29b09

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_10965/index.html



* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
                   ` (2 preceding siblings ...)
  2024-04-02 16:36 ` ✓ CI.xeBAT: " Patchwork
@ 2024-04-02 19:40 ` Matt Roper
  2024-04-02 20:55   ` Lucas De Marchi
  2024-04-03  0:30 ` ✗ Fi.CI.IGT: failure for " Patchwork
  4 siblings, 1 reply; 21+ messages in thread
From: Matt Roper @ 2024-04-02 19:40 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: igt-dev, intel-xe, Matthew Brost

On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> RCS and CCS are dependent engines because they share a reset
> domain. Whenever there is a reset from CCS, all the exec queues
> running on RCS are victimised, mainly on Lunarlake.
> 
> Let's skip parallel execution on CCS with RCS.

I haven't really looked at this specific test in detail, but based on
your explanation here, you're also going to run into problems with
multiple CCS engines since they all share the same reset.  You won't see
that on platforms like LNL that only have a single CCS, but platforms
like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
one kills anything running on the others.
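
A minimal sketch of the reset-sharing relationship described above. It
assumes that render and all compute engines on one GT form a single reset
domain, as stated in this thread, and treats every other engine as
resetting on its own, which is a simplification made only for this
example. The enum, struct, and helper names are hypothetical and are not
IGT or xe uAPI identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for engine classes; not the real DRM_XE_* values. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };

struct engine {
	enum engine_class engine_class;
	int instance;
	int gt;
};

/*
 * Map an engine to a reset-domain id within its GT.
 * Render and every compute instance share domain 0 (per the thread);
 * all other engines get a unique id (assumption for this sketch).
 */
static int reset_domain(const struct engine *e)
{
	assert(e->instance >= 0 && e->instance < 16);

	if (e->engine_class == CLASS_RENDER ||
	    e->engine_class == CLASS_COMPUTE)
		return 0;

	return 1 + (int)e->engine_class * 16 + e->instance;
}

/* True when a reset triggered on one engine would also hit the other. */
static bool shares_reset(const struct engine *a, const struct engine *b)
{
	return a->gt == b->gt && reset_domain(a) == reset_domain(b);
}
```

Under these assumptions, shares_reset() is true both for an RCS/CCS pair
and for two CCS instances on the same GT, which is the multi-CCS case
raised here.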


Matt

> 
> This helps fix the following errors:
> 1. Test assertion failure function test_legacy_mode, file, Failed assertion: data[i].data == 0xc0ffee
> 
> 2. Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
> 
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
>  tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
> index 8083980f9..31af61dc9 100644
> --- a/tests/intel/xe_exec_threads.c
> +++ b/tests/intel/xe_exec_threads.c
> @@ -710,6 +710,17 @@ static void *thread(void *data)
>  	return NULL;
>  }
>  
> +static bool is_engine_contexts_victimized(int fd, unsigned int flags)
> +{
> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> +		return false;
> +
> +	if (flags & HANG)
> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * SUBTEST: threads-%s
>   * Description: Run threads %arg[1] test with multi threads
> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
>  	bool go = false;
>  	int n_threads = 0;
>  	int gt;
> +	bool has_rcs = false;
>  
> -	xe_for_each_engine(fd, hwe)
> +	xe_for_each_engine(fd, hwe) {
> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> +			has_rcs = true;
>  		++n_engines;
> +	}
>  
>  	if (flags & BALANCER) {
>  		xe_for_each_gt(fd, gt)
> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>  	}
>  
>  	xe_for_each_engine(fd, hwe) {
> +		/* RCS/CCS sharing reset domain hence dependent engines.
> +		 * When CCS is doing reset, all the contexts of RCS are
> +		 * victimized, so skip the compute engine avoiding
> +		 * parallel execution with RCS
> +		 */
> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
> +		    is_engine_contexts_victimized(fd, flags))
> +			continue;
> +
>  		threads_data[i].mutex = &mutex;
>  		threads_data[i].cond = &cond;
>  #define ADDRESS_SHIFT	39
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 19:40 ` [PATCH V2 i-g-t] " Matt Roper
@ 2024-04-02 20:55   ` Lucas De Marchi
  2024-04-03  5:35     ` Upadhyay, Tejas
  0 siblings, 1 reply; 21+ messages in thread
From: Lucas De Marchi @ 2024-04-02 20:55 UTC (permalink / raw)
  To: Matt Roper; +Cc: Tejas Upadhyay, igt-dev, intel-xe, Matthew Brost

On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>> RCS and CCS are dependent engines because they share a reset
>> domain. Whenever there is a reset from CCS, all the exec queues
>> running on RCS are victimised, mainly on Lunarlake.
>>
>> Let's skip parallel execution on CCS with RCS.
>
>I haven't really looked at this specific test in detail, but based on
>your explanation here, you're also going to run into problems with
>multiple CCS engines since they all share the same reset.  You won't see
>that on platforms like LNL that only have a single CCS, but platforms

But it is seen on LNL because it has both RCS and CCS.

>like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
>one kills anything running on the others.
>
>
>Matt
>
>>
>> This helps fix the following errors:
>> 1. Test assertion failure function test_legacy_mode, file, Failed assertion: data[i].data == 0xc0ffee
>>
>> 2. Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
>>
>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> ---
>>  tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
>> index 8083980f9..31af61dc9 100644
>> --- a/tests/intel/xe_exec_threads.c
>> +++ b/tests/intel/xe_exec_threads.c
>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>  	return NULL;
>>  }
>>
>> +static bool is_engine_contexts_victimized(int fd, unsigned int flags)
>> +{
>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>> +		return false;

as above, I don't think we should add any platform check here. It's
impossible to keep it up to date and it's also testing the wrong thing.
AFAIU you don't want parallel submission on engines that share the same
reset domain. So, this is actually what should be tested.
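
One way the suggestion above could look, as a rough sketch rather than
the actual IGT change: select engines by reset domain instead of by
platform, keeping at most one engine per domain when the hang flag is
set. All names here (FLAG_HANG, struct engine, same_reset_domain(),
select_engines()) are hypothetical, and the domain rule (render plus
compute on one GT share a domain, every other engine is independent) is
an assumption taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real DRM_XE_* or test flag values. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };
#define FLAG_HANG (1u << 0)

struct engine {
	enum engine_class engine_class;
	int instance;
	int gt;
};

/* Assumed rule: render and compute on one GT share a reset domain. */
static bool same_reset_domain(const struct engine *a, const struct engine *b)
{
	bool a_shared = a->engine_class == CLASS_RENDER ||
			a->engine_class == CLASS_COMPUTE;
	bool b_shared = b->engine_class == CLASS_RENDER ||
			b->engine_class == CLASS_COMPUTE;

	if (a->gt != b->gt)
		return false;
	if (a_shared && b_shared)
		return true;

	/* Any other engine only shares a domain with itself. */
	return a->engine_class == b->engine_class &&
	       a->instance == b->instance;
}

/*
 * Copy into out[] the engines allowed to run in parallel. With FLAG_HANG,
 * only the first engine seen in each reset domain is kept; without it,
 * every engine is kept. out[] must have room for n entries.
 * Returns the number of engines written.
 */
static size_t select_engines(const struct engine *in, size_t n,
			     unsigned int flags, struct engine *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		bool clash = false;

		if (flags & FLAG_HANG) {
			for (size_t j = 0; j < kept; j++) {
				if (same_reset_domain(&in[i], &out[j])) {
					clash = true;
					break;
				}
			}
		}
		if (!clash)
			out[kept++] = in[i];
	}

	assert(kept <= n);
	return kept;
}
```

Because the filter depends only on the domain rule, it needs no
per-platform check and would also cover parts with several compute
engines.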

Lucas De Marchi

>> +
>> +	if (flags & HANG)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  /**
>>   * SUBTEST: threads-%s
>>   * Description: Run threads %arg[1] test with multi threads
>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
>>  	bool go = false;
>>  	int n_threads = 0;
>>  	int gt;
>> +	bool has_rcs = false;
>>
>> -	xe_for_each_engine(fd, hwe)
>> +	xe_for_each_engine(fd, hwe) {
>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>> +			has_rcs = true;
>>  		++n_engines;
>> +	}
>>
>>  	if (flags & BALANCER) {
>>  		xe_for_each_gt(fd, gt)
>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>  	}
>>
>>  	xe_for_each_engine(fd, hwe) {
>> +		/* RCS/CCS sharing reset domain hence dependent engines.
>> +		 * When CCS is doing reset, all the contexts of RCS are
>> +		 * victimized, so skip the compute engine avoiding
>> +		 * parallel execution with RCS
>> +		 */
>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
>> +		    is_engine_contexts_victimized(fd, flags))
>> +			continue;
>> +
>>  		threads_data[i].mutex = &mutex;
>>  		threads_data[i].cond = &cond;
>>  #define ADDRESS_SHIFT	39
>> --
>> 2.25.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation


* ✗ Fi.CI.IGT: failure for tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
                   ` (3 preceding siblings ...)
  2024-04-02 19:40 ` [PATCH V2 i-g-t] " Matt Roper
@ 2024-04-03  0:30 ` Patchwork
  4 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-04-03  0:30 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: igt-dev


== Series Details ==

Series: tests/xe_exec_threads: Make hang tests reset domain aware
URL   : https://patchwork.freedesktop.org/series/131938/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_14516_full -> IGTPW_10965_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with IGTPW_10965_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_10965_full, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html

Participating hosts (9 -> 9)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in IGTPW_10965_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_isolation@preservation-s3@vcs1:
    - shard-mtlp:         NOTRUN -> [DMESG-WARN][1] +8 other tests dmesg-warn
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@gem_ctx_isolation@preservation-s3@vcs1.html

  * igt@gem_eio@in-flight-suspend:
    - shard-tglu:         NOTRUN -> [DMESG-WARN][2] +1 other test dmesg-warn
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-7/igt@gem_eio@in-flight-suspend.html

  * igt@gem_workarounds@suspend-resume-context:
    - shard-dg1:          NOTRUN -> [DMESG-WARN][3] +1 other test dmesg-warn
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_workarounds@suspend-resume-context.html

  * igt@kms_flip@2x-flip-vs-suspend@bc-hdmi-a1-hdmi-a2:
    - shard-glk:          NOTRUN -> [DMESG-WARN][4] +13 other tests dmesg-warn
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk3/igt@kms_flip@2x-flip-vs-suspend@bc-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a4:
    - shard-dg1:          [PASS][5] -> [FAIL][6] +2 other tests fail
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg1-18/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a4.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a4.html

  * igt@kms_pm_dc@dc6-psr:
    - shard-mtlp:         NOTRUN -> [SKIP][7] +2 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_pm_dc@dc6-psr.html

  * igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait:
    - shard-rkl:          NOTRUN -> [SKIP][8] +11 other tests skip
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait.html

  * igt@kms_pm_rpm@system-suspend-modeset:
    - shard-tglu:         NOTRUN -> [SKIP][9] +1 other test skip
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@kms_pm_rpm@system-suspend-modeset.html

  * igt@kms_vblank@ts-continuation-dpms-suspend@pipe-d-hdmi-a-1:
    - shard-dg2:          NOTRUN -> [DMESG-WARN][10] +18 other tests dmesg-warn
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_vblank@ts-continuation-dpms-suspend@pipe-d-hdmi-a-1.html

  * igt@kms_vblank@ts-continuation-modeset-rpm@pipe-d-dp-4:
    - shard-dg2:          NOTRUN -> [SKIP][11] +10 other tests skip
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@kms_vblank@ts-continuation-modeset-rpm@pipe-d-dp-4.html

  * igt@kms_vblank@ts-continuation-modeset-rpm@pipe-d-hdmi-a-3:
    - shard-dg1:          NOTRUN -> [SKIP][12] +8 other tests skip
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_vblank@ts-continuation-modeset-rpm@pipe-d-hdmi-a-3.html

  * igt@kms_vblank@ts-continuation-suspend@pipe-b-hdmi-a-1:
    - shard-rkl:          NOTRUN -> [DMESG-WARN][13] +7 other tests dmesg-warn
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_vblank@ts-continuation-suspend@pipe-b-hdmi-a-1.html

  
#### Warnings ####

  * igt@i915_suspend@debugfs-reader:
    - shard-dg2:          [FAIL][14] ([i915#10031]) -> [DMESG-WARN][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-2/igt@i915_suspend@debugfs-reader.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@i915_suspend@debugfs-reader.html

  
Known issues
------------

  Here are the changes found in IGTPW_10965_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@api_intel_bb@blit-reloc-purge-cache:
    - shard-dg1:          NOTRUN -> [SKIP][16] ([i915#8411])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@api_intel_bb@blit-reloc-purge-cache.html

  * igt@api_intel_bb@object-reloc-purge-cache:
    - shard-dg2:          NOTRUN -> [SKIP][17] ([i915#8411])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@api_intel_bb@object-reloc-purge-cache.html

  * igt@debugfs_test@basic-hwmon:
    - shard-rkl:          NOTRUN -> [SKIP][18] ([i915#9318])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@debugfs_test@basic-hwmon.html

  * igt@device_reset@cold-reset-bound:
    - shard-dg2:          NOTRUN -> [SKIP][19] ([i915#7701])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@device_reset@cold-reset-bound.html

  * igt@device_reset@unbind-reset-rebind:
    - shard-dg1:          NOTRUN -> [INCOMPLETE][20] ([i915#9408] / [i915#9618])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@device_reset@unbind-reset-rebind.html

  * igt@drm_fdinfo@all-busy-check-all:
    - shard-mtlp:         NOTRUN -> [SKIP][21] ([i915#8414])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@drm_fdinfo@all-busy-check-all.html

  * igt@drm_fdinfo@busy-idle@vcs1:
    - shard-dg1:          NOTRUN -> [SKIP][22] ([i915#8414]) +5 other tests skip
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@drm_fdinfo@busy-idle@vcs1.html

  * igt@drm_fdinfo@most-busy-check-all@bcs0:
    - shard-dg2:          NOTRUN -> [SKIP][23] ([i915#8414]) +8 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@drm_fdinfo@most-busy-check-all@bcs0.html

  * igt@drm_fdinfo@virtual-idle:
    - shard-rkl:          NOTRUN -> [FAIL][24] ([i915#7742])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@drm_fdinfo@virtual-idle.html

  * igt@gem_caching@read-writes:
    - shard-mtlp:         NOTRUN -> [SKIP][25] ([i915#4873])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@gem_caching@read-writes.html

  * igt@gem_ccs@block-copy-compressed:
    - shard-rkl:          NOTRUN -> [SKIP][26] ([i915#3555] / [i915#9323])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@gem_ccs@block-copy-compressed.html

  * igt@gem_ccs@block-multicopy-compressed:
    - shard-rkl:          NOTRUN -> [SKIP][27] ([i915#9323])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@gem_ccs@block-multicopy-compressed.html

  * igt@gem_ccs@block-multicopy-inplace:
    - shard-dg1:          NOTRUN -> [SKIP][28] ([i915#3555] / [i915#9323])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_ccs@block-multicopy-inplace.html
    - shard-tglu:         NOTRUN -> [SKIP][29] ([i915#3555] / [i915#9323])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-3/igt@gem_ccs@block-multicopy-inplace.html

  * igt@gem_ccs@ctrl-surf-copy-new-ctx:
    - shard-mtlp:         NOTRUN -> [SKIP][30] ([i915#9323])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@gem_ccs@ctrl-surf-copy-new-ctx.html

  * igt@gem_close_race@multigpu-basic-threads:
    - shard-dg1:          NOTRUN -> [SKIP][31] ([i915#7697])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_close_race@multigpu-basic-threads.html

  * igt@gem_create@create-ext-set-pat:
    - shard-dg1:          NOTRUN -> [SKIP][32] ([i915#8562])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@gem_create@create-ext-set-pat.html

  * igt@gem_ctx_persistence@hang:
    - shard-mtlp:         NOTRUN -> [SKIP][33] ([i915#8555])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@gem_ctx_persistence@hang.html

  * igt@gem_ctx_persistence@heartbeat-many:
    - shard-dg1:          NOTRUN -> [SKIP][34] ([i915#8555])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@gem_ctx_persistence@heartbeat-many.html

  * igt@gem_ctx_persistence@heartbeat-stop:
    - shard-dg2:          NOTRUN -> [SKIP][35] ([i915#8555])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@gem_ctx_persistence@heartbeat-stop.html

  * igt@gem_ctx_sseu@invalid-args:
    - shard-mtlp:         NOTRUN -> [SKIP][36] ([i915#280])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@gem_ctx_sseu@invalid-args.html

  * igt@gem_ctx_sseu@invalid-sseu:
    - shard-dg2:          NOTRUN -> [SKIP][37] ([i915#280])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@gem_ctx_sseu@invalid-sseu.html

  * igt@gem_eio@kms:
    - shard-dg2:          NOTRUN -> [INCOMPLETE][38] ([i915#10513])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@gem_eio@kms.html
    - shard-dg1:          NOTRUN -> [INCOMPLETE][39] ([i915#10513])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_eio@kms.html

  * igt@gem_exec_balancer@bonded-pair:
    - shard-dg2:          NOTRUN -> [SKIP][40] ([i915#4771])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@gem_exec_balancer@bonded-pair.html
    - shard-dg1:          NOTRUN -> [SKIP][41] ([i915#4771])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@gem_exec_balancer@bonded-pair.html

  * igt@gem_exec_balancer@hog:
    - shard-dg2:          NOTRUN -> [SKIP][42] ([i915#4812])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_exec_balancer@hog.html

  * igt@gem_exec_balancer@parallel-ordering:
    - shard-tglu:         NOTRUN -> [FAIL][43] ([i915#6117])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-6/igt@gem_exec_balancer@parallel-ordering.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          NOTRUN -> [FAIL][44] ([i915#2846])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk3/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-tglu:         NOTRUN -> [FAIL][45] ([i915#2842])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@gem_exec_fair@basic-none-solo@rcs0.html
    - shard-rkl:          [PASS][46] -> [FAIL][47] ([i915#2842])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-rkl-4/igt@gem_exec_fair@basic-none-solo@rcs0.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-none-vip:
    - shard-dg2:          NOTRUN -> [SKIP][48] ([i915#3539] / [i915#4852]) +1 other test skip
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_exec_fair@basic-none-vip.html

  * igt@gem_exec_fair@basic-none@bcs0:
    - shard-rkl:          NOTRUN -> [FAIL][49] ([i915#2842]) +3 other tests fail
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-3/igt@gem_exec_fair@basic-none@bcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][50] -> [FAIL][51] ([i915#2842])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-glk8/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk8/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo:
    - shard-dg2:          NOTRUN -> [SKIP][52] ([i915#3539]) +2 other tests skip
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@gem_exec_fair@basic-pace-solo.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-tglu:         [PASS][53] -> [FAIL][54] ([i915#2842])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-8/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_fence@submit:
    - shard-mtlp:         NOTRUN -> [SKIP][55] ([i915#4812])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-5/igt@gem_exec_fence@submit.html

  * igt@gem_exec_flush@basic-wb-rw-before-default:
    - shard-dg1:          NOTRUN -> [SKIP][56] ([i915#3539] / [i915#4852])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@gem_exec_flush@basic-wb-rw-before-default.html

  * igt@gem_exec_params@rsvd2-dirt:
    - shard-dg2:          NOTRUN -> [SKIP][57] ([i915#5107])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@gem_exec_params@rsvd2-dirt.html

  * igt@gem_exec_reloc@basic-cpu-read:
    - shard-dg2:          NOTRUN -> [SKIP][58] ([i915#3281]) +13 other tests skip
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@gem_exec_reloc@basic-cpu-read.html

  * igt@gem_exec_reloc@basic-cpu-read-noreloc:
    - shard-mtlp:         NOTRUN -> [SKIP][59] ([i915#3281]) +7 other tests skip
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@gem_exec_reloc@basic-cpu-read-noreloc.html

  * igt@gem_exec_reloc@basic-cpu-wc:
    - shard-dg1:          NOTRUN -> [SKIP][60] ([i915#3281]) +12 other tests skip
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_exec_reloc@basic-cpu-wc.html

  * igt@gem_exec_reloc@basic-write-read-active:
    - shard-rkl:          NOTRUN -> [SKIP][61] ([i915#3281]) +7 other tests skip
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@gem_exec_reloc@basic-write-read-active.html

  * igt@gem_exec_schedule@preempt-queue-contexts:
    - shard-dg1:          NOTRUN -> [SKIP][62] ([i915#4812])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_exec_schedule@preempt-queue-contexts.html
    - shard-dg2:          NOTRUN -> [SKIP][63] ([i915#4537] / [i915#4812]) +1 other test skip
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@gem_exec_schedule@preempt-queue-contexts.html

  * igt@gem_exec_schedule@reorder-wide:
    - shard-mtlp:         NOTRUN -> [SKIP][64] ([i915#4537] / [i915#4812])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@gem_exec_schedule@reorder-wide.html

  * igt@gem_exec_suspend@basic-s4-devices@smem:
    - shard-rkl:          NOTRUN -> [ABORT][65] ([i915#7975] / [i915#8213])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@gem_exec_suspend@basic-s4-devices@smem.html

  * igt@gem_fence_thrash@bo-write-verify-x:
    - shard-dg2:          NOTRUN -> [SKIP][66] ([i915#4860]) +2 other tests skip
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_fence_thrash@bo-write-verify-x.html

  * igt@gem_fenced_exec_thrash@too-many-fences:
    - shard-mtlp:         NOTRUN -> [SKIP][67] ([i915#4860]) +1 other test skip
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@gem_fenced_exec_thrash@too-many-fences.html

  * igt@gem_huc_copy@huc-copy:
    - shard-rkl:          NOTRUN -> [SKIP][68] ([i915#2190])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-glk:          NOTRUN -> [SKIP][69] ([i915#4613]) +4 other tests skip
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk5/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_lmem_swapping@heavy-verify-random-ccs@lmem0:
    - shard-dg2:          NOTRUN -> [FAIL][70] ([i915#10378])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@gem_lmem_swapping@heavy-verify-random-ccs@lmem0.html

  * igt@gem_lmem_swapping@heavy-verify-random@lmem0:
    - shard-dg1:          NOTRUN -> [FAIL][71] ([i915#10378])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_lmem_swapping@heavy-verify-random@lmem0.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - shard-mtlp:         NOTRUN -> [SKIP][72] ([i915#4613])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@gem_lmem_swapping@parallel-random-verify-ccs:
    - shard-rkl:          NOTRUN -> [SKIP][73] ([i915#4613]) +4 other tests skip
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@gem_lmem_swapping@parallel-random-verify-ccs.html

  * igt@gem_lmem_swapping@verify-random:
    - shard-tglu:         NOTRUN -> [SKIP][74] ([i915#4613]) +1 other test skip
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@gem_lmem_swapping@verify-random.html

  * igt@gem_lmem_swapping@verify-random-ccs@lmem0:
    - shard-dg1:          NOTRUN -> [SKIP][75] ([i915#4565]) +1 other test skip
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_lmem_swapping@verify-random-ccs@lmem0.html

  * igt@gem_media_vme:
    - shard-dg2:          NOTRUN -> [SKIP][76] ([i915#284])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_media_vme.html

  * igt@gem_mmap_gtt@basic-small-bo:
    - shard-dg2:          NOTRUN -> [SKIP][77] ([i915#4077]) +7 other tests skip
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_mmap_gtt@basic-small-bo.html

  * igt@gem_mmap_gtt@cpuset-basic-small-copy-odd:
    - shard-dg1:          NOTRUN -> [SKIP][78] ([i915#4077]) +11 other tests skip
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-16/igt@gem_mmap_gtt@cpuset-basic-small-copy-odd.html

  * igt@gem_mmap_gtt@cpuset-medium-copy-odd:
    - shard-mtlp:         NOTRUN -> [SKIP][79] ([i915#4077]) +5 other tests skip
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@gem_mmap_gtt@cpuset-medium-copy-odd.html

  * igt@gem_mmap_wc@fault-concurrent:
    - shard-dg2:          NOTRUN -> [SKIP][80] ([i915#4083]) +4 other tests skip
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@gem_mmap_wc@fault-concurrent.html

  * igt@gem_mmap_wc@read-write:
    - shard-mtlp:         NOTRUN -> [SKIP][81] ([i915#4083])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-5/igt@gem_mmap_wc@read-write.html

  * igt@gem_mmap_wc@write-read:
    - shard-dg1:          NOTRUN -> [SKIP][82] ([i915#4083]) +5 other tests skip
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_mmap_wc@write-read.html

  * igt@gem_partial_pwrite_pread@reads-uncached:
    - shard-dg2:          NOTRUN -> [SKIP][83] ([i915#3282]) +7 other tests skip
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-3/igt@gem_partial_pwrite_pread@reads-uncached.html

  * igt@gem_partial_pwrite_pread@write-snoop:
    - shard-mtlp:         NOTRUN -> [SKIP][84] ([i915#3282]) +2 other tests skip
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-1/igt@gem_partial_pwrite_pread@write-snoop.html

  * igt@gem_pread@snoop:
    - shard-dg1:          NOTRUN -> [SKIP][85] ([i915#3282]) +2 other tests skip
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_pread@snoop.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-rkl:          NOTRUN -> [SKIP][86] ([i915#3282]) +4 other tests skip
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_pxp@create-regular-buffer:
    - shard-mtlp:         NOTRUN -> [SKIP][87] ([i915#4270])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@gem_pxp@create-regular-buffer.html

  * igt@gem_pxp@dmabuf-shared-protected-dst-is-context-refcounted:
    - shard-dg2:          NOTRUN -> [SKIP][88] ([i915#4270]) +2 other tests skip
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_pxp@dmabuf-shared-protected-dst-is-context-refcounted.html

  * igt@gem_pxp@reject-modify-context-protection-off-1:
    - shard-tglu:         NOTRUN -> [SKIP][89] ([i915#4270]) +2 other tests skip
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-4/igt@gem_pxp@reject-modify-context-protection-off-1.html

  * igt@gem_pxp@reject-modify-context-protection-off-3:
    - shard-dg1:          NOTRUN -> [SKIP][90] ([i915#4270]) +4 other tests skip
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gem_pxp@reject-modify-context-protection-off-3.html

  * igt@gem_pxp@reject-modify-context-protection-on:
    - shard-rkl:          NOTRUN -> [SKIP][91] ([i915#4270])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@gem_pxp@reject-modify-context-protection-on.html

  * igt@gem_render_copy@y-tiled-ccs-to-linear:
    - shard-dg2:          NOTRUN -> [SKIP][92] ([i915#5190] / [i915#8428]) +11 other tests skip
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-3/igt@gem_render_copy@y-tiled-ccs-to-linear.html

  * igt@gem_render_copy@y-tiled-to-vebox-y-tiled:
    - shard-mtlp:         NOTRUN -> [SKIP][93] ([i915#8428]) +3 other tests skip
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-4/igt@gem_render_copy@y-tiled-to-vebox-y-tiled.html

  * igt@gem_render_tiled_blits@basic:
    - shard-mtlp:         NOTRUN -> [SKIP][94] ([i915#4079])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@gem_render_tiled_blits@basic.html

  * igt@gem_set_tiling_vs_blt@tiled-to-tiled:
    - shard-rkl:          NOTRUN -> [SKIP][95] ([i915#8411]) +1 other test skip
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@gem_set_tiling_vs_blt@tiled-to-tiled.html

  * igt@gem_tiled_pread_pwrite:
    - shard-dg2:          NOTRUN -> [SKIP][96] ([i915#4079])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@gem_tiled_pread_pwrite.html

  * igt@gem_userptr_blits@coherency-sync:
    - shard-dg2:          NOTRUN -> [SKIP][97] ([i915#3297]) +2 other tests skip
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@gem_userptr_blits@coherency-sync.html

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy:
    - shard-dg1:          NOTRUN -> [SKIP][98] ([i915#3297] / [i915#4880]) +1 other test skip
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html

  * igt@gem_userptr_blits@mmap-offset-banned@gtt:
    - shard-mtlp:         NOTRUN -> [SKIP][99] ([i915#3297])
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-1/igt@gem_userptr_blits@mmap-offset-banned@gtt.html

  * igt@gem_userptr_blits@unsync-unmap-after-close:
    - shard-dg1:          NOTRUN -> [SKIP][100] ([i915#3297])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@gem_userptr_blits@unsync-unmap-after-close.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-dg2:          NOTRUN -> [SKIP][101] ([i915#2856]) +3 other tests skip
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gen9_exec_parse@allowed-single.html

  * igt@gen9_exec_parse@batch-zero-length:
    - shard-mtlp:         NOTRUN -> [SKIP][102] ([i915#2856])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@gen9_exec_parse@batch-zero-length.html

  * igt@gen9_exec_parse@bb-chained:
    - shard-rkl:          NOTRUN -> [SKIP][103] ([i915#2527]) +1 other test skip
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@gen9_exec_parse@bb-chained.html

  * igt@gen9_exec_parse@bb-start-param:
    - shard-dg1:          NOTRUN -> [SKIP][104] ([i915#2527])
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@gen9_exec_parse@bb-start-param.html

  * igt@gen9_exec_parse@secure-batches:
    - shard-tglu:         NOTRUN -> [SKIP][105] ([i915#2527] / [i915#2856])
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-6/igt@gen9_exec_parse@secure-batches.html

  * igt@i915_fb_tiling:
    - shard-dg2:          NOTRUN -> [SKIP][106] ([i915#4881])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@i915_fb_tiling.html
    - shard-dg1:          NOTRUN -> [SKIP][107] ([i915#4881])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-16/igt@i915_fb_tiling.html

  * igt@i915_module_load@load:
    - shard-glk:          NOTRUN -> [SKIP][108] ([i915#6227])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk8/igt@i915_module_load@load.html
    - shard-dg2:          NOTRUN -> [SKIP][109] ([i915#6227])
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@i915_module_load@load.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-glk:          [PASS][110] -> [INCOMPLETE][111] ([i915#9849])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-glk8/igt@i915_module_load@reload-with-fault-injection.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk8/igt@i915_module_load@reload-with-fault-injection.html

  * igt@i915_pipe_stress@stress-xrgb8888-ytiled:
    - shard-dg2:          NOTRUN -> [SKIP][112] ([i915#7091])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@i915_pipe_stress@stress-xrgb8888-ytiled.html

  * igt@i915_pm_freq_api@freq-suspend:
    - shard-tglu:         NOTRUN -> [SKIP][113] ([i915#8399])
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@i915_pm_freq_api@freq-suspend.html

  * igt@i915_pm_freq_mult@media-freq@gt0:
    - shard-dg1:          NOTRUN -> [SKIP][114] ([i915#6590])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@i915_pm_freq_mult@media-freq@gt0.html

  * igt@i915_pm_rps@min-max-config-idle:
    - shard-mtlp:         NOTRUN -> [SKIP][115] ([i915#6621])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@i915_pm_rps@min-max-config-idle.html

  * igt@i915_pm_rps@thresholds-park@gt0:
    - shard-dg2:          NOTRUN -> [SKIP][116] ([i915#8925])
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@i915_pm_rps@thresholds-park@gt0.html

  * igt@i915_power@sanity:
    - shard-rkl:          NOTRUN -> [SKIP][117] ([i915#7984])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@i915_power@sanity.html

  * igt@i915_query@test-query-geometry-subslices:
    - shard-dg1:          NOTRUN -> [SKIP][118] ([i915#5723])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@i915_query@test-query-geometry-subslices.html

  * igt@i915_selftest@live@gt_contexts:
    - shard-dg2:          NOTRUN -> [ABORT][119] ([i915#10366])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@i915_selftest@live@gt_contexts.html

  * igt@i915_suspend@basic-s3-without-i915:
    - shard-rkl:          [PASS][120] -> [FAIL][121] ([i915#10031])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-rkl-4/igt@i915_suspend@basic-s3-without-i915.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@i915_suspend@basic-s3-without-i915.html
    - shard-tglu:         NOTRUN -> [INCOMPLETE][122] ([i915#7443])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-7/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_addfb_basic@bo-too-small-due-to-tiling:
    - shard-dg1:          NOTRUN -> [SKIP][123] ([i915#4212]) +3 other tests skip
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_addfb_basic@bo-too-small-due-to-tiling.html

  * igt@kms_addfb_basic@framebuffer-vs-set-tiling:
    - shard-dg2:          NOTRUN -> [SKIP][124] ([i915#4212]) +1 other test skip
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-3/igt@kms_addfb_basic@framebuffer-vs-set-tiling.html

  * igt@kms_async_flips@async-flip-with-page-flip-events@pipe-b-hdmi-a-2-y-rc-ccs-cc:
    - shard-rkl:          NOTRUN -> [SKIP][125] ([i915#8709]) +3 other tests skip
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_async_flips@async-flip-with-page-flip-events@pipe-b-hdmi-a-2-y-rc-ccs-cc.html

  * igt@kms_async_flips@invalid-async-flip:
    - shard-dg2:          NOTRUN -> [SKIP][126] ([i915#6228])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_async_flips@invalid-async-flip.html

  * igt@kms_atomic@plane-primary-overlay-mutable-zpos:
    - shard-dg2:          NOTRUN -> [SKIP][127] ([i915#9531])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_atomic@plane-primary-overlay-mutable-zpos.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels:
    - shard-tglu:         NOTRUN -> [SKIP][128] ([i915#1769] / [i915#3555])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-8/igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels.html
    - shard-rkl:          NOTRUN -> [SKIP][129] ([i915#1769] / [i915#3555])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-internal-panels:
    - shard-glk:          NOTRUN -> [SKIP][130] ([i915#1769])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk5/igt@kms_atomic_transition@plane-all-modeset-transition-internal-panels.html

  * igt@kms_big_fb@4-tiled-16bpp-rotate-90:
    - shard-rkl:          NOTRUN -> [SKIP][131] ([i915#5286]) +3 other tests skip
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-3/igt@kms_big_fb@4-tiled-16bpp-rotate-90.html

  * igt@kms_big_fb@4-tiled-32bpp-rotate-180:
    - shard-dg1:          NOTRUN -> [SKIP][132] ([i915#4538] / [i915#5286]) +5 other tests skip
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_big_fb@4-tiled-32bpp-rotate-180.html

  * igt@kms_big_fb@4-tiled-32bpp-rotate-270:
    - shard-tglu:         NOTRUN -> [SKIP][133] ([i915#5286]) +2 other tests skip
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-3/igt@kms_big_fb@4-tiled-32bpp-rotate-270.html

  * igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0:
    - shard-mtlp:         [PASS][134] -> [FAIL][135] ([i915#5138]) +1 other test fail
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-mtlp-4/igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_big_fb@linear-32bpp-rotate-90:
    - shard-rkl:          NOTRUN -> [SKIP][136] ([i915#3638]) +2 other tests skip
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_big_fb@linear-32bpp-rotate-90.html

  * igt@kms_big_fb@linear-64bpp-rotate-90:
    - shard-dg1:          NOTRUN -> [SKIP][137] ([i915#3638]) +4 other tests skip
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_big_fb@linear-64bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip:
    - shard-tglu:         NOTRUN -> [FAIL][138] ([i915#3743])
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-6/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip.html

  * igt@kms_big_fb@y-tiled-addfb-size-offset-overflow:
    - shard-dg2:          NOTRUN -> [SKIP][139] ([i915#5190]) +4 other tests skip
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_big_fb@y-tiled-addfb-size-offset-overflow.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip:
    - shard-mtlp:         NOTRUN -> [SKIP][140] +9 other tests skip
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip.html
    - shard-tglu:         [PASS][141] -> [FAIL][142] ([i915#3743])
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-7/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip.html
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip:
    - shard-dg2:          NOTRUN -> [SKIP][143] ([i915#4538] / [i915#5190]) +13 other tests skip
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-8/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180:
    - shard-dg1:          NOTRUN -> [SKIP][144] ([i915#4538]) +4 other tests skip
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180.html

  * igt@kms_big_joiner@invalid-modeset:
    - shard-dg2:          NOTRUN -> [SKIP][145] ([i915#10656])
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_big_joiner@invalid-modeset.html

  * igt@kms_ccs@bad-aux-stride-y-tiled-gen12-rc-ccs@pipe-d-hdmi-a-1:
    - shard-dg2:          NOTRUN -> [SKIP][146] ([i915#10307] / [i915#10434] / [i915#6095]) +6 other tests skip
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_ccs@bad-aux-stride-y-tiled-gen12-rc-ccs@pipe-d-hdmi-a-1.html

  * igt@kms_ccs@bad-pixel-format-4-tiled-dg2-rc-ccs-cc@pipe-c-edp-1:
    - shard-mtlp:         NOTRUN -> [SKIP][147] ([i915#6095]) +23 other tests skip
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_ccs@bad-pixel-format-4-tiled-dg2-rc-ccs-cc@pipe-c-edp-1.html

  * igt@kms_ccs@bad-rotation-90-4-tiled-xe2-ccs:
    - shard-dg2:          NOTRUN -> [SKIP][148] ([i915#10278]) +1 other test skip
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_ccs@bad-rotation-90-4-tiled-xe2-ccs.html
    - shard-dg1:          NOTRUN -> [SKIP][149] ([i915#10278]) +1 other test skip
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_ccs@bad-rotation-90-4-tiled-xe2-ccs.html

  * igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-mc-ccs@pipe-b-hdmi-a-1:
    - shard-tglu:         NOTRUN -> [SKIP][150] ([i915#6095]) +15 other tests skip
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-6/igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-mc-ccs@pipe-b-hdmi-a-1.html

  * igt@kms_ccs@ccs-on-another-bo-yf-tiled-ccs@pipe-a-hdmi-a-3:
    - shard-dg2:          NOTRUN -> [SKIP][151] ([i915#10307] / [i915#6095]) +156 other tests skip
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@kms_ccs@ccs-on-another-bo-yf-tiled-ccs@pipe-a-hdmi-a-3.html

  * igt@kms_ccs@crc-primary-basic-4-tiled-xe2-ccs:
    - shard-tglu:         NOTRUN -> [SKIP][152] ([i915#10278]) +1 other test skip
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-8/igt@kms_ccs@crc-primary-basic-4-tiled-xe2-ccs.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-a-hdmi-a-3:
    - shard-dg1:          NOTRUN -> [SKIP][153] ([i915#6095]) +67 other tests skip
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-a-hdmi-a-3.html

  * igt@kms_ccs@random-ccs-data-y-tiled-ccs@pipe-b-hdmi-a-1:
    - shard-rkl:          NOTRUN -> [SKIP][154] ([i915#6095]) +57 other tests skip
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@kms_ccs@random-ccs-data-y-tiled-ccs@pipe-b-hdmi-a-1.html

  * igt@kms_cdclk@mode-transition-all-outputs:
    - shard-rkl:          NOTRUN -> [SKIP][155] ([i915#3742])
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-2/igt@kms_cdclk@mode-transition-all-outputs.html

  * igt@kms_cdclk@mode-transition@pipe-a-dp-4:
    - shard-dg2:          NOTRUN -> [SKIP][156] ([i915#7213]) +3 other tests skip
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@kms_cdclk@mode-transition@pipe-a-dp-4.html

  * igt@kms_cdclk@plane-scaling:
    - shard-tglu:         NOTRUN -> [SKIP][157] ([i915#3742])
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@kms_cdclk@plane-scaling.html

  * igt@kms_chamelium_audio@hdmi-audio:
    - shard-dg2:          NOTRUN -> [SKIP][158] ([i915#7828]) +12 other tests skip
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_chamelium_audio@hdmi-audio.html

  * igt@kms_chamelium_audio@hdmi-audio-edid:
    - shard-dg1:          NOTRUN -> [SKIP][159] ([i915#7828]) +9 other tests skip
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-16/igt@kms_chamelium_audio@hdmi-audio-edid.html

  * igt@kms_chamelium_frames@hdmi-crc-single:
    - shard-rkl:          NOTRUN -> [SKIP][160] ([i915#7828]) +5 other tests skip
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_chamelium_frames@hdmi-crc-single.html

  * igt@kms_chamelium_hpd@dp-hpd-for-each-pipe:
    - shard-mtlp:         NOTRUN -> [SKIP][161] ([i915#7828]) +3 other tests skip
   [161]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_chamelium_hpd@dp-hpd-for-each-pipe.html

  * igt@kms_chamelium_hpd@hdmi-hpd-storm:
    - shard-tglu:         NOTRUN -> [SKIP][162] ([i915#7828]) +4 other tests skip
   [162]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-4/igt@kms_chamelium_hpd@hdmi-hpd-storm.html

  * igt@kms_color@deep-color:
    - shard-dg2:          NOTRUN -> [SKIP][163] ([i915#3555]) +6 other tests skip
   [163]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-8/igt@kms_color@deep-color.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-mtlp:         NOTRUN -> [SKIP][164] ([i915#6944] / [i915#9424])
   [164]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@content-type-change:
    - shard-dg2:          NOTRUN -> [SKIP][165] ([i915#9424])
   [165]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_content_protection@content-type-change.html

  * igt@kms_content_protection@dp-mst-lic-type-0:
    - shard-dg2:          NOTRUN -> [SKIP][166] ([i915#3299])
   [166]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_content_protection@dp-mst-lic-type-0.html

  * igt@kms_content_protection@dp-mst-lic-type-1:
    - shard-rkl:          NOTRUN -> [SKIP][167] ([i915#3116]) +1 other test skip
   [167]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@kms_content_protection@dp-mst-lic-type-1.html

  * igt@kms_content_protection@lic-type-1:
    - shard-dg1:          NOTRUN -> [SKIP][168] ([i915#9424])
   [168]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_content_protection@lic-type-1.html

  * igt@kms_content_protection@srm:
    - shard-rkl:          NOTRUN -> [SKIP][169] ([i915#7118])
   [169]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_content_protection@srm.html

  * igt@kms_content_protection@type1:
    - shard-dg1:          NOTRUN -> [SKIP][170] ([i915#7116] / [i915#9424]) +1 other test skip
   [170]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_content_protection@type1.html

  * igt@kms_content_protection@uevent:
    - shard-dg2:          NOTRUN -> [SKIP][171] ([i915#7118] / [i915#9424]) +1 other test skip
   [171]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_content_protection@uevent.html
    - shard-tglu:         NOTRUN -> [SKIP][172] ([i915#6944] / [i915#7116] / [i915#7118] / [i915#9424])
   [172]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-10/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@cursor-offscreen-32x10:
    - shard-rkl:          NOTRUN -> [SKIP][173] ([i915#3555]) +6 other tests skip
   [173]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_cursor_crc@cursor-offscreen-32x10.html

  * igt@kms_cursor_crc@cursor-onscreen-512x170:
    - shard-mtlp:         NOTRUN -> [SKIP][174] ([i915#3359])
   [174]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_cursor_crc@cursor-onscreen-512x170.html

  * igt@kms_cursor_crc@cursor-rapid-movement-512x512:
    - shard-rkl:          NOTRUN -> [SKIP][175] ([i915#3359])
   [175]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@kms_cursor_crc@cursor-rapid-movement-512x512.html

  * igt@kms_cursor_crc@cursor-rapid-movement-64x21:
    - shard-mtlp:         NOTRUN -> [SKIP][176] ([i915#8814]) +2 other tests skip
   [176]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-5/igt@kms_cursor_crc@cursor-rapid-movement-64x21.html

  * igt@kms_cursor_crc@cursor-rapid-movement-max-size:
    - shard-mtlp:         NOTRUN -> [SKIP][177] ([i915#3555] / [i915#8814])
   [177]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_cursor_crc@cursor-rapid-movement-max-size.html

  * igt@kms_cursor_crc@cursor-sliding-512x170:
    - shard-dg2:          NOTRUN -> [SKIP][178] ([i915#3359]) +3 other tests skip
   [178]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_cursor_crc@cursor-sliding-512x170.html

  * igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
    - shard-mtlp:         NOTRUN -> [SKIP][179] ([i915#9809]) +1 other test skip
   [179]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic.html

  * igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size:
    - shard-dg2:          NOTRUN -> [SKIP][180] ([i915#4103] / [i915#4213])
   [180]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html
    - shard-dg1:          NOTRUN -> [SKIP][181] ([i915#4103] / [i915#4213])
   [181]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html
    - shard-tglu:         NOTRUN -> [SKIP][182] ([i915#4103])
   [182]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-8/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html

  * igt@kms_dirtyfb@drrs-dirtyfb-ioctl:
    - shard-dg2:          NOTRUN -> [SKIP][183] ([i915#9833])
   [183]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-3/igt@kms_dirtyfb@drrs-dirtyfb-ioctl.html

  * igt@kms_dirtyfb@fbc-dirtyfb-ioctl@a-hdmi-a-1:
    - shard-dg2:          NOTRUN -> [SKIP][184] ([i915#9227])
   [184]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_dirtyfb@fbc-dirtyfb-ioctl@a-hdmi-a-1.html

  * igt@kms_dirtyfb@fbc-dirtyfb-ioctl@a-hdmi-a-4:
    - shard-dg1:          NOTRUN -> [SKIP][185] ([i915#9723])
   [185]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_dirtyfb@fbc-dirtyfb-ioctl@a-hdmi-a-4.html

  * igt@kms_dirtyfb@psr-dirtyfb-ioctl:
    - shard-tglu:         NOTRUN -> [SKIP][186] ([i915#9723])
   [186]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-10/igt@kms_dirtyfb@psr-dirtyfb-ioctl.html

  * igt@kms_draw_crc@draw-method-mmap-wc:
    - shard-dg2:          NOTRUN -> [SKIP][187] ([i915#8812])
   [187]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@kms_draw_crc@draw-method-mmap-wc.html

  * igt@kms_dsc@dsc-basic:
    - shard-dg2:          NOTRUN -> [SKIP][188] ([i915#3555] / [i915#3840])
   [188]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_dsc@dsc-basic.html

  * igt@kms_dsc@dsc-fractional-bpp-with-bpc:
    - shard-dg1:          NOTRUN -> [SKIP][189] ([i915#3840])
   [189]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_dsc@dsc-fractional-bpp-with-bpc.html

  * igt@kms_dsc@dsc-with-formats:
    - shard-tglu:         NOTRUN -> [SKIP][190] ([i915#3555] / [i915#3840]) +1 other test skip
   [190]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@kms_dsc@dsc-with-formats.html
    - shard-mtlp:         NOTRUN -> [SKIP][191] ([i915#3555] / [i915#3840])
   [191]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_dsc@dsc-with-formats.html
    - shard-rkl:          NOTRUN -> [SKIP][192] ([i915#3555] / [i915#3840])
   [192]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_dsc@dsc-with-formats.html

  * igt@kms_fbcon_fbt@psr:
    - shard-dg2:          NOTRUN -> [SKIP][193] ([i915#3469]) +1 other test skip
   [193]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_fbcon_fbt@psr.html

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-rkl:          NOTRUN -> [SKIP][194] ([i915#3955]) +1 other test skip
   [194]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@kms_fbcon_fbt@psr-suspend.html

  * igt@kms_feature_discovery@chamelium:
    - shard-dg1:          NOTRUN -> [SKIP][195] ([i915#4854])
   [195]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_feature_discovery@chamelium.html

  * igt@kms_feature_discovery@display-2x:
    - shard-mtlp:         NOTRUN -> [SKIP][196] ([i915#1839])
   [196]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@kms_feature_discovery@display-2x.html

  * igt@kms_feature_discovery@display-3x:
    - shard-dg2:          NOTRUN -> [SKIP][197] ([i915#1839])
   [197]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_feature_discovery@display-3x.html
    - shard-dg1:          NOTRUN -> [SKIP][198] ([i915#1839])
   [198]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_feature_discovery@display-3x.html
    - shard-tglu:         NOTRUN -> [SKIP][199] ([i915#1839])
   [199]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-8/igt@kms_feature_discovery@display-3x.html

  * igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-vga1-hdmi-a1:
    - shard-snb:          [PASS][200] -> [FAIL][201] ([i915#2122])
   [200]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-snb4/igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-vga1-hdmi-a1.html
   [201]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-snb7/igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-vga1-hdmi-a1.html

  * igt@kms_flip@2x-flip-vs-dpms:
    - shard-rkl:          NOTRUN -> [SKIP][202] +30 other tests skip
   [202]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_flip@2x-flip-vs-dpms.html

  * igt@kms_flip@2x-flip-vs-fences:
    - shard-dg2:          NOTRUN -> [SKIP][203] ([i915#8381])
   [203]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_flip@2x-flip-vs-fences.html

  * igt@kms_flip@2x-flip-vs-modeset:
    - shard-tglu:         NOTRUN -> [SKIP][204] ([i915#3637] / [i915#3966])
   [204]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-10/igt@kms_flip@2x-flip-vs-modeset.html

  * igt@kms_flip@2x-modeset-vs-vblank-race:
    - shard-dg2:          NOTRUN -> [SKIP][205] +35 other tests skip
   [205]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_flip@2x-modeset-vs-vblank-race.html

  * igt@kms_flip@2x-plain-flip:
    - shard-tglu:         NOTRUN -> [SKIP][206] ([i915#3637]) +2 other tests skip
   [206]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@kms_flip@2x-plain-flip.html

  * igt@kms_flip@2x-plain-flip-fb-recreate:
    - shard-mtlp:         NOTRUN -> [SKIP][207] ([i915#3637]) +2 other tests skip
   [207]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-5/igt@kms_flip@2x-plain-flip-fb-recreate.html

  * igt@kms_flip@2x-plain-flip-ts-check-interruptible:
    - shard-dg1:          NOTRUN -> [SKIP][208] ([i915#9934]) +6 other tests skip
   [208]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html

  * igt@kms_flip@flip-vs-fences-interruptible:
    - shard-dg1:          NOTRUN -> [SKIP][209] ([i915#8381])
   [209]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-16/igt@kms_flip@flip-vs-fences-interruptible.html

  * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode:
    - shard-mtlp:         NOTRUN -> [SKIP][210] ([i915#8810])
   [210]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-5/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling@pipe-a-valid-mode:
    - shard-dg1:          NOTRUN -> [SKIP][211] ([i915#2587] / [i915#2672]) +3 other tests skip
   [211]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling@pipe-a-valid-mode.html
    - shard-tglu:         NOTRUN -> [SKIP][212] ([i915#2587] / [i915#2672]) +2 other tests skip
   [212]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-default-mode:
    - shard-mtlp:         NOTRUN -> [SKIP][213] ([i915#2672])
   [213]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling@pipe-a-valid-mode:
    - shard-rkl:          NOTRUN -> [SKIP][214] ([i915#2672]) +1 other test skip
   [214]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-upscaling@pipe-a-valid-mode:
    - shard-dg2:          NOTRUN -> [SKIP][215] ([i915#2672]) +8 other tests skip
   [215]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-8/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-upscaling@pipe-a-valid-mode.html

  * igt@kms_frontbuffer_tracking@fbc-2p-rte:
    - shard-dg2:          NOTRUN -> [SKIP][216] ([i915#5354]) +47 other tests skip
   [216]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@kms_frontbuffer_tracking@fbc-2p-rte.html

  * igt@kms_frontbuffer_tracking@fbc-tiling-y:
    - shard-dg2:          NOTRUN -> [SKIP][217] ([i915#10055])
   [217]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_frontbuffer_tracking@fbc-tiling-y.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-indfb-draw-pwrite:
    - shard-dg2:          NOTRUN -> [SKIP][218] ([i915#3458]) +23 other tests skip
   [218]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-indfb-draw-pwrite.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-render:
    - shard-dg1:          NOTRUN -> [SKIP][219] ([i915#3458]) +15 other tests skip
   [219]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-shrfb-draw-mmap-cpu:
    - shard-mtlp:         NOTRUN -> [SKIP][220] ([i915#1825]) +16 other tests skip
   [220]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-shrfb-draw-mmap-cpu.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-blt:
    - shard-rkl:          NOTRUN -> [SKIP][221] ([i915#1825]) +34 other tests skip
   [221]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-shrfb-fliptrack-mmap-gtt:
    - shard-dg2:          NOTRUN -> [SKIP][222] ([i915#8708]) +26 other tests skip
   [222]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_frontbuffer_tracking@fbcpsr-2p-shrfb-fliptrack-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@plane-fbc-rte:
    - shard-dg2:          NOTRUN -> [SKIP][223] ([i915#10070])
   [223]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_frontbuffer_tracking@plane-fbc-rte.html

  * igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-mmap-gtt:
    - shard-dg1:          NOTRUN -> [SKIP][224] ([i915#8708]) +12 other tests skip
   [224]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-rkl:          NOTRUN -> [SKIP][225] ([i915#3023]) +22 other tests skip
   [225]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-pwrite.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-msflip-blt:
    - shard-dg1:          NOTRUN -> [SKIP][226] +42 other tests skip
   [226]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-rgb565-draw-mmap-gtt:
    - shard-mtlp:         NOTRUN -> [SKIP][227] ([i915#8708]) +4 other tests skip
   [227]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_frontbuffer_tracking@psr-rgb565-draw-mmap-gtt.html

  * igt@kms_getfb@getfb-reject-ccs:
    - shard-dg2:          NOTRUN -> [SKIP][228] ([i915#6118])
   [228]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@kms_getfb@getfb-reject-ccs.html

  * igt@kms_hdr@invalid-hdr:
    - shard-dg1:          NOTRUN -> [SKIP][229] ([i915#3555] / [i915#8228])
   [229]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_hdr@invalid-hdr.html

  * igt@kms_hdr@static-swap:
    - shard-mtlp:         NOTRUN -> [SKIP][230] ([i915#3555] / [i915#8228])
   [230]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@kms_hdr@static-swap.html

  * igt@kms_hdr@static-toggle:
    - shard-tglu:         NOTRUN -> [SKIP][231] ([i915#3555] / [i915#8228]) +1 other test skip
   [231]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-6/igt@kms_hdr@static-toggle.html

  * igt@kms_hdr@static-toggle-dpms:
    - shard-dg2:          NOTRUN -> [SKIP][232] ([i915#3555] / [i915#8228]) +1 other test skip
   [232]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_hdr@static-toggle-dpms.html

  * igt@kms_hdr@static-toggle-suspend:
    - shard-rkl:          NOTRUN -> [SKIP][233] ([i915#3555] / [i915#8228]) +1 other test skip
   [233]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_hdr@static-toggle-suspend.html

  * igt@kms_invalid_mode@clock-too-high@pipe-a-edp-1:
    - shard-mtlp:         NOTRUN -> [SKIP][234] ([i915#9457]) +3 other tests skip
   [234]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_invalid_mode@clock-too-high@pipe-a-edp-1.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-c-hdmi-a-1:
    - shard-glk:          NOTRUN -> [FAIL][235] ([i915#7862]) +1 other test fail
   [235]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk8/igt@kms_plane_alpha_blend@alpha-basic@pipe-c-hdmi-a-1.html

  * igt@kms_plane_lowres@tiling-y:
    - shard-dg2:          NOTRUN -> [SKIP][236] ([i915#8821])
   [236]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@kms_plane_lowres@tiling-y.html

  * igt@kms_plane_scaling@intel-max-src-size@pipe-a-hdmi-a-4:
    - shard-dg1:          NOTRUN -> [FAIL][237] ([i915#8292])
   [237]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_plane_scaling@intel-max-src-size@pipe-a-hdmi-a-4.html

  * igt@kms_plane_scaling@plane-downscale-factor-0-25-with-modifiers@pipe-a-hdmi-a-3:
    - shard-dg2:          NOTRUN -> [SKIP][238] ([i915#9423]) +7 other tests skip
   [238]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@kms_plane_scaling@plane-downscale-factor-0-25-with-modifiers@pipe-a-hdmi-a-3.html

  * igt@kms_plane_scaling@plane-downscale-factor-0-25-with-modifiers@pipe-b-hdmi-a-4:
    - shard-dg1:          NOTRUN -> [SKIP][239] ([i915#9423]) +7 other tests skip
   [239]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_plane_scaling@plane-downscale-factor-0-25-with-modifiers@pipe-b-hdmi-a-4.html

  * igt@kms_plane_scaling@plane-downscale-factor-0-25-with-pixel-format@pipe-a-hdmi-a-1:
    - shard-rkl:          NOTRUN -> [SKIP][240] ([i915#9423]) +9 other tests skip
   [240]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-4/igt@kms_plane_scaling@plane-downscale-factor-0-25-with-pixel-format@pipe-a-hdmi-a-1.html

  * igt@kms_plane_scaling@plane-downscale-factor-0-25-with-pixel-format@pipe-b-hdmi-a-1:
    - shard-glk:          NOTRUN -> [SKIP][241] +277 other tests skip
   [241]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk9/igt@kms_plane_scaling@plane-downscale-factor-0-25-with-pixel-format@pipe-b-hdmi-a-1.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-25@pipe-c-hdmi-a-3:
    - shard-dg1:          NOTRUN -> [SKIP][242] ([i915#5235]) +7 other tests skip
   [242]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_plane_scaling@planes-downscale-factor-0-25@pipe-c-hdmi-a-3.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-5-upscale-20x20@pipe-b-edp-1:
    - shard-mtlp:         NOTRUN -> [SKIP][243] ([i915#5235]) +6 other tests skip
   [243]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_plane_scaling@planes-downscale-factor-0-5-upscale-20x20@pipe-b-edp-1.html

  * igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-25@pipe-b-hdmi-a-2:
    - shard-rkl:          NOTRUN -> [SKIP][244] ([i915#5235]) +5 other tests skip
   [244]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-25@pipe-b-hdmi-a-2.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-25@pipe-a-hdmi-a-3:
    - shard-dg2:          NOTRUN -> [SKIP][245] ([i915#5235] / [i915#9423]) +15 other tests skip
   [245]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-25@pipe-a-hdmi-a-3.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-d-edp-1:
    - shard-mtlp:         NOTRUN -> [SKIP][246] ([i915#3555] / [i915#5235])
   [246]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-d-edp-1.html

  * igt@kms_pm_backlight@bad-brightness:
    - shard-rkl:          NOTRUN -> [SKIP][247] ([i915#5354]) +1 other test skip
   [247]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_pm_backlight@bad-brightness.html

  * igt@kms_pm_backlight@basic-brightness:
    - shard-dg1:          NOTRUN -> [SKIP][248] ([i915#5354]) +1 other test skip
   [248]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_pm_backlight@basic-brightness.html

  * igt@kms_pm_backlight@fade-with-suspend:
    - shard-tglu:         NOTRUN -> [SKIP][249] ([i915#9812])
   [249]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-10/igt@kms_pm_backlight@fade-with-suspend.html

  * igt@kms_pm_rpm@basic-rte:
    - shard-dg1:          NOTRUN -> [SKIP][250] ([i915#10648])
   [250]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-15/igt@kms_pm_rpm@basic-rte.html

  * igt@kms_prime@basic-crc-vgem:
    - shard-dg2:          NOTRUN -> [SKIP][251] ([i915#6524] / [i915#6805])
   [251]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-3/igt@kms_prime@basic-crc-vgem.html

  * igt@kms_prime@basic-modeset-hybrid:
    - shard-tglu:         NOTRUN -> [SKIP][252] ([i915#6524])
   [252]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-10/igt@kms_prime@basic-modeset-hybrid.html

  * igt@kms_psr2_su@frontbuffer-xrgb8888:
    - shard-dg1:          NOTRUN -> [SKIP][253] ([i915#9683])
   [253]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@kms_psr2_su@frontbuffer-xrgb8888.html

  * igt@kms_psr2_su@page_flip-xrgb8888:
    - shard-dg2:          NOTRUN -> [SKIP][254] ([i915#9683]) +1 other test skip
   [254]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_psr2_su@page_flip-xrgb8888.html

  * igt@kms_psr@fbc-pr-cursor-plane-onoff:
    - shard-tglu:         NOTRUN -> [SKIP][255] ([i915#9732]) +11 other tests skip
   [255]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-4/igt@kms_psr@fbc-pr-cursor-plane-onoff.html

  * igt@kms_psr@fbc-pr-no-drrs:
    - shard-rkl:          NOTRUN -> [SKIP][256] ([i915#1072] / [i915#9732]) +16 other tests skip
   [256]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@kms_psr@fbc-pr-no-drrs.html

  * igt@kms_psr@fbc-pr-primary-mmap-cpu:
    - shard-mtlp:         NOTRUN -> [SKIP][257] ([i915#9688]) +5 other tests skip
   [257]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_psr@fbc-pr-primary-mmap-cpu.html

  * igt@kms_psr@psr-cursor-render:
    - shard-dg2:          NOTRUN -> [SKIP][258] ([i915#1072] / [i915#9732]) +30 other tests skip
   [258]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_psr@psr-cursor-render.html

  * igt@kms_psr@psr2-sprite-mmap-gtt:
    - shard-dg1:          NOTRUN -> [SKIP][259] ([i915#1072] / [i915#9732]) +16 other tests skip
   [259]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_psr@psr2-sprite-mmap-gtt.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-rkl:          NOTRUN -> [SKIP][260] ([i915#9685])
   [260]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-6/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_rotation_crc@bad-tiling:
    - shard-dg2:          NOTRUN -> [SKIP][261] ([i915#4235])
   [261]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-2/igt@kms_rotation_crc@bad-tiling.html

  * igt@kms_rotation_crc@primary-yf-tiled-reflect-x-270:
    - shard-dg2:          NOTRUN -> [SKIP][262] ([i915#4235] / [i915#5190]) +1 other test skip
   [262]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-270.html
    - shard-rkl:          NOTRUN -> [SKIP][263] ([i915#5289]) +2 other tests skip
   [263]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-270.html

  * igt@kms_scaling_modes@scaling-mode-center:
    - shard-dg1:          NOTRUN -> [SKIP][264] ([i915#3555]) +5 other tests skip
   [264]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@kms_scaling_modes@scaling-mode-center.html

  * igt@kms_scaling_modes@scaling-mode-full:
    - shard-tglu:         NOTRUN -> [SKIP][265] ([i915#3555]) +2 other tests skip
   [265]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-4/igt@kms_scaling_modes@scaling-mode-full.html

  * igt@kms_setmode@basic@pipe-a-hdmi-a-1:
    - shard-rkl:          NOTRUN -> [FAIL][266] ([i915#5465]) +1 other test fail
   [266]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_setmode@basic@pipe-a-hdmi-a-1.html

  * igt@kms_setmode@invalid-clone-single-crtc-stealing:
    - shard-mtlp:         NOTRUN -> [SKIP][267] ([i915#3555] / [i915#8809])
   [267]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-3/igt@kms_setmode@invalid-clone-single-crtc-stealing.html

  * igt@kms_tiled_display@basic-test-pattern:
    - shard-dg1:          NOTRUN -> [SKIP][268] ([i915#8623])
   [268]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_tiled_display@basic-test-pattern.html
    - shard-tglu:         NOTRUN -> [SKIP][269] ([i915#8623])
   [269]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-3/igt@kms_tiled_display@basic-test-pattern.html
    - shard-mtlp:         NOTRUN -> [SKIP][270] ([i915#8623])
   [270]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@kms_tiled_display@basic-test-pattern.html
    - shard-rkl:          NOTRUN -> [SKIP][271] ([i915#8623])
   [271]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@kms_tiled_display@basic-test-pattern.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-d-edp-1:
    - shard-mtlp:         [PASS][272] -> [FAIL][273] ([i915#9196]) +2 other tests fail
   [272]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-mtlp-8/igt@kms_universal_plane@cursor-fb-leak@pipe-d-edp-1.html
   [273]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@kms_universal_plane@cursor-fb-leak@pipe-d-edp-1.html

  * igt@kms_vrr@flip-basic-fastset:
    - shard-dg1:          NOTRUN -> [SKIP][274] ([i915#9906])
   [274]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_vrr@flip-basic-fastset.html

  * igt@kms_writeback@writeback-check-output:
    - shard-dg2:          NOTRUN -> [SKIP][275] ([i915#2437])
   [275]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@kms_writeback@writeback-check-output.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-tglu:         NOTRUN -> [SKIP][276] ([i915#2437] / [i915#9412])
   [276]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-3/igt@kms_writeback@writeback-pixel-formats.html
    - shard-dg2:          NOTRUN -> [SKIP][277] ([i915#2437] / [i915#9412])
   [277]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@kms_writeback@writeback-pixel-formats.html
    - shard-dg1:          NOTRUN -> [SKIP][278] ([i915#2437] / [i915#9412])
   [278]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@kms_writeback@writeback-pixel-formats.html

  * igt@perf@gen8-unprivileged-single-ctx-counters:
    - shard-dg2:          NOTRUN -> [SKIP][279] ([i915#2436] / [i915#7387])
   [279]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@perf@gen8-unprivileged-single-ctx-counters.html

  * igt@perf@per-context-mode-unprivileged:
    - shard-rkl:          NOTRUN -> [SKIP][280] ([i915#2435])
   [280]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@perf@per-context-mode-unprivileged.html

  * igt@perf_pmu@cpu-hotplug:
    - shard-dg1:          NOTRUN -> [SKIP][281] ([i915#8850])
   [281]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@perf_pmu@cpu-hotplug.html

  * igt@perf_pmu@frequency@gt0:
    - shard-dg2:          NOTRUN -> [FAIL][282] ([i915#6806])
   [282]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-10/igt@perf_pmu@frequency@gt0.html

  * igt@perf_pmu@rc6-all-gts:
    - shard-dg1:          NOTRUN -> [SKIP][283] ([i915#8516])
   [283]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@perf_pmu@rc6-all-gts.html
    - shard-dg2:          NOTRUN -> [SKIP][284] ([i915#8516])
   [284]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@perf_pmu@rc6-all-gts.html

  * igt@prime_vgem@basic-fence-flip:
    - shard-dg1:          NOTRUN -> [SKIP][285] ([i915#3708])
   [285]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-fence-read:
    - shard-dg2:          NOTRUN -> [SKIP][286] ([i915#3291] / [i915#3708])
   [286]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-8/igt@prime_vgem@basic-fence-read.html

  * igt@prime_vgem@basic-gtt:
    - shard-dg2:          NOTRUN -> [SKIP][287] ([i915#3708] / [i915#4077])
   [287]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-1/igt@prime_vgem@basic-gtt.html

  * igt@prime_vgem@basic-read:
    - shard-mtlp:         NOTRUN -> [SKIP][288] ([i915#3708])
   [288]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@prime_vgem@basic-read.html

  * igt@prime_vgem@basic-write:
    - shard-rkl:          NOTRUN -> [SKIP][289] ([i915#3291] / [i915#3708])
   [289]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@prime_vgem@basic-write.html

  * igt@prime_vgem@coherency-gtt:
    - shard-dg1:          NOTRUN -> [SKIP][290] ([i915#3708] / [i915#4077])
   [290]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-14/igt@prime_vgem@coherency-gtt.html

  * igt@prime_vgem@fence-write-hang:
    - shard-tglu:         NOTRUN -> [SKIP][291] +43 other tests skip
   [291]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@prime_vgem@fence-write-hang.html

  * igt@sriov_basic@enable-vfs-autoprobe-off:
    - shard-dg1:          NOTRUN -> [SKIP][292] ([i915#9917]) +1 other test skip
   [292]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@sriov_basic@enable-vfs-autoprobe-off.html

  * igt@sriov_basic@enable-vfs-autoprobe-on:
    - shard-tglu:         NOTRUN -> [SKIP][293] ([i915#9917])
   [293]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@sriov_basic@enable-vfs-autoprobe-on.html

  * igt@sriov_basic@enable-vfs-bind-unbind-each:
    - shard-rkl:          NOTRUN -> [SKIP][294] ([i915#9917])
   [294]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@sriov_basic@enable-vfs-bind-unbind-each.html

  * igt@syncobj_timeline@invalid-wait-zero-handles:
    - shard-glk:          NOTRUN -> [FAIL][295] ([i915#9781])
   [295]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk9/igt@syncobj_timeline@invalid-wait-zero-handles.html

  * igt@tools_test@sysfs_l3_parity:
    - shard-mtlp:         NOTRUN -> [SKIP][296] ([i915#4818])
   [296]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-8/igt@tools_test@sysfs_l3_parity.html

  * igt@v3d/v3d_perfmon@destroy-valid-perfmon:
    - shard-mtlp:         NOTRUN -> [SKIP][297] ([i915#2575]) +5 other tests skip
   [297]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-1/igt@v3d/v3d_perfmon@destroy-valid-perfmon.html

  * igt@v3d/v3d_submit_cl@bad-extension:
    - shard-dg1:          NOTRUN -> [SKIP][298] ([i915#2575]) +10 other tests skip
   [298]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-13/igt@v3d/v3d_submit_cl@bad-extension.html

  * igt@v3d/v3d_submit_cl@simple-flush-cache:
    - shard-dg2:          NOTRUN -> [SKIP][299] ([i915#2575]) +12 other tests skip
   [299]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@v3d/v3d_submit_cl@simple-flush-cache.html

  * igt@v3d/v3d_submit_csd@bad-multisync-in-sync:
    - shard-tglu:         NOTRUN -> [SKIP][300] ([i915#2575]) +10 other tests skip
   [300]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-4/igt@v3d/v3d_submit_csd@bad-multisync-in-sync.html

  * igt@vc4/vc4_mmap@mmap-bo:
    - shard-dg1:          NOTRUN -> [SKIP][301] ([i915#7711]) +5 other tests skip
   [301]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@vc4/vc4_mmap@mmap-bo.html

  * igt@vc4/vc4_purgeable_bo@access-purged-bo-mem:
    - shard-mtlp:         NOTRUN -> [SKIP][302] ([i915#7711]) +3 other tests skip
   [302]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-7/igt@vc4/vc4_purgeable_bo@access-purged-bo-mem.html

  * igt@vc4/vc4_purgeable_bo@mark-unpurgeable-check-retained:
    - shard-rkl:          NOTRUN -> [SKIP][303] ([i915#7711]) +6 other tests skip
   [303]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@vc4/vc4_purgeable_bo@mark-unpurgeable-check-retained.html

  * igt@vc4/vc4_wait_bo@bad-bo:
    - shard-dg2:          NOTRUN -> [SKIP][304] ([i915#7711]) +13 other tests skip
   [304]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-5/igt@vc4/vc4_wait_bo@bad-bo.html

  
#### Possible fixes ####

  * igt@drm_fdinfo@idle@rcs0:
    - shard-rkl:          [FAIL][305] ([i915#7742]) -> [PASS][306]
   [305]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-rkl-4/igt@drm_fdinfo@idle@rcs0.html
   [306]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-5/igt@drm_fdinfo@idle@rcs0.html

  * igt@gem_eio@kms:
    - shard-tglu:         [INCOMPLETE][307] ([i915#10513]) -> [PASS][308]
   [307]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-8/igt@gem_eio@kms.html
   [308]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-3/igt@gem_eio@kms.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-glk:          [FAIL][309] ([i915#2842]) -> [PASS][310]
   [309]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-glk3/igt@gem_exec_fair@basic-none-share@rcs0.html
   [310]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk9/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-rkl:          [FAIL][311] ([i915#2842]) -> [PASS][312]
   [311]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-rkl-2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [312]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@rcs0:
    - shard-tglu:         [FAIL][313] ([i915#2842]) -> [PASS][314]
   [313]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-2/igt@gem_exec_fair@basic-pace@rcs0.html
   [314]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@gem_exec_fair@basic-pace@rcs0.html

  * igt@gem_lmem_swapping@heavy-verify-random@lmem0:
    - shard-dg2:          [FAIL][315] ([i915#10378]) -> [PASS][316]
   [315]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-7/igt@gem_lmem_swapping@heavy-verify-random@lmem0.html
   [316]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-7/igt@gem_lmem_swapping@heavy-verify-random@lmem0.html

  * igt@gem_lmem_swapping@smem-oom@lmem0:
    - shard-dg1:          [TIMEOUT][317] ([i915#5493]) -> [PASS][318]
   [317]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg1-18/igt@gem_lmem_swapping@smem-oom@lmem0.html
   [318]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-18/igt@gem_lmem_swapping@smem-oom@lmem0.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-tglu:         [INCOMPLETE][319] ([i915#9820]) -> [PASS][320]
   [319]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-7/igt@i915_module_load@reload-with-fault-injection.html
   [320]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-9/igt@i915_module_load@reload-with-fault-injection.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-tglu:         [FAIL][321] ([i915#3743]) -> [PASS][322]
   [321]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-2/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html
   [322]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-8/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-glk:          [FAIL][323] ([i915#2346]) -> [PASS][324]
   [323]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-glk2/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [324]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk4/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@single-bo@pipe-a:
    - shard-mtlp:         [DMESG-WARN][325] ([i915#10166]) -> [PASS][326]
   [325]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-mtlp-2/igt@kms_cursor_legacy@single-bo@pipe-a.html
   [326]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_cursor_legacy@single-bo@pipe-a.html

  * igt@kms_cursor_legacy@torture-bo@pipe-b:
    - shard-glk:          [DMESG-WARN][327] ([i915#10166] / [i915#1982]) -> [PASS][328]
   [327]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-glk3/igt@kms_cursor_legacy@torture-bo@pipe-b.html
   [328]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-glk8/igt@kms_cursor_legacy@torture-bo@pipe-b.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu:
    - shard-dg2:          [FAIL][329] ([i915#6880]) -> [PASS][330]
   [329]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-3/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu.html
   [330]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1:
    - shard-mtlp:         [FAIL][331] ([i915#9196]) -> [PASS][332]
   [331]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-mtlp-8/igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1.html
   [332]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-2/igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-a-hdmi-a-1:
    - shard-snb:          [FAIL][333] ([i915#9196]) -> [PASS][334]
   [333]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-snb2/igt@kms_universal_plane@cursor-fb-leak@pipe-a-hdmi-a-1.html
   [334]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-snb6/igt@kms_universal_plane@cursor-fb-leak@pipe-a-hdmi-a-1.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-d-hdmi-a-1:
    - shard-tglu:         [FAIL][335] ([i915#9196]) -> [PASS][336] +1 other test pass
   [335]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-tglu-4/igt@kms_universal_plane@cursor-fb-leak@pipe-d-hdmi-a-1.html
   [336]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-tglu-2/igt@kms_universal_plane@cursor-fb-leak@pipe-d-hdmi-a-1.html

  * igt@perf_pmu@busy-double-start@vecs1:
    - shard-dg2:          [FAIL][337] ([i915#4349]) -> [PASS][338] +3 other tests pass
   [337]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-5/igt@perf_pmu@busy-double-start@vecs1.html
   [338]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@perf_pmu@busy-double-start@vecs1.html

  
#### Warnings ####

  * igt@gem_create@create-ext-cpu-access-big:
    - shard-dg2:          [ABORT][339] ([i915#9846]) -> [INCOMPLETE][340] ([i915#9364])
   [339]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-8/igt@gem_create@create-ext-cpu-access-big.html
   [340]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-6/igt@gem_create@create-ext-cpu-access-big.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-rkl:          [ABORT][341] ([i915#9820]) -> [INCOMPLETE][342] ([i915#9820] / [i915#9849])
   [341]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-rkl-3/igt@i915_module_load@reload-with-fault-injection.html
   [342]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-rkl-1/igt@i915_module_load@reload-with-fault-injection.html

  * igt@kms_content_protection@mei-interface:
    - shard-dg1:          [SKIP][343] ([i915#9433]) -> [SKIP][344] ([i915#9424])
   [343]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg1-13/igt@kms_content_protection@mei-interface.html
   [344]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg1-17/igt@kms_content_protection@mei-interface.html

  * igt@kms_content_protection@type1:
    - shard-dg2:          [SKIP][345] ([i915#7118] / [i915#9424]) -> [SKIP][346] ([i915#7118] / [i915#7162] / [i915#9424])
   [345]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-7/igt@kms_content_protection@type1.html
   [346]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@kms_content_protection@type1.html

  * igt@kms_frontbuffer_tracking@fbcpsr-suspend:
    - shard-mtlp:         [DMESG-WARN][347] -> [DMESG-WARN][348] ([i915#1982])
   [347]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-mtlp-1/igt@kms_frontbuffer_tracking@fbcpsr-suspend.html
   [348]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-mtlp-6/igt@kms_frontbuffer_tracking@fbcpsr-suspend.html

  * igt@kms_psr@fbc-psr-primary-mmap-cpu:
    - shard-dg2:          [SKIP][349] ([i915#1072] / [i915#9732]) -> [SKIP][350] ([i915#1072] / [i915#9673] / [i915#9732]) +4 other tests skip
   [349]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14516/shard-dg2-6/igt@kms_psr@fbc-psr-primary-mmap-cpu.html
   [350]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/shard-dg2-11/igt@kms_psr@fbc-psr-primary-mmap-cpu.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#10031]: https://gitlab.freedesktop.org/drm/intel/issues/10031
  [i915#10055]: https://gitlab.freedesktop.org/drm/intel/issues/10055
  [i915#10070]: https://gitlab.freedesktop.org/drm/intel/issues/10070
  [i915#10166]: https://gitlab.freedesktop.org/drm/intel/issues/10166
  [i915#10278]: https://gitlab.freedesktop.org/drm/intel/issues/10278
  [i915#10307]: https://gitlab.freedesktop.org/drm/intel/issues/10307
  [i915#10366]: https://gitlab.freedesktop.org/drm/intel/issues/10366
  [i915#10378]: https://gitlab.freedesktop.org/drm/intel/issues/10378
  [i915#10434]: https://gitlab.freedesktop.org/drm/intel/issues/10434
  [i915#10513]: https://gitlab.freedesktop.org/drm/intel/issues/10513
  [i915#10648]: https://gitlab.freedesktop.org/drm/intel/issues/10648
  [i915#10656]: https://gitlab.freedesktop.org/drm/intel/issues/10656
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2435]: https://gitlab.freedesktop.org/drm/intel/issues/2435
  [i915#2436]: https://gitlab.freedesktop.org/drm/intel/issues/2436
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#3023]: https://gitlab.freedesktop.org/drm/intel/issues/3023
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3299]: https://gitlab.freedesktop.org/drm/intel/issues/3299
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3743]: https://gitlab.freedesktop.org/drm/intel/issues/3743
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#3966]: https://gitlab.freedesktop.org/drm/intel/issues/3966
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4235]: https://gitlab.freedesktop.org/drm/intel/issues/4235
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4349]: https://gitlab.freedesktop.org/drm/intel/issues/4349
  [i915#4537]: https://gitlab.freedesktop.org/drm/intel/issues/4537
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#4565]: https://gitlab.freedesktop.org/drm/intel/issues/4565
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4771]: https://gitlab.freedesktop.org/drm/intel/issues/4771
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4818]: https://gitlab.freedesktop.org/drm/intel/issues/4818
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4854]: https://gitlab.freedesktop.org/drm/intel/issues/4854
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880
  [i915#4881]: https://gitlab.freedesktop.org/drm/intel/issues/4881
  [i915#5107]: https://gitlab.freedesktop.org/drm/intel/issues/5107
  [i915#5138]: https://gitlab.freedesktop.org/drm/intel/issues/5138
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5465]: https://gitlab.freedesktop.org/drm/intel/issues/5465
  [i915#5493]: https://gitlab.freedesktop.org/drm/intel/issues/5493
  [i915#5723]: https://gitlab.freedesktop.org/drm/intel/issues/5723
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117
  [i915#6118]: https://gitlab.freedesktop.org/drm/intel/issues/6118
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6228]: https://gitlab.freedesktop.org/drm/intel/issues/6228
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#6590]: https://gitlab.freedesktop.org/drm/intel/issues/6590
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6805]: https://gitlab.freedesktop.org/drm/intel/issues/6805
  [i915#6806]: https://gitlab.freedesktop.org/drm/intel/issues/6806
  [i915#6880]: https://gitlab.freedesktop.org/drm/intel/issues/6880
  [i915#6944]: https://gitlab.freedesktop.org/drm/intel/issues/6944
  [i915#7091]: https://gitlab.freedesktop.org/drm/intel/issues/7091
  [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#7162]: https://gitlab.freedesktop.org/drm/intel/issues/7162
  [i915#7213]: https://gitlab.freedesktop.org/drm/intel/issues/7213
  [i915#7387]: https://gitlab.freedesktop.org/drm/intel/issues/7387
  [i915#7443]: https://gitlab.freedesktop.org/drm/intel/issues/7443
  [i915#7697]: https://gitlab.freedesktop.org/drm/intel/issues/7697
  [i915#7701]: https://gitlab.freedesktop.org/drm/intel/issues/7701
  [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711
  [i915#7742]: https://gitlab.freedesktop.org/drm/intel/issues/7742
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7862]: https://gitlab.freedesktop.org/drm/intel/issues/7862
  [i915#7975]: https://gitlab.freedesktop.org/drm/intel/issues/7975
  [i915#7984]: https://gitlab.freedesktop.org/drm/intel/issues/7984
  [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213
  [i915#8228]: https://gitlab.freedesktop.org/drm/intel/issues/8228
  [i915#8292]: https://gitlab.freedesktop.org/drm/intel/issues/8292
  [i915#8381]: https://gitlab.freedesktop.org/drm/intel/issues/8381
  [i915#8399]: https://gitlab.freedesktop.org/drm/intel/issues/8399
  [i915#8411]: https://gitlab.freedesktop.org/drm/intel/issues/8411
  [i915#8414]: https://gitlab.freedesktop.org/drm/intel/issues/8414
  [i915#8428]: https://gitlab.freedesktop.org/drm/intel/issues/8428
  [i915#8516]: https://gitlab.freedesktop.org/drm/intel/issues/8516
  [i915#8555]: https://gitlab.freedesktop.org/drm/intel/issues/8555
  [i915#8562]: https://gitlab.freedesktop.org/drm/intel/issues/8562
  [i915#8623]: https://gitlab.freedesktop.org/drm/intel/issues/8623
  [i915#8708]: https://gitlab.freedesktop.org/drm/intel/issues/8708
  [i915#8709]: https://gitlab.freedesktop.org/drm/intel/issues/8709
  [i915#8809]: https://gitlab.freedesktop.org/drm/intel/issues/8809
  [i915#8810]: https://gitlab.freedesktop.org/drm/intel/issues/8810
  [i915#8812]: https://gitlab.freedesktop.org/drm/intel/issues/8812
  [i915#8814]: https://gitlab.freedesktop.org/drm/intel/issues/8814
  [i915#8821]: https://gitlab.freedesktop.org/drm/intel/issues/8821
  [i915#8850]: https://gitlab.freedesktop.org/drm/intel/issues/8850
  [i915#8925]: https://gitlab.freedesktop.org/drm/intel/issues/8925
  [i915#9196]: https://gitlab.freedesktop.org/drm/intel/issues/9196
  [i915#9227]: https://gitlab.freedesktop.org/drm/intel/issues/9227
  [i915#9318]: https://gitlab.freedesktop.org/drm/intel/issues/9318
  [i915#9323]: https://gitlab.freedesktop.org/drm/intel/issues/9323
  [i915#9364]: https://gitlab.freedesktop.org/drm/intel/issues/9364
  [i915#9408]: https://gitlab.freedesktop.org/drm/intel/issues/9408
  [i915#9412]: https://gitlab.freedesktop.org/drm/intel/issues/9412
  [i915#9423]: https://gitlab.freedesktop.org/drm/intel/issues/9423
  [i915#9424]: https://gitlab.freedesktop.org/drm/intel/issues/9424
  [i915#9433]: https://gitlab.freedesktop.org/drm/intel/issues/9433
  [i915#9457]: https://gitlab.freedesktop.org/drm/intel/issues/9457
  [i915#9531]: https://gitlab.freedesktop.org/drm/intel/issues/9531
  [i915#9618]: https://gitlab.freedesktop.org/drm/intel/issues/9618
  [i915#9673]: https://gitlab.freedesktop.org/drm/intel/issues/9673
  [i915#9683]: https://gitlab.freedesktop.org/drm/intel/issues/9683
  [i915#9685]: https://gitlab.freedesktop.org/drm/intel/issues/9685
  [i915#9688]: https://gitlab.freedesktop.org/drm/intel/issues/9688
  [i915#9723]: https://gitlab.freedesktop.org/drm/intel/issues/9723
  [i915#9732]: https://gitlab.freedesktop.org/drm/intel/issues/9732
  [i915#9781]: https://gitlab.freedesktop.org/drm/intel/issues/9781
  [i915#9809]: https://gitlab.freedesktop.org/drm/intel/issues/9809
  [i915#9812]: https://gitlab.freedesktop.org/drm/intel/issues/9812
  [i915#9820]: https://gitlab.freedesktop.org/drm/intel/issues/9820
  [i915#9833]: https://gitlab.freedesktop.org/drm/intel/issues/9833
  [i915#9846]: https://gitlab.freedesktop.org/drm/intel/issues/9846
  [i915#9849]: https://gitlab.freedesktop.org/drm/intel/issues/9849
  [i915#9906]: https://gitlab.freedesktop.org/drm/intel/issues/9906
  [i915#9917]: https://gitlab.freedesktop.org/drm/intel/issues/9917
  [i915#9934]: https://gitlab.freedesktop.org/drm/intel/issues/9934


Build changes
-------------

  * CI: CI-20190529 -> None
  * IGT: IGT_7796 -> IGTPW_10965

  CI-20190529: 20190529
  CI_DRM_14516: 5100fcc57dc5d45b246a0aeb068f4f8062d29b09 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_10965: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html
  IGT_7796: 2cfed18f6aa776c1593d7cc328d23225dd61bdf9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_10965/index.html



* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-02 20:55   ` Lucas De Marchi
@ 2024-04-03  5:35     ` Upadhyay, Tejas
  2024-04-04 23:22       ` John Harrison
  0 siblings, 1 reply; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-03  5:35 UTC (permalink / raw)
  To: De Marchi, Lucas, Roper, Matthew D, Harrison, John C
  Cc: igt-dev, intel-xe, Brost, Matthew



> -----Original Message-----
> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> Sent: Wednesday, April 3, 2024 2:26 AM
> To: Roper, Matthew D <matthew.d.roper@intel.com>
> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; igt-
> dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost, Matthew
> <matthew.brost@intel.com>
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> >> RCS/CCS are dependent engines as they are sharing reset domain.
> >> Whenever there is reset from CCS, all the exec queues running on RCS
> >> are victimised mainly on Lunarlake.
> >>
> >> Lets skip parallel execution on CCS with RCS.
> >
> >I haven't really looked at this specific test in detail, but based on
> >your explanation here, you're also going to run into problems with
> >multiple CCS engines since they all share the same reset.  You won't
> >see that on platforms like LNL that only have a single CCS, but
> >platforms
> 
> but it is seen on LNL because of having both RCS and CCS.
> 
> >like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
> >one kills anything running on the others.
> >
> >
> >Matt
> >
> >>
> >> It helps in fixing following errors:
> >> 1. Test assertion failure function test_legacy_mode, file, Failed
> >> assertion: data[i].data == 0xc0ffee
> >>
> >> 2.Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c,
> >> Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
> >>
> >> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >> ---
> >>  tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> >>  1 file changed, 25 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tests/intel/xe_exec_threads.c
> >> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
> >> --- a/tests/intel/xe_exec_threads.c
> >> +++ b/tests/intel/xe_exec_threads.c
> >> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>  	return NULL;
> >>  }
> >>
> >> +static bool is_engine_contexts_victimized(int fd, unsigned int
> >> +flags) {
> >> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >> +		return false;
> 
> as above, I don't think we should add any platform check here. It's impossible
> to keep it up to date and it's also testing the wrong thing.
> AFAIU you don't want parallel submission on engines that share the same
> reset domain. So, this is actually what should be tested.

Platforms like PVC, ATS-M, DG2, etc. have some kind of WA/noWA which helps to run things in parallel on engines in the same reset domain. Apparently BMG/LNL does not have that kind of support, so this change is applicable only to LNL/BMG with parallel submission on RCS/CCS.

@Harrison, John C please reply if you have any other input here.

Thanks,
Tejas 
> 
> Lucas De Marchi
> 
> >> +
> >> +	if (flags & HANG)
> >> +		return true;
> >> +
> >> +	return false;
> >> +}
> >> +
> >>  /**
> >>   * SUBTEST: threads-%s
> >>   * Description: Run threads %arg[1] test with multi threads @@
> >> -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>  	bool go = false;
> >>  	int n_threads = 0;
> >>  	int gt;
> >> +	bool has_rcs = false;
> >>
> >> -	xe_for_each_engine(fd, hwe)
> >> +	xe_for_each_engine(fd, hwe) {
> >> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >> +			has_rcs = true;
> >>  		++n_engines;
> >> +	}
> >>
> >>  	if (flags & BALANCER) {
> >>  		xe_for_each_gt(fd, gt)
> >> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>  	}
> >>
> >>  	xe_for_each_engine(fd, hwe) {
> >> +		/* RCS/CCS sharing reset domain hence dependent engines.
> >> +		 * When CCS is doing reset, all the contexts of RCS are
> >> +		 * victimized, so skip the compute engine avoiding
> >> +		 * parallel execution with RCS
> >> +		 */
> >> +		if (has_rcs && hwe->engine_class ==
> DRM_XE_ENGINE_CLASS_COMPUTE &&
> >> +		    is_engine_contexts_victimized(fd, flags))
> >> +			continue;
> >> +
> >>  		threads_data[i].mutex = &mutex;
> >>  		threads_data[i].cond = &cond;
> >>  #define ADDRESS_SHIFT	39
> >> --
> >> 2.25.1
> >>
> >
> >--
> >Matt Roper
> >Graphics Software Engineer
> >Linux GPU Platform Enablement
> >Intel Corporation


* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-03  5:35     ` Upadhyay, Tejas
@ 2024-04-04 23:22       ` John Harrison
  2024-04-04 23:45         ` John Harrison
  2024-04-05  4:47         ` Upadhyay, Tejas
  0 siblings, 2 replies; 21+ messages in thread
From: John Harrison @ 2024-04-04 23:22 UTC (permalink / raw)
  To: Upadhyay, Tejas, De Marchi, Lucas, Roper, Matthew D
  Cc: igt-dev, intel-xe, Brost, Matthew

On 4/2/2024 22:35, Upadhyay, Tejas wrote:
>> -----Original Message-----
>> From: De Marchi, Lucas <lucas.demarchi@intel.com>
>> Sent: Wednesday, April 3, 2024 2:26 AM
>> To: Roper, Matthew D <matthew.d.roper@intel.com>
>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; igt-
>> dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost, Matthew
>> <matthew.brost@intel.com>
>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
>> domain aware
>>
>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>>>> RCS/CCS are dependent engines as they are sharing reset domain.
>>>> Whenever there is reset from CCS, all the exec queues running on RCS
>>>> are victimised mainly on Lunarlake.
>>>>
>>>> Lets skip parallel execution on CCS with RCS.
>>> I haven't really looked at this specific test in detail, but based on
>>> your explanation here, you're also going to run into problems with
>>> multiple CCS engines since they all share the same reset.  You won't
>>> see that on platforms like LNL that only have a single CCS, but
>>> platforms
>> but it is seen on LNL because of having both RCS and CCS.
>>
>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
>>> one kills anything running on the others.
>>>
>>>
>>> Matt
>>>
>>>> It helps in fixing following errors:
>>>> 1. Test assertion failure function test_legacy_mode, file, Failed
>>>> assertion: data[i].data == 0xc0ffee
>>>>
>>>> 2.Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c,
>>>> Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
>>>>
>>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>>> ---
>>>>   tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tests/intel/xe_exec_threads.c
>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
>>>> --- a/tests/intel/xe_exec_threads.c
>>>> +++ b/tests/intel/xe_exec_threads.c
>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>>>   	return NULL;
>>>>   }
>>>>
>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
>>>> +flags) {
>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>>>> +		return false;
>> as above, I don't think we should add any platform check here. It's impossible
>> to keep it up to date and it's also testing the wrong thing.
>> AFAIU you don't want parallel submission on engines that share the same
>> reset domain. So, this is actually what should be tested.
> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which helps to run things parallelly on engines in same reset domain and apparently BMG/LNL does not have that kind of support so applicable for LNL/BMG with parallel submission on RCS/CCS only.
>
> @Harrison, John C please reply if you have any other input here.
I don't get what you mean by 'have some kind of WA/noWA'. All platforms 
with compute engines have shared reset domains. That is all there is to 
it. I.e. everything from TGL onwards. That includes RCS and all CCS 
engines. So RCS + CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc. Any platform 
with multiple engines that talk to EUs will reset all of those engines 
in parallel.
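
As a rough sketch of that rule (illustration only: the enum, struct and
function names below are hypothetical and are not the xe uAPI or IGT
helpers, and treating the GT as the domain boundary is an assumption),
a test could decide by engine class instead of by platform:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical engine classes for illustration; not the uAPI enum. */
enum engine_class {
	ENGINE_RENDER,
	ENGINE_COPY,
	ENGINE_VIDEO,
	ENGINE_COMPUTE,
};

/* Hypothetical engine descriptor. */
struct engine {
	enum engine_class cls;
	int instance;
	int gt;
};

/* RCS and CCS are the engines that talk to the EUs. */
static bool uses_eus(enum engine_class cls)
{
	return cls == ENGINE_RENDER || cls == ENGINE_COMPUTE;
}

/*
 * Assumption: two engines share a reset domain when both talk to the
 * EUs and both sit on the same GT.
 */
static bool shares_reset_domain(const struct engine *a,
				const struct engine *b)
{
	assert(a && b);
	return a->gt == b->gt && uses_eus(a->cls) && uses_eus(b->cls);
}
```

With a check like this, RCS + CCS and CCS0 + CCS1 are treated the same
way, and no per-platform list has to be kept up to date.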

There are w/a's which make the situation even worse. E.g. on DG2/MTL you 
are not allowed to context switch one of those engines while another is 
busy. Which means that if one hangs, they all hang - you cannot just 
wait for other workloads to complete and/or pre-empt them off the engine 
prior to doing the shared reset. But there is nothing that makes it better.

I assume we are talking about GuC triggered engine resets here? As 
opposed to driver triggered full GT resets?

The GuC will attempt to idle all other connected engines first by 
pre-empting out any executing contexts. If those contexts are 
pre-emptible then they will survive - GuC will automatically restart 
them once the reset is complete. If they are not (or at least not 
pre-emptible within the pre-emption timeout limit) then they will be 
killed as collateral damage.
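
A minimal model of that outcome, again only a sketch with hypothetical
names (this is not GuC or driver code, and the microsecond fields are
made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_fate {
	CTX_RESTARTED,	/* pre-empted in time, restarted after the reset */
	CTX_KILLED,	/* collateral damage of the shared reset */
};

/* Hypothetical description of an innocent context on a connected engine. */
struct ctx_model {
	bool preemptible;
	unsigned int preempt_latency_us;
};

/*
 * wa_blocks_ctx_switch models the DG2/MTL style w/a described above:
 * when context switches are not allowed while another engine is busy,
 * nothing can be pre-empted off before the shared reset.
 */
static enum ctx_fate fate_after_shared_reset(const struct ctx_model *c,
					     unsigned int preempt_timeout_us,
					     bool wa_blocks_ctx_switch)
{
	assert(c);

	if (wa_blocks_ctx_switch)
		return CTX_KILLED;

	if (c->preemptible && c->preempt_latency_us <= preempt_timeout_us)
		return CTX_RESTARTED;

	return CTX_KILLED;
}
```

In this model a context survives only if it is pre-emptible, yields
within the timeout, and no such w/a is in effect.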

What are the workloads being submitted by this test? Are they 
pre-emptible spinners? If so, then they should survive (assuming you 
don't have the DG2/MTL RCS/CCS w/a in effect). If they are 
non-preemptible spinners then they are toast.

John.


>
> Thanks,
> Tejas
>> Lucas De Marchi
>>
>>>> +
>>>> +	if (flags & HANG)
>>>> +		return true;
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>>   /**
>>>>    * SUBTEST: threads-%s
>>>>    * Description: Run threads %arg[1] test with multi threads @@
>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
>>>>   	bool go = false;
>>>>   	int n_threads = 0;
>>>>   	int gt;
>>>> +	bool has_rcs = false;
>>>>
>>>> -	xe_for_each_engine(fd, hwe)
>>>> +	xe_for_each_engine(fd, hwe) {
>>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>>>> +			has_rcs = true;
>>>>   		++n_engines;
>>>> +	}
>>>>
>>>>   	if (flags & BALANCER) {
>>>>   		xe_for_each_gt(fd, gt)
>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>>>   	}
>>>>
>>>>   	xe_for_each_engine(fd, hwe) {
>>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
>>>> +		 * When CCS is doing reset, all the contexts of RCS are
>>>> +		 * victimized, so skip the compute engine avoiding
>>>> +		 * parallel execution with RCS
>>>> +		 */
>>>> +		if (has_rcs && hwe->engine_class ==
>> DRM_XE_ENGINE_CLASS_COMPUTE &&
>>>> +		    is_engine_contexts_victimized(fd, flags))
>>>> +			continue;
>>>> +
>>>>   		threads_data[i].mutex = &mutex;
>>>>   		threads_data[i].cond = &cond;
>>>>   #define ADDRESS_SHIFT	39
>>>> --
>>>> 2.25.1
>>>>
>>> --
>>> Matt Roper
>>> Graphics Software Engineer
>>> Linux GPU Platform Enablement
>>> Intel Corporation



* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-04 23:22       ` John Harrison
@ 2024-04-04 23:45         ` John Harrison
  2024-04-05  4:42           ` Upadhyay, Tejas
  2024-04-05  4:47         ` Upadhyay, Tejas
  1 sibling, 1 reply; 21+ messages in thread
From: John Harrison @ 2024-04-04 23:45 UTC (permalink / raw)
  To: Upadhyay, Tejas, De Marchi, Lucas, Roper, Matthew D
  Cc: igt-dev, intel-xe, Brost, Matthew

On 4/4/2024 16:22, John Harrison wrote:
> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
>>> -----Original Message-----
>>> From: De Marchi, Lucas <lucas.demarchi@intel.com>
>>> Sent: Wednesday, April 3, 2024 2:26 AM
>>> To: Roper, Matthew D <matthew.d.roper@intel.com>
>>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; igt-
>>> dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost, 
>>> Matthew
>>> <matthew.brost@intel.com>
>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests 
>>> reset
>>> domain aware
>>>
>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
>>>>> Whenever there is reset from CCS, all the exec queues running on RCS
>>>>> are victimised mainly on Lunarlake.
>>>>>
>>>>> Lets skip parallel execution on CCS with RCS.
>>>> I haven't really looked at this specific test in detail, but based on
>>>> your explanation here, you're also going to run into problems with
>>>> multiple CCS engines since they all share the same reset. You won't
>>>> see that on platforms like LNL that only have a single CCS, but
>>>> platforms
>>> but it is seen on LNL because of having both RCS and CCS.
>>>
>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
>>>> one kills anything running on the others.
>>>>
>>>>
>>>> Matt
>>>>
>>>>> It helps in fixing following errors:
>>>>> 1. Test assertion failure function test_legacy_mode, file, Failed
>>>>> assertion: data[i].data == 0xc0ffee
>>>>>
>>>>> 2.Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c,
>>>>> Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
>>>>>
>>>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>>>> ---
>>>>>   tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>>>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tests/intel/xe_exec_threads.c
>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
>>>>> --- a/tests/intel/xe_exec_threads.c
>>>>> +++ b/tests/intel/xe_exec_threads.c
>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>>>>       return NULL;
>>>>>   }
>>>>>
>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
>>>>> +flags) {
>>>>> +    if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>>>>> +        return false;
>>> as above, I don't think we should add any platform check here. It's 
>>> impossible
>>> to keep it up to date and it's also testing the wrong thing.
>>> AFAIU you don't want parallel submission on engines that share the same
>>> reset domain. So, this is actually what should be tested.
>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which 
>> helps to run things parallelly on engines in same reset domain and 
>> apparently BMG/LNL does not have that kind of support so applicable 
>> for LNL/BMG with parallel submission on RCS/CCS only.
>>
>> @Harrison, John C please reply if you have any other input here.
> I don't get what you mean by 'have some kind of WA/noWA'. All 
> platforms with compute engines have shared reset domains. That is all 
> there is to it. I.e. everything from TGL onwards. That includes RCS 
> and all CCS engines. So RCS + CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc.
> Any platform with multiple engines that talk to EUs will reset all of 
> those engines in parallel.
>
> There are w/a's which make the situation even worse. E.g. on DG2/MTL 
> you are not allowed to context switch one of those engines while 
> another is busy. Which means that if one hangs, they all hang - you 
> cannot just wait for other workloads to complete and/or pre-empt them 
> off the engine prior to doing the shared reset. But there is nothing 
> that makes it better.
>
> I assume we are talking about GuC triggered engine resets here? As 
> opposed to driver triggered full GT resets?
>
> The GuC will attempt to idle all other connected engines first by 
> pre-empting out any executing contexts. If those contexts are 
> pre-emptible then they will survive - GuC will automatically restart 
> them once the reset is complete. If they are not (or at least not 
> pre-emptible within the pre-emption timeout limit) then they will be 
> killed as collateral damage.
>
> What are the workloads being submitted by this test? Are the 
> pre-emptible spinners? If so, then they should survive (assuming you 
> don't have the DG2/MTL RCS/CCS w/a in effect). If they are 
> non-preemptible spinners then they are toast.
>
> John.
>
>
>>
>> Thanks,
>> Tejas
>>> Lucas De Marchi
>>>
>>>>> +
>>>>> +    if (flags & HANG)
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>   /**
>>>>>    * SUBTEST: threads-%s
>>>>>    * Description: Run threads %arg[1] test with multi threads
>>>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
>>>>>       bool go = false;
>>>>>       int n_threads = 0;
>>>>>       int gt;
>>>>> +    bool has_rcs = false;
>>>>>
>>>>> -    xe_for_each_engine(fd, hwe)
>>>>> +    xe_for_each_engine(fd, hwe) {
>>>>> +        if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>>>>> +            has_rcs = true;
>>>>>           ++n_engines;
>>>>> +    }
>>>>>
>>>>>       if (flags & BALANCER) {
>>>>>           xe_for_each_gt(fd, gt)
>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
PS: Nothing in the function name suggests this is a reset-specific test.
If this is common code for multiple tests, including some that do not
expect to hit resets, then removing all testing of compute engines is a
bad idea.

John.


>>>>>       }
>>>>>
>>>>>       xe_for_each_engine(fd, hwe) {
>>>>> +        /* RCS/CCS sharing reset domain hence dependent engines.
>>>>> +         * When CCS is doing reset, all the contexts of RCS are
>>>>> +         * victimized, so skip the compute engine avoiding
>>>>> +         * parallel execution with RCS
>>>>> +         */
>>>>> +        if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
>>>>> +            is_engine_contexts_victimized(fd, flags))
>>>>> +            continue;
>>>>> +
>>>>>           threads_data[i].mutex = &mutex;
>>>>>           threads_data[i].cond = &cond;
>>>>>   #define ADDRESS_SHIFT    39
>>>>> -- 
>>>>> 2.25.1
>>>>>
>>>> -- 
>>>> Matt Roper
>>>> Graphics Software Engineer
>>>> Linux GPU Platform Enablement
>>>> Intel Corporation
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-04 23:45         ` John Harrison
@ 2024-04-05  4:42           ` Upadhyay, Tejas
  0 siblings, 0 replies; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-05  4:42 UTC (permalink / raw)
  To: Harrison, John C, De Marchi, Lucas, Roper, Matthew D
  Cc: igt-dev, intel-xe, Brost, Matthew



> -----Original Message-----
> From: Harrison, John C <john.c.harrison@intel.com>
> Sent: Friday, April 5, 2024 5:15 AM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>
> Cc: igt-dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost,
> Matthew <matthew.brost@intel.com>
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On 4/4/2024 16:22, John Harrison wrote:
> > On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> >>> -----Original Message-----
> >>> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> >>> Sent: Wednesday, April 3, 2024 2:26 AM
> >>> To: Roper, Matthew D <matthew.d.roper@intel.com>
> >>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; igt-
> >>> dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost,
> >>> Matthew <matthew.brost@intel.com>
> >>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >>> reset domain aware
> >>>
> >>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> >>>>> RCS/CCS are dependent engines as they are sharing reset domain.
> >>>>> Whenever there is reset from CCS, all the exec queues running on
> >>>>> RCS are victimised mainly on Lunarlake.
> >>>>>
> >>>>> Lets skip parallel execution on CCS with RCS.
> >>>> I haven't really looked at this specific test in detail, but based
> >>>> on your explanation here, you're also going to run into problems
> >>>> with multiple CCS engines since they all share the same reset. You
> >>>> won't see that on platforms like LNL that only have a single CCS,
> >>>> but platforms
> >>> but it is seen on LNL because of having both RCS and CCS.
> >>>
> >>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset
> >>>> on one kills anything running on the others.
> >>>>
> >>>>
> >>>> Matt
> >>>>
> >>>>> It helps in fixing following errors:
> >>>>> 1. Test assertion failure function test_legacy_mode, file, Failed
> >>>>> assertion: data[i].data == 0xc0ffee
> >>>>>
> >>>>> 2.Test assertion failure function xe_exec, file
> >>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0,
> >>>>> error: -125 != 0
> >>>>>
> >>>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >>>>> ---
> >>>>>   tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> >>>>>   1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/tests/intel/xe_exec_threads.c
> >>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
> >>>>> --- a/tests/intel/xe_exec_threads.c
> >>>>> +++ b/tests/intel/xe_exec_threads.c
> >>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>>>>       return NULL;
> >>>>>   }
> >>>>>
> >>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
> >>>>> +flags) {
> >>>>> +    if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >>>>> +        return false;
> >>> as above, I don't think we should add any platform check here. It's
> >>> impossible to keep it up to date and it's also testing the wrong
> >>> thing.
> >>> AFAIU you don't want parallel submission on engines that share the
> >>> same reset domain. So, this is actually what should be tested.
> >> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which
> >> helps to run things parallelly on engines in same reset domain and
> >> apparently BMG/LNL does not have that kind of support so applicable
> >> for LNL/BMG with parallel submission on RCS/CCS only.
> >>
> >> @Harrison, John C please reply if you have any other input here.
> > I don't get what you mean by 'have some kind of WA/noWA'. All
> > platforms with compute engines have shared reset domains. That is all
> > there is to it. I.e. everything from TGL onwards. That includes RCS
> > and all CCS engines. So RCS + CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc.
> > Any platform with multiple engines that talk to EUs will reset all of
> > those engines in parallel.
> >
> > There are w/a's which make the situation even worse. E.g. on DG2/MTL
> > you are not allowed to context switch one of those engines while
> > another is busy. Which means that if one hangs, they all hang - you
> > cannot just wait for other workloads to complete and/or pre-empt them
> > off the engine prior to doing the shared reset. But there is nothing
> > that makes it better.
> >
> > I assume we are talking about GuC triggered engine resets here? As
> > opposed to driver triggered full GT resets?
> >
> > The GuC will attempt to idle all other connected engines first by
> > pre-empting out any executing contexts. If those contexts are
> > pre-emptible then they will survive - GuC will automatically restart
> > them once the reset is complete. If they are not (or at least not
> > pre-emptible within the pre-emption timeout limit) then they will be
> > killed as collateral damage.
> >
> > What are the workloads being submitted by this test? Are the
> > pre-emptible spinners? If so, then they should survive (assuming you
> > don't have the DG2/MTL RCS/CCS w/a in effect). If they are
> > non-preemptible spinners then they are toast.
> >
> > John.
> >
> >
> >>
> >> Thanks,
> >> Tejas
> >>> Lucas De Marchi
> >>>
> >>>>> +
> >>>>> +    if (flags & HANG)
> >>>>> +        return true;
> >>>>> +
> >>>>> +    return false;
> >>>>> +}
> >>>>> +
> >>>>>   /**
> >>>>>    * SUBTEST: threads-%s
> >>>>>    * Description: Run threads %arg[1] test with multi threads
> >>>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>>>>       bool go = false;
> >>>>>       int n_threads = 0;
> >>>>>       int gt;
> >>>>> +    bool has_rcs = false;
> >>>>>
> >>>>> -    xe_for_each_engine(fd, hwe)
> >>>>> +    xe_for_each_engine(fd, hwe) {
> >>>>> +        if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >>>>> +            has_rcs = true;
> >>>>>           ++n_engines;
> >>>>> +    }
> >>>>>
> >>>>>       if (flags & BALANCER) {
> >>>>>           xe_for_each_gt(fd, gt)
> >>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> PS: There is nothing in the function name that suggests this is a reset specific
> test. If this is common code for multiple tests including some that do not
> expect to hit resets, then removing all testing of compute engines is a bad
> idea.

We only skip when both RCS and CCS are involved and the HANG flag is
passed; the HANG flag means a reset will be involved. Otherwise, all
other compute engine tests run as is.
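
A minimal sketch of that condition follows. The enum, the HANG bit value,
and the helper names are stand-ins for illustration only, not the
definitions from xe_exec_threads.c or the xe uAPI header, and the platform
check from the patch is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in engine classes; the real test uses DRM_XE_ENGINE_CLASS_* values. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };

/* Hypothetical flag bit; the real HANG flag is defined in the test itself. */
#define HANG (1u << 0)

/* Return true if any engine in the list is a render engine (RCS). */
static bool has_render_engine(const enum engine_class *engines, size_t n)
{
	assert(engines != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (engines[i] == CLASS_RENDER)
			return true;
	return false;
}

/*
 * Skip a compute engine only when a hang (and therefore a reset) is
 * expected and a render engine is also present. Without HANG, every
 * engine, compute included, is still exercised.
 */
static bool skip_engine(enum engine_class cls, bool has_rcs, unsigned int flags)
{
	return has_rcs && cls == CLASS_COMPUTE && (flags & HANG);
}
```

With these stand-ins, skip_engine(CLASS_COMPUTE, true, 0) is false, so the
non-hang subtests keep their compute coverage.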

Thanks,
Tejas 
> 
> John.
> 
> 
> >>>>>       }
> >>>>>
> >>>>>       xe_for_each_engine(fd, hwe) {
> >>>>> +        /* RCS/CCS sharing reset domain hence dependent engines.
> >>>>> +         * When CCS is doing reset, all the contexts of RCS are
> >>>>> +         * victimized, so skip the compute engine avoiding
> >>>>> +         * parallel execution with RCS
> >>>>> +         */
> >>>>> +        if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
> >>>>> +            is_engine_contexts_victimized(fd, flags))
> >>>>> +            continue;
> >>>>> +
> >>>>>           threads_data[i].mutex = &mutex;
> >>>>>           threads_data[i].cond = &cond;
> >>>>>   #define ADDRESS_SHIFT    39
> >>>>> --
> >>>>> 2.25.1
> >>>>>
> >>>> --
> >>>> Matt Roper
> >>>> Graphics Software Engineer
> >>>> Linux GPU Platform Enablement
> >>>> Intel Corporation
> >


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-04 23:22       ` John Harrison
  2024-04-04 23:45         ` John Harrison
@ 2024-04-05  4:47         ` Upadhyay, Tejas
  2024-04-05 18:15           ` John Harrison
  1 sibling, 1 reply; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-05  4:47 UTC (permalink / raw)
  To: Harrison, John C, De Marchi, Lucas, Roper, Matthew D
  Cc: igt-dev, intel-xe, Brost, Matthew



> -----Original Message-----
> From: Harrison, John C <john.c.harrison@intel.com>
> Sent: Friday, April 5, 2024 4:53 AM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>
> Cc: igt-dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost,
> Matthew <matthew.brost@intel.com>
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> >> -----Original Message-----
> >> From: De Marchi, Lucas <lucas.demarchi@intel.com>
> >> Sent: Wednesday, April 3, 2024 2:26 AM
> >> To: Roper, Matthew D <matthew.d.roper@intel.com>
> >> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; igt-
> >> dev@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost,
> >> Matthew <matthew.brost@intel.com>
> >> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >> reset domain aware
> >>
> >> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> >>>> RCS/CCS are dependent engines as they are sharing reset domain.
> >>>> Whenever there is reset from CCS, all the exec queues running on
> >>>> RCS are victimised mainly on Lunarlake.
> >>>>
> >>>> Lets skip parallel execution on CCS with RCS.
> >>> I haven't really looked at this specific test in detail, but based
> >>> on your explanation here, you're also going to run into problems
> >>> with multiple CCS engines since they all share the same reset.  You
> >>> won't see that on platforms like LNL that only have a single CCS,
> >>> but platforms
> >> but it is seen on LNL because of having both RCS and CCS.
> >>
> >>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset
> >>> on one kills anything running on the others.
> >>>
> >>>
> >>> Matt
> >>>
> >>>> It helps in fixing following errors:
> >>>> 1. Test assertion failure function test_legacy_mode, file, Failed
> >>>> assertion: data[i].data == 0xc0ffee
> >>>>
> >>>> 2.Test assertion failure function xe_exec, file
> >>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0,
> >>>> error: -125 != 0
> >>>>
> >>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >>>> ---
> >>>>   tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> >>>>   1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/tests/intel/xe_exec_threads.c
> >>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
> >>>> --- a/tests/intel/xe_exec_threads.c
> >>>> +++ b/tests/intel/xe_exec_threads.c
> >>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>>>   	return NULL;
> >>>>   }
> >>>>
> >>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
> >>>> +flags) {
> >>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >>>> +		return false;
> >> as above, I don't think we should add any platform check here. It's
> >> impossible to keep it up to date and it's also testing the wrong thing.
> >> AFAIU you don't want parallel submission on engines that share the
> >> same reset domain. So, this is actually what should be tested.
> > Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which
> helps to run things parallelly on engines in same reset domain and apparently
> BMG/LNL does not have that kind of support so applicable for LNL/BMG with
> parallel submission on RCS/CCS only.
> >
> > @Harrison, John C please reply if you have any other input here.
> I don't get what you mean by 'have some kind of WA/noWA'. All platforms
> with compute engines have shared reset domains. That is all there is to it. I.e.
> everything from TGL onwards. That includes RCS and all CCS engines. So RCS +
> CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc. Any platform with multiple engines
> that talk to EUs will reset all of those engines in parallel.
> 
> There are w/a's which make the situation even worse. E.g. on DG2/MTL you
> are not allowed to context switch one of those engines while another is busy.
> Which means that if one hangs, they all hang - you cannot just wait for other
> workloads to complete and/or pre-empt them off the engine prior to doing
> the shared reset. But there is nothing that makes it better.
> 
> I assume we are talking about GuC triggered engine resets here? As opposed
> to driver triggered full GT resets?
> 
> The GuC will attempt to idle all other connected engines first by pre-empting
> out any executing contexts. If those contexts are pre-emptible then they will
> survive - GuC will automatically restart them once the reset is complete. If
> they are not (or at least not pre-emptible within the pre-emption timeout
> limit) then they will be killed as collateral damage.
> 
> What are the workloads being submitted by this test? Are the pre-emptible
> spinners? If so, then they should survive (assuming you don't have the
> DG2/MTL RCS/CCS w/a in effect). If they are non-preemptible spinners then
> they are toast.

The main question here was whether this fix should be applied to all
platforms that have both RCS and CCS, or just to LNL/BMG. The reason for
asking is that only LNL/BMG are hitting this issue; with the same tests,
PVC and other platforms are not hitting the issue we are addressing here.

Thanks,
Tejas
> 
> John.
> 
> 
> >
> > Thanks,
> > Tejas
> >> Lucas De Marchi
> >>
> >>>> +
> >>>> +	if (flags & HANG)
> >>>> +		return true;
> >>>> +
> >>>> +	return false;
> >>>> +}
> >>>> +
> >>>>   /**
> >>>>    * SUBTEST: threads-%s
> >>>>    * Description: Run threads %arg[1] test with multi threads
> >>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>>>   	bool go = false;
> >>>>   	int n_threads = 0;
> >>>>   	int gt;
> >>>> +	bool has_rcs = false;
> >>>>
> >>>> -	xe_for_each_engine(fd, hwe)
> >>>> +	xe_for_each_engine(fd, hwe) {
> >>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >>>> +			has_rcs = true;
> >>>>   		++n_engines;
> >>>> +	}
> >>>>
> >>>>   	if (flags & BALANCER) {
> >>>>   		xe_for_each_gt(fd, gt)
> >>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>>>   	}
> >>>>
> >>>>   	xe_for_each_engine(fd, hwe) {
> >>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
> >>>> +		 * When CCS is doing reset, all the contexts of RCS are
> >>>> +		 * victimized, so skip the compute engine avoiding
> >>>> +		 * parallel execution with RCS
> >>>> +		 */
> >>>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
> >>>> +		    is_engine_contexts_victimized(fd, flags))
> >>>> +			continue;
> >>>> +
> >>>>   		threads_data[i].mutex = &mutex;
> >>>>   		threads_data[i].cond = &cond;
> >>>>   #define ADDRESS_SHIFT	39
> >>>> --
> >>>> 2.25.1
> >>>>
> >>> --
> >>> Matt Roper
> >>> Graphics Software Engineer
> >>> Linux GPU Platform Enablement
> >>> Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-05  4:47         ` Upadhyay, Tejas
@ 2024-04-05 18:15           ` John Harrison
  2024-04-05 23:33             ` Matthew Brost
  0 siblings, 1 reply; 21+ messages in thread
From: John Harrison @ 2024-04-05 18:15 UTC (permalink / raw)
  To: Upadhyay, Tejas, De Marchi, Lucas, Roper, Matthew D
  Cc: igt-dev, intel-xe, Brost, Matthew

On 4/4/2024 21:47, Upadhyay, Tejas wrote:
>> -----Original Message-----
>> From: Harrison, John C<john.c.harrison@intel.com>
>> Sent: Friday, April 5, 2024 4:53 AM
>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
>> <lucas.demarchi@intel.com>; Roper, Matthew D
>> <matthew.d.roper@intel.com>
>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
>> Matthew<matthew.brost@intel.com>
>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
>> domain aware
>>
>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
>>>> -----Original Message-----
>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
>>>> Sent: Wednesday, April 3, 2024 2:26 AM
>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
>>>> Matthew<matthew.brost@intel.com>
>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
>>>> reset domain aware
>>>>
>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
>>>>>> Whenever there is reset from CCS, all the exec queues running on
>>>>>> RCS are victimised mainly on Lunarlake.
>>>>>>
>>>>>> Lets skip parallel execution on CCS with RCS.
>>>>> I haven't really looked at this specific test in detail, but based
>>>>> on your explanation here, you're also going to run into problems
>>>>> with multiple CCS engines since they all share the same reset.  You
>>>>> won't see that on platforms like LNL that only have a single CCS,
>>>>> but platforms
>>>> but it is seen on LNL because of having both RCS and CCS.
>>>>
>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset
>>>>> on one kills anything running on the others.
>>>>>
>>>>>
>>>>> Matt
>>>>>
>>>>>> It helps in fixing following errors:
>>>>>> 1. Test assertion failure function test_legacy_mode, file, Failed
>>>>>> assertion: data[i].data == 0xc0ffee
>>>>>>
>>>>>> 2.Test assertion failure function xe_exec, file
>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0,
>>>>>> error: -125 != 0
>>>>>>
>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
>>>>>> ---
>>>>>>    tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>>>>>    1 file changed, 25 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/tests/intel/xe_exec_threads.c
>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
>>>>>> --- a/tests/intel/xe_exec_threads.c
>>>>>> +++ b/tests/intel/xe_exec_threads.c
>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>>>>>    	return NULL;
>>>>>>    }
>>>>>>
>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
>>>>>> +flags) {
>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>>>>>> +		return false;
>>>> as above, I don't think we should add any platform check here. It's
>>>> impossible to keep it up to date and it's also testing the wrong thing.
>>>> AFAIU you don't want parallel submission on engines that share the
>>>> same reset domain. So, this is actually what should be tested.
>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which
>> helps to run things parallelly on engines in same reset domain and apparently
>> BMG/LNL does not have that kind of support so applicable for LNL/BMG with
>> parallel submission on RCS/CCS only.
>>> @Harrison, John C please reply if you have any other input here.
>> I don't get what you mean by 'have some kind of WA/noWA'. All platforms
>> with compute engines have shared reset domains. That is all there is to it. I.e.
>> everything from TGL onwards. That includes RCS and all CCS engines. So RCS +
>> CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc. Any platform with multiple engines
>> that talk to EUs will reset all of those engines in parallel.
>>
>> There are w/a's which make the situation even worse. E.g. on DG2/MTL you
>> are not allowed to context switch one of those engines while another is busy.
>> Which means that if one hangs, they all hang - you cannot just wait for other
>> workloads to complete and/or pre-empt them off the engine prior to doing
>> the shared reset. But there is nothing that makes it better.
>>
>> I assume we are talking about GuC triggered engine resets here? As opposed
>> to driver triggered full GT resets?
>>
>> The GuC will attempt to idle all other connected engines first by pre-empting
>> out any executing contexts. If those contexts are pre-emptible then they will
>> survive - GuC will automatically restart them once the reset is complete. If
>> they are not (or at least not pre-emptible within the pre-emption timeout
>> limit) then they will be killed as collateral damage.
>>
>> What are the workloads being submitted by this test? Are the pre-emptible
>> spinners? If so, then they should survive (assuming you don't have the
>> DG2/MTL RCS/CCS w/a in effect). If they are non-preemptible spinners then
>> they are toast.
> Main question here was, if this fix should be applied to all platforms who has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG are hitting this issue, with same tests PVC and other platforms are not hitting issue which we are addressing here.
And the answer is that yes, shared reset domains are common to all 
platforms with compute engines. So if only LNL/BMG are failing then the 
problem is not understood. Which is not helped by this test code being 
extremely complex and having almost zero explanation in it at all :(.

As noted, PVC has multiple compute engines but no RCS engine. If any 
compute engine is reset then all are reset. So if the test is running 
correctly and passing on PVC then it cannot be failing on LNL/BMG purely 
due to shared domain resets.
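
To make that argument concrete, here is a rough sketch that groups engines
into the shared render/compute reset domain by class instead of by
platform. The enum and function names are hypothetical stand-ins (the real
test would look at DRM_XE_ENGINE_CLASS_* values), and the engine lists in
the note below are illustrative layouts, not queried from hardware:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in engine classes for illustration only. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };

/* Render and compute engines both talk to the EUs, so they reset together. */
static bool in_eu_reset_domain(enum engine_class cls)
{
	return cls == CLASS_RENDER || cls == CLASS_COMPUTE;
}

/* Count the engines that sit in the shared render/compute reset domain. */
static size_t count_eu_domain_engines(const enum engine_class *engines, size_t n)
{
	size_t count = 0;

	assert(engines != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (in_eu_reset_domain(engines[i]))
			count++;
	return count;
}

/* More than one engine in the domain means a reset on one can hit another. */
static bool has_shared_reset_victims(const enum engine_class *engines, size_t n)
{
	return count_eu_domain_engines(engines, n) > 1;
}
```

Under this model a PVC-like layout (several compute engines, no render
engine) and an LNL-like layout (one render plus one compute engine) both
report shared victims, which is why a platform check alone does not
explain the difference in results.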

Is the reset not happening on PVC? Is the test not actually running 
multiple contexts in parallel on PVC? Or are the spinners pre-emptible 
and are therefore supposed to survive the reset of a shared domain 
engine by being swapped out first? In which case LNL/BMG are broken 
because the killed contexts are not supposed to be killed even though 
the engine is reset?

John.

>
> Thanks,
> Tejas
>> John.
>>
>>
>>> Thanks,
>>> Tejas
>>>> Lucas De Marchi
>>>>
>>>>>> +
>>>>>> +	if (flags & HANG)
>>>>>> +		return true;
>>>>>> +
>>>>>> +	return false;
>>>>>> +}
>>>>>> +
>>>>>>    /**
>>>>>>     * SUBTEST: threads-%s
>>>>>>     * Description: Run threads %arg[1] test with multi threads
>>>>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
>>>>>>    	bool go = false;
>>>>>>    	int n_threads = 0;
>>>>>>    	int gt;
>>>>>> +	bool has_rcs = false;
>>>>>>
>>>>>> -	xe_for_each_engine(fd, hwe)
>>>>>> +	xe_for_each_engine(fd, hwe) {
>>>>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>>>>>> +			has_rcs = true;
>>>>>>    		++n_engines;
>>>>>> +	}
>>>>>>
>>>>>>    	if (flags & BALANCER) {
>>>>>>    		xe_for_each_gt(fd, gt)
>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>>>>>    	}
>>>>>>
>>>>>>    	xe_for_each_engine(fd, hwe) {
>>>>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
>>>>>> +		 * victimized, so skip the compute engine avoiding
>>>>>> +		 * parallel execution with RCS
>>>>>> +		 */
>>>>>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
>>>>>> +		    is_engine_contexts_victimized(fd, flags))
>>>>>> +			continue;
>>>>>> +
>>>>>>    		threads_data[i].mutex = &mutex;
>>>>>>    		threads_data[i].cond = &cond;
>>>>>>    #define ADDRESS_SHIFT	39
>>>>>> --
>>>>>> 2.25.1
>>>>>>
>>>>> --
>>>>> Matt Roper
>>>>> Graphics Software Engineer
>>>>> Linux GPU Platform Enablement
>>>>> Intel Corporation

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-05 18:15           ` John Harrison
@ 2024-04-05 23:33             ` Matthew Brost
  2024-04-05 23:42               ` John Harrison
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2024-04-05 23:33 UTC (permalink / raw)
  To: John Harrison
  Cc: Upadhyay, Tejas, De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe

On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> > > -----Original Message-----
> > > From: Harrison, John C<john.c.harrison@intel.com>
> > > Sent: Friday, April 5, 2024 4:53 AM
> > > To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
> > > <lucas.demarchi@intel.com>; Roper, Matthew D
> > > <matthew.d.roper@intel.com>
> > > Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
> > > Matthew<matthew.brost@intel.com>
> > > Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> > > domain aware
> > > 
> > > On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> > > > > -----Original Message-----
> > > > > From: De Marchi, Lucas<lucas.demarchi@intel.com>
> > > > > Sent: Wednesday, April 3, 2024 2:26 AM
> > > > > To: Roper, Matthew D<matthew.d.roper@intel.com>
> > > > > Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> > > > > dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
> > > > > Matthew<matthew.brost@intel.com>
> > > > > Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> > > > > reset domain aware
> > > > > 
> > > > > On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> > > > > > On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> > > > > > > RCS/CCS are dependent engines as they are sharing reset domain.
> > > > > > > Whenever there is reset from CCS, all the exec queues running on
> > > > > > > RCS are victimised mainly on Lunarlake.
> > > > > > > 
> > > > > > > Lets skip parallel execution on CCS with RCS.
> > > > > > I haven't really looked at this specific test in detail, but based
> > > > > > on your explanation here, you're also going to run into problems
> > > > > > with multiple CCS engines since they all share the same reset.  You
> > > > > > won't see that on platforms like LNL that only have a single CCS,
> > > > > > but platforms
> > > > > but it is seen on LNL because of having both RCS and CCS.
> > > > > 
> > > > > > like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset
> > > > > > on one kills anything running on the others.
> > > > > > 
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > > It helps in fixing following errors:
> > > > > > > 1. Test assertion failure function test_legacy_mode, file, Failed
> > > > > > > assertion: data[i].data == 0xc0ffee
> > > > > > > 
> > > > > > > 2.Test assertion failure function xe_exec, file
> > > > > > > ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0,
> > > > > > > error: -125 != 0
> > > > > > > 
> > > > > > > Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> > > > > > > ---
> > > > > > >    tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> > > > > > >    1 file changed, 25 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/tests/intel/xe_exec_threads.c
> > > > > > > b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
> > > > > > > --- a/tests/intel/xe_exec_threads.c
> > > > > > > +++ b/tests/intel/xe_exec_threads.c
> > > > > > > @@ -710,6 +710,17 @@ static void *thread(void *data)
> > > > > > >    	return NULL;
> > > > > > >    }
> > > > > > > 
> > > > > > > +static bool is_engine_contexts_victimized(int fd, unsigned int
> > > > > > > +flags) {
> > > > > > > +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> > > > > > > +		return false;
> > > > > as above, I don't think we should add any platform check here. It's
> > > > > impossible to keep it up to date and it's also testing the wrong thing.
> > > > > AFAIU you don't want parallel submission on engines that share the
> > > > > same reset domain. So, this is actually what should be tested.
> > > > Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which
> > > helps to run things parallelly on engines in same reset domain and apparently
> > > BMG/LNL does not have that kind of support so applicable for LNL/BMG with
> > > parallel submission on RCS/CCS only.
> > > > @Harrison, John C please reply if you have any other input here.
> > > I don't get what you mean by 'have some kind of WA/noWA'. All platforms
> > > with compute engines have shared reset domains. That is all there is to it. I.e.
> > > everything from TGL onwards. That includes RCS and all CCS engines. So RCS +
> > > CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform with multiple engines
> > > that talk to EUs will reset all of those engines in parallel.
> > > 
> > > There are w/a's which make the situation even worse. E.g. on DG2/MTL you
> > > are not allowed to context switch one of those engines while another is busy.
> > > Which means that if one hangs, they all hang - you cannot just wait for other
> > > workloads to complete and/or pre-empt them off the engine prior to doing
> > > the shared reset. But there is nothing that makes it better.
> > > 
> > > I assume we are talking about GuC triggered engine resets here? As opposed
> > > to driver triggered full GT resets?
> > > 
> > > The GuC will attempt to idle all other connected engines first by pre-empting
> > > out any executing contexts. If those contexts are pre-emptible then they will
> > > survive - GuC will automatically restart them once the reset is complete. If
> > > they are not (or at least not pre-emptible within the pre-emption timeout
> > > limit) then they will be killed as collateral damage.
> > > 
> > > What are the workloads being submitted by this test? Are the pre-emptible
> > > spinners? If so, then they should survive (assuming you don't have the
> > > DG2/MTL RCS/CCS w/a in effect). If they are non-preemptible spinners then
> > > they are toast.
> > Main question here was, if this fix should be applied to all platforms who has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG are hitting this issue, with same tests PVC and other platforms are not hitting issue which we are addressing here.
> And the answer is that yes, shared reset domains are common to all platforms
> with compute engines. So if only LNL/BMG are failing then the problem is not
> understood. Which is not helped by this test code being extremely complex
> and having almost zero explanation in it at all :(.
> 

Let me explain what this test is doing...

- It creates a thread per hardware engine instance
- Within each thread it creates many exec queues targeting 1 hardware
  instance
- It submits a bunch of batches, each doing a dword write, to the exec queues
- If the HANG flag is set, 1 of the exec queues per thread will insert a
  non-preemptable spinner. It is expected the GuC will reset this exec
  queue only. Cross CCS / RCS resets will break this as one of the
  'good' exec queues from another thread could also be reset.
- I think the HANG sections can fail on PVC too (VLK-57725)
- This is racy: if the resets occur while all the 'bad' exec queues are
  running, the test will still pass even with cross CCS / RCS resets
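
The collateral-damage case in the list above can be sketched as a small
standalone model. This is not code from xe_exec_threads.c: the enum and
function names are hypothetical, and the grouping of render and compute
engines into one reset domain is an assumption taken from John's description
earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine classes for illustration; not the DRM uAPI values. */
enum model_class { MODEL_RENDER, MODEL_COPY, MODEL_VIDEO, MODEL_COMPUTE };

/* Assumption from this thread: render and compute share one reset domain. */
static bool in_shared_domain(enum model_class c)
{
	return c == MODEL_RENDER || c == MODEL_COMPUTE;
}

/*
 * One thread per engine instance, each with one hanging exec queue.
 * Returns how many threads own 'good' exec queues that could be hit by a
 * reset triggered from another thread's hang.
 */
static size_t threads_at_risk(const enum model_class *engines, size_t n)
{
	size_t shared = 0;

	for (size_t i = 0; i < n; i++)
		if (in_shared_domain(engines[i]))
			shared++;

	/* A lone engine in the domain is only ever reset by its own hang. */
	return shared > 1 ? shared : 0;
}
```

With an engine list holding one render and one compute instance this returns
2, and for a copy-only list it returns 0, which is the distinction the HANG
sections depend on.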

As the author of this test, I am fine with the compute class just being
skipped if the HANG flag is set. It is not testing individual engine instance
resets (we had tests for that, but they might be temporarily removed); rather,
it falls into the class of tests which try to do a bunch of things in
parallel to stress the KMD.

Matt

> As noted, PVC has multiple compute engines but no RCS engine. If any compute
> engine is reset then all are reset. So if the test is running correctly and
> passing on PVC then it cannot be failing on LNL/BMG purely due to shared
> domain resets.
> 
> Is the reset not happening on PVC? Is the test not actually running multiple
> contexts in parallel on PVC? Or are the spinners pre-emptible and are
> therefore supposed to survive the reset of a shared domain engine by being
> swapped out first? In which case LNL/BMG are broken because the killed
> contexts are not supposed to be killed even though the engine is reset?
> 
> John.
> 
> > 
> > Thanks,
> > Tejas
> > > John.
> > > 
> > > 
> > > > Thanks,
> > > > Tejas
> > > > > Lucas De Marchi
> > > > > 
> > > > > > > +
> > > > > > > +	if (flags & HANG)
> > > > > > > +		return true;
> > > > > > > +
> > > > > > > +	return false;
> > > > > > > +}
> > > > > > > +
> > > > > > >    /**
> > > > > > >     * SUBTEST: threads-%s
> > > > > > >     * Description: Run threads %arg[1] test with multi threads @@
> > > > > > > -955,9 +966,13 @@ static void threads(int fd, int flags)
> > > > > > >    	bool go = false;
> > > > > > >    	int n_threads = 0;
> > > > > > >    	int gt;
> > > > > > > +	bool has_rcs = false;
> > > > > > > 
> > > > > > > -	xe_for_each_engine(fd, hwe)
> > > > > > > +	xe_for_each_engine(fd, hwe) {
> > > > > > > +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> > > > > > > +			has_rcs = true;
> > > > > > >    		++n_engines;
> > > > > > > +	}
> > > > > > > 
> > > > > > >    	if (flags & BALANCER) {
> > > > > > >    		xe_for_each_gt(fd, gt)
> > > > > > > @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> > > > > > >    	}
> > > > > > > 
> > > > > > >    	xe_for_each_engine(fd, hwe) {
> > > > > > > +		/* RCS/CCS sharing reset domain hence dependent engines.
> > > > > > > +		 * When CCS is doing reset, all the contexts of RCS are
> > > > > > > +		 * victimized, so skip the compute engine avoiding
> > > > > > > +		 * parallel execution with RCS
> > > > > > > +		 */
> > > > > > > +		if (has_rcs && hwe->engine_class ==
> > > > > DRM_XE_ENGINE_CLASS_COMPUTE &&
> > > > > > > +		    is_engine_contexts_victimized(fd, flags))
> > > > > > > +			continue;
> > > > > > > +
> > > > > > >    		threads_data[i].mutex = &mutex;
> > > > > > >    		threads_data[i].cond = &cond;
> > > > > > >    #define ADDRESS_SHIFT	39
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > > 
> > > > > > --
> > > > > > Matt Roper
> > > > > > Graphics Software Engineer
> > > > > > Linux GPU Platform Enablement
> > > > > > Intel Corporation

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-05 23:33             ` Matthew Brost
@ 2024-04-05 23:42               ` John Harrison
  2024-04-08  5:23                 ` Upadhyay, Tejas
  0 siblings, 1 reply; 21+ messages in thread
From: John Harrison @ 2024-04-05 23:42 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Upadhyay, Tejas, De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe

On 4/5/2024 16:33, Matthew Brost wrote:
> On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
>> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
>>>> -----Original Message-----
>>>> From: Harrison, John C<john.c.harrison@intel.com>
>>>> Sent: Friday, April 5, 2024 4:53 AM
>>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
>>>> <lucas.demarchi@intel.com>; Roper, Matthew D
>>>> <matthew.d.roper@intel.com>
>>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
>>>> Matthew<matthew.brost@intel.com>
>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
>>>> domain aware
>>>>
>>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
>>>>>> -----Original Message-----
>>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
>>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
>>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
>>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
>>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
>>>>>> Matthew<matthew.brost@intel.com>
>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
>>>>>> reset domain aware
>>>>>>
>>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>>>>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
>>>>>>>> Whenever there is reset from CCS, all the exec queues running on
>>>>>>>> RCS are victimised mainly on Lunarlake.
>>>>>>>>
>>>>>>>> Lets skip parallel execution on CCS with RCS.
>>>>>>> I haven't really looked at this specific test in detail, but based
>>>>>>> on your explanation here, you're also going to run into problems
>>>>>>> with multiple CCS engines since they all share the same reset.  You
>>>>>>> won't see that on platforms like LNL that only have a single CCS,
>>>>>>> but platforms
>>>>>> but it is seen on LNL because of having both RCS and CCS.
>>>>>>
>>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset
>>>>>>> on one kills anything running on the others.
>>>>>>>
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>> It helps in fixing following errors:
>>>>>>>> 1. Test assertion failure function test_legacy_mode, file, Failed
>>>>>>>> assertion: data[i].data == 0xc0ffee
>>>>>>>>
>>>>>>>> 2.Test assertion failure function xe_exec, file
>>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0,
>>>>>>>> error: -125 != 0
>>>>>>>>
>>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
>>>>>>>> ---
>>>>>>>>     tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>>>>>>>     1 file changed, 25 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
>>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
>>>>>>>> --- a/tests/intel/xe_exec_threads.c
>>>>>>>> +++ b/tests/intel/xe_exec_threads.c
>>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>>>>>>>     	return NULL;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
>>>>>>>> +flags) {
>>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>>>>>>>> +		return false;
>>>>>> as above, I don't think we should add any platform check here. It's
>>>>>> impossible to keep it up to date and it's also testing the wrong thing.
>>>>>> AFAIU you don't want parallel submission on engines that share the
>>>>>> same reset domain. So, this is actually what should be tested.
>>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which
>>>> helps to run things parallelly on engines in same reset domain and apparently
>>>> BMG/LNL does not have that kind of support so applicable for LNL/BMG with
>>>> parallel submission on RCS/CCS only.
>>>>> @Harrison, John C please reply if you have any other input here.
>>>> I don't get what you mean by 'have some kind of WA/noWA'. All platforms
>>>> with compute engines have shared reset domains. That is all there is to it. I.e.
>>>> everything from TGL onwards. That includes RCS and all CCS engines. So RCS +
>>>> CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform with multiple engines
>>>> that talk to EUs will reset all of those engines in parallel.
>>>>
>>>> There are w/a's which make the situation even worse. E.g. on DG2/MTL you
>>>> are not allowed to context switch one of those engines while another is busy.
>>>> Which means that if one hangs, they all hang - you cannot just wait for other
>>>> workloads to complete and/or pre-empt them off the engine prior to doing
>>>> the shared reset. But there is nothing that makes it better.
>>>>
>>>> I assume we are talking about GuC triggered engine resets here? As opposed
>>>> to driver triggered full GT resets?
>>>>
>>>> The GuC will attempt to idle all other connected engines first by pre-empting
>>>> out any executing contexts. If those contexts are pre-emptible then they will
>>>> survive - GuC will automatically restart them once the reset is complete. If
>>>> they are not (or at least not pre-emptible within the pre-emption timeout
>>>> limit) then they will be killed as collateral damage.
>>>>
>>>> What are the workloads being submitted by this test? Are the pre-emptible
>>>> spinners? If so, then they should survive (assuming you don't have the
>>>> DG2/MTL RCS/CCS w/a in effect). If they are non-preemptible spinners then
>>>> they are toast.
>>> Main question here was, if this fix should be applied to all platforms who has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG are hitting this issue, with same tests PVC and other platforms are not hitting issue which we are addressing here.
>> And the answer is that yes, shared reset domains are common to all platforms
>> with compute engines. So if only LNL/BMG are failing then the problem is not
>> understood. Which is not helped by this test code being extremely complex
>> and having almost zero explanation in it at all :(.
>>
> Let me explain what this test is doing...
>
> - It creates a thread per hardware engine instance
> - Within each thread it creates many exec queues targeting 1 hardware
>    instance
> - It submits a bunch of batches, each doing a dword write, to the exec queues
> - If the HANG flag is set, 1 of the exec queues per thread will insert a
>    non-preemptable spinner. It is expected the GuC will reset this exec
>    queue only. Cross CCS / RCS resets will break this as one of the
>    'good' exec queues from another thread could also be reset.
If the 'good' workloads are pre-emptible then they should not be reset. 
The GuC will attempt to pre-empt all shared domain engines prior to 
triggering any resets. If they are being killed then something is broken 
and needs to be fixed.
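
As a rough illustration of the expected outcome (a toy model only, with
hypothetical names, not GuC firmware or KMD code), the pre-empt-then-reset
behaviour described above could be written as:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context state for illustration only. */
struct model_ctx {
	bool hung;        /* the context whose hang triggers the reset */
	bool preemptible; /* can be switched out within the pre-emption timeout */
	bool survives;    /* filled in by model_engine_reset() */
};

/*
 * Before the shared domain is reset, every other running context is
 * pre-empted. Contexts that yield in time survive and are restarted
 * afterwards; the hung context and any context that cannot be pre-empted
 * are lost. Returns the number of contexts killed.
 */
static size_t model_engine_reset(struct model_ctx *ctx, size_t n)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		ctx[i].survives = !ctx[i].hung && ctx[i].preemptible;
		if (!ctx[i].survives)
			killed++;
	}
	return killed;
}
```

Under this model a 'good' context should only appear in the killed count if
it is itself non-preemptible, which is why a killed pre-emptible workload
would point at a bug rather than at the shared reset domain.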

> - I think the HANG sections can fail on PVC too (VLK-57725)
> - This is racy: if the resets occur while all the 'bad' exec queues are
>    running, the test will still pass even with cross CCS / RCS resets
Can this description be added to the test?

> As the author of this test, I am fine with the compute class just being
> skipped if the HANG flag is set. It is not testing individual engine instance
> resets (we had tests for that, but they might be temporarily removed); rather,
> it falls into the class of tests which try to do a bunch of things in
> parallel to stress the KMD.
Note that some platforms don't have RCS or media engines. Which means 
only running on BCS engines. Is that sufficient coverage?

And if this is not meant to be a reset test, why does it test resets at 
all? If the concern is that we don't have a stress test involving resets 
and this is the only coverage then it seems like we should not be 
crippling it.

John.


>
> Matt
>
>> As noted, PVC has multiple compute engines but no RCS engine. If any compute
>> engine is reset then all are reset. So if the test is running correctly and
>> passing on PVC then it cannot be failing on LNL/BMG purely due to shared
>> domain resets.
>>
>> Is the reset not happening on PVC? Is the test not actually running multiple
>> contexts in parallel on PVC? Or are the spinners pre-emptible and are
>> therefore supposed to survive the reset of a shared domain engine by being
>> swapped out first? In which case LNL/BMG are broken because the killed
>> contexts are not supposed to be killed even though the engine is reset?
>>
>> John.
>>
>>> Thanks,
>>> Tejas
>>>> John.
>>>>
>>>>
>>>>> Thanks,
>>>>> Tejas
>>>>>> Lucas De Marchi
>>>>>>
>>>>>>>> +
>>>>>>>> +	if (flags & HANG)
>>>>>>>> +		return true;
>>>>>>>> +
>>>>>>>> +	return false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * SUBTEST: threads-%s
>>>>>>>>      * Description: Run threads %arg[1] test with multi threads @@
>>>>>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
>>>>>>>>     	bool go = false;
>>>>>>>>     	int n_threads = 0;
>>>>>>>>     	int gt;
>>>>>>>> +	bool has_rcs = false;
>>>>>>>>
>>>>>>>> -	xe_for_each_engine(fd, hwe)
>>>>>>>> +	xe_for_each_engine(fd, hwe) {
>>>>>>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>>>>>>>> +			has_rcs = true;
>>>>>>>>     		++n_engines;
>>>>>>>> +	}
>>>>>>>>
>>>>>>>>     	if (flags & BALANCER) {
>>>>>>>>     		xe_for_each_gt(fd, gt)
>>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>>>>>>>     	}
>>>>>>>>
>>>>>>>>     	xe_for_each_engine(fd, hwe) {
>>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
>>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
>>>>>>>> +		 * victimized, so skip the compute engine avoiding
>>>>>>>> +		 * parallel execution with RCS
>>>>>>>> +		 */
>>>>>>>> +		if (has_rcs && hwe->engine_class ==
>>>>>> DRM_XE_ENGINE_CLASS_COMPUTE &&
>>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
>>>>>>>> +			continue;
>>>>>>>> +
>>>>>>>>     		threads_data[i].mutex = &mutex;
>>>>>>>>     		threads_data[i].cond = &cond;
>>>>>>>>     #define ADDRESS_SHIFT	39
>>>>>>>> --
>>>>>>>> 2.25.1
>>>>>>>>
>>>>>>> --
>>>>>>> Matt Roper
>>>>>>> Graphics Software Engineer
>>>>>>> Linux GPU Platform Enablement
>>>>>>> Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-05 23:42               ` John Harrison
@ 2024-04-08  5:23                 ` Upadhyay, Tejas
  2024-04-08 12:00                   ` Upadhyay, Tejas
  0 siblings, 1 reply; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-08  5:23 UTC (permalink / raw)
  To: Harrison, John C, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe



> -----Original Message-----
> From: Harrison, John C <john.c.harrison@intel.com>
> Sent: Saturday, April 6, 2024 5:13 AM
> To: Brost, Matthew <matthew.brost@intel.com>
> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On 4/5/2024 16:33, Matthew Brost wrote:
> > On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> >> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> >>>> -----Original Message-----
> >>>> From: Harrison, John C<john.c.harrison@intel.com>
> >>>> Sent: Friday, April 5, 2024 4:53 AM
> >>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
> >>>> <lucas.demarchi@intel.com>; Roper, Matthew D
> >>>> <matthew.d.roper@intel.com>
> >>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >>>> Brost, Matthew<matthew.brost@intel.com>
> >>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>> tests reset domain aware
> >>>>
> >>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
> >>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
> >>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
> >>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> >>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org; Brost,
> >>>>>> Matthew<matthew.brost@intel.com>
> >>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>>>> tests reset domain aware
> >>>>>>
> >>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> >>>>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
> >>>>>>>> Whenever there is reset from CCS, all the exec queues running
> >>>>>>>> on RCS are victimised mainly on Lunarlake.
> >>>>>>>>
> >>>>>>>> Lets skip parallel execution on CCS with RCS.
> >>>>>>> I haven't really looked at this specific test in detail, but
> >>>>>>> based on your explanation here, you're also going to run into
> >>>>>>> problems with multiple CCS engines since they all share the same
> >>>>>>> reset.  You won't see that on platforms like LNL that only have
> >>>>>>> a single CCS, but platforms
> >>>>>> but it is seen on LNL because of having both RCS and CCS.
> >>>>>>
> >>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a
> >>>>>>> reset on one kills anything running on the others.
> >>>>>>>
> >>>>>>>
> >>>>>>> Matt
> >>>>>>>
> >>>>>>>> It helps in fixing following errors:
> >>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
> >>>>>>>> Failed
> >>>>>>>> assertion: data[i].data == 0xc0ffee
> >>>>>>>>
> >>>>>>>> 2.Test assertion failure function xe_exec, file
> >>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) ==
> >>>>>>>> 0,
> >>>>>>>> error: -125 != 0
> >>>>>>>>
> >>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> >>>>>>>> ---
> >>>>>>>>     tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> >>>>>>>>     1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
> >>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
> >>>>>>>> 100644
> >>>>>>>> --- a/tests/intel/xe_exec_threads.c
> >>>>>>>> +++ b/tests/intel/xe_exec_threads.c
> >>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>>>>>>>     	return NULL;
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned int
> >>>>>>>> +flags) {
> >>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >>>>>>>> +		return false;
> >>>>>> as above, I don't think we should add any platform check here.
> >>>>>> It's impossible to keep it up to date and it's also testing the wrong
> thing.
> >>>>>> AFAIU you don't want parallel submission on engines that share
> >>>>>> the same reset domain. So, this is actually what should be tested.
> >>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA
> >>>>> which
> >>>> helps to run things parallelly on engines in same reset domain and
> >>>> apparently BMG/LNL does not have that kind of support so applicable
> >>>> for LNL/BMG with parallel submission on RCS/CCS only.
> >>>>> @Harrison, John C please reply if you have any other input here.
> >>>> I don't get what you mean by 'have some kind of WA/noWA'. All
> >>>> platforms with compute engines have shared reset domains. That is all
> there is to it. I.e.
> >>>> everything from TGL onwards. That includes RCS and all CCS engines.
> >>>> So RCS + CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform with
> >>>> multiple engines that talk to EUs will reset all of those engines in
> parallel.
> >>>>
> >>>> There are w/a's which make the situation even worse. E.g. on
> >>>> DG2/MTL you are not allowed to context switch one of those engines
> while another is busy.
> >>>> Which means that if one hangs, they all hang - you cannot just wait
> >>>> for other workloads to complete and/or pre-empt them off the engine
> >>>> prior to doing the shared reset. But there is nothing that makes it better.
> >>>>
> >>>> I assume we are talking about GuC triggered engine resets here? As
> >>>> opposed to driver triggered full GT resets?
> >>>>
> >>>> The GuC will attempt to idle all other connected engines first by
> >>>> pre-empting out any executing contexts. If those contexts are
> >>>> pre-emptible then they will survive - GuC will automatically
> >>>> restart them once the reset is complete. If they are not (or at
> >>>> least not pre-emptible within the pre-emption timeout
> >>>> limit) then they will be killed as collateral damage.
> >>>>
> >>>> What are the workloads being submitted by this test? Are the
> >>>> pre-emptible spinners? If so, then they should survive (assuming
> >>>> you don't have the DG2/MTL RCS/CCS w/a in effect). If they are
> >>>> non-preemptible spinners then they are toast.
> >>> Main question here was, if this fix should be applied to all platforms who
> has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG are
> hitting this issue, with same tests PVC and other platforms are not hitting
> issue which we are addressing here.
> >> And the answer is that yes, shared reset domains are common to all
> >> platforms with compute engines. So if only LNL/BMG are failing then
> >> the problem is not understood. Which is not helped by this test code
> >> being extremely complex and having almost zero explanation in it at all :(.
> >>
> > Let me explain what this test is doing...
> >
> > - It creates a thread per hardware engine instance
> > - Within each thread it creates many exec queues targeting 1 hardware
> >    instance
> > - It submits a bunch of batches, each doing a dword write, to the exec queues
> > - If the HANG flag is set, 1 of the exec queues per thread will insert a
> >    non-preemptable spinner. It is expected the GuC will reset this exec
> >    queue only. Cross CCS / RCS resets will break this as one of the
> >    'good' exec queues from another thread could also be reset.
> If the 'good' workloads are pre-emptible then they should not be reset.
> The GuC will attempt to pre-empt all shared domain engines prior to
> triggering any resets. If they are being killed then something is broken and
> needs to be fixed.
> 
> > - I think the HANG sections can fail on PVC too (VLK-57725)
> > - This is racy: if the resets occur while all the 'bad' exec queues are
> >    running, the test will still pass even with cross CCS / RCS resets
> Can this description be added to the test?
> 
> > As the author of this test, I am fine with the compute class just being
> > skipped if the HANG flag is set. It is not testing individual engine instance
> > resets (we had tests for that, but they might be temporarily removed); rather,
> > it falls into the class of tests which try to do a bunch of things in
> > parallel to stress the KMD.
> Note that some platforms don't have RCS or media engines. Which means
> only running on BCS engines. Is that sufficient coverage?

If you look at this patch, we skip compute only if RCS is present; otherwise compute still runs. So far I have not seen a failure when 2 compute instances happen to enter this race.
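
A minimal standalone sketch of that skip condition, assuming hypothetical
stand-ins for the engine classes and the HANG flag (the names below are not
from the IGT tree, and the platform check from the patch is deliberately
left out here):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine classes and flag, for illustration only. */
enum sketch_class { SKETCH_RENDER, SKETCH_COPY, SKETCH_COMPUTE };

#define SKETCH_HANG (1u << 0) /* stand-in for the test's HANG flag */

/* Returns how many per-engine threads would be started. */
static size_t count_threads(const enum sketch_class *engines, size_t n,
			    unsigned int flags)
{
	bool has_rcs = false;
	size_t n_threads = 0;

	/* First pass: find out whether a render engine is present at all. */
	for (size_t i = 0; i < n; i++)
		if (engines[i] == SKETCH_RENDER)
			has_rcs = true;

	/* Second pass: skip compute only when RCS exists and HANG is set. */
	for (size_t i = 0; i < n; i++) {
		if (has_rcs && engines[i] == SKETCH_COMPUTE &&
		    (flags & SKETCH_HANG))
			continue;
		n_threads++;
	}
	return n_threads;
}
```

For a list with render, copy, and compute engines this starts 2 threads when
the hang flag is set and 3 otherwise; for a list without a render engine all
compute instances are kept.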

Thanks,
Tejas

> 
> And if this is not meant to be a reset test, why does it test resets at all? If the
> concern is that we don't have a stress test involving resets and this is the only
> coverage then it seems like we should not be crippling it.
> 
> John.
> 
> 
> >
> > Matt
> >
> >> As noted, PVC has multiple compute engines but no RCS engine. If any
> >> compute engine is reset then all are reset. So if the test is running
> >> correctly and passing on PVC then it cannot be failing on LNL/BMG
> >> purely due to shared domain resets.
> >>
> >> Is the reset not happening on PVC? Is the test not actually running
> >> multiple contexts in parallel on PVC? Or are the spinners
> >> pre-emptible and are therefore supposed to survive the reset of a
> >> shared domain engine by being swapped out first? In which case
> >> LNL/BMG are broken because the killed contexts are not supposed to be
> killed even though the engine is reset?
> >>
> >> John.
> >>
> >>> Thanks,
> >>> Tejas
> >>>> John.
> >>>>
> >>>>
> >>>>> Thanks,
> >>>>> Tejas
> >>>>>> Lucas De Marchi
> >>>>>>
> >>>>>>>> +
> >>>>>>>> +	if (flags & HANG)
> >>>>>>>> +		return true;
> >>>>>>>> +
> >>>>>>>> +	return false;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>     /**
> >>>>>>>>      * SUBTEST: threads-%s
> >>>>>>>>      * Description: Run threads %arg[1] test with multi threads
> >>>>>>>> @@
> >>>>>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>>>>>>>     	bool go = false;
> >>>>>>>>     	int n_threads = 0;
> >>>>>>>>     	int gt;
> >>>>>>>> +	bool has_rcs = false;
> >>>>>>>>
> >>>>>>>> -	xe_for_each_engine(fd, hwe)
> >>>>>>>> +	xe_for_each_engine(fd, hwe) {
> >>>>>>>> +		if (hwe->engine_class ==
> DRM_XE_ENGINE_CLASS_RENDER)
> >>>>>>>> +			has_rcs = true;
> >>>>>>>>     		++n_engines;
> >>>>>>>> +	}
> >>>>>>>>
> >>>>>>>>     	if (flags & BALANCER) {
> >>>>>>>>     		xe_for_each_gt(fd, gt)
> >>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>>>>>>>     	}
> >>>>>>>>
> >>>>>>>>     	xe_for_each_engine(fd, hwe) {
> >>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent
> engines.
> >>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
> >>>>>>>> +		 * victimized, so skip the compute engine avoiding
> >>>>>>>> +		 * parallel execution with RCS
> >>>>>>>> +		 */
> >>>>>>>> +		if (has_rcs && hwe->engine_class ==
> >>>>>> DRM_XE_ENGINE_CLASS_COMPUTE &&
> >>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
> >>>>>>>> +			continue;
> >>>>>>>> +
> >>>>>>>>     		threads_data[i].mutex = &mutex;
> >>>>>>>>     		threads_data[i].cond = &cond;
> >>>>>>>>     #define ADDRESS_SHIFT	39
> >>>>>>>> --
> >>>>>>>> 2.25.1
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Matt Roper
> >>>>>>> Graphics Software Engineer
> >>>>>>> Linux GPU Platform Enablement
> >>>>>>> Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-08  5:23                 ` Upadhyay, Tejas
@ 2024-04-08 12:00                   ` Upadhyay, Tejas
  2024-04-10 19:22                     ` John Harrison
  0 siblings, 1 reply; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-08 12:00 UTC (permalink / raw)
  To: Harrison, John C, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe



> -----Original Message-----
> From: Upadhyay, Tejas
> Sent: Monday, April 8, 2024 10:54 AM
> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>
> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> 
> 
> > -----Original Message-----
> > From: Harrison, John C <john.c.harrison@intel.com>
> > Sent: Saturday, April 6, 2024 5:13 AM
> > To: Brost, Matthew <matthew.brost@intel.com>
> > Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> > <lucas.demarchi@intel.com>; Roper, Matthew D
> > <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> > xe@lists.freedesktop.org
> > Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> > reset domain aware
> >
> > On 4/5/2024 16:33, Matthew Brost wrote:
> > > On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> > >> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> > >>>> -----Original Message-----
> > >>>> From: Harrison, John C<john.c.harrison@intel.com>
> > >>>> Sent: Friday, April 5, 2024 4:53 AM
> > >>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
> > >>>> <lucas.demarchi@intel.com>; Roper, Matthew D
> > >>>> <matthew.d.roper@intel.com>
> > >>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> > >>>> Brost, Matthew<matthew.brost@intel.com>
> > >>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> > >>>> tests reset domain aware
> > >>>>
> > >>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> > >>>>>> -----Original Message-----
> > >>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
> > >>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
> > >>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
> > >>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> > >>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> > >>>>>> Brost, Matthew<matthew.brost@intel.com>
> > >>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> > >>>>>> tests reset domain aware
> > >>>>>>
> > >>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> > >>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> > >>>>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
> > >>>>>>>> Whenever there is reset from CCS, all the exec queues running
> > >>>>>>>> on RCS are victimised mainly on Lunarlake.
> > >>>>>>>>
> > >>>>>>>> Lets skip parallel execution on CCS with RCS.
> > >>>>>>> I haven't really looked at this specific test in detail, but
> > >>>>>>> based on your explanation here, you're also going to run into
> > >>>>>>> problems with multiple CCS engines since they all share the
> > >>>>>>> same reset.  You won't see that on platforms like LNL that
> > >>>>>>> only have a single CCS, but platforms
> > >>>>>> but it is seen on LNL because of having both RCS and CCS.
> > >>>>>>
> > >>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a
> > >>>>>>> reset on one kills anything running on the others.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Matt
> > >>>>>>>
> > >>>>>>>> It helps in fixing following errors:
> > >>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
> > >>>>>>>> Failed
> > >>>>>>>> assertion: data[i].data == 0xc0ffee
> > >>>>>>>>
> > >>>>>>>> 2.Test assertion failure function xe_exec, file
> > >>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec)
> > >>>>>>>> == 0,
> > >>>>>>>> error: -125 != 0
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> > >>>>>>>> ---
> > >>>>>>>>     tests/intel/xe_exec_threads.c | 26
> +++++++++++++++++++++++++-
> > >>>>>>>>     1 file changed, 25 insertions(+), 1 deletion(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
> > >>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
> > >>>>>>>> 100644
> > >>>>>>>> --- a/tests/intel/xe_exec_threads.c
> > >>>>>>>> +++ b/tests/intel/xe_exec_threads.c
> > >>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> > >>>>>>>>     	return NULL;
> > >>>>>>>>     }
> > >>>>>>>>
> > >>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned
> > >>>>>>>> +int
> > >>>>>>>> +flags) {
> > >>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> > >>>>>>>> +		return false;
> > >>>>>> as above, I don't think we should add any platform check here.
> > >>>>>> It's impossible to keep it up to date and it's also testing the
> > >>>>>> wrong
> > thing.
> > >>>>>> AFAIU you don't want parallel submission on engines that share
> > >>>>>> the same reset domain. So, this is actually what should be tested.
> > >>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA
> > >>>>> which
> > >>>> helps to run things parallelly on engines in same reset domain
> > >>>> and apparently BMG/LNL does not have that kind of support so
> > >>>> applicable for LNL/BMG with parallel submission on RCS/CCS only.
> > >>>>> @Harrison, John C please reply if you have any other input here.
> > >>>> I don't get what you mean by 'have some kind of WA/noWA'. All
> > >>>> platforms with compute engines have shared reset domains. That is
> > >>>> all
> > there is to it. I.e.
> > >>>> everything from TGL onwards. That includes RCS and all CCS engines.
> > >>>> So RCS + CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform
> > >>>> with multiple engines that talk to EUs will reset all of those
> > >>>> engines in
> > parallel.
> > >>>>
> > >>>> There are w/a's which make the situation even worse. E.g. on
> > >>>> DG2/MTL you are not allowed to context switch one of those
> > >>>> engines
> > while another is busy.
> > >>>> Which means that if one hangs, they all hang - you cannot just
> > >>>> wait for other workloads to complete and/or pre-empt them off the
> > >>>> engine prior to doing the shared reset. But there is nothing that makes
> it better.
> > >>>>
> > >>>> I assume we are talking about GuC triggered engine resets here?
> > >>>> As opposed to driver triggered full GT resets?
> > >>>>
> > >>>> The GuC will attempt to idle all other connected engines first by
> > >>>> pre-empting out any executing contexts. If those contexts are
> > >>>> pre-emptible then they will survive - GuC will automatically
> > >>>> restart them once the reset is complete. If they are not (or at
> > >>>> least not pre-emptible within the pre-emption timeout
> > >>>> limit) then they will be killed as collateral damage.
> > >>>>
> > >>>> What are the workloads being submitted by this test? Are the
> > >>>> pre-emptible spinners? If so, then they should survive (assuming
> > >>>> you don't have the DG2/MTL RCS/CCS w/a in effect). If they are
> > >>>> non-preemptible spinners then they are toast.
> > >>> Main question here was, if this fix should be applied to all
> > >>> platforms who
> > has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG
> > are hitting this issue, with same tests PVC and other platforms are
> > not hitting issue which we are addressing here.
> > >> And the answer is that yes, shared reset domains are common to all
> > >> platforms with compute engines. So if only LNL/BMG are failing then
> > >> the problem is not understood. Which is not helped by this test
> > >> code being extremely complex and having almost zero explanation in it at
> all :(.
> > >>
> > > Let me explain what this test is doing...
> > >
> > > - It creates a thread per hardware engine instance
> > > - Within each thread it creates many exec queues targeting 1 hardware
> > >    instance
> > > - It submits a bunch of batches which do dword write the exec queues
> > > - If the HANG flag is set, 1 of the exec queue per thread will insert a
> > >    non-preemptable spinner. It is expected the GuC will reset this exec
> > >    queue only. Cross CCS / RCS resets will break this as one of the
> > >    'good' exec queues from another thread could also be reset.
> > If the 'good' workloads are pre-emptible then they should not be reset.
> > The GuC will attempt to pre-empt all shared domain engines prior to
> > triggering any resets. If they are being killed then something is
> > broken and needs to be fixed.
> >
> > > - I think the HANG sections can fail on PVC too (VLK-57725)
> > > - This is racey, as if the resets occur when all the 'bad' exec queues
> > >    are running the test will still work with cross CCS / RCS resets
> > Can this description be added to the test?
> >
> > > As the author of this test, I am fine with compute class just being
> > > skipped if HANF flag set. It is not testing individual engine
> > > instance resets (we had tests for that but might be temporally
> > > removed) rather falls into the class of tests which trying to do a
> > > bunch of things in parallel to stress the KMD.
> > Note that some platforms don't have RCS or media engines. Which means
> > only running on BCS engines. Is that sufficient coverage?
> 
> If you look at this patch, we skip compute only if rcs is present, otherwise not.
> So far I don’t see failure when 2 compute instances happen to enter this race.
> 

I can modify the test to also skip the HANG tests when there are 2 CCS engines and no RCS, if the change needs to be generic, even though the issue has not been seen anywhere else.
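
A generic, platform-independent form of that skip could look roughly like the sketch below. It is only an illustration under stated assumptions: the enum, the FLAG_HANG bit and both helper names are hypothetical stand-ins rather than the IGT or xe uAPI definitions, and the policy (keep RCS if present, otherwise keep the first CCS, and skip every other render/compute engine for HANG runs) is one possible reading of the proposal, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real engine classes and the test's HANG flag. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };
#define FLAG_HANG (1u << 0)

/* Render and compute engines are assumed to share one reset domain. */
static bool in_shared_reset_domain(enum engine_class c)
{
	return c == CLASS_RENDER || c == CLASS_COMPUTE;
}

/*
 * Pick the single engine of the shared domain that keeps running in HANG
 * tests: the first render engine if there is one, else the first compute
 * engine. Returns -1 when the domain is empty.
 */
static int domain_keeper(const enum engine_class *engines, size_t n)
{
	int first_compute = -1;

	for (size_t i = 0; i < n; i++) {
		if (engines[i] == CLASS_RENDER)
			return (int)i;
		if (engines[i] == CLASS_COMPUTE && first_compute < 0)
			first_compute = (int)i;
	}
	return first_compute;
}

/* True when engine idx should be left out of a HANG run. */
static bool skip_engine_for_hang(const enum engine_class *engines, size_t n,
				 size_t idx, unsigned int flags)
{
	assert(idx < n);

	if (!(flags & FLAG_HANG))
		return false;
	if (!in_shared_reset_domain(engines[idx]))
		return false;
	return (int)idx != domain_keeper(engines, n);
}
```

With this policy an RCS + CCS layout keeps only the RCS in HANG runs, and a CCS-only layout keeps only the first CCS; copy and video engines are never skipped.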
 
> Thanks,
> Tejas
> 
> >
> > And if this is not meant to be a reset test, why does it test resets
> > at all? If the concern is that we don't have a stress test involving
> > resets and this is the only coverage then it seems like we should not be
> crippling it.
> >
> > John.
> >
> >
> > >
> > > Matt
> > >
> > >> As noted, PVC has multiple compute engines but no RCS engine. If
> > >> any compute engine is reset then all are reset. So if the test is
> > >> running correctly and passing on PVC then it cannot be failing on
> > >> LNL/BMG purely due to shared domain resets.
> > >>
> > >> Is the reset not happening on PVC? Is the test not actually running
> > >> multiple contexts in parallel on PVC? Or are the spinners
> > >> pre-emptible and are therefore supposed to survive the reset of a
> > >> shared domain engine by being swapped out first? In which case
> > >> LNL/BMG are broken because the killed contexts are not supposed to
> > >> be
> > killed even though the engine is reset?
> > >>
> > >> John.
> > >>
> > >>> Thanks,
> > >>> Tejas
> > >>>> John.
> > >>>>
> > >>>>
> > >>>>> Thanks,
> > >>>>> Tejas
> > >>>>>> Lucas De Marchi
> > >>>>>>
> > >>>>>>>> +
> > >>>>>>>> +	if (flags & HANG)
> > >>>>>>>> +		return true;
> > >>>>>>>> +
> > >>>>>>>> +	return false;
> > >>>>>>>> +}
> > >>>>>>>> +
> > >>>>>>>>     /**
> > >>>>>>>>      * SUBTEST: threads-%s
> > >>>>>>>>      * Description: Run threads %arg[1] test with multi
> > >>>>>>>> threads @@
> > >>>>>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
> > >>>>>>>>     	bool go = false;
> > >>>>>>>>     	int n_threads = 0;
> > >>>>>>>>     	int gt;
> > >>>>>>>> +	bool has_rcs = false;
> > >>>>>>>>
> > >>>>>>>> -	xe_for_each_engine(fd, hwe)
> > >>>>>>>> +	xe_for_each_engine(fd, hwe) {
> > >>>>>>>> +		if (hwe->engine_class ==
> > DRM_XE_ENGINE_CLASS_RENDER)
> > >>>>>>>> +			has_rcs = true;
> > >>>>>>>>     		++n_engines;
> > >>>>>>>> +	}
> > >>>>>>>>
> > >>>>>>>>     	if (flags & BALANCER) {
> > >>>>>>>>     		xe_for_each_gt(fd, gt)
> > >>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> > >>>>>>>>     	}
> > >>>>>>>>
> > >>>>>>>>     	xe_for_each_engine(fd, hwe) {
> > >>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent
> > engines.
> > >>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
> > >>>>>>>> +		 * victimized, so skip the compute engine avoiding
> > >>>>>>>> +		 * parallel execution with RCS
> > >>>>>>>> +		 */
> > >>>>>>>> +		if (has_rcs && hwe->engine_class ==
> > >>>>>> DRM_XE_ENGINE_CLASS_COMPUTE &&
> > >>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
> > >>>>>>>> +			continue;
> > >>>>>>>> +
> > >>>>>>>>     		threads_data[i].mutex = &mutex;
> > >>>>>>>>     		threads_data[i].cond = &cond;
> > >>>>>>>>     #define ADDRESS_SHIFT	39
> > >>>>>>>> --
> > >>>>>>>> 2.25.1
> > >>>>>>>>
> > >>>>>>> --
> > >>>>>>> Matt Roper
> > >>>>>>> Graphics Software Engineer
> > >>>>>>> Linux GPU Platform Enablement
> > >>>>>>> Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-08 12:00                   ` Upadhyay, Tejas
@ 2024-04-10 19:22                     ` John Harrison
  2024-04-11  5:12                       ` Upadhyay, Tejas
  2024-04-23 13:06                       ` Upadhyay, Tejas
  0 siblings, 2 replies; 21+ messages in thread
From: John Harrison @ 2024-04-10 19:22 UTC (permalink / raw)
  To: Upadhyay, Tejas, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe

On 4/8/2024 05:00, Upadhyay, Tejas wrote:
>> -----Original Message-----
>> From: Upadhyay, Tejas
>> Sent: Monday, April 8, 2024 10:54 AM
>> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
>> <matthew.brost@intel.com>
>> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
>> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
>> xe@lists.freedesktop.org
>> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
>> domain aware
>>
>>
>>
>>> -----Original Message-----
>>> From: Harrison, John C <john.c.harrison@intel.com>
>>> Sent: Saturday, April 6, 2024 5:13 AM
>>> To: Brost, Matthew <matthew.brost@intel.com>
>>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
>>> <lucas.demarchi@intel.com>; Roper, Matthew D
>>> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
>>> xe@lists.freedesktop.org
>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
>>> reset domain aware
>>>
>>> On 4/5/2024 16:33, Matthew Brost wrote:
>>>> On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
>>>>> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Harrison, John C<john.c.harrison@intel.com>
>>>>>>> Sent: Friday, April 5, 2024 4:53 AM
>>>>>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
>>>>>>> <lucas.demarchi@intel.com>; Roper, Matthew D
>>>>>>> <matthew.d.roper@intel.com>
>>>>>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>>>>>>> Brost, Matthew<matthew.brost@intel.com>
>>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
>>>>>>> tests reset domain aware
>>>>>>>
>>>>>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
>>>>>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
>>>>>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
>>>>>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
>>>>>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>>>>>>>>> Brost, Matthew<matthew.brost@intel.com>
>>>>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
>>>>>>>>> tests reset domain aware
>>>>>>>>>
>>>>>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>>>>>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>>>>>>>>>>> RCS/CCS are dependent engines as they are sharing reset domain.
>>>>>>>>>>> Whenever there is reset from CCS, all the exec queues running
>>>>>>>>>>> on RCS are victimised mainly on Lunarlake.
>>>>>>>>>>>
>>>>>>>>>>> Lets skip parallel execution on CCS with RCS.
>>>>>>>>>> I haven't really looked at this specific test in detail, but
>>>>>>>>>> based on your explanation here, you're also going to run into
>>>>>>>>>> problems with multiple CCS engines since they all share the
>>>>>>>>>> same reset.  You won't see that on platforms like LNL that
>>>>>>>>>> only have a single CCS, but platforms
>>>>>>>>> but it is seen on LNL because of having both RCS and CCS.
>>>>>>>>>
>>>>>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a
>>>>>>>>>> reset on one kills anything running on the others.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Matt
>>>>>>>>>>
>>>>>>>>>>> It helps in fixing following errors:
>>>>>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
>>>>>>>>>>> Failed
>>>>>>>>>>> assertion: data[i].data == 0xc0ffee
>>>>>>>>>>>
>>>>>>>>>>> 2.Test assertion failure function xe_exec, file
>>>>>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec)
>>>>>>>>>>> == 0,
>>>>>>>>>>> error: -125 != 0
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>      tests/intel/xe_exec_threads.c | 26
>> +++++++++++++++++++++++++-
>>>>>>>>>>>      1 file changed, 25 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
>>>>>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
>>>>>>>>>>> 100644
>>>>>>>>>>> --- a/tests/intel/xe_exec_threads.c
>>>>>>>>>>> +++ b/tests/intel/xe_exec_threads.c
>>>>>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>>>>>>>>>>      	return NULL;
>>>>>>>>>>>      }
>>>>>>>>>>>
>>>>>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned
>>>>>>>>>>> +int
>>>>>>>>>>> +flags) {
>>>>>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>>>>>>>>>>> +		return false;
>>>>>>>>> as above, I don't think we should add any platform check here.
>>>>>>>>> It's impossible to keep it up to date and it's also testing the
>>>>>>>>> wrong
>>> thing.
>>>>>>>>> AFAIU you don't want parallel submission on engines that share
>>>>>>>>> the same reset domain. So, this is actually what should be tested.
>>>>>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA
>>>>>>>> which
>>>>>>> helps to run things parallelly on engines in same reset domain
>>>>>>> and apparently BMG/LNL does not have that kind of support so
>>>>>>> applicable for LNL/BMG with parallel submission on RCS/CCS only.
>>>>>>>> @Harrison, John C please reply if you have any other input here.
>>>>>>> I don't get what you mean by 'have some kind of WA/noWA'. All
>>>>>>> platforms with compute engines have shared reset domains. That is
>>>>>>> all
>>> there is to it. I.e.
>>>>>>> everything from TGL onwards. That includes RCS and all CCS engines.
>>>>>>> So RCS + CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform
>>>>>>> with multiple engines that talk to EUs will reset all of those
>>>>>>> engines in
>>> parallel.
>>>>>>> There are w/a's which make the situation even worse. E.g. on
>>>>>>> DG2/MTL you are not allowed to context switch one of those
>>>>>>> engines
>>> while another is busy.
>>>>>>> Which means that if one hangs, they all hang - you cannot just
>>>>>>> wait for other workloads to complete and/or pre-empt them off the
>>>>>>> engine prior to doing the shared reset. But there is nothing that makes
>> it better.
>>>>>>> I assume we are talking about GuC triggered engine resets here?
>>>>>>> As opposed to driver triggered full GT resets?
>>>>>>>
>>>>>>> The GuC will attempt to idle all other connected engines first by
>>>>>>> pre-empting out any executing contexts. If those contexts are
>>>>>>> pre-emptible then they will survive - GuC will automatically
>>>>>>> restart them once the reset is complete. If they are not (or at
>>>>>>> least not pre-emptible within the pre-emption timeout
>>>>>>> limit) then they will be killed as collateral damage.
>>>>>>>
>>>>>>> What are the workloads being submitted by this test? Are the
>>>>>>> pre-emptible spinners? If so, then they should survive (assuming
>>>>>>> you don't have the DG2/MTL RCS/CCS w/a in effect). If they are
>>>>>>> non-preemptible spinners then they are toast.
>>>>>> Main question here was, if this fix should be applied to all
>>>>>> platforms who
>>> has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG
>>> are hitting this issue, with same tests PVC and other platforms are
>>> not hitting issue which we are addressing here.
>>>>> And the answer is that yes, shared reset domains are common to all
>>>>> platforms with compute engines. So if only LNL/BMG are failing then
>>>>> the problem is not understood. Which is not helped by this test
>>>>> code being extremely complex and having almost zero explanation in it at
>> all :(.
>>>> Let me explain what this test is doing...
>>>>
>>>> - It creates a thread per hardware engine instance
>>>> - Within each thread it creates many exec queues targeting 1 hardware
>>>>     instance
>>>> - It submits a bunch of batches which do dword write the exec queues
>>>> - If the HANG flag is set, 1 of the exec queue per thread will insert a
>>>>     non-preemptable spinner. It is expected the GuC will reset this exec
>>>>     queue only. Cross CCS / RCS resets will break this as one of the
>>>>     'good' exec queues from another thread could also be reset.
>>> If the 'good' workloads are pre-emptible then they should not be reset.
>>> The GuC will attempt to pre-empt all shared domain engines prior to
>>> triggering any resets. If they are being killed then something is
>>> broken and needs to be fixed.
>>>
>>>> - I think the HANG sections can fail on PVC too (VLK-57725)
>>>> - This is racey, as if the resets occur when all the 'bad' exec queues
>>>>     are running the test will still work with cross CCS / RCS resets
>>> Can this description be added to the test?
>>>
>>>> As the author of this test, I am fine with compute class just being
>>>> skipped if HANF flag set. It is not testing individual engine
>>>> instance resets (we had tests for that but might be temporally
>>>> removed) rather falls into the class of tests which trying to do a
>>>> bunch of things in parallel to stress the KMD.
>>> Note that some platforms don't have RCS or media engines. Which means
>>> only running on BCS engines. Is that sufficient coverage?
>> If you look at this patch, we skip compute only if rcs is present, otherwise not.
>> So far I don’t see failure when 2 compute instances happen to enter this race.
>>
> I can modify test to skip in case 2 ccs and no rcs with HANG tests. Even though issue not seen anywhere else if change needs to be generic.
My point is not to make random and unexplained changes to the test but 
to understand why the test is failing the way it is. So far, the 
explanation does not make sense.

As noted above, pre-emptible workloads should not be killed. LNL/BMG do
not have the DG2 RCS/CCS workaround that prevents pre-emptions and
context switches while the other side is busy. So I do not see a
reason why the test is failing. That needs to be explained before simply
making it skip on those platforms.

John.
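
The expectation described in this reply can be written down as a small toy model. This is a sketch of the behaviour as stated in the thread, not GuC or KMD code: the struct, its fields and the function name are hypothetical, and "pre-emptible" here stands for "can be switched out within the pre-emption timeout".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one context running in the shared reset domain. */
struct ctx_model {
	bool hung;        /* the context whose hang triggers the reset */
	bool preemptible; /* can be switched out within the pre-emption timeout */
	bool killed;      /* output: set by shared_domain_reset() */
};

/*
 * Model of a shared-domain engine reset as described in the thread:
 * the hung context is always killed; any other context survives only if
 * it is pre-emptible and no workaround blocks pre-emption while another
 * engine in the domain is busy (the DG2/MTL case). Returns the number of
 * contexts killed.
 */
static size_t shared_domain_reset(struct ctx_model *ctxs, size_t n,
				  bool wa_blocks_preemption)
{
	size_t killed = 0;

	assert(ctxs || n == 0);

	for (size_t i = 0; i < n; i++) {
		bool survives = !ctxs[i].hung && ctxs[i].preemptible &&
				!wa_blocks_preemption;

		ctxs[i].killed = !survives;
		if (!survives)
			killed++;
	}
	return killed;
}
```

Under this model, a 'good' pre-emptible context on a platform without the workaround is never collateral damage, which is why the failures reported on LNL/BMG still need an explanation.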

>   
>> Thanks,
>> Tejas
>>
>>> And if this is not meant to be a reset test, why does it test resets
>>> at all? If the concern is that we don't have a stress test involving
>>> resets and this is the only coverage then it seems like we should not be
>> crippling it.
>>> John.
>>>
>>>
>>>> Matt
>>>>
>>>>> As noted, PVC has multiple compute engines but no RCS engine. If
>>>>> any compute engine is reset then all are reset. So if the test is
>>>>> running correctly and passing on PVC then it cannot be failing on
>>>>> LNL/BMG purely due to shared domain resets.
>>>>>
>>>>> Is the reset not happening on PVC? Is the test not actually running
>>>>> multiple contexts in parallel on PVC? Or are the spinners
>>>>> pre-emptible and are therefore supposed to survive the reset of a
>>>>> shared domain engine by being swapped out first? In which case
>>>>> LNL/BMG are broken because the killed contexts are not supposed to
>>>>> be
>>> killed even though the engine is reset?
>>>>> John.
>>>>>
>>>>>> Thanks,
>>>>>> Tejas
>>>>>>> John.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tejas
>>>>>>>>> Lucas De Marchi
>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +	if (flags & HANG)
>>>>>>>>>>> +		return true;
>>>>>>>>>>> +
>>>>>>>>>>> +	return false;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>      /**
>>>>>>>>>>>       * SUBTEST: threads-%s
>>>>>>>>>>>       * Description: Run threads %arg[1] test with multi
>>>>>>>>>>> threads @@
>>>>>>>>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
>>>>>>>>>>>      	bool go = false;
>>>>>>>>>>>      	int n_threads = 0;
>>>>>>>>>>>      	int gt;
>>>>>>>>>>> +	bool has_rcs = false;
>>>>>>>>>>>
>>>>>>>>>>> -	xe_for_each_engine(fd, hwe)
>>>>>>>>>>> +	xe_for_each_engine(fd, hwe) {
>>>>>>>>>>> +		if (hwe->engine_class ==
>>> DRM_XE_ENGINE_CLASS_RENDER)
>>>>>>>>>>> +			has_rcs = true;
>>>>>>>>>>>      		++n_engines;
>>>>>>>>>>> +	}
>>>>>>>>>>>
>>>>>>>>>>>      	if (flags & BALANCER) {
>>>>>>>>>>>      		xe_for_each_gt(fd, gt)
>>>>>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>>>>>>>>>>      	}
>>>>>>>>>>>
>>>>>>>>>>>      	xe_for_each_engine(fd, hwe) {
>>>>>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent
>>> engines.
>>>>>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
>>>>>>>>>>> +		 * victimized, so skip the compute engine avoiding
>>>>>>>>>>> +		 * parallel execution with RCS
>>>>>>>>>>> +		 */
>>>>>>>>>>> +		if (has_rcs && hwe->engine_class ==
>>>>>>>>> DRM_XE_ENGINE_CLASS_COMPUTE &&
>>>>>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
>>>>>>>>>>> +			continue;
>>>>>>>>>>> +
>>>>>>>>>>>      		threads_data[i].mutex = &mutex;
>>>>>>>>>>>      		threads_data[i].cond = &cond;
>>>>>>>>>>>      #define ADDRESS_SHIFT	39
>>>>>>>>>>> --
>>>>>>>>>>> 2.25.1
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Matt Roper
>>>>>>>>>> Graphics Software Engineer
>>>>>>>>>> Linux GPU Platform Enablement
>>>>>>>>>> Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-10 19:22                     ` John Harrison
@ 2024-04-11  5:12                       ` Upadhyay, Tejas
  2024-04-11  5:37                         ` Upadhyay, Tejas
  2024-04-23 13:06                       ` Upadhyay, Tejas
  1 sibling, 1 reply; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-11  5:12 UTC (permalink / raw)
  To: Harrison, John C, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe



> -----Original Message-----
> From: Harrison, John C <john.c.harrison@intel.com>
> Sent: Thursday, April 11, 2024 12:52 AM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>
> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On 4/8/2024 05:00, Upadhyay, Tejas wrote:
> >> -----Original Message-----
> >> From: Upadhyay, Tejas
> >> Sent: Monday, April 8, 2024 10:54 AM
> >> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
> >> <matthew.brost@intel.com>
> >> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> >> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> >> xe@lists.freedesktop.org
> >> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >> reset domain aware
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Harrison, John C <john.c.harrison@intel.com>
> >>> Sent: Saturday, April 6, 2024 5:13 AM
> >>> To: Brost, Matthew <matthew.brost@intel.com>
> >>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> >>> <lucas.demarchi@intel.com>; Roper, Matthew D
> >>> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> >>> xe@lists.freedesktop.org
> >>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >>> reset domain aware
> >>>
> >>> On 4/5/2024 16:33, Matthew Brost wrote:
> >>>> On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> >>>>> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Harrison, John C<john.c.harrison@intel.com>
> >>>>>>> Sent: Friday, April 5, 2024 4:53 AM
> >>>>>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
> >>>>>>> <lucas.demarchi@intel.com>; Roper, Matthew D
> >>>>>>> <matthew.d.roper@intel.com>
> >>>>>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >>>>>>> Brost, Matthew<matthew.brost@intel.com>
> >>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>>>>> tests reset domain aware
> >>>>>>>
> >>>>>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
> >>>>>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
> >>>>>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
> >>>>>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> >>>>>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >>>>>>>>> Brost, Matthew<matthew.brost@intel.com>
> >>>>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>>>>>>> tests reset domain aware
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >>>>>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay
> wrote:
> >>>>>>>>>>> RCS/CCS are dependent engines as they are sharing reset
> domain.
> >>>>>>>>>>> Whenever there is reset from CCS, all the exec queues
> >>>>>>>>>>> running on RCS are victimised mainly on Lunarlake.
> >>>>>>>>>>>
> >>>>>>>>>>> Lets skip parallel execution on CCS with RCS.
> >>>>>>>>>> I haven't really looked at this specific test in detail, but
> >>>>>>>>>> based on your explanation here, you're also going to run into
> >>>>>>>>>> problems with multiple CCS engines since they all share the
> >>>>>>>>>> same reset.  You won't see that on platforms like LNL that
> >>>>>>>>>> only have a single CCS, but platforms
> >>>>>>>>> but it is seen on LNL because of having both RCS and CCS.
> >>>>>>>>>
> >>>>>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a
> >>>>>>>>>> reset on one kills anything running on the others.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Matt
> >>>>>>>>>>
> >>>>>>>>>>> It helps in fixing following errors:
> >>>>>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
> >>>>>>>>>>> Failed
> >>>>>>>>>>> assertion: data[i].data == 0xc0ffee
> >>>>>>>>>>>
> >>>>>>>>>>> 2.Test assertion failure function xe_exec, file
> >>>>>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec)
> >>>>>>>>>>> == 0,
> >>>>>>>>>>> error: -125 != 0
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>      tests/intel/xe_exec_threads.c | 26
> >> +++++++++++++++++++++++++-
> >>>>>>>>>>>      1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
> >>>>>>>>>>> 100644
> >>>>>>>>>>> --- a/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> +++ b/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>>>>>>>>>>      	return NULL;
> >>>>>>>>>>>      }
> >>>>>>>>>>>
> >>>>>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned
> >>>>>>>>>>> +int
> >>>>>>>>>>> +flags) {
> >>>>>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >>>>>>>>>>> +		return false;
> >>>>>>>>> as above, I don't think we should add any platform check here.
> >>>>>>>>> It's impossible to keep it up to date and it's also testing
> >>>>>>>>> the wrong
> >>> thing.
> >>>>>>>>> AFAIU you don't want parallel submission on engines that share
> >>>>>>>>> the same reset domain. So, this is actually what should be tested.
> >>>>>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA
> >>>>>>>> which
> >>>>>>> helps to run things parallelly on engines in same reset domain
> >>>>>>> and apparently BMG/LNL does not have that kind of support so
> >>>>>>> applicable for LNL/BMG with parallel submission on RCS/CCS only.
> >>>>>>>> @Harrison, John C please reply if you have any other input here.
> >>>>>>> I don't get what you mean by 'have some kind of WA/noWA'. All
> >>>>>>> platforms with compute engines have shared reset domains. That
> >>>>>>> is all
> >>> there is to it. I.e.
> >>>>>>> everything from TGL onwards. That includes RCS and all CCS engines.
> >>>>>>> So RCS + CCS, CCS0 + CCS1, RCS + CC0 + CCS1, etc. Any platform
> >>>>>>> with multiple engines that talk to EUs will reset all of those
> >>>>>>> engines in
> >>> parallel.
> >>>>>>> There are w/a's which make the situation even worse. E.g. on
> >>>>>>> DG2/MTL you are not allowed to context switch one of those
> >>>>>>> engines
> >>> while another is busy.
> >>>>>>> Which means that if one hangs, they all hang - you cannot just
> >>>>>>> wait for other workloads to complete and/or pre-empt them off
> >>>>>>> the engine prior to doing the shared reset. But there is nothing
> >>>>>>> that makes
> >> it better.
> >>>>>>> I assume we are talking about GuC triggered engine resets here?
> >>>>>>> As opposed to driver triggered full GT resets?
> >>>>>>>
> >>>>>>> The GuC will attempt to idle all other connected engines first
> >>>>>>> by pre-empting out any executing contexts. If those contexts are
> >>>>>>> pre-emptible then they will survive - GuC will automatically
> >>>>>>> restart them once the reset is complete. If they are not (or at
> >>>>>>> least not pre-emptible within the pre-emption timeout
> >>>>>>> limit) then they will be killed as collateral damage.
> >>>>>>>
> >>>>>>> What are the workloads being submitted by this test? Are the
> >>>>>>> pre-emptible spinners? If so, then they should survive (assuming
> >>>>>>> you don't have the DG2/MTL RCS/CCS w/a in effect). If they are
> >>>>>>> non-preemptible spinners then they are toast.
> >>>>>> The main question here was whether this fix should be applied to
> >>>>>> all platforms that have both RCS and CCS, or just to LNL/BMG. The
> >>>>>> reason for asking is that only LNL/BMG are hitting this issue; with
> >>>>>> the same tests, PVC and other platforms are not hitting the issue
> >>>>>> that we are addressing here.
> >>>>> And the answer is that yes, shared reset domains are common to all
> >>>>> platforms with compute engines. So if only LNL/BMG are failing
> >>>>> then the problem is not understood. Which is not helped by this
> >>>>> test code being extremely complex and having almost zero
> >>>>> explanation in it at all :(.
> >>>> Let me explain what this test is doing...
> >>>>
> >>>> - It creates a thread per hardware engine instance
> >>>> - Within each thread it creates many exec queues targeting 1 hardware
> >>>>     instance
> >>>> - It submits a bunch of batches, which do dword writes, to the exec
> >>>>     queues
> >>>> - If the HANG flag is set, one of the exec queues per thread will insert a
> >>>>     non-preemptable spinner. It is expected the GuC will reset this exec
> >>>>     queue only. Cross CCS / RCS resets will break this as one of the
> >>>>     'good' exec queues from another thread could also be reset.
> >>> If the 'good' workloads are pre-emptible then they should not be reset.
> >>> The GuC will attempt to pre-empt all shared domain engines prior to
> >>> triggering any resets. If they are being killed then something is
> >>> broken and needs to be fixed.
> >>>
> >>>> - I think the HANG sections can fail on PVC too (VLK-57725)
> >>>> - This is racy: if the resets occur when all the 'bad' exec queues
> >>>>     are running, the test will still work with cross CCS / RCS
> >>>>     resets
> >>> Can this description be added to the test?
> >>>
> >>>> As the author of this test, I am fine with the compute class just
> >>>> being skipped if the HANG flag is set. It is not testing individual
> >>>> engine instance resets (we had tests for that but they might be
> >>>> temporarily removed); rather, it falls into the class of tests which
> >>>> try to do a bunch of things in parallel to stress the KMD.
> >>> Note that some platforms don't have RCS or media engines. Which
> >>> means only running on BCS engines. Is that sufficient coverage?
> >> If you look at this patch, we skip compute only if RCS is present,
> >> otherwise not.
> >> So far I don’t see a failure when 2 compute instances happen to enter
> >> this race.
> >>
> > I can modify the test to also skip in the case of 2 CCS and no RCS with
> > HANG tests, if the change needs to be generic, even though the issue is
> > not seen anywhere else.
> My point is not to make random and unexplained changes to the test but to
> understand why the test is failing the way it is. So far, the explanation does
> not make sense.
> 
> See above about pre-emptible workloads should not be killed. LNL/BMG do
> not have the RCS/CCS workaround of DG2 that prevents pre-emptions and
> context switches while the other side is busy. So I am not seeing a reason why
> the test is failing. That needs to be explained before simply making it skip on
> those platforms.
> 
> John.
> 
> >
> >> Thanks,
> >> Tejas
> >>
> >>> And if this is not meant to be a reset test, why does it test resets
> >>> at all? If the concern is that we don't have a stress test involving
> >>> resets and this is the only coverage then it seems like we should
> >>> not be crippling it.
> >>> John.
> >>>
> >>>
> >>>> Matt
> >>>>
> >>>>> As noted, PVC has multiple compute engines but no RCS engine. If
> >>>>> any compute engine is reset then all are reset. So if the test is
> >>>>> running correctly and passing on PVC then it cannot be failing on
> >>>>> LNL/BMG purely due to shared domain resets.
> >>>>>
> >>>>> Is the reset not happening on PVC? Is the test not actually
> >>>>> running multiple contexts in parallel on PVC? Or are the spinners
> >>>>> pre-emptible and are therefore supposed to survive the reset of a
> >>>>> shared domain engine by being swapped out first? In which case
> >>>>> LNL/BMG are broken because the killed contexts are not supposed to
> >>>>> be killed even though the engine is reset?

OK, I saw that the spinners in the test were marked as not preemptible. I then made them preemptible with the change below, but I still see the failure when CCS is reset.

@@ -477,7 +477,7 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
                uint64_t pad;
                uint32_t data;
        } *data;
-       struct xe_spin_opts spin_opts = { .preempt = false };
+       struct xe_spin_opts spin_opts = { .preempt = true };

Is this expected? If not, could this be a test, KMD, or GuC issue?

Thanks,
Tejas
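
For reference, the behaviour John describes in the quoted text (the GuC pre-empts the other engines in the shared reset domain before resetting; pre-emptible contexts survive, anything else is collateral damage) can be written down as a small stand-alone model. This is only a sketch under stated assumptions: every name below is hypothetical and is not an IGT, KMD, or GuC API, the hang is assumed to be on an RCS/CCS engine, and the DG2/MTL workaround is reduced to a single boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in IGT, the KMD or GuC. */
enum toy_engine_class { TOY_RCS, TOY_CCS, TOY_BCS };

struct toy_ctx {
	enum toy_engine_class engine_class;	/* engine the context runs on */
	bool preemptible;			/* can it be switched out in time? */
	bool hung;				/* the context that triggered the reset */
	bool killed;				/* filled in by toy_engine_reset() */
};

/* Assumption from the thread: RCS and all CCS engines share a reset domain. */
static bool toy_in_shared_domain(enum toy_engine_class engine_class)
{
	return engine_class == TOY_RCS || engine_class == TOY_CCS;
}

/*
 * Model an engine reset triggered by a hung RCS/CCS context.
 * no_ctx_switch_wa models the DG2/MTL workaround: no context switch is
 * allowed on one engine while another engine in the domain is busy.
 * Returns the number of contexts killed.
 */
static size_t toy_engine_reset(struct toy_ctx *ctx, size_t n,
			       bool no_ctx_switch_wa)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		ctx[i].killed = false;

		/* Engines outside the shared domain are untouched. */
		if (!toy_in_shared_domain(ctx[i].engine_class))
			continue;

		/* The hung context dies; so does anything not pre-empted out. */
		if (ctx[i].hung || !ctx[i].preemptible || no_ctx_switch_wa) {
			ctx[i].killed = true;
			killed++;
		}
	}

	return killed;
}
```

In this model a pre-emptible 'good' context on RCS survives a CCS hang unless the workaround flag is set, which is why a failure that persists with `.preempt = true` on LNL/BMG would still need an explanation.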

> >>>>> John.
> >>>>>
> >>>>>> Thanks,
> >>>>>> Tejas
> >>>>>>> John.
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Tejas
> >>>>>>>>> Lucas De Marchi
> >>>>>>>>>
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (flags & HANG)
> >>>>>>>>>>> +		return true;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	return false;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>      /**
> >>>>>>>>>>>       * SUBTEST: threads-%s
> >>>>>>>>>>>       * Description: Run threads %arg[1] test with multi threads
> >>>>>>>>>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>>>>>>>>>>      	bool go = false;
> >>>>>>>>>>>      	int n_threads = 0;
> >>>>>>>>>>>      	int gt;
> >>>>>>>>>>> +	bool has_rcs = false;
> >>>>>>>>>>>
> >>>>>>>>>>> -	xe_for_each_engine(fd, hwe)
> >>>>>>>>>>> +	xe_for_each_engine(fd, hwe) {
> >>>>>>>>>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >>>>>>>>>>> +			has_rcs = true;
> >>>>>>>>>>>      		++n_engines;
> >>>>>>>>>>> +	}
> >>>>>>>>>>>
> >>>>>>>>>>>      	if (flags & BALANCER) {
> >>>>>>>>>>>      		xe_for_each_gt(fd, gt)
> >>>>>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>>>>>>>>>>      	}
> >>>>>>>>>>>
> >>>>>>>>>>>      	xe_for_each_engine(fd, hwe) {
> >>>>>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
> >>>>>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
> >>>>>>>>>>> +		 * victimized, so skip the compute engine avoiding
> >>>>>>>>>>> +		 * parallel execution with RCS
> >>>>>>>>>>> +		 */
> >>>>>>>>>>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
> >>>>>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
> >>>>>>>>>>> +			continue;
> >>>>>>>>>>> +
> >>>>>>>>>>>      		threads_data[i].mutex = &mutex;
> >>>>>>>>>>>      		threads_data[i].cond = &cond;
> >>>>>>>>>>>      #define ADDRESS_SHIFT	39
> >>>>>>>>>>> --
> >>>>>>>>>>> 2.25.1
> >>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Matt Roper
> >>>>>>>>>> Graphics Software Engineer
> >>>>>>>>>> Linux GPU Platform Enablement Intel Corporation



* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-11  5:12                       ` Upadhyay, Tejas
@ 2024-04-11  5:37                         ` Upadhyay, Tejas
  0 siblings, 0 replies; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-11  5:37 UTC (permalink / raw)
  To: Upadhyay, Tejas, Harrison, John C, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe



> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> Upadhyay, Tejas
> Sent: Thursday, April 11, 2024 10:42 AM
> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>
> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> 
> 
> > -----Original Message-----
> > From: Harrison, John C <john.c.harrison@intel.com>
> > Sent: Thursday, April 11, 2024 12:52 AM
> > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; Brost, Matthew
> > <matthew.brost@intel.com>
> > Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> > <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> > xe@lists.freedesktop.org
> > Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> > reset domain aware
> >
> > On 4/8/2024 05:00, Upadhyay, Tejas wrote:
> > >> -----Original Message-----
> > >> From: Upadhyay, Tejas
> > >> Sent: Monday, April 8, 2024 10:54 AM
> > >> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
> > >> <matthew.brost@intel.com>
> > >> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> > >> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> > >> xe@lists.freedesktop.org
> > >> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> > >> tests reset domain aware
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Harrison, John C <john.c.harrison@intel.com>
> > >>> Sent: Saturday, April 6, 2024 5:13 AM
> > >>> To: Brost, Matthew <matthew.brost@intel.com>
> > >>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> > >>> <lucas.demarchi@intel.com>; Roper, Matthew D
> > >>> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> > >>> xe@lists.freedesktop.org
> > >>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> > >>> tests reset domain aware
> > >>>
> > >>> On 4/5/2024 16:33, Matthew Brost wrote:
> > >>>> On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> > >>>>> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Harrison, John C<john.c.harrison@intel.com>
> > >>>>>>> Sent: Friday, April 5, 2024 4:53 AM
> > >>>>>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi,
> > >>>>>>> Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> > >>>>>>> <matthew.d.roper@intel.com>
> > >>>>>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.or
> > >>>>>>> g; Brost, Matthew<matthew.brost@intel.com>
> > >>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> > >>>>>>> tests reset domain aware
> > >>>>>>>
> > >>>>>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
> > >>>>>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
> > >>>>>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
> > >>>>>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> > >>>>>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> > >>>>>>>>> Brost, Matthew<matthew.brost@intel.com>
> > >>>>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make
> > >>>>>>>>> hang tests reset domain aware
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> > >>>>>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay
> > wrote:
> > >>>>>>>>>>> RCS/CCS are dependent engines as they are sharing reset
> > domain.
> > >>>>>>>>>>> Whenever there is reset from CCS, all the exec queues
> > >>>>>>>>>>> running on RCS are victimised mainly on Lunarlake.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Lets skip parallel execution on CCS with RCS.
> > >>>>>>>>>> I haven't really looked at this specific test in detail,
> > >>>>>>>>>> but based on your explanation here, you're also going to
> > >>>>>>>>>> run into problems with multiple CCS engines since they all
> > >>>>>>>>>> share the same reset.  You won't see that on platforms like
> > >>>>>>>>>> LNL that only have a single CCS, but platforms
> > >>>>>>>>> but it is seen on LNL because of having both RCS and CCS.
> > >>>>>>>>>
> > >>>>>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where
> > >>>>>>>>>> a reset on one kills anything running on the others.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Matt
> > >>>>>>>>>>
> > >>>>>>>>>>> It helps in fixing following errors:
> > >>>>>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
> > >>>>>>>>>>> Failed
> > >>>>>>>>>>> assertion: data[i].data == 0xc0ffee
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2.Test assertion failure function xe_exec, file
> > >>>>>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd,
> > >>>>>>>>>>> exec) == 0,
> > >>>>>>>>>>> error: -125 != 0
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>      tests/intel/xe_exec_threads.c | 26
> > >> +++++++++++++++++++++++++-
> > >>>>>>>>>>>      1 file changed, 25 insertions(+), 1 deletion(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
> > >>>>>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
> > >>>>>>>>>>> 100644
> > >>>>>>>>>>> --- a/tests/intel/xe_exec_threads.c
> > >>>>>>>>>>> +++ b/tests/intel/xe_exec_threads.c
> > >>>>>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> > >>>>>>>>>>>      	return NULL;
> > >>>>>>>>>>>      }
> > >>>>>>>>>>>
> > >>>>>>>>>>> +static bool is_engine_contexts_victimized(int fd,
> > >>>>>>>>>>> +unsigned int
> > >>>>>>>>>>> +flags) {
> > >>>>>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> > >>>>>>>>>>> +		return false;
> > >>>>>>>>> as above, I don't think we should add any platform check here.
> > >>>>>>>>> It's impossible to keep it up to date and it's also testing
> > >>>>>>>>> the wrong
> > >>> thing.
> > >>>>>>>>> AFAIU you don't want parallel submission on engines that
> > >>>>>>>>> share the same reset domain. So, this is actually what should be
> tested.
> > >>>>>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of
> > >>>>>>>> WA/noWA which
> > >>>>>>> helps to run things parallelly on engines in same reset domain
> > >>>>>>> and apparently BMG/LNL does not have that kind of support so
> > >>>>>>> applicable for LNL/BMG with parallel submission on RCS/CCS only.
> > >>>>>>>> @Harrison, John C please reply if you have any other input here.
> > >>>>>>> I don't get what you mean by 'have some kind of WA/noWA'. All
> > >>>>>>> platforms with compute engines have shared reset domains. That
> > >>>>>>> is all
> > >>> there is to it. I.e.
> > >>>>>>> everything from TGL onwards. That includes RCS and all CCS
> engines.
> > >>>>>>> So RCS + CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc. Any platform
> > >>>>>>> with multiple engines that talk to EUs will reset all of those
> > >>>>>>> engines in
> > >>> parallel.
> > >>>>>>> There are w/a's which make the situation even worse. E.g. on
> > >>>>>>> DG2/MTL you are not allowed to context switch one of those
> > >>>>>>> engines
> > >>> while another is busy.
> > >>>>>>> Which means that if one hangs, they all hang - you cannot just
> > >>>>>>> wait for other workloads to complete and/or pre-empt them off
> > >>>>>>> the engine prior to doing the shared reset. But there is
> > >>>>>>> nothing that makes
> > >> it better.
> > >>>>>>> I assume we are talking about GuC triggered engine resets here?
> > >>>>>>> As opposed to driver triggered full GT resets?
> > >>>>>>>
> > >>>>>>> The GuC will attempt to idle all other connected engines first
> > >>>>>>> by pre-empting out any executing contexts. If those contexts
> > >>>>>>> are pre-emptible then they will survive - GuC will
> > >>>>>>> automatically restart them once the reset is complete. If they
> > >>>>>>> are not (or at least not pre-emptible within the pre-emption
> > >>>>>>> timeout
> > >>>>>>> limit) then they will be killed as collateral damage.
> > >>>>>>>
> > >>>>>>> What are the workloads being submitted by this test? Are they
> > >>>>>>> pre-emptible spinners? If so, then they should survive
> > >>>>>>> (assuming you don't have the DG2/MTL RCS/CCS w/a in effect).
> > >>>>>>> If they are non-preemptible spinners then they are toast.
> > >>>>>> Main question here was, if this fix should be applied to all
> > >>>>>> platforms who
> > >>> has RCS and CCS both or just LNL/BMG. Reason to ask is, only
> > >>> LNL/BMG are hitting this issue, with same tests PVC and other
> > >>> platforms are not hitting issue which we are addressing here.
> > >>>>> And the answer is that yes, shared reset domains are common to
> > >>>>> all platforms with compute engines. So if only LNL/BMG are
> > >>>>> failing then the problem is not understood. Which is not helped
> > >>>>> by this test code being extremely complex and having almost zero
> > >>>>> explanation in it at
> > >> all :(.
> > >>>> Let me explain what this test is doing...
> > >>>>
> > >>>> - It creates a thread per hardware engine instance
> > >>>> - Within each thread it creates many exec queues targeting 1 hardware
> > >>>>     instance
> > >>>> - It submits a bunch of batches which do dword write the exec
> > >>>> queues
> > >>>> - If the HANG flag is set, 1 of the exec queue per thread will insert a
> > >>>>     non-preemptable spinner. It is expected the GuC will reset this exec
> > >>>>     queue only. Cross CCS / RCS resets will break this as one of the
> > >>>>     'good' exec queues from another thread could also be reset.
> > >>> If the 'good' workloads are pre-emptible then they should not be reset.
> > >>> The GuC will attempt to pre-empt all shared domain engines prior
> > >>> to triggering any resets. If they are being killed then something
> > >>> is broken and needs to be fixed.
> > >>>
> > >>>> - I think the HANG sections can fail on PVC too (VLK-57725)
> > >>>> - This is racy: if the resets occur when all the 'bad' exec queues
> > >>>>     are running, the test will still work with cross CCS / RCS
> > >>>>     resets
> > >>> Can this description be added to the test?
> > >>>
> > >>>> As the author of this test, I am fine with the compute class just
> > >>>> being skipped if the HANG flag is set. It is not testing individual
> > >>>> engine instance resets (we had tests for that but they might be
> > >>>> temporarily removed); rather, it falls into the class of tests
> > >>>> which try to do a bunch of things in parallel to stress the KMD.
> > >>> Note that some platforms don't have RCS or media engines. Which
> > >>> means only running on BCS engines. Is that sufficient coverage?
> > >> If you look at this patch, we skip compute only if rcs is present,
> > >> otherwise
> > not.
> > >> So far I don’t see failure when 2 compute instances happen to enter
> > >> this
> > race.
> > >>
> > > I can modify test to skip in case 2 ccs and no rcs with HANG tests.
> > > Even
> > though issue not seen anywhere else if change needs to be generic.
> > My point is not to make random and unexplained changes to the test but
> > to understand why the test is failing the way it is. So far, the
> > explanation does not make sense.
> >
> > See above about pre-emptible workloads should not be killed. LNL/BMG
> > do not have the RCS/CCS workaround of DG2 that prevents pre-emptions
> > and context switches while the other side is busy. So I am not seeing
> > a reason why the test is failing. That needs to be explained before
> > simply making it skip on those platforms.
> >
> > John.
> >
> > >
> > >> Thanks,
> > >> Tejas
> > >>
> > >>> And if this is not meant to be a reset test, why does it test
> > >>> resets at all? If the concern is that we don't have a stress test
> > >>> involving resets and this is the only coverage then it seems like
> > >>> we should not be
> > >> crippling it.
> > >>> John.
> > >>>
> > >>>
> > >>>> Matt
> > >>>>
> > >>>>> As noted, PVC has multiple compute engines but no RCS engine. If
> > >>>>> any compute engine is reset then all are reset. So if the test
> > >>>>> is running correctly and passing on PVC then it cannot be
> > >>>>> failing on LNL/BMG purely due to shared domain resets.
> > >>>>>
> > >>>>> Is the reset not happening on PVC? Is the test not actually
> > >>>>> running multiple contexts in parallel on PVC? Or are the
> > >>>>> spinners pre-emptible and are therefore supposed to survive the
> > >>>>> reset of a shared domain engine by being swapped out first? In
> > >>>>> which case LNL/BMG are broken because the killed contexts are
> > >>>>> not supposed to be
> > >>> killed even though the engine is reset?
> 
> OK, I saw that the spinners in the test were marked as not preemptible. I then
> made them preemptible with the change below, but I still see the failure when
> CCS is reset.
> 
> @@ -477,7 +477,7 @@ test_legacy_mode(int fd, uint32_t vm, uint64_t addr, uint64_t userptr,
>                 uint64_t pad;
>                 uint32_t data;
>         } *data;
> -       struct xe_spin_opts spin_opts = { .preempt = false };
> +       struct xe_spin_opts spin_opts = { .preempt = true };

I see that MI_ARB_CHECK is added to the test's spin batch if preempt is true, as below:
        if (opts->preempt)
                spin->batch[b++] = MI_ARB_CHECK;

Tejas
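
To illustrate the snippet above, here is a simplified sketch of how a spin batch could be assembled so that the arbitration check is only present when `preempt` is set. The struct, the builder function, and the loop layout are hypothetical stand-ins and not the real xe_spin code; the opcode values are assumed to follow the usual MI_INSTR opcode << 23 encoding (length and address-space bits are omitted) and should be checked against the IGT headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings; verify against the real IGT/kernel command headers. */
#define TOY_MI_INSTR(op)		((uint32_t)(op) << 23)
#define TOY_MI_ARB_CHECK		TOY_MI_INSTR(0x05)
#define TOY_MI_BATCH_BUFFER_START	TOY_MI_INSTR(0x31)

/* Hypothetical options struct, loosely mirroring the preempt flag above. */
struct toy_spin_opts {
	uint64_t addr;		/* GPU address the batch loops back to */
	bool preempt;		/* emit an arbitration point in the loop */
};

/*
 * Build a minimal looping batch into 'batch' (capacity 'cap' dwords).
 * Returns the number of dwords written, or 0 if 'cap' is too small.
 */
static size_t toy_spin_build(uint32_t *batch, size_t cap,
			     const struct toy_spin_opts *opts)
{
	size_t b = 0;
	size_t need = opts->preempt ? 4 : 3;

	if (cap < need)
		return 0;

	/* Explicit arbitration check: a point where the engine may be pre-empted. */
	if (opts->preempt)
		batch[b++] = TOY_MI_ARB_CHECK;

	/* Jump back to the start of the batch: this is the spin loop. */
	batch[b++] = TOY_MI_BATCH_BUFFER_START;
	batch[b++] = (uint32_t)opts->addr;		/* address low */
	batch[b++] = (uint32_t)(opts->addr >> 32);	/* address high */

	return b;
}
```

With `preempt` false the loop consists of the jump alone, so the only difference between the two spinner variants in this sketch is the single arbitration dword.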
> 
> Is this expected? If not, could this be a test, KMD, or GuC issue?
> 
> Thanks,
> Tejas
> 
> > >>>>> John.
> > >>>>>
> > >>>>>> Thanks,
> > >>>>>> Tejas
> > >>>>>>> John.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Tejas
> > >>>>>>>>> Lucas De Marchi
> > >>>>>>>>>
> > >>>>>>>>>>> +
> > >>>>>>>>>>> +	if (flags & HANG)
> > >>>>>>>>>>> +		return true;
> > >>>>>>>>>>> +
> > >>>>>>>>>>> +	return false;
> > >>>>>>>>>>> +}
> > >>>>>>>>>>> +
> > >>>>>>>>>>>      /**
> > >>>>>>>>>>>       * SUBTEST: threads-%s
> > >>>>>>>>>>>       * Description: Run threads %arg[1] test with multi
> > >>>>>>>>>>> threads @@
> > >>>>>>>>>>> -955,9 +966,13 @@ static void threads(int fd, int flags)
> > >>>>>>>>>>>      	bool go = false;
> > >>>>>>>>>>>      	int n_threads = 0;
> > >>>>>>>>>>>      	int gt;
> > >>>>>>>>>>> +	bool has_rcs = false;
> > >>>>>>>>>>>
> > >>>>>>>>>>> -	xe_for_each_engine(fd, hwe)
> > >>>>>>>>>>> +	xe_for_each_engine(fd, hwe) {
> > >>>>>>>>>>> +		if (hwe->engine_class ==
> > >>> DRM_XE_ENGINE_CLASS_RENDER)
> > >>>>>>>>>>> +			has_rcs = true;
> > >>>>>>>>>>>      		++n_engines;
> > >>>>>>>>>>> +	}
> > >>>>>>>>>>>
> > >>>>>>>>>>>      	if (flags & BALANCER) {
> > >>>>>>>>>>>      		xe_for_each_gt(fd, gt) @@ -990,6 +1005,15
> > @@ static
> > >>>>>>>>>>> void threads(int fd, int flags)
> > >>>>>>>>>>>      	}
> > >>>>>>>>>>>
> > >>>>>>>>>>>      	xe_for_each_engine(fd, hwe) {
> > >>>>>>>>>>> +		/* RCS/CCS sharing reset domain hence
> dependent
> > >>> engines.
> > >>>>>>>>>>> +		 * When CCS is doing reset, all the contexts of
> RCS are
> > >>>>>>>>>>> +		 * victimized, so skip the compute engine
> avoiding
> > >>>>>>>>>>> +		 * parallel execution with RCS
> > >>>>>>>>>>> +		 */
> > >>>>>>>>>>> +		if (has_rcs && hwe->engine_class ==
> > >>>>>>>>> DRM_XE_ENGINE_CLASS_COMPUTE &&
> > >>>>>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
> > >>>>>>>>>>> +			continue;
> > >>>>>>>>>>> +
> > >>>>>>>>>>>      		threads_data[i].mutex = &mutex;
> > >>>>>>>>>>>      		threads_data[i].cond = &cond;
> > >>>>>>>>>>>      #define ADDRESS_SHIFT	39
> > >>>>>>>>>>> --
> > >>>>>>>>>>> 2.25.1
> > >>>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> Matt Roper
> > >>>>>>>>>> Graphics Software Engineer
> > >>>>>>>>>> Linux GPU Platform Enablement Intel Corporation



* RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware
  2024-04-10 19:22                     ` John Harrison
  2024-04-11  5:12                       ` Upadhyay, Tejas
@ 2024-04-23 13:06                       ` Upadhyay, Tejas
  1 sibling, 0 replies; 21+ messages in thread
From: Upadhyay, Tejas @ 2024-04-23 13:06 UTC (permalink / raw)
  To: Harrison, John C, Brost, Matthew
  Cc: De Marchi, Lucas, Roper, Matthew D, igt-dev, intel-xe



> -----Original Message-----
> From: Harrison, John C <john.c.harrison@intel.com>
> Sent: Thursday, April 11, 2024 12:52 AM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; Brost, Matthew
> <matthew.brost@intel.com>
> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On 4/8/2024 05:00, Upadhyay, Tejas wrote:
> >> -----Original Message-----
> >> From: Upadhyay, Tejas
> >> Sent: Monday, April 8, 2024 10:54 AM
> >> To: Harrison, John C <john.c.harrison@intel.com>; Brost, Matthew
> >> <matthew.brost@intel.com>
> >> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; Roper, Matthew D
> >> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> >> xe@lists.freedesktop.org
> >> Subject: RE: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >> reset domain aware
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Harrison, John C <john.c.harrison@intel.com>
> >>> Sent: Saturday, April 6, 2024 5:13 AM
> >>> To: Brost, Matthew <matthew.brost@intel.com>
> >>> Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; De Marchi, Lucas
> >>> <lucas.demarchi@intel.com>; Roper, Matthew D
> >>> <matthew.d.roper@intel.com>; igt-dev@lists.freedesktop.org; intel-
> >>> xe@lists.freedesktop.org
> >>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests
> >>> reset domain aware
> >>>
> >>> On 4/5/2024 16:33, Matthew Brost wrote:
> >>>> On Fri, Apr 05, 2024 at 11:15:14AM -0700, John Harrison wrote:
> >>>>> On 4/4/2024 21:47, Upadhyay, Tejas wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Harrison, John C<john.c.harrison@intel.com>
> >>>>>>> Sent: Friday, April 5, 2024 4:53 AM
> >>>>>>> To: Upadhyay, Tejas<tejas.upadhyay@intel.com>; De Marchi, Lucas
> >>>>>>> <lucas.demarchi@intel.com>; Roper, Matthew D
> >>>>>>> <matthew.d.roper@intel.com>
> >>>>>>> Cc:igt-dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >>>>>>> Brost, Matthew<matthew.brost@intel.com>
> >>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>>>>> tests reset domain aware
> >>>>>>>
> >>>>>>> On 4/2/2024 22:35, Upadhyay, Tejas wrote:
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: De Marchi, Lucas<lucas.demarchi@intel.com>
> >>>>>>>>> Sent: Wednesday, April 3, 2024 2:26 AM
> >>>>>>>>> To: Roper, Matthew D<matthew.d.roper@intel.com>
> >>>>>>>>> Cc: Upadhyay, Tejas<tejas.upadhyay@intel.com>; igt-
> >>>>>>>>> dev@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >>>>>>>>> Brost, Matthew<matthew.brost@intel.com>
> >>>>>>>>> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang
> >>>>>>>>> tests reset domain aware
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >>>>>>>>>> On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay
> wrote:
> >>>>>>>>>>> RCS/CCS are dependent engines as they are sharing reset
> domain.
> >>>>>>>>>>> Whenever there is reset from CCS, all the exec queues
> >>>>>>>>>>> running on RCS are victimised mainly on Lunarlake.
> >>>>>>>>>>>
> >>>>>>>>>>> Lets skip parallel execution on CCS with RCS.
> >>>>>>>>>> I haven't really looked at this specific test in detail, but
> >>>>>>>>>> based on your explanation here, you're also going to run into
> >>>>>>>>>> problems with multiple CCS engines since they all share the
> >>>>>>>>>> same reset.  You won't see that on platforms like LNL that
> >>>>>>>>>> only have a single CCS, but platforms
> >>>>>>>>> but it is seen on LNL because of having both RCS and CCS.
> >>>>>>>>>
> >>>>>>>>>> like PVC, ATS-M, DG2, etc. can all have multiple CCS where a
> >>>>>>>>>> reset on one kills anything running on the others.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Matt
> >>>>>>>>>>
> >>>>>>>>>>> It helps in fixing following errors:
> >>>>>>>>>>> 1. Test assertion failure function test_legacy_mode, file,
> >>>>>>>>>>> Failed
> >>>>>>>>>>> assertion: data[i].data == 0xc0ffee
> >>>>>>>>>>>
> >>>>>>>>>>> 2.Test assertion failure function xe_exec, file
> >>>>>>>>>>> ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec)
> >>>>>>>>>>> == 0,
> >>>>>>>>>>> error: -125 != 0
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Tejas Upadhyay<tejas.upadhyay@intel.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>      tests/intel/xe_exec_threads.c | 26
> >> +++++++++++++++++++++++++-
> >>>>>>>>>>>      1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9
> >>>>>>>>>>> 100644
> >>>>>>>>>>> --- a/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> +++ b/tests/intel/xe_exec_threads.c
> >>>>>>>>>>> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>>>>>>>>>>      	return NULL;
> >>>>>>>>>>>      }
> >>>>>>>>>>>
> >>>>>>>>>>> +static bool is_engine_contexts_victimized(int fd, unsigned
> >>>>>>>>>>> +int
> >>>>>>>>>>> +flags) {
> >>>>>>>>>>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >>>>>>>>>>> +		return false;
> >>>>>>>>> as above, I don't think we should add any platform check here.
> >>>>>>>>> It's impossible to keep it up to date and it's also testing
> >>>>>>>>> the wrong
> >>> thing.
> >>>>>>>>> AFAIU you don't want parallel submission on engines that share
> >>>>>>>>> the same reset domain. So, this is actually what should be tested.
> >>>>>>>> Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA
> >>>>>>>> which
> >>>>>>> helps to run things parallelly on engines in same reset domain
> >>>>>>> and apparently BMG/LNL does not have that kind of support so
> >>>>>>> applicable for LNL/BMG with parallel submission on RCS/CCS only.
> >>>>>>>> @Harrison, John C please reply if you have any other input here.
> >>>>>>> I don't get what you mean by 'have some kind of WA/noWA'. All
> >>>>>>> platforms with compute engines have shared reset domains. That
> >>>>>>> is all
> >>> there is to it. I.e.
> >>>>>>> everything from TGL onwards. That includes RCS and all CCS engines.
> >>>>>>> So RCS + CCS, CCS0 + CCS1, RCS + CCS0 + CCS1, etc. Any platform
> >>>>>>> with multiple engines that talk to EUs will reset all of those
> >>>>>>> engines in
> >>> parallel.
> >>>>>>> There are w/a's which make the situation even worse. E.g. on
> >>>>>>> DG2/MTL you are not allowed to context switch one of those
> >>>>>>> engines
> >>> while another is busy.
> >>>>>>> Which means that if one hangs, they all hang - you cannot just
> >>>>>>> wait for other workloads to complete and/or pre-empt them off
> >>>>>>> the engine prior to doing the shared reset. But there is nothing
> >>>>>>> that makes
> >> it better.
> >>>>>>> I assume we are talking about GuC triggered engine resets here?
> >>>>>>> As opposed to driver triggered full GT resets?
> >>>>>>>
> >>>>>>> The GuC will attempt to idle all other connected engines first
> >>>>>>> by pre-empting out any executing contexts. If those contexts are
> >>>>>>> pre-emptible then they will survive - GuC will automatically
> >>>>>>> restart them once the reset is complete. If they are not (or at
> >>>>>>> least not pre-emptible within the pre-emption timeout
> >>>>>>> limit) then they will be killed as collateral damage.
> >>>>>>>
> >>>>>>> What are the workloads being submitted by this test? Are they
> >>>>>>> pre-emptible spinners? If so, then they should survive (assuming
> >>>>>>> you don't have the DG2/MTL RCS/CCS w/a in effect). If they are
> >>>>>>> non-preemptible spinners then they are toast.
> >>>>>> Main question here was, if this fix should be applied to all
> >>>>>> platforms who
> >>> has RCS and CCS both or just LNL/BMG. Reason to ask is, only LNL/BMG
> >>> are hitting this issue, with same tests PVC and other platforms are
> >>> not hitting issue which we are addressing here.
> >>>>> And the answer is that yes, shared reset domains are common to all
> >>>>> platforms with compute engines. So if only LNL/BMG are failing
> >>>>> then the problem is not understood. Which is not helped by this
> >>>>> test code being extremely complex and having almost zero
> >>>>> explanation in it at
> >> all :(.
> >>>> Let me explain what this test is doing...
> >>>>
> >>>> - It creates a thread per hardware engine instance
> >>>> - Within each thread it creates many exec queues targeting 1 hardware
> >>>>     instance
> >>>> - It submits a bunch of batches which do a dword write to the exec
> >>>>     queues
> >>>> - If the HANG flag is set, one of the exec queues per thread will insert a
> >>>>     non-preemptable spinner. It is expected the GuC will reset this exec
> >>>>     queue only. Cross CCS / RCS resets will break this as one of the
> >>>>     'good' exec queues from another thread could also be reset.
> >>> If the 'good' workloads are pre-emptible then they should not be reset.
> >>> The GuC will attempt to pre-empt all shared domain engines prior to
> >>> triggering any resets. If they are being killed then something is
> >>> broken and needs to be fixed.
> >>>
> >>>> - I think the HANG sections can fail on PVC too (VLK-57725)
> >>>> - This is racy: if the resets occur when all the 'bad' exec queues
> >>>>     are running, the test will still work with cross CCS / RCS resets
> >>> Can this description be added to the test?
> >>>
> >>>> As the author of this test, I am fine with the compute class just
> >>>> being skipped if the HANG flag is set. It is not testing individual
> >>>> engine instance resets (we had tests for that but they might be
> >>>> temporarily removed); rather, it falls into the class of tests which
> >>>> try to do a bunch of things in parallel to stress the KMD.
> >>> Note that some platforms don't have RCS or media engines. Which
> >>> means only running on BCS engines. Is that sufficient coverage?
> >> If you look at this patch, we skip compute only if rcs is present,
> >> otherwise not. So far I don’t see failure when 2 compute instances
> >> happen to enter this race.
> >>
> > I can modify the test to also skip in the case of 2 ccs and no rcs with
> > HANG tests, even though the issue is not seen anywhere else, if the
> > change needs to be generic.
> My point is not to make random and unexplained changes to the test but to
> understand why the test is failing the way it is. So far, the explanation does
> not make sense.
> 
> See above about pre-emptible workloads should not be killed. LNL/BMG do
> not have the RCS/CCS workaround of DG2 that prevents pre-emptions and
> context switches while the other side is busy. So I am not seeing a reason why
> the test is failing. That needs to be explained before simply making it skip on
> those platforms.

Hi @Harrison, John C, 

As far as I can see, the workload batches submitted are non-preemptible, including the spinner that actually creates the hang in the test. In that case, per your explanation, what should the test behaviour be? Should we skip the compute engines as in the V3 I sent, or does something need to be fixed in KMD/GuC?

Also, which DG2 workaround are you referring to? Could you please share the WA number or some reference? As far as I understand, we don’t mark the workload as preemptible, so it should not matter whether the WA is in effect or not, correct?

FYI @Brost, Matthew

Thanks,
Tejas 
> 
> John.
> 
> >
> >> Thanks,
> >> Tejas
> >>
> >>> And if this is not meant to be a reset test, why does it test resets
> >>> at all? If the concern is that we don't have a stress test involving
> >>> resets and this is the only coverage then it seems like we should
> >>> not be crippling it.
> >>> John.
> >>>
> >>>
> >>>> Matt
> >>>>
> >>>>> As noted, PVC has multiple compute engines but no RCS engine. If
> >>>>> any compute engine is reset then all are reset. So if the test is
> >>>>> running correctly and passing on PVC then it cannot be failing on
> >>>>> LNL/BMG purely due to shared domain resets.
> >>>>>
> >>>>> Is the reset not happening on PVC? Is the test not actually
> >>>>> running multiple contexts in parallel on PVC? Or are the spinners
> >>>>> pre-emptible and are therefore supposed to survive the reset of a
> >>>>> shared domain engine by being swapped out first? In which case
> >>>>> LNL/BMG are broken because the killed contexts are not supposed to
> >>>>> be killed even though the engine is reset?
> >>>>> John.
> >>>>>
> >>>>>> Thanks,
> >>>>>> Tejas
> >>>>>>> John.
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Tejas
> >>>>>>>>> Lucas De Marchi
> >>>>>>>>>
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (flags & HANG)
> >>>>>>>>>>> +		return true;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	return false;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>      /**
> >>>>>>>>>>>       * SUBTEST: threads-%s
> >>>>>>>>>>>       * Description: Run threads %arg[1] test with multi threads
> >>>>>>>>>>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>>>>>>>>>>      	bool go = false;
> >>>>>>>>>>>      	int n_threads = 0;
> >>>>>>>>>>>      	int gt;
> >>>>>>>>>>> +	bool has_rcs = false;
> >>>>>>>>>>>
> >>>>>>>>>>> -	xe_for_each_engine(fd, hwe)
> >>>>>>>>>>> +	xe_for_each_engine(fd, hwe) {
> >>>>>>>>>>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >>>>>>>>>>> +			has_rcs = true;
> >>>>>>>>>>>      		++n_engines;
> >>>>>>>>>>> +	}
> >>>>>>>>>>>
> >>>>>>>>>>>      	if (flags & BALANCER) {
> >>>>>>>>>>>      		xe_for_each_gt(fd, gt)
> >>>>>>>>>>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>>>>>>>>>>      	}
> >>>>>>>>>>>
> >>>>>>>>>>>      	xe_for_each_engine(fd, hwe) {
> >>>>>>>>>>> +		/* RCS/CCS sharing reset domain hence dependent engines.
> >>>>>>>>>>> +		 * When CCS is doing reset, all the contexts of RCS are
> >>>>>>>>>>> +		 * victimized, so skip the compute engine avoiding
> >>>>>>>>>>> +		 * parallel execution with RCS
> >>>>>>>>>>> +		 */
> >>>>>>>>>>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
> >>>>>>>>>>> +		    is_engine_contexts_victimized(fd, flags))
> >>>>>>>>>>> +			continue;
> >>>>>>>>>>> +
> >>>>>>>>>>>      		threads_data[i].mutex = &mutex;
> >>>>>>>>>>>      		threads_data[i].cond = &cond;
> >>>>>>>>>>>      #define ADDRESS_SHIFT	39
> >>>>>>>>>>> --
> >>>>>>>>>>> 2.25.1
> >>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Matt Roper
> >>>>>>>>>> Graphics Software Engineer
> >>>>>>>>>> Linux GPU Platform Enablement Intel Corporation


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-04-23 13:06 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2024-04-02 12:22 [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware Tejas Upadhyay
2024-04-02 12:15 ` ✗ CI.Patch_applied: failure for " Patchwork
2024-04-02 15:42 ` ✓ Fi.CI.BAT: success " Patchwork
2024-04-02 16:36 ` ✓ CI.xeBAT: " Patchwork
2024-04-02 19:40 ` [PATCH V2 i-g-t] " Matt Roper
2024-04-02 20:55   ` Lucas De Marchi
2024-04-03  5:35     ` Upadhyay, Tejas
2024-04-04 23:22       ` John Harrison
2024-04-04 23:45         ` John Harrison
2024-04-05  4:42           ` Upadhyay, Tejas
2024-04-05  4:47         ` Upadhyay, Tejas
2024-04-05 18:15           ` John Harrison
2024-04-05 23:33             ` Matthew Brost
2024-04-05 23:42               ` John Harrison
2024-04-08  5:23                 ` Upadhyay, Tejas
2024-04-08 12:00                   ` Upadhyay, Tejas
2024-04-10 19:22                     ` John Harrison
2024-04-11  5:12                       ` Upadhyay, Tejas
2024-04-11  5:37                         ` Upadhyay, Tejas
2024-04-23 13:06                       ` Upadhyay, Tejas
2024-04-03  0:30 ` ✗ Fi.CI.IGT: failure for " Patchwork
