All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Disable automatic load CCS load balancing
@ 2024-02-20 14:35 Andi Shyti
  2024-02-20 14:35 ` [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison,
	Tvrtko Ursulin, stable, Andi Shyti, Andi Shyti

Hi,

this series does basically two things:

1. Disables automatic load balancing as adviced by the hardware
   workaround.

2. Assigns all the CCS slices to one single user engine. The user
   will then be able to query only one CCS engine

Changelog
=========
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI
  engine list and adapt the engine counting accordingly (thanks
  Tvrtko).
- Reword the commit of Patch 2 (thanks John).

Andi Shyti (2):
  drm/i915/gt: Disable HW load balancing for CCS
  drm/i915/gt: Enable only one CCS for compute workload

 drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +++++++++
 drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++++
 drivers/gpu/drm/i915/i915_query.c           |  1 +
 5 files changed, 30 insertions(+)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS
  2024-02-20 14:35 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
@ 2024-02-20 14:35 ` Andi Shyti
  2024-02-20 23:26   ` Matt Roper
  2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison,
	Tvrtko Ursulin, stable, Andi Shyti, Andi Shyti

The hardware should not dynamically balance the load between CCS
engines. Wa_14019159160 recommends disabling it across all
platforms.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     | 1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 50962cfd1353..cf709f6c05ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1478,6 +1478,7 @@
 
 #define GEN12_RCU_MODE				_MMIO(0x14800)
 #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
+#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)
 
 #define CHV_FUSE_GT				_MMIO(VLV_GUNIT_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0			(1 << 10)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index d67d44611c28..9126b37186fc 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
 		wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1,
 				 GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE);
 	}
+
+	/*
+	 * Wa_14019159160: disable the CCS load balancing
+	 * indiscriminately for all the platforms
+	 */
+	wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
 }
 
 static void
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 14:35 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
  2024-02-20 14:35 ` [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-02-20 14:35 ` Andi Shyti
  2024-02-20 14:42   ` Andi Shyti
                     ` (2 more replies)
  2024-02-20 15:45 ` ✗ Fi.CI.CHECKPATCH: warning for Disable automatic load CCS load balancing (rev3) Patchwork
  2024-02-20 16:06 ` ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 3 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:35 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison,
	Tvrtko Ursulin, stable, Andi Shyti, Andi Shyti

Enable only one CCS engine by default with all the compute sices
allocated to it.

While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance.

This change can be tested with igt i915_query.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v6.2+
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +++++++++
 drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  2 ++
 drivers/gpu/drm/i915/i915_query.c           |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 		if (engine->uabi_class == I915_NO_UABI_CLASS)
 			continue;
 
+		/*
+		 * Do not list and do not count CCS engines other than the first
+		 */
+		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+		    engine->uabi_instance > 0) {
+			i915->engine_uabi_class_count[engine->uabi_class]--;
+			continue;
+		}
+
 		rb_link_node(&engine->uabi_node, prev, p);
 		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e19df4ef47f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
 	}
 }
 
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+	if (!IS_DG2(gt->i915))
+		return;
+
+	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
+
 int intel_gt_init_hw(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
 
 	intel_gt_init_swizzling(gt);
 
+	/* Configure CCS mode */
+	intel_gt_apply_ccs_mode(gt);
+
 	/*
 	 * At least 830 can leave some of the unused rings
 	 * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..c148113770ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1605,6 +1605,8 @@
 #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
 #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
 
+#define XEHP_CCS_MODE                          _MMIO(0x14804)
+
 #define GEN11_GT_INTR_DW(x)			_MMIO(0x190018 + ((x) * 4))
 #define   GEN11_CSME				(31)
 #define   GEN12_HECI_2				(30)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 3baa2f54a86e..d5a5143971f5 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
 	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
 }
 
+
 static int
 query_engine_info(struct drm_i915_private *i915,
 		  struct drm_i915_query_item *query_item)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
@ 2024-02-20 14:42   ` Andi Shyti
  2024-02-20 14:48   ` Tvrtko Ursulin
  2024-02-20 23:39   ` Matt Roper
  2 siblings, 0 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:42 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

Hi,

[...]

> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index 3baa2f54a86e..d5a5143971f5 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
>  	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
>  }
>  
> +

sorry, this is a leftover from the cleanup.

Andi

>  static int
>  query_engine_info(struct drm_i915_private *i915,
>  		  struct drm_i915_query_item *query_item)
> -- 
> 2.43.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
  2024-02-20 14:42   ` Andi Shyti
@ 2024-02-20 14:48   ` Tvrtko Ursulin
  2024-02-21  0:14     ` Andi Shyti
  2024-02-20 23:39   ` Matt Roper
  2 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2024-02-20 14:48 UTC (permalink / raw)
  To: Andi Shyti, intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison, stable,
	Andi Shyti


On 20/02/2024 14:35, Andi Shyti wrote:
> Enable only one CCS engine by default with all the compute sices

slices

> allocated to it.
> 
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
> 
> This change can be tested with igt i915_query.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +++++++++
>   drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
>   drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  2 ++
>   drivers/gpu/drm/i915/i915_query.c           |  1 +
>   4 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..7041acc77810 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>   		if (engine->uabi_class == I915_NO_UABI_CLASS)
>   			continue;
>   
> +		/*
> +		 * Do not list and do not count CCS engines other than the first
> +		 */
> +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> +		    engine->uabi_instance > 0) {
> +			i915->engine_uabi_class_count[engine->uabi_class]--;
> +			continue;
> +		}

It's a bit ugly to decrement after increment, instead of somehow 
restructuring the loop to satisfy both cases more elegantly. And I 
wonder if internally (in dmesg when engine name is logged) we don't end 
up with ccs0 ccs0 ccs0 ccs0.. for all instances.

> +
>   		rb_link_node(&engine->uabi_node, prev, p);
>   		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index a425db5ed3a2..e19df4ef47f6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
>   	}
>   }
>   
> +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> +	if (!IS_DG2(gt->i915))
> +		return;
> +
> +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> +}
> +
>   int intel_gt_init_hw(struct intel_gt *gt)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
> @@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
>   
>   	intel_gt_init_swizzling(gt);
>   
> +	/* Configure CCS mode */
> +	intel_gt_apply_ccs_mode(gt);
> +
>   	/*
>   	 * At least 830 can leave some of the unused rings
>   	 * "active" (ie. head != tail) after resume which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cf709f6c05ae..c148113770ea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1605,6 +1605,8 @@
>   #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
>   #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
>   
> +#define XEHP_CCS_MODE                          _MMIO(0x14804)
> +
>   #define GEN11_GT_INTR_DW(x)			_MMIO(0x190018 + ((x) * 4))
>   #define   GEN11_CSME				(31)
>   #define   GEN12_HECI_2				(30)
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index 3baa2f54a86e..d5a5143971f5 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
>   	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
>   }
>   
> +

Zap please.

>   static int
>   query_engine_info(struct drm_i915_private *i915,
>   		  struct drm_i915_query_item *query_item)

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for Disable automatic load CCS load balancing (rev3)
  2024-02-20 14:35 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
  2024-02-20 14:35 ` [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
  2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
@ 2024-02-20 15:45 ` Patchwork
  2024-02-20 16:06 ` ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2024-02-20 15:45 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx

== Series Details ==

Series: Disable automatic load CCS load balancing (rev3)
URL   : https://patchwork.freedesktop.org/series/129951/
State : warning

== Summary ==

Error: dim checkpatch failed
c2b6f7fef25d drm/i915/gt: Disable HW load balancing for CCS
63fd178cadec drm/i915/gt: Enable only one CCS for compute workload
-:92: CHECK:LINE_SPACING: Please don't use multiple blank lines
#92: FILE: drivers/gpu/drm/i915/i915_query.c:127:
 
+

total: 0 errors, 0 warnings, 1 checks, 53 lines checked



^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.BAT: failure for Disable automatic load CCS load balancing (rev3)
  2024-02-20 14:35 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
                   ` (2 preceding siblings ...)
  2024-02-20 15:45 ` ✗ Fi.CI.CHECKPATCH: warning for Disable automatic load CCS load balancing (rev3) Patchwork
@ 2024-02-20 16:06 ` Patchwork
  3 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2024-02-20 16:06 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 12546 bytes --]

== Series Details ==

Series: Disable automatic load CCS load balancing (rev3)
URL   : https://patchwork.freedesktop.org/series/129951/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_14300 -> Patchwork_129951v3
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_129951v3 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_129951v3, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/index.html

Participating hosts (40 -> 40)
------------------------------

  Additional (1): bat-arls-3 
  Missing    (1): fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_129951v3:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@load:
    - fi-bsw-n3050:       [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-bsw-n3050/igt@i915_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-bsw-n3050/igt@i915_module_load@load.html
    - fi-skl-6600u:       [PASS][3] -> [ABORT][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-skl-6600u/igt@i915_module_load@load.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-skl-6600u/igt@i915_module_load@load.html
    - fi-apl-guc:         NOTRUN -> [ABORT][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-apl-guc/igt@i915_module_load@load.html
    - fi-glk-j4005:       [PASS][6] -> [ABORT][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-glk-j4005/igt@i915_module_load@load.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-glk-j4005/igt@i915_module_load@load.html
    - fi-skl-guc:         [PASS][8] -> [ABORT][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-skl-guc/igt@i915_module_load@load.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-skl-guc/igt@i915_module_load@load.html
    - fi-kbl-7567u:       [PASS][10] -> [ABORT][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-kbl-7567u/igt@i915_module_load@load.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-kbl-7567u/igt@i915_module_load@load.html
    - fi-cfl-8700k:       [PASS][12] -> [ABORT][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-cfl-8700k/igt@i915_module_load@load.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-cfl-8700k/igt@i915_module_load@load.html
    - fi-bsw-nick:        [PASS][14] -> [ABORT][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-bsw-nick/igt@i915_module_load@load.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-bsw-nick/igt@i915_module_load@load.html
    - bat-kbl-2:          [PASS][16] -> [ABORT][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-kbl-2/igt@i915_module_load@load.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-kbl-2/igt@i915_module_load@load.html
    - fi-cfl-guc:         [PASS][18] -> [ABORT][19]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-cfl-guc/igt@i915_module_load@load.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-cfl-guc/igt@i915_module_load@load.html
    - fi-kbl-x1275:       [PASS][20] -> [ABORT][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-kbl-x1275/igt@i915_module_load@load.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-kbl-x1275/igt@i915_module_load@load.html
    - fi-cfl-8109u:       [PASS][22] -> [ABORT][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-cfl-8109u/igt@i915_module_load@load.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-cfl-8109u/igt@i915_module_load@load.html
    - fi-ivb-3770:        [PASS][24] -> [ABORT][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-ivb-3770/igt@i915_module_load@load.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-ivb-3770/igt@i915_module_load@load.html
    - fi-kbl-guc:         [PASS][26] -> [ABORT][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-kbl-guc/igt@i915_module_load@load.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-kbl-guc/igt@i915_module_load@load.html

  * igt@i915_selftest@live@client:
    - fi-elk-e7500:       [PASS][28] -> [DMESG-WARN][29] +42 other tests dmesg-warn
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-elk-e7500/igt@i915_selftest@live@client.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-elk-e7500/igt@i915_selftest@live@client.html

  * igt@i915_selftest@live@coherency:
    - bat-jsl-3:          [PASS][30] -> [DMESG-WARN][31] +40 other tests dmesg-warn
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-jsl-3/igt@i915_selftest@live@coherency.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-jsl-3/igt@i915_selftest@live@coherency.html

  * igt@i915_selftest@live@gt_contexts:
    - fi-ilk-650:         [PASS][32] -> [DMESG-WARN][33] +42 other tests dmesg-warn
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-ilk-650/igt@i915_selftest@live@gt_contexts.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-ilk-650/igt@i915_selftest@live@gt_contexts.html

  * igt@i915_selftest@live@gt_pm:
    - bat-jsl-1:          [PASS][34] -> [DMESG-WARN][35] +41 other tests dmesg-warn
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-jsl-1/igt@i915_selftest@live@gt_pm.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-jsl-1/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
    - bat-atsm-1:         [PASS][36] -> [DMESG-FAIL][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-atsm-1/igt@i915_selftest@live@hangcheck.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-atsm-1/igt@i915_selftest@live@hangcheck.html
    - bat-dg2-9:          [PASS][38] -> [DMESG-FAIL][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-dg2-9/igt@i915_selftest@live@hangcheck.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-dg2-9/igt@i915_selftest@live@hangcheck.html
    - bat-dg2-8:          [PASS][40] -> [DMESG-FAIL][41]
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-dg2-8/igt@i915_selftest@live@hangcheck.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-dg2-8/igt@i915_selftest@live@hangcheck.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@gt_mocs:
    - {bat-arls-2}:       NOTRUN -> [INCOMPLETE][42]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-arls-2/igt@i915_selftest@live@gt_mocs.html

  * igt@i915_selftest@live@hangcheck:
    - {bat-dg2-14}:       [PASS][43] -> [DMESG-FAIL][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-dg2-14/igt@i915_selftest@live@hangcheck.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-dg2-14/igt@i915_selftest@live@hangcheck.html

  
Known issues
------------

  Here are the changes found in Patchwork_129951v3 that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - fi-tgl-1115g4:      [PASS][45] -> [FAIL][46] ([i915#8293])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-tgl-1115g4/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-tgl-1115g4/boot.html

  
#### Possible fixes ####

  * boot:
    - fi-apl-guc:         [FAIL][47] ([i915#8293]) -> [PASS][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/fi-apl-guc/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/fi-apl-guc/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@i915_module_load@reload:
    - bat-jsl-3:          [PASS][49] -> [DMESG-WARN][50] ([i915#1982])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-jsl-3/igt@i915_module_load@reload.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-jsl-3/igt@i915_module_load@reload.html

  * igt@kms_pipe_crc_basic@nonblocking-crc:
    - bat-dg2-11:         NOTRUN -> [SKIP][51] ([i915#9197])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-dg2-11/igt@kms_pipe_crc_basic@nonblocking-crc.html

  
#### Possible fixes ####

  * igt@gem_busy@busy@all-engines:
    - {bat-arls-2}:       [INCOMPLETE][52] ([i915#10194]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-arls-2/igt@gem_busy@busy@all-engines.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-arls-2/igt@gem_busy@busy@all-engines.html

  * igt@gem_exec_create@basic@smem:
    - {bat-arls-1}:       [DMESG-WARN][54] ([i915#10194]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_14300/bat-arls-1/igt@gem_exec_create@basic@smem.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/bat-arls-1/igt@gem_exec_create@basic@smem.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#10194]: https://gitlab.freedesktop.org/drm/intel/issues/10194
  [i915#10196]: https://gitlab.freedesktop.org/drm/intel/issues/10196
  [i915#10197]: https://gitlab.freedesktop.org/drm/intel/issues/10197
  [i915#10200]: https://gitlab.freedesktop.org/drm/intel/issues/10200
  [i915#10202]: https://gitlab.freedesktop.org/drm/intel/issues/10202
  [i915#10206]: https://gitlab.freedesktop.org/drm/intel/issues/10206
  [i915#10207]: https://gitlab.freedesktop.org/drm/intel/issues/10207
  [i915#10208]: https://gitlab.freedesktop.org/drm/intel/issues/10208
  [i915#10209]: https://gitlab.freedesktop.org/drm/intel/issues/10209
  [i915#10211]: https://gitlab.freedesktop.org/drm/intel/issues/10211
  [i915#10212]: https://gitlab.freedesktop.org/drm/intel/issues/10212
  [i915#10213]: https://gitlab.freedesktop.org/drm/intel/issues/10213
  [i915#10214]: https://gitlab.freedesktop.org/drm/intel/issues/10214
  [i915#10215]: https://gitlab.freedesktop.org/drm/intel/issues/10215
  [i915#10216]: https://gitlab.freedesktop.org/drm/intel/issues/10216
  [i915#10274]: https://gitlab.freedesktop.org/drm/intel/issues/10274
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#8293]: https://gitlab.freedesktop.org/drm/intel/issues/8293
  [i915#8809]: https://gitlab.freedesktop.org/drm/intel/issues/8809
  [i915#9197]: https://gitlab.freedesktop.org/drm/intel/issues/9197
  [i915#9318]: https://gitlab.freedesktop.org/drm/intel/issues/9318
  [i915#9688]: https://gitlab.freedesktop.org/drm/intel/issues/9688
  [i915#9732]: https://gitlab.freedesktop.org/drm/intel/issues/9732
  [i915#9812]: https://gitlab.freedesktop.org/drm/intel/issues/9812
  [i915#9886]: https://gitlab.freedesktop.org/drm/intel/issues/9886


Build changes
-------------

  * Linux: CI_DRM_14300 -> Patchwork_129951v3

  CI-20190529: 20190529
  CI_DRM_14300: e2b02e89746d8eff8c244f938eecd0f1db8eb805 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7718: 40e8b9122853f455c84afcfa56469a6bc9a0d564 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_129951v3: e2b02e89746d8eff8c244f938eecd0f1db8eb805 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

3f2d8335dd9c drm/i915/gt: Enable only one CCS for compute workload
48a58957fb31 drm/i915/gt: Disable HW load balancing for CCS

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129951v3/index.html

[-- Attachment #2: Type: text/html, Size: 12037 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS
  2024-02-20 14:35 ` [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
@ 2024-02-20 23:26   ` Matt Roper
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Roper @ 2024-02-20 23:26 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

On Tue, Feb 20, 2024 at 03:35:25PM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h     | 1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE				_MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE		REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE	REG_BIT(1)
>  
>  #define CHV_FUSE_GT				_MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0			(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..9126b37186fc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
>  		wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1,
>  				 GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE);
>  	}
> +
> +	/*
> +	 * Wa_14019159160: disable the CCS load balancing
> +	 * indiscriminately for all the platforms

The database's description of this workaround is a bit confusing since
it's been modified a few times, but if I'm reading it correctly it
doesn't sound like this is what it's asking us to do.  What I see says
that load balancing shouldn't be allowed specifically while the RCS is
active.  If the RCS is sitting idle, I believe you're free to use as
many CCS engines as you like, with load balancing still active.

We already have other workarounds that prevent different address spaces
from executing on the RCS/CCS engines at the same time, so the part
about "same address space" in the description should already be
satisfied.  It sounds like the issues now are if 2+ CCS's are in use and
something new shows up to run on the previously-idle RCS, or if
something's already running on the RCS and 1 CCS, and something new
shows up to run on an additional CCS.  The workaround details make it
sound like it's supposed to be the GuC's responsibility to prevent the
new work from getting scheduled onto the additional engine while we're
already in one of those two situations, so I don't see anything asking
us to change the hardware-level load balance enable/disable (actually
the spec specifically tells us *not* to do this).  Aren't we supposed to
be just setting a GuC workaround flag for this?


Matt

> +	 */
> +	wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
>  }
>  
>  static void
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
  2024-02-20 14:42   ` Andi Shyti
  2024-02-20 14:48   ` Tvrtko Ursulin
@ 2024-02-20 23:39   ` Matt Roper
  2024-02-21  0:12     ` Andi Shyti
  2 siblings, 1 reply; 22+ messages in thread
From: Matt Roper @ 2024-02-20 23:39 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> Enable only one CCS engine by default with all the compute sices
> allocated to it.
> 
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
> 
> This change can be tested with igt i915_query.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: <stable@vger.kernel.org> # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +++++++++
>  drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  2 ++
>  drivers/gpu/drm/i915/i915_query.c           |  1 +
>  4 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..7041acc77810 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>  		if (engine->uabi_class == I915_NO_UABI_CLASS)
>  			continue;
>  
> +		/*
> +		 * Do not list and do not count CCS engines other than the first
> +		 */
> +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> +		    engine->uabi_instance > 0) {
> +			i915->engine_uabi_class_count[engine->uabi_class]--;
> +			continue;
> +		}

Wouldn't it be simpler to just add a workaround to the end of
engine_mask_apply_compute_fuses() if we want to ensure only a single
compute engine gets exposed?  Then both the driver internals and uapi
will agree that's there's just one CCS (and on which one there is).

If we want to do something fancy with "hotplugging" a new engine later
on or whatever, that can be handled in the future series (although as
noted on the previous patch, it sounds like these changes might not
actually be aligned with the workaround we were trying to address).

> +
>  		rb_link_node(&engine->uabi_node, prev, p);
>  		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index a425db5ed3a2..e19df4ef47f6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
>  	}
>  }
>  
> +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> +	if (!IS_DG2(gt->i915))
> +		return;
> +
> +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);

This doesn't look right to me.  A value of 0 means every cslice gets
associated with CCS0.  On a DG2-G11 platform, that will flat out break
compute since CCS0 is never present (G11 only has a single CCS and it's
always the hardware's CCS1).  Even on a G10 or G12 this could also break
things depending on the fusing of your card if the hardware CCS0 happens
to be missing.

Also, the register says that we need a field value of 0x7 for each
cslice that's fused off.  By passing 0, we're telling the CCS engine
that it can use cslices that may not actually exist.

> +}
> +
>  int intel_gt_init_hw(struct intel_gt *gt)
>  {
>  	struct drm_i915_private *i915 = gt->i915;
> @@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
>  
>  	intel_gt_init_swizzling(gt);
>  
> +	/* Configure CCS mode */
> +	intel_gt_apply_ccs_mode(gt);
> +
>  	/*
>  	 * At least 830 can leave some of the unused rings
>  	 * "active" (ie. head != tail) after resume which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cf709f6c05ae..c148113770ea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1605,6 +1605,8 @@
>  #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
>  #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
>  
> +#define XEHP_CCS_MODE                          _MMIO(0x14804)

Nitpick:  this doesn't seem to be in the proper place and also breaks
the file's convention of using tabs to move over to column 48 for the
definition value.


Matt

> +
>  #define GEN11_GT_INTR_DW(x)			_MMIO(0x190018 + ((x) * 4))
>  #define   GEN11_CSME				(31)
>  #define   GEN12_HECI_2				(30)
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index 3baa2f54a86e..d5a5143971f5 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
>  	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
>  }
>  
> +
>  static int
>  query_engine_info(struct drm_i915_private *i915,
>  		  struct drm_i915_query_item *query_item)
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 23:39   ` Matt Roper
@ 2024-02-21  0:12     ` Andi Shyti
  2024-02-21 20:51       ` Matt Roper
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Shyti @ 2024-02-21  0:12 UTC (permalink / raw)
  To: Matt Roper
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

Hi Matt,

thanks a lot for looking into this.

On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:

[...]

> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 833987015b8b..7041acc77810 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> >  		if (engine->uabi_class == I915_NO_UABI_CLASS)
> >  			continue;
> >  
> > +		/*
> > +		 * Do not list and do not count CCS engines other than the first
> > +		 */
> > +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > +		    engine->uabi_instance > 0) {
> > +			i915->engine_uabi_class_count[engine->uabi_class]--;
> > +			continue;
> > +		}
> 
> Wouldn't it be simpler to just add a workaround to the end of
> engine_mask_apply_compute_fuses() if we want to ensure only a single
> compute engine gets exposed?  Then both the driver internals and uapi
> will agree that's there's just one CCS (and on which one there is).
> 
> If we want to do something fancy with "hotplugging" a new engine later
> on or whatever, that can be handled in the future series (although as
> noted on the previous patch, it sounds like these changes might not
> actually be aligned with the workaround we were trying to address).

The hotplugging capability is one of the features I was looking
for, actually.

I have done some more refactoring in this piece of code in
upcoming patches.

Will check, though, if I can do something with compute_fuses(),
even though, the other cslices are not really fused off (read
below).

> > +
> >  		rb_link_node(&engine->uabi_node, prev, p);
> >  		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index a425db5ed3a2..e19df4ef47f6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> >  	}
> >  }
> >  
> > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > +{
> > +	if (!IS_DG2(gt->i915))
> > +		return;
> > +
> > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> 
> This doesn't look right to me.  A value of 0 means every cslice gets
> associated with CCS0.

Yes, that's what I'm trying to do. The behavior I'm looking for
is this one:

	 /*
	  ...
          * With 1 engine (ccs0):
          *   slice 0, 1, 2, 3: ccs0
          *
          * With 2 engines (ccs0, ccs1):
          *   slice 0, 2: ccs0
          *   slice 1, 3: ccs1
          *
          * With 4 engines (ccs0, ccs1, ccs2, ccs3):
          *   slice 0: ccs0
          *   slice 1: ccs1
          *   slice 2: ccs2
          *   slice 3: ccs3
	  ...
	  */

where the user can configure runtime the mode, making sure that
no client is connected to i915.

But, this needs to be written 

As we are now forcing mode '1', then all cslices are connected
with ccs0.

> On a DG2-G11 platform, that will flat out break
> compute since CCS0 is never present (G11 only has a single CCS and it's
> always the hardware's CCS1).  Even on a G10 or G12 this could also break
> things depending on the fusing of your card if the hardware CCS0 happens
> to be missing.
> 
> Also, the register says that we need a field value of 0x7 for each
> cslice that's fused off.  By passing 0, we're telling the CCS engine
> that it can use cslices that may not actually exist.

does it? Or do I need to write the id (0x0-0x3) of the user
engine? That's how the mode is calculated in this algorithm.

> > +}
> > +

[...]

> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index cf709f6c05ae..c148113770ea 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1605,6 +1605,8 @@
> >  #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
> >  #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
> >  
> > +#define XEHP_CCS_MODE                          _MMIO(0x14804)
> 
> Nitpick:  this doesn't seem to be in the proper place and also breaks
> the file's convention of using tabs to move over to column 48 for the
> definition value.

This was something I actually forgot to fix. Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-20 14:48   ` Tvrtko Ursulin
@ 2024-02-21  0:14     ` Andi Shyti
  2024-02-21  8:19       ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Shyti @ 2024-02-21  0:14 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	Matt Roper, John Harrison, stable, Andi Shyti

Hi Tvrtko,

On Tue, Feb 20, 2024 at 02:48:31PM +0000, Tvrtko Ursulin wrote:
> On 20/02/2024 14:35, Andi Shyti wrote:
> > Enable only one CCS engine by default with all the compute sices
> 
> slices

Thanks!

> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 833987015b8b..7041acc77810 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> >   		if (engine->uabi_class == I915_NO_UABI_CLASS)
> >   			continue;
> > +		/*
> > +		 * Do not list and do not count CCS engines other than the first
> > +		 */
> > +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > +		    engine->uabi_instance > 0) {
> > +			i915->engine_uabi_class_count[engine->uabi_class]--;
> > +			continue;
> > +		}
> 
> It's a bit ugly to decrement after increment, instead of somehow
> restructuring the loop to satisfy both cases more elegantly.

yes, agree, indeed I had a hard time here to accept this change
myself.

But moving the check above where the counter was incremented it
would have been much uglier.

This check looks ugly everywhere you place it :-)

In any case, I'm working on a patch that is splitting this
function in two parts and there is some refactoring happening
here (for the first initialization and the dynamic update).

Please let me know if it's OK with you or you want me to fix it
in this run.

> And I wonder if
> internally (in dmesg when engine name is logged) we don't end up with ccs0
> ccs0 ccs0 ccs0.. for all instances.

I don't see this. Even in sysfs we see only one ccs. Where is it?

> > +
> >   		rb_link_node(&engine->uabi_node, prev, p);
> >   		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);

[...]

> > diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> > index 3baa2f54a86e..d5a5143971f5 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
> >   	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
> >   }
> > +
> 
> Zap please.

yes... yes... I noticed it after sending the patch :-)

Thanks,
Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21  0:14     ` Andi Shyti
@ 2024-02-21  8:19       ` Tvrtko Ursulin
  2024-02-21 11:19         ` Andi Shyti
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2024-02-21  8:19 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, stable, Andi Shyti


On 21/02/2024 00:14, Andi Shyti wrote:
> Hi Tvrtko,
> 
> On Tue, Feb 20, 2024 at 02:48:31PM +0000, Tvrtko Ursulin wrote:
>> On 20/02/2024 14:35, Andi Shyti wrote:
>>> Enable only one CCS engine by default with all the compute sices
>>
>> slices
> 
> Thanks!
> 
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>> index 833987015b8b..7041acc77810 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>>>    		if (engine->uabi_class == I915_NO_UABI_CLASS)
>>>    			continue;
>>> +		/*
>>> +		 * Do not list and do not count CCS engines other than the first
>>> +		 */
>>> +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
>>> +		    engine->uabi_instance > 0) {
>>> +			i915->engine_uabi_class_count[engine->uabi_class]--;
>>> +			continue;
>>> +		}
>>
>> It's a bit ugly to decrement after increment, instead of somehow
>> restructuring the loop to satisfy both cases more elegantly.
> 
> yes, agree, indeed I had a hard time here to accept this change
> myself.
> 
> But moving the check above where the counter was incremented it
> would have been much uglier.
> 
> This check looks ugly everywhere you place it :-)

One idea would be to introduce a separate local counter array for 
name_instance, so not use i915->engine_uabi_class_count[]. First one 
increments for every engine, second only for the exposed ones. That way 
feels wouldn't be too ugly.

> In any case, I'm working on a patch that is splitting this
> function in two parts and there is some refactoring happening
> here (for the first initialization and the dynamic update).
> 
> Please let me know if it's OK with you or you want me to fix it
> in this run.
> 
>> And I wonder if
>> internally (in dmesg when engine name is logged) we don't end up with ccs0
>> ccs0 ccs0 ccs0.. for all instances.
> 
> I don't see this. Even in sysfs we see only one ccs. Where is it?

When you run this patch on something with two or more ccs-es, the 
"renamed ccs... to ccs.." debug logs do not all log the new name as ccs0?

Regards,

Tvrtko

> 
>>> +
>>>    		rb_link_node(&engine->uabi_node, prev, p);
>>>    		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> 
> [...]
> 
>>> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
>>> index 3baa2f54a86e..d5a5143971f5 100644
>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>> @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
>>>    	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
>>>    }
>>> +
>>
>> Zap please.
> 
> yes... yes... I noticed it after sending the patch :-)
> 
> Thanks,
> Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21  8:19       ` Tvrtko Ursulin
@ 2024-02-21 11:19         ` Andi Shyti
  2024-02-21 12:08           ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Shyti @ 2024-02-21 11:19 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	Matt Roper, John Harrison, stable, Andi Shyti

Hi Tvrtko,

On Wed, Feb 21, 2024 at 08:19:34AM +0000, Tvrtko Ursulin wrote:
> On 21/02/2024 00:14, Andi Shyti wrote:
> > On Tue, Feb 20, 2024 at 02:48:31PM +0000, Tvrtko Ursulin wrote:
> > > On 20/02/2024 14:35, Andi Shyti wrote:
> > > > Enable only one CCS engine by default with all the compute sices
> > > 
> > > slices
> > 
> > Thanks!
> > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > > index 833987015b8b..7041acc77810 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> > > >    		if (engine->uabi_class == I915_NO_UABI_CLASS)
> > > >    			continue;
> > > > +		/*
> > > > +		 * Do not list and do not count CCS engines other than the first
> > > > +		 */
> > > > +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > > > +		    engine->uabi_instance > 0) {
> > > > +			i915->engine_uabi_class_count[engine->uabi_class]--;
> > > > +			continue;
> > > > +		}
> > > 
> > > It's a bit ugly to decrement after increment, instead of somehow
> > > restructuring the loop to satisfy both cases more elegantly.
> > 
> > yes, agree, indeed I had a hard time here to accept this change
> > myself.
> > 
> > But moving the check above where the counter was incremented it
> > would have been much uglier.
> > 
> > This check looks ugly everywhere you place it :-)
> 
> One idea would be to introduce a separate local counter array for
> name_instance, so not use i915->engine_uabi_class_count[]. First one
> increments for every engine, second only for the exposed ones. That way
> feels wouldn't be too ugly.

Ah... you mean that whenever we change the CCS mode, we update
the indexes of the exposed engines from list of the real engines.
Will try.

My approach was to regenerate the list everytime the CCS mode was
changed, but your suggestion looks a bit simplier.

> > In any case, I'm working on a patch that is splitting this
> > function in two parts and there is some refactoring happening
> > here (for the first initialization and the dynamic update).
> > 
> > Please let me know if it's OK with you or you want me to fix it
> > in this run.
> > 
> > > And I wonder if
> > > internally (in dmesg when engine name is logged) we don't end up with ccs0
> > > ccs0 ccs0 ccs0.. for all instances.
> > 
> > I don't see this. Even in sysfs we see only one ccs. Where is it?
> 
> When you run this patch on something with two or more ccs-es, the "renamed
> ccs... to ccs.." debug logs do not all log the new name as ccs0?

it shouldn't, because the name_instance is anyway incremented
normally... anyway, I will test it.

Thanks a lot!
Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21 11:19         ` Andi Shyti
@ 2024-02-21 12:08           ` Tvrtko Ursulin
  2024-02-21 12:11             ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2024-02-21 12:08 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, stable, Andi Shyti


On 21/02/2024 11:19, Andi Shyti wrote:
> Hi Tvrtko,
> 
> On Wed, Feb 21, 2024 at 08:19:34AM +0000, Tvrtko Ursulin wrote:
>> On 21/02/2024 00:14, Andi Shyti wrote:
>>> On Tue, Feb 20, 2024 at 02:48:31PM +0000, Tvrtko Ursulin wrote:
>>>> On 20/02/2024 14:35, Andi Shyti wrote:
>>>>> Enable only one CCS engine by default with all the compute sices
>>>>
>>>> slices
>>>
>>> Thanks!
>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>> index 833987015b8b..7041acc77810 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
>>>>>     		if (engine->uabi_class == I915_NO_UABI_CLASS)
>>>>>     			continue;
>>>>> +		/*
>>>>> +		 * Do not list and do not count CCS engines other than the first
>>>>> +		 */
>>>>> +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
>>>>> +		    engine->uabi_instance > 0) {
>>>>> +			i915->engine_uabi_class_count[engine->uabi_class]--;
>>>>> +			continue;
>>>>> +		}
>>>>
>>>> It's a bit ugly to decrement after increment, instead of somehow
>>>> restructuring the loop to satisfy both cases more elegantly.
>>>
>>> yes, agree, indeed I had a hard time here to accept this change
>>> myself.
>>>
>>> But moving the check above where the counter was incremented it
>>> would have been much uglier.
>>>
>>> This check looks ugly everywhere you place it :-)
>>
>> One idea would be to introduce a separate local counter array for
>> name_instance, so not use i915->engine_uabi_class_count[]. First one
>> increments for every engine, second only for the exposed ones. That way
>> feels wouldn't be too ugly.
> 
> Ah... you mean that whenever we change the CCS mode, we update
> the indexes of the exposed engines from list of the real engines.
> Will try.
> 
> My approach was to regenerate the list everytime the CCS mode was
> changed, but your suggestion looks a bit simplier.

No, I meant just for this first stage of permanently single engine. For avoiding the decrement after increment. Something like this, but not compile tested even:

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..4c33f30612c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,8 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
  
  void intel_engines_driver_register(struct drm_i915_private *i915)
  {
-       u16 name_instance, other_instance = 0;
+       u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
+       u16 uabi_class, other_instance = 0;
         struct legacy_ring ring = {};
         struct list_head *it, *next;
         struct rb_node **p, *prev;
@@ -222,15 +223,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
  
                 GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
                 engine->uabi_class = uabi_classes[engine->class];
+
                 if (engine->uabi_class == I915_NO_UABI_CLASS) {
-                       name_instance = other_instance++;
-               } else {
-                       GEM_BUG_ON(engine->uabi_class >=
-                                  ARRAY_SIZE(i915->engine_uabi_class_count));
-                       name_instance =
-                               i915->engine_uabi_class_count[engine->uabi_class]++;
-               }
-               engine->uabi_instance = name_instance;
+                       uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+               else
+                       uabi_class = engine->uabi_class;
+
+               GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+               engine->uabi_instance = class_instance[uabi_class]++;
  
                 /*
                  * Replace the internal name with the final user and log facing
@@ -238,11 +238,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
                  */
                 engine_rename(engine,
                               intel_engine_class_repr(engine->class),
-                             name_instance);
+                             engine->uabi_instance);
  
-               if (engine->uabi_class == I915_NO_UABI_CLASS)
+               if (uabi_class == I915_NO_UABI_CLASS)
                         continue;
  
+               GEM_BUG_ON(uabi_class >=
+                          ARRAY_SIZE(i915->engine_uabi_class_count));
+               i915->engine_uabi_class_count[uabi_class]++;
+
                 rb_link_node(&engine->uabi_node, prev, p);
                 rb_insert_color(&engine->uabi_node, &i915->uabi_engines);


>>> In any case, I'm working on a patch that is splitting this
>>> function in two parts and there is some refactoring happening
>>> here (for the first initialization and the dynamic update).
>>>
>>> Please let me know if it's OK with you or you want me to fix it
>>> in this run.
>>>
>>>> And I wonder if
>>>> internally (in dmesg when engine name is logged) we don't end up with ccs0
>>>> ccs0 ccs0 ccs0.. for all instances.
>>>
>>> I don't see this. Even in sysfs we see only one ccs. Where is it?
>>
>> When you run this patch on something with two or more ccs-es, the "renamed
>> ccs... to ccs.." debug logs do not all log the new name as ccs0?
> 
> it shouldn't, because the name_instance is anyway incremented
> normally... anyway, I will test it.

Hm maybe it needs more than two ccs engines and then it would be ccs0, ccs1, ccs2, ccs2, on a four ccs part. Or something.. It just feels the decrement of i915->engine_uabi_class_count, which engine_instance currently uses, has to mess this up somehow.

Regards,

Tvrtko

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21 12:08           ` Tvrtko Ursulin
@ 2024-02-21 12:11             ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2024-02-21 12:11 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, stable, Andi Shyti



On 21/02/2024 12:08, Tvrtko Ursulin wrote:
> 
> On 21/02/2024 11:19, Andi Shyti wrote:
>> Hi Tvrtko,
>>
>> On Wed, Feb 21, 2024 at 08:19:34AM +0000, Tvrtko Ursulin wrote:
>>> On 21/02/2024 00:14, Andi Shyti wrote:
>>>> On Tue, Feb 20, 2024 at 02:48:31PM +0000, Tvrtko Ursulin wrote:
>>>>> On 20/02/2024 14:35, Andi Shyti wrote:
>>>>>> Enable only one CCS engine by default with all the compute sices
>>>>>
>>>>> slices
>>>>
>>>> Thanks!
>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>>> index 833987015b8b..7041acc77810 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
>>>>>> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct 
>>>>>> drm_i915_private *i915)
>>>>>>             if (engine->uabi_class == I915_NO_UABI_CLASS)
>>>>>>                 continue;
>>>>>> +        /*
>>>>>> +         * Do not list and do not count CCS engines other than 
>>>>>> the first
>>>>>> +         */
>>>>>> +        if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
>>>>>> +            engine->uabi_instance > 0) {
>>>>>> +            i915->engine_uabi_class_count[engine->uabi_class]--;
>>>>>> +            continue;
>>>>>> +        }
>>>>>
>>>>> It's a bit ugly to decrement after increment, instead of somehow
>>>>> restructuring the loop to satisfy both cases more elegantly.
>>>>
>>>> yes, agree, indeed I had a hard time here to accept this change
>>>> myself.
>>>>
>>>> But moving the check above where the counter was incremented it
>>>> would have been much uglier.
>>>>
>>>> This check looks ugly everywhere you place it :-)
>>>
>>> One idea would be to introduce a separate local counter array for
>>> name_instance, so not use i915->engine_uabi_class_count[]. First one
>>> increments for every engine, second only for the exposed ones. That way
>>> feels wouldn't be too ugly.
>>
>> Ah... you mean that whenever we change the CCS mode, we update
>> the indexes of the exposed engines from list of the real engines.
>> Will try.
>>
>> My approach was to regenerate the list everytime the CCS mode was
>> changed, but your suggestion looks a bit simplier.
> 
> No, I meant just for this first stage of permanently single engine. For 
> avoiding the decrement after increment. Something like this, but not 
> compile tested even:
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..4c33f30612c4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -203,7 +203,8 @@ static void engine_rename(struct intel_engine_cs 
> *engine, const char *name, u16
> 
>   void intel_engines_driver_register(struct drm_i915_private *i915)
>   {
> -       u16 name_instance, other_instance = 0;
> +       u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
> +       u16 uabi_class, other_instance = 0;
>          struct legacy_ring ring = {};
>          struct list_head *it, *next;
>          struct rb_node **p, *prev;
> @@ -222,15 +223,14 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
> 
>                  GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
>                  engine->uabi_class = uabi_classes[engine->class];
> +
>                  if (engine->uabi_class == I915_NO_UABI_CLASS) {
> -                       name_instance = other_instance++;
> -               } else {
> -                       GEM_BUG_ON(engine->uabi_class >=
> -                                  
> ARRAY_SIZE(i915->engine_uabi_class_count));
> -                       name_instance =
> -                               
> i915->engine_uabi_class_count[engine->uabi_class]++;
> -               }
> -               engine->uabi_instance = name_instance;
> +                       uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
> +               else
> +                       uabi_class = engine->uabi_class;
> +
> +               GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
> +               engine->uabi_instance = class_instance[uabi_class]++;
> 
>                  /*
>                   * Replace the internal name with the final user and 
> log facing
> @@ -238,11 +238,15 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>                   */
>                  engine_rename(engine,
>                                intel_engine_class_repr(engine->class),
> -                             name_instance);
> +                             engine->uabi_instance);
> 
> -               if (engine->uabi_class == I915_NO_UABI_CLASS)
> +               if (uabi_class == I915_NO_UABI_CLASS)
>                          continue;

Here you just add the ccs skip condition.

Anyway.. I rushed it a bit so see what you think.

Regards,

Tvrtko

> 
> +               GEM_BUG_ON(uabi_class >=
> +                          ARRAY_SIZE(i915->engine_uabi_class_count));
> +               i915->engine_uabi_class_count[uabi_class]++;
> +
>                  rb_link_node(&engine->uabi_node, prev, p);
>                  rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> 
> 
>>>> In any case, I'm working on a patch that is splitting this
>>>> function in two parts and there is some refactoring happening
>>>> here (for the first initialization and the dynamic update).
>>>>
>>>> Please let me know if it's OK with you or you want me to fix it
>>>> in this run.
>>>>
>>>>> And I wonder if
>>>>> internally (in dmesg when engine name is logged) we don't end up 
>>>>> with ccs0
>>>>> ccs0 ccs0 ccs0.. for all instances.
>>>>
>>>> I don't see this. Even in sysfs we see only one ccs. Where is it?
>>>
>>> When you run this patch on something with two or more ccs-es, the 
>>> "renamed
>>> ccs... to ccs.." debug logs do not all log the new name as ccs0?
>>
>> it shouldn't, because the name_instance is anyway incremented
>> normally... anyway, I will test it.
> 
> Hm maybe it needs more than two ccs engines and then it would be ccs0, 
> ccs1, ccs2, ccs2, on a four ccs part. Or something.. It just feels the 
> decrement of i915->engine_uabi_class_count, which engine_instance 
> currently uses, has to mess this up somehow.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21  0:12     ` Andi Shyti
@ 2024-02-21 20:51       ` Matt Roper
  2024-02-22 22:03         ` Andi Shyti
  0 siblings, 1 reply; 22+ messages in thread
From: Matt Roper @ 2024-02-21 20:51 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> thanks a lot for looking into this.
> 
> On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> 
> [...]
> 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > index 833987015b8b..7041acc77810 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> > >  		if (engine->uabi_class == I915_NO_UABI_CLASS)
> > >  			continue;
> > >  
> > > +		/*
> > > +		 * Do not list and do not count CCS engines other than the first
> > > +		 */
> > > +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > > +		    engine->uabi_instance > 0) {
> > > +			i915->engine_uabi_class_count[engine->uabi_class]--;
> > > +			continue;
> > > +		}
> > 
> > Wouldn't it be simpler to just add a workaround to the end of
> > engine_mask_apply_compute_fuses() if we want to ensure only a single
> > compute engine gets exposed?  Then both the driver internals and uapi
> > will agree that's there's just one CCS (and on which one there is).
> > 
> > If we want to do something fancy with "hotplugging" a new engine later
> > on or whatever, that can be handled in the future series (although as
> > noted on the previous patch, it sounds like these changes might not
> > actually be aligned with the workaround we were trying to address).
> 
> The hotplugging capability is one of the features I was looking
> for, actually.
> 
> I have done some more refactoring in this piece of code in
> upcoming patches.
> 
> Will check, though, if I can do something with compute_fuses(),
> even though, the other cslices are not really fused off (read
> below).
> 
> > > +
> > >  		rb_link_node(&engine->uabi_node, prev, p);
> > >  		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index a425db5ed3a2..e19df4ef47f6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > >  	}
> > >  }
> > >  
> > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > +{
> > > +	if (!IS_DG2(gt->i915))
> > > +		return;
> > > +
> > > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > 
> > This doesn't look right to me.  A value of 0 means every cslice gets
> > associated with CCS0.
> 
> Yes, that's what I'm trying to do. The behavior I'm looking for
> is this one:
> 
> 	 /*
> 	  ...
>           * With 1 engine (ccs0):
>           *   slice 0, 1, 2, 3: ccs0
>           *
>           * With 2 engines (ccs0, ccs1):
>           *   slice 0, 2: ccs0
>           *   slice 1, 3: ccs1
>           *
>           * With 4 engines (ccs0, ccs1, ccs2, ccs3):
>           *   slice 0: ccs0
>           *   slice 1: ccs1
>           *   slice 2: ccs2
>           *   slice 3: ccs3
> 	  ...
> 	  */
> 
> where the user can configure runtime the mode, making sure that
> no client is connected to i915.
> 
> But, this needs to be written 
> 
> As we are now forcing mode '1', then all cslices are connected
> with ccs0.

Right --- and that's what I'm pointing out as illegal.  I think that
code comment above was taken out of context from a different RFC series;
that's not an accurate description of the behavior we want here.

First, the above comment is using ccs# to refer to userspace engines,
not hardware engines.  As a simple example, DG2-G11 only ever has a
single CCS which userspace sees as "instance 0" but which is actually
CCS1 at the hardware level.  If you try to follow the comment above when
programming CCS_MODE, you've assigned all of the cslices to a
non-existent engine and assigned no cslices to the CCS engine that
actually exists.  For DG2-G10 (and I think DG2-G12), there are different
combinations of fused-off / not-fused-off engines that will always show
up in userspace as CCS0-CCSn, even if those don't match the hardware
IDs.

Second, the above comment is assuming that you have a part with a
maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
as an example, there's also only a single cslice (cslice1), so if you
tell CCS1 that it's allowed to use EUs from non-existent cslice0,
cslice2, and cslice3, you might not get the behavior you were hoping
for.

> 
> > On a DG2-G11 platform, that will flat out break
> > compute since CCS0 is never present (G11 only has a single CCS and it's
> > always the hardware's CCS1).  Even on a G10 or G12 this could also break
> > things depending on the fusing of your card if the hardware CCS0 happens
> > to be missing.
> > 
> > Also, the register says that we need a field value of 0x7 for each
> > cslice that's fused off.  By passing 0, we're telling the CCS engine
> > that it can use cslices that may not actually exist.
> 
> does it? Or do I need to write the id (0x0-0x3) of the user
> engine? That's how the mode is calculated in this algorithm.

0x0 - 0x3 are how you specify that a specific CCS engine can use the
cslice.  If the cslice can't be used at all (i.e., it's fused off), then
you need to program a 0x7 to ensure no engines try to use the
non-existent DSS/EUs.


Matt

> 
> > > +}
> > > +
> 
> [...]
> 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > index cf709f6c05ae..c148113770ea 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > @@ -1605,6 +1605,8 @@
> > >  #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
> > >  #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
> > >  
> > > +#define XEHP_CCS_MODE                          _MMIO(0x14804)
> > 
> > Nitpick:  this doesn't seem to be in the proper place and also breaks
> > the file's convention of using tabs to move over to column 48 for the
> > definition value.
> 
> This was something I actually forgot to fix. Thanks!

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-21 20:51       ` Matt Roper
@ 2024-02-22 22:03         ` Andi Shyti
  2024-02-22 22:33           ` Matt Roper
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Shyti @ 2024-02-22 22:03 UTC (permalink / raw)
  To: Matt Roper
  Cc: Andi Shyti, intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

Hi Matt,

first of all thanks a lot for the observations you are raising.

On Wed, Feb 21, 2024 at 12:51:04PM -0800, Matt Roper wrote:
> On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:

...

> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > index a425db5ed3a2..e19df4ef47f6 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > > >  	}
> > > >  }
> > > >  
> > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > > +{
> > > > +	if (!IS_DG2(gt->i915))
> > > > +		return;
> > > > +
> > > > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > > 
> > > This doesn't look right to me.  A value of 0 means every cslice gets
> > > associated with CCS0.
> > 
> > Yes, that's what I'm trying to do. The behavior I'm looking for
> > is this one:
> > 
> > 	 /*
> > 	  ...
> >           * With 1 engine (ccs0):
> >           *   slice 0, 1, 2, 3: ccs0
> >           *
> >           * With 2 engines (ccs0, ccs1):
> >           *   slice 0, 2: ccs0
> >           *   slice 1, 3: ccs1
> >           *
> >           * With 4 engines (ccs0, ccs1, ccs2, ccs3):
> >           *   slice 0: ccs0
> >           *   slice 1: ccs1
> >           *   slice 2: ccs2
> >           *   slice 3: ccs3
> > 	  ...
> > 	  */
> > 
> > where the user can configure runtime the mode, making sure that
> > no client is connected to i915.
> > 
> > But, this needs to be written 
> > 
> > As we are now forcing mode '1', then all cslices are connected
> > with ccs0.
> 
> Right --- and that's what I'm pointing out as illegal.  I think that
> code comment above was taken out of context from a different RFC series;
> that's not an accurate description of the behavior we want here.
> 
> First, the above comment is using ccs# to refer to userspace engines,
> not hardware engines.  As a simple example, DG2-G11 only ever has a
> single CCS which userspace sees as "instance 0" but which is actually
> CCS1 at the hardware level.  If you try to follow the comment above when
> programming CCS_MODE, you've assigned all of the cslices to a
> non-existent engine and assigned no cslices to the CCS engine that
> actually exists.  For DG2-G10 (and I think DG2-G12), there are different
> combinations of fused-off / not-fused-off engines that will always show
> up in userspace as CCS0-CCSn, even if those don't match the hardware
> IDs.
> 
> Second, the above comment is assuming that you have a part with a
> maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
> as an example, there's also only a single cslice (cslice1), so if you
> tell CCS1 that it's allowed to use EUs from non-existent cslice0,
> cslice2, and cslice3, you might not get the behavior you were hoping
> for.

if the hardware slices are fused off we wouldn't see them in a
first place, right? And that's anyway a permanent configuration
that wouldn't affect the patch.

BTW, is there any machine I can test this scenario?

> > > On a DG2-G11 platform, that will flat out break
> > > compute since CCS0 is never present (G11 only has a single CCS and it's
> > > always the hardware's CCS1).  Even on a G10 or G12 this could also break
> > > things depending on the fusing of your card if the hardware CCS0 happens
> > > to be missing.
> > > 
> > > Also, the register says that we need a field value of 0x7 for each
> > > cslice that's fused off.  By passing 0, we're telling the CCS engine
> > > that it can use cslices that may not actually exist.
> > 
> > does it? Or do I need to write the id (0x0-0x3) of the user
> > engine? That's how the mode is calculated in this algorithm.
> 
> 0x0 - 0x3 are how you specify that a specific CCS engine can use the
> cslice.  If the cslice can't be used at all (i.e., it's fused off), then
> you need to program a 0x7 to ensure no engines try to use the
> non-existent DSS/EUs.

I planned to limit this to the only DG2 (and ATSM, of course).
Do you think it would it help?

Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
  2024-02-22 22:03         ` Andi Shyti
@ 2024-02-22 22:33           ` Matt Roper
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Roper @ 2024-02-22 22:33 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

On Thu, Feb 22, 2024 at 11:03:27PM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> first of all thanks a lot for the observations you are raising.
> 
> On Wed, Feb 21, 2024 at 12:51:04PM -0800, Matt Roper wrote:
> > On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> > > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> 
> ...
> 
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > index a425db5ed3a2..e19df4ef47f6 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > > > >  	}
> > > > >  }
> > > > >  
> > > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > > > +{
> > > > > +	if (!IS_DG2(gt->i915))
> > > > > +		return;
> > > > > +
> > > > > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > > > 
> > > > This doesn't look right to me.  A value of 0 means every cslice gets
> > > > associated with CCS0.
> > > 
> > > Yes, that's what I'm trying to do. The behavior I'm looking for
> > > is this one:
> > > 
> > > 	 /*
> > > 	  ...
> > >           * With 1 engine (ccs0):
> > >           *   slice 0, 1, 2, 3: ccs0
> > >           *
> > >           * With 2 engines (ccs0, ccs1):
> > >           *   slice 0, 2: ccs0
> > >           *   slice 1, 3: ccs1
> > >           *
> > >           * With 4 engines (ccs0, ccs1, ccs2, ccs3):
> > >           *   slice 0: ccs0
> > >           *   slice 1: ccs1
> > >           *   slice 2: ccs2
> > >           *   slice 3: ccs3
> > > 	  ...
> > > 	  */
> > > 
> > > where the user can configure runtime the mode, making sure that
> > > no client is connected to i915.
> > > 
> > > But, this needs to be written 
> > > 
> > > As we are now forcing mode '1', then all cslices are connected
> > > with ccs0.
> > 
> > Right --- and that's what I'm pointing out as illegal.  I think that
> > code comment above was taken out of context from a different RFC series;
> > that's not an accurate description of the behavior we want here.
> > 
> > First, the above comment is using ccs# to refer to userspace engines,
> > not hardware engines.  As a simple example, DG2-G11 only ever has a
> > single CCS which userspace sees as "instance 0" but which is actually
> > CCS1 at the hardware level.  If you try to follow the comment above when
> > programming CCS_MODE, you've assigned all of the cslices to a
> > non-existent engine and assigned no cslices to the CCS engine that
> > actually exists.  For DG2-G10 (and I think DG2-G12), there are different
> > combinations of fused-off / not-fused-off engines that will always show
> > up in userspace as CCS0-CCSn, even if those don't match the hardware
> > IDs.
> > 
> > Second, the above comment is assuming that you have a part with a
> > maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
> > as an example, there's also only a single cslice (cslice1), so if you
> > tell CCS1 that it's allowed to use EUs from non-existent cslice0,
> > cslice2, and cslice3, you might not get the behavior you were hoping
> > for.
> 
> if the hardware slices are fused off we wouldn't see them in a
> first place, right? And that's anyway a permanent configuration
> that wouldn't affect the patch.

There are physically four possible cslices in the IP design.  The
presence/absence of each of those cslices can vary both by SKU and by
part-specific fusing.  Some SKUs (DG2-G11) wind up only ever having a
single possible configuration as far as I know, but the larger SKUs have
more part-to-part variation in terms of exactly which specific subset of
DSS (and by extension cslices) are present/absent.  The KMD determines
the configuration at runtime by reading the DSS fuse registers and
deriving the cslice presence/absence from that.

The register you're writing in this patch tells the CCS engine which
cslice(s) it can use to execute work.  If the KMD already knows that
cslice<x> doesn't exist, but it tells CCS<y> that it can go ahead and
use it anyway, things probably won't work properly.  That's why the spec
mandates that we always program 0x7 in the register for any cslices that
we know aren't present so that none of the CCS engines will incorrectly
try to utilize them.  If we ignore that rule, then it's a driver bug.

> 
> BTW, is there any machine I can test this scenario?

There should differently-fused DG2 systems in our internal pool,
although I'm not sure what the breakdown is.  I don't think the
reservation system makes the low-level cslice configuration immediately
obvious on the summary page, so you might just need to log into a few
systems and read the fuse registers to check which ones are best for
testing these cases.

> 
> > > > On a DG2-G11 platform, that will flat out break
> > > > compute since CCS0 is never present (G11 only has a single CCS and it's
> > > > always the hardware's CCS1).  Even on a G10 or G12 this could also break
> > > > things depending on the fusing of your card if the hardware CCS0 happens
> > > > to be missing.
> > > > 
> > > > Also, the register says that we need a field value of 0x7 for each
> > > > cslice that's fused off.  By passing 0, we're telling the CCS engine
> > > > that it can use cslices that may not actually exist.
> > > 
> > > does it? Or do I need to write the id (0x0-0x3) of the user
> > > engine? That's how the mode is calculated in this algorithm.
> > 
> > 0x0 - 0x3 are how you specify that a specific CCS engine can use the
> > cslice.  If the cslice can't be used at all (i.e., it's fused off), then
> > you need to program a 0x7 to ensure no engines try to use the
> > non-existent DSS/EUs.
> 
> I planned to limit this to the only DG2 (and ATSM, of course).
> Do you think it would it help?

Yes, definitely.  It's mandatory programming for these platforms.


Matt

> 
> Andi

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Disable automatic load CCS load balancing
  2024-02-20 14:20 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
  2024-02-20 14:23 ` Andi Shyti
@ 2024-02-20 14:34 ` Andi Shyti
  1 sibling, 0 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:34 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

Please, ignore, I sent V1 again.

Sorry about the noise!

Andi

On Tue, Feb 20, 2024 at 03:20:32PM +0100, Andi Shyti wrote:
> Hi,
> 
> this series does basically two things:
> 
> 1. Disables automatic load balancing as adviced by the hardware
>    workaround.
> 
> 2. Forces the sharing of the load submitted to CCS among all the
>    CCS available (as of now only DG2 has more than one CCS). This
>    way the user, when sending a query, will see only one CCS
>    available.
> 
> Andi
> 
> Andi Shyti (2):
>   drm/i915/gt: Disable HW load balancing for CCS
>   drm/i915/gt: Set default CCS mode '1'
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
>  drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++++
>  drivers/gpu/drm/i915/i915_drv.h             | 17 +++++++++++++++++
>  drivers/gpu/drm/i915/i915_query.c           |  5 +++--
>  5 files changed, 40 insertions(+), 2 deletions(-)
> 
> -- 
> 2.43.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Disable automatic load CCS load balancing
  2024-02-20 14:20 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
@ 2024-02-20 14:23 ` Andi Shyti
  2024-02-20 14:34 ` Andi Shyti
  1 sibling, 0 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:23 UTC (permalink / raw)
  To: Andi Shyti
  Cc: intel-gfx, dri-devel, Chris Wilson, Joonas Lahtinen, Matt Roper,
	John Harrison, Tvrtko Ursulin, stable, Andi Shyti

Hi,

I'm sorry, I forgot to add the changelog. Here it is:

v1 -> v2
========
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI
  engine list and adapt the engine counting accordingly (thanks
  Tvrtko).
- Reword the commit of Patch 2 (thanks John).

On Tue, Feb 20, 2024 at 03:20:32PM +0100, Andi Shyti wrote:
> Hi,
> 
> this series does basically two things:
> 
> 1. Disables automatic load balancing as adviced by the hardware
>    workaround.
> 
> 2. Forces the sharing of the load submitted to CCS among all the
>    CCS available (as of now only DG2 has more than one CCS). This
>    way the user, when sending a query, will see only one CCS
>    available.
> 
> Andi
> 
> Andi Shyti (2):
>   drm/i915/gt: Disable HW load balancing for CCS
>   drm/i915/gt: Set default CCS mode '1'
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
>  drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++++
>  drivers/gpu/drm/i915/i915_drv.h             | 17 +++++++++++++++++
>  drivers/gpu/drm/i915/i915_query.c           |  5 +++--
>  5 files changed, 40 insertions(+), 2 deletions(-)
> 
> -- 
> 2.43.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/2] Disable automatic load CCS load balancing
@ 2024-02-20 14:20 Andi Shyti
  2024-02-20 14:23 ` Andi Shyti
  2024-02-20 14:34 ` Andi Shyti
  0 siblings, 2 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-20 14:20 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, John Harrison,
	Tvrtko Ursulin, stable, Andi Shyti, Andi Shyti

Hi,

this series does basically two things:

1. Disables automatic load balancing as adviced by the hardware
   workaround.

2. Forces the sharing of the load submitted to CCS among all the
   CCS available (as of now only DG2 has more than one CCS). This
   way the user, when sending a query, will see only one CCS
   available.

Andi

Andi Shyti (2):
  drm/i915/gt: Disable HW load balancing for CCS
  drm/i915/gt: Set default CCS mode '1'

 drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++++
 drivers/gpu/drm/i915/i915_drv.h             | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_query.c           |  5 +++--
 5 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/2] Disable automatic load CCS load balancing
@ 2024-02-15 13:59 Andi Shyti
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Shyti @ 2024-02-15 13:59 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Chris Wilson, Joonas Lahtinen, Matt Roper, stable, Andi Shyti,
	Andi Shyti

Hi,

this series does basically two things:

1. Disables automatic load balancing as adviced by the hardware
   workaround.

2. Forces the sharing of the load submitted to CCS among all the
   CCS available (as of now only DG2 has more than one CCS). This
   way the user, when sending a query, will see only one CCS
   available.

Andi

Andi Shyti (2):
  drm/i915/gt: Disable HW load balancing for CCS
  drm/i915/gt: Set default CCS mode '1'

 drivers/gpu/drm/i915/gt/intel_gt.c          | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 ++++++
 drivers/gpu/drm/i915/i915_drv.h             | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_query.c           |  5 +++--
 5 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-02-22 22:33 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-20 14:35 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
2024-02-20 14:35 ` [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS Andi Shyti
2024-02-20 23:26   ` Matt Roper
2024-02-20 14:35 ` [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload Andi Shyti
2024-02-20 14:42   ` Andi Shyti
2024-02-20 14:48   ` Tvrtko Ursulin
2024-02-21  0:14     ` Andi Shyti
2024-02-21  8:19       ` Tvrtko Ursulin
2024-02-21 11:19         ` Andi Shyti
2024-02-21 12:08           ` Tvrtko Ursulin
2024-02-21 12:11             ` Tvrtko Ursulin
2024-02-20 23:39   ` Matt Roper
2024-02-21  0:12     ` Andi Shyti
2024-02-21 20:51       ` Matt Roper
2024-02-22 22:03         ` Andi Shyti
2024-02-22 22:33           ` Matt Roper
2024-02-20 15:45 ` ✗ Fi.CI.CHECKPATCH: warning for Disable automatic load CCS load balancing (rev3) Patchwork
2024-02-20 16:06 ` ✗ Fi.CI.BAT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2024-02-20 14:20 [PATCH 0/2] Disable automatic load CCS load balancing Andi Shyti
2024-02-20 14:23 ` Andi Shyti
2024-02-20 14:34 ` Andi Shyti
2024-02-15 13:59 Andi Shyti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.