* [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming
@ 2017-09-22 15:10 Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

Hi,

A small update to allow userspace to select the engine to which the
slice/subslice configuration applies (as suggested by Chris).
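
For example (just a sketch; the flags field takes the same engine
selector values as drm_i915_gem_execbuffer2), targeting the video
engine instead of the render engine:

	struct drm_i915_gem_context_param_sseu sseu = {
		.flags = I915_EXEC_BSD,
	};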

Cheers,

Chris Wilson (4):
  drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (1):
  drm/i915: expose helper mapping exec flag engine to intel_engine_cs

 drivers/gpu/drm/i915/i915_debugfs.c        |  36 +++++++---
 drivers/gpu/drm/i915/i915_drv.h            |  21 +-----
 drivers/gpu/drm/i915/i915_gem_context.c    |  55 ++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h    |  21 ++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  18 ++---
 drivers/gpu/drm/i915/intel_device_info.c   |  32 +++++----
 drivers/gpu/drm/i915/intel_lrc.c           | 112 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.h           |   5 ++
 include/uapi/drm/i915_drm.h                |  28 ++++++++
 9 files changed, 258 insertions(+), 70 deletions(-)

--
2.14.1

* [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
@ 2017-09-22 15:10 ` Lionel Landwerlin
  2017-09-29 11:09   ` Joonas Lahtinen
  2017-09-22 15:10 ` [RFC PATCH v2 2/5] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Lionel Landwerlin
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

This function will be used later by the per (context,engine) power
programming interface.
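
For instance, patch 5 in this series resolves the engine targeted by a
context parameter roughly like this (a sketch of the call pattern):

	struct intel_engine_cs *engine;

	engine = i915_gem_engine_from_flags(dev_priv, file, param_sseu.flags);
	if (!engine)
		return -EINVAL;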

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 18 +++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f5d0e816008d..60c63f141a47 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3380,6 +3380,9 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file_priv);
 int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
+struct intel_engine_cs *i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,
+						   struct drm_file *file,
+						   u64 flags);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d733c4d5a500..6120d22ac145 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2008,12 +2008,12 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 	[I915_EXEC_VEBOX]	= VECS
 };
 
-static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
-		 struct drm_file *file,
-		 struct drm_i915_gem_execbuffer2 *args)
+struct intel_engine_cs *
+i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,
+			   struct drm_file *file,
+			   u64 flags)
 {
-	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
+	unsigned int user_ring_id = flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
 	if (user_ring_id > I915_USER_RINGS) {
@@ -2022,14 +2022,14 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 	}
 
 	if ((user_ring_id != I915_EXEC_BSD) &&
-	    ((args->flags & I915_EXEC_BSD_MASK) != 0)) {
+	    ((flags & I915_EXEC_BSD_MASK) != 0)) {
 		DRM_DEBUG("execbuf with non bsd ring but with invalid "
-			  "bsd dispatch flags: %d\n", (int)(args->flags));
+			  "bsd dispatch flags: %d\n", (int)(flags));
 		return NULL;
 	}
 
 	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
-		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
+		unsigned int bsd_idx = flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
 			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
@@ -2220,7 +2220,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = i915_gem_engine_from_flags(eb.i915, file, args->flags);
 	if (!eb.engine)
 		return -EINVAL;
 
-- 
2.14.1


* [RFC PATCH v2 2/5] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
@ 2017-09-22 15:10 ` Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 3/5] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

When we query the available EUs on each subslice, we currently only
report the max. It would also be useful to report the minimum found as
well.

When we set RPCS (power gating over the EUs), we can also specify both
the min and max number of EUs to configure on each subslice; currently
we just set it to a single value, but the flexibility may be beneficial
in future.
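
For example, on a hypothetical part with one EU fused off in one
subslice, the debugfs report (see the i915_debugfs.c hunk below) would
now show a [min, max] range rather than a single value, e.g.:

  Enabled EU Per Subslice: [6, 8]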

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 36 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h          |  3 ++-
 drivers/gpu/drm/i915/intel_device_info.c | 32 +++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_lrc.c         |  4 ++--
 4 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 13fc25997d65..fb77ad49d327 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4547,6 +4547,7 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_cache_sharing_fops,
 static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 					  struct sseu_dev_info *sseu)
 {
+	unsigned int min_eu_per_subslice, max_eu_per_subslice;
 	int ss_max = 2;
 	int ss;
 	u32 sig1[ss_max], sig2[ss_max];
@@ -4556,6 +4557,9 @@ static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 	sig2[0] = I915_READ(CHV_POWER_SS0_SIG2);
 	sig2[1] = I915_READ(CHV_POWER_SS1_SIG2);
 
+	min_eu_per_subslice = ~0u;
+	max_eu_per_subslice = 0;
+
 	for (ss = 0; ss < ss_max; ss++) {
 		unsigned int eu_cnt;
 
@@ -4570,14 +4574,18 @@ static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 			 ((sig1[ss] & CHV_EU210_PG_ENABLE) ? 0 : 2) +
 			 ((sig2[ss] & CHV_EU311_PG_ENABLE) ? 0 : 2);
 		sseu->eu_total += eu_cnt;
-		sseu->eu_per_subslice = max_t(unsigned int,
-					      sseu->eu_per_subslice, eu_cnt);
+		min_eu_per_subslice = min(min_eu_per_subslice, eu_cnt);
+		max_eu_per_subslice = max(max_eu_per_subslice, eu_cnt);
 	}
+
+	sseu->min_eu_per_subslice = min_eu_per_subslice;
+	sseu->max_eu_per_subslice = max_eu_per_subslice;
 }
 
 static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 				    struct sseu_dev_info *sseu)
 {
+	unsigned int min_eu_per_subslice, max_eu_per_subslice;
 	int s_max = 3, ss_max = 4;
 	int s, ss;
 	u32 s_reg[s_max], eu_reg[2*s_max], eu_mask[2];
@@ -4603,6 +4611,9 @@ static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 		     GEN9_PGCTL_SSB_EU210_ACK |
 		     GEN9_PGCTL_SSB_EU311_ACK;
 
+	min_eu_per_subslice = ~0u;
+	max_eu_per_subslice = 0;
+
 	for (s = 0; s < s_max; s++) {
 		if ((s_reg[s] & GEN9_PGCTL_SLICE_ACK) == 0)
 			/* skip disabled slice */
@@ -4628,11 +4639,14 @@ static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 			eu_cnt = 2 * hweight32(eu_reg[2*s + ss/2] &
 					       eu_mask[ss%2]);
 			sseu->eu_total += eu_cnt;
-			sseu->eu_per_subslice = max_t(unsigned int,
-						      sseu->eu_per_subslice,
-						      eu_cnt);
+
+			min_eu_per_subslice = min(min_eu_per_subslice, eu_cnt);
+			max_eu_per_subslice = max(max_eu_per_subslice, eu_cnt);
 		}
 	}
+
+	sseu->min_eu_per_subslice = min_eu_per_subslice;
+	sseu->max_eu_per_subslice = max_eu_per_subslice;
 }
 
 static void broadwell_sseu_device_status(struct drm_i915_private *dev_priv,
@@ -4645,9 +4659,11 @@ static void broadwell_sseu_device_status(struct drm_i915_private *dev_priv,
 
 	if (sseu->slice_mask) {
 		sseu->subslice_mask = INTEL_INFO(dev_priv)->sseu.subslice_mask;
-		sseu->eu_per_subslice =
-				INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
-		sseu->eu_total = sseu->eu_per_subslice *
+		sseu->min_eu_per_subslice =
+			INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice;
+		sseu->max_eu_per_subslice =
+			INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice;
+		sseu->eu_total = sseu->max_eu_per_subslice *
 				 sseu_subslice_total(sseu);
 
 		/* subtract fused off EU(s) from enabled slice(s) */
@@ -4678,8 +4694,8 @@ static void i915_print_sseu_info(struct seq_file *m, bool is_available_info,
 		   hweight8(sseu->subslice_mask));
 	seq_printf(m, "  %s EU Total: %u\n", type,
 		   sseu->eu_total);
-	seq_printf(m, "  %s EU Per Subslice: %u\n", type,
-		   sseu->eu_per_subslice);
+	seq_printf(m, "  %s EU Per Subslice: [%u, %u]\n", type,
+		   sseu->min_eu_per_subslice, sseu->max_eu_per_subslice);
 
 	if (!is_available_info)
 		return;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 60c63f141a47..2902a0251484 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -803,7 +803,8 @@ struct sseu_dev_info {
 	u8 slice_mask;
 	u8 subslice_mask;
 	u8 eu_total;
-	u8 eu_per_subslice;
+	u8 min_eu_per_subslice;
+	u8 max_eu_per_subslice;
 	u8 min_eu_in_pool;
 	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
 	u8 subslice_7eu[3];
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index fdf9b54b71e9..306512688367 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -85,6 +85,7 @@ void intel_device_info_dump(struct drm_i915_private *dev_priv)
 static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct sseu_dev_info *sseu = &mkwrite_device_info(dev_priv)->sseu;
+	unsigned int eu_per_subslice;
 	u32 fuse, eu_dis;
 
 	fuse = I915_READ(CHV_FUSE_GT);
@@ -109,9 +110,10 @@ static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * CHV expected to always have a uniform distribution of EU
 	 * across subslices.
 	*/
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				sseu->eu_total / sseu_subslice_total(sseu) :
-				0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		sseu->eu_total / sseu_subslice_total(sseu) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 	/*
 	 * CHV supports subslice power gating on devices with more than
 	 * one subslice, and supports EU power gating on devices with
@@ -119,13 +121,14 @@ static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 	*/
 	sseu->has_slice_pg = 0;
 	sseu->has_subslice_pg = sseu_subslice_total(sseu) > 1;
-	sseu->has_eu_pg = (sseu->eu_per_subslice > 2);
+	sseu->has_eu_pg = eu_per_subslice > 2;
 }
 
 static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct intel_device_info *info = mkwrite_device_info(dev_priv);
 	struct sseu_dev_info *sseu = &info->sseu;
+	unsigned int eu_per_subslice;
 	int s_max = 3, ss_max = 4, eu_max = 8;
 	int s, ss;
 	u32 fuse2, eu_disable;
@@ -181,9 +184,10 @@ static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * recovery. BXT is expected to be perfectly uniform in EU
 	 * distribution.
 	*/
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				DIV_ROUND_UP(sseu->eu_total,
-					     sseu_subslice_total(sseu)) : 0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		DIV_ROUND_UP(sseu->eu_total, sseu_subslice_total(sseu)) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 	/*
 	 * SKL+ supports slice power gating on devices with more than
 	 * one slice, and supports EU power gating on devices with
@@ -196,7 +200,7 @@ static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 		!IS_GEN9_LP(dev_priv) && hweight8(sseu->slice_mask) > 1;
 	sseu->has_subslice_pg =
 		IS_GEN9_LP(dev_priv) && sseu_subslice_total(sseu) > 1;
-	sseu->has_eu_pg = sseu->eu_per_subslice > 2;
+	sseu->has_eu_pg = eu_per_subslice > 2;
 
 	if (IS_GEN9_LP(dev_priv)) {
 #define IS_SS_DISABLED(ss)	(!(sseu->subslice_mask & BIT(ss)))
@@ -229,6 +233,7 @@ static void broadwell_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct sseu_dev_info *sseu = &mkwrite_device_info(dev_priv)->sseu;
 	const int s_max = 3, ss_max = 3, eu_max = 8;
+	unsigned int eu_per_subslice;
 	int s, ss;
 	u32 fuse2, eu_disable[3]; /* s_max */
 
@@ -283,9 +288,10 @@ static void broadwell_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * subslices with the exception that any one EU in any one subslice may
 	 * be fused off for die recovery.
 	 */
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				DIV_ROUND_UP(sseu->eu_total,
-					     sseu_subslice_total(sseu)) : 0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		DIV_ROUND_UP(sseu->eu_total, sseu_subslice_total(sseu)) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 
 	/*
 	 * BDW supports slice power gating on devices with more than
@@ -420,7 +426,9 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("subslice per slice: %u\n",
 			 hweight8(info->sseu.subslice_mask));
 	DRM_DEBUG_DRIVER("EU total: %u\n", info->sseu.eu_total);
-	DRM_DEBUG_DRIVER("EU per subslice: %u\n", info->sseu.eu_per_subslice);
+	DRM_DEBUG_DRIVER("EU per subslice: [%u, %u]\n",
+			 info->sseu.min_eu_per_subslice,
+			 info->sseu.max_eu_per_subslice);
 	DRM_DEBUG_DRIVER("has slice power gating: %s\n",
 			 info->sseu.has_slice_pg ? "y" : "n");
 	DRM_DEBUG_DRIVER("has subslice power gating: %s\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 955c87999280..341a4205f480 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1922,9 +1922,9 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	}
 
 	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
-- 
2.14.1


* [RFC PATCH v2 3/5] drm/i915: Program RPCS for Broadwell
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 2/5] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Lionel Landwerlin
@ 2017-09-22 15:10 ` Lionel Landwerlin
  2017-09-22 15:10 ` [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 341a4205f480..699398257c37 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1894,13 +1894,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.14.1


* [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2017-09-22 15:10 ` [RFC PATCH v2 3/5] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2017-09-22 15:10 ` Lionel Landwerlin
  2017-09-29 11:18   ` Joonas Lahtinen
  2017-09-22 15:10 ` [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslices and
EUs per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
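
With that in place, building the context image picks up the per-engine
value; this is the gist of the intel_lrc.c change below:

	CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
		make_rpcs(&ctx->engine[engine->id].sseu));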

v2: record sseu configuration per context & engine (Chris)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 19 -------------------
 drivers/gpu/drm/i915/i915_gem_context.c |  6 ++++++
 drivers/gpu/drm/i915/i915_gem_context.h | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 23 +++++++++--------------
 4 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2902a0251484..b7ee84317f49 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -799,25 +799,6 @@ struct intel_csr {
 	func(supports_tv); \
 	func(has_ipc);
 
-struct sseu_dev_info {
-	u8 slice_mask;
-	u8 subslice_mask;
-	u8 eu_total;
-	u8 min_eu_per_subslice;
-	u8 max_eu_per_subslice;
-	u8 min_eu_in_pool;
-	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
-	u8 subslice_7eu[3];
-	u8 has_slice_pg:1;
-	u8 has_subslice_pg:1;
-	u8 has_eu_pg:1;
-};
-
-static inline unsigned int sseu_subslice_total(const struct sseu_dev_info *sseu)
-{
-	return hweight8(sseu->slice_mask) * hweight8(sseu->subslice_mask);
-}
-
 /* Keep in gen based order, and chronological order within a gen */
 enum intel_platform {
 	INTEL_PLATFORM_UNINITIALIZED = 0,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 921ee369c74d..b386574259a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -257,6 +257,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -305,6 +307,10 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id)
+		ctx->engine[id].sseu = INTEL_INFO(dev_priv)->sseu;
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 44688e22a5c2..727b3b5bced1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -40,6 +40,25 @@ struct i915_hw_ppgtt;
 struct i915_vma;
 struct intel_ring;
 
+struct sseu_dev_info {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 eu_total;
+	u8 min_eu_per_subslice;
+	u8 max_eu_per_subslice;
+	u8 min_eu_in_pool;
+	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
+	u8 subslice_7eu[3];
+	u8 has_slice_pg:1;
+	u8 has_subslice_pg:1;
+	u8 has_eu_pg:1;
+};
+
+static inline unsigned int sseu_subslice_total(const struct sseu_dev_info *sseu)
+{
+	return hweight8(sseu->slice_mask) * hweight8(sseu->subslice_mask);
+}
+
 #define DEFAULT_CONTEXT_HANDLE 0
 
 /**
@@ -158,6 +177,8 @@ struct i915_gem_context {
 		u64 lrc_desc;
 		int pin_count;
 		bool initialised;
+		/** sseu: Control eu/slice partitioning */
+		struct sseu_dev_info sseu;
 	} engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 699398257c37..f5e9caf4913c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1889,8 +1889,7 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu)
 {
 	u32 rpcs = 0;
 
@@ -1900,25 +1899,21 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(sseu->slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= hweight8(sseu->subslice_mask) << GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice <<
-			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice <<
-			GEN8_RPCS_EU_MAX_SHIFT;
+	if (sseu->has_eu_pg) {
+		rpcs |= sseu->min_eu_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
+		rpcs |= sseu->max_eu_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
@@ -2032,7 +2027,7 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&ctx->engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.14.1


* [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2017-09-22 15:10 ` [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2017-09-22 15:10 ` Lionel Landwerlin
  2017-10-05 11:57   ` Tvrtko Ursulin
  2017-09-22 15:33 ` ✓ Fi.CI.BAT: success for drm/i915: enable userspace to program slice/subslice programming (rev2) Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-22 15:10 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.
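
Similarly, continuing the example above (still just a sketch), a
context that only needs a single slice could request it while leaving
the subslice mask at the device default:

	sseu.packed.slice_mask = 0x1;	/* one slice */
	sseu.packed.subslice_mask = 0;	/* 0 selects the device mask */

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);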

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 49 ++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 82 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  5 ++
 include/uapi/drm/i915_drm.h             | 28 +++++++++++
 4 files changed, 164 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b386574259a1..088b5035c3a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1042,6 +1042,30 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = i915_gem_engine_from_flags(to_i915(dev), file,
+						    param_sseu.flags);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		param_sseu.value = intel_lr_context_get_sseu(ctx, engine);
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -1097,6 +1121,31 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		if (args->size)
+			ret = -EINVAL;
+		else if (!i915_modparams.enable_execlists)
+			ret = -ENODEV;
+		else {
+			struct drm_i915_gem_context_param_sseu param_sseu;
+			struct intel_engine_cs *engine;
+
+			if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+					   sizeof(param_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = i915_gem_engine_from_flags(to_i915(dev), file,
+							    param_sseu.flags);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = intel_lr_context_set_sseu(ctx, engine, param_sseu.value);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f5e9caf4913c..bffdc1126838 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2164,3 +2164,85 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 		}
 	}
 }
+
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      u64 value)
+{
+	struct drm_i915_gem_context_param_sseu user = { .value = value };
+	struct drm_i915_private *i915 = ctx->i915;
+	struct sseu_dev_info sseu = ctx->engine[engine->id].sseu;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	sseu.slice_mask = user.packed.slice_mask == 0 ?
+		INTEL_INFO(i915)->sseu.slice_mask :
+		(user.packed.slice_mask & INTEL_INFO(i915)->sseu.slice_mask);
+	sseu.subslice_mask = user.packed.subslice_mask == 0 ?
+		INTEL_INFO(i915)->sseu.subslice_mask :
+		(user.packed.subslice_mask & INTEL_INFO(i915)->sseu.subslice_mask);
+	sseu.min_eu_per_subslice =
+		max(user.packed.min_eu_per_subslice,
+		    INTEL_INFO(i915)->sseu.min_eu_per_subslice);
+	sseu.max_eu_per_subslice =
+		min(user.packed.max_eu_per_subslice,
+		    INTEL_INFO(i915)->sseu.max_eu_per_subslice);
+
+	if (memcmp(&sseu, &ctx->engine[engine->id].sseu, sizeof(sseu)) == 0)
+		return 0;
+
+	/*
+	 * We can only program this on the render ring.
+	 */
+	ce = &ctx->engine[RCS];
+
+	if (ce->pin_count) { /* Assume that the context is active! */
+		ret = i915_gem_switch_to_kernel_context(i915);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_wait_for_idle(i915,
+					     I915_WAIT_INTERRUPTIBLE |
+					     I915_WAIT_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (ce->state) {
+		u32 *regs;
+
+		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB) +
+			LRC_STATE_PN * PAGE_SIZE;
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
+
+		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(&sseu);
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	/*
+	 * Apply the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, i915, id)
+		ctx->engine[id].sseu = sseu;
+
+	return 0;
+}
+
+u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine)
+{
+	struct drm_i915_gem_context_param_sseu user;
+	const struct sseu_dev_info *sseu = &ctx->engine[engine->id].sseu;
+
+	user.packed.slice_mask = sseu->slice_mask;
+	user.packed.subslice_mask = sseu->subslice_mask;
+	user.packed.min_eu_per_subslice = sseu->min_eu_per_subslice;
+	user.packed.max_eu_per_subslice = sseu->max_eu_per_subslice;
+
+	return user.value;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 314adee7127a..a51e67d9fec5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -106,6 +106,11 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return ctx->engine[engine->id].lrc_desc;
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      u64 value);
+u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_i915_private *dev_priv,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fe25a01c81f2..ed1cbced8e6e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1360,9 +1360,37 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+/*
+ * When using the following param, value should be a pointer to
+ * drm_i915_gem_context_param_sseu.
+ */
+#define I915_CONTEXT_PARAM_SSEU		0x6
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine to be configured or queried. Same value you would use with
+	 * drm_i915_gem_execbuffer2.
+	 */
+	__u64 flags;
+
+	/*
+	 * Setting slice_mask or subslice_mask to 0 will make the context use
+	 * masks reported respectively by I915_PARAM_SLICE_MASK or
+	 * I915_PARAM_SUBSLICE_MASK.
+	 */
+	union {
+		struct {
+			__u8 slice_mask;
+			__u8 subslice_mask;
+			__u8 min_eu_per_subslice;
+			__u8 max_eu_per_subslice;
+		} packed;
+		__u64 value;
+	};
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.14.1


* ✓ Fi.CI.BAT: success for drm/i915: enable userspace to program slice/subslice programming (rev2)
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2017-09-22 15:10 ` [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2017-09-22 15:33 ` Patchwork
  2017-09-22 20:41 ` ✗ Fi.CI.IGT: failure " Patchwork
  2017-09-29 11:20 ` [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Joonas Lahtinen
  7 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-09-22 15:33 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: enable userspace to program slice/subslice programming (rev2)
URL   : https://patchwork.freedesktop.org/series/29715/
State : success

== Summary ==

Series 29715v2 drm/i915: enable userspace to program slice/subslice programming
https://patchwork.freedesktop.org/api/1.0/series/29715/revisions/2/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                pass       -> FAIL       (fi-snb-2600) fdo#100215
Test drv_module_reload:
        Subgroup basic-reload:
                pass       -> DMESG-WARN (fi-glk-1) fdo#102777 +1

fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#102777 https://bugs.freedesktop.org/show_bug.cgi?id=102777

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:440s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:468s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:418s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:521s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:276s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:505s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:488s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:493s
fi-cfl-s         total:289  pass:223  dwarn:34  dfail:0   fail:0   skip:32  time:537s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:417s
fi-glk-1         total:289  pass:259  dwarn:1   dfail:0   fail:0   skip:29  time:627s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:423s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:404s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:430s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:488s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:465s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:472s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:575s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:581s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:545s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:449s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:752s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:486s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:471s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:567s
fi-snb-2600      total:289  pass:248  dwarn:0   dfail:0   fail:2   skip:39  time:417s

e0e308721fd283e1c5777657a5941f178f0d49e6 drm-tip: 2017y-09m-22d-13h-31m-38s UTC integration manifest
debf519bdb29 drm/i915: Expose RPCS (SSEU) configuration to userspace
dd07f41f81a8 drm/i915: Record the sseu configuration per-context & engine
ce6c3d3f3fbb drm/i915: Program RPCS for Broadwell
806471214773 drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
1039526a05a7 drm/i915: expose helper mapping exec flag engine to intel_engine_cs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5795/

* ✗ Fi.CI.IGT: failure for drm/i915: enable userspace to program slice/subslice programming (rev2)
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2017-09-22 15:33 ` ✓ Fi.CI.BAT: success for drm/i915: enable userspace to program slice/subslice programming (rev2) Patchwork
@ 2017-09-22 20:41 ` Patchwork
  2017-09-29 11:20 ` [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Joonas Lahtinen
  7 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2017-09-22 20:41 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: enable userspace to program slice/subslice programming (rev2)
URL   : https://patchwork.freedesktop.org/series/29715/
State : failure

== Summary ==

Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-primscrn-pri-indfb-draw-blt:
                skip       -> PASS       (shard-hsw)
Test gem_ctx_param:
        Subgroup invalid-param-set:
                pass       -> FAIL       (shard-hsw)
Test kms_atomic_transition:
        Subgroup plane-all-transition-nonblocking-fencing:
                pass       -> FAIL       (shard-hsw)
Test perf:
        Subgroup blocking:
                pass       -> FAIL       (shard-hsw) fdo#102252
Test gem_exec_parallel:
        Subgroup render-fds:
                pass       -> INCOMPLETE (shard-hsw)

fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-hsw        total:2429 pass:1308 dwarn:1   dfail:0   fail:13  skip:1058 time:9646s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5795/shards.html

* Re: [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2017-09-22 15:10 ` [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
@ 2017-09-29 11:09   ` Joonas Lahtinen
  0 siblings, 0 replies; 15+ messages in thread
From: Joonas Lahtinen @ 2017-09-29 11:09 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

On Fri, 2017-09-22 at 16:10 +0100, Lionel Landwerlin wrote:
> This function will be used later by the per (context,engine) power
> programming interface.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  3 +++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 18 +++++++++---------
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f5d0e816008d..60c63f141a47 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3380,6 +3380,9 @@ int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  			      struct drm_file *file_priv);
>  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *file_priv);
> +struct intel_engine_cs *i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,

May wanna cut before the function name here, too, to keep readability.
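
I.e. something like this, matching the style the .c file already uses:

	struct intel_engine_cs *
	i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,
				   struct drm_file *file,
				   u64 flags);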

Also, why would i915_select_engine be a bad name?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine
  2017-09-22 15:10 ` [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2017-09-29 11:18   ` Joonas Lahtinen
  0 siblings, 0 replies; 15+ messages in thread
From: Joonas Lahtinen @ 2017-09-29 11:18 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

On Fri, 2017-09-22 at 16:10 +0100, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to expose the ability to reconfigure the slices, subslices and
> EUs per context and per engine. To facilitate that, store the current
> configuration on the context for each engine, which is initially set
> to the device default upon creation.
> 
> v2: record sseu configuration per context & engine (Chris)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

<SNIP>

> @@ -158,6 +177,8 @@ struct i915_gem_context {
>  		u64 lrc_desc;
>  		int pin_count;
>  		bool initialised;
> +		/** sseu: Control eu/slice partitioning */

Use "EU" vs "eu" consistently, as this goes into the HTML docs.

> @@ -1889,8 +1889,7 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>  	return logical_ring_init(engine);
>  }
>  
> -static u32
> -make_rpcs(struct drm_i915_private *dev_priv)
> +static u32 make_rpcs(const struct sseu_dev_info *sseu)

That's a bit of extra noise, but the return is unlikely to be 
changed... So inconclusive.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming
  2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2017-09-22 20:41 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2017-09-29 11:20 ` Joonas Lahtinen
  2017-09-29 11:33   ` Lionel Landwerlin
  7 siblings, 1 reply; 15+ messages in thread
From: Joonas Lahtinen @ 2017-09-29 11:20 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

On Fri, 2017-09-22 at 16:10 +0100, Lionel Landwerlin wrote:
> Hi,
> 
> A small update to allow userspace to select the engine to which the
> slice/subslice configuration applies (as suggested by Chris).

Let's link the Open Source user for these, for easier tracking of those
patches being on their way, too.

Also, please link the I-G-T testcases here (hoping they exist); they're
needed for merging, too.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming
  2017-09-29 11:20 ` [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Joonas Lahtinen
@ 2017-09-29 11:33   ` Lionel Landwerlin
  0 siblings, 0 replies; 15+ messages in thread
From: Lionel Landwerlin @ 2017-09-29 11:33 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On 29/09/17 12:20, Joonas Lahtinen wrote:
> On Fri, 2017-09-22 at 16:10 +0100, Lionel Landwerlin wrote:
>> Hi,
>>
>> A small update to allow userspace to select the engine to which the
>> slice/subslice configuration applies (as suggested by Chris).
> Lets link the Open Source user for these, for easier tracking of those
> patches to be on their way, too.
>
> Also, please link I-G-T testcases here (hoping they exist), they're
> needed too for merging.
>
> Regards, Joonas

The tests: https://patchwork.freedesktop.org/series/29717/
There is no userspace available yet. I was hoping for the vaapi driver 
to potentially make use of this.

* Re: [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-09-22 15:10 ` [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2017-10-05 11:57   ` Tvrtko Ursulin
  2017-10-05 13:24     ` Lionel Landwerlin
  0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-10-05 11:57 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/09/2017 16:10, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.

In recent discussions a few interesting questions were raised about this 
feature, and one more comes from looking at the implementation.

First of all, given the nature of this implementation, it would need to 
be documented loudly that this ioctl should not be called lightly, since 
it can produce random stalls (for the calling program): it needs to idle 
the GPU in order to modify the context image.

I suspect that ideally users who want to use a different number of slices 
could (should?) try hard to only set it before any work has been 
submitted against the context.

In case they expect a need for multiple slice counts, would a good plan 
be to create multiple contexts?

In fact, would a system-friendly way be to enforce that this ioctl can 
only be called on fresh contexts, returning an error otherwise?
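
Something like this, say (a sketch; assuming the existing per-engine
initialised flag is a good enough proxy for "work has been submitted"):

	if (ctx->engine[RCS].initialised)
		return -EBUSY;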

Otherwise it would be an easy DoS vector for a misbehaving userspace.

The second concern is the cost of powering slices up and down on context 
switches, which needs to be measured by someone who plans to use this 
feature a lot.

Following from this is the question of whether or not i915 should allow 
dynamic switching by default. Joonas has suggested that i915 should 
apply a max() formula on all programmed slice counts, which sounds like a 
good and safe default behaviour.
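
Something along these lines, perhaps (a sketch; assuming we can walk the
global context list under struct_mutex):

	struct i915_gem_context *ctx;
	unsigned int slices = 0;

	list_for_each_entry(ctx, &i915->contexts.list, link)
		slices = max_t(unsigned int, slices,
			       hweight8(ctx->engine[RCS].sseu.slice_mask));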

Otherwise a client who wanted to run with a lower number of slices would 
cause a switching cost penalty to other clients. And given that the only 
known user of a decreased slice count is media, where the performance 
penalty has been estimated as up to 10-20% (Dmitry please correct if 
wrong), it sounds reasonable to me that it, as the only user, is the 
only party paying the performance penalty on an _otherwise_ 
_unconfigured_ system.

What also sounds good to me is to add a global i915 tunable policy, 
protected with CAP_SYS_ADMIN, to choose between the 'max' and 'dynamic' 
behaviour across all configured slice counts.

Max would behave as described above - there would be no dynamic 
switching, and the GPU would run with the largest number of slices as 
configured by any client.

Clients could also agree between themselves to collectively request a 
lower slice count if they deemed that to be a better option for their 
combined workloads.

But if clients thought that the cost of dynamic switching is less than 
the individual performance penalty, caused by running with the 
sub-optimal number of slices, they could also agree to set the global 
policy to 'dynamic'.

This would cause i915 to program each context image with its own slice 
count and so would cause slice powering up and down at context switch time.
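
In pseudo-code the policy could act roughly like this when building a
context image (a sketch; the knob name and the combined-maximum field
are hypothetical):

	if (i915_modparams.sseu_policy == SSEU_POLICY_MAX)	/* hypothetical knob */
		rpcs = make_rpcs(&i915->combined_sseu);		/* hypothetical field */
	else	/* 'dynamic' */
		rpcs = make_rpcs(&ctx->engine[engine->id].sseu);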

It sounds like the above would enable a safe and reasonably performant 
default, plus options to configure and tweak to any possible 
configuration. Thoughts, comments etc?

Regards,

Tvrtko

> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> 	struct drm_i915_gem_context_param arg;
> 	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> 
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	arg.value = (uintptr_t) &sseu;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		sseu.packed.subslice_mask = 0;
> 
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 49 ++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 82 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.h        |  5 ++
>   include/uapi/drm/i915_drm.h             | 28 +++++++++++
>   4 files changed, 164 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b386574259a1..088b5035c3a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1042,6 +1042,30 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_BANNABLE:
>   		args->value = i915_gem_context_is_bannable(ctx);
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU: {
> +		struct drm_i915_gem_context_param_sseu param_sseu;
> +		struct intel_engine_cs *engine;
> +
> +		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +				   sizeof(param_sseu))) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		engine = i915_gem_engine_from_flags(to_i915(dev), file,
> +						    param_sseu.flags);
> +		if (!engine) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		param_sseu.value = intel_lr_context_get_sseu(ctx, engine);
> +
> +		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +				 sizeof(param_sseu)))
> +			ret = -EFAULT;
> +		break;
> +	}
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1097,6 +1121,31 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			i915_gem_context_clear_bannable(ctx);
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		if (args->size)
> +			ret = -EINVAL;
> +		else if (!i915_modparams.enable_execlists)
> +			ret = -ENODEV;
> +		else {
> +			struct drm_i915_gem_context_param_sseu param_sseu;
> +			struct intel_engine_cs *engine;
> +
> +			if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +					   sizeof(param_sseu))) {
> +				ret = -EFAULT;
> +				break;
> +			}
> +
> +			engine = i915_gem_engine_from_flags(to_i915(dev), file,
> +							    param_sseu.flags);
> +			if (!engine) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ret = intel_lr_context_set_sseu(ctx, engine, param_sseu.value);
> +		}
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f5e9caf4913c..bffdc1126838 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2164,3 +2164,85 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>   		}
>   	}
>   }
> +
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      u64 value)
> +{
> +	struct drm_i915_gem_context_param_sseu user = { .value = value };
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct sseu_dev_info sseu = ctx->engine[engine->id].sseu;
> +	struct intel_context *ce;
> +	enum intel_engine_id id;
> +	int ret;
> +
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +	sseu.slice_mask = user.packed.slice_mask == 0 ?
> +		INTEL_INFO(i915)->sseu.slice_mask :
> +		(user.packed.slice_mask & INTEL_INFO(i915)->sseu.slice_mask);
> +	sseu.subslice_mask = user.packed.subslice_mask == 0 ?
> +		INTEL_INFO(i915)->sseu.subslice_mask :
> +		(user.packed.subslice_mask & INTEL_INFO(i915)->sseu.subslice_mask);
> +	sseu.min_eu_per_subslice =
> +		max(user.packed.min_eu_per_subslice,
> +		    INTEL_INFO(i915)->sseu.min_eu_per_subslice);
> +	sseu.max_eu_per_subslice =
> +		min(user.packed.max_eu_per_subslice,
> +		    INTEL_INFO(i915)->sseu.max_eu_per_subslice);
> +
> +	if (memcmp(&sseu, &ctx->engine[engine->id].sseu, sizeof(sseu)) == 0)
> +		return 0;
> +
> +	/*
> +	 * We can only program this on render ring.
> +	 */
> +	ce = &ctx->engine[RCS];
> +
> +	if (ce->pin_count) { /* Assume that the context is active! */
> +		ret = i915_gem_switch_to_kernel_context(i915);
> +		if (ret)
> +			return ret;
> +
> +		ret = i915_gem_wait_for_idle(i915,
> +					     I915_WAIT_INTERRUPTIBLE |
> +					     I915_WAIT_LOCKED);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (ce->state) {
> +		u32 *regs;
> +		void *vaddr;
> +
> +		/* Check the error pointer before adding the page offset,
> +		 * otherwise IS_ERR() cannot see a failed pin.
> +		 */
> +		vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> +		if (IS_ERR(vaddr))
> +			return PTR_ERR(vaddr);
> +
> +		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> +
> +		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(&sseu);
> +		i915_gem_object_unpin_map(ce->state->obj);
> +	}
> +
> +	/*
> +	 * Apply the configuration to all engines. Our hardware doesn't
> +	 * currently support different configurations for each engine.
> +	 */
> +	for_each_engine(engine, i915, id)
> +		ctx->engine[id].sseu = sseu;
> +
> +	return 0;
> +}
> +
> +u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_gem_context_param_sseu user;
> +	const struct sseu_dev_info *sseu = &ctx->engine[engine->id].sseu;
> +
> +	user.packed.slice_mask = sseu->slice_mask;
> +	user.packed.subslice_mask = sseu->subslice_mask;
> +	user.packed.min_eu_per_subslice = sseu->min_eu_per_subslice;
> +	user.packed.max_eu_per_subslice = sseu->max_eu_per_subslice;
> +
> +	return user.value;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 314adee7127a..a51e67d9fec5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -106,6 +106,11 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>   	return ctx->engine[engine->id].lrc_desc;
>   }
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      u64 value);
> +u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine);
>   
>   /* Execlists */
>   int intel_sanitize_enable_execlists(struct drm_i915_private *dev_priv,
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index fe25a01c81f2..ed1cbced8e6e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1360,9 +1360,37 @@ struct drm_i915_gem_context_param {
>   #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
>   #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
>   #define I915_CONTEXT_PARAM_BANNABLE	0x5
> +/*
> + * When using the following param, value should be a pointer to
> + * drm_i915_gem_context_param_sseu.
> + */
> +#define I915_CONTEXT_PARAM_SSEU		0x6
>   	__u64 value;
>   };
>   
> +struct drm_i915_gem_context_param_sseu {
> +	/*
> +	 * Engine to be configured or queried. Same value you would use with
> +	 * drm_i915_gem_execbuffer2.
> +	 */
> +	__u64 flags;
> +
> +	/*
> +	 * Setting slice_mask or subslice_mask to 0 will make the context use
> +	 * masks reported respectively by I915_PARAM_SLICE_MASK or
> +	 * I915_PARAM_SUBSLICE_MASK.
> +	 */
> +	union {
> +		struct {
> +			__u8 slice_mask;
> +			__u8 subslice_mask;
> +			__u8 min_eu_per_subslice;
> +			__u8 max_eu_per_subslice;
> +		} packed;
> +		__u64 value;
> +	};
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 
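
For reviewers, this is roughly the expected userspace flow against the
uapi above (an untested sketch; fd and ctx are assumed to come from the
usual libdrm setup, and error handling is elided):

    struct drm_i915_gem_context_param arg;
    struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

    memset(&arg, 0, sizeof(arg));
    arg.ctx_id = ctx;
    arg.param = I915_CONTEXT_PARAM_SSEU;
    arg.value = (uintptr_t) &sseu;

    /* Query the current configuration of the render engine... */
    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
            /* ...restrict the context to a single slice... */
            sseu.packed.slice_mask = 0x1;
            drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);

            /* ...and later revert to the full device masks (0 means
             * "use I915_PARAM_SLICE_MASK / I915_PARAM_SUBSLICE_MASK").
             */
            sseu.packed.slice_mask = 0;
            sseu.packed.subslice_mask = 0;
            drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
    }
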
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-10-05 11:57   ` Tvrtko Ursulin
@ 2017-10-05 13:24     ` Lionel Landwerlin
  2017-10-05 15:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 15+ messages in thread
From: Lionel Landwerlin @ 2017-10-05 13:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 05/10/17 12:57, Tvrtko Ursulin wrote:
>
> On 22/09/2017 16:10, Lionel Landwerlin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
>
> In recent discussions, a few interesting questions were raised on this 
> feature, and one more comes from looking at the implementation.
>
> First of all, the nature of this implementation would need to be 
> documented loudly: this ioctl should not be called lightly, since it 
> can produce random stalls for the calling program by idling the GPU 
> in order to modify the context image.
>
> I suspect ideally users who wanted to use a different number of slices 
> could (should?) try hard to only set it before any work has been 
> submitted against the context.
>
> In case they expect a need for multiple slice counts, would a good 
> plan be to create multiple contexts?
>
> In fact, would a system-friendly way be to enforce that this ioctl can 
> only be called on fresh contexts, and otherwise return an error?
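
(To illustrate the idea, such a guard could sit at the top of
intel_lr_context_set_sseu(); hypothetical, not part of this patch:

	/* "Fresh context only": refuse reconfiguration once the render
	 * context has ever been pinned for execution.
	 */
	if (ctx->engine[RCS].pin_count || ctx->engine[RCS].initialised)
		return -EBUSY;

the field names follow the intel_context struct used by this series.)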

It would be good to hear more details about the setup in which Dmitry 
measured significant improvements.
I remember this feature being discussed as "dynamic slice shutdown".
That seems to imply that performance improvements were measured despite 
having to switch between different slice power configurations.

If we go ahead with the max() formula, that would wipe the benefits.
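
(For clarity, the max() formula would amount to something like the
following on the kernel side; purely illustrative, none of these
helpers exist in i915 today:

	/* Derive one device-wide slice mask as the union of every
	 * context's request, so RPCS never changes on context switch.
	 */
	static u8 sseu_effective_slice_mask(struct drm_i915_private *i915)
	{
		struct i915_gem_context *ctx;
		u8 mask = 0;

		list_for_each_entry(ctx, &i915->contexts.list, link)
			mask |= ctx->engine[RCS].sseu.slice_mask;

		/* No preferences expressed: use the full device mask. */
		return mask ?: INTEL_INFO(i915)->sseu.slice_mask;
	}

ctx->engine[RCS].sseu follows this series; the context list walk is
only a sketch.)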

>
> Otherwise it would be an easy DoS vector for a misbehaving userspace.
>
> The second concern is the cost of powering slices up and down on context 
> switches, which needs to be measured by someone who plans to use this 
> feature a lot.
>
> Following from this is a question of whether or not i915 should allow 
> dynamic switching by default. Joonas has suggested that i915 should 
> apply a max() formula across all programmed slice counts, which sounds like 
> a good and safe default behaviour.
>
> Otherwise a client who wanted to run with a lower number of slices 
> would cause a switching cost penalty to other clients. And given that 
> the only known user of a decreased slice count is media, where the 
> performance penalty has been estimated as up to 10-20% (Dmitry please 
> correct if wrong), it sounds reasonable to me that it, as the only 
> user, is the only party paying the performance penalty on an 
> _otherwise_ _unconfigured_ system.
>
> What also sounds good to me is to add a global i915 tunable policy, 
> protected with CAP_SYS_ADMIN, to choose between the 'max' and 
> 'dynamic' behaviour across all configured slice counts.
>
> Max would behave as described above - there would be no dynamic 
> switching, and the GPU would run with the largest number of slices as 
> configured by any client.
>
> Clients could also agree between themselves to collectively request a 
> lower slice count if they deemed that to be a better option for their 
> combined workloads.
>
> But if clients thought that the cost of dynamic switching is less than 
> the individual performance penalty caused by running with a 
> sub-optimal number of slices, they could also agree to set the global 
> policy to 'dynamic'.
>
> This would cause i915 to program each context image with its own 
> slice count and so would cause slice powering up and down at context 
> switch time.
>
> It sounds like the above would enable a safe and reasonably performant 
> default, plus options to configure and tweak to any possible 
> configuration. Thoughts, comments etc?
>
> Regards,
>
> Tvrtko
>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>     struct drm_i915_gem_context_param arg;
>>     struct drm_i915_gem_context_param_sseu sseu =
>>         { .flags = I915_EXEC_RENDER };
>>
>>     memset(&arg, 0, sizeof(arg));
>>     arg.ctx_id = ctx;
>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>     arg.value = (uintptr_t) &sseu;
>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>         sseu.packed.subslice_mask = 0;
>>
>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>     }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>> (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-10-05 13:24     ` Lionel Landwerlin
@ 2017-10-05 15:17       ` Tvrtko Ursulin
  0 siblings, 0 replies; 15+ messages in thread
From: Tvrtko Ursulin @ 2017-10-05 15:17 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 05/10/2017 14:24, Lionel Landwerlin wrote:
> On 05/10/17 12:57, Tvrtko Ursulin wrote:
>>
>> On 22/09/2017 16:10, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and force
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices they are not using, and others should
>>> fine-tune the number to match their workload.
>>
>> In recent discussions, a few interesting questions were raised on this 
>> feature, and one more comes from looking at the implementation.
>>
>> First of all, the nature of this implementation would need to be 
>> documented loudly: this ioctl should not be called lightly, since it 
>> can produce random stalls for the calling program by idling the GPU 
>> in order to modify the context image.
>>
>> I suspect ideally users who wanted to use a different number of slices 
>> could (should?) try hard to only set it before any work has been 
>> submitted against the context.
>>
>> In case they expect a need for multiple slice counts, would a good 
>> plan be to create multiple contexts?
>>
>> In fact, would a system-friendly way be to enforce that this ioctl can 
>> only be called on fresh contexts, and otherwise return an error?
> 
> It would be good to hear more details about the setup in which Dmitry 
> measured significant improvements.

Agreed.

> I remember this feature being discussed as "dynamic slice shutdown".
> That seems to imply that performance improvements were measured despite 
> having to switch between different slice power configurations.

I was going by the concern expressed recently that dynamic switching 
adds too much latency. Maybe that is my misunderstanding.

> If we go ahead with the max() formula, that would wipe the benefits.

No, because I describe below an option to change the global policy from 
'max' to 'dynamic'.

Heavy media users can then tweak in all the ways described below. But 
take a default system which is perhaps decoding one stream plus running 
some light load on the rcs - does that system care more whether the 
video decode is taking 25 or 30% on the vcs engine, or that no extra 
latency is incurred in context switching?

That's why I thought 'max' is a good default policy. So by default we 
maybe take a small hit on media performance (which is also not universal 
for all stream types), but 3D and OCL workloads stay at full potential.

And at the same time give the requested feature to our customers.
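
(For illustration, the admin-only policy toggle could be wired up as a
module parameter with a restricted setter; every name below is invented
for this sketch and nothing like it exists in i915 today:

	static int sseu_policy; /* 0 = max, 1 = dynamic */

	static int sseu_policy_set(const char *val,
				   const struct kernel_param *kp)
	{
		/* Only privileged users may change the global policy. */
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;

		if (sysfs_streq(val, "max"))
			sseu_policy = 0;
		else if (sysfs_streq(val, "dynamic"))
			sseu_policy = 1;
		else
			return -EINVAL;

		return 0;
	}

	static const struct kernel_param_ops sseu_policy_ops = {
		.set = sseu_policy_set,
		.get = param_get_int,
	};
	module_param_cb(sseu_policy, &sseu_policy_ops, &sseu_policy, 0644);

with intel_lr_context_set_sseu() then honouring the max() formula
unless the policy says dynamic.)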

Regards,

Tvrtko

>>
>> Otherwise it would be an easy DoS vector for a misbehaving userspace.
>>
>> The second concern is the cost of powering slices up and down on context 
>> switches, which needs to be measured by someone who plans to use this 
>> feature a lot.
>>
>> Following from this is a question of whether or not i915 should allow 
>> dynamic switching by default. Joonas has suggested that i915 should 
>> apply a max() formula across all programmed slice counts, which sounds like 
>> a good and safe default behaviour.
>>
>> Otherwise a client who wanted to run with a lower number of slices 
>> would cause a switching cost penalty to other clients. And given that 
>> the only known user of a decreased slice count is media, where the 
>> performance penalty has been estimated as up to 10-20% (Dmitry please 
>> correct if wrong), it sounds reasonable to me that it, as the only 
>> user, is the only party paying the performance penalty on an 
>> _otherwise_ _unconfigured_ system.
>>
>> What also sounds good to me is to add a global i915 tunable policy, 
>> protected with CAP_SYS_ADMIN, to choose between the 'max' and 
>> 'dynamic' behaviour across all configured slice counts.
>>
>> Max would behave as described above - there would be no dynamic 
>> switching, and the GPU would run with the largest number of slices as 
>> configured by any client.
>>
>> Clients could also agree between themselves to collectively request a 
>> lower slice count if they deemed that to be a better option for their 
>> combined workloads.
>>
>> But if clients thought that the cost of dynamic switching is less than 
>> the individual performance penalty caused by running with a 
>> sub-optimal number of slices, they could also agree to set the global 
>> policy to 'dynamic'.
>>
>> This would cause i915 to program each context image with its own 
>> slice count and so would cause slice powering up and down at context 
>> switch time.
>>
>> It sounds like the above would enable a safe and reasonably performant 
>> default, plus options to configure and tweak to any possible 
>> configuration. Thoughts, comments etc?
>>
>> Regards,
>>
>> Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-10-05 15:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 15:10 [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Lionel Landwerlin
2017-09-22 15:10 ` [RFC PATCH v2 1/5] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
2017-09-29 11:09   ` Joonas Lahtinen
2017-09-22 15:10 ` [RFC PATCH v2 2/5] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Lionel Landwerlin
2017-09-22 15:10 ` [RFC PATCH v2 3/5] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2017-09-22 15:10 ` [RFC PATCH v2 4/5] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2017-09-29 11:18   ` Joonas Lahtinen
2017-09-22 15:10 ` [RFC PATCH v2 5/5] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2017-10-05 11:57   ` Tvrtko Ursulin
2017-10-05 13:24     ` Lionel Landwerlin
2017-10-05 15:17       ` Tvrtko Ursulin
2017-09-22 15:33 ` ✓ Fi.CI.BAT: success for drm/i915: enable userspace to program slice/subslice programming (rev2) Patchwork
2017-09-22 20:41 ` ✗ Fi.CI.IGT: failure " Patchwork
2017-09-29 11:20 ` [RFC PATCH v2 0/5] drm/i915: enable userspace to program slice/subslice programming Joonas Lahtinen
2017-09-29 11:33   ` Lionel Landwerlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.