* Re: [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 11:49 ` [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace Chris Wilson
@ 2017-05-02 10:33   ` Oscar Mateo
  2017-05-02 19:55     ` Chris Wilson
  2017-05-09  9:52   ` Lionel Landwerlin
  2017-07-11 17:14   ` Lionel Landwerlin
  2 siblings, 1 reply; 13+ messages in thread
From: Oscar Mateo @ 2017-05-02 10:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx



On 05/02/2017 11:49 AM, Chris Wilson wrote:
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI).

Userspace could also do this on its own via LRI if we simply 
whitelisted GEN8_R_PWR_CLK_STATE.

Hardware people suggested this programming model:

- PIPE_CONTROL - Stalling flush, flush all caches (color, depth, DC$)
- LOAD_REGISTER_IMMEDIATE - R_PWR_CLK_STATE
- Reprogram complete state

> If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.

There is another cost associated with the adjustment: slice power-on 
and shutdown take some time to happen (on the order of tens of usecs). 
I have been playing with an i-g-t benchmark to measure this delay; I'll 
send it to the mailing list.

> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
>
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
>
> 	struct drm_i915_gem_context_param arg;
>
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		union drm_i915_gem_context_param_sseu *sseu =
> 			(union drm_i915_gem_context_param_sseu *)&arg.value;
>
> 		sseu->packed.subslice_mask = 0;
>
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
>
> could be used to disable all subslices where supported.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 63 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.h        |  3 ++
>   include/uapi/drm/i915_drm.h             | 11 ++++++
>   4 files changed, 89 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 71ca74bcf170..0b72f9f62ddb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1044,6 +1044,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		args->value = ctx->priority;
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		args->value = intel_lr_context_get_sseu(ctx);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1120,6 +1123,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		}
>   		break;
>   
> +	case I915_CONTEXT_PARAM_SSEU:
> +		if (args->size)
> +			ret = -EINVAL;
> +		else if (!i915.enable_execlists)
> +			ret = -ENODEV;
> +		else
> +			ret = intel_lr_context_set_sseu(ctx, args->value);
> +		break;
> +
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a3183298b993..e4e2eefd4854 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2064,3 +2064,66 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>   		}
>   	}
>   }
> +
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, u64 value)
> +{
> +	union drm_i915_gem_context_param_sseu user = { .value = value };
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct intel_context *ce = &ctx->engine[RCS];
> +	struct sseu_dev_info sseu = ctx->sseu;
> +	int ret;
> +
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +	sseu.slice_mask =
> +		user.packed.slice_mask & INTEL_INFO(i915)->sseu.slice_mask;
> +	sseu.subslice_mask =
> +		user.packed.subslice_mask & INTEL_INFO(i915)->sseu.subslice_mask;
> +	sseu.min_eu_per_subslice =
> +		max(user.packed.min_eu_per_subslice,
> +		    INTEL_INFO(i915)->sseu.min_eu_per_subslice);
> +	sseu.max_eu_per_subslice =
> +		min(user.packed.max_eu_per_subslice,
> +		    INTEL_INFO(i915)->sseu.max_eu_per_subslice);
> +
> +	if (memcmp(&sseu, &ctx->sseu, sizeof(sseu)) == 0)
> +		return 0;
> +
> +	if (ce->pin_count) { /* Assume that the context is active! */
> +		ret = i915_gem_switch_to_kernel_context(i915);
> +		if (ret)
> +			return ret;
> +
> +		ret = i915_gem_wait_for_idle(i915,
> +					     I915_WAIT_INTERRUPTIBLE |
> +					     I915_WAIT_LOCKED);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (ce->state) {
> +		u32 *regs;
> +
> +		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> +		if (IS_ERR(regs))
> +			return PTR_ERR(regs);
> +
> +		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(&sseu);
> +		i915_gem_object_unpin_map(ce->state->obj);
> +	}
> +
> +	ctx->sseu = sseu;
> +	return 0;
> +}
> +
> +u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx)
> +{
> +	union drm_i915_gem_context_param_sseu user;
> +
> +	user.packed.slice_mask = ctx->sseu.slice_mask;
> +	user.packed.subslice_mask = ctx->sseu.subslice_mask;
> +	user.packed.min_eu_per_subslice = ctx->sseu.min_eu_per_subslice;
> +	user.packed.max_eu_per_subslice = ctx->sseu.max_eu_per_subslice;
> +
> +	return user.value;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 52b3a1fd4059..e4d811892382 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -82,6 +82,9 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   uint64_t intel_lr_context_descriptor(struct i915_gem_context *ctx,
>   				     struct intel_engine_cs *engine);
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, u64 value);
> +u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx);
> +
>   /* Execlists */
>   int intel_sanitize_enable_execlists(struct drm_i915_private *dev_priv,
>   				    int enable_execlists);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 34ee011f08ac..106d9140d65e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1327,6 +1327,17 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
> +	__u64 value;
> +};
> +
> +union drm_i915_gem_context_param_sseu {
> +	struct {
> +		u8 slice_mask;
> +		u8 subslice_mask;
> +		u8 min_eu_per_subslice;
> +		u8 max_eu_per_subslice;
> +	} packed;
>   	__u64 value;
>   };
>   

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
@ 2017-05-02 11:49 Chris Wilson
  2017-05-02 11:49 ` [RFC 2/4] drm/i915: Program RPCS for Broadwell Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Chris Wilson @ 2017-05-02 11:49 UTC (permalink / raw)
  To: intel-gfx

When we query the available eu on each subslice, we currently only
report the max. It would be useful to report the minimum found as well.

When we set RPCS (power gating over the EU), we can also specify both
the min and max number of eu to configure on each slice; currently we
just set it to a single value, but the flexibility may be beneficial in
future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 36 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h          |  3 ++-
 drivers/gpu/drm/i915/intel_device_info.c | 32 +++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_lrc.c         |  4 ++--
 4 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a2472048b84d..e15c4609375e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4480,6 +4480,7 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_cache_sharing_fops,
 static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 					  struct sseu_dev_info *sseu)
 {
+	unsigned int min_eu_per_subslice, max_eu_per_subslice;
 	int ss_max = 2;
 	int ss;
 	u32 sig1[ss_max], sig2[ss_max];
@@ -4489,6 +4490,9 @@ static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 	sig2[0] = I915_READ(CHV_POWER_SS0_SIG2);
 	sig2[1] = I915_READ(CHV_POWER_SS1_SIG2);
 
+	min_eu_per_subslice = ~0u;
+	max_eu_per_subslice = 0;
+
 	for (ss = 0; ss < ss_max; ss++) {
 		unsigned int eu_cnt;
 
@@ -4503,14 +4507,18 @@ static void cherryview_sseu_device_status(struct drm_i915_private *dev_priv,
 			 ((sig1[ss] & CHV_EU210_PG_ENABLE) ? 0 : 2) +
 			 ((sig2[ss] & CHV_EU311_PG_ENABLE) ? 0 : 2);
 		sseu->eu_total += eu_cnt;
-		sseu->eu_per_subslice = max_t(unsigned int,
-					      sseu->eu_per_subslice, eu_cnt);
+		min_eu_per_subslice = min(min_eu_per_subslice, eu_cnt);
+		max_eu_per_subslice = max(max_eu_per_subslice, eu_cnt);
 	}
+
+	sseu->min_eu_per_subslice = min_eu_per_subslice;
+	sseu->max_eu_per_subslice = max_eu_per_subslice;
 }
 
 static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 				    struct sseu_dev_info *sseu)
 {
+	unsigned int min_eu_per_subslice, max_eu_per_subslice;
 	int s_max = 3, ss_max = 4;
 	int s, ss;
 	u32 s_reg[s_max], eu_reg[2*s_max], eu_mask[2];
@@ -4536,6 +4544,9 @@ static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 		     GEN9_PGCTL_SSB_EU210_ACK |
 		     GEN9_PGCTL_SSB_EU311_ACK;
 
+	min_eu_per_subslice = ~0u;
+	max_eu_per_subslice = 0;
+
 	for (s = 0; s < s_max; s++) {
 		if ((s_reg[s] & GEN9_PGCTL_SLICE_ACK) == 0)
 			/* skip disabled slice */
@@ -4561,11 +4572,14 @@ static void gen9_sseu_device_status(struct drm_i915_private *dev_priv,
 			eu_cnt = 2 * hweight32(eu_reg[2*s + ss/2] &
 					       eu_mask[ss%2]);
 			sseu->eu_total += eu_cnt;
-			sseu->eu_per_subslice = max_t(unsigned int,
-						      sseu->eu_per_subslice,
-						      eu_cnt);
+
+			min_eu_per_subslice = min(min_eu_per_subslice, eu_cnt);
+			max_eu_per_subslice = max(max_eu_per_subslice, eu_cnt);
 		}
 	}
+
+	sseu->min_eu_per_subslice = min_eu_per_subslice;
+	sseu->max_eu_per_subslice = max_eu_per_subslice;
 }
 
 static void broadwell_sseu_device_status(struct drm_i915_private *dev_priv,
@@ -4578,9 +4592,11 @@ static void broadwell_sseu_device_status(struct drm_i915_private *dev_priv,
 
 	if (sseu->slice_mask) {
 		sseu->subslice_mask = INTEL_INFO(dev_priv)->sseu.subslice_mask;
-		sseu->eu_per_subslice =
-				INTEL_INFO(dev_priv)->sseu.eu_per_subslice;
-		sseu->eu_total = sseu->eu_per_subslice *
+		sseu->min_eu_per_subslice =
+			INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice;
+		sseu->max_eu_per_subslice =
+			INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice;
+		sseu->eu_total = sseu->max_eu_per_subslice *
 				 sseu_subslice_total(sseu);
 
 		/* subtract fused off EU(s) from enabled slice(s) */
@@ -4611,8 +4627,8 @@ static void i915_print_sseu_info(struct seq_file *m, bool is_available_info,
 		   hweight8(sseu->subslice_mask));
 	seq_printf(m, "  %s EU Total: %u\n", type,
 		   sseu->eu_total);
-	seq_printf(m, "  %s EU Per Subslice: %u\n", type,
-		   sseu->eu_per_subslice);
+	seq_printf(m, "  %s EU Per Subslice: [%u, %u]\n", type,
+		   sseu->min_eu_per_subslice, sseu->max_eu_per_subslice);
 
 	if (!is_available_info)
 		return;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ff12d11a36ac..9b69fc8fb0c8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -844,7 +844,8 @@ struct sseu_dev_info {
 	u8 slice_mask;
 	u8 subslice_mask;
 	u8 eu_total;
-	u8 eu_per_subslice;
+	u8 min_eu_per_subslice;
+	u8 max_eu_per_subslice;
 	u8 min_eu_in_pool;
 	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
 	u8 subslice_7eu[3];
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 7d01dfe7faac..0853e1f26230 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -83,6 +83,7 @@ void intel_device_info_dump(struct drm_i915_private *dev_priv)
 static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct sseu_dev_info *sseu = &mkwrite_device_info(dev_priv)->sseu;
+	unsigned int eu_per_subslice;
 	u32 fuse, eu_dis;
 
 	fuse = I915_READ(CHV_FUSE_GT);
@@ -107,9 +108,10 @@ static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * CHV expected to always have a uniform distribution of EU
 	 * across subslices.
 	*/
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				sseu->eu_total / sseu_subslice_total(sseu) :
-				0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		sseu->eu_total / sseu_subslice_total(sseu) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 	/*
 	 * CHV supports subslice power gating on devices with more than
 	 * one subslice, and supports EU power gating on devices with
@@ -117,13 +119,14 @@ static void cherryview_sseu_info_init(struct drm_i915_private *dev_priv)
 	*/
 	sseu->has_slice_pg = 0;
 	sseu->has_subslice_pg = sseu_subslice_total(sseu) > 1;
-	sseu->has_eu_pg = (sseu->eu_per_subslice > 2);
+	sseu->has_eu_pg = eu_per_subslice > 2;
 }
 
 static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct intel_device_info *info = mkwrite_device_info(dev_priv);
 	struct sseu_dev_info *sseu = &info->sseu;
+	unsigned int eu_per_subslice;
 	int s_max = 3, ss_max = 4, eu_max = 8;
 	int s, ss;
 	u32 fuse2, eu_disable;
@@ -179,9 +182,10 @@ static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * recovery. BXT is expected to be perfectly uniform in EU
 	 * distribution.
 	*/
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				DIV_ROUND_UP(sseu->eu_total,
-					     sseu_subslice_total(sseu)) : 0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		DIV_ROUND_UP(sseu->eu_total, sseu_subslice_total(sseu)) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 	/*
 	 * SKL supports slice power gating on devices with more than
 	 * one slice, and supports EU power gating on devices with
@@ -195,7 +199,7 @@ static void gen9_sseu_info_init(struct drm_i915_private *dev_priv)
 		hweight8(sseu->slice_mask) > 1;
 	sseu->has_subslice_pg =
 		IS_GEN9_LP(dev_priv) && sseu_subslice_total(sseu) > 1;
-	sseu->has_eu_pg = sseu->eu_per_subslice > 2;
+	sseu->has_eu_pg = eu_per_subslice > 2;
 
 	if (IS_GEN9_LP(dev_priv)) {
 #define IS_SS_DISABLED(ss)	(!(sseu->subslice_mask & BIT(ss)))
@@ -228,6 +232,7 @@ static void broadwell_sseu_info_init(struct drm_i915_private *dev_priv)
 {
 	struct sseu_dev_info *sseu = &mkwrite_device_info(dev_priv)->sseu;
 	const int s_max = 3, ss_max = 3, eu_max = 8;
+	unsigned int eu_per_subslice;
 	int s, ss;
 	u32 fuse2, eu_disable[3]; /* s_max */
 
@@ -282,9 +287,10 @@ static void broadwell_sseu_info_init(struct drm_i915_private *dev_priv)
 	 * subslices with the exception that any one EU in any one subslice may
 	 * be fused off for die recovery.
 	 */
-	sseu->eu_per_subslice = sseu_subslice_total(sseu) ?
-				DIV_ROUND_UP(sseu->eu_total,
-					     sseu_subslice_total(sseu)) : 0;
+	eu_per_subslice = sseu_subslice_total(sseu) ?
+		DIV_ROUND_UP(sseu->eu_total, sseu_subslice_total(sseu)) : 0;
+	sseu->min_eu_per_subslice = eu_per_subslice;
+	sseu->max_eu_per_subslice = eu_per_subslice;
 
 	/*
 	 * BDW supports slice power gating on devices with more than
@@ -421,7 +427,9 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("subslice per slice: %u\n",
 			 hweight8(info->sseu.subslice_mask));
 	DRM_DEBUG_DRIVER("EU total: %u\n", info->sseu.eu_total);
-	DRM_DEBUG_DRIVER("EU per subslice: %u\n", info->sseu.eu_per_subslice);
+	DRM_DEBUG_DRIVER("EU per subslice: [%u, %u]\n",
+			 info->sseu.min_eu_per_subslice,
+			 info->sseu.max_eu_per_subslice);
 	DRM_DEBUG_DRIVER("has slice power gating: %s\n",
 			 info->sseu.has_slice_pg ? "y" : "n");
 	DRM_DEBUG_DRIVER("has subslice power gating: %s\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 42c776c42212..45c187abf10a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1831,9 +1831,9 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	}
 
 	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
-- 
2.11.0



* [RFC 2/4] drm/i915: Program RPCS for Broadwell
  2017-05-02 11:49 [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Chris Wilson
@ 2017-05-02 11:49 ` Chris Wilson
  2017-05-02 12:22   ` Joonas Lahtinen
  2017-05-02 19:32   ` Lionel Landwerlin
  2017-05-02 11:49 ` [RFC 3/4] drm/i915: Record the sseu configuration per-context Chris Wilson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2017-05-02 11:49 UTC (permalink / raw)
  To: intel-gfx

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 45c187abf10a..9add516d66c2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1804,13 +1804,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	u32 rpcs = 0;
 
 	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
-	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
 	 * must make an explicit request through RPCS for full
-- 
2.11.0



* [RFC 3/4] drm/i915: Record the sseu configuration per-context
  2017-05-02 11:49 [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Chris Wilson
  2017-05-02 11:49 ` [RFC 2/4] drm/i915: Program RPCS for Broadwell Chris Wilson
@ 2017-05-02 11:49 ` Chris Wilson
  2017-05-02 11:49 ` [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2017-05-02 11:49 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will expose the ability to reconfigure the slice,
subslice and EU configuration per context. To facilitate that, store the current
configuration on the context, which is initially set to the device
default upon creation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 19 -------------------
 drivers/gpu/drm/i915/i915_gem_context.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_context.h | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 23 +++++++++--------------
 4 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9b69fc8fb0c8..a8a622078849 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -840,25 +840,6 @@ struct intel_csr {
 	func(overlay_needs_physical); \
 	func(supports_tv);
 
-struct sseu_dev_info {
-	u8 slice_mask;
-	u8 subslice_mask;
-	u8 eu_total;
-	u8 min_eu_per_subslice;
-	u8 max_eu_per_subslice;
-	u8 min_eu_in_pool;
-	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
-	u8 subslice_7eu[3];
-	u8 has_slice_pg:1;
-	u8 has_subslice_pg:1;
-	u8 has_eu_pg:1;
-};
-
-static inline unsigned int sseu_subslice_total(const struct sseu_dev_info *sseu)
-{
-	return hweight8(sseu->slice_mask) * hweight8(sseu->subslice_mask);
-}
-
 /* Keep in gen based order, and chronological order within a gen */
 enum intel_platform {
 	INTEL_PLATFORM_UNINITIALIZED = 0,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 612d1045f90a..71ca74bcf170 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -308,6 +308,9 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* Use the whole device by default */
+	ctx->sseu = INTEL_INFO(dev_priv)->sseu;
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 82c99ba92ad3..7c07791e804e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -39,6 +39,25 @@ struct i915_hw_ppgtt;
 struct i915_vma;
 struct intel_ring;
 
+struct sseu_dev_info {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 eu_total;
+	u8 min_eu_per_subslice;
+	u8 max_eu_per_subslice;
+	u8 min_eu_in_pool;
+	/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
+	u8 subslice_7eu[3];
+	u8 has_slice_pg:1;
+	u8 has_subslice_pg:1;
+	u8 has_eu_pg:1;
+};
+
+static inline unsigned int sseu_subslice_total(const struct sseu_dev_info *sseu)
+{
+	return hweight8(sseu->slice_mask) * hweight8(sseu->subslice_mask);
+}
+
 #define DEFAULT_CONTEXT_HANDLE 0
 
 /**
@@ -199,6 +218,9 @@ struct i915_gem_context {
 
 	/** remap_slice: Bitmask of cache lines that need remapping */
 	u8 remap_slice;
+
+	/** sseu: Control eu/slice partitioning */
+	struct sseu_dev_info sseu;
 };
 
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9add516d66c2..a3183298b993 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1798,8 +1798,7 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu)
 {
 	u32 rpcs = 0;
 
@@ -1809,25 +1808,21 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(sseu->slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= hweight8(sseu->subslice_mask) << GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.min_eu_per_subslice <<
-			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.max_eu_per_subslice <<
-			GEN8_RPCS_EU_MAX_SHIFT;
+	if (sseu->has_eu_pg) {
+		rpcs |= sseu->min_eu_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
+		rpcs |= sseu->max_eu_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
@@ -1937,7 +1932,7 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&ctx->sseu));
 	}
 }
 
-- 
2.11.0



* [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 11:49 [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Chris Wilson
  2017-05-02 11:49 ` [RFC 2/4] drm/i915: Program RPCS for Broadwell Chris Wilson
  2017-05-02 11:49 ` [RFC 3/4] drm/i915: Record the sseu configuration per-context Chris Wilson
@ 2017-05-02 11:49 ` Chris Wilson
  2017-05-02 10:33   ` Oscar Mateo
                     ` (2 more replies)
  2017-05-02 12:08 ` ✓ Fi.CI.BAT: success for series starting with [RFC,1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Patchwork
  2017-05-02 12:13 ` [RFC 1/4] " Joonas Lahtinen
  4 siblings, 3 replies; 13+ messages in thread
From: Chris Wilson @ 2017-05-02 11:49 UTC (permalink / raw)
  To: intel-gfx

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		union drm_i915_gem_context_param_sseu *sseu =
			(union drm_i915_gem_context_param_sseu *)&arg.value;

		sseu->packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 63 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  3 ++
 include/uapi/drm/i915_drm.h             | 11 ++++++
 4 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 71ca74bcf170..0b72f9f62ddb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1044,6 +1044,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		args->value = intel_lr_context_get_sseu(ctx);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1120,6 +1123,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		}
 		break;
 
+	case I915_CONTEXT_PARAM_SSEU:
+		if (args->size)
+			ret = -EINVAL;
+		else if (!i915.enable_execlists)
+			ret = -ENODEV;
+		else
+			ret = intel_lr_context_set_sseu(ctx, args->value);
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a3183298b993..e4e2eefd4854 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2064,3 +2064,66 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 		}
 	}
 }
+
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx, u64 value)
+{
+	union drm_i915_gem_context_param_sseu user = { .value = value };
+	struct drm_i915_private *i915 = ctx->i915;
+	struct intel_context *ce = &ctx->engine[RCS];
+	struct sseu_dev_info sseu = ctx->sseu;
+	int ret;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	sseu.slice_mask =
+		user.packed.slice_mask & INTEL_INFO(i915)->sseu.slice_mask;
+	sseu.subslice_mask =
+		user.packed.subslice_mask & INTEL_INFO(i915)->sseu.subslice_mask;
+	sseu.min_eu_per_subslice =
+		max(user.packed.min_eu_per_subslice,
+		    INTEL_INFO(i915)->sseu.min_eu_per_subslice);
+	sseu.max_eu_per_subslice =
+		min(user.packed.max_eu_per_subslice,
+		    INTEL_INFO(i915)->sseu.max_eu_per_subslice);
+
+	if (memcmp(&sseu, &ctx->sseu, sizeof(sseu)) == 0)
+		return 0;
+
+	if (ce->pin_count) { /* Assume that the context is active! */
+		ret = i915_gem_switch_to_kernel_context(i915);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_wait_for_idle(i915,
+					     I915_WAIT_INTERRUPTIBLE |
+					     I915_WAIT_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (ce->state) {
+		u32 *regs;
+
+		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
+
+		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(&sseu);
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	ctx->sseu = sseu;
+	return 0;
+}
+
+u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx)
+{
+	union drm_i915_gem_context_param_sseu user;
+
+	user.packed.slice_mask = ctx->sseu.slice_mask;
+	user.packed.subslice_mask = ctx->sseu.subslice_mask;
+	user.packed.min_eu_per_subslice = ctx->sseu.min_eu_per_subslice;
+	user.packed.max_eu_per_subslice = ctx->sseu.max_eu_per_subslice;
+
+	return user.value;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 52b3a1fd4059..e4d811892382 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -82,6 +82,9 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 uint64_t intel_lr_context_descriptor(struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine);
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx, u64 value);
+u64 intel_lr_context_get_sseu(struct i915_gem_context *ctx);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_i915_private *dev_priv,
 				    int enable_execlists);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 34ee011f08ac..106d9140d65e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1327,6 +1327,17 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+#define I915_CONTEXT_PARAM_SSEU		0x7
+	__u64 value;
+};
+
+union drm_i915_gem_context_param_sseu {
+	struct {
+		__u8 slice_mask;
+		__u8 subslice_mask;
+		__u8 min_eu_per_subslice;
+		__u8 max_eu_per_subslice;
+	} packed;
 	__u64 value;
 };
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✓ Fi.CI.BAT: success for series starting with [RFC,1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
  2017-05-02 11:49 [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Chris Wilson
                   ` (2 preceding siblings ...)
  2017-05-02 11:49 ` [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace Chris Wilson
@ 2017-05-02 12:08 ` Patchwork
  2017-05-02 12:13 ` [RFC 1/4] " Joonas Lahtinen
  4 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2017-05-02 12:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [RFC,1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
URL   : https://patchwork.freedesktop.org/series/23802/
State : success

== Summary ==

Series 23802v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/23802/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> FAIL       (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                dmesg-warn -> PASS       (fi-kbl-7560u) fdo#100125

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:430s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:426s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:578s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:509s
fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time:561s
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:489s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:483s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:408s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:403s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:410s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:488s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:469s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
fi-kbl-7560u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:566s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:452s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:577s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:460s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:485s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:430s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:532s
fi-snb-2600      total:278  pass:248  dwarn:0   dfail:0   fail:1   skip:29  time:401s

0d90375e6d38f986224e4dfce1674c5b093590cc drm-tip: 2017y-05m-02d-09h-11m-43s UTC integration manifest
b39d79e drm/i915: Record the sseu configuration per-context
4f22f59 drm/i915: Program RPCS for Broadwell
0c72600 drm/i915: Record both min/max eu_per_subslice in sseu_dev_info

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4593/


* Re: [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info
  2017-05-02 11:49 [RFC 1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Chris Wilson
                   ` (3 preceding siblings ...)
  2017-05-02 12:08 ` ✓ Fi.CI.BAT: success for series starting with [RFC,1/4] drm/i915: Record both min/max eu_per_subslice in sseu_dev_info Patchwork
@ 2017-05-02 12:13 ` Joonas Lahtinen
  4 siblings, 0 replies; 13+ messages in thread
From: Joonas Lahtinen @ 2017-05-02 12:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ti, 2017-05-02 at 12:49 +0100, Chris Wilson wrote:
> When we query the available eu on each subslice, we currently only
> report the max. It would also be useful to report the minimum found as
> well.
> 
> When we set RPCS (power gating over the EU), we can also specify both
> the min and max number of eu to configure on each slice; currently we
> just set it to a single value, but the flexibility may be beneficial in
> future.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


* Re: [RFC 2/4] drm/i915: Program RPCS for Broadwell
  2017-05-02 11:49 ` [RFC 2/4] drm/i915: Program RPCS for Broadwell Chris Wilson
@ 2017-05-02 12:22   ` Joonas Lahtinen
  2017-05-02 19:32   ` Lionel Landwerlin
  1 sibling, 0 replies; 13+ messages in thread
From: Joonas Lahtinen @ 2017-05-02 12:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ti, 2017-05-02 at 12:49 +0100, Chris Wilson wrote:
> Currently we only configure the power gating for Skylake and above, but
> the configuration should equally apply to Broadwell and Braswell. Even
> though, there is not as much variation as for later generations, we want
> to expose control over the configuration to userspace and may want to
> opt out of the "always-enabled" setting.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This craves Tested-by tags because it exercises new portions of HW.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


* Re: [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 19:55     ` Chris Wilson
@ 2017-05-02 15:00       ` Oscar Mateo
  0 siblings, 0 replies; 13+ messages in thread
From: Oscar Mateo @ 2017-05-02 15:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 05/02/2017 07:55 PM, Chris Wilson wrote:
> On Tue, May 02, 2017 at 10:33:19AM +0000, Oscar Mateo wrote:
>>
>> On 05/02/2017 11:49 AM, Chris Wilson wrote:
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI).
>> Userspace could also do this by themselves via LRI if we simply
>> whitelist GEN8_R_PWR_CLK_STATE.
>>
>> Hardware people suggested this programming model:
>>
>> - PIPECONTROL - Stalling flush, flush all caches (color, depth, DC$)
>> - LOAD_REGISTER_IMMEDIATE - R_PWR_CLK_STATE
>> - Reprogram complete state
> Hmm, treating it as a complete state wipe is a nuisance, but fairly
> trivial. The simplest way will be for the user to execute the LRI batch
> as part of creating the context. But there will be some use cases where
> dynamic reconfiguration within an active context will be desired, I'm
> sure.

Exactly, this way the UMD gets the best of both worlds: it can do 
the LRI once and forget about it, or reconfigure on demand.

>>> If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and forcing
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>> There is another cost associated with the adjustment: slice poweron
>> and shutdown do take some time to happen (in the order of tens of
>> usecs). I have been playing with an i-g-t benchmark to measure this
>> delay, I'll send it to the mailing list.
> Hmm, I thought the argument for why selecting smaller subslices gave
> better performance was that it was restoring the whole set between
> contexts, even when the configuration between contexts was the same.

Hmmm... it's the first time I hear that particular argument. I can 
definitely see the delay when changing the configuration (also, powering 
slices on takes a little longer than shutting them down) but no 
difference when I am just switching between contexts with the same 
configuration.
Until now, the most convincing argument I've heard is that thread 
scheduling is much more efficient with just one slice when you don't 
really need more, but maybe that doesn't explain the whole picture.

> As always numbers demonstrating the advantage, perhaps explaining why
> it helps, and also for spotting when we break it are most welcome :)
> -Chris

I can provide numbers for the slice configuration delay (numbers that 
have to be taken into account by the UMD when deciding which 
configuration to use) but I think Dimitry is in a better position to 
provide numbers for the advantage.



* Re: [RFC 2/4] drm/i915: Program RPCS for Broadwell
  2017-05-02 11:49 ` [RFC 2/4] drm/i915: Program RPCS for Broadwell Chris Wilson
  2017-05-02 12:22   ` Joonas Lahtinen
@ 2017-05-02 19:32   ` Lionel Landwerlin
  1 sibling, 0 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2017-05-02 19:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 02/05/17 04:49, Chris Wilson wrote:
> Currently we only configure the power gating for Skylake and above, but
> the configuration should equally apply to Broadwell and Braswell. Even
> though, there is not as much variation as for later generations, we want
> to expose control over the configuration to userspace and may want to
> opt out of the "always-enabled" setting.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 7 -------
>   1 file changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 45c187abf10a..9add516d66c2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1804,13 +1804,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
>   	u32 rpcs = 0;
>   
>   	/*
> -	 * No explicit RPCS request is needed to ensure full
> -	 * slice/subslice/EU enablement prior to Gen9.
> -	*/
> -	if (INTEL_GEN(dev_priv) < 9)
> -		return 0;

Makes sense. Comments were confusing too.

> -
> -	/*
>   	 * Starting in Gen9, render power gating can leave
>   	 * slice/subslice/EU in a partially enabled state. We
>   	 * must make an explicit request through RPCS for full




* Re: [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 10:33   ` Oscar Mateo
@ 2017-05-02 19:55     ` Chris Wilson
  2017-05-02 15:00       ` Oscar Mateo
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-05-02 19:55 UTC (permalink / raw)
  To: Oscar Mateo; +Cc: intel-gfx

On Tue, May 02, 2017 at 10:33:19AM +0000, Oscar Mateo wrote:
> 
> 
> On 05/02/2017 11:49 AM, Chris Wilson wrote:
> >We want to allow userspace to reconfigure the subslice configuration for
> >its own use case. To do so, we expose a context parameter to allow
> >adjustment of the RPCS register stored within the context image (and
> >currently not accessible via LRI).
> 
> Userspace could also do this by themselves via LRI if we simply
> whitelist GEN8_R_PWR_CLK_STATE.
> 
> Hardware people suggested this programming model:
> 
> - PIPECONTROL - Stalling flush, flush all caches (color, depth, DC$)
> - LOAD_REGISTER_IMMEDIATE - R_PWR_CLK_STATE
> - Reprogram complete state

Hmm, treating it as a complete state wipe is a nuisance, but fairly
trivial. The simplest way will be for the user to execute the LRI batch
as part of creating the context. But there will be some use cases where
dynamic reconfiguration within an active context will be desired, I'm
sure.

> >If the context is adjusted before
> >first use, the adjustment is for "free"; otherwise if the context is
> >active we flush the context off the GPU (stalling all users) and forcing
> >the GPU to save the context to memory where we can modify it and so
> >ensure that the register is reloaded on next execution.
> 
> There is another cost associated with the adjustment: slice poweron
> and shutdown do take some time to happen (in the order of tens of
> usecs). I have been playing with an i-g-t benchmark to measure this
> delay, I'll send it to the mailing list.

Hmm, I thought the argument for why selecting smaller subslices gave
better performance was that it was restoring the whole set between
contexts, even when the configuration between contexts was the same.

As always numbers demonstrating the advantage, perhaps explaining why
it helps, and also for spotting when we break it are most welcome :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 11:49 ` [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace Chris Wilson
  2017-05-02 10:33   ` Oscar Mateo
@ 2017-05-09  9:52   ` Lionel Landwerlin
  2017-07-11 17:14   ` Lionel Landwerlin
  2 siblings, 0 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2017-05-09  9:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 02/05/17 12:49, Chris Wilson wrote:
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 34ee011f08ac..106d9140d65e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1327,6 +1327,17 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
> +	__u64 value;
> +};
> +
> +union drm_i915_gem_context_param_sseu {
> +	struct {
> +		u8 slice_mask;
> +		u8 subslice_mask;
> +		u8 min_eu_per_subslice;
> +		u8 max_eu_per_subslice;
> +	} packed;
>   	__u64 value;
>   };
>   
Would it make sense to expose this as a query through 
DRM_IOCTL_I915_GETPARAM as well, so that userspace could get the max 
values rather than just the last setup?

Cheers,

-
Lionel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2017-05-02 11:49 ` [RFC 4/4] drm/i915: Expose RPCS (SSEU) configuration to userspace Chris Wilson
  2017-05-02 10:33   ` Oscar Mateo
  2017-05-09  9:52   ` Lionel Landwerlin
@ 2017-07-11 17:14   ` Lionel Landwerlin
  2 siblings, 0 replies; 13+ messages in thread
From: Lionel Landwerlin @ 2017-07-11 17:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 02/05/17 12:49, Chris Wilson wrote:
> +
> +union drm_i915_gem_context_param_sseu {
> +	struct {
> +		u8 slice_mask;
> +		u8 subslice_mask;
> +		u8 min_eu_per_subslice;
> +		u8 max_eu_per_subslice;
> +	} packed;
>   	__u64 value;
>   };

I think we should probably rename slice_mask & subslice_mask here and 
anywhere else in the kernel as well.
Those aren't actually masks but counts (i.e. the number of 
slices/subslices to power on or off). It doesn't look like we can 
actually control which ones are on or off.

-
Lionel



