* [PATCH v5 0/7] drm/i915: per context slice/subslice powergating
@ 2018-05-14 15:55 Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

Hi,

This is a different approach to the perf & powergating interaction
problem. In this version we simply disable powergating while
i915/perf is active. We already do something similar by disabling
RC6 for similar reasons.

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (4):
  drm/i915/perf: simplify configure all context function
  drm/i915/perf: reuse intel_lrc ctx regs macro
  drm/i915/perf: lock powergating configuration to default when active
  drm/i915: count powergating transitions per engine

 drivers/gpu/drm/i915/i915_gem_context.c | 178 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h |  16 +++
 drivers/gpu/drm/i915/i915_perf.c        |  69 +++++----
 drivers/gpu/drm/i915/i915_request.c     |   2 +
 drivers/gpu/drm/i915/i915_request.h     |  24 ++++
 drivers/gpu/drm/i915/intel_engine_cs.c  |   5 +
 drivers/gpu/drm/i915/intel_lrc.c        | 148 ++++++++++++++------
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  16 +++
 include/uapi/drm/i915_drm.h             |  38 +++++
 11 files changed, 425 insertions(+), 76 deletions(-)

--
2.17.0

* [PATCH v5 1/7] drm/i915: Program RPCS for Broadwell
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 15434cad5430..690b41b751ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2397,13 +2397,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.0


* [PATCH v5 2/7] drm/i915: Record the sseu configuration per-context & engine
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slice, subslice and
EU configuration per context and per engine. To facilitate that, store
the current configuration on the context for each engine; it is
initially set to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem_context.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_request.h     | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 22 +++++++++++-----------
 4 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 33f8a4b3c981..01310c99e032 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -266,6 +266,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -314,6 +316,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = to_intel_context(ctx, engine);
+
+		ce->sseu = intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+	}
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ace3b129c189..5a2d10f03787 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -30,6 +30,7 @@
 #include <linux/radix-tree.h>
 
 #include "i915_gem.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -149,6 +150,8 @@ struct i915_gem_context {
 		u32 *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
+		/** sseu: Control eu/slice partitioning */
+		union intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
@@ -326,4 +329,17 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
+static inline union intel_sseu
+intel_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
+{
+	union intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index eddbd4245cb3..beb312ac9aa0 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -39,6 +39,19 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+union intel_sseu {
+	struct {
+		u8 slice_mask;
+		u8 subslice_mask;
+		u8 min_eus_per_subslice;
+		u8 max_eus_per_subslice;
+	};
+	u64 value;
+};
+
 struct intel_wait {
 	struct rb_node node;
 	struct task_struct *tsk;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 690b41b751ec..f188ba1b5608 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2392,8 +2392,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     union intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2403,24 +2403,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
 			GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2544,7 +2543,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  ctx->__engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.0


* [PATCH v5 3/7] drm/i915/perf: simplify configure all context function
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

We don't need any special treatment on error, so just return as soon
as possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 019bd2d073ad..94466aeafd02 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1765,7 +1765,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1782,7 +1782,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	 */
 	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1794,10 +1794,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1807,7 +1805,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
-- 
2.17.0


* [PATCH v5 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-05-14 15:55 ` [PATCH v5 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 94466aeafd02..fc5b5d66abcd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -210,6 +210,7 @@
 #include "i915_oa_cflgt3.h"
 #include "i915_oa_cnl.h"
 #include "i915_oa_icl.h"
+#include "intel_lrc_reg.h"
 
 /* HW requires this to be a power of two, between 128k and 16M, though driver
  * is currently generally designed assuming the largest 16M size is used such
@@ -1582,27 +1583,25 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
 	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
-	u32 flex_mmio[] = {
-		i915_mmio_reg_offset(EU_PERF_CNTL0),
-		i915_mmio_reg_offset(EU_PERF_CNTL1),
-		i915_mmio_reg_offset(EU_PERF_CNTL2),
-		i915_mmio_reg_offset(EU_PERF_CNTL3),
-		i915_mmio_reg_offset(EU_PERF_CNTL4),
-		i915_mmio_reg_offset(EU_PERF_CNTL5),
-		i915_mmio_reg_offset(EU_PERF_CNTL6),
+	i915_reg_t flex_regs[] = {
+		EU_PERF_CNTL0,
+		EU_PERF_CNTL1,
+		EU_PERF_CNTL2,
+		EU_PERF_CNTL3,
+		EU_PERF_CNTL4,
+		EU_PERF_CNTL5,
+		EU_PERF_CNTL6,
 	};
 	int i;
 
-	reg_state[ctx_oactxctrl] = i915_mmio_reg_offset(GEN8_OACTXCONTROL);
-	reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
-				      GEN8_OA_TIMER_PERIOD_SHIFT) |
-				     (dev_priv->perf.oa.periodic ?
-				      GEN8_OA_TIMER_ENABLE : 0) |
-				     GEN8_OA_COUNTER_RESUME;
+	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
+		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
+		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
+		GEN8_OA_COUNTER_RESUME);
 
-	for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
+	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
 		u32 state_offset = ctx_flexeu0 + i * 2;
-		u32 mmio = flex_mmio[i];
+		u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
 
 		/*
 		 * This arbitrary default will select the 'EU FPU0 Pipeline
@@ -1622,8 +1621,7 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 			}
 		}
 
-		reg_state[state_offset] = mmio;
-		reg_state[state_offset+1] = value;
+		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
 }
 
-- 
2.17.0


* [PATCH v5 5/7] drm/i915/perf: lock powergating configuration to default when active
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-05-14 15:55 ` [PATCH v5 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:55 ` [PATCH v5 6/7] drm/i915: count powergating transitions per engine Lionel Landwerlin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer, but some concerns were raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after the context switch). This means that for
a window of time during the recording, counters recorded by the OA
unit might be invalid, which requires userspace dealing with OA
reports to discard the invalid values.

Minimizing the reprogramming could be implemented by tracking the
last programmed configuration somewhere in GGTT and using
MI_PREDICATE to discard some of the programming commands, but the
command streamer would still have to parse all the MI_LRI
instructions in the workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penalty for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming, so it doesn't seem unreasonable to hold powergating at
the default configuration for the same reason.
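
To illustrate the policy, here is a sketch only; pick_sseu() is a
hypothetical helper named for this description, while the actual change
open codes the equivalent check in gen8_configure_all_contexts() and
i915_oa_init_reg_state() below:

	static union intel_sseu pick_sseu(struct drm_i915_private *i915,
					  union intel_sseu requested)
	{
		/* While an OA stream is open, ignore the context's request
		 * and program the full device configuration instead.
		 */
		if (i915->perf.oa.exclusive_stream)
			return intel_sseu_from_device_sseu(&INTEL_INFO(i915)->sseu);

		return requested;
	}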

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 24 +++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_request.c    |  2 ++
 drivers/gpu/drm/i915/i915_request.h    | 11 +++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c       | 26 +++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.h       |  3 +++
 6 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index fc5b5d66abcd..9cfbd5075097 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1577,7 +1577,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  */
 static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+					   const struct i915_oa_config *oa_config,
+					   union intel_sseu sseu)
 {
 	struct drm_i915_private *dev_priv = ctx->i915;
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
@@ -1623,6 +1624,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
 }
 
 /*
@@ -1754,6 +1758,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				       const struct i915_oa_config *oa_config)
 {
 	struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	union intel_sseu default_sseu =
+		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
 	struct i915_gem_context *ctx;
 	int ret;
 	unsigned int wait_flags = I915_WAIT_LOCKED;
@@ -1798,7 +1804,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
+					       oa_config ? default_sseu : ce->sseu);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2170,14 +2177,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    u32 *reg_state)
 {
+	struct drm_i915_private *dev_priv = engine->i915;
 	struct i915_perf_stream *stream;
 
 	if (engine->id != RCS)
 		return;
 
-	stream = engine->i915->perf.oa.exclusive_stream;
-	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+	stream = dev_priv->perf.oa.exclusive_stream;
+	if (stream) {
+		union intel_sseu default_sseu =
+			intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+
+		gen8_update_reg_state_unlocked(ctx, reg_state,
+					       stream->oa_config,
+					       default_sseu);
+	}
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 8928894dd9c7..dd0b37e0a85c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -786,6 +786,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->capture_list = NULL;
 	rq->waitboost = false;
 
+	rq->sseu = ctx->__engine[engine->id].sseu;
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index beb312ac9aa0..b4191d382145 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -162,6 +162,17 @@ struct i915_request {
 	/** Preallocate space in the ring for the emitting the request */
 	u32 reserved_space;
 
+	/*
+	 * Position in the ring batchbuffer to where the i915/perf NOA
+	 * reprogramming can be inserted just before HW submission.
+	 */
+	u32 perf_prog;
+
+	/*
+	 * Powergating configuration associated with this request.
+	 */
+	union intel_sseu sseu;
+
 	/** Batch buffer related to this request if any (used for
 	 * error state dump only).
 	 */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 6bfd7e3ed152..9f86e40f22a7 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -493,6 +493,8 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
 	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
 
+	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+
 	intel_engine_init_execlist(engine);
 	intel_engine_init_hangcheck(engine);
 	intel_engine_init_batch_pool(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f188ba1b5608..0e93ad90d039 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1956,10 +1956,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
-	cs = intel_ring_begin(rq, 6);
+	cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	if (rq->engine->id == RCS) {
+		/*
+		 * Leave some instructions to be written with an
+		 * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
+		 * batchbuffer. We only turn those MI_NOOP into
+		 * MI_BATCH_BUFFER_START when we detect a SSEU powergating
+		 * configuration change that might affect NOA. This is only
+		 * for the RCS.
+		 */
+		rq->perf_prog = intel_ring_offset(rq, cs);
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP; /* Aligning to 2 dwords */
+	}
+
 	/*
 	 * WaDisableCtxRestoreArbitration:bdw,chv
 	 *
@@ -2392,8 +2408,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(const struct sseu_dev_info *sseu,
-		     union intel_sseu ctx_sseu)
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2543,8 +2559,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
-				  ctx->__engine[engine->id].sseu));
+			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				       ctx->__engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4ec7d8dd13c8..f53e704a21b8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,6 +104,9 @@ struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu);
+
 static inline uint64_t
 intel_lr_context_descriptor(struct i915_gem_context *ctx,
 			    struct intel_engine_cs *engine)
-- 
2.17.0


* [PATCH v5 6/7] drm/i915: count powergating transitions per engine
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-05-14 15:55 ` [PATCH v5 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
@ 2018-05-14 15:55 ` Lionel Landwerlin
  2018-05-14 15:56 ` [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:55 UTC (permalink / raw)
  To: intel-gfx

This can be used to monitor the number of powergating transitions
for a particular workload.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  3 +++
 drivers/gpu/drm/i915/intel_lrc.c        | 16 +++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 9f86e40f22a7..ab3ab80daf62 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -494,6 +494,7 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
 
 	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+	atomic_set(&engine->sseu_transitions, 0);
 
 	intel_engine_init_execlist(engine);
 	intel_engine_init_hangcheck(engine);
@@ -1428,6 +1429,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	hexdump(m, engine->status_page.page_addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	drm_printf(m, "Powergating transitions: %u\n", atomic_read(&engine->sseu_transitions));
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0e93ad90d039..320b416482e1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -520,6 +520,11 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 	if (port_isset(port))
 		i915_request_put(port_request(port));
 
+	if (rq->sseu.value != rq->engine->last_sseu.value) {
+		rq->engine->last_sseu = rq->sseu;
+		atomic_inc(&rq->engine->sseu_transitions);
+	}
+
 	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
@@ -779,6 +784,7 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
+		bool completed = i915_request_completed(rq);
 
 		GEM_TRACE("%s:port%u global=%d (fence %llx:%d), (current %d)\n",
 			  rq->engine->name,
@@ -789,10 +795,18 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 
 		GEM_BUG_ON(!execlists->active);
 		execlists_context_schedule_out(rq,
-					       i915_request_completed(rq) ?
+					       completed ?
 					       INTEL_CONTEXT_SCHEDULE_OUT :
 					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 
+		/*
+		 * Update the last known sseu configuration to the first
+		 * uncompleted request. Notice this works because we pop the
+		 * requests out of the ports in reverse order.
+		 */
+		if (!completed)
+			rq->engine->last_sseu = rq->sseu;
+
 		i915_request_put(rq);
 
 		memset(port, 0, sizeof(*port));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..cc7e73730469 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -343,6 +343,18 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	/**
+	 * @last_sseu: The last SSEU configuration submitted to the
+	 * hardware. Set to 0 if unknown.
+	 */
+	union intel_sseu last_sseu;
+
+	/**
+	 * @sseu_transitions: A counter of the number of powergating
+	 * transitions this engine has gone through.
+	 */
+	atomic_t sseu_transitions;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.17.0


* [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-05-14 15:55 ` [PATCH v5 6/7] drm/i915: count powergating transitions per engine Lionel Landwerlin
@ 2018-05-14 15:56 ` Lionel Landwerlin
  2018-05-15  9:05   ` Tvrtko Ursulin
  2018-05-14 16:10 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev4) Patchwork
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-14 15:56 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise, if the context is
active, we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it, ensuring
that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of the slice/subslice masks and the min/max number of
EUs per subslice, via a drm_i915_gem_context_param_sseu struct whose
pointer is passed as the context parameter value. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
	                                                .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.
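
As a usage note (a sketch only; the error values come from the setparam
path added below), userspace should be prepared for the call to fail on
engines without RPCS control:

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg)) {
		if (errno == ENODEV) {
			/* No RPCS control on this engine (pre-Gen8). */
		} else if (errno == EINVAL) {
			/* Unknown engine class/instance, or a mask outside
			 * the device capabilities.
			 */
		}
	}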

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  38 ++++++
 5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+			  const struct drm_i915_gem_context_param_sseu *user_sseu,
+			  union intel_sseu *ctx_sseu)
+{
+	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+	    user_sseu->slice_mask == 0)
+		return -EINVAL;
+
+	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+	    user_sseu->subslice_mask == 0)
+		return -EINVAL;
+
+	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+		return -EINVAL;
+
+	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice == 0)
+		return -EINVAL;
+
+	ctx_sseu->slice_mask = user_sseu->slice_mask;
+	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+	return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct i915_timeline *timeline;
+	struct i915_request *rq;
+	union intel_sseu actual_sseu;
+	enum intel_engine_id id;
+	int ret;
+
+	/*
+	 * First notify user when this capability is not available so that it
+	 * can be detected with any valid input.
+	 */
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
+		return 0;
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	i915_retire_requests(dev_priv);
+
+	/* Now use the RCS to actually reconfigure. */
+	engine = dev_priv->engine[RCS];
+
+	rq = i915_request_alloc(engine, dev_priv->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. Act as if we recorded the user's request but program
+	 * the default sseu configuration. When i915/perf is stopped, the
+	 * recorded configuration will be programmed.
+	 */
+	actual_sseu = dev_priv->perf.oa.exclusive_stream ?
+		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
+		sseu;
+
+	ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
+	if (ret) {
+		__i915_request_add(rq, true);
+		return ret;
+	}
+
+	/* Queue this switch after all other activity */
+	list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
+		struct i915_request *prev;
+
+		prev = last_request_on_engine(timeline, engine);
+		if (prev)
+			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							 &prev->submit,
+							 I915_FENCE_GFP);
+	}
+
+	__i915_request_add(rq, true);
+
+	/*
+	 * Apply the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = to_intel_context(ctx, engine);
+
+		ce->sseu.value = sseu.value;
+	}
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+		struct intel_context *ce;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = intel_engine_lookup_user(to_i915(dev),
+						  param_sseu.class,
+						  param_sseu.instance);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		ce = &ctx->__engine[engine->id];
+
+		param_sseu.slice_mask = ce->sseu.slice_mask;
+		param_sseu.subslice_mask = ce->sseu.subslice_mask;
+		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		{
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			struct intel_engine_cs *engine;
+			union intel_sseu ctx_sseu;
+
+			if (args->size) {
+				ret = -EINVAL;
+				break;
+			}
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = intel_engine_lookup_user(dev_priv,
+							  user_sseu.class,
+							  user_sseu.instance);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+							&user_sseu, &ctx_sseu);
+			if (ret)
+				break;
 
+			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
+								ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 320b416482e1..4e0216f33c75 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 }
 static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu)
+{
+	u32 rpcs = 0;
+
+	/*
+	 * Starting in Gen9, render power gating can leave
+	 * slice/subslice/EU in a partially enabled state. We
+	 * must make an explicit request through RPCS for full
+	 * enablement.
+	 */
+	if (sseu->has_slice_pg) {
+		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_subslice_pg) {
+		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
+			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
+			GEN8_RPCS_EU_MIN_SHIFT;
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
+			GEN8_RPCS_EU_MAX_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	return rpcs;
+}
+
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct i915_gem_context *ctx,
+				 union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = rq->i915;
+	struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]);
+	u64 offset;
+	u32 *cs;
+
+	/* Let the deferred state allocation take care of this. */
+	if (!ce->state)
+		return 0;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = ce->state->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
@@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
 
+	engine->emit_rpcs_config = gen8_emit_rpcs_config;
+
 	engine->set_default_submission = execlists_set_default_submission;
 
 	if (INTEL_GEN(engine->i915) < 11) {
@@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
-		   union intel_sseu ctx_sseu)
-{
-	u32 rpcs = 0;
-
-	/*
-	 * Starting in Gen9, render power gating can leave
-	 * slice/subslice/EU in a partially enabled state. We
-	 * must make an explicit request through RPCS for full
-	 * enablement.
-	*/
-	if (sseu->has_slice_pg) {
-		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_subslice_pg) {
-		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_eu_pg) {
-		rpcs |= ctx_sseu.min_eus_per_subslice <<
-			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= ctx_sseu.max_eus_per_subslice <<
-			GEN8_RPCS_EU_MAX_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	return rpcs;
-}
-
 static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 {
 	u32 indirect_ctx_offset;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f19349a6055..44fb3a1cf8f9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 			engine->emit_breadcrumb_sz++;
 	}
 
+	engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
+
 	engine->set_default_submission = i9xx_set_default_submission;
 
 	if (INTEL_GEN(dev_priv) >= 6)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cc7e73730469..9745f4ab8214 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -463,6 +463,10 @@ struct intel_engine_cs {
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
+	int		(*emit_rpcs_config)(struct i915_request *rq,
+					    struct i915_gem_context *ctx,
+					    union intel_sseu sseu);
+
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..24b90836ce1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u32 class;
+	__u32 instance;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u8 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u8 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u8 min_eus_per_subslice;
+	__u8 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0


* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev4)
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2018-05-14 15:56 ` [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-14 16:10 ` Patchwork
  2018-05-14 16:12 ` ✗ Fi.CI.SPARSE: " Patchwork
  2018-05-14 16:33 ` ✓ Fi.CI.BAT: success " Patchwork
  9 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2018-05-14 16:10 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev4)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
d66d43edbd8d drm/i915: Program RPCS for Broadwell
61ad4a566887 drm/i915: Record the sseu configuration per-context & engine
ad11435f9851 drm/i915/perf: simplify configure all context function
c5534ae9cbfb drm/i915/perf: reuse intel_lrc ctx regs macro
5d2fc23e9965 drm/i915/perf: lock powergating configuration to default when active
2060fd56a22b drm/i915: count powergating transitions per engine
a68f546b6c49 drm/i915: Expose RPCS (SSEU) configuration to userspace
-:40: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#40: 
v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

total: 0 errors, 1 warnings, 0 checks, 374 lines checked


* ✗ Fi.CI.SPARSE: warning for drm/i915: per context slice/subslice powergating (rev4)
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2018-05-14 16:10 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev4) Patchwork
@ 2018-05-14 16:12 ` Patchwork
  2018-05-14 16:33 ` ✓ Fi.CI.BAT: success " Patchwork
  9 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2018-05-14 16:12 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev4)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: Record the sseu configuration per-context & engine
Okay!

Commit: drm/i915/perf: simplify configure all context function
Okay!

Commit: drm/i915/perf: reuse intel_lrc ctx regs macro
Okay!

Commit: drm/i915/perf: lock powergating configuration to default when active
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    expected void [noderef] <asn:4>**slot
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    expected void **slot
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    expected void **slot
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    expected void **slot
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    got void [noderef] <asn:4>**
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    got void [noderef] <asn:4>**
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    got void [noderef] <asn:4>**
-drivers/gpu/drm/i915/gvt/gtt.c:671:9:    got void **slot
-drivers/gpu/drm/i915/gvt/gtt.c:671:9: warning: incorrect type in argument 1 (different address spaces)
-drivers/gpu/drm/i915/gvt/gtt.c:671:9: warning: incorrect type in assignment (different address spaces)
-drivers/gpu/drm/i915/gvt/gtt.c:671:9: warning: incorrect type in assignment (different address spaces)
-drivers/gpu/drm/i915/gvt/gtt.c:671:9: warning: incorrect type in assignment (different address spaces)
-drivers/gpu/drm/i915/gvt/gtt.c:672:45:    expected void [noderef] <asn:4>**slot
-drivers/gpu/drm/i915/gvt/gtt.c:672:45:    got void **slot
-drivers/gpu/drm/i915/gvt/gtt.c:672:45: warning: incorrect type in argument 1 (different address spaces)
-drivers/gpu/drm/i915/gvt/mmio.c:253:23: warning: memcpy with byte count of 279040
-drivers/gpu/drm/i915/gvt/mmio.c:254:23: warning: memcpy with byte count of 279040
-drivers/gpu/drm/i915/gvt/vgpu.c:144:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:144:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/gvt/vgpu.c:192:48: warning: expression using sizeof(void)
+                ^~
+                                             ^~
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_drv.h:3663:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gpu_error.c:955:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gpu_error.c:955:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_perf.c:1368:15: warning: memset with byte count of 16777216
-drivers/gpu/drm/i915/i915_perf.c:1426:15: warning: memset with byte count of 16777216
-drivers/gpu/drm/i915/intel_bios.c:1489:24: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2136:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2139:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2159:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2167:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2200:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2200:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2236:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2236:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2438:17: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_cdclk.c:2438:17: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:176:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:184:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:187:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:190:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:192:46: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:195:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:198:41: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:275:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_color.c:96:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_ddi.c:696:24: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_ddi.c:698:24: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:13045:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:13045:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:14191:24: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:2670:28: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:2670:28: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:6658:18: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:900:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_display.c:900:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp_aux_backlight.c:157:21: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1281:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1667:23: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:1667:23: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:186:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:186:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:231:30: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:231:30: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:308:28: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:308:28: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5235:30: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5616:31: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5645:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5645:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5645:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5645:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5646:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5646:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5646:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5646:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5647:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5647:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5647:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5647:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5648:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5648:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5648:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5648:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5649:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5649:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5649:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dp.c:5649:9: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi.c:113:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi.c:97:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:644:26: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:644:26: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:686:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:686:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:718:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_dsi_vbt.c:718:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_fbc.c:88:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_fbc.c:90:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_fbdev.c:338:30: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/intel_engine_cs.c:496:15: warning: call with no type!
+drivers/gpu/drm/i915/intel_engine_cs.c:496:16: error: ‘struct intel_engine_cs’ has no member named ‘last_sseu’
+drivers/gpu/drm/i915/intel_engine_cs.c:496:23: error: no member 'last_sseu' in struct intel_engine_cs
+drivers/gpu/drm/i915/intel_engine_cs.c:496:45: error: ‘struct intel_engine_cs’ has no member named ‘last_sseu’
+drivers/gpu/drm/i915/intel_engine_cs.c: In function ‘intel_engine_setup_common’:
-drivers/gpu/drm/i915/intel_hdmi.c:1494:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1494:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1511:42: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1511:42: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1515:42: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1515:42: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_hdmi.c:1518:42: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_i2c.c:403:23: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_i2c.c:465:23: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_overlay.c:865:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_overlay.c:865:29: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1493:15: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1532:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1562:34: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/intel_panel.c:1603:34: warn

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: per context slice/subslice powergating (rev4)
  2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2018-05-14 16:12 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-14 16:33 ` Patchwork
  9 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2018-05-14 16:33 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev4)
URL   : https://patchwork.freedesktop.org/series/42285/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4178 -> Patchwork_8994 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_8994 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8994, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/4/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8994:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_8994 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       PASS -> DMESG-WARN (fdo#105128)

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
      fi-elk-e7500:       PASS -> DMESG-WARN (fdo#105225)

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-c:
      fi-elk-e7500:       SKIP -> INCOMPLETE (fdo#103989)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-kbl-7567u:       PASS -> FAIL (fdo#103191, fdo#104724)

    igt@prime_vgem@basic-fence-flip:
      fi-kbl-7500u:       PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@drv_module_reload@basic-reload:
      fi-glk-j4005:       DMESG-WARN (fdo#106248) -> PASS

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         FAIL (fdo#102575) -> PASS

    igt@kms_busy@basic-flip-c:
      fi-glk-j4005:       DMESG-WARN (fdo#106097) -> PASS +2

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103989 https://bugs.freedesktop.org/show_bug.cgi?id=103989
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#105225 https://bugs.freedesktop.org/show_bug.cgi?id=105225
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248


== Participating hosts (41 -> 37) ==

  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4178 -> Patchwork_8994

  CI_DRM_4178: 8213a085ddd82871fab4bf94d1a3555fcdf9e6ac @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4478: b871b10f2a6250d6dbe31665b267820fee829c84 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8994: a68f546b6c496e87a752530012cff9ce1a3a5bef @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4478: 3ba0657bff4216d1ec7179935590261855f1651e @ git://anongit.freedesktop.org/piglit


== Linux commits ==

a68f546b6c49 drm/i915: Expose RPCS (SSEU) configuration to userspace
2060fd56a22b drm/i915: count powergating transitions per engine
5d2fc23e9965 drm/i915/perf: lock powergating configuration to default when active
c5534ae9cbfb drm/i915/perf: reuse intel_lrc ctx regs macro
ad11435f9851 drm/i915/perf: simplify configure all context function
61ad4a566887 drm/i915: Record the sseu configuration per-context & engine
d66d43edbd8d drm/i915: Program RPCS for Broadwell

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8994/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-14 15:56 ` [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-15  9:05   ` Tvrtko Ursulin
  2018-05-16 15:40     ` Tvrtko Ursulin
  2018-05-21 13:22     ` Lionel Landwerlin
  0 siblings, 2 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2018-05-15  9:05 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 14/05/2018 16:56, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> 	struct drm_i915_gem_context_param arg;
> 	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> 	                                                .instance = 0, };
> 
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	arg.value = (uintptr_t) &sseu;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		sseu.subslice_mask = 0;
> 
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>   include/uapi/drm/i915_drm.h             |  38 ++++++
>   5 files changed, 281 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 01310c99e032..0b72a771c3f3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	return 0;
>   }
>   
> +static int
> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +			  const struct drm_i915_gem_context_param_sseu *user_sseu,
> +			  union intel_sseu *ctx_sseu)
> +{
> +	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> +	    user_sseu->slice_mask == 0)
> +		return -EINVAL;
> +
> +	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> +	    user_sseu->subslice_mask == 0)
> +		return -EINVAL;
> +
> +	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> +		return -EINVAL;
> +
> +	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> +	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
> +	    user_sseu->max_eus_per_subslice == 0)
> +		return -EINVAL;
> +
> +	ctx_sseu->slice_mask = user_sseu->slice_mask;
> +	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> +	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> +	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> +
> +	return 0;
> +}
> +
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +				  struct intel_engine_cs *engine,
> +				  union intel_sseu sseu)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +	struct i915_timeline *timeline;
> +	struct i915_request *rq;
> +	union intel_sseu actual_sseu;
> +	enum intel_engine_id id;
> +	int ret;
> +
> +	/*
> +	 * First notify user when this capability is not available so that it
> +	 * can be detected with any valid input.
> +	 */
> +	if (!engine->emit_rpcs_config)
> +		return -ENODEV;
> +
> +	if (to_intel_context(ctx, engine)->sseu.value == sseu.value)

Are there other uses for the value union in the series? Think whether it 
could be dropped and memcmp used here for simplicity.
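Purely as an illustration, the check could then be a byte-wise compare along
these lines (assuming the struct stays small and free of padding holes):

	struct intel_context *ce = to_intel_context(ctx, engine);

	/* byte-wise compare; assumes the sseu struct has no padding holes */
	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
		return 0;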

> +		return 0;
> +
> +	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> +
> +	i915_retire_requests(dev_priv);
> +
> +	/* Now use the RCS to actually reconfigure. */
> +	engine = dev_priv->engine[RCS];
> +
> +	rq = i915_request_alloc(engine, dev_priv->kernel_context);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. Act as if we recorded the user's request but program
> +	 * the default sseu configuration. When i915/perf is stopped, the
> +	 * recorded configuration will be programmed.
> +	 */
> +	actual_sseu = dev_priv->perf.oa.exclusive_stream ?

This feels like it should jump out more since there is no obvious connection 
between the two. Probably a helper of some sort like 
intel_sanitize_sseu? Or intel_apply_sseu? Merge? ... ?

The comment would then be in the helper, which I think would be better 
than sprinkling OA knowledge into the context params code.
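
Purely as a sketch of what I mean (the helper name is invented, reusing the
bits already in this patch):

	static union intel_sseu
	intel_sanitize_sseu(struct drm_i915_private *i915, union intel_sseu sseu)
	{
		/*
		 * While i915/perf (OA) has an exclusive stream open, keep the
		 * hardware on the default configuration; the user's request is
		 * only recorded and gets applied once perf is stopped.
		 */
		if (i915->perf.oa.exclusive_stream)
			return intel_sseu_from_device_sseu(&INTEL_INFO(i915)->sseu);

		return sseu;
	}

The context param code would then just do
actual_sseu = intel_sanitize_sseu(dev_priv, sseu) without knowing about OA.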

> +		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
> +		sseu;
> +
> +	ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
> +	if (ret) {
> +		__i915_request_add(rq, true);
> +		return ret;
> +	}
> +
> +	/* Queue this switch after all other activity */
> +	list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
> +		struct i915_request *prev;
> +
> +		prev = last_request_on_engine(timeline, engine);
> +		if (prev)
> +			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +							 &prev->submit,
> +							 I915_FENCE_GFP);
> +	}
> +
> +	__i915_request_add(rq, true);

This is actually the bit I was reading the patch for. So this I think is 
much better/safer than the previous idling. However, one concern, and maybe 
not a concern but just something which needs to be explained in the 
uAPI, is what it means with regard to the point from which the new 
configuration becomes live.

Could preemption for instance make it not defined enough? Could we, or 
should we, also link this request somehow onto the issuing context 
timeline so it must be first there? Hm, should we use the user context 
instead of the kernel one, and set the highest priority? But would have 
to avoid triggering preemption.

> +	/*
> +	 * Apply the configuration to all engines. Our hardware doesn't
> +	 * currently support different configurations for each engine.
> +	 */
> +	for_each_engine(engine, dev_priv, id) {
> +		struct intel_context *ce = to_intel_context(ctx, engine);
> +
> +		ce->sseu.value = sseu.value;
> +	}
> +
> +	return 0;
> +}
> +
>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file)
>   {
> @@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		args->value = ctx->sched.priority;
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU: {
> +		struct drm_i915_gem_context_param_sseu param_sseu;
> +		struct intel_engine_cs *engine;
> +		struct intel_context *ce;
> +
> +		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +				   sizeof(param_sseu))) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		engine = intel_engine_lookup_user(to_i915(dev),
> +						  param_sseu.class,
> +						  param_sseu.instance);
> +		if (!engine) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		ce = &ctx->__engine[engine->id];

to_intel_context(ctx, engine) ?

> +
> +		param_sseu.slice_mask = ce->sseu.slice_mask;
> +		param_sseu.subslice_mask = ce->sseu.subslice_mask;
> +		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> +		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> +
> +		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +				 sizeof(param_sseu)))
> +			ret = -EFAULT;
> +		break;
> +	}
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				ctx->sched.priority = priority;
>   		}
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		{
> +			struct drm_i915_private *dev_priv = to_i915(dev);
> +			struct drm_i915_gem_context_param_sseu user_sseu;
> +			struct intel_engine_cs *engine;
> +			union intel_sseu ctx_sseu;
> +
> +			if (args->size) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +					   sizeof(user_sseu))) {
> +				ret = -EFAULT;
> +				break;
> +			}
> +
> +			engine = intel_engine_lookup_user(dev_priv,
> +							  user_sseu.class,
> +							  user_sseu.instance);
> +			if (!engine) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> +							&user_sseu, &ctx_sseu);

Should the setter have a helper as well?

> +			if (ret)
> +				break;
>   
> +			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
> +								ctx_sseu);
> +		}
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 320b416482e1..4e0216f33c75 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   }
>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>   
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   union intel_sseu ctx_sseu)
> +{
> +	u32 rpcs = 0;
> +
> +	/*
> +	 * Starting in Gen9, render power gating can leave
> +	 * slice/subslice/EU in a partially enabled state. We
> +	 * must make an explicit request through RPCS for full
> +	 * enablement.
> +	 */
> +	if (sseu->has_slice_pg) {
> +		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> +		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	if (sseu->has_subslice_pg) {
> +		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> +		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
> +			GEN8_RPCS_SS_CNT_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	if (sseu->has_eu_pg) {
> +		rpcs |= ctx_sseu.min_eus_per_subslice <<
> +			GEN8_RPCS_EU_MIN_SHIFT;
> +		rpcs |= ctx_sseu.max_eus_per_subslice <<
> +			GEN8_RPCS_EU_MAX_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	return rpcs;
> +}
> +
> +static int gen8_emit_rpcs_config(struct i915_request *rq,
> +				 struct i915_gem_context *ctx,
> +				 union intel_sseu sseu)
> +{
> +	struct drm_i915_private *dev_priv = rq->i915;
> +	struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]);
> +	u64 offset;
> +	u32 *cs;
> +
> +	/* Let the deferred state allocation take care of this. */
> +	if (!ce->state)
> +		return 0;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	offset = ce->state->node.start +
> +		LRC_STATE_PN * PAGE_SIZE +
> +		(CTX_R_PWR_CLK_STATE + 1) * 4;
> +
> +	*cs++ = MI_STORE_DWORD_IMM_GEN4;
> +	*cs++ = lower_32_bits(offset);
> +	*cs++ = upper_32_bits(offset);
> +	*cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
>   static int gen8_init_rcs_context(struct i915_request *rq)
>   {
>   	int ret;
> @@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->emit_breadcrumb = gen8_emit_breadcrumb;
>   	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>   
> +	engine->emit_rpcs_config = gen8_emit_rpcs_config;
> +
>   	engine->set_default_submission = execlists_set_default_submission;
>   
>   	if (INTEL_GEN(engine->i915) < 11) {
> @@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> -		   union intel_sseu ctx_sseu)
> -{
> -	u32 rpcs = 0;
> -
> -	/*
> -	 * Starting in Gen9, render power gating can leave
> -	 * slice/subslice/EU in a partially enabled state. We
> -	 * must make an explicit request through RPCS for full
> -	 * enablement.
> -	*/
> -	if (sseu->has_slice_pg) {
> -		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> -		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	if (sseu->has_subslice_pg) {
> -		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> -		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
> -			GEN8_RPCS_SS_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	if (sseu->has_eu_pg) {
> -		rpcs |= ctx_sseu.min_eus_per_subslice <<
> -			GEN8_RPCS_EU_MIN_SHIFT;
> -		rpcs |= ctx_sseu.max_eus_per_subslice <<
> -			GEN8_RPCS_EU_MAX_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	return rpcs;
> -}
> -
>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
>   {
>   	u32 indirect_ctx_offset;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8f19349a6055..44fb3a1cf8f9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   			engine->emit_breadcrumb_sz++;
>   	}
>   
> +	engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
> +
>   	engine->set_default_submission = i9xx_set_default_submission;
>   
>   	if (INTEL_GEN(dev_priv) >= 6)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index cc7e73730469..9745f4ab8214 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -463,6 +463,10 @@ struct intel_engine_cs {
>   	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>   	int		emit_breadcrumb_sz;
>   
> +	int		(*emit_rpcs_config)(struct i915_request *rq,
> +					    struct i915_gem_context *ctx,
> +					    union intel_sseu sseu);
> +
>   	/* Pass the request to the hardware queue (e.g. directly into
>   	 * the legacy ringbuffer or to the end of an execlist).
>   	 *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 7f5634ce8e88..24b90836ce1d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +	/*
> +	 * When using the following param, value should be a pointer to
> +	 * drm_i915_gem_context_param_sseu.
> +	 */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
>   	__u64 value;
>   };
>   
> +struct drm_i915_gem_context_param_sseu {
> +	/*
> +	 * Engine class & instance to be configured or queried.
> +	 */
> +	__u32 class;
> +	__u32 instance;
> +
> +	/*
> +	 * Mask of slices to enable for the context. Valid values are a subset
> +	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
> +	 */
> +	__u8 slice_mask;
> +
> +	/*
> +	 * Mask of subslices to enable for the context. Valid values are a
> +	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
> +	 */
> +	__u8 subslice_mask;
> +
> +	/*
> +	 * Minimum/Maximum number of EUs to enable per subslice for the
> +	 * context. min_eus_per_subslice must be less than or equal to
> +	 * max_eus_per_subslice.
> +	 */
> +	__u8 min_eus_per_subslice;
> +	__u8 max_eus_per_subslice;
> +
> +	/*
> +	 * Unused for now. Must be cleared to zero.
> +	 */
> +	__u32 rsvd;
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-15  9:05   ` Tvrtko Ursulin
@ 2018-05-16 15:40     ` Tvrtko Ursulin
  2018-05-16 15:44       ` Lionel Landwerlin
  2018-05-16 15:51       ` Chris Wilson
  2018-05-21 13:22     ` Lionel Landwerlin
  1 sibling, 2 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2018-05-16 15:40 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 15/05/2018 10:05, Tvrtko Ursulin wrote:
> 
> On 14/05/2018 16:56, Lionel Landwerlin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>     struct drm_i915_gem_context_param arg;
>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                     .instance = 0, };
>>
>>     memset(&arg, 0, sizeof(arg));
>>     arg.ctx_id = ctx;
>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>     arg.value = (uintptr_t) &sseu;
>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>         sseu.subslice_mask = 0;
>>
>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>     }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>> (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities 
>> (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel 
>> context (Chris)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>   5 files changed, 281 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 01310c99e032..0b72a771c3f3 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
>> drm_device *dev, void *data,
>>       return 0;
>>   }
>> +static int
>> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
>> +              union intel_sseu *ctx_sseu)
>> +{
>> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
>> +        user_sseu->slice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
>> +        user_sseu->subslice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice < 
>> user_sseu->min_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice == 0)
>> +        return -EINVAL;
>> +
>> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
>> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
>> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
>> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                  struct intel_engine_cs *engine,
>> +                  union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct i915_timeline *timeline;
>> +    struct i915_request *rq;
>> +    union intel_sseu actual_sseu;
>> +    enum intel_engine_id id;
>> +    int ret;
>> +
>> +    /*
>> +     * First notify user when this capability is not available so 
>> that it
>> +     * can be detected with any valid input.
>> +     */
>> +    if (!engine->emit_rpcs_config)
>> +        return -ENODEV;
>> +
>> +    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
> 
> Are there other uses for the value union in the series? Think whether it 
> could be dropped and memcmp used here for simplicity.
> 
>> +        return 0;
>> +
>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>> +
>> +    i915_retire_requests(dev_priv);
>> +
>> +    /* Now use the RCS to actually reconfigure. */
>> +    engine = dev_priv->engine[RCS];
>> +
>> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
>> +    if (IS_ERR(rq))
>> +        return PTR_ERR(rq);
>> +
>> +    /*
>> +     * If i915/perf is active, we want a stable powergating 
>> configuration
>> +     * on the system. Act as if we recorded the user's request but 
>> program
>> +     * the default sseu configuration. When i915/perf is stopped, the
>> +     * recorded configuration will be programmed.
>> +     */
>> +    actual_sseu = dev_priv->perf.oa.exclusive_stream ?
> 
> This feels like it should jump out more since there is no obvious connection 
> between the two. Probably a helper of some sort like 
> intel_sanitize_sseu? Or intel_apply_sseu? Merge? ... ?
> 
> The comment would then be in the helper, which I think would be better 
> than sprinkling OA knowledge into the context params code.

Furthermore, that idea to have a global sysfs knob would fit into this 
helper.

echo 1 > $i915_sysfs_root/allow_dynamic_slice_configuration

Or something. As long as it defaults to off, I think both camps should 
be happy.
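
Illustratively (attribute and field names are invented for this sketch), the
sysfs side could be as small as:

	/* i915->allow_dynamic_sseu is a made-up field for this sketch */
	static ssize_t allow_dynamic_sseu_show(struct device *kdev,
					       struct device_attribute *attr,
					       char *buf)
	{
		struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);

		return snprintf(buf, PAGE_SIZE, "%d\n", i915->allow_dynamic_sseu);
	}

	static ssize_t allow_dynamic_sseu_store(struct device *kdev,
						struct device_attribute *attr,
						const char *buf, size_t count)
	{
		struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
		bool value;
		int ret;

		ret = kstrtobool(buf, &value);
		if (ret)
			return ret;

		i915->allow_dynamic_sseu = value;

		return count;
	}

	static DEVICE_ATTR_RW(allow_dynamic_sseu);

with i915_gem_context_reconfigure_sseu() simply returning -EPERM (or -ENODEV)
while the knob is off.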

Regards,

Tvrtko

> 
>> +        intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
>> +        sseu;
>> +
>> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
>> +    if (ret) {
>> +        __i915_request_add(rq, true);
>> +        return ret;
>> +    }
>> +
>> +    /* Queue this switch after all other activity */
>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>> +        struct i915_request *prev;
>> +
>> +        prev = last_request_on_engine(timeline, engine);
>> +        if (prev)
>> +            i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                             &prev->submit,
>> +                             I915_FENCE_GFP);
>> +    }
>> +
>> +    __i915_request_add(rq, true);
> 
> This is actually the bit I was reading the patch for. So this I think is 
> much better/safer than the previous idling. However, one concern, and maybe 
> not a concern but just something which needs to be explained in the 
> uAPI, is what it means with regard to the point from which the new 
> configuration becomes live.
> 
> Could preemption for instance make it not defined enough? Could we, or 
> should we, also link this request somehow onto the issuing context 
> timeline so it must be first there? Hm, should we use the user context 
> instead of the kernel one, and set the highest priority? But would have 
> to avoid triggering preemption.
> 
>> +    /*
>> +     * Apply the configuration to all engines. Our hardware doesn't
>> +     * currently support different configurations for each engine.
>> +     */
>> +    for_each_engine(engine, dev_priv, id) {
>> +        struct intel_context *ce = to_intel_context(ctx, engine);
>> +
>> +        ce->sseu.value = sseu.value;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>                       struct drm_file *file)
>>   {
>> @@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct 
>> drm_device *dev, void *data,
>>       case I915_CONTEXT_PARAM_PRIORITY:
>>           args->value = ctx->sched.priority;
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU: {
>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>> +        struct intel_engine_cs *engine;
>> +        struct intel_context *ce;
>> +
>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>> +                   sizeof(param_sseu))) {
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        engine = intel_engine_lookup_user(to_i915(dev),
>> +                          param_sseu.class,
>> +                          param_sseu.instance);
>> +        if (!engine) {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        ce = &ctx->__engine[engine->id];
> 
> to_intel_context(ctx, engine) ?
> 
>> +
>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>> +        param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
>> +        param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
>> +
>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>> +                 sizeof(param_sseu)))
>> +            ret = -EFAULT;
>> +        break;
>> +    }
>>       default:
>>           ret = -EINVAL;
>>           break;
>> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
>> drm_device *dev, void *data,
>>                   ctx->sched.priority = priority;
>>           }
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU:
>> +        {
>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>> +            struct intel_engine_cs *engine;
>> +            union intel_sseu ctx_sseu;
>> +
>> +            if (args->size) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>> +                       sizeof(user_sseu))) {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            engine = intel_engine_lookup_user(dev_priv,
>> +                              user_sseu.class,
>> +                              user_sseu.instance);
>> +            if (!engine) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>> +                            &user_sseu, &ctx_sseu);
> 
> Should the setter have a helper as well?
> 
>> +            if (ret)
>> +                break;
>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>> +                                ctx_sseu);
>> +        }
>> +        break;
>>       default:
>>           ret = -EINVAL;
>>           break;
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 320b416482e1..4e0216f33c75 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct 
>> i915_request *request, u32 *cs)
>>   }
>>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           union intel_sseu ctx_sseu)
>> +{
>> +    u32 rpcs = 0;
>> +
>> +    /*
>> +     * Starting in Gen9, render power gating can leave
>> +     * slice/subslice/EU in a partially enabled state. We
>> +     * must make an explicit request through RPCS for full
>> +     * enablement.
>> +     */
>> +    if (sseu->has_slice_pg) {
>> +        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_subslice_pg) {
>> +        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> +            GEN8_RPCS_SS_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_eu_pg) {
>> +        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MIN_SHIFT;
>> +        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MAX_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    return rpcs;
>> +}
>> +
>> +static int gen8_emit_rpcs_config(struct i915_request *rq,
>> +                 struct i915_gem_context *ctx,
>> +                 union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = rq->i915;
>> +    struct intel_context *ce = to_intel_context(ctx, 
>> dev_priv->engine[RCS]);
>> +    u64 offset;
>> +    u32 *cs;
>> +
>> +    /* Let the deferred state allocation take care of this. */
>> +    if (!ce->state)
>> +        return 0;
>> +
>> +    cs = intel_ring_begin(rq, 4);
>> +    if (IS_ERR(cs))
>> +        return PTR_ERR(cs);
>> +
>> +    offset = ce->state->node.start +
>> +        LRC_STATE_PN * PAGE_SIZE +
>> +        (CTX_R_PWR_CLK_STATE + 1) * 4;
>> +
>> +    *cs++ = MI_STORE_DWORD_IMM_GEN4;
>> +    *cs++ = lower_32_bits(offset);
>> +    *cs++ = upper_32_bits(offset);
>> +    *cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>> +
>> +    intel_ring_advance(rq, cs);
>> +
>> +    return 0;
>> +}
>> +
>>   static int gen8_init_rcs_context(struct i915_request *rq)
>>   {
>>       int ret;
>> @@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct 
>> intel_engine_cs *engine)
>>       engine->emit_breadcrumb = gen8_emit_breadcrumb;
>>       engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>> +    engine->emit_rpcs_config = gen8_emit_rpcs_config;
>> +
>>       engine->set_default_submission = execlists_set_default_submission;
>>       if (INTEL_GEN(engine->i915) < 11) {
>> @@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct 
>> intel_engine_cs *engine)
>>       return logical_ring_init(engine);
>>   }
>> -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> -           union intel_sseu ctx_sseu)
>> -{
>> -    u32 rpcs = 0;
>> -
>> -    /*
>> -     * Starting in Gen9, render power gating can leave
>> -     * slice/subslice/EU in a partially enabled state. We
>> -     * must make an explicit request through RPCS for full
>> -     * enablement.
>> -    */
>> -    if (sseu->has_slice_pg) {
>> -        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_subslice_pg) {
>> -        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> -            GEN8_RPCS_SS_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_eu_pg) {
>> -        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MIN_SHIFT;
>> -        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MAX_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    return rpcs;
>> -}
>> -
>>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
>>   {
>>       u32 indirect_ctx_offset;
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 8f19349a6055..44fb3a1cf8f9 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct 
>> drm_i915_private *dev_priv,
>>               engine->emit_breadcrumb_sz++;
>>       }
>> +    engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
>> +
>>       engine->set_default_submission = i9xx_set_default_submission;
>>       if (INTEL_GEN(dev_priv) >= 6)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index cc7e73730469..9745f4ab8214 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -463,6 +463,10 @@ struct intel_engine_cs {
>>       void        (*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>>       int        emit_breadcrumb_sz;
>> +    int        (*emit_rpcs_config)(struct i915_request *rq,
>> +                        struct i915_gem_context *ctx,
>> +                        union intel_sseu sseu);
>> +
>>       /* Pass the request to the hardware queue (e.g. directly into
>>        * the legacy ringbuffer or to the end of an execlist).
>>        *
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 7f5634ce8e88..24b90836ce1d 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>> +    /*
>> +     * When using the following param, value should be a pointer to
>> +     * drm_i915_gem_context_param_sseu.
>> +     */
>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>       __u64 value;
>>   };
>> +struct drm_i915_gem_context_param_sseu {
>> +    /*
>> +     * Engine class & instance to be configured or queried.
>> +     */
>> +    __u32 class;
>> +    __u32 instance;
>> +
>> +    /*
>> +     * Mask of slices to enable for the context. Valid values are a 
>> subset
>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>> +     */
>> +    __u8 slice_mask;
>> +
>> +    /*
>> +     * Mask of subslices to enable for the context. Valid values are a
>> +     * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
>> +     */
>> +    __u8 subslice_mask;
>> +
>> +    /*
>> +     * Minimum/Maximum number of EUs to enable per subslice for the
>> +     * context. min_eus_per_subslice must be less than or equal to
>> +     * max_eus_per_subslice.
>> +     */
>> +    __u8 min_eus_per_subslice;
>> +    __u8 max_eus_per_subslice;
>> +
>> +    /*
>> +     * Unused for now. Must be cleared to zero.
>> +     */
>> +    __u32 rsvd;
>> +};
>> +
>>   enum drm_i915_oa_format {
>>       I915_OA_FORMAT_A13 = 1,        /* HSW only */
>>       I915_OA_FORMAT_A29,        /* HSW only */
>>
> 
> Regards,
> 
> Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-16 15:40     ` Tvrtko Ursulin
@ 2018-05-16 15:44       ` Lionel Landwerlin
  2018-05-16 15:51       ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-16 15:44 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 16/05/18 16:40, Tvrtko Ursulin wrote:
>
> On 15/05/2018 10:05, Tvrtko Ursulin wrote:
>>
>> On 14/05/2018 16:56, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration 
>>> for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and 
>>> force
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices they are not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>     struct drm_i915_gem_context_param arg;
>>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>                                                     .instance = 0, };
>>>
>>>     memset(&arg, 0, sizeof(arg));
>>>     arg.ctx_id = ctx;
>>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>>     arg.value = (uintptr_t) &sseu;
>>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>         sseu.subslice_mask = 0;
>>>
>>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>     }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>>> (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> v5: Validate sseu configuration against the device's capabilities 
>>> (Lionel)
>>>
>>> v6: Change context powergating settings through MI_SDM on kernel 
>>> context (Chris)
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_context.c | 169 
>>> ++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>>   5 files changed, 281 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>>> b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index 01310c99e032..0b72a771c3f3 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
>>> drm_device *dev, void *data,
>>>       return 0;
>>>   }
>>> +static int
>>> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>>> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
>>> +              union intel_sseu *ctx_sseu)
>>> +{
>>> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
>>> +        user_sseu->slice_mask == 0)
>>> +        return -EINVAL;
>>> +
>>> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
>>> +        user_sseu->subslice_mask == 0)
>>> +        return -EINVAL;
>>> +
>>> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
>>> +        return -EINVAL;
>>> +
>>> +    if (user_sseu->max_eus_per_subslice > 
>>> sseu->max_eus_per_subslice ||
>>> +        user_sseu->max_eus_per_subslice < 
>>> user_sseu->min_eus_per_subslice ||
>>> +        user_sseu->max_eus_per_subslice == 0)
>>> +        return -EINVAL;
>>> +
>>> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
>>> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
>>> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
>>> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>>> +                  struct intel_engine_cs *engine,
>>> +                  union intel_sseu sseu)
>>> +{
>>> +    struct drm_i915_private *dev_priv = ctx->i915;
>>> +    struct i915_timeline *timeline;
>>> +    struct i915_request *rq;
>>> +    union intel_sseu actual_sseu;
>>> +    enum intel_engine_id id;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * First notify user when this capability is not available so 
>>> that it
>>> +     * can be detected with any valid input.
>>> +     */
>>> +    if (!engine->emit_rpcs_config)
>>> +        return -ENODEV;
>>> +
>>> +    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
>>
>> Are there other uses for the value union in the series? Think whether 
>> it could be dropped and memcmp used here for simplicity.
>>
>>> +        return 0;
>>> +
>>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>>> +
>>> +    i915_retire_requests(dev_priv);
>>> +
>>> +    /* Now use the RCS to actually reconfigure. */
>>> +    engine = dev_priv->engine[RCS];
>>> +
>>> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
>>> +    if (IS_ERR(rq))
>>> +        return PTR_ERR(rq);
>>> +
>>> +    /*
>>> +     * If i915/perf is active, we want a stable powergating 
>>> configuration
>>> +     * on the system. Act as if we recorded the user's request but 
>>> program
>>> +     * the default sseu configuration. When i915/perf is stopped, the
>>> +     * recorded configuration will be programmed.
>>> +     */
>>> +    actual_sseu = dev_priv->perf.oa.exclusive_stream ?
>>
>> This feels it should jump out more since there is no obvious 
>> connection between the two. Probably a helper of some sort like 
>> intel_sanitize_sseu? Or intel_apply_sseu? Merge? ... ?
>>
>> The comment would then be in the helper which I think would be better 
>> than sprinkle OA knowledge into context params code.
>
> Furthermore, that idea to have a global sysfs knob would fit into this 
> helper.
>
> echo 1 > $i915_sysfs_root/allow_dynamic_slice_configuration
>
> Or something.. As long as it defaults to off I think both camps should 
> be happy.
>
> Regards,
>
> Tvrtko

Thanks Tvrtko,

I was thinking the same as a simple first step.
I can add that in another revision.

Cheers,

-
Lionel

>
>>
>>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
>>> +        sseu;
>>> +
>>> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
>>> +    if (ret) {
>>> +        __i915_request_add(rq, true);
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* Queue this switch after all other activity */
>>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>>> +        struct i915_request *prev;
>>> +
>>> +        prev = last_request_on_engine(timeline, engine);
>>> +        if (prev)
>>> + i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>>> +                             &prev->submit,
>>> +                             I915_FENCE_GFP);
>>> +    }
>>> +
>>> +    __i915_request_add(rq, true);
>>
>> This is actually the bit I reading the patch for. So this I think is 
>> much better/safer than previous idling. However one concern, and 
>> maybe not a concern but just something which needs to be explained in 
>> the uAPI, is what it means with regards to the point from which the 
>> new configuration becomes live.
>>
>> Could preemption for instance make it not defined enough? Could we, 
>> or should we, also link this request somehow onto the issuing context 
>> timeline so it must be first there? Hm, should we use the user 
>> context instead of the kernel one, and set the highest priority? But 
>> would have to avoid triggering preemption.
>>
>>> +    /*
>>> +     * Apply the configuration to all engine. Our hardware doesn't
>>> +     * currently support different configurations for each engine.
>>> +     */
>>> +    for_each_engine(engine, dev_priv, id) {
>>> +        struct intel_context *ce = to_intel_context(ctx, engine);
>>> +
>>> +        ce->sseu.value = sseu.value;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void 
>>> *data,
>>>                       struct drm_file *file)
>>>   {
>>> @@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>       case I915_CONTEXT_PARAM_PRIORITY:
>>>           args->value = ctx->sched.priority;
>>>           break;
>>> +    case I915_CONTEXT_PARAM_SSEU: {
>>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>>> +        struct intel_engine_cs *engine;
>>> +        struct intel_context *ce;
>>> +
>>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>> +                   sizeof(param_sseu))) {
>>> +            ret = -EFAULT;
>>> +            break;
>>> +        }
>>> +
>>> +        engine = intel_engine_lookup_user(to_i915(dev),
>>> +                          param_sseu.class,
>>> +                          param_sseu.instance);
>>> +        if (!engine) {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        ce = &ctx->__engine[engine->id];
>>
>> to_intel_context(ctx, engine) ?
>>
>>> +
>>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>>> +        param_sseu.min_eus_per_subslice = 
>>> ce->sseu.min_eus_per_subslice;
>>> +        param_sseu.max_eus_per_subslice = 
>>> ce->sseu.max_eus_per_subslice;
>>> +
>>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                 sizeof(param_sseu)))
>>> +            ret = -EFAULT;
>>> +        break;
>>> +    }
>>>       default:
>>>           ret = -EINVAL;
>>>           break;
>>> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>                   ctx->sched.priority = priority;
>>>           }
>>>           break;
>>> +    case I915_CONTEXT_PARAM_SSEU:
>>> +        {
>>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>>> +            struct intel_engine_cs *engine;
>>> +            union intel_sseu ctx_sseu;
>>> +
>>> +            if (args->size) {
>>> +                ret = -EINVAL;
>>> +                break;
>>> +            }
>>> +
>>> +            if (copy_from_user(&user_sseu, 
>>> u64_to_user_ptr(args->value),
>>> +                       sizeof(user_sseu))) {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            engine = intel_engine_lookup_user(dev_priv,
>>> +                              user_sseu.class,
>>> +                              user_sseu.instance);
>>> +            if (!engine) {
>>> +                ret = -EINVAL;
>>> +                break;
>>> +            }
>>> +
>>> +            ret = 
>>> intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>> +                            &user_sseu, &ctx_sseu);
>>
>> Should setter have a helper as well?
>>
>>> +            if (ret)
>>> +                break;
>>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>>> +                                ctx_sseu);
>>> +        }
>>> +        break;
>>>       default:
>>>           ret = -EINVAL;
>>>           break;
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>>> b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 320b416482e1..4e0216f33c75 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct 
>>> i915_request *request, u32 *cs)
>>>   }
>>>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>>> +           union intel_sseu ctx_sseu)
>>> +{
>>> +    u32 rpcs = 0;
>>> +
>>> +    /*
>>> +     * Starting in Gen9, render power gating can leave
>>> +     * slice/subslice/EU in a partially enabled state. We
>>> +     * must make an explicit request through RPCS for full
>>> +     * enablement.
>>> +     */
>>> +    if (sseu->has_slice_pg) {
>>> +        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>>> +        rpcs |= hweight8(ctx_sseu.slice_mask) << 
>>> GEN8_RPCS_S_CNT_SHIFT;
>>> +        rpcs |= GEN8_RPCS_ENABLE;
>>> +    }
>>> +
>>> +    if (sseu->has_subslice_pg) {
>>> +        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>>> +        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>>> +            GEN8_RPCS_SS_CNT_SHIFT;
>>> +        rpcs |= GEN8_RPCS_ENABLE;
>>> +    }
>>> +
>>> +    if (sseu->has_eu_pg) {
>>> +        rpcs |= ctx_sseu.min_eus_per_subslice <<
>>> +            GEN8_RPCS_EU_MIN_SHIFT;
>>> +        rpcs |= ctx_sseu.max_eus_per_subslice <<
>>> +            GEN8_RPCS_EU_MAX_SHIFT;
>>> +        rpcs |= GEN8_RPCS_ENABLE;
>>> +    }
>>> +
>>> +    return rpcs;
>>> +}
>>> +
>>> +static int gen8_emit_rpcs_config(struct i915_request *rq,
>>> +                 struct i915_gem_context *ctx,
>>> +                 union intel_sseu sseu)
>>> +{
>>> +    struct drm_i915_private *dev_priv = rq->i915;
>>> +    struct intel_context *ce = to_intel_context(ctx, 
>>> dev_priv->engine[RCS]);
>>> +    u64 offset;
>>> +    u32 *cs;
>>> +
>>> +    /* Let the deferred state allocation take care of this. */
>>> +    if (!ce->state)
>>> +        return 0;
>>> +
>>> +    cs = intel_ring_begin(rq, 4);
>>> +    if (IS_ERR(cs))
>>> +        return PTR_ERR(cs);
>>> +
>>> +    offset = ce->state->node.start +
>>> +        LRC_STATE_PN * PAGE_SIZE +
>>> +        (CTX_R_PWR_CLK_STATE + 1) * 4;
>>> +
>>> +    *cs++ = MI_STORE_DWORD_IMM_GEN4;
>>> +    *cs++ = lower_32_bits(offset);
>>> +    *cs++ = upper_32_bits(offset);
>>> +    *cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>>> +
>>> +    intel_ring_advance(rq, cs);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static int gen8_init_rcs_context(struct i915_request *rq)
>>>   {
>>>       int ret;
>>> @@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct 
>>> intel_engine_cs *engine)
>>>       engine->emit_breadcrumb = gen8_emit_breadcrumb;
>>>       engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>>> +    engine->emit_rpcs_config = gen8_emit_rpcs_config;
>>> +
>>>       engine->set_default_submission = 
>>> execlists_set_default_submission;
>>>       if (INTEL_GEN(engine->i915) < 11) {
>>> @@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct 
>>> intel_engine_cs *engine)
>>>       return logical_ring_init(engine);
>>>   }
>>> -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>>> -           union intel_sseu ctx_sseu)
>>> -{
>>> -    u32 rpcs = 0;
>>> -
>>> -    /*
>>> -     * Starting in Gen9, render power gating can leave
>>> -     * slice/subslice/EU in a partially enabled state. We
>>> -     * must make an explicit request through RPCS for full
>>> -     * enablement.
>>> -    */
>>> -    if (sseu->has_slice_pg) {
>>> -        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>>> -        rpcs |= hweight8(ctx_sseu.slice_mask) << 
>>> GEN8_RPCS_S_CNT_SHIFT;
>>> -        rpcs |= GEN8_RPCS_ENABLE;
>>> -    }
>>> -
>>> -    if (sseu->has_subslice_pg) {
>>> -        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>>> -        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>>> -            GEN8_RPCS_SS_CNT_SHIFT;
>>> -        rpcs |= GEN8_RPCS_ENABLE;
>>> -    }
>>> -
>>> -    if (sseu->has_eu_pg) {
>>> -        rpcs |= ctx_sseu.min_eus_per_subslice <<
>>> -            GEN8_RPCS_EU_MIN_SHIFT;
>>> -        rpcs |= ctx_sseu.max_eus_per_subslice <<
>>> -            GEN8_RPCS_EU_MAX_SHIFT;
>>> -        rpcs |= GEN8_RPCS_ENABLE;
>>> -    }
>>> -
>>> -    return rpcs;
>>> -}
>>> -
>>>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs 
>>> *engine)
>>>   {
>>>       u32 indirect_ctx_offset;
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index 8f19349a6055..44fb3a1cf8f9 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct 
>>> drm_i915_private *dev_priv,
>>>               engine->emit_breadcrumb_sz++;
>>>       }
>>> +    engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
>>> +
>>>       engine->set_default_submission = i9xx_set_default_submission;
>>>       if (INTEL_GEN(dev_priv) >= 6)
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> index cc7e73730469..9745f4ab8214 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> @@ -463,6 +463,10 @@ struct intel_engine_cs {
>>>       void        (*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>>>       int        emit_breadcrumb_sz;
>>> +    int        (*emit_rpcs_config)(struct i915_request *rq,
>>> +                        struct i915_gem_context *ctx,
>>> +                        union intel_sseu sseu);
>>> +
>>>       /* Pass the request to the hardware queue (e.g. directly into
>>>        * the legacy ringbuffer or to the end of an execlist).
>>>        *
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 7f5634ce8e88..24b90836ce1d 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>>> +    /*
>>> +     * When using the following param, value should be a pointer to
>>> +     * drm_i915_gem_context_param_sseu.
>>> +     */
>>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>>       __u64 value;
>>>   };
>>> +struct drm_i915_gem_context_param_sseu {
>>> +    /*
>>> +     * Engine class & instance to be configured or queried.
>>> +     */
>>> +    __u32 class;
>>> +    __u32 instance;
>>> +
>>> +    /*
>>> +     * Mask of slices to enable for the context. Valid values are a 
>>> subset
>>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>>> +     */
>>> +    __u8 slice_mask;
>>> +
>>> +    /*
>>> +     * Mask of subslices to enable for the context. Valid values are a
>>> +     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
>>> +     */
>>> +    __u8 subslice_mask;
>>> +
>>> +    /*
>>> +     * Minimum/Maximum number of EUs to enable per subslice for the
>>> +     * context. min_eus_per_subslice must be inferior or equal to
>>> +     * max_eus_per_subslice.
>>> +     */
>>> +    __u8 min_eus_per_subslice;
>>> +    __u8 max_eus_per_subslice;
>>> +
>>> +    /*
>>> +     * Unused for now. Must be cleared to zero.
>>> +     */
>>> +    __u32 rsvd;
>>> +};
>>> +
>>>   enum drm_i915_oa_format {
>>>       I915_OA_FORMAT_A13 = 1,        /* HSW only */
>>>       I915_OA_FORMAT_A29,        /* HSW only */
>>>
>>
>> Regards,
>>
>> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-16 15:40     ` Tvrtko Ursulin
  2018-05-16 15:44       ` Lionel Landwerlin
@ 2018-05-16 15:51       ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2018-05-16 15:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-16 16:40:55)
> 
> On 15/05/2018 10:05, Tvrtko Ursulin wrote:
> > 
> > On 14/05/2018 16:56, Lionel Landwerlin wrote:
> >> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and forcing
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices it is not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>     struct drm_i915_gem_context_param arg;
> >>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>                                                     .instance = 0, };
> >>
> >>     memset(&arg, 0, sizeof(arg));
> >>     arg.ctx_id = ctx;
> >>     arg.param = I915_CONTEXT_PARAM_SSEU;
> >>     arg.value = (uintptr_t) &sseu;
> >>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>         sseu.packed.subslice_mask = 0;
> >>
> >>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>     }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> >> (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities 
> >> (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel 
> >> context (Chris)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> >> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> >> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
> >>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
> >>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>   include/uapi/drm/i915_drm.h             |  38 ++++++
> >>   5 files changed, 281 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 01310c99e032..0b72a771c3f3 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
> >> drm_device *dev, void *data,
> >>       return 0;
> >>   }
> >> +static int
> >> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
> >> +              union intel_sseu *ctx_sseu)
> >> +{
> >> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> >> +        user_sseu->slice_mask == 0)
> >> +        return -EINVAL;
> >> +
> >> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> >> +        user_sseu->subslice_mask == 0)
> >> +        return -EINVAL;
> >> +
> >> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> >> +        return -EINVAL;
> >> +
> >> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> >> +        user_sseu->max_eus_per_subslice < 
> >> user_sseu->min_eus_per_subslice ||
> >> +        user_sseu->max_eus_per_subslice == 0)
> >> +        return -EINVAL;
> >> +
> >> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
> >> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> >> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> >> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +                  struct intel_engine_cs *engine,
> >> +                  union intel_sseu sseu)
> >> +{
> >> +    struct drm_i915_private *dev_priv = ctx->i915;
> >> +    struct i915_timeline *timeline;
> >> +    struct i915_request *rq;
> >> +    union intel_sseu actual_sseu;
> >> +    enum intel_engine_id id;
> >> +    int ret;
> >> +
> >> +    /*
> >> +     * First notify user when this capability is not available so 
> >> that it
> >> +     * can be detected with any valid input.
> >> +     */
> >> +    if (!engine->emit_rpcs_config)
> >> +        return -ENODEV;
> >> +
> >> +    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
> > 
> > Are there other uses for the value union in the series? Think whether it 
> > could be dropped and memcmp used here for simplicity.
> > 
> >> +        return 0;
> >> +
> >> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >> +
> >> +    i915_retire_requests(dev_priv);
> >> +
> >> +    /* Now use the RCS to actually reconfigure. */
> >> +    engine = dev_priv->engine[RCS];
> >> +
> >> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
> >> +    if (IS_ERR(rq))
> >> +        return PTR_ERR(rq);

[snip]

> >> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
> >> +    if (ret) {
> >> +        __i915_request_add(rq, true);
> >> +        return ret;
> >> +    }
> >> +
> >> +    /* Queue this switch after all other activity */
> >> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
> >> +        struct i915_request *prev;
> >> +
> >> +        prev = last_request_on_engine(timeline, engine);
> >> +        if (prev)
> >> +            i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> >> +                             &prev->submit,
> >> +                             I915_FENCE_GFP);
> >> +    }
> >> +
> >> +    __i915_request_add(rq, true);
> > 
> > This is actually the bit I reading the patch for. So this I think is 
> > much better/safer than previous idling. However one concern, and maybe 
> > not a concern but just something which needs to be explained in the 
> > uAPI, is what it means with regards to the point from which the new 
> > configuration becomes live.
> > 
> > Could preemption for instance make it not defined enough? Could we, or 
> > should we, also link this request somehow onto the issuing context 
> > timeline so it must be first there? Hm, should we use the user context 
> > instead of the kernel one, and set the highest priority? But would have 
> > to avoid triggering preemption.

Preemption is a concern with the above code. A barrier does need to be
added from the target context back to the SDM request. What I had in
mind was preferably doing the LRI _from_ each context, so that it is
naturally ordered with respect to execution on every context. Failing
that, to allow the SDM you need to add a nop request on each context
that awaits on the SDM request, or assign the SDM request a U32_MAX
priority. (Using that priority boost would massively mess up the
timelines on the system though, so try to avoid it. ;)
-Chris
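
As a rough sketch of the nop-request barrier idea above (untested and
illustrative only; queue_sseu_barriers is a made-up name, and the body
reuses only helpers already used elsewhere in this series):

    /*
     * After emitting the SDM request, queue an empty request on the
     * target context for every engine, each awaiting the SDM request,
     * so that any later execbuf on that context is ordered after the
     * reconfiguration.
     */
    static int queue_sseu_barriers(struct drm_i915_private *i915,
                                   struct i915_gem_context *ctx,
                                   struct i915_request *sdm_rq)
    {
            struct intel_engine_cs *engine;
            enum intel_engine_id id;

            for_each_engine(engine, i915, id) {
                    struct i915_request *barrier;
                    int err;

                    barrier = i915_request_alloc(engine, ctx);
                    if (IS_ERR(barrier))
                            return PTR_ERR(barrier);

                    /* Order the barrier after the SDM request. */
                    err = i915_request_await_dma_fence(barrier,
                                                       &sdm_rq->fence);

                    __i915_request_add(barrier, true);
                    if (err)
                            return err;
            }

            return 0;
    }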
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-15  9:05   ` Tvrtko Ursulin
  2018-05-16 15:40     ` Tvrtko Ursulin
@ 2018-05-21 13:22     ` Lionel Landwerlin
  2018-05-21 16:00       ` Tvrtko Ursulin
  1 sibling, 1 reply; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-21 13:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 15/05/18 10:05, Tvrtko Ursulin wrote:
>
> On 14/05/2018 16:56, Lionel Landwerlin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and forcing
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices it is not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>     struct drm_i915_gem_context_param arg;
>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                     .instance = 0, };
>>
>>     memset(&arg, 0, sizeof(arg));
>>     arg.ctx_id = ctx;
>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>     arg.value = (uintptr_t) &sseu;
>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>         sseu.packed.subslice_mask = 0;
>>
>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>     }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>> (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities 
>> (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel 
>> context (Chris)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>   5 files changed, 281 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 01310c99e032..0b72a771c3f3 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
>> drm_device *dev, void *data,
>>       return 0;
>>   }
>>   +static int
>> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
>> +              union intel_sseu *ctx_sseu)
>> +{
>> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
>> +        user_sseu->slice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
>> +        user_sseu->subslice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice < 
>> user_sseu->min_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice == 0)
>> +        return -EINVAL;
>> +
>> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
>> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
>> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
>> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                  struct intel_engine_cs *engine,
>> +                  union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct i915_timeline *timeline;
>> +    struct i915_request *rq;
>> +    union intel_sseu actual_sseu;
>> +    enum intel_engine_id id;
>> +    int ret;
>> +
>> +    /*
>> +     * First notify user when this capability is not available so 
>> that it
>> +     * can be detected with any valid input.
>> +     */
>> +    if (!engine->emit_rpcs_config)
>> +        return -ENODEV;
>> +
>> +    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
>
> Are there other uses for the value union in the series? Think whether 
> it could be dropped and memcmp used here for simplicity.

Sure.

>
>> +        return 0;
>> +
>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>> +
>> +    i915_retire_requests(dev_priv);
>> +
>> +    /* Now use the RCS to actually reconfigure. */
>> +    engine = dev_priv->engine[RCS];
>> +
>> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
>> +    if (IS_ERR(rq))
>> +        return PTR_ERR(rq);
>> +
>> +    /*
>> +     * If i915/perf is active, we want a stable powergating 
>> configuration
>> +     * on the system. Act as if we recorded the user's request but 
>> program
>> +     * the default sseu configuration. When i915/perf is stopped, the
>> +     * recorded configuration will be programmed.
>> +     */
>> +    actual_sseu = dev_priv->perf.oa.exclusive_stream ?
>
> This feels it should jump out more since there is no obvious 
> connection between the two. Probably a helper of some sort like 
> intel_sanitize_sseu? Or intel_apply_sseu? Merge? ... ?
>
> The comment would then be in the helper which I think would be better 
> than sprinkle OA knowledge into context params code.

Okay.

>
>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
>> +        sseu;
>> +
>> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
>> +    if (ret) {
>> +        __i915_request_add(rq, true);
>> +        return ret;
>> +    }
>> +
>> +    /* Queue this switch after all other activity */
>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>> +        struct i915_request *prev;
>> +
>> +        prev = last_request_on_engine(timeline, engine);
>> +        if (prev)
>> + i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                             &prev->submit,
>> +                             I915_FENCE_GFP);
>> +    }
>> +
>> +    __i915_request_add(rq, true);
>
> This is actually the bit I reading the patch for. So this I think is 
> much better/safer than previous idling. However one concern, and maybe 
> not a concern but just something which needs to be explained in the 
> uAPI, is what it means with regards to the point from which the new 
> configuration becomes live.

To me it seems that all context ioctl effects are ordered.
So I would expect any execbuf after this ioctl to run with the new
configuration, and anything submitted before it to run with the
previous configuration.

If that's your understanding too, I can add that to the documentation.

>
> Could preemption for instance make it not defined enough? Could we, or 
> should we, also link this request somehow onto the issuing context 
> timeline so it must be first there? Hm, should we use the user context 
> instead of the kernel one, and set the highest priority? But would 
> have to avoid triggering preemption.

My understanding is that a context can't MI_LRI itself before Gen11.
I haven't tested that on Gen11 though.

I'm afraid that leaves us with only a max-priority request tied to all
the previous requests.
Or we could somehow keep a pointer to the last request that changed the
context's powergating configuration and add a dependency to all new
requests until it is retired?
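
Something like that second option could look roughly like this (an
illustrative sketch only; the last_sseu_rq field and the helper name
are made up, and reference handling plus clearing the pointer on
retirement are left out):

    /*
     * Remember the last powergating-reconfiguration request in the
     * context, and make every new request on that context wait on it
     * until it has been retired. Would be called when constructing a
     * new request for ctx (e.g. from i915_request_alloc()).
     */
    static int await_pending_sseu_config(struct i915_request *rq,
                                         struct i915_gem_context *ctx)
    {
            struct i915_request *prev = ctx->last_sseu_rq;

            if (!prev || i915_request_completed(prev))
                    return 0;

            return i915_request_await_dma_fence(rq, &prev->fence);
    }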

>
>> +    /*
>> +     * Apply the configuration to all engine. Our hardware doesn't
>> +     * currently support different configurations for each engine.
>> +     */
>> +    for_each_engine(engine, dev_priv, id) {
>> +        struct intel_context *ce = to_intel_context(ctx, engine);
>> +
>> +        ce->sseu.value = sseu.value;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void 
>> *data,
>>                       struct drm_file *file)
>>   {
>> @@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct 
>> drm_device *dev, void *data,
>>       case I915_CONTEXT_PARAM_PRIORITY:
>>           args->value = ctx->sched.priority;
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU: {
>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>> +        struct intel_engine_cs *engine;
>> +        struct intel_context *ce;
>> +
>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>> +                   sizeof(param_sseu))) {
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        engine = intel_engine_lookup_user(to_i915(dev),
>> +                          param_sseu.class,
>> +                          param_sseu.instance);
>> +        if (!engine) {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        ce = &ctx->__engine[engine->id];
>
> to_intel_context(ctx, engine) ?

Done.

>
>> +
>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>> +        param_sseu.min_eus_per_subslice = 
>> ce->sseu.min_eus_per_subslice;
>> +        param_sseu.max_eus_per_subslice = 
>> ce->sseu.max_eus_per_subslice;
>> +
>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>> +                 sizeof(param_sseu)))
>> +            ret = -EFAULT;
>> +        break;
>> +    }
>>       default:
>>           ret = -EINVAL;
>>           break;
>> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
>> drm_device *dev, void *data,
>>                   ctx->sched.priority = priority;
>>           }
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU:
>> +        {
>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>> +            struct intel_engine_cs *engine;
>> +            union intel_sseu ctx_sseu;
>> +
>> +            if (args->size) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            if (copy_from_user(&user_sseu, 
>> u64_to_user_ptr(args->value),
>> +                       sizeof(user_sseu))) {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            engine = intel_engine_lookup_user(dev_priv,
>> +                              user_sseu.class,
>> +                              user_sseu.instance);
>> +            if (!engine) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            ret = 
>> intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>> +                            &user_sseu, &ctx_sseu);
>
> Should setter have a helper as well?

I'm not sure what you're asking here :(
Setter for what?

>
>> +            if (ret)
>> +                break;
>>   +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>> +                                ctx_sseu);
>> +        }
>> +        break;
>>       default:
>>           ret = -EINVAL;
>>           break;
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 320b416482e1..4e0216f33c75 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct 
>> i915_request *request, u32 *cs)
>>   }
>>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>>   +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           union intel_sseu ctx_sseu)
>> +{
>> +    u32 rpcs = 0;
>> +
>> +    /*
>> +     * Starting in Gen9, render power gating can leave
>> +     * slice/subslice/EU in a partially enabled state. We
>> +     * must make an explicit request through RPCS for full
>> +     * enablement.
>> +     */
>> +    if (sseu->has_slice_pg) {
>> +        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_subslice_pg) {
>> +        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> +            GEN8_RPCS_SS_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_eu_pg) {
>> +        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MIN_SHIFT;
>> +        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MAX_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    return rpcs;
>> +}
>> +
>> +static int gen8_emit_rpcs_config(struct i915_request *rq,
>> +                 struct i915_gem_context *ctx,
>> +                 union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = rq->i915;
>> +    struct intel_context *ce = to_intel_context(ctx, 
>> dev_priv->engine[RCS]);
>> +    u64 offset;
>> +    u32 *cs;
>> +
>> +    /* Let the deferred state allocation take care of this. */
>> +    if (!ce->state)
>> +        return 0;
>> +
>> +    cs = intel_ring_begin(rq, 4);
>> +    if (IS_ERR(cs))
>> +        return PTR_ERR(cs);
>> +
>> +    offset = ce->state->node.start +
>> +        LRC_STATE_PN * PAGE_SIZE +
>> +        (CTX_R_PWR_CLK_STATE + 1) * 4;
>> +
>> +    *cs++ = MI_STORE_DWORD_IMM_GEN4;
>> +    *cs++ = lower_32_bits(offset);
>> +    *cs++ = upper_32_bits(offset);
>> +    *cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>> +
>> +    intel_ring_advance(rq, cs);
>> +
>> +    return 0;
>> +}
>> +
>>   static int gen8_init_rcs_context(struct i915_request *rq)
>>   {
>>       int ret;
>> @@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct 
>> intel_engine_cs *engine)
>>       engine->emit_breadcrumb = gen8_emit_breadcrumb;
>>       engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>>   +    engine->emit_rpcs_config = gen8_emit_rpcs_config;
>> +
>>       engine->set_default_submission = execlists_set_default_submission;
>>         if (INTEL_GEN(engine->i915) < 11) {
>> @@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct 
>> intel_engine_cs *engine)
>>       return logical_ring_init(engine);
>>   }
>>   -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> -           union intel_sseu ctx_sseu)
>> -{
>> -    u32 rpcs = 0;
>> -
>> -    /*
>> -     * Starting in Gen9, render power gating can leave
>> -     * slice/subslice/EU in a partially enabled state. We
>> -     * must make an explicit request through RPCS for full
>> -     * enablement.
>> -    */
>> -    if (sseu->has_slice_pg) {
>> -        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_subslice_pg) {
>> -        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> -            GEN8_RPCS_SS_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_eu_pg) {
>> -        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MIN_SHIFT;
>> -        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MAX_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    return rpcs;
>> -}
>> -
>>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs 
>> *engine)
>>   {
>>       u32 indirect_ctx_offset;
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 8f19349a6055..44fb3a1cf8f9 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct 
>> drm_i915_private *dev_priv,
>>               engine->emit_breadcrumb_sz++;
>>       }
>>   +    engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
>> +
>>       engine->set_default_submission = i9xx_set_default_submission;
>>         if (INTEL_GEN(dev_priv) >= 6)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index cc7e73730469..9745f4ab8214 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -463,6 +463,10 @@ struct intel_engine_cs {
>>       void        (*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>>       int        emit_breadcrumb_sz;
>>   +    int        (*emit_rpcs_config)(struct i915_request *rq,
>> +                        struct i915_gem_context *ctx,
>> +                        union intel_sseu sseu);
>> +
>>       /* Pass the request to the hardware queue (e.g. directly into
>>        * the legacy ringbuffer or to the end of an execlist).
>>        *
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 7f5634ce8e88..24b90836ce1d 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>> +    /*
>> +     * When using the following param, value should be a pointer to
>> +     * drm_i915_gem_context_param_sseu.
>> +     */
>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>       __u64 value;
>>   };
>>   +struct drm_i915_gem_context_param_sseu {
>> +    /*
>> +     * Engine class & instance to be configured or queried.
>> +     */
>> +    __u32 class;
>> +    __u32 instance;
>> +
>> +    /*
>> +     * Mask of slices to enable for the context. Valid values are a 
>> subset
>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>> +     */
>> +    __u8 slice_mask;
>> +
>> +    /*
>> +     * Mask of subslices to enable for the context. Valid values are a
>> +     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
>> +     */
>> +    __u8 subslice_mask;
>> +
>> +    /*
>> +     * Minimum/Maximum number of EUs to enable per subslice for the
>> +     * context. min_eus_per_subslice must be inferior or equal to
>> +     * max_eus_per_subslice.
>> +     */
>> +    __u8 min_eus_per_subslice;
>> +    __u8 max_eus_per_subslice;
>> +
>> +    /*
>> +     * Unused for now. Must be cleared to zero.
>> +     */
>> +    __u32 rsvd;
>> +};
>> +
>>   enum drm_i915_oa_format {
>>       I915_OA_FORMAT_A13 = 1,        /* HSW only */
>>       I915_OA_FORMAT_A29,        /* HSW only */
>>
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-21 13:22     ` Lionel Landwerlin
@ 2018-05-21 16:00       ` Tvrtko Ursulin
  2018-05-21 16:14         ` Lionel Landwerlin
  2018-05-22 16:11         ` Lionel Landwerlin
  0 siblings, 2 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2018-05-21 16:00 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 21/05/2018 14:22, Lionel Landwerlin wrote:
> On 15/05/18 10:05, Tvrtko Ursulin wrote:
>>
>> On 14/05/2018 16:56, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and forcing
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices it is not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>     struct drm_i915_gem_context_param arg;
>>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>                                                     .instance = 0, };
>>>
>>>     memset(&arg, 0, sizeof(arg));
>>>     arg.ctx_id = ctx;
>>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>>     arg.value = (uintptr_t) &sseu;
>>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>         sseu.packed.subslice_mask = 0;
>>>
>>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>     }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>>> (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> v5: Validate sseu configuration against the device's capabilities 
>>> (Lionel)
>>>
>>> v6: Change context powergating settings through MI_SDM on kernel 
>>> context (Chris)

[snip]

>>
>>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
>>> +        sseu;
>>> +
>>> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
>>> +    if (ret) {
>>> +        __i915_request_add(rq, true);
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* Queue this switch after all other activity */
>>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {

This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.
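
Roughly, that shorter walk might look like the following (a sketch
only; ring->timeline and the active_link member name are assumptions
here, and the list is still relied on to be stable under struct_mutex):

    struct intel_ring *ring;

    lockdep_assert_held(&dev_priv->drm.struct_mutex);

    /* Only rings with requests in flight need the barrier. */
    list_for_each_entry(ring, &dev_priv->gt.active_rings, active_link) {
            struct i915_request *prev;

            prev = last_request_on_engine(ring->timeline, engine);
            if (prev)
                    i915_sw_fence_await_sw_fence_gfp(&rq->submit,
                                                     &prev->submit,
                                                     I915_FENCE_GFP);
    }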

>>> +        struct i915_request *prev;
>>> +
>>> +        prev = last_request_on_engine(timeline, engine);
>>> +        if (prev)
>>> + i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>>> +                             &prev->submit,
>>> +                             I915_FENCE_GFP);
>>> +    }
>>> +
>>> +    __i915_request_add(rq, true);
>>
>> This is actually the bit I reading the patch for. So this I think is 
>> much better/safer than previous idling. However one concern, and maybe 
>> not a concern but just something which needs to be explained in the 
>> uAPI, is what it means with regards to the point from which the new 
>> configuration becomes live.
> 
> To me it's seems that all context ioctl effects are ordered.
> So I would expect any execbuf after this ioctl would have the right 
> configuration.
> Anything before would have the previous configuration.
> 
> If that's your understanind too, I can add that in the documentation.

I guess I am missing what prevents the context which issued the SSEU
re-configuration from running before the re-configuration request
executes. They are on separate timelines, and a subsequent execbuf on
the issuing context won't depend on it.

>>
>> Could preemption for instance make it not defined enough? Could we, or 
>> should we, also link this request somehow onto the issuing context 
>> timeline so it must be first there? Hm, should we use the user context 
>> instead of the kernel one, and set the highest priority? But would 
>> have to avoid triggering preemption.
> 
> My understanding is that a context can't MI_LRI itself before Gen11.
> I haven't tested that on Gen11 though.
> 
> I'm afraid that leaves us with only a max priority request tied to all 
> the previous requests.
> Or somehow keep a pointer to the last request to change the context 
> powergating configuration and add a dependency on all new request until 
> it's retired?

I lost the train of dependencies here. :) [snip]

>>
>>> +
>>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>>> +        param_sseu.min_eus_per_subslice = 
>>> ce->sseu.min_eus_per_subslice;
>>> +        param_sseu.max_eus_per_subslice = 
>>> ce->sseu.max_eus_per_subslice;
>>> +
>>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                 sizeof(param_sseu)))
>>> +            ret = -EFAULT;
>>> +        break;
>>> +    }
>>>       default:
>>>           ret = -EINVAL;
>>>           break;
>>> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>                   ctx->sched.priority = priority;
>>>           }
>>>           break;
>>> +    case I915_CONTEXT_PARAM_SSEU:
>>> +        {
>>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>>> +            struct intel_engine_cs *engine;
>>> +            union intel_sseu ctx_sseu;
>>> +
>>> +            if (args->size) {
>>> +                ret = -EINVAL;
>>> +                break;
>>> +            }
>>> +
>>> +            if (copy_from_user(&user_sseu, 
>>> u64_to_user_ptr(args->value),
>>> +                       sizeof(user_sseu))) {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            engine = intel_engine_lookup_user(dev_priv,
>>> +                              user_sseu.class,
>>> +                              user_sseu.instance);
>>> +            if (!engine) {
>>> +                ret = -EINVAL;
>>> +                break;
>>> +            }
>>> +
>>> +            ret = 
>>> intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>> +                            &user_sseu, &ctx_sseu);
>>
>> Should setter have a helper as well?
> 
> I'm not sure what you're asking here :(
> Setter for what?

set_param vs get_param. I just noticed you have a helper to convert
struct sseu from the user to the kernel version, but only in that one
direction. It could be that it doesn't make sense to have the other;
I haven't looked that deeply.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-21 16:00       ` Tvrtko Ursulin
@ 2018-05-21 16:14         ` Lionel Landwerlin
  2018-05-22 16:11         ` Lionel Landwerlin
  1 sibling, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-21 16:14 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 21/05/18 17:00, Tvrtko Ursulin wrote:
>
> On 21/05/2018 14:22, Lionel Landwerlin wrote:
>> On 15/05/18 10:05, Tvrtko Ursulin wrote:
>>>
>>> On 14/05/2018 16:56, Lionel Landwerlin wrote:
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice 
>>>> configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and 
>>>> forcing
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices it is not using, and others should
>>>> fine-tune the number to match their workload.
>>>>
>>>> We expose complete control over the RPCS register, allowing
>>>> configuration of slice/subslice, via masks packed into a u64 for
>>>> simplicity. For example,
>>>>
>>>>     struct drm_i915_gem_context_param arg;
>>>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>> .instance = 0, };
>>>>
>>>>     memset(&arg, 0, sizeof(arg));
>>>>     arg.ctx_id = ctx;
>>>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>>>     arg.value = (uintptr_t) &sseu;
>>>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 
>>>> 0) {
>>>>         sseu.packed.subslice_mask = 0;
>>>>
>>>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>>     }
>>>>
>>>> could be used to disable all subslices where supported.
>>>>
>>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in 
>>>> intel_lr_context_set_sseu() (Lionel)
>>>>
>>>> v3: Add ability to program this per engine (Chris)
>>>>
>>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>>
>>>> v5: Validate sseu configuration against the device's capabilities 
>>>> (Lionel)
>>>>
>>>> v6: Change context powergating settings through MI_SDM on kernel 
>>>> context (Chris)
>
> [snip]
>
>>>
>>>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
>>>> +        sseu;
>>>> +
>>>> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
>>>> +    if (ret) {
>>>> +        __i915_request_add(rq, true);
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    /* Queue this switch after all other activity */
>>>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>
> This can iterate over gt.active_rings for a shorter walk. See current 
> state of engine_has_idle_kernel_context.

Thanks.

>
>>>> +        struct i915_request *prev;
>>>> +
>>>> +        prev = last_request_on_engine(timeline, engine);
>>>> +        if (prev)
>>>> + i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>>>> +                             &prev->submit,
>>>> +                             I915_FENCE_GFP);
>>>> +    }
>>>> +
>>>> +    __i915_request_add(rq, true);
>>>
>>> This is actually the bit I reading the patch for. So this I think is 
>>> much better/safer than previous idling. However one concern, and 
>>> maybe not a concern but just something which needs to be explained 
>>> in the uAPI, is what it means with regards to the point from which 
>>> the new configuration becomes live.
>>
>> To me it's seems that all context ioctl effects are ordered.
>> So I would expect any execbuf after this ioctl would have the right 
>> configuration.
>> Anything before would have the previous configuration.
>>
>> If that's your understanind too, I can add that in the documentation.
>
> I guess I am missing what prevents the context which issued the SSEU 
> re-configuraiton to run before the re-configuration request executes. 
> They are on separate timelines and subsequent execbuf on the issuing 
> context won't depend on it.
>
>>>
>>> Could preemption for instance make it not defined enough? Could we, 
>>> or should we, also link this request somehow onto the issuing 
>>> context timeline so it must be first there? Hm, should we use the 
>>> user context instead of the kernel one, and set the highest 
>>> priority? But would have to avoid triggering preemption.
>>
>> My understanding is that a context can't MI_LRI itself before Gen11.
>> I haven't tested that on Gen11 though.
>>
>> I'm afraid that leaves us with only a max priority request tied to 
>> all the previous requests.
>> Or somehow keep a pointer to the last request to change the context 
>> powergating configuration and add a dependency on all new request 
>> until it's retired?
>
> I lost the train of dependencies here. :) [snip]

Sorry, I was getting confused by reading Chris' reply at the same time.

There is a problem with preemption: since the documentation tells us
that we can't MI_LRI a context from within itself (which is what Chris
suggested and would be much simpler), we have to go with an MI_SDM, for
which this patch currently lacks synchronization.

>
>>>
>>>> +
>>>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>>>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>>>> +        param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
>>>> +        param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
>>>> +
>>>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>>> +                 sizeof(param_sseu)))
>>>> +            ret = -EFAULT;
>>>> +        break;
>>>> +    }
>>>>       default:
>>>>           ret = -EINVAL;
>>>>           break;
>>>> @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>                   ctx->sched.priority = priority;
>>>>           }
>>>>           break;
>>>> +    case I915_CONTEXT_PARAM_SSEU:
>>>> +        {
>>>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>>>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>>>> +            struct intel_engine_cs *engine;
>>>> +            union intel_sseu ctx_sseu;
>>>> +
>>>> +            if (args->size) {
>>>> +                ret = -EINVAL;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>>>> +                       sizeof(user_sseu))) {
>>>> +                ret = -EFAULT;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            engine = intel_engine_lookup_user(dev_priv,
>>>> +                              user_sseu.class,
>>>> +                              user_sseu.instance);
>>>> +            if (!engine) {
>>>> +                ret = -EINVAL;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>>> +                            &user_sseu, &ctx_sseu);
>>>
>>> Should setter have a helper as well?
>>
>> I'm not sure what you're asking here :(
>> Setter for what?
>
> set_param vs get_param. I only noticed you have a helper to convert 
> struct sseu from the user to the kernel version, i.e. for one 
> direction only. It could be that it doesn't make sense to have the 
> other; I haven't looked that deeply.

Ah yeah, for the getter.

It's just that the setter does some error checking that the getter 
doesn't need, and having a helper there keeps the switch() { ... } a 
bit less verbose.
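
For completeness, the get direction would be little more than a field 
copy, something like this (intel_user_sseu_from_sseu is a made-up name; 
the getter above simply open-codes it):

	static void
	intel_user_sseu_from_sseu(const union intel_sseu *sseu,
				  struct drm_i915_gem_context_param_sseu *user_sseu)
	{
		/* No validation needed in this direction: the kernel copy is
		 * already known to be valid, so just copy the fields out.
		 */
		user_sseu->slice_mask = sseu->slice_mask;
		user_sseu->subslice_mask = sseu->subslice_mask;
		user_sseu->min_eus_per_subslice = sseu->min_eus_per_subslice;
		user_sseu->max_eus_per_subslice = sseu->max_eus_per_subslice;
	}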

>
> Regards,
>
> Tvrtko
>



* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-21 16:00       ` Tvrtko Ursulin
  2018-05-21 16:14         ` Lionel Landwerlin
@ 2018-05-22 16:11         ` Lionel Landwerlin
  2018-05-22 16:13           ` Lionel Landwerlin
  1 sibling, 1 reply; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 16:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx



On 21/05/18 17:00, Tvrtko Ursulin wrote:
>>>>
>>>> +
>>>> +    /* Queue this switch after all other activity */
>>>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>
> This can iterate over gt.active_rings for a shorter walk. See current 
> state of engine_has_idle_kernel_context.
>
For some reason, iterating over gt.active_rings will trigger an invalid 
memory access :|

Not sure what's wrong here...





* Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-22 16:11         ` Lionel Landwerlin
@ 2018-05-22 16:13           ` Lionel Landwerlin
  0 siblings, 0 replies; 20+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 16:13 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx



On 22/05/18 17:11, Lionel Landwerlin wrote:
> On 21/05/18 17:00, Tvrtko Ursulin wrote:
>>>>>
>>>>> +
>>>>> +    /* Queue this switch after all other activity */
>>>>> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
>>
>> This can iterate over gt.active_rings for a shorter walk. See current 
>> state of engine_has_idle_kernel_context.
>>
> For some reason, iterating over gt.active_rings will trigger an 
> invalid memory access :|
>
> Not sure what's wrong here...
>
Duh!

Found it, I was iterating with the wrong list member:

list_for_each_entry(ring, &dev_priv->gt.active_rings, link) {

instead of:

list_for_each_entry(ring, &dev_priv->gt.active_rings, active_link) {
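
(For the record, what presumably bites here: the third argument of 
list_for_each_entry() names the list_head member through which the 
entries are linked into that particular list, and the macro computes 
the containing struct from that member's offset, so naming the wrong 
member silently yields bogus struct pointers. From include/linux/list.h:)

	#define list_for_each_entry(pos, head, member)				\
		for (pos = list_first_entry(head, typeof(*pos), member);	\
		     &pos->member != (head);					\
		     pos = list_next_entry(pos, member))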

-
Lionel




end of thread, other threads:[~2018-05-22 16:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-14 15:55 [PATCH v5 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
2018-05-14 15:55 ` [PATCH v5 6/7] drm/i915: count powergating transitions per engine Lionel Landwerlin
2018-05-14 15:56 ` [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-05-15  9:05   ` Tvrtko Ursulin
2018-05-16 15:40     ` Tvrtko Ursulin
2018-05-16 15:44       ` Lionel Landwerlin
2018-05-16 15:51       ` Chris Wilson
2018-05-21 13:22     ` Lionel Landwerlin
2018-05-21 16:00       ` Tvrtko Ursulin
2018-05-21 16:14         ` Lionel Landwerlin
2018-05-22 16:11         ` Lionel Landwerlin
2018-05-22 16:13           ` Lionel Landwerlin
2018-05-14 16:10 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev4) Patchwork
2018-05-14 16:12 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-14 16:33 ` ✓ Fi.CI.BAT: success " Patchwork
