All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/7] drm/i915: per context slice/subslice powergating
@ 2018-05-22 17:59 Lionel Landwerlin
  2018-05-22 17:59 ` [PATCH v6 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 17:59 UTC (permalink / raw)
  To: intel-gfx

Hi all,

This iteration adds a couple of things that were missing in v5 :

  - Synchronize requests on the last powergating change request

  - Add a new sysfs entry "gem_allow_sseu" to let normal users set
    their sseu configuration. It's disabled by default for normal
    users.

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (4):
  drm/i915/perf: simplify configure all context function
  drm/i915/perf: reuse intel_lrc ctx regs macro
  drm/i915/perf: lock powergating configuration to default when active
  drm/i915: add a sysfs entry to let users set sseu configs

 drivers/gpu/drm/i915/i915_drv.h         |  34 ++++
 drivers/gpu/drm/i915/i915_gem.c         |   2 +
 drivers/gpu/drm/i915/i915_gem_context.c | 226 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h |  17 ++
 drivers/gpu/drm/i915/i915_perf.c        |  69 ++++----
 drivers/gpu/drm/i915/i915_request.c     |  20 +++
 drivers/gpu/drm/i915/i915_request.h     |  13 ++
 drivers/gpu/drm/i915/i915_sysfs.c       |  30 ++++
 drivers/gpu/drm/i915/intel_lrc.c        | 117 +++++++-----
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  38 ++++
 13 files changed, 501 insertions(+), 74 deletions(-)

--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 1/7] drm/i915: Program RPCS for Broadwell
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-05-22 17:59 ` Lionel Landwerlin
  2018-05-22 17:59 ` [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 17:59 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though, there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 857ab04452f0..c2500c209c63 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2486,13 +2486,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-22 17:59 ` [PATCH v6 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2018-05-22 17:59 ` Lionel Landwerlin
  2018-05-23 14:54   ` Tvrtko Ursulin
  2018-05-22 17:59 ` [PATCH v6 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 17:59 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem_context.h | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_request.h     | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 22 +++++++++++-----------
 4 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b69b18ef8120..ea9ae1046827 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -260,6 +260,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	unsigned int n;
 	int ret;
 
@@ -315,6 +317,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = to_intel_context(ctx, engine);
+
+		ce->sseu = intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+	}
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index c3262b4dd2ee..b18d870c4932 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -30,6 +30,7 @@
 #include <linux/radix-tree.h>
 
 #include "i915_gem.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -157,6 +158,9 @@ struct i915_gem_context {
 		int pin_count;
 
 		const struct intel_context_ops *ops;
+		
+		/** sseu: Control eu/slice partitioning */
+		union intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
@@ -335,4 +339,17 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
+static inline union intel_sseu
+intel_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
+{
+	union intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1bbbb7a9fa03..aca60895582b 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -39,6 +39,19 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+union intel_sseu {
+	struct {
+		u8 slice_mask;
+		u8 subslice_mask;
+		u8 min_eus_per_subslice;
+		u8 max_eus_per_subslice;
+	};
+	u64 value;
+};
+
 struct intel_wait {
 	struct rb_node node;
 	struct task_struct *tsk;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c2500c209c63..f875be03eadb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2481,8 +2481,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     union intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2492,24 +2492,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
 			GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2633,7 +2632,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  ctx->__engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 3/7] drm/i915/perf: simplify configure all context function
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-22 17:59 ` [PATCH v6 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
  2018-05-22 17:59 ` [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-05-22 17:59 ` Lionel Landwerlin
  2018-05-23 14:54   ` Tvrtko Ursulin
  2018-05-22 17:59 ` [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 17:59 UTC (permalink / raw)
  To: intel-gfx

We don't need any special treatment on error so just return as soon as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4f0eb84b3c00..805dfc732bba 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1762,7 +1762,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1779,7 +1779,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	 */
 	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1791,10 +1791,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1804,7 +1802,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-05-22 17:59 ` [PATCH v6 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
@ 2018-05-22 17:59 ` Lionel Landwerlin
  2018-05-23 14:57   ` Tvrtko Ursulin
  2018-05-22 18:00 ` [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 17:59 UTC (permalink / raw)
  To: intel-gfx

Abstract the context image access a bit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 805dfc732bba..a5d98bda5c2e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -210,6 +210,7 @@
 #include "i915_oa_cflgt3.h"
 #include "i915_oa_cnl.h"
 #include "i915_oa_icl.h"
+#include "intel_lrc_reg.h"
 
 /* HW requires this to be a power of two, between 128k and 16M, though driver
  * is currently generally designed assuming the largest 16M size is used such
@@ -1579,27 +1580,25 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
 	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
-	u32 flex_mmio[] = {
-		i915_mmio_reg_offset(EU_PERF_CNTL0),
-		i915_mmio_reg_offset(EU_PERF_CNTL1),
-		i915_mmio_reg_offset(EU_PERF_CNTL2),
-		i915_mmio_reg_offset(EU_PERF_CNTL3),
-		i915_mmio_reg_offset(EU_PERF_CNTL4),
-		i915_mmio_reg_offset(EU_PERF_CNTL5),
-		i915_mmio_reg_offset(EU_PERF_CNTL6),
+	i915_reg_t flex_regs[] = {
+		EU_PERF_CNTL0,
+		EU_PERF_CNTL1,
+		EU_PERF_CNTL2,
+		EU_PERF_CNTL3,
+		EU_PERF_CNTL4,
+		EU_PERF_CNTL5,
+		EU_PERF_CNTL6,
 	};
 	int i;
 
-	reg_state[ctx_oactxctrl] = i915_mmio_reg_offset(GEN8_OACTXCONTROL);
-	reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
-				      GEN8_OA_TIMER_PERIOD_SHIFT) |
-				     (dev_priv->perf.oa.periodic ?
-				      GEN8_OA_TIMER_ENABLE : 0) |
-				     GEN8_OA_COUNTER_RESUME;
+	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
+		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
+		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
+		GEN8_OA_COUNTER_RESUME);
 
-	for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
+	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
 		u32 state_offset = ctx_flexeu0 + i * 2;
-		u32 mmio = flex_mmio[i];
+		u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
 
 		/*
 		 * This arbitrary default will select the 'EU FPU0 Pipeline
@@ -1619,8 +1618,7 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 			}
 		}
 
-		reg_state[state_offset] = mmio;
-		reg_state[state_offset+1] = value;
+		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
 }
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-05-22 17:59 ` [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
@ 2018-05-22 18:00 ` Lionel Landwerlin
  2018-05-23 15:04   ` Tvrtko Ursulin
  2018-05-22 18:00 ` [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 18:00 UTC (permalink / raw)
  To: intel-gfx

If some of the contexts submitting workloads to the GPU have been
configured to shutdown slices/subslices, we might loose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns where raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking of the
last programmed configuration somewhere in GGTT and use MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penality for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming so it doesn't sound completely unreasonable to hold on
powergating for the same reason.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_perf.c | 24 +++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
 drivers/gpu/drm/i915/intel_lrc.h |  3 +++
 4 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b86ed6401120..21631b51b37b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2743,6 +2743,22 @@ int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
 int intel_engines_init_mmio(struct drm_i915_private *dev_priv);
 int intel_engines_init(struct drm_i915_private *dev_priv);
 
+static inline union intel_sseu
+intel_engine_prepare_sseu(struct intel_engine_cs *engine,
+			  union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. The most natural configuration to take in that case
+	 * is the default (i.e maximum the hardware can do).
+	 */
+	return dev_priv->perf.oa.exclusive_stream ?
+		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
+		sseu;
+}
+
 /* intel_hotplug.c */
 void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 			   u32 pin_mask, u32 long_mask);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a5d98bda5c2e..7ba8a3ff744c 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1574,7 +1574,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  */
 static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+					   const struct i915_oa_config *oa_config,
+					   union intel_sseu sseu)
 {
 	struct drm_i915_private *dev_priv = ctx->i915;
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
@@ -1620,6 +1621,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
 }
 
 /*
@@ -1751,6 +1755,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				       const struct i915_oa_config *oa_config)
 {
 	struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	union intel_sseu default_sseu =
+		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
 	struct i915_gem_context *ctx;
 	int ret;
 	unsigned int wait_flags = I915_WAIT_LOCKED;
@@ -1795,7 +1801,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
+					       oa_config ? default_sseu : ce->sseu);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2167,14 +2174,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    u32 *reg_state)
 {
+	struct drm_i915_private *dev_priv = engine->i915;
 	struct i915_perf_stream *stream;
 
 	if (engine->id != RCS)
 		return;
 
-	stream = engine->i915->perf.oa.exclusive_stream;
-	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+	stream = dev_priv->perf.oa.exclusive_stream;
+	if (stream) {
+		union intel_sseu default_sseu =
+			intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+
+		gen8_update_reg_state_unlocked(ctx, reg_state,
+					       stream->oa_config,
+					       default_sseu);
+	}
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f875be03eadb..8fb6e66a7a84 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2481,8 +2481,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(const struct sseu_dev_info *sseu,
-		     union intel_sseu ctx_sseu)
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2630,10 +2630,13 @@ static void execlists_init_reg_state(u32 *regs,
 	}
 
 	if (rcs) {
+		union intel_sseu sseu = ctx->__engine[engine->id].sseu;
+
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
-				  ctx->__engine[engine->id].sseu));
+			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				       intel_engine_prepare_sseu(engine,
+								 sseu)));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 1593194e930c..265da7228869 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,4 +104,7 @@ struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-05-22 18:00 ` [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
@ 2018-05-22 18:00 ` Lionel Landwerlin
  2018-05-23 15:13   ` Tvrtko Ursulin
  2018-05-22 18:00 ` [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs Lionel Landwerlin
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 18:00 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
	                                                .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
    dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  13 ++
 drivers/gpu/drm/i915/i915_gem.c         |   2 +
 drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c     |  20 +++
 drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  38 ++++++
 8 files changed, 314 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 21631b51b37b..09cfcfe1c339 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2067,6 +2067,12 @@ struct drm_i915_private {
 		u32 active_requests;
 		u32 request_serial;
 
+		/**
+		 * Global barrier to ensuring ordering of sseu transitions
+		 * requests.
+		 */
+		struct i915_gem_active global_barrier;
+
 		/**
 		 * Is the GPU currently considered idle, or busy executing
 		 * userspace requests? Whilst idle, we allow runtime power
@@ -3227,6 +3233,13 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
+static inline void i915_gem_set_global_barrier(struct drm_i915_private *i915,
+					       struct i915_request *rq)
+{
+	lockdep_assert_held(&i915->drm.struct_mutex);
+	i915_gem_active_set(&i915->gt.global_barrier, rq);
+}
+
 /* i915_gem_fence_reg.c */
 struct drm_i915_fence_reg *
 i915_reserve_fence(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 03874b50ada9..9c2a0d04bd39 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5548,6 +5548,8 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->priorities)
 		goto err_dependencies;
 
+	init_request_active(&dev_priv->gt.global_barrier, NULL);
+
 	INIT_LIST_HEAD(&dev_priv->gt.timelines);
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ea9ae1046827..5c5a12f1c265 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -730,6 +730,103 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+			  const struct drm_i915_gem_context_param_sseu *user_sseu,
+			  union intel_sseu *ctx_sseu)
+{
+	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+	    user_sseu->slice_mask == 0)
+		return -EINVAL;
+
+	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+	    user_sseu->subslice_mask == 0)
+		return -EINVAL;
+
+	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+		return -EINVAL;
+
+	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice == 0)
+		return -EINVAL;
+
+	ctx_sseu->slice_mask = user_sseu->slice_mask;
+	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+	return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct i915_request *rq;
+	struct intel_ring *ring;
+	enum intel_engine_id id;
+	int ret;
+
+	/*
+	 * First notify user when this capability is not available so that it
+	 * can be detected with any valid input.
+	 */
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	if (memcmp(&to_intel_context(ctx, engine)->sseu,
+		   &sseu, sizeof(sseu)) == 0) {
+		return 0;
+	}
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	i915_retire_requests(dev_priv);
+
+	/* Now use the RCS to actually reconfigure. */
+	engine = dev_priv->engine[RCS];
+
+	rq = i915_request_alloc(engine, dev_priv->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	ret = engine->emit_rpcs_config(rq, ctx,
+				       intel_engine_prepare_sseu(engine, sseu));
+	if (ret) {
+		__i915_request_add(rq, true);
+		return ret;
+	}
+
+	/* Queue this switch after all other activity */
+	list_for_each_entry(ring, &dev_priv->gt.active_rings, active_link) {
+		struct i915_request *prev;
+
+		prev = last_request_on_engine(ring->timeline, engine);
+		if (prev)
+			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							 &prev->submit,
+							 I915_FENCE_GFP);
+	}
+
+	i915_gem_set_global_barrier(dev_priv, rq);
+	__i915_request_add(rq, true);
+
+	/*
+	 * Apply the configuration to all engine. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = to_intel_context(ctx, engine);
+
+		ce->sseu.value = sseu.value;
+	}
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+		struct intel_context *ce;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = intel_engine_lookup_user(to_i915(dev),
+						  param_sseu.class,
+						  param_sseu.instance);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		ce = to_intel_context(ctx, engine);
+
+		param_sseu.slice_mask = ce->sseu.slice_mask;
+		param_sseu.subslice_mask = ce->sseu.subslice_mask;
+		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -841,7 +969,46 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		{
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			struct intel_engine_cs *engine;
+			union intel_sseu ctx_sseu;
 
+			if (args->size) {
+				ret = -EINVAL;
+				break;
+			}
+
+			if (!capable(CAP_SYS_ADMIN)) {
+				ret = -EPERM;
+				break;
+			}
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = intel_engine_lookup_user(dev_priv,
+							  user_sseu.class,
+							  user_sseu.instance);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+							&user_sseu, &ctx_sseu);
+			if (ret)
+				break;
+
+			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
+								ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index fc499bcbd105..9f0b965125c4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -643,6 +643,22 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
+static int
+i915_request_await_request(struct i915_request *to, struct i915_request *from);
+
+static int add_global_barrier(struct i915_request *rq)
+{
+	struct i915_request *barrier;
+	int ret = 0;
+
+	barrier = i915_gem_active_raw(&rq->i915->gt.global_barrier,
+				      &rq->i915->drm.struct_mutex);
+	if (barrier)
+		ret = i915_request_await_request(rq, barrier);
+
+	return ret;
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -804,6 +820,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 */
 	rq->head = rq->ring->emit;
 
+	ret = add_global_barrier(rq);
+	if (ret)
+		goto err_unwind;
+
 	/* Unconditionally invalidate GPU caches and TLBs. */
 	ret = engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8fb6e66a7a84..e52c9511b5fb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2240,6 +2240,72 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 }
 static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   union intel_sseu ctx_sseu)
+{
+	u32 rpcs = 0;
+
+	/*
+	 * Starting in Gen9, render power gating can leave
+	 * slice/subslice/EU in a partially enabled state. We
+	 * must make an explicit request through RPCS for full
+	 * enablement.
+	 */
+	if (sseu->has_slice_pg) {
+		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_subslice_pg) {
+		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
+			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
+			GEN8_RPCS_EU_MIN_SHIFT;
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
+			GEN8_RPCS_EU_MAX_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	return rpcs;
+}
+
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct i915_gem_context *ctx,
+				 union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = rq->i915;
+	struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]);
+	u64 offset;
+	u32 *cs;
+
+	/* Let the deferred state allocation take care of this. */
+	if (!ce->state)
+		return 0;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = ce->state->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
@@ -2333,6 +2399,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
 
+	engine->emit_rpcs_config = gen8_emit_rpcs_config;
+
 	engine->set_default_submission = execlists_set_default_submission;
 
 	if (INTEL_GEN(engine->i915) < 11) {
@@ -2481,41 +2549,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
-		   union intel_sseu ctx_sseu)
-{
-	u32 rpcs = 0;
-
-	/*
-	 * Starting in Gen9, render power gating can leave
-	 * slice/subslice/EU in a partially enabled state. We
-	 * must make an explicit request through RPCS for full
-	 * enablement.
-	*/
-	if (sseu->has_slice_pg) {
-		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_subslice_pg) {
-		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_eu_pg) {
-		rpcs |= ctx_sseu.min_eus_per_subslice <<
-			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= ctx_sseu.max_eus_per_subslice <<
-			GEN8_RPCS_EU_MAX_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	return rpcs;
-}
-
 static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 {
 	u32 indirect_ctx_offset;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 001cf6bcb349..643466d4fa2d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2061,6 +2061,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 			engine->emit_breadcrumb_sz++;
 	}
 
+	engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
+
 	engine->set_default_submission = i9xx_set_default_submission;
 
 	if (INTEL_GEN(dev_priv) >= 6)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..6bf4d3b57ced 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -456,6 +456,10 @@ struct intel_engine_cs {
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
+	int		(*emit_rpcs_config)(struct i915_request *rq,
+					    struct i915_gem_context *ctx,
+					    union intel_sseu sseu);
+
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..24b90836ce1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u32 class;
+	__u32 instance;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u8 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u8 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be inferior or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u8 min_eus_per_subslice;
+	__u8 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-05-22 18:00 ` [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-22 18:00 ` Lionel Landwerlin
  2018-05-23 15:30   ` Tvrtko Ursulin
  2018-05-22 18:55 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev5) Patchwork
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-22 18:00 UTC (permalink / raw)
  To: intel-gfx

There are concerns about denial of service around the per context sseu
configuration capability. In a previous commit introducing the
capability we allowed it only for capable users. This changes adds a
new debugfs entry to let any user configure its own context
powergating setup.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  5 +++
 drivers/gpu/drm/i915/i915_gem_context.c | 52 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_sysfs.c       | 30 ++++++++++++++
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 09cfcfe1c339..0fccec29fdda 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1843,6 +1843,8 @@ struct drm_i915_private {
 		struct ida hw_ida;
 #define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
+
+		bool allow_sseu;
 	} contexts;
 
 	u32 fdi_rx_config;
@@ -3274,6 +3276,9 @@ i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
 	return ctx;
 }
 
+int i915_gem_contexts_set_allow_sseu(struct drm_i915_private *dev_priv, bool allowed);
+bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private *dev_priv);
+
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file);
 int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5c5a12f1c265..815a9d1c29f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				break;
 			}
 
-			if (!capable(CAP_SYS_ADMIN)) {
+			if (!dev_priv->contexts.allow_sseu &&
+			    !capable(CAP_SYS_ADMIN)) {
 				ret = -EPERM;
 				break;
 			}
@@ -1058,6 +1059,55 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	return ret;
 }
 
+int i915_gem_contexts_set_allow_sseu(struct drm_i915_private *dev_priv,
+				     bool allowed)
+{
+	struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	int ret = 0;
+
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	/*
+	 * When we allow each context to configure its powergating
+	 * configuration, there is no need to put the configurations back to
+	 * the default, it should already be the case.
+	 */
+	if (!allowed) {
+		union intel_sseu default_sseu =
+			intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+		struct i915_gem_context *ctx;
+
+		list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
+			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
+								default_sseu);
+			if (ret)
+				break;
+		}
+	}
+
+	dev_priv->contexts.allow_sseu = allowed;
+
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	return ret;
+}
+
+bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	bool ret;
+
+	if (!engine->emit_rpcs_config)
+		return false;
+
+	mutex_lock(&dev_priv->drm.struct_mutex);
+	ret = dev_priv->contexts.allow_sseu;
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	return ret;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_context.c"
 #include "selftests/i915_gem_context.c"
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index e5e6f6bb2b05..9fd15b138ac9 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -483,6 +483,34 @@ static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr
 	return snprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static ssize_t gem_allow_sseu_show(struct device *kdev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
+	int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t gem_allow_sseu_store(struct device *kdev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
+	u32 val;
+	ssize_t ret;
+
+	ret = kstrtou32(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	i915_gem_contexts_set_allow_sseu(dev_priv, val != 0);
+
+	return ret ?: count;
+}
+
+static DEVICE_ATTR_RW(gem_allow_sseu);
+
 static const struct attribute *gen6_attrs[] = {
 	&dev_attr_gt_act_freq_mhz.attr,
 	&dev_attr_gt_cur_freq_mhz.attr,
@@ -492,6 +520,7 @@ static const struct attribute *gen6_attrs[] = {
 	&dev_attr_gt_RP0_freq_mhz.attr,
 	&dev_attr_gt_RP1_freq_mhz.attr,
 	&dev_attr_gt_RPn_freq_mhz.attr,
+	&dev_attr_gem_allow_sseu.attr,
 	NULL,
 };
 
@@ -505,6 +534,7 @@ static const struct attribute *vlv_attrs[] = {
 	&dev_attr_gt_RP1_freq_mhz.attr,
 	&dev_attr_gt_RPn_freq_mhz.attr,
 	&dev_attr_vlv_rpe_freq_mhz.attr,
+	&dev_attr_gem_allow_sseu.attr,
 	NULL,
 };
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev5)
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2018-05-22 18:00 ` [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs Lionel Landwerlin
@ 2018-05-22 18:55 ` Patchwork
  2018-05-22 18:58 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-22 18:55 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev5)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
377ebb5122c5 drm/i915: Program RPCS for Broadwell
f8abc1461d28 drm/i915: Record the sseu configuration per-context & engine
-:65: ERROR:TRAILING_WHITESPACE: trailing whitespace
#65: FILE: drivers/gpu/drm/i915/i915_gem_context.h:161:
+^I^I$

total: 1 errors, 0 warnings, 0 checks, 123 lines checked
a6a8c598a410 drm/i915/perf: simplify configure all context function
59f0b58443a7 drm/i915/perf: reuse intel_lrc ctx regs macro
2bfb63f77f52 drm/i915/perf: lock powergating configuration to default when active
14eb252b3b42 drm/i915: Expose RPCS (SSEU) configuration to userspace
-:40: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#40: 
v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

total: 0 errors, 1 warnings, 0 checks, 437 lines checked
92c303df8706 drm/i915: add a sysfs entry to let users set sseu configs

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: per context slice/subslice powergating (rev5)
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2018-05-22 18:55 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev5) Patchwork
@ 2018-05-22 18:58 ` Patchwork
  2018-05-22 19:16 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-23  0:10 ` ✗ Fi.CI.IGT: failure " Patchwork
  10 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-22 18:58 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev5)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: Record the sseu configuration per-context & engine
Okay!

Commit: drm/i915/perf: simplify configure all context function
Okay!

Commit: drm/i915/perf: reuse intel_lrc ctx regs macro
Okay!

Commit: drm/i915/perf: lock powergating configuration to default when active
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3663:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3679:16: warning: expression using sizeof(void)

Commit: drm/i915: Expose RPCS (SSEU) configuration to userspace
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3679:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3692:16: warning: expression using sizeof(void)

Commit: drm/i915: add a sysfs entry to let users set sseu configs
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3692:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3697:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: per context slice/subslice powergating (rev5)
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2018-05-22 18:58 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-22 19:16 ` Patchwork
  2018-05-23  0:10 ` ✗ Fi.CI.IGT: failure " Patchwork
  10 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-22 19:16 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev5)
URL   : https://patchwork.freedesktop.org/series/42285/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4221 -> Patchwork_9089 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9089 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9089, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/5/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9089:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9089 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@kms_flip@basic-flip-vs-dpms:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-c-frame-sequence:
      fi-cfl-s3:          FAIL (fdo#103481) -> PASS

    
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927


== Participating hosts (44 -> 39) ==

  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4221 -> Patchwork_9089

  CI_DRM_4221: d83aef98e4f2e35440222c69ef80a68daf1abb4e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4490: 0b381c7d1067a4fe520b72d4d391d4920834cbe0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9089: 92c303df87067fd2ee535cf162ca72ce478d6d55 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4490: 6ab75f7eb5e1dccbb773e1739beeb2d7cbd6ad0d @ git://anongit.freedesktop.org/piglit


== Kernel 32bit build ==

Warning: Kernel 32bit buildtest failed:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9089/build_32bit.log

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     scripts/mod/devicetable-offsets.h
  CHK     include/generated/compile.h
  CHK     kernel/config_data.h
  CC [M]  drivers/gpu/drm/i915/i915_query.o
In file included from ./include/asm-generic/barrier.h:20:0,
                 from ./arch/x86/include/asm/barrier.h:86,
                 from ./include/linux/nospec.h:8,
                 from drivers/gpu/drm/i915/i915_query.c:7:
drivers/gpu/drm/i915/i915_query.c: In function ‘i915_query_ioctl’:
./include/linux/compiler.h:339:38: error: call to ‘__compiletime_assert_119’ declared with attribute error: BUILD_BUG_ON failed: sizeof(_s) > sizeof(long)
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                      ^
./include/linux/compiler.h:319:4: note: in definition of macro ‘__compiletime_assert’
    prefix ## suffix();    \
    ^~~~~~
./include/linux/compiler.h:339:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:45:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:69:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~~~~~~~~~~~~~
./include/linux/nospec.h:53:2: note: in expansion of macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(sizeof(_i) > sizeof(long));   \
  ^~~~~~~~~~~~
drivers/gpu/drm/i915/i915_query.c:118:15: note: in expansion of macro ‘array_index_nospec’
    func_idx = array_index_nospec(func_idx,
               ^~~~~~~~~~~~~~~~~~
scripts/Makefile.build:312: recipe for target 'drivers/gpu/drm/i915/i915_query.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_query.o] Error 1
scripts/Makefile.build:559: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:559: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:559: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1060: recipe for target 'drivers' failed
make: *** [drivers] Error 2


== Linux commits ==

92c303df8706 drm/i915: add a sysfs entry to let users set sseu configs
14eb252b3b42 drm/i915: Expose RPCS (SSEU) configuration to userspace
2bfb63f77f52 drm/i915/perf: lock powergating configuration to default when active
59f0b58443a7 drm/i915/perf: reuse intel_lrc ctx regs macro
a6a8c598a410 drm/i915/perf: simplify configure all context function
f8abc1461d28 drm/i915: Record the sseu configuration per-context & engine
377ebb5122c5 drm/i915: Program RPCS for Broadwell

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9089/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915: per context slice/subslice powergating (rev5)
  2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (9 preceding siblings ...)
  2018-05-22 19:16 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-23  0:10 ` Patchwork
  10 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-23  0:10 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev5)
URL   : https://patchwork.freedesktop.org/series/42285/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4221_full -> Patchwork_9089_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9089_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9089_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/5/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9089_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_ctx_param@invalid-param-get:
      shard-apl:          PASS -> FAIL +1
      shard-glk:          PASS -> FAIL
      shard-snb:          PASS -> FAIL
      shard-hsw:          PASS -> FAIL

    igt@gem_ctx_param@invalid-param-set:
      shard-hsw:          PASS -> DMESG-FAIL
      shard-snb:          PASS -> DMESG-FAIL
      shard-glk:          PASS -> DMESG-FAIL
      shard-apl:          PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@pm_rc6_residency@rc6-accuracy:
      shard-snb:          PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9089_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_color@pipe-b-ctm-negative:
      shard-apl:          PASS -> DMESG-FAIL (fdo#103558, fdo#105602)

    igt@kms_cursor_crc@cursor-256x256-suspend:
      shard-apl:          PASS -> FAIL (fdo#103375)

    igt@kms_flip@2x-flip-vs-blocking-wf-vblank:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip@2x-flip-vs-expired-vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#105707) +1

    
    ==== Possible fixes ====

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          FAIL (fdo#105703) -> PASS

    igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          FAIL (fdo#106509, fdo#105454) -> PASS

    igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
      shard-hsw:          FAIL (fdo#100368) -> PASS

    igt@kms_flip@wf_vblank-ts-check-interruptible:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#105454 https://bugs.freedesktop.org/show_bug.cgi?id=105454
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#105707 https://bugs.freedesktop.org/show_bug.cgi?id=105707
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509


== Participating hosts (9 -> 8) ==

  Missing    (1): shard-kbl 


== Build changes ==

    * Linux: CI_DRM_4221 -> Patchwork_9089

  CI_DRM_4221: d83aef98e4f2e35440222c69ef80a68daf1abb4e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4490: 0b381c7d1067a4fe520b72d4d391d4920834cbe0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9089: 92c303df87067fd2ee535cf162ca72ce478d6d55 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4490: 6ab75f7eb5e1dccbb773e1739beeb2d7cbd6ad0d @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9089/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine
  2018-05-22 17:59 ` [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-05-23 14:54   ` Tvrtko Ursulin
  2018-05-23 14:58     ` Chris Wilson
  2018-05-23 15:52     ` Lionel Landwerlin
  0 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 14:54 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 18:59, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to expose the ability to reconfigure the slices, subslice and
> eu per context and per engine. To facilitate that, store the current
> configuration on the context for each engine, which is initially set
> to the device default upon creation.
> 
> v2: record sseu configuration per context & engine (Chris)
> 
> v3: introduce the i915_gem_context_sseu to store powergating
>      programming, sseu_dev_info has grown quite a bit (Lionel)
> 
> v4: rename i915_gem_sseu into intel_sseu (Chris)
>      use to_intel_context() (Chris)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c |  9 +++++++++
>   drivers/gpu/drm/i915/i915_gem_context.h | 17 +++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.h     | 13 +++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        | 22 +++++++++++-----------
>   4 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b69b18ef8120..ea9ae1046827 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -260,6 +260,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   		    struct drm_i915_file_private *file_priv)
>   {
>   	struct i915_gem_context *ctx;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	unsigned int n;
>   	int ret;
>   
> @@ -315,6 +317,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	 * is no remap info, it will be a NOP. */
>   	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
>   
> +	/* On all engines, use the whole device by default */
> +	for_each_engine(engine, dev_priv, id) {
> +		struct intel_context *ce = to_intel_context(ctx, engine);
> +
> +		ce->sseu = intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
> +	}
> +
>   	i915_gem_context_set_bannable(ctx);
>   	ctx->ring_size = 4 * PAGE_SIZE;
>   	ctx->desc_template =
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index c3262b4dd2ee..b18d870c4932 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -30,6 +30,7 @@
>   #include <linux/radix-tree.h>
>   
>   #include "i915_gem.h"
> +#include "intel_device_info.h"
>   
>   struct pid;
>   
> @@ -157,6 +158,9 @@ struct i915_gem_context {
>   		int pin_count;
>   
>   		const struct intel_context_ops *ops;
> +		
> +		/** sseu: Control eu/slice partitioning */
> +		union intel_sseu sseu;
>   	} __engine[I915_NUM_ENGINES];
>   
>   	/** ring_size: size for allocating the per-engine ring buffer */
> @@ -335,4 +339,17 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
>   	kref_put(&ctx->ref, i915_gem_context_release);
>   }
>   
> +static inline union intel_sseu
> +intel_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
> +{
> +	union intel_sseu value = {
> +		.slice_mask = sseu->slice_mask,
> +		.subslice_mask = sseu->subslice_mask[0],
> +		.min_eus_per_subslice = sseu->max_eus_per_subslice,
> +		.max_eus_per_subslice = sseu->max_eus_per_subslice,
> +	};
> +

Why do we need two internal representations for SSEU configuration?

(Comes back later.) Ok, the INTEL_INFO one is much larger and intel_sseu 
presumably is designed to only contain data which is dynamically changeable?

> +	return value;
> +}
> +
>   #endif /* !__I915_GEM_CONTEXT_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 1bbbb7a9fa03..aca60895582b 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -39,6 +39,19 @@ struct drm_i915_gem_object;
>   struct i915_request;
>   struct i915_timeline;
>   
> +/*
> + * Powergating configuration for a particular (context,engine).
> + */
> +union intel_sseu {
> +	struct {
> +		u8 slice_mask;
> +		u8 subslice_mask;
> +		u8 min_eus_per_subslice;
> +		u8 max_eus_per_subslice;
> +	};
> +	u64 value;

Why is the union needed?

> +};
> +
>   struct intel_wait {
>   	struct rb_node node;
>   	struct task_struct *tsk;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index c2500c209c63..f875be03eadb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2481,8 +2481,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32
> -make_rpcs(struct drm_i915_private *dev_priv)
> +static u32 make_rpcs(const struct sseu_dev_info *sseu,
> +		     union intel_sseu ctx_sseu)
>   {
>   	u32 rpcs = 0;
>   
> @@ -2492,24 +2492,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
>   	 * must make an explicit request through RPCS for full
>   	 * enablement.
>   	*/
> -	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
> +	if (sseu->has_slice_pg) {
>   		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> -		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
> -			GEN8_RPCS_S_CNT_SHIFT;
> +		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>   		rpcs |= GEN8_RPCS_ENABLE;
>   	}
>   
> -	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
> +	if (sseu->has_subslice_pg) {
>   		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> -		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
> +		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>   			GEN8_RPCS_SS_CNT_SHIFT;
>   		rpcs |= GEN8_RPCS_ENABLE;
>   	}
>   
> -	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
> -		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> +	if (sseu->has_eu_pg) {
> +		rpcs |= ctx_sseu.min_eus_per_subslice <<
>   			GEN8_RPCS_EU_MIN_SHIFT;
> -		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> +		rpcs |= ctx_sseu.max_eus_per_subslice <<
>   			GEN8_RPCS_EU_MAX_SHIFT;
>   		rpcs |= GEN8_RPCS_ENABLE;
>   	}
> @@ -2633,7 +2632,8 @@ static void execlists_init_reg_state(u32 *regs,
>   	if (rcs) {
>   		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> -			make_rpcs(dev_priv));
> +			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> +				  ctx->__engine[engine->id].sseu));

to_intel_context(ctx, engine)->sseu ?

>   
>   		i915_oa_init_reg_state(engine, ctx, regs);
>   	}
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/7] drm/i915/perf: simplify configure all context function
  2018-05-22 17:59 ` [PATCH v6 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
@ 2018-05-23 14:54   ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 14:54 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 18:59, Lionel Landwerlin wrote:
> We don't need any special treatment on error so just return as soon as
> possible.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
>   1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 4f0eb84b3c00..805dfc732bba 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1762,7 +1762,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   	/* Switch away from any user context. */
>   	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
>   	if (ret)
> -		goto out;
> +		return ret;
>   
>   	/*
>   	 * The OA register config is setup through the context image. This image
> @@ -1779,7 +1779,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   	 */
>   	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
>   	if (ret)
> -		goto out;
> +		return ret;
>   
>   	/* Update all contexts now that we've stalled the submission. */
>   	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> @@ -1791,10 +1791,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   			continue;
>   
>   		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> -		if (IS_ERR(regs)) {
> -			ret = PTR_ERR(regs);
> -			goto out;
> -		}
> +		if (IS_ERR(regs))
> +			return PTR_ERR(regs);
>   
>   		ce->state->obj->mm.dirty = true;
>   		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
> @@ -1804,7 +1802,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   		i915_gem_object_unpin_map(ce->state->obj);
>   	}
>   
> - out:
>   	return ret;
>   }
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-05-22 17:59 ` [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
@ 2018-05-23 14:57   ` Tvrtko Ursulin
  2018-05-23 15:54     ` Lionel Landwerlin
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 14:57 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 18:59, Lionel Landwerlin wrote:
> Abstract the context image access a bit.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
>   1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 805dfc732bba..a5d98bda5c2e 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -210,6 +210,7 @@
>   #include "i915_oa_cflgt3.h"
>   #include "i915_oa_cnl.h"
>   #include "i915_oa_icl.h"
> +#include "intel_lrc_reg.h"
>   
>   /* HW requires this to be a power of two, between 128k and 16M, though driver
>    * is currently generally designed assuming the largest 16M size is used such
> @@ -1579,27 +1580,25 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>   	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
>   	/* The MMIO offsets for Flex EU registers aren't contiguous */
> -	u32 flex_mmio[] = {
> -		i915_mmio_reg_offset(EU_PERF_CNTL0),
> -		i915_mmio_reg_offset(EU_PERF_CNTL1),
> -		i915_mmio_reg_offset(EU_PERF_CNTL2),
> -		i915_mmio_reg_offset(EU_PERF_CNTL3),
> -		i915_mmio_reg_offset(EU_PERF_CNTL4),
> -		i915_mmio_reg_offset(EU_PERF_CNTL5),
> -		i915_mmio_reg_offset(EU_PERF_CNTL6),
> +	i915_reg_t flex_regs[] = {
> +		EU_PERF_CNTL0,
> +		EU_PERF_CNTL1,
> +		EU_PERF_CNTL2,
> +		EU_PERF_CNTL3,
> +		EU_PERF_CNTL4,
> +		EU_PERF_CNTL5,
> +		EU_PERF_CNTL6,
>   	};
>   	int i;
>   
> -	reg_state[ctx_oactxctrl] = i915_mmio_reg_offset(GEN8_OACTXCONTROL);
> -	reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
> -				      GEN8_OA_TIMER_PERIOD_SHIFT) |
> -				     (dev_priv->perf.oa.periodic ?
> -				      GEN8_OA_TIMER_ENABLE : 0) |
> -				     GEN8_OA_COUNTER_RESUME;
> +	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
> +		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
> +		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> +		GEN8_OA_COUNTER_RESUME);
>   
> -	for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
> +	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
>   		u32 state_offset = ctx_flexeu0 + i * 2;
> -		u32 mmio = flex_mmio[i];
> +		u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
>   
>   		/*
>   		 * This arbitrary default will select the 'EU FPU0 Pipeline
> @@ -1619,8 +1618,7 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   			}
>   		}
>   
> -		reg_state[state_offset] = mmio;
> -		reg_state[state_offset+1] = value;
> +		CTX_REG(reg_state, state_offset, flex_regs[i], value);

Does flex_regs[i] needs to be mmio?

>   	}
>   }
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine
  2018-05-23 14:54   ` Tvrtko Ursulin
@ 2018-05-23 14:58     ` Chris Wilson
  2018-05-23 15:52     ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-23 14:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-23 15:54:04)
> 
> On 22/05/2018 18:59, Lionel Landwerlin wrote:
> > @@ -315,6 +317,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
> >        * is no remap info, it will be a NOP. */
> >       ctx->remap_slice = ALL_L3_SLICES(dev_priv);
> >   
> > +     /* On all engines, use the whole device by default */
> > +     for_each_engine(engine, dev_priv, id) {
> > +             struct intel_context *ce = to_intel_context(ctx, engine);
> > +
> > +             ce->sseu = intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
> > +     }

See a few lines above where we iterate to setup all ce.

> > @@ -2633,7 +2632,8 @@ static void execlists_init_reg_state(u32 *regs,
> >       if (rcs) {
> >               regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> >               CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> > -                     make_rpcs(dev_priv));
> > +                     make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> > +                               ctx->__engine[engine->id].sseu));
> 
> to_intel_context(ctx, engine)->sseu ?

ce->sseu. Is it not passed in yet? Or is that part of the virtual engine
patch?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active
  2018-05-22 18:00 ` [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
@ 2018-05-23 15:04   ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 15:04 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 19:00, Lionel Landwerlin wrote:
> If some of the contexts submitting workloads to the GPU have been
> configured to shutdown slices/subslices, we might loose the NOA
> configurations written in the NOA muxes.
> 
> One possible solution to this problem is to reprogram the NOA muxes
> when we switch to a new context. We initially tried this in the
> workaround batchbuffer but some concerns where raised about the cost
> of reprogramming at every context switch. This solution is also not
> without consequences from the userspace point of view. Reprogramming
> of the muxes can only happen once the powergating configuration has
> changed (which happens after context switch). This means for a window
> of time during the recording, counters recorded by the OA unit might
> be invalid. This requires userspace dealing with OA reports to discard
> the invalid values.
> 
> Minimizing the reprogramming could be implemented by tracking of the
> last programmed configuration somewhere in GGTT and use MI_PREDICATE
> to discard some of the programming commands, but the command streamer
> would still have to parse all the MI_LRI instructions in the
> workaround batchbuffer.
> 
> Another solution, which this change implements, is to simply disregard
> the user requested configuration for the period of time when i915/perf
> is active. There is no known issue with this apart from a performance
> penality for some media workloads that benefit from running on a
> partially powergated GPU. We already prevent RC6 from affecting the
> programming so it doesn't sound completely unreasonable to hold on
> powergating for the same reason.
> 
> v2: Leave RPCS programming in intel_lrc.c (Lionel)
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  | 16 ++++++++++++++++
>   drivers/gpu/drm/i915/i915_perf.c | 24 +++++++++++++++++++-----
>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>   4 files changed, 45 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b86ed6401120..21631b51b37b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2743,6 +2743,22 @@ int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
>   int intel_engines_init_mmio(struct drm_i915_private *dev_priv);
>   int intel_engines_init(struct drm_i915_private *dev_priv);
>   
> +static inline union intel_sseu
> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
> +			  union intel_sseu sseu)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. The most natural configuration to take in that case
> +	 * is the default (i.e maximum the hardware can do).
> +	 */
> +	return dev_priv->perf.oa.exclusive_stream ?
> +		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
> +		sseu;
> +}
> +
>   /* intel_hotplug.c */
>   void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
>   			   u32 pin_mask, u32 long_mask);
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index a5d98bda5c2e..7ba8a3ff744c 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1574,7 +1574,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
>    */
>   static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   					   u32 *reg_state,
> -					   const struct i915_oa_config *oa_config)
> +					   const struct i915_oa_config *oa_config,
> +					   union intel_sseu sseu)
>   {
>   	struct drm_i915_private *dev_priv = ctx->i915;
>   	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
> @@ -1620,6 +1621,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
> +
> +	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>   }
>   
>   /*
> @@ -1751,6 +1755,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   				       const struct i915_oa_config *oa_config)
>   {
>   	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	union intel_sseu default_sseu =
> +		intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
>   	struct i915_gem_context *ctx;
>   	int ret;
>   	unsigned int wait_flags = I915_WAIT_LOCKED;
> @@ -1795,7 +1801,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   		ce->state->obj->mm.dirty = true;
>   		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>   
> -		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
> +		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
> +					       oa_config ? default_sseu : ce->sseu);
>   
>   		i915_gem_object_unpin_map(ce->state->obj);
>   	}
> @@ -2167,14 +2174,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    u32 *reg_state)
>   {
> +	struct drm_i915_private *dev_priv = engine->i915;
>   	struct i915_perf_stream *stream;
>   
>   	if (engine->id != RCS)
>   		return;
>   
> -	stream = engine->i915->perf.oa.exclusive_stream;
> -	if (stream)
> -		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
> +	stream = dev_priv->perf.oa.exclusive_stream;
> +	if (stream) {
> +		union intel_sseu default_sseu =
> +			intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
> +
> +		gen8_update_reg_state_unlocked(ctx, reg_state,
> +					       stream->oa_config,
> +					       default_sseu);
> +	}
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f875be03eadb..8fb6e66a7a84 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2481,8 +2481,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
> -		     union intel_sseu ctx_sseu)
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   union intel_sseu ctx_sseu)
>   {
>   	u32 rpcs = 0;
>   
> @@ -2630,10 +2630,13 @@ static void execlists_init_reg_state(u32 *regs,
>   	}
>   
>   	if (rcs) {
> +		union intel_sseu sseu = ctx->__engine[engine->id].sseu;

to_intel_context

> +
>   		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> -			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> -				  ctx->__engine[engine->id].sseu));
> +			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> +				       intel_engine_prepare_sseu(engine,
> +								 sseu)));
>   
>   		i915_oa_init_reg_state(engine, ctx, regs);
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 1593194e930c..265da7228869 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,4 +104,7 @@ struct i915_gem_context;
>   
>   void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   union intel_sseu ctx_sseu);
> +
>   #endif /* _INTEL_LRC_H_ */
> 

If you'll want me to review this I'll need to study i915_perf.c a bit. :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-22 18:00 ` [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-23 15:13   ` Tvrtko Ursulin
  2018-05-23 15:18     ` Chris Wilson
  2018-05-23 17:12     ` Lionel Landwerlin
  0 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 15:13 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 19:00, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices it is not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> 	struct drm_i915_gem_context_param arg;
> 	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> 	                                                .instance = 0, };
> 
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	arg.value = (uintptr_t) &sseu;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		sseu.packed.subslice_mask = 0;
> 
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a global
>      dependency (Chris)
>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>      Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |  13 ++
>   drivers/gpu/drm/i915/i915_gem.c         |   2 +
>   drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c     |  20 +++
>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>   include/uapi/drm/i915_drm.h             |  38 ++++++
>   8 files changed, 314 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 21631b51b37b..09cfcfe1c339 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2067,6 +2067,12 @@ struct drm_i915_private {
>   		u32 active_requests;
>   		u32 request_serial;
>   
> +		/**
> +		 * Global barrier to ensuring ordering of sseu transitions
> +		 * requests.
> +		 */
> +		struct i915_gem_active global_barrier;
> +
>   		/**
>   		 * Is the GPU currently considered idle, or busy executing
>   		 * userspace requests? Whilst idle, we allow runtime power
> @@ -3227,6 +3233,13 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
>   	return container_of(vm, struct i915_hw_ppgtt, base);
>   }
>   
> +static inline void i915_gem_set_global_barrier(struct drm_i915_private *i915,
> +					       struct i915_request *rq)
> +{
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +	i915_gem_active_set(&i915->gt.global_barrier, rq);
> +}
> +
>   /* i915_gem_fence_reg.c */
>   struct drm_i915_fence_reg *
>   i915_reserve_fence(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 03874b50ada9..9c2a0d04bd39 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5548,6 +5548,8 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   	if (!dev_priv->priorities)
>   		goto err_dependencies;
>   
> +	init_request_active(&dev_priv->gt.global_barrier, NULL);
> +
>   	INIT_LIST_HEAD(&dev_priv->gt.timelines);
>   	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>   	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index ea9ae1046827..5c5a12f1c265 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -730,6 +730,103 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	return 0;
>   }
>   
> +static int
> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +			  const struct drm_i915_gem_context_param_sseu *user_sseu,
> +			  union intel_sseu *ctx_sseu)
> +{
> +	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> +	    user_sseu->slice_mask == 0)
> +		return -EINVAL;
> +
> +	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> +	    user_sseu->subslice_mask == 0)
> +		return -EINVAL;
> +
> +	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> +		return -EINVAL;
> +
> +	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> +	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
> +	    user_sseu->max_eus_per_subslice == 0)
> +		return -EINVAL;
> +
> +	ctx_sseu->slice_mask = user_sseu->slice_mask;
> +	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> +	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> +	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> +
> +	return 0;
> +}
> +
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +				  struct intel_engine_cs *engine,
> +				  union intel_sseu sseu)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +	struct i915_request *rq;
> +	struct intel_ring *ring;
> +	enum intel_engine_id id;
> +	int ret;
> +
> +	/*
> +	 * First notify user when this capability is not available so that it
> +	 * can be detected with any valid input.
> +	 */
> +	if (!engine->emit_rpcs_config)
> +		return -ENODEV;
> +
> +	if (memcmp(&to_intel_context(ctx, engine)->sseu,
> +		   &sseu, sizeof(sseu)) == 0) {
> +		return 0;

But union is still there so I don't know.

> +	}
> +
> +	lockdep_assert_held(&dev_priv->drm.struct_mutex);

Best to move above return -ENODEV line.

> +
> +	i915_retire_requests(dev_priv);
> +
> +	/* Now use the RCS to actually reconfigure. */
> +	engine = dev_priv->engine[RCS];
> +
> +	rq = i915_request_alloc(engine, dev_priv->kernel_context);
> +	if (IS_ERR(rq))
> +		return PTR_ERR(rq);
> +
> +	ret = engine->emit_rpcs_config(rq, ctx,
> +				       intel_engine_prepare_sseu(engine, sseu));
> +	if (ret) {
> +		__i915_request_add(rq, true);
> +		return ret;
> +	}
> +
> +	/* Queue this switch after all other activity */
> +	list_for_each_entry(ring, &dev_priv->gt.active_rings, active_link) {
> +		struct i915_request *prev;
> +
> +		prev = last_request_on_engine(ring->timeline, engine);
> +		if (prev)
> +			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +							 &prev->submit,
> +							 I915_FENCE_GFP);
> +	}
> +
> +	i915_gem_set_global_barrier(dev_priv, rq);
> +	__i915_request_add(rq, true);
> +
> +	/*
> +	 * Apply the configuration to all engine. Our hardware doesn't
> +	 * currently support different configurations for each engine.
> +	 */
> +	for_each_engine(engine, dev_priv, id) {
> +		struct intel_context *ce = to_intel_context(ctx, engine);
> +
> +		ce->sseu.value = sseu.value;
> +	}
> +
> +	return 0;
> +}
> +
>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file)
>   {
> @@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		args->value = ctx->sched.priority;
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU: {
> +		struct drm_i915_gem_context_param_sseu param_sseu;
> +		struct intel_engine_cs *engine;
> +		struct intel_context *ce;
> +
> +		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +				   sizeof(param_sseu))) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		engine = intel_engine_lookup_user(to_i915(dev),
> +						  param_sseu.class,
> +						  param_sseu.instance);
> +		if (!engine) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		ce = to_intel_context(ctx, engine);
> +
> +		param_sseu.slice_mask = ce->sseu.slice_mask;
> +		param_sseu.subslice_mask = ce->sseu.subslice_mask;
> +		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> +		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> +
> +		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +				 sizeof(param_sseu)))
> +			ret = -EFAULT;
> +		break;

Should we think about maybe not implementing the getter? I mean, is it 
useful or just code for the driver which could be dropped?

> +	}
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -841,7 +969,46 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				ctx->sched.priority = priority;
>   		}
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		{
> +			struct drm_i915_private *dev_priv = to_i915(dev);
> +			struct drm_i915_gem_context_param_sseu user_sseu;
> +			struct intel_engine_cs *engine;
> +			union intel_sseu ctx_sseu;
>   
> +			if (args->size) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			if (!capable(CAP_SYS_ADMIN)) {

Hm not sure, will come back to it in the next patch.

> +				ret = -EPERM;
> +				break;
> +			}
> +
> +			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +					   sizeof(user_sseu))) {
> +				ret = -EFAULT;
> +				break;
> +			}
> +
> +			engine = intel_engine_lookup_user(dev_priv,
> +							  user_sseu.class,
> +							  user_sseu.instance);
> +			if (!engine) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> +							&user_sseu, &ctx_sseu);
> +			if (ret)
> +				break;
> +
> +			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
> +								ctx_sseu);
> +		}
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index fc499bcbd105..9f0b965125c4 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -643,6 +643,22 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   	return NOTIFY_DONE;
>   }
>   
> +static int
> +i915_request_await_request(struct i915_request *to, struct i915_request *from);
> +
> +static int add_global_barrier(struct i915_request *rq)
> +{
> +	struct i915_request *barrier;
> +	int ret = 0;
> +
> +	barrier = i915_gem_active_raw(&rq->i915->gt.global_barrier,
> +				      &rq->i915->drm.struct_mutex);
> +	if (barrier)
> +		ret = i915_request_await_request(rq, barrier);
> +
> +	return ret;
> +}
> +
>   /**
>    * i915_request_alloc - allocate a request structure
>    *
> @@ -804,6 +820,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 */
>   	rq->head = rq->ring->emit;
>   
> +	ret = add_global_barrier(rq);
> +	if (ret)
> +		goto err_unwind;

Who ever clears the barrier?

> +
>   	/* Unconditionally invalidate GPU caches and TLBs. */
>   	ret = engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8fb6e66a7a84..e52c9511b5fb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2240,6 +2240,72 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   }
>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>   
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   union intel_sseu ctx_sseu)
> +{
> +	u32 rpcs = 0;
> +
> +	/*
> +	 * Starting in Gen9, render power gating can leave
> +	 * slice/subslice/EU in a partially enabled state. We
> +	 * must make an explicit request through RPCS for full
> +	 * enablement.
> +	 */
> +	if (sseu->has_slice_pg) {
> +		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> +		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	if (sseu->has_subslice_pg) {
> +		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> +		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
> +			GEN8_RPCS_SS_CNT_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	if (sseu->has_eu_pg) {
> +		rpcs |= ctx_sseu.min_eus_per_subslice <<
> +			GEN8_RPCS_EU_MIN_SHIFT;
> +		rpcs |= ctx_sseu.max_eus_per_subslice <<
> +			GEN8_RPCS_EU_MAX_SHIFT;
> +		rpcs |= GEN8_RPCS_ENABLE;
> +	}
> +
> +	return rpcs;
> +}
> +
> +static int gen8_emit_rpcs_config(struct i915_request *rq,
> +				 struct i915_gem_context *ctx,
> +				 union intel_sseu sseu)
> +{
> +	struct drm_i915_private *dev_priv = rq->i915;
> +	struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]);
> +	u64 offset;
> +	u32 *cs;
> +
> +	/* Let the deferred state allocation take care of this. */
> +	if (!ce->state)
> +		return 0;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	offset = ce->state->node.start +
> +		LRC_STATE_PN * PAGE_SIZE +
> +		(CTX_R_PWR_CLK_STATE + 1) * 4;
> +
> +	*cs++ = MI_STORE_DWORD_IMM_GEN4;
> +	*cs++ = lower_32_bits(offset);
> +	*cs++ = upper_32_bits(offset);
> +	*cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
>   static int gen8_init_rcs_context(struct i915_request *rq)
>   {
>   	int ret;
> @@ -2333,6 +2399,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->emit_breadcrumb = gen8_emit_breadcrumb;
>   	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>   
> +	engine->emit_rpcs_config = gen8_emit_rpcs_config;
> +
>   	engine->set_default_submission = execlists_set_default_submission;
>   
>   	if (INTEL_GEN(engine->i915) < 11) {
> @@ -2481,41 +2549,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> -		   union intel_sseu ctx_sseu)
> -{
> -	u32 rpcs = 0;
> -
> -	/*
> -	 * Starting in Gen9, render power gating can leave
> -	 * slice/subslice/EU in a partially enabled state. We
> -	 * must make an explicit request through RPCS for full
> -	 * enablement.
> -	*/
> -	if (sseu->has_slice_pg) {
> -		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> -		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	if (sseu->has_subslice_pg) {
> -		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> -		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
> -			GEN8_RPCS_SS_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	if (sseu->has_eu_pg) {
> -		rpcs |= ctx_sseu.min_eus_per_subslice <<
> -			GEN8_RPCS_EU_MIN_SHIFT;
> -		rpcs |= ctx_sseu.max_eus_per_subslice <<
> -			GEN8_RPCS_EU_MAX_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> -	}
> -
> -	return rpcs;
> -}
> -
>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
>   {
>   	u32 indirect_ctx_offset;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 001cf6bcb349..643466d4fa2d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2061,6 +2061,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   			engine->emit_breadcrumb_sz++;
>   	}
>   
> +	engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
> +
>   	engine->set_default_submission = i9xx_set_default_submission;
>   
>   	if (INTEL_GEN(dev_priv) >= 6)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index acef385c4c80..6bf4d3b57ced 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -456,6 +456,10 @@ struct intel_engine_cs {
>   	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>   	int		emit_breadcrumb_sz;
>   
> +	int		(*emit_rpcs_config)(struct i915_request *rq,
> +					    struct i915_gem_context *ctx,
> +					    union intel_sseu sseu);
> +
>   	/* Pass the request to the hardware queue (e.g. directly into
>   	 * the legacy ringbuffer or to the end of an execlist).
>   	 *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 7f5634ce8e88..24b90836ce1d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +	/*
> +	 * When using the following param, value should be a pointer to
> +	 * drm_i915_gem_context_param_sseu.
> +	 */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
>   	__u64 value;
>   };
>   
> +struct drm_i915_gem_context_param_sseu {
> +	/*
> +	 * Engine class & instance to be configured or queried.
> +	 */
> +	__u32 class;
> +	__u32 instance;

Chris and I were talking about whether u32 is overkill and we should 
settle for u16:u16 for class:instance. I think 16-bit should be enough. 
But it can also be u32, I don't think there are any real downsides here 
unless we want to be consistent in all places.

> +
> +	/*
> +	 * Mask of slices to enable for the context. Valid values are a subset
> +	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
> +	 */
> +	__u8 slice_mask;
> +
> +	/*
> +	 * Mask of subslices to enable for the context. Valid values are a
> +	 * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
> +	 */
> +	__u8 subslice_mask;

Is this future proof enough, say for Gen11?

> +
> +	/*
> +	 * Minimum/Maximum number of EUs to enable per subslice for the
> +	 * context. min_eus_per_subslice must be inferior or equal to
> +	 * max_eus_per_subslice.
> +	 */
> +	__u8 min_eus_per_subslice;
> +	__u8 max_eus_per_subslice;
> +
> +	/*
> +	 * Unused for now. Must be cleared to zero.
> +	 */
> +	__u32 rsvd;
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-23 15:13   ` Tvrtko Ursulin
@ 2018-05-23 15:18     ` Chris Wilson
  2018-05-23 17:12     ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-23 15:18 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-23 16:13:38)
> 
> On 22/05/2018 19:00, Lionel Landwerlin wrote:
> > @@ -804,6 +820,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >        */
> >       rq->head = rq->ring->emit;
> >   
> > +     ret = add_global_barrier(rq);
> > +     if (ret)
> > +             goto err_unwind;
> 
> Who ever clears the barrier?

Automatically cleared on request retirement (as it is hooked in with a
i915_gem_active).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-22 18:00 ` [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs Lionel Landwerlin
@ 2018-05-23 15:30   ` Tvrtko Ursulin
  2018-05-23 17:33     ` Lionel Landwerlin
  2018-05-23 17:37     ` Lionel Landwerlin
  0 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-23 15:30 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 22/05/2018 19:00, Lionel Landwerlin wrote:
> There are concerns about denial of service around the per context sseu
> configuration capability. In a previous commit introducing the
> capability we allowed it only for capable users. This changes adds a
> new debugfs entry to let any user configure its own context
> powergating setup.
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |  5 +++
>   drivers/gpu/drm/i915/i915_gem_context.c | 52 ++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_sysfs.c       | 30 ++++++++++++++
>   3 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 09cfcfe1c339..0fccec29fdda 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1843,6 +1843,8 @@ struct drm_i915_private {
>   		struct ida hw_ida;
>   #define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
>   #define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
> +
> +		bool allow_sseu;

I think spelling out "dynamic" in patch titles, commit messages and even 
in this variable name would be better and clearer.

>   	} contexts;
>   
>   	u32 fdi_rx_config;
> @@ -3274,6 +3276,9 @@ i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
>   	return ctx;
>   }
>   
> +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private *dev_priv, bool allowed);
> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private *dev_priv);
> +
>   int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   			 struct drm_file *file);
>   int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 5c5a12f1c265..815a9d1c29f3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				break;
>   			}
>   
> -			if (!capable(CAP_SYS_ADMIN)) {
> +			if (!dev_priv->contexts.allow_sseu &&
> +			    !capable(CAP_SYS_ADMIN)) {

So the thing I mentioned in the previous patch. My thinking was not to 
fail it if sysfs toggle is disabled, but just don't do dynamic 
switching. That way the software stack works in all cases but user has 
the option of tuning his system.

I mean it can still be done but in userspace. Set param fails, userspace 
goes for a fall back.

Only difference is I guess whether it is useful to allow switching at 
runtime. I am imagining running some 3d and media, and then toggling the 
knob to see what happens to power and performance on the fly. But maybe 
that is not so interesting.

Along the same lines I was thinking CAP_SYS_ADMIN limitation could be 
dropped.

Both points are for a wider discussion I guess.

>   				ret = -EPERM;
>   				break;
>   			}
> @@ -1058,6 +1059,55 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
>   	return ret;
>   }
>   
> +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private *dev_priv,
> +				     bool allowed)
> +{
> +	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	int ret = 0;
> +
> +	if (!engine->emit_rpcs_config)
> +		return -ENODEV;
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);

This should be interruptible so user can Ctrl-C if it is stuck.

> +
> +	/*
> +	 * When we allow each context to configure its powergating
> +	 * configuration, there is no need to put the configurations back to
> +	 * the default, it should already be the case.
> +	 */
> +	if (!allowed) {
> +		union intel_sseu default_sseu =
> +			intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
> +		struct i915_gem_context *ctx;
> +
> +		list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> +			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
> +								default_sseu);
> +			if (ret)
> +				break;
> +		}
> +	}
> +
> +	dev_priv->contexts.allow_sseu = allowed;
> +
> +	mutex_unlock(&dev_priv->drm.struct_mutex);
> +	return ret;
> +}
> +
> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	bool ret;
> +
> +	if (!engine->emit_rpcs_config)
> +		return false;
> +
> +	mutex_lock(&dev_priv->drm.struct_mutex);
> +	ret = dev_priv->contexts.allow_sseu;
> +	mutex_unlock(&dev_priv->drm.struct_mutex);

I guess this mutex does nothing in this case.

> +	return ret;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/mock_context.c"
>   #include "selftests/i915_gem_context.c"
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index e5e6f6bb2b05..9fd15b138ac9 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -483,6 +483,34 @@ static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr
>   	return snprintf(buf, PAGE_SIZE, "%d\n", val);
>   }
>   
> +static ssize_t gem_allow_sseu_show(struct device *kdev,
> +				   struct device_attribute *attr, char *buf)
> +{
> +	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
> +	int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
> +
> +	return snprintf(buf, PAGE_SIZE, "%d\n", ret);

Propagate ENODEV all the way by making i915_gem_contexts_get_allow_sseu 
return an int?

> +}
> +
> +static ssize_t gem_allow_sseu_store(struct device *kdev,
> +				    struct device_attribute *attr,
> +				    const char *buf, size_t count)
> +{
> +	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
> +	u32 val;
> +	ssize_t ret;

Chris will complain about Christmas trees. :)

(And possible about new use of dev_priv throughout the series but that 
battle is in stale mate.)

> +
> +	ret = kstrtou32(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_contexts_set_allow_sseu(dev_priv, val != 0);

I would enforce 0 or 1 to allow (hypothetical) future expansion with 
more values.

> +
> +	return ret ?: count;
> +}
> +
> +static DEVICE_ATTR_RW(gem_allow_sseu);

I would like to open up a round of bike-shedding of the knob name. :)

How about allow_dynamic_sseu? gem_ or not, don't know.

> +
>   static const struct attribute *gen6_attrs[] = {
>   	&dev_attr_gt_act_freq_mhz.attr,
>   	&dev_attr_gt_cur_freq_mhz.attr,
> @@ -492,6 +520,7 @@ static const struct attribute *gen6_attrs[] = {
>   	&dev_attr_gt_RP0_freq_mhz.attr,
>   	&dev_attr_gt_RP1_freq_mhz.attr,
>   	&dev_attr_gt_RPn_freq_mhz.attr,
> +	&dev_attr_gem_allow_sseu.attr,
>   	NULL,
>   };
>   
> @@ -505,6 +534,7 @@ static const struct attribute *vlv_attrs[] = {
>   	&dev_attr_gt_RP1_freq_mhz.attr,
>   	&dev_attr_gt_RPn_freq_mhz.attr,
>   	&dev_attr_vlv_rpe_freq_mhz.attr,
> +	&dev_attr_gem_allow_sseu.attr,
>   	NULL,
>   };
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine
  2018-05-23 14:54   ` Tvrtko Ursulin
  2018-05-23 14:58     ` Chris Wilson
@ 2018-05-23 15:52     ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-23 15:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 23/05/18 15:54, Tvrtko Ursulin wrote:
>
> On 22/05/2018 18:59, Lionel Landwerlin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to expose the ability to reconfigure the slices, subslice and
>> eu per context and per engine. To facilitate that, store the current
>> configuration on the context for each engine, which is initially set
>> to the device default upon creation.
>>
>> v2: record sseu configuration per context & engine (Chris)
>>
>> v3: introduce the i915_gem_context_sseu to store powergating
>>      programming, sseu_dev_info has grown quite a bit (Lionel)
>>
>> v4: rename i915_gem_sseu into intel_sseu (Chris)
>>      use to_intel_context() (Chris)
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c |  9 +++++++++
>>   drivers/gpu/drm/i915/i915_gem_context.h | 17 +++++++++++++++++
>>   drivers/gpu/drm/i915/i915_request.h     | 13 +++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        | 22 +++++++++++-----------
>>   4 files changed, 50 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index b69b18ef8120..ea9ae1046827 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -260,6 +260,8 @@ __create_hw_context(struct drm_i915_private 
>> *dev_priv,
>>               struct drm_i915_file_private *file_priv)
>>   {
>>       struct i915_gem_context *ctx;
>> +    struct intel_engine_cs *engine;
>> +    enum intel_engine_id id;
>>       unsigned int n;
>>       int ret;
>>   @@ -315,6 +317,13 @@ __create_hw_context(struct drm_i915_private 
>> *dev_priv,
>>        * is no remap info, it will be a NOP. */
>>       ctx->remap_slice = ALL_L3_SLICES(dev_priv);
>>   +    /* On all engines, use the whole device by default */
>> +    for_each_engine(engine, dev_priv, id) {
>> +        struct intel_context *ce = to_intel_context(ctx, engine);
>> +
>> +        ce->sseu = 
>> intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
>> +    }
>> +
>>       i915_gem_context_set_bannable(ctx);
>>       ctx->ring_size = 4 * PAGE_SIZE;
>>       ctx->desc_template =
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h 
>> b/drivers/gpu/drm/i915/i915_gem_context.h
>> index c3262b4dd2ee..b18d870c4932 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
>> @@ -30,6 +30,7 @@
>>   #include <linux/radix-tree.h>
>>     #include "i915_gem.h"
>> +#include "intel_device_info.h"
>>     struct pid;
>>   @@ -157,6 +158,9 @@ struct i915_gem_context {
>>           int pin_count;
>>             const struct intel_context_ops *ops;
>> +
>> +        /** sseu: Control eu/slice partitioning */
>> +        union intel_sseu sseu;
>>       } __engine[I915_NUM_ENGINES];
>>         /** ring_size: size for allocating the per-engine ring buffer */
>> @@ -335,4 +339,17 @@ static inline void i915_gem_context_put(struct 
>> i915_gem_context *ctx)
>>       kref_put(&ctx->ref, i915_gem_context_release);
>>   }
>>   +static inline union intel_sseu
>> +intel_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
>> +{
>> +    union intel_sseu value = {
>> +        .slice_mask = sseu->slice_mask,
>> +        .subslice_mask = sseu->subslice_mask[0],
>> +        .min_eus_per_subslice = sseu->max_eus_per_subslice,
>> +        .max_eus_per_subslice = sseu->max_eus_per_subslice,
>> +    };
>> +
>
> Why do we need two internal representations for SSEU configuration?
>
> (Comes back later.) Ok, the INTEL_INFO one is much larger and 
> intel_sseu presumably is designed to only contain data which is 
> dynamically changeable?

Correct. Copied in each context.

>
>> +    return value;
>> +}
>> +
>>   #endif /* !__I915_GEM_CONTEXT_H__ */
>> diff --git a/drivers/gpu/drm/i915/i915_request.h 
>> b/drivers/gpu/drm/i915/i915_request.h
>> index 1bbbb7a9fa03..aca60895582b 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -39,6 +39,19 @@ struct drm_i915_gem_object;
>>   struct i915_request;
>>   struct i915_timeline;
>>   +/*
>> + * Powergating configuration for a particular (context,engine).
>> + */
>> +union intel_sseu {
>> +    struct {
>> +        u8 slice_mask;
>> +        u8 subslice_mask;
>> +        u8 min_eus_per_subslice;
>> +        u8 max_eus_per_subslice;
>> +    };
>> +    u64 value;
>
> Why is the union needed?

Oh yeah, I should throw it out.

>
>> +};
>> +
>>   struct intel_wait {
>>       struct rb_node node;
>>       struct task_struct *tsk;
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index c2500c209c63..f875be03eadb 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2481,8 +2481,8 @@ int logical_xcs_ring_init(struct 
>> intel_engine_cs *engine)
>>       return logical_ring_init(engine);
>>   }
>>   -static u32
>> -make_rpcs(struct drm_i915_private *dev_priv)
>> +static u32 make_rpcs(const struct sseu_dev_info *sseu,
>> +             union intel_sseu ctx_sseu)
>>   {
>>       u32 rpcs = 0;
>>   @@ -2492,24 +2492,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
>>        * must make an explicit request through RPCS for full
>>        * enablement.
>>       */
>> -    if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
>> +    if (sseu->has_slice_pg) {
>>           rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> -        rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
>> -            GEN8_RPCS_S_CNT_SHIFT;
>> +        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>>           rpcs |= GEN8_RPCS_ENABLE;
>>       }
>>   -    if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
>> +    if (sseu->has_subslice_pg) {
>>           rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> -        rpcs |= 
>> hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
>> +        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>>               GEN8_RPCS_SS_CNT_SHIFT;
>>           rpcs |= GEN8_RPCS_ENABLE;
>>       }
>>   -    if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
>> -        rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
>> +    if (sseu->has_eu_pg) {
>> +        rpcs |= ctx_sseu.min_eus_per_subslice <<
>>               GEN8_RPCS_EU_MIN_SHIFT;
>> -        rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
>> +        rpcs |= ctx_sseu.max_eus_per_subslice <<
>>               GEN8_RPCS_EU_MAX_SHIFT;
>>           rpcs |= GEN8_RPCS_ENABLE;
>>       }
>> @@ -2633,7 +2632,8 @@ static void execlists_init_reg_state(u32 *regs,
>>       if (rcs) {
>>           regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>           CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> -            make_rpcs(dev_priv));
>> +            make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> +                  ctx->__engine[engine->id].sseu));
>
> to_intel_context(ctx, engine)->sseu ?

Thanks.

>
>>             i915_oa_init_reg_state(engine, ctx, regs);
>>       }
>>
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-05-23 14:57   ` Tvrtko Ursulin
@ 2018-05-23 15:54     ` Lionel Landwerlin
  2018-05-24 10:33       ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-23 15:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 23/05/18 15:57, Tvrtko Ursulin wrote:
>
> On 22/05/2018 18:59, Lionel Landwerlin wrote:
>> Abstract the context image access a bit.
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
>>   1 file changed, 16 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 805dfc732bba..a5d98bda5c2e 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -210,6 +210,7 @@
>>   #include "i915_oa_cflgt3.h"
>>   #include "i915_oa_cnl.h"
>>   #include "i915_oa_icl.h"
>> +#include "intel_lrc_reg.h"
>>     /* HW requires this to be a power of two, between 128k and 16M, 
>> though driver
>>    * is currently generally designed assuming the largest 16M size is 
>> used such
>> @@ -1579,27 +1580,25 @@ static void 
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>       u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>>       u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
>>       /* The MMIO offsets for Flex EU registers aren't contiguous */
>> -    u32 flex_mmio[] = {
>> -        i915_mmio_reg_offset(EU_PERF_CNTL0),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL1),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL2),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL3),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL4),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL5),
>> -        i915_mmio_reg_offset(EU_PERF_CNTL6),
>> +    i915_reg_t flex_regs[] = {
>> +        EU_PERF_CNTL0,
>> +        EU_PERF_CNTL1,
>> +        EU_PERF_CNTL2,
>> +        EU_PERF_CNTL3,
>> +        EU_PERF_CNTL4,
>> +        EU_PERF_CNTL5,
>> +        EU_PERF_CNTL6,
>>       };
>>       int i;
>>   -    reg_state[ctx_oactxctrl] = 
>> i915_mmio_reg_offset(GEN8_OACTXCONTROL);
>> -    reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
>> -                      GEN8_OA_TIMER_PERIOD_SHIFT) |
>> -                     (dev_priv->perf.oa.periodic ?
>> -                      GEN8_OA_TIMER_ENABLE : 0) |
>> -                     GEN8_OA_COUNTER_RESUME;
>> +    CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>> +        (dev_priv->perf.oa.period_exponent << 
>> GEN8_OA_TIMER_PERIOD_SHIFT) |
>> +        (dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
>> +        GEN8_OA_COUNTER_RESUME);
>>   -    for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
>> +    for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
>>           u32 state_offset = ctx_flexeu0 + i * 2;
>> -        u32 mmio = flex_mmio[i];
>> +        u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
>>             /*
>>            * This arbitrary default will select the 'EU FPU0 Pipeline
>> @@ -1619,8 +1618,7 @@ static void 
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>               }
>>           }
>>   -        reg_state[state_offset] = mmio;
>> -        reg_state[state_offset+1] = value;
>> +        CTX_REG(reg_state, state_offset, flex_regs[i], value);
>
> Does flex_regs[i] needs to be mmio?

Yeah, the macro uses i915_mmio_reg_offset().

>
>>       }
>>   }
>>
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-23 15:13   ` Tvrtko Ursulin
  2018-05-23 15:18     ` Chris Wilson
@ 2018-05-23 17:12     ` Lionel Landwerlin
  2018-05-24 10:43       ` Tvrtko Ursulin
  1 sibling, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-23 17:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 23/05/18 16:13, Tvrtko Ursulin wrote:
>
> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and forcing
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices it is not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>     struct drm_i915_gem_context_param arg;
>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                     .instance = 0, };
>>
>>     memset(&arg, 0, sizeof(arg));
>>     arg.ctx_id = ctx;
>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>     arg.value = (uintptr_t) &sseu;
>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>         sseu.packed.subslice_mask = 0;
>>
>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>     }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>> (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities 
>> (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel 
>> context (Chris)
>>
>> v7: Synchronize the requests following a powergating setting change 
>> using a global
>>      dependency (Chris)
>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>      Disable RPCS configuration setting for non capable users 
>> (Lionel/Tvrtko)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |  13 ++
>>   drivers/gpu/drm/i915/i915_gem.c         |   2 +
>>   drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_request.c     |  20 +++
>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>   8 files changed, 314 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 21631b51b37b..09cfcfe1c339 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2067,6 +2067,12 @@ struct drm_i915_private {
>>           u32 active_requests;
>>           u32 request_serial;
>>   +        /**
>> +         * Global barrier to ensuring ordering of sseu transitions
>> +         * requests.
>> +         */
>> +        struct i915_gem_active global_barrier;
>> +
>>           /**
>>            * Is the GPU currently considered idle, or busy executing
>>            * userspace requests? Whilst idle, we allow runtime power
>> @@ -3227,6 +3233,13 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
>>       return container_of(vm, struct i915_hw_ppgtt, base);
>>   }
>>   +static inline void i915_gem_set_global_barrier(struct 
>> drm_i915_private *i915,
>> +                           struct i915_request *rq)
>> +{
>> +    lockdep_assert_held(&i915->drm.struct_mutex);
>> +    i915_gem_active_set(&i915->gt.global_barrier, rq);
>> +}
>> +
>>   /* i915_gem_fence_reg.c */
>>   struct drm_i915_fence_reg *
>>   i915_reserve_fence(struct drm_i915_private *dev_priv);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 03874b50ada9..9c2a0d04bd39 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -5548,6 +5548,8 @@ int i915_gem_init_early(struct drm_i915_private 
>> *dev_priv)
>>       if (!dev_priv->priorities)
>>           goto err_dependencies;
>>   +    init_request_active(&dev_priv->gt.global_barrier, NULL);
>> +
>>       INIT_LIST_HEAD(&dev_priv->gt.timelines);
>>       INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>>       INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index ea9ae1046827..5c5a12f1c265 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -730,6 +730,103 @@ int i915_gem_context_destroy_ioctl(struct 
>> drm_device *dev, void *data,
>>       return 0;
>>   }
>>   +static int
>> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
>> +              union intel_sseu *ctx_sseu)
>> +{
>> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
>> +        user_sseu->slice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
>> +        user_sseu->subslice_mask == 0)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
>> +        return -EINVAL;
>> +
>> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice < 
>> user_sseu->min_eus_per_subslice ||
>> +        user_sseu->max_eus_per_subslice == 0)
>> +        return -EINVAL;
>> +
>> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
>> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
>> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
>> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                  struct intel_engine_cs *engine,
>> +                  union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct i915_request *rq;
>> +    struct intel_ring *ring;
>> +    enum intel_engine_id id;
>> +    int ret;
>> +
>> +    /*
>> +     * First notify user when this capability is not available so 
>> that it
>> +     * can be detected with any valid input.
>> +     */
>> +    if (!engine->emit_rpcs_config)
>> +        return -ENODEV;
>> +
>> +    if (memcmp(&to_intel_context(ctx, engine)->sseu,
>> +           &sseu, sizeof(sseu)) == 0) {
>> +        return 0;
>
> But union is still there so I don't know.

Moving to struct.

>
>> +    }
>> +
>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>
> Best to move above return -ENODEV line.

Done.

>
>> +
>> +    i915_retire_requests(dev_priv);
>> +
>> +    /* Now use the RCS to actually reconfigure. */
>> +    engine = dev_priv->engine[RCS];
>> +
>> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
>> +    if (IS_ERR(rq))
>> +        return PTR_ERR(rq);
>> +
>> +    ret = engine->emit_rpcs_config(rq, ctx,
>> +                       intel_engine_prepare_sseu(engine, sseu));
>> +    if (ret) {
>> +        __i915_request_add(rq, true);
>> +        return ret;
>> +    }
>> +
>> +    /* Queue this switch after all other activity */
>> +    list_for_each_entry(ring, &dev_priv->gt.active_rings, 
>> active_link) {
>> +        struct i915_request *prev;
>> +
>> +        prev = last_request_on_engine(ring->timeline, engine);
>> +        if (prev)
>> + i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                             &prev->submit,
>> +                             I915_FENCE_GFP);
>> +    }
>> +
>> +    i915_gem_set_global_barrier(dev_priv, rq);
>> +    __i915_request_add(rq, true);
>> +
>> +    /*
>> +     * Apply the configuration to all engine. Our hardware doesn't
>> +     * currently support different configurations for each engine.
>> +     */
>> +    for_each_engine(engine, dev_priv, id) {
>> +        struct intel_context *ce = to_intel_context(ctx, engine);
>> +
>> +        ce->sseu.value = sseu.value;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void 
>> *data,
>>                       struct drm_file *file)
>>   {
>> @@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct 
>> drm_device *dev, void *data,
>>       case I915_CONTEXT_PARAM_PRIORITY:
>>           args->value = ctx->sched.priority;
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU: {
>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>> +        struct intel_engine_cs *engine;
>> +        struct intel_context *ce;
>> +
>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>> +                   sizeof(param_sseu))) {
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        engine = intel_engine_lookup_user(to_i915(dev),
>> +                          param_sseu.class,
>> +                          param_sseu.instance);
>> +        if (!engine) {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        ce = to_intel_context(ctx, engine);
>> +
>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>> +        param_sseu.min_eus_per_subslice = 
>> ce->sseu.min_eus_per_subslice;
>> +        param_sseu.max_eus_per_subslice = 
>> ce->sseu.max_eus_per_subslice;
>> +
>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>> +                 sizeof(param_sseu)))
>> +            ret = -EFAULT;
>> +        break;
>
> Should we think about maybe not implementing the getter? I mean, is it 
> useful or just code for the driver which could be dropped?

Well, render fd can be transfer between processes, so I think it makes 
sense to have a way to tell what is the current configuration.

>
>> +    }
>>       default:
>>           ret = -EINVAL;
>>           break;
>> @@ -841,7 +969,46 @@ int i915_gem_context_setparam_ioctl(struct 
>> drm_device *dev, void *data,
>>                   ctx->sched.priority = priority;
>>           }
>>           break;
>> +    case I915_CONTEXT_PARAM_SSEU:
>> +        {
>> +            struct drm_i915_private *dev_priv = to_i915(dev);
>> +            struct drm_i915_gem_context_param_sseu user_sseu;
>> +            struct intel_engine_cs *engine;
>> +            union intel_sseu ctx_sseu;
>>   +            if (args->size) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            if (!capable(CAP_SYS_ADMIN)) {
>
> Hm not sure, will come back to it in the next patch.
>
>> +                ret = -EPERM;
>> +                break;
>> +            }
>> +
>> +            if (copy_from_user(&user_sseu, 
>> u64_to_user_ptr(args->value),
>> +                       sizeof(user_sseu))) {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            engine = intel_engine_lookup_user(dev_priv,
>> +                              user_sseu.class,
>> +                              user_sseu.instance);
>> +            if (!engine) {
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            ret = 
>> intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>> +                            &user_sseu, &ctx_sseu);
>> +            if (ret)
>> +                break;
>> +
>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>> +                                ctx_sseu);
>> +        }
>> +        break;
>>       default:
>>           ret = -EINVAL;
>>           break;
>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
>> b/drivers/gpu/drm/i915/i915_request.c
>> index fc499bcbd105..9f0b965125c4 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -643,6 +643,22 @@ submit_notify(struct i915_sw_fence *fence, enum 
>> i915_sw_fence_notify state)
>>       return NOTIFY_DONE;
>>   }
>>   +static int
>> +i915_request_await_request(struct i915_request *to, struct 
>> i915_request *from);
>> +
>> +static int add_global_barrier(struct i915_request *rq)
>> +{
>> +    struct i915_request *barrier;
>> +    int ret = 0;
>> +
>> +    barrier = i915_gem_active_raw(&rq->i915->gt.global_barrier,
>> +                      &rq->i915->drm.struct_mutex);
>> +    if (barrier)
>> +        ret = i915_request_await_request(rq, barrier);
>> +
>> +    return ret;
>> +}
>> +
>>   /**
>>    * i915_request_alloc - allocate a request structure
>>    *
>> @@ -804,6 +820,10 @@ i915_request_alloc(struct intel_engine_cs 
>> *engine, struct i915_gem_context *ctx)
>>        */
>>       rq->head = rq->ring->emit;
>>   +    ret = add_global_barrier(rq);
>> +    if (ret)
>> +        goto err_unwind;
>
> Who ever clears the barrier?

It's cleared when the request is retired.

>
>> +
>>       /* Unconditionally invalidate GPU caches and TLBs. */
>>       ret = engine->emit_flush(rq, EMIT_INVALIDATE);
>>       if (ret)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 8fb6e66a7a84..e52c9511b5fb 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2240,6 +2240,72 @@ static void gen8_emit_breadcrumb_rcs(struct 
>> i915_request *request, u32 *cs)
>>   }
>>   static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
>>   +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           union intel_sseu ctx_sseu)
>> +{
>> +    u32 rpcs = 0;
>> +
>> +    /*
>> +     * Starting in Gen9, render power gating can leave
>> +     * slice/subslice/EU in a partially enabled state. We
>> +     * must make an explicit request through RPCS for full
>> +     * enablement.
>> +     */
>> +    if (sseu->has_slice_pg) {
>> +        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_subslice_pg) {
>> +        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> +        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> +            GEN8_RPCS_SS_CNT_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    if (sseu->has_eu_pg) {
>> +        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MIN_SHIFT;
>> +        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> +            GEN8_RPCS_EU_MAX_SHIFT;
>> +        rpcs |= GEN8_RPCS_ENABLE;
>> +    }
>> +
>> +    return rpcs;
>> +}
>> +
>> +static int gen8_emit_rpcs_config(struct i915_request *rq,
>> +                 struct i915_gem_context *ctx,
>> +                 union intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = rq->i915;
>> +    struct intel_context *ce = to_intel_context(ctx, 
>> dev_priv->engine[RCS]);
>> +    u64 offset;
>> +    u32 *cs;
>> +
>> +    /* Let the deferred state allocation take care of this. */
>> +    if (!ce->state)
>> +        return 0;
>> +
>> +    cs = intel_ring_begin(rq, 4);
>> +    if (IS_ERR(cs))
>> +        return PTR_ERR(cs);
>> +
>> +    offset = ce->state->node.start +
>> +        LRC_STATE_PN * PAGE_SIZE +
>> +        (CTX_R_PWR_CLK_STATE + 1) * 4;
>> +
>> +    *cs++ = MI_STORE_DWORD_IMM_GEN4;
>> +    *cs++ = lower_32_bits(offset);
>> +    *cs++ = upper_32_bits(offset);
>> +    *cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>> +
>> +    intel_ring_advance(rq, cs);
>> +
>> +    return 0;
>> +}
>> +
>>   static int gen8_init_rcs_context(struct i915_request *rq)
>>   {
>>       int ret;
>> @@ -2333,6 +2399,8 @@ logical_ring_default_vfuncs(struct 
>> intel_engine_cs *engine)
>>       engine->emit_breadcrumb = gen8_emit_breadcrumb;
>>       engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
>>   +    engine->emit_rpcs_config = gen8_emit_rpcs_config;
>> +
>>       engine->set_default_submission = execlists_set_default_submission;
>>         if (INTEL_GEN(engine->i915) < 11) {
>> @@ -2481,41 +2549,6 @@ int logical_xcs_ring_init(struct 
>> intel_engine_cs *engine)
>>       return logical_ring_init(engine);
>>   }
>>   -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> -           union intel_sseu ctx_sseu)
>> -{
>> -    u32 rpcs = 0;
>> -
>> -    /*
>> -     * Starting in Gen9, render power gating can leave
>> -     * slice/subslice/EU in a partially enabled state. We
>> -     * must make an explicit request through RPCS for full
>> -     * enablement.
>> -    */
>> -    if (sseu->has_slice_pg) {
>> -        rpcs |= GEN8_RPCS_S_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_subslice_pg) {
>> -        rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
>> -        rpcs |= hweight8(ctx_sseu.subslice_mask) <<
>> -            GEN8_RPCS_SS_CNT_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    if (sseu->has_eu_pg) {
>> -        rpcs |= ctx_sseu.min_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MIN_SHIFT;
>> -        rpcs |= ctx_sseu.max_eus_per_subslice <<
>> -            GEN8_RPCS_EU_MAX_SHIFT;
>> -        rpcs |= GEN8_RPCS_ENABLE;
>> -    }
>> -
>> -    return rpcs;
>> -}
>> -
>>   static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs 
>> *engine)
>>   {
>>       u32 indirect_ctx_offset;
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 001cf6bcb349..643466d4fa2d 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2061,6 +2061,8 @@ static void intel_ring_default_vfuncs(struct 
>> drm_i915_private *dev_priv,
>>               engine->emit_breadcrumb_sz++;
>>       }
>>   +    engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
>> +
>>       engine->set_default_submission = i9xx_set_default_submission;
>>         if (INTEL_GEN(dev_priv) >= 6)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index acef385c4c80..6bf4d3b57ced 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -456,6 +456,10 @@ struct intel_engine_cs {
>>       void        (*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
>>       int        emit_breadcrumb_sz;
>>   +    int        (*emit_rpcs_config)(struct i915_request *rq,
>> +                        struct i915_gem_context *ctx,
>> +                        union intel_sseu sseu);
>> +
>>       /* Pass the request to the hardware queue (e.g. directly into
>>        * the legacy ringbuffer or to the end of an execlist).
>>        *
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 7f5634ce8e88..24b90836ce1d 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>> +    /*
>> +     * When using the following param, value should be a pointer to
>> +     * drm_i915_gem_context_param_sseu.
>> +     */
>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>       __u64 value;
>>   };
>>   +struct drm_i915_gem_context_param_sseu {
>> +    /*
>> +     * Engine class & instance to be configured or queried.
>> +     */
>> +    __u32 class;
>> +    __u32 instance;
>
> Chris and I were talking about whether u32 is overkill and we should 
> settle for u16:u16 for class:instance. I think 16-bit should be 
> enough. But it can also be u32, I don't think there are any real 
> downsides here unless we want to be consistent in all places.

Let me know what you think is best.

>
>> +
>> +    /*
>> +     * Mask of slices to enable for the context. Valid values are a 
>> subset
>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>> +     */
>> +    __u8 slice_mask;
>> +
>> +    /*
>> +     * Mask of subslices to enable for the context. Valid values are a
>> +     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
>> +     */
>> +    __u8 subslice_mask;
>
> Is this future proof enough, say for Gen11?

As far as I can see, this fits.
No objection to bump it to 16/32bits if you'd like.

>
>> +
>> +    /*
>> +     * Minimum/Maximum number of EUs to enable per subslice for the
>> +     * context. min_eus_per_subslice must be inferior or equal to
>> +     * max_eus_per_subslice.
>> +     */
>> +    __u8 min_eus_per_subslice;
>> +    __u8 max_eus_per_subslice;
>> +
>> +    /*
>> +     * Unused for now. Must be cleared to zero.
>> +     */
>> +    __u32 rsvd;
>> +};
>> +
>>   enum drm_i915_oa_format {
>>       I915_OA_FORMAT_A13 = 1,        /* HSW only */
>>       I915_OA_FORMAT_A29,        /* HSW only */
>>
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-23 15:30   ` Tvrtko Ursulin
@ 2018-05-23 17:33     ` Lionel Landwerlin
  2018-05-24 10:39       ` Tvrtko Ursulin
  2018-05-23 17:37     ` Lionel Landwerlin
  1 sibling, 1 reply; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-23 17:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Rogozhkin, Dmitry V

On 23/05/18 16:30, Tvrtko Ursulin wrote:
>
> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>> There are concerns about denial of service around the per context sseu
>> configuration capability. In a previous commit introducing the
>> capability we allowed it only for capable users. This changes adds a
>> new debugfs entry to let any user configure its own context
>> powergating setup.
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |  5 +++
>>   drivers/gpu/drm/i915/i915_gem_context.c | 52 ++++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/i915_sysfs.c       | 30 ++++++++++++++
>>   3 files changed, 86 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 09cfcfe1c339..0fccec29fdda 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1843,6 +1843,8 @@ struct drm_i915_private {
>>           struct ida hw_ida;
>>   #define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
>>   #define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
>> +
>> +        bool allow_sseu;
>
> I think spelling out "dynamic" in patch titles, commit messages and 
> even in this variable name would be better and clearer.

Done.

>
>>       } contexts;
>>         u32 fdi_rx_config;
>> @@ -3274,6 +3276,9 @@ i915_gem_context_lookup(struct 
>> drm_i915_file_private *file_priv, u32 id)
>>       return ctx;
>>   }
>>   +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private 
>> *dev_priv, bool allowed);
>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>> *dev_priv);
>> +
>>   int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>>                struct drm_file *file);
>>   int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 5c5a12f1c265..815a9d1c29f3 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct 
>> drm_device *dev, void *data,
>>                   break;
>>               }
>>   -            if (!capable(CAP_SYS_ADMIN)) {
>> +            if (!dev_priv->contexts.allow_sseu &&
>> +                !capable(CAP_SYS_ADMIN)) {
>
> So the thing I mentioned in the previous patch. My thinking was not to 
> fail it if sysfs toggle is disabled, but just don't do dynamic 
> switching. That way the software stack works in all cases but user has 
> the option of tuning his system.
>
> I mean it can still be done but in userspace. Set param fails, 
> userspace goes for a fall back.
>
> Only difference is I guess whether it is useful to allow switching at 
> runtime. I am imagining running some 3d and media, and then toggling 
> the knob to see what happens to power and performance on the fly. But 
> maybe that is not so interesting.
>
> Along the same lines I was thinking CAP_SYS_ADMIN limitation could be 
> dropped.
>
> Both points are for a wider discussion I guess.

Okay, let's get Dmitry input on this.

>
>>                   ret = -EPERM;
>>                   break;
>>               }
>> @@ -1058,6 +1059,55 @@ int i915_gem_context_reset_stats_ioctl(struct 
>> drm_device *dev,
>>       return ret;
>>   }
>>   +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private 
>> *dev_priv,
>> +                     bool allowed)
>> +{
>> +    struct intel_engine_cs *engine = dev_priv->engine[RCS];
>> +    int ret = 0;
>> +
>> +    if (!engine->emit_rpcs_config)
>> +        return -ENODEV;
>> +
>> +    mutex_lock(&dev_priv->drm.struct_mutex);
>
> This should be interruptible so user can Ctrl-C if it is stuck.

Ah right, changing.

>
>> +
>> +    /*
>> +     * When we allow each context to configure its powergating
>> +     * configuration, there is no need to put the configurations 
>> back to
>> +     * the default, it should already be the case.
>> +     */
>> +    if (!allowed) {
>> +        union intel_sseu default_sseu =
>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
>> +        struct i915_gem_context *ctx;
>> +
>> +        list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>> +                                default_sseu);
>> +            if (ret)
>> +                break;
>> +        }
>> +    }
>> +
>> +    dev_priv->contexts.allow_sseu = allowed;
>> +
>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>> +    return ret;
>> +}
>> +
>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>> *dev_priv)
>> +{
>> +    struct intel_engine_cs *engine = dev_priv->engine[RCS];
>> +    bool ret;
>> +
>> +    if (!engine->emit_rpcs_config)
>> +        return false;
>> +
>> +    mutex_lock(&dev_priv->drm.struct_mutex);
>> +    ret = dev_priv->contexts.allow_sseu;
>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>
> I guess this mutex does nothing in this case.

Yeah, I'm not sure whether I can read a boolean or whether it should be 
an atomic...

>
>> +    return ret;
>> +}
>> +
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   #include "selftests/mock_context.c"
>>   #include "selftests/i915_gem_context.c"
>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
>> b/drivers/gpu/drm/i915/i915_sysfs.c
>> index e5e6f6bb2b05..9fd15b138ac9 100644
>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>> @@ -483,6 +483,34 @@ static ssize_t gt_rp_mhz_show(struct device 
>> *kdev, struct device_attribute *attr
>>       return snprintf(buf, PAGE_SIZE, "%d\n", val);
>>   }
>>   +static ssize_t gem_allow_sseu_show(struct device *kdev,
>> +                   struct device_attribute *attr, char *buf)
>> +{
>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>> +    int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
>> +
>> +    return snprintf(buf, PAGE_SIZE, "%d\n", ret);
>
> Propagate ENODEV all the way by making 
> i915_gem_contexts_get_allow_sseu return an int?

Duh! Well spotted.

>
>> +}
>> +
>> +static ssize_t gem_allow_sseu_store(struct device *kdev,
>> +                    struct device_attribute *attr,
>> +                    const char *buf, size_t count)
>> +{
>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>> +    u32 val;
>> +    ssize_t ret;
>
> Chris will complain about Christmas trees. :)
>
> (And possible about new use of dev_priv throughout the series but that 
> battle is in stale mate.)

Which one is it? :) i915 or dev_priv?

>
>> +
>> +    ret = kstrtou32(buf, 0, &val);
>> +    if (ret)
>> +        return ret;
>> +
>> +    i915_gem_contexts_set_allow_sseu(dev_priv, val != 0);
>
> I would enforce 0 or 1 to allow (hypothetical) future expansion with 
> more values.

Okay.

>
>> +
>> +    return ret ?: count;
>> +}
>> +
>> +static DEVICE_ATTR_RW(gem_allow_sseu);
>
> I would like to open up a round of bike-shedding of the knob name. :)
>
> How about allow_dynamic_sseu? gem_ or not, don't know.

Sure, no problem.

>
>> +
>>   static const struct attribute *gen6_attrs[] = {
>>       &dev_attr_gt_act_freq_mhz.attr,
>>       &dev_attr_gt_cur_freq_mhz.attr,
>> @@ -492,6 +520,7 @@ static const struct attribute *gen6_attrs[] = {
>>       &dev_attr_gt_RP0_freq_mhz.attr,
>>       &dev_attr_gt_RP1_freq_mhz.attr,
>>       &dev_attr_gt_RPn_freq_mhz.attr,
>> +    &dev_attr_gem_allow_sseu.attr,
>>       NULL,
>>   };
>>   @@ -505,6 +534,7 @@ static const struct attribute *vlv_attrs[] = {
>>       &dev_attr_gt_RP1_freq_mhz.attr,
>>       &dev_attr_gt_RPn_freq_mhz.attr,
>>       &dev_attr_vlv_rpe_freq_mhz.attr,
>> +    &dev_attr_gem_allow_sseu.attr,
>>       NULL,
>>   };
>>
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-23 15:30   ` Tvrtko Ursulin
  2018-05-23 17:33     ` Lionel Landwerlin
@ 2018-05-23 17:37     ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-23 17:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 404 bytes --]

On 23/05/18 16:30, Tvrtko Ursulin wrote:
>>
>> +{
>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>> +    int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
>> +
>> +    return snprintf(buf, PAGE_SIZE, "%d\n", ret);
>
> Propagate ENODEV all the way by making 
> i915_gem_contexts_get_allow_sseu return an int? 

Actually should a cat on a sysfs entry fail with ENODEV?


[-- Attachment #1.2: Type: text/html, Size: 940 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-05-23 15:54     ` Lionel Landwerlin
@ 2018-05-24 10:33       ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-24 10:33 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 23/05/2018 16:54, Lionel Landwerlin wrote:
> On 23/05/18 15:57, Tvrtko Ursulin wrote:
>>
>> On 22/05/2018 18:59, Lionel Landwerlin wrote:
>>> Abstract the context image access a bit.
>>>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
>>>   1 file changed, 16 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 805dfc732bba..a5d98bda5c2e 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -210,6 +210,7 @@
>>>   #include "i915_oa_cflgt3.h"
>>>   #include "i915_oa_cnl.h"
>>>   #include "i915_oa_icl.h"
>>> +#include "intel_lrc_reg.h"
>>>     /* HW requires this to be a power of two, between 128k and 16M, 
>>> though driver
>>>    * is currently generally designed assuming the largest 16M size is 
>>> used such
>>> @@ -1579,27 +1580,25 @@ static void 
>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>       u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>>>       u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
>>>       /* The MMIO offsets for Flex EU registers aren't contiguous */
>>> -    u32 flex_mmio[] = {
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL0),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL1),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL2),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL3),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL4),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL5),
>>> -        i915_mmio_reg_offset(EU_PERF_CNTL6),
>>> +    i915_reg_t flex_regs[] = {
>>> +        EU_PERF_CNTL0,
>>> +        EU_PERF_CNTL1,
>>> +        EU_PERF_CNTL2,
>>> +        EU_PERF_CNTL3,
>>> +        EU_PERF_CNTL4,
>>> +        EU_PERF_CNTL5,
>>> +        EU_PERF_CNTL6,
>>>       };
>>>       int i;
>>>   -    reg_state[ctx_oactxctrl] = 
>>> i915_mmio_reg_offset(GEN8_OACTXCONTROL);
>>> -    reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
>>> -                      GEN8_OA_TIMER_PERIOD_SHIFT) |
>>> -                     (dev_priv->perf.oa.periodic ?
>>> -                      GEN8_OA_TIMER_ENABLE : 0) |
>>> -                     GEN8_OA_COUNTER_RESUME;
>>> +    CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>>> +        (dev_priv->perf.oa.period_exponent << 
>>> GEN8_OA_TIMER_PERIOD_SHIFT) |
>>> +        (dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
>>> +        GEN8_OA_COUNTER_RESUME);
>>>   -    for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
>>> +    for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
>>>           u32 state_offset = ctx_flexeu0 + i * 2;
>>> -        u32 mmio = flex_mmio[i];
>>> +        u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
>>>             /*
>>>            * This arbitrary default will select the 'EU FPU0 Pipeline
>>> @@ -1619,8 +1618,7 @@ static void 
>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>               }
>>>           }
>>>   -        reg_state[state_offset] = mmio;
>>> -        reg_state[state_offset+1] = value;
>>> +        CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>
>> Does flex_regs[i] needs to be mmio?
> 
> Yeah, the macro uses i915_mmio_reg_offset().

In that case:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-23 17:33     ` Lionel Landwerlin
@ 2018-05-24 10:39       ` Tvrtko Ursulin
  2018-05-24 10:45         ` Lionel Landwerlin
  2018-05-25 15:42         ` Lionel Landwerlin
  0 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-24 10:39 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx, Rogozhkin, Dmitry V


On 23/05/2018 18:33, Lionel Landwerlin wrote:
> On 23/05/18 16:30, Tvrtko Ursulin wrote:
>>
>> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>>> There are concerns about denial of service around the per context sseu
>>> configuration capability. In a previous commit introducing the
>>> capability we allowed it only for capable users. This changes adds a
>>> new debugfs entry to let any user configure its own context
>>> powergating setup.
>>>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  5 +++
>>>   drivers/gpu/drm/i915/i915_gem_context.c | 52 ++++++++++++++++++++++++-
>>>   drivers/gpu/drm/i915/i915_sysfs.c       | 30 ++++++++++++++
>>>   3 files changed, 86 insertions(+), 1 deletion(-)

[snip]

>>
>>>       } contexts;
>>>         u32 fdi_rx_config;
>>> @@ -3274,6 +3276,9 @@ i915_gem_context_lookup(struct 
>>> drm_i915_file_private *file_priv, u32 id)
>>>       return ctx;
>>>   }
>>>   +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private 
>>> *dev_priv, bool allowed);
>>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>>> *dev_priv);
>>> +
>>>   int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>>>                struct drm_file *file);
>>>   int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>>> b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index 5c5a12f1c265..815a9d1c29f3 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>                   break;
>>>               }
>>>   -            if (!capable(CAP_SYS_ADMIN)) {
>>> +            if (!dev_priv->contexts.allow_sseu &&
>>> +                !capable(CAP_SYS_ADMIN)) {
>>
>> So the thing I mentioned in the previous patch. My thinking was not to 
>> fail it if sysfs toggle is disabled, but just don't do dynamic 
>> switching. That way the software stack works in all cases but user has 
>> the option of tuning his system.
>>
>> I mean it can still be done but in userspace. Set param fails, 
>> userspace goes for a fall back.
>>
>> Only difference is I guess whether it is useful to allow switching at 
>> runtime. I am imagining running some 3d and media, and then toggling 
>> the knob to see what happens to power and performance on the fly. But 
>> maybe that is not so interesting.
>>
>> Along the same lines I was thinking CAP_SYS_ADMIN limitation could be 
>> dropped.
>>
>> Both points are for a wider discussion I guess.
> 
> Okay, let's get Dmitry input on this.

To expand, my thinking is to let media stack configure its contexts for 
fewer slices, but let the user/sysadmin decide whether 3d/compute 
performance is more important, or media.

Downside is that with this approach you would have to re-configure all 
contexts when sysfs toggle is changed from zero to one.

[snip]

>>
>>> +
>>> +    /*
>>> +     * When we allow each context to configure its powergating
>>> +     * configuration, there is no need to put the configurations 
>>> back to
>>> +     * the default, it should already be the case.
>>> +     */
>>> +    if (!allowed) {
>>> +        union intel_sseu default_sseu =
>>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
>>> +        struct i915_gem_context *ctx;
>>> +
>>> +        list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
>>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>>> +                                default_sseu);
>>> +            if (ret)
>>> +                break;
>>> +        }
>>> +    }
>>> +
>>> +    dev_priv->contexts.allow_sseu = allowed;
>>> +
>>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>>> +    return ret;
>>> +}
>>> +
>>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>>> *dev_priv)
>>> +{
>>> +    struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>> +    bool ret;
>>> +
>>> +    if (!engine->emit_rpcs_config)
>>> +        return false;
>>> +
>>> +    mutex_lock(&dev_priv->drm.struct_mutex);
>>> +    ret = dev_priv->contexts.allow_sseu;
>>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>>
>> I guess this mutex does nothing in this case.
> 
> Yeah, I'm not sure whether I can read a boolean or whether it should be 
> an atomic...

You can just read a boolean.

> 
>>
>>> +    return ret;
>>> +}
>>> +
>>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>   #include "selftests/mock_context.c"
>>>   #include "selftests/i915_gem_context.c"
>>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
>>> b/drivers/gpu/drm/i915/i915_sysfs.c
>>> index e5e6f6bb2b05..9fd15b138ac9 100644
>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>>> @@ -483,6 +483,34 @@ static ssize_t gt_rp_mhz_show(struct device 
>>> *kdev, struct device_attribute *attr
>>>       return snprintf(buf, PAGE_SIZE, "%d\n", val);
>>>   }
>>>   +static ssize_t gem_allow_sseu_show(struct device *kdev,
>>> +                   struct device_attribute *attr, char *buf)
>>> +{
>>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>>> +    int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
>>> +
>>> +    return snprintf(buf, PAGE_SIZE, "%d\n", ret);
>>
>> Propagate ENODEV all the way by making 
>> i915_gem_contexts_get_allow_sseu return an int?
> 
> Duh! Well spotted.

I think I confused set and get while reviewing.

It is probably fine to read back zero when the feature is not supported 
and just fail the write. But could ENODEV here as well, my opinion is 
not strong.

> 
>>
>>> +}
>>> +
>>> +static ssize_t gem_allow_sseu_store(struct device *kdev,
>>> +                    struct device_attribute *attr,
>>> +                    const char *buf, size_t count)
>>> +{
>>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>>> +    u32 val;
>>> +    ssize_t ret;
>>
>> Chris will complain about Christmas trees. :)
>>
>> (And possible about new use of dev_priv throughout the series but that 
>> battle is in stale mate.)
> 
> Which one is it? :) i915 or dev_priv?

i915 unless I915_READ/WRITE are used in the function. :(

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-23 17:12     ` Lionel Landwerlin
@ 2018-05-24 10:43       ` Tvrtko Ursulin
  2018-05-24 11:01         ` Lionel Landwerlin
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2018-05-24 10:43 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 23/05/2018 18:12, Lionel Landwerlin wrote:
> On 23/05/18 16:13, Tvrtko Ursulin wrote:
>>
>> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and forcing
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices it is not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>     struct drm_i915_gem_context_param arg;
>>>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>                                                     .instance = 0, };
>>>
>>>     memset(&arg, 0, sizeof(arg));
>>>     arg.ctx_id = ctx;
>>>     arg.param = I915_CONTEXT_PARAM_SSEU;
>>>     arg.value = (uintptr_t) &sseu;
>>>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>         sseu.packed.subslice_mask = 0;
>>>
>>>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>     }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
>>> (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> v5: Validate sseu configuration against the device's capabilities 
>>> (Lionel)
>>>
>>> v6: Change context powergating settings through MI_SDM on kernel 
>>> context (Chris)
>>>
>>> v7: Synchronize the requests following a powergating setting change 
>>> using a global
>>>      dependency (Chris)
>>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>>      Disable RPCS configuration setting for non capable users 
>>> (Lionel/Tvrtko)
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  13 ++
>>>   drivers/gpu/drm/i915/i915_gem.c         |   2 +
>>>   drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_request.c     |  20 +++
>>>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>>   include/uapi/drm/i915_drm.h             |  38 ++++++
>>>   8 files changed, 314 insertions(+), 35 deletions(-)

[snip]

>>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void 
>>> *data,
>>>                       struct drm_file *file)
>>>   {
>>> @@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct 
>>> drm_device *dev, void *data,
>>>       case I915_CONTEXT_PARAM_PRIORITY:
>>>           args->value = ctx->sched.priority;
>>>           break;
>>> +    case I915_CONTEXT_PARAM_SSEU: {
>>> +        struct drm_i915_gem_context_param_sseu param_sseu;
>>> +        struct intel_engine_cs *engine;
>>> +        struct intel_context *ce;
>>> +
>>> +        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>> +                   sizeof(param_sseu))) {
>>> +            ret = -EFAULT;
>>> +            break;
>>> +        }
>>> +
>>> +        engine = intel_engine_lookup_user(to_i915(dev),
>>> +                          param_sseu.class,
>>> +                          param_sseu.instance);
>>> +        if (!engine) {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        ce = to_intel_context(ctx, engine);
>>> +
>>> +        param_sseu.slice_mask = ce->sseu.slice_mask;
>>> +        param_sseu.subslice_mask = ce->sseu.subslice_mask;
>>> +        param_sseu.min_eus_per_subslice = 
>>> ce->sseu.min_eus_per_subslice;
>>> +        param_sseu.max_eus_per_subslice = 
>>> ce->sseu.max_eus_per_subslice;
>>> +
>>> +        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                 sizeof(param_sseu)))
>>> +            ret = -EFAULT;
>>> +        break;
>>
>> Should we think about maybe not implementing the getter? I mean, is it 
>> useful or just code for the driver which could be dropped?
> 
> Well, render fd can be transfer between processes, so I think it makes 
> sense to have a way to tell what is the current configuration.

I was thinking that userspace can already get the default configuration 
via existing get params / topology query.

But if you change the ctx sseu config and pass that fd out its a bit 
evil. :)

Anyway, no strong feelings to keep it. Was just thinking whether we can 
save ourselves adding some code.

[snip]

>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 7f5634ce8e88..24b90836ce1d 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
>>>   #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
>>>   #define   I915_CONTEXT_DEFAULT_PRIORITY        0
>>>   #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
>>> +    /*
>>> +     * When using the following param, value should be a pointer to
>>> +     * drm_i915_gem_context_param_sseu.
>>> +     */
>>> +#define I915_CONTEXT_PARAM_SSEU        0x7
>>>       __u64 value;
>>>   };
>>>   +struct drm_i915_gem_context_param_sseu {
>>> +    /*
>>> +     * Engine class & instance to be configured or queried.
>>> +     */
>>> +    __u32 class;
>>> +    __u32 instance;
>>
>> Chris and I were talking about whether u32 is overkill and we should 
>> settle for u16:u16 for class:instance. I think 16-bit should be 
>> enough. But it can also be u32, I don't think there are any real 
>> downsides here unless we want to be consistent in all places.
> 
> Let me know what you think is best.

I'd say u16:u16, Chris?

> 
>>
>>> +
>>> +    /*
>>> +     * Mask of slices to enable for the context. Valid values are a 
>>> subset
>>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>>> +     */
>>> +    __u8 slice_mask;
>>> +
>>> +    /*
>>> +     * Mask of subslices to enable for the context. Valid values are a
>>> +     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
>>> +     */
>>> +    __u8 subslice_mask;
>>
>> Is this future proof enough, say for Gen11?
> 
> As far as I can see, this fits.
> No objection to bump it to 16/32bits if you'd like.

Feel like I've asked you this before, sorry - nothing in the future will 
need per slice subslice mask?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-24 10:39       ` Tvrtko Ursulin
@ 2018-05-24 10:45         ` Lionel Landwerlin
  2018-05-25 15:42         ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-24 10:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Rogozhkin, Dmitry V

On 24/05/18 11:39, Tvrtko Ursulin wrote:
>
> On 23/05/2018 18:33, Lionel Landwerlin wrote:
>> On 23/05/18 16:30, Tvrtko Ursulin wrote:
>>>
>>> On 22/05/2018 19:00, Lionel Landwerlin wrote:
>>>> There are concerns about denial of service around the per context sseu
>>>> configuration capability. In a previous commit introducing the
>>>> capability we allowed it only for capable users. This changes adds a
>>>> new debugfs entry to let any user configure its own context
>>>> powergating setup.
>>>>
>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_drv.h         |  5 +++
>>>>   drivers/gpu/drm/i915/i915_gem_context.c | 52 
>>>> ++++++++++++++++++++++++-
>>>>   drivers/gpu/drm/i915/i915_sysfs.c       | 30 ++++++++++++++
>>>>   3 files changed, 86 insertions(+), 1 deletion(-)
>
> [snip]
>
>>>
>>>>       } contexts;
>>>>         u32 fdi_rx_config;
>>>> @@ -3274,6 +3276,9 @@ i915_gem_context_lookup(struct 
>>>> drm_i915_file_private *file_priv, u32 id)
>>>>       return ctx;
>>>>   }
>>>>   +int i915_gem_contexts_set_allow_sseu(struct drm_i915_private 
>>>> *dev_priv, bool allowed);
>>>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>>>> *dev_priv);
>>>> +
>>>>   int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>>>>                struct drm_file *file);
>>>>   int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>>>> b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index 5c5a12f1c265..815a9d1c29f3 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct 
>>>> drm_device *dev, void *data,
>>>>                   break;
>>>>               }
>>>>   -            if (!capable(CAP_SYS_ADMIN)) {
>>>> +            if (!dev_priv->contexts.allow_sseu &&
>>>> +                !capable(CAP_SYS_ADMIN)) {
>>>
>>> So the thing I mentioned in the previous patch. My thinking was not 
>>> to fail it if sysfs toggle is disabled, but just don't do dynamic 
>>> switching. That way the software stack works in all cases but user 
>>> has the option of tuning his system.
>>>
>>> I mean it can still be done but in userspace. Set param fails, 
>>> userspace goes for a fall back.
>>>
>>> Only difference is I guess whether it is useful to allow switching 
>>> at runtime. I am imagining running some 3d and media, and then 
>>> toggling the knob to see what happens to power and performance on 
>>> the fly. But maybe that is not so interesting.
>>>
>>> Along the same lines I was thinking CAP_SYS_ADMIN limitation could 
>>> be dropped.
>>>
>>> Both points are for a wider discussion I guess.
>>
>> Okay, let's get Dmitry input on this.
>
> To expand, my thinking is to let media stack configure its contexts 
> for fewer slices, but let the user/sysadmin decide whether 3d/compute 
> performance is more important, or media.
>
> Downside is that with this approach you would have to re-configure all 
> contexts when sysfs toggle is changed from zero to one.
>
> [snip]

Not an issue for me.

Only concern is whether this might make things easier to debug/spot when 
sseu configuration requests are silently discarded.

>
>>>
>>>> +
>>>> +    /*
>>>> +     * When we allow each context to configure its powergating
>>>> +     * configuration, there is no need to put the configurations 
>>>> back to
>>>> +     * the default, it should already be the case.
>>>> +     */
>>>> +    if (!allowed) {
>>>> +        union intel_sseu default_sseu =
>>>> + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
>>>> +        struct i915_gem_context *ctx;
>>>> +
>>>> +        list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
>>>> +            ret = i915_gem_context_reconfigure_sseu(ctx, engine,
>>>> +                                default_sseu);
>>>> +            if (ret)
>>>> +                break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    dev_priv->contexts.allow_sseu = allowed;
>>>> +
>>>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +bool i915_gem_contexts_get_allow_sseu(struct drm_i915_private 
>>>> *dev_priv)
>>>> +{
>>>> +    struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>>> +    bool ret;
>>>> +
>>>> +    if (!engine->emit_rpcs_config)
>>>> +        return false;
>>>> +
>>>> +    mutex_lock(&dev_priv->drm.struct_mutex);
>>>> +    ret = dev_priv->contexts.allow_sseu;
>>>> +    mutex_unlock(&dev_priv->drm.struct_mutex);
>>>
>>> I guess this mutex does nothing in this case.
>>
>> Yeah, I'm not sure whether I can read a boolean or whether it should 
>> be an atomic...
>
> You can just read a boolean.
>
>>
>>>
>>>> +    return ret;
>>>> +}
>>>> +
>>>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>>   #include "selftests/mock_context.c"
>>>>   #include "selftests/i915_gem_context.c"
>>>> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
>>>> b/drivers/gpu/drm/i915/i915_sysfs.c
>>>> index e5e6f6bb2b05..9fd15b138ac9 100644
>>>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>>>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>>>> @@ -483,6 +483,34 @@ static ssize_t gt_rp_mhz_show(struct device 
>>>> *kdev, struct device_attribute *attr
>>>>       return snprintf(buf, PAGE_SIZE, "%d\n", val);
>>>>   }
>>>>   +static ssize_t gem_allow_sseu_show(struct device *kdev,
>>>> +                   struct device_attribute *attr, char *buf)
>>>> +{
>>>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>>>> +    int ret = i915_gem_contexts_get_allow_sseu(dev_priv);
>>>> +
>>>> +    return snprintf(buf, PAGE_SIZE, "%d\n", ret);
>>>
>>> Propagate ENODEV all the way by making 
>>> i915_gem_contexts_get_allow_sseu return an int?
>>
>> Duh! Well spotted.
>
> I think I confused set and get while reviewing.
>
> It is probably fine to read back zero when the feature is not 
> supported and just fail the write. But could ENODEV here as well, my 
> opinion is not strong.

I think it makes sense to return 0 in case of ENODEV and actually error 
out with ENODEV when trying to write a new value.

>
>>
>>>
>>>> +}
>>>> +
>>>> +static ssize_t gem_allow_sseu_store(struct device *kdev,
>>>> +                    struct device_attribute *attr,
>>>> +                    const char *buf, size_t count)
>>>> +{
>>>> +    struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
>>>> +    u32 val;
>>>> +    ssize_t ret;
>>>
>>> Chris will complain about Christmas trees. :)
>>>
>>> (And possible about new use of dev_priv throughout the series but 
>>> that battle is in stale mate.)
>>
>> Which one is it? :) i915 or dev_priv?
>
> i915 unless I915_READ/WRITE are used in the function. :(
>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-24 10:43       ` Tvrtko Ursulin
@ 2018-05-24 11:01         ` Lionel Landwerlin
  0 siblings, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-24 11:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 993 bytes --]

On 24/05/18 11:43, Tvrtko Ursulin wrote:
>
>>
>>>
>>>> +
>>>> +    /*
>>>> +     * Mask of slices to enable for the context. Valid values are 
>>>> a subset
>>>> +     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
>>>> +     */
>>>> +    __u8 slice_mask;
>>>> +
>>>> +    /*
>>>> +     * Mask of subslices to enable for the context. Valid values 
>>>> are a
>>>> +     * subset of the bitmask value return by 
>>>> I915_PARAM_SUBSLICE_MASK.
>>>> +     */
>>>> +    __u8 subslice_mask;
>>>
>>> Is this future proof enough, say for Gen11?
>>
>> As far as I can see, this fits.
>> No objection to bump it to 16/32bits if you'd like.
>
> Feel like I've asked you this before, sorry - nothing in the future 
> will need per slice subslice mask?

As far as I can see this remains the same uniform subslice per slice 
programming style.
We could play it safe and put all the masks in 64bits in the uAPI.

What do you think?

>
> Regards,
>
> Tvrtko



[-- Attachment #1.2: Type: text/html, Size: 2227 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs
  2018-05-24 10:39       ` Tvrtko Ursulin
  2018-05-24 10:45         ` Lionel Landwerlin
@ 2018-05-25 15:42         ` Lionel Landwerlin
  1 sibling, 0 replies; 31+ messages in thread
From: Lionel Landwerlin @ 2018-05-25 15:42 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Rogozhkin, Dmitry V, Joonas Lahtinen


[-- Attachment #1.1: Type: text/plain, Size: 1914 bytes --]

On 24/05/18 11:39, Tvrtko Ursulin wrote:
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -981,7 +981,8 @@ int i915_gem_context_setparam_ioctl(struct 
>>>> drm_device *dev, void *data,
>>>>                   break;
>>>>               }
>>>>   -            if (!capable(CAP_SYS_ADMIN)) {
>>>> +            if (!dev_priv->contexts.allow_sseu &&
>>>> +                !capable(CAP_SYS_ADMIN)) {
>>>
>>> So the thing I mentioned in the previous patch. My thinking was not 
>>> to fail it if sysfs toggle is disabled, but just don't do dynamic 
>>> switching. That way the software stack works in all cases but user 
>>> has the option of tuning his system.
>>>
>>> I mean it can still be done but in userspace. Set param fails, 
>>> userspace goes for a fall back.
>>>
>>> Only difference is I guess whether it is useful to allow switching 
>>> at runtime. I am imagining running some 3d and media, and then 
>>> toggling the knob to see what happens to power and performance on 
>>> the fly. But maybe that is not so interesting.
>>>
>>> Along the same lines I was thinking CAP_SYS_ADMIN limitation could 
>>> be dropped.
>>>
>>> Both points are for a wider discussion I guess.
>>
>> Okay, let's get Dmitry input on this.
>
> To expand, my thinking is to let media stack configure its contexts 
> for fewer slices, but let the user/sysadmin decide whether 3d/compute 
> performance is more important, or media.
>
> Downside is that with this approach you would have to re-configure all 
> contexts when sysfs toggle is changed from zero to one.

Right, I've updated this series locally with Tvrkto's latest comments on v7.

The issue discussed above is the last remaining detail on which some 
kind of decision needs to be taken.

Ccing Joonas who might have an input.

Cheers,

-
Lionel

[-- Attachment #1.2: Type: text/html, Size: 3102 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-05-25 15:42 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 17:59 [PATCH v6 0/7] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-05-22 17:59 ` [PATCH v6 1/7] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2018-05-22 17:59 ` [PATCH v6 2/7] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2018-05-23 14:54   ` Tvrtko Ursulin
2018-05-23 14:58     ` Chris Wilson
2018-05-23 15:52     ` Lionel Landwerlin
2018-05-22 17:59 ` [PATCH v6 3/7] drm/i915/perf: simplify configure all context function Lionel Landwerlin
2018-05-23 14:54   ` Tvrtko Ursulin
2018-05-22 17:59 ` [PATCH v6 4/7] drm/i915/perf: reuse intel_lrc ctx regs macro Lionel Landwerlin
2018-05-23 14:57   ` Tvrtko Ursulin
2018-05-23 15:54     ` Lionel Landwerlin
2018-05-24 10:33       ` Tvrtko Ursulin
2018-05-22 18:00 ` [PATCH v6 5/7] drm/i915/perf: lock powergating configuration to default when active Lionel Landwerlin
2018-05-23 15:04   ` Tvrtko Ursulin
2018-05-22 18:00 ` [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-05-23 15:13   ` Tvrtko Ursulin
2018-05-23 15:18     ` Chris Wilson
2018-05-23 17:12     ` Lionel Landwerlin
2018-05-24 10:43       ` Tvrtko Ursulin
2018-05-24 11:01         ` Lionel Landwerlin
2018-05-22 18:00 ` [PATCH v6 7/7] drm/i915: add a sysfs entry to let users set sseu configs Lionel Landwerlin
2018-05-23 15:30   ` Tvrtko Ursulin
2018-05-23 17:33     ` Lionel Landwerlin
2018-05-24 10:39       ` Tvrtko Ursulin
2018-05-24 10:45         ` Lionel Landwerlin
2018-05-25 15:42         ` Lionel Landwerlin
2018-05-23 17:37     ` Lionel Landwerlin
2018-05-22 18:55 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev5) Patchwork
2018-05-22 18:58 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-22 19:16 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-23  0:10 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.