* [PATCH v10 0/8] Per context dynamic (sub)slice power-gating
@ 2018-08-14 14:40 Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 1/8] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
                   ` (12 more replies)
  0 siblings, 13 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Updated series after continuing Lionel's work.

Userspace for the feature is the media-driver project on GitHub. Please see
https://github.com/intel/media-driver/pull/271/commits.

Headline changes:

 1.

  No more master allow/disallow sysfs switch. The feature is unconditionally
  enabled on Gen11; on other platforms it requires CAP_SYS_ADMIN.

  *** To be discussed if this is a good idea or not. ***

 2.

  Two new patches due to a) breaking out the global barrier, and b) fixing one
  GEM_BUG_ON regarding incorrect kernel context classification by i915_is_ggtt.


Otherwise please see individual patch change logs.

The main topic for this cover letter though is the question of the performance
impact of dynamic slice re-configuration.

As an introduction to this problem space: changing the (sub)slice configuration
has a cost at context switch time on the order of tens of milliseconds. (It
varies per Gen and with different slice count transitions.)

So the question is whether a malicious unprivileged workload can negatively
impact other clients. To try and answer this question I have extended gem_wsim
and created some test workloads. (Note that my testing was done on a Gen9
system. The overall message should be the same on Gen11, but this needs to be
verified.)

The first test was a simulated video playback client running in parallel with a
simulated game of either medium or high complexity (using around 60% or 90% of
the render engine respectively, plus 7% of the blitter engine). I had two
flavours of the playback client: one which runs normally and one which requests
a reduced slice configuration. Both workloads target 60fps.

The same run was then repeated against the heavier simulated game workload, the
one which uses around 90% of the render engine.

Results are achieved frames per second as observed from the game client:

                     No player  Normal player   SSEU enabled player
        Medium game     59.6        59.6               59.6
         Heavy game     59.7        58.4               58.1

Here we can see that the medium workload was not affected by either the normal
or the SSEU-enabled player, while the heavy workload did see a performance hit.
The hit was present with the normal video player running in parallel, and
slightly larger when the player was SSEU enabled.

The second test is running a malicious client (or clients) in parallel to the
same simulated game workloads. These clients try to trigger many context
switches by using multiple contexts, with dependencies set up so that request
coalescing is defeated as much as possible.

I tested both with normal and SSEU enabled malicious clients:

                     DoS client   SSEU DoS client
        Medium game     59.5           59.6
         Heavy game     57.8           55.4

Here we can see a similar picture as with the first test. The medium game
client is not affected by either DoS client, while the heavy game client is,
more so with the SSEU-enabled attacker.

From both tests I think the conclusion is that dynamic SSEU switching does
increase the magnitude of performance loss, especially with over-subscribed
engines, due to the cost being proportional to context switch frequency.
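The scaling behind this conclusion can be sketched with a toy cost model (the
function and all numbers are purely illustrative, not derived from the
measurements above):

```c
#include <assert.h>

/*
 * Toy cost model: fraction of wall time lost to (sub)slice re-configuration,
 * given a context switch rate and a per-switch cost. Purely illustrative;
 * the real cost varies per Gen and per slice count transition.
 */
static double sseu_overhead_fraction(double switches_per_sec,
				     double reconfig_cost_ms)
{
	return switches_per_sec * reconfig_cost_ms / 1000.0;
}
```

Doubling the context switch rate doubles the overhead, which is why clients
that maximize context switch frequency see the largest impact.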

The likelihood is that it slightly lowers the utilization level at which this
starts to happen, but it does not introduce a completely new attack vector -
that is, where it was previously possible to DoS a system from an unprivileged
client, it still is. In both cases (SSEU enabled or not), a malicious client
has the option of grinding the system to a halt, although it may need fewer
submission threads to do so when it is SSEU enabled.

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (3):
  drm/i915/perf: simplify configure all context function
  drm/i915/perf: reuse intel_lrc ctx regs macro
  drm/i915/perf: lock powergating configuration to default when active

Tvrtko Ursulin (2):
  drm/i915: Add global barrier support
  drm/i915: Explicitly mark Global GTT address spaces

 drivers/gpu/drm/i915/i915_drv.h         |  56 +++++++
 drivers/gpu/drm/i915/i915_gem.c         |   2 +
 drivers/gpu/drm/i915/i915_gem_context.c | 189 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h |   4 +
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   5 +-
 drivers/gpu/drm/i915/i915_perf.c        |  68 +++++----
 drivers/gpu/drm/i915/i915_request.c     |  16 ++
 drivers/gpu/drm/i915/i915_request.h     |  10 ++
 drivers/gpu/drm/i915/intel_lrc.c        |  87 ++++++++---
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  43 ++++++
 13 files changed, 439 insertions(+), 50 deletions(-)

-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/8] drm/i915: Program RPCS for Broadwell
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 2/8] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as on later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.
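For reference, what the RPCS programming amounts to can be sketched in a few
lines of stand-alone C. The field names and bit positions below loosely mirror
the GEN8_RPCS_* definitions in i915_reg.h but should be treated as
illustrative, not authoritative:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RPCS field layout (cf. GEN8_RPCS_* in i915_reg.h); illustrative. */
#define RPCS_ENABLE        (1u << 31)
#define RPCS_S_CNT_ENABLE  (1u << 18)
#define RPCS_S_CNT_SHIFT   15
#define RPCS_SS_CNT_ENABLE (1u << 11)
#define RPCS_SS_CNT_SHIFT  8
#define RPCS_EU_MIN_SHIFT  0
#define RPCS_EU_MAX_SHIFT  4

/* Population count of an 8-bit mask, as the kernel's hweight8() computes. */
static unsigned int hweight8(uint8_t v)
{
	unsigned int n = 0;

	while (v) {
		n += v & 1;
		v >>= 1;
	}
	return n;
}

/* Sketch of make_rpcs(): pack slice/subslice/EU counts into one register. */
static uint32_t make_rpcs_sketch(uint8_t slice_mask, uint8_t subslice_mask,
				 uint8_t min_eus, uint8_t max_eus)
{
	uint32_t rpcs = RPCS_ENABLE;

	rpcs |= RPCS_S_CNT_ENABLE;
	rpcs |= hweight8(slice_mask) << RPCS_S_CNT_SHIFT;
	rpcs |= RPCS_SS_CNT_ENABLE;
	rpcs |= hweight8(subslice_mask) << RPCS_SS_CNT_SHIFT;
	rpcs |= (uint32_t)min_eus << RPCS_EU_MIN_SHIFT;
	rpcs |= (uint32_t)max_eus << RPCS_EU_MAX_SHIFT;

	return rpcs;
}
```

This patch simply stops short-circuiting that computation for pre-Gen9
hardware, so Broadwell and Braswell get an explicit request too.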

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3f90c74038ef..d3ffb268a7a1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2489,13 +2489,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.1


* [PATCH 2/8] drm/i915: Record the sseu configuration per-context & engine
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 1/8] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 3/8] drm/i915/perf: simplify configure all context function Tvrtko Ursulin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslices and
EUs per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

v5: More to_intel_context() (Tvrtko)
    Switch intel_sseu from union to struct (Tvrtko)
    Move context default sseu in existing loop (Chris)

v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.h |  4 ++++
 drivers/gpu/drm/i915/i915_request.h     | 10 ++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 22 +++++++++++-----------
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5fa13887b911..d6049c3f911b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3445,6 +3445,20 @@ mkwrite_device_info(struct drm_i915_private *dev_priv)
 	return (struct intel_device_info *)&dev_priv->info;
 }
 
+static inline struct intel_sseu
+intel_device_default_sseu(struct drm_i915_private *i915)
+{
+	const struct sseu_dev_info *sseu = &INTEL_INFO(i915)->sseu;
+	struct intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 /* modesetting */
 extern void intel_modeset_init_hw(struct drm_device *dev);
 extern int intel_modeset_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f15a039772db..8a12984e7495 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -291,6 +291,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		struct intel_context *ce = &ctx->__engine[n];
 
 		ce->gem_context = ctx;
+		/* Use the whole device by default */
+		ce->sseu = intel_device_default_sseu(dev_priv);
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 851dad6decd7..1e51c2a46644 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -31,6 +31,7 @@
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -157,6 +158,9 @@ struct i915_gem_context {
 		int pin_count;
 
 		const struct intel_context_ops *ops;
+
+		/** sseu: Control eu/slice partitioning */
+		struct intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 9898301ab7ef..eb6f8cce16c4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -39,6 +39,16 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+struct intel_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
 struct intel_wait {
 	struct rb_node node;
 	struct task_struct *tsk;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d3ffb268a7a1..7b2f2d6bb057 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     struct intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2495,24 +2495,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
 			GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2638,7 +2637,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  to_intel_context(ctx, engine)->sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.1


* [PATCH 3/8] drm/i915/perf: simplify configure all context function
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 1/8] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 2/8] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 4/8] drm/i915/perf: reuse intel_lrc ctx regs macro Tvrtko Ursulin
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

We don't need any special treatment on error, so just return as soon as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0376338d1f8d..49597cf31707 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1819,7 +1819,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1838,7 +1838,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				     wait_flags,
 				     MAX_SCHEDULE_TIMEOUT);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1850,10 +1850,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1863,7 +1861,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
-- 
2.17.1


* [PATCH 4/8] drm/i915/perf: reuse intel_lrc ctx regs macro
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 3/8] drm/i915/perf: simplify configure all context function Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Abstract the context image access a bit.
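For context, the CTX_REG() helper being reused here boils down to writing an
(offset, value) pair into consecutive dwords of the context image. A
simplified stand-alone model (the real macro in intel_lrc_reg.h takes an
i915_reg_t and converts it via i915_mmio_reg_offset()):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of CTX_REG() from intel_lrc_reg.h: the context image
 * is an array of dwords in which each programmed register occupies two
 * consecutive slots, [pos] = MMIO offset and [pos + 1] = value.
 */
#define CTX_REG(reg_state, pos, offset, val) do { \
	(reg_state)[(pos) + 0] = (offset); \
	(reg_state)[(pos) + 1] = (val); \
} while (0)
```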

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 34 +++++++++++++++-----------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 49597cf31707..ccb20230df2c 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -210,6 +210,7 @@
 #include "i915_oa_cflgt3.h"
 #include "i915_oa_cnl.h"
 #include "i915_oa_icl.h"
+#include "intel_lrc_reg.h"
 
 /* HW requires this to be a power of two, between 128k and 16M, though driver
  * is currently generally designed assuming the largest 16M size is used such
@@ -1636,27 +1637,25 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
 	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
-	u32 flex_mmio[] = {
-		i915_mmio_reg_offset(EU_PERF_CNTL0),
-		i915_mmio_reg_offset(EU_PERF_CNTL1),
-		i915_mmio_reg_offset(EU_PERF_CNTL2),
-		i915_mmio_reg_offset(EU_PERF_CNTL3),
-		i915_mmio_reg_offset(EU_PERF_CNTL4),
-		i915_mmio_reg_offset(EU_PERF_CNTL5),
-		i915_mmio_reg_offset(EU_PERF_CNTL6),
+	i915_reg_t flex_regs[] = {
+		EU_PERF_CNTL0,
+		EU_PERF_CNTL1,
+		EU_PERF_CNTL2,
+		EU_PERF_CNTL3,
+		EU_PERF_CNTL4,
+		EU_PERF_CNTL5,
+		EU_PERF_CNTL6,
 	};
 	int i;
 
-	reg_state[ctx_oactxctrl] = i915_mmio_reg_offset(GEN8_OACTXCONTROL);
-	reg_state[ctx_oactxctrl+1] = (dev_priv->perf.oa.period_exponent <<
-				      GEN8_OA_TIMER_PERIOD_SHIFT) |
-				     (dev_priv->perf.oa.periodic ?
-				      GEN8_OA_TIMER_ENABLE : 0) |
-				     GEN8_OA_COUNTER_RESUME;
+	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
+		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
+		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
+		GEN8_OA_COUNTER_RESUME);
 
-	for (i = 0; i < ARRAY_SIZE(flex_mmio); i++) {
+	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
 		u32 state_offset = ctx_flexeu0 + i * 2;
-		u32 mmio = flex_mmio[i];
+		u32 mmio = i915_mmio_reg_offset(flex_regs[i]);
 
 		/*
 		 * This arbitrary default will select the 'EU FPU0 Pipeline
@@ -1676,8 +1675,7 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 			}
 		}
 
-		reg_state[state_offset] = mmio;
-		reg_state[state_offset+1] = value;
+		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
 }
 
-- 
2.17.1


* [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 4/8] drm/i915/perf: reuse intel_lrc ctx regs macro Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:59   ` Lionel Landwerlin
  2018-08-14 14:40 ` [PATCH 6/8] drm/i915: Add global barrier support Tvrtko Ursulin
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer, but concerns were raised about the cost of
reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after the context switch). This means that for
a window of time during the recording, counters recorded by the OA
unit might be invalid, which requires userspace dealing with OA
reports to discard the invalid values.

Minimizing the reprogramming could be implemented by tracking the last
programmed configuration somewhere in the GGTT and using MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user-requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penalty for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming, so it doesn't sound completely unreasonable to hold
powergating at the default for the same reason.
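The override itself is a simple selection, modelled here with a plain boolean
standing in for the perf.oa.exclusive_stream check (the names and cut-down
struct are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for struct intel_sseu; only one field is modelled. */
struct sseu_cfg {
	uint8_t slice_mask;
};

/*
 * Model of intel_engine_prepare_sseu(): while an i915/perf (OA) stream is
 * active, the context's requested configuration is ignored in favour of
 * the device default, so the NOA mux programming stays valid.
 */
static struct sseu_cfg prepare_sseu(bool perf_active,
				    struct sseu_cfg device_default,
				    struct sseu_cfg ctx_requested)
{
	return perf_active ? device_default : ctx_requested;
}
```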

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
 drivers/gpu/drm/i915/intel_lrc.h |  3 +++
 4 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d6049c3f911b..5c12d2676435 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3851,4 +3851,19 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
 		return I915_HWS_CSB_WRITE_INDEX;
 }
 
+static inline struct intel_sseu
+intel_engine_prepare_sseu(struct intel_engine_cs *engine,
+			  struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = engine->i915;
+
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. The most natural configuration to take in that case
+	 * is the default (i.e maximum the hardware can do).
+	 */
+	return i915->perf.oa.exclusive_stream ?
+		intel_device_default_sseu(i915) : sseu;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ccb20230df2c..c2fc2399e0ed 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  */
 static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+					   const struct i915_oa_config *oa_config,
+					   struct intel_sseu sseu)
 {
 	struct drm_i915_private *dev_priv = ctx->i915;
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
@@ -1677,6 +1678,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
 }
 
 /*
@@ -1807,6 +1811,7 @@ static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
 static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				       const struct i915_oa_config *oa_config)
 {
+	struct intel_sseu default_sseu = intel_device_default_sseu(dev_priv);
 	struct intel_engine_cs *engine = dev_priv->engine[RCS];
 	struct i915_gem_context *ctx;
 	int ret;
@@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
+					       oa_config ? default_sseu : ce->sseu);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    u32 *reg_state)
 {
+	struct drm_i915_private *i915 = engine->i915;
 	struct i915_perf_stream *stream;
 
 	if (engine->id != RCS)
 		return;
 
-	stream = engine->i915->perf.oa.exclusive_stream;
-	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+	stream = i915->perf.oa.exclusive_stream;
+	if (stream) {
+		struct intel_sseu default_sseu =
+			intel_device_default_sseu(i915);
+
+		gen8_update_reg_state_unlocked(ctx, reg_state,
+					       stream->oa_config,
+					       default_sseu);
+	}
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7b2f2d6bb057..8a2997be7ef7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(const struct sseu_dev_info *sseu,
-		     struct intel_sseu ctx_sseu)
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   struct intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
 	}
 
 	if (rcs) {
+		struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
+
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
-				  to_intel_context(ctx, engine)->sseu));
+			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				       intel_engine_prepare_sseu(engine,
+								 sseu)));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..bf3acdc3d0af 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   struct intel_sseu ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */
-- 
2.17.1


* [PATCH 6/8] drm/i915: Add global barrier support
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:59   ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 7/8] drm/i915: Explicitly mark Global GTT address spaces Tvrtko Ursulin
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The global barrier is a facility to allow serialization between different
timelines.

After calling i915_gem_set_global_barrier on a request, all following
submissions on any engine will be set up as depending on this global
barrier. Once the global barrier has completed, it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.
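The mechanism can be modelled in a few lines: one shared slot holds the
current barrier request, and every new request picks up a dependency on it.
(A toy model; the real code tracks the barrier with i915_gem_active and
serializes via dma fences, and the clearing happens when the tracked request
retires.)

```c
#include <assert.h>
#include <stddef.h>

/* Toy request: a single dependency pointer and a completion flag. */
struct toy_request {
	const struct toy_request *depends_on;
	int completed;
};

/* One shared slot holds the current global barrier (NULL: none set). */
static const struct toy_request *global_barrier;

static void set_global_barrier(const struct toy_request *rq)
{
	global_barrier = rq;
}

/* Every newly constructed request depends on the barrier, if any. */
static void init_request(struct toy_request *rq)
{
	rq->depends_on = global_barrier;
	rq->completed = 0;
}

/* A request may be submitted once its barrier dependency has completed. */
static int request_ready(const struct toy_request *rq)
{
	return !rq->depends_on || rq->depends_on->completed;
}
```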

-------------------------------------------------------------------------
This code was part of the larger SSEU patch but I extracted it to be
separate for ease of review and clarity. I think it originates from Chris
Wilson so permission pending I will change the author and add appropriate
S-o-B.
-------------------------------------------------------------------------

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c     |  2 ++
 drivers/gpu/drm/i915/i915_request.c | 16 ++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c12d2676435..643089ba01b9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2098,6 +2098,16 @@ struct drm_i915_private {
 		u32 active_requests;
 		u32 request_serial;
 
+		/**
+		 * Global barrier for the ability to serialize ordering between
+		 * different timelines.
+		 *
+		 * Users can call i915_gem_set_global_barrier which will make
+		 * all subsequent submissions execute only after this barrier
+		 * has been completed.
+		 */
+		struct i915_gem_active global_barrier;
+
 		/**
 		 * Is the GPU currently considered idle, or busy executing
 		 * userspace requests? Whilst idle, we allow runtime power
@@ -3230,6 +3240,23 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_hw_ppgtt, vm);
 }
 
+/**
+ * i915_gem_set_global_barrier - orders submission on different timelines
+ * @i915: i915 device private
+ * @rq: request after which new submissions can proceed
+ *
+ * Sets the passed in request as the serialization point for all subsequent
+ * submissions, regardless of the engine/timeline. Subsequent requests will not
+ * be submitted to GPU until the global barrier has been completed.
+ */
+static inline void
+i915_gem_set_global_barrier(struct drm_i915_private *i915,
+			    struct i915_request *rq)
+{
+	lockdep_assert_held(&i915->drm.struct_mutex);
+	i915_gem_active_set(&i915->gt.global_barrier, rq);
+}
+
 /* i915_gem_fence_reg.c */
 struct drm_i915_fence_reg *
 i915_reserve_fence(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0453eb42a1a3..be462ef65786 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5752,6 +5752,8 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->priorities)
 		goto err_dependencies;
 
+	init_request_active(&dev_priv->gt.global_barrier, NULL);
+
 	INIT_LIST_HEAD(&dev_priv->gt.timelines);
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 09ed48833b54..8b45f74dc748 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -644,6 +644,18 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
+static int add_global_barrier(struct i915_request *rq)
+{
+	struct i915_request *barrier;
+
+	barrier = i915_gem_active_raw(&rq->i915->gt.global_barrier,
+				      &rq->i915->drm.struct_mutex);
+	if (barrier)
+		return i915_request_await_dma_fence(rq, &barrier->fence);
+
+	return 0;
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -806,6 +818,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 */
 	rq->head = rq->ring->emit;
 
+	ret = add_global_barrier(rq);
+	if (ret)
+		goto err_unwind;
+
 	/* Unconditionally invalidate GPU caches and TLBs. */
 	ret = engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
-- 
2.17.1
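For reference, the ordering the patch above establishes can be modelled in plain, self-contained C. This is only an illustrative toy (every `toy_*` name is a stand-in invented here; the kernel uses i915_gem_active tracking and dma-fence waits rather than raw pointers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	bool completed;
	struct toy_request *depends_on;	/* models i915_request_await_dma_fence() */
};

static struct toy_request *toy_global_barrier;	/* models i915->gt.global_barrier */

/* Models i915_gem_set_global_barrier(): record rq as the barrier. */
static void toy_set_global_barrier(struct toy_request *rq)
{
	toy_global_barrier = rq;
}

/* Models add_global_barrier() at request-allocation time: a new request
 * picks up a dependency on the barrier only while it is outstanding. */
static void toy_alloc_request(struct toy_request *rq)
{
	rq->completed = false;
	rq->depends_on = NULL;
	if (toy_global_barrier && !toy_global_barrier->completed)
		rq->depends_on = toy_global_barrier;
}

static bool toy_can_submit(const struct toy_request *rq)
{
	return !rq->depends_on || rq->depends_on->completed;
}
```

A request allocated while the barrier is pending cannot be submitted until the barrier completes; once completed, the barrier stops gating new requests, matching the "automatically gets cleared" behaviour.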

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 7/8] drm/i915: Explicitly mark Global GTT address spaces
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 6/8] drm/i915: Add global barrier support Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

So far we have been relying on the vm->file pointer being NULL to declare
an address space as GGTT.

This has the unfortunate consequence that the default kernel context is
also declared GGTT and interferes with the following patch, which wants to
instantiate VMAs and execute requests against the kernel context.

Change the is_ggtt test to use an explicit flag in struct
i915_address_space to solve this issue.

Note that the bit used is free since there is an alignment hole in the
struct.
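The "free bit" claim can be checked with a small userspace sketch. The struct names below are illustrative stand-ins for struct i915_address_space, not kernel code: because the existing bool bitfields pack into one byte and padding precedes the next aligned member, adding a one-bit flag does not grow the struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins: the bool bitfields share a byte, and the
 * padding inserted before the 8-byte-aligned member leaves that byte's
 * remaining bits free, so the new is_ggtt bit costs no extra space. */
struct vm_without_flag {
	bool pt_kmap_wc:1;		/* existing bitfield */
	unsigned long long pd_addr;	/* stands in for later aligned members */
};

struct vm_with_flag {
	bool is_ggtt:1;			/* the new Global GTT flag */
	bool pt_kmap_wc:1;
	unsigned long long pd_addr;
};
```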

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++
 drivers/gpu/drm/i915/i915_gem_gtt.h | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4137af4bd8f5..64151ad2a02b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3610,6 +3610,8 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	i915_address_space_init(&ggtt->vm, dev_priv);
 
+	ggtt->vm.is_ggtt = true;
+
 	/* Only VLV supports read-only GGTT mappings */
 	ggtt->vm.has_read_only = IS_VALLEYVIEW(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dd161c187a68..d8dd4d9280bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -331,6 +331,9 @@ struct i915_address_space {
 
 	struct pagestash free_pages;
 
+	/* Global GTT */
+	bool is_ggtt:1;
+
 	/* Some systems require uncached updates of the page directories */
 	bool pt_kmap_wc:1;
 
@@ -364,7 +367,7 @@ struct i915_address_space {
 	I915_SELFTEST_DECLARE(bool scrub_64K);
 };
 
-#define i915_is_ggtt(V) (!(V)->file)
+#define i915_is_ggtt(vm) ((vm)->is_ggtt)
 
 static inline bool
 i915_vm_is_48bit(const struct i915_address_space *vm)
-- 
2.17.1



* [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 7/8] drm/i915: Explicitly mark Global GTT address spaces Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:22   ` Chris Wilson
  2018-08-14 14:45 ` [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
	                                                .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.
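For illustration, the sanity checks applied to such a configuration (implemented later in this patch as __user_to_context_sseu()) can be sketched as self-contained C; sseu_caps and validate_sseu are illustrative names invented here, not part of the uapi:

```c
#include <assert.h>
#include <stdint.h>

struct sseu_caps {	/* illustrative stand-in for struct sseu_dev_info */
	uint64_t slice_mask;
	uint64_t subslice_mask;
	uint16_t max_eus_per_subslice;
};

static int validate_sseu(const struct sseu_caps *hw,
			 uint64_t slice_mask, uint64_t subslice_mask,
			 uint16_t min_eus, uint16_t max_eus)
{
	/* No zeros in any field. */
	if (!slice_mask || !subslice_mask || !min_eus || !max_eus)
		return -1;
	/* Max >= min. */
	if (max_eus < min_eus)
		return -1;
	/* Requested masks must be subsets of the hardware masks. */
	if (slice_mask & ~hw->slice_mask)
		return -1;
	if (subslice_mask & ~hw->subslice_mask)
		return -1;
	if (max_eus > hw->max_eus_per_subslice)
		return -1;
	return 0;	/* acceptable configuration */
}
```

A request is rejected unless every mask is a non-empty subset of the hardware's and the EU bounds are ordered and within range.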

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
    dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
    s/dev_priv/i915/ (Tvrtko)
    Change uapi class/instance fields to u16 (Tvrtko)
    Bump mask fields to 64bits (Lionel)
    Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
    reconfiguring powergated slice/subslices (Chris)
    Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
 * Update for upstream changes.
 * Request submit needs a RPM reference.
 * Reject on !FULL_PPGTT for simplicity.
 * Pull out get/set param to helpers for readability and less indent.
 * Use i915_request_await_dma_fence in add_global_barrier to skip waits
   on the same timeline and avoid GEM_BUG_ON.
 * No need to explicitly assign a NULL pointer to engine in legacy mode.
 * No need to move gen8_make_rpcs up.
 * Factored out global barrier as prep patch.
 * Allow only CAP_SYS_ADMIN if !Gen11.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  43 ++++++
 4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8a12984e7495..6d6220634e9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct i915_request *rq;
+	struct intel_ring *ring;
+	int ret;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	/* Submitting requests etc needs the hw awake. */
+	intel_runtime_pm_get(i915);
+
+	i915_retire_requests(i915);
+
+	/* Now use the RCS to actually reconfigure. */
+	engine = i915->engine[RCS];
+
+	rq = i915_request_alloc(engine, i915->kernel_context);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		goto out_put;
+	}
+
+	ret = engine->emit_rpcs_config(rq, ctx, sseu);
+	if (ret)
+		goto out_add;
+
+	/* Queue this switch after all other activity */
+	list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
+		struct i915_request *prev;
+
+		prev = last_request_on_engine(ring->timeline, engine);
+		if (prev)
+			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							 &prev->submit,
+							 I915_FENCE_GFP);
+	}
+
+	i915_gem_set_global_barrier(i915, rq);
+
+out_add:
+	i915_request_add(rq);
+out_put:
+	intel_runtime_pm_put(i915);
+
+	return ret;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -810,6 +895,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -819,6 +907,101 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int
+__user_to_context_sseu(const struct sseu_dev_info *device,
+		       const struct drm_i915_gem_context_param_sseu *user,
+		       struct intel_sseu *context)
+{
+	/* No zeros in any field. */
+	if (!user->slice_mask || !user->subslice_mask ||
+	    !user->min_eus_per_subslice || !user->max_eus_per_subslice)
+		return -EINVAL;
+
+	/* Max >= min. */
+	if (user->max_eus_per_subslice < user->min_eus_per_subslice)
+		return -EINVAL;
+
+	/* Check validity against hardware. */
+	if (user->slice_mask & ~device->slice_mask)
+		return -EINVAL;
+
+	if (user->subslice_mask & ~device->subslice_mask[0])
+		return -EINVAL;
+
+	if (user->max_eus_per_subslice > device->max_eus_per_subslice)
+		return -EINVAL;
+
+	context->slice_mask = user->slice_mask;
+	context->subslice_mask = user->subslice_mask;
+	context->min_eus_per_subslice = user->min_eus_per_subslice;
+	context->max_eus_per_subslice = user->max_eus_per_subslice;
+
+	return 0;
+}
+
+static int set_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_sseu ctx_sseu;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (!USES_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (!IS_GEN11(i915) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	ret = __user_to_context_sseu(&INTEL_INFO(i915)->sseu, &user_sseu,
+				     &ctx_sseu);
+	if (ret)
+		return ret;
+
+	ce = to_intel_context(ctx, engine);
+
+	/* Nothing to do if unmodified. */
+	if (!memcmp(&ce->sseu, &ctx_sseu, sizeof(ctx_sseu)))
+		return 0;
+
+	ret = i915_gem_context_reconfigure_sseu(ctx, engine, ctx_sseu);
+	if (ret)
+		return ret;
+
+	/*
+	 * Copy the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, i915, id) {
+		ce = to_intel_context(ctx, engine);
+		ce->sseu = ctx_sseu;
+	}
+
+	return 0;
+}
+
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -884,7 +1067,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
-
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = set_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8a2997be7ef7..0f780c666e98 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2232,6 +2232,60 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 }
 static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
 
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct i915_gem_context *ctx,
+				 struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = rq->i915;
+	struct intel_context *ce = to_intel_context(ctx, i915->engine[RCS]);
+	struct i915_vma *vma;
+	u64 offset;
+	u32 *cs;
+	int err;
+
+	/* Let the deferred state allocation take care of this. */
+	if (!ce->state)
+		return 0;
+
+	vma = i915_vma_instance(ce->state->obj,
+				&i915->kernel_context->ppgtt->vm,
+				NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err) {
+		i915_vma_close(vma);
+		return err;
+	}
+
+	err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+	if (unlikely(err)) {
+		i915_vma_close(vma);
+		return err;
+	}
+
+	i915_vma_unpin(vma);
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = vma->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(&INTEL_INFO(i915)->sseu,
+			       intel_engine_prepare_sseu(rq->engine, sseu));
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
@@ -2324,6 +2378,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_flush = gen8_emit_flush;
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
+	engine->emit_rpcs_config = gen8_emit_rpcs_config;
 
 	engine->set_default_submission = intel_execlists_set_default_submission;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9090885d57de..acb8b6fe912a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -477,6 +477,10 @@ struct intel_engine_cs {
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
+	int		(*emit_rpcs_config)(struct i915_request *rq,
+					    struct i915_gem_context *ctx,
+					    struct intel_sseu sseu);
+
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a4446f452040..e195c38b15a6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1478,9 +1478,52 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 class;
+	__u16 instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd1;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd2;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.1



* Re: [PATCH v10 0/8] Per context dynamic (sub)slice power-gating
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
@ 2018-08-14 14:45 ` Tvrtko Ursulin
  2018-08-14 15:15 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:45 UTC (permalink / raw)
  To: Intel-gfx


On 14/08/18 15:40, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Updated series after continuing Lionel's work.
> 
> Userspace for the feature is the media-driver project on GitHub. Please see
> https://github.com/intel/media-driver/pull/271/commits.
> 
> Headline changes:
> 
>   1.
> 
>    No more master allow/disallow sysfs switch. Feature is unconditionally
>    enabled for Gen11 and on other platforms it requires CAP_SYS_ADMIN.
> 
>    *** To be discussed if this is a good idea or not. ***
> 
>   2.
> 
>    Two new patches due to a) breaking out the global barrier, and b) fixing one
>    GEM_BUG_ON regarding incorrect kernel context classification by i915_is_ggtt.
> 
> 
> Otherwise please see individual patch change logs.
> 
> Main topic for the cover letter though is addressing the question of dynamic
> slice re-configuration performance impact.
> 
> Introduction into this problem space is that changing the (sub)slice
> configuration has a cost at context switch time in the order of tens of
> milliseconds. (It varies per Gen and with different slice count transitions.)
> 
> So the question is whether a malicious unprivileged workload can negatively
> impact other clients. To try and answer this question I have extended gem_wsim
> and created some test workloads. (Note that my testing was done on a Gen9
> system. Overall message could be the same on Gen11 but needs to be verified.)
> 
> First test was a simulated video playback client running in parallel with a
> simulated game of both medium and high complexity (uses around 60% or 90% of the
> render engine respectively, and 7% of the blitter engine). I had two flavours of
> the playback client, one which runs normally and one which requests reduced
> slice configuration. Both workloads are targetting to run at 60fps.
> 
> Second test is the same but against a heavier simulated game workload, the one
> which uses around 90% of the render engine.
> 
> Results are achieved frames per second as observed from the game client:
> 
>                       No player  Normal player   SSEU enabled player
>          Medium game     59.6        59.6               59.6
>           Heavy game     59.7        58.4               58.1
> 
> Here we can see that the medium workload was not affected either by the normal
> or SSEU player, while the heavy workload did see a performance hit. Both with
> the video player running in parallel, and slightly larger when the player was
> SSEU enabled.
> 
> Second test is running a malicious client (or clients) in parallel to the same
> simulated game workloads. These clients try to trigger many context switches by
> using multiple contexts with dependencies set up so request coalescing is
> defeated as much as possible.
> 
> I tested both with normal and SSEU enabled malicious clients:
> 
>                       DoS client   SSEU DoS client
>          Medium game     59.5           59.6
>           Heavy game     57.8           55.4
> 
> Here we can see a similar picture as in the first test. Medium game client
> is not affected by either DoS client, while the heavy game client is, more so
> with the SSEU enabled attacker.
> 
> From both tests I think the conclusion is that dynamic SSEU switching does
> increase the magnitude of performance loss, especially with over-subscribed
> engines, due to the cost being proportional to context switch frequency.
> 
> Likelihood is that it slightly lowers the utilization level at which this starts
> to happen, but does not introduce a completely new vector of attack - that is -
> where it was possible to DoS a system from an unprivileged client, it still is.
> In both cases (SSEU enabled or not), a malicious client has the option to grind
> the system to a halt, albeit it may need fewer submission threads to do so when
> it is SSEU enabled.

For reference, these are the gem_wsim workloads used to test this (even though the
number of people familiar with them is quite low):

Medium game workload:

1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
1.RCS.1000-2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Heavy game workload:

1.RCS.500.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
1.RCS.2000.0.0
P.2.1
2.BCS.1000.-2.0
2.RCS.2000.-1.1
p.16667

Normal video player:

1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667

SSEU enabled video player:

S.1.1
S.2.1
1.VCS.5000-10000.0.0
2.RCS.1000-2000.-1.0
P.3.1
3.BCS.1000.-2.0
p.16667

Malicious client:

1.RCS.1.0.0
2.RCS.1.-1.0

SSEU enabled malicious client:

S.2.1
1.RCS.1.0.0
2.RCS.1.-1.0


Regards,

Tvrtko

> 
> Chris Wilson (3):
>    drm/i915: Program RPCS for Broadwell
>    drm/i915: Record the sseu configuration per-context & engine
>    drm/i915: Expose RPCS (SSEU) configuration to userspace
> 
> Lionel Landwerlin (3):
>    drm/i915/perf: simplify configure all context function
>    drm/i915/perf: reuse intel_lrc ctx regs macro
>    drm/i915/perf: lock powergating configuration to default when active
> 
> Tvrtko Ursulin (2):
>    drm/i915: Add global barrier support
>    drm/i915: Explicitly mark Global GTT address spaces
> 
>   drivers/gpu/drm/i915/i915_drv.h         |  56 +++++++
>   drivers/gpu/drm/i915/i915_gem.c         |   2 +
>   drivers/gpu/drm/i915/i915_gem_context.c | 189 +++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h |   4 +
>   drivers/gpu/drm/i915/i915_gem_gtt.c     |   2 +
>   drivers/gpu/drm/i915/i915_gem_gtt.h     |   5 +-
>   drivers/gpu/drm/i915/i915_perf.c        |  68 +++++----
>   drivers/gpu/drm/i915/i915_request.c     |  16 ++
>   drivers/gpu/drm/i915/i915_request.h     |  10 ++
>   drivers/gpu/drm/i915/intel_lrc.c        |  87 ++++++++---
>   drivers/gpu/drm/i915/intel_lrc.h        |   3 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>   include/uapi/drm/i915_drm.h             |  43 ++++++
>   13 files changed, 439 insertions(+), 50 deletions(-)
> 


* Re: [PATCH 6/8] drm/i915: Add global barrier support
  2018-08-14 14:40 ` [PATCH 6/8] drm/i915: Add global barrier support Tvrtko Ursulin
@ 2018-08-14 14:59   ` Tvrtko Ursulin
  0 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx, Chris Wilson


Chris, for this one please let me know if it is okay to give you 
authorship and to add your S-o-B.

Tvrtko

On 14/08/2018 15:40, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Global barrier is a facility to allow serialization between different
> timelines.
> 
> After calling i915_gem_set_global_barrier on a request, all following
> submissions on any engine will be set up as depending on this global
> barrier. Once the global barrier has been completed it automatically gets
> cleared and things continue as normal.
> 
> This facility will be used by the upcoming context SSEU code.
> 
> -------------------------------------------------------------------------
> This code was part of the larger SSEU patch but I extracted it to be
> separate for ease of review and clarity. I think it originates from Chris
> Wilson so permission pending I will change the author and add appropriate
> S-o-B.
> -------------------------------------------------------------------------
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h     | 27 +++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem.c     |  2 ++
>   drivers/gpu/drm/i915/i915_request.c | 16 ++++++++++++++++
>   3 files changed, 45 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5c12d2676435..643089ba01b9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2098,6 +2098,16 @@ struct drm_i915_private {
>   		u32 active_requests;
>   		u32 request_serial;
>   
> +		/**
> +		 * Global barrier for the ability to serialize ordering between
> +		 * different timelines.
> +		 *
> +		 * Users can call i915_gem_set_global_barrier, which makes
> +		 * all subsequent submissions execute only after this barrier
> +		 * has been completed.
> +		 */
> +		struct i915_gem_active global_barrier;
> +
>   		/**
>   		 * Is the GPU currently considered idle, or busy executing
>   		 * userspace requests? Whilst idle, we allow runtime power
> @@ -3230,6 +3240,23 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
>   	return container_of(vm, struct i915_hw_ppgtt, vm);
>   }
>   
> +/**
> + * i915_gem_set_global_barrier - orders submission on different timelines
> + * @i915: i915 device private
> + * @rq: request after which new submissions can proceed
> + *
> + * Sets the passed in request as the serialization point for all subsequent
> + * submissions, regardless of the engine/timeline. Subsequent requests will not
> + * be submitted to the GPU until the global barrier has been completed.
> + */
> +static inline void
> +i915_gem_set_global_barrier(struct drm_i915_private *i915,
> +			    struct i915_request *rq)
> +{
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +	i915_gem_active_set(&i915->gt.global_barrier, rq);
> +}
> +
>   /* i915_gem_fence_reg.c */
>   struct drm_i915_fence_reg *
>   i915_reserve_fence(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0453eb42a1a3..be462ef65786 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5752,6 +5752,8 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   	if (!dev_priv->priorities)
>   		goto err_dependencies;
>   
> +	init_request_active(&dev_priv->gt.global_barrier, NULL);
> +
>   	INIT_LIST_HEAD(&dev_priv->gt.timelines);
>   	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>   	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 09ed48833b54..8b45f74dc748 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -644,6 +644,18 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   	return NOTIFY_DONE;
>   }
>   
> +static int add_global_barrier(struct i915_request *rq)
> +{
> +	struct i915_request *barrier;
> +
> +	barrier = i915_gem_active_raw(&rq->i915->gt.global_barrier,
> +				      &rq->i915->drm.struct_mutex);
> +	if (barrier)
> +		return i915_request_await_dma_fence(rq, &barrier->fence);
> +
> +	return 0;
> +}
> +
>   /**
>    * i915_request_alloc - allocate a request structure
>    *
> @@ -806,6 +818,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 */
>   	rq->head = rq->ring->emit;
>   
> +	ret = add_global_barrier(rq);
> +	if (ret)
> +		goto err_unwind;
> +
>   	/* Unconditionally invalidate GPU caches and TLBs. */
>   	ret = engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
> 


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
@ 2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:11     ` Lionel Landwerlin
  2018-08-14 18:44     ` Tvrtko Ursulin
  2018-08-14 15:22   ` Chris Wilson
  1 sibling, 2 replies; 44+ messages in thread
From: Chris Wilson @ 2018-08-14 14:59 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
>         struct drm_i915_gem_context_param arg;
>         struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>                                                         .instance = 0, };
> 
>         memset(&arg, 0, sizeof(arg));
>         arg.ctx_id = ctx;
>         arg.param = I915_CONTEXT_PARAM_SSEU;
>         arg.value = (uintptr_t) &sseu;
>         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>                 sseu.packed.subslice_mask = 0;
> 
>                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>         }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a global
>     dependency (Chris)
>     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> 
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>     s/dev_priv/i915/ (Tvrtko)
>     Change uapi class/instance fields to u16 (Tvrtko)
>     Bump mask fields to 64bits (Lionel)
>     Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> 
> v9: Import context image into kernel context's ppgtt only when
>     reconfiguring powergated slice/subslices (Chris)
>     Use aliasing ppgtt when needed (Michel)
> 
> Tvrtko Ursulin:
> 
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>    on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow to only CAP_SYS_ADMIN if !Gen11.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>  include/uapi/drm/i915_drm.h             |  43 ++++++
>  4 files changed, 288 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8a12984e7495..6d6220634e9e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>         return 0;
>  }
>  
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +                                 struct intel_engine_cs *engine,
> +                                 struct intel_sseu sseu)
> +{
> +       struct drm_i915_private *i915 = ctx->i915;
> +       struct i915_request *rq;
> +       struct intel_ring *ring;
> +       int ret;
> +
> +       lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +       /* Submitting requests etc needs the hw awake. */
> +       intel_runtime_pm_get(i915);
> +
> +       i915_retire_requests(i915);

?

> +
> +       /* Now use the RCS to actually reconfigure. */
> +       engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

> +
> +       rq = i915_request_alloc(engine, i915->kernel_context);
> +       if (IS_ERR(rq)) {
> +               ret = PTR_ERR(rq);
> +               goto out_put;
> +       }
> +
> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);

It's just an LRI, I'd rather we do it directly unless there's evidence
that there will be an explicit RPCS config instruction in the future. It
just doesn't seem general enough.
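For illustration, if the RPCS write were emitted inline as the LRI Chris describes, it might look roughly like the sketch below. This is an untested outline, not the actual patch: the helper and register names (`intel_ring_begin`, `intel_ring_advance`, `MI_LOAD_REGISTER_IMM`, `GEN8_R_PWR_CLK_STATE`) follow existing i915 conventions, but the surrounding request plumbing is assumed.

```
/*
 * Sketch only: write GEN8_R_PWR_CLK_STATE with a plain MI_LOAD_REGISTER_IMM
 * emitted from the kernel-context request, instead of routing through an
 * engine->emit_rpcs_config() vfunc. Error handling mirrors other emitters.
 */
static int emit_rpcs_lri(struct i915_request *rq, u32 rpcs)
{
	u32 *cs;

	cs = intel_ring_begin(rq, 4);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = MI_LOAD_REGISTER_IMM(1);
	*cs++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
	*cs++ = rpcs;
	*cs++ = MI_NOOP;

	intel_ring_advance(rq, cs);
	return 0;
}
```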

> +       if (ret)
> +               goto out_add;
> +
> +       /* Queue this switch after all other activity */

Only needs to be after the target ctx.

> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +               struct i915_request *prev;
> +
> +               prev = last_request_on_engine(ring->timeline, engine);

As constructed above you need target-engine + RCS.

> +               if (prev)
> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +                                                        &prev->submit,
> +                                                        I915_FENCE_GFP);
> +       }
> +
> +       i915_gem_set_global_barrier(i915, rq);

This is just for a link from ctx-engine to this rq. Overkill much?
Presumably this stems from using the wrong engine.

> +
> +out_add:
> +       i915_request_add(rq);

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris

* Re: [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
  2018-08-14 14:40 ` [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
@ 2018-08-14 14:59   ` Lionel Landwerlin
  2018-08-15 11:58     ` Tvrtko Ursulin
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 14:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Hey Tvrtko,

Thanks for taking over this series.

I've been talking to developers using the i915/perf interface and from 
their point of view, they expect the system to be in a stable 
configuration when doing measurements.

One issue with this patch on Gen11 is that it will lock the system in a 
configuration that isn't workable for media workloads (all subslices 
enabled).
So I think we should set the value for the locked configuration per 
generation (gen8-10: all slices/subslices, gen11: only subslices that 
contain VME samplers) so that we always get a functional configuration
for all workloads.
Should we select that configuration when opening perf?

Another question is how we expose the selected value to the user. But
that can be solved in a different series.

Cheers,

-
Lionel

On 14/08/18 15:40, Tvrtko Ursulin wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> If some of the contexts submitting workloads to the GPU have been
> configured to shut down slices/subslices, we might lose the NOA
> configurations written in the NOA muxes.
>
> One possible solution to this problem is to reprogram the NOA muxes
> when we switch to a new context. We initially tried this in the
> workaround batchbuffer but some concerns were raised about the cost
> of reprogramming at every context switch. This solution is also not
> without consequences from the userspace point of view. Reprogramming
> of the muxes can only happen once the powergating configuration has
> changed (which happens after context switch). This means for a window
> of time during the recording, counters recorded by the OA unit might
> be invalid. This requires userspace dealing with OA reports to discard
> the invalid values.
>
> Minimizing the reprogramming could be implemented by tracking the
> last programmed configuration somewhere in GGTT and use MI_PREDICATE
> to discard some of the programming commands, but the command streamer
> would still have to parse all the MI_LRI instructions in the
> workaround batchbuffer.
>
> Another solution, which this change implements, is to simply disregard
> the user requested configuration for the period of time when i915/perf
> is active. There is no known issue with this apart from a performance
> penalty for some media workloads that benefit from running on a
> partially powergated GPU. We already prevent RC6 from affecting the
> programming so it doesn't sound completely unreasonable to hold on
> powergating for the same reason.
>
> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>
> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>      More to_intel_context() (Tvrtko)
>      s/dev_priv/i915/ (Tvrtko)
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
>   drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>   4 files changed, 43 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d6049c3f911b..5c12d2676435 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3851,4 +3851,19 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
>   		return I915_HWS_CSB_WRITE_INDEX;
>   }
>   
> +static inline struct intel_sseu
> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
> +			  struct intel_sseu sseu)
> +{
> +	struct drm_i915_private *i915 = engine->i915;
> +
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. The most natural configuration to take in that case
> +	 * is the default (i.e maximum the hardware can do).
> +	 */
> +	return i915->perf.oa.exclusive_stream ?
> +		intel_device_default_sseu(i915) : sseu;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index ccb20230df2c..c2fc2399e0ed 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
>    */
>   static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   					   u32 *reg_state,
> -					   const struct i915_oa_config *oa_config)
> +					   const struct i915_oa_config *oa_config,
> +					   struct intel_sseu sseu)
>   {
>   	struct drm_i915_private *dev_priv = ctx->i915;
>   	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
> @@ -1677,6 +1678,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
> +
> +	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>   }
>   
>   /*
> @@ -1807,6 +1811,7 @@ static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>   static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   				       const struct i915_oa_config *oa_config)
>   {
> +	struct intel_sseu default_sseu = intel_device_default_sseu(dev_priv);
>   	struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   	struct i915_gem_context *ctx;
>   	int ret;
> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   		ce->state->obj->mm.dirty = true;
>   		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>   
> -		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
> +		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
> +					       oa_config ? default_sseu : ce->sseu);
>   
>   		i915_gem_object_unpin_map(ce->state->obj);
>   	}
> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    u32 *reg_state)
>   {
> +	struct drm_i915_private *i915 = engine->i915;
>   	struct i915_perf_stream *stream;
>   
>   	if (engine->id != RCS)
>   		return;
>   
> -	stream = engine->i915->perf.oa.exclusive_stream;
> -	if (stream)
> -		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
> +	stream = i915->perf.oa.exclusive_stream;
> +	if (stream) {
> +		struct intel_sseu default_sseu =
> +			intel_device_default_sseu(i915);
> +
> +		gen8_update_reg_state_unlocked(ctx, reg_state,
> +					       stream->oa_config,
> +					       default_sseu);
> +	}
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7b2f2d6bb057..8a2997be7ef7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
> -		     struct intel_sseu ctx_sseu)
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   struct intel_sseu ctx_sseu)
>   {
>   	u32 rpcs = 0;
>   
> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>   	}
>   
>   	if (rcs) {
> +		struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
> +
>   		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> -			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> -				  to_intel_context(ctx, engine)->sseu));
> +			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> +				       intel_engine_prepare_sseu(engine,
> +								 sseu)));
>   
>   		i915_oa_init_reg_state(engine, ctx, regs);
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f5a5502ecf70..bf3acdc3d0af 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
>   void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
>   
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   struct intel_sseu ctx_sseu);
> +
>   #endif /* _INTEL_LRC_H_ */



* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:59   ` Chris Wilson
@ 2018-08-14 15:11     ` Lionel Landwerlin
  2018-08-14 15:18       ` Chris Wilson
  2018-08-14 18:44     ` Tvrtko Ursulin
  1 sibling, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 15:11 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 15:59, Chris Wilson wrote:
> And I'd still recommend not using indirect access if we can apply the
> changes immediately.
> -Chris
>

Hangs on Gen9 :(

-
Lionel

* ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2018-08-14 14:45 ` [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
@ 2018-08-14 15:15 ` Patchwork
  2018-08-14 15:19 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-08-14 15:15 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating
URL   : https://patchwork.freedesktop.org/series/48194/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
6a364f5371a6 drm/i915: Program RPCS for Broadwell
f2e29d92ffdf drm/i915: Record the sseu configuration per-context & engine
5802b42315f5 drm/i915/perf: simplify configure all context function
e3d46c7d1eb6 drm/i915/perf: reuse intel_lrc ctx regs macro
19449778b5b3 drm/i915/perf: lock powergating configuration to default when active
128663cebf31 drm/i915: Add global barrier support
46adb8a4a966 drm/i915: Explicitly mark Global GTT address spaces
-:44: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#44: FILE: drivers/gpu/drm/i915/i915_gem_gtt.h:335:
+	bool is_ggtt:1;

total: 0 errors, 1 warnings, 0 checks, 25 lines checked
491e75be6bef drm/i915: Expose RPCS (SSEU) configuration to userspace
-:40: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#40: 
v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

total: 0 errors, 1 warnings, 0 checks, 340 lines checked


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:11     ` Lionel Landwerlin
@ 2018-08-14 15:18       ` Chris Wilson
  2018-08-14 16:05         ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-08-14 15:18 UTC (permalink / raw)
  To: Intel-gfx, Lionel Landwerlin, Tvrtko Ursulin

Quoting Lionel Landwerlin (2018-08-14 16:11:36)
> On 14/08/18 15:59, Chris Wilson wrote:
> > And I'd still recommend not using indirect access if we can apply the
> > changes immediately.
> > -Chris
> >
> 
> Hangs on Gen9 :(

How does modifying the context image of an idle (unpinned) context cause
a hang?
-Chris

* ✗ Fi.CI.SPARSE: warning for Per context dynamic (sub)slice power-gating
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2018-08-14 15:15 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2018-08-14 15:19 ` Patchwork
  2018-08-14 15:32 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-08-14 20:22 ` ✗ Fi.CI.IGT: failure " Patchwork
  12 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-08-14 15:19 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating
URL   : https://patchwork.freedesktop.org/series/48194/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: Record the sseu configuration per-context & engine
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3684:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3698:16: warning: expression using sizeof(void)

Commit: drm/i915/perf: simplify configure all context function
Okay!

Commit: drm/i915/perf: reuse intel_lrc ctx regs macro
Okay!

Commit: drm/i915/perf: lock powergating configuration to default when active
Okay!

Commit: drm/i915: Add global barrier support
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3698:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3725:16: warning: expression using sizeof(void)

Commit: drm/i915: Explicitly mark Global GTT address spaces
Okay!

Commit: drm/i915: Expose RPCS (SSEU) configuration to userspace
Okay!


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
  2018-08-14 14:59   ` Chris Wilson
@ 2018-08-14 15:22   ` Chris Wilson
  2018-08-15 11:51     ` Tvrtko Ursulin
  1 sibling, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-08-14 15:22 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +                                 struct intel_engine_cs *engine,
> +                                 struct intel_sseu sseu)
> +{
> +       struct drm_i915_private *i915 = ctx->i915;
> +       struct i915_request *rq;
> +       struct intel_ring *ring;
> +       int ret;
> +
> +       lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +       /* Submitting requests etc needs the hw awake. */
> +       intel_runtime_pm_get(i915);
> +
> +       i915_retire_requests(i915);
> +
> +       /* Now use the RCS to actually reconfigure. */
> +       engine = i915->engine[RCS];
> +
> +       rq = i915_request_alloc(engine, i915->kernel_context);
> +       if (IS_ERR(rq)) {
> +               ret = PTR_ERR(rq);
> +               goto out_put;
> +       }
> +
> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> +       if (ret)
> +               goto out_add;
> +
> +       /* Queue this switch after all other activity */
> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +               struct i915_request *prev;
> +
> +               prev = last_request_on_engine(ring->timeline, engine);
> +               if (prev)
> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +                                                        &prev->submit,
> +                                                        I915_FENCE_GFP);
> +       }
> +
> +       i915_gem_set_global_barrier(i915, rq);
> +
> +out_add:
> +       i915_request_add(rq);
> +out_put:
> +       intel_runtime_pm_put(i915);
> +
> +       return ret;

Looks like we should be able to hook this up to a selftest to confirm
the modification does land in the target context image, and an SRM to
confirm it loaded.
-Chris
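A selftest along those lines could be structured roughly as follows. This is a hedged outline only: `__read_rpcs_from_image` and `__read_rpcs_via_srm` are hypothetical helper names, and the real test would live under the i915 selftests infrastructure.

```
/*
 * Outline of the suggested selftest (hypothetical helpers throughout):
 *  1. reconfigure the sseu on a live context via the setparam path;
 *  2. check CTX_R_PWR_CLK_STATE in the pinned target context image was
 *     rewritten to the expected gen8_make_rpcs() value;
 *  3. submit a request on that context doing MI_STORE_REGISTER_MEM of
 *     GEN8_R_PWR_CLK_STATE and compare what the hardware actually loaded.
 */
static int live_sseu_reconfigure(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct i915_gem_context *ctx;
	struct intel_sseu sseu;
	u32 image_val, hw_val;
	int err;

	ctx = ...; /* create a context and pick a reduced sseu config */

	err = i915_gem_context_reconfigure_sseu(ctx, i915->engine[RCS], sseu);
	if (err)
		return err;

	image_val = __read_rpcs_from_image(ctx);	/* hypothetical */
	hw_val = __read_rpcs_via_srm(ctx);		/* hypothetical */

	if (image_val != hw_val)
		return -EINVAL;

	return 0;
}
```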

* ✓ Fi.CI.BAT: success for Per context dynamic (sub)slice power-gating
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2018-08-14 15:19 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-08-14 15:32 ` Patchwork
  2018-08-14 20:22 ` ✗ Fi.CI.IGT: failure " Patchwork
  12 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-08-14 15:32 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating
URL   : https://patchwork.freedesktop.org/series/48194/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4666 -> Patchwork_9941 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/48194/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9941 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      fi-skl-6700hq:      PASS -> DMESG-FAIL (fdo#106560, fdo#107174)

    
    ==== Possible fixes ====

    igt@drv_module_reload@basic-reload-inject:
      fi-hsw-4770r:       DMESG-WARN (fdo#107425) -> PASS

    igt@drv_selftest@live_hangcheck:
      fi-kbl-7560u:       DMESG-FAIL (fdo#106947, fdo#106560) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      {fi-byt-clapper}:   FAIL (fdo#103167) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#106947 https://bugs.freedesktop.org/show_bug.cgi?id=106947
  fdo#107174 https://bugs.freedesktop.org/show_bug.cgi?id=107174
  fdo#107425 https://bugs.freedesktop.org/show_bug.cgi?id=107425


== Participating hosts (53 -> 49) ==

  Additional (1): fi-glk-j4005 
  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4666 -> Patchwork_9941

  CI_DRM_4666: 26f5eeef80e4332958ea855e90a4d015a9481e3f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4594: b0263e5d0563a81a42cf66e7d3b84662d3222862 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9941: 491e75be6beffef5d6503f91b4ce12d1fc3d7289 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

491e75be6bef drm/i915: Expose RPCS (SSEU) configuration to userspace
46adb8a4a966 drm/i915: Explicitly mark Global GTT address spaces
128663cebf31 drm/i915: Add global barrier support
19449778b5b3 drm/i915/perf: lock powergating configuration to default when active
e3d46c7d1eb6 drm/i915/perf: reuse intel_lrc ctx regs macro
5802b42315f5 drm/i915/perf: simplify configure all context function
f2e29d92ffdf drm/i915: Record the sseu configuration per-context & engine
6a364f5371a6 drm/i915: Program RPCS for Broadwell

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9941/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:18       ` Chris Wilson
@ 2018-08-14 16:05         ` Lionel Landwerlin
  2018-08-14 16:09           ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 16:05 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 16:18, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-08-14 16:11:36)
>> On 14/08/18 15:59, Chris Wilson wrote:
>>> And I'd still recommend not using indirect access if we can apply the
>>> changes immediately.
>>> -Chris
>>>
>> Hangs on Gen9 :(
> How does modifying the context image of an idle (unpinned) context cause
> a hang?
> -Chris
>
I thought you meant a context modifying its own RPCS register... no?

-
Lionel

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 16:05         ` Lionel Landwerlin
@ 2018-08-14 16:09           ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 16:09 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 17:05, Lionel Landwerlin wrote:
> On 14/08/18 16:18, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2018-08-14 16:11:36)
>>> On 14/08/18 15:59, Chris Wilson wrote:
>>>> And I'd still recommend not using indirect access if we can apply the
>>>> changes immediately.
>>>> -Chris
>>>>
>>> Hangs on Gen9 :(
>> How does modifying the context image of an idle (unpinned) context cause
>> a hang?
>> -Chris
>>
> I thought you meant a context modifying its own RPCS register... no?
>
> -
> Lionel

Oh I get it now... Sorry, ignore me :)

-
Lionel

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:11     ` Lionel Landwerlin
@ 2018-08-14 18:44     ` Tvrtko Ursulin
  2018-08-14 18:53       ` Chris Wilson
  1 sibling, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 18:44 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 15:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>          struct drm_i915_gem_context_param arg;
>>          struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                          .instance = 0, };
>>
>>          memset(&arg, 0, sizeof(arg));
>>          arg.ctx_id = ctx;
>>          arg.param = I915_CONTEXT_PARAM_SSEU;
>>          arg.value = (uintptr_t) &sseu;
>>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>                  sseu.packed.subslice_mask = 0;
>>
>>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>          }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
>>
>> v7: Synchronize the requests following a powergating setting change using a global
>>      dependency (Chris)
>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>      Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
>>
>> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      s/dev_priv/i915/ (Tvrtko)
>>      Change uapi class/instance fields to u16 (Tvrtko)
>>      Bump mask fields to 64bits (Lionel)
>>      Don't return EPERM when dynamic sseu is disabled (Tvrtko)
>>
>> v9: Import context image into kernel context's ppgtt only when
>>      reconfiguring powergated slice/subslices (Chris)
>>      Use aliasing ppgtt when needed (Michel)
>>
>> Tvrtko Ursulin:
>>
>> v10:
>>   * Update for upstream changes.
>>   * Request submit needs a RPM reference.
>>   * Reject on !FULL_PPGTT for simplicity.
>>   * Pull out get/set param to helpers for readability and less indent.
>>   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>>     on the same timeline and avoid GEM_BUG_ON.
>>   * No need to explicitly assign a NULL pointer to engine in legacy mode.
>>   * No need to move gen8_make_rpcs up.
>>   * Factored out global barrier as prep patch.
>>   * Allow only CAP_SYS_ADMIN if !Gen11.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Issue: https://github.com/intel/media-driver/issues/267
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>   include/uapi/drm/i915_drm.h             |  43 ++++++
>>   4 files changed, 288 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 8a12984e7495..6d6220634e9e 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>          return 0;
>>   }
>>   
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                                 struct intel_engine_cs *engine,
>> +                                 struct intel_sseu sseu)
>> +{
>> +       struct drm_i915_private *i915 = ctx->i915;
>> +       struct i915_request *rq;
>> +       struct intel_ring *ring;
>> +       int ret;
>> +
>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>> +
>> +       /* Submitting requests etc needs the hw awake. */
>> +       intel_runtime_pm_get(i915);
>> +
>> +       i915_retire_requests(i915);
> 
> ?

I wondered myself but did not make myself dig through all the history of 
the series. I cannot think that it does anything useful in the current 
design.

>> +
>> +       /* Now use the RCS to actually reconfigure. */
>> +       engine = i915->engine[RCS];
> 
> ? Modifying registers stored in another engine's context image.

Well, I was hoping the design was kind of agreed by now.

>> +
>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>> +       if (IS_ERR(rq)) {
>> +               ret = PTR_ERR(rq);
>> +               goto out_put;
>> +       }
>> +
>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> 
> It's just an LRI, I'd rather we do it directly unless there's evidence
> that there will be an explicit rpcs config instruction in future. It
> just doesn't seem general enough.

No complaints, can do.
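
For illustration, the inline alternative being discussed might look roughly like the sketch below. This is illustrative only and not compiled; it follows the MI_SDM approach from the v6 changelog rather than a literal LRI, and `rpcs_offset` is a hypothetical precomputed GGTT address of CTX_R_PWR_CLK_STATE inside the target context image, with gen8_make_rpcs() assumed to encode the register value from the sseu:

```c
/*
 * Illustrative sketch only: emit the context-image update directly from
 * i915_gem_context_reconfigure_sseu() instead of going through an
 * engine->emit_rpcs_config() vfunc.  "rpcs_offset" is a hypothetical
 * precomputed GGTT address of CTX_R_PWR_CLK_STATE in the target image.
 */
static int emit_rpcs_update(struct i915_request *rq, u64 rpcs_offset, u32 rpcs)
{
	u32 *cs;

	cs = intel_ring_begin(rq, 4);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	/* Poke the new RPCS value straight into the saved context image. */
	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
	*cs++ = lower_32_bits(rpcs_offset);
	*cs++ = upper_32_bits(rpcs_offset);
	*cs++ = rpcs;

	intel_ring_advance(rq, cs);

	return 0;
}
```

Dropping the vfunc in favour of something like this keeps the four-dword emission next to the single call site, which is the generality argument being made above.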

>> +       if (ret)
>> +               goto out_add;
>> +
>> +       /* Queue this switch after all other activity */
> 
> Only needs to be after the target ctx.

True, so find just the timeline belonging to the target context. Some 
backpointers would be needed to find it, or a walk comparing against the 
target ce->ring->timeline->fence_context. Anything more than the latter 
doesn't sound justified for this use case.

> 
>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>> +               struct i915_request *prev;
>> +
>> +               prev = last_request_on_engine(ring->timeline, engine);
> 
> As constructed above you need target-engine + RCS.

Target engine is always RCS. Looks like the engine param to this 
function is pointless.

> 
>> +               if (prev)
>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                                                        &prev->submit,
>> +                                                        I915_FENCE_GFP);
>> +       }
>> +
>> +       i915_gem_set_global_barrier(i915, rq);
> 
> This is just for a link from ctx-engine to this rq. Overkill much?
> Presumably this stems from using the wrong engine.

AFAIU this is to ensure the target context cannot get ahead of this 
request. Without it, the target context could overtake, and then there is 
no guarantee that execbufs following the set param will run with the new 
SSEU configuration.
> 
>> +
>> +out_add:
>> +       i915_request_add(rq);
> 
> And I'd still recommend not using indirect access if we can apply the
> changes immediately.

Direct (CPU) access means blocking in set param until the context is 
idle. AFAIR one of the earlier versions used the GPU idling approach, 
but I thought that was a very dangerous API. Can we somehow take just 
one context out of circulation? We would still have to idle, since it 
could be deep in the queue. The current approach is nicely async. And I 
don't like having two paths - direct if unpinned, or via the GPU if 
pinned. In fact it would even be a third path, since normally it happens 
via init_reg_state on pristine contexts.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 18:44     ` Tvrtko Ursulin
@ 2018-08-14 18:53       ` Chris Wilson
  2018-08-15  9:12         ` Tvrtko Ursulin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-08-14 18:53 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 19:44:09)
> 
> On 14/08/2018 15:59, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> >> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and force
> >> the GPU to save the context to memory, where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>          struct drm_i915_gem_context_param arg;
> >>          struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>                                                          .instance = 0, };
> >>
> >>          memset(&arg, 0, sizeof(arg));
> >>          arg.ctx_id = ctx;
> >>          arg.param = I915_CONTEXT_PARAM_SSEU;
> >>          arg.value = (uintptr_t) &sseu;
> >>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>                  sseu.packed.subslice_mask = 0;
> >>
> >>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>          }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> [v2-v10 changelog, tags and diffstat snipped; quoted in full in the
> >> original posting above]
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 8a12984e7495..6d6220634e9e 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >>          return 0;
> >>   }
> >>   
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +                                 struct intel_engine_cs *engine,
> >> +                                 struct intel_sseu sseu)
> >> +{
> >> +       struct drm_i915_private *i915 = ctx->i915;
> >> +       struct i915_request *rq;
> >> +       struct intel_ring *ring;
> >> +       int ret;
> >> +
> >> +       lockdep_assert_held(&i915->drm.struct_mutex);
> >> +
> >> +       /* Submitting requests etc needs the hw awake. */
> >> +       intel_runtime_pm_get(i915);
> >> +
> >> +       i915_retire_requests(i915);
> > 
> > ?
> 
> I wondered myself but did not make myself dig through all the history of 
> the series. Cannot think that it does anything useful in the current design.
> 
> >> +
> >> +       /* Now use the RCS to actually reconfigure. */
> >> +       engine = i915->engine[RCS];
> > 
> > ? Modifying registers stored in another engine's context image.
> 
> Well, I was hoping design was kind of agreed by now.

:-p

This wasn't about using requests per se, but raising the question of why
use rcs to adjust the vcs context image. If we used vcs, the
question of serialisation is then only on that engine.
 
> >> +       rq = i915_request_alloc(engine, i915->kernel_context);
> >> +       if (IS_ERR(rq)) {
> >> +               ret = PTR_ERR(rq);
> >> +               goto out_put;
> >> +       }
> >> +
> >> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> > 
> > It's just an LRI, I'd rather we do it directly unless there's evidence
> > that there will be an explicit rpcs config instruction in future. It
> > just doesn't seem general enough.
> 
> No complaints, can do.
> 
> >> +       if (ret)
> >> +               goto out_add;
> >> +
> >> +       /* Queue this switch after all other activity */
> > 
> > Only needs to be after the target ctx.
> 
> True, so find just the timeline belonging to target context. Some 
> backpointers would be needed to find it. Or a walk and compare against 
> target ce->ring->timeline->fence_context. Sounds like more than the 
> latter isn't justified for this use case.

Right, we should be able to use ce->ring->timeline.
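
Concretely, the narrower dependency could be sketched along these lines (illustrative, not compiled; to_intel_context() and last_request_on_engine() are assumed to be the helpers of this era of the driver, replacing the active_rings walk in the quoted hunk):

```c
/*
 * Sketch: order the reconfiguration request after the target context
 * only, instead of walking every timeline on i915->gt.active_rings.
 */
struct intel_context *ce = to_intel_context(ctx, engine);
struct i915_request *prev;

prev = last_request_on_engine(ce->ring->timeline, engine);
if (prev)
	i915_sw_fence_await_sw_fence_gfp(&rq->submit,
					 &prev->submit,
					 I915_FENCE_GFP);
```

The effect is that unrelated contexts are no longer serialised behind the reconfiguration, only the target context's own timeline is.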

> >> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> >> +               struct i915_request *prev;
> >> +
> >> +               prev = last_request_on_engine(ring->timeline, engine);
> > 
> > As constructed above you need target-engine + RCS.
> 
> Target engine is always RCS. Looks like the engine param to this 
> function is pointless.

Always always? At the beginning the speculation was that the other
engines were going to get their own subslice parameters. I guess that
was on the same design sheet as the VCS commands...

> >> +               if (prev)
> >> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> >> +                                                        &prev->submit,
> >> +                                                        I915_FENCE_GFP);
> >> +       }
> >> +
> >> +       i915_gem_set_global_barrier(i915, rq);
> > 
> > This is just for a link from ctx-engine to this rq. Overkill much?
> > Presumably this stems from using the wrong engine.
> 
> AFAIU this is to ensure target context cannot get ahead of this request. 
> Without it could overtake and then there is no guarantee execbufs 
> following set param will be with new SSEU configuration.

Right, but we only need a fence on the target context to this request.
The existing way would be to queue an empty request on the target with
the await on this rq, but if you feel that is one request too many we
can use a timeline barrier. Or if that is too narrow, an engine barrier.
Over time, I can see that we may want all 3 barrier levels. The cost
isn't too onerous.

> >> +
> >> +out_add:
> >> +       i915_request_add(rq);
> > 
> > And I'd still recommend not using indirect access if we can apply the
> > changes immediately.
> 
> Direct (CPU) access means blocking in set param until the context is 
> idle.

No, it doesn't. It is just a choice of whether you use a request to a
pinned context, or direct access if idle (literally ce->pin_count).
Direct access is going to be near zero overhead.
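
The split being suggested might be sketched roughly as follows. This is illustrative only and not compiled: reconfigure_sseu_via_request() is a hypothetical stand-in for the request-based path from the patch, gen8_make_rpcs() is assumed to take the sseu to encode, and the LRC_STATE_PN / CTX_R_PWR_CLK_STATE offsets follow intel_lrc.c of the time:

```c
/*
 * Sketch of the "direct if idle, request if pinned" split.  If the
 * context is not pinned its image cannot be in use by the GPU, so we
 * can patch CTX_R_PWR_CLK_STATE with the CPU; otherwise fall back to
 * the request-based update.
 */
static int set_sseu(struct i915_gem_context *ctx,
		    struct intel_engine_cs *engine,
		    struct intel_sseu sseu)
{
	struct intel_context *ce = to_intel_context(ctx, engine);

	if (!ce->pin_count) {
		if (ce->state) {
			void *vaddr;
			u32 *regs;

			vaddr = i915_gem_object_pin_map(ce->state->obj,
							I915_MAP_WB);
			if (IS_ERR(vaddr))
				return PTR_ERR(vaddr);

			regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
			regs[CTX_R_PWR_CLK_STATE + 1] =
				gen8_make_rpcs(ctx->i915, &sseu);

			i915_gem_object_unpin_map(ce->state->obj);
		}

		return 0; /* unpinned contexts pick it up in init_reg_state */
	}

	return reconfigure_sseu_via_request(ctx, engine, sseu);
}
```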
-Chris


* ✗ Fi.CI.IGT: failure for Per context dynamic (sub)slice power-gating
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  2018-08-14 15:32 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-08-14 20:22 ` Patchwork
  12 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-08-14 20:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating
URL   : https://patchwork.freedesktop.org/series/48194/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4666_full -> Patchwork_9941_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9941_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9941_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9941_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@mock_vma:
      shard-apl:          PASS -> DMESG-FAIL +1
      shard-snb:          PASS -> DMESG-FAIL

    igt@gem_ctx_param@invalid-param-get:
      shard-apl:          PASS -> FAIL
      shard-glk:          PASS -> FAIL
      shard-hsw:          PASS -> FAIL
      shard-kbl:          PASS -> FAIL

    igt@gem_ctx_param@invalid-param-set:
      shard-kbl:          PASS -> DMESG-FAIL +1
      shard-hsw:          PASS -> DMESG-FAIL +1
      shard-snb:          PASS -> FAIL +1
      shard-glk:          PASS -> DMESG-FAIL +1

    
    ==== Warnings ====

    igt@drv_selftest@live_evict:
      shard-snb:          PASS -> SKIP +17

    igt@drv_selftest@live_execlists:
      shard-hsw:          PASS -> SKIP +17

    igt@drv_selftest@live_objects:
      shard-glk:          PASS -> SKIP +17

    igt@drv_selftest@live_requests:
      shard-kbl:          PASS -> SKIP +17

    igt@drv_selftest@live_workarounds:
      shard-apl:          PASS -> SKIP +17

    igt@perf_pmu@rc6:
      shard-kbl:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9941_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_cursor_legacy@cursor-vs-flip-legacy:
      shard-hsw:          PASS -> FAIL (fdo#103355)

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-apl:          PASS -> FAIL (fdo#105363, fdo#102887)

    
    ==== Possible fixes ====

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103355 https://bugs.freedesktop.org/show_bug.cgi?id=103355
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4666 -> Patchwork_9941

  CI_DRM_4666: 26f5eeef80e4332958ea855e90a4d015a9481e3f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4594: b0263e5d0563a81a42cf66e7d3b84662d3222862 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9941: 491e75be6beffef5d6503f91b4ce12d1fc3d7289 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9941/shards.html


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 18:53       ` Chris Wilson
@ 2018-08-15  9:12         ` Tvrtko Ursulin
  0 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-15  9:12 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 19:53, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 19:44:09)
>>
>> On 14/08/2018 15:59, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> [commit message, v2-v10 changelog, tags and diffstat snipped]
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index 8a12984e7495..6d6220634e9e 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>>           return 0;
>>>>    }
>>>>    
>>>> +static int
>>>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>>>> +                                 struct intel_engine_cs *engine,
>>>> +                                 struct intel_sseu sseu)
>>>> +{
>>>> +       struct drm_i915_private *i915 = ctx->i915;
>>>> +       struct i915_request *rq;
>>>> +       struct intel_ring *ring;
>>>> +       int ret;
>>>> +
>>>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>>>> +
>>>> +       /* Submitting requests etc needs the hw awake. */
>>>> +       intel_runtime_pm_get(i915);
>>>> +
>>>> +       i915_retire_requests(i915);
>>>
>>> ?
>>
>> I wondered myself but did not make myself dig through all the history of
>> the series. Cannot think that it does anything useful in the current design.
>>
>>>> +
>>>> +       /* Now use the RCS to actually reconfigure. */
>>>> +       engine = i915->engine[RCS];
>>>
>>> ? Modifying registers stored in another engine's context image.
>>
>> Well, I was hoping design was kind of agreed by now.
> 
> :-p
> 
> This wasn't about using requests per-se, but raising the question of why
> use rcs to adjust the vcs context image. If we used vcs, the
> question of serialisation is then only on that engine.

It only ever modifies the RCS context image, since that is the only 
engine on which the feature is supported AFAIU. And it does so via the 
RCS, which again AFAIU ensures we have not only received the user 
interrupt for the last previous rq on the timeline, but that it has been 
fully context saved as well.

In light of this, perhaps the uAPI allowing set param on any engine is 
incorrect for now, and it should instead return -ENODEV for !RCS. It is 
misleading that SSEU config on other engines will succeed, when in fact 
the RCS context image is modified and the SSEU configuration only 
applies if/when the RCS context runs.

So now I am thinking the uAPI is fine to support other engines, to be 
future proof, but unless I am missing something it should really return 
-ENODEV for !RCS for now.

>>>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>>>> +       if (IS_ERR(rq)) {
>>>> +               ret = PTR_ERR(rq);
>>>> +               goto out_put;
>>>> +       }
>>>> +
>>>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
>>>
>>> It's just an LRI, I'd rather we do it directly unless there's evidence
>>> that there will be an explicit rpcs config instruction in future. It
>>> just doesn't seem general enough.
>>
>> No complaints, can do.
>>
>>>> +       if (ret)
>>>> +               goto out_add;
>>>> +
>>>> +       /* Queue this switch after all other activity */
>>>
>>> Only needs to be after the target ctx.
>>
>> True, so find just the timeline belonging to target context. Some
>> backpointers would be needed to find it. Or a walk and compare against
>> target ce->ring->timeline->fence_context. Sounds like more than the
>> latter isn't justified for this use case.
> 
> Right, we should be able to use ce->ring->timeline.
> 
>>>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>>>> +               struct i915_request *prev;
>>>> +
>>>> +               prev = last_request_on_engine(ring->timeline, engine);
>>>
>>> As constructed above you need target-engine + RCS.
>>
>> Target engine is always RCS. Looks like the engine param to this
>> function is pointless.
> 
> Always always? At the beginning the speculation was that the other
> engines were going to get their own subslice parameters. I guess that
> was on the same design sheet as the VCS commands...

See above.

> 
>>>> +               if (prev)
>>>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>>>> +                                                        &prev->submit,
>>>> +                                                        I915_FENCE_GFP);
>>>> +       }
>>>> +
>>>> +       i915_gem_set_global_barrier(i915, rq);
>>>
>>> This is just for a link from ctx-engine to this rq. Overkill much?
>>> Presumably this stems from using the wrong engine.
>>
>> AFAIU this is to ensure target context cannot get ahead of this request.
>> Without it could overtake and then there is no guarantee execbufs
>> following set param will be with new SSEU configuration.
> 
> Right, but we only need a fence on the target context to this request.
> The existing way would be to queue an empty request on the target with
> the await on this rq, but if you feel that is one request too many we
> can use a timeline barrier. Or if that is too narrow, an engine barrier.
> Over time, I can see that we may want all 3 barrier levels. The cost
> isn't too onerous.

Not sure exactly what you are thinking here but I think we do need a 
global barrier, or at least a barrier local to the RCS engine.

Since the context update is done from the kernel context, we have to 
ensure new requests against the target context cannot overtake the 
context update request.

Perhaps it would be sufficient to set the highest, non user accessible 
priority on the update request? A barrier sounds more future proof and 
correct from a design point of view though.

> 
>>>> +
>>>> +out_add:
>>>> +       i915_request_add(rq);
>>>
>>> And I'd still recommend not using indirect access if we can apply the
>>> changes immediately.
>>
>> Direct (CPU) access means blocking in set param until the context is
>> idle.
> 
> No, it doesn't. It is just a choice of whether you use a request to a
> pinned context, or direct access if idle (literally ce->pin_count).
> Direct access is going to be near zero overhead.

You cut all the rest of my comments here so it is unclear what you meant.

It was mostly about avoiding multiple paths to do the same thing. As I 
see it, we cannot only have the CPU update path - since it does imply 
idling the GPU. If you disagree with that please explain.

And if we do need the SDM path anyway, the CPU update path could be 
added later. I agree it would bring a small advantage of not having to 
add a global barrier *if* the target context is idle *and* RCS is busy 
with other contexts.

Regards,

Tvrtko


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:22   ` Chris Wilson
@ 2018-08-15 11:51     ` Tvrtko Ursulin
  2018-08-15 11:56       ` Chris Wilson
  0 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-15 11:51 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 16:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                                 struct intel_engine_cs *engine,
>> +                                 struct intel_sseu sseu)
>> +{
>> +       struct drm_i915_private *i915 = ctx->i915;
>> +       struct i915_request *rq;
>> +       struct intel_ring *ring;
>> +       int ret;
>> +
>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>> +
>> +       /* Submitting requests etc needs the hw awake. */
>> +       intel_runtime_pm_get(i915);
>> +
>> +       i915_retire_requests(i915);
>> +
>> +       /* Now use the RCS to actually reconfigure. */
>> +       engine = i915->engine[RCS];
>> +
>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>> +       if (IS_ERR(rq)) {
>> +               ret = PTR_ERR(rq);
>> +               goto out_put;
>> +       }
>> +
>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
>> +       if (ret)
>> +               goto out_add;
>> +
>> +       /* Queue this switch after all other activity */
>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>> +               struct i915_request *prev;
>> +
>> +               prev = last_request_on_engine(ring->timeline, engine);
>> +               if (prev)
>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                                                        &prev->submit,
>> +                                                        I915_FENCE_GFP);
>> +       }
>> +
>> +       i915_gem_set_global_barrier(i915, rq);
>> +
>> +out_add:
>> +       i915_request_add(rq);
>> +out_put:
>> +       intel_runtime_pm_put(i915);
>> +
>> +       return ret;
> 
> Looks like we should be able to hook this up to a selftest to confirm
> the modification does land in the target context image, and a SRM to
> confirm it loaded.

Lionel wrote an IGT which reads it back via SRM so that should be 
covered. I will be posting the IGT counterpart at some point as well, 
ideally when the agreement on the i915 side is there.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-15 11:51     ` Tvrtko Ursulin
@ 2018-08-15 11:56       ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-08-15 11:56 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-15 12:51:28)
> 
> On 14/08/2018 16:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> > Looks like we should be able to hook this up to a selftest to confirm
> > the modification does land in the target context image, and a SRM to
> > confirm it loaded.
> 
> Lionel wrote an IGT which reads it back via SRM so that should be 
> covered. I will be posting the IGT counterpart at some point as well, 
> ideally when the agreement on the i915 side is there.

Wouldn't you rather have both? I know I would :-p

The emphasis is slightly different: IGT wants to cover the ABI
behaviour, while selftests look towards the edge cases that are hard to
reach otherwise. But even a trivial selftest would let us know the
series functions as intended.
-Chris

* Re: [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
  2018-08-14 14:59   ` Lionel Landwerlin
@ 2018-08-15 11:58     ` Tvrtko Ursulin
  2018-08-16  7:31       ` Tvrtko Ursulin
  0 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-15 11:58 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, Intel-gfx


On 14/08/2018 15:59, Lionel Landwerlin wrote:
> Hey Tvrtko,
> 
> Thanks for taking over this series.
> 
> I've been talking to developers using the i915/perf interface and from 
> their point of view, they expect the system to be in a stable 
> configuration when doing measurements.
> 
> One issue with this patch on Gen11 is that it will lock the system in a 
> configuration that isn't workable for media workloads (all subslices 
> enabled).
> So I think we should set the value for the locked configuration per 
> generation (gen8-10: all slices/subslices, gen11: only subslices that 
> contain VME samplers) so that we always get a functional configuration 
> for all workloads.
> Could we want to select that configuration when opening perf?

This would be via i915_perf.c/gen8_configure_all_contexts?

Sounds like an unfortunate but workable compromise. As long as there 
doesn't appear another asymmetric slice feature in the future, which 
doesn't align with VME. Like one workload wants slices 0 & 2, another 
wants 1 & 3, or whatever. If that happens then I don't know what we do 
apart from locking out perf/OA.

> Another question is how do we expose the selected value to the user. But 
> that can be solved in a different series.

SSEU get_param would cut it? Although it would perhaps be unexpected 
to get different results depending on whether perf/OA is active or not... 
Hm... export device and available bitmasks via get param? Device bitmask 
would be fixed and available would change depending on whether perf/OA 
is active or not.
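One possible shape of that split, sketched as plain C from the
userspace point of view. The function names and mask values here are
hypothetical, made up for illustration; none of this is existing i915
uAPI:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the "two bitmasks" idea: a fixed device subslice mask plus
 * an "available" mask which shrinks while perf/OA holds the powergating
 * configuration locked. Mask values and names are hypothetical.
 */

#define DEVICE_SUBSLICE_MASK 0xffu /* all subslices fused in on this part */
#define VME_SUBSLICE_MASK    0x03u /* subslices with VME samplers (made up) */

/* Fixed physical property of the device; never changes at runtime. */
static uint32_t query_device_subslice_mask(void)
{
	return DEVICE_SUBSLICE_MASK;
}

/* What a context could actually select right now. */
static uint32_t query_available_subslice_mask(bool oa_active)
{
	/* While perf/OA is active, only the locked configuration works. */
	return oa_active ? VME_SUBSLICE_MASK : DEVICE_SUBSLICE_MASK;
}
```

The device mask would be the fixed value reported unconditionally,
while the available mask is the one that would flip when an OA stream
opens or closes.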

Regards,

Tvrtko

> Cheers,
> 
> -
> Lionel
> 
> On 14/08/18 15:40, Tvrtko Ursulin wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shut down slices/subslices, we might lose the NOA
>> configurations written in the NOA muxes.
>>
>> One possible solution to this problem is to reprogram the NOA muxes
>> when we switch to a new context. We initially tried this in the
>> workaround batchbuffer but some concerns were raised about the cost
>> of reprogramming at every context switch. This solution is also not
>> without consequences from the userspace point of view. Reprogramming
>> of the muxes can only happen once the powergating configuration has
>> changed (which happens after context switch). This means for a window
>> of time during the recording, counters recorded by the OA unit might
>> be invalid. This requires userspace dealing with OA reports to discard
>> the invalid values.
>>
>> Minimizing the reprogramming could be implemented by tracking the
>> last programmed configuration somewhere in GGTT and using MI_PREDICATE
>> to discard some of the programming commands, but the command streamer
>> would still have to parse all the MI_LRI instructions in the
>> workaround batchbuffer.
>>
>> Another solution, which this change implements, is to simply disregard
>> the user requested configuration for the period of time when i915/perf
>> is active. There is no known issue with this apart from a performance
>> penalty for some media workloads that benefit from running on a
>> partially powergated GPU. We already prevent RC6 from affecting the
>> programming so it doesn't sound completely unreasonable to hold on
>> powergating for the same reason.
>>
>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>
>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      More to_intel_context() (Tvrtko)
>>      s/dev_priv/i915/ (Tvrtko)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
>>   drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>   4 files changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d6049c3f911b..5c12d2676435 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3851,4 +3851,19 @@ static inline int 
>> intel_hws_csb_write_index(struct drm_i915_private *i915)
>>           return I915_HWS_CSB_WRITE_INDEX;
>>   }
>> +static inline struct intel_sseu
>> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
>> +              struct intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *i915 = engine->i915;
>> +
>> +    /*
>> +     * If i915/perf is active, we want a stable powergating 
>> configuration
>> +     * on the system. The most natural configuration to take in that 
>> case
>> +     * is the default (i.e maximum the hardware can do).
>> +     */
>> +    return i915->perf.oa.exclusive_stream ?
>> +        intel_device_default_sseu(i915) : sseu;
>> +}
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index ccb20230df2c..c2fc2399e0ed 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct 
>> drm_i915_private *dev_priv)
>>    */
>>   static void gen8_update_reg_state_unlocked(struct i915_gem_context 
>> *ctx,
>>                          u32 *reg_state,
>> -                       const struct i915_oa_config *oa_config)
>> +                       const struct i915_oa_config *oa_config,
>> +                       struct intel_sseu sseu)
>>   {
>>       struct drm_i915_private *dev_priv = ctx->i915;
>>       u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>> @@ -1677,6 +1678,9 @@ static void 
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>           CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>       }
>> +
>> +    CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> +        gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>>   }
>>   /*
>> @@ -1807,6 +1811,7 @@ static int 
>> gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>>   static int gen8_configure_all_contexts(struct drm_i915_private 
>> *dev_priv,
>>                          const struct i915_oa_config *oa_config)
>>   {
>> +    struct intel_sseu default_sseu = 
>> intel_device_default_sseu(dev_priv);
>>       struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>       struct i915_gem_context *ctx;
>>       int ret;
>> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct 
>> drm_i915_private *dev_priv,
>>           ce->state->obj->mm.dirty = true;
>>           regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>> -        gen8_update_reg_state_unlocked(ctx, regs, oa_config);
>> +        gen8_update_reg_state_unlocked(ctx, regs, oa_config,
>> +                           oa_config ? default_sseu : ce->sseu);
>>           i915_gem_object_unpin_map(ce->state->obj);
>>       }
>> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct 
>> intel_engine_cs *engine,
>>                   struct i915_gem_context *ctx,
>>                   u32 *reg_state)
>>   {
>> +    struct drm_i915_private *i915 = engine->i915;
>>       struct i915_perf_stream *stream;
>>       if (engine->id != RCS)
>>           return;
>> -    stream = engine->i915->perf.oa.exclusive_stream;
>> -    if (stream)
>> -        gen8_update_reg_state_unlocked(ctx, reg_state, 
>> stream->oa_config);
>> +    stream = i915->perf.oa.exclusive_stream;
>> +    if (stream) {
>> +        struct intel_sseu default_sseu =
>> +            intel_device_default_sseu(i915);
>> +
>> +        gen8_update_reg_state_unlocked(ctx, reg_state,
>> +                           stream->oa_config,
>> +                           default_sseu);
>> +    }
>>   }
>>   /**
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 7b2f2d6bb057..8a2997be7ef7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs 
>> *engine)
>>       return logical_ring_init(engine);
>>   }
>> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
>> -             struct intel_sseu ctx_sseu)
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu)
>>   {
>>       u32 rpcs = 0;
>> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>>       }
>>       if (rcs) {
>> +        struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
>> +
>>           regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>           CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> -            make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> -                  to_intel_context(ctx, engine)->sseu));
>> +            gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> +                       intel_engine_prepare_sseu(engine,
>> +                                 sseu)));
>>           i915_oa_init_reg_state(engine, ctx, regs);
>>       }
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h 
>> b/drivers/gpu/drm/i915/intel_lrc.h
>> index f5a5502ecf70..bf3acdc3d0af 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct 
>> drm_i915_private *dev_priv);
>>   void intel_execlists_set_default_submission(struct intel_engine_cs 
>> *engine);
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu);
>> +
>>   #endif /* _INTEL_LRC_H_ */
> 
> 

* Re: [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
  2018-08-15 11:58     ` Tvrtko Ursulin
@ 2018-08-16  7:31       ` Tvrtko Ursulin
  2018-08-16  9:56         ` Lionel Landwerlin
  0 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-08-16  7:31 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, Intel-gfx


On 15/08/2018 12:58, Tvrtko Ursulin wrote:
> 
> On 14/08/2018 15:59, Lionel Landwerlin wrote:
>> Hey Tvrtko,
>>
>> Thanks for taking over this series.
>>
>> I've been talking to developers using the i915/perf interface and from 
>> their point of view, they expect the system to be in a stable 
>> configuration when doing measurements.
>>
>> One issue with this patch on Gen11 is that it will lock the system in 
>> a configuration that isn't workable for media workloads (all subslices 
>> enabled).
>> So I think we should set the value for the locked configuration per 
>> generation (gen8-10: all slices/subslices, gen11: only subslices that 
>> contain VME samplers) so that we always get a functional 
>> configurations for all workloads.
>> Could we want to select that configuration when opening perf?
> 
> This would be via i915_perf.c/gen8_configure_all_contexts?
> 
> Sounds like an unfortunate but workable compromise. As long as there 
> doesn't appear another asymmetric slice feature in the future, which 
> doesn't align with VME. Like one workloads wants slices 0 & 2, another 
> wants 1 & 3, or whatever. If that happens then I don't know what we do 
> apart from locking out perf/OA.
> 
>> Another question is how do we expose the selected value to the user. 
>> But that can be solved in a different series.
> 
> SSEU get_param would cut it? Although it would be perhaps be unexpected 
> to get different results depending on whether perf/OA is active or not.. 
> Hm... export device and available bitmasks via get param? Device bitmask 
> would be fixed and available would change depending on whether perf/OA 
> is active or not.

A big downside to this is that observing something via OA would cause a 
slowdown of the whole system - since the SSEU config would be locked to 
a subset of slices on Gen11. :( Regardless of the details... say even 
if we choose to lock only if there is one non-default SSEU context, 
that still locks all contexts whilst perf/OA is active.

That's quite bad I think. But the alternative is to lock out perf/OA 
while non-default SSEU contexts are present. Which is also very bad. :(

Overall I don't see a solution. The latter is maybe better by being 
more explicit? Who are the perf/OA users and what would they prefer?

Regards,

Tvrtko

* Re: [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
  2018-08-16  7:31       ` Tvrtko Ursulin
@ 2018-08-16  9:56         ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2018-08-16  9:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

On 16/08/18 08:31, Tvrtko Ursulin wrote:
>
> On 15/08/2018 12:58, Tvrtko Ursulin wrote:
>>
>> On 14/08/2018 15:59, Lionel Landwerlin wrote:
>>> Hey Tvrtko,
>>>
>>> Thanks for taking over this series.
>>>
>>> I've been talking to developers using the i915/perf interface and 
>>> from their point of view, they expect the system to be in a stable 
>>> configuration when doing measurements.
>>>
>>> One issue with this patch on Gen11 is that it will lock the system 
>>> in a configuration that isn't workable for media workloads (all 
>>> subslices enabled).
>>> So I think we should set the value for the locked configuration per 
>>> generation (gen8-10: all slices/subslices, gen11: only subslices 
>>> that contain VME samplers) so that we always get a functional 
>>> configurations for all workloads.
>>> Could we want to select that configuration when opening perf?
>>
>> This would be via i915_perf.c/gen8_configure_all_contexts?
>>
>> Sounds like an unfortunate but workable compromise. As long as there 
>> doesn't appear another asymmetric slice feature in the future, which 
>> doesn't align with VME. Like one workloads wants slices 0 & 2, 
>> another wants 1 & 3, or whatever. If that happens then I don't know 
>> what we do apart from locking out perf/OA.
>>
>>> Another question is how do we expose the selected value to the user. 
>>> But that can be solved in a different series.
>>
>> SSEU get_param would cut it? Although it would be perhaps be 
>> unexpected to get different results depending on whether perf/OA is 
>> active or not.. Hm... export device and available bitmasks via get 
>> param? Device bitmask would be fixed and available would change 
>> depending on whether perf/OA is active or not.

I would avoid changing the get_param. This is supposed to report the 
physical device's properties.
Maybe add an ioctl on the i915_perf fd to report that configuration?

>
> Big downside to this is that observing something via OA would cause 
> slow down of the whole system - since the SSEU config would be locked 
> to a subset of slices on Gen11. :( Regardless or not of the details.. 
> say even if we choose to lock only if there is one non-default SSEU 
> context, that still locks all whilst perf/OA is active.
>
> That's quite bad I think. But the alternative is to lock out perf/OA 
> while non-default SSEU contexts are present. Which is also very bad. :(
>
> Overall I don't see a solution. Latter is maybe better by being more 
> explicit? Who are the perf/OA users and what would they prefer?

The other solution I could think of would be to prevent scheduling 
contexts that are not compatible with the OA configuration.
That doesn't sound great either...

>
> Regards,
>
> Tvrtko
>


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08 20:56     ` Chris Wilson
@ 2018-05-09 15:35       ` Lionel Landwerlin
  0 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 15:35 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx

On 08/05/18 21:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-03 18:18:43)
>> On 25/04/2018 12:45, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and forcing
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices they are not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>        struct drm_i915_gem_context_param arg;
>>>        struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
>>>
>>>        memset(&arg, 0, sizeof(arg));
>>>        arg.ctx_id = ctx;
>>>        arg.param = I915_CONTEXT_PARAM_SSEU;
>>>        arg.value = (uintptr_t) &sseu;
>>>        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>                sseu.packed.subslice_mask = 0;
>>>
>>>                drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>        }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>>>    drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>>>    drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>>>    include/uapi/drm/i915_drm.h             | 28 +++++++++
>>>    4 files changed, 168 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index bdf050beeb94..b97ddcf47514 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>        return 0;
>>>    }
>>>    
>>> +static struct i915_gem_context_sseu
>>> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>>> +                                  const struct drm_i915_gem_context_param_sseu *user_sseu)
>>> +{
>>> +     struct i915_gem_context_sseu value = {
>>> +             .slice_mask = user_sseu->packed.slice_mask == 0 ?
>>> +                           sseu->slice_mask :
>>> +                           (user_sseu->packed.slice_mask & sseu->slice_mask),
>>> +             .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
>>> +                              sseu->subslice_mask[0] :
>>> +                              (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
>>> +             .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
>>> +                                         sseu->max_eus_per_subslice),
>>> +             .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
>>> +                                         sseu->max_eus_per_subslice),
>>> +     };
>>> +
>>> +     return value;
>>> +}
>>> +
>>>    int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>                                    struct drm_file *file)
>>>    {
>>> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>        case I915_CONTEXT_PARAM_PRIORITY:
>>>                args->value = ctx->sched.priority;
>>>                break;
>>> +     case I915_CONTEXT_PARAM_SSEU: {
>>> +             struct drm_i915_gem_context_param_sseu param_sseu;
>>> +             struct intel_engine_cs *engine;
>>> +
>>> +             if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>> +                                sizeof(param_sseu))) {
>>> +                     ret = -EFAULT;
>>> +                     break;
>>> +             }
>>> +
>>> +             engine = i915_gem_engine_from_flags(to_i915(dev), file,
>>> +                                                 param_sseu.flags);
>>> +             if (!engine) {
>>> +                     ret = -EINVAL;
>>> +                     break;
>>> +             }
>>> +
>>> +             param_sseu.packed.slice_mask =
>>> +                     ctx->engine[engine->id].sseu.slice_mask;
>>> +             param_sseu.packed.subslice_mask =
>>> +                     ctx->engine[engine->id].sseu.subslice_mask;
>>> +             param_sseu.packed.min_eus_per_subslice =
>>> +                     ctx->engine[engine->id].sseu.min_eus_per_subslice;
>>> +             param_sseu.packed.max_eus_per_subslice =
>>> +                     ctx->engine[engine->id].sseu.max_eus_per_subslice;
>>> +
>>> +             if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                              sizeof(param_sseu)))
>>> +                     ret = -EFAULT;
>>> +             break;
>>> +     }
>>>        default:
>>>                ret = -EINVAL;
>>>                break;
>>> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                else
>>>                        i915_gem_context_clear_bannable(ctx);
>>>                break;
>>> -
>>>        case I915_CONTEXT_PARAM_PRIORITY:
>>>                {
>>>                        s64 priority = args->value;
>>> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                                ctx->sched.priority = priority;
>>>                }
>>>                break;
>>> +     case I915_CONTEXT_PARAM_SSEU:
>>> +             if (args->size)
>>> +                     ret = -EINVAL;
>>> +             else if (!HAS_EXECLISTS(ctx->i915))
>>> +                     ret = -ENODEV;
>>> +             else {
>>> +                     struct drm_i915_private *dev_priv = to_i915(dev);
>>> +                     struct drm_i915_gem_context_param_sseu user_sseu;
>>> +                     struct i915_gem_context_sseu ctx_sseu;
>>> +                     struct intel_engine_cs *engine;
>>> +
>>> +                     if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>>> +                                        sizeof(user_sseu))) {
>>> +                             ret = -EFAULT;
>>> +                             break;
>>> +                     }
>>> +
>>> +                     engine = i915_gem_engine_from_flags(dev_priv, file,
>>> +                                                         user_sseu.flags);
>>> +                     if (!engine) {
>>> +                             ret = -EINVAL;
>>> +                             break;
>>> +                     }
>>> +
>>> +                     ctx_sseu =
>>> +                             i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>> +                                                                  &user_sseu);
>>>    
>>> +                     ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
>>> +             }
>>> +             break;
>>>        default:
>>>                ret = -EINVAL;
>>>                break;
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index dca17ef24de5..9caddcb1f234 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>>>        }
>>>    }
>>>    
>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>> +                           struct intel_engine_cs *engine,
>>> +                           struct i915_gem_context_sseu *sseu)
>>> +{
>>> +     struct drm_i915_private *dev_priv = ctx->i915;
>>> +     struct intel_context *ce;
>>> +     enum intel_engine_id id;
>>> +     int ret;
>>> +
>>> +     lockdep_assert_held(&dev_priv->drm.struct_mutex);
>>> +
>>> +     if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>>> +             return 0;
>>> +
>>> +     /*
>>> +      * We can only program this on render ring.
>>> +      */
>>> +     ce = &ctx->engine[RCS];
>>> +
>>> +     if (ce->pin_count) { /* Assume that the context is active! */
>>> +             ret = i915_gem_switch_to_kernel_context(dev_priv);
>>> +             if (ret)
>>> +                     return ret;
>>> +
>>> +             ret = i915_gem_wait_for_idle(dev_priv,
>>> +                                          I915_WAIT_INTERRUPTIBLE |
>>> +                                          I915_WAIT_LOCKED);
>> Could we consider the alternative of only allowing this to be configured
>> on context create? That way we would not need to idle the GPU every time
>> userspace decides to fiddle with it. It is unprivileged so quite an easy
>> way for a random app to ruin GPU performance for everyone.
> I think we can do dynamic reconfiguration of the context using LRI. And
> if we can't do so from within the context, we can use SDM from another.
> -Chris
>
Documentation says we can't LRI. So that leaves SDM from another context.
Should we use the kernel one?

Thanks,

-
Lionel
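For reference, the clamping done by i915_gem_context_sseu_from_user_sseu()
in the quoted patch can be restated as standalone C. The struct, helper
name and mask widths below are simplified for illustration, but the
semantics mirror the quoted function: a zero mask from userspace means
"use the full device mask", a non-zero mask is intersected with what the
device has, and the EU counts are clamped to the hardware maximum.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's sseu structures. */
struct sseu_masks {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Mirror of the clamping in i915_gem_context_sseu_from_user_sseu(). */
static struct sseu_masks
clamp_user_sseu(const struct sseu_masks *dev, const struct sseu_masks *user)
{
	struct sseu_masks out = {
		/* Zero means "default": take the full device mask. */
		.slice_mask = user->slice_mask == 0 ?
			      dev->slice_mask :
			      (uint8_t)(user->slice_mask & dev->slice_mask),
		.subslice_mask = user->subslice_mask == 0 ?
				 dev->subslice_mask :
				 (uint8_t)(user->subslice_mask &
					   dev->subslice_mask),
		/* EU counts can never exceed the hardware maximum. */
		.min_eus_per_subslice = min_u8(user->min_eus_per_subslice,
					       dev->max_eus_per_subslice),
		.max_eus_per_subslice = min_u8(user->max_eus_per_subslice,
					       dev->max_eus_per_subslice),
	};

	return out;
}
```

Note that, as in the patch, a user mask selecting only fused-off bits
quietly clamps to an empty mask rather than returning an error.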

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 17:18   ` Tvrtko Ursulin
  2018-05-04 16:25     ` Lionel Landwerlin
@ 2018-05-08 20:56     ` Chris Wilson
  2018-05-09 15:35       ` Lionel Landwerlin
  1 sibling, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-05-08 20:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-03 18:18:43)
> 
> On 25/04/2018 12:45, Lionel Landwerlin wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > We want to allow userspace to reconfigure the subslice configuration for
> > its own use case. To do so, we expose a context parameter to allow
> > adjustment of the RPCS register stored within the context image (and
> > currently not accessible via LRI). If the context is adjusted before
> > first use, the adjustment is for "free"; otherwise if the context is
> > active we flush the context off the GPU (stalling all users) and force
> > the GPU to save the context to memory where we can modify it and so
> > ensure that the register is reloaded on next execution.
> > 
> > The overhead of managing additional EU subslices can be significant,
> > especially in multi-context workloads. Non-GPGPU contexts should
> > preferably disable the subslices they are not using, and others should
> > fine-tune the number to match their workload.
> > 
> > We expose complete control over the RPCS register, allowing
> > configuration of slice/subslice, via masks packed into a u64 for
> > simplicity. For example,
> > 
> >       struct drm_i915_gem_context_param arg;
> >       struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> > 
> >       memset(&arg, 0, sizeof(arg));
> >       arg.ctx_id = ctx;
> >       arg.param = I915_CONTEXT_PARAM_SSEU;
> >       arg.value = (uintptr_t) &sseu;
> >       if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >               sseu.packed.subslice_mask = 0;
> > 
> >               drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >       }
> > 
> > could be used to disable all subslices where supported.
> > 
> > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> > 
> > v3: Add ability to program this per engine (Chris)
> > 
> > v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> > CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > CC: Zhipeng Gong <zhipeng.gong@intel.com>
> > CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
> >   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
> >   include/uapi/drm/i915_drm.h             | 28 +++++++++
> >   4 files changed, 168 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index bdf050beeb94..b97ddcf47514 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >       return 0;
> >   }
> >   
> > +static struct i915_gem_context_sseu
> > +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> > +                                  const struct drm_i915_gem_context_param_sseu *user_sseu)
> > +{
> > +     struct i915_gem_context_sseu value = {
> > +             .slice_mask = user_sseu->packed.slice_mask == 0 ?
> > +                           sseu->slice_mask :
> > +                           (user_sseu->packed.slice_mask & sseu->slice_mask),
> > +             .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> > +                              sseu->subslice_mask[0] :
> > +                              (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> > +             .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> > +                                         sseu->max_eus_per_subslice),
> > +             .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> > +                                         sseu->max_eus_per_subslice),
> > +     };
> > +
> > +     return value;
> > +}
> > +
> >   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >                                   struct drm_file *file)
> >   {
> > @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >       case I915_CONTEXT_PARAM_PRIORITY:
> >               args->value = ctx->sched.priority;
> >               break;
> > +     case I915_CONTEXT_PARAM_SSEU: {
> > +             struct drm_i915_gem_context_param_sseu param_sseu;
> > +             struct intel_engine_cs *engine;
> > +
> > +             if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> > +                                sizeof(param_sseu))) {
> > +                     ret = -EFAULT;
> > +                     break;
> > +             }
> > +
> > +             engine = i915_gem_engine_from_flags(to_i915(dev), file,
> > +                                                 param_sseu.flags);
> > +             if (!engine) {
> > +                     ret = -EINVAL;
> > +                     break;
> > +             }
> > +
> > +             param_sseu.packed.slice_mask =
> > +                     ctx->engine[engine->id].sseu.slice_mask;
> > +             param_sseu.packed.subslice_mask =
> > +                     ctx->engine[engine->id].sseu.subslice_mask;
> > +             param_sseu.packed.min_eus_per_subslice =
> > +                     ctx->engine[engine->id].sseu.min_eus_per_subslice;
> > +             param_sseu.packed.max_eus_per_subslice =
> > +                     ctx->engine[engine->id].sseu.max_eus_per_subslice;
> > +
> > +             if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> > +                              sizeof(param_sseu)))
> > +                     ret = -EFAULT;
> > +             break;
> > +     }
> >       default:
> >               ret = -EINVAL;
> >               break;
> > @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >               else
> >                       i915_gem_context_clear_bannable(ctx);
> >               break;
> > -
> >       case I915_CONTEXT_PARAM_PRIORITY:
> >               {
> >                       s64 priority = args->value;
> > @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >                               ctx->sched.priority = priority;
> >               }
> >               break;
> > +     case I915_CONTEXT_PARAM_SSEU:
> > +             if (args->size)
> > +                     ret = -EINVAL;
> > +             else if (!HAS_EXECLISTS(ctx->i915))
> > +                     ret = -ENODEV;
> > +             else {
> > +                     struct drm_i915_private *dev_priv = to_i915(dev);
> > +                     struct drm_i915_gem_context_param_sseu user_sseu;
> > +                     struct i915_gem_context_sseu ctx_sseu;
> > +                     struct intel_engine_cs *engine;
> > +
> > +                     if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> > +                                        sizeof(user_sseu))) {
> > +                             ret = -EFAULT;
> > +                             break;
> > +                     }
> > +
> > +                     engine = i915_gem_engine_from_flags(dev_priv, file,
> > +                                                         user_sseu.flags);
> > +                     if (!engine) {
> > +                             ret = -EINVAL;
> > +                             break;
> > +                     }
> > +
> > +                     ctx_sseu =
> > +                             i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> > +                                                                  &user_sseu);
> >   
> > +                     ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> > +             }
> > +             break;
> >       default:
> >               ret = -EINVAL;
> >               break;
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index dca17ef24de5..9caddcb1f234 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
> >       }
> >   }
> >   
> > +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> > +                           struct intel_engine_cs *engine,
> > +                           struct i915_gem_context_sseu *sseu)
> > +{
> > +     struct drm_i915_private *dev_priv = ctx->i915;
> > +     struct intel_context *ce;
> > +     enum intel_engine_id id;
> > +     int ret;
> > +
> > +     lockdep_assert_held(&dev_priv->drm.struct_mutex);
> > +
> > +     if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> > +             return 0;
> > +
> > +     /*
> > +      * We can only program this on render ring.
> > +      */
> > +     ce = &ctx->engine[RCS];
> > +
> > +     if (ce->pin_count) { /* Assume that the context is active! */
> > +             ret = i915_gem_switch_to_kernel_context(dev_priv);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             ret = i915_gem_wait_for_idle(dev_priv,
> > +                                          I915_WAIT_INTERRUPTIBLE |
> > +                                          I915_WAIT_LOCKED);
> 
> Could we consider the alternative of only allowing this to be configured 
> on context create? That way we would not need to idle the GPU every time 
> userspace decides to fiddle with it. It is unprivileged so quite an easy 
> way for random app to ruin GPU performance for everyone.

I think we can do dynamic reconfiguration of the context using LRI. And
if we can't do so from within the context, we can use SDM from another.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08  8:24         ` Tvrtko Ursulin
@ 2018-05-08 16:00           ` Rogozhkin, Dmitry V
  0 siblings, 0 replies; 44+ messages in thread
From: Rogozhkin, Dmitry V @ 2018-05-08 16:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Landwerlin, Lionel G, intel-gfx

>> The context of the question was whether you need to change the slice configuration for a single GEM context between submitting batch buffers.

When we create a context we know the optimal slice count for it, and this optimal point does not change for the context in unperturbed conditions, i.e. when the context runs alone. However, there are critical use cases where we never run a single context, but instead run a few contexts in parallel, each with its own optimal slice count operating point. As a result, the major question is: what is the cost of a context switch with an associated slice configuration change, powering slice(s) on or off? In the ideal situation, when the cost is 0, we would not need any change of slice configuration for a single GEM context between submitting batch buffers. The problem is that the cost is far from 0, and it is intolerable in the worst case, where there is a switch at each next batch buffer. As a result, we are forced to have some negotiation channel between the different contexts and make them agree on a single slice configuration which will persist for a reasonably long period of time, so that the associated cost is negligible in general. During this period we submit a number of batch buffers before the next reconfiguration attempt. So that is the situation in which we need to reconfigure the slice configuration for a single GEM context between submitting batches.

Dmitry.

-----Original Message-----
From: Tvrtko Ursulin [mailto:tvrtko.ursulin@linux.intel.com] 
Sent: Tuesday, May 8, 2018 1:25 AM
To: Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>; Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>; intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:
>>>  I'm pretty sure Dmitry wants dynamic configurations.
> 
> Yes, I'm afraid we really need dynamic slice configurations for media.

The context of the question was whether you need to change the slice configuration for a single GEM context between submitting batch buffers.

Regards,

Tvrtko


> *From:*Landwerlin, Lionel G
> *Sent:* Friday, May 4, 2018 9:25 AM
> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; 
> intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
> <dmitry.v.rogozhkin@intel.com>
> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
> configuration to userspace
> 
> On 03/05/18 18:18, Tvrtko Ursulin wrote:
> 
>            +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>         +                  struct intel_engine_cs *engine,
>         +                  struct i915_gem_context_sseu *sseu)
>         +{
>         +    struct drm_i915_private *dev_priv = ctx->i915;
>         +    struct intel_context *ce;
>         +    enum intel_engine_id id;
>         +    int ret;
>         +
>         +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>         +
>         +    if (memcmp(sseu, &ctx->engine[engine->id].sseu,
>         sizeof(*sseu)) == 0)
>         +        return 0;
>         +
>         +    /*
>         +     * We can only program this on render ring.
>         +     */
>         +    ce = &ctx->engine[RCS];
>         +
>         +    if (ce->pin_count) { /* Assume that the context is active! */
>         +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>         +        if (ret)
>         +            return ret;
>         +
>         +        ret = i915_gem_wait_for_idle(dev_priv,
>         +                         I915_WAIT_INTERRUPTIBLE |
>         +                         I915_WAIT_LOCKED);
> 
> 
>     Could we consider the alternative of only allowing this to be
>     configured on context create? That way we would not need to idle the
>     GPU every time userspace decides to fiddle with it. It is
>     unprivileged so quite an easy way for random app to ruin GPU
>     performance for everyone.
> 
>     Regards,
> 
>     Tvrtko
> 
> I'm pretty sure Dmitry wants dynamic configurations.
> 
> Dmitry?
> 

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08  4:04       ` Rogozhkin, Dmitry V
@ 2018-05-08  8:24         ` Tvrtko Ursulin
  2018-05-08 16:00           ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08  8:24 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V, Landwerlin, Lionel G, intel-gfx


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:
>>>  I'm pretty sure Dmitry wants dynamic configurations.
> 
> Yes, I'm afraid we really need dynamic slice configurations for media.

The context of the question was whether you need to change the slice 
configuration for a single GEM context between submitting batch buffers.

Regards,

Tvrtko


> *From:*Landwerlin, Lionel G
> *Sent:* Friday, May 4, 2018 9:25 AM
> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; 
> intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
> <dmitry.v.rogozhkin@intel.com>
> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
> configuration to userspace
> 
> On 03/05/18 18:18, Tvrtko Ursulin wrote:
> 
>            +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>         +                  struct intel_engine_cs *engine,
>         +                  struct i915_gem_context_sseu *sseu)
>         +{
>         +    struct drm_i915_private *dev_priv = ctx->i915;
>         +    struct intel_context *ce;
>         +    enum intel_engine_id id;
>         +    int ret;
>         +
>         +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>         +
>         +    if (memcmp(sseu, &ctx->engine[engine->id].sseu,
>         sizeof(*sseu)) == 0)
>         +        return 0;
>         +
>         +    /*
>         +     * We can only program this on render ring.
>         +     */
>         +    ce = &ctx->engine[RCS];
>         +
>         +    if (ce->pin_count) { /* Assume that the context is active! */
>         +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>         +        if (ret)
>         +            return ret;
>         +
>         +        ret = i915_gem_wait_for_idle(dev_priv,
>         +                         I915_WAIT_INTERRUPTIBLE |
>         +                         I915_WAIT_LOCKED);
> 
> 
>     Could we consider the alternative of only allowing this to be
>     configured on context create? That way we would not need to idle the
>     GPU every time userspace decides to fiddle with it. It is
>     unprivileged so quite an easy way for random app to ruin GPU
>     performance for everyone.
> 
>     Regards,
> 
>     Tvrtko
> 
> I'm pretty sure Dmitry wants dynamic configurations.
> 
> Dmitry?
> 

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-04 16:25     ` Lionel Landwerlin
@ 2018-05-08  4:04       ` Rogozhkin, Dmitry V
  2018-05-08  8:24         ` Tvrtko Ursulin
  0 siblings, 1 reply; 44+ messages in thread
From: Rogozhkin, Dmitry V @ 2018-05-08  4:04 UTC (permalink / raw)
  To: Landwerlin, Lionel G, Tvrtko Ursulin, intel-gfx


>> I'm pretty sure Dmitry wants dynamic configurations.

Yes, I'm afraid we really need dynamic slice configurations for media.

From: Landwerlin, Lionel G
Sent: Friday, May 4, 2018 9:25 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

On 03/05/18 18:18, Tvrtko Ursulin wrote:
  +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+                  struct intel_engine_cs *engine,
+                  struct i915_gem_context_sseu *sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct intel_context *ce;
+    enum intel_engine_id id;
+    int ret;
+
+    lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+    if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
+        return 0;
+
+    /*
+     * We can only program this on render ring.
+     */
+    ce = &ctx->engine[RCS];
+
+    if (ce->pin_count) { /* Assume that the context is active! */
+        ret = i915_gem_switch_to_kernel_context(dev_priv);
+        if (ret)
+            return ret;
+
+        ret = i915_gem_wait_for_idle(dev_priv,
+                         I915_WAIT_INTERRUPTIBLE |
+                         I915_WAIT_LOCKED);

Could we consider the alternative of only allowing this to be configured on context create? That way we would not need to idle the GPU every time userspace decides to fiddle with it. It is unprivileged so quite an easy way for random app to ruin GPU performance for everyone.

Regards,

Tvrtko


I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 17:18   ` Tvrtko Ursulin
@ 2018-05-04 16:25     ` Lionel Landwerlin
  2018-05-08  4:04       ` Rogozhkin, Dmitry V
  2018-05-08 20:56     ` Chris Wilson
  1 sibling, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2018-05-04 16:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Rogozhkin, Dmitry V


On 03/05/18 18:18, Tvrtko Ursulin wrote:
>>   +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>> +                  struct intel_engine_cs *engine,
>> +                  struct i915_gem_context_sseu *sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct intel_context *ce;
>> +    enum intel_engine_id id;
>> +    int ret;
>> +
>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>> +
>> +    if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) 
>> == 0)
>> +        return 0;
>> +
>> +    /*
>> +     * We can only program this on render ring.
>> +     */
>> +    ce = &ctx->engine[RCS];
>> +
>> +    if (ce->pin_count) { /* Assume that the context is active! */
>> +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>> +        if (ret)
>> +            return ret;
>> +
>> +        ret = i915_gem_wait_for_idle(dev_priv,
>> +                         I915_WAIT_INTERRUPTIBLE |
>> +                         I915_WAIT_LOCKED);
>
> Could we consider the alternative of only allowing this to be 
> configured on context create? That way we would not need to idle the 
> GPU every time userspace decides to fiddle with it. It is unprivileged 
> so quite an easy way for random app to ruin GPU performance for everyone.
>
> Regards,
>
> Tvrtko 


I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
  2018-04-26 10:00   ` Joonas Lahtinen
@ 2018-05-03 17:18   ` Tvrtko Ursulin
  2018-05-04 16:25     ` Lionel Landwerlin
  2018-05-08 20:56     ` Chris Wilson
  1 sibling, 2 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 17:18 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 25/04/2018 12:45, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> 	struct drm_i915_gem_context_param arg;
> 	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> 
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	arg.value = (uintptr_t) &sseu;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		sseu.packed.subslice_mask = 0;
> 
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>   include/uapi/drm/i915_drm.h             | 28 +++++++++
>   4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index bdf050beeb94..b97ddcf47514 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	return 0;
>   }
>   
> +static struct i915_gem_context_sseu
> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +				     const struct drm_i915_gem_context_param_sseu *user_sseu)
> +{
> +	struct i915_gem_context_sseu value = {
> +		.slice_mask = user_sseu->packed.slice_mask == 0 ?
> +			      sseu->slice_mask :
> +			      (user_sseu->packed.slice_mask & sseu->slice_mask),
> +		.subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> +				 sseu->subslice_mask[0] :
> +				 (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> +		.min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> +					    sseu->max_eus_per_subslice),
> +		.max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> +					    sseu->max_eus_per_subslice),
> +	};
> +
> +	return value;
> +}
> +
>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file)
>   {
> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		args->value = ctx->sched.priority;
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU: {
> +		struct drm_i915_gem_context_param_sseu param_sseu;
> +		struct intel_engine_cs *engine;
> +
> +		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +				   sizeof(param_sseu))) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		engine = i915_gem_engine_from_flags(to_i915(dev), file,
> +						    param_sseu.flags);
> +		if (!engine) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		param_sseu.packed.slice_mask =
> +			ctx->engine[engine->id].sseu.slice_mask;
> +		param_sseu.packed.subslice_mask =
> +			ctx->engine[engine->id].sseu.subslice_mask;
> +		param_sseu.packed.min_eus_per_subslice =
> +			ctx->engine[engine->id].sseu.min_eus_per_subslice;
> +		param_sseu.packed.max_eus_per_subslice =
> +			ctx->engine[engine->id].sseu.max_eus_per_subslice;
> +
> +		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +				 sizeof(param_sseu)))
> +			ret = -EFAULT;
> +		break;
> +	}
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			i915_gem_context_clear_bannable(ctx);
>   		break;
> -
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		{
>   			s64 priority = args->value;
> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				ctx->sched.priority = priority;
>   		}
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		if (args->size)
> +			ret = -EINVAL;
> +		else if (!HAS_EXECLISTS(ctx->i915))
> +			ret = -ENODEV;
> +		else {
> +			struct drm_i915_private *dev_priv = to_i915(dev);
> +			struct drm_i915_gem_context_param_sseu user_sseu;
> +			struct i915_gem_context_sseu ctx_sseu;
> +			struct intel_engine_cs *engine;
> +
> +			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +					   sizeof(user_sseu))) {
> +				ret = -EFAULT;
> +				break;
> +			}
> +
> +			engine = i915_gem_engine_from_flags(dev_priv, file,
> +							    user_sseu.flags);
> +			if (!engine) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ctx_sseu =
> +				i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> +								     &user_sseu);
>   
> +			ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> +		}
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index dca17ef24de5..9caddcb1f234 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>   	}
>   }
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      struct i915_gem_context_sseu *sseu)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +	struct intel_context *ce;
> +	enum intel_engine_id id;
> +	int ret;
> +
> +	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> +
> +	if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> +		return 0;
> +
> +	/*
> +	 * We can only program this on render ring.
> +	 */
> +	ce = &ctx->engine[RCS];
> +
> +	if (ce->pin_count) { /* Assume that the context is active! */
> +		ret = i915_gem_switch_to_kernel_context(dev_priv);
> +		if (ret)
> +			return ret;
> +
> +		ret = i915_gem_wait_for_idle(dev_priv,
> +					     I915_WAIT_INTERRUPTIBLE |
> +					     I915_WAIT_LOCKED);

Could we consider the alternative of only allowing this to be configured 
on context create? That way we would not need to idle the GPU every time 
userspace decides to fiddle with it. It is unprivileged so quite an easy 
way for random app to ruin GPU performance for everyone.

Regards,

Tvrtko

> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (ce->state) {
> +		void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> +		u32 *regs;
> +
> +		if (IS_ERR(vaddr))
> +			return PTR_ERR(vaddr);
> +
> +		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> +
> +		regs[CTX_R_PWR_CLK_STATE + 1] =
> +			make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> +		i915_gem_object_unpin_map(ce->state->obj);
> +	}
> +
> +	/*
> +	 * Apply the configuration to all engine. Our hardware doesn't
> +	 * currently support different configurations for each engine.
> +	 */
> +	for_each_engine(engine, dev_priv, id)
> +		ctx->engine[id].sseu = *sseu;
> +
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/intel_lrc.c"
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index d91d69a17206..4c84266814fa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>   	return ctx->engine[engine->id].lrc_desc;
>   }
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      struct i915_gem_context_sseu *sseu);
> +
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 96eda8176030..75749ce11c03 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +	/*
> +	 * When using the following param, value should be a pointer to
> +	 * drm_i915_gem_context_param_sseu.
> +	 */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
>   	__u64 value;
>   };
>   
> +struct drm_i915_gem_context_param_sseu {
> +	/*
> +	 * Engine to be configured or queried. Same value you would use with
> +	 * drm_i915_gem_execbuffer2.
> +	 */
> +	__u64 flags;
> +
> +	/*
> +	 * Setting slice_mask or subslice_mask to 0 will make the context use
> +	 * masks reported respectively by I915_PARAM_SLICE_MASK or
> +	 * I915_PARAM_SUBSLICE_MASK.
> +	 */
> +	union {
> +		struct {
> +			__u8 slice_mask;
> +			__u8 subslice_mask;
> +			__u8 min_eus_per_subslice;
> +			__u8 max_eus_per_subslice;
> +		} packed;
> +		__u64 value;
> +	};
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
  2018-05-03 16:25         ` Lionel Landwerlin
@ 2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 16:30 UTC (permalink / raw)
  To: Joonas Lahtinen, Lionel Landwerlin, intel-gfx


On 03/05/2018 17:04, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-26 13:22:30)
>> On 26/04/18 11:00, Joonas Lahtinen wrote:
>>> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and forcing
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices they are not using, and others should
>>>> fine-tune the number to match their workload.
>>> This hit a dead end last time due to the system wide policy needed to
>>> avoid two parties fighting over the slice count (and going back and
>>> forth between two slice counts would counter the benefits received from
>>> this).
>>>
>>> Do we now have a solution for the contention? I don't see code to
>>> negotiate a global value, just raw setter.
>>>
>>> Regards, Joonas
>>
>> I've tried to come up with some numbers about the cost of the back &
>> forth (see igt series).
>>
>> Other than that, I don't think we can expect the kernel to workaround
>> the inefficient use of the hardware by userspace.
> 
> Well, I'm pretty sure we should not try to make the situation too
> miserable for the basic usecases.
> 
> If we allow two contexts to fight over the slice count, countering any
> perceived benefit, I don't think that'll be a good default.
> 
> My recollection of the last round of discussion was that the reasonable
> thing to do would be to only disable slices that everyone is willing to
> let go of. Then it would become a system maintainer level decision to
> only run on two slices when they see fit (configuring the userspace).

Yes, there was definitely talk about co-ordination between the OpenCL and
media stacks.

Btw we mentioned on IRC a few days back that a sysfs knob like 
allow_media_priority, defaulting to 0, could be a good start.

That way userspace could use the uAPI, but unless the sysadmin allows it,
there will be no fighting over slice counts.

Then at some future time, as long as the uAPI is sensible, more advanced
policies could be explored, whether in userspace or in i915.

Regards,

Tvrtko
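The gating Tvrtko sketches above can be modeled in a few lines. This is a toy model only, assuming the hypothetical allow_media_priority knob he mentions; with the knob off every context simply runs with the full hardware mask, so no back-and-forth reconfiguration between contexts can occur:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (not i915 code): the per-context slice mask only takes
 * effect when the sysadmin has enabled the hypothetical
 * allow_media_priority knob; otherwise everyone gets the full mask. */
static uint8_t effective_slice_mask(bool allow_knob, uint8_t ctx_mask,
				    uint8_t hw_mask)
{
	if (!allow_knob || ctx_mask == 0)
		return hw_mask;		/* no fighting possible */
	return (uint8_t)(ctx_mask & hw_mask);
}
```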

> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).
> 
> Regards, Joonas
> 
>>
>>>
>>>> We expose complete control over the RPCS register, allowing
>>>> configuration of slice/subslice, via masks packed into a u64 for
>>>> simplicity. For example,
>>>>
>>>>           struct drm_i915_gem_context_param arg;
>>>>           struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
>>>>
>>>>           memset(&arg, 0, sizeof(arg));
>>>>           arg.ctx_id = ctx;
>>>>           arg.param = I915_CONTEXT_PARAM_SSEU;
>>>>           arg.value = (uintptr_t) &sseu;
>>>>           if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>>                   sseu.packed.subslice_mask = 0;
>>>>
>>>>                   drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>>           }
>>>>
>>>> could be used to disable all subslices where supported.
>>>>
>>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>>>
>>>> v3: Add ability to program this per engine (Chris)
>>>>
>>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>>>>    drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>>>>    include/uapi/drm/i915_drm.h             | 28 +++++++++
>>>>    4 files changed, 168 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index bdf050beeb94..b97ddcf47514 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>>           return 0;
>>>>    }
>>>>    
>>>> +static struct i915_gem_context_sseu
>>>> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>>>> +                                    const struct drm_i915_gem_context_param_sseu *user_sseu)
>>>> +{
>>>> +       struct i915_gem_context_sseu value = {
>>>> +               .slice_mask = user_sseu->packed.slice_mask == 0 ?
>>>> +                             sseu->slice_mask :
>>>> +                             (user_sseu->packed.slice_mask & sseu->slice_mask),
>>>> +               .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
>>>> +                                sseu->subslice_mask[0] :
>>>> +                                (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
>>>> +               .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
>>>> +                                           sseu->max_eus_per_subslice),
>>>> +               .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
>>>> +                                           sseu->max_eus_per_subslice),
>>>> +       };
>>>> +
>>>> +       return value;
>>>> +}
>>>> +
>>>>    int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>>                                       struct drm_file *file)
>>>>    {
>>>> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>>           case I915_CONTEXT_PARAM_PRIORITY:
>>>>                   args->value = ctx->sched.priority;
>>>>                   break;
>>>> +       case I915_CONTEXT_PARAM_SSEU: {
>>>> +               struct drm_i915_gem_context_param_sseu param_sseu;
>>>> +               struct intel_engine_cs *engine;
>>>> +
>>>> +               if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>>> +                                  sizeof(param_sseu))) {
>>>> +                       ret = -EFAULT;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               engine = i915_gem_engine_from_flags(to_i915(dev), file,
>>>> +                                                   param_sseu.flags);
>>>> +               if (!engine) {
>>>> +                       ret = -EINVAL;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               param_sseu.packed.slice_mask =
>>>> +                       ctx->engine[engine->id].sseu.slice_mask;
>>>> +               param_sseu.packed.subslice_mask =
>>>> +                       ctx->engine[engine->id].sseu.subslice_mask;
>>>> +               param_sseu.packed.min_eus_per_subslice =
>>>> +                       ctx->engine[engine->id].sseu.min_eus_per_subslice;
>>>> +               param_sseu.packed.max_eus_per_subslice =
>>>> +                       ctx->engine[engine->id].sseu.max_eus_per_subslice;
>>>> +
>>>> +               if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>>> +                                sizeof(param_sseu)))
>>>> +                       ret = -EFAULT;
>>>> +               break;
>>>> +       }
>>>>           default:
>>>>                   ret = -EINVAL;
>>>>                   break;
>>>> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>                   else
>>>>                           i915_gem_context_clear_bannable(ctx);
>>>>                   break;
>>>> -
>>>>           case I915_CONTEXT_PARAM_PRIORITY:
>>>>                   {
>>>>                           s64 priority = args->value;
>>>> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>                                   ctx->sched.priority = priority;
>>>>                   }
>>>>                   break;
>>>> +       case I915_CONTEXT_PARAM_SSEU:
>>>> +               if (args->size)
>>>> +                       ret = -EINVAL;
>>>> +               else if (!HAS_EXECLISTS(ctx->i915))
>>>> +                       ret = -ENODEV;
>>>> +               else {
>>>> +                       struct drm_i915_private *dev_priv = to_i915(dev);
>>>> +                       struct drm_i915_gem_context_param_sseu user_sseu;
>>>> +                       struct i915_gem_context_sseu ctx_sseu;
>>>> +                       struct intel_engine_cs *engine;
>>>> +
>>>> +                       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>>>> +                                          sizeof(user_sseu))) {
>>>> +                               ret = -EFAULT;
>>>> +                               break;
>>>> +                       }
>>>> +
>>>> +                       engine = i915_gem_engine_from_flags(dev_priv, file,
>>>> +                                                           user_sseu.flags);
>>>> +                       if (!engine) {
>>>> +                               ret = -EINVAL;
>>>> +                               break;
>>>> +                       }
>>>> +
>>>> +                       ctx_sseu =
>>>> +                               i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>>> +                                                                    &user_sseu);
>>>>    
>>>> +                       ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
>>>> +               }
>>>> +               break;
>>>>           default:
>>>>                   ret = -EINVAL;
>>>>                   break;
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>>> index dca17ef24de5..9caddcb1f234 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>>>>           }
>>>>    }
>>>>    
>>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>>> +                             struct intel_engine_cs *engine,
>>>> +                             struct i915_gem_context_sseu *sseu)
>>>> +{
>>>> +       struct drm_i915_private *dev_priv = ctx->i915;
>>>> +       struct intel_context *ce;
>>>> +       enum intel_engine_id id;
>>>> +       int ret;
>>>> +
>>>> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
>>>> +
>>>> +       if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>>>> +               return 0;
>>>> +
>>>> +       /*
>>>> +        * We can only program this on render ring.
>>>> +        * We can only program this on the render ring.
>>>> +       ce = &ctx->engine[RCS];
>>>> +
>>>> +       if (ce->pin_count) { /* Assume that the context is active! */
>>>> +               ret = i915_gem_switch_to_kernel_context(dev_priv);
>>>> +               if (ret)
>>>> +                       return ret;
>>>> +
>>>> +               ret = i915_gem_wait_for_idle(dev_priv,
>>>> +                                            I915_WAIT_INTERRUPTIBLE |
>>>> +                                            I915_WAIT_LOCKED);
>>>> +               if (ret)
>>>> +                       return ret;
>>>> +       }
>>>> +
>>>> +       if (ce->state) {
>>>> +               void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
>>>> +               u32 *regs;
>>>> +
>>>> +               if (IS_ERR(vaddr))
>>>> +                       return PTR_ERR(vaddr);
>>>> +
>>>> +               regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
>>>> +
>>>> +               regs[CTX_R_PWR_CLK_STATE + 1] =
>>>> +                       make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>>>> +               i915_gem_object_unpin_map(ce->state->obj);
>>>> +       }
>>>> +
>>>> +       /*
>>>> +        * Apply the configuration to all engines. Our hardware doesn't
>>>> +        * currently support different configurations for each engine.
>>>> +        */
>>>> +       for_each_engine(engine, dev_priv, id)
>>>> +               ctx->engine[id].sseu = *sseu;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>    #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>>    #include "selftests/intel_lrc.c"
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>>>> index d91d69a17206..4c84266814fa 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>>>> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>>>>           return ctx->engine[engine->id].lrc_desc;
>>>>    }
>>>>    
>>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>>> +                             struct intel_engine_cs *engine,
>>>> +                             struct i915_gem_context_sseu *sseu);
>>>> +
>>>>    #endif /* _INTEL_LRC_H_ */
>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>> index 96eda8176030..75749ce11c03 100644
>>>> --- a/include/uapi/drm/i915_drm.h
>>>> +++ b/include/uapi/drm/i915_drm.h
>>>> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>>>>    #define   I915_CONTEXT_MAX_USER_PRIORITY       1023 /* inclusive */
>>>>    #define   I915_CONTEXT_DEFAULT_PRIORITY                0
>>>>    #define   I915_CONTEXT_MIN_USER_PRIORITY       -1023 /* inclusive */
>>>> +       /*
>>>> +        * When using the following param, value should be a pointer to
>>>> +        * drm_i915_gem_context_param_sseu.
>>>> +        */
>>>> +#define I915_CONTEXT_PARAM_SSEU                0x7
>>>>           __u64 value;
>>>>    };
>>>>    
>>>> +struct drm_i915_gem_context_param_sseu {
>>>> +       /*
>>>> +        * Engine to be configured or queried. Same value you would use with
>>>> +        * drm_i915_gem_execbuffer2.
>>>> +        */
>>>> +       __u64 flags;
>>>> +
>>>> +       /*
>>>> +        * Setting slice_mask or subslice_mask to 0 will make the context use
>>>> +        * masks reported respectively by I915_PARAM_SLICE_MASK or
>>>> +        * I915_PARAM_SUBSLICE_MASK.
>>>> +        */
>>>> +       union {
>>>> +               struct {
>>>> +                       __u8 slice_mask;
>>>> +                       __u8 subslice_mask;
>>>> +                       __u8 min_eus_per_subslice;
>>>> +                       __u8 max_eus_per_subslice;
>>>> +               } packed;
>>>> +               __u64 value;
>>>> +       };
>>>> +};
>>>> +
>>>>    enum drm_i915_oa_format {
>>>>           I915_OA_FORMAT_A13 = 1,     /* HSW only */
>>>>           I915_OA_FORMAT_A29,         /* HSW only */
>>>> -- 
>>>> 2.17.0
>>>>
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
@ 2018-05-03 16:25         ` Lionel Landwerlin
  2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 16:25 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On 03/05/18 17:04, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-26 13:22:30)
>> On 26/04/18 11:00, Joonas Lahtinen wrote:
>>> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and force
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices they are not using, and others should
>>>> fine-tune the number to match their workload.
>>> This hit a dead end last time due to the system wide policy needed to
>>> avoid two parties fighting over the slice count (and going back and
>>> forth between two slice counts would counter the benefits received from
>>> this).
>>>
>>> Do we now have a solution for the contention? I don't see code to
>>> negotiate a global value, just raw setter.
>>>
>>> Regards, Joonas
>> I've tried to come up with some numbers about the cost of the back &
>> forth (see igt series).
>>
>> Other than that, I don't think we can expect the kernel to workaround
>> the inefficient use of the hardware by userspace.
> Well, I'm pretty sure we should not try to make the situation too
> miserable for the basic usecases.
>
> If we allow two contexts to fight over the slice count, countering any
> perceived benefit, I don't think that'll be a good default.
>
> My recollection of the last round of discussion was that the reasonable
> thing to do would be to only disable slices that everyone is willing to
> let go of. Then it would become a system maintainer level decision to
> only run on two slices when they see fit (configuring the userspace).

How would you detect that everybody is willing to let go?
We don't appear to have a mechanism to detect that.
Wouldn't that require teaching all userspace drivers?

>
> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).

That sounds more workable, although fairly complicated.
Maybe we could tweak the priority based on the slice count and say:
     for the next (1 second / number of slice configs), priority is bumped
for the configuration with x slices,
     and rotate every (1 second / number of slice configs).

Would that resolve the back & forth issue completely though?

Moving a particular context back & forth between different 
configurations is costly too.
You need to drain the context, then pin it before you can edit it and
resubmit.
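Lionel's rotation idea above can be sketched as a simple time-window selector. This is only a toy illustration of the policy, under the assumption of a 1-second window divided evenly among N slice configurations; none of this exists in i915:

```c
/* Toy sketch of the rotation policy: within each 1-second window,
 * divide time evenly among the n_configs slice configurations and
 * return the index of the configuration whose priority is bumped at
 * time t_ms (milliseconds since the window scheme started). */
static unsigned int bumped_config(unsigned int t_ms, unsigned int n_configs)
{
	unsigned int share = 1000 / n_configs;	/* per-config share of 1 s */

	return (t_ms / share) % n_configs;
}
```

With two configurations, the first owns the bump for the first 500 ms of every second and the second for the remaining 500 ms.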

Thanks for your feedback,

-
Lionel

>
> Regards, Joonas
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
@ 2018-05-03 16:14         ` Chris Wilson
  2018-05-03 16:25         ` Lionel Landwerlin
  2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-05-03 16:14 UTC (permalink / raw)
  To: Joonas Lahtinen, Lionel Landwerlin, intel-gfx

Quoting Joonas Lahtinen (2018-05-03 17:04:31)
> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).

Have you been peeking at my reading material? ;)
-Chris

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-26 10:22     ` Lionel Landwerlin
@ 2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
                           ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-05-03 16:04 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-26 13:22:30)
> On 26/04/18 11:00, Joonas Lahtinen wrote:
> > Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> >> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and force
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others should
> >> fine-tune the number to match their workload.
> > This hit a dead end last time due to the system wide policy needed to
> > avoid two parties fighting over the slice count (and going back and
> > forth between two slice counts would counter the benefits received from
> > this).
> >
> > Do we now have a solution for the contention? I don't see code to
> > negotiate a global value, just raw setter.
> >
> > Regards, Joonas
> 
> I've tried to come up with some numbers about the cost of the back & 
> forth (see igt series).
> 
> Other than that, I don't think we can expect the kernel to workaround 
> the inefficient use of the hardware by userspace.

Well, I'm pretty sure we should not try to make the situation too
miserable for the basic usecases.

If we allow two contexts to fight over the slice count, countering any
perceived benefit, I don't think that'll be a good default.

My recollection of the last round of discussion was that the reasonable
thing to do would be to only disable slices that everyone is willing to
let go of. Then it would become a system maintainer level decision to
only run on two slices when they see fit (configuring the userspace).

More advanced tactics would include scheduling work so that we try to
avoid the slice count changes and deduct the switching time from the
execution budget of the app requesting fewer slices (if we had fair
time slicing).

Regards, Joonas
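Joonas' "only disable slices that everyone is willing to let go of" amounts to OR-ing the per-context demands: a slice is powered down only when no context asked for it. A toy model of that negotiation (invented helper, not i915 code) could look like:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a global negotiation: the effective slice mask is the
 * union of every context's requested mask, so a slice is only powered
 * down when no context wants it.  A request of 0 means "default", i.e.
 * the full hardware mask. */
static uint8_t global_slice_mask(const uint8_t *ctx_masks, size_t n,
				 uint8_t hw_mask)
{
	uint8_t m = 0;
	size_t i;

	for (i = 0; i < n; i++)
		m |= ctx_masks[i] ? ctx_masks[i] : hw_mask;

	return (uint8_t)(m & hw_mask);
}
```

One context asking for a single slice therefore cannot force a reconfiguration onto another context that still wants them all.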

> 
> >
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>          struct drm_i915_gem_context_param arg;
> >>          struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> >>
> >>          memset(&arg, 0, sizeof(arg));
> >>          arg.ctx_id = ctx;
> >>          arg.param = I915_CONTEXT_PARAM_SSEU;
> >>          arg.value = (uintptr_t) &sseu;
> >>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>                  sseu.packed.subslice_mask = 0;
> >>
> >>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>          }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> >> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> >> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
> >>   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
> >>   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
> >>   include/uapi/drm/i915_drm.h             | 28 +++++++++
> >>   4 files changed, 168 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index bdf050beeb94..b97ddcf47514 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >>          return 0;
> >>   }
> >>   
> >> +static struct i915_gem_context_sseu
> >> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +                                    const struct drm_i915_gem_context_param_sseu *user_sseu)
> >> +{
> >> +       struct i915_gem_context_sseu value = {
> >> +               .slice_mask = user_sseu->packed.slice_mask == 0 ?
> >> +                             sseu->slice_mask :
> >> +                             (user_sseu->packed.slice_mask & sseu->slice_mask),
> >> +               .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> >> +                                sseu->subslice_mask[0] :
> >> +                                (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> >> +               .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> >> +                                           sseu->max_eus_per_subslice),
> >> +               .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> >> +                                           sseu->max_eus_per_subslice),
> >> +       };
> >> +
> >> +       return value;
> >> +}
> >> +
> >>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >>                                      struct drm_file *file)
> >>   {
> >> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >>          case I915_CONTEXT_PARAM_PRIORITY:
> >>                  args->value = ctx->sched.priority;
> >>                  break;
> >> +       case I915_CONTEXT_PARAM_SSEU: {
> >> +               struct drm_i915_gem_context_param_sseu param_sseu;
> >> +               struct intel_engine_cs *engine;
> >> +
> >> +               if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> >> +                                  sizeof(param_sseu))) {
> >> +                       ret = -EFAULT;
> >> +                       break;
> >> +               }
> >> +
> >> +               engine = i915_gem_engine_from_flags(to_i915(dev), file,
> >> +                                                   param_sseu.flags);
> >> +               if (!engine) {
> >> +                       ret = -EINVAL;
> >> +                       break;
> >> +               }
> >> +
> >> +               param_sseu.packed.slice_mask =
> >> +                       ctx->engine[engine->id].sseu.slice_mask;
> >> +               param_sseu.packed.subslice_mask =
> >> +                       ctx->engine[engine->id].sseu.subslice_mask;
> >> +               param_sseu.packed.min_eus_per_subslice =
> >> +                       ctx->engine[engine->id].sseu.min_eus_per_subslice;
> >> +               param_sseu.packed.max_eus_per_subslice =
> >> +                       ctx->engine[engine->id].sseu.max_eus_per_subslice;
> >> +
> >> +               if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> >> +                                sizeof(param_sseu)))
> >> +                       ret = -EFAULT;
> >> +               break;
> >> +       }
> >>          default:
> >>                  ret = -EINVAL;
> >>                  break;
> >> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >>                  else
> >>                          i915_gem_context_clear_bannable(ctx);
> >>                  break;
> >> -
> >>          case I915_CONTEXT_PARAM_PRIORITY:
> >>                  {
> >>                          s64 priority = args->value;
> >> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >>                                  ctx->sched.priority = priority;
> >>                  }
> >>                  break;
> >> +       case I915_CONTEXT_PARAM_SSEU:
> >> +               if (args->size)
> >> +                       ret = -EINVAL;
> >> +               else if (!HAS_EXECLISTS(ctx->i915))
> >> +                       ret = -ENODEV;
> >> +               else {
> >> +                       struct drm_i915_private *dev_priv = to_i915(dev);
> >> +                       struct drm_i915_gem_context_param_sseu user_sseu;
> >> +                       struct i915_gem_context_sseu ctx_sseu;
> >> +                       struct intel_engine_cs *engine;
> >> +
> >> +                       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> >> +                                          sizeof(user_sseu))) {
> >> +                               ret = -EFAULT;
> >> +                               break;
> >> +                       }
> >> +
> >> +                       engine = i915_gem_engine_from_flags(dev_priv, file,
> >> +                                                           user_sseu.flags);
> >> +                       if (!engine) {
> >> +                               ret = -EINVAL;
> >> +                               break;
> >> +                       }
> >> +
> >> +                       ctx_sseu =
> >> +                               i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> >> +                                                                    &user_sseu);
> >>   
> >> +                       ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> >> +               }
> >> +               break;
> >>          default:
> >>                  ret = -EINVAL;
> >>                  break;
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >> index dca17ef24de5..9caddcb1f234 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
> >>          }
> >>   }
> >>   
> >> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> >> +                             struct intel_engine_cs *engine,
> >> +                             struct i915_gem_context_sseu *sseu)
> >> +{
> >> +       struct drm_i915_private *dev_priv = ctx->i915;
> >> +       struct intel_context *ce;
> >> +       enum intel_engine_id id;
> >> +       int ret;
> >> +
> >> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >> +
> >> +       if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> >> +               return 0;
> >> +
> >> +       /*
> >> +        * We can only program this on the render ring.
> >> +        */
> >> +       ce = &ctx->engine[RCS];
> >> +
> >> +       if (ce->pin_count) { /* Assume that the context is active! */
> >> +               ret = i915_gem_switch_to_kernel_context(dev_priv);
> >> +               if (ret)
> >> +                       return ret;
> >> +
> >> +               ret = i915_gem_wait_for_idle(dev_priv,
> >> +                                            I915_WAIT_INTERRUPTIBLE |
> >> +                                            I915_WAIT_LOCKED);
> >> +               if (ret)
> >> +                       return ret;
> >> +       }
> >> +
> >> +       if (ce->state) {
> >> +               void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> >> +               u32 *regs;
> >> +
> >> +               if (IS_ERR(vaddr))
> >> +                       return PTR_ERR(vaddr);
> >> +
> >> +               regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> >> +
> >> +               regs[CTX_R_PWR_CLK_STATE + 1] =
> >> +                       make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> >> +               i915_gem_object_unpin_map(ce->state->obj);
> >> +       }
> >> +
> >> +       /*
> >> +        * Apply the configuration to all engines. Our hardware doesn't
> >> +        * currently support different configurations for each engine.
> >> +        */
> >> +       for_each_engine(engine, dev_priv, id)
> >> +               ctx->engine[id].sseu = *sseu;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >>   #include "selftests/intel_lrc.c"
> >>   #endif
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> >> index d91d69a17206..4c84266814fa 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.h
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> >> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
> >>          return ctx->engine[engine->id].lrc_desc;
> >>   }
> >>   
> >> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> >> +                             struct intel_engine_cs *engine,
> >> +                             struct i915_gem_context_sseu *sseu);
> >> +
> >>   #endif /* _INTEL_LRC_H_ */
> >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> >> index 96eda8176030..75749ce11c03 100644
> >> --- a/include/uapi/drm/i915_drm.h
> >> +++ b/include/uapi/drm/i915_drm.h
> >> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
> >>   #define   I915_CONTEXT_MAX_USER_PRIORITY       1023 /* inclusive */
> >>   #define   I915_CONTEXT_DEFAULT_PRIORITY                0
> >>   #define   I915_CONTEXT_MIN_USER_PRIORITY       -1023 /* inclusive */
> >> +       /*
> >> +        * When using the following param, value should be a pointer to
> >> +        * drm_i915_gem_context_param_sseu.
> >> +        */
> >> +#define I915_CONTEXT_PARAM_SSEU                0x7
> >>          __u64 value;
> >>   };
> >>   
> >> +struct drm_i915_gem_context_param_sseu {
> >> +       /*
> >> +        * Engine to be configured or queried. Same value you would use with
> >> +        * drm_i915_gem_execbuffer2.
> >> +        */
> >> +       __u64 flags;
> >> +
> >> +       /*
> >> +        * Setting slice_mask or subslice_mask to 0 will make the context use
> >> +        * masks reported respectively by I915_PARAM_SLICE_MASK or
> >> +        * I915_PARAM_SUBSLICE_MASK.
> >> +        */
> >> +       union {
> >> +               struct {
> >> +                       __u8 slice_mask;
> >> +                       __u8 subslice_mask;
> >> +                       __u8 min_eus_per_subslice;
> >> +                       __u8 max_eus_per_subslice;
> >> +               } packed;
> >> +               __u64 value;
> >> +       };
> >> +};
> >> +
> >>   enum drm_i915_oa_format {
> >>          I915_OA_FORMAT_A13 = 1,     /* HSW only */
> >>          I915_OA_FORMAT_A29,         /* HSW only */
> >> -- 
> >> 2.17.0
> >>
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-26 10:00   ` Joonas Lahtinen
@ 2018-04-26 10:22     ` Lionel Landwerlin
  2018-05-03 16:04       ` Joonas Lahtinen
  0 siblings, 1 reply; 44+ messages in thread
From: Lionel Landwerlin @ 2018-04-26 10:22 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On 26/04/18 11:00, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
> This hit a dead end last time due to the system wide policy needed to
> avoid two parties fighting over the slice count (and going back and
> forth between two slice counts would counter the benefits received from
> this).
>
> Do we now have a solution for the contention? I don't see code to
> negotiate a global value, just raw setter.
>
> Regards, Joonas

I've tried to come up with some numbers about the cost of the back & 
forth (see igt series).

Other than that, I don't think we can expect the kernel to work around
the inefficient use of the hardware by userspace.

>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>          struct drm_i915_gem_context_param arg;
>>          struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
>>
>>          memset(&arg, 0, sizeof(arg));
>>          arg.ctx_id = ctx;
>>          arg.param = I915_CONTEXT_PARAM_SSEU;
>>          arg.value = (uintptr_t) &sseu;
>>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>                  sseu.packed.subslice_mask = 0;
>>
>>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>          }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> CC: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>>   include/uapi/drm/i915_drm.h             | 28 +++++++++
>>   4 files changed, 168 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index bdf050beeb94..b97ddcf47514 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>          return 0;
>>   }
>>   
>> +static struct i915_gem_context_sseu
>> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>> +                                    const struct drm_i915_gem_context_param_sseu *user_sseu)
>> +{
>> +       struct i915_gem_context_sseu value = {
>> +               .slice_mask = user_sseu->packed.slice_mask == 0 ?
>> +                             sseu->slice_mask :
>> +                             (user_sseu->packed.slice_mask & sseu->slice_mask),
>> +               .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
>> +                                sseu->subslice_mask[0] :
>> +                                (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
>> +               .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
>> +                                           sseu->max_eus_per_subslice),
>> +               .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
>> +                                           sseu->max_eus_per_subslice),
>> +       };
>> +
>> +       return value;
>> +}
>> +
>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>                                      struct drm_file *file)
>>   {
>> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>          case I915_CONTEXT_PARAM_PRIORITY:
>>                  args->value = ctx->sched.priority;
>>                  break;
>> +       case I915_CONTEXT_PARAM_SSEU: {
>> +               struct drm_i915_gem_context_param_sseu param_sseu;
>> +               struct intel_engine_cs *engine;
>> +
>> +               if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>> +                                  sizeof(param_sseu))) {
>> +                       ret = -EFAULT;
>> +                       break;
>> +               }
>> +
>> +               engine = i915_gem_engine_from_flags(to_i915(dev), file,
>> +                                                   param_sseu.flags);
>> +               if (!engine) {
>> +                       ret = -EINVAL;
>> +                       break;
>> +               }
>> +
>> +               param_sseu.packed.slice_mask =
>> +                       ctx->engine[engine->id].sseu.slice_mask;
>> +               param_sseu.packed.subslice_mask =
>> +                       ctx->engine[engine->id].sseu.subslice_mask;
>> +               param_sseu.packed.min_eus_per_subslice =
>> +                       ctx->engine[engine->id].sseu.min_eus_per_subslice;
>> +               param_sseu.packed.max_eus_per_subslice =
>> +                       ctx->engine[engine->id].sseu.max_eus_per_subslice;
>> +
>> +               if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>> +                                sizeof(param_sseu)))
>> +                       ret = -EFAULT;
>> +               break;
>> +       }
>>          default:
>>                  ret = -EINVAL;
>>                  break;
>> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>                  else
>>                          i915_gem_context_clear_bannable(ctx);
>>                  break;
>> -
>>          case I915_CONTEXT_PARAM_PRIORITY:
>>                  {
>>                          s64 priority = args->value;
>> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>                                  ctx->sched.priority = priority;
>>                  }
>>                  break;
>> +       case I915_CONTEXT_PARAM_SSEU:
>> +               if (args->size)
>> +                       ret = -EINVAL;
>> +               else if (!HAS_EXECLISTS(ctx->i915))
>> +                       ret = -ENODEV;
>> +               else {
>> +                       struct drm_i915_private *dev_priv = to_i915(dev);
>> +                       struct drm_i915_gem_context_param_sseu user_sseu;
>> +                       struct i915_gem_context_sseu ctx_sseu;
>> +                       struct intel_engine_cs *engine;
>> +
>> +                       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>> +                                          sizeof(user_sseu))) {
>> +                               ret = -EFAULT;
>> +                               break;
>> +                       }
>> +
>> +                       engine = i915_gem_engine_from_flags(dev_priv, file,
>> +                                                           user_sseu.flags);
>> +                       if (!engine) {
>> +                               ret = -EINVAL;
>> +                               break;
>> +                       }
>> +
>> +                       ctx_sseu =
>> +                               i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>> +                                                                    &user_sseu);
>>   
>> +                       ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
>> +               }
>> +               break;
>>          default:
>>                  ret = -EINVAL;
>>                  break;
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index dca17ef24de5..9caddcb1f234 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>>          }
>>   }
>>   
>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>> +                             struct intel_engine_cs *engine,
>> +                             struct i915_gem_context_sseu *sseu)
>> +{
>> +       struct drm_i915_private *dev_priv = ctx->i915;
>> +       struct intel_context *ce;
>> +       enum intel_engine_id id;
>> +       int ret;
>> +
>> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
>> +
>> +       if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>> +               return 0;
>> +
>> +       /*
> >> +        * We can only program this on the render ring.
>> +        */
>> +       ce = &ctx->engine[RCS];
>> +
>> +       if (ce->pin_count) { /* Assume that the context is active! */
>> +               ret = i915_gem_switch_to_kernel_context(dev_priv);
>> +               if (ret)
>> +                       return ret;
>> +
>> +               ret = i915_gem_wait_for_idle(dev_priv,
>> +                                            I915_WAIT_INTERRUPTIBLE |
>> +                                            I915_WAIT_LOCKED);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       if (ce->state) {
>> +               void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
>> +               u32 *regs;
>> +
>> +               if (IS_ERR(vaddr))
>> +                       return PTR_ERR(vaddr);
>> +
>> +               regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
>> +
>> +               regs[CTX_R_PWR_CLK_STATE + 1] =
>> +                       make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>> +               i915_gem_object_unpin_map(ce->state->obj);
>> +       }
>> +
>> +       /*
> >> +        * Apply the configuration to all engines. Our hardware doesn't
>> +        * currently support different configurations for each engine.
>> +        */
>> +       for_each_engine(engine, dev_priv, id)
>> +               ctx->engine[id].sseu = *sseu;
>> +
>> +       return 0;
>> +}
>> +
>>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>   #include "selftests/intel_lrc.c"
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>> index d91d69a17206..4c84266814fa 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>>          return ctx->engine[engine->id].lrc_desc;
>>   }
>>   
>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>> +                             struct intel_engine_cs *engine,
>> +                             struct i915_gem_context_sseu *sseu);
>> +
>>   #endif /* _INTEL_LRC_H_ */
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 96eda8176030..75749ce11c03 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>>   #define   I915_CONTEXT_MAX_USER_PRIORITY       1023 /* inclusive */
>>   #define   I915_CONTEXT_DEFAULT_PRIORITY                0
>>   #define   I915_CONTEXT_MIN_USER_PRIORITY       -1023 /* inclusive */
>> +       /*
>> +        * When using the following param, value should be a pointer to
>> +        * drm_i915_gem_context_param_sseu.
>> +        */
>> +#define I915_CONTEXT_PARAM_SSEU                0x7
>>          __u64 value;
>>   };
>>   
>> +struct drm_i915_gem_context_param_sseu {
>> +       /*
>> +        * Engine to be configured or queried. Same value you would use with
>> +        * drm_i915_gem_execbuffer2.
>> +        */
>> +       __u64 flags;
>> +
>> +       /*
>> +        * Setting slice_mask or subslice_mask to 0 will make the context use
>> +        * masks reported respectively by I915_PARAM_SLICE_MASK or
>> +        * I915_PARAM_SUBSLICE_MASK.
>> +        */
>> +       union {
>> +               struct {
>> +                       __u8 slice_mask;
>> +                       __u8 subslice_mask;
>> +                       __u8 min_eus_per_subslice;
>> +                       __u8 max_eus_per_subslice;
>> +               } packed;
>> +               __u64 value;
>> +       };
>> +};
>> +
>>   enum drm_i915_oa_format {
>>          I915_OA_FORMAT_A13 = 1,     /* HSW only */
>>          I915_OA_FORMAT_A29,         /* HSW only */
>> -- 
>> 2.17.0
>>


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-04-26 10:00   ` Joonas Lahtinen
  2018-04-26 10:22     ` Lionel Landwerlin
  2018-05-03 17:18   ` Tvrtko Ursulin
  1 sibling, 1 reply; 44+ messages in thread
From: Joonas Lahtinen @ 2018-04-26 10:00 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just raw setter.

Regards, Joonas

> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
>         struct drm_i915_gem_context_param arg;
>         struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> 
>         memset(&arg, 0, sizeof(arg));
>         arg.ctx_id = ctx;
>         arg.param = I915_CONTEXT_PARAM_SSEU;
>         arg.value = (uintptr_t) &sseu;
>         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>                 sseu.packed.subslice_mask = 0;
> 
>                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>         }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> CC: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>  include/uapi/drm/i915_drm.h             | 28 +++++++++
>  4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index bdf050beeb94..b97ddcf47514 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>         return 0;
>  }
>  
> +static struct i915_gem_context_sseu
> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +                                    const struct drm_i915_gem_context_param_sseu *user_sseu)
> +{
> +       struct i915_gem_context_sseu value = {
> +               .slice_mask = user_sseu->packed.slice_mask == 0 ?
> +                             sseu->slice_mask :
> +                             (user_sseu->packed.slice_mask & sseu->slice_mask),
> +               .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> +                                sseu->subslice_mask[0] :
> +                                (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> +               .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> +                                           sseu->max_eus_per_subslice),
> +               .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> +                                           sseu->max_eus_per_subslice),
> +       };
> +
> +       return value;
> +}
> +
>  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>                                     struct drm_file *file)
>  {
> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>         case I915_CONTEXT_PARAM_PRIORITY:
>                 args->value = ctx->sched.priority;
>                 break;
> +       case I915_CONTEXT_PARAM_SSEU: {
> +               struct drm_i915_gem_context_param_sseu param_sseu;
> +               struct intel_engine_cs *engine;
> +
> +               if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +                                  sizeof(param_sseu))) {
> +                       ret = -EFAULT;
> +                       break;
> +               }
> +
> +               engine = i915_gem_engine_from_flags(to_i915(dev), file,
> +                                                   param_sseu.flags);
> +               if (!engine) {
> +                       ret = -EINVAL;
> +                       break;
> +               }
> +
> +               param_sseu.packed.slice_mask =
> +                       ctx->engine[engine->id].sseu.slice_mask;
> +               param_sseu.packed.subslice_mask =
> +                       ctx->engine[engine->id].sseu.subslice_mask;
> +               param_sseu.packed.min_eus_per_subslice =
> +                       ctx->engine[engine->id].sseu.min_eus_per_subslice;
> +               param_sseu.packed.max_eus_per_subslice =
> +                       ctx->engine[engine->id].sseu.max_eus_per_subslice;
> +
> +               if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +                                sizeof(param_sseu)))
> +                       ret = -EFAULT;
> +               break;
> +       }
>         default:
>                 ret = -EINVAL;
>                 break;
> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>                 else
>                         i915_gem_context_clear_bannable(ctx);
>                 break;
> -
>         case I915_CONTEXT_PARAM_PRIORITY:
>                 {
>                         s64 priority = args->value;
> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>                                 ctx->sched.priority = priority;
>                 }
>                 break;
> +       case I915_CONTEXT_PARAM_SSEU:
> +               if (args->size)
> +                       ret = -EINVAL;
> +               else if (!HAS_EXECLISTS(ctx->i915))
> +                       ret = -ENODEV;
> +               else {
> +                       struct drm_i915_private *dev_priv = to_i915(dev);
> +                       struct drm_i915_gem_context_param_sseu user_sseu;
> +                       struct i915_gem_context_sseu ctx_sseu;
> +                       struct intel_engine_cs *engine;
> +
> +                       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +                                          sizeof(user_sseu))) {
> +                               ret = -EFAULT;
> +                               break;
> +                       }
> +
> +                       engine = i915_gem_engine_from_flags(dev_priv, file,
> +                                                           user_sseu.flags);
> +                       if (!engine) {
> +                               ret = -EINVAL;
> +                               break;
> +                       }
> +
> +                       ctx_sseu =
> +                               i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> +                                                                    &user_sseu);
>  
> +                       ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> +               }
> +               break;
>         default:
>                 ret = -EINVAL;
>                 break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index dca17ef24de5..9caddcb1f234 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>         }
>  }
>  
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +                             struct intel_engine_cs *engine,
> +                             struct i915_gem_context_sseu *sseu)
> +{
> +       struct drm_i915_private *dev_priv = ctx->i915;
> +       struct intel_context *ce;
> +       enum intel_engine_id id;
> +       int ret;
> +
> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
> +
> +       if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> +               return 0;
> +
> +       /*
> +        * We can only program this on the render ring.
> +        */
> +       ce = &ctx->engine[RCS];
> +
> +       if (ce->pin_count) { /* Assume that the context is active! */
> +               ret = i915_gem_switch_to_kernel_context(dev_priv);
> +               if (ret)
> +                       return ret;
> +
> +               ret = i915_gem_wait_for_idle(dev_priv,
> +                                            I915_WAIT_INTERRUPTIBLE |
> +                                            I915_WAIT_LOCKED);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       if (ce->state) {
> +               void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> +               u32 *regs;
> +
> +               if (IS_ERR(vaddr))
> +                       return PTR_ERR(vaddr);
> +
> +               regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> +
> +               regs[CTX_R_PWR_CLK_STATE + 1] =
> +                       make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> +               i915_gem_object_unpin_map(ce->state->obj);
> +       }
> +
> +       /*
> +        * Apply the configuration to all engines. Our hardware doesn't
> +        * currently support different configurations for each engine.
> +        */
> +       for_each_engine(engine, dev_priv, id)
> +               ctx->engine[id].sseu = *sseu;
> +
> +       return 0;
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftests/intel_lrc.c"
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index d91d69a17206..4c84266814fa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>         return ctx->engine[engine->id].lrc_desc;
>  }
>  
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +                             struct intel_engine_cs *engine,
> +                             struct i915_gem_context_sseu *sseu);
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 96eda8176030..75749ce11c03 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>  #define   I915_CONTEXT_MAX_USER_PRIORITY       1023 /* inclusive */
>  #define   I915_CONTEXT_DEFAULT_PRIORITY                0
>  #define   I915_CONTEXT_MIN_USER_PRIORITY       -1023 /* inclusive */
> +       /*
> +        * When using the following param, value should be a pointer to
> +        * drm_i915_gem_context_param_sseu.
> +        */
> +#define I915_CONTEXT_PARAM_SSEU                0x7
>         __u64 value;
>  };
>  
> +struct drm_i915_gem_context_param_sseu {
> +       /*
> +        * Engine to be configured or queried. Same value you would use with
> +        * drm_i915_gem_execbuffer2.
> +        */
> +       __u64 flags;
> +
> +       /*
> +        * Setting slice_mask or subslice_mask to 0 will make the context use
> +        * masks reported respectively by I915_PARAM_SLICE_MASK or
> +        * I915_PARAM_SUBSLICE_MASK.
> +        */
> +       union {
> +               struct {
> +                       __u8 slice_mask;
> +                       __u8 subslice_mask;
> +                       __u8 min_eus_per_subslice;
> +                       __u8 max_eus_per_subslice;
> +               } packed;
> +               __u64 value;
> +       };
> +};
> +
>  enum drm_i915_oa_format {
>         I915_OA_FORMAT_A13 = 1,     /* HSW only */
>         I915_OA_FORMAT_A29,         /* HSW only */
> -- 
> 2.17.0
> 

* [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-26 10:00   ` Joonas Lahtinen
  2018-05-03 17:18   ` Tvrtko Ursulin
  0 siblings, 2 replies; 44+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to reset the subslice mask to the device default, since a
zero mask selects the full mask supported by the device.
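
To double-check the layout of the union above, here is a small
standalone sketch (again not part of the patch; it mirrors the uapi
struct locally, pack_sseu is an invented helper name, and a
little-endian host is assumed):

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed drm_i915_gem_context_param_sseu payload. */
struct sseu_payload {
	union {
		struct {
			uint8_t slice_mask;
			uint8_t subslice_mask;
			uint8_t min_eus_per_subslice;
			uint8_t max_eus_per_subslice;
		} packed;
		uint64_t value;
	};
};

/* Pack the four masks as the union lays them out in memory. */
static uint64_t pack_sseu(uint8_t slices, uint8_t subslices,
			  uint8_t min_eus, uint8_t max_eus)
{
	struct sseu_payload p;

	memset(&p, 0, sizeof(p));
	p.packed.slice_mask = slices;
	p.packed.subslice_mask = subslices;
	p.packed.min_eus_per_subslice = min_eus;
	p.packed.max_eus_per_subslice = max_eus;

	/* On a little-endian host the four bytes occupy the low 32 bits. */
	return p.value;
}
```

E.g. one slice, two subslices and 8 EUs min/max pack to 0x08080301 on
little-endian.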

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
 include/uapi/drm/i915_drm.h             | 28 +++++++++
 4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static struct i915_gem_context_sseu
+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+				     const struct drm_i915_gem_context_param_sseu *user_sseu)
+{
+	struct i915_gem_context_sseu value = {
+		.slice_mask = user_sseu->packed.slice_mask == 0 ?
+			      sseu->slice_mask :
+			      (user_sseu->packed.slice_mask & sseu->slice_mask),
+		.subslice_mask = user_sseu->packed.subslice_mask == 0 ?
+				 sseu->subslice_mask[0] :
+				 (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
+		.min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
+					    sseu->max_eus_per_subslice),
+		.max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
+					    sseu->max_eus_per_subslice),
+	};
+
+	return value;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = i915_gem_engine_from_flags(to_i915(dev), file,
+						    param_sseu.flags);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		param_sseu.packed.slice_mask =
+			ctx->engine[engine->id].sseu.slice_mask;
+		param_sseu.packed.subslice_mask =
+			ctx->engine[engine->id].sseu.subslice_mask;
+		param_sseu.packed.min_eus_per_subslice =
+			ctx->engine[engine->id].sseu.min_eus_per_subslice;
+		param_sseu.packed.max_eus_per_subslice =
+			ctx->engine[engine->id].sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
-
 	case I915_CONTEXT_PARAM_PRIORITY:
 		{
 			s64 priority = args->value;
@@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		if (args->size)
+			ret = -EINVAL;
+		else if (!HAS_EXECLISTS(ctx->i915))
+			ret = -ENODEV;
+		else {
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			struct i915_gem_context_sseu ctx_sseu;
+			struct intel_engine_cs *engine;
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = i915_gem_engine_from_flags(dev_priv, file,
+							    user_sseu.flags);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ctx_sseu =
+				i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+								     &user_sseu);
 
+			ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dca17ef24de5..9caddcb1f234 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 	}
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      struct i915_gem_context_sseu *sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
+		return 0;
+
+	/*
+	 * We can only program this on the render ring.
+	 */
+	ce = &ctx->engine[RCS];
+
+	if (ce->pin_count) { /* Assume that the context is active! */
+		ret = i915_gem_switch_to_kernel_context(dev_priv);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_wait_for_idle(dev_priv,
+					     I915_WAIT_INTERRUPTIBLE |
+					     I915_WAIT_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (ce->state) {
+		void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+		u32 *regs;
+
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
+
+		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
+
+		regs[CTX_R_PWR_CLK_STATE + 1] =
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	/*
+	 * Apply the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id)
+		ctx->engine[id].sseu = *sseu;
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/intel_lrc.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index d91d69a17206..4c84266814fa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return ctx->engine[engine->id].lrc_desc;
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      struct i915_gem_context_sseu *sseu);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 96eda8176030..75749ce11c03 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine to be configured or queried. Same value you would use with
+	 * drm_i915_gem_execbuffer2.
+	 */
+	__u64 flags;
+
+	/*
+	 * Setting slice_mask or subslice_mask to 0 will make the context use
+	 * masks reported respectively by I915_PARAM_SLICE_MASK or
+	 * I915_PARAM_SUBSLICE_MASK.
+	 */
+	union {
+		struct {
+			__u8 slice_mask;
+			__u8 subslice_mask;
+			__u8 min_eus_per_subslice;
+			__u8 max_eus_per_subslice;
+		} packed;
+		__u64 value;
+	};
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0



end of thread, other threads:[~2018-08-16  9:56 UTC | newest]

Thread overview: 44+ messages
-- links below jump to the message on this page --
2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 1/8] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 2/8] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 3/8] drm/i915/perf: simplify configure all context function Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 4/8] drm/i915/perf: reuse intel_lrc ctx regs macro Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
2018-08-14 14:59   ` Lionel Landwerlin
2018-08-15 11:58     ` Tvrtko Ursulin
2018-08-16  7:31       ` Tvrtko Ursulin
2018-08-16  9:56         ` Lionel Landwerlin
2018-08-14 14:40 ` [PATCH 6/8] drm/i915: Add global barrier support Tvrtko Ursulin
2018-08-14 14:59   ` Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 7/8] drm/i915: Explicitly mark Global GTT address spaces Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
2018-08-14 14:59   ` Chris Wilson
2018-08-14 15:11     ` Lionel Landwerlin
2018-08-14 15:18       ` Chris Wilson
2018-08-14 16:05         ` Lionel Landwerlin
2018-08-14 16:09           ` Lionel Landwerlin
2018-08-14 18:44     ` Tvrtko Ursulin
2018-08-14 18:53       ` Chris Wilson
2018-08-15  9:12         ` Tvrtko Ursulin
2018-08-14 15:22   ` Chris Wilson
2018-08-15 11:51     ` Tvrtko Ursulin
2018-08-15 11:56       ` Chris Wilson
2018-08-14 14:45 ` [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-08-14 15:15 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2018-08-14 15:19 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-08-14 15:32 ` ✓ Fi.CI.BAT: success " Patchwork
2018-08-14 20:22 ` ✗ Fi.CI.IGT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-04-26 10:00   ` Joonas Lahtinen
2018-04-26 10:22     ` Lionel Landwerlin
2018-05-03 16:04       ` Joonas Lahtinen
2018-05-03 16:14         ` Chris Wilson
2018-05-03 16:25         ` Lionel Landwerlin
2018-05-03 16:30         ` Tvrtko Ursulin
2018-05-03 17:18   ` Tvrtko Ursulin
2018-05-04 16:25     ` Lionel Landwerlin
2018-05-08  4:04       ` Rogozhkin, Dmitry V
2018-05-08  8:24         ` Tvrtko Ursulin
2018-05-08 16:00           ` Rogozhkin, Dmitry V
2018-05-08 20:56     ` Chris Wilson
2018-05-09 15:35       ` Lionel Landwerlin
