* [PATCH v11 0/7] Per context dynamic (sub)slice power-gating
@ 2018-09-05 14:22 Tvrtko Ursulin
  2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
                   ` (11 more replies)
  0 siblings, 12 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Updated series after continuing Lionel's work.

Userspace for the feature is the media-driver project on GitHub. Please see
https://github.com/intel/media-driver/pull/271/commits.

No headline changes this time.

Some review feedback, some refactoring; some patches got merged, and two new
ones appeared to help with the simplified implementation and also to lock the
SSEU config to a workable set on Icelake.

IGT to be sent separately.

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (1):
  drm/i915/perf: lock powergating configuration to default when active

Tvrtko Ursulin (3):
  drm/i915/execlists: Move RPCS setup to context pin
  drm/i915: Add timeline barrier support
  drm/i915/icl: Support co-existance between per-context SSEU and OA

 drivers/gpu/drm/i915/i915_drv.h               |  14 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 305 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |  10 +
 drivers/gpu/drm/i915/i915_perf.c              |   5 +
 drivers/gpu/drm/i915/i915_request.c           |  13 +
 drivers/gpu/drm/i915/i915_request.h           |  10 +
 drivers/gpu/drm/i915/i915_timeline.c          |   3 +
 drivers/gpu/drm/i915/i915_timeline.h          |  27 ++
 drivers/gpu/drm/i915/intel_lrc.c              |  65 ++--
 drivers/gpu/drm/i915/intel_lrc.h              |   3 +
 .../gpu/drm/i915/selftests/mock_timeline.c    |   2 +
 include/uapi/drm/i915_drm.h                   |  43 +++
 12 files changed, 479 insertions(+), 21 deletions(-)

-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 15:14   ` Chris Wilson
  2018-09-05 14:22 ` [PATCH 2/7] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Configuring RPCS in the context image just before pin is sufficient and will
come in handy in one of the following patches.

v2:
 * Split image setup a bit differently. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b1f0e5211a0..358fad63564c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1305,6 +1305,8 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
+static u32 make_rpcs(struct drm_i915_private *dev_priv);
+
 static struct intel_context *
 __execlists_context_pin(struct intel_engine_cs *engine,
 			struct i915_gem_context *ctx,
@@ -1344,6 +1346,12 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	GEM_BUG_ON(!intel_ring_offset_valid(ce->ring, ce->ring->head));
 	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
 
+	/* RPCS */
+	if (engine->class == RENDER_CLASS) {
+		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
+						make_rpcs(engine->i915);
+	}
+
 	ce->state->obj->pin_global++;
 	i915_gem_context_get(ctx);
 	return ce;
@@ -2706,8 +2714,7 @@ static void execlists_init_reg_state(u32 *regs,
 
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
-		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.1

* [PATCH 2/7] drm/i915: Program RPCS for Broadwell
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
  2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 14:22 ` [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 358fad63564c..3bdc1ac3e926 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2501,13 +2501,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	u8 subslices = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
 	 * wide and Icelake has up to eight subslices, specfial programming is
-- 
2.17.1

* [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
  2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
  2018-09-05 14:22 ` [PATCH 2/7] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 15:18   ` Chris Wilson
  2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslices and
EUs per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
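
A minimal standalone sketch of the bookkeeping described above, not the
driver code: each context carries one record per engine, and every record is
seeded from a device-wide default. All sketch_ names, the engine count and the
device structure are hypothetical stand-ins for the kernel types in the diff
below.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_ENGINES 4	/* hypothetical engine count */

/* Mirrors the four fields of struct intel_sseu added by this patch. */
struct sketch_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

/* Hypothetical device description: what the hardware offers. */
struct sketch_device {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t max_eus_per_subslice;
};

/* One powergating record per engine, as on the real context. */
struct sketch_context {
	struct sketch_sseu sseu[SKETCH_NUM_ENGINES];
};

/* Default: every slice and subslice on, EU min == EU max == device max. */
static struct sketch_sseu sketch_default_sseu(const struct sketch_device *dev)
{
	struct sketch_sseu value = {
		.slice_mask = dev->slice_mask,
		.subslice_mask = dev->subslice_mask,
		.min_eus_per_subslice = dev->max_eus_per_subslice,
		.max_eus_per_subslice = dev->max_eus_per_subslice,
	};

	return value;
}

/* Seed every engine slot of a new context with the device default. */
static void sketch_context_init(struct sketch_context *ctx,
				const struct sketch_device *dev)
{
	int n;

	assert(ctx && dev);
	for (n = 0; n < SKETCH_NUM_ENGINES; n++)
		ctx->sseu[n] = sketch_default_sseu(dev);
}
```

Keeping the record per (context, engine) rather than per context means a later
change to one engine's configuration leaves the other engines at the default.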

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

v5: More to_intel_context() (Tvrtko)
    Switch intel_sseu from union to struct (Tvrtko)
    Move context default sseu in existing loop (Chris)

v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)

Tvrtko Ursulin:

v7:
 * Pass intel_sseu by pointer instead of value to make_rpcs.
 * Rebase for make_rpcs changes.

v8:
 * Rebase for RPCS edit on pin.

v9:
 * Rebase for context image setup changes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 +++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.h |  4 ++++
 drivers/gpu/drm/i915/i915_request.h     | 10 ++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 26 ++++++++++++-------------
 5 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 767615ecdea5..e4682fc572e6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3449,6 +3449,20 @@ mkwrite_device_info(struct drm_i915_private *dev_priv)
 	return (struct intel_device_info *)&dev_priv->info;
 }
 
+static inline struct intel_sseu
+intel_device_default_sseu(struct drm_i915_private *i915)
+{
+	const struct sseu_dev_info *sseu = &INTEL_INFO(i915)->sseu;
+	struct intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 /* modesetting */
 extern void intel_modeset_init_hw(struct drm_device *dev);
 extern int intel_modeset_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 747b8170a15a..ca2c8fcd1090 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -343,6 +343,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		struct intel_context *ce = &ctx->__engine[n];
 
 		ce->gem_context = ctx;
+		/* Use the whole device by default */
+		ce->sseu = intel_device_default_sseu(dev_priv);
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index e09673ca731d..79d2e8f62ad1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -31,6 +31,7 @@
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -165,6 +166,9 @@ struct i915_gem_context {
 		int pin_count;
 
 		const struct intel_context_ops *ops;
+
+		/** sseu: Control eu/slice partitioning */
+		struct intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 9898301ab7ef..eb6f8cce16c4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -39,6 +39,16 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+struct intel_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
 struct intel_wait {
 	struct rb_node node;
 	struct task_struct *tsk;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3bdc1ac3e926..8a477e43dbca 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1305,7 +1305,8 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv);
+static u32 make_rpcs(struct drm_i915_private *dev_priv,
+		     struct intel_sseu *ctx_sseu);
 
 static struct intel_context *
 __execlists_context_pin(struct intel_engine_cs *engine,
@@ -1349,7 +1350,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	/* RPCS */
 	if (engine->class == RENDER_CLASS) {
 		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
-						make_rpcs(engine->i915);
+					make_rpcs(engine->i915, &ce->sseu);
 	}
 
 	ce->state->obj->pin_global++;
@@ -2493,12 +2494,13 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(struct drm_i915_private *dev_priv,
+		     struct intel_sseu *ctx_sseu)
 {
-	bool subslice_pg = INTEL_INFO(dev_priv)->sseu.has_subslice_pg;
-	u8 slices = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
-	u8 subslices = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
+	const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
+	bool subslice_pg = sseu->has_subslice_pg;
+	u8 slices = hweight8(ctx_sseu->slice_mask);
+	u8 subslices = hweight8(ctx_sseu->subslice_mask);
 	u32 rpcs = 0;
 
 	/*
@@ -2539,7 +2541,7 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		u32 mask, val = slices;
 
 		if (INTEL_GEN(dev_priv) >= 11) {
@@ -2567,18 +2569,16 @@ make_rpcs(struct drm_i915_private *dev_priv)
 		rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_SS_CNT_ENABLE | val;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
+	if (sseu->has_eu_pg) {
 		u32 val;
 
-		val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
-		      GEN8_RPCS_EU_MIN_SHIFT;
+		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
 		val &= GEN8_RPCS_EU_MIN_MASK;
 
 		rpcs |= val;
 
-		val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
-		      GEN8_RPCS_EU_MAX_SHIFT;
+		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
 		val &= GEN8_RPCS_EU_MAX_MASK;
 
-- 
2.17.1

* [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2018-09-05 14:22 ` [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 15:21   ` Chris Wilson
  2018-09-06  9:57   ` Lionel Landwerlin
  2018-09-05 14:22 ` [PATCH 5/7] drm/i915: Add timeline barrier support Tvrtko Ursulin
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns were raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking the
last programmed configuration somewhere in GGTT and using MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penalty for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming so it doesn't sound completely unreasonable to hold on
powergating for the same reason.
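
The selection rule this change implements can be modelled in isolation as
follows. This is only a sketch under assumptions: the sketch_ names are
hypothetical, the oa_active flag stands in for the driver's check of the
exclusive OA stream, and the real code goes on to encode the result into the
RPCS register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same four fields as struct intel_sseu in this series. */
struct sketch_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

/* Population count of an 8-bit mask, standing in for hweight8(). */
static unsigned int sketch_hweight8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v &= (uint8_t)(v - 1))	/* clear the lowest set bit */
		n++;

	return n;
}

/*
 * While i915/perf is active the request is ignored and the device default
 * is used; otherwise the context's own request is honoured unchanged.
 */
static struct sketch_sseu
sketch_effective_sseu(bool oa_active,
		      const struct sketch_sseu *device_default,
		      const struct sketch_sseu *requested)
{
	return oa_active ? *device_default : *requested;
}
```

The requested values are never overwritten, so once perf is closed the next
evaluation picks up the user's configuration again without extra state.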

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Tvrtko Ursulin:

v4:
 * Rebase for make_rpcs changes.

v5:
 * Apply OA restriction from make_rpcs directly.

v6:
 * Rebase for context image setup changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c |  5 +++++
 drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_lrc.h |  3 +++
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ccb20230df2c..dd65b72bddd4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(dev_priv,
+			       &to_intel_context(ctx,
+						 dev_priv->engine[RCS])->sseu));
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8a477e43dbca..9709c1fbe836 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1305,9 +1305,6 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv,
-		     struct intel_sseu *ctx_sseu);
-
 static struct intel_context *
 __execlists_context_pin(struct intel_engine_cs *engine,
 			struct i915_gem_context *ctx,
@@ -1350,7 +1347,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	/* RPCS */
 	if (engine->class == RENDER_CLASS) {
 		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
-					make_rpcs(engine->i915, &ce->sseu);
+					gen8_make_rpcs(engine->i915, &ce->sseu);
 	}
 
 	ce->state->obj->pin_global++;
@@ -2494,15 +2491,28 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv,
-		     struct intel_sseu *ctx_sseu)
+u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
+		   struct intel_sseu *req_sseu)
 {
 	const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
 	bool subslice_pg = sseu->has_subslice_pg;
-	u8 slices = hweight8(ctx_sseu->slice_mask);
-	u8 subslices = hweight8(ctx_sseu->subslice_mask);
+	struct intel_sseu ctx_sseu;
+	u8 slices, subslices;
 	u32 rpcs = 0;
 
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. The most natural configuration to take in that case
+	 * is the default (i.e maximum the hardware can do).
+	 */
+	if (unlikely(dev_priv->perf.oa.exclusive_stream))
+		ctx_sseu = intel_device_default_sseu(dev_priv);
+	else
+		ctx_sseu = *req_sseu;
+
+	slices = hweight8(ctx_sseu.slice_mask);
+	subslices = hweight8(ctx_sseu.subslice_mask);
+
 	/*
 	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
 	 * wide and Icelake has up to eight subslices, specfial programming is
@@ -2572,13 +2582,13 @@ static u32 make_rpcs(struct drm_i915_private *dev_priv,
 	if (sseu->has_eu_pg) {
 		u32 val;
 
-		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
+		val = ctx_sseu.min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
 		val &= GEN8_RPCS_EU_MIN_MASK;
 
 		rpcs |= val;
 
-		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
+		val = ctx_sseu.max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
 		val &= GEN8_RPCS_EU_MAX_MASK;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..11da6fc0002d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
+u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
+		   struct intel_sseu *ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */
-- 
2.17.1

* [PATCH 5/7] drm/i915: Add timeline barrier support
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 15:23   ` Chris Wilson
  2018-09-05 14:22 ` [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A timeline barrier allows serialization between different timelines.

After calling i915_timeline_set_barrier with a request, all following
submissions on this timeline will be set up as depending on this request,
or barrier. Once the barrier has been completed it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.
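
To illustrate the semantics only, here is a toy model of a barrier, assuming
a single dependency per request and a plain completed flag in place of fences.
The sketch_ names are hypothetical, and the barrier is cleared lazily at the
next submission here, whereas the patch relies on the request retirement
machinery to clear it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_request {
	int timeline_id;
	bool completed;
	const struct sketch_request *wait_on;	/* at most one dependency */
};

struct sketch_timeline {
	int id;
	struct sketch_request *barrier;		/* NULL when no barrier is set */
};

/* Make all later submissions on @tl wait for @rq from another timeline. */
static void sketch_set_barrier(struct sketch_timeline *tl,
			       struct sketch_request *rq)
{
	assert(tl->id != rq->timeline_id);	/* must cross timelines */
	tl->barrier = rq;
}

/* Set up a new request on @tl, picking up any pending barrier. */
static void sketch_request_init(struct sketch_timeline *tl,
				struct sketch_request *rq)
{
	rq->timeline_id = tl->id;
	rq->completed = false;

	/* A completed barrier no longer orders anything: drop it. */
	if (tl->barrier && tl->barrier->completed)
		tl->barrier = NULL;

	rq->wait_on = tl->barrier;
}

/* A request may run once its dependency, if any, has completed. */
static bool sketch_request_ready(const struct sketch_request *rq)
{
	return !rq->wait_on || rq->wait_on->completed;
}
```

In this model a request created while the barrier is pending stays blocked
until the barrier completes, and requests created afterwards carry no
dependency at all.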

v2:
 * Assert barrier has been retired on timeline_fini. (Chris Wilson)
 * Fix mock_timeline.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c           | 13 +++++++++
 drivers/gpu/drm/i915/i915_timeline.c          |  3 +++
 drivers/gpu/drm/i915/i915_timeline.h          | 27 +++++++++++++++++++
 .../gpu/drm/i915/selftests/mock_timeline.c    |  2 ++
 4 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 09ed48833b54..245f43f3bcc1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -644,6 +644,15 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
+static int add_timeline_barrier(struct i915_request *rq)
+{
+	struct i915_request *barrier =
+		i915_gem_active_raw(&rq->timeline->barrier,
+				    &rq->i915->drm.struct_mutex);
+
+	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -806,6 +815,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 */
 	rq->head = rq->ring->emit;
 
+	ret = add_timeline_barrier(rq);
+	if (ret)
+		goto err_unwind;
+
 	/* Unconditionally invalidate GPU caches and TLBs. */
 	ret = engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 4667cc08c416..5a87c5bd5154 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -37,6 +37,8 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
+
+	init_request_active(&timeline->barrier, NULL);
 }
 
 /**
@@ -69,6 +71,7 @@ void i915_timelines_park(struct drm_i915_private *i915)
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	GEM_BUG_ON(!list_empty(&timeline->requests));
+	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index a2c2c3ab5fb0..c8526ab44dbc 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -72,6 +72,16 @@ struct i915_timeline {
 	 */
 	u32 global_sync[I915_NUM_ENGINES];
 
+	/**
+	 * Barrier provides the ability to serialize ordering between different
+	 * timelines.
+	 *
+	 * Users can call i915_timeline_set_barrier which will make all
+	 * subsequent submissions be executed only after this barrier has been
+	 * completed.
+	 */
+	struct i915_gem_active barrier;
+
 	struct list_head link;
 	const char *name;
 
@@ -125,4 +135,21 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 
 void i915_timelines_park(struct drm_i915_private *i915);
 
+/**
+ * i915_timeline_set_barrier - orders submission between different timelines
+ * @timeline: timeline to set the barrier on
+ * @rq: request after which new submissions can proceed
+ *
+ * Sets the passed in request as the serialization point for all subsequent
+ * submissions on @timeline. Subsequent requests will not be submitted to GPU
+ * until the barrier has been completed.
+ */
+static inline void
+i915_timeline_set_barrier(struct i915_timeline *timeline,
+			  struct i915_request *rq)
+{
+	GEM_BUG_ON(timeline->fence_context == rq->timeline->fence_context);
+	i915_gem_active_set(&timeline->barrier, rq);
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index dcf3b16f5a07..a718b64c988e 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -19,6 +19,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	i915_syncmap_init(&timeline->sync);
 
+	init_request_active(&timeline->barrier, NULL);
+
 	INIT_LIST_HEAD(&timeline->link);
 }
 
-- 
2.17.1

* [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2018-09-05 14:22 ` [PATCH 5/7] drm/i915: Add timeline barrier support Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 15:29   ` Chris Wilson
  2018-09-05 14:22 ` [PATCH 7/7] drm/i915/icl: Support co-existance between per-context SSEU and OA Tvrtko Ursulin
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users), forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
	                                                .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.
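
For the fine-tuning case mentioned earlier, userspace would more likely narrow
the mask returned by GETPARAM than clear it. A hypothetical helper for that,
not part of this patch or of any existing library, could look like the sketch
below; whether the kernel accepts the resulting mask depends on the validation
in this patch and on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the lowest @count set bits of @mask and clear the rest. */
static uint64_t sketch_lowest_bits(uint64_t mask, unsigned int count)
{
	uint64_t out = 0;

	while (mask && count--) {
		uint64_t bit = mask & (~mask + 1);	/* isolate lowest set bit */

		out |= bit;
		mask &= ~bit;
	}

	return out;
}
```

Starting from a device mask of 0xf, asking for two subslices yields 0x3, which
could then be written back before the SETPARAM call.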

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
    dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
    s/dev_priv/i915/ (Tvrtko)
    Change uapi class/instance fields to u16 (Tvrtko)
    Bump mask fields to 64bits (Lionel)
    Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
    reconfiguring powergated slice/subslices (Chris)
    Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
 * Update for upstream changes.
 * Request submit needs a RPM reference.
 * Reject on !FULL_PPGTT for simplicity.
 * Pull out get/set param to helpers for readability and less indent.
 * Use i915_request_await_dma_fence in add_global_barrier to skip waits
   on the same timeline and avoid GEM_BUG_ON.
 * No need to explicitly assign a NULL pointer to engine in legacy mode.
 * No need to move gen8_make_rpcs up.
 * Factored out global barrier as prep patch.
 * Allow to only CAP_SYS_ADMIN if !Gen11.

v11:
 * Remove engine vfunc in favour of local helper. (Chris Wilson)
 * Stop retiring requests before updates since it is not needed
   (Chris Wilson)
 * Implement direct CPU update path for idle contexts. (Chris Wilson)
 * Left side dependency needs only be on the same context timeline.
   (Chris Wilson)
 * It is sufficient to order the timeline. (Chris Wilson)
 * Reject !RCS configuration attempts with -ENODEV for now.

v12:
 * Rebase for make_rpcs.

v13:
 * Centralize SSEU normalization to make_rpcs.
 * Type width checking (uAPI <-> implementation).
 * Gen11 restrictions uAPI checks.
 * Gen11 subslice count differences handling.
 Chris Wilson:
 * args->size handling fixes.
 * Update context image from GGTT.
 * Postpone context image update to pinning.
 * Use i915_gem_active_raw instead of last_request_on_engine.

v14:
 * Add activity tracker on intel_context to fix the lifetime issues
   and simplify the code. (Chris Wilson)

v15:
 * Fix context pin leak if no space in ring by simplifying the
   context pinning sequence.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 303 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h |   6 +
 drivers/gpu/drm/i915/intel_lrc.c        |   4 +-
 include/uapi/drm/i915_drm.h             |  43 ++++
 4 files changed, 353 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ca2c8fcd1090..aa1f34e63080 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -90,6 +90,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
+static void intel_context_retire(struct i915_gem_active *active,
+				 struct i915_request *rq)
+{
+	struct intel_context *ce = container_of(active, typeof(*ce), active);
+
+	intel_context_unpin(ce);
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
@@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		ce->gem_context = ctx;
 		/* Use the whole device by default */
 		ce->sseu = intel_device_default_sseu(dev_priv);
+
+		init_request_active(&ce->active, intel_context_retire);
 	}
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
@@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	if (!ctx)
 		return -ENOENT;
 
-	args->size = 0;
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 		ret = -EINVAL;
 		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
 		args->value = ctx->flags & CONTEXT_NO_ZEROMAP;
 		break;
 	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+
 		if (ctx->ppgtt)
 			args->value = ctx->ppgtt->vm.total;
 		else if (to_i915(dev)->mm.aliasing_ppgtt)
@@ -875,14 +930,20 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 			args->value = to_i915(dev)->ggtt.vm.total;
 		break;
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
 		args->value = i915_gem_context_no_error_capture(ctx);
 		break;
 	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
 	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -892,6 +953,242 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct intel_context *ce,
+				 struct intel_sseu sseu)
+{
+	u64 offset;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = ce->state->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(rq->i915, &sseu);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int
+gen8_modify_rpcs_gpu(struct intel_context *ce,
+		     struct intel_engine_cs *engine,
+		     struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = engine->i915;
+	struct i915_request *rq, *prev;
+	int ret;
+
+	GEM_BUG_ON(!ce->pin_count);
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	/* Submitting requests etc needs the hw awake. */
+	intel_runtime_pm_get(i915);
+
+	rq = i915_request_alloc(engine, i915->kernel_context);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		goto out_put;
+	}
+
+	ret = gen8_emit_rpcs_config(rq, ce, sseu);
+	if (ret)
+		goto out_add;
+
+	/* Queue this switch after all other activity by this context. */
+	prev = i915_gem_active_raw(&ce->ring->timeline->last_request,
+				   &i915->drm.struct_mutex);
+	if (prev && !i915_request_completed(prev))
+		i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+						 &prev->submit,
+						 I915_FENCE_GFP);
+
+	/* Order all following requests to be after. */
+	i915_timeline_set_barrier(ce->ring->timeline, rq);
+
+	/*
+	 * Guarantee context image and the timeline remains pinned until the
+	 * modifying request is retired by setting the ce activity tracker.
+	 *
+	 * But we only need to take one pin on the account of it. Or in other
+	 * words transfer the pinned ce object to tracked active request.
+	 */
+	if (!i915_gem_active_isset(&ce->active))
+		__intel_context_pin(ce);
+	i915_gem_active_set(&ce->active, rq);
+
+out_add:
+	i915_request_add(rq);
+out_put:
+	intel_runtime_pm_put(i915);
+
+	return ret;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
+{
+	struct intel_context *ce = to_intel_context(ctx, engine);
+	int ret = 0;
+
+	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
+	GEM_BUG_ON(engine->id != RCS);
+
+	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
+
+	/* Nothing to do if unmodified. */
+	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
+		return 0;
+
+	/*
+	 * If context is not idle we have to submit an ordered request to modify
+	 * its context image via the kernel context. Pristine and idle contexts
+	 * will be configured on pinning.
+	 */
+	if (ce->pin_count)
+		ret = gen8_modify_rpcs_gpu(ce, engine, sseu);
+
+	if (!ret)
+		ce->sseu = sseu;
+
+	return ret;
+}
+
+static int
+user_to_context_sseu(struct drm_i915_private *i915,
+		     const struct drm_i915_gem_context_param_sseu *user,
+		     struct intel_sseu *context)
+{
+	const struct sseu_dev_info *device = &INTEL_INFO(i915)->sseu;
+
+	/* No zeros in any field. */
+	if (!user->slice_mask || !user->subslice_mask ||
+	    !user->min_eus_per_subslice || !user->max_eus_per_subslice)
+		return -EINVAL;
+
+	/* Max >= min. */
+	if (user->max_eus_per_subslice < user->min_eus_per_subslice)
+		return -EINVAL;
+
+	/* Check validity against hardware. */
+	if (user->slice_mask & ~device->slice_mask)
+		return -EINVAL;
+
+	if (user->subslice_mask & ~device->subslice_mask[0])
+		return -EINVAL;
+
+	if (user->max_eus_per_subslice > device->max_eus_per_subslice)
+		return -EINVAL;
+
+	/*
+	 * Some future proofing on the types since the uAPI is wider than the
+	 * current internal implementation.
+	 */
+	if (WARN_ON((fls(user->slice_mask) >
+		     sizeof(context->slice_mask) * BITS_PER_BYTE) ||
+		    (fls(user->subslice_mask) >
+		     sizeof(context->subslice_mask) * BITS_PER_BYTE) ||
+		    overflows_type(user->min_eus_per_subslice,
+				   context->min_eus_per_subslice) ||
+		    overflows_type(user->max_eus_per_subslice,
+				   context->max_eus_per_subslice)))
+		return -EINVAL;
+
+	context->slice_mask = user->slice_mask;
+	context->subslice_mask = user->subslice_mask;
+	context->min_eus_per_subslice = user->min_eus_per_subslice;
+	context->max_eus_per_subslice = user->max_eus_per_subslice;
+
+	/* Part specific restrictions. */
+	if (IS_GEN11(i915)) {
+		unsigned int hw_ss_per_s = hweight8(device->subslice_mask[0]);
+		unsigned int req_s = hweight8(context->slice_mask);
+		unsigned int req_ss = hweight8(context->subslice_mask);
+
+		/*
+		 * Only full subslice enablement is possible if more than one
+		 * slice is turned on.
+		 */
+		if (req_s > 1 && req_ss != hw_ss_per_s)
+			return -EINVAL;
+
+		/*
+		 * If more than four (SScount bitfield limit) subslices are
+		 * requested then the number has to be even.
+		 */
+		if (req_ss > 4 && (req_ss & 1))
+			return -EINVAL;
+
+		/*
+		 * If only one slice is enabled subslice count must be at most
+		 * half of all available subslices.
+		 */
+		if (req_s == 1 && req_ss > (hw_ss_per_s / 2))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int set_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_sseu sseu;
+	int ret;
+
+	if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (INTEL_GEN(i915) < 8)
+		return -ENODEV;
+
+	if (!IS_GEN11(i915) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	/* Only render engine supports RPCS configuration. */
+	if (engine->class != RENDER_CLASS)
+		return -ENODEV;
+
+	ret = user_to_context_sseu(i915, &user_sseu, &sseu);
+	if (ret)
+		return ret;
+
+	ret = i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
+	if (ret)
+		return ret;
+
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
-
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = set_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 79d2e8f62ad1..968e1d47d944 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -165,6 +165,12 @@ struct i915_gem_context {
 		u64 lrc_desc;
 		int pin_count;
 
+		/**
+		 * active: Active tracker for the external rq activity on this
+		 * intel_context object.
+		 */
+		struct i915_gem_active active;
+
 		const struct intel_context_ops *ops;
 
 		/** sseu: Control eu/slice partitioning */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9709c1fbe836..3c85392a3109 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
 	 * subslices are enabled, or a count between one and four on the first
 	 * slice.
 	 */
-	if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) {
+	if (IS_GEN11(dev_priv) &&
+	    slices == 1 &&
+	    subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) {
 		GEM_BUG_ON(subslices & 1);
 
 		subslice_pg = false;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a4446f452040..e195c38b15a6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1478,9 +1478,52 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 class;
+	__u16 instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd1;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd2;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.1
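
For reference, the Gen11 part-specific checks in user_to_context_sseu()
above can be exercised outside the kernel. The following is a minimal
standalone sketch, not driver code: popcount8() stands in for the kernel's
hweight8(), gen11_sseu_valid() is a hypothetical name, and the device
subslice mask is passed in by the caller rather than read from INTEL_INFO().

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's hweight8(): number of set bits in a byte. */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;

	return n;
}

/*
 * Hypothetical helper mirroring the three Gen11 checks in the patch.
 * hw_subslice_mask is the device subslice mask of slice 0.
 */
static bool gen11_sseu_valid(uint8_t hw_subslice_mask,
			     uint8_t slice_mask, uint8_t subslice_mask)
{
	unsigned int hw_ss_per_s = popcount8(hw_subslice_mask);
	unsigned int req_s = popcount8(slice_mask);
	unsigned int req_ss = popcount8(subslice_mask);

	/* More than one slice: only full subslice enablement is allowed. */
	if (req_s > 1 && req_ss != hw_ss_per_s)
		return false;

	/* More than four subslices: the count has to be even. */
	if (req_ss > 4 && (req_ss & 1))
		return false;

	/* Exactly one slice: at most half of the available subslices. */
	if (req_s == 1 && req_ss > hw_ss_per_s / 2)
		return false;

	return true;
}
```

With an assumed device subslice mask of 0xff, one slice with subslice mask
0x0f passes, while one slice with 0xff is rejected by the last check.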

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH 7/7] drm/i915/icl: Support co-existance between per-context SSEU and OA
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2018-09-05 14:22 ` [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
@ 2018-09-05 14:22 ` Tvrtko Ursulin
  2018-09-05 14:46 ` ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating (rev2) Patchwork
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-05 14:22 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

When OA is active we want to lock the powergating configuration, but on
Icelake users like the media stack will have issues if we lock to the full
device configuration.

Instead lock to a subset of (sub)slices which are currently a known
working configuration for all users.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3c85392a3109..19c9c46308e5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2502,13 +2502,28 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
 
 	/*
 	 * If i915/perf is active, we want a stable powergating configuration
-	 * on the system. The most natural configuration to take in that case
-	 * is the default (i.e maximum the hardware can do).
+	 * on the system.
+	 *
+	 * We could choose full enablement, but on ICL we know there are use
+	 * cases which disable slices for functional, as well as performance,
+	 * reasons. So in this case we select a known stable subset.
 	 */
-	if (unlikely(dev_priv->perf.oa.exclusive_stream))
-		ctx_sseu = intel_device_default_sseu(dev_priv);
-	else
+	if (!dev_priv->perf.oa.exclusive_stream) {
 		ctx_sseu = *req_sseu;
+	} else {
+		ctx_sseu = intel_device_default_sseu(dev_priv);
+
+		if (IS_GEN11(dev_priv)) {
+			/*
+			 * We only need subslice count so it doesn't matter
+			 * which ones we select - just turn on low bits in the
+			 * amount of half of all available subslices per slice.
+			 */
+			ctx_sseu.subslice_mask =
+				~(~0 << (hweight8(ctx_sseu.subslice_mask) / 2));
+			ctx_sseu.slice_mask = 0x1;
+		}
+	}
 
 	slices = hweight8(ctx_sseu.slice_mask);
 	subslices = hweight8(ctx_sseu.subslice_mask);
-- 
2.17.1
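
The mask arithmetic used for the Gen11 OA lock above can be checked in
isolation. This is a minimal sketch under stated assumptions, not the driver
code: bits_set8() stands in for hweight8(), gen11_oa_subslice_mask() is a
hypothetical name, and the computation is written with unsigned arithmetic
instead of the ~(~0 << n) form used in the patch.

```c
#include <stdint.h>

/* Stand-in for the kernel's hweight8(): number of set bits in a byte. */
static unsigned int bits_set8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;

	return n;
}

/*
 * Hypothetical helper: return a mask with the low N bits set, where N is
 * half the number of subslices present in hw_subslice_mask. Only the
 * count matters, so which bits are chosen is arbitrary.
 */
static uint8_t gen11_oa_subslice_mask(uint8_t hw_subslice_mask)
{
	unsigned int half = bits_set8(hw_subslice_mask) / 2;

	/* half is at most 4 here, so the shift cannot overflow. */
	return (uint8_t)((1u << half) - 1);
}
```

For an assumed eight-subslice device mask of 0xff this yields 0x0f, which
is then paired with slice_mask = 0x1 in the patch.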


* ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating (rev2)
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2018-09-05 14:22 ` [PATCH 7/7] drm/i915/icl: Support co-existance between per-context SSEU and OA Tvrtko Ursulin
@ 2018-09-05 14:46 ` Patchwork
  2018-09-05 14:49 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2018-09-05 14:46 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating (rev2)
URL   : https://patchwork.freedesktop.org/series/48194/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ed66235dc3a5 drm/i915/execlists: Move RPCS setup to context pin
ecdf22ac2704 drm/i915: Program RPCS for Broadwell
377a656e5c19 drm/i915: Record the sseu configuration per-context & engine
3a30bf8ddfa7 drm/i915/perf: lock powergating configuration to default when active
51555815cbdc drm/i915: Add timeline barrier support
8a0c3e82be78 drm/i915: Expose RPCS (SSEU) configuration to userspace
-:40: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#40: 
v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

total: 0 errors, 1 warnings, 0 checks, 441 lines checked
8e7f97c0a4d7 drm/i915/icl: Support co-existance between per-context SSEU and OA
-:4: WARNING:TYPO_SPELLING: 'existance' may be misspelled - perhaps 'existence'?
#4: 
Subject: [PATCH] drm/i915/icl: Support co-existance between per-context SSEU

total: 0 errors, 1 warnings, 0 checks, 33 lines checked


* ✗ Fi.CI.SPARSE: warning for Per context dynamic (sub)slice power-gating (rev2)
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2018-09-05 14:46 ` ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating (rev2) Patchwork
@ 2018-09-05 14:49 ` Patchwork
  2018-09-05 15:05 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2018-09-05 14:49 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating (rev2)
URL   : https://patchwork.freedesktop.org/series/48194/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915/execlists: Move RPCS setup to context pin
Okay!

Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: Record the sseu configuration per-context & engine
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3688:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3702:16: warning: expression using sizeof(void)

Commit: drm/i915/perf: lock powergating configuration to default when active
Okay!

Commit: drm/i915: Add timeline barrier support
Okay!

Commit: drm/i915: Expose RPCS (SSEU) configuration to userspace
+drivers/gpu/drm/i915/intel_lrc.c:2543:25: warning: expression using sizeof(void)

Commit: drm/i915/icl: Support co-existance between per-context SSEU and OA
Okay!


* ✓ Fi.CI.BAT: success for Per context dynamic (sub)slice power-gating (rev2)
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2018-09-05 14:49 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-09-05 15:05 ` Patchwork
  2018-09-05 19:55 ` ✗ Fi.CI.IGT: failure " Patchwork
  2018-09-06 19:33 ` [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Chris Wilson
  11 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2018-09-05 15:05 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating (rev2)
URL   : https://patchwork.freedesktop.org/series/48194/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4772 -> Patchwork_10095 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/48194/revisions/2/mbox/

== Known issues ==

  Here are the changes found in Patchwork_10095 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_frontbuffer_tracking@basic:
      fi-byt-clapper:     PASS -> FAIL (fdo#103167)

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-a:
      fi-byt-clapper:     PASS -> FAIL (fdo#103191, fdo#107362)

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       DMESG-WARN (fdo#105128, fdo#107139) -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-ilk-650:         DMESG-WARN (fdo#106387) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-byt-clapper:     FAIL (fdo#103191, fdo#107362) -> PASS
      fi-blb-e6850:       INCOMPLETE (fdo#107718) -> PASS

    
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#106387 https://bugs.freedesktop.org/show_bug.cgi?id=106387
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718


== Participating hosts (54 -> 48) ==

  Missing    (6): fi-ilk-m540 fi-bxt-dsi fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_4772 -> Patchwork_10095

  CI_DRM_4772: 1351ee8f3aacdb8f4a71cd17a7035556065c59a9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4629: c3b6d69aa3dd2d1a6c1f2e787670a0aef78f2ea5 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10095: 8e7f97c0a4d7d79b7f434f2c18a1455fedfcc6a5 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8e7f97c0a4d7 drm/i915/icl: Support co-existance between per-context SSEU and OA
8a0c3e82be78 drm/i915: Expose RPCS (SSEU) configuration to userspace
51555815cbdc drm/i915: Add timeline barrier support
3a30bf8ddfa7 drm/i915/perf: lock powergating configuration to default when active
377a656e5c19 drm/i915: Record the sseu configuration per-context & engine
ecdf22ac2704 drm/i915: Program RPCS for Broadwell
ed66235dc3a5 drm/i915/execlists: Move RPCS setup to context pin

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10095/issues.html

* Re: [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin
  2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
@ 2018-09-05 15:14   ` Chris Wilson
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Wilson @ 2018-09-05 15:14 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:16)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Configuring RPCS in context image just before pin is sufficient and will
> come extra handy in one of the following patches.
> 
> v2:
>  * Split image setup a bit differently. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

* Re: [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine
  2018-09-05 14:22 ` [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
@ 2018-09-05 15:18   ` Chris Wilson
  2018-09-06  9:36     ` Tvrtko Ursulin
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-05 15:18 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:18)
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to expose the ability to reconfigure the slices, subslice and
> eu per context and per engine. To facilitate that, store the current
> configuration on the context for each engine, which is initially set
> to the device default upon creation.
> 
> v2: record sseu configuration per context & engine (Chris)
> 
> v3: introduce the i915_gem_context_sseu to store powergating
>     programming, sseu_dev_info has grown quite a bit (Lionel)
> 
> v4: rename i915_gem_sseu into intel_sseu (Chris)
>     use to_intel_context() (Chris)
> 
> v5: More to_intel_context() (Tvrtko)
>     Switch intel_sseu from union to struct (Tvrtko)
>     Move context default sseu in existing loop (Chris)
> 
> v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)
> 
> Tvrtko Ursulin:
> 
> v7:
>  * Pass intel_sseu by pointer instead of value to make_rpcs.
>  * Rebase for make_rpcs changes.
> 
> v8:
>  * Rebase for RPCS edit on pin.
> 
> v9:
>  * Rebase for context image setup changes.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I feel this is sufficiently different (since I just outlined a v1!) to
merit a

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

and probably deserves a different author. I think Lionel is still the
principal author here, but Tvrtko has done a lot of refactoring and
integrating in the new scheme.

> -static u32 make_rpcs(struct drm_i915_private *dev_priv);
> +static u32 make_rpcs(struct drm_i915_private *dev_priv,
> +                    struct intel_sseu *ctx_sseu);
>  
>  static struct intel_context *
>  __execlists_context_pin(struct intel_engine_cs *engine,
> @@ -1349,7 +1350,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>         /* RPCS */
>         if (engine->class == RENDER_CLASS) {
>                 ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
> -                                               make_rpcs(engine->i915);
> +                                       make_rpcs(engine->i915, &ce->sseu);

We have different habits here; my vim config just gives this a single
tab indent beyond the incomplete line. (Was going to say it earlier ;)
-Chris

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
@ 2018-09-05 15:21   ` Chris Wilson
  2018-09-06  9:41     ` Tvrtko Ursulin
  2018-09-06  9:57   ` Lionel Landwerlin
  1 sibling, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-05 15:21 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:19)
> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
> -                    struct intel_sseu *ctx_sseu)
> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
> +                  struct intel_sseu *req_sseu)

Should we retrospectively make this const?

(And any chance of a s/dev_priv/i915?)

>  {
>         const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
>         bool subslice_pg = sseu->has_subslice_pg;
> -       u8 slices = hweight8(ctx_sseu->slice_mask);
> -       u8 subslices = hweight8(ctx_sseu->subslice_mask);
> +       struct intel_sseu ctx_sseu;
> +       u8 slices, subslices;
>         u32 rpcs = 0;
>  
> +       /*
> +        * If i915/perf is active, we want a stable powergating configuration
> +        * on the system. The most natural configuration to take in that case
> +        * is the default (i.e maximum the hardware can do).
> +        */
> +       if (unlikely(dev_priv->perf.oa.exclusive_stream))
> +               ctx_sseu = intel_device_default_sseu(dev_priv);
> +       else
> +               ctx_sseu = *req_sseu;

:(

I'm not sure if I can suggest anything better, but this does feel like a
layering violation.

It makes sense which makes it only feel worse.
-Chris

* Re: [PATCH 5/7] drm/i915: Add timeline barrier support
  2018-09-05 14:22 ` [PATCH 5/7] drm/i915: Add timeline barrier support Tvrtko Ursulin
@ 2018-09-05 15:23   ` Chris Wilson
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Wilson @ 2018-09-05 15:23 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:20)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Timeline barrier allows serialization between different timelines.
> 
> After calling i915_timeline_set_barrier with a request, all following
> submissions on this timeline will be set up as depending on this request,
> or barrier. Once the barrier has been completed it automatically gets
> cleared and things continue as normal.
> 
> This facility will be used by the upcoming context SSEU code.
> 
> v2:
>  * Assert barrier has been retired on timeline_fini. (Chris Wilson)
>  * Fix mock_timeline.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

I really should follow through on my threat to move
switch_to_kernel_context over to a similar scheme.
-Chris
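
To make the barrier semantics from the quoted commit message concrete, here
is a toy userspace model. It is only a sketch under stated assumptions:
toy_request, toy_timeline and the toy_* functions are hypothetical names, a
plain pointer stands in for the driver's fence dependency tracking, and the
barrier is cleared lazily at submit time, which is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	bool completed;
	const struct toy_request *depends_on; /* barrier this request waits for */
};

struct toy_timeline {
	struct toy_request *barrier; /* NULL when no barrier is set */
};

/* All following submissions on this timeline will depend on rq. */
static void toy_timeline_set_barrier(struct toy_timeline *tl,
				     struct toy_request *rq)
{
	tl->barrier = rq;
}

/* Submit rq on tl, picking up a dependency on any pending barrier. */
static void toy_timeline_submit(struct toy_timeline *tl,
				struct toy_request *rq)
{
	/* A completed barrier is cleared and no longer affects submissions. */
	if (tl->barrier && tl->barrier->completed)
		tl->barrier = NULL;

	rq->depends_on = tl->barrier;
}
```

A request submitted while the barrier is pending gets depends_on set to the
barrier; one submitted after the barrier completes has no dependency.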

* Re: [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-09-05 14:22 ` [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
@ 2018-09-05 15:29   ` Chris Wilson
  2018-09-06  9:50     ` Tvrtko Ursulin
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-05 15:29 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:21)
> From: Chris Wilson <chris@chris-wilson.co.uk>

Now this looks nothing like my first suggestion!

I think Tvrtko should stand as the author of the final mechanism, I
think it is substantially different from the submission method first
done by Lionel.
 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users), forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
>         struct drm_i915_gem_context_param arg;
>         struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>                                                         .instance = 0, };
> 
>         memset(&arg, 0, sizeof(arg));
>         arg.ctx_id = ctx;
>         arg.param = I915_CONTEXT_PARAM_SSEU;
>         arg.size = sizeof(sseu);
>         arg.value = (uintptr_t) &sseu;
>         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>                 sseu.subslice_mask = 0;
> 
>                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>         }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a global
>     dependency (Chris)
>     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> 
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>     s/dev_priv/i915/ (Tvrtko)
>     Change uapi class/instance fields to u16 (Tvrtko)
>     Bump mask fields to 64bits (Lionel)
>     Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> 
> v9: Import context image into kernel context's ppgtt only when
>     reconfiguring powergated slice/subslices (Chris)
>     Use aliasing ppgtt when needed (Michel)
> 
> Tvrtko Ursulin:
> 
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>    on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow to only CAP_SYS_ADMIN if !Gen11.
> 
> v11:
>  * Remove engine vfunc in favour of local helper. (Chris Wilson)
>  * Stop retiring requests before updates since it is not needed
>    (Chris Wilson)
>  * Implement direct CPU update path for idle contexts. (Chris Wilson)
>  * Left side dependency needs only be on the same context timeline.
>    (Chris Wilson)
>  * It is sufficient to order the timeline. (Chris Wilson)
>  * Reject !RCS configuration attempts with -ENODEV for now.
> 
> v12:
>  * Rebase for make_rpcs.
> 
> v13:
>  * Centralize SSEU normalization to make_rpcs.
>  * Type width checking (uAPI <-> implementation).
>  * Gen11 restrictions uAPI checks.
>  * Gen11 subslice count differences handling.
>  Chris Wilson:
>  * args->size handling fixes.
>  * Update context image from GGTT.
>  * Postpone context image update to pinning.
>  * Use i915_gem_active_raw instead of last_request_on_engine.
> 
> v14:
>  * Add activity tracker on intel_context to fix the lifetime issues
>    and simplify the code. (Chris Wilson)
> 
> v15:
>  * Fix context pin leak if no space in ring by simplifying the
>    context pinning sequence.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 303 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem_context.h |   6 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   4 +-
>  include/uapi/drm/i915_drm.h             |  43 ++++
>  4 files changed, 353 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index ca2c8fcd1090..aa1f34e63080 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -90,6 +90,7 @@
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
>  #include "i915_trace.h"
> +#include "intel_lrc_reg.h"
>  #include "intel_workarounds.h"
>  
>  #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
> @@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
>         return desc;
>  }
>  
> +static void intel_context_retire(struct i915_gem_active *active,
> +                                struct i915_request *rq)
> +{
> +       struct intel_context *ce = container_of(active, typeof(*ce), active);
> +
> +       intel_context_unpin(ce);
> +}
> +
>  static struct i915_gem_context *
>  __create_hw_context(struct drm_i915_private *dev_priv,
>                     struct drm_i915_file_private *file_priv)
> @@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>                 ce->gem_context = ctx;
>                 /* Use the whole device by default */
>                 ce->sseu = intel_device_default_sseu(dev_priv);
> +
> +               init_request_active(&ce->active, intel_context_retire);
>         }
>  
>         INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
> @@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>         return 0;
>  }
>  
> +static int get_sseu(struct i915_gem_context *ctx,
> +                   struct drm_i915_gem_context_param *args)
> +{
> +       struct drm_i915_gem_context_param_sseu user_sseu;
> +       struct intel_engine_cs *engine;
> +       struct intel_context *ce;
> +
> +       if (args->size == 0)
> +               goto out;
> +       else if (args->size < sizeof(user_sseu))
> +               return -EINVAL;
> +
> +       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +                          sizeof(user_sseu)))
> +               return -EFAULT;
> +
> +       if (user_sseu.rsvd1 || user_sseu.rsvd2)
> +               return -EINVAL;
> +
> +       engine = intel_engine_lookup_user(ctx->i915,
> +                                         user_sseu.class,
> +                                         user_sseu.instance);
> +       if (!engine)
> +               return -EINVAL;
> +
> +       ce = to_intel_context(ctx, engine);
> +
> +       user_sseu.slice_mask = ce->sseu.slice_mask;
> +       user_sseu.subslice_mask = ce->sseu.subslice_mask;
> +       user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> +       user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> +
> +       if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
> +                        sizeof(user_sseu)))
> +               return -EFAULT;
> +
> +out:
> +       args->size = sizeof(user_sseu);
> +
> +       return 0;
> +}
> +
>  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>                                     struct drm_file *file)
>  {
> @@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>         if (!ctx)
>                 return -ENOENT;
>  
> -       args->size = 0;

I've a slight preference to setting to 0 then overwriting it afterwards.

>         switch (args->param) {
>         case I915_CONTEXT_PARAM_BAN_PERIOD:
>                 ret = -EINVAL;
>                 break;
>         case I915_CONTEXT_PARAM_NO_ZEROMAP:
> +               args->size = 0;
>                 args->value = ctx->flags & CONTEXT_NO_ZEROMAP;
>                 break;
>         case I915_CONTEXT_PARAM_GTT_SIZE:
> +               args->size = 0;
> +
>                 if (ctx->ppgtt)
>                         args->value = ctx->ppgtt->vm.total;
>                 else if (to_i915(dev)->mm.aliasing_ppgtt)
>  int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>                                     struct drm_file *file)
>  {
> @@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>                                 ctx->sched.priority = priority;
>                 }
>                 break;
> -
> +       case I915_CONTEXT_PARAM_SSEU:
> +               ret = set_sseu(ctx, args);
> +               break;
>         default:
>                 ret = -EINVAL;
>                 break;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 79d2e8f62ad1..968e1d47d944 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -165,6 +165,12 @@ struct i915_gem_context {
>                 u64 lrc_desc;
>                 int pin_count;
>  
> +               /**
> +                * active: Active tracker for the external rq activity on this
> +                * intel_context object.
> +                */
> +               struct i915_gem_active active;
> +
>                 const struct intel_context_ops *ops;
>  
>                 /** sseu: Control eu/slice partitioning */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9709c1fbe836..3c85392a3109 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
>          * subslices are enabled, or a count between one and four on the first
>          * slice.
>          */
> -       if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) {
> +       if (IS_GEN11(dev_priv) &&
> +           slices == 1 &&
> +           subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) {

Sneaky. Is this a direct consequence of exposing sseu to the user, or
should we argue the protection is required irrespective of who fills sseu?

>                 GEM_BUG_ON(subslices & 1);
>  
>                 subslice_pg = false;

For the rq mechanics after all the hassle I gave Tvrtko,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

I didn't look closely at the validation layer.
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread
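
The reworked Gen11 condition in the intel_lrc.c hunk quoted above can be
read as a small predicate. The following is a sketch only: the toy_* names
are hypothetical, the function covers just the quoted condition and not the
rest of gen8_make_rpcs(), and it assumes that sseu->subslice_mask[0] in the
diff is the device's subslice mask for the first slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Population count of an 8-bit mask, standing in for the kernel's hweight8(). */
static unsigned int toy_hweight8(uint8_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= (uint8_t)(mask - 1))
		n++;

	return n;
}

/*
 * Returns true when the quoted condition would take the branch that sets
 * subslice_pg to false: exactly one slice is requested and the requested
 * subslice count exceeds min(4, half of the device's subslices on slice 0).
 */
static bool toy_gen11_skips_subslice_pg(uint8_t req_slice_mask,
					uint8_t req_subslice_mask,
					uint8_t dev_subslice_mask0)
{
	unsigned int slices = toy_hweight8(req_slice_mask);
	unsigned int subslices = toy_hweight8(req_subslice_mask);
	unsigned int half, limit;

	/* Assumption of this sketch: the device reports at least one subslice. */
	assert(dev_subslice_mask0 != 0);

	half = toy_hweight8(dev_subslice_mask0) / 2;
	limit = half < 4 ? half : 4;

	return slices == 1 && subslices > limit;
}
```

With an eight-subslice device mask the limit is four, so a four-subslice
request stays on the power-gated path; with a six-subslice device mask the
limit drops to three and the same request takes the branch.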

* ✗ Fi.CI.IGT: failure for Per context dynamic (sub)slice power-gating (rev2)
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2018-09-05 15:05 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-09-05 19:55 ` Patchwork
  2018-09-06 19:33 ` [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Chris Wilson
  11 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2018-09-05 19:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per context dynamic (sub)slice power-gating (rev2)
URL   : https://patchwork.freedesktop.org/series/48194/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4772_full -> Patchwork_10095_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_10095_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10095_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10095_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_ctx_param@invalid-param-get:
      shard-apl:          PASS -> FAIL
      shard-glk:          PASS -> FAIL
      shard-snb:          PASS -> FAIL
      shard-hsw:          PASS -> FAIL
      shard-kbl:          PASS -> FAIL

    
== Known issues ==

  Here are the changes found in Patchwork_10095_full that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@gem_ppgtt@blt-vs-render-ctx0:
      shard-kbl:          INCOMPLETE (fdo#106023, fdo#103665) -> PASS

    igt@kms_rotation_crc@primary-rotation-180:
      shard-kbl:          DMESG-WARN (fdo#105602, fdo#103558) -> PASS +9
      shard-apl:          DMESG-WARN (fdo#105602, fdo#103558) -> PASS +9

    
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#106023 https://bugs.freedesktop.org/show_bug.cgi?id=106023


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4772 -> Patchwork_10095

  CI_DRM_4772: 1351ee8f3aacdb8f4a71cd17a7035556065c59a9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4629: c3b6d69aa3dd2d1a6c1f2e787670a0aef78f2ea5 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10095: 8e7f97c0a4d7d79b7f434f2c18a1455fedfcc6a5 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10095/shards.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine
  2018-09-05 15:18   ` Chris Wilson
@ 2018-09-06  9:36     ` Tvrtko Ursulin
  0 siblings, 0 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-06  9:36 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 05/09/2018 16:18, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-09-05 15:22:18)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to expose the ability to reconfigure the slices, subslice and
>> eu per context and per engine. To facilitate that, store the current
>> configuration on the context for each engine, which is initially set
>> to the device default upon creation.
>>
>> v2: record sseu configuration per context & engine (Chris)
>>
>> v3: introduce the i915_gem_context_sseu to store powergating
>>      programming, sseu_dev_info has grown quite a bit (Lionel)
>>
>> v4: rename i915_gem_sseu into intel_sseu (Chris)
>>      use to_intel_context() (Chris)
>>
>> v5: More to_intel_context() (Tvrtko)
>>      Switch intel_sseu from union to struct (Tvrtko)
>>      Move context default sseu in existing loop (Chris)
>>
>> v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)
>>
>> Tvrtko Ursulin:
>>
>> v7:
>>   * Pass intel_sseu by pointer instead of value to make_rpcs.
>>   * Rebase for make_rpcs changes.
>>
>> v8:
>>   * Rebase for RPCS edit on pin.
>>
>> v9:
>>   * Rebase for context image setup changes.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> I feel this is substantially different (since I just outlined a v1!) to
> merit a
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> and probably deserves a different author. I think Lionel is still the
> principal author here, but Tvrtko has done a lot of refactoring and
> integrating in the new scheme.

Agreed Lionel is the real author here - mine were just small tweaks.

>> -static u32 make_rpcs(struct drm_i915_private *dev_priv);
>> +static u32 make_rpcs(struct drm_i915_private *dev_priv,
>> +                    struct intel_sseu *ctx_sseu);
>>   
>>   static struct intel_context *
>>   __execlists_context_pin(struct intel_engine_cs *engine,
>> @@ -1349,7 +1350,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>>          /* RPCS */
>>          if (engine->class == RENDER_CLASS) {
>>                  ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
>> -                                               make_rpcs(engine->i915);
>> +                                       make_rpcs(engine->i915, &ce->sseu);
> 
> We have different habits here; my vim config just gives this a single
> tab indent beyond the incomplete line. (Was going to say it earlier ;)

Not sure if my approach here is always consistent, but I *think* I first 
try to indent it to where it "looks good". If neither indentation looks 
decidedly better, then I push it so the end aligns with the wrap marker. 
In this particular case I wasn't too happy with any of the options. :(

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-05 15:21   ` Chris Wilson
@ 2018-09-06  9:41     ` Tvrtko Ursulin
  0 siblings, 0 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-06  9:41 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 05/09/2018 16:21, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-09-05 15:22:19)
>> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
>> -                    struct intel_sseu *ctx_sseu)
>> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
>> +                  struct intel_sseu *req_sseu)
> 
> Should we retrospectively make this const?

Can do, but generally I try to avoid it in kernel code since most of the 
time it is way more pain than benefit.

> (And any chance of a s/dev_priv/i915/?)

Will check if it is doable without much noise at any of the stages.

>>   {
>>          const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
>>          bool subslice_pg = sseu->has_subslice_pg;
>> -       u8 slices = hweight8(ctx_sseu->slice_mask);
>> -       u8 subslices = hweight8(ctx_sseu->subslice_mask);
>> +       struct intel_sseu ctx_sseu;
>> +       u8 slices, subslices;
>>          u32 rpcs = 0;
>>   
>> +       /*
>> +        * If i915/perf is active, we want a stable powergating configuration
>> +        * on the system. The most natural configuration to take in that case
>> +        * is the default (i.e maximum the hardware can do).
>> +        */
>> +       if (unlikely(dev_priv->perf.oa.exclusive_stream))
>> +               ctx_sseu = intel_device_default_sseu(dev_priv);
>> +       else
>> +               ctx_sseu = *req_sseu;
> 
> :(
> 
> I'm not sure if I can suggest anything better, but this does feel like a
> layering violation.
> 
> It makes sense which makes it only feel worse.

It used to be a helper which applied the adjustment, but I wasn't happy 
with how callers then had to know to call the helper, so I decided that 
handling it at the core is better in more than one way.

I think the bottom line is that there is a fundamental interaction between 
the two, so some layering violation has to happen.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 37+ messages in thread
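
The selection logic discussed in this reply (use the device default while
an i915/perf exclusive stream is active, otherwise honour the request) can
be sketched in isolation as follows. The toy_* names and the two-field
struct are hypothetical simplifications and are not the driver's
struct intel_sseu or its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a slice/subslice configuration. */
struct toy_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/*
 * Pick the configuration that would actually be programmed: while an OA
 * stream is active the device default wins, so that the power-gating
 * configuration stays stable; otherwise the per-context request is used.
 */
static struct toy_sseu toy_effective_sseu(bool oa_stream_active,
					  struct toy_sseu device_default,
					  const struct toy_sseu *requested)
{
	assert(requested != NULL);

	return oa_stream_active ? device_default : *requested;
}
```

Doing the override inside the one function that builds the register value
means callers cannot forget to apply it, which is the trade-off Tvrtko
describes against the layering concern raised in the review.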

* Re: [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-09-05 15:29   ` Chris Wilson
@ 2018-09-06  9:50     ` Tvrtko Ursulin
  2018-09-06  9:54       ` Chris Wilson
  2018-09-06  9:58       ` Lionel Landwerlin
  0 siblings, 2 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-06  9:50 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 05/09/2018 16:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-09-05 15:22:21)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Now this looks nothing like my first suggestion!
> 
> I think Tvrtko should stand as the author of the final mechanism, I
> think it is substantially different from the submission method first
> done by Lionel.

Okay, I'll relieve you of authorship on this one. Not sure whether to go 
with Lionel or me, but I'll think of something.

>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users), forcing
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>          struct drm_i915_gem_context_param arg;
>>          struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                          .instance = 0, };
>>
>>          memset(&arg, 0, sizeof(arg));
>>          arg.ctx_id = ctx;
>>          arg.param = I915_CONTEXT_PARAM_SSEU;
>>          arg.size = sizeof(sseu);
>>          arg.value = (uintptr_t) &sseu;
>>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>                  sseu.subslice_mask = 0;
>>
>>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>          }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
>>
>> v7: Synchronize the requests following a powergating setting change using a global
>>      dependency (Chris)
>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>      Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
>>
>> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      s/dev_priv/i915/ (Tvrtko)
>>      Change uapi class/instance fields to u16 (Tvrtko)
>>      Bump mask fields to 64bits (Lionel)
>>      Don't return EPERM when dynamic sseu is disabled (Tvrtko)
>>
>> v9: Import context image into kernel context's ppgtt only when
>>      reconfiguring powergated slice/subslices (Chris)
>>      Use aliasing ppgtt when needed (Michel)
>>
>> Tvrtko Ursulin:
>>
>> v10:
>>   * Update for upstream changes.
>>   * Request submit needs a RPM reference.
>>   * Reject on !FULL_PPGTT for simplicity.
>>   * Pull out get/set param to helpers for readability and less indent.
>>   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>>     on the same timeline and avoid GEM_BUG_ON.
>>   * No need to explicitly assign a NULL pointer to engine in legacy mode.
>>   * No need to move gen8_make_rpcs up.
>>   * Factored out global barrier as prep patch.
>>   * Allow to only CAP_SYS_ADMIN if !Gen11.
>>
>> v11:
>>   * Remove engine vfunc in favour of local helper. (Chris Wilson)
>>   * Stop retiring requests before updates since it is not needed
>>     (Chris Wilson)
>>   * Implement direct CPU update path for idle contexts. (Chris Wilson)
>>   * Left side dependency needs only be on the same context timeline.
>>     (Chris Wilson)
>>   * It is sufficient to order the timeline. (Chris Wilson)
>>   * Reject !RCS configuration attempts with -ENODEV for now.
>>
>> v12:
>>   * Rebase for make_rpcs.
>>
>> v13:
>>   * Centralize SSEU normalization to make_rpcs.
>>   * Type width checking (uAPI <-> implementation).
>>   * Gen11 restrictions uAPI checks.
>>   * Gen11 subslice count differences handling.
>>   Chris Wilson:
>>   * args->size handling fixes.
>>   * Update context image from GGTT.
>>   * Postpone context image update to pinning.
>>   * Use i915_gem_active_raw instead of last_request_on_engine.
>>
>> v14:
>>   * Add activity tracker on intel_context to fix the lifetime issues
>>     and simplify the code. (Chris Wilson)
>>
>> v15:
>>   * Fix context pin leak if no space in ring by simplifying the
>>     context pinning sequence.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Issue: https://github.com/intel/media-driver/issues/267
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 303 +++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/i915_gem_context.h |   6 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   4 +-
>>   include/uapi/drm/i915_drm.h             |  43 ++++
>>   4 files changed, 353 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index ca2c8fcd1090..aa1f34e63080 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -90,6 +90,7 @@
>>   #include <drm/i915_drm.h>
>>   #include "i915_drv.h"
>>   #include "i915_trace.h"
>> +#include "intel_lrc_reg.h"
>>   #include "intel_workarounds.h"
>>   
>>   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
>> @@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
>>          return desc;
>>   }
>>   
>> +static void intel_context_retire(struct i915_gem_active *active,
>> +                                struct i915_request *rq)
>> +{
>> +       struct intel_context *ce = container_of(active, typeof(*ce), active);
>> +
>> +       intel_context_unpin(ce);
>> +}
>> +
>>   static struct i915_gem_context *
>>   __create_hw_context(struct drm_i915_private *dev_priv,
>>                      struct drm_i915_file_private *file_priv)
>> @@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>>                  ce->gem_context = ctx;
>>                  /* Use the whole device by default */
>>                  ce->sseu = intel_device_default_sseu(dev_priv);
>> +
>> +               init_request_active(&ce->active, intel_context_retire);
>>          }
>>   
>>          INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>> @@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>          return 0;
>>   }
>>   
>> +static int get_sseu(struct i915_gem_context *ctx,
>> +                   struct drm_i915_gem_context_param *args)
>> +{
>> +       struct drm_i915_gem_context_param_sseu user_sseu;
>> +       struct intel_engine_cs *engine;
>> +       struct intel_context *ce;
>> +
>> +       if (args->size == 0)
>> +               goto out;
>> +       else if (args->size < sizeof(user_sseu))
>> +               return -EINVAL;
>> +
>> +       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>> +                          sizeof(user_sseu)))
>> +               return -EFAULT;
>> +
>> +       if (user_sseu.rsvd1 || user_sseu.rsvd2)
>> +               return -EINVAL;
>> +
>> +       engine = intel_engine_lookup_user(ctx->i915,
>> +                                         user_sseu.class,
>> +                                         user_sseu.instance);
>> +       if (!engine)
>> +               return -EINVAL;
>> +
>> +       ce = to_intel_context(ctx, engine);
>> +
>> +       user_sseu.slice_mask = ce->sseu.slice_mask;
>> +       user_sseu.subslice_mask = ce->sseu.subslice_mask;
>> +       user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
>> +       user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
>> +
>> +       if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
>> +                        sizeof(user_sseu)))
>> +               return -EFAULT;
>> +
>> +out:
>> +       args->size = sizeof(user_sseu);
>> +
>> +       return 0;
>> +}
>> +
>>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>                                      struct drm_file *file)
>>   {
>> @@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>          if (!ctx)
>>                  return -ENOENT;
>>   
>> -       args->size = 0;
> 
> I've a slight preference to setting to 0 then overwriting it afterwards.

We can't use/validate it then. An alternative would be to just not clear 
it for the parameters where it is not used? In other words, the clearing 
would go away from the case branches completely. Does the ABI depend on it 
being zeroed on return from get_param? It would be strange.

> 
>>          switch (args->param) {
>>          case I915_CONTEXT_PARAM_BAN_PERIOD:
>>                  ret = -EINVAL;
>>                  break;
>>          case I915_CONTEXT_PARAM_NO_ZEROMAP:
>> +               args->size = 0;
>>                  args->value = ctx->flags & CONTEXT_NO_ZEROMAP;
>>                  break;
>>          case I915_CONTEXT_PARAM_GTT_SIZE:
>> +               args->size = 0;
>> +
>>                  if (ctx->ppgtt)
>>                          args->value = ctx->ppgtt->vm.total;
>>                  else if (to_i915(dev)->mm.aliasing_ppgtt)
>>   int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>                                      struct drm_file *file)
>>   {
>> @@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>                                  ctx->sched.priority = priority;
>>                  }
>>                  break;
>> -
>> +       case I915_CONTEXT_PARAM_SSEU:
>> +               ret = set_sseu(ctx, args);
>> +               break;
>>          default:
>>                  ret = -EINVAL;
>>                  break;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
>> index 79d2e8f62ad1..968e1d47d944 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
>> @@ -165,6 +165,12 @@ struct i915_gem_context {
>>                  u64 lrc_desc;
>>                  int pin_count;
>>   
>> +               /**
>> +                * active: Active tracker for the external rq activity on this
>> +                * intel_context object.
>> +                */
>> +               struct i915_gem_active active;
>> +
>>                  const struct intel_context_ops *ops;
>>   
>>                  /** sseu: Control eu/slice partitioning */
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9709c1fbe836..3c85392a3109 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
>>           * subslices are enabled, or a count between one and four on the first
>>           * slice.
>>           */
>> -       if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) {
>> +       if (IS_GEN11(dev_priv) &&
>> +           slices == 1 &&
>> +           subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) {
> 
> Sneaky. Is this a direct consequence of exposing sseu to the user, or
> should we argue the protection is required irrespective of who fills sseu?

The former. When sseu is not exposed to the user, these invalid/impossible 
combinations cannot occur. I don't feel we should validate the SKU 
configuration detection here; we should trust it.
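
For reference, the new condition can be exercised in isolation with a small 
sketch like the one below. The helper names (popcount8(), 
gen11_takes_special_branch()) are made up for illustration; popcount8() 
stands in for hweight8() and the explicit minimum stands in for min_t(). It 
only models the predicate from the hunk above, not the programming that 
follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Population count of an 8-bit mask; stands in for the kernel's hweight8(). */
unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v &= (uint8_t)(v - 1))
		n++;

	return n;
}

/*
 * Models the Gen11 check: a single-slice request takes the special branch
 * only when it asks for more subslices than min(4, half of what the SKU has).
 */
bool gen11_takes_special_branch(uint8_t slices, uint8_t subslices,
				uint8_t hw_subslice_mask)
{
	unsigned int half, limit;

	assert(hw_subslice_mask != 0);	/* the SKU always has some subslices */

	half = popcount8(hw_subslice_mask) / 2;
	limit = half < 4 ? half : 4;	/* min_t(u8, 4, half) */

	return slices == 1 && subslices > limit;
}
```

On an eight-subslice part a request for exactly four subslices no longer takes 
the branch (the old "subslices >= 4" test did), while a request for all eight 
still does.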

Regards,

Tvrtko

> 
>>                  GEM_BUG_ON(subslices & 1);
>>   
>>                  subslice_pg = false;
> 
> For the rq mechanics after all the hassle I gave Tvrtko,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> I didn't look closely at the validation layer.
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-09-06  9:50     ` Tvrtko Ursulin
@ 2018-09-06  9:54       ` Chris Wilson
  2018-09-06  9:58       ` Lionel Landwerlin
  1 sibling, 0 replies; 37+ messages in thread
From: Chris Wilson @ 2018-09-06  9:54 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-06 10:50:32)
> 
> On 05/09/2018 16:29, Chris Wilson wrote:
> > I've a slight preference to setting to 0 then overwriting it afterwards.
> 
> We can't use/validate it then. An alternative would be to simply not clear 
> it for parameters where it is not used. In other words, the clearing would 
> go away from the case branches completely. Does the ABI depend on it being 
> zeroed on return from get_param? That would be strange.

Bah, I was just looking at the patch hoping for the best (without
thinking). I've some recollection of doing something similar and coming
to the same conclusion.
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
  2018-09-05 15:21   ` Chris Wilson
@ 2018-09-06  9:57   ` Lionel Landwerlin
  2018-09-06 10:10     ` Chris Wilson
  1 sibling, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-06  9:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On 05/09/2018 15:22, Tvrtko Ursulin wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> If some of the contexts submitting workloads to the GPU have been
> configured to shut down slices/subslices, we might lose the NOA
> configurations written in the NOA muxes.
>
> One possible solution to this problem is to reprogram the NOA muxes
> when we switch to a new context. We initially tried this in the
> workaround batchbuffer but some concerns were raised about the cost
> of reprogramming at every context switch. This solution is also not
> without consequences from the userspace point of view. Reprogramming
> of the muxes can only happen once the powergating configuration has
> changed (which happens after context switch). This means for a window
> of time during the recording, counters recorded by the OA unit might
> be invalid. This requires userspace dealing with OA reports to discard
> the invalid values.
>
> Minimizing the reprogramming could be implemented by tracking the
> last programmed configuration somewhere in GGTT and using MI_PREDICATE
> to discard some of the programming commands, but the command streamer
> would still have to parse all the MI_LRI instructions in the
> workaround batchbuffer.
>
> Another solution, which this change implements, is to simply disregard
> the user requested configuration for the period of time when i915/perf
> is active. There is no known issue with this apart from a performance
> penalty for some media workloads that benefit from running on a
> partially powergated GPU. We already prevent RC6 from affecting the
> programming so it doesn't sound completely unreasonable to hold on
> powergating for the same reason.
>
> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>
> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>      More to_intel_context() (Tvrtko)
>      s/dev_priv/i915/ (Tvrtko)
>
> Tvrtko Ursulin:
>
> v4:
>   * Rebase for make_rpcs changes.
>
> v5:
>   * Apply OA restriction from make_rpcs directly.
>
> v6:
>   * Rebase for context image setup changes.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>   drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>   3 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index ccb20230df2c..dd65b72bddd4 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
> +
> +	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(dev_priv,
> +			       &to_intel_context(ctx,
> +						 dev_priv->engine[RCS])->sseu));


I think there is one issue I missed on the previous iterations of this 
patch.

This gen8_update_reg_state_unlocked() is called when the GPU is parked 
on the kernel context.

It's supposed to update all contexts, but I think we might not be able 
to update the kernel context image while the GPU is using it.

Context save might happen after we edited the image and that would 
overwrite the values we just put in there.


The OA config is emitted by editing the context image in this function, 
but also through the ring buffer in 
gen8_switch_to_updated_kernel_context() for the kernel context.

Since we can't have a context modify its own RPCS value, we'll have to 
resort to yet another context to do that for the kernel context.


I remember having a patch that created yet another kernel context (let's 
call it the RPCS editing context), which is used to reconfigure RPCS for 
every context but itself, and then have the kernel context reconfigure 
this RPCS editing context.

Or alternatively not do anything to it, because it's only going to run 
to edit other contexts at a time when we don't care about power 
configuration stability.


-

Lionel


>   }
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8a477e43dbca..9709c1fbe836 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1305,9 +1305,6 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>   	return i915_vma_pin(vma, 0, 0, flags);
>   }
>   
> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
> -		     struct intel_sseu *ctx_sseu);
> -
>   static struct intel_context *
>   __execlists_context_pin(struct intel_engine_cs *engine,
>   			struct i915_gem_context *ctx,
> @@ -1350,7 +1347,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   	/* RPCS */
>   	if (engine->class == RENDER_CLASS) {
>   		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
> -					make_rpcs(engine->i915, &ce->sseu);
> +					gen8_make_rpcs(engine->i915, &ce->sseu);
>   	}
>   
>   	ce->state->obj->pin_global++;
> @@ -2494,15 +2491,28 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
> -		     struct intel_sseu *ctx_sseu)
> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
> +		   struct intel_sseu *req_sseu)
>   {
>   	const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
>   	bool subslice_pg = sseu->has_subslice_pg;
> -	u8 slices = hweight8(ctx_sseu->slice_mask);
> -	u8 subslices = hweight8(ctx_sseu->subslice_mask);
> +	struct intel_sseu ctx_sseu;
> +	u8 slices, subslices;
>   	u32 rpcs = 0;
>   
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. The most natural configuration to take in that case
> +	 * is the default (i.e maximum the hardware can do).
> +	 */
> +	if (unlikely(dev_priv->perf.oa.exclusive_stream))
> +		ctx_sseu = intel_device_default_sseu(dev_priv);
> +	else
> +		ctx_sseu = *req_sseu;
> +
> +	slices = hweight8(ctx_sseu.slice_mask);
> +	subslices = hweight8(ctx_sseu.subslice_mask);
> +
>   	/*
>   	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
>   	 * wide and Icelake has up to eight subslices, specfial programming is
> @@ -2572,13 +2582,13 @@ static u32 make_rpcs(struct drm_i915_private *dev_priv,
>   	if (sseu->has_eu_pg) {
>   		u32 val;
>   
> -		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
> +		val = ctx_sseu.min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
>   		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
>   		val &= GEN8_RPCS_EU_MIN_MASK;
>   
>   		rpcs |= val;
>   
> -		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
> +		val = ctx_sseu.max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
>   		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
>   		val &= GEN8_RPCS_EU_MAX_MASK;
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f5a5502ecf70..11da6fc0002d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
>   void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
>   
> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
> +		   struct intel_sseu *ctx_sseu);
> +
>   #endif /* _INTEL_LRC_H_ */



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-09-06  9:50     ` Tvrtko Ursulin
  2018-09-06  9:54       ` Chris Wilson
@ 2018-09-06  9:58       ` Lionel Landwerlin
  1 sibling, 0 replies; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-06  9:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 06/09/2018 10:50, Tvrtko Ursulin wrote:
>
> On 05/09/2018 16:29, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2018-09-05 15:22:21)
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> Now this looks nothing like my first suggestion!
>>
>> I think Tvrtko should stand as the author of the final mechanism, I
>> think it is substantially different from the submission method first
>> done by Lionel.
>
> Okay I'll relieve you from authorship on this one. Not sure who to go 
> with between Lionel and me, but I'll think of something.

Feel free to take over :)



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-06  9:57   ` Lionel Landwerlin
@ 2018-09-06 10:10     ` Chris Wilson
  2018-09-06 10:18       ` Lionel Landwerlin
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-06 10:10 UTC (permalink / raw)
  To: Intel-gfx, Lionel Landwerlin, Tvrtko Ursulin

Quoting Lionel Landwerlin (2018-09-06 10:57:47)
> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index ccb20230df2c..dd65b72bddd4 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
> >   
> >               CTX_REG(reg_state, state_offset, flex_regs[i], value);
> >       }
> > +
> > +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> > +             gen8_make_rpcs(dev_priv,
> > +                            &to_intel_context(ctx,
> > +                                              dev_priv->engine[RCS])->sseu));
> 
> 
> I think there is one issue I missed on the previous iterations of this 
> patch.
> 
> This gen8_update_reg_state_unlocked() is called when the GPU is parked 
> on the kernel context.
> 
> It's supposed to update all contexts, but I think we might not be able 
> to update the kernel context image while the GPU is using it.

The kernel context is only ever taken in extremis (you are either
parking or stalling userspace) so I don't care.
 
> Context save might happen after we edited the image and that would 
> overwrite the values we just put in there.
> 
> 
> The OA config is emitted by editing the context image in this function, 
> but also through the ring buffer in 
> gen8_switch_to_updated_kernel_context() for the kernel context.
> 
> Since we can't have a context modify its own RPCS value, we'll have to 
> resort to yet another context to do that for the kernel context.
> 
> 
> I remember having a patch that created yet another kernel context (let's 
> call it the RPCS editing context), which is used to reconfigure RPCS for 
> every context but itself, and then have the kernel context reconfigure 
> this RPCS editing context.
> 
> Or alternatively not do anything to it, because it's only going to run 
> to edit other contexts at a time when we don't care about power 
> configuration stability.

Exactly.
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-06 10:10     ` Chris Wilson
@ 2018-09-06 10:18       ` Lionel Landwerlin
  2018-09-06 10:22         ` Chris Wilson
  0 siblings, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-06 10:18 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 06/09/2018 11:10, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index ccb20230df2c..dd65b72bddd4 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>    
>>>                CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>>        }
>>> +
>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>> +             gen8_make_rpcs(dev_priv,
>>> +                            &to_intel_context(ctx,
>>> +                                              dev_priv->engine[RCS])->sseu));
>>
>> I think there is one issue I missed on the previous iterations of this
>> patch.
>>
>> This gen8_update_reg_state_unlocked() is called when the GPU is parked
>> on the kernel context.
>>
>> It's supposed to update all contexts, but I think we might not be able
>> to update the kernel context image while the GPU is using it.
> The kernel context is only ever taken in extremis (you are either
> parking or stalling userspace) so I don't care.


The patch exposing the RPCS configuration to userspace will make use of 
the kernel context while OA/perf is enabled. Even if it reprograms the 
locked value that will break the power configuration stability on Gen11 
(because the locked configuration will be different from the kernel 
context configuration).


-

Lionel



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-06 10:18       ` Lionel Landwerlin
@ 2018-09-06 10:22         ` Chris Wilson
  2018-09-06 10:36           ` Lionel Landwerlin
  0 siblings, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-06 10:22 UTC (permalink / raw)
  To: Intel-gfx, Lionel Landwerlin, Tvrtko Ursulin

Quoting Lionel Landwerlin (2018-09-06 11:18:01)
> On 06/09/2018 11:10, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2018-09-06 10:57:47)
> >> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
> >>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >>> index ccb20230df2c..dd65b72bddd4 100644
> >>> --- a/drivers/gpu/drm/i915/i915_perf.c
> >>> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >>> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
> >>>    
> >>>                CTX_REG(reg_state, state_offset, flex_regs[i], value);
> >>>        }
> >>> +
> >>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> >>> +             gen8_make_rpcs(dev_priv,
> >>> +                            &to_intel_context(ctx,
> >>> +                                              dev_priv->engine[RCS])->sseu));
> >>
> >> I think there is one issue I missed on the previous iterations of this
> >> patch.
> >>
> >> This gen8_update_reg_state_unlocked() is called when the GPU is parked
> >> on the kernel context.
> >>
> >> It's supposed to update all contexts, but I think we might not be able
> >> to update the kernel context image while the GPU is using it.
> > The kernel context is only ever taken in extremis (you are either
> > parking or stalling userspace) so I don't care.
> 
> 
> The patch exposing the RPCS configuration to userspace will make use of 
> the kernel context while OA/perf is enabled. Even if it reprograms the 
> locked value that will break the power configuration stability on Gen11 
> (because the locked configuration will be different from the kernel 
> context configuration).

Sure, but as you point out that's only on changing configuration.

What's missing in the patch is that we only bail early if the new sseu
matches the ce->sseu, but that doesn't necessarily match what's in the
context image due to OA. (Or maybe I missed the conversion to rpcs value
and checking.)
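
A rough sketch of the kind of check meant here, with made-up names throughout 
(struct sseu, effective_sseu(), needs_reprogram() are illustrative only and 
are not taken from the patch). The assumption is that the comparison is done 
against what would actually be written to the context image, i.e. the device 
default while OA is active, rather than against the requested configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct intel_sseu. */
struct sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/* What would be programmed: the device default while OA is active. */
struct sseu effective_sseu(struct sseu requested, struct sseu dflt,
			   bool oa_active)
{
	assert(dflt.slice_mask != 0);	/* the default always enables something */

	return oa_active ? dflt : requested;
}

bool sseu_equal(struct sseu a, struct sseu b)
{
	return a.slice_mask == b.slice_mask &&
	       a.subslice_mask == b.subslice_mask;
}

/*
 * image: configuration currently programmed in the context image.
 * Returns true only when the value to be written differs from the image.
 */
bool needs_reprogram(struct sseu image, struct sseu requested,
		     struct sseu dflt, bool oa_active)
{
	return !sseu_equal(image, effective_sseu(requested, dflt, oa_active));
}
```

With OA active and the image already holding the default, a new user request 
does not trigger a rewrite, since the same value would be written again.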
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-06 10:22         ` Chris Wilson
@ 2018-09-06 10:36           ` Lionel Landwerlin
  2018-09-07  8:26             ` Tvrtko Ursulin
  0 siblings, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-06 10:36 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 06/09/2018 11:22, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>> On 06/09/2018 11:10, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>     
>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>>>>         }
>>>>> +
>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>> +             gen8_make_rpcs(dev_priv,
>>>>> +                            &to_intel_context(ctx,
>>>>> +                                              dev_priv->engine[RCS])->sseu));
>>>> I think there is one issue I missed on the previous iterations of this
>>>> patch.
>>>>
>>>> This gen8_update_reg_state_unlocked() is called when the GPU is parked
>>>> on the kernel context.
>>>>
>>>> It's supposed to update all contexts, but I think we might not be able
>>>> to update the kernel context image while the GPU is using it.
>>> The kernel context is only ever taken in extremis (you are either
>>> parking or stalling userspace) so I don't care.
>>
>> The patch exposing the RPCS configuration to userspace will make use of
>> the kernel context while OA/perf is enabled. Even if it reprograms the
>> locked value that will break the power configuration stability on Gen11
>> (because the locked configuration will be different from the kernel
>> context configuration).
> Sure, but as you point out that's only on changing configuration.
>
> What's missing in the patch is that we only bail early if the new sseu
> matches the ce->sseu, but that doesn't necessarily match whats in the
> context due to OA. (Or maybe I missed the conversion to rpcs value and
> checking.)
> -Chris
>

Yep, because gen8_make_rpcs() post-processes the values stored at the
gem context level, we risk rerunning the kernel context to write the
existing value.
Sorry this is all so messy :(

-
Lionel


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v11 0/7] Per context dynamic (sub)slice power-gating
  2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2018-09-05 19:55 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-09-06 19:33 ` Chris Wilson
  2018-09-06 19:52   ` Chris Wilson
  11 siblings, 1 reply; 37+ messages in thread
From: Chris Wilson @ 2018-09-06 19:33 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-05 15:22:15)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Updated series after continuing Lionel's work.
> 
> Userspace for the feature is the media-driver project on GitHub. Please see
> https://github.com/intel/media-driver/pull/271/commits.
> 
> No headline changes this time.
> 
> Some review feedback, some refactoring, some patches got merged and two new
> appeared to help with the simplified implementation and also lock SSEU config
> to a workable set on Icelake.

Scanning through the buglist, caught this little gem
https://bugs.freedesktop.org/show_bug.cgi?id=103484
in which Lionel mentioned that he wanted to fix up as part of this
series. The conclusion is that we can remove the i915_sseu_status
debugfs if we are happy with the testing and runtime adjustment.
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v11 0/7] Per context dynamic (sub)slice power-gating
  2018-09-06 19:33 ` [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Chris Wilson
@ 2018-09-06 19:52   ` Chris Wilson
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Wilson @ 2018-09-06 19:52 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Chris Wilson (2018-09-06 20:33:35)
> Quoting Tvrtko Ursulin (2018-09-05 15:22:15)
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Updated series after continuing Lionel's work.
> > 
> > Userspace for the feature is the media-driver project on GitHub. Please see
> > https://github.com/intel/media-driver/pull/271/commits.
> > 
> > No headline changes this time.
> > 
> > Some review feedback, some refactoring, some patches got merged and two new
> > appeared to help with the simplified implementation and also lock SSEU config
> > to a workable set on Icelake.
> 
> Scanning through the buglist, caught this little gem
> https://bugs.freedesktop.org/show_bug.cgi?id=103484
> in which Lionel mentioned that he wanted to fix up as part of this
> series. The conclusion is that we can remove the i915_sseu_status
> debugfs if we are happy with the testing and runtime adjustment.

Please also note https://bugs.freedesktop.org/show_bug.cgi?id=100899 in
the changelog somewhere and update when done. Thanks,
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-06 10:36           ` Lionel Landwerlin
@ 2018-09-07  8:26             ` Tvrtko Ursulin
  2018-09-07  8:59               ` Chris Wilson
  2018-09-07  9:23               ` Lionel Landwerlin
  0 siblings, 2 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-07  8:26 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 06/09/2018 11:36, Lionel Landwerlin wrote:
> On 06/09/2018 11:22, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>
>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>> configurations written in the NOA muxes.
>>>>>>
>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>> when we switch to a new context. We initially tried this in the
>>>>>> workaround batchbuffer but some concerns where raised about the cost
>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>> without consequences from the userspace point of view. Reprogramming
>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>> changed (which happens after context switch). This means for a window
>>>>>> of time during the recording, counters recorded by the OA unit might
>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>> discard
>>>>>> the invalid values.
>>>>>>
>>>>>> Minimizing the reprogramming could be implemented by tracking of the
>>>>>> last programmed configuration somewhere in GGTT and use MI_PREDICATE
>>>>>> to discard some of the programming commands, but the command streamer
>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>> workaround batchbuffer.
>>>>>>
>>>>>> Another solution, which this change implements, is to simply 
>>>>>> disregard
>>>>>> the user requested configuration for the period of time when 
>>>>>> i915/perf
>>>>>> is active. There is no known issue with this apart from a performance
>>>>>> penality for some media workloads that benefit from running on a
>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>> powergating for the same reason.
>>>>>>
>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>
>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>
>>>>>> Tvrtko Ursulin:
>>>>>>
>>>>>> v4:
>>>>>>     * Rebase for make_rpcs changes.
>>>>>>
>>>>>> v5:
>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>
>>>>>> v6:
>>>>>>     * Rebase for context image setup changes.
>>>>>>
>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>> ++++++++++++++++++++----------
>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>> value);
>>>>>>         }
>>>>>> +
>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>> +                            &to_intel_context(ctx,
>>>>>> +                                              
>>>>>> dev_priv->engine[RCS])->sseu));
>>>>> I think there is one issue I missed on the previous iterations of this
>>>>> patch.
>>>>>
>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is parked
>>>>> on the kernel context.
>>>>>
>>>>> It's supposed to update all contexts, but I think we might not be able
>>>>> to update the kernel context image while the GPU is using it.
>>>> The kernel context is only ever taken in extremis (you are either
>>>> parking or stalling userspace) so I don't care.
>>>
>>> The patch exposing the RPCS configuration to userspace will make use of
>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>> locked value that will break the power configuration stability on Gen11
>>> (because the locked configuration will be different from the kernel
>>> context configuration).
>> Sure, but as you point out that's only on changing configuration.
>>
>> What's missing in the patch is that we only bail early if the new sseu
>> matches the ce->sseu, but that doesn't necessarily match whats in the
>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>> checking.)
>> -Chris
>>
> 
> Yep, because the gen8_make_rpcs() post processes the values store at the 
> gem context level, we risk rerunning the kernel context to write the 
> exiting value.
> Sorry this is all so messy :(

Let's see if I managed to follow here.

The current code indeed bails out at the set ctx param level if the
requested state matches the ce->state. My thinking was that ce->state is
the master state and whatever happens in "post processing" via
gen8_make_rpcs should be hidden from it, since the design is that
i915_perf.c will re-configure all contexts when the OA active status
changes (in either direction).

So I don't see a problem in those two interactions.
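
To illustrate the split I have in mind, here is a toy model of the two paths. It is only a sketch and not the driver code: every name in it (model_sseu, model_ctx, model_set_sseu, model_oa_toggle, ...) is made up for illustration, and the "OA wins while active" rule is my simplified reading of what gen8_make_rpcs does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified SSEU request: just two masks. */
struct model_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/* Hypothetical context: the master (requested) state and the image value. */
struct model_ctx {
	struct model_sseu requested;  /* what set-param stored */
	struct model_sseu programmed; /* what was last written to the image */
};

static bool model_sseu_equal(const struct model_sseu *a,
			     const struct model_sseu *b)
{
	return a->slice_mask == b->slice_mask &&
	       a->subslice_mask == b->subslice_mask;
}

/* Post-processing: while OA is active the locked configuration wins. */
static struct model_sseu model_make_rpcs(const struct model_sseu *req,
					 bool oa_active,
					 const struct model_sseu *oa_locked)
{
	assert(req && oa_locked);
	return oa_active ? *oa_locked : *req;
}

/* Set-param path: returns true if the image had to be rewritten. */
static bool model_set_sseu(struct model_ctx *ctx,
			   const struct model_sseu *new_req,
			   bool oa_active,
			   const struct model_sseu *oa_locked)
{
	/* Bail early on the requested state only; OA is invisible here. */
	if (model_sseu_equal(&ctx->requested, new_req))
		return false;

	ctx->requested = *new_req;
	ctx->programmed = model_make_rpcs(new_req, oa_active, oa_locked);
	return true;
}

/* OA toggle path: re-derive the image value from the master state. */
static void model_oa_toggle(struct model_ctx *ctx, bool oa_active,
			    const struct model_sseu *oa_locked)
{
	ctx->programmed = model_make_rpcs(&ctx->requested, oa_active,
					  oa_locked);
}
```

In this model the requested state never sees the OA override, so comparing against it in the set-param path stays valid whichever way OA was last toggled.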

Apart from one: get_param_sseu will lie a bit. We can discuss this one
more. At one point I suggested we have two sets of masks in the uAPI,
requested and active in a way, so userspace could query what it set and
what is actually active.

Now the second issue is whether i915_perf.c is able to reprogram the
kernel context config.

Here it's true, it will write to the context image and that will get
overwritten by context save.

If that is a problem for OA, I was initially wondering if a throw-away
second "kernel" context could be used to re-program the real one, but
perhaps even simpler - what about an mmio write to program the RPCS
while the kernel context is active?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-07  8:26             ` Tvrtko Ursulin
@ 2018-09-07  8:59               ` Chris Wilson
  2018-09-07  9:23               ` Lionel Landwerlin
  1 sibling, 0 replies; 37+ messages in thread
From: Chris Wilson @ 2018-09-07  8:59 UTC (permalink / raw)
  To: Intel-gfx, Lionel Landwerlin, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-09-07 09:26:27)
> 
> On 06/09/2018 11:36, Lionel Landwerlin wrote:
> > On 06/09/2018 11:22, Chris Wilson wrote:
> >> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
> >>> On 06/09/2018 11:10, Chris Wilson wrote:
> >>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
> >>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
> >>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >>>>>>
> >>>>>> If some of the contexts submitting workloads to the GPU have been
> >>>>>> configured to shutdown slices/subslices, we might loose the NOA
> >>>>>> configurations written in the NOA muxes.
> >>>>>>
> >>>>>> One possible solution to this problem is to reprogram the NOA muxes
> >>>>>> when we switch to a new context. We initially tried this in the
> >>>>>> workaround batchbuffer but some concerns where raised about the cost
> >>>>>> of reprogramming at every context switch. This solution is also not
> >>>>>> without consequences from the userspace point of view. Reprogramming
> >>>>>> of the muxes can only happen once the powergating configuration has
> >>>>>> changed (which happens after context switch). This means for a window
> >>>>>> of time during the recording, counters recorded by the OA unit might
> >>>>>> be invalid. This requires userspace dealing with OA reports to 
> >>>>>> discard
> >>>>>> the invalid values.
> >>>>>>
> >>>>>> Minimizing the reprogramming could be implemented by tracking of the
> >>>>>> last programmed configuration somewhere in GGTT and use MI_PREDICATE
> >>>>>> to discard some of the programming commands, but the command streamer
> >>>>>> would still have to parse all the MI_LRI instructions in the
> >>>>>> workaround batchbuffer.
> >>>>>>
> >>>>>> Another solution, which this change implements, is to simply 
> >>>>>> disregard
> >>>>>> the user requested configuration for the period of time when 
> >>>>>> i915/perf
> >>>>>> is active. There is no known issue with this apart from a performance
> >>>>>> penality for some media workloads that benefit from running on a
> >>>>>> partially powergated GPU. We already prevent RC6 from affecting the
> >>>>>> programming so it doesn't sound completely unreasonable to hold on
> >>>>>> powergating for the same reason.
> >>>>>>
> >>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
> >>>>>>
> >>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
> >>>>>>        More to_intel_context() (Tvrtko)
> >>>>>>        s/dev_priv/i915/ (Tvrtko)
> >>>>>>
> >>>>>> Tvrtko Ursulin:
> >>>>>>
> >>>>>> v4:
> >>>>>>     * Rebase for make_rpcs changes.
> >>>>>>
> >>>>>> v5:
> >>>>>>     * Apply OA restriction from make_rpcs directly.
> >>>>>>
> >>>>>> v6:
> >>>>>>     * Rebase for context image setup changes.
> >>>>>>
> >>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
> >>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
> >>>>>> ++++++++++++++++++++----------
> >>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
> >>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> >>>>>> b/drivers/gpu/drm/i915/i915_perf.c
> >>>>>> index ccb20230df2c..dd65b72bddd4 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >>>>>> @@ -1677,6 +1677,11 @@ static void 
> >>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
> >>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
> >>>>>> value);
> >>>>>>         }
> >>>>>> +
> >>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> >>>>>> +             gen8_make_rpcs(dev_priv,
> >>>>>> +                            &to_intel_context(ctx,
> >>>>>> +                                              
> >>>>>> dev_priv->engine[RCS])->sseu));
> >>>>> I think there is one issue I missed on the previous iterations of this
> >>>>> patch.
> >>>>>
> >>>>> This gen8_update_reg_state_unlocked() is called when the GPU is parked
> >>>>> on the kernel context.
> >>>>>
> >>>>> It's supposed to update all contexts, but I think we might not be able
> >>>>> to update the kernel context image while the GPU is using it.
> >>>> The kernel context is only ever taken in extremis (you are either
> >>>> parking or stalling userspace) so I don't care.
> >>>
> >>> The patch exposing the RPCS configuration to userspace will make use of
> >>> the kernel context while OA/perf is enabled. Even if it reprograms the
> >>> locked value that will break the power configuration stability on Gen11
> >>> (because the locked configuration will be different from the kernel
> >>> context configuration).
> >> Sure, but as you point out that's only on changing configuration.
> >>
> >> What's missing in the patch is that we only bail early if the new sseu
> >> matches the ce->sseu, but that doesn't necessarily match whats in the
> >> context due to OA. (Or maybe I missed the conversion to rpcs value and
> >> checking.)
> >> -Chris
> >>
> > 
> > Yep, because the gen8_make_rpcs() post processes the values store at the 
> > gem context level, we risk rerunning the kernel context to write the 
> > exiting value.
> > Sorry this is all so messy :(
> 
> Lets see if I managed to follow here.
> 
> The current code indeed bails out at the set ctx param level if the 
> requested state matches the ce->state. My thinking was that ce->state is 
> the master state and whatever happens in "post processing" via 
> gen8_make_rpcs should be hidden from it since the design is that the 
> i915_perf.c will re-configure all contexts when the OA active status 
> changes (to either direction).
> 
> So I don't see a problem in those two interactions.

Our muttering was just along the lines that we can skip the update via
GPU if OA was already active.
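
Roughly, as a sketch only: compare the value that would actually land in the context image before and after the change, and skip the GPU round trip when they match. The names below (model_sseu, model_effective, model_needs_gpu_update) are invented for the example and are not functions in the patch; the "locked config wins while OA is active" rule is assumed from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified SSEU configuration. */
struct model_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/* Value that ends up in the image: the OA-locked config wins while active. */
static struct model_sseu model_effective(struct model_sseu req,
					 bool oa_active,
					 struct model_sseu oa_locked)
{
	return oa_active ? oa_locked : req;
}

/* True only if the image contents would really change. */
static bool model_needs_gpu_update(struct model_sseu old_req,
				   struct model_sseu new_req,
				   bool oa_active,
				   struct model_sseu oa_locked)
{
	struct model_sseu before = model_effective(old_req, oa_active,
						   oa_locked);
	struct model_sseu after = model_effective(new_req, oa_active,
						  oa_locked);

	return before.slice_mask != after.slice_mask ||
	       before.subslice_mask != after.subslice_mask;
}
```

With OA active both sides collapse to the locked value, so a change of the requested masks would not need to touch the image until OA is disabled again.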
 
> Apart from one, get_param_sseu will lie a bit - we can discuss about 
> this one more. At one point I suggested we have two sets of masks in the 
> uAPI, requested and active in a way. So userspace could query what it 
> set and what is actually active.

In essence, the context should only get to see its own value, not the
system value since that is privileged information (of the OA user in
this case). It's always a nasty dilemma and I think idempotence of the
user interface is far more important (i.e. the current paired setparam,
getparam is the correct starting point for the API).
 
> Now second issue is if i915_perf.c is able to reprogram the kernel config.
> 
> Here its true, it will write to the context image and that will get 
> overwritten by context save.
> 
> If that is a problem for OA, I was initially if a throw-away second 
> "kernel" context could be use to re-program the real one, but perhaps 
> even simpler - what about a mmio write to program the RPCS while kernel 
> context is active?

I object to OA reporting on the kernel context. I think it should never
provide information about the system contexts as that is privileged
information.
* crawls back under his rock
-Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-07  8:26             ` Tvrtko Ursulin
  2018-09-07  8:59               ` Chris Wilson
@ 2018-09-07  9:23               ` Lionel Landwerlin
  2018-09-07  9:39                 ` Tvrtko Ursulin
  1 sibling, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-07  9:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>
> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>> On 06/09/2018 11:22, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>
>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>> configurations written in the NOA muxes.
>>>>>>>
>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>> workaround batchbuffer but some concerns where raised about the 
>>>>>>> cost
>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>> without consequences from the userspace point of view. 
>>>>>>> Reprogramming
>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>> changed (which happens after context switch). This means for a 
>>>>>>> window
>>>>>>> of time during the recording, counters recorded by the OA unit 
>>>>>>> might
>>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>>> discard
>>>>>>> the invalid values.
>>>>>>>
>>>>>>> Minimizing the reprogramming could be implemented by tracking of 
>>>>>>> the
>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>> MI_PREDICATE
>>>>>>> to discard some of the programming commands, but the command 
>>>>>>> streamer
>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>> workaround batchbuffer.
>>>>>>>
>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>> disregard
>>>>>>> the user requested configuration for the period of time when 
>>>>>>> i915/perf
>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>> performance
>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>> powergating for the same reason.
>>>>>>>
>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>
>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>
>>>>>>> Tvrtko Ursulin:
>>>>>>>
>>>>>>> v4:
>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>
>>>>>>> v5:
>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>
>>>>>>> v6:
>>>>>>>     * Rebase for context image setup changes.
>>>>>>>
>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>> ++++++++++++++++++++----------
>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>>> value);
>>>>>>>         }
>>>>>>> +
>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>> + &to_intel_context(ctx,
>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>> I think there is one issue I missed on the previous iterations of 
>>>>>> this
>>>>>> patch.
>>>>>>
>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is 
>>>>>> parked
>>>>>> on the kernel context.
>>>>>>
>>>>>> It's supposed to update all contexts, but I think we might not be 
>>>>>> able
>>>>>> to update the kernel context image while the GPU is using it.
>>>>> The kernel context is only ever taken in extremis (you are either
>>>>> parking or stalling userspace) so I don't care.
>>>>
>>>> The patch exposing the RPCS configuration to userspace will make 
>>>> use of
>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>> locked value that will break the power configuration stability on 
>>>> Gen11
>>>> (because the locked configuration will be different from the kernel
>>>> context configuration).
>>> Sure, but as you point out that's only on changing configuration.
>>>
>>> What's missing in the patch is that we only bail early if the new sseu
>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>> checking.)
>>> -Chris
>>>
>>
>> Yep, because the gen8_make_rpcs() post processes the values store at 
>> the gem context level, we risk rerunning the kernel context to write 
>> the exiting value.
>> Sorry this is all so messy :(
>
> Lets see if I managed to follow here.
>
> The current code indeed bails out at the set ctx param level if the 
> requested state matches the ce->state. My thinking was that ce->state 
> is the master state and whatever happens in "post processing" via 
> gen8_make_rpcs should be hidden from it since the design is that the 
> i915_perf.c will re-configure all contexts when the OA active status 
> changes (to either direction).
>
> So I don't see a problem in those two interactions.


Let's say you have contextA with sseu(slice,subslice)=(0x1,0xff) on ICL.

You then enable OA, which locks the configuration at (0x1,0xf).

The kernel context has retained its (0x1,0xff) configuration.

You then change the config of contextA to (0x1,0x7).

This would lead to the kernel context being scheduled with (0x1,0xff)
while OA is active.
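
The sequence above can be sketched with a small model. This is an illustration only, not driver code: the types and helpers (model_sseu, model_ctx, model_oa_lock, model_count_unlocked) are hypothetical, and the model assumes that an image write to the context currently on the hardware is lost because the context save overwrites it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SSEU configuration. */
struct model_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/* Hypothetical context with the config held in its saved image. */
struct model_ctx {
	const char *name;
	struct model_sseu image;
	bool active_on_hw; /* image gets overwritten by the next context save */
};

/*
 * Enabling OA: write the locked config into every context image.
 * Returns how many writes are lost because the context is on the hardware.
 */
static size_t model_oa_lock(struct model_ctx *ctxs, size_t count,
			    struct model_sseu locked)
{
	size_t lost = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (ctxs[i].active_on_hw) {
			lost++; /* context save puts the old value back */
			continue;
		}
		ctxs[i].image = locked;
	}

	return lost;
}

/* Count the contexts that would still run with a non-locked config. */
static size_t model_count_unlocked(const struct model_ctx *ctxs, size_t count,
				   struct model_sseu locked)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (ctxs[i].image.slice_mask != locked.slice_mask ||
		    ctxs[i].image.subslice_mask != locked.subslice_mask)
			n++;
	}

	return n;
}
```

With contextA idle and the kernel context on the hardware, locking at (0x1,0xf) updates contextA but leaves the kernel context at (0x1,0xff), which is the mismatch described above.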


>
> Apart from one, get_param_sseu will lie a bit - we can discuss about 
> this one more. At one point I suggested we have two sets of masks in 
> the uAPI, requested and active in a way. So userspace could query what 
> it set and what is actually active.
>
> Now second issue is if i915_perf.c is able to reprogram the kernel 
> config.
>
> Here its true, it will write to the context image and that will get 
> overwritten by context save.
>
> If that is a problem for OA, I was initially if a throw-away second 
> "kernel" context could be use to re-program the real one, but perhaps 
> even simpler - what about a mmio write to program the RPCS while 
> kernel context is active?


Documentation says: "This register must not be programmed directly
through CPU MMIO cycle."


Sorry :(


-

Lionel


>
> Regards,
>
> Tvrtko
>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-07  9:23               ` Lionel Landwerlin
@ 2018-09-07  9:39                 ` Tvrtko Ursulin
  2018-09-07  9:55                   ` Lionel Landwerlin
  0 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-07  9:39 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 07/09/2018 10:23, Lionel Landwerlin wrote:
> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>
>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>
>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>
>>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>> workaround batchbuffer but some concerns where raised about the 
>>>>>>>> cost
>>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>>> without consequences from the userspace point of view. 
>>>>>>>> Reprogramming
>>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>>> changed (which happens after context switch). This means for a 
>>>>>>>> window
>>>>>>>> of time during the recording, counters recorded by the OA unit 
>>>>>>>> might
>>>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>>>> discard
>>>>>>>> the invalid values.
>>>>>>>>
>>>>>>>> Minimizing the reprogramming could be implemented by tracking of 
>>>>>>>> the
>>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>>> MI_PREDICATE
>>>>>>>> to discard some of the programming commands, but the command 
>>>>>>>> streamer
>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>> workaround batchbuffer.
>>>>>>>>
>>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>>> disregard
>>>>>>>> the user requested configuration for the period of time when 
>>>>>>>> i915/perf
>>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>>> performance
>>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>>> powergating for the same reason.
>>>>>>>>
>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>
>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>
>>>>>>>> Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> v4:
>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>
>>>>>>>> v5:
>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>
>>>>>>>> v6:
>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>
>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>>> ++++++++++++++++++++----------
>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>>>> value);
>>>>>>>>         }
>>>>>>>> +
>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>> + &to_intel_context(ctx,
>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>> I think there is one issue I missed on the previous iterations of 
>>>>>>> this
>>>>>>> patch.
>>>>>>>
>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is 
>>>>>>> parked
>>>>>>> on the kernel context.
>>>>>>>
>>>>>>> It's supposed to update all contexts, but I think we might not be 
>>>>>>> able
>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>> parking or stalling userspace) so I don't care.
>>>>>
>>>>> The patch exposing the RPCS configuration to userspace will make 
>>>>> use of
>>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>>> locked value that will break the power configuration stability on 
>>>>> Gen11
>>>>> (because the locked configuration will be different from the kernel
>>>>> context configuration).
>>>> Sure, but as you point out that's only on changing configuration.
>>>>
>>>> What's missing in the patch is that we only bail early if the new sseu
>>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>>> checking.)
>>>> -Chris
>>>>
>>>
>>> Yep, because the gen8_make_rpcs() post processes the values store at 
>>> the gem context level, we risk rerunning the kernel context to write 
>>> the exiting value.
>>> Sorry this is all so messy :(
>>
>> Lets see if I managed to follow here.
>>
>> The current code indeed bails out at the set ctx param level if the 
>> requested state matches the ce->state. My thinking was that ce->state 
>> is the master state and whatever happens in "post processing" via 
>> gen8_make_rpcs should be hidden from it since the design is that the 
>> i915_perf.c will re-configure all contexts when the OA active status 
>> changes (to either direction).
>>
>> So I don't see a problem in those two interactions.
> 
> 
> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for ICL.
> 
> You then enable OA which locks the configuration at (0x1,0xf).
> 
> The kernel context has retained its (0x1/0xff) configuration.
> 
> 
> And after you change the config of contextA to (0x1,0x7).
> 
> 
> This would lead to the kernel context scheduled with (0x1,0xff) while OA 
> is active.

Okay, that's the problem discussed in the paragraph below: the kernel 
context is not updated at all. But is it a problem for OA? Will it mess 
up some counters even if the kernel context isn't executing anything 
that interacts with them?

> 
>>
>> Apart from one, get_param_sseu will lie a bit - we can discuss about 
>> this one more. At one point I suggested we have two sets of masks in 
>> the uAPI, requested and active in a way. So userspace could query what 
>> it set and what is actually active.
>>
>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>> config.
>>
>> Here its true, it will write to the context image and that will get 
>> overwritten by context save.
>>
>> If that is a problem for OA, I was initially if a throw-away second 
>> "kernel" context could be use to re-program the real one, but perhaps 
>> even simpler - what about a mmio write to program the RPCS while 
>> kernel context is active?
> 
> 
> Documentation says : "This register must not be programmed directly 
> through CPU MMIO cycle."
> 
> 
> Sorry :(

Ugh... okay, help me understand whether the kernel context absolutely 
needs to follow the "lock" for OA to work, and then we'll see what to do.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-07  9:39                 ` Tvrtko Ursulin
@ 2018-09-07  9:55                   ` Lionel Landwerlin
  2018-09-10 13:44                     ` Tvrtko Ursulin
  0 siblings, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-07  9:55 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 07/09/2018 10:39, Tvrtko Ursulin wrote:
>
> On 07/09/2018 10:23, Lionel Landwerlin wrote:
>> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>>
>>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>
>>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>>
>>>>>>>>> One possible solution to this problem is to reprogram the NOA 
>>>>>>>>> muxes
>>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>>> workaround batchbuffer but some concerns where raised about 
>>>>>>>>> the cost
>>>>>>>>> of reprogramming at every context switch. This solution is 
>>>>>>>>> also not
>>>>>>>>> without consequences from the userspace point of view. 
>>>>>>>>> Reprogramming
>>>>>>>>> of the muxes can only happen once the powergating 
>>>>>>>>> configuration has
>>>>>>>>> changed (which happens after context switch). This means for a 
>>>>>>>>> window
>>>>>>>>> of time during the recording, counters recorded by the OA unit 
>>>>>>>>> might
>>>>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>>>>> discard
>>>>>>>>> the invalid values.
>>>>>>>>>
>>>>>>>>> Minimizing the reprogramming could be implemented by tracking 
>>>>>>>>> of the
>>>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>>>> MI_PREDICATE
>>>>>>>>> to discard some of the programming commands, but the command 
>>>>>>>>> streamer
>>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>>> workaround batchbuffer.
>>>>>>>>>
>>>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>>>> disregard
>>>>>>>>> the user requested configuration for the period of time when 
>>>>>>>>> i915/perf
>>>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>>>> performance
>>>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>>>> partially powergated GPU. We already prevent RC6 from 
>>>>>>>>> affecting the
>>>>>>>>> programming so it doesn't sound completely unreasonable to 
>>>>>>>>> hold on
>>>>>>>>> powergating for the same reason.
>>>>>>>>>
>>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>>
>>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>>
>>>>>>>>> Tvrtko Ursulin:
>>>>>>>>>
>>>>>>>>> v4:
>>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>>
>>>>>>>>> v5:
>>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>>
>>>>>>>>> v6:
>>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>>>> ++++++++++++++++++++----------
>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>>>>> value);
>>>>>>>>>         }
>>>>>>>>> +
>>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, 
>>>>>>>>> GEN8_R_PWR_CLK_STATE,
>>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>>> + &to_intel_context(ctx,
>>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>>> I think there is one issue I missed on the previous iterations 
>>>>>>>> of this
>>>>>>>> patch.
>>>>>>>>
>>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is 
>>>>>>>> parked
>>>>>>>> on the kernel context.
>>>>>>>>
>>>>>>>> It's supposed to update all contexts, but I think we might not 
>>>>>>>> be able
>>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>>> parking or stalling userspace) so I don't care.
>>>>>>
>>>>>> The patch exposing the RPCS configuration to userspace will make 
>>>>>> use of
>>>>>> the kernel context while OA/perf is enabled. Even if it 
>>>>>> reprograms the
>>>>>> locked value that will break the power configuration stability on 
>>>>>> Gen11
>>>>>> (because the locked configuration will be different from the kernel
>>>>>> context configuration).
>>>>> Sure, but as you point out that's only on changing configuration.
>>>>>
>>>>> What's missing in the patch is that we only bail early if the new 
>>>>> sseu
>>>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>>>> context due to OA. (Or maybe I missed the conversion to rpcs value 
>>>>> and
>>>>> checking.)
>>>>> -Chris
>>>>>
>>>>
>>>> Yep, because the gen8_make_rpcs() post processes the values store 
>>>> at the gem context level, we risk rerunning the kernel context to 
>>>> write the exiting value.
>>>> Sorry this is all so messy :(
>>>
>>> Lets see if I managed to follow here.
>>>
>>> The current code indeed bails out at the set ctx param level if the 
>>> requested state matches the ce->state. My thinking was that 
>>> ce->state is the master state and whatever happens in "post 
>>> processing" via gen8_make_rpcs should be hidden from it since the 
>>> design is that the i915_perf.c will re-configure all contexts when 
>>> the OA active status changes (to either direction).
>>>
>>> So I don't see a problem in those two interactions.
>>
>>
>> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for 
>> ICL.
>>
>> You then enable OA which locks the configuration at (0x1,0xf).
>>
>> The kernel context has retained its (0x1/0xff) configuration.
>>
>>
>> And after you change the config of contextA to (0x1,0x7).
>>
>>
>> This would lead to the kernel context scheduled with (0x1,0xff) while 
>> OA is active.
>
> Okay that's a problem discussed in the paragraph below - that the 
> kernel context is not updated at all. But is it a problem for OA? Will 
> it mess up some counters even if kernel context isn't executing 
> anything interacting with them? Or is it?


What the HW does to the NOA logic on power configuration changes is not 
really documented.
Experimentally on SKL GT4, it seems a change in power configuration will 
trigger a power off of everything before applying power at the new 
configuration.
So that would imply losing the NOA programming when we switch to the 
kernel context, which means invalid values in the counters.
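
To make the scenario concrete, here is a minimal C model of it. It is only 
a sketch: struct sseu_cfg, effective_sseu() and 
switch_changes_power_config() are hypothetical names, not i915 functions, 
and the mask values are the ones from the example quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-context SSEU request. */
struct sseu_cfg {
	uint8_t slice_mask;
	uint8_t subslice_mask;
};

/*
 * What ends up in a context image that has been rewritten for OA: the
 * locked configuration while OA is active, the requested one otherwise.
 */
struct sseu_cfg effective_sseu(struct sseu_cfg requested,
			       struct sseu_cfg locked, bool oa_active)
{
	return oa_active ? locked : requested;
}

/* A context switch changes the power configuration if the masks differ. */
bool switch_changes_power_config(struct sseu_cfg from, struct sseu_cfg to)
{
	return from.slice_mask != to.slice_mask ||
	       from.subslice_mask != to.subslice_mask;
}
```

With OA locked at (0x1,0xf), contextA runs at (0x1,0xf) whatever it 
requested, while a kernel context image still holding (0x1,0xff) makes 
switch_changes_power_config() return true on every switch to or from it.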


>
>>
>>>
>>> Apart from one, get_param_sseu will lie a bit - we can discuss about 
>>> this one more. At one point I suggested we have two sets of masks in 
>>> the uAPI, requested and active in a way. So userspace could query 
>>> what it set and what is actually active.
>>>
>>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>>> config.
>>>
>>> Here its true, it will write to the context image and that will get 
>>> overwritten by context save.
>>>
>>> If that is a problem for OA, I was initially if a throw-away second 
>>> "kernel" context could be use to re-program the real one, but 
>>> perhaps even simpler - what about a mmio write to program the RPCS 
>>> while kernel context is active?
>>
>>
>> Documentation says : "This register must not be programmed directly 
>> through CPU MMIO cycle."
>>
>>
>> Sorry :(
>
> Ugh.. okay, help me understand if kernel context absolutely needs to 
> follow the "lock" for OA to work and then we'll see what to do.

I think so.

Your idea of a throw-away context to do the reprogramming seems sound.

Thanks,

-
Lionel


>
> Regards,
>
> Tvrtko
>


* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-07  9:55                   ` Lionel Landwerlin
@ 2018-09-10 13:44                     ` Tvrtko Ursulin
  2018-09-11 20:11                       ` Lionel Landwerlin
  0 siblings, 1 reply; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-10 13:44 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 07/09/2018 10:55, Lionel Landwerlin wrote:
> On 07/09/2018 10:39, Tvrtko Ursulin wrote:
>>
>> On 07/09/2018 10:23, Lionel Landwerlin wrote:
>>> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>>>
>>>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>>
>>>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>>>
>>>>>>>>>> One possible solution to this problem is to reprogram the NOA 
>>>>>>>>>> muxes
>>>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>>>> workaround batchbuffer but some concerns where raised about 
>>>>>>>>>> the cost
>>>>>>>>>> of reprogramming at every context switch. This solution is 
>>>>>>>>>> also not
>>>>>>>>>> without consequences from the userspace point of view. 
>>>>>>>>>> Reprogramming
>>>>>>>>>> of the muxes can only happen once the powergating 
>>>>>>>>>> configuration has
>>>>>>>>>> changed (which happens after context switch). This means for a 
>>>>>>>>>> window
>>>>>>>>>> of time during the recording, counters recorded by the OA unit 
>>>>>>>>>> might
>>>>>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>>>>>> discard
>>>>>>>>>> the invalid values.
>>>>>>>>>>
>>>>>>>>>> Minimizing the reprogramming could be implemented by tracking 
>>>>>>>>>> of the
>>>>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>>>>> MI_PREDICATE
>>>>>>>>>> to discard some of the programming commands, but the command 
>>>>>>>>>> streamer
>>>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>>>> workaround batchbuffer.
>>>>>>>>>>
>>>>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>>>>> disregard
>>>>>>>>>> the user requested configuration for the period of time when 
>>>>>>>>>> i915/perf
>>>>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>>>>> performance
>>>>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>>>>> partially powergated GPU. We already prevent RC6 from 
>>>>>>>>>> affecting the
>>>>>>>>>> programming so it doesn't sound completely unreasonable to 
>>>>>>>>>> hold on
>>>>>>>>>> powergating for the same reason.
>>>>>>>>>>
>>>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>>>
>>>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>>>
>>>>>>>>>> Tvrtko Ursulin:
>>>>>>>>>>
>>>>>>>>>> v4:
>>>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>>>
>>>>>>>>>> v5:
>>>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>>>
>>>>>>>>>> v6:
>>>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>>>>> ++++++++++++++++++++----------
>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>>>>>> value);
>>>>>>>>>>         }
>>>>>>>>>> +
>>>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, 
>>>>>>>>>> GEN8_R_PWR_CLK_STATE,
>>>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>>>> + &to_intel_context(ctx,
>>>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>>>> I think there is one issue I missed on the previous iterations 
>>>>>>>>> of this
>>>>>>>>> patch.
>>>>>>>>>
>>>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is 
>>>>>>>>> parked
>>>>>>>>> on the kernel context.
>>>>>>>>>
>>>>>>>>> It's supposed to update all contexts, but I think we might not 
>>>>>>>>> be able
>>>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>>>> parking or stalling userspace) so I don't care.
>>>>>>>
>>>>>>> The patch exposing the RPCS configuration to userspace will make 
>>>>>>> use of
>>>>>>> the kernel context while OA/perf is enabled. Even if it 
>>>>>>> reprograms the
>>>>>>> locked value that will break the power configuration stability on 
>>>>>>> Gen11
>>>>>>> (because the locked configuration will be different from the kernel
>>>>>>> context configuration).
>>>>>> Sure, but as you point out that's only on changing configuration.
>>>>>>
>>>>>> What's missing in the patch is that we only bail early if the new 
>>>>>> sseu
>>>>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>>>>> context due to OA. (Or maybe I missed the conversion to rpcs value 
>>>>>> and
>>>>>> checking.)
>>>>>> -Chris
>>>>>>
>>>>>
>>>>> Yep, because the gen8_make_rpcs() post processes the values store 
>>>>> at the gem context level, we risk rerunning the kernel context to 
>>>>> write the exiting value.
>>>>> Sorry this is all so messy :(
>>>>
>>>> Lets see if I managed to follow here.
>>>>
>>>> The current code indeed bails out at the set ctx param level if the 
>>>> requested state matches the ce->state. My thinking was that 
>>>> ce->state is the master state and whatever happens in "post 
>>>> processing" via gen8_make_rpcs should be hidden from it since the 
>>>> design is that the i915_perf.c will re-configure all contexts when 
>>>> the OA active status changes (to either direction).
>>>>
>>>> So I don't see a problem in those two interactions.
>>>
>>>
>>> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for 
>>> ICL.
>>>
>>> You then enable OA which locks the configuration at (0x1,0xf).
>>>
>>> The kernel context has retained its (0x1/0xff) configuration.
>>>
>>>
>>> And after you change the config of contextA to (0x1,0x7).
>>>
>>>
>>> This would lead to the kernel context scheduled with (0x1,0xff) while 
>>> OA is active.
>>
>> Okay that's a problem discussed in the paragraph below - that the 
>> kernel context is not updated at all. But is it a problem for OA? Will 
>> it mess up some counters even if kernel context isn't executing 
>> anything interacting with them? Or is it?
> 
> 
> What the HW is going to do to the NOA logic in power configuration 
> changes is not really documented.
> Experimentally on SKL GT4, it seems a change in power configuration will 
> trigger a power off of everything before applying the power at the new 
> configuration.
> So that would imply loosing the NOA programming when we switch to the 
> kernel context which means invalid values in the counters.
> 
> 
>>
>>>
>>>>
>>>> Apart from one, get_param_sseu will lie a bit - we can discuss about 
>>>> this one more. At one point I suggested we have two sets of masks in 
>>>> the uAPI, requested and active in a way. So userspace could query 
>>>> what it set and what is actually active.
>>>>
>>>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>>>> config.
>>>>
>>>> Here its true, it will write to the context image and that will get 
>>>> overwritten by context save.
>>>>
>>>> If that is a problem for OA, I was initially if a throw-away second 
>>>> "kernel" context could be use to re-program the real one, but 
>>>> perhaps even simpler - what about a mmio write to program the RPCS 
>>>> while kernel context is active?
>>>
>>>
>>> Documentation says : "This register must not be programmed directly 
>>> through CPU MMIO cycle."
>>>
>>>
>>> Sorry :(
>>
>> Ugh.. okay, help me understand if kernel context absolutely needs to 
>> follow the "lock" for OA to work and then we'll see what to do.
> 
> I think so.
> 
> Your idea of a throw away context to reprogramming every seems sound.

I was in the middle of refactoring the series to implement this when I 
started doubting the need for it.

The premise of the problem statement was that the kernel context doesn't 
get updated by the OA code, but there is a loop in 
gen8_configure_all_contexts which goes through every one of them after 
it has idled the GPU.

AFAICS that means it is able to map the kernel context state and edit it, 
so everything seems fine from this angle. (The kernel context is 
perma-pinned in software, but after idling the GPU we know it is not 
actually on the GPU, so it is safe to edit its image.)

Am I missing some hole here? And if not, why does the code need to have 
a primary update via a request in gen8_switch_to_updated_kernel_context? 
Editing the context image after idling again seems sufficient.
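
The argument above can be sketched as follows. This is a toy model, not 
the i915 code: struct toy_context, toy_configure_all_contexts() and the 
gpu_idle flag are hypothetical. It only encodes the assumption that once 
the GPU is idle no image is resident on the hardware, so every image, 
including the kernel context's, can be rewritten from the CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_context {
	bool is_kernel;      /* perma-pinned in software */
	uint32_t image_rpcs; /* RPCS slot in the saved context image */
};

/*
 * Rewrite the RPCS slot of every context image. Returns the number of
 * images updated, or -1 if the GPU is not idle (an image might then be
 * live on the hardware and the edit could be lost).
 */
int toy_configure_all_contexts(struct toy_context *ctxs, size_t count,
			       uint32_t rpcs, bool gpu_idle)
{
	size_t i;

	if (!gpu_idle)
		return -1;

	for (i = 0; i < count; i++)
		ctxs[i].image_rpcs = rpcs; /* kernel context included */

	return (int)count;
}
```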

Would it be possible to exercise the hypothetical loss of NOA 
configuration from an IGT, if kernel context wasn't correctly updated?

Regards,

Tvrtko

* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-10 13:44                     ` Tvrtko Ursulin
@ 2018-09-11 20:11                       ` Lionel Landwerlin
  2018-09-12  8:03                         ` Tvrtko Ursulin
  0 siblings, 1 reply; 37+ messages in thread
From: Lionel Landwerlin @ 2018-09-11 20:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 10/09/2018 14:44, Tvrtko Ursulin wrote:
>
> On 07/09/2018 10:55, Lionel Landwerlin wrote:
>> On 07/09/2018 10:39, Tvrtko Ursulin wrote:
>>>
>>> On 07/09/2018 10:23, Lionel Landwerlin wrote:
>>>> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>>>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>>>
>>>>>>>>>>> If some of the contexts submitting workloads to the GPU have 
>>>>>>>>>>> been
>>>>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>>>>
>>>>>>>>>>> One possible solution to this problem is to reprogram the 
>>>>>>>>>>> NOA muxes
>>>>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>>>>> workaround batchbuffer but some concerns where raised about 
>>>>>>>>>>> the cost
>>>>>>>>>>> of reprogramming at every context switch. This solution is 
>>>>>>>>>>> also not
>>>>>>>>>>> without consequences from the userspace point of view. 
>>>>>>>>>>> Reprogramming
>>>>>>>>>>> of the muxes can only happen once the powergating 
>>>>>>>>>>> configuration has
>>>>>>>>>>> changed (which happens after context switch). This means for 
>>>>>>>>>>> a window
>>>>>>>>>>> of time during the recording, counters recorded by the OA 
>>>>>>>>>>> unit might
>>>>>>>>>>> be invalid. This requires userspace dealing with OA reports 
>>>>>>>>>>> to discard
>>>>>>>>>>> the invalid values.
>>>>>>>>>>>
>>>>>>>>>>> Minimizing the reprogramming could be implemented by 
>>>>>>>>>>> tracking of the
>>>>>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>>>>>> MI_PREDICATE
>>>>>>>>>>> to discard some of the programming commands, but the command 
>>>>>>>>>>> streamer
>>>>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>>>>> workaround batchbuffer.
>>>>>>>>>>>
>>>>>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>>>>>> disregard
>>>>>>>>>>> the user requested configuration for the period of time when 
>>>>>>>>>>> i915/perf
>>>>>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>>>>>> performance
>>>>>>>>>>> penality for some media workloads that benefit from running 
>>>>>>>>>>> on a
>>>>>>>>>>> partially powergated GPU. We already prevent RC6 from 
>>>>>>>>>>> affecting the
>>>>>>>>>>> programming so it doesn't sound completely unreasonable to 
>>>>>>>>>>> hold on
>>>>>>>>>>> powergating for the same reason.
>>>>>>>>>>>
>>>>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>>>>
>>>>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>>>>
>>>>>>>>>>> Tvrtko Ursulin:
>>>>>>>>>>>
>>>>>>>>>>> v4:
>>>>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>>>>
>>>>>>>>>>> v5:
>>>>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>>>>
>>>>>>>>>>> v6:
>>>>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Lionel Landwerlin 
>>>>>>>>>>> <lionel.g.landwerlin@intel.com>
>>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>>>>>> ++++++++++++++++++++----------
>>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>>>>                 CTX_REG(reg_state, state_offset, 
>>>>>>>>>>> flex_regs[i], value);
>>>>>>>>>>>         }
>>>>>>>>>>> +
>>>>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, 
>>>>>>>>>>> GEN8_R_PWR_CLK_STATE,
>>>>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>>>>> + &to_intel_context(ctx,
>>>>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>>>>> I think there is one issue I missed on the previous 
>>>>>>>>>> iterations of this
>>>>>>>>>> patch.
>>>>>>>>>>
>>>>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU 
>>>>>>>>>> is parked
>>>>>>>>>> on the kernel context.
>>>>>>>>>>
>>>>>>>>>> It's supposed to update all contexts, but I think we might 
>>>>>>>>>> not be able
>>>>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>>>>> parking or stalling userspace) so I don't care.
>>>>>>>>
>>>>>>>> The patch exposing the RPCS configuration to userspace will 
>>>>>>>> make use of
>>>>>>>> the kernel context while OA/perf is enabled. Even if it 
>>>>>>>> reprograms the
>>>>>>>> locked value that will break the power configuration stability 
>>>>>>>> on Gen11
>>>>>>>> (because the locked configuration will be different from the 
>>>>>>>> kernel
>>>>>>>> context configuration).
>>>>>>> Sure, but as you point out that's only on changing configuration.
>>>>>>>
>>>>>>> What's missing in the patch is that we only bail early if the 
>>>>>>> new sseu
>>>>>>> matches the ce->sseu, but that doesn't necessarily match whats 
>>>>>>> in the
>>>>>>> context due to OA. (Or maybe I missed the conversion to rpcs 
>>>>>>> value and
>>>>>>> checking.)
>>>>>>> -Chris
>>>>>>>
>>>>>>
>>>>>> Yep, because the gen8_make_rpcs() post processes the values store 
>>>>>> at the gem context level, we risk rerunning the kernel context to 
>>>>>> write the exiting value.
>>>>>> Sorry this is all so messy :(
>>>>>
>>>>> Lets see if I managed to follow here.
>>>>>
>>>>> The current code indeed bails out at the set ctx param level if 
>>>>> the requested state matches the ce->state. My thinking was that 
>>>>> ce->state is the master state and whatever happens in "post 
>>>>> processing" via gen8_make_rpcs should be hidden from it since the 
>>>>> design is that the i915_perf.c will re-configure all contexts when 
>>>>> the OA active status changes (to either direction).
>>>>>
>>>>> So I don't see a problem in those two interactions.
>>>>
>>>>
>>>> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) 
>>>> for ICL.
>>>>
>>>> You then enable OA which locks the configuration at (0x1,0xf).
>>>>
>>>> The kernel context has retained its (0x1/0xff) configuration.
>>>>
>>>>
>>>> And after you change the config of contextA to (0x1,0x7).
>>>>
>>>>
>>>> This would lead to the kernel context scheduled with (0x1,0xff) 
>>>> while OA is active.
>>>
>>> Okay that's a problem discussed in the paragraph below - that the 
>>> kernel context is not updated at all. But is it a problem for OA? 
>>> Will it mess up some counters even if kernel context isn't executing 
>>> anything interacting with them? Or is it?
>>
>>
>> What the HW is going to do to the NOA logic in power configuration 
>> changes is not really documented.
>> Experimentally on SKL GT4, it seems a change in power configuration 
>> will trigger a power off of everything before applying the power at 
>> the new configuration.
>> So that would imply loosing the NOA programming when we switch to the 
>> kernel context which means invalid values in the counters.
>>
>>
>>>
>>>>
>>>>>
>>>>> Apart from one, get_param_sseu will lie a bit - we can discuss 
>>>>> about this one more. At one point I suggested we have two sets of 
>>>>> masks in the uAPI, requested and active in a way. So userspace 
>>>>> could query what it set and what is actually active.
>>>>>
>>>>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>>>>> config.
>>>>>
>>>>> Here its true, it will write to the context image and that will 
>>>>> get overwritten by context save.
>>>>>
>>>>> If that is a problem for OA, I was initially if a throw-away 
>>>>> second "kernel" context could be use to re-program the real one, 
>>>>> but perhaps even simpler - what about a mmio write to program the 
>>>>> RPCS while kernel context is active?
>>>>
>>>>
>>>> Documentation says : "This register must not be programmed directly 
>>>> through CPU MMIO cycle."
>>>>
>>>>
>>>> Sorry :(
>>>
>>> Ugh.. okay, help me understand if kernel context absolutely needs to 
>>> follow the "lock" for OA to work and then we'll see what to do.
>>
>> I think so.
>>
>> Your idea of a throw-away context to reprogram everything seems sound.
>
> I was in the middle of refactoring the series to implement this when I 
> started suspecting the need for it.
>
> Premise of the problem statement was the kernel context doesn't get 
> updated by the OA code, but, there is a loop in 
> gen8_configure_all_contexts which goes through exactly all of them 
> after it has idled the GPU.
>
> AFAICS that means it is able to map the kernel context state and edit 
> it so everything seems fine from this angle. (Kernel context is 
> perma-pinned in software, but after idling the GPU we know it is not 
> actually on the GPU so it is safe to edit its image.)
>
> Am I missing some hole here, and if not, why does the code need to 
> have a primary update via a request in 
> gen8_switch_to_updated_kernel_context? Context image after idling 
> seems again sufficient.


My understanding is that the current code leaves the GPU idling on the 
kernel context.
When idling stops, the GPU will save the last values from the HW 
registers into the kernel context image.
We currently don't see any issue because we also run a set of commands 
on the kernel context prior to idle that mirror the edits made to the 
context images.

That won't be the case for the RPCS register because we can't load it 
from the command streamer.
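
To illustrate the hazard described above, here is a toy model in plain C. 
It is only a sketch: every name in it (toy_context, toy_gpu, toy_switch_in 
and so on) is hypothetical and none of it is i915 code. It assumes the 
simplest possible behaviour: a context restore loads the register from the 
image, and a context save writes the live register back to the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the i915 data structures. */
struct toy_context {
    uint32_t image_rpcs;        /* value stored in the context image */
};

struct toy_gpu {
    uint32_t hw_rpcs;           /* live value of the register */
    struct toy_context *active; /* context currently loaded, or NULL */
};

/* Context restore: the hardware loads the register from the image. */
static void toy_switch_in(struct toy_gpu *gpu, struct toy_context *ctx)
{
    assert(gpu != NULL && ctx != NULL);
    gpu->hw_rpcs = ctx->image_rpcs;
    gpu->active = ctx;
}

/* Context save: the hardware writes the live register back to the image. */
static void toy_switch_out(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    if (gpu->active)
        gpu->active->image_rpcs = gpu->hw_rpcs;
    gpu->active = NULL;
}

/* CPU-side edit of the image in memory; the live register is untouched. */
static void toy_edit_image(struct toy_context *ctx, uint32_t rpcs)
{
    assert(ctx != NULL);
    ctx->image_rpcs = rpcs;
}
```

In this model, an edit made while the context is still loaded is lost at 
the next save, whereas an edit made after the save is picked up by the 
next restore.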


>
> Would it be possible to exercise the hypothetical loss of NOA 
> configuration from an IGT, if kernel context wasn't correctly updated?


The test configs saved in the kernel use NOA and their counters 
progress at different ratios relative to the "core clock" (RING_TIMESTAMP 
register).
In IGT tests/perf.c, gen8_sanity_check_test_oa_reports() verifies that 
the counters progress as expected.

But I can't tell which part of the GT the test configs source their 
signals from. If they come from the unslice, then that won't work :(
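
As a rough sketch of what such a ratio check could look like (this is not 
the IGT implementation; the struct, the function name and the tolerance 
parameter are all assumptions made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sample pair; this is not the OA report layout. */
struct toy_sample {
    uint64_t timestamp; /* reference clock ticks */
    uint64_t counter;   /* value of a NOA-sourced counter */
};

/*
 * Return true if, between samples a and b, the counter advanced by
 * num/den of the timestamp delta, within tolerance_pct percent.
 */
static bool toy_ratio_ok(const struct toy_sample *a,
                         const struct toy_sample *b,
                         uint64_t num, uint64_t den,
                         uint64_t tolerance_pct)
{
    uint64_t dt, dc, expected, slack, diff;

    assert(den != 0);
    assert(b->timestamp >= a->timestamp && b->counter >= a->counter);

    dt = b->timestamp - a->timestamp;
    dc = b->counter - a->counter;
    expected = dt * num / den;
    slack = expected * tolerance_pct / 100;
    diff = dc > expected ? dc - expected : expected - dc;

    return diff <= slack;
}
```

A counter that stops advancing (as one might expect if the NOA programming 
were lost) fails this check, but only if its signal really depends on the 
part of the GT that was power cycled.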

-
Lionel



>
> Regards,
>
> Tvrtko
>


* Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
  2018-09-11 20:11                       ` Lionel Landwerlin
@ 2018-09-12  8:03                         ` Tvrtko Ursulin
  0 siblings, 0 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2018-09-12  8:03 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 11/09/2018 21:11, Lionel Landwerlin wrote:
> On 10/09/2018 14:44, Tvrtko Ursulin wrote:
>>
>> On 07/09/2018 10:55, Lionel Landwerlin wrote:
>>> On 07/09/2018 10:39, Tvrtko Ursulin wrote:
>>>>
>>>> On 07/09/2018 10:23, Lionel Landwerlin wrote:
>>>>> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>>>>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>>>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>>>>>> configured to shut down slices/subslices, we might lose the NOA
>>>>>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>>>>>
>>>>>>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>>>>>> workaround batchbuffer but some concerns were raised about the cost
>>>>>>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>>>>>>> without consequences from the userspace point of view. Reprogramming
>>>>>>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>>>>>>> changed (which happens after context switch). This means for a window
>>>>>>>>>>>> of time during the recording, counters recorded by the OA unit might
>>>>>>>>>>>> be invalid. This requires userspace dealing with OA reports to discard
>>>>>>>>>>>> the invalid values.
>>>>>>>>>>>>
>>>>>>>>>>>> Minimizing the reprogramming could be implemented by tracking the
>>>>>>>>>>>> last programmed configuration somewhere in GGTT and using MI_PREDICATE
>>>>>>>>>>>> to discard some of the programming commands, but the command streamer
>>>>>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>>>>>> workaround batchbuffer.
>>>>>>>>>>>>
>>>>>>>>>>>> Another solution, which this change implements, is to simply disregard
>>>>>>>>>>>> the user requested configuration for the period of time when i915/perf
>>>>>>>>>>>> is active. There is no known issue with this apart from a performance
>>>>>>>>>>>> penalty for some media workloads that benefit from running on a
>>>>>>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>>>>>>> powergating for the same reason.
>>>>>>>>>>>>
>>>>>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>>>>>
>>>>>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>>>>>
>>>>>>>>>>>> Tvrtko Ursulin:
>>>>>>>>>>>>
>>>>>>>>>>>> v4:
>>>>>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>>>>>
>>>>>>>>>>>> v5:
>>>>>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>>>>>
>>>>>>>>>>>> v6:
>>>>>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
>>>>>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>>>>>> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>>>>>>>>>>>         }
>>>>>>>>>>>> +
>>>>>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>>>>>> +                            &to_intel_context(ctx,
>>>>>>>>>>>> +                                              dev_priv->engine[RCS])->sseu));
>>>>>>>>>>> I think there is one issue I missed on the previous iterations of this
>>>>>>>>>>> patch.
>>>>>>>>>>>
>>>>>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is parked
>>>>>>>>>>> on the kernel context.
>>>>>>>>>>>
>>>>>>>>>>> It's supposed to update all contexts, but I think we might not be able
>>>>>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>>>>>> parking or stalling userspace) so I don't care.
>>>>>>>>>
>>>>>>>>> The patch exposing the RPCS configuration to userspace will make use of
>>>>>>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>>>>>>> locked value that will break the power configuration stability on Gen11
>>>>>>>>> (because the locked configuration will be different from the kernel
>>>>>>>>> context configuration).
>>>>>>>> Sure, but as you point out that's only on changing configuration.
>>>>>>>>
>>>>>>>> What's missing in the patch is that we only bail early if the new sseu
>>>>>>>> matches the ce->sseu, but that doesn't necessarily match what's in the
>>>>>>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>>>>>>> checking.)
>>>>>>>> -Chris
>>>>>>>>
>>>>>>>
>>>>>>> Yep, because gen8_make_rpcs() post-processes the values stored 
>>>>>>> at the gem context level, we risk rerunning the kernel context to 
>>>>>>> write the existing value.
>>>>>>> Sorry this is all so messy :(
>>>>>>
>>>>>> Let's see if I managed to follow here.
>>>>>>
>>>>>> The current code indeed bails out at the set ctx param level if 
>>>>>> the requested state matches the ce->state. My thinking was that 
>>>>>> ce->state is the master state and whatever happens in "post 
>>>>>> processing" via gen8_make_rpcs should be hidden from it since the 
>>>>>> design is that the i915_perf.c will re-configure all contexts when 
>>>>>> the OA active status changes (to either direction).
>>>>>>
>>>>>> So I don't see a problem in those two interactions.
>>>>>
>>>>>
>>>>> Let's say you have contextA with sseu(slice,subslice)=(0x1,0xff) 
>>>>> for ICL.
>>>>>
>>>>> You then enable OA which locks the configuration at (0x1,0xf).
>>>>>
>>>>> The kernel context has retained its (0x1,0xff) configuration.
>>>>>
>>>>>
>>>>> And afterwards you change the config of contextA to (0x1,0x7).
>>>>>
>>>>>
>>>>> This would lead to the kernel context being scheduled with (0x1,0xff) 
>>>>> while OA is active.
>>>>
>>>> Okay that's a problem discussed in the paragraph below - that the 
>>>> kernel context is not updated at all. But is it a problem for OA? 
>>>> Will it mess up some counters even if kernel context isn't executing 
>>>> anything interacting with them? Or is it?
>>>
>>>
>>> What the HW is going to do to the NOA logic on power configuration 
>>> changes is not really documented.
>>> Experimentally on SKL GT4, it seems a change in power configuration 
>>> will trigger a power off of everything before applying the power at 
>>> the new configuration.
>>> So that would imply losing the NOA programming when we switch to the 
>>> kernel context which means invalid values in the counters.
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Apart from one, get_param_sseu will lie a bit - we can discuss 
>>>>>> about this one more. At one point I suggested we have two sets of 
>>>>>> masks in the uAPI, requested and active in a way. So userspace 
>>>>>> could query what it set and what is actually active.
>>>>>>
>>>>>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>>>>>> config.
>>>>>>
>>>>>> Here it's true, it will write to the context image and that will 
>>>>>> get overwritten by context save.
>>>>>>
>>>>>> If that is a problem for OA, I was initially wondering if a throw-away 
>>>>>> second "kernel" context could be used to re-program the real one, 
>>>>>> but perhaps even simpler - what about a mmio write to program the 
>>>>>> RPCS while kernel context is active?
>>>>>
>>>>>
>>>>> Documentation says : "This register must not be programmed directly 
>>>>> through CPU MMIO cycle."
>>>>>
>>>>>
>>>>> Sorry :(
>>>>
>>>> Ugh.. okay, help me understand if kernel context absolutely needs to 
>>>> follow the "lock" for OA to work and then we'll see what to do.
>>>
>>> I think so.
>>>
>>> Your idea of a throw-away context to reprogram everything seems sound.
>>
>> I was in the middle of refactoring the series to implement this when I 
>> started suspecting the need for it.
>>
>> Premise of the problem statement was the kernel context doesn't get 
>> updated by the OA code, but, there is a loop in 
>> gen8_configure_all_contexts which goes through exactly all of them 
>> after it has idled the GPU.
>>
>> AFAICS that means it is able to map the kernel context state and edit 
>> it so everything seems fine from this angle. (Kernel context is 
>> perma-pinned in software, but after idling the GPU we know it is not 
>> actually on the GPU so it is safe to edit its image.)
>>
>> Am I missing some hole here, and if not, why does the code need to 
>> have a primary update via a request in 
>> gen8_switch_to_updated_kernel_context? Context image after idling 
>> seems again sufficient.
> 
> 
> My understanding is that the current code leaves the GPU idling on the 
> kernel context.
> When idling stops, the GPU will save the last values from the HW 
> registers into the kernel context image.
> We currently don't see any issue because we also run a set of commands 
> on the kernel context prior to idle that mirror the edits made to the 
> context images.

We actually retire everything, including the kernel context, wait for any 
pending interrupts (context save) and check that CS reports idle. So I 
think it is actually simpler than what the OA code currently does. I just 
wanted a way to express it as a test.
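
To make the argument concrete, here is a minimal sketch in plain C of the 
precondition I am relying on. It is not the driver code: toy_engine and 
its fields are hypothetical stand-ins for "everything retired", "no 
context save interrupt pending" and "CS reports idle".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state only: hypothetical fields, not i915 structures. */
struct toy_engine {
    unsigned int inflight_requests; /* submitted but not yet retired */
    bool context_save_pending;      /* save interrupt not yet seen */
    bool cs_idle;                   /* command streamer reports idle */
};

/* All three conditions must hold before the image can be trusted. */
static bool toy_engine_is_idle(const struct toy_engine *e)
{
    return e->inflight_requests == 0 &&
           !e->context_save_pending &&
           e->cs_idle;
}

/* Edit the image only once idle; return whether the write happened. */
static bool toy_update_image_if_idle(const struct toy_engine *e,
                                     uint32_t *image_rpcs, uint32_t rpcs)
{
    assert(e != NULL && image_rpcs != NULL);

    if (!toy_engine_is_idle(e))
        return false;

    *image_rpcs = rpcs;
    return true;
}
```

If all three conditions hold, no context save can land after the edit, so 
the value written to the image is what the next context restore loads.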

> That won't be the case for the RPCS register because we can't load it 
> from the command streamer.
> 
>>
>> Would it be possible to exercise the hypothetical loss of NOA 
>> configuration from an IGT, if kernel context wasn't correctly updated?
> 
> 
> The test configs saved in the kernel use NOA and their counters 
> progress at different ratios relative to the "core clock" (RING_TIMESTAMP 
> register).
> In IGT tests/perf.c, gen8_sanity_check_test_oa_reports() verifies that 
> the counters progress as expected.
> 
> But I can't tell which part of the GT the test configs source their 
> signals from. If they come from the unslice, then that won't work :(

So not testable? Shall we still apply the doctrine of "if in doubt rip 
it out"? Or too risky? Shouldn't be.. it looks pretty obvious we check 
properly for idle. If we simplify, the worst that can happen is that OA 
becomes glitchy when paired with Gen11 media workloads, which is not so 
critical, but I don't think it will break to start with.

Regards,

Tvrtko

end of thread, other threads:[~2018-09-12  8:03 UTC | newest]

Thread overview: 37+ messages
2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
2018-09-05 15:14   ` Chris Wilson
2018-09-05 14:22 ` [PATCH 2/7] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
2018-09-05 15:18   ` Chris Wilson
2018-09-06  9:36     ` Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
2018-09-05 15:21   ` Chris Wilson
2018-09-06  9:41     ` Tvrtko Ursulin
2018-09-06  9:57   ` Lionel Landwerlin
2018-09-06 10:10     ` Chris Wilson
2018-09-06 10:18       ` Lionel Landwerlin
2018-09-06 10:22         ` Chris Wilson
2018-09-06 10:36           ` Lionel Landwerlin
2018-09-07  8:26             ` Tvrtko Ursulin
2018-09-07  8:59               ` Chris Wilson
2018-09-07  9:23               ` Lionel Landwerlin
2018-09-07  9:39                 ` Tvrtko Ursulin
2018-09-07  9:55                   ` Lionel Landwerlin
2018-09-10 13:44                     ` Tvrtko Ursulin
2018-09-11 20:11                       ` Lionel Landwerlin
2018-09-12  8:03                         ` Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 5/7] drm/i915: Add timeline barrier support Tvrtko Ursulin
2018-09-05 15:23   ` Chris Wilson
2018-09-05 14:22 ` [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
2018-09-05 15:29   ` Chris Wilson
2018-09-06  9:50     ` Tvrtko Ursulin
2018-09-06  9:54       ` Chris Wilson
2018-09-06  9:58       ` Lionel Landwerlin
2018-09-05 14:22 ` [PATCH 7/7] drm/i915/icl: Support co-existance between per-context SSEU and OA Tvrtko Ursulin
2018-09-05 14:46 ` ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating (rev2) Patchwork
2018-09-05 14:49 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-09-05 15:05 ` ✓ Fi.CI.BAT: success " Patchwork
2018-09-05 19:55 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-09-06 19:33 ` [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Chris Wilson
2018-09-06 19:52   ` Chris Wilson
